Server Maintenance Plan – Your key to a fast, secure server
Owning a server is much like owning a car.
You don’t have to be an automobile engineer to drive a car. It’ll take you places. You just need to fill up on gas, check the air pressure, and occasionally get it serviced.
In the same way, you don’t have to be a server expert to own (or rent) one and run your business apps on it.
But many server owners forget to get it routinely serviced. They pay their bills and assume all will be well.
In reality, a server needs MORE maintenance than a car. Unlike a car, a server runs 24 hours a day, 7 days a week and performs millions of transactions a week.
All this use is bound to cause some wear and tear, though it is not easy to spot. Let me tell you how:
Why should you maintain your server?
Over time, your server accumulates mileage on its hardware, software, database and server settings.
- Old server settings will become inadequate to handle an increased number of daily transactions.
- Server software will become vulnerable to new attacks.
- SQL tables will get fragmented.
- Hard disks will eventually degrade or fail.
- ..and more
These can result in sluggish services at best, and data loss or information theft at worst.
The good news is that all these are easily preventable.
All you need is a plan. A Server Maintenance Plan.
What is a Server Maintenance Plan?
To go back to the car analogy, let’s say you’ll fill up on gas every week or so. You’ll check your tyre pressure every month, and you’ll probably get the engine, lights and other parts checked every 6 months.
In the same way, different things on the server need to be checked at different intervals.
- Software updates – Anti-virus and software updates can arrive on any day.
- Security log reviews – Abusive users, site visitors or bots can take away resources from legitimate users, and should be blocked ASAP.
- Vulnerability disclosures – Software vendors and security channels report unpatched vulnerabilities or attacks that are making the rounds. Emergency patching will protect your data.
- Backup verification – Because if backups have silently stopped working, a weekly check limits your loss to 1 week’s data.
- Disk usage audit – Stale accounts, unfinished backups, old temporary files, etc. can fill up the disk and cause services to fail.
- Database optimization – Because busy databases can get 3% – 5% fragmentation in a month.
- Application tuning – Because traffic patterns can vary in two months, and speed will be affected with unoptimized settings.
- 24 hours a day – Server health monitoring – Server trouble can be spotted by early signs such as load spikes, a growing mail queue, etc. Early detection can prevent total downtime. Some common parameters are:
- RAID health
- Server temperature
- Load average
- Open network connections
- ..and more
A Server Maintenance Plan is a schedule of such checks that need to be done on an hourly, daily, weekly or monthly basis.
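As a rough illustration, a couple of the health parameters above can be sampled with a short script. The Python sketch below is only a starting point: the threshold values and the names `THRESHOLDS`, `collect_metrics` and `find_alerts` are assumptions made for this example, and `os.getloadavg()` is available on Unix-like systems only. RAID health and temperature need vendor-specific tools and are left out.

```python
import os
import shutil

# Hypothetical thresholds; tune them to your own hardware and workload.
THRESHOLDS = {"load_per_cpu": 1.5, "disk_used_pct": 90.0}


def collect_metrics(path="/"):
    """Sample the 1-minute load average per CPU and disk usage for `path`."""
    load1, _, _ = os.getloadavg()  # Unix-like systems only
    cpus = os.cpu_count() or 1
    usage = shutil.disk_usage(path)
    return {
        "load_per_cpu": load1 / cpus,
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def find_alerts(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    current = collect_metrics()
    for name in find_alerts(current):
        print(f"ALERT: {name} = {current[name]:.2f}")
```

Run from a scheduler every few minutes, a script like this gives a crude early warning. A real monitoring setup would also keep history and send notifications.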
But, how do you do it?
Large companies hire dedicated in-house server technicians, or get a remote server administrator to take care of their servers.
Medium and small sized companies get part time staff or remote tech staff, or get server management companies to build and execute a server maintenance plan.
However, if you want to do it yourself, here’s a quick primer.
How to build a Maintenance Plan ideal for your servers
One way to approach this is to group server maintenance activities by their objective, and then find out what needs to be done to achieve each objective.
- Emergency response – These activities should enable you to:
- Know if something bad is happening to your server.
- Quickly restore service quality in case something goes wrong.
- Preventive actions – These are pro-active audits and checks done on your system to prevent a possible service degradation or misuse. It can include:
- Security checks
- Performance audits
- Resource usage audits
- Insurance actions – These are activities that’ll enable you to quickly restore your services (ensuring business continuity) in case your server fails or becomes unavailable. It can include:
- Backup audits
- Mirror fail-over tests
- High availability tests
Building an Emergency Response Plan
Race cars, rocket ships and racing bikes are built for performance. But the components that are likely to fail in each will differ from one another.
In the same way, the kind of software and hardware components that are likely to fail will differ in a database server, mail server and a web or application server.
So, there isn’t a one-size-fits-all list of what needs to be monitored for failure in a server.
You’ll need to think through the common ways in which your server can fail, and how to detect them early.
Let me illustrate with a limited example.
A web server is likely to have the following issues:
- Capacity errors – A sudden surge in traffic might exhaust the memory, and overload the disk, causing sluggish response.
- Abusive users – In a shared environment, some users might run resource-heavy scripts (e.g. backups), causing high server load.
- Brute forcing by bots – Botnets carry out large-scale attacks by executing thousands of simultaneous queries on websites.
- Buggy scripts – Poorly coded scripts can cause memory leaks or other resource over-usage.
- Network failures – Web servers could lose connection to backend DB servers or other app servers.
- Hardware errors – From RAID degradation to temperature issues, a wide range of issues can cause the server to function poorly or to freeze.
- Malware injection – Undisclosed vulnerabilities could be used by hackers to inject malware into the server.
- IP / Website reputation issues – Malware-infected websites could be flagged by search engines, and incoming traffic could drop like a rock.
- ..and more
Server parameters linked to each of these issues need to be monitored 24/7. It can include load average, memory usage, I/O usage and more.
Once you make a list of all these scenarios and server parameters that fit your server type, list down the actions you’ll need to take to get the service back online.
The emergency actions need to be thought out ahead of time because you won’t have time to stop and think when services are failing.
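One way to write that list down ahead of time is as a simple lookup from symptom to first-response steps. The Python sketch below is purely illustrative: the symptom names, the steps and the `RUNBOOK` structure are assumptions for this example, not a recommended procedure for any particular server.

```python
# Hypothetical runbook: symptom -> ordered first-response steps.
RUNBOOK = {
    "high_load": [
        "Identify the top CPU and I/O consumers",
        "Throttle or suspend the abusive account or script",
    ],
    "memory_exhausted": [
        "Identify the top memory consumers",
        "Restart the leaking service",
    ],
    "db_unreachable": [
        "Check network connectivity to the database server",
        "Check that the database service is running",
    ],
}

# Fallback for symptoms that nobody planned for.
DEFAULT_STEPS = ["Escalate to the on-call administrator"]


def response_steps(symptoms):
    """Return the de-duplicated, ordered steps for the given symptoms."""
    steps = []
    for symptom in symptoms:
        for step in RUNBOOK.get(symptom, DEFAULT_STEPS):
            if step not in steps:
                steps.append(step)
    return steps


if __name__ == "__main__":
    for number, step in enumerate(response_steps(["high_load", "db_unreachable"]), 1):
        print(f"{number}. {step}")
```

Keeping the runbook as data rather than in someone’s head means it can be reviewed calmly and extended after every incident.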
If all that sounds too complicated, don’t worry, you can get server experts to monitor your server 24/7 for as little as $24.99/month.
Creating a Preventive Maintenance Plan
The objective of preventive maintenance is to audit and tune each part of your server and services so that they won’t fail.
Again, what you need to check will vary depending on the kind of server you are running.
Let’s take the example of a SQL database server.
A MySQL server maintenance plan will include:
- Defragmentation (aka Table Optimization) – Frequent “deletes” in your database leave your tables fragmented. Optimize tables once a month to prevent performance issues and loss of free space.
- Analyzing (optimizing indexes) – MySQL uses indexes to quickly find the data it needs. Run “Analyze” about once a month to refresh index statistics, which helps MySQL choose faster ways to execute queries.
- Integrity checks – Occasionally, MySQL indexes lose track of a data set due to DB crashes or app errors. Check database integrity weekly to prevent query errors.
- Disk health check – HDD or RAID errors are logged in server logs. Such errors are an early indicator for an impending failure, and you can take actions to replace the disk.
- Space usage check – Your database needs room to grow, to take backups and to do large transactions. Check for stale files, temporary files or old backups once a month.
- Cluster efficiency analysis – DB clusters should sync the data efficiently to prevent query lag and data errors. Early detection of sync lags can prevent costly DB crashes.
- Error log audit – MySQL servers log errors if they detect index or table corruption. Regular error log audits will prevent unexpected downtime.
- Slow query analysis – MySQL logs poorly performing queries to a file. Weekly analysis of these queries and server tweaking can prevent performance lags.
- Server speed audit – Monthly speed tests can show how efficiently the MySQL server is executing queries. By detecting and fixing bottlenecks early on, you can avoid performance issues.
- ..and more
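The first three items in this list map onto standard MySQL statements: `OPTIMIZE TABLE`, `ANALYZE TABLE` and `CHECK TABLE`. As a minimal sketch, the Python snippet below only generates those statements for a list of tables, following the weekly and monthly frequencies suggested above. The `TASKS` mapping, the function names and the table names are assumptions for this example, and actually running the statements is left to whichever MySQL client you already use.

```python
# Statement verbs are standard MySQL; the schedule follows the list above.
TASKS = {
    "weekly": ["CHECK TABLE"],
    "monthly": ["OPTIMIZE TABLE", "ANALYZE TABLE"],
}


def quote_identifier(name):
    """Quote a table name with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def maintenance_sql(tables, period):
    """Build the maintenance statements for `tables` for a given period."""
    return [
        f"{verb} {quote_identifier(table)};"
        for verb in TASKS[period]
        for table in tables
    ]


if __name__ == "__main__":
    # "orders" and "customers" are hypothetical table names.
    for statement in maintenance_sql(["orders", "customers"], "monthly"):
        print(statement)
```

Depending on the storage engine, `OPTIMIZE TABLE` can rebuild the whole table, so it is usually safer to schedule it during quiet hours.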
If you have a web server or a mail server, the kind of challenges will be different.
You’ll need a bit of experience to know what is likely to fail in a server, and where you can see the early indicators of it.
If you are not sure what to include in a Preventive Maintenance Plan for your server, you can get expert help for as little as $69.99/month.
Planning for Disaster Recovery
Sooner or later your server hard disk will fail. Some data might get lost.
The important question at that point is, “how soon can you get back on your feet?”
If you are prepared for that eventuality, it can be as little as 1 minute.
Here at Bobcares, we maintain servers of web hosts, data centers and other online service providers.
Each of our clients has different availability requirements. Some can tolerate many hours of downtime. Some cannot tolerate even a minute of downtime.
So, we deploy a wide range of solutions to ensure business continuity that range from high availability clusters and fault-tolerant hardware to fail-over mirrors and incremental backups.
At the very least, your Disaster Recovery Plan should include backup audits. Some checks in that are:
- Status check – Did the backups complete successfully every day? Did they show errors?
- Data integrity check – Are the backup archives corrupted? Is data retrievable from them?
- Disk space check – Is the disk running out of space? Will it have space to receive the next week’s archive?
- Recovery process check – Will the current recovery method work? Are there connectivity errors or other such issues that will prevent a quick recovery?
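Some of these checks are easy to automate. The Python sketch below assumes, purely for illustration, that backups are stored as tar archives on a local disk. It checks that the archive exists, that it is recent, that every file in it can be read back, and that the backup disk has free space. The function name `audit_backup`, its parameters and the default limits are hypothetical, and the sketch does not replace a real test restore.

```python
import os
import shutil
import tarfile
import time
import zlib


def audit_backup(archive_path, max_age_hours=24, min_free_bytes=0):
    """Run basic status, integrity and disk space checks on one archive."""
    report = {"exists": os.path.isfile(archive_path), "fresh": False,
              "readable": False, "space_ok": False}
    if not report["exists"]:
        return report

    # Status check: was the archive written recently enough?
    age_hours = (time.time() - os.path.getmtime(archive_path)) / 3600
    report["fresh"] = age_hours <= max_age_hours

    # Integrity check: read every regular file in the archive to the end.
    try:
        with tarfile.open(archive_path) as tar:
            for member in tar:
                if member.isfile():
                    stream = tar.extractfile(member)
                    while stream.read(1024 * 1024):
                        pass
        report["readable"] = True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        report["readable"] = False

    # Disk space check: is there room left on the backup disk?
    backup_dir = os.path.dirname(os.path.abspath(archive_path))
    report["space_ok"] = shutil.disk_usage(backup_dir).free >= min_free_bytes
    return report
```

A call such as `audit_backup("/backups/site.tar.gz", min_free_bytes=10 * 1024**3)` (the path is hypothetical) returns a dictionary of pass/fail flags that a daily job could log or alert on.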
The starting point of building a Disaster Recovery Plan is to ask yourself how quickly you want to be able to restore services. Then work backwards towards the systems needed, the cost involved, and the trade-offs you are willing to accept.
If you need help with setting up and/or maintaining a disaster recovery plan, click here to talk to our server experts. We can get it done for you in a few hours.