Case study : Plesk high load average during backup resolved using LVM
As a server management company, we monitor and maintain web hosting servers for digital marketers, web designers, web hosts, and other web solution providers.
Some of our customers use Plesk as the web hosting control panel, and we often fix load issues, backup errors and more to keep their services reliable.
Today, we'll take a look at how we fixed a high load average issue on a Plesk server.
Chapter 1 : Alert reaction – responding to Plesk high load average
Our server experts monitor customer servers 24/7. When we see an issue, we log in to the server within minutes and investigate what is happening.
One day we saw a high load average on a Plesk server that we maintain. The load was 26 and climbing, while the normal load for this server should be 4.
We saw that backup processes were still running past their scheduled time, and were hogging the CPU and I/O resources.
This was causing production services such as HTTP, MySQL and email to respond slowly, or even to time out.
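A check like the one described above can be sketched in Python. This is a minimal illustration, not the monitoring tooling used in this case: the function names and the alert factor of 2 are assumptions, and only the baseline of 4 and the reading of 26 come from the incident.

```python
import os


def classify_load(one, five, fifteen, normal_load, factor=2.0):
    """Compare load averages against an expected baseline.

    `factor` is an assumed alert multiplier, not a value from the case study.
    """
    return {
        "load_1m": one,
        # 1m > 5m > 15m means the load is still climbing
        "rising": one > five > fifteen,
        "alert": one > normal_load * factor,
    }


def current_load_status(normal_load=4.0):
    """Read the 1, 5 and 15 minute load averages from the OS (Unix only)."""
    one, five, fifteen = os.getloadavg()
    return classify_load(one, five, fifteen, normal_load)
```

Calling `classify_load(26, 18, 9, normal_load=4.0)` flags both `alert` and `rising`, which matches the "26 and climbing" situation; the 5 and 15 minute values in that call are made up for the example.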
Chapter 2 : Service recovery – Killing the load spike and restoring services
During events like this, our priority is always to restore quality of service.
So, we immediately killed the backup processes to bring the load down. (Of course, if a primary service like MySQL had caused the load, killing it would not have been an option. We would have used other means, such as per-user resource limits.)
Within 10 minutes the server load was back to normal, and sites were loading fast.
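Stopping runaway backup processes could look roughly like the sketch below. It is an illustration under assumptions, not the exact procedure used here: the name patterns `pleskbackup` and `pmmcli` are assumed identifiers for Plesk backup processes and should be checked against the actual process list, and the sketch defaults to a dry run.

```python
import os
import signal

# Assumption: substrings that identify Plesk backup processes on the server.
BACKUP_PATTERNS = ("pleskbackup", "pmmcli")


def list_processes(proc_root="/proc"):
    """Return {pid: command line} for processes visible under /proc (Linux)."""
    processes = {}
    if not os.path.isdir(proc_root):
        return processes
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        processes[int(entry)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return processes


def match_backup_pids(processes, patterns=BACKUP_PATTERNS):
    """Pick the PIDs whose command line contains any backup pattern."""
    return sorted(
        pid for pid, cmd in processes.items() if any(p in cmd for p in patterns)
    )


def terminate(pids, dry_run=True):
    """Send SIGTERM to each PID; with dry_run=True only report what would happen."""
    for pid in pids:
        if dry_run:
            print(f"would send SIGTERM to {pid}")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
```

A typical use would be `terminate(match_backup_pids(list_processes()))` to review the matches first, then the same call with `dry_run=False`. SIGTERM is used so the processes get a chance to clean up; a stalled process may need a stronger signal.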
Now, we set out to find why the load spike happened in the first place.
Chapter 3 : Investigation – Finding out why Plesk backups were causing high load
From disk errors to connectivity issues, there are a hundred reasons for backups to fail.
To make sure the same issue won’t recur, we had to find exactly how the issue started.
We found that the backup disk was 100% full.
Over time many customer accounts had grown in size, and the backup disk used for daily backups was no longer adequate.
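A full backup disk is easy to detect before it causes trouble. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the 90% warning threshold is an assumed value, and the default `/backup` path is the backup mount point on this server.

```python
import shutil


def percent_used(total, used):
    """Used space as a percentage of total capacity."""
    return 100.0 * used / total if total else 0.0


def check_backup_disk(path="/backup", threshold=90.0):
    """Return (percent used, True if at or above the assumed threshold)."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= threshold
```

Run from a scheduled job, a check like this could raise a warning while there is still room to act, instead of letting the backup stall on a full disk.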
This caused the backup process to stall without terminating. The stalled process held on to a big chunk of server memory and did not release it to other services such as HTTP or MySQL.
This forced the other services into swap, which meant more hard disk activity, leading to high disk I/O and high server load.
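Swap pressure of this kind shows up in `/proc/meminfo` on Linux. The sketch below parses that file and reports how much swap is in use; the function names are illustrative, and the sample figures in the usage note are invented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value from the kernel (Linux only)."""
    with open(path) as fh:
        return swap_used_kb(parse_meminfo(fh.read()))
```

For example, a file containing `SwapTotal: 2000000 kB` and `SwapFree: 500000 kB` yields 1500000 kB of swap in use. A steadily growing value while a backup is running is a hint that the backup is squeezing other services out of memory.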
So, unless we resolved the disk space issue, this error would recur. At that point, we had three options:
- Reduce the number of items to back up. For example, this server was set to back up HTTP, Database, Mail and Log data. Mail and Log could be removed.
- Decrease the backup frequency. Instead of taking backups every day, we could save space by doing it once or twice a week.
- Increase the backup space. This server had a separate disk to store weekly and monthly backups. We could merge the two disks using LVM to increase the backup space available.
Chapter 4 : Resolution – Using LVM to increase backup space across 2 disks
We discussed these options with our customers, and found that daily backups were not needed, and that only web and database backups were really required.
So, we set up a new backup routine:
- One backup per week
- Web and database data only
- 3-week backup retention
This step would reduce backup space usage and keep backups from running on business days. It would prevent further high load averages caused by Plesk backups.
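The space saved by backing up fewer items can be estimated with simple arithmetic. The sketch below uses purely hypothetical per-item sizes, since no real figures are given for this server; only the item list and the three retained weekly copies come from the routine above.

```python
# Hypothetical per-item backup sizes in GB (not measured on this server).
ITEM_SIZES_GB = {"web": 40, "database": 10, "mail": 60, "logs": 15}


def retained_space_gb(items, copies, sizes=ITEM_SIZES_GB):
    """Space needed to keep `copies` full backups of the selected items."""
    return copies * sum(sizes[item] for item in items)


# New routine: web and database only, three weekly copies retained.
new_routine_gb = retained_space_gb(["web", "database"], copies=3)

# Same retention, but still including mail and logs.
all_items_gb = retained_space_gb(list(ITEM_SIZES_GB), copies=3)
```

With these made-up sizes, dropping mail and logs cuts the retained space to well under half. The real ratio depends entirely on how large each item is on the server in question.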
However, the backups could still grow over time and fill up the backup drive again.
We needed a way to prevent that.
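How soon growth becomes a problem can be projected roughly. This is a back-of-the-envelope sketch: it assumes each weekly backup is a fixed percentage larger than the previous one, which is a simplification, and every number passed to it would be an estimate.

```python
def weeks_until_full(capacity_gb, backup_gb, copies, weekly_growth, max_weeks=520):
    """Return the first week in which the retained backups exceed capacity.

    Assumes every weekly backup is `weekly_growth` (for example 0.02 for 2%)
    larger than the previous one, and that only the newest `copies` backups
    are kept. Returns None if capacity is not exceeded within `max_weeks`.
    """
    sizes = []
    size = backup_gb
    for week in range(1, max_weeks + 1):
        sizes.append(size)
        retained = sizes[-copies:]  # newest `copies` backups
        if sum(retained) > capacity_gb:
            return week
        size *= 1 + weekly_growth
    return None
```

A result of `None` means the disk is large enough for the whole projection window. A small number of weeks means the retention policy alone will not hold, and more space is needed.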
Using Logical Volume Manager in Linux to extend /backup disk space
On this server, Plesk stored all backups on a drive mounted at /backup.
This was a low-capacity drive, but the server had an additional drive (mounted at /backup2) to store monthly backups and other ad-hoc files used by the development team.
We decided to merge the space of these two drives into a single logical volume using the Linux Logical Volume Manager (LVM).
Let’s skip the technical details for brevity, but here are the general steps we followed:
- One copy of the backup was retained on /backup so that we would always have a usable backup, even during the disk merge procedure.
- An LVM physical volume was created on the /backup drive and another on the /backup2 drive. The two were then combined into a single volume group.
- A fresh backup was taken onto the newly created logical volume.
- Old backups were deleted, and all unused space was added to the new unified backup space.
This process preserved a working backup at all times, while extending the backup space.
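The steps above map onto standard LVM commands roughly as follows. This sketch only builds and prints the commands; it does not run them, because `pvcreate` and `mkfs.ext4` destroy existing data. The exact commands and disk layout used in this case are not given, so the partition names, the volume group and logical volume names, the initial size, the ext4 filesystem and the temporary mount point are all assumptions.

```python
import shlex


def create_plan(partitions, initial_size, vg="backup_vg", lv="backup_lv",
                mount_point="/mnt/backup_lvm"):
    """Commands to pool partitions into one volume group and create a volume.

    All names and the mount point are hypothetical placeholders.
    """
    lv_path = f"/dev/{vg}/{lv}"
    plan = [["pvcreate", part] for part in partitions]       # physical volumes
    plan.append(["vgcreate", vg, *partitions])                # one volume group
    plan.append(["lvcreate", "-L", initial_size, "-n", lv, vg])
    plan.append(["mkfs.ext4", lv_path])                       # assumed filesystem
    plan.append(["mount", lv_path, mount_point])
    return plan


def extend_plan(vg="backup_vg", lv="backup_lv"):
    """Grow the logical volume and its filesystem into all remaining free space."""
    return [["lvextend", "-r", "-l", "+100%FREE", f"/dev/{vg}/{lv}"]]


def render(plan):
    """Turn a plan into shell-ready command strings for review."""
    return [shlex.join(cmd) for cmd in plan]


if __name__ == "__main__":
    # Hypothetical partitions; replace with the real ones after verifying them.
    for line in render(create_plan(["/dev/sdb1", "/dev/sdc1"], "200G")):
        print(line)
    for line in render(extend_plan()):
        print(line)
```

Printing the plan first makes it possible to review every destructive step against the real disk layout before anything is executed by hand.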
By using LVM, we were able to put the backup process on a solid foundation that should not cause disk space or server load issues for a long time to come.
Backup processes are one of the main causes of high load issues on Plesk servers. Today we've seen how Bobcares detected a high load issue, recovered production services, and implemented a permanent fix for the backup errors to prevent the same issue from recurring.