Backup and Recovery Strategy for a Webhost
There are two major events that haunt every Webhost. One is the ghastly image of the word “HacKeD” popping up on all of the sites on their server, usually accompanied by a skull, with neon green always being the color of choice. The second usually comes in the form of a mail or phone call informing you that the hard drive on your server just went bust. As far as problems for Webhosts go, these rank right up there with “Houston…we have a problem.”
Fortunately there is a way to get your men safely back to Earth without having to reinvent the wheel, and that is documenting your Backup and Recovery strategy. In this article we will see how this can help you in the case of a failed drive. This strategy could also be applied if your server is hacked, but only after you have patched the holes that allowed it to get hacked in the first place. As you can imagine, there is no universal strategy that is applicable to all hosts. It is always a trade-off between various factors and constraints. Let us take a look at the common backup and recovery options available out there.
Backups stored on the Server (Single Drive):
There are some common options available, usually in the form of a “Backups” section in your hosting control panel. For a startup host, this is the easiest and cheapest option. Larger hosts would also do well to take advantage of this option in addition to any other backup strategy they have in place. Most opt for generating account backups using the backup feature of the control panel and then designing the rest of their backup strategy around protecting these backups.
- Cheap and easy to set up.
- Backups created on a per account basis.
- Customers can be encouraged to keep a copy.
- Large accounts can delay the entire backup process.
- Failure of drive will result in loss of backups also.
Backups stored on Backup Drive:
Once you have a set of backups, storing them on the same drive as the rest of the data is not a good idea. A drive failure will result in the loss of all data. So a secondary drive just to store backups is the next step. A failure of the primary drive would require only that you replace it with a fresh one, load the OS and control panel and begin restoring the accounts. If the backup drive fails, it can be replaced and new backups can be generated. Secondary hard drives are quite cheap and should be part of a bare minimum backup strategy.
- Cheap and easy to set up.
- Failure of one drive will still leave the other drive available.
- Backups can be quickly and easily restored.
- If server is inaccessible backups are inaccessible too.
- Disk usage on the backup drive should be monitored. Creating new backups on a drive that is full will result in corrupt backups or failure of the backup.
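The disk usage check in the last point can be automated before each backup run. The following is a minimal sketch, not a feature of any particular control panel: the function name, the 10% safety margin and the idea of passing in an expected backup size are all assumptions made for illustration.

```python
import shutil


def has_room_for_backup(backup_path, expected_backup_bytes, safety_margin=0.10):
    """Return True if the drive holding backup_path can fit the next backup.

    safety_margin is the fraction of total capacity to keep free at all
    times (an assumed policy value, tune it to your own setup).
    """
    usage = shutil.disk_usage(backup_path)  # named tuple: total, used, free (bytes)
    reserve = usage.total * safety_margin
    return usage.free - reserve >= expected_backup_bytes
```

A backup script could call this with the mount point of the backup drive and the size of the previous backup set, and skip the run (and alert someone) when it returns False, rather than writing a truncated backup.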
Backups stored in DC storage servers (NAS):
Now what if both drives fail? Though this is highly unlikely, a good backup strategy takes into account every possible scenario. This is where off-server backups come in. An important point to check is whether this service is provided by your DC itself. This usually means they will have a private network set up, over which data can be transferred from your servers to their storage servers and back.
- Even if server fails backups are still accessible
- Backups will have to be transferred from storage to the server before they can be restored.
- Private network required to avoid bandwidth usage, or clogging of available bandwidth.
Another option worth mentioning is mirrored RAID drives. The failure of one of the drives in a RAID array can be fixed by simply swapping the drive and letting the RAID array rebuild the data from the other drives. An important point to note is that this is only a solution to a bad drive. If the data itself is corrupt, the corruption will be mirrored on all drives. So choose a RAID array for its performance benefits and not as part of your primary backup plan.
Backups stored outside the DC:
What if your DC falls victim to a natural disaster? Or your DC does not provide off-server storage? This is where you will have to acquire the services of a third-party online backup service provider. You can choose one based on your budget and the level of control you wish to have over your backups. You should make sure that they offer encryption of the data being transferred, as your data will be traveling over the Internet. You would probably want to have this as a supplement to your primary backup strategy and perform it once a month, or more frequently if required.
- Even if DC is inaccessible, backups will be available
- Transfer rates will be lower
- Will use up a significant amount of bandwidth
- Expensive, but fairly simple to set up
A comparatively new, but expensive, technology in the Webhosting sphere is Continuous Data Protection (CDP). Even though the name CDP suggests that data is constantly being backed up, the various products on offer for Webhosts provide a more “near-continuous” backup, in that backups can be made much more frequently than with conventional backup methods. They avoid the overhead caused by conventional backups by first creating a complete copy of the data and then updating that set with only the changes made to the files since the last backup. This significantly reduces the time taken for backups and allows for the storage of a larger number of previous backup sets.
- Backups can be made much more frequently.
- Backup plan highly customizable via GUI.
- Complete drives can be backed up, enabling the server to be restored to exactly the state it was in before a failure.
- Expensive and fairly complicated to set up.
- Dedicated backup server recommended.
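The idea of taking one complete copy and then transferring only what has changed since the last backup can be illustrated with a small sketch. This is not how any specific CDP product works (commercial products may track changes differently, for example at the block level); it simply compares two file manifests. The function names `scan` and `changes_since` are hypothetical, and using file size plus modification time as the change signal is an assumption.

```python
import os


def scan(root):
    """Map each file's path (relative to root) to (size, mtime in ns)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changes_since(previous, current):
    """Compare two manifests and return (to_copy, to_delete).

    to_copy:   files that are new or whose size/mtime differ
    to_delete: files present in the previous backup but gone now
    """
    to_copy = sorted(p for p, meta in current.items() if previous.get(p) != meta)
    to_delete = sorted(p for p in previous if p not in current)
    return to_copy, to_delete
```

After the first full copy, each later run only needs to transfer the files listed in `to_copy`, which is why such backups can run far more often than a full account backup.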
Designing your Backup Strategy:
When it comes to designing your backup strategy, the points you should keep in mind are:
1) How frequent?
This will primarily depend on the type of sites you are hosting. Blog, forum, shopping and similar sites will require at least a daily backup. If you go for a CDP strategy you could configure it to take backups every hour. Personal, photo, video and similar sites won't be updated that frequently but will tend to take up more space. In these cases the cost of backup space and the transfer rates will be your main concern. Weekly backups should suffice for these sites.
2) Where to store?
This will primarily affect the transfer rate and accessibility of the backups. Backups stored on the server itself can be restored the quickest, but a server failure will require manual intervention. You will have to have the DC connect the backup drive to a new server, or wait till they replace the faulty component. Backups stored off-server and within the DC can be restored to the new server as soon as it is online. Backups stored outside the DC will involve lower transfer rates, but will be available even if the DC is not.
3) Cost
The backup options mentioned above were presented in increasing order of cost. Choose one that fits your budget and backup requirements.
4) Value of data
Valuable data calls for multiple backups. Remember that every system can fail, even your DC backup storage server. So multiple backups are the only way to ensure that at least one set of backups is available at any time.
5) Keeping a list of modifications
Being a webhost, you are bound to be constantly updating software or installing new software on customer request. Backing them up is not always possible and doing so is not recommended. The server you are recovering to may not be exactly the same as the one it was backed up from. A change in the version of your OS may have undesirable effects. Keep a list of modifications made to the server and reinstall the software on the new server.
Just as important as your Backup strategy is your Recovery strategy. This is what truly determines the total downtime involved in the case of a hard drive failure. Based on your backup strategy you may have various sets of backups to choose from. Once you have selected the set of backups from which you are going to restore the data you have two primary recovery options:
Restoring on the same server:
The simplest of the recovery options, it just involves the recovery of the data from the stored backups. The total downtime will include time to transfer data from backup storage to the server, and the time to restore the data from these backups. This is where the transfer rates between the remote backup and your server will count.
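The downtime described above (time to transfer the backups plus time to restore them) is easy to estimate ahead of time. Here is a rough sketch; the function name and the example figures are made up for illustration, and real transfer and restore rates should be measured on your own setup.

```python
def estimated_downtime_hours(backup_gb, transfer_mb_per_s, restore_mb_per_s):
    """Transfer time plus restore time, in hours (1 GB taken as 1024 MB).

    Pass transfer_mb_per_s=None when the backups already sit on a local
    backup drive and no transfer is needed.
    """
    size_mb = backup_gb * 1024
    transfer_s = 0 if transfer_mb_per_s is None else size_mb / transfer_mb_per_s
    restore_s = size_mb / restore_mb_per_s
    return (transfer_s + restore_s) / 3600


# Example with assumed figures: 200 GB of backups, 10 MB/s from remote
# storage, 40 MB/s restore speed.
remote = estimated_downtime_hours(200, 10, 40)    # about 7.1 hours
local = estimated_downtime_hours(200, None, 40)   # about 1.4 hours
```

With these assumed numbers the transfer dominates the total, which is why the location of the backups matters so much for recovery time.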
Restoring to a different server:
Here, a different server refers to a change in IP address, at the same DC or a different one, for any reason. The problem with a change in IP address is that customers who visited the sites before the failure would have to wait up to 24 hours before their DNS cache clears and they get the new IP address details. It could take up to 24 hours because that is usually how long it takes for new nameserver IP details to propagate over the Internet. So even if you are able to restore your sites in 8 hours, visitors may not be able to visit them for another 16 hours. To avoid this, you could run your nameservers on a different dedicated server/VPS. The obvious advantage is that only the IP address of the site will have to be changed. Since a site's IP address is usually only cached for 6-8 hours, the downtime will be less.
If you are lucky enough to get advance warning about a failing drive, you could set lower TTL values for the DNS records of your sites. The TTL determines how long a visitor to the site caches that site's DNS information. Do remember that you will have to wait until the initial TTL period has passed before the new TTL value takes effect. So if you change the TTL value from 8 hours to 5 minutes, you will have to wait for 8 hours before the TTL of 5 minutes takes effect.
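The timing rule for TTL changes can be written down as simple date arithmetic. The sketch below uses hypothetical function names and an arbitrary example date; it assumes resolvers honor the TTL exactly, which is a simplification.

```python
from datetime import datetime, timedelta


def latest_safe_ttl_change(planned_switch, old_ttl_seconds):
    """Latest moment to lower the TTL so that every cached copy carrying
    the old TTL has expired by the time the IP address is switched."""
    return planned_switch - timedelta(seconds=old_ttl_seconds)


def worst_case_stale_until(ip_changed_at, ttl_in_effect_seconds):
    """Latest moment a visitor could still be using the old IP address."""
    return ip_changed_at + timedelta(seconds=ttl_in_effect_seconds)


# Example: drive swap planned for 20:00, records currently have an 8 hour TTL.
switch = datetime(2024, 1, 10, 20, 0)
print(latest_safe_ttl_change(switch, 8 * 3600))  # 2024-01-10 12:00:00
print(worst_case_stale_until(switch, 5 * 60))    # 2024-01-10 20:05:00
```

In other words, with an 8 hour TTL the change to 5 minutes has to be made by noon for a swap at 20:00, after which visitors should pick up the new IP address within about 5 minutes of the switch.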
CDP software offers the option to back up an entire drive, bit by bit. The advantage is that, if you wish, you can restore this data to the new drive. The recovery time for this kind of restore varies, and may not be any faster than a restore from other types of backups, but once the recovery is complete you can be sure that everything will be as it was when that backup set was taken.
Designing your Recovery Strategy:
When designing your recovery strategy the points to keep in mind are:
1) Prioritize backups
If you have multiple sets of backups, you should prioritize them. This can be based on age, accessibility and integrity (the chance of a backup being corrupt or containing malicious data).
2) Determining expected downtime
Based on your backup strategy, you will probably have multiple backup sets to choose from. Based on the accessibility and size of the backups you must determine the expected time it will take for the backups to be restored.
3) Customer handling procedure
You should determine how you and your support team will handle your customers before, during and after the recovery. This can range from informing customers of an expected drive replacement, to what compensation should be provided to the customer for the downtime.
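The prioritization in point 1 can be made explicit in the documented strategy. As one possible illustration, the sketch below ranks backup sets by integrity first, then age, then accessibility. The `BackupSet` fields, the ordering of the criteria and the sample values are all assumptions; your own priorities may differ.

```python
from dataclasses import dataclass


@dataclass
class BackupSet:
    name: str
    age_hours: float          # how old the data in this set is
    minutes_to_access: float  # how long before a restore can begin
    verified: bool            # passed an integrity check (hypothetical flag)


def prioritize(backup_sets):
    """Verified sets first, then the newest, then the quickest to reach."""
    return sorted(
        backup_sets,
        key=lambda b: (not b.verified, b.age_hours, b.minutes_to_access),
    )


sets = [
    BackupSet("offsite", age_hours=480, minutes_to_access=240, verified=True),
    BackupSet("backup-drive", age_hours=12, minutes_to_access=5, verified=False),
    BackupSet("dc-nas", age_hours=24, minutes_to_access=30, verified=True),
]
print([b.name for b in prioritize(sets)])  # ['dc-nas', 'offsite', 'backup-drive']
```

Here the newest set is ranked last because it has not been verified, which matters most after a hack, when recent backups may contain malicious data.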
Now that you know of the various backup and recovery options available, it is important that you document your “Backup and Recovery” strategy. This is important because it gives all the members of your team a clear picture of the available backups and the recovery procedure to follow. This greatly reduces the time spent preparing for and proceeding with the recovery, so the only downtime will be the actual time taken to restore the data. Part of the backup and recovery strategy should also consist of the responses to give your customers, and how frequently to give them. This is especially important because it allows your support team to give your customers an estimated time for recovery. The more time you spend documenting your backup and recovery strategy now, the less downtime your customers will experience in the case of such an event and the better prepared your support team will be to handle it. If you don’t have one, we hope this article helps you get started!
About the Author
Hamish works as a Senior Software Engineer at Bobcares. He joined Bobcares in July 2004, and is an expert in the control panels and operating systems used in the Web Hosting industry. He is highly passionate about Linux and is a great evangelist of open source. When he is not on his Xbox, he is an avid movie lover and critic.