How we achieved high uptime in VPS hosting using oVirt high availability configuration
On Oct 10, Google Drive and Google Docs went down for 3 hours. Google services have long been synonymous with high uptime, and a lot of people even used (and still use) Google.com to check if their internet connection is up. So, predictably, the internet had a meltdown over it.
A reaction like this is a far cry from how things were just 5 years back. People didn’t mind an online service being unavailable for a few minutes, or even a couple of hours. That’s not the case any more. Downtime is rare, and internet users expect 100% uptime for every website they access.
To meet this demand, website owners now insist on high uptime from their hosting providers. This is especially true for premium hosting services such as VPS hosting and Cloud hosting.
In a recent post, we covered how we used oVirt to implement a competitive VPS hosting solution. Today we’ll go a bit further into how we achieved 99.9% uptime by configuring oVirt for high availability.
How high availability works in oVirt
The oVirt system is centrally managed by a server called the oVirt engine. The oVirt engine keeps tabs on each server in the hosting system. If a server (say Node 01) becomes unreachable, all VPS running on that server are restarted on other servers in the system. Users of those VPS would notice a small break in services, but everything would be back online in a couple of minutes.
This works because the operating system and application data of all VPS are stored in a shared storage space accessible by all servers. As you can see in the image, “Node 01”, “Node 02” and “Node 03” access the same “Shared storage”. So, VMs in “Node 01” can work equally well in “Node 02” or “Node 03” as long as the “Shared storage” is accessible from those servers.
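The fail-over idea described above can be sketched as a small model. This is only an illustration, not oVirt's actual scheduler: the `Host` class, the `fail_over` function, the VPS names and the "least-loaded host" placement rule are all assumptions made for the example. It also ignores power management and capacity limits, which the sections below cover.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a cloud server managed by the engine."""
    name: str
    reachable: bool = True
    vms: list = field(default_factory=list)


def fail_over(hosts):
    """Restart VMs from unreachable hosts on the least-loaded reachable host.

    Returns a mapping of VM name -> name of the host it was restarted on.
    """
    healthy = [h for h in hosts if h.reachable]
    if not healthy:
        raise RuntimeError("no reachable host left to run the VMs")

    moved = {}
    for host in hosts:
        if host.reachable:
            continue
        # The VM disks live on shared storage, so only the running instance
        # is lost; any healthy host can start the VM again.
        while host.vms:
            vm = host.vms.pop(0)
            target = min(healthy, key=lambda h: len(h.vms))
            target.vms.append(vm)
            moved[vm] = target.name
    return moved


nodes = [
    Host("Node 01", vms=["vps-a", "vps-b", "vps-c"]),
    Host("Node 02", vms=["vps-d"]),
    Host("Node 03", vms=["vps-e", "vps-f"]),
]
nodes[0].reachable = False  # simulate "Node 01" becoming unreachable
placement = fail_over(nodes)
print(placement)
```

In this run the three VPS from “Node 01” end up spread over “Node 02” and “Node 03”, which is only possible because neither host needs a local copy of the disks.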
For high availability to work in oVirt, a few pre-conditions had to be met. These include a highly available shared storage device, power management on all servers, and surplus resources on all servers to accommodate VMs from other servers. Let’s take a look at them one by one.
Configuring shared storage
Shared storage is at the core of a highly available system. Unlike traditional dedicated servers, VPS in a highly available cluster store their operating system and applications on an external, high-speed storage device that sits outside the servers.
So, even if a VPS’s host server goes down, the storage device remains online. It is then just a matter of starting the VPS in another host server to bring the services back online.
In the cloud system we implemented, one of the shared storage devices we used was a RAID 10 array. All servers in the VPS system had access to this storage device. This made it possible for all VPS using that shared device to run off any host server in the cloud.
We chose a high-speed, redundant storage device such as a RAID 10 array for this purpose because high availability works only if the storage device remains online at all times.
Configuring power management on hosts
As mentioned earlier, oVirt is centrally managed by a server called oVirt engine. It is the oVirt engine’s job to detect if a server has gone down, and initiate a VPS transfer.
Now, consider a scenario where the network cable between the oVirt engine and a cloud server is cut, but the cloud server is still running and connected to the shared storage device. The oVirt engine would think that the server is offline and start second copies of its VMs on other cloud servers. Two running copies of the same VM would then write to the same disk on the shared storage, which would essentially corrupt the data of all the VMs hosted on that cloud server.
To avoid this situation, oVirt REQUIRES that power management be accessible on all cloud servers for high availability to function. When oVirt detects that a cloud server is offline, it first tries to shut down the server by turning off the power. ONLY IF the power-off succeeds will it attempt to start the VMs on another server.
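The "power off first, restart second" rule can be expressed in a few lines. The sketch below is a simplified model and not oVirt code: `recover_vms`, `power_off` and `start_elsewhere` are hypothetical names, and the two callbacks stand in for the real power management device and the real VM start operation.

```python
from typing import Callable, List


def recover_vms(
    host: str,
    vms: List[str],
    power_off: Callable[[str], bool],
    start_elsewhere: Callable[[str], None],
) -> List[str]:
    """Restart VMs from an unresponsive host only after a confirmed power-off.

    Returns the VMs that were restarted (an empty list if power-off failed).
    """
    if not power_off(host):
        # The host may still be alive and writing to shared storage.
        # Starting a second copy of its VMs could corrupt their disks,
        # so nothing is restarted.
        return []
    restarted = []
    for vm in vms:
        start_elsewhere(vm)
        restarted.append(vm)
    return restarted


started: List[str] = []

# Case 1: power management answers and the host is powered off.
ok = recover_vms("Node 01", ["vps-a", "vps-b"], lambda h: True, started.append)

# Case 2: power management cannot be reached, so the VMs stay where they are.
blocked = recover_vms("Node 02", ["vps-c"], lambda h: False, started.append)

print(ok, blocked, started)
```

The second case is the cut-cable scenario: without a confirmed power-off there is no safe way to tell a dead server from an isolated one, so the model refuses to start anything.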
So, before we enabled high availability in VMs, we configured power management for all servers in the cloud. It is done by navigating to “System” –> “Data Centers” –> “Clusters” –> “Hosts” –> “Edit”. The power management fields were filled in as shown here:
Planning surplus resources to accommodate fail-over VMs
Let’s say there are 25 VMs in a cloud server called “Node 01”. These VMs would be allocated CPU and Memory resources that are carved from “Node 01’s” CPU and Memory capability. Now, let’s say the 25 VMs are allocated 50 GB of memory and 30 CPU cores in total. Then, for high availability to work, the rest of the servers in the cloud system should have a SURPLUS capability of 50 GB RAM and CPU cycles equaling 30 CPU cores.
For example, in the cloud system we implemented, we started off with 3 cloud servers, each with 32 CPU cores and 64 GB of RAM. The maximum we could allocate on one server was 45 GB of memory and 25 CPU cores (with a bit of overselling). This allocation policy left ample space for VMs from one failed server to be evenly distributed over the other two.
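A surplus check of this kind is easy to automate. The sketch below takes the 50 GB / 30 core allocation of “Node 01” from the example above and tests it against two sets of surviving servers. The `Server` class, the `can_absorb_failure` function, the `overcommit` parameter and the allocation figures for “Node 02” and “Node 03” are assumptions for illustration, not numbers from our deployment.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    ram_gb: float        # physical RAM
    cores: int           # physical CPU cores
    alloc_ram_gb: float  # RAM already allocated to VMs
    alloc_cores: int     # CPU cores already allocated to VMs


def can_absorb_failure(servers, failed_name, overcommit=1.0):
    """Check whether the surviving servers can take over a failed server's VMs.

    Returns (fits, ram_headroom_gb, core_headroom). Headroom is the survivors'
    spare capacity minus the failed server's allocation, so a negative value
    means the VMs do not fit. `overcommit` scales physical capacity
    (1.0 means no overselling).
    """
    failed = next(s for s in servers if s.name == failed_name)
    survivors = [s for s in servers if s.name != failed_name]
    spare_ram = sum(s.ram_gb * overcommit - s.alloc_ram_gb for s in survivors)
    spare_cores = sum(s.cores * overcommit - s.alloc_cores for s in survivors)
    ram_headroom = spare_ram - failed.alloc_ram_gb
    core_headroom = spare_cores - failed.alloc_cores
    return ram_headroom >= 0 and core_headroom >= 0, ram_headroom, core_headroom


node01 = Server("Node 01", ram_gb=64, cores=32, alloc_ram_gb=50, alloc_cores=30)

# Lightly loaded survivors (hypothetical figures): enough surplus.
roomy = [
    node01,
    Server("Node 02", 64, 32, alloc_ram_gb=35, alloc_cores=16),
    Server("Node 03", 64, 32, alloc_ram_gb=35, alloc_cores=16),
]
roomy_result = can_absorb_failure(roomy, "Node 01")

# Heavily loaded survivors (hypothetical figures): not enough surplus.
tight = [
    node01,
    Server("Node 02", 64, 32, alloc_ram_gb=45, alloc_cores=25),
    Server("Node 03", 64, 32, alloc_ram_gb=45, alloc_cores=25),
]
tight_result = can_absorb_failure(tight, "Node 01")

print(roomy_result, tight_result)
```

Running a check like this for every server in turn, before adding new VMs, shows whether the cluster can still lose any single server. Raising `overcommit` above 1.0 models overselling, at the cost of relying on VMs not using their full allocation at the same time.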
Enabling high availability
Once the shared storage, power management and surplus resource planning were in place in the oVirt system, the VPSs were ready to be enabled for “High Availability” fail-overs.
To do this, we enabled the “Highly Available” option under “System” –> “Data Centers” –> “Clusters” –> “VMs” –> “Edit” –> “High Availability”. Based on the hosting plan, the fail-over priority was selected in that interface. In case of a server failure, VMs marked as “High” priority would be started first, thereby minimizing their downtime.
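The effect of the priority setting can be pictured as a sort over the VMs waiting to be restarted. This is a minimal sketch rather than oVirt's scheduling logic: `restart_order`, the VM names and the “Medium” and “Low” levels are assumptions added for the example.

```python
# Lower number = restarted earlier. Only "High" is named in the text above;
# the other two levels are assumed for illustration.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(vms):
    """Return VM names ordered so that higher-priority VMs start first.

    `vms` is a list of (name, priority) pairs. sorted() is stable, so VMs
    with the same priority keep their original relative order.
    """
    ranked = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    return [name for name, _priority in ranked]


waiting = [
    ("vps-basic", "Low"),
    ("vps-premium", "High"),
    ("vps-standard", "Medium"),
    ("vps-premium-2", "High"),
]
order = restart_order(waiting)
print(order)
# → ['vps-premium', 'vps-premium-2', 'vps-standard', 'vps-basic']
```

Mapping hosting plans to priorities this way means that when surplus capacity is tight, the plans that promise the most uptime are the first to come back.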
High availability is a core feature in any VPS hosting solution. Here we’ve covered how high availability was implemented for a VPS hosting system using oVirt. Bobcares helps web hosts, VPS providers and cloud providers deliver industry standard VPS services through custom configuration and preventive maintenance of virtualized systems.