Bobcares

Setting up Percona high availability database cluster

by | Mar 16, 2016

MySQL is by far the most popular database used for web applications. It is a very easy database to use, but it doesn’t have the best uptime record. Many database issues such as corrupted tables, resource limits, or replication errors often lead to website downtime.

Many of these database issues can be avoided by proactive database maintenance. However, issues like hard disk crash, or network errors can still happen, leading to database corruption. An effective hedge against such issues is to use a database cluster.

Database high availability using Percona XtraDB Cluster (PXC)

High availability for databases is achieved through clustering. When a database is replicated into several physical servers, even if one server goes down, the website can continue to function using the other servers.

Why MySQL clustering isn’t reliable

MySQL supports clustering. The most popular method is called “master-master” replication, where a web application can write to any database server in the cluster, and the changes will be immediately replicated in all other servers. However, MySQL clusters are known to produce inconsistent data (data that’s different in each server) when the cluster nodes are unable to talk to each other.

So, even though MySQL clusters deliver high availability, the data couldn’t be relied upon.

PXC uses Galera technology to avoid data inconsistency

It is this problem that was fixed by a MySQL clustering solution known as “Galera”. It performs frequent consistency checks, so that if a database node is unable to talk to others, it won’t accept any new updates, and will remove itself from the cluster.

Percona XtraDB Cluster (PXC) combined Galera technology and MySQL’s InnoDB engine to produce a fast, highly available cluster, which is why we recommend our customers to use PXC for their websites.

[ Thinking of implementing a highly available database system for your website? Here’s how we implemented PXC on a high traffic website. ]

Setting up a Percona XtraDB Cluster

The number of nodes in a PXC cluster, the connection limits on each node, etc. depend on the performance requirements of each business. However, there are a few design and configuration decisions that are applicable on a wide variety of web infrastructures. Let’s take a look at those.

Designing a database service architecture

Before a Percona cluster can be setup, it must be decided how the web servers will communicate with the database servers. There are many architectures to evenly distribute database queries to multiple database servers. Some websites connect each web server to each database server (many to many links), while some use a central load balancer to distribute the load.

We manage websites of several online businesses. For these websites, we recommend using a central load balancer where all traffic is channeled through a central load balancer. This design minimizes the load on the database servers (fewer health checks), and enables better monitoring and control.

percona high availability cluster

The database queries are distributed evenly using a load balancer.

Configuring PXC for optimal performance and security

Now, coming to the PXC setup in itself, there are several performance, stability and security considerations involved in deciding the cluster configuration.

Here are a few common settings we configure in the PXC servers that we manage for our customers:

1. Using odd number DB nodes to avoid database corruption

To split the database load, we can use an even number of nodes (like 2) or odd number (eg 3). We recommend using odd-number cluster because it gives a degree of protection against “split-brain” condition. For eg., in a 2-node cluster, if 1 node is out of sync, there is no way to decide who has the latest data. If there are 3 nodes, 2 nodes can agree what is the most recent data.

2. Enabling binary logging for easy fault recovery

To make the cluster robust against failures, we setup a system called “binary logging” in the servers we manage. This enables rolling back a change (like dropping a table, deleting important data, etc.) very easy. This is not enabled by default in Galera clusters, but we’ve seen from experience that accidents like this do happen for customized WordPress websites.

3. Custom firewall configuration to disallow external access

An important consideration in setting up the cluster is security configuration. Galera does not work well with SELinux and default IPtables configuration. In the servers we manage for our customers, we custom configure IPtables so that only DB servers and load balancers can talk to each other, and all external IPs are blocked.

4. Setting up auto-commits to fix update conflicts

In a multi-master cluster, there are situations where the same database row is updated by two users at the same time (known as parallel updates). In these cases, XtraDB approves the first query, while the second query is blocked, leading to a “deadlock error”. We custom configure cluster settings so that any query in deadlock status will be re-attempted multiple times. This is done using the parameter “wsrep_retry_autocommit”.

[ Need to know more about setting up Percona load balancing? Click here to know what are the challenges and their solutions. ]

Setting up high availability of load balancer

With a Percona cluster with 3 or more nodes, a website downtime can be avoided even if 2 database nodes go down. But, if the load balancer crashes for some reason, all 3 nodes would be become inaccessible. To negate this single-point-failure, we use a “backup” load balancer to immediately take over if the main load balancer fails for some reason.

Use a backup load balancer to avoid single point failure

To implement the system, first two load balancer servers are setup with the exact same configuration. Configuration settings and other data are synched to keep both servers current, so that at any point of time, either of them can work as the primary load balancer.

Now, to enable auto-switching to backup load balancer, we need a software that can detect failure of the main server, and bring the backup server online.

KeepAliveD uses virtual IPs (aka floating IPs) to bring backup server online

To enable auto-switching, we use a tool called KeepAliveD. The IPs of the load balancers (both “main” and “backup”) are configured to be controlled by KeepAliveD. In both load balancer servers, KeepAliveD program keeps in touch with each other. If the KeepAliveD program in “backup” HAProxy detects that the “main” server is down, it’ll assume the ownership of the IP, and bring it online in the backup server.

This concept is called “Floating IP” – in that the IP can “float” to another server when the main server goes down. By employing this technique, we achieve virtually 100% database uptime for our customers.

Percona high availability

KeepAliveD controls a pool of virtual IPs which is switched to “backup” server if “main” goes down.

[ Using virtualized infrastructure, high availability can be configured without using specialized HA software. Here’s how we setup high availability using oVirt open virtualization. ]

Business websites need an infrastructure that is highly available, while being cost effective. In this solution we explained how we use Percona XtraDB Cluster (PXC) and KeepAliveD to achieve high availability for databases. By choosing open source load balancing solutions, we were able to lower the infrastructure costs, while ensuring high uptime.

Bobcares helps business websites of all sizes achieve world-class performance and uptime, using tried and tested website architectures. If you’d like to know how to make your website more reliable, we’d be happy to talk to you.

For as low as

$74.99/server/mo

Get full spectrum infrastructure management services – including setup, monitoring & maintenance.

Never again face a critical business downtime. We keep your servers secured, optimized and updated at all times. Our engineers monitor your servers 24/7 and fix issues before it can affect your customers.

SEE SUPPORT PLANS

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

Never again lose customers to poor
server speed! Let us help you.