In any web server infrastructure, the database servers are often the slowest component. This is because databases fetch data from good ol’ hard disks instead of the much faster server memory. High-speed storage such as RAID arrays and SSDs has reduced this latency, but any site visit that requires database access still adds a significant page load delay.
An effective way to reduce this latency is to use a database cluster, where database requests are served by multiple servers instead of a single one. That way, each disk needs to service only a smaller share of the requests, thereby lowering the query wait time. The latest entrant among MySQL-compatible DB clusters is Percona’s XtraDB Cluster.
Percona XtraDB Cluster combines the performance advantages of the InnoDB engine with the master-master replication capabilities of Galera to produce a high performance database cluster. Each node (i.e., database server) in the XtraDB cluster can receive both read and write requests, which enables even distribution of database load across many servers.
Now, the question is how to distribute database queries across multiple servers.
How to distribute database load in a Percona cluster?
There are several ways in which database queries can be split across several back-end servers. The most common method (and the default setting in most load balancers) is called “Round Robin”, where database requests are divided equally among all back-end servers.
However, in reality this may not be the best load balancing method.
We manage Percona XtraDB clusters for several high traffic websites. In these servers, we see database queries of varying complexity. Some queries with several “JOIN” statements take a long time to execute, while others get done in a few seconds. If “Round Robin” load balancing is used on these sites, some servers will be overloaded while others remain under-utilized.
To resolve this, we use a load balancing method called “Least Connections” in Percona XtraDB clusters. In this method, new requests are sent to back-end servers based on how many connections are currently active on each server. So, a server with more unused “capacity” to accept connections gets priority in receiving new queries.
Additionally, some back-end servers may be more powerful, which means they can handle a higher load. So, we configure the load balancer in such a way that the high-powered servers receive a proportionately higher number of connections. In many load balancing systems this is known as “weight”: a weight of “3” assigned to a server means it will receive 3x as many queries as the other servers.
How to set up Percona load balancing
Database servers talk to each other, and to web servers, over a network protocol called “TCP”. So, to distribute load among a cluster of Percona servers, we need a TCP load balancer. Among the many open source load balancing tools, the most popular ones are HAProxy and Nginx.
Let’s take a quick look at how to set up HAProxy and Nginx as load balancers.
HAProxy as Percona load balancer
As we explained above, to make load balancing effective, database queries need to be distributed according to the capability of the servers in the cluster, and in a way that all servers are put to maximum use. Here are a few important settings we configure in HAProxy.
Set the load balancing method to “leastconn” for fast response
By default, HAProxy uses the “Round Robin” scheduling method, in which each server in the cluster is given an equal number of connections. This causes some servers handling complex queries to be overloaded while others stay lightly loaded. So, we set the “balance” setting to “leastconn” to send new queries to the least busy server.
balance leastconn
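For context, the “balance” directive goes inside the “listen” or “backend” section that fronts the cluster. A minimal sketch (the section name and bind address here are illustrative, not our production config):

listen percona-cluster
    bind *:3306
    mode tcp
    balance leastconn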
Assign server “weight” to distribute queries in proportion to server capacity
In the Percona clusters we manage, not all servers have the same hardware configuration. We assign more queries to high power servers using the “weight” configuration setting.
server percona-2 10.2.1.4:3306 check weight 50
server percona-3 10.2.1.5:3306 check weight 75
Set up advanced cluster health checks using the “clustercheck” script
It is important to detect if a server in the cluster has a problem, and if so, remove it from the database cluster to avoid data loss. This can be detected using two status variables, “wsrep_local_state” and “wsrep_cluster_status”, on each server. These variables show whether a node is out of sync with the rest of the cluster, or whether the database is damaged.
We use a Bash script called clustercheck to test the health of the server every 2 seconds. The script runs as a server process, and is configured to listen on port 9200. HAProxy is then configured to connect to port 9200 and get the health status every 2 seconds. This is done using the “port” configuration setting.
server percona-2 10.2.1.4:3306 check port 9200
If the “clustercheck” script finds the database to be unstable, it returns a “503 Service Unavailable” response to HAProxy, and HAProxy then stops sending new queries to that server until it is brought back online.
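For reference, a common way to expose the script on port 9200 is through xinetd; the sketch below follows the stock Percona “clustercheck” packaging and is illustrative rather than our exact setup. On the HAProxy side, “option httpchk” is added to the backend so that the 200/503 responses are interpreted as up/down.

# /etc/xinetd.d/mysqlchk – runs clustercheck for each connection on port 9200
service mysqlchk
{
    disable      = no
    type         = UNLISTED
    socket_type  = stream
    protocol     = tcp
    port         = 9200
    wait         = no
    user         = nobody
    server       = /usr/bin/clustercheck
}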
We customized the “clustercheck” script to send our support team an alert in case of downtime. This allows our 24/7 emergency support staff to quickly fix a cluster issue and avoid service degradation.
Configure “inter” and “downinter” to tweak the frequency of health checks
The integrity of the data depends on detecting a database issue as soon as it happens. We configure HAProxy to check a database server every 2 seconds using the “inter” parameter, and once a server is found to be unstable, to re-check it only every 5 minutes using the “downinter” parameter (we manually verify every database downtime before the server is brought back online).
server percona-2 10.2.1.4:3306 check inter 2000 downinter 300000
There are many other settings such as “maxconn”, “rise”, “fall”, etc. that can be customized for your Percona cluster’s unique characteristics. So, it is best to get expert opinion before you deploy your HAProxy load balancing system for Percona.
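To tie these pieces together, a complete backend definition might look roughly like the following. Treat it as a starting sketch: the IPs, weights, and check thresholds shown here are illustrative and would need tuning for a real cluster.

listen percona-cluster
    bind *:3306
    mode tcp
    balance leastconn
    option httpchk
    server percona-1 10.2.1.3:3306 check port 9200 inter 2000 downinter 300000 rise 3 fall 3 weight 50
    server percona-2 10.2.1.4:3306 check port 9200 inter 2000 downinter 300000 rise 3 fall 3 weight 50
    server percona-3 10.2.1.5:3306 check port 9200 inter 2000 downinter 300000 rise 3 fall 3 weight 75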
Nginx as Percona load balancer
Nginx has traditionally been a web load balancer. Since version 1.9, the open source version of Nginx supports TCP load balancing as well. Since most websites already use Nginx reverse proxies for web load balancing, it is easier (and less complex) to use Nginx for database load balancing as well.
Here are a few settings we configure in Nginx load balancers.
Use the “least_conn” setting to enable fast response
By default, Nginx distributes database queries evenly across all servers in the cluster. This can lead to servers handling complex queries getting overloaded while others remain under-utilized. So, we use the “Least Connections” method of load balancing, where new queries are sent to the server with the least number of active connections. The setting is defined inside the “upstream” block.
upstream dbcluster {
least_conn;
Set “weight” parameter to direct more traffic to high power servers
As noted above, not all servers in the clusters we manage have the same hardware configuration. We assign more queries to high-powered servers using the “weight” setting inside the “upstream” block. A server with weight “5” will get five times as many queries as one with weight “1”.
server perconadb1:3306 weight=5;
Set “fail_timeout” and “max_fails” to limit retries, and exclude a failed server from the cluster
It is important to detect a database server failure as soon as possible, and exclude it from the cluster to avoid data loss. We set Nginx to exclude a server from the cluster if it records 2 connection failures within 5 seconds. The server is brought back online only if a support engineer manually verifies that all is in order.
server perconadb1:3306 max_fails=2 fail_timeout=5s;
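Putting the Nginx directives together, the relevant portion of nginx.conf might look roughly like this. The “stream” block needs Nginx 1.9 or later built with the stream module, and the hostnames and values below are illustrative.

stream {
    upstream dbcluster {
        least_conn;
        server perconadb1:3306 weight=5 max_fails=2 fail_timeout=5s;
        server perconadb2:3306 weight=3 max_fails=2 fail_timeout=5s;
        server perconadb3:3306 weight=1 max_fails=2 fail_timeout=5s;
    }
    server {
        listen 3306;
        proxy_pass dbcluster;
    }
}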
It must be noted that the open source version of Nginx does not have active health check features. So, unlike the HAProxy configuration, we cannot configure active health checks using the “clustercheck” script.
In these cases, we use a custom version of the “clustercheck” script to block the IP of the Nginx load balancer on the affected database server. When the server becomes unreachable, the Nginx proxy excludes it from the cluster.
The script then sends a notification to our 24/7 emergency server administration team, who find the root cause, fix the issue, and remove the IP block to bring the server back into the cluster.
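As an illustration of this approach, a heavily simplified version of such a check could look like the script below. The load balancer IP and the alert command are placeholders, and our production script is more involved (it avoids duplicate firewall rules, runs on a schedule, and hooks into our alerting system).

#!/bin/bash
# Illustrative sketch: block the load balancer's IP when the local node is unhealthy.
# Assumes MySQL credentials are available non-interactively (e.g. via /root/.my.cnf).
LB_IP="10.2.1.2"   # placeholder: IP of the Nginx load balancer

STATE=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state'" | awk '{print $2}')

if [ "$STATE" != "4" ]; then
    # Node is not in the 'Synced' state – stop accepting traffic from the load balancer
    iptables -I INPUT -s "$LB_IP" -p tcp --dport 3306 -j DROP
    # Notify the on-call team ('mail' stands in for the real alerting hook)
    echo "Percona node $(hostname) left the Synced state" | mail -s "DB node unhealthy" oncall@example.com
fi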
How to maintain a Percona load balancer
Much like any other system, load balancers need to be monitored and continually tweaked to maintain high uptime and performance. Network errors, hardware faults, or even traffic spikes can cause load balancer nodes to respond poorly, or even go offline. Here we’ll go through what we do to keep our load balancing setups in top condition.
Setup and maintain a high availability switch – In a load balanced database cluster, the failure of a single database server won’t affect the uptime of the website. However, if the load balancer goes down, the whole site gets cut off from the cluster. To prevent this, we use high availability tools such as Keepalived to transfer control to a backup load balancer in case the primary goes down (a bare-bones Keepalived sketch follows this list). The failover is tested weekly to make sure everything will work when it’s needed.
Monitor logs to detect errors – Load balancer logs are a gold mine of information on the health of the network. If there’s a delay in connecting to the cluster, or if there are connection errors, it’ll show up in the log. We continuously monitor the logs to detect negative performance trends, and take proactive steps to resolve them before they can affect website performance.
Monitor server metrics to detect errors – During traffic spikes (such as a marketing campaign or a DoS attack), the number of sessions can exceed the maximum connection limits on the database servers. Quick action is needed to kill stale connections and free up resources for legitimate ones. So, we use monitoring tools such as Zabbix to keep track of critical metrics such as the number of sessions, denied connections, and connection errors, to detect system issues.
Keep the software updated – Each new release of load balancing software brings with it new features, better routing, and better security. We maintain test systems identical to production servers, in which we apply the new updates, and test it for proper functioning. If everything works fine, we apply the updates to live servers, thus ensuring a smooth rollover to a new version.
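For the high availability switch mentioned in the first point, a bare-bones Keepalived VRRP definition on the primary load balancer could look like the sketch below. The interface name, router ID, and virtual IP are placeholders, and a matching BACKUP instance with a lower priority would run on the standby node.

vrrp_instance LB_VIP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        10.2.1.100
    }
}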
Business websites need an infrastructure that is cost effective while delivering fast performance. In this post, we explained how we use HAProxy and Nginx to load balance Percona XtraDB clusters. By choosing open source load balancing solutions, we were able to lower infrastructure costs while ensuring high uptime.
Bobcares helps business websites of all sizes achieve world-class performance and uptime, using tried and tested website architectures. If you’d like to know how to make your website more reliable, we’d be happy to talk to you.
You may want to enable the proxy protocol between HAProxy and the Percona DB servers, so you know the client IP address and can apply more accurate privilege rules on your DB server.
Really happy to see your comment Baptiste! 🙂
Yes, proxy protocol and ACLs are required in an actual production deployment. This article (which in itself is huge) is just a high level overview for those who are considering switching to Percona from MySQL for performance and high availability reasons. These answer the most common questions that come up from web app owners during initial deployment.
The infrastructure features are customized for each web app based on their business needs.
Thanks again for stopping by. It’s a pleasant surprise. 🙂