10
08/09
7:00 am
Using HAProxy for MySQL failover and redundancy
(Update 1 – Aug 29, 2009:) It appears this configuration wasn’t working with HAProxy version 1.3.20 due to the “option nolinger” feature. I have removed it from the configuration and can confirm it works well with HAProxy v.1.3.15 to v.1.3.20. Because of this, you’ll also notice a significant increase in TIME_WAIT sessions, as well as ip_conntrack_count increasing from ~150 to ~925.
This post summarizes my reflections on failover, redundancy, and ultimately scaling MySQL databases using load-balancing software known as HAProxy.
At my current employer, we have been using HAProxy to build very simple server clusters to help clients scale their databases. It works for most people assuming their application:
- Has an acceptable ratio of reads/writes (i.e: 100:1)
- Can separate reads and writes at the application level
If your read/write ratio is lower, that’s when you need to look into different scaling solutions such as sharding.
I’ve designed a slightly more complex HAProxy configuration file which load-balances requests to MySQL databases. It detects failures such as broken replication and offline servers, and adjusts the availability of servers accordingly.
Each database server is running an xinetd daemon. Port 9201 is used to monitor replication and port 9200 is used to monitor mysql status. These ports are monitored by HAProxy as you will see in the configuration file below.
HAProxy backend to monitor replication
128 129 130 131 132 133 | backend db01_replication mode tcp balance roundrobin option tcpka option httpchk server db01 172.16.0.60:3306 check port 9201 inter 1s rise 1 fall 1 |
HAProxy backend to monitor mysql status
168 169 170 171 172 173 | backend db01_status mode tcp balance roundrobin option tcpka option httpchk server db01 172.16.0.60:3306 check port 9200 inter 1s rise 2 fall 2 |
I modified the mysqlchk_status.sh script found at SysBible with my own.
The mysqlchk_replication.sh script is similar to the one above, except it checks a few other variables such as Slave_IO_Running, Slave_SQL_Running and Seconds Behind Master. Success will always return a ‘200 OK’ and failures will always return a ‘503 Service Unavailable’.
- 2 HAProxy load-balancers in Active-Passive mode (VRRP using Keepalived)
- 2 MySQL database servers with Master-Master replication in Active-Passive mode
- 3 MySQL database servers with slave replication (read-only)
Failure scenarios
Based on a small set of failure scenarios, we’re able to determine how the load balancers should direct traffic. We obviously don’t want read requests from a database server who’s not replicating its master. We also don’t want to send writes to a server who’s offline. The examples below describe how HAProxy will react in those scenarios.
1. Replication breaks, lags, or stops working on DB02
- DB01 becomes the master database server.
- HAProxy stops sending requests to DB02 and DB05 (its slave).
- Despite this, DB01 and DB05 are still able to receive replicated data from DB02.
2. Replication breaks, lags, or stops working on DB01
- DB02 becomes the master database server.
- HAProxy stops sending requests to DB01, DB03 and DB04 (its slaves).
- Despite this, DB02, DB03 and DB04 are still able to receive replicated data from DB01.
3. Replication breaks, lags, or stops working on DB01 & DB02
- There is no writable master database server. Service is severely degraded and action should be taken to bring one master server back into replication.
- This is a split-brain problem. Both servers are online, but they aren’t replicating each other.
- HAProxy only sends read requests to DB01 and DB02
- HAProxy stops sending requests to DB03, DB04 and DB05 (the slaves).
- Despite this, DB03 and DB04 are still able to receive replicated data from DB01.
- Despite this, DB05 is still able to receive replicated data from DB02.
4. DB02 is offline, due to a server crash or something similar
- DB01 becomes the master database server.
- HAProxy stops sending requests to DB02 and DB05 (its slave).
- DB05 can’t receive replicated data from DB02.
- DB01 goes into backup mode which can have different settings to support more concurrency, send alerts, etc.
5. DB01 is offline, due to a server crash or something similar
- DB02 becomes the master database server.
- HAProxy stops sending requests to DB01, DB03 and DB04 (its slaves).
- DB03 and DB04 can’t receive replicated data from DB01.
- DB02 goes into backup mode which can have different settings to support more concurrency, send alerts, etc.
6. DB01 and DB02 are offline, due to a server crash or something similar
- There is no master database server.
- HAProxy stops sending requests to all DB servers.
- Call your sysadmin because your website is probably down.
Download
WARNING / DISCLAIMER
This configuration has not been tested in a production environment and should be used at your own risk.
Here are the scripts and config files, or scroll down to view the code:
WARNING / DISCLAIMER
This configuration has not been tested in a production environment and should be used at your own risk.













