Scripted MySQL Replication Consistency Checks

I’ve been fixing and breaking MySQL replication clusters for quite some time now, and I realized one of the biggest problems is the lack of consistency provided through MySQL replication.

Sure your data will be consistent most of the time, but how do you check if it really IS consistent across all your slaves? How do you make sure your slaves don’t have missing or invalid entries?

I’m sure you’ve all run:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

Not always a good idea…

Well today I present you with a little bash script I’ve written which performs all these verifications. I haven’t invented anything. On the contrary, I’m just using the methods and tools provided by Percona in their fantastic toolset called Maatkit.

Usage:

  • Change the default “User Defined Variables” in the script, to reflect your MASTER mysql server.

  • Configure slave reporting on each slave so that “SHOW SLAVE HOSTS\G” works from the MASTER

  • Make sure your slaves are running properly: Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0

  • Make sure you have Maatkit installed (apt-get install maatkit)

  • Run the script on the MASTER like this: ./mysql_consistency.sh -c

I have tested this script on Debian Lenny (5.0) with maatkit version 4334-1 and MySQL 5.0.

How does it work?

When you run the script, after performing some necessary sanity checks, the MASTER will create a checksum of every database and every table. It will store those results in the default database called test in the table called checksum. It will then replicate the data to the SLAVES who will create their own checksums on the same databases and tables. Afterwards it will tell you which slaves are consistent and which ones are not.

[root@db01 /opt (353)]#: ./mysql_consistency.sh -c
Checking consistency
Replication Slave ID 3 on 172.16.0.63:3306 is consistent.
Replication Slave ID 4 on 172.16.0.64:3306 is consistent.
Replication Slave ID 5 on 172.16.0.65:3306 is inconsistent. Requires rebuild

You might get some error messages too.

Download the script here: mysql_consistency.sh

Please notify me in the comments of any errors or adjustements as I’ve only used this in a small test-environment.

mysql_consistency.sh 172 lines
#!/bin/sh
# Script to perform consistency checks on replicated MySQL databases
#
# (c) Alex Williams - 2009 - www.alexwilliams.ca
#
# v0.1
#
# Options:
# -c Check for inconsistent slaves
#
###############
#
# Slaves *must* have reporting enabled in their my.cnf
# example:
# [mysqld]
# report-host = 172.16.0.63
# report-port = 3306
#########################
# User Defined Variables
#########################
MYSQL_HOST="172.16.0.60" # The MASTER database IP
MYSQL_PORT="3306" # The MASTER database PORT
MYSQL_USER="username"
MYSQL_PASS="password"
MYSQL_CHECKSUM="test.checksum" # The database (test) and table (checksum) to store checksum results
# Mandatory commands for this script to work.
COMMANDS="mysql mysqladmin mk-audit mk-table-checksum mk-checksum-filter awk"
##############
# Exit Codes
##############
E_INVALID_ARGS=65
E_INVALID_COMMAND=66
E_NO_SLAVES=67
E_DB_PROBLEM=68
##########################
# Script Functions
##########################
error() {
E_CODE=$?
echo "Exiting: ERROR ${E_CODE}: $E_MSG"
exit $E_CODE
}
usage() {
echo -e "MySQL Replication Consistency - version 0.1 (c) Alex Williams - www.alexwilliams.ca"
echo -e "\nOptions: "
echo -e "\t-c\tCheck for inconsistent slave(s)"
echo -e ""
exit $E_INVALID_ARGS
}
##
# Perform sanity checks before allowing the script to run
##
sanity_checks() {
##
# Verify if commands exist
##
for command in $COMMANDS
do
##
# Set the full path of the command
##
PROG=`which $command`
if [ ! ${PROG} ]; then
##
# Error message if the command doesn't exist
##
E_MSG="missing command '$command'"
return $E_INVALID_COMMAND
else
##
# Create a variable (i.e: $prog_tar)
# substitutes all - for _ (i.e: prog_mk-audit becomes prog_mk_audit)
##
E_MSG="Command not found"
eval prog_${command//-/_}=${PROG} || return
fi
done
}
###
# Check for inconsistent slaves
###
check() {
##
# Run the mk_table_checksum command
##
E_MSG="Problem running '$prog_mk_table_checksum' at the top of check() function"
$prog_mk_table_checksum --quiet --replicate=$MYSQL_CHECKSUM --create-replicate-table --empty-replicate-table h=$MYSQL_HOST,P=$MYSQL_PORT,u=$MYSQL_USER,p=$MYSQL_PASS || return $E_DB_PROBLEM
SLAVE_LIST=`$prog_mysql --user=$MYSQL_USER --password=$MYSQL_PASS -e "SHOW SLAVE HOSTS\G"`
##
# Create arrays for the slave ids, hosts, ports
# To manually create the slave arrays, do something like this instead:
#
# slave_ids=(3 4 5)
# slave_hosts=(172.16.0.63 172.16.0.64 172.16.0.65)
# slave_ports=(3306 3306 3306)
#
##
slave_ids=(`echo "$SLAVE_LIST" | grep "Server_id" | $prog_awk -F ": " '{ print $2 }'`)
slave_hosts=(`echo "$SLAVE_LIST" | grep "Host" | $prog_awk -F ": " '{ print $2 }'`)
slave_ports=(`echo "$SLAVE_LIST" | grep "Port" | $prog_awk -F ": " '{ print $2 }'`)
##
# Define the number of slaves by the number of entries in the slave_ids[] array
##
num_slaves=${#slave_ids[*]}
index=0
if [ $num_slaves -eq 0 ]; then
echo "No Replication Slaves appear in 'SHOW SLAVE HOSTS'"
return $E_NO_SLAVES
fi
##
# verify the checksums on each replicated slave
##
while [ "$index" -lt "$num_slaves" ]
do
slave_id=${slave_ids[$index]}
slave_host=${slave_hosts[$index]}
slave_port=${slave_ports[$index]}
CHECKSUM=`$prog_mk_table_checksum --replicate=$MYSQL_CHECKSUM --replicate-check 2 h=$slave_host,P=$slave_port,u=$MYSQL_USER,p=$MYSQL_PASS` || CHECKSUM="not consistent"
if [ "$CHECKSUM" ]; then
echo "Replication Slave ID $slave_id on $slave_host:$slave_port is inconsistent. Requires rebuild"
else
echo "Replication Slave ID $slave_id on $slave_host:$slave_port is consistent."
fi
let "index = $index + 1"
done
}
for arg in "$@"
do
case $arg in
-c) arg_c=true;;
*) usage;;
esac
done
if sanity_checks; then
sanity=true
if [ $arg_c ]; then
echo "Checking consistency"
check || error
else
usage
fi
else
error
fi

Comments (2)

  • Jean-Francois

    October 4, 2009

    11:58 pm

    Very nice info.
    If the scripts finds that a slave in inconsistent, does the maatkit toolset have something to resync the slave without having to re-create the whole thing from scratch? (ex: stop sync, copy files over, yadda-yadda)

  • Alex

    October 9, 2009

    6:20 pm

    Actually maatkit does have a command: mk-table-sync which would apparently sync the differences. I don’t think it’s necessarily a bad thing to resync everything from scratch, assuming your DB doesn’t hold 500GB of data.

    If your MySQL databases are running off an LVM partition, you can rebuild a slave using the latest snapshot of a master or a “consistent” slave.