Alex Williams: High-Availability Guru (http://www.alexwilliams.ca/blog)

Scripted MySQL Replication Consistency Checks
http://www.alexwilliams.ca/blog/2009/10/01/scripted-mysql-replication-consistency-checks/
Fri, 02 Oct 2009 00:45:03 +0000, by Alex

I've been fixing and breaking MySQL replication clusters for quite some time now, and I've realized that one of the biggest problems is that MySQL replication offers no guarantee of consistency.

Sure, your data will be consistent most of the time, but how do you check whether it really IS consistent across all your slaves? How do you make sure your slaves don't have missing or invalid entries?

I’m sure you’ve all run:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

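Not always a good idea: it tells the slave to skip the next event from the master, so whatever that statement changed is never applied and the slave silently drifts out of sync.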

Well, today I present you with a little bash script I've written that performs all these checks. I haven't invented anything; I'm simply using the methods and tools provided by Percona in their fantastic toolset called Maatkit.

Usage:

  • Change the default “User Defined Variables” in the script to reflect your MASTER MySQL server.

  • Configure slave reporting on each slave (report-host and report-port in my.cnf) so that “SHOW SLAVE HOSTS\G” lists them on the MASTER.

  • Make sure your slaves are running properly: Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0.

  • Make sure you have Maatkit installed (apt-get install maatkit).

  • Run the script on the MASTER like this: ./mysql_consistency.sh -c
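Before running the script, it can help to confirm that every slave actually appears in SHOW SLAVE HOSTS on the MASTER. Below is a minimal sketch of a hypothetical helper; `list_slaves` is my own name and is not part of Maatkit or the script. It turns the `\G` output into one `server_id host port` line per slave and reads each record as a unit, instead of running three separate greps. It assumes the MySQL 5.0 field names Server_id, Host and Port, with Port printed after Host in each record.

```shell
#!/bin/sh
# Hypothetical helper: convert "SHOW SLAVE HOSTS\G" output (on stdin)
# into one "server_id host port" line per replication slave.
list_slaves() {
    awk -F': ' '
        # Strip the leading padding MySQL adds before each field name.
        { sub(/^ +/, "", $1) }
        $1 == "Server_id" { id = $2 }
        $1 == "Host"      { host = $2 }
        # Port comes after Server_id and Host in each record, so emit the row here.
        $1 == "Port"      { print id, host, $2 }
    '
}

# Example (run on the MASTER; credentials are placeholders):
#   mysql --user=username --password=password -e 'SHOW SLAVE HOSTS\G' | list_slaves
```

If a slave is missing from the output, check its report-host and report-port settings and restart it.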

I have tested this script on Debian Lenny (5.0) with maatkit version 4334-1 and MySQL 5.0.

How does it work?

When you run the script, it first performs some sanity checks. The MASTER then computes a checksum of every table in every database and stores the results in the table test.checksum (the default set by MYSQL_CHECKSUM). The checksum statements replicate to the SLAVES, which run them against their own copies of the data and store their own results in the same table. Finally, the script tells you which slaves are consistent and which ones are not.
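The sketch below shows roughly what that comparison amounts to; it is not how mk-table-checksum is implemented internally. It assumes the replicate table has Maatkit's documented `this_crc`, `this_cnt`, `master_crc` and `master_cnt` columns, so check the mk-table-checksum documentation for your version. On each slave, you select those columns and flag any chunk where the slave's own checksum or row count differs from the master's values that were replicated alongside them. `diff_chunks` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read tab-separated rows of
#   db  tbl  chunk  this_crc  master_crc  this_cnt  master_cnt
# (as printed by "mysql -N -B") and report every chunk whose slave
# checksum or row count differs from the master's. A NULL master value
# (not yet replicated) is also reported. Returns non-zero on any difference.
diff_chunks() {
    awk -F'\t' '
        $4 != $5 || $6 != $7 {
            printf "%s.%s chunk %s differs\n", $1, $2, $3
            bad = 1
        }
        END { exit bad }
    '
}

# Example (run against a slave; column names assume Maatkit's replicate table):
#   mysql -N -B --user=username --password=password -e \
#     "SELECT db, tbl, chunk, this_crc, master_crc, this_cnt, master_cnt FROM test.checksum" \
#     | diff_chunks
```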

[root@db01 /opt (353)]#: ./mysql_consistency.sh -c
Checking consistency
Replication Slave ID 3 on 172.16.0.63:3306 is consistent.
Replication Slave ID 4 on 172.16.0.64:3306 is consistent.
Replication Slave ID 5 on 172.16.0.65:3306 is inconsistent. Requires rebuild

You might get some error messages too.

Download the script here: mysql_consistency.sh.txt

Please let me know in the comments about any errors or adjustments, as I've only used this in a small test environment.

#!/bin/bash
 
# Script to perform consistency checks on replicated MySQL databases
#
# (c) Alex Williams - 2009 - www.alexwilliams.ca
#
# v0.1
#
# Options: 
#	  -c      Check for inconsistent slaves
#
###############
#
# Slaves *must* have reporting enabled in their my.cnf
# example:
# 	[mysqld]
# 	report-host     = 172.16.0.63
# 	report-port     = 3306
 
 
#########################
# User Defined Variables
#########################
 
MYSQL_HOST="172.16.0.60"	# The MASTER database IP
MYSQL_PORT="3306"		# The MASTER database PORT
MYSQL_USER="username"
MYSQL_PASS="password"
MYSQL_CHECKSUM="test.checksum"	# The database (test) and table (checksum) to store checksum results
 
# Mandatory commands for this script to work.
COMMANDS="mysql mysqladmin mk-audit mk-table-checksum mk-checksum-filter awk"	
 
##############
# Exit Codes
##############
 
E_INVALID_ARGS=65
E_INVALID_COMMAND=66
E_NO_SLAVES=67
E_DB_PROBLEM=68
 
##########################
# Script Functions
##########################
 
error() {
        E_CODE=$?
        echo "Exiting: ERROR ${E_CODE}: $E_MSG"
 
        exit $E_CODE   
}
 
usage() {
	echo -e "MySQL Replication Consistency - version 0.1 (c) Alex Williams - www.alexwilliams.ca"
	echo -e "\nOptions: "
	echo -e "\t-c\tCheck for inconsistent slave(s)"
	echo -e ""
 
	exit $E_INVALID_ARGS
}
 
##
# Perform sanity checks before allowing the script to run
##
sanity_checks() {
	##
	# Verify if commands exist
	##
	for command in $COMMANDS
	do
		##
		# Set the full path of the command
		##
		PROG=`which $command`
		if [ -z "$PROG" ]; then
			##
			# Error message if the command doesn't exist
			##
			E_MSG="missing command '$command'"
			return $E_INVALID_COMMAND
		else
			##
			# Create a variable (i.e: $prog_tar)
			# 	substitutes all - for _ (i.e: prog_mk-audit becomes prog_mk_audit)
			##
			E_MSG="Command not found"
			eval prog_${command//-/_}=${PROG} || return
		fi
	done
}
 
 
###
# Check for inconsistent slaves
###
check() {
	##
	# Run the mk_table_checksum command
	##
	E_MSG="Problem running '$prog_mk_table_checksum' at the top of check() function"
 
	$prog_mk_table_checksum --quiet --replicate=$MYSQL_CHECKSUM --create-replicate-table --empty-replicate-table h=$MYSQL_HOST,P=$MYSQL_PORT,u=$MYSQL_USER,p=$MYSQL_PASS || return $E_DB_PROBLEM
 
	SLAVE_LIST=`$prog_mysql --host=$MYSQL_HOST --port=$MYSQL_PORT --user=$MYSQL_USER --password=$MYSQL_PASS -e "SHOW SLAVE HOSTS\G"`
 
	##
	# Create arrays for the slave ids, hosts, ports
	# To manually create the slave arrays, do something like this instead:
	#
	# slave_ids=(3 4 5)
	# slave_hosts=(172.16.0.63 172.16.0.64 172.16.0.65)
	# slave_ports=(3306 3306 3306)
	#
	##
	slave_ids=(`echo "$SLAVE_LIST" | grep "Server_id" | $prog_awk -F ": " '{ print $2 }'`)
	slave_hosts=(`echo "$SLAVE_LIST" | grep "Host" | $prog_awk -F ": " '{ print $2 }'`)
	slave_ports=(`echo "$SLAVE_LIST" | grep "Port" | $prog_awk -F ": " '{ print $2 }'`)
 
	##
	# Define the number of slaves by the number of entries in the slave_ids[] array
	##
	num_slaves=${#slave_ids[*]}
 
	index=0
 
	if [ $num_slaves -eq 0 ]; then
		E_MSG="No Replication Slaves appear in 'SHOW SLAVE HOSTS'"
		return $E_NO_SLAVES
	fi
 
	##
	# verify the checksums on each replicated slave
	##
	while [ "$index" -lt "$num_slaves" ]
	do
		slave_id=${slave_ids[$index]}
		slave_host=${slave_hosts[$index]}
		slave_port=${slave_ports[$index]}
 
		CHECKSUM=`$prog_mk_table_checksum --replicate=$MYSQL_CHECKSUM --replicate-check 2 h=$slave_host,P=$slave_port,u=$MYSQL_USER,p=$MYSQL_PASS` || CHECKSUM="not consistent"
 
		if [ "$CHECKSUM" ]; then
			echo "Replication Slave ID $slave_id on $slave_host:$slave_port is inconsistent. Requires rebuild"
		else
			echo "Replication Slave ID $slave_id on $slave_host:$slave_port is consistent."
		fi
		let "index = $index + 1"
	done
}
 
 
for arg in "$@"
do
	case $arg in
	-c) arg_c=true;;
	*) usage;;
	esac
done
 
if sanity_checks; then
	sanity=true
 
	if [ $arg_c ]; then
		echo "Checking consistency"
		check || error
	else
		usage
	fi
else
	error
fi

Using HAProxy for MySQL failover and redundancy
http://www.alexwilliams.ca/blog/2009/08/10/using-haproxy-for-mysql-failover-and-redundancy/
Mon, 10 Aug 2009 11:00:34 +0000, by Alex

(Update 1 – Aug 29, 2009:) This configuration didn't work with HAProxy version 1.3.20 because of the “option nolinger” setting. I have commented it out in the configuration and can confirm it works well with HAProxy v1.3.15 through v1.3.20. As a side effect, you'll notice a significant increase in TIME_WAIT sessions, and ip_conntrack_count rising from ~150 to ~925.

This post summarizes my reflections on failover, redundancy, and ultimately scaling MySQL databases using load-balancing software known as HAProxy.


At my current employer, we have been using HAProxy to build very simple server clusters to help clients scale their databases. This works for most people, assuming their application:

  • Has an acceptable ratio of reads/writes (e.g. 100:1)
  • Can separate reads and writes at the application level

If your read/write ratio is lower, that’s when you need to look into different scaling solutions such as sharding.

I’ve designed a slightly more complex HAProxy configuration file which load-balances requests to MySQL databases. It detects failures such as broken replication and offline servers, and adjusts the availability of servers accordingly.

Each database server runs an xinetd daemon. Port 9201 is used to monitor replication and port 9200 is used to monitor MySQL status. HAProxy polls these ports, as you will see in the configuration file below.

HAProxy backend to monitor replication

backend db01_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db01 172.16.0.60:3306 check port 9201 inter 1s rise 1 fall 1

HAProxy backend to monitor mysql status

backend db01_status
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db01 172.16.0.60:3306 check port 9200 inter 1s rise 2 fall 2

I modified the mysqlchk_status.sh script found at SysBible for my own use.

The mysqlchk_replication.sh script is similar to the status script, except it checks a few other variables such as Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. Success always returns ‘200 OK’ and failure always returns ‘503 Service Unavailable’.
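The mysqlchk_replication.sh script itself isn't listed in this post. The following is only a hypothetical sketch of what such a check could look like, not the original. It assumes it runs on each database server under xinetd, connects to the local MySQL server, and treats more than 60 seconds of lag as a failure. The credentials, the threshold and the helper names are all placeholders.

```shell
#!/bin/sh
#
# /opt/mysqlchk_replication.sh  (hypothetical sketch, not the original script)
#
# Prints "HTTP/1.1 200 OK" if both replication threads are running and the
# lag is within MAX_LAG, otherwise "HTTP/1.1 503 Service Unavailable".

MYSQL_USERNAME="replication_user"   # placeholder
MYSQL_PASSWORD="replication_pass"   # placeholder
MAX_LAG=60                          # assumed limit for Seconds_Behind_Master

# Read "SHOW SLAVE STATUS\G" output on stdin; exit 0 only if
# Slave_IO_Running and Slave_SQL_Running are "Yes" and the lag is <= $1.
replication_ok() {
    awk -F': ' -v max="$1" '
        { sub(/^ +/, "", $1) }
        $1 == "Slave_IO_Running"      { io  = $2 }
        $1 == "Slave_SQL_Running"     { sql = $2 }
        $1 == "Seconds_Behind_Master" { lag = $2 }
        END {
            # lag is "NULL" when the SQL thread is stopped or disconnected.
            if (io == "Yes" && sql == "Yes" && lag != "" && lag != "NULL" && lag + 0 <= max + 0)
                exit 0
            exit 1
        }
    '
}

# Print a minimal HTTP response: status line, one header, blank line, body.
http_response() {
    printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\n\r\n%s\r\n' "$1" "$2"
}

# An unreachable server (or a server that is not a slave) yields empty output,
# which replication_ok treats as a failure.
STATUS=$(mysql --user="$MYSQL_USERNAME" --password="$MYSQL_PASSWORD" \
    -e 'SHOW SLAVE STATUS\G' 2>/dev/null) || STATUS=""

if printf '%s\n' "$STATUS" | replication_ok "$MAX_LAG"; then
    http_response "200 OK" "Replication is running."
else
    http_response "503 Service Unavailable" "Replication is *down*."
fi
```

HAProxy's `option httpchk` treats a 2xx or 3xx status as healthy, so only the status line really matters. The header and body are there so a human testing by hand gets a well-formed reply.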

My test setup

  • 2 HAProxy load-balancers in Active-Passive mode (VRRP using Keepalived)
  • 2 MySQL database servers with Master-Master replication in Active-Passive mode
  • 3 MySQL database servers with slave replication (read-only)

Failure scenarios

Based on a small set of failure scenarios, we're able to determine how the load balancers should direct traffic. We obviously don't want to send read requests to a database server that isn't replicating from its master. We also don't want to send writes to a server that's offline. The examples below describe how HAProxy reacts in those scenarios.

1. Replication breaks, lags, or stops working on DB02

  • DB01 becomes the master database server.
  • HAProxy stops sending requests to DB02 and DB05 (its slave).
  • Despite this, DB01 and DB05 are still able to receive replicated data from DB02.

2. Replication breaks, lags, or stops working on DB01

  • DB02 becomes the master database server.
  • HAProxy stops sending requests to DB01, DB03 and DB04 (its slaves).
  • Despite this, DB02, DB03 and DB04 are still able to receive replicated data from DB01.

3. Replication breaks, lags, or stops working on DB01 & DB02

  • There is no writable master database server. Service is severely degraded and action should be taken to bring one master server back into replication.
  • This is a split-brain problem. Both servers are online, but they aren’t replicating each other.
  • HAProxy only sends read requests to DB01 and DB02
  • HAProxy stops sending requests to DB03, DB04 and DB05 (the slaves).
  • Despite this, DB03 and DB04 are still able to receive replicated data from DB01.
  • Despite this, DB05 is still able to receive replicated data from DB02.

4. DB02 is offline, due to a server crash or something similar

  • DB01 becomes the master database server.
  • HAProxy stops sending requests to DB02 and DB05 (its slave).
  • DB05 can’t receive replicated data from DB02.
  • DB01 goes into backup mode which can have different settings to support more concurrency, send alerts, etc.

5. DB01 is offline, due to a server crash or something similar

  • DB02 becomes the master database server.
  • HAProxy stops sending requests to DB01, DB03 and DB04 (its slaves).
  • DB03 and DB04 can’t receive replicated data from DB01.
  • DB02 goes into backup mode which can have different settings to support more concurrency, send alerts, etc.

6. DB01 and DB02 are offline, due to a server crash or something similar

  • There is no master database server.
  • HAProxy stops sending requests to all DB servers.
  • Call your sysadmin because your website is probably down.
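The six scenarios can be condensed into a small decision table. The sketch below is a hypothetical model of those rules for reasoning and testing. It is not how HAProxy evaluates them; the configuration file does that. Each argument is 1 (healthy) or 0 (failed). A failed mysql status takes precedence over a failed replication check, because a server that is offline also breaks replication on its partner.

```shell
#!/bin/sh
# Hypothetical model of the routing rules in the failure scenarios.
# Arguments (1 = healthy, 0 = failed):
#   $1 = DB01 mysql up       $2 = DB02 mysql up
#   $3 = DB01 replication OK $4 = DB02 replication OK

# Where writes go.
write_target() {
    if [ "$1" = 0 ] && [ "$2" = 0 ]; then echo none            # scenario 6
    elif [ "$2" = 0 ]; then echo db01_backup                    # scenario 4
    elif [ "$1" = 0 ]; then echo db02_backup                    # scenario 5
    elif [ "$3" = 0 ] && [ "$4" = 0 ]; then echo none           # scenario 3 (split-brain)
    elif [ "$3" = 0 ]; then echo db02                           # scenario 2
    else echo db01                                              # normal, or scenario 1
    fi
}

# Where reads go (DB03 and DB04 are slaves of DB01, DB05 is a slave of DB02).
read_targets() {
    if [ "$1" = 0 ] && [ "$2" = 0 ]; then echo none            # scenario 6
    elif [ "$2" = 0 ]; then echo db01_backup db03 db04          # scenario 4
    elif [ "$1" = 0 ]; then echo db02_backup db05               # scenario 5
    elif [ "$3" = 0 ] && [ "$4" = 0 ]; then echo db01 db02      # scenario 3 (split-brain)
    elif [ "$3" = 0 ]; then echo db02 db05                      # scenario 2
    elif [ "$4" = 0 ]; then echo db01 db03 db04                 # scenario 1
    else echo db01 db02 db03 db04 db05                          # everything healthy
    fi
}
```

For example, `write_target 1 1 0 1` (replication broken on DB01) prints `db02`, and `read_targets 1 1 1 0` (replication broken on DB02) prints `db01 db03 db04`.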

Download

WARNING / DISCLAIMER
This configuration has not been tested in a production environment and should be used at your own risk.

Here are the scripts and config files, or scroll down to view the code:

The xinetd config file

#
# /etc/xinetd.d/mysqlchk
#
service mysqlchk_write
{
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /opt/mysqlchk_status.sh
        log_on_failure  += USERID
        disable         = no
        only_from       = 172.16.0.0/24 # recommended to put the IPs that need
                                    # to connect exclusively (security purposes)
}
 
service mysqlchk_read
{
        flags           = REUSE
        socket_type     = stream
        port            = 9201
        wait            = no
        user            = nobody
        server          = /opt/mysqlchk_replication.sh
        log_on_failure  += USERID
        disable         = no
        only_from       = 172.16.0.0/24 # recommended to put the IPs that need
                                    # to connect exclusively (security purposes)
}
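On many distributions, xinetd refuses to start a named service that has no matching entry in /etc/services. It logs something like "service/protocol combination not in /etc/services" unless the service is declared with `type = UNLISTED`. The sketch below is a hypothetical helper (`add_service_entry` is my own name) that adds the two entries only if they are not already present. The file path is a parameter so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append "name port/tcp" to a services file
# (default /etc/services) unless an entry for that name/port already exists.
add_service_entry() {
    name=$1
    port=$2
    file=${3:-/etc/services}
    if ! grep -q "^${name}[[:space:]][[:space:]]*${port}/tcp" "$file"; then
        printf '%s\t%s/tcp\t# MySQL health check for HAProxy\n' "$name" "$port" >> "$file"
    fi
}
```

As root, run `add_service_entry mysqlchk_write 9200` and `add_service_entry mysqlchk_read 9201`, then reload xinetd (e.g. `/etc/init.d/xinetd reload` on Debian). You can then check a response by hand with `telnet 172.16.0.60 9200`.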

The mysqlchk_status script

#!/bin/bash
#
# /opt/mysqlchk_status.sh
#
# This script checks if a mysql server is healthy running on localhost. It will
# return:
#
# "HTTP/1.x 200 OK\r" (if mysql is running smoothly)
#
# - OR -
#
# "HTTP/1.x 503 Service Unavailable\r" (else)
#
# The purpose of this script is to make haproxy capable of monitoring mysql properly
#
# Author: Unai Rodriguez
#
# It is recommended that a low-privileged MySQL user is created to be used by
# this script. Something like this:
#
# mysql> GRANT SELECT on mysql.* TO 'mysqlchkusr'@'localhost' \
#     -> IDENTIFIED BY '257retfg2uysg218';
# mysql> flush privileges;
#
# Script modified by Alex Williams - August 4, 2009
#       - removed the need to write to a tmp file, instead store results in memory
 
MYSQL_HOST="172.16.0.60"
MYSQL_PORT="3306"
MYSQL_USERNAME="replication_user"
MYSQL_PASSWORD="replication_pass"
 
#
# We perform a simple query that should return a few results :-p
 
ERROR_MSG=`/usr/bin/mysql --host=$MYSQL_HOST --port=$MYSQL_PORT --user=$MYSQL_USERNAME --password=$MYSQL_PASSWORD -e "show databases;" 2>/dev/null`
 
#
# Check the output. If it is not empty, MySQL is up and we return HTTP 200;
# otherwise we return HTTP 503.
#
if [ "$ERROR_MSG" != "" ]
then
        # mysql is fine, return http 200
        /bin/echo -en "HTTP/1.1 200 OK\r\n"
        /bin/echo -en "Content-Type: text/plain\r\n"
        /bin/echo -en "\r\n"
        /bin/echo -en "MySQL is running.\r\n"
else
        # mysql is down, return http 503
        /bin/echo -en "HTTP/1.1 503 Service Unavailable\r\n"
        /bin/echo -en "Content-Type: text/plain\r\n"
        /bin/echo -en "\r\n"
        /bin/echo -en "MySQL is *down*.\r\n"
fi

The HAProxy config file

WARNING / DISCLAIMER
This configuration has not been tested in a production environment and should be used at your own risk.

# HAProxy configuration - haproxy-db.cfg
 
##
## FRONTEND ##
##
 
# Load-balanced IPs for DB writes and reads
#
frontend db_write
	bind 172.16.0.50:3306
	default_backend cluster_db_write
 
frontend db_read
	bind 172.16.0.51:3306
	default_backend cluster_db_read
 
# Monitor DB server availability
#
frontend monitor_db01
	#
	# set db01_backup to 'up' or 'down'
	#
	bind 127.0.0.1:9301
	mode http
	#option nolinger
 
	acl no_repl_db01 nbsrv(db01_replication) eq 0
	acl no_repl_db02 nbsrv(db02_replication) eq 0
	acl no_db01 nbsrv(db01_status) eq 0
	acl no_db02 nbsrv(db02_status) eq 0
 
	monitor-uri /dbs
	monitor fail unless no_repl_db01 no_repl_db02 no_db02
	monitor fail if no_db01 no_db02
 
frontend monitor_db02
	#
	# set db02_backup to 'up' or 'down'
	#
	bind 127.0.0.1:9302
	mode http
	#option nolinger
 
	acl no_repl_db01 nbsrv(db01_replication) eq 0
	acl no_repl_db02 nbsrv(db02_replication) eq 0
	acl no_db01 nbsrv(db01_status) eq 0
	acl no_db02 nbsrv(db02_status) eq 0
 
	monitor-uri /dbs
	monitor fail unless no_repl_db01 no_repl_db02 no_db01
	monitor fail if no_db01 no_db02
 
frontend monitor_db03
	#
	# set db03 read-only slave to 'down'
	#
	bind 127.0.0.1:9303
	mode http
	#option nolinger
 
	acl no_repl_db03 nbsrv(db03_replication) eq 0
	acl no_repl_db01 nbsrv(db01_replication) eq 0
	acl db02 nbsrv(db02_status) eq 1
 
	monitor-uri /dbs
	monitor fail if no_repl_db03
	monitor fail if no_repl_db01 db02
 
frontend monitor_db04
	#
	# set db04 read-only slave to 'down'
	#
	bind 127.0.0.1:9304
	mode http
	#option nolinger
 
	acl no_repl_db04 nbsrv(db04_replication) eq 0
	acl no_repl_db01 nbsrv(db01_replication) eq 0
	acl db02 nbsrv(db02_status) eq 1
 
	monitor-uri /dbs
	monitor fail if no_repl_db04
	monitor fail if no_repl_db01 db02
 
frontend monitor_db05
	#
	# set db05 read-only slave to 'down'
	#
	bind 127.0.0.1:9305
	mode http
	#option nolinger
 
	acl no_repl_db05 nbsrv(db05_replication) eq 0
	acl no_repl_db02 nbsrv(db02_replication) eq 0
	acl db01 nbsrv(db01_status) eq 1
 
	monitor-uri /dbs
	monitor fail if no_repl_db05
	monitor fail if no_repl_db02 db01
 
# Monitor for split-brain syndrome
#
frontend monitor_splitbrain
	#
	# set db01_splitbrain and db02_splitbrain to 'up'
	#
	bind 127.0.0.1:9300
	mode http
	#option nolinger
 
	acl no_repl01 nbsrv(db01_replication) eq 0
	acl no_repl02 nbsrv(db02_replication) eq 0
	acl db01 nbsrv(db01_status) eq 1
	acl db02 nbsrv(db02_status) eq 1
 
	monitor-uri /dbs
	monitor fail unless no_repl01 no_repl02 db01 db02
 
##
## BACKEND ##
##
 
# Check every DB server replication status
#	- perform an http check on port 9201 (replication status)
#	- set to 'down' if response is '503 Service Unavailable'
#	- set to 'up' if response is '200 OK'
#
backend db01_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db01 172.16.0.60:3306 check port 9201 inter 1s rise 1 fall 1
 
backend db02_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db02 172.16.0.61:3306 check port 9201 inter 1s rise 1 fall 1
 
backend db03_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db03 172.16.0.63:3306 check port 9201 inter 1s rise 1 fall 1
 
backend db04_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db04 172.16.0.64:3306 check port 9201 inter 1s rise 1 fall 1
 
backend db05_replication
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db05 172.16.0.65:3306 check port 9201 inter 1s rise 1 fall 1
 
# Check Master DB server mysql status
#	- perform an http check on port 9200 (mysql status)
#	- set to 'down' if response is '503 Service Unavailable'
#	- set to 'up' if response is '200 OK'
#
backend db01_status
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db01 172.16.0.60:3306 check port 9200 inter 1s rise 2 fall 2
 
backend db02_status
	mode tcp
	balance roundrobin
	option tcpka
	option httpchk
	server db02 172.16.0.61:3306 check port 9200 inter 1s rise 2 fall 2
 
# DB write cluster
# 	Failure scenarios:
#	- replication 'up' on db01 & db02 	= writes to db01
#	- replication 'down' on db02 		= writes to db01
#	- replication 'down' on db01 		= writes to db02
#	- replication 'down' on db01 & db02	= go nowhere, split-brain, cluster FAIL!
#	- mysql 'down' on db02 				= writes to db01_backup
#	- mysql 'down' on db01 				= writes to db02_backup
#	- mysql 'down' on db01 & db02 		= go nowhere, cluster FAIL!
#
backend cluster_db_write
	#
	# - max 1 db server available at all times
	# - db01 is preferred (top of list)
	# - db_backups set their 'up' or 'down' based on results from monitor_dbs
	#
	mode    tcp
	option  tcpka
	balance roundrobin
	option  httpchk GET /dbs
	server  db01 172.16.0.60:3306 weight 1 check port 9201 inter 1s rise 2 fall 1
	server  db02 172.16.0.61:3306 weight 1 check port 9201 inter 1s rise 2 fall 1 backup
	server  db01_backup 172.16.0.60:3306 weight 1 check port 9301 inter 1s rise 2 fall 2 addr 127.0.0.1 backup
	server  db02_backup 172.16.0.61:3306 weight 1 check port 9302 inter 1s rise 2 fall 2 addr 127.0.0.1 backup
 
# DB read cluster
# 	Failure scenarios
#	- replication 'up' on db01 & db02 	= reads on db01, db02, all db_slaves
#	- replication 'down' on db02 		= reads on db01, slaves of db01
#	- replication 'down' on db01 		= reads on db02, slaves of db02
#	- replication 'down' on db01 & db02 = reads on db01_splitbrain and db02_splitbrain only
#	- mysql 'down' on db02 				= reads on db01_backup, slaves of db01
#	- mysql 'down' on db01 				= reads on db02_backup, slaves of db02
#	- mysql 'down' on db01 & db02 		= go nowhere, cluster FAIL!
#
backend cluster_db_read
	#
	# - max 2 master db servers available at all times
	# - max N slave db servers available at all times except during split-brain
	# - dbs track 'up' and 'down' of dbs in the cluster_db_write
	# - db_backups track 'up' and 'down' of db_backups in the cluster_db_write
	# - db_splitbrains set their 'up' or 'down' based on results from monitor_splitbrain
	#
	mode    tcp
	option  tcpka
	balance roundrobin
	option  httpchk GET /dbs
	server  db01 172.16.0.60:3306 weight 1 track cluster_db_write/db01
	server  db02 172.16.0.61:3306 weight 1 track cluster_db_write/db02
	server  db01_backup 172.16.0.60:3306 weight 1 track cluster_db_write/db01_backup
	server  db02_backup 172.16.0.61:3306 weight 1 track cluster_db_write/db02_backup
	server  db01_splitbrain 172.16.0.60:3306 weight 1 check port 9300 inter 1s rise 1 fall 2 addr 127.0.0.1
	server  db02_splitbrain 172.16.0.61:3306 weight 1 check port 9300 inter 1s rise 1 fall 2 addr 127.0.0.1
	#
	#	Scaling & redundancy options
	#	- db_slaves set their 'up' or 'down' based on results from monitor_dbs
	#	- db_slaves should take longer to rise
	#
	server  db03_slave 172.16.0.63:3306 weight 1 check port 9303 inter 1s rise 5 fall 1 addr 127.0.0.1
	server  db04_slave 172.16.0.64:3306 weight 1 check port 9304 inter 1s rise 5 fall 1 addr 127.0.0.1
	server  db05_slave 172.16.0.65:3306 weight 1 check port 9305 inter 1s rise 5 fall 1 addr 127.0.0.1
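Each monitor frontend answers on its monitor-uri (/dbs) with 200 while none of its `monitor fail` conditions match, and with 503 once one does. That is what flips the corresponding backup, split-brain and slave servers up or down. The sketch below is a hypothetical way to watch those states from the load balancer itself, since the frontends bind to 127.0.0.1. It assumes curl is installed; the helper names are mine.

```shell
#!/bin/sh
# Hypothetical helper: report whether each local HAProxy monitor frontend
# (monitor-uri /dbs on 127.0.0.1:9300-9305) currently answers healthy or not.

# Map an HTTP status code to the state an httpchk check derives from it.
code_to_state() {
    case "$1" in
        2??|3??) echo up ;;
        *)       echo down ;;
    esac
}

db_state() {
    # curl prints 000 as the status code if nothing answers on the port.
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:$1/dbs" || true)
    code_to_state "$code"
}

show_monitors() {
    for port in 9300 9301 9302 9303 9304 9305; do
        echo "$port $(db_state "$port")"
    done
}
```

Run `haproxy -c -f haproxy-db.cfg` to syntax-check the configuration before reloading. Once HAProxy is running, call `show_monitors` and compare the result with the failure scenarios described earlier.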

Dear email, we're done!
http://www.alexwilliams.ca/blog/2009/07/15/dear-email-were-done/
Thu, 16 Jul 2009 03:58:30 +0000, by Alex

I've decided I'm going to try to give up using email.

I've grown tired of sifting through spam messages in order to find the few legit ones.

I'm tired of mail folders, filters, labels, stars, and every other doohickey used to help “classify” my messages, which I'll still be unable to find 2 years later because there are just too many.

I'm tired of emails from companies with their “DONOTREPLY” addresses, who really don't reply when I reply to them. What kind of customer service is that?

Regardless, no one needs email anymore. Only machines need email. For what? Confirming that I'm not a machine? Mailinator will take care of that. It's temporary. Perfect! Everyone else can use one of the 40 other methods to contact me.

Other people have done it.

Now, I'm aware of the implications: I'm leaving my precious “messages” in the hands of other operators, such as the famously insecure Twitter. But in reality, does it even matter? If there is an important piece of communication that I absolutely must keep in the event of an internet apocalypse, I can simply “Print to PDF” and store it on my laptop, which is super convenient considering I perform daily backups.

Finally, with all the technologies and fantastic web services that exist, I wonder why more people aren't doing this. I know I'm not alone. Maybe you should try it out too.

Twitter – me too!
http://www.alexwilliams.ca/blog/2009/07/05/twitter-me-too/
Mon, 06 Jul 2009 01:19:44 +0000, by Alex

Well, I finally started using Twitter a few weeks ago. I'm definitely *not* an early adopter (see my first tweet), but it's not too late.

I find it much more convenient to write quick Twitter updates, as opposed to long, well-thought-out blog posts.

Follow me: @alexandermensa.

New articles coming soon!
http://www.alexwilliams.ca/blog/2009/04/10/new-articles-coming-soon/
Fri, 10 Apr 2009 23:35:37 +0000, by Alex

Unfortunately (for me), I haven't been idling on a beach without my MacBook these last 2 months. I've actually been typing away like a mad scientist on a big shiny VMware server (which I set up, obviously!).

I’ve created an R&D server setup which will allow me to post a series of technical articles covering various subjects such as database replication, snapshot backups, virtualization, security for PCI compliance, and much more.

Stay tuned, the first article should be available soon.

Registrars and DNS server redundancy
http://www.alexwilliams.ca/blog/2009/02/06/registrars-and-dns-server-redundancy/
Fri, 06 Feb 2009 18:04:40 +0000, by Alex

Everyone involved in web or networking knows that DNS is the root of the internet.

Despite this, some people forget how to properly configure their domains for full redundancy. It is common practice to add 2 Name Server records for each domain (e.g. ns1.domain.com and ns2.domain.com). Nothing is wrong with that. The big problem is when those two Name Servers are in the same physical location (e.g. the same data center or, worse, the same physical server).

What happens if that data center catches fire? Or if there's a major power outage and the backup generators don't work? Well, you lose the ability to resolve your domain. You might say: in that case, I can go to my Registrar and simply change the Name Servers. What if your Registrar is in the same location? Too bad!

I’ve seen the above situation a few too many times. Why put all your eggs in 1 basket?

If you want real DNS redundancy, here is what you should do:

  1. If you register your domain with GoDaddy, DON'T use them to host your Name Servers.
  2. Specify at least 2 Name Servers that are in different physical locations.
  3. Configure the 2nd Name Server to automatically retrieve DNS updates from the 1st Name Server.

We all hope there will never be a major outage, but when one happens, it's nice to know that we have a good backup and failover solution that lets us get back up and running as quickly as possible.

I recommend EveryDNS.net. The service is completely free and gives you complete control over your DNS records. It's worth noting that the service was created by the founder of OpenDNS. A nice bonus!
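Step 3 is usually done with zone transfers. The second server acts as a secondary (slave) for the zone and pulls a fresh copy whenever the SOA serial on the first server increases. A quick way to confirm that both servers are serving the same version of the zone is to compare their SOA serials. The sketch below assumes `dig` (from the BIND utilities) is installed. The helper names and the ns1/ns2.domain.com hosts are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the serial (3rd field) from "dig +short SOA" output,
# e.g. "ns1.domain.com. hostmaster.domain.com. 2009020601 10800 3600 604800 3600".
soa_serial() {
    awk 'NF >= 7 { print $3; exit }'
}

# Compare the SOA serial two name servers report for one zone.
# Succeeds only if both answered and the serials match.
compare_serials() {
    zone=$1
    s1=$(dig +short SOA "$zone" @"$2" | soa_serial)
    s2=$(dig +short SOA "$zone" @"$3" | soa_serial)
    echo "$2: ${s1:-none}  $3: ${s2:-none}"
    [ -n "$s1" ] && [ "$s1" = "$s2" ]
}

# Usage: compare_serials domain.com ns1.domain.com ns2.domain.com
```

A lower serial on the second server after its refresh interval has passed usually means zone transfers are not working.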

Segmenting your corporate network with VLANs
http://www.alexwilliams.ca/blog/2009/01/17/segmenting-your-corporate-network-with-vlans/
Sat, 17 Jan 2009 23:37:11 +0000, by Alex

I've seen many corporate networks fail to observe the simplest rules of networking. Often, administrators throw more resources at a problem (e.g. more bandwidth for VoIP) instead of using existing networking technology.

    In today’s post, I will discuss the segmentation of a corporate network using VLANs.

    A VLAN allows you to create a virtual LAN without making any physical changes to your network setup. In the old days, this would be accomplished by adding more hardware switches and network cabling to separate various parts of your network. Nowadays, VLAN Tagging, defined by the IEEE 802.1Q standard, can be configured on almost every managed network switch (i.e: HP ProCurve, SMC TigerSwitch, 3com SuperStack, NetGear FSM switches).

    There are many advantages to segmenting your corporate network. If your company uses VoIP phones, adding them to a VLAN will allow you to perform simple QoS (quality of service) on those devices, thus providing the ability to guarantee bandwidth, therefore call quality. Another advantage is the ability to “hide” sensitive computers and servers (accounting & finance databases), from the rest of the network. If you experience chopped calls using your VoIP phones, the problem can easily be resolved by adding more bandwidth to the internal/external network, but it’s just a temporary fix. As your network and needs grow, the same problem will likely re-appear. With the use of VLANs and QoS, these problems can almost entirely be mitigated even with additional network growth. Depending on the size of your network, the cost of upgrading to managed switches can be much lower than the cost of upgrading your external bandwidth.

    Each VLAN is assigned a tag (usually a number from 1 to 4094) to identify the traffic. On a managed switch, you can configure each port to assign a default VLAN number to each frame/packet. In a basic setup, I would dedicate a VLAN 100 for management (routers, switches & admin computers), VLAN 200 for VoIP phones, and VLAN 300 for office PCs.

    Each VLAN on a switch’s port can be assigned the following options: tagged, untagged, non-member. Here’s an example: a VoIP phone is plugged into Port 5. Most VoIP phones also have an rj45 jack to plug a computer. Those phones can usually be configured with a pre-defined VLAN Tag (if they follow the 802.1Q standard). In our case, we’ll assign them to VLAN 200. Assuming VLAN 200 is dedicated to VoIP, Port 5 on the switch would be configured ‘tagged’ on VLAN 200, ‘untagged’ on VLAN 300, ‘non-member’ VLAN 100. This means the phone will always communicate ‘tagged’ on VLAN 200, the computer would be ‘untagged’ on VLAN 300 and neither of those devices would be able to communicate with VLAN 100. This is a simple setup, but it defines one way of segmentating a corporate network.

    On that network, it would be preferable to make every port untagged on VLAN 300 and to make every port with a VoIP phone tagged on VLAN 200. Your phone system server (Asterisk, Avaya, etc.) would need to communicate with both the admin computers and all the phones, so you can simply configure the server’s port as tagged on VLAN 200, tagged on VLAN 100, and non-member on VLAN 300.
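    The tagged/untagged/non-member settings for Port 5 and the phone server’s port can be modeled as a small membership table. The sketch below is a simplified model, not any switch vendor’s configuration syntax: the class and method names are hypothetical, it assumes the port’s default VLAN is its untagged VLAN, and it assumes ingress filtering is enabled so that frames tagged for a VLAN the port is not tagged on are dropped (switches differ on these defaults).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Membership(Enum):
    TAGGED = "tagged"          # frames carry an 802.1Q tag on this port
    UNTAGGED = "untagged"      # frames leave without a tag (normally one VLAN per port)
    NON_MEMBER = "non-member"  # the port never carries this VLAN


@dataclass
class Port:
    name: str
    vlans: dict[int, Membership] = field(default_factory=dict)

    def default_vlan(self) -> Optional[int]:
        """Assumed default VLAN for untagged ingress: the port's untagged VLAN."""
        untagged = [v for v, m in self.vlans.items() if m is Membership.UNTAGGED]
        return untagged[0] if untagged else None

    def ingress_vlan(self, frame_vid: Optional[int]) -> Optional[int]:
        """VLAN a received frame is placed on, or None if the frame is dropped.

        frame_vid is None for an untagged frame (e.g. the PC behind the phone).
        """
        if frame_vid is None:
            # With no untagged VLAN configured, this model drops untagged frames.
            return self.default_vlan()
        # Tagged frames are accepted only on VLANs the port is tagged on.
        if self.vlans.get(frame_vid) is Membership.TAGGED:
            return frame_vid
        return None

    def egress(self, vid: int) -> Optional[str]:
        """How a frame on VLAN vid leaves this port: 'tagged', 'untagged' or None."""
        membership = self.vlans.get(vid, Membership.NON_MEMBER)
        return None if membership is Membership.NON_MEMBER else membership.value


# Port 5: VoIP phone tagged on VLAN 200, PC behind it untagged on VLAN 300.
port5 = Port("port5", {200: Membership.TAGGED, 300: Membership.UNTAGGED,
                       100: Membership.NON_MEMBER})
# Phone system server: tagged on VLANs 100 and 200, non-member on VLAN 300.
pbx_port = Port("pbx", {100: Membership.TAGGED, 200: Membership.TAGGED,
                        300: Membership.NON_MEMBER})

if __name__ == "__main__":
    print(port5.ingress_vlan(200))   # phone's tagged frame
    print(port5.ingress_vlan(None))  # PC's untagged frame
    print(port5.ingress_vlan(100))   # tagged for management: dropped
    print(pbx_port.egress(300))      # office traffic never reaches the server
```

    Running it prints 200, 300, None and None: the phone’s frames stay on VLAN 200, the PC’s untagged frames land on VLAN 300, anything aimed at management VLAN 100 from Port 5 is dropped, and office traffic on VLAN 300 never leaves through the server’s port.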

    This can become a bit more complicated when you add multiple switches, but the idea remains the same. Segmentation improves security, makes devices easier to manage, and extends your network’s capabilities so you can apply technologies such as QoS.

    If you have any questions, please write them in the comments below and I’ll try to answer as best I can. I would be more than happy to write another detailed post with real-world examples, as well as diagrams explaining how all this works.

    I’m no marketer, yet – end of blog
    http://www.alexwilliams.ca/blog/2008/07/14/im-no-marketer-yet-end-of-blog/
    Mon, 14 Jul 2008 13:00:52 +0000, by Alex

    I still consider myself in career limbo at the moment. Not yet having been able to define my future career precisely, I’ve decided to simply close this blog until I figure out what to do.

    As much as I love marketing and all things related, I have no interest in starting over from scratch, so I’ll remain in the IT industry for now.

    I’m damn good with IT, and it would be wise for me to continue in that direction. I have an idea of what I should be doing; I just need to analyze my strengths and weaknesses, and adjust a few things to create something valuable.

    Thanks for reading! :D

    Lack of focus or too much ambition?
    http://www.alexwilliams.ca/blog/2008/06/17/lack-of-focus-or-too-much-ambition/
    Tue, 17 Jun 2008 06:01:06 +0000, by Alex

    Where am I?

    I seem to have been all over the map these last few months, but in reality I’ve been sitting in the same chair quietly pondering the future. One day I decided to change my title to Network Architect. After spending days reading absolutely everything I could about network architecture, I was convinced it would be my new vocation. A few weeks later, I realized it’s not something I really want to do. As challenging and interesting as it may be, I simply don’t have the desire to move in that direction. I’m still missing some important life/work experience, and that will only come with time.

    Passion

    Since I started this blog, I’ve become passionate about all things marketing. I’ve read tons of books, eBooks, blog posts and articles by various authors around the world. I think I’ve done a pretty good job marketing myself and some of my crazy ideas, but it was more of a learning experience… a test to determine my ability to focus on non-logical things… ideas. Unfortunately my lack of direction has prevented me from taking it to another level.

    The next step

    I’ve spent the last 3 months working full-time while trying to figure out where to take my career. I’ve enjoyed almost every aspect of it, but now it’s time to move on. Sometimes we find what we want without even looking for it, and sometimes we spend our whole lives looking for something we’ll never find. I think this has been right in front of me the entire time; I just wasn’t focused enough to see it.

    I don’t plan on quitting my “job” just yet, but I’m definitely looking into a Marketing position. I know it’s a bit out of reach due to my lack of theoretical marketing knowledge, but I’ve been doing this for years without even realizing it. The idea of being paid to be creative, to strategize, to develop, refine and implement ideas suits me so well! I’ve been told that it’s difficult for an IT person to move into Marketing because you’re going from left brain work to right brain work. The difference, in my case, is that I’m actually a right brain person doing left brain work.

    Where to begin?

    Well, I’ll start by revamping my website with a new design, cleaning up my portfolio, rewriting my bio, and redefining myself as a different (better, of course) Alex.

    I’ll keep you guys posted over the next few days. Obviously, this means I will return to my regular posting schedule.

    Cheers!

    Computer Systems & Network Decentralization
    http://www.alexwilliams.ca/blog/2008/04/25/computer-systems-network-decentralization/
    Fri, 25 Apr 2008 12:30:15 +0000, by Alex

    Reflecting on my previous post about scalability, I’ve realized I omitted one important detail: decentralization. One of the biggest hurdles has always been decentralizing the IT infrastructure. I’ve been lucky to work with companies that have offices in various remote locations, and I’m still puzzled that so many remote sites can depend on a single central computer system/network.

    When something goes wrong in that central location, everyone is affected. This completely goes against the idea of having scalable, redundant, and distributed systems.

    I’ve spent the last few weeks drawing up plans to decentralize the IT infrastructure of a few organizations, only to realize that my ideas would be difficult to accept because of the added complexity. In my opinion, that complexity is unavoidable at every level of an organization, especially during rapid growth and expansion.

    I understand that organizations would prefer to avoid added costs, especially with rising gas prices and a pending recession. At the same time, they can’t afford to be at the mercy of such a centralized system.

    Removing these dependencies allows an organization to grow laterally as well as vertically, while paving the way for future teleworkers (lower costs). Building these systems from the ground up requires a broad vision of the organization’s future. At the same time, the plan must be flexible enough to accommodate change.

    I’ve looked for answers in books, whitepapers, and trade magazines, and realized that there’s no ultimate solution.

    Most Sys Admins want to maintain total control (physical & logical) of their computer systems, but what they don’t realize is that by doing so they are jeopardizing and slowing the progress of the entire organization. A good Sys Admin is able to delegate responsibility to lower-level admins without losing control of the network. Without decentralized computer systems & networks, this type of delegation becomes increasingly difficult, even impossible.

    I have not yet obtained the experience necessary to establish myself as a senior systems architect, but after 8 years I’ve learned the patterns and solutions required to design systems which are (and will be) favorable for everyone.

    On that note, feel free to leave a comment about decentralizing computer systems & networks. I would love to hear opinions on how we can help organizations understand its importance.
