Why Replication Matters
As application traffic grows, the database becomes the primary bottleneck. A single MySQL server handles both reads and writes, and read traffic typically outweighs writes by 10:1 or more. Master-slave replication distributes read queries across multiple replicas while the master handles all writes, providing both read scaling and disaster recovery.
This guide covers GTID-based replication setup, monitoring, and failover procedures that we implement when managing production servers.
Architecture
Application
├── Writes → Master (MySQL 8.0, 192.168.1.10)
└── Reads  → HAProxy → Slave 1 (192.168.1.11)
                     → Slave 2 (192.168.1.12)
                     → Slave 3 (192.168.1.13)
HAProxy load-balances read traffic across healthy replicas. If a replica falls behind or becomes unavailable, HAProxy routes traffic to the remaining healthy nodes.
Master Configuration
Configure the master server's MySQL settings:
# /etc/mysql/mysql.conf.d/master.cnf
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
binlog_row_image = FULL
gtid_mode = ON
enforce_gtid_consistency = ON
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
# Retain binary logs for 7 days
binlog_expire_logs_seconds = 604800
# Performance tuning
innodb_buffer_pool_size = 8G
innodb_log_file_size = 1G
innodb_io_capacity = 2000
max_connections = 500
Create Replication User
CREATE USER 'repl_user'@'192.168.1.%'
IDENTIFIED WITH caching_sha2_password BY 'strong_password_here';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.1.%';
FLUSH PRIVILEGES;
Verify Master Status
SHOW MASTER STATUS\G
-- Note the File and Position values
-- With GTID, you can also check:
SELECT @@global.gtid_executed;
Slave Configuration
Configure each replica:
# /etc/mysql/mysql.conf.d/slave.cnf
[mysqld]
server-id = 2 # Unique per slave
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
relay_log = /var/log/mysql/relay-bin
read_only = ON
super_read_only = ON
log_slave_updates = ON
# Parallel replication
slave_parallel_workers = 4
slave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = ON
# Performance
innodb_buffer_pool_size = 8G
Initialize Replication
Take a consistent backup from the master:
mysqldump --all-databases --single-transaction \
--routines --triggers --events \
--master-data=2 --set-gtid-purged=ON \
-u root -p > master_backup.sql
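Before loading the dump it is worth confirming it actually carries the GTID state: with --set-gtid-purged=ON, mysqldump writes a SET @@GLOBAL.GTID_PURGED statement near the top of the file. A quick sketch of that check (the dump header below is embedded as an illustrative sample; the GTID value is made up):

```shell
# Illustrative sample of the first lines mysqldump emits with --set-gtid-purged=ON.
cat > /tmp/sample_dump_header.sql <<'EOF'
-- MySQL dump 8.0
SET @@SESSION.SQL_LOG_BIN= 0;
SET @@GLOBAL.GTID_PURGED=/*!80000 '+'*/ '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5';
EOF

# Refuse to proceed if the dump lacks the GTID purge statement.
if grep -q 'GTID_PURGED' /tmp/sample_dump_header.sql; then
  echo "dump contains gtid_purged: OK"
else
  echo "dump missing gtid_purged: re-run mysqldump with --set-gtid-purged=ON" >&2
  exit 1
fi
```

In practice, point the grep at your real backup file instead of the sample.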
Restore on the slave and start replication. Note that a dump created with --set-gtid-purged=ON sets @@GLOBAL.gtid_purged, which fails if the replica already has a GTID history, so clear it with RESET MASTER first:
-- On the slave
RESET MASTER;
SOURCE /tmp/master_backup.sql;
CHANGE MASTER TO
MASTER_HOST='192.168.1.10',
MASTER_USER='repl_user',
MASTER_PASSWORD='strong_password_here',
MASTER_AUTO_POSITION=1,
MASTER_SSL=1;
START SLAVE;
Verify Replication Status
SHOW SLAVE STATUS\G
Key fields to check:
- Slave_IO_Running: Yes (the I/O thread is receiving binary logs)
- Slave_SQL_Running: Yes (the SQL thread is applying events)
- Seconds_Behind_Master: 0 (current replication lag in seconds)
- Retrieved_Gtid_Set and Executed_Gtid_Set should match the master
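For scripted checks, these fields can be pulled out of the SHOW SLAVE STATUS\G output with awk. A sketch that parses a captured sample (the status text below is illustrative, not real server output; in production, pipe `mysql -e "SHOW SLAVE STATUS\G"` in instead):

```shell
# Illustrative sample of the relevant SHOW SLAVE STATUS\G lines.
cat > /tmp/slave_status.txt <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
EOF

# Extract each field by splitting on "key: value".
io=$(awk -F': ' '/Slave_IO_Running/ {print $2}' /tmp/slave_status.txt)
sql=$(awk -F': ' '/Slave_SQL_Running/ {print $2}' /tmp/slave_status.txt)
lag=$(awk -F': ' '/Seconds_Behind_Master/ {print $2}' /tmp/slave_status.txt)

if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
  echo "replication healthy, lag ${lag}s"   # prints: replication healthy, lag 0s
fi
```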
HAProxy Configuration for Read Distribution
# /etc/haproxy/haproxy.cfg
frontend mysql_read
    bind *:3307
    mode tcp
    default_backend mysql_slaves

backend mysql_slaves
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    server slave1 192.168.1.11:3306 check inter 5s fall 3 rise 2
    server slave2 192.168.1.12:3306 check inter 5s fall 3 rise 2
    server slave3 192.168.1.13:3306 check inter 5s fall 3 rise 2
The application connects to port 3306 for writes (master) and port 3307 for reads (HAProxy-balanced slaves).
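The mysql-check option above requires the haproxy_check account to exist on every replica. The check only completes the MySQL handshake, so a user with no password and no privileges is enough; a minimal sketch (restrict the host pattern to where HAProxy actually runs, and note that older HAProxy builds cannot complete the default caching_sha2_password handshake, hence mysql_native_password here):

CREATE USER 'haproxy_check'@'192.168.1.%' IDENTIFIED WITH mysql_native_password;
FLUSH PRIVILEGES;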
Monitoring Replication Health
Automated Lag Monitoring
#!/bin/bash
# /usr/local/bin/check_replication.sh
THRESHOLD=30
LAG=$(mysql -u monitor -p'monitor_pass' -h 192.168.1.11 \
  -e "SHOW SLAVE STATUS\G" 2>/dev/null | awk '/Seconds_Behind_Master/ {print $2}')
if [ -z "$LAG" ]; then
  echo "CRITICAL: Cannot query slave status"
  exit 2
elif [ "$LAG" = "NULL" ]; then
  echo "CRITICAL: Replication is broken (I/O or SQL thread stopped)"
  exit 2
elif [ "$LAG" -gt "$THRESHOLD" ]; then
  echo "WARNING: Replication lag is ${LAG}s (threshold: ${THRESHOLD}s)"
  exit 1
else
  echo "OK: Replication lag is ${LAG}s"
  exit 0
fi
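The script can then run on a schedule; a minimal cron entry (hypothetical path and interval, and hook the non-zero exit codes into your alerting rather than just a log):

# /etc/cron.d/mysql-replication-check
*/5 * * * * root /usr/local/bin/check_replication.sh >> /var/log/replication-check.log 2>&1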
Prometheus Monitoring
Use the mysqld_exporter to expose replication metrics:
# prometheus alert rule
- alert: MySQLReplicationLag
  expr: mysql_slave_status_seconds_behind_master > 30
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "MySQL replication lag on {{ $labels.instance }}"
    description: "Slave is {{ $value }}s behind master"
Failover Procedure
When the master fails, promote a slave:
-- On the most up-to-date slave
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = OFF;
SET GLOBAL super_read_only = OFF;
-- Point other slaves to the new master
-- On remaining slaves:
STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='192.168.1.11',
MASTER_AUTO_POSITION=1;
START SLAVE;
Update application configuration and HAProxy to point writes to the new master. For automated failover, consider MySQL Group Replication or Orchestrator.
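Identifying "the most up-to-date slave" can be verified with MySQL's GTID set functions instead of by eye. A sketch: on the promotion candidate, compare another replica's Executed_Gtid_Set (the quoted value is a placeholder) against the candidate's own:

-- Returns 1 if every GTID in the other replica's executed set is also in
-- the candidate's, i.e. the candidate is at least as current.
SELECT GTID_SUBSET('other_replica_executed_gtid_set', @@global.gtid_executed);

Run this against each surviving replica's set; the candidate is safe to promote when it returns 1 for all of them.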
Best Practices
- Always use GTID-based replication for simpler failover
- Enable super_read_only on slaves to prevent accidental writes
- Monitor Seconds_Behind_Master and alert at 30 seconds
- Test failover procedures quarterly in staging
- Keep binary log retention at 7 days minimum for recovery flexibility
- Use SSL/TLS for replication traffic between data centers
- Consider semi-synchronous replication for zero data loss requirements
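Semi-synchronous replication, mentioned above, is enabled through the plugins bundled with MySQL. A sketch using the pre-8.0.26 plugin names (from 8.0.26 onward the rpl_semi_sync_source / rpl_semi_sync_replica variants apply):

-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- On each replica:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

With this enabled, the master waits for at least one replica to acknowledge receipt of each transaction before committing to the client.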
Properly configured replication transforms a single point of failure into a resilient, scalable database tier. For complex multi-region setups, our infrastructure management team can design and implement the right topology for your workload.