AWS RDS Read Replica lag occurs because it uses asynchronous replication and cannot keep up with the primary DB instance
Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.
Today let us find the cause of replica lag when using Amazon RDS for MySQL.
AWS RDS Read Replica lag
With Amazon RDS for MySQL read replica with binary log file position-based replication, we can monitor replication lag.
Generally, MySQL replication works with three threads: the Binlog Dump thread, the IO_THREAD, and the SQL_THREAD.
If there is a delay in replication, we can check if the replica IO_THREAD or the replica SQL_THREAD causes the lag.
Then, we can identify the root cause of the lag.
To identify the replication thread that lags, our Support Techs shows the following examples.
1. Initially, on the primary DB instance, we run the following command and review the output:
mysql> SHOW MASTER STATUS; +----------------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +----------------------------+----------+--------------+------------------+-------------------+ | mysql-bin-changelog.066552| 521 | | | | +----------------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec)
2. Then we run the SHOW SLAVE STATUS command on the replica DB instance and review the output:
mysql> SHOW SLAVE STATUS\G; *************************** 1. row *************************** Master_Log_File: mysql-bin.066548 Read_Master_Log_Pos: 10050480 Relay_Master_Log_File: mysql-bin.066548 Exec_Master_Log_Pos: 10050300 Slave_IO_Running: Yes Slave_SQL_Running: Yes
mysql> SHOW SLAVE STATUS\G *************************** 1. row *************************** Master_Log_File: mysql-bin.066552 Read_Master_Log_Pos: 430 Relay_Master_Log_File: mysql-bin.066530 Exec_Master_Log_Pos: 50360 Slave_IO_Running: Yes Slave_SQL_Running: Yes
Normally, IO_THREAD doesn’t cause large replication delays.
However, if the replica SQL_THREAD is the source of replication delays, the cause could be the following:
Long-running queries on the primary DB instance
Long-running queries that take an equal amount of time to run on the replica DB instance can increase seconds_behind_master.
For instance, say we initiate a change. If it takes an hour to run then the lag is one hour. Since the change takes the same time to complete on the replica, the total lag is approximately two hours.
This is a common delay. However, if we monitor the slow query log on the primary instance we can minimize this lag.
In addition, we can identify and break long-running statements to reduce lag.
Insufficient DB instance class size or storage
If the storage configuration or the replica DB instance class is lower than the primary, then the replica might struggle with insufficient resources.
So, we need to ensure that the DB instance type of the replica is the same or higher than the primary DB instance.
Parallel queries run on the primary DB instance
When a high volume of writes occurs to the source DB instance in parallel, the writes to the read replica are serialized using a single SQL_THREAD.
This can create a lag between them both.
Binary logs synced to the disk on the replica DB instance
Enabling automatic backups results in overhead to sync the binary logs to the disk on the replica.
By default, the value of the parameter sync_binlog is set to 1. Changing this value to 0 disables the synchronization of the binary log to the disk by the MySQL server.
Binlog_format is set to ROW
There can be instances when the binlog_format is set to ROW. The source table may also miss a primary key.
To resolve this in the short term, we can change the search algorithm to INDEX_SCAN,HASH_SCAN.
For the long-term resolution, our Support Techs recommend adding an explicit primary key to each table.
Replica creation lag
Amazon RDS creates a read replica of a MySQL primary instance by taking a DB snapshot. Then restores it to create a new DB instance (replica) and establishes replication between the two.
Most often, when a replication establishes, there is a lag to create a backup of the primary.
To minimize this lag, before we call for the replica action we can create a manual backup. Thus, the snapshot taken by the replica creation process is an incremental backup, which is faster.
We can also enable the InnoDB cache warming feature. This will provide performance gains by saving the current state of the buffer pool.
[Found it useful? Here is a lot more to explore]
In short, we saw how our Support Techs go about AWS RDS Read Replica lag.