Bobcares

AWS RDS Read Replica lag

by | Oct 30, 2021

AWS RDS Read Replica lag occurs because it uses asynchronous replication and cannot keep up with the primary DB instance

Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.

Today let us find the cause of replica lag when using Amazon RDS for MySQL.

 

AWS RDS Read Replica lag

With Amazon RDS for MySQL read replica with binary log file position-based replication, we can monitor replication lag.

Generally, MySQL replication works with three threads: the Binlog Dump thread, the IO_THREAD, and the SQL_THREAD.

If there is a delay in replication, we can check if the replica IO_THREAD or the replica SQL_THREAD causes the lag.

Then, we can identify the root cause of the lag.

To identify the replication thread that lags, our Support Techs shows the following examples.

1. Initially, on the primary DB instance, we run the following command and review the output:

mysql> SHOW MASTER STATUS;
+----------------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------------+----------+--------------+------------------+-------------------+
| mysql-bin-changelog.066552| 521 | | | |
+----------------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

2. Then we run the SHOW SLAVE STATUS command on the replica DB instance and review the output:

Example 1:

mysql> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
Master_Log_File: mysql-bin.066548
Read_Master_Log_Pos: 10050480
Relay_Master_Log_File: mysql-bin.066548
Exec_Master_Log_Pos: 10050300
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Example 2:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Master_Log_File: mysql-bin.066552
Read_Master_Log_Pos: 430
Relay_Master_Log_File: mysql-bin.066530
Exec_Master_Log_Pos: 50360
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Normally, IO_THREAD doesn’t cause large replication delays.

However, if the replica SQL_THREAD is the source of replication delays, the cause could be the following:

  • Long-running queries on the primary DB instance

Long-running queries that take an equal amount of time to run on the replica DB instance can increase seconds_behind_master.

For instance, say we initiate a change. If it takes an hour to run then the lag is one hour. Since the change takes the same time to complete on the replica, the total lag is approximately two hours.

This is a common delay. However, if we monitor the slow query log on the primary instance we can minimize this lag.

In addition, we can identify and break long-running statements to reduce lag.

  • Insufficient DB instance class size or storage

If the storage configuration or the replica DB instance class is lower than the primary, then the replica might struggle with insufficient resources.

So, we need to ensure that the DB instance type of the replica is the same or higher than the primary DB instance.

  • Parallel queries run on the primary DB instance

When a high volume of writes occurs to the source DB instance in parallel, the writes to the read replica are serialized using a single SQL_THREAD.

This can create a lag between them both.

  • Binary logs synced to the disk on the replica DB instance

Enabling automatic backups results in overhead to sync the binary logs to the disk on the replica.

By default, the value of the parameter sync_binlog is set to 1. Changing this value to 0 disables the synchronization of the binary log to the disk by the MySQL server.

  • Binlog_format is set to ROW

There can be instances when the binlog_format is set to ROW. The source table may also miss a primary key.

To resolve this in the short term, we can change the search algorithm to INDEX_SCAN,HASH_SCAN.

For the long-term resolution, our Support Techs recommend adding an explicit primary key to each table.

  • Replica creation lag

Amazon RDS creates a read replica of a MySQL primary instance by taking a DB snapshot. Then restores it to create a new DB instance (replica) and establishes replication between the two.

Most often, when a replication establishes, there is a lag to create a backup of the primary.

To minimize this lag, before we call for the replica action we can create a manual backup. Thus, the snapshot taken by the replica creation process is an incremental backup, which is faster.

We can also enable the InnoDB cache warming feature. This will provide performance gains by saving the current state of the buffer pool.

[Found it useful? Here is a lot more to explore]

 

Conclusion

In short, we saw how our Support Techs go about AWS RDS Read Replica lag.

PREVENT YOUR SERVER FROM CRASHING!

Never again lose customers to poor server speed! Let us help you.

Our server experts will monitor & maintain your server 24/7 so that it remains lightning fast and secure.

GET STARTED

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

Never again lose customers to poor
server speed! Let us help you.