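## A Strange Source-Replica Replication Problem
While correcting a customer's transaction isolation level today, I ran into something odd: on the replica of a source-replica setup, some replication threads reported a negative Time value.
1. Symptom
Of the four worker threads in the processlist, three show a negative Time and one shows a positive Time. The output: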
mysql> show full processlist;
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
| Id | User | Host | db | Command | Time | State | Info |
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
| 5 | system user | connecting host | NULL | Connect | 18183371 | Waiting for source to send event | NULL |
| 6 | system user | | NULL | Query | 0 | Replica has read all relay log; waiting for more updates | NULL |
| 7 | system user | | NULL | Query | -494 | Waiting for an event from Coordinator | NULL |
| 8 | system user | | NULL | Query | -490 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Query | -440 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Query | 6184 | Waiting for an event from Coordinator | NULL |
| 1265587 | root | localhost | NULL | Query | 0 | init | show full processlist |
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
7 rows in set (0.00 sec)
First, what the State value means:
Waiting for an event from Coordinator:
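On a multithreaded replica (replica_parallel_workers or slave_parallel_workers greater than 1), this state means that one of the replica worker threads is waiting for an event from the coordinator thread. From MySQL 8.0.26, use replica_parallel_workers instead of slave_parallel_workers. From MySQL 8.0.27 the default value is 4, so replicas are multithreaded by default. The variable enables multithreading on the replica and sets the number of applier threads used to execute replication transactions in parallel. When the value is greater than 1, the replica is a multithreaded replica with the specified number of applier threads, plus one coordinator thread to manage them. If you use multiple replication channels, each channel has this number of threads.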
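To connect this description to the processlist shown earlier, here is a minimal sketch that groups the replication threads by their State string. The State strings and thread Ids are copied from that output; `ROLE_BY_STATE` and `classify` are hypothetical names used only for illustration, and the role labels are shorthand, not MySQL identifiers.

```python
# State strings copied from the processlist output in section 1.
# The role labels are this sketch's own shorthand.
ROLE_BY_STATE = {
    "Waiting for source to send event": "receiver (I/O) thread",
    "Replica has read all relay log; waiting for more updates": "coordinator (SQL) thread",
    "Waiting for an event from Coordinator": "worker thread",
}

def classify(rows):
    """Map each (thread_id, state) pair to a replication role."""
    return {thread_id: ROLE_BY_STATE.get(state, "other") for thread_id, state in rows}

rows = [
    (5, "Waiting for source to send event"),
    (6, "Replica has read all relay log; waiting for more updates"),
    (7, "Waiting for an event from Coordinator"),
    (8, "Waiting for an event from Coordinator"),
    (9, "Waiting for an event from Coordinator"),
    (10, "Waiting for an event from Coordinator"),
]

roles = classify(rows)
worker_count = sum(1 for role in roles.values() if role == "worker thread")

for thread_id, role in roles.items():
    print(thread_id, role)
print("workers:", worker_count)
```

One coordinator plus four workers is consistent with replica_parallel_workers being 4 on a single channel, although the post does not show the variable's actual value.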
Next, the definition of Time:
Time:
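The time, in seconds, that the thread has been in its current state. For a replica SQL thread, the value is the number of seconds between the timestamp of the last replicated event and the real time of the replica host.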
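The definition amounts to a subtraction of two clock readings. The sketch below models it under that reading of the documentation; it is not MySQL's implementation, and `displayed_time` and all timestamps are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def displayed_time(event_timestamp, replica_now):
    """Seconds between the last replicated event's timestamp and the replica's clock."""
    return int((replica_now - event_timestamp).total_seconds())

# Hypothetical reading of the replica host's clock.
replica_now = datetime(2023, 8, 11, 18, 0, 0)

# A worker whose last event was stamped 2 seconds before the replica's "now".
just_applied = displayed_time(replica_now - timedelta(seconds=2), replica_now)

# A worker that has had nothing to apply for an hour.
long_idle = displayed_time(replica_now - timedelta(hours=1), replica_now)

print(just_applied, long_idle)
```

When both timestamps come from clocks that agree, the result is zero or positive and grows while a worker sits idle.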
The official documentation explains the replication-related Time value in more detail, for anyone interested:
https://dev.mysql.com/doc/refman/8.0/en/replication-threads-monitor-worker.html
A positive value makes sense, but why would it be negative?
Checking the replication status further: both threads report Yes, and SQL_Delay is 0.
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
SQL_Delay: 0
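2. Troubleshooting approach
Time is derived from clock readings, so we started by checking everything time-related on the database hosts: the time zone, the now() function, and the operating system time. Sure enough, the system clocks of the two hosts differed considerably.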
[root@ws01 ~]# date
Fri Aug 11 18:25:22 AEST 2023
[root@dbreplica1 ~]# date
Fri Aug 11 18:17:08 AEST 2023
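The size of the gap can be worked out from the two `date` outputs. This is a rough sketch under two assumptions that the post does not state: that both commands were run at nearly the same moment, and that ws01 is the source while dbreplica1 is the replica. `parse_date_output` is a hypothetical helper; it drops the zone abbreviation because both hosts report AEST.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"

def parse_date_output(text):
    """Parse `date` output after dropping the zone abbreviation (5th field)."""
    parts = text.split()
    del parts[4]  # 'AEST' on both hosts, so it cancels out in the subtraction
    return datetime.strptime(" ".join(parts), FMT)

# Assumption: ws01 is the source, dbreplica1 is the replica.
source_clock = parse_date_output("Fri Aug 11 18:25:22 AEST 2023")
replica_clock = parse_date_output("Fri Aug 11 18:17:08 AEST 2023")

# How far the replica's clock trails the source's clock, in seconds.
skew = int((source_clock - replica_clock).total_seconds())
print(skew)

# An event stamped "now" by the source, measured against the replica's clock.
time_for_fresh_event = int((replica_clock - source_clock).total_seconds())
print(time_for_fresh_event)
```

Under these assumptions the gap is 8 minutes 14 seconds, and a freshly applied event would show a Time of about that many seconds below zero, which is in line with the negative values in the first processlist output.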
3. Fix
Reset the clock on the host that was running behind:
date -s "Fri Aug 11 18:26:00 AEST 2023"
Checking again, the values are back to normal:
mysql> show full processlist;
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
| Id | User | Host | db | Command | Time | State | Info |
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
| 5 | system user | connecting host | NULL | Connect | 18184520 | Waiting for source to send event | NULL |
| 6 | system user | | NULL | Query | 0 | Replica has read all relay log; waiting for more updates | NULL |
| 7 | system user | | NULL | Query | 0 | Waiting for an event from Coordinator | NULL |
| 8 | system user | | NULL | Query | 2 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Query | 362 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Query | 7333 | Waiting for an event from Coordinator | NULL |
| 1265590 | root | localhost | NULL | Query | 0 | init | show full processlist |
+---------+-------------+-----------------+------+---------+----------+----------------------------------------------------------+-----------------------+
7 rows in set (0.00 sec)
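4. Question to think about
Given the clock difference, why was one of the worker threads still showing a positive value? Follow along for more.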