适用范围
MySQL 8.0 MGR (MySQL Group Replication)架构。
问题概述
客户一套3节点 MySQL8.0 MGR架构,单主模式,优化参数后进行一个从节点重启的动作,突然从节点无法启动,报错如下:
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
问题分析
在进行从节点加入集群的过程中(start group_replication;)报错,具体信息提示需要看错误日志(Please see more details on error log)。
根据错误提示信息,继续分析错误日志中的内容:
error log中的主要错误信息提示如下:
2024-03-28T20:22:39.591931+08:00 0 [ERROR] [MY-011526] [Repl] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebfe:1-9,906db11f-e5a4-11ed-b2b2-ec2a7240dcd0:1-55238036:56040567-56064196 > Group transactions: 906db11f-e5a4-11ed-b2b2-ec2a7240dcd0:1-55238036:56040567-56064196'
2024-03-28T20:22:39.591998+08:00 0 [ERROR] [MY-011522] [Repl] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
日志中只有两个错误信息 [ERROR] [MY-011526] 和 [ERROR] [MY-011522],解读如下:
[MY-011526]
Plugin group_replication reported: ‘This member has more executed transactions than those present in the group. Local transactions: 4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebfe:1-9,906db11f-e5a4-11ed-b2b2-ec2a7240dcd0:1-55238036:56040567-56064196 > Group transactions: 906db11f-e5a4-11ed-b2b2-ec2a7240dcd0:1-55238036:56040567-56064196’
2024-03-28T20:22:39.591998+08:00 0 [ERROR] [MY-011522] [Repl] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.
解读报错信息,可以看到节点无法加入集群的原因为本地的GTID值 大于 MGR集群的GTID值。
本地实例的server_uuid为:4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebfe
集群的group_replication_group_name为:906db11f-e5a4-11ed-b2b2-ec2a7240dcd0
因此这里提示本地从节点的事务(Local transactions)大于 集群事务(Group transactions),在这里我们分析错误日志中的关键字是:has more executed transactions than those present in the group 还有后面的大于号 > 有了这些,也就基本上很容易分析出原因了。
[MY-011522]
Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.
解读报错信息,该成员中包含集群中不存在的事务,成员将脱离集群。
问题原因
正常MGR集群的从节点是只读的,Primary主节点的GTID只更新group_replication_group_name的GTID而从节点自己的server_uuid的GTID不会更新,除非当前节点已经被剔除有人直接在从库上进行数据写入或者人为关闭只读去执行了操作,才会导致从节点自己的本地事务GTID信息发生更新。
通过与客户沟通并进一步确认,从节点确实有业务连接且产生的本地事务,有一部分应用连接访问从节点,从节点被写入数据,导致从节点的本地GTID发生了更新。
解决方案
- 客户想重新加入集群,故如果能确保本地事务产生的数据不对集群产生影响,可以采用purge_gtid的方式来忽略本地事务(谨慎操作,因为一般本地事务产生的数据变更客户很难识别正确)。
SET GLOBAL GTID_PURGED ="xxxx";
- 重建MGR从节点
(最终客户经过沟通确认,担心本地数据会污染整个集群,保险起见进行节点的重建。)
--确认克隆插件是否安装
select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| PLUGIN_NAME | PLUGIN_STATUS |
+-------------+---------------+
| clone | ACTIVE |
+-------------+---------------+
--克隆插件安装方式
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
--配置克隆源
SET GLOBAL clone_valid_donor_list = '10.294.67.111:3306';
--执行克隆操作
CLONE INSTANCE FROM 'repluser'@'10.294.67.111':3306 IDENTIFIED BY 'yourpassword';
--查看克隆进度,会先删除数据,再进行拷贝
select stage, state, end_time from performance_schema.clone_progress;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 1367
Current database: *** NONE ***
+-----------+-----------+----------------------------+
| STAGE | STATE | END_TIME |
+-----------+-----------+----------------------------+
| DROP DATA | Completed | 2024-03-28 20:36:39.706373 |
| FILE COPY | Completed | 2024-03-28 20:36:39.385411 |
| PAGE COPY | Completed | 2024-03-28 20:36:39.798611 |
| REDO COPY | Completed | 2024-03-28 20:36:40.106462 |
| FILE SYNC | Completed | 2024-03-28 20:37:01.364659 |
| RESTART | Completed | 2024-03-28 20:38:01.678312 |
| RECOVERY | Completed | 2024-03-28 20:38:02.450064 |
+-----------+-----------+----------------------------+
7 rows in set (0.01 sec)
--克隆完成之后,从节点会自己加入集群
mysql> select * from performance_schema.replication_group_members;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 1368
Current database: *** NONE ***
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebfe | 10.294.67.111 | 3306 | ONLINE | SECONDARY | 8.0.28 |
| group_replication_applier | 4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebff | 10.294.67.112 | 3306 | ONLINE | PRIMARY | 8.0.28 |
| group_replication_applier | 4c7e5ca1-ecfc-11ee-8f06-ec2a7240ebfg | 10.294.67.113 | 3306 | ONLINE | SECONDARY | 8.0.28 |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
参考文档
https://dev.mysql.com/doc/refman/8.0/en/group-replication-replication-group-member-stats.html