GreatSQL MGR 集群故障转移和恢复场景

2023年 8月 31日 62.7k 0

这篇博客文章将关注使用shell,完成GreatSQL MGR 集群的故障转移和恢复场景。

一、GreatSQL MGR 集群故障转移

通过MySQLShell连接到第一个集群(“GreatSQLMGR”)的任何节点,并获取详细信息。


c=dba.get_cluster();
c.status();

现在执行从实例(“172.26.170.14:3306”)到(“172.26.170.13:3306”)的主切换。


c=dba.get_cluster();
c.set_primary_instance('172.26.170.13:3306');

实例(“172.26.170.13:3306”)将显示状态为 primary

c.status();

二、如何重新加入丢失的实例

如果由于某种原因,一个实例离开集群或失去连接,并且不能自动重新加入集群,我们可能需要通过发出“Cluster.rejoinInstance(实例)“命令。这里,我们将尝试创建一个小示例来演示该命令的用法。

  • 通过停止实例上的组复制来创建一些阻止程序(“172.26.170.15:3306”)。

<code >
MySQL 172.26.170.15:3306 ssl Py > sql
Switching to SQL mode... Commands end with ;
MySQL 172.26.170.15:3306 ssl SQL > stop group_replication;

查看下面的信息,我们可以看到实例(“172.26.170.15:3306”)显示“丢失”状态。

现在,用rejoin_instance()命令再次添加实例(“172.26.170.15:3306”)。

MySQL 172.26.170.13:3306 ssl Py > c=dba.get_cluster();
MySQL 172.26.170.13:3306 ssl Py > c.rejoin_instance('172.26.170.15:3306');
Rejoining instance '172.26.170.15:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.15:3306' was successfully rejoined to the cluster.

三、如何从重大故障中恢复整个集群

有时,即使所有节点都已启动,并且发生了一些内部问题,如组复制停滞或一些网络问题,您也可能会遇到完全中断,无法在群集上执行任何写入/活动。

在这种情况下,您可以使用任何一个节点并使用其元数据来恢复群集。您需要连接到最新的实例,否则,您可能会丢失数据或在集群节点中出现不一致。

让我们看看如何手动引入这种情况,然后尝试修复它。

<code >MySQL 172.26.170.13:3306 ssl andy SQL > stop group_replication;
MySQL 172.26.170.14:3306 ssl andy SQL > stop group_replication;
MySQL 172.26.170.15:3306 ssl andy SQL > stop group_replication;

此时,MGR集群的所有节点都处于 OFFLINE ,不可写状态

随机连接一个MGR节点,执行 reboot_cluster_from_complete_outage(),

每个输入项为 y,可以看到当执行命令的节点GTID非最多时,会明确提示你在哪个正确节点(GTID最多的节点)执行启动MGR集群命令。提示信息如下RuntimeError: Dba.reboot_cluster_from_complete_outage: The active session instance (172.26.170.14:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: '172.26.170.13:3306'

<code >MySQL 172.26.170.14:3306 ssl Py > dba.reboot_cluster_from_complete_outage()
Restoring the default cluster from complete outage...

The instance '172.26.170.13:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance '172.26.170.15:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Traceback (most recent call last):
File "", line 1, in
RuntimeError: Dba.reboot_cluster_from_complete_outage: The active session instance (172.26.170.14:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: '172.26.170.13:3306'.

GTID最多的节点 执行启动MGR集群命令

<code >MySQL 172.26.170.13:3306 ssl Py > dba.reboot_cluster_from_complete_outage()
Restoring the default cluster from complete outage...

The instance '172.26.170.14:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance '172.26.170.15:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

172.26.170.13:3306 was restored.
Rejoining '172.26.170.14:3306' to the cluster.
Rejoining instance '172.26.170.14:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.14:3306' was successfully rejoined to the cluster.

Rejoining '172.26.170.15:3306' to the cluster.
Rejoining instance '172.26.170.15:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.15:3306' was successfully rejoined to the cluster.

The cluster was successfully rebooted.

查看MGR集群状态,ok!

<code >MySQL 172.26.170.13:3306 ssl SQL > SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, IF(global_status.VARIABLE_NAME IS NOT NULL, 'PRIMARY', 'SECONDARY') AS MEMBER_ROLE FROM performance_schema.replication_group_members LEFT JOIN performance_schema.global_status ON global_status.VARIABLE_NAME = 'group_replication_primary_member' AND global_status.VARIABLE_VALUE = replication_group_members.MEMBER_ID;
+--------------------------------------+---------------+-------------+--------------+-------------+
| MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+---------------+-------------+--------------+-------------+
| 3ad980b6-3e47-11ee-84b3-005056b261a4 | 172.26.170.13 | 3306 | ONLINE | PRIMARY |
| 63b7cca8-3e47-11ee-a9a0-005056b23879 | 172.26.170.14 | 3306 | ONLINE | SECONDARY |
| 63ce874f-3e47-11ee-832d-005056b22d45 | 172.26.170.15 | 3306 | ONLINE | SECONDARY |
+--------------------------------------+---------------+-------------+--------------+-------------+
3 rows in set (0.0012 sec)

相关文章

Oracle如何使用授予和撤销权限的语法和示例
Awesome Project: 探索 MatrixOrigin 云原生分布式数据库
下载丨66页PDF,云和恩墨技术通讯(2024年7月刊)
社区版oceanbase安装
Oracle 导出CSV工具-sqluldr2
ETL数据集成丨快速将MySQL数据迁移至Doris数据库

发布评论