After an emergency failover, there is a risk that the transaction sets differ between parts of the ClusterSet, so you must fence the cluster either from write traffic or from all traffic.
If a network partition occurs, a split-brain situation can arise in which instances lose synchronization and cannot communicate correctly to agree on a synchronization state. A split-brain can occur, for example, if a DBA forces the election of a replica cluster as the new primary cluster, creating more than one primary cluster.
In this situation, the DBA can choose to fence the original primary cluster from:
- Writes.
- All traffic.
Three fencing operations are available:
- .fenceWrites(): stops write traffic to the primary cluster of a ClusterSet. Replica clusters do not accept writes, so this operation has no effect on them. As of 8.0.31, it can be used on an invalidated replica cluster. Also, if super_read_only is disabled on a replica cluster, this operation enables it.
- .unfenceWrites(): resumes write traffic. This operation can be run on a cluster that was previously fenced from write traffic with .fenceWrites(). It is not possible to use .unfenceWrites() on a replica cluster.
- .fenceAllTraffic(): fences a cluster from all traffic. If you fence a cluster from all traffic with .fenceAllTraffic(), you must then reboot it with the dba.rebootClusterFromCompleteOutage() MySQL Shell command. For more information on dba.rebootClusterFromCompleteOutage(), see Section 7.8.3, "Rebooting a Cluster from a Major Outage".
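As a quick reference, the pairing of each fencing operation with its documented recovery path can be sketched as a small lookup helper. This is an illustrative Python sketch, not part of MySQL Shell's AdminAPI; the operation names and recovery commands are taken from the descriptions above:

```python
# Illustrative reference table for the two fencing operations and the
# command documented to undo each of them (not AdminAPI code).
FENCING_OPERATIONS = {
    "fenceWrites": {
        "scope": "write traffic only",
        "recovery": "Cluster.unfenceWrites()",
    },
    "fenceAllTraffic": {
        "scope": "all traffic",
        "recovery": "dba.rebootClusterFromCompleteOutage()",
    },
}

def recovery_command(operation: str) -> str:
    """Return the documented command that undoes a fencing operation."""
    return FENCING_OPERATIONS[operation]["recovery"]
```

Note that the two operations recover differently: fencing writes is reversed in place, while fencing all traffic stops Group Replication and therefore requires a full cluster reboot.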
Issuing .fenceWrites() on a replica cluster returns an error:

ERROR: Unable to fence Cluster from write traffic:
operation not permitted on REPLICA Clusters
Cluster.fenceWrites: The Cluster '' is a REPLICA Cluster
of the ClusterSet '' (MYSQLSH 51616)
Although fencing is mainly used on clusters belonging to a ClusterSet, you can also fence a standalone cluster from all traffic with .fenceAllTraffic().
- To fence a primary cluster from write traffic, issue .fenceWrites(). After you run the command:
  - Automatic super_read_only management on the cluster is disabled.
  - super_read_only is enabled on all instances in the cluster.
  - All applications are blocked from performing writes on the cluster.

cluster.fenceWrites()
The Cluster 'primary' will be fenced from write traffic

* Disabling automatic super_read_only management on the Cluster...
* Enabling super_read_only on '127.0.0.1:3311'...
* Enabling super_read_only on '127.0.0.1:3312'...
* Enabling super_read_only on '127.0.0.1:3313'...

NOTE: Applications will now be blocked from performing writes on Cluster 'primary'.
Use .unfenceWrites() to resume writes if you are certain a split-brain is not in effect.

Cluster successfully fenced from write traffic
- To check whether you have fenced the primary cluster from write traffic, use the clusterset.status() command.
The output is similar to the following:

clusterset.status()
{
    "clusters": {
        "primary": {
            "clusterErrors": [
                "WARNING: Cluster is fenced from Write traffic. Use cluster.unfenceWrites() to unfence the Cluster."
            ],
            "clusterRole": "PRIMARY",
            "globalStatus": "OK_FENCED_WRITES",
            "primary": null,
            "status": "FENCED_WRITES",
            "statusText": "Cluster is fenced from Write Traffic."
        },
        "replica": {
            "clusterRole": "REPLICA",
            "clusterSetReplicationStatus": "OK",
            "globalStatus": "OK"
        }
    },
    "domainName": "primary",
    "globalPrimaryInstance": null,
    "primaryCluster": "primary",
    "status": "UNAVAILABLE",
    "statusText": "Primary Cluster is fenced from write traffic."
}
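Because clusterset.status() returns JSON, a monitoring script can detect the fenced state programmatically. Below is a minimal Python sketch; the abbreviated payload and the field names (globalStatus, status) follow the example output above:

```python
import json

def fenced_clusters(status_json: str) -> list:
    """Return names of clusters reporting a fenced-from-writes status."""
    status = json.loads(status_json)
    return [
        name
        for name, info in status.get("clusters", {}).items()
        if info.get("globalStatus") == "OK_FENCED_WRITES"
        or info.get("status") == "FENCED_WRITES"
    ]

# Abbreviated payload matching the clusterset.status() example above.
example = json.dumps({
    "clusters": {
        "primary": {"globalStatus": "OK_FENCED_WRITES", "status": "FENCED_WRITES"},
        "replica": {"clusterRole": "REPLICA", "globalStatus": "OK"},
    },
    "status": "UNAVAILABLE",
})
print(fenced_clusters(example))  # ['primary']
```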
- To unfence the cluster and resume write traffic to the primary cluster, issue .unfenceWrites(). Automatic super_read_only management is re-enabled on the cluster, and super_read_only is disabled on the primary instance:

cluster.unfenceWrites()
The Cluster 'primary' will be unfenced from write traffic

* Enabling automatic super_read_only management on the Cluster...
* Disabling super_read_only on the primary '127.0.0.1:3311'...

Cluster successfully unfenced from write traffic
- To fence a cluster from all traffic, issue .fenceAllTraffic(). super_read_only is enabled on the primary instance of the cluster, and offline_mode is enabled on all instances in the cluster:

cluster.fenceAllTraffic()
The Cluster 'primary' will be fenced from all traffic

* Enabling super_read_only on the primary '127.0.0.1:3311'...
* Enabling offline_mode on the primary '127.0.0.1:3311'...
* Enabling offline_mode on '127.0.0.1:3312'...
* Stopping Group Replication on '127.0.0.1:3312'...
* Enabling offline_mode on '127.0.0.1:3313'...
* Stopping Group Replication on '127.0.0.1:3313'...
* Stopping Group Replication on the primary '127.0.0.1:3311'...

Cluster successfully fenced from all traffic
- To unfence a cluster from all traffic, use the dba.rebootClusterFromCompleteOutage() MySQL Shell command. After the cluster is restored, you can rejoin each instance to the cluster by answering Y when asked whether you want to rejoin it:

cluster = dba.rebootClusterFromCompleteOutage()
Restoring the cluster 'primary' from complete outage...

The instance '127.0.0.1:3312' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: Y

The instance '127.0.0.1:3313' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: Y

* Waiting for seed instance to become ONLINE...
127.0.0.1:3311 was restored.
Rejoining '127.0.0.1:3312' to the cluster.
Rejoining instance '127.0.0.1:3312' to cluster 'primary'...
The instance '127.0.0.1:3312' was successfully rejoined to the cluster.
Rejoining '127.0.0.1:3313' to the cluster.
Rejoining instance '127.0.0.1:3313' to cluster 'primary'...
The instance '127.0.0.1:3313' was successfully rejoined to the cluster.
The cluster was successfully rebooted.