MySQL PXC Cluster Flow Control
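1. What is flow control (FC)? How does it work?
A node receives write sets and arranges them in global order. Transactions that have been received but not yet applied and committed are held in the node's receive queue.
When the receive queue grows to a certain size, flow control is triggered: replication is paused and the node works through the tasks already in its receive queue.
Once the receive queue has shrunk to a manageable size, replication resumes.
Flow control is common to all Galera-based cluster systems.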
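The pause/resume cycle described above can be sketched as a toy simulation. This is only an illustrative model, not Galera's actual implementation: the function name simulate_flow_control, the per-tick arrival and apply rates, and the high/low thresholds are all assumptions made up for the example.

```python
def simulate_flow_control(arrivals, apply_rate, high, low):
    """Toy model of a receive queue guarded by flow control.

    arrivals   -- write sets the cluster would send at each tick (hypothetical)
    apply_rate -- write sets this node can apply per tick (hypothetical)
    high, low  -- assumed thresholds: pause at >= high, resume at <= low
    Returns a list of (queue_length, paused) tuples, one per tick.
    """
    queue = 0
    paused = False
    history = []
    for incoming in arrivals:
        if not paused:
            # Replication is running, so new write sets enter the queue.
            queue += incoming
        # While paused, writers are held back and nothing new arrives;
        # the node keeps applying what it already has in either state.
        queue = max(0, queue - apply_rate)
        if not paused and queue >= high:
            paused = True    # queue too long: replication is paused
        elif paused and queue <= low:
            paused = False   # queue drained enough: replication resumes
        history.append((queue, paused))
    return history
```

For example, with arrivals of 5 write sets per tick, an apply rate of 2, and thresholds high=8 and low=3, the queue grows until the pause kicks in and then drains back down before replication resumes.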
2. What happens during flow control, and which global status values let us observe it?
mysql> show global status like 'wsrep_flow%';
+----------------------------------+----------------+
| Variable_name                    | Value          |
+----------------------------------+----------------+
| wsrep_flow_control_paused_ns     | 0              |
| wsrep_flow_control_paused        | 0.000000       |
| wsrep_flow_control_sent          | 0              |
| wsrep_flow_control_recv          | 0              |
| wsrep_flow_control_interval      | [ 1024, 1024 ] |
| wsrep_flow_control_interval_low  | 1024           |
| wsrep_flow_control_interval_high | 1024           |
| wsrep_flow_control_status        | OFF            |
+----------------------------------+----------------+
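wsrep_flow_control_paused_ns
Total time, in nanoseconds, that replication has been paused by flow control (this can show up on any node, so it is not a good monitoring item)
wsrep_flow_control_paused
Fraction of time (0.0 to 1.0, initial value 0.0) that replication was paused by flow control since the counters were last reset
(FLUSH STATUS according to the Galera documentation). Ideally it stays close to 0.0;
when it gets fairly large (above 0.6), we need to take some action (add a new node, remove the slow node, or raise wsrep_slave_threads)
to relieve flow control
wsrep_flow_control_sent
Number of flow control events the local node has sent to the cluster. It works well as a monitoring item for identifying which node is triggering flow control
wsrep_flow_control_recv
Number of flow control events the local node has received from the cluster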
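wsrep_flow_control_interval
The lower and upper limits of the receive queue used by flow control, shown as [ low, high ]
wsrep_flow_control_interval_low
The lower limit: once paused, replication resumes when the queue drops back to this value
wsrep_flow_control_interval_high
The upper limit: flow control is triggered when the queue reaches this value
wsrep_flow_control_status
Whether flow control is currently in effect on the node (ON) or not (OFF)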
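As a rough monitoring sketch built on the values above: one helper flags a node whose wsrep_flow_control_paused exceeds the 0.6 level mentioned earlier, and another compares two samples of wsrep_flow_control_sent per node to guess which node is triggering flow control. The function names, the node labels, and the idea of collecting the counters into plain dictionaries are assumptions for illustration; how the values are actually collected is left out.

```python
PAUSED_ALERT_THRESHOLD = 0.6  # the "fairly large" level used in these notes


def needs_attention(paused, threshold=PAUSED_ALERT_THRESHOLD):
    """True when wsrep_flow_control_paused is above the alert threshold."""
    return paused > threshold


def top_flow_control_sender(prev_sent, curr_sent):
    """Return the node whose wsrep_flow_control_sent grew the most.

    prev_sent, curr_sent -- {node_name: counter} from two sampling rounds
                            (hypothetical structure, one entry per node)
    Returns None when no counter increased between the two rounds.
    """
    deltas = {
        node: curr_sent[node] - prev_sent.get(node, 0)
        for node in curr_sent
    }
    node, growth = max(
        deltas.items(), key=lambda item: item[1], default=(None, 0)
    )
    return node if growth > 0 else None
```

Comparing deltas rather than absolute counters matters here, because a node that caused flow control long ago would otherwise keep looking like the culprit.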
How do we tune flow control?
1. wsrep_slave_threads
The number of threads to use for applying slave write sets.
It sets the number of threads a node uses to apply the write sets it receives from other nodes
mysql> show global variables like 'wsrep_slave_threads';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| wsrep_slave_threads | 24    |
+---------------------+-------+
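The default value of 1 is far from enough; it should be adjusted according to another status value
On the PXC clusters installed at our company this parameter defaults to 16 (which is still not enough)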
2. wsrep_cert_deps_distance
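Average distance between the highest and lowest sequence numbers (seqno) that can be applied in parallel
It indicates roughly how many write sets can be applied in parallel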
mysql> show global status like 'wsrep_cert_deps_distance';
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| wsrep_cert_deps_distance | 95.288623 |
+--------------------------+-----------+
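We can size wsrep_slave_threads using wsrep_cert_deps_distance as a guide; there is little point in going above it
Note: right after an SST this status value is very high and then declines slowly, so during that period it is not a useful reference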
mysql> show global status like 'wsrep_cert_deps_distance';
+--------------------------+-------------+
| Variable_name            | Value       |
+--------------------------+-------------+
| wsrep_cert_deps_distance | 6015.952973 |
+--------------------------+-------------+
1 row in set (0.00 sec)
mysql> show global status like 'wsrep_cert_deps_distance';
+--------------------------+-------------+
| Variable_name            | Value       |
+--------------------------+-------------+
| wsrep_cert_deps_distance | 5840.756210 |
+--------------------------+-------------+
1 row in set (0.00 sec)
mysql> show global status like 'wsrep_cert_deps_distance';
+--------------------------+-------------+
| Variable_name            | Value       |
+--------------------------+-------------+
| wsrep_cert_deps_distance | 5421.252076 |
+--------------------------+-------------+
1 row in set (0.00 sec)
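The sizing rule and the post-SST caveat can be combined into a small helper. This is a sketch under stated assumptions, not an official formula: the function name suggest_applier_threads, the max_threads cap (for example, something tied to the host's CPU count), and the 2% stability tolerance are all hypothetical choices.

```python
def suggest_applier_threads(samples, max_threads, tolerance=0.02):
    """Suggest a wsrep_slave_threads value from wsrep_cert_deps_distance.

    samples     -- consecutive readings of the status value, oldest first
    max_threads -- assumed upper cap chosen by the operator
    tolerance   -- assumed max relative change between consecutive readings
                   for the value to count as "settled"
    Returns None while the value is still drifting (e.g. right after an SST).
    """
    if len(samples) < 2:
        return None  # a single reading cannot show whether it is stable
    for earlier, later in zip(samples, samples[1:]):
        if earlier <= 0 or abs(later - earlier) / earlier > tolerance:
            return None  # still moving: not a usable reference yet
    # Use the latest reading, but stay within 1..max_threads.
    return max(1, min(max_threads, round(samples[-1])))
```

With the three post-SST readings shown above (6015.95, 5840.76, 5421.25) the helper returns None, because the value is still falling by more than the tolerance between readings.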
Other parameters and status values
1. wsrep_local_recv_queue_%
mysql> show global status like 'wsrep_local_recv_queue_avg';
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| wsrep_local_recv_queue_avg | 0.110581 |
+----------------------------+----------+
1 row in set (0.00 sec)
When the node returns a value higher than 0.0 it means that the node cannot apply write-sets as fast as it receives them,
which can lead to replication throttling.
In short: a value above 0.0 means the node is falling behind in applying write sets, which can lead to flow control
mysql> show global status like 'wsrep_local_recv_queue_m%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| wsrep_local_recv_queue_max | 3788  |
| wsrep_local_recv_queue_min | 0     |
+----------------------------+-------+
2 rows in set (0.00 sec)
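To check these values from a script rather than by eye, the mysql client's tabular output can be parsed into a dictionary. The sketch below assumes the output has the two-column shape shown in these notes; the function names parse_status_table and recv_queue_is_backing_up are made up for the example.

```python
def parse_status_table(text):
    """Parse two-column mysql client output into {variable_name: value}.

    Values are kept as strings; the caller converts what it needs.
    """
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and "N rows in set" lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything unexpected
        status[cells[0]] = cells[1]
    return status


def recv_queue_is_backing_up(status):
    """True when wsrep_local_recv_queue_avg is above 0.0."""
    return float(status.get("wsrep_local_recv_queue_avg", "0")) > 0.0
```

Fed the wsrep_local_recv_queue_avg output shown earlier (0.110581), recv_queue_is_backing_up returns True, matching the reading that the node is not applying write sets as fast as it receives them.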