问题现象
用户有一套环境偶发出现:Too many connections 的报错。
以下是通过 sysbench 进行模拟和问题排查的过程
/usr/bin/sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --mysql-user=user3@test3 --mysql-password=xxx\
--mysql-port=2883 --mysql-host=11.x.x.192 --mysql-db=sbtest --tables=10 \
--table-size=1000000 --threads=2000 --report-interval=2 --time=3600 --db-ps-mode=disable run
...
FATAL: unable to connect to MySQL server on host '11.x.x.192', port 2883, aborting...
FATAL: error 1040: Too many connections
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
FATAL: Thread initialization failed!
问题排查
1、有过MySQL运维经验的小伙伴肯定知道在MySQL里有个变量 max_connections, 在 OceanBase mysql模式下也同样有该变量,且含义是一致的,经过确认该环境里 max_connections 变量是默认值且足够大(2147483647),排除该变量的影响。
MySQL [oceanbase]> show variables like '%connect%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_connection | utf8mb4 |
| collation_connection | utf8mb4_general_ci |
| connect_timeout | 10 |
| init_connect | |
| max_connections | 2147483647 |
| max_user_connections | 0 |
+--------------------------+--------------------+
6 rows in set (0.00 sec)
2、压测的入口是先经过 obproxy,确认obproxy的参数 client_max_connections 也是默认值 8192,排除该参数的影响,在obproxy_error.log 日志里我们发现:
2024-02-28 11:03:53.399766,undefined,,,,obcluster:test3:,OB_MYSQL,,,OB_MYSQL_COM_LOGIN,,failed,1040,,553us,283us,0us,222us,Y0-00007FD8D4A55840,,,,0,11.x.x.192:32881,Too many connections,
转化后:
echo "2024-02-28 11:03:53.399766,undefined,,,,obcluster:test3:,OB_MYSQL,,,OB_MYSQL_COM_LOGIN,,failed,1040,,553us,283us,0us,222us,Y0-00007FD8D4A55840,,,,0,11.x.x.192:32881,Too many connections," | tr ',' '\n' | cat -n
1 2024-02-28 11:03:53.399766
2 undefined
3
4
5
6 obcluster:test3:
7 OB_MYSQL
8
9
10 OB_MYSQL_COM_LOGIN
11
12 failed
13 1040
14
15 553us
16 283us
17 0us
18 222us
19 Y0-00007FD8D4A55840
20
21
22
23 0
24 11.x.x.192:32881
25 Too many connections
26
参考 obproxy 的官网文档关于 obproxy_error.log 的字段说明,第 13 个字段表示的是错误码 (1040)
./ob_error 1040
MySQL:
MySQL Error Code: 1040 (08004)
Message: Too many connections
Related OceanBase Error Code:
OB_ERR_CON_COUNT_ERROR(-5271)
第 24 个字段 表示的是 路由到的 OBServer 节点的地址信息 (11.x.x.192:32881),我们在该节点上对应时间段的 observer.log 日志里过滤
grep "\-1040" observer.log.20240228110427566 | vim -
-- 为空
grep "\-5271" observer.log.20240228110427566 | vim -
-- 有输出
找到第一条报 5271 的 traceid ,根据该traceid 过滤后如下:
grep "Y0-000612478A78E738-0-0" observer.log.20240228110427566
[2024-02-28 11:03:53.053494] WDIAG [SERVER] verify_connection (obmp_connect.cpp:1866) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=1][errcode=-5271] too much sessions(ret=-5271, tenant_id=1002, cur_connections=524, max_sess_num=524, tenant_name_=test3, user_name_=user3)
[2024-02-28 11:03:53.053513] WDIAG [SERVER] process (obmp_connect.cpp:333) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=18][errcode=-5271] verify connection fail(ret=-5271)
[2024-02-28 11:03:53.053527] INFO [SERVER] send_error_packet (obmp_packet_sender.cpp:366) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=7] sending error packet(ob_error=-5271, client error=1040, extra_err_info=NULL, lbt()="0x129ef88c 0xa1ff1e0 0xa1ae1c6 0xa1df8d6 0x4e0f201 0x138b70d2 0x138b832a 0x138b88a9 0x12d19651 0x12d1590c 0x7f612080be25 0x7f6120535bad")
[2024-02-28 11:03:53.053545] WDIAG [SERVER] get_session (obmp_packet_sender.cpp:592) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=14][errcode=-4018] get session fail(ret=-4018, sessid=3222191753, proxy_sessid=838471899490236525)
[2024-02-28 11:03:53.053586] WDIAG [SERVER] get_session (obmp_packet_sender.cpp:592) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=1][errcode=-4018] get session fail(ret=-4018, sessid=3222191753, proxy_sessid=838471899490236525)
[2024-02-28 11:03:53.053592] WDIAG [SERVER] disconnect (obmp_packet_sender.cpp:834) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=5][errcode=-4016] session is null
[2024-02-28 11:03:53.053599] INFO [SERVER] process (obmp_connect.cpp:475) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=6] MySQL LOGIN(direct_client_ip="11.x.x.192", client_ip=11.x.x.191, tenant_name=test3, tenant_id=1002, user_name=user3, host_name=xxx.xxx.xxx.xxx, sessid=3222191753, proxy_sessid=838471899490236525, sess_create_time=0, from_proxy=true, from_java_client=false, from_oci_client=false, from_jdbc_client=false, capability=10461837, proxy_capability=916303, use_ssl=false, c/s protocol="OB_2_0_CS_TYPE", autocommit=false, proc_ret=-5271, ret=0, conn->client_type_=3, conn->client_version_=0)
我们重点看以下日志:
[2024-02-28 11:03:53.053494] WDIAG [SERVER] verify_connection (obmp_connect.cpp:1866) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=1][errcode=-5271] too much sessions(ret=-5271, tenant_id=1002, cur_connections=524, max_sess_num=524, tenant_name_=test3, user_name_=user3)
上面的逻辑说明:
- 当 隐藏的参数 _resource_limit_max_session_num 不等于 0 的时候, max_session_num 的取值等于 _resource_limit_max_session_num
- 当 隐藏的参数 _resource_limit_max_session_num 等于 0 的时候,max_session_num = max(100, (unit_memory_size_ * 5 / 100) / (100<<10))
确认当前环境里该隐藏参数是默认值 0
select name,value from gv$ob_parameters where name='_resource_limit_max_session_num' and tenant_id=1002;
到这里我们要看 max_session_num 的取值,就需要看 unit_memory_size 这个变量是如何取值了
不是很明显,这里我们大胆的假设就是对应租户的memory_size 大小,我们确认一下当前测试环境里该租户的内存规格。
# 确认test3租户的规格
MySQL [oceanbase]> select tenant_id from dba_ob_tenants where tenant_name='test3';
+-----------+
| tenant_id |
+-----------+
| 1002 |
+-----------+
# 确认创建test3(1002)租户时使用的memory_size 大小
MySQL [oceanbase]> select round(memory_size/1024/1024/1024,2) as memory_size_gb from dba_ob_units where tenant_id=1002;
+----------------+
| memory_size_gb |
+----------------+
| 2.00 |
| 2.00 |
| 2.00 |
+----------------+
3 rows in set (0.00 sec)
# 确认刨除 meta租户,1002用户租户实际的memory_size大小
MySQL [oceanbase]> select round(memory_size/1024/1024/1024,2) as memory_size_gb from GV$OB_UNITS where tenant_id=1002;
+----------------+
| memory_size_gb |
+----------------+
| 1.00 |
| 1.00 |
| 1.00 |
+----------------+
3 rows in set (0.01 sec)
>>> (2*1024*1024*1024*5/100)/(100<<10)
1048
>>> (1*1024*1024*1024*5/100)/(100<<10)
524
[2024-02-28 11:03:53.053494] WDIAG [SERVER] verify_connection (obmp_connect.cpp:1866) [2930][T1002_MysqlQueu][T1002][Y0-000612478A78E738-0-0] [lt=1][errcode=-5271] too much sessions(ret=-5271, tenant_id=1002, cur_connections=524, max_sess_num=524, tenant_name_=test3, user_name_=user3)
按照代码里面的计算公式结合日志里提示的 max_sess_num=524,我们不难发现代码里这里的 unit_memory_size_ 变量的真正含义应该是指某个具体的用户租户的内存规格,需要刨除掉 meta 租户的内存大小。
至此,我们知道了本次报 Too many connections 的原因:超过了租户的最大并发连接数,具体是跟对应的用户租户的内存规格有关系,详见上面贴出的计算公式,经过和研发老师确认:
- 如果当前租户同时设置了_resource_limit_max_session_num、max_user_connections 和 max_connections 的值,则对于某个普通用户,其并发连接数只要达到其中一个值,即无法再建立新的连接;
- 对于管理员用户,其并发连接数只要达到 max_user_connections 或 max_connections 的其中一个值,即无法再建立新的连接。
说明:
- 感兴趣的小伙伴可以去测试一下 max_connections 、max_user_connections 变量对于租户连接数的控制情况,
- 在没有官方支持特别说明的情况下不建议去调整隐藏的参数:_resource_limit_max_session_num
关于 max_connections 和 max_user_connections 变量的说明详见官网文档:
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000510344
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000510356
--------------------------------------------
遇到问题莫慌,大胆假设,小心求证。
--------------------------------------------