之前遇到一个DBA,在生产库上加字段,导致数据库连接数打满。原因就是MDL锁引起。下面让我来介绍一下MDL锁及其排查和处理方式。
MDL锁:全称meta data lock,是表锁,用于保护数据库对象定义不被修改。执行SQL语句操作表都是需要获取和持有MDL锁,直到锁被释放。
在 MySQL 5.5 版本中引入了 MDL,当对一个表做增删改查操作的时候,加 MDL 读锁;当要对表做结构变更操作的时候,加 MDL 写锁。
- 读锁之间不互斥,因此你可以有多个线程同时对一张表增删改查。
- 读写锁之间、写锁之间是互斥的,用来保证变更表结构操作的安全性。因此,如果有两个线程要同时给一个表加字段,其中一个要等另一个执行完才能开始执行。
案例描述:一位同学查询只有2条数据的test.tt1,10秒后却返回锁等待超时报错。
案例警示:
- 要对生产数据库有敬畏之心
- 业务高峰,不建议对大表和热表做DDL变更(推荐gh-ost)
- 有效的数据库备份,重于一切
- 良好的数据库运维规范是减少故障的基础(DBA风险操作,需要审核和通知业务方)
技术回放:数据库版本、表结构和报错信息,如下
mysql> select version();
+---------------+
| version() |
+---------------+
| 5.7.38-41-log |
+---------------+
1 row in set (0.54 sec)
mysql> show create table tmp_mdl_lock ;
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tmp_mdl_lock | CREATE TABLE `tmp_mdl_lock` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |
+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from test.tmp_mdl_lock;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
拓展:
lock_wait_timeout设置了10秒,因此上面的SQL执行了10秒报锁等待超时错误
如果表 test.tmp_mdl_lock上的查询语句频繁,而且客户端有重试机制,也就是说超时后会再起一个新SESSION再请求的话,这个库的线程很快就会爆满
DBA分析和处理过程:
1.首先查看会话信息,发现Waiting for table metadata lock,确认为DML锁引起
select * from information_schema.processlist where info is not null;
+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO | TIME_MS | ROWS_SENT | ROWS_EXAMINED |
+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+
| 251182208 | root | localhost | NULL | Query | 0 | executing | select * from information_schema.processlist where info is not null | 2 | 0 | 0 |
| 251181173 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from test.tmp_mdl_lock | 3077 | 0 | 0 |
+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+
2.查询持有并导致其他事务阻塞的连接会话
select * from sys.schema_table_lock_waits \G
*************************** 1. row ***************************
object_schema: test
object_name: tmp_mdl_lock
waiting_thread_id: 251885079
waiting_pid: 251181173
waiting_account: root@localhost
waiting_lock_type: SHARED_READ
waiting_lock_duration: TRANSACTION
waiting_query: select * from test.tmp_mdl_lock
waiting_query_secs: 10
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 251884951
blocking_pid: 251181044
blocking_account: root@localhost
blocking_lock_type: SHARED_NO_READ_WRITE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 251181044
sql_kill_blocking_connection: KILL 251181044
1 row in set (0.05 sec)
拓展:sys.schema_table_lock_waits默认为空,请打开mdl对应的instrument;
show global variables like 'performance_schema';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| performance_schema | ON |
+--------------------+-------+
update setup_instruments set ENABLED = 'yes' ,timed = 'yes' where NAME = 'wait/lock/metadata/sql/mdl';
select * from setup_instruments where name = 'wait/lock/metadata/sql/mdl';
+----------------------------+---------+-------+
| NAME | ENABLED | TIMED |
+----------------------------+---------+-------+
| wait/lock/metadata/sql/mdl | YES | YES |
3.KILL 251181044,断开线程的连接;
拓展:
kill query 线程id,表示终止这个线程中正在执行的语句;
kill 线程id,表示断开这个线程的连接,这个连接中未提交的事务会回滚、在执行的SQL会停止;
如何避免MDL锁:
- 数据库升级成MySQL 8.0
- 避免在热表和大表上做DDL操作,推荐在业务低峰期试用gh-ost做DDL变更
- 避免使用长事务
分享数据库故障处理的微信公众号:MySQL_DBA,欢迎关注,谢谢!