项目组反馈一台生产环境阿里的MySQL RDS上,有一条SQL语句导致这台RDS的CPU利用率达到95之上,这条语句实际更新的数据只有一行,执行时间却在30秒左右。由于是生产环境,这里对语句的表名和列名都做一下调整,调整之后的语句如下:
update qx set qx.col_d=qx.col_d + 0.00001 where (qx.col_a,qx.col_b,qx.col_c) = (
select w.col_a,w.col_b, w.col_c from (
select (@i:=@i+1) rn, qy.* from qx qy, (select @i:=0) it where qx.col_a=12345 and qy.col_d > 0 order by qy.col_b,qy.col_c desc) w
where w.rn=1
)
这条语句里col_a,col_b, col_c是这个表的主键,
select w.col_a,w.col_b, w.col_c from (
select (@i:=@i+1) rn, qy.* from qx qy, (select @i:=0) it where qx.col_a=12345 and qy.col_d > 0 order by qy.col_b,qy.col_c desc) w
where w.rn=1
子查询获取qx表中col_a值为12345的行,然后根据col_b,col_c 反向排序后,返回第一行。这里面变量i的使用是为了生成行号,然后再外层查询获取行号为1的值,从而达到获取排序后首行的值的目的。
这个子查询使用了变量,导致了不必要的复杂性。其实,利用MySQL的limit语句,这个子查询完全可以写成更简单的形式。不过,这不是导致这条语句性能问题的原因。
下面看一下这条语句的执行计划:
熟悉MySQL的DBA一眼就可以看出这个执行计划的问题所在。这条update语句的执行计划中出现了UNCACHEABLE SUNQUERY,关于UNCACHEABLE SUNQUERY子查询,官方参考文档里是这样描述的:
A subquery for which the resul cannot be cached and must be re-evaluated for each row of the outer query。
翻译一下就是:一条不能被缓存的子查询,对于外部查询的每一行,必须重新执行。这个子查询的外部查询是update操作。返回的行数是1509092行,也就是说每次执行这条sql语句,这个子查询就要执行150多万次。因此,这条语句不仅执行时间长,消耗的CPU资源也非常大。
对于这种采用了子查询的update语句,常用的优化方式是将其改成内连接的形式。这次优化也是这样做的。将其改为内连接的方式后,执行时间只有几毫秒,对于cpu的消耗也基本可以忽略了。下面通过另一个表的类似的数据演示一下这种改写及改写后的效果。先看一下这个表的数据:
mysql> select * from lap_times limit 1;
+--------+----------+------+----------+----------+--------------+
| raceId | driverId | lap | position | time | milliseconds |
+--------+----------+------+----------+----------+--------------+
| 841 | 20 | 1 | 1 | 1:38.109 | 98109 |
+--------+----------+------+----------+----------+--------------+
1 row in set (0.00 sec)
mysql> select count(*) from lap_times;
+----------+
| count(*) |
+----------+
| 2156704 |
+----------+
1 row in set (2.69 sec)
lap_times表一共有200多万行,表上的索引如下图所示:
执行update语句,看一下执行时间:
mysql> update lap_times tx set tx.milliseconds = tx.milliseconds+1 where (tx.raceId, tx.driverId, tx.lap) =
-> (select t_tmp.raceId, t_tmp.driverId,t_tmp.lap from (
-> select (@i:=@i+1) rn, ti.* from lap_times ti, (select @i:=0) it where ti.raceId=1 and ti.milliseconds> 0 order by ti.driverId,ti.lap desc) t_tmp
-> where t_tmp.rn=1
-> );
Query OK, 4 rows affected, 2 warnings (15.25 sec)
Rows matched: 4 Changed: 4 Warnings: 2
语句时间为15.25秒,语句的执行计划为
mysql> explain update lap_times tx set tx.milliseconds = tx.milliseconds+1 where (tx.raceId, tx.driverId, tx.lap) = (select t_tmp.raceId, t_tmp.driverId,t_tmp.lap from ( select (@i:=@i+1) rn, ti.* from lap_times ti, (select @i:=0) it where ti.raceId=1 and ti.milliseconds> 0 order by ti.driverId,ti.lap desc) t_tmp where t_tmp.rn=1 );
+----+----------------------+------------+------------+--------+-----------------+-------------+---------+-------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+----------------------+------------+------------+--------+-----------------+-------------+---------+-------+---------+----------+----------------+
| 1 | UPDATE | tx | NULL | ALL | NULL | NULL | NULL | NULL | 2020504 | 100.00 | Using where |
| 2 | UNCACHEABLE SUBQUERY | | NULL | ref | | | 9 | const | 10 | 100.00 | NULL |
| 3 | DERIVED | | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 3 | DERIVED | ti | NULL | ref | idx_r_d,lap_key | idx_r_d | 5 | const | 4020 | 33.33 | Using where |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+----------------------+------------+------------+--------+-----------------+-------------+---------+-------+---------+----------+----------------+
5 rows in set, 3 warnings (0.00 sec)
讲这个语句改成内连接的方式
update lap_times tx inner join (
select ty.* from lap_times ty , (select @i:=0) as it where ty.raceId=1 and ty.milliseconds > 0 order by ty.driverId,ty.lap desc limit 1) tz
on tz.driverId=tx.driverId and tz.raceId=tx.raceId and tz.lap=tx.lap
set tx.milliseconds= tx.milliseconds+1;
由于原来这条语句的内查询返回的只有一行,所以改成内连接的方式语句的执行结果依然是正确的,这条语句的执行时间为
Query OK, 4 rows affected, 1 warning (0.04 sec)
Rows matched: 4 Changed: 4 Warnings: 1
执行时间只有40毫秒。改写后的执行计划如下:
mysql> explain update lap_times tx inner join (
-> select ty.* from lap_times ty , (select @i:=0) as it where ty.raceId=1 and ty.milliseconds > 0 order by ty.driverId,ty.lap desc limit 1) tz
-> on tz.driverId=tx.driverId and tz.raceId=tx.raceId and tz.lap=tx.lap
-> set tx.milliseconds= tx.milliseconds+1;
+----+-------------+------------+------------+--------+-----------------+---------+---------+-------------------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+---------+---------+-------------------+------+----------+----------------+
| 1 | PRIMARY | | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | UPDATE | tx | NULL | ref | idx_r_d,lap_key | lap_key | 15 | const,const,const | 4 | 100.00 | NULL |
| 2 | DERIVED | | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ty | NULL | ref | idx_r_d,lap_key | idx_r_d | 5 | const | 4020 | 33.33 | Using where |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+---------+---------+-------------------+------+----------+----------------+
5 rows in set, 2 warnings (0.01 sec)
MySQL对连接顺序进行了优化,原来的子查询成了外部查询,update则成了内部查询,可以利用上索引lap_key了。