MySQL子查询(subqueries)优化——派生条件下推
作者:黄华亮,现任Oracle MySQL大中华区MySQL解决方案工程师,专注MySQL技术十余年,对MySQL和开源数据库有丰富的实践经验。
作者:黄华亮,现任Oracle MySQL大中华区MySQL解决方案工程师,专注MySQL技术十余年,对MySQL和开源数据库有丰富的实践经验。
1.序言
最近遇到了不少MySQL性能优化的案例,都和子查询有关,今天就这个话题做一定的分析。
首先,很多数据库的从业者都会有一种感觉,MySQL的优化器较弱,这可能是固化在MySQL早期版本的认识,现如今,MySQL在8.0已经在优化器方面做了非常多的优化。而子查询的优化通常也会令DBA感受一些压力,通常DBA会建议研发不要写复杂的子查询SQL,但现实却经常打脸,一些框架封装生成的SQL或一些外采系统,改写SQL变得不太实际,因此对SQL上优化在关键时候也非常有效。今天我就子查询派生条件下推做一些讨论。
2.派生条件下推
2.1 何为派生条件下推,翻译官方的描述如下:
MySQL 8.0.22及更高版本支持符合条件的子查询的派生条件下推。对于如SELECT*FROM(SELECT i,j FROM t1)as dt WHERE i>常量之类的查询,在许多情况下,可以将外部WHERE条件下推到派生表,在这种情况下,会导致SELECT*FROM(SELECT i、j FROM t1 WHERE i>常量)as dt。当派生表无法合并到外部查询中时(例如:如果派生表使用聚合),将外部WHERE条件下推到派生表应该会减少需要处理的行数,从而加快查询的执行。
https://dev.mysql.com/doc/refman/8.0/en/derived-condition-pushdown-optimization.html
派生条件下推一句话理解即为:外查询与派生表相关的条件会被推入到派生表中作为条件,以减少处理的数据行数,加速查询速度。
派生表 (Derived Table):派生表是在查询FROM子句的范围内生成表的表达式。例如,SELECT语句FROM子句中的子查询是派生表:
SELECT ... FROM (subquery) [AS] tb_alias_name ...
JSON_TABLE()函数生成一个表,并提供另一种创建派生表的方法:
SELECT * FROM JSON_TABLE(arg_list) [AS] tb_alias_name ...
[AS]tb_alias_name子句是必需的,因为FROM子句中的每个表都必须有一个名称。派生表中的任何列都必须具有唯一的名称。或者tb_alias_name后面可以跟一个带括号的派生表列名称列表:
SELECT ... FROM (subquery) [AS] tb_alias_name (col_list) ...
col_list的列的数目必须与subquery列的数目相同。
*派生表可以返回标量、列、行或表。
2.2 在以下情况下,可以将外部WHERE条件下推到派生的物化表中:
-
当派生表不使用聚合或窗口函数时,可以直接向下推送外部WHERE条件。这包括WHERE条件,该条件具有用AND、OR或两者连接的多个谓词。
例如:SELECT*FROM(SELECT f1,f2 FROM t1)AS dt WHERE f1<3 AND f2>11;
被重写为: SELECT f1,f2 FROM(SELECT f1,f2 FROM t1 WHERE f1<3 AND f2>11)AS dt。
-
当派生表具有GROUP BY并且不使用窗口函数时,引用一个或多个不属于GROUP BY的列的外部WHERE条件可以作为HAVING条件下推到派生表。
例如:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j) AS dt WHERE sum > 100;
通过物化条件下推重写为:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j HAVING sum > 100) AS dt.
-
当派生表使用GROUP BY并且外部WHERE条件中的列是GROUP BY列时,引用这些列的WHERE条件可以直接下推到派生表。
例如:SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 GROUP BY i,j) AS dt WHERE i > 10;
被重写为: SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 WHERE i > 10 GROUP BY i,j) AS dt.
2.3 如何启用派生条件下推?
1.优化器开关系统变量derived_condition_pushdown(8.0.22新增此变量)必须设置为on,这也是默认设置。如果优化器开关禁用了此优化,则可以使用DERIVED_CONDITION_PUSHDOWN优化器提示为特定查询启用此优化。要禁用给定查询的优化,请使用NO_DERIVED_CONDITION_PUSHDOWN优化器提示。
2.hints
对派生条件下推,MySQL提供了以下Hints:
-
/*+ NO_DERIVED_CONDITION_PUSHDOWN(dervied table name) */
-
/*+ DERIVED_CONDITION_PUSHDOWN(dervied table name) */
因hints是定义在SQL级别,优先级高于优化器变量的设置。
mysql> explain select /*+ NO_DERIVED_CONDITION_PUSHDOWN(a) */ a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| 1 | PRIMARY | | NULL | ref | | | 4 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 39402496 | 100.00 | NULL |
| 3 | UNION | sbtest2 | NULL | ALL | NULL | NULL | NULL | NULL | 39456098 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select /*+ NO_DERIVED_CONDITION_PUSHDOWN(`a`@`select#1`) */ `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2`) `a` where (`a`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
2.4 派生条件下推演示
以下演示基于如下MySQL版本及数据测试:
-
MySQL version: 8.0.34
-
Data Size: 2张表,单表都约9.1GB,约 39,402,496 行。
-
表元数据如下:
mysql> show create table sbtest1\G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
`pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39907018 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
mysql> show create table sbtest2\G
*************************** 1. row ***************************
Table: sbtest2
Create Table: CREATE TABLE `sbtest2` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
`pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39989509 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
优化器派生条件下推变量derived_condition_pushdown默认开启,如下:
mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.00 sec)
1)本文2.2中的第一种情况
mysql> explain SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | SIMPLE | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)
#-- 查看到原sql被rewrite。
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| k | c | pad |
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 10739884 | 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 |
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
2)本文2.2中的第二种情况
mysql> desc SELECT * FROM (SELECT k, SUM(k) AS sum FROM sbtest1 GROUP BY k) AS a WHERE sum show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`k`) AS `sum` from `demo`.`sbtest1` group by `demo`.`sbtest1`.`k` having (sum(`demo`.`sbtest1`.`k`) explain SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
| 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`id`) AS `sum` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) group by `demo`.`sbtest1`.`k`,`demo`.`sbtest1`.`c`) `a` |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
+----------+--------+
| k | sum |
+----------+--------+
| 10739884 | 528507 |
+----------+--------+
1 row in set (0.00 sec)
4)带union的派生表
下面我们对带有union的派生表测试如下:
SQL statement:
select a.c, a.pad, a.k from
(select k, c, pad from sbtest1
union all
select k, c, pad from sbtest2) a
where k=10739884;
执行计划及执行结果如下:
mysql> explain select sql_no_cache a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
| 3 | UNION | sbtest2 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
3 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1681 | 'SQL_NO_CACHE' is deprecated and will be removed in a future release. |
| Note | 1003 | /* select#1 */ select `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2` where (`demo`.`sbtest2`.`k` = 10739884)) `a` |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
| c | pad | k |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
| 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 | 10739884 |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
1 row in set (0.00 sec)
可以看到在对于union的派生表,通过下推派生条件做了优化。
由此可得到结论:
8.0.22 开始支持派生条件pushdown,但是到了8.0.29 才开始支持union 派生条件pushdown(除了下面2.5第一条中提到的2点不支持的union)。因8.0.29 不提供下载,原则上需要8.0.30 及以上版本。
2.5 派生条件下推的限制
-
如果派生表包含UNION,则无法使用优化。但是MySQL 8.0.29中取消了此限制(2.4中的演示有验证这一点)。
以下2种情况带有union的派生表不能派生条件下推。
-
如果作为UNION一部分的任何物化派生表是递归通用表表达式,则条件下推不能与UNION查询一起使用(请参见递归通用表表达:https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions-recursive)。
-
不能将包含不确定性表达式的条件下推到派生表中。
-
派生表不能使用limit子句。
mysql> explain select * from (select k, c, pad from sbtest1 limit 10) as a where k=10739884;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| 1 | PRIMARY | | NULL | ref | | | 4 | const | 1 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 39402496 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`c` AS `c`,`a`.`pad` AS `pad` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` limit 10) `a` where (`a`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
-
条件包含子查询,不能推入到派生表。
-
如果派生表是外部联接的内部表,则无法使用优化。
-
如果物化派生表是通用表表达式,则如果多次引用该表,则不会向下推送条件。
-
如果条件的形式为derived_column>?,则可以向下推送使用参数的条件?。如果外部WHERE条件中的派生列是具有?在基础派生表中,不能向下推此条件。
-
如果派生表是一个视图,视图创建时使用了ALGORITHM=TEMPTABLE,则条件不会推入进视图。
-
从MySQL 8.0.28开始,如果派生表的SELECT包含对用户变量的任何赋值,则不能向下推条件到派生表。(Bug#104918)
关于MySQL对于SQL子查询派生条件下推到派生表的优化,这次就暂时到这里。后续会介绍更多关于MySQL中对于SQL子查询的其它优化方式,敬待后续文章。
感谢您关注“MySQL解决方案工程师”!