【MySQL 8.0神器揭秘派生表条件下推——让你的SQL飙车不再是梦想!

2024年 1月 19日 53.7k 0

MySQL子查询(subqueries)优化——派生条件下推

作者:黄华亮,现任Oracle MySQL大中华区MySQL解决方案工程师,专注MySQL技术十余年,对MySQL和开源数据库有丰富的实践经验。

1.序言

最近遇到了不少MySQL性能优化的案例,都和子查询有关,今天就这个话题做一定的分析。

首先,很多数据库的从业者都会有一种感觉,MySQL的优化器较弱,这可能是固化在MySQL早期版本的认识,现如今,MySQL在8.0已经在优化器方面做了非常多的优化。而子查询的优化通常也会令DBA感受一些压力,通常DBA会建议研发不要写复杂的子查询SQL,但现实却经常打脸,一些框架封装生成的SQL或一些外采系统,改写SQL变得不太实际,因此对SQL上优化在关键时候也非常有效。今天我就子查询派生条件下推做一些讨论。

2.派生条件下推

2.1 何为派生条件下推,翻译官方的描述如下:

MySQL 8.0.22及更高版本支持符合条件的子查询的派生条件下推。对于如SELECT*FROM(SELECT i,j FROM t1)as dt WHERE i>常量之类的查询,在许多情况下,可以将外部WHERE条件下推到派生表,在这种情况下,会导致SELECT*FROM(SELECT i、j FROM t1 WHERE i>常量)as dt。当派生表无法合并到外部查询中时(例如:如果派生表使用聚合),将外部WHERE条件下推到派生表应该会减少需要处理的行数,从而加快查询的执行。

https://dev.mysql.com/doc/refman/8.0/en/derived-condition-pushdown-optimization.html

派生条件下推一句话理解即为:外查询与派生表相关的条件会被推入到派生表中作为条件,以减少处理的数据行数,加速查询速度。

派生表 (Derived Table):派生表是在查询FROM子句的范围内生成表的表达式。例如,SELECT语句FROM子句中的子查询是派生表:

SELECT ... FROM (subquery) [AS] tb_alias_name ...

JSON_TABLE()函数生成一个表,并提供另一种创建派生表的方法:

SELECT * FROM JSON_TABLE(arg_list) [AS] tb_alias_name ...

[AS]tb_alias_name子句是必需的,因为FROM子句中的每个表都必须有一个名称。派生表中的任何列都必须具有唯一的名称。或者tb_alias_name后面可以跟一个带括号的派生表列名称列表:

SELECT ... FROM (subquery) [AS] tb_alias_name (col_list) ...

col_list的列的数目必须与subquery列的数目相同。

*派生表可以返回标量、列、行或表。

2.2 在以下情况下,可以将外部WHERE条件下推到派生的物化表中:

  • 当派生表不使用聚合或窗口函数时,可以直接向下推送外部WHERE条件。这包括WHERE条件,该条件具有用AND、OR或两者连接的多个谓词。

    例如:SELECT*FROM(SELECT f1,f2 FROM t1)AS dt WHERE f1<3 AND f2>11;

    被重写为: SELECT f1,f2 FROM(SELECT f1,f2 FROM t1 WHERE f1<3 AND f2>11)AS dt。

  • 当派生表具有GROUP BY并且不使用窗口函数时,引用一个或多个不属于GROUP BY的列的外部WHERE条件可以作为HAVING条件下推到派生表。

   例如:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j) AS dt WHERE sum > 100;

通过物化条件下推重写为:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j HAVING sum > 100) AS dt.

  • 当派生表使用GROUP BY并且外部WHERE条件中的列是GROUP BY列时,引用这些列的WHERE条件可以直接下推到派生表。

例如:SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 GROUP BY i,j) AS dt WHERE i > 10;

 被重写为:  SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 WHERE i > 10 GROUP BY i,j) AS dt.

2.3 如何启用派生条件下推?

1.优化器开关系统变量derived_condition_pushdown(8.0.22新增此变量)必须设置为on,这也是默认设置。如果优化器开关禁用了此优化,则可以使用DERIVED_CONDITION_PUSHDOWN优化器提示为特定查询启用此优化。要禁用给定查询的优化,请使用NO_DERIVED_CONDITION_PUSHDOWN优化器提示。

2.hints

对派生条件下推,MySQL提供了以下Hints:

  • /*+ NO_DERIVED_CONDITION_PUSHDOWN(dervied table name) */

  • /*+ DERIVED_CONDITION_PUSHDOWN(dervied table name) */

因hints是定义在SQL级别,优先级高于优化器变量的设置。

mysql> explain select /*+ NO_DERIVED_CONDITION_PUSHDOWN(a) */ a.c, a.pad, a.k from
   -> (select k, c, pad  from sbtest1
   -> union all
   -> select k, c, pad from sbtest2) a
   -> where k=10739884;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows     | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
|  1 | PRIMARY     | | NULL       | ref  |   | | 4       | const |       10 |   100.00 | NULL  |
|  2 | DERIVED     | sbtest1    | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 39402496 |   100.00 | NULL  |
|  3 | UNION       | sbtest2    | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 39456098 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select /*+ NO_DERIVED_CONDITION_PUSHDOWN(`a`@`select#1`) */ `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2`) `a` where (`a`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

2.4 派生条件下推演示

以下演示基于如下MySQL版本及数据测试:

  • MySQL version: 8.0.34

  • Data Size: 2张表,单表都约9.1GB,约 39,402,496 行。

  • 表元数据如下:

mysql> show create table sbtest1\G
*************************** 1. row ***************************
      Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
 `id` int NOT NULL AUTO_INCREMENT,
 `k` int NOT NULL DEFAULT '0',
 `c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
 `pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39907018 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
mysql> show create table sbtest2\G
*************************** 1. row ***************************
      Table: sbtest2
Create Table: CREATE TABLE `sbtest2` (
 `id` int NOT NULL AUTO_INCREMENT,
 `k` int NOT NULL DEFAULT '0',
 `c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
 `pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39989509 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)

优化器派生条件下推变量derived_condition_pushdown默认开启,如下:

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.00 sec)

1)本文2.2中的第一种情况

mysql> explain SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | sbtest1 | NULL       | ref  | idx_k         | idx_k | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)
#-- 查看到原sql被rewrite。
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                     |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| k        | c                                                                                                                       | pad                                                         |
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 10739884 | 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 |
+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

2)本文2.2中的第二种情况

mysql> desc SELECT * FROM (SELECT k, SUM(k) AS sum FROM sbtest1 GROUP BY k) AS a WHERE sum show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`k`) AS `sum` from `demo`.`sbtest1` group by `demo`.`sbtest1`.`k` having (sum(`demo`.`sbtest1`.`k`) explain SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
| id | select_type | table      | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra           |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
|  1 | PRIMARY     | | NULL       | ALL  | NULL          | NULL  | NULL    | NULL  |    2 |   100.00 | NULL            |
|  2 | DERIVED     | sbtest1    | NULL       | ref  | idx_k         | idx_k | 4       | const |    1 |   100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-----------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                   |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`id`) AS `sum` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) group by `demo`.`sbtest1`.`k`,`demo`.`sbtest1`.`c`) `a` |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
+----------+--------+
| k        | sum    |
+----------+--------+
| 10739884 | 528507 |
+----------+--------+
1 row in set (0.00 sec)

4)带union的派生表

下面我们对带有union的派生表测试如下:

SQL statement:

select a.c, a.pad, a.k from
(select k, c, pad  from sbtest1
union all
select k, c, pad from sbtest2) a
where k=10739884;

执行计划及执行结果如下:

mysql> explain select sql_no_cache a.c, a.pad, a.k from
   -> (select k, c, pad  from sbtest1
   -> union all
   -> select k, c, pad from sbtest2) a
   -> where k=10739884;
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
|  1 | PRIMARY     | | NULL       | ALL  | NULL          | NULL  | NULL    | NULL  |    4 |   100.00 | NULL  |
|  2 | DERIVED     | sbtest1    | NULL       | ref  | idx_k         | idx_k | 4       | const |    1 |   100.00 | NULL  |
|  3 | UNION       | sbtest2    | NULL       | ref  | idx_k         | idx_k | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+-------+---------+-------+------+----------+-------+
3 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1681 | 'SQL_NO_CACHE' is deprecated and will be removed in a future release.                                                                                                                                                                                                                                                                                                                                                                               |
| Note    | 1003 | /* select#1 */ select `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2` where (`demo`.`sbtest2`.`k` = 10739884)) `a` |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select a.c, a.pad, a.k from
   -> (select k, c, pad  from sbtest1
   -> union all
   -> select k, c, pad from sbtest2) a
   -> where k=10739884;
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
| c                                                                                                                       | pad                                                         | k        |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
| 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 | 10739884 |
+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+
1 row in set (0.00 sec)

可以看到在对于union的派生表,通过下推派生条件做了优化。

由此可得到结论:

8.0.22 开始支持派生条件pushdown,但是到了8.0.29 才开始支持union 派生条件pushdown(除了下面2.5第一条中提到的2点不支持的union)。因8.0.29 不提供下载,原则上需要8.0.30 及以上版本。

2.5 派生条件下推的限制

  • 如果派生表包含UNION,则无法使用优化。但是MySQL 8.0.29中取消了此限制(2.4中的演示有验证这一点)。

        以下2种情况带有union的派生表不能派生条件下推。

    • 如果作为UNION一部分的任何物化派生表是递归通用表表达式,则条件下推不能与UNION查询一起使用(请参见递归通用表表达:https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions-recursive)。

    • 不能将包含不确定性表达式的条件下推到派生表中。

  • 派生表不能使用limit子句。

mysql> explain select * from (select k, c, pad  from sbtest1 limit 10) as a where k=10739884;
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows     | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
|  1 | PRIMARY     | | NULL       | ref  |   | | 4       | const |        1 |   100.00 | NULL  |
|  2 | DERIVED     | sbtest1    | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 39402496 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                   |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`c` AS `c`,`a`.`pad` AS `pad` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` limit 10) `a` where (`a`.`k` = 10739884) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

  • 条件包含子查询,不能推入到派生表。

  • 如果派生表是外部联接的内部表,则无法使用优化。

  • 如果物化派生表是通用表表达式,则如果多次引用该表,则不会向下推送条件。

  • 如果条件的形式为derived_column>?,则可以向下推送使用参数的条件?。如果外部WHERE条件中的派生列是具有?在基础派生表中,不能向下推此条件。

  • 如果派生表是一个视图,视图创建时使用了ALGORITHM=TEMPTABLE,则条件不会推入进视图。

  • 从MySQL 8.0.28开始,如果派生表的SELECT包含对用户变量的任何赋值,则不能向下推条件到派生表。(Bug#104918)

关于MySQL对于SQL子查询派生条件下推到派生表的优化,这次就暂时到这里。后续会介绍更多关于MySQL中对于SQL子查询的其它优化方式,敬待后续文章。

感谢您关注“MySQL解决方案工程师”!

相关文章

Oracle如何使用授予和撤销权限的语法和示例
Awesome Project: 探索 MatrixOrigin 云原生分布式数据库
下载丨66页PDF,云和恩墨技术通讯(2024年7月刊)
社区版oceanbase安装
Oracle 导出CSV工具-sqluldr2
ETL数据集成丨快速将MySQL数据迁移至Doris数据库

发布评论