覆盖索引及最左匹配原则

2023年 8月 15日数据运维捡田螺的小男孩

导语

在数据检索的过程中，经常会有多个列的匹配需求，今天介绍下联合索引的使用以及最左匹配原则的案例。

最左匹配原则作用在联合索引中，假如表中有一个联合索引`（tcol01,tcol02,tcol03）`，只有当SQL使用到`tcol01、tcol02`索引的前提下，`tcol03`的索引才会被使用；同理只有`tcol01`的索引被使用的前提下，`tcol02`的索引才会被使用。

下面我们来列举几个例子来说明。

步骤

使用 `mysql_random_data_load` 创建测试数据

建库和建表

CREATE DATABASE IF NOT EXISTS test;CREATE TABLE `test`.`t3` ( `id` int(11) NOT NULL AUTO_INCREMENT, `tcol01` tinyint(4) DEFAULT NULL, `tcol02` smallint(6) DEFAULT NULL, `tcol03` mediumint(9) DEFAULT NULL, `tcol04` int(11) DEFAULT NULL, `tcol05` bigint(20) DEFAULT NULL, `tcol06` float DEFAULT NULL, `tcol07` double DEFAULT NULL, `tcol08` decimal(10,2) DEFAULT NULL, `tcol09` date DEFAULT NULL, `tcol10` datetime DEFAULT NULL, `tcol11` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `tcol12` time DEFAULT NULL, `tcol13` year(4) DEFAULT NULL, `tcol14` varchar(100) DEFAULT NULL, `tcol15` char(2) DEFAULT NULL, `tcol16` blob, `tcol17` text, `tcol18` mediumtext, `tcol19` mediumblob, `tcol20` longblob, `tcol21` longtext, `tcol22` mediumtext, `tcol23` varchar(3) DEFAULT NULL, `tcol24` varbinary(10) DEFAULT NULL, `tcol25` enum('a','b','c') DEFAULT NULL, `tcol26` set('red','green','blue') DEFAULT NULL, `tcol27` float(5,3) DEFAULT NULL, `tcol28` double(4,2) DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB;

建联合索引

CREATE INDEX idx_tcol123 ON t1(`tcol01`,`tcol02`,`tcol03`);

写入100w条测试数据

./mysql_random_data_load test t1 1000000 --user=root --password=GreatSQL --config-file=/data/GreatSQL/my.cnf

联合索引数据存储方式

先对索引中第一列的数据进行排序，而后在满足第一列数据排序的前提下，再对第二列数据进行排序，以此类推。

如下图：

索引最左原则案例

情况1：三个索引都能使用上

实验1：仅有where子句

# 三个条件都使用上，优化器可以自己调整顺序满足索引要求[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol02=167 AND tcol03=202019 AND tcol01=1;+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 9 | const,const,const | 1 | 100.00 | NULL |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+1 row in set, 1 warning (0.11 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol02=167 AND tcol01=1 AND tcol03=202019 ;+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 9 | const,const,const | 1 | 100.00 | NULL |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------------+------+----------+-------+

实验2：WHERE 加 order by子句

# 解析出来只有用到tcol01，tcol02索引，由于`explain`不会统计`order by`索引的信息，所有看起来`key_len`长度只有5；当tcol03倒序的时候就会用到`Backward index scan`功能[test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02=167 order by tcol03;+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+-------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 5 | const,const | 269 | 100.00 | NULL |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+-------+1 row in set, 1 warning (0.01 sec)[test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02=167 order by tcol03 desc;+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+---------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+---------------------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 5 | const,const | 269 | 100.00 | Backward index scan |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+---------------------+1 row in set, 1 warning (0.00 sec)# 当order by中的字段不包含在联合索引中的时候，就会用到`Using filesort`[root@GreatSQL][test]>EXPLAIN SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02=167 order by tcol04;+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+----------------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 5 | const,const | 269 | 100.00 | Using filesort |+----+-------------+-------+------------+------+---------------+-------------+---------+-------------+------+----------+----------------+1 row in set, 1 warning (0.00 sec)

实验3：仅order by子句

# 优化器默认采取全部扫描了，因为是查询出所有数据，所以全表扫描会比索引更快，节省回表的时间[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 ORDER BY tcol01,tcol02,tcol03;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 941900 | 100.00 | Using filesort |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+1 row in set, 1 warning (0.00 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 force index(`idx_tcol123`) ORDER BY tcol01,tcol02,tcol03;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------+| 1 | SIMPLE | t1 | NULL | index | NULL | idx_tcol123 | 9 | NULL | 941900 | 100.00 | NULL |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------+1 row in set, 1 warning (0.00 sec)# 只筛选索引列，也会使用到索引，也就是所谓的覆盖索引[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ tcol01,tcol02,tcol03 FROM t1 ORDER BY tcol01,tcol02,tcol03;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------------+| 1 | SIMPLE | t1 | NULL | index | NULL | idx_tcol123 | 9 | NULL | 941900 | 100.00 | Using index |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+--------+----------+-------------+1 row in set, 1 warning (0.00 sec)# 如果是筛选部分数据，那么就会使用到索引而不会全表扫描[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 ORDER BY tcol01,tcol02,tcol03 limit 10000,11110;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------+| 1 | SIMPLE | t1 | NULL | index | NULL | idx_tcol123 | 9 | NULL | 21110 | 100.00 | NULL |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------+1 row in set, 1 warning (0.00 sec)# 调整字段顺序后，就变成`Using filesort`且没有用到索引，所以当使用order by语句，确保与联合索引的顺序要一致[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 ORDER BY tcol02,tcol01,tcol03 limit 10000,11110;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 941900 | 100.00 | Using filesort |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+1 row in set, 1 warning (0.00 sec)

情况2：下面的SQL最多只能用到索引tcol1，tcol2部分

# tcol02范围查找后，导致数据乱序，于是tcol03索引条件用不上，同时会出现`Using index condition`和 `Using MRR`。[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02>100 AND tcol03=202019;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+| 1 | SIMPLE | t1 | NULL | range | idx_tcol123 | idx_tcol123 | 5 | NULL | 77976 | 10.00 | Using index condition; Using MRR |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+1 row in set, 1 warning (0.00 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02>100;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+| 1 | SIMPLE | t1 | NULL | range | idx_tcol123 | idx_tcol123 | 5 | NULL | 77976 | 100.00 | Using index condition; Using MRR |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+----------------------------------+1 row in set, 1 warning (0.00 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol03=202019 ORDER BY tcol02;+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-----------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-----------------------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 2 | const | 126670 | 10.00 | Using index condition |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-----------------------+1 row in set, 1 warning (0.00 sec)# 关掉`Using index condition`和`Using MRR`后再看一下执行计划，实际测试效率要高很多。# 这是因为ICP减少了引擎层和server层之间的数据传输和回表请求，不满足条件的请求，直接进行了过滤无需回表。# 实际上开启ICP后上面语句有用到tcol03的索引部分。[root@GreatSQL][test]>SET optimizer_switch = 'MRR=off';[root@GreatSQL][test]>SET optimizer_switch = 'index_condition_pushdown=off';[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02>100 AND tcol03=202019;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+| 1 | SIMPLE | t1 | NULL | range | idx_tcol123 | idx_tcol123 | 5 | NULL | 77976 | 10.00 | Using where |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+1 row in set, 1 warning (0.00 sec)1 row in set (1.81 sec) /* 关闭ICP和MRR后执行时间 */1 row in set (0.01 sec) /* 开启ICP和MRR后执行时间 */[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol02>100;+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+| 1 | SIMPLE | t1 | NULL | range | idx_tcol123 | idx_tcol123 | 5 | NULL | 77976 | 100.00 | Using where |+----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-------------+1 row in set, 1 warning (0.00 sec)40252 rows in set (2.04 sec) /* 关闭ICP和MRR后执行时间 */40252 rows in set (1.58 sec) /* 开启ICP和MRR后执行时间 */[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol01=1 AND tcol03=202019 ORDER BY tcol02;+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-------------+| 1 | SIMPLE | t1 | NULL | ref | idx_tcol123 | idx_tcol123 | 2 | const | 126670 | 10.00 | Using where |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+--------+----------+-------------+1 row in set, 1 warning (0.00 sec)1 row in set (1.99 sec) /* 关闭ICP和后执行时间 */1 row in set (0.01 sec) /* 开启ICP和后执行时间 */

Using index condition 请看文章万答#20，索引下推如何进行数据过滤

Using MRR 后面再介绍。

情况3：下面的SQL用不到索引

[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol02=167;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 941900 | 10.00 | Using where |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+1 row in set, 1 warning (0.00 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 WHERE tcol02=167 AND tcol03 >=1;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 941900 | 3.33 | Using where |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+1 row in set, 1 warning (0.01 sec)[root@GreatSQL][test]>explain SELECT /* NO_CACHE */ * FROM t1 ORDER BY tcol02 limit 10000,11000;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 941900 | 100.00 | Using filesort |+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+1 row in set, 1 warning (0.00 sec)

联合索引有哪些好处

1.减少开销。建一个联合索引(tcol01, tcol02, tcol03)，相当于建立三个索引(tcol01),(tcol01,tcol02),(tcol01,tcol02,tcol03)的功能。每个索引都会占用写入开销和磁盘开销，对于大量数据的表，使用联合索引会大大的减少开销。
2.覆盖索引。对联合索引(tcol01, tcol02, tcol03)，如果有如下的SQL: select tcol01,tcol02,tcol03 from t1 where tcol01=? and tcol02=? and tcol03=? 那么就可以使用到覆盖索引的功能，查询数据无需回表，减少随机IO。
3.效率高。多列条件的查询下，索引列越多，通过索引筛选出的数据就越少。

联合索引使用建议

1.查询条件中的 where、order by、group by 涉及多个字段，一般需要创建多列索引，比如前面的 select * from t1 where tcol01=100 and tcol02=50;
2.创建联合索引的时候，要将区分度高的字段放在前面，假如有一张学生表包含学号和姓名，那么在建立联合索引的时候，学号放在姓名前面，因为学号是唯一性的，能过滤更多的数据。
3.尽量避免>、
4.只筛选需要的数据字段，满足覆盖索引的要求，不要用 select * 筛选所有列数据。

MySQL索引学习