学习《MySQL高级》高阳老师讲解索引课程的笔记,本篇侧重对 order by 排序分析
建表
# 建表 CREATE TABLE tblA( #id int primary key not null autp_increment, age int, birth timestamp not null ); insert into tblA(age,birth) values(22,now()); insert into tblA(age,birth) values(23,now()); insert into tblA(age,birth) values(24,now()); # 建立复合索引 CREATE INDEX idx_A_ageBirth on tblA(age,birth); select * from tblA;
Order By 优化(索引分析)
由于本表中只有两个字段 age, birth,复合索引都覆盖了,所以 select * 就相当于 select age, birth, 查询直接走索引,不需要回表。此处仅关注排序(order by)是否会出现文件排序(filesort)。
1.1 explain select * from tblA where age > 20 order by age;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | tblA | NULL | index | idx_A_ageBirth | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using where; Using index | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
排序用到了 age 字段的索引,不会出现filesort。
1.2 explain select * from tblA where age > 20 order by age, birth;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | tblA | NULL | index | idx_A_ageBirth | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using where; Using index | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
排序,age, birth 符合复合索引顺序,所以排序用到了age, birth 两个字段的索引,不会出现filesort。
*1.3 explain select * from tblA where age > 20 order by birth;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+ | 1 | SIMPLE | tblA | NULL | index | idx_A_ageBirth | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using where; Using index; Using filesort | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
查询时用到了 age 字段的索引; 排序时,由于 age 是范围,范围后面全失效,所以不能走 age, birth 索引进行排序,那么就会出现filesort。
若 age 是个等值查询,排序时就不会出现filesort。见下面语句:
mysql> explain select * from tblA where age = 22 order by birth; +----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------+ | 1 | SIMPLE | tblA | NULL | ref | idx_A_ageBirth | idx_A_ageBirth | 5 | const | 1 | 100.00 | Using where; Using index | +----+-------------+-------+------------+------+----------------+----------------+---------+-------+------+----------+--------------------------+
1.4 explain select * from tblA where age > 20 order by birth, age;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+ | 1 | SIMPLE | tblA | NULL | index | idx_A_ageBirth | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using where; Using index; Using filesort | +----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
排序时,由于 birth, age 不符合复合索引的顺序,所以会出现filesort。
2.1 explain select * from tblA order by birth;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+ | 1 | SIMPLE | tblA | NULL | index | NULL | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using index; Using filesort | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+
排序时,仅有birth,会出现filesort。
2.2 explain select * from tblA where birth > '2016-01-28 00:00:00' order by birth;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+------------------------------------------+ | 1 | SIMPLE | tblA | NULL | index | NULL | idx_A_ageBirth | 9 | NULL | 3 | 33.33 | Using where; Using index; Using filesort | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+------------------------------------------+
排序时,where 和 order by 的字段顺序不符合复合索引顺序,会出现文件排序。
*2.3 explain select * from tblA where birth > '2016-01-28 00:00:00' order by age;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | tblA | NULL | index | NULL | idx_A_ageBirth | 9 | NULL | 3 | 33.33 | Using where; Using index | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+--------------------------+
排序时,where 和 order by 的字段顺序经过优化器优化后符合符合索引的顺序,不会出现文件排序。
*2.4 explain select * from tblA order by age asc, birth desc;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+ | 1 | SIMPLE | tblA | NULL | index | NULL | idx_A_ageBirth | 9 | NULL | 3 | 100.00 | Using index; Using filesort | +----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+
排序时,order by 的字段 age、birth 虽然符合符合索引顺序,但是 age 按升序、birth 按降序,这样无法利用索引排序,会出现文件排序。若 age、birth 同升或同降,则不会出现文件排序。
Order By 无法使用索引时的优化
如果 Order By 用不上索引的话,就会出现 filesort。filesort 有两种算法:双路排序
和单路排序
。
双路排序
MySQL4.1之前是使用双路排序,字面意思就是两次扫描磁盘,最终得到数据,
读取行指针和 orderby 列,对他们进行排序,然后扫描已经排序好的列表,按照列表中的值重新从列表中读取对应的数据输出。从磁盘取排序字段,在buffer进行排序, 再从磁盘取其他字段。
取一批数据,要对磁盘进行了两次扫描,众所周知,I/O是很耗时的,所以在mysql4.1之后,出现了第二种改进的算法,就是单路排序。
单路排序
从磁盘读取查询需要的所有列,按照order by列在buffer对它们进行排序,然后扫描排序后的列表进行输出,它的效率更快些,避免了第二次读取数据。并且把随机IO变成了顺序IO,但是它会使用更多的空间,因为它把每一行都保存在内存中了。
一般来说单路排序
更好,但也会出现问题。这是为什么?
在sort_buffer中, 单路排序
比多路排序
要多占用很多空间,因为单路排序
是把所有字段都取出,所以有可能取出的数据的总大小超出了sort_buffer
的容量,导致每次只能取sort_ buffer容量大小的数据,进行排序(创建tmp文件,多路合并),排完再取sort_buffer容量大小,再排...从而多次I/O。
本来想省一次I/O操作,反而导致了大量的I/O操作,反而得不偿失。
如何优化?通过增加 sort_buffer_size
容量和max_length_for_sort_data
提高Order By的速度
1.Order by 时 select * 是一个大忌,只查需要的字段,这点非常重要。在这里的影响是:
1.1当Query的字段大小总和小于 max_length_for_sort_data 而且排序字段不是TEXT|BLOB类型时,会用改进后的算法——单路排序,否则用老算法——多路排序。
1.2两种算法的数据都有可能超出sort_buffer的容量, 超出之后,会创建tmp文件进行合并排序,导致多次I/O, 但是用单路排序算法的风险会更大一些,所以要提高sort_buffer_size。
2.尝试提高 sort_buffer_size
不管用哪种算法,提高这个参数都会提高效率,当然,要根据系统的能力去提高,因为这个参数是针对每个进程的。
3.尝试提高 max_length_for_sort_data
提高这个参数,会增加用改进算法的概率。但是如果设的太高,数据总容量超出sort_buffer_size的概率就增大,明显症状是高的磁盘I/O活动和低的处理器使用率。
Group By 优化
1.Gourp By 实际上是先排序再分组,遵守复合索引的最佳左前缀原则;
2.当无法使用索引列时,也需要像 Order By 一样,增加 max_length_for_sort_data 和 sort_buffer_size;
3.where 先于 having,尽量用 where 进行条件过滤。