在生产环境中经常会有一些隐式类型转换导致SQL索引失效,性能极差,进而影响影响集群负载和业务的情况
本文对隐式转换常见的场景进行小结,在生产中要尽量避免SQL隐式转换的出现
常见的SQL产生隐式转换的场景有:
其中,特别是在表连接的场景和存储过程中的字符集转换很容易被忽略
说明:
字符集是针对字符类型数据的编码规则,对于数字类型则不需要进行转换字符集
数据类型的隐式转换
测试表结构
- t1表字段a为varchar类型,t2表字段a为int类型
mysql> show create database test1\G
*************************** 1\. row ***************************
Database: test1
Create Database: CREATE DATABASE `test1` /*!40100 DEFAULT CHARACTER SET utf8 */
1 row in set (0.00 sec)
mysql> show create table t1\G
*************************** 1\. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL,
`a` varchar(20) DEFAULT NULL,
`b` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table t2\G
*************************** 1\. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int(11) NOT NULL,
`a` int(11) DEFAULT NULL,
`b` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
单表示例
需要说明的是,有以下两种类型的转换:
- 当字段类型为字符串类型,参数为整型时,会导致索引失效
- 而字段类型为整型,传入的参数为字符串类型时,不会导致索引失效
这是因为在字符串与数字进行比较时,MySQL会将字符串类型转换为数字进行比较,因此当字段类型为字符串时,会在字段上加函数,而导致索引失效
官方文档说明:Strings are automatically converted to numbers and numbers to strings as necessary.
dev.mysql.com/doc/refman/…
-- 字段类型为varchar,传参为整数,无法走到索引
mysql> explain select * from t1 where a=1000;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | a | NULL | NULL | NULL | 498892 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a' |
| Warning | 1739 | Cannot use range access on index 'a' due to type or collation conversion on field 'a' |
| Note | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b` from `test1`.`t1` where (`test1`.`t1`.`a` = 1000) |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
-- 字段类型为int,传参为字符串,可以走到索引
mysql> explain select * from t2 where a='1000';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t2 | NULL | ref | a | a | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
至于为什么不能将数字转换为字符串进行比较呢?
看下下面的比较结果:
- 字符串的比较是逐个比较字符串的大小,直到找到不同的字符,这样的比较结果和数字的比较结果是不同的
mysql> select '2000' <'250';
+---------------+
| '2000' <'250' |
+---------------+
| 1 |
+---------------+
1 row in set (0.00 sec)
表连接场景中的类型转换
- 当两个表的连接字段类型不一致时会导致隐式转换(MySQL内部增加cast函数),无法走到连接字段索引,进而可能无法使用最优的表连接顺序
- 原本作为被驱动表的表由于无法使用到索引,而可能作为驱动表
示例:
-
正常情况下会选择t2表作为驱动表,但由于数据类型不同,实际上执行的sql是:
select * from t1 join t2 on cast(t1.a as unsigned)=t2.a where t2.id<1000
-
如果t1作为被驱动表,则没有办法走到t1.a的索引,因此选择t1表作为驱动表
mysql> explain select * from t1 join t2 on t1.a=t2.a where t2.id<1000;
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
| 1 | SIMPLE | t1 | NULL | ALL | a | NULL | NULL | NULL | 498892 | 100.00 | Using where |
| 1 | SIMPLE | t2 | NULL | ref | PRIMARY,a | a | 5 | test1.t1.a | 1 | 5.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a' |
| Note | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b`,`test1`.`t2`.`id` AS `id`,`test1`.`t2`.`a` AS `a`,`test1`.`t2`.`b` AS `b` from `test1`.`t1` join `test1`.`t2` where ((`test1`.`t2`.`id` < 1000) and (`test1`.`t1`.`a` = `test1`.`t2`.`a`)) |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.01 sec)
字符集的隐式转换
当参数字符集和字段字符集不同时,无法直接进行比较,而需要进行字符集转换,则需要在转换字段上加convert函数来转换字符集,导致索引失效
测试表结构
- database级别字符集为utf8mb4
- t1表字符集为utf8,t2表字符集为utf8mb4(字段级别没有指定,默认继承表级别字符集)
mysql> show create database test\G
*************************** 1\. row ***************************
Database: test
Create Database: CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8mb4 */
mysql> show create table t1\G
*************************** 1\. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL,
`a` varchar(20) DEFAULT NULL,
`b` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> show create table t2\G
*************************** 1\. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int(11) NOT NULL,
`a` varchar(20) DEFAULT NULL,
`b` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.01 sec)
单表示例
-- 正常执行时,匹配字段的字符集(没有单独指定时继承表的字符集)
mysql> explain select * from t1 where a='1000';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t1 | NULL | ref | a | a | 63 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
-- 将参数转换不同的字符集,无法走到索引,而是全表扫描
mysql> explain select * from t1 where a=convert('1000' using utf8mb4);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2000 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
-- show warnings可以看到优化器进行了转换,在t1.a上加了convert函数,从而无法走到索引
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` where (convert(`test`.`t1`.`a` using utf8mb4) = <cache>(convert('1000' using utf8mb4))) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
另外,需要注意的是,MySQL内部会优先将低级的字符集转换为更高级的字符集,例如在前面的示例中,convert函数加在t1.a上而下面这个示例,MySQL则是将utf8转换为utf8mb4,convert函数加在参数上,而非t2.a字段上:
mysql> explain select * from t2 where a=convert('1000' using utf8);
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t2 | NULL | ref | a | a | 83 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`a` = convert(convert('1000' using utf8) using utf8mb4)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
综上:
-
在当表字段字符集为更低级的字符集(如utf8),而传入的值为更高级的字符集(如utf8mb4),则此时会转换表字段的字符集,相当于字段上使用了函数,索引失效
-
当表字段为更高级的字符集(如utfmb4),而传入的值为更低级的字符集(如utf8),则此时会将传入的值进行字符集转换,并不会导致索引失效
但我们通常不会去手工使用convert函数转换参数的字符集,在以下两种场景中可能会出现比较容易忽略的隐式字符集转换,引发生产问题
表连接中的字符集转换
- 当两个表的连接字段字符集不一致时会导致隐式转换(MySQL内部增加convert函数),无法走到连接字段索引,进而可能无法使用最优的表连接顺序
- 原本作为被驱动表的表由于无法使用到索引,而可能作为驱动表
示例:
- 正常情况下,MySQL会优先小结果集的表作为驱动表,在本例中即为t2为驱动表,t1为被驱动表
- 但是由于字符集不同,实际上执行的sql为show warnings看到的,对t1.a字段加了convert函数进行转换字符集,则无法走到t1.a字段的索引而不得不改变连接顺序
mysql> explain select * from t1 left join t2 on t1.a=t2.a where t2.id<1000;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 498649 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ref | PRIMARY,a | a | 83 | func | 1 | 4.79 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`id` < 1000) and (convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
在下面示例中,虽然也发生了类型转换,但是效率并没有变差,因为原本最优的连接顺序就是t1作为驱动表
mysql> explain select * from t1 left join t2 on t1.a=t2.a where t1.id<1000;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 999 | 100.00 | Using where |
| 1 | SIMPLE | t2 | NULL | ref | a | a | 83 | func | 1 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` left join `test`.`t2` on((convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) where (`test`.`t1`.`id` < 1000) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
存储过程中的字符集转换
这也是比较容易忽略的一种场景,问题的发现是在生产环境存储过程中根据主键更新,但却需要执行10s+
存储过程中变量的字符集默认继承自database的字符集(也可以在创建时指定),当表字段字符集和database的字符集不一样时,就会出现类似前面的隐式字符集类型转换
示例:
-
database的字符集是utf8mb4
-
character_set_client和collation_connection是创建存储过程时会话的character_set_client和collation_connection的值
-
经测试存储过程中的变量的字符集是和数据库级别的字符集一致
-- 存储过程信息: Database Collation: utf8mb4_general_ci
mysql> show create procedure update_data\G
*************************** 1\. row ***************************
Procedure: update_data
sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
Create Procedure: CREATE DEFINER=`root`@`%` PROCEDURE `update_data`()
begin
declare j int;
declare n varchar(100);
select charset(n);
set j=1;
while(j<=2000)do
set n = cast(j as char);
select 1,now();
update t1 set b=concat(b,'1') where a=n;
select 2,now();
select sleep(1);
set j=j+1;
end while;
end
character_set_client: utf8mb4
collation_connection: utf8mb4_general_ci
Database Collation: utf8mb4_general_ci
1 row in set (0.00 sec)
如下,在执行存储过程后,看到打印的变量n的字符集是utf8mb4
mysql> call update_data();
+------------+
| charset(n) |
+------------+
| utf8mb4 |
+------------+
1 row in set (0.00 sec)
根据索引字段a更新的语句实际上是变成了下面这样,走的是全表扫描(type:index,key:primary)
mysql> explain update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | UPDATE | t1 | NULL | index | NULL | PRIMARY | 4 | NULL | 498649 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
1 row in set (0.00 sec)
而正常情况下,执行计划为:
mysql> explain update t1 set b=concat(b,'1') where a='1000';
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
| 1 | UPDATE | t1 | NULL | range | a | a | 63 | const | 1 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+
1 row in set (0.00 sec)
更新时间也由0.00sec变为0.60sec,在表数据量很大的情况下,全表扫描将会对生产产生较大影响
mysql> update t1 set b=concat(b,'1') where a='1000';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);
Query OK, 1 row affected (0.60 sec)
Rows matched: 1 Changed: 1 Warnings: 0
如何避免隐式转换
对于数据类型的隐式转换:
1、规范数据类型的选择
2、SQL传参与字段数据类型匹配
对于字符集的隐式转换:
1、客户端字符集、服务器端字符集、数据库字符集、表字符集、字段字符集保持一致