隐式转换知多少

数据运维 2024-04-12 张二河手机阅读

在生产环境中经常会有一些隐式类型转换导致SQL索引失效，性能极差，进而影响影响集群负载和业务的情况

本文对隐式转换常见的场景进行小结，在生产中要尽量避免SQL隐式转换的出现

常见的SQL产生隐式转换的场景有：

数据类型的隐式转换

字符集的隐式转换

其中，特别是在表连接的场景和存储过程中的字符集转换很容易被忽略

说明：字符集是针对字符类型数据的编码规则，对于数字类型则不需要进行转换字符集

数据类型的隐式转换

测试表结构

t1表字段a为varchar类型，t2表字段a为int类型


mysql> show create database test1\G

*************************** 1\. row ***************************

       Database: test1

Create Database: CREATE DATABASE `test1` /*!40100 DEFAULT CHARACTER SET utf8 */

1 row in set (0.00 sec)

mysql> show create table t1\G

*************************** 1\. row ***************************

       Table: t1

Create Table: CREATE TABLE `t1` (

  `id` int(11) NOT NULL,

  `a` varchar(20) DEFAULT NULL,

  `b` varchar(20) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `a` (`a`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql> show create table t2\G

*************************** 1\. row ***************************

       Table: t2

Create Table: CREATE TABLE `t2` (

  `id` int(11) NOT NULL,

  `a` int(11) DEFAULT NULL,

  `b` varchar(20) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `a` (`a`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

单表示例

需要说明的是，有以下两种类型的转换：

当字段类型为字符串类型，参数为整型时，会导致索引失效
而字段类型为整型，传入的参数为字符串类型时，不会导致索引失效

这是因为在字符串与数字进行比较时，MySQL会将字符串类型转换为数字进行比较，因此当字段类型为字符串时，会在字段上加函数，而导致索引失效

官方文档说明：Strings are automatically converted to numbers and numbers to strings as necessary. dev.mysql.com/doc/refman/…

-- 字段类型为varchar，传参为整数，无法走到索引

mysql> explain select * from t1 where a=1000;

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

|  1 | SIMPLE      | t1    | NULL       | ALL  | a             | NULL | NULL    | NULL | 498892 |    10.00 | Using where |

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

1 row in set, 3 warnings (0.00 sec)

mysql> show warnings;

+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+

| Level   | Code | Message                                                                                                                                           |

+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+

| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a'                                                               |

| Warning | 1739 | Cannot use range access on index 'a' due to type or collation conversion on field 'a'                                                             |

| Note    | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b` from `test1`.`t1` where (`test1`.`t1`.`a` = 1000) |

+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+

3 rows in set (0.00 sec)

-- 字段类型为int，传参为字符串，可以走到索引

mysql> explain select * from t2 where a='1000';

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

|  1 | SIMPLE      | t2    | NULL       | ref  | a             | a    | 5       | const |    1 |   100.00 | NULL  |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

1 row in set, 1 warning (0.00 sec)

至于为什么不能将数字转换为字符串进行比较呢?

看下下面的比较结果：

字符串的比较是逐个比较字符串的大小，直到找到不同的字符，这样的比较结果和数字的比较结果是不同的

mysql> select '2000' <'250';

+---------------+

| '2000' <'250' |

+---------------+

|             1 |

+---------------+

1 row in set (0.00 sec)

表连接场景中的类型转换

当两个表的连接字段类型不一致时会导致隐式转换(MySQL内部增加cast函数)，无法走到连接字段索引，进而可能无法使用最优的表连接顺序
原本作为被驱动表的表由于无法使用到索引，而可能作为驱动表

示例：

正常情况下会选择t2表作为驱动表，但由于数据类型不同，实际上执行的sql是：select * from t1 join t2 on cast(t1.a as unsigned)=t2.a where t2.id<1000
如果t1作为被驱动表，则没有办法走到t1.a的索引，因此选择t1表作为驱动表

mysql> explain select * from t1 join t2 on t1.a=t2.a where t2.id<1000;

+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref        | rows   | filtered | Extra                 |

+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+

|  1 | SIMPLE      | t1    | NULL       | ALL  | a             | NULL | NULL    | NULL       | 498892 |   100.00 | Using where           |

|  1 | SIMPLE      | t2    | NULL       | ref  | PRIMARY,a     | a    | 5       | test1.t1.a |      1 |     5.00 | Using index condition |

+----+-------------+-------+------------+------+---------------+------+---------+------------+--------+----------+-----------------------+

2 rows in set, 2 warnings (0.00 sec)

mysql> show warnings;

+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level   | Code | Message                                                                                                                                                                                                                                                                                    |

+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Warning | 1739 | Cannot use ref access on index 'a' due to type or collation conversion on field 'a'                                                                                                                                                                                                        |

| Note    | 1003 | /* select#1 */ select `test1`.`t1`.`id` AS `id`,`test1`.`t1`.`a` AS `a`,`test1`.`t1`.`b` AS `b`,`test1`.`t2`.`id` AS `id`,`test1`.`t2`.`a` AS `a`,`test1`.`t2`.`b` AS `b` from `test1`.`t1` join `test1`.`t2` where ((`test1`.`t2`.`id` < 1000) and (`test1`.`t1`.`a` = `test1`.`t2`.`a`)) |

+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

2 rows in set (0.01 sec)

字符集的隐式转换

当参数字符集和字段字符集不同时，无法直接进行比较，而需要进行字符集转换，则需要在转换字段上加convert函数来转换字符集，导致索引失效

测试表结构

database级别字符集为utf8mb4
t1表字符集为utf8，t2表字符集为utf8mb4（字段级别没有指定，默认继承表级别字符集）

mysql> show create database test\G

*************************** 1\. row ***************************

       Database: test

Create Database: CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8mb4 */

mysql> show create table t1\G

*************************** 1\. row ***************************

       Table: t1

Create Table: CREATE TABLE `t1` (

  `id` int(11) NOT NULL,

  `a` varchar(20) DEFAULT NULL,

  `b` varchar(20) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `a` (`a`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

1 row in set (0.00 sec)

mysql> show create table t2\G

*************************** 1\. row ***************************

       Table: t2

Create Table: CREATE TABLE `t2` (

  `id` int(11) NOT NULL,

  `a` varchar(20) DEFAULT NULL,

  `b` varchar(20) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `a` (`a`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

1 row in set (0.01 sec)

单表示例

-- 正常执行时，匹配字段的字符集（没有单独指定时继承表的字符集）
mysql> explain select * from t1 where a='1000';

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

|  1 | SIMPLE      | t1    | NULL       | ref  | a             | a    | 63      | const |    1 |   100.00 | NULL  |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

1 row in set, 1 warning (0.00 sec)

-- 将参数转换不同的字符集，无法走到索引，而是全表扫描

mysql> explain select * from t1 where a=convert('1000' using utf8mb4);

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2000 |   100.00 | Using where |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

1 row in set, 1 warning (0.00 sec)

-- show warnings可以看到优化器进行了转换，在t1.a上加了convert函数，从而无法走到索引

mysql> show warnings;

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level | Code | Message                                                                                                                                                                                               |

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` where (convert(`test`.`t1`.`a` using utf8mb4) = <cache>(convert('1000' using utf8mb4))) |

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

另外，需要注意的是，MySQL内部会优先将低级的字符集转换为更高级的字符集，例如在前面的示例中，convert函数加在t1.a上而下面这个示例，MySQL则是将utf8转换为utf8mb4，convert函数加在参数上，而非t2.a字段上：


mysql> explain select * from t2 where a=convert('1000' using utf8);

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

|  1 | SIMPLE      | t2    | NULL       | ref  | a             | a    | 83      | const |    1 |   100.00 | NULL  |

+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+

1 row in set, 1 warning (0.00 sec)

mysql> show warnings;

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level | Code | Message                                                                                                                                                                                   |

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Note  | 1003 | /* select#1 */ select `test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t2` where (`test`.`t2`.`a` = convert(convert('1000' using utf8) using utf8mb4)) |

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

综上：

在当表字段字符集为更低级的字符集（如utf8），而传入的值为更高级的字符集（如utf8mb4），则此时会转换表字段的字符集，相当于字段上使用了函数，索引失效
当表字段为更高级的字符集（如utfmb4），而传入的值为更低级的字符集（如utf8），则此时会将传入的值进行字符集转换，并不会导致索引失效

但我们通常不会去手工使用convert函数转换参数的字符集，在以下两种场景中可能会出现比较容易忽略的隐式字符集转换，引发生产问题

表连接中的字符集转换

当两个表的连接字段字符集不一致时会导致隐式转换(MySQL内部增加convert函数)，无法走到连接字段索引，进而可能无法使用最优的表连接顺序
原本作为被驱动表的表由于无法使用到索引，而可能作为驱动表

示例：

正常情况下，MySQL会优先小结果集的表作为驱动表，在本例中即为t2为驱动表，t1为被驱动表
但是由于字符集不同，实际上执行的sql为show warnings看到的，对t1.a字段加了convert函数进行转换字符集，则无法走到t1.a字段的索引而不得不改变连接顺序

mysql> explain select * from t1 left join t2 on t1.a=t2.a where t2.id<1000;

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                 |

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+

|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 498649 |   100.00 | NULL                  |

|  1 | SIMPLE      | t2    | NULL       | ref  | PRIMARY,a     | a    | 83      | func |      1 |     4.79 | Using index condition |

+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-----------------------+

2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;

+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level | Code | Message                                                                                                                                                                                                                                                                                                |

+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`id` < 1000) and (convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) |

+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

在下面示例中，虽然也发生了类型转换，但是效率并没有变差，因为原本最优的连接顺序就是t1作为驱动表

mysql> explain select * from t1 left join t2 on t1.a=t2.a where t1.id<1000;

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

|  1 | SIMPLE      | t1    | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |  999 |   100.00 | Using where |

|  1 | SIMPLE      | t2    | NULL       | ref   | a             | a       | 83      | func |    1 |   100.00 | Using where |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;

+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level | Code | Message                                                                                                                                                                                                                                                                                                   |

+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t1` left join `test`.`t2` on((convert(`test`.`t1`.`a` using utf8mb4) = `test`.`t2`.`a`)) where (`test`.`t1`.`id` < 1000) |

+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

存储过程中的字符集转换

这也是比较容易忽略的一种场景，问题的发现是在生产环境存储过程中根据主键更新，但却需要执行10s+

存储过程中变量的字符集默认继承自database的字符集（也可以在创建时指定），当表字段字符集和database的字符集不一样时，就会出现类似前面的隐式字符集类型转换

示例：

database的字符集是utf8mb4
character_set_client和collation_connection是创建存储过程时会话的character_set_client和collation_connection的值
经测试存储过程中的变量的字符集是和数据库级别的字符集一致

-- 存储过程信息： Database Collation: utf8mb4_general_ci
mysql> show create procedure update_data\G

*************************** 1\. row ***************************

           Procedure: update_data

            sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

    Create Procedure: CREATE DEFINER=`root`@`%` PROCEDURE `update_data`()

begin

  declare j int;

  declare n varchar(100);

   select charset(n);

  set j=1;

  while(j<=2000)do

set n = cast(j as char);

select 1,now();

    update t1 set b=concat(b,'1') where a=n;

select 2,now();

select sleep(1);

    set j=j+1;

  end while;

end

character_set_client: utf8mb4

collation_connection: utf8mb4_general_ci

  Database Collation: utf8mb4_general_ci

1 row in set (0.00 sec)

如下，在执行存储过程后，看到打印的变量n的字符集是utf8mb4

mysql> call update_data();

+------------+

| charset(n) |

+------------+

| utf8mb4    |

+------------+

1 row in set (0.00 sec)

根据索引字段a更新的语句实际上是变成了下面这样，走的是全表扫描（type:index，key：primary）


mysql> explain update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);

+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

|  1 | UPDATE      | t1    | NULL       | index | NULL          | PRIMARY | 4       | NULL | 498649 |   100.00 | Using where |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

1 row in set (0.00 sec)

而正常情况下，执行计划为：

mysql> explain update t1 set b=concat(b,'1') where a='1000';

+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+

| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |

+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+

|  1 | UPDATE      | t1    | NULL       | range | a             | a    | 63      | const |    1 |   100.00 | Using where |

+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------------+

1 row in set (0.00 sec)

更新时间也由0.00sec变为0.60sec，在表数据量很大的情况下，全表扫描将会对生产产生较大影响

mysql> update t1 set b=concat(b,'1') where a='1000';

Query OK, 1 row affected (0.00 sec)

Rows matched: 1  Changed: 1  Warnings: 0

mysql> update t1 set b=concat(b,'1') where a=convert('1000' using utf8mb4);

Query OK, 1 row affected (0.60 sec)

Rows matched: 1  Changed: 1  Warnings: 0

如何避免隐式转换

对于数据类型的隐式转换：

1、规范数据类型的选择 2、SQL传参与字段数据类型匹配

对于字符集的隐式转换：

1、客户端字符集、服务器端字符集、数据库字符集、表字符集、字段字符集保持一致