故障解析丨导入字符串NULL导致主从报错

2023年 12月 13日 93.9k 0

1.背景概述

目前需要搭建一个从库,由于单表数据量较大,时间比较有限,考虑到导入导出的时间,并且GreatSQL支持并行load data的功能,能够加速数据的导入,因此决定使用 select into outfile 和 load data 的方式进行数据的迁移;

在数据导入完成后进行数据同步,从库发生报错 1032 找不到记录,但是登录从库中可以查询到此条记录,这里就很奇怪;

最后通过解析relaylog,根据relaylog中的update记录,以每个字段为查询条件进行查询,发现是由于NULL值列导致的,主库这列的值是 NULL,从库在导入后导成了字符串"NULL",因此导致回放update操作时匹配不到数据而报错1032.

2.问题复现

本次测试基于 GreatSQL 8.0.32-24

2.1 初始化2个单机实例

2.2 主节点创建测试表

greatsql> create database test;
greatsql> use test;

greatsql> create table t1 (id int,
name varchar(30),
age int,
addr varchar(30),
school varchar(30),
unique key (id)) engine=innodb;

greatsql> insert into t1 values
(1,'小红',10,'北京','一中'),
(2,'小绿',11,'北京','一中'),
(3,'小黄',12,'北京',NULL),
(4,'小蓝',13,'北京',NULL),
(5,'小黑',14,'北京',NULL);

2.3 查看数据

greatsql> select * from t1;
+----+--------+------+--------+--------+
| id | name | age | addr | school |
+----+--------+------+--------+--------+
| 1 | 小红 | 10 | 北京 | 一中 |
| 2 | 小绿 | 11 | 北京 | 一中 |
| 3 | 小黄 | 12 | 北京 | NULL |
| 4 | 小蓝 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+----+--------+------+--------+--------+
5 rows in set (0.00 sec)

2.4 主节点导出数据

greatsql> select * from test.t1 into outfile '/greatsql/t1.csv' FIELDS TERMINATED BY '|+|' ESCAPED BY '' LINES TERMINATED BY '/*rowsxxx*/';

2.5 查看导出的数据

$ cat t1.csv

1|+|小红|+|10|+|北京|+|一中/*rowsxxx*/2|+|小绿|+|11|+|北京|+|一中/*rowsxxx*/3|+|小黄|+|12|+|北京|+|NULL/*rowsxxx*/4|+|小蓝|+|13|+|北京|+|NULL/*rowsxxx*/5|+|小黑|+|14|+|北京|+|NULL/*rowsxxx

可以看到导出的数据中包含 NULL

2.6 从库创建表并导入数据

greatsql> create database test;
use test;
create table t1 (id int,
name varchar(30),
age int,
addr varchar(30),
school varchar(30),
unique key (id)) engine=innodb;

导入数据

greatsql> load data infile '/greatsql/t1.csv' into table test.t1 fields terminated by '|+|' ESCAPED BY '' lines terminated by '/*rowsxxx*/';

2.7 从库查询数据

greatsql> select * from test.t1;
+----+--------+------+--------+--------+
| id | name | age | addr | school |
+----+--------+------+--------+--------+
| 1 | 小红 | 10 | 北京 | 一中 |
| 2 | 小绿 | 11 | 北京 | 一中 |
| 3 | 小黄 | 12 | 北京 | NULL |
| 4 | 小蓝 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+----+--------+------+--------+--------+
5 rows in set (0.00 sec)

2.8 从库建立复制

greatsql> reset master;
Query OK, 0 rows affected (0.04 sec)

greatsql> set global gtid_purged='b94e6517-68dd-11ee-b43b-00163ecb92e3:1-5755';
Query OK, 0 rows affected (0.00 sec)

greatsql> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------+
| binlog.000001 | 153 | | | b94e6517-68dd-11ee-b43b-00163ecb92e3:1-5755 |
+---------------+----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)

greatsql> change master to master_user='root',master_password='greatdb',master_host='192.168.137.162',master_port=6001,master_auto_position=1;
Query OK, 0 rows affected, 7 warnings (0.02 sec)

greatsql> start slave;
Query OK, 0 rows affected, 1 warning (0.04 sec)

greatsql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1861574
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 395
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
......

2.9 主库更新数据

greatsql> update test.t1 set name='小小黑' where id=5;

2.10 从库查看复制状态

greatsql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1863564
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 395
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'b94e6517-68dd-11ee-b43b-00163ecb92e3:5756' at master log binlog.000002, end_log_pos 1863537. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
......

greatsql> select * from performance_schema.replication_applier_status_by_worker limit 1G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1032
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'b94e6517-68dd-11ee-b43b-00163ecb92e3:5756' at master log binlog.000002, end_log_pos 1863537; Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1863537
LAST_ERROR_TIMESTAMP: 2023-10-17 10:02:46.396166
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION: b94e6517-68dd-11ee-b43b-00163ecb92e3:5756
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2023-10-17 10:02:46.392331
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2023-10-17 10:02:46.392331
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2023-10-17 10:02:46.393814
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)

可以看到从库发生1032报错,找不到记录

2.11 解析从库relay log

#231017 10:02:46 server id 1 end_log_pos 1863456 Table_map: `test`.`t1` mapped to number 180
# has_generated_invisible_primary_key=0
# at 673
#231017 10:02:46 server id 1 end_log_pos 1863537 Update_rows: table id 180 flags: STMT_END_F
### UPDATE `test`.`t1`
### WHERE
### @1=5 /* INT meta=0 nullable=1 is_null=0 */
### @2='小黑' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
### @3=14 /* INT meta=0 nullable=1 is_null=0 */
### @4='北京' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
### @5=NULL /* VARSTRING(120) meta=120 nullable=1 is_null=1 */
### SET
### @1=5 /* INT meta=0 nullable=1 is_null=0 */
### @2='小小黑' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
### @3=14 /* INT meta=0 nullable=1 is_null=0 */
### @4='北京' /* VARSTRING(120) meta=120 nullable=1 is_null=0 */
### @5=NULL /* VARSTRING(120) meta=120 nullable=1 is_null=1 */
# at 754
#231017 10:02:46 server id 1 end_log_pos 1863564 Xid = 5940
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by greatsqlbinlog */ /*!*/;
DELIMITER ;
# End of log file

可以看到update更新前的数据与从库的数据一致,那么为什么还会报错 找不到记录呢?

2.12 根据relay log中的内容去从库查询数据

greatsql> select * from test.t1 where id=5;
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.01 sec)

greatsql> select * from test.t1 where name='小黑';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.01 sec)

greatsql> select * from test.t1 where age=14;
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
1 row in set (0.00 sec)

greatsql> select * from test.t1 where addr='北京';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 1 | 小红 | 10 | 北京 | 一中 |
| 2 | 小绿 | 11 | 北京 | 一中 |
| 3 | 小黄 | 12 | 北京 | NULL |
| 4 | 小蓝 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
5 rows in set (0.00 sec)

greatsql> select * from test.t1 where school is null;
Empty set (0.01 sec)

greatsql> select * from test.t1 where school='null';
+------+--------+------+--------+--------+
| id | name | age | addr | school |
+------+--------+------+--------+--------+
| 3 | 小黄 | 12 | 北京 | NULL |
| 4 | 小蓝 | 13 | 北京 | NULL |
| 5 | 小黑 | 14 | 北京 | NULL |
+------+--------+------+--------+--------+
3 rows in set (0.00 sec)

可以看到,根据null值作为查询条件时,匹配不到数据; 根据字符串"null" 进行匹配是可以匹配到数据

2.13 去主库进行查询

greatsql> select * from test.t1 where school is null;
+------+-----------+------+--------+--------+
| id | name | age | addr | school |
+------+-----------+------+--------+--------+
| 3 | 小黄 | 12 | 北京 | NULL |
| 4 | 小蓝 | 13 | 北京 | NULL |
| 5 | 小小黑 | 14 | 北京 | NULL |
+------+-----------+------+--------+--------+
3 rows in set (0.00 sec)

greatsql> select * from test.t1 where school='null';
Empty set (0.00 sec)

在主库查询的结果与从库相反

可以得出结论,由于从库导入的数据将NULL值列的数据导入成了字符串 NULL,因此导致主从数据出现了不一致。

2.14 修复从库

greatsql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

greatsql> update test.t1 set school=NULL where school='null';
Query OK, 3 rows affected (0.01 sec)
Rows matched: 3 Changed: 3 Warnings: 0

greatsql> start slave;
Query OK, 0 rows affected, 1 warning (0.04 sec)

greatsql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.137.162
Master_User: root
Master_Port: 6001
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 1863564
Relay_Log_File: relaylog.000003
Relay_Log_Pos: 435
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
......

可以看到主从状态已经恢复正常

3.总结

1.如果FIELDS ESCAPED BY字符为空字符,则没有字符被转义,并且NULL被作为NULL输出,而不是N;这也是导致此次主从报错的原因。

2.如果这张表使用的是主键而不是唯一索引,即使某些列被导入为字符串NULL,也不会报错。

3.如果这张表没有索引或有普通索引,则会报错。

Enjoy GreatSQL 🙂

关于GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,可以作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。

相关链接

GreatSQL社区

Gitee

Github

Bilibili

技术交流群

微信:添加GreatSQL社区助手好友,微信号wanlidbc发送验证信息加群

QQ群:533341697

相关文章

Oracle如何使用授予和撤销权限的语法和示例
Awesome Project: 探索 MatrixOrigin 云原生分布式数据库
下载丨66页PDF,云和恩墨技术通讯(2024年7月刊)
社区版oceanbase安装
Oracle 导出CSV工具-sqluldr2
ETL数据集成丨快速将MySQL数据迁移至Doris数据库

发布评论