OceanBase数据库合并出现checksum error
点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!
现象描述
4.x生产集群目前某租户合并出现checksum error,全局索引和主表数据不一致。

排查步骤
2.1 sys租户查看全局合并信息
1006租户合并中存在CHECKSUM_ERROR select * from CDB_OB_MAJOR_COMPACTION;
2.2 查询表和索引表(包括:全局索引和局部索引)与主表之间出现的列校验和不一致的信息
select <em> from CDB_OB_TABLET_CHECKSUM_ERROR_INFO;<br>select </em> from CDB_OB_COLUMN_CHECKSUM_ERROR_INFO;
2.3 根据合并的版本号compaction_scn查询主表与索引表数据是否一致(结果不一致)
select sum(row_count) from <br><strong>all_virtual_tablet_replica_checksum where tenant_id = 1006 <br>and compaction_scn = 1715191202784774818 and tablet_id in <br>(select tablet_id from cdb_ob_table_locations where tenant_id = 1006 and table_id = 3696982); --114007773<br><br>select sum(row_count) from <br></strong>all_virtual_tablet_replica_checksum where tenant_id = 1006 <br>and compaction_scn = 1715191202784774818 and tablet_id in <br>(select tablet_id from cdb_ob_table_locations where tenant_id = 1006 and table_id = 3697015) ;--114007527
2.4 查询主表的schema信息主表为无主键range分区表。
MySQL [oceanbase]> select ddl_stmt_str from <strong>all_virtual_ddl_operation where table_id = 3696982 and ddl_stmt_str ' ';<br><br>CREATE TABLE REWARD_USER_ORDER_111202404 PARTITION BY RANGE(PARTITION_ID)<br>(PARTITION PAR_0 VALUES LESS THAN (1000),<br>PARTITION PAR_1 VALUES LESS THAN (2000),<br>PARTITION PAR_2 VALUES LESS THAN (3000),<br>PARTITION PAR_3 VALUES LESS THAN (4000),<br>PARTITION PAR_4 VALUES LESS THAN (5000),<br>PARTITION PAR_5 VALUES LESS THAN (6000),<br>PARTITION PAR_6 VALUES LESS THAN (7000),<br>PARTITION PAR_7 VALUES LESS THAN (8000),<br>PARTITION PAR_8 VALUES LESS THAN (9000),<br>PARTITION PAR_9 VALUES LESS THAN (MAXVALUE))<br>AS SELECT <em> FROM TEMPLATE_REWARD_USER_ORDER WHERE 1 1;
2.5 修改内部表状态
MySQL [oceanbase]> select tenant_name,tenant_id from dba_ob_tenants;<br>MySQL [oceanbase]> alter system change tenant tenant_id=1005;\切换meta租户<br>Query OK, 0 rows affected (0.001 sec)<br><br><br>MySQL [oceanbase]> select </em> from </strong>all_column_checksum_error_info;<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>| gmt_create | gmt_modified | tenant_id | frozen_scn | index_type | data_table_id | index_table_id | data_tablet_id | index_tablet_id | column_id | data_column_ckm | index_column_ckm |<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>| 2024-05-09 03:12:04.297722 | 2024-05-09 03:12:04.297722 | 1006 | 1715191202784774818 | 1 | 3696982 | 3697015 | 0 | 0 | 1 | 81609826643664396 | 81609668040138333 |<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>1 row in set (0.003 sec)<br><br>MySQL [oceanbase]> select <em> from __all_merge_info;<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |<br>| 2023-12-22 16:55:24.029360 | 2024-05-09 03:12:05.177653 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 1 | 1715104801481078385 | 1 | 1 | 0 | 1715191203192248 | 1715110900059331 |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>2 rows in set (0.006 sec)<br><br>MySQL [oceanbase]> select </em> from <strong>all_column_checksum_error_info;<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>| gmt_create | gmt_modified | tenant_id | frozen_scn | index_type | data_table_id | index_table_id | data_tablet_id | index_tablet_id | column_id | data_column_ckm | index_column_ckm |<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>| 2024-05-09 03:12:04.297722 | 2024-05-09 03:12:04.297722 | 1006 | 1715191202784774818 | 1 | 3696982 | 3697015 | 0 | 0 | 1 | 81609826643664396 | 81609668040138333 |<br>+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+<br>1 row in set (0.002 sec)<br><br><br>MySQL [oceanbase]> delete from </strong>all_column_checksum_error_info;<br>Query OK, 1 row affected (0.007 sec)<br><br>MySQL [oceanbase]> select <em> from __all_column_checksum_error_info;<br>Empty set (0.001 sec)<br><br>MySQL [oceanbase]> select </em> from <strong>all_merge_info;<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |<br>| 2023-12-22 16:55:24.029360 | 2024-05-09 03:12:05.177653 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 1 | 1715104801481078385 | 1 | 1 | 0 | 1715191203192248 | 1715110900059331 |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>2 rows in set (0.002 sec)<br>MySQL [oceanbase]> update <strong>all_merge_info set is_merge_error = 0, error_type = 0 where tenant_id = 1006;<br>Query OK, 1 row affected (0.009 sec)<br>Rows matched: 1 Changed: 1 Warnings: 0<br><br><br>MySQL [oceanbase]> commit;<br>Query OK, 0 rows affected (0.000 sec)<br><br>MySQL [oceanbase]><br>MySQL [oceanbase]> select * from </strong>all_merge_info;<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |<br>| 2023-12-22 16:55:24.029360 | 2024-05-09 17:27:22.123728 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 0 | 1715104801481078385 | 1 | 0 | 0 | 1715191203192248 | 1715110900059331 |<br>+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+<br>2 rows in set (0.002 sec)
2.6 重新查看合并信息,错误码已经消失当前还处于合并中,要想解决需要切换1006租户1号日志流的leader到其他节点。
select <em> from CDB_OB_MAJOR_COMPACTION;
2.7 查看1006租户的 1号日志流信息
select </em> from __all_virtual_ls_info where tenant_id = 1006 and ls_id = 1;
2.8 切换日志流,解决问题注:--查看是否有error,没有error后才可以进行切换日志流
select count(*) from </strong>all_virtual_tablet_meta_table where tenant_id = 1006 and status = 1;
ALTER SYSTEM SWITCH REPLICA leader LS = 1 SERVER = '10.161.243.22:2882' TENANT = cbss_reports;
--确认切换完毕:
select <em> from __all_virtual_ls_info where tenant_id = 1006 and ls_id = 1;
-- 确认是否还有小于1715191202784774818版本的数据(成功应该是0):
select count(</em>) from __all_virtual_tablet_meta_table where tenant_id = 1006 and report_scn < 1715191202784774818;
END