点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!
现象描述
4.x生产集群目前某租户合并出现checksum error,全局索引和主表数据不一致。
排查步骤
2.1 sys租户查看全局合并信息
1006租户合并中存在CHECKSUM_ERROR select * from CDB_OB_MAJOR_COMPACTION;
2.2 查询表和索引表(包括:全局索引和局部索引)与主表之间出现的列校验和不一致的信息
select * from CDB_OB_TABLET_CHECKSUM_ERROR_INFO;
select * from CDB_OB_COLUMN_CHECKSUM_ERROR_INFO;
2.3 根据合并的版本号compaction_scn查询主表与索引表数据是否一致(结果不一致)
select sum(row_count) from
__all_virtual_tablet_replica_checksum where tenant_id = 1006
and compaction_scn = 1715191202784774818 and tablet_id in
(select tablet_id from cdb_ob_table_locations where tenant_id = 1006 and table_id = 3696982); --114007773
select sum(row_count) from
__all_virtual_tablet_replica_checksum where tenant_id = 1006
and compaction_scn = 1715191202784774818 and tablet_id in
(select tablet_id from cdb_ob_table_locations where tenant_id = 1006 and table_id = 3697015) ;--114007527
2.4 查询主表的schema信息主表为无主键range分区表。
MySQL [oceanbase]> select ddl_stmt_str from __all_virtual_ddl_operation where table_id = 3696982 and ddl_stmt_str ' ';
CREATE TABLE REWARD_USER_ORDER_111202404 PARTITION BY RANGE(PARTITION_ID)
(PARTITION PAR_0 VALUES LESS THAN (1000),
PARTITION PAR_1 VALUES LESS THAN (2000),
PARTITION PAR_2 VALUES LESS THAN (3000),
PARTITION PAR_3 VALUES LESS THAN (4000),
PARTITION PAR_4 VALUES LESS THAN (5000),
PARTITION PAR_5 VALUES LESS THAN (6000),
PARTITION PAR_6 VALUES LESS THAN (7000),
PARTITION PAR_7 VALUES LESS THAN (8000),
PARTITION PAR_8 VALUES LESS THAN (9000),
PARTITION PAR_9 VALUES LESS THAN (MAXVALUE))
AS SELECT * FROM TEMPLATE_REWARD_USER_ORDER WHERE 1 1;
2.5 修改内部表状态
MySQL [oceanbase]> select tenant_name,tenant_id from dba_ob_tenants;
MySQL [oceanbase]> alter system change tenant tenant_id=1005;\\切换meta租户
Query OK, 0 rows affected (0.001 sec)
MySQL [oceanbase]> select * from __all_column_checksum_error_info;
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
| gmt_create | gmt_modified | tenant_id | frozen_scn | index_type | data_table_id | index_table_id | data_tablet_id | index_tablet_id | column_id | data_column_ckm | index_column_ckm |
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
| 2024-05-09 03:12:04.297722 | 2024-05-09 03:12:04.297722 | 1006 | 1715191202784774818 | 1 | 3696982 | 3697015 | 0 | 0 | 1 | 81609826643664396 | 81609668040138333 |
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
1 row in set (0.003 sec)
MySQL [oceanbase]> select * from __all_merge_info;
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |
| 2023-12-22 16:55:24.029360 | 2024-05-09 03:12:05.177653 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 1 | 1715104801481078385 | 1 | 1 | 0 | 1715191203192248 | 1715110900059331 |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
2 rows in set (0.006 sec)
MySQL [oceanbase]> select * from __all_column_checksum_error_info;
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
| gmt_create | gmt_modified | tenant_id | frozen_scn | index_type | data_table_id | index_table_id | data_tablet_id | index_tablet_id | column_id | data_column_ckm | index_column_ckm |
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
| 2024-05-09 03:12:04.297722 | 2024-05-09 03:12:04.297722 | 1006 | 1715191202784774818 | 1 | 3696982 | 3697015 | 0 | 0 | 1 | 81609826643664396 | 81609668040138333 |
+----------------------------+----------------------------+-----------+---------------------+------------+---------------+----------------+----------------+-----------------+-----------+-------------------+-------------------+
1 row in set (0.002 sec)
MySQL [oceanbase]> delete from __all_column_checksum_error_info;
Query OK, 1 row affected (0.007 sec)
MySQL [oceanbase]> select * from __all_column_checksum_error_info;
Empty set (0.001 sec)
MySQL [oceanbase]> select * from __all_merge_info;
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |
| 2023-12-22 16:55:24.029360 | 2024-05-09 03:12:05.177653 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 1 | 1715104801481078385 | 1 | 1 | 0 | 1715191203192248 | 1715110900059331 |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
2 rows in set (0.002 sec)
MySQL [oceanbase]> update __all_merge_info set is_merge_error = 0, error_type = 0 where tenant_id = 1006;
Query OK, 1 row affected (0.009 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [oceanbase]> commit;
Query OK, 0 rows affected (0.000 sec)
MySQL [oceanbase]>
MySQL [oceanbase]> select * from __all_merge_info;
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| gmt_create | gmt_modified | tenant_id | cluster | frozen_scn | global_broadcast_scn | is_merge_error | last_merged_scn | merge_status | error_type | suspend_merging | merge_start_time | last_merged_time |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
| 2023-12-22 16:55:24.023988 | 2024-05-09 02:03:14.332966 | 1005 | 0 | 1715191201458885386 | 1715191201458885386 | 0 | 1715191201458885386 | 0 | 0 | 0 | 1715191201495297 | 1715191394333195 |
| 2023-12-22 16:55:24.029360 | 2024-05-09 17:27:22.123728 | 1006 | 0 | 1715191202784774818 | 1715191202784774818 | 0 | 1715104801481078385 | 1 | 0 | 0 | 1715191203192248 | 1715110900059331 |
+----------------------------+----------------------------+-----------+---------+---------------------+----------------------+----------------+---------------------+--------------+------------+-----------------+------------------+------------------+
2 rows in set (0.002 sec)
2.6 重新查看合并信息,错误码已经消失当前还处于合并中,要想解决需要切换1006租户1号日志流的leader到其他节点。
select * from CDB_OB_MAJOR_COMPACTION;
2.7 查看1006租户的 1号日志流信息
select * from __all_virtual_ls_info where tenant_id = 1006 and ls_id = 1;
2.8 切换日志流,解决问题注:--查看是否有error,没有error后才可以进行切换日志流
select count(*) from __all_virtual_tablet_meta_table where tenant_id = 1006 and status = 1;
ALTER SYSTEM SWITCH REPLICA leader LS = 1 SERVER = '10.161.243.22:2882' TENANT = cbss_reports;
--确认切换完毕:
select * from __all_virtual_ls_info where tenant_id = 1006 and ls_id = 1;
-- 确认是否还有小于1715191202784774818版本的数据(成功应该是0):
select count(*) from __all_virtual_tablet_meta_table where tenant_id = 1006 and report_scn < 1715191202784774818;
END