How to handle the "not mark dirty" problem after enabling extreme RTO with resource pooling
Symptom
The scenario is a failover: the standby holds the latest version of a page, and after the primary fetches that page from the standby it should be marked dirty, but it is not.
Buffer dirty flag: bufferinfo->dirtyflag = false
On-disk LSN: bufDesc->lsn_on_disk = 2/419077B0
Buffer LSN: PageGetLSN(bufferinfo->pageinfo.page) = 2/4198B4F0
xlog LSN: bufferinfo->lsn = 2/419077B0
Summary: xlog LSN = on-disk LSN < buffer LSN
2024-02-02 15:12:24.163 [unknown] [unknown] localhost 140190590953216 0[0:0#0] 0 [BACKEND] PANIC: extreme_rto segment page not mark dirty:lsn 2/419077B0, lsn_disk 2/40187550, lsn_page 2/4198B4F0, page 1663/15201/5004 60990
Error location
bufferinfo was not marked dirty, even though the page is the latest version.
The error is raised around SSMarkBufferDirtyForERTO; the initial suspicion is that the BUF_ERTO_NEED_MARK_DIRTY flag is abnormally absent.
Related logs
MarkSegPageRedoChildPageDirty
lsn_on_disk is wrong
Analysis
1
Error message
Failing page: 1663/16388/5005/16384 0-3606. The problem: the page's lsn_on_disk (0/DF6D60C8) is smaller than the buffer LSN (0/E8C0D3B0), yet the page was not marked dirty.
2024-02-06 14:57:16.998 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] PANIC: extreme_rto segment page not mark dirty:lsn 0/DDAACD58, lsn_disk 0/DF6D60C8, lsn_page 0/E8C0D3B0, page 1663/16388/5005 3606
2024-02-06 14:57:16.998 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] CONTEXT: xlog redo [segpage] segment head extend: relfilenode/fork:, nblocks[3606->3607], (phy loc 128/101203), reset_zero:1
2024-02-06 14:57:16.998 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] BACKTRACELOG: tid[2717562]'s backtrace:
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb() [0x1163ac4]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_Z9errfinishiz+0x324) [0x1157c1c]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_Z29MarkSegPageRedoChildPageDirtyP14RedoBufferInfo+0x2a4) [0x22038ec]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_Z21SegPageRedoChildStateP17XLogRecParseState+0x84) [0x2203a04]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_Z21ProcSegPageCommonRedoP17XLogRecParseState+0xf0) [0x2203bf8]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_ZN11extreme_rto24RedoPageManagerDdlActionEP17XLogRecParseState+0x108) [0x20055b8]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_ZN11extreme_rto35PageManagerProcSegPipeLineSyncStateEP17XLogRecParseState+0x120) [0x20060ec]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_ZN11extreme_rto25PageManagerRedoParseStateEP17XLogRecParseState+0xcc) [0x20063d8]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_ZN11extreme_rto30PageManagerRedoDistributeItemsEP17XLogRecParseState+0xc0) [0x20066b0]
/home/zhoucong/work/openGauss-server-list/openGauss-server/dest/bin/gaussdb(_ZN11extreme_rto19RedoPageManagerMainEv+0x12c) [0x20067f8]
Added logs:
2024-02-06 14:57:12.829: "Mark need flush in flush copy" means the page needs to be flushed, and dirty marking should happen here.
2024-02-06 14:57:16.984: XLogBlockRedoForExtremeRTO shows replay reaching this page; redoaction = 2 (REDO_DONE) means the page was not actually replayed.
2024-02-06 14:57:16.984: MarkBufferDirtyForETRO shows the attempt to mark the page dirty; at this point no dirty flag is set.
postgresql-2024-02-06_145411.log:2024-02-06 14:54:57.448 [unknown] [unknown] localhost 281447834966560 0[0:0#0] 0 [BACKEND] LOG: [FlushBuffer] FlushBuffer, lsn_on_disk: 0/ddaacd58, bufferinfo.lsn: 0/ddaaea88spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606.
postgresql-2024-02-06_145659.log:2024-02-06 14:57:12.829 [unknown] [unknown] localhost 281458419489312 0[0:0#0] 0 [BACKEND] LOG: [SS] Mark need flush in flush copy, spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606, page lsn (0xe8c0d3b0)
postgresql-2024-02-06_145659.log:2024-02-06 14:57:16.984 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] WARNING: [XLogBlockRedoForExtremeRTO] XLogBlockRedoForExtremeRTO, redoaction: 2spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606.
postgresql-2024-02-06_145659.log:2024-02-06 14:57:16.984 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] WARNING: [SSMarkBufferDirtyForERTO] MarkBufferDirtyForETRO, buf_ctl->state: 32spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606.
postgresql-2024-02-06_145659.log:2024-02-06 14:57:16.984 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] WARNING: [SS] clear BUF_DIRTY_NEED_FLUSH, spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606
postgresql-2024-02-06_145659.log:2024-02-06 14:57:16.984 [unknown] [unknown] localhost 281456656767520 0[0:0#0] 0 [BACKEND] LOG: [SS] find, spc/db/rel/bucket fork-block: 1663/16388/5005/16384 0-3606
2
Dirty-marking code
After SegPageRedoChildState replays the head page of a segment-page relation, it calls MarkSegPageRedoChildPageDirty to mark the page dirty, which first calls SSMarkBufferDirtyForERTO; the dirty flag ultimately lands on bufferinfo->dirtyflag. dms_buf_ctrl_t is the DMS-side control information of a buffer. The page is marked dirty in the following cases:
buf_ctrl carries the BUF_ERTO_NEED_MARK_DIRTY flag
buf_ctrl carries the BUF_DIRTY_NEED_FLUSH flag
bufDesc->extra->lsn_on_disk == InvalidXLogRecPtr, which means the page was fetched from the standby
The added logging shows buf_ctl->state = 32 (BUF_IS_RELPERSISTENT): none of the buffer flags above is set.
void SSMarkBufferDirtyForERTO(RedoBufferInfo* bufferinfo)
{
    if (!ENABLE_DMS || bufferinfo->pageinfo.page == NULL) {
        return;
    }
    /* For buffer need flush, we need to mark dirty here */
    if (!IsRedoBufferDirty(bufferinfo)) {
        dms_buf_ctrl_t* buf_ctrl = GetDmsBufCtrl(bufferinfo->buf - 1);
        BufferDesc *bufDesc = GetBufferDescriptor(bufferinfo->buf - 1);
        /* observed in this failure: buf_ctrl->state == BUF_IS_RELPERSISTENT, so this branch is not taken */
        if (buf_ctrl->state & BUF_ERTO_NEED_MARK_DIRTY) {
            MakeRedoBufferDirty(bufferinfo);
        } else if ((buf_ctrl->state & BUF_DIRTY_NEED_FLUSH) || CheckPageNeedSkipInRecovery(bufferinfo->buf) ||
                   XLogRecPtrIsInvalid(bufDesc->extra->lsn_on_disk)) {
            buf_ctrl->state |= BUF_ERTO_NEED_MARK_DIRTY;
            MakeRedoBufferDirty(bufferinfo);
        }
    }
}
3
Page analysis
Run the following command to analyze the on-disk page:
./pagehack -t heap -f +data/base/16388/3 -s 101203 -D -c UDS:/home/zhoucong/work/dss/dss0/.dss_unix_d_socket > ~/3
The actual on-disk page LSN is 0/DDAAEA88, lsn_on_disk is 0/DF6D60C8, and the buffer LSN is 0/E8C0D3B0.
page information of block 101203/114688
pd_lsn: 0/DDAAEA88
pd_checksum: 0x4C1E, verify success
pd_flags:
pd_lower: 1488, non-empty
pd_upper: 2312, old
pd_special: 8168, size 24
Page size & version: 8192, 5
pd_xid_base: 9182949318625544, pd_multi_base: 9182811879677896
pd_prune_xid: 9182949318625544
4
DMS logs
_20240206145629759.dlog.gz:UTC+8 2024-02-06 14:56:09.124|DMS|2696417|INFO>[DMS][1663/16388/5005/16384/0 0-3606][proc claim owner]: src_id=1, src_sid=559, dst_id=0, dst_sid=65535, has_edp=0, req_mode=1 [dms_msg.c:1242]
dms_20240206145629759.dlog.gz:UTC+8 2024-02-06 14:56:09.125|DMS|2696417|INFO>[DCS][1663/16388/5005/16384/0 0-3606][drc_claim_page_owner]: mode =1, claimed_owner=0, edp_map=0, copy_insts=2 [drc_page.c:497]
dms.dlog:UTC+8 2024-02-06 14:57:06.008|DMS|2717224|INFO>[DRC rebuild][1663/16388/5005/16384/0 0-3606]remote_ditry: 0, lock_mode: 1, is_edp: 0, inst_id: 1, lsn: 3904951216, is_dirty: 0 [dms_reform_drc_rebuild.c:92]
dms.dlog:UTC+8 2024-02-06 14:57:06.008|DMS|2717224|INFO>[DRC][1663/16388/5005/16384/0 0-3606]buf_res create successful [drc_res_mgr.c:311]
dms.dlog:UTC+8 2024-02-06 14:57:11.340|DMS|2717284|INFO>[DRC repair][1663/16388/5005/16384/0 0-3606]1-1-0, CVT:255-0-0-0-0-18446744073709551615-65535, EDP:255-0-0, FLAG:0-1-0 [dms_reform_drc_repair.c:247]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DCS][1663/16388/5005/16384/0 0-3606][dcs request page enter] [dcs_page.c:373]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DMS][1663/16388/5005/16384/0 0-3606][ask master local]: src_id=0, req_mode=1, curr_mode=0, prep_ruid=77549 [dms_msg.c:558]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DMS][1663/16388/5005/16384/0 0-3606][dms_ask_master4res_l] result type=2 [dms_msg.c:588]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DMS]1663/16388/5005/16384/0 0-3606][ask owner for res]: send ok, src_id=0, src_sid=702, dst_id=1, dst_sid=65535, req_mode=1 [dms_msg.c:397]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DCS][1663/16388/5005/16384/0 0-3606][owner ack page ready]: lock mode=1, edp=0, src_id=1, src_sid=599, dest_id=0,dest_sid=702, mode=1, remote dirty=0, remote remote diry=0, page_lsn=0, page_scn=0,curr_page_lsn=3904951216, curr_global_lsn=0 [dcs_page.c:327]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DMS][1663/16388/5005/16384/0 0-3606][claim ownership req]: send ok, src_id=0, src_sid=702, dst_id=0, dst_sid=65535, has_edp=0, ruid=0 [dms_msg.c:294]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717231|INFO>[DMS][1663/16388/5005/16384/0 0-3606][proc claim owner]: src_id=0, src_sid=702, dst_id=0, dst_sid=65535, has_edp=0, req_mode=1 [dms_msg.c:1242]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717231|INFO>[DCS][1663/16388/5005/16384/0 0-3606][drc_claim_page_owner]: mode =1, claimed_owner=1, edp_map=0, copy_insts=1 [drc_page.c:497]
dms.dlog:UTC+8 2024-02-06 14:57:12.824|DMS|2717284|INFO>[DCS][1663/16388/5005/16384/0 0-3606][dcs request page leave] ret: 0 [dcs_page.c:375]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DCS][1663/16388/5005/16384/0 0-3606][dcs request page enter] [dcs_page.c:373]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DMS][1663/16388/5005/16384/0 0-3606][ask master local]: src_id=0, req_mode=2, curr_mode=1, prep_ruid=201176 [dms_msg.c:558]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DMS][1663/16388/5005/16384/0 0-3606][dms_ask_master4res_l] result type=2 [dms_msg.c:588]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DMS]1663/16388/5005/16384/0 0-3606][ask owner for res]: send ok, src_id=0, src_sid=925, dst_id=1, dst_sid=65535, req_mode=2 [dms_msg.c:397]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DCS][1663/16388/5005/16384/0 0-3606][owner ack page ready]: lock mode=2, edp=0, src_id=1, src_sid=609, dest_id=0,dest_sid=925, dirty=0, remote diry=0, page_lsn=0, page_scn=0 [dcs_page.c:303]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DCS][1663/16388/5005/16384/0 0-3606][owner ack page ready]: lock mode=2, edp=0, src_id=1, src_sid=609, dest_id=0,dest_sid=925, mode=2, remote dirty=0, remote remote diry=0, page_lsn=0, page_scn=0,curr_page_lsn=3904951216, curr_global_lsn=0 [dcs_page.c:327]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DMS][1663/16388/5005/16384/0 0-3606][claim ownership req]: send ok, src_id=0, src_sid=925, dst_id=0, dst_sid=65535, has_edp=0, ruid=0 [dms_msg.c:294]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717230|INFO>[DMS][1663/16388/5005/16384/0 0-3606][proc claim owner]: src_id=0, src_sid=925, dst_id=0, dst_sid=65535, has_edp=0, req_mode=2 [dms_msg.c:1242]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717230|INFO>[DCS][1663/16388/5005/16384/0 0-3606][drc_claim_page_owner]: mode =2, claimed_owner=0, edp_map=0, copy_insts=0 [drc_page.c:497]
dms.dlog:UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DCS][1663/16388/5005/16384/0 0-3606][dcs request page leave] ret: 0 [dcs_page.c:375]
5
XLOG analysis
Analyzing the xlog shows that the record at 0/DF6D60C8 modifies page 1663/16388/4986/16384/0, not the failing page 1663/16388/5005/16384, so this page's recorded lsn_on_disk does not belong to it: it is stale.
REDO @ 0/DF6D6058; LSN 0/DF6D60C8: prev 0/DF6D5FD0; xid 0; term 1; len 17; total 111; crc 11782719; desc: Heap2 - visible: cutoff xid 14253, blkref #0: rel 1663/16388/4986/16384/0, forknum:2 storage SEGMENT PAGE fork vm blk 0 (phy loc 8/4549) lastlsn 0/DF6D6058, blkref #1: rel 1663/16388/4986/16384/0, forknum:0 storage SEGMENT PAGE blk 41192 (phy loc 1024/136485) lastlsn 0/9C8773F0
REDO @ 0/DF6D60C8; LSN 0/DF6D6138: prev 0/DF6D6058; xid 0; term 1; len 17; total 111; crc 3273069089; desc: Heap2 - visible: cutoff xid 14253, blkref #0: rel 1663/16388/4986/16384/0, forknum:2 storage SEGMENT PAGE fork vm blk 0 (phy loc 8/4549) lastlsn 0/DF6D60C8, blkref #1: rel 1663/16388/4986/16384/0, forknum:0 storage SEGMENT PAGE blk 41193 (phy loc 1024/136486) lastlsn 0/9C87C1A8
REDO @ 0/DF6D6138; LSN 0/DF6D61A8: prev 0/DF6D60C8; xid 0; term 1; len 17; total 111; crc 3465733980; desc: Heap2 - visible: cutoff xid 14253, blkref #0: rel 1663/16388/4986/16384/0, forknum:2 storage SEGMENT PAGE fork vm blk 0 (phy loc 8/4549) lastlsn 0/DF6D6138, blkref #1: rel 1663/16388/4986/16384/0, forknum:0 storage SEGMENT PAGE blk 41194 (phy loc 1024/136487) lastlsn 0/9C880FE8
6
DMS logs
When replaying this record, the primary requested the page from the standby over DMS, so the page was fetched from the standby rather than read from disk.
UTC+8 2024-02-06 14:57:16.984|DMS|2717562|INFO>[DMS]1663/16388/5005/16384/0 0-3606][ask owner for res]: send ok, src_id=0, src_sid=925, dst_id=1, dst_sid=65535, req_mode=2 [dms_msg.c:397]
Conclusion: during replay the page had already been flushed to disk; a later replay step forcibly fetched the latest page from the standby, but when the page was swapped into the buffer the old buffer's lsn_on_disk was not updated. This left lsn_on_disk != invalid && lsn_on_disk < page LSN with no dirty mark on the page, so the consistency check failed and raised the PANIC.
Fix
In TerminateReadPage, initialize lsn_on_disk of a page fetched from the standby to InvalidXLogRecPtr; the subsequent logic will then mark the page dirty.
@@ -326,6 +326,7 @@ Buffer TerminateReadPage(BufferDesc* buf_desc, ReadBufferMode read_mode, const X
buf_desc->extra->seg_fileno == EXTENT_INVALID) {
CalcSegDmsPhysicalLoc(buf_desc, buffer, !g_instance.dms_cxt.SSRecoveryInfo.in_flushcopy);
}
+    buf_desc->extra->lsn_on_disk = InvalidXLogRecPtr;
}
if (BufferIsValid(buffer)) {
buf_ctrl->been_loaded = true;
Miscellaneous
BufferDesc notes
(gdb) p *bufdesc
$5 = {tag = {rnode = {spcNode = 1664, dbNode = 0, relNode = 4161, bucketNode = 16384, opt = 0}, forkNum = 0, blockNum = 0}, state = 14390126709355577345, buf_id = 1, wait_backend_pid = 0, io_in_progress_lock = 0xfffb1963bb00,
content_lock = 0xfffb1963bb80, extra = 0xfffbad50a230, lsn_dirty = 0}
(gdb) p *bufdesc->extra
$4 = {seg_fileno = 0 ' 00', seg_blockno = 4181, rec_lsn = 1086214936, dirty_queue_loc = 1, encrypt = false, lsn_on_disk = 1085681688, aio_in_progress = false}
bufdesc->tag.rnode.spcNode: tablespace OID
bufdesc->tag.rnode.dbNode: database OID
bufdesc->tag.rnode.relNode: relation OID
bufdesc->tag.rnode.bucketNode: used by GaussDB hashbucket tables; indicates the buffer type. For segment-page ordinary tables, bucketNode = 16384.
bufdesc->tag.forkNum: fork (file-suffix) of the relation
bufdesc->tag.blockNum: block number (ordinary tables use the physical block number directly; for segment-page tables this is the logical block, and the physical location is kept in bufdesc->extra, converted via seg_logic_to_physic_mapping when the buffer is read)
bufdesc->content_lock: buffer content lock
bufdesc->extra->seg_fileno: segment-store file number
bufdesc->extra->seg_blockno: physical block number in the segment-store file; each segment-store file slice holds 131072 blocks.
bufdesc->extra->rec_lsn:
bufdesc->extra->lsn_on_disk: the LSN of the newest modification of this buffer that has reached disk
typedef struct RelFileNode {
Oid spcNode; /* tablespace */
Oid dbNode; /* database */
Oid relNode; /* relation */
int2 bucketNode; /* bucketid */
uint2 opt;
} RelFileNode;
/*
* The physical storage of a relation consists of one or more forks. The
* main fork is always created, but in addition to that there can be
* additional forks for storing various metadata. ForkNumber is used when
* we need to refer to a specific fork in a relation.
*/
typedef int ForkNumber;
#define SEGMENT_EXT_8192_FORKNUM -8
#define SEGMENT_EXT_1024_FORKNUM -7
#define SEGMENT_EXT_128_FORKNUM -6
#define SEGMENT_EXT_8_FORKNUM -5
#define PAX_DFS_TRUNCATE_FORKNUM -4
#define PAX_DFS_FORKNUM -3
#define DFS_FORKNUM -2
#define InvalidForkNumber -1
#define MAIN_FORKNUM 0
#define FSM_FORKNUM 1
#define VISIBILITYMAP_FORKNUM 2
#define BCM_FORKNUM 3
#define INIT_FORKNUM 4
// used for data file cache, you can modify them as you like
#define PCA_FORKNUM 5
#define PCD_FORKNUM 6