重建损坏的PostgreSQL备库
场景说明:
由于磁盘资源不足, 将node2节点暂时关闭,此时node1、node3都在正常运行,待 增加空间后(此时主节点增加了操作数据,新增了WAL文件,已备份到node3节点),开启node2节点,此时发现node报错:找不到 000000010000000000000009。
错误内容如下:
[postgres@Node2 /]$ pg2024-04-24 10:00:54.080 CST [113689] LOG: started streaming WAL from primary at 0/9000000 on timeline 1
2024-04-24 10:00:54.080 CST [113689] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000009 has already been removed
2024-04-24 10:00:54.081 CST [94151] LOG: waiting for WAL to become available at 0/9002000
_ctl stop
@font-face{
font-family:"Times New Roman";
}
@font-face{
font-family:"宋体";
}
@font-face{
font-family:"Calibri";
}
p.MsoNormal{
mso-style-name:正文;
mso-style-parent:"";
margin:0pt;
margin-bottom:.0001pt;
mso-pagination:none;
text-align:justify;
text-justify:inter-ideograph;
font-family:Calibri;
mso-fareast-font-family:宋体;
mso-bidi-font-family:'Times New Roman';
font-size:10.5000pt;
mso-font-kerning:1.0000pt;
}
span.msoIns{
mso-style-type:export-only;
mso-style-name:"";
text-decoration:underline;
text-underline:single;
color:blue;
}
span.msoDel{
mso-style-type:export-only;
mso-style-name:"";
text-decoration:line-through;
color:red;
}
@page{mso-page-border-surround-header:no;
mso-page-border-surround-footer:no;}@page Section0{
}
div.Section0{page:Section0;}
1、恢复操作如下:
[postgres@Node2 /]$ pg_resetwal -l 00000001000000000000003F /postgres/data
pg_resetwal: error: lock file "postmaster.pid" exists
pg_resetwal: hint: Is a server running? If not, delete the lock file and try again.
[postgres@Node2 /]$
2、由于此时无法使用操作pg_resetwal 恢复,果断使用重建备库node2节点,操作如下:
p_ctl stop
cp -r /postgres/data/* /postgres/backup --建议清除前先备份
rm -r /postgres/data
--检查是否可以注册备节点
/postgres/server/bin/repmgr -h 192.168.4.71 -p5432 -U repmgr -d repmgr -f /postgres/repmgr/repmgr.conf standby clone --dry-run
--注意备库repmgr.conf文件中的目录要为空
/postgres/server/bin/repmgr -h 192.168.4.71 -p5432 -U repmgr -d repmgr -f /postgres/repmgr/repmgr.conf standby clone --force
--注册备库
/postgres/server/bin/repmgr -f /postgres/repmgr/repmgr.conf standby register
3、验证数据的完整性
cssd=>
cssd=> select relid,schemaname,relname from pg_stat_user_tables;
relid | schemaname | relname
-------+---------------+---------------------------------
17419 | schema_server | TInstrumentInLocation
18101 | schema_server | TSysFormFieldConfig
17072 | schema_server | TFlowRecycleAbnormalReturn
16658 | schema_server | TFlowCleanBatch
17941 | schema_server | TMasterTag
18173 | schema_server | TTraceClean
18234 | schema_server | TTrainDocument
18038 | schema_server | TReportAuth
18092 | schema_server | TSysConfiguration
17932 | schema_server | TMasterSupplier