仅供参考
《麒麟V10P2安装gaussdb3.0.3(cm一主两备)及扩容和缩容节点操作笔记》
安装后,检查发现数据库流复制槽有问题 ,应该是dn_6004,dn_6005,实际配置成了dn_6005,dn_6006,导致配置同步节点是只有1个同步
下面是解决方法
相关参数配置
##wal相关参数
gs_guc set -I all -N all -c "wal_level=logical"
gs_guc set -I all -N all -c "full_page_writes=off"
gs_guc set -I all -N all -c "wal_log_hints=off"
gs_guc set -I all -N all -c "xloginsert_locks=48"
gs_guc set -I all -N all -c "advance_xlog_file_num=10"
##复制相关参数
gs_guc set -I all -N all -c "synchronous_commit=on"
gs_guc set -I all -N all -c "wal_keep_segments=1024"
gs_guc set -I all -N all -c "max_wal_senders=16"
gs_guc set -I all -N all -c "recovery_max_workers=4"
gs_guc set -I all -N all -c "most_available_sync=on"
gs_guc set -I all -N all -c "max_size_for_xlog_prune=104857600"
gs_guc set -I all -N all -c "catchup2normal_wait_time=0"
gs_guc set -I all -N all -c "enable_slot_log=on"
gs_guc set -I all -N all -c "max_replication_slots=32"
gs_guc set -I all -N all -c "wal_receiver_timeout=60s"
gs_guc set -I all -N all -c "sync_config_strategy=none_node"
gs_guc set -I all -N all -c "archive_mode = on"
gs_guc set -I all -N all -c "archive_dest = '/gauss/archive'"
gs_guc reload -N gsdb01 -D /gauss/gaussdb/data/dn -c "synchronous_standby_names='2(dn_6002,dn_6004)'"
gs_guc reload -N gsdb02 -D /gauss/gaussdb/data/dn -c "synchronous_standby_names='2(dn_6003,dn_6004)'"
gs_guc reload -N gsdb03 -D /gauss/gaussdb/data/dn -c "synchronous_standby_names='2(dn_6001,dn_6004)'"
gs_guc reload -N gsdb04 -D /gauss/gaussdb/data/dn -c "synchronous_standby_names='2(dn_6001,dn_6005)'"
gs_guc reload -N gsdb05 -D /gauss/gaussdb/data/dn -c "synchronous_standby_names='2(dn_6001,dn_6004)'"
系统查询集群状态
[omm@gsdb03 ~]$ cm_ctl query -Civd
[ CMServer State ]
node node_ip instance state
------------------------------------------------------------------------
1 gsdb01 192.168.0.61 1 /gauss/gaussdb/cmserver/cm_server Standby
2 gsdb02 192.168.0.62 2 /gauss/gaussdb/cmserver/cm_server Standby
3 gsdb03 192.168.0.63 3 /gauss/gaussdb/cmserver/cm_server Primary
4 gsdb04 192.168.0.64 4 /gauss/gaussdb/cmserver/cm_server Standby
5 gsdb05 192.168.0.65 5 /gauss/gaussdb/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state | node node_ip instance state | node node_ip instance state | node node_ip instance state
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.61 6001 /gauss/gaussdb/data/dn P Primary Normal | 2 gsdb02 192.168.0.62 6002 /gauss/gaussdb/data/dn S Standby Normal | 3 gsdb03 192.168.0.63 6003 /gauss/gaussdb/data/dn S Standby Normal | 4 gsdb04 192.168.0.64 6004 /gauss/gaussdb/data/dn S Standby Normal | 5 gsdb05 192.168.0.65 6005 /gauss/gaussdb/data/dn S Standby Normal
主库中查到的状态
openGauss=# select pid,usename,application_name,client_addr,sync_priority,sync_state from pg_stat_replication order by client_addr;
pid | usename | application_name | client_addr | sync_priority | sync_state
-----------------+---------+-------------------------------+--------------+---------------+------------
140703675119360 | omm | WalSender to Standby[dn_6002] | 192.168.0.62 | 1 | Sync
140703636256512 | omm | WalSender to Standby[dn_6003] | 192.168.0.63 | 0 | Async
140703519733504 | omm | WalSender to Standby[dn_6005] | 192.168.0.64 | 0 | Async
140703565936384 | omm | WalSender to Standby[dn_6006] | 192.168.0.65 | 0 | Async
(4 rows)
gsdb04 查看配置文件参数
[omm@gsdb04 dn]$ grep dn_600 postgresql.conf
synchronous_standby_names = '2(dn_6001,dn_6005)' # standby servers that provide sync rep
log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6005' # directory where log files are written,
pgxc_node_name = 'dn_6001_6002_6003_6004_6005' # Coordinator or Datanode name
audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6005'
application_name = 'dn_6005'
更改配置参数
gs_guc set -N gsdb04 -D /gauss/gaussdb/data/dn -c "application_name = 'dn_6004'"
gs_guc set -N gsdb04 -D /gauss/gaussdb/data/dn -c "log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6004'"
gs_guc set -N gsdb04 -D /gauss/gaussdb/data/dn -c "audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6004'"
更改后配置参数
[omm@gsdb04 dn]$ grep dn_600 postgresql.conf
synchronous_standby_names = '2(dn_6001,dn_6005)' # standby servers that provide sync rep
log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6004' # directory where log files are written,
pgxc_node_name = 'dn_6001_6002_6003_6004_6005' # Coordinator or Datanode name
audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6004'
application_name = 'dn_6004'
重启gsdb04 集群实例
[omm@gsdb04 dn]$ cm_ctl stop -n 4
cm_ctl: stop the node: 4.
cm_ctl: stop node, nodeid: 4
.........
cm_ctl: stop node successfully.
[omm@gsdb04 dn]$ cm_ctl start -n 4
cm_ctl: start the node:4.
..
cm_ctl: start node successfully.
库中查看状态 dn_6005变成了dn_6004为正常了
openGauss=# select pid,usename,application_name,client_addr,sync_priority,sync_state from pg_stat_replication order by client_addr;
pid | usename | application_name | client_addr | sync_priority | sync_state
-----------------+---------+-------------------------------+--------------+---------------+------------
140703675119360 | omm | WalSender to Standby[dn_6002] | 192.168.0.62 | 1 | Sync
140703636256512 | omm | WalSender to Standby[dn_6003] | 192.168.0.63 | 0 | Async
140703519733504 | omm | WalSender to Standby[dn_6004] | 192.168.0.64 | 2 | Sync
140703565936384 | omm | WalSender to Standby[dn_6006] | 192.168.0.65 | 0 | Async
(4 rows)
同理修改dn_6006为dn_6005
修改之前参数配置
[omm@gsdb05 dn]$ grep dn_600 postgresql.conf
synchronous_standby_names = '2(dn_6001,dn_6004)' # standby servers that provide sync rep
log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6006' # directory where log files are written,
pgxc_node_name = 'dn_6001_6002_6003_6004_6005' # Coordinator or Datanode name
audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6006'
application_name = 'dn_6006'
更改配置参数
gs_guc set -N gsdb05 -D /gauss/gaussdb/data/dn -c "application_name = 'dn_6005'"
gs_guc set -N gsdb05 -D /gauss/gaussdb/data/dn -c "log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6005'"
gs_guc set -N gsdb05 -D /gauss/gaussdb/data/dn -c "audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6005'"
更改后配置参数
[omm@gsdb05 dn]$ grep dn_600 postgresql.conf
synchronous_standby_names = '2(dn_6001,dn_6004)' # standby servers that provide sync rep
log_directory = '/gauss/gaussdb/log/omm/pg_log/dn_6005' # directory where log files are written,
pgxc_node_name = 'dn_6001_6002_6003_6004_6005' # Coordinator or Datanode name
audit_directory = '/gauss/gaussdb/log/omm/pg_audit/dn_6005'
application_name = 'dn_6005'
重启gsdb05 集群实例
[omm@gsdb05 dn]$ cm_ctl stop -n 5
cm_ctl: stop the node: 5.
cm_ctl: stop node, nodeid: 5
.........
cm_ctl: stop node successfully.
[omm@gsdb05 dn]$ cm_ctl start -n 5
cm_ctl: start the node:5.
..
cm_ctl: start node successfully.
主库中查看状态 dn_6006变成了dn_6005也正常了
openGauss=# select pid,usename,application_name,client_addr,sync_priority,sync_state from pg_stat_replication order by client_addr;
pid | usename | application_name | client_addr | sync_priority | sync_state
-----------------+---------+-------------------------------+--------------+---------------+------------
140703675119360 | omm | WalSender to Standby[dn_6002] | 192.168.0.62 | 1 | Sync
140703636256512 | omm | WalSender to Standby[dn_6003] | 192.168.0.63 | 0 | Async
140703519733504 | omm | WalSender to Standby[dn_6004] | 192.168.0.64 | 2 | Sync
140703565936384 | omm | WalSender to Standby[dn_6005] | 192.168.0.65 | 0 | Async
(4 rows)
无效复制槽
查看复制槽
select slot_name,coalesce(plugin,'_') as plugin,
slot_type,datoid,coalesce(database,'_') as database,
active,coalesce(xmin,'_') as xmin,
pg_size_pretty(pg_xlog_location_diff(CASE WHEN pg_is_in_recovery() THEN pg_last_xlog_receive_location() ELSE pg_current_xlog_location() END , restart_lsn)) AS retained_bytes
from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | xmin | retained_bytes
-----------+--------+-----------+--------+----------+--------+------+----------------
dn_6002 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6003 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6005 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6006 | _ | physical | 0 | _ | f | 0 | 8861 kB
dn_6004 | _ | physical | 0 | _ | t | 0 | 0 bytes
(5 rows)
清理复制槽
select pg_drop_replication_slot('dn_6006');
清理后
openGauss-# from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | xmin | retained_bytes
-----------+--------+-----------+--------+----------+--------+------+----------------
dn_6002 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6003 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6005 | _ | physical | 0 | _ | t | 0 | 0 bytes
dn_6004 | _ | physical | 0 | _ | t | 0 | 0 bytes
[omm@gsdb04 dn]$ cm_ctl stop
cm_ctl: stop cluster.
cm_ctl: stop nodeid: 1
cm_ctl: stop nodeid: 2
cm_ctl: stop nodeid: 3
cm_ctl: stop nodeid: 4
cm_ctl: stop nodeid: 5
[1] 19:44:46 [FAILURE] 192.168.0.63 Exited with error code 255
[1] 19:44:46 [FAILURE] 192.168.0.64 Exited with error code 255
[1] 19:44:47 [FAILURE] 192.168.0.65 Exited with error code 255
[1] 19:44:47 [FAILURE] 192.168.0.62 Exited with error code 255
[1] 19:44:47 [FAILURE] 192.168.0.61 Exited with error code 255
....
cm_ctl: stop cluster successfully.
[omm@gsdb04 dn]$ cm_ctl start
cm_ctl: checking cluster status.
cm_ctl: checking cluster status.
cm_ctl: checking finished in 1457 ms.
cm_ctl: The ssh connection time out or the ssh trust relationship is abnormal on some nodes. But the cluster will continue to start.
cm_ctl: start cluster.
cm_ctl: start nodeid: 1
cm_ctl: start nodeid: 2
cm_ctl: start nodeid: 3
cm_ctl: start nodeid: 4
cm_ctl: start nodeid: 5
[1] 19:45:04 [FAILURE] 192.168.0.61 Exited with error code 255
[1] 19:45:04 [FAILURE] 192.168.0.65 Exited with error code 255
[1] 19:45:04 [FAILURE] 192.168.0.62 Exited with error code 255
[1] 19:45:04 [FAILURE] 192.168.0.64 Exited with error code 255
[1] 19:45:04 [FAILURE] 192.168.0.63 Exited with error code 255
.............
cm_ctl: start cluster successfully.
[omm@gsdb04 dn]$ cm_ctl query -Civd
[ CMServer State ]
node node_ip instance state
------------------------------------------------------------------------
1 gsdb01 192.168.0.61 1 /gauss/gaussdb/cmserver/cm_server Standby
2 gsdb02 192.168.0.62 2 /gauss/gaussdb/cmserver/cm_server Standby
3 gsdb03 192.168.0.63 3 /gauss/gaussdb/cmserver/cm_server Primary
4 gsdb04 192.168.0.64 4 /gauss/gaussdb/cmserver/cm_server Standby
5 gsdb05 192.168.0.65 5 /gauss/gaussdb/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state | node node_ip instance state | node node_ip instance state | node node_ip instance state
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.61 6001 /gauss/gaussdb/data/dn P Standby Normal | 2 gsdb02 192.168.0.62 6002 /gauss/gaussdb/data/dn S Primary Normal | 3 gsdb03 192.168.0.63 6003 /gauss/gaussdb/data/dn S Standby Normal | 4 gsdb04 192.168.0.64 6004 /gauss/gaussdb/data/dn S Standby Normal | 5 gsdb05 192.168.0.65 6005 /gauss/gaussdb/data/dn S Standby Normal
另一个问题,操作系统查看集群状态,同步状态和数据库中不一致,暂未解决,以数据库中同步状态为准
[omm@gsdb02 ~]$ gs_om -t status --all|grep sync
sync_state : Async
sync_state : Unknown
sync_state : Async
sync_state : Async
sync_state : Async
sync_state : Async