Online Cross-Platform Migration of a TiDB Cluster (from x86 Nodes to arm64 Nodes in an Offline Environment) - CSDN Blog

System Operations, 2023-10-13, posted by 宇宙之一粟

Author: pepezzzz

Original source: tidb.net/blog/65522d…

Background

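TiDB clusters can be deployed, scaled out, and scaled in across CPU architectures, so an entire cluster can be moved to a different architecture while the application stays online. This article describes how to do this, using as the example a migration from x86 nodes to arm64 nodes in an offline environment.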

Migration Steps

1. Back up the data

Back up the cluster data before making any topology change; a full backup with br is recommended.
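A minimal sketch of the backup command, assuming BR v5.1.1 is available from the offline mirror as `tiup br:v5.1.1` and that the backup root (the hypothetical `/nfs/tidb-backup`) is a shared mount present at the same path on every TiKV node: with `local://` storage each TiKV writes its own backup files to that path on its own host. The helper function is hypothetical and only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: build (but do not run) a BR full-backup command for the cluster.
# Assumptions: BR is reachable as `tiup br:v5.1.1`, and BACKUP_ROOT is a shared
# directory (e.g. an NFS mount) at the same path on all TiKV nodes, because with
# local:// storage every TiKV writes to that path on its own host.

# br_full_backup_cmd PD_ADDR BACKUP_ROOT -> prints the command line
br_full_backup_cmd() {
  pd_addr="$1"
  backup_root="$2"
  # one timestamped directory per backup run
  target="$backup_root/full-$(date +%Y%m%d%H%M%S)"
  printf 'tiup br:v5.1.1 backup full --pd %s --storage local://%s --log-file backupfull.log\n' \
    "$pd_addr" "$target"
}

# Review the printed command, then run it by hand:
br_full_backup_cmd pd-IP1:2379 /nfs/tidb-backup
```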

2. Download and merge the mirror directories

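Download the offline packages from the official download page (pingcap.com/zh/product-…). The arm64 package has to be downloaded in addition to the amd64 one. Extract both, then merge the arm64 directory into the mirror TiUP currently uses (here the amd64 directory):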

$tar xf tidb-community-server-v5.1.1-linux-arm64.tar.gz
$tar xf tidb-community-server-v5.1.1-linux-amd64.tar.gz
$cd tidb-community-server-v5.1.1-linux-amd64
$cp -rp keys ~/.tiup/
$tiup mirror merge ../tidb-community-server-v5.1.1-linux-arm64
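After the merge, it is worth confirming that the mirror really offers the target version for both platforms. The sketch below assumes that `tiup list <component>` prints one line per version, with the version in the first column and a comma-separated platform list such as `linux/amd64,linux/arm64` on the same line; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that a merged offline mirror offers one component version
# for both linux/amd64 and linux/arm64.
# Assumption: `tiup list <component>` prints one line per version whose first
# column is the version and which contains the platform list.

# mirror_has_both_arch VERSION   (reads `tiup list` output on stdin)
mirror_has_both_arch() {
  version="$1"
  # keep only the line whose first column is the requested version
  line=$(awk -v v="$version" '$1 == v')
  [ -n "$line" ] || return 1
  case "$line" in *linux/amd64*) ;; *) return 1 ;; esac
  case "$line" in *linux/arm64*) ;; *) return 1 ;; esac
  return 0
}

# Usage on the TiUP node, after `tiup mirror merge`:
#   tiup list tidb | mirror_has_both_arch v5.1.1 && echo "amd64 and arm64 present"
```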

3. Migrate the TiUP node

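a. The tidb user has already been created on the target TiUP node.
b. Copy the mirror directory (the merged x86 + arm64 offline directory) to the target node. Because the migration crosses platforms, only the cluster metadata is migrated: pack the clusters directory under the admin user's .tiup/storage/cluster and copy it to the target node. Between nodes of the same architecture, the entire .tiup directory can be copied instead.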

$tiup mirror show   # prints the location of the current (merged) mirror directory
$scp -rp tidb-community-xxx tidb@tiup-IP:/home/tidb/
$cd ~/.tiup/storage/cluster/
$tar cvf clusters.tar clusters
$scp clusters.tar tidb@tiup-IP:/home/tidb/
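Before relying on the copied archive, a quick check can confirm it carries what the new node needs. This sketch assumes TiUP keeps each cluster's topology in `clusters/<name>/meta.yaml` and the SSH key it uses to reach the nodes in `clusters/<name>/ssh/id_rsa`; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: sanity-check clusters.tar before copying it to the new TiUP node.
# Assumption: TiUP stores clusters/<name>/meta.yaml (topology) and
# clusters/<name>/ssh/id_rsa (SSH key used to manage the nodes).

# check_clusters_tar ARCHIVE CLUSTER_NAME
check_clusters_tar() {
  archive="$1"
  name="$2"
  listing=$(tar -tf "$archive") || return 1
  for f in "clusters/$name/meta.yaml" "clusters/$name/ssh/id_rsa"; do
    # -x matches the whole line, -F treats the path as a literal string
    if ! printf '%s\n' "$listing" | grep -qxF "$f"; then
      echo "missing $f in $archive" >&2
      return 1
    fi
  done
  return 0
}

# Usage (the cluster name is a placeholder):
#   check_clusters_tar clusters.tar <cluster-name>
```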

c. Install TiUP on the target node.

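$cd tidb-community-xxx
$ls tiup-*-linux-arm64.tar.gz   # the merged mirror must contain the arm64 TiUP package
$./local_install.sh
$source ~/.bash_profile   # reload PATH as local_install.sh instructs (the profile file depends on the login shell)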

d. Extract the cluster metadata on the target node.

$cd ~
$mkdir -p .tiup/storage/cluster
$tar -C .tiup/storage/cluster/ -xvf clusters.tar

e. On the target node, verify with tiup cluster list and tiup cluster display that TiUP can manage the cluster.

$tiup cluster list
$tiup cluster display <cluster-name>

f. On the original node, back up the .tiup directory and then delete it, to prevent accidental operations from the old node later.

$tar cvf tiup.tar .tiup
$rm -rf .tiup
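Because `rm -rf` is irreversible, a slightly safer variant only deletes the directory once the archive can be read back. A minimal sketch; the function name and arguments are illustrative and not part of TiUP.

```shell
#!/bin/sh
# Sketch: archive a directory and delete it only if the archive reads back.
# backup_then_remove is a hypothetical helper, not a TiUP command.

# backup_then_remove DIR ARCHIVE
backup_then_remove() {
  dir="$1"
  archive="$2"
  tar -cf "$archive" "$dir" || return 1
  # listing the archive checks that it can be read back before rm -rf
  tar -tf "$archive" >/dev/null || return 1
  rm -rf "$dir"
}

# Usage on the original TiUP node:
#   cd ~ && backup_then_remove .tiup tiup.tar
```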

4. Migrate the Prometheus monitoring node

a. Add a new arm64 Prometheus node by scaling out.

$vi monitor.yaml   # contents below
$tiup cluster scale-out <cluster-name> monitor.yaml

monitoring_servers:
- host: 172.16.6.177
  ssh_port: 22
  port: 9090
  deploy_dir: /tidb-deploy/prometheus-9090
  data_dir: /tidb-data/prometheus-9090
  log_dir: /tidb-deploy/prometheus-9090/log
  arch: aarch64
  os: linux

b. Copy the original monitoring data directory to the new monitoring node.

$ps -ef | grep tsdb.path   # on the original Prometheus node: find the --storage.tsdb.path value
tidb     12804     1  3 Oct23 ?        03:27:15 bin/prometheus/prometheus --config.file=/tidb-deploy/prometheus-9090/conf/prometheus.yml --web.listen-address=:9090 --web.external-url=http://172.16.6.177:9090/ --web.enable-admin-api --log.level=info --storage.tsdb.path=/tidb-data/prometheus-9090 --storage.tsdb.retention=30d
$cd /tidb-data/prometheus-9090
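$ls -l   # pick the block directories for the period you need; each ULID-named directory is one TSDB block, and its file modification times (or minTime/maxTime in its meta.json) show the period it covers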
$tar -cvzf prometheus9090.tar.gz 01E1VVCJDV9F6MVW1XQ0NB0KRA 01E1SXK95Y7TC24KMQGFCE789J
$scp prometheus9090.tar.gz tidb@prometheus-IP:/tidb-data/prometheus-9090
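Picking block directories by hand is error-prone. The sketch below selects them by the end time recorded in each block instead, assuming the standard Prometheus TSDB layout in which every block directory holds a `meta.json` with `maxTime` in milliseconds; directories without it (such as `wal`) are skipped. The function name and the 7-day window are illustrative.

```shell
#!/bin/sh
# Sketch: list TSDB block directories whose data ends within the last DAYS days.
# Assumption: each block directory contains meta.json with "maxTime": <ms>.

# list_blocks_since TSDB_DIR DAYS
list_blocks_since() {
  tsdb_dir="$1"
  days="$2"
  cutoff_ms=$(( ($(date +%s) - days * 86400) * 1000 ))
  for meta in "$tsdb_dir"/*/meta.json; do
    [ -f "$meta" ] || continue   # also handles an unmatched glob
    max_ms=$(sed -n 's/.*"maxTime": *\([0-9][0-9]*\).*/\1/p' "$meta" | head -n 1)
    [ -n "$max_ms" ] || continue
    if [ "$max_ms" -ge "$cutoff_ms" ]; then
      basename "$(dirname "$meta")"
    fi
  done
}

# Usage on the original Prometheus node:
#   cd /tidb-data/prometheus-9090
#   tar -czf prometheus9090.tar.gz $(list_blocks_since . 7)
```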

c. Extract the archive on the new monitoring node.

$cd /tidb-data/prometheus-9090 
$tar xvf prometheus9090.tar.gz

d. After confirming in a browser that the monitoring data covers the expected time range, remove the original Prometheus node by scaling in.

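$tiup cluster edit-config <cluster-name>   # reorder monitoring_servers so the new aarch64 node comes first
$tiup cluster reload <cluster-name> -R grafana   # check in Grafana that historical data is available before scaling in
$tiup cluster scale-in <cluster-name> -N prometheus-IP:port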

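5. Migrate the Grafana and Alertmanager nodes

Replace the Grafana and Alertmanager monitoring nodes with new arm64 nodes by scaling in the old ones and scaling out the new ones.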

$vi grafana.yaml   # contents below
$tiup cluster scale-out <cluster-name> grafana.yaml

grafana_servers:
- host: 172.16.6.177
  ssh_port: 22
  port: 3000
  deploy_dir: /tidb-deploy/grafana-3000
  arch: aarch64
  os: linux
alertmanager_servers:
- host: 172.16.6.177
  ssh_port: 22
  web_port: 9093
  cluster_port: 9094
  deploy_dir: /tidb-deploy/alertmanager-9093
  data_dir: /tidb-data/alertmanager-9093
  log_dir: /tidb-deploy/alertmanager-9093/log
  arch: aarch64
  os: linux

6. Migrate the PD nodes

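$vi pd.yaml
$tiup cluster scale-out <cluster-name> pd.yaml
$tiup ctl:v5.1.1 pd -u pd-IP1:2379 -i
>member   # confirm the new PD members have joined
>member leader transfer <name-of-an-arm64-PD>
>member leader show   # confirm the leader has moved to an arm64 PD
$tiup cluster scale-in <cluster-name> -N pd-IP1:2379   # take the old PDs offline one at a time
$tiup cluster scale-in <cluster-name> -N pd-IP2:2379
$tiup cluster scale-in <cluster-name> -N pd-IP3:2379
$tiup cluster display <cluster-name>
$tiup ctl:v5.1.1 pd -u new-pd-IP1:2379 -i   # connect to a new PD; the old ones are gone
>member   # confirm the member status is correct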
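The contents of pd.yaml are not shown in the steps above. A sketch that writes it for three new arm64 PDs, following the same field layout as monitor.yaml; the host addresses, member names (`pd-arm-N`) and directories are hypothetical placeholders. Setting `name` explicitly makes the target of `member leader transfer` easy to identify.

```shell
#!/bin/sh
# Sketch: write pd.yaml with one arm64 PD entry per host argument.
# Member names and directories are hypothetical; 2379/2380 are PD's default
# client and peer ports.

# write_pd_topology FILE HOST...
write_pd_topology() {
  out="$1"; shift
  i=1
  echo "pd_servers:" > "$out"
  for host in "$@"; do
    cat >> "$out" <<EOF
- host: $host
  ssh_port: 22
  name: pd-arm-$i
  client_port: 2379
  peer_port: 2380
  deploy_dir: /tidb-deploy/pd-2379
  data_dir: /tidb-data/pd-2379
  log_dir: /tidb-deploy/pd-2379/log
  arch: aarch64
  os: linux
EOF
    i=$((i + 1))
  done
}

# Usage: write_pd_topology pd.yaml new-pd-IP1 new-pd-IP2 new-pd-IP3
```

With these names, the leader transfer in pd-ctl would be `member leader transfer pd-arm-1`.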

7. Migrate the TiKV nodes

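$vi tikv.yaml
$tiup cluster scale-out <cluster-name> tikv.yaml
Watch the Region changes on the monitoring dashboards. If business traffic has been stopped, you can raise the scheduling limit to speed up the migration:
$tiup ctl:v5.1.1 pd -u new-pd-IP1:2379 store limit   # record the current limits first
$tiup ctl:v5.1.1 pd -u new-pd-IP1:2379 store limit all 1000
$tiup cluster scale-in <cluster-name> -N tikv-IP1:20160,tikv-IP2:20160,tikv-IP3:20160
After the migration completes, restore the scheduling limit:
$tiup ctl:v5.1.1 pd -u new-pd-IP1:2379 store limit all 20   # use the value recorded earlier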
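A matching sketch for tikv.yaml, again in the field layout of monitor.yaml; the hosts and directories are hypothetical placeholders, and 20160/20180 are TiKV's default service and status ports.

```shell
#!/bin/sh
# Sketch: write tikv.yaml with one arm64 TiKV entry per host argument.
# Hosts and directories are hypothetical placeholders.

# write_tikv_topology FILE HOST...
write_tikv_topology() {
  out="$1"; shift
  echo "tikv_servers:" > "$out"
  for host in "$@"; do
    cat >> "$out" <<EOF
- host: $host
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /tidb-deploy/tikv-20160
  data_dir: /tidb-data/tikv-20160
  log_dir: /tidb-deploy/tikv-20160/log
  arch: aarch64
  os: linux
EOF
  done
}

# Usage: write_tikv_topology tikv.yaml new-tikv-IP1 new-tikv-IP2 new-tikv-IP3
```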

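8. Scale out the TiDB nodes

Scale in the old TiDB nodes only after all other adjustments are complete, so that the application side has to be coordinated as few times as possible.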

$vi tidb.yaml
$tiup cluster scale-out <cluster-name> tidb.yaml
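The tidb.yaml used here can be sketched the same way; hosts and directories are hypothetical placeholders, and 4000/10080 are the default SQL and status ports of tidb-server. TiDB is stateless, so no data_dir is set.

```shell
#!/bin/sh
# Sketch: write tidb.yaml with one arm64 tidb-server entry per host argument.
# Hosts and directories are hypothetical placeholders.

# write_tidb_topology FILE HOST...
write_tidb_topology() {
  out="$1"; shift
  echo "tidb_servers:" > "$out"
  for host in "$@"; do
    cat >> "$out" <<EOF
- host: $host
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /tidb-deploy/tidb-4000
  log_dir: /tidb-deploy/tidb-4000/log
  arch: aarch64
  os: linux
EOF
  done
}

# Usage: write_tidb_topology tidb.yaml new-tidb-IP1 new-tidb-IP2
```

After the scale-out, the new tidb-server instances still have to be added to the load balancer before the old ones are removed in the next step.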

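9. Scale in the old TiDB nodes

After confirming that all components have been migrated, update the load balancer accordingly and take the original tidb-server instances offline.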

$tiup cluster scale-in <cluster-name> -N tidb-IP1:4000,tidb-IP2:4000

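Notes

A mixed-architecture cluster is recommended only for the duration of the scale-out/scale-in operations, not for long-term operation.

Protect the stateful components during the operations:

- Take a full data backup with br before starting.
- Back up and restore the TiUP metadata directory and the monitoring data.
- When operating on PD nodes, handle only one node at a time.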

In the topology YAML files used for these operations, mark each node's arch explicitly, e.g. amd64 or arm64; aarch64 is equivalent to arm64.
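When filling in arch for many nodes, the value can be derived from `uname -m` on each host. A minimal sketch: the aarch64/arm64 equivalence comes from the note above, while mapping x86_64 to amd64 is an assumption made by analogy. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: map `uname -m` output to the arch value written into topology files.
# aarch64 == arm64 per the note; x86_64 -> amd64 is an assumption by analogy.

# topology_arch [MACHINE]   (defaults to the local `uname -m`)
topology_arch() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported machine type: $machine" >&2; return 1 ;;
  esac
}

# Usage for a remote node (host is a placeholder):
#   echo "  arch: $(topology_arch "$(ssh tidb@new-node uname -m)")"
```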

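After each scale-in step, it is recommended to run tiup cluster prune <cluster-name> manually to actually clean up the removed nodes; scaled-in TiKV nodes remain in Tombstone state until they are pruned.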
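For TiKV, pruning only makes sense once the scaled-in stores have finished moving their Regions away. A sketch of that check, assuming `tiup cluster display` prints one line per instance with the instance ID (host:port) in the first column and its status (Up, Pending Offline, Tombstone, ...) on the same line; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if every given instance ID is shown as Tombstone.
# Assumption: `tiup cluster display` output has the instance ID in column 1
# and the status text on the same line.

# all_tombstone ID...   (reads `tiup cluster display` output on stdin)
all_tombstone() {
  display=$(cat)
  for id in "$@"; do
    line=$(printf '%s\n' "$display" | awk -v id="$id" '$1 == id')
    case "$line" in
      *Tombstone*) ;;
      *) echo "$id is not in Tombstone state" >&2; return 1 ;;
    esac
  done
  return 0
}

# Usage:
#   tiup cluster display <cluster-name> \
#     | all_tombstone tikv-IP1:20160 tikv-IP2:20160 tikv-IP3:20160 \
#     && tiup cluster prune <cluster-name>
```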