使用 Datax 迁移数据到 OB

DataX 是一个离线数据同步工具/平台,支持多种异构数据库之间的迁移同步,可以通过文档查看各个数据源的配置方法。链接地址:https://github.com/alibaba/datax

使用 Datax 迁移数据到 OB-每日运维

开始迁移

这里主要是 mysql -> ob,其他数据源的配置可以参考 datax。

1、下载安装 datax

官方地址,有两种方式可以安装

https://github.com/alibaba/DataX/blob/master/userGuid.md

或者也可以直接通过下面的步骤下载安装

# 下载 datax
wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz
# 解压
tar -xf datax.tar.gz
# 调试,没有报错则正常
cd datax/bin
python datax.py ../job/job.json

2、在 OB 集群内创建表结构

3、编写配置文件

{
"job": {
"setting": {
"speed": {
"channel": 4
},
"errorLimit": {
"record": 0,
"percentage": 0.1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "xxxxx",
"column": ["*"],
"connection": [
{
"table": ["t1"],
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
}
]
}
},
"writer": {
"name": "oceanbasev10writer",
"parameter": {
"obWriteMode": "insert",
"column": ["*"],
"preSql": ["truncate table t1"],
"connection": [
{
"jdbcUrl": "jdbc:oceanbase://127.0.0.1:2883/test2?",
"table": ["t1"]
}
],
"username": "用户名@租户名#集群名",
"password":"xxxxx",
"writerThreadCount":10,
"batchSize": 1000,
"memstoreThreshold": "0.9"
}
}
}
]
}
}

4、启动迁移

python bin/datax.py conf/mysql2ob.json

5、如果需要离线定时同步增量,可以指定 where 或者 querySql

指定 where

"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "xxxx",
"column": ["*"],
"where": "id >2"
"connection": [
{
"table": ["t1"],
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
}
]
}
},

指定querySql

"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "xxxx",
"column": ["*"],
"connection": [
{
"querySql": [
"select * from t1 where id > 4;"
],
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]
}
]
}
},