佰聆 (Bailing) Practice: OceanBase Data Migration Document

May 7, 2024

About the author: Zhang Jie, big data development engineer.

 

I. Exercise Background

By working through the process of migrating data from MySQL to OceanBase, this exercise builds familiarity with the business table structures in an OceanBase database and with the various methods and tools available for migrating table structures and data.

II. Exercise Content

1. Use mysqldump to synchronize the MySQL table structures and data into an OceanBase MySQL tenant.

2. Use DataX to configure offline synchronization of at least one table from MySQL to an OceanBase MySQL tenant.

3. Use DataX to configure offline synchronization of at least one table from OceanBase to CSV and from CSV back to OceanBase.

4. Use Canal to configure incremental synchronization from MySQL to an OceanBase MySQL tenant.

III. Data Migration with mysqldump

1. Environment preparation:

· Install tpcc-mysql

wget http://imysql.com/wp-content/uploads/2014/09/tpcc-mysql-src.tgz
gunzip tpcc-mysql-src.tgz
tar xf tpcc-mysql-src.tar
cd tpcc-mysql/src 
make
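
The tpcc_load step below assumes the tpcc database and the TPC-C tables already exist. With the tpcc-mysql source tree this is normally done with the SQL scripts shipped in the repository (create_table.sql and add_fkey_idx.sql); a minimal sketch against the MySQL instance used in this document:

# create the target database and load the TPC-C schema shipped with tpcc-mysql
mysql -h 172.20.10.10 -P3306 -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS tpcc"
mysql -h 172.20.10.10 -P3306 -uroot -p123456 tpcc < create_table.sql
mysql -h 172.20.10.10 -P3306 -uroot -p123456 tpcc < add_fkey_idx.sql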


· Load the test data into the tpcc database on the MySQL instance

[root@ob1 tpcc-mysql]# ./tpcc_load 172.20.10.10:3306 tpcc root 123456 1


· View the tables in the test business database

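For reference, the check behind the screenshot can be reproduced with plain SQL (a sketch; tpcc-mysql creates nine tables such as warehouse, district, customer, orders, order_line, item, and stock):

mysql -h 172.20.10.10 -P3306 -uroot -p123456 tpcc -e "SHOW TABLES; SELECT COUNT(*) FROM warehouse;"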

2. Export the business table structures with mysqldump

mysqldump -h 127.1 -uroot -P3306 -p123456 -d tpcc --compact > tpcc_ddl.sql


3. Export the business table data with mysqldump

mysqldump -h 127.1 -uroot -P3306 -p123456 -t tpcc > tpcc_data.sql


4. Synchronize the table structures to OceanBase

mysql -h 172.20.10.10 -u root@sys#obdemo -P 2883 -p123456 -c -A tpcc
source tpcc_ddl.sql


5. Synchronize the table data to OceanBase

· source tpcc_data.sql


· Verify that the data has been synchronized

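A quick way to confirm the import, roughly what the screenshot showed (a sketch comparing one TPC-C table on both sides):

-- run in the OceanBase MySQL tenant, then compare with the same count on the source MySQL instance
SELECT COUNT(*) FROM tpcc.customer;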

IV. Data Migration with DataX

1. Environment preparation

wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -zxvf datax.tar.gz


2. Delete the hidden files under the plugin directories (otherwise DataX jobs may fail complaining that plugin.json cannot be found)

cd /usr/local/soft/datax/datax/plugin/reader && rm -rf .*
cd /usr/local/soft/datax/datax/plugin/writer && rm -rf .*


3. Generate the MySQL-to-OceanBase job configuration template

python ./bin/datax.py -r mysqlreader -w oceanbasev10writer
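
The template is printed to stdout; to put it straight into the file edited in the next step, redirect it into the job directory that the job launch command in step 5 refers to:

python ./bin/datax.py -r mysqlreader -w oceanbasev10writer > ./job/mysql2ob.json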


4. Edit the job configuration file

vim mysql2ob.json
{
   "job": {
       "setting": {
           },
           "errorLimit": {
               "record": 0,
               "percentage": 0.1
           }
       },
       "content": [
           {
               "reader": {
                   "name": "mysqlreader",
                   "parameter": {
                       "username": "root",
                       "password": "123456",
                       "column": [
                           "*"
                       ],
                       "connection": [
                           {
                               "table": [
                                   "student"
                               ],
                               "jdbcUrl": ["jdbc:mysql://172.20.10.10:3306/test?useUnicode=true&characterEncoding=utf8"]
                           }
                       ]
                   }
               },
 
               "writer": {
                   "name": "oceanbasev10writer",
                   "parameter": {
                       "obWriteMode": "insert",
                       "column": [
                           "*"
                       ],
                       "preSql": [
                           "truncate table student"
                       ],
                       "connection": [
                           {
                               "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:sys||_dsc_ob10_dsc_||jdbc:oceanbase://172.20.10.10:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                               "table": [
                                   "student"
                               ]
                           }
                       ],
                       "username": "root",
                       "password":"123456",
                       "writerThreadCount":10,
                       "batchSize": 1000,
                       "memstoreThreshold": "0.9"
                   }
               }
           }
       ]
   }
}

5. Running the synchronization job

· Start the job

python ./bin/datax.py ./job/mysql2ob.json


· The synchronization finishes


V. OceanBase CSV Export and Import with DataX

1. Export OceanBase data to CSV

· Generate the DataX json job template

python bin/datax.py -r oceanbasev10reader -w txtfilewriter > job/oceanbase2csv.json


· Adjust the permissions of the json file

chmod 777 oceanbase2csv.json


· Edit the json job file

vim job/oceanbase2csv.json
{
   "job": {
       "setting": {
           "speed": {
               "channel": 4
           },
           "errorLimit": {
               "record": 0,
               "percentage": 0.1
           }
       },
       "content": [
           {
               "reader": {
                   "name": "oceanbasev10reader",
                   "parameter": {
                       "column": ["stu_no","stu_name","stu_addr","stu_class"],
                       "connection": [
                           {
                               "jdbcUrl": ["||_dsc_ob10_dsc_||obdemo:sys||_dsc_ob10_dsc_||jdbc:oceanbase://172.20.10.10:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true"],
                               "table": ["student"]
                           }
                       ],
                       "password": "123456",
                       "username": "root",
                       "where": ""
                   }
               },
               "writer": {
                   "name": "txtfilewriter",
                   "parameter": {
                       "dateFormat": "yyyy-MM-dd hh:mm:ss",
                       "fieldDelimiter": ",",
                       "fileName": "ob2csv",
                       "path": "/usr/local/soft/mysql_dump",
                       "writeMode": "truncate"
                   }
               }
           }
       ]
   }
}

· Run the job

python bin/datax.py job/oceanbase2csv.json


· Check the result

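txtfilewriter appends a random suffix to fileName and writes one file per writer thread, so the exported data can be inspected like this (a sketch):

ls /usr/local/soft/mysql_dump
head /usr/local/soft/mysql_dump/ob2csv*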

2. Import CSV data into OceanBase

· Generate the DataX json job template

python bin/datax.py -w oceanbasev10writer -r txtfilereader > job/csv2oceanbase.json


· Adjust the permissions of the json file

chmod 777 csv2oceanbase.json


· Edit the json job file

vim csv2oceanbase.json
{
   "job": {
       "setting": {
           "speed": {
               "channel": 4
           },
           "errorLimit": {
               "record": 0,
               "percentage": 0.1
           }
       },
       "content": [
           {
               "reader": {
                   "name": "txtfilereader",
                   "parameter": {
                       "path": ["/usr/local/soft/mysql_dump"],
                       "fileName": "ob2csv",
                       "encoding": "UTF-8",
                       "column": ["*"],
                       "dateFormat": "yyyy-MM-dd hh:mm:ss" ,
                       "nullFormat": "\\N" ,
                       "fieldDelimiter": ","
                   }
               },
               "writer": {
                   "name": "oceanbasev10writer",
                   "parameter": {
                       "obWriteMode": "insert",
                       "column": [
                           "stu_no","stu_name","stu_addr","stu_class"
                       ],
                       "preSql": [
                           "truncate table student"
                       ],
                       "connection": [
                           {
                               "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:sys||_dsc_ob10_dsc_||jdbc:oceanbase://172.20.10.10:2883/test?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                               "table": [
                                   "student"
                               ]
                           }
                       ],
                       "username": "root",
                       "password": "123456",
                       "writerThreadCount": 10,
                       "batchSize": 100,
                       "memstoreThreshold": "0.9"
                   }
               }
           }
       ]
   }
}

· Run the job

python bin/datax.py job/csv2oceanbase.json


· Check the result


· Note:

path must point to the directory that holds the input file, and that directory must not contain any other files, otherwise the job will fail.

VI. Incremental Synchronization from MySQL to OceanBase with Canal

1. MySQL environment preparation

· Enable binlog

[mysqld]
log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW format
server_id=1 # required for MySQL replication; must not clash with canal's slaveId

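After restarting MySQL, the binlog settings can be verified before going further (a quick check):

SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW MASTER STATUS;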

· Grant account privileges

Here the root account is used; alternatively, create a dedicated account and grant it the required privileges:

CREATE USER canal IDENTIFIED BY 'canal'; 
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;

2. Deploy canal-admin

· Download and extract canal-admin

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.admin-1.1.5.tar.gz
tar -zxvf canal.admin-1.1.5.tar.gz

· Edit the configuration file

cd canal-admin/conf
vim application.yml
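
The screenshot of application.yml is not reproduced here; the keys that usually need adjusting look roughly like the sketch below. The datasource points at the canal_manager metadata database initialized in the next step (reusing the root account is an assumption of this sketch); adminUser/adminPasswd are the credentials canal servers use to register with the admin, not the web login.

server:
  port: 8080                    # console port used later in this document (the stock default is 8089)
spring.datasource:
  address: 172.20.10.10:3306    # MySQL instance that will hold canal_manager
  database: canal_manager
  username: root                # assumption: reusing the root account from this walkthrough
  password: 123456
canal:
  adminUser: admin
  adminPasswd: admin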


· Initialize the metadata database

mysql -uroot -P3306 -p123456
source conf/canal_manager.sql


· Start canal-admin

cd canal/canal-admin/bin/ && ./startup.sh


· Open the canal-admin web console

http://127.0.0.1:8080/
Username: admin
Password: 123456


3. Deploy canal-deployer

· Download and extract canal-deployer

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
tar -zxvf canal.deployer-1.1.5.tar.gz

· Edit the configuration file

When canal-admin manages the server and instances, conf/canal_local.properties replaces conf/canal.properties. Only the manager address in the resulting conf/canal.properties needs to be changed; the other parameters can keep their defaults.

mv canal_local.properties canal.properties && vim canal.properties
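
For reference, the manager-related keys in canal_local.properties (canal 1.1.5) look like this; only canal.admin.manager needs to point at the canal-admin console started above, and the passwd value is the stock hash of the default register password:

# register ip
canal.register.ip =
# canal admin config
canal.admin.manager = 172.20.10.10:8080
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =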


· Start the canal-deployer service

cd ~/canal/canal-deployer/bin/ && ./startup.sh


4. Configure the canal server and canal instance through the canal-admin web console

· Create and start the canal server


· Create and start the canal instance


· Load the instance template and adjust the relevant parameters

#################################################

## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=1000
 
# enable gtid use true/false
canal.instance.gtidon=false
 
# position info
canal.instance.master.address=172.20.10.10:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
 
# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
 
# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal
 
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=
 
# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
 
# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
canal.instance.filter.black.regex=
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
 
# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################
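
The template above keeps canal.instance.filter.regex=.*\\..*, which replicates every schema. To restrict replication to just the test database used in the earlier sections, the regex could be narrowed, for example:

canal.instance.filter.regex=test\\..*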

 

5. Deploy canal-adapter

· Download and extract canal-adapter

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz

tar -zxvf canal.adapter-1.1.5.tar.gz

· Edit the configuration file

cd ~/canal/canal-adapter/conf/ && vim application.yml

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 172.20.10.10:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username: admin
    canal.tcp.password: 4ACFE3202A5FF5CF467898FC58AAB1D615029441
  canalAdapters:
  - instance: mysqldb # canal instance Name or mq topic name
    groups:
    - groupId: group1
      outerAdapters:
      - name: logger
      - name: rdb
        key: oceanbase1
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://172.20.10.10:2883/test?useUnicode=true
          jdbc.username: root@sys#obdemo
          jdbc.password: 123456

· Configure the mapping file

cd ~/canal/canal-adapter/conf/rdb/ && vim mytest_user.yml
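
The mapping file from the screenshot is not reproduced; below is a minimal sketch of what an rdb mapping can look like for this setup, assuming whole-database mirroring of the test schema (mirrorDb mode also replicates DDL, which matches the table-creation test below). destination must equal the instance name and outerAdapterKey the rdb key from application.yml; dataSourceKey refers to a source datasource defined under canal.conf.srcDataSources in application.yml (not shown in the excerpt above).

dataSourceKey: defaultDS      # source datasource key from canal.conf.srcDataSources
destination: mysqldb          # canal instance name
groupId: group1               # adapter group id
outerAdapterKey: oceanbase1   # rdb adapter key from application.yml
concurrent: true
dbMapping:
  mirrorDb: true              # mirror the whole schema, DDL included (assumption)
  database: test              # schema to mirror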


· Start the canal-adapter service

cd ~/canal/canal-adapter/bin/ && ./startup.sh


6. Synchronization test

· Create a business table in MySQL

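As an illustration of the statements behind the screenshots (table and column names here are made up):

-- on the source MySQL instance
CREATE TABLE test.teacher (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO test.teacher VALUES (1, 'tom');

-- in the OceanBase MySQL tenant, a moment later
SELECT * FROM test.teacher;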

· The table is created in OceanBase by the synchronization


· Check the adapter logs


