背景
最近工作需要把Mysql数据导入到Oracle,于是想到了SQLLDR(SQLLDR是Oracle服务端自带的工具)。趁这次机会整理份操作文档方便后续查看。
SQL*Loader支持3种数据加载方式:
- 传统路径加载(direct=false):等同于insert语句(默认)
- 直接路径加载(direct=true):绕过SGA,把数据直接导入高水位线(HWM)以上,可设置并行加载,性能比传统路径加载更高,但限制也更多;
no_index_errors:默认为false,如果为true表示在加载过程中不会把index设为unusable(在此方式可用); - 外部表加载:先为数据文件上创建一个外部表,然后再把数据从外部表insert到目标表中
常用参数介绍:
- errors:最大允许错误的记录数
- discardmax :最大允许丢弃的记录数
- parallel:默认为false,如果为true表示启用并行加载
- skip_index_maintenance:默认为false,如果为true表示在加载过程中停止索引的维护
- rows:表示达到多少行将提交,默认(传统路径加载64,直接路径加载all)
- skip:省略最前面的记录数
- CHARACTERSET UTF8 :导出文件的字符集,以及导入目标字符集
数据加载对目标表处理方式:
- SQL*Loader是以INSERT方式加载数据,目标表必须是空表,否则会报错;(默认)
- APPEND:添加数据至源数据的后面
- REPLACE:在添加数据之前,先执行DELETE FROM TABLE把表清空,注意:这里的REPLACE是表级别的替代,而不是行级别
- TRUNCATE:在添加数据之前,先执行TRUNCATE TABLE table_name REUSE STORAGE
注:SQL*Loader不支持对已存在的行更新,如果你想更新已存在的行,则必须先把数据加载至一个空的中间表,然后在原表和中间表进行关联update
示例:
环境准备
- 准备CSV数据:t.csv
"status","amt","create_time"
"A","90","2024-02-26 13:37:28"
"B","100","2024-03-26 13:37:28"
"B","123","2023-01-26 13:37:28"
"A","10","2024-01-26 13:37:28"
"A","1","2024-03-26 13:37:28"
- 创建测试表:t1、t2
create table t1
(
id NUMBER(16) not null,
status VARCHAR2(2),
amt number(5,2),
comments VARCHAR2(1000),
create_time TIMESTAMP(6),
constraint pk_id primary key(id)
);
create table t2
(
id NUMBER(16) not null,
status VARCHAR2(2),
amt number(5,2),
comments VARCHAR2(1000),
create_time TIMESTAMP(6),
constraint pk_t2_id primary key(id)
);
- 创建序列:t_id_seq
create sequence t_id_seq
minvalue 1
maxvalue 99999
start with 1
increment by 1
cache 20;
- 配置:tns
cat /u01/oracle/11.2.0.3/product/network/admin/tnsnames.ora
TEST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = scan-rac.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = DB)
)
)
- 执行导入:sqlldr
sqlldr userid=scott/'tiger'@TEST control=/home/oracle/csv/t.ctl log=/home/oracle/csv/t.log
将t.csv数据导入t1:
- t1.ctl
options(skip=1,BINDSIZE=20971520, ROWS=10000000, READSIZE=20971520, ERRORS=10)
LOAD DATA
CHARACTERSET UTF8
INFILE 't.csv' -- 指定csv文件
BADFILE 't.bad' --符合条件但导入失败,坏数据文件
DISCARDFILE 't.dsc' -- 指定 丢弃文件
INTO TABLE t1 -- 指定表名,insert:缺省;append:追加;replace:替换
FIELDS TERMINATED BY ',' -- 数据中每行记录用 "," 分隔
OPTIONALLY ENCLOSED BY '"' -- 数据中每个字段用 '"' 框起,比如字段中有 "," 分隔符时
trailing nullcols -- 表的字段没有对应的值时允许为空
(
status varchvar,
amt numeric(5,2),
create_time TIMESTAMP "YYYY-MM-DD HH24:MI:SS", -- TIMESTAMP 指定格式
comments "'status :'||upper(:status)||'金额:'||:amt",-- 绑定变量对数据加工处理
id "t_id_seq.nextval" -- 这一列直接取序列
)
- 执行导入:sqlldr
sqlldr userid=scott/'tiger'@TEST control=/home/oracle/csv/t1.ctl log=/home/oracle/csv/t1.log
SQL*Loader: Release 11.2.0.3.0 - Production on Thu May 9 15:34:49 2024
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Commit point reached - logical record count 5
- 查看导入日志:
[oracle@db csv]$ cat t1.log
SQL*Loader: Release 11.2.0.3.0 - Production on Thu May 9 15:34:49 2024
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Control File: /home/oracle/csv/t1.ctl
Character Set UTF8 specified for all input.
Data File: t.csv
Bad File: t.bad
Discard File: t.dsc
(Allow all discards)
Number to load: ALL
Number to skip: 1
Errors allowed: 10
Bind array: 10000000 rows, maximum of 20971520 bytes
Continuation: none specified
Path used: Conventional
Table T1, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
STATUS FIRST 10 , O(") CHARACTER
AMT NEXT * , O(") CHARACTER
CREATE_TIME NEXT * , O(") DATETIME YYYY-MM-DD HH24:MI:SS
COMMENTS NEXT * , O(") CHARACTER
SQL string for column : "'status :'||upper(:status)||'|AMT'||:amt"
ID NEXT * , O(") CHARACTER
SQL string for column : "t_id_seq.nextval"
value used for ROWS parameter changed from 8323541 to 20087
Table T1:
5 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 20970828 bytes(20087 rows)
Read buffer bytes:20971520
Total logical records skipped: 1
Total logical records read: 5
Total logical records rejected: 0
Total logical records discarded: 0
Run began on Thu May 09 15:34:49 2024
Run ended on Thu May 09 15:34:49 2024
Elapsed time was: 00:00:00.12
CPU time was: 00:00:00.05
- 查看表T1数据
将t.csv数据分别导入t1、t2:
- t.ctl
options(skip=1,BINDSIZE=20971520, ROWS=10000000, READSIZE=20971520, ERRORS=10)
LOAD DATA
CHARACTERSET UTF8
INFILE 't.csv'
BADFILE 't.bad'
DISCARDFILE 't.dsc'
truncate -- 对t1、t2 truncate 后,再导入
INTO TABLE t1
when st = "A" -- when 子句选择导入符合条件的记录 :st 与 下面的列:st 对应,或 when (2) = "A"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
trailing nullcols
(
st filler POSITION(1), -- 过滤字段,不必在表中出现,主要用于when条件过滤,注意filler在position关键字前边
status POSITION(1), -- WHEN子句从第一个记录的末尾继续处理。要重置回记录的开始,请为第一个字段设置位置 (1)。
amt ,
create_time TIMESTAMP "YYYY-MM-DD HH24:MI:SS",
comments "'status:'||upper(:status)",
id "t_id_seq.nextval"
)
INTO TABLE t2
when st = "B"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
trailing nullcols
(
st filler position(1),
status position(1),
amt ,
create_time TIMESTAMP "YYYY-MM-DD HH24:MI:SS",
comments "'status:'||upper(:status)",
id "t_id_seq.nextval"
)
- 执行导入:sqlldr
sqlldr userid=scott/'tiger'@TEST control=/home/oracle/csv/t.ctl log=/home/oracle/csv/t.log
SQL*Loader: Release 11.2.0.3.0 - Production on Thu May 9 17:40:49 2024
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Commit point reached - logical record count 5
- 查看部分日志:t.log
Table T1:
3 Rows successfully loaded.
0 Rows not loaded due to data errors.
2 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table T2:
2 Rows successfully loaded.
0 Rows not loaded due to data errors.
3 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 20969208 bytes(6773 rows)
Read buffer bytes:20971520
Total logical records skipped: 1
Total logical records read: 5
Total logical records rejected: 0
Total logical records discarded: 0
- 查看表T1、T2数据
定时任务脚本
- 注:执行sqlldr 命令前一定要加:exec ,否则命令不执行
cat csv_db.sh
#!/bash/bin
export ORACLE_SID=db
export ORACLE_UNQNAME=db
export ORACLE_BASE=/u01/oracle
export ORACLE_HOME=/u01/oracle/11.2.0.3/product
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
export OGG_HOME=/u01/ogg
export PATH=$OGG_HOME:$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$OGG_HOME:$ORACLE_HOME/lib:$LD_LIBRARY_PATH
exec sqlldr userid=scott/'tiger'@TEST control=/home/oracle/csv/t.ctl log=/home/oracle/csv/t.log
总结
Oracle 数据库使用功能还是很全面且强大的。