IBM DB2 是美国IBM公司开发的一套关系型数据库管理系统,它主要的运行环境为UNIX(包括IBM自家的AIX)、Linux、IBM i(旧称OS/400)、z/OS,以及Windows服务器版本。其中 LINUX、UNIX 和Windows 平台又简称 LUW 开放平台。客户包含 省农信、农商行 以及一些中字头金融机构。
目前业界应该没有哪家国产数据库号称兼容 DB2, 所以 “去 DB2” ,业务的改造是不可避免的。业界数据库能力最接近 DB2 的是 ORACLE 数据库,而 OceanBase 又同时兼容 ORACLE 和 MySQL。所以 DB2 迁移到 OceanBase ,最佳选择是选择 ORACLE 租户类型,其次是 MySQL 租户。
本文主要是总结 DB2 迁移到 OceanBase ORACLE 租户的经验。
表结构转换
首 先要评估 DB2 的表结构在 OB(ORACLE) 上的兼容性。DB2 跟 OB(ORACLE) 的数据类型差异还是比较大,下面仅就常用的类型的典型差异举例说明。
自增列类型
DB2 的 SQL 语法,在 OB(ORACLE) 里都可以找到对应的实现,但不完全一样。
DB2 的自增列(IDENTITY
)语法 OB 3.2 版本开始支持。在那之前,需要应用使用 SEQUENCE 给列填充值。
- DB2 示例:
CREATE TABLE t4 ( ID BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY (MAXVALUE 9223372036854775807), C1 VARCHAR (50) NOT NULL DEFAULT '' ) ;
- ORACLE 示例:
CREATE TABLE t4 (ID NUMBER GENERATED ALWAYS AS IDENTITY MINVALUE 1 MAXVALUE 9223372036854775807 NOT NULL, C1 VARCHAR2(50) );
二者写法 NOT NULL
的位置不同。
DB2 的 DATE / TIME 跟 OB(ORACLE) 的 DATE 差异问题
默认情况下(指 DB2 不开启 ORACLE 兼容模式),DB2 的 DATE 只包含日期,TIME 只包含时间。而 OB(ORACLE) 的 DATE 包含日期和时间。
- DB2:
db2 => DROP TABLE t1 DB20000I The SQL command completed successfully. db2 => CREATE TABLE t1(id bigint NOT NULL PRIMARY KEY ,c1 date , c2 time, c3 timestamp) DB20000I The SQL command completed successfully. db2 => INSERT INTO t1 values(1, sysdate,sysdate,current_timestamp ) DB20000I The SQL command completed successfully. db2 => SELECT * FROM t1 ID C1 C2 C3 -------------------- ---------- -------- -------------------------- 1 08/14/2021 05:29:55 2021-08-14-05.29.55.343157 1 record(s) selected. db2 =>
- OB(ORACLE):
obclient> DROP TABLE t1; Query OK, 0 rows affected (0.02 sec) obclient> CREATE TABLE t1(id number NOT NULL PRIMARY KEY ,c1 date , c3 timestamp); Query OK, 0 rows affected (0.04 sec) obclient> INSERT INTO t1 values(1, sysdate,current_timestamp); Query OK, 1 row affected (0.01 sec) obclient> SELECT * FROM t1; +----+---------------------+----------------------------+ | ID | C1 | C3 | +----+---------------------+----------------------------+ | 1 | 2021-08-14 13:28:15 | 2021-08-14 13:28:15.177911 | +----+---------------------+----------------------------+ 1 row in set (0.00 sec)
字符集问题
如果 DB2 字符集已经是 UTF-8 ,那可以直接转换为 OB(ORACLE) 的 UTF8MB4 。UTF8MB4 是 UTF8 的超集,多一些表情字符。
如果 DB2 字符集不是 UTF-8 ,也建议转换为 OB(ORACLE) 的 UTF8MB4,相关字符串字段长度需要在 OB(ORACLE) 里放大。如果是单向同步,这个没问题。如果还要考虑 OB(ORACLE) 数据回流到 DB2,可能会出现实际字符串数据超长问题。
- 查看 DB2 数据库字符集方法:
[db2inst1@9a9fd42bf525 ~]$ db2 get db cfg for testdb |grep -i code Database code page = 1208 Database code set = utf-8 Database country/region code = 1 [db2inst1@a32408ac626d ~]$ db2 get db cfg for testdb|grep -i national National character string mapping (NCHAR_MAPPING) = CHAR_CU32
- 查看 OB(ORACLE) 租户字符集方法
MySQL [(none)]> show variables like '%character%'; +--------------------------+-----------+ | VARIABLE_NAME | VALUE | +--------------------------+-----------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_system | utf8mb4 | | nls_characterset | AL32UTF8 | | nls_nchar_characterset | AL16UTF16 | | nls_numeric_characters | ., | +--------------------------+-----------+ 8 rows in set (0.003 sec)
OB(ORACLE) 的参数 nls_characterset
定义了数据库的字符集,通常是 AL32UTF8
,每个中文字符字长为 3 个字节。同时 nls_nchar_characterset
定义了 Unicode字符串使用的字符集,每个中文字符字长 2 个字节。
字符串长度问题
DB2 和 ORACLE 都有参数定义一个 字符存储时对应多少字节。这个会影响一个列里最多能存储几个中文字符。
- DB2 查看字长单位方法
[db2inst1@8d6dd8adcacf ~]$ db2 get db cfg for testdb |grep -i string Default string units (STRING_UNITS) = SYSTEM National character string mapping (NCHAR_MAPPING) = CHAR_CU32
当字符集是 Unicode
时,
- 如果
STRING_UNITS
值为SYSTEM
,则 CHAR, VARCHAR, and CLOB 类型的长度在不指定CODEUNITS32
时,会默认为是OCTETS
。 - 如果
STRING_UNITS
值为CODEUNITS32
时,则默认长度是CODEUNITS32
。
字符串长度单位说明:
OCTETS
: 使用字节作为字符的单位。CODEUNITS16
:使用UTF-16
编码作为字符的单位,长度为 2 个 字节。CODEUNITS32
:使用UTF-32
编码作为字符的单位,长度为 4 个字节。
NCHAR_MAPPING
取值为 CHAR_CU32
时,NCHAR
、NVARCHAR
和 NCLOB
列字符默认单位是 CODEUNITS32
。
- OB(ORACLE) 查看字长单位方法。
obclient> show global variables like '%nls_length_semantics%'; +----------------------+-------+ | VARIABLE_NAME | VALUE | +----------------------+-------+ | nls_length_semantics | BYTE | +----------------------+-------+ 1 row in set (0.00 sec)
参数 nls_length_semantics
是在租户下设置,只影响当前租户。取值有两种:
BYTE
: 使用字节作为字符的单位。CHAR
: 使用字符作为单位。每个字符多少个字节跟字符集设置有关。字符集是UTF-8
的时候,每个中文字符 3 个字节。
下面的示例方便具体理解 DB2 跟 OB(ORACLE) 的字符长度差异。
- DB2 字符长度示例
db2 => DROP TABLE t2; DB20000I The SQL command completed successfully. db2 => CREATE TABLE t2(id bigint NOT NULL PRIMARY KEY, c1 char(50), c2 varchar(50), c3 nchar(50), c4 nvarchar(50)); DB20000I The SQL command completed successfully. db2 => INSERT INTO t2 values(1,'中','中','中','中'); DB20000I The SQL command completed successfully. db2 => INSERT INTO t2 values(2,'中国','中国','中国','中国'); DB20000I The SQL command completed successfully. db2 => SELECT id, c1, length(c1) c1_length,lengthb(c1) c1_lengthb,c2,length(c2) c2_length,lengthb(c2) c2_lengthb, c3, length(c3) c3_length, lengthb(c3) c3_lengthb,c4, length(c4) c4_length,lengthb(c4) c4_lengthb FROM t2; ID C1 C1_LENGTH C1_LENGTHB C2 C2_LENGTH C2_LENGTHB C3 C3_LENGTH C3_LENGTHB C4 C4_LENGTH C4_LENGTHB -------------------- -------------------------------------------------- ----------- ----------- -------------------------------------------------- ----------- ----------- ---------------------------------------------------------------------------------------------------- ----------- ----------- ---------------------------------------------------------------------------------------------------- ----------- ----------- 1 中 50 50 中 6 6 中 50 53 中 3 6 2 中国 50 50 中国 12 12 中国 50 56 中国 6 12 2 record(s) selected.
- ORACLE 字符长度示例
obclient> DROP TABLE t2; Query OK, 0 rows affected (0.02 sec) obclient> CREATE TABLE t2(id number NOT NULL PRIMARY KEY, c1 char(50), c2 varchar2(50), c3 nchar(50), c4 nvarchar2(50)); Query OK, 0 rows affected (0.05 sec) obclient> INSERT INTO t2 values(1,'中','中','中','中'); Query OK, 1 row affected (0.00 sec) obclient> INSERT INTO t2 values(2,'中国','中国','中国','中国'); Query OK, 1 row affected (0.00 sec) obclient> SELECT id, c1, length(c1) c1_length,lengthb(c1) c1_lengthb,c2,length(c2) c2_length,lengthb(c2) c2_lengthb, c3, length(c3) c3_length, lengthb(c3) c3_lengthb,c4, length(c4) c4_length,lengthb(c4) c4_lengthb FROM t2; +----+----------------------------------------------------+-----------+------------+--------+-----------+------------+--------------------------------------------------------+-----------+------------+--------+-----------+------------+ | ID | C1 | C1_LENGTH | C1_LENGTHB | C2 | C2_LENGTH | C2_LENGTHB | C3 | C3_LENGTH | C3_LENGTHB | C4 | C4_LENGTH | C4_LENGTHB | +----+----------------------------------------------------+-----------+------------+--------+-----------+------------+--------------------------------------------------------+-----------+------------+--------+-----------+------------+ | 1 | 中 | 48 | 50 | 中 | 1 | 3 | 中 | 50 | 100 | 中 | 1 | 2 | | 2 | 中国 | 46 | 50 | 中国 | 2 | 6 | 中国 | 50 | 100 | 中国 | 2 | 4 | +----+----------------------------------------------------+-----------+------------+--------+-----------+------------+--------------------------------------------------------+-----------+------------+--------+-----------+------------+ 2 rows in set (0.01 sec)
非空与空值问题
在 DB2 里,空值 跟 NULL
是两个值,并不相等。但是在 OB(ORACLE) 里,空值就是 NULL
。
针对是否是 NULL
只能用 IS NULL
或者 IS NOT NULL
判断,而不能用等值符号判断。在 DB2 里针对是否是空值,是用等值符号判断。
所以,DB2 的应用如果定义了很多 NOT NULL
列,但实际又允许写入空值,这个应用迁移到 OB(ORACLE) 是会报错的。解决办法只能是拿掉 OB(ORACLE) 列的 NOT NULL
属性。
- DB2 示例
db2 => DROP TABLE t3; DB20000I The SQL command completed successfully. db2 => CREATE TABLE t3(id bigint NOT NULL PRIMARY KEY, c1 varchar(50) NOT NULL , c2 varchar(50) NULL ); DB20000I The SQL command completed successfully. db2 => INSERT INTO t3 values(1, '', ''); DB20000I The SQL command completed successfully. db2 => INSERT INTO t3 values(2, ' ', ''); DB20000I The SQL command completed successfully. db2 => SELECT id, c1, hex(c1), c2, hex(c2) FROM t3 ; ID C1 3 C2 5 -------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- 1 2 20 2 record(s) selected. db2 => SELECT id, c1, hex(c1), c2, hex(c2) FROM t3 WHERE c1 IS NULL ; ID C1 3 C2 5 -------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- 0 record(s) selected. db2 => SELECT id, c1, hex(c1), c2, hex(c2) FROM t3 WHERE c1 = ''; ID C1 3 C2 5 -------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- 1 2 20 2 record(s) selected. db2 =>
- ORACLE 示例
obclient> DROP TABLE t3; Query OK, 0 rows affected (0.02 sec) obclient> CREATE TABLE t3(id number NOT NULL PRIMARY KEY, c1 varchar2(50) NOT NULL , c2 varchar2(50) NULL ); Query OK, 0 rows affected (0.05 sec) obclient> INSERT INTO t3 values(1, '', ''); ORA-01400: cannot insert NULL into '(C1)' obclient> INSERT INTO t3 values(2, ' ', ''); Query OK, 1 row affected (0.00 sec) obclient> SELECT id, c1, dump(c1), c2, dump(c2,16) FROM t3 ; +----+----+------------------+------+-------------+ | ID | C1 | DUMP(C1) | C2 | DUMP(C2,16) | +----+----+------------------+------+-------------+ | 2 | | Typ=22 Len=1: 32 | NULL | NULL | +----+----+------------------+------+-------------+ 1 row in set (0.01 sec) obclient> SELECT id, c1, dump(c1), c2, dump(c2,16) FROM t3 WHERE c1 IS NULL ; Empty set (0.00 sec) obclient> SELECT id, c1, dump(c1), c2, dump(c2,16) FROM t3 WHERE c1 = ''; Empty set (0.00 sec) obclient> SELECT id, c1, dump(c1), c2, dump(c2,16) FROM t3 WHERE c1 = ' '; +----+----+------------------+------+-------------+ | ID | C1 | DUMP(C1) | C2 | DUMP(C2,16) | +----+----+------------------+------+-------------+ | 2 | | Typ=22 Len=1: 32 | NULL | NULL | +----+----+------------------+------+-------------+ 1 row in set (0.01 sec) obclient>
范围分区表分区边界问题
DB2 的范围分区表语法 跟 OB(ORACLE) 语法在指定分区的范围时语法完全不一样。除此之外,边界值也有不同逻辑。
DB2 的 RANGE 分区在定义分区边界时,可以通过在 起始值 和 结束值后面带上选项 INCLUSIVE
或 EXCLUSIVE
,以表示是否包含边界值。不指定的时候就是默认为 INCLUSIVE
(包含边界值)。OB(ORACLE) 没有这个语法,默认分区范围就是不包含边界值,且只能指定上限,不能指定下限。
所以,DB2 的分区表分区范围不要使用 INCLUSIVE
选项,要明确指定 EXCLUSIVE
选项;否则,转换为 OB(ORACLE) 的时候要调整分区范围的起始值和结束值。
- DB2 示例
[db2inst1@8d6dd8adcacf ~]$ db2 "DROP TABLE t6;" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ db2 "CREATE TABLE t6( > id bigint NOT NULL, > c1 varchar(50), > c2 timestamp NOT NULL , > PRIMARY KEY (id) > ) PARTITION BY RANGE(c2) > ( PARTITION P1 STARTING '1/1/2020' ENDING '1/31/2020' INCLUSIVE , > PARTITION P2 STARTING '2/1/2020' ENDING '2/28/2020' INCLUSIVE , > PARTITION P3 STARTING '3/1/2020' ENDING '3/31/2020' INCLUSIVE , > PARTITION P4 STARTING '4/1/2020' ENDING '4/30/2020' INCLUSIVE , > PARTITION P5 STARTING '5/1/2020' ENDING '5/31/2020' INCLUSIVE > );" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ db2 "INSERT INTO t6(id,c1,c2) values(1,'A','1/1/2020');" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ db2 "INSERT INTO t6(id,c1,c2) values(2,'B','5/31/2020');" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ db2 "INSERT INTO t6(id,c1,c2) values(3,'C','12/31/2019');" DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL0327N The row cannot be inserted into table "TPCC.T6" because it is outside the bounds of the defined data partition ranges. SQLSTATE=22525 [db2inst1@a32408ac626d ~]$ [db2inst1@a32408ac626d ~]$ db2 "SELECT * FROM t6;" ID C1 C2 -------------------- -------------------------------------------------- -------------------------- 1 A 2020-01-01-00.00.00.000000 2 B 2020-05-31-00.00.00.000000 2 record(s) selected. [db2inst1@a32408ac626d ~]$
- ORACLE 用法
obclient> CREATE TABLE t6( -> id number NOT NULL, -> c1 varchar(50), -> c2 timestamp NOT NULL , -> PRIMARY KEY (id,c2) -> ) PARTITION BY RANGE(c2) -> ( PARTITION P1 VALUES LESS THAN ( to_date('2020/02/01','YYYY/MM/DD')), -> PARTITION P2 VALUES LESS THAN ( to_date( '2020/03/01','YYYY/MM/DD')), -> PARTITION P3 VALUES LESS THAN ( to_date( '2020/04/01','YYYY/MM/DD')), -> PARTITION P4 VALUES LESS THAN ( to_date( '2020/05/01','YYYY/MM/DD')), -> PARTITION P5 VALUES LESS THAN ( to_date( '2020/06/01','YYYY/MM/DD')) -> ); Query OK, 0 rows affected (0.09 sec) obclient> INSERT INTO t6(id,c1,c2) values(1,'A',to_date('2020/01/01','YYYY/MM/DD')); Query OK, 1 row affected (0.01 sec) obclient> INSERT INTO t6(id,c1,c2) values(2,'B',to_date('2020/03/01','YYYY/MM/DD')); Query OK, 1 row affected (0.00 sec) obclient> INSERT INTO t6(id,c1,c2) values(3,'C',to_date('2020/06/01','YYYY/MM/DD')); ORA-14400: inserted partition key does not map to any partition obclient>
SQL 语法转换
除了数据类型问题外,有些常用 SQL 写法也许要微调。
rownumber
函数
DB2 里有 rownumber
和 row_number
函数,作用一样,但是 OB(ORACLE) 只有 row_number
函数。如果业务使用的是 rownumber
,那么要调整一下 SQL。
DB2 应用通常把 rownumber
和 FETCH FIRST N ROWS
结合使用来进行分页,在 OB(ORACLE) 3.1 版本之前,需要转换为 OB(ORACLE) 的分页方法。
- DB2 示例
CREATE TABLE t5 AS (SELECT * FROM SYSIBM.TABLES ) WITH DATA ; SELECT * FROM ( SELECT row_number() OVER (ORDER BY ORDER OF t5) rn , * FROM t5 ORDER BY table_schema, table_type FETCH FIRST 100 ROWS ONLY WITH ur ) t WHERE rn >=90;
业务在写这类 SQL 的时候可能会不写最里面的 ORDER BY
语句,这样性能会快很多。但是结果集的顺序就是不确定。需要确认是否符合业务需求。
- OB(ORACLE) 示例
CREATE TABLE t5 AS SELECT * FROM all_objects; EXPLAIN EXTENDED_NOADDR SELECT * FROM ( SELECT rownum rn, owner, object_name, object_type, created, last_ddl_time FROM ( SELECT * FROM t5 ORDER BY owner,object_name ) WHERE rownum < =100 ) WHERE rn >=90 ; ======================================== |ID|OPERATOR |NAME |EST. ROWS|COST| ---------------------------------------- |0 |COUNT | |11 |1143| |1 | SUBPLAN SCAN |VIEW1|11 |1143| |2 | LIMIT | |11 |1143| |3 | TOP-N SORT | |100 |1141| |4 | TABLE SCAN|T5 |729 |891 | ======================================== Outputs & filters: ------------------------------------- 0 - output([rownum() + ?], [VIEW1.T5B.OWNER], [VIEW1.T5B.OBJECT_NAME], [VIEW1.T5B.SUBOBJECT_NAME], [VIEW1.T5B.OBJECT_ID], [VIEW1.T5B.DATA_OBJECT_ID], [VIEW1.T5B.OBJECT_TYPE], [VIEW1.T5B.CREATED], [VIEW1.T5B.LAST_DDL_TIME], [VIEW1.T5B.TIMESTAMP], [VIEW1.T5B.STATUS], [VIEW1.T5B.TEMPORARY], [VIEW1.T5B.GENERATED], [VIEW1.T5B.SECONDARY], [VIEW1.T5B.NAMESPACE], [VIEW1.T5B.EDITION_NAME]), filter(nil) 1 - output([VIEW1.T5B.OWNER], [VIEW1.T5B.OBJECT_NAME], [VIEW1.T5B.SUBOBJECT_NAME], [VIEW1.T5B.OBJECT_ID], [VIEW1.T5B.DATA_OBJECT_ID], [VIEW1.T5B.OBJECT_TYPE], [VIEW1.T5B.CREATED], [VIEW1.T5B.LAST_DDL_TIME], [VIEW1.T5B.TIMESTAMP], [VIEW1.T5B.STATUS], [VIEW1.T5B.TEMPORARY], [VIEW1.T5B.GENERATED], [VIEW1.T5B.SECONDARY], [VIEW1.T5B.NAMESPACE], [VIEW1.T5B.EDITION_NAME]), filter(nil), access([VIEW1.T5B.OWNER], [VIEW1.T5B.OBJECT_NAME], [VIEW1.T5B.SUBOBJECT_NAME], [VIEW1.T5B.OBJECT_ID], [VIEW1.T5B.DATA_OBJECT_ID], [VIEW1.T5B.OBJECT_TYPE], [VIEW1.T5B.CREATED], [VIEW1.T5B.LAST_DDL_TIME], [VIEW1.T5B.TIMESTAMP], [VIEW1.T5B.STATUS], [VIEW1.T5B.TEMPORARY], [VIEW1.T5B.GENERATED], [VIEW1.T5B.SECONDARY], [VIEW1.T5B.NAMESPACE], [VIEW1.T5B.EDITION_NAME]) 2 - output([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), limit(?), offset(?) 3 - output([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), sort_keys([T5.OWNER, ASC], [T5.OBJECT_NAME, ASC]), topn(cast(cast(?, NUMBER(-1, -1)) + cast(?, NUMBER(-1, -1)), BIGINT(-1, 0))) 4 - output([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), access([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), partitions(p0), is_index_back=false, range_key([T5.__pk_increment]), range(MIN ; MAX)always true
注意,OB(ORACLE) 并没有 ORDER BY ORDER OF t
这种用法。也不支持 with ur
这种脏读语法。
标准的 ORACLE 分页框架用法就是两层嵌套循环,最里层是先对结果集排序。这点跟 DB2 的写法不同。当表很大并且查询条件不是很好发挥索引性能的时候,这个分页查询性能往往会不好。主要是最里面的那层排序耗时。
特殊的业务场景下,如果要查询的字段很少,能通过索引覆盖的话,就可以提升性能。或者将这个查询改写为自关联的表连接,以提升分页的性能。
- 分页示例优化(假设 t5 表很大):
EXPLAIN EXTENDED_NOADDR SELECT l.rn, r.* FROM ( SELECT rownum rn,rid FROM ( SELECT rowid rid FROM t5 ORDER BY owner,object_name ) WHERE rownum <= 100 ) l JOIN t5 r ON l.rid=r.rowid WHERE rn >=90 ORDER BY l.rn ; ===================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| ----------------------------------------------------- |0 |SORT | |358 |2645| |1 | MERGE JOIN | |358 |1877| |2 | TABLE SCAN |R(rowid_index)|729 |1002| |3 | SORT | |50 |775 | |4 | SUBPLAN SCAN |L |50 |727 | |5 | COUNT | |100 |723 | |6 | SUBPLAN SCAN |VIEW1 |100 |722 | |7 | LIMIT | |100 |720 | |8 | TOP-N SORT | |100 |719 | |9 | TABLE SCAN|T5 |729 |570 | ===================================================== Outputs & filters: ------------------------------------- 0 - output([L.RN], [R.OWNER], [R.OBJECT_NAME], [R.SUBOBJECT_NAME], [R.OBJECT_ID], [R.DATA_OBJECT_ID], [R.OBJECT_TYPE], [R.CREATED], [R.LAST_DDL_TIME], [R.TIMESTAMP], [R.STATUS], [R.TEMPORARY], [R.GENERATED], [R.SECONDARY], [R.NAMESPACE], [R.EDITION_NAME]), filter(nil), sort_keys([L.RN, ASC]) 1 - output([L.RN], [R.OWNER], [R.OBJECT_NAME], [R.SUBOBJECT_NAME], [R.OBJECT_ID], [R.DATA_OBJECT_ID], [R.OBJECT_TYPE], [R.CREATED], [R.LAST_DDL_TIME], [R.TIMESTAMP], [R.STATUS], [R.TEMPORARY], [R.GENERATED], [R.SECONDARY], [R.NAMESPACE], [R.EDITION_NAME]), filter(nil), equal_conds([L.RID = R.ROWID]), other_conds(nil) 2 - output([R.ROWID], [R.OWNER], [R.OBJECT_NAME], [R.SUBOBJECT_NAME], [R.OBJECT_ID], [R.DATA_OBJECT_ID], [R.OBJECT_TYPE], [R.CREATED], [R.LAST_DDL_TIME], [R.TIMESTAMP], [R.STATUS], [R.TEMPORARY], [R.GENERATED], [R.SECONDARY], [R.NAMESPACE], [R.EDITION_NAME]), filter(nil), access([R.__pk_increment], [R.ROWID], [R.OWNER], [R.OBJECT_NAME], [R.SUBOBJECT_NAME], [R.OBJECT_ID], [R.DATA_OBJECT_ID], [R.OBJECT_TYPE], [R.CREATED], [R.LAST_DDL_TIME], [R.TIMESTAMP], [R.STATUS], [R.TEMPORARY], [R.GENERATED], [R.SECONDARY], [R.NAMESPACE], [R.EDITION_NAME]), partitions(p0), is_index_back=false, range_key([R.ROWID]), range(MIN ; MAX)always true 3 - output([L.RN], [L.RID]), filter(nil), sort_keys([L.RID, ASC]) 4 - output([L.RID], [L.RN]), filter([L.RN >= 90]), access([L.RID], [L.RN]) 5 - output([rownum()], [VIEW1..RID]), filter(nil) 6 - output([VIEW1..RID]), filter(nil), access([VIEW1..RID]) 7 - output([T5.ROWID]), filter(nil), limit(?), offset(nil) 8 - output([T5.ROWID]), filter(nil), sort_keys([T5.OWNER, ASC], [T5.OBJECT_NAME, ASC]), topn(?) 9 - output([T5.ROWID], [T5.OWNER], [T5.OBJECT_NAME]), filter(nil), access([T5.__pk_increment], [T5.ROWID], [T5.OWNER], [T5.OBJECT_NAME]), partitions(p0), is_index_back=false, range_key([T5.__pk_increment]), range(MIN ; MAX)always true
如果 DB2 的 row_number
函数不仅仅是分页,还有分组功能,那就转换为 ORACLE 对应的 row_number
函数用法。
FETCH FIRST N ROWS
在 OB 3.1 版本之后,也支持 FETCH FIRST N ROWS
。3.1 版本之前,就按上面分页方法。
explain extended_noaddr SELECT * FROM ( SELECT row_number() OVER (ORDER BY owner,object_name) rn , t5.* FROM t5 ORDER BY owner, object_name FETCH FIRST 100 ROWS ONLY ) t5b WHERE rn >=90; ========================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ------------------------------------------ |0 |SUBPLAN SCAN |T5B |50 |2749| |1 | LIMIT | |100 |2746| |2 | WINDOW FUNCTION| |100 |2744| |3 | SORT | |729 |2357| |4 | TABLE SCAN |T5 |729 |891 | ========================================== Outputs & filters: ------------------------------------- 0 - output([T5B.RN], [T5B.OWNER], [T5B.OBJECT_NAME], [T5B.SUBOBJECT_NAME], [T5B.OBJECT_ID], [T5B.DATA_OBJECT_ID], [T5B.OBJECT_TYPE], [T5B.CREATED], [T5B.LAST_DDL_TIME], [T5B.TIMESTAMP], [T5B.STATUS], [T5B.TEMPORARY], [T5B.GENERATED], [T5B.SECONDARY], [T5B.NAMESPACE], [T5B.EDITION_NAME]), filter([T5B.RN >= 90]), access([T5B.RN], [T5B.OWNER], [T5B.OBJECT_NAME], [T5B.SUBOBJECT_NAME], [T5B.OBJECT_ID], [T5B.DATA_OBJECT_ID], [T5B.OBJECT_TYPE], [T5B.CREATED], [T5B.LAST_DDL_TIME], [T5B.TIMESTAMP], [T5B.STATUS], [T5B.TEMPORARY], [T5B.GENERATED], [T5B.SECONDARY], [T5B.NAMESPACE], [T5B.EDITION_NAME]) 1 - output([T_WIN_FUN_ROW_NUMBER()], [T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), limit(?), offset(nil) 2 - output([T_WIN_FUN_ROW_NUMBER()], [T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by(nil), order_by([T5.OWNER, ASC], [T5.OBJECT_NAME, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING) 3 - output([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), sort_keys([T5.OWNER, ASC], [T5.OBJECT_NAME, ASC]) 4 - output([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), filter(nil), access([T5.OWNER], [T5.OBJECT_NAME], [T5.SUBOBJECT_NAME], [T5.OBJECT_ID], [T5.DATA_OBJECT_ID], [T5.OBJECT_TYPE], [T5.CREATED], [T5.LAST_DDL_TIME], [T5.TIMESTAMP], [T5.STATUS], [T5.TEMPORARY], [T5.GENERATED], [T5.SECONDARY], [T5.NAMESPACE], [T5.EDITION_NAME]), partitions(p0), is_index_back=false, range_key([T5.__pk_increment]), range(MIN ; MAX)always true
列名重复问题
DB2 的查询结果集可以接受有重复的列名(投影列),但是 OB(ORACLE) 不行。
DB2 示例:
db2 => SELECT * FROM (SELECT id,c1,c1 FROM t3); ID C1 C1 -------------------- -------------------------------------------------- -------------------------------------------------- 1 2 2 record(s) selected. db2 =>
ORACLE 示例:
obclient> SELECT * FROM (SELECT id,c1,c1 FROM t3); ORA-00918: column 'C1' in field list ambiguously defined obclient>
表别名问题
DB2 表别名语法用 AS
关键字,是可选的。但是 OB(ORACLE) 表别名不支持 AS
关键字。
- DB2 示例:
db2 => select id , c1 as c1_new from t4 as t4_new ; ID C1_NEW -------------------- -------------------------------------------------- 1 a 1 record(s) selected. db2 => select id , c1 c1_new from t4 t4_new ; ID C1_NEW -------------------- -------------------------------------------------- 1 a 1 record(s) selected.
- OB(ORACLE) 示例:
obclient> select id , c1 as c1_new from t4 as t4_new ; ORA-00900: You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use near 't4_new' at line 1 obclient> select id , c1 as c1_new from t4 t4_new ; +----+--------+ | ID | C1_NEW | +----+--------+ | 2 | a | +----+--------+ 1 row in set (0.00 sec) obclient>
DB2 的 ORACLE 兼容模式
为了减少 DB2 和 ORACLE 的不兼容逻辑,或者说 IBM 当初为了减少 ORALCE 应用迁移到 DB2 上的修改成本,DB2 可以开启 ORACLE 兼容模式。这个需要在创建数据库的时候就指定,后期不能修改。适合以 OB(ORACLE) 为主,DB2 为异构备的方案。
- DB2 数据库开启 ORACLE 兼容模式方法:
[db2inst1@08935f5bff4f ~]$ db2set -all [i] DB2COMM=TCPIP [g] DB2SYSTEM=08935f5bff4f [db2inst1@08935f5bff4f ~]$ db2set DB2_COMPATIBILITY_VECTOR=ORA [db2inst1@08935f5bff4f ~]$ db2set -all [i] DB2_COMPATIBILITY_VECTOR=ORA [i] DB2COMM=TCPIP [g] DB2SYSTEM=08935f5bff4f [db2inst1@08935f5bff4f ~]$ db2stop 01/28/2021 03:03:23 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. [db2inst1@08935f5bff4f ~]$ db2start 01/28/2021 03:05:13 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. [db2inst1@08935f5bff4f ~]$ [db2inst1@59f37c64eddd ~]$ db2 create db db2ora using codeset utf-8 territory US DB20000I The CREATE DATABASE command completed successfully. [db2inst1@59f37c64eddd ~]$
- DB2 建表示例
[db2inst1@a32408ac626d ~]$ db2 "DROP TABLE T10;" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ db2 "CREATE TABLE T10 ( > ID BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY (MAXVALUE 9223372036854775807), > C1 NUMBER NOT NULL, > C2 VARCHAR2(50) NOT NULL, > C3 NVARCHAR2(50) NOT NULL, > C4 DATE NOT NULL DEFAULT SYSDATE, > C5 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP > );" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ [db2inst1@a32408ac626d ~]$ db2 "INSERT INTO t10(c1,c2,c3) values(1,'a','A');" DB20000I The SQL command completed successfully. [db2inst1@a32408ac626d ~]$ [db2inst1@a32408ac626d ~]$ db2 "INSERT INTO t10(c1,c2,c3) values(2,'b','');" DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL0407N Assignment of a NULL value to a NOT NULL column "TBSPACEID=2, TABLEID=523, COLNO=3" is not allowed. SQLSTATE=23502
可见,当开启 ORACLE 兼容模式后,支持 ORACLE 的列类型,并且 NULL
跟空值也等同对待了。
查看一下表结构,可见在 DB2 内部,表的列还是以 DB2 的原生类型存储。
[db2inst1@a32408ac626d ~]$ db2look -d testdb -e -t T10 -- No userid was specified, db2look tries to use Environment variable USER -- USER is: DB2INST1 -- The db2look utility will consider only the specified tables -- Creating DDL for table(s) -- This CLP file was created using DB2LOOK Version "11.5" -- Timestamp: Sat Aug 14 09:52:21 2021 -- Database Name: TESTDB -- Database Manager Version: DB2/LINUXX8664 Version 11.5.6.0 -- Database Codepage: 1208 -- Database Collating Sequence is: IDENTITY -- Alternate collating sequence(alt_collate): null -- varchar2 compatibility(varchar2_compat): ON -- Binding package automatically ... -- Bind is successful CONNECT TO TESTDB; ------------------------------------------------ -- DDL Statements for Table "TPCC"."T10" ------------------------------------------------ CREATE TABLE "TPCC"."T10" ( "ID" BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY ( START WITH +1 INCREMENT BY +1 MINVALUE +1 MAXVALUE +9223372036854775807 NO CYCLE CACHE 20 NO ORDER ) , "C1" DECFLOAT(16) NOT NULL , "C2" VARCHAR(50 OCTETS) NOT NULL , "C3" VARCHAR(50 CODEUNITS32) NOT NULL , "C4" TIMESTAMP(0) NOT NULL WITH DEFAULT "SYSIBM"."TIMESTAMP"(CURRENT TIMESTAMP) , "C5" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP ) IN "USERSPACE1" ORGANIZE BY ROW; ALTER TABLE "TPCC"."T10" ALTER COLUMN "ID" RESTART WITH 21; COMMIT WORK; CONNECT RESET; TERMINATE; [db2inst1@a32408ac626d ~]$
注意,如果如果涉及到国家字符集,则按DB2的 Unicode 规则保存,每个汉字是 4 个字节。
DB2 和 OB(ORACLE) 的异构数据同步
DB2 和 ORACLE 语法的客观差异决定 DB2 到 OB(ORACLE) 的同步或者并轨运行不会是一件轻松的事情。实际项目有的是业务自己同步,有的是靠产品同步。
如果是产品做异构同步,业务需要做一些妥协。比如说遵守特定的数据库表结构设计规范、SQL 规范等等。传统的数据库同步产品 OGG 、DSG 等做异构同步也是同样的经验。
就 DB2 和 OB(ORACLE)之间的实时同步,可以使用 OB 官方数据同步产品(OMS)或者 OB 合作伙伴产品(DSG)。如果是离线同步,可以使用 Alibaba 开源的 DATAX 产品( DATAX 的 OB 插件源码也已经随着 OB 社区版一起开源)。
OMS 的 DB2 同步功能包括:
- DB2 和 OB(ORACLE) 彼此的全量表结构同步。
- DB2 和 OB(ORACLE) 彼此的全量数据同步。
- DB2 和 OB(ORACLE) 彼此的增量数据同步。
- DB2 和 OB(ORACLE) 彼此的增量表结构同步。
注意,这里说的是具备双向同步的能力,但是只激活单向的同步链路。
非表结构对象在技术上存在一些客观的技术困难,需要人为分析再手动同步:
- 序列的同步。目标端再激活之前,序列的值需要推高。在 OB(ORACLE) 里目前推高序列的方法是重建序列。
- 同义词的同步。同义词链接的对象可能是其他 用户或SCHEMA 下的对象,这有个依赖问题,需要分析后手动同步。
- 存储过程的同步。DB2 和 OB(ORACLE) 存储过程语法差异非常大,很多用法都要翻译为兼容性的实现。目前还需要手动做。
其他参考
单纯的看 DB2 到 OB(ORACLE) 的异构同步,是一件困难麻烦多收益少的事情。还需要其他角度来看这个事情。
首先是可以降低对 IBM 小机和存储等高端硬件的依赖,相应的数据库运维技术门槛也降低。稳定性方面 OB 通过多副本(3 副本或 5 副本)以及容灾集群来提高整体的可靠性。
第二是数据库架构从集中式转向分布式,业务的架构也相应的转向分布式架构,两条腿走路跑的快。虽然目前看复杂的 SQL,单条语句性能 OB 可能不如小机上的 DB2 跑的快,但是 OB 的并发处理能力强。当并发上去后(如50、100等等),复杂 SQL 在 OB 上的平均延时水平很可能比 DB2 低很多,吞吐能力会更高。
第三是成熟的开发商(ISV)的应用本身都适配过 DB2、ORACLE 和 MySQL。所以在核心业务表结构设计上对特定数据库依赖很少,尽可能的取各个数据库能力的交集。这样的业务做 DB2 到 OB(ORACLE) 的迁移,困难还会少很多。
附录
DB2 DOCKER 环境搭建
- 下载 db2 镜像
docker pull ibmcom/db2 docker images |grep db2 REPOSITORY TAG IMAGE ID CREATED SIZE ibmcom/db2 latest a6a5ee354fb1 7 weeks ago 2.95GB
- 起 db2 容器(兼容ORACLE)
# docker volume create db2-11.5 # docker run -itd --name db2_11.5 --privileged=true -p 50000:50000 \ -e LICENSE=accept \ -e DB2INST1_PASSWORD=db2inst1 \ -e DBNAME=testdb \ -e DBPORT=50000 \ -e TSPORT=55000 \ -e ENABLE_ORACLE_COMPATIBILITY=true \ -v db2-11.5:/database ibmcom/db2 -- 等 5 分钟,db2 容器初始化完毕 # docker logs db2_11.5 -- 进入 db2 容器 # docker exec -it db2-11.5 bash # su - db2inst1 -- 连接数据库,查看表信息 $ db2 connect to testdb $ db2 set current schema tpcc $ db2 list tables for schema tpcc $ db2 describe table t10 -- 查看 db2 归档日志设置,增量同步需要开启归档 $ db2 get db cfg for testdb |grep -i log -- 查看/修改 db2 默认监听端口 $ db2 get dbm cfg |grep -i tcp/ip TCP/IP Service name (SVCENAME) = db2c_db2inst1 $ cat /etc/services |grep db2c_db2inst1 db2c_db2inst1 50000/tcp db2c_db2inst1_ssl 50001/tcp --修改 /etc/services 端口后,重启 db2 $ db2stop $ db2start
DATAX FOR OB 同步任务文件示例
{ "job": { "setting": { "speed": { "channel": 8, }, "errorLimit": { "record": 10000 } }, "content": [ { "reader": { "name": "db2reader", "parameter": { "username": "db2inst1", "password": "******", "splitPK":"ID", "column": ["id", "c1", "c2"] , "connection": [ { "jdbcUrl": [ "jdbc:db2://127.0.0.1:50000/testdb:currentSchema=TPCC;" ], "table": [ "datax_test" ] } ] } }, "writer": { "name": "oceanbasev10writer", "parameter": { "writeMode": "insert", "username": "tpcc", "password": "******", "column": ["id", "c1", "c2"] , "connection": [ { "jdbcUrl": "||_dsc_ob10_dsc_||obdemo:oboracle||_dsc_ob10_dsc_||jdbc:oceanbase://127.0.0.1:2883/tpcc", "table": [ "datax_test" ] } ] } } } ] } }
运行方法:
cd /home/admin/datax3 && bin/datax.py job/datax_test.json
更多阅读
- OceanBase 企业版下载使用指引
- 从ORACLE/MySQL到OceanBase:数据导出&导入
- OceanBase 索引列的NULL属性测试分析
- 闲话 OB 部署架构实践
- OceanBase 性能诊断(一)