20230906_openGauss优化案例1_批处理业务优化

2023年 9月 12日 32.9k 0

0. 优化效果汇总

本案例还比较有意思,通过实操熟悉Oracle到openGauss存储过程改写和og的数据库特性, 原程序为Oracle 批处理业务,经过openGauss相同语义改写后,程序运行4178秒,优化改写后程序运行1240秒,使用SMP并行优化后运行734秒(测试主机为个人PC虚拟机),将优化程序分享出来,有兴趣的小伙伴,可以操练起来。

1. 性能优化意义

信息系统性能优化可以提高设备和服务的性能、降低成本和功耗、提高用户体验和服务质量,最短时间内实现业务目标。

2. 案例背景

• 税务部门在交流访问时发现在事务处理数据上运行BI程序可以有效的识别 欺诈和错误信息,规避偷税漏税行为。该系统可在几周内收回成本.
• 当前系统在大型机上运行,并将所有时间都花在使用非关系型数据库,没有能力处理BI.
• 税务部门对 BI 感兴趣,但仍必须能够执行基本处理。基于提供的测试数据,税务部门认为他们的主机系统 1 小时完成处理数据。税务部门将该应用程序转换为在 Oracle 上运行,并且失望地发现它在小型开发服务器上运行 2 小时。税务机关认为
数据应在 10 分钟内处理完毕,以确保实际生产运行性能。客户咨询openGauss是否可以满足批处理性能要求。

3. Team Exercise

“VAT clearing/matching(增值税清算/匹配)”
– A公司向B公司销售增值税信息
– B公司向A公司采购增值税信息
– 增值税匹配要求
• “销售” 和 “采购” 数据必须匹配
• 批处理程序同时接收增值税
• 处理可能的错误数据
– 存在重复数据
– “销售” 和 “采购” 数据可能分别在不同的 “数据文件批次”

4. 业务逻辑

5. 测试数据说明

6. 测试脚本运行

### 1. 原程序运行
[omm@og1 team1]$ sh run.sh sol0.sql all
### 2. 优化后程序运行
[omm@og1 team1]$ sh run.sh sol1.sql all
### 3. 只加载一批数据文件
[omm@og1 team1]$ sh run.sh sol2.sql 2001e

7. db参数要求

enable_copy_server_files=true
enable_wdr_snapshot=on

8. 优化思路

原程序主要包括以下存储过程代码

8.1 prematch_buy

逐行处理输入数据,将重复、已匹配数据插入重复数据表(duplicate_buy),其它数据插入预处理购买信息表(prematch_buy).

CREATE OR REPLACE PROCEDURE team1.prematch_buy()
AS DECLARE
cur refcursor;
rec type_table%rowtype;
BEGIN

OPEN cur for select * from t_ext_buy;
loop
FETCH cur INTO rec;
exit when cur%notfound;
if is_matched(rec) then
insert into duplicate_buy
(CODE, SOURCE_ID, SOURCE_ACCOUNT, PROGRAM_ID, TRANSACTION_DATE, transaction_numeric, VALUE1, FUNCTION1, INDICATOR, COMMENTS)
values
(rec.CODE, rec.SOURCE_ID, rec.SOURCE_ACCOUNT, rec.PROGRAM_ID, rec.TRANSACTION_DATE, rec.transaction_numeric, rec.VALUE1, rec.FUNCTION1, rec.INDICATOR, rec.COMMENTS);
else
begin
insert into prematch_buy
(CODE, SOURCE_ID, SOURCE_ACCOUNT, PROGRAM_ID, TRANSACTION_DATE, transaction_numeric, VALUE1, FUNCTION1, INDICATOR, COMMENTS)
values
(rec.CODE, rec.SOURCE_ID, rec.SOURCE_ACCOUNT, rec.PROGRAM_ID, rec.TRANSACTION_DATE, rec.transaction_numeric, rec.VALUE1, rec.FUNCTION1, rec.INDICATOR, rec.COMMENTS);
exception
when unique_violation then
insert into duplicate_buy
(CODE, SOURCE_ID, SOURCE_ACCOUNT, PROGRAM_ID, TRANSACTION_DATE, transaction_numeric, VALUE1, FUNCTION1, INDICATOR, COMMENTS)
values
(rec.CODE, rec.SOURCE_ID, rec.SOURCE_ACCOUNT, rec.PROGRAM_ID, rec.TRANSACTION_DATE, rec.transaction_numeric, rec.VALUE1, rec.FUNCTION1, rec.INDICATOR, rec.COMMENTS);
end;
end if;
end loop;
END;
/

8.2 prematch_sell

处理逻辑与prematch_buy类似

CREATE OR REPLACE PROCEDURE team1.prematch_sell()
AS DECLARE
/*declaration_section*/
cur refcursor;
rec type_table%rowtype;
BEGIN
/*executable_section*/
OPEN cur for select * from t_ext_sell;
loop
FETCH cur INTO rec;
exit when cur%notfound;
if is_matched(rec)
then
insert into duplicate_sell
( CODE
, SOURCE_ID
, SOURCE_ACCOUNT
, PROGRAM_ID
, TRANSACTION_DATE
, transaction_numeric
, VALUE1
, FUNCTION1
, INDICATOR
, COMMENTS
) values (
rec.CODE
, rec.SOURCE_ID
, rec.SOURCE_ACCOUNT
, rec.PROGRAM_ID
, rec.TRANSACTION_DATE
, rec.transaction_numeric
, rec.VALUE1
, rec.FUNCTION1
, rec.INDICATOR
, rec.COMMENTS);
else
begin
insert into prematch_sell
( CODE
, SOURCE_ID
, SOURCE_ACCOUNT
, PROGRAM_ID
, TRANSACTION_DATE
, transaction_numeric
, VALUE1
, FUNCTION1
, INDICATOR
, COMMENTS
) values (
rec.CODE
, rec.SOURCE_ID
, rec.SOURCE_ACCOUNT
, rec.PROGRAM_ID
, rec.TRANSACTION_DATE
, rec.transaction_numeric
, rec.VALUE1
, rec.FUNCTION1
, rec.INDICATOR
, rec.COMMENTS);
exception when unique_violation then
insert into duplicate_sell
( CODE
, SOURCE_ID
, SOURCE_ACCOUNT
, PROGRAM_ID
, TRANSACTION_DATE
, transaction_numeric
, VALUE1
, FUNCTION1
, INDICATOR
, COMMENTS
) values (
rec.CODE
, rec.SOURCE_ID
, rec.SOURCE_ACCOUNT
, rec.PROGRAM_ID
, rec.TRANSACTION_DATE
, rec.transaction_numeric
, rec.VALUE1
, rec.FUNCTION1
, rec.INDICATOR
, rec.COMMENTS);
end;
end if;
end loop;
END;

8.3 match

逐行使用prematch_buy中的数据与prematch_sell进行对比,相匹配则记录matched,并删除prematch_buy和prematch_sell中记录。

CREATE OR REPLACE PROCEDURE team1.match()
AS DECLARE
/*declaration_section*/
rec2 type_table%rowtype;
ctid2 tid;
BEGIN
/*executable_section*/
--for rec1 in( select x1.*, x1.ctid
for rec1 in( select x1.*,ctid
from prematch_buy x1) loop
begin
select x2.*,ctid into rec2.code,
rec2.source_id,
rec2.source_account,
rec2.program_id,
rec2.transaction_date,
rec2.transaction_numeric,
rec2.value1,
rec2.function1,
rec2.indicator,
rec2.comments,
ctid2 from prematch_sell x2
where x2.CODE = rec1.code
and x2.SOURCE_ID = rec1.SOURCE_ID
and x2.SOURCE_ACCOUNT = rec1.SOURCE_ACCOUNT
and x2.PROGRAM_ID = rec1.PROGRAM_ID
and x2.TRANSACTION_DATE = rec1.TRANSACTION_DATE
and x2.transaction_numeric = rec1.transaction_numeric
and x2.value1 = rec1.value1
and x2.function1 = rec1.function1;

insert into matched
(CODE, SOURCE_ID, SOURCE_ACCOUNT, PROGRAM_ID, TRANSACTION_DATE, transaction_numeric, VALUE1, FUNCTION1, INDICATOR_1, INDICATOR_2, COMMENTS_1, COMMENTS_2)
values
(rec1.CODE, rec1.SOURCE_ID, rec1.SOURCE_ACCOUNT, rec1.PROGRAM_ID, rec1.TRANSACTION_DATE, rec1.transaction_numeric, rec1.VALUE1, rec1.FUNCTION1, rec1.INDICATOR, rec2.INDICATOR, rec1.COMMENTS, rec2.COMMENTS);
delete from prematch_buy x1
where ctid=rec1.ctid;
delete from prematch_sell x2
where ctid=ctid2;
exception
when no_data_found then
NULL;
end;
end loop;
END;

8.4 优化思路

问题1、游标逐行匹配
原程序存储过程prematch_buy/prematch_sell/match中使用使用游标逐行匹配处理,效率较低,计划同语义改写为sql整表处理。
问题2、单进程串行处理
openGauss的SMP并行技术利用计算机多核CPU架构来实现多线程并行计算,充分利用CPU资源来提高查询性能的技术。
https://docs.opengauss.org/zh/docs/5.0.0/docs/BriefTutorial/%E5%B9%B6%E8%A1%8C%E6%9F%A5%E8%AF%A2.html

8.5 优化代码展示

[omm@og1 team1]$ cat sol2.sql
SET query_dop = 4;
--step1: find duplicate ctid in t_ext_buy; insert into duplicate rows into duplicate_buy; and delete in t_ext_buy
timing
drop table IF EXISTS buy_duplicate_ctid;
create table buy_duplicate_ctid as
select ctid ctid2 from (
select b.*,
row_number() over (partition by code,
source_id,
source_account,
program_id,
transaction_date,
transaction_numeric,
value1,
function1
order by transaction_date) rowno,ctid
from t_ext_buy b) where rowno>1;

insert into duplicate_buy select * from t_ext_buy where ctid in (select ctid2 from buy_duplicate_ctid);

delete from t_ext_buy where ctid in (select ctid2 from buy_duplicate_ctid);

--step2: delete matched or prematch rows in t_ext_buy
delete from t_ext_buy s
where exists (select 'x'
from matched m
where m.code = s.code
and m.source_id = s.source_id
and m.source_account = s.source_account
and m.program_id = s.program_id
and m.transaction_date = s.transaction_date
and m.transaction_numeric = s.transaction_numeric
and m.value1 = s.value1
and m.function1 = s.function1 )
or exists (select 'x'
from prematch_buy p
where p.code = s.code
and p.source_id = s.source_id
and p.source_account = s.source_account
and p.program_id = s.program_id
and p.transaction_date = s.transaction_date
and p.transaction_numeric = s.transaction_numeric
and p.value1 = s.value1
and p.function1 = s.function1 );

--step3: insert prematch rows
insert into prematch_buy select * from t_ext_buy;

--step4: find duplicate ctid in t_ext_sell; insert into duplicate rows into duplicate_sell; and delete in t_ext_sell
drop table IF EXISTS sell_duplicate_ctid;
create table sell_duplicate_ctid as
select ctid ctid2 from (
select b.*,
row_number() over (partition by code,
source_id,
source_account,
program_id,
transaction_date,
transaction_numeric,
value1,
function1
order by transaction_date) rowno,ctid
from t_ext_sell b) where rowno>1;

insert into duplicate_sell select * from t_ext_sell where ctid in (select ctid2 from sell_duplicate_ctid);

delete from t_ext_sell where ctid in (select ctid2 from sell_duplicate_ctid);

--step5: delete matched or prematch rows in t_ext_buy
delete from t_ext_sell s
where exists (select 'x'
from matched m
where m.code = s.code
and m.source_id = s.source_id
and m.source_account = s.source_account
and m.program_id = s.program_id
and m.transaction_date = s.transaction_date
and m.transaction_numeric = s.transaction_numeric
and m.value1 = s.value1
and m.function1 = s.function1 )
or exists (select 'x'
from prematch_sell p
where p.code = s.code
and p.source_id = s.source_id
and p.source_account = s.source_account
and p.program_id = s.program_id
and p.transaction_date = s.transaction_date
and p.transaction_numeric = s.transaction_numeric
and p.value1 = s.value1
and p.function1 = s.function1 );

--step6: insert prematch rows
insert into prematch_sell select * from t_ext_sell;

--step7: do matched operation
insert
into matched
select buy.code buy_code,
buy.source_id buy_source_id,
buy.source_account buy_source_account,
buy.program_id buy_program_id,
buy.transaction_date buy_transaction_date,
buy.transaction_numeric transaction_numeric,
buy.value1 buy_value,
buy.function1 buy_function,
buy.indicator buy_indicator,
sell.indicator sell_indicator,
buy.comments buy_comments,
sell.comments sell_comments
from prematch_buy buy
join prematch_sell sell
on buy.code = sell.code
and buy.source_id = sell.source_id
and buy.source_account = sell.source_account
and buy.program_id = sell.program_id
and buy.transaction_date = sell.transaction_date
and buy.transaction_numeric = sell.transaction_numeric
and buy.value1 = sell.value1
and buy.function1 = sell.function1;

--step8: delete matched rows in prematch_buy
delete from prematch_buy s
where exists (select 'x'
from matched m
where m.code = s.code
and m.source_id = s.source_id
and m.source_account = s.source_account
and m.program_id = s.program_id
and m.transaction_date = s.transaction_date
and m.transaction_numeric = s.transaction_numeric
and m.value1 = s.value1
and m.function1 = s.function1 );

--step9: delete matched rows in prematch_sell
delete from prematch_sell s
where exists (select 'x'
from matched m
where m.code = s.code
and m.source_id = s.source_id
and m.source_account = s.source_account
and m.program_id = s.program_id
and m.transaction_date = s.transaction_date
and m.transaction_numeric = s.transaction_numeric
and m.value1 = s.value1
and m.function1 = s.function1 );

--step10: views
select 'MATCHED' tab, count(*) row_count from MATCHED
union all
select 'PREMATCH_BUY' tab, count(*) row_count from PREMATCH_BUY
union all
select 'PREMATCH_SELL' tab, count(*) row_count from PREMATCH_SELL
union all
select 'DUPLICATE_BUY' tab, count(*) row_count from DUPLICATE_BUY
union all
select 'DUPLICATE_SELL' tab, count(*) row_count from DUPLICATE_SELL;
SET query_dop = 1;
[omm@og1 team1]$

9. 优化过程记录

9.1 原程序运行

[omm@og1 team1]$ sh run.sh sol0.sql
************** complete task *********************
this step used: 4178 seconds
[omm@og1 ~]$

9.2 sql修改优化

[omm@og1 team1]$ sh run.sh sol1.sql
************** complete task *********************
this step used: 1240 seconds
[omm@og1 team1]$

9.3 SMP并行优化

[omm@og1 team1]$ sh run.sh sol2.sql
************** complete task *********************
this step used: 734 seconds
[omm@og1 team1]$

10. 程序下载

链接:https://pan.baidu.com/s/1eyi66j2c9C02kJtOiGcL-A
提取码:b0p3

相关文章

Oracle如何使用授予和撤销权限的语法和示例
Awesome Project: 探索 MatrixOrigin 云原生分布式数据库
下载丨66页PDF,云和恩墨技术通讯(2024年7月刊)
社区版oceanbase安装
Oracle 导出CSV工具-sqluldr2
ETL数据集成丨快速将MySQL数据迁移至Doris数据库

发布评论