从OB3.x中not in无法走hash连接来了解优化器的发展

数据运维 2024-05-07 泡泡手机阅读

前言：

稍微有经验的oracle dba都应该知道，not in 和 not exists是有区别的，不能随意的替换，原因在于not in对于null值不进行处理。所以如果要等价替换的话要考虑主表与关联表的数据是否有空值。

案例分析：

提及上面的小知识点，也是为了抛砖引玉，引入今天的案例，有同事在OB3.2.3.3的环境上发现了一条not in的sql，反馈执行时间1s多，走了nl反连接，他尝试改写为了not exists，走了hash反连接，执行时间降低到了0.2s。所以他尝试添加hint强制让原语句去走hash连接，可以无论如何都无法改变连接顺序。

首先正常等价改写的情况下，not in和not exists在执行计划相同的情况下，一般不存在谁优谁劣，其次正常来说等价的条件下执行计划不会有偏差，不会存在not exists可以走的计划，not in不能走。所以我感觉是优化器的缺陷，下面来看下案例。

##############原始sql语句#########
select    count(*)
  from cccc.LOOOOOOOOOOOOOOOOOOOO a
 where a.cccdate is not null
   and a.cccdate <= (sysdate - 30)
   and a.cccdate > (sysdate - 31)
   and a.status = 2 
   and a.ddd not in (select  t.ccc_dddd from  cccc.LOOOOOOOOOOOOOOOOOOOO_ddddd t );
 ######执行计划 ##########################3
| ===============================================================================
|ID|OPERATOR              |NAME                             |EST. ROWS|COST   |
-------------------------------------------------------------------------------
|0 |SCALAR GROUP BY       |                                 |1        |1897316|
|1 | NESTED-LOOP ANTI JOIN|                                 |1        |1897316|
|2 |  TABLE SCAN          |A(INX_LOOOOOOOOOOOOOOOOOOOO)     |29       |913    |
|3 |  MATERIAL            |                                 |345093   |261579 |
|4 |   PX COORDINATOR     |                                 |345093   |245872 |
|5 |    EXCHANGE OUT DISTR|:EX10000                         |345093   |133484 |
|6 |     TABLE SCAN       |T(LOOOOOOOOOOOOOOOOOOOO_ddddd)   |345093   |133484 |
===============================================================================
Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)(0x7eb709e254f0)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)(0x7eb709e254f0)])
  1 - output([remove_const(1)(0x7eb709eb9f10)]), filter(nil), 
      conds([(T_OP_OR, A.ddd(0x7eb709e25130) = T.ccc_dddd(0x7eb709e22e30)(0x7eb709e2dc90), (T_OP_IS, A.ddd(0x7eb709e25130), NULL, 0)(0x7eb709e2ec10), (T_OP_IS, T.ccc_dddd(0x7eb709e22e30), NULL, 0)(0x7eb709e2f810))(0x7eb709e2e530)]), nl_params_(nil), batch_join=false
  2 - output([A.ddd(0x7eb709e25130)]), filter([(T_OP_IS_NOT, A.cccdate(0x7eb709dc9040), NULL, 0)(0x7eb709dc9930)]), 
      access([A.cccdate(0x7eb709dc9040)], [A.ddd(0x7eb709e25130)]), partitions(p0), 
      is_index_back=true, filter_before_indexback[true], 
      range_key([A.STATUS(0x7eb709dcf130)], [A.cccdate(0x7eb709dc9040)], [A.CITY(0x7eb709dc8d50)], [A.__pk_increment(0x7eb709e76030)]), range(2,2023-12-29 17:17:39,MAX,MAX ; 2,2023-12-30 17:17:39,MAX,MAX), 
      range_cond([A.cccdate(0x7eb709dc9040) <= ?(0x7eb709dcb7d0)], [A.cccdate(0x7eb709dc9040) > ?(0x7eb709dcd7e0)], [A.STATUS(0x7eb709dcf130) = 2(0x7eb709dcea10)])
  3 - output([T.ccc_dddd(0x7eb709e22e30)]), filter(nil)
  4 - output([T.ccc_dddd(0x7eb709e22e30)]), filter(nil)
  5 - output([T.ccc_dddd(0x7eb709e22e30)]), filter(nil), is_single, dop=1
  6 - output([T.ccc_dddd(0x7eb709e22e30)]), filter(nil), 
      access([T.ccc_dddd(0x7eb709e22e30)]), partitions(p0), 
      is_index_back=false, 
      range_key([T.CITY(0x7eb709e21200)], [T.ccc_dddd(0x7eb709e22e30)], [T.IIIITIME(0x7eb709e214f0)], [T.__pk_increment(0x7eb709e8b500)]), range(MIN,MIN,MIN,MIN ; MAX,MAX,MAX,MAX)always true

原始的sql无法走hash连接，检查发现两表关联列没有非空约束，都是普通列，但是数据都是非空的，那么直接改写成not exists看下。

#############改写后sql##################33
select  count(*)
  from cccc.LOOOOOOOOOOOOOOOOOOOO a
 where a.cccdate is not null
   and a.cccdate <= (sysdate - 30)
   and a.cccdate > (sysdate - 31)
   and a.status = 2
   and not exists (select  1 from  cccc.LOOOOOOOOOOOOOOOOOOOO_ddddd t where a.ddd =t.ccc_dddd);  
##############执行计划#######################33
| =============================================================================
|ID|OPERATOR             |NAME                             |EST. ROWS|COST  |
-----------------------------------------------------------------------------
|0 |SCALAR GROUP BY      |                                 |1        |416897|
|1 | HASH ANTI JOIN      |                                 |1        |416897|
|2 |  TABLE SCAN         |A(INX_LOOOOOOOOOOOOOOOOOOOO)     |29       |913   |
|3 |  PX COORDINATOR     |                                 |345093   |192135|
|4 |   EXCHANGE OUT DISTR|:EX10000                         |345093   |140758|
|5 |    SUBPLAN SCAN     |VIEW1                            |345093   |140758|
|6 |     TABLE SCAN      |T                                |345093   |135571|
============================================================================= 
Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)(0x7e87236e4a70)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)(0x7e87236e4a70)])
  1 - output([remove_const(1)(0x7e8723778930)]), filter(nil), 
      equal_conds([A.ddd(0x7e87236e1ea0) = T.ccc_dddd(0x7e87236e2190)(0x7e87236ecfc0)]), other_conds(nil)
  2 - output([A.ddd(0x7e87236e1ea0)]), filter([(T_OP_IS_NOT, A.cccdate(0x7e87236880a0), NULL, 0)(0x7e8723688990)]), 
      access([A.cccdate(0x7e87236880a0)], [A.ddd(0x7e87236e1ea0)]), partitions(p0), 
      is_index_back=true, filter_before_indexback[true], 
      range_key([A.STATUS(0x7e872368e190)], [A.cccdate(0x7e87236880a0)], [A.CITY(0x7e8723687db0)], [A.__pk_increment(0x7e8723732910)]), range(2,2023-12-29 17:16:30,MAX,MAX ; 2,2023-12-30 17:16:30,MAX,MAX), 
      range_cond([A.cccdate(0x7e87236880a0) <= ?(0x7e872368a830)], [A.cccdate(0x7e87236880a0) > ?(0x7e872368c840)], [A.STATUS(0x7e872368e190) = 2(0x7e872368da70)])
  3 - output([T.ccc_dddd(0x7e87236e2190)]), filter(nil)
  4 - output([T.ccc_dddd(0x7e87236e2190)]), filter(nil), is_single, dop=1
  5 - output([T.ccc_dddd(0x7e87236e2190)]), filter(nil), 
      access([T.ccc_dddd(0x7e87236e2190)]), partitions(p0), 
      is_index_back=false, 
      range_key([T.CITY(0x7e87236df830)], [T.ccc_dddd(0x7e87236e2190)], [T.IIIITIME(0x7e87236dfb20)], [T.__pk_increment(0x7e872374bf50)]), range(MIN,MIN,MIN,MIN ; MAX,MAX,MAX,MAX)always true

我们主要看下1号算子的output&filters 的对比从OB3.x中not in无法走hash连接来了解优化器的发展-每日运维

可以看到not in的计划走了nest loop anti join，无法改为hash连接的原因是因为，关联列有可能为空的原因。为了验证这个点，我改写了下原始sql，手工限制关联列不为空。

###########修改后的sql##########
select  count(*)
      from cccc.LOOOOOOOOOOOOOOOOOOOO a
      where a.cccdate is not null
      and a.cccdate <= (sysdate - 30)
      and a.cccdate > (sysdate - 31)
      and a.status = 2 and a.ddd is not null 
      and a.ddd not in (select  t.ccc_dddd from  cccc.LOOOOOOOOOOOOOOOOOOOO_ddddd t where t.ccc_dddd is not null);
########执行计划###########
| =============================================================================
|ID|OPERATOR             |NAME                             |EST. ROWS|COST  |
-----------------------------------------------------------------------------
|0 |SCALAR GROUP BY      |                                 |1        |416897|
|1 | HASH ANTI JOIN      |                                 |1        |416897|
|2 |  TABLE SCAN         |A(INX_LOOOOOOOOOOOOOOOOOOOO)     |29       |913   |
|3 |  PX COORDINATOR     |                                 |345093   |192135|
|4 |   EXCHANGE OUT DISTR|:EX10000                         |345093   |140758|
|5 |    SUBPLAN SCAN     |VIEW1                            |345093   |140758|
|6 |     TABLE SCAN      |T                                |345093   |135571|
=============================================================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)])
  1 - output([1]), filter(nil), 
      equal_conds([A.ddd = VIEW1.ccc_dddd]), other_conds(nil)
  2 - output([A.ddd]), filter([(T_OP_IS_NOT, A.cccdate, NULL, 0)], [(T_OP_IS_NOT, A.ddd, NULL, 0)]), 
      access([A.cccdate], [A.ddd]), partitions(p0)
  3 - output([VIEW1.ccc_dddd]), filter(nil)
  4 - output([VIEW1.ccc_dddd]), filter(nil), is_single, dop=1
  5 - output([VIEW1.ccc_dddd]), filter(nil), 
      access([VIEW1.ccc_dddd])
  6 - output([T.ccc_dddd]), filter([(T_OP_IS_NOT, T.ccc_dddd, NULL, 0)]), 
      access([T.ccc_dddd]), partitions(p0)

可以看出来在我手工排除掉关联列为空的情况下，执行计划可以走hash连接了，那这个问题应该就是ob3.x的一个优化器的小缺陷了，后面为了验证我的想法（也为了看下后续优化器有没有修补这个问题），我又去4.x上测试了一下。

测试情况：

数据库版本为4.2.1.2

###########测试数据#########
obclient [SYS]> select  * from a; select  * from b;
+------+------+
| ID   | VA   |
+------+------+
|    1 | cc   |
|    2 | cc   |
| NULL | cc   |
+------+------+
3 rows in set (0.004 sec)

+------+------+
| ID   | VA   |
+------+------+
|    1 | cc   |
| NULL | cc   |
+------+------+
2 rows in set (0.006 sec)
##############测试语句##########
obclient [SYS]> desc select  * from a where va='cc' and id not in (select id from b );
+------------------------------------------------------------------------------------------+
| Query Plan                                                                               |
+------------------------------------------------------------------------------------------+
| ===========================================================                              |
| |ID|OPERATOR               |NAME    |EST.ROWS|EST.TIME(us)|                              |
| -----------------------------------------------------------                              |
| |0 |HASH RIGHT ANTI JOIN NA|        |1       |9           |                              |
| |1 |├─PX COORDINATOR       |        |1       |5           |                              |
| |2 |│ └─EXCHANGE OUT DISTR |:EX10000|1       |5           |                              |
| |3 |│   └─TABLE FULL SCAN  |B       |1       |4           |                              |
| |4 |└─TABLE FULL SCAN      |A       |2       |4           |                              |
| ===========================================================                              |
| Outputs & filters:                                                                       |
| -------------------------------------                                                    |
|   0 - output([A.ID], [A.VA]), filter(nil), rowset=16                                     |
|       equal_conds([A.ID = B.ID]), other_conds(nil)                                       |
|   1 - output([B.ID]), filter(nil), rowset=16                                             |
|   2 - output([B.ID]), filter(nil), rowset=16                                             |
|       is_single, dop=1                                                                   |
|   3 - output([B.ID]), filter(nil), rowset=16                                             |
|       access([B.ID]), partitions(p0)                                                     |
|       is_index_back=false, is_global_index=false,                                        |
|       range_key([B.__pk_increment]), range(MIN ; MAX)always true                         |
|   4 - output([A.VA], [A.ID]), filter([A.VA = cast('cc', VARCHAR2(1048576 ))]), rowset=16 |
|       access([A.VA], [A.ID]), partitions(p0)                                             |
|       is_index_back=false, is_global_index=false, filter_before_indexback[false],        |
|       range_key([A.__pk_increment]), range(MIN ; MAX)always true                         |
+------------------------------------------------------------------------------------------+
24 rows in set (0.072 sec)

通过这个简单的测试可以看出来OB在4.x的优化器中已经把这个小问题优化掉了。

结论：

首先先说明下，该问题只是因为判断逻辑的原因影响了执行计划的选择，导致可以选择的执行路径变少了，所以可能无法选择最优的执行路径，并不会影响数据的结果。

从4.x的测试结果来看,OB的优化器（当然不止优化器，整个产品都在进步）在不断的进步和优化，也希望国产数据库早日成熟。

行之所向，莫问远方。

从OB3.x中not in无法走hash连接来了解优化器的发展

前言：

案例分析：

测试情况：

结论：

activity oracle

oracle 12c in memory

怎么遍历mysql数据库表

mysql压力测试脚本实例

openGauss Summit 2023 云和恩墨专场召开，MogDB 数据库技术创新与生态实践引发期待