BUG ?一个小小的UPDATE语句竟然还涉及这么多门路
- UPDATE FROM 细节探究
- 小结一下
看腻了就来听听视频演示吧(持续更新中):https://www.bilibili.com/video/BV1DH4y1c7ni/
UPDATE FROM 细节探究
创建测试表和数据:
drop table IF EXISTS test1;
CREATE TABLE test1(id int,name text);
insert into test1 values(1,'PostgreSQL'),(2,'Oracle'),(3,'MySQL');
drop table IF EXISTS test2;
CREATE TABLE test2(id int,name text);
insert into test2 values(5,'TiDB'),(1,'PG'),(2,'OG'),(2,'OG-HA'),(2,'ABCDEFG'),(2,'A'),(2,'abcefg'),(2,'O'),(2,'SY'),(2,'DBA'),(2,'og'),(2,'openGauss'),(4,'Oracle'),(6,'SQLServer');
select * from test1 order by 1;
select ctid,* from test2 order by id,name;
select ctid,* from test2 order by id,ctid;
PostgreSQL系数据库update from关联存在重复数据时只更新首数据,即第一行插入的数据,严格来讲是ctid最小的那一行数据:
看下PG执行计划,可以看到走merge join时是有序的所以更新第一行(ctid最小的那行)数据,关闭merge join后走hash join时则更新最后一行(ctid最大的那行)数据。下面来看一下3个执行计划的执行情况:
merge join:
begin;update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
begin;explain analyze update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
hash join:
关闭enable_mergejoin让优化器偏向于不选择mergejoin
set enable_mergejoin = off;
begin;update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
begin;explain analyze update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
nested loop:
update test2 set name='update-first' where name='OG';
select ctid,* from test2 order by id,ctid;
set enable_hashjoin to off;
begin;update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
begin;explain analyze update test1 t1 set name = t2.name from test2 t2 where t1.id = t2.id returning t1.id,t1.name;rollback;
还原参数设置:
reset enable_mergejoin;
reset enable_hashjoin;
小结一下
- 执行计划不仅会影响执行效率,还会影响执行结果集「 BUG ?」
- update from 关联表存在重复数据时,更新的数据行受执行路径的影响,会选择ctid最小或最大的行做更新操作,具体执行计划和选择ctid大小对应关系如下:
执行路径 | merge join | hash join | nestedloop |
---|---|---|---|
更新ctid值 | min | max | min |