undo 表空间使用率过高排查
一、客户环境描述
架构:2节点RAC
版本: ReHat 7.9 + ORACLE RAC 19.19
二、客户问题反馈
客户反馈,节点2 undo表空间使用率过高,且使用率呈上升趋势,但应用使用目前反馈正常。
三、处理过程
1. 查看表空间使用率
结论:undotbs1 使用率正常,undotbs2 使用率过高。
临时解决方法:
考虑到使用率已经超过90%,且剩余空间不是很大,先通过增加数据文件方式快速降低使用率,避免undo空间不足,导致更严重的问题。
alter tablespace undotbs2 add datafile '+DATA' size 30G;
2. 查看undo 使用具体情况
select a.tablespace_name,
round(c.active_undo,2) "ACTIVE_UNDO(MB)",
round(a.UNEXPIRED_undo,2) "UNEXPIRED_UNDO(MB)",
b.total_undo "TOTAL_UNDO(MB)",
trunc(active_undo / total_undo * 100, 2) || '%' active_undo_pct,
trunc(UNEXPIRED_UNDO / total_undo * 100, 2) || '%' UNEXPIRED_UNDO_PCT
from (select nvl(sum(bytes / 1024 / 1024), 0) UNEXPIRED_UNDO, tablespace_name
from dba_undo_extents
where status = 'ACTIVE' or status='UNEXPIRED'
group by tablespace_name) a,
(select tablespace_name, sum(bytes / 1024 / 1024) total_undo
from dba_data_files
group by tablespace_name) b,
(select nvl(sum(bytes / 1024 / 1024), 0) active_undo, tablespace_name
from dba_undo_extents
where status = 'ACTIVE'
group by tablespace_name) c
where a.tablespace_name = b.tablespace_name
and a.tablespace_name=c.tablespace_name(+)
order by tablespace_name ;
结论:undotbs1 、undotbs2 active 部分都不高,说明undotbs2 当前使用率高,不是active undo部分造成的,但是可以看到undotbs2 unexpired 部分比undotbs1明显多很多,说明导致undotbs2使用率高的原因是undotbs2 存在较多未过期部分,undo不释放。
3. 查看v$undostat 视图
select a.INST_ID,a.BEGIN_TIME,a.END_TIME,a.ACTIVEBLKS,a.EXPIREDBLKS,a.UNEXPIREDBLKS,a.TUNED_UNDORETENTION
from gv$undostat a where inst_id=1;
select a.INST_ID,a.BEGIN_TIME,a.END_TIME,a.ACTIVEBLKS,a.EXPIREDBLKS,a.UNEXPIREDBLKS,a.TUNED_UNDORETENTION
from gv$undostat a where inst_id=2;
结论:undotbs1 、undotbs2 tuned_undoretention 都是10800,说明导致undotbs2 不释放并不是因为tuned_undoretention 变大导致的。但是可以看到,undotbs2 中的activeblks、expiredblks、unexpiredblks 部分都比undotbs1 大很多,说明undotbs2 真实使用的undo 确实就是比undotbs1 多,说明undotbs2中可能存在消耗undo的SQL语句,但是节点1不存在。
4. 分析awr报告
– 节点1
–节点2
SQL> select table_name,index_name from dba_indexes where index_name='I_SPUV_FLAG_UPDATE';
TABLE_NAME INDEX_NAME
---------------------------------------- ---------
FLEX_SMT_PARAM_UPLOADVALUE I_SPUV_FLAG_UPDATE
结论:通过segments by db blocks changes 对比可以看到,变化量最多的块都是FLEX_SMT_PARAM_UPLOADVALUE表,但是节点2变化量明显比节点1 大很多,且节点2 还存在FLEX_FOX_JSON 表变化量大。
5. 查看归档日志切换频率
SELECT thread#,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) Day,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'00',1,0)) H00,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'01',1,0)) H01,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'02',1,0)) H02,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'03',1,0)) H03,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'04',1,0)) H04,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'05',1,0)) H05,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'06',1,0)) H06,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'07',1,0)) H07,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'08',1,0)) H08,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'09',1,0)) H09,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'10',1,0)) H10,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'11',1,0)) H11,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'12',1,0)) H12,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'13',1,0)) H13,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'14',1,0)) H14,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'15',1,0)) H15,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'16',1,0)) H16,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'17',1,0)) H17,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'18',1,0)) H18,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'19',1,0)) H19,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'20',1,0)) H20,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'21',1,0)) H21,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'22',1,0)) H22 ,
SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'),10,2),'23',1,0)) H23,
COUNT(*) TOTAL
FROM v$log_history a
where first_time>=to_char(sysdate-7)
GROUP BY thread#,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5)
ORDER BY thread#,SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'),1,5) DESC;
分析:可以看到节点1 日志切换频率相差变大,节点2从10/16号8点开始,切换频率比平时多出很多,初步判断节点2 undo使用率比节点1高和此相关。
6. 查找相关语句
DELETE FROM FLEX_SMT_PARAM_UPLOADVALUE WHERE UPDATE_TIME < SYSDATE - 30 AND ROWNUM