点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!
一
问题描述
数据库侧接到业务反馈:获取XXX数据源连接失败。经过判断,非普遍现象,而是个例。定位为数据库承载的核心业务出现问题,旁路业务不受影响。二
处理过程
2.1 登录检查数据库运行情况
发现数据库存在大量enq类等待事件,主要为索引相关"enq: TX - index contention"。
INST_ID EVENT# EVENT COUNT(1)
---------- ---------- ---------------------------------------------------------------- ----------
1 180 db file scattered read 1
1 187 gc cr request 1
1 1496 enq: PS - contention 1
1 538 inactive session 1
1 307 enq: TX - row lock contention 2
1 198 gc cr block 3-way 2
1 205 gc current block busy 3
1 441 SQL*Net message to client 4
1 308 enq: TX - allocate ITL entry 80
1 324 enq: HW - contention 81
1 309 enq: TX - index contention 5203
2 205 gc current block busy 1
2 189 gc cr multi block request 1
2 87 latch: MGA shared context root latch 1
2 531 wait list latch free 1
2 871 LGWR wait for redo copy 1
2 354 row cache lock 1
2 187 gc cr request 2
2 179 db file sequential read 2
2 180 db file scattered read 4
2 562 PGA memory operation 5
2 186 gc current request 5
2 121 gc buffer busy acquire 8
2 441 SQL*Net message to client 12
2 324 enq: HW - contention 45
2 307 enq: TX - row lock contention 92
2 529 latch free 195
2 308 enq: TX - allocate ITL entry 726
2 309 enq: TX - index contention 1709
3 1493 PX Deq: Slave Session Stats 1
3 354 row cache lock 1
3 186 gc current request 1
3 179 db file sequential read 1
3 632 process diagnostic dump 1
3 198 gc cr block 3-way 2
3 205 gc current block busy 2
3 18 Disk file operations I/O 3
3 441 SQL*Net message to client 8
3 308 enq: TX - allocate ITL entry 438
3 309 enq: TX - index contention 983
4 180 db file scattered read 1
4 18 Disk file operations I/O 1
4 186 gc current request 1
4 1090 enq: TX - contention 1
4 221 gc current grant 2-way 1
4 205 gc current block busy 1
4 197 gc cr block 2-way 1
4 198 gc cr block 3-way 2
4 307 enq: TX - row lock contention 3
4 441 SQL*Net message to client 8
4 324 enq: HW - contention 20
4 308 enq: TX - allocate ITL entry 52
4 309 enq: TX - index contention 964
5 1483 PX Deq: Join ACK 1
6 610 ksxr poll remote instances 1
6 179 db file sequential read 1
2.2 检查相关SQL
发现大量连接是对表OZzzz_202304的DML操作,执行效率低下。
insert into
OZzzz_202304(REGION_ID,VALID_DATE,ORG_ID,DONE_DATE,DONE_CODE
,OP_ID,BUSI_ID,SO_LOG_ID,NOTES,CREATE_DATE,BILL_ID,OPT_CODE,
CHANNEL_TYPE,SESSION_ID,EXPIRE_DATE)values(:1 ,:2 ,:3 ,:4
,:5,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 ,:15 )
insert into OZzzz_202304 (REGION_ID,VALID_DATE,ORG_ID,DONE_DATE,DONE_CODE,OP_ID,BUSI_
ID,NOTES,SO_LOG_ID,CREATE_DATE,BILL_ID,LOG_TYPE,SESSION_ID,E
XPIRE_DATE)values(:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8,:9 ,:10
,:11 ,:12 ,:13 ,:14 )
2.3 下探分析相关表结构情况
发现这张表索引达到10个,进一步加重锁等待事件(enq: TX - index contention),同时,检查相关索引状态正常。
2.4 进一步检查ALERT日志
发现索引表空间存在无法扩充的错误。
2023-04-06T09:19:57.556609+08:00
ORA-1683: unable to extend index ZZZ partition P82 by 1024 in tablespace YY_IND01
2023-04-06T09:19:57.893577+08:00
ORA-1654: unable to extend index ZZZ by 1024 in tablespace YY_IND01
2023-04-06T09:19:58.124847+08:00
ORA-1654: unable to extend index ZZZ by 1024 in tablespace YY_IND01S
2023-04-06T09:19:58.143427+08:00
ORA-1683: unable to extend index ZZZ partition P20 by 1024 in tablespace YY_IND01
2.5 表空间 YY_IND01使用情况
发现有 244G 左右剩余空间,但由于未达到告警阈值 95%(使用量),并未触发语音告警。
2.6 综合评估,判断为磁盘碎片化问题会导致无连续可用空间使用
对于表空间碎片,FSFI(自由空间碎片指数,低于30需要关注)有一定参考价值。进一步检查业务表空间,发现FSFI远小于30,说明各个表空间碎片较多。
TABLESPACE_NAME FSFI
--------------- -------
YY_DAT02 0.51
YY_IND01 0.60
YY_DAT03 0.63
YY_DAT01 0.76
三
总结及处理措施
3.1 问题总结
-
不可因问题的表面现象,而影响问题定位。表空间使用率虽未达到100%,但如果碎片化严重的话,剩余空间未有连续空间使用,也会触发空间不足问题。
-
索引表空间 YY_IND01 碎片化严重,虽然剩余可用空间244G,但无连续可用空间使用,碎片化问题导致数据库操作效率下降,引发资源争用。
-
对相关表执行DML操作时,由于每张表有较多索引,问题进一步放大,因此产生了大量的 index contention (索引争用),进而影响业务效率。
3.2 解决方法
紧急解决方法:对索引表空间紧急扩容,业务逐步恢复。
3.3 建议措施
加强数据库碎片化监控与分析,将表空间碎片化情况纳入日常运维通报事项,及时通知业务对于碎片化严重的表,进行高水位表重建及压降。在系统监控方面,添加表空间碎片化预警,同时,调整表空间使用率监控阈值(由 95% 紧急下调至 90%)。四
相关SQL及脚本
4.1 表空间碎片化查询
1)查看表空间的使用率和最大可用空间
SET PAGESIZE 140 LINESIZE 200
COLUMN used_pct FORMAT A11
SELECT tablespace_name,
size_mb,
free_mb,
max_size_mb,
max_free_mb,
TRUNC((max_free_mb/max_size_mb) * 100) AS free_pct,
RPAD(' '|| RPAD('X',ROUND((max_size_mb-max_free_mb)/max_size_mb*10,0), 'X'),11,'-') AS used_pct
FROM (
SELECT a.tablespace_name,
b.size_mb,
a.free_mb,
b.max_size_mb,
a.free_mb + (b.max_size_mb - b.size_mb) AS max_free_mb
FROM (SELECT tablespace_name,
TRUNC(SUM(bytes)/1024/1024) AS free_mb
FROM dba_free_space
GROUP BY tablespace_name) a,
(SELECT tablespace_name,
TRUNC(SUM(bytes)/1024/1024) AS size_mb,
TRUNC(SUM(GREATEST(bytes,maxbytes))/1024/1024) AS max_size_mb
FROM dba_data_files
GROUP BY tablespace_name) b
WHERE a.tablespace_name = b.tablespace_name
)
ORDER BY tablespace_name;
TABLESPACE_NAME SIZE_MB FREE_MB MAX_SIZE_MB MAX_FREE_MB FREE_PCT USED_PCT
------------------------------ ---------- ---------- ----------- ----------- ---------- -----------
OGG_TS 1024 1022 5120 5118 99 ----------
SYSAUX 2210 345 32767 30902 94 X---------
SYSTEM 1090 8 32767 31685 96 ----------
UNDOTBS1 1495 1452 32767 32724 99 ----------
UNDOTBS2 200 184 32767 32751 99 ----------
USERS 11 3 32767 32759 99
注意:自动扩展有效的情况下,MAX_SIZE_MB、FREE_PCT、USED_PCT都是是基于可扩展的最大Size计算的,FREE_MB是基于当前文件大小计算的可用空间。
-
Tablespace_Name ----- 表空间名
-
Size_MB ----- 当前表空间大小(当前值)
-
FREE_MB ----- 当前表空间可用空间(当前值)
-
MAX_SIZE_MB ----- 表空间最大空间(基于可扩展的最大Size计算)
-
MAX_FREE_MB ----- 表空间最大可用空间(基于可扩展的最大Size计算)
-
FREE_PCT ----- 表空间空闲率(基于可扩展的最大Size计算)
-
USED_PCT ----- 表空间使用率(基于可扩展的最大Size计算)
2)查看表空间的空闲空间碎片数和最大连续空间
col tsname format a16 justify c heading 'Tablespace'
col nfrags format 999,990 justify c heading 'Free Frags'
col mxfrag format 999,999,990 justify c heading 'Largest Frag (MB)'
col totsiz format 999,999,990 justify c heading 'Total (MB)'
col avasiz format 999,999,990 justify c heading 'Available (MB)'
col pctusd format 990 justify c heading 'Pct Used%'
set pagesize 9999 lines 200
select
total.tablespace_name tsname,
count(free.bytes) nfrags,
nvl(max(free.bytes)/1024/1024,0) mxfrag,
nvl(sum(free.bytes)/1024/1024,0) avasiz,
a.bytes/1024/1024 totsiz,
(1-nvl(sum(free.bytes),0)/a.bytes) *100 pctusd
from
dba_data_files total,
(select tablespace_name, sum(bytes) bytes
from dba_data_files
group by tablespace_name) a,
dba_free_space free
where
total.tablespace_name = a.tablespace_name(+)
and total.tablespace_name = free.tablespace_name(+)
and total.file_id=free.file_id(+)
group by
total.tablespace_name,
a.bytes
order by pctusd desc
/
Tablespace Free Frags Largest Frag (MB) Available (MB) Total (MB) Pct Used%
---------------- ---------- ----------------- -------------- ------------ ---------
SYSTEM 2 8 9 1,090 99
SYSAUX 551 102 344 2,210 84
USERS 2 3 4 11 67
UNDOTBS2 7 159 176 200 12
UNDOTBS1 58 1,375 1,446 1,495 3
OGG_TS 1 1,023 1,023 1,024 0
6 rows selected.
主要关注碎片化较多且最大连续空间较少的表空间。
-
Tablespace ----- 表空间名
-
Free Frags ----- free space碎片化数
-
Largest Frag (MB) ----- free space中最大连续空间(MB)
-
Total (MB) ----- 表空间数据文件的大小 (MB)
-
Available (MB) ----- 使用可能的Extent数(MB)
-
PctUsed ----- 表空间使用率(%)
4.2 Segment碎片化查询
1)确认碎片化相对较高的表
以下脚本列出SYSTEM和SYSAUX以外Segment HWM较高的TOP 50个对象:
set pagesize 9999
set lines 200
col table_name for a20
col owner for a10
select * from (
SELECT owner owner, TABLE_NAME , round(nvl((BLOCKS *8192 / 1024/1024)-(NUM_ROWS*AVG_ROW_LEN/1024/1024),0),2)
"Data lower than HWM in MB" FROM DBA_TABLES WHERE tablespace_name not in ('SYSTEM', 'SYSAUX')
union all
SELECT table_owner owner, TABLE_NAME , round(nvl((BLOCKS *8192 / 1024/1024) - (NUM_ROWS*AVG_ROW_LEN/1024/1024),0),2)
"Data lower than HWM in MB" FROM dba_tab_partitions WHERE tablespace_name not in ('SYSTEM', 'SYSAUX')
order by 3 desc
) where rownum 1024
)
select *
from (select b.owner||'.'||b.index_name name,
round(100 * (1 - b.leaf_blocks a.blocks)) fragment_pct,
round(segsize / 1024 / 1024) || 'M' as ind_size,
to_char(last_analyzed, 'yyyy-mm-dd hh24:mi:ss') as last_analyzed
from seg a,
(select owner,
index_name,
leaf_blocks,
distinct_keys,
num_rows,
last_analyzed,
initial_extent
from dba_indexes
where partitioned = 'NO'
and tablespace_name not in ('SYSTEM', 'SYSAUX')
and blevel > 3) b
where a.owner = b.owner
and a.segment_name = b.index_name
and a.segsize > b.initial_extent
union all
select b.owner||'.'||b.index_name name,
round(100 * (1 - b.leaf_blocks / a.blocks)) fragment_pct,
round(segsize / 1024 / 1024) || 'M' as ind_size,
to_char(last_analyzed, 'yyyy-mm-dd hh24:mi:ss') as last_analyzed
from seg a,
(select index_owner owner,
index_name,
partition_name,
leaf_blocks,
distinct_keys,
num_rows,
last_analyzed,
initial_extent
from dba_ind_partitions
where blevel > 3
and tablespace_name not in ('SYSTEM', 'SYSAUX')
) b
where a.owner = b.owner
and a.segment_name = b.index_name
and a.segsize > b.initial_extent
order by fragment_pct)
where fragment_pct >= 35
order by 2;
参考:How to Determine When an Index Should be Rebuilt? (Doc ID 1373415.1)
需要注意:查询脚本输出结果依赖于统计信息的精确度。
END