今天一个网友发了一个Oracle AWR报告给我,希望能给一些建议,说这套数据库还经常出现ORA-04031错误。这一下就引起了我的好奇。我们都知道在Oracle 11g之前的版本出现这个错误的概率还是比较高的;从11g版本开始,几乎很少见到这个错误了,一年恐怕都遇不到1次。话不多说,我们来看看这个网友提供的数据。
===============================
End 4031 Diagnostic Information
===============================
DDE: Problem Key 'ORA 4031' was flood controlled (0x6) (incident: 295061)
ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","modification ")
*** 2024-01-23 17:51:20.538
DDE: Problem Key 'ORA 4031' was flood controlled (0x6) (incident: 295062)
ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","modification ")
从上面的报错来看,报错是在heap 0,而且是modifyication,那么这个是什么呢?
对于4031错误,我们肯定是要看看trace文件的:
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
----------------------------------------------
"free memory " 2295 MB 60%
"KGLH0 " 336 MB 9%
"SQLA " 216 MB 6%
"KQR L PO " 151 MB 4%
"KGLS " 142 MB 4%
"kglsim object batch " 80 MB 2%
"KGLHD " 66 MB 2%
"FileOpenBlock " 53 MB 1%
"kglsim heap " 51 MB 1%
"private strands " 48 MB 1%
-----------------------------------------
free memory 2295 MB
memory alloc. 1545 MB
Sub total 3840 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
----------------------------------------------
"free memory " 1932 MB 53%
"KGLH0 " 516 MB 14%
"SQLA " 403 MB 11%
"KGLS " 199 MB 5%
"KGLHD " 95 MB 3%
"KQR M PO " 67 MB 2%
"private strands " 48 MB 1%
"event statistics per sess " 42 MB 1%
"ksunfy : SSO free list " 40 MB 1%
"dbktb: trace buffer " 35 MB 1%
-----------------------------------------
free memory 1932 MB
memory alloc. 1716 MB
Sub total 3648 MB
TOTALS ---------------------------------------
Total free memory 4227 MB
Total memory alloc. 3261 MB
Grand total 7488 MB
可以清楚地看到,该shared pool 包含2个sub pool(Oracle 9i引入了Sub Pool结构,目的是为了提高并发能力),每个sub pool都还有2GB左右的free memory,加起来也就是说大概还有4G左右的free memory。那为什么还报错ORA-04031呢?见鬼了啊。
不要慌,我们回到前面的问题上来,问题出在modification上,那么看看这个是什么鬼?
Current information setting: 04014fff
SGA Heap Dump Interval=3600 seconds
Dump Interval=300 seconds
Last Dump Time=01/23/2024 17:51:15
Dump Count=1
Allocation request for: modification
Heap: 0x6005bb20, size: 4160
......
SO: 0x492fbe518, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x492fbe518, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:45, ser:198, calls cur/top: 0x46ae990d8/0x46ae990d8
flags : (0x0) -
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 8143 0 2
last post received-location: ksl2.h LINE:2374 ID:kslpsr
last process to post me: 0x492fab828 1 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:285 ID:ksasnd
last process posted by me: 0x492fab828 1 6
(latch info) wait_event=0 bits=0x2
holding (efd=7) 0x600453a8 hash table modification latch level=1
Location from where latch is held: ksxm.h LINE:900 ID:ksxmimod:
Context saved from call: 0
原来是modification,直接vi 搜一下trace文件即可,看看sub pool 1的这个内存区域究竟还有多大:
==============================================
Memory Utilization of Subpool 1 (SGA HEAP)
==============================================
Allocation Name Size
___________________________ ____________
"free memory " 2406108320
"miscellaneous " 1808
"dsktab_kfgsg " 58152
"sga blobs " 32768
"Wait event pointers " 0
"vproblem_lastinc " 0
......
"modification " 607904
"monitoring co " 137280
"buffers waiting for write" 0
"rules engine aggregate st" 0
"krvxdta " 0
"KGLHD " 69012208
"KGLDA " 18058528
"KGLH0 " 352799032
.......
"kglsim heap " 53849728
"ksz parent " 0
"event statistics per sess" 44558976
"event list array to post " 0
"event list to post commit" 0
"KFG SO child " 0
"KGKP randnum " 40000
从trace来看似乎还有600kb左右,然而要分配4160 bytes都报错了,那说明碎片可能非常的严重,因此我们基本肯定这个库硬解析大概率很严重,那就看看awr吧。
我的天,这个库我感觉毫无压力呀,连测试库都不如。每秒的SQL Executes是11.2,User calls 是12.3,每秒0.3个事务。那么内存相关参数设置呢?
从参数来看,32G物理内存,分配了17G给SGA,也算合理,shared pool 接近7GB,按理说足够用了,毕竟SQL每秒执行次数也就11.2次。接下来一定要看下Oracle 10g引入的Time model statistics了,确认下SQL解析情况,以便于验证我们的判断:
虽然整个SQL解析一共才2.79秒,但是其中硬解析就占了2.29秒,单纯看比重还是比较高。虽然也就几秒钟。好吧,这就是可以完全忽略的那种。
对于数据库内存管理参数,不想多说,一定是自动管理,shared pool设置为4G,已经自动调节涨到了接近7G。大家是不是也非常的好奇,非常的好奇。30分钟硬解析才2.79s,居然还能出ORA-04031错误,太奇怪了。事出反常必有妖,此时我有点怀疑是Bug了(今天上午看awr的时候,没有太注意,毕竟只花了几分钟;晚上写公众号时发现不太对)。
是的,这明显不对,我才注意到modification内存区域属于perm类型,这并非sql area,实际上跟sql解析无关,无关!
我发现其中一个bug极为相似:Bug 8857940;这实际上是一个未公布的Bug(这种通常Oracle不认为是Bug),实际上MOS参考文档上最终的solution实际上跟我上午提供的解决方案基本上一致。
最后我给出的简单建议就是先调整下参数即可。
alter system set "_enable_shared_pool_duration"=false scope=spfile;
alter system set "_shared_pool_reserved_min_alloc"=4000 scope=spfile;
参考文档:ORA-4031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(4,0)","modification ") (Doc ID 1675470.1) ,11.2.0.4可下载相关patch.
简单记录一下,确实没什么可以分析的了,大家可以再来一点案例素材!