锁 单个SESSION锁等待队列超过20个 #应用#
一、数据库状态检查的步骤/命令(通过本步骤可以确定数据库当前的状态是否OK)
检查连接情况,确保能正常登录、有应用连接、连接数使用率正常
col PROCESS_LIMIT for a20;
select process_current,value process_limit,round(process_current / value, 2) * 100 || '%' as process_pct
from (select count(*) as process_current from v$process) a, v$parameter where name = 'processes';
二、请描述告警出现的可能原因
锁等待,可能是应用逻辑或者语句效率问题
三、预计多久不处理就可以出现UIOC(P1)
持续等待则影响业务语句正常执行,需15分钟内处理正常
四、告警检查的步骤(通过本步骤确定告警是否是误告)
--单个会话的锁等待队列长度
select /*+rule*/'lock_num:'||nvl(max(count(aa.holding_session)),0) lock_num from dba_waiters aa group by aa.holding_session;
五、需要保存现场的步骤/命令
create table dbmgr.ash_20220000 as select tt.* from gv$active_session_history tt;
create table dbmgr.lock_20220000 as select *from gv$lock;
create table dbmgr.locked_object_20220000 as select *from GV$LOCKED_OBJECT;
六、需要反馈上级以及运营的信息
将堵塞的情况反馈运营/DA,同时确认业务的影响情况,运营确认是否kill holder会话
七、建议处理的流程,步骤和对应命令
1、检查堵塞情况,找到holder,发给运营评估能否kill
a.
set line 1000 pagesize 999
col EVENT for a40
select
s.username,
decode(lc.block, 0, 'waiter: ', 'holder: ') || lc.sid sid,
decode(lc.lmode,
0,
'none',
1,
'null',
2,
'row share',
3,
'row excl.',
4,
'share',
5,
's/row excl.',
6,
'exclusive',
lmode,
ltrim(to_char(lc.lmode, '990'))) lmode,
decode(lc.request,
0,
'none',
1,
'null',
2,
'row share',
3,
'row excl.',
4,
'share',
5,
's/row excl.',
6,
'exclusive',
request,
ltrim(to_char(lc.request, '990'))) request,
lc.type,
decode( s.sql_id,null,s.PREV_SQL_ID,s.sql_id) sql_id,
s.EVENT,s.BLOCKING_SESSION_STATUS,s.BLOCKING_SESSION,s.BLOCKING_INSTANCE,
'alter system kill session ' || '''' || S.SID || ',' || S.SERIAL# || ',@' ||
s.INST_ID || ''''||' immediate;' kill_command
from gv$lock lc, gv$session s
where (lc.id1, lc.id2, lc.type) in
(select t.id1, t.id2, t.type from gv$lock t where t.request > 0)
and lc.sid = s.sid
and lc.inst_id = s.inst_id
order by lc.id1, lc.request;
b.
with l as(
select /*+materialize*/ decode(request,0,'Holder:',' Waiter:') role, l.inst_id,l.sid, id1,id2,lmode,request,l.type from gv$lock l
where (id1,id2,l.type) in (select id1,id2,type from gv$lock where request>0 )
order by id1,request)
select l.*, s.SQL_ID, s.PREV_SQL_ID from l, gv$session s where l.sid = s.sid ;
with l as (select/*+materialize*/ sid from gv$lock where block = 1)
select 'alter system kill session '''||s.sid||','||serial#||',@'||inst_id||''' immediate;' as sql from gv$session s, l where s.sid = l.sid
order by s.sid,s.serial#;
2、反馈运营会话详情,评估能否kill
--查看sid对应会话详情
set linesize 150 pagesize 999
col EVENT for a30
col USERNAME for a10
col MACHINE for a10
col OSUSER for a10
col PROGRAM for a10
select inst_id,sid,serial#,sql_id,event,blocking_instance,blocking_session,blocking_session_status,final_blocking_instance,final_blocking_session,final_blocking_session_status,username,status,LAST_CALL_ET,machine,osuser,program,sql_hash_value,to_char(logon_time,'yyyy-mm-dd hh24:mi:ss'),prev_sql_id from gv$session where sid=&sid;
--查看sql_id对应具体语句sql_text
set linesize 150 pagesize 999
select a.sql_id,a.child_number,a.sql_text,a.parsing_schema_name,a.first_load_time,a.last_load_time,a.last_active_time,a.plan_hash_value,a.sql_plan_baseline from gv$sql a where sql_id='&sql_id' order by last_active_time desc;
3、杀会话(注意核对会话信息)
执行步骤1生成的kill session语句
八、验证问题已经恢复的步骤和命令
--单个会话的锁等待队列长度,恢复为0则无堵塞
select /*+rule*/'lock_num:'||nvl(max(count(aa.holding_session)),0) lock_num from dba_waiters aa group by aa.holding_session;
LOCK_NUM
-------------------------------------------------
lock_num:0
oracle行锁等待处理
1、接到告警,有几个SQL大量行锁等待
2、查询top event sql
set linesize 1000
select inst_id,sql_id,EVENT,sum(DELTA_TIME),count(1)
from gv$active_session_history ash
where ash.SAMPLE_TIME >= sysdate - 1/24
group by inst_id,sql_id,EVENT
order by count(1) desc,sum(DELTA_TIME) desc
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
3、查SQL文本,确定被锁的表为 beneficiary_info
select
distinct sql_id,sql_text from gv$sqlarea t where t.sql_id in
('3cp53mvc1wvwj','brwrfn81as7nk');
4、查表锁情况及阻塞源
select sess.inst_id,sess.sid,sess.serial#, lo.oracle_username,lo.os_user_name, ao.object_name,lo.locked_mode,
sess.prev_sql_id,sess.sql_id,sess.status,sess.LAST_CALL_ET,sess.LOGON_TIME,sess.event,
sess.ROW_WAIT_OBJ#,sess.BLOCKING_SESSION,sess.FINAL_BLOCKING_SESSION,sess.BLOCKING_INSTANCE,
(select s.sql_text from gv$sql s where s.sql_id=sess.sql_id and rownum=1) sql_text,
(select s.sql_text from gv$sql s where s.sql_id=sess.prev_sql_id and rownum=1) sql_text1
from gv$locked_object lo,dba_objects ao,gv$session sess
where ao.object_id=lo.object_id
and lo.session_id=sess.sid
and lo.inst_id = sess.inst_id
and ao.object_name=upper('beneficiary_info');
发现有大量阻塞,且没有明确的阻塞源
5、sys登录数据库做hang分析
SYS@lucd01> oradebug setmypid
Statement processed.
SYS@lucd01> oradebug -g all hanganalyze 3
Hang Analysis in /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc
SYS@lucd01> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0
6、快速截取hang分析关键信息
cnsz083184:lucd01 > grep -n "HANG ANALYSIS:" /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc
2445:HANG ANALYSIS:
cnsz083184:lucd01 > vi /paic/app/oracle/rdbms/diag/rdbms/lucd0/lucd01/trace/lucd01_diag_409518.trc
vi后esc :2440 跳转到相应行
从HANG ANALYSIS:开始截取,截取到主要阻塞,重点关注 Chains
most likely to have caused the hang 内容
*** 2022-07-28T14:39:38.300811+08:00
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): lucd0.lucd01, lucd0.lucd02, lucd0.lucd03
oradebug_node_dump_level: 3
analysis initiated by oradebug
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 14:39:37 ]
NOTE: scheduling delay has not been sampled for 0.630678 secs
0.000000 secs from [ 14:39:33 - 14:39:38 ], 5 sec avg
0.000000 secs from [ 14:38:38 - 14:39:38 ], 1 min avg
0.000000 secs from [ 14:34:38 - 14:39:38 ], 5 min avg
vktm time drift history
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'SQL*Net message from client'