原作者:阎书利
数据库版本:PG12.1 openGauss/MOGDB 2.1.0
最近看到了许多关于PG等待事件的文章,对等待事件这部分也有了很大的兴趣。
等待事件是一个累计的统计信息,表明一个server process要继续完成作业,必须等待一个时间的结束;因为系统资源有限,那么完成某些工作,所需资源就要轮流使用,那么在这个过程当中,就会产生等待资源的情况。数据库会用不同类型的定义,来描述这个事情,称之为等待事件。
openGauss/MOGDB数据库是基于PG研发的,PG是从9.6版本加入了等待事件特性,可以通过查询pg_stat_activity中的wait_event_type和wait_event了解到每个sql进程在当前更详细的执行状态 。openGauss/MOGDB在PG的基础上有许多优化及改动,把一些等待事件重新定义,等待事件在保留了部分原等待事件的基础上也增加了一部分。
在分析问题的时候,等待事件对于我们还是较为重要的,我们可以根据等待事件,初步定位问题,并结合相关测试进行验证,看到了熟悉的等待事件,我们甚至能大概猜出问题所在。相对于ORACLE来说,PG以及openGauss/MOGDB的等待事件种类和数量较少。在等待事件这方面可能还有极大优化的空间,如果能把等待事件的细粒程度增加,应该会帮助我们更好的了解数据库状态,解决数据库问题。
一般来说产生等待事件的几种情况:
1.请求的资源忙,需要资源释放
2.会话处于空闲状态,等待任务
3.会话被阻塞,需要等待阻塞解除
以下内容对比可能根据PG版本有变化,如果有误,欢迎帮我指正交流。
一、ORACLE与PG与openGauss/MOGDB等待事件种类
在ORACLE 11G里,共有13类等待事件,包含了1367个等待事件,如下所示:
解释SYS@orcl11g> select distinct wait_class from v$event_name order by 1;
WAIT_CLASS
--------------------------------------
Administrative --管理类
Application --应用类
Cluster --集群类
Commit --提交类
Concurrency --并发
Configuration --配置
Idle --空闲
Network --网络
Other --其他
Queueing --队列
Scheduler --任务调度
System I/O --系统I/O
User I/O --用户I/O
13 rows selected.
SYS@orcl11g>select count(name) from v$event_name;
COUNT(NAME)
-------------------
1367
而PG数据库里,有着9类等待事件
解释 /* ----------
* Wait Classes
* ----------
*/
#define PG_WAIT_LWLOCK 0x01000000U /* 等待LWLock */
#define PG_WAIT_LOCK 0x03000000U /* 等待Lock */
#define PG_WAIT_BUFFER_PIN 0x04000000U /* 等待访问数据缓冲区 */
#define PG_WAIT_ACTIVITY 0x05000000U /* 服务器进程处于空闲状态 */
#define PG_WAIT_CLIENT 0x06000000U /* 等待应用客户端程序在套接字中进行操作 */
#define PG_WAIT_EXTENSION 0x07000000U /* 等待扩展模块中的操作 */
#define PG_WAIT_IPC 0x08000000U /* 等待进程间通信 */
#define PG_WAIT_TIMEOUT 0x09000000U /* 等待达到超时时间 */
#define PG_WAIT_IO 0x0A000000U /* 等待IO操作完成 */
openGauss/MOGDB里,有着5类等待事件
解释 /* ----------
* Wait Event Classes
* ----------
*/
#define WAIT_EVENT_END 0x00000000U /* 等待事件结束*/
#define PG_WAIT_LWLOCK 0x01000000U /* 等待LWLock */
#define PG_WAIT_LOCK 0x03000000U /* 等待Lock */
#define PG_WAIT_IO 0x0A000000U /* 等待IO操作完成 */
#define PG_WAIT_SQL 0x0B000000U /* 等待SQL的类型 */
二、PG和openGauss/MOGDB等待事件对比
1.WAIT_EVENT_END
在类型定义里,WAIT_EVENT_END更像是一种声明等待事件结束的状态,可以看到代码使用部分,在调用pgstat_report_waitevent 函数报告某个等待事件之后,进行相关处理,最后再调用一次pgstat_report_waitevent 函数报告WAIT_EVENT_END,类似于声明操作结束,等待结束。
pgstat_report_waitevent 函数部分代码如下,这个函数会从服务器进程需要等待的地方调用,会报告等待事件信息,等待信息被存储作为4字节。
2.PG_WAIT_LOCK
Lock类的等待事件表示backend后台进程等待重量级的锁,通常是指 relation、tuple、page、transactionid 等子类型锁 。
在PG里共有10种,
解释 /*
* LOCKTAG is the key information needed to look up a LOCK item in the
* lock hashtable. A LOCKTAG value uniquely identifies a lockable object.
*
* The LockTagType enum defines the different kinds of objects we can lock.
* We can handle up to 256 different LockTagTypes.
*/
typedef enum LockTagType
{
LOCKTAG_RELATION, /* whole relation */
LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
LOCKTAG_PAGE, /* one page of a relation */
LOCKTAG_TUPLE, /* one physical tuple */
LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
LOCKTAG_VIRTUALTRANSACTION, /* virtual transaction (ditto) */
LOCKTAG_SPECULATIVE_TOKEN, /* speculative insertion Xid and token */
LOCKTAG_OBJECT, /* non-relation database object */
LOCKTAG_USERLOCK, /* reserved for old contrib/userlock code */
LOCKTAG_ADVISORY /* advisory user locks */
} LockTagType;
而openGauss/MOGDB的LOCK类等待事件增加了LOCKTAG_PARTITION、LOCKTAG_PARTITION_SEQUENCE、LOCKTAG_CSTORE_FREESPACE、LOCKTAG_RELFILENODE、LOCKTAG_SUBTRANSACTION,分别是分区、分区序列、cstore的空闲空间、relfilenode以及子事务的等待。
解释 /*
* LOCKTAG is the key information needed to look up a LOCK item in the
* lock hashtable. A LOCKTAG value uniquely identifies a lockable object.
*
* The LockTagType enum defines the different kinds of objects we can lock.
* We can handle up to 256 different LockTagTypes.
*/
typedef enum LockTagType {
LOCKTAG_RELATION, /* whole relation */
/* ID info for a relation is DB OID + REL OID; DB OID = 0 if shared */
LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
/* same ID info as RELATION */
LOCKTAG_PARTITION, /*partition*/
LOCKTAG_PARTITION_SEQUENCE, /*partition sequence*/
LOCKTAG_PAGE, /* one page of a relation */
/* ID info for a page is RELATION info + BlockNumber */
LOCKTAG_TUPLE, /* one physical tuple */
/* ID info for a tuple is PAGE info + OffsetNumber */
LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
/* ID info for a transaction is its TransactionId */
LOCKTAG_VIRTUALTRANSACTION, /* virtual transaction (ditto) */
/* ID info for a virtual transaction is its VirtualTransactionId */
LOCKTAG_OBJECT, /* non-relation database object */
/* ID info for an object is DB OID + CLASS OID + OBJECT OID + SUBID */
LOCKTAG_CSTORE_FREESPACE, /* cstore free space */
/*
* Note: object ID has same representation as in pg_depend and
* pg_description, but notice that we are constraining SUBID to 16 bits.
* Also, we use DB OID = 0 for shared objects such as tablespaces.
*/
LOCKTAG_USERLOCK, /* reserved for old contrib/userlock code */
LOCKTAG_ADVISORY, /* advisory user locks */
/* same ID info as spcoid, dboid, reloid */
LOCKTAG_RELFILENODE, /* relfilenode */
LOCKTAG_SUBTRANSACTION, /* subtransaction (for waiting for subxact done) */
/* ID info for a transaction is its TransactionId + SubTransactionId */
LOCK_EVENT_NUM
} LockTagType;
3.PG_WAIT_IO
如下的IO类部分为openGauss/MOGDB和PG12.1对比所不具有的,可以看到PG12.1比openGauss/MOGDB多了逻辑复制查询重写,Reorder Buffer的读写等待、时间线历史文件的同步、WAL同步、WAL BOOTSTRAP的同步、写等待等等。
解释 WAIT_EVENT_DSM_FILL_ZERO_WRITE,
WAIT_EVENT_LOCK_FILE_RECHECKDATADIR_READ,
WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC,
WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC,
WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE,
WAIT_EVENT_LOGICAL_REWRITE_SYNC,
WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE,
WAIT_EVENT_LOGICAL_REWRITE_WRITE,
WAIT_EVENT_REORDER_BUFFER_READ,
WAIT_EVENT_REORDER_BUFFER_WRITE,
WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ,
WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC,
WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE,
WAIT_EVENT_TIMELINE_HISTORY_READ,
WAIT_EVENT_TIMELINE_HISTORY_SYNC,
WAIT_EVENT_TIMELINE_HISTORY_WRITE,
WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ,
WAIT_EVENT_WAL_BOOTSTRAP_SYNC,
WAIT_EVENT_WAL_BOOTSTRAP_WRITE,
WAIT_EVENT_WAL_SYNC,
而如下部分为PG12.1不具有而openGauss/MOGDB独有的,多了undo文件相关,doublerwite文件读写等等、
解释 WAIT_EVENT_BUF_HASH_SEARCH,
WAIT_EVENT_BUF_STRATEGY_GET,
WAIT_EVENT_UNDO_FILE_EXTEND,
WAIT_EVENT_UNDO_FILE_PREFETCH,
WAIT_EVENT_UNDO_FILE_READ,
WAIT_EVENT_UNDO_FILE_WRITE,
WAIT_EVENT_UNDO_FILE_FLUSH,
WAIT_EVENT_UNDO_FILE_SYNC,
WAIT_EVENT_WAL_BUFFER_ACCESS,
WAIT_EVENT_WAL_BUFFER_FULL,
WAIT_EVENT_DW_READ,
WAIT_EVENT_DW_WRITE,
WAIT_EVENT_DW_SINGLE_POS,
WAIT_EVENT_DW_SINGLE_WRITE,
WAIT_EVENT_PREDO_PROCESS_PENDING,
WAIT_EVENT_PREDO_APPLY,
WAIT_EVENT_DISABLE_CONNECT_FILE_READ,
WAIT_EVENT_DISABLE_CONNECT_FILE_SYNC,
WAIT_EVENT_DISABLE_CONNECT_FILE_WRITE,
WAIT_EVENT_MPFL_INIT,
WAIT_EVENT_MPFL_READ,
WAIT_EVENT_MPFL_WRITE,
WAIT_EVENT_OBS_LIST,
WAIT_EVENT_OBS_READ,
WAIT_EVENT_OBS_WRITE,
WAIT_EVENT_LOGCTRL_SLEEP,
WAIT_EVENT_COMPRESS_ADDRESS_FILE_FLUSH,
WAIT_EVENT_COMPRESS_ADDRESS_FILE_SYNC,
4.PG_WAIT_LWLOCK
LWLock的等待事件主要包含两种:LWLockNamed 和LWLockTranche ,前者表示backend后台进程等待某种特定的轻量级锁 ,后者表示表示backend后台进程等待一组相关轻量级锁。
这一部分的等待事件较多,就不一一列举了,但是可以看到,openGauss/MOGDB和PG12.1的LWLockNamed 类等待事件只有少部分一致,这个可能与openGauss/MOGDB基于PG 9.2.4版本研发有关,等待事件重新定义了。而LWLockTranche 这部分还是有一部分是一致的,但明显openGauss/MOGDB补充的等待事件数量也更加多。
5.PG_WAIT_SQL
这一类的等待事件是openGauss/MOGDB分的一类关于SQL的,可以看到等待的SQL类型。
解释 /* ----------
* Wait Events - SQL
*
* Using this to indicate the type of SQL DML event.
* ----------
*/
typedef enum WaitEventSQL {
WAIT_EVENT_SQL_SELECT = PG_WAIT_SQL,
WAIT_EVENT_SQL_UPDATE,
WAIT_EVENT_SQL_INSERT,
WAIT_EVENT_SQL_DELETE,
WAIT_EVENT_SQL_MERGEINTO,
WAIT_EVENT_SQL_DDL,
WAIT_EVENT_SQL_DML,
WAIT_EVENT_SQL_DCL,
WAIT_EVENT_SQL_TCL
} WaitEventSQL;
如下是相应的pgstat_report_wait_count 函数使用的关于这个等待事件的部分,可以看到它主要是使用pg atomic函数根据wait_event_info为用户添加sql计数。
解释 /* Using pg atomic function to add count for corresponsible WaitEventSQL */
if (classId == PG_WAIT_SQL) {
WaitEventSQL w = (WaitEventSQL)wait_event_info;
switch (w) {
case WAIT_EVENT_SQL_SELECT: {
UPDATE_SQL_COUNT(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_select,
WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.selectElapse);
} break;
case WAIT_EVENT_SQL_UPDATE: {
UPDATE_SQL_COUNT(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_update,
WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.updateElapse);
} break;
case WAIT_EVENT_SQL_INSERT: {
UPDATE_SQL_COUNT(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_insert,
WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.insertElapse);
} break;
case WAIT_EVENT_SQL_DELETE: {
UPDATE_SQL_COUNT(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_delete,
WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.deleteElapse);
} break;
case WAIT_EVENT_SQL_MERGEINTO:
pg_atomic_fetch_add_u64(&(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_mergeinto), 1);
break;
case WAIT_EVENT_SQL_DDL:
pg_atomic_fetch_add_u64(&(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_ddl), 1);
break;
case WAIT_EVENT_SQL_DML:
pg_atomic_fetch_add_u64(&(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_dml), 1);
break;
case WAIT_EVENT_SQL_DCL:
pg_atomic_fetch_add_u64(&(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_dcl), 1);
break;
case WAIT_EVENT_SQL_TCL:
pg_atomic_fetch_add_u64(&(WaitCountStatusCell->WaitCountArray[dataid].wc_cnt.wc_sql_tcl), 1);
break;
default:
break;
}
}
LWLockRelease(WaitCountHashLock);
}