点击上方“IT那活儿”公众号,关注后了解更多内容,不管IT什么活儿,干就完了!!!
故障背景
因对自动化运维平台进行了国产化适配改造,迁移到国产化设备后mysql数据库主机 CPU使用率达到8000%。
主机CPU一共8核,该问题影响了同主机的其他数据库的正常使用。
故障排查
2.1 数据库排查
mysql数据库中先有上千条shsnc用户下的慢SQL语句在执行:
2.2 数据库主机排查
数据库主机上shsnc用户进程占用CPU较高,最高达到8000%。
2.3 慢SQL分析
发现查询都与AMP库的problem表有关系,查询problem表发现有119730971条数据。
2.4 排查服务
排查发现迁移之前的zabbix_server服务器未停止,一直在连接数据库。
故障分析处理
3.1 停止迁移前的zabbix_server服务
停止后发现数据库主机CPU未恢复到正常水平。
3.2 停止现在的zabbix_server服务
停止现有的zabbix_server后,数据库主机CPU恢复至正常水平,慢SQL数量也逐步减少。
3.3 原因分析
此次问题的原因主要有以下两点:
- 1)自动化平台数据库中的部分大表未做表分区,problem表堆积了上亿条数据,mysql无法处理如此大的数据,是导致本次问题的根本原因。
-
2)旧的zabbix_server服务未停止,使其一直在请求数据库,给数据库主机造成一定的压力,是本次问题的次要原因。
解决方案
4.1 停止服务
停止旧环境的zabbix_server,减少数据库压力。
4.2 对数据库中的大表进行分区,分区方案如下
4.2.1 备份表
1)数据库主机创建备份目录
mkdir -p /home/shsnc/mysqldump
2)数据库主机创建备份目录
mysqldump
-u 用户名
-p密码
--database amp problem events snc_resource_health_history events_tag problem_tag
alerts>/home/shsnc/mysqldump/amp_bak.sql
4.2.2 删除外键及添加主键
1)查看创建表语句
show create table 表名;
2)涉及表的信息
表名称 | 表用途备注 |
alerts |
告警相关表,用于记录通过邮件、微信等方式发送的 告警信息 |
problem | 监控告警问题表 |
events | 事件表 |
events_tag | 该表记录了事件的ID |
problem_tag | 用于记录事件发生的问题标识表 |
3)查看所有表外键语句
select * from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where
REFERENCED_TABLE_NAME='被引用表的表名';
4)删除外键
-
alter table 表名 drop foreign key 外键名称。
- 或者在navicat界面操作删除外键。
5)分区字段添加至主键
- 需要分区字段添加至主键,否则无法创建分区,需在navicat界面操作。events表、problem表的clock字段添加为主键,将clock添加至唯一索引;
- snc_resource_health_history表的create_time字段添加为主键,create_time字段添加至唯一索引里。
4.2.3 检查表分区是否已存在
检查是否按照实际日期每天创建一张分区表:
select
TABLE_SCHEMA,TABLE_NAME,PARTITION_NAME,PARTITION_METHOD,
PARTITION_EXPRESSION,PARTITION_DESCRIPTION,INDEX_LENGTH,DATA_LENGTH
from information_schema.PARTITIONS where TABLE_SCHEMA='amp'
and TABLE_NAME='需分区的表名称';
4.2.4 添加表分区配置
将非分区表调整为分表:
- problem表
alter table problem PARTITION BY RANGE (clock)
(PARTITION p_START VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION problem20201029 VALUES LESS THAN (1603987200) ENGINE = InnoDB,
PARTITION p_END VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
- events表
alter table events PARTITION BY RANGE (clock)
(PARTITION p_START VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION events20201029 VALUES LESS THAN (1603987200) ENGINE = InnoDB,
PARTITION p_END VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
- snc_resource_health_history表
alter table snc_resource_health_history PARTITION BY RANGE (create_time)
(PARTITION p_START VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION events20201029 VALUES LESS THAN (1603987200) ENGINE = InnoDB,
PARTITION p_END VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
4.2.5 更新分区管理表
插入数据至分区管理表:
INSERT INTO amp.partition_manager_settings (`table`, `column`, granularity, increment, retain, buffer) VALUES
('problem', 'clock', '1', '86400', '2592000', '86400');
INSERT INTO amp.partition_manager_settings (`table`, `column`, granularity, increment, retain, buffer) VALUES
('events', 'clock', '1', '86400', '2592000', '86400');
INSERT INTO amp.partition_manager_settings (`table`, `column`, granularity, increment, retain, buffer) VALUES
('snc_resource_health_history', 'create_time', '1', '86400', '2592000', '86400');
4.2.6 创建新的存储过程
存储过程语句:
DELIMITER $$
USE `amp`$$
DROP PROCEDURE IF EXISTS `PROC_ADD_PARTITIONS`$$
CREATE PROCEDURE `PROC_ADD_PARTITIONS`()
BEGIN
DECLARE done TINYINT UNSIGNED;
DECLARE p_table,p_column VARCHAR(64) CHARACTER SET latin1;
DECLARE p_granularity,p_increment,p_retain,p_buffer INT UNSIGNED;
DECLARE run_timestamp,current_val INT UNSIGNED;
DECLARE partition_list TEXT CHARACTER SET latin1;
DECLARE cur_table_list CURSOR FOR SELECT s.table,s.column,s.granularity,s.increment,s.retain,s.buffer FROM partition_manager_settings s;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
SET SESSION group_concat_max_len=65535;
SET run_timestamp= UNIX_TIMESTAMP(FROM_UNIXTIME(UNIX_TIMESTAMP(),'%Y-%m-%d 00:00:00'));
OPEN cur_table_list;
manage_partitions_loop: LOOP
SET done=0;
FETCH cur_table_list INTO p_table,p_column,p_granularity,p_increment,p_retain,p_buffer;
IF done=1 THEN
LEAVE manage_partitions_loop;
END IF;
SELECT IF(t.create_options LIKE '%partitioned%',NULL,run_timestamp)
FROM information_schema.tables t
WHERE t.table_schema=DATABASE()
AND t.table_name=p_table
INTO current_val;
IF current_val IS NOT NULL THEN
SET partition_list:='';
IF p_retain IS NOT NULL THEN
WHILE current_val>run_timestamp-p_retain DO
SET partition_list:=CONCAT('partition ',p_table,FROM_UNIXTIME(current_val,'%Y%m%d'),' values less than (',FLOOR((current_val+p_increment)/p_granularity),'),',partition_list);
SET current_val:=current_val-p_increment;
END WHILE;
END IF;
SET @sql:=CONCAT('alter table ',p_table,' partition by range (',p_column,') (partition p_START values less than (0),',partition_list,'partition p_END values less than MAXVALUE)');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
IF p_buffer IS NOT NULL THEN
SELECT IFNULL(MAX(p.partition_description)*p_granularity,run_timestamp+p_increment)
FROM information_schema.partitions p
WHERE p.table_schema=DATABASE()
AND p.table_name=p_table
AND p.partition_description>0
INTO current_val;
SET partition_list:='';
WHILE current_val'' THEN
SET @sql:=CONCAT('ALTER TABLE ',p_table,' REORGANIZE PARTITION p_END into (',partition_list,'partition p_END values less than maxvalue)');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END IF;
IF p_retain IS NOT NULL THEN
SET partition_list='';
SELECT GROUP_CONCAT(p.partition_name SEPARATOR ',')
FROM information_schema.partitions p
WHERE p.table_schema=DATABASE()
AND p.table_name=p_table
AND p.partition_description0
INTO partition_list;
IF partition_list>'' THEN
SET @sql:=CONCAT('ALTER TABLE ',p_table,' DROP PARTITION ',partition_list);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END IF;
END LOOP;
CLOSE cur_table_list;
END$$
DELIMITER ;
4.2.7 调用存储过程并验证
1)调用存储过程
call PROC_ADD_PARTITIONS;
2)验证表分区有是否创建成功
select
TABLE_SCHEMA,TABLE_NAME,PARTITION_NAME,PARTITION_METHOD,PART
ITION_EXPRESSION,PARTITION_DESCRIPTION,INDEX_LENGTH,DATA_LEN
GTH from information_schema.PARTITIONS where
TABLE_SCHEMA='amp' and TABLE_NAME='表名称';
4.2.8 创建定时任务
每日定时执行存储过程:
CREATE EVENT IF NOT EXISTS event_add_partitions
ON SCHEDULE EVERY 1 DAY STARTS TIMESTAMP '2023-01-13 00:00:00'
ON COMPLETION PRESERVE
DO CALL PROC_ADD_PARTITIONS();
END