1、问题描述
数据库服务器内存一直在增长,一直能到系统内存耗尽。
2、问题分析
服务器总内存是120G如果60%是72G,业务单一主要是金蝶ERP系统,使用的人员少。所以不应该是物理内存不够用。
查看内存使用情况:
top -u mysql
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21770 mysql 20 0 114.5g 74.8g 249340 S 54.5 59.7 58965:43 mysqld
20409 mysql 20 0 113416 1300 1296 S 0.0 0.0 0:00.11 mysqld_safe
现在内存的使用情况是59.7%
[root@kingdeemysql-master ~]# cat /proc/21770/status
Name: mysqld
Umask: 0026
State: S (sleeping)
Tgid: 21770
Ngid: 22382
Pid: 21770
PPid: 20409
TracerPid: 0
Uid: 996 996 996 996
Gid: 1000 1000 1000 1000
FDSize: 32768
Groups: 1000
VmPeak: 121584392 kB
VmSize: 120983272 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 85743980 kB
VmRSS: 79213532 kB
RssAnon: 78633732 kB
RssFile: 579800 kB
RssShmem: 0 kB
VmData: 119964636 kB
VmStk: 132 kB
VmExe: 55732 kB
VmLib: 7740 kB
VmPTE: 204976 kB
VmSwap: 2274608 kB
Threads: 401
SigQ: 0/510807
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000084a07
SigIgn: 0000000000001006
SigCgt: 00000001800024e8
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffff,ffffffff,ffffffff
Cpus_allowed_list: 0-79
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4557819
nonvoluntary_ctxt_switches: 2246
在上面可以看出来使用的内存数是75.5G,最上面看见的是59.7%*128G=75.5G
查看
root@(none)10:54>show variables where Variable_name in ('innodb_buffer_pool_size');
+-------------------------+-------------+
| Variable_name | Value |
+-------------------------+-------------+
| innodb_buffer_pool_size | 64424509440 |
+-------------------------+-------------+
1 row in set (0.01 sec)
64424509440/1024/1024/1024=60G
设置的innodb_buffer_pool_size大小是60G但是现在的真实使用的是75.5G
select * from sys.memory_by_host_by_current_bytes;
root@(none)11:06>select * from sys.memory_by_host_by_current_bytes;
+-------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| host | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+-------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| 10.2.1.31 | 394405 | 865.31 GiB | 2.25 MiB | 864.98 GiB | 90.08 TiB |
| background | 330664 | 131.68 MiB | 417 bytes | 46.19 MiB | 848.02 GiB |
| localhost | 271 | 366.61 KiB | 1.35 KiB | 185.94 KiB | 494.49 MiB |
| 10.2.1.140 | 15 | 17.18 KiB | 1.15 KiB | 8.03 KiB | 26.33 GiB |
| 10.2.10.187 | 0 | 0 bytes | 0 bytes | 0 bytes | 1.22 GiB |
| 10.2.13.251 | 0 | 0 bytes | 0 bytes | 0 bytes | 27.50 GiB |
| 10.2.14.129 | -27 | -210.34 KiB | 7.79 KiB | 0 bytes | 166.72 GiB |
| 10.2.1.35 | -5390 | -1.67 MiB | 324 bytes | 0 bytes | 3.90 TiB |
+-------------+--------------------+-------------------+-------------------+-------------------+-----------------+
select * from sys.memory_by_user_by_current_bytes;
root@(none)11:27>select * from sys.memory_by_user_by_current_bytes;
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| user | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| cosmic | 397184 | 887.59 GiB | 2.29 MiB | 887.25 GiB | 90.36 TiB |
| background | 330711 | 131.69 MiB | 417 bytes | 46.19 MiB | 848.51 GiB |
| root | 786 | 440.21 KiB | 573 bytes | 185.94 KiB | 495.22 MiB |
| dhgt | -5390 | -1.67 MiB | 324 bytes | 0 bytes | 3.90 TiB |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
root@(none)11:32>show engine performance_schema status;
+--------------------+-------------------------------------------------------------+-----------+
| Type | Name | Status |
+--------------------+-------------------------------------------------------------+-----------+
| performance_schema | events_errors_summary_by_thread_by_error.memory | 37515264 |
| performance_schema | events_errors_summary_by_account_by_error.size | 48848 |
| performance_schema | events_errors_summary_by_account_by_error.count | 128 |
| performance_schema | events_errors_summary_by_account_by_error.memory | 6252544 |
| performance_schema | events_errors_summary_by_user_by_error.size | 48848 |
| performance_schema | events_errors_summary_by_user_by_error.count | 128 |
| performance_schema | events_errors_summary_by_user_by_error.memory | 6252544 |
| performance_schema | events_errors_summary_by_host_by_error.size | 48848 |
| performance_schema | events_errors_summary_by_host_by_error.count | 128 |
| performance_schema | events_errors_summary_by_host_by_error.memory | 6252544 |
| performance_schema | events_errors_summary_global_by_error.size | 158272 |
| performance_schema | events_errors_summary_global_by_error.count | 1 |
| performance_schema | events_errors_summary_global_by_error.memory | 158272 |
| performance_schema | (pfs_buffer_scalable_container).count | 216 |
| performance_schema | (pfs_buffer_scalable_container).memory | 21264 |
| performance_schema | (max_global_server_errors).count | 4946 |
| performance_schema | (max_session_server_errors).count | 1526 |
| performance_schema | performance_schema.memory | 506211568 |
+--------------------+-------------------------------------------------------------+-----------+
506211568 /1024/1024=482.76M
通过show processlist
命令查看到当前有159个会话连接
show variables where variable_name in ('read_buffer_size','read_rnd_buffer_size','sort_buffer_size','join_buffer_size','binlog_cache_size','tmp_table_size');
+----------------------+-----------+
| Variable_name | Value |
+----------------------+-----------+
| binlog_cache_size | 262144 |
| join_buffer_size | 8388608 |
| read_buffer_size | 4194304 |
| read_rnd_buffer_size | 4194304 |
| sort_buffer_size | 8388608 |
| tmp_table_size | 134217728 |
+----------------------+-----------+
6 rows in set (0.01 sec)
每个session的(8388608+4194304+4194304+8388608)/1024/1024=24M
24M*159=3816M/1024=3.7G
总的mysql数据库用的内存是
60+3.7+482.76/1024=64.17G
这和mysql在操作系统中显示的内存差
75.5-64.17=11.33G
tmp_table_size 的设置是128M,连续执行多次show global status like 'Created_tmp%'
命令查看当前是否有临时表产生,发现频繁使用了临时表,并且出现了因内存临时表不够而使用到磁盘临时表:
第一次执行
root@(none)>show global status like 'Created_tmp%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Created_tmp_disk_tables | 139373 |
| Created_tmp_files | 37698 |
| Created_tmp_tables | 77805159 |
+-------------------------+----------+
3 rows in set (0.01 sec)
第二次执行
root@(none)>show global status like 'Created_tmp%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Created_tmp_disk_tables | 139376 |
| Created_tmp_files | 37728 |
| Created_tmp_tables | 77806712 |
+-------------------------+----------+
3 rows in set (0.01 sec)
第三次执行
root@(none)>show global status like 'Created_tmp%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Created_tmp_disk_tables | 139386 |
| Created_tmp_files | 37794 |
| Created_tmp_tables | 77814169 |
+-------------------------+----------+
3 rows in set (0.00 sec)
临时表增长很快
gdb --batch --pid 21770 --ex 'call malloc_trim(0)'