1、现象描述
某客户某系统一个月内连续多次服务崩溃,每次都是重启应用服务、重启数据库服务后又正常,但是差不多每到周日就崩溃,具体崩溃时间不定,大概率是每周日白天崩溃,研发都定位不到问题出在哪里,希望我通过数据库去看。
2、应用日志
很多个服务,这里只贴了关键服务的关键日志信息:
2024-06-02 15:15:21.495 [http-nio-8640-exec-51] INFO c.s.e.openapi.api.service.impl.OpenApiServiceImpl - 传入SID1001,appId04784BB4
2024-06-02 15:15:21.496 [http-nio-8640-exec-51] INFO c.s.e.openapi.api.service.impl.OpenApiServiceImpl - 处理后的APPID04784BB4
2024-06-02 15:15:35.257 [http-nio-8640-exec-54] ERROR c.s.etcloud.frame.exception.RestExceptionHandler - ---------> huge error!
org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
### The error may exist in URL [jar:file:/data/szaisino/etcloud/app/etcloud-openapi-app/etcloud-openapi-app-saas-3.1.4.jar!/BOOT-INF/classes!/mybatis/IOpenApiCallbackLogMapper.xml]
### The error may involve com.szhtxx.etcloud.openapi.event.mapper.IOpenApiCallbackLogMapper.selectByIdAndType
### The error occurred while executing a query
### Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:77)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
at com.sun.proxy.$Proxy175.selectOne(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:166)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:83)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
at com.sun.proxy.$Proxy228.selectByIdAndType(Unknown Source)
at sun.reflect.GeneratedMethodAccessor345.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
at com.sun.proxy.$Proxy229.selectByIdAndType(Unknown Source)
at com.szhtxx.etcloud.openapi.event.service.impl.EventProcessServiceImpl.updateCallBackLog(EventProcessServiceImpl.java:222)
at com.szhtxx.etcloud.openapi.event.controller.EventProcessController.updateCallBackLog(EventProcessController.java:76)
at com.szhtxx.etcloud.openapi.event.controller.EventProcessController$$FastClassBySpringCGLIB$$dde319fb.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:749)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:56)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:88)
at com.szhtxx.etcloud.frame.common.aspect.EcloudAspect.busiControllerLogAround(EcloudAspect.java:212)
at sun.reflect.GeneratedMethodAccessor221.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
at com.szhtxx.etcloud.openapi.event.controller.EventProcessController$$EnhancerBySpringCGLIB$$6f2d840f.updateCallBackLog()
at sun.reflect.GeneratedMethodAccessor344.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:665)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:90)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:151)
at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.cloud.sleuth.instrument.web.ExceptionLoggingFilter.doFilter(ExceptionLoggingFilter.java:50)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at brave.servlet.TracingFilter.doFilter(TracingFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.filterAndRecordMetrics(WebMvcMetricsFilter.java:117)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:106)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:200)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
### The error may exist in URL [jar:file:/data/szaisino/etcloud/app/etcloud-openapi-app/etcloud-openapi-app-saas-3.1.4.jar!/BOOT-INF/classes!/mybatis/IOpenApiCallbackLogMapper.xml]
### The error may involve com.szhtxx.etcloud.openapi.event.mapper.IOpenApiCallbackLogMapper.selectByIdAndType
### The error occurred while executing a query
### Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:77)
at sun.reflect.GeneratedMethodAccessor342.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:433)
... 96 common frames omitted
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:81)
at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:82)
at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:68)
at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:338)
at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:84)
at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
at com.szhtxx.etcloud.frame.config.datasource.EtcloudQueryInterceptor.intercept(EtcloudQueryInterceptor.java:72)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy318.query(Unknown Source)
at com.github.pagehelper.PageInterceptor.intercept(PageInterceptor.java:143)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy318.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
... 102 common frames omitted
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1404)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1223)
at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4619)
at com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:673)
at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4615)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1201)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1193)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:105)
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78)
... 121 common frames omitted
3、慢查询报告
4、分析思路
4.1、慢查询日志分析
从慢查询中可以明显看到,系统38%的时间,是某个表open_api_callback_log的查询工作。第一怀疑对象也就是它,顺口问了研发这个表多少记录,研发说80多万。但是往下细看,第一个SQL语句详情,其实是一个备份工作。不会是导致应用崩溃的原因,因为备份是在凌晨进行的。在往下看前5个SQL,发现基本都有索引,但是因为有模糊查询,所有走索引的同时也都有回表操作。优化空间其实不大。第一次分析慢查询报告,就关注了前5个。接着就去关注应用的日志。
4.2、应用日志分析
应用的日志,很多个服务,给了很多分日志,第一次通过事后看日志来分析连接数暴涨的问题【以前都是直接动态跟踪数据库抓取阻塞SQL】,看得迷迷糊糊,看到上面这段日志后,接着又看到正常服务调用的信息,然后才有如下日志:
2024-06-02 15:15:51.447 [http-nio-8640-exec-21] ERROR c.s.etcloud.frame.exception.RestExceptionHandler - ---------> huge error!
org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
### The error may exist in URL [jar:file:/data/szaisino/etcloud/app/etcloud-openapi-app/etcloud-openapi-app-saas-3.1.4.jar!/BOOT-INF/classes!/mybatis/IUimsCallbackOrgMapper.xml]
### The error may involve com.szhtxx.etcloud.openapi.client.mapper.IUimsCallbackOrgMapper.qryByClientId
### The error occurred while executing a query
### Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:77)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:446)
at com.sun.proxy.$Proxy175.selectList(Unknown Source)
于是叫研发把jar:file:/data/szaisino/etcloud/app/etcloud-openapi-app/etcloud-openapi-app-saas-3.1.4.jar!/BOOT-INF/classes!/mybatis/IUimsCallbackOrgMapper.xml 这个文件找来,看了看,里面是关于uims_callback_org表的SQL,有INSERT,DELETE,SELECT。于是上数据库检查这个表,发现这个表只有300多条记录。就很纳闷
然后继续研究慢查询报告,发现有一个update open_api_callback_log表的,少了索引,叫他们加上了。同时让研发调整连接池的最大连接数为200【原来是128】。然后就观察。
但是等到加了个索引后的过去7天,刚好又是周日,又崩溃了,显然。还没找到根因。又拿到了慢日志跟应用的日志,慢日志其实没多大区别,就是那些SQL。这次我就仔细对比了每个服务,出现com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 128, maxActive 128, creating 0 这个错误的时间,才注意到上一次最早出现的是本文第一段日志里面的文件,即:【jar:file:/data/szaisino/etcloud/app/etcloud-openapi-app/etcloud-openapi-app-saas-3.1.4.jar!/BOOT-INF/classes!/mybatis/IOpenApiCallbackLogMapper.xml】。再加上请教了更有经验的Java开发工程师,基本确定了,就这个连接数满日志最近的几条SQL引起的。于是叫研发再找到IOpenApiCallbackLogMapper.xml,一看,果然有如下SQL:
把这个SQL放到数据库客户端,查看执行计划,结果如下:
果然,排序的字段不在索引里面。叫研发晚上给加上。然后约定那个周日在家待命,叫客户把连接数触发阈值改成200,同时研发把连接数最多值调整到500个。
5、问题解决
经过一周的观察,安静的周日过后,周一因为小孩发烧请假了,周二一来,就去看了下监控数据,连接数非常平稳!所以,到此,基本可以确定,问题得到解决!
所以积累了一次可贵的经验:如何通过应用日志事后分析定位引起阻塞的SQL。开心,分享给大家!