Spring Boot Graceful Shutdown

September 25, 2023

Ways to shut down Spring Boot gracefully

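  • When K8S stops a Pod, it first sends SIGTERM so the application process can shut down gracefully. If the process has not exited within the grace period set by terminationGracePeriodSeconds (30 seconds by default), K8S sends SIGKILL and forcibly kills it.
  • Manual shutdown: send a POST request to the Spring Boot Actuator shutdown endpoint, /actuator/shutdown. Spring Boot closes the web ApplicationContext and then exits, shutting down gracefully.
  • The kill -TERM approach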

    During graceful shutdown, Spring Boot calls methods annotated with @PreDestroy.

    @PreDestroy
    public void cleanup() {
        // Perform cleanup work here
        log.info("Received shutdown event. Performing cleanup and shutting down gracefully.");
    }
    

    After sending SIGTERM to the Spring Boot process, the log line printed by cleanup() shows the name of the thread that performs the shutdown work: SpringApplicationShutdownHook.

    [2023-09-21 08:29:34.232] INFO  [SpringApplicationShutdownHook] - Received shutdown event. Performing cleanup and shutting down gracefully.
    

    A global search for that thread name shows that Spring Boot calls Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook")) to register a shutdown hook with the JVM. A shutdown hook lets you run cleanup or wrap-up tasks just before the JVM shuts down.

    class SpringApplicationShutdownHook implements Runnable {
    
        void addRuntimeShutdownHook() {
            try {
               Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
            }
            catch (AccessControlException ex) {
               // Not allowed in some environments
            }
        }
        // ...
    }
    

    In SpringApplication#run(), the shutdown hook is registered with the JVM before applicationContext.refresh() is called.


    The AtomicBoolean shutdownHookAdded ensures that, even when several threads run this code concurrently, only one of them succeeds in adding SpringApplicationShutdownHook.

    The ConfigurableApplicationContext is added to the contexts set, and its close() method is called later during shutdown. For a servlet-based Spring Boot web application, the context implementation is AnnotationConfigServletWebServerApplicationContext.
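    The registration logic described above (an AtomicBoolean so that the JVM hook is added only once, plus a set of contexts to close later) can be sketched with plain JDK classes. This is a simplified model, not Spring Boot's code: the class name MiniShutdownRegistry, the hook thread name MiniShutdownHook, and the use of AutoCloseable in place of ConfigurableApplicationContext are all assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of SpringApplicationShutdownHook's registration logic.
public class MiniShutdownRegistry implements Runnable {

    private final Set<AutoCloseable> resources = new LinkedHashSet<>(); // stands in for "contexts"
    private final AtomicBoolean hookAdded = new AtomicBoolean();
    private final AtomicInteger hookRegistrations = new AtomicInteger(); // for demonstration only

    public void register(AutoCloseable resource) {
        addRuntimeShutdownHookIfNecessary();
        synchronized (this) {
            resources.add(resource);
        }
    }

    private void addRuntimeShutdownHookIfNecessary() {
        // compareAndSet(false, true) succeeds for exactly one caller, even when threads race.
        if (hookAdded.compareAndSet(false, true)) {
            try {
                Runtime.getRuntime().addShutdownHook(new Thread(this, "MiniShutdownHook"));
                hookRegistrations.incrementAndGet();
            } catch (SecurityException ex) {
                // Not allowed in some sandboxed environments.
            }
        }
    }

    @Override
    public void run() {
        Set<AutoCloseable> snapshot;
        synchronized (this) {
            snapshot = new LinkedHashSet<>(resources); // copy under the lock, close outside it
        }
        for (AutoCloseable resource : snapshot) {
            try {
                resource.close();
            } catch (Exception ex) {
                System.err.println("Failed to close " + resource + ": " + ex);
            }
        }
    }

    int hookRegistrations() {
        return hookRegistrations.get();
    }

    synchronized int registeredCount() {
        return resources.size();
    }

    public static void main(String[] args) throws InterruptedException {
        MiniShutdownRegistry registry = new MiniShutdownRegistry();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            int id = i;
            threads[i] = new Thread(() -> registry.register(() -> System.out.println("closed resource " + id)));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // One JVM hook, eight resources; the hook closes them when the JVM exits.
        System.out.println("hooks: " + registry.hookRegistrations() + ", resources: " + registry.registeredCount());
    }
}
```

    Eight threads race to register, but only one JVM hook is ever added; the snapshot-then-close pattern in run() mirrors how the real hook copies its sets inside a synchronized block before closing anything.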

    class SpringApplicationShutdownHook implements Runnable {
    
        private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();
    
        private final AtomicBoolean shutdownHookAdded = new AtomicBoolean();
    
        SpringApplicationShutdownHandlers getHandlers() {
           return this.handlers;
        }
    
        void registerApplicationContext(ConfigurableApplicationContext context) {
           addRuntimeShutdownHookIfNecessary();
           synchronized (SpringApplicationShutdownHook.class) {
              assertNotInProgress();
              context.addApplicationListener(this.contextCloseListener);
              this.contexts.add(context);
           }
        }
    
        private void addRuntimeShutdownHookIfNecessary() {
           if (this.shutdownHookAdded.compareAndSet(false, true)) {
              addRuntimeShutdownHook();
           }
        }
    
        void addRuntimeShutdownHook() {
           try {
              Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
           }
           catch (AccessControlException ex) {
              // Not allowed in some environments
           }
        }
        // ...
    }
    

    SpringApplicationShutdownHook is described in its Javadoc as "A Runnable to be used as a shutdown hook to perform graceful shutdown of Spring Boot applications." Its run() method does two important things:

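  • contexts.forEach(this::closeAndWait): closes each ConfigurableApplicationContext and waits for it to become inactive, for at most TIMEOUT, which is 10 minutes (not 10 seconds). If context.close() itself contains a very slow synchronous operation, this timeout does not help: the wait loop only starts after close() returns, so the thread stays blocked inside context.close().
  • actions.forEach(Runnable::run): user-defined shutdown actions can be added to this.handlers, and SpringApplicationShutdownHook calls them back while shutting down. Logback's graceful shutdown relies on this mechanism, as described later.
    class SpringApplicationShutdownHook implements Runnable {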
    
        private static final int SLEEP = 50;
        
        private static final long TIMEOUT = TimeUnit.MINUTES.toMillis(10);
        
        @Override
        public void run() {
            Set<ConfigurableApplicationContext> contexts;
            Set<ConfigurableApplicationContext> closedContexts;
            Set<Runnable> actions;
            synchronized (SpringApplicationShutdownHook.class) {
               this.inProgress = true;
               contexts = new LinkedHashSet<>(this.contexts);
               closedContexts = new LinkedHashSet<>(this.closedContexts);
               actions = new LinkedHashSet<>(this.handlers.getActions());
            }
            contexts.forEach(this::closeAndWait);
            closedContexts.forEach(this::closeAndWait);
            actions.forEach(Runnable::run);
        }
        
        // Call ConfigurableApplicationContext.close() and wait until the context becomes inactive. 
        // We can't assume that just because the close method returns that the context is actually inactive. 
        // It could be that another thread is still in the process of disposing beans.
        // Close the ConfigurableApplicationContext and wait for it to become inactive, for at most TIMEOUT (10 minutes)
        private void closeAndWait(ConfigurableApplicationContext context) {
            if (!context.isActive()) {
               return;
            }
            context.close();
            try {
               int waited = 0;
               while (context.isActive()) {
                  if (waited > TIMEOUT) {
                     throw new TimeoutException();
                  }
                  Thread.sleep(SLEEP);
                  waited += SLEEP;
               }
            }
            catch (InterruptedException ex) {
               Thread.currentThread().interrupt();
               logger.warn("Interrupted waiting for application context " + context + " to become inactive");
            }
            catch (TimeoutException ex) {
               logger.warn("Timed out waiting for application context " + context + " to become inactive", ex);
            }
        }
        // ...
    }
    

    Notes on ConfigurableApplicationContext#close(), from its Javadoc:

    Close this application context, releasing all resources and locks that the implementation might hold. This includes destroying all cached singleton beans.

    Note: Does not invoke close on a parent context; parent contexts have their own, independent lifecycle.

    This method can be called multiple times without side effects: Subsequent close calls on an already closed context will be ignored.
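    To see why closeAndWait() polls isActive() after calling close(), and why a repeated close() is harmless, here is a small JDK-only model. FakeContext is a hypothetical stand-in for ConfigurableApplicationContext: its close() is made idempotent with an AtomicBoolean, and the "disposing beans" work finishes on another thread, so the context can still be active when close() returns. The sleep and timeout values are parameters here rather than Spring Boot's SLEEP and TIMEOUT constants.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseAndWaitDemo {

    // Hypothetical stand-in for ConfigurableApplicationContext.
    static class FakeContext {
        private final AtomicBoolean active = new AtomicBoolean(true);
        private final AtomicBoolean closeStarted = new AtomicBoolean(false);
        private final long disposeMillis;

        FakeContext(long disposeMillis) {
            this.disposeMillis = disposeMillis;
        }

        boolean isActive() {
            return active.get();
        }

        // Idempotent: only the first call starts disposal; later calls are ignored.
        void close() {
            if (!closeStarted.compareAndSet(false, true)) {
                return;
            }
            // Simulate another thread still disposing beans after close() has returned.
            Thread disposer = new Thread(() -> {
                try {
                    Thread.sleep(disposeMillis);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
                active.set(false);
            });
            disposer.setDaemon(true);
            disposer.start();
        }
    }

    // Same shape as closeAndWait(): close, then poll until inactive or timed out.
    static boolean closeAndWait(FakeContext context, long sleepMillis, long timeoutMillis) {
        if (!context.isActive()) {
            return true;
        }
        context.close();
        try {
            long waited = 0;
            while (context.isActive()) {
                if (waited > timeoutMillis) {
                    throw new TimeoutException();
                }
                Thread.sleep(sleepMillis);
                waited += sleepMillis;
            }
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } catch (TimeoutException ex) {
            System.err.println("Timed out waiting for context to become inactive");
            return false;
        }
    }

    public static void main(String[] args) {
        FakeContext ctx = new FakeContext(200);
        System.out.println(closeAndWait(ctx, 50, 10_000)); // true: waited roughly 200 ms
        System.out.println(closeAndWait(ctx, 50, 10_000)); // true: already inactive, returns at once
        FakeContext slow = new FakeContext(5_000);
        System.out.println(closeAndWait(slow, 50, 100));   // false: gave up after the timeout
    }
}
```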

    The ShutdownEndpoint approach

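    Add the following to application.yml to expose the Spring Boot Actuator shutdown endpoint, /actuator/shutdown (note that '*' exposes every web endpoint, not just shutdown):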

    management:
      endpoint:
        shutdown:
          enabled: true
      endpoints:
        web:
          exposure:
            include: '*'
    

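    How ShutdownEndpoint works: after receiving the request it returns a response and starts a new thread, which sleeps for 500 ms (presumably so the response can be written first) and then calls this.context.close().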

    @Endpoint(id = "shutdown", enableByDefault = false)
    public class ShutdownEndpoint implements ApplicationContextAware {
    
        private static final Map<String, String> NO_CONTEXT_MESSAGE = Collections
              .unmodifiableMap(Collections.singletonMap("message", "No context to shutdown."));
    
        private static final Map<String, String> SHUTDOWN_MESSAGE = Collections
              .unmodifiableMap(Collections.singletonMap("message", "Shutting down, bye..."));
    
        private ConfigurableApplicationContext context;
    
        @WriteOperation
        public Map<String, String> shutdown() {
           if (this.context == null) {
              return NO_CONTEXT_MESSAGE;
           }
           try {
              return SHUTDOWN_MESSAGE;
           }
           finally {
              Thread thread = new Thread(this::performShutdown);
              thread.setContextClassLoader(getClass().getClassLoader());
              thread.start();
           }
        }
    
        private void performShutdown() {
           try {
              Thread.sleep(500L);
           }
           catch (InterruptedException ex) {
              Thread.currentThread().interrupt();
           }
           this.context.close();
        }
        // ...
    }
    
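    The "reply first, close later" pattern in ShutdownEndpoint can be reproduced with the JDK alone. The sketch below is a simplification, not the Actuator code: the class name ReplyThenShutdown is hypothetical, closeAction stands in for this.context.close(), and the delay is a parameter (ShutdownEndpoint hard-codes 500 ms).

```java
import java.util.Collections;
import java.util.Map;

public class ReplyThenShutdown {

    private final Runnable closeAction; // hypothetical stand-in for context.close()
    private final long delayMillis;

    public ReplyThenShutdown(Runnable closeAction, long delayMillis) {
        this.closeAction = closeAction;
        this.delayMillis = delayMillis;
    }

    public Map<String, String> shutdown() {
        try {
            // The return value is computed before the finally block runs...
            return Collections.singletonMap("message", "Shutting down, bye...");
        } finally {
            // ...and the close is handed to a separate thread, so the caller gets its reply first.
            Thread thread = new Thread(this::performShutdown, "shutdown-demo");
            thread.setContextClassLoader(getClass().getClassLoader());
            thread.start();
        }
    }

    private void performShutdown() {
        try {
            Thread.sleep(delayMillis); // give the response a moment to be written
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        closeAction.run();
    }

    public static void main(String[] args) throws InterruptedException {
        ReplyThenShutdown endpoint = new ReplyThenShutdown(() -> System.out.println("context closed"), 500);
        System.out.println(endpoint.shutdown()); // printed before "context closed"
        Thread.sleep(1_000);
    }
}
```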

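    Note: calling this.context.close() also ends up running SpringApplicationShutdownHook#run() asynchronously. I did not trace exactly how it gets triggered. A likely explanation is that once the context, and with it the embedded web server's non-daemon threads, has shut down, no non-daemon threads remain, so the JVM exits normally; the JVM runs all registered shutdown hooks on a normal exit too, not only on SIGTERM.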

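    Compared with the SIGTERM path, here the context is already being closed when the hook runs. SpringApplicationShutdownHook's ApplicationContextClosedListener listens for ContextClosedEvent and moves such a context from contexts into closedContexts. closeAndWait() returns immediately for a context that is no longer active, and close() ignores repeated calls, so the context is not closed twice; the hook at most waits for an in-progress close to finish.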

    class SpringApplicationShutdownHook implements Runnable {
    
        private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();
    
        private final Set<ConfigurableApplicationContext> closedContexts = Collections.newSetFromMap(new WeakHashMap<>());
    
        private final ApplicationContextClosedListener contextCloseListener = new ApplicationContextClosedListener();
    
        // ApplicationListener to track closed contexts.
        private class ApplicationContextClosedListener implements ApplicationListener<ContextClosedEvent> {
    
            @Override
            public void onApplicationEvent(ContextClosedEvent event) {
               // The ContextClosedEvent is fired at the start of a call to {@code close()}
               // and if that happens in a different thread then the context may still be
               // active. Rather than just removing the context, we add it to a {@code
               // closedContexts} set. This is weak set so that the context can be GC'd once
               // the {@code close()} method returns.
               synchronized (SpringApplicationShutdownHook.class) {
                  ApplicationContext applicationContext = event.getApplicationContext();
                  SpringApplicationShutdownHook.this.contexts.remove(applicationContext);
                  SpringApplicationShutdownHook.this.closedContexts
                        .add((ConfigurableApplicationContext) applicationContext);
               }
            }
    
        }
        // ...
    }
    
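    The bookkeeping in ApplicationContextClosedListener, which moves a context out of a strongly referenced set into a weakly referenced one, can be tried out in isolation. This sketch is illustrative only: plain Objects stand in for application contexts, and ClosedContextTracker and onClosed are hypothetical names for the listener and its callback. Whether and when an entry disappears from a weak set depends on the garbage collector, so the example only relies on the deterministic part.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.WeakHashMap;

public class ClosedContextTracker {

    // Strong references: contexts the shutdown hook still has to close.
    private final Set<Object> contexts = new LinkedHashSet<>();
    // Weak references: contexts whose close() started elsewhere; they can be GC'd once nothing else holds them.
    private final Set<Object> closedContexts = Collections.newSetFromMap(new WeakHashMap<>());

    public synchronized void register(Object context) {
        contexts.add(context);
    }

    // Hypothetical equivalent of onApplicationEvent(ContextClosedEvent).
    public synchronized void onClosed(Object context) {
        contexts.remove(context);
        closedContexts.add(context);
    }

    public synchronized boolean isPending(Object context) {
        return contexts.contains(context);
    }

    public synchronized boolean isTrackedAsClosed(Object context) {
        return closedContexts.contains(context);
    }

    public static void main(String[] args) {
        ClosedContextTracker tracker = new ClosedContextTracker();
        Object ctx = new Object(); // kept strongly reachable here, so the weak entry stays
        tracker.register(ctx);
        tracker.onClosed(ctx);
        System.out.println(tracker.isPending(ctx) + " " + tracker.isTrackedAsClosed(ctx)); // false true
    }
}
```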


    Graceful shutdown of Spring Boot's embedded Tomcat

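    By default, when Spring Boot receives a shutdown signal it stops Tomcat immediately, without waiting for in-flight requests to finish. Adding server.shutdown=GRACEFUL to the configuration makes Tomcat wait for the current requests to complete, which gives a graceful shutdown. (This was the default at the time of writing; starting with Spring Boot 3.4, graceful shutdown is the default.)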

    server:
      shutdown: GRACEFUL
    

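    With server.shutdown=IMMEDIATE: send an HTTP request, then stop the Spring Boot application while the request is still being processed, and the console fills up with errors: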

    [2023-09-22 09:11:08.533] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
    [2023-09-22 09:11:10.842] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
    [2023-09-22 09:11:10.863] ERROR [http-nio-8080-exec-3] [bcb01a34-9721-461f-ad3d-2f71c386ff10] [TID: N/A] - controller system exception, java.nio.channels.ClosedChannelException
    org.apache.catalina.connector.ClientAbortException: java.nio.channels.ClosedChannelException
    	at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
    	at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:784)
    	at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:299)
    [2023-09-22 09:11:10.915] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]
    

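    With server.shutdown=GRACEFUL: send an HTTP request, then stop the application while it is in flight. The console prints "Commencing graceful shutdown. Waiting for active requests to complete", and the Spring Boot process waits for the active requests to finish before exiting.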

    [2023-09-22 09:18:12.507] INFO  [Thread-5] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 53 [] [TID: N/A] - Commencing graceful shutdown. Waiting for active requests to complete
    [2023-09-22 09:18:12.507] INFO  [tomcat-shutdown] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
    [2023-09-22 09:18:17.633] INFO  [tomcat-shutdown] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 78 [] [TID: N/A] - Graceful shutdown complete
    [2023-09-22 09:18:17.637] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
    [2023-09-22 09:18:17.645] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
    [2023-09-22 09:18:17.657] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]
    

    Following the trail, the graceful-shutdown code lives in TomcatWebServer. When shutdown == Shutdown.GRACEFUL, it creates a GracefulShutdown instance that handles graceful shutdown through this.gracefulShutdown.shutDownGracefully(callback). Otherwise it stops immediately on receiving the stop signal, reporting callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE).

    TomcatWebServer#shutDownGracefully() is called from the WebServerGracefulShutdownLifecycle#stop() lifecycle method.

    public class TomcatWebServer implements WebServer {
    
        private final Tomcat tomcat;
    
        private final boolean autoStart;
    
        private final GracefulShutdown gracefulShutdown;
            
        public TomcatWebServer(Tomcat tomcat, boolean autoStart, Shutdown shutdown) {
            Assert.notNull(tomcat, "Tomcat Server must not be null");
            this.tomcat = tomcat;
            this.autoStart = autoStart;
            this.gracefulShutdown = (shutdown == Shutdown.GRACEFUL) ? new GracefulShutdown(tomcat) : null;
            initialize();
        }
        
        @Override
        public void shutDownGracefully(GracefulShutdownCallback callback) {
            if (this.gracefulShutdown == null) {
               callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE);
               return;
            }
            this.gracefulShutdown.shutDownGracefully(callback);
        }
        // ...
    }
        
    

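    GracefulShutdown#shutDownGracefully() starts a new thread, tomcat-shutdown, that runs doShutdown() asynchronously. doShutdown() gets all Connectors and, for each one, pauses it and calls connector.getProtocolHandler().closeServerSocketGraceful(), which closes the server socket so that no new connections are accepted. It then loops, sleeping 50 ms at a time, until no TomcatEmbeddedContext has active requests any more, and finally invokes the callback with GracefulShutdownResult.IDLE.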

    If graceful shutdown does not finish within the allotted time, abort() sets this.aborted to true; the loop then stops waiting and invokes the callback with GracefulShutdownResult.REQUESTS_ACTIVE.

    // Handles Tomcat graceful shutdown.
    final class GracefulShutdown {
    
        private final Tomcat tomcat;
    
        private volatile boolean aborted = false;
    
        GracefulShutdown(Tomcat tomcat) {
           this.tomcat = tomcat;
        }
    
        void shutDownGracefully(GracefulShutdownCallback callback) {
           logger.info("Commencing graceful shutdown. Waiting for active requests to complete");
           new Thread(() -> doShutdown(callback), "tomcat-shutdown").start();
        }
    
        private void doShutdown(GracefulShutdownCallback callback) {
           List<Connector> connectors = getConnectors();
           connectors.forEach(this::close);
           try {
              for (Container host : this.tomcat.getEngine().findChildren()) {
                 for (Container context : host.findChildren()) {
                    while (isActive(context)) {
                       if (this.aborted) {
                          logger.info("Graceful shutdown aborted with one or more requests still active");
                          callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
                          return;
                       }
                       Thread.sleep(50);
                    }
                 }
              }
    
           }
           catch (InterruptedException ex) {
              Thread.currentThread().interrupt();
           }
           logger.info("Graceful shutdown complete");
           callback.shutdownComplete(GracefulShutdownResult.IDLE);
        }
        
        private void close(Connector connector) {
            connector.pause();
            connector.getProtocolHandler().closeServerSocketGraceful();
        }
        // ... isActive(), getConnectors() and abort() omitted
    }
    
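    The shape of GracefulShutdown (stop taking new work, then poll on a background thread until no requests are active, unless abort() is called) can be modelled without Tomcat. In this hypothetical sketch an AtomicInteger counts in-flight requests instead of inspecting Tomcat containers, the local Result enum only mirrors the names of GracefulShutdownResult, and a gracefulEnabled flag models the IMMEDIATE branch of TomcatWebServer#shutDownGracefully().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical, simplified model of Tomcat GracefulShutdown; not Spring Boot's code.
public class MiniGracefulShutdown {

    // Names mirror GracefulShutdownResult; this is a local enum, not Spring's.
    enum Result { IDLE, REQUESTS_ACTIVE, IMMEDIATE }

    private final AtomicInteger activeRequests = new AtomicInteger();
    private final boolean gracefulEnabled; // models server.shutdown=graceful vs immediate
    private boolean accepting = true;      // guarded by "this"
    private volatile boolean aborted = false;

    MiniGracefulShutdown(boolean gracefulEnabled) {
        this.gracefulEnabled = gracefulEnabled;
    }

    // Returns false once the server has stopped accepting new requests.
    synchronized boolean beginRequest() {
        if (!accepting) {
            return false;
        }
        activeRequests.incrementAndGet();
        return true;
    }

    void endRequest() {
        activeRequests.decrementAndGet();
    }

    void shutDownGracefully(Consumer<Result> callback) {
        if (!gracefulEnabled) {
            callback.accept(Result.IMMEDIATE); // like TomcatWebServer when gracefulShutdown == null
            return;
        }
        // Stop taking new work (Tomcat pauses connectors and closes the server socket).
        // Done synchronously here to keep the demo deterministic; Tomcat does it on its shutdown thread.
        synchronized (this) {
            accepting = false;
        }
        new Thread(() -> doShutdown(callback), "mini-shutdown").start();
    }

    private void doShutdown(Consumer<Result> callback) {
        try {
            while (activeRequests.get() > 0) {
                if (aborted) {
                    callback.accept(Result.REQUESTS_ACTIVE);
                    return;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        callback.accept(Result.IDLE);
    }

    void abort() {
        aborted = true;
    }

    public static void main(String[] args) throws Exception {
        MiniGracefulShutdown server = new MiniGracefulShutdown(true);
        server.beginRequest(); // one request in flight
        CompletableFuture<Result> result = new CompletableFuture<>();
        server.shutDownGracefully(result::complete);
        System.out.println(server.beginRequest()); // false: new requests are rejected
        Thread.sleep(200);
        server.endRequest(); // the in-flight request finishes
        System.out.println(result.get(5, TimeUnit.SECONDS)); // IDLE
    }
}
```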


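    The key to graceful shutdown is GracefulShutdown#close(Connector), which pauses the connector and calls closeServerSocketGraceful(). The code below that is quite low-level, and I did not dig any deeper.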

    public abstract class AbstractEndpoint<S, U> {
    
        // Close the server socket (to prevent further connections) if the server socket was originally bound on start() (rather than on init()).
        public final void closeServerSocketGraceful() {
            if (bindState == BindState.BOUND_ON_START) {
                // Stop accepting new connections
                acceptor.stop(-1);
                // Release locks that may be preventing the acceptor from stopping
                releaseConnectionLatch();
                unlockAccept();
                // Signal to any multiplexed protocols (HTTP/2) that they may wish
                // to stop accepting new streams
                getHandler().pause();
                // Update the bindState. This has the side-effect of disabling
                // keep-alive for any in-progress connections
                bindState = BindState.SOCKET_CLOSED_ON_STOP;
                try {
                    doCloseServerSocket();
                } catch (IOException ioe) {
                    getLog().warn(sm.getString("endpoint.serverSocket.closeFailed", getName()), ioe);
                }
            }
        }
        // ...
    }
    
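    closeServerSocketGraceful() relies on a basic socket property: closing the listening socket stops new connections without affecting connections that were already accepted. The JDK-only sketch below demonstrates that property on the loopback interface; it is not Tomcat code, CloseListenerDemo is a hypothetical name, and the port is chosen by the OS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseListenerDemo {

    // Returns {existingConnectionStillWorks, newConnectionRefused}.
    static boolean[] run() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback); // port 0: the OS picks a free port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {

            int port = server.getLocalPort();
            // Stop accepting new connections, which is what closeServerSocketGraceful() is for.
            server.close();

            // The already-accepted connection is unaffected.
            OutputStream out = accepted.getOutputStream();
            out.write(42);
            out.flush();
            InputStream in = client.getInputStream();
            boolean existingWorks = in.read() == 42;

            // A new connection attempt now fails (typically with ConnectException).
            boolean newRefused;
            try (Socket late = new Socket(loopback, port)) {
                newRefused = false;
            } catch (IOException expected) {
                newRefused = true;
            }
            return new boolean[] {existingWorks, newRefused};
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = run();
        System.out.println("existing connection works: " + result[0] + ", new connection refused: " + result[1]);
    }
}
```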

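    Next, TomcatWebServer#stop() is executed. If GracefulShutdown still has not finished within the allotted time (spring.lifecycle.timeout-per-shutdown-phase, 30 seconds by default), stop() aborts it and forcibly stops Tomcat.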

    TomcatWebServer#stop() is called from the WebServerStartStopLifecycle#stop() lifecycle method.

    public class TomcatWebServer implements WebServer {
    
        @Override
        public void stop() throws WebServerException {
            synchronized (this.monitor) {
               boolean wasStarted = this.started;
               try {
                  this.started = false;
                  try {
                     if (this.gracefulShutdown != null) {
                        this.gracefulShutdown.abort();
                     }
                     stopTomcat();
                     this.tomcat.destroy();
                  }
                  catch (LifecycleException ex) {
                     // swallow and continue
                  }
               }
               catch (Exception ex) {
                  throw new WebServerException("Unable to stop embedded Tomcat", ex);
               }
               finally {
                  if (wasStarted) {
                     containerCounter.decrementAndGet();
                  }
               }
            }
        }
        // ...
    }
    
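    The interplay between the graceful phase and stop() can be sketched as "wait for the graceful callback, but only up to a timeout, then abort and force-stop". This is a simplified assumption about how the two lifecycle phases are sequenced (Spring's lifecycle processor waits at most spring.lifecycle.timeout-per-shutdown-phase per phase); PhaseTimeoutDemo, GracefulServer and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PhaseTimeoutDemo {

    // Hypothetical server abstraction; not a Spring or Tomcat interface.
    interface GracefulServer {
        void shutDownGracefully(Runnable onComplete);
        void abortAndStop(); // analogous to gracefulShutdown.abort() followed by stopping Tomcat
    }

    // Phase 1 waits at most timeoutMillis for the graceful callback; phase 2 (stop) always runs afterwards.
    static boolean stopWithTimeout(GracefulServer server, long timeoutMillis) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        server.shutDownGracefully(latch::countDown);
        boolean finishedInTime = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        server.abortAndStop();
        return finishedInTime;
    }

    // Demo server whose "drain" takes drainMillis; forced[0] records whether stop had to cut it short.
    static GracefulServer serverThatDrainsIn(long drainMillis, boolean[] forced) {
        return new GracefulServer() {
            private volatile boolean drained;

            @Override
            public void shutDownGracefully(Runnable onComplete) {
                Thread drainer = new Thread(() -> {
                    try {
                        Thread.sleep(drainMillis);
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    drained = true;
                    onComplete.run();
                });
                drainer.setDaemon(true); // don't keep the JVM alive after a forced stop
                drainer.start();
            }

            @Override
            public void abortAndStop() {
                forced[0] = !drained;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] forced = new boolean[1];
        boolean inTime = stopWithTimeout(serverThatDrainsIn(100, forced), 1_000);
        System.out.println("in time: " + inTime + ", forced: " + forced[0]);
        inTime = stopWithTimeout(serverThatDrainsIn(5_000, forced), 200);
        System.out.println("in time: " + inTime + ", forced: " + forced[0]);
    }
}
```

    The stop phase runs whether or not draining finished; when it did finish, the abort is simply a no-op, which matches how TomcatWebServer#stop() calls gracefulShutdown.abort() unconditionally.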

    Graceful Logback shutdown: making sure no logs are lost

    There are two common ways to improve logging performance:

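  • Set OutputStreamAppender#immediateFlush = false. immediateFlush defaults to true, meaning the output stream is flushed after every log event. Setting it to false avoids a flush per event and reduces the number of I/O calls, but when the Pod restarts or stops, log content still sitting in the in-memory buffer, not yet flushed, may be lost. This is why Logback needs a shutdown hook to shut down gracefully.
  • Use an AsyncAppender. By default Logback logs synchronously: within a process, each thread must acquire a lock before writing to the outputStream, so when many threads log at the same time, lock contention can hurt performance. Wrapping the real Appender in an AsyncAppender makes appending asynchronous, and a single worker thread writes to the outputStream. The problem is the same as above: when the Pod restarts or stops, log events still in the BlockingQueue may be lost, so again a shutdown hook is needed for a graceful Logback shutdown.

    Good news: Spring Boot has already built this for us, and it is enabled by default (logging.register-shutdown-hook defaults to true). In other words, we don't need to write any code at all; we only need to make sure Spring Boot actually receives the SIGTERM signal. Honestly, that is almost too considerate.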
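    Why can unflushed logs be lost? With immediateFlush=false, bytes can sit in an in-memory buffer until the buffer fills up or the stream is flushed or closed. The JDK-only sketch below uses a BufferedOutputStream over a ByteArrayOutputStream (standing in for the log file) to show this; it illustrates the buffering principle and is not Logback's own code, and UnflushedLogDemo is a hypothetical name.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class UnflushedLogDemo {

    // Returns {bytesInFileBeforeClose, bytesInFileAfterClose}.
    static int[] writeThenClose(String line) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for the log file
        BufferedOutputStream out = new BufferedOutputStream(file, 8192);

        out.write(line.getBytes(StandardCharsets.UTF_8)); // no flush, as with immediateFlush=false
        int before = file.size(); // 0 for short lines: the bytes sit in the 8 KB in-memory buffer

        // If the process were killed now (e.g. SIGKILL), those buffered bytes would be lost.
        // A graceful stop closes the stream, and close() flushes the buffer first.
        out.close();
        int after = file.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = writeThenClose("INFO  request handled\n");
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```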

    Registering Logback's shutdown callback: when LoggingApplicationListener#onApplicationEvent() receives an ApplicationEnvironmentPreparedEvent, it calls SpringApplication.getShutdownHandlers().add(shutdownHandler) to register Logback's shutdownHandler. That handler is later called back by SpringApplicationShutdownHook#run().

    public class LoggingApplicationListener implements GenericApplicationListener {
    
        @Override
        public void onApplicationEvent(ApplicationEvent event) {
            // ...
            else if (event instanceof ApplicationEnvironmentPreparedEvent) {
               onApplicationEnvironmentPreparedEvent((ApplicationEnvironmentPreparedEvent) event);
            }
            // ...
        }
    
        private void registerShutdownHookIfNecessary(Environment environment, LoggingSystem loggingSystem) {
            if (environment.getProperty(REGISTER_SHUTDOWN_HOOK_PROPERTY, Boolean.class, true)) {
               Runnable shutdownHandler = loggingSystem.getShutdownHandler();
               if (shutdownHandler != null && shutdownHookRegistered.compareAndSet(false, true)) {
                  registerShutdownHook(shutdownHandler);
               }
            }
        }
        void registerShutdownHook(Runnable shutdownHandler) {
            SpringApplication.getShutdownHandlers().add(shutdownHandler);
        }
        // ...
    }
    
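    One detail worth noting: in SpringApplicationShutdownHook#run(), contexts are closed before the registered actions run, so the Logback handler stops logging only after the context has finished closing (and had the chance to log about it). The ordering can be modelled like this; ShutdownSequence and its method names are hypothetical, and the two sets stand in for Spring's contexts and shutdown handlers.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ShutdownSequence implements Runnable {

    private final Set<Runnable> contextClosers = new LinkedHashSet<>(); // stands in for "contexts"
    private final Set<Runnable> actions = new LinkedHashSet<>();        // stands in for the shutdown handlers

    public synchronized void addContextCloser(Runnable closer) {
        contextClosers.add(closer);
    }

    public synchronized void addAction(Runnable action) {
        actions.add(action);
    }

    @Override
    public void run() {
        List<Runnable> closers;
        List<Runnable> handlers;
        synchronized (this) {
            closers = new ArrayList<>(contextClosers);
            handlers = new ArrayList<>(actions);
        }
        closers.forEach(Runnable::run);  // 1. close application contexts (beans may still log here)
        handlers.forEach(Runnable::run); // 2. then run actions such as stopping the logging system
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ShutdownSequence seq = new ShutdownSequence();
        seq.addAction(() -> events.add("logging stopped"));       // registered early, like the logging handler
        seq.addContextCloser(() -> events.add("context closed"));
        seq.run();
        System.out.println(events); // [context closed, logging stopped]
    }
}
```

    Registration order does not matter here: even though the logging action is added first, it still runs after every context has been closed.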

    The Logback shutdownHandler added above is defined in LogbackLoggingSystem:

    public class LogbackLoggingSystem extends Slf4JLoggingSystem {
    
        public Runnable getShutdownHandler() {
            return () -> getLoggerContext().stop();
        }
        // ...
    }
    

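    The LifeCycle interface defines the lifecycle contract of Logback components, and stop() is the method that tears a component down. The Appender interface extends LifeCycle, so calling Appender#stop() shuts down an Appender instance gracefully.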

    public interface Appender<E> extends LifeCycle, ContextAware, FilterAttachable<E> {
        // ...
    }
    
    

    getLoggerContext().stop() calls reset(), which in turn calls root.recursiveReset(). Here root is a ch.qos.logback.classic.Logger object that corresponds to the <root> element in the Logback configuration.

    The root logger holds two appenders, the ConsoleAppender and the RollingFileAppender defined in the configuration. AppenderAttachableImpl#detachAndStopAllAppenders() iterates over the appenders, calls stop() on each one to shut it down, and then clears the list.

    public final class Logger implements org.slf4j.Logger, LocationAwareLogger, AppenderAttachable<ILoggingEvent>, Serializable {
    
        transient private AppenderAttachableImpl<ILoggingEvent> aai;
    
        public void detachAndStopAllAppenders() {
            if (aai != null) {
                aai.detachAndStopAllAppenders();
            }
        }
        // ...
    }

    public class AppenderAttachableImpl<E> implements AppenderAttachable<E> {
    
        final private COWArrayList<Appender<E>> appenderList = new COWArrayList<Appender<E>>(new Appender[0]);
    
        public void detachAndStopAllAppenders() {
            for (Appender<E> a : appenderList) {
                a.stop();
            }
            appenderList.clear();
        }
        // ...
    }
    
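    The detach-and-stop loop can be reproduced with a tiny LifeCycle-style interface. MiniLifeCycle, MiniAppender and MiniAppenderList are hypothetical names that only mirror Logback's shape; CopyOnWriteArrayList plays the role of Logback's COWArrayList, so iterating is safe even if other threads attach appenders concurrently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MiniAppenderList {

    interface MiniLifeCycle {
        void start();
        void stop();
        boolean isStarted();
    }

    static class MiniAppender implements MiniLifeCycle {
        final String name;
        private volatile boolean started;

        MiniAppender(String name) {
            this.name = name;
        }

        public void start() {
            started = true;
        }

        public void stop() {
            started = false; // a real appender would also flush and close its output here
        }

        public boolean isStarted() {
            return started;
        }
    }

    private final List<MiniAppender> appenders = new CopyOnWriteArrayList<>();

    void add(MiniAppender appender) {
        appenders.add(appender);
    }

    int size() {
        return appenders.size();
    }

    // Same shape as AppenderAttachableImpl#detachAndStopAllAppenders(): stop each one, then clear.
    void detachAndStopAllAppenders() {
        for (MiniAppender a : appenders) {
            a.stop();
        }
        appenders.clear();
    }

    public static void main(String[] args) {
        MiniAppenderList root = new MiniAppenderList();
        MiniAppender console = new MiniAppender("CONSOLE");
        MiniAppender file = new MiniAppender("FILE");
        console.start();
        file.start();
        root.add(console);
        root.add(file);
        root.detachAndStopAllAppenders();
        System.out.println(console.isStarted() + " " + file.isStarted() + " " + root.size()); // false false 0
    }
}
```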

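    When OutputStreamAppender stops, it closes its output stream: it first writes out the encoder's footer, then closing the stream flushes any buffered log content before the stream is released.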

    public class OutputStreamAppender<E> extends UnsynchronizedAppenderBase<E> {
    
        public void stop() {
            lock.lock();
            try {
                closeOutputStream();
                super.stop();
            } finally {
                lock.unlock();
            }
        }
        
        protected void closeOutputStream() {
            if (this.outputStream != null) {
                try {
                    // before closing we have to output out layout's footer
                    encoderClose();
                    this.outputStream.close();
                    this.outputStream = null;
                } catch (IOException e) {
                    addStatus(new ErrorStatus("Could not close output stream for OutputStreamAppender.", this, e));
                }
            }
        }
        // ...
    }
    

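    When AsyncAppenderBase stops, it waits for the worker thread with worker.join(maxFlushTime), which defaults to 1 second (1000 ms). Events still queued after that time are possibly discarded.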

    public class AsyncAppenderBase<E> extends UnsynchronizedAppenderBase<E> implements AppenderAttachable<E> {
    
        // The default maximum queue flush time allowed during appender stop. 
        // If the worker takes longer than this time it will exit, discarding any remaining items in the queue
        public static final int DEFAULT_MAX_FLUSH_TIME = 1000;
        int maxFlushTime = DEFAULT_MAX_FLUSH_TIME;
    
        @Override
        public void stop() {
            if (!isStarted())
                return;
    
            // mark this appender as stopped so that Worker can also processPriorToRemoval if it is invoking
            // aii.appendLoopOnAppenders
            // and sub-appenders consume the interruption
            super.stop();
    
            // interrupt the worker thread so that it can terminate. Note that the interruption can be consumed
            // by sub-appenders
            worker.interrupt();
    
            InterruptUtil interruptUtil = new InterruptUtil(context);
    
            try {
                interruptUtil.maskInterruptFlag();
    
                worker.join(maxFlushTime);
    
                // check to see if the thread ended and if not add a warning message
                if (worker.isAlive()) {
                    addWarn("Max queue flush timeout (" + maxFlushTime + " ms) exceeded. Approximately " + blockingQueue.size()
                                    + " queued events were possibly discarded.");
                } else {
                    addInfo("Queue flush finished successfully within timeout.");
                }
    
            } catch (InterruptedException e) {
                int remaining = blockingQueue.size();
                addError("Failed to join worker thread. " + remaining + " queued events may be discarded.", e);
            } finally {
                interruptUtil.unmaskInterruptFlag();
            }
        }
        // ...
    }
    
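    The AsyncAppenderBase#stop() logic (mark the appender stopped, interrupt the worker, let it drain the queue, and wait at most maxFlushTime) can be modelled with a BlockingQueue and one worker thread. The sketch below is a simplified, hypothetical MiniAsyncAppender: after being interrupted it drains whatever is left in the queue, similar in spirit to Logback's worker, but it is not Logback's code, and sink stands in for the wrapped appender.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class MiniAsyncAppender {

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(256);
    private final Consumer<String> sink;   // stands in for the wrapped (synchronous) appender
    private final long maxFlushTimeMillis; // Logback's default maxFlushTime is 1000 ms
    private volatile boolean started = true;
    private final Thread worker;

    MiniAsyncAppender(Consumer<String> sink, long maxFlushTimeMillis) {
        this.sink = sink;
        this.maxFlushTimeMillis = maxFlushTimeMillis;
        this.worker = new Thread(this::work, "mini-async-worker");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    void append(String event) throws InterruptedException {
        queue.put(event); // callers only enqueue; the worker does the actual writing
    }

    private void work() {
        try {
            while (started) {
                sink.accept(queue.take());
            }
        } catch (InterruptedException ex) {
            // Interrupted by stop(): fall through and drain what is left.
        }
        String event;
        while ((event = queue.poll()) != null) {
            sink.accept(event);
        }
    }

    // Returns how many events were still queued when the wait ran out (0 means fully flushed).
    int stop() throws InterruptedException {
        started = false;
        worker.interrupt();
        worker.join(maxFlushTimeMillis);
        return worker.isAlive() ? queue.size() : 0;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        MiniAsyncAppender appender = new MiniAsyncAppender(written::add, 1_000);
        for (int i = 0; i < 100; i++) {
            appender.append("event-" + i);
        }
        int possiblyDiscarded = appender.stop();
        System.out.println("written=" + written.size() + ", possibly discarded=" + possiblyDiscarded);
    }
}
```

    With a fast sink the queue drains well within maxFlushTime; with a slow sink (for example a remote appender), stop() returns after the timeout and reports how many events may be lost, which is the trade-off Logback's warning message describes.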
