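When a project is upgraded, the old instance has to be killed and a new one started, and the service is often unavailable in between. This article walks through graceful startup and shutdown of Java projects under several architectures.
1. Background
Upgrading a project means stopping the old instance and starting a new one, which tends to leave a window in which the service is unavailable. How do we make the release as graceful as possible, so that the upgrade affects running production traffic as little as possible? The answer is to bring services online and offline gracefully.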
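2. Terminology
Graceful startup and shutdown means keeping the service stable and available throughout a release, so that traffic is never cut off and the business never becomes unavailable.
Graceful startup means exposing the service only after it has fully started and is ready. It is also called lossless release, delayed exposure, or service warm-up.
Graceful shutdown means that once the service receives a stop signal (kill -15 pid, kill -2 pid, or kill -1 pid), it first deregisters from the registry, rejects new requests, and then finishes the requests that are already in progress.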
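3. Implementation
3.1 Monolithic projects
For a monolith, graceful startup and shutdown is relatively easy because there is no tangle of service-to-service calls. We only need to ensure that the service is ready before ingress traffic is switched to it, and that it can stop gracefully without cutting off requests in progress.
The following are several common ways to shut down gracefully in a monolith.
3.1.1 JVM level
The JVM supports graceful shutdown through Runtime.getRuntime().addShutdownHook(shutdownTask), which registers a task that runs before the process exits.
What can the shutdown task do to make the exit graceful?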
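- Delay the shutdown and wait for other tasks to finish
- Release connection resources
- Clean up temporary files
- Shut down thread pools
executorService.shutdown(); // no new tasks are accepted after this call
executorService.awaitTermination(1500, TimeUnit.SECONDS); // bound the wait so the process cannot hang forever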
......
Thread shutdownHook = new Thread(()->{
System.out.println("Graceful shutdown running");
});
Runtime.getRuntime().addShutdownHook(shutdownHook);
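The cleanup items listed above can be grouped behind a single hook. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name ShutdownTasks and the step names are hypothetical, and the steps only print messages where real code would release resources.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects cleanup steps and runs them in order from a single JVM shutdown hook. */
class ShutdownTasks {
    private final List<String> names = new ArrayList<>();
    private final List<Runnable> steps = new ArrayList<>();

    /** Adds a named cleanup step; steps run in the order they were added. */
    synchronized ShutdownTasks add(String name, Runnable step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs every step; a failing step is reported and the remaining steps still run. */
    synchronized void runAll() {
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (RuntimeException ex) {
                System.err.println("shutdown step failed: " + names.get(i) + ": " + ex);
            }
        }
    }

    /** Registers one hook, so the order of the steps is under our control. */
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::runAll, "graceful-shutdown"));
    }

    public static void main(String[] args) {
        new ShutdownTasks()
                .add("release-connections", () -> System.out.println("connections released"))
                .add("clean-temp-files", () -> System.out.println("temp files removed"))
                .install();
        System.out.println("working; the hook runs when the JVM exits");
    }
}
```

Keeping all steps in one hook is a design choice: the JVM gives no ordering guarantee between separate hooks, so anything that must happen in sequence belongs in the same one.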
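3.1.2 Spring level
Spring also builds on the JVM: it learns that the Java process is closing through a JVM shutdown hook and then calls the doClose method.
Let's look at what doClose does.
Based on the Spring source, custom graceful-shutdown logic can hook in at three points: a JVM ShutdownHook, a listener for ContextClosedEvent (the context-closed event), or the bean destruction phases.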
// org.springframework.context.support.AbstractApplicationContext
@Deprecated // deprecated as of Spring 5
public void destroy() {
close();
}
@Override
public void close() {
synchronized (this.startupShutdownMonitor) {
doClose();
// If we registered a JVM shutdown hook, we don't need it anymore now:
// We've already explicitly closed the context.
if (this.shutdownHook != null) {
try {
Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
}
catch (IllegalStateException ex) {
// ignore - VM is already shutting down
}
}
}
}
protected void doClose() {
    LiveBeansView.unregisterApplicationContext(this);
    try {
        // (1) Publish shutdown event.
        publishEvent(new ContextClosedEvent(this));
    }
    catch (Throwable ex) {
        logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
    }
    // (2) Stop all Lifecycle beans, to avoid delays during individual destruction.
    if (this.lifecycleProcessor != null) {
        try {
            this.lifecycleProcessor.onClose();
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
        }
    }
    // (3) Destroy all cached singletons in the context's BeanFactory.
    destroyBeans();
    // (4) Close the state of this context itself.
    closeBeanFactory();
    // (5) Let subclasses do some final clean-up if they wish...
    onClose();
    // Reset local application listeners to pre-refresh state.
    if (this.earlyApplicationListeners != null) {
        this.applicationListeners.clear();
        this.applicationListeners.addAll(this.earlyApplicationListeners);
    }
    // Switch to inactive.
    this.active.set(false);
}
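3.1.3 Spring Boot (Tomcat web container) level
3.1.3.1 Option 1
Shut the service down through the actuator endpoint mechanism.
First add the spring-boot-starter-actuator dependency, which also provides health checks: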
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Then add the configuration below; the shutdown endpoint is disabled by default:
management.endpoint.shutdown.enabled=true
management.endpoints.web.exposure.include=shutdown
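Once the service is up, calling POST http://127.0.0.1/actuator/shutdown stops it. This approach is risky because the endpoint can be called maliciously, so it should be combined with some form of access control.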
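For illustration, the call can be made from Java with the JDK's built-in HTTP client. This is a sketch under assumptions: the class name ShutdownCaller is hypothetical, and HTTP Basic authentication is only one possible access-control mechanism, assumed here because the endpoint should not be left open.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

/** Builds and sends the POST that triggers the actuator shutdown endpoint. */
class ShutdownCaller {

    /** Builds the request; user and password are assumed Basic-auth credentials. */
    static HttpRequest buildRequest(String baseUrl, String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/shutdown"))
                .timeout(Duration.ofSeconds(5))
                .header("Authorization", "Basic " + token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends the request and returns the HTTP status code. */
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://127.0.0.1", "ops", "secret");
        System.out.println("status: " + send(request));
    }
}
```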
How it works:
@Endpoint(id = "shutdown", enableByDefault = false)
public class ShutdownEndpoint implements ApplicationContextAware {
@WriteOperation
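public Map<String, String> shutdown() {
    if (this.context == null) {
        return NO_CONTEXT_MESSAGE; // constant defined in the full class, elided in this excerpt
    }
    try {
        return SHUTDOWN_MESSAGE; // constant defined in the full class, elided in this excerpt
    }
    finally {
        // Close the context on a separate thread so that the HTTP response can be sent first.
        Thread thread = new Thread(this::performShutdown);
        thread.setContextClassLoader(getClass().getClassLoader());
        thread.start();
    }
}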
private void performShutdown() {
try {
Thread.sleep(500L);
}
catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
// Calls AbstractApplicationContext.close(), the same path described above
this.context.close();
}
}
// Auto-configuration that registers the endpoint bean
@Configuration(
proxyBeanMethods = false
)
@ConditionalOnAvailableEndpoint(
endpoint = ShutdownEndpoint.class
)
public class ShutdownEndpointAutoConfiguration {
public ShutdownEndpointAutoConfiguration() {
}
@Bean(
destroyMethod = ""
)
@ConditionalOnMissingBean
public ShutdownEndpoint shutdownEndpoint() {
return new ShutdownEndpoint();
}
}
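3.1.3.2 Option 2
Spring Boot 2.3 and later have graceful shutdown built in. It is enabled with the following settings (comments sit on their own lines, because a trailing # in a .properties file becomes part of the value):
# Enable graceful shutdown (the default is immediate)
server.shutdown=graceful
# Maximum time to wait for in-flight requests (the default is 30s)
spring.lifecycle.timeout-per-shutdown-phase=60s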
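How each web container rejects requests during graceful shutdown
| Web container | How new requests are rejected |
|---|---|
| Tomcat 9.0.33+ | Stops accepting requests at the network layer; new client requests wait until they time out. |
| Reactor Netty | Stops accepting requests at the network layer; new client requests wait until they time out. |
| Undertow | Keeps accepting requests; new client requests immediately receive a 503. |
| Jetty | Stops accepting requests at the network layer; new client requests wait until they time out. |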
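The common idea behind the table, rejecting new work while letting in-flight requests finish within a time limit, can be pictured with a small model. This is a simplified sketch, not how any of these containers is implemented; RequestGate and its methods are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of graceful shutdown: refuse new requests, drain the ones in flight. */
class RequestGate {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Returns false (think "503") once shutdown has begun; otherwise counts the request. */
    boolean tryEnter() {
        // Count first, then check the flag, so shutdown() can never miss an admitted request.
        inFlight.incrementAndGet();
        if (shuttingDown.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Marks one admitted request as finished. */
    void exit() {
        inFlight.decrementAndGet();
    }

    /** Stops admitting requests, then polls until in-flight ones finish or the timeout elapses. */
    boolean shutdown(long timeout, TimeUnit unit) {
        shuttingDown.set(true);
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out with requests still running
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The timeout plays the role of spring.lifecycle.timeout-per-shutdown-phase: when it expires, shutdown proceeds even if some requests are still running.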
3.1.3.3 Option 3
For Spring Boot versions before 2.3.0, register a bean that implements both the TomcatConnectorCustomizer and ApplicationListener interfaces:
// Register the bean
@Bean
public ShutdownConnectorCustomizer shutdownConnectorCustomizer() {
return new ShutdownConnectorCustomizer();
}
// The custom connector customizer must be added when the Tomcat factory is created
@Bean
public ConfigurableServletWebServerFactory tomcatCustomizer() {
TomcatServletWebServerFactory factory = new TomcatServletWebServerFactory();
factory.addConnectorCustomizers(shutdownConnectorCustomizer());
return factory;
}
private static class ShutdownConnectorCustomizer implements TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {
private static final Logger log = LoggerFactory.getLogger(ShutdownConnectorCustomizer.class);
private volatile Connector connector;
@Override
public void customize(Connector connector) {
this.connector = connector;
}
@Override
public void onApplicationEvent(ContextClosedEvent event) {
this.connector.pause();
Executor executor = this.connector.getProtocolHandler().getExecutor();
if (executor instanceof ThreadPoolExecutor) {
try {
ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
threadPoolExecutor.shutdown();
if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
log.warn("Tomcat thread pool did not shut down gracefully within 30 seconds. Proceeding with forceful shutdown");
}
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
}
}
}
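3.2 Microservice projects
For microservices, graceful startup and shutdown is more complicated. A monolith only has to control ingress traffic, while microservices face a tangle of service-to-service calls, and any mistake shows up as 503s and timeout errors.
As shown in the figure below, when service B releases a new version, several abnormal situations can occur.
The window between service B pod1 going offline and clients noticing that it is offline is where problems are most likely. It is hard to be 100% graceful here. Client-side retries can keep the call path available, but they require the server-side interfaces to be idempotent; otherwise retries can easily corrupt data.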
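The interplay between client retries and server-side idempotency can be sketched as follows. All names here (IdempotentRetry, Server, callWithRetry) are hypothetical, and the in-memory map stands in for whatever deduplication store a real service would use.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class IdempotentRetry {

    /** Server side: remembers results by idempotency key so a retried request is applied once. */
    static class Server {
        private final Map<String, String> results = new ConcurrentHashMap<>();
        private final AtomicInteger applied = new AtomicInteger();

        String handle(String idempotencyKey, String payload) {
            return results.computeIfAbsent(idempotencyKey, k -> {
                applied.incrementAndGet(); // the business side effect happens only here
                return "ok:" + payload;
            });
        }

        int appliedCount() {
            return applied.get();
        }
    }

    /** Client side: retries the call with the same key, up to maxAttempts times. */
    static String callWithRetry(Function<String, String> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String key = UUID.randomUUID().toString();
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.apply(key);
            } catch (RuntimeException ex) {
                last = ex; // e.g. the pod went away before the response arrived
            }
        }
        throw last;
    }
}
```

Because every attempt carries the same key, a request whose response was lost while the pod was going down is answered from the stored result instead of being executed a second time.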
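What we most need to do is deregister the service provider before stopping the service and only then shut it down, and, on the way up, register with the registry only after the service is able to serve requests normally.
3.2.1 Eureka level
Graceful startup
In microservices we also run into services that register with Eureka before they have fully started: consumers discover the instance but cannot call it. The service therefore has to finish starting before it registers.
Eureka itself supports delayed registration; it only takes a configuration parameter: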
eureka:
  client:
    healthcheck:
      enabled: false
    onDemandUpdateStatusChange: false # registration is triggered immediately at startup unless this is turned off
    initial-instance-info-replication-interval-seconds: 90 # delay before the first registration
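There is a catch, however: the registration delay is capped at 30 seconds. Values above 30s have no effect, and this bug was never fixed before Eureka stopped being maintained.
Since the bug is interesting, here is a brief explanation.
Eureka can register an instance from three places.
In summary: configuring only the delay does not work by default; eureka.client.healthcheck.enabled and eureka.client.onDemandUpdateStatusChange must both be set to false as well. Even with all of that in place, the heartbeat thread still registers the instance, so the delay never exceeds 30s.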
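The cap follows from simple arithmetic: the first heartbeat fires after the lease renewal interval (30 seconds by default), receives a 404 because the instance is not registered yet, and registers it. A tiny sketch of that reasoning, with hypothetical names:

```java
/** Models why a configured registration delay above the heartbeat interval has no effect. */
class EurekaDelay {
    /** Default heartbeat (lease renewal) interval in seconds. */
    static final int DEFAULT_RENEWAL_INTERVAL_SECONDS = 30;

    /**
     * Whichever timer fires first performs the registration: the delayed replication
     * task, or the first heartbeat (which re-registers after receiving a 404).
     */
    static int effectiveDelaySeconds(int configuredDelaySeconds, int renewalIntervalSeconds) {
        return Math.min(configuredDelaySeconds, renewalIntervalSeconds);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelaySeconds(90, DEFAULT_RENEWAL_INTERVAL_SECONDS));
    }
}
```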
How do we solve this?
Part of the heartbeat code has to be modified:
// com.netflix.discovery.DiscoveryClient
// Heartbeat timer
heartbeatTask = new TimedSupervisorTask(
"heartbeat",
scheduler,
heartbeatExecutor,
renewalIntervalInSecs,
TimeUnit.SECONDS,
expBackOffBound,
new HeartbeatThread()
);
scheduler.schedule(
heartbeatTask,
// renewalIntervalInSecs replaced with clientConfig.getInitialInstanceInfoReplicationIntervalSeconds()
clientConfig.getInitialInstanceInfoReplicationIntervalSeconds(), TimeUnit.SECONDS);
/**
* The heartbeat task that renews the lease in the given intervals.
*/
private class HeartbeatThread implements Runnable {
public void run() {
if (renew()) {
lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
}
}
}
/**
* Renew with the eureka service by making the appropriate REST call
*/
boolean renew() {
EurekaHttpResponse<InstanceInfo> httpResponse;
try {
httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
logger.debug(PREFIX + "{} - Heartbeat status: {}", appPathIdentifier, httpResponse.getStatusCode());
if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
REREGISTER_COUNTER.increment();
logger.info(PREFIX + "{} - Re-registering apps/{}", appPathIdentifier, instanceInfo.getAppName());
long timestamp = instanceInfo.setIsDirtyWithTime();
boolean success = register();
if (success) {
instanceInfo.unsetIsDirty(timestamp);
}
return success;
}
return httpResponse.getStatusCode() == Status.OK.getStatusCode();
} catch (Throwable e) {
logger.error(PREFIX + "{} - was unable to send heartbeat!", appPathIdentifier, e);
return false;
}
}
Use a SpringApplicationRunListener to register once startup has completed:
@Slf4j
public class EurekaRegisterListener implements SpringApplicationRunListener, Ordered {
private final SpringApplication application;
private final String[] args;
public EurekaRegisterListener(SpringApplication sa, String[] arg) {
this.application = sa;
this.args = arg;
}
@Override
public int getOrder() {
return Ordered.LOWEST_PRECEDENCE;
}
@Override
public void starting() {
}
@Override
public void environmentPrepared(ConfigurableEnvironment environment) {
}
@Override
public void contextPrepared(ConfigurableApplicationContext context) {
}
@Override
public void contextLoaded(ConfigurableApplicationContext context) {
}
@Override
public void started(ConfigurableApplicationContext context) {
}
/**
 * This method is called once early in startup and again after the whole service has started
* @param context
*/
@Override
public void running(ConfigurableApplicationContext context) {
// Read the Eureka server address from the configuration
String eurekaServiceUrls = context.getEnvironment().getProperty("eureka.client.service-url.defaultZone");
if (StringUtils.isEmpty(eurekaServiceUrls)) {
log.error("not found eureka service for manual register");
return;
}
// On the first call the context has not been built yet, so getBean throws; catch and ignore it
EurekaInstanceConfigBean eurekaInstanceConfigBean;
try {
eurekaInstanceConfigBean = context.getBean(EurekaInstanceConfigBean.class);
} catch (Exception ignore) {
return;
}
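// The Eureka setting accepts several comma-separated addresses, so handle that here as well
String[] serviceUrlArr = eurekaServiceUrls.split(",");
for (String serviceUrl : serviceUrlArr) {
// Try each address in turn and build a RestTemplate-based client for it
EurekaHttpClient eurekaHttpClient = new RestTemplateTransportClientFactory().newClient(new DefaultEndpoint(serviceUrl));
// Build the instance info that Eureka derives from the configuration
InstanceInfo instanceInfo = new EurekaConfigBasedInstanceInfoProvider(eurekaInstanceConfigBean).get();
// Set the status to UP directly; the default is STARTING, which registers the instance but leaves it unavailable
instanceInfo.setStatus(InstanceInfo.InstanceStatus.UP);
// Send the REST request that performs the registration
EurekaHttpResponse<Void> register = eurekaHttpClient.register(instanceInfo);
// Check whether registration against this address succeeded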
if (register.getStatusCode() == 204) {
log.info("success manual register eureka");
return;
}
}
}
@Override
public void failed(ConfigurableApplicationContext context, Throwable exception) {
// On startup failure, take the Eureka instance offline by reusing Eureka's own implementation
DiscoveryManager.getInstance().shutdownComponent();
}
}
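Graceful shutdown
First, a word on the choice of Spring Boot shutdown mechanism.
Some drawbacks need to be stated up front. Exposing the shutdown endpoint directly raises a series of security issues and is not recommended. Registering a JVM ShutdownHook that first removes the instance from the registry and then delays the shutdown, as in the code below, does not work either: the registry does see the instance go offline, but Tomcat already refuses requests and the database connection pool is already closed, and because other clients still hold a cached instance list, their requests cannot be answered properly. The source code explains why.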
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
// Remove the instance from the Eureka registry
DiscoveryManager.getInstance().shutdownComponent();
// Sleep for 120s
try {
Thread.sleep(120 * 1000);
} catch (Exception ignore) {
}
}));
Digging into the Spring source shows that a ShutdownHook is registered when the container starts:
private void refreshContext(ConfigurableApplicationContext context) {
if (this.registerShutdownHook) {
try {
// Register the shutdown hook
context.registerShutdownHook();
}
catch (AccessControlException ex) {
// Not allowed in some environments.
}
}
refresh((ApplicationContext) context);
}
@Override
public void registerShutdownHook() {
if (this.shutdownHook == null) {
// No shutdown hook registered yet.
this.shutdownHook = new Thread(SHUTDOWN_HOOK_THREAD_NAME) {
@Override
public void run() {
synchronized (startupShutdownMonitor) {
// The logic the shutdown hook actually runs
doClose();
}
}
};
Runtime.getRuntime().addShutdownHook(this.shutdownHook);
}
}
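Now look at what happens when several ShutdownHooks are registered with the JVM. Each registered hook is a thread that the JVM starts at shutdown; the hooks run concurrently, and the JVM exits only after all of them have finished.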
// java.lang.Runtime#addShutdownHook
public void addShutdownHook(Thread hook) {
SecurityManager sm = System.getSecurityManager();
if (sm != null) {
sm.checkPermission(new RuntimePermission("shutdownHooks"));
}
ApplicationShutdownHooks.add(hook);
}
// java.lang.ApplicationShutdownHooks
/* The set of registered hooks */
private static IdentityHashMap<Thread, Thread> hooks;
// Adds a new shutdown hook. Checks the shutdown state and the hook itself, but performs no security checks.
static synchronized void add(Thread hook) {
if(hooks == null)
throw new IllegalStateException("Shutdown in progress");
if (hook.isAlive())
throw new IllegalArgumentException("Hook already running");
if (hooks.containsKey(hook))
throw new IllegalArgumentException("Hook previously registered");
hooks.put(hook, hook);
}
// Starts every hook in its own thread. The hooks run concurrently, and this method waits for them to finish.
static void runHooks() {
Collection<Thread> threads;
synchronized(ApplicationShutdownHooks.class) {
threads = hooks.keySet();
hooks = null;
}
for (Thread hook : threads) {
hook.start();
}
for (Thread hook : threads) {
while (true) {
try {
hook.join();
break;
} catch (InterruptedException ignored) {
}
}
}
}
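The concurrency can be observed without shutting a JVM down. The sketch below copies the start-all-then-join-all shape of runHooks into a hypothetical class, ConcurrentHooksDemo, and uses a latch that only opens when both hooks are running at the same time; the thread names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ConcurrentHooksDemo {

    /** Same shape as ApplicationShutdownHooks.runHooks: start every hook, then join them all. */
    static void runHooks(List<Thread> hooks) {
        for (Thread hook : hooks) {
            hook.start();
        }
        for (Thread hook : hooks) {
            try {
                hook.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Returns true if the two hooks were observed running at the same time. */
    static boolean hooksOverlap() {
        CountDownLatch bothStarted = new CountDownLatch(2);
        AtomicBoolean overlapped = new AtomicBoolean(true);
        Runnable body = () -> {
            bothStarted.countDown();
            try {
                // If hooks ran one after another, the first would time out waiting for the second.
                if (!bothStarted.await(2, TimeUnit.SECONDS)) {
                    overlapped.set(false);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                overlapped.set(false);
            }
        };
        runHooks(Arrays.asList(new Thread(body, "custom-hook"), new Thread(body, "spring-hook")));
        return overlapped.get();
    }

    public static void main(String[] args) {
        System.out.println("hooks ran concurrently: " + hooksOverlap());
    }
}
```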
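This makes it clear why our own hook does not achieve what we want. ShutdownHooks run concurrently and do not wait for each other: while our hook is sleeping, Spring's own JVM ShutdownHook is already closing the container. A custom ShutdownHook therefore cannot delay the shutdown.
The fix is simple: if you can't beat them, join them. We plug into the ShutdownHook that Spring registers, which exposes extension points of its own. The source above shows that closing the container starts by publishing a ContextClosedEvent, so listening for that event lets us delay both the destruction of the container and the exit of the process.
An implementation follows. After an instance deregisters from Eureka, other clients can take up to 90s to notice, and our microservices use Ribbon for load balancing, which caches the server list for another 30s. Other services therefore need up to 120s to see the node go offline, so the delay is set to 120s. Estimate this value for your own project.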
@Component
public class EurekaShutdownConfig implements ApplicationListener<ContextClosedEvent>, PriorityOrdered {
private static final Logger log = LoggerFactory.getLogger(EurekaShutdownConfig.class);
@Override
public void onApplicationEvent(ContextClosedEvent event) {
try {
log.info(LogUtil.logMsg("_shutdown", "msg", "eureka instance offline begin!"));
DiscoveryManager.getInstance().shutdownComponent();
log.info(LogUtil.logMsg("_shutdown", "msg", "eureka instance offline end!"));
log.info(LogUtil.logMsg("_shutdown", "msg", "start sleep 120S for cache!"));
// Adjust to match your architecture
Thread.sleep(120 * 1000);
log.info(LogUtil.logMsg("_shutdown", "msg", "stop sleep 120S for cache!"));
} catch (Throwable ignore) {
}
}
@Override
public int getOrder() {
return 0;
}
}
3.2.2 Nacos level
Graceful startup
Nacos is like Eureka here: registration has to be delayed to avoid the same problems.
Implement an /actuator/registry endpoint as follows:
@Slf4j
@Component
@Endpoint(id = "registry")
@ConditionalOnProperty(prefix = "spring.cloud.nacos.discovery", name = "server-addr")
@ConditionalOnClass({NacosServiceRegistry.class, Registration.class})
public class RegistryEndpoint {
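/**
 * A K8s readiness probe calls this endpoint, and the callback registers the service with the registry.
 * The first readiness check after startup performs the registration.
 * Once registered, the endpoint simply returns success, just like /actuator/health.
 */
private static volatile boolean IS_INIT = false;
/**
 * Response bodies, in the same shape as a successful "/actuator/health" response
 */
private final String SUC = "{\"status\":\"UP\",\"groups\":[\"liveness\",\"readiness\"]}";
private final String UNKNOWN = "{\"status\":\"UNKNOWN\",\"groups\":[\"liveness\",\"readiness\"]}";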
@Value("${spring.application.name}")
private String application;
private final NacosServiceRegistry nacosServiceRegistry;
private final Registration registration;
private final HealthEndpoint healthEndpoint;
public RegistryEndpoint(NacosServiceRegistry nacosServiceRegistry, Registration registration, HealthEndpoint healthEndpoint) {
this.nacosServiceRegistry = nacosServiceRegistry;
this.registration = registration;
this.healthEndpoint = healthEndpoint;
}
@ReadOperation
public String registry() {
if (IS_INIT) {
return SUC;
}
HealthComponent health = healthEndpoint.health();
if(!org.springframework.boot.actuate.health.Status.UP.equals(health.getStatus())){
return UNKNOWN;
}
log.info("registering service [{}] with the registry", application);
nacosServiceRegistry.register(registration);
log.info("service [{}] registered with the registry", application);
IS_INIT = true;
return SUC;
}
}
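The endpoint above boils down to "register exactly once, on the first probe that finds the service healthy". A framework-free sketch of that logic follows; RegisterOnce is a hypothetical name, and the health check and registration call are passed in as plain functions instead of the Nacos and actuator beans.

```java
import java.util.function.BooleanSupplier;

/** Registers with the registry on the first healthy probe and reports success afterwards. */
class RegisterOnce {
    private final BooleanSupplier healthy;
    private final Runnable register;
    private boolean registered = false;

    RegisterOnce(BooleanSupplier healthy, Runnable register) {
        this.healthy = healthy;
        this.register = register;
    }

    /**
     * Called by each readiness probe. Synchronized so that concurrent probes cannot
     * register twice; if register throws, the flag stays false and the next probe retries.
     */
    synchronized boolean probe() {
        if (registered) {
            return true;
        }
        if (!healthy.getAsBoolean()) {
            return false;
        }
        register.run();
        registered = true;
        return true;
    }
}
```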
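Graceful shutdown
As with Eureka, the instance has to deregister first and the service has to shut down after a delay.
Nacos clients notice changes much faster than Eureka clients, so the wait can be shorter; 40s is usually enough.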
@Component
public class NacosShutdownEvent implements ApplicationListener<ContextClosedEvent>, PriorityOrdered {
    private static final Logger log = LoggerFactory.getLogger(NacosShutdownEvent.class);

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        try {
            log.info(LogUtil.logMsg("_shutdown", "msg", "nacos instance offline begin!"));
            NacosServiceRegistry nacosServiceRegistry =
                    event.getApplicationContext().getBean(NacosServiceRegistry.class);
            NacosRegistration registration =
                    event.getApplicationContext().getBean(NacosRegistration.class);
            nacosServiceRegistry.deregister(registration);
            log.info(LogUtil.logMsg("_shutdown", "msg", "nacos instance offline end!"));
            log.info(LogUtil.logMsg("_shutdown", "msg", "start sleep 40s for cache!"));
            // Sleep 40s so that the Ribbon caches on the callers expire after the instance deregisters from Nacos.
            // Other clients pull the instance list from Nacos every 10s, or the server pushes it; 40s is the upper bound.
            Thread.sleep(40 * 1000);
            log.info(LogUtil.logMsg("_shutdown", "msg", "stop sleep 40s for cache!"));
        } catch (Throwable ignore) {
        }
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
3.2.3 Dubbo level
Dubbo enables graceful shutdown by default. ShutdownHookListener listens for Spring's close event; when Spring starts to close, the listener's internal logic is triggered. The time Dubbo waits can be set with dubbo.application.shutwait=30s.
public class SpringExtensionFactory implements ExtensionFactory {
private static final Logger logger = LoggerFactory.getLogger(SpringExtensionFactory.class);
private static final Set<ApplicationContext> CONTEXTS = new ConcurrentHashSet<ApplicationContext>();
private static final ApplicationListener SHUTDOWN_HOOK_LISTENER = new ShutdownHookListener();
public static void addApplicationContext(ApplicationContext context) {
CONTEXTS.add(context);
if (context instanceof ConfigurableApplicationContext) {
// Register the ShutdownHook
((ConfigurableApplicationContext) context).registerShutdownHook();
// Unregister the ShutdownHook that AbstractConfig registered
DubboShutdownHook.getDubboShutdownHook().unregister();
}
BeanFactoryUtils.addApplicationListener(context, SHUTDOWN_HOOK_LISTENER);
}
// Implements ApplicationListener; this listener handles the container close event
private static class ShutdownHookListener implements ApplicationListener {
@Override
public void onApplicationEvent(ApplicationEvent event) {
if (event instanceof ContextClosedEvent) {
DubboShutdownHook shutdownHook = DubboShutdownHook.getDubboShutdownHook();
shutdownHook.doDestroy();
}
}
}
}
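3.2.4 K8s level
This relies on K8s probes and container lifecycle hooks.
Probes
Up to v1.15, readiness and liveness probes are supported; the startup probe was added as an Alpha feature in v1.16 and promoted to Beta in v1.18.
When containers are managed by K8s, probes report the state of each container, so it helps to understand the container lifecycle.
The readiness probe lets the kubelet know when the application is ready to accept new traffic. If the application needs time to initialize after the process starts, configure a readiness probe so that Kubernetes waits before sending traffic. Its main job is to control whether traffic is routed to the Pods behind a Service.
The liveness probe is used to restart unhealthy containers. The kubelet calls the liveness probe periodically to determine health and restarts the container when the check fails. Liveness checks help an application recover from a deadlock: without them, Kubernetes considers a deadlocked Pod healthy, because from its point of view the Pod's process is still running. With a liveness probe configured, the kubelet can detect the unhealthy state and restart the container to restore availability.
The startup probe determines whether the application has started. When readiness, liveness, and startup probes are all configured, the startup probe runs first and the other two are held back until it succeeds.
Probe parameters:
- initialDelaySeconds: seconds to wait after the container starts before the liveness and readiness probes begin.
- periodSeconds: how often the probe runs.
- timeoutSeconds: seconds after which a probe attempt counts as timed out (a failed health check).
- successThreshold: minimum number of consecutive successes for the probe to count as passing.
- failureThreshold: number of consecutive failures before the probe is marked failed. For a liveness probe this restarts the container; for a readiness probe the Pod is marked unready.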
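A rough feel for how these parameters combine can be had from a little arithmetic. The estimates below are an approximation under assumptions: they ignore timeoutSeconds and any jitter the kubelet adds, and ProbeTiming is a hypothetical helper, not part of any Kubernetes client.

```java
/** Back-of-the-envelope estimates for probe timing; real kubelet timing will differ somewhat. */
class ProbeTiming {

    /** Earliest time (seconds after container start) at which a probe can count as passing. */
    static int earliestPassSeconds(int initialDelaySeconds, int periodSeconds, int successThreshold) {
        return initialDelaySeconds + (successThreshold - 1) * periodSeconds;
    }

    /** Approximate time for a healthy-looking container to be marked failed once checks start failing. */
    static int failureDetectionSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    public static void main(String[] args) {
        // Sample values: initialDelaySeconds=5, periodSeconds=2, successThreshold=1, failureThreshold=3
        System.out.println("ready no earlier than " + earliestPassSeconds(5, 2, 1) + "s");
        System.out.println("failure noticed after about " + failureDetectionSeconds(2, 3) + "s");
    }
}
```

The second number matters for releases: it is roughly how long a broken instance can keep receiving traffic before the readiness probe removes it.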
Configure the probes on top of the Spring Boot health checks:
# Spring Boot configuration
management:
  server:
    port: 32518
  endpoints:
    web:
      exposure:
        include: health
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
  health:
    livenessstate: # liveness state
      enabled: true
    readinessstate: # readiness state
      enabled: true
# K8s configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '29'
    field.cattle.io/publicEndpoints: >-
      [{"port":30410,"protocol":"TCP","serviceName":"devops-test:zhj-release-nodeport","allNodes":true}]
  creationTimestamp: '2023-10-07T01:39:30Z'
  generation: 18
  labels:
    app: cloud-release
    manager: cloud-hcce
    recordId: '2398'
    runMode: pro
    softServiceId: 64d2e20597d214df96504a3f
    softVersionId: 64f7d33497d214df96504b54
    velero.io/backup-name: devops-test-backup
    velero.io/restore-name: devops-test-backup-20231007093909
  name: zhj-release
  namespace: devops-test
  resourceVersion: '9857909'
  uid: 9ce21f33-f3ea-45ba-a827-dc1f30a9b978
spec:
  progressDeadlineSeconds: 300
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cloud-release
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        cattle.io/timestamp: '2023-12-10T00:00:00Z'
        manager: cloud-hcce
      creationTimestamp: null
      labels:
        app: cloud-release
    spec:
      containers:
        - env:
            - name: spring.profiles.active
              value: test
            - name: JAVA_OPTS
              value: '-Xmx1024M'
          image: zhj-release:1.0.3-e6b07342-20231010
          imagePullPolicy: IfNotPresent
          name: cloud-release
          ports:
            - containerPort: 8080
              name: port
              protocol: TCP
          lifecycle: # lifecycle hooks
            preStop:
              exec:
                command: # send SIGTERM (kill -15); PID 1 is assumed to be the Java process
                  - /bin/bash
                  - '-c'
                  - kill -15 1
          livenessProbe: # probe configuration
            failureThreshold: 3
            httpGet:
              path: /actuator/health/liveness
              port: 32518
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /actuator/health/readiness
              port: 32518
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 2
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: '2023-12-10T00:00:00Z'
      lastUpdateTime: '2023-12-10T00:00:00Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
    - lastTransitionTime: '2023-12-10T00:00:00Z'
      lastUpdateTime: '2023-12-10T00:00:00Z'
      message: ReplicaSet "zhj-release-54bfcc7878" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
  observedGeneration: 18
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
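With this in place, K8s can probe the Spring Boot service through the endpoints it exposes and determine the service state from the responses.
Container lifecycle hooks
When a container lifecycle hook is invoked, Kubernetes runs the handler according to the hook action: httpGet and tcpSocket handlers are executed by the kubelet process, while exec handlers are executed inside the container.
PostStart: runs right after the container has been created; used for tasks such as deploying resources and preparing the environment.
PreStop: runs before the container is terminated; used to shut the application down gracefully, notify other systems, and so on. See the configuration above for an example. This is where we can stop the service.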
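4 Other
4.1 Graceful shutdown of thread pools
ThreadPoolExecutor can be closed in two ways: shutdown and shutdownNow.
After shutdown, the pool enters the SHUTDOWN state: it accepts no new tasks and lets the tasks that were already submitted run to completion. In other words, shutdown only issues the request; whether the pool has actually stopped depends on the worker threads.
shutdownNow behaves differently: the pool enters the STOP state and Thread.interrupt() is called on the threads that are running tasks (if a task does not respond to interruption, nothing happens), so it does not mean "stop immediately" either.
Both therefore need a follow-up call to awaitTermination; calling shutdown or shutdownNow alone is not enough.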
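Putting the three calls together gives the usual two-phase pattern: ask politely with shutdown, wait, then fall back to shutdownNow and wait again. A minimal sketch using only java.util.concurrent; the class name PoolShutdown and the timeout values are illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolShutdown {

    /**
     * Two-phase shutdown: stop accepting tasks and wait; if tasks are still running
     * after the timeout, interrupt them and wait once more.
     * Returns true if the pool terminated.
     */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // no new tasks; already submitted tasks keep running
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // interrupt running tasks, drop queued ones
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException ex) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task done"));
        System.out.println("terminated: " + shutdownGracefully(pool, 5, TimeUnit.SECONDS));
    }
}
```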
// The thread pool
private ThreadPoolExecutor executor;
// true: let submitted tasks finish (shutdown); false: interrupt them (shutdownNow)
private final boolean waitForTasksToCompleteOnShutdown = true;
// Maximum wait in seconds; choose a value that fits the business
private final int awaitTerminationSeconds = 30;

@Bean
@Primary
public ThreadPoolExecutor asyncServiceExecutor() {
    executor = new ThreadPoolExecutor(5, 20, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());
    return executor;
}

@PreDestroy
public void destroyThreadPool() {
    shutdown();
}

public void shutdown() {
    if (this.waitForTasksToCompleteOnShutdown) {
        this.executor.shutdown();
    }
    else {
        this.executor.shutdownNow();
    }
    awaitTerminationIfNecessary();
}

private void awaitTerminationIfNecessary() {
    if (this.awaitTerminationSeconds > 0) {
        try {
            this.executor.awaitTermination(this.awaitTerminationSeconds, TimeUnit.SECONDS);
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
A thread pool managed by Spring Boot can be configured as follows:
@Slf4j
@EnableAsync
@Configuration
public class TaskExecutorConfig {
private static final int TIMEOUT = 60;
@Bean("taskExecutor")
public ThreadPoolTaskExecutor taskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(10);
executor.setQueueCapacity(200);
executor.setKeepAliveSeconds(1000);
executor.setThreadNamePrefix("task-asyn");
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
// Call shutdown() when the bean is destroyed, so submitted tasks can finish
executor.setWaitForTasksToCompleteOnShutdown(true);
// Wait up to 60 seconds after shutdown/shutdownNow
executor.setAwaitTerminationSeconds(TIMEOUT);
return executor;
}
}
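4.2 Graceful shutdown of MQ
MQ clients managed by Spring implement graceful shutdown by default.
4.3 Graceful shutdown of scheduled tasks
Spring @Scheduled
A thread pool can be configured for it, so graceful shutdown comes from the Spring-managed thread pool.
xxl-job
Business tasks run inside the executor, and bringing a task online or changing it requires an executor restart, especially for Bean-mode tasks. Restarting the executor may interrupt running tasks. However, because XXL-JOB has its own executors and its own registry, a gray (canary) rollout can avoid interrupting tasks during a restart.
The steps are as follows: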