k8s Health Check Parameters Explained

February 27, 2024 · 云计算法医

1. Background

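k8s hosts the containers deployed in it and manages them across their whole lifecycle, including creating, destroying, and restarting them. So how does k8s manage a container's lifecycle?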

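k8s provides health check probes for this purpose. The probes monitor whether a container is running and alive. Combined with the management policies configured on the resource and with the kubelet, they allow k8s to manage cluster resources across their full lifecycle.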

k8s offers three probes for container health checks: startupProbe, livenessProbe, and readinessProbe. The rest of this article introduces these three probes and how to configure them.

2. Probes

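k8s uses three different probes to check container health and, based on the results, works with other features to manage the container. Each probe is described below.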

startupProbe

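The startup probe covers the initialization phase of container startup. It confirms that the application has started correctly and reached a stable state.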

livenessProbe

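The liveness probe decides when a container should be restarted. It checks whether the container is still "alive", that is, whether it can still serve requests normally. If consecutive probe failures reach failureThreshold, the kubelet considers the container dead or unrecoverable and restarts it.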

readinessProbe

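The readiness probe determines whether a container is ready to receive request traffic. Only when readinessProbe succeeds does the kubelet mark the Pod as ready, and only then is the Pod included in the Service's endpoints. If the readiness check fails, the Pod is removed from the Service's endpoints. This ensures that traffic is routed only to healthy instances of the application.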

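If startupProbe is configured, the kubelet starts running livenessProbe and readinessProbe only after startupProbe has succeeded. If startupProbe is not configured, livenessProbe and readinessProbe each start according to their own parameters. A service can therefore use a startup probe to control when its readiness and liveness checks begin, avoiding the unnecessary restarts and premature traffic that checks cause when they run too early. The Kubernetes documentation puts it this way: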

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, liveness and readiness probes do not start until it succeeds, making sure those probes don’t interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
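
The pattern described above can be sketched as follows. This is a minimal, illustrative fragment of a container spec, assuming a slow-starting application that serves the /actuator/health endpoint on port 8011. The thresholds are assumptions chosen for the example, not recommendations.

```yaml
# Illustrative fragment of a container spec; all thresholds are assumptions.
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8011
  periodSeconds: 10
  failureThreshold: 30   # roughly 30 * 10 = 300s allowed for startup
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8011
  periodSeconds: 10      # only begins once startupProbe has succeeded
  failureThreshold: 3
```

With this layout the liveness probe can keep a short failure window for the running application, while the startup probe alone absorbs the long startup time.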

The following configuration shows the three k8s health check probes: startupProbe, livenessProbe, and readinessProbe.

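    # Startup probe: liveness and readiness checks wait until it succeeds.
    startupProbe:
      httpGet:
        path: /actuator/health
        port: 8011               # must be a valid port (1-65535)
      initialDelaySeconds: 240
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 20
    # Liveness probe: repeated failures cause the kubelet to restart the container.
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8011
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 120
    # Readiness probe: failures remove the Pod from the Service's endpoints.
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8011
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 120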
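
The probe fields above belong to an individual container in the Pod spec. As a rough sketch of where they sit, here is a minimal Pod manifest. The Pod name, the image, and port 8011 are hypothetical placeholders, and only one probe is written out to keep the example short.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                        # hypothetical name
spec:
  containers:
    - name: demo-app
      image: example.com/demo-app:1.0   # hypothetical image
      ports:
        - containerPort: 8011           # assumed application port
      # Probes are defined per container, next to image and ports.
      readinessProbe:
        httpGet:
          path: /actuator/health
          port: 8011
        periodSeconds: 10
        failureThreshold: 3
      # startupProbe and livenessProbe go at this same level.
```

In a Deployment the same container fields appear under spec.template.spec.containers.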

3. Probe Parameters Explained

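Knowing what the probes do is only the first step. The key is how to configure them to suit your own service. A badly configured probe either causes container restarts or leaves the service unusable, or it makes the health check take far longer than necessary and wastes time.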

httpGet: the mechanism the probe uses to run the health check. Besides httpGet, a probe can use exec, tcpSocket, or grpc as its mechanism.
- path: /actuator/health
- port: 8011
- scheme: HTTP
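
For comparison, the other three mechanisms might look like the sketch below. The command, file path, and port numbers are illustrative assumptions. The three probes are shown together only to put the mechanisms side by side, since each probe takes exactly one mechanism. The grpc mechanism needs a sufficiently recent Kubernetes version and an application that implements the gRPC health checking protocol.

```yaml
# Illustrative only: command, file path, and ports are assumptions.
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # success when the exit code is 0
readinessProbe:
  tcpSocket:
    port: 8011                         # success when the TCP connection opens
startupProbe:
  grpc:
    port: 9090                         # hypothetical gRPC health service port
```
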
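initialDelaySeconds: how many seconds to wait after the container starts before the probe runs for the first time. If a startup probe is defined, the delays of the liveness and readiness probes start counting only after the startup probe has succeeded. If periodSeconds is greater than initialDelaySeconds, initialDelaySeconds is ignored.
periodSeconds: how often the probe runs, that is, the wait between two consecutive health checks. After one check, the kubelet waits periodSeconds before running the next.
timeoutSeconds: the timeout for a single probe execution. If a check does not return a successful result within timeoutSeconds, that check fails because of the timeout.
failureThreshold: the maximum number of consecutive probe failures. Once a probe has failed failureThreshold times in a row, k8s treats the container as unhealthy or not ready.
successThreshold: the minimum number of consecutive successes required for the probe to count as successful again after it has failed. It must be 1 for liveness and startup probes.
terminationGracePeriodSeconds: the grace period the kubelet waits before forcibly stopping a failed container. Once terminationGracePeriodSeconds has elapsed, the kubelet forces the failed container to stop.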
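
To show how these parameters combine, here is an annotated sketch of a liveness probe. The container name, image, and all numeric values are assumptions for illustration. The timing comments are rough estimates, because the real detection time also depends on when the failure occurs within a period and on how long each attempt takes. The probe-level terminationGracePeriodSeconds is only available on cluster versions that support it, and it cannot be set on readiness probes.

```yaml
spec:
  terminationGracePeriodSeconds: 60       # Pod-level grace period (illustrative)
  containers:
    - name: demo-app                      # hypothetical name
      image: example.com/demo-app:1.0     # hypothetical image
      livenessProbe:
        httpGet:
          path: /actuator/health
          port: 8011
          scheme: HTTP
        initialDelaySeconds: 60           # first check about 60s after container start
        periodSeconds: 30                 # then one check every 30s
        timeoutSeconds: 10                # an attempt taking longer than 10s fails
        failureThreshold: 3               # roughly 3 * 30 = 90s of failures before restart
        successThreshold: 1               # must be 1 for liveness probes
        terminationGracePeriodSeconds: 15 # probe-level override of the Pod-level value
```

A shorter probe-level grace period lets a container that failed its liveness check be stopped quickly, while normal shutdowns still get the longer Pod-level value.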