1. Why kube-state-metrics is needed
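Kubernetes monitoring focuses mainly on two kinds of metrics:
- Basic performance metrics
CPU, memory, disk, network, and so on. These can be collected by deploying node-exporter as a DaemonSet and letting Prometheus scrape it.
- Resource object metrics
The replica count of a Deployment, the running status of Pods, and so on. To get these, kube-state-metrics polls the Kubernetes API and exposes the results as metrics. Only then can Prometheus see them.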
2. Which metrics kube-state-metrics provides by default
The metric categories are:
- CertificateSigningRequest Metrics
- ConfigMap Metrics
- CronJob Metrics
- DaemonSet Metrics
- Deployment Metrics
- Endpoint Metrics
- Horizontal Pod Autoscaler Metrics
- Ingress Metrics
- Job Metrics
- Lease Metrics
- LimitRange Metrics
- MutatingWebhookConfiguration Metrics
- Namespace Metrics
- NetworkPolicy Metrics
- Node Metrics
- PersistentVolume Metrics
- PersistentVolumeClaim Metrics
- Pod Disruption Budget Metrics
- Pod Metrics
- ReplicaSet Metrics
- ReplicationController Metrics
- ResourceQuota Metrics
- Secret Metrics
- Service Metrics
- StatefulSet Metrics
- StorageClass Metrics
- ValidatingWebhookConfiguration Metrics
- VerticalPodAutoscaler Metrics
- VolumeAttachment Metrics
Taking Pods as an example, the following metrics are available:
- kube_pod_annotations
- kube_pod_info
- kube_pod_ips
- kube_pod_start_time
- kube_pod_completion_time
- kube_pod_owner
- kube_pod_labels
- kube_pod_nodeselectors
- kube_pod_status_phase
- kube_pod_status_ready
- kube_pod_status_scheduled
- kube_pod_container_info
- kube_pod_container_status_waiting
- kube_pod_container_status_waiting_reason
- kube_pod_container_status_running
- kube_pod_container_state_started
- kube_pod_container_status_terminated
- kube_pod_container_status_terminated_reason
- kube_pod_container_status_last_terminated_reason
- kube_pod_container_status_ready
- kube_pod_container_status_restarts_total
- kube_pod_container_resource_requests
- kube_pod_container_resource_limits
- kube_pod_overhead_cpu_cores
- kube_pod_overhead_memory_bytes
- kube_pod_runtimeclass_name_info
- kube_pod_created
- kube_pod_deletion_timestamp
- kube_pod_restart_policy
- kube_pod_init_container_info
- kube_pod_init_container_status_waiting
- kube_pod_init_container_status_waiting_reason
- kube_pod_init_container_status_running
- kube_pod_init_container_status_terminated
- kube_pod_init_container_status_terminated_reason
- kube_pod_init_container_status_last_terminated_reason
- kube_pod_init_container_status_ready
- kube_pod_init_container_status_restarts_total
- kube_pod_init_container_resource_limits
- kube_pod_init_container_resource_requests
- kube_pod_spec_volumes_persistentvolumeclaims_info
- kube_pod_spec_volumes_persistentvolumeclaims_readonly
- kube_pod_status_reason
- kube_pod_status_scheduled_time
- kube_pod_status_unschedulable
The set of metrics is very rich and covers most of what you need to observe the running state of Kubernetes.
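To make the Pod metrics concrete, here is a minimal Python sketch that reads a few lines of Prometheus text exposition output and counts Pods per phase from kube_pod_status_phase. That metric has one series per Pod and phase, with value 1 for the Pod's current phase. The sketch assumes each sample line has the form name{labels} value, with no timestamp and no escaped quotes inside label values. The helper names parse_samples and pods_by_phase and the sample lines are hypothetical, for illustration only. Real code would be better served by a full exposition-format parser such as prometheus_client.parser.text_string_to_metric_families.

```python
import re
from collections import Counter

# Matches `metric_name{label="value",...} number`. Assumes no timestamp and no
# escaped quotes inside label values (a simplification of the real format).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_samples(text):
    """Yield (name, labels_dict, value) for every labelled sample line in text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # samples without a label set are ignored in this sketch
        labels = {lm["key"]: lm["val"] for lm in LABEL_RE.finditer(m["labels"])}
        yield m["name"], labels, float(m["value"])


def pods_by_phase(text):
    """Count Pods per phase: a Pod's current phase is the series with value 1."""
    counts = Counter()
    for name, labels, value in parse_samples(text):
        if name == "kube_pod_status_phase" and value == 1:
            counts[labels["phase"]] += 1
    return counts


# Hypothetical scrape output, trimmed to a few series.
EXAMPLE = """
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="job-1",phase="Succeeded"} 1
kube_pod_status_phase{namespace="default",pod="job-1",phase="Running"} 0
kube_pod_status_phase{namespace="default",pod="web-2",phase="Running"} 1
"""

if __name__ == "__main__":
    print(dict(pods_by_phase(EXAMPLE)))  # prints {'Running': 2, 'Succeeded': 1}
```

In Prometheus itself the same count is usually done with a query instead. The sketch only shows what the raw series look like.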
3. How to collect labels and annotations
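By default, the kube_pod_labels and kube_pod_annotations metrics contain only the name and namespace labels. To monitor additional labels and annotations, use two kube-state-metrics startup flags: --metric-labels-allowlist and --metric-annotations-allowlist. Note that older versions of kube-state-metrics do not fully support these two flags. The configuration used here is based on version 2.4.2.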
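Both flags take a comma-separated list that maps plural resource names to the keys to keep, for example pods=[app,version]. According to the kube-state-metrics CLI documentation, a single * can be given for a resource instead of a key list to allow every key, and this has severe performance implications. The Python sketch below assembles such arguments from a dictionary. The helper allowlist_arg and the chosen keys are hypothetical. Check the exact syntax against the cli-arguments documentation for the version you deploy.

```python
def allowlist_arg(flag, allow):
    """Render an allowlist flag, e.g. --metric-labels-allowlist=pods=[app,version].

    allow maps a plural resource name to the Kubernetes label or annotation keys
    to keep. ["*"] keeps every key for that resource (expensive).
    """
    parts = []
    for resource, keys in allow.items():
        # The documented wildcard replaces the key list rather than joining it.
        if "*" in keys and len(keys) > 1:
            raise ValueError(f"{resource}: '*' must be the only entry")
        parts.append(f"{resource}=[{','.join(keys)}]")
    return f"--{flag}={','.join(parts)}"


# Hypothetical container args for a kube-state-metrics Deployment:
# all Pod labels, but only one specific Pod annotation.
args = [
    allowlist_arg("metric-labels-allowlist", {"pods": ["*"]}),
    allowlist_arg(
        "metric-annotations-allowlist",
        {"pods": ["cluster-autoscaler.kubernetes.io/safe-to-evict"]},
    ),
]

if __name__ == "__main__":
    for a in args:
        print(a)
```

Listing explicit keys instead of * keeps the number of extra labels per series under control.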
- Prepare a Pod as the observation target
- Observing kube_pod_labels
Before enabling the kube-state-metrics flags:
kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}
After enabling the kube-state-metrics flags:
kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", label_app="tekton-pipelines-controller", label_app_kubernetes_io_component="controller", label_app_kubernetes_io_instance="default", label_app_kubernetes_io_name="controller", label_app_kubernetes_io_part_of="tekton-pipelines", label_app_kubernetes_io_version="v0.24.1", label_pipeline_tekton_dev_release="v0.24.1", label_pod_template_hash="6f449d874b", label_version="v0.24.1", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}
Many new labels prefixed with label_ appear.
- Observing kube_pod_annotations
Before enabling the kube-state-metrics flags:
kube_pod_annotations{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}
After enabling the kube-state-metrics flags:
kube_pod_annotations{annotation_cluster_autoscaler_kubernetes_io_safe_to_evict="false", annotation_cni_projectcalico_org_container_id="8a505a530b501ad80ce471e86b553257e4ec3541313bc4245233f60a04dd3619", annotation_cni_projectcalico_org_pod_ip="10.233.105.3/32", annotation_cni_projectcalico_org_pod_ips="10.233.105.3/32", app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}
Many new labels prefixed with annotation_ appear.
Enabling these two flags increases the memory, CPU, and storage load on Prometheus. In my test environment the cluster had 2000 Pods, only 40 of which were in the Running state. With everything collected, Prometheus memory usage immediately rose by about 400 MB, as shown in the figure below. A Pod's status does not affect whether kube-state-metrics collects its metrics, so all 2000 Pods contribute to this cost, not just the 40 Running ones.
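The samples above show how Kubernetes keys become Prometheus label names. The label app.kubernetes.io/component turns into label_app_kubernetes_io_component, and the Calico annotation cni.projectcalico.org/podIP turns into annotation_cni_projectcalico_org_pod_ip. The Python sketch below reproduces that mapping under a rule inferred from these samples, not copied from the kube-state-metrics source. Characters outside [a-zA-Z0-9_] become _, an underscore is inserted where a lower-case letter or digit is followed by an upper-case letter, and the result is lower-cased. The function name prom_label_name is hypothetical.

```python
import re

INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")
CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def prom_label_name(prefix, key):
    """Approximate the Prometheus label name derived from a Kubernetes key.

    prefix is "label" for kube_pod_labels or "annotation" for
    kube_pod_annotations. The rule is inferred from observed output.
    """
    sanitized = INVALID_CHARS.sub("_", key)          # '.', '/', '-' -> '_'
    snake = CAMEL_BOUNDARY.sub(r"\1_\2", sanitized)  # podIP -> pod_IP
    return f"{prefix}_{snake.lower()}"               # pod_IP -> pod_ip


if __name__ == "__main__":
    for prefix, key in [
        ("label", "app.kubernetes.io/component"),
        ("label", "pod-template-hash"),
        ("annotation", "cluster-autoscaler.kubernetes.io/safe-to-evict"),
        ("annotation", "cni.projectcalico.org/podIP"),
    ]:
        print(key, "->", prom_label_name(prefix, key))
```

One consequence of this kind of mapping is that distinct keys such as a.b and a_b end up with the same label name. Keep that in mind when writing queries against the label_ and annotation_ labels.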
4. References
- https://github.com/kubernetes/kube-state-metrics/tree/master/docs
- https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md
- https://github.com/kubernetes/kube-state-metrics/blob/master/docs/cli-arguments.md