How to Collect Labels and Annotations from Kubernetes Objects

January 4, 2023

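1. Why kube-state-metrics is needed

Kubernetes monitoring focuses on two kinds of metrics:

  • Basic performance metrics

CPU, memory, disk, network, and similar metrics. These can be collected by deploying node-exporter as a DaemonSet and having Prometheus scrape it.

  • Resource object metrics

The replica count of a Deployment, the running state of a Pod, and so on. These metrics only become visible once kube-state-metrics queries the Kubernetes API and exposes the results to Prometheus.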

2. Which metrics kube-state-metrics provides by default

Metric categories include: CertificateSigningRequest, ConfigMap, CronJob, DaemonSet, Deployment, Endpoint, Horizontal Pod Autoscaler, Ingress, Job, Lease, LimitRange, MutatingWebhookConfiguration, Namespace, NetworkPolicy, Node, PersistentVolume, PersistentVolumeClaim, Pod Disruption Budget, Pod, ReplicaSet, ReplicationController, ResourceQuota, Secret, Service, StatefulSet, StorageClass, ValidatingWebhookConfiguration, VerticalPodAutoscaler, and VolumeAttachment.

Taking Pod as an example:

  • kube_pod_annotations
  • kube_pod_info
  • kube_pod_ips
  • kube_pod_start_time
  • kube_pod_completion_time
  • kube_pod_owner
  • kube_pod_labels
  • kube_pod_nodeselectors
  • kube_pod_status_phase
  • kube_pod_status_ready
  • kube_pod_status_scheduled
  • kube_pod_container_info
  • kube_pod_container_status_waiting
  • kube_pod_container_status_waiting_reason
  • kube_pod_container_status_running
  • kube_pod_container_state_started
  • kube_pod_container_status_terminated
  • kube_pod_container_status_terminated_reason
  • kube_pod_container_status_last_terminated_reason
  • kube_pod_container_status_ready
  • kube_pod_container_status_restarts_total
  • kube_pod_container_resource_requests
  • kube_pod_container_resource_limits
  • kube_pod_overhead_cpu_cores
  • kube_pod_overhead_memory_bytes
  • kube_pod_runtimeclass_name_info
  • kube_pod_created
  • kube_pod_deletion_timestamp
  • kube_pod_restart_policy
  • kube_pod_init_container_info
  • kube_pod_init_container_status_waiting
  • kube_pod_init_container_status_waiting_reason
  • kube_pod_init_container_status_running
  • kube_pod_init_container_status_terminated
  • kube_pod_init_container_status_terminated_reason
  • kube_pod_init_container_status_last_terminated_reason
  • kube_pod_init_container_status_ready
  • kube_pod_init_container_status_restarts_total
  • kube_pod_init_container_resource_limits
  • kube_pod_init_container_resource_requests
  • kube_pod_spec_volumes_persistentvolumeclaims_info
  • kube_pod_spec_volumes_persistentvolumeclaims_readonly
  • kube_pod_status_reason
  • kube_pod_status_scheduled_time
  • kube_pod_status_unschedulable

The available metrics are extensive and are largely sufficient for observing the running state of Kubernetes.

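3. How to collect labels and annotations

By default, the kube_pod_labels and kube_pod_annotations metrics contain only the name and namespace labels. To monitor more labels and annotations, use two kube-state-metrics startup arguments: --metric-labels-allowlist and --metric-annotations-allowlist. Note that older versions of kube-state-metrics do not fully support these two arguments; the configuration below uses version 2.4.2.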

      containers:
      - args:
        - --port=8080
        - --metric-labels-allowlist=pods=[*]
        - --metric-annotations-allowlist=pods=[*]
        - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
        - --telemetry-port=8081
        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.4.2
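
The `[*]` wildcard in the arguments above allows every label and annotation key on Pods. A narrower allowlist can be passed instead. The following is a minimal sketch of assembling such a flag value, assuming the `resource=[key,...]` syntax described in the kube-state-metrics CLI arguments reference. The helper name `build_allowlist_flag` is hypothetical and is not part of kube-state-metrics.

```python
from typing import Dict, List


def build_allowlist_flag(flag: str, allow: Dict[str, List[str]]) -> str:
    """Build an allowlist argument such as --metric-labels-allowlist=pods=[app].

    `allow` maps a plural resource name (e.g. "pods") to the label or
    annotation keys to expose. This helper is illustrative only.
    """
    parts = []
    for resource in sorted(allow):
        keys = allow[resource]
        if not keys:
            raise ValueError(f"no keys given for resource {resource!r}")
        # "*" means "allow everything", so it cannot be mixed with other keys.
        if "*" in keys and len(keys) > 1:
            raise ValueError("'*' must be the only entry for a resource")
        parts.append(f"{resource}=[{','.join(keys)}]")
    return f"--{flag}=" + ",".join(parts)


if __name__ == "__main__":
    # Expose only two Pod labels instead of all of them.
    print(build_allowlist_flag("metric-labels-allowlist", {"pods": ["app", "version"]}))
    # Equivalent of the wildcard used in the configuration above.
    print(build_allowlist_flag("metric-annotations-allowlist", {"pods": ["*"]}))
```

Listing only the keys that dashboards or alerts actually need keeps the number of extra labels per series small.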
  • Prepare a Pod as the observation target
kubectl -n tekton-pipelines get pod tekton-pipelines-controller-6f449d874b-mc7nl -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    cni.projectcalico.org/containerID: 8a505a530b501ad80ce471e86b553257e4ec3541313bc4245233f60a04dd3619
    cni.projectcalico.org/podIP: 10.233.105.3/32
    cni.projectcalico.org/podIPs: 10.233.105.3/32
  creationTimestamp: "2022-04-06T01:44:20Z"
  generateName: tekton-pipelines-controller-6f449d874b-
  labels:
    app: tekton-pipelines-controller
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: default
    app.kubernetes.io/name: controller
    app.kubernetes.io/part-of: tekton-pipelines
    app.kubernetes.io/version: v0.24.1
    pipeline.tekton.dev/release: v0.24.1
    pod-template-hash: 6f449d874b
    version: v0.24.1
  name: tekton-pipelines-controller-6f449d874b-mc7nl
  • Observe kube_pod_labels

Before enabling the kube-state-metrics flag:

kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

After enabling the kube-state-metrics flag:

kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", label_app="tekton-pipelines-controller", label_app_kubernetes_io_component="controller", label_app_kubernetes_io_instance="default", label_app_kubernetes_io_name="controller", label_app_kubernetes_io_part_of="tekton-pipelines", label_app_kubernetes_io_version="v0.24.1", label_pipeline_tekton_dev_release="v0.24.1", label_pod_template_hash="6f449d874b", label_version="v0.24.1", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

Many labels prefixed with label_ are added.
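
Comparing the Pod manifest with the metric output suggests how keys are renamed: characters such as `.`, `/`, and `-` become `_`, camelCase segments are split into snake_case, and a `label_` or `annotation_` prefix is added. The sketch below reproduces that mapping as inferred from the samples in this article only. It is an approximation, not the kube-state-metrics implementation, and the function name `to_metric_label` is hypothetical.

```python
import re


def to_metric_label(key: str, prefix: str = "label") -> str:
    """Approximate the metric label name produced for a Kubernetes key.

    Inferred from the sample output in this article; edge cases in the
    real kube-state-metrics code may behave differently.
    """
    # Split camelCase boundaries: "containerID" -> "container_ID".
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", key)
    # Replace every character that is not valid in a Prometheus label name.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", snake)
    return f"{prefix}_{cleaned.lower()}"


if __name__ == "__main__":
    print(to_metric_label("app.kubernetes.io/part-of"))
    print(to_metric_label("cni.projectcalico.org/containerID", prefix="annotation"))
```

This can help when writing PromQL selectors, since the label name to query differs from the key shown by kubectl.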

  • Observe kube_pod_annotations

Before enabling the kube-state-metrics flag:

kube_pod_annotations{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

After enabling the kube-state-metrics flag:

kube_pod_annotations{annotation_cluster_autoscaler_kubernetes_io_safe_to_evict="false", annotation_cni_projectcalico_org_container_id="8a505a530b501ad80ce471e86b553257e4ec3541313bc4245233f60a04dd3619", annotation_cni_projectcalico_org_pod_ip="10.233.105.3/32", annotation_cni_projectcalico_org_pod_ips="10.233.105.3/32", app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

Many labels prefixed with annotation_ are added.

Enabling these two flags puts additional pressure on Prometheus memory, CPU, and storage. In my test environment the cluster had 2000 Pods, of which only 40 were in the Running state; collecting everything made Prometheus memory consumption jump by roughly 400 MB almost immediately.

[Figure: Prometheus memory consumption]

A Pod's status does not affect whether kube-state-metrics collects its metrics.
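
Because the added labels are what drive the extra load, it can be useful to see how many of them a scrape actually carries before settling on an allowlist. The sketch below counts `label_` and `annotation_` label pairs per metric name in text saved from the kube-state-metrics `/metrics` endpoint. The function name `count_extra_labels` is hypothetical, and the parsing is a simplified reading of the Prometheus text exposition format rather than a full parser.

```python
import re
from collections import Counter

# One name="value" pair inside the braces of a sample line.
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def count_extra_labels(exposition_text: str) -> Counter:
    """Count label_/annotation_ label pairs per metric name."""
    counts: Counter = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and samples without labels.
        if not line or line.startswith("#") or "{" not in line:
            continue
        metric, rest = line.split("{", 1)
        for name, _value in LABEL_PAIR.findall(rest):
            if name.startswith(("label_", "annotation_")):
                counts[metric] += 1
    return counts


if __name__ == "__main__":
    sample = (
        'kube_pod_labels{namespace="tekton-pipelines",pod="p",'
        'label_app="x",label_version="v0.24.1"} 1\n'
        'kube_pod_annotations{namespace="tekton-pipelines",pod="p",'
        'annotation_a="false"} 1\n'
    )
    for metric, n in count_extra_labels(sample).items():
        print(metric, n)
```

Running it on scrapes taken with and without the wildcard gives a rough sense of how much each allowlist entry adds.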

4. References

  • https://github.com/kubernetes/kube-state-metrics/tree/master/docs
  • https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md
  • https://github.com/kubernetes/kube-state-metrics/blob/master/docs/cli-arguments.md
