How to Collect Labels and Annotations of Kubernetes Objects

January 4, 2023

1. Why kube-state-metrics is needed

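Kubernetes monitoring focuses on two kinds of metrics:

  • Basic performance metrics

CPU, memory, disk, network, and similar metrics. These can be collected by deploying node-exporter as a DaemonSet and having Prometheus scrape it.

  • Resource object metrics

The replica count of a Deployment, the running state of a Pod, and so on. To see these metrics, kube-state-metrics has to query the Kubernetes API for object state and expose the results to Prometheus.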

2. Which metrics kube-state-metrics provides by default

The metric categories are:

  • CertificateSigningRequest Metrics
  • ConfigMap Metrics
  • CronJob Metrics
  • DaemonSet Metrics
  • Deployment Metrics
  • Endpoint Metrics
  • Horizontal Pod Autoscaler Metrics
  • Ingress Metrics
  • Job Metrics
  • Lease Metrics
  • LimitRange Metrics
  • MutatingWebhookConfiguration Metrics
  • Namespace Metrics
  • NetworkPolicy Metrics
  • Node Metrics
  • PersistentVolume Metrics
  • PersistentVolumeClaim Metrics
  • Pod Disruption Budget Metrics
  • Pod Metrics
  • ReplicaSet Metrics
  • ReplicationController Metrics
  • ResourceQuota Metrics
  • Secret Metrics
  • Service Metrics
  • StatefulSet Metrics
  • StorageClass Metrics
  • ValidatingWebhookConfiguration Metrics
  • VerticalPodAutoscaler Metrics
  • VolumeAttachment Metrics

Taking Pod as an example, the metrics are:

  • kube_pod_annotations
  • kube_pod_info
  • kube_pod_ips
  • kube_pod_start_time
  • kube_pod_completion_time
  • kube_pod_owner
  • kube_pod_labels
  • kube_pod_nodeselectors
  • kube_pod_status_phase
  • kube_pod_status_ready
  • kube_pod_status_scheduled
  • kube_pod_container_info
  • kube_pod_container_status_waiting
  • kube_pod_container_status_waiting_reason
  • kube_pod_container_status_running
  • kube_pod_container_state_started
  • kube_pod_container_status_terminated
  • kube_pod_container_status_terminated_reason
  • kube_pod_container_status_last_terminated_reason
  • kube_pod_container_status_ready
  • kube_pod_container_status_restarts_total
  • kube_pod_container_resource_requests
  • kube_pod_container_resource_limits
  • kube_pod_overhead_cpu_cores
  • kube_pod_overhead_memory_bytes
  • kube_pod_runtimeclass_name_info
  • kube_pod_created
  • kube_pod_deletion_timestamp
  • kube_pod_restart_policy
  • kube_pod_init_container_info
  • kube_pod_init_container_status_waiting
  • kube_pod_init_container_status_waiting_reason
  • kube_pod_init_container_status_running
  • kube_pod_init_container_status_terminated
  • kube_pod_init_container_status_terminated_reason
  • kube_pod_init_container_status_last_terminated_reason
  • kube_pod_init_container_status_ready
  • kube_pod_init_container_status_restarts_total
  • kube_pod_init_container_resource_limits
  • kube_pod_init_container_resource_requests
  • kube_pod_spec_volumes_persistentvolumeclaims_info
  • kube_pod_spec_volumes_persistentvolumeclaims_readonly
  • kube_pod_status_reason
  • kube_pod_status_scheduled_time
  • kube_pod_status_unschedulable

The available metrics are extensive and are generally enough to observe the running state of Kubernetes.

3. How to collect labels and annotations

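By default, the kube_pod_labels and kube_pod_annotations metrics contain only the name and namespace labels. To expose more labels and annotations, use two kube-state-metrics startup flags: --metric-labels-allowlist and --metric-annotations-allowlist. Note that older versions of kube-state-metrics do not fully support these two flags; the configuration below uses version 2.4.2.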

      containers:
      - args:
        - --port=8080
        - --metric-labels-allowlist=pods=[*]
        - --metric-annotations-allowlist=pods=[*]
        - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
        - --telemetry-port=8081
        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.4.2
  • Prepare a Pod as the observation target
kubectl -n tekton-pipelines get pod tekton-pipelines-controller-6f449d874b-mc7nl -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    cni.projectcalico.org/containerID: 8a505a530b501ad80ce471e86b553257e4ec3541313bc4245233f60a04dd3619
    cni.projectcalico.org/podIP: 10.233.105.3/32
    cni.projectcalico.org/podIPs: 10.233.105.3/32
  creationTimestamp: "2022-04-06T01:44:20Z"
  generateName: tekton-pipelines-controller-6f449d874b-
  labels:
    app: tekton-pipelines-controller
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: default
    app.kubernetes.io/name: controller
    app.kubernetes.io/part-of: tekton-pipelines
    app.kubernetes.io/version: v0.24.1
    pipeline.tekton.dev/release: v0.24.1
    pod-template-hash: 6f449d874b
    version: v0.24.1
  name: tekton-pipelines-controller-6f449d874b-mc7nl
  • Observe kube_pod_labels

Before enabling the kube-state-metrics flags:

kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

After enabling the kube-state-metrics flags:

kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", label_app="tekton-pipelines-controller", label_app_kubernetes_io_component="controller", label_app_kubernetes_io_instance="default", label_app_kubernetes_io_name="controller", label_app_kubernetes_io_part_of="tekton-pipelines", label_app_kubernetes_io_version="v0.24.1", label_pipeline_tekton_dev_release="v0.24.1", label_pod_template_hash="6f449d874b", label_version="v0.24.1", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

Many additional labels prefixed with label_ appear.
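One common reason to export Pod labels is to join them onto other kube-state-metrics series. The following is a minimal sketch of a Prometheus recording rule, not part of the original setup: the group name and the recorded series name are hypothetical, and the join assumes the Pod namespace appears as exported_namespace, as in the samples above. It sums container restarts per value of the label_app label.

```yaml
groups:
- name: pod-label-join-example            # hypothetical group name
  rules:
  # Hypothetical recorded series name; pick one that fits your conventions.
  - record: label_app:kube_pod_container_status_restarts_total:sum
    expr: |
      sum by (label_app) (
        kube_pod_container_status_restarts_total
        # Many container series match one kube_pod_labels series per Pod;
        # group_left copies label_app from kube_pod_labels onto the result.
        * on (exported_namespace, pod) group_left (label_app)
        kube_pod_labels
      )
```

Since kube_pod_labels always has the value 1, the multiplication leaves the restart counts unchanged and only attaches the extra label.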

  • Observe kube_pod_annotations

Before enabling the kube-state-metrics flags:

kube_pod_annotations{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.96.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node2", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

After enabling the kube-state-metrics flags:

kube_pod_annotations{annotation_cluster_autoscaler_kubernetes_io_safe_to_evict="false", annotation_cni_projectcalico_org_container_id="8a505a530b501ad80ce471e86b553257e4ec3541313bc4245233f60a04dd3619", annotation_cni_projectcalico_org_pod_ip="10.233.105.3/32", annotation_cni_projectcalico_org_pod_ips="10.233.105.3/32", app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.3.0", exported_namespace="tekton-pipelines", helm_sh_chart="kube-state-metrics-4.4.3", instance="10.233.105.11:8080", job="kubernetes-service-endpoints", namespace="monitor", node="node4", pod="tekton-pipelines-controller-6f449d874b-mc7nl", service="prometheus-kube-state-metrics", uid="412f8383-1c5c-4f61-8198-453bdb204911"}

Many additional labels prefixed with annotation_ appear.

Enabling these two flags increases the load on Prometheus memory, CPU, and storage. In my test environment the cluster had 2000 Pods, of which only 40 were in the Running state; with everything collected, Prometheus memory consumption immediately rose by about 400 MB.

[Figure: Prometheus memory usage after enabling full collection]

A Pod's status does not affect whether kube-state-metrics collects its metrics, so all 2000 Pods contribute series, not just the 40 that are running.
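Given the memory cost of exporting everything, one option is to list only the keys you actually query instead of using [*]. The sketch below reuses the container spec from earlier; the choice of keys (app, app.kubernetes.io/name, and the safe-to-evict annotation) is an illustrative assumption, not the configuration used in the test above.

```yaml
      containers:
      - args:
        - --port=8080
        # Export only two Pod labels instead of every label (illustrative choice).
        - --metric-labels-allowlist=pods=[app,app.kubernetes.io/name]
        # Export a single Pod annotation instead of every annotation (illustrative choice).
        - --metric-annotations-allowlist=pods=[cluster-autoscaler.kubernetes.io/safe-to-evict]
        - --telemetry-port=8081
        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.4.2
```

With this configuration, kube_pod_labels would be expected to carry only label_app and label_app_kubernetes_io_name in addition to the default labels, which keeps the label sets stored in Prometheus much smaller.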

4. References

  • https://github.com/kubernetes/kube-state-metrics/tree/master/docs
  • https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md
  • https://github.com/kubernetes/kube-state-metrics/blob/master/docs/cli-arguments.md
