How to View Tekton Pipeline Metrics

January 4, 2023

1. Scraping Tekton Metrics

  • Add a ConfigMap
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
    metrics.backend-destination: prometheus
    metrics.taskrun.level: "task"
    metrics.taskrun.duration-type: "histogram"
    metrics.pipelinerun.level: "pipeline"
    metrics.pipelinerun.duration-type: "histogram"
EOF

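Changing the settings under data changes the granularity of the reported metrics. Finer granularity means more label values and therefore more time series, which can seriously degrade Prometheus performance, so modify these settings with care.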
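To get a feel for why the level settings matter, the rough Python sketch below estimates how many time series a single histogram metric produces. It assumes that every distinct label combination yields one series per bucket plus one _sum and one _count series. The bucket count of 13 and the figure of 50 pipelines are made-up values for illustration; 7540 is the number of successful pipeline runs reported later in this article.

```python
def histogram_series(label_combinations: int, bucket_count: int) -> int:
    """Estimate the series count for one Prometheus histogram metric.

    Each label combination produces one series per bucket (including +Inf),
    plus one _sum series and one _count series.
    """
    return label_combinations * (bucket_count + 2)


# Assumed values, for illustration only.
BUCKETS = 13  # hypothetical bucket count, including the +Inf bucket

# Pipeline-level labels: 50 hypothetical pipelines x 2 statuses.
per_pipeline = histogram_series(50 * 2, BUCKETS)

# Run-level labels: every pipelinerun becomes its own label combination.
per_run = histogram_series(7540, BUCKETS)

print(per_pipeline)  # 1500
print(per_run)       # 113100
```

Under these assumptions, run-level labels produce roughly 75 times as many series as pipeline-level labels, and the number keeps growing with every new run.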

  • Restart Tekton
kubectl -n tekton-pipelines rollout restart deployment tekton-pipelines-controller
  • [Optional] Change the tekton-pipelines-controller Service to NodePort to view the metrics
kubectl -n tekton-pipelines patch svc tekton-pipelines-controller -p '{"spec": {"type": "NodePort"}}'

Now run kubectl -n tekton-pipelines get svc tekton-pipelines-controller to find the assigned NodePort; the metrics can then be viewed at host IP:NodePort. If a Prometheus instance outside the cluster does the scraping, it can use IP:NodePort directly.
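As a quick check that the endpoint is reachable, a small Python sketch such as the following could download the metrics page and keep only the Tekton controller samples. The function names are made up for this example, and the /metrics path is assumed from the default scrape path mentioned in this article.

```python
from urllib.request import urlopen

# Prefix shared by the controller metrics shown in this article.
PREFIX = "tekton_pipelines_controller_"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the plain-text metrics page from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def tekton_samples(text: str) -> list[str]:
    """Keep only sample lines of Tekton controller metrics.

    Comment lines (# HELP, # TYPE) and metrics from other subsystems
    do not start with the prefix and are dropped.
    """
    return [line for line in text.splitlines() if line.startswith(PREFIX)]
```

A call would look like tekton_samples(fetch_metrics("http://<node-ip>:<node-port>/metrics")), where both placeholders come from the kubectl output above. The Service exposes more than one port, so use the NodePort that maps to the metrics port (9090 in this article).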

  • Deploy a Prometheus instance inside the cluster with Helm

See the article "Setting Up Kubernetes Monitoring with Prometheus and Grafana".

helm -n monitor list

NAME      	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART            	APP VERSION
prometheus	monitor  	1       	2022-03-17 14:39:38.743741 +0800 CST	deployed	prometheus-15.3.0	2.31.1     
  • Annotate the Service so that Prometheus scrapes it automatically
kubectl -n tekton-pipelines edit svc tekton-pipelines-controller
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"

prometheus.io/path: /metrics and prometheus.io/port: "9090" are the default values, so they can be omitted from the annotations.

  • View the metrics in Prometheus

tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", le="+Inf", namespace="tekton-pipelines", node="node4", pipeline_tekton_dev_release="v0.24.1", service="tekton-pipelines-controller", version="v0.24.1"}

The above is a simple example. The metrics carry namespace- and pipeline-related labels that can be used for filtering.
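To illustrate label-based filtering outside of Prometheus, here is a minimal Python sketch that splits a sample line into its name, labels, and value. The helper name parse_sample is made up for this example, the sample lines are shortened versions of the ones in this article, and the parser assumes every line has a {...} label section without escaped quotes inside label values.

```python
import re

# Matches one label pair such as namespace="asimov".
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str):
    """Split 'name{labels} value' into (name, labels dict, value or None)."""
    name, _, rest = line.partition("{")
    labels_part, _, value_part = rest.rpartition("}")
    labels = dict(LABEL_RE.findall(labels_part))
    value = float(value_part) if value_part.strip() else None
    return name.strip(), labels, value


# Shortened sample lines, for illustration only.
lines = [
    'tekton_pipelines_controller_pipelinerun_count{status="success", version="v0.24.1"} 7540',
    'tekton_pipelines_controller_taskruns_pod_latency{namespace="asimov", task="git-clone"} 3000000000',
]
samples = [parse_sample(line) for line in lines]

# Keep only the samples that belong to the "asimov" namespace.
asimov = [s for s in samples if s[1].get("namespace") == "asimov"]
print(asimov[0][0])  # tekton_pipelines_controller_taskruns_pod_latency
```

In Prometheus itself the same filter is expressed with a label matcher inside the braces of the query.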

2. Metrics Exposed by Tekton

2.1 tekton_pipelines_controller_pipelinerun_duration_seconds_[bucket, sum, count]

tekton_pipelines_controller_pipelinerun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13

tekton_pipelines_controller_pipelinerun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 494

tekton_pipelines_controller_pipelinerun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13

This is a Histogram metric. The query histogram_quantile(1, tekton_pipelines_controller_pipelinerun_duration_seconds_bucket) gives the approximate execution time of a pipelinerun, and count by (namespace, pipeline) (tekton_pipelines_controller_pipelinerun_duration_seconds_sum) gives how many times a pipeline has run, because with the pipelinerun label present every run has its own series. histogram_quantile can also be used to work out how long it takes for a given percentage of pipeline runs to finish.
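The result of histogram_quantile is only approximate because it is interpolated from cumulative bucket counts. The Python sketch below is a simplified re-implementation of that idea, not Prometheus's actual code: it covers classic histograms with non-negative bucket bounds only, and the bucket bounds and counts in the example are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The bucket list must contain the +Inf bucket (math.inf). Inside the
    bucket that holds the target rank, values are assumed to be evenly
    spread, so the result is a linear interpolation.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls into the +Inf bucket: the best available
                # answer is the highest finite bucket bound.
                return buckets[-2][0]
            if count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical pipelinerun durations: 13 runs in total.
durations = [(10, 2), (30, 8), (60, 12), (math.inf, 13)]

print(histogram_quantile(0.5, durations))  # 25.0
print(histogram_quantile(1, durations))    # 60
```

In this example the slowest run took longer than 60 seconds, yet the quantile for q = 1 is reported as 60, the highest finite bucket bound. This is why the value should be read as a rough execution time rather than an exact one.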

2.2 tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_[bucket, sum, count]

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1461438

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 18

This is a Histogram metric. For usage, see the tekton_pipelines_controller_pipelinerun_duration_seconds_ metric above.

2.3 tekton_pipelines_controller_pipelinerun_count

tekton_pipelines_controller_pipelinerun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 7540

Across the whole cluster, pipelines have completed successfully 7540 times in total.

2.4 tekton_pipelines_controller_running_pipelineruns_count

tekton_pipelines_controller_running_pipelineruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1

Across the whole cluster, 1 pipeline is currently running.

2.5 tekton_pipelines_controller_taskrun_duration_seconds_[bucket, sum, count]

These metrics only appear when tasks are run directly as a taskrun rather than through a pipelinerun.

2.6 tekton_pipelines_controller_taskrun_count

tekton_pipelines_controller_taskrun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 43423

Across the whole cluster, taskruns have completed successfully 43423 times.

2.7 tekton_pipelines_controller_running_taskruns_count

tekton_pipelines_controller_running_taskruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1

Across the whole cluster, 1 taskrun is currently running.

2.8 tekton_pipelines_controller_taskruns_pod_latency

tekton_pipelines_controller_taskruns_pod_latency{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline_tekton_dev_release="v0.24.1", pod="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k-pod-sh4vp", task="git-clone", taskrun="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k", version="v0.24.1"} 3000000000

For the taskrun p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k, the startup latency of the Pod it created is 3000000000. A Pod startup delay is normally on the order of seconds, so the unit is presumably nanoseconds, which makes this value 3 seconds.
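The unit conversion can be written out as a short Python sketch. It relies on the same assumption as the text above, namely that the raw value is in nanoseconds; the function name is made up for this example.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def pod_latency_seconds(raw_value: float) -> float:
    """Convert a raw pod latency value, assumed to be in nanoseconds, to seconds."""
    return raw_value / NANOSECONDS_PER_SECOND


print(pod_latency_seconds(3000000000))  # 3.0
```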

  • tekton_pipelines_controller_cloudevent_count

tekton_pipelines_controller_cloudevent_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 0

Tekton can integrate with CloudEvents and send its events there to be broadcast.

2.9 tekton_pipelines_controller_client_latency_[bucket, sum, count]

Tekton uses a client to interact with the Kubernetes API server.

tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="1", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 11627

  • le="0.1" 10019: 10019 requests were handled within 0.1 seconds
  • le="1" 11627: 11627 requests were handled within 1 second
  • le="10" 11633: 11633 requests were handled within 10 seconds

Other related metrics:

  • tekton_pipelines_controller_client_latency_sum
  • tekton_pipelines_controller_client_latency_count
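Because histogram buckets are cumulative, each le count includes all faster requests as well. The Python sketch below, using the three bucket values quoted above and a made-up helper name, subtracts neighbouring buckets to show how many requests fall into each latency interval.

```python
def per_bucket_counts(cumulative):
    """Convert cumulative (le, count) pairs into (lower, upper, count) intervals."""
    intervals = []
    prev_bound, prev_count = 0.0, 0
    for bound, count in sorted(cumulative):
        intervals.append((prev_bound, bound, count - prev_count))
        prev_bound, prev_count = bound, count
    return intervals


# Cumulative client latency buckets quoted in this article.
client_latency = [(0.1, 10019), (1.0, 11627), (10.0, 11633)]
intervals = per_bucket_counts(client_latency)

for lower, upper, count in intervals:
    print(f"({lower}, {upper}] seconds: {count} requests")
```

This yields 10019 requests at or below 0.1 seconds, 1608 between 0.1 and 1 second, and 6 between 1 and 10 seconds, so the large majority of API calls in this sample finished within 0.1 seconds.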

3. Grafana Dashboard

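Based on the metrics described above, I built a Tekton Overview dashboard for Grafana. Link: https://grafana.com/grafana/dashboards/16559-tekton-overview

[Screenshots of the dashboard panels]

If you want to use this dashboard too, do not forget to enable label collection. See: "How to Collect Labels and Annotations of Kubernetes Objects".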

4. References

  • https://tekton.dev/docs/pipelines/metrics/
  • https://ish-ar.io/tekton-and-prometheus/
  • https://github.com/tektoncd/pipeline/blob/main/docs/metrics.md
  • https://grafana.com/grafana/dashboards/16559-tekton-overview
