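1. Scraping Tekton Metrics
- Add the ConfigMap configuration
Changing the settings under data alters the granularity of the reported metrics and can even severely degrade Prometheus performance, so modify them with care.
- Restart Tekton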
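- [Optional] Expose tekton-pipelines-controller as a NodePort Service to view the metrics
Look up the assigned port with kubectl -n tekton-pipelines get svc tekton-pipelines-controller; the metrics are then reachable at <host IP>:<NodePort>. If a Prometheus instance outside the cluster does the scraping, it can use IP:NodePort directly.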
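As a quick check that the NodePort is serving metrics, the page can be fetched and reduced to the Tekton samples. The following is a minimal sketch, not part of the original setup: the function names are hypothetical, and the host and port are placeholders to be filled in from the kubectl output.

```python
from urllib.request import urlopen

# Prefix shared by all metric names discussed in this article.
PREFIX = "tekton_pipelines_controller_"


def tekton_samples(exposition_text: str) -> list[str]:
    """Keep only Tekton sample lines; '# HELP' / '# TYPE' comments are skipped."""
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(PREFIX)
    ]


def fetch_metrics(host: str, node_port: int, timeout: float = 5.0) -> str:
    """Download the raw /metrics page (host and node_port are placeholders)."""
    with urlopen(f"http://{host}:{node_port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Offline demo on a hand-written snippet; against a real cluster,
    # pass the output of fetch_metrics(...) instead.
    sample = "\n".join([
        "# TYPE tekton_pipelines_controller_running_taskruns_count gauge",
        "tekton_pipelines_controller_running_taskruns_count 1",
        "go_goroutines 42",
    ])
    for line in tekton_samples(sample):
        print(line)
```

Filtering on the metric-name prefix keeps the output short, since the controller also exports generic Go runtime metrics on the same page.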
- Deploy a Prometheus instance inside the cluster with Helm
See "Setting Up Kubernetes Monitoring with Prometheus and Grafana".
- Configure the Service so that Prometheus scrapes it automatically
prometheus.io/path: /metrics and prometheus.io/port: "9090" are the default values and can be omitted from the annotations.
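One way to apply such annotations is a merge patch on the Service. The sketch below only builds and prints the kubectl command. It assumes the common prometheus.io/scrape: "true" convention, which the kubernetes-service-endpoints job of the community Prometheus Helm chart relies on; that annotation, and the helper name scrape_patch, are assumptions that do not appear in the text above.

```python
import json

SERVICE = "tekton-pipelines-controller"
NAMESPACE = "tekton-pipelines"


def scrape_patch(path: str = "/metrics", port: str = "9090") -> dict:
    """Build a merge patch that adds prometheus.io/* annotations to a Service."""
    return {
        "metadata": {
            "annotations": {
                # Assumed opt-in annotation for annotation-based discovery.
                "prometheus.io/scrape": "true",
                "prometheus.io/path": path,
                # Annotation values must be strings, hence "9090" and not 9090.
                "prometheus.io/port": port,
            }
        }
    }


if __name__ == "__main__":
    patch = json.dumps(scrape_patch())
    # Print the command instead of running it, so it can be reviewed first.
    print(f"kubectl -n {NAMESPACE} patch svc {SERVICE} --type merge -p '{patch}'")
```

Printing rather than executing keeps the sketch side-effect free; the same dictionary could equally be written into the Service manifest by hand.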
- View the metrics in Prometheus
tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", le="+Inf", namespace="tekton-pipelines", node="node4", pipeline_tekton_dev_release="v0.24.1", service="tekton-pipelines-controller", version="v0.24.1"}
This is a simple example. The metrics carry labels for the namespace and the pipeline, which can be used for filtering.
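To illustrate label-based filtering, here is a small sketch that renders a PromQL selector from label values and wraps it in an instant-query URL for the Prometheus HTTP API (/api/v1/query). The helper names and the example base URL are hypothetical; only the label names come from the samples in this article.

```python
from urllib.parse import urlencode


def selector(metric: str, **labels: str) -> str:
    """Render metric{label="value", ...} using exact-match label matchers."""
    if not labels:
        return metric
    matchers = ", ".join(
        '{}="{}"'.format(name, value.replace("\\", "\\\\").replace('"', '\\"'))
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    expr = selector(
        "tekton_pipelines_controller_pipelinerun_duration_seconds_count",
        namespace="asimov",
        status="success",
    )
    print(expr)
    # "http://prometheus.example:9090" is a placeholder address.
    print(query_url("http://prometheus.example:9090", expr))
```

The rendered selector can also be pasted directly into the Prometheus expression browser.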
2. Which Metrics Tekton Exposes
2.1 tekton_pipelines_controller_pipelinerun_duration_seconds_[bucket, sum, count]
tekton_pipelines_controller_pipelinerun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13
tekton_pipelines_controller_pipelinerun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 494
tekton_pipelines_controller_pipelinerun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13
This is a Histogram metric. histogram_quantile(1, tekton_pipelines_controller_pipelinerun_duration_seconds_bucket) gives the approximate duration of a pipelinerun. Because every pipelinerun has its own series, count by (namespace, pipeline) (tekton_pipelines_controller_pipelinerun_duration_seconds_sum) gives the number of times each pipeline has run. histogram_quantile can also be used with other quantiles to estimate how long it takes for a given percentage of pipeline runs to finish.
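To make the quantile estimate more concrete, the following sketch reimplements the core idea behind histogram_quantile for classic cumulative buckets: find the first bucket whose cumulative count reaches the target rank, then interpolate linearly inside it. It is a simplified illustration, not Prometheus's actual implementation, and the bucket boundaries in the demo are made up for the example (only the total of 13 matches the sample above).

```python
import math


def histogram_quantile(q: float, buckets: dict[float, float]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative counts keyed by le."""
    if not 0 < q <= 1:
        raise ValueError("q must be within (0, 1]")
    bounds = sorted(buckets)
    if len(bounds) < 2 or bounds[-1] != math.inf:
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[math.inf]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if bound == math.inf:
                # The rank falls beyond the last finite bound: report that bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            in_bucket = count - prev_count
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical bucket bounds in seconds; counts are cumulative.
    demo = {10: 2, 30: 6, 60: 13, math.inf: 13}
    print(histogram_quantile(1.0, demo))
    print(histogram_quantile(0.5, demo))
```

With q = 1 the result is the upper bound of the first bucket that already contains every observation, which is why the query above yields only an approximate duration: the precision is limited by the bucket boundaries.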
2.2 tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_[bucket, sum, count]
tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1
tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1461438
tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 18
This is a Histogram metric; it can be used in the same way as the tekton_pipelines_controller_pipelinerun_duration_seconds_ metrics above.
2.3 tekton_pipelines_controller_pipelinerun_count
tekton_pipelines_controller_pipelinerun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 7540
Across the whole cluster, pipelines have run successfully 7540 times in total.
2.4 tekton_pipelines_controller_running_pipelineruns_count
tekton_pipelines_controller_running_pipelineruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1
Across the whole cluster, 1 pipeline is currently running.
2.5 tekton_pipelines_controller_taskrun_duration_seconds_[bucket, sum, count]
These metrics are only reported when tasks are run directly as taskruns rather than through a pipelinerun.
2.6 tekton_pipelines_controller_taskrun_count
tekton_pipelines_controller_taskrun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 43423
Across the whole cluster, taskruns have run successfully 43423 times.
2.7 tekton_pipelines_controller_running_taskruns_count
tekton_pipelines_controller_running_taskruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1
Across the whole cluster, 1 taskrun is currently running.
2.8 tekton_pipelines_controller_taskruns_pod_latency
tekton_pipelines_controller_taskruns_pod_latency{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline_tekton_dev_release="v0.24.1", pod="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k-pod-sh4vp", task="git-clone", taskrun="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k", version="v0.24.1"} 3000000000
The Pod created for taskrun p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k had a startup latency of 3000000000. Pod startup takes on the order of seconds, so the unit is presumably nanoseconds, which makes this 3 seconds.
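The unit conversion can be spelled out as a one-line helper. This is a trivial sketch that takes the nanosecond interpretation above as an assumption; the function name is hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def pod_latency_seconds(raw_value: float) -> float:
    """Convert a raw taskruns_pod_latency sample, assumed to be in ns, to seconds."""
    return raw_value / NANOSECONDS_PER_SECOND


if __name__ == "__main__":
    print(pod_latency_seconds(3_000_000_000))  # prints 3.0
```

In a Grafana panel the same effect is achieved by dividing the metric by 1e9 in the query.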
- tekton_pipelines_controller_cloudevent_count
tekton_pipelines_controller_cloudevent_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 0
Tekton can integrate with CloudEvents and send events to it for broadcasting.
2.9 tekton_pipelines_controller_client_latency_[bucket, sum, count]
Tekton uses a client to talk to the Kubernetes API server.
tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="1", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 11627
- le="0.1" 10019: 10019 requests were handled within 0.1 seconds
- le="1" 11627: 11627 requests were handled within 1 second
- le="10" 11633: 11633 requests were handled within 10 seconds
Other related metrics:
- tekton_pipelines_controller_client_latency_sum
- tekton_pipelines_controller_client_latency_count
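Histogram buckets are cumulative, so each le count includes all faster requests as well. The sketch below (a hypothetical helper, using only the three bucket values quoted above) converts them into counts per latency interval.

```python
def per_bucket_counts(cumulative: dict[float, int]) -> dict[tuple[float, float], int]:
    """Turn cumulative le-bucket counts into counts per (lower, upper] interval."""
    result = {}
    lower, seen = 0.0, 0  # latencies are non-negative, so start at 0
    for upper in sorted(cumulative):
        result[(lower, upper)] = cumulative[upper] - seen
        lower, seen = upper, cumulative[upper]
    return result


if __name__ == "__main__":
    counts = per_bucket_counts({0.1: 10019, 1: 11627, 10: 11633})
    for (lower, upper), n in counts.items():
        print(f"({lower}, {upper}]: {n}")
```

For these values, 10019 requests finished within 0.1 seconds, 1608 took between 0.1 and 1 second, and only 6 took between 1 and 10 seconds.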
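3. Grafana Dashboard
Based on the metrics described above, I built a Grafana dashboard called Tekton Overview: https://grafana.com/grafana/dashboards/16559-tekton-overview
If you want to use this dashboard too, don't forget to enable label collection; see "How to Collect the Labels and Annotations of Kubernetes Objects".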
4. References
- https://tekton.dev/docs/pipelines/metrics/
- https://ish-ar.io/tekton-and-prometheus/
- https://github.com/tektoncd/pipeline/blob/main/docs/metrics.md
- https://grafana.com/grafana/dashboards/16559-tekton-overview