About CPUThrottlingHigh in kube-prometheus - Daily Ops

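The scenario we ran into: the CPUThrottlingHigh alert fires exactly as designed, yet the CPU usage of the affected container is not high, or the container is even idle. Because of this, we began to question whether the alert is necessary.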
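One commonly cited explanation is that the CFS quota is enforced per scheduling period (100 ms by default), so a short burst can exhaust the quota inside a single period even though average usage stays low. The sketch below is a toy simulation, not the kernel's actual accounting. The function `simulate` and all of its parameters are hypothetical. It assumes a 20 ms quota per period (what a 200m CPU limit would correspond to) and a 21 ms burst of work arriving every third period. It counts every elapsed period in the denominator; if idle periods were not counted, the ratio would only be higher.

```python
def simulate(quota_ms, burst_ms, burst_every, total_periods, period_ms=100):
    """Toy model of CFS bandwidth control for one container (assumption, not kernel code).

    Returns (fraction of periods throttled, average CPU usage in cores).
    """
    backlog_ms = 0          # CPU work waiting to run
    used_ms = 0             # CPU time actually consumed
    throttled_periods = 0
    for period in range(total_periods):
        if period % burst_every == 0:
            backlog_ms += burst_ms          # a new burst of work arrives
        run_ms = min(backlog_ms, quota_ms)  # cannot run more than the quota allows
        backlog_ms -= run_ms
        used_ms += run_ms
        if backlog_ms > 0:
            # Work was left over when the quota ran out: count the period as throttled
            throttled_periods += 1
    avg_cores = used_ms / (total_periods * period_ms)
    return throttled_periods / total_periods, avg_cores


ratio, cores = simulate(quota_ms=20, burst_ms=21, burst_every=3, total_periods=300)
print(f"throttled in {ratio:.0%} of periods, average usage {cores:.2f} cores")
```

Under these assumptions the container is throttled in about a third of the periods while averaging 0.07 cores, roughly 35% of its 0.2-core limit.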

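In many cases this alert ends up being modified or silenced, because the application is not latency-sensitive and works fine even when throttled, and because the alert is based on a cause rather than a symptom. That is why its severity is info. None of this means the alert is a false positive, though, and silencing it only hides the real problem behind it.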

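The issue is still under discussion. The debate is especially heated in kubernetes-mixin issue 108, and it continues in kubernetes issue 67577 (both are linked in the references at the end).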

The alert expression is as follows:

sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace) / sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace) > ( 25 / 100 )
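To make the ratio concrete, here is a minimal sketch of the arithmetic the expression performs for a single container. The function names and the sample counter values are made up for illustration, and the sketch ignores the extrapolation and counter-reset handling that Prometheus's increase() applies.

```python
def throttling_ratio(throttled_start, throttled_end, periods_start, periods_end):
    """Fraction of CFS periods in the window during which the container was throttled."""
    throttled = throttled_end - throttled_start   # increase of throttled periods
    periods = periods_end - periods_start         # increase of total periods
    if periods <= 0:
        return None  # no periods recorded in the window, so there is no ratio to evaluate
    return throttled / periods


def would_fire(ratio, threshold_percent=25):
    """Apply the same comparison as the alert: ratio > threshold / 100."""
    return ratio is not None and ratio > threshold_percent / 100


# Hypothetical counter samples taken 5 minutes apart
ratio = throttling_ratio(1200, 1500, 40000, 41000)
print(ratio, would_fire(ratio, 25), would_fire(ratio, 75))
```

In this example 300 of 1000 periods were throttled, so the condition holds at the default 25% threshold but not at 75%. The alert itself additionally requires the condition to hold for the configured `for` duration.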

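So far we have collected a few straightforward ways to deal with it:

1. Change the alert threshold, or disable the alert

2. Remove or adjust the CPU limits on the affected pods

3. Use kernel 4.18 or later

4. Disable Kubernetes CFS quota entirely (kubelet flag --cpu-cfs-quota=false)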

We tried changing the threshold by editing the rule:

kubectl -n monitoring edit PrometheusRule prometheus-k8s-rules

Change the CPUThrottlingHigh rule to:

    - alert: CPUThrottlingHigh
      annotations:
        description: '{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.'
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/cputhrottlinghigh
        summary: Processes experience elevated CPU throttling.
      # Threshold raised from 25% to 75%
      expr: |
        sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace) / sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace) > ( 75 / 100 )
      for: 15m
      labels:
        severity: info

Other related references:

https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/108
https://github.com/prometheus-operator/prometheus-operator/issues/2063
https://github.com/kubernetes/kubernetes/issues/67577
https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/
https://bugzilla.kernel.org/show_bug.cgi?id=198197
https://github.com/torvalds/linux/commit/512ac999d2755d2b7109e996a76b6fb8b888631d
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
https://github.com/prometheus-operator/kube-prometheus/issues/214
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/453
https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/b71dd35c6a1d509a1ee902eebe7afe943d8ee4b0/alerts/resource_alerts.libsonnet#L13
https://www.youtube.com/watch?v=UE7QX98-kO0
https://github.com/prometheus-operator/kube-prometheus/issues/861
https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/components/alertmanager.libsonnet#L26-L42
https://devops.stackexchange.com/questions/6494/prometheus-alert-cputhrottlinghigh-raised-but-monitoring-does-not-show-it
