Monitoring Windows with Prometheus and windows_exporter, with Alerting Rules

July 14, 2023

  • 1. Download the one-click installer
https://github.com/prometheus-community/windows_exporter/releases/download/v0.15.0/windows_exporter-0.15.0-amd64.msi

# Mirror download URLs
https://d.frps.cn/file/tools/windows/windows_exporter-0.15.0-amd64.msi
https://d.frps.cn/file/tools/windows/windows_exporter-0.21.0-amd64.msi
https://d.frps.cn/file/tools/windows/windows_exporter-0.22.0-amd64.msi
  • 2. Manual installation from the command line

Project page: https://github.com/prometheus-community/windows_exporter

msiexec /i <path-to-msi-file> ENABLED_COLLECTORS=os,service --% EXTRA_FLAGS="--collector.service.services-where ""Name LIKE 'sql%'"""

# Replace <path-to-msi-file> with the path to your MSI file
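The quoting in that command is easy to get wrong, because every double quote inside the EXTRA_FLAGS value has to be doubled. As a rough sketch, the Python helper below assembles the same command line from its parts. The function name `build_msiexec_command` is my own and is not part of windows_exporter; it only builds a string and does not run the installer.

```python
def build_msiexec_command(msi_path, collectors, extra_flags=""):
    """Build an msiexec command line for the windows_exporter MSI.

    Hypothetical helper for illustration: it only formats a string.
    """
    # Quote the path only when it contains spaces.
    path = f'"{msi_path}"' if " " in msi_path else msi_path
    parts = ["msiexec", "/i", path, "ENABLED_COLLECTORS=" + ",".join(collectors)]
    if extra_flags:
        # Double quotes inside the EXTRA_FLAGS value are escaped by doubling them.
        escaped = extra_flags.replace('"', '""')
        # --% is the PowerShell stop-parsing token used in the command above.
        parts += ["--%", f'EXTRA_FLAGS="{escaped}"']
    return " ".join(parts)


command = build_msiexec_command(
    "<path-to-msi-file>",
    ["os", "service"],
    "--collector.service.services-where \"Name LIKE 'sql%'\"",
)
print(command)
```

With the arguments shown, the printed string is identical to the command above, so you can change the collector list or the WHERE clause and copy the result.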

The project ships an MSI package, so I install from the package here.

Double-click the package to install it.

  • Default install directory: C:\Program Files (x86)\windows_exporter.
  • Default listening port: 9182.
  • Default collectors: cpu, cpu_info, memory, process, tcp, cs, logical_disk, net, os, system, textfile, time.

After installation, the newly created service appears in the Windows Services console.

Open Services

Find windows_exporter in the list

Check that the service is working

Open this address in a browser: http://localhost:9182/metrics
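The same check can be scripted. Here is a minimal sketch, assuming Python 3 on the Windows host and the default address above. The helper names `parse_collector_success` and `fetch_metrics` are my own and not part of windows_exporter. It reads the text exposition output and reports any collector whose windows_exporter_collector_success value is not 1.

```python
import re
import urllib.request

# Assumption: the exporter listens on the default address used in this article.
METRICS_URL = "http://localhost:9182/metrics"

# Matches lines such as: windows_exporter_collector_success{collector="cpu"} 1
_SUCCESS_RE = re.compile(
    r'^windows_exporter_collector_success\{collector="([^"]+)"\}\s+(\S+)$'
)


def parse_collector_success(text):
    """Return {collector_name: succeeded} parsed from exposition-format text."""
    result = {}
    for line in text.splitlines():
        match = _SUCCESS_RE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2)) == 1.0
    return result


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    status = parse_collector_success(fetch_metrics())
    failed = sorted(name for name, ok in status.items() if not ok)
    print("collectors:", ", ".join(sorted(status)))
    print("failed:", ", ".join(failed) if failed else "none")
```

If the script cannot connect at all, the service is probably stopped or the port is blocked by the firewall.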

Add the target to Prometheus

  - job_name: 'windowsJ4125_server'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - 'dsm.frps.cn:9182'

# Replace the target with the IP or hostname of your windows_exporter host
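Once Prometheus has picked up the new job, you can confirm the target is being scraped by asking the Prometheus HTTP API for the `up` metric. This is only a sketch: the Prometheus address `http://localhost:9090` is my assumption (the article does not state it), and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at this address; adjust to your setup.
PROMETHEUS_URL = "http://localhost:9090"
JOB_NAME = "windowsJ4125_server"  # job_name from the scrape config above


def build_up_query_url(base_url, job):
    """Build an instant-query URL for up{job="<job>"}."""
    query = f'up{{job="{job}"}}'
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def targets_up(response):
    """Map each instance label to True when its `up` sample equals 1."""
    return {
        sample["metric"].get("instance", "?"): float(sample["value"][1]) == 1.0
        for sample in response["data"]["result"]
    }


if __name__ == "__main__":
    url = build_up_query_url(PROMETHEUS_URL, JOB_NAME)
    with urllib.request.urlopen(url, timeout=5) as resp:
        print(targets_up(json.load(resp)))
```

An empty result means Prometheus has no such job yet, which usually points to a config that was not reloaded.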

Add the alerting rules (loaded by Prometheus, delivered through Alertmanager)

groups:
    - name: Windows status
      rules:
      - alert: WindowsServerCollectorError
        expr: windows_exporter_collector_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Windows Server collector Error (instance {{ $labels.instance }})
          description: "Collector {{ $labels.collector }} was not successful\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: WindowsServerServiceStatus
        expr: windows_service_status{status="ok"} != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Windows Server service Status (instance {{ $labels.instance }})
          description: "Windows Service state is not OK\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: WindowsServerCpuUsage
        expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Windows Server CPU Usage (instance {{ $labels.instance }})
          description: "CPU Usage is more than 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: WindowsServerMemoryUsage
        expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 90
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Windows Server memory Usage (instance {{ $labels.instance }})
          description: "Memory usage is more than 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: WindowsServerDiskSpaceUsage
        expr: 100 - 100 * (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) > 80
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Windows Server disk Space Usage (instance {{ $labels.instance }})
          description: "Disk usage is more than 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
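To make the thresholds concrete, the sketch below redoes the arithmetic of the CPU, memory and disk expressions in plain Python. All sample values are made up for illustration, and the function names are hypothetical. It does not evaluate PromQL; it only mirrors the formulas.

```python
GIB = 1024 ** 3


def used_percent(free_bytes, total_bytes):
    """Percentage in use, mirroring 100 - (free / total) * 100 from the rules."""
    return 100.0 - (free_bytes / total_bytes) * 100.0


def cpu_busy_percent(idle_rates):
    """Mirror 100 - avg(rate(idle)) * 100, given per-core idle rates (0..1)."""
    return 100.0 - (sum(idle_rates) / len(idle_rates)) * 100.0


# Hypothetical host: 16 GiB of RAM with 1 GiB free.
memory_used = used_percent(1 * GIB, 16 * GIB)
# Hypothetical volume: 500 GiB in size with 150 GiB free.
disk_used = used_percent(150 * GIB, 500 * GIB)
# Hypothetical 4-core host whose cores were idle 10-20% of the last 2 minutes.
cpu_busy = cpu_busy_percent([0.1, 0.2, 0.1, 0.2])

print(f"memory used: {memory_used:.2f}%  fires (> 90): {memory_used > 90}")
print(f"disk used:   {disk_used:.2f}%  fires (> 80): {disk_used > 80}")
print(f"cpu busy:    {cpu_busy:.2f}%  fires (> 80): {cpu_busy > 80}")
```

In this example the memory rule (93.75% used) and the CPU rule (about 85% busy) would fire, while the disk rule (about 70% used) would not.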

Restart Prometheus and Alertmanager

[root@prometheus rules]# docker restart alertmanager
alertmanager
[root@prometheus rules]# docker restart prometheus_new
prometheus_new

The windows_exporter rules are now loaded.
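If you prefer to verify this without the web UI, the Prometheus HTTP API lists the loaded rules at /api/v1/rules. The following sketch compares that list with the five alert names defined above. The Prometheus address is my assumption, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Prometheus is reachable at this address; adjust to your setup.
PROMETHEUS_URL = "http://localhost:9090"

# Alert names taken from the rule file in this article.
EXPECTED_ALERTS = {
    "WindowsServerCollectorError",
    "WindowsServerServiceStatus",
    "WindowsServerCpuUsage",
    "WindowsServerMemoryUsage",
    "WindowsServerDiskSpaceUsage",
}


def loaded_alert_names(rules_response):
    """Collect the names of all alerting rules in an /api/v1/rules response."""
    names = set()
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting":
                names.add(rule["name"])
    return names


def missing_alerts(rules_response, expected=EXPECTED_ALERTS):
    """Return the expected alert names that Prometheus has not loaded."""
    return sorted(set(expected) - loaded_alert_names(rules_response))


if __name__ == "__main__":
    with urllib.request.urlopen(f"{PROMETHEUS_URL}/api/v1/rules", timeout=5) as resp:
        missing = missing_alerts(json.load(resp))
    print("missing alerts:", ", ".join(missing) if missing else "none")
```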

Add a Grafana dashboard

Simply import the dashboard with the ID below:

14694

If some parts of this dashboard show no data, adjust the configuration in Grafana.

Choose Edit

The dashboard now shows data.
