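Blackbox exporter is the black-box monitoring solution provided by the Prometheus community. It lets users probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, actively checking the state of hosts and services.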
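Installing Prometheus and Grafana has been covered many times before. If you have not installed them yet, follow one of the articles below.
Docker version
Monitoring a MySQL database with Prometheus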
K8s version
Prometheus and Grafana with Ceph persistence, monitoring a k8s cluster
blackbox exporter
Installing Blackbox exporter

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.1/blackbox_exporter-0.21.1.linux-amd64.tar.gz
tar zxvf blackbox_exporter-0.21.1.linux-amd64.tar.gz
mkdir -p /usr/local/exporter
mv blackbox_exporter-0.21.1.linux-amd64 /usr/local/exporter/blackbox_exporter

# Write the config file
cat >/usr/local/exporter/blackbox_exporter/blackbox.yml<<EOF
modules:
  http_2xx:  # HTTP probe module; every probe in Blackbox Exporter is configured as a module
    prober: http
    timeout: 30s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]  # HTTP/2 is reported as "HTTP/2.0"
      valid_status_codes: [200]  # best to pin the expected status code here so it shows up clearly in Grafana graphs (note by 陈刚)
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:  # HTTP POST probe module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect:  # TCP probe module
    prober: tcp
    timeout: 10s
EOF

# Start it in the foreground
/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml
# If it starts without errors, you can stop it again
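Each Prometheus job later in this article refers to one of these modules by name, so it can be handy to list what a config actually defines. The helper below is only a rough sketch: `list_modules` is a name made up for this example, and it assumes the two-space indentation used in the config above instead of parsing YAML properly.

```shell
# list_modules: print the module names defined under "modules:" in a
# blackbox.yml read from stdin. Hypothetical helper, not part of the exporter.
list_modules() {
  awk '
    /^modules:/ { in_modules = 1; next }     # start of the modules map
    in_modules && /^  [A-Za-z0-9_]+:/ {      # keys indented by exactly two spaces
      name = $1
      sub(/:.*/, "", name)                   # drop the trailing colon
      print name
    }
  '
}

# Trimmed-down copy of the config above, used as sample input.
sample_config='modules:
  http_2xx:
    prober: http
    timeout: 30s
  http_post_2xx:
    prober: http
  tcp_connect:
    prober: tcp'

modules=$(printf '%s\n' "$sample_config" | list_modules)
printf '%s\n' "$modules"
```

Against the real file it would be called as `list_modules < /usr/local/exporter/blackbox_exporter/blackbox.yml`.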
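The test run worked, so now create a systemd unit file. The unit runs as the prometheus user and group, so both need to exist on the host.

cat >/usr/lib/systemd/system/blackbox_exporter.service<<EOF
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
WorkingDirectory=/usr/local/exporter/blackbox_exporter
ExecStart=/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml

[Install]
WantedBy=multi-user.target
EOF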
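Start and test

# Reload systemd so it picks up the new unit file
[root@abcdocker system]# systemctl daemon-reload
# Start
[root@abcdocker system]# systemctl restart blackbox_exporter
# Check the status
[root@abcdocker system]# systemctl status blackbox_exporter
# Enable at boot
[root@abcdocker system]# systemctl enable blackbox_exporter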
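The default port is 9115.
Alternatively, run the exporter with Docker, mounting the directory that holds blackbox.yml:

docker run --rm -d -p 9115:9115 --name blackbox_exporter \
  -v /usr/local/exporter/blackbox_exporter:/config \
  prom/blackbox-exporter:master --config.file=/config/blackbox.yml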
Check that the container is running

[root@prometheus blackbox_exporter]# docker ps|grep black
8c5302d44971   prom/blackbox-exporter:master   "/bin/blackbox_expor…"   52 seconds ago   Up 51 seconds   0.0.0.0:9115->9115/tcp   blackbox_exporter

Test the port

[root@prometheus blackbox_exporter]# curl 127.0.0.1:9115/metrics
# HELP blackbox_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which blackbox_exporter was built.
# TYPE blackbox_exporter_build_info gauge
blackbox_exporter_build_info{branch="master",goversion="go1.16.10",revision="70bff7941301753b125a40bcf6b3ed28935a9a94",version="0.19.0"} 1
# HELP blackbox_exporter_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6562274758327048e+09
# HELP blackbox_exporter_config_last_reload_successful Blackbox exporter config loaded successfully.
...
...
...
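When scripting this check, pulling a single value out of the `/metrics` output is often enough. The function below is a minimal sketch of that idea. `metric_value` is a hypothetical helper, and it assumes label values contain no spaces and that samples carry no trailing timestamp, which holds for the output shown above.

```shell
# metric_value NAME: read Prometheus text-format output on stdin and print the
# value of the first sample whose metric name (labels stripped) equals NAME.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                   # skip HELP and TYPE lines
    {
      metric = $1
      sub(/[{].*/, "", metric)      # strip the {label="..."} part, if any
      if (metric == name) { print $NF; exit }
    }
  '
}

# Shortened copy of the output above, used as sample input.
sample='# TYPE blackbox_exporter_build_info gauge
blackbox_exporter_build_info{branch="master",goversion="go1.16.10",version="0.19.0"} 1
# TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6562274758327048e+09'

build=$(printf '%s\n' "$sample" | metric_value blackbox_exporter_build_info)
reload_ts=$(printf '%s\n' "$sample" | metric_value blackbox_exporter_config_last_reload_success_timestamp_seconds)
printf 'build_info=%s last_reload=%s\n' "$build" "$reload_ts"
```

With a running exporter the input would come from `curl -s 127.0.0.1:9115/metrics` instead of the sample string.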
Prometheus monitoring configuration
Configuring the jobs in Prometheus
Edit the Prometheus config file

[root@prometheus ~]# cd /etc/prometheus/
[root@prometheus prometheus]# ls
alertmanager  prometheus.yml  prometheus.yml_bak_2022-06-20  rules
[root@prometheus prometheus]# vim prometheus.yml

Add the following jobs under scrape_configs
  - job_name: 'blackbox_http_2xx'
    metrics_path: /probe
    params:
      module: [http_2xx]  # probe with GET requests
    static_configs:
      - targets:
        - http://prometheus.io    # Target to probe with http.
        - https://i4t.com         # Target to probe with https.
        - https://ukx.cn
        - https://k.i4t.com
        - https://nas.frps.cn
        - https://esxi.frps.cn
        - https://rancher.frps.cn
        - https://jumpserver.frps.cn
        - https://frps.cn
        - https://imgkb.com
        - https://grafana.frps.cn
        - https://down.frps.cn
        - https://my.ukx.cn
        - https://linux.ukx.cn
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115  # blackbox exporter address and port

  - job_name: 'blackbox_tcp_connect'  # check whether certain ports are reachable
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - dsm.frps.cn:9091
        - dsm.frps.cn:1998
        - dsm.frps.cn:1999
        - apiserver.frps.cn:8443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115  # host and port where blackbox exporter runs
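The three relabel rules are easier to follow once you see the request they produce: the original target moves into the `target` query parameter and the `instance` label, and the scrape address is swapped for the exporter. As a rough illustration, the made-up helper `probe_url` below prints that URL. It is a simplification, since Prometheus also URL-encodes the parameter values, which this sketch skips.

```shell
# probe_url EXPORTER MODULE TARGET: print the URL Prometheus ends up scraping
# once the relabel_configs above are applied (values are not URL-encoded here).
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

url=$(probe_url 10.0.24.13:9115 http_2xx https://i4t.com)
printf '%s\n' "$url"

tcp_url=$(probe_url 10.0.24.13:9115 tcp_connect dsm.frps.cn:9091)
printf '%s\n' "$tcp_url"

# To try a probe by hand against a running exporter:
#   curl -s "$url" | grep probe_success
```

Requesting such a URL with curl is a quick way to debug a target before Prometheus scrapes it.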
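Restart Prometheus
Do not use a 127.x address for the blackbox exporter in `replacement`. If Prometheus runs in a container, 127.0.0.1 points at the Prometheus container itself, not at the host running blackbox exporter.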
Prometheus Blackbox parameters explained
The configurations below are demo examples only.
1. ICMP probe (host liveness)
A server's liveness can be checked with ping (ICMP). Configure an icmp module in blackbox.yml:

modules:
  icmp:
    prober: icmp

The Prometheus job looks like this

  - job_name: 'blackbox-ping'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - 172.16.106.208  # IP of the monitored host
        - 172.16.106.80
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where blackbox exporter runs
2. TCP probe (monitoring whether a host port is open)
Configure a tcp module in blackbox.yml:

modules:
  tcp_connect:
    prober: tcp

Prometheus job

  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 172.16.106.208:6443
        - 172.16.106.80:6443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where blackbox exporter runs
3. HTTP probe (monitoring website status)
The http prober is one of the most commonly used probers in black-box monitoring. It lets you monitor a website or HTTP service effectively, covering both its availability and user-experience metrics such as response time. Besides alerting promptly when the service misbehaves, it also helps ops engineers analyze and tune site performance.
Configure http modules in blackbox.yml:

modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST

Prometheus job

  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://i4t.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where blackbox exporter runs
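The `prober` option selects the prober type, and the `http` option customizes how the probe is carried out. Apart from `method`, nothing is set under `http` here, so everything else uses the http prober's defaults: the http_2xx module sends an HTTP GET to the target and checks whether the returned status code is 2xx. If it is, the probe succeeds. Otherwise it fails.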
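To make the difference between the default check and an explicit `valid_status_codes` list concrete, here is a small sketch of the decision in shell. `status_ok` is a hypothetical function written for this article. It mimics the rule described above and is not the exporter's own code.

```shell
# status_ok CODE [VALID...]: succeed if CODE passes the check.
# With no VALID codes given, any 2xx passes (the http prober default);
# otherwise CODE must be one of the listed codes, as with valid_status_codes.
status_ok() {
  code=$1
  shift
  if [ "$#" -eq 0 ]; then
    [ "$code" -ge 200 ] && [ "$code" -lt 300 ]
    return
  fi
  for valid in "$@"; do
    [ "$code" = "$valid" ] && return 0
  done
  return 1
}

status_ok 204 && echo "204 passes the default 2xx check"
status_ok 204 200 || echo "204 fails when valid_status_codes is [200]"
status_ok 301 || echo "301 fails the default 2xx check"
```

This is why pinning `valid_status_codes: [200]` is stricter than the default: a 204 response would count as a failed probe.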
The collected metrics look like this

# DNS lookup time, in seconds
probe_dns_lookup_time_seconds 0.000199105
# Duration of the whole probe from start to finish, in seconds (the response time for this page)
probe_duration_seconds 0.010889113
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# Length of the HTTP response content
probe_http_content_length -1
# Time spent in each phase of the request
probe_http_duration_seconds{phase="connect"} 0.001083728     # connection time
probe_http_duration_seconds{phase="processing"} 0.008365885  # time the server took to process the request
probe_http_duration_seconds{phase="resolve"} 0.000199105     # DNS resolution time
probe_http_duration_seconds{phase="tls"} 0                   # TLS handshake time (including certificate verification)
probe_http_duration_seconds{phase="transfer"} 0.000446424    # transfer time
# Number of redirects
probe_http_redirects 0
# Whether SSL was used for the final redirect
probe_http_ssl 0
# Returned status code
probe_http_status_code 200
# Length of the uncompressed response body
probe_http_uncompressed_body_length 1766
# HTTP protocol version
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
probe_ip_addr_hash 3.24030434e+09
# IP protocol version used
probe_ip_protocol 4
# Whether the probe succeeded
probe_success 1
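When a probe is slow, the per-phase values show where the time went. As an illustration only, the hypothetical helper `phase_summary` below adds up the `probe_http_duration_seconds` phases and names the slowest one. It expects plain exporter output, without the explanatory comments added to the listing above.

```shell
# phase_summary: read probe output on stdin, print the sum of all
# probe_http_duration_seconds phases and the name of the slowest phase.
phase_summary() {
  awk '
    /^probe_http_duration_seconds[{]phase="/ {
      phase = $1
      sub(/^.*phase="/, "", phase)   # keep only the phase name
      sub(/".*$/, "", phase)
      total += $2
      if ($2 > max) { max = $2; slowest = phase }
    }
    END { printf "total=%.9f slowest=%s\n", total, slowest }
  '
}

# Phase values copied from the listing above.
sample='probe_duration_seconds 0.010889113
probe_http_duration_seconds{phase="connect"} 0.001083728
probe_http_duration_seconds{phase="processing"} 0.008365885
probe_http_duration_seconds{phase="resolve"} 0.000199105
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0.000446424'

summary=$(printf '%s\n' "$sample" | phase_summary)
printf '%s\n' "$summary"
```

For this sample, most of the time falls into the processing phase, meaning the server side rather than DNS or the network.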
Grafana configuration
Recommended Grafana dashboards
AlertManager
The alerting configuration is shown below.
For installing alertmanager, see the article below. Here I only provide the rules.
AlertManager WeChat alert configuration
Setting up the alert rules

[root@prometheus rules]# cat /etc/prometheus/rules/blackbox_exporter.yaml
groups:
- name: Blackbox monitoring alerts
  rules:
  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Probe slower than 1 second (instance {{ $labels.instance }})
      description: "VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: BlackboxProbeHttpFailure
    expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
    for: 30m
    labels:
      severity: critical
    annotations:
      summary: HTTP status code (instance {{ $labels.instance }})
      description: "HTTP status code is not 200-399\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Domain certificate expiring soon (instance {{ $labels.instance }})
      description: "Domain certificate expires within 30 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
    for: 30m
    labels:
      severity: critical
    annotations:
      summary: Domain certificate expiring soon (instance {{ $labels.instance }})
      description: "Domain certificate expires within 7 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: BlackboxSslCertificateExpired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 30m
    labels:
      severity: critical
    annotations:
      summary: Domain certificate has expired (instance {{ $labels.instance }})
      description: "Domain certificate has expired\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  - alert: BlackboxProbeSlowHttp
    expr: avg_over_time(probe_http_duration_seconds[1m]) > 10
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: HTTP request too slow (instance {{ $labels.instance }})
      description: "HTTP request took longer than 10 seconds\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
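The certificate rules all compare `probe_ssl_earliest_cert_expiry - time()` against a number of seconds. To sanity-check the thresholds, the sketch below redoes that arithmetic in shell. `cert_severity` is a made-up helper that assumes whole-second timestamps and reports only the most severe matching level, whereas in Prometheus a certificate with under 7 days left matches both the warning and the critical rule.

```shell
# cert_severity EXPIRY NOW: classify a certificate using the thresholds of
# the rules above. EXPIRY stands for a probe_ssl_earliest_cert_expiry value,
# NOW for the current Unix time (both in whole seconds).
cert_severity() {
  remaining=$(( $1 - $2 ))
  if [ "$remaining" -le 0 ]; then
    echo expired
  elif [ "$remaining" -lt $(( 86400 * 7 )) ]; then
    echo critical
  elif [ "$remaining" -lt $(( 86400 * 30 )) ]; then
    echo warning
  else
    echo ok
  fi
}

now=1700000000   # fixed timestamp so the example is reproducible
cert_severity $(( now + 86400 * 45 )) "$now"   # 45 days left
cert_severity $(( now + 86400 * 10 )) "$now"   # 10 days left
cert_severity $(( now + 86400 * 3 )) "$now"    # 3 days left
cert_severity $(( now - 60 )) "$now"           # expired a minute ago
```

Keep in mind that the real alerts also carry `for: 30m`, so they only fire after the condition has held for half an hour.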
Restart Prometheus
docker restart prometheus_new
The targets now show up in Prometheus, and the WeChat alert has already fired.