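Nginx exposes a small set of monitoring metrics through its stub_status page. Nginx Prometheus Exporter collects these metrics from a single Nginx instance, converts them into a format Prometheus understands, and exposes them over HTTP for the Prometheus server to scrape. With the exporter we can report the metrics we care about most and use them for alerting and dashboards.
This walkthrough uses the http_stub_status_module module as the data source.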
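Installing Nginx
If Nginx is not installed yet, I have an install script that you can run directly:

    wget https://d.frps.cn/file/tools/nginx/nginx_install.sh
    sh nginx_install.sh
    # The script will be updated over time; at the time of writing it installs nginx-1.22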
Configuring stub_status
After confirming that the stub_status module is enabled, edit the Nginx configuration file to define the URL of the status page:

    [root@ops conf.d]# cat nginx.conf
    server {
        listen 80;
        server_name localhost;

        location /nginx_status {
            stub_status;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }
    }
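The stub_status module is not part of a default source build; it has to be enabled with `--with-http_stub_status_module` at configure time (many distribution packages already include it). A minimal sketch for checking this, assuming the `nginx` binary is on the PATH; `has_stub_status` is a hypothetical helper name, not something shipped with Nginx:

```shell
# has_stub_status: read the output of `nginx -V` on stdin and succeed only
# if the stub_status module appears among the configure arguments.
has_stub_status() {
    grep -q -e '--with-http_stub_status_module'
}

# Typical use (nginx -V writes to stderr, hence the 2>&1):
#   nginx -V 2>&1 | has_stub_status && echo "stub_status is available"
```

If the check fails, the `stub_status` directive in the location block will be rejected when Nginx loads the configuration.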
Verify the configuration:

    [root@ops conf.d]# curl localhost/nginx_status
    Active connections: 1
    server accepts handled requests
     1 1 1
    Reading: 0 Writing: 1 Waiting: 0
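In this output, the three numbers under "server accepts handled requests" are cumulative counters since Nginx started, while Active, Reading, Writing and Waiting describe the current connections. The following sketch turns the page into key=value pairs, which can be handy for a quick check before the exporter is in place; `parse_stub_status` is a hypothetical helper name and the parsing assumes the exact layout shown above:

```shell
# parse_stub_status: read stub_status text on stdin, print key=value pairs.
parse_stub_status() {
    awk '
        $1 == "Active" { print "active=" $3 }
        # The counters line holds exactly three integers:
        # accepts, handled, requests.
        NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            print "accepts=" $1
            print "handled=" $2
            print "requests=" $3
        }
        $1 == "Reading:" {
            print "reading=" $2
            print "writing=" $4
            print "waiting=" $6
        }
    '
}

# Typical use:
#   curl -s localhost/nginx_status | parse_stub_status
```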
Installing Nginx Exporter
Project page: https://github.com/nginxinc/nginx-prometheus-exporter/releases
The exporter can be run with Docker. The scrape URI below is a placeholder; point it at your own status page:

    $ docker run -p 9113:9113 nginx/nginx-prometheus-exporter:0.10.0 -nginx.scrape-uri=http://<nginx>:8080/stub_status

Here I run the binary instead. In the commands below, http://localhost:80/nginx_status is the address of the Nginx status page.

    # Download the Nginx exporter
    [root@ops ~]# wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
    # Unpack the archive
    [root@ops ~]# tar -xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

    # Start it in the foreground as a test
    [root@ops ~]# ./nginx-prometheus-exporter -nginx.scrape-uri http://localhost:80/nginx_status
    NGINX Prometheus Exporter version=0.11.0 commit=e4a6810d4f0b776f7fde37fea1d84e4c7284b72a date=2022-09-07T21:09:51Z, dirty=false, arch=linux/amd64, go=go1.19
    2023/06/07 14:31:35 Starting...
    2023/06/07 14:31:35 Listening on :9113
    2023/06/07 14:31:35 NGINX Prometheus Exporter has successfully started

    # Run it in the background
    [root@ops ~]# nohup ./nginx-prometheus-exporter -nginx.scrape-uri http://localhost:80/nginx_status &
Check the metrics:

    [root@ops ~]# curl localhost:9113/metrics
    # HELP nginx_connections_accepted Accepted client connections
    # TYPE nginx_connections_accepted counter
    nginx_connections_accepted 4
    # HELP nginx_connections_active Active client connections
    # TYPE nginx_connections_active gauge
    nginx_connections_active 1
    # HELP nginx_connections_handled Handled client connections
    # TYPE nginx_connections_handled counter
    nginx_connections_handled 4
    # HELP nginx_connections_reading Connections where NGINX is reading the request header
    # TYPE nginx_connections_reading gauge
    nginx_connections_reading 0
    # HELP nginx_connections_waiting Idle client connections
    # TYPE nginx_connections_waiting gauge
    nginx_connections_waiting 0
    # HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
    # TYPE nginx_connections_writing gauge
    nginx_connections_writing 1
    # HELP nginx_http_requests_total Total http requests
    # TYPE nginx_http_requests_total counter
    nginx_http_requests_total 5
    # HELP nginx_up Status of the last metric scrape
    # TYPE nginx_up gauge
    nginx_up 1
    # HELP nginxexporter_build_info Exporter build information
    # TYPE nginxexporter_build_info gauge
    nginxexporter_build_info{arch="linux/amd64",commit="e4a6810d4f0b776f7fde37fea1d84e4c7284b72a",date="2022-09-07T21:09:51Z",dirty="false",go="go1.19",version="0.11.0"} 1
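The metrics are described below. All of them come from the stub_status module; an empty list in the Labels column means the metric carries no labels.

| Metric | Description | Labels |
| --- | --- | --- |
| nginx_connections_accepted | Accepted client connections | [] |
| nginx_connections_active | Active client connections | [] |
| nginx_connections_handled | Handled client connections | [] |
| nginx_connections_reading | Connections where NGINX is reading the request header | [] |
| nginx_connections_waiting | Idle client connections | [] |
| nginx_connections_writing | Connections where NGINX is writing the response back to the client | [] |
| nginx_http_requests_total | Total number of HTTP requests | [] |
| nginx_up | Status of the last scrape of NGINX: 1 means the scrape succeeded, 0 means it failed | [] |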
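To spot-check individual values without Prometheus, the exposition text can be filtered with awk. One useful derived number is accepted minus handled: Nginx normally handles every connection it accepts, so a non-zero difference suggests connections were dropped, for example because a resource limit such as worker_connections was reached. A minimal sketch; `metric_value` and `dropped_connections` are hypothetical helper names and only work for samples without labels:

```shell
# metric_value NAME: read Prometheus text exposition on stdin and print the
# value of the first unlabeled sample called NAME. Comment lines start with
# "#", so their first field never equals a metric name.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# dropped_connections: print accepted minus handled connections.
dropped_connections() {
    awk '
        $1 == "nginx_connections_accepted" { accepted = $2 }
        $1 == "nginx_connections_handled"  { handled = $2 }
        END { print accepted - handled }
    '
}

# Typical use:
#   curl -s localhost:9113/metrics | metric_value nginx_up
#   curl -s localhost:9113/metrics | dropped_connections
```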
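Configuring the Prometheus scrape job
The exporter and Nginx are separate processes, so by default the instance label carries the exporter's address and does not really tell you which Nginx instance the data belongs to. To make the data easier to search and read, we can rewrite the instance label with the real Nginx address:

    - job_name: 'abcdocker_nginx_exporter'
      static_configs:
        - targets: ['192.168.31.101:9113']
      relabel_configs:
        - source_labels: [__address__]
          regex: '.*'
          target_label: instance
          replacement: '192.168.31.101:80'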
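After editing the scrape configuration it is worth validating it before Prometheus picks it up. The sketch below runs `promtool check config` and only then asks Prometheus to reload. It rests on several assumptions not stated in this article: `promtool` is installed alongside Prometheus, Prometheus was started with `--web.enable-lifecycle` (otherwise the `/-/reload` endpoint is disabled), and the config path and base URL in the usage comment match your setup. `reload_prometheus` is a hypothetical helper name.

```shell
# reload_prometheus CONFIG_FILE BASE_URL
# Validate the configuration first; trigger a reload only if it is valid.
reload_prometheus() {
    config_file=$1
    base_url=$2
    if ! promtool check config "$config_file"; then
        echo "config check failed, not reloading" >&2
        return 1
    fi
    # The reload endpoint requires --web.enable-lifecycle on the server.
    curl -fsS -X POST "$base_url/-/reload"
}

# Typical use (path and URL are assumptions):
#   reload_prometheus /etc/prometheus/prometheus.yml http://localhost:9090
```

Restarting the Prometheus service achieves the same result when the lifecycle API is not enabled.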
Adding the Grafana dashboard
Download dashboard.json and import it into Grafana:
https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana/dashboard.json
Mirror: https://d.frps.cn/file/tools/nginx/nginx_exporter_dashboard.json
Alerting with Alertmanager
Note that the two error-rate rules below filter on a status label. The stub_status metrics listed earlier are exported without labels, so with this exporter those two expressions return no data and only the NginxStatus rule can fire.

    [root@prometheus ~]# cat /etc/prometheus/rules/nginx_exporter.yaml
    groups:
    - name: Nginx Exporter monitoring
      rules:
      - alert: NginxHighHttp4xxErrorRate
        expr: sum(rate(nginx_http_requests_total{status=~"^4.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Nginx high HTTP 4xx error rate (instance {{ $labels.instance }})
          description: "Too many HTTP requests with status 4xx (> 5%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: NginxHighHttp5xxErrorRate
        expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})
          description: "Too many HTTP requests with status 5xx (> 5%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: NginxStatus
        expr: nginx_up != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Nginx service down (instance {{ $labels.instance }})
          description: "NGINX service is down VALUE = {{ $value }}\n LABELS = {{ $labels }}"
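The 4xx and 5xx expressions depend on nginx_http_requests_total carrying a status label. A quick way to see whether a given exporter actually emits one is to grep its /metrics output. A minimal sketch; `has_status_label` is a hypothetical helper name:

```shell
# has_status_label: read Prometheus text exposition on stdin and succeed only
# if some nginx_http_requests_total sample carries a status="..." label.
has_status_label() {
    grep -Eq '^nginx_http_requests_total\{[^}]*status="'
}

# Typical use:
#   curl -s localhost:9113/metrics | has_status_label \
#       || echo "no status label: the 4xx/5xx rules will never match"
```

Against the stub_status output shown earlier in this article, the check fails, since nginx_http_requests_total appears there as a single unlabeled counter.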
Once the rules are loaded, we stop Nginx to demonstrate the alert.
Result:

    ********** Alert notification **********
    Alert type: NginxStatus
    Severity: critical
    =====================
    Summary: Nginx service down (instance 192.168.31.101:80)
    Details: NGINX service is down VALUE = 0 LABELS = map[__name__:nginx_up instance:192.168.31.101:80 job:abcdocker_nginx_exporter]
    Fired at: 2023-06-07 17:23:48.61 +0800 CST
    Instance: 192.168.31.101:80