How to Connect Time-Series Data Collected by OCP-Agent to a Time-Series Monitoring System

Data Operations | 2024-05-07 | LOVEHL^ˇ^

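About the author: Shi Yongbin is a senior development engineer at OceanBase and a member of the Cloud Products and Middleware team. He maintains and develops OceanBase monitoring and alerting.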

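OCP-Agent is the management and monitoring process developed for OceanBase Cloud Platform (OCP). It collects OceanBase database performance metrics and provides basic operation and management functions. This section describes how to connect the time-series data collected by OCP-Agent to common time-series monitoring systems.

Common monitoring systems include Prometheus, Zabbix, VictoriaMetrics, and InfluxDB. Exporting monitoring data lets customers integrate OceanBase with their own monitoring platform. They can then build their own monitoring dashboards and reuse their existing monitoring and alerting systems.

There are two integration patterns: push and pull. OCP-Agent supports both. Monitoring data can be pulled from it, and it can also push the data it collects directly to a time-series monitoring system. Supported push targets are Pushgateway and any HTTP API, such as VictoriaMetrics. Pushgateway can also be reached through the generic HTTP API.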

OCP-Agent Monitoring

OCP-Agent is the management and monitoring agent deployed on each host. It consists of three resident processes and one command-line tool:


Figure 1: OCP-Agent architecture

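ocp_mgragent is the management process; it executes operation commands on OBServer and OBProxy;
ocp_monagent is the monitoring process; it collects monitoring data from OBServer, OBProxy, and the host;
ocp_agentd is the daemon that supervises the management and monitoring processes;
ocp_agentctl is the command-line operations tool.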

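OCP-Agent produces two kinds of time-series monitoring data, both in the Prometheus format:

1. Business performance metrics, such as host, OBServer, tenant, and cluster metrics;

2. OCP-Agent self-monitoring metrics, which describe OCP-Agent's own running state, such as its resource usage and the status of its internal components.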

OCP-Agent can push business performance metrics to the customer's monitoring system. Self-monitoring metrics are not pushed for now.

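Besides time-series data, OCP-Agent can also collect two other kinds of OceanBase data. SQL audit data can be used for SQL diagnosis and alerting. Log data helps locate the cause of failures. Non-time-series data is outside the scope of this article.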

Pull Scheme for Time-Series Monitoring Data

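OCP-Agent's monitoring endpoints require authentication. The password is randomly generated, and no interface is provided to query it. A configuration option is therefore provided to disable authentication on the monitoring endpoints. Once authentication is disabled, monitoring data can be collected in pull mode.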

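Monitoring endpoint authentication switch: ocp.agent.auth.metric-auth-enabled. The default value is true (authentication enabled). Set it to false to collect metrics in pull mode.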

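The monitoring endpoints are listed in the following figure. Two of them, http://<ip>:62889/metrics/ob/basic and http://<ip>:62889/metrics/ob/extra, are also used by the push configuration later in this article.

[Figure: OCP-Agent monitoring endpoints]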
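With authentication disabled, Prometheus can scrape OCP-Agent directly. The sketch below generates Prometheus `scrape_configs` entries for a list of hosts. It makes several assumptions. Port 62889 and the two paths are taken from the endpoint URLs cited in this article. The job names, the 60s scrape interval, and the host IPs are hypothetical. The full endpoint list from the figure is not reproduced.

```python
import json

# Assumption: ocp.agent.auth.metric-auth-enabled=false, so no credentials are configured.
OCP_AGENT_PORT = 62889
METRIC_PATHS = ["/metrics/ob/basic", "/metrics/ob/extra"]


def build_scrape_configs(hosts, port=OCP_AGENT_PORT, paths=METRIC_PATHS, interval="60s"):
    """Return Prometheus scrape_configs entries, one job per OCP-Agent metrics path."""
    targets = [f"{host}:{port}" for host in hosts]
    configs = []
    for path in paths:
        # Hypothetical job names, e.g. "/metrics/ob/basic" -> "ocp_agent_ob_basic".
        job_name = "ocp_agent" + path.replace("/metrics", "").replace("/", "_")
        configs.append({
            "job_name": job_name,
            "metrics_path": path,
            "scrape_interval": interval,
            "static_configs": [{"targets": targets}],
        })
    return configs


if __name__ == "__main__":
    # The output is JSON, which YAML parsers generally accept, so it can serve as a
    # starting point for the scrape_configs section of prometheus.yml.
    print(json.dumps({"scrape_configs": build_scrape_configs(["10.0.0.1", "10.0.0.2"])}, indent=2))
```

One job per path keeps the basic and extra metric sets separately tunable. For example, you could scrape the basic set more often than the extra set.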

Push Scheme for Time-Series Monitoring Data

Pushing to Pushgateway

OCP-Agent can push monitoring data to Pushgateway. The architecture is shown in the following figure:


Figure 2: OCP-Agent pushing metrics to Pushgateway

Pushgateway is a component of the Prometheus ecosystem that accepts pushed metrics. OCP-Agent pushes monitoring data to Pushgateway, and Prometheus then scrapes the data from Pushgateway.

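The push workflow is as follows:

1. OCP-Server pushes metadata to OCP-Agent, including which metrics to push, the push frequency, and the target address;

2. OCP-Agent pushes monitoring data directly to the target Pushgateway, and Prometheus then scrapes the data from Pushgateway;

3. Add Prometheus as a data source in Grafana to view the monitoring data.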
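In step 2, each push goes to a Pushgateway URL whose path encodes the grouping key: the job name plus optional label/value pairs. The sketch below builds that URL. The job `push` and the use of the host IP as `instance` mirror the plugin configurations in this article. The addresses are hypothetical. Values containing `/` need a special encoding described in the Pushgateway documentation, and this sketch simply rejects them.

```python
from typing import Optional
from urllib.parse import quote


def pushgateway_url(base: str, job: str, grouping: Optional[dict] = None) -> str:
    """Build <base>/metrics/job/<job>{/<label>/<value>} for a Pushgateway push."""
    grouping = grouping or {}
    if any("/" in v for v in [job, *grouping.values()]):
        # Pushgateway needs a special encoding for '/' in path values; not handled here.
        raise ValueError("job or grouping values containing '/' are not supported here")
    parts = [base.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for label, value in grouping.items():
        parts += [label, quote(value, safe="")]
    return "/".join(parts)


if __name__ == "__main__":
    # Hypothetical Pushgateway at 10.0.0.1 and OCP-Agent host at 10.0.0.2.
    print(pushgateway_url("http://10.0.0.1:9091", "push", {"instance": "10.0.0.2"}))
```

Grouping by `instance` gives every host its own group on Pushgateway. A push from one OCP-Agent then does not overwrite the metrics pushed by another.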

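Notes:

1. All OCP-Agents share the same metadata, so different OCP-Agents cannot be configured to push to different addresses. To push to different targets, configure each host manually: edit the target address in the push configuration file from the command line, then restart OCP-Agent.

2. OCP-Agent pushes directly to Pushgateway. You must therefore deploy the Pushgateway component and make sure OCP-Agent has network connectivity to the monitoring platform.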
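Note 1 means that per-host targets require editing each host's push configuration by hand. The helper below is hypothetical and shows one way to script that edit. It rewrites the value of every `targetAddress` or `gatewayUrl` line, the keys used by the output plugin configurations later in this article, and keeps the indentation. This article does not give the path of the push configuration file, so the helper works on the file's text only. The new value should follow the format the plugin expects: `gatewayUrl` in the gatewayOutput example has no scheme, while `targetAddress` in the httpOutput examples does.

```python
import re

# Hypothetical helper: keys as they appear in the output plugin configurations.
TARGET_KEYS = ("targetAddress", "gatewayUrl")
_PATTERN = re.compile(r"^([ \t]*(?:%s):)[^\n]*$" % "|".join(TARGET_KEYS), re.MULTILINE)


def set_push_target(config_text: str, new_target: str) -> str:
    """Replace the value of every targetAddress/gatewayUrl line, keeping indentation."""
    new_text, count = _PATTERN.subn(lambda m: f"{m.group(1)} {new_target}", config_text)
    if count == 0:
        raise ValueError("no targetAddress or gatewayUrl entry found")
    return new_text


if __name__ == "__main__":
    sample = "      http:\n        targetAddress: http://old-gw:9091\n        apiUrl: /metrics\n"
    print(set_push_target(sample, "http://10.0.0.9:9091"))
```

After writing the edited text back to the file, restart OCP-Agent on that host as described in note 1.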

Pushing to Any HTTP API

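To meet more diverse future requirements, OCP-Agent extends the push target so that it can push to any HTTP API. Taking VictoriaMetrics as an example, the deployment architecture is shown in the following figure: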


Figure 3: OCP-Agent pushing metrics to VictoriaMetrics

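VictoriaMetrics (referred to as vmagent in the configurations below) is a Prometheus-compatible alternative to Prometheus. In this solution it receives monitoring data in push mode. Compared with Prometheus, VictoriaMetrics offers significant advantages in scalability, performance, and resource usage.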

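The push workflow is as follows:

1. OCP-Server pushes metadata to OCP-Agent, including which metrics to push, the push frequency, and the target address;

2. OCP-Agent pushes monitoring data directly to the target VictoriaMetrics;

3. In Grafana, add a data source of type Prometheus that points to VictoriaMetrics to view the monitoring data.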
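Step 3 in both workflows uses Grafana's Prometheus data source type. VictoriaMetrics implements the Prometheus query API, so it can be added the same way as a Prometheus server. The sketch builds the JSON body for Grafana's HTTP API call `POST /api/datasources`. Several details are hypothetical: the data source name, the address, and the use of the default single-node port 8428. Authenticating to Grafana, for example with an API token, is not shown.

```python
import json


def grafana_prometheus_datasource(name: str, url: str) -> dict:
    """Body for Grafana's POST /api/datasources call, using the Prometheus type.

    Works for a Prometheus server (fed by Pushgateway) and for VictoriaMetrics alike.
    """
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # the Grafana server, not the browser, sends the queries
    }


if __name__ == "__main__":
    # Hypothetical single-node VictoriaMetrics address.
    body = grafana_prometheus_datasource("ob-victoriametrics", "http://10.0.0.3:8428")
    print(json.dumps(body))
```

The same data source can of course be added through the Grafana UI. The API form is useful when the setup has to be repeated across environments.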

Load Test Comparison: Pushgateway vs. VictoriaMetrics

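ocp_monagent is limited to a single CPU core, so it can use at most one core. Using OCP 3.3.0, we pushed data to Pushgateway and to VictoriaMetrics (single-node mode) under the same load. The results were as follows (for reference only):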

[Figure: load test results for pushing to Pushgateway and VictoriaMetrics]

Both the Pushgateway and VictoriaMetrics tests used pushhttp, an in-house push plugin. Overall, VictoriaMetrics outperformed Pushgateway in these tests.

Push Configuration

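OCP-Agent monitoring uses a pipeline-and-plugin model. A single configuration file can define multiple pipelines. Each pipeline can contain multiple input plugins (inputs), multiple processor plugins (processors), and one output plugin (output).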

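The following example defines two pipelines, ob_push_basic (run every 1s) and ob_push_extra (run every 60s). They collect different sets of OceanBase performance metrics through the obInputBasic and obInputExtra plugins. These plugins correspond to the metrics at http://<ip>:62889/metrics/ob/basic and http://<ip>:62889/metrics/ob/extra, respectively. Both pipelines share the same output plugin (pushOutput):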

  - module: monitor.pushhttp.ob
    moduleType: monagent.pipeline
    disabled: false
    process: ocp_monagent
    config:
      name: monitor.pushhttp.ob
      status: ${monagent.pipeline.ob.status}
      pipelines:
        - name: ob_push_basic
          config:
            scheduleStrategy: periodic
            period: 1s
          structure:
            inputs:
              - <<: *obInputBasic
            processors:
              - <<: *retagProcessor
            output:
              <<: *pushOutput
        - name: ob_push_extra
          config:
            scheduleStrategy: periodic
            period: 60s
          structure:
            inputs:
              - <<: *obInputExtra
            processors:
              - <<: *retagProcessor
            output:
              <<: *pushOutput

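Every plugin referenced by the pipelines (inputs, processors, and output) must also be defined in the configuration file. This article focuses on configuring the output plugin.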
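To make the pipeline model concrete, here is a conceptual Python model of one pipeline. It is not OCP-Agent's implementation. Inputs are callables that return samples, processors transform the sample list in order, and a single output receives the result. Some details are assumptions. `retag` is assumed to resemble what `retagProcessor` does, namely adding labels. The sample layout, the metric name `ob_sessions`, and the label `svr_ip` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A sample is modeled as {"name": str, "labels": dict, "value": float} (illustrative only).
Sample = Dict[str, object]


@dataclass
class Pipeline:
    """Conceptual model: several inputs, a processor chain, and exactly one output."""
    name: str
    period_s: float
    inputs: List[Callable[[], List[Sample]]]
    processors: List[Callable[[List[Sample]], List[Sample]]] = field(default_factory=list)
    output: Callable[[List[Sample]], None] = print

    def run_once(self) -> List[Sample]:
        # 1. Collect from every input plugin.
        samples = [s for collect in self.inputs for s in collect()]
        # 2. Pass the batch through each processor in order.
        for process in self.processors:
            samples = process(samples)
        # 3. Hand the result to the single output plugin.
        self.output(samples)
        return samples

    def run_periodically(self, iterations: int) -> None:
        """Rough analogue of scheduleStrategy: periodic (run, then sleep period_s)."""
        for _ in range(iterations):
            self.run_once()
            time.sleep(self.period_s)


def retag(extra_labels: Dict[str, str]) -> Callable[[List[Sample]], List[Sample]]:
    """Processor that adds labels to every sample (assumed to resemble retagProcessor)."""
    def process(samples: List[Sample]) -> List[Sample]:
        return [{**s, "labels": {**s["labels"], **extra_labels}} for s in samples]
    return process


if __name__ == "__main__":
    basic = Pipeline(
        name="ob_push_basic",
        period_s=1.0,
        inputs=[lambda: [{"name": "ob_sessions", "labels": {}, "value": 3.0}]],
        processors=[retag({"svr_ip": "10.0.0.1"})],
    )
    basic.run_once()
```

Sharing one output object across pipelines, as `pushOutput` is shared in the YAML above, corresponds to passing the same `output` callable to several `Pipeline` instances.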

gatewayOutput Plugin Configuration

This plugin is implemented with the official Prometheus package github.com/prometheus/client_golang/prometheus/push.

The gatewayOutput plugin is configured as follows:

pushOutput: &pushOutput
  plugin: gatewayOutput
  config:
    timeout: 10s
    pluginConfig:
      batchSize: 500
      workerThreads: 4
      retryTimes: 1
      gatewayUrl: <pushgateway-IP>:9091
      job: push
      instance: ${monagent.host.ip}
      hostIp: ${monagent.host.ip}

Replace <pushgateway-IP> with the actual Pushgateway address, then use this block as the pushOutput plugin configuration in the configuration file.
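`batchSize` and `retryTimes` control how much data goes out per request and how often a failed request is retried. The sketch below shows that batching-with-retry pattern in isolation. It assumes that `retryTimes: 1` means one extra attempt after the initial failure. It uses a caller-supplied `send` function that returns True on success. It does not model `workerThreads`, the parallel senders. It illustrates the general technique and is not the plugin's actual code.

```python
from typing import Callable, List, Sequence


def push_in_batches(samples: Sequence[str], batch_size: int, retry_times: int,
                    send: Callable[[List[str]], bool]) -> List[List[str]]:
    """Send samples in chunks of batch_size, retrying each failed chunk.

    Returns the batches that still failed after 1 + retry_times attempts.
    """
    failed = []
    for start in range(0, len(samples), batch_size):
        batch = list(samples[start:start + batch_size])
        # One initial attempt plus retry_times retries (assumed meaning of retryTimes).
        for _attempt in range(1 + retry_times):
            if send(batch):
                break
        else:
            # The loop never hit "break": every attempt failed.
            failed.append(batch)
    return failed
```

Larger batches mean fewer HTTP requests but a bigger payload to resend when a request fails. This is one reason `batchSize` differs between the examples in this article.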

Pushing to Pushgateway with the httpOutput Plugin

pushOutput: &pushOutput
  plugin: httpOutput
  config: 
    timeout: 10s
    pluginConfig:
      protocol: promeproto
      exportTimestamp: false
      timestampPrecision: millisecond
      batchSize: 800
      taskQueueSize: 128
      pushTaskCount: 32
      retryTaskCount: 8
      retryTimes: 1
      http:
        targetAddress: http://<pushgateway-IP>:9091
        proxyAddress: 
        apiUrl: /metrics/job/push/instance/${monagent.host.ip}
        httpMethod: POST
        basicAuthEnabled: false
        username: 
        password: 
        timeout: 2s
        contentType: 'application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited'
        headers: ["ocp-agent:3.2.3 bp2", "plugin:httpOutput"]
        acceptedResponseCodes: [202,200]
        maxIdleConns: 4
        maxConnsPerHost: 4
        maxIdleConnsPerHost: 4
        responseHeaderTimeout: 2s
        expectContinueTimeout: 2s

Replace <pushgateway-IP> with the actual Pushgateway address, then use this block as the pushOutput plugin configuration in the configuration file.
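The `contentType` above ends in `encoding=delimited`. This means the request body is a sequence of protobuf `MetricFamily` messages, each preceded by its length encoded as a protobuf varint. The sketch shows only that framing step. Serializing `MetricFamily` itself requires the Prometheus protobuf definitions, so the payloads below are placeholder bytes.

```python
from typing import Iterable


def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def delimited(messages: Iterable[bytes]) -> bytes:
    """Frame already-serialized messages: varint length prefix, then the message bytes."""
    return b"".join(encode_varint(len(m)) + m for m in messages)


if __name__ == "__main__":
    # Placeholder payloads standing in for serialized MetricFamily messages.
    print(delimited([b"family-1", b"family-2"]))
```

The length prefix lets the receiver split the stream back into individual messages without any separator bytes.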

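The httpOutput plugin supports the following:

1. Two data formats: promeproto and prometheus (protocol);

2. Optional timestamps on pushed samples (exportTimestamp, timestampPrecision);

3. Configurable batch size per push and number of retries on failure (batchSize, retryTimes);

4. Configurable push address, proxy, authentication, push timeout, and related HTTP settings (targetAddress, apiUrl, proxyAddress, basicAuthEnabled/username/password, timeout).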
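With `protocol: prometheus`, the body uses the Prometheus text format, one sample per line. When `exportTimestamp` is true, a millisecond timestamp is appended to each line; without it, the receiving side decides which time to use. The sketch renders a single line. Its escaping follows the text format rules for label values: backslash, double quote, and line feed. The metric name, label, and timestamp are hypothetical.

```python
import math
from typing import Dict, Optional


def _format_value(v: float) -> str:
    """Text-format float: NaN and +/-Inf have fixed spellings."""
    v = float(v)
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "+Inf" if v > 0 else "-Inf"
    return repr(v)


def _escape(value: str) -> str:
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_sample(name: str, labels: Dict[str, str], value: float,
                  timestamp_ms: Optional[int] = None) -> str:
    """One text-format line: name{labels} value [timestamp_ms]."""
    label_str = ",".join(f'{k}="{_escape(str(v))}"' for k, v in sorted(labels.items()))
    line = name + ("{" + label_str + "}" if label_str else "") + " " + _format_value(value)
    if timestamp_ms is not None:  # exportTimestamp: true, millisecond precision
        line += f" {int(timestamp_ms)}"
    return line


if __name__ == "__main__":
    print(render_sample("ob_sessions", {"svr_ip": "10.0.0.1"}, 3, timestamp_ms=1715040000000))
```

The VictoriaMetrics examples that follow set `exportTimestamp: true`, while the Pushgateway example sets it to false.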

Pushing to vmagent with the httpOutput Plugin

Single-node mode configuration (8428 is the default HTTP port of single-node VictoriaMetrics):

pushOutput: &pushOutput
  plugin: httpOutput
  config: 
    timeout: 10s
    pluginConfig:
      protocol: prometheus
      exportTimestamp: true
      timestampPrecision: millisecond
      batchSize: 500
      taskQueueSize: 64
      pushTaskCount: 8
      retryTaskCount: 4
      retryTimes: 1
      http:
        targetAddress: http://<vmagent-IP>:8428
        proxyAddress: 
        apiUrl: /api/v1/import/prometheus
        httpMethod: POST
        basicAuthEnabled: false
        username: 
        password: 
        timeout: 1s
        contentType: 'text/plain; version=0.0.4; charset=utf-8'
        headers: ["key1:value1", "key2:value2"]
        acceptedResponseCodes: [200, 204]
        maxIdleConns: 64
        maxConnsPerHost: 64
        maxIdleConnsPerHost: 64

Replace <vmagent-IP> with the actual vmagent address, then use this block as the pushOutput plugin configuration in the configuration file.

Multi-tenant (cluster) mode configuration (the 0 in /insert/0/prometheus is the tenant ID):

pushOutput: &pushOutput
  plugin: httpOutput
  config: 
    timeout: 10s
    pluginConfig:
      protocol: prometheus
      exportTimestamp: true
      timestampPrecision: millisecond
      batchSize: 800
      taskQueueSize: 128
      pushTaskCount: 32
      retryTaskCount: 8
      retryTimes: 1
      http:
        targetAddress: http://<vmagent-IP>:8480/insert/0/prometheus
        proxyAddress: 
        apiUrl: /api/v1/import/prometheus
        httpMethod: POST
        basicAuthEnabled: false
        username: 
        password: 
        timeout: 2s
        contentType: 'text/plain; version=0.0.4; charset=utf-8'
        headers: ["Agent-Version:3.3.0"]
        acceptedResponseCodes: [204]
        maxIdleConns: 4
        maxConnsPerHost: 4
        maxIdleConnsPerHost: 4
        responseHeaderTimeout: 2s
        expectContinueTimeout: 2s
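In both VictoriaMetrics configurations, the request URL is `targetAddress` followed by `apiUrl`. The sketch below composes the two variants so they can be compared side by side. Port 8428 is the single-node default. Port 8480 is the default port of vminsert, the write component of the cluster version. The host address is hypothetical.

```python
def vm_import_url(host: str, cluster: bool = False, tenant: str = "0") -> str:
    """Full push URL (targetAddress + apiUrl) for the two configurations above."""
    api_url = "/api/v1/import/prometheus"
    if cluster:
        # Cluster version: write through vminsert; the path carries the tenant ID.
        return f"http://{host}:8480/insert/{tenant}/prometheus{api_url}"
    # Single-node VictoriaMetrics.
    return f"http://{host}:8428{api_url}"


if __name__ == "__main__":
    print(vm_import_url("10.0.0.3"))
    print(vm_import_url("10.0.0.3", cluster=True, tenant="0"))
```

Because the tenant lives in the URL path, pushing different OceanBase clusters to different tenants only requires a different `targetAddress`.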

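Stay tuned to the OceanBase technical community. We will keep publishing in-depth technical content and growing together with millions of developers!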

Search for DingTalk group 33254054 to join the OceanBase technical Q&A group, where you can get answers to your technical questions.
