Prometheus Operator Persistent Storage

May 4, 2023

By default the Prometheus Operator does not persist its data, so the collected metrics can be lost when a Pod is deleted or restarted unexpectedly.

This article uses an NFS client provisioner for the demonstration; for other backend storage engines, see the official StorageClass documentation. Most of the deployment parameters have been covered in earlier posts and are not repeated here; if anything is unclear, read the earlier posts on Kubernetes PV/PVC and StorageClass persistent storage first.

    Environment

    192.168.0.10  k8s-01
    192.168.0.11  k8s-02
    192.168.0.12  k8s-03
    192.168.0.13  k8s-04
    
    192.168.0.14  NFS服务器
    

    First, deploy the NFS server by installing the NFS packages on 192.168.0.14.

    # A dedicated server is used here for the demo; any machine can host NFS, but it is recommended to keep it separate from the Kubernetes cluster
    [root@nfs ~]# yum install -y nfs-utils rpcbind
    
    # Next, create the NFS storage directory
    [root@nfs ~]# mkdir /data1/k8s-volume -p
    [root@nfs ~]# chmod 755 /data1/k8s-volume/
    
    # Edit the NFS exports file
    [root@nfs ~]# cat /etc/exports
    /data1/k8s-volume  *(rw,no_root_squash,sync)
    
    # /data1/k8s-volume is the exported directory; * allows any client host; rw grants read/write; sync writes data to disk as well as memory; no_root_squash keeps the client's root user as root instead of squashing it to an anonymous user
    
    
    # Start rpcbind next
    [root@nfs ~]# systemctl start rpcbind
    [root@nfs ~]# systemctl enable rpcbind
    [root@nfs ~]# systemctl status rpcbind
    ● rpcbind.service - RPC bind service
       Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2020-03-10 07:41:39 EDT; 19s ago
     Main PID: 4430 (rpcbind)
       CGroup: /system.slice/rpcbind.service
               └─4430 /sbin/rpcbind -w
    
    Mar 10 07:41:39 NFS systemd[1]: Starting RPC bind service...
    Mar 10 07:41:39 NFS systemd[1]: Started RPC bind service.
    
    
    
    # Start the NFS server
    [root@nfs ~]# systemctl restart nfs
    [root@nfs ~]# systemctl enable nfs
    [root@nfs ~]# systemctl status nfs
    ● nfs-server.service - NFS server and services
       Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
      Drop-In: /run/systemd/generator/nfs-server.service.d
               └─order-with-mounts.conf
       Active: active (exited) since Tue 2020-03-10 07:42:17 EDT; 8s ago
     Main PID: 4491 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/nfs-server.service
    
    Mar 10 07:42:17 NFS systemd[1]: Starting NFS server and services...
    Mar 10 07:42:17 NFS systemd[1]: Started NFS server and services.
    
    
    
    # Check that rpcbind and nfs are registered properly
    [root@nfs ~]# rpcinfo |grep nfs
        100003    3    tcp       0.0.0.0.8.1            nfs        superuser
        100003    4    tcp       0.0.0.0.8.1            nfs        superuser
        100227    3    tcp       0.0.0.0.8.1            nfs_acl    superuser
        100003    3    udp       0.0.0.0.8.1            nfs        superuser
        100003    4    udp       0.0.0.0.8.1            nfs        superuser
        100227    3    udp       0.0.0.0.8.1            nfs_acl    superuser
        100003    3    tcp6      ::.8.1                 nfs        superuser
        100003    4    tcp6      ::.8.1                 nfs        superuser
        100227    3    tcp6      ::.8.1                 nfs_acl    superuser
        100003    3    udp6      ::.8.1                 nfs        superuser
        100003    4    udp6      ::.8.1                 nfs        superuser
        100227    3    udp6      ::.8.1                 nfs_acl    superuser
    
    
    # Check the effective export options
    [root@NFS ~]# cat /var/lib/nfs/etab
    /data1/k8s-volume   *(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
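
    Optionally, you can also double-check what the server advertises, and re-export without a service restart after any later change to /etc/exports:

    # Re-read /etc/exports and re-export, then list the shares the server announces
    [root@nfs ~]# exportfs -arv
    [root@nfs ~]# showmount -e localhost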
    

    The NFS server side is done. Next, install the NFS client packages on every cluster node that needs to mount the share:

    [root@all-nodes ~]# yum install -y nfs-utils rpcbind
    
    [root@all-nodes ~]# systemctl start rpcbind
    [root@all-nodes ~]# systemctl enable rpcbind
    [root@all-nodes ~]# systemctl start nfs
    [root@all-nodes ~]# systemctl enable nfs
    Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
    
    
    ## If starting rpcbind fails with "Job rpcbind.service/start failed with result 'dependency'", run the commands below
    
    # Locate the socket unit file first: find /etc/ -name '*rpcbind.socket*'
    sed -i 's/ListenStream=\[::\]:111/#ListenStream=[::]:111/g' /etc/systemd/system/sockets.target.wants/rpcbind.socket
    systemctl daemon-reload
    systemctl restart rpcbind.socket
    systemctl start nfs
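
    Optionally, confirm a node can actually reach the export before involving Kubernetes (/mnt is only used here as a temporary mount point for the test):

    # On any cluster node: list the exports and do a throwaway mount/write test
    [root@k8s-01 ~]# showmount -e 192.168.0.14
    [root@k8s-01 ~]# mount -t nfs 192.168.0.14:/data1/k8s-volume /mnt
    [root@k8s-01 ~]# touch /mnt/test-file && rm -f /mnt/test-file
    [root@k8s-01 ~]# umount /mnt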
    

    With NFS ready, let's look at where the Prometheus Operator currently stores its data:

    [root@k8s-01 ~]# kubectl get pod -n monitoring prometheus-k8s-0 -o yaml
    ....
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: config-out
          readOnly: true
        - mountPath: /prometheus
          name: prometheus-k8s-db
        - mountPath: /etc/prometheus/rules/prometheus-k8s-rulefiles-0
          name: prometheus-k8s-rulefiles-0
    ....
      - emptyDir: {}
        name: prometheus-k8s-db
      - name: prometheus-k8s-token-6rv95
    

    The /prometheus directory is mounted as an emptyDir volume, so the data is gone as soon as the Pod is recreated. Because Prometheus is deployed by a StatefulSet, we persist the data with a StorageClass-backed volumeClaimTemplate.
    Since NFS is the backend storage, we also need an nfs-client provisioner.

    # The nfs-client provisioner must be created; once the Prometheus CR references the StorageClass, the Prometheus Pods cannot reach the Running state without it
    
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: nfs-client-provisioner
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nfs-client-provisioner
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: nfs-client-provisioner
        spec:
          serviceAccountName: nfs-client-provisioner
          containers:
            - name: nfs-client-provisioner
              image: quay.io/external_storage/nfs-client-provisioner:latest
              volumeMounts:
                - name: nfs-client-root
                  mountPath: /persistentvolumes
              env:
                - name: PROVISIONER_NAME
                  value: fuseim.pri/ifs
                - name: NFS_SERVER
                  value: 192.168.0.14           # NFS server address
                - name: NFS_PATH
                  value: /data1/k8s-volume     # NFS export path
          volumes:
            - name: nfs-client-root
              nfs:
                server: 192.168.0.14
                path: /data1/k8s-volume
    

    Create the RBAC manifest for the nfs-client provisioner:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: nfs-client-provisioner
    
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: nfs-client-provisioner-runner
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["list", "watch", "create", "update", "patch"]
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["create", "delete", "get", "list", "watch", "patch", "update"]
    
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: run-nfs-client-provisioner
    subjects:
      - kind: ServiceAccount
        name: nfs-client-provisioner
        namespace: default
    roleRef:
      kind: ClusterRole
      name: nfs-client-provisioner-runner
      apiGroup: rbac.authorization.k8s.io
    

    Apply the manifests:

    [root@k8s-01 manifests]# kubectl apply -f nfs-rbac.yaml
    serviceaccount/nfs-client-provisioner created
    clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
    clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
    [root@k8s-01 manifests]# kubectl  apply  -f nfs-client.yaml
    deployment.apps/nfs-client-provisioner created
    
    [root@k8s-01 manifests]# kubectl get pod
    NAME                                      READY   STATUS    RESTARTS   AGE
    myapp-5jlc7                               1/1     Running   1          2d
    myapp-cg4lq                               1/1     Running   2          3d8h
    myapp-pplfn                               1/1     Running   1          3d8h
    myapp-wkfqz                               1/1     Running   2          3d8h
    nfs-client-provisioner-57cb5b4cfd-kbttp   1/1     Running   0          2m1s
    

    Now create a StorageClass object:

    [root@k8s-01 ~]# cat prometheus-storageclass.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: prometheus-data-db
    provisioner: fuseim.pri/ifs
    
    # Apply the StorageClass
    [root@k8s-01 ~]# kubectl apply -f  prometheus-storageclass.yaml
    storageclass.storage.k8s.io/prometheus-data-db created
    

    Here we declare a StorageClass whose provisioner, fuseim.pri/ifs, matches the PROVISIONER_NAME of the nfs-client provisioner, so volumes created from this class are backed by NFS.
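
    If you want to confirm dynamic provisioning works before touching Prometheus, a throwaway PVC like the one below will do (test-claim and test-pvc.yaml are just example names):

    # test-pvc.yaml -- should become Bound within a few seconds if the provisioner works
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-claim
    spec:
      storageClassName: prometheus-data-db
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

    # kubectl apply -f test-pvc.yaml && kubectl get pvc test-claim
    # clean up afterwards: kubectl delete pvc test-claim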
    Next, add the following storage configuration to the Prometheus custom resource:

    vim kube-prometheus-master/manifests/prometheus-prometheus.yaml
    ...
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: prometheus-data-db
            resources:
              requests:
                storage: 10Gi
    ....
    
    # Just add this block under spec:. storageClassName is the name of the StorageClass created above, and storage is the size of the requested volume
    

    The complete prometheus-prometheus.yaml now looks like this:

    [root@k8s-01 manifests]# cat prometheus-prometheus.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: prometheus-data-db
            resources:
              requests:
                storage: 10Gi
      baseImage: quay.io/prometheus/prometheus
      nodeSelector:
        beta.kubernetes.io/os: linux
      replicas: 2
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.11.0
    

    Check that the Prometheus Pods come back up:

    [root@k8s-01 manifests]# kubectl get pod -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    alertmanager-main-0                    2/2     Running   0          11h
    alertmanager-main-1                    2/2     Running   15         8d
    alertmanager-main-2                    2/2     Running   11         4d3h
    grafana-558647b59-msz6b                1/1     Running   5          8d
    kube-state-metrics-5bfc7db74d-r95r2    4/4     Running   21         8d
    node-exporter-24kdw                    2/2     Running   10         8d
    node-exporter-4pqhb                    2/2     Running   8          8d
    node-exporter-pbjb2                    2/2     Running   8          8d
    node-exporter-vcq6c                    2/2     Running   10         8d
    prometheus-adapter-57c497c557-7jqq7    1/1     Running   1          2d
    prometheus-k8s-0                       3/3     Running   1          2m4s
    prometheus-k8s-1                       3/3     Running   1          2m3s
    prometheus-operator-69bd579bf9-vq8cd   1/1     Running   1          2d
    

    We can also take a look at the PV and PVC objects:

    [root@k8s-01 manifests]# kubectl get pv -n monitoring|grep prom
    pvc-5ee985bb-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   prometheus-data-db             2m36s
    pvc-5f0d05c0-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   prometheus-data-db             2m45s
    [root@k8s-01 manifests]# kubectl get pvc -n monitoring
    NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
    prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-5ee985bb-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            prometheus-data-db   2m49s
    prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-5f0d05c0-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            prometheus-data-db   2m48s
    [root@k8s-01 manifests]#
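
    On the NFS server you can also see the backing directories created for each replica; the nfs-client provisioner typically names them after the namespace, PVC and PV, so the exact names will differ per environment:

    # On 192.168.0.14: each provisioned PV gets its own subdirectory under the export
    [root@nfs ~]# ls /data1/k8s-volume/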
    

    Next, let's test whether the data survives deleting the Pods.
    Record the current point in the Prometheus graph as a reference.
    (screenshot: Prometheus graph before the Pods are deleted, used as the reference point)
    Delete the Pods:

    [root@k8s-01 manifests]# kubectl delete pod -n monitoring prometheus-k8s-0
    pod "prometheus-k8s-0" deleted
    [root@k8s-01 manifests]# kubectl delete pod -n monitoring prometheus-k8s-1
    pod "prometheus-k8s-1" deleted
    

    After the new Pods are up, check the result again.
    (screenshot: the same Prometheus graph after the Pods restart; the samples from before the deletion are still present)
    As you can see, no data was lost.
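
    As an extra check, you can list the TSDB directory inside a Pod (the container name prometheus is the kube-prometheus default) and confirm the old blocks are still there:

    [root@k8s-01 manifests]# kubectl exec -n monitoring prometheus-k8s-0 -c prometheus -- ls /prometheus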

    Related posts:

    1. Kubernetes 1.14 binary cluster installation
    2. CentOS 7 ETCD cluster configuration guide
    3. Kubernetes 1.13.5 binary cluster installation
    4. Kubernetes PV and PVC
