By default the Prometheus Operator does not persist its data to durable storage, so when a Pod is deleted or restarted unexpectedly, the monitoring data can be lost.
In this post I use the NFS client provisioner for the demo; for other storage backends, see the StorageClass section of the official documentation. Most of the deployment parameters have already been covered in earlier posts and are not explained again here. If anything is unclear, read up on the theory of PV, PVC, and StorageClass first:
- Kubernetes PV and PVC
- Persistent Storage: StorageClass
Environment
```
192.168.0.10  k8s-01
192.168.0.11  k8s-02
192.168.0.12  k8s-03
192.168.0.13  k8s-04
192.168.0.14  NFS server
```
First, deploy the NFS server by installing the NFS service on 192.168.0.14:
```
# A dedicated server is used here for the demo; in practice any machine with NFS installed will do
# (it is recommended to keep it separate from the Kubernetes cluster and use a standalone host)
[root@nfs ~]# yum install -y nfs-utils rpcbind

# Create the NFS export directory
[root@nfs ~]# mkdir /data1/k8s-volume -p
[root@nfs ~]# chmod 755 /data1/k8s-volume/

# Edit the NFS configuration file
[root@nfs ~]# cat /etc/exports
/data1/k8s-volume *(rw,no_root_squash,sync)
# Export directory; * allows any client to connect; rw = read/write;
# sync = data is written to disk and memory at the same time;
# no_root_squash = the client's root user is NOT mapped to an anonymous user

# Start rpcbind
[root@nfs ~]# systemctl start rpcbind
[root@nfs ~]# systemctl enable rpcbind
[root@nfs ~]# systemctl status rpcbind
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-03-10 07:41:39 EDT; 19s ago
 Main PID: 4430 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─4430 /sbin/rpcbind -w

Mar 10 07:41:39 NFS systemd[1]: Starting RPC bind service...
Mar 10 07:41:39 NFS systemd[1]: Started RPC bind service.

# Start NFS
[root@nfs ~]# systemctl restart nfs
[root@nfs ~]# systemctl enable nfs
[root@nfs ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: active (exited) since Tue 2020-03-10 07:42:17 EDT; 8s ago
 Main PID: 4491 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

Mar 10 07:42:17 NFS systemd[1]: Starting NFS server and services...
Mar 10 07:42:17 NFS systemd[1]: Started NFS server and services.

# Check that rpcbind and nfs are registered correctly
[root@nfs ~]# rpcinfo | grep nfs
    100003    3    tcp       0.0.0.0.8.1            nfs        superuser
    100003    4    tcp       0.0.0.0.8.1            nfs        superuser
    100227    3    tcp       0.0.0.0.8.1            nfs_acl    superuser
    100003    3    udp       0.0.0.0.8.1            nfs        superuser
    100003    4    udp       0.0.0.0.8.1            nfs        superuser
    100227    3    udp       0.0.0.0.8.1            nfs_acl    superuser
    100003    3    tcp6      ::.8.1                 nfs        superuser
    100003    4    tcp6      ::.8.1                 nfs        superuser
    100227    3    tcp6      ::.8.1                 nfs_acl    superuser
    100003    3    udp6      ::.8.1                 nfs        superuser
    100003    4    udp6      ::.8.1                 nfs        superuser
    100227    3    udp6      ::.8.1                 nfs_acl    superuser

# Check the export options applied to the NFS directory
[root@NFS ~]# cat /var/lib/nfs/etab
/data1/k8s-volume *(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
```
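Before moving on, it is worth verifying from another machine that the export is actually visible. A minimal check (assuming nfs-utils is already installed on the machine you run it from):

```
# Query the export list of the NFS server configured above;
# the output should include /data1/k8s-volume
showmount -e 192.168.0.14
```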
The NFS server side is now ready. Next, install the NFS client packages on every cluster node that needs to mount NFS:
```
[root@all-nodes ~]# yum install -y nfs-utils rpcbind
[root@all-nodes ~]# systemctl start rpcbind
[root@all-nodes ~]# systemctl enable rpcbind
[root@all-nodes ~]# systemctl start nfs
[root@all-nodes ~]# systemctl enable nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.

## If rpcbind fails to start with "Job rpcbind.service/start failed with result 'dependency'.",
## run the commands below
# Locate the unit file: find /etc/ -name '*rpcbind.socket*'
sed -i 's/ListenStream=\[::\]:111/#ListenStream=\[::\]:111/g' /etc/systemd/system/sockets.target.wants/rpcbind.socket
systemctl daemon-reload
systemctl restart rpcbind.socket
systemctl start nfs
```
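Optionally, a quick manual mount from one of the nodes confirms the export is writable before Kubernetes ever touches it. This is only a sanity check, assuming /mnt is free to use as a temporary mount point:

```
# Temporarily mount the export, write a test file, then clean up and unmount
mount -t nfs 192.168.0.14:/data1/k8s-volume /mnt
touch /mnt/nfs-write-test && ls -l /mnt/nfs-write-test
rm -f /mnt/nfs-write-test
umount /mnt
```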
With NFS installed, let's look at where the Prometheus Operator currently stores its data:
```
[root@k8s-01 ~]# kubectl get pod -n monitoring prometheus-k8s-0 -o yaml
....
    volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-k8s-db
    - mountPath: /etc/prometheus/rules/prometheus-k8s-rulefiles-0
      name: prometheus-k8s-rulefiles-0
....
  - emptyDir: {}
    name: prometheus-k8s-db
  - name: prometheus-k8s-token-6rv95
```
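If you only want the relevant volume rather than the whole Pod spec, a jsonpath query narrows it down (purely a convenience):

```
# Print only the prometheus-k8s-db volume definition; before the change it shows
# an emptyDir, afterwards it should show a persistentVolumeClaim
kubectl get pod -n monitoring prometheus-k8s-0 \
  -o jsonpath='{.spec.volumes[?(@.name=="prometheus-k8s-db")]}'
```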
The /prometheus directory is mounted as an emptyDir volume, so once the Pod is recreated the previous data is gone. Since Prometheus is deployed through a StatefulSet controller, we use a StorageClass for persistence so the data survives restarts.
Because NFS is the backend storage, an nfs-client provisioner is needed. Save the following Deployment as nfs-client.yaml:
```
# The nfs-client provisioner must be created, otherwise the Prometheus Pods will never reach the Running state
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
      - name: nfs-client-provisioner
        image: quay.io/external_storage/nfs-client-provisioner:latest
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: fuseim.pri/ifs
        - name: NFS_SERVER
          value: 192.168.0.14        # NFS server address
        - name: NFS_PATH
          value: /data1/k8s-volume   # NFS export directory
      volumes:
      - name: nfs-client-root
        nfs:
          server: 192.168.0.14
          path: /data1/k8s-volume
```
Create the RBAC manifest for the nfs-client provisioner (nfs-rbac.yaml):
```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["create", "delete", "get", "list", "watch", "patch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
  name: nfs-client-provisioner
  namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
```
Apply both manifests:
```
[root@k8s-01 manifests]# kubectl apply -f nfs-rbac.yaml
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created

[root@k8s-01 manifests]# kubectl apply -f nfs-client.yaml
deployment.apps/nfs-client-provisioner created

[root@k8s-01 manifests]# kubectl get pod
NAME                                      READY   STATUS    RESTARTS   AGE
myapp-5jlc7                               1/1     Running   1          2d
myapp-cg4lq                               1/1     Running   2          3d8h
myapp-pplfn                               1/1     Running   1          3d8h
myapp-wkfqz                               1/1     Running   2          3d8h
nfs-client-provisioner-57cb5b4cfd-kbttp   1/1     Running   0          2m1s
```
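If anything looks off later, the provisioner's logs are usually the first place to check. A quick look (optional):

```
# The logs should show the provisioner starting up and, later, PVC provisioning events;
# errors here typically point to NFS mount or permission problems
kubectl logs deploy/nfs-client-provisioner
```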
Next, create a StorageClass object:
```
[root@k8s-01 ~]# cat prometheus-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: fuseim.pri/ifs

# Create it
[root@k8s-01 ~]# kubectl apply -f prometheus-storageclass.yaml
storageclass.storage.k8s.io/prometheus-data-db created
```
This StorageClass declares provisioner: fuseim.pri/ifs, which matches the PROVISIONER_NAME environment variable set in the nfs-client Deployment above, so any PVC that references this class is dynamically provisioned on the NFS backend.
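Before wiring the StorageClass into Prometheus, it can be useful to confirm that dynamic provisioning actually works with a throwaway PVC. The sketch below uses a hypothetical claim name (test-claim); it should go to Bound within a few seconds and leave a directory on the NFS export, and can then be deleted:

```
# Create a small test PVC against the new StorageClass, check that it binds, then remove it
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: prometheus-data-db
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Mi
EOF
kubectl get pvc test-claim
kubectl delete pvc test-claim
```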
Next, add the following storage configuration to the Prometheus custom resource:
```
vim kube-prometheus-master/manifests/prometheus-prometheus.yaml
...
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 10Gi
....
# Only this block needs to be added under spec:; storageClassName is the name of the
# StorageClass created above, and storage is the size of the requested volume
```
The complete Prometheus configuration file looks like this:
```
[root@k8s-01 manifests]# cat prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 10Gi
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
```
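Re-apply the manifest so the Operator reconciles the StatefulSet with the new volumeClaimTemplate (assuming you run this from the kube-prometheus manifests directory):

```
# The Operator recreates the prometheus-k8s Pods with a PVC instead of an emptyDir
kubectl apply -f prometheus-prometheus.yaml
```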
Check that the Prometheus Pods start up correctly:
```
[root@k8s-01 manifests]# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          11h
alertmanager-main-1                    2/2     Running   15         8d
alertmanager-main-2                    2/2     Running   11         4d3h
grafana-558647b59-msz6b                1/1     Running   5          8d
kube-state-metrics-5bfc7db74d-r95r2    4/4     Running   21         8d
node-exporter-24kdw                    2/2     Running   10         8d
node-exporter-4pqhb                    2/2     Running   8          8d
node-exporter-pbjb2                    2/2     Running   8          8d
node-exporter-vcq6c                    2/2     Running   10         8d
prometheus-adapter-57c497c557-7jqq7    1/1     Running   1          2d
prometheus-k8s-0                       3/3     Running   1          2m4s
prometheus-k8s-1                       3/3     Running   1          2m3s
prometheus-operator-69bd579bf9-vq8cd   1/1     Running   1          2d
```
We can also check the PV and PVC objects:
```
[root@k8s-01 manifests]# kubectl get pv -n monitoring | grep prom
pvc-5ee985bb-62cd-11ea-b6d7-000c29eeccce   10Gi   RWO   Delete   Bound   monitoring/prometheus-k8s-db-prometheus-k8s-0   prometheus-data-db   2m36s
pvc-5f0d05c0-62cd-11ea-b6d7-000c29eeccce   10Gi   RWO   Delete   Bound   monitoring/prometheus-k8s-db-prometheus-k8s-1   prometheus-data-db   2m45s

[root@k8s-01 manifests]# kubectl get pvc -n monitoring
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-5ee985bb-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            prometheus-data-db   2m49s
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-5f0d05c0-62cd-11ea-b6d7-000c29eeccce   10Gi       RWO            prometheus-data-db   2m48s
```
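On the NFS server itself there should now be one subdirectory per provisioned PV. With the nfs-client provisioner the directories are typically named from the namespace, PVC name, and PV name; the exact names will differ in your environment:

```
# Run on the NFS server (192.168.0.14): each provisioned PV gets its own subdirectory,
# e.g. monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-<uid>
ls /data1/k8s-volume/
```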
Next, let's test whether any data is lost after the Pods are deleted.
Record a reference point before the deletion, for example by noting the current graph in the Prometheus UI.
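If you prefer not to rely on a screenshot, one way to record a reference point is to query the Prometheus HTTP API through a port-forward and note a sample value and timestamp (the query `up` is just an example):

```
# Forward the Prometheus web port locally (leave this running in a separate terminal)
kubectl port-forward -n monitoring prometheus-k8s-0 9090:9090
# In another terminal: query a metric and note the returned value and timestamp
curl -s 'http://localhost:9090/api/v1/query?query=up'
```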
Delete the Pods:
```
[root@k8s-01 manifests]# kubectl delete pod -n monitoring prometheus-k8s-0
pod "prometheus-k8s-0" deleted
[root@k8s-01 manifests]# kubectl delete pod -n monitoring prometheus-k8s-1
pod "prometheus-k8s-1" deleted
```
Wait for the new Pods to start, then check the result.
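Once the new Pods are Running, the same PVCs should still be Bound (their AGE keeps counting from the original creation), and repeating the query from before should still return the historical samples:

```
# The PVC names and bound volumes are unchanged, so the TSDB data on NFS is reused by the new Pods
kubectl get pvc -n monitoring
kubectl get pod -n monitoring prometheus-k8s-0 prometheus-k8s-1
```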
As you can see, no data has been lost.
Related articles:
- Kubernetes 1.14 binary cluster installation
- CentOS 7 etcd cluster configuration guide
- Kubernetes 1.13.5 binary cluster installation
- Kubernetes PV and PVC