In Docker you can check a container's resource usage with docker stats; in Kubernetes the equivalent is kubectl top. Out of the box, however, it usually fails with an error like the following:
[root@linuxea metrics]# kubectl top pod
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
[root@linuxea metrics]#
The top command relies on heapster for this functionality: Kubernetes needs a cluster-level tool that collects, and also stores, the resource usage of every pod and even every node.
top reads its data from that collection and storage layer and then displays it; without heapster collecting and storing metrics, kubectl top cannot work. Part of the usage data shown in the dashboard also used to depend on heapster.
On a single node, commands such as top report that node's own resource usage. Once Kubernetes runs across many nodes, you cannot tell in advance which node a given pod will land on unless something is set up beforehand. To get a single unified view, every node must run a common metrics collection (and storage) tool, or at least a collection agent; when the data is needed, the system connects to each node, pulls the node's processes and its own resource usage through the local agent, and then displays the result in top.
Deployed on its own, heapster is only an aggregation tool: each node collects metrics for itself and for the pods running on it, but those metrics live only on that node, so a central component is needed to gather and store them.
- Architecture
Every node runs an important component: the kubelet. The kubelet can obtain resource data for the pods it creates; the part of the kubelet that actually collects pod data is its sub-component cAdvisor (cAdvisor is now a built-in feature).
cAdvisor is dedicated to collecting the resource usage of every container of every pod on the current node, along with the node's own resource and storage usage. In early versions the collected data was exposed on a port that could be browsed on the node itself; now the data is reported to heapster, which gathers what the cAdvisor on each node has collected.
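As a quick way to see where this node-local data lives, you can query the kubelet's read-only port directly (a sketch only; it assumes the read-only port 10255 is enabled, as the heapster logs further down confirm for this cluster, and that the node IP is reachable):
# Summary API: node- and pod-level CPU/memory/storage usage collected via cAdvisor
curl -s http://<node-ip>:10255/stats/summary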
heapster is hosted on the cluster, and each cAdvisor sends its collected data to it. Without a backing store heapster keeps the data in memory by default, but memory is limited, so historical data cannot be kept. For history you need InfluxDB: heapster writes what it collects into InfluxDB for persistent storage, and Grafana is used to display the data with InfluxDB as its data source. The flow is: cAdvisor on each node → heapster → InfluxDB → Grafana.
In addition, RBAC permissions need to be configured.
Pod resource monitoring falls into three categories of metrics: Kubernetes system metrics, container metrics, and business metrics.
influxDB
InfluxDB is a time series database; heapster depends on it, so deploy InfluxDB first.
In a production environment the InfluxDB data should really sit on a volume with persistence. The manifest in the heapster GitHub repository, however, uses an emptyDir (a persistent alternative is sketched after the snippet):
volumes:
- name: influxdb-storage
  emptyDir: {}
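A minimal sketch of swapping the emptyDir for a PersistentVolumeClaim; the claim name, size, and StorageClass below are illustrative, not part of the upstream manifest:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-data            # hypothetical claim name
  namespace: kube-system
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-nfs  # hypothetical StorageClass; use one that exists in your cluster
  resources:
    requests:
      storage: 10Gi
Then, in the influxdb Deployment, the volume becomes:
volumes:
- name: influxdb-storage
  persistentVolumeClaim:
    claimName: influxdb-data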
Download this YAML file from GitHub:
[root@linuxea metrics]# curl -Lk https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml -o $PWD/influxdb.yaml
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 960 100 960 0 0 1199 0 --:--:-- --:--:-- --:--:-- 1200
Once downloaded it can be applied as-is, or the apiVersion can be changed to apps/v1; if you change it, a selector must be added to the Deployment spec:
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: influxdb
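For context, here is a sketch of where that selector sits in the apps/v1 Deployment; the names and labels are the ones used by the upstream manifest, and the selector must match the pod template labels:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: influxdb
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    # ...the container and volume sections from the downloaded file stay unchanged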
apply
[root@linuxea metrics]# kubectl apply -f influxdb.yaml
deployment.apps/monitoring-influxdb created
service/monitoring-influxdb created
[root@linuxea metrics]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
monitoring-influxdb-848b9b66f6-4wtfb 1/1 Running 0 8s
[root@linuxea metrics]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 49d
kubernetes-dashboard NodePort 10.101.194.113 <none> 443:31780/TCP 28d
monitoring-influxdb ClusterIP 10.99.135.180 <none> 8086/TCP 55s
You can also check the logs to confirm the startup went fine: kubectl logs -n kube-system monitoring-influxdb-848b9b66f6-4wtfb
[root@linuxea metrics]# kubectl logs -n kube-system monitoring-influxdb-848b9b66f6-4wtfb
ts=2018-11-04T11:47:18.945900Z lvl=info msg="InfluxDB starting" log_id=0BZdcP8G000 version=unknown branch=unknown commit=unknown
ts=2018-11-04T11:47:18.945924Z lvl=info msg="Go runtime" log_id=0BZdcP8G000 version=go1.10.3 maxprocs=4
ts=2018-11-04T11:47:58.951399Z lvl=info msg="Using data dir" log_id=0BZdcP8G000 service=store path=/data/data
ts=2018-11-04T11:47:58.951575Z lvl=info msg="Open store (start)" log_id=0BZdcP8G000 service=store trace_id=0BZdeqPl000 op_name=tsdb_open op_event=start
ts=2018-11-04T11:47:58.951672Z lvl=info msg="Open store (end)" log_id=0BZdcP8G000 service=store trace_id=0BZdeqPl000 op_name=tsdb_open op_event=end op_elapsed=0.100ms
ts=2018-11-04T11:47:58.951762Z lvl=info msg="Opened service" log_id=0BZdcP8G000 service=subscriber
ts=2018-11-04T11:47:58.951776Z lvl=info msg="Starting monitor service" log_id=0BZdcP8G000 service=monitor
ts=2018-11-04T11:47:58.951781Z lvl=info msg="Registered diagnostics client" log_id=0BZdcP8G000 service=monitor name=build
ts=2018-11-04T11:47:58.951785Z lvl=info msg="Registered diagnostics client" log_id=0BZdcP8G000 service=monitor name=runtime
ts=2018-11-04T11:47:58.951824Z lvl=info msg="Registered diagnostics client" log_id=0BZdcP8G000 service=monitor name=network
ts=2018-11-04T11:47:58.951831Z lvl=info msg="Registered diagnostics client" log_id=0BZdcP8G000 service=monitor name=system
ts=2018-11-04T11:47:58.951897Z lvl=info msg="Starting precreation service" log_id=0BZdcP8G000 service=shard-precreation check_interval=10m advance_period=30m
ts=2018-11-04T11:47:58.952047Z lvl=info msg="Starting snapshot service" log_id=0BZdcP8G000 service=snapshot
ts=2018-11-04T11:47:58.952060Z lvl=info msg="Starting continuous query service" log_id=0BZdcP8G000 service=continuous_querier
ts=2018-11-04T11:47:58.951957Z lvl=info msg="Storing statistics" log_id=0BZdcP8G000 service=monitor db_instance=_internal db_rp=monitor interval=10s
ts=2018-11-04T11:47:58.952114Z lvl=info msg="Starting HTTP service" log_id=0BZdcP8G000 service=httpd authentication=false
ts=2018-11-04T11:47:58.952155Z lvl=info msg="opened HTTP access log" log_id=0BZdcP8G000 service=httpd path=stderr
ts=2018-11-04T11:47:58.952232Z lvl=info msg="Listening on HTTP" log_id=0BZdcP8G000 service=httpd addr=[::]:8086 https=false
ts=2018-11-04T11:47:58.952253Z lvl=info msg="Starting retention policy enforcement service" log_id=0BZdcP8G000 service=retention check_interval=30m
ts=2018-11-04T11:47:58.952449Z lvl=info msg="Listening for signals" log_id=0BZdcP8G000
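The log shows the HTTP service listening on 8086. As an extra check you can hit the /ping endpoint that heapster itself will later use (a sketch; port-forwarding to a Service requires a reasonably recent kubectl):
# forward the Service port to the local machine
kubectl -n kube-system port-forward svc/monitoring-influxdb 8086:8086 &
# InfluxDB answers /ping with 204 No Content when it is healthy
curl -i http://127.0.0.1:8086/ping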
rbac
Download the RBAC manifest from GitHub and apply it:
[root@linuxea metrics]# curl -Lk https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml -o $PWD/heapster-rbac.yaml
[root@linuxea metrics]# kubectl apply -f heapster-rbac.yaml
clusterrolebinding.rbac.authorization.k8s.io/heapster created
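For reference, the binding this manifest creates looks roughly like the following (a sketch from memory of the upstream heapster-rbac.yaml; check it against the downloaded file):
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: heapster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:heapster        # built-in role that grants read access to node and pod stats
subjects:
- kind: ServiceAccount
  name: heapster
  namespace: kube-system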
heapster
The heapster ServiceAccount runs bound to that ClusterRole, as the RBAC YAML above shows.
[root@linuxea metrics]# curl -Lk https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml -o $PWD/heapster.yaml
The flag --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
points heapster at the InfluxDB Service by its DNS name (the monitoring-influxdb Service in kube-system), which is resolvable inside the cluster.
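For context, the relevant part of the heapster container spec in heapster.yaml looks roughly like this (the image tag matches the version reported in the heapster logs below; verify it against the downloaded file):
containers:
- name: heapster
  image: k8s.gcr.io/heapster-amd64:v1.5.4
  command:
  - /heapster
  - --source=kubernetes:https://kubernetes.default
  - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086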
To make it easier to reach from outside the cluster, add type NodePort to the heapster Service:
spec:
  ports:
  - port: 80
    targetPort: 8082
  type: NodePort
apply
[root@linuxea metrics]# kubectl apply -f heapster.yaml
serviceaccount/heapster created
deployment.apps/heapster created
service/heapster created
[root@linuxea metrics]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster NodePort 10.109.4.29 <none> 80:32154/TCP 11s
[root@linuxea metrics]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
heapster-84c9bc48c4-lwlk7 1/1 Running 0 1m
You can check the logs to see whether it is ready:
[root@linuxea metrics]# kubectl logs -f heapster-84c9bc48c4-lwlk7 -n kube-system
I1104 12:03:48.305225 1 heapster.go:78] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
I1104 12:03:48.305268 1 heapster.go:79] Heapster version v1.5.4
I1104 12:03:48.305424 1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I1104 12:03:48.305466 1 configs.go:62] Using kubelet port 10255
E1104 12:05:57.947315 1 influxdb.go:297] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb.kube-system.svc:8086" - Get http://monitoring-influxdb.kube-system.svc:8086/ping: dial tcp 10.99.135.180:8086: getsockopt: connection timed out, will retry on use
I1104 12:05:57.947337 1 influxdb.go:312] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc:8086 user:root db:k8s
I1104 12:05:57.947358 1 heapster.go:202] Starting with InfluxDB Sink
I1104 12:05:57.947363 1 heapster.go:202] Starting with Metric Sink
I1104 12:05:57.954628 1 heapster.go:112] Starting heapster on port 8082
After that, try accessing it through a browser.
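Heapster also exposes a metrics model API on the same port, so a quick check against the NodePort works from the command line too (a sketch; the node IP and the NodePort 32154 come from the service listing above, and the exact model API paths may differ across heapster versions):
# list the nodes heapster knows about
curl http://<node-ip>:32154/api/v1/model/nodes/
# CPU usage rate of one node via the model API
curl http://<node-ip>:32154/api/v1/model/nodes/<node-name>/metrics/cpu/usage_rate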
grafana
The grafana YAML mounts two volumes: one for data and one for CA certificates, mounted at /etc/ssl/certs; if you want to use your own certificates they have to be placed under /etc/ssl/certs.
It also passes the INFLUXDB_HOST and GF_SERVER_HTTP_PORT environment variables.
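Roughly, the relevant env section of grafana.yaml looks like this (a sketch; only the two variables named above are taken from the text, and the values should be checked against the downloaded file):
env:
- name: INFLUXDB_HOST
  value: monitoring-influxdb     # resolved via the InfluxDB Service in kube-system
- name: GF_SERVER_HTTP_PORT
  value: "3000"                  # matches the targetPort used in the Service below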
[root@linuxea metrics]# curl -Lk https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml -o $PWD/grafana.yaml
To access it from outside the cluster, add type NodePort and set the node port to 30980:
ports:
- name: http
  port: 80
  targetPort: 3000
  nodePort: 30980
  protocol: TCP
type: NodePort
selector:
  k8s-app: grafana
apply
[root@linuxea metrics]# kubectl apply -f grafana.yaml
deployment.apps/monitoring-grafana created
service/monitoring-grafana unchanged
[root@linuxea metrics]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster NodePort 10.109.4.29 <none> 80:32154/TCP 26m
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 49d
kubernetes-dashboard NodePort 10.101.194.113 <none> 443:31780/TCP 28d
monitoring-grafana NodePort 10.103.116.252 <none> 80:30980/TCP 4s
[root@linuxea metrics]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
heapster-84c9bc48c4-lwlk7 1/1 Running 0 18m
kubernetes-dashboard-767dc7d4d-q6ls7 1/1 Running 0 6d
monitoring-grafana-555545f477-hpz85 1/1 Running 0 1m
monitoring-influxdb-848b9b66f6-4wtfb 1/1 Running 0 34m
From outside the cluster, Grafana can now be reached on port 30980 of any node IP. Note that heapster was fully deprecated in Kubernetes 1.13; see the Grafana and Heapster references, and try downloading other dashboard templates to use.
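Once heapster has had a couple of minutes to collect data, the command from the beginning of this post should also work (illustrative commands; output varies per cluster):
kubectl top node
kubectl top pod -n kube-system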