metrics-server
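metrics-server is an API server exposed to users. It serves the resource metrics API: it does not serve the Kubernetes API, let alone the pod API, and only serves objects such as CPU utilization and memory usage.
metrics-server is not a component of Kubernetes itself; it is just a pod hosted on top of Kubernetes. So that users can consume the API offered by metrics-server seamlessly through Kubernetes, the pieces are organized as follows:
Kubernetes keeps running as usual, and a metrics-server runs alongside it. metrics-server provides a second set of APIs. To merge the two sets of APIs and use them as one, a proxy layer has to sit in front of them, and this proxy is called the aggregator (kube-aggregator). The aggregator is not limited to metrics-server; other third-party API servers can be aggregated as well.
The resource metrics exposed through the aggregator live at /apis/metrics.k8s.io/v1beta1. Kubernetes does not provide this endpoint by default; metrics-server provides /apis/metrics.k8s.io/v1beta1, while Kubernetes provides the native API groups. The two API servers are combined by kube-aggregator, so a user going through kube-aggregator can reach both the native API groups and the additional group provided by metrics-server.
In fact, other APIs can be added in the same way, simply by registering them under kube-aggregator. Now that heapster is deprecated, metrics will become a prerequisite for several core Kubernetes features, such as kubectl top; without metrics, these do not work. To supply them with data, we need to deploy metrics-server.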
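To make the aggregated path described above concrete, here is a minimal sketch of how the metrics endpoints could be addressed through the main API server once metrics-server is running. The helper name `metrics_path` is hypothetical and only builds the request path; the actual request is left as a commented usage example because it needs a live cluster.

```shell
# Build the request path for the resource metrics API served through kube-aggregator.
# metrics_path is a hypothetical helper, not part of kubectl or metrics-server.
metrics_path() {
  resource="$1"
  namespace="${2:-}"
  base="/apis/metrics.k8s.io/v1beta1"
  if [ -n "$namespace" ]; then
    # Namespaced resources (pods) live under /namespaces/<namespace>/
    printf '%s/namespaces/%s/%s\n' "$base" "$namespace" "$resource"
  else
    # Cluster-scoped resources (nodes), or pods across all namespaces
    printf '%s/%s\n' "$base" "$resource"
  fi
}

# Usage against a live cluster (requires kubectl and a deployed metrics-server):
#   kubectl get --raw "$(metrics_path nodes)"
#   kubectl get --raw "$(metrics_path pods kube-system)"
```

Because the request goes to the ordinary API server address, the client does not need to know that a separate metrics-server pod answers it; that routing is what the aggregator provides.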
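Deployment
We can take metrics-server either from the Kubernetes source tree or from the metrics-server repository itself. The two git addresses are different, and so are their contents.
1. kubernetes-incubator
Clone the metrics-server code from GitHub, then deploy it using the 1.8+ manifests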
[root@linuxea ~]# git clone https://github.com/kubernetes-incubator/metrics-server.git
Use kubectl apply -f ./ to deploy all the YAML files under /root/metrics-server/deploy/1.8+
[root@linuxea ~]# cd /root/metrics-server/deploy/1.8+
[root@linuxea 1.8+]# kubectl apply -f ./
- Make sure the metrics-server service has been created successfully
[root@linuxea 1.8+]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 53d
kubernetes-dashboard NodePort 10.101.194.113 <none> 443:31780/TCP 32d
metrics-server ClusterIP 10.99.129.34 <none> 443/TCP 1m
- Make sure the metrics-server-85cc795fbf-7srw2 pod is running
[root@linuxea 1.8+]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
metrics-server-85cc795fbf-7srw2 1/1 Running 0 1m
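The two checks above can also be scripted. The following is a rough sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the function name `pods_ready` is made up for illustration.

```shell
# Read `kubectl get pods` output on stdin. Succeed only if at least one pod whose
# name starts with the given prefix exists and every such pod is Running with all
# of its containers ready. pods_ready is a hypothetical helper name.
pods_ready() {
  awk -v p="$1" '
    NR > 1 && index($1, p) == 1 {
      found = 1
      split($2, r, "/")                       # READY column looks like "1/1"
      if ($3 != "Running" || r[1] != r[2]) bad = 1
    }
    END { exit ((found && !bad) ? 0 : 1) }
  '
}

# Usage against a live cluster:
#   kubectl get pods -n kube-system | pods_ready metrics-server && echo "metrics-server is up"
```

Parsing the table output is fragile if the column layout changes, so treat this as a quick sanity check rather than a robust health probe.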
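2. metrics-server from cluster/addons/metrics-server in the kubernetes repository
Take metrics-server from the Kubernetes source tree
Please note that I am using Kubernetes v1.11.1 here (reinstalled several times along the way), and the container runtime is docker://18.05.0-ce
The metrics-server and metrics-server-nanny versions are as follows:
Tip: if you are not on this version, and especially if you are on a newer one, read the usage documentation on GitHub, or check the source and the YAML files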
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.1
- name: metrics-server-nanny
  image: k8s.gcr.io/addon-resizer:1.8.3
Download just these files
auth-delegator.yaml
auth-reader.yaml
metrics-apiservice.yaml
metrics-server-deployment.yaml
metrics-server-service.yaml
resource-reader.yaml
[root@linuxea metrics-server]# for i in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml;do wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$i;done
If you run into errors like the ones below, the changes that follow address them:
403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)
E0903 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host
no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )
E1109 09:54:49.509521 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]
We adjust a few parameters
In metrics-server-deployment.yaml, edit the command arguments to set the CPU and memory sizes
command:
  - /pod_nanny
  - --config-dir=/etc/config
  - --cpu=100m
  - --extra-cpu=0.5m
  - --memory=100Mi
  - --extra-memory=50Mi
  - --threshold=5
  - --deployment=metrics-server-v0.3.1
  - --container=metrics-server
  - --poll-period=300000
  - --estimator=exponential
  # Specifies the smallest cluster (defined in number of nodes)
  # resources will be scaled to.
  - --minClusterSize=10
Also edit the container section for metrics-server-amd64:v0.3.1 and add the following:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
The final result looks like this:
spec:
  priorityClassName: system-cluster-critical
  serviceAccountName: metrics-server
  containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.1
    command:
    - /metrics-server
    - --metric-resolution=30s
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
    # These are needed for GKE, which doesn't support secure communication yet.
    # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
    #- --kubelet-port=10255
    #- --deprecated-kubelet-completely-insecure=true
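The - --kubelet-insecure-tls option disables TLS verification, which is generally not recommended in production. And because DNS cannot resolve these node hostnames, - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP is used to work around the problem, so that the InternalIP is tried first. Another approach is to modify coredns, but I do not recommend doing that.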
See: https://github.com/kubernetes-incubator/metrics-server/issues/131
In addition, add - nodes/stats to resource-reader.yaml, as follows:
[root@linuxea metrics-server]# cat resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
See: https://github.com/kubernetes-incubator/metrics-server/issues/95
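Since a missing nodes/stats entry is what produces the 403 Forbidden error quoted earlier, it can be worth checking for it before applying. A minimal sketch, assuming the resource is listed one item per line as in the file above; `has_nodes_stats` is a hypothetical helper name.

```shell
# Read a ClusterRole manifest on stdin and succeed if nodes/stats appears as a
# list item (optionally quoted). has_nodes_stats is a hypothetical helper name.
has_nodes_stats() {
  grep -Eq '^[[:space:]]*-[[:space:]]*"?nodes/stats"?[[:space:]]*$'
}

# Usage:
#   has_nodes_stats < resource-reader.yaml || echo "nodes/stats is missing"
#   kubectl get clusterrole system:metrics-server -o yaml | has_nodes_stats
```

The second usage line checks the role as it exists in the cluster, which is useful when the addon manager may have reconciled it back to its original contents.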
Apply the manifests
[root@linuxea metrics-server]# pwd
/root/metrics-server
[root@linuxea metrics-server]# kubectl apply -f .
[root@linuxea metrics-server]# kubectl get pods,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-65ndt 1/1 Running 0 2m18s
pod/coredns-576cbf47c7-rrk4f 1/1 Running 0 2m18s
pod/etcd-linuxea.master-1.com 1/1 Running 0 89s
pod/kube-apiserver-linuxea.master-1.com 1/1 Running 0 97s
pod/kube-controller-manager-linuxea.master-1.com 1/1 Running 0 84s
pod/kube-flannel-ds-amd64-4dtgp 1/1 Running 0 115s
pod/kube-flannel-ds-amd64-6g2sm 1/1 Running 0 48s
pod/kube-flannel-ds-amd64-7txhx 1/1 Running 0 50s
pod/kube-flannel-ds-amd64-fs4lw 1/1 Running 0 57s
pod/kube-flannel-ds-amd64-v2qvv 1/1 Running 0 48s
pod/kube-proxy-bmhfh 1/1 Running 0 2m18s
pod/kube-proxy-c9wkz 1/1 Running 0 50s
pod/kube-proxy-d8vlj 1/1 Running 0 57s
pod/kube-proxy-rpst5 1/1 Running 0 48s
pod/kube-proxy-t5pzg 1/1 Running 0 48s
pod/kube-scheduler-linuxea.master-1.com 1/1 Running 0 97s
pod/metrics-server-v0.3.1-69788f46f9-82w76 2/2 Running 0 15s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 2m32s
service/metrics-server ClusterIP 10.103.131.149 <none> 443/TCP 19s
[root@linuxea metrics-server]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metrics-server ClusterIP 10.98.186.115 <none> 443/TCP 42s
At this point, the metrics.k8s.io/v1beta1 group provided by metrics-server shows up in api-versions
[root@linuxea metrics-server]# kubectl api-versions|grep metrics
metrics.k8s.io/v1beta1
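A plain grep for "metrics" would also match other groups such as custom.metrics.k8s.io, so a script may prefer an exact match. The sketch below assumes the one-entry-per-line output of kubectl api-versions; `api_version_served` is a hypothetical helper name.

```shell
# Read `kubectl api-versions` output on stdin and succeed only if the given
# group/version appears as a whole line. api_version_served is a hypothetical name.
api_version_served() {
  grep -Fxq -- "$1"
}

# Usage against a live cluster:
#   kubectl api-versions | api_version_served metrics.k8s.io/v1beta1 && echo "registered"
#
# Being registered does not guarantee the backend is healthy. The APIService
# object's Available condition can be read with:
#   kubectl get apiservice v1beta1.metrics.k8s.io \
#     -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
```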
With all of this in place, we can try looking at the collected data
[root@linuxea metrics-server]# kubectl top pods
NAME CPU(cores) MEMORY(bytes)
linuxea-hpa-68ffdc8b94-jjfw7 1m 103Mi
linuxea-hpa-68ffdc8b94-mbgc8 1m 99Mi
linuxea-hpa-68ffdc8b94-trtkm 1m 101Mi
linuxea-hpa-68ffdc8b94-twcxx 1m 100Mi
linuxea-hpa-68ffdc8b94-w9d7j 1m 100Mi
[root@linuxea metrics-server]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
linuxea.master-1.com 197m 4% 3213Mi 41%
linuxea.node-1.com 60m 1% 939Mi 24%
linuxea.node-2.com 58m 1% 1066Mi 27%
linuxea.node-3.com 127m 3% 673Mi 17%
linuxea.node-4.com 47m 1% 664Mi 17%
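As a small illustration of putting this output to use, the sketch below lists nodes whose memory usage is at or above a given percentage. It assumes the five-column layout shown above (MEMORY% in the fifth column); the function name `nodes_over_memory` is hypothetical.

```shell
# Read `kubectl top nodes` output on stdin and print the names of nodes whose
# MEMORY% is at or above the given threshold. nodes_over_memory is a hypothetical name.
nodes_over_memory() {
  awk -v t="$1" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)                 # "41%" -> "41"
      if (pct + 0 >= t + 0) print $1
    }
  '
}

# Usage against a live cluster:
#   kubectl top nodes | nodes_over_memory 25
```

With the sample output above and a threshold of 25, this would print linuxea.master-1.com and linuxea.node-2.com.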