Some time after dealing with the kubelet.go `node "master" not found` problem, I ran into the same issue on other nodes. This time it surfaced as /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory.
I had written up the earlier problem in linuxea: fixing the k8s kubelet.go node "master" not found issue.
If you follow that earlier approach and delete the reference to /etc/kubernetes/bootstrap-kubelet.conf, the `node "master" not found` error may come back, which I then resolved by using admin.conf as the kubelet's startup kubeconfig.
But I later realized what had actually happened: the kubelet's certificate had expired and the cluster certificates were then renewed, and that is what triggered the error above. Misled by it, I deleted the --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf field from 10-kubeadm.conf, restarted, and replaced kubelet.conf with the master's admin.conf. That made the error go away, but it only masked the real problem.
The root cause is that the kubelet's certificate was never renewed. This happens after manually extending the certificates' expiry: kubeadm renews the control-plane certificates but does not renew the kubelet's certificate (in other words, client certificate rotation had failed).
So as soon as the kubelet was restarted, the certificate mismatch surfaced. Replacing the kubelet.conf with the master's admin.conf had only appeared to fix things because the kubelet had not been restarted at the time.
Let's look at the same error, here on Kubernetes 1.16:
Feb 09 16:41:11 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Feb 09 16:41:11 master systemd[1]: Unit kubelet.service entered failed state.
Feb 09 16:41:11 master systemd[1]: kubelet.service failed.
Feb 09 16:41:22 master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Feb 09 16:41:22 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 09 16:41:22 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 09 16:41:22 master kubelet[74138]: I0209 16:41:22.222741 74138 server.go:410] Version: v1.16.3
Feb 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223911 74138 plugins.go:100] No cloud provider specified.
Feb 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223954 74138 server.go:773] Client rotation is on, will bootstrap in background
Feb 09 16:41:22 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Feb 09 16:41:22 master kubelet[74138]: E0209 16:41:22.227202 74138 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2021-03-18 08:46:29 +0000 UTC
Feb 09 16:41:22 master kubelet[74138]: F0209 16:41:22.227239 74138 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Feb 09 16:41:22 master systemd[1]: Unit kubelet.service entered failed state.
Feb 09 16:41:22 master systemd[1]: kubelet.service failed.
My earlier approach was simply to delete the /etc/kubernetes/bootstrap-kubelet.conf part (this is a kubeadm install). That flag lives in the kubelet's systemd startup configuration, which you can view with the command below. The dates in the output don't matter; it is shown only for illustration:
[root@master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2021-12-30 03:08:09 CST; 1 months 23 days ago
Docs: https://kubernetes.io/docs/
Main PID: 32478 (kubelet)
Tasks: 29
Memory: 106.9M
CGroup: /system.slice/kubelet.service
└─32478 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/confi...
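For reference, the drop-in carrying the --bootstrap-kubeconfig flag typically looks like the fragment below on kubeadm installs (exact content varies by version; shown only to help locate the flag):

```ini
# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf (typical kubeadm layout)
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS
```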
Checking the certificates
First, check the cluster certificates:
[root@master pki]# kubeadm alpha certs check-expiration
CERTIFICATE EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
admin.conf Feb 07, 2032 08:31 UTC 9y no
apiserver Feb 07, 2032 08:31 UTC 9y no
apiserver-etcd-client Feb 07, 2032 08:31 UTC 9y no
apiserver-kubelet-client Feb 07, 2032 08:31 UTC 9y no
controller-manager.conf Feb 07, 2032 08:31 UTC 9y no
etcd-healthcheck-client Feb 07, 2032 08:31 UTC 9y no
etcd-peer Feb 07, 2032 08:31 UTC 9y no
etcd-server Feb 07, 2032 08:31 UTC 9y no
front-proxy-client Feb 07, 2032 08:31 UTC 9y no
scheduler.conf Feb 07, 2032 08:31 UTC 9y no
The dates shown there are fine. Next, check the kubelet's certificate. kubelet.conf points at a symlink file under /var/lib/kubelet/pki, so let's look at its expiry time:
[root@master ]# cd /var/lib/kubelet/pki
[root@master pki]# ls
kubelet-client-2020-03-18-16-46-37.pem kubelet-client-2021-01-28-09-11-35.pem kubelet-client-current.pem kubelet.key
kubelet-client-2020-03-18-16-47-03.pem kubelet-client-2022-02-09-16-22-05.pem kubelet.crt
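The kubelet-client-current.pem entry in that listing is a symlink whose target encodes the date of the last rotation, so readlink tells you which pem the kubelet is really using. A minimal runnable sketch using a stand-in directory under /tmp (on a real node, run readlink directly against /var/lib/kubelet/pki/kubelet-client-current.pem):

```shell
# Recreate the layout of /var/lib/kubelet/pki with throwaway files (demo only)
mkdir -p /tmp/demo-pki
touch /tmp/demo-pki/kubelet-client-2022-02-09-16-22-05.pem
ln -sf kubelet-client-2022-02-09-16-22-05.pem /tmp/demo-pki/kubelet-client-current.pem

# Which pem is actually in use? The symlink target carries the rotation date.
readlink /tmp/demo-pki/kubelet-client-current.pem
# → kubelet-client-2022-02-09-16-22-05.pem
```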
[root@master pki]# openssl x509 -noout -enddate -in ./kubelet.crt
notAfter=Mar 18 07:46:26 2021 GMT
We can see notAfter=Mar 18 07:46:26 2021 GMT, i.e. this certificate already expired on March 18, 2021 at 07:46:26.
The kubelet-client-2022-02-09-16-22-05.pem file was produced by kubeadm alpha certs renew all, which is why it carries a different date. The kubeadm-managed certificates were renewed for ten years, so they are not the issue. But this pem's date does not match ours either:
the kubelet client certificate was never actually renewed.
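The same openssl check works for any of these files. Below is a minimal runnable sketch: it generates a throwaway self-signed certificate under /tmp so the commands can be tried anywhere; on a real node you would point -in at the actual file, e.g. /var/lib/kubelet/pki/kubelet-client-current.pem.

```shell
# Generate a throwaway cert purely for demonstration (not part of any cluster)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" -days 365 \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null

# Print the expiry date, as done against kubelet.crt above
openssl x509 -noout -enddate -in /tmp/demo.crt

# Exit non-zero if the cert expires within the next 24 hours (86400 seconds)
openssl x509 -noout -checkend 86400 -in /tmp/demo.crt && echo "valid for at least one more day"
```

openssl x509 -checkend is handy in a cron job or monitoring script to catch an upcoming expiry before the kubelet trips over it.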
Kubelet client certificate rotation fails
This comes from the Kubernetes troubleshooting documentation, "Kubelet client certificate rotation fails". The original text reads:
By default, kubeadm configures a kubelet with automatic rotation of client certificates by using the /var/lib/kubelet/pki/kubelet-client-current.pem symlink specified in /etc/kubernetes/kubelet.conf. If this rotation process fails you might see errors such as x509: certificate has expired or is not yet valid in kube-apiserver logs. To fix the issue you must follow these steps:

1. Backup and delete /etc/kubernetes/kubelet.conf and /var/lib/kubelet/pki/kubelet-client* from the failed node.
2. From a working control plane node in the cluster that has /etc/kubernetes/pki/ca.key execute kubeadm kubeconfig user --org system:nodes --client-name system:node:$NODE > kubelet.conf. $NODE must be set to the name of the existing failed node in the cluster. Modify the resulting kubelet.conf manually to adjust the cluster name and server endpoint, or pass kubeconfig user --config (it accepts InitConfiguration). If your cluster does not have the ca.key you must sign the embedded certificates in the kubelet.conf externally.
3. Copy this resulting kubelet.conf to /etc/kubernetes/kubelet.conf on the failed node.
4. Restart the kubelet (systemctl restart kubelet) on the failed node and wait for /var/lib/kubelet/pki/kubelet-client-current.pem to be recreated.
5. Manually edit the kubelet.conf to point to the rotated kubelet client certificates, by replacing client-certificate-data and client-key-data with:
   client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
   client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
6. Restart the kubelet. Make sure the node becomes Ready.
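The final manual edit described above leaves a kubelet.conf whose user entry references the rotation symlink instead of embedding base64 data. The relevant fragment would look like this (the node name here is this cluster's master; adjust it to your node):

```yaml
users:
- name: system:node:master
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
```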
There are several workarounds on GitHub, though this particular one has been criticized by some as rather crude: copy the values of the client-certificate-data and client-key-data keys from /etc/kubernetes/admin.conf and paste those new strings under the same keys in /etc/kubernetes/kubelet.conf, then just do a service kubelet restart.
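A sketch of that copy-the-keys workaround, operating on sample files under /tmp so it is runnable anywhere. The real files are /etc/kubernetes/admin.conf and /etc/kubernetes/kubelet.conf (back the latter up first); the sed substitution assumes the one-line "key: value" layout kubeadm writes, and base64 data never contains the | delimiter used here.

```shell
# Sample stand-ins for admin.conf and kubelet.conf (demo values, not real certs)
cat > /tmp/admin.conf <<'EOF'
users:
- name: kubernetes-admin
  user:
    client-certificate-data: NEWCERTDATA
    client-key-data: NEWKEYDATA
EOF
cat > /tmp/kubelet.conf <<'EOF'
users:
- name: system:node:master
  user:
    client-certificate-data: OLDCERTDATA
    client-key-data: OLDKEYDATA
EOF

# Pull the fresh values out of admin.conf
cert=$(awk '/client-certificate-data:/{print $2}' /tmp/admin.conf)
key=$(awk '/client-key-data:/{print $2}' /tmp/admin.conf)

# Overwrite the stale values in kubelet.conf in place
sed -i "s|client-certificate-data:.*|client-certificate-data: $cert|" /tmp/kubelet.conf
sed -i "s|client-key-data:.*|client-key-data: $key|" /tmp/kubelet.conf

grep 'data:' /tmp/kubelet.conf
```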
[root@master kubernetes]# cat admin.conf
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
server: https://master:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
After the modification, kubelet.conf looks like this:
[root@master kubernetes]# cat kubelet.conf
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
server: https://master:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: system:node:master
name: system:node:master@kubernetes
current-context: system:node:master@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:master
user:
client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
The conclusion: renewing the k8s certificates with kubeadm alpha certs renew all does not renew the certificate embedded in kubelet.conf, and this has been discussed and confirmed further on GitHub.
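To confirm this on a node, decode the certificate embedded in kubelet.conf and read its dates, then compare with what kubeadm alpha certs check-expiration reports. A runnable sketch (it builds a sample file in /tmp with a freshly generated throwaway cert; on a real node run the last pipeline against /etc/kubernetes/kubelet.conf):

```shell
# Build a sample kubeconfig line embedding a real (throwaway) certificate
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=system:node:demo" -days 365 \
  -keyout /tmp/k.key -out /tmp/k.crt 2>/dev/null
printf '    client-certificate-data: %s\n' "$(base64 -w0 /tmp/k.crt)" > /tmp/kubelet-sample.conf

# Decode the embedded certificate and print its validity window
awk '/client-certificate-data:/{print $2}' /tmp/kubelet-sample.conf \
  | base64 -d | openssl x509 -noout -dates
```

If the notAfter printed here predates what check-expiration reports for the cluster certificates, the certificate embedded in kubelet.conf was not renewed.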
References
- Kubelet can't running after renew certificates
- linuxea: fixing the k8s kubelet.go node "master" not found issue