Fixing "BIRD is not ready: BGP not established"

July 21, 2023

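Symptoms:

On a Kubernetes cluster using the Calico network plugin, the DNS service started misbehaving over the past couple of days. Investigation showed that the DNS pod IPs located on the master node could not be pinged, so the DNS service could not serve requests properly.
Checking the network plugin's pods then showed that the calico-node pod on the master node was not healthy (READY 0/1).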

The errors are as follows:

NAME                                       READY   STATUS    RESTARTS       AGE    IP               NODE            NOMINATED NODE   READINESS GATES
calico-kube-controllers-7cd8b89887-vfzwc   1/1     Running   2 (117d ago)   132d   10.244.118.109   xy-5-server14              
calico-node-9qtv5                          1/1     Running   0              132d   192.168.5.19     xy-5-server19              
calico-node-lxg9k                          0/1     Running   0              34s    192.168.5.14     xy-5-server14              
calico-node-rmscn                          1/1     Running   0              33s    192.168.5.17     xy-5-server17              
calico-typha-d4f58c4c9-8nf76               1/1     Running   0              132d   192.168.5.17     xy-5-server17              
calico-typha-d4f58c4c9-dbf8g               1/1     Running   0              132d   192.168.5.14     xy-5-server14              
csi-node-driver-92rbg                      2/2     Running   0              132d   10.244.116.196   xy-5-server17              
csi-node-driver-gpgwd                      2/2     Running   0              132d   10.244.6.82      xy-5-server19              
csi-node-driver-h9kbw                      2/2     Running   0              132d   10.244.118.101   xy-5-server14              
[root@xy-5-server14 calico]# kubectl  -n calico-system describe pod calico-node-lxg9k 

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  47s                default-scheduler  Successfully assigned calico-system/calico-node-lxg9k to xy-5-server14
  Normal   Pulled     47s                kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.24.5" already present on machine
  Normal   Created    47s                kubelet            Created container flexvol-driver
  Normal   Started    47s                kubelet            Started container flexvol-driver
  Normal   Pulled     46s                kubelet            Container image "docker.io/calico/cni:v3.24.5" already present on machine
  Normal   Created    45s                kubelet            Created container install-cni
  Normal   Started    45s                kubelet            Started container install-cni
  Normal   Pulled     42s                kubelet            Container image "docker.io/calico/node:v3.24.5" already present on machine
  Normal   Created    42s                kubelet            Created container calico-node
  Normal   Started    41s                kubelet            Started container calico-node
  Warning  Unhealthy  40s (x2 over 41s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  37s                kubelet            Readiness probe failed: 2023-07-18 08:18:19.246 [INFO][379] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  27s  kubelet  Readiness probe failed: 2023-07-18 08:18:29.242 [INFO][423] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  17s  kubelet  Readiness probe failed: 2023-07-18 08:18:39.246 [INFO][455] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  7s  kubelet  Readiness probe failed: 2023-07-18 08:18:49.249 [INFO][486] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19

More generally, none of the pod IPs on the master node could be pinged.

Finding the cause

Install the Calico client (calicoctl); for reference: www.cnblogs.com/varden/p/15…
On the master:

[root@xy-5-server14 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.5.17 | node-to-node mesh | up    | 08:36:51 | Established |
| 192.168.5.19 | node-to-node mesh | up    | 08:37:15 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

From the master's side, the connections look normal...
On node1:

[root@xy-5-server17 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 192.168.5.19 | node-to-node mesh | up    | 2023-03-07 | Established |
| 10.4.0.1     | node-to-node mesh | start | 2023-07-17 | Connect     |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.

There is the problem: the master's address should be 192.168.5.14, but this node is trying to peer with it at 10.4.0.1 instead.
node2 shows the same problem:

[root@xy-5-server19 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 192.168.5.17 | node-to-node mesh | up    | 08:18:24   | Established |
| 10.4.0.1     | node-to-node mesh | start | 2023-07-17 | Connect     |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
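
To spot this kind of mismatch quickly on each node, the peer table can be filtered for rows whose INFO column is not Established. The following is a minimal sketch that assumes the table layout shown above; the function name unestablished_peers is hypothetical and not part of calicoctl.

```shell
# Print the address of every BGP peer whose INFO column is not "Established".
# Reads `calicoctl node status` output on stdin.
# Assumption: rows look like "| ADDRESS | TYPE | STATE | SINCE | INFO |".
unestablished_peers() {
  awk -F'|' '
    # Data rows split into 7 fields on "|"; border lines contain no "|".
    NF == 7 && $2 !~ /PEER ADDRESS/ {
      gsub(/ /, "", $2)   # peer address
      gsub(/ /, "", $6)   # INFO column
      if ($6 != "Established") print $2
    }'
}

# Usage on a node:
#   calicoctl node status | unestablished_peers
```

Run against the node1 or node2 output above, this would print only 10.4.0.1, the address that does not belong to the subnet the nodes actually share.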

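I found posts online describing the same situation: www.jianshu.com/p/4b175e733…
cloud.tencent.com/developer/a…
They say the network interface used for address detection must be specified explicitly. However, my Calico was installed with the operator, so editing the calico-node DaemonSet directly has no effect: the operator reverts the change. This differs from what those posts describe.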
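
Before changing anything, it can help to confirm which address each node has actually registered with Calico. A small sketch, assuming calicoctl is installed and can reach the cluster's datastore; the wrapper name calico_registered_ips is hypothetical.

```shell
# List the addresses each node has registered with Calico.
# Assumption: calicoctl is installed and configured for this cluster.
calico_registered_ips() {
  # The IPV4 column is the address other nodes use when peering with this node.
  calicoctl get nodes -o wide
}

# Usage:
#   calico_registered_ips
```

If the master shows an address from an unexpected interface here (10.4.0.1 in this case) rather than 192.168.5.14, address autodetection picked the wrong network interface.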

Solution

The relevant setting is documented on the Calico website: docs.tigera.io/calico/late…

Then locate the Installation resource in the Kubernetes cluster and edit it:

[root@xy-5-server17 ~]# kubectl get Installation
NAME      AGE
default   155d
[root@xy-5-server17 ~]# kubectl edit Installation default
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  creationTimestamp: "2023-02-13T09:18:18Z"
  finalizers:
  - tigera.io/operator-cleanup
  generation: 3
  name: default
  resourceVersion: "151883088"
  uid: 580c6998-4b1e-4616-8c0b-7a3fc4adf553
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      interface: ens4f1
  cni:
    ipam:
      type: Calico
    type: Calico
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  variant: Calico
status:
  computed:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - blockSize: 26
        cidr: 10.244.0.0/16
        disableBGPExport: false

    nodeAddressAutodetectionV4:
      interface: ens4f1

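Change this block under spec.calicoNetwork as described in the documentation, setting interface to the name of your own network interface.
After that, the calico-node pod on the master node ran normally, the DNS pod IPs could be pinged, and the DNS service recovered. The problem was solved.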
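
The same change can also be applied without an interactive editor. This is a sketch only, assuming the Installation resource is named default as shown above and that ens4f1 is the right interface on every node; the function name is hypothetical. Setting firstFound to null is an assumption intended to clear that autodetection method if it was previously set, and it has no effect otherwise.

```shell
# Set the IPv4 autodetection interface on the operator's Installation resource.
# $1: interface name (ens4f1 in this article; substitute your own NIC).
set_calico_autodetect_interface() {
  iface="$1"
  if [ -z "$iface" ]; then
    echo "usage: set_calico_autodetect_interface <interface>" >&2
    return 1
  fi
  # JSON merge patch: a null value removes the key, other keys are merged in.
  kubectl patch installation default --type=merge -p \
    "{\"spec\":{\"calicoNetwork\":{\"nodeAddressAutodetectionV4\":{\"firstFound\":null,\"interface\":\"${iface}\"}}}}"
}

# Usage:
#   set_calico_autodetect_interface ens4f1
```

Once the operator has reconciled the change, kubectl -n calico-system rollout status daemonset/calico-node can be used to wait for the calico-node pods to restart, after which calicoctl node status on the other nodes should list 192.168.5.14 as an Established peer.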
