Resolving "BIRD is not ready: BGP not established"

System Operations 2023-07-21 共饮一杯

Symptoms:

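I run the Calico network plugin on Kubernetes. Over the past two days the DNS service started misbehaving. Investigation showed that, of the two DNS pods, the one on the master node had an IP that could not be pinged, so DNS could not serve requests properly. Checking the network plugin's pods then showed that the calico-node pod on the master node was unhealthy.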

The errors are as follows:

NAME                                       READY   STATUS    RESTARTS       AGE    IP               NODE            NOMINATED NODE   READINESS GATES
calico-kube-controllers-7cd8b89887-vfzwc   1/1     Running   2 (117d ago)   132d   10.244.118.109   xy-5-server14              
calico-node-9qtv5                          1/1     Running   0              132d   192.168.5.19     xy-5-server19              
calico-node-lxg9k                          0/1     Running   0              34s    192.168.5.14     xy-5-server14              
calico-node-rmscn                          1/1     Running   0              33s    192.168.5.17     xy-5-server17              
calico-typha-d4f58c4c9-8nf76               1/1     Running   0              132d   192.168.5.17     xy-5-server17              
calico-typha-d4f58c4c9-dbf8g               1/1     Running   0              132d   192.168.5.14     xy-5-server14              
csi-node-driver-92rbg                      2/2     Running   0              132d   10.244.116.196   xy-5-server17              
csi-node-driver-gpgwd                      2/2     Running   0              132d   10.244.6.82      xy-5-server19              
csi-node-driver-h9kbw                      2/2     Running   0              132d   10.244.118.101   xy-5-server14              
[root@xy-5-server14 calico]# kubectl  -n calico-system describe pod calico-node-lxg9k 

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  47s                default-scheduler  Successfully assigned calico-system/calico-node-lxg9k to xy-5-server14
  Normal   Pulled     47s                kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.24.5" already present on machine
  Normal   Created    47s                kubelet            Created container flexvol-driver
  Normal   Started    47s                kubelet            Started container flexvol-driver
  Normal   Pulled     46s                kubelet            Container image "docker.io/calico/cni:v3.24.5" already present on machine
  Normal   Created    45s                kubelet            Created container install-cni
  Normal   Started    45s                kubelet            Started container install-cni
  Normal   Pulled     42s                kubelet            Container image "docker.io/calico/node:v3.24.5" already present on machine
  Normal   Created    42s                kubelet            Created container calico-node
  Normal   Started    41s                kubelet            Started container calico-node
  Warning  Unhealthy  40s (x2 over 41s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  37s                kubelet            Readiness probe failed: 2023-07-18 08:18:19.246 [INFO][379] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  27s  kubelet  Readiness probe failed: 2023-07-18 08:18:29.242 [INFO][423] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  17s  kubelet  Readiness probe failed: 2023-07-18 08:18:39.246 [INFO][455] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19
  Warning  Unhealthy  7s  kubelet  Readiness probe failed: 2023-07-18 08:18:49.249 [INFO][486] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.5.17,192.168.5.19

More generally, none of the pod IPs on the master node could be pinged.
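The unhealthy pod in the listing above is the one whose READY column reads 0/1. As a minimal sketch (the function name `not_ready_pods` is made up for illustration), the same check can be applied to any `kubectl get pods` listing piped in on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pods whose READY column shows
# fewer ready containers than total containers (for example 0/1).
# Reads `kubectl get pods` output on stdin; the header line is ignored
# because its second field is not of the form N/M.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] + 0 < r[2] + 0) print $1
       }'
}
```

Usage, assuming the calico-system namespace shown above: `kubectl -n calico-system get pods -o wide | not_ready_pods`.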

Diagnosis

Install the Calico client, calicoctl (reference: www.cnblogs.com/varden/p/15…). On the master:

[root@xy-5-server14 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.5.17 | node-to-node mesh | up    | 08:36:51 | Established |
| 192.168.5.19 | node-to-node mesh | up    | 08:37:15 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

The peerings look normal. On node1:

[root@xy-5-server17 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 192.168.5.19 | node-to-node mesh | up    | 2023-03-07 | Established |
| 10.4.0.1     | node-to-node mesh | start | 2023-07-17 | Connect     |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.

There is the problem: the master should be using 192.168.5.14, but it shows up as 10.4.0.1 instead. node2 shows the same issue:

[root@xy-5-server19 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 192.168.5.17 | node-to-node mesh | up    | 08:18:24   | Established |
| 10.4.0.1     | node-to-node mesh | start | 2023-07-17 | Connect     |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
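In the tables above, the broken peering is the row whose INFO column is not Established. A minimal sketch of pulling those rows out automatically, assuming the table layout printed by `calicoctl node status` as shown here (the function name `unestablished_peers` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "PEER_ADDRESS STATE INFO" for every BGP peer
# row whose INFO column is not "Established".
# Reads `calicoctl node status` output on stdin.
unestablished_peers() {
  awk -F'|' 'NF >= 6 {
      # Trim the padding around each table cell.
      for (i = 2; i <= 6; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      # Skip the header row, then report anything not Established.
      if ($2 != "PEER ADDRESS" && $6 != "Established") print $2, $4, $6
  }'
}
```

Usage: `calicoctl node status | unestablished_peers`. No output means every listed peer is Established.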

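Posts describing the same situation turned up online: www.jianshu.com/p/4b175e733… and cloud.tencent.com/developer/a… Both say the network interface must be specified explicitly. However, my Calico was installed with the operator, so editing the calico-node DaemonSet directly has no effect: the operator reverts the change. That differs from what those posts describe.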
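Specifying the interface requires knowing which NIC on the master actually carries the intended address (192.168.5.14 in this cluster). A minimal sketch, assuming the one-line-per-address format of `ip -o -4 addr show`; the function name `iface_for_ip` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the interface name that carries a given IPv4
# address. Reads `ip -o -4 addr show` output on stdin; $1 is the address
# without its prefix length.
iface_for_ip() {
  awk -v want="$1" '{
      split($4, a, "/")            # field 4 is ADDRESS/PREFIX
      if (a[1] == want) print $2   # field 2 is the interface name
  }'
}
```

Usage on the master: `ip -o -4 addr show | iface_for_ip 192.168.5.14`. Running it with 10.4.0.1 instead would show which interface the wrong address came from.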

Solution

The Calico documentation covers the relevant setting: docs.tigera.io/calico/late…

Then find the Installation resource in the cluster:

[root@xy-5-server17 ~]# kubectl get Installation
NAME      AGE
default   155d
[root@xy-5-server17 ~]# kubectl edit Installation default
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  creationTimestamp: "2023-02-13T09:18:18Z"
  finalizers:
  - tigera.io/operator-cleanup
  generation: 3
  name: default
  resourceVersion: "151883088"
  uid: 580c6998-4b1e-4616-8c0b-7a3fc4adf553
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      interface: ens4f1
  cni:
    ipam:
      type: Calico
    type: Calico
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  variant: Calico
status:
  computed:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - blockSize: 26
        cidr: 10.244.0.0/16
        disableBGPExport: false

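Change the

nodeAddressAutodetectionV4:
  interface: ens4f1

block as described in the documentation, setting it to your own network interface. After that, the calico-node pod on the master node ran normally, the DNS pod's IP could be pinged, and the DNS service recovered. The problem was solved.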
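The same change can also be made non-interactively with a JSON merge patch instead of `kubectl edit`. This is a minimal sketch, not the author's procedure: the function name `autodetect_patch` is hypothetical, and setting `firstFound` to null rests on the assumption that the operator accepts only one autodetection method at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a JSON merge patch for the Installation
# resource that selects the node address by interface name ($1).
# "firstFound": null removes that key if it is present (assumption: the
# autodetection methods are mutually exclusive, so it must not remain set).
autodetect_patch() {
  printf '{"spec":{"calicoNetwork":{"nodeAddressAutodetectionV4":{"firstFound":null,"interface":"%s"}}}}\n' "$1"
}
```

Usage, with ens4f1 standing in for your own interface name: `kubectl patch installation default --type merge -p "$(autodetect_patch ens4f1)"`. Because the operator owns the calico-node DaemonSet, it should roll the change out itself; `kubectl -n calico-system rollout status daemonset/calico-node` can be used to watch that.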
