The configuration flannel reads is also in JSON format. We can enable DirectRouting: when two nodes can reach each other by plain routing, traffic is routed directly; otherwise it falls back to tunnel (VXLAN) forwarding.
flannel's configuration lives in a ConfigMap in the kube-system namespace:
[root@linuxea flannel]# kubectl get configmap -n kube-system
NAME DATA AGE
coredns 1 25d
extension-apiserver-authentication 6 25d
kube-flannel-cfg 2 25d
kube-proxy 2 25d
kubeadm-config 1 25d
kubelet-config-1.11 1 25d
kubernetes-dashboard-settings 1 4d
- flannel itself is not tied to Kubernetes; it runs on top of Kubernetes as a plugin. For this to work, flannel must already be in place, and every node running kubelet needs flannel deployed, because kubelet relies on flannel to create and activate the network interface for each pod. flannel can be deployed as a system daemon or as a pod on Kubernetes. When run as pods, flannel is deployed as a DaemonSet whose pods share the host's network namespace, so they can configure the node's virtual networks and bridges. Even when it is hosted on Kubernetes as pods, it still effectively runs in the manner of a system-level daemon.
Since our cluster was deployed with kubeadm, all components run as pods, and flannel runs as a DaemonSet. Check it with kubectl get daemonset -n kube-system. Because all nodes are amd64, only kube-flannel-ds-amd64 has running instances, 4 of them, which means the cluster has 4 nodes:
[root@linuxea flannel]# kubectl get daemonset -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-flannel-ds-amd64 4 4 4 4 4 beta.kubernetes.io/arch=amd64 25d
kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 25d
kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 25d
kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 25d
kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 25d
kube-proxy 4 4 4 4 4 beta.kubernetes.io/arch=amd64 25d
Moreover, every node running kubelet has a flannel pod deployed, including the master node (a quick hostNetwork check follows this listing):
[root@linuxea flannel]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-controller-manager-linuxea.master-1.com 1/1 Running 0 25d 10.10.240.161 linuxea.master-1.com <none>
kube-flannel-ds-amd64-5swqs 1/1 Running 0 25d 10.10.240.202 linuxea.node-1.com <none>
kube-flannel-ds-amd64-fwzjl 1/1 Running 0 25d 10.10.240.146 linuxea.node-3.com <none>
kube-flannel-ds-amd64-gtqhv 1/1 Running 0 25d 10.10.240.161 linuxea.master-1.com <none>
kube-flannel-ds-amd64-qmhq9 1/1 Running 0 25d 10.10.240.203 linuxea.node-2.com <none>
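Because the flannel pods share the host network namespace, the pod IPs shown above are the node IPs themselves. A small check of this (a sketch; it assumes the pods carry the app=flannel label that the upstream kube-flannel.yml applies):

# print the hostNetwork field of every flannel pod; each should report true
kubectl get pods -n kube-system -l app=flannel \
  -o custom-columns=NAME:.metadata.name,HOSTNET:.spec.hostNetwork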
A dedicated ConfigMap is used to configure flannel:
[root@linuxea flannel]# kubectl get configmap -n kube-system
NAME DATA AGE
kube-flannel-cfg 2 25d
The backend in use is vxlan (the output below was manually re-ordered for readability; a cleaner way to read the same data follows the dump):
[root@linuxea flannel]# kubectl get configmap kube-flannel-cfg -n kube-system -o jsonpath={.metadata.annotations}
map[kubectl.kubernetes.io/last-applied-configuration:{
"apiVersion":"v1","data":{
"cni-conf.json":"{
"name": "cbr0",
"plugins": [{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}",
"net-conf.json":"{
"Network": "172.16.0.0/16",
"Backend": {
"Type": "vxlan"
}}"},
"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-system"}}
Commonly used configuration parameters are as follows (an illustrative net-conf.json that combines them appears after the list):
- Network: the overall pod IP range, e.g. 172.16.0.0/16
- SubnetLen: the mask length used to carve per-node subnets out of Network; how many you need depends on the number of nodes. Defaults to 24.
- SubnetMin: the first subnet that may be handed out, e.g. starting at 172.16.10.0/24
- SubnetMax: the last subnet that may be handed out, e.g. ending at 172.16.12.0/24
- Backend: the inter-node forwarding method: vxlan, host-gw, or udp
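As an illustration only (the SubnetMin/SubnetMax values below are made up and not taken from this cluster), these parameters combine in the kube-flannel.yml ConfigMap roughly like this:

net-conf.json: |
  {
    "Network": "172.16.0.0/16",
    "SubnetLen": 24,
    "SubnetMin": "172.16.10.0",
    "SubnetMax": "172.16.99.0",
    "Backend": {
      "Type": "vxlan"
    }
  }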
Before configuring DirectRouting, let's first look at how communication between pods works with vxlan.
VXLAN forwarding
The three pods below currently sit on different nodes:
[root@linuxea ~]# kubectl get pods -o wide
satefulset-2 1/1 Running 0 12d 172.16.4.14 linuxea.node-2.com <none>
satefulset-3 1/1 Running 0 12d 172.16.3.45 linuxea.node-1.com <none>
satefulset-4 1/1 Running 0 12d 172.16.5.119 linuxea.node-3.com <none>
Now let's communicate from the pod on node2 to the pod on node1:
[root@linuxea ~]# kubectl exec -it satefulset-2 -- /bin/sh
/ # ping 172.16.3.45
PING 172.16.3.45 (172.16.3.45): 56 data bytes
64 bytes from 172.16.3.45: seq=0 ttl=62 time=1.239 ms
64 bytes from 172.16.3.45: seq=1 ttl=62 time=0.377 ms
64 bytes from 172.16.3.45: seq=2 ttl=62 time=0.574 ms
64 bytes from 172.16.3.45: seq=3 ttl=62 time=0.583 ms
Then capture on cni0 or flannel.1 on node1. The packets appear directly as 172.16.4.14 > 172.16.3.45, which shows that at the flannel.1 and cni0 interfaces the traffic has not yet been encapsulated into VXLAN:
[root@DS-VM-Node_10_10_240_202 ~]# tcpdump -i cni0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cni0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:15:58.935242 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8192, seq 0, length 64
14:15:58.935278 IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 8192, seq 0, length 64
14:15:59.935675 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8192, seq 1, length 64
14:15:59.935733 IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 8192, seq 1, length 64
14:16:00.935770 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8192, seq 2, length 64
Or capture directly on flannel.1:
[root@DS-VM-Node_10_10_240_202 ~]# tcpdump -i flannel.1 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:22:48.289532 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8704, seq 52, length 64
14:22:48.289590 IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 8704, seq 52, length 64
14:22:49.289595 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8704, seq 53, length 64
Capturing on the physical interface eth0, we see the ICMP packets 172.16.4.14 > 172.16.3.45 carried inside an outer UDP packet between the node IPs, with an overlay field in the output; this is tunnel forwarding. The VXLAN device behind it is inspected right after the capture:
[root@DS-VM-Node_10_10_240_202 ~]# tcpdump -i eth0 -nn host 10.10.240.203
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:30:58.369661 IP 10.10.240.203.9697 > 10.10.240.202.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8704, seq 542, length 64
14:30:58.369740 IP 10.10.240.202.43591 > 10.10.240.203.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 8704, seq 542, length 64
14:30:59.370155 IP 10.10.240.203.9697 > 10.10.240.202.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 8704, seq 543, length 64
14:30:59.370239 IP 10.10.240.202.43591 > 10.10.240.203.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 8704, seq 543, length 64
- Supplement
Earlier we saw that cni0's address 172.16.3.1 serves as this host's local endpoint for pod traffic entering the tunnel; the interface appears once pods have been created on the node. Install bridge-utils with yum install bridge-utils, then inspect the node as follows:
[root@DS-VM-Node_10_10_240_202 ~]# ip a
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether 0a:58:ac:10:03:01 brd ff:ff:ff:ff:ff:ff
inet 172.16.3.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::882c:bff:fe2b:7d1c/64 scope link
valid_lft forever preferred_lft forever
40: vethff44d0d4@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 12:ca:aa:c9:20:40 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::10ca:aaff:fec9:2040/64 scope link
valid_lft forever preferred_lft forever
41: veth6950f86e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether b2:cc:17:b2:72:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::b0cc:17ff:feb2:7202/64 scope link
valid_lft forever preferred_lft forever
46: veth70f4948e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 0a:31:f6:8b:be:24 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::831:f6ff:fe8b:be24/64 scope link
valid_lft forever preferred_lft forever
47: veth1b79d0e3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 8e:bb:4b:aa:09:6f brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::8cbb:4bff:feaa:96f/64 scope link
valid_lft forever preferred_lft forever
Use brctl show to inspect the cni0 bridge: the veth interfaces (vethff44d0d4, veth6950f86e, and the rest) are all attached to cni0 (an iproute2 alternative follows the output):
[root@DS-VM-Node_10_10_240_202 ~]# brctl show cni0
bridge name bridge id STP enabled interfaces
cni0 8000.0a58ac100301 no veth1b79d0e3
veth6950f86e
veth70f4948e
vethff44d0d4
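On nodes where bridge-utils is not installed, the same membership can be listed with iproute2 instead:

# list all interfaces enslaved to the cni0 bridge
ip link show master cni0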
Defining DirectRouting
Ideally the network mode should be decided before the cluster goes into use, because it is hard to predict how the network will behave once the number of pods grows. If you change flannel's network type midway, flannel has to be restarted, which means all pods in the cluster will briefly lose network connectivity, and that holds whether you switch to vxlan with DirectRouting or to host-gw!
Download kube-flannel.yml and modify it:
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Then add "DirectRouting": true:
net-conf.json: |
{
"Network": "172.16.0.0/16",
"Backend": {
"Type": "vxlan",
"Directrouting": true
}
}
---
After the modification, delete the objects created from the previous kube-flannel.yml and re-apply it.
Before deleting, look at ip route show: the routes to the other nodes' pod subnets currently point to flannel.1. After the new flannel is up, they will point out through the physical NIC instead:
[root@linuxea flannel]# ip route show
default via 10.0.0.1 dev eth0
10.0.0.0/8 dev eth1 proto kernel scope link src 10.0.1.215
10.0.0.0/8 dev eth0 proto kernel scope link src 10.10.240.161
172.16.0.0/24 dev cni0 proto kernel scope link src 172.16.0.1
172.16.3.0/24 via 172.16.3.0 dev flannel.1 onlink
172.16.4.0/24 via 172.16.4.0 dev flannel.1 onlink
172.16.5.0/24 via 172.16.5.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
- Delete the previously applied, unmodified kube-flannel and wait for flannel to be fully removed:
[root@linuxea ~]# kubectl delete -f kube-flannel.yml
clusterrole.rbac.authorization.k8s.io "flannel" deleted
clusterrolebinding.rbac.authorization.k8s.io "flannel" deleted
serviceaccount "flannel" deleted
configmap "kube-flannel-cfg" deleted
daemonset.extensions "kube-flannel-ds-amd64" deleted
daemonset.extensions "kube-flannel-ds-arm64" deleted
daemonset.extensions "kube-flannel-ds-arm" deleted
daemonset.extensions "kube-flannel-ds-ppc64le" deleted
daemonset.extensions "kube-flannel-ds-s390x" deleted
Once the deletion is finished, create it again:
[root@linuxea ~]# kubectl apply -f kube-flannel.yml
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created
After creation it looks like this:
[root@linuxea ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-flannel-ds-amd64-mnrwp 1/1 Running 0 5s
kube-flannel-ds-amd64-msv47 1/1 Running 0 5s
kube-flannel-ds-amd64-nhqt7 1/1 Running 0 5s
kube-flannel-ds-amd64-s2dh8 1/1 Running 0 5s
After creation the route table has already changed:
[root@linuxea ~]# ip route show
default via 10.0.0.1 dev eth0
10.0.0.0/8 dev eth1 proto kernel scope link src 10.0.1.215
10.0.0.0/8 dev eth0 proto kernel scope link src 10.10.240.161
172.16.0.0/24 dev cni0 proto kernel scope link src 172.16.0.1
172.16.3.0/24 via 10.10.240.202 dev eth1
172.16.4.0/24 via 10.10.240.203 dev eth1
172.16.5.0/24 via 10.10.240.146 dev eth1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Capture packets again: from the master node, exec into the pod located on node2 and ping the pod on node1:
[root@linuxea ingress]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
linuxea-sa-demo 1/1 Running 0 11d 172.16.5.128 linuxea.node-3.com <none>
satefulset-0 1/1 Running 0 14d 172.16.5.120 linuxea.node-3.com <none>
satefulset-1 1/1 Running 0 14d 172.16.3.46 linuxea.node-1.com <none>
satefulset-2 1/1 Running 0 14d 172.16.4.14 linuxea.node-2.com <none>
satefulset-3 1/1 Running 0 14d 172.16.3.45 linuxea.node-1.com <none>
satefulset-4 1/1 Running 0 14d 172.16.5.119 linuxea.node-3.com <none>
[root@linuxea ingress]# kubectl exec -it satefulset-2 -- /bin/sh
/ # ping 172.16.3.45
PING 172.16.3.45 (172.16.3.45): 56 data bytes
64 bytes from 172.16.3.45: seq=0 ttl=62 time=0.473 ms
64 bytes from 172.16.3.45: seq=1 ttl=62 time=0.558 ms
64 bytes from 172.16.3.45: seq=2 ttl=62 time=0.609 ms
Capture on node1, as follows:
[root@DS-VM-Node_10_10_240_202 ~]# tcpdump -i eth0 -nn icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:01:26.634764 IP 10.0.1.61 > 10.10.240.202: ICMP host 10.0.1.61 unreachable - admin prohibited, length 60
17:01:29.642607 IP 10.0.1.61 > 10.10.240.202: ICMP host 10.0.1.61 unreachable - admin prohibited, length 60
17:01:30.356901 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 10496, seq 0, length 64
17:01:30.356968 IP 172.16.3.45 > 172.16.4.14: ICMP echo reply, id 10496, seq 0, length 64
17:01:31.356929 IP 172.16.4.14 > 172.16.3.45: ICMP echo request, id 10496, seq 1, length 64
As you can see, with DirectRouting the capture on eth0 shows the plain ICMP packets between the pod IPs, just like ordinary routed traffic, with no VXLAN encapsulation, so it is more efficient.
As long as two nodes sit in the same network segment, DirectRouting routes the traffic directly according to the entries in the route table.
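A quick way to confirm which path a given pod IP will take from a node is ip route get (a small check, not part of the original capture):

# on node2: the reply should show the route via the peer node IP on the physical NIC, not via flannel.1
ip route get 172.16.3.45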
Defining host-gw
host-gw does not support vxlan's DirectRouting and has no other such parameters; unlike vxlan, it offers no such compatibility fallback. host-gw simply requires that the nodes be in the same network segment, so we can modify the configuration file directly:
[root@linuxea ingress]# sed 's/vxlan/host-gw/' -i kube-flannel.yml
The final result looks like this:
net-conf.json: |
{
"Network": "172.16.0.0/16",
"Backend": {
"Type": "host-gw"
}
}
---
Then delete the previous kube-flannel and re-apply it. Afterwards you can check the change messages with kubectl logs -n kube-system kube-flannel-ds-amd64-mnrwp (use the name of one of your newly created flannel pods).
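A compact sketch of the whole switch plus a quick verification (assuming the same kube-flannel.yml as above; pod and interface names will differ per cluster):

kubectl delete -f kube-flannel.yml
kubectl apply -f kube-flannel.yml
# once the new flannel pods are Running, the routes to the other nodes'
# pod subnets should point directly at the peer node IPs on the physical NIC
kubectl get pods -n kube-system -o wide | grep kube-flannel
ip route show | grep 172.16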