
docker host net mtu (by quqi99)


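Author: Zhang Hua  Published: 2021-03-06
Copyright notice: this article may be freely reposted, provided the repost links to the original source and includes the author information and this copyright notice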

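A docker container was created on a VM whose network runs over GRE, and the container's connectivity problems are evidently caused by the MTU. But why does it still fail when the container is created with --net=host?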

sudo docker run --rm --net=host --privileged --name=nginx -v /sys/fs/cgroup:/sys/fs/cgroup -d -ti nginx
sudo docker exec -ti nginx bash

Testing showed that both UDP and TCP work only some of the time.

#TCP
nc -tlp 8888
nc -vtz 10.5.0.178 8888

#UDP
nc -ulp 8888
nc -vuz 10.5.0.178 8888

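Test results

  • With TCP, the TCP server exits after every test run; once 'nc -tlp 8888' is re-run, every test works without problems
  • With UDP, the UDP server does not exit after a test run; every test works if "nc -ulp 8888" is re-run first, but without restarting the server the client gets a refused error

All of the above is actually normal behavior. The error reported by the customer, shown below, is normal too: "inverse host lookup failed: Unknown host" only means that the reverse lookup of the host failed, and adding the "-n" option to nc avoids it.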

 # nc -uvz xxx 5093
xxx: inverse host lookup failed: Unknown host
(UNKNOWN) [xxx] 5093 (?) open

But the customer's tcpdump capture showed the following MTU problem, which is the real error:

07:19:55.096224 IP 10.30.50.189.33473 > xxx.5093: UDP, bad length 1432 > 1408

So seeing "5093 (?) open" in the nc test shows that the network is reachable, but there is an MTU problem.
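Since the nc result alone hides the problem, it can help to scan a saved tcpdump text log for this summary line. The following is only a minimal sketch: the function name parse_bad_length is hypothetical, and it assumes the log uses the same "UDP, bad length A > B" wording as the capture above.

```python
import re

# Summary text tcpdump prints when the UDP header announces more payload
# than is present in the captured IP packet (wording taken from the capture).
BAD_LENGTH = re.compile(r"UDP, bad length (\d+) > (\d+)")


def parse_bad_length(line):
    """Return (announced, present) UDP payload sizes, or None if no match."""
    match = BAD_LENGTH.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = ("07:19:55.096224 IP 10.30.50.189.33473 > xxx.5093: "
          "UDP, bad length 1432 > 1408")
print(parse_bad_length(sample))
```

A line that matches means the datagram was fragmented on the way out, which is a hint to check the MTU along the path.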

Packet capture when TCP works

Capture on the VM:

$ sudo tcpdump -ei ens2 -s 0 port 8888
02:06:28.513860 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 74: i1.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [S], seq 3310591319, win 65340, options [mss 1452,sackOK,TS val 2788776170 ecr 0,nop,wscale 7], length 0
02:06:28.515634 fa:16:3e:22:d6:67 (oui Unknown) > fa:16:3e:54:36:ad (oui Unknown), ethertype IPv4 (0x0800), length 74: juju-c40d4b-ovn-6.cloud.sts.8888 > i1.38146: Flags [S.], seq 432497099, ack 3310591320, win 62342, options [mss 8918,sackOK,TS val 3043405881 ecr 2788776170,nop,wscale 7], length 0
02:06:28.515689 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 66: i1.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [.], ack 1, win 511, options [nop,nop,TS val 2788776172 ecr 3043405881], length 0
02:06:28.515907 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 66: i1.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [F.], seq 1, ack 1, win 511, options [nop,nop,TS val 2788776172 ecr 3043405881], length 0
02:06:28.517663 fa:16:3e:22:d6:67 (oui Unknown) > fa:16:3e:54:36:ad (oui Unknown), ethertype IPv4 (0x0800), length 68: juju-c40d4b-ovn-6.cloud.sts.8888 > i1.38146: Flags [P.], seq 1:3, ack 2, win 488, options [nop,nop,TS val 3043405882 ecr 2788776172], length 2
02:06:28.517723 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 54: i1.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [R], seq 3310591321, win 0, length 0
02:06:28.517860 fa:16:3e:22:d6:67 (oui Unknown) > fa:16:3e:54:36:ad (oui Unknown), ethertype IPv4 (0x0800), length 66: juju-c40d4b-ovn-6.cloud.sts.8888 > i1.38146: Flags [F.], seq 3, ack 2, win 488, options [nop,nop,TS val 3043405882 ecr 2788776172], length 0


$ ip addr show ens2
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1492 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:54:36:ad brd ff:ff:ff:ff:ff:ff
    inet 192.168.21.161/24 brd 192.168.21.255 scope global dynamic ens2

Capture on the physical host:

02:06:28.514218 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 74: 10.5.150.115.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [S], seq 3310591319, win 65340, options [mss 1452,sackOK,TS val 2788776170 ecr 0,nop,wscale 7], length 0
02:06:28.514272 fa:16:3e:64:80:50 (oui Unknown) > fa:16:3e:50:aa:2a (oui Unknown), ethertype IPv4 (0x0800), length 74: juju-c40d4b-ovn-6.cloud.sts.8888 > 10.5.150.115.38146: Flags [S.], seq 432497099, ack 3310591320, win 62342, options [mss 8918,sackOK,TS val 3043405881 ecr 2788776170,nop,wscale 7], length 0
02:06:28.515805 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 66: 10.5.150.115.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [.], ack 1, win 511, options [nop,nop,TS val 2788776172 ecr 3043405881], length 0
02:06:28.515848 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 66: 10.5.150.115.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [F.], seq 1, ack 1, win 511, options [nop,nop,TS val 2788776172 ecr 3043405881], length 0
02:06:28.516053 fa:16:3e:64:80:50 (oui Unknown) > fa:16:3e:50:aa:2a (oui Unknown), ethertype IPv4 (0x0800), length 68: juju-c40d4b-ovn-6.cloud.sts.8888 > 10.5.150.115.38146: Flags [P.], seq 1:3, ack 2, win 488, options [nop,nop,TS val 3043405882 ecr 2788776172], length 2
02:06:28.516071 fa:16:3e:64:80:50 (oui Unknown) > fa:16:3e:50:aa:2a (oui Unknown), ethertype IPv4 (0x0800), length 66: juju-c40d4b-ovn-6.cloud.sts.8888 > 10.5.150.115.38146: Flags [F.], seq 3, ack 2, win 488, options [nop,nop,TS val 3043405882 ecr 2788776172], length 0
02:06:28.517335 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 54: 10.5.150.115.38146 > juju-c40d4b-ovn-6.cloud.sts.8888: Flags [R], seq 3310591321, win 0, length 0


# ip addr show ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8958 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:64:80:50 brd ff:ff:ff:ff:ff:ff
    inet 10.5.0.178/16 brd 10.5.255.255 scope global dynamic ens3

Packet capture when UDP works

VM

02:15:12.943389 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 43: i1.39709 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1
02:15:12.946069 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 43: i1.39709 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1

$ sudo tcpdump -i ens2 udp port 8888 -A -nn
02:46:17.724481 IP 192.168.21.161.50757 > 10.5.0.178.8888: UDP, length 1
E...#[email protected]@.6.....
....E"..        ...
02:46:17.728605 IP 192.168.21.161.50757 > 10.5.0.178.8888: UDP, length 1
E...#[email protected]@.6.....
....E"..        ...

Physical host

02:15:12.942802 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 43: 10.5.150.115.39709 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1
02:15:12.944746 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 43: 10.5.150.115.39709 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1

Packet capture when UDP fails (this is also normal; the server has to be restarted)

# nc -vuz 10.5.0.178 8888
juju-c40d4b-ovn-6.cloud.sts [10.5.0.178] 8888 (?) : Connection refused

VM

02:16:28.016397 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 43: i1.55573 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1

Physical host

02:16:28.014916 fa:16:3e:50:aa:2a (oui Unknown) > fa:16:3e:64:80:50 (oui Unknown), ethertype IPv4 (0x0800), length 43: 10.5.150.115.55573 > juju-c40d4b-ovn-6.cloud.sts.8888: UDP, length 1

UDP, bad length 1432 > 1408

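The error "UDP, bad length 1432 > 1408" means that the payload length announced in the UDP header is larger than the UDP payload actually present in the captured packet (The tcpdump error message you get is due to IP fragmentation which happens because the multicast datagram length > MTU - https://github.com/the-tcpdump-group/tcpdump/blob/tcpdump-4.7.4/print-udp.c#L694). The MTU of the customer's ens3 is 1442; where does this 1442 come from?

The payload of an Ethernet frame is between 46 and 1500 bytes. The IPv4 header is 20 bytes, which leaves 1480 bytes for the IP payload. The UDP header (source port, destination port, UDP length, UDP checksum) is 8 bytes, so the UDP payload is at most 1472 bytes. The GRE header (with the optional key field) is also 8 bytes.
So with an MTU of 1442: 1442 - 20 (IP header) = 1422, rounded down to a multiple of 8 (fragment offsets count in 8-byte units) = 1416, and 1416 - 8 (UDP header) = 1408
For the GRE/VXLAN header sizes see - https://tonydeng.github.io/sdn-handbook/basic/overlay.html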
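The arithmetic behind the 1408 can be checked with a small sketch. It assumes an IPv4 header without options and models the usual fragmentation rule that every fragment except the last carries a multiple of 8 bytes; the function name fragment_payloads is made up for illustration.

```python
IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8


def fragment_payloads(udp_payload, mtu):
    """Return the IP payload size of each fragment of one UDP datagram."""
    left = UDP_HEADER + udp_payload      # bytes the IP layer has to carry
    room = mtu - IP_HEADER               # IP payload that fits in one packet
    sizes = []
    while left > 0:
        size = min(left, room)
        if size < left:
            # Not the last fragment: offsets are counted in 8-byte units.
            size = size // 8 * 8
        sizes.append(size)
        left -= size
    return sizes


sizes = fragment_payloads(1432, 1442)
print(sizes)                      # IP payload per fragment
print(sizes[0] - UDP_HEADER)      # UDP payload visible in the first fragment
```

With these inputs the first fragment carries 1416 bytes of IP payload, of which 1408 are UDP payload, matching the tcpdump message. Adding the 20-byte IP header and a 14-byte Ethernet header gives 1450, which is the frame length shown in the iperf capture at the end of this article.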

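The MTU can be tested with the following commands (1442 - 28 = 1414, where 28 is the 20-byte IP header plus the 8-byte ICMP header)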

ping -c 2 -s 1414 -M do 10.5.0.178

# traceroute --mtu 10.5.0.178
traceroute to 10.5.0.178 (10.5.0.178), 30 hops max, 65000 byte packets
 1  * F=1492 * *
 2  juju-c40d4b-ovn-6.cloud.sts (10.5.0.178)  2.859 ms  2.169 ms  0.671 ms
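The manual ping test can be turned into a search for the largest payload that still passes with DF set. This is a sketch under assumptions: ping_fits wraps the Linux iputils ping with the same -M do and -s options used above, the upper bound 8972 (9000 - 28) is an arbitrary choice, and both function names are hypothetical. The demo at the end uses a simulated path so that no packets are sent.

```python
import subprocess


def ping_fits(host, payload):
    """Send one ping with DF set; True if a reply came back (iputils ping)."""
    cmd = ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_payload(fits, low=0, high=8972):
    """Binary-search the largest payload size for which fits(size) is True."""
    if not fits(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Simulated path with an MTU of 1442 (28 = IP header + ICMP header).
simulated = lambda size: size + 28 <= 1442
print(largest_payload(simulated) + 28)

# Against a real host (not executed here):
#   largest_payload(lambda size: ping_fits("10.5.0.178", size))
```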

UDP is connectionless, so there is no handshake in which an MSS could be negotiated

$ cat /proc/sys/net/ipv4/ip_no_pmtu_disc
0

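See - https://zhhuabj.blog.csdn.net/article/details/82346840
According to this article (https://blog.csdn.net/sinat_20184565/article/details/80326262), for UDP, after setting ip_no_pmtu_disc=1 on the UDP server (for how to set it in docker, see https://github.com/hwdsl2/docker-ipsec-vpn-server), packets sent by the UDP server carry the don't-fragment flag DF=1. When the UDP client side receives such a DF=1 packet whose size exceeds the MTU (see also https://zhhuabj.blog.csdn.net/article/details/114434188), it reports the actual MTU back to the server, and the server then splits its packets according to that MTU. Because UDP fragmentation offload is usually turned off, the splitting has to be done in the application layer on the server side. Note that the kernel's ip-sysctl documentation describes a non-zero ip_no_pmtu_disc as disabling Path MTU Discovery, so whether 1 or the default 0 produces DF=1 should be verified on the running kernel.

$ ethtool -k ens2 |grep udp-fragmentation-offload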
udp-fragmentation-offload: off
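Splitting in the application layer could look roughly like the sketch below. It assumes IPv4 without options (28 bytes of IP + UDP overhead); the names chunks and send_chunked are hypothetical, and a real protocol would also need sequence numbers so that the receiver can reassemble and detect loss.

```python
import socket

IP_UDP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte UDP header (assumption)


def chunks(data, mtu):
    """Yield pieces of data that each fit into one unfragmented UDP packet."""
    size = mtu - IP_UDP_OVERHEAD
    if size <= 0:
        raise ValueError("MTU too small for IP and UDP headers")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def send_chunked(sock, data, addr, mtu):
    """Send data as several datagrams, none larger than the path allows."""
    for part in chunks(data, mtu):
        sock.sendto(part, addr)


payload = b"x" * 1600
print([len(part) for part in chunks(payload, 1442)])
```

With an MTU of 1442 each datagram carries at most 1414 bytes, the same limit used in the ping test above.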

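Another option is to raise the MTU of the VM and the container to 1460 (1432 + 28). Why is it currently as low as 1442?

Reproducing the problem

Unlike nc, iperf has a -l option that sets the UDP datagram size, so the problem is easy to reproduce.
Run in the container: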

iperf -c 10.5.0.178 -u -l 1600

Run on the physical host:

iperf -s -u -l 1600

Capture in the container

$ sudo tcpdump -ei ens2 -s 0 port 5001
04:06:53.401372 fa:16:3e:54:36:ad (oui Unknown) > fa:16:3e:22:d6:67 (oui Unknown), ethertype IPv4 (0x0800), length 1450: i1.54000 > juju-c40d4b-ovn-6.cloud.sts.5001: UDP, bad length 1600 > 1408