This is conceptually very similar to the macvlan driver, with one major exception: it uses L3 for muxing/demuxing among slaves. This property makes the master device share the L2 with its slave devices. I have developed this driver in conjunction with network namespaces and am not sure whether there is a use case outside of them.
IPvlan has two modes of operation: L2 and L3. For a given master device, you can select one of these two modes, and all slaves on that master will operate in the same (selected) mode. The RX mode is almost identical, except that in L3 mode the slaves won't receive any multicast / broadcast traffic. L3 mode is more restrictive, since routing is controlled from the other (mostly default) namespace.
In this mode TX processing happens on the stack instance attached to the slave device and packets are switched and queued to the master device to send out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) as well.
bridge
This is the default option. To configure the IPvlan port in this mode, the user can either add this option on the command line or specify nothing. This is the traditional mode, where slaves can cross-talk among themselves apart from talking through the master device.
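As a small illustration of the command syntax used in the sessions in this post, the sketch below builds the ip link add command for an ipvlan slave and falls back to bridge when no flag is given. The helper name ipvlan_add_cmd is hypothetical, it only accepts the modes and flags demonstrated in this post (l2/l3, bridge/vepa), and it prints the command instead of running it, so no root privileges are needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): print the command that would
# create an ipvlan slave, without executing it.
#   $1 = parent (master) device, $2 = slave name,
#   $3 = mode (l2 or l3),        $4 = optional flag (bridge or vepa)
ipvlan_add_cmd() {
    case $3 in
        l2|l3) ;;
        *) echo "unsupported mode: $3" >&2; return 1 ;;
    esac
    # bridge is the default flag, so an omitted $4 is treated as bridge.
    case ${4:-bridge} in
        bridge|vepa) ;;
        *) echo "unsupported flag: $4" >&2; return 1 ;;
    esac
    echo "ip link add link $1 name $2 type ipvlan mode $3 ${4:-bridge}"
}
```

For example, ipvlan_add_cmd eno2 ipvl1 l2 prints the same command as the one typed with an explicit bridge flag, which can then be run with sudo.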
tao@S3:~$
tao@S3:~$ sudo ip netns add ns1
tao@S3:~$ sudo ip netns add ns2
tao@S3:~$ sudo ip link add link eno2 name ipvl1 type ipvlan mode l2 bridge
tao@S3:~$ sudo ip link add link eno2 name ipvl2 type ipvlan mode l2 bridge
tao@S3:~$ sudo ip link set ipvl1 netns ns1
tao@S3:~$ sudo ip link set ipvl2 netns ns2
tao@S3:~$ ip a s eno2
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether bb:6b:8c:89:88:22 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
    inet 10.138.36.58/23 brd 10.138.37.255 scope global dynamic noprefixroute eno2
       valid_lft 46416sec preferred_lft 46416sec
    inet6 fe80::3304:c53b:493e:2f97/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
tao@S3:~$
The configuration in ns1 is as follows:
root@S3:/home/tao# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
101: ipvl1@if3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether bb:6b:8c:89:88:22 brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@S3:/home/tao# ip link set lo up
root@S3:/home/tao# ip link set ipvl1 up
root@S3:/home/tao# ip addr add 10.138.36.2/23 dev ipvl1
root@S3:/home/tao# ip route add default dev ipvl1
root@S3:/home/tao# ip r
default dev ipvl1 scope link
10.138.36.0/23 dev ipvl1 proto kernel scope link src 10.138.36.2
root@S3:/home/tao#
The configuration in ns2 is as follows:
root@S3:/home/tao# ip link set lo up
root@S3:/home/tao# ip link set ipvl2 up
root@S3:/home/tao# ip addr add 10.138.36.3/23 dev ipvl2
root@S3:/home/tao# ip route add default via 10.138.36.1 dev ipvl2
root@S3:/home/tao# ip r
default via 10.138.36.1 dev ipvl2
10.138.36.0/23 dev ipvl2 proto kernel scope link src 10.138.36.3
root@S3:/home/tao# ip addr s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
102: ipvl2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether bb:6b:8c:89:88:22 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.138.36.3/23 scope global ipvl2
       valid_lft forever preferred_lft forever
    inet6 fe80::bb6b:8c00:289:8822/64 scope link
       valid_lft forever preferred_lft forever
root@S3:/home/tao#
Verify that the ipvlan interfaces can reach each other
Running ping 10.138.36.2 in ns2 succeeds.
root@S3:/home/tao# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.138.36.2              ether   bb:6b:8c:89:88:22   C                     ipvl2
root@S3:/home/tao# arp -d 10.138.36.2
root@S3:/home/tao#
root@S3:/home/tao# ping 10.138.36.2 -c1
PING 10.138.36.2 (10.138.36.2) 56(84) bytes of data.
64 bytes from 10.138.36.2: icmp_seq=1 ttl=64 time=0.092 ms
root@S3:/home/tao# ping baidu.com
ping: baidu.com: Temporary failure in name resolution
root@S3:/home/tao# resolvectl dns ipvl2 10.3.3.3 10.3.3.4
Failed to set DNS configuration: Link 102 not known
root@S3:/home/tao#
root@S3:/home/tao# ping 10.138.36.55 -c1
PING 10.138.36.55 (10.138.36.55) 56(84) bytes of data.
64 bytes from 10.138.36.55: icmp_seq=1 ttl=64 time=0.960 ms

--- 10.138.36.55 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.960/0.960/0.960/0.000 ms
root@S3:/home/tao# ping 110.242.68.66 -c1
PING 110.242.68.66 (110.242.68.66) 56(84) bytes of data.
64 bytes from 110.242.68.66: icmp_seq=1 ttl=52 time=24.3 ms
root@S3:/home/tao# ip r
default dev ipvl1 scope link
10.138.36.0/23 dev ipvl1 proto kernel scope link src 10.138.36.2
root@S3:/home/tao# ping 10.138.36.55 -c1
PING 10.138.36.55 (10.138.36.55) 56(84) bytes of data.
64 bytes from 10.138.36.55: icmp_seq=1 ttl=64 time=0.865 ms

--- 10.138.36.55 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.865/0.865/0.865/0.000 ms
root@S3:/home/tao# ping 110.242.68.66 -c1
PING 110.242.68.66 (110.242.68.66) 56(84) bytes of data.
From 10.138.36.2 icmp_seq=1 Destination Host Unreachable
If this is added to the command line, the port is set in VEPA mode, i.e. the port will offload switching functionality to the external entity as described in 802.1Qbg. Note: VEPA mode in IPvlan has limitations. IPvlan uses the MAC address of the master device, so packets emitted in this mode for an adjacent neighbor will have the same source and destination MAC. This will make the switch / router send a redirect message.
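The limitation above follows from the slave sharing the master's MAC address, which is also visible in the ip output in this post (eno2 and the ipvlan slaves all report the same link/ether value). A minimal sketch for checking this on a live setup follows. The helper name same_mac is hypothetical, and the commented usage assumes the eno2, ipvl1 and ns1 names from this post and the standard sysfs address file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two MAC address strings are equal,
# ignoring the case of the hex digits.
same_mac() {
    [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# Example usage (needs root and an existing ipvlan slave; names from this post):
#   master=$(cat /sys/class/net/eno2/address)
#   slave=$(sudo ip netns exec ns1 cat /sys/class/net/ipvl1/address)
#   same_mac "$master" "$slave" && echo "slave shares the master's MAC"
```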
tao@S3:~$ sudo ip netns add ns1
tao@S3:~$ sudo ip netns add ns2
tao@S3:~$ sudo ip link add link eno2 name ipvl1 type ipvlan mode l2 vepa
tao@S3:~$ sudo ip link add link eno2 name ipvl2 type ipvlan mode l2 vepa
tao@S3:~$ sudo ip link set ipvl1 netns ns1
tao@S3:~$ sudo ip link set ipvl2 netns ns2
tao@S3:~$
tao@S3:~$ sudo ip netns exec ns1 ip link set lo up
tao@S3:~$ sudo ip netns exec ns1 ip link set ipvl1 up
tao@S3:~$ sudo ip netns exec ns1 ip addr add 192.168.9.2/24 dev ipvl1
tao@S3:~$ sudo ip netns exec ns1 ip route add default dev ipvl1
tao@S3:~$
tao@S3:~$ sudo ip netns exec ns2 ip link set lo up
tao@S3:~$ sudo ip netns exec ns2 ip link set ipvl2 up
tao@S3:~$ sudo ip netns exec ns2 ip addr add 192.168.9.3/24 dev ipvl2
tao@S3:~$ sudo ip netns exec ns2 ip r add default via 10.138.36.66 dev ipvl2
Error: Nexthop has invalid gateway.
tao@S3:~$ sudo ip netns exec ns2 ip r add default dev ipvl2
tao@S3:~$
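Note: setting the gateway fails because the next hop must be in the same network!
Verify connectivity between ipvlan interfaces on the same parent interface
Without an external switch/router, pinging ns1 from ns2 fails: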
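The error can be anticipated by checking whether the gateway lies inside the network of the address configured on the interface. The POSIX shell sketch below does that for IPv4 only. The helper names ip_to_int and in_subnet are hypothetical, and the check mirrors the rule stated in the note rather than the kernel's actual route lookup.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies in the network of the CIDR address $2,
# e.g. in_subnet 10.138.36.1 10.138.36.3/23
in_subnet() (
    net=${2%/*}
    len=${2#*/}
    # Build the netmask from the prefix length, truncated to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)
```

With the addresses from this post, in_subnet 10.138.36.66 192.168.9.3/24 fails, which matches the rejected gateway, while in_subnet 10.138.36.1 10.138.36.3/23 succeeds, which matches the gateway accepted in the earlier L2 bridge setup.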
root@S3:/home/tao# ping 192.168.9.2 -c 1
PING 192.168.9.2 (192.168.9.2) 56(84) bytes of data.
From 192.168.9.3 icmp_seq=1 Destination Host Unreachable
In this mode, TX processing up to L3 happens on the stack instance attached to the slave device. Packets are then switched to the stack instance of the master device for L2 processing, and routing from that instance is used before the packets are queued on the outbound device. In this mode the slaves will neither receive nor send multicast / broadcast traffic.
bridge
Create the ipvlan interfaces and the corresponding network namespaces
The overall configuration is similar to the L2 mode above:
tao@S3:~$ sudo ip netns add ns1
tao@S3:~$ sudo ip netns add ns2
tao@S3:~$
tao@S3:~$ sudo ip link add link eno2 name ipvl1 type ipvlan mode l3 bridge
tao@S3:~$ sudo ip link add link eno2 name ipvl2 type ipvlan mode l3 bridge
tao@S3:~$ sudo ip link set ipvl1 netns ns1
tao@S3:~$ sudo ip link set ipvl2 netns ns2
tao@S3:~$ sudo ip netns exec ns1 ip link set lo up
tao@S3:~$ sudo ip netns exec ns1 ip link set ipvl1 up
tao@S3:~$ sudo ip netns exec ns1 ip addr add 192.168.9.2/24 dev ipvl1
tao@S3:~$ sudo ip netns exec ns1 ip route add default dev ipvl1
tao@S3:~$
tao@S3:~$ sudo ip netns exec ns2 ip link set lo up
tao@S3:~$ sudo ip netns exec ns2 ip link set ipvl2 up
tao@S3:~$ sudo ip netns exec ns2 ip addr add 192.168.9.3/24 dev ipvl2
tao@S3:~$ sudo ip netns exec ns2 ip r add default dev ipvl2
tao@S3:~$
Check connectivity between the ipvlan interfaces
Pinging 192.168.9.2 from ns2 succeeds. Note the NOARP flag in the flags field of the ipvl2 interface.
root@S3:/home/tao# ip a s ipvl2
104: ipvl2@if3: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether bb:6b:8c:89:88:22 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.9.3/24 scope global ipvl2
       valid_lft forever preferred_lft forever
    inet6 fe80::bb6b:8c00:289:8822/64 scope link
       valid_lft forever preferred_lft forever
root@S3:/home/tao# ping 192.168.9.2 -c 1
PING 192.168.9.2 (192.168.9.2) 56(84) bytes of data.
64 bytes from 192.168.9.2: icmp_seq=1 ttl=64 time=0.073 ms
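For scripted checks, the NOARP flag can be picked out of the first line of the ip output shown above. The sketch below is a plain string test on that header line. The helper name has_flag is hypothetical, and it assumes the flags appear as a comma-separated list between angle brackets, as in the output in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the interface header line in $1
# (as printed by "ip link" / "ip addr") lists the flag named in $2.
has_flag() {
    # Keep only the text between '<' and '>' on the header line.
    flags=${1#*<}
    flags=${flags%%>*}
    # Wrap in commas so that only whole flag names match (ARP must not match NOARP).
    case ",$flags," in
        *",$2,"*) return 0 ;;
    esac
    return 1
}

# Example usage (needs root; names from this post):
#   line=$(sudo ip netns exec ns2 ip link show ipvl2 | head -n 1)
#   has_flag "$line" NOARP && echo "ipvl2 does not use ARP"
```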
tao@S20:~$ sudo ip route add 192.168.9.0/24 via 10.138.36.58 dev eno1
tao@S20:~$ ip r
default via 10.138.36.1 dev eno1 proto dhcp metric 20100
10.138.36.0/23 dev eno1 proto kernel scope link src 10.138.36.66 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.9.0/24 via 10.138.36.58 dev eno1
tao@S20:~$
tao@S20:~$ ping 192.168.9.2 -c 1
PING 192.168.9.2 (192.168.9.2) 56(84) bytes of data.
64 bytes from 192.168.9.2: icmp_seq=1 ttl=64 time=0.221 ms
tao@S3:~$ sudo ip netns add ns1
tao@S3:~$ sudo ip netns add ns2
tao@S3:~$ sudo ip link add link eno2 name ipvl1 type ipvlan mode l3 vepa
tao@S3:~$ sudo ip link add link eno2 name ipvl2 type ipvlan mode l3 vepa
tao@S3:~$ sudo ip link set ipvl1 netns ns1
tao@S3:~$ sudo ip link set ipvl2 netns ns2
tao@S3:~$
tao@S3:~$ sudo ip netns exec ns1 ip link set lo up
tao@S3:~$ sudo ip netns exec ns1 ip link set ipvl1 up
tao@S3:~$ sudo ip netns exec ns1 ip addr add 192.168.9.2/24 dev ipvl1
tao@S3:~$ sudo ip netns exec ns1 ip r add default dev ipvl1
tao@S3:~$
tao@S3:~$ sudo ip netns exec ns2 ip link set lo up
tao@S3:~$ sudo ip netns exec ns2 ip link set ipvl2 up
tao@S3:~$ sudo ip netns exec ns2 ip addr add 192.168.9.3/24 dev ipvl2
tao@S3:~$ sudo ip netns exec ns2 ip r add default dev ipvl2
tao@S3:~$
tao@S3:~$ sudo ip r add 192.168.9.0/24 via 10.138.36.66 dev eno2
tao@S3:~$ ip r
default via 10.138.36.1 dev eno2 proto dhcp metric 100
10.138.36.0/23 dev eno2 proto kernel scope link src 10.138.36.58 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.19.0.0/16 dev br-9bf9aa158c15 proto kernel scope link src 172.19.0.1
192.168.9.0/24 via 10.138.36.66 dev eno2
tao@S3:~$
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target                    prot opt in     out     source               destination
    0     0 ACCEPT                    all  --  eno1   eno1    0.0.0.0/0            0.0.0.0/0
  138 11592 DOCKER-USER               all  --  *      *       0.0.0.0/0            0.0.0.0/0
  138 11592 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
tao@S20:~$ ip r
default via 10.138.36.1 dev eno1 proto dhcp metric 100
10.138.36.0/23 dev eno1 proto kernel scope link src 10.138.36.66 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.9.0/24 via 10.138.36.58 dev eno1
tao@S20:~$
Now the ipvlan interfaces can ping each other, as shown below:
root@S3:/home/tao# ping 192.168.9.2 -c1
PING 192.168.9.2 (192.168.9.2) 56(84) bytes of data.
64 bytes from 192.168.9.2: icmp_seq=1 ttl=63 time=0.350 ms
tao@S3:~$ sudo tcpdump -i eno2 'arp or icmp' -e -n
[sudo] password for tao:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eno2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:16:13.911247 bb:6b:8c:89:88:22 > ff:e9:75:75:60:01, ethertype IPv4 (0x0800), length 98: 192.168.9.3 > 39.156.66.10: ICMP echo request, id 6307, seq 1, length 64
16:16:29.355347 ff:e9:75:75:60:01 > bb:6b:8c:86:78:83, ethertype ARP (0x0806), length 60: Reply 10.138.36.1 is-at ff:e9:75:75:60:01, length 46
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
tao@S3:~$
Fix: in the namespace that holds the parent interface, add a route that forwards packets destined for 39.156.66.10 to 10.138.36.66:
tao@S3:~$ sudo ip r add 39.156.66.10/32 via 10.138.36.66 dev eno2
tao@S3:~$ ip r
default via 10.138.36.1 dev eno2 proto dhcp metric 100
10.138.36.0/23 dev eno2 proto kernel scope link src 10.138.36.58 metric 100
39.156.66.10 via 10.138.36.66 dev eno2
192.168.9.0/24 via 10.138.36.66 dev eno2
tao@S3:~$