I'm not sure of the specific cause, but I'm using a zone-based firewall config, and something changed between the June builds and now that breaks my container networking. Essentially, ARP requests from the containers are seen on the host side, and the host responds, but the containers never see the ARP requests from the host (gateway) side, so they cannot respond.
A tcpdump showing the behavior:
astr0n8t@vyos:~$ sudo tcpdump -i pod-interface
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on pod-interface, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:25:00.599567 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
16:25:00.727584 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28
16:25:00.727628 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28
16:25:01.623592 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
16:25:01.751584 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28
16:25:01.751628 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28
16:25:03.188674 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28
16:25:03.188699 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28
16:25:03.391875 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
16:25:04.247646 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28
16:25:04.247692 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28
16:25:04.439718 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
16:25:05.272594 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28
16:25:05.272641 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28
16:25:05.463815 ARP, Request who-has 172.17.0.10 tell 172.17.0.1, length 28
16:25:05.463854 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
16:25:06.488312 ARP, Request who-has 172.17.0.10 tell 172.17.0.1, length 28
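For anyone trying to reproduce this, a longer capture is easier to read when summarized per target address. The following is only a rough sketch, assuming tcpdump's one-line ARP output format as it appears in the capture above; the function names `summarize_arp` and `unanswered` are made up for illustration and are not part of any VyOS or tcpdump tooling.

```python
import re

# Patterns for tcpdump's default one-line ARP output (assumed format,
# taken from the capture in this report).
REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+),")
REPLY = re.compile(r"ARP, Reply (\S+) is-at (\S+)")


def summarize_arp(lines):
    """Count ARP requests and replies seen for each target IP address."""
    stats = {}
    for line in lines:
        req = REQUEST.search(line)
        if req:
            entry = stats.setdefault(req.group(1), {"requests": 0, "replies": 0})
            entry["requests"] += 1
            continue
        rep = REPLY.search(line)
        if rep:
            entry = stats.setdefault(rep.group(1), {"requests": 0, "replies": 0})
            entry["replies"] += 1
    return stats


def unanswered(stats):
    """Return the IPs that were asked for but never answered in the capture."""
    return sorted(ip for ip, c in stats.items()
                  if c["requests"] > 0 and c["replies"] == 0)


if __name__ == "__main__":
    # A few lines copied from the capture above.
    sample = [
        "16:25:00.599567 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28",
        "16:25:00.727584 ARP, Request who-has 172.17.0.1 tell 172.17.0.10, length 28",
        "16:25:00.727628 ARP, Reply 172.17.0.1 is-at 2e:94:d0:00:ee:15 (oui Unknown), length 28",
        "16:25:05.463815 ARP, Request who-has 172.17.0.10 tell 172.17.0.1, length 28",
    ]
    print(unanswered(summarize_arp(sample)))
```

Run against the capture above, this flags 172.17.0.2 and 172.17.0.10 (the containers) as never answering, while every request for 172.17.0.1 gets a reply from the host.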
In the containers, the ARP table then shows:
# arp -an
? (172.17.0.1) at <incomplete> on eth0
And on the host it shows:
? (172.17.0.2) at <incomplete> on pod-interface
If I disable my firewall completely, the problem goes away and ARP works as intended:
? (172.17.0.1) at 2e:94:d0:00:ee:15 [ether] on eth0
? (172.17.0.2) at d6:bf:40:eb:f0:51 [ether] on pod-interface
Relevant firewall bits are:
LAN:
name ALLOW {
    rule 1 {
        action accept
        description "Allow all out"
    }
}
name LAN {
    rule 1 {
        action jump
        description "Catch POD-INTERFACE traffic"
        inbound-interface {
            name pod-interface
        }
        jump-target POD-INTERFACE
    }
}
name POD-INTERFACE {
    rule 1 {
        action accept
    }
}
Zones:
zone LAN {
    default-action drop
    description "Internal LAN Zone"
    from LOCAL {
        firewall {
            name ALLOW
        }
    }
    interface pod-interface
    intra-zone-filtering {
        firewall {
            name LAN
        }
    }
}
zone LOCAL {
    description "Local zone"
    from LAN {
        firewall {
            name LAN
        }
    }
    local-zone
}
And running:
nadehi18@cloud1:~$ show version
Version:          VyOS 1.5-rolling-202410180006
Release train:    current
Release flavor:   generic
Built by:         [email protected]
Built on:         Fri 18 Oct 2024 00:07 UTC
Build UUID:       a6dc3e7c-619f-4051-937a-93b4adac485f
Build commit ID:  2359180068a653
Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest
Secure Boot:      n/a (BIOS)
I was previously running this version, where my configuration worked:
1.5-rolling-202406011750