Outgoing traffic is unexpectedly source-NATed when policy-based routing is involved. In our case, traffic was dropped upstream because there was no route back to VyOS: the expected source address was hidden behind SNAT.
The test VyOS router has two interfaces: eth0 in the LAN and eth1 in the WAN.
Here are VyOS commands and debug information to reproduce the issue:
$ configure
#eth0 is used for LAN, ICMP ping source machine IP is 192.168.0.254
set interfaces ethernet eth0 address '192.168.0.1/24'
#eth1 is used for WAN
set interfaces ethernet eth1 address '10.10.1.1/24'
#static routes
set protocols static route 0.0.0.0/0 next-hop 10.10.1.254
#packet capture interface
set protocols static table 1 route 172.16.0.0/24 next-hop 10.10.1.254
#policy route from LAN
set policy route pbr-eth0 interface 'eth0'
set policy route pbr-eth0 rule 16 log
set policy route pbr-eth0 rule 16 set table '1'
set policy route pbr-eth0 rule 16 destination address '172.16.0.0/24'
#create container network
set container network ctr prefix '10.255.255.0/24'
#commit changes
commit
Let's check that the fwmark rule exists:
# ip rule list | grep fwmark
1: from all fwmark 0x7ffffffe lookup 1
Let's ping 172.16.0.2 from a host in the LAN and capture traffic on the WAN interface:
$ tcpdump -i eth1 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:20:20.740278 IP 192.168.0.254 > 172.16.0.2: ICMP echo request, id 1104, seq 153, length 64
12:20:20.740735 IP 172.16.0.2 > 192.168.0.254: ICMP echo reply, id 1104, seq 153, length 64
Check that the policy rule matches:
journalctl -o cat -f
[ipv4-route-pbr-eth0-16-A]IN=eth0 OUT= MAC=50:00:00:07:00:00:00:50:79:66:68:08:08:00 SRC=192.168.0.254 DST=172.16.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21273 PROTO=ICMP TYPE=8 CODE=0 ID=6483 SEQ=1
Create a container:
run add container image 'debian:bookworm'
set container name ctest image 'debian:bookworm'
set container name ctest network 'ctr'
#commit changes
commit
Check the overlapping mark in the nat table:
# sudo nft list chain ip nat NETAVARK-HOSTPORT-MASQ
# Warning: table ip nat is managed by iptables-nft, do not touch!
table ip nat {
chain NETAVARK-HOSTPORT-MASQ {
meta mark & 0x00002000 == 0x00002000 counter packets 0 bytes 0 masquerade
}
}
Create any (partial) NAT rule to activate the unexpected SNAT and commit the changes:
set nat
commit
Check that the unexpected NAT is active:
# tcpdump -i vnet0_7 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vnet0_7, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:47:11.299028 IP 10.10.1.1 > 172.16.0.2: ICMP echo request, id 20310, seq 1, length 64
12:47:11.299272 IP 172.16.0.2 > 10.10.1.1: ICMP echo reply, id 20310, seq 1, length 64
Check that the chain counters increase:
# sudo nft list chain ip nat NETAVARK-HOSTPORT-MASQ
# Warning: table ip nat is managed by iptables-nft, do not touch!
table ip nat {
chain NETAVARK-HOSTPORT-MASQ {
meta mark & 0x00002000 == 0x00002000 counter packets 138 bytes 11592 masquerade
}
}
Check that non-policy-based routing (default) still works as expected:
# tcpdump -i vnet0_7 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vnet0_7, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:50:32.846942 IP 192.168.0.254 > 8.8.8.8: ICMP echo request, id 6231, seq 1, length 64
12:50:32.851701 IP 8.8.8.8 > 192.168.0.254: ICMP echo reply, id 6231, seq 1, length 64
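The mark overlap seen in the outputs above can be checked with plain shell arithmetic. The sketch below tests whether the fwmark shown by ip rule list (0x7ffffffe) satisfies the condition in the NETAVARK-HOSTPORT-MASQ rule (mark & 0x00002000 == 0x00002000). It assumes the policy route sets exactly that fwmark on matching packets, which the outputs suggest but do not prove; the function name mark_overlaps is made up for illustration. For these two values it prints "overlap", because bit 0x2000 is set in 0x7ffffffe. This would be consistent with the counters increasing, but it is a likely explanation rather than a confirmed root cause.

```shell
#!/bin/sh
# mark_overlaps MARK MASK (hypothetical helper, not part of VyOS or Netavark)
# Prints "overlap" if (MARK & MASK) == MASK, i.e. the same test the
# masquerade rule applies to the packet mark; otherwise "no overlap".
mark_overlaps() {
    mark=$(($1))
    mask=$(($2))
    if [ $((mark & mask)) -eq "$mask" ]; then
        echo "overlap"
    else
        echo "no overlap"
    fi
}

# fwmark from "ip rule list" vs. mask from the NETAVARK-HOSTPORT-MASQ chain
mark_overlaps 0x7ffffffe 0x00002000
```

Any policy-route mark that has bit 0x2000 set would match the same masquerade rule, so the check can be repeated for the marks of other routing tables.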
Workaround: delete the container and reboot.