
PBR into VXLAN VRF does not work, encapsulation loop
Open, NormalPublicBUG

Description

This is a bit hard to explain, so please bear with me.

We test VyOS with a BGP-EVPN VXLAN core. There is a VRF carrying the internet routing table. When traffic from an RFC1918 source address to an external public address comes in, it needs to be PBRed into another VRF table whose default route points to a NAT gateway.

The configuration is pretty trivial. Rule 1 accepts all traffic from RFC1918 towards the local networks unmodified. Rule 2 then matches all remaining packets from RFC1918 to the world and looks them up in table 100, which is the NAT gateway table:

set policy route PRIVATE_TO_SECOMAT interface 'tun0'
set policy route PRIVATE_TO_SECOMAT rule 1 action 'return'
set policy route PRIVATE_TO_SECOMAT rule 1 destination group network-group 'MWN'
set policy route PRIVATE_TO_SECOMAT rule 1 source group network-group 'RFC1918'
set policy route PRIVATE_TO_SECOMAT rule 2 set table '100'
set policy route PRIVATE_TO_SECOMAT rule 2 source group network-group 'RFC1918'
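For reference, the two network groups are plain firewall network-groups. RFC1918 contains the standard private ranges; MWN contains the local networks, whose actual prefixes are omitted here, so the MWN entry below is only a placeholder:

set firewall group network-group RFC1918 network '10.0.0.0/8'
set firewall group network-group RFC1918 network '172.16.0.0/12'
set firewall group network-group RFC1918 network '192.168.0.0/16'
set firewall group network-group MWN network '192.0.2.0/24'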

This does not work, and it took me a while to figure out why.

First of all, on the L3VNI VRF interface vxlan10001 (in the PBRed destination VRF) you see the actual ping packet (which is good) AND the VXLAN-encapsulated version of it, which should not happen (the vxlan interface should only show decapsulated packets):

22:30:07.038446 IP 192.168.12.1 > 1.1.1.1: ICMP echo request, id 41639, seq 665, length 64
22:30:07.038487 IP 10.10.0.251.6601 > 10.187.0.14.4789: VXLAN, flags [I] (0x08), vni 100002
IP 192.168.12.1 > 1.1.1.1: ICMP echo request, id 41639, seq 665, length 64
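The capture above was taken directly on the L3VNI interface, roughly along these lines (the exact filter may have differed):

sudo tcpdump -ni vxlan10001 'icmp or udp port 4789'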

Also, the kernel complains loudly:

[ 1522.139202] Dead loop on virtual device vxlan10001, fix it urgently!
[ 1523.163174] Dead loop on virtual device vxlan10001, fix it urgently!

Analysis

PBR marks the packet in an nft ruleset so that it is routed differently, then installs an ip rule that matches the mark and redirects the lookup to another table:

table ip vyos_mangle {
        chain VYOS_PBR_PREROUTING {
                type filter hook prerouting priority mangle; policy accept;
                iifname "tun0" counter packets 887 bytes 74508 jump VYOS_PBR_UD_PRIVATE_TO_SECOMAT
        }
        chain VYOS_PBR_UD_PRIVATE_TO_SECOMAT {
                ip daddr @N_MWN ip saddr @N_RFC1918 counter packets 0 bytes 0 return comment "ipv4-route-PRIVATE_TO_SECOMAT-1"
                ip saddr @N_RFC1918 counter packets 887 bytes 74508 meta mark set 0x7fffff9b return comment "ipv4-route-PRIVATE_TO_SECOMAT-2"
        }
}
admin@vyos1-kb1:~$ ip rule
100:    from all fwmark 0x7fffff9b lookup SECOMAT

I _think_ the packet mark is not cleared on encapsulation (AFAICR I read somewhere that this is actually a feature). So the original packet comes in, gets marked, and the rule directs the lookup into the SECOMAT table. There the route points towards br10001 -> vxlan10001, the traffic gets encapsulated in VXLAN and handed back to the stack to be routed through the underlay. However, the encapsulated packet still carries the mark, so it is hit by the rule AGAIN and an encapsulation loop results.
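If that is correct, it should be visible by asking the kernel for the route towards the VTEP underlay address with and without the mark (addresses taken from the capture above):

ip route get 10.187.0.14
ip route get 10.187.0.14 mark 0x7fffff9b

The first lookup should resolve via the underlay in the main table, while the marked lookup would match rule 100 and come back out of the SECOMAT table towards br10001, which is exactly the loop.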

I could not figure out how to fix this in the configuration or how to clear the mark in nft, but the following workaround helps and backs up the analysis:

admin@vyos1-kb1:~$ sudo ip rule add to 10.187.0.0/24 pref 99 table main
admin@vyos1-kb1:~$ sudo ip rule 
99:     from all to 10.187.0.0/24 lookup main
100:    from all fwmark 0x7fffff9b lookup SECOMAT

So: catch all traffic towards the underlay egress and force it into the main table before the fwmark is checked. Adding the same rule with pref 101 does not help.
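A cleaner fix following the same analysis would be to clear the mark once the inner packet has been routed, so that the underlay lookup done during VXLAN encapsulation no longer matches the fwmark rule. I have not found a way to express this in the VyOS configuration; an untested raw-nft sketch (table and chain names are made up, and I am not certain mangle/postrouting fires early enough relative to the encapsulation) would be something like:

sudo nft add table ip pbr_mark_fix
sudo nft add chain ip pbr_mark_fix postrouting '{ type filter hook postrouting priority mangle; policy accept; }'
sudo nft add rule ip pbr_mark_fix postrouting oifname "br10001" counter meta mark set 0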

Details

Difficulty level
Unknown (require assessment)
Version
1.5-rolling-202406120020
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Unspecified (please specify)

Event Timeline

Viacheslav triaged this task as Normal priority. Jun 13 2024, 6:54 AM

@bernhardschmidt Are you able to share the relevant pieces of your VXLAN and VRF config as well?

I'm trying to replicate this issue in a lab setup of my own.