
EVPN ESI requires arbitrary change to VxLAN interface to update fdb from EVPN
Closed, Resolved (Public)

Assigned To
None
Authored By
L0crian
Mar 22 2025, 2:59 PM
Referenced Files
F15395864: image.png
Jun 26 2025, 9:53 PM
F15395805: image.png
Jun 26 2025, 9:53 PM
F13077639: image.png
Mar 22 2025, 2:59 PM

Description

EDIT: This appears to be caused by the offloads that are enabled by default. Removing them corrects the issue.

When configuring an EVPN-MH solution, ARP is not resolved locally from EVPN until the parameters field of the VxLAN interface is either added or deleted. No specific item within parameters is responsible; the full field needs to be added or removed, after which the fdb is updated correctly.

NOTE: Tested on latest rolling

Topology:

image.png (606×401 px, 26 KB)

I am simulating a double-failure scenario in this topology. Traffic flows as follows:

  1. Client (10.0.1.10) attempts to reach the internet via its gateway (10.0.1.1).
vyos@Client:~$ ping 4.2.2.2 count 1
PING 4.2.2.2 (4.2.2.2) 56(84) bytes of data.

--- 4.2.2.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
  2. Both PEs have an anycast gateway of 10.0.1.1 that can respond.

PE1:

Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address       MAC                VRF        MTU  S/L    Description
-----------  ---------------  -----------------  -------  -----  -----  -------------
br0          10.0.1.1/24      aa:bb:cc:dd:ee:f1  default   1500  u/u

PE2:

Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address       MAC                VRF        MTU  S/L    Description
-----------  ---------------  -----------------  -------  -----  -----  -------------
br0          10.0.1.1/24      aa:bb:cc:dd:ee:f1  default   1500  u/u
  3. Traffic goes from client-->sw2-->sw1-->pe1, where PE1 is one of the anycast gateways, so PE1 routes it toward the internet.
  4. The internet is not reachable from PE1 directly, so PE1 routes the traffic at L3 over to PE2, which forwards it to the internet.
  5. Return traffic arrives at PE2 from the internet, destined for 10.0.1.10.
  6. Since PE2 has that subnet, it attempts to ARP for 10.0.1.10 so it can forward at L2 over the VxLAN interface to PE1 (and eventually follow the L2 path to the client).
  7. The client sees the ARP request, but the response stops at PE1 because PE1 holds the same anycast gateway address. This prevents PE2 from ever learning the MAC for 10.0.1.10.
vyos@PE2# run show arp interface br0
Address     Interface    Link layer address    State
----------  -----------  --------------------  ----------
10.0.1.10   br0                                INCOMPLETE
  8. ARP should resolve from the EVPN ARP cache, which is present and correctly populated on PE2, but it doesn't:
vyos@PE2:~$ show evpn arp-cache vni all
VNI 100 #ARP (IPv4 and IPv6, local and remote) 1

Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor        Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
10.0.1.10       local  PXI   active   e2:d9:b5:99:a4:73                                1/0
fdb pre-change:
# MAC for Client IP (10.0.1.10)
6e:2c:0a:5e:c6:d5 dev bond0 vlan 1 master br0 static
6e:2c:0a:5e:c6:d5 dev vxlan0 master br0 
6e:2c:0a:5e:c6:d5 dev vxlan0 dst 10.0.0.1 self
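
The mismatch shown above (kernel ARP entry stuck in INCOMPLETE while the EVPN ARP cache already holds a MAC for the same neighbor) can be detected mechanically. The following is only a minimal Python sketch, not a VyOS tool: the function names are hypothetical, and it assumes the two command outputs have been captured as text in the same column layout as in this report.

```python
import re

MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def kernel_arp(text):
    """Map IP -> (mac or None, state) from `show arp`-style output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not IP_RE.match(fields[0]):
            continue  # skip header, separator and blank lines
        mac = MAC_RE.search(line)
        table[fields[0]] = (mac.group(0) if mac else None, fields[-1])
    return table


def evpn_neighbors(text):
    """Map IP -> MAC from `show evpn arp-cache`-style output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        mac = MAC_RE.search(line)
        if fields and IP_RE.match(fields[0]) and mac:
            table[fields[0]] = mac.group(0)
    return table


def unresolved_but_known(arp_text, evpn_text):
    """IPs with no MAC in the kernel table although EVPN has one."""
    evpn = evpn_neighbors(evpn_text)
    return {
        ip: evpn[ip]
        for ip, (mac, _state) in kernel_arp(arp_text).items()
        if mac is None and ip in evpn
    }


# Outputs copied from PE2 in this report (pre-change).
ARP_OUTPUT = """\
Address     Interface    Link layer address    State
----------  -----------  --------------------  ----------
10.0.1.10   br0                                INCOMPLETE
"""

EVPN_OUTPUT = """\
Neighbor        Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
10.0.1.10       local  PXI   active   e2:d9:b5:99:a4:73                                1/0
"""

print(unresolved_but_known(ARP_OUTPUT, EVPN_OUTPUT))
# → {'10.0.1.10': 'e2:d9:b5:99:a4:73'}
```

An empty result means every neighbor known to EVPN is also resolved in the kernel table; any entry returned is a neighbor in the broken state described here.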

After Change:

If I remove (or add) the parameters field of the VxLAN interface, ARP finally resolves from the EVPN ARP cache:

vyos@PE2# delete interfaces vxlan vxlan0 parameters 
vyos@PE2# commit
vyos@Client:~$ ping 4.2.2.2 count 1
PING 4.2.2.2 (4.2.2.2) 56(84) bytes of data.
64 bytes from 4.2.2.2: icmp_seq=1 ttl=57 time=13.0 ms

--- 4.2.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 12.992/12.992/12.992/0.000 ms
vyos@PE2# run show arp interface br0
Address    Interface    Link layer address    State
---------  -----------  --------------------  -------
10.0.1.10  br0          e2:d9:b5:99:a4:73     NOARP
fdb post-change:
6e:2c:0a:5e:c6:d5 dev bond0 vlan 1 extern_learn master br0 static
6e:2c:0a:5e:c6:d5 dev vxlan0 extern_learn master br0 
6e:2c:0a:5e:c6:d5 dev vxlan0 nhid 536870913 self extern_learn
NOTE: This works correctly until the MAC ages out from no traffic, or a topology change occurs. Then it is necessary to add/remove the parameters section again.
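
The difference between the pre-change and post-change fdb dumps can be made explicit with a short script. This is a rough Python sketch under a few assumptions: the helper names `parse_fdb_line` and `summarize` are hypothetical, and the input is assumed to be `bridge fdb show`-style lines as pasted in this report. As far as I understand it, `extern_learn` marks entries installed by an external control plane (FRR here), and `nhid` points the entry at a nexthop group instead of a single VTEP `dst`.

```python
# Keywords in an fdb line that are followed by a value; anything else is a flag.
VALUE_KEYS = {"dev", "vlan", "master", "dst", "nhid"}


def parse_fdb_line(line):
    """Split one fdb line into the MAC, key/value attributes and bare flags."""
    tokens = line.split()
    entry = {"mac": tokens[0], "flags": set()}
    i = 1
    while i < len(tokens):
        if tokens[i] in VALUE_KEYS and i + 1 < len(tokens):
            entry[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            entry["flags"].add(tokens[i])
            i += 1
    return entry


def summarize(lines):
    """Report whether all entries are externally learned and how they point at the remote side."""
    rows = [
        parse_fdb_line(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return {
        "all_extern_learn": all("extern_learn" in r["flags"] for r in rows),
        "vtep_dst": [r["dst"] for r in rows if "dst" in r],
        "nexthop_groups": [r["nhid"] for r in rows if "nhid" in r],
    }


# fdb entries copied from this report.
PRE = [
    "6e:2c:0a:5e:c6:d5 dev bond0 vlan 1 master br0 static",
    "6e:2c:0a:5e:c6:d5 dev vxlan0 master br0 ",
    "6e:2c:0a:5e:c6:d5 dev vxlan0 dst 10.0.0.1 self",
]
POST = [
    "6e:2c:0a:5e:c6:d5 dev bond0 vlan 1 extern_learn master br0 static",
    "6e:2c:0a:5e:c6:d5 dev vxlan0 extern_learn master br0 ",
    "6e:2c:0a:5e:c6:d5 dev vxlan0 nhid 536870913 self extern_learn",
]

print(summarize(PRE))
# → {'all_extern_learn': False, 'vtep_dst': ['10.0.0.1'], 'nexthop_groups': []}
print(summarize(POST))
# → {'all_extern_learn': True, 'vtep_dst': [], 'nexthop_groups': ['536870913']}
```

Running the same summary after the MAC has aged out should show whether the entries have fallen back to the pre-change form.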

Config:

PE1:

set interfaces bonding bond0 evpn es-df-pref '1000'
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac 'aa:bb:cc:dd:ee:f0'
set interfaces bonding bond0 evpn uplink
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 min-links '1'
set interfaces bonding bond0 mode '802.3ad'
set interfaces bonding bond0 system-mac 'aa:bb:cc:dd:ee:f0'

set interfaces bridge br0 address '10.0.1.1/24'
set interfaces bridge br0 mac 'aa:bb:cc:dd:ee:f1'
set interfaces bridge br0 member interface bond0
set interfaces bridge br0 member interface vxlan0

set interfaces dummy dum0 address '10.0.0.1/32'

set interfaces ethernet eth0 vif 101 address 'dhcp'
set interfaces ethernet eth0 vif 101 dhcp-options default-route-distance '255'
set interfaces ethernet eth1 address '10.1.2.1/24'
set interfaces ethernet eth1 
set interfaces ethernet eth2 

set interfaces vxlan vxlan0 mtu '1500'
set interfaces vxlan vxlan0 parameters nolearning
set interfaces vxlan vxlan0 port '4789'
set interfaces vxlan vxlan0 source-address '10.0.0.1'
set interfaces vxlan vxlan0 vni '100'

set nat source rule 10 outbound-interface name 'eth0.101'
set nat source rule 10 translation address 'masquerade'

set protocols bgp address-family l2vpn-evpn advertise-all-vni
set protocols bgp neighbor 10.1.2.2 address-family l2vpn-evpn
set protocols bgp neighbor 10.1.2.2 remote-as '65000'
set protocols bgp system-as '65000'

set protocols ospf area 0
set protocols ospf interface dum0 area '0'
set protocols ospf interface eth1 area '0'
set protocols ospf interface eth1 network 'point-to-point'

set protocols static route 0.0.0.0/0 next-hop 10.1.2.2 distance '245'

PE2:

set interfaces bonding bond0 evpn es-df-pref '500'
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac 'aa:bb:cc:dd:ee:f0'
set interfaces bonding bond0 evpn uplink
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 min-links '1'
set interfaces bonding bond0 mode '802.3ad'
set interfaces bonding bond0 system-mac 'aa:bb:cc:dd:ee:f0'

set interfaces bridge br0 address '10.0.1.1/24'
set interfaces bridge br0 mac 'aa:bb:cc:dd:ee:f1'
set interfaces bridge br0 member interface bond0
set interfaces bridge br0 member interface vxlan0

set interfaces dummy dum0 address '10.0.0.2/32'

set interfaces ethernet eth0 vif 101 address 'dhcp'
set interfaces ethernet eth1 address '10.1.2.2/24'
set interfaces ethernet eth1 
set interfaces ethernet eth2 

set interfaces vxlan vxlan0 description 'TEST'
set interfaces vxlan vxlan0 mtu '1500'
set interfaces vxlan vxlan0 parameters nolearning
set interfaces vxlan vxlan0 port '4789'
set interfaces vxlan vxlan0 source-address '10.0.0.2'
set interfaces vxlan vxlan0 vni '100'

set nat source rule 10 outbound-interface name 'eth0.101'
set nat source rule 10 translation address 'masquerade'

set protocols bgp address-family l2vpn-evpn advertise-all-vni
set protocols bgp address-family l2vpn-evpn vni 100
set protocols bgp neighbor 10.1.2.1 address-family l2vpn-evpn
set protocols bgp neighbor 10.1.2.1 remote-as '65000'
set protocols bgp neighbor 10.1.2.1 solo
set protocols bgp system-as '65000'

set protocols ospf area 0
set protocols ospf interface dum0 area '0'
set protocols ospf interface eth1 area '0'
set protocols ospf interface eth1 network 'point-to-point'

set protocols static route 0.0.0.0/0 next-hop 10.0.101.1 distance '245'
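
As a sanity check on the two configs above: for both PEs to be treated as members of the same Ethernet segment, my understanding is that `es-id` and `es-sys-mac` have to match on both sides, while `es-df-pref` is expected to differ. The Python sketch below checks exactly that; it is an illustration only, the helper names are hypothetical, and it simply parses `set` commands given as text.

```python
# Settings assumed to require identical values on every PE of one segment.
MUST_MATCH = ("es-id", "es-sys-mac")


def evpn_settings(config, bond="bond0"):
    """Collect the `evpn` options of one bonding interface from set commands."""
    prefix = f"set interfaces bonding {bond} evpn "
    settings = {}
    for line in config.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            parts = line[len(prefix):].split(maxsplit=1)
            # Valueless options such as `uplink` are stored as True.
            settings[parts[0]] = parts[1].strip("'") if len(parts) > 1 else True
    return settings


def es_mismatches(config_a, config_b, bond="bond0"):
    """Return the must-match options whose values differ between two PEs."""
    a = evpn_settings(config_a, bond)
    b = evpn_settings(config_b, bond)
    return {k: (a.get(k), b.get(k)) for k in MUST_MATCH if a.get(k) != b.get(k)}


# Relevant lines copied from the PE1 and PE2 configs in this report.
PE1 = """\
set interfaces bonding bond0 evpn es-df-pref '1000'
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac 'aa:bb:cc:dd:ee:f0'
set interfaces bonding bond0 evpn uplink
"""

PE2 = """\
set interfaces bonding bond0 evpn es-df-pref '500'
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac 'aa:bb:cc:dd:ee:f0'
set interfaces bonding bond0 evpn uplink
"""

print(es_mismatches(PE1, PE2))
# → {}
```

The empty result indicates that the segment identifiers in this report are consistent, so a mismatch there is unlikely to be the cause of the behavior.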

Details

Version
1.4,1.5
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

L0crian created this object in space S1 VyOS Public.
L0crian updated the task description.
L0crian updated the task description.
L0crian removed a project: VyOS 1.4 Sagitta (1.4.1).

I came across the issue described here and noticed that it has been marked as resolved. However, I'm still experiencing a similar problem in my environment.

In my case, traffic appears to work correctly: packets are forwarded and reach the destination server, and replies come back as expected. However, one of the routers continues to send ARP requests, even though the MAC address is already visible in show evpn arp-cache. It looks like the router forwards the packet properly, and the other router responds and populates its own ARP/EVPN cache accordingly.

Despite this, ARP requests keep being sent endlessly, and the server keeps responding to them, even though the requesting router already has the MAC cached. It's as if the EVPN ARP cache is not fully trusted or synchronized, causing an unnecessary ARP loop.

I can provide a topology diagram and configuration for review if needed; I just can't post them publicly here because they contain sensitive data. Please feel free to contact me directly if you're available to take a closer look. Tested on the latest nightly build.

Thanks in advance!

image.png (1×1 px, 214 KB)

I managed to recreate the lab in GNS3 with a 1-to-1 copy of the EVPN configuration from the physical setup, and the same bug appears. As shown in the screenshot, one router keeps sending ARP requests while the other keeps receiving ARP replies in a loop.

image.png (532×736 px, 174 KB)

If you want to take a look, I can give you VPN access through NetBird, and I can also share access to GNS3 so you can connect directly to the console.