Page MenuHomeVyOS Platform

excessive FRR logs about non-existent VRFs
Open, NormalPublicBUG

Description

While subscriber connects/disconnects to remote-access l2tp vpn every routing protocol daemon outputs error message like this:

Oct 06 13:29:20 bgpd[923]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 ripd[930]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 ripngd[933]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 ospfd[936]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 ospf6d[939]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 ldpd[957]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 eigrpd[961]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 staticd[964]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 bfdd[967]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79
Oct 06 13:29:20 isisd[200162]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF l2tp0 in VRF 79

Client has nothing to do with VRF 79, moreover - there is no VRF table 79 on the system. The number is random usually. It can be 0 or any.
The messages are very annoying and they have unjustifiably high level - error. On the system with several thousands of clients the log file grows to several Gb in less than an hour. We can't filter errors from routing protocols - as we would lost the BGP peering events then.

Details

Difficulty level
Normal (likely a few hours)
Version
1.4-rolling-202209131208
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

Hi @aserkin! It looks like you have some frr server misbehavior. It sends up/down events with an unexisting vrf id.
Could you make/describe the setup that causes the issue to appear? Thanks

v.huti triaged this task as Normal priority.
v.huti changed Difficulty level from Unknown (require assessment) to Normal (likely a few hours).

This a project for mobile access to enterprise networks. VyOS plays as an MPLS-PE router as well as L2TP Network Server. Every subscriber coming via l2tp is directed to the customer's VRF other than default (with RADIUS attribute)

. Attached log file and configuration from our lab. The production system tested has about 200 VRFs configured.

This comment was removed by aserkin.

Any suggestions on the problem, guys?
I see a lot of messages regarding these messages appearing in various scenarios since 2017 or even earlier in FRR community. But did not find any solution actually.

@aserkin as workaround try to change facility level

vtysh -c "conf t" -c "log facility local0"

But it can affect to bgp logs

That does not change the behavior. I get five messages on session start from bfdd, bgpd, ospfd processes, and 16 messages from all FRR daemons on session stop.
The only way to get rid of them is 'log syslog emergencies' but this filters important events as well.

Added more bgpd/ospfd events to the log. The VRF Id seem to be correct. But the events look curious. After session start the interface is first created in vrf default (vrf default, id:0) followed by bgpd/ospfd events, then accel-ppp process moves it to destination vrf (vrf client, id:5) which is follwed by the bgpd/ospfd errors.
Finally, with more or less than 5000 sessions bgpd accidentally becomes unresponsive and utilizes 200% cpu (8 cores are used on VM). Accel-pppd process having all network destinations unreachable also goes unresponsive a bit later.
After that we have to reboot.