We're running an L2TP LNS with MPLS PE functionality, terminating subscribers into 186 VRFs. During testing I consistently hit an issue where the node drops all routes learnt from the FRR daemons. All BGP peers go down, and the node can only be recovered with a reboot.
I tried enabling debug logging in FRR, but I can't see any messages pointing to the root cause.
The latest test logs contain:
- FRR logs with zebra kernel, bgp neighbor-events and nht debugging
- System syslog covering the issue. The problem starts at 23:40:51 with Network unreachable messages from the accel-ppp daemon
- Some show commands after the problem occurred
- System configuration
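The FRR debugging mentioned in the list above can typically be enabled from vtysh along the following lines. This is only a sketch of the usual commands, not a copy of the exact configuration used in the test; in particular, `debug bgp nht` is an assumption for the "nht" item, which could equally refer to zebra's next-hop tracking debug.

```
! Run inside vtysh; a sketch, not the exact test configuration
configure terminal
 debug zebra kernel
 debug bgp neighbor-events
 ! Assumed: the nht debugging was the BGP next-hop tracking debug
 debug bgp nht
end
```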
The logs are too large to upload here, so I'm sharing them via this link:
https://oc.cpm.ru/index.php/s/HU4zwoXge2ynQfn
There is no visible reason for the routes to be dropped. The number of subscribers at the time of the issue varies from 3k to 10k+.
If you can advise which additional logs and debugs should be turned on, we can repeat the tests in the next maintenance window.