
BGP Peer Group Scaling issues
Needs reporter action, High, Public, BUG

Description

Hi team, great work with VyOS, by the way.

Yesterday I came across this issue, which might not be a bug but just a limitation. I could not find any related articles, so I thought I would bring it up.

I configured 1000 BGP peers via a peer-group (passive mode). I also had 10 standalone BGP peers (not configured via the peer-group), which were working fine. Of the 1000 peers in the peer-group, only 3 were active, and one of those was sending 5 routes. The BGP daemon seems to accept the configuration when the 1000 peers are created. But when I restart the BGP process, reboot the box, or do a VRRP failover, the daemon crashes: it gets stuck at 100% CPU usage. It also doesn't seem to spread the load across cores; only one CPU is at 100% and all the others are at 0%.
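For anyone trying to reproduce this, the setup described above could be generated with something like the sketch below. It is not the reporter's actual configuration: the local ASN (65000), remote ASN (65001), peer-group name (PG-SCALE), and the 10.0.x.y neighbor addresses are all made-up placeholders, and the `set` syntax assumes the VyOS 1.3 layout where the ASN is part of the path.

```shell
#!/bin/sh
# Sketch only: emit VyOS 1.3-style "set" commands for N passive peers
# that all belong to one peer-group.
# ASN, REMOTE_ASN, PG and the 10.0.x.y addressing are hypothetical placeholders.
ASN=65000
REMOTE_ASN=65001
PG=PG-SCALE

gen_peer_group_config() {
    count=$1
    # Peer-group definition: remote AS plus passive mode
    echo "set protocols bgp $ASN peer-group $PG remote-as $REMOTE_ASN"
    echo "set protocols bgp $ASN peer-group $PG passive"
    i=0
    while [ "$i" -lt "$count" ]; do
        # 250 peers per /24: host part runs 1..250, third octet increments
        hi=$(( i / 250 ))
        lo=$(( i % 250 + 1 ))
        echo "set protocols bgp $ASN neighbor 10.0.$hi.$lo peer-group $PG"
        i=$(( i + 1 ))
    done
}
```

Running `gen_peer_group_config 1000 > peers.txt` writes the commands to a file that can be pasted into configuration mode on a lab router.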

I thought I should bring it to your attention and see if this is fixable.

Thanks
Prem

Details

Version
VyOS 1.3.0-20220120155620
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

pjeevarathinam updated the task description.
pjeevarathinam updated the task description.

I tried this command as suggested, with no luck:

sudo vtysh -c 'conf' -c 'router bgp YOUR_ASN_HERE' -c 'bgp listen limit 5000'

It was also suggested that I try this:

You can also try editing /etc/frr/daemons and appending --limit-fds 500 to the BGP daemon options.
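As a rough illustration of that suggestion, the sketch below appends the flag to the `bgpd_options` line of a daemons file. It assumes the file has a line of the form `bgpd_options="..."`, as stock FRR packages do, and it relies on GNU `sed -i`. The helper name `add_bgpd_fd_limit` is made up for this example. On VyOS the file may be regenerated by the system, so treat this as a lab experiment rather than a persistent change.

```shell
#!/bin/sh
# Sketch only: append "--limit-fds <n>" to the bgpd_options line of a
# daemons file. Assumes a line like: bgpd_options="  -A 127.0.0.1"
# add_bgpd_fd_limit is a hypothetical helper, not part of FRR or VyOS.
add_bgpd_fd_limit() {
    file=$1
    limit=$2
    # Skip the edit if the flag is already present, so reruns are harmless
    if ! grep -q '^bgpd_options=.*--limit-fds' "$file"; then
        # Insert the flag just before the closing quote of bgpd_options
        sed -i "s/^\(bgpd_options=\".*\)\"\$/\1 --limit-fds $limit\"/" "$file"
    fi
}
```

It would be called as `sudo sh -c '. ./fdlimit.sh; add_bgpd_fd_limit /etc/frr/daemons 500'` (the script file name is also a placeholder), followed by a restart of FRR so that bgpd picks up the new option.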

No luck. The BGP process crashed.

Please provide some logs and configuration examples.
Do you use SNMP?

dmbaturin subscribed.

We need to check whether this is still relevant and decide whether to declare it WONTFIX.

@pjeevarathinam Could you re-check with 1.4-rc3 or the latest rolling release?
You can also experiment with the number of file descriptors:

vyos@r4# set system frr descriptors 
Possible completions:
   <1024-8192>          Number of file descriptors
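A possible way to apply this is sketched below. The helper prints the configuration-mode commands for a chosen value and rejects anything outside the 1024-8192 range shown in the completion above. The function name `frr_descriptor_cmds` is made up for illustration, and whether FRR needs a restart after the commit is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: print the VyOS commands to set the FRR file descriptor limit.
# frr_descriptor_cmds is a hypothetical helper; the 1024-8192 bounds come
# from the CLI completion output quoted above.
frr_descriptor_cmds() {
    n=$1
    if [ "$n" -ge 1024 ] && [ "$n" -le 8192 ]; then
        printf 'configure\nset system frr descriptors %s\ncommit\nsave\nexit\n' "$n"
    else
        echo "value $n is outside the 1024-8192 range shown by the CLI" >&2
        return 1
    fi
}
```

For example, `frr_descriptor_cmds 8192` prints the five lines to paste into a VyOS session, while `frr_descriptor_cmds 500` prints an error and returns non-zero.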
Viacheslav changed the task status from Open to Needs reporter action. Apr 7 2024, 5:03 PM
dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.
dmbaturin changed Issue type from Infrastructure issue or change to Bug (incorrect behavior).