User Details
- User Since
- Apr 26 2022, 2:45 PM (47 w, 2 d)
Mon, Mar 13
Actually, only the multihop BGP peers go down. The others stay up, but the routes received from them do not reach the kernel, so connectivity drops.
Latest techsupport: https://oc.cpm.ru/index.php/s/Fg9FfoOatihBOrQ
The system stayed up for more than 12 hours, but then crashed the same way as before.
Fri, Mar 10
Wed, Mar 8
As you can see, the LNS/MPLS-PE is being built on VyOS 1.4. The MPLS-P routers are NSN (aka Alcatel Lucent) boxes, as far as I know.
BTW this configuration takes almost 20 minutes to load. I wonder if there's a way to speed up this process?
Thank you, @c-po. Will try raising limits to 4096.
Well, in this project we're trying to implement an L2TP network server with MPLS-PE functionality together with our partner mobile operator. This is for b2b projects where a number of customers connect their mobiles to corporate resources.
So the config has three groups of BGP peers: four ipv4-unicast peers (10.228.134.34, 10.228.134.36, 10.228.134.38, 10.228.134.40) for connections to the L2TP LACs (actually they are mobile gateways - GGSN/PGW) and AAA servers; a pair of ipv4-vpn multihop peers (10.5.72.1, 10.5.72.2) where the customers' L3VPN connections are terminated; and one more peer connecting to a 3rd-party carrier-grade NAT solution for the customers who need Internet access.
The LNS and NAT nodes are implemented on a single server as KVM virtual machines, interconnected with each other and with the outside world by Open vSwitch/DPDK.
The VRF names are assigned by AAA server for each subscriber with Accel-VRF-Name attribute.
This is also where the defect https://github.com/FRRouting/frr/issues/12919 comes from. Just to point that out.
Let me know if you need additional info.
Tue, Mar 7
again. It says - download complete. And i can get it from the message:
Thank you for the hint, @c-po
Attached the entire config we have on the node.
There aren't many BGP peers, but there is quite a number of VRFs that terminate remote-access L2TP subscribers.
I'd really appreciate any advice on optimizing the system for that particular task - ideally I'd like this node to terminate up to 20k L2TP subscribers with very low traffic (not exceeding 0.5 Gbps, I guess).
Mon, Mar 6
The bfdd process did not start even with empty config.
I changed LimitNOFILE=1024 to LimitNOFILE=2048 in /lib/systemd/system/frr.service
That did the trick, but i'm not sure it's a good solution.
What do you think, @Viacheslav ?
The limits look standard:
[email protected]:~# ulimit -Hn
1048576
[email protected]:~# ulimit -Sn
1024
[email protected]:~# sysctl fs.file-max
fs.file-max = 9223372036854775807
Thu, Mar 2
Dec 7 2022
Yes, they are. 192.168.101.10 is the IP of a VPN remote-access subscriber. He's connected to interface l2tp0 (accel-ppp), and I'm just trying to open a TCP connection to port 80 on the client from the peer node.
The firewall settings do not seem to catch the traffic going out of the l2tp* interfaces.
[email protected]:~$ show config commands |grep firewall
set firewall interface l2tp* out name 'nodefw'
set firewall log-martians 'disable'
set firewall name nodefw rule 100 action 'accept'
set firewall name nodefw rule 100 protocol 'tcp'
set firewall name nodefw rule 100 tcp flags syn
set firewall name nodefw rule 100 tcp mss '1300'
Oops. Thank you Nicolas.
Suddenly found myself far behind the current rolling release. Will upgrade first.
Dec 6 2022
There's no
set firewall interface
option here:
[email protected]:~$ show version
Version: VyOS 1.4-rolling-202209131208
Oct 17 2022
Added more bgpd/ospfd events to the log. The VRF IDs seem to be correct, but the events look curious. After session start, the interface is first created in vrf default (vrf default, id:0), followed by bgpd/ospfd events; then the accel-ppp process moves it to the destination vrf (vrf client, id:5), which is followed by the bgpd/ospfd errors.
Finally, at around 5000 sessions, bgpd unexpectedly becomes unresponsive and uses 200% CPU (the VM has 8 cores). The accel-pppd process, with all network destinations unreachable, also becomes unresponsive a bit later.
After that we have to reboot.
Oct 12 2022
That does not change the behavior. I get five messages on session start from bfdd, bgpd, ospfd processes, and 16 messages from all FRR daemons on session stop.
The only way to get rid of them is 'log syslog emergencies' but this filters important events as well.
Any suggestions on the problem, guys?
I see a lot of reports in the FRR community about these messages appearing in various scenarios since 2017 or even earlier, but I did not find an actual solution.
Oct 6 2022
This is a project for mobile access to enterprise networks. VyOS acts as an MPLS-PE router as well as an L2TP Network Server. Every subscriber coming in via L2TP is placed into the customer's VRF rather than the default one (via a RADIUS attribute).
Sep 7 2022
I'd suggest adding
Restart=always
RestartSec=10
to /usr/share/vyos/templates/telegraf/override.conf.j2 as it is done for ntp.service.
Otherwise the telegraf service does not start - it makes 5 start attempts very quickly during boot with the error:
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Start request repeated too quickly.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
and stays in a failed state.
see boot log attached.
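To illustrate why a longer restart delay avoids the "Start request repeated too quickly" failure, here is a rough Python model of systemd's start rate limiting. It assumes the stock systemd defaults (at most 5 starts per 10-second window, and a 100 ms restart delay when RestartSec is not set); the actual values in the VyOS telegraf unit are not shown in this thread, and `start_limit_hit` is a hypothetical helper, not a systemd API:

```python
def start_limit_hit(restart_sec, run_time=0.0, interval=10.0, burst=5,
                    attempts=100):
    """Simplified model: does a unit that always fails trip the start limit?

    restart_sec -- delay between a failure and the next start attempt
    run_time    -- how long the service runs before failing
    interval    -- assumed StartLimitIntervalSec (systemd default: 10 s)
    burst       -- assumed StartLimitBurst (systemd default: 5)
    """
    window_begin = None
    count = 0
    t = 0.0
    for _ in range(attempts):
        # A new counting window opens once the old one has expired.
        if window_begin is None or t - window_begin > interval:
            window_begin = t
            count = 0
        if count >= burst:
            return True  # start refused; unit stays in the failed state
        count += 1
        t += run_time + restart_sec
    return False


if __name__ == "__main__":
    print(start_limit_hit(0.1))   # quick restarts exhaust the burst
    print(start_limit_hit(10.0))  # 10 s spacing never fills the window
```

In this model the quick restarts use up all five allowed starts within the first second, matching the log above, while restarts spaced 10 seconds apart never accumulate enough starts in one window, so the service keeps retrying until its dependencies are ready.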
Sep 1 2022
Need some advice, guys, on how we can reproduce the problem. I tried peering with bird and announced 100k prefixes to the VyOS box, but this simple config did not cause a memory leak in bgpd. Still trying.
Aug 19 2022
Nothing helps
Aug 18 2022
The only way I found to start telegraf with ip vrf exec is to comment out
#User=telegraf
in /etc/systemd/system/vyos-telegraf.service and
chown root:root /run/telegraf
Aug 16 2022
Manual start of telegraf works for me
Aug 15 2022
Aug 10 2022
Hi Viacheslav
Sorry, I probably misspelled the config option. Actually it's available in the [radius] section of accel-ppp.conf.
Below is the [radius] section from my /run/accel-pppd/l2tp.conf after i changed
/usr/libexec/vyos/conf_mode/vpn_l2tp.py:
Jul 29 2022
Jul 28 2022
Is there any chance to fix that?