Jan 10 2024
Does anybody know if that's going to be fixed in FRR?
Nov 14 2023
Hi @v.huti
This is probably obsolete by now. I've upgraded a few times since then and am now on version 8.5, which does not seem to suffer from this. Thank you.
And we had to stop work on the project due to another issue, described in
https://vyos.dev/T5424
Aug 7 2023
If that were PPPoE I'd have thought of ARP, but here, with a fixed number of L2TP tunnels (22 tunnels from the LACs), I don't think the ARP cache overflows the table.
Some more information that I can't yet tie to the failure, but it looks strange: just before the issue we see the LAC drop an L2TP tunnel for some reason and start sending SCCRQ with tid=0 as if it had just started up. After a while the accel-ppp daemon drops the old tunnels and starts new ones for a few LACs. I guess this definitely causes massive (thousands of) route updates between zebra and the kernel. Sometimes the system can withstand this, sometimes it can't.
I checked the FRR version in the recent rolling release - it is still a release candidate. Is it worth upgrading from 8.5.2? As for the possibility - yes, sure, we can build the latest image.
Adding what was available this time. Will try to turn on debugs next time if we get another chance. Yes, the behavior was identical to the previous one.
After 19 hours of production run since yesterday, the failure occurred again despite the workaround being applied. Routes are cleared from the kernel for some reason. During the run we observed a few L2TP tunnel drops followed by 600 to 6000 session drops. The reason is not clear for now, but I'm not sure this should kill zebra functionality this way.
Aug 3 2023
Yes, I did that as option A yesterday, and rebooted. Then I removed "zebra nexthop-group keep 1" and played a bit with interfaces up/down until the kernel routes vanished. Then I put "zebra nexthop-group keep 1" back and rebooted again.
Will try option B then.
Meanwhile it turned out to be possible to fix the "Route install failed" errors. I turned on "debug zebra kernel", found the nhg_id which caused the route install error, and created it manually using the nh1/nh2 provided by vtysh -c "show nexthop-group rib <nhg_id>" - just as described in the original thread regarding IPv6 routes.
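For the record, the sequence was roughly the following; the IDs and next hops below are placeholders, not the real ones from this box:
vtysh -c "debug zebra kernel"
# the log then shows which nhg_id fails to install, e.g. 102
vtysh -c "show nexthop-group rib 102"
# suppose it lists nh1 via 10.0.0.1 dev eth1 and nh2 via 10.0.0.2 dev eth2
ip nexthop add id 100 via 10.0.0.1 dev eth1    # create the member next hops if they are missing
ip nexthop add id 101 via 10.0.0.2 dev eth2
ip nexthop add id 102 group 100/101            # then the group zebra failed to install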
There is still some problem with the proposed workaround. It does not seem to work fully when applied on a running system with active BGP sessions. At least I still see next-hop groups in the kernel that have only one next hop after our last tests:
Aug 2 2023
From last night's tests it seems to be solved, though I'd prefer to test the node in production for a few weeks to be sure.
May 10 2023
Apr 25 2023
Two cents from the field: it would be nice to see a VRF-aware CGNAT solution, where subscribers from a number of "inside" VRFs are NAT'ed into one outside VRF - if that's possible, of course.
Apr 19 2023
Mar 13 2023
Actually, only the multihop BGP peers go down. The others are up, but the routes received from them do not make it into the kernel, so connectivity drops.
Latest techsupport: https://oc.cpm.ru/index.php/s/Fg9FfoOatihBOrQ
The system stayed alive for more than 12 hours, but crashed the same way as before.
Mar 10 2023
Mar 8 2023
As you can see, the LNS/MPLS-PE is being built on VyOS 1.4. The MPLS-P boxes are NSN (aka Alcatel-Lucent) as far as I know.
BTW, this configuration takes almost 20 minutes to load. I wonder if there's a way to speed up this process?
Thank you, @c-po. Will try raising limits to 4096.
Well, in this project we're trying to implement an L2TP network server with MPLS-PE functionality together with our partner mobile operator. This is for B2B projects, with a number of customers connecting their mobile devices to corporate resources for various reasons.
So the config has three groups of BGP peers: four ipv4-unicast peers (10.228.134.34, 10.228.134.36, 10.228.134.38, 10.228.134.40) for connecting to the L2TP LACs (actually they are mobile gateways - GGSN/PGW) and the AAA servers; a pair of ipv4-vpn multihop peers (10.5.72.1, 10.5.72.2) where the customers' L3VPN connections are terminated; and one more peer connecting to a 3rd-party carrier-grade NAT solution for the customers who need Internet access.
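In FRR terms the neighbor layout boils down to roughly this (only one peer of each group shown, AS numbers are placeholders - the attached config is the authoritative version):
router bgp 65000
 neighbor 10.228.134.34 remote-as 65010
 neighbor 10.5.72.1 remote-as 65020
 neighbor 10.5.72.1 ebgp-multihop 10
 address-family ipv4 unicast
  neighbor 10.228.134.34 activate
 exit-address-family
 address-family ipv4 vpn
  neighbor 10.5.72.1 activate
 exit-address-family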
The LNS and NAT nodes are implemented on a single server as KVM virtual machines, interconnected with each other and with the external world by Open vSwitch/DPDK.
The VRF names are assigned by the AAA server for each subscriber via the Accel-VRF-Name attribute (see the sketch at the end of this comment).
This is also where the defect https://github.com/FRRouting/frr/issues/12919 comes from - just to point it out.
Let me know if you need additional info.
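To illustrate the per-subscriber VRF assignment mentioned above, a sketch of what the AAA side returns - assuming a FreeRADIUS users-file style entry and that the accel-ppp RADIUS dictionary is loaded; the user name and VRF name are made up:
b2b-user01  Cleartext-Password := "example"
            Framed-Protocol = PPP,
            Accel-VRF-Name = "customer-a"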
Mar 7 2023
Tried it again. It says "download complete", and I can get it from the message:
Thank you for the hint, @c-po
Attached the entire config we have on the node.
There aren't many BGP peers, but there are quite a number of VRFs which terminate remote-access L2TP subscribers.
I'd really appreciate any advice on optimizing the system for that particular task - ideally I'd like this node to terminate up to 20k L2TP subscribers with very low traffic (not exceeding 0.5 Gbps, I guess).
Mar 6 2023
The bfdd process did not start until I changed LimitNOFILE=1024 to LimitNOFILE=2048 in /lib/systemd/system/frr.service.
That did the trick, but I'm not sure it's a good solution.
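Probably a systemd drop-in would be cleaner than editing the packaged unit; a sketch (the drop-in file name is arbitrary, and 2048 is just the value that worked here):
mkdir -p /etc/systemd/system/frr.service.d
printf '[Service]\nLimitNOFILE=2048\n' > /etc/systemd/system/frr.service.d/limits.conf
systemctl daemon-reload
systemctl restart frr.service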
What do you think, @Viacheslav ?
The limits look standard:
root@nn-vlns-3-1:~# ulimit -Hn
1048576
root@nn-vlns-3-1:~# ulimit -Sn
1024
root@nn-vlns-3-1:~# sysctl fs.file-max
fs.file-max = 9223372036854775807
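Note that the shell ulimit values above do not necessarily reflect what the FRR daemons actually get - the LimitNOFILE= value from the unit is what applies to them. A way to check the effective limit on the running daemon (assuming bfdd is running):
cat /proc/$(pgrep -o bfdd)/limits | grep 'open files'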
Mar 2 2023
Dec 7 2022
Yes, they are. 192.168.101.10 is the IP of a VPN remote-access subscriber. It is connected via interface l2tp0 (accel-ppp), and I'm just trying to open a TCP connection to port 80 on the client from the peer node.
The firewall settings do not seem to catch the traffic going out of the l2tp* interfaces.
admin@vyos-lns-1:~$ show config commands | grep firewall
set firewall interface l2tp* out name 'nodefw'
set firewall log-martians 'disable'
set firewall name nodefw rule 100 action 'accept'
set firewall name nodefw rule 100 protocol 'tcp'
set firewall name nodefw rule 100 tcp flags syn
set firewall name nodefw rule 100 tcp mss '1300'
Oops. Thank you, Nicolas.
Suddenly found myself far behind the current rolling release. Will upgrade first.
Dec 6 2022
There's no
set firewall interface
option here:
admin@vyos-lns-1:~$ show version
Version: VyOS 1.4-rolling-202209131208
Oct 17 2022
Added more bgpd/ospfd events to the log. The VRF id seems to be correct, but the events look curious: after session start the interface is first created in the default VRF (vrf default, id:0), followed by bgpd/ospfd events; then the accel-ppp process moves it to the destination VRF (vrf client, id:5), which is followed by the bgpd/ospfd errors.
Finally, at around 5000 sessions bgpd suddenly becomes unresponsive and utilizes 200% CPU (8 cores are assigned to the VM). The accel-pppd process, with all network destinations unreachable, also becomes unresponsive a bit later.
After that we have to reboot.
Oct 12 2022
That does not change the behavior. I get five messages on session start from the bfdd, bgpd, and ospfd processes, and 16 messages from all FRR daemons on session stop.
The only way to get rid of them is 'log syslog emergencies', but that filters out important events as well.
Any suggestions on the problem, guys?
I see a lot of reports in the FRR community about these messages appearing in various scenarios since 2017 or even earlier, but did not actually find any solution.
Oct 6 2022
This is a project for mobile access to enterprise networks. VyOS acts as an MPLS-PE router as well as an L2TP network server. Every subscriber coming in via L2TP is directed into the customer's VRF (other than default) via a RADIUS attribute.
Sep 7 2022
I'd suggest adding
**Restart=always RestartSec=10**
to /usr/share/vyos/templates/telegraf/override.conf.j2 as it is done for ntp.service.
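i.e. the rendered override would end up containing something like this (a sketch; exact placement within the template depends on what is already there):
[Service]
Restart=always
RestartSec=10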
Otherwise the telegraf service does not start - it makes 5 start attempts in quick succession during boot with the error:
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Start request repeated too quickly.
Sep 07 11:43:59 vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
and stays in a failed state.
See the boot log attached.
Sep 1 2022
Need some advice, guys, on how we can reproduce the problem. I tried peering with bird and announcing 100k prefixes to the VyOS box, but this simple config did not cause a memory leak in bgpd. Still trying.
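Roughly the kind of setup I mean - a bird2-style sketch with placeholder addresses and AS numbers, where the 100k prefixes are generated by a script (only two shown):
protocol static export_test {
    ipv4;
    route 10.10.0.0/24 blackhole;
    route 10.10.1.0/24 blackhole;
    # ...generated up to ~100k prefixes
}
protocol bgp to_vyos {
    local 192.0.2.2 as 65001;
    neighbor 192.0.2.1 as 65000;
    ipv4 {
        export where source = RTS_STATIC;
    };
}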
Aug 19 2022
Nothing helps
Aug 18 2022
The only way I found to start telegraf with ip vrf exec is to comment out
#User=telegraf
in /etc/systemd/system/vyos-telegraf.service and
chown root:root /run/telegraf
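In other words, roughly this sequence (the daemon-reload/restart steps are what I'd expect to be needed after editing the unit):
sudo sed -i 's/^User=telegraf/#User=telegraf/' /etc/systemd/system/vyos-telegraf.service
sudo chown root:root /run/telegraf
sudo systemctl daemon-reload
sudo systemctl restart vyos-telegraf.service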
Aug 16 2022
Manual start of telegraf works for me
Aug 15 2022
Aug 10 2022
Hi Viacheslav
Sorry, I probably misspelled the config option. Actually, it's available in the [radius] section of accel-ppp.conf.
Below is the [radius] section from my /run/accel-pppd/l2tp.conf after I changed /usr/libexec/vyos/conf_mode/vpn_l2tp.py:
Jul 29 2022
Jul 28 2022
Is there any chance to fix that?