Page MenuHomeVyOS Platform

Slow boot/load and CLI response times
Resolved (N/A)PublicBUG

Description

During our testing of VyOS 1.2.1-S2 I noticed a significant increase in boot/load and CLI response times on one of our customer edge device platforms.

Relevant Device Specs:

Model: Lanner NCA-1020 (https://www.lanner-america.com/buy-lanner/product/network-appliances/x86-desktop-network-appliances/nca-1020a/)
Processor: Intel Celeron N3010, 2.4GHz, Dual-Core
Memory: 2GB DDR3L 1600 MHz non-ECC
Ethernet: Intel i211, 10/100/1000 Mbps RJ-45
Storage: mini mSATA 32 GB

Observed boot times

** Boot time 1.1.8 -> 1.2.1-S2 w/o configuration ~160 seconds
** Boot time 1.1.8 -> 1.2.1-S2 /w configuration ~260 seconds

** Boot time 1.2.1-S2 w/o configuration ~120 seconds
** Boot time 1.2.1-S2 /w configuration ~233 seconds (no less than 180 seconds)

** Boot time 1.1.8 w/o configuration ~40 seconds
** Boot time 1.1.8 /w configuration ~60 seconds

** Avg CLI response time 1.2.1-S2, ~2 seconds
** Avg CLI response time 1.8.1, no perceivable delay

If it's helpful, I can also attach all permutations of the configuration files we tested with.

Details

Difficulty level
Unknown (require assessment)
Version
1.2.1-S2
Why the issue appeared?
Will be filled on close

Event Timeline

Hello, @ekim!
Such a significant increase of boot time with the same configuration is very strange, but still possible - even small changes can easily cause such behavior if you have a huge or some specific configuration. But what is more strange - CLI response time. Regardless of configuration, CLI should work without visible delays, except for autocompleting or commit operations.
Could you check the current load of the host at that moment, when CLI is slow? We must be sure that the system is not overloaded.

Hi @zsdc,

Neither the CPU or memory load ever exceeds 0.5% utilization while idling. I haven't tested whether throughput or control plane timings have been impacted.

The configuration is identical from 1.1.8 -> 1.2.1-s2. Doesn't make sense to me why there is such a large impact, especially with no impact to resource consumption.

Thanks

@ekim, could you make a screen record with this CLI delay?

In 1.2 was changed a lot of internal logic and many software which handling different services, so some impact is possible. But, we are trying to do our best to avoid such a big difference. After we will be sure that there are no problems with CLI in your case, we will switch to configuration analysis to detect possible bottlenecks.

Neither have configurations on them. However, CLI response time when a configuration is loaded is even slower on 1.2.1-s2. Of course, no perceptible impact on 1.1.8.

Hello, @ekim! I see now. This is more looks like the waiting due to I/O. Do, please, the next:

  1. Run sudo atop -w /tmp/atop-mon.log -a 5 60 in dedicated terminal.
  2. Try to work several minutes in the terminal. It must freezing at this moment, otherwise, collected test data will be wrong.
  3. Wait until atop finish its work (~ 5 min from the start).
  4. Copy the file /tmp/atop-mon.log from the router and send to us for analysis.

Thanks! Here you go!

Thank you, @ekim!
What exactly you were doing at the moment of this log record? I see a lot of scripts, which are almost permanently do something in the system. Is it possible that this system contains some custom scripts (in /config/scripts folder, for example)?
If yes, you should check the schedule of execution, and requirements of modification for 1.2 version command syntax.

These are default VyOS installations without any modifications other than to the running configuration. During these recordings I set IP address, duplex, and speed on eth0, set a static route, hostname, and ntp, time, and dhcp server configurations. While doing so, I used ? to enumerate possible configuration paths and [tab] for CLI completion to hopefully illuminate the latency in these operations.

The only thing in /config/scripts is the default file and has only default comments in it:

vyos@vyos:~$ ls /config/scripts/
vyatta-postconfig-bootup.script
vyos@vyos:~$ less /config/scripts/vyatta-postconfig-bootup.script 
#!/bin/sh
# This script is called from /etc/rc.local on boot after the Vyatta
# configuration is fully applied. Any modifications done to work around
# unfixed bugs and implement enhancements which are not complete in the Vyatta
# system can be placed here.

@ekim, we have never met with such a problem and cannot reproduce it in our environments. The better way to continue investigation would be getting access to this installation. If this would be possible, we could continue debugging.

syncer edited projects, added Rejected; removed VyOS 1.2 Crux.
syncer added a subscriber: syncer.

Impossible to reproduce

syncer reopened this task as Needs testing.Oct 22 2019, 9:37 PM
syncer reassigned this task from zsdc to Unknown Object (User).
syncer triaged this task as Normal priority.
syncer edited projects, added VyOS 1.3 Equuleus; removed Rejected.
syncer added a subscriber: zsdc.
Unknown Object (User) added a comment.Nov 28 2019, 9:25 PM

@ekim seems this issue with kernel and your device. If I build custom stable kernel for VyOS installed on same device without CONFIG_PINCTRL_CHERRYVIE all works properly.
Can you show cat /pro/interrupt output for confirm this?
You can also change /boot/grub/grub.cfg add noapic and reboot device.

menuentry "VyOS 1.2.3 linux (Serial console)" {
	linux /boot/1.2.3/vmlinuz boot=live noapic quiet rootdelay=5 vyos-union=/boot/1.2.3 console=tty0 console=ttyS0,9600
	initrd /boot/1.2.3/initrd.img
}

Can you confirm that with grub modification after reboot all works correct?

syncer reassigned this task from Unknown Object (User) to Viacheslav.Jul 16 2023, 9:29 PM
syncer added a subscriber: Unknown Object (User).

Comparing boot time for now 1.3 and 1.1.8 is not actual
There are 2 different systems :)
Also, some validators were rewritten on 1.2 to Python and for 1.3 to sh, OCAML and so on (python validators could be cause of the issue)
In my internal test VM loads ~40 sec tested in VyOS 1.3.5
We always can improve something, but lets find what we can improve in separate tasks.

Close it for now.