
Problems with simultaneous usage of multiple `vtysh` processes
Confirmed, High, Public, BUG

Description

When multiple vtysh processes run simultaneously, parts of the configuration may not be applied.

One indicator of this problem is a log message like the following:

Aug 20 10:40:28 VyOS-on-AWS mgmtd[2444]: [WEJ55-ZGM1W] Locking for DS 2 failed, Err: 'Lock already taken on DS by another session!' vty 0x55738612b7f0
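
To check whether a system has hit this condition, the log can be scanned for the mgmtd message above. The following is a minimal sketch, assuming the message format matches the sample line; the name `find_lock_failures` is hypothetical and not part of VyOS or FRR.

```python
import re

# Pattern derived from the sample mgmtd line above. The exact wording may
# differ between FRR versions (assumption).
LOCK_FAILURE_RE = re.compile(
    r"mgmtd\[(?P<pid>\d+)\]: .*Locking for DS (?P<ds>\d+) failed, "
    r"Err: 'Lock already taken on DS by another session!'"
)


def find_lock_failures(lines):
    """Return (pid, ds) tuples for every line reporting a failed DS lock."""
    hits = []
    for line in lines:
        match = LOCK_FAILURE_RE.search(line)
        if match:
            hits.append((int(match.group("pid")), int(match.group("ds"))))
    return hits
```

The function accepts any iterable of log lines, for example lines read from a saved copy of the system log.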

A real-world example of this problem involves the DHCP client.
The DHCP client is started by interfaces_ethernet.py (https://github.com/vyos/vyos-1x/blob/2277371fe18577502ce318c23789f86d1ec97be7/src/conf_mode/interfaces_ethernet.py#L410), and the same script then also works with vtysh (https://github.com/vyos/vyos-1x/blob/2277371fe18577502ce318c23789f86d1ec97be7/src/conf_mode/interfaces_ethernet.py#L416-L425).

In some cases (hard to reproduce, but they do occur), dhclient then fails to install routes into FRR via https://github.com/vyos/vyos-1x/blob/2277371fe18577502ce318c23789f86d1ec97be7/src/etc/dhcp/dhclient-enter-hooks.d/03-vyos-ipwrapper#L87

A simple way to reproduce the problem is the following script:

#!/usr/bin/env python3
from concurrent.futures import ThreadPoolExecutor
from subprocess import PIPE, run

from vyos import frr


def test_frr1():
    # Add a static route by calling vtysh directly.
    # stderr must be captured explicitly; otherwise run_result.stderr is None.
    run_result = run(
        ['/usr/bin/vtysh', '-c', 'conf t', '-c', 'ip route 0.0.0.0/0 192.0.2.1 eth0 tag 210 210 '],
        stderr=PIPE, text=True)
    if run_result.stderr:
        print(f'Error: {run_result.stderr}')


def test_frr2():
    # Load and commit the zebra configuration through the vyos.frr wrapper.
    frr_cfg = frr.FRRConfig()
    frr_cfg.load_configuration('zebra')
    frr_cfg.commit_configuration('zebra')


# Submit 20 pairs of competing tasks so that the two code paths overlap.
with ThreadPoolExecutor(max_workers=50) as executor:
    for _ in range(20):
        executor.submit(test_frr1)
        executor.submit(test_frr2)

While the script runs (several runs may be needed to trigger the problem), you may see these messages:

% command failed, could not lock candidate DS
% command failed, could not lock candidate DS

And in the logs:

Aug 20 13:30:43 vyos mgmtd[1315]: [WEJ55-ZGM1W] Locking for DS 2 failed, Err: 'Lock already taken on DS by another session!' vty 0x55b998aaddc0
Aug 20 13:30:44 vyos mgmtd[1315]: [WEJ55-ZGM1W] Locking for DS 2 failed, Err: 'Lock already taken on DS by another session!' vty 0x55b998aaddc0
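
Because the failure is intermittent, it can help to rerun a reproduction command until the lock error shows up. The following is a rough sketch of such a loop, not part of VyOS: the names `run_and_capture` and `attempts_until_lock_error` are hypothetical, and it assumes the error text matches the "could not lock candidate DS" message shown in this report.

```python
import subprocess

# Error text taken from the messages in this report.
LOCK_ERROR = "could not lock candidate DS"


def run_and_capture(cmd):
    """Run cmd and return its combined stdout and stderr as text."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def attempts_until_lock_error(cmd, max_attempts=10, runner=run_and_capture):
    """Rerun cmd until its output contains LOCK_ERROR.

    Returns the 1-based attempt number on which the error first appeared,
    or None if it never appeared within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if LOCK_ERROR in runner(cmd):
            return attempt
    return None
```

For example, if the reproduction script were saved under the hypothetical name repro.py, `attempts_until_lock_error(['python3', 'repro.py'])` would report how many runs it took to trigger the error. The `runner` parameter exists only so the loop can be exercised without a live FRR instance.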

Details

Difficulty level
Unknown (require assessment)
Version
1.4.0, 1.5-rolling-202408200022
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

zsdc changed the task status from Open to Confirmed.