
VPP: configuration commit fail causes interface state corruption
In progress, High, Public, BUG

Description

This issue can be reproduced consistently with the following scenario, which uses a memory allocation failure on an AWS VM as an example:

vyos@VyOS-for-Smoke-Tests# set vpp settings interface eth1 driver 'dpdk'
[edit]
vyos@VyOS-for-Smoke-Tests# set vpp settings buffers page-size '1G'
[edit]
vyos@VyOS-for-Smoke-Tests# commit
[ vpp ]

WARNING: NOTE: Current dataplane capacity (estimated): 2.1 M IPv4
routes. Exceeding these values will lead to a dataplane out-of-memory
condition and a crash. Extensive use of features like ACLs, NAT and
others may reduce the numbers above. Please read the documentation for
details: https://docs.vyos.io/

An error occurred: [Errno 2] Cannot connect to VPP API. VPP service will
be restarted with the previous configuration
[[vpp]] failed
Commit failed
[edit]
vyos@VyOS-for-Smoke-Tests# lspci -nnk | grep -iA3 Ethernet
00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20]
        Kernel driver in use: ena
        Kernel modules: ena
00:06.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20]
        Kernel driver in use: ena
        Kernel modules: ena
[edit]
vyos@VyOS-for-Smoke-Tests# run sh in
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address       MAC                VRF        MTU  S/L    Description
-----------  ---------------  -----------------  -------  -----  -----  -------------
eth0         172.16.11.55/24  06:50:78:09:b8:59  default   1500  u/u    WAN
eth1         -                06:36:b4:e3:0f:0d  default   1500  A/D
lo           127.0.0.1/8      00:00:00:00:00:00  default  65536  u/u
             ::1/128
[edit]
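The commit above requests 1G buffer pages, and the journal excerpt that follows shows VPP rejecting them with `unsupported page size (1048576KB)`. One way to avoid restarting VPP with such a value is to validate the page size first. The sketch below is a minimal illustration, not VyOS code: it assumes (from the wording of the error) that VPP only accepts the kernel's base page size or its default hugepage size, read from the `Hugepagesize:` line of `/proc/meminfo`. `PAGE_SIZES`, `page_size_usable` and `check_buffers_page_size` are hypothetical names.

```python
import os
from typing import Optional

# Hypothetical mapping of "vpp settings buffers page-size" values to bytes;
# only '1G' appears in this report, the other keys are assumptions.
PAGE_SIZES = {"4K": 4 << 10, "2M": 2 << 20, "1G": 1 << 30}


def default_hugepage_size(meminfo_text: str) -> Optional[int]:
    """Return the default hugepage size in bytes, parsed from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            value, unit = line.split()[1:3]
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r} in {line!r}")
            return int(value) * 1024
    return None  # no hugepage support reported by the kernel


def page_size_usable(requested: str, meminfo_text: str, base_page: int) -> bool:
    """Assumed rule: only the base page size or the default hugepage size works."""
    size = PAGE_SIZES[requested]
    return size in (base_page, default_hugepage_size(meminfo_text))


def check_buffers_page_size(requested: str) -> None:
    """Hypothetical pre-commit hook: fail before VPP is restarted."""
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    if not page_size_usable(requested, meminfo, os.sysconf("SC_PAGE_SIZE")):
        raise ValueError(f"buffers page-size {requested} is not usable on this system")
```

On an instance whose default hugepage size is 2 MB, `check_buffers_page_size("1G")` would raise during validation, so the commit would be rejected while VPP and eth1 are still in their previous state.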

The following exception appears in the journal during the commit above:

Dec 08 17:01:25 VyOS-for-Smoke-Tests systemd[1]: Starting vector packet processing engine...
Dec 08 17:01:25 VyOS-for-Smoke-Tests vpp[5789]: vpp[5789]: vlib_physmem_shared_map_create: clib_pmalloc_create_shared_arena: unsupported page size (1048576KB)
Dec 08 17:01:25 VyOS-for-Smoke-Tests vpp[5789]: vpp[5789]: vlib_buffer_main_init: failed to allocate buffer pool(s)
Dec 08 17:01:25 VyOS-for-Smoke-Tests vpp[5789]: vlib_physmem_shared_map_create: clib_pmalloc_create_shared_arena: unsupported page size (1048576KB)
Dec 08 17:01:25 VyOS-for-Smoke-Tests vpp[5789]: vlib_buffer_main_init: failed to allocate buffer pool(s)
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemd[1]: vpp.service: Failed with result 'protocol'.
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemd[1]: Failed to start vector packet processing engine.
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemd[1]: vpp.service: Triggering OnFailure= dependencies.
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemctl[5787]: Job for vpp.service failed because the service did not take the steps required by its unit configuration.
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemctl[5787]: See "systemctl status vpp.service" and "journalctl -xeu vpp.service" for details.
Dec 08 17:01:25 VyOS-for-Smoke-Tests systemd[1]: Starting Restart VPP on failure...
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[845]: VPP API connection timeout: [Errno 111] Connection refused
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]: Traceback (most recent call last):
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:   File "/usr/libexec/vyos/reset_section.py", line 116, in <module>
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:     session.commit()
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:   File "/usr/lib/python3/dist-packages/vyos/configsession.py", line 325, in commit
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:     out = self.__run_command([COMMIT])
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:   File "/usr/lib/python3/dist-packages/vyos/configsession.py", line 252, in __run_command
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]:     raise ConfigSessionError(output)
Dec 08 17:01:26 VyOS-for-Smoke-Tests python3[5791]: vyos.configsession.ConfigSessionError: Configuration system temporarily locked due to another commit in progress
Dec 08 17:01:26 VyOS-for-Smoke-Tests systemd[1]: vpp-failure-handler.service: Main process exited, code=exited, status=1/FAILURE
Dec 08 17:01:26 VyOS-for-Smoke-Tests systemd[1]: opt-vyatta-config-tmp-new_config_5791.mount: Deactivated successfully.
Dec 08 17:01:26 VyOS-for-Smoke-Tests systemd[1]: vpp-failure-handler.service: Failed with result 'exit-code'.
Dec 08 17:01:26 VyOS-for-Smoke-Tests systemd[1]: Failed to start Restart VPP on failure.
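The traceback shows that `vpp-failure-handler.service` runs `reset_section.py`, whose `session.commit()` fails with `Configuration system temporarily locked due to another commit in progress`: the user's commit still holds the lock, so the previous configuration is never restored. A bounded retry is one possible mitigation, sketched below under stated assumptions: `retry_while_locked` is a hypothetical helper, matching on the message text is a stand-in for catching `ConfigSessionError` specifically, and the approach only helps if the original commit releases the lock without waiting for the handler.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the error seen in the journal. Matching on message text is an
# assumption; a dedicated exception type or lock API would be sturdier.
LOCK_MESSAGE = "temporarily locked due to another commit in progress"


def retry_while_locked(action: Callable[[], T], attempts: int = 10,
                       delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying with linear backoff while the commit lock is held.

    Errors unrelated to the lock, and the lock error on the final attempt,
    are re-raised unchanged.
    """
    attempt = 1
    while True:
        try:
            return action()
        except Exception as err:
            if LOCK_MESSAGE not in str(err) or attempt >= attempts:
                raise
        sleep(delay * attempt)  # wait 1x, 2x, 3x ... `delay` seconds
        attempt += 1
```

In the handler this would be used as `retry_while_locked(session.commit)`. The `sleep` parameter exists only so the backoff can be exercised without real delays.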

A subsequent attempt to start VPP results in the following:

vyos@VyOS-for-Smoke-Tests# set vpp settings interface eth1 driver 'dpdk'
[edit]
vyos@VyOS-for-Smoke-Tests# commit
[ vpp ]

WARNING: NOTE: Current dataplane capacity (estimated): 2.1 M IPv4
routes. Exceeding these values will lead to a dataplane out-of-memory
condition and a crash. Extensive use of features like ACLs, NAT and
others may reduce the numbers above. Please read the documentation for
details: https://docs.vyos.io/

FileExistsError: [Errno 17] File exists

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/libexec/vyos/services/vyos-configd", line 157, in run_script
    script.apply(c)
  File "/usr/libexec/vyos/conf_mode/vpp.py", line 745, in apply
    control_host.override_driver(
  File "/usr/lib/python3/dist-packages/vyos/vpp/control_host.py", line 140, in override_driver
    Path('/sys/module/vfio_pci/drivers/pci:vfio-pci/new_id').write_text(
  File "/usr/lib/python3.11/pathlib.py", line 1079, in write_text
    with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
FileExistsError: [Errno 17] File exists

[[vpp]] failed
Commit failed
[edit]
vyos@VyOS-for-Smoke-Tests# lspci -nnk | grep -iA3 Ethernet
00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20]
        Kernel driver in use: ena
        Kernel modules: ena
00:06.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20]
        Kernel modules: ena
[edit]
vyos@VyOS-for-Smoke-Tests# run sh in
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address       MAC                VRF        MTU  S/L    Description
-----------  ---------------  -----------------  -------  -----  -----  -------------
eth0         172.16.11.55/24  06:50:78:09:b8:59  default   1500  u/u    WAN
lo           127.0.0.1/8      00:00:00:00:00:00  default  65536  u/u
             ::1/128
[edit]
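After this second failure, `lspci` reports no kernel driver in use for 00:06.0 and eth1 is missing from `show interfaces`. The traceback shows `override_driver` writing to `/sys/module/vfio_pci/drivers/pci:vfio-pci/new_id`; the Linux kernel rejects a `new_id` write with EEXIST when the vendor/device ID already matches the driver, and Python surfaces that as `FileExistsError`. Presumably the ID (`1d0f ec20` for the ENA adapters above) was left registered by the first, failed commit. The sketch below makes that write idempotent. It is an illustration only: `register_vfio_id` is a hypothetical helper, and the exact string VyOS writes is not visible in the traceback, so the minimal `"vendor device"` form the kernel accepts is assumed.

```python
from pathlib import Path

# Path taken from the traceback above.
VFIO_NEW_ID = Path("/sys/module/vfio_pci/drivers/pci:vfio-pci/new_id")


def register_vfio_id(vendor: str, device: str, new_id: Path = VFIO_NEW_ID) -> bool:
    """Add a PCI vendor/device ID to vfio-pci, tolerating a repeated write.

    Returns True if the ID was added now, False if the kernel reported it as
    already present (EEXIST, raised by Python as FileExistsError).
    """
    try:
        new_id.write_text(f"{vendor} {device}")
    except FileExistsError:
        # A previous (possibly failed) commit already registered this ID;
        # treat it as success instead of aborting the whole commit.
        return False
    return True
```

Tolerating EEXIST only removes the second traceback; the failure path would still need to rebind 00:06.0 to `ena` for eth1 to reappear.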

Details

Version
1.5-rolling-202512061506
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)