Smoketests for DHCP Server can falsely fail
Closed, ResolvedPublicBUG
Actions

Assigned To

Authored By

	chris.blackburn
	Jul 31 2025, 6:53 PM

Description

Part of the Smoketest code for the DHCP server tests whether the DHCP server is running successfully. This code is called in a few different places inside the tests to make sure that a particular configuration has not caused the server to crash. Specifically, the code that I'm referring to is copied down below

def verify_service_running(self):
    tmp = cmd('tail -n 100 /var/log/messages | grep kea')
    self.assertTrue(process_named_running(PROCESS_NAME), msg=f'Service not running, log: {tmp}')

This code performs two actions. Gets the last 100 lines of the system log, filters it for the Keyword Kea and then checks if the named process is running and uses the aforementioned log file to help provide the user context to fault find the issue if the assertion fails. Unfortunately, the cmd function that the code uses to get the log file throws an exception if the command called returns an error code other than 0. This means that if there are 0 relevant log lines after filtering then an uncaught exception will be thrown and the test will fail, even though there's not actually anything wrong. When this happens, you see an output similar to the below

======================================================================
ERROR: test_dhcp_single_pool_range (__main__.TestServiceDHCPServer.test_dhcp_single_pool_range)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/libexec/vyos/tests/smoke/cli/test_service_dhcp-server.py", line 188, in test_dhcp_single_pool_range
    self.verify_service_running()
  File "/usr/libexec/vyos/tests/smoke/cli/test_service_dhcp-server.py", line 102, in verify_service_running
    tmp = cmd('tail -n 100 /var/log/messages | grep kea')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/vyos/utils/process.py", line 189, in cmd
    raise OSError(code, feedback)
PermissionError: [Errno 1] failed to run command: None tail -n 100 /var/log/messages | grep kea
returned: 
exit code: 1

I can think of a couple of things that can be done to resolve this and improve the behaviour:

Change the command being called. I assume that the desired behaviour is to get the 100 most recent log messages associated with Kea. Because the "tail" command is the first in the chain, it gets the last 100 log lines and *then* filters it using grep. This will produce a variable number of log lines which is probably not what's wanted. Therefore, we should change the command to grep Kea /var/log/messages | tail -n 100. This will search all of the system log file and get the last 100 messages that are Kea related. NB: This new method may make the number 100 inappropriate. I'm going to leave it as-is for now to minimise the change but this might need to be changed in the future.
Add exception handling code around the call to cmd. As this is only getting extra context information, even if it fails, we should not cause the tests to stop. We should handle this failure and set the log message accordingly.

An example of what I think the code should be set to is:

def verify_service_running(self):
    try:
        tmp = cmd('grep Kea /var/log/messages | tail -n 100')
    except OSError:
        tmp = "No relevant log entries"
    self.assertTrue(process_named_running(PROCESS_NAME), msg=f'Service not running, log: {tmp}')