
Function `call` sometimes hangs
Closed, Resolved | Public | BUG

Description

While developing https://github.com/vyos/vyos-1x/pull/2179, I found that vyos.utils.process.call could hang when starting an FRR daemon.

Steps to reproduce this issue:

# ensure igmp protocol is disabled
delete protocols igmp
commit
# enable igmp protocol, which will cause pimd to start
set protocols igmp interface eth0
commit  # hangs here

I found that it hung at pipe = p.communicate(input, timeout) (https://github.com/vyos/vyos-1x/blob/fc35434bfb0def50e5e492030451e035c80d153d/python/vyos/utils/process.py#L82) when call(pimd_cmd) was called at (https://github.com/vyos/vyos-1x/blob/fc35434bfb0def50e5e492030451e035c80d153d/src/conf_mode/protocols_igmp.py#L122).

At the same time, ps -ef | grep pimd showed that pimd was <defunct>:

frr        68972   68968  0 06:52 pts/1    00:00:00 [pimd] <defunct>
frr        68973       1  0 06:52 ?        00:00:00 /usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1

Looking at the issue more closely, I found it was similar to https://stackoverflow.com/questions/50646412/subprocess-becomes-defunct-communicate-hangs. Changing stdout and stderr from PIPE to None solved the issue:

call(f'/usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1', stdout=None, stderr=None)

It seems to me that when call(f'/usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1') is invoked with stdout and stderr set to PIPE (the default value), the write end of the pipe is not closed after the process daemonizes with the double-fork technique. The daemon keeps that write end open, so communicate() never sees EOF on the pipe and blocks, while the original child has already exited and shows up as <defunct>. This is likely an issue in how FRR launches a daemon process, but we can mitigate it in vyos-1x by changing the default values of stdout and stderr from PIPE to None. A value of None causes the child process created by Popen to inherit stdout and stderr from its parent process.
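The mechanism described above can be reproduced without FRR. The following is only a minimal sketch under stated assumptions: it is POSIX-only (it relies on os.fork), the DAEMON_SRC script and the run_and_time helper are hypothetical stand-ins made up for illustration, and a single fork with a 2-second sleep replaces pimd's real daemonization. It does not use vyos.utils.process.call, only the standard subprocess module.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a daemonizing program: the original process exits
# right away, while the forked process lives on for 2 seconds and keeps the
# stdout/stderr file descriptors it inherited.
DAEMON_SRC = """
import os, time
if os.fork() == 0:
    time.sleep(2)  # background process still holds the inherited fds
os._exit(0)
"""

def run_and_time(stdout, stderr):
    """Start the stand-in daemon and return how long communicate() blocks."""
    start = time.monotonic()
    p = subprocess.Popen([sys.executable, "-c", DAEMON_SRC],
                         stdout=stdout, stderr=stderr)
    # With PIPE, communicate() reads until EOF, which arrives only after
    # every holder of the pipe's write end has closed it.
    # With None, there is nothing to read, so it only waits for the child.
    p.communicate()
    return time.monotonic() - start

piped = run_and_time(subprocess.PIPE, subprocess.PIPE)
inherited = run_and_time(None, None)

print(f"stdout/stderr=PIPE: communicate() returned after {piped:.2f}s")
print(f"stdout/stderr=None: communicate() returned after {inherited:.2f}s")
```

In this sketch the PIPE case blocks for roughly as long as the background process lives. A real daemon such as pimd never exits, so the same call would block indefinitely, which matches the hang seen at commit.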

Details

Version
1.4
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)