While developing https://github.com/vyos/vyos-1x/pull/2179, I found that vyos.utils.process.call can hang when starting an FRR daemon.
Steps to reproduce this issue:
# ensure igmp protocol is disabled
delete protocols igmp
commit
# enable igmp protocol, which will cause pimd to start
set protocols igmp interface eth0
commit    # hangs here
It hangs at pipe = p.communicate(input, timeout) (https://github.com/vyos/vyos-1x/blob/fc35434bfb0def50e5e492030451e035c80d153d/python/vyos/utils/process.py#L82) when call(pimd_cmd) is invoked at (https://github.com/vyos/vyos-1x/blob/fc35434bfb0def50e5e492030451e035c80d153d/src/conf_mode/protocols_igmp.py#L122).
At the same time, ps -ef | grep pimd showed that pimd was <defunct>:
frr 68972 68968 0 06:52 pts/1 00:00:00 [pimd] <defunct>
frr 68973     1 0 06:52 ?     00:00:00 /usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1
Looking at the issue more closely, I found it is similar to https://stackoverflow.com/questions/50646412/subprocess-becomes-defunct-communicate-hangs. Changing stdout and stderr from PIPE to None solves it:
call(f'/usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1', stdout=None, stderr=None)
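The same hang can be reproduced without FRR. The sketch below assumes the daemon double-forks and keeps the stdout/stderr descriptors it inherited; DAEMON_SRC is a hypothetical stand-in written for illustration, not FRR's code, and run() is a made-up helper. With PIPE, communicate() waits for EOF that only arrives when the daemon exits, so it times out. With None, no pipe exists and communicate() returns as soon as the direct child exits.

```python
import subprocess
import sys

# Hypothetical stand-in for a daemon that double-forks but never closes or
# redirects the stdout/stderr it inherited (assumed behaviour, not FRR code).
DAEMON_SRC = r"""
import os, sys, time
if os.fork() > 0:
    sys.exit(0)      # the process Popen started exits right away
os.setsid()
if os.fork() > 0:
    os._exit(0)      # intermediate child exits
time.sleep(3)        # the daemon still holds the inherited fds 1 and 2
"""


def run(stdout, stderr, timeout=1.0):
    """Start the fake daemon; return its exit code, or "timeout"."""
    p = subprocess.Popen([sys.executable, "-c", DAEMON_SRC],
                         stdout=stdout, stderr=stderr)
    try:
        p.communicate(timeout=timeout)
        return p.returncode
    except subprocess.TimeoutExpired:
        # The direct child has already exited (it is the <defunct> entry);
        # reap it and drop our read ends. The daemon itself keeps running.
        p.kill()
        p.wait()
        for f in (p.stdout, p.stderr):
            if f is not None:
                f.close()
        return "timeout"


if __name__ == "__main__":
    print("PIPE:", run(subprocess.PIPE, subprocess.PIPE))  # PIPE: timeout
    print("None:", run(None, None))                        # None: 0
```

The timeout is only there so the demo terminates; without it, communicate() in the PIPE case would block for as long as the daemon runs, which matches what happens with pimd.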
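It seems to me that when call(f'/usr/lib/frr/pimd -d -F traditional --daemon -A 127.0.0.1') runs with stdout and stderr set to PIPE (the default value), the write ends of the pipes are not closed after the process daemonizes itself with the double-fork technique. Since communicate() reads until it sees EOF on both pipes, it keeps waiting as long as the daemonized pimd holds them open. Meanwhile the <defunct> [pimd] entry, presumably the direct child of the Python process, has already exited but is not reaped until communicate() returns. This is likely an issue in how FRR launches a daemon process, but we can mitigate it in vyos-1x by changing the default values of stdout and stderr from PIPE to None. A None value causes the child process created by Popen to inherit stdout and stderr from its parent process, so there is no pipe for the daemon to keep open.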
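A minimal sketch of the proposed default, assuming a simplified call()-like helper. call_inherit is a hypothetical name and does not reproduce the real signature of vyos.utils.process.call; it only shows stdout and stderr defaulting to None. Python's own subprocess.call already uses None for both by default.

```python
import shlex
import subprocess
from typing import Optional


def call_inherit(command: str, timeout: Optional[float] = None,
                 stdout=None, stderr=None) -> int:
    """Hypothetical call()-like helper.

    stdout/stderr default to None, so the child inherits the caller's
    descriptors and no pipe is created that a daemonized grandchild
    could keep open. Callers that really need the output can still
    pass subprocess.PIPE explicitly.
    """
    p = subprocess.Popen(shlex.split(command), stdout=stdout, stderr=stderr)
    p.communicate(timeout=timeout)  # with no pipes, this just waits
    return p.returncode
```

Passing subprocess.DEVNULL instead of None is another option when the output should be discarded rather than inherited; either way the daemon is not left holding a pipe that the caller is reading.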