
WATCHFRR: crashlog and per-thread log buffering unavailable (due to files left behind in /var/tmp/frr/ after reboot)
Closed, Resolved | Public | BUG

Description

On booting 1.3.0 (epa3), I saw something at boot time that I had never seen before:

vyos@gw# sudo journalctl|grep -i frr
Nov 13 13:58:30 gw vyos-router[745]: 2021/11/13 13:58:30 WATCHFRR: failed to mkdir "/var/tmp/frr/watchfrr.889": File exists
Nov 13 13:58:30 gw vyos-router[745]: 2021/11/13 13:58:30 WATCHFRR: crashlog and per-thread log buffering unavailable!
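The failure mode in the log can be reproduced in isolation. The following is a minimal sketch, assuming the daemon creates its per-process directory with a plain mkdir and treats an already existing directory as an error, which is what the "File exists" message suggests. The function name `make_crashlog_dir` is hypothetical and is not part of FRR; the demo uses a temporary directory rather than /var/tmp/frr.

```python
import os
import tempfile


def make_crashlog_dir(base: str, daemon: str, pid: int) -> bool:
    """Try to create <base>/<daemon>.<pid>; return False if it already exists.

    Hypothetical helper for illustration only, not FRR's actual code.
    """
    path = os.path.join(base, f"{daemon}.{pid}")
    try:
        # Plain mkdir: fails with EEXIST ("File exists") if the path is present.
        os.mkdir(path, 0o700)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as base:
    # First "boot": the directory does not exist yet, so creation succeeds.
    first = make_crashlog_dir(base, "watchfrr", 889)
    # Later "boot" with the same PID and the old directory still present.
    second = make_crashlog_dir(base, "watchfrr", 889)
    print(first, second)
```

The second call fails only because the directory from the first call was never removed, which mirrors a leftover watchfrr.889 directory surviving a reboot.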

Upon checking /var/tmp/frr, there were indeed directories left behind by old watchfrr processes:

vyos@gw:/$ sudo ls -la /var/tmp/frr
total 88
drwx------ 22 frr  frr  4096 Nov 19 15:07 .
drwxrwxrwt  1 root root 4096 Nov 21 00:00 ..
drwx------  2 frr  frr  4096 Nov 19 15:07 bfdd.982
drwx------  2 frr  frr  4096 Nov 19 15:07 bgpd.946
drwx------  2 frr  frr  4096 Nov 19 15:07 isisd.967
drwx------  2 frr  frr  4096 Nov 19 15:07 ldpd.971
drwx------  2 frr  frr  4096 Nov 19 15:07 ldpd.972
drwx------  2 frr  frr  4096 Nov 19 15:07 ldpd.973
drwx------  2 frr  frr  4096 Nov 19 15:07 ospf6d.965
drwx------  2 frr  frr  4096 Nov 19 15:07 ospfd.961
drwx------  2 frr  frr  4096 Nov 19 15:07 ripd.956
drwx------  2 frr  frr  4096 Nov 19 15:07 ripngd.958
drwx------  2 frr  frr  4096 Nov 19 15:07 staticd.977
drwx------  2 root root 4096 Nov 13 13:58 watchfrr.907
drwx------  2 root root 4096 Nov 13 10:08 watchfrr.21842
drwx------  2 root root 4096 Nov 13 9:24 watchfrr.889
drwx------  2 root root 4096 Nov 19 15:07 watchfrr.900
drwx------  2 root root 4096 Nov 14 13:30 watchfrr.905
drwx------  2 root root 4096 Nov 14 13:20 watchfrr.906
drwx------  2 root root 4096 Nov 15 09:04 watchfrr.909
drwx------  2 root root 4096 Nov 19 13:53 watchfrr.914
drwx------  2 frr  frr  4096 Nov 19 15:07 zebra.937

However improbable it seems, the PID of the new watchfrr process (889) clashed with that of an earlier invocation whose directory had been left behind. What are the odds...
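One way to spot leftovers like these is to compare the PID suffix of each directory with the processes that are currently running. The sketch below is only an illustration under stated assumptions: directory names follow the `<daemon>.<pid>` pattern seen in the listing, and a PID counts as alive if `/proc/<pid>` exists (Linux only). The names `find_stale_dirs` and `pid_is_alive` are hypothetical. The demo runs against a temporary directory with a fake liveness check, not against /var/tmp/frr.

```python
import os
import re
import tempfile

# Matches names such as "watchfrr.889" or "zebra.937" (pattern assumed from the listing).
NAME_RE = re.compile(r"^(?P<daemon>[A-Za-z0-9_-]+)\.(?P<pid>\d+)$")


def pid_is_alive(pid: int) -> bool:
    """Linux-only check: a process with this PID currently exists."""
    return os.path.exists(f"/proc/{pid}")


def find_stale_dirs(base: str, is_alive=pid_is_alive) -> list:
    """Return <daemon>.<pid> directories under base whose PID is not running."""
    stale = []
    for entry in sorted(os.listdir(base)):
        match = NAME_RE.match(entry)
        if match is None:
            continue  # not a per-process directory name
        if not os.path.isdir(os.path.join(base, entry)):
            continue
        if not is_alive(int(match.group("pid"))):
            stale.append(entry)
    return stale


with tempfile.TemporaryDirectory() as base:
    for name in ("watchfrr.889", "watchfrr.907", "zebra.937"):
        os.mkdir(os.path.join(base, name))
    # Fake liveness check for the demo: pretend only PID 907 is running.
    stale = find_stale_dirs(base, is_alive=lambda pid: pid == 907)
    print(stale)
```

Note that a live PID may belong to an unrelated process after a reboot, so this check can miss stale directories; it only reports the ones that are certainly orphaned.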

FRR version:

vyos@gw:/$ show version frr
FRRouting 7.5.1-20211107-00-ga122222f5 (gw).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--enable-exampledir=/usr/share/doc/frr/examples/' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-systemd=yes' '--enable-rpki' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

Might be related to https://github.com/FRRouting/frr/issues/6541?

Details

Difficulty level
Unknown (require assessment)
Version
1.3.0-epa3
Why the issue appeared?
Issues in third-party code
Is it a breaking change?
Perfectly compatible
Issue type
Cosmetic issue (typos etc.)

Event Timeline

marc_s renamed this task from Files left behind in /var/tmp/frr/ after reboot to WATCHFRR: crashlog and per-thread log buffering unavailable (due to files left behind in /var/tmp/frr/ after reboot). Nov 21 2021, 10:19 AM
marc_s updated the task description.
c-po triaged this task as Low priority.
c-po added a project: VyOS 1.4 Sagitta.
c-po changed Why the issue appeared? from Will be filled on close to Issues in third-party code.
c-po changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.
c-po changed Issue type from Bug (incorrect behavior) to Cosmetic issue (typos etc.).

@c-po I see that you've marked this as resolved, but I don't see any comments (maybe I'm doing something wrong). I'd appreciate some feedback.
Is it resolved in the sense that it will be fixed as soon as the third-party code is fixed? Or have you made changes to the code to mitigate it?