
FRR config not loaded after daemons segfault or restart
Closed, Resolved | Public | BUG

Description

Reproducing

set interfaces loopback lo address 10.1.1.1/32
set protocols ospf area 0 network 192.168.0.0/24
set protocols ospf default-information originate always
set protocols ospf default-information originate metric 10
set protocols ospf default-information originate metric-type 2
set protocols ospf log-adjacency-changes
set protocols ospf parameters router-id 10.1.1.1
set protocols ospf redistribute connected metric-type 2
set protocols ospf redistribute connected route-map CONNECT

set policy route-map CONNECT rule 10 action permit
set policy route-map CONNECT rule 10 match interface lo

If a daemon segfaults or is killed, watchfrr restarts the process, but without the actual configuration:

vyos@DHCP-Relay# sudo killall ospfd
[edit]
vyos@DHCP-Relay# ps ax | grep ospf
  876 ?        Ss     0:00 /usr/lib/frr/watchfrr -d zebra bgpd ripd ripngd ospfd ospf6d staticd
  942 ?        Ss     0:00 /usr/lib/frr/ospf6d -d --daemon -A ::1 -M snmp
 2481 ?        Ss     0:00 /usr/lib/frr/ospfd -d --daemon -A 127.0.0.1 -M snmp
 2486 ttyS0    S+     0:00 grep ospf
vyos@DHCP-Relay# vtysh -d ospfd -c "show run"
Building configuration...

Current configuration:
!
frr version 7.0.1-20190820-04-g047efd6
frr defaults traditional
hostname DHCP-Relay
log syslog informational
service integrated-vtysh-config
!
line vty
!
end
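The symptom above can be checked from a shell. This is a minimal sketch, assuming `vtysh` is on the PATH and that a configured `ospfd` always shows a `router ospf` stanza in its running config. The function names `has_ospf_stanza` and `check_ospfd_config` are hypothetical, and the check looks for that one stanza only, not for the full configuration.

```shell
# Hypothetical helper: succeeds (exit 0) if the "show run" text read on stdin
# contains a top-level "router ospf" stanza (with or without a VRF suffix),
# fails otherwise. "router ospf6" deliberately does not match.
has_ospf_stanza() {
    grep -qE '^router ospf( |$)'
}

# Hypothetical wrapper: query only ospfd (vtysh -d ospfd, as in the report)
# and report whether its running config still contains the OSPF stanza.
check_ospfd_config() {
    if vtysh -d ospfd -c "show run" | has_ospf_stanza; then
        echo "ospfd: OSPF configuration present"
    else
        echo "ospfd: no 'router ospf' stanza in running config (config lost?)"
        return 1
    fi
}
```

With the bug present, running `check_ospfd_config` before and after `sudo killall ospfd` should succeed the first time and report the missing stanza the second time.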

Details

Difficulty level
Unknown (require assessment)
Version
1.2.3
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

Unknown Object (User) created this task. Dec 20 2019, 10:09 PM

This is a known fault, and it is not easily fixable in the current implementation. It happens because the VyOS CLI configures the FRR processes manually after they have started, whereas a process that dies and is restarted reads its configuration from the saved config file. The restarted process therefore comes up with an empty configuration, as we have no way to save the configuration of the prior process.

As far as I know, this is meant to be fixed by a CLI rewrite.

We've recently seen this on the bleeding edge of 1.3 (yesterday's build). I'm currently investigating what tripped ospf6d, but I suspect it's going to be some Ubiquiti routers spewing their nasty OSPFv3 implementation.

Mar 24 21:23:31 coudreau ospf6d[1171]: SPF: Scheduled in 0 msec
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF processing: # Areas: 1, SPF runtime: 0 sec 47 usec, Reason: R+, L+
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF: Scheduled in 50 msec
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF processing: # Areas: 1, SPF runtime: 0 sec 40 usec, Reason: N+
...snip...
Mar 24 21:23:34 coudreau watchfrr[1107]: [EC 268435457] ospf6d state -> down : read returned EOF
Mar 24 21:23:34 coudreau zebra[1142]: [EC 4043309121] Client 'ospf6' encountered an error and is shutting down.
Mar 24 21:23:34 coudreau zebra[1142]: client 51 disconnected. 7 ospf6 routes removed from the rib
Mar 24 21:23:39 coudreau watchfrr[1107]: [EC 100663303] Forked background command [pid 3764]: /usr/lib/frr/watchfrr.sh restart ospf6d
Mar 24 21:23:39 coudreau zebra[1142]: client 51 says hello and bids fair to announce only ospf6 routes vrf=0
Mar 24 21:23:40 coudreau watchfrr[1107]: ospf6d state -> up : connect succeeded
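To find which daemons watchfrr restarted, and which are therefore likely running without their configuration, the `state -> down` messages in a log like the one above can be filtered. A minimal sketch, assuming the message format seen in this excerpt (FRR 7.x era watchfrr) and that the log is available as plain text; `restarted_daemons` is a hypothetical name.

```shell
# Hypothetical helper: read syslog text on stdin and print each daemon that
# watchfrr reported as "state -> down", one per line, deduplicated and sorted.
# Assumes the "watchfrr[PID]: [optional EC tag] <daemon> state -> down" format.
restarted_daemons() {
    sed -n 's/.*watchfrr\[[0-9]*\]:.* \([a-z0-9][a-z0-9]*\) state -> down.*/\1/p' | sort -u
}
```

For example, `journalctl -t watchfrr | restarted_daemons` would list the daemons whose configuration should be checked, assuming watchfrr logs to the journal under that identifier.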

I am showing my naïveté about how VyOS' internals work now: would it not be possible to configure FRR's daemons to use a configuration file in tmpfs, and have VyOS issue a "write mem" at the end of each interaction with FRR? That way FRR would have a persistent configuration in the event of a segfault or subprocess crash.
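The proposal could look roughly like the following. This is a minimal sketch of the idea only, not how VyOS actually talks to FRR: `apply_and_persist` is a hypothetical helper, and it assumes FRR's configuration directory has already been placed on tmpfs (not shown) so that `write memory` never outlives a reboot. The `VTYSH` variable is also an assumption, added only so the real `vtysh` can be swapped for a stub.

```shell
# Hypothetical helper: run each argument as a vtysh command in configure mode,
# then persist the result with "write memory" so that a daemon restarted by
# watchfrr reads back the same configuration from its config file.
# VTYSH defaults to the real vtysh but can be overridden (e.g. for testing).
apply_and_persist() {
    # Rewrite the positional parameters in place: "cmd1" "cmd2" becomes
    # -c "cmd1" -c "cmd2", so vtysh runs them in order in a single session.
    for cmd do
        set -- "$@" -c "$cmd"
        shift
    done
    "${VTYSH:-vtysh}" -c "configure terminal" "$@" || return 1
    "${VTYSH:-vtysh}" -c "write memory"
}
```

Usage: `apply_and_persist "router ospf" "ospf router-id 10.1.1.1"`. Writing after every change keeps the saved file in step with the running daemons, and because the file would live on tmpfs, nothing stale survives a reboot.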

A router reboot last week reminded me never to run write mem in vtysh (though, looking back, I had done it out of habit :( ).
The router booted with the configuration in FRR already loaded, then VyOS tried to populate FRR based on the VyOS configuration, and everything was broken :-)
It didn't help that the configuration I saved in FRR was a couple of months old.

I'm not expecting a persisted-across-reboots FRR config — hence suggesting tmpfs — so when the system boots there is nothing there. Obviously something would need to create the (empty) FRR config files in tmpfs before running FRR, otherwise I expect all the FRR daemons will fail to start.
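Creating the empty files before FRR starts could be a small boot-time step. A minimal sketch, assuming the target directory is already a tmpfs mount and that the FRR daemons have been pointed at it, neither of which is shown here. With `service integrated-vtysh-config` (present in the output earlier in this task), FRR may use a single `frr.conf` instead, so the per-daemon layout is itself an assumption; `init_frr_configs` and the path in the usage line are hypothetical.

```shell
# Hypothetical boot-time step: create an empty <daemon>.conf in DIR for each
# daemon named on the command line, so every daemon finds a config file.
# Assumes DIR is a tmpfs mount (e.g. "mount -t tmpfs tmpfs DIR", run as root
# beforehand) and that FRR is configured to read its files from DIR.
init_frr_configs() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for daemon do
        conf="$dir/$daemon.conf"
        # Create the file only if it is missing; never truncate existing config.
        [ -e "$conf" ] || : > "$conf" || return 1
        # In a real setup the file should also be owned by FRR's service user
        # (commonly frr:frr), which needs root; only the mode is set here.
        chmod 0640 "$conf" || return 1
    done
}
```

Usage (hypothetical directory, daemon list taken from the watchfrr command line in the report): `init_frr_configs /run/frr-tmpfs zebra bgpd ripd ripngd ospfd ospf6d staticd`.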

@maznu @Merijn Can you test the latest rolling 1.4 release?
It should be fixed.
You can kill the ripd/ripngd/ospfd/bgpd/isisd daemons, or let watchfrr restart them:

/usr/lib/frr/watchfrr.sh restart bgpd
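Whether the fix works can be checked by comparing a daemon's running config before and after such a restart. A minimal sketch, assuming it runs as root on the router; `normalize_run`, `config_survives_restart`, `WATCHFRR_SH` and `RESTART_WAIT` are hypothetical names, and the default 5-second wait is a guess, not a value taken from FRR or VyOS.

```shell
# Hypothetical helper: strip the banner and version lines from "show run"
# output so that two snapshots of the same configuration compare equal.
normalize_run() {
    sed -e '/^Building configuration/d' -e '/^Current configuration:/d' -e '/^frr version/d'
}

# Hypothetical check: snapshot DAEMON's running config, restart it through
# watchfrr.sh, wait, snapshot again and compare. Returns 0 if unchanged.
# WATCHFRR_SH and RESTART_WAIT can be overridden; the defaults are assumptions.
config_survives_restart() {
    daemon=$1
    before=$(vtysh -d "$daemon" -c "show run" | normalize_run)
    "${WATCHFRR_SH:-/usr/lib/frr/watchfrr.sh}" restart "$daemon" || return 1
    sleep "${RESTART_WAIT:-5}"
    after=$(vtysh -d "$daemon" -c "show run" | normalize_run)
    [ "$before" = "$after" ]
}
```

Usage: `config_survives_restart bgpd && echo "bgpd kept its configuration"`.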
Viacheslav changed the task status from Open to Needs testing. Apr 5 2021, 1:25 PM
erkin set Issue type to Bug (incorrect behavior). Aug 31 2021, 6:03 PM