
bgpd crashed, when BGP EVPN peers are changed/rebooted
In progress, Normal, Public, BUG

Description

bgpd crashes when BGP EVPN peers are changed or rebooted.
Sometimes FRR restarts bgpd automatically, sometimes it does not.

I will attach the VyOS setup in a comment.

I backported the following patch, and everything seems to work so far:
https://github.com/FRRouting/frr/commit/8b087b2a4392a2fdc4645f9c31bb33402f53d3e7
The file I uploaded is the backported version.

FRR log

Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  Mac Hash Entry                :      9 *         16
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  Mac Hash Entry Intf String    :     16 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP instance                  :      2 *      12456
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP Name data                 :     26 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP listen socket details     :      6 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer                      :     13 *      24072
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer connection           :     13 *        344
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer hostname             :     21 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  Peer group                    :      4 *         64
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP Peer group hostname       :      4 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  Peer description              :      1 *         33
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer af                   :      6 *         80
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(+0x8aeec) [0x7fea48149eec]
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP update group              :      1 *        104
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP update subgroup           :      1 *        240
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP packet                    :      1 *         56
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP attribute                 :    369 *        288
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP aspath                    :     16 *         40
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP aspath seg                :     15 *         24
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP aspath segment data       :     15 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP aspath str                :     16 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP table                     :    337 *         56
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP node                      :    359 *        128
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP route                     :    293 *        144
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP ancillary route info      :    288 *         72
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP extra info for EVPN       :    279 *         32
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP extra info for vrf leaking:    106 *         80
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP connected                 :      7 *          4
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP static                    :      1 *        144
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP synchronise               :      1 *         48
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP adj in                    :    290 *         56
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP adj out                   :    112 *         96
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  extcommunity                  :    178 *         32
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  extcommunity val              :    178 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  extcommunity str              :    136 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  community-list handler        :      1 *        120
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP node clear queue          :     50 *          8
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP nexthop                   :      8 *        232
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer update interface     :      6 *          5
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x12) [0x7fea480fafb2]
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP own address               :      7 *         64
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP own tunnel-ip address     :      2 *          8
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP redistribution            :      2 *         24
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP LABELS                    :     24 *         24
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP EVPN Information          :     20 *        152
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP EVPN MH Information       :      1 *         56
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP EVPN Import RT            :     20 *         16
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP EVPN VRF Import RT        :      2 *         16
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP EVPN Overlay              :      2 *         48
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP Martian Addr Intf String  :      7 * (variably sized)
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP PBR Context               :      2 *         32
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP interface context         :     16 *          4
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: showing active allocations in memory group rfapi
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  NVE Configuration             :      1 *       3152
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  RFAPI Generic                 :      1 *        296
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  RFAPI Import Table            :      1 *        208
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7fea480e5472]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7fea4851bda9]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_attr_unintern+0xae) [0x5638c49d72fe]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_adj_in_remove+0x16) [0x5638c4b10c96]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(+0x170cc4) [0x5638c4a4ccc4]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_clear_route+0xff) [0x5638c4a51edf]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_clear_route_all+0x30) [0x5638c4a52020]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_fsm_change_status+0x258) [0x5638c4a1c168]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_event_update+0x1ee) [0x5638c4a1dcce]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x81) [0x7fea484fc8a1]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7fea484a50a0]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(main+0x3d6) [0x5638c49d29e6]
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7fea480e624a]
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fea480e6305]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(_start+0x21) [0x5638c49d4991]
Nov 19 15:15:26 vyos-2 BGP[18660]: in thread bgp_packet_process_error scheduled from ../bgpd/bgp_io.c:255 bgp_process_reads()
Nov 19 15:15:26 vyos-2 zebra[18565]: [EC 4043309121] Client 'bgp' (session id 1) encountered an error and is shutting down.
Nov 19 15:15:26 vyos-2 zebra[18565]: [EC 4043309121] Client 'vnc' (session id 0) encountered an error and is shutting down.
Nov 19 15:15:26 vyos-2 watchfrr[18543]: [HD38Q-0HBRT][EC 268435457] bgpd state -> down : read returned EOF
Nov 19 15:15:26 vyos-2 zebra[18565]: [EC 4043309121] Client 'bgp' (session id 0) encountered an error and is shutting down.
Nov 19 15:15:26 vyos-2 zebra[18565]: ../zebra/zebra_ptm.c:1285 failed to find process pid registration
Nov 19 15:15:26 vyos-2 zebra[18565]: client 111 disconnected 0 bgp routes removed from the rib
Nov 19 15:15:26 vyos-2 zebra[18565]: client 111 disconnected 0 bgp nhgs removed from the rib
Nov 19 15:15:26 vyos-2 zebra[18565]: client 32 disconnected 0 vnc routes removed from the rib
Nov 19 15:15:26 vyos-2 zebra[18565]: client 32 disconnected 0 vnc nhgs removed from the rib
Nov 19 15:15:26 vyos-2 zebra[18565]: client 29 disconnected 5 bgp routes removed from the rib
Nov 19 15:15:26 vyos-2 zebra[18565]: client 29 disconnected 0 bgp nhgs removed from the rib
Nov 19 15:15:31 vyos-2 watchfrr[18543]: [YFT0P-5Q5YX] Forked background command [pid 19606]: /usr/lib/frr/watchfrr.sh restart bgpd
Nov 19 15:15:31 vyos-2 frrinit.sh[19606]: Cannot stop bgpd: pid 18660 not running
Nov 19 15:15:31 vyos-2 zebra[18565]: client 33 says hello and bids fair to announce only vnc routes vrf=0
Nov 19 15:15:31 vyos-2 zebra[18565]: client 30 says hello and bids fair to announce only bgp routes vrf=0
Nov 19 15:15:31 vyos-2 bgpd[19615]: [EC 33554466] 169.254.0.2 [FSM] Failure handling event BGP_Start in state Idle, prior events (null), (null), fd -1, last reset: No AFI/SAFI activated for peer
Nov 19 15:15:31 vyos-2 bgpd[19615]: [EC 33554466] 169.254.0.6 [FSM] Failure handling event BGP_Start in state Idle, prior events (null), (null), fd -1, last reset: No AFI/SAFI activated for peer
Nov 19 15:15:31 vyos-2 bgpd[19615]: [EC 33554466] 169.254.254.0 [FSM] Failure handling event BGP_Start in state Idle, prior events (null), (null), fd -1, last reset: No AFI/SAFI activated for peer
Nov 19 15:15:31 vyos-2 bgpd[19615]: [EC 33554466] 169.254.254.1 [FSM] Failure handling event BGP_Start in state Idle, prior events (null), (null), fd -1, last reset: No AFI/SAFI activated for peer
Nov 19 15:15:31 vyos-2 watchfrr[18543]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Nov 19 15:15:32 vyos-2 zebra[18565]: client 112 says hello and bids fair to announce only bgp routes vrf=0
Nov 19 15:15:32 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:32 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:32 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:32 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:37 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:38 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Nov 19 15:15:42 vyos-2 bgpd[19615]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
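
In the log above, the stack frames logged under the BGP[18660] tag are interleaved with the memstats lines from frrinit.sh, which makes the crash path hard to read. The following is a minimal sketch, not part of the original report, that pulls out only the frame lines; the regular expression and the function name extract_backtrace are assumptions based on the line layout seen in this log, not an FRR tool. On this log it makes the sequence visible: abort, _zlog_assert_failed, bgp_attr_unintern, bgp_adj_in_remove, bgp_clear_route.

```python
import re

# Assumed layout of a frame line, taken from the log in this report:
#   "<timestamp> <host> BGP[<pid>]: <binary>(<symbol+offset>) [<address>]"
FRAME_RE = re.compile(
    r"\sBGP\[(?P<pid>\d+)\]:\s+(?P<binary>\S+)\((?P<symbol>[^)]*)\)\s+\[(?P<addr>0x[0-9a-f]+)\]"
)


def extract_backtrace(log_text):
    """Return (binary, symbol, address) tuples for every stack-frame line.

    Lines from other tags (frrinit.sh memstats, zebra, watchfrr) and the
    trailing "in thread ..." line do not match and are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("binary"), match.group("symbol"), match.group("addr"))
            )
    return frames


# A few lines copied from the log above, including the interleaved memstats output.
SAMPLE = """\
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP peer af                   :      6 *         80
Nov 19 15:15:26 vyos-2 BGP[18660]: /lib/x86_64-linux-gnu/libc.so.6(+0x8aeec) [0x7fea48149eec]
Nov 19 15:15:26 vyos-2 frrinit.sh[18660]: core_handler: memstats:  BGP update group              :      1 *        104
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7fea4851bda9]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_attr_unintern+0xae) [0x5638c49d72fe]
Nov 19 15:15:26 vyos-2 BGP[18660]: /usr/lib/frr/bgpd(bgp_adj_in_remove+0x16) [0x5638c4b10c96]
Nov 19 15:15:26 vyos-2 BGP[18660]: in thread bgp_packet_process_error scheduled from ../bgpd/bgp_io.c:255 bgp_process_reads()
"""

if __name__ == "__main__":
    for binary, symbol, addr in extract_backtrace(SAMPLE):
        print(f"{addr}  {binary}  {symbol}")
```

To run it against a full log, read the file (for example a saved journal export) and pass its contents to extract_backtrace instead of SAMPLE.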

Details

Version
2025.11.19-0020-rolling
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

vyos-bq-1
wg0 connects to vyos-1 in GCE
wg1 connects to vyos-2 in GCE

interfaces {
    bridge br0 {
        enable-vlan
        mac xx:xx:xx:xx:xx:66
        member {
            interface vxlan0 {
                allowed-vlan 2-20
                native-vlan 1
            }
        }
        vif 2 {
            address xxx.xxx.0.1/24
            vrf RED
        }
    }
    bridge br10000 {
        enable-vlan
        ipv6 {
            address {
                no-default-link-local
            }
        }
        member {
            interface vxlan10000 {
                allowed-vlan 2-4094
                native-vlan 1
            }
        }
        vif 2 {
            ipv6 {
                address {
                    no-default-link-local
                }
            }
            vrf RED
        }
    }
    dummy dum0 {
        address xxx.xxx.254.0/32
    }
    ethernet eth0 {
        address dhcp
    }
    loopback lo {
    }
    vxlan vxlan0 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.0
        source-interface dum0
        vlan-to-vni 1-20 {
            vni 20001-20020
        }
    }
    vxlan vxlan10000 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.0
        source-interface dum0
        vlan-to-vni 2 {
            vni 10000
        }
    }
    wireguard wg0 {
        address xxx.xxx.0.0/31
        ip {
            adjust-mss 1380
        }
        peer vyos-1 {
            address xxx.xxx.227.63
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            port 5566
            public-key ****************
        }
        per-client-thread
        private-key xxxxxx
    }
    wireguard wg1 {
        address xxx.xxx.0.2/31
        ip {
            adjust-mss 1380
        }
        peer vyos-2 {
            address xxx.xxx.232.232
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            port 5566
            public-key ****************
        }
        per-client-thread
        private-key xxxxxx
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network xxx.xxx.254.0/32 {
                }
                redistribute {
                    connected {
                    }
                }
            }
            l2vpn-evpn {
                advertise-all-vni
                advertise-default-gw
                advertise-svi-ip
                vni 20001 {
                    rd 65000:20001
                }
                vni 20002 {
                    rd 65000:20002
                }
                vni 20003 {
                    rd 65000:20003
                }
                vni 20004 {
                    rd 65000:20004
                }
                vni 20005 {
                    rd 65000:20005
                }
                vni 20006 {
                    rd 65000:20006
                }
                vni 20007 {
                    rd 65000:20007
                }
                vni 20008 {
                    rd 65000:20008
                }
                vni 20009 {
                    rd 65000:20009
                }
                vni 20010 {
                    rd 65000:20010
                }
                vni 20011 {
                    rd 65000:20011
                }
                vni 20012 {
                    rd 65000:20012
                }
                vni 20013 {
                    rd 65000:20013
                }
                vni 20014 {
                    rd 65000:20014
                }
                vni 20015 {
                    rd 65000:20015
                }
                vni 20016 {
                    rd 65000:20016
                }
                vni 20017 {
                    rd 65000:20017
                }
                vni 20018 {
                    rd 65000:20018
                }
                vni 20019 {
                    rd 65000:20019
                }
                vni 20020 {
                    rd 65000:20020
                }
            }
        }
        neighbor xxx.xxx.0.1 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.0.3 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.2 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.3 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id xxx.xxx.254.0
        }
        peer-group L3CLOS {
            address-family {
                ipv4-unicast {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            description "to different region cluster vyos"
            ebgp-multihop 64
            remote-as external
        }
        peer-group L3CLOS-EVPN {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source dum0
        }
        system-as 65000
        timers {
            holdtime 3
            keepalive 1
        }
    }
}
service {
    ntp {
        allow-client {
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/16
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/12
            address xxx.xxx.0.0/16
            address ::1/128
            address fe80::/10
            address fc00::/7
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name xxxxxx
    login {
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
                plaintext-password xxxxxx
            }
        }
    }
    name-server eth0
}
vrf {
    bind-to-all
    name RED {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        redistribute {
                            connected {
                            }
                        }
                    }
                    l2vpn-evpn {
                        advertise {
                            ipv4 {
                                unicast {
                                }
                            }
                        }
                        rd 65000:10000
                        route-target {
                            export 10000:1
                            import 10000:1
                        }
                    }
                }
                parameters {
                    router-id xxx.xxx.254.0
                }
                peer-group L3RTR {
                    ebgp-multihop 64
                }
                system-as 65000
                timers {
                    holdtime 3
                    keepalive 1
                }
            }
        }
        table 100
        vni 10000
    }
}

vyos-bq-2
wg0 connects to vyos-1
wg1 connects to vyos-2

interfaces {
    bridge br0 {
        enable-vlan
        mac xx:xx:xx:xx:xx:66
        member {
            interface vxlan0 {
                allowed-vlan 2-20
                native-vlan 1
            }
        }
        vif 2 {
            address xxx.xxx.0.1/24
            vrf RED
        }
    }
    bridge br10000 {
        enable-vlan
        ipv6 {
            address {
                no-default-link-local
            }
        }
        member {
            interface vxlan10000 {
                allowed-vlan 2-4094
                native-vlan 1
            }
        }
        vif 2 {
            ipv6 {
                address {
                    no-default-link-local
                }
            }
            vrf RED
        }
    }
    dummy dum0 {
        address xxx.xxx.254.1/32
    }
    ethernet eth0 {
        address dhcp
    }
    loopback lo {
    }
    vxlan vxlan0 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.1
        source-interface dum0
        vlan-to-vni 1-20 {
            vni 20001-20020
        }
    }
    vxlan vxlan10000 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.1
        source-interface dum0
        vlan-to-vni 2 {
            vni 10000
        }
    }
    wireguard wg0 {
        address xxx.xxx.0.4/31
        ip {
            adjust-mss 1380
        }
        peer vyos-1 {
            address xxx.xxx.227.63
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            port 5567
            public-key ****************
        }
        per-client-thread
        private-key xxxxxx
    }
    wireguard wg1 {
        address xxx.xxx.0.6/31
        ip {
            adjust-mss 1380
        }
        peer vyos-2 {
            address xxx.xxx.232.232
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            port 5567
            public-key ****************
        }
        per-client-thread
        private-key xxxxxx
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network xxx.xxx.254.1/32 {
                }
                redistribute {
                    connected {
                    }
                }
            }
            l2vpn-evpn {
                advertise-all-vni
                advertise-default-gw
                advertise-svi-ip
                vni 20001 {
                    rd 65001:20001
                }
                vni 20002 {
                    rd 65001:20002
                }
                vni 20003 {
                    rd 65001:20003
                }
                vni 20004 {
                    rd 65001:20004
                }
                vni 20005 {
                    rd 65001:20005
                }
                vni 20006 {
                    rd 65001:20006
                }
                vni 20007 {
                    rd 65001:20007
                }
                vni 20008 {
                    rd 65001:20008
                }
                vni 20009 {
                    rd 65001:20009
                }
                vni 20010 {
                    rd 65001:20010
                }
                vni 20011 {
                    rd 65001:20011
                }
                vni 20012 {
                    rd 65001:20012
                }
                vni 20013 {
                    rd 65001:20013
                }
                vni 20014 {
                    rd 65001:20014
                }
                vni 20015 {
                    rd 65001:20015
                }
                vni 20016 {
                    rd 65001:20016
                }
                vni 20017 {
                    rd 65001:20017
                }
                vni 20018 {
                    rd 65001:20018
                }
                vni 20019 {
                    rd 65001:20019
                }
                vni 20020 {
                    rd 65001:20020
                }
            }
        }
        neighbor xxx.xxx.0.5 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.0.7 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.2 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.3 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id xxx.xxx.254.1
        }
        peer-group L3CLOS {
            address-family {
                ipv4-unicast {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            description "to different region cluster vyos"
            ebgp-multihop 64
            remote-as external
        }
        peer-group L3CLOS-EVPN {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source dum0
        }
        system-as 65001
        timers {
            holdtime 3
            keepalive 1
        }
    }
}
service {
    ntp {
        allow-client {
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/16
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/12
            address xxx.xxx.0.0/16
            address ::1/128
            address fe80::/10
            address fc00::/7
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name xxxxxx
    login {
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
                plaintext-password xxxxxx
            }
        }
    }
    name-server eth0
}
vrf {
    bind-to-all
    name RED {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        redistribute {
                            connected {
                            }
                        }
                    }
                    l2vpn-evpn {
                        advertise {
                            ipv4 {
                                unicast {
                                }
                            }
                        }
                        rd 65001:10000
                        route-target {
                            export 10000:1
                            import 10000:1
                        }
                    }
                }
                parameters {
                    router-id xxx.xxx.254.1
                }
                peer-group L3RTR {
                    ebgp-multihop 64
                }
                system-as 65001
                timers {
                    holdtime 3
                    keepalive 1
                }
            }
        }
        table 100
        vni 10000
    }
}

vyos-1 in GCE
10.0.0.0/24 is the GKE cluster subnet; I connect vyos-1 to FRR running in GKE for BGP EVPN.

interfaces {
    bridge br0 {
        enable-vlan
        mac xx:xx:xx:xx:xx:01
        member {
            interface vxlan0 {
                allowed-vlan 2-20
                native-vlan 1
            }
        }
        vif 2 {
            address xxx.xxx.2.1/24
            vrf RED
        }
        vif 3 {
            address xxx.xxx.3.1/24
            vrf RED
        }
    }
    bridge br10000 {
        enable-vlan
        ipv6 {
            address {
                no-default-link-local
            }
        }
        member {
            interface vxlan10000 {
                allowed-vlan 2-4094
                native-vlan 1
            }
        }
        vif 2 {
            ipv6 {
                address {
                    no-default-link-local
                }
            }
            vrf RED
        }
    }
    dummy dum0 {
        address xxx.xxx.254.2/32
    }
    ethernet eth0 {
        address dhcp
        offload {
            gro
            gso
            sg
            tso
        }
    }
    ethernet eth1 {
        address dhcp
    }
    loopback lo {
    }
    vxlan vxlan0 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.0.2
        source-interface eth1
        vlan-to-vni 1-20 {
            vni 10001-10020
        }
    }
    vxlan vxlan10000 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.2
        source-interface dum0
        vlan-to-vni 2 {
            vni 10000
        }
    }
    wireguard wg0 {
        address xxx.xxx.0.1/31
        ip {
            adjust-mss 1380
        }
        peer vyos-bq-1 {
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            public-key ****************
        }
        per-client-thread
        port 5566
        private-key xxxxxx
    }
    wireguard wg1 {
        address xxx.xxx.0.5/31
        ip {
            adjust-mss 1380
        }
        peer vyos-bq-2 {
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            public-key ****************
        }
        per-client-thread
        port 5567
        private-key xxxxxx
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network xxx.xxx.254.2/32 {
                }
                redistribute {
                    connected {
                    }
                }
            }
            l2vpn-evpn {
                advertise-all-vni
                advertise-default-gw
                advertise-svi-ip
                vni 10001 {
                    rd 65002:10001
                }
                vni 10002 {
                    rd 65002:10002
                }
                vni 10003 {
                    rd 65002:10003
                }
                vni 10004 {
                    rd 65002:10004
                }
                vni 10005 {
                    rd 65002:10005
                }
                vni 10006 {
                    rd 65002:10006
                }
                vni 10007 {
                    rd 65002:10007
                }
                vni 10008 {
                    rd 65002:10008
                }
                vni 10009 {
                    rd 65002:10009
                }
                vni 10010 {
                    rd 65002:10010
                }
                vni 10011 {
                    rd 65002:10011
                }
                vni 10012 {
                    rd 65002:10012
                }
                vni 10013 {
                    rd 65002:10013
                }
                vni 10014 {
                    rd 65002:10014
                }
                vni 10015 {
                    rd 65002:10015
                }
                vni 10016 {
                    rd 65002:10016
                }
                vni 10017 {
                    rd 65002:10017
                }
                vni 10018 {
                    rd 65002:10018
                }
                vni 10019 {
                    rd 65002:10019
                }
                vni 10020 {
                    rd 65002:10020
                }
            }
        }
        listen {
            range xxx.xxx.0.0/24 {
                peer-group GKE-EVPN-LEAF
            }
        }
        neighbor xxx.xxx.0.0 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.0.4 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.0 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.1 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id xxx.xxx.254.2
        }
        peer-group GKE-EVPN-LEAF {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source eth1
        }
        peer-group L3CLOS {
            address-family {
                ipv4-unicast {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            description "to different region cluster vyos"
            ebgp-multihop 64
            remote-as external
        }
        peer-group L3CLOS-EVPN {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source dum0
        }
        system-as 65002
        timers {
            holdtime 3
            keepalive 1
        }
    }
}
service {
    ntp {
        allow-client xxxxxx
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/16
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/12
            address xxx.xxx.0.0/16
            address ::1/128
            address fe80::/10
            address fc00::/7
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name xxxxxx
    login {
        operator-group default {
            command-policy {
                allow "*"
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
    }
    name-server eth0
    option {
        reboot-on-upgrade-failure 5
    }
    syslog {
        local {
            facility all {
                level info
            }
            facility local7 {
                level debug
            }
        }
    }
}
vrf {
    name RED {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        redistribute {
                            connected {
                            }
                        }
                    }
                    l2vpn-evpn {
                        advertise {
                            ipv4 {
                                unicast {
                                }
                            }
                        }
                        rd 65002:10000
                        route-target {
                            export 10000:1
                            import 10000:1
                        }
                    }
                }
                parameters {
                    router-id xxx.xxx.254.2
                }
                peer-group L3RTR {
                    ebgp-multihop 64
                }
                system-as 65002
                timers {
                    holdtime 3
                    keepalive 1
                }
            }
        }
        table 100
        vni 10000
    }
}

vyos-2 in GCE
10.0.0.0/24 is the GKE cluster subnet; vyos-2 connects to the FRR instance in GKE for BGP EVPN.

interfaces {
    bridge br0 {
        enable-vlan
        mac xx:xx:xx:xx:xx:01
        member {
            interface vxlan0 {
                allowed-vlan 2-20
                native-vlan 1
            }
        }
        vif 2 {
            address xxx.xxx.2.1/24
            vrf RED
        }
        vif 3 {
            address xxx.xxx.3.1/24
            vrf RED
        }
    }
    bridge br10000 {
        enable-vlan
        ipv6 {
            address {
                no-default-link-local
            }
        }
        member {
            interface vxlan10000 {
                allowed-vlan 2-4094
                native-vlan 1
            }
        }
        vif 2 {
            ipv6 {
                address {
                    no-default-link-local
                }
            }
            vrf RED
        }
    }
    dummy dum0 {
        address xxx.xxx.254.3/32
    }
    ethernet eth0 {
        address dhcp
        offload {
            gro
            gso
            sg
            tso
        }
    }
    ethernet eth1 {
        address dhcp
    }
    loopback lo {
    }
    vxlan vxlan0 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.0.3
        source-interface eth1
        vlan-to-vni 1-20 {
            vni 10001-10020
        }
    }
    vxlan vxlan10000 {
        mtu 1450
        parameters {
            external
            nolearning
            vni-filter
        }
        port 4789
        source-address xxx.xxx.254.3
        source-interface dum0
        vlan-to-vni 2 {
            vni 10000
        }
    }
    wireguard wg0 {
        address xxx.xxx.0.3/31
        ip {
            adjust-mss 1380
        }
        peer vyos-bq-1 {
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            public-key ****************
        }
        per-client-thread
        port 5566
        private-key xxxxxx
    }
    wireguard wg1 {
        address xxx.xxx.0.7/31
        ip {
            adjust-mss 1380
        }
        peer vyos-bq-2 {
            allowed-ips xxx.xxx.0.0/0
            persistent-keepalive 15
            public-key ****************
        }
        per-client-thread
        port 5567
        private-key xxxxxx
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network xxx.xxx.254.3/32 {
                }
                redistribute {
                    connected {
                    }
                }
            }
            l2vpn-evpn {
                advertise-all-vni
                advertise-default-gw
                advertise-svi-ip
                vni 10001 {
                    rd 65003:10001
                }
                vni 10002 {
                    rd 65003:10002
                }
                vni 10003 {
                    rd 65003:10003
                }
                vni 10004 {
                    rd 65003:10004
                }
                vni 10005 {
                    rd 65003:10005
                }
                vni 10006 {
                    rd 65003:10006
                }
                vni 10007 {
                    rd 65003:10007
                }
                vni 10008 {
                    rd 65003:10008
                }
                vni 10009 {
                    rd 65003:10009
                }
                vni 10010 {
                    rd 65003:10010
                }
                vni 10011 {
                    rd 65003:10011
                }
                vni 10012 {
                    rd 65003:10012
                }
                vni 10013 {
                    rd 65003:10013
                }
                vni 10014 {
                    rd 65003:10014
                }
                vni 10015 {
                    rd 65003:10015
                }
                vni 10016 {
                    rd 65003:10016
                }
                vni 10017 {
                    rd 65003:10017
                }
                vni 10018 {
                    rd 65003:10018
                }
                vni 10019 {
                    rd 65003:10019
                }
                vni 10020 {
                    rd 65003:10020
                }
            }
        }
        listen {
            range xxx.xxx.0.0/24 {
                peer-group GKE-EVPN-LEAF
            }
        }
        neighbor xxx.xxx.0.2 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.0.6 {
            advertisement-interval 0
            peer-group L3CLOS
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.0 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        neighbor xxx.xxx.254.1 {
            advertisement-interval 0
            peer-group L3CLOS-EVPN
            timers {
                connect 5
            }
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id xxx.xxx.254.3
        }
        peer-group GKE-EVPN-LEAF {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source eth1
        }
        peer-group L3CLOS {
            address-family {
                ipv4-unicast {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            description "to different region cluster vyos"
            ebgp-multihop 64
            remote-as external
        }
        peer-group L3CLOS-EVPN {
            address-family {
                l2vpn-evpn {
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            ebgp-multihop 64
            remote-as external
            update-source dum0
        }
        system-as 65003
        timers {
            holdtime 3
            keepalive 1
        }
    }
}
service {
    ntp {
        allow-client xxxxxx
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/16
            address xxx.xxx.0.0/8
            address xxx.xxx.0.0/12
            address xxx.xxx.0.0/16
            address ::1/128
            address fe80::/10
            address fc00::/7
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name xxxxxx
    login {
        operator-group default {
            command-policy {
                allow "*"
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
            }
        }
    }
    name-server eth0
    option {
        reboot-on-upgrade-failure 5
    }
    syslog {
        local {
            facility all {
                level info
            }
            facility local7 {
                level debug
            }
        }
    }
}
vrf {
    name RED {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        redistribute {
                            connected {
                            }
                        }
                    }
                    l2vpn-evpn {
                        advertise {
                            ipv4 {
                                unicast {
                                }
                            }
                        }
                        rd 65003:10000
                        route-target {
                            export 10000:1
                            import 10000:1
                        }
                    }
                }
                parameters {
                    router-id xxx.xxx.254.3
                }
                peer-group L3RTR {
                    ebgp-multihop 64
                }
                system-as 65002
                timers {
                    holdtime 3
                    keepalive 1
                }
            }
        }
        table 100
        vni 10000
    }
}

Test method:

  1. Log in to vyos-1 or vyos-2.
  2. Run watch -n 1 "vtysh -c 'sh ru'".
  3. Reboot vyos-bq-1 and vyos-bq-2 (rebooting both devices always triggers the crash).
  4. Watch the output on vyos-1 or vyos-2: the FRR config shrinks to a partial config (all BGP-related config is lost).
  5. FRR may restart bgpd by itself and restore the config, or bgpd may never come back.
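
The check in steps 2 and 4 can also be scripted instead of watched by eye. The following is only a rough sketch, not part of the original report: the helper names (has_bgp_config, fetch_running_config, watch_for_loss) are made up for illustration, and it assumes that the FRR running config shows each BGP instance as a line starting with "router bgp" at column 0 and that vtysh is on the PATH.

```python
import re
import subprocess
import time


def has_bgp_config(running_config: str) -> bool:
    """Return True if the text contains at least one 'router bgp' stanza.

    Assumption: BGP instances appear as unindented 'router bgp <asn>' lines.
    """
    return any(
        re.match(r"router bgp\b", line)
        for line in running_config.splitlines()
    )


def fetch_running_config() -> str:
    """Run the same query as step 2, with the command spelled out in full."""
    result = subprocess.run(
        ["vtysh", "-c", "show running-config"],
        capture_output=True,
        text=True,
        check=False,  # vtysh may fail while daemons restart; treat as empty
    )
    return result.stdout


def watch_for_loss(interval_s: float = 1.0) -> None:
    """Poll once per interval and print a line whenever BGP config is missing."""
    while True:
        if not has_bgp_config(fetch_running_config()):
            print(time.strftime("%H:%M:%S"), "no 'router bgp' stanza in running config")
        time.sleep(interval_s)
```

Calling watch_for_loss() on vyos-1 or vyos-2 before rebooting the vyos-bq-* devices would timestamp the moment the BGP stanzas vanish; it does not tell a bgpd crash apart from any other reason for the config being absent.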

@tjjh89017 provided an upstream commit that fixes the issue. It had not yet been backported due to conflicts: https://github.com/FRRouting/frr/pull/19890

I have resubmitted the backport with the conflicts resolved: https://github.com/FRRouting/frr/pull/20078

c-po changed the task status from Open to In progress. Wed, Nov 19, 7:51 PM
c-po claimed this task.
c-po triaged this task as Normal priority.

Recording some discussion from Slack:

I found this issue:
https://github.com/FRRouting/frr/issues/19549
The root cause seems to be related.
I applied the patch, and it appears to work properly.
With the patches applied, I have already rebooted vyos-bq-* 5 times, and vyos-* in GCE still works.