VRRP enters the fault state, apparently because the keepalived configuration generated by VyOS conflicts with itself.
Everything works fine until a vrrp sync-group is added to the configuration.
It looks like VyOS erroneously applies the health-check script to both the "vrrp" instance and the vrrp_sync_group (see the keepalived configuration below).
VRRP and conntrack configuration:
set high-availability vrrp group vrrp address 169.254.0.254/24
set high-availability vrrp group vrrp health-check failure-count '1'
set high-availability vrrp group vrrp health-check interval '1'
set high-availability vrrp group vrrp health-check script '/config/scripts/bgp-check.sh'
set high-availability vrrp group vrrp interface 'gnv0'
set high-availability vrrp group vrrp preempt-delay '30'
set high-availability vrrp group vrrp priority '200'
set high-availability vrrp group vrrp track interface 'eth1'
set high-availability vrrp sync-group vrrp member 'vrrp'
set high-availability vrrp sync-group vrrp transition-script backup '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script fault '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script master '/config/scripts/vrrp-states.sh MASTER'
set high-availability vrrp sync-group vrrp transition-script stop '/config/scripts/vrrp-states.sh BACKUP'
set service conntrack-sync failover-mechanism vrrp sync-group 'vrrp'
set service conntrack-sync interface gnv0
/run/keepalived/keepalived.conf
# Autogenerated by VyOS
# Do not edit this file, all your changes will be lost
# on next commit or reboot

# Global definitions configuration block
global_defs {
    dynamic_interfaces
    script_user root
    notify_fifo /run/keepalived/keepalived_notify_fifo
    notify_fifo_script /usr/libexec/vyos/system/keepalived-fifo.py
}

vrrp_script healthcheck_vrrp {
    script "/config/scripts/bgp-check.sh"
    interval 1
    fall 1
    rise 1
}

vrrp_instance vrrp {
    state BACKUP
    interface gnv0
    virtual_router_id 1
    priority 200
    advert_int 1
    preempt_delay 30
    mcast_src_ip 169.254.0.1
    virtual_ipaddress {
        169.254.0.254/24
    }
    track_interface {
        eth1
    }
    track_script {
        healthcheck_vrrp
    }
}

vrrp_sync_group vrrp {
    group {
        vrrp
    }
    track_script {
        healthcheck_vrrp
    }
    notify_master "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh master vrrp"
    notify_backup "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh backup vrrp"
    notify_fault "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh fault vrrp"
}
show vrrp log:
Feb 06 14:15:29 Keepalived_vrrp[2326]: (vrrp) track_script healthcheck_vrrp is configured on VRRP instance and sync group. Remove vrrp instance config
It seems the issue might be caused by this keepalived.conf.j2 template: https://github.com/vyos/vyos-1x/blob/da465d26b524fb26e0e9085e80a3ccaa6435eaa9/data/templates/high-availability/keepalived.conf.j2#L131
The template logic should probably be adjusted so that track_script is configured in only one place: either at the individual VRRP instance level or within the sync group, but not both.
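To make the "one place only" rule concrete, here is a small Python sketch of the decision the template would need to make. It is only an illustration, not the VyOS implementation: the function name and the dictionary shapes (loosely mirroring the CLI tree) are assumptions, and putting the script on the sync group side for members is a guess based on the keepalived log message asking to remove the instance-level config.

```python
def plan_track_scripts(groups, sync_groups):
    """Decide where each health-check script should be tracked.

    groups:      {group_name: {"health_check": {"script": path}}}   (assumed shape)
    sync_groups: {sync_group_name: {"member": [group_name, ...]}}   (assumed shape)

    Returns (in_instance, in_sync_group):
      in_instance   maps group name -> script name tracked on the instance
      in_sync_group maps sync-group name -> list of script names tracked there
    """
    in_instance = {}
    in_sync_group = {sg: [] for sg in sync_groups}
    for name, cfg in groups.items():
        # Groups without a health-check script need no track_script at all.
        if not cfg.get("health_check", {}).get("script"):
            continue
        script = "healthcheck_" + name
        owners = [
            sg for sg, sg_cfg in sync_groups.items()
            if name in sg_cfg.get("member", [])
        ]
        if owners:
            # Member of a sync group: track only at the sync-group level.
            for sg in owners:
                in_sync_group[sg].append(script)
        else:
            # Standalone group: track at the instance level.
            in_instance[name] = script
    return in_instance, in_sync_group


if __name__ == "__main__":
    groups = {"vrrp": {"health_check": {"script": "/config/scripts/bgp-check.sh"}}}
    sync_groups = {"vrrp": {"member": ["vrrp"]}}
    print(plan_track_scripts(groups, sync_groups))
```

With the configuration from this report, the script would end up only on the sync group, so keepalived would no longer see it in both places.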
The following template change might fix it with no further edits needed, but I am not entirely sure:
{% if group_config.health_check is vyos_defined and group_config.health_check.script is not vyos_defined %}
    track_script {
        healthcheck_{{ name }}
    }
{% endif %}