VRRP goes into the fault state, apparently because the keepalived configuration generated by VyOS conflicts with itself.
Everything works fine until a vrrp sync-group is added to the configuration.
It looks like VyOS erroneously attaches the health-check script to both the "vrrp" instance and the vrrp_sync_group (see the keepalived configuration below).
VRRP and conntrack configuration:
set high-availability vrrp group vrrp address 169.254.0.254/24
set high-availability vrrp group vrrp health-check failure-count '1'
set high-availability vrrp group vrrp health-check interval '1'
set high-availability vrrp group vrrp health-check script '/config/scripts/bgp-check.sh'
set high-availability vrrp group vrrp interface 'gnv0'
set high-availability vrrp group vrrp preempt-delay '30'
set high-availability vrrp group vrrp priority '200'
set high-availability vrrp group vrrp track interface 'eth1'
set high-availability vrrp sync-group vrrp member 'vrrp'
set high-availability vrrp sync-group vrrp transition-script backup '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script fault '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script master '/config/scripts/vrrp-states.sh MASTER'
set high-availability vrrp sync-group vrrp transition-script stop '/config/scripts/vrrp-states.sh BACKUP'
set service conntrack-sync failover-mechanism vrrp sync-group 'vrrp'
set service conntrack-sync interface gnv0
/run/keepalived/keepalived.conf
# Autogenerated by VyOS
# Do not edit this file, all your changes will be lost
# on next commit or reboot
# Global definitions configuration block
global_defs {
    dynamic_interfaces
    script_user root
    notify_fifo /run/keepalived/keepalived_notify_fifo
    notify_fifo_script /usr/libexec/vyos/system/keepalived-fifo.py
}
vrrp_script healthcheck_vrrp {
    script "/config/scripts/bgp-check.sh"
    interval 1
    fall 1
    rise 1
}
vrrp_instance vrrp {
    state BACKUP
    interface gnv0
    virtual_router_id 1
    priority 200
    advert_int 1
    preempt_delay 30
    mcast_src_ip 169.254.0.1
    virtual_ipaddress {
        169.254.0.254/24
    }
    track_interface {
        eth1
    }
    track_script {
        healthcheck_vrrp
    }
}
vrrp_sync_group vrrp {
    group {
        vrrp
    }
    track_script {
        healthcheck_vrrp
    }
    notify_master "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh master vrrp"
    notify_backup "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh backup vrrp"
    notify_fault "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh fault vrrp"
}
show vrrp log:
Feb 06 14:15:29 Keepalived_vrrp[2326]: (vrrp) track_script healthcheck_vrrp is configured on VRRP instance and sync group. Remove vrrp instance config
The issue seems to be caused by this part of the keepalived.conf.j2 template: https://github.com/vyos/vyos-1x/blob/da465d26b524fb26e0e9085e80a3ccaa6435eaa9/data/templates/high-availability/keepalived.conf.j2#L131
The template logic should probably be adjusted so that track_script is configured in only one place: either on the individual VRRP instance or in the sync group, but not both. The keepalived log message above asks for the instance-level entry to be removed.
Something like this might fix it with no further edits needed, but I am not entirely sure:
{% if group_config.health_check is vyos_defined and group_config.health_check.script is not vyos_defined %}
track_script {
healthcheck_{{ name }}
}
{% endif %}
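To check whether a template change actually removes the duplicate, a generated keepalived.conf can be scanned for scripts tracked by both a sync group and one of its member instances. Below is a minimal sketch, not part of VyOS; the function names `parse_tracking` and `duplicated_track_scripts` are made up for illustration, and the parser only understands the simple one-item-per-line block layout shown in the config above.

```python
import re

# Matches the opening line of the two block types we care about.
HEADER = re.compile(r"^(vrrp_instance|vrrp_sync_group)\s+(\S+)\s*\{$")

# Trimmed-down copy of the generated config from this report.
SAMPLE = """
vrrp_instance vrrp {
    interface gnv0
    track_script {
        healthcheck_vrrp
    }
}
vrrp_sync_group vrrp {
    group {
        vrrp
    }
    track_script {
        healthcheck_vrrp
    }
}
"""

def parse_tracking(conf_text):
    """Collect track_script entries per instance and per sync group."""
    instances = {}  # instance name -> set of tracked script names
    groups = {}     # sync group name -> {"members": set, "scripts": set}
    kind = name = section = None
    depth = 0
    for raw in conf_text.splitlines():
        # Drop comments (keepalived accepts both '#' and '!').
        line = re.split(r"[#!]", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        if line == "}":
            depth = max(depth - 1, 0)
            if depth <= 1:
                section = None
            if depth == 0:
                kind = name = None
            continue
        if line.endswith("{"):
            m = HEADER.match(line)
            if depth == 0 and m:
                kind, name = m.groups()
                if kind == "vrrp_instance":
                    instances[name] = set()
                else:
                    groups[name] = {"members": set(), "scripts": set()}
            elif depth == 1 and kind:
                section = line[:-1].strip()
            depth += 1
            continue
        if depth == 2 and kind:
            token = line.split()[0]
            if section == "track_script":
                if kind == "vrrp_instance":
                    instances[name].add(token)
                else:
                    groups[name]["scripts"].add(token)
            elif section == "group" and kind == "vrrp_sync_group":
                groups[name]["members"].add(token)
    return instances, groups

def duplicated_track_scripts(conf_text):
    """Return (sync_group, instance, script) tuples tracked in both places."""
    instances, groups = parse_tracking(conf_text)
    conflicts = []
    for gname, group in sorted(groups.items()):
        for member in sorted(group["members"]):
            shared = group["scripts"] & instances.get(member, set())
            for script in sorted(shared):
                conflicts.append((gname, member, script))
    return conflicts

if __name__ == "__main__":
    for gname, member, script in duplicated_track_scripts(SAMPLE):
        print(f"{script}: tracked by sync group {gname} and instance {member}")
```

On a router, the contents of /run/keepalived/keepalived.conf could be read and passed to `duplicated_track_scripts` in place of `SAMPLE`; an empty result would mean the conflict keepalived complains about is gone.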