
VRRP health-check script is not applied correctly in keepalived.conf
Closed, Resolved | Public | BUG

Description

VRRP enters the fault state, apparently because the keepalived configuration generated by VyOS contains conflicting settings.

Everything works fine until a vrrp sync-group is added to the configuration.

It seems that VyOS erroneously applies the health-check script to both the "vrrp" instance and the vrrp_sync_group (see the keepalived configuration below).

VRRP and conntrack configuration:

set high-availability vrrp group vrrp address 169.254.0.254/24
set high-availability vrrp group vrrp health-check failure-count '1'
set high-availability vrrp group vrrp health-check interval '1'
set high-availability vrrp group vrrp health-check script '/config/scripts/bgp-check.sh'
set high-availability vrrp group vrrp interface 'gnv0'
set high-availability vrrp group vrrp preempt-delay '30'
set high-availability vrrp group vrrp priority '200'
set high-availability vrrp group vrrp track interface 'eth1'
set high-availability vrrp sync-group vrrp member 'vrrp'
set high-availability vrrp sync-group vrrp transition-script backup '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script fault '/config/scripts/vrrp-states.sh BACKUP'
set high-availability vrrp sync-group vrrp transition-script master '/config/scripts/vrrp-states.sh MASTER'
set high-availability vrrp sync-group vrrp transition-script stop '/config/scripts/vrrp-states.sh BACKUP'

set service conntrack-sync failover-mechanism vrrp sync-group 'vrrp'
set service conntrack-sync interface gnv0

/run/keepalived/keepalived.conf

# Autogenerated by VyOS
# Do not edit this file, all your changes will be lost
# on next commit or reboot

# Global definitions configuration block
global_defs {
    dynamic_interfaces
    script_user root
    notify_fifo /run/keepalived/keepalived_notify_fifo
    notify_fifo_script /usr/libexec/vyos/system/keepalived-fifo.py
}

vrrp_script healthcheck_vrrp {
    script "/config/scripts/bgp-check.sh"
    interval 1
    fall 1
    rise 1
}
vrrp_instance vrrp {
    state BACKUP
    interface gnv0
    virtual_router_id 1
    priority 200
    advert_int 1
    preempt_delay 30
    mcast_src_ip 169.254.0.1
    virtual_ipaddress {
        169.254.0.254/24
    }
    track_interface {
        eth1
    }
    track_script {
        healthcheck_vrrp
    }
}

vrrp_sync_group vrrp {
    group {
        vrrp
    }

    track_script {
        healthcheck_vrrp
    }
    notify_master "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh master vrrp"
    notify_backup "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh backup vrrp"
    notify_fault "/usr/libexec/vyos/vyos-vrrp-conntracksync.sh fault vrrp"
}

show vrrp log:

Feb 06 14:15:29 Keepalived_vrrp[2326]: (vrrp) track_script healthcheck_vrrp is configured on VRRP instance and sync group. Remove vrrp instance config
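The conflict reported in this log line can also be spotted directly in the generated file. The following is a minimal sketch, not part of VyOS or keepalived: the function names `parse_keepalived` and `find_duplicate_track_scripts` are hypothetical, and the parser only understands the simple brace layout shown in the configuration above. It reports every script that is tracked both by a sync group and by one of that group's member instances.

```python
# Hypothetical helper, not part of VyOS: finds track_script entries that are
# configured on both a vrrp_instance and the vrrp_sync_group it belongs to.

SAMPLE = """
vrrp_instance vrrp {
    interface gnv0
    track_interface {
        eth1
    }
    track_script {
        healthcheck_vrrp
    }
}

vrrp_sync_group vrrp {
    group {
        vrrp
    }

    track_script {
        healthcheck_vrrp
    }
}
"""


def parse_keepalived(conf_text):
    """Parse top-level blocks and their one-level-deep sub-blocks."""
    blocks = []
    current = None
    section = None
    depth = 0
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumes no '#' in values)
        if not line:
            continue
        if line.endswith("{"):
            words = line[:-1].split()
            if depth == 0:
                name = words[1] if len(words) > 1 else ""
                current = {"kind": words[0], "name": name, "sections": {}}
                blocks.append(current)
            elif depth == 1:
                section = words[0]
                current["sections"].setdefault(section, [])
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 1:
                section = None
            elif depth == 0:
                current = None
        elif depth == 2 and section is not None:
            # An entry inside a sub-block such as group {} or track_script {}
            current["sections"][section].append(line.split()[0])
    return blocks


def find_duplicate_track_scripts(conf_text):
    """Return (sync_group, instance, script) tuples for doubly tracked scripts."""
    blocks = parse_keepalived(conf_text)
    instance_scripts = {
        b["name"]: set(b["sections"].get("track_script", []))
        for b in blocks
        if b["kind"] == "vrrp_instance"
    }
    conflicts = []
    for b in blocks:
        if b["kind"] != "vrrp_sync_group":
            continue
        group_scripts = set(b["sections"].get("track_script", []))
        for member in b["sections"].get("group", []):
            shared = group_scripts & instance_scripts.get(member, set())
            for script in sorted(shared):
                conflicts.append((b["name"], member, script))
    return conflicts


if __name__ == "__main__":
    for sync_group, instance, script in find_duplicate_track_scripts(SAMPLE):
        print(f"{script}: tracked by sync group '{sync_group}' and instance '{instance}'")
```

Running it against the real file would mean passing the contents of /run/keepalived/keepalived.conf instead of `SAMPLE`.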

The issue seems to be caused by the keepalived.conf.j2 template: https://github.com/vyos/vyos-1x/blob/da465d26b524fb26e0e9085e80a3ccaa6435eaa9/data/templates/high-availability/keepalived.conf.j2#L131

The template logic should probably be adjusted so that track_script is configured in only one place: either at the individual VRRP instance level or within the sync group, but not both.

The following change might fix it with no further edits needed, but I am not entirely sure:

{% if group_config.health_check is vyos_defined and group_config.health_check.script is not vyos_defined %}
    track_script {
        healthcheck_{{ name }}
    }
{% endif %}
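The "one place only" rule can be stated independently of the template syntax. The sketch below is an illustration under assumptions, not the fix that was merged: the function `track_script_placement` and the dictionary layout are hypothetical, loosely mirroring the `set high-availability vrrp` tree. It assumes the script should move to the sync group whenever the VRRP group is a member of one, which matches the keepalived log message asking to remove the instance-level entry.

```python
# Hypothetical sketch: decide where each health-check script should be
# referenced so that it never appears on both an instance and its sync group.

def track_script_placement(groups, sync_groups):
    """Map each health-check script to exactly one track_script location."""
    # Record which sync group (if any) each VRRP group belongs to.
    member_of = {}
    for sync_name, sync_config in sync_groups.items():
        for member in sync_config.get("member", []):
            member_of[member] = sync_name

    placement = {"instance": {}, "sync_group": {}}
    for name, config in groups.items():
        if "script" not in config.get("health_check", {}):
            continue  # no health-check script configured for this group
        script = f"healthcheck_{name}"
        if name in member_of:
            # Assumption: sync group members are tracked at sync group level.
            placement["sync_group"].setdefault(member_of[name], []).append(script)
        else:
            placement["instance"].setdefault(name, []).append(script)
    return placement


if __name__ == "__main__":
    groups = {"vrrp": {"health_check": {"script": "/config/scripts/bgp-check.sh"}}}
    sync_groups = {"vrrp": {"member": ["vrrp"]}}
    print(track_script_placement(groups, sync_groups))
```

With the configuration from this report, the script is assigned to the sync group only, and the instance-level track_script block would be omitted.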

Details

Version: 1.5
Is it a breaking change?: Perfectly compatible
Issue type: Bug (incorrect behavior)

Event Timeline

faekz0r updated the task description.
HollyGurza changed the task status from Open to In progress. Feb 12 2024, 11:31 AM
HollyGurza claimed this task.
dmbaturin renamed this task from "vrrp health-check script not applied correctly in keepalived.conf" to "VRRP health-check script is not applied correctly in keepalived.conf". Mar 12 2024, 4:22 PM
dmbaturin changed the task status from Unknown Status to Resolved.