Page MenuHomeVyOS Platform

VRRP transition script will be executed once only
Closed, ResolvedPublicBUG

Description

Hi.

Sorry for bad English.

If used not all VRRP-transition scripts (master, backup,fault), or if used only one script (eg. master) - it will be executed only one time, for one VRRP-transition.
This is because in keepalived.conf is formed notify property with "vrrp-script-wrapper.py" for each situation (master, backup, fault).
But "vrrp-script-wrapper.py" save state for each VRRP-group and do not execute script if state no changed.

old_state = vyos.keepalived.get_old_state(args.group)
if (old_state is None) or (old_state != args.state):

<exec script>

else:
    syslog.syslog(syslog.LOG_NOTICE, "State of the group {0} has not changed, not running transition script".format(args.group))

If i have only master script - it will be executed once, for one transition to MASTER, because state is None.
After this new state "master" is saved and script execution code is unaccesible for next-time Master transition.
Nothing else "vrrp-script-wrapper.py" change this saved state.

vyos.keepalived.save_state(args.group, args.state)

I dont understand, why do you need a condition here.
Keepalived will execute each notify-script only if state changed.

Details

Difficulty level
Easy (less than an hour)
Version
1.2.1
Why the issue appeared?
Design mistake
Is it a breaking change?
Behavior change
Issue type
Improvement (missing useful functionality)

Event Timeline

lbv2rus updated the task description. (Show Details)
zsdc changed the task status from Open to Confirmed.Jun 5 2019, 1:26 PM
dmbaturin added a subscriber: dmbaturin.

I'm not sure if I can think of a situation when a master script can be used without a backup/fault script, but I suppose it's a fair point.
We likely want to save the state on every transition.

The reason this whole thing is needed is that keepalived doesn't remember its own state, and if it's in the master state and gets restarted, it starts in the backup state and re-establishes itself as the master. This causes scripts to run when there's no need to, and since people often use scripts to change the config, this can be very disruptive.

dmbaturin changed Why the issue appeared? from Will be filled on close to Design mistake.

As I see, from current VyOS scripts, keepalived restart only at router startup or if all VRRP groups were deleted. In case of configuration change we use reload, which is correct.
This means that we get nothing from the keeping state in case of the restart - there is no sense to keep states of deleted groups, and we have nothing to keep at first startup.

Until recently time, keepalived had problems with notifying during reload too - sometimes it could run scripts, even if a state was not really changed. This bug was effectively mitigated by saving state in our scripts. But we have tracked this issue and fixed it in the upstream. So, after we update keepalived version, there will be no reason to keep states additionally to the keepalived.

Possible state tree will be:
During startup:

  • unknown/initial state (no notify event generated):
    • FAULT (notify_fault);
    • BACKUP (notify_backup):
      • stay in BACKUP (no notify event generated);
      • MASTER (notify_master).

During reload:

  • previous state:
    • no changes (no notify event generated);
    • FAULT (notify_fault);
    • BACKUP (notify_backup):
      • stay in BACKUP (no notify event generated);
      • MASTER (notify_master).

When the previous state is the same as the new state after reload, keepalived will not generate any notify events.

Therefore, states switching logic must be processed inside the scripts itself, keeping states inside the wrapper will be useless. But, there is nothing very hard - the only thing, which actually has a sense even now, is that users must always keep in mind that VRRP starts from the unknown state.

By the way, also it is possible to add STOP script, which can be triggered in case of keepalived stop. This can be useful for switching configuration to some "neutral" state, which allows users to not check the current configuration in scripts before applying changes via notify_ backup/master/fault during the first start.

syncer triaged this task as Normal priority.Aug 31 2019, 12:47 AM
syncer edited projects, added VyOS 1.2 Crux (VyOS 1.2.4); removed VyOS 1.2 Crux.
syncer raised the priority of this task from Normal to High.Nov 16 2019, 11:50 PM
zsdc changed the task status from Confirmed to Needs testing.Apr 2 2020, 11:51 AM

In the current 1.3 branch the original issue was resolved and added STOP script support. It is necessary to test this and review the possibility to backport the solution into 1.2.

In light of the mode-force in 1.2.x, we definitely can't make this the new default within an LTS branch.

dmbaturin set Is it a breaking change? to Behavior change.
jestabro moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board.

This is resolved for 1.3, and is not a backport candidate for 1.2, as indicated above.

dmbaturin set Issue type to Improvement (missing useful functionality).Sep 29 2021, 1:40 PM