config-sync should be saved on receiving peer, after auto-commit
Open, Wishlist | Public | FEATURE REQUEST

Assigned To

None

Authored By

	manuel81
	Jun 4 2024, 3:22 PM

Description

The router admin should only need to work on the primary node. After a commit on the primary node, the config is synchronized and the secondary node is ready to take over in case of failure.

But what if the secondary node loses power in the meantime? Or it is rebooted by another admin for some reason? Any newly synchronised config will be lost, and there is no real high availability.

So some kind of autosave action will be needed, maybe when the save command is issued on the primary node? It should not be necessary to manually save the config on the standby node.

Details

Difficulty level: Normal (likely a few hours)
Version: 1.5-rolling-202406020021
Why the issue appeared?: Will be filled on close
Is it a breaking change?: Behavior change
Issue type: Improvement (missing useful functionality)

Event Timeline

manuel81 created this task. Jun 4 2024, 3:22 PM

config-sync is not HA.
I don't think config-sync should save, reboot, or do anything else.
Imagine that a config-sync change cuts off your access to the secondary node, and that config-sync has already saved it.

That is completely wrong behavior. However, we need to think about how to integrate this feature correctly.

This can be handled just like "how others do it": if the peer is lost after a sync, the peer automatically returns to the previous config.

This "autorollback" option could also be configurable: enabled or disabled, with a timeout that controls when the autorollback (if enabled) kicks in. If there are secondary dependencies, you often want a long enough timeout before the autorollback kicks in.

Once the peer is reestablished and has successfully saved the new current config, the timeout no longer applies (for that config version).
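To make the proposal above concrete, here is a minimal sketch of the timer logic in Python. Nothing like this exists in config-sync today; the class name `AutoRollbackTimer`, its methods, and the `enabled`/`timeout_s` parameters are all hypothetical, and the actual rollback action is left to the caller.

```python
import time
from typing import Callable, Optional


class AutoRollbackTimer:
    """Hypothetical guard for a node that has just committed a synced config."""

    def __init__(self, timeout_s: float, enabled: bool = True,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.enabled = enabled
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._deadline: Optional[float] = None

    def on_sync_committed(self) -> None:
        """Arm the timer after a synced config was committed (if the option is enabled)."""
        if self.enabled:
            self._deadline = self._clock() + self.timeout_s

    def on_peer_confirmed_and_saved(self) -> None:
        """Disarm: the peer is reachable again and the new config has been saved."""
        self._deadline = None

    def should_rollback(self) -> bool:
        """True once the timeout has expired without a confirmation."""
        return self._deadline is not None and self._clock() >= self._deadline
```

A caller would poll `should_rollback()` and, when it returns true, load the previous config revision from the config history.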

What is otherwise the purpose of config-sync if not to sync the config to the peer so the configs are in sync (hence the name)?

pasik subscribed. Jun 5 2024, 12:29 PM

In T6445#191034, @Apachez wrote:

This can be handled just like "how others do it": if the peer is lost after a sync, the peer automatically returns to the previous config.

It is impossible/does not work in the current implementation.
The "main" node sends a new config (via API) for the specific sections if these sections were changed on the "main" router.
The secondary router does not know who changed this config, a user via API or "config-sync"; it just commits/replaces the required config sections.
If the main router loses connectivity to the secondary router, it is impossible to send the next API request ("return previous config") to the secondary router.
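As an illustration of the "only changed sections are sent" behaviour described above, here is a small Python sketch. It is not the actual config-sync code; the function name `changed_sections` and the dict-based config representation are assumptions made for the example.

```python
from typing import Any, Dict, List


def changed_sections(old: Dict[str, Any], new: Dict[str, Any],
                     synced_sections: List[str]) -> List[str]:
    """Return the synced sections whose content differs between two configs.

    `old` and `new` map a top-level section name to its (nested) content.
    Sections that are not listed in `synced_sections` are ignored.
    """
    return [name for name in synced_sections if old.get(name) != new.get(name)]
```

Only the sections returned here would be pushed to the secondary router, which then commits them without knowing whether a user or config-sync made the request.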

Yes, but this is what the peer would do on its own: if it loses connectivity to the opposite device, it can roll back to the previous config, using the config history that is enabled by default.

Preferably this could be a system option (or a config-sync option) that controls whether rollback on a lost peer is supported.

In T6445#191083, @Apachez wrote:

Yes, but this is what the peer would do on its own: if it loses connectivity to the opposite device, it can roll back to the previous config, using the config history that is enabled by default.

Why do I need it? I want to update the "main" node and reboot the main router; why would the secondary node roll back to the previous config?
I'm not going to implement it. It's the worst thing.

Because not all netadmins sit physically close to the devices they manage.

Nowadays it's not uncommon to be far away, so an autorollback feature for the case when you cut off the branch you sit on would be handy to restore management of the device.

For example, Arista (and many others) does this automatically when a new config is pushed from CVP (the management server). If the device cannot reconnect to the management server within a certain time after a new commit has been pushed, it automatically tries to roll back to the previous config and checks whether communication with the management server is restored.

Ok, auto-rollback might be a topic for another thread? Don't you think? OpenWRT does it, for example, and it has been useful to me in a few situations.

Back to the original topic: Save after config-sync.

I came up with this feature because in my company we have some deployments where it is necessary to keep two interface-configs in sync manually (mostly Cisco and Huawei switches). There have been some real-life situations where the netadmin just forgot to apply and save the new config on the standby interface. Sooner or later, it just happens.

Many (or most) deployments will have two (almost) identical routers right next to each other. Also, it is best practice to use out-of-band management, which greatly reduces the risk of losing connectivity.

One idea: how about adding another switch to config-sync?

set service config-sync auto-save <true|false>
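A rough Python sketch of how the receiving peer could honour such a switch follows. The `auto-save` option is only the proposal from this task, not an existing VyOS setting, and `handle_synced_config` with its `commit`/`save` callables is a hypothetical stand-in for whatever the secondary node actually runs.

```python
from typing import Callable


def handle_synced_config(commit: Callable[[], None],
                         save: Callable[[], None],
                         auto_save: bool) -> bool:
    """Commit a synced config and save it only if auto-save is enabled.

    Returns True if the config was saved. If `commit` raises, `save` is
    never called, so a failed commit cannot end up in the boot config.
    """
    commit()
    if auto_save:
        save()
        return True
    return False
```

With `auto_save=False` this matches today's behaviour as described in this task: the synced config is committed but lost if the standby node reboots before someone saves it.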

On the other hand, I work mainly with Palo Alto firewalls that do "real" HA. Of course, the passive peer stores the config on PAN-OS after receiving it from the active. Because you don't want to do it manually.

By the way ... the PAN-OS HA is not that different from the VyOS HA. Both have a sending and a receiving peer, session synchronization, link and path monitoring, and so on. VyOS handles some features even more smoothly than PAN-OS (e.g. DHCP-HA).

How about having a command on the primary router to trigger a save on the secondary router?

The standard way of doing manual changes is already:

conf
set/delete/rename [...]
commit/commit-confirm
save

Why not have a command like save-all that saves the config on both routers?
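One way such a command could work is to send a save request to every router over the HTTP API. The Python sketch below only builds the requests; it assumes the HTTP API is enabled on both routers and relies on the `/config-file` endpoint with `{"op": "save"}` and a `key` field as shown in the VyOS HTTP API documentation, which should be checked against the version in use. The name `build_save_requests` and the example hostnames are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def build_save_requests(hosts: List[str],
                        api_key: str) -> List[Tuple[str, Dict[str, str]]]:
    """Build one (url, form_fields) pair per router for a config save.

    Assumption: each router exposes the VyOS HTTP API and accepts a POST to
    /config-file with the form fields `data` and `key`.
    """
    fields = {"data": json.dumps({"op": "save"}), "key": api_key}
    return [(f"https://{host}/config-file", dict(fields)) for host in hosts]
```

For example, `build_save_requests(["primary.example", "secondary.example"], "MY-KEY")` yields two requests that any HTTP client can POST; sending them and handling failures on one of the peers is left out of the sketch.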
