Page MenuHomeVyOS Platform

Changing settings on an interface causes it to fall out of bridge
Closed, ResolvedPublic


I added a vif to a ethernet interface, committed, and the ethernet interface dropped out of the bridge. I had to go to the console to delete and re-add the member interfaces.

vyos@rt-home:~$ show bridge 
bridge name     bridge id               STP enabled     interfaces
br0             5000.0019xxxxxxxx       yes             eth0
vyos@rt-home# set interfaces ethernet eth1 vif 20
vyos@rt-home# set interfaces ethernet eth2 vif 20
vyos@rt-home# commit

(lost connection here, had to go log in to the physical console)

vyos@rt-home:~$ show bridge
bridge name	bridge id		STP enabled	interfaces
br0		5000.001921dffc1e	yes		eth0
vyos@rt-home# delete interfaces bridge br0 member interface eth1
vyos@rt-home# delete interfaces bridge br0 member interface eth2
vyos@rt-home# commit
vyos@rt-home# add interfaces bridge br0 member interface eth1
vyos@rt-home# add interfaces bridge br0 member interface eth2
vyos@rt-home# commit
vyos@rt-home# run show bridge br0
bridge name     bridge id               STP enabled     interfaces
br0             5000.0006xxxxxxxx       yes             eth0
vyos@rt-home# show interfaces bridge |strip-private 
 bridge br0 {
     address xxxx:xxxx::1/64
     aging 300
     description LAN
     firewall {
         local {
             name lan-local
     hello-time 2
     ipv6 {
         dup-addr-detect-transmits 2
     max-age 20
     member {
         interface eth0 {
         interface eth1 {
         interface eth2 {
         interface eth4 {
     priority 20480

Edit 2020-04-24: this also happens if changing any setting on an ethernet interface, like removing an address, not just adding a vif to it. Retested with latest rolling code to still be an issue.


Difficulty level
Unknown (require assessment)
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

jjakob triaged this task as Unbreak Now! priority.Apr 8 2020, 11:42 AM
jjakob created this task.
jjakob created this object in space S1 VyOS Public.

This is a pretty serious bug. Is there any progress?

jjakob renamed this task from Interface falls out of bridge when adding a vif to it to Changing settings on an interface causes it to fall out of bridge.Apr 24 2020, 1:50 AM
jjakob updated the task description. (Show Details)
jjakob changed Version from 1.3-rolling-202003291001 to 1.3-rolling-202004200558.
jjakob changed Is it a breaking change? from Perfectly compatible to Unspecified (possibly destroys the router).

If this happens to you, you don't need to delete and re-add the members in the config, you can run sudo ip link set dev ethX master brX to add the eth interfaces back to the bridge.

I think this must be because the ethernet scripts always set all settings on the interface, one of these must cause the interface to go down and come back up. The fix would be to add it and all vlan interfaces back to any bridges they're a member of at the end of applying all settings. This would probably need to be done for more interface types, at least the ones that can go down and up while their settings are applied (I haven't debugged which setting it is exactly, so this would be best done by testing). It would be best to call a common function of BridgeIf that just adds all members to its bridge to avoid duplication.

jjakob changed the task status from Open to In progress.Apr 24 2020, 10:45 PM
jjakob claimed this task.
jjakob moved this task from Need Triage to In Progress on the VyOS 1.3 Equuleus board.

Yes the code always does down / up. It is something @cpo and I discussed and it was deemed safer when performing changes. There could be a way to indicate that a change occur which should cause a trigger in another section. Something all sections should run on every change. This would be a major design change as we would not anymore apply everything, would need to decide what part of the change trigger this update process and implement a new section and make sure it is run !

For clarity, I do not believe the rewrite in the parent task is the root of the issue, as the behaviour on configuration was not changed. The code was restructured but the action logic remained the same. The same commands/actions will be run now as were run before the patchset. If something changed it will be in the conf mode interface code and not ifconfig AFAICS

I'll try out the latest code then. But on 0421 with vyos-1x from 04-23
15:11 the issue is still there. I had most of the fix ready, it re-added
the interface back to the bridge it is supposed to be in after applying
all settings. I've thought about if this could be some function/method of
ifconfig, but I don't think it's necessary or how it would be
implemented. If the issue is still there, I'll open a PR. Then if there
is a bigger need to add this to ifconfig you can do it later.

Regarding changing only settings that are changed, yes, that would be the
best, I think some conf_mode scripts already use this method. Get dicts
of the current and effective configs, compare them, then only apply the
changes, other scripts can be rewritten when there is time, currently the
most important thing is that this code works without bugs, and big
rewrites done in a hurry would introduce new bugs.

The problem with bridges is that when other scripts work on their
interfaces (break them down and recreate them) they need a way to simply
add themselves back to the bridge, as running the whole bridge setup
would be unnecessary. Itcs pretty simple and I have the code working
locally as I said.

@jjakob if you enable debugging and record what is happening to the interface when it drops would be useful.

I found that it was due to set_vrf which uses the same master/nomaster commands that bridge uses. I'm adding checks and conditions to all interfaces to prevent that - raise ConfigError if both are set, and only set_vrf if vrf is set. The prior idea I had with re-adding the interface to the bridge at the end isn't necessary in this case, but for other interface types, where the interface itself is deleted and recreated, re-adding it is still necessary. The PR should be ready for initial review tomorrow. is the initial fix for this.

There is a bigger problem still left to solve:

When the bridge member interface settings were migrated from interfaces to the bridge itself, the interface code lost knowledge of its bridge membership and port settings. This is needed because some interfaces delete and recreate themselves in the process of changing settings, so they need to re-add themselves to the bridge each time. This is now solved with my PR changes.
Unfortunately there are also STP settings set on each interface by the bridge setup code. Some of these settings are per each member interface, while others are part of the main bridge interface, but they are all applied on the slave interface itself.
When a interface adds itself to a bridge, it needs to restore these settings. I thought of caching all set_ calls inside a dict in the interface's class, but that would get deleted when the interface code deletes the interface.

I don't know what the best way to solve this is. The settings that the bridge code applies on the member interface need to get stored in a place where the member interface deletion doesn't delete it, so we can later call the add_port() and (new) restore() functions to restore the bridge port. I'd appreciate the help of others on this. @thomas-mangin @c-po

Scratch the above comment - I'll write a simple utility function that parses the STP config from the bridge, and some utility functions to remove duplication for adding the bridge members and setting STP on them, that should be good enough.

Still TODO: apply STP settings when re-adding interface to bridge from interface scripts, currently the interface loses configured STP settings, but is at least readded to the bridge.

jjakob changed the task status from In progress to On hold.May 9 2020, 8:46 AM
jjakob moved this task from In Progress to Backlog on the VyOS 1.3 Equuleus board.
erkin set Issue type to Bug (incorrect behavior).Aug 30 2021, 7:35 AM
erkin removed a subscriber: Active contributors.