Page MenuHomeVyOS Platform

Commit fails if ethernet interface doesn't support flow control (pause)
Closed, ResolvedPublicBUG

Description

I set up VyOS rolling from 2020-02-24 on a xcp-ng 8 host. When trying to assign an IP to an ethernet interface, it errors out when ethtool tries to set flow control settings (pause/etc). Looking at the source, there are a list of blacklisted drivers for pause, but xen_netfront is not on the list. After adding it, the interface comes up without issue.

Pull request:
https://github.com/vyos/vyos-1x/pull/266

Details

Version
VyOS 1.3-rolling-202003241825
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

does this also affect vyos 1.2, or only 1.3/rolling?

This fix is for 1.3 rolling only and should not be a problem on 1.2 as long as users do not explicitly set speed/duplex.

ok, thanks! I can test on xen aswell, when the fix is in 1.3/rolling.

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

In T2158#56542, @jjakob wrote:

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

Yeah - actually checking the results of ethtool (catching an error if any) would be ideal for sure, the driver blacklist method is kind of a kludge.

If we could get the driver blacklist committed quickly so that the released version will actually work for us hitting problems, and then handle the output for real as time allows, that would probably be ideal?

I suspect the driver blacklist won't be enough for a lot of users. A lot of very common ethernet cards don't support setting pause frames.

Oh, I see my changes were already pulled in, missed that!

I just did a quick validation; spun up a new vyos instance on xcp-ng with 1.3-rolling-202003241825, and validated that it failed. Re-installed with 1.3-rolling-202003250217, and it works.

So, as far as Xen is concerned, this is confirmed fixed. However, @jjakob is correct that this should be fixed to catch errors and handle them versus having to list each driver individually. Not sure if that should be a new bug or continue here?

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

@thomas-mangin can resolve the issue as he introduced it :)

Unbootable systems are super bad

c-po triaged this task as Urgent! priority.

we can fix it in two ways: undo the commit which check the return code of the program, hiding the issue and add a first BIG PRINT warning stage or find the root cause and fix it (harder).

It seems that some interface may not have pause feature and we have no correct detection for the feature when ethtool -a could be used ?

let me try to put a quick patch for you.

Maybe check the physical interface support via ethtool in the ethernet validate() function and raise a configerror if it doesn't? Or should the default be disabled and should a config command be enable-flow-control? The script that actually sets the flow control should definitely just print a warning to the syslog and not fail.

@jjakob I had that discussion with @thomas-mangin already - the best solution would be a dynamic probe via ethtool in verify()

I already hotfixed the issue on mine by adding r8169 into the unsupported list - but as said, that's not the real solution.

jjakob renamed this task from Need to add xen_netfront to interfaces that don't support pause to Commit fails if ethernet interface doesn't support flow control (pause).Mar 26 2020, 8:49 AM

Hi jjakob, AFAICS the patch above should have fixed any problem with the ethtool. If not I will need to understand why.

@thomas-mangin Which commit do you mean, https://github.com/vyos/vyos-1x/commit/60d35d1d4d3a5acec6e39cccb166fd33490b6c27 ?
I can definitely say that did not fix the issue for r8169, the router failed boot after upgrading to 1.3-rolling-202003250217. If there were any patches after that, I can't see them.

I think this throws a exception that isn't caught: https://github.com/vyos/vyos-1x/blob/583e9d907236a4a98fe40e97a378c1fb655f8a95/python/vyos/ifconfig/ethernet.py#L114

root@vyos:~# /sbin/ethtool --show-pause eth0
Pause parameters for eth0:
Cannot get device pause settings: Operation not supported
root@vyos:~# echo $?
76

also I would remove L107-L109 and move the debug message to the exception handler of L114

I can confirm the above commit fixes booting with interfaces that don't support flow control. I have no way of checking that it properly applies if the interface does support it.

What's the reason for enabling flow control by default? I'd have assumed disabled is more common and causes less problems. The node naming is not the best IMO as it has "disable-" in it, more reasonable would be to have a node called "flow-control" that enabled it if set, the default being disabled, and it could have sub-nodes to tweak the exact flow control settings.

I could not tell you why it was decided to be this way.

erkin set Issue type to Bug (incorrect behavior).Aug 31 2021, 5:12 PM
erkin removed a subscriber: Global Notifications.