
DHCP client sometimes doesn't start
Closed, Resolved, Public

Description

With 1.2-rolling-201910021249 the DHCP client doesn't automatically start on some interfaces, but it can still be started using the renew command.

I tracked this down to commit 35c7d6616, which now only starts dhclient once the interface is actually up. The problem seems to be that some interfaces (in my case a bond VIF) take some time to reach the 'up' state. The interface is still down when the addresses are added, so the DHCP client is never started.

For consistency, set_state() should probably wait for the requested state to take effect before it returns. Adding such a check to set_state() fixes the problem on my system.
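A minimal sketch of the wait described above, assuming the interface state can be read through some callable (for example one that reads `/sys/class/net/<ifname>/operstate`). The name `wait_for_state` and its parameters are hypothetical. They are not taken from the VyOS code base or from the attached patch. The helper polls with a bounded timeout and returns a boolean instead of raising, so the caller decides what to do when the link never comes up.

```python
import time
from typing import Callable


def wait_for_state(read_state: Callable[[], str], wanted: str,
                   timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll read_state() until it returns `wanted` or `timeout` seconds pass.

    Hypothetical helper: returns True if the state was reached and False on
    timeout. It never raises on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_state() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A hypothetical `set_state()` could bring the link up (for example with `ip link set dev <ifname> up`) and then call `wait_for_state(lambda: read_operstate(ifname), "up")`, where `read_operstate` is an assumed helper. Returning `False` instead of raising matters because an exception on timeout would fail the whole commit for interfaces that can never reach 'up', such as ports without a carrier.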

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

albeu created this object in space S1 VyOS Public.

The following patch fixes the issue for me:

c-po triaged this task as Normal priority.
c-po added a project: VyOS 1.3 Equuleus.

Hi @albeu, thank you for this contribution. While reviewing it, I found one flaw:

When the link does not come up "in time", you raise an exception and the commit fails. This will always happen on interfaces with no carrier attached. I will therefore build on your approach and fine-tune it.

Hi @albeu, this fix is a poor fit for most of our use cases and is not a good way to address a single issue. Can you give some hints on how to reproduce the "DHCP won't start on some interfaces" problem?

c-po reopened this task as In progress. Oct 16 2019, 6:44 AM

The system where I'm seeing this is a VM that uses a BNX2 dual-port network card via PCI passthrough. Both ports (eth0 and eth1) are configured in an LACP bond (bond0) with several VIFs running on top of it (bond0.10, bond0.20, etc.). All of the VIFs show this problem.

I would again point to commit 35c7d6616, which added an "if self._state == 'up'" condition before starting the DHCP client. Sadly, the commit log does not mention this change, so it is hard to tell why it was added. Note that no such check is done for static addresses, so there is an asymmetry here. In my testing, removing this condition also solves the problem, and I really don't understand why it is there. Starting the DHCP client a bit too early does no harm, while not starting it at all does a lot.
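The asymmetry described above can be illustrated with a toy model. `ToyInterface`, `apply_addresses_gated` and `apply_addresses_ungated` are hypothetical names that only mimic the behaviour described in this report. They are not the VyOS implementation. The gated variant adds static addresses unconditionally but only starts DHCP if the interface is already 'up'. The ungated variant treats both kinds of address the same way.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyInterface:
    """Hypothetical stand-in for an interface object (not VyOS's class)."""
    name: str
    state: str  # state observed at the moment addresses are applied
    addresses: List[str] = field(default_factory=list)
    dhcp_started: bool = False


def apply_addresses_gated(intf: ToyInterface, addrs: List[str]) -> None:
    """Gated policy: DHCP only starts if the interface is already 'up'."""
    for addr in addrs:
        if addr == "dhcp":
            if intf.state == "up":
                intf.dhcp_started = True
            # In this toy model nothing retries later: DHCP is simply skipped.
        else:
            # Static addresses are added regardless of state.
            intf.addresses.append(addr)


def apply_addresses_ungated(intf: ToyInterface, addrs: List[str]) -> None:
    """Ungated policy: start DHCP regardless of the current state."""
    for addr in addrs:
        if addr == "dhcp":
            intf.dhcp_started = True
        else:
            intf.addresses.append(addr)
```

With a VIF that is still 'down' when addresses are applied, the gated policy keeps the static address but never starts DHCP. This matches the symptom in the report, where only a manual renew starts the client.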

There is also a conceptual problem: set_state() sets the administrative state while get_state() returns the operational state. These are not the same, and operational state is not supported by all drivers. See the documentation for more details.
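On Linux the two states are exposed separately in sysfs. The administrative state is the `IFF_UP` bit (0x1) in `/sys/class/net/<ifname>/flags`. The operational state is the string in `/sys/class/net/<ifname>/operstate`. The sketch below reads both. The function names and the `sysfs` parameter (added so the code can be pointed at a test directory) are illustrative assumptions and do not come from the VyOS code. Per the kernel's operstates documentation, a value of `unknown` means the driver does not report an operational state, and such an interface should be treated as usable.

```python
from pathlib import Path

IFF_UP = 0x1  # administrative "up" flag, as defined in <linux/if.h>
SYSFS_NET = Path("/sys/class/net")


def admin_state(ifname: str, sysfs: Path = SYSFS_NET) -> str:
    """Administrative state: what 'ip link set ... up/down' controls."""
    flags = int((sysfs / ifname / "flags").read_text().strip(), 16)
    return "up" if flags & IFF_UP else "down"


def oper_state(ifname: str, sysfs: Path = SYSFS_NET) -> str:
    """Operational state as reported by the kernel/driver, e.g. 'up',
    'down', 'lowerlayerdown', 'dormant' or 'unknown'."""
    return (sysfs / ifname / "operstate").read_text().strip()


def is_usable(ifname: str, sysfs: Path = SYSFS_NET) -> bool:
    """Admin-up and either operationally up or not reported by the driver."""
    return (admin_state(ifname, sysfs) == "up"
            and oper_state(ifname, sysfs) in ("up", "unknown"))
```

A VIF on top of a bond that is still negotiating LACP can be administratively up while its operational state is still `lowerlayerdown`. A check that compares the operational state against the value passed to set_state() will then see 'down' even though the requested change has already been applied.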

Root cause identified and fixed. Please test @albeu.

Tested on 1.2-rolling-201910250117, the issue is solved.

erkin set Issue type to Bug (incorrect behavior). Aug 31 2021, 6:40 PM