IPv6 default route disappears after upgrade
Closed, Resolved | Public | BUG

Description

After upgrading to Sagitta 1.4.3, the router no longer obtains an IPv6 default route from the ISP, although IPv4 works fine. Downgrading to 1.4.2 LTS fixed the issue.

On 1.4.3 LTS, I had to manually add a static route with the command set protocols static route6 ::/0 next-hop fe80::201:5cff:fe81:c046 interface eth0. This works around the issue temporarily, but these link-local addresses can change.
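The link-local next hop above has ff:fe in the middle of its interface identifier. This suggests, though it is only an assumption, that the ISP router derives the address from its own MAC using modified EUI-64 (RFC 4291). If so, the address changes whenever the upstream hardware or its MAC changes, which is why a hard-coded static route is fragile. The following is a minimal sketch using only Python's standard `ipaddress` module. Both function names are hypothetical.

```python
import ipaddress

LINK_LOCAL_PREFIX = 0xFE80 << 112  # fe80::/64 as a 128-bit integer (interface ID bits zero)


def eui64_link_local(mac: str) -> str:
    """Derive a modified EUI-64 link-local address (RFC 4291) from a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    octets[0] ^= 0x02  # invert the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    iid = int.from_bytes(bytes(eui), "big")
    return str(ipaddress.IPv6Address(LINK_LOCAL_PREFIX | iid))


def mac_from_link_local(address: str) -> str:
    """Recover the MAC from an EUI-64 style link-local address (reverse of the above)."""
    iid = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    raw = iid.to_bytes(8, "big")
    if raw[3:5] != b"\xff\xfe":
        raise ValueError(f"{address} does not look EUI-64 derived")
    mac = [raw[0] ^ 0x02, raw[1], raw[2], raw[5], raw[6], raw[7]]
    return ":".join(f"{b:02x}" for b in mac)


print(mac_from_link_local("fe80::201:5cff:fe81:c046"))  # prints 00:01:5c:81:c0:46
```

If the ISP replaced that device, the new MAC would produce a different fe80:: next hop, and the static route would silently stop working.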

After downgrading to 1.4.2, the static route is no longer needed, because the router automatically adds a kernel route learned from the ISP.

EDIT: Almost forgot, here is the relevant config:

admin@router:~$ sh configuration commands | strip-private | grep "eth0"
set firewall group interface-group WAN_IG interface 'eth0'
set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth0 address 'dhcpv6'
set interfaces ethernet eth0 description 'WAN'
set interfaces ethernet eth0 dhcpv6-options pd 0 interface eth1.4 address '1'
set interfaces ethernet eth0 dhcpv6-options pd 0 interface eth1.4 sla-id '0'
set interfaces ethernet eth0 dhcpv6-options pd 0 interface eth1.5 address '1'
set interfaces ethernet eth0 dhcpv6-options pd 0 interface eth1.5 sla-id '1'
set interfaces ethernet eth0 dhcpv6-options pd 0 length '56'
set interfaces ethernet eth0 dhcpv6-options rapid-commit
set interfaces ethernet eth0 hw-id 'xx:xx:xx:xx:xx:f7'
set interfaces ethernet eth0 offload gro
set interfaces ethernet eth0 offload gso
set interfaces ethernet eth0 offload rfs
set interfaces ethernet eth0 offload rps
set interfaces ethernet eth0 offload sg
set interfaces ethernet eth0 offload tso
set interfaces ethernet eth0 ring-buffer rx '4096'
set interfaces ethernet eth0 ring-buffer tx '4096'
set nat source rule 1 outbound-interface name 'eth0'

If you need anything else, let me know.

Details

Version: Sagitta 1.4.3 LTS
Is it a breaking change? Perfectly compatible
Issue type: Bug (incorrect behavior)

Event Timeline

@Viacheslav I just did some local testing with three VMs: one running today's rolling release to simulate a home router, a blank Debian VM for host testing, and a Sagitta VM simulating an ISP (running the dhcpv6-server and router-advert services).

The issue is present in rolling and it appears to be caused by T7379 as you pointed out. Configuring eth0 with autoconf (SLAAC) sets the default route, and removing autoconf from eth0 removes the route successfully. Then I set eth0 to dhcpv6 and no default route was added.

The problem seems to be that ALL default routes learned from router advertisements are deleted when SLAAC is not configured. Router advertisements are still necessary with DHCPv6. When you use DHCPv6, the host still waits for an RA. The only difference is that the RA has the "managed" or "other" bit set to 1, which tells the host it needs to contact a DHCPv6 server. The default route is still learned from the RA.

def flush_ipv6_slaac_routes(self, ra_addrs: list=[]) -> None:
    """
    Flush IPv6 default routes installed in response to router advertisement
    messages from this interface.
    Will raise an exception on error.
    """
    # Find IPv6 connected prefixes for flushed SLAAC addresses
    connected = []
    for addr in ra_addrs if isinstance(ra_addrs, list) else []:
        connected.append(str(IPv6Interface(addr).network))

    netns = get_interface_namespace(self.ifname)
    netns_cmd = f'ip netns exec {netns}' if netns else ''

    tmp = self._cmd(f'{netns_cmd} ip -j -6 route show dev {self.ifname}')
    tmp = json.loads(tmp)
    # Parse interface routes. Example data:
    # {'dst': 'default', 'gateway': 'fe80::250:56ff:feb3:cdba',
    # 'protocol': 'ra', 'metric': 1024, 'flags': [], 'expires': 1398,
    # 'metrics': [{'hoplimit': 64}], 'pref': 'medium'}
    for route in tmp:
        # If it's a default route received from RA, delete it
        if (dict_search('dst', route) == 'default' and
            dict_search('protocol', route) == 'ra'):
            self._cmd(f'{netns_cmd} ip -6 route del default via {route["gateway"]} dev {self.ifname}')
        # Remove connected prefixes received from RA
        if dict_search('dst', route) in connected:
            # If it's a connected prefix, delete it
            self._cmd(f'{netns_cmd} ip -6 route del {route["dst"]} dev {self.ifname}')

    return None
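This helps show why DHCPv6-only interfaces lose their gateway. The route-selection part of flush_ipv6_slaac_routes() can be pulled out into a pure function and fed the sample data from its own comment. This is a sketch, not VyOS code. `routes_to_flush` is a hypothetical name, and the function parses JSON text instead of running `ip`. Plain `dict.get` stands in for VyOS's `dict_search` helper, which is assumed to be equivalent here because only top-level keys are looked up.

```python
import json
from ipaddress import IPv6Interface
from typing import List, Optional


def routes_to_flush(route_json: str, ra_addrs: Optional[List[str]] = None) -> List[str]:
    """Return the targets that flush_ipv6_slaac_routes() would pass to `ip -6 route del`."""
    # Connected prefixes of the SLAAC addresses being removed, e.g. "2001:db8:1::/64"
    connected = {str(IPv6Interface(addr).network) for addr in (ra_addrs or [])}
    targets = []
    for route in json.loads(route_json):
        # Every RA-learned default route is selected; nothing here knows whether DHCPv6 is in use
        if route.get("dst") == "default" and route.get("protocol") == "ra":
            targets.append(f"default via {route['gateway']}")
        # Connected prefixes that belonged to the flushed SLAAC addresses
        if route.get("dst") in connected:
            targets.append(route["dst"])
    return targets
```

Given the sample route from the comment, the RA default route is always selected. The function has no input that says whether the interface still relies on that route for DHCPv6.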

So I think the key to solving this will be to change the logic so that it only calls flush_ipv6_slaac_routes when both SLAAC and DHCPv6 are unconfigured, not just SLAAC.
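The proposed change boils down to a single condition. The following is a minimal sketch. It assumes a hypothetical dict layout loosely shaped like the interface config: a list of `address` values plus an `ipv6 address` node. `should_flush_ra_routes` is an illustrative name, not a vyos-1x function.

```python
from typing import Any, Dict


def should_flush_ra_routes(iface: Dict[str, Any]) -> bool:
    """Flush RA-learned routes only when neither SLAAC nor DHCPv6 is configured."""
    # Hypothetical layout: {"address": ["dhcp", "dhcpv6"], "ipv6": {"address": {"autoconf": {}}}}
    slaac = "autoconf" in iface.get("ipv6", {}).get("address", {})
    dhcpv6 = "dhcpv6" in iface.get("address", [])
    return not (slaac or dhcpv6)
```

With this check, the reporter's eth0 (`address dhcp`, `address dhcpv6`, no autoconf) would keep its RA default route. An interface with only `address dhcp` would still be flushed.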

This "error" is related to https://github.com/vyos/vyos-1x/pull/4461.

Your config only has set interfaces ethernet eth0 address 'dhcpv6' to request a dynamic address via DHCPv6, but nothing that enables SLAAC to also obtain a default route. DHCPv6 behaves differently from DHCPv4: it cannot provide a default route.
So VyOS 1.4.3 and later now behave as expected. In the past this was actually a bug: when DHCPv6 was enabled, SLAAC was erroneously enabled too.

set interfaces ethernet eth0 ipv6 address autoconf will enable SLAAC and properly request a default route.

Hi @c-po,

I understand what you're saying. And I'm not suggesting that 4461 be reverted - it does fix the earlier bug you mentioned where the default route from SLAAC wasn't being cleared.

What I'm saying is - router advertisements are necessary for both DHCPv6 and SLAAC to function correctly.

What's supposed to happen is this: a host set to auto-configure an IPv6 address solicits and listens for an RA. The RA tells the host which configuration method to use, SLAAC or DHCPv6. Which method is used depends on the flags in the RA:

  • Managed = 0, Other = 0 -> Use pure SLAAC, generate your own IPv6 address and check for duplicates.
  • Managed = 1, Other = 0 -> Stateful DHCPv6, obtain IPv6 address assignment from a pool on the DHCPv6 server.
  • Managed = 0, Other = 1 -> Stateless DHCPv6, generate your own IPv6 address and check for duplicates, but also contact DHCPv6 server for additional information common to all hosts in the subnet
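The flag logic above can be written down in a few lines. This is a minimal sketch based on RFC 4861, where M is the high-order bit (0x80) and O the next bit (0x40) of the RA flags byte. It also covers M = 1, O = 1, which RFC 4861 treats like stateful DHCPv6 because O is redundant once M is set. The sketch is simplified: whether SLAAC actually forms an address also depends on the A flag in the RA's Prefix Information option, which is ignored here. The function names are illustrative.

```python
from typing import Optional

M_FLAG = 0x80  # Managed address configuration
O_FLAG = 0x40  # Other configuration


def ra_config_mode(flags: int) -> str:
    """Map the RA M/O flags to the address configuration method a host should use."""
    if flags & M_FLAG:
        return "stateful DHCPv6"  # O is redundant when M is set
    if flags & O_FLAG:
        return "SLAAC + stateless DHCPv6"
    return "SLAAC only"


def default_router(ra_source: str, router_lifetime: int) -> Optional[str]:
    """Independent of M/O: a non-zero Router Lifetime makes the RA sender a default router."""
    return ra_source if router_lifetime > 0 else None
```

For example, `ra_config_mode(0x80)` returns "stateful DHCPv6". Even so, `default_router("fe80::1", 1800)` still returns the RA sender as the gateway, and that gateway is what a DHCPv6-only interface loses when RA routes are flushed.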

Regardless of which configuration method is chosen, the default route is learned from the RA. As you said, the DHCPv6 protocol itself does not supply a default route. But it doesn't have to because the host will have already discovered it from the RA. This is how other hosts configure themselves on IPv6 networks.

Based on this, there really should be no need for an administrator to select DHCPv6 or SLAAC when configuring an interface. If they use autoconf, it should decide based on the flags set in the RA by the upstream ISP router, just like any other host would. But that's a bit off-topic and probably involves a huge change to the codebase.

But if the administrator must configure SLAAC/DHCPv6 on the interface, the choice should be mutually exclusive, and both should accept RAs. I'm simply suggesting that default routes should be learned regardless of whether DHCPv6 or SLAAC is chosen, because that would make it consistent with expected behavior and likely involve a smaller change to the codebase than removing address dhcpv6 from the config tree and having autoconf decide between SLAAC and DHCPv6.

In my opinion, only if both ipv6 address autoconf and address dhcpv6 are missing should the default RA routes be cleared and the interface stop listening for RAs.

There are more fundamental problems here: the implementation is incomplete.

In my case there is no NA (non-temporary address); only a prefix is delegated via DHCPv6. Even with the following config, no default route is set up. This is not new with 1.4.3.

ethernet eth5 {
    address dhcp
    address dhcpv6
    description WAN
    dhcpv6-options {
        no-release
        pd 0 {
            interface eth0 {
                address 1
                sla-id 0
            }
            interface wg100 {
                address 1
                sla-id 1
            }
            length 56
        }
    }
    ip {
        adjust-mss clamp-mss-to-pmtu
    }
    ipv6 {
        address {
            autoconf
        }
    }
    ring-buffer {
        rx 4096
        tx 4096
    }
}

I wrote the (hacky) script below, which fixes the missing default route. It is not really the right place to do this, because the route belongs to the RA. However, the timing works out well enough that the route still gets set once the interface comes up.

/config/scripts/dhcp6c/eth5

#!/bin/vbash
source /opt/vyatta/etc/functions/script-template
run configure

WAN_IF=$(basename "$0")

# Find link-local next-hop
LL_GW=$(ip -6 neigh show dev "$WAN_IF" | awk '$4 == "router" && $5 != "FAILED" && $5 != "INCOMPLETE" { print $1; exit }')

delete protocols static route6 ::/0

if [ -n "$LL_GW" ]; then
  set protocols static route6 ::/0 next-hop "$LL_GW" interface "$WAN_IF"
  commit
fi
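Some readers may want to check the router-selection logic of the script without a live link. For them, the same awk filter can be written as a pure Python function. This is a sketch: it parses text in the `ip -6 neigh show dev <if>` format (address, `lladdr`, MAC, `router` flag, state) rather than running `ip`, and `find_router_link_local` is a hypothetical name.

```python
from typing import Optional

BAD_STATES = {"FAILED", "INCOMPLETE"}


def find_router_link_local(neigh_output: str) -> Optional[str]:
    """Return the first neighbour flagged as a router whose state is usable."""
    for line in neigh_output.splitlines():
        fields = line.split()
        # awk's $4/$5 are fields[3]/fields[4]; a missing field is "" in awk, so default to ""
        flag = fields[3] if len(fields) > 3 else ""
        state = fields[4] if len(fields) > 4 else ""
        if flag == "router" and state not in BAD_STATES:
            return fields[0]
    return None
```

Like the awk version, it skips routers in the FAILED or INCOMPLETE states. It picks the first remaining router, so a network with several upstream routers would need an extra rule for which one to prefer.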

Please feel free to take this under your license to fix this long-standing bug with VyOS and IPv6.

Based on a recent Slack conversation we had about this, the most reasonable solution seems to be configuration options like these:

  • ipv6 address autoconf - accept RAs and do whatever the M and O flags tell it to do
  • ipv6 address dhcpv6 - DHCPv6 without RAs, must configure default route separately if needed
  • ipv6 address dhcpv6 nd-defgtw - DHCPv6 without RAs, determine default route from neighbor discovery (possibly using the functionality in @josha's script above)
  • ipv6 address xx:xx:xx.... - Static IP

From the peanut gallery: there should be a "no-default-gateway" command or similar for v4 and v6 per interface, with VyOS defaulting to self-healing. If there is no default gateway, VyOS should make the reasonable assumption that the user does not want a broken network and add the received router on an otherwise dead-end network. Adding some weird nd-defgtw thing pushes this further out of reach. At first I thought VyOS had no real IPv6 support, when in fact this just isn't implemented properly.

@josha I agree with you. Perhaps it should be like this?

  • ipv6 address autoconf - accept RAs and do whatever the M and O flags tell it to do
  • ipv6 address dhcpv6 - DHCPv6 without RAs, determine default route from neighbor discovery (possibly using the functionality in @josha's script above)
  • ipv6 address dhcpv6 no-default-gateway - DHCPv6 without RAs, must configure default route separately if needed
  • ipv6 address xx:xx:xx.... - Static IP

This seems the most logical approach to me. Though, I'm a bit concerned about the following:

The VyOS 1.5 docs call out that using ... ipv6 address autoconf disables IPv6 forwarding on the interface. Probably not what we want on our WAN interface if it's our default route.

@ryanzuwala I'm thinking of a change in 1.4.4 that requires a config migration. Suppose someone has a config that worked in 1.4.2 but is now "invalid" because it lacks the "ipv6 address autoconf" node. The migrator will add that node, which restores the previous behavior AND puts all the CLI nodes in place to be a good internet citizen.

This could be considered a "good" implementation that fixes all use cases. Any other change to the design of how those addresses are assigned and used should go into 1.5 or rolling.

@c-po if I understand you correctly, for 1.4.4, you are going to make a migration file that looks for interfaces that have only:

set interfaces ethernet ethX address 'dhcpv6'

and the migration will add the autoconf setting to that interface, so it becomes the combined configuration:

set interfaces ethernet ethX address dhcpv6
set interfaces ethernet ethX ipv6 address autoconf
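The migration described above could look roughly like this. It is a hypothetical sketch on a plain nested dict, not VyOS's actual migration framework. `migrate_dhcpv6_autoconf` and the dict layout are assumptions, and the sketch only looks at top-level ethernet interfaces, not VLAN sub-interfaces.

```python
import copy
from typing import Any, Dict


def migrate_dhcpv6_autoconf(config: Dict[str, Any]) -> Dict[str, Any]:
    """Add `ipv6 address autoconf` to ethernet interfaces that use `address dhcpv6`."""
    migrated = copy.deepcopy(config)  # leave the caller's config untouched
    ethernet = migrated.get("interfaces", {}).get("ethernet", {})
    for iface in ethernet.values():
        if "dhcpv6" not in iface.get("address", []):
            continue
        # setdefault keeps any existing ipv6/address nodes and makes the step idempotent
        iface.setdefault("ipv6", {}).setdefault("address", {}).setdefault("autoconf", {})
    return migrated
```

Running the migration twice gives the same result as running it once. A real migrator needs that property, because it may run against configs that have already been migrated.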

I think that's probably fine for now. I've been running 1.4.3 with the combined config above on my WAN interface for a couple of weeks with no issues.

I agree that big design changes should be in 1.5/rolling. I'm just wondering if setting address dhcpv6 and ipv6 address autoconf together on an interface is meant to be a temporary workaround for >=1.4.3 until this can be redesigned in 1.5, or if using dhcpv6 and autoconf together is the desired official way to configure a WAN port? What is your opinion on the proposed design changes to address assignment for 1.5?

Hopefully @Apachez will chime in here too, I know he expressed some concerns in the Slack conversation.

My €0.05 on this topic: should "ipv6 address autoconf" instead be named "ipv6 address slaac"? After all, the feature is called "stateless address autoconfiguration", aka SLAAC, not necessarily "autoconf". But I think it's a matter of taste whether it should be called "autoconf" or "slaac".

I often refer to "how do others do this?" and for that context "autoconf" would be more in line with that than "slaac".

For example Arista uses "ipv6 address auto-config".

Ref: https://www.arista.com/en/um-eos/eos-ipv6#xx1150664

The other point (which already seems to be taken care of) is setting the address "manually" with DHCPv6, that is, without relying on the RA M-flag. You would then choose whether the default route should be picked up through ND, or whether you don't want a default route set up automatically at all. After all, there are use cases where you want an IP address assigned without a default route also being configured.

The thing with IPv6 (compared to IPv4) is that you can use a link-local address for the default route by selecting an egress interface along with the next hop. The stack then uses the link-local address of that next hop, with the interface as part of the routing entry. So with IPv6, the next hop can be either a public address OR a link-local address together with the interface (as in "%ethX").
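Python's standard `ipaddress` module can illustrate this point. From version 3.9, it accepts a `%zone` suffix on IPv6 addresses and can tell whether a next hop is link-local and therefore needs an interface. The helper name is hypothetical. The command it emits follows the `set protocols static route6` form used earlier in this report.

```python
import ipaddress


def default_route_command(next_hop: str) -> str:
    """Build a VyOS static IPv6 default route; link-local next hops must carry %interface."""
    addr = ipaddress.IPv6Address(next_hop)  # Python 3.9+ keeps the "%eth0" part as scope_id
    if addr.is_link_local:
        if not addr.scope_id:
            raise ValueError("link-local next hop needs an interface, e.g. fe80::1%eth0")
        bare = str(addr).split("%", 1)[0]  # drop the zone; VyOS takes it as "interface"
        return f"set protocols static route6 ::/0 next-hop {bare} interface {addr.scope_id}"
    # A global next hop is routable on its own, so no interface is required
    return f"set protocols static route6 ::/0 next-hop {addr}"
```

For example, `default_route_command("fe80::201:5cff:fe81:c046%eth0")` reproduces the workaround command from the original report. A global address such as 2001:db8::1 needs no interface.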

So I think the migration script should transform "invalid IPv6 config" into "set interfaces ethernet ethX ipv6 address autoconf". But if IPv6 was disabled it should remain disabled.

That is as stated previously (with updates):

  1. ipv6 address autoconf - Accept RAs and do whatever the M and O flags tell it to do; determine the default route from ND (possibly using the functionality in @josha's script above).
  2. ipv6 address autoconf no-default-gateway - Accept RAs and do whatever the M and O flags tell it to do; the default route must be configured separately if needed.
  3. ipv6 address dhcpv6 - DHCPv6 without RAs (that is, we don't care whether RAs exist or not); determine the default route from ND (possibly using the functionality in @josha's script above).
  4. ipv6 address dhcpv6 no-default-gateway - DHCPv6 without RAs (that is, we don't care whether RAs exist or not); the default route must be configured separately if needed.
  5. ipv6 address xx:..:xx - Static IP; the default route must be configured separately if needed.
  6. ipv6 address xx:..:xx link-local - Static link-local IP, overriding the default link-local address, which is based on the MAC address.
  7. ipv6 address none - Default; no IPv6 is configured for this interface. Open question: should link-local still be configured automatically if "ipv6 enable" is set?
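One reading of options 1 to 5 can be written as a small lookup table. It shows, for each mode, who decides the address source and where the default route comes from. This is an interpretation of the proposal, not an agreed design. `ProposedMode`, `PROPOSED_MODES` and `needs_manual_default_route` are hypothetical names. Options 6 and 7 are left out because they only override link-local addressing or turn IPv6 off.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProposedMode:
    follows_ra_flags: bool        # do the RA M/O flags decide how the address is obtained?
    address_source: str
    default_route: Optional[str]  # None: the administrator configures it separately


# Options 1-5 of the proposal above
PROPOSED_MODES: Dict[str, ProposedMode] = {
    "autoconf": ProposedMode(True, "RA M/O flags", "neighbor discovery"),
    "autoconf no-default-gateway": ProposedMode(True, "RA M/O flags", None),
    "dhcpv6": ProposedMode(False, "DHCPv6", "neighbor discovery"),
    "dhcpv6 no-default-gateway": ProposedMode(False, "DHCPv6", None),
    "static": ProposedMode(False, "static", None),
}


def needs_manual_default_route(mode: str) -> bool:
    """True when the proposal leaves the default route to the administrator."""
    return PROPOSED_MODES[mode].default_route is None
```

Laid out this way, "no-default-gateway" reads as a single modifier that switches the default route from ND to manual, whichever address source is chosen.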
dmbaturin renamed this task from "No IPv6 default route in 1.4.3 LTS" to "IPv6 default route disappears after upgrade". (Thu, Nov 13, 12:08 AM)
dmbaturin changed "Is it a breaking change?" from "Unspecified (possibly destroys the router)" to "Perfectly compatible".