Allow WireGuard peers via DNS hostname
Open, Normal | Public | FEATURE REQUEST

Assigned To
None
Authored By
b-
Jan 11 2023, 9:01 PM
Referenced Files
F5044844: 1-4.png
Tue, Nov 19, 3:25 PM
F5044842: 1-3.png
Tue, Nov 19, 3:25 PM
F5044546: 1-2.png
Tue, Nov 19, 10:41 AM
F5044544: 1-1.png
Tue, Nov 19, 10:41 AM

Description

Hi, I would really like to be able to run: set interfaces wireguard wg123 peer foo address foo.example.com

This is important because:

  1. a number of WG peers I’ve tried to connect to don’t accept connections without a hostname
  2. I have a dynamic IP address at home, which means that if I want another VyOS router to be a WireGuard peer with my home router I need to manually update the IP address in the VyOS config when it changes. Allowing a DNS name would enable using a dynamic DNS hostname, which would Just Work™

I added the 1.3.3 tag because my (likely incorrect) assumption is that this would be as simple as allowing text in addition to IP addresses for peers? In which case I wouldn’t expect it to need much testing/validation. But I’m not certain, don’t quote me on that!

Details

Version
-
Is it a breaking change?
Config syntax change (migratable)

Event Timeline

Unfortunately this is not a trivial task as WG does the DNS lookup only once on tunnel creation and not subsequently. A 3rd party script would be required to do that.

I have a similar issue and what I do is I have a static HUB and the dynamic clients start the connection to the HUB. The HUB itself has no CLI address or endpoint definition (depending on VyOS version you are using), and all connections are started from the client and the HUB will accept them.

Does anyone have any thoughts on the best place to start adding this functionality / design ideas for this feature?

@c-po I'm curious, does using a hub like you suggest mean all data gets proxied through the hub, or is the hub enough to facilitate the connections and then the clients talk directly to each other?

+1

I'm migrating from my EdgeRouter to VyOS. Its WireGuard module (https://github.com/WireGuard/wireguard-vyatta-ubnt/releases) supports an endpoint written as domain:port, the same as wg set or wg-quick.

  • My home router and my company router both have IP addresses that change frequently. I set up DDNS on both sides, plus a cron job on each side that reconnects using wg set.
  • My server has low bandwidth but a fixed IP.

Changing the endpoint to domain:port is not a problem, but remembering or looking up the IP is.

My migration plan is to set neither address nor port on the peers, and to set the endpoints with the same cron script. That is not an elegant solution, though; letting WG resolve a name when the peer starts would be very useful.

I'll try to get VyOS to support endpoints with a domain name (letting WG resolve it) and will open a PR as soon as it is done.

Also, I created another PR for T6490. The original code creates the WG interface while creating peers, so if peer creation fails, for example because DNS is not working, the WG interface won't be properly configured. Is that the reason endpoints with a domain name are not supported yet?
I also noticed that the firewall documentation (https://docs.vyos.io/en/latest/configuration/firewall/) says 'Due to a race condition that can lead to a failure during boot process, all interfaces are initialized before firewall is configured'. Is that 'race condition' the same issue as above?

I simply added hostname/fqdn to address and here is my test result:

vyos@vyos# compare
[interfaces wireguard wg0]
+ address "192.168.85.40/23"
+ mtu "1420"
+ peer default-1 {
+     address "t2.vm.xxx.xxx"
+     allowed-ips "192.168.85.2/32"
+     port "61520"
+     preshared-key "xxxxx="
+     public-key "xxxxx"
+ }
+ port "61520"

[edit]
vyos@vyos# commit
[edit]
vyos@vyos# ping 192.168.85.2
PING 192.168.85.2 (192.168.85.2) 56(84) bytes of data.
64 bytes from 192.168.85.2: icmp_seq=1 ttl=64 time=7.35 ms
^C
--- 192.168.85.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.349/7.349/7.349/0.000 ms
[edit]
vyos@vyos# save

Should I open a PR with this modification? Or should I add a new entry, something like domain-name or endpoint-host?

Code commit here: https://github.com/sskaje/vyos-1x/tree/T4930

I haven't created a PR yet because this change requires my changes for T6490 to be approved first; otherwise DNS may cause WG interface failure.

My changes:

  • Allow set interfaces wireguard wgX peer XXX address to take a hostname/FQDN, in interface-definitions/interfaces_wireguard.xml.in
  • Create a new op command, reset wireguard interface, which reads the configuration from the config tree and re-runs wg set wg0 peer xxx endpoint address:port

Smoke test result:

root@vyos:/opt/vyos-1x# python3 /usr/libexec/vyos/tests/smoke/cli/test_interfaces_wireguard.py
test_01_wireguard_peer (__main__.WireGuardInterfaceTest.test_01_wireguard_peer) ... ok
test_02_wireguard_add_remove_peer (__main__.WireGuardInterfaceTest.test_02_wireguard_add_remove_peer) ... ok
test_03_wireguard_same_public_key (__main__.WireGuardInterfaceTest.test_03_wireguard_same_public_key) ... ok
test_04_wireguard_threaded (__main__.WireGuardInterfaceTest.test_04_wireguard_threaded) ... ok
test_05_wireguard_peer_pubkey_change (__main__.WireGuardInterfaceTest.test_05_wireguard_peer_pubkey_change) ... ok

----------------------------------------------------------------------
Ran 5 tests in 55.790s

OK

New op command introduced:

# reset all peers under wg0
root@vyos:/opt/vyos-1x# reset wireguard interface wg0
Resetting wg0 peer xxxx= endpoint to t1.vm.xxx.xxx:60020 ... done
Resetting wg0 peer yyyy= endpoint to t2.vm.xxx.xxx:60020 ... done

# reset single peer
root@vyos:/opt/vyos-1x# reset wireguard interface wg0 peer wg0-xxx
Resetting wg0 peer xxx= endpoint to t2.vm.xxx.xxx60020 ... done

@sskaje, what happens if there is no internet connection while the system is booting, and the Internet only becomes available 1-2 minutes after boot?
I think you will end up with a router without WireGuard at all, as in the previous commits. It cannot resolve the address, so it cannot create a session, and it will stay in this state until you reconfigure it.

@Viacheslav I based it on T6490, PR here: https://github.com/vyos/vyos-1x/pull/4194
That PR makes peers no longer required, so WireGuard interfaces are created on boot with or without an Internet connection, and with or without DNS resolution.

But when I tested on my VM with all interfaces disconnected, the WG interfaces were all created correctly but the peers were not.

So I made another change, splitting peer configuration into creating the peer and setting the endpoint. Changes here: https://github.com/vyos/vyos-1x/commit/60dd10f229868b46bdf1b93e03db44f9ea7246e4

The code is not 100% complete yet, because I haven't found the correct way to log and display exceptions in VyOS, nor a way to keep wg's DNS resolution from blocking for so long.

I thought the DNS lookup was causing the block, so I made some changes like:

vyos@vyos:~$ configure
[edit]
vyos@vyos# show system option
 resolv {
     attempts 1
     timeout 5
 }
[edit]
vyos@vyos# cat /etc/reso
resolv.conf  resolvconf/
[edit]
vyos@vyos# cat /etc/resolv.conf
### Autogenerated by VyOS ###
### Do not edit, your changes will get overwritten ###


# system
nameserver 192.168.11.1



options attempts:1 timeout:5
[edit]
vyos@vyos#

Nothing helped.

I tried running wg set manually and got errors like

Device or resource busy: 'some.domain:port'. Trying again in 1.00 seconds...

The retry delay keeps increasing.

Then I looked into the wg source code:

wireguard-tools/src/config.c

static inline bool parse_endpoint(struct sockaddr *endpoint, const char *value)
{               
...

        #define min(a, b) ((a) < (b) ? (a) : (b))
        for (unsigned int timeout = 1000000;; timeout = min(20000000, timeout * 6 / 5)) {
                ret = getaddrinfo(begin, end, &hints, &resolved);
                if (!ret)
                        break;
                /* The set of return codes that are "permanent failures". All other possibilities are potentially transient.
                 *
                 * This is according to https://sourceware.org/glibc/wiki/NameResolver which states:
                 *      "From the perspective of the application that calls getaddrinfo() it perhaps
                 *       doesn't matter that much since EAI_FAIL, EAI_NONAME and EAI_NODATA are all
                 *       permanent failure codes and the causes are all permanent failures in the
                 *       sense that there is no point in retrying later."
                 *
                 * So this is what we do, except FreeBSD removed EAI_NODATA some time ago, so that's conditional.
                 */
                if (ret == EAI_NONAME || ret == EAI_FAIL ||
                        #ifdef EAI_NODATA
                                ret == EAI_NODATA ||
                        #endif
                                (retries >= 0 && !retries--)) {
                        free(mutable);
                        fprintf(stderr, "%s: `%s'\n", ret == EAI_SYSTEM ? strerror(errno) : gai_strerror(ret), value);
                        return false;
                }
                fprintf(stderr, "%s: `%s'. Trying again in %.2f seconds...\n", ret == EAI_SYSTEM ? strerror(errno) : gai_strerror(ret), value, timeout / 1000000.0);
                usleep(timeout);
        }

The retry is not configurable for now.
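
To get a feel for how long that loop can block, here is a minimal Python sketch of the sleep schedule it implies: the delay starts at 1,000,000 µs and grows by a factor of 6/5 after each failed attempt, capped at 20,000,000 µs. The function names are made up for illustration, and the sketch assumes the `retries` counter starts at the number passed in (its initialisation is in the elided part of the excerpt). It counts only the usleep() time, not the time getaddrinfo() itself spends waiting for DNS.

```python
def backoff_delays_us(retries):
    """Yield the usleep() delays, in microseconds, before giving up.

    Mirrors the C loop: one sleep per failed attempt until `retries`
    reaches zero, with the same integer arithmetic (timeout * 6 / 5).
    """
    timeout = 1_000_000
    for _ in range(retries):
        yield timeout
        timeout = min(20_000_000, timeout * 6 // 5)


def total_sleep_seconds(retries):
    """Total time spent sleeping between attempts for one endpoint."""
    return sum(backoff_delays_us(retries)) / 1_000_000


if __name__ == "__main__":
    for n in (1, 5):
        print(n, "retries:", total_sleep_seconds(n), "seconds of sleep per endpoint")
```

The sleep time adds up per peer, since each wg set call resolves its endpoint in turn.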

UPDATE
WG_ENDPOINT_RESOLUTION_RETRIES=5 wg set wg0 peer peer0  endpoint xxx:ppp

This works when running the command manually, and when it is passed to _cmd() and executed via reset wireguard, but it does not work on boot.
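
For reference, the same invocation from Python might look like the sketch below. It relies only on the WG_ENDPOINT_RESOLUTION_RETRIES variable and the wg set syntax shown above. The helper names are hypothetical and this is not the code from the PR or the VyOS _cmd() helper.

```python
import os
import subprocess


def build_wg_set(interface, public_key, host, port, retries):
    """Return (argv, env) for re-setting one peer endpoint with a bounded retry count."""
    argv = ["wg", "set", interface, "peer", public_key,
            "endpoint", f"{host}:{port}"]
    # Copy the current environment and add the retry limit read by wg.
    env = dict(os.environ, WG_ENDPOINT_RESOLUTION_RETRIES=str(retries))
    return argv, env


def reset_endpoint(interface, public_key, host, port, retries=5):
    """Run wg set once; return its exit code instead of raising on failure."""
    argv, env = build_wg_set(interface, public_key, host, port, retries)
    return subprocess.run(argv, env=env, check=False).returncode
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without touching a real interface.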

Hi @sskaje!

In reference to T1700 and other tickets, there are things in WireGuard that are not implemented in the "best way".
First, as noted earlier, the WireGuard kernel module has NO knowledge of the existence of a "DNS peer". DNS-to-IP mapping is done by the wg config utility at the moment the command is executed on the device. This means that entering a DNS name as a peer address triggers a DNS request ONCE, and it is never retried.

On boot, this DNS resolution is done during the initial commit to VyOS, interleaved with interface setup etc. If the router uses DHCP on its upstream, it is likely that the DHCP process has not completed by the time wg tries to set up the peer,
resulting in wg not getting a DNS response; it might then block the VyOS commit while waiting for a "DNS timeout".

If you ask me, there should never be a "blocking application" executed during a VyOS commit, as this could make a commit take an enormous amount of time depending on the config being committed.
Also, if you make wg "try harder" to resolve the DNS name, it could result in a deadlock where wg can never resolve the name and the commit can't continue before the resolution is done.
In either case you end up with a peer not being configured correctly.

In my mind the correct answer is not to do this DNS resolution at commit time, but to create a "wireguard-dns-resolver" daemon, spawned by the VyOS configurator at commit time, that is responsible for trying to resolve the DNS name multiple times. It could also retry indefinitely until the name resolves. This would also make it possible to re-resolve a "dead peer", i.e. X minutes after the last packet/heartbeat was seen.
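
As a rough illustration of the retry part of such a daemon, here is a minimal Python sketch that keeps resolving a name outside the commit path until it succeeds. Everything in it (function name, delays, the injectable resolver and sleep arguments) is an assumption made for the example; it is not an existing VyOS component.

```python
import socket
import time


def resolve_until_success(host, port, resolver=socket.getaddrinfo,
                          sleep=time.sleep, delay=5.0, max_delay=300.0,
                          max_attempts=None):
    """Retry a DNS lookup until it succeeds.

    Returns the first (address, port) pair, or None if max_attempts
    is given and exhausted. With max_attempts=None it retries forever.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            infos = resolver(host, port, type=socket.SOCK_DGRAM)
            sockaddr = infos[0][4]
            return sockaddr[0], sockaddr[1]
        except socket.gaierror:
            # Lookup failed: wait, then back off up to max_delay.
            sleep(delay)
            delay = min(max_delay, delay * 2)
    return None
```

Because the loop runs in its own process, a slow or missing uplink only delays the endpoint update, not the commit.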

In T4930#208083, @runar wrote:

> Hi @sskaje!
>
> In reference to T1700 and other tickets, there are things in WireGuard that are not implemented in the "best way".
> First, as noted earlier, the WireGuard kernel module has NO knowledge of the existence of a "DNS peer". DNS-to-IP mapping is done by the wg config utility at the moment the command is executed on the device. This means that entering a DNS name as a peer address triggers a DNS request ONCE, and it is never retried.

That is already known; it's why I use scripts to redo wg set on my ubnt and servers, and also why I created the reset wireguard command.
And "ONCE, never retried" doesn't matter: if the connection has been established and is kept alive by something like crontab + ping, IP/port changes are handled automatically by wg. This is also part of the strategy I use to bypass inspection by some of my cloud providers' firewalls and the national firewall.

> On boot, this DNS resolution is done during the initial commit to VyOS, interleaved with interface setup etc. If the router uses DHCP on its upstream, it is likely that the DHCP process has not completed by the time wg tries to set up the peer,
> resulting in wg not getting a DNS response; it might then block the VyOS commit while waiting for a "DNS timeout".
>
> If you ask me, there should never be a "blocking application" executed during a VyOS commit, as this could make a commit take an enormous amount of time depending on the config being committed.
> Also, if you make wg "try harder" to resolve the DNS name, it could result in a deadlock where wg can never resolve the name and the commit can't continue before the resolution is done.
> In either case you end up with a peer not being configured correctly.

Bringing up the WG interface and setting up the peer without an endpoint is enough for most cases; let wg try, but don't block the boot procedure.
I have 5 WG interfaces and 5 peers whose endpoints require a domain name to establish the connection. When I disconnect all of my VM's network interfaces and reboot, startup is stuck for 300 seconds waiting for wg to retry DNS resolution.

At first I planned to add an extra config entry, like host-name or domain-name, at the same level as address, with address + port taking priority over host-name + port, but that is too complex for users who have simple and stable networks.

Most people just need to set it the way the ubnt WireGuard module does, peer xxx endpoint domain:port. Telling them what may happen if DNS is not working, and trying not to let wg block on boot, is enough.

1-1.png (1×2 px, 371 KB)

1-2.png (1×2 px, 363 KB)

The best solution so far is to change wg so that we can keep it from retrying that many times. I'll work on that.

> In my mind the correct answer is not to do this DNS resolution at commit time, but to create a "wireguard-dns-resolver" daemon, spawned by the VyOS configurator at commit time, that is responsible for trying to resolve the DNS name multiple times. It could also retry indefinitely until the name resolves. This would also make it possible to re-resolve a "dead peer", i.e. X minutes after the last packet/heartbeat was seen.

You can't find one way of re-resolving the peer domain that suits everyone. For home users on PPPoE, ppp/ip-up.d is a good place to trigger DNS resolution, but others with different and complex network setups just need the ability to make wg redo DNS resolution.

With 5 endpoints using a domain and retries limited to 5, total startup costs around 5 * 10 = 50 seconds.

Before:

1-3.png (1×2 px, 616 KB)

After:

1-4.png (1×2 px, 633 KB)

The retry could be set lower.

Hi!

I do not like the concept of doing this inline in the middle of a commit,
as it will halt the commit phase for a potentially long time (relatively speaking) if DNS is not up and running.
This in itself is not that critical, but if the same is done in multiple subsystems you can potentially get an exponential increase in boot time.
And at a time when we are optimising milliseconds of code to get shorter boot and commit times in other subsystems, I feel this is not the correct way to do it.

To do this properly, you need to fork the DNS-requesting jobs off into their own process that does not run inline with the commit scripts.
The best fit for this is a small config file and a small daemon started by systemd.

On another note:
as a "hotfix" for not being allowed to create a peer without a peer address, you could set the peer address to 0.0.0.0. Could a peer also be added with this address if the user does not specify one?

I'm also attaching a small example script for doing the DNS resolution: https://gist.github.com/runborg/9511113fbcc17897e09e40ceba0828f5 (this is done in bash and cron)

In T4930#208505, @runar wrote:

> Hi!
>
> I do not like the concept of doing this inline in the middle of a commit,
> as it will halt the commit phase for a potentially long time (relatively speaking) if DNS is not up and running.
> This in itself is not that critical, but if the same is done in multiple subsystems you can potentially get an exponential increase in boot time.
> And at a time when we are optimising milliseconds of code to get shorter boot and commit times in other subsystems, I feel this is not the correct way to do it.

In my opinion, for a router the most important things are that the WireGuard interface is ready and the peers are ready (endpoint excluded):

  • if the other side of the peer can initiate the connection, it will get connected,
  • if VyOS needs to initiate the connection, all we need is a delayed task that redoes wg set wgX peer peerX endpoint domain:port

but I don't think we can define the "delay".

As for boot time, I've updated the code in https://github.com/vyos/vyos-1x/pull/4200
I've tested that with the retry count set to 1, the added boot time is 1 second per peer configured with a domain for its endpoint.
For me, with 5 such peers, that is 5 seconds.

> To do this properly, you need to fork the DNS-requesting jobs off into their own process that does not run inline with the commit scripts.
> The best fit for this is a small config file and a small daemon started by systemd.

There is no need to fork the DNS-requesting jobs off into their own process; just give operators/administrators the ability to handle this, e.g. a manual reset wireguard or task-scheduler (crontab) + vbash scripting.

> On another note:
> as a "hotfix" for not being allowed to create a peer without a peer address, you could set the peer address to 0.0.0.0. Could a peer also be added with this address if the user does not specify one?
>
> I'm also attaching a small example script for doing the DNS resolution: https://gist.github.com/runborg/9511113fbcc17897e09e40ceba0828f5 (this is done in bash and cron)

I don't agree with the 'refresh peer' idea in this script; it would ruin connections with peers that use DDNS. Also, getting $IP by digging is stupid, and it can't handle any CNAMEs.

> I don't agree with the 'refresh peer' idea in this script; it would ruin connections with peers that use DDNS. Also, getting $IP by digging is stupid, and it can't handle any CNAMEs.

This script was handed over as an example of how it could potentially be done, not as "this is how you should do it", and yes, there is room for improvement. But calling digging stupid is not correct in my mind, as it does exactly what it is set up to do.
The issue you noted can be fixed by using tail -1 instead of head -1; that way you get the last element in the list, i.e. the address that the CNAME points to.
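
To illustrate the head -1 versus tail -1 point, here is a small Python sketch that picks the address out of dig +short style output, where CNAME targets are listed before the final address records. It does not reproduce the gist; the function name and sample hostnames are made up, and the output format is an assumption about how the script calls dig.

```python
import ipaddress


def address_from_dig_short(output):
    """Return the last line of the output that parses as an IP address, or None."""
    for line in reversed(output.splitlines()):
        candidate = line.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # CNAME target or blank line, keep looking
        return candidate
    return None


if __name__ == "__main__":
    sample = "ddns.example.net.\n192.0.2.7\n"
    print(address_from_dig_short(sample))
```

Checking that the chosen line really is an address also covers the case where only a CNAME comes back and nothing resolves.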

> There is no need to fork the DNS-requesting jobs off into their own process; just give operators/administrators the ability to handle this, e.g. a manual reset wireguard or task-scheduler (crontab) + vbash scripting.

Well, here we disagree...
First, on using task-scheduler: if you create a solution that should handle resolving a DNS name, why would you rely on the user to write custom scripts for it? That is already possible today without any modification, so then I do not see the reason for this ticket.

> like manually reset

You can't guarantee that the topology allows the administrator to connect back to the device before the tunnel is up, e.g. if the device is located behind a firewall or NAT device that the administrator does not control in any way. With this in mind, you need a solution that can "self-heal", or at least one that has a high rate of success independent of other factors.

> but I don't think we can define the "delay".

Completely agree, and that's why you need to fork it out to another script/daemon that handles this.

> I've tested that with the retry count set to 1, the added boot time is 1 second per peer configured with a domain for its endpoint.
> For me, with 5 such peers, that is 5 seconds.

You can't guarantee that this will always be the case. Let me try to give some examples:

  1. Following the VyOS priority list (/opt/vyatta/sbin/priority.pl), interfaces/wireguard has a priority of 379, so every service with a priority higher than 379 will not yet have executed when interfaces/wireguard executes. One of these is system/name-server, with a priority of 400. This means that if your device has a manually configured name-server (not from DHCP), you will not be able to resolve DNS names. The same applies if your upstream needs interfaces/tunnel (380) or interfaces/vti (381). You will actually delay initialisation of all services with a priority higher than 379 while waiting for DNS replies.
  2. If your internet connection is not available at boot time, you will try to resolve and soon stop trying, before your connection becomes available. This could be because of a power outage on your provider's equipment, because your uplink is a WWAN interface, or because of any other connection that needs more time to come into service.

Because of these two examples, we can never know when the internet uplink will be available; it can take 1 second or it can take 5 hours. This means you need to be able to retry DNS resolution until 1) the domain name is resolved or 2) a WireGuard peer connects back to you. That's why you need to handle this outside of the commit cycle, in an external program/daemon. Delaying the commit cycle is not a viable option in this regard.
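
The ordering argument in the first example can be sketched in a few lines of Python. The four priorities are the ones quoted above; the dictionary and function name are only for illustration and are not how priority.pl represents them.

```python
# Priorities as quoted in the comment above (lower runs earlier).
PRIORITIES = {
    "interfaces/wireguard": 379,
    "interfaces/tunnel": 380,
    "interfaces/vti": 381,
    "system/name-server": 400,
}


def not_yet_applied(node, priorities=PRIORITIES):
    """Return the nodes that have not run yet when `node` executes, in commit order."""
    cutoff = priorities[node]
    later = [(prio, name) for name, prio in priorities.items() if prio > cutoff]
    return [name for prio, name in sorted(later)]


if __name__ == "__main__":
    print(not_yet_applied("interfaces/wireguard"))
```

Anything in the returned list, including a statically configured name server, is unavailable to a wg set call made during the wireguard step.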

Doing this in an external daemon also makes it possible to implement more features. For example, a feature that re-resolves and checks whether the DNS name has changed if you lose connection with the WireGuard peer over time (e.g. after 5-10+ minutes of no replies). You could also cycle back and forth between DNS resolution and the last known good address until you get something that responds. I am not saying you need all these features in your scenario, but these are things that become possible when you have a control process running in the background.

@runar btw, we have a Python script for the priorities: /usr/libexec/vyos/priority.py

> @runar btw, we have a Python script for the priorities: /usr/libexec/vyos/priority.py

Thanks for this! :) I wasn't aware of a new implementation of the old Perl script!

This comment was removed by c-po.

> Does anyone have any thoughts on the best place to start adding this functionality / design ideas for this feature?
>
> @c-po I'm curious, does using a hub like you suggest mean all data gets proxied through the hub, or is the hub enough to facilitate the connections and then the clients talk directly to each other?

Everything traverses through the hub.

@runar @sskaje

In general I like the idea, and it's a very useful addition. Given that WireGuard is designed and implemented to be easy, lightweight, and not cluttered with thousands of config options, the design choice is to move everything requiring brains out of the WG core code.

In the 1.5 development cycle, which is current and will become stream, there are multiple enhancements to the DNS/FQDN handling, mainly for firewall and NAT:

For example:

  • set firewall group domain-group foo address bar.com
  • set nat destination rule 10 source fqdn

Both spawn a process named:

root       17818  6.7  1.2  64572 51968 ?        Ss   07:32   0:00 /usr/bin/python3 -u /usr/libexec/vyos/vyos-domain-resolver.py

So @sskaje, maybe you can familiarize yourself with the generic daemon and use it to update your WG remote peer IP address once the DNS record changes.

Rough idea:

  • If a peer is created with a hostname instead of an IP, do not add it during commit
  • vyos-domain-resolver.py should take care of dynamically adding/removing peers that have a hostname configured

In T4930#208881, @c-po wrote:

> @runar @sskaje
>
> In general I like the idea, and it's a very useful addition. Given that WireGuard is designed and implemented to be easy, lightweight, and not cluttered with thousands of config options, the design choice is to move everything requiring brains out of the WG core code.

Not the WG core, just the CLI tool wg.

> In the 1.5 development cycle, which is current and will become stream, there are multiple enhancements to the DNS/FQDN handling, mainly for firewall and NAT:
>
> For example:
>
>   • set firewall group domain-group foo address bar.com
>   • set nat destination rule 10 source fqdn
>
> Both spawn a process named:
>
> root       17818  6.7  1.2  64572 51968 ?        Ss   07:32   0:00 /usr/bin/python3 -u /usr/libexec/vyos/vyos-domain-resolver.py
>
> So @sskaje, maybe you can familiarize yourself with the generic daemon and use it to update your WG remote peer IP address once the DNS record changes.

I'm OK with updating the peer endpoint in a similar way, but

> Rough idea:
>
>   • If a peer is created with a hostname instead of an IP, do not add it during commit

I would insist on only not adding the endpoint; let the peer itself be set up even if it is configured with a hostname endpoint.

>   • vyos-domain-resolver.py should take care of dynamically adding/removing peers that have a hostname configured

Removing a peer would be a bad decision. The only thing to do is update a peer's endpoint if it uses a domain hostname and has not connected in the last XX seconds/minutes. Otherwise it would break connections where the DNS record or DDNS changed recently and the other side initiated the connection. (For me, both home and workplace use DDNS and each has the other's domain configured on ubnt.)

Here is how to get the latest handshake times, in seconds (Unix timestamps):

# wg show wg0 latest-handshakes
xxxw=    1732812147
xxx=    0

I don't see any documentation on wireguard.com about the handshake interval, but I have seen it said to be 120-180 seconds.

If latest_handshake == 0 or (current_time - latest_handshake) > 300, it should be safe enough to re-resolve and reconnect.
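
A minimal Python sketch of that check, applied to the wg show wg0 latest-handshakes output format shown above. The function name and the 300-second default are taken from this comment for illustration only; this is not the code in the PR.

```python
import time


def stale_peers(wg_output, now=None, max_age=300):
    """Return public keys whose latest handshake is missing (0) or older than max_age seconds."""
    now = int(time.time()) if now is None else now
    stale = []
    for line in wg_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        public_key, timestamp = fields[0], int(fields[1])
        if timestamp == 0 or now - timestamp > max_age:
            stale.append(public_key)
    return stale


if __name__ == "__main__":
    sample = "xxxw=\t1732812147\nxxx=\t0\n"
    print(stale_peers(sample, now=1732812200))
```

Only the peers returned here would have their endpoint re-resolved; peers with a recent handshake are left alone so a working DDNS connection is not disturbed.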

> I would insist on only not adding the endpoint; let the peer itself be set up even if it is configured with a hostname endpoint.

If it does not block the commit, that's fine with me.

> Removing a peer would be a bad decision. The only thing to do is update a peer's endpoint if it uses a domain hostname and has not connected in the last XX seconds/minutes. Otherwise it would break connections where the DNS record or DDNS changed recently and the other side initiated the connection. (For me, both home and workplace use DDNS and each has the other's domain configured on ubnt.)

If it does not block the initial commit, that's fine with me.

Code committed.

The flag file is separate per interface, like:

root@vyos:/home/vyos# ls -al /run/use-*
-rw-r--r-- 1 root vyattacfg 184 Nov 30 16:33 /run/use-vyos-domain-resolver-interfaces-wireguard-wg0
-rw-r--r-- 1 root vyattacfg 182 Nov 30 16:33 /run/use-vyos-domain-resolver-interfaces-wireguard-wg4

vyos@vyos# delete interfaces wireguard wg6 peer wg6-xxxx disable
[edit]
vyos@vyos# commit
[edit]
vyos@vyos# ls -al /run/use-vyos-*
-rw-r--r-- 1 root vyattacfg 184 Nov 30 16:33 /run/use-vyos-domain-resolver-interfaces-wireguard-wg0
-rw-r--r-- 1 root vyattacfg 182 Nov 30 16:33 /run/use-vyos-domain-resolver-interfaces-wireguard-wg4
-rw-r--r-- 1 root vyattacfg 169 Nov 30 16:34 /run/use-vyos-domain-resolver-interfaces-wireguard-wg6
[edit]
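
For anyone scripting against these flag files, a small Python sketch that lists the interfaces that currently have one. It relies only on the file naming visible in the listing above; the function names are hypothetical and the resolver daemon itself may consume the files differently.

```python
import glob
import os

# File name prefix as seen in the /run listing above.
FLAG_PREFIX = "use-vyos-domain-resolver-interfaces-wireguard-"


def interfaces_from_flags(paths):
    """Map flag file paths to the WireGuard interface names they refer to."""
    names = (os.path.basename(p) for p in paths)
    return sorted(n[len(FLAG_PREFIX):] for n in names if n.startswith(FLAG_PREFIX))


def flagged_interfaces(run_dir="/run"):
    """List interfaces that have a flag file under run_dir."""
    pattern = os.path.join(run_dir, FLAG_PREFIX + "*")
    return interfaces_from_flags(glob.glob(pattern))
```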