
pdns-recursor failing many previously working DNS lookups, failure rate of 10% after system upgrade
Open, Normal, Public

Description

After upgrading from 1.5-rolling-202412031443 to the latest rolling release, 2025.07.28-0022-rolling, DNS forwarding now fails frequently, causing significant issues on the network:

$ monitor log
Jul 31 17:01:20 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for net|A, timeouts: 4, throttles: 0, queries: 24, 7062msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981280.850" ecs="" mtid="1128" proto="udp" qname="v10.events.data.microsoft.com" qtype="A" remote="10.3.141.55:60715"
Jul 31 17:01:24 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for azure-dns.net|A, timeouts: 4, throttles: 4, queries: 29, 7495msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981284.314" ecs="" mtid="1133" proto="udp" qname="v10.events.data.microsoft.com" qtype="A" remote="10.3.141.55:41802"
Jul 31 17:01:26 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for ns-1225.awsdns-25.org|A, timeouts: 3, throttles: 0, queries: 30, 7027msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981286.639" ecs="" mtid="1136" proto="udp" qname="appmana-thanos.s3.dualstack.us-west-2.amazonaws.com" qtype="AAAA" remote="10.3.212.202:44259"
Jul 31 17:01:47 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for 270666084746.dkr.ecr.us-west-2.amazonaws.com|A, timeouts: 5, throttles: 1, queries: 36, 8131msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981307.876" ecs="" mtid="1143" proto="udp" qname="270666084746.dkr.ecr.us-west-2.amazonaws.com" qtype="A" remote="10.3.212.202:39113"
Jul 31 17:01:47 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for ns-1321.awsdns-37.org|A, timeouts: 5, throttles: 0, queries: 28, 8176msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981307.993" ecs="" mtid="1144" proto="udp" qname="270666084746.dkr.ecr.us-west-2.amazonaws.com" qtype="AAAA" remote="10.3.141.55:33408"
Jul 31 17:01:48 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for uk|A, timeouts: 4, throttles: 0, queries: 11, 7017msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981308.758" ecs="" mtid="1147" proto="udp" qname="270666084746.dkr.ecr.us-west-2.amazonaws.com" qtype="A" remote="10.3.212.202:36837"
Jul 31 17:01:48 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for 270666084746.dkr.ecr.us-west-2.amazonaws.com|AAAA, timeouts: 4, throttles: 0, queries: 13, 7018msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981308.759" ecs="" mtid="1146" proto="udp" qname="270666084746.dkr.ecr.us-west-2.amazonaws.com" qtype="AAAA" remote="10.3.141.55:54625"
Jul 31 17:01:49 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" error="Too much time waiting for 270666084746.dkr.ecr.us-west-2.amazonaws.com|AAAA, timeouts: 3, throttles: 0, queries: 12, 7049msec" subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981309.279" ecs="" mtid="1148" proto="udp" qname="270666084746.dkr.ecr.us-west-2.amazonaws.com" qtype="AAAA" remote="10.3.141.55:55427"
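
Each SERVFAIL line above records the name the recursor was waiting on, along with timeout, throttle, and query counters and the elapsed time. To tally these over a longer capture, here is a minimal sketch. It assumes the log format is exactly as pasted above; `parse_servfail` and both regexes are illustrative names, not part of pdns-recursor or VyOS.

```python
import re

# Matches the error="..." payload of a SERVFAIL line as it appears in the log above.
ERROR_RE = re.compile(
    r'error="Too much time waiting for (?P<waiting_for>[^|]+)\|(?P<wait_qtype>\w+), '
    r'timeouts: (?P<timeouts>\d+), throttles: (?P<throttles>\d+), '
    r'queries: (?P<queries>\d+), (?P<msec>\d+)msec"'
)
# Generic key="value" pairs (qname, qtype, remote, ...).
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_servfail(line):
    """Return a dict of the interesting fields, or None if the line does not match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    fields = dict(FIELD_RE.findall(line))
    return {
        "qname": fields.get("qname"),
        "qtype": fields.get("qtype"),
        "remote": fields.get("remote"),
        "waiting_for": m["waiting_for"],
        "timeouts": int(m["timeouts"]),
        "throttles": int(m["throttles"]),
        "queries": int(m["queries"]),
        "msec": int(m["msec"]),
    }


# First line of the log above, used as a sample.
SAMPLE = (
    'Jul 31 17:01:20 pdns-recursor[443320]: msg="Sending SERVFAIL during resolve" '
    'error="Too much time waiting for net|A, timeouts: 4, throttles: 0, queries: 24, 7062msec" '
    'subsystem="syncres" level="0" prio="Notice" tid="1" ts="1753981280.850" ecs="" '
    'mtid="1128" proto="udp" qname="v10.events.data.microsoft.com" qtype="A" '
    'remote="10.3.141.55:60715"'
)

if __name__ == "__main__":
    print(parse_servfail(SAMPLE))
```

Feeding every captured line through `parse_servfail` and grouping by `waiting_for` would show whether the timeouts cluster on particular zones or are spread evenly.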

Configuration:

$ show configuration | strip-private | cat
firewall {
    global-options {
        all-ping enable
        broadcast-ping disable
        ipv6-receive-redirects disable
        ipv6-src-route disable
        ip-src-route disable
        log-martians enable
        receive-redirects disable
        send-redirects enable
        source-validation disable
        syn-cookies enable
        twa-hazards-protection disable
    }
    group {
        interface-group WAN {
            interface eth0
            interface wlan0
        }
        network-group CALICO-NETS {
            network xxx.xxx.0.0/16
            network xxx.xxx.184.0/24
        }
    }
    ipv4 {
        forward {
            filter {
                default-action accept
                rule 3 {
                    action accept
                    description "Allow Established/Related Hairpin"
                    inbound-interface {
                        name eth2
                    }
                    outbound-interface {
                        name eth2
                    }
                    state established
                    state related
                }
                rule 4 {
                    action accept
                    description "Allow LAN to Calico Hairpin (New)"
                    destination {
                        group {
                            network-group CALICO-NETS
                        }
                    }
                    inbound-interface {
                        name eth2
                    }
                    outbound-interface {
                        name eth2
                    }
                    source {
                        address xxx.xxx.0.0/24
                    }
                    state new
                }
                rule 5 {
                    action jump
                    inbound-interface {
                        group WAN
                    }
                    jump-target OUTSIDE-IN
                }
            }
        }
        input {
            filter {
                default-action accept
                rule 5 {
                    action jump
                    inbound-interface {
                        group WAN
                    }
                    jump-target OUTSIDE-LOCAL
                }
            }
        }
        name OUTSIDE-IN {
            default-action drop
            rule 10 {
                action return
            }
            rule 20 {
                action return
                destination {
                    address xxx.xxx.184.99
                    port 80,443
                }
                protocol tcp
            }
        }
        name OUTSIDE-LOCAL {
            default-action drop
            rule 10 {
                action return
            }
            rule 20 {
                action return
                icmp {
                    type-name echo-request
                }
                protocol icmp
            }
            rule 40 {
                action return
                protocol esp
            }
            rule 41 {
                action return
                destination {
                    port 500
                }
                protocol udp
            }
            rule 42 {
                action return
                destination {
                    port 4500
                }
                protocol udp
            }
            rule 43 {
                action return
                destination {
                    port 1701
                }
                ipsec {
                    match-ipsec-in
                }
                protocol udp
            }
        }
    }
    ipv6 {
        forward {
            filter {
                default-action accept
                rule 5 {
                    action jump
                    inbound-interface {
                        name eth0
                    }
                    jump-target WAN_IN
                }
            }
        }
        input {
            filter {
                default-action accept
                rule 5 {
                    action jump
                    inbound-interface {
                        name eth0
                    }
                    jump-target WAN_LOCAL
                }
            }
        }
        name WAN_IN {
            default-action drop
            rule 10 {
                action return
            }
            rule 20 {
                action return
                protocol ipv6-icmp
            }
            rule 30 {
                action return
                destination {
                    port 546
                }
                protocol udp
                source {
                    port 547
                }
            }
        }
        name WAN_LOCAL {
            default-action drop
            rule 10 {
                action return
            }
            rule 20 {
                action return
                protocol ipv6-icmp
            }
            rule 30 {
                action return
                description "dhcpv6 packets"
                destination {
                    port 546
                }
                protocol udp
                source {
                    port 547
                }
            }
        }
    }
}
interfaces {
    ethernet eth0 {
        address dhcp
        description OUTSIDE
        hw-id xx:xx:xx:xx:xx:bf
    }
    ethernet eth2 {
        address xxx.xxx.0.1/24
        description INSIDE
        hw-id xx:xx:xx:xx:xx:25
    }
    loopback lo {
    }
    wireless wlan0 {
        address dhcp
        hw-id xx:xx:xx:xx:xx:9a
        mgmt-frame-protection optional
        mode ac
        physical-device phy0
        security {
            wpa {
                mode wpa+wpa2
                passphrase ****************
            }
        }
        ssid "Studio 305"
        type station
    }
}
load-balancing {
    wan {
        enable-local-traffic
        flush-connections
        interface-health eth0 {
            nexthop dhcp
            test 10 {
                target xxx.xxx.8.8
                type ping
            }
        }
        interface-health wlan0 {
            nexthop dhcp
            test 10 {
                target xxx.xxx.8.8
                type ping
            }
        }
        rule 10 {
            failover
            inbound-interface eth2
            interface eth0 {
            }
        }
        rule 20 {
            failover
            inbound-interface eth2
            interface wlan0 {
            }
        }
        sticky-connections {
            inbound
        }
    }
}
nat {
    destination {
        rule 10 {
            description "NGINX controller appmana-cluster-03 SONIC WAN"
            destination {
                port 80,443
            }
            inbound-interface {
                group WAN
            }
            protocol tcp
            translation {
                address xxx.xxx.184.99
            }
        }
        rule 110 {
            description "nat reflection: inside"
            destination {
                address xxx.xxx.127.2
                port 80,443
            }
            disable
            inbound-interface {
                name eth2
            }
            protocol tcp
            translation {
                address xxx.xxx.184.99
            }
        }
    }
    source {
        rule 110 {
            description "nat reflection: inside"
            destination {
                address xxx.xxx.0.0/8
            }
            disable
            outbound-interface {
                name eth2
            }
            protocol tcp
            source {
                address xxx.xxx.0.0/8
            }
            translation {
                address masquerade
            }
        }
        rule 120 {
            description "SNAT for LAN to Calico K8s"
            destination {
                group {
                    network-group CALICO-NETS
                }
            }
            outbound-interface {
                name eth2
            }
            source {
                address xxx.xxx.0.0/24
            }
            translation {
                address masquerade
            }
        }
    }
}
pki {
    certificate generated_https {
        certificate  ...
        private {
            key xxxxxx
        }
    }
}
policy {
    prefix-list NO-ADVERTISE-PREFIX {
        rule 5 {
            action deny
            prefix xxx.xxx.0.0/24
        }
        rule 10 {
            action permit
            le 32
            prefix xxx.xxx.0.0/0
        }
    }
    prefix-list NO-ADVERTISE-VPC-PREFIX {
        rule 5 {
            action deny
            prefix xxx.xxx.168.0/22
        }
        rule 10 {
            action permit
            le 32
            prefix xxx.xxx.0.0/0
        }
    }
    route-map calico {
        rule 2 {
            action permit
            set {
                as-path {
                    prepend "2 2 2"
                }
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                redistribute {
                    connected {
                    }
                }
            }
        }
        listen {
            range xxx.xxx.0.0/24 {
                peer-group calico
            }
        }
        neighbor xxx.xxx.0.2 {
            peer-group calico
        }
        neighbor xxx.xxx.0.3 {
            peer-group calico
        }
        neighbor xxx.xxx.0.6 {
            peer-group calico
        }
        neighbor xxx.xxx.0.9 {
            peer-group calico
        }
        neighbor xxx.xxx.0.11 {
            peer-group calico
        }
        neighbor xxx.xxx.0.15 {
            peer-group calico
        }
        neighbor xxx.xxx.0.19 {
            peer-group calico
        }
        neighbor xxx.xxx.0.20 {
            peer-group calico
        }
        neighbor xxx.xxx.0.22 {
            peer-group calico
        }
        neighbor xxx.xxx.0.41 {
            peer-group calico
        }
        neighbor xxx.xxx.0.50 {
            peer-group calico
        }
        neighbor xxx.xxx.0.56 {
            peer-group calico
        }
        neighbor xxx.xxx.0.57 {
            peer-group calico
        }
        neighbor xxx.xxx.0.58 {
            peer-group calico
        }
        neighbor xxx.xxx.0.59 {
            peer-group calico
        }
        neighbor xxx.xxx.0.60 {
            peer-group calico
        }
        neighbor xxx.xxx.0.61 {
            peer-group calico
        }
        neighbor xxx.xxx.0.67 {
            peer-group calico
        }
        neighbor xxx.xxx.124.249 {
            peer-group aws
            timers {
                holdtime 30
                keepalive 10
            }
        }
        neighbor xxx.xxx.230.41 {
            peer-group aws
            timers {
                holdtime 30
                keepalive 10
            }
        }
        parameters {
            bestpath {
            }
            distance {
                global {
                    external 200
                    internal 20
                    local 1
                }
            }
        }
        peer-group aws {
            address-family {
                ipv4-unicast {
                    prefix-list {
                        export NO-ADVERTISE-VPC-PREFIX
                    }
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            remote-as XXXXXX
        }
        peer-group calico {
            address-family {
                ipv4-unicast {
                    prefix-list {
                        export NO-ADVERTISE-PREFIX
                    }
                    soft-reconfiguration {
                        inbound
                    }
                }
            }
            remote-as XXXXXX
        }
        system-as 65000
    }
}
qos {
    interface eth0 {
        egress UPLOAD-SHAPER
    }
    policy {
        shaper UPLOAD-SHAPER {
            bandwidth 1gbit
            default {
                bandwidth 95%
                queue-type fq-codel
            }
        }
    }
}
service {
    dhcp-server {
        shared-network-name xxxxxx {
            subnet xxx.xxx.0.0/24 {
                ignore-client-id
                lease 86400
                option {
                    default-router xxx.xxx.0.1
                    domain-name xxxxxx
                    domain-search xxxxxx
                    name-server xxx.xxx.0.1
                }
                range 0 {
                    start xxx.xxx.0.2
                    stop xxx.xxx.0.200
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.10
                    mac xx:xx:xx:xx:xx:23
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.41
                    mac xx:xx:xx:xx:xx:47
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.3
                    mac xx:xx:xx:xx:xx:94
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.9
                    mac xx:xx:xx:xx:xx:0A
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.11
                    mac xx:xx:xx:xx:xx:f7
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.40
                    mac xx:xx:xx:xx:xx:65
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.32
                    mac xx:xx:xx:xx:xx:71
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.50
                    mac xx:xx:xx:xx:xx:15
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.15
                    mac xx:xx:xx:xx:xx:FA
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.19
                    mac xx:xx:xx:xx:xx:A2
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.20
                    mac xx:xx:xx:xx:xx:65
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.22
                    mac xx:xx:xx:xx:xx:60
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.6
                    mac xx:xx:xx:xx:xx:58
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.2
                    mac xx:xx:xx:xx:xx:47
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.53
                    mac xx:xx:xx:xx:xx:7c
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.56
                    mac xx:xx:xx:xx:xx:FC
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.57
                    mac xx:xx:xx:xx:xx:43
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.58
                    mac xx:xx:xx:xx:xx:B1
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.59
                    mac xx:xx:xx:xx:xx:81
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.60
                    mac xx:xx:xx:xx:xx:93
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.61
                    mac xx:xx:xx:xx:xx:B0
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.65
                    mac xx:xx:xx:xx:xx:7e
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.67
                    mac xx:xx:xx:xx:xx:18
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.37
                    mac xx:xx:xx:xx:xx:05
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.4
                    mac xx:xx:xx:xx:xx:75
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.8
                    mac xx:xx:xx:xx:xx:e9
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.17
                    mac xx:xx:xx:xx:xx:ed
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.7
                    mac xx:xx:xx:xx:xx:7c
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.13
                    mac xx:xx:xx:xx:xx:82
                }
                static-mapping xxxxxx {
                    ip-address xxx.xxx.0.73
                    mac xx:xx:xx:xx:xx:35
                }
                subnet-id 1
            }
        }
    }
    dns {
        forwarding {
            allow-from xxx.xxx.0.0/16
            allow-from xxx.xxx.0.0/8
            cache-size 0
            listen-address xxx.xxx.0.1
        }
    }
    https {
        api {
            keys {
                id public-ip-job-key-id {
                    key xxxxxx
                }
            }
            rest {
            }
        }
        listen-address xxx.xxx.0.1
        port 9090
    }
    ntp {
        allow-client xxxxxx
            address xxx.xxx.0.0/0
            address ::/0
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
        server xxxxx.tld {
        }
    }
    ssh {
        port 22
    }
}
/* test */
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name xxxxxx
    ip {
        multipath {
        }
    }
    login {
        user xxxxxx {
            authentication {
                encrypted-password xxxxxx
                plaintext-password xxxxxx
                public-keys xxxx@xxx.xxx {
                    key xxxxxx
                    type ssh-rsa
                }
                public-keys xxxx@xxx.xxx {
                    key xxxxxx
                    type ssh-rsa
                }
                public-keys xxxx@xxx.xxx {
                    key xxxxxx
                    type ssh-ed25519
                }
                public-keys xxxx@xxx.xxx {
                    key xxxxxx
                    type ssh-rsa
                }
            }
        }
    }
    name-server xxx.xxx.8.8
    name-server xxx.xxx.4.4
    static-host-mapping {
        host-name xxxxxx {
            inet xxx.xxx.0.10
        }
        host-name xxxxxx {
            inet xxx.xxx.0.10
        }
        host-name xxxxxx {
            inet xxx.xxx.0.41
        }
        host-name xxxxxx {
            inet xxx.xxx.0.41
        }
        host-name xxxxxx {
            inet xxx.xxx.0.3
        }
        host-name xxxxxx {
            inet xxx.xxx.0.3
        }
        host-name xxxxxx {
            inet xxx.xxx.0.9
        }
        host-name xxxxxx {
            inet xxx.xxx.0.9
        }
        host-name xxxxxx {
            inet xxx.xxx.0.11
        }
        host-name xxxxxx {
            inet xxx.xxx.0.11
        }
        host-name xxxxxx {
            inet xxx.xxx.0.40
        }
        host-name xxxxxx {
            inet xxx.xxx.0.40
        }
        host-name xxxxxx {
            inet xxx.xxx.0.32
        }
        host-name xxxxxx {
            inet xxx.xxx.0.32
        }
        host-name xxxxxx {
            inet xxx.xxx.0.50
        }
        host-name xxxxxx {
            inet xxx.xxx.0.50
        }
        host-name xxxxxx {
            inet xxx.xxx.0.15
        }
        host-name xxxxxx {
            inet xxx.xxx.0.15
        }
        host-name xxxxxx {
            inet xxx.xxx.0.19
        }
        host-name xxxxxx {
            inet xxx.xxx.0.19
        }
        host-name xxxxxx {
            inet xxx.xxx.0.20
        }
        host-name xxxxxx {
            inet xxx.xxx.0.20
        }
        host-name xxxxxx {
            inet xxx.xxx.0.22
        }
        host-name xxxxxx {
            inet xxx.xxx.0.22
        }
        host-name xxxxxx {
            inet xxx.xxx.0.6
        }
        host-name xxxxxx {
            inet xxx.xxx.0.6
        }
        host-name xxxxxx {
            inet xxx.xxx.0.2
        }
        host-name xxxxxx {
            inet xxx.xxx.0.2
        }
        host-name xxxxxx {
            inet xxx.xxx.0.53
        }
        host-name xxxxxx {
            inet xxx.xxx.0.53
        }
        host-name xxxxxx {
            inet xxx.xxx.0.56
        }
        host-name xxxxxx {
            inet xxx.xxx.0.56
        }
        host-name xxxxxx {
            inet xxx.xxx.0.57
        }
        host-name xxxxxx {
            inet xxx.xxx.0.57
        }
        host-name xxxxxx {
            inet xxx.xxx.0.58
        }
        host-name xxxxxx {
            inet xxx.xxx.0.58
        }
        host-name xxxxxx {
            inet xxx.xxx.0.59
        }
        host-name xxxxxx {
            inet xxx.xxx.0.59
        }
        host-name xxxxxx {
            inet xxx.xxx.0.60
        }
        host-name xxxxxx {
            inet xxx.xxx.0.60
        }
        host-name xxxxxx {
            inet xxx.xxx.0.61
        }
        host-name xxxxxx {
            inet xxx.xxx.0.61
        }
        host-name xxxxxx {
            inet xxx.xxx.0.65
        }
        host-name xxxxxx {
            inet xxx.xxx.0.65
        }
        host-name xxxxxx {
            inet xxx.xxx.0.67
        }
        host-name xxxxxx {
            inet xxx.xxx.0.67
        }
        host-name xxxxxx {
            inet xxx.xxx.0.37
        }
        host-name xxxxxx {
            inet xxx.xxx.0.37
        }
        host-name xxxxxx {
            inet xxx.xxx.0.4
        }
        host-name xxxxxx {
            inet xxx.xxx.0.4
        }
        host-name xxxxxx {
            inet xxx.xxx.0.8
        }
        host-name xxxxxx {
            inet xxx.xxx.0.8
        }
        host-name xxxxxx {
            inet xxx.xxx.0.17
        }
        host-name xxxxxx {
            inet xxx.xxx.0.17
        }
        host-name xxxxxx {
            inet xxx.xxx.0.7
        }
        host-name xxxxxx {
            inet xxx.xxx.0.7
        }
        host-name xxxxxx {
            inet xxx.xxx.0.13
        }
        host-name xxxxxx {
            inet xxx.xxx.0.13
        }
        host-name xxxxxx {
            inet xxx.xxx.0.73
        }
        host-name xxxxxx {
            inet xxx.xxx.0.73
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
        host-name xxxxxx {
            inet xxx.xxx.184.99
        }
    }
    sysctl {
        parameter net.ipv4.fib_multipath_hash_policy {
            value 1
        }
        parameter net.ipv6.fib_multipath_hash_policy {
            value 1
        }
    }
    syslog {
        local {
            facility all {
                level info
            }
            facility local7 {
                level debug
            }
        }
    }
    update-check {
        url xxxxxx
    }
    wireless {
        country-code xxxxxx
    }
}
vpn {
    ipsec {
        interface eth0
        options {
            disable-route-autoinstall
        }
    }
    l2tp {
        remote-access {
            authentication {
                local-users {
                    username xxxxxx {
                        password xxxxxx
                    }
                    username xxxxxx {
                        password xxxxxx
                    }
                    username xxxxxx {
                        password xxxxxx
                    }
                }
                mode local
                protocols mschap-v2
            }
            client-ip-pool default-range-pool {
                range xxx.xxx.1.2-xxx.xxx.1.254
            }
            default-pool default-range-pool
            gateway-address xxx.xxx.255.0
            ipsec-settings {
                authentication {
                    mode pre-shared-secret
                    pre-shared-secret xxxxxx
                }
            }
            name-server xxx.xxx.0.1
            outside-address xxx.xxx.127.2
        }
    }
}
$ show version
Version:          VyOS 2025.07.28-0022-rolling
Release train:    current
Release flavor:   generic

Built by:         autobuild@vyos.net
Built on:         Mon 28 Jul 2025 00:23 UTC
Build UUID:       ceedf106-f15f-4e48-a450-f3997fa245bb
Build commit ID:  4f5de07491c9f8

Architecture:     x86_64
Boot via:         installed image
System type:      bare metal
Secure Boot:      disabled

Hardware vendor:  HP
Hardware model:   HP EliteDesk 800 G5 Desktop Mini
Hardware S/N:     MXL95025NY
Hardware UUID:    800b5dc3-e6c8-ba65-0bcb-dc6bfdfbccb2

Copyright:        VyOS maintainers and contributors

Details

Version
VyOS 2025.07.28-0022-rolling
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

doctorpangloss triaged this task as Normal priority.
doctorpangloss created this object in space S1 VyOS Public.
# python3 <<'EOF'
> import socket
> import time
> import struct
> import random
> import sys
> 
> # --- Configuration ---
> DNS_SERVER = "8.8.8.8"
> DNS_PORT = 53
> QUERY_DOMAIN = "google.com"
> TIMEOUT = 1.0  # 1 second timeout
> 
> # --- Counters ---
> success_count = 0
> timeout_count = 0
> total_count = 0
> 
> def build_dns_query(domain_name):
>     # Standard DNS query header
>     transaction_id = random.randint(0, 65535)
>     flags = 0x0100  # Standard query
>     questions = 1
>     header = struct.pack('!HHHHHH', transaction_id, flags, questions, 0, 0, 0)
> 
>     # Question section
>     qname = b''
>     for part in domain_name.split('.'):
>         qname += struct.pack('B', len(part)) + part.encode('utf-8')
>     qname += b'\x00'  # End of QNAME
> 
>     qtype = 1  # A record
>     qclass = 1 # IN class
>     question = struct.pack('!HH', qtype, qclass)
> 
>     return header + qname + question
> 
> def print_stats():
>     global success_count, timeout_count, total_count
>     if total_count == 0:
>         return
>     
>     timeout_rate = (timeout_count / total_count) * 100
>     status_line = (
>         f"\rSuccess: {success_count} | "
>         f"Timeouts: {timeout_count} | "
>         f"Total: {total_count} | "
>         f"Timeout Rate: {timeout_rate:.2f}%  "
>     )
>     sys.stdout.write(status_line)
>     sys.stdout.flush()
> 
> print(f"--- Starting DNS timeout test against {DNS_SERVER} (Ctrl+C to stop) ---")
> 
> try:
>     while True:
>         total_count += 1
>         query = build_dns_query(QUERY_DOMAIN)
>         
>         sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>         sock.settimeout(TIMEOUT)
>         
>         try:
>             sock.sendto(query, (DNS_SERVER, DNS_PORT))
>             data, addr = sock.recvfrom(512)
>             success_count += 1
>         except socket.timeout:
>             timeout_count += 1
>         except Exception as e:
>             # Handle other potential errors, though timeout is the one we expect
>             timeout_count += 1
>         finally:
>             sock.close()
> 
>         print_stats()
>         time.sleep(1)
> 
> except KeyboardInterrupt:
>     print("\n--- Test stopped. Final Statistics ---")
>     print_stats()
>     print("\n")
> 
> EOF
--- Starting DNS timeout test against 8.8.8.8 (Ctrl+C to stop) ---
Success: 68 | Timeouts: 0 | Total: 68 | Timeout Rate: 0.00%  ^C
--- Test stopped. Final Statistics ---
Success: 68 | Timeouts: 0 | Total: 68 | Timeout Rate: 0.00%

Sending DNS packets directly to 8.8.8.8 works fine.

Whereas if I let applications use the system DNS resolver:

# python3 <<'EOF'
> import socket
> import time
> import sys
> import signal
> 
> # --- Configuration ---
> QUERY_DOMAIN = "github.com"
> TIMEOUT = 2.0  # <--- Timeout is now set to 2 seconds
> 
> # --- Counters ---
> success_count = 0
> timeout_count = 0
> total_count = 0
> timings = []
> 
> def timeout_handler(signum, frame):
>     # This function is called by the signal alarm to interrupt the blocking call
>     raise TimeoutError("DNS query timed out")
> 
> # Register the timeout handler for the ALARM signal
> signal.signal(signal.SIGALRM, timeout_handler)
> 
> def print_stats():
>     global success_count, timeout_count, total_count, timings
>     if total_count == 0:
>         return
>     
>     timeout_rate = (timeout_count / total_count) * 100
>     avg_latency = (sum(timings) / len(timings) * 1000) if timings else 0
>     
>     status_line = (
>         f"\rSuccess: {success_count} | "
>         f"Timeouts: {timeout_count} | "
>         f"Total: {total_count} | "
>         f"Timeout Rate: {timeout_rate:.2f}% | "
>         f"Avg Latency: {avg_latency:.2f} ms  "
>     )
>     sys.stdout.write(status_line)
>     sys.stdout.flush()
> 
> print(f"--- Starting system DNS test (resolving {QUERY_DOMAIN}) (Ctrl+C to stop) ---")
> 
> try:
>     while True:
>         total_count += 1
>         start_time = time.monotonic()
>         
>         # Set a 2-second alarm to enforce the timeout
>         signal.alarm(int(TIMEOUT))
>         
>         try:
>             # This uses the system's resolver (/etc/resolv.conf)
>             socket.gethostbyname(QUERY_DOMAIN)
>             
>             # If we get here, it was a success
>             end_time = time.monotonic()
>             timings.append(end_time - start_time)
>             # Prune timings list to keep it from growing indefinitely
>             if len(timings) > 100:
>                 timings.pop(0)
>             success_count += 1
>             
>         except (socket.gaierror, TimeoutError):
>             # A gaierror or our custom TimeoutError are caught here
>             timeout_count += 1
>         finally:
>             # IMPORTANT: Disable the alarm so it doesn't fire later
>             signal.alarm(0)
> 
>         print_stats()
>         time.sleep(1)
> 
> except KeyboardInterrupt:
>     print("\n--- Test stopped. Final Statistics ---")
>     print_stats()
>     print("\n")
> 
> EOF
--- Starting system DNS test (resolving github.com) (Ctrl+C to stop) ---
Success: 36 | Timeouts: 4 | Total: 40 | Timeout Rate: 10.00% | Avg Latency: 12.52 ms

About 10% of DNS lookups are failing. The failures come in periodic bursts, as if a buffer were filling up. To me this confirms that the problem is not my ISP but pdns-recursor (PowerDNS).
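The system-resolver test above counts `socket.gaierror` and the 2-second alarm together as "timeouts", while the recursor log shows SERVFAIL being sent only after 7 to 8 seconds. A probe that queries the forwarder directly, waits longer, and reads the response code could separate "SERVFAIL from pdns-recursor" from "no answer at all". The following is only a minimal sketch of that idea. The function names are made up for illustration, and the forwarder address in the usage note is a placeholder, not a value from this system.

```python
import random
import socket
import struct

# Common DNS response codes (RFC 1035); anything else is reported numerically.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1):
    """Return (transaction_id, packet) for a recursive query of class IN."""
    txid = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD flag set
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)


def classify_response(txid, data):
    """Map a raw DNS response to a short status string."""
    if len(data) < 12:
        return "MALFORMED"
    rid, flags = struct.unpack("!HH", data[:4])
    if rid != txid:
        return "MISMATCH"
    rcode = flags & 0x000F  # low 4 bits of the flags word
    return RCODES.get(rcode, f"RCODE{rcode}")


def probe(server, name, timeout=10.0, port=53):
    """Send one UDP query to `server` and report TIMEOUT, an error, or the rcode."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, port))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            return f"ERROR:{exc.errno}"
    return classify_response(txid, data)
```

Calling `probe("192.0.2.1", "github.com")` in a loop, with `192.0.2.1` replaced by the address the forwarder actually listens on, would show whether the failing 10% come back as SERVFAIL after several seconds or never come back at all. The 10-second default timeout is an assumption chosen to outlast the 7 to 8 seconds seen in the log.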

doctorpangloss renamed this task from Too much time waiting for... pdns-recursor failing many previously working DNS lookups to pdns-recursor failing many previously working DNS lookups, failure rate of 10% after system upgrade. Jul 31 2025, 5:51 PM

set service dns forwarding cache-size 10000
set service dns forwarding system

This appears to resolve the issue. Any insights?

doctorpangloss lowered the priority of this task from Normal to Low.
doctorpangloss raised the priority of this task from Low to Normal.

This issue persists, but it now manifests differently. About 10% of DNS lookups from LAN clients still fail, but the failures now affect roughly 10% of the names looked up rather than 10% of all queries.
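To check whether failures cluster on particular names or are spread evenly across queries, the results of a test run can be tallied per name. This is a hypothetical helper, not something run on this router. It assumes the test loop records one `(name, ok)` pair per lookup.

```python
from collections import defaultdict


def failure_rates(results):
    """Return (overall_rate, per_name_rates) from an iterable of (name, ok) pairs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for name, ok in results:
        totals[name] += 1
        if not ok:
            failures[name] += 1
    total = sum(totals.values())
    overall = sum(failures.values()) / total if total else 0.0
    per_name = {name: failures[name] / totals[name] for name in totals}
    return overall, per_name
```

If the overall rate is near 10% but a few names fail nearly every time while the rest never fail, the problem follows the name, which could point at cached state for those names. If every name fails about 10% of the time, it follows the query.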

set service dns forwarding cache-size 0

resolves this issue, so I think there is something really busted about pdns-recursor right now.

Many DNS timeouts persist. Is this interacting with WAN load balancing?