
GRE over IPsec tunnel breaks when OSPF is enabled and the tunnel source interface is in a VRF
Open, Low | Public | BUG

Description

Refer to T4031. My ASBR has two VRFs:

  1. cm_up to the Internet
  2. default for my backbone

My backbone has to use a GRE over IPsec tunnel across the Internet to reach my other routers. With the default settings I can't set up the tunnel, as described in T4031.
So I modified the strongswan systemd service like this:

ExecStart=/usr/sbin/ip vrf exec cm_up /usr/sbin/charon-systemd
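
One way to apply the same ExecStart change without editing the packaged unit file is a systemd drop-in. The sketch below is only an illustration: the function name `write_vrf_dropin`, the file name `vrf.conf` and the drop-in directory are assumptions, and VyOS may regenerate or replace files under /etc on an image upgrade.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that starts charon-systemd inside a VRF.
# write_vrf_dropin and the vrf.conf file name are hypothetical.

write_vrf_dropin() {
    vrf="$1"    # VRF name, e.g. cm_up
    dir="$2"    # e.g. /etc/systemd/system/strongswan.service.d (assumed path)
    mkdir -p "$dir" || return 1
    # The empty ExecStart= clears the packaged command first; a service
    # that is not Type=oneshot may only have one ExecStart= entry.
    cat > "$dir/vrf.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/ip vrf exec $vrf /usr/sbin/charon-systemd
EOF
}
```

Possible usage: `write_vrf_dropin cm_up /etc/systemd/system/strongswan.service.d`, followed by `systemctl daemon-reload` and `systemctl restart strongswan`.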

With this change the tunnel could be set up. When I set up OSPF over the GRE tunnel, it works fine at first. But after I reboot the instance, the IPsec tunnel comes up correctly while the GRE tunnel is broken: it can't send or receive packets.

I tried restarting the IPsec process, which didn't help. I also tried deleting the tunnel and recreating it, but that didn't help either.

However, when I delete the tunnel's OSPF announcement, then disable and re-enable the tunnel, it works. When I then re-add the tunnel's OSPF announcement, everything works smoothly.
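
For reference, the manual recovery sequence can be written out as VyOS configuration-mode commands. This is a sketch under assumptions: it treats the "OSPF announcement" as the `network 10.96.0.0/16` statement from the configuration below, the placement of the `commit` steps is a guess, and `print_recovery_steps` is a hypothetical helper that only prints the commands so they can be reviewed before being pasted into a configure session.

```shell
#!/bin/sh
# Sketch: print the manual recovery sequence as VyOS configuration-mode
# commands. print_recovery_steps is a hypothetical helper; it changes nothing.

print_recovery_steps() {
    tun="$1"     # tunnel interface, e.g. tun0
    area="$2"    # OSPF area, e.g. 0.0.0.0
    net="$3"     # network statement covering the tunnel (assumed)
    printf '%s\n' \
        "delete protocols ospf area $area network $net" \
        "set interfaces tunnel $tun disable" \
        "commit" \
        "delete interfaces tunnel $tun disable" \
        "commit" \
        "set protocols ospf area $area network $net" \
        "commit"
}
```

Possible usage: `print_recovery_steps tun0 0.0.0.0 10.96.0.0/16`.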

I don't know what causes this bug, but I'd love to fix the IPsec over VRF problem. I have no idea why OSPF breaks the GRE tunnel.

Here's my configuration:

vyos@bsp-asbr2-cm:~$ show conf
interfaces {
    dummy dum0 {
        address 192.168.127.32/32
        description "GRE over IPSec originate loopback"
        vrf cm_up
    }
    dummy dum1 {
        address 192.168.127.34/32
    }
    ethernet eth0 {
        address XXX.XXX.XX.100/25
        description "To China Mobile static access"
        hw-id 00:0c:29:33:09:da
        vrf cm_up
    }
    ethernet eth1 {
        address 192.168.124.1/28
        description "Downstream to vSRX"
        hw-id 00:0c:29:33:09:e4
    }
    ethernet eth2 {
        address 192.168.124.66/28
        description "MPLS BB between 2 HV"
        disable
        hw-id 00:0c:29:33:09:ee
    }
    ethernet eth3 {
        address 192.168.124.33/28
        description "MPLS BB originate from CM"
        hw-id 00:0c:29:33:09:f8
        vrf cm_up
    }
    loopback lo {
    }
    tunnel tun0 {
        address 10.96.255.9/30
        description "S2S VPN 1"
        encapsulation gre
        ip {
            adjust-mss clamp-mss-to-pmtu
        }
        mtu 1428
        remote 192.168.63.32
        source-address 192.168.127.32
        source-interface dum0
    }
}
nat {
    destination {
        rule 10 {
            destination {
                port 10000-64000
            }
            inbound-interface eth0
            protocol tcp_udp
            translation {
                address 192.168.124.34
            }
        }
    }
    source {
        rule 10 {
            outbound-interface eth0
            protocol all
            translation {
                address masquerade
            }
        }
    }
}
pki {
    key-pair ipsec-CDSLCM {
        private {
            key ****************
        }
        public {
            key ****************
        }
    }
    key-pair ipsec-CDSLCU {
        public {
            key ****************
        }
    }
    key-pair ipsec-JXNCCT {
        public {
            key ****************
        }
    }
}
protocols {
    ospf {
        area 0.0.0.0 {
            network 192.168.0.0/15
            network 10.96.0.0/16
        }
        parameters {
            router-id 192.168.127.32
        }
    }
}
qos {
    policy {
        shaper test {
            bandwidth 330mbit
            default {
                bandwidth 300mbit
                queue-type fair-queue
            }
        }
    }
}
service {
    ntp {
        allow-client {
            address 0.0.0.0/0
            address ::/0
        }
        server time1.vyos.net {
        }
        server time2.vyos.net {
        }
        server time3.vyos.net {
        }
    }
    ssh {
        listen-address 192.168.124.1
    }
}
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name bsp-asbr2-cm
    login {
        user vyos {
            authentication {
                encrypted-password ****************
            }
        }
    }
    name-server 114.114.114.114
    syslog {
        global {
            facility all {
                level info
            }
            facility protocols {
                level debug
            }
        }
    }
    time-zone Asia/Shanghai
}
vpn {
    ipsec {
        esp-group MyESPGroup {
            proposal 1 {
                encryption aes128
                hash aes128gmac
            }
        }
        ike-group MyIKEGroup {
            proposal 1 {
                dh-group 2
                encryption aes128
                hash sha1
            }
        }
        interface eth0
        site-to-site {
            peer JXNCCT {
                authentication {
                    local-id cdslcm.ras.meit.su
                    mode rsa
                    remote-id zion.lv2.pw
                    rsa {
                        local-key ****************
                        remote-key ****************
                    }
                }
                connection-type respond
                default-esp-group MyESPGroup
                ike-group MyIKEGroup
                local-address XXX.XXX.XX.100
                remote-address any
                tunnel 1 {
                    local {
                        prefix 192.168.127.32/32
                    }
                    remote {
                        prefix 192.168.63.32/32
                    }
                }
            }
        }
    }
}
vrf {
    name cm_up {
        protocols {
            static {
                route 0.0.0.0/0 {
                    next-hop XXX.XXX.XX.1 {
                    }
                }
            }
        }
        table 101
    }
}
vyos@bsp-asbr2-cm:~$

By the way, can we enable the mitigations=off kernel parameter by default on older hardware (such as Haswell/Broadwell) during installation?

Without it, the system's routing and IPsec performance drops to an unbearable level. For example, without mitigations=off the IPsec throughput on a D1521 is around 300 Mbps, with ksoftirqd fully occupying one CPU core.
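
To compare results it helps to confirm whether a given boot actually carries the parameter. The helper below is a small sketch; `has_mitigations_off` is a hypothetical name, and it only inspects a kernel command line string that is passed to it.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line contains mitigations=off.
# has_mitigations_off is a hypothetical helper name.

has_mitigations_off() {
    # Pad with spaces so the parameter is matched as a whole word.
    case " $1 " in
        *" mitigations=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   has_mitigations_off "$(cat /proc/cmdline)" && echo "mitigations disabled"
# Per-vulnerability status as seen by the kernel:
#   grep . /sys/devices/system/cpu/vulnerabilities/*
```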

Details

Difficulty level
Unknown (require assessment)
Version
1.4-rolling-202302150317
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Unspecified (please specify)

Event Timeline

By the way, in this rolling release, OSPF BFD on the tunnel doesn't work correctly either.

When I set BFD on the tunnel interface, routes are no longer calculated and installed in the routing table, whether the BFD state is up or down. I have to delete the BFD configuration, commit, save and reboot to bring it back up.

Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: authentication of 'domain1' with RSA_EMSA_PKCS1_SHA2_256 successful
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> peer supports MOBIKE
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: peer supports MOBIKE
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> authentication of 'domain2' (myself) with RSA_EMSA_PKCS1_SHA2_256 successful
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: authentication of 'domain2' (myself) with RSA_EMSA_PKCS1_SHA2_256 successful
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> IKE_SA JXNCCT[2] established between <pubIP2>[domain2]...<pubIP1>[domain1]
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: IKE_SA JXNCCT[2] established between <pubIP2>[domain2]...<pubIP1>[domain1]
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> scheduling rekeying in 28200s
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: scheduling rekeying in 28200s
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> maximum IKE_SA lifetime 31080s
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: maximum IKE_SA lifetime 31080s
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[CFG] <JXNCCT|2> selected proposal: ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: selected proposal: ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[KNL] <JXNCCT|2> received netlink error: Invalid argument (22)
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: received netlink error: Invalid argument (22)
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[KNL] <JXNCCT|2> unable to install source route for 192.168.127.32
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: unable to install source route for 192.168.127.32
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[IKE] <JXNCCT|2> CHILD_SA JXNCCT-tunnel-1{2} established with SPIs c4ba20f9_i c3ba4340_o and TS 192.168.127.32/32 === 192.168.63.32/32
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: CHILD_SA JXNCCT-tunnel-1{2} established with SPIs c4ba20f9_i c3ba4340_o and TS 192.168.127.32/32 === 192.168.63.32/32
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[ENC] <JXNCCT|2> generating IKE_AUTH response 1 [ IDr AUTH SA TSi TSr N(MOBIKE_SUP) N(NO_ADD_ADDR) ]
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: generating IKE_AUTH response 1 [ IDr AUTH SA TSi TSr N(MOBIKE_SUP) N(NO_ADD_ADDR) ]
Mar 16 12:47:29 bsp-asbr2-cm charon[45036]: 14[NET] <JXNCCT|2> sending packet: from <pubIP2>[4500] to <pubIP1>[4500] (476 bytes)
Mar 16 12:47:29 bsp-asbr2-cm charon-systemd[45036]: sending packet: from <pubIP2>[4500] to <pubIP1>[4500] (476 bytes)
Mar 16 12:47:59 bsp-asbr2-cm charon[45036]: 06[NET] <JXNCCT|2> received packet: from <pubIP1>[4500] to <pubIP2>[4500] (76 bytes)
Mar 16 12:47:59 bsp-asbr2-cm charon-systemd[45036]: received packet: from <pubIP1>[4500] to <pubIP2>[4500] (76 bytes)
Mar 16 12:47:59 bsp-asbr2-cm charon[45036]: 06[ENC] <JXNCCT|2> parsed INFORMATIONAL request 2 [ ]
Mar 16 12:47:59 bsp-asbr2-cm charon-systemd[45036]: parsed INFORMATIONAL request 2 [ ]
Mar 16 12:47:59 bsp-asbr2-cm charon[45036]: 06[ENC] <JXNCCT|2> generating INFORMATIONAL response 2 [ ]
Mar 16 12:47:59 bsp-asbr2-cm charon-systemd[45036]: generating INFORMATIONAL response 2 [ ]
Mar 16 12:47:59 bsp-asbr2-cm charon[45036]: 06[NET] <JXNCCT|2> sending packet: from <pubIP2>[4500] to <pubIP1>[4500] (76 bytes)
Mar 16 12:47:59 bsp-asbr2-cm charon-systemd[45036]: sending packet: from <pubIP2>[4500] to <pubIP1>[4500] (76 bytes)

When the IPsec connection is established, strongSwan reports that it is unable to install the source route.

Workaround: put these lines in /config/scripts/vyos-postconfig-bootup.script

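# Keep OSPF passive on the tunnel while IPsec and GRE are brought up in order.
vtysh -c "conf t" -c "int tun0" -c "ip ospf passive"
sleep 10

# Restart strongSwan while the GRE interface is down.
systemctl stop strongswan
ip link set tun0 down

sleep 5
systemctl start strongswan

# Wait until swanctl lists an SA (same "in" pattern as before).
a=""

while [ -z "$a" ]
do
    a=$(swanctl -l 2>/dev/null | grep in)
    sleep 1
    echo "Wait for Tunnel to be ready"
done

echo "Tunnel up"
sleep 20

# Bring GRE back up only after the IPsec SA exists.
ip link set tun0 up
echo "Hopefully done"

sleep 10
echo "Enabling OSPF"
vtysh -c "conf t" -c "int tun0" -c "no ip ospf passive"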
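
The wait loop in this workaround never gives up, so a boot script could hang if IPsec does not come up. A bounded variant might look like the sketch below; `wait_until` and `child_sa_listed` are hypothetical helper names, and the probe reuses the same `grep in` pattern as the script rather than a stricter match.

```shell
#!/bin/sh
# Sketch: poll a command until it succeeds or a time budget is used up.
# wait_until and child_sa_listed are hypothetical helpers, not part of VyOS.

wait_until() {
    timeout="$1"     # seconds of sleeping allowed before giving up
    interval="$2"    # seconds between probes (must be at least 1)
    shift 2
    elapsed=0
    # "$@" is the probe command; it runs once per iteration.
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Probe using the same pattern as the workaround script.
child_sa_listed() {
    swanctl -l 2>/dev/null | grep -q in
}
```

Possible usage in the boot script: `wait_until 120 1 child_sa_listed || echo "IPsec did not come up in time"`. Only the sleep time is counted, so the real wall-clock limit is slightly longer than the timeout.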

@c-po @Viacheslav

It seems this problem is not caused by IPsec but by the GRE implementation.

If I set the underlay interface of a GRE tunnel to an interface in a VRF, the GRE tunnel cannot send packets. Once it has received a packet from the remote side, however, it works normally.

By the way, I still don't know how to properly configure IPsec in multiple VRFs.