Page MenuHomeVyOS Platform

vti-up-down script brings down VTIs when Child SA is renegotiated or reestablished
Open, NormalPublicBUG

Description

The vti-up-down script brings down when the child SA is up.

I have observed two different triggers that makes the vti-up-down bring down the VTI.

  1. Remote host re-key outside re-key window. When remote side brings down the old(re-negotiated) Child_SA, the vti-up-down script brings down the vti
Jul 12 11:53:34 device_hostname charon[10308]: 07[ENC] <peer_VPN-PEER-IP|5136> parsed CREATE_CHILD_SA request 1412 [ N(ESP_TFC_PAD_N) SA No KE TSi TSr ]
Jul 12 11:53:34 device_hostname charon[10308]: 07[IKE] <peer_VPN-PEER-IP|5136> CHILD_SA peer_VPN-PEER-IP-vti{79344} established with SPIs ca20c445_i b69fd19a_o and TS 0.0.0.0/0 ::/0 === 0.0.0.0/0 ::/0
Jul 12 11:53:34 device_hostname vti-up-down[980158]: Interface vti4 up-client peer_VPN-PEER-IP-vt

Jul 12 11:53:37 device_hostname charon[10308]: 01[IKE] <peer_VPN-PEER-IP|5136> received DELETE for ESP CHILD_SA with SPI cbf2d60c
Jul 12 11:53:37 device_hostname charon[10308]: 01[IKE] <peer_VPN-PEER-IP|5136> CHILD_SA closed
Jul 12 11:53:37 device_hostname vti-up-down[980166]: Interface vti4 down-client peer_VPN-PEER-IP-vti
  1. DPD Timeout - Established new IKE_SA/Child_SA, and brings down VTI when old instance times out.
#596 starts DPD at 05:08:23
Jul 13 05:08:23 NAME-fw-p charon[13087]: 16[IKE] <peer_IP_ADDR|596> sending DPD request


#600 brings up new ike before the old one timed out. 
Jul 13 05:09:41 NAME-fw-p charon[13087]: 12[IKE] <peer_IP_ADDR|600> IKE_SA peer_IP_ADDR[600] established between Y.X.X.X_LOCAL_IP[Y.X.X.X_LOCAL_IP]...x.X.X.X_remote_IP[x.X.X.X_remote_IP]
Jul 13 05:09:41 NAME-fw-p charon[13087]: 12[IKE] <peer_IP_ADDR|600> CHILD_SA peer_IP_ADDR-vti{1959} established with SPIs c0a0c2ae_i 556f0b24_o and TS 0.0.0.0/0 === 0.0.0.0/0

#596 is killed(no IKE action in log), and VTI is brought down.
Jul 13 05:11:08 NAME-fw-p charon[13087]: 03[IKE] <peer_IP_ADDR|596> giving up after 5 retransmits
Jul 13 05:11:08 NAME-fw-p vti-up-down[2999080]: Interface vti7 down-client peer_IP_ADDR-vti

It is hard to reproduce an error that's triggered as a result of something else going wrong. I have mostly observed it for native Azure VPNs with rtt > 150ms ( Were the Azure DC is in North America and our DC is in Europe). Also once to a Palo Alto FW.

They still happen quite frequently - let me know if you want me to tune any parameters.

Example config - let me know what you need from me:

set vpn ipsec esp-group GROUP-NAME lifetime '3600'
set vpn ipsec esp-group GROUP-NAME mode 'tunnel'
set vpn ipsec esp-group GROUP-NAME pfs 'dh-group19'
set vpn ipsec esp-group GROUP-NAME proposal 1 encryption 'aes256'
set vpn ipsec esp-group GROUP-NAME proposal 1 hash 'sha256'

set vpn ipsec ike-group GROUP-NAME disable-mobike
set vpn ipsec ike-group GROUP-NAME key-exchange 'ikev2'
set vpn ipsec ike-group GROUP-NAME lifetime '28800'
set vpn ipsec ike-group GROUP-NAME proposal 1 dh-group '19'
set vpn ipsec ike-group GROUP-NAME proposal 1 encryption 'aes256'
set vpn ipsec ike-group GROUP-NAME proposal 1 hash 'sha256'

set vpn ipsec authentication psk peer_REMOTE-IP id 'REMOTE.ID'
set vpn ipsec authentication psk peer_REMOTE-IP id 'LOCAL.ID'
set vpn ipsec authentication psk peer_REMOTE-IP secret ''

set vpn ipsec site-to-site peer peer_REMOTE-IP authentication mode 'pre-shared-secret'
set vpn ipsec site-to-site peer peer_REMOTE-IP authentication remote-id 'REMOTE.IP'
set vpn ipsec site-to-site peer peer_REMOTE-IP connection-type 'initiate'
set vpn ipsec site-to-site peer peer_REMOTE-IP default-esp-group 'GROUP-NAME'
set vpn ipsec site-to-site peer peer_REMOTE-IP ike-group 'GROUP-NAME'
set vpn ipsec site-to-site peer peer_REMOTE-IP ikev2-reauth 'inherit'
set vpn ipsec site-to-site peer peer_REMOTE-IP local-address 'LOCAL.IP'
set vpn ipsec site-to-site peer peer_REMOTE-IP remote-address 'REMOTE.IP'
set vpn ipsec site-to-site peer peer_REMOTE-IP vti bind 'vti5'
set vpn ipsec site-to-site peer peer_REMOTE-IP vti esp-group 'GROUP-NAME'

set interfaces vti vti5 address '169.254.1.x/30'
set interfaces vti vti5 mtu '1396'

Details

Difficulty level
Unknown (require assessment)
Version
1.4.0-GA
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)
Forum thread
https://vyos-community.slack.com/archives/C01A6CJFW1F/p1720165471164579

Event Timeline

dmbaturin triaged this task as Normal priority.Jul 15 2024, 6:33 AM
dmbaturin changed Issue type from Unspecified (please specify) to Bug (incorrect behavior).

Just to add some context. This keeps on happening regularly, but for some reason more often on our trans-atlantic VPNs. Maybe once a week( it varies).

Log entires from a typical event ( the SA stays up, but vti is forced down). When this happen I have to bounce the Child SA to get the VTI back up, if not it will stay down.

Sep 05 17:41:05 charon[3018769]: 06[NET] <peer_PEER|2095> received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:05 charon[3018769]: 06[ENC] <peer_PEER|2095> parsed INFORMATIONAL request 4 [ ]
Sep 05 17:41:05 charon-systemd[3018769]: received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:05 charon[3018769]: 06[ENC] <peer_PEER|2095> generating INFORMATIONAL response 4 [ ]
Sep 05 17:41:05 charon[3018769]: 06[NET] <peer_PEER|2095> sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:05 charon-systemd[3018769]: sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:19 charon[3018769]: 09[NET] <peer_PEER|2095> received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:19 charon[3018769]: 09[ENC] <peer_PEER|2095> parsed INFORMATIONAL request 5 [ ]
Sep 05 17:41:19 charon-systemd[3018769]: received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:19 charon[3018769]: 09[ENC] <peer_PEER|2095> generating INFORMATIONAL response 5 [ ]
Sep 05 17:41:19 charon[3018769]: 09[NET] <peer_PEER|2095> sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:19 charon-systemd[3018769]: sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:33 charon[3018769]: 12[NET] <peer_PEER|2095> received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:33 charon-systemd[3018769]: received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:33 charon[3018769]: 12[ENC] <peer_PEER|2095> parsed INFORMATIONAL request 6 [ ]
Sep 05 17:41:33 charon[3018769]: 12[ENC] <peer_PEER|2095> generating INFORMATIONAL response 6 [ ]
Sep 05 17:41:33 charon[3018769]: 12[NET] <peer_PEER|2095> sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:33 charon-systemd[3018769]: sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:33 charon[3018769]: 14[IKE] <peer_PEER|2086> giving up after 5 retransmits
Sep 05 17:41:34 vti-up-down[2517521]: Interface vti1 down-client xxxxxx
Sep 05 17:41:47 charon[3018769]: 07[NET] <peer_PEER|2095> received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:47 charon[3018769]: 07[ENC] <peer_PEER|2095> parsed INFORMATIONAL request 7 [ ]
Sep 05 17:41:47 charon-systemd[3018769]: received packet: from xxx.xxx.75.65[500] to xxx.xxx.xxx.199[500] (80 bytes)
Sep 05 17:41:47 charon[3018769]: 07[ENC] <peer_PEER|2095> generating INFORMATIONAL response 7 [ ]
Sep 05 17:41:47 charon[3018769]: 07[NET] <peer_PEER|2095> sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)
Sep 05 17:41:47 charon-systemd[3018769]: sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.75.65[500] (80 bytes)

This time it seems like the remote side initiate a new IKE_SA without properly closing the old one. This tricked strongswan to keep both, but when the old one timed out. the vti-up-down script killed the VTI.

My thouughts

  • A timeout from a old IKE SA should not bring down the VTI when a the new IKE SA is built and up.
  • When something goes wrong, and the vti-up-down brings down the vti, it should not stay down forever

Adding log entire for the old ike_SA 2086: "received message ID 1036, expected 1037, ignored"
Sep 05 17:38:48 charon[3018769]: 08[IKE] <peer_PEER|2086> sending DPD request
Sep 05 17:38:48 charon[3018769]: 08[ENC] <peer_PEER|2086> generating INFORMATIONAL request 32 [ ]
Sep 05 17:38:48 charon[3018769]: 08[NET] <peer_PEER|2086> sending packet: from xxx.xxx.xxx.199[500] to xxx.xxx.xxx.65[500] (80 bytes)
Sep 05 17:38:49 charon[3018769]: 07[NET] <peer_PEER|2086> received packet: from xxx.xxx.75.65[500] to xxx.xxx.117.199[500] (80 bytes)
Sep 05 17:38:49 charon[3018769]: 07[IKE] <peer_PEER|2086> received message ID 1036, expected 1037, ignored