For at least a year now, I've been running a VPN between a remote Cisco and a local VyOS device. Every once in a while I noticed the tunnel drop and come back, but I didn't think too much of it.
I recently moved a backup process so that it runs over the VPN instead of SSH. Lo and behold, as soon as the transferred file reached roughly 4.2 GB, the tunnel would drop.
-rw-r--r-- 1 1001 1002 4415553536 Jan 20 21:16 backup_2019-01-20.tar
-rw-r--r-- 1 1001 1002 4211081216 Jan 25 21:13 backup_2019-01-25.tar
-rw-r--r-- 1 1001 1002 4255121408 Jan 26 21:15 backup_2019-01-26.tar
-rw-r--r-- 1 1001 1002 4434427904 Jan 27 21:15 backup_2019-01-27.tar
I did some testing, and this only occurs with VyOS on one end of the tunnel. With Mikrotik->Cisco, the files complete the transfer.
A manual test with rsync:
backup_2019-01-28.tar.gz 41% 4178MB 4.3MB/s 22:40 ETA
packet_write_wait: Connection to 10.0.0.80 port 22: Broken pipe
I found this in the logs.
Note that I have obscured the IPs: 77.0.0.2 = local, 88.0.0.2 = remote.
Jan 28 22:51:54 edge charon: 04[NET] received packet: from 88.0.0.2[500] to 77.0.0.2[500] (508 bytes)
Jan 28 22:51:54 edge charon: 04[ENC] parsed CREATE_CHILD_SA request 5 [ N(REKEY_SA) SA No TSi TSr ]
Jan 28 22:51:54 edge charon: 04[CFG] received proposals: ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ, ESP:DES_CBC/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ, ESP:3DES_CBC/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ, ESP:AES_CBC_128/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ, ESP:AES_CBC_192/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ, ESP:AES_CBC_256/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ
Jan 28 22:51:54 edge charon: 04[CFG] configured proposals: ESP:AES_CBC_128/HMAC_SHA1_96/MODP_1536/NO_EXT_SEQ
Jan 28 22:51:54 edge charon: 04[IKE] no acceptable proposal found
Jan 28 22:51:54 edge charon: 04[IKE] failed to establish CHILD_SA, keeping IKE_SA
Jan 28 22:51:54 edge charon: 04[ENC] generating CREATE_CHILD_SA response 5 [ N(NO_PROP) ]
Jan 28 22:51:54 edge charon: 04[NET] sending packet: from 77.0.0.2[500] to 88.0.0.2[500] (76 bytes)
Jan 28 22:52:54 edge charon: 08[NET] received packet: from 88.0.0.2[500] to 77.0.0.2[500] (76 bytes)
Jan 28 22:52:54 edge charon: 08[ENC] parsed INFORMATIONAL request 6 [ D ]
Jan 28 22:52:54 edge charon: 08[IKE] received DELETE for ESP CHILD_SA with SPI db82d63c
Jan 28 22:52:54 edge charon: 08[IKE] closing CHILD_SA peer-88.0.0.2-tunnel-0{47} with SPIs c4b5c9ed_i (4557325114 bytes) db82d63c_o (53194137 bytes) and TS 10.0.0.0/8 === 10.0.0.0/24
Jan 28 22:52:54 edge charon: 08[IKE] sending DELETE for ESP CHILD_SA with SPI c4b5c9ed
Jan 28 22:52:54 edge charon: 08[IKE] CHILD_SA closed
Jan 28 22:52:54 edge charon: 08[ENC] generating INFORMATIONAL response 6 [ D ]
Jan 28 22:52:54 edge charon: 08[NET] sending packet: from 77.0.0.2[500] to 88.0.0.2[500] (76 bytes)
Jan 28 22:52:54 edge charon: 11[NET] received packet: from 88.0.0.2[500] to 77.0.0.2[500] (76 bytes)
Jan 28 22:52:54 edge charon: 11[ENC] parsed INFORMATIONAL request 7 [ D ]
Jan 28 22:52:54 edge charon: 11[IKE] received DELETE for IKE_SA peer-88.0.0.2-tunnel-0[27]
Jan 28 22:52:54 edge charon: 11[IKE] deleting IKE_SA peer-88.0.0.2-tunnel-0[27] between 77.0.0.2[77.0.0.2]...88.0.0.2[88.0.0.2]
Jan 28 22:52:54 edge charon: 11[IKE] IKE_SA deleted
Jan 28 22:52:54 edge charon: 11[ENC] generating INFORMATIONAL response 7 [ ]
Jan 28 22:52:54 edge charon: 11[NET] sending packet: from 77.0.0.2[500] to 88.0.0.2[500] (76 bytes)
Fourteen seconds later (22:52:54 to 22:53:08), the tunnel comes back up as expected:
Jan 28 22:53:08 edge charon: 06[NET] received packet: from 88.0.0.2[500] to 77.0.0.2[500] (718 bytes)
Jan 28 22:53:08 edge charon: 06[ENC] parsed IKE_SA_INIT request 0 [ SA KE No V V N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) V ]
Jan 28 22:53:08 edge charon: 06[IKE] received Cisco Delete Reason vendor ID
Jan 28 22:53:08 edge charon: 06[IKE] received Cisco Copyright (c) 2009 vendor ID
Jan 28 22:53:08 edge charon: 06[IKE] received FRAGMENTATION vendor ID
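To put the numbers above side by side, here is a rough Python sketch that converts the file sizes, the progress-meter reading and the CHILD_SA byte counters from the log into decimal gigabytes. Two things in it are assumptions rather than facts from this setup: that the "MB" in the progress line means MiB (1024 * 1024 bytes), and that the Cisco side uses a traffic-volume lifetime of 4,608,000 kilobytes, which I understand to be a common IOS default but which is not shown anywhere in this post.

```python
# Values copied from the directory listing, the progress line and the charon log.
backup_files = {
    "backup_2019-01-20.tar": 4415553536,
    "backup_2019-01-25.tar": 4211081216,
    "backup_2019-01-26.tar": 4255121408,
    "backup_2019-01-27.tar": 4434427904,
}
progress_mb = 4178          # "4178MB" in the progress line; ASSUMED to mean MiB
sa_bytes_in = 4557325114    # c4b5c9ed_i counter when the CHILD_SA was closed
sa_bytes_out = 53194137     # db82d63c_o counter when the CHILD_SA was closed

# ASSUMPTION: volume-based lifetime on the Cisco side, in kilobytes.
# The Cisco config is not part of this post, so this value is hypothetical.
assumed_lifetime_kb = 4_608_000


def to_gb(n_bytes):
    """Convert a byte count to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


progress_bytes = progress_mb * 1024 * 1024
sa_total = sa_bytes_in + sa_bytes_out

for name, size in backup_files.items():
    print(f"{name}: {to_gb(size):.2f} GB")
print(f"transfer at failure: {to_gb(progress_bytes):.2f} GB")
print(f"CHILD_SA in + out:   {to_gb(sa_total):.2f} GB")
print(
    f"assumed lifetime:    {to_gb(assumed_lifetime_kb * 1000):.2f} GB (1 KB = 1000 B) "
    f"or {to_gb(assumed_lifetime_kb * 1024):.2f} GB (1 KB = 1024 B)"
)
```

If the assumed volume lifetime is right, the counters at teardown sit close to it, which would fit a rekey triggered by traffic volume rather than by the 28800-second timer. That is a hypothesis to check against the Cisco config, not a confirmed cause.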
The config:
ipsec {
    esp-group remote-site-esp {
        compression disable
        lifetime 28800
        mode tunnel
        pfs dh-group5
        proposal 1 {
            encryption aes128
            hash sha1
        }
    }
    ike-group remote-site-ike {
        dead-peer-detection {
            action restart
            interval 30
            timeout 30
        }
        ikev2-reauth no
        key-exchange ikev2
        lifetime 86400
        proposal 1 {
            dh-group 2
            encryption aes128
            hash sha1
        }
    }
    ipsec-interfaces {
        interface eth0
    }
    nat-networks {
        allowed-network 0.0.0.0/0 {
        }
    }
    nat-traversal enable
    site-to-site {
        peer 88.0.0.2 {
            authentication {
                mode pre-shared-secret
                pre-shared-secret password
            }
            connection-type initiate
            ike-group remote-site-ike
            ikev2-reauth inherit
            local-address 77.0.0.22
            tunnel 0 {
                allow-nat-networks disable
                allow-public-networks disable
                esp-group remote-site-esp
                local {
                    prefix 10.0.0.0/8
                }
                remote {
                    prefix 10.0.0.0/24
                }
            }
        }
    }
}
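One way to read the "no acceptable proposal found" line is to compare what this config should offer with what the Cisco sent in its rekey request. The Python sketch below is illustrative only: the keyword-to-transform tables are inferred from the "configured proposals" log line and cover just the values used here, and the DH check is a simplification of what charon actually does when it selects a proposal.

```python
# esp-group settings copied from the config above.
esp_group = {"encryption": "aes128", "hash": "sha1", "pfs": "dh-group5"}

# Hypothetical lookup tables: only the entries needed for this config,
# inferred from the "configured proposals" line in the log.
ENCRYPTION = {"aes128": "AES_CBC_128"}
HASH = {"sha1": "HMAC_SHA1_96"}
PFS = {"dh-group5": "MODP_1536"}  # DH group 5 is the 1536-bit MODP group

# Proposal strings copied verbatim from the charon log.
CONFIGURED_IN_LOG = "ESP:AES_CBC_128/HMAC_SHA1_96/MODP_1536/NO_EXT_SEQ"
RECEIVED_IN_LOG = [
    "ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ",
    "ESP:DES_CBC/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ",
    "ESP:3DES_CBC/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ",
    "ESP:AES_CBC_128/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ",
    "ESP:AES_CBC_192/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ",
    "ESP:AES_CBC_256/HMAC_SHA1_96/HMAC_MD5_96/NO_EXT_SEQ",
]


def expected_proposal(group):
    """Build the proposal string this esp-group should produce."""
    parts = [ENCRYPTION[group["encryption"]], HASH[group["hash"]]]
    if group.get("pfs") in PFS:
        parts.append(PFS[group["pfs"]])
    parts.append("NO_EXT_SEQ")
    return "ESP:" + "/".join(parts)


def dh_groups(proposal):
    """Return the MODP transforms named in one proposal string."""
    transforms = proposal.split(":", 1)[1].split("/")
    return {t for t in transforms if t.startswith("MODP_")}


expected = expected_proposal(esp_group)
print("config matches logged proposal:", expected == CONFIGURED_IN_LOG)
print("local DH group(s):", sorted(dh_groups(expected)))
for proposal in RECEIVED_IN_LOG:
    print(proposal, "->", sorted(dh_groups(proposal)) or "no DH group")
```

Run as-is, it reports that none of the six received proposals carries a DH group while the local one requires MODP_1536, so a mismatch in the PFS setting between the two ends looks like a candidate worth checking.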