
Dropped TX packets since kernel change from 4.18.11 to 4.14.65 in VyOS on AWS
Closed, Resolved | Public | BUG

Description

I run a VyOS instance in an AWS VPC with two interfaces, acting as a VPN gateway (to HQ) and firewall.
The public interface sits in the public subnet with an Elastic IP; the second interface sits in the private subnet. All traffic from instances in the private subnet to the internet or HQ goes through the VyOS instance's private interface.

Since vyos-1.2.0-rolling+201810220337 (kernel 4.14.65) we see about 8-10% packet loss from instances in the private subnet (the test instance runs openSUSE Leap 15) to HQ and the internet (through the VyOS router).

Packet loss only occurs with packets smaller than 240 bytes; larger packets are not affected.
I switched to VyOS 1.2.0-rc3 (kernel 4.14.65): same problem.
I reverted to vyos-1.2.0-rolling+201810210337 (kernel 4.18.11): problem is gone.
Back to VyOS 1.2.0-rc3 (kernel 4.14.65): same problem again.
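
For anyone who wants to reproduce the size dependence, here is a minimal sketch of a ping sweep around the 240-byte boundary. It assumes iputils `ping` on the test instance; the function names, the size list and the count of 50 probes are illustrative choices, not something from the original test. Note that `-s` sets the ICMP payload size, so the IPv4 packet on the wire is 28 bytes larger, and the report does not say which of the two the 240-byte figure refers to.

```shell
#!/bin/sh
# Extract the loss percentage from an iputils-style ping summary on stdin,
# e.g. "50 packets transmitted, 45 received, 10% packet loss, time 49071ms".
parse_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Send 50 echo requests per payload size and print the loss for each size.
# The size list is an assumption chosen to straddle 240 bytes.
sweep() {
    target="$1"
    for size in 56 120 200 240 400 1000; do
        loss=$(ping -q -c 50 -s "$size" "$target" | parse_loss)
        echo "payload=${size}B loss=${loss}%"
    done
}

# Usage (run from an instance behind the VyOS router):
# sweep 1.1.1.1
```

If the behaviour matches the description, the small payload sizes should show loss while the large ones stay at 0%.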

In VyOS I see dropped TX packets on public and private interfaces:

vyos@vyos10:/var/log$ sudo ifconfig
eth0      Link encap:Ethernet  HWaddr 06:2a:e8:xx:xx:xx
          inet addr:172.16.101.16  Bcast:172.16.101.31  Mask:255.255.255.224
          inet6 addr: fe80::42a:e8ff:fe45:5dd6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:403401 errors:0 dropped:0 overruns:0 frame:0
          TX packets:475753 errors:0 dropped:6133 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:88966065 (84.8 MiB)  TX bytes:130760175 (124.7 MiB)

eth1      Link encap:Ethernet  HWaddr 06:3a:56:yy:yy:yy
          inet addr:172.16.100.10  Bcast:172.16.100.255  Mask:255.255.255.0
          inet6 addr: fe80::43a:56ff:fe8a:8de7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:358583 errors:0 dropped:0 overruns:0 frame:0
          TX packets:298060 errors:0 dropped:4788 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:75408931 (71.9 MiB)  TX bytes:59339562 (56.5 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:262 errors:0 dropped:0 overruns:0 frame:0
          TX packets:262 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:23284 (22.7 KiB)  TX bytes:23284 (22.7 KiB)
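
To put the counters above in proportion, the following sketch computes the ratio of the `dropped` counter to the `TX packets` counter per interface. It assumes the legacy net-tools `ifconfig` layout shown in this output (newer net-tools versions print a different format), and `tx_drop_ratio` is a hypothetical helper name.

```shell
#!/bin/sh
# Read legacy-format ifconfig output on stdin and print, per interface,
# the TX dropped counter as a percentage of the TX packets counter.
tx_drop_ratio() {
    awk '
    /^[^ \t]/ { iface = $1 }              # header lines start in column 1
    /TX packets:/ {
        split($2, p, ":")                 # $2 is "packets:N"
        split($4, d, ":")                 # $4 is "dropped:N"
        if (p[2] > 0)
            printf "%s %.2f%%\n", iface, 100 * d[2] / p[2]
    }'
}

# Usage:
# sudo ifconfig | tx_drop_ratio
```

For the output above this gives roughly 1.29% on eth0 and 1.61% on eth1. These counters are cumulative and cover all traffic, so they are expected to be lower than the 8-10% loss seen when only small packets are sent.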

Every time I test with small packets (for example ICMP echo smaller than 240 bytes), the TX dropped value increases on the public and private interfaces in parallel. I watched the counter live with:

watch -tn 1  "ifconfig -a | grep -A 5 eth1 | grep 'TX packets' | sed 's/^.* dropped:\\([0-9]\\{1,\\}\\) .*\$/\1/g'"
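
As an alternative that does not depend on the `ifconfig` output format, the same counter can be read from sysfs. This is only a sketch: `/sys/class/net/<iface>/statistics/tx_dropped` is the standard kernel path, while the function names and the `SYSFS_NET` override (handy for trying the function against a fake directory) are assumptions of this example.

```shell
#!/bin/sh
# Print the kernel's TX drop counter for one interface.
# SYSFS_NET is a hypothetical override for the sysfs base directory.
tx_dropped() {
    iface="$1"
    cat "${SYSFS_NET:-/sys/class/net}/$iface/statistics/tx_dropped"
}

# Print the counter and its increase once per second until interrupted.
watch_tx_dropped() {
    iface="$1"
    prev=$(tx_dropped "$iface")
    while sleep 1; do
        cur=$(tx_dropped "$iface")
        echo "$iface tx_dropped=$cur (+$((cur - prev)))"
        prev=$cur
    done
}

# Usage:
# watch_tx_dropped eth1
```

Printing the per-second increase makes it easier to line up drops with a ping or mtr run on the test instance.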

Here is an MTR run from the test instance in the private subnet to IP 1.1.1.1:

                              My traceroute  [vUNKNOWN]
aws-vm-test01 (172.16.100.12)                                     2018-10-25T11:45:18+0200
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                             Packets               Pings
 Host                                      Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 172.16.100.10                           0.0%   120    0.5   0.5   0.3   1.4   0.1
 2. ???
 3. ???
 4. ???
 5. ???
 6. ???
 7. 100.65.10.33                           10.8%   120    1.8   1.1   0.8  14.1   1.3
 8. ???
 9. ???
10. 52.93.7.108                             7.5%   120    3.4  11.5   2.7  50.4  12.1
11. 52.93.7.29                              5.9%   119    1.4   1.6   1.2   9.8   0.9
12. inex1.as13335.net                      10.9%   119    2.0   2.7   1.5  42.7   4.7
13. one.one.one.one                         9.2%   119    1.6   1.6   1.4   4.0   0.3

If you have specific questions, I can test again and report further information.

Details

Difficulty level
Unknown (require assessment)
Version
VyOS 1.2.0-rc3
Why the issue appeared?
Will be filled on close

Event Timeline

I also made two tcpdumps with VyOS 1.2.0-rc3 (kernel 4.14.65), one on eth0 (public interface) and one on eth1 (private interface).


VyOS:
t2.nano (only light network traffic in this VPC, and no such problems with 4.18 kernel versions)

Test instance:
t2.micro

Just retested with VyOS 1.2.0-rc4, same drops.

syncer triaged this task as Normal priority.
syncer edited projects, added VyOS 1.2 Crux (VyOS 1.2.0-rc6); removed VyOS 1.2 Crux.

Tested again on VyOS 1.2.0-rc5 (kernel 4.14.75), same packet loss.

In T935#24147, @Line2 wrote:

Tested again on VyOS 1.2.0-rc5 (kernel 4.14.75), same packet loss.

1.2.0-rolling+201810210337 has 4.18.11, not 4.14.
1.2.0-rolling+201810220337 has 4.14.
I could reproduce the issue in rc4, rc5 and 1.2.0-rolling+201810220337, all with 4.14.

vyos@VyOS-AMI:~$ sh ver | grep roll
Version: VyOS 1.2.0-rolling+201810210337
vyos@VyOS-AMI:~$ uname -r
4.18.11-amd64-vyos
vyos@VyOS-AMI:~$ sudo ifconfig | grep drop
RX packets:442 errors:0 dropped:0 overruns:0 frame:0
TX packets:351 errors:0 dropped:0 overruns:0 carrier:0
RX packets:179 errors:0 dropped:0 overruns:0 frame:0
TX packets:165 errors:0 dropped:0 overruns:0 carrier:0
RX packets:64 errors:0 dropped:0 overruns:0 frame:0
TX packets:64 errors:0 dropped:0 overruns:0 carrier:0
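
A quick way to tell which side of the regression a given image is on is to check the running kernel series. The sketch below only encodes what has been observed in this task (4.14.65 and 4.14.75 drop packets, 4.18.11 does not); `is_suspect_kernel` is a hypothetical helper, and a 4.14 match is a hint, not proof that an instance is affected.

```shell
#!/bin/sh
# Return 0 when a kernel release string belongs to the 4.14 series,
# the only series reported as affected in this task.
is_suspect_kernel() {
    case "$1" in
        4.14.*) return 0 ;;
        *)      return 1 ;;
    esac
}

release=$(uname -r)
if is_suspect_kernel "$release"; then
    echo "kernel $release: 4.14 series, TX drops were reported on this series"
else
    echo "kernel $release: not in the 4.14 series"
fi
```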

Line2 renamed this task from "Dropped TX packets since kernel change from 4.14.65 to 4.18.11 in VyOS on AWS" to "Dropped TX packets since kernel change from 4.18.11 to 4.14.65 in VyOS on AWS". Oct 30 2018, 7:14 AM
Line2 updated the task description.

Sorry for the mix-up of the kernel version numbers in my original post; I have corrected it. Good that you can reproduce the issue, though.

Any new findings in this case? I also searched the internet but have found no solution yet (besides kernel version 4.18.x).

I saw the change to kernel 4.19.0 (LTS) in VyOS-1.2.0-rolling+201811061700. I just updated to this version and tested again. The packet loss is gone in this version (same as with kernel 4.18.x)! Tested with mtr and ping (small packets).

Is it planned to switch to kernel 4.19 in 1.2.0-rc6 too?

Ok I see, RC6 is now on kernel 4.19. Very nice!