
XCP-ng packet drops for small packets (e.g. icmp) under Xen and AWS
Open, High | Public | BUG

Description

Hi, I’m new. I’m not sure what needs to go here, but I found a problem and someone else on the forum confirmed it is real.

When I run netstat -i, it shows TX drops on the Ethernet interfaces. The same TX drops show up when I run ifconfig.

The drops can be reproduced by sending packets under 214 bytes in size. About 3.75% of small packets appear to be dropped. In the same test, packets over 215 bytes reliably show 0% loss.
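A minimal sketch of how such a size sweep could be scripted, assuming Linux iputils ping on a test host. The helper names parse_loss and sweep_sizes are hypothetical. Note that ping -s sets the ICMP payload size, which is 28 bytes smaller than the resulting IPv4 packet; the report does not say which of the two measures "214 bytes" refers to, so sizes on both sides of the threshold are worth trying.

```shell
# Extract the loss percentage from the summary of Linux iputils ping (stdin),
# e.g. prints "3.75" for a line containing "3.75% packet loss".
parse_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Ping TARGET once per payload size and print "size loss_percent".
# TARGET should sit behind the router so that the packets are forwarded
# from Ethernet to Ethernet, which is the path the report says is affected.
sweep_sizes() {
    target=$1
    shift
    for size in "$@"; do
        loss=$(ping -q -c 400 -i 0.2 -s "$size" "$target" | parse_loss)
        printf '%s %s\n' "$size" "$loss"
    done
}
```

A call such as sweep_sizes 192.0.2.10 64 150 186 214 250 1400 (the address is a placeholder) would print one line per size, so a loss that only appears below a certain size stands out.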

Tested under XCP-ng 8, XenServer 6.5, and AWS (which runs Xen).

It doesn’t happen on VirtualBox, VMware, or Hyper-V.

I suspect it’s related to the paravirtual I/O drivers (xen_netfront) or something similar.

Also, it only drops packets that are being forwarded from Ethernet to Ethernet. It doesn’t affect traffic that originates or terminates on the VyOS itself, and it doesn’t affect traffic from VPN to Ethernet.

I have a Xen lab available to anyone who wishes to tinker and test.

Details

Difficulty level
Hard (possibly days)
Version
1.3 rolling
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

https://phabricator.vyos.net/T935 Here’s the same thing happening in the past. I think it was resolved by doing kernel updates? Can someone do a kernel update in the rolling build?

There is no newer kernel than 4.19.124 on the 4.19.x train. Newer kernels do not work, as the out-of-tree Intel drivers for the NICs and QAT won’t compile for kernels >5.3, and that is not an LTS one.

In T2505#64889, @c-po wrote:

There is no newer kernel than 4.19.124 on the 4.19.x train. Newer kernels do not work, as the out-of-tree Intel drivers for the NICs and QAT won’t compile for kernels >5.3, and that is not an LTS one.

So if this is a kernel issue, I should have the same problem with the same kernel under Debian 10, right?

If this can be solved by a kernel update, there was talk about maybe having different build "flavors" in the past - one with all the hardware nic drivers, one without. The minimal image could then have the latest (5.x) kernel.
T2085 currently prevents us from testing any newer kernel ourselves: the kernel is built by Jenkinsfiles in the CI, so we would need to repeat the CI steps manually. In that task I proposed a shared script for these repositories that could be called from both the CI and vyos-build. This would allow anyone to build all packages, including the kernel, through vyos-build, for cases like this.

@Sonicbx @jjakob I also created https://phabricator.vyos.net/T2504 - I think we duplicated the issue here. You can close whichever issue you want.

In T2505#64896, @jjakob wrote:

If this can be solved by a kernel update, there was talk about maybe having different build "flavors" in the past - one with all the hardware nic drivers, one without. The minimal image could then have the latest (5.x) kernel.
T2085 currently prevents us from testing any newer kernel ourselves: the kernel is built by Jenkinsfiles in the CI, so we would need to repeat the CI steps manually. In that task I proposed a shared script for these repositories that could be called from both the CI and vyos-build. This would allow anyone to build all packages, including the kernel, through vyos-build, for cases like this.

vyos-build-kernel has come with dedicated build scripts for some time now, so this should no longer be an issue. I do not support the different flavour idea, as it would be a nightmare to maintain. Just give it some time until Intel decides to update their stuff.

I replaced the distributed guest utilities (vyos-xe-guest-utilities) with the ones that come with XCP-ng, but this changed nothing regarding the packet loss. Though now they get properly recognized by XCP-ng :-)

Does anyone have an idea how to test with different kernels? For now this is a deal breaker for using the 1.3.x branch, though I would really love to keep using bleeding edge in order to help test things :-)

I don't see problems with Debian Buster, kernel "4.19.0-9"
Need to check this patch. Ref. https://patchwork.kernel.org/patch/9293785/

With 4.19.123-amd64-vyos I am having the same problems. I would assume, that the patch from 2016 is already in this kernel?

Also happens with 4.19.131-amd64-vyos - I guess that patch mentioned by @Viacheslav is either not included or not solving the problem.

@fetzerms the mentioned patch is not included in the mainline kernel!

@c-po Thank you for clarifying. I guess I misinterpreted what I read on patchwork. I'd be eager to test a kernel with the patch!

Oh - I'm sorry - I mixed up the lines in the kernel. The patch is actually in VyOS.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fd07160bb7180cdd0afeb089d8cdfd66002f17e6

(21:51) cpo lnx01:~/vyos-build/packages/linux-kernel/linux # git tag --contains fd07160bb7180 | grep 4.19.131
v4.19.131
c-po renamed this task from Major Dropping small packets under Xen and AWS to XCP-ng packet drops for small packets (e.g. icmp) under Xen and AWS. Aug 16 2020, 2:49 PM
c-po added a subscriber: zsdc.

Hi,
I've been experiencing issues similar to what is being described here. In my experience, I can't find any indication that the packet loss is related to packet size; it seems random.
I'm running 1.4-rolling-202101270854 on XCP-ng 8.1.0. Confirmed on both Intel(R) Atom(TM) CPU C2758 and Intel(R) Xeon(R) CPU E5-2630 v3 machines. Turning off HW checksumming in XCP-ng makes no difference.

As mentioned by others, ethtool shows almost no output for the ethernet devices, and they are barely mentioned in dmesg.

# netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0      1500  7946569      0    278 0      14275236      0 162847      0 BMRU
eth1      1500  8859462      0      0 0      11365142      0 159164      0 BMRU

# dmesg | grep vif
[    2.817709] xenbus_probe_frontend: Device with no driver: device/vif/0
[    2.825728] xenbus_probe_frontend: Device with no driver: device/vif/1

# ethtool eth0
Settings for eth0:
	Link detected: yes

I'm also not able to set smp_affinity on this release. I'm not sure whether that is a symptom of something else (this was possible in 1.3 rolling).

I will gladly do any form of testing. I'm not too experienced with patching kernels, but I'm not afraid of trying if that is what is needed.

We haven't found a solution so far. We tested with different kernels, offloads, and sysctl parameters, but without result.
Additional topic:
https://xcp-ng.org/forum/topic/2956/tx-dropped-in-pv-vm

Looks like upgrading to 1.4-rolling-202102060218 finally fixed the issue for me. Now netstat shows 0 dropped TX, and my SSH connections are much more responsive.

So after a week of running, and comparing the performance to the LTS, I know that something is wrong.

In this latest 1.4 rolling using kernel 5.10.12, and some of the earlier versions (that experienced packet loss), running top shows that ksoftirqd/0 runs at around 100% when saturating the link. For one of my machines, this means that packet loss starts appearing at 250mb/s, as ksoftirqd/0 reaches 100% cpu use.

Downgrading to the 1.2.6 LTS gives me almost twice the performance with the exact same config. ksoftirqd barely reaches 1% cpu load.

I start to wonder how the VyOS XCP-NG partnership is working, when Xen support is lacking so much?

@mathiashedberg could you try enabling RPS (set interfaces ethernet eth0 offload rps) and see whether this does any good for utilisation / drop rate? I had a similar issue with a PPPoE link which behaved very badly under pressure.
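For context, the kernel exposes Receive Packet Steering per RX queue through a hex CPU bitmask in /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. The sketch below assumes the VyOS option ends up setting that mask; that is an assumption, not something confirmed in this task. The helper names rps_mask and show_rps are hypothetical, and rps_mask is only meant for machines with up to 32 CPUs.

```shell
# Print the hex bitmask that selects the first N CPUs (N <= 32),
# e.g. 4 CPUs -> f. This is the format rps_cpus uses.
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Print "queue current_mask" for every RX queue of an interface,
# to check whether RPS is active (a mask of all zeros means it is off).
show_rps() {
    for q in /sys/class/net/"$1"/queues/rx-*; do
        printf '%s %s\n' "${q##*/}" "$(cat "$q/rps_cpus")"
    done
}
```

Running show_rps eth0 before and after committing the offload setting shows whether the mask actually changed, which helps when a single ksoftirqd thread is pinned at 100%.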

@c-po That seemed to more than double my RX!

Here are some quick iperf3 results from vyos to a host connected over DAC SFP+

Baseline 1.2.6 LTS:

TX: 8.59 Gbits/sec    Retr: 0
RX: 4.24 Gbits/sec    Retr: 43

1.4-rolling-202102060218:

Without offloading:
TX: 3.16 Gbits/sec    Retr: 0
RX: 1.26 Gbits/sec    Retr: 86

With offloading:
TX: 3.00 Gbits/sec    Retr: 0
RX: 3.64 Gbits/sec    Retr: 487

netstat -i shows me 0 TX/RX errors both with and without offloading.

I'm quite happy with this result, as ksoftirqd CPU use is much lower than before and I can now finally saturate my WAN link.

erkin set Issue type to Bug (incorrect behavior). Aug 30 2021, 6:01 AM
erkin removed a subscriber: Active contributors.

@Sonicbx Is it an actual bug?

I haven't tried testing this in over 2 years. I no longer have a place to run XCP-ng, but I do have a Hyper-V host I can try this out on. I'll get back to you.

@Sonicbx As I remember, Hyper-V is not affected.
But thanks anyway.

dmbaturin added a project: VyOS 1.4 Sagitta.

What is the resolution? How was it resolved? @Viacheslav

There have been no reports since 2021

If it is still a bug, we'll reopen it. @Sonicbx can you re-check with the 1.3.x or 1.4-rc3?

Packet drops are still an issue with XCP-ng 8.2.1 and the latest rolling releases, e.g. 1.5-rolling-202404290019. Roughly 7% of pings are dropped and TX-DRP is nonzero. The same rolling release works flawlessly under Proxmox, without any TX-DRP shown in netstat.

@peter, did you try various offloading settings for the NIC being used with reboots in between?

Also, I have no knowledge of what Xen wants to have included in the kernel to work properly but out of the blue I can find a couple of candidates which might need to be included either inside the kernel or as a module in the "Xen driver support" section of the kernel compile config:

https://github.com/vyos/vyos-build/blob/current/packages/linux-kernel/arch/x86/configs/vyos_defconfig#L5034

Thinking of (as candidates):

# CONFIG_XEN_GRANT_DMA_ALLOC is not set

# CONFIG_XEN_PVCALLS_FRONTEND is not set
# CONFIG_XEN_PVCALLS_BACKEND is not set

# CONFIG_XEN_VIRTIO is not set

https://github.com/torvalds/linux/blob/master/drivers/xen/Kconfig

config XEN_GRANT_DMA_ALLOC
	bool "Allow allocating DMA capable buffers with grant reference module"
	depends on XEN && HAS_DMA
	help
	  Extends grant table module API to allow allocating DMA capable
	  buffers and mapping foreign grant references on top of it.
	  The resulting buffer is similar to one allocated by the balloon
	  driver in that proper memory reservation is made by
	  ({increase|decrease}_reservation and VA mappings are updated if
	  needed).
	  This is useful for sharing foreign buffers with HW drivers which
	  cannot work with scattered buffers provided by the balloon driver,
	  but require DMAable memory instead.

config XEN_PVCALLS_FRONTEND
	tristate "XEN PV Calls frontend driver"
	depends on INET && XEN
	select XEN_XENBUS_FRONTEND
	help
	  Experimental frontend for the Xen PV Calls protocol
	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
	  sends a small set of POSIX calls to the backend, which
	  implements them.

config XEN_PVCALLS_BACKEND
	tristate "XEN PV Calls backend driver"
	depends on INET && XEN && XEN_BACKEND
	help
	  Experimental backend for the Xen PV Calls protocol
	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
	  allows PV Calls frontends to send POSIX calls to the backend,
	  which implements them.

	  If in doubt, say n.

config XEN_VIRTIO
	bool "Xen virtio support"
	depends on VIRTIO
	select XEN_GRANT_DMA_OPS
	select XEN_GRANT_DMA_IOMMU if OF
	help
	  Enable virtio support for running as Xen guest. Depending on the
	  guest type this will require special support on the backend side
	  (qemu or kernel, depending on the virtio device types used).

	  If in doubt, say n.
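Whether a given option is enabled in the kernel that is actually running can be checked against its config file. This is a sketch with a hypothetical helper, kconfig_state; it assumes the config is available as a plain-text file such as /boot/config-$(uname -r), which may not be the case on every image.

```shell
# Print the state of one kernel config option from a config file:
# "y" (built in), "m" (module), another assigned value, or "unset"
# when the option is commented out or missing.
kconfig_state() {
    file=$1
    opt=$2
    line=$(grep -m1 -E "^${opt}=" "$file") || { echo unset; return; }
    echo "${line#*=}"
}
```

For example, kconfig_state "/boot/config-$(uname -r)" CONFIG_XEN_VIRTIO reports one of the candidates listed above, and CONFIG_XEN_NETDEV_FRONTEND is the option for the xen-netfront driver itself.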

I just upgraded from 1.3 LTS to 1.5-rolling-202408300023 on XCP-ng 8.2 with the latest patches.
The sg offload setting seems to have fixed the packet drop issue.

set interfaces ethernet eth0 offload sg
set interfaces ethernet eth1 offload sg
...

I tried to set only the sg offload, but the other offload settings appeared automatically after a reboot:

# show interfaces ethernet eth0
 address xx.xx.xx.xx/xx
 hw-id xx:xx:xx:xx:xx:xx
 offload {
     gro
     gso
     sg
     tso
 }

Some more information:

# netstat -i
Kernel Interface table
Iface             MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0             1500 4107278116      0      0 0      3802188065      0      0      0 BMRU
eth1             1500 3246960951      0      0 0      3180233640      0      0      0 BMRU
eth2             1500 1614614389      0      0 0      1835133348      0      0      0 BMRU
eth3             1500       1348      0      0 0            2026      0      0      0 BMRU
eth4             1500  147557773      0      0 0       153541298      0      0      0 BMRU
eth5             1500          0      0      0 0              32      0      0      0 BMRU
eth6             1500   83215964      0      0 0       219963927      0      0      0 BMRU
eth7             1500 1269002202      0      0 0      1267772597      0      0      0 BMRU
eth8             1500      14377      0      1 0           10663      0      0      0 BMRU
lo              65536       1727      0      0 0            1727      0      0      0 LRU
pim6reg          1452          0      0      0 0               0      0      0      0 ORU
# ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
        tx-checksum-ipv4: on [fixed]
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
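Most of the features in this listing are marked [fixed] and cannot be changed on this driver. A small filter makes the ones that can be toggled easier to see when comparing hosts. This is a sketch; toggleable_features is a hypothetical helper that only reformats the ethtool -k output shown above.

```shell
# Read `ethtool -k <dev>` output on stdin and print "feature state"
# for every feature that is not marked [fixed]. The first line
# ("Features for <dev>:") is skipped and indentation is stripped.
toggleable_features() {
    awk -F': ' 'NR > 1 && !/\[fixed\]/ {
        gsub(/^[ \t]+/, "", $1)
        print $1, $2
    }'
}
```

Run as ethtool -k eth0 | toggleable_features. For a quick experiment outside the VyOS configuration, ethtool -K eth0 sg off (or on) changes scatter-gather at runtime; such a change would presumably not persist across a reboot or a later commit.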