Intel: update out-of-tree drivers, i40e driver warning
Closed, Resolved | Public | BUG

Description

Could you update to the latest Intel i40e driver (2.13.10) to stop the NVM version mismatch warnings in dmesg?

[ 1.297061] i40e 0000:07:00.0: The driver for the device detected a newer version of the NVM image v1.11 than expected v1.10. Please install the most recent version of the network driver.
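
To check whether a boot log contains this warning, and which NVM versions it reports, a small filter over the kernel log can help. This is a minimal sketch that assumes the message wording shown above; the function name `nvm_mismatch` is made up for illustration.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<found> <expected>" NVM versions for each i40e mismatch warning.
# Assumes the message wording quoted in this task.
nvm_mismatch() {
  sed -n 's/.*i40e .*newer version of the NVM image v\([0-9.]*\) than expected v\([0-9.]*\)\..*/\1 \2/p'
}
```

On a live system this would be run as `dmesg | nvm_mismatch`; for the line above it prints `1.11 1.10`.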

Details

Difficulty level
Easy (less than an hour)
Version
1.3 rolling
Why the issue appeared?
Other
Is it a breaking change?
Perfectly compatible
Issue type
Package upgrade

Event Timeline

The latest driver also includes a fix for link flapping, which is the issue we are experiencing.

c-po triaged this task as Normal priority.
c-po changed Why the issue appeared? from Will be filled on close to Other.

Driver will be included in next rolling ISO

c-po renamed this task from i40e driver warning to Intel: update out-of-tree drivers, i40e driver warning. Oct 30 2020, 4:04 PM

Frustratingly, 2.13.10 seems to have some other — very nasty — bugs in it. We've had three kernel crashes on the latest VyOS 1.3 releases (from around Christmas) as a result, and I currently believe they are the same as those problems described here:

https://sourceforge.net/p/e1000/mailman/e1000-devel/thread/cbfad569-e97e-8ea6-ede9-8f3e2f20e790%40cri.epita.fr/#msg37186731

Quoting that thread:

During our tests, we observed that the slab memory is constantly
increasing, without the machine doing anything else than network operations.
The problem only happens when receiving traffic. Or at least we haven't
been able to reproduce it when only sending traffic.
After tracing the memory allocations and deallocations made by the
kernel, we were able to confirm that the driver leaks memory. However,
this leak doesn't happen when ntuples are off (disabled with `ethtool
--features ens1f1 ntuple off`).
We now plan to further analyze the memory operations made by the i40e
driver and will report back if we find anything. In the meantime, we are
opening this thread hoping that someone might already know of this issue
and have a fix.

I guess I'm going to disable i40e's ntuple support on 1.3-rolling and see if it makes a difference because right now our options are:

  • VyOS 1.2.6 (running FRRouting 7.3) which has a bug where ospf6d crashes on certain LSAs
  • VyOS 1.3 (running stable FRR 7.5) where the i40e 2.13.10 driver has a bug where the entire router crashes
  • VyOS 1.2 rolling from around 201906 (running FRR 7.2-dev) which is from before FRR introduced their ospf6d bug, and seems to have a not-completely-awful i40e driver
  • throw the i40e NICs out, and go Mellanox
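
The ntuple workaround from the quoted thread can be sketched as below. This is only an illustration: the helper names `ntuple_state` and `disable_ntuple` are made up, and the interface name is whatever the i40e port is called on the box (the thread used `ens1f1`).

```shell
# Hypothetical helper: read `ethtool --show-features <iface>` output on stdin
# and print the ntuple state ("on" or "off").
ntuple_state() {
  sed -n 's/^[[:space:]]*ntuple-filters: \([a-z]*\).*/\1/p'
}

# Hypothetical helper: turn ntuple filtering off on one interface (needs root).
# This is a runtime setting only.
disable_ntuple() {
  ethtool --features "$1" ntuple off
}
```

For example, `ethtool --show-features ens1f1 | ntuple_state` shows the current state and `disable_ntuple ens1f1` switches it off. The change is not persistent, so it would have to be reapplied after a reboot.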

Seems i40e is a lot of fun. Given those nasty errors and Intel's development cycle, I have a recent 1.3 ISO with kernel 5.10.4 and the built-in (mainline) i40e driver.

@maznu could you give it a test drive?

i40e is a tyre fire.

I would be very happy to try your ISO :)

Alternatively, we've got an i40e VyOS box in production which is stable with:

[   17.875126] i40e: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver - version 2.7.29
[   17.875127] i40e: Copyright(c) 2013 - 2018 Intel Corporation.
[   17.886722] i40e 0000:01:00.0: fw 6.0.48442 api 1.7 nvm 6.01 0x800035ce 1.1747.0
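
When comparing boxes, it can help to pull the driver version out of a boot log like the one above. A minimal sketch, assuming the banner format shown here (other driver builds may print a different banner, or no version at all); `i40e_driver_version` is a made-up name.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the i40e
# driver version from the load banner, assuming the format in the log above.
i40e_driver_version() {
  sed -n 's/.*i40e: .*Network Driver - version \([0-9.]*\).*/\1/p'
}
```

Run as `dmesg | i40e_driver_version`, this would print `2.7.29` for the log above. `ethtool --driver <interface>` reports the driver and firmware versions for a running interface.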

Here are my conclusions about last week's shenanigans.

Grateful, as ever, to the VyOS team — in particular @c-po — for their help.

https://faelix.net/news/202101/vyos-updates-vs-intel-i40e-rfo/

erkin removed a subscriber: Active contributors.