Short story: my Intel i210 network card frequently reports "Tx Unit Hang" errors under OpenWrt x86 and keeps dropping the connection, while it works perfectly on Windows Server.
I found a few topics about "Tx Unit Hang", but they are all about e1000e network cards, and they don't solve my problem on the Intel i210.
Detailed story:
I tried many OpenWrt versions, from
openwrt-18.06.1-x86-64-combined-ext4.img
to
openwrt-19.07.5-x86-64-combined-squashfs.img
Hardware:
- A DIY fanless mini PC: CPU i3 4030u, 8GB DDR3, SSD on mini-PCIe, HDD on SATA
- Onboard RTL8111e network card connected to the PCI bus
- Intel WGI211AT on a mini-PCIe port
I flashed the i211AT to i210AT using eeupdate; they are the same hardware with different firmware. This step allows Windows Server to install a driver for it. I have no i211AT firmware to roll back to.
The system works well under Windows Server 2019. I installed OpenWrt on Hyper-V, and everything works perfectly over the Hyper-V virtual network cards.
I wanted to improve performance by using PCI passthrough, but the Intel i210AT doesn't work that way.
So I created a USB flash drive with OpenWrt and booted directly from it, with the SSD and HDD disabled.
The RTL8111e network card works fine, but the Intel i211AT (flashed to i210AT) shows the error message "igb 0000:03:00.0: Detected Tx Unit Hang" every few seconds.
My client PC can connect to this port and even gets an IP address from DHCP for a very short period, but then it disconnects because of this error:
Thu Jan 14 09:16:10 2021 user.notice mwan3[31625]: Execute ifdown event on interface wan (unknown)
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] igb 0000:03:00.0: Detected Tx Unit Hang
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] Tx Queue <2>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] TDH <0>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] TDT <1>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] next_to_use <1>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] next_to_clean <0>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] buffer_info[next_to_clean]
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] time_stamp <100095171>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] next_to_watch <00000000c2bf44b8>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] jiffies <1000953c0>
Thu Jan 14 09:16:11 2021 kern.err kernel: [ 2745.165454] desc.status <2a8000>
Thu Jan 14 09:16:12 2021 daemon.info netdata[7996]: RRDSET: chart name 'net.pppoe_wan' on host 'OpenWrt' already exists.
Thu Jan 14 09:16:12 2021 daemon.err netdata[7996]: PROCFILE: Cannot open file '/proc/sysvipc/shm'
Thu Jan 14 09:16:14 2021 daemon.notice netifd: Network device 'eth0' link is down
Thu Jan 14 09:16:14 2021 daemon.notice netifd: Interface 'wan6' has link connectivity loss
Thu Jan 14 09:16:14 2021 daemon.notice netifd: Interface 'wan' has link connectivity loss
Thu Jan 14 09:16:14 2021 kern.err kernel: [ 2747.980212] igb 0000:03:00.0 eth0: Reset adapter
Thu Jan 14 09:16:14 2021 daemon.info pppd[31660]: Terminating on signal 15
Thu Jan 14 09:16:14 2021 daemon.err netdata[7996]: PROCFILE: Cannot open file '/proc/sysvipc/shm'
Thu Jan 14 09:16:14 2021 daemon.notice netifd: Interface 'wan6' is now down
Thu Jan 14 09:16:16 2021 daemon.err netdata[7996]: PROCFILE: Cannot open file '/proc/sysvipc/shm'
Thu Jan 14 09:16:17 2021 kern.info kernel: [ 2751.500653] igb 0000:03:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Thu Jan 14 09:16:17 2021 daemon.notice netifd: Network device 'eth0' link is up
Thu Jan 14 09:16:17 2021 daemon.notice netifd: Interface 'wan6' has link connectivity
Thu Jan 14 09:16:17 2021 daemon.notice netifd: Interface 'wan6' is setting up now
Thu Jan 14 09:16:17 2021 daemon.notice netifd: Interface 'wan' has link connectivity
Thu Jan 14 09:16:18 2021 daemon.err netdata[7996]: PROCFILE: Cannot open file '/proc/sysvipc/shm'
Thu Jan 14 09:16:19 2021 daemon.notice netifd: Interface 'wan' is now down
Thu Jan 14 09:16:19 2021 daemon.notice netifd: Interface 'wan' is setting up now
Thu Jan 14 09:16:19 2021 daemon.err insmod: module is already loaded - slhc
Thu Jan 14 09:16:19 2021 daemon.err insmod: module is already loaded - ppp_generic
Thu Jan 14 09:16:19 2021 daemon.err insmod: module is already loaded - pppox
Thu Jan 14 09:16:19 2021 daemon.err insmod: module is already loaded - pppoe
Thu Jan 14 09:16:19 2021 daemon.info pppd[32333]: Plugin rp-pppoe.so loaded.
Thu Jan 14 09:16:19 2021 daemon.info pppd[32333]: RP-PPPoE plugin version 3.8p compiled against pppd 2.4.8
Thu Jan 14 09:16:19 2021 daemon.notice pppd[32333]: pppd 2.4.8 started by root, uid 0
Thu Jan 14 09:16:19 2021 user.notice mwan3[32283]: Execute ifdown event on interface wan (unknown)
Thu Jan 14 09:16:20 2021 daemon.err netdata[7996]: PROCFILE: Cannot open file '/proc/sysvipc/shm'
Thu Jan 14 09:16:20 2021 kern.err kernel: [ 2754.125442] igb 0000:03:00.0: Detected Tx Unit Hang
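For anyone reading the dump, here is a small sketch that decodes the numbers in the hang report above. It assumes the usual meaning of the registers (TDH is the head pointer advanced by the NIC, TDT is the tail pointer written by the driver) and that time_stamp and jiffies are hexadecimal tick counters. The helper names are made up for illustration, and converting jiffies to seconds would need the kernel's CONFIG_HZ, which is not shown in this post.

```shell
#!/bin/sh
# Sketch: decode the values printed in the "Detected Tx Unit Hang" report.
# Function names are hypothetical helpers, not part of any tool.

# Descriptors handed to the NIC but not yet processed.
# Arguments: TDH TDT (decimal). Ring wrap-around is not handled.
pending_descriptors() {
    echo $(( $2 - $1 ))
}

# Age of the stuck buffer in jiffies.
# Arguments: time_stamp jiffies (hex digits without the 0x prefix).
stuck_age_jiffies() {
    echo $(( 0x$2 - 0x$1 ))
}

# Read syslog lines on stdin and print the kernel uptime stamp
# of every "Detected Tx Unit Hang" report.
hang_uptimes() {
    sed -n 's/.*\[ *\([0-9][0-9.]*\)\].*Detected Tx Unit Hang.*/\1/p'
}

pending_descriptors 0 1                 # prints 1
stuck_age_jiffies 100095171 1000953c0   # prints 591
```

On the router, `logread | hang_uptimes` should list one uptime stamp per hang. In the log above the two reports are at 2745.165454 and 2754.125442, roughly 9 seconds apart, and each time a single descriptor (TDH 0, TDT 1) was queued and apparently never picked up by the card.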
I found some posts related to this error, but they are all about network cards that use the 'e1000e' driver, such as the 82566, not the 'igb' driver that the i210AT uses.
I still followed the instructions in those posts: I installed ethtool via opkg and tried the following commands:
ethtool -K eth0 tx off rx off
ethtool -K eth0 gso off gro off tso off
No improvement. Some posts suggested changing the e1000e firmware using ethtool, but the i210AT has different data in its firmware. The bit in question is already unset on the i210AT, so there is nothing to change.
https://downloadmirror.intel.com/15817/eng/README.txt
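To rule out that the two ethtool -K commands silently failed, the offload state can be read back with `ethtool -k` (lowercase k). The following is only a sketch: `offloads_still_on` is a hypothetical helper name, and it assumes the usual `feature-name: on|off` output format of `ethtool -k`.

```shell
#!/bin/sh
# Sketch: read `ethtool -k <iface>` output on stdin and print those offloads
# from the commands above (tx, rx, gso, gro, tso) that are still "on".
# offloads_still_on is a hypothetical helper name.
offloads_still_on() {
    awk -F': *' '
        /^[ \t]*(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):/ {
            name = $1
            gsub(/^[ \t]+/, "", name)      # strip leading indentation
            if ($2 ~ /^on/) print name     # report only features still enabled
        }'
}

# Intended use on the router (not run here):
#   ethtool -k eth0 | offloads_still_on
```

If this prints nothing for eth0, the offloads were really disabled and the hang is not caused by them. Note that the adapter reset after each hang may restore the defaults, so the check is most useful right after a "Reset adapter" line appears in the log.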
I also tried different versions of OpenWrt, from 2018 to the latest, both official and unofficial.
Please advise what I can do to solve this issue.