Network issues on new OpenWrt install on x86

I have a Core 2 Duo machine with 4 GB RAM and a 32 GB SSD.
It uses an Intel board with an onboard Intel NIC and a secondary Intel NIC.
OpenWrt installed fine and booted up. I got an IP address from Verizon FiOS right away.
The Internet is technically working, sort of. I have 400 Mbps symmetrical.
On my main PC, connected via Ethernet, my connection spikes to a few hundred megabytes and stalls out, and it won't even complete a speed test. On my phone, connected to the WiFi AP, I'm getting 32 Mbps down and 3.21 Mbps up. Same thing with my Linux box: I tried downloading a torrent and it spikes to a couple of MB/s, then slows down to about 56 KB/s and back up again.
The only things I've done are set a password, install LuCI, and disable the IPv6 WAN interface, hoping that would help because I don't think FiOS supports IPv6 yet. But that didn't help at all.
This box ran pfSense perfectly a few hours ago, so it's not a hardware issue.

Ideas?
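
Editor's note: the post mixes megabits per second (line rate, speed tests) with bytes per second (torrent client). A minimal sketch of the conversion, assuming decimal units (1 KB = 1000 bytes); many torrent clients actually report KiB, so treat the numbers as approximate. The function names are made up for illustration.

```python
def mbps_to_mbytes_per_s(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def kbytes_per_s_to_mbps(kbytes_per_s):
    """Convert kilobytes per second to megabits per second (decimal units)."""
    return kbytes_per_s * 8 / 1000


if __name__ == "__main__":
    # A 400 Mbps line can carry at most about this many MB/s of payload.
    print(f"400 Mbps is {mbps_to_mbytes_per_s(400):.1f} MB/s")
    # The torrent's low point, expressed in the same unit as the line rate.
    print(f"56 KB/s is {kbytes_per_s_to_mbps(56):.3f} Mbps")
```

In other words, the 56 KB/s low point is well under 1 Mbps on a 400 Mbps line.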

I think the Core 2 Duo is too old an x86 architecture; it cannot handle 400 Mbps.
You can try installing irqbalance and enabling software flow offloading to see if that improves things.

It's possible that the machine itself is not capable of 400 Mbps; that remains to be seen.
But the problem is much bigger than this. I found this in the system/kernel log:

Sun Sep 20 12:56:02 2020 kern.err kernel: [26054.614074] PHY Extended Status    <3000>
Sun Sep 20 12:56:02 2020 kern.err kernel: [26054.614074] PCI Status             <10>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   TDH                  <2a>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   TDT                  <a5>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   next_to_use          <a5>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   next_to_clean        <28>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] buffer_info[next_to_clean]:
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   time_stamp           <100623a42>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   next_to_watch        <2a>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   jiffies              <100624028>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   next_to_watch.status <0>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] MAC Status             <80083>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] PHY Status             <796d>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] PHY 1000BASE-T Status  <3800>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] PHY Extended Status    <3000>
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] PCI Status             <10>
Sun Sep 20 12:56:06 2020 kern.err kernel: [26058.617970] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
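
Editor's note: to see how often the hang is reported and on which interface, the log can be scanned for the "Detected Hardware Unit Hang" line. This is a minimal sketch, assuming the log has been saved to a file or pasted into a string in the format shown above. The function name and the regular expression are illustrative and not part of any OpenWrt tool.

```python
import re
from collections import Counter

# Matches header lines such as:
#   "... kernel: [26056.598188] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:"
HANG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<pci>[0-9a-fA-F:.]+)\s+"
    r"(?P<iface>\S+):\s+Detected Hardware Unit Hang"
)


def count_hangs(log_text):
    """Return a Counter keyed by (driver, pci_address, interface)."""
    hangs = Counter()
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs[(match["driver"], match["pci"], match["iface"])] += 1
    return hangs


# Three lines copied from the log excerpt above.
SAMPLE = """\
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Sun Sep 20 12:56:04 2020 kern.err kernel: [26056.598188]   TDH                  <2a>
Sun Sep 20 12:56:06 2020 kern.err kernel: [26058.617970] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
"""

if __name__ == "__main__":
    for (driver, pci, iface), n in count_hangs(SAMPLE).items():
        print(f"{iface} ({driver} at {pci}): {n} hang report(s)")
```

In the excerpt the reports come roughly two seconds apart, all on eth0 at 0000:00:19.0.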

It looks like the Ethernet card is hanging; it may be a driver issue. I used lspci and the cards are
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
and
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

The card uses an IRQ to ask the driver to do some work.

If the driver doesn't keep up, then you may have issues. Trying irqbalance is worth it.
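
Editor's note: before and after trying irqbalance, it can help to check how the NIC's interrupts are spread across the cores. This is a minimal sketch that parses text in the usual /proc/interrupts layout. The sample numbers are invented for illustration and the function name is hypothetical. On the router itself you would read the real file instead.

```python
def irq_counts(interrupts_text, name):
    """Map IRQ number -> {cpu: count} for rows whose description mentions `name`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + len(cpus)]
        description = fields[1 + len(cpus):]
        if not any(name in field for field in description):
            continue
        if len(counts) == len(cpus) and all(c.isdigit() for c in counts):
            result[fields[0].rstrip(":")] = dict(zip(cpus, map(int, counts)))
    return result


# Invented sample in the /proc/interrupts layout (not taken from the thread).
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
  0:         36          0   IO-APIC   2-edge      timer
 27:     901234         12   PCI-MSI 409600-edge      eth0
 28:       4321       8765   PCI-MSI 1048576-edge      eth1
"""

if __name__ == "__main__":
    # On a live system: text = open("/proc/interrupts").read()
    for irq, per_cpu in irq_counts(SAMPLE_INTERRUPTS, "eth0").items():
        print(f"IRQ {irq}: {per_cpu}")
```

If nearly all counts land on one CPU, interrupts are effectively pinned to that core.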

Test with a modern general-purpose Linux distribution to localize the issue.
You can boot live media for testing.

The Core 2 Duo chip will be plenty fast for this application, even with interrupts constrained to a single core.

It should be running the 19.07.4 x86-64 build. Release builds include LuCI.

I see the problem is in the 82567 controller: it does not use the PCIe bus but the GLCI bus. Your firmware uses the e1000e driver for the PCIe controller. Which x86 image did you install?

I used this image
https://downloads.openwrt.org/snapshots/targets/x86/64/openwrt-x86-64-generic-ext4-combined-efi.img.gz
From this page
https://downloads.openwrt.org/snapshots/targets/x86/64/

Based on this guide
https://openwrt.org/docs/guide-user/installation/openwrt_x86

Note to others: the NICs work in Ubuntu and OPNsense, so I'm pretty sure my problem with OpenWrt is a driver issue.

Try removing kmod-e1000e and installing the kmod-e1000 driver instead to see what happens.

The dreaded TX unit hang…
This is a firmware (EEPROM) issue that has plagued PCI and early PCIe e1000/e1000e chipsets for years; it's effectively a problem with power-saving features interfering with normal operation. The results are dropped packets and silent data corruption. These cards are effectively defective (and, while less prominent, the problem also affects Windows and other operating systems).

On several e1000/e1000e chipsets you can mitigate this by (permanently) binary patching the card's EEPROM using ethtool, which effectively disables the problematic power-saving features in hardware. One guide about this issue that I can find quickly (for a slightly older chipset revision) is "82573(V/L/E) TX Unit Hang Messages"; you'd need to check the details for your revision yourself. Be aware that this hot patching via ethtool is risky business and might destroy your card for good (backups, backups, backups; they might not help under all circumstances, but it's better to have them than not).
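
Editor's note: for the backup step, ethtool can dump the EEPROM as raw bytes with `ethtool -e <iface> raw on`. This is a minimal sketch that wraps that command and saves the dump to a file. It assumes ethtool is installed and the script runs as root; the helper names and the output path are made up for illustration. It only reads the EEPROM. The patch itself (offsets and values) depends on the chipset revision and is not shown here.

```python
import subprocess


def eeprom_dump_cmd(iface):
    """Build the ethtool command that prints the NIC EEPROM as raw bytes."""
    return ["ethtool", "-e", iface, "raw", "on"]


def backup_eeprom(iface, path):
    """Save a raw EEPROM dump of `iface` to `path` and return its size in bytes.

    Requires root privileges and the ethtool binary; raises if ethtool fails.
    """
    dump = subprocess.run(
        eeprom_dump_cmd(iface), check=True, capture_output=True
    ).stdout
    if not dump:
        raise RuntimeError(f"ethtool returned an empty EEPROM dump for {iface}")
    with open(path, "wb") as fh:
        fh.write(dump)
    return len(dump)


# Example (hypothetical interface name and path, run as root):
#   size = backup_eeprom("eth0", "/root/eth0-eeprom.bin")
#   print(f"saved {size} bytes")
```

Keep a copy of the dump off the machine as well, since a failed patch may leave the box without networking.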

I've been plagued by these problems on two rack servers in the past (one slightly older and PCI based, the other a bit newer with two onboard PCIe cards) and on an Intel e1000e PCIe desktop card. Back when these systems were brand new and about to be deployed, there was no known fix for these issues; the problem only became apparent after they had already gone into service, through the abysmal performance and silent data corruption. As there was no fix available at that time, I just added Realtek r8168 cards for less than 10 bucks each instead, which worked fine for over half a decade. Only after decommissioning these servers from production could I get back to them and fix the EEPROM by hand; they've been fine for personal use since (and still are today).

While the general consensus seems to be that Intel Ethernet is high quality and much better accelerated than Realtek, my personal experience differs. (The consensus is certainly true for rtl8139, and its Windows drivers in particular, as the Linux drivers are faster than their original Windows counterparts; not so much for r8168.) I never saw anything like this with Realtek, and unlike e100 (and more contemporary desktop Intel network cards), even rtl8139 is still fully supported in modern server operating system editions.

Guess I'm sticking with OPNsense then.
This machine acted as a pfSense router for a year or two with no issues; whatever the difference between the Linux and FreeBSD drivers is, it makes these cards trash in Linux. I'm not going to the trouble of patching and possibly breaking the cards if they work fine in FreeBSD.

I left pfSense because it felt slow to get new features and I hated the interface. OPNsense seems a little quicker to add features and the UI is much better.

I wanted to use OpenWrt for SQM / CAKE, but I'm getting pretty good performance out of just FQ_CoDel on OPNsense.

Thanks for explaining what's wrong, cheers

Besides patching, is there a way to deactivate those power-saving mechanisms on Intel NICs? PowerTOP has been available on OpenWrt for about a year.
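
Editor's note: the thread does not answer this question. One power-saving setting that can at least be inspected from software is the kernel's PCIe ASPM policy, which Linux exposes in /sys/module/pcie_aspm/parameters/policy when ASPM support is built in. Whether ASPM has anything to do with the hang on these particular NICs is an assumption, not something established above. This is a minimal sketch that only reads the active policy; the function names are illustrative.

```python
import re

# Standard sysfs location when the kernel has PCIe ASPM support compiled in.
ASPM_POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def active_aspm_policy(policy_text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the active ASPM policy, or None if the file is missing or unreadable."""
    try:
        with open(path) as fh:
            return active_aspm_policy(fh.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("active ASPM policy:", read_aspm_policy())
```

The file lists the available policies and marks the active one with square brackets, for example `[default] performance powersave powersupersave`.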
