MT7621 ethernet performance below expectations

I have conducted some iperf3 tests on my Netgear R6700v2 (MT7621) and have noticed that it can only sink packets at ~500 Mb/s and source them at around 700 Mb/s. I have employed irqbalance and ensured that the iperf3 process is on its own core. There is no NAT involved: packets are going directly to/from the device over hard-line gigabit ethernet. While iperf3 is running, htop shows sirq at ~25%, and the iperf3 process is maxed at 100% on one core.
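
For reference, the router-side setup was roughly the following (a sketch; assumes busybox taskset is available, and 192.168.1.1 is just a stand-in for the router's address):

# On the router: let irqbalance spread interrupts, and pin the
# iperf3 server to a core of its own (mask 0x2 = second core)
/etc/init.d/irqbalance start
taskset 0x2 iperf3 -s

# From the wired client:
iperf3 -c 192.168.1.1 -t 10       # router sinks packets
iperf3 -c 192.168.1.1 -t 10 -R    # router sources packets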

The interesting thing is that the same device with DD-WRT installed will source/sink packets at the full link speed, topping out at 950 Mb/s (the max for gigabit once overhead is considered) in both directions. In that case, sirq was around 13% and the iperf3 process itself maxed out at about 25% CPU usage on one core. Performance was so good that I used this device as the packet source/sink for NAT speed trials of other devices. Double the speed at a fraction of the CPU usage.

I did attempt to activate both software and hardware flow offloading. As expected, since NAT isn't involved, neither had any effect.
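
For completeness, this is the usual way to toggle both offloads (a sketch; the option names are the stock firewall defaults):

# Software and hardware flow offloading act on forwarded flows only,
# so traffic terminated on the router itself is unaffected (as observed)
uci set firewall.@defaults[0].flow_offloading='1'
uci set firewall.@defaults[0].flow_offloading_hw='1'
uci commit firewall
/etc/init.d/firewall restart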

Both DD-WRT and OpenWrt are using the same mt7621 drivers. Any hints at how to improve performance?

iperf3 testing on the vast majority of embedded devices will not fully saturate a gigabit ethernet connection, regardless of whether the device is sourcing or sinking the packets. That is because the test is CPU bound.

You are always best served by running iperf3 through a device, where each of the sourcing and sinking endpoints is capable of saturating the link (i.e. a general purpose computer, or even an RPi4, will do the trick).
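
In other words, something like this, with the router merely forwarding between two capable hosts (addresses are placeholders):

# Host A, behind the router's LAN:
iperf3 -s

# Host B, on the WAN side, so every flow traverses the router:
iperf3 -c <host-A-address> -t 30       # downstream through the router
iperf3 -c <host-A-address> -t 30 -R    # upstream through the router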

Are you sure that you had iperf3 running on the router itself when you were using DD-WRT? I don't recall iperf being built into DD-WRT, and unlike OpenWrt, you cannot add packages to the firmware. I could be mistaken, though.

Just for your reference, my Xiaomi 4A Gigabit (MT7621), tested using the official OpenWrt 21.02.1 build, gets 940 Mb/s on a LAN port.

DD-WRT uses Shortcut Forwarding Engine (SFE).

OpenWrt does not use hardware NAT.

Running packets through is a good test of a device's switching and/or NAT capacity. Running into/out of a device is a good test of its ability to serve data (Samba, FTP, minidlnad, etc.).

Entware is available for DD-WRT. It basically offers most OpenWrt packages, built to run out of /opt, for most other firmwares. I'm positive about the topology I was employing.
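
With Entware bootstrapped into /opt on DD-WRT, getting iperf3 is the usual opkg affair (a sketch):

# Entware's opkg lives under /opt and pulls from the Entware feed
/opt/bin/opkg update
/opt/bin/opkg install iperf3
iperf3 -s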

Hmmmm. I /am/ running snapshot - doing so to give me access to ntfs3. Perhaps that is an issue.

I wasn't employing NAT in any test on either DD-WRT or OpenWrt. SFE doesn't affect packets sourcing from or destined to the device itself.

HWNAT

Very true. My response was based on the typical use case of traffic flowing through the device, but if you are serving data from the unit, this does make a lot of sense to test.

Cool. I did not know that. Thanks. It has been probably 5-7 years or more since I last touched DD-WRT (and even if it was available at the time, I may not have known to look).

help??

Localhost iperf3 is about 600 Mb/s.
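
That is, with both ends running on the router over loopback, roughly (a sketch):

# Daemonize the server, then point the client at loopback; this takes
# the ethernet hardware out of the picture entirely
iperf3 -s -D
iperf3 -c 127.0.0.1 -t 10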

You know, that would explain a lot. If polling is involved, it explains the low throughput and high CPU. The only thing is, there isn't an external MT7530 involved here; the MT7621 has its own internal 5-port switch, which is what my device is using. But it makes me wonder if the same thing is happening with the internal switch on snapshot kernels. It's something to look at.
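
One way to check is to watch the interrupt counters while traffic flows; a PHY stuck at (irq=POLL) won't show up there (a sketch; the exact names in the table vary by build):

# Snapshot the counters, run iperf3 for a bit, then compare
grep -iE 'mt7530|ethernet' /proc/interrupts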

I stand corrected. Looks like this is the issue. From my dmesg:

[    2.989423] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[    3.016994] mt7530 mdio-bus:1f lan4 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY] (irq=POLL)
[    3.037287] mt7530 mdio-bus:1f lan3 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
[    3.057515] mt7530 mdio-bus:1f lan2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
[    3.077604] mt7530 mdio-bus:1f lan1 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
[    3.097805] mt7530 mdio-bus:1f wan (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY] (irq=POLL)

EDIT: For now, marking this solved (thanks again @anomeome). Once this makes it into a snapshot, I'll try it just to make sure.

One of the "lucky" device targets for HFO.

Thanks for the correction.

I just installed today's snapshot (since the patch narrowly missed being included in yesterday's). And while the ethernet ports now report actual IRQs rather than falling back to polling, it doesn't make a difference: performance is just the same.

[    2.990233] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[    3.022722] mt7530 mdio-bus:1f lan4 (uninitialized): PHY [mt7530-0:00] driver [MediaTek MT7530 PHY] (irq=26)
[    3.044812] mt7530 mdio-bus:1f lan3 (uninitialized): PHY [mt7530-0:01] driver [MediaTek MT7530 PHY] (irq=27)
[    3.066878] mt7530 mdio-bus:1f lan2 (uninitialized): PHY [mt7530-0:02] driver [MediaTek MT7530 PHY] (irq=28)
[    3.088901] mt7530 mdio-bus:1f lan1 (uninitialized): PHY [mt7530-0:03] driver [MediaTek MT7530 PHY] (irq=29)
[    3.110976] mt7530 mdio-bus:1f wan (uninitialized): PHY [mt7530-0:04] driver [MediaTek MT7530 PHY] (irq=30)

I know it shows IRQs, but for the life of me it still behaves like it's actually polling. I can't otherwise explain why iperf3 has such high CPU usage.

FYI, my Xiaomi 4A Gigabit with official OpenWrt 21.02.1 can source and sink at 700~750 Mb/s when running iperf3 (similar test method to the OP's).

500 Mb/s is about what this SoC can handle. You can optimize compiler flags slightly, but that'll probably give you 10% or so at best. Software does evolve, and focus shifts to more modern architectures (MIPS is slowly dying), so don't expect newer software to perform better; more likely equal or slightly worse. The MT7621 launched in 2013, so compare it to a 10-year-old computer that can't run the latest games without any upgrades since purchase. While it does have hardware support for NAT, it has always been "best effort", and hardware acceleration in general has always been a very low priority for many reasons. If the device performs better with older software and you don't mind the complications that entails, just stay with that software.
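
If anyone wants to try the compiler-flag angle, the knob is CONFIG_TARGET_OPTIMIZATION in the buildroot config (a sketch; -mtune=1004kc is my assumption for the MT7621's 1004Kc cores, and per the above the gains are likely small):

# In the OpenWrt buildroot .config (also adjustable via make menuconfig
# under the developer options)
CONFIG_TARGET_OPTIMIZATION="-O2 -pipe -mno-branch-likely -mips32r2 -mtune=1004kc"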

Accounting for overhead, DD-WRT on the same device can saturate the gigabit Ethernet line. Data is below.

This is with (essentially) the same mt7621 drivers.

Interestingly, DD-WRT's mt7530 driver only uses one IRQ for the whole device. With the new patch noted above, OpenWrt is trying to assign one IRQ per port. I don't think OpenWrt's mt7621 & mt7530 drivers, even with the recent patch, are hooking into the IRQs properly.

Anyway, DD-WRT iperf3 data follows:

Kurt@Roswell:~$ iperf3 -c 192.168.3.200 -t 10
Connecting to host 192.168.3.200, port 5201
[  4] local 192.168.3.20 port 58023 connected to 192.168.3.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   114 MBytes   953 Mbits/sec
[  4]   1.00-2.00   sec   110 MBytes   927 Mbits/sec
[  4]   2.00-3.00   sec   113 MBytes   946 Mbits/sec
[  4]   3.00-4.00   sec   113 MBytes   946 Mbits/sec
[  4]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
[  4]   5.00-6.00   sec   113 MBytes   945 Mbits/sec
[  4]   6.00-7.00   sec   113 MBytes   945 Mbits/sec
[  4]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
[  4]   8.00-9.00   sec   113 MBytes   947 Mbits/sec
[  4]   9.00-10.00  sec   113 MBytes   948 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec                  receiver

iperf Done.
Kurt@Roswell:~$ iperf3 -c 192.168.3.200 -t 10 -R
Connecting to host 192.168.3.200, port 5201
Reverse mode, remote host 192.168.3.200 is sending
[  4] local 192.168.3.20 port 58036 connected to 192.168.3.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   941 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  4]   2.00-3.00   sec   113 MBytes   946 Mbits/sec
[  4]   3.00-4.00   sec   113 MBytes   947 Mbits/sec
[  4]   4.00-5.00   sec   113 MBytes   947 Mbits/sec
[  4]   5.00-6.00   sec   112 MBytes   941 Mbits/sec
[  4]   6.00-7.00   sec   113 MBytes   947 Mbits/sec
[  4]   7.00-8.00   sec   113 MBytes   947 Mbits/sec
[  4]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
[  4]   9.00-10.00  sec   113 MBytes   945 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   945 Mbits/sec  1112             sender
[  4]   0.00-10.00  sec  1.10 GBytes   945 Mbits/sec                  receiver

iperf Done.

Interesting. Does anyone else share this perspective?

If so, any fix?

I just reverted from yesterday's snapshot to release 21.02.1 to test the throughput there. Interestingly, it is much better in the release (~830 Mb/s) than in the current snapshot (~500 Mb/s). Data is below.

On this device the 21.02.1 release uses kernel version 5.4 and snapshot uses 5.10. Both, however, should be using fundamentally the same mt7621 and mt7530 drivers.

Speaking of that recent patch referred to earlier in this thread: I had a brief conversation with a dev. That patch does not address throughput; it only attaches IRQs to detect link-status changes. Except for detecting that you plugged in your cable three tenths of a second faster, it has no effect on throughput.

So I don't know what's up with snapshot. I think it's clear something is, but that patch has been a red herring. It's still possible that something is amiss in the IRQ handling department. I don't have any real evidence, just this:

  • The iperf3 process rails its core to 100%, 25% of which is system time, which feels like a lot of system time and is reminiscent of polling (see the sketch below).
  • On snapshot, receiving (~500 Mb/s) is much slower than sending (~700 Mb/s), which is very reminiscent of polling behaviour. In 21.02.1, sending and receiving are the same (~830 Mb/s).
  • Benchmarks that differentiate between CPU-only speed and CPU + libc + kernel speed (run on the same device) show that OpenWrt's kernel and libc are MUCH more efficient than DD-WRT's. So a more efficient kernel or libc is not what is behind DD-WRT's throughput advantage, given that DD-WRT gets the full gigabit link speed out of the same device.

None of the above are absolutely diagnostic, but they do seem to add up.
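
For what it's worth, the user/system split can be reproduced without htop; busybox time prints it directly (a sketch; <peer> is whatever host runs the other iperf3 end):

# A large 'sys' share relative to 'user' is what smells like polling
time iperf3 -c <peer> -t 30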

I'm very interested in this as well.

iperf3 data from 21.02.1 follows:

$ iperf3 -c 192.168.3.200 -t 100
Connecting to host 192.168.3.200, port 5201
[  4] local 192.168.3.20 port 57594 connected to 192.168.3.200 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  99.1 MBytes   831 Mbits/sec
[  4]   1.00-2.00   sec  98.0 MBytes   822 Mbits/sec
...
[  4]  36.00-37.00  sec  97.9 MBytes   821 Mbits/sec
[  4]  37.00-37.37  sec  36.5 MBytes   831 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-37.37  sec  3.59 GBytes   826 Mbits/sec                  sender
[  4]   0.00-37.37  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

I would just go with the highest speed option and leave it alone.

5.4 vs 5.10 isn't really a fair comparison, and if you built your own box, the newest kernels are 5.15-5.17 right now.

The hardware determines how new you can go, though. When running an 8700K system I could only go as far as 5.14-rc7, but after upgrading to a 12700K I'm stuck in the 5.15.x realm, as the NIC I'm using doesn't like 5.16/5.17 at the moment for whatever reason. Whether it's a missing module or some other oddity, the NIC causes a boot hang and doesn't come up when the desktop loads.

There are quirks to each release though as they become available.

Hello everyone,

Got much the same problem here. After updating my Cudy WR1300 (MT7621) from 19.07 to the latest 21.02.1 firmware, I can get at most ~500 Mb/s, compared to ~800 Mb/s before (on a 1 Gb/s / 500 Mb/s connection).

I did try resetting and starting completely fresh; same result.

I've just activated software and hardware flow offloading, and everything is now back to normal.
