I have conducted some iperf3 tests on my Netgear R6700v2 (mt7621) device and have noticed that it can only sink packets at ~500Mb/s and source them at around 700Mb/s. I have employed irqbalance and ensured that the iperf3 process is on its own core. There is no NAT involved - packets are going directly to/from the device over wired gigabit ethernet. While iperf3 is running, htop shows sirq at ~25%, and the iperf3 process is maxed at 100% on one core.
The interesting thing is that the same device with DD-WRT installed will source/sink packets at the full link speed, topping out at 950Mb/s (max gigabit speed when overhead is considered) in both directions. In that case, sirq was around 13% and the iperf3 process itself maxed at about 25% CPU usage on one core. Performance was so good that I used this device as the packet source/sink for NAT speed trials for other devices. Double the speed at a fraction of the CPU usage.
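For reference, the "max gigabit speed when overhead is considered" figure can be roughly reproduced with some arithmetic. This is only a sketch under assumptions that are not stated in the thread: a 1500-byte MTU, IPv4, no VLAN tag, and TCP with or without the 12-byte timestamp option.

```python
# Rough upper bound on TCP goodput over gigabit Ethernet.
# Assumptions (not from the thread): 1500-byte MTU, IPv4, no VLAN tag.
LINE_RATE_MBPS = 1000
MTU = 1500
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_HEADER = 20
TCP_HEADER = 20


def max_goodput_mbps(tcp_options=12):
    """Payload bytes per frame divided by bytes occupied on the wire."""
    payload = MTU - IP_HEADER - TCP_HEADER - tcp_options
    wire_bytes = MTU + ETH_OVERHEAD
    return LINE_RATE_MBPS * payload / wire_bytes


print(f"{max_goodput_mbps(12):.1f} Mb/s")  # with TCP timestamps: 941.5 Mb/s
print(f"{max_goodput_mbps(0):.1f} Mb/s")   # without TCP options: 949.3 Mb/s
```

Under those assumptions the ceiling lands between roughly 940 and 950 Mb/s, which is consistent with the DD-WRT result being effectively line rate.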
I did attempt to activate both software and hardware flow offloading. As expected, since NAT isn't involved, neither had any effect.
Both DD-WRT and OpenWrt are using the same mt7621 drivers. Any hints at how to improve performance?
iPerf3 testing on the vast majority of embedded devices will not fully saturate a gigabit ethernet connection, regardless of whether it is sourcing or sinking the packets. That is because it is CPU bound.
You are always best served by running iPerf through a device, where both the sourcing and sinking devices are capable of saturating the link (e.g. a general purpose computer or even an RPi4 will do the trick).
Are you sure that you had iPerf3 running on the router itself when you were using DDWRT? I don't recall iPerf being built-in with DDWRT, and unlike OpenWrt, you cannot add packages to the firmware. I could be mistaken, though.
Running packets through is a good test for a device's switching and/or NAT capacity. Running into/out of a device is a good test of its ability to serve data (SAMBA, FTP, minidlnad, etc).
Entware is available for DD-WRT. It basically offers most OpenWrt packages for most other firmwares built to run out of /opt. I'm positive on the topology I was employing.
Hmmmm. I /am/ running snapshot - doing so to give me access to ntfs3. Perhaps that is an issue.
I wasn't employing NAT in any test on either DD-WRT or OpenWrt. SFE doesn't affect packets sourcing from or destined to the device itself.
Very true. My response was based on the typical use of traffic flowing through the device, but if you are serving data from the unit, this does make a lot of sense to test.
Cool. I did not know that. Thanks. It has been probably 5-7 years or more since I last touched DDWRT (and even if it was available at the time, I may not have known to look).
You know, that would explain a lot. If polling is involved it explains the low throughput and high CPU. Only thing is there isn't an external mt7530 involved here. The mt7621 also has its own internal 5 port switch which is what my device is using. But it makes me wonder if in snapshot kernels the same thing is happening with the internal switch. It's something to look at.
I just installed today's snapshot (since the patch just narrowly missed being included in yesterday's). And while the ethernet ports now report actual IRQs rather than falling back to polling, it doesn't make a difference. Performance is just the same.
I know it shows IRQs, but for the life of me it still behaves like it's actually polling. I can't otherwise explain why iperf3 has such a high CPU usage.
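One way to put numbers on the CPU usage, rather than reading it off htop, is to diff two samples of /proc/stat and work out what share of each core's time went to system and softirq work. The following is a minimal sketch; the two sample texts are made-up values for illustration, not measurements from this device.

```python
# Per-core time shares from two /proc/stat samples.
# Field order on the "cpuN" lines: user nice system idle iowait irq softirq steal
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

# Hypothetical samples, invented for illustration only.
BEFORE = """\
cpu  200 0 100 1600 0 0 100 0
cpu0 100 0 50 800 0 0 50 0
cpu1 100 0 50 800 0 0 50 0
"""
AFTER = """\
cpu  275 0 125 1675 0 0 125 0
cpu0 175 0 75 800 0 0 50 0
cpu1 100 0 50 875 0 0 75 0
"""


def parse_cpu_lines(text):
    """Return {"cpu0": {"user": ..., ...}, ...}, skipping the aggregate "cpu" line."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def share(before, after, field):
    """Percentage of each core's elapsed ticks spent in `field` between samples."""
    old, new = parse_cpu_lines(before), parse_cpu_lines(after)
    result = {}
    for cpu in new:
        delta = {f: new[cpu][f] - old[cpu][f] for f in FIELDS}
        total = sum(delta.values())
        result[cpu] = 100.0 * delta[field] / total if total else 0.0
    return result


print("system  %:", share(BEFORE, AFTER, "system"))
print("softirq %:", share(BEFORE, AFTER, "softirq"))
```

On a real device the two texts would come from reading /proc/stat a few seconds apart while iperf3 is running.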
500mbit is about what this SoC can handle, you can optimize compiler flags slightly but that'll probably give you 10% or so at best. Software does evolve and focus is shifted to more modern archs (MIPS is dying/slowly dying) so don't expect newer software to perform better, more like equal or slightly worse. The MT7621 was launched in 2013 so compare that to a 10y old computer not being able to run the latest games (without any upgrades since being purchased). While it does have hardware support for NAT it's always been "best effort" and in general hardware acceleration has always been on the very low priority list due to many reasons. If it performs better with older software and you don't mind the complications it entails just stay with that software.
Accounting for overhead, DD-WRT on the same device can saturate the gigabit Ethernet line. Data is below.
This is with the same mt7621 drivers (essentially)
Interestingly, DD-WRT's mt7530 driver only uses one IRQ for the whole device. With the new patch noted above, OpenWrt is trying to assign one IRQ per port. I don't think OpenWrt's 7621 & 7530 drivers, even with the recent patch, are hooking into the IRQ properly.
I just reverted from yesterday's snapshot to release 21.02.1 to test the throughput there. Interestingly it is much better in the release (~830 megabits/s) than in current snapshot (~500 megabits/s). Data is below.
On this device the 21.02.1 release uses kernel version 5.4 and snapshot uses 5.10. Both, however, should be using fundamentally the same mt7621 and mt7530 drivers.
Speaking toward that recent patch referred to earlier in this thread - I had a brief conversation with a dev. That patch is not addressing throughput. It is only attaching IRQs to detect link status changes. Except for detecting when you plug in your cable three tenths of a second faster, it has no effect on throughput.
So I don't know what's up with snapshot. I think it's clear something is, but that patch has been a red herring. It's still possible that something is amiss in the IRQ handling department. I don't have any real evidence, just this:
The iperf3 process rails its core to 100%, 25% of which is system time. Which feels like a lot of system time and is reminiscent of polling.
On snapshot, receiving (~500 megabit) is much slower than sending (~700 megabit), and that is very reminiscent of polling behaviour. In 21.02.1, sending and receiving are the same (~830 megabit).
Benchmarks that differentiate between CPU-only speed and CPU + libc + kernel speed (run on the same device) show that OpenWrt's kernel and libc are MUCH more efficient than DD-WRT's. This implies a more efficient kernel or libc is not what is behind DD-WRT's throughput advantage since it gets the full gigabit link speed out of the same device.
None of the above are absolutely diagnostic, but they do seem to add up.
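The polling suspicion could be checked more directly by comparing two snapshots of /proc/interrupts taken during a transfer and seeing how fast the Ethernet interrupt count climbs. This is only a sketch: the snapshot contents, IRQ number, and device name below are hypothetical placeholders, not output from this router.

```python
# Interrupts per second between two /proc/interrupts snapshots.
# The snapshots below are hypothetical; the IRQ number and name are made up.
BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
 23:       1000          0          0          0  MIPS GIC  10  1e100000.ethernet
ERR:          0
"""
AFTER = """\
           CPU0       CPU1       CPU2       CPU3
 23:      51000          0          0          0  MIPS GIC  10  1e100000.ethernet
ERR:          0
"""


def parse_interrupts(text):
    """Map (irq label, description) to the count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row lists one column per CPU
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[(label.strip(), description)] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    return {key: (new[key] - old[key]) / seconds for key in new if key in old}


for (irq, name), rate in irq_rates(BEFORE, AFTER, 5.0).items():
    print(f"{irq:>4} {rate:10.1f}/s  {name}")
```

A rate that stays near zero under load would suggest the driver is not being driven by interrupts. A moderate rate is harder to interpret, since NAPI drivers deliberately switch to polling under heavy traffic, so this is a hint rather than a diagnosis.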
I would just go with the highest speed option and leave it alone.
5.4 vs 5.10 aren't really a fair comparison and if you built your own box the newest kernels are 5.15-5.17 right now.
How new you can go depends on the hardware, though. When I was running an 8700K system I could only go as far as 5.14.rc7, but since upgrading to a 12700K I'm stuck in the 5.15.x realm, as the NIC I'm using for whatever reason doesn't like 5.16 / 5.17 at the moment. Whether it's a missing module or some other oddity, the NIC causes a boot hang and doesn't come up when the desktop loads.
There are quirks to each release though as they become available.
I've got much the same problem here. After updating my Cudy W1300 (MT7621) from 19 to the latest 21.02.1 firmware, I max out at about 500 Mb/s compared to 800 Mb/s before (on a 1 Gb/500 Mb connection).
I did try to reset and start completely fresh, same result.