I have started testing 21.02 on a couple of my MT7621-based devices (ZBT WG3526) and noticed a huge drop in throughput. My test is fairly basic: I use iperf3 with one client connected to the LAN side and one to the WAN side of the router, and the network configuration is the default one (i.e., no changes made after installing 21.02). I measure TCP performance and run eight flows in parallel to make sure I saturate the CPU.
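For readers who want to reproduce this kind of measurement, a minimal sketch for pulling the aggregate figure out of an iperf3 run is below. It assumes the client was run with JSON output and eight parallel streams (for example `iperf3 -c <server> -P 8 -J > result.json`) and that the report has the `end.sum_received.bits_per_second` field that iperf3 emits for TCP tests; the sample report is synthetic, not a measurement from this thread.

```python
import json


def aggregate_mbps(report: dict) -> float:
    """Receiver-side aggregate TCP throughput, in Mbit/s, from an iperf3 -J report."""
    # For TCP tests, iperf3 sums all parallel (-P) streams into end.sum_received
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e6


if __name__ == "__main__":
    # Synthetic report for illustration only; a real one would come from
    # json.load() on the file written by iperf3 -J
    report = json.loads('{"end": {"sum_received": {"bits_per_second": 9.4e8}}}')
    print(f"{aggregate_mbps(report):.0f} Mbit/s")
```

Using the receiver-side sum rather than the sender-side one avoids counting data that was sent but not yet acknowledged when the test ended.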
When I run my tests with 19.07, I measure around 1 Gbit/s. However, with 21.02, I only get ~450 Mbit/s. I see the same with the most recent master (as of when this post is written), which suggests the problem is not specific to 21.02. Enabling flow offloading brings the 21.02 performance up to the same level as 19.07, but it only seems to help with "plain" routing performance. If, for example, I add Wireguard to the mix and run the same test (iperf3 between two clients) across a wg tunnel, I again see a reduction of roughly 50%.
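To make the size of the regression concrete, the relative loss can be computed from the two figures quoted above; this is plain arithmetic, with ~1000 Mbit/s and ~450 Mbit/s taken as the 19.07 and 21.02 results.

```python
def relative_drop(baseline_mbps: float, new_mbps: float) -> float:
    """Fraction of baseline throughput lost (0.0 = no loss, 1.0 = total loss)."""
    if baseline_mbps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_mbps - new_mbps) / baseline_mbps


# Figures from the post: ~1 Gbit/s on 19.07 versus ~450 Mbit/s on 21.02
print(f"{relative_drop(1000, 450):.0%}")  # prints 55%
```

A loss of about 55% is in the same range as the "roughly 50%" reduction seen across the Wireguard tunnel.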
Does anyone have any configuration or optimisation tips, or pointers on where to start looking to reclaim the missing performance? I have made sure that packet steering, etc., is enabled.
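As far as I understand it, OpenWrt's packet steering option enables the kernel's Receive Packet Steering (RPS), which hashes each flow and sends all of its packets to one CPU chosen from a mask. That is also why a multi-flow test is needed to load every core: a single flow always lands on the same CPU. The sketch below is a simplified model of that mapping, not the kernel's code; the SHA-256 hash, the flow tuples and the CPU list are assumptions for illustration (the kernel uses its own flow hash).

```python
import hashlib
from collections import Counter


def pick_cpu(flow: tuple, cpus: list[int]) -> int:
    """Map a flow tuple to one CPU, deterministically (simplified RPS model)."""
    # Assumption: SHA-256 stands in for the kernel's own flow hash
    digest = hashlib.sha256(repr(flow).encode()).digest()
    h = int.from_bytes(digest[:4], "little")
    return cpus[h % len(cpus)]


def spread(flows: list[tuple], cpus: list[int]) -> Counter:
    """Count how many flows land on each CPU."""
    return Counter(pick_cpu(f, cpus) for f in flows)


if __name__ == "__main__":
    cpus = [0, 1, 2, 3]  # MT7621: two cores with two hardware threads each
    # Eight hypothetical iperf3 flows that differ only in the client source port
    flows = [("192.168.1.100", 40000 + i, "10.0.0.2", 5201, "tcp") for i in range(8)]
    print(spread(flows, cpus))
```

Even with eight flows, hashing can leave one CPU with more flows than another, so per-thread load will rarely be perfectly even.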
Thanks. Flow offloading fortunately helps with my routing throughput test, but unfortunately it has no effect on Wireguard performance.
After writing my post yesterday, I dug out an MT7622 board I have that is equipped with the MT7530 switch (the same switch as on MT7621). The reason for testing MT7622 is that the old swconfig driver is still available in the tree for it.
MT7622 is fast enough to manage gigabit with and without offloading enabled, so I focused on my Wireguard test. With DSA, I got around half the throughput of the swconfig driver. CPU usage was about the same with DSA and swconfig, so my guess is that DSA performs some extra operations or steps that swconfig skips. Time to whip out perf and see what is going on.
Another observation is that the performance issue only appears when I measure between WAN and LAN. I replaced my WAN connection (the WAN port) with a 5G module connected to a 5G network, and moved my Wireguard server (plus the iperf3 server) to a data centre. With the 5G setup, I got significantly better throughput than with DSA and WAN <-> LAN.
I'm afraid no one will be able to point to a specific source of that problem. You need to at least find the OpenWrt commit that broke performance for you. Start by checking kernel bumps (e.g., the 5.4 to 5.10 switch).
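Finding the offending commit is usually done with `git bisect`, which performs a binary search over the history between a known-good and a known-bad revision. The sketch below shows that search in isolation. It assumes a linear history and a regression that never goes away once introduced; the commit names, the regression point and the `is_bad` predicate are hypothetical stand-ins for "build this commit, flash it, run iperf3 and compare against the 19.07 baseline".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and badness never reverts once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical 20-commit history with a regression introduced at c13
    history = [f"c{i:02d}" for i in range(20)]
    tested = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)  # each call stands for one build + flash + iperf3 run
        return int(commit[1:]) >= 13

    print(first_bad(history, is_bad), "after", len(tested), "builds")  # prints: c13 after 5 builds
```

The number of test builds grows with the logarithm of the number of commits, which is what makes bisecting practical even when every step means flashing a router.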
Thanks. I expected as much, but was hoping someone could point me to some out-of-tree patches or something else I could try.
I have kept working on the issue. I am not able to make much sense of the perf output, but since one symptom is very high CPU usage, I tried backporting the multi-CPU DSA feature that has been proposed a few times and using it on my MT7622 board. Unfortunately, this didn't help much, and performance is roughly the same.
Time to go back to staring at the perf output, I guess.
I also saw a large drop in LAN performance on my MT7621-based ER-X gateway going from 19.07 to 21.02: down to a bit under 300 Mbps with the ER-X as the server.
The MT7621 target was very recently switched to the 5.10 kernel in snapshots, so I gave that a try and am happy to report that things are looking up for the old ER-X: ~500 Mbps with no flow offloading enabled and packet steering checked. The four CPU threads peak at ~74%, ~90%, ~35% and ~70%, so the load is spread around fairly well.
I saw no further improvement from enabling software offloading. Still, I'm pretty excited to see it reach half a gig without offloading on this hardware!
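To summarise the per-thread peaks quoted above in a couple of numbers, a small sketch computing the mean peak and how far the busiest and quietest threads sit from it. Keep in mind these are peaks read off a monitor, so they did not necessarily occur at the same instant.

```python
from statistics import mean


def load_summary(peaks: list[float]) -> tuple[float, float, float]:
    """Return (mean peak %, busiest/mean, quietest/mean) for per-thread CPU peaks."""
    avg = mean(peaks)
    return avg, max(peaks) / avg, min(peaks) / avg


# Per-thread peaks reported for the ER-X on the 5.10 snapshot
avg, hi_ratio, lo_ratio = load_summary([74, 90, 35, 70])
print(f"mean peak {avg:.2f}%, busiest {hi_ratio:.2f}x mean, quietest {lo_ratio:.2f}x mean")
```

A busiest/mean ratio close to 1.0 would mean perfectly even spreading; the single quiet thread is what pulls the mean down here.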
It looks like offloading may be back in the next release:
Thanks for that link, mrlamud. I am using VLANs, and I had tried only software offload, not hardware offload, with the recent 5.10 kernel snapshot.
Unfortunately, hardware offload doesn't help either.
With SQM off and my ER-X as the iperf3 server, I get ~500 Mbps over a wired link between the ER-X and a Linksys EA8500 AP. Reversing the direction, with the EA8500 as the server, I get ~700-800 Mbps. CPU utilization on the ER-X is the same with no offloading, software offloading, and hardware offloading. The results are the same with SQM enabled on the WAN.
At least for my setup, offloading (software or hardware) does nothing at all with the recent 5.10 kernel snapshot. Still, that snapshot is a lot faster than 21.02 even without offloading.
It looks like kristrev has narrowed the problem down to DSA here:
I continued working on this problem and discovered that I had made a mistake when testing master. After correcting it, master gives roughly the same performance as 19.07.
Since master works well, I decided to backport the 5.10 kernel to 21.02. More or less as expected, this brought 21.02 performance in line with 19.07, and userspace seems to work well. I also backported the upstream kernel commits that add support for the MT7530 interrupt controller. Using the interrupt controller lowered CPU usage, which in turn yielded higher Wireguard throughput.