2 Gbps WAN/LAN NAT Routing on ramips MT7621 devices

The modem in the NR7101 is wired to the SoC's internal xHCI controller, so it won't be directly affected by this. In theory it could take some advantage of the extra bandwidth between the CPU and the switch, but it's hard to see where you'd use that bandwidth given the single Ethernet port...

Answered here: Users needed to test 2 Gbps WAN/LAN NAT Routing on ramips MT7621 devices - #53 by arinc9

Tried it on Cudy-X6

Tested with a straight-through cable connecting two hosts. I get close to 1 Gbps in both directions.
Then tested with the router inline. Similar to arinc9's test, except the iperf3 client and server are separate hosts.
With 22.03 rc5 I see about 1 Gbps total.

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-5.00   sec  83.8 MBytes   141 Mbits/sec   46             sender
[  5][TX-C]   0.00-5.00   sec  82.5 MBytes   138 Mbits/sec                  receiver
[  7][RX-C]   0.00-5.00   sec   441 MBytes   740 Mbits/sec  219             sender
[  7][RX-C]   0.00-5.00   sec   438 MBytes   734 Mbits/sec                  receiver

With arinc9's sysupgrade image I see about the same.

 - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec   443 MBytes   186 Mbits/sec  280             sender
[  5][TX-C]   0.00-20.01  sec   439 MBytes   184 Mbits/sec                  receiver
[  7][RX-C]   0.00-20.00  sec  1.48 GBytes   637 Mbits/sec   28             sender
[  7][RX-C]   0.00-20.01  sec  1.48 GBytes   635 Mbits/sec                  receiver

I flashed the sysupgrade image and did not keep the config.
All I did was change the IP addresses on the router.
Is there some other config required?
Hardware offload? Where is that?

OK, found it in the firewall settings.
You have to tick the Software offloading check box before the Hardware offloading check box appears. A bit confusing.

Now the magic happened.
This is what I get with arinc9's sysupgrade image and hardware offloading enabled on the Cudy X6:

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec  2.17 GBytes   931 Mbits/sec    0             sender
[  5][TX-C]   0.00-20.00  sec  2.17 GBytes   930 Mbits/sec                  receiver
[  7][RX-C]   0.00-20.00  sec  2.04 GBytes   876 Mbits/sec    0             sender
[  7][RX-C]   0.00-20.00  sec  2.04 GBytes   875 Mbits/sec                  receiver
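
To put the bidirectional result in terms of the thread title, the two receiver-side rates can be added up. The following is only a rough sketch: the regular expression and the helper name `receiver_rates` are my own illustration, not part of iperf3, and the input is the two receiver lines from the run above.

```python
import re

# Receiver-side summary lines copied from the Cudy X6 run with
# hardware offloading enabled.
SUMMARY = """\
[  5][TX-C]   0.00-20.00  sec  2.17 GBytes   930 Mbits/sec                  receiver
[  7][RX-C]   0.00-20.00  sec  2.04 GBytes   875 Mbits/sec                  receiver
"""

# Capture the role tag and the bitrate of each receiver summary line.
LINE = re.compile(r"\[(TX-C|RX-C)\].*?([\d.]+)\s+Mbits/sec\s+receiver")


def receiver_rates(text):
    """Return {role: Mbit/s} for every receiver summary line in text."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(text)}


rates = receiver_rates(SUMMARY)
total = sum(rates.values())
print(rates, total)
```

For this run the sum is 1805 Mbit/s across both directions, compared with roughly 870 Mbit/s (184 + 635) for the same image before offloading was enabled.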

Also tested the Cudy 2100, with similar joy.

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec  2.17 GBytes   931 Mbits/sec   12             sender
[  5][TX-C]   0.00-20.00  sec  2.17 GBytes   930 Mbits/sec                  receiver
[  7][RX-C]   0.00-20.00  sec  2.04 GBytes   877 Mbits/sec    0             sender
[  7][RX-C]   0.00-20.00  sec  2.04 GBytes   876 Mbits/sec                  receiver

This is the test setup:
[Image: test-setup-pic2]

This is a very good point. Let me start by explaining the bridge offloading feature. This is not related to software or hardware offloading at all.

Quoting from OpenWrt 22.03.0-rc4 fourth release candidate - #91 by arinc9.

There's also this feature called bridge offloading. Most DSA subdrivers implement this. The only one I know that lacks this is the rtl8365mb driver. What this feature does is it offloads traffic between switch ports. Forwarding frames between switch ports will be offloaded to the switch hardware so they don't go through the CPU and exhaust the link to the CPU.

You get full speed on switching between lan0 and lan1 thanks to this.

When lan0 is directly connected to the CPU, you can't benefit from this anymore. Hardware offloading will help, but it's a layer 3 feature, meaning packets between the interfaces must be routed.
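
The distinction can be written down as a toy decision function. This is a deliberately simplified model, not driver code: the port sets and the function name `forwarding_path` are hypothetical, with lan0 standing in for the port that is wired straight to the CPU.

```python
# Hypothetical layout: lan1..lan3 sit behind the switch fabric, while
# lan0 is muxed directly to the SoC's second MAC. Names are illustrative.
SWITCH_PORTS = {"lan1", "lan2", "lan3"}
DIRECT_TO_CPU = {"lan0"}


def forwarding_path(src, dst, bridged):
    """Classify how traffic between two ports is handled in this model."""
    both_on_switch = src in SWITCH_PORTS and dst in SWITCH_PORTS
    if bridged and both_on_switch:
        # Bridge offloading: frames never reach the CPU.
        return "switch hardware (bridge offloading)"
    if bridged:
        # One end bypasses the switch, so the CPU has to bridge in software.
        return "CPU software bridge"
    # Layer 3: routed flows are what hardware flow offloading can pick up.
    return "CPU routing (eligible for hardware flow offloading)"


print(forwarding_path("lan1", "lan2", bridged=True))
print(forwarding_path("lan0", "lan1", bridged=True))
print(forwarding_path("lan0", "lan1", bridged=False))
```

In this model, only the middle case (bridging across the directly connected port) is left without any form of offloading.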

This is the rule for hardware offloading:

iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw

Related patches to the FLOWOFFLOAD target:
openwrt/650-netfilter-add-xt_FLOWOFFLOAD-target.patch at master · openwrt/openwrt
openwrt/800-flowoffload_target.patch at master · openwrt/openwrt

So, it's not a missing configuration but rather a design choice. Switching at CPU level is not very efficient on these SoCs so you'd prefer to do routing instead.

@arinc9, what is your reasoning for bringing up gmac1 as a separate WAN port? Why not patch the DSA driver to fully support multiple CPU ports? (There were patches for this before that for some reason never made it into mainline.)

So that we get wan@eth1 and lan1..4@eth0. Or even have it user-selectable, like wan@eth1, lan1@eth0, lan2@eth1, lan3@eth0, or whatever combination makes most sense depending on the use case.

That would be a DSA-subsystem-wide feature, and there's a reason why the said patches never made it into mainline. The MT7530 DSA driver already implements this MT7530-switch-specific feature, which allows muxing PHY0/4 to GMAC5 of the switch.

Would the D-Link DIR-1960 benefit from this?

Given that the DIR-1960 is included in this patch set, I would say yes. Basically any MT7621-based device would, with a few exceptions as stated in the first post.

@arinc9, I was hoping that OpenWrt could/would accept some multi-CPU DSA patches so more targets would benefit from this, even if the mainline kernel will not due to lack of consensus on how to implement it. @Ansuel tried and failed on mainline with this if I remember correctly, even though it worked perfectly on his targets.

I agree, however, this is not really my concern.

Is there an upgrade package for the DIR-1960? I have not done the patch route before, but I'm comfortable doing it as long as I know the process.

EDIT: Never mind, I located it.

I can confirm that on the D-Link DIR-1960 (and therefore the 2660 also) all seems to work just fine. I have not done a full-duplex speed test, but the image itself and the WAN port are both working.

@arinc9 Can you explain a bit more why there is no benefit from this on the Xiaomi Mi Router 3G V1? I saw it mentioned on GitHub.

Can this patch help with VLAN performance? I have my WAN port tagged and lan1 and lan2 untagged as VLAN 200 and 300 with bridge filtering enabled. I'm running 22.03.0-rc6.
Switching works fine and I get gigabit speeds (~890 Mbit/s) on devices plugged into lan1/lan2. But running iperf3 on the router caps at around 300 Mbit/s.
Is it because of missing flow offload for the br.200 interface?

MT7530's phy0 or phy4 can be muxed to the switch's gmac5 which connects to the SoC's gmac1. This device does not use phy0 or phy4 so there is no muxable PHY.
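
As a quick illustration of that rule, the check boils down to a set intersection. The PHY layouts below are made up for the example and are not taken from any device tree; only the {0, 4} constraint comes from the explanation above.

```python
# Per the explanation above, only phy0 or phy4 of the MT7530 can be
# muxed to the switch's gmac5.
MUXABLE_PHYS = {0, 4}


def muxable_phys(used_phys):
    """Return the PHYs a board actually uses that could be muxed to gmac5."""
    return sorted(MUXABLE_PHYS & set(used_phys))


# Hypothetical layouts, for illustration only.
five_port_board = [0, 1, 2, 3, 4]
three_port_board = [1, 2, 3]

print(muxable_phys(five_port_board))   # candidates exist
print(muxable_phys(three_port_board))  # nothing to mux
```

A board whose ports are all on PHYs 1 to 3 returns an empty list, which is the situation described for this device.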

Not directly, no.

Someone on GitHub commented:

I flashed the test firmware on iptime t5004 and saw the nat/routing performance increased from 500/500 to 1000/1000. But the bridge performance dropped from 1000/1000 to 500/500.

It is more generally considered to be beneficial to prioritize nat / routing performance.

Is this expected/correct?

I have tested on a TOTOLINK X5000R. The network ports work correctly.

Unfortunately I cannot test with iperf3 at the moment, but even the standard speedtest.net results are much better.

Previous values were around 700/850, with quite a bit of variance between runs.
I am now getting 944/946, which is the same on every run. It's much more consistent.

I will now read up on the impact of using the flow offloading settings, but I suspect it won't matter as our network is simple, with only basic firewall rules.

Prior to this, I was using the router just for the wifi, with a WRT32X being main so I could get closer to the 1000/1000 my ISP gives.

It looks like it's finally time to retire the WRT32X.
Thank you very much for the patch.

The top: This was already discussed here.
The bottom: To each their own.

You can check this post of mine to read more details on hardware offloading. It shouldn't have any negative effects.

I actually happen to have a similar setup. The WRT32X's Wi-Fi is just horrendous, with an unstable connection on the 2.4 GHz radio, so I got an AP with an MT7621 SoC to deal with Wi-Fi instead.

Cheers.

Is there support for the Netgear WAX202 with the same chip? I can't find any file on GitHub to test.