Overview
While plenty of individuals have tested various routers, the question still always seems to come up: "What can I expect from Router X?"
I won't say that my testing is going to definitively answer those questions, but at least it should be a self-consistent data set.
To be very clear, this is a benchmark. Like most benchmarks, the performance you get in your own environment may be different. If you have a craptastic ISP, a lossy ISP connection, a flakey cable, hosts pummeling your LAN with broadcast traffic, a different traffic distribution, a poorly implemented router, or any of an uncountable number of other things, the results you see may not meet the levels reported here. It is your responsibility to use any information provided here to assist you in making your decision.
Testing Outline
Testing was performed using `flent`, which provides "stress testing" by using multiple streams. The three tests shown are tcp_8down and tcp_8up (each with ICMP ping), and RRUL (4 up, 4 down, with 3 UDP pings and ICMP ping). IPv4 was used. The standard 60-second test duration (for an overall 70-second run duration) was used. These multi-stream tests are more stressful than single-stream testing.
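For reference, the three tests can be run with invocations along these lines; the netserver host address and titles here are placeholders, not the author's exact commands:

```
# Data files (*.flent.gz) land in the current directory; plots can be made
# afterwards with flent's -i/-p options or with flent-gui.
flent tcp_8down -H 192.168.1.100 -l 60 -t "router-x tcp_8down"
flent tcp_8up   -H 192.168.1.100 -l 60 -t "router-x tcp_8up"
flent rrul      -H 192.168.1.100 -l 60 -t "router-x rrul"
```

`-H` points at the machine running `netserver` on the far side of the router under test; `-l 60` is the (default) 60-second test length.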
This testing uses full-size (1500 MTU) TCP packets. If your traffic consists of a large fraction of small packets (such as VoIP), the PPS (packets-per-second) rate will be much higher for a given bandwidth. These effects are not explored in this thread, though they may further limit the performance in your environment.
Flow offloading is not enabled in the config. At the present time (September 2019) it apparently does not work on `master`. (2019-09-27 – Appears to now be resolved on `master`.)
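For the "With Flow Offload" results below, software flow offloading is the toggle in question. As a sketch, enabling it from the command line looks like this (the `flow_offloading` option in the firewall `defaults` section is standard OpenWrt; whether it helps a given device varies):

```
uci set firewall.@defaults[0].flow_offloading='1'
# Hardware-capable targets additionally have:
#   uci set firewall.@defaults[0].flow_offloading_hw='1'
uci commit firewall
/etc/init.d/firewall restart
```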
Current testing does not include `irqbalance`. Earlier testing included `irqbalance` running from boot for multi-core devices. Note that it needs to be enabled in /etc/config/irqbalance to have an effect. At least one device (IPQ4019) seems, under some conditions, to perform worse with `irqbalance` enabled. It has been hypothesized that the overhead of running `irqbalance` at the limits of performance is behind that observation. Performing tests with your device in your environment is suggested.
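As a sketch, enabling it looks like the following; the section and option names here follow the stock /etc/config/irqbalance shipped with the OpenWrt `irqbalance` package, so verify them against your build:

```
uci set irqbalance.irqbalance.enabled='1'   # assumes the package's default 'irqbalance' section
uci commit irqbalance
/etc/init.d/irqbalance restart              # or reboot so it runs from boot
```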
Default CPU-speed governors and settings were used. Many suggest that "tweaks" here can improve start-up and steady-state performance. If your tweaks let you exceed these numbers in practice, that's great!
SQM using `piece_of_cake.qos` was applied to the WAN interface for NAT/routing, or to the VPN's tunnel interface for WireGuard and OpenVPN. The same bandwidth target was applied for upstream as well as downstream. An overhead of 22 was used for Ethernet, 82 for WireGuard1, and 95 for OpenVPN2. The overhead values are believed to be close to correct, but are not prescriptive.
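In UCI terms, one SQM instance for a NAT/routing run looked roughly like the sketch below. The interface name and the 900000 kbit/s target are placeholders (the target was swept per device), and for the VPN cases `interface` pointed at the tunnel interface with the corresponding overhead:

```
uci set sqm.test=queue
uci set sqm.test.enabled='1'
uci set sqm.test.interface='eth0'          # WAN for NAT/routing; tunnel interface for VPN runs
uci set sqm.test.qdisc='cake'
uci set sqm.test.script='piece_of_cake.qos'
uci set sqm.test.download='900000'         # kbit/s; same target used upstream and downstream
uci set sqm.test.upload='900000'
uci set sqm.test.linklayer='ethernet'
uci set sqm.test.overhead='22'             # 82 for WireGuard, 95 for OpenVPN
uci commit sqm
/etc/init.d/sqm restart
```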
After staring at too many of these runs early on in this effort, I realized how much of a judgement call it was to look at a run and say if it was good or bad. Also, the sheer number of runs needed, even at just over a minute each, pushed me to automate the testing (even automated, the full set of runs takes five or six hours per device). I settled on the following criteria for a "good" SQM run:
- Coefficient of variation of the bandwidth of all eight TCP streams under 1%, or
- Standard deviation of the bandwidth of all eight TCP streams under 0.02 Mbps
Basically, if the streams go out of balance with each other by a "tiny" bit, then SQM is starting to break down. The second criterion comes into play for low-bandwidth runs (typically under 10 Mbps, aggregate), where the output from `flent` for a single stream might be 0.12 Mbps and the variance gets dominated by the rounding precision and/or single packets become significant in the flow-measurement variance.
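Expressed as a calculation (a sketch of the criteria above, not the author's actual automation), the pass/fail check on the eight per-stream mean bandwidths amounts to:

```
# stream_means.txt: one mean bandwidth per TCP stream, in Mbps (eight lines)
awk '{ n++; sum += $1; sumsq += $1 * $1 }
     END {
       mean = sum / n
       var  = sumsq / n - mean * mean        # population variance
       if (var < 0) var = 0                  # guard against rounding below zero
       sd = sqrt(var)
       if (sd / mean < 0.01 || sd < 0.02)
         print "good SQM run"
       else
         print "streams out of balance -- SQM starting to break down"
     }' stream_means.txt
```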
When the ICMP-ping time exceeded 10 ms, the highest throughput with an ICMP-ping time of under 10 ms is shown as well.
The number shown is the reported throughput, not the SQM bandwidth-target setting.
LuCI is installed, with nginx / OpenSSL installed and running, but LuCI is not "logged in", as early testing on a single-core device suggested that an active session can significantly degrade throughput. Running LuCI and generating page content will likely reduce the performance seen on single-core devices.
Wireless is not enabled.
WireGuard and OpenVPN were installed on Debian 10 ("Buster") on two upper-range x86_64/AMD64 machines. An AMD Ryzen 5 2600X was configured as the VPN "server" and `netserver` host, and an Intel i3-7100T drove the test as the "LAN" client. OpenVPN was configured for UDP, without compression. The test harness is capable of over 900 Mbps for Ethernet and WireGuard using `netperf` (effectively at the media limits with `iperf3`). It is capable of ~500 Mbps for OpenVPN.
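For anyone reproducing the setup, the raw capability of a harness can be sanity-checked with single-stream runs through the path before involving a router under test (the host address is a placeholder; `netserver` and `iperf3 -s` must already be running on the far host):

```
netperf -H 192.168.1.100 -l 30 -t TCP_STREAM    # toward the netserver host
netperf -H 192.168.1.100 -l 30 -t TCP_MAERTS    # from the netserver host
iperf3  -c 192.168.1.100 -t 30                  # cross-check with iperf3
iperf3  -c 192.168.1.100 -t 30 -R               # reverse direction
```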
As I'm not recommending any specific devices in this thread, I've listed them by target, clock speed, number of cores, and the SoC.
All tested devices have at least 128 MB of RAM.
Speeds are in Mbps, aggregate, as reported by `flent` (which uses `netperf`). Non-SQM runs are the median of five runs. SQM runs required two "successes" with fewer than two non-successful runs at a given bandwidth target. The parenthetical numbers are the median ping time in ms. Results are rounded to at most two significant figures.
With a minimum step size in the SQM bandwidth-target search of around 5%, differences of around 5-10% may not be significant in the SQM results. As ping time often increases abruptly as SQM starts to get overstressed (which is where these results are measured), I wouldn't consider ping time for SQM results as more than "interesting information" about the benefits of SQM when the router is stressed, not just the line.
Note that the RRUL test has upstream and downstream flows "competing" with each other. Especially noted in VPN testing, there can be a huge imbalance between the two without SQM, with one direction or the other dominating, or even effectively excluding the other.
Remember, the tcp_8up or tcp_8down tests' numbers don't imply that you can get that throughput in both directions simultaneously. Look at the RRUL numbers for simultaneous throughput. Remember that the RRUL numbers are already the total of upstream and downstream.
I would suggest selecting a device that exceeds your needs by a comfortable margin as measures of performance other than throughput (such as latency and stream "fairness") are often much better at slightly lower bandwidths than at the limits of performance.
Q&A
Router X's ping time is so high. Router Y's is lower and it's cheaper. Router X sucks!
Remember that this is a test to estimate the limits of performance, not to estimate latency under reasonable operating conditions. You shouldn't be regularly pushing your router this hard. At reasonable rates, the tested routers should all provide reasonable latency of a couple milliseconds or less, with another couple added if a VPN is involved for the slower SoCs. Even just backing down the SQM bandwidth target a small amount can significantly reduce the latency.
The note says that downstream OpenVPN dominates the results. Does that mean it only works well in one direction?
As with ping time, these numbers are at the limits of performance. When CPU or other limits come into play, "balance" can be greatly impacted by the code involved, kernel prioritization, implementation details of the TCP stack and Ethernet drivers, and the like. Virtually all of the OpenVPN tests without SQM and with competing upstream and downstream streams produced essentially downstream-only results. Back off from these limits by reducing the throughput you're asking for (fewer requests, or SQM) and reasonable, simultaneous upstream and downstream performance can be achieved.
WireGuard also exhibits imbalance, though not to the extremes seen in the OpenVPN testing.
My `iperf3` results are better.
That wouldn't surprise me. There's less to keep track of with a single stream, especially with SQM in play. Similar comments apply for things like the DSLReports test, which is a download test then an upload test.
Why haven't you tested Router X?
Likely because I don't own it, it's too old, or I haven't gotten to it yet.
But, it's faster if you set...
This is primarily about providing general guidance on whether a device is "sufficient" for an application, especially for users that do not build their own images or heavily configure OpenWrt. Things are pretty much stock with these configs. Any improvements over a stock config or build are a bonus.
Have a great tweak? I'd be interested in hearing about it on another thread, as would many others!
Results
Differences in SQM results of less than 5-10% may not be significant, as the smallest SQM step size explored was ~5%. Remember also that at the limits of performance of the device, dropping the SQM bandwidth target slightly can significantly reduce the latency. Note that 10% of 920 Mbps is nearly 100 Mbps. Results are rounded to two significant figures or less (this is more than the step size of the SQM tests).
If you skipped straight to here: whether flow offloading was in use is indicated for each table, the CPU-speed (governor) parameters have not been changed from the defaults, and there are no other "performance tweaks" known to be applied in the build or configuration. `piece_of_cake.qos` is used for SQM testing. Wireless is not enabled.
Key: "1720 (6)" means a throughput of 1720 Mbit/s with a median ping time of 6 ms.
When two results are shown side-by-side in the same cell (e.g., "110 (2) / 122 (21)"), the first is the highest throughput seen with a ping time of 10 ms or less; the second is the "by the test" limit where SQM was starting to "fall apart".
Routing/NAT
No Flow Offload
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
x86_64 | 1500-2500 | 4 | Celeron J4105 | Realtek RT8111A | 920 (4) | 940 (4) | 1720 (6) | 900 (1) | 920 (1) | 1610 (2) |
x86_64 | 1000-1400 | 4 | AMD GX-412TC | Intel i211AT | 920 (4) | 940 (5) | 1740 (7) | 770L (2) | 930 (2) | 700 (1) |
mvebu a9 | 1866 | 2 | 88W8964 | | 940 (2) | 940 (2) | 1820 (4) | 900 (1) | 920 (1) | 1110 (2) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 560 (31) | 510 (35) | 510 (32) | 290 (1) | 270 (1) | 290 (2) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 840 (19) | 940 (6) | 1420 (26) | 200L (3) | 210 (4) | 210 (3) |
ath79 | 775 | 1 | QCA9563 | Single MAC | 470 (9) | 380 (9) | 400 (9) | 210L (4) | 151 (3) | 185 (9) |
ath79 | 720 | 1 | QCA9558 | Dual MAC | 390 (8) | 320 (9) | 340 (13) | 210L (4) | 160 (3) | 196 (6) |
ath79 | 650 | 1 | QCA9531 | 100 Mbps phys | 67 (1) | 86 (3) | 184 (15) | 93 (4) | 92 (3) | 165 (6) |
ramips | 580 | 1 | MT7628N | 100 Mbps phys | 94 (6) | 94 (7) | 157 (87) | 81 (3) | 78 (3) | 75 (4) |
With Flow Offload (Many Devices Not Yet Tested)
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
mvebu a9 | 1866 | 2 | 88W8964 | | 940 (2) | 940 (2) | 1830 (3) | 920 (1) | 920 (1) | 1210 (1) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 910 (8) | 910 (9) | 860 (3) | 330 (1) | 400 (1) | 370 (1) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 940 (1) | 940 (3) | 1460 (3) | 230L (3) | 270 (2) | 250 (2) |
L: The downstream SQM throughput seems to be limited by something other than the SQM bandwidth target itself. Increasing the bandwidth target past this point did not significantly increase throughput, although the streams remained in balance.
WireGuard, Routing/NAT
No Flow Offload
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
x86_64 | 1500-2500 | 4 | Celeron J4105 | Realtek RT8111A | 880 (5) | 900 (7) | 1670 (10) | 860 (2) | 880 (1) | 1470 (2) |
x86_64 | 1000-1400 | 4 | AMD GX-412TC | Intel i211AT | 550 (18) | 460 (8) | 500 (27) | 350L (4) | 340 (2) | 340 (3) |
mvebu a9 | 1866 | 2 | 88W8964 | | 890 (8) | 770 (13) | 840 (17) | 540 (7) | 550 (8) | 580 (6) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 280 (9) | 220 (10) | 230 (10) | 168 (8) | 180 (8) | 161 (5) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 350 (30) | 260 (117) | 300 (69) | 162 (10) | 147 (8) | 162 (10) |
ath79 | 775 | 1 | QCA9563 | Single MAC | 80 (33) | 75 (34) | 75 (37) | 41 (9) | 50 (9) | 46 (10) |
ath79 | 720 | 1 | QCA9558 | Dual MAC | 78 (38) | 74 (40) | 74 (49) | 44 (10) | 51 (9) | 46 (10) |
ath79 | 650 | 1 | QCA9531 | 100 Mbps phys | 51 (15) | 67 (59) | 69 (63) | 26 (4) | 35 (8) / 41 (13) | 32 (6) |
ramips | 580 | 1 | MT7628N | 100 Mbps phys | 48 (34) | 46 (65) | 48 (77) | 29 (10) / 33 (15) | 28 (8) | 27 (10) |
With Flow Offload (Many Devices Not Yet Tested)
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
mvebu a9 | 1866 | 2 | 88W8964 | | 900 (4) | 770 (16) | 850 (17) | 580 (7) | 570 (10) | 620 (10) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 310 (8) | 270 (7) | 280 (9) | 175 (8) | 210 (6) | 189 (8) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 410 (29) | 300 (102) | 330 (81) | 191 (10) | 171 (9) | 193 (10) |
L: The downstream SQM throughput seems to be limited by something other than the SQM bandwidth target itself. Increasing the bandwidth target past this point did not significantly increase throughput, although the streams remained in balance.
OpenVPN, Routing/NAT
OpenVPN tested at the device's limits with the bi-directional RRUL test, and without SQM, results in the vast majority of the bandwidth (often 90% or more) being downstream.
AES-256-CBC (Many Devices Not Yet Tested)
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
mvebu a9 | 1866 | 2 | 88W8964 | | 148 (24) | 106 (7) | 146 (21) | 110 (2) / 122 (21) | 101 (6) | 100 (2) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 82 (39) | 58 (13) | 85 (38) | 47 (6) / 50 (29) | 45 (10) | 43 (9) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 31 (84) | 24 (32) | 32 (77) | 22 (8) | 21 (3) | 21 (6) / 22 (34) |
AES-256-CBC with Flow Offload (Many Devices Not Yet Tested)
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
mvebu a9 | 1866 | 2 | 88W8964 | | 165 (20) | 110 (7) | 160 (20) | 111 (5) / 123 (21) | 101 (2) | 102 (3) |
mvebu a53 | 1000 | 2 | 88F3720 | OEM v19.01rc1? | 103 (36) | 65 (12) | 103 (35) | 48 (9) / 51 (30) | 48 (7) | 45 (7) / 47 (13) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 31 (73) | 26 (29) | 34 (73) | 22 (7) | 21 (8) | 21 (7) |
Older Tests -- Likely BF-CBC
Target | Clock | Cores | SoC / CPU | Notes | 8 Dn | 8 Up | RRUL | 8 Dn / SQM | 8 Up / SQM | RRUL / SQM |
---|---|---|---|---|---|---|---|---|---|---|
x86_64 | 1500-2500 | 4 | Celeron J4105 | Realtek RT8111A | 240 (18) | 200 (5) | 240 (14) | 157 (5) | 155 (1) | 151 (4) |
x86_64 | 1000-1400 | 4 | AMD GX-412TC | Intel i211AT | 68 (58) | 56 (16) | 68 (55) | 48 (5) | 49 (9) | 49 (9) |
ipq40xx | 717 | 4 | IPQ4019 | Single RGMII3 | 31 (83) | 26 (29) | 33 (76) | 23 (10) | 20 (9) | 20 (7) |
ath79 | 775 | 1 | QCA9563 | Single MAC | 20 (200) | 15 (60) | 20 (188) | 12 (8) / 14 (17) | 12 (7) / 13 (47) | 10 (10) / 14 (70) |
ath79 | 720 | 1 | QCA9558 | Dual MAC | 21 (188) | 16 (57) | 21 (175) | 13 (8) / 15 (48) | 13 (8) / 14 (17) | 11 (10) / 14 (57) |
ath79 | 650 | 1 | QCA9531 | 100 Mbps phys | 16 (220) | 12 (73) | 17 (230) | 10 (9) / 11 (19) | 10 (8) / 11 (72) | 9 (5) / 11 (74) |
ramips | 580 | 1 | MT7628N | 100 Mbps phys | 14 (290) | 10 (88) | 14 (280) | 8 (9) / 9 (20) | 8 (9) / 9 (40) | 5 (10) / 9 (90) |
Notes
On VPN
Encryption is often more expensive than is decryption.
The RRUL test has upstream and downstream flows "competing" with each other. Without SQM, the results shown are often dominated by one direction or the other. For OpenVPN, the effect is nearly complete with "only" downstream traffic showing significant bandwidth (often 90% or more). Interestingly, for WireGuard, the upstream direction seems to get more of the traffic and there is still a reasonable downstream flow.
If latency is important to you, using SQM on the tunnel interface may help significantly (at the cost of overall bandwidth).
On Bandwidth
The link bandwidth will always exceed the upper-layer, payload throughput. This is due to a variety of factors including, for example:
- Collision avoidance and sync on the media
- Media-layer (Ethernet) framing
- IP framing
- TCP framing
- Control, ACK packets
- Other packets on the link (such as ARP and broadcasts)
- Packet loss
The rough, theoretical throughput limit for TCP over GigE (without jumbo frames) is ~940 Mbps4.
WireGuard or OpenVPN encapsulation further reduces this to ~900 Mbps1,2.
ACK packets reduce the "opposite" direction capacity by a bit over 5% 5.
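Purely to show where the ~940 Mbps and ~5% figures above come from (the byte counts are those in footnotes 4 and 5), the arithmetic is:

```
awk 'BEGIN {
  wire = 1500 + 22 + 8 + 12                                          # 1542 byte-times per full frame
  printf "TCP/IPv4 payload efficiency: %.1f%%\n", 100 * 1460 / wire  # ~94.7%
  printf "Cost of a naked ACK frame:   %.1f%%\n", 100 * 84 / wire    # ~5.4%
}'
```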
It is not clear if `netperf` measures bandwidth including IP and TCP headers or not. Similarly, I think that it probably does not include Ethernet framing.
`netperf` documentation may be found at https://hewlettpackard.github.io/netperf/doc/netperf.html and https://github.com/HewlettPackard/netperf, along with source.
`flent` documentation may be found at https://flent.org/
Firmware Source References
The following commits are the base on which the various images were built (all builds from `master`):
- x86_64 – commit 1c0290c5cc, CommitDate: Fri Aug 30 20:45:40 2019 +0200
- IPQ4019 – commit c3a78955f3, CommitDate: Mon Aug 26 18:21:13 2019 +0200
- QCA9558 – commit 921675a2d1, CommitDate: Sat Aug 24 08:55:33 2019 +0800
- QCA9563 – commit b133e466b0, CommitDate: Wed Aug 14 12:36:37 2019 +0200
- QCA9531 – commit b133e466b0, CommitDate: Wed Aug 14 12:36:37 2019 +0200
Note that significant changes have occurred on `master` since the other devices' builds.
- MT7628N – commit 2fedf023e4, CommitDate: Wed Nov 27 20:20:31 2019 +0100
- 88F3720 – tested with OEM firmware that likely has a merge base of OpenWrt v19.07.0-rc1 or -rc2; OpenWrt 19.07-SNAPSHOT r10273-2b88d02. This device has yet to be ported to mainline OpenWrt. The OEM provides its fork of the openwrt-19.07 branch as source for its firmware.
- 88W8964 – commit b1e8a390ea, CommitDate: Thu Dec 19 15:40:49 2019 +0100
Footnotes
1 Additional 60-byte overhead for WireGuard for IPv4 (80 bytes for IPv6)
2 Additional 73-byte overhead based on a reported 1427 MTU for OpenVPN
3 The present (2019) DTS/driver for the ipq40xx is believed to only use one path, although two interfaces are revealed.
4 1500 bytes MTU less 20 bytes minimal IPv4 header less 20 bytes minimal TCP header gives 1460 maximum payload. Ethernet framing of 22 bytes (with 802.1Q VLAN tagging) plus 8 byte sync and start-of-frame delimiter plus 12 byte minimal interpacket gap gives 1542 byte-times on the wire. 1460/1542 ~ 94.7%, under ideal timing and no other clients or packets on the wire. Note that IPv6 headers are larger (40 bytes, minimum) and will reduce the factor further.
5 Minimal Ethernet packet (for a "naked ACK") is 84 byte-times on the wire. 84/1542 ~ 5.4%
Edit history:
- 2019-09-16
- Correct that it is ICMP-ping time
- Add link to Python scripts on Github
- 2019-09-17
- Quantified ACK-packet bandwidth effect
- 2019-09-27
- Clarified SQM bandwidth target, rather than "target" in the context of the inner workings of the queueing mechanisms on suggestion of @moeller0
- Added note that flow offloading appears to be functional now on `master`
- 2019-10-03
- Added "key" for table values
- 2019-11-28
- Added MediaTek MT7628N results
- 2019-12-18
- Note that earlier OpenVPN testing was likely with BF-CBC, not AES-256-CBC
- 2019-12-21
- Add 88F3720, 88W8964, update results for IPQ4019