Bought a Topton x86 box with N305 CPU and X520 (ixgbe) NICs. 5 Gbps uplink with PPPoE.
On BSD-based distros such as OPNsense I was able to achieve 4.5 Gbps internet speeds. On OpenWrt (23.05) I was basically limited to 2.5 Gbps. I adjusted settings such as packet steering (to use the entire beefy CPU, of course), enabled irqbalance, and tried software/hardware flow offloading (neither helped, so I turned both off). No luck. I tried last night's snapshot, as it has the 6.6 LTS kernel which, unlike 5.15, is newer than the CPU. Still no good. I tried setting the CPU governor to "performance" rather than "powersave", which actually dropped my throughput to about 1.9 Gbps. Then I set up SQM with cake & piece_of_cake.qos, and now I'm getting 4 Gbps (only on powersave).
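For illustration, the SQM config is along these lines in /etc/config/sqm (rates are in kbit/s; the exact numbers, overhead and interface name below are placeholders rather than my literal values):

config queue 'wan'
	option enabled '1'
	option interface 'pppoe-wan'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option download '4500000'
	option upload '950000'
	option linklayer 'ethernet'
	option overhead '34'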
What’s the reason behind this? Why is Cake SQM required for me to get good uplink speeds? And why do I get lower performance with the performance governor, while powersave is fine?
This is not an Ubuntu or BSD rant site.
Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:
Yeah, that CPU is not going to shape 5 Gbps with SQM... simplest_tbf.qos/fq_codel likely will give you the highest achievable shaper rate...
Without traffic shaping, I assume?
Just for completeness: you probably checked that the ethernet link rates were 5 or 10 Gbps, and not for some reason only 2.5 Gbps?
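E.g. via something like this on the router (assuming eth0 is the WAN port, as in the config further down):

# install the ethtool package first if it is not already present
ethtool eth0 | grep -E 'Speed|Duplex'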
Sorry to disappoint, but Atoms are not really all that beefy... they will do fine for 1 Gbps, but 5 Gbps is considerably harder, as now you only have 1/5 of the CPU cycles available to deal with each packet at line rate.
That is odd, these should help some, but certainly not enough to reach 5 Gbps shaper rate...
This is way more than I would expect for a N305, your CPU is punching well above its weight here...
No idea, however higher throughput with CPU-costly SQM/cake than without seems a bit fishy...
I'm unsure where the assumption I came here to rant comes from. I posted because I would like to resolve the odd behavior I'm seeing with the performance.
root@OpenWrt:~# ubus call system board
{
	"kernel": "6.6.54",
	"hostname": "OpenWrt",
	"system": "Intel(R) Core(TM) i3-N305",
	"model": "Default string Default string",
	"board_name": "default-string-default-string",
	"rootfs_type": "ext4",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r27707-084665698b",
		"target": "x86/64",
		"description": "OpenWrt SNAPSHOT r27707-084665698b"
	}
}
root@OpenWrt:~# cat /etc/config/network
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fdf7:c5b1:8020::/48'
option packet_steering '2'
option steering_flows '128'
config device
option name 'br-lan'
option type 'bridge'
list ports 'eth1'
list ports 'eth2'
list ports 'eth3'
config interface 'lan'
option device 'br-lan'
option proto 'static'
option ipaddr '192.168.0.1'
option netmask '255.255.255.0'
option ip6assign '60'
list dns '192.168.0.217'
config interface 'wan'
option device 'eth0'
option proto 'pppoe'
option username '<my pppoe username>'
option password '<pppoe password>'
option ipv6 'auto'
config interface 'wan6'
option device 'eth0'
option proto 'dhcpv6'
root@OpenWrt:~# cat /etc/config/wireless
cat: can't open '/etc/config/wireless': No such file or directory
root@OpenWrt:~# cat /etc/config/dhcp
config dnsmasq
option domainneeded '1'
option localise_queries '1'
option rebind_protection '1'
option rebind_localhost '1'
option local '/lan/'
option domain 'lan'
option expandhosts '1'
option cachesize '1000'
option authoritative '1'
option readethers '1'
option leasefile '/tmp/dhcp.leases'
option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
option localservice '1'
option ednspacket_max '1232'
config dhcp 'lan'
option interface 'lan'
option start '100'
option limit '150'
option leasetime '12h'
option dhcpv4 'server'
option ra 'server'
list ra_flags 'managed-config'
list ra_flags 'other-config'
option dns_service '0'
config dhcp 'wan'
option interface 'wan'
option ignore '1'
config odhcpd 'odhcpd'
option maindhcp '0'
option leasefile '/tmp/hosts/odhcpd'
option leasetrigger '/usr/sbin/odhcpd-update'
option loglevel '4'
config host
option name 'desktop'
option ip '192.168.0.211'
option mac '00:00:00:00:00:00'
config host
option name 'zyxelAP'
option ip '192.168.0.164'
option mac '00:00:00:00:00:00'
config host
option name 'homelab'
option ip '192.168.0.217'
option mac '00:00:00:00:00:00'
config host
option name 'nas'
option ip '192.168.0.146'
option mac '00:00:00:00:00:00'
root@OpenWrt:~# cat /etc/config/firewall
config defaults
option input 'REJECT'
option output 'ACCEPT'
option forward 'REJECT'
option synflood_protect '1'
config zone
option name 'lan'
option input 'ACCEPT'
option output 'ACCEPT'
option forward 'ACCEPT'
list network 'lan'
config zone
option name 'wan'
option input 'REJECT'
option output 'ACCEPT'
option forward 'REJECT'
option masq '1'
option mtu_fix '1'
list network 'wan'
list network 'wan6'
config forwarding
option src 'lan'
option dest 'wan'
config rule
option name 'Allow-DHCP-Renew'
option src 'wan'
option proto 'udp'
option dest_port '68'
option target 'ACCEPT'
option family 'ipv4'
config rule
option name 'Allow-Ping'
option src 'wan'
option proto 'icmp'
option icmp_type 'echo-request'
option family 'ipv4'
option target 'ACCEPT'
config rule
option name 'Allow-IGMP'
option src 'wan'
option proto 'igmp'
option family 'ipv4'
option target 'ACCEPT'
config rule
option name 'Allow-DHCPv6'
option src 'wan'
option proto 'udp'
option dest_port '546'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-MLD'
option src 'wan'
option proto 'icmp'
option src_ip 'fe80::/10'
list icmp_type '130/0'
list icmp_type '131/0'
list icmp_type '132/0'
list icmp_type '143/0'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-ICMPv6-Input'
option src 'wan'
option proto 'icmp'
list icmp_type 'echo-request'
list icmp_type 'echo-reply'
list icmp_type 'destination-unreachable'
list icmp_type 'packet-too-big'
list icmp_type 'time-exceeded'
list icmp_type 'bad-header'
list icmp_type 'unknown-header-type'
list icmp_type 'router-solicitation'
list icmp_type 'neighbour-solicitation'
list icmp_type 'router-advertisement'
list icmp_type 'neighbour-advertisement'
option limit '1000/sec'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-ICMPv6-Forward'
option src 'wan'
option dest '*'
option proto 'icmp'
list icmp_type 'echo-request'
list icmp_type 'echo-reply'
list icmp_type 'destination-unreachable'
list icmp_type 'packet-too-big'
list icmp_type 'time-exceeded'
list icmp_type 'bad-header'
list icmp_type 'unknown-header-type'
option limit '1000/sec'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-IPSec-ESP'
option src 'wan'
option dest 'lan'
option proto 'esp'
option target 'ACCEPT'
config rule
option name 'Allow-ISAKMP'
option src 'wan'
option dest 'lan'
option dest_port '500'
option proto 'udp'
option target 'ACCEPT'
config redirect
option dest 'lan'
option target 'DNAT'
option name 'HTTP'
list proto 'tcp'
option src 'wan'
option src_dport '80'
option dest_ip '192.168.0.217'
option dest_port '80'
config redirect
option dest 'lan'
option target 'DNAT'
option name 'HTTPS'
list proto 'tcp'
option src 'wan'
option src_dport '443'
option dest_ip '192.168.0.217'
option dest_port '443'
config redirect
option dest 'lan'
option target 'DNAT'
option name 'qBitTorrent (homelab)'
option src 'wan'
option src_dport '54545'
option dest_ip '192.168.0.217'
option dest_port '54545'
list proto 'tcp'
list proto 'udp'
config redirect
option dest 'lan'
option target 'DNAT'
option name 'Deluge (desktop)'
option src 'wan'
option src_dport '6891'
option dest_ip '192.168.0.211'
option dest_port '6891'
list proto 'tcp'
list proto 'udp'
Correct, I haven't specified any traffic shaping there.
Of course! I'm negotiating at 10GbE. The X520 NICs on the box don't do multi-gig negotiation, just 1GbE/10GbE.
Huh... I'm confused. Where does this assumption come from? I was able to drive a 1000/250 uplink on a slightly overclocked Raspberry Pi 4 B running OpenWrt with cake SQM traffic shaping, and that was using an external NIC over USB. It wasn't maxing out the processor either. The new x86 router is so much more performant: https://www.cpubenchmark.net/compare/4297vs5213/BCM2711-vs-Intel-i3-N305
I agree. I don't understand why it's like that, and I would prefer to have the ability to turn it off and still get good throughput as with SQM I'm forced to sacrifice some of my bandwidth.
Discussions on the libreqos zulip forum, about aggregate shaper performance on different CPUs...
The i3-N305 is not a real performance-core CPU, but a Gracemont efficiency-core-only design, aka the CPU formerly known as Atom. In spite of the marketing chutzpah of giving this thing the i3 prefix, this is not a really beefy x86 CPU.
Yeah, x86 offers higher performance than the low-priced ARM options, but the ex-Atoms are intended to play in the budget segment and hence sit well below the "real" x86 CPUs. And that is not all bad; often enough such an ex-Atom is just fine, and the higher efficiency results in lower power consumption. But traffic shaping is a costly operation that runs better on faster CPUs with better memory systems. The Pi 4 is able to shape at 1 Gbps, but there is not much headroom left, while your Atom apparently went to 2.5 or even 4 Gbps, so this still fits with your N305 being more performant than the Pi 4, no (in spite of not being the beefiest x86 CPU)?
Out of curiosity, what do you do with your network to actually perceive a noticeable improvement with a 5 Gbps link compared to, say, 1 Gbps?
I should add that a few years ago my router only allowed me to shape around 70 Mbps aggregate, so on my 100/40 DSL link I sacrificed 50% of the potential throughput without hesitation, as for my usage pattern 50/20 Mbps with SQM was clearly superior to 100/40 without. But that is clearly a subjective policy call that each network admin needs to make for their own network. So I am not trying to imply that, just because this was the policy I chose, you should do the same.
PS: In theory you should enable receive packet steering even without SQM.
Sounds normal to me. I got the device after watching this review from ServeTheHome: https://youtu.be/Z-3ZjMSEEc8 - they were able to route about 15 Gbps on the very same machine with the same SFP+ NICs I use for WAN/LAN.
And is it really all that weird? Here's how the CPU usage looks with cake SQM running a speed test:
Doing some basic math, even if 5 Gbps was utilized bidirectionally (it won't be on my end, I have a 5000/1000 uplink) I'd still have about 170% CPU left.
P2P file sharing (torrenting) as my /etc/config/firewall output probably suggested. I'm not upset about not getting my whole 5 Gbps link, the box I use for file sharing only has a 2.5GbE NIC. I got the 5 Gbps link from my ISP because it's the only package they have that provides good upload speed (5000/1000) and it's incredibly cheap in my country. Just a "I want to use what I pay for" kind of thing.
Yes, I did enable it on all cores in LuCI. The "Enabled" setting (not "Enabled (all CPUs)") dropped my throughput by a tiny bit.
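For the record, I believe those two LuCI choices correspond to packet_steering '1' vs '2' in the globals section of /etc/config/network, i.e. roughly:

# "Enabled"
uci set network.globals.packet_steering='1'
# "Enabled (all CPUs)"
uci set network.globals.packet_steering='2'
uci commit network
# note: restarting networking will briefly drop the PPPoE session
/etc/init.d/network restart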
Yeah, but that was likely downhill with wind from the back on a holiday... this is IMHO not a realistic scenario for more gnarly traffic patterns.
Please enable the detailed CPU time display in htop; cake is accounted for primarily as soft interrupt, and it will be located on a single CPU, as cake is essentially single-threaded per instance.
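To see where that soft interrupt load actually lands, you can also just look at the per-CPU NET_RX/NET_TX counters (standard /proc interface, nothing OpenWrt-specific):

# run this a few times during a speed test and compare the per-CPU deltas
grep -E 'NET_RX|NET_TX' /proc/softirqs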
Not a valid way of looking at it, as being single threaded cake will run into a throughput ceiling once the CPU it runs on is saturated even if all other CPUs are idle...
That is a question you need to answer. As I implied above, I would not even flinch to just set the shaper to 1000/900 and be done with it...
Fair, I also cherish high upload and low asymmetry, so I understand your use case.
I am happy that OpenWrt offers both options now, as it is really hard to predict which variant works best for a given SoC.
irqbalance + SQM (either cake/fq_codel) + packet steering (enabled; not "all CPUs") with RPS of 128/256 gets me to 3.8 Gbps, with 1 core being maxed out.
Without SQM, my CPU usage under load totals about 10%, getting me around 1.5 Gbps.
If I enable packet steering, without SQM and without irqbalance, I get about 2.0 Gbps.
Packet steering on all CPUs, without SQM and without irqbalance, I get 1.1 Gbps.
Packet steering on all CPUs, without SQM and with irqbalance, I get 1.05 Gbps.
Packet steering, with SQM (fq_codel) and with irqbalance, I get 3.8-3.9 Gbps, with one core being maxed out
So for some reason, it seems like on OpenWrt I cannot use my uplink's speed unless only one core is being maxed out. When using cake SQM, it puts enough load on a CPU core to max it out, making my uplink actually perform close to expected speeds.
So.. I tested this by putting an artificial 100% CPU load on the system by using the cat /dev/urandom | gzip > /dev/null snippet, without SQM. I was still getting low performance. I noticed the load was on CPU3.
So... what I tried next was to set the affinity of the gzip process to CPU0 (taskset -p 0x01 $(pgrep gzip)). And now I was getting 3.3 Gbps without SQM. It's still nowhere near my 5 Gbps uplink, but it's certainly over 50% faster than what I get without putting artificial load on the CPU. I will do further testing with ~50% artificial load on one core to see how that affects performance.
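For anyone who wants to reproduce this, the whole experiment boils down to the following (the CPU numbers are obviously specific to my box):

# start a single-threaded CPU hog in the background
cat /dev/urandom | gzip > /dev/null &
# pin the gzip process to CPU0 (affinity mask 0x01)
taskset -p 0x01 $(pgrep gzip)
# verify where it ended up
taskset -p $(pgrep gzip)
# run the speed test, then stop the hog
kill %1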
I don't know what's up here, but I need high load on certain CPU cores to get close to my expected uplink speeds. This wasn't the case with OPNsense so I assume there's either some kind of bug with OpenWrt, the kernel (either upstream or OpenWrt's patches), or some obvious configuration issue I missed. I get full speed on LAN when running iperf3 tests, and the NICs I use for LAN and WAN are identical.
1. Test with the "performance" governor forced on all cores:
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
2. Test with or without Intel P-States (if supported by your CPU). I'm not sure if your CPU supports Intel P-States, but if it does, you could perform tests with them enabled and disabled. Check the current scaling driver: run the following command to see which driver is used for CPU frequency scaling:
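(On a standard Linux cpufreq setup that would be:)

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
# typically prints "intel_pstate" or "acpi-cpufreq"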
You haven't said much about the other end you're testing against, and the fact that your unit appears to require SQM to get the performance up makes me wonder whether you're dealing with a performance issue in the uplink part of the system that you may not have any control over.
One example I'm aware of is a wholesale carrier using a "policer" to enforce data rates on customer services. The particular settings used for some bandwidth plans offered to customers have savage effects on uplink throughput if traffic patterns trigger the policer, which just drops traffic. Some CPE seems to not fall foul of this very often while other CPE reliably triggers the policer effectively cutting the feasible throughput in half unless it can support some shaping (i.e. SQM) function.
A related possibility might involve the TCP/IP congestion control algorithm on the device generating the uplink data flow through the PPPoE tunnel, though ISTR that this might be more likely an issue on very long distance connections and not much of an issue on short distance connections.
I seem to recall a case, in which @tohojo was involved if memory serves right, where "too small buffers" somewhere on the path required proper pacing of the data, and that is something SQM is decent at. It might help to test using the fq qdisc on the load-generating machine, as that can do pacing (and IIRC does so by default)...
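A rough sketch of that test on the machine generating the load (interface name eth0 assumed):

# see what is currently in use
tc qdisc show dev eth0
sysctl net.core.default_qdisc net.ipv4.tcp_congestion_control
# switch the root qdisc to fq, which paces TCP flows
tc qdisc replace dev eth0 root fq
# revert afterwards if desired
tc qdisc del dev eth0 root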