Low performance unless SQM is enabled

Bought a Topton x86 box with N305 CPU and X520 (ixgbe) NICs. 5 Gbps uplink with PPPoE.

On BSD distros such as OPNsense I was able to achieve 4.5 Gbps internet speeds. On OpenWrt (23.05) I was basically limited to 2.5 Gbps. Adjusted settings such as packet steering (to use the entire beefy CPU of course), enabled irqbalance, adjusted software/hardware flow offloading (neither benefited me, so I turned both off). No luck. Tried using the snapshot from last night as it has the 6.6 LTS kernel which is newer than the CPU, unlike 5.15. Still no good. Tried adjusting the CPU governor to “performance” rather than “powersave” which actually dropped my performance to about 1.9 Gbps. I set up SQM with cake & piece_of_cake.layer, now I’m getting 4 Gbps (only on powersave).

What’s the reason behind this? Why is Cake SQM required for me to get good uplink speeds? And why do I get lower performance with the performance governor, while powersave is fine?

Thank you

This is not an Ubuntu or BSD rant site.
Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/wireless
cat /etc/config/dhcp
cat /etc/config/firewall

Yeah, that CPU is not going to shape 5 Gbps with SQM... simplest_tbf.qos/fq_codel likely will give you the highest achievable shaper rate...

Without traffic shaping, I assume?

Just for completeness, you probably checked that the ethernet link rates were 5 or 10 Gbps, and not for some reason only 2.5 Gbps?

Sorry to disappoint, but Atoms are not really all that beefy... they will do fine for 1 Gbps, but 5 Gbps is considerably harder, as now you only have 1/5 of the CPU cycles available to deal with each packet at line rate.
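To put rough numbers on the "1/5 of the CPU cycles" point, here is a minimal sketch of the per-packet budget. The 1500-byte packet size and the 3.0 GHz clock are assumptions picked for illustration (not measured on this box), and it pretends a single core handles every packet.

```python
def per_packet_budget(link_gbps, packet_bytes=1500, cpu_ghz=3.0):
    """Return (packets_per_second, ns_per_packet, cycles_per_packet).

    packet_bytes and cpu_ghz are illustrative assumptions, not values
    taken from the router discussed in this thread.
    """
    bits_per_packet = packet_bytes * 8
    pps = link_gbps * 1e9 / bits_per_packet      # packets arriving per second
    ns_per_packet = 1e9 / pps                    # time between packets
    cycles_per_packet = cpu_ghz * 1e9 / pps      # CPU cycles available per packet
    return pps, ns_per_packet, cycles_per_packet


for rate in (1, 5):
    pps, ns, cycles = per_packet_budget(rate)
    print(f"{rate} Gbps: {pps:,.0f} pkt/s, {ns:,.0f} ns/pkt, {cycles:,.0f} cycles/pkt")
```

Smaller packets shrink the budget further, since the packet rate goes up for the same bit rate.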

That is odd, these should help some, but certainly not enough to reach 5 Gbps shaper rate...

This is way more than I would expect for a N305, your CPU is punching well above its weight here...

No idea, however higher throughput with CPU-costly SQM/cake than without seems a bit fishy...

I'm unsure where the assumption I came here to rant comes from. I posted because I would like to resolve the odd behavior I'm seeing with the performance.

root@OpenWrt:~# ubus call system board
{
	"kernel": "6.6.54",
	"hostname": "OpenWrt",
	"system": "Intel(R) Core(TM) i3-N305",
	"model": "Default string Default string",
	"board_name": "default-string-default-string",
	"rootfs_type": "ext4",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r27707-084665698b",
		"target": "x86/64",
		"description": "OpenWrt SNAPSHOT r27707-084665698b"
	}
}
root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdf7:c5b1:8020::/48'
	option packet_steering '2'
	option steering_flows '128'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.0.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
	list dns '192.168.0.217'

config interface 'wan'
	option device 'eth0'
	option proto 'pppoe'
	option username '<my pppoe username>'
	option password '<pppoe password>'
	option ipv6 'auto'

config interface 'wan6'
	option device 'eth0'
	option proto 'dhcpv6'

root@OpenWrt:~# cat /etc/config/wireless
cat: can't open '/etc/config/wireless': No such file or directory
root@OpenWrt:~# cat /etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option ra 'server'
	list ra_flags 'managed-config'
	list ra_flags 'other-config'
	option dns_service '0'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

config host
	option name 'desktop'
	option ip '192.168.0.211'
	option mac '00:00:00:00:00:00'

config host
	option name 'zyxelAP'
	option ip '192.168.0.164'
	option mac '00:00:00:00:00:00'

config host
	option name 'homelab'
	option ip '192.168.0.217'
	option mac '00:00:00:00:00:00'

config host
	option name 'nas'
	option ip '192.168.0.146'
	option mac '00:00:00:00:00:00'

root@OpenWrt:~# cat /etc/config/firewall

config defaults
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'

config zone
	option name 'wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	list network 'wan'
	list network 'wan6'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'HTTP'
	list proto 'tcp'
	option src 'wan'
	option src_dport '80'
	option dest_ip '192.168.0.217'
	option dest_port '80'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'HTTPS'
	list proto 'tcp'
	option src 'wan'
	option src_dport '443'
	option dest_ip '192.168.0.217'
	option dest_port '443'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'qBitTorrent (homelab)'
	option src 'wan'
	option src_dport '54545'
	option dest_ip '192.168.0.217'
	option dest_port '54545'
	list proto 'tcp'
	list proto 'udp'

config redirect
	option dest 'lan'
	option target 'DNAT'
	option name 'Deluge (desktop)'
	option src 'wan'
	option src_dport '6891'
	option dest_ip '192.168.0.211'
	option dest_port '6891'
	list proto 'tcp'
	list proto 'udp'

I think you need irqbalance for ixgbe...

Correct, I haven't specified any traffic shaping there.

Of course! I'm negotiating at 10GbE. The X520 NICs on the box don't do multi-gig negotiation, just 1GbE/10GbE.

Huh.. I'm confused. Where does this assumption come from? I was able to drive a 1000/250 uplink on a slightly overclocked Raspberry Pi 4 B using OpenWrt, and traffic shaping with cake SQM, and that was using an external NIC over USB. Wasn't maxing out the processor either. The new x86 router is so much more performant: https://www.cpubenchmark.net/compare/4297vs5213/BCM2711-vs-Intel-i3-N305

I agree. I don't understand why it's like that, and I would prefer to have the ability to turn it off and still get good throughput as with SQM I'm forced to sacrifice some of my bandwidth.

I do use irqbalance, as mentioned in the OP.

And

cat /proc/interrupts
cat /proc/net/softnet_stat
ethtool -S wan
ethtool -g wan
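
For reading the /proc/net/softnet_stat output, a small decoding sketch may help, since the values are hexadecimal and there is one row per CPU. The column meanings used here (processed, dropped, time_squeeze, and received_rps at index 9) are my understanding of recent kernels; the layout has changed between kernel versions, so treat the indices as an assumption. The sample rows are two lines copied from the output posted in this thread.

```python
def parse_softnet_stat(text):
    """Decode softnet_stat-style text into one dict per CPU row.

    Column indices are assumed from recent kernels and may differ elsewhere.
    """
    rows = []
    for cpu, line in enumerate(text.strip().splitlines()):
        fields = [int(tok, 16) for tok in line.split()]  # every column is hex
        rows.append({
            "cpu": cpu,                      # row order, not necessarily the CPU id
            "processed": fields[0],          # packets handled by this CPU
            "dropped": fields[1],            # dropped because the backlog was full
            "time_squeeze": fields[2],       # softirq ran out of budget/time
            "received_rps": fields[9] if len(fields) > 9 else None,
        })
    return rows


SAMPLE = """\
026b8eeb 0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 001fc246 00000000 00000000 00000000 00000000 00000000
01aa8411 0000009f 00000007 00000000 00000000 00000000 00000000 00000000 00000000 002282f4 00000000 00000000 00000002 00000000 00000000
"""

for row in parse_softnet_stat(SAMPLE):
    print(row)
```

On a live system the same function could be fed the contents of /proc/net/softnet_stat, assuming Python is available there or the text is copied to another machine.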

wan isn't a valid physical interface, so I replaced it with eth0 as it's the physical interface I use for wan:

root@OpenWrt:~# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
   0:         31          0          0          0          0          0          0          0  IR-IO-APIC    2-edge      timer
   4:          0          0          0          0          0         13          0          0  IR-IO-APIC    4-edge      ttyS0
   8:          0          0          0          0          0          0          0          0  IR-IO-APIC    8-edge      rtc0
   9:          0          0          0          0          0          0          0          0  IR-IO-APIC    9-fasteoi   acpi
  16:          0          0          0          0          0          0          0          0  IR-IO-APIC   16-fasteoi   i801_smbus
 120:          0          0          0          0          0          0          0          0  DMAR-MSI    0-edge      dmar0
 121:          0          0          0          0          0          0          0          0  DMAR-MSI    1-edge      dmar1
 126:          0          0          0          0          0          0          0          0  IR-PCI-MSI-0000:00:02.0    0-edge      i915
 127:          0          0          0          0          0          0          0          0  IR-PCI-MSI-0000:00:17.0    0-edge      ahci[0000:00:17.0]
 128:          0         21          0          0          0          0          0          0  IR-PCI-MSIX-0000:04:00.0    0-edge      nvme0q0
 129:        401          0          0          0          0          0          0          0  IR-PCI-MSIX-0000:04:00.0    1-edge      nvme0q1
 130:          0        408          0          0          0          0          0          0  IR-PCI-MSIX-0000:04:00.0    2-edge      nvme0q2
 131:          0          0        230          0          0          0          0          0  IR-PCI-MSIX-0000:04:00.0    3-edge      nvme0q3
 132:          0          0          0        157          0          0          0          0  IR-PCI-MSIX-0000:04:00.0    4-edge      nvme0q4
 133:          0          0          0          0        110          0          0          0  IR-PCI-MSIX-0000:04:00.0    5-edge      nvme0q5
 134:          0          0          0          0          0        341          0          0  IR-PCI-MSIX-0000:04:00.0    6-edge      nvme0q6
 135:          0          0          0          0          0          0        120          0  IR-PCI-MSIX-0000:04:00.0    7-edge      nvme0q7
 136:          0          0          0          0          0          0          0        248  IR-PCI-MSIX-0000:04:00.0    8-edge      nvme0q8
 137:          0          0          0          0          0          0          0          0  IR-PCI-MSI-0000:00:0d.0    0-edge      xhci_hcd
 138:          0          0          0          0          0          0          0          0  IR-PCI-MSI-0000:00:14.0    0-edge      xhci_hcd
 139:          0    9413891          0          0     535243          0          0        684  IR-PCI-MSIX-0000:01:00.0    0-edge      eth0-TxRx-0
 140:     202808       3204     279354      40263      80681     293275     551418     503688  IR-PCI-MSIX-0000:01:00.0    1-edge      eth0-TxRx-1
 141:     824047     269262     154013     427076     693080     520823     453807     193157  IR-PCI-MSIX-0000:01:00.0    2-edge      eth0-TxRx-2
 142:     206388       1974     191722     476588     478273      52808     205478     342884  IR-PCI-MSIX-0000:01:00.0    3-edge      eth0-TxRx-3
 143:     389435       2463     141509      89359     133037     289462     133890     368770  IR-PCI-MSIX-0000:01:00.0    4-edge      eth0-TxRx-4
 144:     418480       3700     460548      80826      85359     137095     321877      91787  IR-PCI-MSIX-0000:01:00.0    5-edge      eth0-TxRx-5
 145:     238630       2016     225264     391946      54257     751180     104935      46983  IR-PCI-MSIX-0000:01:00.0    6-edge      eth0-TxRx-6
 146:      45813       3767     433944    1109022      58492     147758     292732     436018  IR-PCI-MSIX-0000:01:00.0    7-edge      eth0-TxRx-7
 147:          0          0          0          0          5          0          0          0  IR-PCI-MSIX-0000:01:00.0    8-edge      eth0
 148:     789190     273815      59424     418507     864992      35517     172261     570646  IR-PCI-MSIX-0000:01:00.1    0-edge      eth1-TxRx-0
 149:     556064      13395     579560     217184     392370     406292     711754      75166  IR-PCI-MSIX-0000:01:00.1    1-edge      eth1-TxRx-1
 150:     330748       5170    1934790    1357000      53542     117498     121022      64978  IR-PCI-MSIX-0000:01:00.1    2-edge      eth1-TxRx-2
 151:    1256087     223690      31811      76431     228707     226442     224754     890728  IR-PCI-MSIX-0000:01:00.1    3-edge      eth1-TxRx-3
 152:     213516       4693     398907     114737     794007     633002     367956     452232  IR-PCI-MSIX-0000:01:00.1    4-edge      eth1-TxRx-4
 153:      81275      17322     265589     651604     739143     782951     502681      71108  IR-PCI-MSIX-0000:01:00.1    5-edge      eth1-TxRx-5
 154:     243542       4360      79462      43718     123465     134430    1249512     330955  IR-PCI-MSIX-0000:01:00.1    6-edge      eth1-TxRx-6
 155:     320688       5062     591702      36055     726004     696719     336026     693235  IR-PCI-MSIX-0000:01:00.1    7-edge      eth1-TxRx-7
 156:          0          3          0          4          0          0          2          1  IR-PCI-MSIX-0000:01:00.1    8-edge      eth1
 157:          0          0          0          0          0          0          0          0  IR-PCI-MSIX-0000:02:00.0    0-edge      eth2
 158:       3176        150       2115       2344       2680       1445       3205       1940  IR-PCI-MSIX-0000:02:00.0    1-edge      eth2-TxRx-0
 159:       2721        465       1895       2082       2652       1670       3275       2295  IR-PCI-MSIX-0000:02:00.0    2-edge      eth2-TxRx-1
 160:       3105        215       2261       2172       2510       1762       2945       2085  IR-PCI-MSIX-0000:02:00.0    3-edge      eth2-TxRx-2
 161:       3220        210       2470       1917       2685       1285       3242       2026  IR-PCI-MSIX-0000:02:00.0    4-edge      eth2-TxRx-3
 162:          0          0          0          0          0          0          0          0  IR-PCI-MSIX-0000:03:00.0    0-edge      eth3
 163:       3382        260       2186       1917       2750       1515       2995       2050  IR-PCI-MSIX-0000:03:00.0    1-edge      eth3-TxRx-0
 164:       3426        397       2160       1822       2565       1510       3140       2035  IR-PCI-MSIX-0000:03:00.0    2-edge      eth3-TxRx-1
 165:       3105        295       2577       1672       2900       1446       2995       2065  IR-PCI-MSIX-0000:03:00.0    3-edge      eth3-TxRx-2
 166:       3436        265       2395       2199       2510       1495       2740       2015  IR-PCI-MSIX-0000:03:00.0    4-edge      eth3-TxRx-3
 167:          0          0          0          0          0          0          0        297  IR-PCI-MSI-0000:00:1f.3    0-edge      snd_hda_intel:card0
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC:    7874880    5828672    7244515    5484361    4876463    4831851    5170762    5939630   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 IWI:          0          0          0          0          0          8          0          0   IRQ work interrupts
 RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
 RES:        522        630        555        758        639        658        822        444   Rescheduling interrupts
 CAL:     231911     178422     188386     174749     139188     203958     170303     349150   Function call interrupts
 TLB:       6019       6828       5818       5018       5178       6444       7883       7405   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
 MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:        113        114        114        114        114        114        114        114   Machine check polls
 ERR:          0
 MIS:          0
 PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
root@OpenWrt:~# cat /proc/net/softnet_stat
026b8eeb 0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 001fc246 00000000 00000000 00000000 00000000 00000000
01a29a8f 00000000 00000017 00000000 00000000 00000000 00000000 00000000 00000000 000e0f0d 00000000 00000000 00000001 00000000 00000000
01aa8411 0000009f 00000007 00000000 00000000 00000000 00000000 00000000 00000000 002282f4 00000000 00000000 00000002 00000000 00000000
015d1e85 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 001c827e 00000000 00000000 00000003 00000000 00000000
01431579 00000000 0000005d 00000000 00000000 00000000 00000000 00000000 00000000 00172a06 00000000 00000000 00000004 00000000 00000000
013ccc04 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 002189eb 00000000 00000000 00000005 00000000 00000000
01781462 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0019d9a6 00000000 00000000 00000006 00000000 00000000
01b25cb1 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0022385d 00000000 00000000 00000007 00000000 00000000
root@OpenWrt:~# ethtool -S eth0
NIC statistics:
     rx_packets: 68136006
     tx_packets: 38073049
     rx_bytes: 92303386776
     tx_bytes: 28523012705
     rx_pkts_nic: 68136006
     tx_pkts_nic: 38073049
     rx_bytes_nic: 92577700148
     tx_bytes_nic: 28675358892
     lsc_int: 5
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 71
     tx_dropped: 0
     multicast: 0
     broadcast: 0
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 0
     fdir_miss: 0
     fdir_overflow: 0
     rx_fifo_errors: 0
     rx_missed_errors: 1166
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_length_errors: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 5
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 15
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page: 6420602
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_hwtstamp_timeouts: 0
     tx_hwtstamp_skipped: 0
     rx_hwtstamp_cleared: 0
     tx_ipsec: 0
     rx_ipsec: 0
     tx_queue_0_packets: 5700028
     tx_queue_0_bytes: 5142208131
     tx_queue_1_packets: 4293870
     tx_queue_1_bytes: 3335761924
     tx_queue_2_packets: 9065929
     tx_queue_2_bytes: 6641112106
     tx_queue_3_packets: 3930277
     tx_queue_3_bytes: 3438717969
     tx_queue_4_packets: 3225273
     tx_queue_4_bytes: 2401060855
     tx_queue_5_packets: 3425238
     tx_queue_5_bytes: 2020340440
     tx_queue_6_packets: 3528196
     tx_queue_6_bytes: 2258790109
     tx_queue_7_packets: 4904238
     tx_queue_7_bytes: 3285021171
     tx_queue_8_packets: 0
     tx_queue_8_bytes: 0
     tx_queue_9_packets: 0
     tx_queue_9_bytes: 0
     tx_queue_10_packets: 0
     tx_queue_10_bytes: 0
     tx_queue_11_packets: 0
     tx_queue_11_bytes: 0
     tx_queue_12_packets: 0
     tx_queue_12_bytes: 0
     tx_queue_13_packets: 0
     tx_queue_13_bytes: 0
     tx_queue_14_packets: 0
     tx_queue_14_bytes: 0
     tx_queue_15_packets: 0
     tx_queue_15_bytes: 0
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 0
     tx_queue_17_bytes: 0
     tx_queue_18_packets: 0
     tx_queue_18_bytes: 0
     tx_queue_19_packets: 0
     tx_queue_19_bytes: 0
     tx_queue_20_packets: 0
     tx_queue_20_bytes: 0
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 0
     tx_queue_22_bytes: 0
     tx_queue_23_packets: 0
     tx_queue_23_bytes: 0
     tx_queue_24_packets: 0
     tx_queue_24_bytes: 0
     tx_queue_25_packets: 0
     tx_queue_25_bytes: 0
     tx_queue_26_packets: 0
     tx_queue_26_bytes: 0
     tx_queue_27_packets: 0
     tx_queue_27_bytes: 0
     tx_queue_28_packets: 0
     tx_queue_28_bytes: 0
     tx_queue_29_packets: 0
     tx_queue_29_bytes: 0
     tx_queue_30_packets: 0
     tx_queue_30_bytes: 0
     tx_queue_31_packets: 0
     tx_queue_31_bytes: 0
     tx_queue_32_packets: 0
     tx_queue_32_bytes: 0
     tx_queue_33_packets: 0
     tx_queue_33_bytes: 0
     tx_queue_34_packets: 0
     tx_queue_34_bytes: 0
     tx_queue_35_packets: 0
     tx_queue_35_bytes: 0
     tx_queue_36_packets: 0
     tx_queue_36_bytes: 0
     tx_queue_37_packets: 0
     tx_queue_37_bytes: 0
     tx_queue_38_packets: 0
     tx_queue_38_bytes: 0
     tx_queue_39_packets: 0
     tx_queue_39_bytes: 0
     tx_queue_40_packets: 0
     tx_queue_40_bytes: 0
     tx_queue_41_packets: 0
     tx_queue_41_bytes: 0
     tx_queue_42_packets: 0
     tx_queue_42_bytes: 0
     tx_queue_43_packets: 0
     tx_queue_43_bytes: 0
     tx_queue_44_packets: 0
     tx_queue_44_bytes: 0
     tx_queue_45_packets: 0
     tx_queue_45_bytes: 0
     tx_queue_46_packets: 0
     tx_queue_46_bytes: 0
     tx_queue_47_packets: 0
     tx_queue_47_bytes: 0
     tx_queue_48_packets: 0
     tx_queue_48_bytes: 0
     tx_queue_49_packets: 0
     tx_queue_49_bytes: 0
     tx_queue_50_packets: 0
     tx_queue_50_bytes: 0
     tx_queue_51_packets: 0
     tx_queue_51_bytes: 0
     tx_queue_52_packets: 0
     tx_queue_52_bytes: 0
     tx_queue_53_packets: 0
     tx_queue_53_bytes: 0
     tx_queue_54_packets: 0
     tx_queue_54_bytes: 0
     tx_queue_55_packets: 0
     tx_queue_55_bytes: 0
     tx_queue_56_packets: 0
     tx_queue_56_bytes: 0
     tx_queue_57_packets: 0
     tx_queue_57_bytes: 0
     tx_queue_58_packets: 0
     tx_queue_58_bytes: 0
     tx_queue_59_packets: 0
     tx_queue_59_bytes: 0
     tx_queue_60_packets: 0
     tx_queue_60_bytes: 0
     tx_queue_61_packets: 0
     tx_queue_61_bytes: 0
     tx_queue_62_packets: 0
     tx_queue_62_bytes: 0
     tx_queue_63_packets: 0
     tx_queue_63_bytes: 0
     rx_queue_0_packets: 68136006
     rx_queue_0_bytes: 92303386776
     rx_queue_1_packets: 0
     rx_queue_1_bytes: 0
     rx_queue_2_packets: 0
     rx_queue_2_bytes: 0
     rx_queue_3_packets: 0
     rx_queue_3_bytes: 0
     rx_queue_4_packets: 0
     rx_queue_4_bytes: 0
     rx_queue_5_packets: 0
     rx_queue_5_bytes: 0
     rx_queue_6_packets: 0
     rx_queue_6_bytes: 0
     rx_queue_7_packets: 0
     rx_queue_7_bytes: 0
     rx_queue_8_packets: 0
     rx_queue_8_bytes: 0
     rx_queue_9_packets: 0
     rx_queue_9_bytes: 0
     rx_queue_10_packets: 0
     rx_queue_10_bytes: 0
     rx_queue_11_packets: 0
     rx_queue_11_bytes: 0
     rx_queue_12_packets: 0
     rx_queue_12_bytes: 0
     rx_queue_13_packets: 0
     rx_queue_13_bytes: 0
     rx_queue_14_packets: 0
     rx_queue_14_bytes: 0
     rx_queue_15_packets: 0
     rx_queue_15_bytes: 0
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
     rx_queue_24_packets: 0
     rx_queue_24_bytes: 0
     rx_queue_25_packets: 0
     rx_queue_25_bytes: 0
     rx_queue_26_packets: 0
     rx_queue_26_bytes: 0
     rx_queue_27_packets: 0
     rx_queue_27_bytes: 0
     rx_queue_28_packets: 0
     rx_queue_28_bytes: 0
     rx_queue_29_packets: 0
     rx_queue_29_bytes: 0
     rx_queue_30_packets: 0
     rx_queue_30_bytes: 0
     rx_queue_31_packets: 0
     rx_queue_31_bytes: 0
     rx_queue_32_packets: 0
     rx_queue_32_bytes: 0
     rx_queue_33_packets: 0
     rx_queue_33_bytes: 0
     rx_queue_34_packets: 0
     rx_queue_34_bytes: 0
     rx_queue_35_packets: 0
     rx_queue_35_bytes: 0
     rx_queue_36_packets: 0
     rx_queue_36_bytes: 0
     rx_queue_37_packets: 0
     rx_queue_37_bytes: 0
     rx_queue_38_packets: 0
     rx_queue_38_bytes: 0
     rx_queue_39_packets: 0
     rx_queue_39_bytes: 0
     rx_queue_40_packets: 0
     rx_queue_40_bytes: 0
     rx_queue_41_packets: 0
     rx_queue_41_bytes: 0
     rx_queue_42_packets: 0
     rx_queue_42_bytes: 0
     rx_queue_43_packets: 0
     rx_queue_43_bytes: 0
     rx_queue_44_packets: 0
     rx_queue_44_bytes: 0
     rx_queue_45_packets: 0
     rx_queue_45_bytes: 0
     rx_queue_46_packets: 0
     rx_queue_46_bytes: 0
     rx_queue_47_packets: 0
     rx_queue_47_bytes: 0
     rx_queue_48_packets: 0
     rx_queue_48_bytes: 0
     rx_queue_49_packets: 0
     rx_queue_49_bytes: 0
     rx_queue_50_packets: 0
     rx_queue_50_bytes: 0
     rx_queue_51_packets: 0
     rx_queue_51_bytes: 0
     rx_queue_52_packets: 0
     rx_queue_52_bytes: 0
     rx_queue_53_packets: 0
     rx_queue_53_bytes: 0
     rx_queue_54_packets: 0
     rx_queue_54_bytes: 0
     rx_queue_55_packets: 0
     rx_queue_55_bytes: 0
     rx_queue_56_packets: 0
     rx_queue_56_bytes: 0
     rx_queue_57_packets: 0
     rx_queue_57_bytes: 0
     rx_queue_58_packets: 0
     rx_queue_58_bytes: 0
     rx_queue_59_packets: 0
     rx_queue_59_bytes: 0
     rx_queue_60_packets: 0
     rx_queue_60_bytes: 0
     rx_queue_61_packets: 0
     rx_queue_61_bytes: 0
     rx_queue_62_packets: 0
     rx_queue_62_bytes: 0
     rx_queue_63_packets: 0
     rx_queue_63_bytes: 0
     tx_pb_0_pxon: 0
     tx_pb_0_pxoff: 0
     tx_pb_1_pxon: 0
     tx_pb_1_pxoff: 0
     tx_pb_2_pxon: 0
     tx_pb_2_pxoff: 0
     tx_pb_3_pxon: 0
     tx_pb_3_pxoff: 0
     tx_pb_4_pxon: 0
     tx_pb_4_pxoff: 0
     tx_pb_5_pxon: 0
     tx_pb_5_pxoff: 0
     tx_pb_6_pxon: 0
     tx_pb_6_pxoff: 0
     tx_pb_7_pxon: 0
     tx_pb_7_pxoff: 0
     rx_pb_0_pxon: 0
     rx_pb_0_pxoff: 0
     rx_pb_1_pxon: 0
     rx_pb_1_pxoff: 0
     rx_pb_2_pxon: 0
     rx_pb_2_pxoff: 0
     rx_pb_3_pxon: 0
     rx_pb_3_pxoff: 0
     rx_pb_4_pxon: 0
     rx_pb_4_pxoff: 0
     rx_pb_5_pxon: 0
     rx_pb_5_pxoff: 0
     rx_pb_6_pxon: 0
     rx_pb_6_pxoff: 0
     rx_pb_7_pxon: 0
     rx_pb_7_pxoff: 0
root@OpenWrt:~# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		8192
RX Mini:	0
RX Jumbo:	0
TX:		8192
Current hardware settings:
RX:		512
RX Mini:	0
RX Jumbo:	0
TX:		512
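
Since the `ethtool -S` dump above is long, here is a minimal sketch for summarizing which hardware queues actually carried traffic. In the posted output, rx_queue_0_packets equals rx_packets (68136006) while transmit traffic is spread over tx_queue_0 through tx_queue_7. The helper names are hypothetical, and the sample is a shortened excerpt of the counters above.

```python
import re


def queue_packets(ethtool_output, direction):
    """Return {queue_index: packet_count} for direction 'rx' or 'tx'."""
    pattern = re.compile(
        rf"^\s*{direction}_queue_(\d+)_packets:\s*(\d+)\s*$", re.MULTILINE
    )
    return {int(q): int(n) for q, n in pattern.findall(ethtool_output)}


def active_queues(counts):
    """Keep only the queues that saw at least one packet."""
    return {q: n for q, n in counts.items() if n > 0}


# Shortened excerpt of the eth0 counters posted above.
SAMPLE = """\
     tx_queue_0_packets: 5700028
     tx_queue_0_bytes: 5142208131
     tx_queue_1_packets: 4293870
     tx_queue_1_bytes: 3335761924
     rx_queue_0_packets: 68136006
     rx_queue_0_bytes: 92303386776
     rx_queue_1_packets: 0
     rx_queue_1_bytes: 0
     rx_queue_2_packets: 0
     rx_queue_2_bytes: 0
"""

rx = active_queues(queue_packets(SAMPLE, "rx"))
tx = active_queues(queue_packets(SAMPLE, "tx"))
print("RX queues with traffic:", rx)
print("TX queues with traffic:", tx)
```

Running it over the full dump would make it easier to compare the receive-side queue usage between the different packet steering and irqbalance settings.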

Discussions on the libreqos zulip forum, about aggregate shaper performance on different CPUs...
The i3-N305 is not a real performance-core CPU, but a Gracemont efficiency-only design, aka the CPU formerly known as Atom. In spite of the marketing chutzpah of giving this thing the i3 prefix, this is not a really beefy x86 CPU.

Yeah, x86 offers higher performance than the low-price Arm options, but the ex-Atoms are intended to play in the budget segment and hence are well below the "real" x86 CPUs. And that is not all bad; often enough such an ex-Atom is just fine and the higher efficiency results in lower power consumption. But traffic shaping is a costly operation that runs better on faster CPUs with better memory systems. The Pi 4 is able to shape at 1 Gbps, but there is not much headroom left, while your Atom apparently went to 2.5 or even 4 Gbps, so this still fits with your N305 being more performant than the Pi 4, no (in spite of not being the beefiest x86 CPU)?

Out of curiosity, what do you do with your network to actually perceive a noticeable improvement with a 5 Gbps link compared with, say, 1 Gbps?
I should add that a few years ago my router only allowed shaping around 70 Mbps aggregate, so on my 100/40 DSL link I sacrificed 50% of potential throughput without hesitation, as for my usage pattern 50/20 Mbps with SQM was clearly superior to 100/40 without. But that is clearly a subjective policy call each network admin needs to make for their own network. So I am not trying to imply that because this was the policy I chose, you should do the same.

PS.: In theory you should enable receive packet steering even without SQM.

Sounds normal to me. I got the device after watching this review from ServeTheHome: https://youtu.be/Z-3ZjMSEEc8 - they were able to route about 15 Gbps on the very same machine with the same SFP+ NICs I use for WAN/LAN.

And is it really all that weird? Here's how the CPU usage looks with cake SQM running a speed test:

Doing some basic math, even if 5 Gbps was utilized bidirectionally (it won't be on my end, I have a 5000/1000 uplink) I'd still have about 170% CPU left.

Yes, I probably should have gone with a box that has the U300E due to the significantly faster single-threaded performance, but does it really matter? https://www.cpubenchmark.net/compare/5213vs5859/Intel-i3-N305-vs-Intel-U300E

P2P file sharing (torrenting) as my /etc/config/firewall output probably suggested. I'm not upset about not getting my whole 5 Gbps link, the box I use for file sharing only has a 2.5GbE NIC. I got the 5 Gbps link from my ISP because it's the only package they have that provides good upload speed (5000/1000) and it's incredibly cheap in my country. Just a "I want to use what I pay for" kind of thing.

Yes, I did enable it on all cores in LuCI. The "Enabled" setting (not "Enabled (all CPUs)") dropped my throughput by a tiny bit.

Yeah, but that was likely downhill with wind from the back on a holiday... this is IMHO not a realistic scenario for more gnarly traffic patterns.

Please enable the detailed CPU time display for htop, cake is accounted primarily as soft interrupt and it will be located on a single CPU as cake is essentially single threaded per instance.
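
If htop is not at hand, roughly the same per-CPU picture can be pulled from /proc/stat. A minimal sketch, assuming the standard /proc/stat column order (user nice system idle iowait irq softirq steal); the function name softirq_share and the file paths in the comments are just placeholders:

```shell
#!/bin/sh
# Print the softirq share of each CPU between two /proc/stat snapshots.
# Usage: softirq_share BEFORE_FILE AFTER_FILE
softirq_share() {
    awk '
        # Only per-CPU lines (cpu0, cpu1, ...), not the aggregate "cpu" line.
        /^cpu[0-9]+ / {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (FNR == NR) {
                # First file: remember the baseline counters.
                t0[$1] = total; s0[$1] = $8
            } else if ($1 in t0) {
                # Second file: report softirq as a share of elapsed CPU time.
                dt = total - t0[$1]
                if (dt > 0)
                    printf "%s softirq %.1f%%\n", $1, 100 * ($8 - s0[$1]) / dt
            }
        }
    ' "$1" "$2"
}

# Example (placeholder paths), sampled while a speed test is running:
#   cat /proc/stat > /tmp/stat.a; sleep 5; cat /proc/stat > /tmp/stat.b
#   softirq_share /tmp/stat.a /tmp/stat.b
```

One CPU sitting close to 100% softirq while the others are mostly idle would be consistent with the shaper being the bottleneck.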

Not a valid way of looking at it, as being single threaded cake will run into a throughput ceiling once the CPU it runs on is saturated even if all other CPUs are idle...
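
To put rough numbers on that ceiling, here is a back-of-the-envelope sketch of the per-packet time budget of the one core running the shaper. It assumes full-size 1500-byte packets and ignores Ethernet/PPPoE framing overhead, so the figures are ballpark only; packet_budget is a made-up helper name.

```shell
#!/bin/sh
# Per-packet time budget for a single core at a given rate.
# Usage: packet_budget RATE_GBPS PACKET_BYTES
packet_budget() {
    awk -v gbps="$1" -v bytes="$2" 'BEGIN {
        pps = gbps * 1e9 / (bytes * 8)      # packets per second at line rate
        printf "%.0f pps, %.2f us per packet\n", pps, 1e6 / pps
    }'
}

packet_budget 5 1500   # 416667 pps, 2.40 us per packet
packet_budget 1 1500   # 83333 pps, 12.00 us per packet
```

Real traffic with smaller packets leaves proportionally less time per packet, and idle cores do not add to this budget as long as the shaper instance stays on one CPU.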

That is a question you need to answer :wink: As I implied above, I would not even flinch to just set the shaper to 1000/900 and be done with...

Fair, I also cherish high upload and low asymmetry, so I understand your use case.

I am happy that OpenWrt offers both options now, as it is really hard to predict which variant works best for a given SoC.

That changes the coloring a bit

As it should, SIRQ is reported in pink, although with your color scheme I am unsure what I am looking at... :wink:

softirq is qdisc and firewall.

Yeah.. I wish I had an idea though why the throughput is so much lower without SQM.

I did some more experimenting:

  • irqbalance + SQM (either cake/fq_codel) + packet steering (enabled; not "all CPUs") with RPS of 128/256 gets me to 3.8 Gbps, with 1 core being maxed out.
  • Without SQM, my CPU usage under load totals about 10%, getting me about 1.5 Gbps.
  • If I enable packet steering, without SQM and without irqbalance, I get about 2.0 Gbps.
  • Packet steering on all CPUs, without SQM and without irqbalance, I get 1.1 Gbps.
  • Packet steering on all CPUs, without SQM and with irqbalance, I get 1.05 Gbps.
  • Packet steering, with SQM (fq_codel) and with irqbalance, I get 3.8-3.9 Gbps, with one core being maxed out
  • Packet steering on all CPUs, with SQM (fq_codel) and with irqbalance, I get 3.4-3.6 Gbps, no cores maxed out
  • Packet steering on all CPUs, with SQM (cake) and with irqbalance, I get 3.2-3.4 Gbps, no cores maxed out
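
When comparing packet steering variants like the ones above, it can help to look at what actually ended up in the kernel's RPS masks rather than relying on the LuCI label. A minimal sketch, assuming the usual sysfs layout (/sys/class/net/IFACE/queues/rx-N/rps_cpus); show_rps is a made-up helper name:

```shell
#!/bin/sh
# List the RPS CPU mask of every receive queue.
# Usage: show_rps [SYSFS_NET_DIR]   (defaults to /sys/class/net)
show_rps() {
    root="${1:-/sys/class/net}"
    for f in "$root"/*/queues/rx-*/rps_cpus; do
        [ -r "$f" ] || continue          # glob did not match or not readable
        q=${f%/rps_cpus}                 # .../IFACE/queues/rx-N
        iface=${f#"$root"/}              # IFACE/queues/rx-N/rps_cpus
        iface=${iface%%/*}               # IFACE
        echo "$iface ${q##*/} $(cat "$f")"
    done
}

# Example: show_rps
# Each line is "interface queue hex-mask"; a mask of 0 means RPS is off
# for that queue.
```

Recording this output next to each throughput number makes the test matrix easier to compare later.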

So for some reason, it seems like on OpenWrt I cannot use my uplink's speed unless only one core is being maxed out. When using cake SQM, it puts enough load on a CPU core to max it out, making my uplink actually perform close to expected speeds.

So.. I tested this by putting an artificial 100% CPU load on the system by using the cat /dev/urandom | gzip > /dev/null snippet, without SQM. I was still getting low performance. I noticed the load was on CPU3.

So.. what I tried next was to put the affinity of the gzip process on CPU0 (taskset -p 0x01 $(pgrep gzip)). And now I was getting 3.3 Gbps without SQM. It's not anywhere near as close to my 5 gig uplink as it should be, but it's certainly over 50% faster than what I get without putting artificial load on the CPU. I will do further testing with about 50% artificial load on one core to see how that affects performance.
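
For repeating the pinning experiment on other cores: the taskset mask is a bitmask with bit n set for CPU n, so 0x01 above means CPU0. A small sketch (cpu_mask is a made-up helper name) that builds the mask for a single CPU index:

```shell
#!/bin/sh
# Hex affinity mask for a single CPU index, as expected by `taskset -p`.
cpu_mask() {
    printf '0x%x\n' $((1 << $1))
}

cpu_mask 0   # prints 0x1 (CPU0, same value as 0x01)
cpu_mask 3   # prints 0x8 (CPU3)

# Example: taskset -p "$(cpu_mask 3)" <pid of the gzip process>
```

Stepping the load through each core in turn and noting the throughput would show whether only specific cores have this effect.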

I don't know what's up here, but I need high load on certain CPU cores to get close to my expected uplink speeds. This wasn't the case with OPNsense so I assume there's either some kind of bug with OpenWrt, the kernel (either upstream or OpenWrt's patches), or some obvious configuration issue I missed. I get full speed on LAN when running iperf3 tests, and the NICs I use for LAN and WAN are identical.

Here are a few additional things you can try:

1. Set the Scaling Governor to 'performance' mode

Check the current Scaling Governor:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Set/Change the Scaling Governor to 'performance':

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

2. Test with or without Intel P-States (if supported by your CPU). I’m not sure if your CPU supports Intel P-States, but if it does, you could perform tests with Intel P-States enabled or disabled.

Check the current scaling driver, i.e. which driver is used for CPU frequency scaling:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

If Intel P-State is enabled, the output should be intel_pstate.

Check the Intel P-State status to verify that the driver is active:

cat /sys/devices/system/cpu/intel_pstate/status

If it's enabled, the output should be active.

Enable/Disable Intel P-State: You can modify the Grub configuration to enable, disable, or set Intel P-State to 'passive' mode.

nano /boot/grub/grub.cfg

Append the parameter to the kernel command line (the line starting with linux), or modify it if it is already there:

intel_pstate=active

Replace active with disable or passive as needed.

3. Disable C-States.

nano /boot/grub/grub.cfg

Append the parameter to the kernel command line (the line starting with linux), or modify it if it is already there:

intel_idle.max_cstate=0

Note: Some of these settings can lead to increased power consumption.
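
After rebooting it is worth confirming that the parameters actually reached the kernel, since an edit in the wrong place of grub.cfg is silently ignored. A minimal sketch that reads /proc/cmdline; cmdline_param is a made-up helper name:

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter, or "unset".
# Usage: cmdline_param NAME [CMDLINE_FILE]   (defaults to /proc/cmdline)
cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -F= -v name="$1" '
            # Match the parameter name exactly and print everything after "=".
            $1 == name { print substr($0, length(name) + 2); found = 1; exit }
            END { if (!found) print "unset" }
        '
}

# Example:
#   cmdline_param intel_pstate
#   cmdline_param intel_idle.max_cstate
```

If a parameter shows up as unset, the change did not make it onto the kernel command line and the test results are still from the old configuration.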

You haven't said much about the other end you're testing against. The fact that your unit appears to require SQM to get the performance up makes me wonder whether you're dealing with a performance issue in the uplink part of the system that you may not have any control over.

One example I'm aware of is a wholesale carrier using a "policer" to enforce data rates on customer services. The particular settings used for some bandwidth plans offered to customers have savage effects on uplink throughput if traffic patterns trigger the policer, which just drops traffic. Some CPE seems to not fall foul of this very often while other CPE reliably triggers the policer effectively cutting the feasible throughput in half unless it can support some shaping (i.e. SQM) function.

A related possibility might involve the TCP/IP congestion control algorithm on the device generating the uplink data flow through the PPPoE tunnel, though ISTR that this might be more likely an issue on very long distance connections and not much of an issue on short distance connections.

I seem to recall a case, in which @tohojo was involved if memory serves right, where "too small buffers" somewhere on the path required proper pacing of the data, and that is something SQM is decent at. It might help to test using the fq qdisc on the load generating machine, as that can do pacing (and IIRC does so by default)...
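
A minimal sketch of that test on a Linux load generator, assuming iproute2's tc is installed and the sending interface is called eth0 (a placeholder, adjust to the actual name); use_fq is a made-up helper name. With DRY_RUN=1 it only prints the commands instead of running them:

```shell
#!/bin/sh
# Switch the root qdisc of an interface to fq and show the result.
# Usage: use_fq IFACE        (needs root unless DRY_RUN=1 is set)
use_fq() {
    dev="$1"
    for cmd in "tc qdisc replace dev $dev root fq" "tc -s qdisc show dev $dev"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"      # only show what would be run
        else
            $cmd             # word-split on purpose: run the command
        fi
    done
}

# Example:
#   DRY_RUN=1; use_fq eth0     # preview
#   DRY_RUN=0; use_fq eth0     # apply, as root
```

If throughput without SQM on the router improves once the sender paces its packets, that would point at a burst-sensitive element somewhere on the path rather than at the router itself.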