Sid
August 20, 2024, 10:54pm
1
On my NanoPi R2S I have SQM set up with cake/piece_of_cake.qos on OpenWrt v23.05.4.
Recently I’ve noticed that SQM turns itself off. Running service sqm status shows this:
service sqm status
active with no instances
I’ve confirmed that the ‘Enable this SQM instance’ checkbox is ticked in LuCI.
After running service sqm restart, the output is this:
service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully
My /etc/config/sqm is this
cat /etc/config/sqm
config queue 'eth1'
option enabled '1'
option interface 'eth0'
option download '151000'
option upload '166000'
option qdisc 'cake'
option script 'piece_of_cake.qos'
option qdisc_advanced '1'
option ingress_ecn 'ECN'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option linklayer 'ethernet'
option debug_logging '0'
option verbosity '5'
option squash_dscp '1'
option squash_ingress '1'
option iqdisc_opts 'nat dual-dsthost ingress'
option eqdisc_opts 'nat dual-srchost'
option overhead '50'
And tc -s qdisc
qdisc cake 8033: dev eth0 root refcnt 9 bandwidth 166Mbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100ms noatm overhead 50
Sent 4090815 bytes 20637 pkt (dropped 0, overlimits 140 requeues 0)
backlog 0b 0p requeues 0
memory used: 27370b of 8300000b
capacity estimate: 166Mbit
min/max network layer size: 28 / 1500
min/max overhead-adjusted size: 78 / 1550
average network hdr offset: 14
Tin 0
thresh 166Mbit
target 5ms
interval 100ms
pk_delay 34us
av_delay 17us
sp_delay 7us
backlog 0b
pkts 20637
bytes 4090815
way_inds 83
way_miss 763
way_cols 0
drops 0
marks 0
ack_drop 0
sp_flows 1
bk_flows 1
un_flows 0
max_len 15846
quantum 1514
qdisc cake 8034: dev ifb4eth0 root refcnt 2 bandwidth 151Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100ms noatm overhead 50
Sent 229046744 bytes 185263 pkt (dropped 2, overlimits 202047 requeues 0)
backlog 0b 0p requeues 0
memory used: 238080b of 7550000b
capacity estimate: 151Mbit
min/max network layer size: 46 / 1500
min/max overhead-adjusted size: 96 / 1550
average network hdr offset: 14
Tin 0
thresh 151Mbit
target 5ms
interval 100ms
pk_delay 315us
av_delay 117us
sp_delay 5us
backlog 0b
pkts 185265
bytes 229049332
way_inds 32
way_miss 433
way_cols 0
drops 2
marks 0
ack_drop 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 7544
quantum 1514
Offloading of any kind has been turned off on my NanoPi.
Is this a new bug in v23.05.4?
Looks completely normal to me.
Sid
August 21, 2024, 2:15am
3
I stumbled upon this issue when I ran a random speed test and saw higher bufferbloat than usual. Since then I’ve noticed that SQM goes down at random times and I have to restart the service.
Also, what’s causing these errors after a service restart of SQM?
service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully
OK, I see what you mean about the errors during the restart. Next time, run tc qdisc before restarting to capture the state of eth0.
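One way to capture that state could look like the sketch below. It assumes a POSIX shell on the router; the function name snapshot_qdisc and the default log path under /tmp are illustrative choices and not part of sqm-scripts.

```shell
# Hypothetical helper: append a timestamped dump of an interface's qdisc
# state to a log file, so the pre-restart state is not lost.
snapshot_qdisc() {
    iface="$1"
    out="${2:-/tmp/qdisc-$1.log}"   # assumed default location, adjust as needed
    {
        echo "=== $(date) $iface ==="
        tc -s qdisc show dev "$iface" 2>&1
    } >> "$out"
}
```

For example, `snapshot_qdisc eth0; service sqm restart` would record what was attached to eth0 just before the restart, which shows whether the cake qdiscs were already gone at that point.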
brada4
August 21, 2024, 3:06am
5
Do you have a screenshot of a measurement?
What you have posted so far is the normal output of an optimally running SQM.
Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </>" button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:
ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
cat /etc/config/firewall
Sid
August 21, 2024, 4:43am
6
Without sqm I usually get higher download latency like this
With sqm everything is fine
ubus call system board
{
"kernel": "5.15.162",
"hostname": "NanoPi",
"system": "ARMv8 Processor rev 4",
"model": "FriendlyElec NanoPi R2S",
"board_name": "friendlyarm,nanopi-r2s",
"rootfs_type": "squashfs",
"release": {
"distribution": "OpenWrt",
"version": "23.05.4",
"revision": "r24012-d8dd03c46f",
"target": "rockchip/armv8",
"description": "OpenWrt 23.05.4 r24012-d8dd03c46f"
}
}
cat /etc/config/network
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fde5:2117:ba54::/48'
option packet_steering '1'
config device
option name 'br-lan'
option type 'bridge'
list ports 'eth1'
config device
option name 'eth1'
option macaddr 'MAC address'
config interface 'lan'
option device 'br-lan'
option proto 'static'
option ipaddr '192.168.1.1'
option netmask '255.255.255.0'
option ip6assign '60'
config device
option name 'eth0'
option macaddr 'MAC address'
config interface 'wan'
option device 'eth0'
option proto 'static'
option ipaddr '192.168.29.175'
option netmask '255.255.255.0'
option gateway '192.168.29.1'
list dns '1.1.1.1'
list dns '1.0.0.1'
config interface 'wan6'
option device 'eth0'
option proto 'dhcpv6'
option reqaddress 'try'
option reqprefix 'auto'
option peerdns '0'
list dns '2606:4700:4700::1111'
list dns '2606:4700:4700::1001'
option sourcefilter '0'
config interface 'tailscale'
option proto 'none'
option device 'tailscale0'
config interface 'wg0'
option proto 'wireguard'
option private_key 'WG key'
list addresses '192.168.195.1/24'
list addresses 'fd7c:35df:4fab::1/64'
list addresses 'fe80::1/64'
option listen_port '51820'
config wireguard_wg0
option description 'iPhone'
option public_key 'Peer Key'
option persistent_keepalive '25'
list allowed_ips '192.168.195.2/32'
list allowed_ips 'fd7c:35df:4fab::2/128'
list allowed_ips 'fe80::2/128'
option endpoint_host 'ddns domain'
option endpoint_port '51820'
cat /etc/config/dhcp
config dnsmasq
option domainneeded '1'
option localise_queries '1'
option rebind_protection '0'
option local '/lan/'
option domain 'lan'
option expandhosts '1'
option cachesize '1000'
option authoritative '1'
option readethers '1'
option leasefile '/tmp/dhcp.leases'
option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
option localservice '0'
option ednspacket_max '1232'
option port '54'
config dhcp 'lan'
option interface 'lan'
option start '100'
option limit '150'
option leasetime '12h'
option dhcpv4 'server'
option dhcpv6 'relay'
option ra 'relay'
option ndp 'relay'
list dhcp_option '6,192.168.1.1'
list dhcp_option '3,192.168.1.1'
config dhcp 'wan'
option interface 'wan'
option ignore '1'
option start '100'
option limit '150'
option leasetime '12h'
config odhcpd 'odhcpd'
option maindhcp '0'
option leasefile '/tmp/hosts/odhcpd'
option leasetrigger '/usr/sbin/odhcpd-update'
option loglevel '4'
config dhcp 'wan6'
option interface 'wan6'
option ignore '1'
option master '1'
option ra 'relay'
option dhcpv6 'relay'
option ndp 'relay'
cat /etc/config/firewall
config defaults
option input 'REJECT'
option output 'ACCEPT'
option forward 'REJECT'
option synflood_protect '1'
config zone
option name 'lan'
list network 'lan'
option input 'ACCEPT'
option output 'ACCEPT'
option forward 'ACCEPT'
config zone
option name 'wan'
list network 'wan'
list network 'wan6'
option input 'REJECT'
option output 'ACCEPT'
option forward 'REJECT'
option mtu_fix '1'
option masq '1'
config forwarding
option src 'lan'
option dest 'wan'
config rule
option name 'Allow-DHCP-Renew'
option src 'wan'
option proto 'udp'
option dest_port '68'
option target 'ACCEPT'
option family 'ipv4'
config rule
option name 'Allow-Ping'
option src 'wan'
option proto 'icmp'
option icmp_type 'echo-request'
option family 'ipv4'
option target 'ACCEPT'
config rule
option name 'Allow-IGMP'
option src 'wan'
option proto 'igmp'
option family 'ipv4'
option target 'ACCEPT'
config rule
option name 'Allow-DHCPv6'
option src 'wan'
option proto 'udp'
option dest_port '546'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-MLD'
option src 'wan'
option proto 'icmp'
option src_ip 'fe80::/10'
list icmp_type '130/0'
list icmp_type '131/0'
list icmp_type '132/0'
list icmp_type '143/0'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-ICMPv6-Input'
option src 'wan'
option proto 'icmp'
list icmp_type 'echo-request'
list icmp_type 'echo-reply'
list icmp_type 'destination-unreachable'
list icmp_type 'packet-too-big'
list icmp_type 'time-exceeded'
list icmp_type 'bad-header'
list icmp_type 'unknown-header-type'
list icmp_type 'router-solicitation'
list icmp_type 'neighbour-solicitation'
list icmp_type 'router-advertisement'
list icmp_type 'neighbour-advertisement'
option limit '1000/sec'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-ICMPv6-Forward'
option src 'wan'
option dest '*'
option proto 'icmp'
list icmp_type 'echo-request'
list icmp_type 'echo-reply'
list icmp_type 'destination-unreachable'
list icmp_type 'packet-too-big'
list icmp_type 'time-exceeded'
list icmp_type 'bad-header'
list icmp_type 'unknown-header-type'
option limit '1000/sec'
option family 'ipv6'
option target 'ACCEPT'
config rule
option name 'Allow-IPSec-ESP'
option src 'wan'
option dest 'lan'
option proto 'esp'
option target 'ACCEPT'
config rule
option name 'Allow-ISAKMP'
option src 'wan'
option dest 'lan'
option dest_port '500'
option proto 'udp'
option target 'ACCEPT'
config zone
option name 'tailscale'
option input 'ACCEPT'
option output 'ACCEPT'
option forward 'ACCEPT'
option masq '1'
option mtu_fix '1'
list device 'tailscale0'
list network 'tailscale'
config forwarding
option src 'tailscale'
option dest 'lan'
config forwarding
option src 'tailscale'
option dest 'wan'
config forwarding
option src 'lan'
option dest 'tailscale'
config redirect
option dest 'lan'
option target 'DNAT'
option name 'Intercept-DNS'
option family 'any'
option src 'lan'
option src_dport '53'
option dest_port '53'
config rule
option name 'Tailscale-Allow'
list proto 'udp'
option src '*'
option dest_port '41641'
option target 'ACCEPT'
config zone
option name 'wg0'
option input 'ACCEPT'
option output 'ACCEPT'
option forward 'ACCEPT'
list network 'wg0'
option masq '1'
option mtu_fix '1'
config forwarding
option src 'wg0'
option dest 'lan'
config forwarding
option src 'wg0'
option dest 'wan'
config forwarding
option src 'lan'
option dest 'wg0'
config rule
option name 'Wireguard-Allow'
list proto 'udp'
option src 'wan'
option dest_port '51820'
option target 'ACCEPT'
Sid
August 22, 2024, 4:59am
7
So SQM stopped working by itself again. This is the output of tc -s qdisc in the current state:
tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
Sent 308858382 bytes 255667 pkt (dropped 0, overlimits 0 requeues 65)
backlog 0b 0p requeues 65
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
Sent 308858382 bytes 255667 pkt (dropped 0, overlimits 0 requeues 65)
backlog 0b 0p requeues 65
maxpacket 68130 drop_overlimit 0 new_flow_count 13347 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
Sent 1836112982 bytes 1663705 pkt (dropped 0, overlimits 0 requeues 1268)
backlog 0b 0p requeues 1268
maxpacket 16654 drop_overlimit 0 new_flow_count 10602 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wg0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev ifb4eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev tailscale0 root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
Sent 456 bytes 6 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
Again, the same errors appeared when I restarted the service:
service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully
This smells like a hotplug issue: eth0 might go down and come up again, and if hotplug does not trigger for both events you end up without SQM active. Thinking it through, the hotplug down event might actually work; after all, both cake instances are gone...
Does dmesg reveal anything about eth0? Or does logread reveal anything odd?
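To narrow both logs down to possible link flaps, a small filter along these lines might help. This is only a sketch: the function name link_events is made up here, and the message patterns are an assumption about how the driver and netifd word their link messages, so adjust them if your logs look different.

```shell
# Hypothetical filter: keep only log lines that mention the given interface
# together with a link up/down or carrier message. Reads log text on stdin.
link_events() {
    iface="$1"
    # The patterns below are guesses at common wording, not a fixed format.
    grep -i "$iface" | grep -iE 'link is (up|down)|link (up|down)|carrier'
}
```

It would be used as `dmesg | link_events eth0` and `logread | link_events eth0`. If the timestamps of any hits line up with the moments SQM disappeared, that would support the hotplug theory.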
brada4
August 22, 2024, 6:25am
9
maxpacket 16654
You need to disable gro and tso to make codel work.
Cake splits big meta-packets by default, so there is nothing to do for cake...
For fq_codel it is arguable whether one wants to split meta-packets or not (meta packets are processed more efficiently in the kernel, but increase the "temporal lumpiness" of transmissions).
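For anyone who wants to see what the offloads are currently set to, here is a minimal sketch. It assumes the ethtool package is installed (it may not be on a default image), and the helper name offload_state is invented for illustration.

```shell
# Hypothetical filter: reduce `ethtool -k <dev>` output (read on stdin) to
# the GRO/GSO/TSO summary lines that matter for meta-packet handling.
offload_state() {
    grep -E '^[[:space:]]*(generic-receive-offload|generic-segmentation-offload|tcp-segmentation-offload):'
}
```

Usage would be `ethtool -k eth1 | offload_state`. Switching the features off would then be `ethtool -K eth1 gro off tso off`, though whether the driver allows changing them depends on the hardware, and the setting does not persist across reboots on its own.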
Sid
August 22, 2024, 9:46am
11
I couldn’t find anything related to SQM in logread when it went down, and I didn’t check dmesg. I’ll make sure to check dmesg the next time it goes down.
Why does service sqm status show no active instances?
service sqm status
active with no instances
It gives the same status output even when an sqm instance is clearly active.
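A likely explanation, stated as an assumption rather than a confirmed fact: the status command reports processes that the init system keeps running, while SQM only installs qdiscs and then exits, so there is never an instance to list. Checking the qdisc on the interface directly is a more reliable test. The sketch below does that; the function name has_root_cake is hypothetical.

```shell
# Hypothetical check: succeed if tc qdisc output (read on stdin) contains a
# cake qdisc attached at the root of an interface. The "dev <name>" part is
# optional because tc omits it when the listing is limited to one device.
has_root_cake() {
    grep -Eq '^qdisc cake [0-9a-f]+: (dev [^ ]+ )?root'
}
```

For example, `tc qdisc show dev eth0 | has_root_cake && echo "cake is on eth0" || echo "cake is missing on eth0"`. The same check against ifb4eth0 would cover the ingress shaper.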
Sid
August 22, 2024, 9:48am
12
brada4:
disable gro and tso
I have no idea what those two are to be honest. I’m only using cake/piece of cake on my wan (eth0).
brada4
August 22, 2024, 10:12am
14
It is the new default qdisc for multiqueue devices, since roughly the late 4.x kernels.