SQM turning off randomly

On my NanoPi R2S I have SQM set up with cake/piece_of_cake on OpenWrt 23.05.4.

Recently I’ve noticed that SQM turns off by itself. Running service sqm status shows this:

service sqm status
active with no instances

I’ve confirmed that the tick box ‘Enable this sqm instance’ is checked in LuCI.

After running service sqm restart, the output is this:

service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully

My /etc/config/sqm is this

cat /etc/config/sqm

config queue 'eth1'
        option enabled '1'
        option interface 'eth0'
        option download '151000'
        option upload '166000'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option qdisc_advanced '1'
        option ingress_ecn 'ECN'
        option egress_ecn 'NOECN'
        option qdisc_really_really_advanced '1'
        option linklayer 'ethernet'
        option debug_logging '0'
        option verbosity '5'
        option squash_dscp '1'
        option squash_ingress '1'
        option iqdisc_opts 'nat dual-dsthost ingress'
        option eqdisc_opts 'nat dual-srchost'
        option overhead '50'

And this is the output of tc -s qdisc:

qdisc cake 8033: dev eth0 root refcnt 9 bandwidth 166Mbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100ms noatm overhead 50 
 Sent 4090815 bytes 20637 pkt (dropped 0, overlimits 140 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 27370b of 8300000b
 capacity estimate: 166Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       78 /    1550
 average network hdr offset:           14

                  Tin 0
  thresh        166Mbit
  target            5ms
  interval        100ms
  pk_delay         34us
  av_delay         17us
  sp_delay          7us
  backlog            0b
  pkts            20637
  bytes         4090815
  way_inds           83
  way_miss          763
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len         15846
  quantum          1514
qdisc cake 8034: dev ifb4eth0 root refcnt 2 bandwidth 151Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100ms noatm overhead 50 
 Sent 229046744 bytes 185263 pkt (dropped 2, overlimits 202047 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 238080b of 7550000b
 capacity estimate: 151Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       96 /    1550
 average network hdr offset:           14

                  Tin 0
  thresh        151Mbit
  target            5ms
  interval        100ms
  pk_delay        315us
  av_delay        117us
  sp_delay          5us
  backlog            0b
  pkts           185265
  bytes       229049332
  way_inds           32
  way_miss          433
  way_cols            0
  drops               2
  marks               0
  ack_drop            0
  sp_flows            0
  bk_flows            1
  un_flows            0
  max_len          7544
  quantum          1514

Offloading of any kind has been turned off on my NanoPi.

Is this a new bug in v23.05.4?

Looks completely normal to me.

I stumbled upon this issue when I ran a random speedtest and saw higher bufferbloat. Since then I’ve noticed that SQM goes down at random times and I have to restart the service each time.

Also what’s causing these errors after a service restart of sqm?

service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully

Ok, I see what you mean about the restart. Next time, run tc qdisc first to capture the state of eth0 before restarting.
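
A rough sketch of how that capture could be scripted, so the state is not lost once the restart rebuilds the qdiscs. The helper name snapshot_qdisc and the output path /tmp/sqm-state.txt are made up for illustration; only tc -s qdisc show dev and service sqm restart come from the commands already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): append a timestamp and
# the current qdisc state of one device to a file.
# Usage: snapshot_qdisc <device> <output-file>
snapshot_qdisc() {
    dev="$1"
    out="$2"
    {
        echo "=== $(date) ==="
        tc -s qdisc show dev "$dev"
    } >> "$out" 2>&1
}

# Example, run manually on the router the next time SQM has gone away:
#   snapshot_qdisc eth0 /tmp/sqm-state.txt
#   snapshot_qdisc ifb4eth0 /tmp/sqm-state.txt
#   service sqm restart
```

Saving the output for both eth0 and ifb4eth0 before the restart makes it easier to compare the broken state with the working one afterwards.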

Do you have a screenshot of a measurement?
What you have posted so far is the normal output of an optimally running SQM.

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
cat /etc/config/firewall

Without sqm I usually get higher download latency like this

With sqm everything is fine

ubus call system board
{
        "kernel": "5.15.162",
        "hostname": "NanoPi",
        "system": "ARMv8 Processor rev 4",
        "model": "FriendlyElec NanoPi R2S",
        "board_name": "friendlyarm,nanopi-r2s",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "23.05.4",
                "revision": "r24012-d8dd03c46f",
                "target": "rockchip/armv8",
                "description": "OpenWrt 23.05.4 r24012-d8dd03c46f"
        }
}
cat /etc/config/network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fde5:2117:ba54::/48'
        option packet_steering '1'

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth1'

config device
        option name 'eth1'
        option macaddr 'MAC address'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config device
        option name 'eth0'
        option macaddr 'MAC address'

config interface 'wan'
        option device 'eth0'
        option proto 'static'
        option ipaddr '192.168.29.175'
        option netmask '255.255.255.0'
        option gateway '192.168.29.1'
        list dns '1.1.1.1'
        list dns '1.0.0.1'

config interface 'wan6'
        option device 'eth0'
        option proto 'dhcpv6'
        option reqaddress 'try'
        option reqprefix 'auto'
        option peerdns '0'
        list dns '2606:4700:4700::1111'
        list dns '2606:4700:4700::1001'
        option sourcefilter '0'

config interface 'tailscale'
        option proto 'none'
        option device 'tailscale0'

config interface 'wg0'
        option proto 'wireguard'
        option private_key 'WG key'
        list addresses '192.168.195.1/24'
        list addresses 'fd7c:35df:4fab::1/64'
        list addresses 'fe80::1/64'
        option listen_port '51820'

config wireguard_wg0
        option description 'iPhone'
        option public_key 'Peer Key'
        option persistent_keepalive '25'
        list allowed_ips '192.168.195.2/32'
        list allowed_ips 'fd7c:35df:4fab::2/128'
        list allowed_ips 'fe80::2/128'
        option endpoint_host 'ddns domain'
        option endpoint_port '51820'
cat /etc/config/dhcp

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '0'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option cachesize '1000'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
        option localservice '0'
        option ednspacket_max '1232'
        option port '54'

config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option leasetime '12h'
        option dhcpv4 'server'
        option dhcpv6 'relay'
        option ra 'relay'
        option ndp 'relay'
        list dhcp_option '6,192.168.1.1'
        list dhcp_option '3,192.168.1.1'

config dhcp 'wan'
        option interface 'wan'
        option ignore '1'
        option start '100'
        option limit '150'
        option leasetime '12h'

config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

config dhcp 'wan6'
        option interface 'wan6'
        option ignore '1'
        option master '1'
        option ra 'relay'
        option dhcpv6 'relay'
        option ndp 'relay'
cat /etc/config/firewall 

config defaults
        option input 'REJECT'
        option output 'ACCEPT'
        option forward 'REJECT'
        option synflood_protect '1'

config zone
        option name 'lan'
        list network 'lan'
        option input 'ACCEPT'
        option output 'ACCEPT'
        option forward 'ACCEPT'

config zone
        option name 'wan'
        list network 'wan'
        list network 'wan6'
        option input 'REJECT'
        option output 'ACCEPT'
        option forward 'REJECT'
        option mtu_fix '1'
        option masq '1'

config forwarding
        option src 'lan'
        option dest 'wan'

config rule
        option name 'Allow-DHCP-Renew'
        option src 'wan'
        option proto 'udp'
        option dest_port '68'
        option target 'ACCEPT'
        option family 'ipv4'

config rule
        option name 'Allow-Ping'
        option src 'wan'
        option proto 'icmp'
        option icmp_type 'echo-request'
        option family 'ipv4'
        option target 'ACCEPT'

config rule
        option name 'Allow-IGMP'
        option src 'wan'
        option proto 'igmp'
        option family 'ipv4'
        option target 'ACCEPT'

config rule
        option name 'Allow-DHCPv6'
        option src 'wan'
        option proto 'udp'
        option dest_port '546'
        option family 'ipv6'
        option target 'ACCEPT'

config rule
        option name 'Allow-MLD'
        option src 'wan'
        option proto 'icmp'
        option src_ip 'fe80::/10'
        list icmp_type '130/0'
        list icmp_type '131/0'
        list icmp_type '132/0'
        list icmp_type '143/0'
        option family 'ipv6'
        option target 'ACCEPT'

config rule
        option name 'Allow-ICMPv6-Input'
        option src 'wan'
        option proto 'icmp'
        list icmp_type 'echo-request'
        list icmp_type 'echo-reply'
        list icmp_type 'destination-unreachable'
        list icmp_type 'packet-too-big'
        list icmp_type 'time-exceeded'
        list icmp_type 'bad-header'
        list icmp_type 'unknown-header-type'
        list icmp_type 'router-solicitation'
        list icmp_type 'neighbour-solicitation'
        list icmp_type 'router-advertisement'
        list icmp_type 'neighbour-advertisement'
        option limit '1000/sec'
        option family 'ipv6'
        option target 'ACCEPT'

config rule
        option name 'Allow-ICMPv6-Forward'
        option src 'wan'
        option dest '*'
        option proto 'icmp'
        list icmp_type 'echo-request'
        list icmp_type 'echo-reply'
        list icmp_type 'destination-unreachable'
        list icmp_type 'packet-too-big'
        list icmp_type 'time-exceeded'
        list icmp_type 'bad-header'
        list icmp_type 'unknown-header-type'
        option limit '1000/sec'
        option family 'ipv6'
        option target 'ACCEPT'

config rule
        option name 'Allow-IPSec-ESP'
        option src 'wan'
        option dest 'lan'
        option proto 'esp'
        option target 'ACCEPT'

config rule
        option name 'Allow-ISAKMP'
        option src 'wan'
        option dest 'lan'
        option dest_port '500'
        option proto 'udp'
        option target 'ACCEPT'

config zone
        option name 'tailscale'
        option input 'ACCEPT'
        option output 'ACCEPT'
        option forward 'ACCEPT'
        option masq '1'
        option mtu_fix '1'
        list device 'tailscale0'
        list network 'tailscale'

config forwarding
        option src 'tailscale'
        option dest 'lan'

config forwarding
        option src 'tailscale'
        option dest 'wan'

config forwarding
        option src 'lan'
        option dest 'tailscale'

config redirect
        option dest 'lan'
        option target 'DNAT'
        option name 'Intercept-DNS'
        option family 'any'
        option src 'lan'
        option src_dport '53'
        option dest_port '53'

config rule
        option name 'Tailscale-Allow'
        list proto 'udp'
        option src '*'
        option dest_port '41641'
        option target 'ACCEPT'

config zone
        option name 'wg0'
        option input 'ACCEPT'
        option output 'ACCEPT'
        option forward 'ACCEPT'
        list network 'wg0'
        option masq '1'
        option mtu_fix '1'

config forwarding
        option src 'wg0'
        option dest 'lan'

config forwarding
        option src 'wg0'
        option dest 'wan'

config forwarding
        option src 'lan'
        option dest 'wg0'

config rule
        option name 'Wireguard-Allow'
        list proto 'udp'
        option src 'wan'
        option dest_port '51820'
        option target 'ACCEPT'

So SQM stopped working by itself again. This is the output of tc -s qdisc in the current state:

tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root 
 Sent 308858382 bytes 255667 pkt (dropped 0, overlimits 0 requeues 65) 
 backlog 0b 0p requeues 65
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
 Sent 308858382 bytes 255667 pkt (dropped 0, overlimits 0 requeues 65) 
 backlog 0b 0p requeues 65
  maxpacket 68130 drop_overlimit 0 new_flow_count 13347 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
 Sent 1836112982 bytes 1663705 pkt (dropped 0, overlimits 0 requeues 1268) 
 backlog 0b 0p requeues 1268
  maxpacket 16654 drop_overlimit 0 new_flow_count 10602 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wg0 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev ifb4eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev tailscale0 root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64 
 Sent 456 bytes 6 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

Again, the same errors appeared when I restarted the service:

service sqm restart
SQM: Stopping SQM on eth0
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 ingress
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: Invalid argument
SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0 root
SQM: ERROR: cmd_wrapper: tc: LAST ERROR: RTNETLINK answers: No such file or directory
SQM: Starting SQM script: piece_of_cake.qos on eth0, in: 151000 Kbps, out: 166000 Kbps
SQM: piece_of_cake.qos was started on eth0 successfully

This smells like a hotplug issue: eth0 might go down and come back up, and if hotplug does not trigger for both events you end up without SQM active. Thinking it through, the hotplug down event might actually work; after all, both cake instances are gone...
Does dmesg reveal anything about eth0? Or does logread reveal anything odd?
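
One way to test the hotplug theory would be a small debug hook that logs every interface event. This is only a sketch: the file name /etc/hotplug.d/iface/99-sqm-debug, the log tag sqm-debug and the function format_iface_event are made up here. It assumes the usual OpenWrt behaviour where scripts in /etc/hotplug.d/iface/ run with ACTION, INTERFACE and DEVICE set in the environment.

```shell
#!/bin/sh
# Hypothetical debug hook, e.g. saved as /etc/hotplug.d/iface/99-sqm-debug
# (file name and log tag are assumptions, not part of sqm-scripts).

# Build one log line from the hotplug variables.
# $1 = ACTION (e.g. ifup/ifdown), $2 = INTERFACE (e.g. wan), $3 = DEVICE (e.g. eth0)
format_iface_event() {
    echo "iface event: action=${1:-?} interface=${2:-?} device=${3:-?}"
}

# Only log when hotplug actually called us, i.e. ACTION is set.
if [ -n "${ACTION:-}" ]; then
    logger -t sqm-debug "$(format_iface_event "$ACTION" "${INTERFACE:-}" "${DEVICE:-}")"
fi
```

After SQM disappears again, logread | grep sqm-debug should show whether an ifdown for wan was followed by a matching ifup, or whether one of the two events never arrived.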

maxpacket 16654

You need to disable GRO and TSO to make codel work.

Cake will by default split big meta-packets so nothing to do for cake...
For fq_codel it is arguable whether one wants to split meta-packets or not (meta packets are processed more efficiently in the kernel, but increase the "temporal lumpiness" of transmissions).
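
For anyone who wants to see whether these offloads are on before deciding, here is a minimal sketch. It assumes the ethtool package is installed on the router; the function name offloads_enabled is invented for this example. It just filters the top-level GRO/GSO/TSO lines out of ethtool -k output.

```shell
#!/bin/sh
# Hypothetical filter (the name is an assumption): reads `ethtool -k <dev>`
# output on stdin and prints the top-level GRO/GSO/TSO features that are on.
offloads_enabled() {
    grep -E '^(generic-receive-offload|generic-segmentation-offload|tcp-segmentation-offload): on'
}

# Example, run manually on the router:
#   ethtool -k eth1 | offloads_enabled
# To turn GRO and TSO off on that device, if you decide you want that:
#   ethtool -K eth1 gro off tso off
```

Since cake splits meta-packets by default, this only matters for interfaces that are left on fq_codel, such as eth1 in the output above.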

I couldn’t find anything related to SQM in logread when it went down, but I didn’t check dmesg. I’ll make sure to check dmesg the next time it goes down.

Why does service status of sqm show no instances active?

service sqm status
active with no instances

It gives the same status output even when an sqm instance is clearly active.
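
Since the status output does not tell the two states apart, checking the qdiscs directly seems more reliable. A small sketch, with the function name has_cake_root invented for illustration: it reads tc qdisc show dev output and succeeds only if a cake root qdisc is present, which matches the working state posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical check (the name is an assumption): reads `tc qdisc show dev <dev>`
# output on stdin and returns success if a cake root qdisc is listed.
has_cake_root() {
    grep -Eq '^qdisc cake [0-9a-f]+: .* root'
}

# Example, run manually on the router:
#   tc qdisc show dev eth0     | has_cake_root && echo "egress cake present"
#   tc qdisc show dev ifb4eth0 | has_cake_root && echo "ingress cake present"
```

When SQM has gone away, eth0 shows mq/fq_codel and ifb4eth0 shows plain fq_codel instead, so both checks fail.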

I have no idea what those two are to be honest. I’m only using cake/piece of cake on my wan (eth0).

What is this:

It is the new default qdisc for multiqueue devices, since roughly the late 4.x kernels.
