SQM high latency

Hi, I recently upgraded from a 30/10 cable connection to 120/20, and my TP-Link Archer C7 hit 100% CPU load doing SQM QoS, resulting in ping spikes. I bought a new Linksys WRT3200ACM, thinking its 1.8 GHz dual core would be able to keep latency low during downloads, but I see the same behavior even though CPU load stays around 9%. Could it be a software or configuration issue? No Wi-Fi is involved in this case.

What brand/model router do you have?
Linksys WRT3200ACM (same with TP-Link Archer C7)
What version of LEDE are you using?
r2987-25200ae
How do you connect to the internet? Cable, DSL, other?
Cable
What's your nominal/expected/advertised download speed? Upload speed?
120 / 20
If you turn off all QoS/management, what are your measured download/upload speeds?
131 / 22
What is the WAN "interface name" in the Network -> Interfaces page?
eth1
What parameters do you see in the Network -> SQM-QoS values?
Enabled, eth1, 100000, 20000, cake, piece of cake, no link layer adaptation


Bufferbloat alternates between 0 ms and 500 ms every second. Same with fq_codel/simple.qos.
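
(A quick way to watch this, as a sketch: run a continuous ping from a machine on the LAN while a download saturates the link. iputils ping and 8.8.8.8 as the target are just example choices.)

ping -i 0.2 8.8.8.8
# with working SQM the RTT stays near its idle value; with bufferbloat it spikes into the hundreds of ms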

The cake qdisc is not installed by default in LEDE, and the SQM scripts fail silently when it is missing, so SQM may not be setting anything up at all.
simplest.qos should run well on the Archer C7 v2 (I have one too). Try testing ingress rates from 60 to 120 Mbps in 15 Mbps steps. If you SSH into your router, setting option shaper_burst 1 in /etc/config/sqm can also help, as sketched below.
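
For reference, a sketch of how to check and apply this over SSH, assuming the package names below and that the SQM section is called 'eth1' (as in the config posted later in this thread):

opkg list-installed | grep -e sqm -e cake   # verify sqm-scripts and kmod-sched-cake are present
uci set sqm.eth1.shaper_burst='1'
uci commit sqm
/etc/init.d/sqm restart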

Hmm, if cake were missing I would expect the speedtest to simply show the unshaped 131/22 result above. Could you post the output of "tc -d qdisc", "tc -d class show dev eth1" and "tc -d class show dev ifb4eth1", please? Maybe this will give some hints. Also, Eric's advice to test simple.qos with fq_codel as the qdisc is a good way to figure out whether your issue is cake-specific or generic to SQM.

Thanks for the suggestions. Eric, I verified cake is indeed installed. The download limit doesn't matter; I notice bufferbloat whether I set 100 Mbit ingress or even 10 Mbit. I tried enabling shaper_burst and ping increased to over 1 second.

config queue 'eth1'
option interface 'eth1'
option qdisc_advanced '0'
option linklayer 'none'
option debug_logging '0'
option verbosity '5'
option enabled '1'
option upload '20000'
option qdisc 'cake'
option script 'piece_of_cake.qos'
option download '100000'
option shaper_burst '1'

Here's the output of the tc commands:

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc cake 8035: dev eth1 root refcnt 9 bandwidth 20Mbit besteffort triple-isolate wash rtt 100.0ms noatm overhead 14
qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev 6rd-wan6 root refcnt 2
qdisc noqueue 0: dev 6rd-wan_6 root refcnt 2
qdisc fq_codel 0: dev ifb0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc cake 8036: dev ifb4eth1 root refcnt 2 bandwidth 100Mbit besteffort triple-isolate wash rtt 100.0ms noatm overhead 14

tc -d class show dev eth1

class cake 8035:39 parent 8035:
class cake 8035:122 parent 8035:

tc -d class show dev ifb4eth1

class cake 8036:372 parent 8036:

Here's simple.qos with fq_codel:

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth0 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc htb 1: dev eth1 root refcnt 9 r2q 10 default 12 direct_packets_stat 0 ver 3.17 direct_qlen 532
qdisc fq_codel 110: dev eth1 parent 1:11 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 120: dev eth1 parent 1:12 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 130: dev eth1 parent 1:13 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev 6rd-wan6 root refcnt 2
qdisc noqueue 0: dev 6rd-wan_6 root refcnt 2
qdisc fq_codel 0: dev ifb0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc htb 1: dev ifb4eth1 root refcnt 2 r2q 10 default 10 direct_packets_stat 0 ver 3.17 direct_qlen 32
qdisc fq_codel 110: dev ifb4eth1 parent 1:10 limit 1001p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn

tc -d class show dev eth1

class htb 1:11 parent 1:1 leaf 110: prio 1 quantum 1500 rate 128Kbit ceil 6666Kbit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst 1599b/1 mpu 0b overhead 0b level 0
class htb 1:1 root rate 20Mbit ceil 20Mbit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst 1600b/1 mpu 0b overhead 0b level 7
class htb 1:10 parent 1:1 prio 0 quantum 1500 rate 20Mbit ceil 20Mbit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst 1600b/1 mpu 0b overhead 0b level 0
class htb 1:13 parent 1:1 leaf 130: prio 3 quantum 1500 rate 3333Kbit ceil 19984Kbit linklayer ethernet burst 1599b/1 mpu 0b overhead 0b cburst 1598b/1 mpu 0b overhead 0b level 0
class htb 1:12 parent 1:1 leaf 120: prio 2 quantum 1500 rate 3333Kbit ceil 19984Kbit linklayer ethernet burst 1599b/1 mpu 0b overhead 0b cburst 1598b/1 mpu 0b overhead 0b level 0
class fq_codel 110:2af parent 110:
class fq_codel 120:11 parent 120:
class fq_codel 120:1f parent 120:
class fq_codel 120:56 parent 120:
class fq_codel 120:97 parent 120:
class fq_codel 120:c7 parent 120:
class fq_codel 120:134 parent 120:
class fq_codel 120:14d parent 120:
class fq_codel 120:156 parent 120:
class fq_codel 120:16e parent 120:
class fq_codel 120:19b parent 120:
class fq_codel 120:1c2 parent 120:
class fq_codel 120:204 parent 120:
class fq_codel 120:21c parent 120:
class fq_codel 120:226 parent 120:
class fq_codel 120:255 parent 120:
class fq_codel 120:26d parent 120:
class fq_codel 120:272 parent 120:
class fq_codel 120:28c parent 120:
class fq_codel 120:298 parent 120:
class fq_codel 120:2ed parent 120:
class fq_codel 120:323 parent 120:
class fq_codel 120:347 parent 120:
class fq_codel 120:368 parent 120:
class fq_codel 120:394 parent 120:
class fq_codel 120:3bd parent 120:
class fq_codel 120:3e3 parent 120:

tc -d class show dev ifb4eth1

class htb 1:10 parent 1:1 leaf 110: prio 0 quantum 12000 rate 100Mbit ceil 100Mbit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst 1600b/1 mpu 0b overhead 0b level 0
class htb 1:1 root rate 100Mbit ceil 100Mbit linklayer ethernet burst 1600b/1 mpu 0b overhead 0b cburst 1600b/1 mpu 0b overhead 0b level 7
class fq_codel 110:1d5 parent 110:
class fq_codel 110:349 parent 110:

Hi Martin,
looking at your speedtest results (specifically the expanded bufferbloat plots) it seems clear that ingress shaping is problematic while egress shaping works quite well (and does not seem to introduce unwanted delay). Could I ask you once more to perform a test with the ingress set to 60 Mbps (50% of the real rate)? I predict it will still look nasty. Do you know whether your WAN port connects via a dedicated ethernet PHY or whether it runs through the router's switch chip?

Best Regards
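
(For reference, a sketch of the suggested 60 Mbps ingress test via uci, assuming the SQM section name 'eth1' from the config above:)

uci set sqm.eth1.download='60000'   # 50% of the measured ~120 Mbps ingress
uci commit sqm
/etc/init.d/sqm restart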

Hello everyone,
just wanted to throw in an idea from my experience: although I have a cable connection too, I still had to set a link layer adaptation (Ethernet with 28 bytes of overhead) to make it work and get rid of bufferbloat. Maybe worth a try?
EDIT: This was on a TP-Link Archer C7 v2.0.
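
(A sketch of the equivalent settings over SSH, assuming the SQM section is named 'eth1' as in the config above:)

uci set sqm.eth1.linklayer='ethernet'
uci set sqm.eth1.overhead='28'
uci commit sqm
/etc/init.d/sqm restart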

@moeller0, indeed upload shaping works perfectly. Only download/ingress suffers from bufferbloat.
Here are the 60 Mbit ingress results:

Switch on WRT3200ACM: Marvell 88E6352 (7-Port AVB GbE Switch, 5x GbE PHY)

@stangri, I tried an Ethernet per-packet overhead of 28 bytes with no noticeable effect.

Thanks, so even at ~50% of the true goodput rate you still see massive latency issues on ingress. If you run a speedtest, could you also log into the router, run "top -d 1", and look at the idle and sirq values in the second line from the top? sirq is not accounted for as user or system time, so even with low sys and usr percentages a CPU might still be effectively maxed out (your CPU is supposedly fast enough that it should not have to work hard at your shaper rates, but there might be a bug lurking somewhere). Could you also post the output of "ethtool -k eth1" and "tc -s class show dev ifb4eth1" from before and after a speedtest, please? I know I am asking for a lot of data, and I cannot promise that this will lead to a (quick) fix for your issue, but unless we start getting to the bottom of this we will never know...
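
(To summarize, the requested commands in one place:)

top -d 1                        # during the speedtest; watch the idle and sirq percentages
ethtool -k eth1                 # offload settings on the WAN interface
tc -s class show dev ifb4eth1   # ingress shaper statistics, before and after the test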

Regarding the per-packet overhead, I agree with stangri that it is important to get this right, but it will only ever show up as a latency issue if you run close to the actual bottleneck rate and/or if you saturate your link with small packets (28 bytes of overhead is roughly 44% extra wire time on a 64-byte packet, but under 2% on a 1500-byte one). Your @50% ingress tests sufficiently avoided the wrong-overhead issue, hence it had no effect on your measurements.

So honestly something looks wrong (and there was at least one other report of SQM failing to shape ingress well). My gut feeling is that the router might actually pipe the WAN port in through the switch, and you might be seeing switch bufferbloat/switch misbehaviour...

Best Regards

With 100 Mbit ingress and simple.qos/fq_codel: CPU: 0% usr 0% sys 0% nic 91% idle 0% io 0% irq 9% sirq
For reference, the Archer C7 was at 0% idle here. Either this router is very fast or it's not actually running the algorithm at all.

ethtool -k eth1

Features for eth1:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

tc -s class show dev ifb4eth1 (before speedtest)

class htb 1:10 parent 1:1 leaf 110: prio 0 rate 100Mbit ceil 100Mbit burst 1600b cburst 1600b
Sent 283278 bytes 4158 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 4149 borrowed: 0 giants: 0
tokens: 1925 ctokens: 1925

class htb 1:1 root rate 100Mbit ceil 100Mbit burst 1600b cburst 1600b
Sent 283278 bytes 4158 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1925 ctokens: 1925

class fq_codel 110:26f parent 110:
(dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
deficit 196 count 0 lastcount 0 ldelay 2us

tc -s class show dev ifb4eth1 (after speedtest)

class htb 1:10 parent 1:1 leaf 110: prio 0 rate 100Mbit ceil 100Mbit burst 1600b cburst 1600b
Sent 145948045 bytes 140720 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 97169 borrowed: 0 giants: 0
tokens: 1925 ctokens: 1925

class htb 1:1 root rate 100Mbit ceil 100Mbit burst 1600b cburst 1600b
Sent 145948045 bytes 140720 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1925 ctokens: 1925

class fq_codel 110:26f parent 110:
(dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
deficit 38 count 0 lastcount 0 ldelay 1us

Thanks, unfortunately no smoking gun; this all looks rather normal. My current pet hypothesis is that the router pipes the WAN port through its switch (which I could neither prove nor disprove with quick research; could you post a screenshot of the Network -> Switch tab from the router's GUI somewhere? That should tell us) and that the switch has some issues. Since egress data is shaped before it sees the switch, this might explain why you only see issues with ingress. I admit that my evidence is rather weak...

Hi moeller0, thanks for the ideas. I think you're right that eth1 is part of the switch. I would be surprised to hear the WRT3200ACM was badly designed, since it's supposedly a high-end, open-source-friendly router, but there could be a driver issue somewhere. Any idea how to diagnose and rule that out?

[screenshot of the WRT3200ACM Network -> Switch page]

The Archer C7 shows a similar switch page to the WRT3200ACM:
[screenshot of the Archer C7 Network -> Switch page]
I'm using the C7 for Wi-Fi for the time being but have been testing SQM on the WRT3200ACM's LAN ports. I upgraded to LEDE 17.01 and the bufferbloat persists.

I took the WRT3200ACM out of the equation. Here's the Archer C7 v2 set to 30/20 Mbit with cake/piece_of_cake.qos on LEDE r2608:

Then, set to 20/20 Mbit, something strange happens:

CPU: 0% usr 1% sys 0% nic 51% idle 0% io 0% irq 46% sirq

I have cake/piece_of_cake running on a TP-Link 842N v3.1 and on an Archer C7 v2, both on very recent LEDE builds.

On my 842N, bufferbloat is an "A" on DSL Reports, whereas it is a "D" on the Archer C7, all else being equal.

I would guess that the Archer C7 v2 is either defective or something weird is going on on the hardware side.

It would be good to hear others' experience with the Archer C7.

If the moderator(s) read this: maybe it would be good to have a dedicated section on QoS issues in this forum?

Hi Martin, your speedtests both show nicely working egress/upload shaping; it is only ingress/download that looks terrible (well, there is one terrible data point in the idle phase of https://www.dslreports.com/speedtest/9434989 as well). Assuming both your routers pipe ingress via a switch port, this might indicate that we currently have issues with switches (we being either sqm and/or lede).

@deuteragenie, according to https://wiki.openwrt.org/toh/tp-link/tl-wr842nd your 842 has a dedicated ethernet PHY for the WAN port, so it is not going through a switch. I guess I might be able to test my hypothesis by instantiating SQM on a LAN interface of my WNDR3700v2, as those also go through a switch.

Best Regards

Hi @moeller0, it does look like the LEDE switch drivers are incomplete compared to mainline Linux. The mainline driver disables ingress rate limiting, whereas the LEDE driver doesn't even touch that register. Later mainline drivers not only disable it initially but also configure the rate limiting. Could this be necessary for proper ingress shaping?
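
(For anyone who wants to poke at this from the shell, a sketch of inspecting what the LEDE switch driver exposes, assuming the switch registers as switch0 as on most targets:)

swconfig list                 # enumerate the registered switches
swconfig dev switch0 show     # dump the attributes and settings the driver exposes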

Source references: LEDE 4.4, Linux 4.4, Linux master (4.10), Rate limiting

Hi @moeller0, have you had a chance to try SQM on your Netgear router?