Hi, I recently upgraded from a 30/10 cable connection to 120/20 and my TP-Link Archer C7 hit 100% load doing SQM QoS, resulting in ping spikes. I bought a new Linksys WRT3200ACM thinking the 1.8GHz dual core would be capable of keeping latency low despite downloads but I seem to notice the same behavior, with CPU load around 9%. Could it be a software or configuration issue? No Wifi is involved in this case.
What brand/model router do you have?
Linksys WRT3200ACM (same with TP-Link Archer C7)
What version of LEDE are you using?
r2987-25200ae
How do you connect to the internet? Cable? DSL, other?
Cable
What's your nominal/expected/advertised download speed? Upload speed?
120 / 20
If you turn off all QoS/management, what are your measured download/upload speeds?
What is the WAN "interface name" in the Network -> Interfaces page?
eth1
What parameters do you see in the Network -> SQM-QoS values?
Enabled, eth1, 100000, 20000, cake, piece of cake, no link layer adaptation
Bufferbloat alternates between 0ms and 500ms every second. Same with fq_codel/simple
The cake qdisc is not installed by default on LEDE, and the scripts fail silently when it is missing, so SQM may not have set anything up.
simplest.qos should run well on an Archer C7 v2 ("me too"). Try testing download rates from 60 to 120 Mbps in 15 Mbps steps. It can also help to ssh into your router and set option shaper_burst 1 in /etc/config/sqm.
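For reference, the suggested sweep can be written out as concrete values for the SQM download field. This is only a minimal sketch: it assumes the field takes kbit/s (as the 100000 / 20000 settings quoted earlier suggest), and the helper name `sweep_rates_kbit` is made up for illustration.

```python
def sweep_rates_kbit(start_mbps=60, stop_mbps=120, step_mbps=15):
    """Return the download rates to test, in kbit/s, including stop_mbps."""
    return [mbps * 1000 for mbps in range(start_mbps, stop_mbps + 1, step_mbps)]


# One speedtest per value, changing only the download rate between runs.
for rate in sweep_rates_kbit():
    print(rate)
```

Changing only the download rate between runs keeps the comparison clean, so any latency difference can be attributed to the ingress setting.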
Mmmh, if cake were missing I would expect the speedtest to look just like the 131/22 one above. Could you post the output of "tc -d qdisc", "tc -d class show dev eth1" and "tc -d class show dev ifb4eth1", please? Maybe this will give some hints. Also, Eric's advice to test simple.qos with fq_codel as qdisc is a great way to figure out whether your issue is cake-specific or generic to sqm.
Thanks for the suggestions. Eric, I verified cake is indeed installed. The download limit doesn't matter: I notice bufferbloat whether I set 100Mbit ingress or even 10Mbit. I tried enabling shaper_burst and ping increased to over 1 second.
Hi Martin,
looking at your speedtest results (specifically the expanded bufferbloat plots), it seems clear that ingress shaping is problematic while egress shaping works quite well (and does not seem to introduce unwanted delay). Could I ask you once more to perform a test with the ingress set to 60Mbps (50% of the real rate)? I predict that it will still look nasty. Do you know whether your WAN port uses a dedicated ethernet port or whether it is running through the router's switch chip?
Hello everyone,
just wanted to throw in an idea/my experience: although I have a cable connection too, I still had to set a link layer adaptation (Ethernet with overhead, 28 byte) to make it work and get rid of bufferbloat. Maybe worth a try?
EDIT: This was on a TP-Link Archer C7 v2.0.
@moeller0, indeed upload shaping works perfectly. Only download/ingress suffers from bufferbloat.
Here are the 60Mb ingress results :
Switch on WRT3200ACM: Marvell 88E6352 (7-Port AVB GbE Switch, 5x GbE PHY)
@Strangi, I tried Ethernet per packet overhead of 28 bytes with no noticeable effect.
Thanks, so even at ~50% of the true goodput rate you still see massive latency issues on ingress. If you run a speedtest, could you also log into the router, run "top -d 1", and look at the idle and sirq values in the second line from the top? sirq is not accounted for as user or system time, so even with low sys and usr percentages a CPU might still be effectively maxed out (your CPU is supposedly so fast it should not really have to work hard for your shaper rates, but there might be a bug lurking somewhere). Could you also post the results of "ethtool -k eth1" and "tc -s class show dev ifb4eth1" from before and after a speedtest, please? I know I am asking for a lot of data, and I cannot promise that this will actually lead to a (quick) fix for your issue, but unless we start getting to the bottom of this we will never know...
Regarding the per-packet overhead, I agree with Strangi that it is important to get this right, but it will only ever show up as a latency issue if you run close to the actual bottleneck rate and/or if you saturate your link with small packets. Your tests at 50% ingress stayed far enough below the bottleneck to avoid the wrong-overhead issue, hence no effect on your measurements.
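To put rough numbers on the overhead argument, here is a small back-of-the-envelope sketch. It assumes the shaper counts only the packet size it sees and misses a fixed number of bytes per packet; the 28-byte figure is the one suggested earlier in the thread, while the 64-byte packet size and the use of 120 Mbps as the bottleneck are illustrative assumptions, not measurements.

```python
def wire_rate_mbps(shaper_mbps, packet_bytes, unaccounted_overhead_bytes):
    """Rate actually occupied on the link when every packet carries extra
    bytes that the shaper does not count."""
    return shaper_mbps * (packet_bytes + unaccounted_overhead_bytes) / packet_bytes


LINK_MBPS = 120  # advertised downstream rate, used here as the bottleneck (assumption)
OVERHEAD = 28    # per-packet overhead in bytes, as suggested in the thread

for shaper in (120, 60):
    for size in (1500, 64):  # 64 bytes is an illustrative small packet
        wire = wire_rate_mbps(shaper, size, OVERHEAD)
        status = "exceeds link" if wire > LINK_MBPS else "fits"
        print(f"shaper {shaper} Mbps, {size}-byte packets: {wire:.1f} Mbps on the wire ({status})")
```

With the shaper set at the full link rate, the uncounted bytes push the real load past the bottleneck, worst of all for small packets. At half the link rate there is enough headroom that even small packets still fit, which is why the 60Mbps test should not be affected by a wrong overhead setting.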
So honestly something looks wrong (and there was at least one other report of SQM failing to shape ingress well). My gut feeling is that the router might actually pipe the WAN port in through the switch and you might be seeing switch bufferbloat/switch misbehaviour...
With 100Mb ingress, simple fq_codel, CPU: 0% usr 0% sys 0% nic 91% idle 0% io 0% irq 9% sirq
For reference, Archer C7 was 0% idle here. Either this router is very fast or it's not actually running the algorithm at all.
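If it helps when comparing runs, the CPU line from top can be broken down so that sirq is counted as busy time. A minimal sketch, assuming the line has the integer-percentage format quoted above; `parse_cpu_line` is a made-up helper name, not part of any tool.

```python
import re


def parse_cpu_line(line):
    """Turn a 'CPU: 0% usr ... 9% sirq' line into {field: percent}."""
    return {name: int(pct) for pct, name in re.findall(r"(\d+)%\s+(\w+)", line)}


line = "CPU: 0% usr 0% sys 0% nic 91% idle 0% io 0% irq 9% sirq"
cpu = parse_cpu_line(line)
busy = 100 - cpu["idle"]  # everything that is not idle, sirq included
print(f"busy: {busy}%, of which sirq: {cpu['sirq']}%")
```

Computing busy time as 100 minus idle avoids the trap of adding up only usr and sys, which would read as 0% here even though the CPU spends 9% of its time in softirq.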
ethtool -k eth1
Features for eth1:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
Thanks, unfortunately no smoking gun. This all looks rather normal. My current pet hypothesis is that the router pipes the WAN port through its switch and that the switch has some issues. I could neither prove nor disprove this with some quick research; could you post a screenshot of the Network -> Switch tab from the router's GUI somewhere? That should tell. Since egress data will be shaped before seeing the switch, this might explain why you only see issues with ingress. I admit that my evidence is rather weak...
Hi moeller0, thanks for the ideas. I think you're right, eth1 is part of the switch. I would be surprised to hear the WRT3200ACM was badly designed, since it's supposedly a high-end, open-source-ready router, but there could be some driver issue somewhere. Any idea how to diagnose and rule that out?
Archer C7 shows a similar switch page as WRT3200ACM.
I'm using the C7 as Wifi for the time being but have been testing SQM on WRT3200ACM LAN ports. Upgraded to LEDE 17.01 and bufferbloat persists.
I took the WRT3200ACM out of the equation. Here's the Archer C7 v2 set to 30/20Mbit with cake/piece_of_cake on LEDE r2608 :
Then set to 20/20Mbit, something strange happens :
Hi Martin, your speedtests both show nicely working egress/upload shaping; it is only ingress/download that looks terrible (well, there is one terrible data point in idle of https://www.dslreports.com/speedtest/9434989 as well). Assuming both your routers pipe ingress via a switch port, this might indicate we currently have issues with switches (we being either sqm and/or LEDE).
@deuteragenie according to https://wiki.openwrt.org/toh/tp-link/tl-wr842nd your 842 has a dedicated ethernet PHY for the WAN port, so it is not routing this over a switch. I guess I might be able to test my hypothesis by instantiating sqm on a LAN interface on my wndr3700v2, as those also go over a switch.
Hi @moeller0, it does look like LEDE switch drivers are incomplete compared to mainline Linux. Mainline drivers disable ingress rate limiting whereas the LEDE driver doesn't even touch that register. Later Linux drivers not only initially disable it but also configure the rate limiting. Could this be necessary for proper ingress shaping?