I'm currently running LEDE CAPRICORN 1.2 r2640-08db3e1 / LuCI Master (git-16.358.28306-df0d765) on a Linksys WRT3200ACM. The build runs pretty well and is from @cybrnook.
I'm on a 120/6 cable connection behind a Cisco EPC3212.
I started fiddling with SQM a couple of days ago and currently settled on "cake" / "piece_of_cake". All other settings are on default.
When doing tests on the DSLReports speedtest I can see improvements in the test results. It usually goes from Cs or Ds to As or Bs. Most of the time I'm getting "A B A" with SQM enabled.
But when I enable ingress shaping the results usually get way worse. Sometimes all the way down to F for bufferbloat.
I tried a lot of different combinations and values for ingress / egress but it didn't matter much. As soon as ingress shaping has any other value than 0 the results get worse.
So I just tested with "ingress shaping off & egress shaping on" for a while and keep getting good results on the test with values between 80 - 95% of my maximum upload speed.
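For reference, percentages like these translate into shaper settings roughly as follows. This is only a minimal sketch: it assumes the nominal 120/6 Mbit/s figures of this connection as the measured rates and that the SQM rate fields are given in kbit/s. The helper name `shaper_rate_kbit` is made up for illustration.

```python
def shaper_rate_kbit(measured_mbit: float, percent: float) -> int:
    """Return a shaper rate in kbit/s as a percentage of a measured rate in Mbit/s."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(measured_mbit * 1000 * percent / 100)


if __name__ == "__main__":
    # Assumed nominal rates of this connection: 120 Mbit/s down, 6 Mbit/s up.
    for pct in (50, 80, 95):
        down = shaper_rate_kbit(120, pct)
        up = shaper_rate_kbit(6, pct)
        print(f"{pct}%: ingress {down} kbit/s, egress {up} kbit/s")
```

A rate of 0 is the special case discussed in this thread: it switches the shaper for that direction off.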
Today I also disabled egress shaping and am still getting the same grades and good results (still "A B A" most of the time, now with "ingress shaping off & egress shaping off").
Rebooting the router after changing settings in SQM didn't seem to make a difference. Neither does changing queuing disciplines or scripts (similar good / bad results on the ones I tried).
So now I'm confused and have a couple of questions:
Any ideas why ingress shaping could lead to way worse results (and how to change / avoid that)?
Any idea why setting ingress / egress shaping to 0 still gets me good results?
Is SQM even in effect (much) when those are disabled?
Cable networks have a shared bus network topology and can show some weird (congestion-related) behavior if the CMTS is overloaded.
I have an SQM (fq_codel) enabled router and sometimes get an "A" or "A+" from DSLReports and sometimes an "F", no matter whether shaping is set to 90% or 60% of the ingress/egress rate.
Mmmh, I am uncertain about the actual topology of that router's WAN access port. Looking at https://wiki.openwrt.org/toh/linksys/wrt_ac_series it seems like all ports go through a switch. Could you try to reduce the rates to 50% of what you measure without sqm and retest? Also, could you please post the output of the following commands on your router's command line:
This all seems a bit suboptimal, but I have no real idea what is going wrong. It could be buffering in the switch (which I believe was the initial bufferbloat phenotype) or sqm simply not issuing the right commands, so the diagnostics I asked for might help in figuring out which avenue to follow.
Okay, so the cmd output was taken with both ingress and egress shaping deactivated, so everything is as expected, but it is also not diagnostic of anything. Could you please redo this with at least ingress shaping activated? Also please add:
tc -d class show dev eth0
tc -s class show dev eth0
tc -d class show dev ifb4eth0
tc -s class show dev ifb4eth0
with both ingress and egress shapers active, right after running a speedtest.
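If it helps to grab all four outputs in one go right after a speedtest, the commands above could be wrapped roughly like this. It is just a sketch and assumes Python 3.7+ is available on the machine where `tc` runs (often not the case on the router itself); the helper names are made up for illustration, and the device names are the ones from the commands above.

```python
import shutil
import subprocess

INTERFACES = ("eth0", "ifb4eth0")  # device names taken from the commands above
FLAGS = ("-d", "-s")               # details and statistics, as in the commands above


def tc_class_commands(interfaces=INTERFACES, flags=FLAGS):
    """Build the `tc <flag> class show dev <dev>` command lines as argument lists."""
    return [["tc", flag, "class", "show", "dev", dev]
            for dev in interfaces for flag in flags]


def collect(commands):
    """Run each command and return one text report; skip execution if tc is missing."""
    have_tc = shutil.which("tc") is not None
    lines = []
    for cmd in commands:
        lines.append("# " + " ".join(cmd))
        if have_tc:
            result = subprocess.run(cmd, capture_output=True, text=True)
            lines.append(result.stdout + result.stderr)
        else:
            lines.append("(tc not found, command not run)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(collect(tc_class_commands()))
```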
The test results show that your egress is over-buffered and that shaping does bring a noticeable improvement. You might want to set the link layer accounting to ethernet and specify 4 bytes of additional overhead: on eth0 the Linux kernel will already silently account for 14 bytes of overhead, so you only need to add the missing 4 to reach the 18 that DOCSIS systems seem to require. After setting the proper overhead, I would even try setting the egress shaper to 100%.
The ingress shaper is more of a concern; I would guess that your cable segment might be quite full (as the tests without ingress shaping give pretty variable/hideous results). In that case sqm would be off the hook, since under congestion our ingress shaping simply is at the mercy of the CMTS. But I agree that the no-ingress-shaping results have more samples with acceptable latencies than the ingress shaping ones, so unfortunately sqm-scripts might still be involved...
Final question: have you tried simple.qos with fq_codel as the qdisc for ingress as well, and could you post a link to a DSLReports speedtest, please? (You could also try activating the high-resolution bufferbloat tests and upping the test duration to 30 seconds for both directions to get more data quicker.)
And yes, sadly my segment is pretty full. So picking the holidays to test something like this was not the best idea.
Will repeat the tests with your suggestions around next week when things should start to calm down again.
I tried fq_codel / simple in the beginning but cake / piece_of_cake seemed to get fewer spikes during the tests. Will stick to fq_codel / simple this time (seems to be more mature according to some reading I did so far).
Oh, I do not want to imply cake might only be half baked, but rather that testing htb+fq_codel at least once for comparison seems like a good thing to do. My hypothesis is that it behaves similarly to cake, but real data would be nice.
Never had trouble with the cables but just for completeness:
Ethernet cables used are labeled CAT7 but are most likely just CAT6 with the highest grade of shielding since there's still (shielded) RJ45 plugs on them (and iirc real CAT7 cables don't have RJ45). The COAX cable is also one with a higher grade of shielding.
Well, yes and no. The shaper used in DOCSIS systems to limit a user's maximal bandwidth completely ignores DOCSIS overhead and only counts ethernet frames including their frame check sequence (FCS, 4 bytes). (The Linux kernel accounts for ethernet framing without the FCS.)
"C.220.127.116.11 Maximum Sustained Traffic Rate 632 This parameter is the rate parameter R of a token-bucket-based rate limit for packets. R is expressed in bits per second, and MUST take into account all MAC frame data PDU of the Service Flow from the byte following the MAC header HCS to the end of the CRC, including every PDU in the case of a Concatenated MAC Frame. This parameter is applied after Payload Header Suppression; it does not include the bytes suppressed for PHS. The number of bytes forwarded (in bytes) is limited during any time interval T by Max(T), as described in the expression: Max(T) = T * (R / 8) + B, (1) where the parameter B (in bytes) is the Maximum Traffic Burst Configuration Setting (refer to Annex C.18.104.22.168). NOTE: This parameter does not limit the instantaneous rate of the Service Flow. The specific algorithm for enforcing this parameter is not mandated here. Any implementation which satisfies the above equation is conformant. In particular, the granularity of enforcement and the minimum implemented value of this parameter are vendor specific. The CMTS SHOULD support a granularity of at most 100 kbps. The CM SHOULD support a granularity of at most 100 kbps. NOTE: If this parameter is omitted or set to zero, then there is no explicitly-enforced traffic rate maximum. This field specifies only a bound, not a guarantee that this rate is available."
So, in essence, DOCSIS users (only) need to account for 18 bytes of ethernet overhead in both the ingress and egress directions under non-congested conditions. But since the Linux kernel already accounts for 14 of those on an ethN interface, specify the overhead as 4 for fq_codel+HTB. For recent cake you can and should specify the overhead as 18, as cake can undo the kernel's automatic overhead addition.
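To make the arithmetic concrete, here is a rough sketch that combines the 18-byte accounting with the token-bucket bound Max(T) = T * (R / 8) + B from the quoted spec text. The byte counts (14, 4, 18) are the ones stated in this thread. The rate R = 6 Mbit/s (the nominal upload of this connection) and the burst size B = 3044 bytes are assumptions picked purely for illustration, and the function names are made up.

```python
ETHERNET_HEADER = 14   # bytes the kernel already accounts for on an ethN interface
FCS = 4                # frame check sequence, counted by the DOCSIS shaper
DOCSIS_COUNTED_OVERHEAD = ETHERNET_HEADER + FCS  # 18 bytes per packet


def overhead_setting(qdisc: str) -> int:
    """Overhead value to specify, following the rule of thumb in this thread."""
    if qdisc == "cake":
        # Recent cake can undo the kernel's automatic addition: give the full 18.
        return DOCSIS_COUNTED_OVERHEAD
    if qdisc == "fq_codel":
        # HTB + fq_codel: the kernel already adds 14, so only the FCS is missing.
        return DOCSIS_COUNTED_OVERHEAD - ETHERNET_HEADER
    raise ValueError(f"no rule of thumb for qdisc {qdisc!r}")


def counted_bytes(ip_packet_bytes: int) -> int:
    """Size of one IP packet as counted by the DOCSIS shaper."""
    return ip_packet_bytes + DOCSIS_COUNTED_OVERHEAD


def max_bytes(t_seconds: float, rate_bps: float, burst_bytes: float) -> float:
    """Token-bucket bound from the spec text: Max(T) = T * (R / 8) + B."""
    return t_seconds * (rate_bps / 8) + burst_bytes


if __name__ == "__main__":
    rate_bps = 6_000_000  # assumed R: nominal upload rate of this connection
    burst = 3044          # assumed B: hypothetical value, not taken from any config
    bound = max_bytes(1.0, rate_bps, burst)
    full_size = counted_bytes(1500)
    print(f"overhead for cake: {overhead_setting('cake')}")
    print(f"overhead for fq_codel+HTB: {overhead_setting('fq_codel')}")
    print(f"Max(1 s) = {bound:.0f} bytes, about {int(bound // full_size)} "
          f"full-size packets of {full_size} counted bytes each")
```

The point of counting per packet is that the 18 bytes weigh much more heavily on small packets than on full-size ones, so a shaper that ignores them lets through more counted bytes than intended when traffic consists of many small packets.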
Thanks for the data. It will take a few days before I find time to look over it closely, as I did not find any smoking gun on my first reading (and I might not find one on a closer reading either). The error messages you got in the log are an already known issue with the hfsc kernel module used by some of the qos scripts (hfsc_lite.qos, hfsc_litest.qos, and nxt_routed_hfsc.qos); if you tried those, that might have triggered the message. (We also load the hfsc module during sqm start-up to have it available for the listed hfsc-using scripts, since auto-loading of modules is not reliable on all "supported" distributions.)
Okay, I have looked closer into your files and I am sorry to say I made you do all these tests for no gain: I have no real idea why you seem to be better off with only upstream shaping. I am also out of realistic ideas about what to test next...