SQM not working correctly - WNDR3800 18.06

For the past week, my SQM config has stopped working correctly. When I run a test on dslreports, bufferbloat occurs for about 1-2 seconds at the start, and then SQM kicks in and it starts working as intended.

If I run a ping test while watching a 1080p stream on Twitch, I get a ping spike of 60+ ms every 3-4 seconds, whereas SQM used to stop any spikes from happening and kept my ping from increasing by more than about 5 ms under load.

I've tried resetting my router multiple times, tried different configurations and tested on both 17.01 and 18.06 firmware, but I can't seem to fix it. I've also disabled IPv6 completely.

My ISP is BT Infinity and I'm FTTC. My Download uncapped is 54Mb and Upload is 10Mb.

Here's my current sqm config, if anyone can offer any advice, I'd really appreciate it.

Interface: eth1-wan
Download 48000 kbps
Upload 9000 kbps
cake
piece of cake
ethernet with overhead
Per Packet overhead: 8
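For reference, the shaper rates above can be expressed as a fraction of the uncapped speeds quoted in the post. This is only a rough Python sketch of that arithmetic; it assumes 54 Mb and 10 Mb mean 54000 and 10000 kbps, and the helper name is made up for illustration.

```python
def shaper_fraction(shaped_kbps, line_kbps):
    """Return the shaped rate as a fraction of the line rate."""
    if line_kbps <= 0:
        raise ValueError("line rate must be positive")
    return shaped_kbps / line_kbps

# Assumption: the uncapped speeds from the post, converted to kbps.
down = shaper_fraction(48000, 54000)
up = shaper_fraction(9000, 10000)

print(f"download shaped to {down:.1%} of line rate")
print(f"upload shaped to {up:.1%} of line rate")
```

With these numbers the download shaper sits at roughly 89% and the upload shaper at 90% of the uncapped rates.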

Are you connected to your router via wifi or via ethernet cable? If wifi, try to repeat your tests over ethernet, just to rule out issues due to RF noise or "congested" air. Also, how do the DSL error counters look? In particular, do they increase with a frequency similar to the observed ping spikes?

Connection is via ethernet. Not sure what you mean by dsl error counters? Also, to elaborate, when I start the test on dslreports, my speed gradually increases for the first few seconds before maintaining a constant 46ish Mb. There seems to be a problem getting to max throughput at the start.

This is more or less normal. Make sure you have set the preferences to, say, 10 streams and you may get faster results on download. On upload it's normal for TCP slow start to cause a kind of smooth concave-downwards curve during the initial stages.

As @dlakelan indicated, that is typical TCP slow-start behavior (which is actually not that slow). TCP tries to eventually fill all available capacity (within limits) by continuously trying to send at a higher rate until it encounters marked or dropped packets, at which point it scales back somewhat and then starts to raise its rate again, more slowly than during slow start (google for TCP sawtooth). Using multiple TCP connections hides the individual sawtooth patterns somewhat (you could try setting the number of flows to one and might be able to see some of the sawtooth in the bandwidth plot, if that has the appropriate temporal resolution).
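To make the sawtooth a bit more concrete, here is a toy Python model of a single flow's congestion window. It is a deliberately simplified sketch, not how any real TCP stack is implemented: the capacity, the initial threshold and the halving-on-loss rule are all assumptions chosen for illustration.

```python
def simulate_cwnd(rounds, capacity, ssthresh=64):
    """Toy model of one TCP flow's congestion window, in packets per RTT.

    Slow start doubles the window each round trip. Once the window reaches
    ssthresh it grows by one packet per round trip. When the window exceeds
    the (hypothetical) path capacity, a drop or mark is assumed and the
    window is halved.
    """
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            # Assumed loss or mark: multiplicative decrease.
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2  # slow start: exponential growth
        else:
            cwnd += 1  # congestion avoidance: linear growth
    return history

trace = simulate_cwnd(rounds=60, capacity=40)
print(trace)
```

The printed trace shows the quick initial ramp, an overshoot, and then the repeating slow climb and halving that gives the sawtooth its name.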

Ok, that's fair, it's just weird because in the past I would maintain <5ms bloat the entire test, but now I've got this weird startup of 200 ms.

I'm still experiencing problems with just regular lag spikes when downloading anything. If I run a ping test in command prompt during a video or game download, I'll get constant spikes to +60ms, often going to 100ms :confused:
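If it helps to quantify the spikes rather than eyeball them, the ping output can be saved to a file and scanned. The Python below is a minimal sketch under a few assumptions: the lines look like Windows or Linux ping replies containing `time=NNms`, a "spike" is anything more than 30 ms above the median, and the sample lines are made up (192.0.2.1 is a documentation address, not a real target).

```python
import re
import statistics

# Matches "time=14ms", "time<1ms" (Windows) and "time=14.2 ms" (Linux).
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")

def parse_rtts(lines):
    """Extract round-trip times in ms from ping output lines."""
    rtts = []
    for line in lines:
        match = RTT_PATTERN.search(line)
        if match:
            rtts.append(float(match.group(1)))
    return rtts

def find_spikes(rtts, margin_ms=30.0):
    """Return (index, rtt) pairs that exceed the median RTT by margin_ms."""
    if not rtts:
        return []
    baseline = statistics.median(rtts)
    return [(i, r) for i, r in enumerate(rtts) if r - baseline > margin_ms]

# Made-up sample output, for illustration only.
sample = [
    "Reply from 192.0.2.1: bytes=32 time=14ms TTL=56",
    "Reply from 192.0.2.1: bytes=32 time=15ms TTL=56",
    "Reply from 192.0.2.1: bytes=32 time=78ms TTL=56",
    "Request timed out.",
    "Reply from 192.0.2.1: bytes=32 time=13ms TTL=56",
    "Reply from 192.0.2.1: bytes=32 time=102ms TTL=56",
]
rtts = parse_rtts(sample)
print(find_spikes(rtts))
```

Lines without a reply time (such as timeouts) are simply skipped here, so lost packets would need to be counted separately.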

Startup speed issues are normal, but long lags are not. Can you link to an example test?

I made a video showing my problem. I set the streams to 3 to show it happens regardless of # of streams

Yeah I see this sometimes as well. I suspect it is some monkey business on the fiber connection. I have gig FTTP and yet the first test I run in the morning would often have that kind of behavior. My theory is when my line is idle the backhaul shuts down some of the capacity, that shelf followed by rise is the ISP opening up the floodgates so to speak.

Does a test immediately after this test behave better?

Unfortunately not. I think I need to do a line noise test, which means asking my neighbour, who I never talk to, whether I can borrow an old wired phone. I hope there's actually something wrong with my connection so I can get it fixed -___-

Somewhat related, try running https://www.dslreports.com/tools/pingtest for say an hour on an unloaded link and also try https://www.dslreports.com/pingtest, and https://www.dslreports.com/smokeping to see how well your link behaves without load

I'll try and do it around midnight/early morning since there's always someone using the internet throughout the day.

Just a thought, try
opkg update ; opkg install luci-app-statistics collectd collectd-mod-cpu collectd-mod-ping
then configure (in the GUI) Statistics (tab) -> Setup -> Network Plugins (sub tab) -> Ping (sub sub tab). There, check "enable this plugin" and put, say, gstatic.com into the "Monitor hosts" field, maybe set "Interval for pings" to 5, and let that run overnight. Then you can visually correlate ping spikes with router CPU load (you could also install collectd-mod-interface and configure this for your wan interface so you can also figure out the network load during ping spikes). The beauty of this is that you can just run this in the background without having to find a quiet period in your network...
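The visual comparison of ping latency against CPU load can also be backed up with a number. The Python below is a rough sketch that computes a Pearson correlation between two series. It assumes the ping and CPU samples have already been exported somehow into two equal-length lists taken at the same times; how to get them out of collectd is not covered here, and the sample values are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Invented samples: ping RTT in ms and router CPU load in percent.
ping_ms = [12, 13, 60, 14, 75, 12]
cpu_pct = [5, 6, 40, 7, 55, 5]

r = pearson(ping_ms, cpu_pct)
print(f"correlation between ping and CPU load: {r:.2f}")
```

A value close to 1 would suggest the spikes coincide with CPU load on the router; a value near 0 would point elsewhere, for example at the line or the ISP.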

Okay, I've set that up. I'll let it run today and see what it says by the end of the night.

If it was working correctly and then stopped, your ISP may have changed something.

I have Comcast. SQM was working beautifully and then bufferbloat ratings went to hell. I was able to get bufferbloat scores back to "A" on DSL Reports by DISABLING downstream SQM (set download speed to zero) and setting upstream SQM to the provisioned speed. Anything else was crap.

All I can figure out is Comcast has turned on AQM as defined in the DOCSIS 3.1 specification and amended to the DOCSIS 3.0 specification.

This issue for me seemed coincidental with a firmware update to my DOCSIS 3.0 cable modem.

Good luck,

Jim

I'll give that a try, thanks for posting. @jbrossard

Also, here's my router stats from yesterday. @moeller0

Mmmh, I think that shows nicely how ping latency, network and CPU load correlate, but I cannot see anything unusual here, neither drops nor excessive latencies. I note that openwrt does not really show the single-server smokeping plot I expected (I use multiple ICMP responders so I always see this multi-host view). I also note that collectd will itself periodically cause CPU load spikes (AFAICT it will not overload the CPU, but it is not completely free either).

Rang my ISP, they said my line is fine, so I'm pretty much out of ideas at this point. I'm going to put my router in recovery mode and flash a fresh image of LEDE to see if that has any effect. Otherwise I guess I'm $%£$%£$%.