SQM not working correctly - WNDR3800 18.06

I still stand by the idea that there is some monkey business going on where there's maybe 1 channel and then when they detect you're using capacity they negotiate another channel. In other words, this is "how it's supposed to work" according to their design.

From the network engineer's perspective, the goal is to prevent any single user from grabbing huge quantities of bandwidth very quickly. If you need to transfer, say, 20Mbits of data, they'll give you 20Mbps for up to a second, and then if you keep transferring, they'll open the floodgates and let you have your full 40Mbps. This way, video streaming happens in short bursts at 20Mbps lasting 1/2 a second instead of short bursts at 40Mbps lasting 1/4 of a second, which keeps bandwidth usage more even and causes less congestion on the backhaul.
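To put rough numbers on that guess, here is a Python sketch of the hypothetical two-stage policy. The 20Mbps starting rate, the one-second window and the 40Mbps full rate are taken from the paragraph above; the 10 Mbit chunk size is an assumption implied by the burst figures (10 Mbit takes 1/2 s at 20Mbps and 1/4 s at 40Mbps). The function and constant names are made up for illustration, and none of this describes how the ISP actually configures its equipment.

```python
# Hypothetical two-stage rate policy, as guessed at above: the first
# INITIAL_WINDOW_S seconds of a transfer run at INITIAL_RATE_MBPS, and any
# remaining data runs at FULL_RATE_MBPS. The numbers are illustrative only.
INITIAL_RATE_MBPS = 20.0
FULL_RATE_MBPS = 40.0
INITIAL_WINDOW_S = 1.0


def transfer_time_s(size_mbit: float) -> float:
    """Seconds needed to move size_mbit megabits under the two-stage model."""
    initial_capacity_mbit = INITIAL_RATE_MBPS * INITIAL_WINDOW_S
    if size_mbit <= initial_capacity_mbit:
        return size_mbit / INITIAL_RATE_MBPS
    remaining_mbit = size_mbit - initial_capacity_mbit
    return INITIAL_WINDOW_S + remaining_mbit / FULL_RATE_MBPS


def burst_duration_s(chunk_mbit: float, rate_mbps: float) -> float:
    """How long one chunk of chunk_mbit keeps the link busy at rate_mbps."""
    return chunk_mbit / rate_mbps


if __name__ == "__main__":
    # Same 10 Mbit chunk: half a second at 20 Mbps, a quarter second at 40 Mbps.
    print(burst_duration_s(10, 20), burst_duration_s(10, 40))  # 0.5 0.25
    # 60 Mbit: 20 Mbit in the first second, then 40 Mbit at 40 Mbps.
    print(transfer_time_s(60))  # 2.0
```

Either way the same amount of data moves; the only difference the model predicts is how long each burst occupies the link.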

What exactly is the symptom that you'd like to change? Reducing latency further?

What happens if you set your download speed to, say, 18Mbps? Does the ping spiking go away? If your download speed varies, a shaper can only keep control of the queue if you set it below the lowest rate that is guaranteed. In your case that seems to be about 22Mbps, so setting your download to 18Mbps and testing should show whether this explanation makes sense.
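A minimal sketch of the "shape below the guaranteed rate" rule, assuming you have a handful of speed-test results to hand. The sample values and the 0.8 headroom factor are assumptions (0.8 is close to the 22 to 18Mbps step suggested here), and `shaper_rate_mbps` and `to_kbps` are hypothetical helper names. The kbit/s conversion is there because the LuCI SQM page takes its download and upload rates in kbit/s.

```python
from __future__ import annotations


def shaper_rate_mbps(measured_mbps: list[float], headroom: float = 0.8) -> float:
    """Return a shaper rate a bit below the worst throughput observed.

    measured_mbps: throughput samples from repeated speed tests (hypothetical input).
    headroom: fraction of the minimum to keep; 0.8 is an assumed rule of thumb.
    """
    if not measured_mbps:
        raise ValueError("need at least one throughput sample")
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0, 1]")
    # Shape relative to the slowest result, not the average or the best.
    return min(measured_mbps) * headroom


def to_kbps(rate_mbps: float) -> int:
    """Convert Mbit/s to the kbit/s figure the SQM settings page asks for."""
    return round(rate_mbps * 1000)


if __name__ == "__main__":
    samples = [40.0, 22.0, 38.0, 25.0]  # hypothetical speed-test results
    rate = shaper_rate_mbps(samples)
    print(round(rate, 1), to_kbps(rate))  # 17.6 17600
```

Using the minimum rather than the average is the whole point: a shaper set above the rate the line can actually deliver at its worst moments lets the queue build up in the ISP's equipment instead of in the router.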

Maybe you can get some diagnostic output from the modem, like signal amplitude, SNR and so on, that might tell us why and where there are issues. For example, there could be an RF issue on the return path that leaves your segment with very limited upload capacity, which then runs into congestion more often. Obviously I am just speculating...

The problem is that SQM has just stopped working. If I watch a YouTube video, use Netflix, or do anything else that consumes bandwidth, my ping spikes by 50+ms every 4-5 seconds. The DSLReports speed test doesn't really show the real problem; that was just another example of a weird issue, a laggy startup, that I never used to have.

I just tried setting the download to 20Mbps and it didn't change anything.
bleh :frowning:

As for the modem, I can look at all my stats and they seem fine. Same way they've been for years.

Edit: I plugged in the standard ISP router and got the same laggy startup on DSLReports. I also tried flashing a recovery image on my WNDR3800, to no avail. Something has clearly happened to my connection outside of my control, and since my speeds haven't been affected there's nothing I can say to my ISP that will get them to send an engineer to my house. This is making online gaming really fun :upside_down_face:

Here's what my ping looks like when I watch a stream on Twitch (with SQM active)

Install mtr and try running it against the same destination; it may reveal where in the chain your ping variability is coming from. If it's not on the link between you and the first hop after you, SQM can't really do anything about it, but if it's somewhere in the ISP's equipment, you may be able to complain enough to get them to take a look at it.
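For reference, the Sent/Recv/Best/Avrg/Wrst/Last columns that mtr (and WinMTR) print per hop boil down to simple statistics over the probe replies. The Python sketch below recomputes them from a list of round-trip times; the RTT values are invented, `None` stands for a lost probe, and `hop_stats` is a hypothetical helper that approximates what the tools report rather than reproducing their code. A Wrst far above Avrg is the signature of the intermittent spikes being chased here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class HopStats:
    sent: int
    recv: int
    loss_pct: float
    best: Optional[float]
    avg: Optional[float]
    worst: Optional[float]
    last: Optional[float]


def hop_stats(rtts_ms: list[Optional[float]]) -> HopStats:
    """Summarise the probes sent to one hop; None marks a probe that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    sent, recv = len(rtts_ms), len(replies)
    loss_pct = 100.0 * (sent - recv) / sent if sent else 0.0
    if not replies:
        return HopStats(sent, recv, loss_pct, None, None, None, None)
    # Averages are taken over replies only; lost probes count toward loss.
    return HopStats(sent, recv, loss_pct, min(replies),
                    sum(replies) / recv, max(replies), replies[-1])


if __name__ == "__main__":
    # Hypothetical RTTs: mostly fast, one lost probe, one big spike.
    print(hop_stats([5.0, 6.0, None, 180.0, 7.0]))
```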

|------------------------------------------------------------------------------------------|
|                                      WinMTR statistics                                   |
|                       Host              -   %  | Sent | Recv | Best | Avrg | Wrst | Last |
|------------------------------------------------|------|------|------|------|------|------|
|                                LEDE.lan -    0 |   97 |   97 |    0 |    0 |   21 |    0 |
|                           172.16.11.200 -    0 |   96 |   96 |    5 |   34 |  206 |    5 |
|                           31.55.187.181 -   96 |   21 |    1 |    0 |   75 |   75 |   75 |
|                           31.55.187.188 -    0 |   96 |   96 |    6 |   34 |  182 |    7 |
| core1-hu0-6-0-7.southbank.ukcore.bt.net -    0 |   96 |   96 |    6 |   33 |  160 |    6 |
|  peer8-et-7-0-2.telehouse.ukcore.bt.net -    0 |   96 |   96 |    6 |   36 |  138 |    7 |
|                           195.99.126.83 -    0 |   96 |   96 |    7 |   34 |  145 |    7 |
|                          151.101.192.81 -    0 |   96 |   96 |    6 |   41 |  212 |    7 |
|________________________________________________|______|______|______|______|______|______|
   WinMTR v1.00 GPLv2 (original by Appnor MSP - Fully Managed Hosting & Cloud Provider)

I ran it for a couple of minutes during a Twitch stream. Does my gateway IP being affected by lag point to a problem, or is that normal?

Ok, so let's ignore the third line, since that device clearly doesn't respond to pings...

Every hop from 172.16.11.200 onward has a worst case in the hundreds of milliseconds, much higher than its average. This suggests that occasionally your modem, the head-end, whatever 172.16.11.200 is, or something between LEDE.lan and 172.16.11.200 (like a modem in bridge mode) is causing delays.
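The same reading of the table can be done mechanically. This Python sketch uses the rows from the WinMTR output above, skips hops that ignore most pings (like 31.55.187.181), and flags any hop whose worst case exceeds its average by 100 ms or more. The 50% loss cut-off and the 100 ms spread are arbitrary thresholds chosen for illustration, and `jittery_hops` is a hypothetical helper.

```python
# Rows copied from the WinMTR output above:
# (host, loss %, sent, recv, best, avg, worst, last), times in ms.
ROWS = [
    ("LEDE.lan", 0, 97, 97, 0, 0, 21, 0),
    ("172.16.11.200", 0, 96, 96, 5, 34, 206, 5),
    ("31.55.187.181", 96, 21, 1, 0, 75, 75, 75),
    ("31.55.187.188", 0, 96, 96, 6, 34, 182, 7),
    ("core1-hu0-6-0-7.southbank.ukcore.bt.net", 0, 96, 96, 6, 33, 160, 6),
    ("peer8-et-7-0-2.telehouse.ukcore.bt.net", 0, 96, 96, 6, 36, 138, 7),
    ("195.99.126.83", 0, 96, 96, 7, 34, 145, 7),
    ("151.101.192.81", 0, 96, 96, 6, 41, 212, 7),
]


def jittery_hops(rows, max_loss_pct=50, min_spread_ms=100):
    """Hosts whose worst RTT exceeds their average by at least min_spread_ms.

    Hops losing more than max_loss_pct of probes are skipped, because they are
    rate-limiting or ignoring pings and their numbers say little about the path.
    Both thresholds are assumptions chosen for illustration.
    """
    flagged = []
    for host, loss, _sent, _recv, _best, avg, worst, _last in rows:
        if loss > max_loss_pct:
            continue
        if worst - avg >= min_spread_ms:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    hops = jittery_hops(ROWS)
    print("first hop with big spikes:", hops[0] if hops else None)
    print("hops affected:", len(hops), "of", len(ROWS))
```

With these rows, every responding hop from 172.16.11.200 onward gets flagged while LEDE.lan does not, which is why the delay looks like it is being added at or before 172.16.11.200 rather than by one router further along that deprioritises pings.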

Try running: http://www.dslreports.com/tools/puma6 to see the distribution of delays.

If you don't have problems with signal levels on your modem, I'd suggest physically replacing the modem. Perhaps it has overheating issues, or the ISP upgraded its firmware and introduced a bug?

Another cheap thing to check: replace the ethernet cables between your WNDR3800 and the modem, and unplug and replug the cable between your modem and the wall jack, or even replace it.

Thanks so much for helping me try to resolve this. I'll have to wait until later or tomorrow before I run the test because people are using the internet at the moment.

My modem definitely hasn't been upgraded, because my ISP stopped updating the firmware years ago. It's unlocked so that I can get line stats from it, and when I unlocked it I also disabled automatic upgrades.

I'll try unplugging everything tomorrow. I'm still waiting for my neighbour to get back so I can borrow his phone and do a proper noise test on my line.

It will affect the test, but the results should still be informative. The test hardly uses any bandwidth and doesn't interfere with the other users, so it's safe to run the puma test whenever.

No idea what this test means, but here you go:

22ms : xxxxxxxxx
23ms : xxxxxxxxxxxxxxxxxxxxxxxxxxx
24ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
25ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxx
27ms : xxxxxxxxxxx
28ms : xxxxxx
29ms : xx
30ms : x
31ms : xx
39ms : x

I'll unplug and plug everything back in tomorrow

So that test gives a histogram of how long it takes to set up and tear down a TCP connection. None of the samples showed severe delays, so that indicates at the time you were using it, the results were good.

If you notice delays, run it again and see what it shows.
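If you want a similar measurement from a command line rather than the browser, a rough equivalent is to time repeated TCP connects and draw the same kind of `NNms : xxx` histogram. This Python sketch is an assumption about what such a test does, not the DSLReports tool's actual method; in particular `close()` returns without waiting for the far end's FIN, so only the handshake is really timed. `tcp_connect_times_ms` and `ascii_histogram` are hypothetical helper names, and the target host and port are up to you.

```python
from __future__ import annotations

import socket
import time
from collections import Counter


def tcp_connect_times_ms(host: str, port: int, count: int = 20,
                         timeout: float = 2.0) -> list[float]:
    """Time `count` connect-then-close cycles to host:port, in milliseconds."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        # create_connection returns once the TCP three-way handshake completes.
        with socket.create_connection((host, port), timeout=timeout):
            pass  # leaving the with-block closes the socket (starts the teardown)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def ascii_histogram(samples_ms: list[float]) -> str:
    """Bucket samples into whole milliseconds and draw one 'x' per sample."""
    buckets = Counter(int(s) for s in samples_ms)
    return "\n".join(f"{ms}ms : {'x' * n}" for ms, n in sorted(buckets.items()))
```

For example, `print(ascii_histogram(tcp_connect_times_ms("example.com", 80)))`, with example.com standing in for whatever server you are allowed to probe. Run it once with the line quiet and once with a stream playing to compare.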

Ran it twice.

No streaming/downloading

22ms : xxxxx
23ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxx
24ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
25ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
27ms : xxxx
28ms : xxxx
29ms : xx
30ms : xxx
31ms : x
33ms : x

1080p Mixer stream running:

22ms : xx
23ms : xxxxxxxxxxxxxxxx
24ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
25ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
27ms : xxxxxxxxxxxxxxxx
28ms : xxxxxxxx
29ms : xxxxx
30ms : xxxxx
31ms : xx
32ms : xxx
33ms : x
34ms : x
35ms : x
37ms : x
38ms : x
42ms : x
43ms : x
45ms : x
48ms : xx
50ms : x
51ms : x
65ms : x
84ms : x
89ms : x
96ms : xx
100 - 149ms :xxxxxxxxxx

Ok so that shows your problem occurs under load. Was SQM running?
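One way to quantify "occurs under load" is to count how many connections in each histogram took 40 ms or longer. This Python sketch parses the two runs posted above; the 40 ms cut-off is an arbitrary choice, the `100 - 149ms` bucket is counted at its lower bound, and `parse_histogram` and `slow_samples` are hypothetical helper names.

```python
from __future__ import annotations

import re

# "NNms : xxx" lines, plus the "100 - 149ms :xxx" range bucket at the end.
BUCKET = re.compile(r"^\s*(\d+)(?:\s*-\s*\d+)?ms\s*:\s*(x*)\s*$")

IDLE = """
22ms : xxxxx
23ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxx
24ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
25ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
27ms : xxxx
28ms : xxxx
29ms : xx
30ms : xxx
31ms : x
33ms : x
"""

LOADED = """
22ms : xx
23ms : xxxxxxxxxxxxxxxx
24ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
25ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26ms : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
27ms : xxxxxxxxxxxxxxxx
28ms : xxxxxxxx
29ms : xxxxx
30ms : xxxxx
31ms : xx
32ms : xxx
33ms : x
34ms : x
35ms : x
37ms : x
38ms : x
42ms : x
43ms : x
45ms : x
48ms : xx
50ms : x
51ms : x
65ms : x
84ms : x
89ms : x
96ms : xx
100 - 149ms :xxxxxxxxxx
"""


def parse_histogram(text: str) -> list[tuple[int, int]]:
    """Return (bucket lower bound in ms, sample count) pairs."""
    buckets = []
    for line in text.strip().splitlines():
        m = BUCKET.match(line)
        if m:
            buckets.append((int(m.group(1)), len(m.group(2))))
    return buckets


def slow_samples(buckets: list[tuple[int, int]], threshold_ms: int = 40) -> tuple[int, int]:
    """(samples at or above threshold_ms, total samples)."""
    total = sum(n for _, n in buckets)
    slow = sum(n for ms, n in buckets if ms >= threshold_ms)
    return slow, total


if __name__ == "__main__":
    for label, text in (("idle", IDLE), ("1080p stream", LOADED)):
        slow, total = slow_samples(parse_histogram(text))
        print(f"{label}: {slow} of {total} connections took >= 40 ms")
```

The idle run has no samples at 40 ms or above, while the loaded run has a tail stretching past 100 ms.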

If you run a stream with SQM running, what idle percentage does top -d 1 report on your router?
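If you'd rather log the numbers than eyeball them, the `CPU:` summary line that BusyBox `top` prints on OpenWrt is easy to pick apart. This Python sketch assumes the exact field layout shown in the outputs posted in this thread (`usr sys nic idle io irq sirq`); `parse_cpu_line` is a hypothetical helper. On a router doing shaping, the `sirq` (softirq) column is usually the one to watch, since that is where most packet processing time shows up.

```python
from __future__ import annotations

import re

# One "<number>% <field>" pair from BusyBox top's CPU summary line.
CPU_FIELD = re.compile(r"(\d+)%\s+(usr|sys|nic|idle|io|irq|sirq)\b")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Turn a 'CPU:' summary line from top into {field: percent}."""
    return {name: int(pct) for pct, name in CPU_FIELD.findall(line)}


if __name__ == "__main__":
    line = "CPU:   0% usr   0% sys   0% nic  93% idle   0% io   0% irq   4% sirq"
    cpu = parse_cpu_line(line)
    print("idle:", cpu["idle"], "busy:", 100 - cpu["idle"], "sirq:", cpu["sirq"])
```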

This is what it says with a stream running

Mem: 45032K used, 80332K free, 1032K shrd, 4832K buff, 15120K cached
CPU:   0% usr   0% sys   0% nic  93% idle   0% io   0% irq   4% sirq
Load average: 0.00 0.00 0.00 1/57 7207
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 7207  7198 root     R     1188   1%   2% top -d 1
 2089     1 root     S     1892   2%   1% /usr/sbin/hostapd -s -P /var/run/wifi
 7197  1355 root     S     1148   1%   1% /usr/sbin/dropbear -F -P /var/run/dro
  101     2 root     SW       0   0%   1% [kworker/0:1]
 1703     1 root     S     3772   3%   0% /usr/sbin/uhttpd -f -h /www -r LEDE -
 2111     1 root     S     3652   3%   0% /usr/sbin/collectd -f
 1726     1 root     S     2776   2%   0% /usr/sbin/vsftpd
 2070     1 root     S     1892   2%   0% /usr/sbin/hostapd -s -P /var/run/wifi
 1239     1 root     S     1704   1%   0% /sbin/netifd
 1855     1 root     S     1560   1%   0% /usr/sbin/nlbwmon -o /var/lib/nlbwmon
    1     0 root     S     1536   1%   0% /sbin/procd
 1168     1 root     S     1440   1%   0% /sbin/rpcd
 1306     1 root     S     1416   1%   0% /usr/sbin/odhcpd
 2505     1 root     S     1320   1%   0% /usr/sbin/miniupnpd -f /var/etc/miniu
 1159     1 root     S     1224   1%   0% /sbin/logd -S 64
 2662     1 dnsmasq  S     1216   1%   0% /usr/sbin/dnsmasq -C /var/etc/dnsmasq
 1999  1239 root     S     1204   1%   0% /usr/sbin/pppd nodetach ipparam wan i
 2243     1 root     S <   1192   1%   0% /usr/sbin/ntpd -n -N -S /usr/sbin/ntp
 7198  7197 root     S     1192   1%   0% -ash
  493     1 root     S     1180   1%   0% /sbin/ubusd

Are you running this stream over the internet, or just between two LAN computers? Because that shows that the router is doing absolutely nothing, so I'm guessing the packets are just going between ports on the switch?

This is over the internet; I just loaded up a random 1080p stream on Mixer for a couple of minutes.

My guess is that the stream is buffering, so at the moment you captured that data nothing was going through the router :wink: Try starting top first, then start the stream and see how it looks in the first second or so.

I ran the command and refreshed the stream, let it run for about 20 seconds and the results are the exact same tbh :confused:

Did the idle percentage stay at around 93% throughout the whole 20-second period, or was it just 93% at the 20-second mark? My theory is that the stream buffers 5 or 10 seconds of data at a time, so the router will flip-flop between, say, 10% idle and 95% idle every 5 or 10 seconds. You have to watch carefully, with top refreshing every second, to catch the one or two samples where it's working very hard.
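To illustrate why the refresh interval matters, here is a toy Python model of a stream that fetches one chunk every 10 seconds. The 10% busy and 95% quiet idle figures come from the guess above; the 2-second burst length, the sampling offsets and the helper names `idle_at` and `sample_idle` are assumptions. Refreshing every second catches the busy samples, while a 5-second refresh that happens to land between bursts never sees them.

```python
from __future__ import annotations


def idle_at(t_s: float, period_s: float = 10.0, burst_s: float = 2.0,
            busy_idle: int = 10, quiet_idle: int = 95) -> int:
    """Hypothetical idle % at time t_s for a stream that fetches one chunk every period_s.

    The router is busy (busy_idle % idle) for the first burst_s seconds of each
    period and nearly idle (quiet_idle %) for the rest.
    """
    return busy_idle if (t_s % period_s) < burst_s else quiet_idle


def sample_idle(duration_s: float, interval_s: float, start_s: float = 0.0) -> list[int]:
    """Idle readings a top-like tool would show when refreshing every interval_s."""
    n = int(duration_s / interval_s)
    return [idle_at(start_s + i * interval_s) for i in range(n)]


if __name__ == "__main__":
    every_second = sample_idle(20, 1.0, start_s=0.5)
    every_five = sample_idle(20, 5.0, start_s=3.0)
    print("1 s refresh, lowest idle:", min(every_second))  # 10
    print("5 s refresh, lowest idle:", min(every_five))    # 95
```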

Okay, here's the lowest it got. There were also a few around the 80% mark.

Mem: 45256K used, 80108K free, 1052K shrd, 4832K buff, 15200K cached
CPU:   0% usr   3% sys   0% nic  61% idle   0% io   0% irq  33% sirq
Load average: 0.00 0.05 0.02 1/58 8013
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 2111     1 root     S     3652   3%   4% /usr/sbin/collectd -f
 8012  8004 root     R     1188   1%   1% top -d 1
 8003  1355 root     S     1148   1%   1% /usr/sbin/dropbear -F -P /var/run/dro
  101     2 root     SW       0   0%   1% [kworker/0:1]
 1703     1 root     S     3772   3%   0% /usr/sbin/uhttpd -f -h /www -r LEDE -
 1726     1 root     S     2776   2%   0% /usr/sbin/vsftpd
 2070     1 root     S     1892   2%   0% /usr/sbin/hostapd -s -P /var/run/wifi
 2089     1 root     S     1892   2%   0% /usr/sbin/hostapd -s -P /var/run/wifi
 1239     1 root     S     1704   1%   0% /sbin/netifd
 1855     1 root     S     1560   1%   0% /usr/sbin/nlbwmon -o /var/lib/nlbwmon
    1     0 root     S     1536   1%   0% /sbin/procd
 1168     1 root     S     1440   1%   0% /sbin/rpcd
 1306     1 root     S     1416   1%   0% /usr/sbin/odhcpd
 2505     1 root     S     1320   1%   0% /usr/sbin/miniupnpd -f /var/etc/miniu
 1159     1 root     S     1224   1%   0% /sbin/logd -S 64
 2662     1 dnsmasq  S     1216   1%   0% /usr/sbin/dnsmasq -C /var/etc/dnsmasq
 1999  1239 root     S     1204   1%   0% /usr/sbin/pppd nodetach ipparam wan i
 2243     1 root     S <   1192   1%   0% /usr/sbin/ntpd -n -N -S /usr/sbin/ntp
 8004  8003 root     S     1192   1%   0% -ash
  493     1 root     S     1180   1%   0% /sbin/ubusd

OK, that level of load doesn't concern me much, so I don't think lack of CPU power is likely to be the problem. I assume SQM was running here; if not, definitely redo it with SQM running.

Your data from the earlier posts above, taken together, suggests that under load there is congestion in the ISP's equipment. You're on DSL, right? Is the modem a bridge-mode device? Is 172.16.11.200 the modem, or is it a gateway at the FTTC cabinet?