TP-Link WR1043ND v3 SQM Problems

Hello,

I recently tried LEDE 17.01.1 on my TP-Link WR1043ND v3. It seemed to be working fine until I installed luci-app-sqm on it. When I configured SQM with cake and the piece_of_cake script along with my bandwidth parameters, at first glance it seemed to work great and corrected bufferbloat. However, herein lies my problem.

When I was using my PlayStation 4 to download a game, it was downloading at around my maximum download speed. I decided to run a speed test on my smartphone to see if QoS would kick in and allow my phone to get a decent speed during the download. While doing this, my phone averaged only 4 Mbit/s, while my PlayStation 4 still got around 190 Mbit/s. When I went to run the test again, my phone took forever to connect to the server, and at that time my PlayStation's download speed dropped to around 5-20 Mbit/s. Shortly after, my connection to the WAN dropped out completely. I retested it, and it seems to be an issue whenever multiple devices are trying to use bandwidth, because I also tried with my laptop connected via Ethernet.

This problem has happened not only on LEDE 17.01.1 with luci-app-sqm installed, but also on OpenWrt 15.05.1 (v2 flashed) with luci-app-qos using fq_codel and qos-scripts. Is there anything special I need to do to fix it? I read somewhere that kmod-ipt-queue should be removed. Will that fix this issue, or is there something more I need to do?

Any help would be greatly appreciated. Thank you in advance.

https://forum.openwrt.org/t/lan-stops-working-every-now-and-then/

Does this sound similar to what's happening to you?
Also what is your Internet speed?

No, I am able to access the LAN, and if I reboot or power cycle the router it is unable to acquire a DHCP lease from my ISP until I power cycle my modem (Arris SB6183). It seems as if my modem goes haywire for some reason. My speed is 215 Mbps down and 22 Mbps up. I set the ingress to 226000 for downloads and reach about 200 Mbps on a speed test, and the egress to 21000 and reach around 20 Mbps. Under Link Layer I've tried None and Ethernet with Overhead. Ethernet with Overhead has a single value for me, which I've tried at 18 and 28. The value of 28 seemed to last the longest when attempting to use another device on the LAN and WLAN; however, it eventually resulted in the same problem.

I am not sure how you are able to get 200 Mbps on a WR1043ND v3, since the Archer C7, which has the same CPU, can only do 120-130 Mbps with SQM and cake.

Are you sure SQM is actually working for you?

SQM appears to be working, because there is a huge difference when I check my bufferbloat with it enabled and with it disabled. Using fq_codel and simple-qos with SQM, however, I am only able to achieve 165-170 Mbps.

You probably don't understand what SQM does. It improves latency under load, but you are not supposed to get a good speed test result while you are downloading something heavy.

Well, I don't know the inner workings of it, but it shouldn't be crashing my WAN while I attempt a speed test or do another activity on a different device while downloading something heavy. I do understand the difference between low and high latency, and I know that it gives lower latency and less bufferbloat with it enabled, as shown by my tests on dslreports.com. I've read the documentation provided by LEDE and OpenWrt pertaining to SQM and QoS. However, I am not an expert on the matter by any means.

Have you tried ifup wan or service network restart when this happens to you?

I have tried restarting the WAN via LuCI, but have not shelled in to run those commands. It still failed to obtain an IP address.

The part about the IP not being obtained on the WAN port sounds similar to what I've been having on the v4, but in my case the whole network on the switch is dead.

You should try running those commands the next time something like this happens to you.

I've flashed back to the stock firmware for now. I'll try LEDE tomorrow and run those commands if it happens, and also run 'tc' to make sure cake is actually working, since, as you suggested, given the speed it might not be. I will let you know the results. Thank you for your help and suggestions thus far.

Well run:
tc -s qdisc
and:
cat /etc/config/sqm
and post the output here, please.

moeller0,

Here is the output of 'tc -s qdisc' and 'cat /etc/config/sqm'.

---- OUTPUT: tc -s qdisc ----

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 800a: dev eth0 root refcnt 2 bandwidth 21Mbit besteffort triple-isolate rtt 100.0ms raw
Sent 25588 bytes 183 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 2464b of 4Mb
capacity estimate: 21Mbit
Tin 0
thresh 21Mbit
target 5.0ms
interval 100.0ms
pk_delay 12us
av_delay 4us
sp_delay 4us
pkts 183
bytes 25588
way_inds 0
way_miss 40
way_cols 0
drops 0
marks 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 386

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 6924 bytes 114 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 4074400 bytes 4511 pkt (dropped 0, overlimits 0 requeues 3)
backlog 0b 0p requeues 3
maxpacket 1514 drop_overlimit 0 new_flow_count 3 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 800b: dev ifb4eth0 root refcnt 2 bandwidth 226Mbit besteffort triple-isolate wash rtt 100.0ms raw
Sent 11712 bytes 114 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 1984b of 11300000b
capacity estimate: 226Mbit
Tin 0
thresh 226Mbit
target 5.0ms
interval 100.0ms
pk_delay 13us
av_delay 2us
sp_delay 2us
pkts 114
bytes 11712
way_inds 0
way_miss 40
way_cols 0
drops 0
marks 0
sp_flows 1
bk_flows 0
un_flows 0
max_len 466

---- END OUTPUT ----

---- OUTPUT cat /etc/config/sqm ----

config queue 'eth1'
option qdisc_advanced '0'
option interface 'eth0'
option debug_logging '0'
option verbosity '5'
option enabled '1'
option download '226000'
option upload '21000'
option qdisc 'cake'
option script 'piece_of_cake.qos'
option linklayer 'ethernet'
option overhead '28'

---- END OUTPUT ----

r43k3n,

I ran the 'ifup wan' and 'service network restart' commands after it happened again. Neither worked. I rebooted my router after disabling sqm and the sqm instance and ran the commands again. Neither worked. After power cycling my cable modem I was able to acquire a DHCP lease from my ISP. I can confirm that I do not have this issue with SQM disabled.

Okay, so these statistics are from shortly after boot, without any real traffic having gone over the link. That in itself is great, as we need these for reference, but could you redo your test and collect the output of "tc -s qdisc" from before and after? It would be great if you could use the dslreports speedtest (see https://forum.openwrt.org/t/sqm-qos-recommended-settings-for-the-dslreports-speedtest-bufferbloat-testing/2803 for configuration notes). You are also pretty much using the default configuration; have a look at https://lede-project.org/docs/howto/sqm, especially the "making cake sing and dance" section, to get more mileage out of cake. It would also be great if you could monitor the router's load while running your test (in the router's CLI use "top -d 1" and look at the idle value; if this is too close to zero, expect bumpy performance...).
Finally, could you summarize how many machines you use and how they are connected to the router?
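
To make the before/after comparison easier to read, the counters can be boiled down to one line per qdisc. This is only a rough sketch: the function name summarize_qdisc is made up for illustration, and it assumes the "qdisc ..." and "Sent ..." line layout seen in the outputs posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of sqm-scripts): condense "tc -s qdisc"
# output into one line per qdisc. Assumes the layout shown in this thread:
#   qdisc <kind> <handle> dev <iface> ...
#   Sent <bytes> bytes <pkts> pkt (dropped <n>, overlimits <n> requeues <n>)
summarize_qdisc() {
    awk '
        $1 == "qdisc" {
            kind = $2
            dev = "?"
            # the interface name follows the literal word "dev"
            for (i = 3; i < NF; i++) if ($i == "dev") dev = $(i + 1)
        }
        $1 == "Sent" {
            dropped = $7
            gsub(/[^0-9]/, "", dropped)   # strip the trailing comma
            printf "%s %s sent_bytes=%s pkts=%s dropped=%s overlimits=%s\n",
                   dev, kind, $2, $4, dropped, $9
        }
    '
}
```

On the router this would be used as "tc -s qdisc | summarize_qdisc", once before and once after the test, so the two summaries can be compared side by side.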

The reconnect issue is caused by the ISP's CMTS server. Try this in
/etc/config/network:

...
config interface 'wan'
	...
	option release '1'
	option clientid '01:your wan mac address' ## optional
	option vendorid 'lede udhcp' ## optional
...

moeller0,

I have 9 devices in total. 5 devices are on wireless, 2 are connected all the time (printer and smart phone). 1 is rarely used. 2 are connected most of the time (Laptop and TV). For the other 4 devices 2 of them are always connected to the LAN via ethernet via moca (Tivo and Tivo Mini), using 1 LAN port. My PS4 and PC are also connected to the router via ethernet (most of the time if the PS4 is on the PC is off). My laptop also sometimes is plugged in to the router directly.

Here is the output you asked for:
---- OUTPUT tc -s qdisc (reboot) ----

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8007: dev eth0 root refcnt 2 bandwidth 21Mbit besteffort triple-isolate rtt 100.0ms raw
Sent 37381 bytes 174 pkt (dropped 0, overlimits 6 requeues 0)
backlog 0b 0p requeues 0
memory used: 4288b of 4Mb
capacity estimate: 21Mbit
Tin 0
thresh 21Mbit
target 5.0ms
interval 100.0ms
pk_delay 352us
av_delay 14us
sp_delay 8us
pkts 174
bytes 37381
way_inds 0
way_miss 43
way_cols 0
drops 0
marks 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 1542

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 27693 bytes 103 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 138113 bytes 389 pkt (dropped 0, overlimits 0 requeues 2)
backlog 0b 0p requeues 2
maxpacket 505 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8008: dev ifb4eth0 root refcnt 2 bandwidth 226Mbit besteffort triple-isolate wash rtt 100.0ms raw
Sent 32019 bytes 103 pkt (dropped 0, overlimits 1 requeues 0)
backlog 0b 0p requeues 0
memory used: 1984b of 11300000b
capacity estimate: 226Mbit
Tin 0
thresh 226Mbit
target 5.0ms
interval 100.0ms
pk_delay 18us
av_delay 3us
sp_delay 2us
pkts 103
bytes 32019
way_inds 0
way_miss 38
way_cols 0
drops 0
marks 0
sp_flows 1
bk_flows 1
un_flows 0
max_len 1542


'top -d 1' 94-97% idle

---- END OUTPUT ----

---- OUTPUT tc -s qdisc (after dslreports.com test) ----
---- 16 download stream, 16 upload stream, hi-res bufferbloat, dodge compression ----

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8007: dev eth0 root refcnt 2 bandwidth 21Mbit besteffort triple-isolate rtt 100.0ms raw
Sent 56872912 bytes 154146 pkt (dropped 813, overlimits 144530 requeues 0)
backlog 0b 0p requeues 0
memory used: 77376b of 4Mb
capacity estimate: 21Mbit
Tin 0
thresh 21Mbit
target 5.0ms
interval 100.0ms
pk_delay 7.9ms
av_delay 3.7ms
sp_delay 8us
pkts 154959
bytes 58126558
way_inds 0
way_miss 371
way_cols 0
drops 813
marks 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 1542

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 337837364 bytes 248503 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 342097092 bytes 252586 pkt (dropped 0, overlimits 0 requeues 3)
backlog 0b 0p requeues 3
maxpacket 1514 drop_overlimit 0 new_flow_count 4 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8008: dev ifb4eth0 root refcnt 2 bandwidth 226Mbit besteffort triple-isolate wash rtt 100.0ms raw
Sent 348097160 bytes 248388 pkt (dropped 115, overlimits 158050 requeues 0)
backlog 0b 0p requeues 0
memory used: 1567360b of 11300000b
capacity estimate: 226Mbit
Tin 0
thresh 226Mbit
target 5.0ms
interval 100.0ms
pk_delay 15us
av_delay 5us
sp_delay 3us
pkts 248503
bytes 348274490
way_inds 0
way_miss 366
way_cols 0
drops 115
marks 0
sp_flows 1
bk_flows 2
un_flows 0
max_len 1542


'top -d 1' 1-12% idle during download, 70-80% idle during upload

192.8 Mbps down, 19.85 Mbps up, Overall A+, Bufferbloat A, Quality A+

---- END OUTPUT ----

---- OUTPUT tc -s qdisc (WAN crash) ----

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8007: dev eth0 root refcnt 2 bandwidth 21Mbit besteffort triple-isolate rtt 100.0ms raw
Sent 288611697 bytes 820498 pkt (dropped 2808, overlimits 698920 requeues 21)
backlog 0b 0p requeues 21
memory used: 988064b of 4Mb
capacity estimate: 21Mbit
Tin 0
thresh 21Mbit
target 5.0ms
interval 100.0ms
pk_delay 108us
av_delay 11us
sp_delay 3us
pkts 823306
bytes 292867293
way_inds 4025
way_miss 1626
way_cols 0
drops 2808
marks 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 1542

qdisc ingress ffff: dev eth0 parent ffff:fff1 ----------------
Sent 1898879217 bytes 1387351 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 1530382237 bytes 1018418 pkt (dropped 0, overlimits 0 requeues 4)
backlog 0b 0p requeues 4
maxpacket 1514 drop_overlimit 0 new_flow_count 5 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8008: dev ifb4eth0 root refcnt 2 bandwidth 226Mbit besteffort triple-isolate wash rtt 100.0ms raw
Sent 1957061639 bytes 1387294 pkt (dropped 57, overlimits 871408 requeues 0)
backlog 0b 0p requeues 0
memory used: 1402688b of 11300000b
capacity estimate: 226Mbit
Tin 0
thresh 226Mbit
target 5.0ms
interval 100.0ms
pk_delay 284us
av_delay 117us
sp_delay 7us
pkts 1387351
bytes 1957147959
way_inds 4923
way_miss 1663
way_cols 0
drops 57
marks 0
sp_flows 1
bk_flows 1
un_flows 0
max_len 1542


'top -d 1' 12-35% idle, dropped to 0% at one point

---- END OUTPUT ----

trismo,

I added option release '1' to /etc/config/network and committed it before I ran the test that causes the WAN crash again. It did not resolve the issue. I ran ifup wan and service network restart after it crashed and again after a reboot of the router. I still had to power cycle the cable modem.

Thanks. So in your test you mainly see unexpected unfairness in sharing between the WLAN phone and the wired PS4, correct?

Sidenote, this indicates that you overspecify the overhead (not your fault), but that is never going to increase the bufferbloat, so let's ignore this for now.

So while the router is not doing anything you have plenty of CPU cycles to spare, good.

But once you exercise your download you are getting dangerously close to running out of CPU (given that top only samples every second), which might explain some issues at the high end. This matters especially since WiFi/WLAN, if used, will also suck up CPU cycles, and (assuming you did the speedtest without using the phone) even without WLAN you are too close to 0% idle for comfort.

Now I would like to propose the following, not as a solution but as a procedure to get to a solution:
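
Since a one-second sample is easy to miss by eye, the lowest idle value over a test run can be pulled out of top's batch output. A minimal sketch, assuming BusyBox top prints a summary line of the form "CPU: ... nic NN% idle ..." (I have not checked this against your exact build); min_idle is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print the lowest idle percentage found in top output.
# Assumes summary lines that start with "CPU:" and contain "<NN>% idle".
min_idle() {
    awk '
        $1 == "CPU:" {
            for (i = 2; i <= NF; i++) {
                if ($i == "idle") {
                    v = $(i - 1)
                    sub(/%/, "", v)   # "12%" -> "12"
                    v += 0            # force numeric comparison
                    if (!seen || v < min) { min = v; seen = 1 }
                }
            }
        }
        END { if (seen) print min }
    '
}
```

Something like "top -b -d 1 -n 60 | min_idle", started just before the speedtest, should then report the worst second of the run, provided your top supports batch mode.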

  1. Set the ingress shaper to 113Mbps (and the egress shaper to 10Mbps), to make sure your router does not run out of CPU cycles and that you do not bump against your ISP's ceiling shaper for your line (the goal is to increase this again later), and repeat your measurement (you could also post a link to your speedtest result if you feel like it). Also repeat your actual fairness test with phone and PS4.

  2. Tweak the cake configuration according to https://lede-project.org/docs/howto/sqm#making_cake_sing_and_dance_on_a_tight_rope_without_a_safety_net; especially adding the "nat" keyword to the exposed options should help. Also repeat your actual fairness test with phone and PS4, as well as a dslreports speedtest.

With a bit of luck the nat keyword together with the explicit direction codes dual-srchost and dual-dsthost (see LEDE's SQM howto for configuration details) will recover some fairness.

  3. If this works, iteratively increase the upload/egress shaper bandwidth until bufferbloat gets larger than you want and/or unfairness between PS4 and the phone gets unacceptable.

  4. If this works, iteratively increase the download/ingress shaper bandwidth until bufferbloat gets larger than you want and/or unfairness between PS4 and the phone gets unacceptable.
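
The first two steps above can be sketched as the following commands. This is a sketch under assumptions, not a tested recipe: it assumes your SQM section is still named 'eth1' as in the /etc/config/sqm you posted, and that the advanced option strings iqdisc_opts and eqdisc_opts are available in your sqm-scripts version. The helper names scale_rate and sqm_commands are made up, and the script only prints the uci commands so you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helpers for the test procedure; nothing here touches the
# router configuration, it only prints commands for review.

# Print <percent> of a rate given in kbit/s (integer arithmetic).
scale_rate() {
    echo $(( $1 * $2 / 100 ))
}

# $1: download kbit/s, $2: upload kbit/s, $3: SQM section (assumed 'eth1')
sqm_commands() (
    section=${3:-eth1}
    cat <<EOF
uci set sqm.$section.download='$1'
uci set sqm.$section.upload='$2'
uci set sqm.$section.qdisc_advanced='1'
uci set sqm.$section.qdisc_really_really_advanced='1'
uci set sqm.$section.iqdisc_opts='nat dual-dsthost'
uci set sqm.$section.eqdisc_opts='nat dual-srchost'
uci commit sqm
/etc/init.d/sqm restart
EOF
)

# Step 1: half of the current 226000 ingress rate, and 10000 for egress.
sqm_commands "$(scale_rate 226000 50)" 10000
```

For steps 3 and 4 the same helper can be called with larger percentages, for example "scale_rate 226000 60", but when to stop still has to be judged from the measured bufferbloat and fairness after each round.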

This should make sure your sqm configuration is sane, but unless your crashes are caused by running out of CPU cycles this will not necessarily fix your WAN losses.

After a loss, have a look at the output of:
dmesg
and of:
logread

You might want to share the output here in the forum, but be sure to read what you post, as both (especially the logread output) might contain information you should anonymize before posting for the world to see...
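
As a starting point for the anonymizing, a small filter like the one below can mask the obvious identifiers. It is a rough sketch: anonymize_log is a made-up name, it assumes a sed with -E support, and it only catches colon-separated MAC-style strings and dotted IPv4 addresses. IPv6 addresses, hostnames, and SSIDs still need a manual read-through.

```shell
#!/bin/sh
# Hypothetical filter: mask MAC-style strings and IPv4 addresses in log text.
anonymize_log() {
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5,}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g'
}
```

On the router this would be used as "logread | anonymize_log" or "dmesg | anonymize_log". The "{5,}" repetition is there so that a 7-byte client ID (01: followed by the MAC address) is masked as a whole instead of leaking its last byte.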

Hope this helps.

A crash or a router freeze?
Please try it again with option release '1' and option clientid '01:A4:2B:B0:DE:44:EA' (01: followed by the MAC address of the WAN port).

Run "root@lede:# ifdown wan", then please restart the cable modem.
Once the cable modem is back online, run:
ifup wan
ifdown wan
ifup wan

Then please post the system log output (over SSH, use logread). I want to see the real problem / error here.