SQM autorate-ingress: Can I set thresholds for this?

Progress!

2> sqmfeedback:main().
ping one.one.one.one with results: []
ping facebook.com with results: []
ping quad9.net with results: []
ping dns.google.com with results: []
ping gstatic.com with results: []
ping cloudflare.com with results: []
ping fbcdn.com with results: []
ping amazon.com with results: []
ping akamai.com with results: []
ping quad9.net with results: []
=ERROR REPORT==== 22-Feb-2020::16:53:37.275274 ===
Error in process <0.88.0> with exit value:
{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,228}]},
                  {sqmfeedback,monitor_a_site,5,
                               [{file,"sqmfeedback.erl"},{line,26}]}]}

ping akamai.com with results: []
=ERROR REPORT==== 22-Feb-2020::16:53:38.330320 ===
Error in process <0.93.0> with exit value:
{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,228}]},
                  {sqmfeedback,monitor_a_site,5,
                               [{file,"sqmfeedback.erl"},{line,26}]}]}

ping cloudflare.com with results: []
=ERROR REPORT==== 22-Feb-2020::16:53:47.973170 ===
Error in process <0.91.0> with exit value:
{function_clause,[{lists,last,[[]],[{file,"lists.erl"},{line,228}]},
                  {sqmfeedback,monitor_a_site,5,
                               [{file,"sqmfeedback.erl"},{line,26}]}]}

and so on...

Aha, well, I'm guessing the OpenWrt ping has a different output format than the code expects... hold on... yeah, the built-in ping doesn't accept fractional seconds for the interval. Try installing the package "iputils-ping".


okay, that fixed the ping

1> c(sqmfeedback).
sqmfeedback.erl:46: Warning: variable 'Time' is unused
sqmfeedback.erl:74: Warning: variable 'T' is unused
sqmfeedback.erl:80: Warning: variable 'SitePids' is unused
sqmfeedback.erl:83: Warning: variable 'TimerPid' is unused
{ok,sqmfeedback}
2> sqmfeedback:main().
ping facebook.com with results: [40.8,41.1,41.2,41.3,41.3]
ping cloudflare.com with results: [40.9,41.0,41.1,41.2,41.6]
ping fbcdn.com with results: [40.6,40.7,40.9,41.1,41.6]
ping dns.google.com with results: [41.0,41.2,41.3,41.5,41.6]
ping one.one.one.one with results: [40.9,41.3,41.3,41.4,41.5]
ping gstatic.com with results: [41.1,41.3,41.4,41.9,42.6]
ping akamai.com with results: [42.2,42.2,42.3,42.5,43.0]
ping amazon.com with results: [41.1,41.4,41.5,41.6,41.7]
ping quad9.net with results: [99.5,99.6,99.7,99.8,99.8]
ping cloudflare.com with results: [40.9,41.3,41.3,41.4,41.4]
ping dns.google.com with results: [40.9,41.1,41.3,41.4,41.7]
ping akamai.com with results: [42.3,42.5,42.6,42.9,43.5]
ping amazon.com with results: [41.5,41.7,41.8,41.8,42.5]
ping gstatic.com with results: [41.1,41.3,41.4,41.4,41.6]
ping one.one.one.one with results: [40.9,41.0,41.3,41.4,42.4]
=ERROR REPORT==== 22-Feb-2020::17:00:09.740859 ===
Error in process <0.84.0> with exit value:
{badarg,[{erlang,binary_to_float,[<<"100">>],[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

Checking up on things: 1582408810
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
=ERROR REPORT==== 22-Feb-2020::17:00:10.747223 ===
Error in process <0.91.0> with exit value:
{badarg,[{io_lib,format, 
                 ["tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",
                  [2202.7723853314915]],
                 [{file,"io_lib.erl"},{line,183}]},
         {sqmfeedback,new_bandwidth,2,[{file,"sqmfeedback.erl"},{line,18}]},
         {sqmfeedback,adjuster,1,[{file,"sqmfeedback.erl"},{line,69}]}]}

Weird, it won't convert "100" to a float because it doesn't have a decimal point... haha, I hadn't hit that issue yet. Ok, thanks. I'll work on that next; it should be a relatively quick fix: add some error handling, and if the float conversion fails, convert to an integer and then convert the integer to a float.
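A minimal sketch of that fallback, assuming each ping time arrives as a binary such as <<"41.3">> or <<"100">> (as in the badarg report). The module and function names are hypothetical, not part of sqmfeedback.erl:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(ping_parse).
-export([to_ms/1]).

%% Convert a ping time such as <<"41.3">> or <<"100">> to a float.
%% binary_to_float/1 raises badarg on <<"100">> (no decimal point),
%% so fall back to binary_to_integer/1 and convert that to a float.
to_ms(Bin) when is_binary(Bin) ->
    try
        binary_to_float(Bin)
    catch
        error:badarg -> float(binary_to_integer(Bin))
    end.
```

With this, `ping_parse:to_ms(<<"100">>)` returns `100.0` instead of raising badarg. A token that is not a number at all still raises, because neither conversion accepts it.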

What architecture are you running on? I'm going to file a bug report against Erlang, and it'd be useful to say which package/arch worked vs. didn't.

Well, you're a lot closer xD

	"kernel": "4.14.167",
	"hostname": "Main_SQM",
	"system": "Broadcom BCM4716",
	"model": "Linksys E3000 V1",
	"board_name": "0x04cf:42",
	"release": {
		"distribution": "OpenWrt",
		"version": "19.07.1",
		"revision": "r10911-c155900f66",
		"target": "brcm47xx/generic",
		"description": "OpenWrt 19.07.1 r10911-c155900f66"

Ok, I pushed a small fix:
https://raw.githubusercontent.com/dlakelan/routerperf/master/sqmfeedback.erl

Should that have fixed it? I see it still isn't filling in the bandwidth for "~BKbit" either... I must still be missing a dependency.


> sqmfeedback:main().
ping cloudflare.com with results: [42.2,42.8,42.8,43.5,44.7]
ping one.one.one.one with results: [42.1,42.3,42.6,43.0,44.5]
ping facebook.com with results: [41.6,42.1,42.3,43.0,43.9]
ping dns.google.com with results: [42.2,42.8,42.9,43.0,44.0]
ping amazon.com with results: [42.3,43.1,43.9,44.0,44.6]
ping fbcdn.com with results: [42.4,42.5,42.8,44.5,46.9]
ping gstatic.com with results: [42.3,42.6,42.8,44.4,44.5]
ping akamai.com with results: [70.8,71.2,71.6,77.9,81.3]
ping quad9.net with results: [100.0,101.0,101.0,102.0,102.0]
ping dns.google.com with results: [41.5,44.5,48.8,60.4,197.0]
ping fbcdn.com with results: [60.6,117.0,117.0,148.0,157.0]
ping cloudflare.com with results: [57.4,66.8,82.6,130.0,153.0]
ping one.one.one.one with results: [106.0,160.0,183.0,195.0,210.0]
ping gstatic.com with results: [42.2,43.8,47.9,54.4,184.0]
ping quad9.net with results: [196.0,210.0,214.0,222.0,304.0]
=ERROR REPORT==== 22-Feb-2020::19:01:14.434301 ===
Error in process <0.85.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

ping akamai.com with results: [70.9,73.8,94.9,99.1,150.0]
ping amazon.com with results: [41.0,42.2,42.2,42.8,43.1]
ping dns.google.com with results: [41.5,42.0,42.4,42.8,43.1]
Checking up on things: 1582416079
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
=ERROR REPORT==== 22-Feb-2020::19:01:19.882657 ===
Error in process <0.91.0> with exit value:
{badarg,[{io,format,
             [<0.63.0>,
              "tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",
              [2314.56394781418]],
             []},
         {sqmfeedback,adjuster,1,[{file,"sqmfeedback.erl"},{line,69}]}]}

ping one.one.one.one with results: [146.0,164.0,176.0,179.0,224.0]
ping gstatic.com with results: [63.5,163.0,164.0,178.0,192.0]
=ERROR REPORT==== 22-Feb-2020::19:01:29.418177 ===
Error in process <0.89.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

It's still running the old version, you have to recompile.

c(sqmfeedback).
sqmfeedback:main().

Yeah, I did that, so I have got to be missing something :confused:

2> sqmfeedback:main().
ping akamai.com with results: [194.0,195.0,223.0,241.0,242.0]
ping facebook.com with results: [184.0,195.0,230.0,232.0,239.0]
ping one.one.one.one with results: [186.0,198.0,231.0,236.0,250.0]
=ERROR REPORT==== 22-Feb-2020::19:28:19.309114 ===
Error in process <0.82.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,1,[{file,"sqmfeedback.erl"},{line,37}]}]}

=ERROR REPORT==== 22-Feb-2020::19:28:19.529679 ===
Error in process <0.88.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,1,[{file,"sqmfeedback.erl"},{line,37}]}]}

=ERROR REPORT==== 22-Feb-2020::19:28:19.569591 ===
Error in process <0.86.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,1,[{file,"sqmfeedback.erl"},{line,37}]}]}

ping cloudflare.com with results: [44.9,81.1,118.0,145.0,192.0]
ping amazon.com with results: [43.6,92.2,114.0,141.0,206.0]
ping quad9.net with results: [216.0,232.0,250.0,256.0,273.0]
ping facebook.com with results: [67.2,191.0,236.0,239.0,254.0]

I'm guessing some of the sites aren't responding to pings, so instead of a time it's getting some other garbage... and then that thread dies. Let it die; the other threads keep working. Let it run for a while and see what happens with just the "good" sites. I'll work on more error handling later. :wink:
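If the "garbage" is a placeholder token such as "-" for a missing reply (which is what the binary_to_integer badarg reports suggest), one way to keep the thread alive is to drop unparseable tokens instead of crashing. A sketch under that assumption; ping_filter and parse_times/1 are hypothetical names, and the tokens are assumed to be binaries:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(ping_filter).
-export([parse_times/1]).

%% Parse a list of ping-time tokens, silently dropping anything that is
%% not a number (e.g. <<"-">> for a lost reply) instead of crashing.
parse_times(Tokens) ->
    %% The {ok, T} pattern in the generator skips every 'error' result.
    [T || {ok, T} <- [parse_one(B) || B <- Tokens]].

%% Try float first, then integer; report 'error' if neither works.
parse_one(Bin) ->
    try
        {ok, binary_to_float(Bin)}
    catch
        error:badarg ->
            try
                {ok, float(binary_to_integer(Bin))}
            catch
                error:badarg -> error
            end
    end.
```

If every reply is lost the result is an empty list, so the caller still has to check for that before calling lists:last/1, which is what crashed with function_clause at the start of the thread.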

Will do. It seems I may have removed a necessary space when I changed the bandwidth values; I'm checking to make sure everything matches what you have.

=ERROR REPORT==== 22-Feb-2020::19:39:57.895071 ===
Error in process <0.88.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

yes, constant pings, and this error now

So close! xD

 90    bandwidth ~BKbit dual-srchost overhead 34 ", 900, 999, 1024}]
                             removed the space  ^
 91	  al-dsthost nat overhead 34 ingress",   1024, 3400, 6944},

Here's my only issue now

ping gstatic.com with results: [42.4,42.7,42.8,43.1,44.0]
ping cloudflare.com with results: [41.0,41.7,41.9,41.9,42.1]
ping facebook.com with results: [42.9,43.1,43.4,44.6,45.3]
Checking up on things: 1582420875
=ERROR REPORT==== 22-Feb-2020::20:21:16.004000 ===
Error in process <0.91.0> with exit value:
{badarg,[{io,format,
             [<0.63.0>,
              "tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit dual-srchost overhead 34",
              [3629.0778505877756]],
             []},
         {sqmfeedback,adjuster,1,[{file,"sqmfeedback.erl"},{line,68}]}]}

Pushed a change to round that float before printing it as an integer... try it out.
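The `~B` control sequence in io_lib:format/2 only accepts integers, which is why passing 2202.7723853314915 raised badarg. A sketch of rounding first; tc_cmd and change_cmd/2 are hypothetical names, not the actual patch:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(tc_cmd).
-export([change_cmd/2]).

%% ~B requires an integer, so round the computed bandwidth (a float)
%% to the nearest integer before formatting the tc command string.
change_cmd(Format, KbitFloat) ->
    lists:flatten(io_lib:format(Format, [round(KbitFloat)])).
```

round/1 goes to the nearest integer, so 2202.77 becomes 2203; trunc/1 would be the choice if you'd rather always round down.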

haha well, that last part was me being a dumbass... got things squared away now:


    monitor_ifaces([{"tc qdisc change root dev pppoe-wan cake bandwidth ~BKbit diffserv4 dual-srchost overhead 34 ", 900, 1000, 1024},
		    {"tc qdisc change root dev ifb4pppoe-wan cake bandwidth ~BKbit diffserv4 dual-dsthost nat overhead 34 ingress",1024, 2000,6944}],
    ["dns.google.com","one.one.one.one","quad9.net","facebook.com",
     "gstatic.com","cloudflare.com","fbcdn.com","akamai.com","amazon.com"]),
    receive
	true -> true
    end.

I'm testing after having let it run for a while, and I don't see it adjusting down after I start a download, even after several checks with high pings. Is there a way we can dial this in a bit?

I'm seeing pings in the 200s all the way down, and errors when packets are lost, but you truly are a boss xD. Once this is dialed in, it's going to help so many people with crappy DSL connections...

ping one.one.one.one with results: [173.0,179.0,186.0,200.0,212.0]
ping facebook.com with results: [177.0,181.0,185.0,200.0,212.0]
ping akamai.com with results: [171.0,194.0,213.0,227.0,242.0]
ping gstatic.com with results: [192.0,209.0,233.0,241.0,248.0]
Checking up on things: 1582424484
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 3691Kbit diffserv4 dual-dsthost nat overhead 34 ingress
ping cloudflare.com with results: [196.0,218.0,219.0,251.0,257.0]
ping amazon.com with results: [204.0,215.0,216.0,228.0,281.0]
ping one.one.one.one with results: [164.0,173.0,181.0,187.0,193.0]
ping quad9.net with results: [218.0,230.0,245.0,256.0,261.0]
ping dns.google.com with results: [111.0,122.0,160.0,180.0,187.0]
ping amazon.com with results: [128.0,134.0,135.0,139.0,144.0]
ping akamai.com with results: [126.0,137.0,138.0,142.0,143.0]
ping cloudflare.com with results: [148.0,152.0,157.0,163.0,175.0]
ping one.one.one.one with results: [101.0,150.0,151.0,155.0,174.0]
ping gstatic.com with results: [88.4,93.3,100.0,116.0,150.0]
ping facebook.com with results: [76.8,92.2,107.0,114.0,157.0]
ping quad9.net with results: [139.0,153.0,181.0,182.0,192.0]
Checking up on things: 1582424514
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 4233Kbit diffserv4 dual-dsthost nat overhead 34 ingress
ping amazon.com with results: [122.0,140.0,148.0,148.0,158.0]
ping akamai.com with results: [89.1,101.0,134.0,139.0,142.0]

Edit: I checked, and I must have grabbed your latest script right as you updated it, because that's what I'm running.

Yeah, I need to do that error checking stuff so nothing dies... but for now let's just dial it in in the monitor_a_site function:

around line 21:

monitor_a_site(Rpid,Name,N,Inc,FiveTimes) ->
    T = rand:uniform()*20+10, % sleep 10 to 30 seconds
    timer:sleep(round(T*1000)),
    MS = pingtimes_ms(Name,N,50),
    Times = lists:sort(lists:append(FiveTimes,MS)),
    Delay= lists:last(Times) - lists:nth(5,Times),
    if Delay > 20.0 -> 
	    Rpid ! {delay,Name,Delay,erlang:system_time(seconds)};
       true -> 
	    true
    end,
    monitor_a_site(Rpid,Name,N,Inc,lists:sublist(Times,2,5)). % throw out the lowest, keep the next 5

Change the line where we calculate Delay:

Delay=lists:last(Times) - lists:nth(3,Times),

and also the line where we decide that there's a sufficient delay:

if Delay > 10.0 ->
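Those two tweaks, written as a pure function so the rank K and the threshold are easy to experiment with. This is a sketch, not the script's code; delay_check and its functions are hypothetical, and like monitor_a_site it assumes the sample list holds at least K values:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(delay_check).
-export([delay/2, delayed/3]).

%% Spread between the largest sample and the K-th smallest one.
%% K = 5 matches the original code; K = 3 is the suggested tighter version.
delay(Samples, K) ->
    Sorted = lists:sort(Samples),
    lists:last(Sorted) - lists:nth(K, Sorted).

%% True when that spread exceeds ThresholdMs (10.0 in the suggestion).
delayed(Samples, K, ThresholdMs) ->
    delay(Samples, K) > ThresholdMs.
```

Taken on its own, the batch [173.0,179.0,186.0,200.0,212.0] from earlier gives 212.0 - 186.0 = 26.0 ms with K = 3, which clears a 10 ms threshold. Lowering K compares the worst sample against a faster "baseline", so the check becomes more sensitive.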

It will only do the adjustment if there are 4 or more sites that are delayed... you could also dial that in a little tighter:

around line 50:

	    if length(RecentSites) >= 2 ->

Now, if 2 or more sites are delayed by more than 10 ms, it will adjust.
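A sketch of the site-count gate, assuming each {delay, Name, Delay, Time} message is reduced to a {Name, Time} pair and only reports inside a recent window count. The window length and the module name are hypothetical, and the real script may build RecentSites differently:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(site_gate).
-export([should_adjust/4]).

%% Reports: list of {SiteName, UnixSeconds}, one per delay message.
%% Adjust when at least MinSites distinct sites reported a delay
%% within the last WindowSec seconds.
should_adjust(Reports, Now, WindowSec, MinSites) ->
    Recent = lists:usort([Name || {Name, T} <- Reports, Now - T =< WindowSec]),
    length(Recent) >= MinSites.
```

Counting distinct names with lists:usort/1 means one site reporting twice doesn't satisfy the `>= 2` test by itself.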

tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 6944Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582446634
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 6944Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582446664
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 6944Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582446694
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 6944Kbit diffserv4 dual-dsthost nat overhead 34 ingress
client_loop: send disconnect: Broken pipe

I just let it run. I think my laptop went to sleep at some point last night, so that may have caused it to crash. I never did see it auto-adjust at any point, though. Everything looks normal in the router logs, so that's good.

The way I wrote it, it either adjusts up or adjusts down. Of course, once it hits the top it can't go any higher... so this indicates that it isn't detecting delays by the criteria I'm using.
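A sketch of that up-or-down rule with clamping, which shows why the ingress rate sat at 6944Kbit: when no delay is reported, every check proposes a higher rate and the clamp holds it at the maximum. It assumes the two numbers around the starting value in each monitor_ifaces tuple (e.g. 1024 and 6944) are the minimum and maximum; the step factors are hypothetical, not the script's actual increments:

```erlang
%% Hypothetical helper module; not part of sqmfeedback.erl.
-module(bw_adjust).
-export([next_bw/5]).

%% Delayed: result of the delay check for this round.
%% DownFactor < 1 shrinks the rate, UpFactor > 1 grows it (hypothetical values).
%% The proposal is then clamped to the range [Min, Max].
next_bw(Current, Min, Max, Delayed, {DownFactor, UpFactor}) ->
    Proposed = case Delayed of
                   true  -> Current * DownFactor;
                   false -> Current * UpFactor
               end,
    max(Min, min(Max, Proposed)).
```

The result can be a float, so it still needs round/1 before it is formatted with `~B`.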

Did you notice lags and delays? Or maybe it was actually smooth sailing during this period?

Also, I'm not seeing any ping stats in your listing; is that because all the pinger threads died?

ping amazon.com with results: [57.4,77.5,103.0,107.0,113.0]
Checking up on things: 1582429384
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 3511Kbit diffserv4 dual-dsthost nat overhead 34 ingress
=ERROR REPORT==== 22-Feb-2020::22:43:15.940285 ===
Error in process <0.84.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

ping amazon.com with results: [103.0,119.0,140.0,150.0,178.0]
ping amazon.com with results: [144.0,148.0,174.0,177.0,209.0]
Checking up on things: 1582429414
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 3586Kbit diffserv4 dual-dsthost nat overhead 34 ingress
=ERROR REPORT==== 22-Feb-2020::22:43:44.827828 ===
Error in process <0.85.0> with exit value:
{badarg,[{erlang,binary_to_integer,"-",[]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,'-pingtimes_ms/3-lc$^1/1-1-',1,
                      [{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,pingtimes_ms,3,[{file,"sqmfeedback.erl"},{line,11}]},
         {sqmfeedback,monitor_a_site,5,[{file,"sqmfeedback.erl"},{line,24}]}]}

Checking up on things: 1582429444
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 3669Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582429474
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 4057Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582429504
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 4172Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 1582429534
tc qdisc change root dev pppoe-wan cake bandwidth 1024Kbit diffserv4 dual-srchost overhead 34
tc qdisc change root dev ifb4pppoe-wan cake bandwidth 4449Kbit diffserv4 dual-dsthost nat overhead 34 ingress
Checking up on things: 15824

Here's the output from before and into the time it stopped displaying pings. Last night wasn't horrible bandwidth-wise, but I could definitely start a download and see latency in the 175-200 ms range without any changes happening.

Yep, it looks like all the ping threads died; eventually it was only pinging amazon, and then even that one died.

Ok, next step is obvious then, I need to do some error catching, and monitor and restart any dying threads. Thanks!
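For restarting dying threads, one plain-Erlang approach is spawn_monitor/1 plus a loop that respawns a pinger whenever its 'DOWN' message arrives. A sketch only: pinger_sup and WorkerFun (a fun(Site) that runs the ping loop for one site) are hypothetical, and in a full OTP application the supervisor behaviour would usually do this job:

```erlang
%% Hypothetical module; not part of sqmfeedback.erl.
-module(pinger_sup).
-export([start/2]).

%% Start a process that runs one monitored worker per site and
%% respawns a site's worker whenever it exits for any reason.
start(Sites, WorkerFun) ->
    spawn(fun() ->
                  %% Map of monitor reference => site name.
                  Refs = maps:from_list(
                           [spawn_worker(Site, WorkerFun) || Site <- Sites]),
                  loop(Refs, WorkerFun)
          end).

spawn_worker(Site, WorkerFun) ->
    {_Pid, Ref} = spawn_monitor(fun() -> WorkerFun(Site) end),
    {Ref, Site}.

loop(Refs, WorkerFun) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} ->
            case maps:take(Ref, Refs) of
                {Site, Rest} ->
                    io:format("pinger for ~s exited: ~p, restarting~n",
                              [Site, Reason]),
                    timer:sleep(1000),  % avoid a tight crash/restart loop
                    {NewRef, Site} = spawn_worker(Site, WorkerFun),
                    loop(maps:put(NewRef, Site, Rest), WorkerFun);
                error ->
                    %% Not one of our workers; ignore it.
                    loop(Refs, WorkerFun)
            end
    end.
```

The one-second pause keeps a site whose pings always fail from being respawned in a tight loop, while the other pingers carry on unaffected.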

No problem! If you're curious, I'm on an Adtran Total Access 1248, fed by 8x T1 or async E1 lines. There are probably at least 10-15 customers on the unit I'm on, so on some evenings bandwidth can get pretty hard to come by.

I hate Frontier!