Right on brotha... Meantime, does the latest push properly detect and report delays?
haha I missed that update, and yeah, it works like a charm so far! I'll let it run and see what happens.
edit: Yep, it stopped right around the 1.7 Mbit/s range, which is about right.
Between that and my adblock, I'm hurting for RAM... but the sizeable swap partition I added the other day when I switched to a faster USB drive should help offset that somewhat. I already made an image in case I wear it out.
Awesome. That doesn't look like you're hurting that badly for RAM; a bunch of it is cache and buffers, which the OS can clear out if it needs to, so you have roughly 25 MB of headroom before you start swapping.
Let it run for a while, and let me know how the subjective performance seems to you. Do you mostly not have ping problems? Or do you keep oscillating: delayed pings for a minute, no problems for a minute, and so on? Ideally I could design a tuning algorithm that spends 90% of its time at something like 5-10% below the delay threshold.
WOW!
Not sure why PingPlotter shows packet loss at the router, but as you can see there are no problems there. I can't thank you enough! It's already many times better than without it. This was under heavy load from a smart TV and at least two phones.
That's great. Ideally it wouldn't hammer the ping reflectors too much. I'll think about an algorithm that backs off a bit... imagine this installed on 10M routers... that's something like 500M pings per minute.
Yeah, the thought crossed my mind xD It would also be nice to re-establish a baseline every so often, and I'll also have to see what happens if the WAN interface goes down for a short time. Mine rarely does, maybe 2-3 times a year, UPS believer that I am, but that may be something to look at.
Actually, the algorithm it's using right now automatically maintains a realistic baseline. As long as a server doesn't go away or suddenly become much farther away than it used to be, it should get used to mild changes in baseline. If a given server suddenly goes away, it should handle that, but it obviously won't be able to use that server anymore.
I think it should actually survive the WAN going away entirely; at least it shouldn't crash. So far, though, it's really just a proof of concept. I'm glad it's helping you!
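For what it's worth, a self-adjusting baseline like the one described could be sketched as an exponentially weighted moving average. This is a guess at the general idea in Python; the function names, smoothing factor, and threshold are all made up, and the real Erlang code may work quite differently:

```python
# Hypothetical sketch of a drifting RTT baseline, NOT the actual
# sqmfeedback algorithm: an exponentially weighted moving average
# absorbs mild changes in a server's typical ping, while a sample far
# above the baseline still registers as a delay.

def update_baseline(baseline, sample, alpha=0.05):
    """Blend a new RTT sample (ms) into the running baseline."""
    return (1 - alpha) * baseline + alpha * sample

def is_delayed(baseline, sample, threshold_ms=15.0):
    """True if the sample exceeds the baseline by more than threshold_ms."""
    return sample - baseline > threshold_ms

baseline = 20.0  # ms, initial estimate
for rtt in [21.0, 19.5, 22.0, 60.0, 20.5]:
    if not is_delayed(baseline, rtt):
        baseline = update_baseline(baseline, rtt)  # track only clean samples
```

Because only non-delayed samples feed the average, the baseline drifts with mild route changes but isn't dragged upward by congestion spikes.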
So, what's the longer term behavior been like? Are you still running it on your router?
So, because of the aforementioned issue of 'hammering the ping reflectors' I haven't been running it 24/7, but I did start it Tuesday afternoon and let it run until I woke up this morning. There haven't been any errors, and it's working as expected/needed.
This seems to be the best way to run it going forward, but I'm a noob, so any suggestions are welcome.
erl -pa ebin -eval "sqmfeedback:main()." -noshell -detached
I will be sure to keep you updated, and will keep checking GitHub for updates. I may fork the script if/when I have time to mess with it, provided I have your blessing to do so.
For just one person it's probably not a big deal. It's doing something like 10 pings a minute to each site, and it purposefully uses randomization so it wouldn't sync up if there were multiple users. But yeah, I'm thinking about how to make it both responsive and not overly hammery at scale. I should probably figure out how to use logs for messages, plus a little database for recording historical speeds... it could learn if there's a daily pattern, etc.
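The randomized spacing mentioned there might look something like the following. This is a sketch only; the 10-pings-a-minute figure comes from the post above, but the uniform jitter shape is an assumption, not the script's actual timing logic:

```python
import random

# Sketch of jittered ping scheduling (assumed shape, not the real
# sqmfeedback timing code): about 10 pings per minute per site, with
# each interval randomized so that many independent routers drift
# apart instead of hitting the reflectors in lockstep.

def next_ping_delay(mean_interval=6.0, jitter=0.5):
    """Random delay in seconds, uniform over mean_interval * (1 +/- jitter)."""
    return random.uniform(mean_interval * (1 - jitter),
                          mean_interval * (1 + jitter))
```

Even a crude uniform jitter like this prevents phase-locking across routers, since independent uniform draws never stay synchronized.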
This happens when the modem is restarted and all the ping threads die. Like I said, it doesn't happen on its own, but it would be nice to have the threads come back when the WAN interface reconnects.
=CRASH REPORT==== 26-Feb-2020::11:02:27.507879 ===
crasher:
initial call: sqmfeedback:monitor_a_site/1
pid: <0.89.0>
registered_name: []
exception error: no function clause matching lists:nth(1,[]) (lists.erl, line 170)
in function sqmfeedback:monitor_a_site/5 (sqmfeedback.erl, line 57)
ancestors: [<0.75.0>]
message_queue_len: 0
messages: []
links: [<0.75.0>]
dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
next => #Fun<rand.5.10897371>,type => exrop,
uniform => #Fun<rand.6.10897371>,
uniform_n => #Fun<rand.7.10897371>,
weak_low_bits => 1},
[136007550988625197|127070042509926934]}}]
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 831660
neighbours:
I'm not so sure that a daily/weekly pattern would be of any use in my case. There seem to be more congestion issues on holidays and at pretty random times throughout the week, due to my DSLAM backhaul. Any extra use by other subscribers in the area can cause these conditions, and we're not talking about a huge number of people here. Maybe someone on a larger network would benefit from that, though?
The function monitor_ifaces is supposed to receive a message about dying site pingers and restart them, but it's untested because I can't unplug my desktop where I'm doing the development (I have an NFS home directory, so the whole system freezes until it comes back online). Ideally I'd put this on my laptop; then I could just bring the interface down, watch it crash, and see what happens... Thanks for the report. I'll take a look at it Friday, probably.
Well, no rush HAHA. I doubt that will happen again for a while; when I rebooted the modem it had been online for 60-something days.
shm0
I was thinking about implementing something similar.
Something like saving the known good rates to a file for a specific time, let's say anywhere from 1-3 up to 7 days.
Then use the average/median of those records as a starting point for reducing the rates.
@shm0 I think the exponential random walk finds a good rate so quickly that it'd be hard to really reduce the search time. It really should take only about 3 rounds for any kind of reasonable fluctuation level (0.9^3 = 0.73 and 0.9^4 = 0.66), so if you're talking about fluctuations of 25 to 30 percent, only a small number of rounds are needed. The bigger issue is that we want to "open up" quickly to avoid having slow speeds, and we want to "clamp down" quickly to avoid having high pings... it's hard to figure out how to do that without pinging frequently.
Maybe worrying about hammering the ping reflector at this level (which does include some desirable random jitter, etc.) isn't worthwhile. After all, these sites are all actually huge load-balanced anycast sites, I'm sure.
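That round-count arithmetic can be checked directly. This is just the math from the post restated in Python; the 0.9 factor is the 10% reduction step discussed in the thread:

```python
# How many 10% reduction rounds does it take to shed a given fraction
# of bandwidth? After k rounds the rate is 0.9**k of the original, so
# a 25-30% fluctuation needs only 3-4 rounds, as stated above.

def rounds_to_reach(target_fraction, factor=0.9):
    """Smallest k such that factor**k <= target_fraction."""
    k, rate = 0, 1.0
    while rate > target_fraction:
        rate *= factor
        k += 1
    return k
```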
shm0
Do we?
Congestion mostly occurs during rush hours, mostly in the evening.
And most of the time the window is roughly the same.
I think it's better to keep the reduced rates for a certain amount of time and reset the rates after that.
I would set this duration to the amount of time needed to guarantee a lag-free experience.
For example, if you know your online gaming sessions last at least 1h, keep the new rates for that amount of time.
Or, if you know the time window/duration of the congestion, use that as the duration.
Searching the rate upwards will introduce new lag while finding the new max rates.
Did you take a look at my shell script? It's not perfect and very simplistic but it works and tries to implement this approach.
Maybe it is also a bit slow because of the 1 sec ping limit of busybox.
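The cooldown-and-reset behavior being described boils down to something like this. It's a Python restatement of the idea, not the shell script itself; the 1800-second default and the rate values are illustrative:

```python
# Sketch of the cooldown-and-reset idea from the posts above (the real
# implementation is a shell script; this just restates the logic):
# after congestion forces a rate cut, hold the reduced rate for a
# cooldown window, then jump straight back to the known max rate
# instead of searching upward and re-introducing lag.

def current_rate(max_rate, reduced_rate, last_congestion, now, cooldown=1800):
    """Return reduced_rate inside the cooldown window, max_rate after it."""
    if now - last_congestion < cooldown:
        return reduced_rate
    return max_rate
```

The design choice is deliberate: the only upward move is a single jump back to max, so lag from an upward search is traded for one possible congestion event at reset time.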
In my case, we definitely want it to open up quickly. Sometimes I'll have slower speeds (less available bandwidth) at 10am, or any other given time of day, lasting only half an hour, and sometimes it's smooth sailing all day. With only 6 Mbit/s to work with, I need all I can get! It is a unique case: (up to) 48 customers are fed from the DSLAM, all provisioned 1 Mbit/s up, and after that they provision out the leftover bandwidth. Feeding this are 8x async E1 lines.
I'm the only one on the box with 6 Mbit service.
shm0
I just noticed my approach does exactly that.
In your case, set a cooldown time of 1800-3600 seconds.
If no congestion occurs in that time, the rates will be reset to the max rates.
But my script is a bit limited:
it can only ping one host, and there is no detection when the host goes down.
@shm0, you bring up a good point, which is that there is a tradeoff between high ping and low bandwidth... People with bandwidth to spare, and hard-core gamers, can clamp their bandwidth down a lot and ensure no bufferbloat... But if you're working in a constrained regime, for example your game requires say 500 kbps and your max speed is only 1000, this isn't a solution. Ideally you'd specify a function that describes how much you care about bandwidth vs. how much you care about ping, and the algorithm would balance those so that it minimizes the total cost.
The behavior would basically be different for different people. For someone with plenty of bandwidth who cares a lot about ping, it'd probably just find a steady state with low ping all the time and wouldn't have to check often. For someone like @Bndwdthseekr, it'll constantly have to adjust, because the available bandwidth varies a lot and the cost of throttling too much is high enough that it's worth it to "open up quickly" as mentioned.
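That bandwidth-vs-ping cost function is only described in the abstract here; one made-up concrete form, with linear weights and entirely hypothetical penalty shapes, could look like:

```python
# Hypothetical cost function trading bandwidth against latency. The
# weights and penalty shapes are invented for illustration: a gamer
# would set w_ping high, a bulk downloader would set w_bandwidth high,
# and the tuner would pick the rate that minimizes total cost.

def total_cost(rate, max_rate, expected_ping, baseline_ping,
               w_bandwidth=1.0, w_ping=1.0):
    bandwidth_loss = (max_rate - rate) / max_rate          # fraction given up
    bloat_ms = max(0.0, expected_ping - baseline_ping)     # excess delay
    return w_bandwidth * bandwidth_loss + w_ping * bloat_ms
```

In this framing the steady states fall out naturally: with a large w_ping the minimum sits at a heavily clamped rate, while a large w_bandwidth pushes the minimum up toward max_rate even at the cost of some bloat.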
shm0
@dlakelan
Well, when there is congestion, bandwidth needs to be spared.
If bandwidth is more important than latency,
reduce the cooldown time to a shorter duration, like 30-60 sec,
and decrease the reduction factor from the default 10% to something lower.
But when using small decrease steps, it can take a noticeably longer time to find new safe rates.
After each step it takes some time to let the new settings settle in.
So, I think, using bigger reduction steps is a better choice.
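The step-size trade-off above can be made concrete with a little arithmetic. The numbers are illustrative only; the 10% default is from the post, and the per-round settle time is a guess:

```python
# Smaller reduction steps mean more rounds, and every round costs a
# settle time before its effect can be judged, so the total time to
# reach a safe rate grows as the step shrinks. Numbers here are
# illustrative, not measured.

def time_to_safe_rate(drop_needed, step, settle_seconds=10.0):
    """Seconds of multiplicative reduction needed to shed `drop_needed`
    (a fraction, e.g. 0.3) using reduction steps of `step` (e.g. 0.1)."""
    rounds, rate = 0, 1.0
    while rate > 1.0 - drop_needed:
        rate *= (1.0 - step)
        rounds += 1
    return rounds * settle_seconds
```

With these assumed numbers, shedding 30% takes 4 rounds at 10% steps but 7 rounds at 5% steps, which is the "noticeably longer" effect described above.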