In the current Lua version the gap between packets is the interval (0.5s) divided by the number of reflectors (5), so we're talking 10 packets/s, and it still uses too much CPU for lower-powered devices.
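The arithmetic above can be sketched as follows. This is only an illustration: `ping_schedule` is a hypothetical helper, not code from the Lua script, and it uses integer milliseconds, so fractional gaps are truncated.

```bash
#!/bin/bash
# Illustrative only: ping_schedule is a hypothetical helper, not taken from the Lua script.
# $1 = interval in milliseconds over which every reflector is pinged once
# $2 = number of reflectors
ping_schedule() {
    local interval_ms=$1 reflectors=$2
    local gap_ms=$(( interval_ms / reflectors ))          # spacing between packets (truncated)
    local rate_pps=$(( reflectors * 1000 / interval_ms )) # aggregate packets per second
    echo "${gap_ms} ${rate_pps}"
}

ping_schedule 500 5     # prints "100 10": one packet every 100 ms, 10 packets/s
```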
I've been experimenting with a C version to see how well that works, but it isn't quite ready yet; some significant parts are still missing. I haven't pushed this branch anywhere, but will do so when it's actually functional. It will need a lot of cleanup though, as it's mostly a learning project for fun.
It does use very little CPU with the same sending rate as the Lua script, and even the modestly powered modem I use can handle stupidly high rates.
tievolu
2408
I'm sure I could make my perl version run a lot more efficiently, for example by reducing the number of subroutines, which are pretty inefficient in perl. But it would make the code a lot more difficult to read (in traditional perl style).
I also do a lot of extra calculations and a fair bit of string formatting and logging because I like to see the details of why the script is doing what it's doing, and I have other scripts that generate web pages with pretty graphs so I need the output for that. If I took all of that stuff out I'm sure it would use a lot less CPU.
I think a C implementation is what we really need, but I'm not the man for that task. Looking forward to seeing what you come up with. In the meantime I'll keep expanding my perl skillset for no good reason. I just find this whole problem strangely fascinating and can't stop playing with it.
Somehow C makes me shudder less than Perl, the more you know 
As long as it can be turned down to more CPU-friendly levels for those of us with less beefy CPUs, I think the more logs/stats the better.
That will probably end up being the long-term solution to support "everything", but I'm not really the man for it either, though I'm enjoying the process of trying to figure out how to think in C.
While the C port is an incomplete version of the Lua script at the moment, the CPU usage (or lack thereof) is wild. While testing a quick and dirty import of the reflectors CSV file, it ended up pinging 300+ reflectors every 500ms, and even then it used about half of what the Lua script does.
Admittedly Lua wasn't AFAIK even meant to be threaded (beyond coroutines), so maybe it shouldn't be all that surprising that it might end up being rather resource hungry.
I think it's just the difference between compiled and bytecode interpreted. Lua is quite efficient, but it's not machine code. Unless you have LuaJIT of course.
Go is really much, much nicer and safer than C for this kind of thing, I think. But LuaJIT would be great. I can't remember: why doesn't LuaJIT work? It's some kind of threads issue, right?
Lynx
2411
Have others encountered cpu utilisation issues? Even with my bash implementation I only see about 2% cpu utilisation. The gentleman that wrote the go port indicated that on his R7000 cpu utilisation went to 100% when running the bash script. That admittedly seems terrible. Is that the fault of the bash script I wonder? Why is that so bad and yet on my RT3200 I am not seeing any issue in this respect? I know the cpu on the R7000 is not as beefy, but can that account for such a huge difference? I have irqbalance enabled - perhaps that helps a lot?
My bash implementation works sufficiently well for me that I use it all the time now. It certainly gives me more capacity from my LTE connection when needed and seems to do a good job of keeping latency low:
Without an autorate script I wouldn't get such download and upload rates because I'd have to compromise for something much lower to work only most of the time.
And I don't see evidence of heavy cpu utilisation (although perhaps the below is not definitive):
@tievolu like you I have been really enjoying working on this, albeit it is a huge distraction from my normal work.
How about a standard testing script is put together to objectively test the efficacy of any approach on a given device and connection? For example, how it handles step load, sustained load, etc.?
There are a lot of different solutions now and not a great understanding about how they fare.
Getting a better handle on the above could help prompt work on a new C-based implementation.
My dream goal in starting this thread was to converge on a general solution for many. On the plus side, a good amount of knowledge and experience has been accumulated, and there are now several viable options to choose from.
One of the things I like about CAKE is its simplicity in terms of set and forget. The bandwidth issue is a huge problem for variable connections, and I hope OpenWrt users can be offered a similar set-and-forget option to resolve it. After all, autorate-ingress fails to deliver.
Interpreted vs compiled is a big part of it yes. LuaJIT does go a long way to get CPU usage to a more sane level, but it's still higher than I would like. Probably because of communication between Lanes (threads) (seemingly) being expensive.
It's some segfault issue with LuaJIT + LuaLanes yes, I have it working by recompiling LuaLanes against the LuaJIT package instead, but I could never figure out what the issue actually was, so we'd need someone more knowledgeable to help figure out a proper fix for this.
Well yes, but as with everything it's a tradeoff, and one we'd have to make if we went with Go is the large binaries we'd end up with. The Go port of @Lynx's script ends up at about 2.5M with stripped binaries. With all the prerequisites I'm not sure Lua is much better in total (but it helps that the Lua interpreter is included in OpenWrt). The binary for the current half-done version of the C port however ends up at 48K (with -Os and stripped symbols, which IIRC OpenWrt uses by default for packages).
The performance issues we saw with the Lua script were the reason I started poking around with a port, but binary size was mainly the reason I ended up trying in C.
In fairness, when included in an OpenWrt image, SquashFS would probably compress it down by quite a bit, which would help a lot. Still though, I'm operating under the belief that if a router can run CAKE, it should be able to run whatever autorate code we come up with.
Can't recall that I ever had a problem with that, but the EdgeRouter 4 packs a decent punch, so I wasn't a good test case.
Over in Lua land I know @richb-hanover-priv had a lot of issues on his Archer C7v2 with the Lua script eating up all its CPU, and I've seen the same tendencies myself since moving my CAKE instance to the new 4G-modem, which has a less beefy processor, though the MT7621 in it should handle quite a bit.
A testing script would be nice. Plus it would be good to agree on a standard way to log bandwidth changes, reported RTT's, etc. so they're easily comparable where applicable.
There's certainly a lot of divergence now, but it's kind of to be expected given that it's rather uncharted waters we're dealing with.
As mentioned the C port is meant to have the same logic as the Lua script for now, but one of my hopes is that it could be used to build upon when we do converge on a solution that will work for as many people as one can reasonably expect.
Is that compiled with tinygo?
No, the standard Go compiler. Tinygo looked really neat, but unfortunately it currently doesn't have support for compiling a generic MIPS binary for Linux: https://github.com/tinygo-org/tinygo/issues/1075
Lynx
2416
Would you or anyone else be willing to test my bash implementation? It just needs the .sh scripts downloading and placement in /root/CAKE-autorate/, chmod +x them, and then run with ./CAKE-autorate.sh (making sure to have the dependencies listed in CAKE-autorate.sh installed).
@richb-hanover-priv started doing so.
Lynx
2417
Curious - what's everyone actually using at the moment? I have my bash implementation running all the time as a service at the moment, albeit I find it hard to stop tweaking various aspects of it.
Tests like this confirm why I absolutely need autorate on my LTE connection:
In general it seems very responsive.
Maybe also post a test with autorate/cake completely disabled?
Still nothing: fixed-rate link with little/no bufferbloat with a static SQM config. As I said in the past, all that the different autorate options can offer on my link is not getting in the way.
Lynx
2419
It would be good if you could arbitrarily lengthen the warmup time.
I am looking forward to either myself or perhaps @richb-hanover-priv or another creating a standard test in ash to test the efficacy of these various implementations.
From my own subjective testing this bash implementation is working well, and the main goal for me now is working on getting the CPU usage down by looking to see what is eating up CPU cycles unnecessarily. I think you mentioned that calls to 'date' can be avoided. Is that using printf? The processing for each reflector in 'monitor_reflector_path' is especially important because that scales with each reflector and reduced ping interval. I am guessing the separate processes get distributed across the cores with irqbalance helping to distribute the load. I am sure there is some low hanging fruit.
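On the question of avoiding calls to 'date': a minimal sketch, assuming bash 5.0 or later for `EPOCHREALTIME` (with a whole-second fallback for bash 4.2+ via the builtin `printf '%(...)T'`). The function name `get_now_us` and the variable `now_us` are hypothetical, not taken from the script. The result is written to a variable rather than echoed, because capturing output with `$(...)` would itself fork a subshell.

```bash
#!/bin/bash
# Sketch: store a microsecond timestamp in the global variable now_us
# without forking 'date' and without a $(...) subshell.
get_now_us() {
    if [[ -n ${EPOCHREALTIME:-} ]]; then
        # bash >= 5.0: "seconds<sep>microseconds", always 6 fractional digits.
        # The separator follows the locale, so strip either "." or ",".
        local t=${EPOCHREALTIME}
        now_us=${t/[.,]/}
    else
        # bash >= 4.2: builtin strftime via printf, whole seconds only
        printf -v now_us '%(%s)T000000' -1
    fi
}

get_now_us; t_start=${now_us}
get_now_us; t_end=${now_us}
echo "elapsed: $(( t_end - t_start )) us"
```

Since this runs once per ping response in each per-reflector process, saving a fork per sample should add up as reflectors are added or the ping interval is reduced, though the actual gain would need measuring on the target device.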
tievolu
2420
Curious - what's everyone actually using at the moment? I have my bash implementation running all the time as a service at the moment, albeit I find it hard to stop tweaking various aspects of it.
I'm using my updated Perl implementation "in production" and it's working well. I think it's pretty close to where I want it but I too cannot stop tweaking. It's now about 3500 lines of code 
For example, yesterday I added a load of logic to try and handle situations where either my or the reflector's millisecond counter has reset to zero (but the other hasn't), and today I totally changed the way the script tracks bandwidth usage, which has halved the CPU usage. Doesn't even register 1% in top on my x86 router now 
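For readers wondering what handling a counter reset can look like: ICMP timestamps count milliseconds since midnight UTC, so they wrap at 86400000. The sketch below is not the Perl script's logic, which isn't shown in this thread. It is one common way to take a difference across a wrap, assuming both counters follow the standard and the true difference is well under half a day. `ts_delta_ms` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: difference between two "ms since midnight" timestamps that
# tolerates one of them having wrapped to zero while the other has not.
MS_PER_DAY=86400000

# $1 = later timestamp, $2 = earlier timestamp (both in ms since midnight)
ts_delta_ms() {
    local d=$(( $1 - $2 ))
    if (( d > MS_PER_DAY / 2 )); then
        d=$(( d - MS_PER_DAY ))      # earlier side already wrapped
    elif (( d < -MS_PER_DAY / 2 )); then
        d=$(( d + MS_PER_DAY ))      # later side already wrapped
    fi
    echo "${d}"
}

ts_delta_ms 20 86399990   # prints 30: the receive side wrapped, the send side had not
```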
@Lynx I've been using your script for more than a week, and it's working. In my use case I'm happy with the results except for the ping spikes. Every 4-5 seconds I get ping spikes, from the usual 4 ms up to 25 or even 85 ms. I understand that your algorithm can't eliminate them entirely, but is there a way to minimize them, say to around 12 ms instead of a whopping 85 ms?
Lynx
2422
From the posts above I think the problem may be that your ping spikes too much on its own.
Can you post the output of 'config.sh'? I presume you are using the latest code in the main branch.
Setting the delay threshold is important:
Oh no, I'm using the previous version of your code. The newer one you updated a day ago gave me a bunch of syntax errors, so I'm sticking to the older code for now, but I don't mind helping you with it.
Tried the latest code from main branch:
#!/bin/bash
# defaults.sh sets up defaults for CAKE-autorate
# defaults.sh is a part of CAKE-autorate
# CAKE-autorate automatically adjusts bandwidth for CAKE in dependence on detected load and RTT
# inspired by @moeller0 (OpenWrt forum)
# initial sh implementation by @Lynx (OpenWrt forum)
# requires packages: iputils-ping, coreutils-date and coreutils-sleep
alpha_OWD_increase=1 # how rapidly baseline OWD is allowed to increase (integer /1000)
alpha_OWD_decrease=900 # how rapidly baseline OWD is allowed to decrease (integer /1000)
debug=0
enable_verbose_output=1 # enable (1) or disable (0) output monitoring lines showing bandwidth changes
ul_if=wan # upload interface
dl_if=veth-lan # download interface
min_dl_rate=9000 # minimum bandwidth for download
base_dl_rate=9800 # steady state bandwidth for download
max_dl_rate=96000 # maximum bandwidth for download
min_ul_rate=9000 # minimum bandwidth for upload
base_ul_rate=9800 # steady state bandwidth for upload
max_ul_rate=96000 # maximum bandwidth for upload
alpha_OWD_increase=1 # how rapidly baseline RTT is allowed to increase (integer /1000)
alpha_OWD_decrease=900 # how rapidly baseline RTT is allowed to decrease (integer /1000)
rate_adjust_OWD_spike=50 # how rapidly to reduce bandwidth upon detection of bufferbloat (integer /1000)
rate_adjust_load_high=10 # how rapidly to increase bandwidth upon high load detected (integer /1000)
rate_adjust_load_low=25 # how rapidly to return to base rate upon low load detected (integer /1000)
high_load_thr=50 # % of currently set bandwidth for detecting high load (integer /100)
delay_buffer_len=4 # size of delay detection window
delay_thr=15 # extent of delay to classify as an offence
detection_thr=2 # number of offences within window to classify reflector path delayed
reflector_thr=2 # number of reflectors that need to be delayed to classify bufferbloat
ping_reflector_interval=0.1 # (seconds, e.g. 0.1s)
main_loop_tick_duration=200 # (milliseconds)
bufferbloat_refractory_period=300 # (milliseconds)
decay_refractory_period=5000 # (milliseconds)
ping_sleep_thr=60 # time threshold to put pingers to sleep on sustained ul and dl base rate (seconds)
# verify these are correct using 'cat /sys/class/...'
case "${dl_if}" in
    \veth*)
        rx_bytes_path="/sys/class/net/${dl_if}/statistics/tx_bytes"
        ;;
    \ifb*)
        rx_bytes_path="/sys/class/net/${dl_if}/statistics/tx_bytes"
        ;;
    *)
        rx_bytes_path="/sys/class/net/${dl_if}/statistics/rx_bytes"
        ;;
esac
case "${ul_if}" in
    \veth*)
        tx_bytes_path="/sys/class/net/${ul_if}/statistics/rx_bytes"
        ;;
    \ifb*)
        tx_bytes_path="/sys/class/net/${ul_if}/statistics/rx_bytes"
        ;;
    *)
        tx_bytes_path="/sys/class/net/${ul_if}/statistics/tx_bytes"
        ;;
esac
if (( $debug )) ; then
    echo "rx_bytes_path: $rx_bytes_path"
    echo "tx_bytes_path: $tx_bytes_path"
fi
# list of reflectors to use
reflectors=("1.1.1.1" "1.0.0.1")
no_reflectors=${#reflectors[@]}
Didn't touch other files, I get this error:
root@OpenWrt:~# . ./CAKE-autorate.sh
-ash: /root/CAKE-autorate/config.sh: 0: not found
-ash: /root/CAKE-autorate/config.sh: line 82: syntax error: unexpected "("
You probably need to install bash first:
opkg update ; opkg install bash
root@OpenWrt:~# opkg update ; opkg install bash
Downloading https://downloads.openwrt.org/snapshots/targets/bcm27xx/bcm2711/packages/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_core
Downloading https://downloads.openwrt.org/snapshots/targets/bcm27xx/bcm2711/packages/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/base/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_base
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/base/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/targets/bcm27xx/bcm2711/kmods/5.10.103-1-f6b2536fc50a2bccf542e07d83531095/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_kmods
Downloading https://downloads.openwrt.org/snapshots/targets/bcm27xx/bcm2711/kmods/5.10.103-1-f6b2536fc50a2bccf542e07d83531095/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/luci/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_luci
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/luci/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/packages/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_packages
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/packages/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/routing/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_routing
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/routing/Packages.sig
Signature check passed.
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/telephony/Packages.gz
Updated list of available packages in /var/opkg-lists/openwrt_telephony
Downloading https://downloads.openwrt.org/snapshots/packages/aarch64_cortex-a72/telephony/Packages.sig
Signature check passed.
Package bash (5.1.16-1) installed in root is up to date.
root@OpenWrt:~# . ./CAKE-autorate.sh
-ash: /root/CAKE-autorate/config.sh: 0: not found
-ash: /root/CAKE-autorate/config.sh: line 82: syntax error: unexpected "("
root@OpenWrt:~#
Still the same. Any idea?
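One detail in the transcript above is worth noting: the errors are prefixed with `-ash`, and the script was started with `. ./CAKE-autorate.sh`. The leading `.` sources the file into the current login shell (ash on OpenWrt) instead of executing it, so the `#!/bin/bash` line is never used and bash-only syntax such as arrays is rejected, whether or not bash is installed. Running it as `./CAKE-autorate.sh` or `bash ./CAKE-autorate.sh` should avoid that. A guard along these lines could make the failure clearer. `require_bash` is a hypothetical helper, not part of the posted scripts.

```bash
#!/bin/bash
# Hypothetical guard for the top of a bash-only script: stop early, with a
# readable message, when the file is being interpreted by another shell.
require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "This script needs bash: run it as ./CAKE-autorate.sh, do not source it with '.'" >&2
        return 1
    fi
}

# 'return' stops a sourced file without closing the login shell;
# 'exit' covers the case where the file is executed directly.
require_bash || return 1 2>/dev/null || exit 1
```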
Lynx
2426
@moeller0 quick question: at the moment for each reflector I keep last X samples and delay is detected if Y out of X are above threshold.
I am wondering about instead detecting delay on the basis of Y successive delays, e.g. if delay is sustained for >Y successive samples, then reflector delayed.
Which metric would you favour?
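To make the two options concrete, here is a minimal sketch of both checks over the same window of samples. The function names are hypothetical and the samples are assumed to be integer delays in milliseconds, oldest first. The "successive" variant is interpreted here as "the most recent Y samples all exceed the threshold".

```bash
#!/bin/bash
# $1 = delay threshold (ms), $2 = Y, remaining arguments = last X delay samples, oldest first

# Option 1: delayed if at least Y of the X samples exceed the threshold
delayed_y_of_x() {
    local thr=$1 need=$2 hits=0 d
    shift 2
    for d in "$@"; do
        (( d > thr )) && hits=$(( hits + 1 ))
    done
    (( hits >= need ))
}

# Option 2: delayed only if the most recent Y samples all exceed the threshold
delayed_y_successive() {
    local thr=$1 need=$2 run=0 d
    shift 2
    for d in "$@"; do
        if (( d > thr )); then run=$(( run + 1 )); else run=0; fi
    done
    (( run >= need ))
}

# With delay_thr=15 and detection_thr=2 from the config posted earlier:
delayed_y_of_x       15 2 20 5 20 5 && echo "Y of X: delayed"
delayed_y_successive 15 2 20 5 20 5 || echo "Y successive: not delayed"
```

With alternating samples like these, the Y-of-X check trips on intermittent spikes while the successive check waits for sustained delay, which is the trade-off being asked about.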