CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

I think it still makes sense, although it was sexier when it also appeared to ameliorate @patrakov's odd link behavior....

I will try, but no guarantees. The complicating factor here is the need to keep the "fiber, with fallback to LTE" combo as the main connection, so I will have to write some mwan3 policies so that the VPN traffic always goes through the LTE link, and the pings and the test traffic generated by the laptop goes through the VPN.
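
Roughly, I have in mind something like the following mwan3 rule (a sketch only: the `lte_only` policy name is an assumption, and the ports are IKEv2's standard UDP 500/4500; with xfrm interfaces, plain ESP (protocol 50) would need a rule too):

```shell
# Hypothetical mwan3 rule steering IKEv2 traffic over the LTE link.
# Policy name 'lte_only' is assumed to already exist in /etc/config/mwan3.
uci set mwan3.ikev2_via_lte=rule
uci set mwan3.ikev2_via_lte.proto='udp'
uci set mwan3.ikev2_via_lte.dest_port='500,4500'   # IKE and NAT-T
uci set mwan3.ikev2_via_lte.use_policy='lte_only'
uci commit mwan3
/etc/init.d/mwan3 restart
```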

Maybe just unplug the ONT for a test? That way mwan3 is forced to use the LTE fallback?

I would rather avoid this.

Fair enough. It is a rather disruptive test that does seem incompatible with normal network usage.

How about netifd PBR? Once set up, you can just add an entry in LuCI to route traffic to whichever destination. Works a treat, albeit not so well documented.

I'd be curious if VPN use circumvents your ISP messing about with your traffic (assuming they do).

This would mean (because there can be only one PBR package active) dismantling my current mwan3 configuration - something that I wouldn't do without testing first on a separate router. The issue is not about routing the laptop through the VPN (that's easy), but about routing the IKEv2 VPN through LTE.

Why not WireGuard?

Commercial VPN provider.

Many offer WireGuard though, don't they? Have you also considered a VPS? It'd make two-way measurement easy, but I'm reluctant to make the switch since NordVPN has been great for me.

Please don't disrupt me, otherwise I will never start the test. Unfortunately this VPN provider offers the WireGuard option only in their client app, which is already too fishy.

NordVPN are the same, but you can use their app to ascertain the WireGuard credentials and then use those directly. Maybe the same applies for you too.

Happy testing!

This looks promising: New WireGuard based OpenWrt VPN implementation: unetd - #19 by nbd

Done. Actually I hit a bug in the xfrm interfaces (they drop 50% of NATed packets), so I had to switch to strongswan-mod-kernel-libipsec, which works.

I will set up a packet capture and OWD ping train later today, but I cannot exclude the possibility that the ISP deprioritizes IKEv2 packets.

I'd appreciate any willing volunteers to test the latest 'fping' code available here:

@rosbeef, @richb-hanover-priv, @WaningForests, @gba?

It incorporates a lot of changes from main including:

  • switch over from iputils-ping to fping to give more consistent ping timings - @patrakov identified that in certain conditions pings to different reflectors could become synchronized and, in any case, the timings would tend to drift until a reset, which spoils the granularity achieved by staggering pings to different reflectors. After battling with various solutions, I opted to switch over to fping, which is designed to send pings to different reflectors in a round-robin fashion and offers a lot of control over the timings, e.g. the time between global sends and the time between sends per reflector. This ensures better ping timing is maintained.
  • stall detection - this detects certain stall conditions, like the interface being reset, and won't rotate out reflectors in those cases.
  • better log file handling and log file rotation - we now use semicolon separators, and the user can specify the number of minutes to retain in the log file (default 60 mins).
  • only use capacity estimation based on the achieved rate when the minimum set shaper rate is exceeded - if the achieved rate is very low, we just punish the shaper rate by the set factor rather than basing it on the achieved rate, which in practice means recovering a little bandwidth.
  • other minor changes and bug fixes
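
For concreteness, this is roughly the sort of fping invocation involved (the reflector IPs are placeholders and the timing values are illustrative, not the script's defaults):

```shell
# fping round-robins over all targets itself:
#   --loop       ping continuously
#   --period     ms between successive pings to the same reflector
#   --interval   minimum ms between any two sends (global spacing)
#   --timestamp  print a timestamp before each result line
#   --elapsed    report round-trip time
fping --loop --period 300 --interval 100 --timestamp --elapsed \
    1.1.1.1 8.8.8.8 9.9.9.9
```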

I'd like to verify it all still works OK before pushing all this to the main branch.

New settings:

  1. Increase the standard download rate:
dl_bw_standard = 5000
  2. Increase the bad ping percentage to 50. This means that 50% of the pings during the last two seconds need to be "bad" to trigger a bandwidth reduction, versus the default of 25%. This should make the script more tolerant of the constant packet loss and prevent spurious bandwidth reductions:
bad_ping_pc = 50
  3. Set the idle bandwidths equal to the minimum bandwidths. You can set these explicitly, or just delete the lines in your conf file, because the following is the default:
dl_bw_idle_threshold = $dl_bw_minimum
ul_bw_idle_threshold = $ul_bw_minimum
  4. Disable all the debug settings, as discussed before.
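
To illustrate what the bad_ping_pc setting does, here is a minimal sketch (the function name and window handling are mine, not the actual script code):

```shell
#!/bin/sh
# Hypothetical sketch of the bad_ping_pc check: reduce bandwidth only
# when the share of "bad" pings in the recent window reaches the
# configured percentage. Names are illustrative, not cake-autorate's.
bad_ping_pc=50   # percentage of recent pings that must be "bad"

should_reduce_rate() {
    # $1 = number of "bad" pings in the recent window, $2 = window size
    bad=$1; total=$2
    [ $(( 100 * bad / total )) -ge "$bad_ping_pc" ]
}

should_reduce_rate 10 20 && echo "reduce" || echo "hold"   # -> reduce
should_reduce_rate  4 20 && echo "reduce" || echo "hold"   # -> hold
```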

Questions:

  1. Why did you increase the reflector strike TTL to 750s? This makes it much more likely that reflectors will be deemed "bad" and rotated out, especially with the large amount of packet loss on your connection (which doesn't appear to be the reflectors' fault most of the time). I'm just trying to understand the motivation here.

  2. Your minimum bandwidth settings are extremely low - do they really need to be that low?

Testers, please give this a fair shaking. The rationale here is good, but the open question is: how often does the true achievable rate drop below the minimum? If that is next to never, the new code (reverting to the original behavior) gives better throughput without increasing latency; if dropping below the configured minimum rate is less rare, however, the older code is more conservative. With the current data this change seems good, so please check whether that holds on your links as well (especially Starlink).
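
A minimal sketch of the decision being discussed, with illustrative names and values rather than the script's actual variables:

```shell
#!/bin/sh
# Sketch: on bufferbloat, punish the shaper rate. If the achieved rate
# exceeds the configured minimum, base the reduction on the achieved
# rate; otherwise just scale the set shaper rate by the factor.
min_shaper_rate_kbps=5000
punish_factor_permille=900   # i.e. multiply by 0.9

punished_rate() {
    # $1 = current shaper rate (kbps), $2 = achieved rate (kbps)
    shaper=$1; achieved=$2
    if [ "$achieved" -gt "$min_shaper_rate_kbps" ]; then
        echo $(( achieved * punish_factor_permille / 1000 ))
    else
        echo $(( shaper * punish_factor_permille / 1000 ))
    fi
}

punished_rate 10000 8000   # -> 7200 (based on achieved rate)
punished_rate 10000 2000   # -> 9000 (based on set shaper rate)
```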

Is this configurable, or a fixed switch-over? Asking because on macOS fping has in the past introduced bugs that took a while to get fixed, so I would not "bet the farm" on fping alone; having it as a higher-performance option with a manually configurable fallback to iputils-ping seems desirable to me.

At the moment I am thinking of maintaining separate branches, one for fping (new main) and one for iputils-ping (separate branch), because so much different code is needed to handle either case. I could try to see if I could maintain both in the bash code and use a switch, but the code is getting larger and larger and I am very much trying to keep things simple.

The complexity associated with maintaining separate ping processes, and the timing problems there, makes me think that perhaps iputils-ping ought to be reserved for cases where there is just one reflector, and thus the generic approach is a pinger of some form that supports round-robin pinging. I could try to write a wrapper for iputils-ping that does the round-robin work for plugging into the generic receiver.
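
A hypothetical sketch of what such a wrapper might look like (reflector IPs, names and timings are illustrative, not actual code):

```shell
#!/bin/sh
# Hypothetical round-robin wrapper around iputils-ping: one ping per
# reflector per cycle, emitted on a single output stream, as a
# stand-in for fping's round-robin mode.
reflectors="1.1.1.1 8.8.8.8 9.9.9.9"

# pick the reflector for send number $1 (0-based), cycling the list
reflector_for_send() {
    i=$1
    set -- $reflectors
    shift $(( i % $# ))
    echo "$1"
}

# the wrapper itself would loop forever, pinging each reflector in turn
round_robin_ping() {
    i=0
    while true; do
        # -c 1: single ping; -W 1: 1 s timeout; results parsed upstream
        ping -c 1 -W 1 "$(reflector_for_send "$i")" | grep 'time='
        i=$(( i + 1 ))
        sleep 0.1
    done
}

reflector_for_send 4   # -> 8.8.8.8 (wraps around the list)
```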

But it's also the time it takes me to write all this. In a sense there is something clean about me maintaining something I use 24/7 because at least I know that the main code is tested and works well on at least my own connection.

Mmmh, given my experience with fping I am not sure that is a great idea... it would be way more robust to design this so that switching can be done with a config option.

That is betting on fping being available, working as expected, and getting timely bug fixes. I think that relying on iputils-ping is a much safer bet, but it's not my choice to make.

BTW, concerning log file rotation: at the moment we specify the time in minutes for log file rotation. Would it be better to specify a log file rotation check interval and a max log file size, and then rotate if the log file has exceeded that size? I don't have much experience with managing logging, but I notice that if a sleep is activated (and hence no logging occurs), then on resumption this will trigger an immediate rotation. So it could rotate even pretty small log files. But perhaps that's fine, and there is simplicity in knowing what time period the log files relate to. I can't work out in my mind which behaviour is better here, so I would appreciate your guidance.
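
To make the size-based alternative concrete, here's a rough sketch of what I mean (illustrative names, not the script's actual variables):

```shell
#!/bin/sh
# Hypothetical size-based rotation check: rotate when the log exceeds
# a size cap, instead of (or in addition to) a time window.
log_file=/tmp/cake-autorate-demo.log
max_log_size_bytes=$(( 1024 * 1024 ))   # 1 MiB cap, illustrative

rotate_log_if_needed() {
    [ -f "$log_file" ] || return 0
    size=$(( $(wc -c < "$log_file") ))
    if [ "$size" -gt "$max_log_size_bytes" ]; then
        # keep one generation; a real implementation might keep several
        mv "$log_file" "$log_file.old"
        : > "$log_file"
    fi
}

: > "$log_file"        # start with an empty log for the demo
rotate_log_if_needed   # small file: nothing happens
```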