I think the work you guys are doing has applications beyond just augmenting SQM.
Having a way to characterize the latency profile of your internet service over time, with respect to time-of-day and bandwidth usage, would be helpful for anyone ISP shopping.
It could also be great information to have when working with your ISP to resolve connection issues.
@Lynx Update regarding stalls: the supposed definition (no pings, almost zero traffic) does not apply. Thanks to the newly added logging, I am no longer blind to what happens between pings. The traffic is almost never zero, so a stall is "simply" a special case of bufferbloat. Please don't add any code based on the old definition to the master branch, as it's wrong.
What's common to all "stall" instances so far is:
Large delay to several pings at once.
Significantly reduced but non-zero non-ping traffic (e.g. a drop from 3365 to 628 kbps).
Most important: all delayed ping replies come back together or almost together, with decreasing latency (an arithmetic progression whose step is approximately the negative of the interval between pings).
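That last signature falls out naturally if requests are queued and all replies are released at once. A toy model (not cake-autorate code; the interval and release time are made-up numbers) shows the RTT step between consecutive pings equals minus the send interval:

```shell
# Toy model: pings sent every INTERVAL_MS, all replies released at RELEASE_MS.
INTERVAL_MS=100   # assumed interval between ping sends
RELEASE_MS=2000   # assumed moment the queue drains
steps=""
prev=""
for i in 0 1 2 3 4; do
  send_ms=$((i * INTERVAL_MS))
  rtt_ms=$((RELEASE_MS - send_ms))   # every reply shares one arrival time
  [ -n "$prev" ] && steps="$steps $((rtt_ms - prev))"
  prev=$rtt_ms
done
echo "RTT steps (ms):$steps"
```

Each step comes out as -100 ms, i.e. exactly the negative of the send interval, matching the observed progression.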
I guess you should get a packet capture to get to the bottom of this. How large are the added delays for the ping packets, and how large is the load interval? That is, is there a shorter period of zero traffic followed by low traffic, or is it mainly a really low rate with no full drop-out? (Does the modem have anything that might be diagnostic here, like retransmission counters?)
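For the capture itself, something along these lines on the router would do (the interface name is a placeholder; a small snap length keeps the file manageable while still showing timing and headers):

```shell
# Capture everything on the WAN interface around a stall; headers only.
tcpdump -i wan -s 128 -w stall.pcap
```

Opening the pcap in Wireshark with an IO graph should make it obvious whether there is a true dead interval or just a trickle of traffic.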
About that ping time series: it sure looks like all ping processing is stopped, pings* are queued up somewhere**, and then released, like in our initial stall concept. Could you, for testing, try to ping through a VPN so your ISP can't special-case ICMP packets?
*) That is, either the request packets or the reply packets; OWDs (one-way delays) would be able to tell us more about the direction.
**) Not in our own queue though, so either in the modem for upstream or in the base station for downstream.
I will try, but no guarantees. The complicating factor here is the need to keep the "fiber, with fallback to LTE" combo as the main connection, so I will have to write some mwan3 policies so that the VPN traffic always goes through the LTE link, and the pings and the test traffic generated by the laptop go through the VPN.
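A minimal mwan3 sketch of that idea (the endpoint IP, rule name, and policy name are placeholders; it assumes a policy already exists that contains only the LTE member): pin traffic to the VPN server onto LTE so the tunnel itself never rides the fiber link.

```shell
# Hypothetical mwan3 rule: send traffic to the VPN endpoint via LTE only.
uci set mwan3.vpn_via_lte=rule
uci set mwan3.vpn_via_lte.dest_ip='203.0.113.10'   # placeholder VPN endpoint
uci set mwan3.vpn_via_lte.proto='all'
uci set mwan3.vpn_via_lte.use_policy='lte_only'    # placeholder policy, LTE member only
uci commit mwan3
/etc/init.d/mwan3 restart
```

A second rule matching the laptop's source address against a policy that uses the VPN interface would cover the other half.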
How about netifd PBR? Once set up, you can just add an entry in LuCI to route traffic to whichever destination. Works a treat, albeit not so well documented.
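For reference, a policy in the pbr package is a short stanza in /etc/config/pbr (the addresses and interface name here are placeholders), and the same entry is editable from LuCI afterwards:

```shell
# Hypothetical /etc/config/pbr entry: steer the laptop through the VPN.
config policy
        option name 'laptop_via_vpn'
        option src_addr '192.168.1.50'   # placeholder laptop IP
        option interface 'vpn'           # placeholder VPN interface name
```
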
I'd be curious if VPN use circumvents your ISP messing about with your traffic (assuming they do).
This would mean (because there can be only one PBR package active) dismantling my current mwan3 configuration - something that I wouldn't do without testing first on a separate router. The issue is not about routing the laptop through the VPN (that's easy), but about routing the IKEv2 VPN through LTE.
Many offer WireGuard though, don't they? Have you also considered a VPS? It'd make two-way measurement easy, but I'm reluctant to make the switch since NordVPN has been great for me.
Please don't distract me, otherwise I will never start the test. Unfortunately, this VPN provider makes the WireGuard option available only in their client app, which is already fishy enough.