OK, I will try creating an MR for adding hping3 later today.
Regarding the Lua-based approach: I did try it in the past on the old router (a TP-Link Archer C7 v2 on 21.02.3) and was not entirely satisfied, because I could not tune it to my expectations for this LTE connection: avoid multi-second bufferbloat (yes, 500 ms of delay is acceptable, but 10+ s is not, and that's what happens without SQM on bad days at 9 PM) and keep the bandwidth used to a minimum. I could not find a configuration option for lowering the frequency of ICMP packets. Additionally, my current use case involves using this LTE connection as a backup, so there is absolutely no way to build a relevant database of prior working results, which is what the Lua-based script requires:
After a while, sqm-autorate builds a profile of good speeds to use and the speed finding should stabilise. On very variable connections, this algorithmic stabilisation should complete within 30-90 minutes.
That confirms my suspicion that there simply weren't any ping peers considered good enough to use. I will make sure that condition does not lead to awk blowing up, but concerning your situation in particular, it's quite dire, I think... could it be your ISP that somehow rate-limits ICMP? That's a thing that could be happening, given the horrible reliability the results seem to suggest.
If you ping a host like 1.1.1.1 or 8.8.8.8 without retries while your Internet uplink is otherwise quiet, how many ICMP ECHO requests go unanswered, roughly?
@patrakov I start to think you need another reflector type than ICMP, something your ISP does not throttle/block, so UDP or TCP.
And you need some directional information about where the congestion happens. Trying to deduce this from the load percentages simply is not possible in general; it might be possible to come up with a bearable heuristic for specific cases, but I am out of ideas that I cannot immediately poke large holes into.
If you ping a host like 1.1.1.1 or 8.8.8.8 without retries while your Internet uplink is otherwise quiet, how many ICMP ECHO requests go unanswered, roughly?
0% (tried sending 100 pings to both, with 1s interval).
I think you have just overloaded my link with the test.
@moeller0 not sure if I need another reflector type. The test pings many hosts at once with a low interval.
Well, using a probe type that might result in your ISP rate-limiting that probe type, or worse, seems suboptimal.
Do you happen to have a server available somewhere on the internet you could run services on?
I could rent a VPS from DigitalOcean. But why? The problem is with the script that performs the auto-selection of reflectors. Supply a few known working ones (9.9.9.9, 94.140.14.140, 74.82.42.42, 8.26.56.26), done.
With a reflector under your own control, you could switch away from ICMP to avoid your ISP's rate limiting, and might even switch to NTP to get reliable OWDs to solve your directionality problem.
Public NTP servers seem to consider query intervals smaller than 15 seconds abuse, so going the NTP route seems to require hammering only your own infrastructure.
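To illustrate why NTP helps with the directionality problem: an NTP exchange carries four timestamps (client transmit, server receive, server transmit, client receive), so the client can split the round trip into per-direction delays. A minimal sketch of that calculation (toy values; the absolute numbers still include the unknown client-server clock offset):

```python
def ntp_owds(t1, t2, t3, t4):
    """Split an NTP exchange into per-direction delays.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    The absolute values include the unknown client-server clock
    offset (with opposite signs), but their changes over time
    track per-direction queueing delay, which is what an
    autorate controller needs.
    """
    return t2 - t1, t4 - t3

# Toy example with integer-second values:
up, down = ntp_owds(100, 150, 160, 210)
print(up, down)  # 50 50
```

With ICMP echo you only ever get t4 - t1, the round-trip time, so upstream and downstream congestion are indistinguishable.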
Still, why? The reflectors that I pasted work for ICMP type 13 messages. Regarding the rate limiting, look again - all three LTE ISPs are so bad during evenings that the target latency has to be of the order of 500 ms. There is absolutely no need for frequent pings that would trigger the rate limits imposed. Think of my setup as a 10x slowed version of someone else's setup.
Why? Because getting reliable delay measurements is at the core of the method, and you are already close to being rate-limited by your ISP. Are you sure the limit is per-user? Otherwise, what happens if your neighbor starts an mtr session? Now, clearly this is your choice, but I would look hard at changing the delay measurement provider to something less fickle.
That said, "running my own NTP server in a nearby data center and exclusively querying that" is exactly what I would do if I had a variable-rate link. My network sees occasional bi-directional crunch, so I would prefer to control up- and downlink rates independently, and with my own reflector I would not rely on (ab)using other people's servers (as these might change their willingness to respond in a pinch). Why NTP? Essentially because it seems to be the only OWD option that works over IPv6 and is a well-established protocol that ISPs should not filter willy-nilly.
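For what it's worth, serving NTP from a rented VPS needs very little: a stock chrony install plus an allow rule for the querying router. A sketch, assuming a Debian-style layout (the client address below is a documentation-range placeholder, not a real value):

```
# /etc/chrony/chrony.conf fragment (sketch)
pool pool.ntp.org iburst   # keep the VPS itself synchronized upstream
allow 203.0.113.7          # placeholder: your router's public IP
```

Restricting `allow` to your own router also sidesteps the abuse concern above, since nobody else can hammer the server.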
@patrakov in my case perversely using a VPN helps because my 4G ISP throttles and VPN circumvents that. Have you tried using one of the free services like Cloudflare WARP e.g. using their android app on phone connected via WiFi just to see what happens?
Sorry, I have a bug installing lua_32bit. I don't know why (TurrisOS is based on OpenWrt 19.x), and I have no time to debug now; it's a big day for Chile, voting for its future, so I'm a bit out of the digital world these days.
No idea; for NTP, however, this interval was documented somewhere... one of the reasons I argued for interleaving probes to different reflectors, even though achieving a 200 ms inter-probe interval with a 15-second per-reflector interval results in quite a number of parallel NTP queries...
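For concreteness, the back-of-the-envelope count behind "quite a number":

```python
# One probe leaving every 200 ms overall, while each individual NTP
# server is queried at most once every 15 s, means the probe stream
# must be spread across this many servers:
inter_probe = 0.2    # s, spacing between consecutive probes overall
intra_probe = 15.0   # s, minimum query interval per NTP server
reflectors = round(intra_probe / inter_probe)
print(reflectors)  # 75 parallel NTP reflectors
```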
Hence my 'plan' of using a single NTP reflector under my own control.
I was wondering - is there an established, maybe even RFC-defined, Layer 7 protocol for exactly this kind of purpose, i.e. determining the latency between peers in both directions? So, not as a convenient side effect of ICMP or NTP features or the like, but some sort of high-precision timestamping service that was designed for the kind of measurement we are trying to perform?
IMHO there are a few RFCs about OWD measurement approaches (e.g. OWAMP, RFC 4656, and TWAMP, RFC 5357), but the ones I looked at recommended using NTP to synchronize clocks anyway, at which point I realized that NTP itself fits our needs exceptionally well, except for the lack of a large deployed base willing to tolerate high-frequency queries; the alternatives, short of ICMP timestamp requests, suffer from the same problem.
One alternative, if we want 'not-NTP', would be irtt, but that has essentially zero public servers deployed.
I am hopeful @patrakov can facilitate making hping3 an official package. If so, I'll happily make a new branch for that and switch over if it works well for everyone (and I imagine it would, from our earlier testing).
But the (valid) operational concerns of NTP hosts somewhat contradict that view, don't you agree?
I was thinking that today's very efficient socket server APIs (epoll, io_uring) would allow a lightweight UDP-based (or maybe even better, TCP-based) protocol to be implemented that shuttles accurate, high-precision timestamp information between a server and legions of clients, with an emphasis on low latency and low computational effort, and perhaps only refers to NTP offsets, assuming a shared NTP-based "time truth" available on both hosts, to make the stamp values comparable...
I'll have to think some more about this, but it sure seems like a fun weekend project.
Ah, I had missed your remark concerning irtt (https://github.com/heistp/irtt)! Maybe that's what I just described? Going to check it out.