With:
while read -r -u "${pinger_fds[pinger]}" timestamp reflector seq _ _ _ _ _ dl_owd_ms ul_owd_ms
Timecourse:
Raw CDFs:
What are we seeing here(?):
... not enough saturation?
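As an aside, the field splitting that this read performs can be reproduced in isolation. The sample line below is invented, and the field layout is an assumption based on the variable names rather than confirmed tsping output:

```shell
#!/bin/bash
# Hypothetical tsping output line; fields 4-8 are discarded via "_".
line="1678127680.882 9.9.9.9 13196 x x x x x 12.5 3.1"

# Same word splitting as the loop's read, fed from a here-string
# instead of the pinger file descriptor.
read -r timestamp reflector seq _ _ _ _ _ dl_owd_ms ul_owd_ms <<< "$line"

echo "reflector=$reflector seq=$seq dl=$dl_owd_ms ul=$ul_owd_ms"
```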
There still seems to be something wrong: during downloads, both OWDs seem to go up, which is not what we expect.
We are still pulling the UL shaper down even though there seems to be no upload data flowing during the download tests. Yes, we see increased UL OWDs during the download, but I wonder: are these real, or do we have some piece of code that confuses UL and DL somewhere?
Unless I messed up again, no
I did change the order of the default output to match "-m" mode, but the "-m" format itself should be the same.
BTW, here is a link:
to a response by Apple's Stuart Cheshire addressing the TCP/UDP question we were partly discussing. I feel I did not make the point I wanted to make as eloquently and clearly as Stuart does (no surprise there), and I hope that his argument makes things clearer, including why Discord seems to be to blame here (not that that helps; if autorate can grow a config option that accommodates Discord in situations like this, it should).
@Lynx Quick question, have you handled the midnight rollover problem for OWDs in your code?
I recall that was something we ran into in the Lua effort
Not explicitly. Perhaps it's wishful thinking to hope that the existing baseline tracking and working with deltas will cope with that. @moeller0? Since tsping outputs down and up OWDs, what will those look like with the midnight rollover?
I am sure this will need a little care, but let's postpone tackling that part until we have tsping working well otherwise. By virtue of being close to the UTC time zone, Europeans are unlikely to suffer from this issue immediately. I am not saying to ignore this for good; just ignore it for now and return to it once the rest of the tsping interaction works smoothly.
Note the issue is two fold:
a) the simple problem of the timestamp cycling back to zero when 86400000 would be expected, resulting in off measurements with our baseline tracking. E.g. 0 - 86400000 = -86400000, which is certainly an unexpectedly large offset that would throw our baseline tracking off course. But that should be relatively easy to ignore
b) the other problem is when the clocks of the two endpoints are badly synchronized, so the timestamps do not "flip" over close in time but with considerable delay; in that case the offset (our baseline) will change, and we would need to update our baseline estimate to account for that... but even that should be easy to detect; after all, we roughly know which reported timestamp range is potentially problematic...
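Handling (a) can be sketched in a few lines of bash: fold any raw timestamp difference back into a half-day window, so a delta taken across the midnight wrap comes out right. This is a sketch only; the function name is invented, and 86400000 ms is the ICMP timestamp wrap period:

```shell
#!/bin/bash
# ICMP timestamps are milliseconds since midnight UTC and wrap at
# 86400000, so a raw difference taken across midnight is off by a whole
# day. Folding the delta into (-43200000, 43200000] recovers the truth.
wrap_delta() {
    local d=$(( $1 - $2 ))                       # raw, possibly wrapped
    (( d >  43200000 ))  && d=$(( d - 86400000 ))
    (( d <= -43200000 )) && d=$(( d + 86400000 ))
    echo "$d"
}

# Just after midnight: local ts 15 ms, echoed remote ts 86399995 ms.
# The true delta is +20 ms, not -86399980 ms.
wrap_delta 15 86399995
```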
If the raw timestamps aren't being corrected you end up with OWD values that are offset by 86400000 milliseconds in one or both directions, depending on exactly when the reflector's clock resets relative to yours. I deal with four different scenarios:
I handle this in my Perl implementation by detecting when an OWD value indicates that the reflector's offset (i.e. the relative difference between the reflector's clock and ours) has changed by more than the configured ICMP timeout. When that happens, I check the values and add 86400000 to the appropriate raw timestamp(s) to try to fix them.
Here's an example from my logs:
Mon Mar 6 23:54:40.882 2023 [12126-0004824721]: ICMP DEBUG: RECEIVE: WARNING: recv timestamp for "151.80.6.68 34725 13196" too small after applying offet of -441822. Attempting to correct...
Mon Mar 6 23:54:40.883 2023 [12126-0004824721]: ICMP DEBUG: RECEIVE: WARNING: Local and/or remote timer reset detected and corrected for "151.80.6.68 34725 13196":
Mon Mar 6 23:54:40.883 2023 [12126-0004824721]: ICMP DEBUG: RECEIVE: Before: ip=151.80.6.68 orig=86080862 recv=122697 tran=122697 end=86080882 ul_time=-85958165 dl_time=85958185 rtt=20
Mon Mar 6 23:54:40.883 2023 [12126-0004824721]: ICMP DEBUG: RECEIVE: After: ip=151.80.6.68 orig=86080862 recv=86522697 tran=86522697 end=86080882 ul_time=13 dl_time=7 rtt=20
But I work with absolute OWD values and detect the problem when the reflector's offset changes too much. I'm not sure how you'd detect the problem in the bash implementation.
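For what it's worth, that detection could be mirrored in bash with plain integer arithmetic. The sketch below follows the offset-jump idea described above; the function name, the `icmp_timeout_ms` value, and the stored previous offset are all hypothetical, and the usage numbers are taken from the log excerpt:

```shell
#!/bin/bash
# Detect a suspicious jump in the per-reflector clock offset and repair
# the raw receive timestamp. day_ms is the ICMP timestamp wrap period;
# icmp_timeout_ms is an assumed configuration value.
day_ms=86400000
icmp_timeout_ms=1000

# correct_recv ORIG_MS RECV_MS PREV_OFFSET_MS -> corrected RECV_MS
correct_recv() {
    local orig=$1 recv=$2 prev_offset=$3
    local offset=$(( recv - orig ))
    local jump=$(( offset - prev_offset ))
    (( jump < 0 )) && jump=$(( -jump ))
    if (( jump > icmp_timeout_ms )); then
        # Offset moved by roughly a day: assume the smaller raw
        # timestamp wrapped past midnight and undo the wrap.
        if (( recv < orig )); then
            recv=$(( recv + day_ms ))
        fi
    fi
    echo "$recv"
}

# Numbers from the log above: orig=86080862, recv=122697, previous
# offset -441822; the corrected recv should come out as 86522697.
correct_recv 86080862 122697 -441822
```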
Thanks, yes that is helpful.
I guess (3) and (4) could be dealt with primarily by ignoring such samples, but that really just reduces these cases to special cases of (1) and (2), namely that our baseline estimates change drastically. If they get smaller, we currently should deal with that quickly; but if the apparent baseline increases, we will take a while to catch up (which will trade off some throughput but should keep latency fine, though it might result in an extended epoch close to the minimum rates). But I have not actually looked at this closely enough to have more than a hunch and only half-digested information from your post
Yeah, (1) and (2) are much more common than (3) and (4). I don't think I've ever seen (4) actually happen because the time window is so small, but it is theoretically possible.
@patrakov can you remind me why:
root@OpenWrt-1:~# cat /root/cake-autorate/test.sh
#!/bin/bash
while read -r line
do
echo "$line"
done < <(ping 1.1.1.1)
kills ping process just fine on exit, but when run from:
root@OpenWrt-1:~# cat /etc/init.d/try
#!/bin/sh /etc/rc.common
START=97
STOP=4
USE_PROCD=1
start_service() {
procd_open_instance
procd_set_param command "/root/cake-autorate/test.sh"
procd_close_instance
}
does not?
I think it's related to procd/systemd interfering with normal process management and things like SIGPIPE?
According to my tests, there are three processes involved: test.sh (parent), test.sh (child), and ping. The child seems to be a subshell, but I don't know why it appears here. Anyway, all it does is wait for the ping process to finish.
procd sends the SIGTERM signal only to the parent process, which indeed terminates. Therefore, the pipe between it and the ping process gets severed at the reader (receiver) end. ping tries to write something there and, because it is writing to a pipe whose other end is closed, it gets a SIGPIPE.
Without procd, this signal terminates ping, and then the child test.sh process finishes waiting for ping to exit and exits itself, leaving nothing behind.
With procd, SIGPIPE is ignored, so ping doesn't die, and the child test.sh process continues waiting, in vain. Therefore, two processes remain.
Bad news: in bash, there is no way to restore the "normal" handling of SIGPIPE; this is even documented in the manual page:
Signals ignored upon entry to the shell cannot be trapped or reset.
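The inherited, unrestorable SIGPIPE disposition is easy to demonstrate outside procd. In this sketch, exit status 141 is 128 + 13, i.e. death by SIGPIPE, read via bash's PIPESTATUS:

```shell
#!/bin/bash
# With the default disposition, the writer in a pipeline is killed by
# SIGPIPE once the reader exits: exit status 128 + 13 = 141.
bash -c 'yes 2>/dev/null | head -n 1 >/dev/null; echo "default: ${PIPESTATUS[0]}"'

# Ignore SIGPIPE first (as procd does), inside a subshell so the rest
# of this script is unaffected, then start a child shell there. The
# child inherits SIG_IGN and cannot reset it, so yes now survives the
# broken pipe and exits with a write error instead of dying.
( trap '' PIPE
  bash -c 'yes 2>/dev/null | head -n 1 >/dev/null; echo "ignored: ${PIPESTATUS[0]}"' )
```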
Terrific analysis! Is there by any chance a way to disable the ignoring of SIGPIPE in procd or systemd?
In any case, it seems that the safest option is to keep retaining the PID and explicitly killing it, even though that is less elegant than the pipe teardown mechanism.
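A minimal sketch of that retained-PID approach, with `sleep` standing in for the real pinger binary and all names invented:

```shell
#!/bin/bash
# Keep the child's PID and reap it from an EXIT trap, so shutdown does
# not depend on SIGPIPE delivery (which procd disables).
sleep 1000 &                 # stand-in for the pinger process
pinger_pid=$!

cleanup() {
    kill "$pinger_pid" 2>/dev/null
    wait "$pinger_pid" 2>/dev/null
}
trap cleanup EXIT INT TERM

# ... main loop would run here; every exit path goes through cleanup ...
```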
Please ignore systemd. It is not relevant to OpenWrt.
And I have checked the code of procd, and can confirm that it sets SIGPIPE to ignored unconditionally.
This commit seemed desirable:
... but hugely problematic for the reasons you have identified.
No way to disable the ignoring of SIGPIPE in procd then? Even if there were to be, perhaps relying on that would be foolhardy.
But you can now encapsulate the whole pinger PID locally inside the parser process, and nobody else needs to know; you just keep a single PID. Seems easy enough, and it avoids the need for elaborate process management...
The main loop only needs the maintain and logger PIDs plus the parser PIDs, and each parser handles a single binary... seems pretty clean to me, clean enough to not need process management...
I think we can take some inspiration from mwan3: it is also implemented in shell, runs ping and other subprocesses, and is forced to run in an environment where SIGPIPE does not work (which is, in my opinion, an unsupported environment for any shell).
It's been a while since I posted in this thread. Life and all that jazz. I have been using the new bash implementation ever since the Lua version stopped working, for reasons already explained. As a user seeking solutions, it's a luxury to have these options. As far as I can determine, the recent bash version does seem to do what I want it to do. But you guys need data. I would love to serve up some here next weekend.
Hello.
It would be great if compatibility with other Linux distros could be preserved, as it seems to be the case so far in my x86 Debian setup.
But please, I don't mean to be disrespectful or abusive in any way, asking something like this in an ... OpenWrt forum!
Of course you are OpenWrt devs, so your main goal is to produce OpenWrt-tailored code.
Anyway, thanks for the good work so far, and thanks for not kicking me out even if I'm not an OpenWrt user. (I still have OpenWrt on a wifi access point device, though.)
If that is true, then there is our path forward: trap that TERM and then initiate a proper staged shutdown, preferably by first asking each process nicely to shut itself down gracefully, followed by a SIGKILL (and I mean -9 here; the time to ask politely is when sending the shutdown request). The "talk softly, but carry a big stick" school of process management, if you will.
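That staged shutdown could be sketched as follows (the function name and grace period are invented for illustration):

```shell
#!/bin/bash
# Ask politely with SIGTERM, wait a short grace period, then escalate
# to SIGKILL for anything still alive.
graceful_kill() {
    local pid=$1 grace_ds=${2:-20}     # grace period in tenths of a second
    kill -TERM "$pid" 2>/dev/null || return 0
    local i
    for (( i = 0; i < grace_ds; i++ )); do
        kill -0 "$pid" 2>/dev/null || return 0   # it went away politely
        sleep 0.1
    done
    kill -KILL "$pid" 2>/dev/null                # the big stick
}

# Example: a process that ignores SIGTERM only goes away on the SIGKILL.
bash -c 'trap "" TERM; exec sleep 1000' &
graceful_kill $! 5
```

One caveat: if `$pid` is an unreaped direct child of the caller, `kill -0` keeps succeeding on the zombie, so the polite path burns the full grace period; a fuller implementation would also `wait` on the child while polling.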