Yes, this is a MediaTek MT7621. irqbalance is enabled, although I've tried it both ways and the results seem similar. It doesn't seem like the CPU governor can be changed on this CPU, unless I'm missing something; it shows as set to "teo".
I tried setting 'adjust_dl_shaper_rate' and 'adjust_ul_shaper_rate' both to 0, but that doesn't change the CPU much. I also tried disabling logging, which perhaps gave a small improvement, but not much.
It looks like the trouble is with the frequency of the pingers: if I set "reflector_ping_interval_s=10.0" (just for testing, obviously), then the CPU load is significantly reduced.
I'm not sure that the CPU utilization of cake-autorate is actually scaling with the download rate but it might just be adding on to the router's own CPU utilization for cake itself which does scale with the download rate. When my network is quiet but pingers are not idle then the CPU utilization on all 4 cores is around 30%. When cake-autorate is stopped it is about 1%. So cake-autorate just doing the pingers takes about 30% and then when I do the download cake itself adds 20-40%, and with process swapping that probably can explain my results.
So then I guess the question is whether I should adjust the ping frequency. My performance is certainly acceptable, but it isn't ideal to have the router's CPU bump up around its maximum, as that in and of itself could probably introduce latency.
I've just implemented the switch over to the sum of OWD baselines with this commit:
New REFLECTOR header and example data:
```
REFLECTOR_HEADER; LOG_DATETIME; LOG_TIMESTAMP; PROC_TIME_US; REFLECTOR; MIN_SUM_OWD_BASELINES_US; SUM_OWD_BASELINES_US; SUM_OWD_BASELINES_DELTA_US; SUM_OWD_BASELINES_DELTA_THR_US; MIN_DL_DELTA_EWMA_US; DL_DELTA_EWMA_US; DL_DELTA_EWMA_DELTA_US; DL_DELTA_EWMA_DELTA_THR; MIN_UL_DELTA_EWMA_US; UL_DELTA_EWMA_US; UL_DELTA_EWMA_DELTA_US; UL_DELTA_EWMA_DELTA_THR
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 9.9.9.9; 45151; 45164; 13; 20000; 553; 553; 0; 10000; 7955; 8860; 905; 10000
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 9.9.9.10; 45151; 45151; 0; 20000; 553; 830; 277; 10000; 7955; 7955; 0; 10000
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 9.9.9.11; 45151; 45516; 365; 20000; 553; 572; 19; 10000; 7955; 8669; 714; 10000
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 94.140.14.15; 45151; 51597; 6446; 20000; 553; 1648; 1095; 10000; 7955; 10885; 2930; 10000
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 94.140.14.140; 45151; 51709; 6558; 20000; 553; 1068; 515; 10000; 7955; 8512; 557; 10000
REFLECTOR; 2023-05-09-19:37:32; 1683657452; 1683657452; 94.140.14.141; 45151; 52180; 7029; 20000; 553; 1567; 1014; 10000; 7955; 9494; 1539; 10000
```
@gba it seems apt to test with a reduced ping frequency. You could simply reduce the default number of pingers from 6 to 3 or keep them the same and increase the interval from 0.3s to 0.6s.
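For illustration, either change can be expressed in the cake-autorate config file like this (variable names are from the project; the values and the rate arithmetic in the comments are illustrative and assume one ping per pinger per interval):

```shell
# Hypothetical sketch: halve the total ping load either way.
# 6 pingers at one ping per 0.3s each ≈ 20 pings/s;
# 3 pingers @ 0.3s, or 6 pingers @ 0.6s ≈ 10 pings/s.
no_pingers=3                    # default: 6
reflector_ping_interval_s=0.6   # default: 0.3
```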
These MIPSen are getting long in the tooth... sure, it's a dual core with 4 threads, but each individual thread simply is not that powerful. It shows that the MIPS cores themselves have not seen major rearchitecting in the last decade; IIUC the 1004K is really only a little more than a 24K with SMT and SMP.
Yes, I guess that is to be expected from your CPU. I am amazed that cake only adds 40%; I would have expected cake @100Mbps to fully max out one CPU...
I would consider switching routers (I understand that that might not be a realistic option).
BTW, could you do a test with the old version again just to check whether current cake-autorate got leaner on your hardware as well?
# tsping employs ICMP type 13 and works with timestamps: Originate, Received, Transmit, Finished such that:
# dl_owd_us = Finished - Transmit
# ul_owd_us = Received - Originate
#
# The timestamps relate to milliseconds past midnight and hence timestamps can rollover at the local or remote ends,
# and the rollover may not be entirely synchronized.
#
# Such an event would result in a huge spike in dl_owd_us or ul_owd_us and a large delta relative to the baseline.
#
# So, to compensate, check for delta > 50 mins and immediately reset the baselines to the new dl_owd_us and ul_owd_us.
#
# Happily, the sum of dl_owd_baseline_us and ul_owd_baseline_us will roughly equal rtt_baseline_us.
# And since Transmit is approximately equal to Received, RTT is approximately equal to Finished - Originate.
# And thus the sum of dl_owd_baseline_us and ul_owd_baseline_us should not be affected.
# Therefore in maintain_pingers(), working with the sum of dl_owd_baseline_us and ul_owd_baseline_us is
# unaffected by the above-described rollover compensation.
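A minimal standalone sketch of the compensation logic described in those comments (the variable values, the function name `update_baseline`, and the EWMA alpha of 1/16 are all hypothetical; the real logic lives in cake-autorate itself):

```shell
#!/bin/bash
# ICMP type 13 timestamps are milliseconds past midnight, so they wrap once
# a day, and a rollover at either end shows up as a huge jump in measured OWD.
# Sketch: if the delta versus baseline exceeds 50 min, hard-reset the baseline
# to the new reading instead of letting a gentle update slowly chase it.
rollover_thr_us=$(( 50 * 60 * 1000000 ))   # 50 minutes in microseconds

update_baseline() {
	local owd_us=${1} baseline_us=${2}
	local delta_us=$(( owd_us - baseline_us ))
	if (( delta_us > rollover_thr_us || delta_us < -rollover_thr_us )); then
		echo "${owd_us}"                     # rollover detected: hard reset
	else
		echo $(( baseline_us + delta_us / 16 ))   # gentle EWMA-style update
	fi
}

update_baseline 45164 45151                         # small delta: gentle update
update_baseline $(( 86400000000 + 45164 )) 45151    # rollover: hard reset
```

Because the reset applies symmetrically to dl and ul, the sum of the two baselines stays near rtt_baseline_us, which is why the reflector recycling check on the sum is unaffected.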
@rany made the point on GitHub that I should really comment these changes because in the future we might forget what on earth they relate to.
"past midnight UTC" so conceptually the clocks should be reasonably in sync...
However, in reality it seems not uncommon to return local time past midnight, so roll-overs can truly happen any time... (nobody is going to fix their ICMP type 13/14 responses no matter what is documented, so we should accept what we get and not be overly fussy).
Maybe add "... for the reflector recycling check..." or similar?
Yes, adding meaningful commit messages is a great idea
- read from the fd pointing to an anonymous FIFO for that process, with timeout set based on the time until the next tick
- process any IPC command sent from other processes
- on timeout, process the regular commands for that tick
This suffered from a significant flaw in that a timeout could result in a partial read, and hence data loss, in a command sent from one process to another. Mostly that's fine, since the data is non-critical, but some data, like CHANGE_STATE, is critical, e.g. to start the pingers*.
Another flaw relates to timing, given the use of timeouts: I'm not sure that the tick intervals were always correct.
This allows a determination to be made as to whether there is data in the FIFO ready to be read.
Making use of this, I propose the following new sequence for each loop:
- read with a timeout of 0 to see whether data exists in the FIFO
- whilst data exists in the FIFO, read it in and process each outstanding IPC command
- once data no longer exists in the FIFO, process the regular commands for that tick
- sleep until the next tick
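In bash the data-availability check can be done with `read -t 0`, which returns success if and only if input is waiting on the descriptor, without consuming anything. A self-contained sketch of one iteration of the proposed loop (the FIFO path and the CHANGE_STATE command line are made up for illustration; atomicity of small FIFO writes is what guarantees a whole line is there once the poll succeeds):

```shell
#!/bin/bash
fifo=$(mktemp -u)
mkfifo "${fifo}"
exec 3<>"${fifo}"     # open read-write so opening/reading never blocks on no writer

printf 'CHANGE_STATE RUNNING\n' >&3    # pretend another process sent an IPC command

while read -r -t 0 -u 3; do    # poll: succeeds only if data is waiting
	read -r -u 3 command       # safe now: a complete line is available
	echo "IPC: ${command}"     # process each outstanding IPC command
done
echo "tick"                    # then do the regular per-tick work
# a sleep until the next tick would follow here

exec 3>&-
rm -f "${fifo}"
```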
This modified approach (which I am testing here) seems to benefit from slightly reduced CPU use as well.
*) This could result in a bad situation in which, when moving from IDLE to RUNNING, the pingers didn't start and the code then went into STALL. During STALL we just wait for new ping responses or increased load. Increased load would transition from STALL to RUNNING, but then just STALL again given no ping responses.
I also just realised that if I kill 'tsping' and then start it again, I can end up with a partial line write from a previous line write getting mixed in with a new line write, e.g. see double timestamp in the following:
if we can detect a partial read the easiest might be to follow this up with a read of the rest of the line (after all FIFO read and writes are supposed to be atomic) and just merge things?
But I'm also thinking we can alternatively kill parse_preprocessor and start a new instance up:
You see what happens is that we read in a partial line from the old killed tsping instance, then read in a new line from the new instance, and end up with:
But if we shutdown the parse_preprocessor from the old tsping instance, that partial line is lost, then we can start it up afresh for the new tsping instance.
which happened to fix this partial read issue in respect of fping lines by skipping over such mixed up lines.
But probably the safest thing to do is to wipe the slate clean properly and start up a new parse_preprocessor instance:
I'm hopeful that these recent commits, which address the similar but separate partial reads issues in respect of both IPC commands and tsping lines, will render cake-autorate with tsping stable. Time will tell!
Should we restart the pingers, just once, on a stall timeout of 10s? At the moment we just leave the pinger instance(s) running on stall and hope for the best. But if an interface went down and later came back up, might a restart not be a good idea? Say, for example, the interface went down for a day. The first restart, on the initial one-off 10s timeout, would do nothing. But a second restart - triggered when the stall ends on data transfer despite no reflector responses, followed by a fresh stall and another 10s timeout - would then work?
@patrakov do you know if ping or fping is running and an interface goes down and later back up, does the pinger binary need restarting or can we rely on the old one?
So has this issue appeared with normal messages or only with killing tsping?
If the latter we might ask @lochnair to have tsping synthesise and print a complete record on receiving a TERM signal?
Other than that, I think we should put an end marker at the last position of each record so detection of incomplete records becomes easy?
I am not sure because with fping we skipped over odd lines given regex check.
@Lochnair would it be easy enough to ensure tsping prints a complete line before it is killed?
Does anyone have any more thoughts on the points @patrakov raised about the timings:
Any thoughts on this?
Maybe at the very least maintain_pingers() should check that the pinger binary/binaries are actually running on the 10s stall or even as soon as the stall is identified.
Fair enough, let's implement my idea of adding a record-unique end-marker as the last value per record (or, if the first field is a timestamp, simply repeat that same timestamp as the last field).
Because conceptually the atomicity of FIFO writes and reads should prevent that situation from ever happening...
Not for SIGKILL, but likely for SIGTERM...
I think I tested that with fping over pppoe reconnects and things just continued to work...
But generally I defer to @patrakov for the stall handling since he is actually experiencing them (and as much as I dislike the ad-hocery of the stall handling I think we should handle situations like his gracefully)...
The "check if actually running" seems to be a good idea... except I would make parse_pinger do this, as parse_pinger knows the PID and the exact command line the pinger was started with, so it could check very selectively whether the pinger it started is actually still running...
For now I've just opted for the simple option of restarting pingers on a global ping response timeout:
The recent adjustments should preclude any lost IPC commands and deal with partial tsping writes (which are just discarded).
Please can anyone test the main branch with tsping now? There have been many fixes and improvements based on my own testing. And things seem like they might be stable now with tsping.
@rb1 any chance that you could update to the latest version on 'master' and test?
@moeller0 any chance you could test on your system?
Based on the discussion here, I see that we can launch individual instances from the procd script directly and bypass the separate launcher script. The procd service status is bugged right now, but @jow has worked on a fix. Hopefully this will make it into 22.03 or at least future versions of OpenWrt.
Instead of actually checking for incomplete records? I would guess these two things to be complementary: keeping it simple AND checking for correctness.
I thought that would be an easy way: just sandwich the data between the same timestamp and then, on reading a record, check that both timestamp fields have identical content.
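A sketch of that record-integrity check in pure bash (the field layout and the function name `record_complete` are invented for illustration): the writer emits the same timestamp as the first and last field, and the reader discards any record where the two differ, since a partial write truncates the record before the closing copy.

```shell
#!/bin/bash
# Returns 0 (complete) iff the first and last semicolon-separated fields match.
record_complete() {
	local record=${1}
	local first=${record%%;*}    # strip everything after the first ';'
	local last=${record##*;}     # strip everything before the last ';'
	[[ "${first}" == "${last}" ]]
}

good='1683657452;9.9.9.9;45151;13;1683657452'
bad='1683657452;9.9.9.9;45151;13;16836'      # truncated by a partial write

record_complete "${good}" && echo "keep"
record_complete "${bad}"  || echo "discard"
```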