SQM doesn't play nicely together with Stadia?

Sure, like actually keeping a maximum counter that really counts the maximum, and for example gets reset to 0 on reading? That at least was my naive expectation...
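To make that expectation concrete, here is a minimal sketch of a maximum that resets on read. It is a plain Python model and not the actual qdisc code; the class and method names are made up for illustration.

```python
class ResettableMax:
    """Largest delay seen since the last read (hypothetical helper)."""

    def __init__(self):
        self.max_delay = 0

    def record(self, delay_us):
        # One comparison per packet, no division or floating point.
        if delay_us > self.max_delay:
            self.max_delay = delay_us

    def read_and_reset(self):
        # "Consuming" read: report the maximum, then start a new window.
        value = self.max_delay
        self.max_delay = 0
        return value
```

Each read then covers only the interval since the previous read, so the reset happens whenever somebody polls the statistics.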

IMHO the current code is used because it is computationally cheap: addition, subtraction, and a few shifts are something even a lowly MIPS CPU can afford for non-essential things like statistics (these are only collected for humans to reason over). That said, I still think a true resettable maximum would be good to have (and the reset does not need to be super precise temporally). I guess I will need to understand how tc -s actually reads these values to see whether a "consuming" read for max_delay is easy to implement or not.
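For readers wondering what "addition, subtraction and a few shifts" might look like, here is a rough sketch. The thread does not quote the real code, so this assumes a shift-based moving average that rises quickly and decays slowly; the function names and the shift constants 2 and 8 are assumptions for illustration only.

```python
def ewma(avg, sample, shift):
    # Approximately avg += (sample - avg) / 2**shift, using only
    # subtraction, addition and right shifts on integers.
    avg -= avg >> shift
    avg += sample >> shift
    return avg


def update_peak(peak, delay):
    # Assumed behaviour: follow increases quickly (shift 2) and
    # decay slowly (shift 8). The constants are illustrative.
    return ewma(peak, delay, 2 if delay > peak else 8)
```

With these assumed constants a single 400 us spike starting from 0 only lifts the value to 100, which is why such a "peak" is a smoothed estimate and not a true maximum.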

Max delay is probably not all that useful. These distributions are often long-tailed, so yes, one packet may have been delayed 300ms once since the last reset, but 99.9999% of them were delivered in under 13ms or so.

How about this, which is also inexpensive:

Keep a buffer of 20 delay values, split into a first and a second half. For every packet, check whether its delay should replace a value in the first 10 (that is, whether it is larger than the smallest value currently there). Every 10000 packets, copy the first 10 into the second 10 and zero out the first 10. When the stats are read, report the second 10: they are the 10 largest delays of the last full window, roughly the 99.90th to 99.99th percentiles (or just report the smallest, which is about the 99.9th percentile).
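The scheme above can be sketched roughly as follows. This is a Python model of the idea, not kernel code; the class name, method names and the two parameters (k for the half-buffer size, window for the packet count) are made up for illustration.

```python
class TailDelayTracker:
    """Keeps the k largest delays per window of packets (hypothetical)."""

    def __init__(self, k=10, window=10000):
        self.k = k
        self.window = window
        self.working = [0] * k    # "first half": top k of the current window
        self.snapshot = [0] * k   # "second half": top k of the last full window
        self.count = 0

    def record(self, delay):
        # Find the smallest working entry; replace it if this delay is larger.
        i = min(range(self.k), key=self.working.__getitem__)
        if delay > self.working[i]:
            self.working[i] = delay
        self.count += 1
        if self.count == self.window:
            # Window complete: publish the top k and start over.
            self.snapshot = list(self.working)
            self.working = [0] * self.k
            self.count = 0

    def read(self):
        # Ascending order: first entry is about the 99.9th percentile
        # for k=10 and window=10000, last entry is the window maximum.
        return sorted(self.snapshot)
```

The per-packet cost is a scan over k values and one comparison, and a read never disturbs the window being collected, since it only looks at the snapshot.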