Microlags with SQM

dlakelan · March 19, 2019, 10:11am

So it sounds like the local loop is frequently congested in the upstream direction.

It occurs to me that the modem may do a better job of scheduling time slices on the DOCSIS side when the modem has a buffer and the congestion control algorithm is engaged. When SQM is keeping it buffer free it may think "gee, nothing much going on I'll leave some time slices for others" ...

moeller0 · March 19, 2019, 10:55am

Ooops, you are right, I overlooked that.

That would be a failure in PIE/DOCSIS, as it would mean that traffic below full utilization has issues. With more and more cable ISPs offering 1 Gbps plans, I assume that even without a shaper in front of the modem, full utilization will not be that common.

dlakelan · March 19, 2019, 11:16am

My impression of most cable companies is they think like TV companies: "consumers consume, they don't produce"... Cable cos spend all their effort on pushing things to the dwelling, and assume that upstream is just ACKs, so upstream congestion may be very common if they purposefully undersize it so as to offer more downstream bandwidth (so-called Gigabit, which is probably 1000/20). They usually market the downstream bandwidth and don't even mention upstream except in the fine print.

I'm glad to be able to drop my DOCSIS connection for FTTP.

Also, the way they offer those gigabit plans is by careful analysis of the traffic pattern so as to size the oversubscription rate to bring complaints down to a level they can handle with cheap offshore call center labor... The oversubscription rate is probably as high as 100/1, so they put 1000 people on a 10 gigabit backhaul and give them each a "gigabit" connection. All they need is that less than 1% are running speed tests simultaneously. But most of the time the modems aren't using very many channels; only under load does the modem negotiate a bunch of different frequencies to suck down bits. Channel management may be tied to congestion signals, such as queue length. It wouldn't surprise me if SQM interfered with this oversubscription / channel management, though I'm really just speculating here.
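
To make the arithmetic concrete, here is a rough Python sketch using only the illustrative numbers above (1000 subscribers, 1 Gbps plans, a 10 Gbps backhaul). These are guesses from the post, not measured ISP figures, and the variable names are made up for the example.

```python
# Back-of-the-envelope oversubscription arithmetic.
# All numbers are the speculative ones from the post, not real ISP data.
subscribers = 1000      # households sharing one backhaul
plan_gbps = 1.0         # advertised downstream rate per household
backhaul_gbps = 10.0    # shared backhaul capacity

sold_gbps = subscribers * plan_gbps
oversubscription = sold_gbps / backhaul_gbps

# How many subscribers can pull the full plan rate at once before the
# backhaul is saturated, and what fraction of the customer base that is.
max_full_rate_users = int(backhaul_gbps // plan_gbps)
max_fraction = max_full_rate_users / subscribers

print(f"oversubscription ratio: {oversubscription:.0f}:1")   # 100:1
print(f"full-rate users before saturation: {max_full_rate_users} ({max_fraction:.0%})")
```

So with these numbers the backhaul is full as soon as 10 of the 1000 subscribers (1%) run at line rate simultaneously.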

Remember the "pipe cleaner flow" from a year ago or so?

shm0 · March 19, 2019, 12:41pm

This is not a frame drop issue.
FPS is stable at 120.

For example, when switching weapons, there is a delay of about half a second, and the animation is sometimes screwed up, like skipping parts of the animation. That is usually caused by high ping or packet loss. This can also be observed in older Half-Life engine based games.
But the in-game net graph shows no packet loss/choke. Latency is around 24-35 ms.
In Diablo 3 there is also a delay in casting spells.
On the LTE connection this doesn't happen.

Yes, PIE is mandatory for DOCSIS 3.1.
But this is a DOCSIS 3.0 connection.
I have read that it is possible to have PIE AQM in DOCSIS 3.0 modems.
I don't know if my modem supports this.

dlakelan · March 19, 2019, 12:53pm

Out of curiosity, how does it perform if you run a moderate bandwidth flow at the same time, like watching a YouTube video at low, medium, or even high res on a separate box while playing? A steady load may cause the DOCSIS machinery to reserve you a set of channels and a time slot.

shm0 · March 19, 2019, 1:06pm

I haven't tested this yet.
I have a different problem now.
A technician was already here a few months back.
He installed an attenuator (in the upstream path).
A few days back we had a power loss here and the modem wasn't able to sync anymore.
I removed the attenuator and the modem synced again.
I tried with the attenuator again but no sync was possible.
Removing the end terminator and switching the cable from the modem to the outlet made the modem sync again.
Today new end terminators arrived. But with the end terminator installed, the modem connection values are worse than without the terminator.
For example: downstream SNR went from 39-40 dB to 37 dB.
And upstream power level went from 38 dBmV to 56 dBmV.
What is going on?
// edit
seems like the terminator was defective..

moeller0 · March 19, 2019, 1:18pm

Rather unlikely.

It seems odd to make these things depend on network communication (at least the animation could/should be done locally), but a half-second delay with a base RTT of ~30 ms seems rather odd.

dlakelan · March 19, 2019, 1:18pm

RF problems. In my experience the lines degrade with weather and so forth. We had to have a cable person out to change connectors on the pole at least once every 3-4 years due to corrosion etc. This is yet another reason to want fiber optics.

RF at hundreds of MHz or even GHz down a coax is basically subject to lots of issues, including reflections, attenuation, loss of shielding...

dlakelan · March 19, 2019, 1:23pm

Absolutely true, but the animation is trying to track a "game state" that is managed by the server. I could imagine the local state gets out of sync with the server, particularly if there are drops, and then there is a convergence time during which the engine doesn't know what to render... It's all I've got to explain why the network would be involved...

diizzy · March 19, 2019, 10:52pm

I honestly don't think this will boil down to some issue with SQM; rather, it shows that CS is starting to show its age and possibly running into compatibility issues, as there are a lot of odd "fixes" for CS:GO on various hardware. In general this does sound like a local issue rather than an SQM-related one, but I wish you the best of luck diagnosing the cause.

shm0 · March 21, 2019, 4:01am

Thanks. But this is nearly impossible to debug. It could be anything: congestion on the line, bad game servers, or even a problem on the router itself (WRT1200).
There is a patch for the mvebu platform to force all packets to go through one tx queue.
I also don't understand why this device has 8 tx/rx queues.
Even with the patch removed, most packets hit one queue anyway. And why are there 8 queues when there are only 2 CPUs?
Maybe it has something to do with that. I don't know.
But this is all a bit off topic sorry :<

//edit
Seems like it is indeed possible for NICs to support more queues than there are CPUs in the system.
There is also a qdisc that supports round-robin scheduling across multiple queues.
I will check that out and see what happens.
Then there is the thing with DSA...
Normally there are eth0 and eth1, one for each CPU, both with 8 queues.
But with DSA there is only one Ethernet interface...
//edit2
By default OpenWrt comes with fq_codel as the default qdisc.
On this device that results in mq as the root qdisc, with an fq_codel qdisc attached to each hardware queue.
I'm not quite sure what mq is supposed to do. How are packets distributed over the queues with mq?
According to this document: https://www.kernel.org/doc/Documentation/networking/multiqueue.txt

Currently two qdiscs are optimized for multiqueue devices. The first is the
default pfifo_fast qdisc. This qdisc supports one qdisc per hardware queue.
A new round-robin qdisc, sch_multiq also supports multiple hardware queues.

Attaching pfifo_fast as root qdisc results in only 1 queue.
I wonder if mq is only meant to be used with pfifo_fast?
That would be the Linux standard configuration.
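
On the question of how packets get spread over the queues: as far as I understand it, mq does no scheduling of its own. It only exposes the hardware tx queues as classes, and the queue is picked earlier in the stack, commonly from a hash of the flow (or a driver-specific choice). The following is a toy Python model of hash-based selection, not kernel code. The hash function, the flow tuples, and the names are all assumptions for illustration. It shows why a single flow would stay on one queue while many flows spread out.

```python
import zlib
from collections import Counter

NUM_TX_QUEUES = 8  # the device in this thread reportedly exposes 8 tx queues


def pick_tx_queue(flow, num_queues=NUM_TX_QUEUES):
    """Toy stand-in for hash-based tx queue selection.

    The same flow tuple always maps to the same queue, which keeps the
    packets of one flow in order. crc32 is only a placeholder hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


# One bulk flow: every packet lands on the same queue.
single_flow = ("192.168.1.10", 50000, "203.0.113.5", 443, "tcp")
single = Counter(pick_tx_queue(single_flow) for _ in range(1000))

# Many flows (different source ports): packets spread over several queues.
many_flows = [("192.168.1.10", 50000 + i, "203.0.113.5", 443, "tcp")
              for i in range(1000)]
spread = Counter(pick_tx_queue(flow) for flow in many_flows)

print("queues used by one flow:", len(single))
print("queues used by 1000 flows:", len(spread))
```

If the real selection on this hardware behaves anything like this, a gaming session plus a few background connections would mostly sit on one or two queues, which would match the "most packets hit one queue" observation.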

//edit 3
After some testing, it seems like there is indeed either a problem on the path to the official servers or a problem with the official servers themselves, but I don't know why PIE behaves better in this case. On community servers ping is low and stable.
Like moeller0 wrote, maybe there is QoS/AQM in place somewhere, and it reacts better to "the way" PIE sends out packets.

And for the multi queue thing...
The default Linux configuration would be mq on the main ethX devices and pfifo_fast for each hardware queue. By default each pfifo_fast queue uses 3 bands (and a priomap).
So when they all use the same priomap, I would assume that the packets round-robin between the pfifo_fast leaves/queues. But it seems that is not the case.
I guess the preferred setup here would be to use pfifo_fast with only 1 band per hardware queue and adjust the priomaps accordingly.
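
A small Python model may help show what the priomap does and does not do. The priomap values below are the documented pfifo_fast defaults. The class and the packet labels are hypothetical, and this is a simplified sketch rather than the kernel implementation. The point is that the priomap only selects a band inside one pfifo_fast instance; it plays no part in choosing which hardware queue (leaf) a packet goes to.

```python
from collections import deque

# Documented default priomap: index = packet priority (0-15), value = band.
DEFAULT_PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


class PfifoFastModel:
    """Simplified model of one pfifo_fast leaf (hypothetical helper class)."""

    def __init__(self, priomap=DEFAULT_PRIOMAP, bands=3):
        self.priomap = priomap
        self.bands = [deque() for _ in range(bands)]

    def enqueue(self, packet, priority):
        # The priority is masked to 0-15 and mapped to a band.
        band = self.priomap[priority & 15]
        self.bands[band].append(packet)

    def dequeue(self):
        # Strict priority: band 0 is always drained before band 1, and so on.
        for band in self.bands:
            if band:
                return band.popleft()
        return None


leaf = PfifoFastModel()
leaf.enqueue("bulk", priority=2)          # maps to band 2
leaf.enqueue("normal", priority=0)        # maps to band 1
leaf.enqueue("interactive", priority=6)   # maps to band 0

order = [leaf.dequeue() for _ in range(3)]
print(order)  # ['interactive', 'normal', 'bulk']
```

So identical priomaps on every leaf just mean every leaf prioritizes the same way internally; they would not make packets round-robin between leaves.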

For the OpenWrt default, mq + fq_codel, I think some tc filter magic is needed to make packets hit more than one queue (this could also be used with the pfifo_fast approach or any other qdisc setup).

Anyway... I ended up going back from DSA to swconfig to have both CPU ports.
Then I attached multiq to the main Ethernet devices (eth0 and eth1) to make packets round-robin through the hardware queues.
And attached fq_codel (with cake lan present) to my LAN interface (eth0.10).
And cake on the WAN interface (eth1.20).
One thing I noticed is that multiq shows some requeues...
So maybe there is indeed a problem where some hardware queues are blocked/delayed.
But it works quite well so far... iperf jitter test is usually around ~200ms

mj5030 · March 24, 2019, 2:48pm

Downstream SNR is typically good as long as it's above 30 dB at a power level of no less than -6 dBmV.
If the power level is less than -6 dBmV, SNR should typically be above 33 dB.
Most of these levels depend on the type of modem, but this is a rough idea. You should be able to find the specs of your modem on the manufacturer's website.
You do not seem to have signal issues on the downstream as shown by the modem, unless you are receiving a high number of corrected and uncorrected codewords.

However, your upstream seems to be the problem, with power levels rising above 52 dBmV. On an ATDMA 5120 ksym/sec upstream channel you want a power level no greater than 51 dBmV, and on an ATDMA 2560 ksym/sec upstream channel no greater than 52 dBmV.
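
These rules of thumb can be written down as a small Python check. This is only a sketch of the thresholds quoted above; the function names are made up, the limits are rough guidance rather than modem specs, and the 0.0 dBmV downstream power and the 5120 ksym/sec symbol rate in the example are assumptions, since the thread does not state them.

```python
# Rough thresholds from this post (rules of thumb, not per-modem specs).
UPSTREAM_MAX_DBMV = {5120: 51.0, 2560: 52.0}  # ATDMA symbol rate (ksym/sec) -> max tx power


def downstream_ok(snr_db, power_dbmv):
    """SNR above 30 dB is fine at >= -6 dBmV; weaker signals need above 33 dB."""
    min_snr_db = 30.0 if power_dbmv >= -6.0 else 33.0
    return snr_db > min_snr_db


def upstream_ok(power_dbmv, ksym_per_sec):
    """Upstream tx power should not exceed the limit for the channel's symbol rate."""
    return power_dbmv <= UPSTREAM_MAX_DBMV[ksym_per_sec]


# Values reported earlier in the thread with the suspect terminator installed.
print(downstream_ok(37.0, 0.0))   # True  (0.0 dBmV is an assumed power level)
print(upstream_ok(56.0, 5120))    # False (56 dBmV is above the 51 dBmV limit)
print(upstream_ok(38.0, 5120))    # True  (the level without the terminator)
```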

shm0 · March 24, 2019, 6:16pm

What I wanted to ask was why the upstream power level was worse with the terminator in place.
I got some new ones; the first one I used gave me the same strangely high power level.
I tried another one, and that one seemed to work fine, as I stated in my edit above.
I also switched the antenna wall outlet to get the upstream power level from 38 to 42.
Downstream SNR is now around ~38 dB.
Downstream power level ranges from -1.5 to +1.5.
Upstream power level is around 42-43 dBmV.
I'm thinking about installing a 3 dB attenuator in the upstream path to get the power level to 45.
But according to my "ISP recommended chart", 42 is still in a good range.
For downstream power level they favor values more in the positive range.
DOCSIS 3.0: -3.9 - 13.0
DOCSIS 3.1: 2.1 - 19.0
So when they switch over to DOCSIS 3.1 I will have to replace the outlet once more.
But for now, I think those values are okay.
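
As a quick sanity check of the numbers above, here is a short Python sketch. The ranges are the ones from the ISP chart as quoted (assumed to be in dBmV), the helper name is made up, and the attenuator estimate simply assumes the modem raises its transmit power by the attenuation that was added.

```python
# Downstream power ranges from the ISP chart quoted above (assumed dBmV).
ISP_DOWNSTREAM_RANGE_DBMV = {"3.0": (-3.9, 13.0), "3.1": (2.1, 19.0)}


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


measured_downstream = (-1.5, 1.5)  # lowest and highest channel reported

results = {}
for version, bounds in ISP_DOWNSTREAM_RANGE_DBMV.items():
    results[version] = all(in_range(v, bounds) for v in measured_downstream)
    print(f"DOCSIS {version}: within recommended range = {results[version]}")

# Assumption: a 3 dB upstream attenuator makes the modem transmit 3 dB louder.
upstream_now_dbmv = 42.0
attenuator_db = 3.0
upstream_with_attenuator = upstream_now_dbmv + attenuator_db
print(f"estimated upstream power with attenuator: {upstream_with_attenuator} dBmV")
```

With these inputs the current downstream levels fit the DOCSIS 3.0 range but fall below the DOCSIS 3.1 range, which is why the outlet would need changing again after a switch to 3.1.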

mj5030 · March 25, 2019, 2:05pm

Those seem to be really great signals to the modem.

I actually would not install the attenuator. On the upstream the lower power level should actually be better as long as it is not too low. It means your modem is "not having to talk so loud" to communicate with the CMTS. Anything between 42-50 should be good. If it gives you a better SNR on the downstream for some reason it might be worth considering.