Help diagnosing a network performance bottleneck

Hey!

I'd been using DD-WRT on a Netgear WNDR3700v4 for the last ~7-10 years.

I would consistently get 700 Mb/s down and 50 up on a 1000/50 connection. But as my friends and family began to use my hosted services more and more (mostly Jellyfin), I noticed I was getting packet loss whenever uploads were occurring.

(See 1)

So I began automating speed tests every 5 minutes and piping the results over to Prometheus for review.
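Roughly, the collection side can be sketched like this — assuming the sivel speedtest-cli CSV output and node_exporter's textfile collector; the paths and metric names here are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: turn a `speedtest-cli --csv` result line into
# Prometheus metrics for node_exporter's textfile collector, driven by cron:
#   */5 * * * * /usr/local/bin/speedtest-prom.sh

# speedtest-cli --csv columns:
#   1 server id, 2 sponsor, 3 server name, 4 timestamp, 5 distance,
#   6 ping (ms), 7 download (bit/s), 8 upload (bit/s), 9 share, 10 ip
csv_to_prom() {
    echo "$1" | awk -F, '{
        printf "speedtest_ping_ms %s\n",      $6
        printf "speedtest_download_bps %s\n", $7
        printf "speedtest_upload_bps %s\n",   $8
    }'
}

# Real use (commented out so the demo below runs without network access):
# csv_to_prom "$(speedtest-cli --csv)" > /var/lib/node_exporter/textfile/speedtest.prom

# Demo with a canned result line:
csv_to_prom "1,ISP,Sydney,2024-01-01T00:00:00Z,10.0,12.3,700000000.0,50000000.0,,203.0.113.1"
```

From there Prometheus just scrapes node_exporter, and the packet-loss overlap shows up when the metrics are graphed against LAN traffic.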

As you can see in this screenshot, every time there was usage of my network (i.e. someone was leeching off me, like two pieces of a puzzle connecting), I would also get packet loss.

(See 1)

Anyway, I started looking around and found that one cause of this over in AUS is that there are automated throttling mechanisms in place when you get close to the residential upload cap (50 Mb/s). This then led me down the rabbit hole of QoS. Sadly, I found that QoS would just destroy my network performance...

(See 2)

This is when I realised, you know what, my router is pretty old, so I decided to take my backup TP-Link Archer A9 and chuck OpenWrt on it. It installed fine, but straight away I noticed I was capped at 90 Mb/s.

(See 3)

I then stumbled across Routing/NAT Offloading, and after enabling that I got a decent increase, but now instead of a solid 700 Mb/s it's bouncing between 500 and 900 — so sometimes more than the old router, but not always.

(See 4)

The crazier part: if I run a speed test on my Windows PC, and not via the Docker container running the speed tests, I get a solid ~800-850 every single time.

So I tried running the server through the exact same LAN cable as the PC that gets 850 Mb/s, to reduce the differences, but the speed tests showed no difference.

I was suspicious that maybe the speed tests were lying to me, so I tried downloading 10 GB of files, and the download speed matched what the speed test was reporting: ~500 Mb/s.

The worst part of all this is that I still experience packet loss on the new router, and the Archer A9 with OpenWrt gets even worse throughput with QoS enabled >< than the old combo did.

Sorry for the novel!

Any tips would be appreciated.

Thanks!

TL;DR:

  • Does anyone have any tips on how to resolve packet loss when uploads occur, besides destroying the network with QoS? Maybe a way to just limit the max upload speed without the packet inspection that QoS performs? (Assuming that this is really the cause of my packet loss, of course.) - SOLVED; TL;DR: it's just how it works. If you use 50% of your upload and try to use more than 50% of your download, packet loss will occur.

  • Does anyone know why the new router + OpenWrt causes such sporadic speed tests compared to the consistency I had before?

  • Why would my Windows PC report such solid speeds while the server now gets such weird results, even though their paths to the outside world are exactly the same? - SOLVED: the speed tests on my PC are probably just prioritized to look good.

  • QoS/SQM for 1000/50 MBit/s requires pretty high-end/new devices (x86_64 is a good idea; there are only a few alternatives, and filogic 8x0 is just catching up to those speeds)
  • ISPs do prioritize traffic (or even reroute it within their own network) to known speedtest servers, as that makes them look good
  • 50 MBit/s upload with 1 GBit/s download is a particularly bad ratio; the uplink can barely keep up with just acknowledging received packets
  • 'some' packet loss is unavoidable with SQM, especially with your uplink ratio; it's the only way SQM can 'tell' the ISP to slow down
3 Likes

To illustrate this point, TCP Reno with the default delayed ACKs and an MTU around 1500 comes out at a ~1/40 ratio of reverse ACK traffic to forward data traffic. Modern techniques like TSO/GSO/GRO further reduce the ACK rate, but at 1000/50 saturating the download will likely eat up 50% of the uplink capacity.
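As a back-of-the-envelope sketch of where those numbers come from (the header sizes are assumptions — roughly Ethernet + IP + TCP with timestamps — and delayed ACKs send one ACK per two full-size segments):

```shell
#!/bin/sh
# Rough arithmetic behind the ~1/40 ACK-to-data ratio; sizes are assumed,
# not measured, and real traffic mixes will vary.
awk 'BEGIN {
    mtu = 1500          # bytes per full-size data segment
    ack = 66            # bytes per ACK (Ethernet + IP + TCP w/ timestamps)
    segs_per_ack = 2    # delayed ACKs: one ACK per two segments

    ratio = ack / (segs_per_ack * mtu)
    printf "reverse/forward byte ratio: 1/%.0f\n", 1 / ratio
    printf "ACK traffic at 1000 Mbit/s down: %.0f Mbit/s\n", 1000 * ratio
    printf "share of a 50 Mbit/s uplink: %.0f%%\n", 1000 * ratio / 50 * 100
}'
```

That raw ACK stream alone works out to roughly 22 Mbit/s, i.e. over 40% of the 50 Mbit/s uplink, before any actual upload traffic is counted.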

2 Likes

Even without SQM, packet loss will happen eventually under saturating loads of sufficient duration. Once the queue is maxed out, dropping packets is typically the only viable option...

2 Likes
  • QoS/SQM for 1000/50 MBit/s requires pretty high-end/new devices (x86_64 is a good idea; there are only a few alternatives, and filogic 8x0 is just catching up to those speeds)

Thanks, yeah, I did figure based on what I've read that QoS/SQM is a struggle even for top-end hardware, and the Archer A9 is only like 800 MHz — not much of an upgrade from the previous ~500.

ISPs do prioritize traffic (or even reroute it within their own network) to known speedtest servers, as that makes them look good

Thanks, I did figure that what I'm seeing in the browser may be inflated vs what the container tests were producing.

50 MBit/s upload with 1 GBit/s download is a particularly bad ratio; the uplink can barely keep up with just acknowledging received packets

This is just the max cap enforced by the Aussie regulators; it's the "top end" option you can get in AUS. To do better you have to have an ABN (basically a legitimate business number) ;( ... though maybe I can look into this further.

'some' packet loss is unavoidable with SQM, especially with your uplink ratio; it's the only way SQM can 'tell' the ISP to slow down

Hmm, fair enough. I did notice it was way less, but I kinda just assumed that was due to less traffic on the uplink overall, tbh, rather than it actually helping.

To illustrate this point, TCP Reno with the default delayed ACKs and an MTU around 1500 comes out at a ~1/40 ratio of reverse ACK traffic to forward data traffic. Modern techniques like TSO/GSO/GRO further reduce the ACK rate, but at 1000/50 saturating the download will likely eat up 50% of the uplink capacity.

Sorry, are you implying I can adjust the ACK/MTU to trade some throughput for a more stable connection? Or just saying that, given the ratio, I'm out of luck without limiting my overall download?

I do understand that it is all ratio-based, but just to make crystal clear what I'm hitting: if I use, say, ~20 Mb/s up (40% of upload) and I try to use more than ~600 Mb/s down (60% of download), then I would start getting throttled / experiencing packet loss, because technically that's the full 100% of the connection?

So you have a few toggles, but nothing really earthshattering:
a) if you reduce the MTU/MSS, ACK traffic will require a higher fraction, so that is not helpful; and for normal internet traffic the MTU is essentially limited to around 1500.
b) you can use GRO on your devices (which results in larger meta-packets that are treated as single TCP segments), which can elicit fewer ACKs.
c) you can ACK filter (e.g. using cake's ack-filter keyword), which removes some redundant ACKs and can help, but only if you have a few bulk flows that individually cause lots of ACKs; if you have too many bulk flows, even ACK thinning in the extreme might not remove a single ACK.
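For b) and c), a sketch of the corresponding commands on a Linux router (the interface name and the shaper rate are examples; these need root and suitable hardware, so treat them as illustration rather than a recipe):

```shell
#!/bin/sh
# b) GRO: coalesce received segments into larger meta-packets, so the
#    stack generates fewer ACKs for the same amount of data.
ethtool -K eth0 gro on

# c) ACK filtering: cake's ack-filter keyword drops redundant ACKs that
#    are queued on the (shaped) upload side.
tc qdisc replace dev eth0 root cake bandwidth 45Mbit ack-filter
```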

The best (albeit painful) self-help is to traffic shape the download rate such that the up/down asymmetry is less severe... but that comes at a throughput cost...

Yes, once the traffic load exceeds the capacity, first queues build up and then traffic will be dropped (if you use RFC 3168 ECN, some drops can be avoided). TCP interprets such drops as signs that it exceeded the path capacity and will slow down, but all the packets in flight will have to be serviced before the queues return to an empty state.

1 Like

Thanks again, all that makes sense.

The best (albeit painful) self-help is to traffic shape the download rate such that the up/down asymmetry is less severe... but that comes at a throughput cost...

One final question: is QoS/SQM the only way this can be done?

No, you can use tc commands directly to achieve this (then again, sqm-scripts really is mostly a convenience layer around the necessary tc invocations...).
There are also other projects like qos-scripts, qosify, and cake-qos-simple that can be used to achieve the same.
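A minimal sketch of the kind of direct tc invocation meant here (interface name and rates are examples, and it needs root on the router; this is roughly the shape of what sqm-scripts automates, not its exact commands):

```shell
#!/bin/sh
WAN=eth0

# Egress: shape uploads just below the ISP cap, so queues build in our
# shaper (where we control them) rather than in the ISP's gear.
tc qdisc replace dev "$WAN" root cake bandwidth 45Mbit

# Ingress can only be shaped indirectly: redirect incoming packets to an
# IFB device and shape that device's egress. Setting the rate well below
# the real 1000 Mbit/s trades throughput for a less brutal 20:1 asymmetry.
ip link add ifb4wan type ifb 2>/dev/null || true
ip link set ifb4wan up
tc qdisc replace dev "$WAN" handle ffff: ingress
tc filter add dev "$WAN" parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb4wan
tc qdisc replace dev ifb4wan root cake bandwidth 600Mbit ingress
```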

And there are no limits really, so you could also deploy libreqos in a bump-in-the-wire configuration, or rent a VPS in the cloud to do all the ingress shaping on and then tunnel to your router, or...

In my biased opinion, sqm-scripts will make it easy to get started (not that qosify or cake-qos-simple are more complicated) and you can then research what else you can do from a hopefully decent baseline. Tip: of all the options in sqm-scripts, simplest_tbf.qos/fq_codel is the computationally cheapest and will allow the highest throughput, and if your goal is mostly to remedy the ISP's asymmetry somewhat, that might be the best starting point.
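For reference, a hypothetical /etc/config/sqm stanza along those lines — rates are in kbit/s and deliberately below the 1000/50 caps, and the interface name is just an example:

```
config queue 'wan'
        option enabled '1'
        option interface 'eth0'
        option download '600000'
        option upload '45000'
        option qdisc 'fq_codel'
        option script 'simplest_tbf.qos'
```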

1 Like

What kind of QoS would you expect hardware QoS to support? An individual posted the following concerning possible QoS support in the RT3200 devices:

In theory it would be really sweet if we could outsource the traffic shaping to NIC hardware, and I think there are a few NICs that support something like that (not sure how well such a hardware shaper can be combined with user-space qdiscs); not sure, though, that the OP's hardware supports that.
One caveat with hardware offloads is that they typically are less flexible than software implementations and also slower to change in firmware, but that IMHO should not really be all that problematic for a traffic shaper, given that the problem space seems rather well understood and clearly defined...

In practical terms, does this mean that if hardware QoS support is somehow unlocked, it will be rather limited to something like fq_codel?

I think I read about an fq_codel implementation in hardware somewhere already, but I see no reason why other algorithms should be off limits... But I am really not all that knowledgeable in the hardware offload topic, so I might be off by a lot in my beliefs (I try to differentiate between fact-based knowledge and belief, and for offloads I am deep in belief territory).

Understood. I also saw a post recently about getting OpenWrt on an Android mobile phone:

that'd be pretty cool.

Thanks again. Yeah, I guess I just thought there would be a way to set a simple max up and down and that would be job done, but both routers simply get devastated by turning SQM on.

I'm looking at messing around with a NanoPi at some point, so I'll give it a go on that to see if the extra oomph helps me or not.

Thanks again all!

So after digging around, it seems like MediaTek is really stretching the truth claiming HQoS support; all it really is is flow offloading, and we already have that... As far as I can tell, there is zero implementation of mqprio anywhere to be found, except on Android.

1 Like