SQM, Flow Offloading, VLAN Tagging and Gaming

In theory, requesting low latency from AQM/SQM should result in small buffers and reduced bufferbloat. However, if I've understood correctly, the qdisc's interval setting should be longer than your typical RTT. In addition, if your uplink is limited (e.g. ADSL), I think that your uplink (egress) traffic should not use ECN in any situation, because it's better to drop the packet early than to try to push it through the limited uplink. On the other hand, ECN should be okay for downloads (ingress) in all cases, because the packet has already travelled the "last mile" and it would be a waste to drop it at that point. Of course, this assumes that your WAN link is slower than your local LAN, but I have a hard time imagining any case where this is not true.
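To make the interval and ECN reasoning concrete, here is a minimal Python sketch that picks fq_codel options from a few measured RTTs. The 100 ms floor matches the fq_codel default, the "target is about 5% of interval" rule of thumb is my assumption, and the function names are made up for illustration. It only builds the option string; a real SQM setup would attach the qdisc under a shaper.

```python
import math


def codel_params(rtt_samples_ms, floor_ms=100):
    """Return (target_ms, interval_ms) for a CoDel-style qdisc.

    interval: at least the worst RTT seen, never below floor_ms
              (assumed floor: the fq_codel default of 100 ms).
    target:   about 5% of interval (rule of thumb, an assumption here).
    """
    interval = max(floor_ms, math.ceil(max(rtt_samples_ms)))
    target = max(1, round(interval / 20))
    return target, interval


def fq_codel_options(rtt_samples_ms, ecn):
    """Build the fq_codel option string; ECN marking is on or off per direction."""
    target, interval = codel_params(rtt_samples_ms)
    flag = "ecn" if ecn else "noecn"
    return f"fq_codel target {target}ms interval {interval}ms {flag}"


if __name__ == "__main__":
    rtts = [20, 35, 180]  # hypothetical RTT measurements in milliseconds
    # Egress (slow uplink): drop instead of marking.
    print("egress :", fq_codel_options(rtts, ecn=False))
    # Ingress (download): marking is fine, the packet already crossed the last mile.
    print("ingress:", fq_codel_options(rtts, ecn=True))
```

The two printed strings differ only in the ecn/noecn flag, which is the whole point: same timing parameters, different policy per direction.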

Other things that I would try:

  • Setting low values for initcwnd and initrwnd. I had set those to 20 each to improve the throughput of short TCP connections, but that comes with the possibility of saturating your whole uplink for short periods of time. If you want to make sure UDP traffic does not suffer, forcing low initial window sizes should prevent bursts before the TCP congestion control algorithm starts limiting the traffic. Note that web browsers usually open 6 parallel connections, so any value you set here is multiplied by 6 to get the number of packets in flight when new connections start. A value of 20 results in 20 x 6 x 1540 = 184800 bytes in flight, which needs roughly a 150 Mbps connection to drain within 10 ms.
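  • Setting /proc/sys/net/ipv4/tcp_low_latency to 1. Note that on Linux 4.14 and newer this is a legacy option that no longer has any effect, so it only matters on older kernels.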
  • Reducing the value of /proc/sys/net/ipv4/tcp_limit_output_bytes far enough to prevent the kernel from buffering too much egress traffic. I'm not sure, but this may need to be done on the client machines, not on the router. If I've understood correctly, this is not needed if your NIC driver supports BQL (Byte Queue Limits). Again, setting a low value will slow down bursty TCP traffic but should improve UDP latency.
  • Running the linux-lowlatency kernel on the client and router machines. (Also known as a PREEMPT kernel. You should also try an RT kernel if your hardware/distribution supports it.) If you run Windows, I don't know if anything similar exists for that.
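The initial window arithmetic from the first bullet can be checked with a small Python sketch. The 6 connections and 1540 bytes per packet are the figures used above; the function names and the 1 Mbps uplink are just assumptions for illustration.

```python
def burst_bytes(initcwnd, connections=6, packet_bytes=1540):
    """Bytes that can be in flight before congestion control has any feedback."""
    return initcwnd * connections * packet_bytes


def drain_time_ms(n_bytes, link_mbps):
    """Time in milliseconds needed to push n_bytes through a link of link_mbps."""
    return n_bytes * 8 / (link_mbps * 1_000_000) * 1000


def min_link_mbps(n_bytes, budget_ms):
    """Link speed in Mbps needed to drain n_bytes within budget_ms."""
    return n_bytes * 8 / (budget_ms / 1000) / 1_000_000


if __name__ == "__main__":
    for cwnd in (20, 10, 4):
        b = burst_bytes(cwnd)
        print(
            f"initcwnd={cwnd:2d}: {b} bytes, "
            f"needs {min_link_mbps(b, 10):.1f} Mbps for a 10 ms budget, "
            f"takes {drain_time_ms(b, 1):.0f} ms on a 1 Mbps uplink"
        )
```

On a slow uplink the drain time is what a game's UDP packets end up waiting behind, which is why lowering the initial window helps there even though it costs some throughput on short connections.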

In the end, the only way to get really low latency is to make sure you never use your full bandwidth. The optimal case would be full bandwidth minus one packet all the time. The problem is that limiting the amount of traffic without introducing extra latency is really hard without real support from the client applications. And because this is about pushing packets through multiple devices from different manufacturers, any weak link in this realtime chain will cause stuttering.
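In practice, "never fully use your bandwidth" usually means telling the shaper a rate somewhat below what the link really delivers. A minimal Python sketch of that calculation follows; the 10% headroom and the measured speeds are assumptions you would need to tune for your own line, and the function name is hypothetical.

```python
def shaped_rate_kbit(measured_kbit, headroom_percent=10):
    """Shaper rate that leaves headroom_percent of the measured link rate unused."""
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in the range [0, 100)")
    # Integer arithmetic avoids floating point rounding surprises.
    return measured_kbit * (100 - headroom_percent) // 100


if __name__ == "__main__":
    measured_down_kbit = 20000  # hypothetical measured downlink
    measured_up_kbit = 1000     # hypothetical measured ADSL uplink
    print("download shaper rate:", shaped_rate_kbit(measured_down_kbit), "kbit/s")
    print("upload shaper rate:  ", shaped_rate_kbit(measured_up_kbit), "kbit/s")
```

The idea is that the queue then builds on your router, where SQM controls it, instead of in the modem or at the ISP.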