Bad latency and jitter with SQM on Archer C7 v2

I recently upgraded my Wi-Fi router to a TP-Link Archer C7 V2. I'm on a Comcast Xfinity cable connection, about 17 Mbit/s down and 2.5 Mbit/s up. I'm having a lot of trouble getting SQM to work properly: download latency and jitter under load are bad.

With SQM download set to 16000 and upload to 2000, here are some speedtest results:

root@OpenWrt:~# speedtest-netperf.sh -H netperf-west.bufferbloat.net -t 20
 Download:  14.89 Mbps
  Latency: [in msec, 21 pings, 0.00% packet loss]
      Min:  65.581
    10pct:  71.767
   Median: 154.813
      Avg: 167.915
    90pct: 268.883
      Max: 400.116
 CPU Load: [in % busy (avg +/- std dev), 18 samples]
     cpu0:  51.0 +/- 23.4
 Overhead: [in % used of total CPU available]
  netperf:  17.3

To me this seems terrible: a high average and huge variance. The numbers are comparable to (sometimes worse than) those without SQM at all. Upload latency and jitter are fine (~60 ms, very stable).

I have to decrease the download speed dramatically, to almost 50% of the maximum, in order to get acceptable average latency and jitter. Here are the results with download set to 9000:

 Download:   8.32 Mbps
  Latency: [in msec, 21 pings, 0.00% packet loss]
      Min:  62.342
    10pct:  62.495
   Median:  66.599
      Avg:  70.290
    90pct:  72.625
      Max: 115.823
 CPU Load: [in % busy (avg +/- std dev), 19 samples]
     cpu0:  37.3 +/- 21.2
 Overhead: [in % used of total CPU available]
  netperf:   9.7

Here is my /etc/config/sqm:

config queue 'sqmqueue1'
        option qdisc_advanced '0'
        option linklayer 'none'
        option enabled '1'
        option interface 'eth0'
        option download '16000'
        option upload '2000'
        option debug_logging '0'
        option verbosity '5'
        option qdisc 'cake'
        option script 'layer_cake.qos'

It is quite barebones. However, I have tried a huge number of variations, with different overhead values, different ECN/NOECN options, different qdisc (e.g., fq_codel), different scripts. None of it seems to make much of a difference.

Right now I'm running the optimized OpenWrt from https://github.com/infinitnet/lede-ar71xx-optimized-archer-c7-v2. However, I also tried it with the standard OpenWrt 19.07.4 and got similar results.

I'm at my wits' end. Could it be possible that my router is faulty? I got it used from Amazon, but it seems functional in every way.

Here is a sample tc -s qdisc output (this is with download set to 11000):

qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc cake 803e: dev eth0 root refcnt 2 bandwidth 2Mbit diffserv3 triple-isolate split-gso rtt 100.0ms raw overhead 0 
 Sent 12492999 bytes 64050 pkt (dropped 3109, overlimits 75250 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 252000b of 4Mb
 capacity estimate: 2Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh        125Kbit        2Mbit      500Kbit
  target        145.3ms        9.1ms       36.3ms
  interval      290.7ms      104.1ms      131.3ms
  pk_delay          0us        6.1ms        244us
  av_delay          0us        405us          5us
  sp_delay          0us         14us          5us
  pkts                0        67132           27
  bytes               0     17192679         2695
  way_inds            0         1356            0
  way_miss            0          614            7
  way_cols            0            0            0
  drops               0         3109            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            3            1
  bk_flows            0            2            0
  un_flows            0            0            0
  max_len             0         1514          160
  quantum           300          300          300

qdisc ingress ffff: dev eth0 parent ffff:fff1 ---------------- 
 Sent 98309051 bytes 83388 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 661610 bytes 5548 pkts (dropped 0, overlimits 0) 
  maxpacket 425 drop_overlimit 0 new_flow_count 9 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc noqueue 0: dev eth1.1 root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc noqueue 0: dev eth0.2 root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc noqueue 0: dev wlan1 root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc noqueue 0: dev wlan0 root refcnt 2 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
qdisc cake 803f: dev ifb4eth0 root refcnt 2 bandwidth 11Mbit besteffort triple-isolate wash split-gso rtt 100.0ms raw overhead 0 
 Sent 97896894 bytes 82261 pkt (dropped 1127, overlimits 137742 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 287680b of 4Mb
 capacity estimate: 11Mbit
 min/max network layer size:           60 /    1514
 min/max overhead-adjusted size:       60 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh         11Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay       15.2ms
  av_delay       11.0ms
  sp_delay         10us
  pkts            83388
  bytes        99476483
  way_inds         1235
  way_miss          698
  way_cols            0
  drops            1127
  marks               0
  ack_drop            0
  sp_flows            6
  bk_flows            1
  un_flows            0
  max_len          1514
  quantum           335

Try:

config queue 'sqmqueue1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option itarget 'auto'
	option etarget 'auto'
	option verbosity '5'
	option qdisc 'cake'
	option script 'layer_cake.qos'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option qdisc_really_really_advanced '1'
	option eqdisc_opts 'nat dual-srchost ack-filter'
	option linklayer 'ethernet'
	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option linklayer_adaptation_mechanism 'default'
	option debug_logging '1'
	option enabled '1'
	option iqdisc_opts 'nat dual-dsthost ingress'
	option interface 'eth0'
	option download '16000'
	option upload '2000'
	option overhead '18'
	option tcMPU '64'

Also, the tc output does not indicate long delays inside cake...
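
For anyone reading along, here is a rough way to pull only cake's own delay figures out of a `tc -s qdisc` dump. It is just a sketch, and the function name `cake_delays` is made up for illustration. `pk_delay` and `av_delay` are the peak and average delay cake recorded for packets in its queue, so they are the rows to compare against the ping times.

```shell
# Print each cake instance's device followed by its peak and average
# delay rows, reading `tc -s qdisc` output on stdin.
# cake_delays is a hypothetical helper name, not part of sqm-scripts.
cake_delays() {
    awk '
        # A new qdisc header: remember whether it is a cake instance.
        /^qdisc/ { in_cake = ($2 == "cake"); if (in_cake) print "dev " $5 }
        # Inside a cake block, keep only the two delay rows
        # ($1 = $1 collapses the column padding to single spaces).
        in_cake && ($1 == "pk_delay" || $1 == "av_delay") { $1 = $1; print }
    '
}
```

Usage would be `tc -s qdisc | cake_delays`. For the ifb4eth0 block posted above, the rows it keeps are `pk_delay 15.2ms` and `av_delay 11.0ms`.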

Could you also post the type of your docsis-modem and the output of ifstatus wan please to confirm that eth0 is the correct wan interface.
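
If it helps, the device that actually carries the WAN traffic is reported in the `l3_device` field of the `ifstatus wan` JSON. A minimal sketch for extracting it, assuming the key and its value sit on one line (the helper name `wan_l3_device` is hypothetical):

```shell
# Print the value of "l3_device" from `ifstatus <iface>` JSON on stdin.
# This is a plain sed match, not a JSON parser: it assumes the key and
# its value share one line, as in ifstatus's pretty-printed output.
wan_l3_device() {
    sed -n 's/.*"l3_device": *"\([^"]*\)".*/\1/p'
}
```

Usage would be `ifstatus wan | wan_l3_device`. On OpenWrt, `jsonfilter -e '@.l3_device'` should do the same job more robustly if it is installed.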

Thanks for the suggestion. I tried your configuration file and I am still getting bad performance. With the new config:

root@OpenWrt:/etc/config# speedtest-netperf.sh  -H netperf-west.bufferbloat.net -t 20
2020-09-13 17:46:14 Starting speedtest for 20 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.....................
 Download:  14.76 Mbps
  Latency: [in msec, 21 pings, 0.00% packet loss]
      Min:  65.213
    10pct:  69.870
   Median: 158.626
      Avg: 220.020
    90pct: 432.365
      Max: 733.164
 CPU Load: [in % busy (avg +/- std dev), 19 samples]
     cpu0:  42.4 +/- 16.1
 Overhead: [in % used of total CPU available]
  netperf:  16.0
......................
   Upload:   1.93 Mbps
  Latency: [in msec, 22 pings, 0.00% packet loss]
      Min:  21.011
    10pct:  21.309
   Median:  25.989
      Avg:  26.610
    90pct:  30.767
      Max:  32.761
 CPU Load: [in % busy (avg +/- std dev), 20 samples]
     cpu0:  10.1 +/-  5.2
 Overhead: [in % used of total CPU available]
  netperf:   1.2

ifstatus wan shown below. The wan interface is eth0.2. I tried using both eth0 and eth0.2 interfaces in /etc/config/sqm and get the same results.

My cable modem is a Netgear CM400 (DOCSIS 3.0).

Just for comparison, here are the results of the speed test with SQM disabled:

root@OpenWrt:~# speedtest-netperf.sh  -H netperf-west.bufferbloat.net -t 20
2020-09-13 17:43:19 Starting speedtest for 20 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.....................
 Download:  17.72 Mbps
  Latency: [in msec, 21 pings, 0.00% packet loss]
      Min:  22.512
    10pct:  46.585
   Median:  76.513
      Avg:  83.916
    90pct: 121.836
      Max: 170.932
 CPU Load: [in % busy (avg +/- std dev), 19 samples]
     cpu0:  22.3 +/-  9.7
 Overhead: [in % used of total CPU available]
  netperf:  11.8
......................
   Upload:   2.44 Mbps
  Latency: [in msec, 23 pings, 0.00% packet loss]
      Min:  20.039
    10pct:  22.639
   Median: 339.261
      Avg: 339.291
    90pct: 514.458
      Max: 569.469
 CPU Load: [in % busy (avg +/- std dev), 20 samples]
     cpu0:  12.4 +/- 10.6
 Overhead: [in % used of total CPU available]
  netperf:   1.1

You can see download is much better without SQM. One thing I just noticed is that upload has really bad latency and jitter without SQM, and improves dramatically with SQM.
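
To make these comparisons less of an eyeball exercise, a small filter can report how far the median ping rises above the minimum for each direction. This is only a sketch: `latency_under_load` is a made-up name, and it treats the minimum as a rough stand-in for idle latency, which understates the problem whenever the minimum itself was measured under load.

```shell
# Report how far the median ping rises above the minimum ping for each
# transfer direction in speedtest-netperf.sh output (read from stdin).
# latency_under_load is a hypothetical helper name.
latency_under_load() {
    awk '
        # Remember which direction the following latency block belongs to.
        $1 == "Download:" || $1 == "Upload:" { dir = $1 }
        $1 == "Min:"    { min = $2 }
        $1 == "Median:" { printf "%s median is %.1f ms above min\n", dir, $2 - min }
    '
}
```

Usage would be `speedtest-netperf.sh -H netperf-west.bufferbloat.net -t 20 | latency_under_load`, run once with SQM enabled and once with it disabled.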

As yet another data point, here are speedtest results through my previous, much older and much weaker TP-Link TL-WR841N:

.....................
 Download: 12.20 Mbps
  Latency: (in msec, 21 pings, 0.00% packet loss)
      Min: 23.201 
    10pct: 32.731 
   Median: 53.336 
      Avg: 55.025 
    90pct: 71.889 
      Max: 111.761
........................
   Upload: 1.74 Mbps
  Latency: (in msec, 24 pings, 0.00% packet loss)
      Min: 19.694 
    10pct: 22.104 
   Median: 27.644 
      Avg: 34.338 
    90pct: 49.031 
      Max: 90.389

You can see it's much better than with the Archer! This old router is running a modified (low memory) OpenWrt 18.06.4, with the following /etc/config/sqm:

config queue 'eth1'
        option interface 'eth1'
        option qdisc_advanced '0'
        option debug_logging '0'
        option verbosity '5'
        option qdisc 'cake'
        option enabled '1'
        option download '15000'
        option upload '2300'
        option linklayer 'ethernet'
        option overhead '28'
        option script 'layer_cake.qos'

So confused...

ifstatus wan:
root@OpenWrt:/etc/config# ifstatus wan
{
	"up": true,
	"pending": false,
	"available": true,
	"autostart": true,
	"dynamic": false,
	"uptime": 845,
	"l3_device": "eth0.2",
	"proto": "dhcp",
	"device": "eth0.2",
	"updated": [
		"addresses",
		"routes",
		"data"
	],
	"metric": 0,
	"dns_metric": 0,
	"delegation": true,
	"ipv4-address": [
		{
			"address": "73.98.80.161",
			"mask": 22
		}
	],
	"ipv6-address": [
		
	],
	"ipv6-prefix": [
		
	],
	"ipv6-prefix-assignment": [
		
	],
	"route": [
		{
			"target": "0.0.0.0",
			"mask": 0,
			"nexthop": "73.98.80.1",
			"source": "73.98.80.161\/32"
		}
	],
	"dns-server": [
		"75.75.75.75",
		"75.75.76.76"
	],
	"dns-search": [
		"hsd1.nm.comcast.net."
	],
	"inactive": {
		"ipv4-address": [
			
		],
		"ipv6-address": [
			
		],
		"route": [
			
		],
		"dns-server": [
			
		],
		"dns-search": [
			
		]
	},
	"data": {
		"hostname": "OpenWrt",
		"leasetime": 302157
	}
}

How does this test look with the download set to 12000? Because honestly, even without SQM, the 60 ms added to the average download latency is bad (if not quite as bad as with SQM).
Since SQM seems to improve the upstream, one way forward would be to set the downstream shaper rate to zero (which disables SQM in the download direction; it will not throttle down to 0 Kbps) and only shape the upstream.
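
As a sketch of what that would look like from the command line, assuming the section is named `sqmqueue1` as in the config posted earlier (the wrapper name `disable_download_shaping` is hypothetical):

```shell
# Set the download rate of one SQM section to 0, which turns off
# shaping in the download direction; upload shaping is left untouched.
# The default section name "sqmqueue1" is taken from the config posted
# earlier in the thread; pass another name if yours differs.
disable_download_shaping() {
    section="${1:-sqmqueue1}"
    uci set "sqm.${section}.download=0" &&
    uci commit sqm
}
```

Usage would be `disable_download_shaping sqmqueue1 && /etc/init.d/sqm restart`, followed by a fresh speedtest to see whether the download latency matches the SQM-disabled numbers.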
Question: are the download rates and delays without SQM repeatable?

Hard to believe; I would recommend instantiating SQM on eth0.2.

Here is a sample test result with 12000 download and interface eth0.2:

2020-09-13 19:09:40 Starting speedtest for 20 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.....................
 Download:  10.93 Mbps
  Latency: [in msec, 21 pings, 0.00% packet loss]
      Min:  21.821
    10pct:  22.285
   Median:  38.376
      Avg:  57.515
    90pct:  85.931
      Max: 202.992
 CPU Load: [in % busy (avg +/- std dev), 19 samples]
     cpu0:  28.6 +/- 10.2
 Overhead: [in % used of total CPU available]
  netperf:  11.8

You can see it's pretty decent. It deteriorates steadily (and rather quickly) as I go above 12000. I cannot match the performance of my dinky old TP-Link TL-WR841N, which can barely run OpenWrt, but which can push out just over 12 Mbps with low latency and jitter. Just now I flashed the Archer with OpenWrt 18, tried the same configuration as on the old TP-Link, etc. No progress at all.

Download rates and delays without SQM are repeatable.

Yes, I could turn off shaping on the ingress, but that would kind of defeat the point. I was previously getting horrible videoconferencing performance, and installing OpenWrt with SQM on my old router was like a miracle. However, that old router had random freezes and required nightly resets, which is why I wanted an upgrade. I'm very disappointed by the Archer and am wondering if in fact there might be some kind of hardware issue.

Thanks again for your help.

P.S. Regarding eth0 and eth0.2: I was also surprised that it didn't seem to make a difference. However, eth0.2 is the only virtual device under eth0 (there is no eth0.1), so perhaps it makes sense?

Mmmh, maybe it is becoming time for a DOCSIS 3.1 modem? It could be that your ISP has started to convert some bandwidth to DOCSIS 3.1, so the remaining DOCSIS 3.0 bandwidth might be getting congested.

That could mean that those downloads are a bit lumpy and cake throttles too early, but that would not really explain the insane latencies you see with downstream sqm on...

Sure, it is just that your current tests indicate that with the C7v2 downstream shaping is a bit less effective and might still incur massive delays under load.

Ah, okay: problems occur when WAN and LAN run over the same switch and there is only one CPU port, separated by VLAN IDs. If there are two CPU Ethernet ports, they do not interfere even if both use VLANs. Good reasoning!

I replaced my Archer with a NETGEAR R6220 and everything worked amazingly well right out of the box.