Optimized build for the D-Link DIR-860L

@Bartvz Thanks for letting me know you see the same thing. I am on a MacBook too.

After messing with the power settings and trying your disassoc_low_ack trick, it occurred to me that the automatically adjusted transmit power (dBm) might become too weak, so it disconnects me. I will test further with that option.

Looking forward to your new build

P.S. The ath10k Wi-Fi device I have at the moment does not disconnect me.

Tested: setting disassoc_low_ack to 0 does not help.

Only using a higher channel delays the connection drop, but it still drops.

Then I would advise posting your experience and logs to the mailing list where there are more technical people.

I flashed your r3194 and it has been working fine for an hour now, so I don't know if I should report it on the bug list. I don't have much other info except those few lines I posted above. Let me use it for a few days and see how it goes.

Is there a way to improve the 2.4GHz performance?

With the stock OEM firmware it reaches about 80 Mbit/s, but with LEDE only 9 Mbit/s is achieved.

Quick update: Still building and testing since I have run into some snags.
First snag was stack traces like the one below popping up:

[details=click here]Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.173000] WARNING: CPU: 3 PID: 0 at net/core/skbuff.c:4194 skb_try_coalesce+0x228/0x35c()
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.189000] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TEE xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.424000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.47 #0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80460000 804e0000
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 87c4bf6c 80461c83 803df434 00000003 00000000 804c367c 87c21d3f 804fade0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000000 80063260 80460000 804e0000 80466188 8046618c 803e4068 87c21bec
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000003 8006101c 87c21d3f 804fade0 00000000 00000126 00000000 00c21bec
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] ...
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.507000] Call Trace:
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.511000] [<8001671c>] show_stack+0x6c/0x88
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.520000] [<801b1180>] dump_stack+0x8c/0xc0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.529000] [<8002b944>] warn_slowpath_common+0xa0/0xd0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.539000] [<8002b9fc>] warn_slowpath_null+0x18/0x24
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.549000] [<8028c16c>] skb_try_coalesce+0x228/0x35c
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.559000] [<802e7510>] tcp_try_coalesce+0x70/0xd4
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.569000]
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.572000] ---[ end trace a0c02c7791d287cb ]---[/details]

Second snag is random reboots at the moment. I will not release a build unless it is stable. However, if you guys are using "stock" LEDE, do you also run into random reboots?

@enri I just did a quick iperf3 run with my latest build (r3511), using my laptop (connects at 144 Mbps) on the 2.4 GHz band, with the following switches on the client: "-c -n 512M". These are my results:

[details=click here][ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.01 sec 8.75 MBytes 72.9 Mbits/sec
[ 4] 1.01-2.02 sec 7.75 MBytes 64.3 Mbits/sec
[ 4] 2.02-3.00 sec 8.00 MBytes 68.2 Mbits/sec
[ 4] 3.00-4.00 sec 6.88 MBytes 57.7 Mbits/sec
[ 4] 4.00-5.00 sec 6.12 MBytes 51.4 Mbits/sec
[ 4] 5.00-6.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 6.00-7.00 sec 7.50 MBytes 62.9 Mbits/sec
[ 4] 7.00-8.00 sec 8.25 MBytes 69.1 Mbits/sec
[ 4] 8.00-9.00 sec 7.00 MBytes 58.8 Mbits/sec
[ 4] 9.00-10.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 10.00-11.00 sec 6.75 MBytes 56.6 Mbits/sec
[ 4] 11.00-12.00 sec 6.25 MBytes 52.5 Mbits/sec
[ 4] 12.00-13.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 13.00-14.00 sec 5.62 MBytes 47.1 Mbits/sec
[ 4] 14.00-15.00 sec 4.25 MBytes 35.7 Mbits/sec
[ 4] 15.00-16.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 16.00-17.00 sec 3.88 MBytes 32.5 Mbits/sec
[ 4] 17.00-18.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 18.00-19.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 19.00-20.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 20.00-21.00 sec 4.75 MBytes 39.8 Mbits/sec
[ 4] 21.00-22.00 sec 4.00 MBytes 33.5 Mbits/sec
[ 4] 22.00-23.00 sec 3.00 MBytes 25.2 Mbits/sec
[ 4] 23.00-24.00 sec 3.38 MBytes 28.3 Mbits/sec
[ 4] 24.00-25.00 sec 4.25 MBytes 35.6 Mbits/sec
[ 4] 25.00-26.00 sec 1.50 MBytes 12.6 Mbits/sec
[ 4] 26.00-27.00 sec 4.00 MBytes 33.6 Mbits/sec
[ 4] 27.00-28.00 sec 4.00 MBytes 33.5 Mbits/sec
[ 4] 28.00-29.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 29.00-30.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 30.00-31.00 sec 3.00 MBytes 25.2 Mbits/sec
[ 4] 31.00-32.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 32.00-33.00 sec 3.88 MBytes 32.5 Mbits/sec
[ 4] 33.00-34.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 34.00-35.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 35.00-36.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 36.00-37.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 37.00-38.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 38.00-39.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 39.00-40.00 sec 4.75 MBytes 39.8 Mbits/sec
[ 4] 40.00-41.00 sec 6.50 MBytes 54.6 Mbits/sec
[ 4] 41.00-42.00 sec 7.50 MBytes 62.9 Mbits/sec
[ 4] 42.00-43.00 sec 7.75 MBytes 65.0 Mbits/sec
[ 4] 43.00-44.02 sec 5.00 MBytes 41.2 Mbits/sec
[ 4] 44.02-45.00 sec 5.00 MBytes 42.7 Mbits/sec
[ 4] 45.00-46.00 sec 7.25 MBytes 60.8 Mbits/sec
[ 4] 46.00-47.00 sec 9.12 MBytes 76.6 Mbits/sec
[ 4] 47.00-48.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 48.00-49.00 sec 7.38 MBytes 61.8 Mbits/sec
[ 4] 49.00-50.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 50.00-51.00 sec 6.50 MBytes 54.5 Mbits/sec
[ 4] 51.00-52.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 52.00-53.00 sec 5.62 MBytes 47.1 Mbits/sec
[ 4] 53.00-54.00 sec 7.00 MBytes 58.8 Mbits/sec
[ 4] 54.00-55.00 sec 7.75 MBytes 64.9 Mbits/sec
[ 4] 55.00-56.00 sec 4.62 MBytes 38.8 Mbits/sec
[ 4] 56.00-57.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 57.00-58.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 58.00-59.00 sec 5.00 MBytes 42.0 Mbits/sec
[ 4] 59.00-60.00 sec 6.50 MBytes 54.5 Mbits/sec
[ 4] 60.00-61.00 sec 6.88 MBytes 57.7 Mbits/sec
[ 4] 61.00-62.00 sec 6.62 MBytes 55.6 Mbits/sec
[ 4] 62.00-63.00 sec 5.75 MBytes 48.2 Mbits/sec
[ 4] 63.00-64.00 sec 8.62 MBytes 72.4 Mbits/sec
[ 4] 64.00-65.00 sec 5.25 MBytes 44.0 Mbits/sec
[ 4] 65.00-66.00 sec 5.75 MBytes 48.2 Mbits/sec
[ 4] 66.00-67.00 sec 4.25 MBytes 35.7 Mbits/sec
[ 4] 67.00-68.00 sec 6.88 MBytes 57.6 Mbits/sec
[ 4] 68.00-69.00 sec 7.12 MBytes 59.7 Mbits/sec
[ 4] 69.00-70.00 sec 6.38 MBytes 53.6 Mbits/sec
[ 4] 70.00-71.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 71.00-72.00 sec 6.62 MBytes 55.5 Mbits/sec
[ 4] 72.00-73.00 sec 5.88 MBytes 49.2 Mbits/sec
[ 4] 73.00-74.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 74.00-75.00 sec 4.00 MBytes 33.6 Mbits/sec
[ 4] 75.00-76.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 76.00-77.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 77.00-78.00 sec 5.75 MBytes 48.3 Mbits/sec
[ 4] 78.00-79.00 sec 5.25 MBytes 44.0 Mbits/sec
[ 4] 79.00-80.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 80.00-81.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 4] 81.00-82.00 sec 6.00 MBytes 50.4 Mbits/sec
[ 4] 82.00-83.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 83.00-84.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 84.00-85.00 sec 5.50 MBytes 46.2 Mbits/sec
[ 4] 85.00-86.00 sec 1.75 MBytes 14.7 Mbits/sec
[ 4] 86.00-87.00 sec 3.38 MBytes 28.3 Mbits/sec
[ 4] 87.00-88.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 88.00-89.00 sec 4.25 MBytes 35.6 Mbits/sec
[ 4] 89.00-90.00 sec 4.62 MBytes 38.8 Mbits/sec
[ 4] 90.00-91.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 91.00-92.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 92.00-92.69 sec 3.88 MBytes 47.5 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-92.69 sec 512 MBytes 46.3 Mbits/sec sender
[ 4] 0.00-92.69 sec 512 MBytes 46.3 Mbits/sec receiver

iperf Done.[/details]
Sounds to me like a problem somewhere. Could be you chose a crowded band or that your device connects at a low speed due to hardware. This is a good setup guide and contains excellent pointers on how to improve wireless performance for the 2.4 GHz band.
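For comparing runs like the one above without eyeballing every interval, here is a minimal sketch that condenses iperf3 client output into one line. The function name `iperf_summary` is made up for illustration, and it assumes the per-interval lines end in "Mbits/sec" as in the output above; the final sender/receiver totals are skipped.

```shell
# Summarize per-interval bandwidth from iperf3 client output read on stdin.
# Assumes interval lines end with "<value> Mbits/sec"; the sender/receiver
# summary lines end with another word and are therefore ignored.
iperf_summary() {
    awk '$NF == "Mbits/sec" {
             v = $(NF - 1) + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             sum += v
             n++
         }
         END {
             if (n > 0)
                 printf "intervals=%d min=%.1f max=%.1f mean=%.1f\n", n, min, max, sum / n
         }'
}

# Usage (iperf.log is a hypothetical file holding the saved output):
#   iperf_summary < iperf.log
```

A large gap between min and max in the summary points at interference or rate fluctuation rather than a hard throughput cap.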

@Bartvz
Thanks for the test results. That is about the same as the best I can get.
Of the 11 channels (1 to 11) I can choose from, only one reaches up to 70 Mbps, and only 2 times out of 20 with speedtest.net. The others are well below that: 30 Mbps at most, and usually below 20 Mbps. It is not very noisy or busy here, with only 11 APs around. With the old ath9k and another ath10k device, 2.4 GHz performance is always around 85 Mbps on all 11 channels.

The 5GHz connection is stable now, no obvious problem.

My usage of the device is low, 2 hours at a time at the longest. I only take it out for some tests. I remember it rebooting (crashing) once or twice; I thought it was a problem with my unit.

New build is finally up!
No more random reboots or stack traces in the logs. It has been running rock solid for 3 days. Compiled with compiler flags which should eke out a bit more performance. Let me know what you guys think/experience!

Bartvz, thanks a lot.

I just installed r3636. It seems to be quite all right, but the snag showed up again:

Wed Mar  8 20:47:43 2017 kern.warn kernel: [   38.150000] WARNING: CPU: 1 PID: 2769 at net/core/skbuff.c:4194 skb_try_coalesce+0x228/0x35c()
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.170000] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TEE xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.400000] CPU: 1 PID: 2769 Comm: uhttpd Not tainted 4.4.50 #0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80460000 804e0000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  86c76b6c 80461c83 803df43c 00000001 00000ad1 804c367c 87c19d3f 804fade0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 8006320c 80460000 804e0000 80466188 8046618c 803e4070 87c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000003 80060fc8 87c19d3f 804fade0 00000000 000000da 00000000 00c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  ...
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.480000] Call Trace:
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.490000] [<8001671c>] show_stack+0x6c/0x88
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.500000] [<801b1010>] dump_stack+0x8c/0xc0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.510000] [<8002b944>] warn_slowpath_common+0xa0/0xd0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.520000] [<8002b9fc>] warn_slowpath_null+0x18/0x24
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.530000] [<8028bff8>] skb_try_coalesce+0x228/0x35c
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.540000] [<802e73c0>] tcp_try_coalesce+0x70/0xd4
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] 
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] ---[ end trace 55743a95a3379d01 ]---

Jimzhong reported a similar issue here: https://forum.openwrt.org/t/lede-v17-01-0-rc1/1285/6 - it is attributed to relayd - this is quite possible, as I use WDS (this is the WDS server router).

The official stable 17.01.0 is much worse, it either hangs or reboots after a while (reported here: https://bugs.lede-project.org/index.php?do=details&task_id=606)

Anyway, I'll test it for a couple of days and will come back with the results.

Keep up the good work and thanks a lot

Do your builds also suffer from crashes when sqm-cake is enabled? Also, do you happen to know whether VLANs are functional with your build? Thank you very much for the great work you're doing :slight_smile:

@karesch & @Mushoz : Thanks for your kind words!
Those hangs, crashes and reboots might be something related to hardware offloading in the switch. See this message (and the rest of the thread) on the cake mailing list: https://www.mail-archive.com/lede-dev@lists.infradead.org/msg06441.html

VLANs are probably broken, since they are broken on the rest of the mt76 devices atm.

Compiling a new build atm with ethtool included. Will upload once it is done so the brave can test if they want. Also noticed that I forgot to include nano and BCP38 in the last build. They will be included in the coming builds.

Is the VLAN issue a recent issue with mt76 devices? Are there any older builds with working VLAN support? I need VLAN support, because my new connection comes in as 3 different tagged VLANs on a single line: one for internet, one for IPTV and the last one for VoIP. I'd prefer to use a LEDE router over the standard ISP crap ^^

If my memory serves me correctly, this has been the situation for at least 2-3 months. But take this with a grain of salt because I don't use VLANs.

Test build is up: r3677

Running ethtool -k eth0 prints the following:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Run the following commands:

ethtool -K eth0 rx off
ethtool -K eth0 tx off
ethtool -K eth0 tso off

ethtool -k eth0 should now print:

Features for eth0:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ipv4: off
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

This should do the trick but we cannot be sure unless we test.
Also, this looks suspicious among the sea of off's:

tx-vlan-offload: on
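To pick entries like this out of the long listing automatically, a minimal sketch along these lines should work. The function name `offloads_still_on` is made up for illustration; it only assumes the "name: state" layout that ethtool -k prints above.

```shell
# List the features that ethtool -k still reports as "on", read from stdin.
# Entries marked "on [fixed]" are skipped on purpose, because ethtool -K
# cannot change them anyway.
offloads_still_on() {
    awk -F': ' '$2 == "on" {
                    sub(/^[ \t]+/, "", $1)   # drop indentation of sub-features
                    print $1
                }'
}

# Usage on the router:
#   ethtool -k eth0 | offloads_still_on
```

Run against the second listing above, this prints scatter-gather, tx-scatter-gather and tx-vlan-offload, so tx-vlan-offload is not the only feature left on.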


That looks interesting. Do you think it might be related to the broken VLAN support? Could you test whether VLANs are working once that setting is turned off?

Turning it off is not hard:

ethtool -K eth0 txvlan off

To verify if it's off:

ethtool -k eth0 | grep tx-vlan-offload

I have never worked with VLANs, so what is something easy I can do to test?

Is anyone else who is testing getting reboots and the following stack traces?
System log:

Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.010000] Task dump for CPU 3:
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.020000] swapper/3 R running 0 0 1 0x00100000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] ...
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.100000] Call Trace:
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.130000]
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1

Kernel log:

[ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
[ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
[ 9313.010000] Task dump for CPU 3:
[ 9313.020000] swapper/3 R running 0 0 1 0x00100000
[ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
[ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
[ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9313.030000] ...
[ 9313.100000] Call Trace:
[ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
[ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9313.130000]
[ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1
[ 9459.240000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9459.250000] 3-...: (0 ticks this GP) idle=bf0/0/0 softirq=848170/848170 fqs=0
[ 9459.260000] (detected by 1, t=6004 jiffies, g=104860, c=104859, q=2350)
[ 9459.270000] Task dump for CPU 3:
[ 9459.280000] swapper/3 R running 0 0 1 0x00100000
[ 9459.290000] Stack : 00000004 ffffffff 7fc85190 00000000 00000010 0040540f 804af2a4 803e0000
[ 9459.290000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 000010d9 800133f0
[ 9459.290000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9459.290000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9459.290000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9459.290000] ...
[ 9459.360000] Call Trace:
[ 9459.370000] [<8000bb7c>] __schedule+0x330/0x750
[ 9459.380000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9459.390000]
[ 9459.390000] rcu_sched kthread starved for 6017 jiffies! g104860 c104859 f0x0 s3 ->state=0x1
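To see how far apart the stalls are before a reboot, the timestamps can be pulled out of the log. This is a minimal sketch; the function name `rcu_stall_times` is made up for illustration, and it assumes the "INFO: rcu_sched detected stalls" wording shown in the logs above.

```shell
# Print the kernel timestamp (seconds since boot) of every rcu_sched stall
# report found on stdin. Works on both the dmesg and the system log format,
# since it only looks at the bracketed timestamp before "INFO:".
rcu_stall_times() {
    sed -n 's/^.*\[ *\([0-9][0-9.]*\)] INFO: rcu_sched detected stalls.*$/\1/p'
}

# Usage on the router:
#   dmesg | rcu_stall_times
#   logread | rcu_stall_times
```

For the kernel log above this gives 9312.980000 and 9459.240000, so the two stalls are roughly 146 seconds apart.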

New test build here, r3764. All the usual stuff included.
I have been trying hard to crash it today by not turning off the offloading features and by running SQM QoS. So far, so good.
Also, VLAN tagging should be fixed. See this and this.

Just got another stack trace so testing now with all offloads off.

ethtool -K eth0 rx off tx off tso off sg off

I managed to crash it within a few minutes. It seems to fail much faster under high load. I was routing 500-600 Mbit/s of traffic WAN <-> LAN with SQM cake enabled, and I can reliably crash it within a few runs :frowning: Going to test fq_codel now.

In my limited testing, fq_codel is completely stable. I was unable to crash the router over many high speed WAN <-> LAN tests through iperf with fq_codel enabled.
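For anyone who wants to repeat this kind of "crash it within a few runs" test in a comparable way, here is a minimal sketch of a loop that repeats a load command and stops at the first failure. The function name `stress_runs` and the address in the usage comment are placeholders, not something taken from the tests above.

```shell
# Run a command a fixed number of times and stop at the first failure.
# A run that fails (for example because the router rebooted and the
# connection dropped) is reported with its run number.
stress_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@"; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs runs completed"
}

# Usage (192.168.1.2 is a hypothetical iperf3 server on the other side
# of the router):
#   stress_runs 10 iperf3 -c 192.168.1.2 -t 30
```

Reporting the run number makes it easier to compare cake and fq_codel on equal terms, for example "failed on run 3" versus "all 10 runs completed".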

Was that with offloading turned off?

I didn't change the offloading settings, so if they are enabled by default on your build, as they are on the regular builds, then I tested with offloading enabled. I can retest with offloading disabled if you'd like. Is ethtool included in your build? My router is not connected to the internet, so I am not able to install packages through opkg.