Optimized build for the D-Link DIR-860L

I flashed your r3194 and it has been working fine for an hour now, so I don't know if I should report it on the bug list. I don't have much other info except those few lines I posted above. Let me use it for a few days and see how it goes.

Is there a way to improve the 2.4GHz performance?

With the stock OEM firmware it reaches about 80 Mbit/s, but with LEDE only 9 Mbit/s is achieved.

Quick update: Still building and testing since I have run into some snags.
First snag was stack traces like the one below popping up:

[details=click here]Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.173000] WARNING: CPU: 3 PID: 0 at net/core/skbuff.c:4194 skb_try_coalesce+0x228/0x35c()
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.189000] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TEE xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.424000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.47 #0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80460000 804e0000
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 87c4bf6c 80461c83 803df434 00000003 00000000 804c367c 87c21d3f 804fade0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000000 80063260 80460000 804e0000 80466188 8046618c 803e4068 87c21bec
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000003 8006101c 87c21d3f 804fade0 00000000 00000126 00000000 00c21bec
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.436000] ...
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.507000] Call Trace:
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.511000] [<8001671c>] show_stack+0x6c/0x88
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.520000] [<801b1180>] dump_stack+0x8c/0xc0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.529000] [<8002b944>] warn_slowpath_common+0xa0/0xd0
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.539000] [<8002b9fc>] warn_slowpath_null+0x18/0x24
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.549000] [<8028c16c>] skb_try_coalesce+0x228/0x35c
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.559000] [<802e7510>] tcp_try_coalesce+0x70/0xd4
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.569000]
Thu Feb 9 18:39:54 2017 kern.warn kernel: [23041.572000] ---[ end trace a0c02c7791d287cb ]---[/details]

Second snag is random reboots at the moment. I will not release a build unless it is stable. However, if you guys are using "stock" LEDE, do you also run into random reboots?

@enri, I just did a quick iperf3 run with my latest build (r3511), using my laptop (which connects at 144 Mbps) on the 2.4 GHz band, with the following switches on the client: "-c -n 512M". These are my results:

[details=click here][ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.01 sec 8.75 MBytes 72.9 Mbits/sec
[ 4] 1.01-2.02 sec 7.75 MBytes 64.3 Mbits/sec
[ 4] 2.02-3.00 sec 8.00 MBytes 68.2 Mbits/sec
[ 4] 3.00-4.00 sec 6.88 MBytes 57.7 Mbits/sec
[ 4] 4.00-5.00 sec 6.12 MBytes 51.4 Mbits/sec
[ 4] 5.00-6.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 6.00-7.00 sec 7.50 MBytes 62.9 Mbits/sec
[ 4] 7.00-8.00 sec 8.25 MBytes 69.1 Mbits/sec
[ 4] 8.00-9.00 sec 7.00 MBytes 58.8 Mbits/sec
[ 4] 9.00-10.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 10.00-11.00 sec 6.75 MBytes 56.6 Mbits/sec
[ 4] 11.00-12.00 sec 6.25 MBytes 52.5 Mbits/sec
[ 4] 12.00-13.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 13.00-14.00 sec 5.62 MBytes 47.1 Mbits/sec
[ 4] 14.00-15.00 sec 4.25 MBytes 35.7 Mbits/sec
[ 4] 15.00-16.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 16.00-17.00 sec 3.88 MBytes 32.5 Mbits/sec
[ 4] 17.00-18.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 18.00-19.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 19.00-20.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 20.00-21.00 sec 4.75 MBytes 39.8 Mbits/sec
[ 4] 21.00-22.00 sec 4.00 MBytes 33.5 Mbits/sec
[ 4] 22.00-23.00 sec 3.00 MBytes 25.2 Mbits/sec
[ 4] 23.00-24.00 sec 3.38 MBytes 28.3 Mbits/sec
[ 4] 24.00-25.00 sec 4.25 MBytes 35.6 Mbits/sec
[ 4] 25.00-26.00 sec 1.50 MBytes 12.6 Mbits/sec
[ 4] 26.00-27.00 sec 4.00 MBytes 33.6 Mbits/sec
[ 4] 27.00-28.00 sec 4.00 MBytes 33.5 Mbits/sec
[ 4] 28.00-29.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 29.00-30.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 30.00-31.00 sec 3.00 MBytes 25.2 Mbits/sec
[ 4] 31.00-32.00 sec 4.12 MBytes 34.6 Mbits/sec
[ 4] 32.00-33.00 sec 3.88 MBytes 32.5 Mbits/sec
[ 4] 33.00-34.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 34.00-35.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 35.00-36.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 36.00-37.00 sec 4.88 MBytes 40.9 Mbits/sec
[ 4] 37.00-38.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 38.00-39.00 sec 3.75 MBytes 31.5 Mbits/sec
[ 4] 39.00-40.00 sec 4.75 MBytes 39.8 Mbits/sec
[ 4] 40.00-41.00 sec 6.50 MBytes 54.6 Mbits/sec
[ 4] 41.00-42.00 sec 7.50 MBytes 62.9 Mbits/sec
[ 4] 42.00-43.00 sec 7.75 MBytes 65.0 Mbits/sec
[ 4] 43.00-44.02 sec 5.00 MBytes 41.2 Mbits/sec
[ 4] 44.02-45.00 sec 5.00 MBytes 42.7 Mbits/sec
[ 4] 45.00-46.00 sec 7.25 MBytes 60.8 Mbits/sec
[ 4] 46.00-47.00 sec 9.12 MBytes 76.6 Mbits/sec
[ 4] 47.00-48.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 48.00-49.00 sec 7.38 MBytes 61.8 Mbits/sec
[ 4] 49.00-50.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 50.00-51.00 sec 6.50 MBytes 54.5 Mbits/sec
[ 4] 51.00-52.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 52.00-53.00 sec 5.62 MBytes 47.1 Mbits/sec
[ 4] 53.00-54.00 sec 7.00 MBytes 58.8 Mbits/sec
[ 4] 54.00-55.00 sec 7.75 MBytes 64.9 Mbits/sec
[ 4] 55.00-56.00 sec 4.62 MBytes 38.8 Mbits/sec
[ 4] 56.00-57.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 57.00-58.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 58.00-59.00 sec 5.00 MBytes 42.0 Mbits/sec
[ 4] 59.00-60.00 sec 6.50 MBytes 54.5 Mbits/sec
[ 4] 60.00-61.00 sec 6.88 MBytes 57.7 Mbits/sec
[ 4] 61.00-62.00 sec 6.62 MBytes 55.6 Mbits/sec
[ 4] 62.00-63.00 sec 5.75 MBytes 48.2 Mbits/sec
[ 4] 63.00-64.00 sec 8.62 MBytes 72.4 Mbits/sec
[ 4] 64.00-65.00 sec 5.25 MBytes 44.0 Mbits/sec
[ 4] 65.00-66.00 sec 5.75 MBytes 48.2 Mbits/sec
[ 4] 66.00-67.00 sec 4.25 MBytes 35.7 Mbits/sec
[ 4] 67.00-68.00 sec 6.88 MBytes 57.6 Mbits/sec
[ 4] 68.00-69.00 sec 7.12 MBytes 59.7 Mbits/sec
[ 4] 69.00-70.00 sec 6.38 MBytes 53.6 Mbits/sec
[ 4] 70.00-71.00 sec 7.12 MBytes 59.8 Mbits/sec
[ 4] 71.00-72.00 sec 6.62 MBytes 55.5 Mbits/sec
[ 4] 72.00-73.00 sec 5.88 MBytes 49.2 Mbits/sec
[ 4] 73.00-74.00 sec 5.88 MBytes 49.3 Mbits/sec
[ 4] 74.00-75.00 sec 4.00 MBytes 33.6 Mbits/sec
[ 4] 75.00-76.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 76.00-77.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 77.00-78.00 sec 5.75 MBytes 48.3 Mbits/sec
[ 4] 78.00-79.00 sec 5.25 MBytes 44.0 Mbits/sec
[ 4] 79.00-80.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 80.00-81.00 sec 6.00 MBytes 50.3 Mbits/sec
[ 4] 81.00-82.00 sec 6.00 MBytes 50.4 Mbits/sec
[ 4] 82.00-83.00 sec 5.12 MBytes 43.0 Mbits/sec
[ 4] 83.00-84.00 sec 5.50 MBytes 46.1 Mbits/sec
[ 4] 84.00-85.00 sec 5.50 MBytes 46.2 Mbits/sec
[ 4] 85.00-86.00 sec 1.75 MBytes 14.7 Mbits/sec
[ 4] 86.00-87.00 sec 3.38 MBytes 28.3 Mbits/sec
[ 4] 87.00-88.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 88.00-89.00 sec 4.25 MBytes 35.6 Mbits/sec
[ 4] 89.00-90.00 sec 4.62 MBytes 38.8 Mbits/sec
[ 4] 90.00-91.00 sec 5.38 MBytes 45.1 Mbits/sec
[ 4] 91.00-92.00 sec 4.38 MBytes 36.7 Mbits/sec
[ 4] 92.00-92.69 sec 3.88 MBytes 47.5 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-92.69 sec 512 MBytes 46.3 Mbits/sec sender
[ 4] 0.00-92.69 sec 512 MBytes 46.3 Mbits/sec receiver

iperf Done.[/details]
Sounds to me like a problem somewhere. Could be you chose a crowded band or that your device connects at a low speed due to hardware. This is a good setup guide and contains excellent pointers on how to improve wireless performance for the 2.4 GHz band.
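
To put a number on how much the per-second throughput swings in a run like the one above, the interval lines can be summarised with a small awk filter. This is only a sketch: `iperf_minmax` is a made-up helper name, and it assumes the default iperf3 text output with rates reported in Mbits/sec.

```shell
# Summarise per-interval iperf3 results read from stdin.
# Assumes every interval line ends in "<value> Mbits/sec"; the header line
# and the final sender/receiver summary lines end in a different word,
# so they are skipped.
iperf_minmax() {
    awk '
        $NF == "Mbits/sec" {
            v = $(NF - 1) + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            n++
        }
        END {
            if (n) printf "intervals=%d min=%.1f max=%.1f\n", n, min, max
        }'
}
```

It could be used as `iperf3 -c 192.168.1.1 -n 512M | iperf_minmax`, where 192.168.1.1 is just an assumed server address. Fed the run above, it would report a minimum of 12.6 and a maximum of 76.6 Mbits/sec.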

@Bartvz
Thanks for the test results. That is about the same as the best I can get.
Of the 11 channels (1 to 11) I can choose from, only one reaches up to 70 Mbps, and only in 2 out of 20 runs on speedtest.net. The others are well below that: 30 Mbps at most, and usually below 20 Mbps. It is not very noisy or busy here, with only 11 APs around. With my old ath9k device and another ath10k device, 2.4 GHz performance is always around 85 Mbps on all 11 channels.

The 5GHz connection is stable now, no obvious problem.

My usage of the device is low, 2 hours at a time at most. I only take it out for some tests. I remember it rebooting (crashing) once or twice; I thought it was a problem with my unit.

New build is finally up!
No more random reboots or stack traces in the logs. It has been running rock solid for 3 days. Compiled with compiler flags which should eke out a bit more performance. Let me know what you guys think/experience!

Bartvz, thanks a lot.

I just installed r3636. It seems to be quite all right, but the snag came back:

Wed Mar  8 20:47:43 2017 kern.warn kernel: [   38.150000] WARNING: CPU: 1 PID: 2769 at net/core/skbuff.c:4194 skb_try_coalesce+0x228/0x35c()
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.170000] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TEE xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.400000] CPU: 1 PID: 2769 Comm: uhttpd Not tainted 4.4.50 #0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80460000 804e0000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  86c76b6c 80461c83 803df43c 00000001 00000ad1 804c367c 87c19d3f 804fade0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 8006320c 80460000 804e0000 80466188 8046618c 803e4070 87c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000003 80060fc8 87c19d3f 804fade0 00000000 000000da 00000000 00c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  ...
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.480000] Call Trace:
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.490000] [<8001671c>] show_stack+0x6c/0x88
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.500000] [<801b1010>] dump_stack+0x8c/0xc0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.510000] [<8002b944>] warn_slowpath_common+0xa0/0xd0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.520000] [<8002b9fc>] warn_slowpath_null+0x18/0x24
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.530000] [<8028bff8>] skb_try_coalesce+0x228/0x35c
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.540000] [<802e73c0>] tcp_try_coalesce+0x70/0xd4
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] 
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] ---[ end trace 55743a95a3379d01 ]---

Jimzhong reported a similar issue here: https://forum.openwrt.org/t/lede-v17-01-0-rc1/1285/6 - it is attributed to relayd - this is quite possible, as I use WDS (this is the WDS server router).

The official stable 17.01.0 is much worse, it either hangs or reboots after a while (reported here: https://bugs.lede-project.org/index.php?do=details&task_id=606)

Anyway, I'll test it for a couple of days and will come back with the results.

Keep up the good work and thanks a lot

Do your builds also suffer from crashes when sqm-cake is enabled? Also, do you happen to know whether VLANs are functional with your build? Thank you very much for the great work you're doing :slight_smile:

@karesch & @Mushoz : Thanks for your kind words!
Those hangs, crashes and reboots might be something related to hardware offloading in the switch. See this message (and the rest of the thread) on the cake mailing list: https://www.mail-archive.com/lede-dev@lists.infradead.org/msg06441.html

VLANs are probably broken, since they are broken on the rest of the mt76 devices atm.

Compiling a new build atm with ethtool included. Will upload once it is done so the brave can test if they want. Also noticed that I forgot to include nano and BCP38 in the last build. They will be included in the coming builds.

Is the VLAN issue a recent issue with mt76 devices? Are there any older builds with working VLAN support? I need VLAN support, because my new connection comes in as 3 different tagged VLANs on a single line: one for internet, one for IPTV and the last one for VoIP. I'd prefer to use a LEDE router over the standard ISP crap ^^

If my memory serves me correctly, this has been the situation for at least 2-3 months. But take this with a grain of salt because I don't use VLANs.

Test build is up: r3677

Running ethtool -k eth0 prints the following:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Run the following commands:

ethtool -K eth0 rx off
ethtool -K eth0 tx off
ethtool -K eth0 tso off

ethtool -k eth0 should now print:

Features for eth0:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ipv4: off
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

This should do the trick but we cannot be sure unless we test.
Also, this looks suspicious among the sea of offs:

tx-vlan-offload: on
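
To spot leftovers like this without reading the whole list, the output can be filtered down to the features that are still on. A minimal sketch, assuming the usual "name: state" layout of the ethtool -k output; `enabled_offloads` is a made-up helper name, not part of ethtool.

```shell
# List the features that `ethtool -k` still reports as "on".
# Reads the ethtool output on stdin and prints one feature name per line.
enabled_offloads() {
    awk -F': ' '$2 ~ /^on/ {
        name = $1
        sub(/^[ \t]+/, "", name)   # ethtool may indent sub-features
        print name
    }'
}
```

Usage would be `ethtool -k eth0 | enabled_offloads`. For the second listing above it would print scatter-gather, tx-scatter-gather and tx-vlan-offload, one per line.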


That looks interesting. Do you think it might be related to the non-functional VLAN functionality? Could you test whether VLANs are working once that setting is turned off?

Turning it off is not hard:

ethtool -K eth0 txvlan off

To verify if it's off:

ethtool -k eth0 | grep tx-vlan-offload

I have never worked with VLANs, so what is something easy I can do to test?

Is anyone else who is testing getting reboots and the following stack traces?
System log:

Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.010000] Task dump for CPU 3:
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.020000] swapper/3 R running 0 0 1 0x00100000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] ...
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.100000] Call Trace:
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.130000]
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1

Kernel log:

[ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
[ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
[ 9313.010000] Task dump for CPU 3:
[ 9313.020000] swapper/3 R running 0 0 1 0x00100000
[ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
[ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
[ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9313.030000] ...
[ 9313.100000] Call Trace:
[ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
[ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9313.130000]
[ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1
[ 9459.240000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9459.250000] 3-...: (0 ticks this GP) idle=bf0/0/0 softirq=848170/848170 fqs=0
[ 9459.260000] (detected by 1, t=6004 jiffies, g=104860, c=104859, q=2350)
[ 9459.270000] Task dump for CPU 3:
[ 9459.280000] swapper/3 R running 0 0 1 0x00100000
[ 9459.290000] Stack : 00000004 ffffffff 7fc85190 00000000 00000010 0040540f 804af2a4 803e0000
[ 9459.290000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 000010d9 800133f0
[ 9459.290000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9459.290000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9459.290000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9459.290000] ...
[ 9459.360000] Call Trace:
[ 9459.370000] [<8000bb7c>] __schedule+0x330/0x750
[ 9459.380000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9459.390000]
[ 9459.390000] rcu_sched kthread starved for 6017 jiffies! g104860 c104859 f0x0 s3 ->state=0x1

New test build here, r3764. All the usual stuff included.
I have been trying hard to crash it today by not turning off the offloading features and by running SQM QoS. So far, so good.
Also, VLAN tagging should be fixed. See this and this.

Just got another stack trace so testing now with all offloads off.

ethtool -K eth0 rx off tx off tso off sg off

I managed to crash it within a few minutes. It seems to fail much faster when under high loads. I was routing 500-600 Mbit/s of traffic WAN <-> LAN with SQM cake enabled, and I can reliably crash it within a few runs :frowning: going to test fq_codel now.

In my limited testing, fq_codel is completely stable. I was unable to crash the router over many high speed WAN <-> LAN tests through iperf with fq_codel enabled.

Was that with offloading turned off?

I didn't change the offloading settings, so if they are enabled by default on your build, as they are on the regular builds, then I tested with offloading enabled. I can retest with offloading disabled if you'd like. Is ethtool included in your build? My router is not connected to the internet, so I am not able to install packages through opkg.

According to the lede-dev and cake mailing lists, that should solve the problem.
Ethtool is included in my build, so no internet is required. Look at the posts above for how to use it. You can use it via SSH or via LuCI (somewhere in the startup settings you can add the line so it will be run every time the router reboots).
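
As a sketch of what that startup line could look like: as far as I know, the LuCI startup page (System > Startup > Local Startup) edits /etc/rc.local, so the file would end up roughly like this. The set of offloads to disable is simply the one from the posts above and is not a confirmed fix.

```shell
# /etc/rc.local: run once at the end of the boot process.
# Disable the eth0 offloads suspected of causing the crashes
# (same switches as in the earlier post; adjust as testing suggests).
ethtool -K eth0 rx off tx off tso off sg off

exit 0
```

After a reboot, `ethtool -k eth0` should show whether the settings were applied.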

  1. It seems like Ralink/MediaTek chipsets don't play well with the Intel 7265 WLAN adapter

I have tried both rtn56u (Ralink RT3662F) and dlink 860l. In both cases pushing data from router to client via UDP causes excessive datagram loss at a very low throughput rate (client to router is totally fine). Looking at station dump the tx retry is more than 40% for intel wlan client, but less than 1% for an android tablet. Both clients are sitting at about the same distance from the AP.

The same behavior happens with rtn56u as well.
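
For anyone who wants to compare clients the same way, the retry ratio can be pulled out of the station dump with a short filter. This is a rough sketch: `tx_retry_pct` is a made-up helper, the interface name `wlan0` in the usage line is an assumption, and it assumes the percentage is simply tx retries divided by tx packets, which may not be exactly how the figures above were derived.

```shell
# Print "<station MAC> <retry %>" for each station in an
# `iw dev <iface> station dump` read from stdin.
# Assumption: retry % = 100 * "tx retries" / "tx packets".
tx_retry_pct() {
    awk '
        /^Station/     { mac = $2 }
        /tx packets:/  { pkts[mac] = $NF + 0 }
        /tx retries:/  { retr[mac] = $NF + 0 }
        END {
            for (m in pkts)
                if (pkts[m] > 0)
                    printf "%s %.1f%%\n", m, 100 * retr[m] / pkts[m]
        }'
}
```

On the router it would be run as `iw dev wlan0 station dump | tx_retry_pct`. Stations are printed in no particular order.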

  2. Thank you OP for the custom builds. I am experiencing random crashes / reboots. The config is stock except that I enabled SQM using piece_of_cake and the cake qdisc. Since the router reboots itself after crashing, how do I collect a crash log to help with debugging?

EDIT: I used ethtool -K eth0 rx off tx off tso off sg off but it still ended up crashing. After taking a look at the output of ethtool -k eth0, I am now using:

ethtool -K eth0 txvlan off rx off tx off tso off sg off gso off gro off

Will report if it still crashes.

Probably a dumb question, but are you using the latest and greatest drivers for your Intel 7265 card? Another thing you can try is disabling its power saving features. Do you also have the problem with other routers? If not, you might be on to something.

The builds have debugging enabled and should write a crashlog to "/sys/kernel/debug/crashlog", but mine is always empty. I really should set up remote logging or a serial console, but I have been unable to find the time to do so.
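
For the remote logging route, something along these lines should forward the system log to another machine, so the last messages before a crash survive the reboot. Treat it as an untested sketch: 192.168.1.10 is an assumed address for a PC on the LAN running a syslog server on UDP port 514, and the option names are the standard ones from the system config as far as I know.

```shell
# Forward the router's system log to a remote syslog server.
# 192.168.1.10 and port 514/udp are assumptions; use your own server.
uci set system.@system[0].log_ip='192.168.1.10'
uci set system.@system[0].log_port='514'
uci set system.@system[0].log_proto='udp'
uci commit system

# Restart the logging service so the new settings take effect.
/etc/init.d/log restart
```

Kernel messages that never make it out before the hang will still be lost, which is where a serial console would help.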

At the moment, I would advise people not to use cake but to use fq_codel instead!