Optimized build for the D-Link DIR-860L

Bartvz, thanks a lot.

I just installed r3636. It seems to be quite alright, but I hit a snag:

Wed Mar  8 20:47:43 2017 kern.warn kernel: [   38.150000] WARNING: CPU: 1 PID: 2769 at net/core/skbuff.c:4194 skb_try_coalesce+0x228/0x35c()
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.170000] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TEE xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.400000] CPU: 1 PID: 2769 Comm: uhttpd Not tainted 4.4.50 #0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80460000 804e0000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  86c76b6c 80461c83 803df43c 00000001 00000ad1 804c367c 87c19d3f 804fade0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 8006320c 80460000 804e0000 80466188 8046618c 803e4070 87c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000003 80060fc8 87c19d3f 804fade0 00000000 000000da 00000000 00c19bec
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.410000] 	  ...
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.480000] Call Trace:
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.490000] [<8001671c>] show_stack+0x6c/0x88
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.500000] [<801b1010>] dump_stack+0x8c/0xc0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.510000] [<8002b944>] warn_slowpath_common+0xa0/0xd0
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.520000] [<8002b9fc>] warn_slowpath_null+0x18/0x24
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.530000] [<8028bff8>] skb_try_coalesce+0x228/0x35c
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.540000] [<802e73c0>] tcp_try_coalesce+0x70/0xd4
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] 
Wed Mar  8 20:47:44 2017 kern.warn kernel: [   38.550000] ---[ end trace 55743a95a3379d01 ]---

Jimzhong reported a similar issue here: https://forum.openwrt.org/t/lede-v17-01-0-rc1/1285/6 - it is attributed to relayd - this is quite possible, as I use WDS (this is the WDS server router).

The official stable 17.01.0 is much worse: it either hangs or reboots after a while (reported here: https://bugs.lede-project.org/index.php?do=details&task_id=606)

Anyway, I'll test it for a couple of days and will come back with the results.

Keep up the good work and thanks a lot

Do your builds also suffer from crashes when sqm-cake is enabled? Also, do you happen to know whether VLANs are functional with your build? Thank you very much for the great work you're doing :slight_smile:

@karesch & @Mushoz : Thanks for your kind words!
Those hangs, crashes and reboots might be something related to hardware offloading in the switch. See this message (and the rest of the thread) on the cake mailing list: https://www.mail-archive.com/lede-dev@lists.infradead.org/msg06441.html

VLANs are probably broken, since they are broken on the rest of the mt76 devices atm.

Compiling a new build atm with ethtool included. Will upload once it is done so the brave can test if they want. Also noticed that I forgot to include nano and BCP38 in the last build. They will be included in the coming builds.

Is the VLAN issue a recent issue with mt76 devices? Are there any older builds with working VLAN support? I need VLAN support, because my new connection comes in as 3 different tagged VLANs on a single line: one for internet, one for IPTV and the last one for VoIP. I'd prefer to use a LEDE router over the standard ISP crap ^^

If my memory serves me correctly, this has been the situation for at least 2-3 months. But take this with a grain of salt, because I don't use VLANs.

Test build is up: r3677

Running ethtool -k eth0 prints the following:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Run the following commands:

ethtool -K eth0 rx off
ethtool -K eth0 tx off
ethtool -K eth0 tso off

ethtool -k eth0 should now print:

Features for eth0:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ipv4: off
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

This should do the trick but we cannot be sure unless we test.
Also, this looks suspicious among the sea of offs:

tx-vlan-offload: on
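
Spotting the leftovers by eye in a listing this long is error-prone, so here is a minimal sketch of a filter for it. It assumes the `ethtool -k` output format shown above; the function name `list_changeable_on` is made up for illustration. It prints every feature reported as plain "on", which means still enabled and not marked [fixed].

```shell
# Print every feature that `ethtool -k` reports as "on" without "[fixed]",
# i.e. the ones that are enabled and can still be switched off.
# Reads the ethtool -k output on stdin.
list_changeable_on() {
    awk -F': ' '
        $2 == "on" {
            name = $1
            sub(/^[ \t]+/, "", name)   # some ethtool versions indent sub-features
            print name
        }
    '
}

# Usage on the router (eth0 as in the posts above):
#   ethtool -k eth0 | list_changeable_on
```

Fed the second listing above, this would print scatter-gather and tx-scatter-gather in addition to tx-vlan-offload, since those two are also still "on" there.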


That looks interesting. Do you think it might be related to the non-functional VLAN functionality? Could you test whether VLANs are working once that setting is turned off?

Turning it off is not hard:

ethtool -K eth0 txvlan off

To verify if it's off:

ethtool -k eth0 | grep tx-vlan-offload

I have never worked with VLANs, so what is something easy I can do to test?

Is anyone else who is testing getting reboots and the following stack traces?
System log:

Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.010000] Task dump for CPU 3:
Sun Mar 12 13:45:11 2017 kern.info kernel: [ 9313.020000] swapper/3 R running 0 0 1 0x00100000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.030000] ...
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.100000] Call Trace:
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
Sun Mar 12 13:45:11 2017 kern.warn kernel: [ 9313.130000]
Sun Mar 12 13:45:11 2017 kern.err kernel: [ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1

Kernel log:

[ 9312.980000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9312.990000] 3-...: (0 ticks this GP) idle=968/0/0 softirq=848170/848170 fqs=0
[ 9313.000000] (detected by 1, t=6004 jiffies, g=104506, c=104505, q=166)
[ 9313.010000] Task dump for CPU 3:
[ 9313.020000] swapper/3 R running 0 0 1 0x00100000
[ 9313.030000] Stack : 00000004 00000001 0000000a 00000000 00000000 00000001 804af2a4 803e0000
[ 9313.030000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 00000000 800133f0
[ 9313.030000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9313.030000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9313.030000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9313.030000] ...
[ 9313.100000] Call Trace:
[ 9313.110000] [<8000bb7c>] __schedule+0x330/0x750
[ 9313.120000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9313.130000]
[ 9313.130000] rcu_sched kthread starved for 6017 jiffies! g104506 c104505 f0x0 s3 ->state=0x1
[ 9459.240000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 9459.250000] 3-...: (0 ticks this GP) idle=bf0/0/0 softirq=848170/848170 fqs=0
[ 9459.260000] (detected by 1, t=6004 jiffies, g=104860, c=104859, q=2350)
[ 9459.270000] Task dump for CPU 3:
[ 9459.280000] swapper/3 R running 0 0 1 0x00100000
[ 9459.290000] Stack : 00000004 ffffffff 7fc85190 00000000 00000010 0040540f 804af2a4 803e0000
[ 9459.290000] 8045e75c 80460000 00000001 8045e680 00000001 80460000 000010d9 800133f0
[ 9459.290000] 1100fc03 00000003 87c74000 87c75eb8 80460000 8005d028 1100fc03 00000003
[ 9459.290000] 00000000 803e0000 804af2a4 8005d020 87c74000 87c75ee0 80460000 8001aea8
[ 9459.290000] 1100fc03 00000004 8045e4a0 000000a0 8045e75c 8001aeb0 02002810 26200200
[ 9459.290000] ...
[ 9459.360000] Call Trace:
[ 9459.370000] [<8000bb7c>] __schedule+0x330/0x750
[ 9459.380000] [<800133f0>] r4k_wait_irqoff+0x0/0x20
[ 9459.390000]
[ 9459.390000] rcu_sched kthread starved for 6017 jiffies! g104860 c104859 f0x0 s3 ->state=0x1

New test build here, r3764. All the usual stuff included.
I have been trying hard to crash it today by not turning off the offloading features and by running SQM QoS. So far, so good.
Also, VLAN tagging should be fixed. See this and this.

Just got another stack trace so testing now with all offloads off.

ethtool -K eth0 rx off tx off tso off sg off

I managed to crash it within a few minutes. It seems to fail much faster when under high load. I was routing 500-600 Mbit/s of traffic WAN <-> LAN with SQM cake enabled, and I can reliably crash it within a few runs :frowning: I'm going to test fq_codel now.

In my limited testing, fq_codel is completely stable. I was unable to crash the router over many high speed WAN <-> LAN tests through iperf with fq_codel enabled.

Was that with offloading turned off?

I didn't change the offloading settings, so if they are enabled by default on your build, as they are on the regular builds, then I tested with offloading enabled. I can retest with offloading disabled if you'd like. Is ethtool included in your build? My router is not connected to the internet, so I am not able to install packages through opkg.

According to the lede-dev and cake mailing lists that should solve the problem.
Ethtool is included in my build, so no internet is required. See the posts above for how to use it. You can use it via SSH or via LuCI (somewhere in the startup settings you can add the line so it will be run every time the router reboots).
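
For the run-at-every-boot part, a rough sketch of what such a startup snippet could look like. It assumes the LuCI startup box ends up in /etc/rc.local, and the helper name `disable_offloads` and the `ETHTOOL` override are hypothetical additions for illustration; only the `ethtool -K <iface> <feature> off` call itself comes from the posts above.

```shell
# Sketch of a helper that could be pasted into /etc/rc.local.
# ETHTOOL may be overridden for a dry run; it defaults to the real ethtool.
disable_offloads() {
    iface="$1"
    shift
    for feature in "$@"; do
        # One feature per call, so a failure is reported per feature.
        "${ETHTOOL:-ethtool}" -K "$iface" "$feature" off ||
            echo "could not switch off $feature on $iface" >&2
    done
}

# In /etc/rc.local, before the final "exit 0":
#   disable_offloads eth0 rx tx tso sg
```

Setting ETHTOOL to a command that only prints its arguments makes it possible to check what would be run without touching the interface.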

  1. It seems like Ralink/MediaTek chipsets don't play well with the Intel 7265 WLAN adapter

I have tried both rtn56u (Ralink RT3662F) and dlink 860l. In both cases pushing data from router to client via UDP causes excessive datagram loss at a very low throughput rate (client to router is totally fine). Looking at station dump the tx retry is more than 40% for intel wlan client, but less than 1% for an android tablet. Both clients are sitting at about the same distance from the AP.

The same behavior happens with rtn56u as well.

  2. Thank you OP for the custom builds. I am experiencing random crashes / reboots. The config is stock except that I enabled SQM using piece_of_cake and the cake qdisc. Since the router reboots itself after crashing, how do I collect a crash log to help with debugging?

EDIT: I used ethtool -K eth0 rx off tx off tso off sg off but it still ended up crashing. After taking a look at ethtool -k eth0 I am now using

ethtool -K eth0 txvlan off rx off tx off tso off sg off gso off gro off

Will report if it still crashes.
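
To make the station-dump comparison above reproducible, here is a small sketch that turns the counters into a retry percentage per client. It assumes the usual `iw dev <iface> station dump` layout with "tx packets:" and "tx retries:" lines, and it assumes the percentage is meant as retries relative to tx packets; the function name `tx_retry_percent` is made up for illustration.

```shell
# Reads `iw dev <iface> station dump` output on stdin and prints, per station,
# tx retries as a percentage of tx packets.
tx_retry_percent() {
    awk '
        $1 == "Station"                { mac = $2; packets = 0 }
        $1 == "tx" && $2 == "packets:" { packets = $3 }
        $1 == "tx" && $2 == "retries:" {
            if (packets > 0)
                printf "%s %.1f%%\n", mac, 100 * $3 / packets
        }
    '
}

# Usage on the router (wlan0 is an assumed interface name):
#   iw dev wlan0 station dump | tx_retry_percent
```

Running it once during a router-to-client UDP test for each client gives numbers that can be compared directly between the Intel card and the tablet.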

Probably a dumb question, but are you using the latest and greatest drivers for your Intel 7265 card? Another thing you can try is disabling the power saving features for it. Do you also have the problem with other routers? If not, you might be on to something.

The builds have debugging enabled and should write a crashlog to "/sys/kernel/debug/crashlog" but mine is always empty. I really should set up remote logging or a serial console but I have been unable to find the time to do so.
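
In case the crashlog does get filled on someone else's router, a minimal sketch for keeping a copy across later reboots. The function name `save_crashlog` and the destination directory are hypothetical; only the /sys/kernel/debug/crashlog path comes from this post, and whether it ever contains anything on this device is exactly what is unclear.

```shell
# Copy the crash log to persistent storage if it is non-empty.
# Arguments: source file, destination directory. Both are passed in so the
# function can be tried against ordinary files first.
save_crashlog() {
    src="$1"
    dest_dir="$2"
    [ -s "$src" ] || return 1          # missing or empty: nothing to keep
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/crashlog.$(date +%Y%m%d-%H%M%S)"
}

# Possible call at boot (the destination directory is a made-up choice):
#   save_crashlog /sys/kernel/debug/crashlog /root/crashlogs
```

The function returns non-zero when the file is missing or empty, so an empty crashlog like mine simply results in nothing being copied.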

At the moment, I would advise people not to use cake but to use fq_codel instead!

Hello Bartvz and thanks for trying to make this router a better device :slight_smile:
Do you know about the efforts of using Mediatek's hardware drivers/optimisations with OpenWRT: https://forum.mqmaker.com/t/alternative-openwrt-mtksdk-build-for-the-witi-board-wip/272? This is for another (but using the same chipset) device (Witi), but could you please analyse if something could be used with LEDE to make an even better device having hardware NAT/IPsec/QoS?

I used

ethtool -K eth0 txvlan off rx off tx off tso off sg off gso off

and used cake + piece_of_cake.qos on 5 GHz; it has been stable since I last posted.

Hi Bart,

Do you happen to know if the DIR-860l supports jumbo frames? According to this topic in the mailing list it is disabled: http://lists.infradead.org/pipermail/lede-dev/2017-April/007008.html

Not sure if the SOC does not support jumbo frames and the flag is correctly not getting set, or if it should have been set but it isn't. The SOC itself isn't that old, so I would expect jumbo frame support to be honest.

According to this topic, the edgerouter x seems to support at least up to 2k frames: https://community.ubnt.com/t5/EdgeMAX/Edge-devices-and-Jumboframe-support/td-p/1382993

As far as I know, this router uses the same SOC as the DIR-860l. Is this correct?

I am no C programmer, but I believe in order to set this flag, this file should be changed: https://github.com/lede-project/source/blob/master/target/linux/ramips/files-4.9/drivers/net/ethernet/mtk/soc_mt7621.c

More specifically, the following function should be edited:

static void mt7621_init_data(struct fe_soc_data *data,
		     struct net_device *netdev)

This section:

	priv->flags = FE_FLAG_PADDING_64B | FE_FLAG_RX_2B_OFFSET |
		FE_FLAG_RX_SG_DMA | FE_FLAG_NAPI_WEIGHT |
		FE_FLAG_HAS_SWITCH;

Should have the "FE_FLAG_JUMBO_FRAME" flag added, correct? Could somebody please verify whether this is correct? If so, I will try to compile the build with this edit to see if I can get an MTU of 1508 working.

Edit: Wait, I'm not sure if that is the correct file. Why is there a reference to 4.9 in the file url? Current builds are not using the 4.9 kernel yet, right?
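
One way to check which per-kernel directory matches a running build is to derive it from the kernel version; the trace at the top of this thread shows 4.4.50 for r3636. The sketch below assumes the source tree names these directories files-<major>.<minor>, as the files-4.9 in the URL suggests. The function name `kernel_files_dir` is made up, and whether a matching directory actually exists in the branch being built is not confirmed here.

```shell
# Print the assumed per-kernel directory name for a kernel version string,
# keeping only "major.minor" (e.g. 4.4.50 -> files-4.4).
kernel_files_dir() {
    echo "files-$(echo "$1" | cut -d. -f1-2)"
}

# Usage on the router:
#   kernel_files_dir "$(uname -r)"
```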