Optimized build for the D-Link DIR-860L

Yeah, but only with the qos_simplest script (which seems to handle my connection best). No codel on that one, I think?

Does anybody know whether this issue was introduced by LEDE? Is this bug also present in the latest OpenWrt branch or in the CC release?

It depends on which one you set, fq_codel or cake :wink:

I have been thinking about working backwards to see when and where the bug (or multiple bugs) was introduced. That is a lot of work if you don't really know what you are looking for. Using some Google-fu, I found that the "skbuff.c - skb_try_coalesce" bug is also present in OpenWrt trunk (posted in an unrelated issue). I haven't found anything regarding the "rcu_sched" error for our chipset in combination with OpenWrt. However, I found this thread with a similar stack trace for the WRT1900AC, where people are pointing to the wireless driver.
I seem to remember a forum post where a member was making a build for our chipset without wireless support to see whether the crashes still happen. Unfortunately, even though I subject my housemates to my builds, I cannot make a build without working WiFi, since all modern people are as dependent on it as plants are on sunlight :stuck_out_tongue:
Edit: I might be able to make an experimental build with a 4.9 kernel to see if that solves one or both of those bugs.

If you could send me the build without wireless network support, or a link to the forum post where the build is discussed, I could test it this weekend. The girlfriend is gone for the weekend, so I can run tests without her complaining about the lack of wifi :wink:

I mentioned that I would try a non-wireless build. However, I don't own a DIR-860L, just very similar hardware with the same problems. Unfortunately, I didn't get around to making a test build and doing some testing...

But since others blame the wireless, is there a proprietary Ralink driver for OpenWrt? I know Padavan is using those, but on a 3.1 kernel, and I couldn't get them to build with the LEDE toolchain.

I looked into making a build without wireless but haven't had the time to figure it out. However, I am making a build with a 4.9 kernel at the moment. Fingers crossed! If it builds correctly, I won't be able to test it until tomorrow, though.
@drbrains: very interesting stuff you are up to! I noticed you have also almost finished porting the hardware crypto module, at least for your device, and that you are looking into porting the proprietary hwnat module.
Since all mt76 devices are affected, I don't think it will matter much, but then again there are some hardware revisions (if I understand everything correctly): different SoCs with different CPU cores, so that could also be a factor. I haven't figured out how to disable wireless, but would it be possible to build mt76 as a kernel module and load/unload it as required?
About the proprietary driver: the latest MediaTek SDK I could find is 5.0.1.0, which has a 3.1 kernel :confused: However, this repo of the non-GPL Ralink/MediaTek SDK driver might help, even though the latest commit is from 2014.

Edit: For the brave souls wanting to try a 4.9 kernel, you can download the build here. Please make sure you know how to recover your device, since there is a high chance this untested image will brick it!

Maybe I will have time to test it when I get home. By the way, when you brick your device, you can flash LEDE directly from recovery. Is that an okay way to flash? Until now, I always flashed the original firmware first and then went back to LEDE. I'm not sure whether that extra step is needed.

And if flashing LEDE from recovery is fine, should you use the factory or sysupgrade image?

If you brick your device, you could flash a sysupgrade image from recovery, but sometimes that won't boot. So I always flash the factory image from recovery.

Edit: Oh, and as a bonus, the 4.9 kernel build has BBR as the default TCP congestion control algorithm instead of CUBIC, which (like SQM QoS) should help against bufferbloat. Some more reading here.
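
If you want to confirm which congestion control algorithm the kernel is actually using, Linux exposes it in /proc/sys/net/ipv4/tcp_congestion_control, with the available choices in tcp_available_congestion_control. Below is a minimal Python sketch that reads both; it assumes Python is available wherever you run it (a stock LEDE image does not ship it, so on the router itself a plain cat of those two files does the same job).

```python
from pathlib import Path

# Standard Linux sysctl directory for IPv4 TCP settings.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_sysctl(name: str, base: Path = PROC_IPV4) -> str:
    """Return the stripped contents of one sysctl file under `base`."""
    return (base / name).read_text().strip()


def congestion_control_info(base: Path = PROC_IPV4) -> dict:
    """Report the active algorithm and the ones the kernel currently offers."""
    current = read_sysctl("tcp_congestion_control", base)
    available = read_sysctl("tcp_available_congestion_control", base).split()
    return {
        "current": current,
        "available": available,
        "bbr_active": current == "bbr",
    }
```

Calling congestion_control_info() on a box running this build should report "bbr" as current if BBR really is the default.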

Third dsl-reports run in a row without a crash while running cake. Could it be? To be honest, I was surprised that the image even worked in the first place. I thought getting kernel 4.9 to work still required quite a bit of hacking and tweaking to be useful. But it booted up and worked just fine :slight_smile: I did uncheck the "keep settings" option just to be sure. I will keep an eye on it some more, but initial tests are very promising :slight_smile:

Edit: I think I have done 10 tests now. No crash or stack trace :slight_smile: I did run into the same bug as before, though. The router stopped responding to most of my commands in the SSH session (such as top, but some other commands were also affected), and changing settings through LuCI also stopped working (it would hang at "applying changes"). It required a reboot to fix. Not sure what is up with that. Did you re-enable SMT in your build system, or is it still disabled?

Edit 2: I think LuCI not saving and commands not working in SSH are related. The router probably also no longer responds to the network restart command.

Exciting news! 4.9 Kernel here we come!

Does this build have WiFi or SMT disabled?

I think WiFi was disabled by default, but I didn't pay much attention to that. I restored my wireless config file after the flash, and it seems to work just fine :slight_smile: I am not sure whether SMT is enabled or disabled. I cannot check this, because due to a bug, cat /proc/cpuinfo always shows 4 cores, regardless of whether SMT was enabled or disabled in the kernel options before compiling.
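
For cross-checking, /proc/cpuinfo has one `processor` entry per logical CPU, and the kernel also publishes the online CPU list in /sys/devices/system/cpu/online as a range string such as 0-3. A small Python sketch that counts the former and expands the latter; whether the sysfs list reflects the SMT kernel option any better than cpuinfo does on this platform is an untested assumption.

```python
from pathlib import Path


def count_cpuinfo_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries, i.e. logical CPUs listed in /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


def parse_cpu_list(cpu_list: str) -> list[int]:
    """Expand a kernel CPU list such as '0-3' or '0,2-3' into CPU numbers."""
    cpus: list[int] = []
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_summary() -> dict:
    """Read both sources on the local machine and report them side by side."""
    cpuinfo = Path("/proc/cpuinfo").read_text()
    online = Path("/sys/devices/system/cpu/online").read_text()
    return {
        "cpuinfo_processors": count_cpuinfo_processors(cpuinfo),
        "online_cpus": parse_cpu_list(online),
    }
```

If cpu_summary() reports 4 in both places on a build compiled with SMT disabled, the sysfs list is no better a check than cpuinfo here.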

I will keep an eye on it, use it daily with SQM enabled and see how it goes. For now, the aforementioned bug, where the router stops accepting commands over SSH and LuCI can no longer apply setting changes, is quite annoying, though. It happens regularly for me, and once the bug is active, changing settings requires a reboot. Not the most convenient, since I like to tinker with the router a lot.

Hopefully SQM is stable over a longer period of time. I will keep you guys updated :slight_smile: Definitely very promising.

What bandwidth settings are you using in SQM? Is the CPU approaching 0% idle during a dsl-reports test?

500/500 Mbit. Unfortunately, I am only getting 350-ish in both directions. Bufferbloat scores are A, though :slight_smile:

I didn't get many looks, since the top command often stopped working due to the aforementioned bug. But I noticed around 15-20% idle during one of my tests. The CPU is still clearly the bottleneck, though. The workload doesn't scale perfectly across 4 threads :frowning:

I just realised you meant to ask whether this build has WiFi disabled by removing the mt76 driver. As my answer might have already given away, the mt76 driver is still included and WiFi works.
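
Since top kept hanging, CPU idle time can also be estimated from two snapshots of /proc/stat taken a second or so apart: the change in idle (plus iowait) jiffies divided by the change in total jiffies. A minimal Python sketch of that calculation, assuming it runs where Python is installed (or on snapshots copied off the router with cat /proc/stat). The first `cpu` line aggregates all threads, while `cpu0`, `cpu1`, ... give per-thread figures, which would show how unevenly the load spreads.

```python
import time
from pathlib import Path


def parse_cpu_line(stat_text: str, cpu: str = "cpu") -> tuple[int, int]:
    """Return (idle, total) jiffies for one 'cpu' or 'cpuN' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cpu:
            # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
            # Guest time is already counted in user/nice, so only the first 8 are summed.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            return idle, sum(values)
    raise ValueError(f"{cpu!r} line not found in /proc/stat text")


def idle_percent(before: str, after: str, cpu: str = "cpu") -> float:
    """Percentage of time spent idle (including iowait) between two snapshots."""
    idle0, total0 = parse_cpu_line(before, cpu)
    idle1, total1 = parse_cpu_line(after, cpu)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return 100.0 * (idle1 - idle0) / delta_total


def sample_idle_percent(interval: float = 1.0, cpu: str = "cpu") -> float:
    """Take two local /proc/stat snapshots `interval` seconds apart."""
    stat = Path("/proc/stat")
    before = stat.read_text()
    time.sleep(interval)
    return idle_percent(before, stat.read_text(), cpu)
```

For example, sample_idle_percent(1.0, "cpu2") during a dsl-reports run would give the idle share of the third thread alone.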

Sounds like this is almost daily driver material, if the weird ssh/Luci bug is sorted out!

EDIT: Spoke too soon, I too have the LuCI non-responsive bug. The router functions fine after a reboot, however.

I went ahead and flashed this build on my router as well!

Snazzy new LuCI skin, SQM performing well, and no SSH/LuCI issues so far!

It even seems to have SQM and luci-ssl built in!

Gonna run it as a secondary router for the weekend to make sure no crashes occur, then this bad boy will become my main router!

I just woke up to a phone that was still connected to the WiFi, but with no traffic getting through. After disabling and re-enabling WiFi, the phone could no longer reconnect to the access point. The Chromecast had disconnected from the WiFi automatically during the night.

I booted up my computer and tested the Ethernet connection. No luck there either. The LuCI login page loaded, but I couldn't log in. I presume the login page was served from my computer's cache. Trying to open the page again also failed. The router was also unreachable through SSH. I could see that my laptop's Ethernet NIC was connected, but running ipconfig /renew failed.

Definitely not a daily driver for me in its current state, but we're getting closer and closer :slight_smile: The initial SQM tests were very promising :slight_smile: Let's dig into this some more and try to get everything worked out.

I am also surprised that the build works and that it doesn't crash for you while running speedtests with cake enabled. The LuCI/SSH bug concerns me, however. I have not run into it, but for normal router management it's quite essential ;). Is there anything in the logs concerning LuCI or SSH? Maybe it's because I include luci-ssl-openssl instead of the regular luci-ssl? Just guessing here.
I forgot to mention that the build is a normal build with all the regular goodies, so SMP, SMT and mt76 are all enabled.

Does your router still respond to commands in a SSH session?

So something breaks, but the initial results with cake are promising. If someone is able to get a log I will be very grateful :slight_smile: Tonight around 23:00 UTC+1 I can finally test the build myself, I need to have some patience :see_no_evil:

Nothing strange in the logs when SSH/LuCI broke. The symptoms and behavior of the bug were exactly the same as with the 4.4 build with SMT disabled. Maybe it is still a bug caused by SQM, but the crashes on the non-experimental builds kept it from ever rearing its head?

When the router stopped working completely and required a reboot, I wasn't able to get any logs, since it was reachable neither through LuCI nor through SSH. I am now running an SSH session with logread -f to hopefully catch something before it crashes again.
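
To make sure the log outlives the router, the logread -f stream can be captured on the client side and written to disk line by line. A minimal Python sketch, assuming an OpenSSH client on the laptop and the default LEDE address root@192.168.1.1 (both placeholders, adjust to your setup). The ServerAliveInterval/ServerAliveCountMax options make ssh give up when the router stops answering instead of hanging forever.

```python
import subprocess
from datetime import datetime
from typing import Iterable, TextIO

# Placeholders: default LEDE LAN address and root login; adjust to your setup.
ROUTER = "root@192.168.1.1"
LOGFILE = "router-log.txt"


def tee_with_timestamps(lines: Iterable[str], out: TextIO) -> int:
    """Write each line prefixed with the local time it arrived; return the count."""
    count = 0
    for line in lines:
        stamp = datetime.now().isoformat(timespec="seconds")
        out.write(f"{stamp} {line.rstrip()}\n")
        out.flush()  # flush per line so a sudden hang loses nothing already received
        count += 1
    return count


def follow_router_log(router: str = ROUTER, logfile: str = LOGFILE) -> int:
    """Stream `logread -f` over SSH into a local file; return ssh's exit code."""
    cmd = [
        "ssh",
        "-o", "ServerAliveInterval=15",  # probe the router every 15 s...
        "-o", "ServerAliveCountMax=3",   # ...and give up after 3 missed replies
        router,
        "logread", "-f",
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, open(
        logfile, "a", encoding="utf-8"
    ) as out:
        tee_with_timestamps(proc.stdout, out)
        return proc.wait()
```

Run follow_router_log() from the laptop before starting a test; every line lands in router-log.txt with the time it was received, so the last entries before a hang survive even if the router has to be power-cycled.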

Just another thing muddying the waters, unfortunately. The only thing we can do is try to get some logs and see what is causing these bugs. My gut feeling says it has something to do with the drivers for our device, but without evidence that is just guesswork. Did you catch anything last night? Did WiFi break again?

Just flashed the 4.9 kernel build. Running it without SQM QoS for the time being (need a bit of stability due to lack of time). Let's see if I can break stuff :stuck_out_tongue:

Nothing has broken so far. LuCI still works and SSH commands are working too. I think it only broke under heavy load. Not sure whether SQM had anything to do with the issues.

Perhaps SQM plus heavy load now leaves it in a buggy state rather than causing stack traces or hard crashes? I will have to test whether heavy traffic also causes the same issues with SQM disabled.