Asus RT-N16 unstable with LEDE

@buedi @mikeyp - I didn't have a chance to investigate this further over the weekend; instead I was learning about larger-scale systems thanks to a failure in a domestic water heater. One thing I noticed, though, is that sometimes LEDE fails and recovers (a brief period of internet unavailability, then an "Uptime" of <1 minute on the main status page), and sometimes LEDE fails and does not recover (no further routing, no response on the HTTP or SSH ports, and a hard reboot by unplugging is needed to regain functionality).

I had started down the path of writing logs to a USB drive so that they survive across reboots, but USB drives aren't supported out of the box. Y'all have any strategy for logging events?
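One option that avoids USB storage altogether is to forward the system log to another machine on the network, so the last messages before a hang survive on the receiver. The following is only a sketch: it relies on the `log_ip` and `log_port` options of the system section in /etc/config/system, and the function name and the receiver address are made up for illustration. A syslog listener on the receiving machine is assumed and not shown.

```shell
# Sketch only: forward the router's system log to a remote syslog receiver.
# remote_log_setup is a hypothetical helper name; the receiver IP is whatever
# machine on your LAN runs a syslog listener (an assumption, not from the thread).
remote_log_setup() {
    host="$1"            # IP address of the syslog receiver
    port="${2:-514}"     # standard syslog port unless told otherwise
    [ -n "$host" ] || { echo "usage: remote_log_setup <ip> [port]" >&2; return 1; }
    uci set "system.@system[0].log_ip=$host"
    uci set "system.@system[0].log_port=$port"
    uci commit system
}
```

After calling it, for example `remote_log_setup 192.168.1.10`, the log service has to be restarted with `/etc/init.d/log restart` for the change to take effect.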

@mrbene - Me neither, haha! But when I had LEDE on my router, I added USB support and wrote down the steps needed to get my thumb drive working. Maybe that helps you (although I don't know how to redirect the logs to another directory):

  • opkg update && opkg install usbutils (this gives you lsusb; lsusb -t was very useful for getting my thumb drive up and running)
  • opkg install e2fsprogs (because my drive had an existing ext2 filesystem)
  • opkg install kmod-fs-ext4 (to get ext2 / ext4 support into the kernel)
  • opkg install kmod-usb-storage (so the kernel supports USB storage)

With that I was able to mount my thumbdrive and have access to it.
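On the open question of redirecting the logs: one possible approach, sketched here under assumptions, is to mount the drive and point the `log_file` option in /etc/config/system at it. The device node /dev/sda1, the mount point /mnt/usb, the file name and the helper name `usb_log_setup` are all placeholders, not something confirmed in this thread.

```shell
# Sketch only: mount a USB stick and have the log service also write to a file on it.
# Assumptions: the stick appears as /dev/sda1 and /mnt/usb is a free mount point.
usb_log_setup() {
    dev="${1:-/dev/sda1}"    # hypothetical device node; check dmesg or lsusb -t
    mnt="${2:-/mnt/usb}"     # hypothetical mount point
    mkdir -p "$mnt" || return 1
    mount "$dev" "$mnt" || return 1
    # log_file and log_size are options of the system section in /etc/config/system
    uci set "system.@system[0].log_file=$mnt/system.log"
    uci set "system.@system[0].log_size=512"    # maximum size in KiB
    uci commit system
}
```

Run `usb_log_setup`, then `/etc/init.d/log restart`. The mount does not persist across reboots in this sketch, so it would have to be repeated at boot for the log to keep being written after a crash.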

It's not helpful in this case, but it helped me rule out one potential problem: I opened up my router and the capacitors look fantastic, so it should not be the more or less common capacitor problem. It still has the 470uF caps though. I've seen that newer models have 680uF.
I wonder if this really could make a difference. But if it does, we still have the problem of the very high CPU utilization @mrbene noticed :(

Last night after the kiddos passed out from sugar overdoses, I had a chance to poke around with the RT-N16. I changed the firmware to OpenWRT Chaos Calmer 15.05.1 (March 2016). High CPU (especially in SIRQ) under ethernet load reproduced there, too - but without triggering the failure that I'm seeing with LEDE 17.01.4.
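To put a number on the SIRQ load rather than eyeballing top, the softirq share can be computed from two samples of the aggregate `cpu` line in /proc/stat, where the seventh value after the label is the softirq counter. This is a rough sketch with made-up helper names, not a tool from the thread.

```shell
# Sketch only: softirq share of total CPU time between two /proc/stat "cpu" lines.
# Field layout after the "cpu" label: user nice system idle iowait irq softirq steal ...
sirq_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # sum user..steal (fields 2-9); guest time is already counted in user
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total; s[NR] = $8
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (s[2] - s[1]) / dt
        }'
}

# Take two samples a few seconds apart (default 5) and print the softirq percentage.
sample_sirq() {
    interval="${1:-5}"
    before=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    after=$(grep '^cpu ' /proc/stat)
    sirq_percent "$before" "$after"
}
```

Running `sample_sirq 10` once while idle and once during a speed test gives two figures that can be compared across firmware versions.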

Now, this might be because I'm maxing out my line at ~60 Mbps up/down and not reaching your 100 Mbps, @buedi. It definitely supports the thesis that an ethernet driver in OpenWrt has a problem.

Oh, and here's additional data. Specifically:

"Unfortunately CPUs on most of SoCs are too slow to provide 1000 Mb/s routing or NAT. It results in NAT being limited to something around 130Mb/s on BCM4706 and even less on slower units (like ~50Mb/s on BCM4718A1)."

RT-N16 runs BCM4718.

So, synopsis:

  • The CPU in the RT-N16 isn't expected to handle NAT at transfer rates faster than ~50Mb/s
  • High CPU utilization seems to be the expected result of this, and early attempts at reverse engineering ctf.ko don't appear to have gone anywhere.

Now I'm definitely interested in what version of DD-WRT you're using - and testing transfer rates on my network with that.

It would be nice to try those custom patches - they should improve WAN speeds by up to 300%.

Hey @mrbene

Everything you found makes sense, in that the information is backed by facts and sounds valid.
But I'm easily reaching 100 Mbit/s on the WAN side via NAT on my RT-N16. And on the LAN side (just switching), I reach speeds of up to 750 Mbit/s reading from my NAS and around 450 Mbit/s writing to my NAS, and I'm not even sure if the NAS is the limiting factor here.

My ancient DD-WRT Image is this one:
DD-WRT v24-sp2 (08/07/10) mega - build 14896
I had 14949 on it before my flash adventure, but I was not able to find that, so I'm on 14896 now.

It runs like a champ, but I'd really like to have something more modern with the latest security patches. A friend will help me change the capacitors on mine. They look good from the outside, but maybe they have dried out after all those years. This might help it run more stably. But that CPU load will probably prevent using a newer firmware version.

What blows my mind is: why is there such a huge difference in load between a 2010 firmware version and a newer one when performing basic tasks like switching and NAT?
If we were using tons of additional fancy stuff, I could understand it. But we do the same things with a firmware that is 7 years newer and get such a major difference? That's really strange.

Not too much RT-N16 discussion around here, just wondering if you guys find the wifi driver included by default very useful?
I've found that I need to use the kmod-brcm-wl driver as detailed here: https://wiki.openwrt.org/toh/asus/rt-n16
I also find I have to use the command line mostly or risk having settings get stepped on. But I'm interested in your experiences.
thanks

Sorry to revive an old one here. Wanted to mention that I have an RT-N16 and installed LEDE over the weekend (coming from DD-WRT). I ran the original b43 driver, kmod-brcmsmac, and kmod-brcm-wl using the instructions here: https://wiki.openwrt.org/toh/asus/rt-n16. I landed on kmod-brcm-wl since it was fastest after tweaks, BUT I may have whiffed on the earlier drivers and had them running in G only. It has been very solid so far, streaming Sling, YouTube, etc. It has been up for a little over a day with 1G+ transferred and no drops or reboots. I did notice some slight oddness with the UI and with committing changes after installing kmod-brcm-wl, but definitely no worse than DD-WRT.

Do you get a proper working list of wireless clients that are connected?

I do. They show up on the status page and under network/wireless. This router is set up as a "dumb AP", so no FW, DNS, DHCP or WAN. I also have IPv6 off. If there is something specific you want to see from the screens or config, let me know and I can post it. I am running the latest build, LEDE Reboot 17.01.4 r3560-79f57e422d.

Mine is set up as an AP as well, but I do have 5 vlan'd wifi networks on it (trunked back to the main Lede router). When I connect and go to the overview page, I see a list of 5 wireless networks (all incorrectly listed with the same name), and when I look at the connected clients list, I get 5 entries for my one device that is connected. You don't happen to have more than one SSID on there do you? I don't want to go through the hassle of installing a newer version because the unit lives where I can't easily get to it in the case of a problem, but if I was confident the gui would report correctly I would be much encouraged to do it.

I'm also on 17.01.4 r3560 and unfortunately having the same problem with the client list (3 SSIDs).

I only have one SSID. Based on your post I tried a second, ultimately connected to both, and streamed a little over each simultaneously. Some notes:

  1. I see the same behavior you guys do in LuCI. The associated stations look "dot producted", so I see each client twice in my case (2 SSIDs), and all SSIDs read as if they are the first one. Everywhere LuCI lists SSIDs I only see the first SSID duplicated, although when you click through to the details I can see the 2nd SSID's information. I am guessing this is a bug in LuCI, but I don't know that.
  2. I couldn't figure out a way to associate the new SSID with a new interface/VLAN from the GUI, so I did it manually over SSH in the /etc/config/wireless file. The 2nd SSID did not work until I did this.
  3. I didn't do a whole lot of run time, but there were no drops or router reboots.

I only have one LEDE-flashed router so I can't prove this, but my guess is that this isn't specific to the RT-N16. HTH
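The manual /etc/config/wireless step mentioned above could look roughly like the following when done through uci instead of editing the file by hand. This is a sketch under assumptions: the radio section name (for example wl0 with the Broadcom driver, radio0 with mac80211), the network name, and the helper name `add_ssid` are placeholders, and the target network must already exist in /etc/config/network.

```shell
# Sketch only: add a second SSID and bind it to an existing network interface.
# add_ssid is a hypothetical helper; radio, SSID, network and key are supplied by you.
add_ssid() {
    radio="$1"      # wifi-device section name, e.g. wl0 or radio0 (check your config)
    ssid="$2"       # name of the new wireless network
    network="$3"    # interface from /etc/config/network to attach it to
    key="$4"        # WPA2 passphrase
    [ -n "$key" ] || { echo "usage: add_ssid <radio> <ssid> <network> <key>" >&2; return 1; }
    uci add wireless wifi-iface >/dev/null
    uci set "wireless.@wifi-iface[-1].device=$radio"
    uci set "wireless.@wifi-iface[-1].mode=ap"
    uci set "wireless.@wifi-iface[-1].ssid=$ssid"
    uci set "wireless.@wifi-iface[-1].network=$network"
    uci set "wireless.@wifi-iface[-1].encryption=psk2"
    uci set "wireless.@wifi-iface[-1].key=$key"
    uci commit wireless
}
```

For example, `add_ssid wl0 guest-wifi guest 'a long passphrase'` followed by `wifi` to reload the wireless configuration. The `network` option is what ties the SSID to a particular interface/VLAN.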

I have a similar config on another lede router (same version of sw, different hardware) that works properly with multi SSID's as far as the reporting.

Would it be a good idea to replace it with a solid cap? The closest one I found was 470uf 16v as far as solid caps go (they don't make 680uf 16v).

Has anyone tried v18 with the RT-N16, and are there any improvements?

I'm heavily taxing a couple of RT-N16s with a bleeding-edge build. I can't speak for its WiFi capabilities, but as a router it's pretty solid.

It's also worth noting that I have replaced the electrolytic caps in all of my RT-N16s. I replaced the 680uF with a 1000uF. All Nichicon PW or better caps.

I have 18.06.1 on an RT-N16. kmod-brcm-wl does not work correctly. There are the following issues:

  1. Connecting takes a long time, or the client fails to receive an IP address
  2. Disconnection after 10-60 seconds

The kmod-b43 driver is fine.

I have the same problems with kmod-brcm-wl. Did you solve them?