I'm encountering a problem that results in the same error posted here:
Oct 3 12:08:59 ROUTER1A kernel: [12276.321915] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000053
It happens approximately every 4-10 hours. It's an issue with SNAPSHOT r17652-21c7a8593d and also happened on r17443 from the UBI installer repo.
This morning I attempted to get pstore data, but I was unable to contact the router at its assigned IP address, 10.10.1.1. I also didn't have any luck pinging 192.168.1.1. I will attempt to access it through either SSH or the web interface on the default IPs/ports when it next fails, and hopefully I will have more data to provide.
One thing I did note is that the router continued to access the Internet and route traffic over Ethernet. It was the WiFi (both the 2.4GHz and 5GHz radios) that wasn't functional. The fact that it routed traffic as a gateway makes me think that it was still operating as 10.10.1.1, though I couldn't get a ping response and didn't see any traffic from that IP in Wireshark.
I've been following this thread for the past couple days as I tried to diagnose the issue. If I am able to get pstore logs/data, I'll post them here. This router is not part of a mesh or anything like that. Normal gateway router/AP only.
I think in that case I would consider DFS. You ought to be able to scan and monitor and see which channel segment is the least occupied.
BTW, for anyone wanting more SQM / VPN performance, just enable irqbalance. I noticed a significantly lower loadavg with it enabled, and I saw on another thread that this router can manage 1Gbit SQM with it enabled. Actually, is there any reason not to have this enabled?
Finally, has anyone got 160MHz mesh or WDS working? Or 160MHz working well in general, as in greater throughput than 80MHz?
I haven't really tried 160MHz. Originally tried to get it working back in June/July before LuCI had full ax support and couldn't get a stable connection. So I just set it to 80MHz and kept it there until earlier today experimenting a little bit. From my quick tests it did connect at 160MHz and report as such on the status page / station list. Have no idea about performance / stability, this was just a quick test before I decided I should probably keep the router on 80MHz until this crash problem is sorted so I don't introduce any additional variables in the troubleshooting process.
Is there any documentation on channel selection in this router, especially at 160MHz? I know with some routers only certain channels work with wider bandwidth, and it doesn't seem that all routers refer to the block of channels in a standard way. I seem to recall that when specifying the channel number, some routers want you to select the low numbered channel, some want the high numbered channel, and some want you to choose the center channel. Not sure how this works with this model, going to see what I can find out about this.
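To make the low/high/center confusion concrete, here is a minimal sketch of the standard 5GHz 160MHz block layout. It only shows which 20MHz channels belong to which 160MHz block and what the center channel number is; it does not tell you which of those numbers this router's firmware expects in its config, and `block_160` is a name made up for this example.

```python
# Standard 5 GHz 160 MHz blocks, identified by their lowest 20 MHz channel.
# The 149-177 block is only permitted in some regulatory domains.
BLOCK_STARTS_160 = (36, 100, 149)


def block_160(primary):
    """Return (lowest, center, highest) channel numbers of the 160 MHz
    block that contains the 20 MHz channel `primary`, or None if that
    channel is not part of any 160 MHz block."""
    for start in BLOCK_STARTS_160:
        # A 160 MHz block is eight 20 MHz channels, numbered in steps of 4.
        channels = [start + 4 * i for i in range(8)]
        if primary in channels:
            # The center channel number sits 14 above the lowest channel.
            return start, start + 14, channels[-1]
    return None


if __name__ == "__main__":
    for ch in (36, 64, 112, 140):
        print(ch, block_160(ch))
```

So a router that asks for the "low" channel would want 36 or 100, one that asks for the "center" would want 50 or 114, and channels such as 132-144 cannot be used for 160MHz at all.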
My router just had the oops occur again and I was finally able to access it.
The router is running r17652-21c7a8593d, and after the oops it booted into recovery SNAPSHOT r17443-90e167abaa. Accessing the router at 192.168.1.1 and default ports was successful.
I was able to obtain dmesg-ramoops-0 and dmesg-ramoops-1 which appear to have identical data about the trace, as follows:
These NULL pointer dereferences at virtual address 0000000000000053 have been seen and reported in this thread since the end of July. I had multiple occurrences as well, then reverted to a version from before the end of July, OpenWrt SNAPSHOT r17114-349e2b7e65.
It is reported here
and supposedly fixed 10 days ago.
Not sure when it will end up in the OpenWrt snapshot builds, though. Does anyone know?
I saw this reported earlier in the thread and the request for pstore logs. I also followed the link to the issue on GitHub, saw the commit, and thought that meant it'd be in snapshot builds from then on. Thought it must not have been a full fix since it seemed to be the same issue. Didn't realize until you pointed it out that commit doesn't get incorporated into subsequent OpenWrt snapshots without further steps. My mistake.
I think I'll revert later today to r17114 on my main router.
I am also waiting for this fix to be incorporated in the nightly build as my router goes into recovery mode every 2-3 days. Thinking of writing a script to clear the pstore content and to reboot the router whenever it goes into recovery for the time being.
My replacement arrived. I have rebooted many times with no issues. So it really does seem like the issue was some form of hardware failure, although I wonder whether it could still have been managed in software if the chip could have been powered down and back up in software.
For those of you asking about using 160MHz on the low interference channels --
I'm running 160MHz on the DFS channel block (100-166) with the following configuration and it's working great with two RT3200s (one as a router and the other as a wired AP), both on the same channel with DAWN. Stability is good, both the 2.4GHz and 5GHz radios are using the same SSID, and my devices seem to roam OK. I've only had it up for 2 days, though, and I have yet to migrate my IoT devices over from my old network, so I can't say how it will hold up in the long run.
The only test that I did was to connect one RT3200 as a client to the other on a 160MHz wide channel, since I don't have any AX-enabled devices. Throughput was around 500-600Mbps, which seems higher than what I can usually achieve with 802.11ac at 80MHz (~400-500Mbps).
One of my routers started playing up, just saw the following in the logs:
Oct 6 15:39:49 router-mesh kernel: [ 1476.190032] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000053
Oct 6 15:39:49 router-mesh kernel: [ 1476.198853] Mem abort info:
Oct 6 15:39:49 router-mesh kernel: [ 1476.201655] ESR = 0x96000005
Oct 6 15:39:49 router-mesh kernel: [ 1476.204700] EC = 0x25: DABT (current EL), IL = 32 bits
Oct 6 15:39:49 router-mesh kernel: [ 1476.210002] SET = 0, FnV = 0
Oct 6 15:39:49 router-mesh kernel: [ 1476.213054] EA = 0, S1PTW = 0
Oct 6 15:39:49 router-mesh kernel: [ 1476.216188] Data abort info:
Oct 6 15:39:49 router-mesh kernel: [ 1476.219060] ISV = 0, ISS = 0x00000005
Oct 6 15:39:49 router-mesh kernel: [ 1476.222895] CM = 0, WnR = 0
Oct 6 15:39:49 router-mesh kernel: [ 1476.225865] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000041722000
Oct 6 15:39:49 router-mesh kernel: [ 1476.232322] [0000000000000053] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
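For anyone trying to read the trace above, here is a small sketch that splits the ESR value into the fields the kernel prints. The bit positions follow the ARMv8 ESR_EL1 layout for data aborts; the function names are made up for this example and only the three most common fault classes are named.

```python
def decode_esr(esr):
    """Split an AArch64 ESR value into the fields shown in the oops."""
    iss = esr & 0x1FFFFFF              # syndrome, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data abort: 1 = write, 0 = read
        "dfsc": iss & 0x3F,            # data fault status code
    }


def describe_dfsc(dfsc):
    """Name the common fault status codes; the low two bits are the
    page-table level at which the fault was detected."""
    kinds = {
        0b0001: "translation fault",
        0b0010: "access flag fault",
        0b0011: "permission fault",
    }
    kind = kinds.get(dfsc >> 2)
    if kind is None:
        return "other (0x%02x)" % dfsc
    return "%s, level %d" % (kind, dfsc & 0x3)


if __name__ == "__main__":
    fields = decode_esr(0x96000005)
    print(hex(fields["ec"]), fields["wnr"], describe_dfsc(fields["dfsc"]))
```

For the value in this log, that gives EC 0x25 (data abort at the current exception level), a read rather than a write, and a level 1 translation fault, which agrees with the all-zero page table entries in the last line. A fault address as small as 0x53 would be consistent with reading a field at a small offset through a NULL pointer, but that is only an inference from the address.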
Only remedy is a reboot.
I'm running OpenWrt SNAPSHOT r17677-f82c93b93c. I was running one or two builds before this one and decided to upgrade to see if the problem would stop.
Managed to get a week without any issue but looks like there's something else going on.
Anything I can grab that could help in troubleshooting?
Edit: I have 2 routers and just realised both are doing this.
Edit2: Just saw these have been reported before (I searched for "Mem abort info", which is why I hadn't seen them).
I'm seeing the same thing. I don't have any logs, unfortunately, but with hardware flow offloading enabled there are one or two crashes a day; with hardware flow offloading disabled (software flow offloading still enabled) there aren't any crashes.
Well, the current behaviour (booting into recovery if pstore data is present) is @daniel's explicit choice for this router. It may well be suited to debugging and error reporting during the development phase, but it may seem strange to end users who merely want the router to recover automatically. Personally, I am not sure that should remain the default when the router gets into the next release...
To my knowledge there is no real reason not to simplify the bootcmd, and daniel himself mentions the possibility in