Hi Lynx,
I have it at 80 MHz and only followed the how-to.
It's rock stable (15 days of uptime) and fast enough for my needs.
Fast Transition (802.11r) works in my environment. Measured with the WinFi app, my Windows laptop switched smoothly, with no ping loss, to the mesh AP with the strongest signal.
I'm using WireGuard, but my gateway is my internet router, so I do not need port forwarding.
Hi. I am on snapshot build 17598 and somehow I cannot upgrade via attendedsysupgrade or auc. attendedsysupgrade does not download the new build SNAPSHOT (r17648-16e83a7491), and auc displays an SSL error. Screenshot attached.
Both online image builders (https://asu.aparcar.org/ and https://chef.libremesh.org/) use a Let's Encrypt certificate whose chain includes a root CA with CN = DST Root CA X3, which expired on 30 September. Older OpenSSL-based systems seem to keep working because of a different chain-validation implementation, but wolfSSL-based systems seem to have more issues. So it may depend on the implementation or on the chain offered by the server. You might also try removing the expired certificate from /etc/ssl/certs/ca-certificates.crt if it is being used to build the chain.
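If you do try removing the expired root by hand, here is a minimal Python sketch of one way to do it. It splits a PEM bundle into certificate blocks, computes each block's SHA-256 fingerprint, and drops the blocks matching a fingerprint you supply. The DST Root CA X3 fingerprint is deliberately not filled in: look it up from a trusted source and double-check it first. The function names are hypothetical, any text between the PEM blocks (such as comment lines) is not preserved, and since Python is an optional package on OpenWrt, it may be easier to run this on a PC and copy the result back.

```python
import base64
import hashlib
import re
from typing import Tuple

# One PEM certificate block; group 1 is the base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----\s*",
    re.DOTALL,
)


def sha256_fingerprint(der: bytes) -> str:
    """Uppercase, colon-separated SHA-256 fingerprint of a DER certificate."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def remove_cert(bundle: str, fingerprint: str) -> Tuple[str, int]:
    """Drop certificates whose SHA-256 fingerprint matches `fingerprint`.

    The fingerprint may be given with or without colons, in either case.
    Returns the rebuilt bundle and the number of certificates removed.
    Text outside the PEM blocks (e.g. comments) is not kept.
    """
    wanted = fingerprint.replace(":", "").upper()
    kept, removed = [], 0
    for match in PEM_BLOCK.finditer(bundle):
        der = base64.b64decode("".join(match.group(1).split()))
        if sha256_fingerprint(der).replace(":", "") == wanted:
            removed += 1
        else:
            kept.append(match.group(0).rstrip() + "\n")
    return "".join(kept), removed
```

Hypothetical usage: read /etc/ssl/certs/ca-certificates.crt, call `remove_cert(text, fingerprint)`, confirm that exactly one certificate was removed, and only then write the result back, keeping a copy of the original file.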
OK devs, I have three RT3200s, but one of them gets into a bad state on about 1 in 3 boots: the 5 GHz radio driver initialization fails and I see kernel errors.
The main risk here is that you may have gotten a device where one or more erase blocks at the very beginning of the SPI-NAND flash are broken. I did my best to make the installer handle these cases properly as well (i.e. relocate the factory data, which has to be kept at a known offset, in order to reverse/mitigate the effects of MediaTek's BMT, which is used by non-UBI OpenWrt as well as by the stock firmware). I haven't yet been able to get hold of a device with one of the first blocks broken, so I have had no chance to test this myself.
The worst case here is a device that comes up without a valid MAC address and without WiFi calibration data. In that case you can either try to fix things manually using the backup of the flash, or use that backup to revert to the stock firmware.
Doesn't look like this is the case here. The hexdump of the factory partition looks alright (offsets match expectations) and if it was a problem related to calibration it would happen each and every time you boot.
Since you only observe the problem on some boots and not every time, it has to be something else. I've heard that the MT7915E sometimes isn't reset during a soft reboot and then gets stuck after boot, which also looks more like what you are observing.
So the first thing to try would be to compare a cold reset (i.e. disconnecting power or using the physical power switch) with a warm reset (i.e. using the reboot command) and see whether the chances of the WiFi coming up differ.
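To make the cold-versus-warm comparison above concrete, here is a minimal Python sketch for tallying the results. It assumes you record each boot by hand as a (boot type, 5 GHz radio came up) pair; `success_rates` is a hypothetical helper, not part of any OpenWrt tool.

```python
from collections import defaultdict


def success_rates(trials):
    """Summarise boot trials.

    trials: iterable of (boot_type, radio_up) pairs, e.g. ("cold", True).
    Returns {boot_type: (successes, total, success_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # boot_type -> [successes, total]
    for boot_type, radio_up in trials:
        counts[boot_type][1] += 1
        if radio_up:
            counts[boot_type][0] += 1
    return {kind: (ok, total, ok / total) for kind, (ok, total) in counts.items()}
```

With a failure rate of roughly 1 in 3, a handful of boots of each type says little; the more boots you log, the more a difference between the two rates means.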
Thanks. So presumably it is still a hardware fault? Could something be done in software to help devices with this fault recover from such a reboot? I think I will just RMA this one. My other two have been fine.
Yes, it does. It seems to work perfectly, including in the mesh (circa 300-400 Mbit/s between it and the other two RT3200s, using 80 MHz on channel 36, although I could never get 160 MHz mesh to work).
The cold/warm boot issue you describe above seems to match my experience. I wonder whether this is a hardware issue with my specific device, or whether some software state could nudge any RT3200 into this condition. Do you suspect the former? Could it start happening over time as the devices age?
In any case, could the chip be powered down and back up in software, so that this could be handled if it happens to other users and reliability improved?
As you are running the UBI build, are you sure that it is not in recovery mode when it starts wrongly, with the 5 GHz WiFi disabled? Have you checked /sys/fs/pstore for crash dumps? (Note that those survive a warm reboot; only a cold boot clears them from RAM. Alternatively, you can delete them manually.)
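To check pstore from a script rather than by hand, here is a minimal Python sketch that reads every file under /sys/fs/pstore (for example dmesg-ramoops-0) into a dictionary. It assumes Python 3 is installed on the router, which is not the case in a default OpenWrt image; `read_pstore` is a hypothetical helper name.

```python
import os


def read_pstore(path="/sys/fs/pstore"):
    """Return {filename: contents} for every file in pstore; {} if empty or absent."""
    if not os.path.isdir(path):
        return {}
    dumps = {}
    for name in sorted(os.listdir(path)):
        # Crash dumps are text, but replace any undecodable bytes rather than fail.
        with open(os.path.join(path, name), errors="replace") as f:
            dumps[name] = f.read()
    return dumps
```

An empty result after a bad boot means no crash was recorded (or the dumps were already cleared by a cold boot).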
Each time, I checked /sys/fs/pstore for crash dumps and there were none. Is there another way to tell whether it is in recovery mode? From recollection, in recovery mode LuCI would show a different snapshot version, and I did not see that.
I have three of these RT3200 devices in a mesh. Two are close together, and this problematic one is farther away but in range of both. I have consistently had problems with this one: I would reboot all three and then find that I could not access the web interface of this one (but could access the other two fine). I would fix this by turning it off with the power switch at the back, waiting a bit, and then turning it back on. It was only when I connected my laptop with a LAN cable that I realised the 5 GHz WiFi was not coming up and saw these timeout errors.
In this state, issuing 'reboot' over SSH did not seem to clear the issue, but powering it off and on again seems to bring the WiFi back.
I'm encountering a problem that results in the same error posted here:
Oct 3 12:08:59 ROUTER1A kernel: [12276.321915] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000053
It happens approximately every 4-10 hours. It's an issue with SNAPSHOT r17652-21c7a8593d, and it also happened on r17443 from the UBI installer repo.
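To pull these oopses out of a saved log or a pstore dump, a minimal Python sketch like the one below could be used. It matches the message text quoted above and, where present, the bracketed kernel timestamp, which counts seconds since boot. The regular expressions are assumptions based on that single log line, not on any documented log format, and `find_oopses` is a hypothetical name.

```python
import re

# Message text as it appears in the oops line quoted above.
OOPS = re.compile(
    r"Unable to handle kernel NULL pointer dereference "
    r"at virtual address ([0-9a-fA-F]+)"
)
# Kernel timestamp in brackets, e.g. "[12276.321915]" (seconds since boot).
KTIME = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_oopses(lines):
    """Yield (seconds_since_boot or None, fault_address) for each NULL-deref line."""
    for line in lines:
        m = OOPS.search(line)
        if m:
            t = KTIME.search(line)
            yield (float(t.group(1)) if t else None, m.group(1))
```

Collecting the addresses across crashes shows at a glance whether every occurrence is the same 0000000000000053 fault.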
This morning I attempted to get the pstore data, but I was unable to reach the router at its assigned IP address, 10.10.1.1. I also had no luck pinging 192.168.1.1. When it next fails, I will try to access it via SSH or the web interface on the default IPs/ports, and hopefully I'll have more data to provide.
One thing I did notice is that the router continued to access the Internet and route traffic over Ethernet. It was the WiFi (both the 2.4 GHz and 5 GHz radios) that wasn't functional. The fact that it kept routing traffic as a gateway makes me think it was still operating as 10.10.1.1, though I couldn't get a ping response and didn't see any traffic from that IP in Wireshark.
I've been following this thread for the past couple days as I tried to diagnose the issue. If I am able to get pstore logs/data, I'll post them here. This router is not part of a mesh or anything like that. Normal gateway router/AP only.
I think in that case I would consider DFS. You should be able to scan, monitor, and see which channel segment is the least occupied.
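As a rough illustration of picking the least occupied segment, here is a minimal Python sketch. It assumes you already have the primary channels of nearby networks from a scan (for example with iw; parsing that output is left out) and counts each network by its primary channel only, ignoring how wide it is. The helper names are hypothetical.

```python
# Standard 5 GHz 80 MHz segments, keyed by center channel; values are the
# 20 MHz channels inside each segment. Which of them (including the DFS ones,
# 52-144) you may use depends on your regulatory domain.
SEGMENTS_80MHZ = {
    42: (36, 40, 44, 48),
    58: (52, 56, 60, 64),
    106: (100, 104, 108, 112),
    122: (116, 120, 124, 128),
    138: (132, 136, 140, 144),
    155: (149, 153, 157, 161),
}


def segment_load(observed_channels, allowed=None):
    """Count observed networks per 80 MHz segment (by primary channel only)."""
    load = {c: 0 for c in SEGMENTS_80MHZ if allowed is None or c in allowed}
    for ch in observed_channels:
        for center, members in SEGMENTS_80MHZ.items():
            if center in load and ch in members:
                load[center] += 1
    return load


def least_occupied(observed_channels, allowed=None):
    """Center channel of the least busy segment; ties go to the lowest channel."""
    load = segment_load(observed_channels, allowed)
    return min(load, key=lambda c: (load[c], c))
```

Passing `allowed` restricts the choice to the segments your country permits.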
BTW, for anyone wanting more SQM/VPN performance: just enable irqbalance. I noticed a significantly lower load average with it enabled, and I saw in another thread that this router can manage 1 Gbit SQM with it enabled. Actually, is there any reason not to have it enabled?
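One way to check whether irqbalance is actually spreading interrupts is to compare the per-CPU columns of /proc/interrupts before and after enabling it. The minimal Python sketch below sums each CPU column. It assumes the usual layout (a header row of CPU names, then one row per IRQ whose label ends in a colon); `per_cpu_interrupts` is a hypothetical name. The counters are cumulative since boot, so two snapshots taken some time apart say more than a single reading.

```python
def per_cpu_interrupts(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row, e.g. "CPU0 CPU1"
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        # The first len(cpus) fields after the label are per-CPU counts;
        # some rows (e.g. "Err:") carry fewer numbers, so stop at a non-digit.
        for cpu, value in zip(cpus, fields[1:1 + len(cpus)]):
            if not value.isdigit():
                break
            totals[cpu] += int(value)
    return totals
```

Hypothetical usage: `per_cpu_interrupts(open("/proc/interrupts").read())`.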
Finally, has anyone got 160 MHz mesh or WDS working? Or 160 MHz working well in general, as in greater throughput than 80 MHz?
I haven't really tried 160MHz. Originally tried to get it working back in June/July before LuCI had full ax support and couldn't get a stable connection. So I just set it to 80MHz and kept it there until earlier today experimenting a little bit. From my quick tests it did connect at 160MHz and report as such on the status page / station list. Have no idea about performance / stability, this was just a quick test before I decided I should probably keep the router on 80MHz until this crash problem is sorted so I don't introduce any additional variables in the troubleshooting process.
Is there any documentation on channel selection in this router, especially at 160MHz? I know with some routers only certain channels work with wider bandwidth, and it doesn't seem that all routers refer to the block of channels in a standard way. I seem to recall that when specifying the channel number, some routers want you to select the low numbered channel, some want the high numbered channel, and some want you to choose the center channel. Not sure how this works with this model, going to see what I can find out about this.
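For reference, the standard 802.11 numbering names a wide block by its center channel (for example 50 for channels 36-64 at 160 MHz, or 42 for 36-48 at 80 MHz), whereas many configuration interfaces ask for the primary 20 MHz channel instead. The minimal Python sketch below only does that standard arithmetic; it does not tell which number this router's UI expects, and the helper names are hypothetical.

```python
# Standard 5 GHz blocks as (first, last) 20 MHz channel, per channel width.
BLOCKS = {
    80: [(36, 48), (52, 64), (100, 112), (116, 128), (132, 144), (149, 161)],
    160: [(36, 64), (100, 128)],
}


def channel_freq_mhz(ch):
    """Center frequency of a 5 GHz channel number."""
    return 5000 + 5 * ch


def block_for(primary, width):
    """Return (first, last, center) channel of the block containing `primary`."""
    for first, last in BLOCKS[width]:
        if first <= primary <= last and (primary - first) % 4 == 0:
            return first, last, (first + last) // 2
    raise ValueError(f"channel {primary} is not usable at {width} MHz")
```

For example, `block_for(44, 160)` gives `(36, 64, 50)`: any primary channel from 36 to 64 lands in the same 160 MHz block, centred on channel 50 (5250 MHz).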
My router just had the oops occur again and I was finally able to access it.
The router is running r17652-21c7a8593d, and after the oops it booted into recovery SNAPSHOT r17443-90e167abaa. Accessing the router at 192.168.1.1 and default ports was successful.
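On the earlier question of how to tell whether the router is in recovery mode: in this report the router came back running r17443-90e167abaa instead of the installed r17652-21c7a8593d, so comparing the running revision with the expected one is one possible check (a mismatch suggests recovery, it does not prove it). The minimal Python sketch below assumes the revision can be read from the DISTRIB_REVISION line of /etc/openwrt_release; the helper names are hypothetical and the expected revision is something you supply.

```python
import shlex


def read_release(text):
    """Parse KEY='value' lines, as found in /etc/openwrt_release, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            parts = shlex.split(value)  # strips the surrounding quotes
            info[key.strip()] = parts[0] if parts else ""
    return info


def looks_like_recovery(release_text, expected_revision):
    """True if the running DISTRIB_REVISION differs from the one you installed."""
    running = read_release(release_text).get("DISTRIB_REVISION")
    return running != expected_revision
```

Hypothetical usage: read /etc/openwrt_release and call `looks_like_recovery(text, "r17652-21c7a8593d")`.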
I was able to obtain dmesg-ramoops-0 and dmesg-ramoops-1 which appear to have identical data about the trace, as follows:
These NULL pointer dereferences at virtual address 0000000000000053 have been seen and reported in this thread since the end of July. I had multiple occurrences as well, then reverted to a version from before the end of July, OpenWrt SNAPSHOT r17114-349e2b7e65.
It is reported here
and was supposedly fixed 10 days ago.
Not sure when it will end up in the OpenWrt snapshot builds, though. Does anyone know?
I saw this reported earlier in the thread, along with the request for pstore logs. I also followed the link to the issue on GitHub, saw the commit, and thought that meant it would be in snapshot builds from then on. I figured it must not have been a full fix, since it seemed to be the same issue. I didn't realize until you pointed it out that the commit doesn't get incorporated into subsequent OpenWrt snapshots without further steps. My mistake.
I think I'll revert later today to r17114 on my main router.
I am also waiting for this fix to be incorporated into the nightly build, as my router goes into recovery mode every 2-3 days. For the time being, I'm thinking of writing a script that clears the pstore contents and reboots the router whenever it goes into recovery mode.
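A minimal Python sketch of such a script might look like the one below. It is hypothetical throughout: the `in_recovery` flag must come from a check of your own (for example, comparing the running revision with the one you installed), the archive directory is an optional path you choose, and `reboot` is only called when `do_reboot=True`. Copying the dumps somewhere persistent before deleting them keeps the crash data the developers have been asking for; a directory under /tmp would not survive the reboot, since it lives in RAM.

```python
import os
import shutil
import subprocess
import time


def archive_and_clear_pstore(pstore="/sys/fs/pstore", archive_root=None):
    """Copy pstore files to archive_root/<timestamp>/ (if given), then delete them.

    Returns the sorted list of file names that were handled.
    """
    names = sorted(os.listdir(pstore)) if os.path.isdir(pstore) else []
    if names and archive_root:
        dest = os.path.join(archive_root, time.strftime("%Y%m%d-%H%M%S"))
        os.makedirs(dest, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(pstore, name), dest)
    for name in names:
        os.remove(os.path.join(pstore, name))  # pstore entries can be unlinked
    return names


def recover(in_recovery, pstore="/sys/fs/pstore", archive_root=None, do_reboot=False):
    """If in recovery, archive and clear pstore, then optionally reboot."""
    if not in_recovery:
        return []
    names = archive_and_clear_pstore(pstore, archive_root)
    if do_reboot:
        subprocess.run(["reboot"], check=False)
    return names
```

Running it with `do_reboot=False` first is a safe way to confirm that the recovery check and archiving behave as expected before letting it reboot the router.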