Upgraded to snapshot r17216-8c2509dc5f via attended sysupgrade on my second E8450. Random MAC issue is gone, but last night it appears to have locked up. I have remote syslog set up and everything looks normal until this at the end:
Yesterday evening I changed the SSID and password on the 2.4 GHz radio to fix a tasmota device configured for a previous setup. At 22:02 I changed it back to the new SSID:
I guess the oops caused the device to reboot into recovery mode (as it should in such a case). Can you extract (hopefully) more complete logs from /sys/fs/pstore?
edit: you should find the device at IP 192.168.1.1/24 in recovery mode and should be able to log in via SSH as root without a password.
I don't think it's a good idea to automatically boot into recovery after a kernel crash, as it causes confusion.
How do I get from recovery back to the normal image? Is a reboot via LuCI or a power toggle enough?
You just have to clear PSTORE and reboot, then everything will be back to normal. This can either be done using rm /sys/fs/pstore/* or by disconnecting the device from power for a short moment, so DRAM content will be cleared.
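Since clearing PSTORE discards the crash records, it can be worth copying them first. The following is a minimal sketch, not an official tool: the function name and the backup directory are assumptions made for illustration. Note that /tmp on OpenWrt lives in RAM, so the copy still has to be moved off the device before rebooting.

```sh
#!/bin/sh
# Sketch: copy any pstore records to a backup directory, then clear pstore.
# The function name and default backup directory are illustrative assumptions.

save_and_clear_pstore() {
    pstore_dir="${1:-/sys/fs/pstore}"
    backup_dir="${2:-/tmp/pstore-backup}"

    # Nothing to do if pstore holds no records.
    if [ -z "$(ls -A "$pstore_dir" 2>/dev/null)" ]; then
        echo "no records in $pstore_dir"
        return 0
    fi

    # Copy first, and only delete once the copy succeeded.
    mkdir -p "$backup_dir" || return 1
    cp "$pstore_dir"/* "$backup_dir"/ || return 1
    rm -f "$pstore_dir"/*
    echo "saved records to $backup_dir and cleared $pstore_dir"
}
```

Calling save_and_clear_pstore without arguments uses the defaults above; a subsequent reboot should then bring the normal image back.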
I think this is extremely useful, especially in snapshot images, as a kernel oops will not go unnoticed that way and users are, for the first time, able to report meaningful things back to us even if the device's serial console is not connected during the crash. If you don't like this, it's also simple to tell U-Boot not to care about PSTORE at all: simply change the bootcmd variable from inside OpenWrt using fw_printenv/fw_setenv.
For a production release things will have to be a bit different, of course. One option would be to let the recovery image handle and clear PSTORE automatically, ie. upload logs to a URL configured in U-Boot environment or store them on an additional UBI storage volume.
And, of course, we need a clear indication in LuCI that we are currently running recovery. It'd also be great to have a notification about logs being present in PSTORE and to be able to view and clear them in LuCI. I can provide a JSON-RPC interface for PSTORE ops if anyone is willing to implement the front-end part (I'm not into web/front-end stuff at all, ie. graphical stuff makes me feel lost and angry, I hate using web browsers and prefer everything in a simple text-mode console myself; ncurses is as far as it gets with UI in my case, at least I can still use that without having to use the mouse).
As you can see, for the user it only results in "WLAN did not work anymore until I power cycled". I would be puzzled too if the device "boots itself" into recovery.
What about checking whether anything exists under /sys/fs/pstore/ and then showing an additional line such as "crash logs found" in the LuCI overview, maybe in red? Just something like if (file-exist-condition) fields.push('output') in feeds/luci/modules/luci-mod-status/htdocs/luci-static/resources/view/status/include/*.js
But only if it's possible to keep /sys/fs/pstore/* when not booting into recovery automatically.
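The file-exists condition suggested here can be tried from a shell before any LuCI work is done. This is only a sketch of the check itself: the function names and the message text are made up for illustration and are not existing LuCI strings.

```sh
#!/bin/sh
# Sketch: report whether pstore currently holds any crash records.
# Function names and message text are illustrative assumptions.

pstore_record_count() {
    pstore_dir="${1:-/sys/fs/pstore}"
    count=0
    for f in "$pstore_dir"/*; do
        # An unmatched glob stays literal, so confirm the entry exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

pstore_notice() {
    n="$(pstore_record_count "$1")"
    if [ "$n" -gt 0 ]; then
        echo "crash logs found: $n record(s)"
    else
        echo "no crash logs"
    fi
}
```

Running pstore_notice without arguments checks /sys/fs/pstore; a status page would only need the equivalent of the count being greater than zero.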
Btw, if one of my access points boots into recovery automatically and takes the default IP 192.168.1.1 (the same as the router), it will break internet access for the whole network, and it would take some time to notice that it's not a problem with the router.
Additional:
For my device, the Overview page in LuCI shows "Linksys E8450" or "Linksys E8450 (UBI)", depending on the image. It would be great if recovery showed up as "Linksys E8450 (UBI-Recovery)", so I could tell right away.
I have already received many useful PSTORE dumps in the past. OK, sometimes users were initially confused about what had just happened, but I still believe it's worth it even if only a fraction of users manage to extract and submit logs from PSTORE.
PSTORE generally doesn't get lost unless you manually clear it or unplug power for a few seconds. The problem with not booting into recovery if there is something in PSTORE is the potential of triggering (costly, in terms of device lifetime) infinite reboot loops in case something crashes early during boot.
Also, as a useful side-effect, users can decide to manually boot into recovery using echo c > /proc/sysrq-trigger.
Regarding LuCI suggestions: there is https://github.com/openwrt/luci/pull/5041 in order to provide a generic infrastructure to display notifications of all kinds to the user in LuCI.
Regarding default IP in recovery mode: I was thinking to implement a way to store settings relevant to recovery in U-Boot env, or simply use the existing ipaddr from U-Boot env also for OpenWrt when booting into recovery.
In the meantime, maybe move your router away from 192.168.1.0/24 subnet...
If the device does not boot into recovery automatically but shows a red notification instead, the user could still send the logs. The PR is interesting, but the discussion about a global LuCI mechanism will still take some time. Just adding a line to the overview is done quickly (and can be removed again later, if needed).
To prevent a reboot loop, a counter could be added: something like a plain number in /sys/fs/pstore/crashcount, with a fallback to recovery once some limit is exceeded.
I don't want to change my router's IP; it has been this one since my first LAN! So I change the default IP of my images with a files/etc/config file. Only recovery does not use this.
Is $ fw_setenv bootcmd 'run boot_ubi' the right change to make? Does holding the reset button during power-up then still work to get into recovery?
A reboot counter would not be enough, it'd need to be with timestamps (also tricky without RTC) to be able to recognize "early" reboots and what is "early" anyway (in seconds or ms)? I've only seen broken implementations of that approach for now...
Counting the number of records in pstore already works, but I don't see how it would be more transparent/easy for users to understand if their device hangs in recovery after 5 crashes instead of after the first time it happens. It will just delay the problem and keep users unaware of problems (unless they manually check /sys/fs/pstore or have an eye on uptime).
And yes, to not have U-Boot check PSTORE the change to U-Boot environment you stated works as expected (and you will still be able to manually trigger recovery or tftpboot by holding down RESET button during boot).
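For anyone applying that change, it may help to record the previous bootcmd first so it can be restored later. A minimal sketch, assuming fw_printenv and fw_setenv are installed and working on the device; the function name and the backup path are hypothetical choices, not OpenWrt conventions.

```sh
#!/bin/sh
# Sketch: back up the current bootcmd, then apply the value discussed above.
# The function name and backup path are illustrative assumptions.

set_bootcmd_boot_ubi() {
    backup_file="${1:-/root/bootcmd.orig}"

    # -n prints only the value, without the "bootcmd=" prefix.
    old_bootcmd="$(fw_printenv -n bootcmd)" || return 1
    printf '%s\n' "$old_bootcmd" > "$backup_file" || return 1

    fw_setenv bootcmd 'run boot_ubi' || return 1

    # Read the variable back so the change can be confirmed by eye.
    fw_printenv bootcmd
}
```

To return to the previous behaviour later, the saved value can be written back with fw_setenv bootcmd "$(cat /root/bootcmd.orig)".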
That's simple: have some startup script run rm -f /sys/fs/pstore/crashcount on every successful boot (recovery and installed image alike).
Not going to recovery after the first crash helps when something goes wrong after a long runtime.
The boot-loop case is when, for example, something is wrong in the kernel and the device is not able to boot at all.
In theory that's a good solution. However, it'd require an additional pstore record (crashcount) to be handled by the kernel -- for now, this is all just vanilla Linux features without patching anything related to pstore, but just using it as-is. If you think this is easy to do, please submit patches to upstream Linux and OpenWrt lists.
Imho it'd be easier to have an init script which handles pstore in recovery, clears it and reboots (according to settings it finds in U-Boot env). I've just been too lazy to implement that (but it's on the list).
hello @daniel
My colleague has the same router as me, but he lost the OpenWrt WiFi on his router. He uses SQM for video games and suddenly found himself on moderate NAT instead of open NAT. The difference this time is that the router did not go back to the original blue interface as it did in the past: it disconnected for 10 seconds and reconnected, and the WiFi came back. Could this problem be recurrent? Thank you.
The best would be if you manage to extract the logs from /sys/fs/pstore next time it happens, so we will be able to reproduce the cause (I didn't manage to crash it even once, but that can well be related to the behavior of wifi clients; on MT7620 there were problems which could only be triggered by WiFi action frames emitted by an Xbox.... you get the idea...).
If that is not an option for you and you just want crashes to silently reboot the router and keep things functional, you can also disable the PSTORE feature in the U-Boot environment (see above).
Where is the best place for uploading pstore files? I have a couple. The device boots into recovery frequently. Finally read this thread about pstore files and how to clear them. Good stuff.
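Wherever the files end up being posted, bundling them into a single archive first makes them easier to copy off the device and attach. A rough sketch, assuming tar with gzip support is available on the image; the function name and the default output path are made up for illustration.

```sh
#!/bin/sh
# Sketch: pack all pstore records into one timestamped tarball.
# The function name and default output path are illustrative assumptions.

bundle_pstore() {
    pstore_dir="${1:-/sys/fs/pstore}"
    out="${2:-/tmp/pstore-$(date +%Y%m%d-%H%M%S).tar.gz}"

    # Refuse to create an empty archive.
    if [ -z "$(ls -A "$pstore_dir" 2>/dev/null)" ]; then
        echo "no records in $pstore_dir" >&2
        return 1
    fi

    tar -czf "$out" -C "$pstore_dir" . || return 1

    # Print the archive path so it can be copied off the device.
    echo "$out"
}
```

The printed path can then be fetched from another machine, for example with scp, before PSTORE is cleared.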
I have two Belkin RT3200s and can't get mesh working. I just played around using the tutorial that I used in the past, without success. Could someone post a working config to help me, please? The OpenWrt tutorial has worked in the past with other devices but does not work with the Belkin RT3200. I used the network/wireless setup and the command line examples. Using the latest snapshot with wpad-mesh-wolfssl.
iw dev mesh0 station dump
iw dev mesh0 mpath dump
The above two commands give no output. From the log file it looks like the mesh0 interface is created. Not sure if it is a Belkin problem or something that I am doing wrong. Probably the latter.
Good evening everyone. Does UPnP work on the RT3200? I can't seem to get it working, and I have an Xbox, a PS5 and several other machines. Thanks in advance.