We have a fleet of about 100 OpenWRT devices. Across the entire fleet we observe about 5 unexpected restarts per day. We have streaming logs set up via rsyslog, but we don't see any messages from just before a restart.
As far as I can tell, we are compiling OpenWRT with KERNEL_CRASHLOG enabled, but I don't see anything in /sys/kernel/debug/crashlog. Other advice on the forum suggests compiling with the default configuration, but that's not practical for us.
The absence of other evidence makes me suspect it's a memory issue. Do you have any tips on how to detect this, or on how to debug unexpected crashes in general? Logging into 50 boxes and hoping to catch a traceback in the terminal is not great. We could sample memory usage every 5-10 seconds and log it. That would show memory growing slowly over time, but it would miss a spike that happens all at once. We could also deploy an OOM killer to our development OpenWRT instances and see whether anything gets killed due to OOM.
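A minimal sketch of the sampling idea, assuming Python 3 with the syslog module is installed on the devices (an assumption; a C or shell equivalent would work the same way). It reads /proc/meminfo at a fixed interval and writes MemFree and MemAvailable to syslog so the existing rsyslog streaming picks them up. The names parse_meminfo, sample_once, run_forever and the "memsample" tag are hypothetical, chosen only for illustration.

```python
#!/usr/bin/env python3
"""Rough sketch: periodically log memory figures from /proc/meminfo to syslog."""
import syslog
import time

# Assumption: 5 seconds, the low end of the 5-10 second range discussed above.
SAMPLE_INTERVAL_S = 5


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not of the form "Name:   <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample_once(path="/proc/meminfo"):
    """Read and parse the current memory figures."""
    with open(path) as f:
        return parse_meminfo(f.read())


def run_forever(interval_s=SAMPLE_INTERVAL_S):
    """Log one line per sample; meant to be started from an init script."""
    syslog.openlog("memsample")  # hypothetical tag to filter on in rsyslog
    while True:
        info = sample_once()
        # MemAvailable may be missing on very old kernels, hence .get().
        syslog.syslog(
            syslog.LOG_INFO,
            "MemFree=%s kB MemAvailable=%s kB"
            % (info.get("MemFree"), info.get("MemAvailable")),
        )
        time.sleep(interval_s)
```

Calling run_forever() from a small service script would start the sampling. As noted above, this only catches gradual growth; a sudden spike between two samples would still go unseen.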
Is there some way to write a file to disk in the event the system is about to crash due to OOM?
Thanks for your help. This is my first post. I hope I followed the rules; please let me know if I should have posted this in a different forum.