Sort of sounds like a hardware issue, so you'd better plug in that monitor and see if it shows anything... Maybe make a bootable USB with Memtest86+ and run that?
Both suggestions are worth testing. Another would be to add intel_idle.max_cstate=1 as a kernel parameter in grub. Background: many baytrail-d systems are plagued by a power delivery bug when waking up from deeper cstates, which makes the system crash, because the CPU doesn't get enough power quickly enough and the voltage drops below the threshold needed to operate. This is a hardware bug, but how often it happens varies widely, depending on the mainboard design, but also on how often the kernel goes into deeper cstates and how quickly it wants to wake up. This kernel parameter limits the CPU to the shallowest cstate (C1), keeping it from entering the deeper ones, in the hope of avoiding the error condition.
Depending on your board, it may 'fix' the crashes, help somewhat, or do nothing at all; you can only test it (and the cause of your issue might be something completely different altogether).
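If you try the parameter, it is worth confirming that the running kernel actually picked it up before drawing conclusions from the test. On Debian-style systems the parameter usually goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot (this is an assumption about your distribution). A minimal Python sketch for the check, where the function names are made up for illustration:

```python
from pathlib import Path

PARAM = "intel_idle.max_cstate="


def max_cstate_from_cmdline(cmdline):
    """Return the intel_idle.max_cstate value as an int, or None if it is
    absent or not a number. If the parameter appears more than once,
    the last occurrence is used."""
    value = None
    for token in cmdline.split():
        if token.startswith(PARAM):
            try:
                value = int(token[len(PARAM):])
            except ValueError:
                value = None
    return value


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the current kernel was booted with.
    Returns an empty string if the file is missing (e.g. not on Linux)."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else ""


if __name__ == "__main__":
    limit = max_cstate_from_cmdline(running_cmdline())
    if limit is None:
        print("intel_idle.max_cstate is not set on the kernel command line")
    else:
        print(f"intel_idle.max_cstate={limit} is active")
```

If it reports that the parameter is not set after a reboot, the grub config was most likely not regenerated.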
If you have run this x86 rig 24/7 for 4 years, on top of the actual age of the hardware, I seriously doubt a config change will make any difference if it is suddenly starting to show signs of critical hardware failure now that its design life has been reached.
The easiest thing, and pretty much the only thing, you can change in that machine is the hard disk and/or the PSU.
Maybe the RAM, if you can actually find something that old that fits the memory socket.
Maybe, maybe not. I (and many others on bugzilla.kernel.org) have been seeing this on and off on many baytrail-d devices; the frequency of these issues varies a lot (from basically never to multiple times a day, depending on kernel, current workload, etc.). It sadly isn't an exact science when it comes to this hardware bug, as it really depends on the exact transition phases between cstate 5+ and wakeup calls (graphics demands are also part of the story) - so really subtle differences in usage patterns (versions of kernel and userspace) might make or break it.
But yes, the oldest of these devices are passing the one-decade mark of their service life, so hardware damage/deterioration can't be ruled out either - but the above is an easy test.
The issue occurs while it's in use and also while it's idle.
So it's difficult to suspect that it's related to sleep states in my particular case.
Thanks for passing on this valuable info for some other day.
That is not at all at odds with the issue described above. The issue is with transitions between (and out of) cstates, something the CPU is doing all the time (pretty much regardless of being idle or under load).
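To see for yourself how often the CPU enters each cstate, even on a busy box, you can read the cpuidle counters the kernel exposes in sysfs. A rough Python sketch, assuming the standard /sys/devices/system/cpu/cpu0/cpuidle layout with name, usage and time files per state; the function name is made up for illustration:

```python
from pathlib import Path


def read_cstate_usage(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (name, usage, time_us) tuples, one per idle state,
    ordered by state index. 'usage' is how many times the state was entered,
    'time_us' the total time spent in it, in microseconds.
    Returns an empty list if the directory does not exist."""
    base = Path(cpu_dir)
    if not base.is_dir():
        return []
    # Sort numerically so that state10 comes after state2, not after state1.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text().strip())
        time_us = int((state / "time").read_text().strip())
        states.append((name, usage, time_us))
    return states


if __name__ == "__main__":
    for name, usage, time_us in read_cstate_usage():
        print(f"{name:8s} entered {usage:>12d} times, {time_us / 1e6:12.1f} s total")
```

Running it twice a few seconds apart, once idle and once under load, shows the usage counters climbing in both cases, which is the point made above.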
Are disk errors ruled out if fsck doesn't report any errors?
In the failed state, even the power button doesn't work.
Can that be a RAM issue?
Does the CPU need RAM to process the power button?
Also, I am able to hit the issue about once in 5 reboots, whereas when it is left on, it takes 2-3 days to hit. Does that provide any debug info? What happens after a reboot that is special?
And I did cross-check: it does come up fine for 2-3 seconds, DHCP serves an IP, the router responds to ping, and then it hangs.