Hello everyone,
I just came across a very weird issue with my (until now) very stable OpenWrt deployment at a remote site, where it is used for the VPN and for monitoring the equipment there.
The CPE was the TP-LINK TD-W9980 v1.
It was flashed correctly and had been in operation for over 2.5 years!
That was until yesterday, when I went to download a new package to extend its functionality. I opened the updates tab and proceeded to update the already installed packages one by one. I lost it for some time: it went offline (normal, I thought, since it was upgrading packages related to the internet connection) and came back. Everything came up normally, all my VLANs etc., and I proceeded with my maintenance task remotely (over the TP-Link's VPN) on the other subsystems of the station.
But suddenly (an hour or so later) I lost it again, and this time completely. I couldn't ping it either from its WAN or from the site's main network, the one that the TP-Link is backing up.
I sent an on-site technician to evaluate the situation. Following my instructions he rebooted it, but it never came back online. It would not link to our other switches, to its DSL line, or to the USFF that I have there running the monitoring software.
We replaced it with a hot spare we keep on site, just in case, and I requested to have the unit shipped to me so I can evaluate what happened to it and why it crashed so badly. It was one of our "heaviest" workers; it surpassed in uptime even some Cisco gear we have there!
So I'm asking for advice/opinions on what I should look for while doing this "post mortem" root cause analysis because, not gonna lie, in all these years dealing with OpenWrt I have never had anything similar to this happen before, and it sparked my curiosity!
I marked your reply as the solution,
but I still have several questions:
a) Why didn't the router go offline immediately after I ran the upgrades, but a couple of hours later?
b) What should I have done, seeing that a bunch of packages were ready to receive an update (other than leaving it alone)? In other words, what's the proper procedure for upgrading OpenWrt?
c) From what I can understand from the link you provided, the crash/soft brick is mainly due to the packages completely filling up my overlay, since the package updates "cannot overwrite the original (stored in ROM)". So an RTFD from the switch at the back will fix it?
That's when the failure occurred: whatever software/code was needed was no longer there, etc.
From the Wiki page linked:
In the vast majority of cases, any security patches of significant importance/risk will be rapidly released in an official stable maintenance release to be upgraded using the sysupgrade system. This is the recommended method for keeping up-to-date.
Those looking to be on the bleeding edge can consider using the snapshot releases, but should be mindful of the differences between stable and snapshot. Or, alternatively, build a custom image with the desired updated packages included in that image.
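To make the overlay point a bit more concrete, here is a rough Python sketch of the kind of sanity check one could do before upgrading packages in place. It assumes `df -k` style output with the writable area mounted at `/overlay`. Because an upgraded package cannot overwrite the copy in ROM, the whole new package has to fit in the free overlay space. The function names, the sample `df` output, the package sizes and the 256 KiB reserve are all made up for illustration, and installed sizes on a compressed filesystem will differ from the raw package sizes, so treat it as an estimate only.

```python
from __future__ import annotations


def overlay_free_kib(df_output: str, mount_point: str = "/overlay") -> int:
    """Return the 'Available' value (KiB) for mount_point from `df -k` style text."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: Filesystem, 1K-blocks, Used, Available, Use%, Mounted on
        if len(fields) >= 6 and fields[5] == mount_point:
            return int(fields[3])
    raise ValueError(f"{mount_point} not found in df output")


def upgrade_fits(free_kib: int, package_sizes_kib: list[int],
                 reserve_kib: int = 256) -> bool:
    """True if all packages plus a safety reserve fit in the free overlay space.

    The ROM copy of a package is never freed, so every upgraded package
    costs its full size in the overlay.
    """
    return sum(package_sizes_kib) + reserve_kib <= free_kib


# Hypothetical df output, not taken from the actual router in this thread.
sample_df = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/root                 2560      2560         0 100% /rom
tmpfs                    29856       220     29636   1% /tmp
/dev/mtdblock6            4096      3200       896  78% /overlay
overlayfs:/overlay        4096      3200       896  78% /
"""

free = overlay_free_kib(sample_df)
print(free, upgrade_fits(free, [300, 250, 200]), upgrade_fits(free, [300, 250]))
```

With these invented numbers, three packages totalling 750 KiB would not fit once the reserve is counted, while the first two alone would. On a small-flash device that is usually the signal to go the sysupgrade or custom image route instead.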
Not gonna lie, coming from pfSense & just plain regular Linux, this is new / interesting!
But reading about it, I can understand why it's a different case here.
Thank you for taking the time to reply to my perhaps silly questions.
Needing to be small at (almost) all costs comes with side effects - as do the requirements to cope with the very limited stone age bootloaders and OEM partitioning.