I'm guessing this last message was cut short by the reboot, so there may have been more messages coming. In any case, the restart proceeds until "loading kernel" and hangs there. The kernel info is correct (new build date, etc.) and the checksums are OK.
I'm seeing this on the WRT1900ACS and WRT3200ACM. Interestingly, not on the WRT1200AC, which I believe is the same hardware as the WRT1900ACS, just with 2 antennas instead of 4.
Any ideas on how to troubleshoot? I tried flashing both img and bin, same result.
After some config trial and error I now get a kernel that loads -- see the diff below. I'm not sure which option is the culprit or why I was messing with these. Either they were fat-fingered or they've been there a while and something changed that makes one of them matter. (I'm guessing the latter because I check diffconfigs after most upgrades and didn't notice these as changes.)
However, the boot is now hanging at procd: ubus. I have added all new users (ntp, logd, ubus) to /etc/passwd, shadow and group. I also added the socket config to /etc/config/rpcd just in case (see newest img upgrade failed). Any other config changes that matter? For the record I sysupgraded without keeping changes and the boot is clean -- if it comes down to it I can reconfigure from scratch, I just figured this should have been a straightforward change.
I still get the umount errors (see orig message) when sysupgrading -- not sure if this is related.
Config diff, from non-booting to booting kernel:
< CONFIG_KERNEL_DEBUG_LL=y
< CONFIG_KERNEL_DEBUG_LL_UART_NONE=y
< # CONFIG_KERNEL_KALLSYMS is not set
< # CONFIG_KERNEL_MEMCG_SWAP is not set
> # CONFIG_KERNEL_SWAP is not set
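For anyone wanting to repeat this kind of comparison, here is a minimal sketch of how two diffconfig outputs could be compared symbol by symbol rather than line by line. The function names are my own (hypothetical), and it assumes the usual `CONFIG_FOO=value` and `# CONFIG_FOO is not set` line formats; it only reports what changed, it does not say which option is the culprit.

```python
NOT_SET_SUFFIX = " is not set"


def parse_config(text):
    """Map each CONFIG_ symbol to its value; "n" stands for "# ... is not set"."""
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # Drop the leading "# " and the trailing " is not set".
            symbols[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            symbols[name] = value
    return symbols


def config_changes(old_text, new_text):
    """Return {symbol: (old_value, new_value)} for every symbol that differs.

    None means the symbol does not appear in that file at all.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # The lines from the diff in this post, split by side.
    non_booting = (
        "CONFIG_KERNEL_DEBUG_LL=y\n"
        "CONFIG_KERNEL_DEBUG_LL_UART_NONE=y\n"
        "# CONFIG_KERNEL_KALLSYMS is not set\n"
        "# CONFIG_KERNEL_MEMCG_SWAP is not set\n"
    )
    booting = "# CONFIG_KERNEL_SWAP is not set\n"
    for name, (before, after) in config_changes(non_booting, booting).items():
        print(name, before, "->", after)
```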
The router from this point continually retries to start ubus, failing each time.
I see no /etc/init.d/ubus on any of my routers, even the ones that boot past this point. I also don't see this file installed or created in the package/system/ubus/Makefile. Any idea what's going on here?
Thanks for the response. I will update the original post to state I'm building from master and this seems related to the updates over the last few weeks to run logd, ubus, ntp as non-root.
So I had linked to the post in your first response to indicate I had already made those changes. I use nginx, not uhttpd, so I believe the only one that matters is the rpcd config. In that case I just pre-updated the config, as it doesn't seem to matter for builds prior to the non-root upgrades. Running uci get rpcd.@rpcd[0].socket returns /var/run/ubus/ubus.sock. On the off-chance it matters I also updated the uhttpd config to include the socket change, with no luck.
I'm guessing there's something else legacy that matters in my config because I get a clean boot if I don't keep changes.
I was referring to the series of commits that modify ntp, logd, ubusd to run as non-root. It doesn't seem to matter because even with that line in the config, prior to these commits the ubus socket is still at /var/run/ubus.sock.
I may eventually bite the bullet and do what you're suggesting, but I'd also like to understand what's going on here. Not being familiar with how procd, ubus, etc. interact, I find the code hard to walk through. I can't figure out why procd is reporting that ubus is trying to access /etc/init.d/ubus, or whether it does that all the time regardless and the failure to find it doesn't matter.
Is it the fail on /etc/init.d/ubus that is the problem or is it something else that fails and emits the error code? I can find where the message is printed and what calls it, but I can't find the definitions for what 65280 means.
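One possible reading of 65280, sketched below: if the number is a raw wait status as returned by waitpid() (an assumption on my part, I have not confirmed this in the procd source), then 65280 is 0xFF00, which on Linux decodes to "exited normally with exit code 255". Exit code 255 is what a process reports when it calls exit(-1) or returns -1 from main, so it would not by itself say which step failed. The function name is hypothetical and the bit layout is the Linux one.

```python
def decode_wait_status(status):
    """Decode a Linux waitpid()-style status into (kind, value).

    Assumption: the number being decoded really is a raw wait status.
    """
    if status == 0xFFFF:
        return ("continued", None)
    if status & 0xFF == 0x7F:
        # Stopped: the stopping signal is in the high byte.
        return ("stopped", (status >> 8) & 0xFF)
    if status & 0x7F == 0:
        # Normal exit: the exit code is in the high byte.
        return ("exited", (status >> 8) & 0xFF)
    # Killed by a signal: the signal number is in the low 7 bits.
    return ("signaled", status & 0x7F)


if __name__ == "__main__":
    kind, value = decode_wait_status(65280)
    print(hex(65280), kind, value)
```

Under that assumption the value would mean the child process ran and exited on its own with a failure code, rather than being killed by a signal.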
Below is what happens during a "working" boot. It fails the /etc/init.d/ubus access but looks like there are follow-on items that include reading random numbers.
The third area that calls the init_id routine doesn't appear to be called explicitly by any code, but it is a constructor, so I'm guessing it gets called automatically when ubus is started. These are the only places /dev/random is read, so either one of these reads fails or something fails before the random reads without emitting any debug text.
So I have figured out the files (possibly not all) that matter here.
Sysupgrading when /etc/shadow, /etc/passwd and /etc/group are included fails to boot, halting when procd starts ubus.
Sysupgrading without keeping changes, and then replacing just passwd/shadow/group with those contained in the sysupgrade, also fails to boot in the same manner. I then replaced these one at a time: group didn't matter (booted fine after replacing), while replacing passwd caused the boot to hang. I did not reflash and try shadow.
Sysupgrading without keeping changes, and replacing everything except these 3 files boots fine.
The only real difference in passwd/shadow is an additional user I add for samba.
Additional testing: editing the files to add the samba user I mentioned above works fine -- I add the user and the router reboots just fine. There seem to only be issues with overwriting passwd (and possibly shadow) through copy/move or tar. Strange?
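Since overwriting and hand-editing behave differently, it may help to compare the two passwd files entry by entry instead of eyeballing them. Below is a rough sketch that could be run on a workstation against a fresh image's /etc/passwd and the one from the backup. The function names are hypothetical, and it only looks at the text content (missing accounts, changed fields, wrong field count); it cannot see ownership or permission differences, which a copy or tar extract might also change.

```python
def parse_passwd(text):
    """Return {user_name: list_of_fields} for each non-empty, non-comment line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        entries[fields[0]] = fields
    return entries


def compare_passwd(fresh_text, backup_text):
    """Summarise how the backup passwd differs from the freshly flashed one."""
    fresh = parse_passwd(fresh_text)
    backup = parse_passwd(backup_text)
    shared = set(fresh) & set(backup)
    return {
        # Accounts the new image ships that the backup would wipe out.
        "missing_from_backup": sorted(set(fresh) - set(backup)),
        # Locally added accounts, e.g. a samba user.
        "only_in_backup": sorted(set(backup) - set(fresh)),
        # Same name, but uid/gid/home/shell (any field) differs.
        "changed": sorted(n for n in shared if fresh[n] != backup[n]),
        # A passwd line should have exactly 7 colon-separated fields.
        "malformed": sorted(n for n, f in backup.items() if len(f) != 7),
    }
```

Anything listed under missing_from_backup or changed for the ubus, logd or ntp accounts would be the first thing I'd look at.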
As mentioned above I'm flashing from r14721+18-4ff7bdfeeb to latest master, right now r14887+17-a47279154e.
So I believe I have this resolved. I still don't understand which commit caused the issue but keeping files was broken at r14721 and as of r14898+17-c02096361c it seems to no longer be an issue.
To upgrade and get a booting router:
1. Save off a config backup, external to the router.
2. sysupgrade -n (not keeping files).
3. ssh into the router, transfer the config over and extract it to a temp location.
4. In the temp location, move etc/passwd, etc/shadow and etc/group to a different temp location.
5. cp all remaining config files recursively to their proper locations.
6. Edit /etc/passwd, /etc/shadow and /etc/group manually to add the info from the files set aside in step 4.
This then boots and is once again sysupgradable keeping config.
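The manual edit of passwd/shadow/group in the steps above could be sketched roughly as follows: keep every line of the freshly flashed file and append only those entries from the saved file whose name is not already present. This is an illustration of the idea rather than a tested procedure; merge_accounts is a hypothetical name, and it relies only on the fact that all three files are keyed by the first colon-separated field.

```python
def merge_accounts(fresh_text, saved_text):
    """Merge a saved passwd/shadow/group file into a freshly flashed one.

    Every entry of the fresh file is kept untouched; entries from the saved
    file are appended only if their name (first field) is not already known.
    """
    merged = [line for line in fresh_text.splitlines() if line.strip()]
    known = {line.split(":", 1)[0] for line in merged}
    for line in saved_text.splitlines():
        if not line.strip():
            continue
        name = line.split(":", 1)[0]
        if name not in known:
            merged.append(line)
            known.add(name)
    return "\n".join(merged) + "\n"
```

Note that for names present in both files the fresh entry wins, so something like the root password hash in shadow is not carried over and would still need to be set by hand (or with passwd) afterwards.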