Reboot on IPQ4018 causing power down, not reboot

I've added support for the EnGenius ENS620EXT, but I have one remaining problem. Reboots cause the device to power down, not reboot.

I've traced the reboot process, and both the MSM reboot notifier and the watchdog reboot notifier appear to be registered. Execution gets all the way to invoking the reboot notifier chain. I have not attempted to debug into the notifiers yet (that's next) because the device powers down. I expect the register write that de-asserts ps_hold is taking place (powering down the device), but I don't know what I'm missing that should cause the restart part of the reboot.

I've compared the GPIO config between my build and the factory build, assuming that a GPIO might not be set the right way to cause a reboot, but there is only one difference between the configs: GPIO54, which is in mux_cs, shows as output low on my build and as input on the factory build.

If I remove the mux_cs (which "owns" GPIO 54) in the dts, the device does not find the SPI flash.

Any easy pointers? I've spent a fair amount of time trying to debug this and I'm hoping someone has easy insights to save me more banging my head against this one.

This means when you type "reboot" it will show some "remove" messages and then hang, so you need to switch the power supply, right? :roll_eyes:

Exactly that. Here is the serial console output (with some printk's and delays I added to make sure the messages come out):

root@OpenWrt:/# reboot
root@OpenWrt:/# [  111.103505] br-lan: port 1(eth0) entered disabled state
[  111.107946] device eth0 left promiscuous mode
[  111.108072] br-lan: port 1(eth0) entered disabled state
[  111.187843] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  115.765659] GLENNON SYSCALL reboot
[  115.765943] GLENNON kernel_restart with cmd= (null) arm_pm_restart=   (null)
[  115.770413] GLENNON migrate_to_reboot_cpu
[  115.814787] reboot: Restarting system
[  115.815055] GLENNON do_kernel_restart with cmd= (null)

The next call in kernel/reboot.c is to atomic_notifier_call_chain(&restart_handler_list,reboot_mode,cmd);

That handler list contains:

[    1.294943] GLENNON: register_restart_handler function pointer is =c05e06b0     System.map:14767:c05e06b0 t do_msm_restart
[    1.297290] GLENNON: register_restart_handler function pointer is =c05e67dc     System.map:14942:c05e67dc t watchdog_restart_notifier
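For anyone repeating this kind of tracing, the pointer-to-symbol lookup above can be scripted. This is only a minimal sketch: the two sample lines are the System.map entries quoted above, and the helper name `parse_system_map` is made up for illustration. On a kernel built with kallsyms, printk's `%ps` format should print the symbol name directly, which avoids the lookup altogether.

```python
def parse_system_map(text):
    """Map address -> symbol name from System.map-style lines.

    Each line is expected to look like: "<hex address> <type> <name>".
    """
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts[0], parts[1], parts[2]
        # Keep the first name seen for an address (aliases can share one).
        symbols.setdefault(int(addr, 16), name)
    return symbols


# Sample data: the two entries quoted in this thread, nothing more.
SAMPLE_MAP = """\
c05e06b0 t do_msm_restart
c05e67dc t watchdog_restart_notifier
"""

symbols = parse_system_map(SAMPLE_MAP)
for ptr in (0xC05E06B0, 0xC05E67DC):
    print(f"{ptr:08x} -> {symbols.get(ptr, '?')}")
```

In practice you would read the real System.map from the kernel build directory instead of `SAMPLE_MAP`.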

Should add that it hits the do_msm_restart rather than the watchdog:
GLENNON: In do_msm_restart, msm_ps_hold=0xd0ca6000
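A rough userspace model of why do_msm_restart is hit first, for what it's worth. It is not kernel code. The assumption here (not verified on this tree) is that both handlers register with the same priority, in which case the notifier chain keeps registration order, and do_msm_restart registered first according to the timestamps above. The class name `RestartChain` and the priority value 128 are illustrative only.

```python
class RestartChain:
    """Toy model of a priority-ordered restart handler chain."""

    def __init__(self):
        self._handlers = []  # (priority, name, callback), highest priority first

    def register(self, name, callback, priority=128):
        # Insert before the first entry with strictly lower priority, so
        # handlers with equal priority stay in registration order.
        idx = len(self._handlers)
        for i, (prio, _name, _cb) in enumerate(self._handlers):
            if priority > prio:
                idx = i
                break
        self._handlers.insert(idx, (priority, name, callback))

    def call_chain(self):
        """Return the names of the handlers that actually got to run."""
        ran = []
        for _prio, name, callback in self._handlers:
            ran.append(name)
            if callback():  # True models "the board reset or lost power here"
                break
        return ran


chain = RestartChain()
# Registration order follows the boot log timestamps; priorities are assumed equal.
chain.register("do_msm_restart", lambda: True)
chain.register("watchdog_restart_notifier", lambda: True)
print(chain.call_chain())
```

Under those assumptions the watchdog handler never gets a turn once the first handler takes the board down, which matches the printk output above.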

There is a thread over here http://community.onion.io/topic/2727/poweroff-and-reboot-don-t-do-what-they-should that suggests that this relates to resetting the SPI-NOR flash prior to reboot - and also indicates that "poorly chosen flash chips" can be in a non-boot-compatible mode at restart, creating the impression of powered down (constant reboot trying to read valid data from flash)

I tried the "reset the spi-nor flash" trick in the linked patch:
No luck.
Based on the last comment in that post (that implies the chip is in 4-byte addressing mode, and needs to be in 3-byte addressing mode), I set the SPI-NOR into 3-byte addressing mode explicitly (should also have been done by SPI-NOR reset commands):
No luck.

You may try @chunkeey 's staging tree with 4.19 kernel:
https://git.openwrt.org/openwrt/staging/chunkeey.git

@LGA1150 do you think you can just use gpio-hog to initialize the required PIN to the correct state?
This would make it much easier to backport.

@chunkeey I have not fully identified which GPIO (if any) is required to cause it to restart rather than power down. Once/If I work that out I'll plan to use gpio-hog (tried using it to mess with GPIO 59 which is an un-associated GPIO output). Right now symptoms are consistent with powered down (no LEDs), but also consistent with the SPI flash being in an unreadable state causing infinite boot loop before any serial output.

@LGA1150 @chunkeey In the midst of building from the staging tree. Hitting a build environment issue: python2.7 ends up linked as python in staging_dir/host/bin, which fails to build package/firmware/wireless-regdb. It errors out with no package 'builtins', which requires python 3.x.

Just trying with a manually linked python3.6

Yes, this is a known issue and @ynezz is working on python3: https://github.com/openwrt/openwrt/pull/1937 . I need the updated wireless-regdb for testing; if you don't, you could just as well revert the patch locally.

@chunkeey, thanks for that. Was able to build with python3.6.

@LGA1150, staging tree built as 4.14 or 4.19 behaves the same as my build.
Next step is to work out "is it powered down, or is it stuck in a boot loop trying to read SPI-NOR".

Well it is not trying to read the SPI-NOR on reboot. Scoped the /CS line and get lots of action during boot, but it settles down and stops being activated. Reboot command does not cause the /CS to trigger, so I think I can rule out flash 4-byte addressing as the causative issue.

Stumbled across a forum post by someone who managed to get the GPL dump from EnGenius (I had not succeeded), so I will scour that for further clues/avenues of investigation.

Unfortunately fried the AP probing the power controller IC, so ordered another. I have a spare to work on in the meantime.

@chunkeey
Pretty sure only python-future is needed, at least it used to be.

You say so here:


So it must be true.

It was also confirmed by Felix later further down, so no need to be snarky.


Ok, I'm sorry. That wasn't my intention. I wanted to highlight that "python-future" tip is "true"/"valid" and there's no need to have doubts about it.


I have resolved (not really fixed) this. After some keyhole debugging of the stock firmware (using /dev/mem to read physical memory at 0x80208000 as a proxy for kernel memory at 0xc0208000), I found that the stock firmware was not registering the do_msm_restart() handler and was invoking qcom_wdt_restart instead.
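In case the /dev/mem trick is useful to anyone else, here is a sketch of the address arithmetic. The two constants are assumptions: a kernel linear map starting at 0xc0000000 and RAM starting at physical 0x80000000, which is what makes 0xc0208000 line up with 0x80208000. The helper names are made up for this example, and /dev/mem access to RAM may be restricted depending on kernel config.

```python
PAGE_OFFSET = 0xC0000000  # assumed start of the kernel linear mapping
PHYS_OFFSET = 0x80000000  # assumed physical base of RAM on this SoC


def virt_to_phys(vaddr):
    """Translate a kernel linear-map virtual address to a physical address."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("address is below the kernel linear map")
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


def mmap_window(paddr, length, page_size=4096):
    """Return (page-aligned base, offset into the mapping, mapping length).

    mmap() offsets must be page aligned, so map from the page base and
    index into the mapping by the returned offset.
    """
    base = paddr & ~(page_size - 1)
    delta = paddr - base
    return base, delta, delta + length


paddr = virt_to_phys(0xC0208000)
print(hex(paddr))  # 0x80208000
base, delta, map_len = mmap_window(paddr + 0x10, 4)
print(hex(base), delta, map_len)
```

The values from `mmap_window` are what you would pass as the offset and length when mapping /dev/mem; any symbol address taken from System.map can be fed through `virt_to_phys` the same way.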

By disabling the restart controller in the device tree, it does not register do_msm_restart() and uses watchdog_restart_notifier. That version of restart works.

+
+               restart@4ab000 {
+                       status = "disabled";
+               };
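To confirm on a running device that the override took effect, something like the following should work. It is a sketch that assumes the flattened device tree is exposed under /proc/device-tree and that property files hold NUL-terminated strings. The function name `node_status` is made up, and the walk is used because the node's parent path is not shown in this thread.

```python
import os


def node_status(dt_root, node_name):
    """Find a node by name under dt_root and return its status string.

    Returns "okay" when the node has no status property (the devicetree
    default) and None when no node with that name exists.
    """
    for dirpath, _dirnames, _filenames in os.walk(dt_root):
        if os.path.basename(dirpath) != node_name:
            continue
        status_path = os.path.join(dirpath, "status")
        if not os.path.exists(status_path):
            return "okay"
        with open(status_path, "rb") as f:
            # Property values are stored with a trailing NUL byte.
            return f.read().rstrip(b"\x00").decode("ascii")
    return None


# Prints None on a machine that has no such node.
print(node_status("/proc/device-tree", "restart@4ab000"))
```

On a build with the change above this should print "disabled".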

Yay for persistence. And thanks to those here for listening and helping.


The failure seems to have been related to a bug in the u-boot that ships with the device. The 3.5.5.3 EnGenius firmware comes with a new u-boot that allows reboot to work with the msm restart path. Flashing it is a bad idea right now, though: it is non-downgradable and makes it ALMOST IMPOSSIBLE to get an OpenWrt image into the box.