I've had occasional boot failures with my self-built Unifi 6 images and cannot figure out why.
This happened most recently on a checkout from (the arbitrary) commit 32c683ddceba from Oct 16.
After some bad experiences in the past, I'm now careful to run "make dirclean" whenever I pull from master. So there should not be any residue from previous builds, I believe.
I usually build with the most recent toolchains available, which means binutils 2.39 and GCC 12.2. So GCC 12 issues were one of my first suspicions. Now I have two Unifi 6 Lites, so after the first one failed I simply rebuilt the image (after make dirclean again) using GCC 11.3 instead. But this second image failed too. I guess that pretty much excludes the possibility of both hardware failure and a GCC 12 specific issue.
Neither of the Unifi 6 Lites had console, unfortunately. So I built an image for the ZyXEL NR7101, which is also MT7621 based, using the exact same repo and toolchain configuration, hoping that this would fail in a similar way with console output. But that image worked just fine.
So it's not related to the SoC platform either.
I finally got around to adding console to one of the failing Unifi 6 Lites. This revealed that it booted normally up to and including the point where the kernel writes
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
There it just stopped with no more output at all. No panic stack trace or error messages of any kind. Just hanging.
The next expected messages would have been
[ 0.000000] MIPS: machine is Ubiquiti UniFi 6 Lite
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
But without any errors, and a similar SoC booting just fine with a kernel built from the same code and toolchain, I am pretty lost.
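For anyone wanting to do the same comparison on captured console logs, here is a minimal Python sketch. It assumes you have the text of a good boot and of the hanging boot; the function names are made up for illustration. It strips the kernel timestamps and reports the first message of the good boot that the failing boot never reached.

```python
import re

# Matches the leading "[ 0.000000] " kernel timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def messages(log_text):
    """Return the non-empty log lines with the timestamp prefix removed."""
    return [TIMESTAMP.sub("", line).strip()
            for line in log_text.splitlines() if line.strip()]

def first_missing(good_log, bad_log):
    """Return the first message of good_log not reached by bad_log, or None."""
    good, bad = messages(good_log), messages(bad_log)
    n = 0
    while n < len(good) and n < len(bad) and good[n] == bad[n]:
        n += 1
    return good[n] if n < len(good) else None

# Excerpts taken from the console output quoted above.
GOOD = """\
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is Ubiquiti UniFi 6 Lite
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
"""
BAD = """\
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
"""

print(first_missing(GOOD, BAD))  # MIPS: machine is Ubiquiti UniFi 6 Lite
```

This only tells you where the output stops, not why. It is still handy for checking whether every failing image hangs at the same message.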
Does anyone have a suggestion? I would like to figure out why this happens so I can avoid it, but I don't know where to start.
So far, I've ended up working around the issue by using tftp rescue. This is the third time. And although the method is simple, it is a hassle. I have to get physically close enough to push and hold the reset button. And these APs are not mounted to be accessible, unfortunately.
Now that I have console, I might be able to fix that one "remotely". But that also requires being close to the AP, since I have no other device close enough for a permanent console connection (I wired up a Bluetooth serial console, but that doesn't work over much longer distances than a serial cable anyway - it just avoids making a hole in the case).
BTW, does anyone know how to possibly control the Ubnt bootloader entirely using Bluetooth? The Unifi 6 Lite has a builtin Bluetooth controller which is supported in the bootloader. Re-installing OpenWrt after such failures would be much easier if I could do that without having to touch the reset button. Can Bluetooth be used instead to trigger recovery? Or am I right in guessing that the BLE functionality also depends on pressing the reset button? The console output kind of suggests that. With button pressed:
U-Boot 2018.03 [UniFi,v1.1.40.71] (Nov 18 2020 - 20:03:50 -0700), Build: jenkins-Bootloaders-BL_mtk_multi-1.1.40-1
MediaTek MT7621AT ver 1, eco 3
Clocks: CPU: 880MHz, DDR: 1200MHz, Bus: 220MHz, XTAL: 40MHz
DRAM: 256 MiB
Loading Environment from SPI Flash... SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
OK
In: uartlite0@1e000c00
Out: uartlite0@1e000c00
Err: uartlite0@1e000c00
Net: eth0: eth@1e100000
SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
load ubntapp ok
Board: Ubiquiti Networks MT7621 board (a612-15.0000)
UBNT application initialized
*WARNING*: Could not parse FW version, please check FW format
is_default true
is_ble_stp = true
~~~ p_device_model:U6-LITE
~~~ is_default:1 ~~~
~~~ p_macaddr:f4:92:bf:ac:83:58 ~~~
~~~ is_ble_stp:1 ~~~
=========================GPIO INIT=====================
GPIO_16, action is output low
GPIO_16, action is output high
GPIO_19, action is output low
GPIO_19, action is output high
=========================UART_3 INIT=====================
uartlite0@1e000c00, 1e000c00
uartlite0@1e000e00 bring up, 1e000e00
=========================FLOW 1=====================
[BT Power On Result] Success
=========================FLOW 2=====================
[HCI RESET Result] Success
=========================Extend FLOW=====================
[HCI LE BT MAC ADDR Result] Success
=========================FLOW 3=====================
[HCI LE SET ADVERTISING PARAMETER Result] Success
=========================FLOW 4=====================
[HCI LE SET ADVERTISING DATA Result] Success
=========================FLOW 5=====================
[HCI LE SET SCAN RESPONSE Result] Success
=========================FLOW 6=====================
[HIC LE SET ADVERISTING ENABLE Result] Success
MT7915 BLE broadcasting successfully
Autobooting in 2 seconds, press "<Esc><Esc>" to stop
ubnt boot ...
SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
(and then it goes on with the recovery "Erasing cfg partition ......" if the button is still held)
Without button pressed:
U-Boot 2018.03 [UniFi,v1.1.44.73] (Dec 11 2020 - 02:09:10 +0000), Build: jenkins-Bootloaders-BL_mtk_multi-1.1.44-1
MediaTek MT7621AT ver 1, eco 3
Clocks: CPU: 880MHz, DDR: 1200MHz, Bus: 220MHz, XTAL: 40MHz
DRAM: 256 MiB
Loading Environment from SPI Flash... SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
OK
In: uartlite0@1e000c00
Out: uartlite0@1e000c00
Err: uartlite0@1e000c00
Net: eth0: eth@1e100000
SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
load ubntapp ok
Board: Ubiquiti Networks MT7621 board (a612-15.0000)
UBNT application initialized
*WARNING*: Could not parse FW version, please check FW format
is_default <NULL>
set is_default true
is_ble_stp <NULL>
is_ble_stp = false or NULL
~~~ p_device_model:U6-LITE
~~~ is_default:1 ~~~
~~~ p_macaddr:f4:92:bf:ac:83:58 ~~~
~~~ is_ble_stp:0 ~~~
=========================GPIO INIT=====================
GPIO_16, action is output low
GPIO_16, action is output high
GPIO_19, action is output low
GPIO_19, action is output high
=========================UART_3 INIT=====================
uartlite0@1e000c00, 1e000c00
uartlite0@1e000e00 bring up, 1e000e00
=========================FLOW 1=====================
len = 8, num = 10
[BT Power On]: Tx Cmd=01 6f fc 06 01 06 02 00 00 01
[BT Power On]: Rx Msg=00 00 00 00 00 00 00 00
[BT Power On Result] Fail
=========================FLOW 2=====================
=========================Extend FLOW=====================
=========================FLOW 3=====================
=========================FLOW 4=====================
=========================FLOW 5=====================
=========================FLOW 6=====================
MT7915 BLE broadcasting successfully
Autobooting in 2 seconds, press "<Esc><Esc>" to stop
ubnt boot ...
SF: Detected mx25l25635f with page size 256 Bytes, erase size 64 KiB, total 32 MiB
reading kernel 0 from: 0x1d0000, size: 0x00b7e000
So there's still that "BLE broadcasting" message, but it also says "[BT Power On Result] Fail" and all the LE messages are missing.
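To make that difference easier to see, here is a small Python sketch that pulls the "[... Result]" lines out of a captured bootloader log. It relies only on the line format visible in the two logs above; the function names are my own, and I'm assuming the bootloader prints nothing else in that format.

```python
import re

# Matches lines like "[BT Power On Result] Success" but not
# "[BT Power On]: Tx Cmd=..." (no " Result]" in those).
RESULT = re.compile(r"^\[(?P<step>.+?) Result\]\s+(?P<status>Success|Fail)\s*$")

def flow_results(log_text):
    """Map each reported step name to 'Success' or 'Fail'."""
    results = {}
    for line in log_text.splitlines():
        m = RESULT.match(line.strip())
        if m:
            results[m.group("step")] = m.group("status")
    return results

def broadcast_claimed(log_text):
    """True if the log contains the final 'BLE broadcasting' message."""
    return "BLE broadcasting successfully" in log_text

# Excerpts copied from the two logs above (step names kept verbatim).
BUTTON_HELD = """\
[BT Power On Result] Success
[HCI RESET Result] Success
[HCI LE BT MAC ADDR Result] Success
[HCI LE SET ADVERTISING PARAMETER Result] Success
[HCI LE SET ADVERTISING DATA Result] Success
[HCI LE SET SCAN RESPONSE Result] Success
[HIC LE SET ADVERISTING ENABLE Result] Success
MT7915 BLE broadcasting successfully
"""
BUTTON_RELEASED = """\
[BT Power On]: Tx Cmd=01 6f fc 06 01 06 02 00 00 01
[BT Power On]: Rx Msg=00 00 00 00 00 00 00 00
[BT Power On Result] Fail
MT7915 BLE broadcasting successfully
"""

for name, log in (("held", BUTTON_HELD), ("released", BUTTON_RELEASED)):
    res = flow_results(log)
    failed = [step for step, status in res.items() if status == "Fail"]
    print(name, len(res), "steps reported, failed:", failed,
          "broadcast claimed:", broadcast_claimed(log))
```

On the excerpts this reports seven successful steps with the button held, and a single failed "BT Power On" step without it. The "broadcasting successfully" line is printed in both cases, so that message on its own does not show that BLE is actually up.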