21.02.0 (and snapshot) fail to boot on My Book Live Duo

Not related, this here issue is with the SATA subtarget.

After snapshot switched to 5.10, I took another stab at it. Unfortunately, I have to report that the same problem occurs with kernel 5.10:

bootlog
U-Boot 2009.08-svn54115 (Nov 15 2011 - 10:54:56), Build: 0.0.12

CPU:   AMCC PowerPC APM82181 Rev. D at 800 MHz (PLB=200, OPB=100, EBC=100 MHz)
       Security support
       Bootstrap Option E - Boot ROM Location NOR/SRAM (8 bits)
       32 kB I-Cache 32 kB D-Cache
Board: Apollo-3G - APM82181 Board, 2*SATA, 1*USB
I2C:   ready
DRAM:  Auto calibration 256 MB
FLASH: 512 kB
DTT:   1 FAILED INIT
Net:   PHY EC1 Register: 0x2c8c
ppc_4xx_eth0

Type run flash_nfs to mount root filesystem over NFS

p=============================================================================q
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::WWWWWWWWW::::WWWWWWW::::::::WWWWWWW::DDDDDDDDDDDDDDDDD::::::::::::|
|:::::::::::WWWWWWWW::::WWWWWWWW:::::::WWWWWWW::DDDDDDDDDDDDDDDDDDDD::::::::::|
|:::::::::::WWWWWWWW:::WWWWWWWWW::::::WWWWWWW::DDDDDDDDDDDDDDDDDDDDDD:::::::::|
|::::::::::::::::::::::::::::::::::::::::::::::::::::::::::DDDDDDDDDDD::::::::|
|:::::::::::WWWWWWW:::WWWWWWWWWW::::WWWWWWW::DDDDDDDDD:::::::DDDDDDDDD::::::::|
|:::::::::::WWWWWWW::WWWWWWWWWWW:::WWWWWWW::DDDDDDDDD::::::::DDDDDDDDD::::::::|
|:::::::::::WWWWWW::WWWWWW::WWWWW:WWWWWWW::DDDDDDDDDD:::::::DDDDDDDDDD::::::::|
|:::::::::::WWWWWWWWWWWWW:::WWWWWWWWWWWW::DDDDDDDDDD::::::DDDDDDDDDDD:::::::::|
|:::::::::::WWWWWWWWWWWW::::WWWWWWWWWWW::DDDDDDDDDDD:::DDDDDDDDDDDDD::::::::::|
|:::::::::::WWWWWWWWWWW:::::WWWWWWWWWW::DDDDDDDDDDDDDDDDDDDDDDDDDD::::::::::::|
|:::::::::::WWWWWWWWWW::::::WWWWWWWWW::DDDDDDDDDDDDDDDDDDDDDDDDD::::::::::::::|
|:::::::::::WWWWWWWWW:::::::WWWWWWWW::DDDDDDDDDDDDDDDDDDDDDDDD::::::::::::::::|
|:::::::::::WWWWWWWW::::::::WWWWWWW::DDDDDDDDDDDDDDDDDDDD:::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
|:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::|
b=============================================================================d
Hit any key to stop autoboot:  0
USB:    OTG-Controller ID(0x4F54) Version(0x290A)
Err detect usb time out hostport.d32 0x00001000
0 USB Device(s) found
       scanning bus for storage devices...
0 Storage Device(s) found
** Block device usb 0 not supported
Board is not configured for test. Do normal boot.
----- Checking Boot Partitions -----
SATA DWC initialization 0
init: Waiting for device...
sata_dwc: Device found
scan: Waiting for device...
SATA DWC initialization 1
sata_dwc: Hard Disk not found. Status = 0x7f
** Bad partition 1 **
SATA DWC initialization 0
init: Waiting for device...
sata_dwc: Device found
scan: Waiting for device...
SATA DWC initialization 1
sata_dwc: Hard Disk not found. Status = 0x7f
Loading file "/boot/boot.scr" from sata device 0:1 (hda1)
622 bytes read
0:1
## Executing script at 00100000
SATA DWC initialization 0
init: Waiting for device...
sata_dwc: Device found
scan: Waiting for device...
SATA DWC initialization 1
sata_dwc: Hard Disk not found. Status = 0x7f
Loading file "/boot/uImage" from sata device 0:1 (hda1)
4287817 bytes read
Loading file "/boot/apollo3g.dtb" from sata device 0:1 (hda1)
16384 bytes read
Loaded part 1
## Booting kernel from Legacy Image at 01000000 ...
   Image Name:   POWERPC OpenWrt Linux-5.10.72
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    4287753 Bytes =  4.1 MB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 01800000
   Booting using the fdt blob at 0x1800000
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 00ff9000, end 00ffffff ... OK
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] Linux version 5.10.72 (builder@buildhost) (powerpc-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r17742-977bf5e980) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 Sun Oct 10 23:05:54 2021
[    0.000000] Using PowerPC 44x Platform machine description
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x10000000
[    0.000000] dcache_bsize      = 0x20
[    0.000000] icache_bsize      = 0x20
[    0.000000] cpu_features      = 0x0000000000000120
[    0.000000]   possible        = 0x0000000040000120
[    0.000000]   always          = 0x0000000000000120
[    0.000000] cpu_user_features = 0x8c008000 0x00000000
[    0.000000] mmu_features      = 0x00000008
[    0.000000] -----------------------------------------------------
[    0.000000] Top of RAM: 0x10000000, Total RAM: 0x10000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] On node 0 totalpages: 65536
[    0.000000]   Normal zone: 576 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 65536 pages, LIFO batch:15
[    0.000000] MMU: Allocated 1088 bytes of context maps for 255 contexts
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64960
[    0.000000] Kernel command line: root=/dev/sda2 rw rootfstype=squashfs,ext4 console=ttyS0,115200
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 249952K/262144K available (7096K kernel code, 668K rwdata, 1408K rodata, 184K init, 223K bss, 12192K reserved, 0K cma-reserved)
[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xffbdf000..0xfffff000  : fixmap
[    0.000000]   * 0xd1000000..0xffbdf000  : vmalloc & ioremap
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[    0.000000] UIC0 (32 IRQ sources) at DCR 0xc0
[    0.000000] UIC1 (32 IRQ sources) at DCR 0xd0
[    0.000000] UIC2 (32 IRQ sources) at DCR 0xe0
[    0.000000] UIC3 (32 IRQ sources) at DCR 0xf0
[    0.000000] random: get_random_u32 called from start_kernel+0x2e0/0x3f0 with crng_init=0
[    0.000000] time_init: decrementer frequency = 800.000008 MHz
[    0.000000] time_init: processor frequency   = 800.000008 MHz
[    0.000018] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0xb881274fa3, max_idle_ns: 440795210636 ns
[    0.010285] clocksource: timebase mult[1400000] shift[24] registered
[    0.016603] clockevent: decrementer mult[ccccccef] shift[32] cpu[0]
[    0.022925] pid_max: default: 32768 minimum: 301
[    0.027616] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.034829] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.044443] dyndbg: Ignore empty _ddebug table in a CONFIG_DYNAMIC_DEBUG_CORE build
[    0.054724] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.064487] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
[    0.074673] NET: Registered protocol family 16
[    0.079591] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations

[    0.087151] thermal_sys: Registered thermal governor 'step_wise'
[    0.088769] 256k L2-cache enabled
[    0.098074] PCIE0: Port disabled via device-tree
[    0.103367] PCI: Probing PCI hardware
[    0.125117] SCSI subsystem initialized
[    0.129875] libata version 3.00 loaded.
[    0.136389] clocksource: Switched to clocksource timebase
[    0.142590] NET: Registered protocol family 2
[    0.146994] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.154588] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.162878] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.170462] TCP bind hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.177447] TCP: Hash tables configured (established 2048 bind 2048)
[    0.183827] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.190280] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.197311] NET: Registered protocol family 1
[    0.201597] PCI: CLS 0 bytes, default 32
[    0.230385] dw_dmac 4bffd0800.dma: DesignWare DMA Controller, 2 channels
[    0.247275] workingset: timestamp_bits=14 max_order=16 bucket_order=2
[    0.257647] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.432778] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.440914] gpio-473 (Enable Reset Button, disable NOR): hogged as output/low
[    0.448915] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[    0.457560] printk: console [ttyS0] disabled
[    0.461815] 4ef600300.serial: ttyS0 at MMIO 0x4ef600300 (irq = 33, base_baud = 462962) is a TI16750
[    0.470792] printk: console [ttyS0] enabled
[    0.470792] printk: console [ttyS0] enabled
[    0.479052] printk: bootconsole [udbg0] disabled
[    0.479052] printk: bootconsole [udbg0] disabled
[    0.501379] loop: module loaded
[    0.504534] Loading iSCSI transport class v2.0-870.
[    0.513292] sata-dwc 4bffd1000.sata: id 0, controller version 1.91
[    0.521060] scsi host0: sata-dwc
[    0.524549] ata1: SATA max UDMA/133 irq 40
[    0.528904] sata-dwc 4bffd1800.sata: id 0, controller version 1.91
[    0.537322] scsi host1: sata-dwc
[    0.540756] ata2: SATA max UDMA/133 irq 41
[    0.545986] libphy: Fixed MDIO Bus: probed
[    0.550120] PPC 4xx OCP EMAC driver, version 3.54
[    0.555233] MAL v2 /plb/mcmal, 1 TX channels, 1 RX channels
[    0.561018] RGMII /plb/opb/emac-rgmii@ef601500 initialized with MDIO support
[    0.568217] TAH /plb/opb/emac-tah@ef601350 initialized
[    0.573648] /plb/opb/emac-rgmii@ef601500: input 0 in rgmii-id mode
[    0.581337] libphy: emac_mdio: probed
[    0.593441] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:90:a9:b6:0b:ca
[    0.600365] eth0: found Broadcom BCM50610 PHY (0x01)
[    0.605488] i2c /dev entries driver
[    0.609084] booke_wdt: powerpc book-e watchdog driver loaded
[    0.629648] NET: Registered protocol family 10
[    0.635456] Segment Routing with IPv6
[    0.639234] NET: Registered protocol family 17
[    0.643707] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    0.656629] 8021q: 802.1Q VLAN Support v1.8
[    0.660880] drmem: No dynamic reconfiguration memory found
[    0.867648] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    0.919425] ata2: SATA link down (SStatus 0 SControl 300)
[    5.876405] ata1.00: qc timeout (cmd 0xec)
[    5.880490] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.217635] ata1: SATA link down (SStatus 0 SControl 300)
[    6.223052] md: Waiting for all devices to be available before autodetect
[    6.229823] md: If you don't use raid, use raid=noautodetect
[    6.235460] md: Autodetecting RAID arrays.
[    6.239542] md: autorun ...
[    6.242328] md: ... autorun DONE.
[    6.246066] /dev/root: Can't open blockdev
[    6.250184] VFS: Cannot open root device "sda2" or unknown-block(0,0): error -6
[    6.257469] Please append a correct "root=" boot option; here are the available partitions:
[    6.265794] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    6.274026] Rebooting in 1 seconds..

Thanks. I have the feeling that the MBL Single and Duo are indeed different designs... and that I'm missing something in plain sight.

I am completely useless when it comes to deciphering DTS files
[Edit: original post removed to not add noise] ... and so I was barking up the wrong tree.

So, am I completely stupid in asking whether it's correct that both SATA ports have an identical sata-port@0 and reg = <0>?

That I can hopefully explain. Let's look at the bootlog (from your post above):

[    0.513292] sata-dwc 4bffd1000.sata: id 0, controller version 1.91
[    0.521060] scsi host0: sata-dwc
[    0.524549] ata1: SATA max UDMA/133 irq 40
[    0.528904] sata-dwc 4bffd1800.sata: id 0, controller version 1.91
[    0.537322] scsi host1: sata-dwc
[    0.540756] ata2: SATA max UDMA/133 irq 41

SATA0 and SATA1 are two independent SATA controllers (each has its own memory window and IRQ). Each controller has exactly one port, and because developers start counting at 0, both port nodes carry reg = <0>;. The drive0: and drive1: prefixes are just labels.

Note: these driveX: sata-port@0 nodes are mostly syntactic sugar and don't really have any influence on the problem here.
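
For anyone who wants to see the "two controllers, one port each" layout straight from a log, here is a minimal Python sketch. It only uses the six bootlog lines quoted above; the function name and the two regular expressions are my own illustration and not part of any OpenWrt tooling.

```python
import re

# The six lines quoted from the bootlog above.
BOOTLOG = """\
[    0.513292] sata-dwc 4bffd1000.sata: id 0, controller version 1.91
[    0.521060] scsi host0: sata-dwc
[    0.524549] ata1: SATA max UDMA/133 irq 40
[    0.528904] sata-dwc 4bffd1800.sata: id 0, controller version 1.91
[    0.537322] scsi host1: sata-dwc
[    0.540756] ata2: SATA max UDMA/133 irq 41
"""

# Hypothetical patterns, written only to match the lines shown above.
CTRL = re.compile(r"sata-dwc (\w+)\.sata: id (\d+)")
PORT = re.compile(r"(ata\d+): SATA max \S+ irq (\d+)")


def controllers(log):
    """Pair each sata-dwc controller line with the ataN/irq line that follows it."""
    found = []
    for line in log.splitlines():
        m = CTRL.search(line)
        if m:
            found.append({"address": m.group(1), "port_id": int(m.group(2))})
            continue
        m = PORT.search(line)
        if m and found and "ata" not in found[-1]:
            found[-1]["ata"] = m.group(1)
            found[-1]["irq"] = int(m.group(2))
    return found


for entry in controllers(BOOTLOG):
    print(entry)
```

Both entries report port id 0, but with different addresses and IRQs, which matches the explanation: two separate controllers, each numbering its single port from 0.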

However, I think I found something with regard to the regulator. If it is indeed just a missing property... then it really was hidden in plain sight. I'll get around to this later with a new .dtb.

Here's a .dtb (this time no uuencode foolery; you should be able to get the file through a GitHub gist):
https://gist.github.com/chunkeey/a744b6e28b6c018f5fc1cea82d6380e8#file-openwrt-apm821xx-sata-wd_mybooklive-apollo3g-dtb

(Preview looks broken, but it should be the real file. sha256sum of openwrt-apm821xx-sata-wd_mybooklive-apollo3g.dtb is 720d787a4ac7ce51dc472abd87c4119d17dee5fa50feff2164ec136bbb75996d)
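
If sha256sum is not at hand on the machine where the file was downloaded, the same check can be sketched in Python with the standard hashlib module. The helper names below are made up for illustration; the expected hash is the one quoted above, and the file name in the usage note assumes the download was saved under its original name.

```python
import hashlib

# Hash quoted above for openwrt-apm821xx-sata-wd_mybooklive-apollo3g.dtb
EXPECTED = "720d787a4ac7ce51dc472abd87c4119d17dee5fa50feff2164ec136bbb75996d"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage would be something like matches("openwrt-apm821xx-sata-wd_mybooklive-apollo3g.dtb"); a False result most likely means the broken gist preview page was saved instead of the raw file.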

Also: I found the reason why this worked with 19.07: CONFIG_REGULATOR was not set, so the DTS entries were ignored. :man_facepalming:

As for the fix, the following messages in your logs hinted at what was wrong:

[    0.531599] sata1-regulator GPIO handle specifies active low - ignored
[    0.538391] sata0-regulator GPIO handle specifies active low - ignored
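
To check a saved boot log for the same hint, a small Python sketch like the following would do. The pattern is written purely against the two lines quoted above; I am assuming the wording of the kernel message stays the same, and the function name is just an illustration.

```python
import re

# Matches "<name> GPIO handle specifies active low - ignored" as quoted above.
WARNING = re.compile(r"(\S+) GPIO handle specifies active low - ignored")


def ignored_active_low(log):
    """Return the names of all devices whose active-low GPIO flag was ignored."""
    return [m.group(1) for m in WARNING.finditer(log)]


LOG = """\
[    0.531599] sata1-regulator GPIO handle specifies active low - ignored
[    0.538391] sata0-regulator GPIO handle specifies active low - ignored
"""

print(ignored_active_low(LOG))  # ['sata1-regulator', 'sata0-regulator']
```

A non-empty list on an MBL Duo log suggests the .dtb in use still carries the old regulator polarity.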

I changed the GPIO polarity and set the enable-active-high; property in my patch (which produced the .dtb above):

https://git.openwrt.org/?p=openwrt/staging/chunkeey.git;a=commit;h=2eb7b0718a082f5d84158f5bff8f7a2ee3ba1558

Cheers,
Christian

Bingo! With this DTB it boots up fine; you were right on the money.

Honest to god, I was looking at GPIO_ACTIVE_LOW and I wondered if it could be related to the issue, since the drives audibly powered down, suggesting (in my mind) that the SATA port was turned off. I just didn't dare to suggest it since, again, I'm patently useless when it comes to DTSs, and it worked with 19.07 (and now we know why).

So, thank you! You can add me as tested-by if you like.

Edit/addendum: From your commit message

IMHO no. As far as I can see, the MBL Single just ignores the regulator setting it doesn't have. The same disk boots up fine on an MBL Single as well:

MBL Single bootlog excerpt
[    0.867775] ata1: SATA link down (SStatus 0 SControl 300)
[    0.918739] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    0.956740] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[    0.962733] ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 1/32)
[    0.971825] ata2.00: configured for UDMA/133
[    0.976462] scsi 1:0:0:0: Direct-Access     ATA      SAMSUNG HD501LJ  0-12 PQ: 0 ANSI: 5
[    0.985724] sd 1:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    0.993901] sd 1:0:0:0: [sda] Write Protect is off
[    0.998704] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.003832] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.023103]  sda: sda1 sda2
[    1.027084] sd 1:0:0:0: [sda] Attached SCSI disk

(Also, I would really miss being able to just move a disk between an MBL Single and a Duo.)

Pinging @chunkeey because this is time-sensitive: It seems 21.02.1 may get tagged tomorrow. Does it make sense to rush the fix in to "properly" handle the issue, or can 21.02.1 remain on the "band-aid" fix for the time being?

After the long time that was required for this "simple and proper fix", I would say: "let's get some real experience with these (SATA) regulators first." But then, I didn't see or hear complaints until 21.02 got released. I hope that the issue is now addressed, but I can't say for certain.

I suspect people are not eager to update their MBL Duos. Understandably so: once they run, they run, and since they are not edge routers, security updates are not as important.

But yes, I am a bit puzzled by the lack of feedback, too. One would imagine that, since 21.02.0 definitely doesn't work on MBL Duo, someone would speak up. But this thread is basically a conversation between the two of us, and I have to believe that I am not the only one using an MBL Duo. :confused:

I guess the band-aid fix will (have to) do for 21.02.1.

The warning message at the device page has kept me from updating my MBL Duo. I've been checking this thread now and then. I wanted to let you know that there's at least one person out there waiting for a proper update.

21.02.1 has just been released and includes the "band-aid" fix for MBL Duo. In the meantime, the "proper" fix has been merged to snapshot.

I've just updated to 21.02.1 and my MBL Duo died on me. The LED is green but the device is nowhere to be seen on the network. Maybe the band-aid fix didn't work after all?

Hi!
I'm still using my MBL Duo with 19.07.3 as I didn't feel any urge to upgrade yet. Fortunately I've read this thread before I started...

Judging by the recent posts, the problem does not seem to be fixed yet. Unfortunately, I cannot test it in the coming weeks due to other commitments, sorry. :frowning: I'll be available in mid-November and will dig into it then...

The solid green LED (after blinking green for a bit) indicates that it is completely booted up, so it's definitely not the problem that prompted this thread.

(Edit) I went ahead and ironed 19.07.8 onto a disk with default configuration, set the network to DHCP, connected the MBL Duo to my router, uploaded 21.02.1-sysupgrade (wget refused SSL), and then sysupgraded, all the while peeping in via serial to see and record whether anything goes wrong. Nothing did. 21.02.1 booted up fine and the network settings were properly retained.

This still doesn't answer the very valid question why your MBL Duo is not reachable anymore, but it's not 21.02.1's fault. For all the world it sounds like something is wrong with your network settings. Did you ...

  • accidentally select "do not keep settings" when sysupgrading, and now your MBL is at its default network settings (static 192.168.1.1/24)?
  • have your network connection set to something exotic that requires additional packages?
  • (unlikely) have the network connection depend on something that is kept on an additional partition? Since the root partition size changed between 19.07 and 21.02, the disk is reset to its default MBR, and additional partitions have to be recreated

The answer to all of these questions is no. My MBL Duo doesn't show up on my router's "connected devices" list anymore. I've also tried directly connecting it to my PC via Ethernet and visiting 192.168.1.1 and 192.168.2.99 (previously assigned static IP). Neither of them worked.

Just to make double sure: You remembered that you have to set your PC's Ethernet to a matching static address, i.e. 192.168.1.2 and netmask 255.255.255.0, if you want to access 192.168.1.1?
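
To illustrate what "matching" means here, this is a minimal Python sketch using the standard ipaddress module. The helper name is made up; the addresses are the ones mentioned in this thread.

```python
import ipaddress


def same_subnet(pc_ip, netmask, device_ip):
    """True if device_ip falls inside the network defined by the PC's address and netmask."""
    pc = ipaddress.ip_interface(f"{pc_ip}/{netmask}")
    return ipaddress.ip_address(device_ip) in pc.network


# PC set to 192.168.1.2/255.255.255.0: the default OpenWrt address is reachable ...
print(same_subnet("192.168.1.2", "255.255.255.0", "192.168.1.1"))   # True
# ... but the previously assigned static IP 192.168.2.99 is not.
print(same_subnet("192.168.1.2", "255.255.255.0", "192.168.2.99"))  # False
```

So to reach the old static address 192.168.2.99 directly, the PC would need an address in 192.168.2.0/24 instead (assuming that address also used a /24 netmask).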

If that fails, there are just two options. One would be to connect to the MBL Duo's serial port. Of course, you would need a TTL adapter, the connector is fiddly, and you need some experience.

The other option is a bit simpler: remove the disk from the enclosure, connect it to the PC, and have a look at what's in the root file system by mounting /dev/sdx2 to see how that system is currently configured. You must be set up to do this since you had to do it to get OpenWrt on the disk in the first place.

And if all else fails, you can just dd the 21.02.1 factory image to the disk, exactly like you originally wrote OpenWrt to the disk. It won't overwrite any additional partitions' data (sysupgrade pretty much does the same), but you will have to start configuration from scratch. (You did remember to take a note of your previous partition layout, as in the instructions, right?)

In addition to all of this: The reset button on the MBL Duo does not work (not that it would do anything relevant on ext4 installations). I would have to test if it works on an MBL Single, but on an MBL Duo it does nothing when pressed, almost as if the button was not connected.

OK, so the problem was something to do with my network configuration after all.

After connecting the disk to my desktop PC and confirming that the previous configuration was retained, I directly connected the MBL to my desktop and was able to log in to LuCI. I'm not sure why it didn't work yesterday with the laptop, though. LuCI asked me to migrate my network configuration and I was good to go after doing that.

That's a relief, I'm happy you could sort it out.