Arris TR4400 v2 - procd: Connection to ubus failed

hi,

i'm suck porting OpenWrt to this device. i'm building and testing an initramfs uImage. the kernel boots, failsafe mode works, but normal mode gets stuck just after the option to enter failsafe.

debug level 4 is not that helpful:

[   12.331757] procd: Change state 1 -> 2
[   12.334356] procd: - watchdog -
[   12.337930] procd: Ping
[   12.341276] procd: Opened watchdog with timeout 30s
[   12.343458] procd: Watchdog did not previously reset the system
[   12.348472] procd: - ubus -
[   12.354255] procd: Create service ubus
[   12.357028] procd: Start instance ubus::instance1
[   12.361155] procd: Started instance ubus::instance1[217]
[   12.365502] procd: running /etc/init.d/ubus running
[   12.371109] procd: glob failed on /etc/init.d/ubus
[   12.404261] procd: Connection to ubus failed
[   12.506564] procd: Connection to ubus failed
[   12.711170] procd: Connection to ubus failed
[   13.116851] procd: Connection to ubus failed
[   13.921142] procd: Connection to ubus failed
[   14.925519] procd: Connection to ubus failed
[   15.928816] procd: Connection to ubus failed
[   16.936135] procd: Connection to ubus failed
[   17.346747] procd: Ping
[   17.946984] procd: Connection to ubus failed
[   18.958172] procd: Connection to ubus failed
[   19.968933] procd: Connection to ubus failed
[   20.979227] procd: Connection to ubus failed

the full bootlog is here. at that point, there is nothing else to do but rebooting. file /etc/init.d/ubus does not exist, but that seems not to be a problem as it looks like procd is hardwired to start ubusd.

failsafe works, and in that mode i can start ubusd and connect to it without issues.

i tried building master and a few earlier commits all the way back to fcc075e12f6f4718ccac6b5eb68cb8346064ee43 and there was no change.

i'm using linux mint to build (ubuntu based). i read somewhere that something like this could be caused by building in ubuntu, so i installed LMDE 5 beta (debian based) and built there; still same issue.

i'm stuck. what could be wrong?

thank you!

continue normal boot from failsafe

when booting normally the device hangs and you can't access it via serial or ssh, so you can't investigate. i searched for a way to enable console access during early normal boot, or a way to continue booting from failsafe, but came out empty handed. i only found people complaining that they are stuck there.

so i came up the following incantation. i don't know if there are better alternatives, but some solution to this should be documented in the wiki. to continue booting from failsafe, paste the following into the serial console:

touch /tmp/sysupgrade
lock -u /tmp/.failsafe
rm /tmp/sysupgrade

the rm command introduces a race condition. the rationale for it is that i don't know what the existence of /tmp/sysupgrade means elsewhere in the codebase, so i want it gone ASAP. the race could cause the incantation to fail at its job (never happened to me); in that case just repeat it as needed.

after continuing from failsafe:

root@(none):/# ps
  PID USER       VSZ STAT COMMAND
    1 root      1312 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    3 root         0 IW<  [rcu_gp]
    4 root         0 IW<  [rcu_par_gp]
    5 root         0 IW   [kworker/0:0-eve]
    6 root         0 IW<  [kworker/0:0H]
    7 root         0 IW   [kworker/u4:0-ev]
    8 root         0 IW<  [mm_percpu_wq]
    9 root         0 SW   [ksoftirqd/0]
   10 root         0 IW   [rcu_sched]
   11 root         0 SW   [migration/0]
   12 root         0 SW   [cpuhp/0]
   13 root         0 SW   [cpuhp/1]
   14 root         0 SW   [migration/1]
   15 root         0 SW   [ksoftirqd/1]
   16 root         0 IW   [kworker/1:0-mm_]
   17 root         0 IW<  [kworker/1:0H]
   18 root         0 IW<  [netns]
   19 root         0 IW   [kworker/0:1-rcu]
   20 root         0 SW   [oom_reaper]
   21 root         0 IW<  [writeback]
   22 root         0 SW   [kcompactd0]
   29 root         0 IW<  [pencrypt_serial]
   30 root         0 IW<  [pdecrypt_serial]
   49 root         0 IW<  [kblockd]
   50 root         0 IW<  [blkcg_punt_bio]
   51 root         0 IW<  [kworker/u5:0]
   52 root         0 SW   [watchdogd]
   53 root         0 SW   [kswapd0]
   55 root         0 IW<  [kthrotld]
   56 root         0 IW   [kworker/u4:1-ev]
   57 root         0 IW   [kworker/u4:2-ev]
   58 root         0 SW   [irq/36-aerdrv]
   59 root         0 SW   [irq/38-aerdrv]
   60 root         0 IW   [kworker/1:1-eve]
   61 root         0 SW   [spi0]
   65 root         0 IW<  [stmmac_wq]
   66 root         0 IW<  [stmmac_wq]
   67 root         0 IW<  [stmmac_wq]
  101 root         0 IW<  [ipv6_addrconf]
  102 root         0 IW<  [dsa_ordered]
  103 root         0 IW   [kworker/0:2-eve]
  104 root         0 IW   [kworker/0:3-mm_]
  105 root         0 SW   [ubi_bgt0d]
  111 root         0 SW   [irq/41-keys]
  112 root         0 SW   [irq/42-keys]
  113 root         0 IW   [kworker/1:2-ipv]
  114 root         0 IW<  [ata_sff]
  200 root       868 S    dropbear -r /tmp/dropbear_failsafe_host_key
  206 root      1256 S    /bin/sh /etc/preinit
  208 root      1104 S    ash --login
  223 ubus      1016 S    /sbin/ubusd
  227 root      1100 R    ps

/sbin/ubusd is there but i can't talk to it:

root@(none):/# ubus list
Failed to connect to ubus
root@(none):/# ls -l /proc/223/fd
lr-x------    1 ubus     ubus            64 Jan  1 00:09 0 -> /dev/null
l-wx------    1 ubus     ubus            64 Jan  1 00:09 1 -> /dev/null
l-wx------    1 ubus     ubus            64 Jan  1 00:09 2 -> /dev/null
lr-x------    1 ubus     ubus            64 Jan  1 00:09 3 -> /dev/urandom
lrwx------    1 ubus     ubus            64 Jan  1 00:09 4 -> anon_inode:[eventpoll]
lr-x------    1 ubus     ubus            64 Jan  1 00:09 5 -> pipe:[837]
l-wx------    1 ubus     ubus            64 Jan  1 00:09 6 -> pipe:[837]
lrwx------    1 ubus     ubus            64 Jan  1 00:09 7 -> socket:[838]
lrwx------    1 ubus     ubus            64 Jan  1 00:09 8 -> socket:[840]

restarting it does not work:

root@(none):/# killall ubusd
root@(none):/# ps | grep ubus
  277 ubus      1016 S    /sbin/ubusd
  281 root      1100 S    grep ubus
root@(none):/# ubus list
Failed to connect to ubus

but if i manually run it...

root@(none):/# ubusd
[ 1075.017177] procd: - init -
Please press Enter to activate this console.
[ 1075.219110] kmodloader: loading kernel modules from /etc/modules.d/*
[ 1075.228326] Loading modules backported from Linux version v5.10.42-0-g65859eca4dff
[ 1075.228350] Backport generated by backports.git v5.10.42-1-0-gbee5c545
[ 1075.228749] urngd: v1.0.2 started.
[ 1075.241765] xt_time: kernel timezone is -0000
[ 1075.276120] PPP generic driver version 2.4.2
[ 1075.276844] NET: Registered protocol family 24
[ 1075.311235] ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x46.
[ 1075.311697] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[ 1075.318692] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 1075.759325] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 1075.759365] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 1075.771493] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 7ea63dc5
[ 1078.088093] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 85498734
[ 1081.635731] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[ 1081.635757] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[ 1081.718147] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[ 1081.719013] ath10k_pci 0000:01:00.0: wmi print 'free: 84920 iram: 13156 sram: 11224'
[ 1082.007335] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[ 1082.109123] ath10k 5.10 driver, optimized for CT firmware, probing pci device: 0x40.
[ 1082.110585] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
[ 1082.117043] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 1082.562414] ath10k_pci 0001:01:00.0: qca99x0 hw2.0 target 0x01000000 chip_id 0x003b01ff sub 168c:0002
[ 1082.562454] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[ 1082.573565] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9980-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 b36a12bf
[ 1082.643951] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 1:2 crc32 08fa09f2
[ 1083.797488] ath10k_pci 0001:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[ 1083.797520] ath10k_pci 0001:01:00.0: msdu-desc: 2500  skid: 32
[ 1083.875680] ath10k_pci 0001:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[ 1083.876465] ath10k_pci 0001:01:00.0: wmi print 'free: 31080 iram: 23028 sram: 9596'
[ 1084.160291] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[ 1084.267569] kmodloader: done loading kernel modules from /etc/modules.d/*
[ 1088.508753] ipq806x-gmac-dwmac 37200000.ethernet eth1: Link is Down
[ 1088.513276] dwmac1000: Master AXI performs any burst length
[ 1088.513828] ipq806x-gmac-dwmac 37200000.ethernet eth1: No Safety Features support found
[ 1088.519385] ipq806x-gmac-dwmac 37200000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[ 1088.527545] ipq806x-gmac-dwmac 37200000.ethernet eth1: registered PTP clock
[ 1088.536486] ipq806x-gmac-dwmac 37200000.ethernet eth1: configuring for fixed/sgmii link mode
[ 1088.543638] ipq806x-gmac-dwmac 37200000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
[ 1088.552134] br-lan: port 1(eth1) entered blocking state
[ 1088.560696] br-lan: port 1(eth1) entered disabled state
[ 1088.565970] device eth1 entered promiscuous mode
[ 1088.572287] br-lan: port 1(eth1) entered blocking state
[ 1088.575798] br-lan: port 1(eth1) entered forwarding state
[ 1088.588678] ipq806x-gmac-dwmac 37600000.ethernet eth2: PHY [37000000.mdio-mii:07] driver [Atheros 8031 ethernet]
[ 1088.599576] dwmac1000: Master AXI performs any burst length
[ 1088.599605] ipq806x-gmac-dwmac 37600000.ethernet eth2: No Safety Features support found
[ 1088.604063] ipq806x-gmac-dwmac 37600000.ethernet eth2: IEEE 1588-2008 Advanced Timestamp supported
[ 1088.612090] ipq806x-gmac-dwmac 37600000.ethernet eth2: registered PTP clock
[ 1088.620971] ipq806x-gmac-dwmac 37600000.ethernet eth2: configuring for phy/sgmii link mode
...

the boot continues! and i get working dhcp and ssh services.

this is starting to look more and more like a platform issue rather than a port issue.

time to file a bug report?

Myself and some others are having the same if not similar issue building for x86 X86_64 build of 21.02 doesn't boot because ubus exits with error code 32512

They have an exit code in their log though. I do not, mine is identical to yours. I am trying to build in the openwrt docker build image which is based on debian but so far it's not looking promising

not the same issue as that thread, me thinks. my ubusd process is alive and well. just sits there, too cool to bother answering requests.

please do try debian, but i suspect it won't help you. (but report here, please!)

FYI, i reported here: https://github.com/openwrt/openwrt/issues/9492

Run -

fgrep -r sock /sbin/ubusd /etc/ /rom | grep 'run/ubus'; find /tmp/ | grep ubus.sock; grep BUILD_ID /etc/os-release

thanks!

boot to failsafe, then incantation to continue normal boot, then when boot finishes:

root@(none):/# fgrep -sr sock /sbin/ubusd /etc/ /rom | fgrep 'run/ubus'
/sbin/ubusd:/var/run/ubus/ubus.sock
root@(none):/# ls -l /var/run/ubus/ubus.sock
srw-rw-rw-    1 ubus     ubus             0 Jan  1 00:00 /var/run/ubus/ubus.sock

root@(none):/# find /tmp/ | fgrep ubus.sock
/tmp/run/ubus/ubus.sock
root@(none):/# ls -l /tmp/run/ubus/ubus.sock
srw-rw-rw-    1 ubus     ubus             0 Jan  1 00:00 /tmp/run/ubus/ubus.sock

root@(none):/# grep BUILD_ID /etc/os-release
BUILD_ID="r17410-b312b046f1"
root@(none):/# ls -l /var
lrwxrwxrwx    1 root     root             3 Jan  1 00:00 /var -> tmp

root@(none):/# ubus list
Failed to connect to ubus
root@(none):/# ubus -s /var/run/ubus/ubus.sock list
Failed to connect to ubus

Similar issue in this thread -

yeah i saw that post before asking for help. it doesn't seem to be the same issue, they are sysupgrading and doing some work on their overlay saves the day (as does a clean install, i think i remember). i'm booting initramfs: i've no state. also before posting i checked passwd,shadow,groups and ubus is there. in fact, you can see that in my setup ubusd is running, and running as user ubus (check the ps output above). thanks anyhow!

root cause found: see #9545

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.