TP-Link Archer AX23 does not reboot every time with 23.05.5

Hello,

I am running four TP-Link Archer AX23 devices as access points. I reboot them every night via cron (a habit from my Archer C7 days, when Wi-Fi bandwidth degraded from time to time unless the devices were rebooted every 1 to 7 days). With 23.05.5 I noticed that the devices do not always come back up, unlike with 23.05.4 and 23.05.3 (I migrated to the AX23 on 23.05.3). In fact, one of the four devices fails every night and stays offline: only the power LED is on, the rest are off. Unplugging the power and booting the device again has helped every time over the last three days.

Anyway I just downgraded to 23.05.4 and I hope that this really is a problem with the 23.05.5 firmware.

Is anybody else experiencing these problems? Maybe also with other devices?

Regards,
Roi

Power led is on - so device booted up?

It might have. But it is not reachable at its IP over the network, and there is no Wi-Fi either. So I power-cycled one device this morning, another yesterday morning, and a third the morning two days ago. The configuration is more or less identical, as are the devices. So it is not a problem with one device but something that changed from .4 to .5. Hopefully I will see all devices up tomorrow morning and on the following days.

A failure on roughly one boot in N is usually ASLR-related: the kernel leaves initialised memory where the bootloader expects zeroes. It should recur at the same rate if you reboot several times in a row.

Okay, interesting. But I cannot do anything about that, correct? It needs to be fixed in an upcoming version? Something must have changed, because with the same devices I did not experience problems for months while rebooting them every night.

Without a serial log, chances are close to zero of discovering at which point of the (re-)boot it stops. Without that option I'd say try to stabilize it as it is: leave one AP up forever and check whether it ever accumulates fatigue of some sort.

Yeah not rebooting anymore or not so often would be a workaround. :wink: But the problem would still be there.

Did something change from .4 to .5 that would explain the problems I am experiencing here? I also do not think that I am the only user with this hardware (and I would guess that such a thing is not limited to that specific model but perhaps affects ramips-mt7621 in general, and there are more router models with that chip), nor the only one who reboots the devices.

There are many places where pre-initialized devices or memory can make some code fail. Unless a serial log clearly points to what is wrong, there are a hundred or so components to look through.

Looks like I have the same issue. My AX23 may randomly boot/reboot without network, only Power led is on.

AX23 v1.20
Latest OpenWRT from releases
openwrt-23.05.5-ramips-mt7621-tplink_archer-ax23-v1
Linux version 5.15.167 (builder@buildhost) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 12.3.0 r24106-10cc5fcd00) 12.3.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Mon Sep 23 12:34:46 2024

Sometimes it boots, gets an IP, the LAN LED comes on, and it works fine until a reboot, after which it seems to boot but stays offline and inaccessible.

The serial console is working and the router is alive, but the network is not usable. Most interestingly, when I try to check the network status using "ip a" or "ifconfig", the command hangs forever while the console itself stays alive and you still get kernel messages. But you cannot kill the command or type anything.

This is the end of boot log:

Sat Oct 12 07:15:13 2024 daemon.notice netifd: Interface 'loopback' is now up
Sat Oct 12 07:15:13 2024 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)
Sat Oct 12 07:15:16 2024 kern.info kernel: [   26.740048] mt7530-mdio mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control off
root@OpenWrt:/# [   88.074007] mt7530-mdio mdio-bus:1f lan1: Link is Down

root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
ping: sendto: Network unreachable
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# ifconfig 




^C^C^C^C^C^C
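Since a hanging "ifconfig" blocks the only console, one option is to start such checks in the background and stop waiting after a few seconds. Below is a minimal sketch, assuming a POSIX shell with mktemp available; the function name probe and the time limits are made up for illustration and are not part of OpenWrt. It does not unstick the hung command, it only keeps the prompt usable.

```shell
# probe SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and polls for a "done" marker file.
# Prints the command's output if it finishes in time, otherwise reports
# a hang and returns 1 while the command is left running in the background.
probe() {
    limit=$1; shift
    out=$(mktemp)
    done_flag="$out.done"
    # The marker is written only after the command has returned.
    ( "$@" >"$out" 2>&1; echo "$?" >"$done_flag" ) >/dev/null 2>&1 &
    waited=0
    while [ ! -e "$done_flag" ]; do
        if [ "$waited" -ge "$limit" ]; then
            # Temp files are left in place here; the command may still write to them.
            echo "HUNG after ${limit}s: $*"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    cat "$out"
    rm -f "$out" "$done_flag"
    return 0
}
```

For example, "probe 5 ifconfig" or "probe 5 ip a" would either print the interface list or report the hang after five seconds and return to the prompt.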

Is this happening only when manually restarting it, or also "over the day"? I ran into a new problem myself: I did not downgrade every device to .4 here, and one of the .5 devices became unreachable once yesterday. Same effect. But I did not check the console like you did.

For me it randomly happens during reboot, but also can happen during cold boot

To diagnose this correctly, one really would need a boot log, so a serial connection is required. As brada4 points out, it can just be about anything. The fact multiple people experience this issue does not make it easier to diagnose.

So far, this is the equivalent of 'it does not work'. You cannot expect someone to try and reproduce something as generic as that, since it could basically mean anything.

Seems like some race between the network interfaces starting. If you restart the wan interface at that point, is everything OK?

I had it two times, yesterday evening and tonight, without rebooting, so just in "normal" operation. So the devices seem to have crashed. In the evening the device came back up including the network; during the night it did not.

I cannot bring the interface down/up or restart the whole network. The commands hang for a while and then fail with an error:

Command failed: Request timed out

But at least those commands don't hang forever. Running "ifconfig wan" hangs forever.

Please press Enter to activate this console.
[    8.716968] kmodloader: loading kernel modules from /etc/modules.d/*
[    9.015985] Loading modules backported from Linux version v6.1.110-0-g5f55cad62cc9d
[    9.023710] Backport generated by backports.git v6.1.110-1-0-g965f73fc
[    9.238793] pci 0000:00:00.0: enabling device (0000 -> 0003)
[    9.244521] mt7915e_hif 0000:01:00.0: enabling device (0000 -> 0002)
[    9.251426] pci 0000:00:01.0: enabling device (0000 -> 0003)
[    9.257144] mt7915e 0000:02:00.0: enabling device (0000 -> 0002)
[    9.554915] mt7915e 0000:02:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220929104113a
[    9.554915] 
[    9.808286] urngd: v1.0.2 started.
[    9.988290] mt7915e 0000:02:00.0: WM Firmware Version: ____000000, Build Time: 20220929104145
[   10.025050] mt7915e 0000:02:00.0: WA Firmware Version: DEV_000000, Build Time: 20220929104205
[   10.143764] mt7915e 0000:02:00.0: registering led 'mt76-phy0'
[   10.258031] mt7915e 0000:02:00.0: registering led 'mt76-phy1'
[   10.556859] random: jshn: uninitialized urandom read (4 bytes read)
[   10.727354] random: crng init done
[   10.730786] random: 31 urandom warning(s) missed due to ratelimiting
[   15.607302] PPP generic driver version 2.4.2
[   15.613323] NET: Registered PF_PPPOX protocol family
[   15.650373] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   15.658276] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[   15.675858] kmodloader: done loading kernel modules from /etc/modules.d/*
[   23.779319] mtk_soc_eth 1e100000.ethernet eth0: Link is Down
[   23.814052] mtk_soc_eth 1e100000.ethernet eth0: configuring for fixed/rgmii link mode
[   23.822552] mtk_soc_eth 1e100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   23.831728] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   23.843978] mt7530-mdio mdio-bus:1f lan1: configuring for phy/gmii link mode
[   23.853401] br-lan: port 1(lan1) entered blocking state
[   23.858727] br-lan: port 1(lan1) entered disabled state
[   23.866547] device lan1 entered promiscuous mode
[   23.871228] device eth0 entered promiscuous mode
[   23.902383] mt7530-mdio mdio-bus:1f lan2: configuring for phy/gmii link mode
[   23.910756] br-lan: port 2(lan2) entered blocking state
[   23.916124] br-lan: port 2(lan2) entered disabled state
[   23.923321] device lan2 entered promiscuous mode
[   23.937893] mt7530-mdio mdio-bus:1f lan3: configuring for phy/gmii link mode
[   23.946573] br-lan: port 3(lan3) entered blocking state
[   23.951891] br-lan: port 3(lan3) entered disabled state
[   23.959218] device lan3 entered promiscuous mode
[   23.977115] mt7530-mdio mdio-bus:1f lan4: configuring for phy/gmii link mode
[   23.986166] br-lan: port 4(lan4) entered blocking state
[   23.991506] br-lan: port 4(lan4) entered disabled state
[   23.998981] device lan4 entered promiscuous mode



BusyBox v1.36.1 (2024-09-23 12:34:46 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.5, r24106-10cc5fcd00
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@OpenWrt:/# ifdown eth1
Interface eth1 not found
root@OpenWrt:/# ifdown eth0
Interface eth0 not found
root@OpenWrt:/# ifdown wan



^C^C^C^C^C^C^C^C





Command failed: Request timed out

root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# dmesg | tail
[   23.916124] br-lan: port 2(lan2) entered disabled state
[   23.923321] device lan2 entered promiscuous mode
[   23.937893] mt7530-mdio mdio-bus:1f lan3: configuring for phy/gmii link mode
[   23.946573] br-lan: port 3(lan3) entered blocking state
[   23.951891] br-lan: port 3(lan3) entered disabled state
[   23.959218] device lan3 entered promiscuous mode
[   23.977115] mt7530-mdio mdio-bus:1f lan4: configuring for phy/gmii link mode
[   23.986166] br-lan: port 4(lan4) entered blocking state
[   23.991506] br-lan: port 4(lan4) entered disabled state
[   23.998981] device lan4 entered promiscuous mode
root@OpenWrt:/# ifup wan
Command failed: Request timed out




^C^C^C^C^C^C^C^C^C^C^C^C



Command failed: Request timed out

root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# /etc/init.d/network restart
Command failed: Request timed out
Command failed: Request timed out
Command failed: Request timed out



^C^C^C^C^C^C^C^C



Command failed: Request timed out

root@OpenWrt:/# 

Difference between good and bad boots:

$ diff bad.aw good2.aw -u
--- bad.aw	2024-10-13 21:30:32.737964193 +0300
+++ good2.aw	2024-10-13 21:34:16.510874733 +0300
@@ -19,8 +19,8 @@
  Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
  Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
  Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
- Writing ErrCtl register=00001000
- Readback ErrCtl register=00001000
+ Writing ErrCtl register=00001006
+ Readback ErrCtl register=00001006
  mem auto-init: stack:off, heap alloc:off, heap free:off
  Memory: 119216K/131072K available (7323K kernel code, 629K rwdata, 884K rodata, 1264K init, 225K bss, 11856K reserved, 0K cma-reserved)
  SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
@@ -214,7 +214,7 @@
  IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
  key and hit [enter
 , [2
- jffs2: notice: (446) jffs2_build_xattr_subsystem: complete building xattr subsystem, 53 of xdatum (35 unchecked, 18 orphan) and 67 of xref (18 dead, 0 orphan) found.
+ jffs2: notice: (442) jffs2_build_xattr_subsystem: complete building xattr subsystem, 52 of xdatum (35 unchecked, 17 orphan) and 66 of xref (17 dead, 0 orphan) found.
  mount_root: switching to jffs2 overlay
  overlayfs: upper fs does not support tmpfile.
  urandom-seed: Seeding with /etc/urandom.seed
@@ -236,8 +236,8 @@
  mt7915e 0000:02:00.0: enabling device (0000 -> 0002)
  mt7915e 0000:02:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220929104113a
  
- urngd: v1.0.2 started.
  mt7915e 0000:02:00.0: WM Firmware Version: ____000000, Build Time: 20220929104145
+ urngd: v1.0.2 started.
  mt7915e 0000:02:00.0: WA Firmware Version: DEV_000000, Build Time: 20220929104205
  mt7915e 0000:02:00.0: registering led 'mt76-phy0'
  mt7915e 0000:02:00.0: registering led 'mt76-phy1'
@@ -270,3 +270,30 @@
  br-lan: port 4(lan4) entered blocking state
  br-lan: port 4(lan4) entered disabled state
  device lan4 entered promiscuous mode
+ mtk_soc_eth 1e100000.ethernet wan: PHY [mdio-bus:04
+ mtk_soc_eth 1e100000.ethernet wan: configuring for phy/rgmii link mode
+ br-lan: port 5(phy0-ap0) entered blocking state
+ br-lan: port 5(phy0-ap0) entered disabled state
+ device phy0-ap0 entered promiscuous mode
+ br-lan: port 6(phy1-ap0) entered blocking state
+ br-lan: port 6(phy1-ap0) entered disabled state
+ device phy1-ap0 entered promiscuous mode
+ br-lan: port 6(phy1-ap0) entered blocking state
+ br-lan: port 6(phy1-ap0) entered forwarding state
+ IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
+ device phy1-ap0 left promiscuous mode
+ br-lan: port 6(phy1-ap0) entered disabled state
+ br-lan: port 6(phy1-ap0) entered blocking state
+ br-lan: port 6(phy1-ap0) entered disabled state
+ device phy1-ap0 entered promiscuous mode
+ br-lan: port 6(phy1-ap0) entered blocking state
+ br-lan: port 6(phy1-ap0) entered forwarding state
+ br-lan: port 6(phy1-ap0) entered disabled state
+ mtk_soc_eth 1e100000.ethernet wan: Link is Up - 1Gbps/Full - flow control off
+ IPv6: ADDRCONF(NETDEV_CHANGE): wan: link becomes ready
+ IPv6: ADDRCONF(NETDEV_CHANGE): phy1-ap0: link becomes ready
+ br-lan: port 6(phy1-ap0) entered blocking state
+ br-lan: port 6(phy1-ap0) entered forwarding state
+ IPv6: ADDRCONF(NETDEV_CHANGE): phy0-ap0: link becomes ready
+ br-lan: port 5(phy0-ap0) entered blocking state
+ br-lan: port 5(phy0-ap0) entered forwarding state
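
A diff like the one above is only readable if the kernel timestamps are removed first, since they differ on every boot. Here is a minimal sketch of that step, assuming the logs were captured as plain dmesg-style text; the function name strip_ts and the input file names bad.log and good.log are hypothetical, and this is not necessarily how the files above were produced.

```shell
# strip_ts [FILE...]
# Removes a leading "[   12.345678] " kernel timestamp from each line.
# Lines without a timestamp at the start are passed through unchanged.
strip_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//' "$@"
}
```

Usage would look like "strip_ts bad.log > bad.aw; strip_ts good.log > good2.aw; diff -u bad.aw good2.aw".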

@ptlink Main branch just saw a fix for MTK Ethernet, might be worth trying a main build once that fix has been incorporated (or you can compile yourself).

If the problem disappears, try an older main image to see if that particular commit fixed it.

Update: no, it is not resolved :frowning:

Uh oh ouch, the issue has just happened to me again with the latest master. Looks like it is harder to reproduce now, but it still randomly fails to bring up ethernet on boot :frowning:

23.05.4 - works fine
23.05.5 - fails

The only commit related to MTK Ethernet between those releases was:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=c4dc5dbd3313c7a7b8c2a3e79fece998eaf41339

Which is described in the upstream kernel as:

This is a purely cosmetic change intended to help readers find
their way through the implementation.

I'm completely lost.

Is there anything we can do to find the root cause?
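
One way to narrow this down is to list every commit between the two release tags that touches the relevant target directories, and then test builds around the suspicious ones. A minimal sketch, assuming a local clone of the OpenWrt git tree and that the releases are tagged v23.05.4 and v23.05.5; the function name commits_between is hypothetical.

```shell
# commits_between REPO GOOD BAD [PATH...]
# Lists commits reachable from BAD but not from GOOD, limited to the
# given paths (all paths if none are given), one commit per line.
commits_between() {
    repo=$1; good=$2; bad=$3
    shift 3
    git -C "$repo" log --oneline "$good..$bad" -- "$@"
}
```

For example, "commits_between openwrt v23.05.4 v23.05.5 target/linux/ramips target/linux/generic" would show the changes to the ramips target and the generic kernel patches. Note that this only covers OpenWrt's own commits; a kernel point-release bump between the two releases would show up as a single commit here even though it may carry many upstream changes. If a build environment is available, "git bisect" over the same range is the more thorough option.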

I've got the same result: 23.05.4 works fine, 23.05.5 fails.
I've also tried 24.10.0. It works a bit better than 23.05.5, but Ethernet periodically loses and regains its connection, so I reverted to 23.05.4, which seems stable on the AX23.