[Fixed at 8f5986355c] Router crashes once a day on average (MT7621 + MT7615)

I hear that hardware offloading is unstable with nftables. Is that still true nowadays? My GL-iNet MT1300 router crashes about once a day. I am on a snapshot built from https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=e410833bdd6117228a00aad35bbc56de91f7251e

Unfortunately, there is no crash log.

There's only one way to find out: disable offloading and see how it behaves then.
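
For reference, a minimal sketch of how offloading could be switched off from the shell. It assumes the stock firewall config with a single `defaults` section; the function name `disable_offloading` is only illustrative. The same switches are normally available in LuCI as well.

```shell
#!/bin/sh
# Sketch: turn off software and hardware flow offloading on OpenWrt.
# Assumption: /etc/config/firewall has exactly one 'defaults' section.
disable_offloading() {
    uci set firewall.@defaults[0].flow_offloading='0'
    uci set firewall.@defaults[0].flow_offloading_hw='0'
    uci commit firewall
    /etc/init.d/firewall restart
}

# Only act when uci is present, so the script does nothing on other systems.
if command -v uci >/dev/null 2>&1; then
    disable_offloading
else
    echo "uci not found; this does not look like an OpenWrt system" >&2
fi
```

Afterwards, `uci show firewall | grep offloading` should show both options set to '0'.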

Uptime 23h 29m 6s

Looking good so far

up 4 days, 18:58

So there must be a bug in hardware offloading.

Hi.
That reminds me of this thread from about a year ago. Maybe you can find some information in there.

OpenWrt 22.03.2 is rock solid here (MT7621AT + MT7915 + HW offload).


Even without HW offloading my router still hangs every 3-10 days, so I guess there are other culprits.

Are you using 22.03.2, or are you still on a snapshot?

Snapshot. I updated to the latest snapshot again yesterday, and it included some mt76 updates. I will test for a few more days and report back.

Found a pull request that changes which driver/firmware packages are used for the GL-iNet MT1300:

define Device/glinet_gl-mt1300
  $(Device/dsa-migration)
  IMAGE_SIZE := 32448k
  DEVICE_VENDOR := GL.iNet
  DEVICE_MODEL := GL-MT1300
 - DEVICE_PACKAGES := kmod-mt7615e kmod-mt7615-firmware kmod-usb3
 + DEVICE_PACKAGES := kmod-mt7615-firmware kmod-usb3
endef
TARGET_DEVICES += glinet_gl-mt1300

Maybe once this is merged, it will help?

The situation is much worse on the latest snapshot. My router freezes once a day.

Have you tried swapping out the power supply? A bad power supply can cause similar behaviour.


Not sure, but after I updated to the latest snapshot again, the freezes became much less frequent: now one every 3-4 days. I suspect they are caused by having multiple SSIDs on the 2.4 GHz wlan interface.

I noticed that the system log was filled with DHCP messages such as: Dnsmasq-dhcp[1]: no address range available for DHCP request via zt2lruqnob

Still crashing. I wonder if I should give up on mt76 completely and switch to ath or something.

Things you could try:

  • provide your exact configuration and describe how you use your device → this is helpful for debugging
  • raise the log level to try to capture a message about the crash
  • set up a cron job that restarts the router once a day
  • search the internet for Dnsmasq-dhcp[1]: no address range available for DHCP request
    e.g. I found this: https://serverfault.com/questions/1029000/how-to-fix-no-address-range-available-for-dhcp-request-error/1029007
  • buy new hardware
    • If you intend to stay with MediaTek, the mt7915, for example, seems much better → of course there are still bugs, but regular crashes like the ones you report are much rarer and are mostly tied to advanced, complex configurations or to many devices being connected at the same time in a mesh network.
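
As a rough sketch of the cron-job item above: the snippet below adds a daily reboot entry to root's crontab. The 04:30 time, the 70-second delay, and the helper name `add_daily_reboot` are illustrative assumptions, not something from this thread. The `touch /etc/banner` step follows the commonly recommended OpenWrt pattern for boards without a real-time clock, where it is meant to keep the restored clock from landing before the scheduled time and triggering another reboot.

```shell
#!/bin/sh
# Sketch: schedule a daily reboot as a stopgap while the crash is unresolved.
# Assumption: root's crontab lives at /etc/crontabs/root (the OpenWrt default).
CRON_LINE='30 4 * * * sleep 70 && touch /etc/banner && reboot'
CRONTAB="${CRONTAB:-/etc/crontabs/root}"

add_daily_reboot() {
    # Append the entry only if an identical line is not already present.
    grep -qxF "$CRON_LINE" "$CRONTAB" 2>/dev/null || echo "$CRON_LINE" >> "$CRONTAB"
}
```

On the router, run `add_daily_reboot` and then `/etc/init.d/cron restart` so that crond picks up the new entry.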

Unfortunately, there are many bug reports for older MediaTek chipsets like mt7615 or mt7603, and there is less and less activity to fix them, as most development now focuses on newer Wi-Fi 6 and Wi-Fi 7 devices.


This sounds like a good idea.

@easyteacher I have the same problem: several MT7621 (ZBT) routers that used to crash now hang completely. I tried a new power supply, with no difference. I noted the following before a hang:

Commands like ip, wg, etc. hang. I can run ps, but it also hangs after 100+ lines of output, though I can see a lot of stuck processes.

root@router: strace ip addr

execve("/sbin/ip", ["ip", "addr"], 0x7fe84484 /* 13 vars */) = 0
set_thread_area(0x77f90efc)             = 0
set_tid_address(0x77f88008)             = 10523
*removed a lot of open/close calls for libs*
mprotect(0x77edd000, 4096, PROT_READ)   = 0
mprotect(0x77eb4000, 4096, PROT_READ)   = 0
mprotect(0x77ea0000, 4096, PROT_READ)   = 0
mprotect(0x77e7c000, 4096, PROT_READ)   = 0
mprotect(0x460000, 4096, PROT_READ)     = 0
socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
setsockopt(3, SOL_NETLINK, NETLINK_EXT_ACK, [1], 4) = 0
bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, nl_pid=10523, nl_groups=00000000}, [12]) = 0
setsockopt(3, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK, [1], 4) = -1 ENOPROTOOPT (Protocol not available)
sendto(3, {{len=40, type=0x12 /* NLMSG_??? */, flags=NLM_F_REQUEST|0x300, seq=1650512819, pid=0}, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x1d\x00\x01\x00\x00\x00"}, 40, 0, NULL, 0

So commands that use socket(AF_NETLINK, ...) hang.

Does anyone have a clue?
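
One way to narrow this down while ps itself is unreliable: read the process state directly from /proc. The sketch below lists processes in uninterruptible sleep (state D), which is the state a task blocked inside the kernel would typically show. Whether the hung netlink callers here are actually in that state is an assumption, and the function name `list_blocked` is only illustrative.

```shell
#!/bin/sh
# Sketch: print "PID NAME" for every process currently in state D
# (uninterruptible sleep), using only /proc and sed.
list_blocked() {
    for status in /proc/[0-9]*/status; do
        # Skip entries that vanished or cannot be read.
        [ -r "$status" ] || continue
        # The State line looks like "State:  D (disk sleep)".
        state=$(sed -n 's/^State:[[:space:]]*\([A-Z]\).*/\1/p' "$status" 2>/dev/null)
        if [ "$state" = "D" ]; then
            pid=${status#/proc/}
            pid=${pid%/status}
            name=$(sed -n 's/^Name:[[:space:]]*//p' "$status" 2>/dev/null)
            echo "$pid $name"
        fi
    done
}

list_blocked
```

If the same PIDs keep showing up across several runs, those processes are good candidates to mention in a bug report.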

I am now at r23838+1-8f5986355c and the uptime has reached 8 days.

I am almost sure the crash is gone. Offloading is disabled.