Belkin RT3200/Linksys E8450 WiFi AX discussion

I have the following in /etc/rc.local

echo "437500" > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo "ondemand" > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo 5 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_down_factor
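For anyone who wants to try the same settings, here is a slightly more defensive way to write them. It is only a sketch: the function names `set_knob` and `apply_cpufreq` and the `POLICY_DIR` variable are made up for illustration, and the values are simply the ones from the lines above, not a recommendation. It skips any file that is not present, which can happen with `ondemand/` if the governor has not been switched yet.

```shell
#!/bin/sh
# Sketch only: POLICY_DIR, set_knob and apply_cpufreq are illustrative names.
# The default path is the one used in the rc.local lines above.
POLICY_DIR="${POLICY_DIR:-/sys/devices/system/cpu/cpufreq/policy0}"

set_knob() {
	# $1 = file relative to POLICY_DIR, $2 = value to write
	if [ -w "$POLICY_DIR/$1" ]; then
		echo "$2" > "$POLICY_DIR/$1"
	else
		echo "skipping $1 (not present or not writable)" >&2
	fi
}

apply_cpufreq() {
	set_knob scaling_min_freq 437500
	# Select the governor first so that its tunables are available afterwards.
	set_knob scaling_governor ondemand
	set_knob ondemand/up_threshold 5
	set_knob ondemand/sampling_down_factor 10
}
```

Calling `apply_cpufreq` at the end of /etc/rc.local (before `exit 0`) would then apply the values at boot.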

Can you please check if the freeze-on-reboot problem still occurs if you keep the default CPU frequency governor instead?

Seems to reboot without any problem.

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
userspace

Is the "ondemand" governor the problem? This commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=b1cc4eac1bc82086cb2569deecb1c68678011ab3 changed the default governor to "ondemand".

I'm using echo schedutil > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor on the latest release and reboot works fine for me.

I've been using ondemand on three RT3200s for months without any issues. I've left the other parameters that you've modified at their default, though (well, prior to the frequency default being changed I did set it to the 437500 value, but I think it was like June or July when that was fixed).

I'll try out your values for up_threshold and sampling_down_factor on my local one, and see if that yields any clues (although I keep it on SNAPSHOT, updated weekly, so that might be a confounding factor).

My RT3200 is running snapshot r21070. Can anybody confirm if irqbalance is running correctly?
To my knowledge int 142 and 143 should be spread across both cores (as in earlier builds).

TIA

/etc/config$ cat /proc/interrupts
           CPU0       CPU1       
 10:   11209476    7925224     GICv2  30 Level     arch_timer
 15:          1          0  MT_SYSIRQ 163 Level     mt-pmic-pwrap
 22:          0          0   mt-eint   0 Edge      gpio-keys
 75:          6          0   mt-eint  53 Level     mt7530
124:          0          0   mt-eint 102 Edge      gpio-keys
125:         14          0  MT_SYSIRQ  91 Level     ttyS0
128:          0          0  MT_SYSIRQ 118 Level     1100a000.spi
131:      11569          0  MT_SYSIRQ  96 Level     mtk-snand
132:      41769          0  MT_SYSIRQ  95 Level     mtk-ecc
133:          0          0  MT_SYSIRQ 122 Level     11016000.spi
134:   39091441          0  MT_SYSIRQ 211 Level     mt7615e
135:          0          0  MT_SYSIRQ 232 Level     xhci-hcd:usb1
138:          0          0  MT_SYSIRQ 219 Level     1b007000.dma-controller
142:   25362240          0  MT_SYSIRQ 224 Level     1b100000.ethernet
143:   28363196          0  MT_SYSIRQ 225 Level     1b100000.ethernet
146:          0          0     dummy   0 Edge      PCIe PME
147:          0          3    mt7530   0 Edge      mt7530-0:00
148:          0          0    mt7530   1 Edge      mt7530-0:01
149:          0          1    mt7530   2 Edge      mt7530-0:02
150:          0          1    mt7530   3 Edge      mt7530-0:03
151:          0          1    mt7530   4 Edge      mt7530-0:04
152:   19150591          0  MTK PCIe MSI 524288 Edge      mt7915e
IPI0:   2227354    2288596       Rescheduling interrupts
IPI1:  26647502   68114609       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:  13611573    3272948       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0
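To compare builds without reading the whole table, something like the following could pull out just the per-CPU counters for one device. This is a rough sketch: `irq_share` is a made-up helper name, and it assumes the device name is the last field on the line, as it is for `1b100000.ethernet` in the output above.

```shell
#!/bin/sh
# Sketch only: irq_share is an illustrative helper, not an existing tool.
# Usage: irq_share NAME [FILE]   (FILE defaults to /proc/interrupts)
irq_share() {
	awk -v name="$1" '
		NR == 1 { ncpu = NF; next }   # header row lists CPU0, CPU1, ...
		$NF == name {
			# print the IRQ number followed by one counter per CPU
			printf "%s", $1
			for (i = 2; i <= ncpu + 1; i++) printf " %s", $i
			printf "\n"
		}
	' "${2:-/proc/interrupts}"
}
```

Running `irq_share 1b100000.ethernet` twice a few seconds apart shows which CPU column is actually growing. If only the CPU0 column moves for both lines, the interrupts are not being spread.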

In case you are curious, I got those values from https://github.com/openwrt/openwrt/pull/4983#issuecomment-1026850168

Thanks, that's interesting.

I set cron to reboot the RT3200 at 0500 every night, and it rebooted last night without issue. Maybe I'll set it to every hour and see if I can get more samples.

@trr I'm running 22.03.1 r19777-2853b6d652 release from https://github.com/dangowrt/owrt-ubi-installer/releases

Which build should I switch to in order to get WED/irqbalance working?

In my case /sys/kernel/debug/ppe0/bind does not exist,
/sys/kernel/debug/mtk_ppe/bind is empty,
/sys/module/mt7915e/parameters/wed_enable correctly contains "Y".
/etc/modules.conf does not exist, so I added the WED config to /etc/modules.d/mt7915e.
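When comparing builds it can help to collect the same facts in one go. The sketch below just reports, for each of the three paths mentioned above, whether it is missing, empty, or has content. `wed_status` is a made-up name and the optional root-prefix argument exists only so it can be tried against a copy of the files. It makes no claim about which path a given kernel is supposed to provide.

```shell
#!/bin/sh
# Sketch only: wed_status is an illustrative helper name.
# Usage: wed_status [ROOT_PREFIX]   (prefix is empty on a real device)
wed_status() {
	root="${1:-}"
	for p in \
		/sys/kernel/debug/ppe0/bind \
		/sys/kernel/debug/mtk_ppe/bind \
		/sys/module/mt7915e/parameters/wed_enable
	do
		if [ ! -e "$root$p" ]; then
			echo "$p: missing"
			continue
		fi
		# Read the content instead of trusting the file size,
		# which is not meaningful for sysfs/debugfs entries.
		content=$(cat "$root$p" 2>/dev/null)
		if [ -z "$content" ]; then
			echo "$p: empty"
		else
			n=$(printf '%s\n' "$content" | awk 'END { print NR }')
			first=$(printf '%s\n' "$content" | head -n 1)
			echo "$p: $n line(s), first: $first"
		fi
	done
}
```

Posting the output of `wed_status` together with the build revision would make reports like the one above easier to compare.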

irqbalance does not seem to work, interrupts look like below:

tail /proc/interrupts -n 100
           CPU0       CPU1
 10:    5274723    3379620     GIC-0  30 Level     arch_timer
 15:          1          0  MT_SYSIRQ 163 Level     mt-pmic-pwrap
 22:         12          0  MT_SYSIRQ  91 Level     ttyS0
 25:          0          0  MT_SYSIRQ 118 Level     1100a000.spi
 28:      16257          0  MT_SYSIRQ  96 Level     mtk-snand
 29:          0          0  MT_SYSIRQ 122 Level     11016000.spi
 30:    4041068          0  MT_SYSIRQ 211 Level     mt7615e
 31:          0          0  MT_SYSIRQ 232 Level     xhci-hcd:usb1
 34:          0          0  MT_SYSIRQ 219 Level     1b007000.dma-controller
 35:    5312217          0  MT_SYSIRQ 214 Level     mt7915e
 38:    5560497          0  MT_SYSIRQ 224 Level     1b100000.ethernet
 39:    7737884          0  MT_SYSIRQ 225 Level     1b100000.ethernet
 40:          0          0   mt-eint   0 Edge      gpio-keys
 93:         33          0   mt-eint  53 Level     mt7530
142:          0          0   mt-eint 102 Edge      gpio-keys
145:          3          0    mt7530   0 Edge      mt7530-0:00
146:         25          0    mt7530   1 Edge      mt7530-0:01
147:          1          0    mt7530   2 Edge      mt7530-0:02
148:          1          0    mt7530   3 Edge      mt7530-0:03
149:          3          0    mt7530   4 Edge      mt7530-0:04
IPI0:     17694      17304       Rescheduling interrupts
IPI1:   1315062    7664970       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:         0          0       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
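One way to test whether these interrupts can be moved at all, independent of irqbalance, is to write a CPU mask to `/proc/irq/<n>/smp_affinity` by hand. The sketch below looks the IRQ numbers up by name, since they differ between builds (38/39 here, 142/143 in the earlier post). `irqs_by_name` and `set_affinity` are made-up helper names, the write may simply be rejected on some interrupt controllers, and a running irqbalance may override it again.

```shell
#!/bin/sh
# Sketch only: irqs_by_name and set_affinity are illustrative helper names.

# Usage: irqs_by_name NAME [FILE]   (FILE defaults to /proc/interrupts)
# Prints the IRQ numbers whose last field equals NAME, one per line.
irqs_by_name() {
	awk -v name="$1" '$NF == name { sub(/:$/, "", $1); print $1 }' \
		"${2:-/proc/interrupts}"
}

# Usage: set_affinity IRQ HEXMASK [IRQ_DIR]   (IRQ_DIR defaults to /proc/irq)
# HEXMASK is a CPU bitmask: 1 = CPU0, 2 = CPU1, 3 = both.
set_affinity() {
	f="${3:-/proc/irq}/$1/smp_affinity"
	if ! echo "$2" > "$f"; then
		echo "could not set affinity for IRQ $1" >&2
		return 1
	fi
}
```

For example, `set_affinity 39 2` would ask the kernel to deliver IRQ 39 to CPU1 only. Checking /proc/interrupts again afterwards shows whether the CPU1 counter for that line starts to increase.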

What's also interesting: on a 2.5Gb wired interface I get ~150MBps transfer from the internet (speedtest), while on AX 80MHz Wi-Fi I get almost 180MBps (tried multiple times both ways, same time)...

Well, cron rebooted the router 9 times total and it came back up fine every time. SNAPSHOT r21087-288b36c2ea with your ondemand settings as above...

Well, I know nobody wants to hear that, but:
MediaTek (and hence also Linksys) only ever runs the MT7622 SoC at full speed. They have told us so multiple times, and said clearly that QA only ever saw the device running at max speed.
So it can absolutely be that the RAM chips in earlier batches of RT3200/E8450 only have problems with reboot when CPU voltage is less than 1.0V. However, that may not be true for newer batches with potentially different RAM chips, and any problem with that wouldn't ever be detected by either MediaTek's or the board vendor's QA process, because none of them care about anything but the highest CPU clock rate.

@ka2107 why are you setting up periodic rebooting? If it is a wan IP refresh issue you might be able to use something like:

#!/bin/sh

# 1 while a released lease is waiting to be renewed, 0 otherwise.
renew_wan_lease=0

ip monitor link dev eth0 | while read -r event; do

	# logger "maintain-wan-lease detected eth0 event: $event"

	case "$event" in

	*'NO-CARRIER'* )
		if [ "$renew_wan_lease" -eq 0 ]; then
			logger "maintain-wan-lease detected eth0 state change to: 'NO-CARRIER', so forcing udhcpc to release wan lease."
			# SIGUSR2 tells udhcpc to release the current lease.
			killall -SIGUSR2 udhcpc
			renew_wan_lease=1
		fi
	;;

	*'LOWER_UP'* )
		if [ "$renew_wan_lease" -eq 1 ]; then
			logger "maintain-wan-lease detected eth0 state change from: 'NO-CARRIER' to: 'LOWER_UP', so forcing udhcpc to renew wan lease."
			# SIGUSR1 tells udhcpc to renew the lease.
			killall -SIGUSR1 udhcpc
			renew_wan_lease=0
		fi
	;;
	esac

done

N.B. this was for an Asus router running Asus Merlin. I have not needed it on OpenWrt with the same upstream bridge modem, because OpenWrt handles this situation properly.

But my point is that perhaps you can solve the root cause like I did with the above and thereby obviate the need to set up the periodic reboot, which seems to me like a hack anyway.

I need some help.

I have a brand new RT3200 with stock 1.0 firmware and I want to flash UBI. Before that I decided to back up the vendor's firmware. I followed the instructions in the "Device flash complete backup procedure". It all went well until the final factory reset step. After I did the factory reset, the device was soft-bricked. Apparently the factory reset corrupted the vendor firmware and I cannot boot into it anymore. It just keeps blinking the white light.

I'm still able to get into the recovery initramfs or fail_safe mode if I press the reset button during boot, and I have a complete backup of the vendor's firmware (with a total size of 127.75MB). However, the procedure explicitly warns not to flash anything from that recovery. It also warns that flashing back the vendor backup is risky if there is a bad block in the first 22M of NAND.

So now I'm stuck. I really don't want to end up having to open it up and use serial/JTAG. Any suggestions?

Usually the device boots again to vendor firmware after a few warm and/or cold reboots. You could of course just write back the backup files you have now, but as that is indeed a bit risky, I'd first try to reboot-torture it a bit and see if that helps to bring vendor firmware back (it usually does).

(Edit, to explain: the vendor firmware has a bootcount-based A/B dual-boot mechanism which is a bit broken. When booting the UBI recovery image, the bootcount is not maintained by that image, so the vendor bootloader should detect that sooner or later and boot the other A/B image, which should be the vendor firmware.)

Hi,

Thank you for the reply. I followed your advice and rebooted it a dozen times. It either failed to boot or booted into recovery. Any idea?

p.s. a small detail, when it failed to boot, the white light blinking would briefly stop, then resume.

When it "failed to boot", how much time did you give it and how did you conclude it must have failed?

During the "reboot-torture" I waited about 90 seconds each time.

I set up my client with 192.168.1.254 and kept pinging 192.168.1.1. Since there was neither an ARP nor an ICMP response from 192.168.1.1 after 90 seconds, I presumed it must have failed.
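To make the waiting time per attempt reproducible, the check described here could be scripted on the client roughly as follows. This is only a sketch: `wait_for_host` and the `PROBE` override are made-up names, the 90 second default is just the figure from this post, and `ping -c 1 -W 1` is assumed to be available in the usual iputils/BusyBox form.

```shell
#!/bin/sh
# Sketch only: wait_for_host and PROBE are illustrative names.
# Usage: wait_for_host HOST [TIMEOUT_SECONDS]
# Returns 0 as soon as HOST answers, 1 if TIMEOUT_SECONDS pass without a reply.
wait_for_host() {
	host=$1
	timeout=${2:-90}
	# One echo request with a one-second reply timeout per attempt.
	probe=${PROBE:-ping -c 1 -W 1}
	start=$(date +%s)
	while [ $(( $(date +%s) - start )) -lt "$timeout" ]; do
		if $probe "$host" >/dev/null 2>&1; then
			echo "$host answered after $(( $(date +%s) - start ))s"
			return 0
		fi
		sleep 1
	done
	echo "no answer from $host within ${timeout}s"
	return 1
}
```

Something like `wait_for_host 192.168.1.1 300` after each power cycle would also show whether 90 seconds is simply too short for the boot attempt in question.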

Just to let you know: I went for a Hail Mary and restored the mtd3 backup to the NAND. Luckily for me, all the bad blocks are located after 63M. It is now running the stock firmware.

@snowboy stop spamming!

I've installed ksmbd on the RT3200 (set as Dumb AP) to use network shares. When I copy a file to the USB flash memory the Wi-Fi stops sending data to the clients and I have to reboot. Any clues?