Belkin RT3200/Linksys E8450 WiFi AX discussion

Well, they're not going to be happy about me tinkering again :sweat_smile:. For most people, the home network is a "fire and forget" thing that they expect to keep working if left untouched. Then they wonder why their router got compromised, running a 2012 model with firmware from 2016 that was already considerably outdated (kernel- and software-wise)...

Ah, all this headache just makes me want to get back to my little pet project that would allow almost UniFi-style local provisioning. That would require a considerable change to how OpenWrt handles the initial configuration of network interfaces, which is partly why I posted a few weeks back about moving those configurations out into external packages (they would still be added to the firmware at compile time, but could be replaced via e.g. Image Builder or ASU).

After flashing an updated recovery image, don't forget to set "bootcmd" again in case you had changed it.

I'm assuming this is a LuCI bug, but I wanted to double-check that it's not related to the calibration data problem before filing a bug report. Flashing your new installer brought the 5 GHz radio back (thanks, btw), but after installing the latest snapshot there's no Wireless tab in LuCI; wireless is all there in the CLI, though.

edit: well, never mind. Right after I wrote that, it occurred to me to generate a new wireless config, and that brought the Wireless tab back in LuCI.

So the reason was the ECC thing, as I thought? :wink:

I compiled your installer myself, and after reflashing the router with it, both radios work again and there aren't that many logged errors.

Curiously, there seems to be an error regarding mtdblock2 ("sector 256" or "logical block 32") that shows up in the kernel log a dozen times at boot.

[   13.857812] blk_update_request: I/O error, dev mtdblock2, sector 256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   13.868862] blk_update_request: I/O error, dev mtdblock2, sector 256 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   13.879390] Buffer I/O error on dev mtdblock2, logical block 32, async page read
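Both numbers in these messages appear to point at the same byte offset. As a quick sanity check, assuming the block layer's fixed 512-byte sectors and a 4 KiB buffer block size for mtdblock2 (the 4 KiB figure is an assumption, not something shown in the log), both convert to 0x20000, i.e. exactly 128 kB:

```python
SECTOR_SIZE = 512          # block-layer sectors are always 512 bytes
BUFFER_BLOCK_SIZE = 4096   # assumed buffer-head block size for mtdblock2


def sector_to_offset(sector: int) -> int:
    """Byte offset of a 512-byte block-layer sector."""
    return sector * SECTOR_SIZE


def logical_block_to_offset(block: int, block_size: int = BUFFER_BLOCK_SIZE) -> int:
    """Byte offset of a 'logical block' from a 'Buffer I/O error' message."""
    return block * block_size


if __name__ == "__main__":
    print(hex(sector_to_offset(256)))         # 0x20000
    print(hex(logical_block_to_offset(32)))   # 0x20000
    # Later errors step by 8 sectors, i.e. one 4 KiB page per error
    print([hex(sector_to_offset(s)) for s in (256, 264, 272, 280)])
    # ['0x20000', '0x21000', '0x22000', '0x23000']
```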

Looking at the install.sh code, the sectors in mtd2 after the WiFi calibration data are left unwritten, right?

  • /dev/mtdblock2 seems to show contents only up to 0x020000, i.e. the 128 kB that were rewritten with fixed ECC, while
  • /dev/mtd2 shows contents up to 0x100000, and the MACs are visible at 0x07fff4

Apparently the mtdblock driver throws an error at the end of the ECC-fixed part and treats mtdblock2 as ending there, while the partition actually extends to 1 MB, with the MACs in the middle.
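To compare the two views, one could dump both devices to files and check where the non-erased content ends and what sits at 0x07fff4. This is a minimal sketch under stated assumptions: the dumps already exist as files, 0xFF is treated as erased NAND, and two consecutive 6-byte MAC addresses are assumed to start at 0x07fff4 (a layout inferred from the offset quoted above, not from a datasheet):

```python
from __future__ import annotations

import sys

ERASED = 0xFF
MAC_OFFSET = 0x07FFF4  # offset reported above for /dev/mtd2
MAC_LEN = 6


def content_end(data: bytes) -> int:
    """Offset just past the last byte that is not erased (0xFF)."""
    return len(data.rstrip(bytes([ERASED])))


def read_macs(data: bytes, offset: int = MAC_OFFSET, count: int = 2) -> list[str]:
    """Format `count` consecutive 6-byte MACs starting at `offset` (assumed layout)."""
    macs = []
    for i in range(count):
        start = offset + i * MAC_LEN
        chunk = data[start:start + MAC_LEN]
        if len(chunk) < MAC_LEN:
            raise ValueError(f"dump too short for MAC #{i} at {start:#x}")
        macs.append(":".join(f"{b:02x}" for b in chunk))
    return macs


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(f"{path}: {len(data):#x} bytes, content ends at {content_end(data):#x}")
        if len(data) >= MAC_OFFSET + 2 * MAC_LEN:
            print("  MACs:", ", ".join(read_macs(data)))
```

Run it with the dump files as arguments (e.g. hypothetical `mtd2.bin` and `mtdblock2.bin`); a truncated mtdblock2 dump would simply report a shorter length and skip the MAC lookup.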

Accessing mtdblock2, e.g. with hexdump, surfaces errors for the remaining sectors:

Sun Aug 29 09:54:32 2021 kern.warn kernel: [40943.627391] print_req_error: 54 callbacks suppressed
Sun Aug 29 09:54:32 2021 kern.err kernel: [40943.627399] blk_update_request: I/O error, dev mtdblock2, sector 256 op 0x0:(READ) flags 0x84700 phys_seg 28 prio class 0
Sun Aug 29 09:54:32 2021 kern.err kernel: [40943.643487] blk_update_request: I/O error, dev mtdblock2, sector 264 op 0x0:(READ) flags 0x84700 phys_seg 27 prio class 0
Sun Aug 29 09:54:32 2021 kern.err kernel: [40943.654621] blk_update_request: I/O error, dev mtdblock2, sector 272 op 0x0:(READ) flags 0x84700 phys_seg 26 prio class 0
Sun Aug 29 09:54:32 2021 kern.err kernel: [40943.665712] blk_update_request: I/O error, dev mtdblock2, sector 280 op 0x0:(READ) flags 0x84700 phys_seg 25 prio class 0

Probably harmless, but strange that the new driver is so picky.

I wonder if it might make sense to also rewrite the rest of mtd2 (after the first 128 kB).

Yes, it was ECC, and not just what was written by the vendor but also everything written using the old driver...
And yes, the installer only writes back the EEPROM data and MAC addresses inside mtd2. Writing the whole partition didn't seem necessary (and would also be slightly trickier, as there may be bad blocks between the EEPROM data at the beginning and the MAC addresses in the middle, so the offset of the MACs would change...).
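The bad-block concern can be illustrated with the common skip-bad-blocks scheme, where each logical erase block is placed on the next good physical block. This is a hypothetical sketch, not the installer's code; the 128 KiB erase-block size is an assumption consistent with the 128 kB region discussed earlier in the thread, and the bad-block sets are made-up example data:

```python
from __future__ import annotations

ERASE_BLOCK = 0x20000  # assumed 128 KiB erase block


def logical_to_physical(offset: int, bad_blocks: set[int], block_size: int = ERASE_BLOCK) -> int:
    """Map a logical offset to a physical one when bad erase blocks are skipped."""
    target, within = divmod(offset, block_size)
    good_seen = 0
    phys = 0
    while True:
        if phys not in bad_blocks:
            if good_seen == target:
                return phys * block_size + within
            good_seen += 1
        phys += 1


if __name__ == "__main__":
    mac_offset = 0x07FFF4
    print(hex(logical_to_physical(mac_offset, set())))  # 0x7fff4
    print(hex(logical_to_physical(mac_offset, {1})))    # 0x9fff4
```

A single bad block between the EEPROM data and the MACs shifts the MACs by a whole erase block, so a full-partition rewrite would have to track bad blocks rather than copy fixed offsets.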

As @981213 explained well in https://github.com/openwrt/openwrt/pull/4179 , the root cause is a mix of API misuse by mt76 and mtdblock (neither handles -EUCLEAN) and apparently broken ECC data for anything written using the old driver.
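In kernel MTD, a read that returns -EUCLEAN has still delivered corrected data; the code only signals that the bitflip count reached the threshold. A caller that treats every negative return as a failure therefore throws away valid calibration data. The following is a userspace Python analogue of that fix, not mt76 code: `read_fn` is a hypothetical stand-in for mtd_read() returning `(ret, data)`, and 117 is the Linux value of EUCLEAN:

```python
import errno

EUCLEAN = getattr(errno, "EUCLEAN", 117)  # "Structure needs cleaning" on Linux


def read_eeprom(read_fn, offset: int, length: int) -> bytes:
    """Read calibration data, accepting ECC-corrected reads.

    read_fn(offset, length) is a hypothetical stand-in for mtd_read() and
    returns (ret, data), where ret is 0 or a negative errno value.
    """
    ret, data = read_fn(offset, length)
    if ret == -EUCLEAN:
        # Bitflips were corrected: data is valid, the block just needs scrubbing.
        ret = 0
    if ret < 0:
        raise OSError(-ret, f"read at {offset:#x} failed")
    return data


if __name__ == "__main__":
    def fake_read(offset, length):
        return -EUCLEAN, bytes(length)  # simulated corrected read

    print(read_eeprom(fake_read, 0, 4).hex())  # 00000000
```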

I seem to be having a similar issue with the snapshot, where the 5 GHz driver isn't loaded. When I rolled back to commit 7119fd32d397567931e63dbbf72014e95624018f, everything worked fine again.

I’m running my own build without ubifs.

Hi @daniel, would you know whether the old (snfi) NAND driver works with the mtd2 ECC data re-written by the new v0.5.3 installer? Or, once we use the new v0.5.3 installer, must we use the new snand driver?

Hi @quarky,

the old driver is much less picky when it comes to ECC errors. It will work just as well with the re-written factory partition.

That sounds bad, as it really means that the new spi-nand driver does not accept data originally written by the OEM to the RT3200/E8450.

cc @981213

Thank you for reporting that, @hlew. This confirms what I suspected but could not verify empirically: the device comes with ECC errors straight from the factory. I rewrote the flash completely several times during development, and the installer may also rewrite parts of the factory partition when needed to restore offsets without BMT, so I wasn't sure whether only data written with the old driver (also used in old versions of the installer) caused trouble for the new driver.
As you never ran the UBI installer and still have the device with its original bootchain, the fact that it broke for you too confirms that the corrupted ECC data is also present when the device comes right out of the factory...
btw: do you see read errors when trying to read other /dev/mtdblock* devices?

Unfortunately, the only ways out are either to give the new driver an option (?) to be more tolerant of pre-existing ECC errors, or to re-write at least the factory partition using the new driver.
As correct ECC data is nice to have if you plan to use this thing for a decade or so with that cheap-brand SPI-NAND chip, I'd recommend re-writing it so you have working ECC in the future.
Doing so requires booting an initramfs image in which you have removed the read-only attribute from the MTD partitions in the device tree.

The OOB area for user data is reordered in the old driver, but the ECC part shouldn't be affected, as that data is placed by the hardware, not by our drivers.
The old driver simply doesn't report any bitflips in the _mtd_read return value (due to how the spi-nand driver was hacked), which makes everything appear fine.
We just need to fix mt76 to ignore -EUCLEAN, and everything should work again without any other steps.

edit: I'd still recommend rewriting to clear those bitflips, as there are already 3 flipped bits when -EUCLEAN is reported, and mt7622-snand can only correct up to 4 bitflips.
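The 3-of-4 figures fit the MTD convention: -EUCLEAN is reported once the corrected bitflips in an ECC step reach `bitflip_threshold`, and anything beyond the ECC strength is uncorrectable (-EBADMSG). Raw NAND defaults the threshold to ceil(3/4 × ECC strength), which gives 3 for a strength of 4; that mt7622-snand uses the same rule is an assumption inferred from the numbers in the post. A small model of the classification:

```python
from __future__ import annotations

import math


def default_threshold(ecc_strength: int) -> int:
    """Raw-NAND style default: ceil(3/4 of the ECC strength)."""
    return math.ceil(ecc_strength * 3 / 4)


def classify_read(bitflips: int, ecc_strength: int = 4, threshold: int | None = None) -> str:
    """Classify a page read by the bitflip count of its worst ECC step."""
    if threshold is None:
        threshold = default_threshold(ecc_strength)
    if bitflips > ecc_strength:
        return "uncorrectable (-EBADMSG)"
    if bitflips >= threshold:
        return "corrected, scrub advised (-EUCLEAN)"
    return "ok"


if __name__ == "__main__":
    for n in range(6):
        print(n, classify_read(n))
```

With 3 bitflips already present, one more flip leaves a step at the correction limit and two more make it unreadable, which is the margin argument for scrubbing now.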

FYI, there's an mtd-rw module in the packages feed which unlocks all read-only partitions.

Last week, I was able to flash the non-UBI image through the Belkin web interface. I decided to go back to stock by installing the Belkin firmware through LuCI. Now I want to go back to the non-UBI image, but nothing happens when I flash it via Belkin's web interface.

What could be wrong?

I was having the same issue and assumed that the image was not flashing correctly, hence the board was falling back to the stock firmware. As I mentioned a few posts ago, I had to revert to an earlier commit/build to get everything working again.

Thanks! Doing that worked.

Most recent snapshots should also work again.

I've been busy for the past couple of days, so I wasn't able to reply to you earlier. Here is the network config:

root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdef::::/48'
        option packet_steering '1'

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'lan1'
        list ports 'lan2'
        list ports 'lan3'
        list ports 'lan4'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.X.1'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option gateway '192.168.X.254'
        option broadcast '192.168.X.255'
        list dns '192.168.X.254'
        list dns '9.9.9.9'

config interface 'wan'
        option device 'wan'
        option proto 'dhcp'

config interface 'wan6'
        option device 'wan'
        option proto 'dhcpv6'

root@OpenWrt:~#

I have also updated the wireless config as per your suggestion:

root@OpenWrt:~#  cat /etc/config/wireless

config wifi-device 'radio0'
        option type 'mac80211'
        option path 'platform/18000000.wmac'
        option band '2g'
        option channel 'auto'
        option cell_density '0'
        option country 'IN'
        option noscan '1'
        option htmode 'HT40'
        option legacy_rates '1'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option key '*********'
        option ssid '2.4G_Network'
        option disassoc_low_ack '0'
        option encryption 'psk-mixed'

config wifi-device 'radio1'
        option type 'mac80211'
        option path '1a143000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
        option band '5g'
        option channel 'auto'
        option htmode 'VHT80'
        option cell_density '3'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option key '*********'
        option ssid '5G_Network'
        option encryption 'psk-mixed'
        option disassoc_low_ack '0'
        option short_preamble '0'

I am still seeing the same issue with WiFi clients not being able to communicate with each other. It seems like we are hitting a bug similar to these: https://bugs.openwrt.org/index.php?do=details&task_id=714 and https://forum.openwrt.org/t/clients-in-same-wlan-cant-reach-each-other/2501

Is anyone else having similar issues? I only see ARP requests between the wireless networks, but no ARP replies when pinging.

Any idea what the issue could be related to?

Daniel,

After updating the OpenWrt firmware to r17443-90e167abaa (https://github.com/dangowrt/linksys-e8450-openwrt-installer/releases/tag/v0.5.3), I lost my 5 GHz network. The interface doesn't show up, and I see the errors below in the system log:

Wed Sep  1 18:35:48 2021 daemon.notice netifd: radio1 (2324): Phy not found
Wed Sep  1 18:35:48 2021 daemon.notice netifd: radio1 (2324): Could not find PHY for device 'radio1'
Wed Sep  1 18:35:48 2021 user.notice ucitrack: Setting up /etc/config/wireless reload dependency on /etc/config/network
Wed Sep  1 18:35:48 2021 daemon.notice netifd: radio1 (2361): WARNING: Variable 'data' does not exist or is not an array/object
Wed Sep  1 18:35:48 2021 daemon.notice netifd: radio1 (2361): Bug: PHY is undefined for device 'radio1'

I tried updating to the latest snapshot, but no luck: 5 GHz is still down. Do I need to go back to a previous release?

Thanks in advance.

Did you use the recovery installer or just sysupgrade?

I just used the sysupgrade installer. Do I need to install the recovery first and then the sysupgrade?