Belkin RT3200/Linksys E8450 WiFi AX discussion

:frowning:

@981213 any ideas?

The crash log shows that the first read from the mtdblock2 device (for parsing potential sub-partitions) fails. This is odd, and the read fails even with the partition parser removed. In that case the device boots, and UBI and everything else works, but reading the block device still fails:

root@OpenWrt:/# hexdump -C /dev/mtdblock2
[  293.744528] blk_update_request: I/O error, dev mtdblock2, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[  293.755393] blk_update_request: I/O error, dev mtdblock2, sector 8 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
[  293.766235] blk_update_request: I/O error, dev mtdblock2, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[  293.777152] blk_update_request: I/O error, dev mtdblock2, sector 24 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  293.788144] blk_update_request: I/O error, dev mtdblock2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  293.798571] Buffer I/O error on dev mtdblock2, logical block 0, async page read
hexdump: /dev/mtdblock2: I/O error

Reads work for all other mtdblock devices. Only mtdblock2, i.e. 0x0000001c0000-0x0000002c0000 : "factory", fails on read operations through the mtdblock device; reading the mtd device itself works.
All other mtd and mtdblock devices read without problems. At first I thought locking might be the issue, as mtd2 is used as an nvmem device for reading MAC addresses and the WiFi EEPROM. However, even reading mtdblock3, which is used by UBI, works.
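
The comparison described here (raw mtd device readable, matching mtdblock device failing) can be repeated for every partition with a small sketch like the one below. The helper names `try_read` and `compare_mtd_reads` are made up for illustration, and the 2048-byte read size is an assumption that may need adjusting to the flash page size.

```shell
# Sketch: check whether the first chunk of a device (or file) can be read.
# try_read is a hypothetical helper name, not an OpenWrt tool.
try_read() {
    # Read one 2 KiB chunk; the size is an assumption, not taken from the thread.
    if dd if="$1" of=/dev/null bs=2048 count=1 2>/dev/null; then
        echo "ok   $1"
    else
        echo "FAIL $1"
    fi
}

# Compare the raw MTD character device with its mtdblock counterpart for
# every partition in /proc/mtd (lines look like: mtd2: 00100000 00020000 "factory").
compare_mtd_reads() {
    [ -r /proc/mtd ] || return 0
    sed -n 's/^mtd\([0-9]*\):.*"\([^"]*\)"$/\1 \2/p' /proc/mtd |
    while read -r idx name; do
        echo "== mtd$idx ($name)"
        try_read "/dev/mtd$idx"
        try_read "/dev/mtdblock$idx"
    done
}
```

Running `compare_mtd_reads` on the router should show at a glance which partitions fail only through the block layer.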

In order to make the build usable again, I fixed error handling in the FIT partition parser to error out in case of page read faults. So now you still get a lot of read errors, but in the end everything seems to work:

That made me wonder whether it might somehow be related to the discussion here...

It is a different driver, but it might be something similar, where declaring the partition for nvmem usage causes some type evaluation to misfire.

Edit: ok, while I was writing this, you apparently figured out the reason. (And my guess was wrong)

I've even tried removing all the NVMEM stuff and the error when reading mtdblock2 persists :worried:

@bobbythomas @BeauSlim
When you had this kernel oops:

kernel NULL pointer dereference at virtual address 0000000000000053

Did you happen to have a USB device connected?
One of mine had a similar issue, reported here: https://github.com/openwrt/mt76/issues/565
Then I removed the USB device (just a USB-powered TV antenna LNA, which I don't think draws much power, nor does it talk to the CPU), and it has been stable as of today.

No, no USB devices were connected. It's just in AP mode, with nothing much else configured.

I'm unable to reproduce this on my current mt7622 router :frowning:
I'll try again next week with my mt7622 rfb.

Thanks.
After your fix in commit cf40141b5 it works OK again.
https://github.com/openwrt/openwrt/commit/cf40141b515d518ff166afb85e898904ab2ae57a

Tested with r17443-90e167abaa

While commit cf40141b5 prevents the kernel from crashing, it does not address the real cause. Check the entire boot log or try reading from /dev/mtdblock2, and you will still see read errors. Reading from other /dev/mtdblock* devices works. Reading from /dev/mtd2 also works. I guess it's still some odd problem with @hackpascal's new SPI-NAND driver...

For example, this is block-mount/blockd probing all block devices, still causing a lot of noise now:

...
[   14.426729] Buffer I/O error on dev mtdblock2, logical block 1, async page read
[   15.101022] Buffer I/O error on dev mtdblock2, logical block 0, async page read
[   15.110508] Buffer I/O error on dev mtdblock2, logical block 1, async page read
[   15.663616] Buffer I/O error on dev mtdblock2, logical block 0, async page read
[   15.673045] Buffer I/O error on dev mtdblock2, logical block 1, async page read
[   19.323705] print_req_error: 76 callbacks suppressed
[   19.323713] blk_update_request: I/O error, dev mtdblock2, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[   19.339636] blk_update_request: I/O error, dev mtdblock2, sector 8 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
[   19.350507] blk_update_request: I/O error, dev mtdblock2, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[   19.361461] blk_update_request: I/O error, dev mtdblock2, sector 24 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   19.372451] blk_update_request: I/O error, dev mtdblock2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   19.382843] buffer_io_error: 4 callbacks suppressed
[   19.382882] Buffer I/O error on dev mtdblock2, logical block 0, async page read
[   19.395457] blk_update_request: I/O error, dev mtdblock2, sector 32 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
[   19.406437] blk_update_request: I/O error, dev mtdblock2, sector 40 op 0x0:(READ) flags 0x80700 phys_seg 7 prio class 0
[   19.417394] blk_update_request: I/O error, dev mtdblock2, sector 48 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0
[   19.428349] blk_update_request: I/O error, dev mtdblock2, sector 56 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0
[   19.439305] blk_update_request: I/O error, dev mtdblock2, sector 64 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[   19.450690] Buffer I/O error on dev mtdblock2, logical block 1, async page read
...
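
To see at a glance which devices produce this noise, the log can be summarized per device. This is only a sketch: `count_io_errors` is a hypothetical helper name, and it matches just the two message shapes visible in the excerpt above.

```shell
# Sketch: count block-layer I/O error lines per device from a kernel log on stdin.
# count_io_errors is a hypothetical helper name.
count_io_errors() {
    awk '/I\/O error/ {
        # The device name is the field right after the word "dev".
        for (i = 1; i < NF; i++)
            if ($i == "dev") {
                d = $(i + 1)
                sub(/,$/, "", d)   # strip the trailing comma
                n[d]++
                break
            }
    }
    END { for (d in n) print d, n[d] }' | sort
}
```

On the router this would be used as `dmesg | count_io_errors`. The counts are a lower bound, since the kernel rate-limits these messages (see the "callbacks suppressed" lines).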

Those errors look similar to the ones seen on e.g. the R7800.

I have assumed they are related to some ROM content having been written without ECC.

See

  • https://openwrt.org/docs/techref/flash#innocent_mtdblock_io_errors
  • https://bugs.openwrt.org/index.php?do=details&task_id=1871 (jogo's closing comments)

    Additional comments about closing:
    The first few blocks of a NAND flash are guaranteed good to ensure that a bootloader stored there can never get corrupted, so it will get written without valid ECC data (the SoC won't check the ECC anyway).
    When block-mount scans all block devices, it will try to read from those blocks, which are exposed as partitions, and the NAND driver will report failed ECC checks (the I/O errors in the log).
    There is nothing wrong here in either way, and nothing we can really do to prevent it.

As the "factory" contents are MAC addresses and other trivial permanent data, Belkin/Linksys has likely written them without regard to ECC.

The thing is: I have wiped the flash of the device multiple times in the past and also written back the factory partition myself, in just the same way as I wrote bl2 there, i.e. with ECC (and mtdblock0 and mtdblock1 read without problems).

Is anyone having issues with Android devices communicating with other devices over WiFi? Since I moved from OpenWrt on an Archer C7 to OpenWrt on the E8450, I have started seeing this issue. I have some Tasmota IoT devices and some IP cameras (all connected to WiFi) which I used to access by IP address; now I cannot reach them from any Android device connected to WiFi. I cannot ping them over WiFi and there are no ARP entries for them on the Android devices. I can, however, access and ping them from my Windows laptop connected to WiFi. Client isolation is not enabled on the AP.

Any idea what the issue could be?

I had a somewhat similar issue where Android devices often timed out or straight up refused to connect to certain services - but for me it was all internet-side things (e.g. couldn't use the Google SSO to log into apps, Deliveroo refused to load).

Funny thing is, after I replaced the main router/edge gateway (which was also OpenWrt, running in a VM), with a Ubiquiti UDM, the issue went away.

Do you use the RT3200 as your main router, or just as an AP?

The E8450 is only acting as an AP and it's directly connected to an L2 switch. All the devices are in the same broadcast domain, so the gateway/router/firewall I am using doesn't come into play unless traffic needs to leave the network. I can confirm that the problem only occurs with Android devices connected to the same network through the E8450, because I can access the IoT devices over VPN from the same Android device.

What's your exact configuration for the AP?

FYI just installed r17443, and the AX radio is failing to start up. I'm not 100% sure, but fairly certain it's not an issue of my config :sweat_smile: given these logs:

Sat Aug 28 00:35:22 2021 kern.debug kernel: [    8.177222] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
Sat Aug 28 00:35:22 2021 kern.debug kernel: [    8.187923] mt7915e 0000:01:00.0: assign IRQ: got 143
Sat Aug 28 00:35:22 2021 kern.debug kernel: [    8.193043] pci 0000:00:00.0: enabling bus mastering
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.198050] mt7915e 0000:01:00.0: enabling device (0000 -> 0002)
Sat Aug 28 00:35:22 2021 kern.debug kernel: [    8.204141] mt7915e 0000:01:00.0: enabling bus mastering
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.254131] mt7622-wmac 18000000.wmac: HW/SW Version: 0x8a108a10, Build Time: 20190801210006a
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.254131]
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.274830] mt7915e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20201105222230a
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.274830]
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.359794] mt7622-wmac 18000000.wmac: N9 Firmware Version: 2.0, Build Time: 20200131180931
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.423962] mt7915e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20201105222304
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.482611] mt7915e 0000:01:00.0: WA Firmware Version: DEV_000000, Build Time: 20201105222323
Sat Aug 28 00:35:22 2021 kern.warn kernel: [    8.655872] mt7915e: probe of 0000:01:00.0 failed with error -22
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.669608] PPP generic driver version 2.4.2
Sat Aug 28 00:35:22 2021 kern.info kernel: [    8.674675] NET: Registered protocol family 24
Sat Aug 28 00:35:22 2021 user.info kernel: [    8.685578] kmodloader: done loading kernel modules from /etc/modules.d/*
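
In a long boot log the failing probe is easy to miss. A filter along these lines pulls out just the probe failures; `probe_failures` is a hypothetical helper name, and the pattern only covers the message format seen above. Error -22 is -EINVAL on Linux.

```shell
# Sketch: list driver probe failures from a kernel log on stdin
# as "driver device errno". probe_failures is a hypothetical helper name.
probe_failures() {
    sed -n 's/.*\] \([^ :]*\): probe of \([^ ]*\) failed with error \(-[0-9]*\).*/\1 \2 \3/p'
}
```

It accepts either `logread | probe_failures` or `dmesg | probe_failures`, since both formats carry the bracketed kernel timestamp before the driver name.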

Here is the config:

root@OpenWrt:~# cat /etc/config/wireless

config wifi-device 'radio0'
        option type 'mac80211'
        option path 'platform/18000000.wmac'
        option band '2g'
        option channel 'auto'
        option cell_density '0'
        option country 'IN'
        option noscan '1'
        option htmode 'HT40'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option key '**********'
        option ssid '2.4g_Network'
        option wpa_disable_eapol_key_retries '1'
        option encryption 'sae-mixed'
        option disassoc_low_ack '0'
        option ieee80211w '1'

config wifi-device 'radio1'
        option type 'mac80211'
        option path '1a143000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
        option band '5g'
        option cell_density '0'
        option channel 'auto'
        option htmode 'VHT160'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option key '**********'
        option ieee80211w '2'
        option encryption 'sae-mixed'
        option wpa_disable_eapol_key_retries '1'
        option ssid '5G_Network'

root@OpenWrt:~#

This is just your wireless config. Could you post your network config? That will have a larger effect on the outcome.

Although, on second thought... Could you try disabling 802.11w and switching to WPA/WPA2 mixed instead of WPA2/WPA3 mixed?

I had issues with WPA3 and Android devices (even newer ones, like my Galaxy S21 Ultra, and especially, my Chromecast w/ Google TV dongle).
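
For reference, that suggestion could be applied roughly as follows. This is a sketch, not a tested recipe: `wpa2_fallback_cmds` is a hypothetical helper name, the section names `default_radio0` and `default_radio1` are taken from the config posted above, and `psk-mixed` is assumed to be the intended WPA/WPA2 mixed mode.

```shell
# Sketch: print "uci batch" commands that disable 802.11w and switch the
# given wifi-iface sections to WPA/WPA2 mixed PSK.
# wpa2_fallback_cmds is a hypothetical helper name.
wpa2_fallback_cmds() {
    for iface in "$@"; do
        echo "set wireless.$iface.encryption='psk-mixed'"
        echo "set wireless.$iface.ieee80211w='0'"
    done
    echo "commit wireless"
}

# On the AP, after reviewing the printed commands:
#   wpa2_fallback_cmds default_radio0 default_radio1 | uci batch
#   wifi reload
```

Printing the commands first rather than applying them directly makes it easy to check what will change before committing.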

I will provide the network config shortly. I am not at my place now.

I also see MT7915e failing to probe:

[   10.592727] mt7915e: probe of 0000:01:00.0 failed with error -22

This could be related to the SPI-NAND update, as the EEPROM is read from the very same mtd2 == factory partition, which also fails to be read through the mtdblock2 device (even though, looking at /dev/mtd2 offset 0x50000, there seems to be a valid MT7915E EEPROM).
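
A quick way to look at that offset is sketched below. `read_at` is a hypothetical helper name. As far as I recall, the mt76 driver expects an MT7915 EEPROM to start with the little-endian chip ID 0x7915 (bytes `15 79`), but treat that as an assumption to verify against the driver source.

```shell
# Sketch: output LEN raw bytes starting at OFFSET from a device or file.
# read_at is a hypothetical helper name; OFFSET and LEN may be hex (0x...).
read_at() {
    # bs=1 is slow but avoids alignment assumptions; fine for a few bytes.
    dd if="$1" bs=1 skip="$(($2))" count="$(($3))" 2>/dev/null
}

# On the router (device and offset taken from the post above):
#   read_at /dev/mtd2 0x50000 16 | hexdump -C
#   read_at /dev/mtdblock2 0x50000 16 | hexdump -C
```

Comparing the two dumps shows whether the block device returns anything at all at the EEPROM offset.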
