Dynalink DL-WRX36 Askey RT5010W IPQ8072A technical discussion

Do you have multiple SSIDs?

I just pinged from wireless client to wireless client and got a 170ms average; trace shows 45-50ms.

Kinda high isn't it?

These are similar to the symptoms I've described in Dynalink DL-WRX36 Askey RT5010W IPQ8072A technical discussion - #1630 by fif
Upstream bug: https://github.com/openwrt/openwrt/issues/9555

Hard to say, check as many devices as you can to see a clearer picture I guess

Since the wifi crash, I pulled the power cord to reboot and the 5GHz device I was having issues with has come good (touch wood :wood:)

Now another device on the 2.4GHz wifi is experiencing the same issue. Even though it's not accessible (via its GUI or by ping from another device on the network), I know it's still functioning (I can check this online, since the device uploads our home electricity data).
However, I can ping the device from the OpenWrt CLI.

It's strange, as it affects devices seemingly at random.

The issue only affects communication between wifi devices but wifi devices will continue to be able to reach LAN devices and the AP and therefore also the gateway for WAN communication. You can try to ping your device directly from the AP. It'll probably work.

Did some more tests

Wifi device ->ping to ethernet device-> 22ms average

Wifi to wifi ping --> 118 average
1m distance from router, all alone in channel 149

Found something possibly related while I was researching option flags.
ap_isolate=1
Is set in /var/run/hostapd-phy0.conf and /var/run/hostapd-phy1.conf even if Isolate Clients is unchecked in Luci.
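
To see what hostapd was actually handed on each radio, something like the following might help. It's only a sketch: `show_ap_isolate` is a made-up helper name, and the default directory is simply the /var/run location mentioned above.

```shell
#!/bin/sh
# List the ap_isolate setting found in each generated hostapd config.
# show_ap_isolate is a hypothetical helper name; the directory argument
# defaults to /var/run, where the hostapd-phy*.conf files were seen.
show_ap_isolate() {
    dir="${1:-/var/run}"
    for conf in "$dir"/hostapd-phy*.conf; do
        [ -f "$conf" ] || continue
        # Print "file:line" for every ap_isolate entry in this file.
        grep -H '^ap_isolate=' "$conf"
    done
    return 0
}
```

Running `show_ap_isolate` with no argument on the router should print one line per matching entry, which makes it easy to compare against what LuCI shows.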

I don't understand what this flag is supposed to do. Apparently it's supposed to be set to 1 for better security but you can probably flip it to 0 as a workaround to your issue.

I am on OpenWrt SNAPSHOT r22923-fd0118c0a5 / LuCI Master git-23.118.79121-6fb185f.

Interesting. The thing is that the issue appears randomly. It might work for hours or days and then all of a sudden wifi devices can't reach each other anymore. It was first reported more than a year ago.

I suspect this could be a kernel issue because all hostapd does with the isolate flag is pass it to the kernel via nl80211.

I found more interesting things.
In an earlier post I reported that the secondary 5 GHz interfaces have incorrect VLAN settings after a reboot or /etc/init.d/network restart.

It turns out that some interfaces can also be wrongly configured to have hairpin off after a reboot. This seems to happen to secondary 5 GHz interfaces that are on a DFS channel.
As described earlier on, secondary 5 GHz interfaces appear after the DFS CAC has completed (~1 minute after for me on US Channel 100 for example).
Not only the VLANs are incorrect, but also the hairpin mode.

Now, the failure to ping/multicast/broadcast between wifi clients I reported earlier on was on the primary 5 GHz interface, and these do not seem to suffer from the misconfigured bridge problem. But I'll take a look next time I can see the problem happening.
Maybe, if CAC runs after boot because a radar or whatever is detected, the interface could get reconfigured, lose the hairpin setting and explain the issue we've been seeing?

Is there any way to force a CAC on an interface to test that theory?

I also thought about checking hairpin_mode next time the issue occurs though I don't see how that would drop multicast packets but not unicast.

@nbd do you maybe have an idea what would cause this issue that many people are experiencing on different platforms? You seem to be quite familiar with mac80211 and have also contributed to it. It's unfortunately quite a big deal when out of the blue wifi devices can't communicate with each other properly anymore.

You're correct: hairpin off would block all inter-client traffic, not just m/bcast.

Also, now found out that mcast_to_unicast is off as well on the secondary interfaces.
Another parameter to check for anyone who runs into the problem of inter-client traffic not going through.

head /sys/devices/virtual/net/br*/lower*/brport/hairpin_mode
head /sys/devices/virtual/net/br*/lower*/brport/multicast_to_unicast

multicast_to_unicast should be set to 1 for all wireless interfaces.
hairpin_mode should be set to 1 for all wireless interfaces that do NOT have port isolation on.
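
For anyone who wants this check in one go, here is a rough sketch that walks the same sysfs paths and prints only the ports where either flag is 0. `check_brports` is a hypothetical name, and the optional argument (an alternative sysfs root) exists only so the function can be tried against a copy of the tree. Wired ports normally report hairpin_mode=0, so only the wireless entries in the output matter.

```shell
#!/bin/sh
# Report bridge ports whose hairpin_mode or multicast_to_unicast is 0.
# check_brports is a hypothetical helper; the optional argument overrides
# the sysfs root (default /sys/devices/virtual/net).
check_brports() {
    root="${1:-/sys/devices/virtual/net}"
    for port in "$root"/br*/lower_*/brport; do
        [ -d "$port" ] || continue
        # Each bridge member appears as lower_<ifname> under the bridge.
        ifname="${port%/brport}"
        ifname="${ifname##*/lower_}"
        for flag in hairpin_mode multicast_to_unicast; do
            [ -r "$port/$flag" ] || continue
            value="$(cat "$port/$flag")"
            if [ "$value" = "0" ]; then
                echo "$ifname: $flag=0"
            fi
        done
    done
    return 0
}
```

Calling `check_brports` right after the problem shows up, and again after a network restart, would show whether the flags changed in between.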

Since posting my experience here .....

  • I disabled irqbalance in /etc/config/irqbalance => reboot, no difference
  • I disabled the service using service irqbalance disable => reboot, now I'm not experiencing what I described in my post; I can now ping all devices and reconnect to either 2.4/5G.

I noticed a night-and-day difference since the change :crossed_fingers: It's been 3 days since posting, still behaving normally.

Will re-post if issue arises.

I have been running this router for a while, and I also have a problem almost similar to yours, only that mine happens on almost a daily basis.
My router's WAN port is connected to the ISP router's LAN port. The ISP router is my GW.
I have a static IP on the WAN port of my router - 192.168.1.2/24, GW=192.168.1.1; My LAN subnet is 172.16.17.0/24, with the router being 172.16.17.1/24.

The router would be functional for several hours, then suddenly all devices would lose Internet connection. When this happens, a WiFi device connected to the router can reach all devices, except the upstream. Even the router itself cannot reach the GW - 192.168.1.1. I have to reboot the router by unplugging the power. A soft reboot does not cut it.
I initially thought it was a cable issue, but I have swapped the Ethernet cables and I have very high-quality cables from Siemon and others.
So far I am completely stumped as to what the issue could be. And it has nothing to do with overheating or resource usage. I monitor those using luci-app-statistics and they are all okay.
When I saw this post, I looked to see if I have that irqbalance option in mine, but it's not there.
Okay, maybe it was, but I had just installed r23051 before checking.

I am wondering whether anyone else is experiencing the same phenomenon and what the fix would be.

Last I knew - irqbalance does not work properly with our routers. I know @hnyman was trying to sort it out, but not sure if it ever did or not. I've been manually assigning a few of the affinities to different cores to spread them out, but that's about it to this point.

#assign the 4 rx interrupts, one to each core
echo 8 > /proc/irq/50/smp_affinity
echo 4 > /proc/irq/51/smp_affinity
echo 2 > /proc/irq/52/smp_affinity
echo 1 > /proc/irq/53/smp_affinity

#assign 3 tcl completions to 3 CPUs
echo 4 > /proc/irq/73/smp_affinity
echo 2 > /proc/irq/74/smp_affinity
echo 1 > /proc/irq/75/smp_affinity

ath11k wifi seemed troublesome for continuously changing IRQs, so in the end I did not propose any IRQ ID heuristics to the upstream irqbalance.

I have been manually assigning a lot of IRQs (but have not touched LAN/WAN/edma plus ath11k copy engine (ce) IRQs that caused crashes).

Somewhat differing from @SiXX , I have tried to set all assignments away from core0 (affinity 1), as the lan/wan/edma/ce is there.
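
For reference, the values written to smp_affinity in these snippets are hex CPU bitmasks: 1 is CPU0, 2 is CPU1, 4 is CPU2 and 8 is CPU3. A small sketch for working out a mask from CPU numbers follows (`cpu_mask` is just an illustrative name, not an existing tool):

```shell
#!/bin/sh
# Build the hex bitmask that /proc/irq/<n>/smp_affinity expects from one
# or more CPU indexes (0-3 on this quad-core SoC). cpu_mask is a made-up
# helper name.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Set the bit that corresponds to this CPU.
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}
```

So `cpu_mask 3` gives the 8 used for CPU3, and `cpu_mask 1 2 3` gives a mask that allows every core except core0.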

root@router5:~# cat /etc/ath11k_v3_irq_balance.sh
#!/bin/sh
        #assign 4 rx interrupts across cores 1-3 (keeping them off core0)
        irq_affinity_num=`grep -E -m1 'reo2host-destination-ring4' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'reo2host-destination-ring3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'reo2host-destination-ring2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'reo2host-destination-ring1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity

        #assign 3 tcl completions to 3 CPUs
        irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity

        #assign 3 ppdu-end interrupts to 3 CPUs
        irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
        irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
        [ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
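
The repeated lookup-then-write pairs above could also be folded into one helper. This is only a sketch of the same idea, not a tested replacement: the function name `set_irq_affinity` and the optional third argument (an alternative /proc root, handy for dry runs) are assumptions for illustration.

```shell
#!/bin/sh
# set_irq_affinity NAME MASK [PROC_ROOT]
# Look up the IRQ number for NAME in PROC_ROOT/interrupts and write MASK
# to its smp_affinity file. The function name and the optional PROC_ROOT
# argument (default /proc) are illustrative only.
set_irq_affinity() {
    name="$1"
    mask="$2"
    proc="${3:-/proc}"
    # Match the IRQ name exactly against the last field of each line
    # (works for names without spaces); the number is field 1, e.g. "50:".
    irq="$(awk -v n="$name" '$NF == n { sub(":", "", $1); print $1; exit }' "$proc/interrupts")"
    [ -n "$irq" ] || return 1
    echo "$mask" > "$proc/irq/$irq/smp_affinity"
}
```

With that, each pair of lines becomes a single call such as `set_irq_affinity reo2host-destination-ring4 8`, and a missing IRQ name just returns non-zero instead of writing anywhere.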

After 1 day uptime:

root@router5:~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  9:          0          0          0          0     GIC-0  39 Level     arch_mem_timer
 13:    3543423   11024860    3275093    3312013     GIC-0  20 Level     arch_timer
 16:          2          0          0          0     GIC-0 354 Edge      smp2p
 17:          0          0          0          0     GIC-0 216 Level     4a9000.thermal-sensor
 18:          0          0          0          0     GIC-0 239 Level     bam_dma
 21:          0          0          0          0     GIC-0 270 Level     bam_dma
 22:          6          0          0          0     GIC-0 340 Level     msm_serial0
 23:     105791          0          0          0     GIC-0 178 Level     bam_dma
 24:          0          0          0          0     GIC-0  35 Edge      wdt_bark
 25:          0          0          0          0     GIC-0 357 Edge      q6v5 wdog
 29:          5          0          0          0     GIC-0 348 Edge      ce0
 30:    2447780          0          0          0     GIC-0 347 Edge      ce1
 31:     754773          0          0          0     GIC-0 346 Edge      ce2
 32:     234956          0          0          0     GIC-0 343 Edge      ce3
 34:          0          0          0          0     GIC-0 443 Edge      ce5
 36:     184732          0          0          0     GIC-0  72 Edge      ce7
 38:          0          0          0          0     GIC-0 334 Edge      ce9
 39:          1          0          0          0     GIC-0 333 Edge      ce10
 40:          0          0          0          0     GIC-0  69 Edge      ce11
 47:          0          0          0          0     GIC-0 323 Edge      reo2ost-exception
 48:         74          0          0          0     GIC-0 322 Edge      wbm2host-rx-release
 49:         51          0          0          0     GIC-0 321 Edge      reo2host-status
 50:          4          0          0       8371     GIC-0 320 Edge      reo2host-destination-ring4
 51:         32          0       6972          0     GIC-0 271 Edge      reo2host-destination-ring3
 52:         76      10839          0          0     GIC-0 268 Edge      reo2host-destination-ring2
 53:         32          0          0       7445     GIC-0 267 Edge      reo2host-destination-ring1
 57:       1264          0     648097          0     GIC-0 263 Edge      ppdu-end-interrupts-mac3
 58:          0          0          0          0     GIC-0 262 Edge      ppdu-end-interrupts-mac2
 59:        250          0          0      70811     GIC-0 261 Edge      ppdu-end-interrupts-mac1
 60:          1          0          0          0     GIC-0 260 Edge      rxdma2host-monitor-status-ring-mac3
 61:          0          0          0          0     GIC-0 256 Edge      rxdma2host-monitor-status-ring-mac2
 62:          1          0          0          0     GIC-0 255 Edge      rxdma2host-monitor-status-ring-mac1
 63:          1          0          0          0     GIC-0 235 Edge      host2rxdma-host-buf-ring-mac3
 64:          0          0          0          0     GIC-0 215 Edge      host2rxdma-host-buf-ring-mac2
 65:          0          0          0          0     GIC-0 212 Edge      host2rxdma-host-buf-ring-mac1
 66:          0          0          0          0     GIC-0 211 Edge      rxdma2host-destination-ring-mac3
 67:          0          0          0          0     GIC-0 210 Edge      rxdma2host-destination-ring-mac2
 68:          0          0          0          0     GIC-0 209 Edge      rxdma2host-destination-ring-mac1
 73:         18          0        924          0     GIC-0 191 Edge      wbm2host-tx-completions-ring3
 74:         20       1989          0          0     GIC-0 190 Edge      wbm2host-tx-completions-ring2
 75:        139          0          0      29228     GIC-0 189 Edge      wbm2host-tx-completions-ring1
 77:         14          0          0          0     GIC-0  47 Edge      cpr3
 78:    1257470          0          0          0     GIC-0 377 Level     edma_txcmpl
 79:          0          0          0          0     GIC-0 385 Level     edma_rxfill
 80:    1121735          0          0          0     GIC-0 393 Level     edma_rxdesc
 81:          0          0          0          0     GIC-0 376 Level     edma_misc
 82:          0          0          0          0  pmic_arb 51380237 Edge      pm-adc5
 83:          0          0          0          0     smp2p   0 Edge      q6v5 fatal
 84:          1          0          0          0     smp2p   1 Edge      q6v5 ready
 85:          0          0          0          0     smp2p   2 Edge      q6v5 handover
 86:          0          0          0          0     smp2p   3 Edge      q6v5 stop
 87:          0          0          0          0   msmgpio  34 Edge      keys
 88:          0          0          0          0   msmgpio  63 Edge      keys
 89:          0          0          0          0     GIC-0 172 Level     xhci-hcd:usb1
 90:         64          0          0          0     GIC-0 353 Edge      glink-native
IPI0:      6063       7402       6486       9099       Rescheduling interrupts
IPI1:    428834     721706     554283     756649       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:      1463       1416       1577       1513       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
Err:          0
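
To get a quick feel for how the load is spread, the per-CPU columns of the numbered IRQ lines can be summed. A rough sketch, assuming the layout shown above (`irq_totals` is a hypothetical name; the IPI and Err rows are skipped):

```shell
#!/bin/sh
# Sum the per-CPU counters of the numbered IRQ lines in an interrupts file.
# irq_totals is a hypothetical helper; the file argument defaults to
# /proc/interrupts.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {                   # numbered IRQ rows only
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }
    ' "${1:-/proc/interrupts}"
}
```

Comparing the totals before and after changing affinities gives a simple check that the interrupts really moved off core0.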

I flashed a new snapshot build onto the router, which was already running OpenWrt, and it no longer boots.
I got a serial connection working. Now, how do I unbrick the router?
This is the console boot output:

S - Core 0 Frequency, 1651 MHz


U-Boot 0.0.1-1-80112-CS (May 21 2021 - 09:29:10 +0800)

DRAM:  smem ram ptable found: ver: 1 len: 4
1 GiB
Led init ...
NAND:  Could not find nand_gpio in dts, using defaults
ONFI device found
ID = 1590aa2c
Vendor = 2c
Device = aa
qpic_nand: changing oobsize to 80 from 128 bytes
SF: Unsupported flash IDs: manuf ff, jedec ffff, ext_jedec ffff
ipq_spi: SPI Flash not found (bus/cs/speed/mode) = (0/0/48000000/0)
256 MiB
MMC:   sdhci: Node Not found, skipping initialization

PCI0 is not defined in the device tree
PCI1 is not defined in the device tree
In:    serial@78B3000
Out:   serial@78B3000
Err:   serial@78B3000
machid: 8850105
MMC Device 0 not found
eth5 MAC Address from ART is not valid
Hit any key to stop autoboot:  0
ubi0: attaching mtd1
ubi0: scanning is finished
ubi0: attached mtd1 (name "mtd=1", size 97 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 776, bad PEBs: 0, corrupted PEBs: 0
ubi0: user volume: 0, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 6/3, WL threshold: 4096, image sequence number: 1011277052
ubi0: available PEBs: 732, total reserved PEBs: 44, PEBs reserved for bad PEB handling: 40
Read 0 bytes from volume kernel to 44000000
Volume kernel not found!
ubi0: detaching mtd1
ubi0: mtd1 is detached
Wrong Image Format for bootm command
ERROR: can't get kernel image!

Net:   MAC0 addr:a4:97:33:df:b3:b6
PHY ID1: 0x4d
PHY ID2: 0xd0b1
PHY ID1: 0x4d
PHY ID2: 0xd101
EDMA ver 1 hw init
Num rings - TxDesc:1 (0-0) TxCmpl:1 (7-7)
RxDesc:1 (15-15) RxFill:1 (7-7)
ipq807x_edma_alloc_rings: successfull
ipq807x_edma_setup_ring_resources: successfull
ipq807x_edma_configure_rings: successfull
ipq807x_edma_hw_init: successfull
eth0

Post printenv from uboot.

Not sure if that's what you mean:

IPQ807x# printenv
baudrate=115200
bootargs=console=ttyMSM0,115200n8 ubi.mtd=rootfs rootfstype=squashfs rootwait
bootcmd=setenv bootargs console=ttyMSM0,115200n8 ubi.mtd=rootfs rootfstype=squashfs rootwait; ubi part fs; ubi read 0x44000000 kernel; bootm 0x44000000#config@rt5010w-d350-rev0
bootdelay=2
eth1addr=a4:97:33:df:b3:b7
eth2addr=a4:97:33:df:b3:b7
eth3addr=a4:97:33:df:b3:b7
eth4addr=a4:97:33:df:b3:b7
ethact=eth0
ethaddr=a4:97:33:df:b3:b6
fdt_high=0x4A400000
fdtcontroladdr=4a971480
flash_type=2
machid=8850105
mtddevname=fs_1
mtddevnum=0
mtdids=nand0=nand0
mtdparts=mtdparts=nand0:0x6100000@0x7a00000(fs),0x6100000@0x1000000(fs_1)
partition=nand0,0
soc_version_major=2
soc_version_minor=0
stderr=serial@78B3000
stdin=serial@78B3000
stdout=serial@78B3000

Environment size: 794/262140 bytes

Same here, however I've set Wan as a DHCP client. I don't have any issues that you describe though.

Yeah I found that out the hard way.