Do you have multiple SSIDs?
I tried to ping now from wireless client to wireless client and I have 170ms average and trace 45-50ms.
Kinda high isn't it?
These are similar to the symptoms I've described in Dynalink DL-WRX36 Askey RT5010W IPQ8072A technical discussion - #1630 by fif
Upstream bug: https://github.com/openwrt/openwrt/issues/9555
Hard to say, check as many devices as you can to see a clearer picture I guess
Since the wifi crash, I pulled the power cord to reboot and the 5GHz device I was having issues with has come good (touch wood).
Now another device on the 2.4GHz wifi is experiencing the same issue. Even though it's not accessible (via its GUI or by ping from another device on the network), I know it's still functioning (I can check this online; the device uploads our home electricity data).
However, I can ping the device from the OpenWrt CLI.
It's strange, as it affects devices on an unpredictable basis.
The issue only affects communication between wifi devices but wifi devices will continue to be able to reach LAN devices and the AP and therefore also the gateway for WAN communication. You can try to ping your device directly from the AP. It'll probably work.
Did some more tests
Wifi device -> ping to ethernet device -> 22ms average
Wifi to wifi ping -> 118ms average
1m distance from router, all alone in channel 149
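To make comparisons like the two above easier to repeat, here is a minimal sketch that pulls the average RTT out of ping's summary line. It assumes the summary looks like either the busybox form (`round-trip min/avg/max = ...`) or the iputils form (`rtt min/avg/max/mdev = ...`); the function name `avg_rtt` is made up for this example.

```shell
#!/bin/sh
# Print the average RTT (in ms) from ping's summary line read on stdin.
# Handles "round-trip min/avg/max = a/b/c ms" (busybox)
# and     "rtt min/avg/max/mdev = a/b/c/d ms" (iputils).
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ {
        split($2, f, "/")      # f[1]=min, f[2]=avg, f[3]=max ...
        gsub(/ /, "", f[2])    # strip any stray spaces
        print f[2]
    }'
}
```

Run it once against an ethernet device and once against a wifi device from the same wifi client, e.g. `ping -c 20 <target> | avg_rtt`, and compare the two numbers.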
Found something possibly related while I was researching option flags.
ap_isolate=1 is set in /var/run/hostapd-phy0.conf and /var/run/hostapd-phy1.conf even if Isolate Clients is unchecked in LuCI.
I don't understand what this flag is supposed to do. Apparently it's supposed to be set to 1 for better security but you can probably flip it to 0 as a workaround to your issue.
I am on OpenWrt SNAPSHOT r22923-fd0118c0a5 / LuCI Master git-23.118.79121-6fb185f.
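For anyone who wants to compare their own setup, a small sketch that only reads the generated hostapd configs and prints every ap_isolate line it finds. The function name `show_ap_isolate` is hypothetical, and it changes nothing on the router.

```shell
#!/bin/sh
# Print "<file>: ap_isolate=<value>" for each ap_isolate line in the given
# hostapd config files. Files that do not exist are skipped silently.
show_ap_isolate() {
    for conf in "$@"; do
        [ -r "$conf" ] || continue
        grep -E '^ap_isolate=' "$conf" | while IFS= read -r line; do
            echo "$conf: $line"
        done
    done
}
```

On the router this would be called as `show_ap_isolate /var/run/hostapd-phy*.conf`. A config with several SSIDs may print more than one line per file.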
Interesting. The thing is that the issue appears randomly. It might work for hours or days and then all of a sudden wifi devices can't reach each other anymore. It was first reported more than a year ago.
I suspect this could be a kernel issue because all hostapd does with the isolate flag is sending it to the kernel via nl80211.
I found more interesting things.
In an earlier post I reported that secondary 5 GHz interfaces have incorrect VLAN settings after a reboot or /etc/init.d/network restart.
It turns out that some interfaces can also be wrongly configured with hairpin off after a reboot. This seems to happen to secondary 5 GHz interfaces that are on a DFS channel.
As described earlier on, secondary 5 GHz interfaces appear after the DFS CAC has completed (~1 minute after for me on US Channel 100 for example).
Not only the VLANs are incorrect, but also the hairpin mode.
Now, the failure to ping/multicast/broadcast between wifi clients I reported earlier on was on the primary 5 GHz interface, and these do not seem to suffer from the misconfigured bridge problem. But I'll take a look next time I can see the problem happening.
Maybe, if CAC runs after boot because a radar or whatever is detected, the interface could get reconfigured, lose the hairpin setting and explain the issue we've been seeing?
Is there any way to force a CAC on an interface to test that theory?
I also thought about checking hairpin_mode next time the issue occurs though I don't see how that would drop multicast packets but not unicast.
@nbd do you maybe have an idea what would cause this issue that many people are experiencing on different platforms? You seem to be quite familiar with mac80211 and have also contributed to it. It's unfortunately quite a big deal when out of the blue wifi devices can't communicate with each other properly anymore.
You're correct: hairpin off would block all inter-client traffic, not just m/bcast.
Also, I have now found that mcast_to_unicast is off as well on the secondary interfaces.
That is another parameter to check for anyone who runs into the problem of inter-client traffic not going through.
head /sys/devices/virtual/net/br*/lower*/brport/hairpin_mode
head /sys/devices/virtual/net/br*/lower*/brport/multicast_to_unicast
multicast_to_unicast should be set to 1 for all wireless interfaces.
hairpin_mode should be set to 1 for all wireless interfaces that do NOT have port isolation on.
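A slightly more readable variant of the two head commands, as a sketch: it prints one line per bridge port with both flags side by side. It assumes the bridge ports show up as lower_<ifname> entries under the bridge device, which is what the paths above match. The function name `show_brport_flags` and its optional root argument are only there for illustration and testing.

```shell
#!/bin/sh
# List hairpin_mode and multicast_to_unicast for every bridge port.
# $1 is the sysfs root to scan (defaults to /sys); it is a parameter only so
# the function can be pointed at a test directory.
show_brport_flags() {
    root="${1:-/sys}"
    for port in "$root"/devices/virtual/net/br*/lower_*/brport; do
        [ -d "$port" ] || continue
        dev="${port%/brport}"      # .../br-lan/lower_wlan0
        dev="${dev##*/lower_}"     # wlan0
        hp=$(cat "$port/hairpin_mode" 2>/dev/null)
        m2u=$(cat "$port/multicast_to_unicast" 2>/dev/null)
        echo "$dev hairpin_mode=${hp:-?} multicast_to_unicast=${m2u:-?}"
    done
}
```

Calling `show_brport_flags` with no argument on the router lists all ports, wired ones included, so look only at the wireless interface names. Capturing the output right after boot and again once the DFS interfaces have come up would show whether the flags differ.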
Since posting my experience here .....
irqbalance in /etc/config/irqbalance => reboot, no difference
service irqbalance disabled => reboot, now I'm not experiencing what I described in my post: I can ping all devices and reconnect to either 2.4/5G. It has been a night-and-day difference since the change. It's been 3 days since posting, still behaving normally.
Will re-post if issue arises.
I have been running this router for a while, and I also have a problem almost similar to yours, only that mine happens on almost a daily basis.
My router's WAN port is connected to the ISP router's LAN port. The ISP router is my GW.
I have a static IP on the WAN port of my router - 192.168.1.2/24, GW=192.168.1.1; My LAN subnet is 172.16.17.0/24, with the router being 172.16.17.1/24.
The router would be functional for several hours, then suddenly all devices would lose their Internet connection. When this happens, a WiFi device connected to the router can reach all devices except the upstream. Even the router itself cannot reach the GW - 192.168.1.1. I have to reboot the router by unplugging the power. A soft reboot does not cut it.
I initially thought it was a cable issue, but I have swapped the Ethernet cables and I have very high-quality cables from Siemon and others.
So far I am completely stumped as to what the issue could be. And it has nothing to do with overheating or resource usage. I monitor those using luci-app-statistics and they are all okay.
When I saw this post, I looked to see if I have that irqbalance option in mine, but it's not there.
Okay, maybe it was, but I had just installed r23051 before checking.
I am wondering whether anyone else is experiencing the same phenomenon and what the fix would be.
Last I knew - irqbalance does not work properly with our routers. I know @hnyman was trying to sort it out, but not sure if it ever did or not. I've been manually assigning a few of the affinities to different cores to spread them out, but that's about it to this point.
#assign 4 rx interrupts to each cores
echo 8 > /proc/irq/50/smp_affinity
echo 4 > /proc/irq/51/smp_affinity
echo 2 > /proc/irq/52/smp_affinity
echo 1 > /proc/irq/53/smp_affinity
#assign 3 tcl completions to 3 CPUs
echo 4 > /proc/irq/73/smp_affinity
echo 2 > /proc/irq/74/smp_affinity
echo 1 > /proc/irq/75/smp_affinity
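The values written to smp_affinity above are hexadecimal CPU bitmasks: bit n selects CPU n, so 1, 2, 4 and 8 mean CPU0 to CPU3 on this 4-core SoC. A tiny helper sketch (the name `cpu_mask` is made up here) to avoid doing the bit arithmetic by hand:

```shell
#!/bin/sh
# Print the hex smp_affinity mask that selects a single CPU (0-based index).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}
```

With that, `cpu_mask 3 > /proc/irq/50/smp_affinity` writes the same value as the `echo 8` line above. The IRQ numbers themselves are taken from the post and can differ between builds.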
ath11k wifi seemed troublesome for continuously changing IRQs, so in the end I did not propose any IRQ ID heuristics to the upstream irqbalance.
I have been manually assigning a lot of IRQs (but have not touched the LAN/WAN/edma and ath11k copy engine (ce) IRQs, which caused crashes).
Somewhat differing from @SiXX, I have tried to set all assignments away from core0 (affinity 1), as the lan/wan/edma/ce is there.
root@router5:~# cat /etc/ath11k_v3_irq_balance.sh
#!/bin/sh
#assign 4 rx interrupts to each cores
irq_affinity_num=`grep -E -m1 'reo2host-destination-ring4' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'reo2host-destination-ring3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'reo2host-destination-ring2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'reo2host-destination-ring1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
#assign 3 tcl completions to 3 CPUs
irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'wbm2host-tx-completions-ring1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
#assign 3 ppdu-end interrupts to 3 CPUs
irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac3' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 4 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac2' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 2 > /proc/irq/$irq_affinity_num/smp_affinity
irq_affinity_num=`grep -E -m1 'ppdu-end-interrupts-mac1' /proc/interrupts | cut -d ':' -f 1 | tail -n1 | tr -d ' '`
[ -n "$irq_affinity_num" ] && echo 8 > /proc/irq/$irq_affinity_num/smp_affinity
After 1 day uptime:
root@router5:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
9: 0 0 0 0 GIC-0 39 Level arch_mem_timer
13: 3543423 11024860 3275093 3312013 GIC-0 20 Level arch_timer
16: 2 0 0 0 GIC-0 354 Edge smp2p
17: 0 0 0 0 GIC-0 216 Level 4a9000.thermal-sensor
18: 0 0 0 0 GIC-0 239 Level bam_dma
21: 0 0 0 0 GIC-0 270 Level bam_dma
22: 6 0 0 0 GIC-0 340 Level msm_serial0
23: 105791 0 0 0 GIC-0 178 Level bam_dma
24: 0 0 0 0 GIC-0 35 Edge wdt_bark
25: 0 0 0 0 GIC-0 357 Edge q6v5 wdog
29: 5 0 0 0 GIC-0 348 Edge ce0
30: 2447780 0 0 0 GIC-0 347 Edge ce1
31: 754773 0 0 0 GIC-0 346 Edge ce2
32: 234956 0 0 0 GIC-0 343 Edge ce3
34: 0 0 0 0 GIC-0 443 Edge ce5
36: 184732 0 0 0 GIC-0 72 Edge ce7
38: 0 0 0 0 GIC-0 334 Edge ce9
39: 1 0 0 0 GIC-0 333 Edge ce10
40: 0 0 0 0 GIC-0 69 Edge ce11
47: 0 0 0 0 GIC-0 323 Edge reo2ost-exception
48: 74 0 0 0 GIC-0 322 Edge wbm2host-rx-release
49: 51 0 0 0 GIC-0 321 Edge reo2host-status
50: 4 0 0 8371 GIC-0 320 Edge reo2host-destination-ring4
51: 32 0 6972 0 GIC-0 271 Edge reo2host-destination-ring3
52: 76 10839 0 0 GIC-0 268 Edge reo2host-destination-ring2
53: 32 0 0 7445 GIC-0 267 Edge reo2host-destination-ring1
57: 1264 0 648097 0 GIC-0 263 Edge ppdu-end-interrupts-mac3
58: 0 0 0 0 GIC-0 262 Edge ppdu-end-interrupts-mac2
59: 250 0 0 70811 GIC-0 261 Edge ppdu-end-interrupts-mac1
60: 1 0 0 0 GIC-0 260 Edge rxdma2host-monitor-status-ring-mac3
61: 0 0 0 0 GIC-0 256 Edge rxdma2host-monitor-status-ring-mac2
62: 1 0 0 0 GIC-0 255 Edge rxdma2host-monitor-status-ring-mac1
63: 1 0 0 0 GIC-0 235 Edge host2rxdma-host-buf-ring-mac3
64: 0 0 0 0 GIC-0 215 Edge host2rxdma-host-buf-ring-mac2
65: 0 0 0 0 GIC-0 212 Edge host2rxdma-host-buf-ring-mac1
66: 0 0 0 0 GIC-0 211 Edge rxdma2host-destination-ring-mac3
67: 0 0 0 0 GIC-0 210 Edge rxdma2host-destination-ring-mac2
68: 0 0 0 0 GIC-0 209 Edge rxdma2host-destination-ring-mac1
73: 18 0 924 0 GIC-0 191 Edge wbm2host-tx-completions-ring3
74: 20 1989 0 0 GIC-0 190 Edge wbm2host-tx-completions-ring2
75: 139 0 0 29228 GIC-0 189 Edge wbm2host-tx-completions-ring1
77: 14 0 0 0 GIC-0 47 Edge cpr3
78: 1257470 0 0 0 GIC-0 377 Level edma_txcmpl
79: 0 0 0 0 GIC-0 385 Level edma_rxfill
80: 1121735 0 0 0 GIC-0 393 Level edma_rxdesc
81: 0 0 0 0 GIC-0 376 Level edma_misc
82: 0 0 0 0 pmic_arb 51380237 Edge pm-adc5
83: 0 0 0 0 smp2p 0 Edge q6v5 fatal
84: 1 0 0 0 smp2p 1 Edge q6v5 ready
85: 0 0 0 0 smp2p 2 Edge q6v5 handover
86: 0 0 0 0 smp2p 3 Edge q6v5 stop
87: 0 0 0 0 msmgpio 34 Edge keys
88: 0 0 0 0 msmgpio 63 Edge keys
89: 0 0 0 0 GIC-0 172 Level xhci-hcd:usb1
90: 64 0 0 0 GIC-0 353 Edge glink-native
IPI0: 6063 7402 6486 9099 Rescheduling interrupts
IPI1: 428834 721706 554283 756649 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 Timer broadcast interrupts
IPI5: 1463 1416 1577 1513 IRQ work interrupts
IPI6: 0 0 0 0 CPU wake-up interrupts
Err: 0
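Since the IRQ numbers can change, the script above looks each one up by name with grep. As a sketch of the same idea with an exact match on the last column (so a name cannot accidentally match a longer one that starts the same way), assuming the name is a single word with no spaces; `irq_by_name` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the IRQ number whose last column equals the given name exactly.
# Reads /proc/interrupts-style text from stdin; prints nothing if not found.
irq_by_name() {
    awk -v name="$1" '$NF == name {
        sub(/:$/, "", $1)   # "50:" -> "50"
        print $1
        exit
    }'
}
```

Usage would be `irq=$(irq_by_name reo2host-destination-ring4 < /proc/interrupts)`, followed by the same `[ -n "$irq" ] && echo 8 > /proc/irq/$irq/smp_affinity` guard the script already uses. Names containing spaces, such as "q6v5 wdog", are not handled by this sketch.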
I flashed the router, which was running OpenWrt, with a new snapshot version and it no longer boots.
I got a serial connection working. Now, how do I unbrick the router?
This is the console boot output:
S - Core 0 Frequency, 1651 MHz
U-Boot 0.0.1-1-80112-CS (May 21 2021 - 09:29:10 +0800)
DRAM: smem ram ptable found: ver: 1 len: 4
1 GiB
Led init ...
NAND: Could not find nand_gpio in dts, using defaults
ONFI device found
ID = 1590aa2c
Vendor = 2c
Device = aa
qpic_nand: changing oobsize to 80 from 128 bytes
SF: Unsupported flash IDs: manuf ff, jedec ffff, ext_jedec ffff
ipq_spi: SPI Flash not found (bus/cs/speed/mode) = (0/0/48000000/0)
256 MiB
MMC: sdhci: Node Not found, skipping initialization
PCI0 is not defined in the device tree
PCI1 is not defined in the device tree
In: serial@78B3000
Out: serial@78B3000
Err: serial@78B3000
machid: 8850105
MMC Device 0 not found
eth5 MAC Address from ART is not valid
Hit any key to stop autoboot: 0
ubi0: attaching mtd1
ubi0: scanning is finished
ubi0: attached mtd1 (name "mtd=1", size 97 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 776, bad PEBs: 0, corrupted PEBs: 0
ubi0: user volume: 0, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 6/3, WL threshold: 4096, image sequence number: 1011277052
ubi0: available PEBs: 732, total reserved PEBs: 44, PEBs reserved for bad PEB handling: 40
Read 0 bytes from volume kernel to 44000000
Volume kernel not found!
ubi0: detaching mtd1
ubi0: mtd1 is detached
Wrong Image Format for bootm command
ERROR: can't get kernel image!
Net: MAC0 addr:a4:97:33:df:b3:b6
PHY ID1: 0x4d
PHY ID2: 0xd0b1
PHY ID1: 0x4d
PHY ID2: 0xd101
EDMA ver 1 hw init
Num rings - TxDesc:1 (0-0) TxCmpl:1 (7-7)
RxDesc:1 (15-15) RxFill:1 (7-7)
ipq807x_edma_alloc_rings: successfull
ipq807x_edma_setup_ring_resources: successfull
ipq807x_edma_configure_rings: successfull
ipq807x_edma_hw_init: successfull
eth0
Post printenv from uboot.
Not sure if that's what you mean:
IPQ807x# printenv
baudrate=115200
bootargs=console=ttyMSM0,115200n8 ubi.mtd=rootfs rootfstype=squashfs rootwait
bootcmd=setenv bootargs console=ttyMSM0,115200n8 ubi.mtd=rootfs rootfstype=squashfs rootwait; ubi part fs; ubi read 0x44000000 kernel; bootm 0x44000000#config@rt5010w-d350-rev0
bootdelay=2
eth1addr=a4:97:33:df:b3:b7
eth2addr=a4:97:33:df:b3:b7
eth3addr=a4:97:33:df:b3:b7
eth4addr=a4:97:33:df:b3:b7
ethact=eth0
ethaddr=a4:97:33:df:b3:b6
fdt_high=0x4A400000
fdtcontroladdr=4a971480
flash_type=2
machid=8850105
mtddevname=fs_1
mtddevnum=0
mtdids=nand0=nand0
mtdparts=mtdparts=nand0:0x6100000@0x7a00000(fs),0x6100000@0x1000000(fs_1)
partition=nand0,0
soc_version_major=2
soc_version_minor=0
stderr=serial@78B3000
stdin=serial@78B3000
stdout=serial@78B3000
Environment size: 794/262140 bytes
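To read the mtdparts line above more easily, here is a sketch that splits it into name, offset and size in MiB. It assumes every entry has the size@offset(name) form seen in this printenv; `list_mtdparts` is a made-up name. Both partitions come out at 97 MiB, which matches the "size 97 MiB" that the boot log reports when attaching mtd1.

```shell
#!/bin/sh
# Split a U-Boot mtdparts value into "name offset sizeMiB" lines.
# Expects comma-separated entries of the form size@offset(name).
list_mtdparts() {
    parts="${1#*:}"    # drop the "mtdparts=nand0:" prefix
    echo "$parts" | tr ',' '\n' | while IFS= read -r entry; do
        size="${entry%%@*}"          # e.g. 0x6100000
        rest="${entry#*@}"           # e.g. 0x7a00000(fs)
        offset="${rest%%"("*}"       # e.g. 0x7a00000
        name="${rest#*"("}"          # e.g. fs)
        name="${name%")"}"           # e.g. fs
        echo "$name $offset $(($size / 1048576))MiB"
    done
}
```

For the value above it prints `fs 0x7a00000 97MiB` and `fs_1 0x1000000 97MiB`. Per the bootcmd, U-Boot runs `ubi part fs` and then tries to read a volume named kernel, which is the step that fails in the boot log.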
Same here, however I've set WAN as a DHCP client. I don't have any of the issues you describe though.
Yeah I found that out the hard way.