Archer C7 2.4 GHz wireless dies in 24~48 hours

It's a little "library" that is easy to integrate into your own bash scripts, for sending and editing Telegram notifications to your phone via a Telegram bot.

I use it for different things, lately to notify myself of unexpected router reboots.

Can you tell me how to make it work? For example, to check the connection, or to have the bot send me a notification when some clients disconnect. Thanks.

It's just a matter of filling in the API key of your Telegram bot and your Telegram chat ID in lib_telegram_cfg_bot.sh: https://github.com/Catfriend1/openwrt-presence/blob/master/scripts/telegram/lib_telegram_cfg_bot.sh

You are then ready to run the test/demo script: https://github.com/Catfriend1/openwrt-presence/blob/master/scripts/telegram/test_lib_telegram_messenger.sh
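For readers who want to see what such a notification boils down to, here is a minimal sketch that calls the Telegram Bot API `sendMessage` method directly with curl. It is not the API of the linked library: `BOT_TOKEN`, `CHAT_ID` and the function name `telegram_send_text` are placeholders made up for this example, and it assumes curl (with SSL support) is installed on the router.

```shell
#!/bin/sh
# Minimal sketch, NOT the linked library's API: send a text message via
# the Telegram Bot API sendMessage method. BOT_TOKEN and CHAT_ID are
# placeholders; fill in your own bot token and chat id.
BOT_TOKEN="${BOT_TOKEN:-123456:replace-me}"
CHAT_ID="${CHAT_ID:-0}"

# telegram_send_text TEXT
# POSTs TEXT (URL-encoded) to the chat; returns curl's exit status,
# which is non-zero on HTTP errors because of -f.
telegram_send_text() {
    curl -fsS "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
        --data-urlencode "chat_id=${CHAT_ID}" \
        --data-urlencode "text=$1" >/dev/null
}
```

Usage would be something like `telegram_send_text "AP1: unexpected reboot"`; the library linked above adds editing of already-sent messages on top of this.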

Hi, I'm trying to get ath9k-watchdog.sh working on my openwrt-sfe-flowoffload-ath79 image from https://github.com/gwlim/openwrt-sfe-flowoffload-ath79/tree/master/JUL-2020/openwrt-sfe-flowoffload-normal/mips74k/TP-Link%20Archer%20C7v2-mips74k-ath10k
If I understood correctly, I'd have to install bash for that. Unfortunately I cannot install bash the normal way (updating the package lists fails with "/usr/sbin/opkg-key: line 22: usign: not found"), presumably because of this custom image, and I can't find a compatible ipk for a manual install. Can you help me out?

I've made a Telegram notifier that pushes me a message whenever any of my Archers outputs a syslog line indicating that an unexpected reboot occurred (procd -- init complete --). Over the following days I discovered something interesting:

  • all Archers ran fine and stable for 9 days without sudden reboots
  • one unit (configured very similarly to the others) often reboots suddenly; after 8 days it rebooted and my script reported it immediately (from the syslog server to Telegram).
  • the "problematic" unit rebooted as soon as I turned on my old TV (in the same room, a short 2.4 GHz Wi-Fi distance away) and used it to play video from a miniDLNA server.
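The notifier itself isn't shown in the thread. A minimal sketch of the filtering step, assuming the router logs arrive as lines on stdin (for example `tail -F` on the syslog server's file) and that the boot marker contains `procd` and `init complete`; `NOTIFY_CMD` is a hypothetical hook you would point at your Telegram sender:

```shell
#!/bin/sh
# Sketch: watch syslog lines on stdin and call a notifier for each line
# that looks like the procd "init complete" boot marker, i.e. a router
# that has just (re)booted. The match pattern is an assumption; adjust it
# to the exact wording in your logs.

# Default notifier: just print the message. NOTIFY_CMD is hypothetical.
print_notification() {
    printf '%s\n' "$*"
}

notify_reboots() {
    # 'case' is used instead of grep so each line is handled as soon as
    # it arrives (a grep in the middle of a pipe may buffer its output).
    while IFS= read -r line; do
        case $line in
            *procd*"init complete"*)
                "${NOTIFY_CMD:-print_notification}" "Possible unexpected reboot: $line"
                ;;
        esac
    done
}
```

On the syslog server this could be fed with something like `tail -F /path/to/archers.log | notify_reboots` (the log path depends on your syslog setup).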

So I wonder whether those "sudden reboots" have something to do with UPnP multicast/broadcast announcements?!

By the way, /sys/kernel/debug/crashlog was empty. (Source: Crashlog retrieval (MIPS) - #3 by hailfinger)

I've now hit a different log line for a PC associated with the AP:

[934478.437766] Rekeying PTK for STA ac:7b:a1:xx:yy:zz but driver can't safely do that.

What does that mean? Are any negative consequences to be expected after it occurs?

Hi there,

I put the workaround in my rc.local:
iw dev wlan_2g scan trigger freq 2437 flush >/dev/null 2>&1

It kind of works: my Wi-Fi connection doesn't drop anymore, and YouTube, downloads and browsing work fine. But when I'm using a VPN or some other specific apps, they crash or disconnect every time, probably because of the iw scan.

Does anyone have experience with that?

No, I don't experience this problem. How is your WAN connected? Ethernet or STA/AP?

I'm connected via Ethernet on fiber.

I think I have identified the cause of the ath9k slowness. I have a GL AR300M running OpenWrt 21.02, which only has a single 2.4 GHz radio. When I run iperf, the slowness shows up within 3 hours in AP mode. It happens more often in repeater mode, where the repeater loses beacons and the Wi-Fi stack reconnects.

I think the problem is the txq buffer size.

Check which physical radio is your 2.4 GHz one; it's either phy0 or phy1.

To determine your 2.4 GHz phy:
iwinfo | grep -m 1 -A 10 '2\.4' | grep -Eo 'phy[0-9]+$'
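If `iwinfo` isn't available or its layout differs, another way is to ask `iw` which phy lists 2.4 GHz frequencies. A sketch, assuming `iw phy <phy> info` prints frequency lines such as `* 2412 MHz [1] (20.0 dBm)` (newer iw versions print `2412.0 MHz`) and that each radio is single-band, as on the C7; the function names are made up for this example:

```shell
#!/bin/sh
# has_24ghz: read `iw phy <phy> info` output on stdin and succeed if any
# listed frequency lies in 2400-2499 MHz.
has_24ghz() {
    grep -Eq '\* 24[0-9][0-9](\.[0-9]+)? MHz'
}

# find_24ghz_phy: print the first phy under /sys/class/ieee80211 that
# supports 2.4 GHz; return 1 if none is found.
find_24ghz_phy() {
    for path in /sys/class/ieee80211/*; do
        [ -e "$path" ] || continue   # glob did not match anything
        phy=${path##*/}
        if iw phy "$phy" info | has_24ghz; then
            echo "$phy"
            return 0
        fi
    done
    return 1
}
```

On a dual-band phy this would still report that phy, since it only checks whether any 2.4 GHz frequency is listed.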

Check the memory limit and double it. Mine was originally set to 4194304 bytes (4 MiB):

iw phy phy0 get txq

iw phy phy0 set txq memory_limit 8388608
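The two commands above can be combined into a small helper that reads the current value and doubles it. A sketch; `txq_value` and `double_txq_memory_limit` are hypothetical names, and the parsing assumes the `Name: value unit` layout shown in the `get txq` outputs in this thread:

```shell
#!/bin/sh
# txq_value NAME: read `iw phy <phy> get txq` output on stdin and print
# the number after "NAME:", e.g. txq_value "Memory limit" -> 4194304.
txq_value() {
    awk -F: -v key="$1" '$1 == key { split($2, a, " "); print a[1]; exit }'
}

# double_txq_memory_limit PHY: double the current memory_limit of PHY.
double_txq_memory_limit() {
    phy=$1
    old=$(iw phy "$phy" get txq | txq_value "Memory limit")
    [ -n "$old" ] || return 1          # could not parse the current value
    new=$((old * 2))
    echo "$phy: memory_limit $old -> $new"
    iw phy "$phy" set txq memory_limit "$new"
}
```

Running `double_txq_memory_limit phy1` twice doubles it twice, so it is meant to be run by hand, not from rc.local.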

Don't forget to disable the scan workaround if you have it running.

I can confirm the setting below has resolved my issues on a GL AR300 with 128 MB of memory.

root@repeater:~# iw phy phy0 get txq
Packet limit: 8192 pkts
Memory limit: 8388608 bytes
Quantum: 300 bytes
Number of queues: 4096
Backlog: 0 pkts
Memory usage: 0 bytes
Packet limit overflows: 0
Memory limit overflows: 0
Hash collisions: 3116

The above setting was tested in AP/STA mode. When testing as a dumb AP, it dies a lot quicker, so we know txq is part of the equation. I've also doubled the packet limit to 16384 and am currently testing:

iw phy phy0 set txq limit 16384

OpenWrt 21.02.0-rc.3 on TP-Link Archer C7v2 here:

iw phy phy0 get txq

Packet limit:           8192 pkts
Memory limit:           16777216 bytes
Quantum:                300 bytes
Number of queues:       4096
Backlog:                0 pkts
Memory usage:           0 bytes
Packet limit overflows: 0
Memory limit overflows: 0
Hash collisions:        0

iw phy phy1 get txq

Packet limit:           8192 pkts
Memory limit:           4194304 bytes
Quantum:                300 bytes
Number of queues:       4096
Backlog:                0 pkts
Memory usage:           0 bytes
Packet limit overflows: 0
Memory limit overflows: 0
Hash collisions:        4546

@sammo Which value do you suggest optimizing here in order to stabilize the Archer C7v2 ath9k 2.4 GHz Wi-Fi without using the ath9k-watchdog.sh script? Ahhh, silly me, phy1 is the 2.4 GHz one, and yes, I'll give the 8 M memory limit a try now :-). Thank you.

ToDo for me:

iw phy phy1 set txq memory_limit 8388608

and I'll test whether it is enough to set it once after startup, or whether it must be refreshed regularly, e.g. when the adapter restarts.

To see how long the change sticks:

while true; do clear; iw phy phy1 get txq; sleep 1; done
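Instead of watching by hand, the check can be automated: the sketch below re-applies the limit only when it differs from the desired value and logs each correction, which also shows whether (and when) the setting gets lost. `PHY=phy1` and `WANT=8388608` match the C7v2 values above; running it from cron every few minutes is an assumption, not something tested in this thread.

```shell
#!/bin/sh
# Sketch: re-apply the txq memory limit if it has changed (for example
# after the radio restarted) and log each correction via logger.
PHY="${PHY:-phy1}"          # 2.4 GHz radio on the Archer C7v2
WANT="${WANT:-8388608}"     # desired memory_limit in bytes (8 MiB)

current_memory_limit() {
    iw phy "$PHY" get txq |
        awk -F: '$1 == "Memory limit" { split($2, a, " "); print a[1]; exit }'
}

ensure_memory_limit() {
    cur=$(current_memory_limit)
    if [ "$cur" != "$WANT" ]; then
        logger -t txq-check "$PHY memory_limit was ${cur:-unknown}, setting $WANT"
        iw phy "$PHY" set txq memory_limit "$WANT"
    fi
}
```

Save it as a script ending with a call to `ensure_memory_limit`, schedule that from cron, and `logread | grep txq-check` will then show every time the value had to be restored.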

UPDATE: To make the setting stick, put it in /etc/rc.local; near the top is fine, for example:

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

/usr/sbin/iw phy phy1 set txq memory_limit 8388608

exit 0

I can confirm these two txq settings make a difference to the wireless dying.
I don't know the optimum values or understand how the algorithm works, but these two settings make a big difference. At least this pinpoints where the problem is, so a developer can investigate further:

iw phy <PHY> set txq memory_limit 8388608
iw phy <PHY> set txq limit 16384

My question would be: is this actually a "fix" that provides some sort of needed memory-usage "ceiling"? Or is it only pacifying some sort of memory leak, so that it will only prolong the inevitable?

Good question, I can't be sure. I ran iperf for over 3 hours, which would previously have caused it to die, but it did not. I checked dmesg/logread and can't see any OOM or kernel panics. I guess on a 32 MB device it would OOM.

If more people can test the settings, we'll have more information.
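One way to gather that information is to log free memory next to the txq memory usage over a long run: with a bounded queue you would expect `MemAvailable` to stay roughly flat, while a steady decline would point towards a leak. A sketch; the function names, the 10-minute default interval and the `memtrend` log tag are arbitrary choices:

```shell
#!/bin/sh
# mem_available_kb [FILE]: print MemAvailable (kB) from /proc/meminfo,
# or from FILE if given (handy for testing).
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# log_memory_trend PHY: every INTERVAL seconds (default 600), write
# MemAvailable and the txq "Memory usage" of PHY to the system log.
log_memory_trend() {
    phy=$1
    while :; do
        usage=$(iw phy "$phy" get txq |
            awk -F: '$1 == "Memory usage" { split($2, a, " "); print a[1] }')
        logger -t memtrend "MemAvailable=$(mem_available_kb)kB txq_usage=${usage}B"
        sleep "${INTERVAL:-600}"
    done
}
```

Start it with `log_memory_trend phy1 &` during a test run and review the samples later with `logread | grep memtrend`.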

Very interesting, I'll probably give this a try shortly... C7v3 here.

Currently I'm taking the (embarrassing to me) "kick the plug out of the 2.4 GHz radio every night" cron task approach.

I'm wondering about my large number of overflows and what that may mean. Even on the 5 GHz radio I seem to have a lot. Below is the current state, at 13 days of uptime.

5ghz radio
root@AP:~# iw phy phy0 get txq
Packet limit:           8192 pkts
Memory limit:           16777216 bytes
Quantum:                300 bytes
Number of queues:       4096
Backlog:                0 pkts
Memory usage:           0 bytes
Packet limit overflows: 652176
Memory limit overflows: 652176
Hash collisions:        11857

2.4ghz radio
root@AP:~# iw phy phy1 get txq
Packet limit:           8192 pkts
Memory limit:           4194304 bytes
Quantum:                300 bytes
Number of queues:       4096
Backlog:                0 pkts
Memory usage:           0 bytes
Packet limit overflows: 2103
Memory limit overflows: 2103
Hash collisions:        3840

Kinda makes me wonder why I'm not noticing 5 GHz problems. Also, I've only been watching this since reading the article 30 minutes ago, but the only thing changing has been the 2.4 GHz hash collision count, up by 4. No change in the overflow counts. Does anyone have ideas where to read up on what these counters are telling us?

Will be following this and giving it a try...

EDIT: The more I look at these numbers, the more I think I don't understand what they are. The packet and memory limit overflow counts don't change often, but then seem to jump during some kind of rare event. I have run multiple bufferbloat tests on both radios without seeing either overflow counter increase. I have only seen occasional, small increases in hash collisions, with and without bufferbloat testing.

Also, I'm having trouble digging up info on what they indicate. Is anyone having better luck?
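Since the overflow counters seem to jump during rare events rather than climb steadily, timestamping each change may help correlate them with what was happening on the network at the time. A sketch that snapshots `iw phy <phy> get txq` periodically and prints only the fields that changed; the `/tmp` file names, the interval and the function names are assumptions:

```shell
#!/bin/sh
# txq_delta OLD NEW: compare two saved `iw phy <phy> get txq` outputs and
# print each numeric field whose value changed, with the difference.
txq_delta() {
    awk -F: '
        NR == FNR { split($2, a, " "); old[$1] = a[1]; next }
        ($1 in old) {
            split($2, b, " ")
            d = b[1] - old[$1]
            if (d != 0) printf "%s changed by %d\n", $1, d
        }
    ' "$1" "$2"
}

# watch_txq_deltas PHY: every INTERVAL seconds (default 60) print a
# timestamped line for every txq field of PHY that changed.
watch_txq_deltas() {
    phy=$1
    prev=/tmp/txq.$phy.prev
    cur=/tmp/txq.$phy.cur
    iw phy "$phy" get txq > "$prev"
    while sleep "${INTERVAL:-60}"; do
        iw phy "$phy" get txq > "$cur"
        txq_delta "$prev" "$cur" | while IFS= read -r change; do
            echo "$(date '+%F %T') $phy: $change"
        done
        mv "$cur" "$prev"
    done
}
```

Fields such as "Backlog" and "Memory usage" fluctuate under load, so expect them in the output too; the interesting lines are the overflow counters.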

Did a bit more testing and investigation.

https://www.mail-archive.com/search?l=cake@lists.bufferbloat.net&q=subject:"Re\%3A+\[Cake\]+\[Battlemesh\]+Wifi+Memory+limits+in+small+platforms"&o=newest&f=1

The memory limit needs to go down. My settings are:

root@repeater:~# iw phy phy0 get txq
Packet limit:           8192 pkts
Memory limit:           2097152 bytes
Quantum:                300 bytes
Number of queues:       4096
Backlog:                0 pkts
Memory usage:           0 bytes
Packet limit overflows: 0
Memory limit overflows: 0
Hash collisions:        22

I did not read that whole conversation, but where did you get the idea of lowering the limit, other than Toke giving that as an example? Are you seeing better/different results from halving it vs. doubling it?

I'm only 10 hours in after doubling it (8 MB memory limit on 2.4 GHz)... I'll have to wait 3 days or more for the problem to occur; that was the average time to failure in my home environment.

I'm still wondering why my C7 has many overflows and hash collisions compared with pgn-1111's... see my edit to my earlier post.

I'm curious about the reported memory use from a fellow C7 user... I haven't been able to reboot my router; people, uh, get cranky.

I see that the kernel log during boot gives us some information. One item is available memory, which I think Ben Greear mentioned as useful for seeing how close to the limit you are getting when increasing the number of stations (a different topic). What do you get for the lines below in your kernel log after the memory increase?

[   15.766785] ath10k_pci 0000:00:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[   16.710836] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
[   16.727604] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[   16.733935] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
[   16.742062] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 20984 iram: 25656'
[   16.791956] ath10k_pci 0000:00:00.0: htt-ver 2.1 wmi-op 2 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 1

I'm interested in the second-to-last line, which indicates memory... and, for that other topic, the second line, with the number of vdevs and peers. It might be instructive to see where those go with various amounts of allocated memory. Thanks!
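For anyone wanting to report these numbers without scrolling through the whole boot log, a sketch that extracts them; the sed patterns are written against the exact line format quoted above and may not match other ath10k firmware versions. Kernel messages live in a ring buffer, so on a long-running router the boot lines may already have been overwritten.

```shell
#!/bin/sh
# ath10k_alloc_rem: read kernel log lines on stdin and print
# "<alloc rem> <iram>" from the last ath10k "wmi print 'alloc rem: ...'" line.
ath10k_alloc_rem() {
    sed -n "s/.*wmi print 'alloc rem: \([0-9]*\) iram: \([0-9]*\)'.*/\1 \2/p" |
        tail -n 1
}

# ath10k_wmi_init: print the "vdevs: N  peers: N  tid: N" part of the
# last "wmi init" line, for the vdev/peer question.
ath10k_wmi_init() {
    sed -n 's/.*wmi init: \(vdevs: [0-9]*  *peers: [0-9]*  *tid: [0-9]*\).*/\1/p' |
        tail -n 1
}
```

Usage: `dmesg | ath10k_alloc_rem` and `dmesg | ath10k_wmi_init`.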

The GL AR300 has 128 MB of memory. When I doubled the limit for AP/STA it never died, but when I switched it to AP mode and ran iperf, it died fairly quickly. The linked thread mentions that about 3 MB is the queue limit on OpenWrt. So far my testing with 2 MB has been stable in both AP/STA and AP mode.

Ehh... it says that 3 MB was the former maximum size, then it was changed and now 4 MB is the default, as we see with the iw phy command.

I don't see anything saying that 2 MB is more than an example of how to change it. The overall conversation seems to be about devices with little RAM, like 32 MB routers.

I don't know how much this affects our 128 MB devices. I wish I knew more. Does it depend on how many stations are configured? How much difference is there in that respect between your router and a C7? Again, does anyone know of some reference sources? I don't have the means at the moment to set up a decent test setup myself, or I would dig deeper into this.

>>
>> So before the queueing patches to mac80211, the maximum packet queue
>> size for ath9k was 3MB in total, or 2.2MB if only a single AC was used
>> on the WiFi link (that's 128 packets in the driver + 1000 in the
>> pfifo_fast qdisc * 2074 bytes for the truesize of a full-size packet).
>> Whereas now the default is 4MB for a non-vht device. So it's not
>> actually that big of a difference, and as you've already discovered the
>> defaults can be changed.
>>
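The arithmetic in the quote can be checked directly. `single_ac` uses only the numbers given. For `four_ac`, the quote doesn't spell out how the 3 MB total is composed; this assumes each of the four ACs has its own 128-packet driver queue while the 1000-packet qdisc is shared, which is one reading that reproduces the figure, not a confirmed description of the old code.

```shell
#!/bin/sh
# Worked arithmetic for the figures quoted above (all values in bytes).
TRUESIZE=2074      # truesize of a full-size packet, from the quote
DRIVER_PKTS=128    # packets queued in the driver, from the quote
QDISC_PKTS=1000    # pfifo_fast qdisc length, from the quote

single_ac=$(( (DRIVER_PKTS + QDISC_PKTS) * TRUESIZE ))
# Assumption: four per-AC driver queues sharing one 1000-packet qdisc.
four_ac=$(( (4 * DRIVER_PKTS + QDISC_PKTS) * TRUESIZE ))

echo "single AC:             $single_ac bytes"             # 2339472 (~2.2 MiB)
echo "four ACs:              $four_ac bytes"               # 3135888 (~3.0 MiB)
echo "default non-VHT limit: $((4 * 1024 * 1024)) bytes"   # 4194304
echo "2 MiB setting above:   $((2 * 1024 * 1024)) bytes"   # 2097152
```

So the 2097152-byte setting tested earlier in the thread sits slightly below even the old single-AC figure, while the 4 MiB default is only about a third larger than the old total.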

Also, the mailing list site seems not to be responding, so I can't search back through the whole conversation...