Hi,
We're about to set up an 802.11s mesh network routed by B.A.T.M.A.N. adv. Although I closely followed the guidance of this blog post as well as the OpenWrt forum, our network is currently barely usable. What we see is that the speed drops from high rates of 200-300 Mbit/s down to the (presumably) minimum of 6-7 Mbit/s. It stays low for a few seconds, slowly climbs back up, and then drops to 6-7 Mbit/s again. This seems to affect the whole network.
Setup
[Bridge 1]-----5 GHz-----[Gateway]-----5 GHz-----[Bridge 2]-----5 GHz-----[Bridge 3]
              802.11s        |        802.11s                  802.11s
                         Internet
The network is situated in an apartment building and provides multiple flats with access to the internet. Client access is provided at 2.4 GHz; the 5 GHz band is used exclusively for meshing. The nodes are approx. 10-20 m apart from each other and are always separated by at least one wall.
With a wifi analyzer I confirmed that our channel 149 is not used by anyone else. Also, the mesh nodes only see their direct neighbors (batctl n). The nodes at the edge (Bridge 1 and Bridge 3) have only 1 neighbor each. The nodes in the middle (Gateway and Bridge 2) have 2 neighbors each.
Experiments
As an experiment, I configured Bridge 1 and Gateway to communicate exclusively on their own channel, and the performance (measured with batctl tp) was just fine. I never saw the drops down to 6 Mbit/s. Instead, the connection speed stayed constant and reliable at a high rate.
The moment I brought Bridge 2 and Bridge 3 back in, the situation turned bad again: the speed drops described above returned.
Also, I tried to force Gateway to not use the low rates with this config (snippet):
wireless.radio0.basic_rate='12000 18000 24000 36000 48000 54000'
wireless.radio0.supported_rates='12000 18000 24000 36000 48000 54000'
...
wireless.wmesh=wifi-iface
wireless.wmesh.device='radio0'
wireless.wmesh.ifname='if-mesh'
wireless.wmesh.network='mesh'
wireless.wmesh.mode='mesh'
wireless.wmesh.basic_rate='12000 18000 24000 36000 48000 54000'
wireless.wmesh.supported_rates='12000 18000 24000 36000 48000 54000'
...
However, after applying the config I still saw the connection speed of the associated mesh neighbor (iwinfo if-mesh assoclist) drop to 6 Mbit/s. So I believe the config change had no effect.
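For reference, the values in basic_rate and supported_rates are given in kbit/s, so 12000 corresponds to 12 Mbit/s. One way to check per peer whether a link still falls below that floor is to read the kernel's station table with iw instead of iwinfo. The following is a minimal sketch, assuming that `iw dev if-mesh station dump` prints a `Station <MAC>` header followed by a `tx bitrate:` line for each peer. The helper name `low_rate_peers`, the 12 Mbit/s threshold and the sample input (MAC addresses and rates) are made up for illustration.

```shell
#!/bin/sh
# Print every peer whose reported tx bitrate is below a threshold (Mbit/s).
# Reads text in the format of `iw dev <ifname> station dump` on stdin.
low_rate_peers() {
    awk -v min="$1" '
        $1 == "Station"                { mac = $2 }
        $1 == "tx" && $2 == "bitrate:" { if ($3 + 0 < min) print mac, $3 }
    '
}

# On a node this would be: iw dev if-mesh station dump | low_rate_peers 12
# Illustrative input only (addresses and values are invented):
low_rate_peers 12 <<'EOF'
Station 00:11:22:33:44:55 (on if-mesh)
    tx bitrate: 6.0 MBit/s
Station 00:11:22:33:44:66 (on if-mesh)
    tx bitrate: 526.5 MBit/s VHT-MCS 4 80MHz VHT-NSS 3
EOF
```

Running this in a loop (for example once per second) while a batctl tp test is active would show which peer, if any, falls back to the low rate at the moment the throughput drops.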
Assumption
My assumption is that frame collisions (I mean on the radio layer) lead to the bad performance. However, I wonder how other mesh networks do it differently. To my understanding, all mesh nodes are required to communicate on the same channel. How do you deal with the frame collisions of multiple nodes? And more importantly: what do you do to provide acceptable performance (e.g. 50 Mbit/s) in such a network?
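As a rough sanity check on the 50 Mbit/s target, here is a back-of-the-envelope estimate. It is a sketch under a strong simplifying assumption: all hops share one channel, only one hop can transmit at a time, and so a path of n hops gets roughly 1/n of the single-link rate. Collisions, retries and rate-control fallbacks are ignored. The function name `est_path_mbit` is made up for illustration, and the 200-300 Mbit/s inputs are the single-link rates reported above.

```shell
#!/bin/sh
# Idealised estimate for a single-channel chain: an n-hop path gets about
# 1/n of the single-link throughput (integer Mbit/s, rounded down).
# $1 = single-link throughput in Mbit/s, $2 = number of wireless hops
est_path_mbit() {
    echo $(( $1 / $2 ))
}

# In the diagram, Gateway -> Bridge 3 is 2 hops and Bridge 1 -> Bridge 3 is 3 hops.
for n in 1 2 3; do
    echo "$n hop(s): $(est_path_mbit 200 "$n") - $(est_path_mbit 300 "$n") Mbit/s"
done
```

Under this idealised model even the 3-hop path would stay above 50 Mbit/s, which suggests that the drops to 6-7 Mbit/s are not explained by airtime sharing alone.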
Details
Hardware/firmware spec (all devices):
- TP-Link Archer C7 v5
- OpenWrt 19.07.7, r11306-c4a6851c72
Installed software (identical):
root@Gateway:~# opkg list-installed | egrep "kmod-ath10k|ath10k-firmware-qca988x|wpad-mesh-openssl|kmod-mac80211|kmod-cfg80211"
ath10k-firmware-qca988x - 2019-10-03-d622d160-1
kmod-ath10k - 4.14.221+4.19.161-1-1
kmod-cfg80211 - 4.14.221+4.19.161-1-1
kmod-mac80211 - 4.14.221+4.19.161-1-1
wpad-mesh-openssl - 2019-08-08-ca8c2bd2-7
Loaded firmware: 10.2.4-1.0-00047
logread output on startup:
...
Sun May 16 20:14:57 2021 kern.info kernel: [ 13.309842] ath10k_pci 0000:00:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
Sun May 16 20:14:57 2021 kern.warn kernel: [ 13.623838] ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:00:00.0.bin failed with error -2
Sun May 16 20:14:57 2021 kern.warn kernel: [ 13.634923] ath10k_pci 0000:00:00.0: Falling back to user helper
Sun May 16 20:14:57 2021 kern.err kernel: [ 13.911257] firmware ath10k!pre-cal-pci-0000:00:00.0.bin: firmware_loading_store: map pages failed
Sun May 16 20:14:57 2021 kern.warn kernel: [ 13.924488] ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/firmware-6.bin failed with error -2
Sun May 16 20:14:57 2021 kern.warn kernel: [ 13.935596] ath10k_pci 0000:00:00.0: Falling back to user helper
Sun May 16 20:14:57 2021 kern.err kernel: [ 14.152300] firmware ath10k!QCA988X!hw2.0!firmware-6.bin: firmware_loading_store: map pages failed
Sun May 16 20:14:57 2021 kern.info kernel: [ 14.591884] ath10k_pci 0000:00:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
Sun May 16 20:14:57 2021 kern.info kernel: [ 14.601450] ath10k_pci 0000:00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
Sun May 16 20:14:57 2021 kern.info kernel: [ 14.614497] ath10k_pci 0000:00:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
Sun May 16 20:14:57 2021 kern.warn kernel: [ 14.660214] ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed with error -2
Sun May 16 20:14:57 2021 kern.warn kernel: [ 14.671024] ath10k_pci 0000:00:00.0: Falling back to user helper
Sun May 16 20:14:57 2021 kern.err kernel: [ 14.783898] firmware ath10k!QCA988X!hw2.0!board-2.bin: firmware_loading_store: map pages failed
Sun May 16 20:14:57 2021 kern.info kernel: [ 14.794552] ath10k_pci 0000:00:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
Sun May 16 20:14:57 2021 kern.info kernel: [ 15.919414] ath10k_pci 0000:00:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 1
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054509] ath: EEPROM regdomain: 0x0
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054516] ath: EEPROM indicates default country code should be used
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054519] ath: doing EEPROM country->regdmn map search
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054532] ath: country maps to regdmn code: 0x3a
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054537] ath: Country alpha2 being used: US
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.054540] ath: Regpair used: 0x3a
Sun May 16 20:14:57 2021 kern.info kernel: [ 16.308123] batman_adv: B.A.T.M.A.N. advanced openwrt-2019.2-11 (compatibility version 15) loaded
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360880] ath: EEPROM regdomain: 0x0
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360888] ath: EEPROM indicates default country code should be used
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360891] ath: doing EEPROM country->regdmn map search
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360904] ath: country maps to regdmn code: 0x3a
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360908] ath: Country alpha2 being used: US
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.360911] ath: Regpair used: 0x3a
Sun May 16 20:14:57 2021 kern.debug kernel: [ 16.373336] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
Sun May 16 20:14:57 2021 kern.info kernel: [ 16.374884] ieee80211 phy1: Atheros AR9561 Rev:0 mem=0xb8100000, irq=2
Sun May 16 20:14:57 2021 user.info kernel: [ 16.423460] kmodloader: done loading kernel modules from /etc/modules.d/*
...
The 802.11s configuration is (almost) identical for all devices:
# /etc/config/wireless
...
config wifi-device 'radio0'
    option type 'mac80211'
    option hwmode '11a'
    option path 'pci0000:00/0000:00:00.0'
    option htmode 'VHT80'
    option disabled '0'
    option channel '149' # moved to here to guarantee no interference
    option txpower '20'

config wifi-iface 'wmesh'
    option device 'radio0'
    option ifname 'if-mesh'
    option network 'mesh'
    option mode 'mesh'
    option mesh_id 'our-house-mesh-backbone'
    option encryption 'sae'
    option key '<some key>'
    option mesh_fwding '0'
    option mesh_ttl '1'
    option mesh_rssi_threshold '0'

config wifi-device 'radio1'
    option type 'mac80211'
    option hwmode '11g'
    option path 'platform/ahb/18100000.wmac'
    option htmode 'HT20'
    option disabled '0'
    option channel '1'
    option txpower '24'

config wifi-iface 'liberlo2_4Ghz'
    option ssid 'our-house-liberlo'
    option device 'radio1'
    option mode 'ap'
    option key '<some other key>'
    option network 'lan_bat0_100'
    option encryption 'psk2'
    option disabled '0'
...
# /etc/config/network
...
config interface 'mesh'
    option proto 'batadv_hardif'
    option master 'bat0'
    option mtu '2304'
    option throughput_override '0'

config interface 'bat0'
    option proto 'batadv'
    option routing_algo 'BATMAN_IV'
    option aggregated_ogms '1'
    option ap_isolation '0'
    option bonding '0'
    option bridge_loop_avoidance '1'
    option distributed_arp_table '1'
    option fragmentation '1'
    option gw_mode 'off'
    option hop_penalty '30'
    option isolation_mark '0x00000000/0x00000000'
    option log_level '0'
    option multicast_mode '1'
    option multicast_fanout '16'
    option network_coding '0'
    option orig_interval '1000'
...
Planned next steps/experiments
These are the things I plan to do next in the hope of improving the situation:
- upgrade all nodes to firmware version 21.02.0-rc1
- reduce the 5 GHz channel bandwidth from 80 MHz to 40 MHz (I read somewhere that this is supposed to be more reliable - whatever that means)
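The bandwidth change in the second step could be applied roughly as follows. This is a sketch only: it assumes the 5 GHz radio section is named radio0 as in the config above, that VHT40 is the intended htmode value, and that the same change is made on every node so that all mesh peers use the same channel width.

```shell
#!/bin/sh
# Switch the 5 GHz mesh radio from 80 MHz to 40 MHz channel width.
# Assumption: the radio section is called 'radio0' (as in the config above).
command -v uci >/dev/null 2>&1 || {
    echo "uci not available here; run this on the OpenWrt node" >&2
    exit 0
}

uci set wireless.radio0.htmode='VHT40'
uci commit wireless
wifi    # restart the wireless interfaces so the new width takes effect
```

Repeating the batctl tp measurement before and after the change on the same pair of nodes would show whether the narrower channel makes any difference to the drops.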
As I am running out of ideas, I would really appreciate your suggestions and experience. If you need more background, please let me know.
Thank you for reading and your support.