BATMAN V chooses link with bad RSSI

Hi,

I am running the latest OpenWrt on 6 TP-Link EAP-225 devices, with BATMAN on top of a 5 GHz 802.11s mesh and default BATMAN options unless stated otherwise below.
5 of the devices sit in a row in different rooms, all with a stable connection to each other at -60 to -70 dBm.
From the last static device I still get around 10 Mbit/s of iperf3 UDP throughput to the first one.

The 6th device is set up to be mobile, moving from the first static device to the last static device.
I would like the mobile device to connect to devices 1, 2, 3, 4 and 5 in turn as it moves, so that it always gets the best iperf3 throughput.

I found the BATMAN IV performance to be bad: the link quality to a static device with a bad RSSI (-80 to -90 dBm) stayed very high, above 200. Shouldn't the link quality fluctuate far more?

Because of that I am using BATMAN V. I set elp_interval to 50 ms to switch quickly when a connection gets worse, since BATMAN internally uses an exponentially weighted moving average filter, see https://www.ewsn.org/file-repository/ewsn2024/ewsn2024posters-final14.pdf ...
With that I should in theory be able to switch quickly between links. However, connections with a bad RSSI (-80 to -90 dBm) that the moving device is still connected to show a "high" PHY rate of 45 Mbit/s (which seems to be the lowest MCS) even though it should be 0. This causes BATMAN to choose this device instead of the next-hop device with a stronger RSSI, and the connection fails.
I am already using an RSSI threshold for joining, but this obviously only applies to joining and does not drop the connection later when the RSSI gets worse.
I do not see a way to manually tell BATMAN to avoid a connection with a bad RSSI, and my approach of removing the connection to this device via an iw command dropped the connection for several seconds before a new one was found.
Am I doing something wrong? Should the performance be much better?
Otherwise I would compile kmod-batman-adv myself and add an RSSI threshold to the kernel module that avoids connections with an RSSI below a defined threshold (by setting their expected throughput to 1 Mbit/s)...
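
For illustration, the kind of RSSI gate described above could look roughly like the following. This is a minimal Python sketch of the logic only, not batman-adv code: the function names, the -75 dBm floor, the per-neighbor numbers, and the assumption that throughput is counted in units of 100 kbit/s are all hypothetical.

```python
# Sketch of an RSSI gate on the link throughput estimate.
# All names and numbers are hypothetical; this is not kernel code.

RSSI_FLOOR_DBM = -75        # assumed threshold (same value as mesh_rssi_threshold)
PENALTY_THROUGHPUT = 10     # 1 Mbit/s, assuming units of 100 kbit/s


def gated_throughput(throughput, rssi_dbm, floor_dbm=RSSI_FLOOR_DBM):
    """Clamp the throughput estimate of a link whose RSSI is below the floor."""
    if rssi_dbm < floor_dbm:
        return min(throughput, PENALTY_THROUGHPUT)
    return throughput


def best_neighbor(neighbors, gate=True):
    """Pick the neighbor with the highest (optionally gated) throughput.

    neighbors maps a name to a (throughput, rssi_dbm) tuple.
    """
    def score(item):
        throughput, rssi_dbm = item[1]
        return gated_throughput(throughput, rssi_dbm) if gate else throughput

    return max(neighbors.items(), key=score)[0]


# Made-up situation: the far node is still estimated at 150 units (15 Mbit/s)
# at -86 dBm, the near node at 100 units (10 Mbit/s) at -62 dBm.
neighbors = {"far_node": (150, -86), "near_node": (100, -62)}
print(best_neighbor(neighbors, gate=False))  # far_node
print(best_neighbor(neighbors, gate=True))   # near_node
```

Clamping the estimate instead of tearing the peer link down keeps the weak link available as a fallback, which is the difference from removing the station with iw.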

Thank you for any kind of help!

Hi, to get a clearer picture, some information from your config would help: at least /etc/config/wireless for the interfaces and radio devices serving the mesh, and /etc/config/network at least for the batadv and batadv_hardif interfaces. Some runtime status and capability information would help too: the output of cat /var/run/wpa_supplicant-"your radio interface".conf and of the command iw phy "your radio device" info. Please redact sensitive private information before posting.

Thank you, I will post those configs on Monday. Have a nice weekend!

Ok, I was also thinking it would be good to have the output from the "mobile mesh node" while it is in better coverage of the "next node" but still attached to the "last node" with the better PHY rate. That is, the output of: batctl meshif "your meshif" (probably bat0) n
and from:
iw dev "your mesh interface" station dump
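
Once that output is available, comparing signal and TX bitrate per peer is mostly a matter of reading the station dump. A small Python sketch that does this, assuming the usual layout of the station dump (a `Station <mac> (on <ifname>)` header followed by indented `signal:` and `tx bitrate:` lines). The sample text and the -80 dBm floor are made up for illustration and are not output from this setup:

```python
import re

# Illustrative sample only: NOT real output from this thread.
SAMPLE = (
    "Station 02:00:00:00:00:01 (on phy0-mesh0)\n"
    "\tsignal:  \t-86 [-88, -90] dBm\n"
    "\ttx bitrate:\t45.0 MBit/s\n"
    "Station 02:00:00:00:00:02 (on phy0-mesh0)\n"
    "\tsignal:  \t-62 [-65, -66] dBm\n"
    "\ttx bitrate:\t270.0 MBit/s\n"
)

STATION_RE = re.compile(r"Station ([0-9a-fA-F:]{17}) \(on (\S+)\)")
SIGNAL_RE = re.compile(r"\s*signal:\s*(-?\d+)")
TX_RATE_RE = re.compile(r"\s*tx bitrate:\s*([\d.]+) MBit/s")


def parse_station_dump(text):
    """Return {mac: {"ifname", "signal_dbm", "tx_mbit"}} from station dump text."""
    stations = {}
    current = None
    for line in text.splitlines():
        match = STATION_RE.match(line)
        if match:
            current = stations.setdefault(match.group(1), {"ifname": match.group(2)})
            continue
        if current is None:
            continue
        match = SIGNAL_RE.match(line)
        if match:
            current["signal_dbm"] = int(match.group(1))
            continue
        match = TX_RATE_RE.match(line)
        if match:
            current["tx_mbit"] = float(match.group(1))
    return stations


def weak_peers(stations, floor_dbm=-80):
    """List peers whose signal is below the floor, with their reported TX rate."""
    return [
        (mac, info["signal_dbm"], info.get("tx_mbit"))
        for mac, info in sorted(stations.items())
        if info.get("signal_dbm", 0) < floor_dbm
    ]


stations = parse_station_dump(SAMPLE)
for mac, signal, rate in weak_peers(stations):
    print(f"{mac}: {signal} dBm but tx bitrate {rate} MBit/s")
```

In practice the text would come from running the iw command above on the mobile node instead of from the sample string.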

Hi,
here is the shortened output of the mentioned files:
/etc/config/wireless:

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'pci0000:00/0000:00:00.0'
	option band '5g'
	option channel '36'
	option htmode 'VHT40'
	option country 'DE'
	option cell_density '0'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'batmesh'
	option mode 'mesh'
	option encryption 'sae'
	option mesh_id '5ghz'
	option mesh_fwding '0'
	option mesh_rssi_threshold '-75'
	option key 'MYPASSWORD'

/etc/config/network:

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd16:b01a:72be::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'bat0'
	list ports 'eth0'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.2'
	option netmask '255.255.255.0'
	option ip6assign '60'
	list dns '8.8.8.8'
	list dns '8.8.4.4'

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_V'
	option bridge_loop_avoidance '1'
	option gw_mode 'off'
	option hop_penalty '30'

config interface 'batmesh'
	option proto 'batadv_hardif'
	option master 'bat0'

config device
	option name 'phy0-mesh0'
	option mtu '1532'
	option mtu6 '1532'

/var/run/wpa_supplicant-phy0-mesh0.conf:

country=DE
network={
	
	ssid="5ghz"
	key_mgmt=SAE
	mode=5
	mesh_fwding=0
	mesh_rssi_threshold=-75
	fixed_freq=1
	frequency=5180
	ht40=1
	vht=1
	max_oper_chwidth=0
	noscan=1
	sae_password="MYPASSWORD"
	beacon_int=100
}

iw phy:

Wiphy phy0
	wiphy index: 0
	max # scan SSIDs: 16
	max scan IEs length: 199 bytes
	max # sched scan SSIDs: 0
	max # match sets: 0
	Retry short limit: 7
	Retry long limit: 4
	Coverage class: 0 (up to 0m)
	Device supports AP-side u-APSD.
	Device supports T-DLS.
	Available Antennas: TX 0x3 RX 0x3
	Configured Antennas: TX 0x3 RX 0x3
	Supported interface modes:
		 * managed
		 * AP
		 * AP/VLAN
		 * monitor
		 * mesh point
		 * P2P-client
		 * P2P-GO
		 * P2P-device
	Band 2:
		Capabilities: 0x19ef
			RX LDPC
			HT20/HT40
			SM Power Save disabled
			RX HT20 SGI
			RX HT40 SGI
			TX STBC
			RX STBC 1-stream
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: 8 usec (0x06)
		HT TX/RX MCS rate indexes supported: 0-15
		VHT Capabilities (0x339979b2):
			Max MPDU length: 11454
			Supported Channel Width: neither 160 nor 80+80
			RX LDPC
			short GI (80 MHz)
			TX STBC
			SU Beamformer
			SU Beamformee
			MU Beamformer
			MU Beamformee
			RX antenna pattern consistency
			TX antenna pattern consistency
		VHT RX MCS set:
			1 streams: MCS 0-9
			2 streams: MCS 0-9
			3 streams: not supported
			4 streams: not supported
			5 streams: not supported
			6 streams: not supported
			7 streams: not supported
			8 streams: not supported
		VHT RX highest supported: 0 Mbps
		VHT TX MCS set:
			1 streams: MCS 0-9
			2 streams: MCS 0-9
			3 streams: not supported
			4 streams: not supported
			5 streams: not supported
			6 streams: not supported
			7 streams: not supported
			8 streams: not supported
		VHT TX highest supported: 0 Mbps
		VHT extended NSS: not supported
		Frequencies:
			* 5180.0 MHz [36] (23.0 dBm)
			* 5200.0 MHz [40] (23.0 dBm)
			* 5220.0 MHz [44] (23.0 dBm)
			* 5240.0 MHz [48] (23.0 dBm)
			* 5260.0 MHz [52] (20.0 dBm) (radar detection)
			* 5280.0 MHz [56] (20.0 dBm) (radar detection)
			* 5300.0 MHz [60] (20.0 dBm) (radar detection)
			* 5320.0 MHz [64] (20.0 dBm) (radar detection)
			* 5500.0 MHz [100] (26.0 dBm) (radar detection)
			* 5520.0 MHz [104] (26.0 dBm) (radar detection)
			* 5540.0 MHz [108] (26.0 dBm) (radar detection)
			* 5560.0 MHz [112] (26.0 dBm) (radar detection)
			* 5580.0 MHz [116] (26.0 dBm) (radar detection)
			* 5600.0 MHz [120] (26.0 dBm) (radar detection)
			* 5620.0 MHz [124] (26.0 dBm) (radar detection)
			* 5640.0 MHz [128] (26.0 dBm) (radar detection)
			* 5660.0 MHz [132] (26.0 dBm) (radar detection)
			* 5680.0 MHz [136] (26.0 dBm) (radar detection)
			* 5700.0 MHz [140] (26.0 dBm) (radar detection)
			* 5720.0 MHz [144] (13.0 dBm) (radar detection)
			* 5745.0 MHz [149] (13.0 dBm)
			* 5765.0 MHz [153] (13.0 dBm)
			* 5785.0 MHz [157] (13.0 dBm)
			* 5805.0 MHz [161] (13.0 dBm)
			* 5825.0 MHz [165] (13.0 dBm)
			* 5845.0 MHz [169] (13.0 dBm)
			* 5865.0 MHz [173] (13.0 dBm)
	valid interface combinations:
		 * #{ managed } <= 1, #{ AP, mesh point } <= 16,
		   total <= 16, #channels <= 1, STA/AP BI must match, radar detect widths: { 20 MHz (no HT), 20 MHz, 40 MHz, 80 MHz, 80+80 MHz, 160 MHz }

	HT Capability overrides:
		 * MCS: ff ff ff ff ff ff ff ff ff ff
		 * maximum A-MSDU length
		 * supported channel width
		 * short GI for 40 MHz
		 * max A-MPDU length exponent
		 * min MPDU start spacing
	max # scan plans: 1
	max scan plan interval: -1
	max scan plan iterations: 0
	Maximum associated stations in AP mode: 32
	Supported extended features:
		* [ VHT_IBSS ]: VHT-IBSS
		* [ RRM ]: RRM
		* [ SET_SCAN_DWELL ]: scan dwell setting
		* [ FILS_STA ]: STA FILS (Fast Initial Link Setup)
		* [ CQM_RSSI_LIST ]: multiple CQM_RSSI_THOLD records
		* [ CONTROL_PORT_OVER_NL80211 ]: control port over nl80211
		* [ TXQS ]: FQ-CoDel-enabled intermediate TXQs
		* [ AIRTIME_FAIRNESS ]: airtime fairness scheduling
		* [ AQL ]: Airtime Queue Limits (AQL)
		* [ CONTROL_PORT_NO_PREAUTH ]: disable pre-auth over nl80211 control port support
		* [ SCAN_FREQ_KHZ ]: scan on kHz frequency support
		* [ CONTROL_PORT_OVER_NL80211_TX_STATUS ]: tx status for nl80211 control port support
		* [ POWERED_ADDR_CHANGE ]: can change MAC address while up

That is not what an 802.11s mesh is for. All the meshnodes need to be static. If you move one, your 6th for example, then yes, it will in due course connect to whichever node has the lowest mesh metric, a value recalculated based on speed and hop count.
This can take a number of seconds, possibly tens of seconds, once the mobile node stops moving.
But if the mobile node keeps moving this may never happen; instead it stays connected to its original peer until it totally loses its connection.

Yes, but also this would be nice to have in order to observe if any optimization could be applied maybe using the coverage density option...

Hi, I have to disagree. 802.11s is only the lower layer.

I am using the Better approach to Mobile Ad-Hoc Networking above it which does the routing.

I am convinced that the BATMAN V performance is so bad in my case because the ath9k and ath10k drivers are broken somehow. I have used several devices and all of them report wrong PHY layer information about the TX and RX rates, which causes BATMAN V to perform badly. Still, I do not understand why BATMAN IV performed so badly in my case :smiley:

Spoiler: As I already mentioned I now created my own RSSI based BATMAN V routing algorithm which proved to be great in my testing!

In my test case with several static stations and a moving station on a robot (all routers are TP-Link EAP225-Outdoor), it always chooses a good route and iperf3 is able to transmit a load without interruptions.

I will probably open-source it with a description soon when I have the time :smiley:

P.S.: Sorry that I haven’t answered earlier, I am too busy :frowning:

Would you mind announcing and discussing your issues and modifications on the batman mailing list?

Yes I will probably show them my results (when I have the time).

The 802.11s layer 2 wireless frame type could perhaps be considered as "the lower layer", but the 802.11s mesh standard is much more than just a frame type. It also defines a layer 2 mac-routing protocol that dynamically builds a mesh backhaul with multi-point to multi-point connections between nodes. The mac-routing tables are replicated through the backhaul.

Once the 802.11s backhaul is established it functions pretty much like a virtual distributed unmanaged layer 2 switch.

You can then use this as a transport mechanism for a layer 3 network. In a simplistic case, all you need is an ipv4 dhcp server and you will get ipv4 connectivity throughout the backhaul, driven by classic ARP.

But in larger networks, something more efficient would be sensible. OLSR has been used for this, but BATMAN has proven more popular with very active development phases over the years.

The MOBILE in "Better approach to Mobile Ad-Hoc Networking" refers to client devices, eg you wandering around a community and being able to connect to any public access point and get a connection.

It does not refer to the meshnodes themselves being in motion. Yes, they can move and the mac-routing protocol will re-route/self-repair. What is important is relative velocity, compared to configured mac-routing parameters/timeouts etc.
Default settings are for stability of a pseudo-static mesh, rather than rapidly moving nodes.

BATMAN expects the mac-routing protocol to be set in a "minimal mode" so it can do its own thing without conflicting, but that is detail, rather than technical principle.

This use case is better served by configuring the robot's radio as a STA and letting it roam between APs connected to the meshnodes. This can be good enough for video calls, so should be good enough for a robot.

Very interesting. Yes, mesh_rssi_threshold is certainly one of the 802.11s mac-routing protocol parameters that would have to be dynamically adjusted on a moving meshnode (along with numerous others) to ensure rapid convergence.

Please do!

Don't hold me to the details, but if memory serves me well… THE killer argument for 802.11s was the African solar-only school, off the electrical grid, with something like 24 One Laptop per Child devices. But I also have no idea what actually happened to the project and the idea of bringing simple ad-hoc networking to the people.

Not if multiple robots move :smiley:

This threshold is unfortunately only applied for joining.

I also tried to disconnect from a STA with an iw command in an external script whenever the RSSI gets bad, but that turned out to cause disruptions.

Works just fine if multiple humans move though /s

No, are you confusing "mesh" with "roaming"? These are very different things.

For example "mesh station" is a completely different thing to "managed STA".

A managed STA can roam seamlessly and almost instantaneously from one managed AP to another providing you set it up correctly.

A mesh backhaul peer has nothing to roam to/from, it is a PEER of the mesh backhaul.

It can move and the mac-routing (built into the Linux kernel) will exclude it from the backhaul as it becomes out of range of its peers, based on a calculated metric.
The moved (or moving) meshnode can then attempt to rejoin the backhaul in its new location. This is not instantaneous like AP/STA roaming is. It takes time (the time taken depends on many things and can be tuned).

Edit: This thread is highlighting an interesting scenario and at least from my point of view has inspired me to revisit the underlying technicalities.

I am probably mixing up the technical terms :smiley:

I know that they are different things.

I do not have any access points; all my TP-Links are set up as mesh nodes running 802.11s, with my modified BATMAN V (a Linux kernel module) as the routing algorithm.

And in my scenarios those mesh nodes can ALL be moving.

I know how BATMAN V calculates its routing metric. In principle it takes the PHY layer TX rate (the modulation and coding scheme rate), divides it by 3, and passes the result through a filter for its link assessment. At every hop, forwarding penalties are then applied.
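
The calculation described above can be sketched in a few lines of Python. This is not batman-adv source: the 1/8 filter weight and the 255 penalty scale are assumptions about the implementation, hop_penalty=30 is the value from the posted config, and all function names are made up for the sketch.

```python
# Sketch of a BATMAN V style link estimate: PHY rate / 3, EWMA filter,
# then a per-hop forwarding penalty. Constants are assumptions (see above).


def phy_rate_to_throughput(tx_rate_mbit):
    """Rough throughput estimate: PHY TX rate divided by 3."""
    return tx_rate_mbit / 3.0


class Ewma:
    """Exponentially weighted moving average; the first sample seeds it."""

    def __init__(self, weight=1 / 8):
        self.weight = weight
        self.value = None

    def add(self, sample):
        if self.value is None:
            self.value = sample
        else:
            self.value += self.weight * (sample - self.value)
        return self.value


def forward_penalty(throughput, hop_penalty=30, scale=255):
    """Scale the throughput down once per forwarding hop."""
    return throughput * (scale - hop_penalty) / scale


def samples_until_halved(weight=1 / 8):
    """Count samples until the filter output drops below half of its old
    value after the input falls to 0."""
    ewma = Ewma(weight)
    ewma.add(1.0)
    count = 0
    while ewma.value >= 0.5:
        ewma.add(0.0)
        count += 1
    return count


link = phy_rate_to_throughput(45.0)   # 15.0 Mbit/s for a 45 Mbit/s PHY rate
print(link, forward_penalty(link))
print(samples_until_halved())
```

With a weight of 1/8 the filter needs 6 samples to fall below half of its previous value. If one sample is taken per ELP interval (also an assumption), that is about 300 ms at an elp_interval of 50 ms, which only helps if the reported PHY rate actually drops on a weak link.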

I have to disagree. It was instantaneous in my testing. 802.11s allows multiple connections at the same time, and then it is just a matter of BATMAN choosing another node to route over. Of course it takes some time for 802.11s to detect a new node when moving, but that was never a problem for me, because traffic can just take the “old” route until the new link is fully established (which happens fast enough).

You really need to be consistent.
Which is it, instantaneous, or does it take some time?

Yes, and that was your reported problem at the beginning of this thread.

Now we are getting to the bottom of your requirement.
This is the classic "Drone Swarm" problem for which there are numerous proprietary solutions way beyond the current scope of OpenWrt.

That sounds very roughly like it is trying to do an equivalent to the HWMP ALM (Airtime Link Metric) calculations.

Have you tried tuning the HWMP ALM?
The mesh11sd package does this already, but in the other direction, i.e. for static nodes, to reduce background overheads. I might do some testing with a "mobile node" config...

Hi, first of all thank you for your answer and sorry for my absence!

I mean when a mesh node gets in physical range of another mesh node the establishment of the link between those two mesh nodes takes some time.
When a mesh node is already connected to several other mesh nodes, packets can be routed to and over those other mesh nodes instantaneously.

Yes, the "Drone Swarm" problem is pretty much what I am trying to do :slight_smile:
Yeah, but BATMAN should be able to do it and I don't want a proprietary solution :confused:

No, I have not tried tuning HWMP ALM. However, I looked into the implementation and I can confirm that it is pretty much equivalent to BATMAN V:
In the airtime_link_metric_get function it reads sta->mesh->tx_rate_avg. Simply put, the airtime link metric is then just some parameters divided by the filtered tx_rate_avg. See:

tx_rate_avg is calculated in ieee80211s_update_metric (same file) with the current MCS of different standards:

BATMAN V calls the same cfg80211_calculate_bitrate (just filters it with a different exponentially-weighted moving average low-pass filter). See:

I therefore strongly assume that mesh_hwmp will have the same bad performance for me.
I still strongly suspect that the root of the problem lies in ath9k/ath10k/ath11k driver implementation.
Maybe some commit broke it without anyone noticing? Maybe it makes sense to take a look at an older implementation?
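
To make the "parameters divided by the rate" point concrete, here is a floating-point Python sketch of the textbook 802.11s airtime link metric, (overhead + test frame bits / rate) / (1 - frame error rate). It is not the kernel's fixed-point code: the 100 µs overhead is a hypothetical placeholder, and only the 8192-bit test frame comes from the standard's definition of the metric.

```python
# Textbook-style airtime link metric (lower is better), in microseconds.
# OVERHEAD_US is a hypothetical placeholder, not a value from the kernel.

TEST_FRAME_BITS = 8192      # test frame size used by the airtime metric
OVERHEAD_US = 100.0         # assumed channel access + protocol overhead


def airtime_cost_us(rate_mbit, frame_error_rate=0.0, overhead_us=OVERHEAD_US):
    """Estimated airtime to deliver one test frame over a link."""
    if rate_mbit <= 0 or frame_error_rate >= 1.0:
        return float("inf")
    tx_time_us = overhead_us + TEST_FRAME_BITS / rate_mbit
    return tx_time_us / (1.0 - frame_error_rate)


# Made-up comparison: a weak link that still reports 45 Mbit/s versus a
# strong link that reports 30 Mbit/s.
weak_as_reported = airtime_cost_us(45.0)
strong = airtime_cost_us(30.0)
weak_with_losses = airtime_cost_us(45.0, frame_error_rate=0.5)

print(weak_as_reported < strong)    # True: the weak link looks cheaper
print(weak_with_losses > strong)    # True: only visible if losses are counted
```

As long as the reported rate stays high and losses are not reflected in the inputs, a rate-based metric of this shape prefers the weak link, regardless of whether HWMP or BATMAN V evaluates it.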

Thank you for your mesh11sd project btw (just found it) :slight_smile: