Hi all, (sorry, long post).
The setup described here runs a 19.07 snapshot (kernel 4.19.66) on all 4 OpenWrt devices: 3 are ath79 (ath9k/ath10k), 1 is ipq40xx (ath10k). The latter is a GL.iNet B1300 (GW); the others are two TP-Link Archer C7 v5 (LR and WR) and a GL.iNet AR300M-lite (M).
There are two 802.11s(WPA3/WPA2-PSK) meshes: 5GHz on the first 3 devices, and 2GHz on all 4 devices. They are all within WIFI reception range of each other. There are LAN devices hooked up to all of them.
Most of the time the LAN-devices are supposed to be in the same L2. (Occasionally, I want to isolate LAN-ports into a dedicated "examination VLAN").
The good(ish) news: It works! ... with max 90MBits, slow ping, and/or lost packets.
On the UI, and looking at batctl n output, everything looks normal: after bootup all expected WiFi associations (3x 5GHz, 4x 2GHz) and neighbours show up, initially at 6Mbit/s and 20MHz. They then climb to near the link maxima (300MBit/s & 866MBit/s) within a minute, and afterwards fluctuate somewhat within the top 20%.
All OpenWrt devices can ping each other and any of the LAN devices, and vice versa, over Bat0InBridge. But pings are slow, mostly 3-15ms, and fluctuate a lot. VLAN worked (in one quick test).
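To put numbers on "very fluctuating", the ping output can be condensed into a few figures. This is only a minimal sketch: ping_summary is a made-up helper name, and it assumes the usual "time=3.21 ms" field that busybox and iputils ping print on each reply line.

```sh
#!/bin/sh
# Summarise RTTs from ping output read on stdin: count, min, avg, max, spread.
# Assumption: each reply line contains "time=<number> ms".
ping_summary() {
    awk -F'time=' '/time=/ {
        rtt = $2 + 0                      # "3.21 ms" -> 3.21
        if (n == 0 || rtt < min) min = rtt
        if (n == 0 || rtt > max) max = rtt
        sum += rtt
        n++
    }
    END {
        if (n > 0)
            printf "n=%d min=%.2f avg=%.2f max=%.2f spread=%.2f\n", n, min, sum / n, max, max - min
    }'
}

# Usage (the address is a placeholder for a mesh neighbour):
#   ping -c 50 192.168.1.2 | ping_summary
```

Running it once per neighbour, and once per payload size, makes it easier to compare runs than eyeballing the raw output.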
BUT: Half-loaded web-pages, and other problems right from the get-go.
I had followed the online guide more or less, so I then started tweaking the MTU downwards from 2304 (the value in the guide, and the value I'd prefer) via 1595 (a value I saw recommended somewhere), 1536 and 1532, all the way to 1500, which finally seems error-free but slow, both in latency and bandwidth. (But batman complains, because an MTU < 1532 leads to fragmentation and thus slowdown.)
An indicator of failure w.r.t. high MTU was the inability to ping with packet sizes above -s 1462.
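The arithmetic behind that ping limit can be sketched in shell. The assumptions here are an IPv4 header without options (20 bytes), an 8-byte ICMP echo header, and 32 bytes of batman-adv overhead, which is simply the difference between the 1532 batman asks for and 1500. The function names are made up for illustration.

```sh
#!/bin/sh
# All sizes in bytes.
IPV4_HDR=20          # assumption: IPv4 header without options
ICMP_HDR=8           # ICMP echo header
BATADV_OVERHEAD=32   # assumption: 1532 - 1500, taken from batman's MTU warning

# Size of the IP packet produced by "ping -s <payload>".
ip_packet_len() {
    echo $(( $1 + ICMP_HDR + IPV4_HDR ))
}

# Hard-interface MTU needed to carry that ping without fragmentation.
hardif_mtu_needed() {
    echo $(( $(ip_packet_len "$1") + BATADV_OVERHEAD ))
}

# Largest "ping -s" payload that fits into a given L3 MTU.
max_ping_payload() {
    echo $(( $1 - IPV4_HDR - ICMP_HDR ))
}

ip_packet_len 1462       # 1490
hardif_mtu_needed 1462   # 1522
max_ping_payload 1500    # 1472
```

So -s 1462 corresponds to a 1490-byte IP packet, 10 bytes short of what a clean 1500-byte path would carry (-s 1472).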
iperf gives wildly varying values, from 90 Kbit/s to 105 Mbit/s.
I found netcat -l -p 9993 </dev/zero on one host, together with netcat other_host 9993 | pv >/dev/null on the other, to be much more useful in this case.
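One thing to keep in mind when comparing the two tools: pv reports bytes per second in binary units (KiB/s, MiB/s), while iperf reports bits per second. A small conversion helper, as a sketch (mib_to_mbit is a made-up name, and it assumes awk is available on the box):

```sh
#!/bin/sh
# Convert a pv rate given in MiB/s into Mbit/s (10^6 bits per second).
mib_to_mbit() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 1048576 * 8 / 1000000 }'
}

mib_to_mbit 11.2   # 94.0
```

In other words, a pv reading of about 11 MiB/s is roughly the same as the 90 Mbit/s ceiling mentioned above.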
Getting there meant overcoming some hurdles and facing some truths, such as:
- Don't try with <19.07 snapshots. All the batman docs have been updated to 19.xx. Plus: You get a much newer, better kernel with better security(2).
- 19.07 snapshots have growing pains, but are a step in the right direction. Con: You get a new kernel that in some respects works worse than a 3.xx(2).
- You'll need to use the image-builder to try all the combinations of drivers/firmware/kernels/x/y without too much pain.
- Follow @jeff's example: Name your radios, and interfaces: rd5g, mesh5g, ap2g, mon2g, etc.
- On some devices I had to add option macaddr 12:34:56:X:Y:Z to avoid warning floods about duplicated MACs.
- IBSS (ad-hoc) mode has similar headaches. (Many devices/fw/drivers don't support RSN-IBSS (encryption).)
- On 2GHz, all (ath79?) devices require an additional AP vIF to enable HT40(1).
- The whole fw{ct|no_ct}X{ct|no_ct}driver situation isn't quite clear (to me).
- The Archers seem to require the (non-default) NO_CT firmware for 11s.
- The ipq40xx currently runs the (default) CT fw, but it's not working well.(3)
(1) This seems to be a known problem. Can somebody elaborate on what happens here at the FW/chip/driver level? Perhaps some chip setup sequence edge case that doesn't fully set up the crypto/framing/other units if no AP is configured, or something like that?
(2) Flow offloading, AP/VLAN, and other not-yet forwardTo5.3-then-backwardTo4.19-ported fixes for problems from the 3.xx era. But we're getting there.
(3) Is there an (up2date) table(4) somewhere with chipset/driver/firmware/x/y/z combinations known to have functioning 11s mesh with WPA{3|2} with {5|2}GHz AP/MESH combinations running at MTU 2304 with hwcrypto=1 and batman and steady ping <3ms between neighbors, and at least 50% link throughput for athXk, and others? (with batman-adv network-coding also, ideally) @Candela_tech perhaps?
(4) Is there an existing table/page I could update with my findings?
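Regarding the macaddr item in the list above: one way to pick distinct addresses is to derive a locally administered unicast MAC from the radio's own MAC. This is only a sketch of that idea, not necessarily what the drivers do themselves. derive_mac is a made-up helper, and the base address in the example is a placeholder.

```sh
#!/bin/sh
# Derive a locally administered unicast MAC from a base MAC and an index.
# Usage: derive_mac <base-mac> <index>
derive_mac() {
    base=$1
    idx=$2
    first=$(( 0x${base%%:*} ))
    first=$(( (first | 0x02) & 0xFE ))         # set "locally administered", clear multicast
    rest=${base#*:}                            # everything after the first octet
    last=$(( (0x${rest##*:} + idx) & 0xFF ))   # vary the last octet per interface
    middle=${rest%:*}                          # octets 2-5, unchanged
    printf '%02x:%s:%02x\n' "$first" "$middle" "$last"
}

derive_mac 50:c7:bf:aa:bb:cc 1   # 52:c7:bf:aa:bb:cd
```

The result would then go into option macaddr of the affected wifi-iface, with a different index per interface.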
My configs all +- look like the examples from here:
https://openwrt.org/docs/guide-user/network/wifi/mesh/batman
LuCI added option mesh_rssi_threshold '0' to all my mesh definitions.
I added the following options to the APs:
option ieee80211r '1'
option nasid 'ROAMID'
option ifname 'ap2g'
option ft_over_ds '1'
option ft_psk_generate_local '1'
option wpa_disable_eapol_key_retries '1'
Two more remaining questions I have:
- Why are ping times over batman so fluctuating? Does batman just not optimize for small ICMP packets? Or is this the effect of network coding? In my config, it reports as disabled.
- Is the following correct: "The higher the MTU on hard_IFs, the more small additional packets can be coalesced into the same transmission" AND "With network coding, the padding can also include packets going to other destinations behind that neighbor, but not behind other neighbors, even if they could also see the packet"?
So, thanks for hanging on until now. I hope I included all the useful context information. I could also post complete configs, but apart from what I posted above, they conform to the template. I can post anything requested.
If you have any additional hints/experiences you could share, I would be very happy. Thanks