4.19.66 snapshot batman 802.11s 5g+2g WPA3 MTU drv/fw combination with >90MBit/s

Hi all, (sorry, long post).

The setup described here runs the 19.07 snapshot (kernel 4.19.66) on all 4 OpenWrt devices: 3 are ath79 (ath9k/ath10k), 1 is ipq40xx (ath10k). The latter is a GL.iNet B1300 (GW); the others are two TP-Link Archer C7 v5 (LR and WR) and a GL.iNet AR300M-lite (M).

There are two 802.11s (WPA3/WPA2-PSK) meshes: 5GHz on the first 3 devices, and 2GHz on all 4 devices. All devices are within Wi-Fi reception range of each other, and each has LAN devices hooked up to it.

Most of the time the LAN-devices are supposed to be in the same L2. (Occasionally, I want to isolate LAN-ports into a dedicated "examination VLAN").

The good(ish) news: It works! ... with max 90MBits, slow ping, and/or lost packets.

In the UI, and looking at batctl n output, everything looks normal: after bootup all expected Wi-Fi associations (3x5GHz, 4x2GHz) and neighbours show up, initially at 6 Mbit/s and 20MHz, then climb to near the link maxima (300 Mbit/s & 866 Mbit/s) within a minute, and then fluctuate a bit, sometimes within the top 20%.

All OpenWrt devices can ping each other and any of the LAN devices, and vice versa, over Bat0InBridge. But pings are slow, mostly 3-15ms, and fluctuate a lot. VLAN worked (in 1 quick test).

BUT: Half-loaded web-pages, and other problems right from the get-go.

I had followed the online guide more or less, so I then started tweaking the MTU downwards from 2304 (the value in the guide, and the value I'd prefer) via 1595 (a value I saw recommended somewhere), 1536 and 1532, all the way down to 1500, which finally seems error-free, but slow, both in terms of latency and bandwidth. (But batman complains, because MTU<1532 leads to fragmentation and therefore slowdown.)
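To make the 1532 threshold concrete, here is a minimal arithmetic sketch. It assumes a per-frame batman-adv overhead of 32 bytes, which I am only inferring from the warning threshold above (1532 - 1500 = 32); the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: batman-adv adds up to 32 bytes per frame, inferred from the
# 1532-byte threshold of its warning (1532 - 1500 = 32).
BATADV_OVERHEAD=32

# Hard-interface MTU needed to carry a given bat0 payload MTU unfragmented.
hardif_mtu() {
    echo $(( $1 + BATADV_OVERHEAD ))
}

# Largest bat0 payload MTU that fits a given hard-interface MTU unfragmented.
payload_mtu() {
    echo $(( $1 - BATADV_OVERHEAD ))
}

hardif_mtu 1500    # prints 1532
payload_mtu 2304   # prints 2272
```

Under that assumption, a mesh interface stuck at MTU 1500 leaves only 1468 bytes for unfragmented payload, so every full-size 1500-byte frame from the LAN has to be fragmented.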

An indicator of failure at high MTU was the inability to ping with packets larger than -s 1462.
iperf gives wildly varying values, from 90 Kbit/s to 105 Mbit/s.
I found netcat -l -p 9993 </dev/zero
with netcat other_host 9993 | pv >/dev/null
to be much more useful in this case.
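For reading the ping numbers: the -s value is only the ICMP payload. A small sketch of the conversion, assuming plain IPv4 without header options (function names are hypothetical):

```shell
#!/bin/sh
# IPv4 header without options (20 bytes) plus ICMP echo header (8 bytes).
IPV4_HDR=20
ICMP_HDR=8

# Size of the IPv4 packet produced by `ping -s <payload>`.
ping_packet_size() {
    echo $(( $1 + ICMP_HDR + IPV4_HDR ))
}

# Largest `ping -s` payload that fits a given IP MTU without fragmentation.
max_ping_payload() {
    echo $(( $1 - ICMP_HDR - IPV4_HDR ))
}

ping_packet_size 1462   # prints 1490
max_ping_payload 1500   # prints 1472
```

So a cutoff at -s 1462 means the largest IP packet that gets through is 1490 bytes, which is 10 bytes short of what a clean 1500-byte path would carry.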

Getting there meant overcoming some hurdles and facing some truths, such as:

  • Don't try with <19.07 snapshots. All the batman docs have been updated to 19.xx. Plus: You get a much newer, better kernel with better security(2).
  • 19.07 snapshots have growing pains, but are a step in the right direction. Con: You get a new kernel that in some respects works worse than a 3.xx(2).
  • You'll need to use the image-builder to try all the combinations of drivers/firmware/kernels/x/y without too much pain.
  • Follow @jeff's example: Name your radios and interfaces: rd5g, mesh5g, ap2g, mon2g, etc.
  • On some devices I had to add option macaddr 12:34:56:X:Y:Z to avoid warning floods about duplicated MACs.
  • IBSS (ad-hoc) mode has similar headaches. (Many devices/fw/drivers don't support RSN-IBSS (encryption).)
  • On 2GHz all (ath79?) devices require an additional AP vIF to enable HT40(1).
  • The whole fw{ct|no_ct}X{ct|no_ct}driver situation isn't quite clear (to me).
    • The Archers seem to require the (non-default) NO_CT firmware for 11s.
    • The ipq40xx currently runs the (default) CT fw, but it's not working well.(3)

(1) This seems to be a known problem. Can somebody elaborate on what happens here at the FW/chip/driver level? Perhaps some chip setup sequence edge cases that don't entirely correctly set up the crypto/framing/other units if an AP is not configured, or some such?
(2) Flow offloading, AP/VLAN, and other fixes for problems from the 3.xx era that have not yet been forward-ported to 5.3 and then backported to 4.19. But we're getting there.
(3) Is there an up-to-date table(4) somewhere with chipset/driver/firmware/x/y/z combinations known to have functioning 11s mesh with WPA{3|2} with {5|2}GHz AP/MESH combinations running at MTU 2304 with hwcrypto=1 and batman and steady ping <3ms between neighbors, and at least 50% link throughput for athXk, and others? (Ideally with batman-adv network coding as well.) @Candela_tech perhaps?
(4) Is there an existing table/page I could update with my findings?

My configs all look more or less like the examples from here:

https://openwrt.org/docs/guide-user/network/wifi/mesh/batman

LuCI added option mesh_rssi_threshold '0' to all my mesh definitions.
I added the following options to the APs:

option ieee80211r '1'
option nasid 'ROAMID'
option ifname 'ap2g'
option ft_over_ds '1'
option ft_psk_generate_local '1'
option wpa_disable_eapol_key_retries '1'

My 2 remaining questions are:

  • Why are ping-times over batman so fluctuating? Does Batman just not optimize for small ICMP packets? Or is this the effect of network coding? In my config, it reports as disabled.
  • Is the following correct: "The higher the MTU on hard_IFs, the more small additional packets can be coalesced into the same transmission" AND "With network coding, the padding can include also packets going to other destinations behind that neighbor, but not behind other neighbors, even if they could also see the packet"

So, thanks for hanging on until now. I hope I have included all the useful context. I could also post complete configs, but apart from what I posted above, they conform to the template. I can post anything requested.

If you have any additional hints/experiences you could share, I would be very happy. Thanks

I've run into MTU problems when I switched to the EA8300s (and one remaining Archer C7v2), but haven't come back to them yet. It is as if the 802.11 interface won't accept more than 1500-ish, no matter what the MTU is set to. My notes show "1458-byte ping OK, 1460-byte ping fails" with tcpdump showing "length 1466". Other notes include

jeff@office:~$ size=1450 ; while sudo ping -c 1 -s $size 172.x.y.z ; do size=$(( size + 1 )) ; done

Same, dies at 1487 bytes

(where the 172 address is on a meshed host).
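A bounded variant of the loop above, as a rough sketch: it binary-searches for the largest payload that still gets a reply instead of stepping one byte at a time. It assumes iputils ping (-M do sets the DF bit, -W 1 limits the wait to one second); BusyBox ping does not have -M, so probe would need adjusting there. The function names and the address in the example are hypothetical, and it assumes that every size below the cutoff works and every size above it fails.

```shell
#!/bin/sh
# probe SIZE HOST: succeeds if one echo request with SIZE payload bytes
# gets a reply. Assumption: iputils ping; -M do forbids fragmentation.
probe() {
    ping -c 1 -W 1 -M do -s "$1" "$2" >/dev/null 2>&1
}

# max_payload LOW HIGH HOST: binary search for the largest payload in
# [LOW, HIGH] that probe accepts. LOW is assumed to work. Prints the result.
max_payload() {
    lo=$1 hi=$2 host=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always progresses
        if probe "$mid" "$host"; then
            lo=$mid                     # mid works: cutoff is at mid or above
        else
            hi=$(( mid - 1 ))           # mid fails: cutoff is below mid
        fi
    done
    echo "$lo"
}

# Example (hypothetical address): max_payload 1400 2300 172.16.0.1
```

With a range of 1400 to 2300 this needs about 10 pings rather than one per byte, which helps when failed probes each cost a timeout.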

I found several references about similar problems with MTU more than a tiny bit above 1500 through Internet searches. I didn't find any clear resolution.

http://lists.infradead.org/pipermail/ath10k/2016-October/008621.html

I would like to refresh a recent thread back in August about the MTU
size of ATH10k
(http://lists.infradead.org/pipermail/ath10k/2016-August/008226.html).
I was trying to play with the mtu size of a ath10k card using 9980 CT
firmware. It seems it accepts up to a value of 2300 by configuration,
but practically the limit is 1595 as the thread commented.

https://ath10k.infradead.narkive.com/phiZq6Pc/driver-drops-rx-packets-when-mtu-is-more-than-1500

We get about 150-200 mbps when MTU is decreased to 1400 or lower.
MTU 1500 is also good, but sometimes we see a lot of dropped RX packets (1-5% )
MTU 1528 is the last good variant in our case with a little bit more
dropped packets (5-10%)
MTU 1529-1564 and more shows 0 kbps iperf throughput and we see that
all rx packets are dropped.

I haven't tried things since the recent commit:

commit bd926fdde5
Author: Koen Vandeputte <redacted>
Date:   Fri Aug 16 10:06:51 2019 +0200

    ath10k-firmware: update Candela Tech firmware images
    
    This should fix a problem with 1560 MTU, 160Mhz on DFS channels,
    some other small issues on < 5.2 kernels, and for 5.2 driver,
    it pulls in some upstream stable fixes.
    
    wave-1 firmware changes since last update:
    
      *  June 24, 2019: Try allocating low-priority WMI msgs if high-prio are not available.
    
      *  June 24, 2019: Init rate-ctrl to start at lowest rate instead of in the middle.  Hoping
                        this helps DHCP when station connects from a long distance.
    
    wave-2:
    
      *  June 24, 2019  Start rate-ctrl at minimal values to help DHCP work better for far-away peers.
    
      *  July 24, 2019  Fix old regression that made /a (and probably /b/g) perform poorly, at least on
                        diet-compiled images.
    
      *  Aug 8, 2019  Improve a/b/g rate-ctrl by damping the PER swings caused by the all-or-nothing logic
                      of transmitting non-block-ack frames one at a time.

(Edit: and probably not since this one either)

commit e9d875a537
Author: Christian Lamparter <redacted>
Date:   Sun Aug 18 02:21:46 2019 +0200

    ath10k-ct: update to HEAD of 2019-08-14 - 9e5ab2
    
    Update ath10k-ct to commit 9e5ab25027e0971fa24ccf93373324c08c4e992d
    
    git log --pretty=oneline --abbrev-commit f0aa8130..9e5ab250
    
    9e5ab25 ath10k-ct:  Update to latest 5.2 upstream, support bigger mtu, 160Mhz
    
    Created with the help of the make-package-update-commit.sh script
    and refresh patches.

That is also my experience with the Archer C7v2 units.

Edit: See also https://github.com/openwrt/openwrt/pull/2341#issuecomment-523122442

ath10k-ct does not support mesh for wave-1 radios.

My "hard cutoff" was always at ping -s 1463

Yes, I also found some of those, and similar, also without conclusion.

The firmware in the ath79 & ipq40xx snapshots currently installed on the local devices, as well as the kmod-ct, are a few days older than the dates of those commits. Over the next few days I'll check whether the imagebuilder gets newer packages, or get them manually.

But anyway, the message makes it sound like MTUs up to 1560 were already supported before those commits.

But it does support RSN-IBSS, which I'll try next with 1532-1555ish.

I have a mix of other devices, and have also been building various versions of OpenWrt from source, based on LibreMesh and the LibreRouter Project.

With mixed success, since lots of changes have been made to how LibreRouter and LibreMesh use Babeld, BMX, batman-adv, dnsmasq, etc.,

particularly with shared-state data and distributed dnsmasq leases.

Devices (some of them)

Kirkwood
Cisco ON-100
Pogo Plug mobile 4
Pogo Plug E02

PPC
Cisco Meraki MR24
Cisco Meraki MX60

IPQ4018
Netgear EX6100v2
Netgear EX6150v2

ARM
Nano Pi NEO
Orange Pi Zero

X86-64
Wyse ZD90W thin clients

In my tests I have also tried FT over-the-air AP roaming (seems to work OK)

Encryption with hw support (AES-NI and kirkwood and ppc)

Yggdrasil encrypted ipv6 routing

Irqbalance to move LAN and Wi-Fi IRQs to different cores

I'm now looking at batman-adv bonding

https://www.open-mesh.org/projects/batman-adv/wiki/Multi-link-optimize

Has anyone tried this with success?

I have been facing the same issue for some time on Archer C7(v2).

Applying the MTU fix from ath10k-ct to the ath10k driver allowed for a functional MTU >1500 with WPA3; however, after a few hours the link becomes unstable and no longer forwards properly. Setting the MTU to 1500 with the same driver seems to give a stable link (it has been running for a couple of days without interruption).

I'm currently also stuck on using batman-adv with MTU 1500 and need the fix for the non-CT version.
