B.A.T.M.A.N Mesh MTU issue?!

Hi, I am trying to build a small WiFi mesh with OpenWrt 19.07.2, using a TP-Link Archer C7 v2 (ath79) as master and a D-Link DAP-2695-A2 (ar71xx) as mesh client.

The setup works well, but there appears to be an MTU problem.
When I am connected to the D-Link, I can reach the TP-Link via SSH over the WiFi mesh.
But Internet access mostly works only for Google services, which use the UDP-based QUIC protocol.

I followed the setup guide from https://openwrt.org/docs/guide-user/network/wifi/mesh/batman

My setup is as follows (on the TP-Link):

config interface 'lan'
        option type 'bridge'
        option ifname 'eth1.1 bat0.1'
        option proto 'static'
        option ipaddr '192.168.123.1'
        option netmask '255.255.255.0'
        option ip6assign '64'
        option igmp_snooping '1'
        option stp '1'

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_IV'
        option aggregated_ogms 1
        option ap_isolation 0
        option bonding 0
        option fragmentation 1
        option gw_mode 'server'
        option gw_bandwidth '10000/2000'
        #option gw_sel_class 20
        option log_level 0
        option orig_interval 1000
        option bridge_loop_avoidance 1
        option distributed_arp_table 1
        option multicast_mode 1
        option network_coding 0
        option hop_penalty 30
        option isolation_mark '0x00000000/0x00000000'
        #option mtu '1532'

config interface 'mesh2g'
        option proto 'batadv_hardif'
        option master 'bat0'
        option mtu '2304'

config interface 'mesh5g'
        option proto 'batadv_hardif'
        option master 'bat0'
        option mtu '2304'

config wifi-device 'radio0'
        option path 'pci0000:00/0000:00:00.0'
        option type 'mac80211'
        option channel '36'
        option hwmode '11a'
        option htmode 'VHT80'
        option country 'DE'
        option legacy_rates '0'

config wifi-iface 'mesh5g'
        option device 'radio0'
        option ifname 'mesh5g'
        option network 'mesh5g'
        option mode 'mesh'
        option mesh_id 'HM5G'
        option mesh_fwding '0'
        option encryption 'psk2+ccmp'
        option key 'XXXXXXXX'
        option hidden '1'

config wifi-device 'radio1'
        option type 'mac80211'
        option channel 'auto'
        option hwmode '11g'
        option htmode 'HT40'
        option country 'DE'
        option country_ie '1'
        option noscan '1'
        option ht_coex '1'
        option legacy_rates '0'
        option disabled '0'
        option path 'platform/ahb/18100000.wmac'

config wifi-iface 'mesh2g'
        option device 'radio1'
        option ifname 'mesh2g'
        option network 'mesh2g'
        option mode 'mesh'
        option mesh_id 'HM2G'
        option mesh_fwding '0'
        option encryption 'psk2+ccmp'
        option key 'XXXXXXXX'
        option hidden '1'

And almost the same on the D-Link:

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1 bat0.1'
	option proto 'static'
	option ipaddr '192.168.123.2'
	option netmask '255.255.255.0'
	option gateway '192.168.123.1'
	option dns '192.168.123.1'
	option igmp_snooping '1'
	option stp '1'

config interface 'bat0'
	option proto 'batadv'
	option routing_algo 'BATMAN_IV'
	option aggregated_ogms '1'
	option ap_isolation '0'
	option bonding '0'
	option fragmentation '1'
	option gw_mode 'client'
	option gw_sel_class '2'
	option log_level '0'
	option orig_interval '1000'
	option bridge_loop_avoidance '1'
	option distributed_arp_table '1'
	option multicast_mode '1'
	option network_coding '0'
	option hop_penalty '30'
	option isolation_mark '0x00000000/0x00000000'
        #option mtu '1532'

config interface 'mesh2g'
	option mtu '2304'
	option proto 'batadv_hardif'
	option master 'bat0'

config interface 'mesh5g'
	option mtu '2304'
	option proto 'batadv_hardif'
	option master 'bat0'

config wifi-device 'radio0'
	option path 'pci0000:00/0000:00:00.0'
	option type 'mac80211'
	option channel '36'
	option hwmode '11a'
	option htmode 'VHT80'
	option country 'DE'
	option legacy_rates '0'

config wifi-iface 'mesh5g'
	option device 'radio0'
	option ifname 'mesh5g'
	option network 'mesh5g'
	option mode 'mesh'
	option mesh_id 'HM5G'
	option mesh_fwding '0'
	option encryption 'psk2+ccmp'
	option key 'XXXXXXXX'
	option hidden '1'

config wifi-device 'radio1'
	option path 'platform/qca955x_wmac'
	option type 'mac80211'
	option channel 'auto'
	option hwmode '11g'
	option htmode 'HT40'
	option country 'DE'
	option country_ie '1'
	option noscan '1'
	option ht_coex '1'
	option legacy_rates '0'
	option disabled '0'

config wifi-iface 'mesh2g'
	option device 'radio1'
	option ifname 'mesh2g'
	option network 'mesh2g'
	option mode 'mesh'
	option mesh_id 'HM2G'
	option mesh_fwding '0'
	option encryption 'psk2+ccmp'
	option key 'XXXXXXXX'
	option hidden '1'

The Freifunk wiki page below says that an IP address should be added to bat0 to avoid path MTU discovery problems, but which address/subnet?

https://wiki.freifunk.net/BATMAN-Konfiguration#Probleme_mit_MTU_PATH_DISCOVERY_vermeiden

I also tried setting an MTU of 1532 on bat0, and also an IP address, but neither helped.

Hi,
I have the same problem. Have you found a solution yet?
Regards,
Mike

I lowered my LAN/WLAN MTU. In particular, I had to propagate that MTU to the clients via DHCP options:

option dhcp_option_force '26,1462'

config interface 'lan'
	option type 'bridge'
	option ifname 'eth1.1 bat0'
        [....]
	option mtu '1462'

config interface 'bat0'
	option proto 'batadv'
        [....]
	option mtu '1496'

config interface 'mesh5g'
	option proto 'batadv_hardif'
	option master 'bat0'
	option mtu '1528'

At least this worked, but it was not my preferred solution :frowning:
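
As a rough sanity check of the numbers above, here is a minimal Python sketch of the MTU arithmetic. It assumes that batman-adv adds 32 bytes of encapsulation per frame. That figure is only inferred from the values in this thread (1532 vs. 1500, and 1528 vs. 1496), and the function names are made up for illustration.

```python
# Assumption: batman-adv adds 32 bytes of encapsulation per frame. This is
# inferred from the figures in this thread (1532 vs 1500, 1528 vs 1496).
BATADV_OVERHEAD = 32


def max_bat0_mtu(hardif_mtu: int) -> int:
    """Largest bat0 MTU whose frames still fit unfragmented on the mesh interface."""
    return hardif_mtu - BATADV_OVERHEAD


def needs_fragmentation(hardif_mtu: int, bat0_mtu: int) -> bool:
    """True if a full-size bat0 frame would not fit on the mesh interface."""
    return bat0_mtu + BATADV_OVERHEAD > hardif_mtu


if __name__ == "__main__":
    # Mesh interface MTUs mentioned in this thread.
    for hardif_mtu in (1500, 1528, 1532, 2304):
        print(hardif_mtu, "->", max_bat0_mtu(hardif_mtu))
```

With the values from the workaround, a mesh5g MTU of 1528 leaves room for a bat0 MTU of 1496, and the LAN MTU of 1462 announced via DHCP stays below that.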

Someone on the BATMAN IRC channel told me that there was a bug in the maximum MTU of the ath10k driver, which was recently fixed. I don't remember the patch/URL he mentioned, and I have had no time to test it.

OK, I found the URL he gave me:

Maybe also related:

I'm seeing those problems too on my Archer C7 v2/5 devices. The log told me to increase the MTU for batman-adv to 1532; after that I had SSH and SCP access but lost web UI access to the device. Reverting to 1500 brought the warning back, but now my mesh fully works on 5 GHz with around 100 Mbit/s throughput. I use 19.07.1 with the non-CT ath10k drivers.

Regarding the fix:
When is this expected to be available in OpenWrt? Do I need to upgrade the driver or firmware with opkg, or install a new firmware image?

Is your mesh using encryption? Does your device reboot after a day or so at all?

Yes, it's using encryption. My device reboots occasionally after a longer time, roughly every 12 to 50 days.

Without encryption, performance increases, but that is not ideal (not secure).

I tested unencrypted and WPA3 (SAE), but with either vanilla ath10k or ath10k-ct my router reboots...

I have also been trying on x86_64 on snapshot/master with a Compex WLE900VX ath10k PCI card.

Well, are you sure about encryption? My Archer achieves the maximum internet bandwidth (100 Mbit/s down, 40 Mbit/s up) via the encrypted mesh.

Yes, same here: up to 125 Mbps, but not 200 as the picture shows (open, unencrypted).

200 Mbit/s looks like quite a lot to me. I've had that only once, with a modern Dell notebook (7 series) near the router. Other devices were definitely slower.

Since 19.07.3, my mesh can be configured with MTU 1532 and works fine. I'm still using non-CT drivers, encrypting the mesh with WPA3/SAE, and don't have range problems with the 2.4 GHz radio. Let's see if the random crashes and watchdog reboots still occur.

I've had similar experiences. 19.07.3 allows an MTU above 1500, but there's still a kernel issue there.

I've seen pretty horrible throughput losses with higher MTUs. Between my two test devices (both Archer C7s) I'm seeing ~40 MB/s actual throughput, irrespective of whether the device is in 802.11s, IBSS or client/AP mode.

As soon as I bump the MTU above 1500 and try to run larger packets over the link, that drops to 16 MB/s. Reproducing it is as easy as running batman on the link and sending 1500-byte frames over the batman interface (the actual packets are about 1528 bytes).

As far as I can see, the issue is that the kernel is spending almost all of its time handling soft interrupts (ksoftirqd goes to 90-100%). I suspect a driver issue, but it's potentially hardware related.
