Batman on devices with DSA ... Various issues

Updated:
I use Batman for L2 meshing, with wired and wireless links as backhaul, using OpenWrt hosts (near git master, on 5.10) and Debian bullseye.

On all my in-use OpenWrt devices with DSA (MT7621, Realtek) I have issues using the wired interfaces as backhaul. Needing an MTU higher than 1560 to avoid poor performance is only one of the problems.
For MT762X, MTU 2030 is possible with this backport to 5.10 of the patches linked by @LGA1150.

Since ipq40xx and ath79 (qca8k) will probably soon be joining the list of DSA targets, I would like to find out whether my current usage of batman on DSA is somehow wrong.

So, on a DSA target, we have one CPU port (or several, LAGged), usually eth0, with MTU 1504 on MT7621 and Realtek, and a bunch of lanX@eth0 interfaces set to MTU 1500, all bridged into a VLAN-aware bridge named switch, also at MTU 1500. The CPU port on MT7621 can be set to 2030 after the kernel patch.

I would like to use some of the wired ports for backhaul traffic and some for user traffic, and some ports carrying both through different VLANs.

First intuition:

  • Set the MTU of eth0 (the DSA master port) to 2030.
  • Set the MTU of the switch bridge interface and all lanX ports to 2026.
  • Define one additional VLAN on the bridge for batman hardif traffic, e.g. 7.
  • Add VLAN 7 to some ports, either tagged or as untagged PVID.
  • Configure switch.7 to have an MTU of 2026, and to have its master set to the bat0 device.
  • Set unique MAC addresses on most L2 links.
  • Add bat0 as a slave port of the switch bridge.
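
For reference, the steps above could be written with iproute2 roughly as follows. This is only a sketch of the intended layout, not a working recipe: eth0, switch, switch.7, bat0 and VLAN 7 are the names from the list, while using lan3 as the tagged port and lan4 as the untagged PVID port is an assumption made for illustration, and the MAC address step is left out. The script only prints the commands unless APPLY=1 is set.

```sh
#!/bin/sh
# Sketch of the "first intuition" layout. Names (eth0, switch, bat0, VLAN 7)
# are from the post; lan3 = tagged, lan4 = untagged PVID is an assumption.

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

first_try() {
    cpu_mtu=2030                  # DSA master MTU (needs the kernel patch)
    port_mtu=$((cpu_mtu - 4))     # 2026, i.e. 2030-4 as used in this thread
    vid=7                         # VLAN carrying the batman hardif traffic

    run ip link set dev eth0 mtu "$cpu_mtu"
    for p in lan1 lan2 lan3 lan4; do
        run ip link set dev "$p" mtu "$port_mtu"
    done
    run ip link set dev switch mtu "$port_mtu"

    # VLAN 7 membership: tagged on lan3, untagged PVID on lan4, plus the
    # bridge itself so that the CPU side is a member of the VLAN.
    run bridge vlan add dev lan3 vid "$vid"
    run bridge vlan add dev lan4 vid "$vid" pvid untagged
    run bridge vlan add dev switch vid "$vid" self

    # VLAN interface on top of the bridge, enslaved to bat0 as hardif.
    run ip link add link switch name "switch.$vid" type vlan id "$vid"
    run ip link set dev "switch.$vid" mtu "$port_mtu"
    run ip link add name bat0 type batadv
    run ip link set dev "switch.$vid" master bat0

    # bat0 itself becomes a port of the switch bridge.
    run ip link set dev bat0 master switch
}

first_try
```

Run as-is it just lists the commands, which makes it easy to compare against what netifd actually configured on the device.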

That would leave the User VLAN at normal MTU 1500, and enable switch-level forwarding for user and batman-traffic, while also enabling the MT7621 and realtek devices to participate in the mesh.

But that has the following problems:

  • The MTU of the LAN ports doesn't change (but the switch passes large packets between ports). Is it supposed to change?
  • Large packets don't arrive at the master port.
  • switch.7 cannot be added as a slave to bat0: Resource busy. Setting the interface down first (ip l set down) didn't help.

-> fail.

Second try: Use a separate bridge, e.g. hi4b for batman hardif wired L2 interfaces, e.g. lan3 & lan4, and add that to bat0.
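
A minimal iproute2 sketch of this second layout, assuming lan3 and lan4 are first released from the VLAN-aware switch bridge and that bat0 already exists. hi4b, lan3, lan4 and bat0 are the names used here; everything else is illustrative. Commands are only printed unless APPLY=1 is set.

```sh
#!/bin/sh
# Sketch: separate plain bridge (hi4b) for the wired batman hardifs.

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

second_try() {
    bridge_if=hi4b

    # A newly created Linux bridge has vlan_filtering off by default.
    run ip link add name "$bridge_if" type bridge

    for p in lan3 lan4; do
        run ip link set dev "$p" nomaster              # leave the switch bridge
        run ip link set dev "$p" master "$bridge_if"   # join hi4b instead
        run ip link set dev "$p" up
    done

    run ip link set dev "$bridge_if" up
    # The bridge as a whole becomes the batman hardif.
    run ip link set dev "$bridge_if" master bat0
}

second_try
```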

-> Doesn't work either: batman on the DSA host sees no neighbors in batctl n (but they show up in brctl showmacs hi4b). Curiously, the non-DSA batman hosts attached to lan3 & lan4 do see the DSA host in their neighbor list, but can't batctl ping it, only each other.

IPv6 traffic (and login over SSH) works, using the fe80:: auto-generated addresses.

3rd try: add the lan{3,4} ports directly to bat0, no intermediary bridge.
--> Fail, with a similar result as above, but now IPv6 doesn't work at all, and the non-DSA hosts see the MAC of eth0 as batman neighbor, but communication doesn't work.

To me, the above points to (at least) 3 separate problems (with my usage of) DSA and VLAN-switches:

  • MTU setting/display is faulty. The lanX ports always stay at 1500.
  • switch.X interfaces should be usable as batman hardifs.
  • The bridges somehow filter/block/drop the incoming batman broadcasts (I do see them with tcpdump on the lanX interfaces, though).
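
To narrow down the third point, the checks below compare what arrives on the wire with what the bridge and batman-adv see. It is a sketch, assuming the interface names used above (lan3 as port, hi4b as bridge, and the default mesh interface for batctl); 0x4305 is the EtherType used by batman-adv frames. Commands are only printed unless APPLY=1 is set.

```sh
#!/bin/sh
# Sketch: collect the state relevant to "frames arrive, but no neighbors".

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

# diagnose PORT BRIDGE
diagnose() {
    port=$1
    bridge_if=$2

    # MTU, master and driver details of the port as the kernel sees them.
    run ip -d link show dev "$port"

    # MAC addresses known on the bridge device.
    run bridge fdb show dev "$bridge_if"

    # Hard interfaces and neighbors from batman-adv's point of view.
    run batctl if
    run batctl n

    # Raw batman-adv frames on the port, with link-level headers.
    run tcpdump -i "$port" -e -n -c 20 ether proto 0x4305
}

diagnose lan3 hi4b
```

If tcpdump shows frames from the neighbors on lan3 while batctl n stays empty, the frames are being lost somewhere between the port and the hardif.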

At this point I'm dreading the switch-over to DSA for ath79 and ipq40xx, because there I can still use the wired ports as wanted.

Or how is batman supposed to be used on DSA-targets? Can somebody help me debug this?

Hi,

Jumbo frame support for MT7621 was not present on kernel 5.4. You can try a snapshot build, which has switched to kernel 5.10.

I had already tried with the testing kernel before posting.

Currently running on my cudy wr1300:

OpenWrt SNAPSHOT, r17769+4-333f93333e
 -----------------------------------------------------
root@cudy:~# uname -a
Linux cudy 5.10.72 #0 SMP Mon Oct 18 11:33:42 2021 mips GNU/Linux

(The +4 is for reverting the last 4 mt76 wireless commits. They break RSN-IBSS for me.)

I recently tried wired mesh with the EdgeRouter X (MT7621) and had lots of strange issues, as I was using LibreMesh feeds (DSA and non-DSA).

The device also had a strange issue where my WAN bandwidth was reduced to 50%.

After reverting the device to stock, that issue was gone.

The other targets I was using were IPQ4018 based.

If your MT7621 runs the RAM at 1200, you can set the MAC mode to trgmii, which will help reduce the loss to 40 percent, because the CPU port will then run at 1200 Mbit. There was a pull request recently that I tested before it was revoked. (It breaks eth on devices with DDR2, or MemClock != 1200.)

Have you been able to get Multi-link Optimizations (batman bonding) working at all on batman-adv using wired connections?

Ah ok, thank you for pointing that out.

On non-DSA hosts, yes. But those are amd64, or older MIPS. I was planning to replace some of those with the dual-core + SMT MT7621 devices, but am currently blocked on all DSA targets.

I agree, and feel your question is very important, but I cannot help with how to debug this, except for:

This post from one of the LibreMesh developers is relevant. It describes strange 802.1ad behaviour found on Mediatek devices but not on Atheros,

and mentions a patch:
http://lkml.iu.edu/hypermail/linux/kernel/2103.0/00696.html

Oops, I didn't recall right, you still have to backport those 3 commits:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9470174e7581e75a8ebd78964997314dfc2e706c
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=771c8901568dd8776a260aa93db41be88a60389e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4fd59792097a6b2fb949d41264386a7ecade469e

Thank you for linking those patches. I've dumped the 12 hunks (only 1 needed rebasing) into a file under ramips/patches-5.10/. Currently compiling with that patch.

For anyone interested, here's the backported patchfile: https://pastebin.com/79iW2TLY

Edit: Now running the patched 5.10. The MTU is now set to 2030 on the CPU port.
All other problems remain, though. I updated the first post.

Do you mean changing MTU does not work?

Changing the MTU up to 2030 is now possible on the CPU port eth0 with the patch I made from the patches you linked. The lanX ports can now also be set to 2026 (2030-4).
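
A small check for whether the requested MTUs actually took effect can be sketched as below. It reads the values from sysfs; the expected numbers (2030 on eth0, 2026 on the lanX ports) are the ones from this thread, and the SYSFS_NET override is a hypothetical knob added only so the function can be tried against a fake directory.

```sh
#!/bin/sh
# Where the kernel exposes per-interface attributes; overridable for testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# check_mtu IFACE EXPECTED: report whether IFACE currently has MTU EXPECTED.
# Returns 0 on match, 1 on mismatch, 2 if the interface does not exist.
check_mtu() {
    iface=$1
    want=$2
    file="$SYSFS_NET/$iface/mtu"
    if [ ! -r "$file" ]; then
        echo "$iface: no such interface"
        return 2
    fi
    have=$(cat "$file")
    if [ "$have" -eq "$want" ]; then
        echo "$iface: ok ($have)"
    else
        echo "$iface: mtu is $have, expected $want"
        return 1
    fi
}

cpu_mtu=2030
check_mtu eth0 "$cpu_mtu" || true
for p in lan1 lan2 lan3 lan4; do
    # The user ports get the CPU port MTU minus 4 (2030-4 = 2026).
    check_mtu "$p" $((cpu_mtu - 4)) || true
done
```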

But batman still doesn't work on the wired links. Neither ip l set switch.$BatHardIfVlanNum master bat0 nor batctl if add switch.7 succeeds.

--> That makes it impossible to have batman and other traffic in different vlans on the same switch.

  • lan3 and lan4 (batman hardifs) can be put in a separate bridge without vlan_filtering, and this bridge can then be added to bat0, but neighbors never show up on the batman level, even though I can see them in bridge fdb show dev hi4b. -> No mesh traffic.

  • Adding lan{3,4} directly to bat0 also doesn't work. The DSA host does not see the non-DSA mesh partners.

I suspect there is some sort of problem with MAC address learning for the MAC of the CPU-port side of the bridge.

Backport this commit then

Thanks for pointing out that patch series. I looked through the other 3 too, and they look useful as well, so I'm going to (attempt to) backport the 4 patches, and will report back (probably) later today.

Any other maybe-probably-useful patch series you have up your sleeve?

That patch does not apply cleanly. I tried adapting it; with my first attempt the hunks apply, but the result doesn't compile.
I will look into it more later today.
