Batman stopped working on Ath79 snapshot after last batch of mac80211 updates

@nbd Up to the current git master head (r17467, 5.4.143), I have isolated the following block of commits that have broken batman since mid-August:

0f6887972a mac80211: add missing change for encap offload on devices with sw rate control
f04c0ead33 mac80211: refresh patch
a0d81ba0d5 mac80211: fix HT40 mode for 6G band

The symptoms are that batctl n shows neighbors, but batctl o shows no originators, batctl ping fails to/from all shown neighbors, and no mesh traffic goes through bat0. The other devices do not even see this device as a neighbor. The actual physical interface works fine: I was able to SSH to the IPv6 address assigned to the hard interface below batman.
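The symptom described above (neighbors listed, originator table empty) can be checked mechanically. The following is only a minimal sketch: the helper names count_mac_lines and mesh_symptom are made up for illustration, and the parsing assumes that each table entry is a line containing a MAC address and that the batctl header line starts with "[". The batctl output layout can differ between versions, so treat the counts as a rough indication.

```shell
# Count table rows: lines that contain a MAC address, skipping the
# bracketed header line (assumed to start with "[").
count_mac_lines() {
    grep -v '^\[' | grep -Ec '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}'
}

# $1 = captured text of "batctl n", $2 = captured text of "batctl o"
mesh_symptom() {
    n=$(printf '%s\n' "$1" | count_mac_lines)
    o=$(printf '%s\n' "$2" | count_mac_lines)
    if [ "$n" -gt 0 ] && [ "$o" -eq 0 ]; then
        echo "neighbors=$n originators=$o: matches the reported symptom"
    else
        echo "neighbors=$n originators=$o"
    fi
}
```

On a node it could be called as mesh_symptom "$(batctl n)" "$(batctl o)".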

I noticed this when my on-desk test device (CONFIG_TARGET_ath79_generic_DEVICE_glinet_gl-ar300m-lite=y) started exhibiting the problem. Other devices that were still working ran 5.4.142, so the regression had to come after that. Reverting the above 3 commits with git gets Batman working normally again.

Can you ping the IP address of the other node?

It seems the bridge definition part has changed in the latest version of OpenWrt.
How have you defined your LAN mesh bridge in /etc/config/network?

Can you ping the IP address of the other node?

Mesh-nodes can ping6 the IPv6 of each-others mesh2g-interface, yes.
None of the other mesh nodes can batctl p the AR300M running the snapshot. In fact, they don't even see the AR300M in their batctl n neighbor list.

All magically goes back to normal when using the 21.02.0-rc4.

It seems the bridge definition part has changed in the latest version of OpenWrt.

I don't understand how this comes into play at this level. Or are you not talking about the "LAN" bridge? In my case LAN is a bridge with eth0 and bat0.myVLANNumber. IP traffic to/from other devices on the LAN L2 segment does not work.

OK. I had strange behaviour on an MT7620 as soon as I enabled encryption; it only worked after I disabled encryption.

Does the throughput perhaps stay at 1 Mbps when you run iw station dump?

I have MT76 devices as well. Will also test them.

No, the station dump looks "normal", I think:

Station 12:34:56:78:90:de (on mesh2g)
        inactive time:  10 ms
        rx bytes:       1246718
        rx packets:     7032
        tx bytes:       118026
        tx packets:     547
        tx retries:     161
        tx failed:      0
        rx drop misc:   54
        signal:         -40 [-45, -42] dBm
        signal avg:     -41 [-46, -43] dBm
        Toffset:        775307160489 us
        tx bitrate:     130.0 MBit/s MCS 15
        tx duration:    52663 us
        rx bitrate:     144.4 MBit/s MCS 15 short GI
        rx duration:    10664262 us
        airtime weight: 256
        expected throughput:    45.226Mbps
        mesh llid:      0
        mesh plid:      0
        mesh plink:     ESTAB
        mesh airtime link metric: 177
        mesh connected to gate: no
        mesh connected to auth server:  no
        mesh local PS mode:     ACTIVE
        mesh peer PS mode:      ACTIVE
        mesh non-peer PS mode:  ACTIVE
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       long
        WMM/WME:        yes
        MFP:            yes
        TDLS peer:      no
        DTIM period:    2
        beacon interval:5000
        connected time: 475 seconds
        associated at [boottime]:       37.992s
        associated at:  1630389169897 ms
        current time:   1630389644741 ms
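When several peers are listed, the fields that matter for the question above (peer link state, tx bitrate, expected throughput) are easy to lose in the full dump. A minimal sketch of a filter follows; the function name summarize_stations is made up for illustration, and it assumes the field labels appear exactly as in the dump shown here.

```shell
# Reduce "iw ... station dump" output on stdin to one line per station.
summarize_stations() {
    awk '
        $1 == "Station" {
            # A new station block starts: flush the previous one first.
            if (mac != "") printf "%s plink=%s tx=%s expected=%s\n", mac, plink, rate, tput
            mac = $2; plink = "?"; rate = "?"; tput = "?"
        }
        /^[ \t]*mesh plink:/          { plink = $3 }
        /^[ \t]*tx bitrate:/          { rate = $3 " " $4 }
        /^[ \t]*expected throughput:/ { tput = $3 }
        END {
            if (mac != "") printf "%s plink=%s tx=%s expected=%s\n", mac, plink, rate, tput
        }
    '
}
```

For example: iw dev mesh2g station dump | summarize_stations. A tx bitrate stuck at a very low value would stand out immediately in this form.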

OK, I don't know what else to suggest.

I see you are installing alfred

Do you use luci-app-batman-adv ?

Thanks in any case :)

Uhm, yes, for testing. But not in the images built with imagebuilder. So it makes no difference in this case.

Where would I find that? I don't have that option/app in my menuconfig. I searched through / for "luci-app-b" and "batman".

Since I was also building LibreMesh images, I took this ipk from those builds.

LibreMesh only builds successfully up to OpenWrt 19.07.7.

It is a universal ipk, not target-specific:

luci-app-batman-adv_2021-07-08-1625782035_all.ipk

You then have a luci front-end for visualisation of your batman nodes

I was also running batman-adv over Ethernet.

I built the package

CONFIG_PACKAGE_luci-app-batman-adv=y

In slightly more detail:

  • git clone https://github.com/libremesh/lime-packages somewhere
  • cp the subfolder luci-app-batman-adv into the OpenWrt tree as feeds/packages/net/luci-app-batman-adv/
  • update & install the package feeds
  • make menuconfig
  • LuCI -> Applications -> luci-app-batman-adv
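The steps above can be sketched as a small script. This is an illustration under assumptions, not the exact commands that were run: the function names run and add_luci_batman are made up, the "-a" flags for scripts/feeds are one common way to update and install all feeds, and the location of the luci-app-batman-adv folder inside the lime-packages checkout has to be supplied by the caller. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = OpenWrt source tree
# $2 = luci-app-batman-adv folder from a lime-packages checkout
add_luci_batman() (
    # Runs in a subshell, so the caller's working directory is unchanged.
    run cd "$1" || exit 1
    run cp -r "$2" feeds/packages/net/luci-app-batman-adv
    run ./scripts/feeds update -a
    run ./scripts/feeds install -a
    # Select LuCI -> Applications -> luci-app-batman-adv here.
    run make menuconfig
)
```

After cloning https://github.com/libremesh/lime-packages and locating the folder, a dry run such as DRY_RUN=1 add_luci_batman ~/openwrt /path/to/luci-app-batman-adv shows what would be executed (both paths are placeholders).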

So now I have the LuCI Batman GUI, which tells me the same thing: neighbors can be seen, but the originator count stays at zero. My devices still running 5.4.142 (r17390+11-9baca41064) from 10 days ago still have a functioning batman. It broke somewhere since then.

Update: The three last mac80211 patches (between 5.4.142 and .143) seem to provoke this problem here.

@stragies can you please test if reverting only 0f6887972a and not the other two will also make it work again?

I thought I had already tried that last night ... that's why I wrote that reverting all 3 is required ... Oops.

Yes, reverting only 0f6887972a suffices to get Batman working again properly. Sorry for the confusion.
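For anyone who wants to reproduce this test on a tree that still contains the commit, a minimal sketch follows. The helper names run and revert_in_tree are made up for illustration, and the tree path is a placeholder. With DRY_RUN=1 the git command is only printed.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = OpenWrt source tree, remaining arguments = commits to revert
revert_in_tree() {
    tree=$1
    shift
    # --no-edit keeps the default revert commit message.
    run git -C "$tree" revert --no-edit "$@"
}
```

Usage: revert_in_tree ~/openwrt 0f6887972a, followed by a rebuild of the image.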

Hi @nbd, just to confirm: My builds since the revised patch you committed upstream do not exhibit the problem anymore. Thanks!
