Br-lan not working on mt7621 in master with batman

I've run into a weird problem with the normal br-lan bridge on mt7621 routers (Xiaomi Redmi AC2100). I'm using the master branch of OpenWrt as of a few days ago, so it is running the new DSA drivers. The switch involved is an mt7530.

I set up a "simple" batman mesh config between two of these routers, on interface bat0. If I use a wifi 802.11s link (the bat_mesh0 interface) inside bat0 to communicate between the two nodes, all is well.

If I use an ethernet cable (the bat_eth interface) between the two nodes instead, I can still ping from one node to the other. In other words, I can ssh from a laptop into one node and then ping the other node fine. But if I ping directly from that same laptop, I can reach the node it is connected to but not the other node! The bridge interface on the node is not bridging traffic between the bridged wifi AP and the bat0 lan interface.

This works fine on other hardware, like ath79 units I've tested, even on the master branch, so it's something with the mt7621/mt7530 drivers I think. Ideas?

Here's my /etc/config/network. (It's basically the same for both nodes, just different IPs.) Nothing else like wireless or firewall is different from stock default.

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdb0:42d0:69f2::/48'
        option packet_steering '1'

config interface 'lan'
        option type 'bridge'
        option ifname 'bat0 lan3 lan4'
        option proto 'static'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option gateway '192.168.2.1'
        option stp '1'
        option ipaddr '192.168.2.2'
        option dns '192.168.2.1'

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_IV'
        option aggregated_ogms '1'
        option ap_isolation '0'
        option bonding '0'
        option fragmentation '1'
        option gw_mode 'off'
        option log_level '0'
        option orig_interval '1000'
        option bridge_loop_avoidance '1'
        option distributed_arp_table '1'
        option multicast_mode '1'
        option network_coding '0'
        option hop_penalty '10'
        option isolation_mark '0x00000000/0x00000000'

config interface 'bat_mesh0'
        option mtu '2304'
        option proto 'batadv_hardif'
        option master 'bat0'

config interface 'wan'
        option ifname 'wan'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'wan'
        option proto 'dhcpv6'

config device
        option name 'lan1'
        option mtu '1560'

config device
        option name 'lan2'
        option mtu '1560'

config device
        option name 'eth0'
        option mtu '1560'

config interface 'bat_eth'
        option mtu '1560'
        option proto 'batadv_hardif'
        option master 'bat0'
        option ifname 'lan1'

config interface 'bat_eth2'
        option mtu '1560'
        option proto 'batadv_hardif'
        option master 'bat0'
        option ifname 'lan2'

option mtu '1560'

Currently the MT7621 ethernet driver does not support jumbo frames.

Right, I know. I was able to solve that issue using a few magic lines of code adjusting the mt7530 registers. But that can be for another thread.
I get the same issue of the bridge not bridging even when not doing any mtu / jumbo frame settings, and no special code to enable them in firmware.

Here's my new /etc/config/network:

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdb0:42d0:69f2::/48'
        option packet_steering '1'

config interface 'lan'
        option type 'bridge'
        option ifname 'bat0 lan1 lan2'
        option proto 'static'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option stp '1'
        option ipaddr '192.168.2.1'

config interface 'bat0'
        option proto 'batadv'
        option routing_algo 'BATMAN_IV'
        option aggregated_ogms '1'
        option ap_isolation '0'
        option bonding '0'
        option fragmentation '1'
        option gw_mode 'off'
        option log_level '0'
        option orig_interval '1000'
        option bridge_loop_avoidance '1'
        option distributed_arp_table '1'
        option multicast_mode '1'
        option network_coding '0'
        option hop_penalty '10'
        option isolation_mark '0x00000000/0x00000000'

config interface 'bat_mesh0'
        option mtu '2304'
        option proto 'batadv_hardif'
        option master 'bat0'

config interface 'wan'
        option ifname 'wan'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'wan'
        option proto 'dhcpv6'

config interface 'bat_eth'
        option proto 'batadv_hardif'
        option master 'bat0'
        option ifname 'lan3'

Try turning off STP

I had tried that. Unfortunately that does not help.

I do get this warning/crash in dmesg...

[   86.691338] ------------[ cut here ]------------
[   86.700599] WARNING: CPU: 1 PID: 0 at net/bridge/br_switchdev.c:46 br_handle_frame_finish+0xac/0x4ac
[   86.718799] Modules linked in: pppoe ppp_async iptable_nat batman_adv xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack mt7615e mt7615_common mt7603e mt76 mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_heartbeat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[   86.848169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.71 #0
[   86.859937] Stack : 00000000 8007d3b4 80680000 80681564 806e0000 8068152c 80680680 87c0dcc4
[   86.876559]         80820000 87c3c184 806c8ce3 80618ec8 00000001 00000001 87c0dc68 acc17b0e
[   86.893178]         00000000 00000000 80860000 00000000 00000030 00000186 342e3520 2031372e
[   86.909797]         00000000 00000004 00000000 000cf129 00000000 806e0000 00000000 8053c248
[   86.926416]         00000009 806c6ce4 87c0dea0 806c0000 00000002 80338848 00000004 80820004
[   86.943035]         ...
[   86.947893] Call Trace:
[   86.952775] [<8000b72c>] show_stack+0x30/0x100
[   86.961615] [<8055fd78>] dump_stack+0xa4/0xdc
[   86.970293] [<8002bf88>] __warn+0xc0/0x10c
[   86.978438] [<8002c030>] warn_slowpath_fmt+0x5c/0xac
[   86.988317] [<8053c248>] br_handle_frame_finish+0xac/0x4ac
[   86.999228] [<8053c824>] br_handle_frame+0x1dc/0x334
[   87.009114] [<80405918>] __netif_receive_skb_core+0x254/0xa90
[   87.020545] [<80406178>] __netif_receive_skb_one_core+0x24/0x50
[   87.032322] [<80406388>] process_backlog+0x9c/0x178
[   87.042024] [<80407aac>] __napi_poll+0x3c/0x10c
[   87.051034] [<80407d20>] net_rx_action+0x114/0x28c
[   87.060568] [<805808d4>] __do_softirq+0x16c/0x334
[   87.069936] [<800306f4>] irq_exit+0x98/0xb0
[   87.078254] [<802da518>] plat_irq_dispatch+0x64/0x104
[   87.088301] [<80006de8>] except_vec_vi_end+0xb8/0xc4
[   87.098193] [<8057ff88>] r4k_wait_irqoff+0x1c/0x24
[   87.107954] ---[ end trace 195a726768122039 ]---

Looks like a bug in either the Linux kernel or batman.

Yes, that really narrows it down. :)

It gets weirder.
If I ping 192.168.2.2 from a computer attached to the lan1 interface (on the bridge) of 192.168.2.1, arp requests/replies get through just fine. (arping works fine.) But the ip pings don't work, except right after an arp! So if I'm arping while pinging, many pings get through. If I'm not arping and I clear the arp cache on the computer, the first ping gets through and back, because an arp request is sent and answered just before it. After that, no pings get through until the next arp request/reply, which happens about every 40s!
tcpdump shows that the pings make it to 192.168.2.2 and a reply is generated fine, and it gets back to 192.168.2.1 on the bat0 and the br-lan interface, but does not get seen on the lan1 interface. Why?!

And neither arps nor pings get through when they come from a computer on wlan (also on the bridge). An arp reply is generated by 192.168.2.2 and gets back to 192.168.2.1 on bat0/br-lan, but it is not sent out on the wlan interface.

Something seems to be broken about whether the bridge thinks it needs to forward packets to wlan/lan from the bat0/lan interface. bat0/wlan interface (wireless mesh) works fine, remember.

I think that warning is relevant, actually. It is printed here (by the WARN_ON macro) in net/bridge/br_switchdev.c:

void nbp_switchdev_frame_mark(const struct net_bridge_port *p, struct sk_buff *skb)
{
        if (skb->offload_fwd_mark && !WARN_ON(!p->offload_fwd_mark))
                BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
}

And it's printed whenever I ask that packets go over ethernet through batman.
It seems to be related to switch flow offloading? I'm not sure what this offload_fwd_mark is for.
Ideas anyone?

I have the same problem. I think it is related to an upstream patch to the mt7530 DSA driver: [PATCH REPOST] net: dsa: mt7530: fix roaming from DSA user ports.

Delete the following code, which that patch adds to net/dsa/tag_mtk.c:

	/* Only unicast or broadcast frames are offloaded */
	if (likely(!is_multicast_skb))
		skb->offload_fwd_mark = 1;

With that removed, the bridge forwards mesh traffic over the Ethernet link normally, and there is no warning message in the kernel log.
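
For anyone wondering what offload_fwd_mark does here, below is a rough user-space sketch of the two bridge checks involved. It is a toy model, not kernel code: the toy_* names are made up for illustration, toy_frame_mark() follows the nbp_switchdev_frame_mark() function quoted earlier in the thread, and toy_allowed_egress() follows my reading of nbp_switchdev_allowed_egress() in the 5.4 bridge code. It also assumes that the flag set by the DSA tagger on the outer frame is still set on the skb after batman-adv decapsulates it, and that the mark stored in the bridge control block starts out as 0.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct toy_skb {
	bool offload_fwd_mark; /* set by the DSA tagger: "the switch already forwarded this" */
	int cb_fwd_mark;       /* models BR_INPUT_SKB_CB(skb)->offload_fwd_mark */
};

struct toy_port {
	int offload_fwd_mark;  /* 0 for ports that are not switch ports, e.g. bat0 or a wlan AP */
};

/*
 * Models nbp_switchdev_frame_mark() on bridge ingress.
 * Returns true in the case where the kernel would print the WARNING.
 */
static bool toy_frame_mark(const struct toy_port *in, struct toy_skb *skb)
{
	if (!skb->offload_fwd_mark)
		return false;           /* unflagged frame: nothing to do */
	if (in->offload_fwd_mark == 0)
		return true;            /* flagged frame on a non-switch port: WARN_ON fires */
	skb->cb_fwd_mark = in->offload_fwd_mark;
	return false;
}

/*
 * Models the egress check (assumed to match nbp_switchdev_allowed_egress()):
 * a flagged frame is not sent to a port whose mark equals the mark
 * recorded on ingress.
 */
static bool toy_allowed_egress(const struct toy_port *out, const struct toy_skb *skb)
{
	return !skb->offload_fwd_mark || skb->cb_fwd_mark != out->offload_fwd_mark;
}
```

In this model, a flagged frame entering the bridge through bat0 (mark 0) triggers the warning, keeps a recorded mark of 0, and is then refused egress to any other port whose mark is also 0, such as the wlan AP. An unflagged frame passes both checks, which would be consistent with the warning and the missing forwarding both going away once tag_mtk.c stops setting the flag. The sketch does not try to explain the lan1 behaviour described above.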