DSA and Appletalk multicast issues

Hello,

I'm new at DSA and layer 3 networking so apologies if this is beyond basic.

I am testing the Appletalk protocol on OpenWrt (old, I know). It is a non-IP protocol that relies on layer-3 multicasting for handshakes.

On both DSA / non-DSA builds, I can see incoming traffic (<remoteMAC> below) on br-lan with tcpdump:

00:27:03.652198 <localMAC> (oui Unknown) > 09:00:07:ff:ff:ff (oui Unknown), 802.3, length 36: LLC, dsap SNAP (0xaa) Individual, ssap SNAP (0xaa) Command, ctrl 0x03: oui Ethernet (0x000000), ethertype Appletalk ARP (0x80f3), length 28: aarp probe 6400.206 tell 6400.206
00:27:06.982423 <remoteMAC> (oui Unknown) > 09:00:07:ff:ff:ff (oui Unknown), 802.3, length 37: LLC, dsap SNAP (0xaa) Individual, ssap SNAP (0xaa) Command, ctrl 0x03: oui Appletalk (0x080007), pid Appletalk (0x809b), length 29: 6400.148.rtmp > 0.rtmp:  at-rtmp 16

Somehow that second packet never reaches the AppleTalk daemon's listening socket on the DSA build. It does on non-DSA (23.05) builds. I am narrowing down on the commit when things stop working for the two platforms I have here and I'm getting pretty confident this is a DSA thing.

Somehow the kernel with a DSA managed switch (bridge?) is either filtering that second packet, or not flooding it, or not delivering it to the listening socket.

Router is running with no firewall (it is stopped and disabled) so it can't be that.

Any pointers on what to look for? Any suggestions on how to get more details on what the bridge is doing (or what the kernel is doing with that bridge)? How to query or set or clear filtering? I've read about the bridge command in ip-bridge but don't really know what to do with it. Anything else?

Thanks in advance.

Try tcpdump without promisc and/or setting involved interfaces promisc.
That is the easiest difference between tcpdump and normal workings.

In the meantime

ubus call system board
cat /etc/config/network

Probably save 10..100..1000 packets (tcpdump ... ... -c 100 -w /tmp/probe1.pcap) then check with Wireshark on a desktop.
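
A minimal sketch of the promisc / non-promisc comparison, assuming tcpdump is installed and the traffic of interest goes to the 09:00:07:ff:ff:ff group address seen in the first post. The helper name capture_cmd and the /tmp file names are made up for illustration. It only prints the command lines, so nothing is captured until you run them yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the tcpdump command line for one capture run.
# $1 = interface, $2 = "promisc" or "nopromisc", $3 = packet count
capture_cmd() {
    flag=""
    # -p (--no-promiscuous-mode) stops tcpdump from putting the interface
    # into promiscuous mode for the duration of the capture
    if [ "$2" = "nopromisc" ]; then flag="-p "; fi
    # The filter keeps only frames sent to the AppleTalk group address
    echo "tcpdump ${flag}-e -i $1 -c $3 -w /tmp/$1-$2.pcap ether dst 09:00:07:ff:ff:ff"
}

# Print both variants for br-lan
capture_cmd br-lan promisc 100
capture_cmd br-lan nopromisc 100
```

If the non-promisc capture stays empty while the promisc one fills up, that would point at address filtering rather than at the daemon.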

Hm personally I found the most useful command in the bridge tool to be 'monitor'. For example if you use igmp-snooping, you could:

bridge monitor mdb

or some such, then disconnect/reconnect devices, and bridge should dump on console what multicast groups the devices join etc.

(Also useful for monitoring other bridge events such as MACs moving around links. Not so relevant here, but that is what I've actually used it for.)

:+1: good idea

Some random commands:

root@OpenWrt:~# ip stats show dev br-lan
root@OpenWrt:~# grep "" /sys/devices/virtual/net/br-lan/statistics/*

If you want to post a single AppleTalk and/or AppleTalk-ARP frame here so the rest of us can replay it with tcpreplay and test:

root@OpenWrt:~# tshark -r input.pcap -w output.pcap -Y 'frame.number eq 5'
root@OpenWrt:~# base64 < output.pcap 

That is, for example, if the AppleTalk frame is captured frame number 5.
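
For whoever picks up the pasted text, a rough sketch of the round trip, assuming only a base64 tool that accepts -d. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper names; only base64 is assumed to be available.

# Encode a capture file as text that can be pasted into a post
pcap_to_text() {
    base64 < "$1"
}

# Decode pasted text (read from stdin) back into a capture file
text_to_pcap() {
    base64 -d > "$1"
}
```

On the receiving side, something like text_to_pcap frame.pcap < pasted.txt followed by tcpreplay -i eth0 frame.pcap should put the frame back on the wire, assuming tcpreplay is installed and eth0 (a placeholder) is the desktop interface facing the router.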

I think there's also a "perf" utility or something that works a bit like dtrace, and can be hooked up to some DSA kernel function one thinks might be dropping a frame, and it will print a stack trace when it hits, but probably premature at this point. Also assuming the frame actually enters the switch.

All good stuff and much appreciated. Need to reflash both devices with snapshot now that the module is available and custom builds are no longer required.
Gimme a day and I'll get back with details.
Thanks again!

Pretty vanilla config.
Same behaviour on the ipq806x below or on an ath79, so this is not a device thing.

root@OpenWrt:~# ubus call system board
{
	"kernel": "6.6.63",
	"hostname": "OpenWrt",
	"system": "ARMv7 Processor rev 0 (v7l)",
	"model": "TP-Link Archer C2600",
	"board_name": "tplink,c2600",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r28197-5695267847",
		"target": "ipq806x/generic",
		"description": "OpenWrt SNAPSHOT r28197-5695267847",
		"builddate": "1732549290"
	}
}
root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdc5:8c02:920a::/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.128.63'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option gateway '192.168.128.1'
	list dns '192.168.120.3'

config interface 'wan'
	option device 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option device 'wan'
	option proto 'dhcpv6'

I'll work on that capture tomorrow.

root@TestWrt-63:~# ip stats show dev br-lan
10: br-lan: group offload subgroup hw_stats_info
    l3_stats off used off
10: br-lan: group xstats_slave subgroup bond suite 802.3ad
10: br-lan: group xstats_slave subgroup bridge suite mcast
10: br-lan: group xstats_slave subgroup bridge suite stp
10: br-lan: group xstats subgroup bond suite 802.3ad
10: br-lan: group xstats subgroup bridge suite mcast
                    IGMP queries:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP reports:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP leaves: RX: 0 TX: 0
                    IGMP parse errors: 0
                    MLD queries:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD reports:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD leaves: RX: 0 TX: 0
                    MLD parse errors: 0

10: br-lan: group xstats subgroup bridge suite stp
10: br-lan: group afstats subgroup mpls
10: br-lan: group offload subgroup l3_stats off used off
10: br-lan: group offload subgroup cpu_hit

10: br-lan: group link
    RX:  bytes packets errors dropped  missed   mcast           
      43606090   53112      0       0       0   10639 
    TX:  bytes packets errors dropped carrier collsns           
       4543466    7709      0       0       0       0 

Run atalkd and everything looks the same but for the counters at the bottom:

10: br-lan: group link
    RX:  bytes packets errors dropped  missed   mcast           
      43943736   53717      0       0       0   10728 
    TX:  bytes packets errors dropped carrier collsns           
       4655919    7904      0       0       0       0 
root@TestWrt-63:~# grep "" /sys/devices/virtual/net/br-lan/statistics/*
/sys/devices/virtual/net/br-lan/statistics/collisions:0
/sys/devices/virtual/net/br-lan/statistics/multicast:11460
/sys/devices/virtual/net/br-lan/statistics/rx_bytes:46554162
/sys/devices/virtual/net/br-lan/statistics/rx_compressed:0
/sys/devices/virtual/net/br-lan/statistics/rx_crc_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_dropped:0
/sys/devices/virtual/net/br-lan/statistics/rx_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_fifo_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_frame_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_length_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_missed_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_nohandler:0
/sys/devices/virtual/net/br-lan/statistics/rx_over_errors:0
/sys/devices/virtual/net/br-lan/statistics/rx_packets:58026
/sys/devices/virtual/net/br-lan/statistics/tx_aborted_errors:0
/sys/devices/virtual/net/br-lan/statistics/tx_bytes:5432589
/sys/devices/virtual/net/br-lan/statistics/tx_carrier_errors:0
/sys/devices/virtual/net/br-lan/statistics/tx_compressed:0
/sys/devices/virtual/net/br-lan/statistics/tx_dropped:0
/sys/devices/virtual/net/br-lan/statistics/tx_errors:0
/sys/devices/virtual/net/br-lan/statistics/tx_fifo_errors:0
/sys/devices/virtual/net/br-lan/statistics/tx_heartbeat_errors:0
/sys/devices/virtual/net/br-lan/statistics/tx_packets:9075
/sys/devices/virtual/net/br-lan/statistics/tx_window_errors:0

I need to read up on the bridge command as I'm not sure what it does. But it listed the broadcast address AppleTalk is using:

root@TestWrt-63:~# bridge -d fdb
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
   ...tons more stuff...
33:33:ff:05:3a:77 dev br-lan self permanent
33:33:ff:00:00:00 dev br-lan self permanent
09:00:07:ff:ff:ff dev br-lan self permanent
09:00:00:ff:ff:ff dev br-lan self permanent

That 09:00:00:ff:ff:ff is critical for this to work.

Again... apologies as I don't know what I'm doing but...

In 23.05:

root@TestWrt-64:~# bridge link
8: eth1.1@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-lan state forwarding priority 32 cost 4 

and

root@TestWrt-64:~# ip stats show dev eth1.1
...bunch of stuff...
8: eth1.1: group link
    RX:  bytes  packets errors dropped  missed   mcast           
    8941477829 10159861      0       0       0 1591920 
    TX:  bytes  packets errors dropped carrier collsns           
      33280074   294062      0       0       0       0 

(br-lan reports ballpark the same mcast value)

While on snapshot:

root@TestWrt-63:~# bridge link
4: lan4@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br-lan state disabled priority 32 cost 100 
5: lan3@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br-lan state disabled priority 32 cost 100 
6: lan2@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br-lan state disabled priority 32 cost 100 
7: lan1@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-lan state forwarding priority 32 cost 5 

and

root@TestWrt-63:~# ip stats show dev lan1
...another bunch of stuff...
7: lan1: group link
    RX:  bytes packets errors dropped  missed   mcast           
      77315903  109776      0       0       0       0 
    TX:  bytes packets errors dropped carrier collsns           
      14577741   27014      0       0       0       0 

Zero on mcast. While br-lan has plenty:

root@TestWrt-63:~# ip stats show dev br-lan
...yet more stuff...
10: br-lan: group link
    RX:  bytes packets errors dropped  missed   mcast           
      75125994  106465      0       0       0   19175 
    TX:  bytes packets errors dropped carrier collsns           
      14468758   23468      0       0       0       0 

Might mean nothing, but 23.05 seems to have a link between multicasts on bridge and on interface, while snapshot sees no multicasts on interface?
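
A small sketch for watching that counter on several interfaces over the same interval, assuming the usual /sys/class/net/<iface>/statistics/multicast files. The function names are made up, and the optional sysfs root argument exists only so the functions can be tried against a fake directory. Note that a zero on a DSA port might simply mean the driver does not maintain this field; that is an assumption worth checking, not a conclusion.

```shell
#!/bin/sh
# Hypothetical helper: read the RX multicast counter of one interface.
# $1 = interface, $2 = sysfs root (optional, defaults to /sys/class/net)
mcast_count() {
    cat "${2:-/sys/class/net}/$1/statistics/multicast"
}

# Print how much the counter grew on one interface over an interval.
# $1 = interface, $2 = seconds to wait, $3 = sysfs root (optional)
mcast_delta() {
    before=$(mcast_count "$1" "${3:-}")
    sleep "$2"
    after=$(mcast_count "$1" "${3:-}")
    echo "$1: +$((after - before))"
}
```

Usage could be: for i in br-lan lan1 eth1; do mcast_delta "$i" 10; done, run while the remote host keeps sending RTMP packets.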

You were advised to set promisc flag on interface, but you are still producing output without any.

Ah. Sorry. Misunderstood that you were providing two options. Doesn't help that I'm not too sure yet what I'm looking at, but I think it's beginning to make sense. Thanks for the patience so far.

Running
root@TestWrt-63:~# tcpdump --no-promiscuous-mode -i br-lan -e | grep Appletalk
results in the same output as before. Same for lan1. I see packets coming in from outside on both lan1 and br-lan, but the daemon is oblivious.
For eth1 I hit a nice known bug in libpcap (I think I saw the issue open somewhere):

root@TestWrt-63:~# tcpdump --no-promiscuous-mode  -i eth1 -e | grep Appletalk
tcpdump: unsupported DSA tag: qca

tcpdump with no arguments gives me the same error.
Annoyingly enough, libpcap 1.10.4 on 23.05 doesn't have this problem. (or... funny coincidence, libpcap 1.10.5 on Snapshot doesn't like DSA).
I'll flash the ath79 device tomorrow and hopefully this works on that one.

If I run ip link set br-lan promisc on (and/or the same for lan1 and/or eth1), ifconfig then shows: UP BROADCAST RUNNING PROMISC MULTICAST MTU:1502 Metric:1
The output is again identical (including the libpcap error on eth1). I can see the MAC address of the 23.05 device on the snapshot one at both the bridge and the lan port, but the daemon is none the wiser.

If you were expecting something else, please let me know.

I'll look into packet captures tomorrow.

Well that particular statistics field/number is "RX mcast".

Is there any equipment connected to lan1 that is sending multicast into the port?

If there is only multicast traffic going "the other direction" (ie. origin is atalkd daemon running on openwrt, into the br-lan interface, frame copy to switch hardware, presumably out of some 'state forwarding' phys ports...) then zero for lan1 could be correct. It would be TXing the atalkd mcast frame, and just not receiving anything in return.

Ok. Looking at the tcpdump:

Might also be prudent to make sure that the 09:00:07:ff:ff:ff MAC is defiltered.

(Looks like it already is though!)

Wow yes, brings back a mix of happy and traumatic memories :wink:

To make it work you need to enable igmp_snooping. I'm not sure if this is the difference between DSA and non-DSA .... probably.
https://openwrt.org/docs/guide-user/network/network_configuration#bridge_options

Yeah. Who would've thought there's a community out there addicted to those. :crazy_face:
I'm looking at it as a learning opportunity. And this issue made me lose endless time thinking something was busted in netatalk code until I decided to test in 23.05 where it all magically works, so I'm after the kernel for payback.

Tried that before, with no change in behaviour. I enabled it both via the OpenWrt config:
option igmp_snooping 1
and directly:
echo "1" > /sys/devices/virtual/net/br-lan/bridge/multicast_snooping

Without being an expert, so take my terminology with a full bag of salt... I don't think the issue is between the ports and the bridge, or in the bridge itself, as I can see packets coming and going (besides that multicast counter at zero being weird).
Issue might be between kernel (DSA bits which I half understand) and atalkd socket.

What does DSA mess up so that code that opens a socket on br-lan no longer gets traffic? And ideally, what can be done to DSA config so it plays nice as before?

Thanks! How could I check for that?
Regarding 09:00:00 vs 09:00:07... 07 is the correct address. 00 is a defect in netatalk 4.0.4, fixed in 4.0.6. I will do an ad-hoc build to test 4.0.6, but I don't think this is material as 4.0.4 works OK in non-DSA setups. That is, unless you think this is what could be messing with the way DSA deals with the whole thing.
I'll get that build going...

I *think* that bridge fdb | grep 09:00:07:ff:ff:ff also represents what the hardware MAC filters will be programmed with...
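
A rough sketch of that check that also shows which devices carry the entry, assuming the line layout of the bridge fdb output posted earlier in the thread (MAC first, then "dev <name>"). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `bridge fdb` output on stdin and print the
# device name of every entry whose MAC matches $1 (case-insensitive).
fdb_devs_for_mac() {
    awk -v mac="$1" '
        tolower($1) == tolower(mac) {
            # The device name is the field that follows the "dev" keyword
            for (i = 2; i < NF; i++)
                if ($i == "dev") print $(i + 1)
        }'
}
```

Usage would be bridge fdb show | fdb_devs_for_mac 09:00:07:ff:ff:ff. If the address shows up only on br-lan and not on the lan ports or on the device behind them, that might be worth a closer look; whether it matters here is not established.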

Not sure at this point. I think that @brada4 had the right idea: use tcpdump to verify that mcast from atalkd is emitted out of lan1, and that mcast from whatever is connected to lan1 also ends up on br-lan.

It was always layer 2 multicasting as far as I can remember. A bridge knows nothing about nor cares about layer 3.

Ah! I missed that bit.
The old netatalk package disappeared many years ago, long before DSA was an OpenWrt thing (because appletalk support was removed from the kernel).

If you have a third party appletalk daemon, you should contact the developers of it. If it is what you have, it is more likely that it does not understand DSA, rather than DSA being broken.

Yes. I think we are talking about the same thing but I'm not using terminology right.
DDP (AppleTalk) is layer 3, but it's not IP.
It relies on layer 2 multicasting (Ethernet level), not layer 3 multicasting (what IP 224.0.0.0/4 would be).
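
To make the layer 2 part concrete: an Ethernet address is a group (multicast) address when the least-significant bit of its first octet is set, regardless of which layer 3 protocol rides on top. A small sketch, with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: report whether an Ethernet MAC is a group
# (multicast) address by testing the low bit of its first octet.
is_multicast_mac() {
    first=${1%%:*}                       # keep only the first octet
    if [ $(( 0x$first & 1 )) -eq 1 ]; then
        echo multicast
    else
        echo unicast
    fi
}

is_multicast_mac 09:00:07:ff:ff:ff   # AppleTalk group address from the capture
is_multicast_mac 01:00:5e:00:00:01   # Ethernet mapping of IPv4 224.0.0.1
```

Both addresses are multicast at the Ethernet level, which is the only level the bridge looks at.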

The appletalk kernel module never left the kernel tree, and is still maintained (a patch is going in for 6.9). But yes, it was dropped by OpenWrt builds years ago. It's back in snapshots now, along with the netatalk package.
The module enables DDP in the kernel. If DSA breaks the module in a way that can't be addressed by configuration, this is a mainline kernel issue with slim odds it will ever get solved. I'll poke at it but it is way above my pay grade.

The user space daemon is atalkd (part of netatalk), which is also actively maintained and has been in use for a long time, especially on BSD platforms. I contacted the maintainers (netatalk upstream) to fix build issues and to poke at this, but DSA is news to them.

So that's where I am now and where I'm looking for ideas... what change could DSA cause to break either:

  • the interface (br-lan) to kernel module connection, something hopefully OpenWrt community can throw me some pointers about, fix through config or fix through implementation changes to DSA on OpenWrt (long shot - I have an issue open for that), or
  • kernel module to user space daemon, which I'll try patch myself and/or take to netatalk upstream.

In summary... the layer 3 implementation (kernel module) works nicely with layer 2 (bridge) on non-DSA setups (23.05, Ubuntu). Layer 2 had some change with DSA that broke this.
Half broke it, as the userspace daemon can send the multicasts, to the point working non-DSA setups see the traffic, but that daemon on a DSA setup is not receiving multicasts back.