Somehow that second packet never reaches the AppleTalk daemon's listening socket on the DSA build. It does on non-DSA (23.05) builds. I am narrowing down on the commit when things stop working for the two platforms I have here and I'm getting pretty confident this is a DSA thing.
Somehow the kernel with a DSA managed switch (bridge?) is either filtering that second packet, or not flooding it, or not delivering it to the listening socket.
Router is running with no firewall (it is stopped and disabled) so it can't be that.
Any pointers on what to look for? Any suggestions on how to get more details on what the bridge is doing (or what the kernel is doing with that bridge)? How to query or set or clear filtering? I've read about the bridge command in ip-bridge but don't really know what to do with it. Anything else?
For example, if the AppleTalk frame is frame number 5 in the capture.
I think there's also a "perf" utility or something that works a bit like dtrace, and can be hooked up to some DSA kernel function one thinks might be dropping a frame, and it will print a stack trace when it hits, but probably premature at this point. Also assuming the frame actually enters the switch.
All good stuff and much appreciated. Need to reflash both devices with snapshot now that the module is available and custom builds are no longer required.
Gimme a day and I'll get back with details.
Thanks again!
Ah. Sorry. Misunderstood that you were providing two options. Doesn't help that I'm not too sure yet what I'm looking at, but I think it's beginning to make sense. Thanks for the patience so far.
root@TestWrt-63:~# tcpdump --no-promiscuous-mode -i br-lan -e | grep Appletalk results in the same output as before. So does the same command on lan1. I see packets coming in from outside at both lan1 and br-lan, but the daemon is oblivious.
For eth1 I hit a nice known bug in libpcap (I think I saw the issue open somewhere):
tcpdump with no arguments gives me the same error.
Annoyingly enough, libpcap 1.10.4 on 23.05 doesn't have this problem. (or... funny coincidence, libpcap 1.10.5 on Snapshot doesn't like DSA).
I'll flash the ath79 device tomorrow and hopefully this works on that one.
If I run ip link set br-lan promisc on (and/or the same for lan1 and/or eth1), so that ifconfig shows: UP BROADCAST RUNNING PROMISC MULTICAST MTU:1502 Metric:1
the output is again identical (including the libpcap error on eth1). I can see the MAC address of the 23.05 device on the snapshot one, at both the bridge and the LAN port, but the daemon is none the wiser.
If you were expecting something else, please let me know.
Well that particular statistics field/number is "RX mcast".
Is there any equipment connected to lan1 that is sending multicast into the port?
If there is only multicast traffic going "the other direction" (ie. origin is atalkd daemon running on openwrt, into the br-lan interface, frame copy to switch hardware, presumably out of some 'state forwarding' phys ports...) then zero for lan1 could be correct. It would be TXing the atalkd mcast frame, and just not receiving anything in return.
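To tell "nothing is being received" apart from "the counter is simply not moving right now", it may help to sample the RX mcast counter twice and look at the difference. Below is a minimal sketch that reads the statistics/multicast file under /sys/class/net. The function name rx_mcast_delta and the SYSFS_NET override are made up for this example, and whether a DSA port updates this counter for every frame can depend on the switch driver.

```shell
# Print how much an interface's RX multicast counter grew over an interval.
# SYSFS_NET is an assumption added for this sketch so the function can be
# pointed at a test directory; by default it reads /sys/class/net.
rx_mcast_delta() {
    iface=$1
    seconds=${2:-10}
    counter="${SYSFS_NET:-/sys/class/net}/$iface/statistics/multicast"
    [ -r "$counter" ] || { echo "cannot read $counter" >&2; return 1; }
    before=$(cat "$counter")
    sleep "$seconds"
    after=$(cat "$counter")
    echo "$iface: RX mcast +$((after - before)) in ${seconds}s"
}
```

Running rx_mcast_delta lan1 30 and then rx_mcast_delta br-lan 30 while the device on lan1 is known to be sending should show whether the count grows on the port, on the bridge, on both, or on neither.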
Yeah. Who would've thought there's a community out there addicted to those.
I'm looking at it as a learning opportunity. And this issue made me lose endless time thinking something was busted in netatalk code until I decided to test in 23.05 where it all magically works, so I'm after the kernel for payback.
Tried that before, with no change in behaviour, whether enabling it via the OpenWrt config (option igmp_snooping 1) or with echo "1" > /sys/devices/virtual/net/br-lan/bridge/multicast_snooping
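To double-check that the setting actually took effect on the running bridge, the value can be read back from sysfs. This is only a sketch: the function name snooping_state and the SYSFS_NET override are invented for the example, and it reports the setting without saying anything about how snooping treats non-IP multicast.

```shell
# Report the multicast_snooping setting of a bridge (default: br-lan).
# SYSFS_NET is an assumption added for this sketch; by default the
# function reads /sys/class/net, which links to the same sysfs device.
snooping_state() {
    bridge=${1:-br-lan}
    f="${SYSFS_NET:-/sys/class/net}/$bridge/bridge/multicast_snooping"
    [ -r "$f" ] || { echo "$bridge: no bridge/multicast_snooping attribute" >&2; return 1; }
    case "$(cat "$f")" in
        1) echo "$bridge: multicast snooping enabled" ;;
        0) echo "$bridge: multicast snooping disabled" ;;
        *) echo "$bridge: unexpected value" ;;
    esac
}
```

Calling snooping_state br-lan after each change confirms which state a given test was really run in.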
I'm no expert, so take my terminology with a full bag of salt... I don't think the issue is between the ports and the bridge, or in the bridge itself, as I can see packets coming and going (besides that multicast counter at zero being weird).
Issue might be between kernel (DSA bits which I half understand) and atalkd socket.
What does DSA mess up so that code that opens a socket on br-lan no longer gets traffic? And ideally, what can be done to DSA config so it plays nice as before?
Thanks! How could I check for that?
Regarding 09:00:00 vs 09:00:07... 07 is the correct address. 00 is a defect in netatalk 4.0.4 fixed in 4.0.6. I will do an ad-hoc build to test 4.0.6 but I don't think this is material as 4.0.4 works OK in non-DSA setups. That... unless... you think this is what could be messing up the way DSA is dealing with the whole thing.
I'll get that build going...
I *think* that bridge fdb | grep 09:00:07:ff:ff:ff also represents what the hardware MAC filters will be programmed with...
Not sure at this point. I think that @brada4 had the right idea - use tcpdump to verify that mcast from atalkd is emitted out lan1, and that mcast from whatever is connected to lan1 also ends up on br-lan.
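A rough way to run that check is to capture on each interface with a filter on the AppleTalk multicast destination instead of grepping the decoded output. The sketch below assumes the 09:00:07:ff:ff:ff group discussed in this thread. The function name watch_atalk_mcast and the DRY_RUN switch are made up for the example. The -e flag prints the source MAC, which is how to tell frames from atalkd apart from frames sent by the device on lan1.

```shell
# Destination group taken from this thread; adjust if your setup differs.
ATALK_BCAST=09:00:07:ff:ff:ff

# Capture up to 5 matching frames on each interface given as an argument.
# With DRY_RUN set (an assumption of this sketch) the tcpdump command is
# only printed, which is handy for checking it before running as root.
watch_atalk_mcast() {
    for iface in "$@"; do
        cmd="tcpdump -e -n -c 5 -i $iface ether dst $ATALK_BCAST"
        if [ -n "$DRY_RUN" ]; then
            echo "$cmd"
        else
            echo "== $iface =="
            $cmd
        fi
    done
}
```

For example, watch_atalk_mcast lan1 br-lan captures on the port first and then on the bridge. Each capture waits until five matching frames have been seen, so interrupt it if nothing arrives.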
It was always layer 2 multicasting as far as I can remember. A bridge knows nothing about nor cares about layer 3.
Ah! I missed that bit.
The old netatalk package disappeared many years ago, long before DSA was an OpenWrt thing (because appletalk support was removed from the kernel).
If you have a third party appletalk daemon, you should contact the developers of it. If that is what you have, it is more likely that it does not understand DSA, rather than DSA being broken.
Yes. I think we are talking about the same thing but I'm not using terminology right.
DDP (AppleTalk) is layer 3, but it's not IP.
It relies on layer 2 multicasting (ethernet level), not layer 3 multicasting (what IP 224.0.0.0/4 would be).
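For anyone following along, "layer 2 multicast" here just means the group bit is set in the destination MAC: the least significant bit of the first octet. A small sketch of that test follows (the function name is_l2_multicast is invented for the example).

```shell
# Succeed if the MAC address has the Ethernet group (multicast) bit set,
# i.e. the least significant bit of its first octet is 1.
is_l2_multicast() {
    first_octet=${1%%:*}
    [ $(( 0x$first_octet & 1 )) -eq 1 ]
}
```

With this, is_l2_multicast 09:00:07:ff:ff:ff succeeds because 0x09 is odd, while an ordinary unicast address such as 00:11:22:33:44:55 fails the test. No IP addressing is involved at any point.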
The appletalk kernel module never left the kernel tree, and is still maintained (a patch is going in for 6.9). But yes, it was dropped by OpenWrt builds years ago. It's back in snapshots now, along with the netatalk package.
The module enables DDP in the kernel. If DSA breaks the module in a way that can't be addressed by configuration, this is a mainline kernel issue with slim odds it will ever get solved. I'll poke at it but it is way above my pay grade.
The user space daemon is atalkd (part of netatalk), which is also actively maintained and has been in use for a long time, especially on BSD platforms. I contacted the maintainers (netatalk upstream) to fix build issues and to poke at this, but DSA is news to them.
So that's where I am now and where I'm looking for ideas... what change could DSA cause to break either:
the interface (br-lan) to kernel module connection, something hopefully OpenWrt community can throw me some pointers about, fix through config or fix through implementation changes to DSA on OpenWrt (long shot - I have an issue open for that), or
kernel module to user space daemon, which I'll try to patch myself and/or take to netatalk upstream.
In summary... the layer 3 implementation (kernel module) works nicely with layer 2 (bridge) on non-DSA setups (23.05, Ubuntu). Layer 2 had some change with DSA that broke this.
Half broke it, that is: the userspace daemon can send the multicasts, to the point that working non-DSA setups see the traffic, but that daemon on a DSA setup is not receiving multicasts back.