DSA and Appletalk multicast issues

Need to:

  • service stop afpd
  • assuming debug output is needed, open a separate terminal and run atalkd -d
  • assuming you really want afp, service start afpd

appletalk = yes is what tells afpd to look for a running atalkd and register on it by opening a port for itself and adding the AFPServer name to the zone.

If you don't care about atalkd's stdout debug, running it without -d will make it fork a server per interface and exit quietly. I have yet to build the scripts to start and stop it regularly.

Default configuration (empty /etc/atalkd.conf) will run atalkd fine.
If you want it to run on particular interfaces, list one interface per line.
For the time being, on my testing box I forced a server on every interface, which is not correct. Like this:

# AppleTalk daemon configuration (netatalk 4.x)
#
# See the `atalkd.conf' manual page for examples.
br-lan -seed -phase 2 -net 300 -addr 300.180 -zone "bridge"
lan1 -seed -phase 2 -net 200 -addr 200.238 -zone "lan"
eth1 -seed -phase 2 -net 100 -addr 100.228 -zone "ethernet"

Well - it is not surprise I feel about your PR being accepted - I am horrified!! How did you get away with it? I will have to look up who reviewed and merged it :wink:
I don't mean this in a bad way. In fact, I think it is excellent that you are putting in the effort to get it going.

Well, you're going to be busy. The module got backported into 24.10 (just got told today). Better get started... we're looking at 5-6 PRs and counting.
Once we fix this stuff it will be at least 1 or 2 more.

Which reminds me... I keep mulling over the point of having to add an interface to support the protocol. I'll investigate how, but it still doesn't make sense to me.
The daemon can send traffic already. I see it on other devices. Which means that without an interface the thing can send traffic on its merry way. So... an interface for one way only? You can step out any time you want, but you can never leave? (sorry, lyrics came to mind, but it's in the opposite direction.)

I'm building an image with the lore @appelsin shared. Let's see what goodies it provides. I'm probably wrong but my guess is that something is broken at socket level in either DSA, appletalk kmod or atalkd that results in the registration for multicasting not working. Hopefully it's atalkd... that's easily fixed. Everything else will be an uphill battle.

root@TestWrt-63:~# cd /sys/kernel/debug/tracing
-ash: cd: can't cd to /sys/kernel/debug/tracing: No such file or directory

I had high hopes.

Searching on how that may be different for a 6.6 kernel...

Oh fun...

It has to be a low priority side track for me just now, unfortunately. Maybe an hour or so today out of sheer interest.

From your original PR:

Description:
No changes to package other than using latest available upstream code base.

I'll put the blame squarely on your shoulders then - not enough testing before submitting :wink:

What if you just mount tracefs manually:

# mount -t tracefs nodev /sys/kernel/tracing

docs

@apccv
Guess of the day:
kmod-appletalk does not understand DSA

You commented that 40-odd posts ago, I know. Still, I'll believe it when I see it.
And I know my opinion won't matter but if DSA forces changes to userspace or to protocols that's pretty crappy to begin with. Was IPX adjusted for DSA? ICMP or ARP? Do any of those have code that says "hey if DSA active then do this instead"?
Moot point. Let's see if next kernel cooking says anything useful.

No, but Novell had an appletalk driver that added a virtual interface on the hardware interface (this is where most of my appletalk involvement came from, providing file services for Macs in corporate environments where file sharing and printing between Windows and Mac users was vital - 30 years ago I might add!!)

No they don't - and they work. In DSA, the switch or bridge is separated entirely from the interfaces. This, in my opinion, is where the problem lies.

The example I gave of vxlan protocol - ip-full utility only recognises "type vxlan" if kmod-vxlan is installed. When it is installed, it is possible to create the vxlan interface and add it to the bridge device. Without it you get the error "unknown type". The appletalk kmod, just reinstated, predates any of this unfortunately.

A way forward might be to create a dummy, non-DSA interface and add that to the bridge. I'm not sure how this would work. I would test it but do not currently have any means, or time, to do so. Maybe I need an overnighter to play with it....

I mean yeah, it might certainly be the case that there's an old "bug" or non-compliance with standards (current standards at least) in atalkd that didn't rear its head until now, sure.

In that case it might make sense to cut atalkd down into a simple test program that just sends one static multicast frame every few seconds, and run a multicast echo program on Ubuntu that just reflects it back.

In order to have a simple codebase where one can fiddle with the socket code, make minor adjustments and such.
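A minimal sketch of the sending half of such a test program, in Python and assuming a Linux box with root access. The destination MAC, OUI and protocol ID are copied from the tcpdump line quoted further down in this thread; the source MAC, the payload and the names `build_snap_frame` and `send_periodically` are made up for illustration. The payload is not a valid DDP packet, so this is only good for checking whether the frame shows up in captures and counters.

```python
import socket
import struct
import time

APPLETALK_MULTICAST = bytes.fromhex("090007ffffff")  # destination MAC from the capture in this thread
APPLE_OUI = bytes.fromhex("080007")                  # SNAP OUI from the same capture
DDP_PID = 0x809B                                     # SNAP protocol ID from the same capture


def build_snap_frame(dst: bytes, src: bytes, payload: bytes) -> bytes:
    """Build an 802.3 frame with an LLC/SNAP header (no FCS)."""
    llc_snap = bytes([0xAA, 0xAA, 0x03]) + APPLE_OUI + struct.pack("!H", DDP_PID)
    body = llc_snap + payload
    # In 802.3 the two bytes after the source MAC hold the length of what
    # follows, not an EtherType.
    frame = dst + src + struct.pack("!H", len(body)) + body
    return frame.ljust(60, b"\x00")  # pad to the minimum Ethernet frame size


def send_periodically(ifname: str, frame: bytes, interval: float = 2.0, count: int = 10) -> None:
    """Send the same frame `count` times on `ifname` (Linux only, needs root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for _ in range(count):
            sock.send(frame)
            time.sleep(interval)
```

Something like `send_periodically("lan1", build_snap_frame(APPLETALK_MULTICAST, bytes.fromhex("020000000001"), b"dsa-multicast-test"))` would then put one frame on the wire every two seconds; the interface name and source MAC are placeholders.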

I think that's what we have right now.

atalkd on OpenWrt sends 3 data frames and a burst of AARP frames on startup. Can be made to keep going if set as a "seed" for that broadcast domain.

Any Linux distro or the famously tweaked 23.05 OpenWrt running atalkd will pick that up and lob back a response either saying "yep, you're the router" or "hold on, I'm already the router", and that is making it back to the DSA device bridge.

Even without the round trip, two DSA devices will lob frames at each other without requiring any tweaks other than configuring atalkd to be a router on the lan segment.

Something interesting... if I start atalkd to route on lan1, eth1 and br-lan (which would be incorrect as those would be three routers on the same segment) they seem to hear each other. This means local traffic going into the DSA layer somehow is read back. I'll draw a picture later. Almost feels like this needs a "brouter".

@appelsin I think the traces in kernel will be very useful to figure out why those frames are not reaching the atalkd behind the DSA switch. Just waiting for the n-th build which hopefully will have that active. In lieu of that I'll patch the kmod to add debugging all over the darn place.

That's interesting.

I guess the code path is a bit different for a pure frame-already-in-software-bridge multicast fanout compared to an externally-received-frame-copied-to-cpu-port multicast fanout.

Not sure what all the differences are though.

One difference is the MTU since the frame received on the CPU port will be larger by the proprietary tag.

No clue if raising the MTU on the various interfaces would do anything at all...

Mainly because I don't have the tcpdump :slight_smile: but my guess would be that it would do nothing, since e.g. the AARP frames do not work either, and they are much smaller than 1500. So probably not an angle worth pursuing.

How's the compilation going?

EDIT: I'd still also like to see a tcpdump on ubuntu eth0, same time as tcpdump on DSA master interface ethX, same time as tcpdump on DSA slave interface lanX (before kernel drops the frame). So we can find 1 frame in the 3 dumps and compare it at different points in time / points in the software stack.
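A rough sketch of how one frame could be located in two of those dumps, assuming they were saved with `tcpdump -w` in the classic pcap format (not pcapng). The function names are made up. `tag_len` is an assumption about the DSA master capture: it treats the switch tag as a fixed number of bytes sitting right after the two MAC addresses, which depends on the tagging protocol in use and needs checking for your switch. `prefix` lets you compare only the first bytes in case padding differs between capture points.

```python
import struct
from typing import BinaryIO, Iterator, List, Optional, Tuple


def read_pcap(stream: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic pcap stream."""
    magic = stream.read(24)[:4]  # 24-byte global header, magic comes first
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # The second magic of each pair marks nanosecond-resolution timestamps.
    frac = 1e9 if magic in (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d") else 1e6
    while True:
        rec = stream.read(16)  # per-packet record header
        if len(rec) < 16:
            return
        sec, sub, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        yield sec + sub / frac, stream.read(incl_len)


def strip_tag(frame: bytes, tag_len: int) -> bytes:
    """Remove `tag_len` bytes located right after the two MAC addresses."""
    return frame[:12] + frame[12 + tag_len:]


def match_frames(reference: List[Tuple[float, bytes]],
                 other: List[Tuple[float, bytes]],
                 tag_len: int = 0,
                 prefix: Optional[int] = None) -> List[Tuple[float, List[float]]]:
    """For each reference frame, list timestamps of identical frames in `other`."""
    index = {}
    for ts, frame in other:
        index.setdefault(strip_tag(frame, tag_len)[:prefix], []).append(ts)
    return [(ts, index[frame[:prefix]]) for ts, frame in reference if frame[:prefix] in index]
```

With `lan = list(read_pcap(open("lan1.pcap", "rb")))` and `master = list(read_pcap(open("eth0.pcap", "rb")))` (file names are placeholders), `match_frames(lan, master, tag_len=2)` would list which lanX frames also appear on the master once a 2-byte tag is removed.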

More crashes than on the local stretch of highway. Trying again.

I'll find some time to store tcpdumps.

:rofl:


In net/bridge/br_input.c,

br_pass_frame_up() starts with:

	dev_sw_netstats_rx_add(brdev, skb->len);

Ergo if the frame gets as far as passing through all the special-case handling in br_handle_frame_finish(), it should cause a RX counter to increase by a few bytes.

br_pass_frame_up() ends with:

	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,
		       dev_net(indev), NULL, skb, indev, NULL,
		       br_netif_receive_skb);

If the frame gets as far as passing through br_pass_frame_up(), it should trigger a Netfilter hook where one can insert a counter statement (and inspect with nft list ruleset), just before the frame passes to br_netif_receive_skb().

Might be useful for figuring out whether the frame reaches netif_receive_skb() in net/core/dev.c or gets /dev/nullified in the bridge code?
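The RX-counter half of that idea could be checked with something like the sketch below. It assumes the bridge's per-device counters are the ones exposed under `/sys/class/net/<dev>/statistics/`, and the helper names are made up. It does not cover the Netfilter counter. On a busy LAN the counters move all the time, so this only tells you something on an isolated segment.

```python
from pathlib import Path


def read_rx_counters(dev: str, sysfs_net: str = "/sys/class/net") -> dict:
    """Read the RX packet and byte counters of a network device from sysfs."""
    stats = Path(sysfs_net) / dev / "statistics"
    return {name: int((stats / name).read_text()) for name in ("rx_packets", "rx_bytes")}


def rx_delta(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two readings."""
    return {name: after[name] - before[name] for name in before}
```

Take `before = read_rx_counters("br-lan")`, let the remote box send one frame, take a second reading, and look at `rx_delta(before, after)`. If nothing moves, the frame most likely never got as far as br_pass_frame_up().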

Just another tracing idea


Also have another dumb idea to add to the dumb ideas list:

  try throwing a vlan in the stack
  try raising the mtu a bit
+ try `ifconfig lan1 promisc`

see if setting the interface promisc and ?maybe bypassing a few dst-mac filters? gets anything flowing (probably not)

Please elaborate if you can :slight_smile:

00:27:06.982423 <remoteMAC> > 09:00:07:ff:ff:ff, 802.3, length 37: LLC, SNAP (0xaaaa), ctrl 0x03: oui Appletalk (0x080007), pid Appletalk (0x809b)
[ scroll --> ]                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ethertype is in a funny place...

... so I took a very quick look at the QCA bits of DSA but couldn't find anything that would mangle that.

also finally downloaded netatalk to see how it makes sockets:

netatalk-4.0.7$ grep -R AF_ | grep -Po '\bAF_\w+\b' | sort | uniq -c | sort -rn
     54 AF_APPLETALK
     25 AF_INET
     13 AF_UNSPEC
      9 AF_INET6
      2 AF_UNIX

my uneducated guess would be that an AF_APPLETALK socket should be good enough to receive LLC-SNAP + Appletalk

Yes. It's an 802.3 + LLC frame. Those are not EtherTypes. Those are the "Organizationally Unique Identifier" and "Protocol ID".

I like this explanation vs Ethernet II:
https://community.cisco.com/t5/switching/ether-frames-802-3-naming-conventions/td-p/2076323
More history here:
https://web.archive.org/web/20160530210544/http://www.ee.siue.edu/~bnoble/comp/networks/frametypes.html
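To make the difference concrete, here is a small Python sketch that tells the two framings apart and pulls out the OUI and Protocol ID when a SNAP header is present. The function name and the returned dictionary layout are made up for illustration; it expects a raw frame without FCS, at least 22 bytes long.

```python
import struct


def classify_frame(frame: bytes) -> dict:
    """Classify a raw Ethernet frame as Ethernet II or 802.3 and decode LLC/SNAP."""
    info = {"dst": frame[0:6].hex(":"), "src": frame[6:12].hex(":")}
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    if type_or_len >= 0x0600:
        # 1536 and above is an EtherType, so this is Ethernet II framing.
        info.update(kind="ethernet2", ethertype=type_or_len)
        return info
    # Smaller values are treated as an 802.3 length (valid lengths are at
    # most 1500); an LLC header follows.
    dsap, ssap, ctrl = frame[14], frame[15], frame[16]
    info.update(kind="802.3", length=type_or_len, dsap=dsap, ssap=ssap, ctrl=ctrl)
    if dsap == 0xAA and ssap == 0xAA and ctrl == 0x03:
        # SNAP extension: 3-byte OUI followed by a 2-byte protocol ID.
        info["oui"] = frame[17:20].hex()
        info["pid"] = struct.unpack("!H", frame[20:22])[0]
    return info
```

Fed the header bytes of the frame in the tcpdump line above, it reports kind "802.3", length 37, OUI 080007 and PID 0x809b, which is where tcpdump gets "oui Appletalk" and "pid Appletalk" from.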

Should and does, in non-DSA.

I have the code for atalkd handy (and if you downloaded the netatalk tarball, it has its own directory, albeit in a weird spot). I can share the code that opens the sockets. As far as I understand, it opens BSD-type sockets (not POSIX), which might be part of the DSA problem. It opens quite a few:

root@TestWrt-64:~# lsof | grep atalkd
atalkd    31293    root  cwd       DIR       0,18        0         83 /root
atalkd    31293    root  rtd       DIR       0,18        0          2 /
atalkd    31293    root  txt       REG       0,19    57401        136 /usr/sbin/atalkd
atalkd    31293    root  mem       REG      31,13                 136 /usr/sbin/atalkd (path dev=0,19)
atalkd    31293    root  mem       REG      31,12                 338 /lib/libgcc_s.so.1 (path dev=0,20)
atalkd    31293    root  mem       REG      31,13                 123 /usr/lib/libatalk.so.19.0.0 (path dev=0,19)
atalkd    31293    root  mem       REG      31,12                 336 /lib/libc.so (path dev=0,20)
atalkd    31293    root    0u      CHR        1,3      0t0         73 /dev/null
atalkd    31293    root    1u      CHR        1,3      0t0         73 /dev/null
atalkd    31293    root    2u      CHR        1,3      0t0         73 /dev/null
atalkd    31293    root    3u     unix 0x8785f1cd      0t0    1174776 type=DGRAM 
atalkd    31293    root    4u     sock        0,6      0t0    1174777 protocol: DDP
atalkd    31293    root    5u     sock        0,6      0t0    1174780 protocol: DDP
atalkd    31293    root    6u     sock        0,6      0t0    1174781 protocol: DDP
atalkd    31293    root    7u     sock        0,6      0t0    1174782 protocol: DDP
atalkd    31293    root    8u     sock        0,6      0t0    1174783 protocol: DDP
atalkd    31293    root    9u     sock        0,6      0t0    1174786 protocol: DDP
atalkd    31293    root   10u     sock        0,6      0t0    1174787 protocol: DDP
atalkd    31293    root   11u     sock        0,6      0t0    1174788 protocol: DDP
atalkd    31293    root   12u     sock        0,6      0t0    1174789 protocol: DDP

On our other topic, I may be on a better path to get a tracing build done. Who would have known?.. the toolchain needs to be built with some of it in place or else the kernel never gets built with it. I don't have a great build machine and building the toolchain takes lots of time.

Tried all kinds of promisc settings at all kinds of levels, no difference, no change in packet availability or layout.
Tried VLANs and locked myself out of the test router... I need to read a bit before I try that again.
Tried igmp_snooping... no change. I kind of expected that because (and I'm surely wrong on that too) that's more IP-specific, at least in the current implementation.
Will play with MTU once the current build is done.

Darn builds...
openwrt/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/lttng-modules-2.13.9/src/probes/lttng-probe-skb.o -- Error 1

Do we need that?

I think this is beginning to make sense...
Looking at:
https://www.kernel.org/doc/html/v6.6/networking/dsa/dsa.html
There's a diagram a third into the page (the only diagram on page in any case).
The issue you are pointing out is that the application (or kernel extension) is opening the port directly on the interface (if that's even possible in DSA) and not going through the "DSA switch driver". This would explain why traffic goes out, as it is dropped onto the interface (not sure how it explains atalkd printing its own packet, but that might be done prior to transmission and not on loopback receipt), and why traffic lacking the appropriate switch tags doesn't make it back to the application's socket.
Will read that a couple dozen times while what I hope will be a good build cooks overnight.
Diagram does say unaware application... but given how old this code is and how atypical these ethernet frames are...

Would it perhaps not be preferable to debug using a program that registers just one socket?

By the way, are there any sockets created by kmod-appletalk as well? Maybe not important, I guess we could just unload that module while testing.

Awesome, would love to compare frames and verify that they do not get mangled by DSA or what not, however unlikely. And would love to tcpreplay them into my own Marvell DSA switch. :slight_smile:

If the function tracer turns out to be more trouble than it's worth, put it on the back-burner?

There are other options if we need tracing. For example a more modern BPF tracer.

I personally quite like ply.

It should be a much easier build than a whole kernel.

(In fact, it would be really nice to have an opkg / apk for ply, I might attempt that at some point.)


More dumb ideas:

+ make lanX into a standalone interface by removing it from the bridge, test again

As long as the interface is a DSA master, inbound frames will not actually reach any socket bound to a DSA master interface, frames being handled by DSA instead and what not. Any outbound datagrams could end up being dropped by switch hardware because it's missing the DSA tag.

But sure, it's possible, just bind to the DSA master interface (ethX) instead of the DSA user interface (lanX).

Is there any reason to suspect that atalkd wrongly binds a socket to ethX rather than lanX?

An unaware application can still bind to a DSA master interface that carries proprietary-tagged frames rather than Ethernet II or 802.3 frames; it's just a question of whether the code in question happens to have a setsockopt(, SOL_SOCKET, SO_BINDTODEVICE, ) call or something similar.

Since the DSA master interface runs on top of a completely standard eth driver. DSA just hooks in to grab the frames for detagging, and then it uses a side-channel to tell the switch to start tagging stuff.

Haven't tried but you could probably unload the DSA module, and it should tell the switch to stop tagging.

In that case, the interface would probably start working like a normal port in the hardware switch, and having a socket bound to it would be a perfectly sane thing to do.

The "unaware application" probably won't know whether one told it to bind to ethX because one disabled the DSA module and it's just a port in a switch now. Or if one told it to bind to ethX by mistake while DSA is mastering the interface and tagging is on.
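As a sketch of how a test program could avoid that mistake: on kernels I have looked at, a DSA master exposes a `dsa/tagging` attribute in sysfs, so its presence can be used to refuse binding to the master. The helper names are made up, and this says nothing about what atalkd itself does.

```python
import socket
from pathlib import Path


def dsa_tagging(ifname: str, sysfs_net: str = "/sys/class/net"):
    """Return the tagging protocol name if `ifname` is a DSA master, else None."""
    tag_file = Path(sysfs_net) / ifname / "dsa" / "tagging"
    return tag_file.read_text().strip() if tag_file.exists() else None


def bind_to_device(sock, ifname: str, sysfs_net: str = "/sys/class/net") -> None:
    """Bind `sock` to `ifname` unless it is a DSA master carrying tagged frames."""
    tag = dsa_tagging(ifname, sysfs_net)
    if tag is not None:
        raise ValueError(f"{ifname} is a DSA master (tagging: {tag}); bind to a user port instead")
    # Same call as the setsockopt() mentioned above; may need root.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, ifname.encode() + b"\0")
```

On a box where ethX is the master, `dsa_tagging("eth0")` should return the tagger name and `dsa_tagging("lan1")` should return None.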

Missed this one, sorry.

Yeah, no, lttng-modules belongs to LTTng, which is a separate tracing framework. ftrace doesn't depend on it as far as I know, so skip it.

I think ftrace just needs kprobe and tracepoint (maybe uprobe)

Oh man. Kernel traces are nothing short of magical. Even if I don't get to fix this and throw in the towel (fear not, might be a while) the three of you that have been putting up with me have my eternal gratitude for all I've learnt so far. Thanks.

I managed to get some level of tracing built in. Issue was I was being ambitious. You mentioned uprobes and that requires kernel_menuconfig-ing, which opens a different level of pain. Leave that aside and at least it traces functions.
Just did a basic test as I will need to segregate these things from my LAN... the amount of irrelevant frames flying on the LAN is unreal. But one thing I had to try before I went to bed:

  • grep available_filter_functions for anything appletalk (there are 43)
  • stick that into set_ftrace_filter
  • enable function trace & start local atalkd (a remote is always running on the 23.05 box)
  • run tcpdump on the bridge (or lan - can't do eth due to libcap issue)

...that's because I really needed to confirm that the kmod-appletalk is not getting traffic (which sort of rules out this being an atalkd code issue).

And... that is the case.
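For anyone wanting to repeat this, the first three steps of the list above can be sketched roughly as below. It assumes tracefs is mounted at /sys/kernel/tracing (as in the mount command earlier in the thread), the function tracer is built in, and you run it as root. The helper names and the "atalk"/"appletalk" match strings are my own choice, not anything official.

```python
from pathlib import Path


def select_functions(tracefs: Path, needles=("atalk", "appletalk")) -> list:
    """Return traceable function names whose line mentions any of the needles."""
    names = []
    for line in (tracefs / "available_filter_functions").read_text().splitlines():
        if line.strip() and any(n in line.lower() for n in needles):
            names.append(line.split()[0])  # drop the trailing "[module]" tag
    return names


def start_function_trace(tracefs: Path, names: list) -> None:
    """Limit the function tracer to `names` and switch it on."""
    # Opening the filter file for writing clears whatever filter was set before.
    with open(tracefs / "set_ftrace_filter", "w") as flt:
        for name in names:
            flt.write(name + "\n")
            flt.flush()  # hand the kernel one function name per write
    (tracefs / "current_tracer").write_text("function\n")
    (tracefs / "tracing_on").write_text("1\n")
```

Then `tfs = Path("/sys/kernel/tracing")`, `start_function_trace(tfs, select_functions(tfs))`, start atalkd, and read `tfs / "trace"` while tcpdump runs on the bridge in another terminal.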

Trace goes nuts when local atalkd starts... sock_create -> atalk_create, then ioctl, connect, release, rcv... in more ways than one... sys_bind -> atalk_bind in there... sock_recvmsg -> atalk_recvmsg, ditto for sendmsg...

and then it dies. It stops, and nothing more happens when atalkd gives up because nobody replies to it (even though the 23.05 box is putting frames on the local br-lan). So to me that proves that somehow atalkd reads its own messages after pushing them out (so it receives its own multicast) but never sees anything from the outside. Neither does the kmod.

Will need to play with this more, isolate maybe the boxes (which makes my life more complicated), rebuild because I don't have function_graph as a tracer option, re-read what you mentioned of br_netif_receive_skb and netif_receive_skb to filter on those. Regardless, I see options. And I see that looking at atalkd code is useless. It's all the kernel module (yeah, I know @bluewavenet, you said so).

More to come.
