Multicast/switch/snooping/vlan problem. Bug?

Just out of curiosity, could you provide /etc/config/network?

Sure, here it is:

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fdac:98a3:30b0::/48'

config interface 'lan'
        option type 'bridge'
        option proto 'static'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option ipaddr '192.168.12.1'
        option ifname 'eth1.1'

config interface 'wan'
        option _orig_ifname 'eth0.2'
        option _orig_bridge 'false'
        option proto 'dhcp'
        option ifname 'eth0.2'

config interface 'wan6'
        option proto 'dhcpv6'
        option ifname 'eth0.2'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option vid '1'
        option ports '1 2 3 4 6t'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option vid '2'
        option ports '0t 5'

config switch_vlan
        option device 'switch0'
        option vlan '3'
        option ports '3t 6t'
        option vid '10'

config switch_vlan
        option device 'switch0'
        option vlan '4'
        option vid '11'
        option ports '3t 6t'

config interface 'guest'
        option type 'bridge'
        option _orig_ifname 'radio0.network2 radio1.network2'
        option _orig_bridge 'true'
        option proto 'static'
        option ifname 'eth1.10'
        option ipaddr '192.168.14.1'
        option netmask '255.255.255.0'

config interface 'vpn'
        option proto 'none'
        option ifname 'tun0'

config interface 'iptv'
        option ifname 'eth1.11'
        option _orig_ifname 'eth1.11'
        option _orig_bridge 'false'
        option proto 'dhcp'
        option defaultroute '0'
        option peerdns '0'

The iptv interface is the upstream multicast interface, and the clients that receive the multicast are in lan. With this setup, multicast works, but all the wired devices are flooded with the stream. If I enable IGMP snooping on the switch, the flooding stops, but so does the multicast stream: the server sends queries, but those are lost, so the server times out.

At first glance, going by the ar8xxx driver specifics (at least as I understand them), I'm not sure that these coexist correctly; it should be 1 vid/vlan per port.

Edit: try to exclude port3 from here as well, as it's the same coexistence issue:

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option vid '1'
        option ports '1 2 3 4 6t'

Huh? Are you saying that one port can't be a member of more than one VLAN? I don't think that's correct. One port can only be an untagged member in one VLAN (usually referred to as the pvid), but it should be able to be a tagged member of multiple VLANs.
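To make the rule concrete, here is a minimal sketch in Python (not something that runs on the router) that checks the quoted switch_vlan sections against it. The `SWITCH_VLANS` table is copied from the config above; the helper names are made up for illustration, and the parser only understands the plain `N` and `Nt` port tokens used in this thread.

```python
from collections import defaultdict

# vid -> ports string, copied from the switch_vlan sections quoted in this thread.
SWITCH_VLANS = {
    1: "1 2 3 4 6t",
    2: "0t 5",
    10: "3t 6t",
    11: "3t 6t",
}

def parse_ports(ports):
    """Split a ports string into (port_number, is_tagged) pairs."""
    result = []
    for token in ports.split():
        tagged = token.endswith("t")
        result.append((int(token.rstrip("t")), tagged))
    return result

def untagged_conflicts(vlans):
    """Return {port: [vids]} for ports that are untagged in more than one VLAN."""
    untagged = defaultdict(list)
    for vid, ports in vlans.items():
        for port, tagged in parse_ports(ports):
            if not tagged:
                untagged[port].append(vid)
    return {port: vids for port, vids in untagged.items() if len(vids) > 1}

def memberships(vlans, port):
    """List (vid, is_tagged) for every VLAN the given port belongs to."""
    return [(vid, tagged)
            for vid, ports in vlans.items()
            for p, tagged in parse_ports(ports)
            if p == port]

print(untagged_conflicts(SWITCH_VLANS))  # {}
print(memberships(SWITCH_VLANS, 3))      # [(1, False), (10, True), (11, True)]
```

So port 3 is untagged in VLAN 1 only and tagged in 10 and 11, which is an ordinary trunk by that rule. Whether the ar8xxx driver handles it correctly is a separate question that this check cannot answer.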

The quoted setup for these VLANs works as expected, BTW. Port 3 on the router is connected to a VLAN-capable switch, which is why I trunk VLANs 1 (untagged), 10 and 11 (both tagged).

I thought the IGMP queries ended up on eth0 because they are forwarded in VLAN ID 1, which is the default VLAN for every port at boot on most common switches. If you are sure that port 0 was untagged with a different VLAN ID, then it's really strange behavior, something I have never seen before. Maybe it's related to offloading…

Yep, that's what I'm saying :slight_smile: and no, it's not the pvid, it's the vid in terms of the driver. That may be incorrect, but still, could you please try excluding port3 from every configuration except iptv?
I base my assumption on the logic that the ar8xxx driver segments ports by VLAN internally, so assigning several VLANs/vids to one port may eventually confuse the driver.
But that's not the case for the qca8k driver.

But if what you are saying is true, then my current setup should have failed spectacularly for everything, not just IGMP. The trunking of VLANs 1, 10 and 11 on port 3 is most definitely working as expected. VLAN 10 is used for containing traffic in my guest network.

Please also note that the tests I've been doing do not involve those VLANs either. I've been injecting packets on port 4, which is only a member of VLAN 1.

That said, I have now tried turning both VLAN 10 and VLAN 11 off on all ports, which leaves me with a basic configuration:

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option vid '1'
        option ports '1 2 3 4 6t'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option vid '2'
        option ports '0t 5'

config switch_vlan
        option device 'switch0'
        option vlan '3'
        option vid '10'

config switch_vlan
        option device 'switch0'
        option vlan '4'
        option vid '11'

Makes no difference. With IGMP snooping off, IGMP packets injected on port 4 show up in a tcpdump of eth1. With IGMP snooping on, the IGMP queries show up on eth0 instead. Reports remain on eth1.
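For anyone repeating this test, the query/report distinction comes down to the first byte of the IGMP header. Here is a small Python sketch that classifies a raw IGMP payload by that type byte. The type values are the standard IGMP ones; the function name and the two sample payloads are only illustrative, and the checksum is not validated.

```python
# IGMP message types, keyed by the first byte of the IGMP header.
IGMP_TYPES = {
    0x11: "membership query",
    0x12: "IGMPv1 membership report",
    0x16: "IGMPv2 membership report",
    0x17: "IGMPv2 leave group",
    0x22: "IGMPv3 membership report",
}

def classify_igmp(payload: bytes) -> str:
    """Name the IGMP message type of a raw IGMP payload (IP header already removed)."""
    if not payload:
        raise ValueError("empty IGMP payload")
    msg_type = payload[0]
    return IGMP_TYPES.get(msg_type, f"unknown (0x{msg_type:02x})")

# Illustrative payloads: a general query and an IGMPv2 report for 239.1.1.1.
general_query = bytes.fromhex("1164ee9b00000000")
v2_report = bytes.fromhex("1600f9fcef010101")

print(classify_igmp(general_query))  # membership query
print(classify_igmp(v2_report))      # IGMPv2 membership report
```

In the tests described here, the first kind is what lands on eth0 with snooping on, while the second kind stays on eth1.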

Okay, good. What if you invert the CPU ports? Set port0 for lan and port6 for wan, so you'll have eth0 for lan and eth1 for wan.

Well, I don't know about offloading, but it sounds a bit unlikely to me. Both port 0 and port 6 have a pvid of 0 as reported by swconfig:

enable_eee: ???
igmp_snooping: 0
pvid: 0
link: port:0 link:up speed:1000baseT full-duplex

enable_eee: ???
igmp_snooping: 0
pvid: 0
link: port:6 link:up speed:1000baseT full-duplex

So at least in that regard they are equal.

Hmm, I guess I could try that (and I will), but isn't there a reason for the default? Will this in any way affect performance? I can't do that test right now, because then I'll have to restart the router, and that's inconvenient at the moment. I'll try it later tonight, though.

If that also inverts (and thereby fixes my issue), I will still say this is a bug, though. :slight_smile:

I've browsed through the driver code, and it seems that you are setting snooping slightly wrong. igmp_snooping can be set on a per-port basis or on the switch as a whole. If snooping is set on the switch as a whole, the driver just iterates over each port and turns snooping on.
Try turning snooping on for the physical ports that are affected, without touching the wan CPU port and the wan physical port.
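As a rough illustration of the per-port approach, here is a Python sketch that only builds the command strings for a given set of ports; it does not run anything. It assumes the generic `swconfig dev <switch> port <n> set <attribute> <value>` form and the per-port `igmp_snooping` attribute that appears in the swconfig output in this thread. The port range 0-6 is taken from the posted config, and the function name is hypothetical.

```python
SWITCH = "switch0"
ALL_PORTS = range(0, 7)  # ports 0-6, as they appear in the posted switch_vlan sections

def snooping_commands(enabled_ports, all_ports=ALL_PORTS, switch=SWITCH):
    """Build one swconfig command per port: snooping on for enabled_ports, off elsewhere."""
    enabled = set(enabled_ports)
    return [
        f"swconfig dev {switch} port {port} set igmp_snooping {1 if port in enabled else 0}"
        for port in all_ports
    ]

# Snooping on the LAN-side physical ports only, leaving 0, 5 and 6 off.
for command in snooping_commands([1, 2, 3, 4]):
    print(command)
```

Passing every port number is then the equivalent of the global setting as described above, which makes it easy to compare the two cases one port at a time.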

No, it shouldn't; it's just a test for a logic flaw.

Oh, believe me, I have tried every possible combination. I have also looked at the driver, and I saw that the global setting is just a shorthand for setting all ports to the same setting (like you said).

If I set snooping to on for ports 1-4 (and leave it off for 0 and 6), and then inject the packets on port4, I actually get none of them on eth1, but the query still shows up on eth0...

More likely, snooping should be on for ports 1, 2, 3, 4 and 6 if you want to receive it on eth1.

Yes, that's true. If I do that, the reports are seen on eth1 again. But the queries still end up at eth0.

Ok, let's just see what happens if you invert the CPU ports then; maybe it's a flaw in the logic that sets some register to forward to port0.

I'll test that in a little bit, and report back. However, I've also done some reading, and I found this document on the AR8327 (which is probably very similar to the AR/QCA8337). In chapter 9 the hardware IGMP snooping features are described, but they only mention joins/leaves (which are the only messages that are really relevant for the switch itself), not queries. So why does enabling IGMP snooping also affect queries? Getting a firm grip on this isn't exactly easy, I'll admit.

Now, if only @nbd or @blogic could chime in, I'm sure an answer would be provided.

That's it: on my config, all "only tagged" ports have a pvid of 0 in swconfig. I believe that's the default VLAN for untagged frames if no VLAN ID is set. That said, it doesn't explain your issue with the last switch config. Looking forward to the result with the CPU ports inverted.

Ok, now I've tested swapping the CPU ports. The most annoying thing with that is that the MAC addresses are also swapped, so computers on the inside deem that the connection has changed. Not serious, just annoying.

But the results are clear. With this setup, the query frames still end up on eth0 (which is now lan), even with igmp snooping enabled globally on all ports. eth1 ends up being isolated (like it should). In other words, this means that eth0 for some reason is special. I did revert back to the original setting after testing, though.

But I can confirm that with the ports swapped, IGMP snooping on the LAN side would most likely work as it should, since the IGMP queries then would end up where they are needed.

But the big question still remains: What is so special about eth0 as to cause it to be a magnet for IGMP queries? I'd rather have the default setup for WAN/LAN, as that is what every build usually has.

Given your test, it's definitely a wrong redirection to port0. Check bit 0 at offset 0x0210 and follow the description:
https://www.deyisupport.com/cfs-file.ashx/__key/telligent-evolution-components-attachments/00-25-01-00-00-20-73-71/QCA8337N_5F00_Data_5F00_Sheet_5F00_MKG_2D00_17793_5F00_v1.0.pdf

And the CPU port is controlled by bit 10 at 0x0620, which only covers port0; that probably explains the behavior.

You may try setting igmp_copy_en bit to 1.

You may try using it

edit: or 0x0618 bit 28 may also solve the issue, so that it puts a correct entry into the ARL table
edit2: never mind, this bit is already set to 1 by default
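If you manage to dump those registers, a few lines of Python help with reading and flipping individual bits. This is only a sketch: the offset/bit pairs are the ones mentioned in this thread, the registers are assumed to be 32 bits wide, and the example value is made up rather than read from a real switch. How to actually read or write the registers is not covered here.

```python
# (register offset, bit number) pairs mentioned in this thread.
BITS_OF_INTEREST = [
    (0x0210, 0),
    (0x0620, 10),
    (0x0618, 28),
]

MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide

def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit is 1 in a register value."""
    return bool((value >> bit) & 1)

def set_bit(value: int, bit: int) -> int:
    """Return the register value with the given bit forced to 1."""
    return (value | (1 << bit)) & MASK32

def clear_bit(value: int, bit: int) -> int:
    """Return the register value with the given bit forced to 0."""
    return value & ~(1 << bit) & MASK32

# Hypothetical register dump value, for illustration only.
example = 0x10000000
print(bit_is_set(example, 28))          # True
print(hex(set_bit(example, 10)))        # 0x10000400
print(hex(clear_bit(example, 28)))      # 0x0
```

With a real dump you would look up each offset in `BITS_OF_INTEREST`, check the bit with `bit_is_set`, and compute the value to write back with `set_bit` or `clear_bit`.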