First, post just the default config. Then, when you start configuring your VLANs, create just one new VLAN and get that working. Keep it simple, and once everything is working you should have the knowledge, and maybe even a formula, that you can use to build out the rest.
Ok, so I ended up not resetting because, like reinstalling Windows, that is usually the mindless way out. By resetting you almost never learn what you did wrong, only the correct steps to do it.
I disabled the LAG, exposed only LANPORT1, and boom, everything worked.
So now the question is how should this look with a LAG?
Working config -
config globals 'globals'
option ula_prefix 'fde4:aab8:c578::/48'
config switch
option name 'switch0'
option reset '1'
option enable_vlan '1'
config switch_vlan
option device 'switch0'
option vlan '1'
option vid '1'
option ports '6t 4'
config switch_vlan
option device 'switch0'
option vlan '2'
option ports '0t 5'
option vid '2'
config switch_vlan
option device 'switch0'
option vlan '3'
option vid '3'
option ports '6t 3'
config switch_vlan
option device 'switch0'
option vlan '4'
option vid '4'
option ports '6t 2'
config switch_vlan
option device 'switch0'
option vlan '5'
option ports '6t 1'
option vid '5'
config switch_vlan
option device 'switch0'
option vlan '7'
option ports '6t 4t 3t 2t 1t'
option vid '10'
config device
option name 'eth1'
config device
option name 'wlan0'
config device
option name 'wlan1'
config device
option name 'eth0'
config device
option name 'br-lan'
option type 'bridge'
list ports 'eth1'
list ports 'eth1.1'
config device
option name 'bonding-LAGTest'
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config interface 'lan'
option device 'br-lan'
option proto 'static'
option ip6assign '60'
list dns '1.1.1.1'
list dns '1.0.0.1'
list ipaddr '192.168.1.1/24'
config interface 'LAGTest'
option proto 'bonding'
option netmask '255.255.255.0'
option bonding_policy '802.3ad'
option min_links '0'
option ad_actor_sys_prio '1'
option ad_select 'stable'
option lacp_rate 'fast'
option xmit_hash_policy 'layer2'
option all_slaves_active '0'
option link_monitoring 'mii'
option miimon '100'
option downdelay '0'
option updelay '0'
option use_carrier '1'
option ipaddr '192.168.2.10'
list slaves 'eth1.1'
list slaves 'eth1.3'
list slaves 'eth1.4'
list slaves 'eth1.5'
config interface 'LANPORT1'
option proto 'static'
option device 'eth1.1'
list ipaddr '192.168.2.6/24'
config interface 'LANPORT3'
option proto 'static'
option device 'eth1.3'
list ipaddr '192.168.2.7/24'
config interface 'LANPORT4'
option proto 'static'
option device 'eth1.4'
list ipaddr '192.168.2.8/24'
config interface 'LANPORT5'
option proto 'static'
option device 'eth1.5'
list ipaddr '192.168.2.9/24'
config interface 'VLAN10'
option proto 'static'
option device 'eth1.10'
list ipaddr '192.168.88.1/24'
config interface 'wan'
option device 'eth0.2'
option proto 'dhcp'
config interface 'wan6'
option device 'eth0.2'
option proto 'dhcpv6'
So I added a device to VLAN20 that is attached to the Netgear switch, and it works properly with the LACP trunk and the router. So something is breaking at the trunk between the Netgear switch and the Mikrotik switch.
So I sniffed the packets with Wireshark on the trunk port (on both sides) between the Netgear switch and the Mikrotik switch, and the VLAN tags looked correct. So something is causing the tags to be messed up between the Netgear switch and the router.
If anyone has experience examining VLAN packets and DHCP packets, let me know. It's either a misconfiguration on the Mikrotik switch or my VLAN on the router.
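For anyone who wants to check the tags in a capture by hand, here is a minimal Python sketch of where the 802.1Q tag sits in an Ethernet frame. It is not tied to Wireshark or to any of the switches in this thread; the function name `parse_vlan_tag` and the sample frame are made up for illustration, and it only handles a single tag with TPID 0x8100 (no QinQ).

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return the 802.1Q fields of a raw Ethernet frame, or None if untagged."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType/TPID.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("frame is too short to hold an 802.1Q tag")
    # Bytes 14-15: tag control info (PCP, DEI, VID), 16-17: inner EtherType.
    tci, inner_type = struct.unpack_from("!HH", frame, 14)
    return {
        "pcp": tci >> 13,
        "dei": (tci >> 12) & 0x1,
        "vid": tci & 0x0FFF,
        "ethertype": inner_type,
    }


# Hypothetical frame: broadcast destination, made-up source MAC, VID 10, IPv4.
sample = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("020000000001")
    + struct.pack("!HHH", TPID_8021Q, 10, 0x0800)
)
print(parse_vlan_tag(sample))
```

A DHCP request from a device on VLAN 10 should show `vid` 10 on a tagged trunk port. Keep in mind that some NICs and drivers strip the tag before the capture tool sees it, so a missing tag in a capture does not always mean it was missing on the wire.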
Here is the current config -
Router -
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config interface 'lan'
option device 'br-lan'
option proto 'static'
option ip6assign '60'
list dns '1.1.1.1'
list dns '1.0.0.1'
list ipaddr '192.168.1.1/24'
config interface 'wan'
option device 'eth0.2'
option proto 'dhcp'
config interface 'wan6'
option device 'eth0.2'
option proto 'dhcpv6'
config switch
option name 'switch0'
option reset '1'
option enable_vlan '1'
config switch_vlan
option device 'switch0'
option vlan '1'
option vid '1'
option ports '6t 4'
config switch_vlan
option device 'switch0'
option vlan '2'
option ports '0t 5'
option vid '2'
config device
option name 'eth1'
config device
option name 'br-lan'
option type 'bridge'
list ports 'bonding-LAGTest'
list ports 'eth1'
config interface 'LAGTest'
option proto 'bonding'
option netmask '255.255.255.0'
option bonding_policy '802.3ad'
option min_links '0'
option ad_actor_sys_prio '1'
option ad_select 'stable'
option lacp_rate 'fast'
option xmit_hash_policy 'layer2'
option all_slaves_active '0'
option link_monitoring 'mii'
option miimon '100'
option downdelay '0'
option updelay '0'
option use_carrier '1'
option ipaddr '192.168.2.10'
list slaves 'eth1.1'
list slaves 'eth1.3'
list slaves 'eth1.4'
list slaves 'eth1.5'
config interface 'LANPORT3'
option device 'eth1.3'
option proto 'static'
list ipaddr '192.168.1.7'
config interface 'LANPORT4'
option device 'eth1.4'
option proto 'static'
list ipaddr '192.168.1.8/24'
config interface 'LANPORT1'
option device 'eth1.1'
option proto 'static'
list ipaddr '192.168.1.5/24'
config device
option name 'bonding-LAGTest'
config device
option name 'wlan0'
config device
option name 'wlan1'
config device
option name 'eth0'
config interface 'LANPORT2'
option proto 'static'
list ipaddr '192.168.1.6/24'
option device 'eth1.5'
config switch_vlan
option device 'switch0'
option vlan '5'
option ports '3'
option vid '3'
config switch_vlan
option device 'switch0'
option vlan '6'
option ports '2'
option vid '4'
config switch_vlan
option device 'switch0'
option vlan '7'
option ports '1'
option vid '5'
config interface 'VLAN10'
option proto 'static'
option device 'eth1.10'
list ipaddr '192.168.88.1/24'
config switch_vlan
option device 'switch0'
option vlan '8'
option vid '10'
option ports '6t 4t 3t 2t 1t'
There is nothing for you to help me with if the LAG is gone. It works without the LAG. Thanks for the help up to this point but I guess we all have our limits.
Doing this actually gives me the exact same behavior as I was getting with the LAG failing. So if we can figure out why I can't use anything but port 1 for my VLAN, then we can probably figure out why the LAG isn't working with the VLAN. It's acting like it doesn't know what port to use. I actually got it into a loop at one point, and I had to pull all but the first router port out of my switch to fix it.
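One thing that can be checked mechanically in the swconfig sections is whether each VLAN actually includes a CPU port, since `eth1.N` only sees VLAN N if the CPU port is a member of it. Below is a small Python sketch. The VID-to-ports table is copied from the `switch_vlan` sections of the router config posted above; the helper names, and the assumption that ports 0 and 6 are the CPU ports (taken from the `0t` and `6t` entries), are illustrative only.

```python
def parse_ports(spec):
    """Turn a swconfig ports string such as '6t 4' into {port: is_tagged}."""
    ports = {}
    for token in spec.split():
        tagged = token.endswith("t")
        ports[int(token.rstrip("t"))] = tagged
    return ports


def vlans_missing_cpu(vlans, cpu_ports=(0, 6)):
    """Return the VIDs whose port list contains none of the CPU ports."""
    return [
        vid
        for vid, spec in vlans.items()
        if not any(port in cpu_ports for port in parse_ports(spec))
    ]


# vid -> 'option ports' value, as posted in the router config above.
router_vlans = {
    1: "6t 4",
    2: "0t 5",
    3: "3",
    4: "2",
    5: "1",
    10: "6t 4t 3t 2t 1t",
}
print(vlans_missing_cpu(router_vlans))  # prints [3, 4, 5]
```

For that config it reports VIDs 3, 4 and 5, whose sections list a single port without `6t`. That may or may not be related to the LAG problem, but it could be consistent with only the first port working.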
After playing with the config back and forth, I was able to get the bond to work with the VLAN by adding the bond to my LAN bridge and using the bridge VLAN filtering option to assign VLAN10 to the bond port on the bridge as tagged. This seemed to work, but it caused issues with the wifi interfaces that were on the same bridge.
I have isolated the bond to its own bridge and bonded only the last 3 ports. I am going to play with the config and see if I can mimic what I did above while isolating the wifi so it isn't messed up by the bridge VLAN filter. I am using the first port to manage everything so my network doesn't break whenever I screw up the bond.
Here is my config now:
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fdcc:c1dc:0b80::/48'
config device
option name 'br-lan'
option type 'bridge'
list ports 'eth1.1'
config interface 'lan'
option device 'br-lan'
option proto 'static'
option ipaddr '192.168.1.1'
option netmask '255.255.255.0'
option ip6assign '60'
config interface 'wan'
option device 'eth0.2'
option proto 'dhcp'
config interface 'wan6'
option device 'eth0.2'
option proto 'dhcpv6'
config switch
option name 'switch0'
option reset '1'
option enable_vlan '1'
config switch_vlan
option device 'switch0'
option vlan '1'
option vid '1'
option ports '6t 4'
config switch_vlan
option device 'switch0'
option vlan '2'
option ports '0t 5'
option vid '2'
config switch_vlan
option device 'switch0'
option vlan '3'
option vid '3'
option ports '6t'
config switch_vlan
option device 'switch0'
option vlan '4'
option vid '4'
option ports '6t 3'
config switch_vlan
option device 'switch0'
option vlan '5'
option vid '5'
option ports '6t 2'
config switch_vlan
option device 'switch0'
option vlan '6'
option ports '6t 1'
option vid '6'
config switch_vlan
option device 'switch0'
option vlan '7'
option vid '10'
option ports '6t 4t'
config interface 'LANPORT2'
option proto 'static'
option device 'eth1.4'
list ipaddr '192.168.1.5/24'
config interface 'LANPORT3'
option proto 'static'
option device 'eth1.5'
list ipaddr '192.168.1.6/24'
config interface 'LANPORT4'
option proto 'static'
option device 'eth1.6'
list ipaddr '192.168.1.7/24'
config interface 'VLAN10'
option proto 'static'
option device 'eth1.10'
list ipaddr '192.168.88.1/24'
config interface 'bond0'
option proto 'bonding'
option auto '0'
option netmask '255.255.255.0'
list slaves 'eth1.4'
list slaves 'eth1.5'
list slaves 'eth1.6'
option bonding_policy '802.3ad'
option min_links '0'
option ad_actor_sys_prio '65535'
option ad_select 'stable'
option lacp_rate 'fast'
option xmit_hash_policy 'layer2'
option all_slaves_active '0'
option link_monitoring 'mii'
option miimon '100'
option downdelay '0'
option updelay '0'
option use_carrier '1'
option ipaddr '192.168.3.10'
config device
option type 'bridge'
option name 'br-LAG'
list ports 'bonding-bond0'
config interface 'LAG_LAN'
option device 'br-LAG'
option proto 'static'
list ipaddr '192.168.3.1/24'
option auto '0'
config bridge-vlan
option device 'br-LAG'
option vlan '10'
list ports 'bonding-bond0:t'
Use your LAG. You intend to bond multiple 10Gbps ports together at some point, yeah? I get tired of people telling others on forums not to use the available features because they aren't using them yet.
VLAN subnets are separate L3 networks. If you are getting DHCP and DNS your subnet for the VLAN is functioning normally. Somewhere upstream needs to be an L3 device to handle inter-vlan routing between subnets and hopefully implement some sort of firewall rules engine to control who can do what between networks. If you have a router in place (router-on-a-stick works, or an egress firewall can do the job generally) then you need to allow the inter-vlan routing between subnets.
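As a quick illustration of the subnet point, here is a minimal Python sketch using the standard `ipaddress` module. The client address 192.168.88.50 is made up; 192.168.88.1 and 192.168.1.1 are the VLAN10 and LAN gateway addresses from the configs earlier in the thread.

```python
import ipaddress


def needs_routing(host_iface: str, destination: str) -> bool:
    """True if destination lies outside the host's own subnet."""
    network = ipaddress.ip_interface(host_iface).network
    return ipaddress.ip_address(destination) not in network


client = "192.168.88.50/24"  # hypothetical device on VLAN 10
print(needs_routing(client, "192.168.88.1"))  # same subnet, reached directly
print(needs_routing(client, "192.168.1.1"))   # other subnet, must go via a router
```

Anything for which this returns True has to pass through the L3 device and whatever firewall rules sit on it.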
Guessing your firewall is just blocking the traffic between VLANs. I like to treat IoT as its own zone, separate from LAN, Wifi, DMZ, etc. It is a little harder to set up initially, but you will benefit from being able to control who can have IPv6 addresses this way.
I finally built a SuperMicro 1U server and installed Sophos XG on it, because pfSense/OPNsense and IPFire didn't like me replacing the LAN port with a bridge on a pair of LAGs (1G and 10G), and the Sophos license is free.
LAG adds considerable extra complexity (for OpenWrt and for your 3rd party switch); the crux of debugging is to get the basics working first, before turning to more complex setups. Expecting supporters (read: volunteers!) to go out of their area of expertise and comfort zone to fix your issues is difficult if you insist on the full monty, instead of being willing to simplify your problem/configuration and leave out orthogonal (even if potentially very relevant!) settings.
At this point it might be easier to escalate your issue by hiring a commercial consultant locally, who will probably gladly follow your line of thought for the right fee.
But I did get a simple and working config, and when I introduce the LAG it doesn't work. I am missing something about how I should add VLAN tagging to the LAG.
This is ultimately the same place I was when I originally made the post. I just am more familiar with everything now.
I really don't have any expectations of anyone. I made this post on the off chance someone walked the path I am currently walking. It's kind of the only expectation you can have of conversations on the internet. But I guess I will just have to be the first person to clear the way.
I will keep playing with it, probably dive into LAG configuration on linux and see if I can identify what's wrong.
Ya, I have been keeping it simple and keeping everything in the same firewall ruleset, so they should be able to communicate fine. I haven't even really been diagnosing cross-VLAN communication. I have been more focused on getting each VLAN to work by itself. The issue really is that when the LAG is configured with VLANs, the devices on the 10GbE switch can't reach their gateway for the subnet (192.168.88.1).
I also have issues with some of my devices (TP-Link EAP225-Wall) under OpenWrt 23.05.5: DSA is only partially operational (because of issues with some drivers related to the ath79 wifi baseband IC, I was told), and swconfig is also installed. Under such a "hybrid" setup with DSA and swconfig, I can't get the switch to accept some configurations that normally work fine either with no DSA at all or with full DSA. I understand how to create VLAN and switch configurations for full DSA, or for swconfig with no DSA at all, but with hybrid setups I fail.
So my question is: is there a post somewhere, or a piece of documentation, explaining how to handle the switch and interfaces in such hybrid setups? It looks like this situation is going to last a while, because DSA won't be fully implemented for all devices for some time; the driver integration and adaptation work seems to be quite time consuming for those developing this part of OpenWrt.