So I've installed 23.05.0-rc1 through rc3 on most of my devices without any new (or at least previously unknown) issues compared to 22.03. However, one device (TP-Link Archer C7 v2) is where my WiFi mesh connects to my wired LANs, and so it is also where I initiate my GRE tunnels to get the VLANs plumbed through to the wireless APs.
So here's the weird thing:
The wireless APs (e.g., UniFi AP LR) are fine with 23.05.0
The GRE tunnels all work if the source (C7) is 22.03.4
If I reboot the C7 to 23.05.0 some work and some do not -- they just don't come up
The configuration of the ~10 VLANs / tunnels is largely the same since all I'm doing is a bunch of boilerplate mapping eth1.# to @trunk1.#
I say "weird" because normally I'd expect consistent success or failure, but here there's no difference between the tunnels that work vs the ones that fail. So below is the boilerplate, for # in 3..12:
config interface 'NAME'
	option type 'bridge'
	option device 'br-name'
	option proto 'none'
	option defaultroute '0'

config switch_vlan
	option device 'switch0'
	option vlan '#'
	option ports '0t 1t'
	option vid '#'

config device
	option type 'bridge'
	option name 'br-name'
	option ipv6 '0'
	list ports 'eth1.#'
	list ports '@trunk0.#'
	list ports '@trunk1.#'
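For anyone wanting to reproduce this, the per-VLAN boilerplate could be generated with something like the sketch below. It is only an illustration: the names vlanN and br-vlanN are placeholders I made up, and the section types (interface, switch_vlan, device) are my assumption about how the three groups of options are split.

```shell
#!/bin/sh
# Sketch: print the boilerplate stanzas for one VLAN ID.
# 'vlanN' and 'br-vlanN' are hypothetical placeholder names.
emit_vlan() {
    vid="$1"
    cat <<EOF
config interface 'vlan${vid}'
    option type 'bridge'
    option device 'br-vlan${vid}'
    option proto 'none'
    option defaultroute '0'

config switch_vlan
    option device 'switch0'
    option vlan '${vid}'
    option ports '0t 1t'
    option vid '${vid}'

config device
    option type 'bridge'
    option name 'br-vlan${vid}'
    option ipv6 '0'
    list ports 'eth1.${vid}'
    list ports '@trunk0.${vid}'
    list ports '@trunk1.${vid}'

EOF
}

# Emit stanzas for VLANs 3..12, as in the description above.
vid_i=3
while [ "$vid_i" -le 12 ]; do
    emit_vlan "$vid_i"
    vid_i=$((vid_i + 1))
done
```

The output is meant to be reviewed by hand before merging anything into /etc/config/network, since not every VLAN is on every trunk in the real setup.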
FWIW, there are no errors in the logs. If I boot with 22.03.4 I get messages like the ones below; with 23.05.0, gre4t-trunk1.12 simply never comes up:
# logread | egrep -i 'VLAN.*trunk[0-9]\.[1-9]'
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.8' link is up
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.4' link is up
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.3' link is up
Fri Aug 25 08:03:39 2023 daemon.notice netifd: VLAN 'gre4t-trunk1.3' link is up
Fri Aug 25 08:03:39 2023 daemon.notice netifd: VLAN 'gre4t-trunk1.12' link is up
Hmm, I'm currently running a GRE tunnel on 23.05.0-rc3 and it works fine (DSA switch). I see you use a non-DSA switch? I think it should work even without DSA.
I'm still missing something important here, though:
What have you configured inside the GRE interface?
Can you show the network configuration part of this?
I only ran into an issue once: when editing the GRE interface in LuCI, I had to leave the Network Interface field empty, as well as the bind interface, otherwise it caused issues on the master router. On the dumb AP I only added wan as the bind interface.
Also note that after a reboot of OpenWrt the GRE kmods are loaded, but you may still see a message that the trunk is not connected. That is normal; just ignore it, it should work fine.
You can validate this by looking at the rx and tx counts on the trunk interface.
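One way to read those counters without LuCI is straight from sysfs. This is only a sketch: the helper name if_counters and the SYSFS_NET override are made up for illustration, and gre4t-trunk1 is assumed to be the name of the trunk device based on the log lines above.

```shell
#!/bin/sh
# Sketch: print rx/tx packet counters for one interface from sysfs.
# SYSFS_NET is a hypothetical override, only there to ease testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

if_counters() {
    dev="$1"
    stats="$SYSFS_NET/$dev/statistics"
    # No statistics directory means the interface does not exist.
    if [ ! -d "$stats" ]; then
        echo "$dev: no such interface" >&2
        return 1
    fi
    echo "$dev: rx=$(cat "$stats/rx_packets") tx=$(cat "$stats/tx_packets")"
}

# Example call (assumed device name):
#   if_counters gre4t-trunk1
```

Running it twice a few seconds apart should show the numbers increasing if traffic is actually flowing through the tunnel.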
Thanks, I think I had tried that one before, but tried again. It doesn't help.
So I've found a clear, proximate symptom: I was about to tcpdump the GRE interface to see what was going on, and when running ifconfig -a I noticed that gre4t-trunk1.12 doesn't even exist. It stopped creating interfaces at gre4t-trunk1.3 (to be clear, .4 - .11 aren't configured on that trunk, and trunk2 is disabled).
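A quick way to spot this kind of gap is to compare the interfaces you expect against what the kernel actually created. The sketch below assumes the interface names from the 22.03.4 log output; the helper name missing_ifaces and the SYSFS_NET override are hypothetical.

```shell
#!/bin/sh
# Sketch: list expected interfaces that the kernel has not created.
# SYSFS_NET is a hypothetical override, only there to ease testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

missing_ifaces() {
    for dev in "$@"; do
        # Every existing netdev has an entry under /sys/class/net.
        [ -e "$SYSFS_NET/$dev" ] || echo "$dev"
    done
}

# Example call, using the names seen in the 22.03.4 log:
#   missing_ifaces gre4t-trunk0.3 gre4t-trunk0.4 gre4t-trunk0.8 \
#                  gre4t-trunk1.3 gre4t-trunk1.12
```

An empty result means every listed interface exists; any name printed was never created, which matches what ifconfig -a showed for gre4t-trunk1.12.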
Here's the config from my wireless bridge device that doesn't work (tried with and without bridge_empty):
But those symptoms got me thinking... It's not that you can't run two GRE instances, so that isn't the problem either.
Then there is the MTU. I know from experience that an MTU that is too low or too high can cause really weird behaviour.
I saw that you override it to 1500 in your GRE interfaces. That is a good value, because 1500 is what the GRE documentation gives as a stable MTU.
But that says nothing about the MTU of the devices. In my DSA configuration each device and bridge device also has its own mtu setting, and I have set those to 1500 as well, even though I would have expected the GRE interface to override them.
Could you try that and see whether it affects the symptom you have?
1500 is the default, but I set it anyway and it didn't help. Also in my experience the MTU would affect the ability of it to pass traffic, not bringing up the interface. The bottom line is that it never even creates the interface.
I also tried most of the other settings from your config, as well as adding the failing VLAN to trunk0, since the only difference between the one that works and the one that doesn't is that the working one is on all the tunnels / trunks. That didn't help, either.
At this point I'm out of ideas and need to revert since I have downstream stuff that is unhappy with being offline so long.