[SOLVED] GRE tunnels not working in 23.05.0-rc#

So I've installed 23.05.0-rc1 to rc3 on most of my devices without new or at least unknown issues over 22.03. However, one device (TP-Link Archer C7 v2) is where my WiFi mesh connects to my wired LANs and so also where I initiate my GRE tunnels to get the VLANs plumbed through to the wireless APs.

So here's the weird thing:

  • The wireless APs (e.g., UniFi AP LR) are fine with 23.05.0
  • The GRE tunnels all work if the source (C7) is 22.03.4
  • If I reboot the C7 to 23.05.0 some work and some do not -- they just don't come up
  • The configuration of the ~10 VLANs / tunnels is largely the same since all I'm doing is a bunch of boilerplate mapping eth1.# to @trunk1.#

I say "weird" because normally I'd expect consistent success or failure, but here there's no difference between the tunnels that work vs the ones that fail. So below is the boilerplate, for # in 3..12:

config interface 'NAME'
        option type 'bridge'
        option device 'br-name'
        option proto 'none'
        option defaultroute '0'

config switch_vlan
        option device 'switch0'
        option vlan '#'
        option ports '0t 1t'
        option vid '#'

config device
        option type 'bridge'
        option name 'br-name'
        option ipv6 '0'
        list ports 'eth1.#'
        list ports '@trunk0.#'
        list ports '@trunk1.#'

Any ideas?

FWIW there are no errors in the logs. If I boot with 22.03.4 I get messages like the ones below; with 23.05.0, gre4t-trunk1.12 simply never comes up:

# logread | egrep -i 'VLAN.*trunk[0-9]\.[1-9]'
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.8' link is up
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.4' link is up
Fri Aug 25 08:03:38 2023 daemon.notice netifd: VLAN 'gre4t-trunk0.3' link is up
Fri Aug 25 08:03:39 2023 daemon.notice netifd: VLAN 'gre4t-trunk1.3' link is up
Fri Aug 25 08:03:39 2023 daemon.notice netifd: VLAN 'gre4t-trunk1.12' link is up

Hmm, I'm currently running a GRE tunnel on 23.05.0-rc3 (DSA switch) and it works fine. I see you use a non-DSA switch? I think it should work even if it's not DSA.

I'm only missing something important here:

What have you configured inside the GRE interface?

Can you show the network configuration part of this?

I only once had an issue: when editing the GRE interface in LuCI, I had to keep the Network Interface field empty, as well as the bind interface, otherwise it caused issues on the master router. On the dumb AP I only added wan to the bind interface.

Also note that after a reboot the GRE kmods are loaded, but you may see a message that the trunk is not connected. That is normal, just ignore it; it should work fine. :+1:

You can validate this by looking at the rx and tx counts on the trunk interface.
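
A minimal sketch of that check, assuming the kernel's standard sysfs counters under /sys/class/net. The SYSFS_NET override and the if_counters helper are hypothetical names used for illustration, not part of OpenWrt:

```shell
#!/bin/sh
# Print rx/tx packet counters for one interface, read from sysfs.
# SYSFS_NET is a hypothetical override so the helper can be pointed
# at a different directory; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

if_counters() {
    ifname="$1"
    stats="$SYSFS_NET/$ifname/statistics"
    # A missing directory means the kernel has no such interface at all.
    if [ ! -d "$stats" ]; then
        echo "$ifname: no such interface"
        return 1
    fi
    echo "$ifname rx=$(cat "$stats/rx_packets") tx=$(cat "$stats/tx_packets")"
}
```

Running, say, `if_counters gre4t-trunk0` twice a few seconds apart shows whether the counters move. Counters on the base tunnel device include every VLAN carried over it, so it is worth checking the per-VLAN device (e.g. gre4t-trunk0.3) as well.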

Correct, this continues to be the case on the C7 on all versions through 23.05.0-rc3.

config interface 'trunk0'
        option proto 'gretap'
        option force_link '1'
        option peeraddr '192.168.10.3'
        option tunlink 'lan'
        option df '0'
        option defaultroute '0'
        option mtu '1500'

config interface 'trunk1'
        option proto 'gretap'
        option force_link '1'
        option peeraddr '192.168.10.254'
        option df '0'
        option defaultroute '0'
        option mtu '1500'

Also btw I've tried both with and without an option tunlink 'lan' statement.

Sorry, what other parts are you looking for that aren't above or in the OP?

Yes, rx and tx are incrementing, probably because, like I alluded to, 1.3 is working but 1.12 is not.

Thanks.

-- Mike

Looking at your snippets, I notice you are missing the checkbox for force link.

You can fix it by adding it like this:
uci set network.trunk0.force_link='1' and then uci commit network. With that, the interface stays up even when there is no link.

Then, on my DSA configuration, there is an option you need to select on the bridge to keep the bridge up; I'm not sure whether this is the same with swconfig.

Here is an example I have on my wireless bridge device:

root@Mochabin:~# uci show network.@device[-5]
network.cfg250f15=device
network.cfg250f15.type='bridge'
network.cfg250f15.name='br-wlan0'
network.cfg250f15.bridge_empty='1' <---------- this.
network.cfg250f15.ipv6='0'
network.cfg250f15.ports='@trunk.50'
network.cfg250f15.igmp_snooping='1'
network.cfg250f15.mtu='1500'

for reference my config from the master router:

root@Mochabin:~# uci show network.trunk
network.trunk=interface
network.trunk.proto='gretap'
network.trunk.force_link='1'
network.trunk.peeraddr='10.234.53.3'
network.trunk.ipaddr='10.234.53.1'
network.trunk.df='0'
network.trunk.delegate='0'
network.trunk.metric='10'
network.trunk.multicast='1'
network.trunk.peerdns='1'
network.trunk.defaultroute='0'
network.trunk.mtu='1500'

and the dumbap:

root@GL-AX1800:~# uci show network.trunk
network.trunk=interface
network.trunk.proto='gretap'
network.trunk.ipaddr='10.234.53.3'
network.trunk.delegate='0'
network.trunk.df='0'
network.trunk.defaultroute='0'
network.trunk.force_link='1'
network.trunk.peeraddr='10.234.53.1'
network.trunk.metric='10'
network.trunk.multicast='1'
network.trunk.tunlink='wan'

I forgot why I even added metric 10. I don't think it's supposed to be there, but I probably once added it because of some unknown conflict :stuck_out_tongue:

It was there if you look at the OP.

Thanks, I think I had tried that one before, but tried again. It doesn't help.

So I've found a clear, proximate symptom: I was about to tcpdump the GRE interface to see what was going on, and when doing ifconfig -a I noticed gre4t-trunk1.12 doesn't even exist: it stopped creating interfaces at gre4t-trunk1.3 (to be clear, .4 - .11 aren't configured on that trunk and trunk2 is disabled).

Here's the config from my wireless bridge device that doesn't work (tried with and without bridge_empty):

root@C7:~# uci show network.@device[-2]
network.cfg230f15=device
network.cfg230f15.type='bridge'
network.cfg230f15.name='br-remwc'
network.cfg230f15.ipv6='0'
network.cfg230f15.ports='eth1.12' '@trunk1.12' '@trunk2.12'
network.cfg230f15.bridge_empty='1'

Here's one that does work:

root@C7:~# uci show network.@device[-11]
network.cfg1a0f15=device
network.cfg1a0f15.type='bridge'
network.cfg1a0f15.name='br-iot'
network.cfg1a0f15.ipv6='0'
network.cfg1a0f15.ports='eth1.3' '@trunk0.3' '@trunk1.3' '@trunk2.3'

And the GRE source router:

root@C7:~# uci show network.trunk1
network.trunk1=interface
network.trunk1.proto='gretap'
network.trunk1.force_link='1'
network.trunk1.peeraddr='192.168.10.254'
network.trunk1.df='0'
network.trunk1.defaultroute='0'
network.trunk1.mtu='1500'

And the dumbAP:

root@UniFi-0:~# uci show network.trunk
network.trunk=interface
network.trunk.proto='gretap'
network.trunk.force_link='1'
network.trunk.df='0'
network.trunk.defaultroute='0'
network.trunk.tunlink='lan'
network.trunk.mtu='1500'
network.trunk.peeraddr='192.168.10.5'
network.trunk.ipaddr='192.168.10.254'

Thanks!

-- Mike

From what I see all looks fine now.

But those symptoms make me think... It's not that you cannot run two GRE instances, so that is not the problem either.

Now there is MTU. I know from experience that MTU can cause really weird behaviour when it is too low or too high.

In your GRE interfaces I did see you override it to 1500. That is a good MTU, because 1500 is what GRE documents as a stable MTU.

But that doesn't say anything about the devices' MTU. On my DSA configuration each device and bridge device also comes with an MTU; I have set those to 1500 as well, even though I would have expected them to be overridden by the GRE interface.

You could try it and see if it affects the symptom you have?

1500 is the default, but I set it anyway and it didn't help. Also in my experience the MTU would affect the ability of it to pass traffic, not bringing up the interface. The bottom line is that it never even creates the interface.

I also tried much of the other stuff you had in your config, as well as adding it to trunk0, since the only difference between the one that works and the one that doesn't is that the one that works is on all the tunnels / trunks. That didn't help, either.

At this point I'm out of ideas and need to revert since I have downstream stuff that is unhappy with being offline so long.

But thanks for the suggestions!

-- Mike

I think that there are three options:

  1. allow fragmentation on the GRE tunnel
  2. decrease the MTU of the GRE tunnel to 1476
  3. increase the MTU of the medium the GRE tunnel runs over. If the GRE tunnel runs over Ethernet, the MTU needs to be at least 1524
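
The figures in options 2 and 3 follow from the encapsulation overhead. A rough sketch of the arithmetic, assuming IPv4 without options (20 bytes) and a basic GRE header with no key, checksum, or sequence number (4 bytes). A gretap tunnel additionally carries the inner Ethernet header (14 bytes), so its numbers differ from the plain GRE ones quoted above. The helper names are made up for illustration:

```shell
#!/bin/sh
# Header sizes in bytes (assumptions: IPv4 without options,
# GRE without key/checksum/sequence fields).
IPV4_HDR=20
GRE_HDR=4
ETH_HDR=14   # inner Ethernet header, only present with gretap (layer 2)

# gre_overhead MODE -> bytes added by the encapsulation
gre_overhead() {
    case "$1" in
        gretap) echo $((IPV4_HDR + GRE_HDR + ETH_HDR)) ;;
        *)      echo $((IPV4_HDR + GRE_HDR)) ;;
    esac
}

# tunnel_mtu UNDERLAY_MTU MODE -> largest inner packet that fits unfragmented
tunnel_mtu() { echo $(($1 - $(gre_overhead "$2"))); }

# underlay_mtu TUNNEL_MTU MODE -> MTU the underlying link would need
underlay_mtu() { echo $(($1 + $(gre_overhead "$2"))); }

echo "gre:    tunnel MTU $(tunnel_mtu 1500 gre), underlay needs $(underlay_mtu 1500 gre)"
echo "gretap: tunnel MTU $(tunnel_mtu 1500 gretap), underlay needs $(underlay_mtu 1500 gretap)"
```

With a 1500-byte underlay this gives 1476 and 1524 for plain GRE, matching the list above, and 1462 and 1538 for gretap.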

I totally get what you're saying (I've been a sysadmin on networked systems since 1987), but there are still 4 contradictory facts:

  1. This exact same config works in <= 22.03.4
  2. Some of the identically configured tunnels still work in 23.05.0
  3. MTU problems typically manifest as full-sized packets not being passed, which is not the issue here
  4. OpenWrt never even brings the interface up

As the device is aging (8 years old!), I'd rather not reflash it again since IDK how many reflashes it has in it unless I'm more confident of a "smoking gun" or obvious debugging path.

Thanks.

-- Mike

Updated: I tried 23.05.0-rc4 and have the same results.

Btw to @mattimat's point, I spent a couple of hours playing around with MTU settings on 22.03.4 and 23.05.0-rc3, including tcpdumping the traffic and ping -s # -M do, and verified that my MTU settings are functional. To understand why, read the later posts in [SOLVED] Gretap Tunnel and MTU-Size Problem about Layer 2 bridges. Basically by going with the default settings and not forcing MTU changes, larger GRE packets will be fragmented at the source and reassembled at the destination, which hurts performance but otherwise works just fine.
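
For reference, the payload size to pass to ping -s for a given MTU can be worked out as below. This is a sketch assuming IPv4 (20-byte header without options) and the 8-byte ICMP echo header. ping_payload is a hypothetical helper, and 192.168.10.254 is simply the peer address from the configs earlier in the thread:

```shell
#!/bin/sh
# ping_payload MTU -> ICMP payload size that produces a packet of exactly MTU bytes
# (MTU minus 20 bytes IPv4 header minus 8 bytes ICMP echo header).
ping_payload() { echo $(($1 - 20 - 8)); }

mtu=1500
# Only print the command; -M do (iputils ping) forbids fragmentation.
echo "ping -s $(ping_payload "$mtu") -M do 192.168.10.254"
```

For an MTU of 1500 the payload comes out at 1472; a reply at that size with -M do set means full-sized packets make it across without fragmentation on that path.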

And again, all this MTU discussion ignores that 23.05.0-rc# fails to even create the gre4t-trunk1.12 interface.

Linux kernel interface max length

@vgaetera :man_facepalming: Thank you! I already renamed, reflashed, and verified it works.

You have no idea how much time I've wasted on this. It might have been obvious if I'd brought up, say, gre4t-trunk1.1 - gre4t-trunk1.19 and it stopped creating interfaces at gre4t-trunk1.9, but of course I don't tunnel all my VLANs, just the ones used by the mesh APs, and so it's sparse.
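
For anyone hitting the same thing: the kernel's IFNAMSIZ is 16 bytes including the terminating NUL, and the gre4t- prefix plus the interface name and VLAN ID use that up quickly. A quick sketch for checking generated names before flashing. The gre4t- prefix is taken from the log lines earlier in the thread, the helper names are hypothetical, and flagging names of 15 characters or more is an assumption based only on what was observed here (gre4t-trunk1.3 was created, gre4t-trunk1.12 was not):

```shell
#!/bin/sh
# Kernel limit: 16 bytes including the terminating NUL.
IFNAMSIZ=16
# Prefix as seen in the netifd log lines for gretap interfaces in this thread.
PREFIX="gre4t-"

# vlan_ifname INTERFACE VLAN_ID -> device name as it appears in the logs
vlan_ifname() { echo "${PREFIX}$1.$2"; }

# check_ifname NAME -> report the length and flag names at or above
# IFNAMSIZ - 1 characters (assumed threshold, see the note above).
check_ifname() {
    name="$1"
    len=${#name}
    if [ "$len" -ge $((IFNAMSIZ - 1)) ]; then
        echo "$name len=$len risky"
    else
        echo "$name len=$len ok"
    fi
}

for vid in 3 9 12; do
    check_ifname "$(vlan_ifname trunk1 "$vid")"
done
```

Here gre4t-trunk1.3 and gre4t-trunk1.9 come out at 14 characters and gre4t-trunk1.12 at 15, which lines up with where interface creation stopped. A shorter interface name leaves room for two-digit VLAN IDs.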
