Suspected packet fragmentation issues with GRETAP tunnel

I have been attempting to set up a system using a GRETAP tunnel with a Wi-Fi underlay between two
OpenWrt routers. This has proved somewhat challenging. Much of the information out there is out of
date or lacks detail, and OpenWrt itself is a bit of a moving target.

Let me describe my setup, then I'll describe my issue. Sorry for the lengthy description to follow,
but I want to accurately describe the setup and what is occurring.

I have a typical requirement for a home network with both "IoT" devices and general purpose "LAN"
devices (e.g. laptops and PCs). To this end, I have set up multiple VLANs for different purposes:
an "admin" VLAN (1), a "lan" VLAN (20), and an "iot" VLAN (30). I have multiple hosts connected
over wireless to the "lan" network as well as other hosts connected to the "iot" network. This is a
typical setup where hosts on the "lan" network can access the upstream internet, and hosts on the
"lan" and "iot" networks. Hosts on the "iot" network can access the upstream internet and hosts on
the "iot" network but cannot access hosts on the "lan" network. Great! So prior to starting on
this latest project, all was apparently working well with a single router (Belkin RT3200 running
OpenWrt) and various devices connected to an appropriate VLAN via an AP or wired link according to
their intended purpose.
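
The access policy described above can be sketched as a small lookup. This is only an illustration of the intent; the zone names and the `can_initiate` helper are made up for the example and are not taken from the firewall config.

```python
# Forwarding policy as described in the text (zone names are illustrative).
ALLOWED_FORWARDS = {
    ("lan", "wan"),  # lan hosts reach the upstream internet
    ("lan", "iot"),  # lan hosts reach iot hosts
    ("iot", "wan"),  # iot hosts reach the upstream internet
}


def can_initiate(src: str, dst: str) -> bool:
    """Return True if a host in zone `src` may open a connection to zone `dst`."""
    if src == dst:
        return True  # hosts within the same network can reach each other
    return (src, dst) in ALLOWED_FORWARDS


if __name__ == "__main__":
    for src, dst in [("lan", "iot"), ("iot", "lan"), ("iot", "wan")]:
        verdict = "allowed" if can_initiate(src, dst) else "blocked"
        print(f"{src} -> {dst}: {verdict}")
```

The sketch only covers who may start a connection. It assumes a stateful firewall, so replies to an allowed connection are let through in the opposite direction.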

I wanted to extend this network over a wireless link to areas that I cannot reach with a wired
connection. My approach was to set up a GRETAP point-to-point tunnel to carry a VLAN trunk over a
wireless link between my existing router (Belkin RT3200) and a second router (OpenWrt One). This
gives me the option of using a wired or wireless connection for devices at the remote location,
with the OpenWrt One acting as an extension of my existing network.

As an aside, in retrospect, I'm not sure that using a GRETAP point-to-point tunnel was the best
approach for what I ultimately want to achieve. But I have learned a bunch, and now I want
to see this through. I am thinking that a better approach would be to set up an 802.11s mesh as a
wireless backhaul, with VXLAN carrying the VLAN trunk. But that is for another day. I also
decided not to attempt to use relayd for this project, as it was my understanding that there are
some limitations with it. And it's also point-to-point.

So, my issue. I have set up the GRETAP tunnel over a dedicated wireless link (on its own subnet)
between the Belkin RT3200 (router "a") and the OpenWrt One (router "b"). Both routers are running
OpenWrt 24.10.5. I have a MacBook Air (let's call it "host a") connected to router "a" via Wi-Fi
on the "lan" network. I have an Asus PN50 NUC running Linux Mint (let's call it "host b") connected
via an Ethernet cable to the LAN port on router "b", also on the "lan" network. (There are numerous
other devices connected to the network, but they are not relevant to this discussion.)

If I set the MTU of the OpenWrt GRETAP tunnel interface ("gt0") and the OpenWrt tunnel wireless
transport interface ("wtun") to something like 2048 on each router, everything apparently works
fine. This includes regular internet website access, a working RDP connection between hosts "a" and
"b" across the GRETAP tunnel, and an MQTT bridge used by Home Assistant to access the MQTT broker on
my Victron Cerbo GX, also across the GRETAP tunnel. I say "apparently" in part because my theory is
that this MTU setting is large enough to avoid any fragmentation occurring along the various paths
in the network. Numerous references out there seem to indicate that this is a "good thing to do",
but I am not convinced. My theory is that this is just masking an underlying problem.

When I reduce the MTU on the GRETAP tunnel interface to something around 1458 (a size similar to
that also suggested in numerous references) and allow the tunnel wireless transport interface to use
its default MTU (1500), things go badly. I believe that "go badly" is caused by packets being dropped
somewhere along the path.
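
For reference, here is a rough sketch of where tunnel MTU values like these come from. It assumes an IPv4 underlay and a plain GRE header with no key, checksum, or sequence number; the function name is just for illustration.

```python
OUTER_IPV4_HEADER = 20  # outer IPv4 header without options
GRE_HEADER = 4          # base GRE header (assumes no key, checksum, or sequence number)
INNER_ETHERNET = 14     # Ethernet header of the bridged frame carried inside GRE
VLAN_TAG = 4            # 802.1Q tag on frames belonging to a tagged VLAN


def gretap_inner_mtu(underlay_mtu: int, vlan_tagged: bool = False) -> int:
    """Largest inner IP packet that fits in one unfragmented outer packet."""
    overhead = OUTER_IPV4_HEADER + GRE_HEADER + INNER_ETHERNET
    if vlan_tagged:
        overhead += VLAN_TAG
    return underlay_mtu - overhead


if __name__ == "__main__":
    print(gretap_inner_mtu(1500))                    # 1462
    print(gretap_inner_mtu(1500, vlan_tagged=True))  # 1458
```

Under these assumptions a 1500-byte underlay leaves 1462 bytes for untagged traffic, which matches the default MTU shown for gretap0 in the ip link output, and 1458 bytes once a VLAN tag is carried across the tunnel.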

I have performed a number of experiments to try to isolate where the problem is.

  • I can successfully ping from router "a" targeting the wireless tunnel endpoint on router "b",
    even when specifying a large ping payload size (e.g. 2000). I can also do this in reverse from
    router "b". I take this as evidence that the problem is not with the wireless tunnel transport.
  • I can also successfully ping between host "a" connected to router "a" and host "b" connected to
    router "b" only if I use a small ping payload size less than the MTU. Once my payload size
    gets to a point where fragmentation occurs, ping just hangs.
  • Interestingly I can successfully ping from router "a" to host "a" (connected via a wireless link)
    with a large ping payload.
  • I cannot successfully ping from router "b" to host "b" (connected via an Ethernet cable) with
    a large ping payload.
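
To make the payload sizes in these tests concrete, the sketch below works out the largest ping payload that avoids fragmentation and the fragment sizes a sender would produce otherwise. It assumes IPv4 with a 20-byte header (no options) and an 8-byte ICMP echo header; the helper names are illustrative.

```python
IPV4_HEADER = 20  # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo request/reply header


def max_unfragmented_ping_payload(mtu: int) -> int:
    """Largest ping payload that still fits in a single IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fragment_lengths(ping_payload: int, mtu: int) -> list[int]:
    """Total IPv4 length of each fragment a sender with this MTU would emit."""
    remaining = ping_payload + ICMP_HEADER
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    lengths = []
    while remaining > 0:
        if remaining <= mtu - IPV4_HEADER:
            chunk = remaining  # last (or only) packet takes whatever is left
        else:
            chunk = per_fragment
        lengths.append(chunk + IPV4_HEADER)
        remaining -= chunk
    return lengths


if __name__ == "__main__":
    print(max_unfragmented_ping_payload(1452))  # 1424
    print(fragment_lengths(2000, 1500))         # [1500, 548]
    print(fragment_lengths(2000, 1452))         # [1452, 596]
```

Note that a sender fragments according to its own interface MTU, so a host with an MTU of 1500 emits a first fragment of 1500 bytes even if a device further along the path has a smaller MTU.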

The configuration on each router is slightly different, so there may be something in the
configuration that causes the difference in behavior (?)

To further isolate the problem, I used tcpdump and Wireshark to inspect various points
along the path. I start a ping from router "b" targeting host "a" (through the GRETAP tunnel) and
then attach tcpdump to various interfaces along the path. I have determined that the ping "request"
with the large payload makes it intact all the way from router "b", through the GRETAP tunnel to
router "a", then on to host "a" via its wireless lan link. In Wireshark I can see the fragments
and the reassembly being performed. The ping "reply" leaves host "a", traverses the
Wi-Fi lan link between host "a" and router "a", appears at the router "a" wireless AP interface (in
my case wl0-ap0), but stops there. This wireless interface appears as a port on the br-lan
bridge as a result of the OpenWrt wireless config for the AP. The other port on the br-lan bridge
is br-sw.20 which is the VLAN 20 interface on the br-sw bridge. The br-sw bridge has VLAN
filtering enabled with ports being the physical switch ports on the RT3200 and the Linux gre4t-gt0
interface that is the GRETAP tunnel endpoint. The ping "reply" payload never appears on the
br-sw.20 VLAN interface, so my assumption is the ping "reply" payload is being dropped for some
reason by the br-lan bridge and never traverses the bridge between the wl0-ap0 interface and
the br-sw.20 VLAN interface.
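
A capture like the one described can be repeated on each interface with the same filter. The sketch below just builds the tcpdump command lines; the interface names come from the description above, while the filter and the helper function are my own choice for the example, not the exact commands I ran.

```python
import shlex

# Matches ICMP plus any IPv4 fragment (MF flag set or non-zero fragment offset).
CAPTURE_FILTER = "icmp or (ip[6:2] & 0x3fff != 0)"

# Untagged interfaces along the reply path on router "a".
INTERFACES = ["wl0-ap0", "br-lan", "br-sw.20"]


def tcpdump_command(interface: str, capture_filter: str = CAPTURE_FILTER) -> str:
    """Build a shell-safe tcpdump invocation for one interface."""
    # -n: no name resolution, -e: print link-level headers, -i: interface
    return shlex.join(["tcpdump", "-n", "-e", "-i", interface, capture_filter])


if __name__ == "__main__":
    for name in INTERFACES:
        print(tcpdump_command(name))
```

On the trunk side (br-sw and gre4t-gt0) frames may carry an 802.1Q tag, so the filter may need a `vlan` qualifier there.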

Ultimately, I am not sure that this issue is even related to the GRETAP tunnel; it may instead be
related to bridge configuration and packet fragmentation/reassembly of the ping response payload.
This is somewhat refuted by the fact that large ping payloads between router "b" and host "b" via a
wired connection fail. But that may be yet another issue (possibly related to VLAN filtering).
Note that host "b" (the ASUS PN50) hosts a VirtualBox VM running Home Assistant. HA is configured
to use VLAN 30, thus the configuration for both untagged VLAN 20 (for the host OS; Linux Mint) and
tagged VLAN 30 (for HAOS on the VM) on the port where host "b" is connected. This setup worked in
my original configuration with a wired connection from the RT3200 to the ASUS PN50. Hence the goal
of this project: replace the cable with a GRETAP tunnel over wireless.

Note that I have temporarily added an nftables forward rule to clamp the MSS. The
mtu_fix option didn't seem to improve the behavior, whilst clamping the MSS via nftables did
noticeably improve it. Clamping helps, but still leaves issues (such as an RDP blank screen and
MQTT bridge dropouts). I believe that MSS clamping is not related to the root cause of the
problem, but is probably still necessary in the long run.
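
As a rough illustration of what clamping does, the sketch below computes the MSS that fits a given path MTU. It assumes IPv4 and a 20-byte TCP header without options; the values are examples and not necessarily the ones used in my nftables rule.

```python
IPV4_HEADER = 20  # IPv4 header without options
TCP_HEADER = 20   # TCP header without options


def clamped_mss(path_mtu: int, advertised_mss: int) -> int:
    """MSS after clamping: never larger than what fits in one packet on the path."""
    return min(advertised_mss, path_mtu - IPV4_HEADER - TCP_HEADER)


if __name__ == "__main__":
    # A host on a 1500-byte link advertises 1460; the tunnel MTU here is 1452.
    print(clamped_mss(1452, 1460))  # 1412
```

Clamping only affects TCP, since it rewrites the MSS option in SYN packets. Pings and UDP traffic are untouched, which would be consistent with large pings still failing.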

Hopefully someone has a suggestion as to why all of this is happening, and maybe suggestions on
how to modify/improve my OpenWrt network configuration.

Thanks in advance!!!

Here is a simple diagram to help visualize the setup

I will post the configuration files in a followup…

Configuration

Belkin RT3200

ubus call system board

{
	"kernel": "6.6.119",
	"hostname": "banderson-rt32",
	"system": "ARMv8 Processor rev 4",
	"model": "Linksys E8450 (UBI)",
	"board_name": "linksys,e8450-ubi",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "24.10.5",
		"revision": "r29087-d9c5716d1d",
		"target": "mediatek/mt7622",
		"description": "OpenWrt 24.10.5 r29087-d9c5716d1d",
		"builddate": "1766005702"
	}
}

ip link show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1504 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
3: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-sw state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
4: lan2@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-sw state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
5: lan3@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-sw state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
6: lan4@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br-sw state DOWN mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
7: wan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:25 brd ff:ff:ff:ff:ff:ff
8: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: permaddr e6be:f375:e28a::
9: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
10: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
11: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre6 :: brd :: permaddr ca58:21ff:e4b5::
209: br-adm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
210: br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
211: br-sw.1@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
212: br-iot: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
213: br-sw.30@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
218: wl0-ap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether da:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:27
219: wl0-ap2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether de:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:27
220: wl0-ap3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d2:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:27
221: wl1-ap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether da:ec:5e:43:34:28 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:28
222: gre4t-gt0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc fq_codel master br-sw state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 2a:41:a8:f5:4a:86 brd ff:ff:ff:ff:ff:ff
223: wl1-ap2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether de:ec:5e:43:34:28 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:28
230: wl1-ap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:28 brd ff:ff:ff:ff:ff:ff
231: wl0-ap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff
234: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
235: br-sw.20@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff

network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd3b:1654:54dc::/48'
	option packet_steering '1'

config interface 'wan'
	option device 'wan'
	option proto 'dhcp'

config interface 'wan6'
	option device 'wan'
	option proto 'dhcpv6'

config device
	option type 'bridge'
	option name 'br-sw'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'gre4t-gt0'

config bridge-vlan
	option device 'br-sw'
	option vlan '1'
	list ports 'gre4t-gt0:t*'
	list ports 'lan1:t*'

config bridge-vlan
	option device 'br-sw'
	option vlan '10'
	list ports 'lan1:t'

config bridge-vlan
	option device 'br-sw'
	option vlan '20'
	list ports 'gre4t-gt0:t'
	list ports 'lan1:t'
	list ports 'lan2'
	list ports 'lan4'

config bridge-vlan
	option device 'br-sw'
	option vlan '30'
	list ports 'gre4t-gt0:t'
	list ports 'lan1:t'
	list ports 'lan3'
	list ports 'lan4:t'

config bridge-vlan
	option device 'br-sw'
	option vlan '40'
	list ports 'lan1:t'

config bridge-vlan
	option device 'br-sw'
	option vlan '50'
	list ports 'lan1:t'

config device
	option type 'bridge'
	option name 'br-adm'
	option igmp_snooping '1'
	option stp '1'
	list ports 'br-sw.1'

config interface 'adm'
	option proto 'static'
	option device 'br-adm'
	option ipaddr '192.168.201.1'
	option netmask '255.255.255.0'

config device
	option type 'bridge'
	option name 'br-lan'
	option igmp_snooping '1'
	list ports 'br-sw.20'
	option stp '1'

config interface 'lan'
	option proto 'static'
	option device 'br-lan'
	option ipaddr '192.168.220.1'
	option netmask '255.255.255.0'

config device
	option type 'bridge'
	option name 'br-iot'
	option igmp_snooping '1'
	option stp '1'
	list ports 'br-sw.30'

config interface 'iot'
	option proto 'static'
	option device 'br-iot'
	option ipaddr '192.168.230.1'
	option netmask '255.255.255.0'

config interface 'wtun'
	option proto 'static'
	option ipaddr '192.168.20.10'
	option netmask '255.255.255.0'

config interface 'gt0'
	option proto 'gretap'
	option peeraddr '192.168.10.20'
	option ipaddr '192.168.10.10'
	option mtu '1452'
	option df '0'

config interface 'gre_lo'
	option proto 'static'
	option device 'lo'
	option ipaddr '192.168.10.10'
	option netmask '255.255.255.255'

config route
	option interface 'wtun'
	option target '192.168.10.20/32'
	option gateway '192.168.20.20'

firewall


config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone
	option name 'z_lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'

config zone
	option name 'z_wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	list network 'wan'
	list network 'wan6'

config forwarding
	option src 'z_lan'
	option dest 'z_wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'z_wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'z_wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'z_wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'z_wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'z_wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'z_wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'z_wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'z_wan'
	option dest 'z_lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'z_wan'
	option dest 'z_lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config zone
	option name 'z_adm'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'adm'

config forwarding
	option src 'z_adm'
	option dest 'z_wan'

config forwarding
	option src 'z_adm'
	option dest 'z_lan'

config forwarding
	option src 'z_adm'
	option dest 'z_iot'

config zone
	option name 'z_iot'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'iot'

config forwarding
	option src 'z_iot'
	option dest 'z_wan'

config forwarding
	option src 'z_lan'
	option dest 'z_iot'

config zone
	option name 'z_tun'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'wtun'

config forwarding
	option src 'z_tun'
	option dest 'z_adm'

config forwarding
	option src 'z_tun'
	option dest 'z_iot'

config forwarding
	option src 'z_tun'
	option dest 'z_lan'

config forwarding
	option src 'z_adm'
	option dest 'z_tun'

config forwarding
	option src 'z_iot'
	option dest 'z_tun'

config forwarding
	option src 'z_lan'
	option dest 'z_tun'

dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'
	option piofolder '/tmp/odhcpd-piofolder'

config host
	option dns '1'
	option ip '192.168.201.10'
	option name 'banderson-gs1900'
	option mac 'D8:EC:E5:8D:FB:74'

config host
	option dns '1'
	option mac 'B8:EC:A3:E2:61:24'
	option ip '192.168.201.15'
	option name 'banderson-nwa50ax'

config dhcp 'adm'
	option interface 'adm'
	option start '100'
	option limit '150'
	option leasetime '12h'

config dhcp 'iot'
	option interface 'iot'
	option start '100'
	option limit '150'
	option leasetime '12h'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'

config host
	option name 'einstein'
	option mac '48:e7:da:87:77:67'
	option ip '192.168.230.110'

config host
	option name 'ha-rv'
	list mac '00:1E:06:42:83:48'
	option ip '192.168.230.101'

config host
	option name 'ha-dev'
	list mac '08:00:27:79:A5:58'
	option ip '192.168.230.102'

OpenWrt One

ubus call system board

{
	"kernel": "6.6.119",
	"hostname": "banderson-owrtone",
	"system": "ARMv8 Processor rev 4",
	"model": "OpenWrt One",
	"board_name": "openwrt,one",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "24.10.5",
		"revision": "r29087-d9c5716d1d",
		"target": "mediatek/filogic",
		"description": "OpenWrt 24.10.5 r29087-d9c5716d1d",
		"builddate": "1766005702"
	}
}

ip link show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d0 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-sw state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
4: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: permaddr 61b:a170:2096::
5: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
6: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
8: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre6 :: brd :: permaddr ca6a:38d3:f9ee::
9: br-adm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
10: br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
11: br-sw.1@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
12: br-iot: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
13: br-sw.30@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
14: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
15: br-sw.20@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d1 brd ff:ff:ff:ff:ff:ff
16: gre4t-gt0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1452 qdisc fq_codel master br-sw state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 36:04:cd:98:ad:c4 brd ff:ff:ff:ff:ff:ff
17: phy0-ap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d2 brd ff:ff:ff:ff:ff:ff
18: phy0-sta0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
    link/ether 20:05:b6:01:10:d5 brd ff:ff:ff:ff:ff:ff
19: phy0-ap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d3 brd ff:ff:ff:ff:ff:ff permaddr 20:05:b6:01:10:d2
20: phy0-ap2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether 20:05:b6:01:10:d4 brd ff:ff:ff:ff:ff:ff permaddr 20:05:b6:01:10:d2

network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd34:2b45:f4ab::/48'
	option packet_steering '1'

config interface 'wan'
	option device 'eth0'
	option proto 'dhcp'

config interface 'wan6'
	option device 'eth0'
	option proto 'dhcpv6'

config device
	option type 'bridge'
	option name 'br-sw'
	option igmp_snooping '1'
	option stp '1'
	list ports 'eth1'
	list ports 'gre4t-gt0'

config bridge-vlan
	option device 'br-sw'
	option vlan '1'
	list ports 'gre4t-gt0:t*'

config bridge-vlan
	option device 'br-sw'
	option vlan '20'
	list ports 'eth1:u*'
	list ports 'gre4t-gt0:t'

config bridge-vlan
	option device 'br-sw'
	option vlan '30'
	list ports 'eth1:t'
	list ports 'gre4t-gt0:t'

config interface 'wtun'
	option proto 'static'
	option ipaddr '192.168.20.20'
	option netmask '255.255.255.0'

config interface 'gt0'
	option proto 'gretap'
	option ipaddr '192.168.10.20'
	option peeraddr '192.168.10.10'
	option mtu '1452'
	option df '0'

config device
	option type 'bridge'
	option name 'br-adm'
	option igmp_snooping '1'
	option stp '1'
	list ports 'br-sw.1'

config interface 'adm'
	option proto 'static'
	option device 'br-adm'
	option ipaddr '192.168.201.2'
	option netmask '255.255.255.0'
	option gateway '192.168.201.1'

config device
	option name 'br-lan'
	option type 'bridge'
	option igmp_snooping '1'
	option stp '1'
	list ports 'br-sw.20'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.220.2'
	option netmask '255.255.255.0'
	option gateway '192.168.220.1'

config device
	option type 'bridge'
	option name 'br-iot'
	option igmp_snooping '1'
	option stp '1'
	list ports 'br-sw.30'

config interface 'iot'
	option proto 'static'
	option device 'br-iot'
	option ipaddr '192.168.230.2'
	option netmask '255.255.255.0'
	option gateway '192.168.230.1'

config interface 'gre_lo'
	option proto 'static'
	option device 'lo'
	option ipaddr '192.168.10.20'
	option netmask '255.255.255.255'

config route
	option interface 'wtun'
	option target '192.168.10.10/32'
	option gateway '192.168.20.10'

firewall

config defaults
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone
	option name 'z_lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'

config zone
	option name 'z_wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	list network 'wan'
	list network 'wan6'

config forwarding
	option src 'z_lan'
	option dest 'z_wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'z_wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'z_wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'z_wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'z_wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'z_wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'z_wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'z_wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'z_wan'
	option dest 'z_lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'z_wan'
	option dest 'z_lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config zone
	option name 'z_tun'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'wtun'

config forwarding
	option src 'z_tun'
	option dest 'z_lan'

config forwarding
	option src 'z_lan'
	option dest 'z_tun'

config zone
	option name 'z_adm'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'adm'

config forwarding
	option src 'z_adm'
	option dest 'z_lan'

config forwarding
	option src 'z_adm'
	option dest 'z_tun'

config forwarding
	option src 'z_tun'
	option dest 'z_adm'

config forwarding
	option src 'z_adm'
	option dest 'z_wan'

config zone
	option name 'z_iot'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'iot'

config forwarding
	option src 'z_iot'
	option dest 'z_tun'

config forwarding
	option src 'z_iot'
	option dest 'z_wan'

config forwarding
	option src 'z_adm'
	option dest 'z_iot'

config forwarding
	option src 'z_tun'
	option dest 'z_iot'

config forwarding
	option src 'z_lan'
	option dest 'z_iot'

dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '0'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option localservice '1'
	option ednspacket_max '1232'
	option local '/lan/'
	option noresolv '1'
	list server '/lan/192.168.201.1'
	list server '192.168.201.1'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option dhcpv6 'hybrid'
	option ra 'hybrid'
	list ra_flags 'managed-config'
	list ra_flags 'other-config'
	option ignore '1'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'
	option piofolder '/tmp/odhcpd-piofolder'

config dhcp 'adm'
	option interface 'adm'
	option ignore '1'
	option start '100'
	option limit '150'
	option leasetime '12h'

config dhcp 'iot'
	option interface 'iot'
	option ignore '1'
	option start '100'
	option limit '150'
	option leasetime '12h'

config dhcp 'gt0'
	option interface 'gt0'
	option ignore '1'

config dhcp 'wtun'
	option interface 'wtun'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option ignore '1'

Have you experienced any issues when using the large MTU?

Try

option df 0

For gre tunnel interface.
You will need to use a mini-jumbo size on the tunnel's outer interface to tunnel actual 1500B packets cleanly.

@eduperez

Not that I can readily see. I should probably go back and try some more experiments to see if large packets are dropped when I have set a large MTU. I am really trying to determine why I am seeing the behavior that I am seeing with the smaller MTU.

@brada4

I am already using option df 0 on my GRETAP tunnel interface definition (gt0).

Can you explain why I need to use “mini-jumbo” size on the tunnel outer interface (assuming that this would be gt0 in my setup)? From my experiments it seems as if my current setting (MTU=1452) properly fragments/reassembles packets across the tunnel, as evidenced by the ability to ping the remote tunnel endpoint with a large ping payload.

Note that when I set the MTU on the tunnel interface to 1452, the br-sw bridge that it is enslaved to as well as the br-lan bridge have their MTU set to 1452.

What am I missing here? Shouldn’t this “just work” sans performance (due to frag/reassembly) regardless of what the MTU setting is? Maybe I have a misconception of what the goal is when deciding what the MTU setting should be. Hmm, maybe something along the path has set the DF bit such that the bridge drops the large ping payload reply rather than reassembling the fragments?

Thanks for your responses.

To avoid fragmentation. Alternatively (if legacy switches in the middle do not support an MTU > 1500), you could route and benefit from frag-needed ICMP.

GRE encapsulation adds 30 bytes, so with the ethX backend at 1500, the GRE interface needs an MTU of 1470 or less. If you do not allow for the overhead, the tail of the packet payload gets discarded at some point. This is especially painful with UDP traffic such as syslog.
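
For reference, a header-by-header count comes out a little higher than the 30 bytes quoted above. The sketch below is only a rough check, assuming an IPv4 outer header without options, a base GRE header with no key, checksum, or sequence number, and an untagged inner Ethernet header. The variable names are made up for illustration.

```shell
#!/bin/sh
# Rough GRETAP-over-IPv4 overhead (assumptions: no IP options,
# no GRE key/checksum/sequence, untagged inner Ethernet frame).
OUTER_IP=20     # outer IPv4 header
GRE=4           # base GRE header
INNER_ETH=14    # inner Ethernet header carried inside the tunnel

OVERHEAD=$((OUTER_IP + GRE + INNER_ETH))

UNDERLAY_MTU=1500                           # MTU of the transport link
TUNNEL_MTU=$((UNDERLAY_MTU - OVERHEAD))     # largest inner MTU that avoids fragmentation

echo "overhead=${OVERHEAD} tunnel_mtu=${TUNNEL_MTU}"
```

Under those assumptions the overhead is 38 bytes, which leaves a tunnel MTU of 1462 on a 1500-byte underlay. A GRE key or a VLAN tag on the inner frame would each cost another 4 bytes.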

I went back and set the MTU on the GRETAP interface to 2000. This alone did not work (leaving the underlying wireless MTU as unspecified). When additionally setting the MTU of the wireless interface to 2000, things do appear to work, even with huge ping payloads (e.g. 4000). I haven’t examined things with tcpdump and Wireshark, but I am assuming that frag/reassembly is working since large ping payloads work.

I am confused.

Understood. This is exactly why I set the MTU on the GRETAP interface to 1452, so it seems as if this should have worked. Are you saying that it might be possible that the switch on the RT3200 and/or driver is the problem? What about similar issues with the OpenWrt One?

Likely one of the fragmentation offloads, such as GRO, GSO, or LRO, stands in the way, and you will get the same misbehaviour on any device with very capable network cards.
Try disabling ethtool -k/K offloads one by one...
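
A minimal sketch of what "one by one" could look like, assuming the tunnel device is named gre4t-gt0 (as in the output further down) and that the short feature names gro, gso, and tso are the ones of interest. The `DEV` and `DRY_RUN` variables and the `toggle_off` helper are hypothetical. By default the script only prints the commands; set `DRY_RUN=0` and run as root to actually apply them.

```shell
#!/bin/sh
# Disable offload features one at a time so each can be retested separately.
DEV="${DEV:-gre4t-gt0}"      # interface to adjust (assumed name)
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the command, 0 = run it

toggle_off() {
    # $1 is an ethtool feature short name, e.g. gro, gso, tso
    if [ "$DRY_RUN" = "1" ]; then
        echo "ethtool -K $DEV $1 off"
    else
        ethtool -K "$DEV" "$1" off
    fi
}

for feat in gro gso tso; do
    toggle_off "$feat"
    # re-run the large-payload ping here before moving to the next feature
done
```

Lowercase `ethtool -k` lists the current settings and uppercase `-K` changes them. Features marked [fixed] in the listing cannot be toggled.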

OK, thanks for the suggestion.

How is it that this H/W offload would affect a virtual bridge device on a different router (the Belkin RT3200, not the OpenWrt One…although it is probably happening there too in the reverse, I just haven’t spent the time to analyze it with tcpdump)? According to my investigations, the large ping reply payload is presumably being dropped by the br-lan bridge on the Belkin RT3200. Is there a way I can determine why the bridge is dropping the packet, or at least confirm that it is the bridge doing this? For example, can I enable logging somewhere?

So which interface should I check with ethtool? None of the Ethernet NICs should be involved since my test runs over wireless. Of course, the br-sw bridge on the Belkin RT3200 has DSA conduits as ports for the device’s switch ports…so maybe it is somehow involved?

Looking at the wireless NIC on the Belkin RT3200 that was the last place the large packet appeared before being dropped:

root@banderson-rt32:~# ethtool -k wl0-ap0
Features for wl0-ap0:
rx-checksumming: on [fixed]
tx-checksumming: off
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: off
	tx-scatter-gather: off [fixed]
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [fixed]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: on
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

What should I address here?

Maybe I am just chasing a rabbit down a hole here and should be satisfied with setting a large MTU on the GRETAP tunnel and the corresponding wireless transport, and be done with this. Very frustrating…it’s hard to imagine that there isn’t some issue somewhere along the line.

Thanks for your response and help.

ethtool -k eth0 | grep -v fixed
ethtool -k grr0 | grep -v fixed

@brada4

First, let me say that I appreciate you taking the time and patience to help me work through this problem. I am a retired software geek (developer) that is trying to up my networking foo and keep my brain from rotting! I am not a network engineer. This project, and working with OpenWrt has certainly helped in both regards…that is, upping my networking foo and keeping my brain from rotting! This has been a challenging project I must say.

With that said, it would help me if you were more explicit about what you are asking me to do and/or what information you might want.

What interfaces are you interested in?

The path that is traversed by the failing large payload ping is:

(OpenWrt One) br-lan → br-sw → GRETAP tunnel interface → phy0-sta0 → … → (Belkin RT3200) wl0-ap3 → GRETAP tunnel interface → br-sw → br-lan (which properly reassembles the fragmented request) → wl0-ap0 → (Macbook) → wl0-ap0 → br-lan (which drops the fragmented reply before it gets to the br-sw.20 VLAN interface port on the bridge)

Where do I start?

Again, is there some logging or something I can enable that will allow me to determine why br-lan is dropping the (fragmented) ping response?

Just for grins, assuming that I grok what info you want, here is the output of ethtool for a couple of interfaces on the Belkin RT3200 where the reply packet is dropped:

root@banderson-rt32:~# ethtool -k wl0-ap0 | grep -v fixed
Features for wl0-ap0:
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
tx-nocache-copy: off
rx-gro-list: on
rx-udp-gro-forwarding: off
root@banderson-rt32:~# ethtool -k gre4t-gt0 | grep -v fixed
Features for gre4t-gt0:
tx-checksumming: on
	tx-checksum-ip-generic: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp-mangleid-segmentation: on
	tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
highdma: on
tx-sctp-segmentation: on
tx-udp-segmentation: on
tx-nocache-copy: off
rx-gro-list: on
rx-udp-gro-forwarding: off
root@banderson-rt32:~# ethtool -k wl0-ap3|grep -v fixed
Features for wl0-ap3:
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
tx-nocache-copy: off
rx-gro-list: on
rx-udp-gro-forwarding: off

And the (redacted) output of ip link show:

297: br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
298: br-sw.1@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
300: br-sw.30@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
301: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
302: br-sw.20@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
304: wl0-ap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff
307: wl0-ap3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d2:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:27
308: gre4t-gt0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel master br-sw state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 22:67:92:d2:08:53 brd ff:ff:ff:ff:ff:ff

The configuration that works (MTU=2000 set on gre4t-gt0 and wl0-ap3) ends up leaving the MTU on br-sw and br-lan at its default value (1500). I also see that the MTU on the wl0-ap0 interface (a port on the br-lan bridge) remains at 1500 in both cases. Hmm, this means that when I set the MTU to 1400 on gre4t-gt0 and wl0-ap3 (the tunnel wireless link), and that value is then propagated to br-lan, an MTU mismatch occurs between the br-lan bridge (MTU=1400) and the wl0-ap0 port (MTU=1500). Is this a problem? What is the expectation of the MTU setting on a bridge vs its ports?

14: br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
15: br-sw.1@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-adm state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
17: br-sw.30@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-iot state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
18: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
19: br-sw.20@br-sw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:26 brd ff:ff:ff:ff:ff:ff
21: wl0-ap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether d8:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff
24: wl0-ap3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether d2:ec:5e:43:34:27 brd ff:ff:ff:ff:ff:ff permaddr d8:ec:5e:43:34:27
25: gre4t-gt0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc fq_codel master br-sw state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 7a:dd:06:4b:1c:43 brd ff:ff:ff:ff:ff:ff

Remove the gre backend from the switches.

OK. On the Belkin RT3200, I removed the gre4t-gt0 interface from the br-sw bridge that contains the DSA conduits for the switch. I then added gre4t-gt0.20 to the br-lan bridge (and similarly for br-adm and br-iot). That leaves the br-lan bridge with br-sw.20 and gre4t-gt0.20 as ports. I made a similar modification on the OpenWrt One router. Didn’t work. When I attached tcpdump to the wireless interface on the RT3200, I see frag/reassembly of the reply. Attached to the bridge itself, I see the frag/reassembly of the reply. But the reply never makes it to the bridge port containing gre4t-gt0.20. So, more or less the same symptom as before…I have just managed to move it around a bit.

The question still remains: Why does the ping reply packet fail to traverse the bridge ports?

I am about ready to throw in the towel on this and admit that we just need to set the MTU on the gre4t-gt0 and tunnel wireless transport interfaces to something like 2000 and call it a day. It’s a mystery at this point.

Because TCP/UDP fragmentation is offloaded to intelligent network cards.

…which presumably are screwing this up. Or the driver is screwing this up. Somehow. Interesting that a modestly large MTU (2000) works, but a (modestly) small MTU works in one direction only.

Thanks for your help!

So, I am going to retract this statement as I do not now believe that it accurately reflects the root cause of the problem.

I found that the following blog posts were very enlightening regarding how to appropriately pick MTUs throughout the network and why I have been seeing packets being “black holed”:

Hopefully this will help others that may come across this post.

So, my take on what was happening. To recap the scenario: I started a large payload (4000 bytes) ping from router “b” (OpenWrt One) connected to router “a” (Belkin RT3200) via a GRETAP tunnel targeting an end device connected to router “a”.

  • The large payload (4000 bytes) ping request arrives at router “a” via the GRETAP tunnel already fragmented (the DF bit is cleared on the GRETAP interface so that it is free to do fragmentation).
  • The GRETAP tunnel interface on both routers is configured as a port on the br-sw bridge (no IP attached) with MTU=1400. The wireless IP tunnel transport is also configured with MTU=1400.
  • The wireless AP (which the target end device is attached to) is also a port on the br-sw bridge, with MTU=1500.
  • The bridge, seeing the fragmented packet arriving on its GRETAP tunnel port is able to forward that to its wireless AP port since the packet is smaller than the MTU of the wireless AP.
  • The ping request arrives at the end device. The end device echoes the request back towards the origin.
  • The ping reply arrives back at the wireless AP on router “a” from the end device.
  • The br-sw bridge, seeing that the (fragmented) reply packet is 1500 bytes, drops the packet since its own MTU is 1400. Note that the bridge typically has an MTU equal to the smallest MTU of its attached ports. Bang! Black hole. And no ICMP PTB (packet too big) messages can be issued since this is an L2 bridge (no IP attached).

Also, a large payload ping sent from an end device attached to router “b” never makes it past the br-sw bridge on router “b”, since the MTU of the bridge is also 1400 due to the MTU of its GRETAP tunnel port.
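
The drop described above boils down to a size check on the way out of the bridge. Here is a toy model of that check, assuming (as in the recap) that a packet larger than the MTU on the egress side is silently discarded. `forward_or_drop` is a hypothetical helper for illustration, not a real tool, and the sizes are the round numbers from the recap.

```shell
#!/bin/sh
# forward_or_drop PACKET_SIZE EGRESS_MTU
# Prints "forward" if the packet fits the egress MTU, otherwise "drop".
# An L2 bridge cannot fragment or send ICMP "packet too big", so "drop" is silent.
forward_or_drop() {
    if [ "$1" -le "$2" ]; then
        echo forward
    else
        echo drop
    fi
}

# Request fragment (at most 1400 bytes) leaving via the wireless AP port (MTU 1500)
forward_or_drop 1400 1500

# Reply fragment (1500 bytes) heading back towards the 1400-byte tunnel side
forward_or_drop 1500 1400
```

The first call prints forward and the second prints drop, which matches the one-way behaviour seen in the captures.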

The solution, of course, is to set the MTU of the GRETAP tunnel and its underlying IP transport network to >1500. The MTU of the GRETAP tunnel MUST be the same as the MTU of the underlying IP transport network, or we again run into packets being dropped when trying to traverse from the interface with the larger MTU to the interface with the smaller MTU.

Since my network end devices have the standard MTU of 1500, it seems reasonable that setting the MTU of the GRETAP tunnel and its underlying IP transport to 1600 is sufficient to accommodate the overhead that the GRETAP tunnel encapsulation adds to a standard 1500 byte frame. This also implies that the MTU of the br-sw bridge will remain at 1500.
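
As a rough sanity check on the 1600 figure, the sketch below adds up what a full-size frame from an end device would need on the transport. It assumes an IPv4 outer header without options, a base GRE header with no key, and one 802.1Q tag on the inner frame (since the tunnel carries a VLAN trunk). The variable names are illustrative only.

```shell
#!/bin/sh
# Transport MTU needed to carry a full 1500-byte end-device packet
# through the GRETAP tunnel without fragmenting (assumed header sizes).
INNER_MTU=1500   # standard MTU on the end devices
INNER_ETH=14     # inner Ethernet header
VLAN_TAG=4       # 802.1Q tag on trunked frames (assumption)
GRE=4            # base GRE header, no key/checksum/sequence
OUTER_IP=20      # outer IPv4 header, no options

NEEDED=$((INNER_MTU + INNER_ETH + VLAN_TAG + GRE + OUTER_IP))

TRANSPORT_MTU=1600
HEADROOM=$((TRANSPORT_MTU - NEEDED))

echo "needed=${NEEDED} headroom=${HEADROOM}"
```

With these numbers the transport needs 1542 bytes, so 1600 leaves 58 bytes to spare.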

There is a lot of confusing and conflicting information out there regarding what the proper MTU of a GRETAP tunnel should be. Hopefully this clears things up a bit. I was never comfortable with just setting the MTU of the tunnel and its transport to some arbitrarily big MTU without understanding why I should do this. Long journey (for me at least). My network foo is just a little bigger than it was before.

Unfortunately, the only way to configure the MTU of the underlying IP network is via /etc/config/network. This parameter is not exposed in LuCI. And it is imperative that the MTU of the GRETAP tunnel is the same as the MTU of the underlying transport.

Cheers…