Issue Introduced in 22.03.0-rc4 (still there in rc5 and rc6)

pmagid · August 4, 2022, 2:43am

Issue Description: Any OpenWrt router running 22.03.0-rc4, 22.03.0-rc5 or 22.03.0-rc6 on my network behaves as follows: when freshly flashed or rebooted, any client attaching to the router drops packets (say to 8.8.8.8) for a while after connecting, but then pings successfully. I have 802.11r set up, so I am used to minimal packet loss when switching base stations, and this pause in routing packets immediately got my attention. After 12 to 24 hours, packets are not routable beyond the base station at all, no matter how long you wait. If the base station IP is 10.2.1.8, it is pingable almost immediately, but nothing beyond that address is. My main household router (pfSense) is not pingable and, obviously, neither is 8.8.8.8.

My OpenWrt router setup works on 22.03.0-rc1 and before. I have rolled back to 22.03.0-rc1 with no changes in setup to keep things running. I have experienced this issue on working 22.03.0-rc1 routers upgraded to 22.03.0-rc4, 22.03.0-rc5, and 22.03.0-rc6. To make sure there are no dangling incompatible settings, I did a fresh from-scratch install of 22.03.0-rc6, hand-configured it to match my normal setup, and I still see the same behavior. This does not seem to be hardware specific, as it is happening on at least three different sets of hardware (Netgear R6220, Totolink X5000R, Linksys E8450 (UBI)).
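The symptom above separates three hops: the base station, the pfSense router, and the internet. A minimal sketch of a logger that records which hop stops answering, and when, might look like the following. Only 10.2.1.8 and 8.8.8.8 come from the description; the gateway address, the function names, and the labels are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. AP_IP and INET_IP are taken from the post;
# GW_IP is a placeholder for the pfSense address (assumption).
AP_IP="${AP_IP:-10.2.1.8}"
GW_IP="${GW_IP:-10.2.1.1}"
INET_IP="${INET_IP:-8.8.8.8}"

# probe HOST: print "up" if a single ping is answered within 2 s, else "down".
probe() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# classify AP GW INET: map three up/down results to a short label.
classify() {
    case "$1/$2/$3" in
        up/up/up)   echo "ok" ;;
        up/down/*)  echo "stuck-at-ap" ;;
        up/up/down) echo "stuck-at-gateway" ;;
        down/*)     echo "ap-unreachable" ;;
        *)          echo "unknown" ;;
    esac
}

# log_once: print one timestamped line with all three results and the label.
log_once() {
    ap=$(probe "$AP_IP")
    gw=$(probe "$GW_IP")
    inet=$(probe "$INET_IP")
    echo "$(date '+%Y-%m-%d %H:%M:%S') ap=$ap gw=$gw inet=$inet $(classify "$ap" "$gw" "$inet")"
}
```

Run from a wireless client in a loop, for example `while true; do log_once; sleep 60; done >> reach.log`, this would show the moment the state changes from "ok" to "stuck-at-ap", which is the state described for the 12 to 24 hour mark.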

My Network: [cable modem] <-> [Pfsense Router] <-> 2* [Openwrt 22.03.0-rc1 base stations]
All my network and routing services are handled by my pfSense router. My two OpenWrt base stations are just there for wireless connectivity. I have no WAN network on my OpenWrt boxes. I have a LAN network on VLAN br-lan.1, and two guest networks on br-lan.3 and br-lan.5 respectively. Wireless itself appears to be unaffected and working correctly; it is routing to my pfSense router and beyond that is the issue.

One difference I have noticed is on 22.03.0-rc1 the global target and gateway are on lan

on 22.03.0-rc6 the global target and gateway is on GUEST1

I don't know if this is a red herring or is significant...

Some additional screenshots that show my setup:

I apologize for this somewhat nebulous report. I am hoping I can get some help on what to look at next as I am now drawing a blank and am embarrassed at how long I have spent without success.

Bill · August 4, 2022, 4:29am

cat /etc/board.json

Let's see the delta if any.
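A minimal sketch of that comparison, assuming each router's /etc/board.json has first been saved to a local file (the file names and the helper name are hypothetical):

```shell
#!/bin/sh
# board_delta OLD NEW: print a unified diff of two saved board.json copies,
# or "no differences" when the files are identical.
board_delta() {
    if diff -u "$1" "$2"; then
        echo "no differences"
    fi
}
```

For example, after `ssh root@10.2.1.8 cat /etc/board.json > board-rc6.json` on the rc6 box and the equivalent on an rc1 box, `board_delta board-rc1.json board-rc6.json` prints only the lines that changed.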

pmagid · August 4, 2022, 5:47am

lukasz92 · August 4, 2022, 10:07am

@pmagid But all the wireless devices you have are handled by the mt76 driver.
This could be an issue in the driver's common code. There have been many reports of problems, and many remain unsolved.

pmagid · August 4, 2022, 2:32pm

@lukasz92 I was worried about that... I have a Linksys WRT1200AC (Marvell). I will try with that and report back here later...

QuestionMan · August 4, 2022, 4:56pm

Seriously? So if I get a router that uses the mt76 driver, should I stay at 21.02.3 or go lower/higher?

pmagid · August 4, 2022, 11:26pm

@lukasz92 I can confirm that my Linksys WRT1200AC appears to be behaving correctly on the newer RCs. My Apple Mac client switches to the Marvell-based AP with only the drop of a packet or two (as expected with 802.11r). To be sure, we will have to wait until tomorrow morning; with an mt76 device, nothing would be working after 12-24 hours. I am betting it will still be good with this Marvell box. I think there is possibly an issue with VLANs and mt76. Is there anything I can do to further chase down the issue, or should I just hang tight for an mt76 update and a newer RC?

pmagid · August 5, 2022, 2:01pm

@lukasz92 I can confirm that 12 hours later the Linksys WRT1200AC is working correctly. It's definitely an issue with the mt76 driver (somewhere).

anon78773196 · August 5, 2022, 2:20pm

maybe interesting @nbd

pmagid · August 5, 2022, 4:42pm

@nbd is there anything I can do to help or track down and narrow in on the issue?

lukasz92 · August 5, 2022, 5:25pm

Hmmm.. https://github.com/openwrt/openwrt/commit/33df033b73365487c5bb5a58b77aed04d4ca6ac1 - this is the faulty commit.

When I checked the 22.03 tags, 22.03.0-rc4 and 22.03.0-rc3 have the same version of the mt76 driver (22.03.0-rc5 introduced many changes). The commit I refer to is the only change to the mac80211 stack between 22.03.0-rc3 and 22.03.0-rc4.

There is an interesting comment below:

pmagid · August 5, 2022, 7:08pm

FWIW I am not using 802.11s which this commit also breaks according to the comments....

kramsac · August 5, 2022, 9:45pm

I don't understand why you have all the ports tagged in VLAN 3 and 5, and yet they are also untagged in VLAN 1 and have PVID set to 1 (indicated by the asterisk next to each port in VLAN 1). This doesn't make any sense to me. Tagged ports are essentially trunk ports, for use to connect to other switches (including the switch in all-in-one devices).

Currently, if you connect a computer to one of the ports (for example, port 3), the computer will only send untagged data. As the port has PVID 1, that data will be tagged with ID '1' on ingress, and the switch will send all of it to VLAN 1 (i.e. br-lan.1). Data from VLANs 3 and 5 leaving port 3 will carry VLAN tag 3 or 5 respectively. Dumb devices like computers/end-devices can't understand tagged data, so it will be discarded.

You need to re-think the design of your VLANs.

Untagged ports are for use with end-devices.

Tagged ports are for use with other switches.
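The tagging rules above can be sketched as a toy model of a single port. It touches no real switch; the function names and the "VLAN:MODE" membership notation are assumptions made for this illustration.

```shell
#!/bin/sh
# Toy model of one switch port (illustration only).
# MEMBERSHIP is a space-separated list of VLAN:MODE pairs, where MODE is
# "u" (untagged) or "t" (tagged), e.g. "1:u 3:t 5:t".

# ingress_vlan PVID MEMBERSHIP [TAG]
# An untagged frame is assigned to the PVID. A tagged frame is accepted
# only if the port is a member of that VLAN; otherwise it is dropped.
ingress_vlan() {
    pvid=$1
    members=$2
    tag=${3:-}
    if [ -z "$tag" ]; then
        echo "$pvid"
        return 0
    fi
    for m in $members; do
        if [ "${m%%:*}" = "$tag" ]; then
            echo "$tag"
            return 0
        fi
    done
    echo "dropped"
}

# egress_form MEMBERSHIP VLAN
# Print how a frame belonging to VLAN leaves the port.
egress_form() {
    for m in $1; do
        if [ "${m%%:*}" = "$2" ]; then
            case "${m#*:}" in
                t) echo "tagged" ;;
                u) echo "untagged" ;;
            esac
            return 0
        fi
    done
    echo "not-a-member"
}
```

With the port described above (PVID 1, membership "1:u 3:t 5:t"), `ingress_vlan 1 "1:u 3:t 5:t"` yields VLAN 1 for an untagged frame from a computer, while `egress_form "1:u 3:t 5:t" 3` reports that VLAN 3 traffic leaves tagged, which an ordinary end device would discard.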

How are your dumb access points connected to the router? Mesh? ethernet?

pmagid · August 5, 2022, 9:54pm

My dumb aps are all attached with ethernet. (I am not using mesh)

I set up my vlans like this because this is what has worked for what I need more or less from day one... Am I certain it is optimal / correct: no I am not.... I can look at the design for sure.... I am always up for improving things. But that vlan design is not the cause of the problem I am having, right?

kramsac · August 5, 2022, 10:07pm

Fix the VLAN problems - it will probably sort this out. Ports with the dumb APs connected should be tagged (on both the AP and the switch). Depending on how your WiFi is set up on the different APs, you may need to tag them on VLAN 1, 3, AND/OR 5 (for example, if you have 2 ssids on the same AP belonging to guest and Lan, they should be tagged on VLAN 3 and 1). They definitely shouldn't be untagged though.
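As a rough sketch of what "tagged on the AP" could mean on a DSA-based OpenWrt 22.03 device, the helper below only prints the uci commands for one tagged bridge-vlan section; it does not run them. The bridge name br-lan matches the br-lan.1/.3/.5 devices mentioned earlier in the thread, but the port name lan1 and the helper itself are assumptions, and an existing bridge-vlan section would need to be edited rather than added again.

```shell
#!/bin/sh
# emit_tagged_vlan BRIDGE VLAN PORT...
# Print (do not execute) uci commands that would add a bridge-vlan section
# in which every listed port is a tagged member of VLAN.
emit_tagged_vlan() {
    bridge=$1
    vlan=$2
    shift 2
    ports=""
    for p in "$@"; do
        ports="${ports:+$ports }$p:t"
    done
    echo "uci add network bridge-vlan"
    echo "uci set network.@bridge-vlan[-1].device='$bridge'"
    echo "uci set network.@bridge-vlan[-1].vlan='$vlan'"
    echo "uci set network.@bridge-vlan[-1].ports='$ports'"
}
```

Calling `emit_tagged_vlan br-lan 3 lan1` prints four commands ending in `ports='lan1:t'`; they could be reviewed, applied by hand, and followed by `uci commit network` and a network restart.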

PolynomialDivision · August 5, 2022, 10:49pm

I had/have similar issues

github.com/openwrt/openwrt

mt7622: packet passing ethernet to wifi broken

opened 07:18AM - 26 May 22 UTC

PolynomialDivision

release/22.03

Setup:

```
Client <-WiFi (wlan5-ff)-> AP (Belkin MT7622) <-Ethernet Vlan 40-> Core-Router (Belkin MT7622) (switch0.40 br-dhcp)
```

What works:
- Clients' packets arrive at Core-Router
- Core-Router response
- Response can be seen on switch0.40 on AP

What does not work:
- Packets are never passed to wlan5-ff (client network)

How to reproduce:
- Try to reach a client from Lan (VLAN) to WiFi

I suspect:
A patch series is adding offloading from **ethernet to WLAN** on MT7622 SoC (https://lwn.net/Articles/862864/). As explicitly stated, **WLAN->Ethernet offload** is not supported by MT7622. So probably, the offload fails, and that is why we see the packets arriving at the core-router because there is no offloading from WLAN->Ethernet.

Bisecting shows that OpenWrt 22.03-rc1 is working, but 22.03-SNAPSHOT is not. The log shows a probably related commit (https://github.com/openwrt/openwrt/commit/77e123340f0b5490905e27ddc92f0dff8ed017a5).

I suspect some of the patches break the passing from ethernet to wifi:

```
ethernet: mtk_eth_soc: add support for coherent DMA

It improves performance by eliminating the need for a cache flush on rx and tx
In preparation for supporting WED (Wireless Ethernet Dispatch), also add a
function for disabling coherent DMA at runtime.
```

```
arm64: dts: mediatek: mt7622: add support for coherent DMA

It improves performance by eliminating the need for a cache flush on rx and tx
```

```
net: ethernet: mtk_eth_soc: implement flow offloading to WED devices

This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC
```

```
arm64: dts: mediatek: mt7622: introduce nodes for Wireless Ethernet Dispatch

Introduce wed0 and wed1 nodes in order to enable offloading forwarding
between ethernet and wireless devices on the mt7622 chipset.
```

```
net: ethernet: mtk_eth_soc: add ipv6 flow offload support

Add the missing IPv6 flow offloading support for routing only.
Hardware flow offloading is done by the packet processing engine (PPE)
of the Ethernet MAC and as it doesn't support mangling of IPv6 packets,
IPv6 NAT cannot be supported.
```

```
net: ethernet: mtk_eth_soc: support TC_SETUP_BLOCK for PPE offload

This allows offload entries to be created from user space
```

```
net: ethernet: mtk_eth_soc: remove bridge flow offload type entry support

According to MediaTek, this feature is not supported in current hardware
```

ping @nbd168

---

**Setup to reproduce:**

Change network config to:

```
config interface 'loopback'
    option device 'lo'
    option proto 'static'
    option ipaddr '127.0.0.1'
    option netmask '255.0.0.0'

config device
    option type 'bridge'
    option name 'switch0'

config bridge-vlan
    option device 'switch0'
    option vlan '40'
    option ports 'wan:t lan1:t lan2:t lan3:t lan4:t'

config device
    option name 'br-lan'
    option type 'bridge'
    option macaddr '02:00:00:00:00:01'
    list ports 'switch0.40'

config interface 'lan'
    option device 'br-lan'
    option proto 'static'
    option ipaddr '192.168.1.1'
    option netmask '255.255.255.0'
    option ip6assign '60'
```

Plug your pc into the router and add the vlan:

```
sudo ip link add link enp0s25 name eth0.40 type vlan id 40
sudo ip addr add 192.168.1.42/24 dev eth0.40
sudo ip link set dev eth0.40 up
```

Now add another device to the wifi and try to ping it. It will work from the router, but not from the pc.

Maybe related:
- https://github.com/openwrt/openwrt/commit/0f029b3d2b505b40aca9a24a002838ed1060f83d

pmagid · August 6, 2022, 12:19am

@kramsac I think you are jumping to conclusions and misunderstanding my topology. And assuming it is a problem when it may or may not be. When responding and displaying shortness please remember I am trying to help improve the quality/robustness of openwrt...

I have a pfSense router that handles all my networking, including the setup of the VLANs. The VLANs are used to segregate the guest networks and hand out class C addresses that are different from each other and from my main network. Attached to that pfSense router is an unmanaged switch, and attached via ethernet to the unmanaged switch are two dumb APs. Those dumb APs need to allow connections on the three separate networks (guest, guest1 and lan).

I will gladly try switching the untagged ports to tagged and report back here.

Please remember two things: 1) this has worked from 22.03.0-rc1 and prior and for years at that, 2) it continues to work with radios based on other chipsets (Marvell).

pmagid · August 6, 2022, 12:48am

@kramsac Please carefully read everything I have posted. You have jumped to conclusions and shown you have missed key things I have said in multiple instances. For example, you asked if I was using mesh when just one post before I said I was not using 802.11s. You seem not to have noticed that I stated repeatedly this was a configuration that had worked for years and continues to work on other chipsets. Yet you approach the situation like I am a dumbass with a wrong VLAN configuration who can't get it to work (been there, done that, but that is not the case here). If that were the case, it would never have worked and would not currently work on the problematic RCs with other chipsets.

That all being said I tried both:

As I expected neither worked and neither solved the issue.

pmagid · August 6, 2022, 12:54am

@PolynomialDivision thank you for referencing https://github.com/openwrt/openwrt/issues/9945 it looks like this is what I am struggling with....

Wow, it's been open since May 26.....

kramsac · August 6, 2022, 7:48am

I was respectful and took time to try and help you. To each his own I guess.