Roaming Issues Xiaomi AX3600

Perhaps somewhat naively, I've built qca_nss_bridge_mgr as well as qca_nss_vlan, which it seems to depend on.

I actually spent more time preparing the Makefile to get it to attempt to build the modules than fixing the one issue the modules had. Good learning experience :slight_smile:

It sort of works... in the sense of 1 step forward, 2 steps back.

The Pi and Macbook can roam freely between the Asus & 703n.

That's where the good news ends.

4 or 5 times in a row it panicked at pretty much the same uptime, ±5s. As I'm writing this it has somehow survived longer and panicked at 848s. The panic is always at br_fdb_bridge_dev_get_and_hold, which I'll get to later.

[  496.607273] Mem abort info:
[  496.614055]   ESR = 0x96000004
[  496.616742]   EC = 0x25: DABT (current EL), IL = 32 bits
[  496.619904]   SET = 0, FnV = 0
[  496.625330]   EA = 0, S1PTW = 0
[  496.628229] Data abort info:
[  496.631231]   ISV = 0, ISS = 0x00000004
[  496.634349]   CM = 0, WnR = 0
[  496.637955] [0000ffffffc0101c] address between user and kernel address ranges
[  496.641053] Internal error: Oops: 96000004 [#1] SMP
[  496.648155] Modules linked in: iptable_nat ath11k_ahb ath11k ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD nf_nat nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG ppp_async nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables hwmon crc_ccitt compat qca_nss_pppoe pppoe pppox ppp_generic slhc qca_nss_bridge_mgr qca_nss_vlan nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 qca_nss_drv qca_nss_dp qca_ssdk seqiv jitterentropy_rng drbg michael_mic hmac cmac leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom gpio_button_hotplug
[  496.698321] CPU: 0 PID: 280 Comm: kworker/0:2 Tainted: G        W         5.10.54 #0
[  496.720557] Hardware name: Xiaomi AX3600 (DT)
[  496.728374] Workqueue: events_long br_fdb_cleanup
[  496.732619] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[  496.737311] pc : br_fdb_bridge_dev_get_and_hold+0x4/0x38
[  496.743395] lr : nss_bridge_mgr_find_instance+0x100/0x1f0 [qca_nss_bridge_mgr]
[  496.748680] sp : ffffffc013063c70
[  496.755703] x29: ffffffc013063c70 x28: 0000000000000000 
[  496.759093] x27: ffffff8003424d50 x26: ffffffc01004c930 
[  496.764475] x25: ffffffc01238b9b0 x24: ffffff80034248c0 
[  496.769771] x23: ffffffc013063d8a x22: 0000000000000000 
[  496.775066] x21: 00000000000124f8 x20: 00000000fffffffe 
[  496.780361] x19: ffffffc013063d8a x18: 0000000000000000 
[  496.785655] x17: 0000000000000000 x16: 0000000000000000 
[  496.790951] x15: 0000000000000000 x14: 0000000000000040 
[  496.796247] x13: 0000000000000228 x12: 0000000000000000 
[  496.801541] x11: 0000000000000000 x10: 0000000000000000 
[  496.806837] x9 : 0000000000000000 x8 : ffffff801f6cde00 
[  496.812132] x7 : ffffffc01232ecd8 x6 : 0000000000000000 
[  496.817427] x5 : ffffffc013063b48 x4 : ffffffc0089fd8c8 
[  496.822722] x3 : 0000000000000000 x2 : ffffffc013063d8a 
[  496.828017] x1 : 0000000000000000 x0 : 4300ffffffc01004 
[  496.833313] Call trace:
[  496.838604]  br_fdb_bridge_dev_get_and_hold+0x4/0x38
[  496.840784]  atomic_notifier_call_chain+0x58/0x88
[  496.845986]  br_fdb_cleanup+0x1a4/0x1e8
[  496.850587]  process_one_work+0x200/0x3b0
[  496.854231]  worker_thread+0x54/0x4e8
[  496.858396]  kthread+0x124/0x128
[  496.862043]  ret_from_fork+0x10/0x30
[  496.865345] Code: 9400bab0 d4210000 17ffffd2 d503233f (f9400c01) 
[  496.868905] ---[ end trace 04421dadb8445493 ]---
[  496.874893] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  496.879587] SMP: stopping secondary CPUs
[  496.886265] Kernel Offset: disabled
[  496.890341] CPU features: 0x0040002,20002000
[  496.893553] Memory Limit: none
[  497.098090] Rebooting in 3 seconds..

Similarly, connecting to the Ath11 network causes an instant panic at the same place.
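
For anyone reading the trace: the faulting instruction is the word in parentheses on the Code: line, f9400c01. Here is a rough sketch in C, assuming I've read the AArch64 encoding correctly, that decodes it as a 64-bit load from [Xn + offset] and strips the top byte from an address. The helper names are made up for illustration; nothing here comes from the kernel sources.

```c
#include <stdint.h>

/*
 * Decode an AArch64 "LDR Xt, [Xn, #imm]" word (64-bit, unsigned offset).
 * Returns 0 and fills rt/rn/offset on a match, -1 for any other encoding.
 */
static int decode_ldr64_uoff(uint32_t insn, unsigned *rt, unsigned *rn,
			     unsigned *offset)
{
	if ((insn & 0xffc00000u) != 0xf9400000u)
		return -1;
	*rt = insn & 0x1f;
	*rn = (insn >> 5) & 0x1f;
	*offset = ((insn >> 10) & 0xfff) * 8;	/* imm12 is scaled by 8 */
	return 0;
}

/* Drop the top byte of an address by extending bit 55 upwards. */
static uint64_t untag(uint64_t addr)
{
	if (addr & (1ULL << 55))
		return addr | 0xff00000000000000ULL;
	return addr & 0x00ffffffffffffffULL;
}
```

With insn = 0xf9400c01 this gives rt = 1, rn = 0, offset = 24, and untag(0x4300ffffffc01004 + 24) is 0x0000ffffffc0101c, the address reported in the oops. That suggests x0, the bridge pointer handed to br_fdb_bridge_dev_get_and_hold(), was not a valid kernel pointer in the first place.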

Now to the build... All that's missing is the function br_fdb_bridge_dev_get_and_hold() which, as noted above, just happens to cause the panic. Curiously, this function is nothing more than a wrapper:

struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br)
{
	dev_hold(br->dev);
	return br->dev;
}
EXPORT_SYMBOL_GPL(br_fdb_bridge_dev_get_and_hold);

And it is called just once. It might make more sense if there were an accompanying br_fdb_bridge_dev_put() wrapping dev_put(), but alas, inside nss_bridge_mgr_fdb_update_callback() they call dev_put() directly. I'm sure QC must have big plans for br_fdb_bridge_dev_put() in the future!

That snippet lives in net/bridge/br_fdb.c

The following declaration goes in include/linux/if_bridge.h

extern struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br);
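
To illustrate the hold/put pairing mentioned above, here is a minimal userspace sketch. The structs are toy stand-ins with only the fields needed, the refcount is a plain int rather than the kernel's real mechanism, and use_bridge_dev() is a hypothetical caller, not the bridge-mgr code.

```c
/* Toy stand-ins for the kernel types; only the fields needed here. */
struct net_device {
	const char *name;
	int refcnt;
};

struct net_bridge {
	struct net_device *dev;
};

static void dev_hold(struct net_device *dev) { dev->refcnt++; }
static void dev_put(struct net_device *dev)  { dev->refcnt--; }

/* Same shape as the wrapper quoted above. */
static struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br)
{
	dev_hold(br->dev);
	return br->dev;
}

/* Hypothetical caller: take a reference, use the device, release it. */
static int use_bridge_dev(struct net_bridge *br)
{
	struct net_device *dev = br_fdb_bridge_dev_get_and_hold(br);
	int held = dev->refcnt;	/* count while the reference is held */

	dev_put(dev);		/* every hold needs a matching put */
	return held;
}
```

The point is only that the reference taken by the wrapper has to be dropped again by the caller, which is what the direct dev_put() call in the callback does.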

The qca-nss-clients Makefile:

include $(TOPDIR)/rules.mk

PKG_NAME:=qca-nss-clients
PKG_RELEASE:=$(AUTORELEASE)

PKG_SOURCE_URL:=https://source.codeaurora.org/quic/cc-qrdk/oss/lklm/nss-clients
PKG_SOURCE_PROTO:=git
PKG_SOURCE_DATE:=2021-04-29
PKG_SOURCE_VERSION:=b93c72c1b72c591c2ddc2f0b24f0e2b457720118
PKG_MIRROR_HASH:=9fab23da994bfbac9a3cef32cdfec31a87a03ed415f36bc926da32b7b0934259

include $(INCLUDE_DIR)/kernel.mk
include $(INCLUDE_DIR)/package.mk

define KernelPackage/qca-nss-drv-pppoe
  SECTION:=kernel
  CATEGORY:=Kernel modules
  SUBMENU:=Network Devices
  TITLE:=Kernel driver for NSS (connection manager) - PPPoE
  DEPENDS:=@TARGET_ipq807x +kmod-qca-nss-drv +kmod-ppp +kmod-pppoe
  FILES:=$(PKG_BUILD_DIR)/pppoe/qca-nss-pppoe.ko
  AUTOLOAD:=$(call AutoLoad,51,qca-nss-pppoe)
endef

define KernelPackage/qca-nss-drv-pppoe/Description
Kernel modules for NSS connection manager - Support for PPPoE
endef

define KernelPackage/qca-nss-drv-bridge-mgr
  SECTION:=kernel
  CATEGORY:=Kernel modules
  SUBMENU:=Network Devices
  TITLE:=Kernel driver for NSS bridge manager
  DEPENDS:=@TARGET_ipq807x +kmod-qca-nss-drv +kmod-qca-nss-drv-vlan-mgr
  FILES:=$(PKG_BUILD_DIR)/bridge/qca-nss-bridge-mgr.ko
  AUTOLOAD:=$(call AutoLoad,51,qca-nss-bridge-mgr)
endef

define KernelPackage/qca-nss-drv-bridge-mgr/Description
Kernel modules for NSS bridge manager
endef

define KernelPackage/qca-nss-drv-vlan-mgr
  SECTION:=kernel
  CATEGORY:=Kernel modules
  SUBMENU:=Network Devices
  TITLE:=Kernel driver for NSS vlan manager
  DEPENDS:=@TARGET_ipq807x +kmod-qca-nss-drv
  FILES:=$(PKG_BUILD_DIR)/vlan/qca-nss-vlan.ko
  AUTOLOAD:=$(call AutoLoad,51,qca-nss-vlan)
endef

define KernelPackage/qca-nss-drv-vlan-mgr/Description
Kernel modules for NSS vlan manager
endef

EXTRA_CFLAGS+= \
	-I$(STAGING_DIR)/usr/include/qca-nss-drv \
	-I$(STAGING_DIR)/usr/include/qca-nss-crypto \
	-I$(STAGING_DIR)/usr/include/qca-nss-cfi \
	-I$(STAGING_DIR)/usr/include/qca-nss-gmac \
	-I$(STAGING_DIR)/usr/include/qca-ssdk \
	-I$(STAGING_DIR)/usr/include/qca-ssdk/fal \
	-I$(STAGING_DIR)/usr/include/nat46

ifneq ($(CONFIG_PACKAGE_kmod-qca-nss-drv-pppoe),)
NSS_CLIENTS_MAKE_OPTS+=pppoe=y
endif

ifneq ($(CONFIG_PACKAGE_kmod-qca-nss-drv-bridge-mgr),)
NSS_CLIENTS_MAKE_OPTS+=bridge-mgr=y
#enable OVS bridge if ovsmgr is enabled
ifneq ($(CONFIG_PACKAGE_kmod-qca-ovsmgr),)
NSS_CLIENTS_MAKE_OPTS+= NSS_BRIDGE_MGR_OVS_ENABLE=y
EXTRA_CFLAGS+= -I$(STAGING_DIR)/usr/include/qca-ovsmgr
endif
endif

ifneq ($(CONFIG_PACKAGE_kmod-qca-nss-drv-vlan-mgr),)
NSS_CLIENTS_MAKE_OPTS+=vlan-mgr=y
endif

ifeq ($(CONFIG_TARGET_BOARD), "ipq807x")
    SOC="ipq807x_64"
else ifeq ($(CONFIG_TARGET_BOARD), "ipq60xx")
    SOC="ipq60xx_64"
endif

define Build/Compile
	$(MAKE) -C "$(LINUX_DIR)" $(strip $(NSS_CLIENTS_MAKE_OPTS)) \
		CROSS_COMPILE="$(TARGET_CROSS)" \
		ARCH="$(LINUX_KARCH)" \
		M="$(PKG_BUILD_DIR)" \
		EXTRA_CFLAGS="$(EXTRA_CFLAGS)" \
		SoC=$(SOC) \
		$(KERNEL_MAKE_FLAGS) \
		modules
endef


$(eval $(call KernelPackage,qca-nss-drv-pppoe))
$(eval $(call KernelPackage,qca-nss-drv-bridge-mgr))
$(eval $(call KernelPackage,qca-nss-drv-vlan-mgr))

1.5 steps forward... I think I've fixed the crash. Roaming between the external ports works, but roaming to the Ath11 interface still takes a long time. I need to double check, but I'm sure roaming between the 703n & Ath11 previously worked fine, and now it also has a delay. 703n to Asus and back is fine.

A diff of br_fdb.c between vanilla 4.4.60 and the 4.4.60 from QSDK 11.3 has, amongst other things:

@@ -308,10 +337,16 @@ void br_fdb_cleanup(unsigned long _data)
 			if (f->added_by_external_learn)
 				continue;
 			this_timer = f->updated + delay;
-			if (time_before_eq(this_timer, jiffies))
+			if (time_before_eq(this_timer, jiffies)) {
+				memset(&fdb_event, 0, sizeof(fdb_event));
+				ether_addr_copy(fdb_event.addr, f->addr.addr);
 				fdb_delete(br, f);
-			else if (time_before(this_timer, next_timer))
+				atomic_notifier_call_chain(
+					&br_fdb_update_notifier_list, 0,
+					(void *)&fdb_event);
+			} else if (time_before(this_timer, next_timer)) {
 				next_timer = this_timer;
+			}
 		}
 	}
 	spin_unlock(&br->hash_lock);

So the final parameter of atomic_notifier_call_chain() is a zeroed struct br_fdb_event with just fdb_event.addr populated, and this is what the callback function nss_bridge_mgr_fdb_update_callback() in nss_bridge_mgr is expecting.

In robimarko's build, however, it's subtly different:

                                ether_addr_copy(mac_addr, f->key.addr.addr);
                                fdb_delete(br, f, true);
                                /* QCA NSS ECM support - Start */
                                atomic_notifier_call_chain(
                                        &br_fdb_update_notifier_list, 0,
                                        (void *)mac_addr);
                                /* QCA NSS ECM support - End */

Here atomic_notifier_call_chain() is supplied with the address directly, not within an otherwise empty struct br_fdb_event. Is there a reason for this change? I've modified nss_bridge_mgr_fdb_update_callback() to work with the address and put that back in a struct for the rest of the function.

static int nss_bridge_mgr_fdb_update_callback(struct notifier_block *notifier,
                                              unsigned long val, void *ctx)
{
        struct br_fdb_event _event;
        struct br_fdb_event *event = &_event;
        struct nss_bridge_pvt *b_pvt = NULL;
        struct net_device *br_dev = NULL;
        fal_fdb_entry_t entry;

        memset(&_event, 0, sizeof(_event));
        ether_addr_copy(_event.addr, ctx);

Uptime 45 minutes and no crashes connecting to the Ath11 so I think this is a small improvement at least now.
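
As a userspace illustration of the adapter idea above: the notifier hands over a raw 6-byte MAC, and the callback wraps it in a zeroed event before the rest of the function touches it. The struct below is a simplified stand-in (the real QSDK layout may well differ) and fdb_event_from_mac() is a name I made up.

```c
#include <string.h>

#define ETH_ALEN 6

/* Simplified stand-in for the QSDK struct; field order is an assumption. */
struct br_fdb_event {
	void *dev;
	void *orig_dev;
	unsigned char addr[ETH_ALEN];
	void *br;
};

/*
 * Build a zeroed event from the raw MAC passed as notifier context.
 * Casting ctx straight to struct br_fdb_event * would instead read the
 * MAC bytes (and whatever follows them) as pointers.
 */
static void fdb_event_from_mac(struct br_fdb_event *event, const void *ctx)
{
	memset(event, 0, sizeof(*event));
	memcpy(event->addr, ctx, ETH_ALEN);
}
```

After this, event->br is NULL rather than garbage, so any code that checks the pointer before using it has something sane to check.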

Inspired by avalentin's idea on the main AX3600 thread I've taken a peek at ssdk-shell. Setting the fdb ageTime to 30s (from 150s) reduces the roaming delay on Ath11 to... 30s ish. Setting to 300s took 490s!

Monitoring the fdb inside ssdk-shell shows that it isn't updated when roaming during the dead period. In fact, when the device regains connectivity it's not even listed in that fdb; the entry just disappears, which seems a bit odd.

Ahh, that would suggest that the kernel has no control over the FDB in this driver setup as the default aging time should be 30s.
I gotta say once again that I hate this setup without a proper switch and ethernet driver but rather this mess with fake netdevs.

D'oh!

While I was investigating the crash yesterday, I neutered the br_fdb_bridge_dev_get_and_hold() function to just return 0... and of course forgot to put it back. It seems that with it put back, Ath11 roaming works fine too.

Roaming issue is kind of solved with a workaround by @avalentin

See the "Adding OpenWrt support for Xiaomi AX3600" thread for details.

Of course still interested in clean solution...

I forgot that I added in a few extra bits today, so it wasn't just the neutered function that was needed.

What hopefully is a clean solution then...

The above Makefile needs placing in the qca-nss-clients directory to make available the bridge-mgr & vlan-mgr packages. Select these from kernel modules -> network devices.

Then the following patch is needed

--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -66,6 +66,15 @@ void br_fdb_update_unregister_notify(str
 EXPORT_SYMBOL_GPL(br_fdb_update_unregister_notify);
 /* QCA NSS ECM support - End */
 
+/* QCA NSS bridge-mgr support - Start */
+struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br)
+{
+	dev_hold(br->dev);
+	return br->dev;
+}
+EXPORT_SYMBOL_GPL(br_fdb_bridge_dev_get_and_hold);
+/* QCA NSS bridge-mgr support - End */
+
 int __init br_fdb_init(void)
 {
 	br_fdb_cache = kmem_cache_create("bridge_fdb_cache",
@@ -371,7 +380,7 @@ void br_fdb_cleanup(struct work_struct *
 	unsigned long delay = hold_time(br);
 	unsigned long work_delay = delay;
 	unsigned long now = jiffies;
-	u8 mac_addr[6]; /* QCA NSS ECM support */
+	struct br_fdb_event fdb_event; /* QCA NSS bridge-mgr support */
 
 	/* this part is tricky, in order to avoid blocking learning and
 	 * consequently forwarding, we rely on rcu to delete objects with
@@ -399,12 +408,13 @@ void br_fdb_cleanup(struct work_struct *
 		} else {
 			spin_lock_bh(&br->hash_lock);
 			if (!hlist_unhashed(&f->fdb_node)) {
-				ether_addr_copy(mac_addr, f->key.addr.addr);
+				memset(&fdb_event, 0, sizeof(fdb_event));
+				ether_addr_copy(fdb_event.addr, f->key.addr.addr);
 				fdb_delete(br, f, true);
 				/* QCA NSS ECM support - Start */
 				atomic_notifier_call_chain(
 					&br_fdb_update_notifier_list, 0,
-					(void *)mac_addr);
+					(void *)&fdb_event);
 				/* QCA NSS ECM support - End */
 			}
 			spin_unlock_bh(&br->hash_lock);
@@ -615,6 +625,7 @@ void br_fdb_update(struct net_bridge *br
 		   const unsigned char *addr, u16 vid, unsigned long flags)
 {
 	struct net_bridge_fdb_entry *fdb;
+	struct br_fdb_event fdb_event; /* QCA NSS bridge-mgr support */
 
 	/* some users want to always flood. */
 	if (hold_time(br) == 0)
@@ -640,6 +651,12 @@ void br_fdb_update(struct net_bridge *br
 			if (unlikely(source != fdb->dst &&
 				     !test_bit(BR_FDB_STICKY, &fdb->flags))) {
 				br_switchdev_fdb_notify(br, fdb, RTM_DELNEIGH);
+				/* QCA NSS bridge-mgr support - Start */
+				ether_addr_copy(fdb_event.addr, addr);
+				fdb_event.br = br;
+				fdb_event.orig_dev = fdb->dst->dev;
+				fdb_event.dev = source->dev;
+				/* QCA NSS bridge-mgr support - End */
 				fdb->dst = source;
 				fdb_modified = true;
 				/* Take over HW learned entry */
@@ -651,7 +668,7 @@ void br_fdb_update(struct net_bridge *br
 				/* QCA NSS ECM support - Start */
 				atomic_notifier_call_chain(
 					&br_fdb_update_notifier_list,
-					0, (void *)addr);
+					0, (void *)&fdb_event);
 				/* QCA NSS ECM support - End */
 			}
 
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -195,4 +195,8 @@ typedef struct net_bridge_port *br_get_d
 extern br_get_dst_hook_t __rcu *br_get_dst_hook;
 /* QCA NSS ECM support - End */
 
+/* QCA NSS bridge-mgr support - Start */
+extern struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br);
+/* QCA NSS bridge-mgr support - End */
+
 #endif

Build, and see if you can roam freely.

Thanks, I would like to try, but I doubt I will be able to recreate your build from the info above.
Do you happen to have a fork with all your changes somewhere? Not time critical though, since it will probably take a few days until I can tackle this.

Your wish is my command :slight_smile:

One thing to add: booting with my Friday night image including bridge-mgr had 200 MB free, then after just leaving it overnight, about 12 hours later, it was down to 80 MB free. Booted last night from the current build, and by the morning it was again down to 80 MB. I'm not sure if this is just the Ath11k memory leak, as I haven't paid much attention to that yet, or if bridge-mgr is also introducing a memory leak...

I've just rmmod ath11k_ahb & ath11k after a reboot, free went up from 195M to 288M after removing ath11k_ahb. I'll leave it be now and see if the memory is stable or not.

8 hours later, still 288M free, so I guess no memory leak here.
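
If anyone wants to log this rather than eyeball free, here is a small sketch of pulling a value out of /proc/meminfo-style text. It assumes the usual "Key:   value kB" line format; meminfo_kb() is a hypothetical helper, and reading the file and logging on a timer is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB for 'key' (e.g. "MemFree:") from
 * /proc/meminfo-style text, or -1 if the key is not found.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			long kb;

			if (sscanf(line + klen, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Sampling MemFree: every few minutes, with and without ath11k loaded, would show whether the number keeps drifting down or settles.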

Status update.

  • Current image works fine for dumb AP + DNS/DHCP server for several days now.
  • No roaming issues with the workaround-script.
  • Free memory between 30-170MB, average 100MB free, no ooms (many clients)

Building psi-c image worked just fine. I plan to give it a try this weekend.

Changes in the currently running build relative to robimarko's restart branch:

package/firmware/ath11k-firmware/Makefile
package/kernel/mac80211/patches/ath11k/801-ath11k-fix-4-addr-tx-failure-for-AP-and-STA-modes.patch
package/kernel/mac80211/patches/ath11k/802-ath11k-small-tx-queue.patch
package/kernel/mac80211/patches/ath11k/803-ath11k-fix-zero-address-in-probe-request.patch

Build configuration for reference:
CONFIG_TARGET_ipq807x=y
CONFIG_TARGET_ipq807x_generic=y
CONFIG_TARGET_ipq807x_generic_DEVICE_xiaomi_ax3600=y
CONFIG_DEVEL=y
CONFIG_BUILD_LOG=y
CONFIG_KERNEL_BUILD_DOMAIN="patch-mem-roam-1"
CONFIG_KERNEL_BUILD_USER="joba-1"
CONFIG_MACTELNET_PLAIN_SUPPORT=y
CONFIG_OPENVPN_wolfssl=y
CONFIG_OPENVPN_wolfssl_ENABLE_DEF_AUTH=y
CONFIG_OPENVPN_wolfssl_ENABLE_FRAGMENT=y
CONFIG_OPENVPN_wolfssl_ENABLE_LZ4=y
CONFIG_OPENVPN_wolfssl_ENABLE_MULTIHOME=y
CONFIG_OPENVPN_wolfssl_ENABLE_PF=y
CONFIG_OPENVPN_wolfssl_ENABLE_PORT_SHARE=y
CONFIG_OPENVPN_wolfssl_ENABLE_SMALL=y
# CONFIG_PACKAGE_ath10k-firmware-qca9887-ct is not set
CONFIG_PACKAGE_ath10k-firmware-qca9887-ct-full-htt=y
CONFIG_PACKAGE_atop=y
CONFIG_PACKAGE_bridge=y
CONFIG_PACKAGE_cJSON=y
CONFIG_PACKAGE_cgi-io=y
CONFIG_PACKAGE_dawn=y
CONFIG_PACKAGE_hostapd-utils=y
CONFIG_PACKAGE_htop=y
CONFIG_PACKAGE_ip-bridge=y
CONFIG_PACKAGE_iperf3=y
CONFIG_PACKAGE_iptables-mod-conntrack-extra=y
CONFIG_PACKAGE_iptables-mod-extra=y
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_iptables-mod-physdev=y
CONFIG_PACKAGE_irqbalance=y
# CONFIG_PACKAGE_kmod-ath10k-ct is not set
CONFIG_PACKAGE_kmod-ath10k-ct-smallbuffers=y
CONFIG_PACKAGE_kmod-br-netfilter=y
CONFIG_PACKAGE_kmod-ifb=y
CONFIG_PACKAGE_kmod-ipt-conntrack-extra=y
CONFIG_PACKAGE_kmod-ipt-extra=y
CONFIG_PACKAGE_kmod-ipt-ipopt=y
CONFIG_PACKAGE_kmod-ipt-physdev=y
CONFIG_PACKAGE_kmod-ipt-raw=y
CONFIG_PACKAGE_kmod-netatop=y
CONFIG_PACKAGE_kmod-netlink-diag=y
CONFIG_PACKAGE_kmod-qca-nss-drv=y
CONFIG_PACKAGE_kmod-qca-nss-ecm=y
CONFIG_PACKAGE_kmod-sched-connmark=y
CONFIG_PACKAGE_kmod-sched-core=y
CONFIG_PACKAGE_kmod-tun=y
CONFIG_PACKAGE_libcares=y
CONFIG_PACKAGE_libgcrypt=y
CONFIG_PACKAGE_libgpg-error=y
CONFIG_PACKAGE_libiwinfo-lua=y
CONFIG_PACKAGE_liblua=y
CONFIG_PACKAGE_liblucihttp=y
CONFIG_PACKAGE_liblucihttp-lua=y
CONFIG_PACKAGE_libmosquitto-nossl=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libpcre=y
CONFIG_PACKAGE_librt=y
CONFIG_PACKAGE_libstdcpp=y
CONFIG_PACKAGE_libtirpc=y
CONFIG_PACKAGE_libubus-lua=y
CONFIG_PACKAGE_libuci-lua=y
CONFIG_PACKAGE_logger=y
CONFIG_PACKAGE_lsof=y
CONFIG_PACKAGE_lua=y
CONFIG_PACKAGE_lua-bit32=y
CONFIG_PACKAGE_luabitop=y
CONFIG_PACKAGE_luasocket=y
CONFIG_PACKAGE_luci=y
CONFIG_PACKAGE_luci-app-dawn=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-openvpn=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-qos=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-compat=y
CONFIG_PACKAGE_luci-lib-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-json=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-mod-dashboard=y
CONFIG_PACKAGE_luci-mod-network=y
CONFIG_PACKAGE_luci-mod-status=y
CONFIG_PACKAGE_luci-mod-system=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-ssl=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_luci-theme-openwrt-2020=y
CONFIG_PACKAGE_mac-telnet-client=y
CONFIG_PACKAGE_mac-telnet-discover=y
CONFIG_PACKAGE_mac-telnet-ping=y
CONFIG_PACKAGE_mac-telnet-server=y
CONFIG_PACKAGE_mii-tool=y
CONFIG_PACKAGE_mosquitto-client-nossl=y
CONFIG_PACKAGE_ncat=y
CONFIG_PACKAGE_netatop=y
CONFIG_PACKAGE_nmap=y
CONFIG_PACKAGE_nping=y
CONFIG_PACKAGE_openvpn-wolfssl=y
CONFIG_PACKAGE_prometheus-node-exporter-lua=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-dawn=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-hostapd_stations=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-hostapd_ubus_stations=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-nat_traffic=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-netstat=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-openwrt=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-snmp6=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-textfile=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-uci_dhcp_host=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-wifi=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-wifi_stations=y
CONFIG_PACKAGE_prometheus-statsd-exporter=y
CONFIG_PACKAGE_px5g-wolfssl=y
CONFIG_PACKAGE_qos-scripts=y
CONFIG_PACKAGE_rpcd=y
CONFIG_PACKAGE_rpcd-mod-file=y
CONFIG_PACKAGE_rpcd-mod-iwinfo=y
CONFIG_PACKAGE_rpcd-mod-luci=y
CONFIG_PACKAGE_rpcd-mod-rrdns=y
CONFIG_PACKAGE_ss=y
CONFIG_PACKAGE_tc-mod-iptables=y
CONFIG_PACKAGE_tc-tiny=y
CONFIG_PACKAGE_tcpdump=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_uhttpd=y
CONFIG_PACKAGE_uhttpd-mod-ubus=y
CONFIG_PACKAGE_umdns=y
# CONFIG_PACKAGE_wpad-basic-wolfssl is not set
CONFIG_PACKAGE_wpad-wolfssl=y
CONFIG_PACKAGE_zlib=y
CONFIG_WOLFSSL_HAS_OPENVPN=y

@psi-c Just stumbled on the perfect explanation of the roaming issue: as I thought, it was the FDB's fault, with duplicate entries lingering until the old one expires. It also explains why the bridge-mgr solved it.

Does that mean QC bodged the silicon or the driver and has to fix it with 2 additional services? Or....???

It's due to their driver: they are not using DSA or switchdev to actually represent it as a switch and are instead faking that those are individual netdevs, so there is nothing in the kernel to update the FDB on a HW level.
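
A toy model of that stale-entry behaviour, as a sketch only: it assumes a single hardware FDB entry that nobody refreshes or moves after the client roams, and it ignores how the real switch schedules its aging. All names are made up.

```c
#define FDB_FLOOD (-1)

struct hw_fdb_entry {
	int port;		/* port the address was learned on */
	unsigned last_seen;	/* time of learning, in seconds */
};

/* Egress port for the address, or FDB_FLOOD once the entry has aged out. */
static int hw_fdb_lookup(const struct hw_fdb_entry *e, unsigned now,
			 unsigned age_time)
{
	if (now - e->last_seen >= age_time)
		return FDB_FLOOD;
	return e->port;
}

/*
 * Seconds a client that roamed away at roam_time stays unreachable,
 * given that frames keep going to the old port until the entry expires.
 */
static unsigned stale_window(const struct hw_fdb_entry *e, unsigned roam_time,
			     unsigned age_time)
{
	unsigned expiry = e->last_seen + age_time;

	return expiry > roam_time ? expiry - roam_time : 0;
}
```

In this model the dead period is bounded by the age time, which is roughly what the 30s test earlier in the thread showed; the 300s setting taking 490s suggests the real hardware ages entries more coarsely than this.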

Thanks
So the kernel can't correlate the seemingly independent ports...

Would it in theory be possible for the OS community to rewrite the driver in a more conventional way, if someone would want to take this task up, or is there something missing?

Sure, somebody can write a proper ethernet + DSA driver but good luck in writing that without any docs.

Thanks, so we are back to the workarounds :wink:

No time to tinker lately, just wanted to let you know that the flush workaround for roaming works well for me. My biggest issue is a nonworking service or a reset due to low memory approximately every 2 weeks.

Good find, and indeed makes sense!

Anything against including my changes to add bridge-mgr to your nss repo? I've been running this as just an AP for the last month or so without issue. I don't know if anybody else has tried it and has anything good or bad to report?

No, I don't have anything against it as it fixes a really annoying thing.
I just want people to know that this is all really flimsy due to the amount of QCA code we are pulling in, for example, the 2.5G port on AX9000 won't work in 1G when using it in initramfs and offloading is broken again.
And this just happened when rebasing on newer kernel point releases so everything that I was afraid of is happening and I don't have time nor knowledge to debug that whole mess.
I am spending most of the time working on this trying to get as much as possible upstream, and that is a slow process.

So, if you want those to be included, feel free to make a PR.

Hi!

I'm now using robimarko's repo with psi-c's patch, and it works perfectly.
Thank you psi-c for this good solution.