Roaming Issues Xiaomi AX3600

I forgot that I had added a few extra bits today, so the neutered function alone wasn't enough.

Here is what is hopefully a clean solution, then:

The Makefile above needs to be placed in the qca-nss-clients directory to make the bridge-mgr and vlan-mgr packages available. Select them under Kernel modules -> Network Devices.

Then the following kernel patch is needed:

--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -66,6 +66,15 @@ void br_fdb_update_unregister_notify(str
 EXPORT_SYMBOL_GPL(br_fdb_update_unregister_notify);
 /* QCA NSS ECM support - End */
 
+/* QCA NSS bridge-mgr support - Start */
+struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br)
+{
+	dev_hold(br->dev);
+	return br->dev;
+}
+EXPORT_SYMBOL_GPL(br_fdb_bridge_dev_get_and_hold);
+/* QCA NSS bridge-mgr support - End */
+
 int __init br_fdb_init(void)
 {
 	br_fdb_cache = kmem_cache_create("bridge_fdb_cache",
@@ -371,7 +380,7 @@ void br_fdb_cleanup(struct work_struct *
 	unsigned long delay = hold_time(br);
 	unsigned long work_delay = delay;
 	unsigned long now = jiffies;
-	u8 mac_addr[6]; /* QCA NSS ECM support */
+	struct br_fdb_event fdb_event; /* QCA NSS bridge-mgr support */
 
 	/* this part is tricky, in order to avoid blocking learning and
 	 * consequently forwarding, we rely on rcu to delete objects with
@@ -399,12 +408,13 @@ void br_fdb_cleanup(struct work_struct *
 		} else {
 			spin_lock_bh(&br->hash_lock);
 			if (!hlist_unhashed(&f->fdb_node)) {
-				ether_addr_copy(mac_addr, f->key.addr.addr);
+				memset(&fdb_event, 0, sizeof(fdb_event));
+				ether_addr_copy(fdb_event.addr, f->key.addr.addr);
 				fdb_delete(br, f, true);
 				/* QCA NSS ECM support - Start */
 				atomic_notifier_call_chain(
 					&br_fdb_update_notifier_list, 0,
-					(void *)mac_addr);
+					(void *)&fdb_event);
 				/* QCA NSS ECM support - End */
 			}
 			spin_unlock_bh(&br->hash_lock);
@@ -615,6 +625,7 @@ void br_fdb_update(struct net_bridge *br
 		   const unsigned char *addr, u16 vid, unsigned long flags)
 {
 	struct net_bridge_fdb_entry *fdb;
+	struct br_fdb_event fdb_event; /* QCA NSS bridge-mgr support */
 
 	/* some users want to always flood. */
 	if (hold_time(br) == 0)
@@ -640,6 +651,12 @@ void br_fdb_update(struct net_bridge *br
 			if (unlikely(source != fdb->dst &&
 				     !test_bit(BR_FDB_STICKY, &fdb->flags))) {
 				br_switchdev_fdb_notify(br, fdb, RTM_DELNEIGH);
+				/* QCA NSS bridge-mgr support - Start */
+				ether_addr_copy(fdb_event.addr, addr);
+				fdb_event.br = br;
+				fdb_event.orig_dev = fdb->dst->dev;
+				fdb_event.dev = source->dev;
+				/* QCA NSS bridge-mgr support - End */
 				fdb->dst = source;
 				fdb_modified = true;
 				/* Take over HW learned entry */
@@ -651,7 +668,7 @@ void br_fdb_update(struct net_bridge *br
 				/* QCA NSS ECM support - Start */
 				atomic_notifier_call_chain(
 					&br_fdb_update_notifier_list,
-					0, (void *)addr);
+					0, (void *)&fdb_event);
 				/* QCA NSS ECM support - End */
 			}
 
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -195,4 +195,8 @@ typedef struct net_bridge_port *br_get_d
 extern br_get_dst_hook_t __rcu *br_get_dst_hook;
 /* QCA NSS ECM support - End */
 
+/* QCA NSS bridge-mgr support - Start */
+extern struct net_device *br_fdb_bridge_dev_get_and_hold(struct net_bridge *br);
+/* QCA NSS bridge-mgr support - End */
+
 #endif

Build, and see if you can roam freely.


Thanks, I would like to try it, but I doubt I'll be able to recreate your build from the info above.
Do you happen to have a fork with all your changes somewhere? It's not time-critical, though, since it will probably take a few days until I can tackle this.

Your wish is my command :slight_smile:

One thing to add: after booting my Friday-night image (including bridge-mgr) I had 200 MB free; after leaving it overnight, about 12 hours later, it was down to 80 MB free. I booted the current build last night, and by morning it was again down to 80 MB. I'm not sure whether this is just the ath11k memory leak, which I haven't paid much attention to yet, or whether bridge-mgr is introducing a memory leak as well.

After a reboot I've just unloaded ath11k_ahb and ath11k with rmmod; free memory went up from 195M to 288M after removing ath11k_ahb. I'll leave it alone now and see whether memory stays stable.

Eight hours later there is still 288M free, so I guess there is no memory leak here apart from ath11k.


Status update.

  • The current image has been working fine as a dumb AP + DNS/DHCP server for several days now.
  • No roaming issues with the workaround script.
  • Free memory between 30 and 170 MB, about 100 MB on average; no OOMs (many clients).

Building psi-c's image worked just fine; I plan to give it a try this weekend.

Changes in the currently running build relative to robimarko's restart branch:

package/firmware/ath11k-firmware/Makefile
package/kernel/mac80211/patches/ath11k/801-ath11k-fix-4-addr-tx-failure-for-AP-and-STA-modes.patch
package/kernel/mac80211/patches/ath11k/802-ath11k-small-tx-queue.patch
package/kernel/mac80211/patches/ath11k/803-ath11k-fix-zero-address-in-probe-request.patch
Build configuration, for reference:
CONFIG_TARGET_ipq807x=y
CONFIG_TARGET_ipq807x_generic=y
CONFIG_TARGET_ipq807x_generic_DEVICE_xiaomi_ax3600=y
CONFIG_DEVEL=y
CONFIG_BUILD_LOG=y
CONFIG_KERNEL_BUILD_DOMAIN="patch-mem-roam-1"
CONFIG_KERNEL_BUILD_USER="joba-1"
CONFIG_MACTELNET_PLAIN_SUPPORT=y
CONFIG_OPENVPN_wolfssl=y
CONFIG_OPENVPN_wolfssl_ENABLE_DEF_AUTH=y
CONFIG_OPENVPN_wolfssl_ENABLE_FRAGMENT=y
CONFIG_OPENVPN_wolfssl_ENABLE_LZ4=y
CONFIG_OPENVPN_wolfssl_ENABLE_MULTIHOME=y
CONFIG_OPENVPN_wolfssl_ENABLE_PF=y
CONFIG_OPENVPN_wolfssl_ENABLE_PORT_SHARE=y
CONFIG_OPENVPN_wolfssl_ENABLE_SMALL=y
# CONFIG_PACKAGE_ath10k-firmware-qca9887-ct is not set
CONFIG_PACKAGE_ath10k-firmware-qca9887-ct-full-htt=y
CONFIG_PACKAGE_atop=y
CONFIG_PACKAGE_bridge=y
CONFIG_PACKAGE_cJSON=y
CONFIG_PACKAGE_cgi-io=y
CONFIG_PACKAGE_dawn=y
CONFIG_PACKAGE_hostapd-utils=y
CONFIG_PACKAGE_htop=y
CONFIG_PACKAGE_ip-bridge=y
CONFIG_PACKAGE_iperf3=y
CONFIG_PACKAGE_iptables-mod-conntrack-extra=y
CONFIG_PACKAGE_iptables-mod-extra=y
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_iptables-mod-physdev=y
CONFIG_PACKAGE_irqbalance=y
# CONFIG_PACKAGE_kmod-ath10k-ct is not set
CONFIG_PACKAGE_kmod-ath10k-ct-smallbuffers=y
CONFIG_PACKAGE_kmod-br-netfilter=y
CONFIG_PACKAGE_kmod-ifb=y
CONFIG_PACKAGE_kmod-ipt-conntrack-extra=y
CONFIG_PACKAGE_kmod-ipt-extra=y
CONFIG_PACKAGE_kmod-ipt-ipopt=y
CONFIG_PACKAGE_kmod-ipt-physdev=y
CONFIG_PACKAGE_kmod-ipt-raw=y
CONFIG_PACKAGE_kmod-netatop=y
CONFIG_PACKAGE_kmod-netlink-diag=y
CONFIG_PACKAGE_kmod-qca-nss-drv=y
CONFIG_PACKAGE_kmod-qca-nss-ecm=y
CONFIG_PACKAGE_kmod-sched-connmark=y
CONFIG_PACKAGE_kmod-sched-core=y
CONFIG_PACKAGE_kmod-tun=y
CONFIG_PACKAGE_libcares=y
CONFIG_PACKAGE_libgcrypt=y
CONFIG_PACKAGE_libgpg-error=y
CONFIG_PACKAGE_libiwinfo-lua=y
CONFIG_PACKAGE_liblua=y
CONFIG_PACKAGE_liblucihttp=y
CONFIG_PACKAGE_liblucihttp-lua=y
CONFIG_PACKAGE_libmosquitto-nossl=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libpcre=y
CONFIG_PACKAGE_librt=y
CONFIG_PACKAGE_libstdcpp=y
CONFIG_PACKAGE_libtirpc=y
CONFIG_PACKAGE_libubus-lua=y
CONFIG_PACKAGE_libuci-lua=y
CONFIG_PACKAGE_logger=y
CONFIG_PACKAGE_lsof=y
CONFIG_PACKAGE_lua=y
CONFIG_PACKAGE_lua-bit32=y
CONFIG_PACKAGE_luabitop=y
CONFIG_PACKAGE_luasocket=y
CONFIG_PACKAGE_luci=y
CONFIG_PACKAGE_luci-app-dawn=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-openvpn=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-qos=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-compat=y
CONFIG_PACKAGE_luci-lib-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-json=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-mod-dashboard=y
CONFIG_PACKAGE_luci-mod-network=y
CONFIG_PACKAGE_luci-mod-status=y
CONFIG_PACKAGE_luci-mod-system=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-ssl=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_luci-theme-openwrt-2020=y
CONFIG_PACKAGE_mac-telnet-client=y
CONFIG_PACKAGE_mac-telnet-discover=y
CONFIG_PACKAGE_mac-telnet-ping=y
CONFIG_PACKAGE_mac-telnet-server=y
CONFIG_PACKAGE_mii-tool=y
CONFIG_PACKAGE_mosquitto-client-nossl=y
CONFIG_PACKAGE_ncat=y
CONFIG_PACKAGE_netatop=y
CONFIG_PACKAGE_nmap=y
CONFIG_PACKAGE_nping=y
CONFIG_PACKAGE_openvpn-wolfssl=y
CONFIG_PACKAGE_prometheus-node-exporter-lua=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-dawn=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-hostapd_stations=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-hostapd_ubus_stations=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-nat_traffic=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-netstat=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-openwrt=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-snmp6=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-textfile=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-uci_dhcp_host=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-wifi=y
CONFIG_PACKAGE_prometheus-node-exporter-lua-wifi_stations=y
CONFIG_PACKAGE_prometheus-statsd-exporter=y
CONFIG_PACKAGE_px5g-wolfssl=y
CONFIG_PACKAGE_qos-scripts=y
CONFIG_PACKAGE_rpcd=y
CONFIG_PACKAGE_rpcd-mod-file=y
CONFIG_PACKAGE_rpcd-mod-iwinfo=y
CONFIG_PACKAGE_rpcd-mod-luci=y
CONFIG_PACKAGE_rpcd-mod-rrdns=y
CONFIG_PACKAGE_ss=y
CONFIG_PACKAGE_tc-mod-iptables=y
CONFIG_PACKAGE_tc-tiny=y
CONFIG_PACKAGE_tcpdump=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_uhttpd=y
CONFIG_PACKAGE_uhttpd-mod-ubus=y
CONFIG_PACKAGE_umdns=y
# CONFIG_PACKAGE_wpad-basic-wolfssl is not set
CONFIG_PACKAGE_wpad-wolfssl=y
CONFIG_PACKAGE_zlib=y
CONFIG_WOLFSSL_HAS_OPENVPN=y

@psi-c I just stumbled on the perfect explanation of the roaming issue: as I thought, it's the FDB's fault, with duplicate entries persisting until the old one expires. It also explains why bridge-mgr solved it.


Does that mean QC bodged the silicon or the driver and has to fix it with two additional services? Or something else?

It's due to their driver: it doesn't use DSA or switchdev to represent the hardware as a switch, and instead presents the ports as individual netdevs, so nothing in the kernel updates the FDB at the hardware level.

Thanks.
So the kernel can't correlate the seemingly independent ports...

Would it, in theory, be possible for the open-source community to rewrite the driver in a more conventional way if someone wanted to take on this task, or is something missing?

Sure, somebody could write a proper Ethernet + DSA driver, but good luck writing that without any documentation.


Thanks, so we are back to the workarounds :wink:

No time to tinker lately; I just wanted to let you know that the flush workaround for roaming works well for me. My biggest issue is a service that stops working, or a reset due to low memory, roughly every two weeks.

Good find, and indeed makes sense!

Any objection to including my changes that add bridge-mgr to your NSS repo? I've been running this as a plain AP for the last month or so without issues. I don't know whether anybody else has tried it and has anything good or bad to report.

No, I have nothing against it, as it fixes a really annoying issue.
I just want people to know that this is all really flimsy due to the amount of QCA code we are pulling in. For example, the 2.5G port on the AX9000 won't work at 1G when used in initramfs, and offloading is broken again.
This happened just from rebasing onto newer kernel point releases, so everything I was afraid of is happening, and I have neither the time nor the knowledge to debug that whole mess.
I spend most of my time on this trying to get as much as possible upstream, and that is a slow process.

So, if you want those included, feel free to make a PR.

Hi!

I'm now using robimarko's repo with psi-c's patch, and it works perfectly.
Thank you, psi-c, for this good solution.

I can confirm that the roaming issue is solved for me with the bridge-mgr, too.

There is no more need for the flush script in /etc/rc.local: when using, e.g., OpenWrt SNAPSHOT r0-bad92cd, roaming just works thanks to @psi-c's PR.

Decommissioned script, for reference:

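#!/bin/sh
# Flush the switch FDB via ssdk_sh whenever hostapd logs an AP-STA-* event
# (station connect/disconnect), so stale hardware entries don't break roaming.
flush() {
	ssdk_sh fdb entry flush 1
}

# Follow the system log; IFS= and -r keep each line intact.
logread -f | while IFS= read -r line; do
	case "$line" in
	*AP-STA-*)
		flush </dev/null >/dev/null 2>&1
		;;
	esac
done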

I consider this a closed case - thanks to all for the support!


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.