Confirmed: The main unmanaged switch was flooding unknown unicast traffic to all ports at 789,000 PPS due to MAC table overflow from randomized client MACs cycling through
Partial fix: Replacing the main switch with a smart switch (Netview GBDJ-08G02GB) reduced traffic from 12,000 PPS to 60 PPS in testing
Remaining issue: ARP table still fills with FAILED entries over time, APs still become unreachable periodically
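For anyone wanting to quantify the FAILED build-up, here is a minimal sketch that summarises neighbour entries by state. It assumes the `ip` utility is installed on the router and that the LAN bridge is called `br-lan`; the function name `count_neigh_states` is only illustrative.

```shell
# Summarise IPv4 neighbour (ARP) entries by state.
# Reads `ip -4 neigh show` style lines on stdin; the state is the last field.
count_neigh_states() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | sort
}

# On the router (assumption: bridge is named br-lan):
#   ip -4 neigh show dev br-lan | count_neigh_states
```

Running it every few minutes and comparing the FAILED count against the others should show whether the table is degrading steadily or in bursts.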
With randomized MACs constantly cycling, what is the recommended way to prevent ARP table degradation on a large flat /20 network?
Is there a way to make uspot more resilient to ARP table failures — i.e. not drop authenticated sessions just because a client's ARP entry went FAILED?
Should we be using a different network architecture (VLANs per AP, multiple bridges) to contain broadcast domains with 28 APs?
What is the recommended sysctl tuning for net.ipv4.neigh on a high-client-turnover captive portal deployment?
What we've tried:
Increased gc_thresh1/2/3 — made things worse (larger thresholds = more stale entries accumulate)
Enabled multicast_snooping on br-lan — helped slightly
Installed irqbalance — helped slightly
Disabled DHCPv6 on WAN — removed a CPU-consuming renewal loop
Set network.wan.release='0' — prevents WAN lease flap during renewal
Reduced bridge MAC ageing time to 60 seconds
Replaced main unmanaged switch with smart switch — major improvement in PPS
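To make it easier to compare the settings above before and after a change, a small sketch like the following can print the current values. It reads the standard `/proc/sys/net/ipv4/neigh/default/` and `/sys/class/net/<bridge>/bridge/` files; the function name `show_l2_settings` and the optional root-prefix argument are assumptions added for illustration and testing, not part of any OpenWrt tool.

```shell
# Print the neighbour-table and bridge settings mentioned in the list above.
# $1: bridge name (default br-lan)
# $2: optional root prefix, only useful when testing against a fake tree
show_l2_settings() {
    bridge="${1:-br-lan}"
    root="${2:-}"
    for key in gc_thresh1 gc_thresh2 gc_thresh3 gc_stale_time; do
        printf '%s=%s\n' "$key" "$(cat "$root/proc/sys/net/ipv4/neigh/default/$key")"
    done
    # sysfs reports ageing_time in hundredths of a second (60 s reads as 6000)
    printf 'ageing_time_cs=%s\n' "$(cat "$root/sys/class/net/$bridge/bridge/ageing_time")"
    printf 'multicast_snooping=%s\n' "$(cat "$root/sys/class/net/$bridge/bridge/multicast_snooping")"
}

# On the router:
#   show_l2_settings br-lan
```

Posting this output alongside the config files shows which values are actually in effect, as opposed to what is written in the config.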
Please connect to your OpenWrt device using ssh, copy the output of the following commands, and post it here using the "Preformatted text </>" button (this works best in the "Markdown" composer view):
Remember to redact passwords, VPN keys, MAC addresses and any public IP addresses you may have:
ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
cat /etc/config/firewall
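As a convenience for the redaction step, a rough sketch of a filter that masks colon-separated MAC addresses might look like this. The function name `redact_macs` is hypothetical, and it only handles MAC addresses; passwords, keys and public IP addresses still need to be removed by hand, so please review the output before posting.

```shell
# Replace colon-separated MAC addresses on stdin with a placeholder.
# Works with both GNU sed and BusyBox sed (-E enables extended regex).
redact_macs() {
    sed -E 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g'
}

# Example use on the router:
#   cat /etc/config/network | redact_macs
```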
Hi frollic, thanks a lot for responding to help. I've been so engulfed in trying to find a solution that I forgot to follow up on this post. I will send the outputs of those commands in about an hour from now.
Nothing unusual, typical of a large hotel type of venue.
So a very small venue, or a large one with hardly any clients.
Nearly one access point per client...
I looked at this when it was first released. IMHO it has a severe design flaw in that it uses nft sets in the fw4 table to track clients if I remember correctly.
This is bad because, amongst other issues:
nftsets are quite inefficient and not intended for this type of use case.
Tracking of client authentication and access rights for a captive portal is by its very nature extremely dynamic. Using nftsets for this in the fw4 table is very unfortunate as fw4 is a simple static firewall configuration tool and should not really be used for dynamic use cases.
You say you are using UAM mode. That too is unfortunate as uspot is your only "standard" choice, given that CoovaChilli is no longer properly maintained.
Perhaps you should at this stage review your design.
Not if he absolutely requires UAM and does not want to write his own OpenNDS FAS script to give support. People say they have done this, and it would not be too hard, but there is little call for it, to be honest. It's usually "IT Department Policy" to use a legacy RADIUS infrastructure that is saved over and over by the corporate sunk cost fallacy way of thinking, e.g. "We spent millions developing this, someone better think of a use for it..."
As it is still possible to have iptables installed on recent OpenWrt, I could imagine that CoovaChilli can still be used to do the job, in case IPv6 is not required. On a custom-built image, of course. With this powerful OpenWrt device and the low utilization, Coova's xt_module is not required.
Looks like you have a RADIUS allergy. However, sometimes good old, well-proven methods have some advantages.
Thanks for your response. I have deployed this system in several other locations, and they work perfectly well with zero issues. That's my first point of confusion: why does it not work here? I thought it might be the APs I'm using, and yes, the APs are the problem. In what way? It isn't really clear to me. But when I asked the manufacturer, they said the AP is not built for controller-based use, unlike the TP-Link EAPs I've used in the past. It seems the APs are causing a lot of noise in the system.
So, I decided to bring in a managed switch and enabled port isolation. I'm currently testing with a small number of the APs (about 11) and it's been fine so far.
I actually used OpenNDS before uspot. It was fine, but it randomly restarted. Sometimes it just hung (authentication stopped). That's why I had to switch to uspot.
This was a known bug that raised its head more often on 32-bit devices, much less so on 64-bit, but nevertheless a bug.
It is fixed in v11.x.x onwards. This should be submitted to the OpenWrt package feeds soon.
If you want to use it now, you can compile it yourself with the OpenWrt SDK.
If you want to pursue this, open an issue on GitHub: OpenNDS issues