Uspot captive portal stops working after ~1 hour; clients lose internet

Setup:

  • OpenWrt router with uspot captive portal (UAM mode)
  • 28 APs connected via 3 PoE switches aggregated through 1 main switch
  • Starlink as WAN uplink
  • ~30 clients at any given time
  • All clients use randomized MACs (modern Android/iOS)
  • WireGuard VPN tunnel (wg0) for management — not in data path

Symptoms:

  • After 30 minutes to a few hours of uptime, clients lose internet access
  • Captive portal page becomes slow or blank for new clients
  • APs become unreachable or respond with 200ms+ latency (should be <1ms)
  • nft list set inet fw4 uspot shows empty or near-empty set during failure
  • ip neigh show dev br-lan shows 50+ FAILED entries, only 3-6 REACHABLE
  • Packet rate on br-lan spikes to 10,000-789,000 PPS during failure (should be ~300-500 PPS for 30 clients)
  • Only a full system reboot (including Starlink + all switches + APs) restores service
  • Rebooting OpenWrt alone does not fix it
  • Removing OpenWrt and connecting switches directly to Starlink works fine indefinitely
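
For anyone wanting to reproduce the PPS figures above, here is a minimal sketch of how the rate can be sampled from the kernel's per-interface counters. The function names `pps_from_counters` and `sample_pps` are made up for illustration, and `br-lan` is only the default; nothing here is part of uspot.

```shell
#!/bin/sh
# Hypothetical helper: average packets per second between two counter readings.
# $1 = first rx_packets reading, $2 = second reading, $3 = seconds between them.
pps_from_counters() {
    echo $(( ($2 - $1) / $3 ))
}

# Read the RX packet counter of an interface twice and print the average rate.
# $1 = interface (assumed default: br-lan), $2 = interval in seconds (default: 5).
sample_pps() {
    iface="${1:-br-lan}"
    interval="${2:-5}"
    counter="/sys/class/net/$iface/statistics/rx_packets"
    first=$(cat "$counter") || return 1
    sleep "$interval"
    second=$(cat "$counter") || return 1
    pps_from_counters "$first" "$second" "$interval"
}
```

Running `sample_pps br-lan 10` in a loop during normal operation and again during a failure should show whether the spike is on the LAN side before the portal starts misbehaving.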

Root cause investigation so far:

  1. Confirmed NOT: DHCP exhaustion, conntrack overflow, memory exhaustion, WireGuard issues, rogue DHCP server, duplex mismatch
  2. Confirmed: The main unmanaged switch was flooding unknown unicast traffic to all ports at 789,000 PPS due to MAC table overflow from randomized client MACs cycling through
  3. Partial fix: Replacing main switch with a smart switch (Netview GBDJ-08G02GB) reduced PPS from 12,000 to 60 PPS in testing
  4. Remaining issue: ARP table still fills with FAILED entries over time, APs still become unreachable periodically
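
To track how the neighbour table degrades over time, the FAILED/REACHABLE counts can be logged periodically. This is a rough sketch, assuming the state is the last field of each line of `ip neigh show` output; `count_neigh_states` is a hypothetical name.

```shell
#!/bin/sh
# Count neighbour entries per state from `ip neigh show` output read on stdin.
# Assumes the state (REACHABLE, STALE, FAILED, ...) is the last field of each line.
# Blank lines are ignored; output is one "STATE COUNT" pair per line, sorted.
count_neigh_states() {
    awk 'NF { state[$NF]++ } END { for (s in state) print s, state[s] }' | sort
}
```

Usage would be something like `ip -4 neigh show dev br-lan | count_neigh_states`, run from cron or a loop so the counts can be lined up against the time the APs become unreachable.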

Current configuration:

br-lan: 192.168.0.0/20
option setname 'uspot'
option idle_timeout '86400'
option session_timeout '0'
option acct_interval '30'
option auth_mode 'uam'

Questions:

  • With randomized MACs constantly cycling, what is the recommended way to prevent ARP table degradation on a large flat /20 network?
  • Is there a way to make uspot more resilient to ARP table failures — i.e. not drop authenticated sessions just because a client's ARP entry went FAILED?
  • Should we be using a different network architecture (VLANs per AP, multiple bridges) to contain broadcast domains with 28 APs?
  • What is the recommended sysctl tuning for net.ipv4.neigh on a high-client-turnover captive portal deployment?

What we've tried:

  • Increased gc_thresh1/2/3 — made things worse (larger thresholds = more stale entries accumulate)
  • Enabled multicast_snooping on br-lan — helped slightly
  • Installed irqbalance — helped slightly
  • Disabled DHCPv6 on WAN — removed a CPU-consuming renewal loop
  • Set network.wan.release='0' — prevents WAN lease flap during renewal
  • Reduced bridge MAC ageing time to 60 seconds
  • Replaced main unmanaged switch with smart switch — major improvement in PPS
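
Since the gc_thresh changes went in the wrong direction, it may help to record the values before and after each experiment. A small sketch, assuming the standard IPv4 location under /proc; `show_gc_thresh` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the current neighbour-table GC thresholds, one per line.
# $1 = directory holding the gc_thresh files
#      (assumed default: the IPv4 neighbour defaults under /proc).
show_gc_thresh() {
    dir="${1:-/proc/sys/net/ipv4/neigh/default}"
    for n in 1 2 3; do
        printf 'gc_thresh%s=%s\n' "$n" "$(cat "$dir/gc_thresh$n")"
    done
}
```

Calling `show_gc_thresh` with no argument reads the live values; saving that output next to each test run makes it easier to tell which change actually affected the FAILED count.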

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button (red circle; this works best in the 'Markdown' composer view in the blue oval):

Remember to redact passwords, VPN keys, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
cat /etc/config/firewall

One thing to try: use static/permanent ARP entries loaded from a file for the infrastructure devices like the access points.
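
A minimal sketch of that idea, assuming a plain text file with one "IP MAC" pair per line. The function name, the file format, and the `br-lan` default are illustrative assumptions. The function only prints the `ip neigh replace` commands so they can be reviewed first. BusyBox's `ip` applet may not support adding neighbour entries, in which case the ip-full package would be needed.

```shell
#!/bin/sh
# Turn "IP MAC" pairs read on stdin into permanent neighbour entries.
# $1 = interface (assumed default: br-lan).
# Blank lines and lines starting with '#' are skipped.
# Commands are printed, not executed, so they can be checked before use.
static_neigh_commands() {
    dev="${1:-br-lan}"
    while read -r addr mac rest; do
        case "$addr" in ''|'#'*) continue ;; esac
        [ -n "$mac" ] || continue
        echo "ip neigh replace $addr lladdr $mac dev $dev nud permanent"
    done
}
```

With a hypothetical list at /etc/static-neigh.txt, `static_neigh_commands br-lan < /etc/static-neigh.txt` shows what would be applied, and piping that output to `sh` applies it. Permanent entries are not garbage-collected and never go FAILED, which is the point for the APs and switches.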

Hi frollic, thanks a lot for responding to help. I've been so engulfed in trying to find a solution that I forgot to follow up on this post. I will send the outputs of those commands in about an hour from now.

Hi

ubus call system board:

{
	"kernel": "6.12.74",
	"hostname": "OpenWrt",
	"system": "ARMv8 Processor rev 0",
	"model": "FriendlyElec NanoPi R5C",
	"board_name": "friendlyarm,nanopi-r5c",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "25.12.1",
		"firmware_url": "https://downloads.openwrt.org/",
		"revision": "r32768-b21cfa8f8c",
		"target": "rockchip/armv8",
		"description": "OpenWrt 25.12.1 r32768-b21cfa8f8c",
		"builddate": "1773711117"
	}
}

/etc/config/network:

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	list ipaddr '127.0.0.1/8'

config globals 'globals'
	option dhcp_default_duid '0004cc3264709ad994c034615a184'
	option ula_prefix 'fd45:1691:4191::/48'

config device
	option name 'eth0'
	option macaddr '42:f3:c9:90:61:7c'

config interface 'wan'
	option device 'eth1'
	option proto 'dhcp'

config interface 'wan6'
	option device 'eth1'
	option proto 'dhcpv6'

config interface 'wg0'
	option proto 'wireguard'
	option private_key '+9+sXZtndh9Ps8AwrpWw='
	option addresses '10.172.0.15/24'

config wireguard_wg0 'wg0_peer1'
	option public_key '+A/Io14='
	option endpoint_host 'website.tech'
	option endpoint_port '51820'
	option allowed_ips '0.0.0.0/0'
	option persistent_keepalive '25'

config interface 'lan'
	option device 'eth0'
	option proto 'static'
	option ipaddr '192.168.0.1'
	option netmask '255.255.240.0'
	option ip6assign '60'
	option multicast_querier '1'	
	option gc_stale_time '30'
	option igmp_snooping '1'

/etc/config/dhcp:

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/odhcpd.leases'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'
	option piodir '/tmp/odhcpd-piodir'
	option hostsdir '/tmp/hosts'

config dnsmasq
	option domainneeded '1'
	option boguspriv '1'
	option filterwin2k '0'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'ariyonet.client'
	option expandhosts '1'
	option nonegcache '0'
	option cachesize '1000'
	option authoritative '1'
	option readethers '1'
	option ednspacket_max '1232'
	option filter_aaaa '0'
	option filter_a '0'

config dhcp
	option interface 'lan'
	option start '100'
	option limit '3000'
	option leasetime '1h'
	list dhcp_option '114,http://192.168.0.1'
	list dhcp_option '1,255.255.240.0'
	list dhcp_option '3,192.168.0.1'
	list dhcp_option '6,192.168.0.1'

config dhcp
	option interface 'wan'
	option ignore '1'

config ipset
	list name 'wlist'
	list domain 'ariyonet.wififlow.tech'
	list domain 'www.ariyonet.wififlow.tech'
	list domain 'checkout.paystack.com'
	list domain 'js.paystack.co'
	list domain 'challenges.cloudflare.com'
	list domain 'paystack.com'
	list domain 'api.opayweb.com'
	list domain '*.opayweb.com'
	list domain 'opayweb.com'
	list domain 'sandbox.sdk.monnify.com'
	list domain 'sdk.monnify.com'
	list domain 'sandbox.monnify.com'
	list domain 'monnify.com'
	list domain 'sandboxcashier.opaycheckout.com'
	list domain 'opaycheckout.com'
	list domain 'express.opaycheckout.com'
	list domain 'opay-sdk-tddevice-api.opayweb.com'
	list domain 'api.paystack.co'
	list domain 'standard.paystack.co'
	list domain 'googletagmanager.com'
	list domain 's3-eu-west-1.amazonaws.com'
	list domain 'sockjs-eu.pusher.com'
	list domain 'www.google-analytics.com'
	list domain 'google-analytics.com'
	list domain 'fonts.googleapis.com'
	list domain 'safehavenmfb.com'
	list domain 'api.safehavenmfb.com'
	list domain 'api.sandbox.safehavenmfb.com'
	list domain 'browser-intake-datadoghq.eu'
	list domain 'cdn.jsdelivr.net'
	list domain 'cdnjs.cloudflare.com'
	list domain 'fonts.gstatic.com'

config domain
	option name 'ariyonet.client'
	option ip '192.168.0.1'


/etc/config/firewall:

config defaults
	option syn_flood '1'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'

config zone
	option name 'lan'
	list network 'lan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'

config zone
	option name 'wan'
	list network 'wan'
	list network 'wan6'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'DROP'
	option masq '1'
	option mtu_fix '1'

config zone
	option name 'wg'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'wg0'

config ipset
	option name 'uspot'
	list match 'src_mac'

config ipset
	option name 'wlist'
	option match 'dest_ip'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'IGMP'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option proto 'udp'
	option dest_port '500'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCP-NTP-captive'
	option src 'lan'
	option proto 'udp'
	option dest_port '67 123'
	option target 'ACCEPT'

config rule
	option name 'Allow-DNS-captive'
	option src 'lan'
	option dest_port '53'
	option target 'ACCEPT'
	list proto 'udp'
	list proto 'tcp'

config rule
	option name 'Allow-captive-CPD-WEB-UAM'
	option src 'lan'
	option dest_port '80 443 3990'
	option proto 'tcp'
	option target 'ACCEPT'

config redirect
	option name 'Redirect-unauth-captive-CPD'
	option src 'lan'
	option src_dport '80'
	option proto 'tcp'
	option target 'DNAT'
	option reflection '0'
	option ipset '!uspot'

config rule
	option name 'Forward-auth-captive'
	option src 'lan'
	option dest 'wan'
	option proto 'any'
	option target 'ACCEPT'
	option ipset 'uspot'

config rule
	option name 'Allow-Whitelist'
	option src 'lan'
	option dest 'wan'
	option proto 'any'
	option ipset 'wlist'
	option target 'ACCEPT'

config rule
	option name 'Drop-unauth-to-wan'
	option src 'lan'
	option dest 'wan'
	option proto 'all'
	option target 'DROP'

config rule
	option name 'Allow-RADIUS-DAE'
	option src 'wan'
	option proto 'udp'
	option dest_port '3799'
	option target 'ACCEPT'
	option family 'ipv4'
	option src_ip '3.148.117.74'

config rule
	option name 'Restrict-input-captive'
	option src 'lan'
	option dest_ip '!lan'
	option target 'DROP'


Hi, thanks for the input. I will definitely look at this as well.

Nothing unusual, typical of a large hotel type of venue.

So a very small venue, or a large one with hardly any clients.
Nearly one access point per client...

I looked at this when it was first released. IMHO it has a severe design flaw in that it uses nft sets in the fw4 table to track clients if I remember correctly.
This is bad because, amongst other issues:

  1. nft sets are quite inefficient and not intended for this type of use case.
  2. Tracking of client authentication and access rights for a captive portal is by its very nature extremely dynamic. Using nft sets for this in the fw4 table is very unfortunate, as fw4 is a simple static firewall configuration tool and should not really be used for dynamic use cases.

You say you are using UAM mode. That too is unfortunate as uspot is your only "standard" choice, given that CoovaChilli is no longer properly maintained.

Perhaps you should at this stage review your design.

opennds FTW ? :slight_smile:

Not if he absolutely requires UAM and does not want to write his own OpenNDS FAS script to support it. People say they have done this, and it would not be too hard, but to be honest there is little call for it. It's usually "IT Department Policy" to use a legacy RADIUS infrastructure that is saved over and over by the corporate sunk-cost-fallacy way of thinking, e.g. "We spent millions developing this, someone better think of a use for it..."

Since it is still possible to install iptables on recent OpenWrt, I could imagine that CoovaChilli can still be used to do the job, provided IPv6 is not required, and on a custom-built image, of course. With such a powerful OpenWrt device and such low utilization, Coova's xt_module is not required.

Looks like you have a RADIUS allergy :slight_smile: However, sometimes good old, well-proven methods have some advantages.

Hey bluewavenet,

Thanks for your response. I have deployed this system in several other locations, and there it works perfectly well with zero issues. That's my first point of confusion: why does it not work here? I thought it might be the APs I use, and yes, the APs are the problem. In what way isn't really clear to me. But when I asked the manufacturer, they said the AP is not built for controller-based use, unlike the TP-Link EAPs I've used in the past. It seems the APs are causing a lot of noise in the system.

So I decided to bring in a managed switch and enabled port isolation. I'm currently testing with a small number of the APs (about 11) and it's been fine so far.

I actually used OpenNDS before uspot. It was fine, but it restarted randomly. Sometimes it just hung (authentication stopped). That's why I had to switch to uspot.

This was a known bug that raised its head more often on 32-bit devices, much less so on 64-bit, but nevertheless a bug.
It is fixed in v11.x.x onwards. This should be submitted to the OpenWrt package feeds soon.
If you want to use it now, you can compile it yourself with the OpenWrt SDK.
If you want to pursue this, open an issue on Github:
OpenNDS issues