Very slow with lots of access points present (>>100)

Ok, so I just got back from a technical show where I had my BTHHV5a running LEDE (not latest).
I estimated that there were >> 100 access points visible on wifi, on both 2.4 GHz and 5 GHz.
I had it configured for a DHCP WAN, with an internal Ethernet LAN (192.168.9.x), and a guest WIFI LAN (192.168.10.x).

We had two issues.
1/ the router almost always failed to bring up the wifi on first boot (sometimes 5 GHz missing, sometimes 2.4 GHz missing, sometimes both). Normally a power cycle would resolve this, though the 5 GHz radio might drop again later.
2/ LuCI was VERY slow. CPU utilisation (load) was ~1.9 (that's basically almost 100%, right?).

(Note: running the same firmware in a 'normal' wifi environment is absolutely fine. Routing of ethernet seemed to work nicely, and the wifi access points I was contributing to the melee also worked fine at close range. There may have been some disturbance on the br-lan side, though: the PC had some slowness talking to a VM on the same PC with bridged networking. It MAY be that there was extreme multicast traffic on the WAN ethernet; I did not check this at the time.)

My immediate thought was that there may be some buffer overruns in whatever gathers the lists of available wifi SSIDs, or memory exhaustion.

So, for future similar scenarios (where hundreds of access points are out there, all visible to LEDE), any advice on tweaks to prevent LEDE from looking for them and occupying itself doing nothing useful for me would be appreciated ! :).
br, simon

A load of 1.9 means it is 190% of what the router can handle without being overloaded.

That will be why it's slow then :). But I thought it was 1.9 out of 2.0, since it has a dual-core CPU? I'm not aware of how it uses the second core, though: whether it is even available to our Linux, or is some specialist communications core.
(Lantiq Xway VR9 VRX268 PSB 80910 (MIPS 34Kc) v1.2.1)

That is correct for the CPUs alone, but on Linux the load also counts processes in uninterruptible sleep (typically waiting on I/O such as disk) and some other stuff, so the system can be overloaded by this measurement even if the CPUs are almost idle.
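
To make the "1.9 out of 2.0" reading concrete, here is a minimal sketch that divides a load average by the number of logical CPUs. The helper name `load_per_core` is made up for illustration, and it assumes a machine with Python available (which a stock router image usually is not; there, `cat /proc/loadavg` gives the raw numbers). As noted above, a per-CPU value near 1.0 does not by itself prove the CPUs are busy, since waiting processes are counted too.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of logical CPUs.

    Hypothetical helper for illustration only.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {cores} CPU(s): "
          f"{load_per_core(one_min, cores):.2f} per CPU")
```

With the figures from this thread, `load_per_core(1.9, 2)` gives 0.95 and `load_per_core(1.9, 1)` gives 1.9, which is the difference between the two readings being discussed.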

high-density environments like tech-shows are a regular shit show regarding wifi.
on top of the hundred(s) of visible APs, there are probably as many that are not broadcasting their SSID (hidden), plus a metric fuckton of other wireless gadgets. it's a miracle that anything works at all.
so i don't think your experience is attributable to openwrt.

then again, stuff like airtime fairness, noscan, legacy_rates and multicast come to mind and yea.. maybe you could make it suck a tiny bit less :wink:

The only problem will be that you can only reasonably debug this under similar circumstances - in other words during a crowded fair...

Well, it seems to me that you should never have wifi fail to even start up, so that's suspicious.

I do agree that in this kind of density, stuff is unlikely to work well though. For example there are collision avoidance methods. Perhaps your wifi is trying to send beacons and finding that there are zero free time-slices, and therefore borking. Typically beacons are sent around every 100ms, which is 10 times a second, so if there are around 100 APs using the same channel, you'd have something like 1000 beacons a second, which would by itself take up most of your airtime. Even if they are not all on the same channel, a few tens of wifi APs on a given channel, plus some actual devices trying to send data, and you're basically guaranteed to never get a free timeslot.
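
The beacon arithmetic above can be sketched as a rough airtime estimate. Everything numeric here is an assumption for illustration, not a measurement from the show: a 250-byte beacon, sent at 1 Mbit/s with a 192 µs long preamble (typical of 2.4 GHz with legacy rates enabled), every 100 ms. The function name `beacon_airtime_fraction` is made up, and the model ignores inter-frame gaps, backoff and collisions, so if anything it understates the problem.

```python
def beacon_airtime_fraction(num_aps: int,
                            beacon_bytes: int = 250,      # assumed beacon size
                            rate_mbps: float = 1.0,       # assumed beacon rate
                            preamble_us: float = 192.0,   # assumed preamble time
                            interval_ms: float = 100.0) -> float:
    """Estimate the share of one channel's airtime spent on beacons.

    Returns seconds of beacon airtime per second of wall-clock time,
    so a value of 1.0 or more means beacons alone would fill the channel.
    """
    beacons_per_second = num_aps * (1000.0 / interval_ms)
    # bits divided by Mbit/s gives microseconds
    airtime_us = preamble_us + (beacon_bytes * 8) / rate_mbps
    return beacons_per_second * airtime_us / 1_000_000.0


if __name__ == "__main__":
    for aps in (10, 30, 100):
        print(aps, "APs:", round(beacon_airtime_fraction(aps), 3))
```

Under these assumptions, 100 APs on one channel need about 2.2 seconds of beacon airtime per second, which cannot fit. Passing `rate_mbps=6.0, preamble_us=20.0` (an assumed OFDM-only setup, roughly what disabling legacy rates gives you) brings the same 100 APs down to about 0.35, which is one reason `legacy_rates` was mentioned earlier in the thread.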

EDIT: if you want to actually set up a crowded tech fair with working WiFi, it's very doable, you just have to plan out the WiFi deployment, and use very low power access points. If you have 100 APs they should probably be transmitting at about 4 to 8 dBm and be on different channels so you never hear more than 2 on any given channel.
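
For a feel of how low "4 to 8 dBm" is, a small sketch converting dBm to milliwatts using the standard definition (0 dBm = 1 mW, every 10 dB is a factor of ten). The helper name `dbm_to_mw` is made up, and the 20 dBm comparison point is an assumption added here, not a figure from the post.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    # 4 and 8 dBm are the values suggested above; 20 dBm is an assumed
    # comparison point for a radio running at much higher power.
    for level in (4, 8, 20):
        print(f"{level} dBm = {dbm_to_mw(level):.2f} mW")
```

So 4 to 8 dBm is roughly 2.5 to 6.3 mW, against 100 mW at 20 dBm.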

thanks all :).

'you just have to plan out the WiFi deployment' - not a hope in hell of coordinating with other stands; generally anyone with some small technical knowledge has upped their power, not lowered it :).

I was thinking that something in OpenWRT is taking note of the other access points, and if this could be turned off, then it may be more stable; but have not looked at where/what yet - I was hoping for some insight here; maybe just terminating a process would do the job; but then again if it's a basic part of the stack....

'reasonably debug' - yes, this is the main issue; at the show, you can't afford the time to develop OpenWRT; it's not your reason for being there (anyone going to an OpenWRT focussed meet? maybe that's the place to diagnose :slight_smile: ).

I think my first port of call is to get onto the latest firmware; the wireless on the BTHH5a in ~17.01 was not perfect, and may have already been improved. But with no way to test... Second mitigation: taking an ethernet switch so I can isolate my PC world to see if the PC slowness is related to the LAN side of OpenWRT, and only connecting when I NEED internet. Of course (for us) wireless at the show is a non-essential, so I should really turn it off unless required, giving everybody else a fighting chance... we'd be a little daft to try to show something which relied on wifi to work (in the past, we've even wired Apple and Android devices, as the wifi cannot be relied on. Physical ethernet on iPhone is a challenge...).

In one of the failures, there was a system log, but again, no time to even note it during the show, and no reproduction outside of the show.

:). It's a broadcast market show (TV). I came back and found my Samsung phone had paired with 136 Samsung TVs, hoping to act as a remote!

Thank you for some pointers. I'll look at 'noscan' and see if this can be configured on the interfaces; I assume this would mean it didn't look at the other SSIDs. (Or maybe not; I just looked it up, and it seems a little dangerous? A scan through the other wifi options does not reveal anything specifically useful :frowning: ).

Yes. There's a good paper from LISA '12, "Building a Wireless Network for a High Density of Users" https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless, which is the first place I had seen the "lots of access points at low power" approach for successful wi-fi at a conference.

I was more thinking of the organizer rather than individual booths. In fact if you create a universal open WiFi network using Enterprise security, and hand out little business cards with random UID and passwords on them to participants as they enter... there is no need for any additional wifi. Make a no additional wifi line item in the booth contract, and have a couple of people enforcing it... You can have fabulous connectivity. But you need to know something about how to do it.

ha ha, there is site-wide wifi (the venue is the RAI in Amsterdam), but many booths need wifi with access to their internal booth LAN... I don't see coordinated actions happening any show soon. This year we were lucky; the wired internet actually worked. Last year there was a DNS issue which meant no wired internet access anyway... so our unexpected Windows update was sharing 4G with 500 other tethered phones :(.
Maybe if we highlight that the wifi issue is spiraling out of control, then the RAI will start to listen...

@richb-hanover-priv - nice link - good talk. The show I was at had ~60,000 visitors, and 1,700 stands spread over 112,200 m², but as you get larger, the wifi density must peak, as you can only fit so many stands into each square km.

Properly coordinated, stations do not up their power. They properly section off by channeling, and only use the power necessary. The goal is reduction of interference.