Intent of DAWN hearing map?

I'm trying to understand the intent behind the DAWN hearing map.

I've successfully installed DAWN on several routers ( 2022-01-17-7a726740-1 ) and I've been able to confirm that 802.11v neighbour reports work as expected. The "neighbour map" feature (both via luci and via ubus call dawn get_network) also seems to work as expected.

However, I don't understand the "hearing map" feature (neither via luci nor via ubus call dawn get_hearing_map). At a high level, my impression is that this data would allow the APs to "nudge" clients over to better APs. However, when I query the data I typically see a hearing map showing just the clients and the AP they are already attached to. I don't understand how the hearing map will be able to "nudge" clients to a better AP if there is no information on how well the client can see other APs.

So for example, I might see:

{
        "MySSID": {
                "33:33:33:33:33:33": {
                        "11:11:11:11:11:5F": {
                                "signal": -42,
                                "rcpi": 224,
                                "rsni": 0,
                                "freq": 5500,
                                "ht_capabilities": true,
                                "vht_capabilities": true,
                                "channel_utilization": 3,
                                "num_sta": 2,
                                "ht_support": true,
                                "vht_support": true,
                                "score": 181
                        }
                }
        }
}

It does seem that many of my devices support RRM (802.11k Radio Resource Management). I do see dawn actively sending beacon requests in the system log. For example, I see:

Sat Feb 19 05:08:09 2022 daemon.notice hostapd: wlan0: BEACON-RESP-RX 33:33:33:33:33:33 212 00 ...actual beacon data elided...

The odd thing is, it seems that DAWN is requesting the beacon for the AP the client is already connected to?! I'm confused why DAWN wants the beacon for the client's current AP, as the signal strength for the current AP should already be known.

It does seem that my devices that support RRM can request the beacon for other APs. For example, if I manually run:

ubus call hostapd.wlan0 rrm_beacon_req '{"addr":"33:33:33:33:33:33","mode":1,"op_class":0,"channel":11,"duration":100,"bssid":"11:11:11:11:11:60","ssid":"MySSID"}'

I'll see something like the following in the system log:

Sat Feb 19 05:13:21 2022 daemon.notice hostapd: wlan0: BEACON-RESP-RX 33:33:33:33:33:33 218 00 ...actual beacon data elided...

and sure enough the hearing map will then populate for the additional AP:

{
        "MySSID": {
                "33:33:33:33:33:33": {
                        "11:11:11:11:11:5F": {
                                "signal": -42,
                                "rcpi": 223,
                                "rsni": 0,
                                "freq": 5500,
                                "ht_capabilities": true,
                                "vht_capabilities": true,
                                "channel_utilization": 3,
                                "num_sta": 2,
                                "ht_support": true,
                                "vht_support": true,
                                "score": 181
                        },
                        "11:11:11:11:11:60": {
                                "signal": -43,
                                "rcpi": 220,
                                "rsni": 0,
                                "freq": 2462,
                                "ht_capabilities": true,
                                "vht_capabilities": true,
                                "channel_utilization": 59,
                                "num_sta": 1,
                                "ht_support": true,
                                "vht_support": false,
                                "score": 154
                        }
                }
        }
}
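For repeated testing, the manual request shown earlier can be generated by a small script. This is only a sketch: the helper name beacon_req_command is made up for illustration, and the payload fields and default values are simply the ones from the command above.

```python
import json
import shlex


def beacon_req_command(iface, client, bssid, ssid, channel,
                       mode=1, duration=100, op_class=0):
    """Build the `ubus call hostapd.<iface> rrm_beacon_req` command line.

    Hypothetical helper: field names and defaults are copied from the
    manual command in this post, nothing more.
    """
    payload = {
        "addr": client,
        "mode": mode,
        "op_class": op_class,
        "channel": channel,
        "duration": duration,
        "bssid": bssid,
        "ssid": ssid,
    }
    # Compact JSON, quoted so the shell passes it as a single argument.
    body = json.dumps(payload, separators=(",", ":"))
    return "ubus call hostapd.{} rrm_beacon_req {}".format(iface, shlex.quote(body))


cmd = beacon_req_command("wlan0", "33:33:33:33:33:33",
                         "11:11:11:11:11:60", "MySSID", channel=11)
print(cmd)
```

The printed line can be pasted into a shell on the AP, or looped over a list of BSSIDs and channels.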

The other odd thing I noticed is that the hearing map often contains the results of PROBE messages for devices from other households. I'll often see a dozen or more devices in the hearing map that are not mine and are not authorized to connect to MySSID. It's not clear to me if the hearing map is storing broadcast PROBE requests, or if these devices are sending directed PROBE requests to MySSID.

I looked briefly through the DAWN code to see why it queries beacons for the client's current AP. It does seem like the code is capable of querying other APs (src/utils/ubus.c:ubus_send_beacon_report() takes an AP parameter), but it seems intentional that only the current AP is queried (src/utils/ubus.c:update_beacon_reports() seems to only request the local AP radios).

I was thinking I might "hack" the DAWN code to request beacons for all APs, but thought I should ask what the intent of the code is first. I did read the recent docs at https://github.com/Ian-Clowes/DAWN (which was very helpful), but I'm still confused on the hearing map.

Some questions:

  1. What is the intended workflow of an "AP invoked client nudge to another AP"; how is the data in the hearing map intended to be used?
  2. What value is there in requesting the client to report the beacon of the AP it is currently connected to?
  3. Why does the hearing map store probe reports for clients that aren't actually connected to any AP (and perhaps not authorized to connect to any AP)?

Perhaps @PolynomialDivision or @IanC may have some thoughts on the above?

Thanks,
-Kevin

Very good questions.

The hearing map is basically a view of the information that DAWN has stored in relation to each client device, which is the latest PROBE and BEACON reports in relation to each AP. A key thing to look for is where a client is known to at least two APs, as that allows the scores (last field in list) to be compared to see if moving the client is justified.
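As a rough illustration of that "known to at least two APs" check, the sketch below walks a structure shaped like the get_hearing_map output earlier in the thread and lists the clients whose scores can actually be compared. It is not DAWN code; the function name and the second client are invented for the example.

```python
def comparable_clients(hearing_map):
    """Return (ssid, client, best_ap, scores) for clients heard by >= 2 APs."""
    results = []
    for ssid, clients in hearing_map.items():
        for client, aps in clients.items():
            if len(aps) < 2:
                continue  # only one AP knows this client: nothing to compare
            scores = {ap: entry["score"] for ap, entry in aps.items()}
            best_ap = max(scores, key=scores.get)
            results.append((ssid, client, best_ap, scores))
    return results


hearing_map = {
    "MySSID": {
        "33:33:33:33:33:33": {
            "11:11:11:11:11:5F": {"signal": -42, "score": 181},
            "11:11:11:11:11:60": {"signal": -43, "score": 154},
        },
        # Hypothetical second client, heard by a single AP only.
        "44:44:44:44:44:44": {
            "11:11:11:11:11:5F": {"signal": -70, "score": 100},
        },
    }
}

for ssid, client, best_ap, scores in comparable_clients(hearing_map):
    print(ssid, client, "best:", best_ap, scores)
```

Note that the hearing map itself does not say which AP the client is currently associated with, so a real decision would need that from elsewhere.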

You're right that only the current AP for a device asks for the BEACON REPORT, and only a single request - nominally in the context of that AP. Frankly this is where I hit a bit of a barrier in understanding expected behaviour. When I read some of the 802.11k specification summaries that I can find (because the full spec is a commercial document) I see wording that suggests a client should include in the BEACON REPORT details of all relevant APs that it hears - so a single request should be sufficient. But the message that hostapd returns to DAWN doesn't have fields for all of that.

Another problem for me is that my two Samsung devices are "patchy" in what BEACON REPORTS they return. I can try many combinations of parameters via ubus, and get no response. The single iPhone in the house is better, but it then seems to highlight other behaviours such as returning BEACONs quite reliably, but not if requested in quick succession. There are also things like different parts of 802.11 making RCPI or RSSI easier / harder to get, which complicates comparison.

My two-pronged approach to try and understand this better is going to be to get some kind of packet sniffing set up so I can see if devices are returning frames that somehow get lost in hostapd, and also set up a fork of hostapd so I can experiment a bit. If anyone knows enough about 802.11k to describe how it is intended to work and / or how hostapd interacts with that it may well save me a good few hours!

You're also right that DAWN keeps information on PROBEs that may not lead to AUTHENTICATION or ASSOCIATION follow-ups. That is related to trying to keep pre-802.11k devices (legacy) on a suitable AP as well. In that case PROBEs are the initial point to steer a client away, but a bit of history is kept so that if a device keeps coming back it can be allowed in (on the basis that it will have no service at all if not). It's fair to say that this isn't fully working yet, but as mentioned in the documentation you read there are configuration flags that can be set to try it out.

The above then leads into a core discussion of what DAWN should be trying to do that other steering daemons don't. As has been shown with some other approaches / packages it's possible to keep all the 802.11k/r/v aware devices in a network of a few APs on a good connection using quite simple scripts around hostapd. The essentials are tell every device about the full AP set (802.11k neighbor report), ask them to move if AP signal strength goes below a threshold (802.11v transition request, again including NR), and use Fast Transition to make the handover pretty seamless (802.11r). I've added a couple of changes to DAWN to help in that same area - DAWN ought not to be worse for that simpler case.

I then focus on two other areas that other daemons may not be:

  • legacy and / or non-conformant devices, as I have quite a few in the house
  • larger networks of 20-30 APs, such as auto-optimisation of the neighbor reports to include the half-dozen nearest APs

DAWN started when the first of those was a real problem - relatively few devices supported 802.11k, etc. Now more support it, but not necessarily in consistent ways. So a possible answer to your question is that DAWN is evolving to try and be useful in a wide range of cases, but with imperfect understanding of how various devices really behave.

Hi Ian. Thanks for your detailed response.

FYI, I was able to create an account at https://ieeexplore.ieee.org/browse/standards/get-program/page/series?id=68 and then download the 802.11-2020 pdf ( https://ieeexplore.ieee.org/document/9363693 ). There does appear to be notable information on beacon requests in that document (specifically section 11.10.9.1).

As I understand the active/passive beacon request mechanism, the AP is asking the client to tune the radio to the requested channel for the requested duration (in increments of 1024 microseconds), and then return back to its normal channel/AP to report the found beacons.

It seems that DAWN's ubus_send_beacon_report() code is asking the client to tune to the client's current radio channel to make that measurement. But, of course, it won't see any beacons for other APs that are not on that channel. So, it seems the response beacon report won't contain information for other APs (unless the user has multiple APs on the same radio channel - which does not seem like an ideal setup).

I've observed something similar in the handful of devices I've tested. I've found using a duration=100 to be most reliable - in particular for passive requests. However, I suspect a high duration like this may degrade quality for the client (if the radio is listening for beacon reports for 102.4ms, then it would seem likely that it isn't processing normal data and that may introduce jitter to streaming content).

I've also played around with "wildcard queries". However, it seems most devices will only respond with a few (typically 4) beacon reports for any single beacon request. This seems to limit the utility of wildcard queries - it may be simpler overall to make a query for each AP radio than to have to handle overflows.

Thanks. I guess I understand this. FWIW, though, it seems to me this method will have diminishing overall returns. That is, if a client is actively probing all channels then it seems likely it will choose a good channel anyway. It seems to me that the challenge is those devices that don't periodically probe - as then neither the AP nor the client know that a better alternative is available.

Interesting. I'm curious on how you plan to handle this. One of the things I've seen is some older devices will connect to a particular AP radio and then stay on that radio for days; seemingly never probing for alternatives. I didn't have any ideas how to improve this though. I guess one could periodically disconnect the client in hopes it will probe other channels - but that seems a bit extreme - in particular if it is actually connected to the best AP radio for it.

Thanks again,
-Kevin

I'll take a look at that, thanks.

I also set up WireShark on a spare laptop, so seeing what devices really do will probably be as important as what the specs say they should do.

That makes sense, and is a useful clarification. I have WireShark doing 802.11 management frame logging, so will align theory with practice.

I'm already running a local patch that does a round-robin for a report on the other BSSID for the relevant SSID. I think it is highlighting that we need to give the client device time to complete the first request before giving it a 2nd, otherwise one or more is discarded. So looks like that'll take a bit more thought - maybe as simple as ask all clients for report on one AP, wait ~200ms (assuming a 100TU scan duration) and do same for next AP, etc.

For small networks it's simple enough to ask a client how well it can see the other ~3-4 devices (which could be 8 BSSID if using same SSID on both bands). For larger networks it crosses into the area of each AP knowing its nearest neighbors (the set it sends out via a Neighbor Report) so that it can be used for the BEACON requests. Making that list persistent in DAWN would be part of the more complete design that I'd want to put in place.

Assuming that a device might be requested to report on ~12 other BSSID (depending on physical AP distribution and dual-band use) that'd be ~1.5 seconds of time away from the data transmission band. If that was spread over (say) 5 seconds it'd be ~30% "down time" for that period. I've no idea how impactful that might be to real-time data use such as VOIP calls, or whether the device would decline to respond while knowing that type of transmission was underway.
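A quick way to sanity-check that estimate is to compute the pure dwell time from the scan duration in TU (1 TU = 1024 microseconds). The sketch below uses hypothetical function names and deliberately ignores channel switching and the time taken to send the report, so it gives a lower bound: about 1.23 seconds and about 25% for 12 BSSID at 100 TU over 5 seconds, a little under the rounded figures above.

```python
TU_US = 1024  # one 802.11 time unit, in microseconds


def off_channel_seconds(num_bssid, duration_tu):
    """Pure listening time for one scan per BSSID (no switching overhead)."""
    return num_bssid * duration_tu * TU_US / 1_000_000


def off_channel_fraction(num_bssid, duration_tu, window_s):
    """Share of a time window spent away from the serving channel."""
    return off_channel_seconds(num_bssid, duration_tu) / window_s


print(off_channel_seconds(12, 100))       # 12 BSSID, 100 TU each
print(off_channel_fraction(12, 100, 5))   # spread over a 5 second window
print(off_channel_fraction(12, 20, 5))    # same, with 20 TU scans
```

Shorter scan durations shrink the figure proportionally, which is one argument for getting short requests to work reliably.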

I've done some additional investigations and had some high-level ideas that I can share. I haven't implemented any code at this point, so take everything "with a grain of salt".

FWIW, I was thinking that the code could query one AP for each client each update_beacon_reports time. So, if the update time was 1 minute and there are 10 AP radios in a client neighbour map then a full scan would occur over 10 minutes for that client. Also, the current code does a query for all clients on each wakeup event, and I think it may be better to spread that out a bit - as I fear the current system could lead to a "probe storm". So, if there are 5 clients and update_beacon_reports is 1 minute then it may be better to try to handle each client once every ~12 seconds.
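To make the idea concrete, here is a small sketch of such a schedule: one AP radio per client per update period, with the clients offset evenly inside each period. None of this is existing DAWN code, and the names are made up for the example.

```python
def staggered_schedule(clients, ap_radios, update_period_s, cycles=1):
    """Return (time_s, client, ap) tuples.

    Each client is asked about one AP radio per update period, and the
    clients are spread evenly across the period instead of all at once.
    """
    if not clients or not ap_radios:
        return []
    slot = update_period_s / len(clients)
    schedule = []
    for cycle in range(cycles):
        ap = ap_radios[cycle % len(ap_radios)]
        for i, client in enumerate(clients):
            schedule.append((cycle * update_period_s + i * slot, client, ap))
    return schedule


clients = ["c0", "c1", "c2", "c3", "c4"]
aps = ["ap%d" % n for n in range(10)]
plan = staggered_schedule(clients, aps, update_period_s=60, cycles=10)
print(plan[:3])   # first requests are 12 seconds apart
print(plan[-1])   # last request of the full 10 minute scan
```

With 5 clients and a 60 second period the requests land 12 seconds apart, and every client has covered all 10 AP radios after 10 periods.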

A scan over ~10 minutes may sound like a lot of time, but I think that should work okay for "network balancing".

At a high level, it seems to me one could look at DAWN as helping with two "use cases" - "fast roaming" and "network balancing". The former being typically characterized by a phone/tablet moving from one room to another and the latter being characterized by devices that stay on an AP for days/weeks even though better APs are available.

In that thinking, it seems the beacon requests can help quite a bit with "network balancing" even if there are several minutes between queries.

-Kevin

I've done some additional investigations on beacon request types. There are three types - passive, active, and table.

The table beacon request does not seem useful to me - it might even be worth removing support from the DAWN code as I suspect the current code would just be buggy. At a high level the "table" request seems to request the last beacon the client has seen for the requested channel/bssid/ssid. I have few devices which support this query, but the one that I do have (a Nexus 4s phone) will dutifully report the last beacon it had for the channel. However, that beacon was often hours old, or nothing would be reported at all. This doesn't seem like it has much utility.

The "active" beacon request seems the most robust. As I understand it, this is requesting the client to tune to a given channel, send a PROBE message on that channel, wait for the requested duration, and then report the results back to the AP on its normal channel. If I fully specified a channel/bssid/ssid then I can get consistently good results with a duration=20 (or even lower). I suspect, in the short-term, focusing on active beacon requests will provide the best utility.

Finally, the "passive" beacon request seems to be similar to the active beacon request - except no PROBE message is sent. So, a request for the client to tune to the given channel for the given duration, and then report back any found beacon messages to the AP on its normal channel. I found that passive requests were only consistent if a duration=100 is issued. This makes sense as most APs are configured to send a beacon every 100 time units (102.4ms). Thus, with a shorter duration, I'd only get a response if the query happened to overlap with the AP's normal beacon transmission time, or if it happened to overlap with some other device sending a probe message on that channel. Unfortunately, I suspect using a duration=100 is not a good solution for client jitter.
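A back-of-envelope model of that observation: if the listening window starts at a random point relative to the AP's beacon schedule, the chance of it containing a beacon is roughly the window length divided by the beacon interval. The sketch below assumes exactly that (uniformly random start, instantaneous beacons, no probe responses from other devices, no channel switch time), so treat the numbers as indicative only.

```python
def passive_hit_probability(duration_tu, beacon_interval_tu=100):
    """Chance that a passive scan window overlaps one periodic beacon.

    Assumes the window starts at a uniformly random phase of the
    beacon interval; both values are in 802.11 time units (TU).
    """
    return min(1.0, duration_tu / beacon_interval_tu)


for duration in (10, 20, 50, 100):
    print(duration, passive_hit_probability(duration))
```

Under those assumptions a duration=20 request would only catch a beacon about one time in five, which fits with duration=100 being the first value that works consistently.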

Interestingly, I think it may be possible to transition from active beacons to passive beacons if that was desired. Each beacon reports a timestamp and the interval between beacons. So, after a few active reports it should (at least in theory) be possible for software to estimate the AP's next normal beacon and send a passive request for that time. In my quick tests, timing was pretty consistent, so I suspect one could code up a timed passive beacon request system with duration=20 (or less) and get good results. It may also be possible to directly query the radio timestamp and beacon interval from the OpenWRT instance running the given AP radio (I'm not sure if that info is exported anywhere). Unfortunately, the code complexity of this implementation seems a bit high and I'm not sure it has much advantage over active probes. I also noted that hostapd doesn't currently export the full beacon results from a beacon request (so DAWN doesn't currently have access to the radio timestamp nor beacon interval), and I also noticed that there is a complex interaction with hidden SSIDs (a hidden SSID will report the SSID in a beacon sent in response to a PROBE but not in its normal periodic beacon, and some of my devices would not support a beacon request with bssid set but with an empty SSID field).
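The prediction step itself is simple arithmetic once a beacon time and interval are known. The sketch below assumes the observed beacon time has already been mapped onto the same clock as "now" (which is the hard part and is not addressed here) and ignores clock drift; the function name is hypothetical.

```python
TU_US = 1024  # one 802.11 time unit, in microseconds


def next_beacon_us(last_beacon_us, beacon_interval_tu, now_us):
    """Estimate the first beacon time at or after now_us.

    All times are microseconds on one common clock (an assumption).
    """
    interval_us = beacon_interval_tu * TU_US
    elapsed = now_us - last_beacon_us
    if elapsed <= 0:
        return last_beacon_us
    periods = -(-elapsed // interval_us)  # integer ceiling division
    return last_beacon_us + periods * interval_us


# Beacon seen at t=1.000000s, interval 100 TU, current time t=1.250000s.
print(next_beacon_us(1_000_000, 100, 1_250_000))
```

A timed passive request would then be issued so that a short listening window straddles the predicted time, with some margin for the request to reach the client.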

Cheers,
-Kevin

Thanks, a lot of what you say reinforces what I'd already inferred - but practical tests are really useful.

I also have a preference for using PASSIVE scans - I assume it saves battery and reduces network / radio congestion. It would be a bit ironic if DAWN's efforts to get the most up to date view of the network in order to improve it were a cause of overall network degradation!

I think there are some fairly fundamental questions in here about how up to date the knowledge used to steer devices (and hence hearing map) needs to be. For example, if we assume devices are largely static (phone on a sofa, laptop on a desk) then it's reasonable to compare radio quality to a few APs where the data points were taken across ~10-100 seconds period. If we're trying to do the same for people moving around a campus then comparing a good signal they had while next to an AP a minute ago to a reasonable signal reading a few seconds ago to the AP they are now walking towards (and already better than the AP they left behind) will give confusing results.

Perhaps that simplifies if we assume that a typical walking rate between two APs will lead to a ~10dB change in a minute (I'll have to do some calcs on that :) ), and as long as we can do a steer when the "old" signal is in the -65dB to -75dB range all will be well. And that may be handled well enough by the new mode 2 kicking, which simply looks for the current connection crossing that -65dB line rather than worrying about which is the best other AP to steer to.

So DAWN would currently handle two cases:

  • Simple threshold kicking as a device moves around, perhaps too fast for point-point comparison
  • "Long term" optimisation, such as a laptop seeing its current AP at -63dB (so below DAWN's threshold kick and probably its own roam trigger) but also a nearer AP at -52dB. In this case DAWN could use scoring (eg 2 points per dB and a trigger difference of 20 points) to optimise the connection.
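Worked through with the numbers in that second case: an 11dB improvement at 2 points per dB is 22 points, which clears a 20 point trigger. The sketch below only models the signal part of a score, treats "at least 20" as the trigger (an assumption), and is not DAWN's scoring code.

```python
POINTS_PER_DB = 2         # example weighting from the post
TRIGGER_DIFFERENCE = 20   # example trigger from the post


def score_gain(current_signal_db, candidate_signal_db):
    """Score improvement from moving, counting signal strength only."""
    return (candidate_signal_db - current_signal_db) * POINTS_PER_DB


def should_steer(current_signal_db, candidate_signal_db):
    # Assumption: a gain equal to the trigger is enough to steer.
    return score_gain(current_signal_db, candidate_signal_db) >= TRIGGER_DIFFERENCE


print(score_gain(-63, -52), should_steer(-63, -52))  # laptop example
print(score_gain(-63, -55), should_steer(-63, -55))  # smaller improvement
```

An 8dB improvement (16 points) would stay put, which gives some hysteresis against bouncing between two similar APs.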

That's probably quite a nice balance, and could drive a reasonably simple optimisation of the current design.

I agree.

FWIW, I think an interesting target is streaming devices (eg, roku, fire stick, chromecast, ip cameras). These devices don't typically move at all, but can be heavy bandwidth users. If they get out of balance they can notably congest the network. But, it is fine to rebalance them over a period of 10-15 minutes. One way to distinguish these devices between more mobile devices would be to keep the last few probe/beacon reports for each client/AP pair. So, if there is a better AP and it has been the better AP for the last 3 probe cycles then it's likely a good choice to move the client.

Just throwing out some ideas.. One possibility would be to check for rapid changes in signal strength. So, if a client has -70dB and always had -70dB then kicking may not be a good idea. However, if the client used to have -40dB and over the last 30 seconds has started consistently getting -60dB then maybe beacon requests should be prioritized (if applicable) or even active kicking. The code complexity might not be worth it, but I figured I'd raise it as an idea.

Cheers,
-Kevin

At the moment a device has to have a better AP for three (by default) consecutive checks before it is kicked. It doesn't have to be the same AP each time though. As a part of the above pondering I was thinking that checking it was the same one could be added. That wouldn't be much different to now for stationary devices - it's almost certainly the same AP each time. What it would do is prevent steering a moving device from AP1 to AP4 just because it happened to be the most recent best, even though it had been AP2 and AP3 on the previous checks. But I'm not sure the extra complexity is really justified for those dynamic cases. If it eventually stopped moving near AP5 we should then spot that and steer it, even if the steer to AP4 had been ultimately unnecessary.
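The difference between the two policies can be sketched as a small per-client counter. This is an illustration with invented names, not how DAWN stores its kick count: with same_ap=False any better AP advances the count, while same_ap=True restarts the count whenever the best AP changes.

```python
class BetterApTracker:
    """Count consecutive checks on which a better AP was seen."""

    def __init__(self, required=3, same_ap=False):
        self.required = required
        self.same_ap = same_ap
        self.count = 0
        self.last_ap = None

    def update(self, better_ap):
        """Record one check; return the AP to steer to, or None."""
        if better_ap is None:
            self.count, self.last_ap = 0, None
            return None
        changed = self.last_ap is not None and better_ap != self.last_ap
        self.count = 1 if (self.same_ap and changed) else self.count + 1
        self.last_ap = better_ap
        if self.count >= self.required:
            self.count, self.last_ap = 0, None
            return better_ap
        return None


moving = ["AP2", "AP3", "AP4"]
any_ap = BetterApTracker(same_ap=False)
strict = BetterApTracker(same_ap=True)
print([any_ap.update(ap) for ap in moving])  # steers to AP4 on the third check
print([strict.update(ap) for ap in moving])  # never reaches three in a row
```

For a stationary client both variants behave the same, since the better AP is the same on every check.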

Maybe, although the FireTV Sticks I have don't seem to show any 802.11k/r/v capability so we'd be reliant on classical PROBEs to discover the situation, and a hard disassociation to make them move.

An ability to auto classify devices as static or dynamic could be useful in large networks. If about 50% of devices in a space didn't move throughout the day (laptops, POS displays, wireless printers, etc) while the remainder did (mobile phones, shop floor scanners, etc) then it would be useful to segment the management "style" for each. The trick would be to be able to flag a device as mobile again soon enough, such as within a minute of someone getting up to go to a quiet zone to make a call on their laptop. My initial thought is that the absolute dB threshold "quick kick" plus "static during call" optimisation will again give a decent outcome.

I chose similar but different. A DAWN instance now asks its local clients how well they can see a specific other AP on a round-robin basis driven by the target refresh period and number of BSSID in the overall network.

For example, with 5 dual-band AP devices (so 10 BSSID) and a target update_beacon_reports of 20 seconds the interval will be 2 seconds. Every 2 seconds each locally connected client will be asked for a BEACON report on the next BSSID, as long as it is offering the same SSID that the client device is already connected to. That means any given client device receives a request to perform a single passive (by default) BEACON every 2 seconds.
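The timing described here can be sketched as follows. The function and the sample network are invented for illustration and are not taken from the DAWN source; one tick is reserved per BSSID in the network, and a request is only issued on ticks whose BSSID offers the client's SSID.

```python
def round_robin_plan(bssids, client_ssid, update_beacon_reports_s):
    """Return (time_s, bssid) requests for one full refresh period.

    bssids is a list of (bssid, ssid) pairs for the whole network.
    """
    if not bssids:
        return []
    interval = update_beacon_reports_s / len(bssids)
    return [(i * interval, bssid)
            for i, (bssid, ssid) in enumerate(bssids)
            if ssid == client_ssid]


# 5 dual-band APs = 10 BSSID; the last one carries a hypothetical guest SSID.
network = [("bssid%d" % n, "MySSID") for n in range(9)] + [("bssid9", "Guest")]
for when, bssid in round_robin_plan(network, "MySSID", 20):
    print(when, bssid)
```

With 10 BSSID and a 20 second target the ticks are 2 seconds apart, matching the example above, and the guest BSSID's tick simply passes without a request.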

It seems to work pretty well, although again seems to highlight that various devices including my ~12 month old Samsung phone are reluctant to do BEACONs.

And there is still the underlying snag that RSSI from PROBEs and local measurement is used for steering, but if we're starting to get more reliable RCPI in BEACON reports then it could be an alternative for enabled devices.

Interesting. Is this something you've coded up? (I didn't see an update to https://github.com/Ian-Clowes/DAWN .)

FWIW, one thing to be aware of is "ping storms" - for example, it may be problematic if 10 clients all send a PROBE to a given AP radio at roughly the same time. Passive probes wouldn't have that issue, but I suspect passive probes won't work well unless one can implement some kind of request timing for them (as mentioned above).

Cheers,
-Kevin

I'll try and get it pushed ASAP, just like I have for the last few weeks ...

The mechanism should be quite "lightweight" for any AP or client device I think. It's using passive mode (although that can be changed with the RRM config item), and since each client device is only asked to make a scan by the AP it is currently connected to there is no exponential effect of every AP asking every client device.

PROBEs are a whole other problem. If legacy devices could be encouraged to send them it would make handling them easier, but of course that's why 802.11k came along...