Dawn: a decentralized wireless controller

I'm enabling the features here:

That is a very interesting finding. Actually bss_tm_req should kick it because we set some timer?

But I just realized sometimes I use del_client_interface. I need further investigation into it.

If you want to help me, feel free to open a wiki article about your findings and config options; I am happy to read it and correct anything that is wrong.

Currently, I lack a good testbed, and I have more important coding tasks to do in my limited free time. :confused: If you want to start coding for DAWN, I am happy to help. Or please open issues on GitHub, and then I can rewrite DAWN code and add the extra debug output you need. My takeaway is that I should write a log that shows the steering decisions?

Can you create a ring buffer that has the last ~50 decisions that can be read from /proc ?

Sure. Can you point me to some example code I can use as a reference?

Are you in kernel space or userspace only? I think /proc is kernel space.
Kernel space example

Userspace only.
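For a userspace-only daemon, a ring buffer of the last ~50 decisions could look roughly like the sketch below. All names here (`decision_log`, `decision_log_add`, and so on) are made up for illustration and are not part of DAWN; how the buffer would be exposed to the user (it cannot live in /proc from userspace) is left open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DECISION_LOG_CAPACITY 50   /* "last ~50 decisions" from the request above */
#define DECISION_TEXT_LEN 128      /* assumed maximum length of one log line */

/* Hypothetical fixed-size ring buffer; zero-initialise before first use. */
struct decision_log {
    char entries[DECISION_LOG_CAPACITY][DECISION_TEXT_LEN];
    size_t next;   /* slot the next entry will be written to */
    size_t count;  /* number of valid entries, never above the capacity */
};

/* Store one decision, overwriting the oldest entry once the buffer is full. */
void decision_log_add(struct decision_log *log, const char *text)
{
    assert(log != NULL && text != NULL);
    snprintf(log->entries[log->next], DECISION_TEXT_LEN, "%s", text);
    log->next = (log->next + 1) % DECISION_LOG_CAPACITY;
    if (log->count < DECISION_LOG_CAPACITY)
        log->count++;
}

/* Return the i-th oldest entry (0 = oldest), or NULL if i is out of range. */
const char *decision_log_get(const struct decision_log *log, size_t i)
{
    if (i >= log->count)
        return NULL;
    size_t start = (log->next + DECISION_LOG_CAPACITY - log->count)
                   % DECISION_LOG_CAPACITY;
    return log->entries[(start + i) % DECISION_LOG_CAPACITY];
}

/* Write all stored decisions, oldest first, one per line. */
void decision_log_dump(const struct decision_log *log, FILE *out)
{
    for (size_t i = 0; i < log->count; i++)
        fprintf(out, "%s\n", decision_log_get(log, i));
}
```

A caller would declare `struct decision_log log = {0};`, call `decision_log_add()` at each steering decision, and call `decision_log_dump(&log, stdout)` when asked for the history. The fixed-size array avoids any allocation in the decision path.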

To be honest I was gonna give up on Dawn - as it wasn't really working in my network environment, but....

... after I actually went back from snapshot to release build 21.02.1 without dawn, by accident I bumped into this old comment from you: How does rrm work? - #36 by PolynomialDivision

I then went ahead and tried it without any expectations, but very much to my surprise, both my iPhone (an old 6s+) and my Android (Xiaomi Mi11) could be steered at will to the AP of my choice! All I did was

ubus call hostapd.wlan0-1 wnm_disassoc_imminent '{"addr":"xx:xx:xx:xx:xx:xx", "duration": 120, "neighbors":"$sam"}'

where $sam is

sam=$(ssh root@<ip of another AP> ubus call hostapd.wlan0-1 rrm_nr_get_own)

... works like magic. So now I have high hopes for Dawn again all of a sudden. But too bad: the last time I did any sort of programming was something like 20 years ago, so it will be difficult for me to determine from your source code whether kicking means 'wnm_disassoc_imminent' is issued (nor will I be able to start contributing to your GitHub project, even though I am willing to).

I will soon set up Dawn once again in snapshot Openwrt, and try to determine whether wnm_disassoc_imminent was somehow never triggered in Dawn, or something went wrong during wnm_disassoc_imminent causing the transition to have never taken place for my phones. At least now I know what to look for in logread: any appearance of BSS-TM-RESP xx:xx:xx:xx:xx:xx status_code=0 bss_termination_delay=0 target_bssid=yy:yy:yy:yy:yy:yy as I can see these logs all over my manual tests.
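To spot those lines automatically rather than by eye, a small parser along the following lines could be used. It is only a sketch: it assumes the exact BSS-TM-RESP layout quoted above (including a trailing target_bssid field, which may not be present in every response), and the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one BSS-TM-RESP log line, as quoted in the post above. */
struct bss_tm_resp {
    char sta[18];               /* client MAC, "xx:xx:xx:xx:xx:xx" */
    int status_code;            /* 0 in the manual tests described above */
    int bss_termination_delay;
    char target_bssid[18];      /* AP the client says it is moving to */
};

/* Returns 1 and fills *out if the line holds a BSS-TM-RESP event in the
 * assumed format, 0 otherwise (including lines with missing fields). */
int parse_bss_tm_resp(const char *line, struct bss_tm_resp *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "BSS-TM-RESP ");
    if (p == NULL)
        return 0;
    int n = sscanf(p,
                   "BSS-TM-RESP %17s status_code=%d "
                   "bss_termination_delay=%d target_bssid=%17s",
                   out->sta, &out->status_code,
                   &out->bss_termination_delay, out->target_bssid);
    return n == 4;
}
```

Feeding each logread line through `parse_bss_tm_resp()` and printing the matches would show which clients answered a transition request and which AP they chose.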

If Dawn ends up working well in my environment, I will be more than happy to contribute to your future wiki (or your README.md in github) :+1:

(so far for Dawn I can see a scoring system that seems to calculate the scores well, plus I know hostapd's wnm_disassoc_imminent is used to steer/kick devices around. What seems to be missing for me is a clear sign of linkage between these two aspects of Dawn. But I am happy there is progress in my testing)

With the latest snapshot build of OpenWrt + Dawn on all of my routers (just two in total, to be exact), plus the knowledge I have accumulated over the past two weeks or so, particularly about hostapd's ubus wnm_disassoc_imminent method, I can now consistently reproduce an issue with Dawn. I thought of opening an issue on GitHub, but this is such a basic core feature of Dawn that it really shouldn't be failing, so I am writing here first for @PolynomialDivision to see if I have overlooked anything before filing a bug report.

Here is the crux of the issue: I cannot get Dawn to send out any wnm_disassoc_imminent command whatsoever.

  1. I have option kicking set to '1' on both of my routers.

  2. I can see the other AP with ubus call hostapd.wlan0-1 rrm_nr_list, so the neighbor list has been appropriately populated. I can also see both routers on "Network Overview" in the LuCI interface, with devices connected to each.

  3. I make sure my device (will expose the MAC addresses for the sake of clarity, fingers crossed no actual damage will be done by exposing MAC addresses) 9C:BC:F0:A1:10:B7 responds to roaming instructions by manually constructing and sending wnm_disassoc_imminent requests to the device. Sure enough each time it responds by switching to the other AP, plus I see a line like hostapd: wlan0-1: BSS-TM-RESP 9c:bc:f0:a1:10:b7 status_code=0 bss_termination_delay=0 target_bssid=10:6f:3f:3d:67:64 in syslog.

  4. I then make sure the device 9C:BC:F0:A1:10:B7 is connected to router 10:6F:3F:3D:6E:8D, and move the device to a place where 10:6F:3F:3D:6E:8D's signal is weaker than 10:6F:3F:3D:67:64.

  5. Check LuCI's Dawn Hearing Map, and it shows (appropriately) that 10:6F:3F:3D:6E:8D has a score 20-plus points lower than that of 10:6F:3F:3D:67:64.

  6. Waited until I ran out of patience. Device is still connected to 10:6F:3F:3D:6E:8D and shows no sign of moving to 10:6F:3F:3D:67:64.

  7. Checked syslog. I do not see any BSS-TM-RESP message other than the ones that I have triggered manually as in step 3 above. To me this means Dawn is not generating any wnm_disassoc_imminent as it should, not only for 9C:BC:F0:A1:10:B7 but also for any of the other devices connected to my network that are eligible for switching according to the Hearing Map.

Like I said, I feel this is such a trivial function of Dawn to be able to issue wnm_disassoc_imminent messages, I am really surprised to find out zero BSS-TM-RESP syslog messages are appearing in either of my two routers. I am hesitant to file a bug at this stage... could very well be something that I have overlooked?

(note: I see lots of other Dawn-related hostapd logs like BEACON-REQ-TX-STATUS and BEACON-RESP-RX. I stop Dawn and these logs stop getting generated. So I think Dawn is functional)

Thanks for all your great documentation and bug reports. I will answer in more detail later. To check whether DAWN is sending out wnm_disassoc_imminent, you can use "ubus monitor" to see if DAWN actually sends requests to hostapd over the bus.

And we already have debug messages. If you enable stdout in dawn-procd file you should also see those messages:

Btw, after reading the discussion here, I have also enabled "kicking" with DAWN and changed min RSSI to 70dB. Still, some clients hang on to the original AP even when a stronger AP exists, while they report between 75 and 79dB...

config network
        option broadcast_ip '10.0.0.255'
        option broadcast_port '1025'
        option tcp_port '1026'
        option network_option '2'
        option shared_key 'Niiiiiiiiiiiiick'
        option iv 'Niiiiiiiiiiiiick'
        option use_symm_enc '0'
        option collision_domain '-1'
        option bandwidth '-1'

config ordering
        option sort_order 'cbfs'

config hostapd
        option hostapd_dir '/var/run/hostapd'

config times
        option update_client '10'
        option denied_req_threshold '30'
        option remove_client '15'
        option remove_probe '30'
        option remove_ap '460'
        option update_hostapd '10'
        option update_tcp_con '10'
        option update_chan_util '5'
        option update_beacon_reports '600'

config metric 'global'
        option rssi_weight '0'
        option rssi_center '0'
        option initial_score '0'
        option kicking_threshold '20'
        option duration '600'
        option rrm_mode 'apt'
        option ap_weight '0'
        option ht_support '0'
        option vht_support '0'
        option no_ht_support '0'
        option no_vht_support '0'
        option rssi '0'
        option low_rssi '0'
        option freq '0'
        option chan_util '0'
        option max_chan_util '0'
        option rssi_val '-60'
        option low_rssi_val '-75'
        option chan_util_val '0'
        option max_chan_util_val '0'
        option min_probe_count '0'
        option bandwidth_threshold '6'
        option use_station_count '0'
        option max_station_diff '0'
        option eval_probe_req '1'
        option eval_auth_req '1'
        option eval_assoc_req '1'
        option deny_auth_reason '1'
        option deny_assoc_reason '17'
        option use_driver_recog '1'
        option chan_util_avg_period '3'
        option set_hostapd_nr '1'
        option kicking '1'

config metric '802_11a'
        option rssi_weight '2'
        option rssi_center '-70'
        option initial_score '125'

config metric '802_11g'
        option rssi_weight '2'
        option rssi_center '-70'
        option initial_score '100'

Any idea why it's not kicked and doesn't roam to another AP? When I disconnect it manually, it immediately connects to the other AP (on the hearing map I see both APs); with debug enabled, however, I don't see roaming happening.
Shall I further decrease option low_rssi_val or adjust another parameter? Thank you.

Yet another observation that I'd like to bring to your attention.

(Dawn has no direct control over this, but...)

I have discovered that wnm_disassoc_imminent actually makes my Android phone's wifi performance much worse after a transition induced by it (bandwidth reduced to less than 30% of usual). My Apple "burner phone" doesn't seem to be impacted, however. The problem is reproducible every time.

I then tried the other methods exposed by hostapd, namely del_client and even the newest bss_transition_request. Of the three, only del_client allows my phone to retain its normal bandwidth after connecting to another AP.

(bss_transition_request downright doesn't work. After several transitions it totally breaks down. Clients disconnect before reconnecting to another AP).

I realize using del_client would mean sacrificing fast transition 802.11r (observed from my testing). But at least my phone still doesn't lose connection altogether during roaming, and I don't have to deal with reduced bandwidth.

So for now I won't be experimenting with Dawn anymore unfortunately. Will definitely keep an eye on further developments plus possibly improvements on hostapd so it will be compatible with (my model of) Android phones. I truly think Dawn has huge potential, when it becomes 100% stable and functional.

(for those interested, this tool does make use of del_client: Wi-Fi roaming recipe)

Edit 7 days later - I have since discovered that my Android phone does not play well with OpenWrt's 802.11r. So I disabled it, and wnm_disassoc_imminent no longer has an impact on my phone's data throughput. So the culprit seems to be 802.11r and not wnm_disassoc_imminent. Also, I collaborated with the author of wifi-disconnect-low-signal (i.e. the tool I mentioned that's in this link: Wi-Fi roaming recipe), so now it makes use of wnm_disassoc_imminent instead of del_client. With it my wifi roaming solution is finally complete, at least for now. Of course it is primitive compared with Dawn, but until Dawn can actually "kick" clients to other APs, I will stick with my current solution.

I have noticed that there is no Settings menu item in the DAWN menu in LuCI in the latest snapshot release. Is that normal? Settings for DAWN were previously present in LuCI.

The LuCI app is to be abandoned.

I'll try once more, even though nobody answered my first question. I have two APs with identical configs (hardware-wise they are a Ubiquiti UniFi AC Lite and a Ubiquiti UniFi AC Pro), and on the AC Pro variant I'm getting the following error in the log quite often.

Tue Dec  7 20:51:46 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:54:11 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:54:11 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:54:11 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:55:37 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:55:37 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:55:44 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:55:44 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:56:19 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:57:32 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:57:32 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 20:59:14 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!
Tue Dec  7 21:00:53 2021 daemon.err dawn[15643]: Neighbor-Report is NULL!

Any idea what's wrong, please?

Just wanted to put this here in case others hit the same issue. It seems that the root cause of Dawn not working out of the box (for some, including myself), and throwing the error below:

daemon.err dawn[1111]: Failed to look up test object for umdns

was identified in this ticket as a missing seccomp dependency in the builds.

The issue would be that Dawn would not be aware of the other Access Points also running Dawn (empty list, etc). Prior to this bit of info, I had the following command in my cron which worked around the issue:

*/5 * * * * test -f /etc/seccomp/umdns.json && rm /etc/seccomp/umdns.json; sync; /etc/init.d/umdns restart

Once I installed procd-seccomp, removed the above cron command, and restarted the Access Points, each one could see the others as soon as it came up.

@PolynomialDivision Until upstream fixes things, the more user-friendly, defensive out-of-the-box solution may be (as noted in the ticket) for Dawn to add procd-seccomp as a dependency and/or to check at runtime whether it is installed.

Thanks for the heads up. This fixed my issue.

@Edrikk Are you testing on master, or openwrt 21.02?

Okay, this should be already fixed by:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=520403cd4978fd2e3cca389e5009ca5c0ac26db9

or in 21.02:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=520403cd4978fd2e3cca389e5009ca5c0ac26db9

Please use Openwrt-Snapshot-21.02 or master builds and test again:
https://downloads.openwrt.org/releases/21.02-SNAPSHOT/targets/

Awesome will do.

I’m new to OpenWRT (main router is Tomato) so still on stable builds 21.02 while learning.

And thank you!

Edit: I just noticed that patch was from Nov 2020. Which I believe would have already been in the first release of 21.02. Am I misunderstanding? Because if that’s the case, the patch either didn’t capture all scenarios, or the DAWN specific dependencies patch may still be needed.

I was doing some pre-push testing of a revised logging approach, and seem to have stumbled across a potentially significant bug in the main tree.

I'm not really sure how kicking clients to better APs can have been working for anyone with this error - so I'm posting this partly to see if I'm misunderstanding something :slight_smile:. In my testing this week as I wander around the house I was always seeing a "No active transmission data for client. Don't kick!" message for my phone. Thinking that this was wrong I tracked down the apparent error and once changed I do now get sensible kicking.

It's caused by a change I made in June 2020: https://github.com/berlin-open-wireless-lab/DAWN/commit/67c3ed0d0aba4f9a55da32ded854d223e8e06e74.

In kick_clients() there was a test:

 float rx_rate, tx_rate;
 if (get_bandwidth_iwinfo(client_array[j].client_addr, &rx_rate, &tx_rate)) {
    // only use rx_rate for indicating if transmission is going on

which I changed to this, to add a bit more tracing info:

float rx_rate, tx_rate;
if (get_bandwidth_iwinfo(client_array[j].client_addr, &rx_rate, &tx_rate)) {
    printf("No active transmission data for client. Don't kick!\n");
}
else
{
    // only use rx_rate for indicating if transmission is going on

The mistake is that I didn't add a ! operator to keep the if / else sense correct. Although the code here has changed more since July 2020 it seems that the inverted logic that I added has been preserved.

Inverting the test in the current code made it work for me, changing:

bool have_bandwidth_iwinfo =
    !(get_bandwidth_iwinfo(j->client_addr, &rx_rate, &tx_rate));

to

bool have_bandwidth_iwinfo =
    get_bandwidth_iwinfo(j->client_addr, &rx_rate, &tx_rate);

The "natural reading" of this also makes more sense, since get_bandwidth_iwinfo() returns 1 if it gets the data.
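The effect of the inverted sense can be reproduced in isolation. The sketch below uses a made-up stand-in, `fake_get_bandwidth()`, that follows the return convention described above (1 when data was obtained); it is not DAWN's function, and the two wrappers only mirror the shape of the buggy and corrected tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for get_bandwidth_iwinfo(): returns 1 and fills the
 * rates when data for the client is available, 0 otherwise. */
int fake_get_bandwidth(const char *client, float *rx_rate, float *tx_rate)
{
    assert(client != NULL && rx_rate != NULL && tx_rate != NULL);
    if (strcmp(client, "known") != 0)
        return 0;
    *rx_rate = 1.5f;
    *tx_rate = 0.5f;
    return 1;
}

/* Buggy shape: the success branch is the one that reports "no data". */
bool has_data_inverted(const char *client)
{
    float rx_rate = 0.0f, tx_rate = 0.0f;
    if (fake_get_bandwidth(client, &rx_rate, &tx_rate)) {
        /* "No active transmission data for client. Don't kick!" */
        return false;
    }
    return true;
}

/* Corrected shape: the flag simply mirrors the return value. */
bool has_data_fixed(const char *client)
{
    float rx_rate = 0.0f, tx_rate = 0.0f;
    bool have_bandwidth_iwinfo =
        fake_get_bandwidth(client, &rx_rate, &tx_rate);
    return have_bandwidth_iwinfo;
}
```

With the inverted version, a client whose rates were read successfully is the one treated as having no data, which matches the "Don't kick!" message appearing for a phone that was clearly active.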

It's a part of the big PR I'm preparing for the logging approach, but wanted to mention it separately.

These messages are a side effect of turning on evaluation of probe, association and authentication requests via the three eval_... options set to '1' in your config. They are harmless, and just reflect a specific code path that is perhaps a little confused :slight_smile:.

I've been working on some revisions to logging, and these will be eliminated by that if it makes it into the core code.
