Dawn: a decentralized wireless controller

Just wanted to put this here in case others hit the same issue. It seems the root cause of Dawn not working out of the box for some users (myself included), and throwing the error below:

daemon.err dawn[1111]: Failed to look up test object for umdns

was identified in this ticket as a missing seccomp dependency in the builds.

The symptom is that Dawn is not aware of the other access points also running Dawn (empty list, etc.). Before I found this, I had the following command in my crontab to work around the issue:

*/5 * * * * test -f /etc/seccomp/umdns.json && rm /etc/seccomp/umdns.json; sync; /etc/init.d/umdns restart

Once I installed procd-seccomp, removed the above cron entry, and restarted the access points, each one could see the others as soon as it came up.

@PolynomialDivision until upstream fixes things, a more user-friendly, defensive solution out of the box (as noted in the ticket) may be for Dawn to add procd-seccomp as a dependency and/or check at runtime whether it is installed.
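A runtime check along those lines could be as simple as testing for a file that the package installs and warning at startup. This is only a sketch: `dawn_dependency_present()` and `dawn_check_seccomp()` are hypothetical helpers, not part of Dawn, and the marker path is an assumption. I haven't checked which file procd-seccomp actually ships, so the caller has to supply one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: true if the given file exists.
 * Which file reliably indicates procd-seccomp is installed is an
 * assumption left to the caller. */
static bool dawn_dependency_present(const char *marker_path)
{
    return access(marker_path, F_OK) == 0;
}

/* Hypothetical startup check: warn early, rather than only seeing the
 * "Failed to look up test object for umdns" error later. */
static bool dawn_check_seccomp(const char *marker_path)
{
    if (!dawn_dependency_present(marker_path)) {
        fprintf(stderr,
                "dawn: %s not found; is procd-seccomp installed?\n",
                marker_path);
        return false;
    }
    return true;
}
```

Adding procd-seccomp as a package dependency would make this check unnecessary; the runtime test is only a fallback for images built without it.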


Thanks for the heads up. This fixed my issue.

@Edrikk Are you testing on master, or OpenWrt 21.02?

Okay, this should already be fixed by:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=520403cd4978fd2e3cca389e5009ca5c0ac26db9

or in 21.02:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=520403cd4978fd2e3cca389e5009ca5c0ac26db9

Please use the OpenWrt 21.02 snapshot or master builds and test again:
https://downloads.openwrt.org/releases/21.02-SNAPSHOT/targets/

Awesome, will do.

I'm new to OpenWrt (my main router runs Tomato), so I'm still on the stable 21.02 builds while learning.

And thank you!

Edit: I just noticed that the patch is from Nov 2020, which I believe means it was already in the first 21.02 release. Am I misunderstanding? If not, either the patch didn't cover all scenarios, or a DAWN-specific dependency patch may still be needed.

I was doing some pre-push testing of a revised logging approach, and seem to have stumbled across a potentially significant bug in the main tree.

I'm not sure how kicking clients to better APs could have been working for anyone with this error, so I'm posting this partly to check whether I'm misunderstanding something :slight_smile:. In my testing this week, as I wandered around the house, I always saw a "No active transmission data for client. Don't kick!" message for my phone. Thinking that this was wrong, I tracked down the apparent error, and now that it's changed I do get sensible kicking.

It's caused by a change I made in June 2020: https://github.com/berlin-open-wireless-lab/DAWN/commit/67c3ed0d0aba4f9a55da32ded854d223e8e06e74.

In kick_clients() there was a test:

float rx_rate, tx_rate;
if (get_bandwidth_iwinfo(client_array[j].client_addr, &rx_rate, &tx_rate)) {
    // only use rx_rate for indicating if transmission is going on
    ...
}

which I changed to this, to add a bit more tracing info:

float rx_rate, tx_rate;
if (get_bandwidth_iwinfo(client_array[j].client_addr, &rx_rate, &tx_rate)) {
    printf("No active transmission data for client. Don't kick!\n");
}
else
{
    // only use rx_rate for indicating if transmission is going on
    ...
}

The mistake is that I didn't add a ! operator to keep the sense of the if / else correct. Although the code here has changed further since July 2020, the inverted logic I introduced seems to have been preserved.

Removing the negation from the current code made it work for me, changing:

bool have_bandwidth_iwinfo =
    !(get_bandwidth_iwinfo(j->client_addr, &rx_rate, &tx_rate));

to

bool have_bandwidth_iwinfo =
    get_bandwidth_iwinfo(j->client_addr, &rx_rate, &tx_rate);

The "natural reading" of this also makes more sense, since get_bandwidth_iwinfo() returns 1 if it gets the data.
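To make the inverted sense concrete, here is a small sketch comparing the two forms. `stub_get_bandwidth_iwinfo()` is a hypothetical stand-in for the real function: it only mimics the "returns 1 if it gets the data" behaviour, and the address type and fixed rates are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stub: returns 1 and fills the rates when data is
 * available, 0 otherwise (here: when no address is given). */
static int stub_get_bandwidth_iwinfo(const unsigned char *client_addr,
                                     float *rx_rate, float *tx_rate)
{
    if (client_addr == NULL)
        return 0;               /* no data for this client */
    *rx_rate = 6.0f;
    *tx_rate = 6.0f;
    return 1;                   /* data obtained */
}

/* Buggy form: the extra ! makes "have data" false exactly when data exists. */
static bool have_bandwidth_buggy(const unsigned char *addr, float *rx, float *tx)
{
    return !(stub_get_bandwidth_iwinfo(addr, rx, tx));
}

/* Fixed form: the return value already means "have data". */
static bool have_bandwidth_fixed(const unsigned char *addr, float *rx, float *tx)
{
    return stub_get_bandwidth_iwinfo(addr, rx, tx);
}
```

With the buggy form, a client whose rates were successfully read is treated as having no data, so every evaluation lands in the "No active transmission data for client. Don't kick!" branch.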

It's part of the big PR I'm preparing for the logging changes, but I wanted to mention it separately.


These messages are a side effect of turning on evaluation of probe, association and authentication requests via the three eval_... options set to '1' in your config. They are harmless, and just reflect a specific code path that is perhaps a little confused :slight_smile:.

I've been working on some revisions to logging, and those messages will go away if that work makes it into the core code.


Thank you. I was just wondering why it happens on only one of my APs and not on the second one, which has exactly the same DAWN config...
Anyway, I'm looking forward to the new version. I would really like to have DAWN working, but in my case it seems to do nothing so far, even with the default config: I haven't seen low-RSSI clients being kicked, or any active roaming in the detailed logs, etc.

A setup guide for a noob like me would be awesome.

The bug I mentioned just above may be why you don't see kicking happening. If you turn on stdout logging in /etc/init.d/dawn and see the "No active transmission" message, then I'd say it is.

To work around it for now, you can try setting option bandwidth_threshold '0' in /etc/config/dawn. It will avoid that specific bug. You might hit something else :frowning:, but hopefully not.


Heyho, @IanC published a lot of changes. :slight_smile: Thanks a lot.

I would be happy if multiple people could test and review:

So far it's looking fine...

[screenshot]

I even saw some kick attempts, but they were denied due to ongoing data transfer.

Btw, I'm using a config with evaluations disabled (not sure whether that still makes a difference).


Hopefully you will see some devices being kicked as well. If not, there may still be a code path that stops kicking when it shouldn't.

I think this is fine. I see these as "advanced" steering, to try to stop a client from connecting if it could be using a different AP. Allowing it to connect (by disabling evaluation of those requests) and then letting it be kicked should be OK. From a pragmatic point of view, making sure the new code base works for kicking, and then checking other features in more detail, works for me.

EDIT: But it should be equally fine to use those if you want to :slight_smile:


Thank you, I will keep checking. So far I see the following, which I don't understand:
kickcount 1 below threshold of 3!
Any idea what it means? Is it due to the settings, or has it just not reached a certain level yet...

I also see this quite often, and often with the same value, which is suspicious...

Client is probably in active transmisison. Don't kick! RxRate is: 6.000000

Can you try increasing this value?


Both of those are normal.

Rather than kick a client the first time it dips below the target, DAWN waits for 3 consecutive times (or as configured) to make sure it really has moved away. So you should see 1, 2 and then a kick. Or 1, (possibly 2), then "best AP", which resets the count and does not disrupt the session.
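The consecutive-count behaviour described above can be sketched as below. This is a model of the described behaviour, not DAWN's code: `struct kick_state` and `kick_count_update()` are hypothetical names, the threshold argument stands in for the configured value (3 by default, as above), and resetting the count after a kick is my assumption.

```c
#include <stdbool.h>

/* Hypothetical per-client state; DAWN keeps its own equivalent. */
struct kick_state {
    int kick_count;
};

/* Called once per evaluation round; returns true when the client should
 * be kicked. The count rises while a better AP exists and resets once
 * the current AP is the best one again. */
static bool kick_count_update(struct kick_state *s, bool better_ap_exists,
                              int threshold)
{
    if (!better_ap_exists) {
        s->kick_count = 0;      /* "best AP": reset, session untouched */
        return false;
    }
    s->kick_count++;
    if (s->kick_count < threshold)
        return false;           /* e.g. "kickcount 1 below threshold of 3!" */
    s->kick_count = 0;          /* assumption: start counting afresh after a kick */
    return true;
}
```

With a threshold of 3, two rounds below target only log the count, the third triggers the kick, and any round in which the current AP is best puts the count back to zero.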

RxRate is a guide to whether a device would be kicked in the middle of downloading data. If the device is not busy, that rate can drop to ~1.0. If you prefer strong connections, at the cost of some interruption as you wander around watching Netflix, then set bandwidth_threshold higher.
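Here is a rough sketch of how that rate check seems to behave, reconstructed only from the log messages quoted in this thread ("Client is probably in active transmission. Don't kick!", "No active transmission data for client. Don't kick!" and "Kicking as no active transmission data for client, but bandwidth_threshold=0 is OK."). `check_rate()` and `enum rate_verdict` are hypothetical, and the exact comparison DAWN uses (> versus >=) is an assumption.

```c
#include <stdbool.h>

enum rate_verdict {
    RATE_OK_TO_KICK,        /* client looks idle enough to kick */
    RATE_BUSY_DONT_KICK,    /* "Client is probably in active transmission" */
    RATE_NO_DATA_DONT_KICK  /* "No active transmission data for client" */
};

/* Hypothetical reconstruction of the bandwidth_threshold gate. */
static enum rate_verdict check_rate(bool have_rate_data, float rx_rate,
                                    int bandwidth_threshold)
{
    if (!have_rate_data) {
        /* With no rate data, only a zero threshold lets the kick go ahead. */
        return bandwidth_threshold == 0 ? RATE_OK_TO_KICK
                                        : RATE_NO_DATA_DONT_KICK;
    }
    /* A client receiving above the threshold is treated as busy. */
    return rx_rate > (float)bandwidth_threshold ? RATE_BUSY_DONT_KICK
                                                : RATE_OK_TO_KICK;
}
```

Under this reading, a threshold of 0 blocks kicking for any client reporting a non-zero rx rate (such as the 6.0 seen above), while raising the threshold above the typical idle rate lets idle clients be kicked.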


Actually I do have it set to 0 already.

 option bandwidth_threshold '0'

...and I even got a kick confirmation, which shows it picks up the current value... so maybe there is indeed a glitch:

Thu Jan  6 22:29:23 2022 daemon.info dawn[3429]: Station xx:xx:xx:43:DF:FD: Kicking as no active transmission data for client, but bandwidth_threshold=0 is OK.

That is clear to me... I was just wondering: isn't it an interesting coincidence that it's almost always "6.000"?

config network
        option broadcast_ip '10.0.0.255'
        option broadcast_port '1025'
        option tcp_port '1026'
        option network_option '2'
        option shared_key 'Niiiiiiiiiiiiick'
        option iv 'Niiiiiiiiiiiiick'
        option use_symm_enc '0'
        option collision_domain '-1'
        option bandwidth '-1'

config ordering
        option sort_order 'cbfs'

config hostapd
        option hostapd_dir '/var/run/hostapd'

config times
        option update_client '10'
        option denied_req_threshold '30'
        option remove_client '15'
        option remove_probe '30'
        option remove_ap '460'
        option update_hostapd '10'
        option update_tcp_con '10'
        option update_chan_util '5'
        option update_beacon_reports '600'

config metric 'global'
        option rssi_weight '0'
        option rssi_center '0'
        option initial_score '0'
        option kicking_threshold '20'
        option duration '600'
        option rrm_mode 'apt'
        option ap_weight '0'
        option ht_support '1'
        option vht_support '1'
        option no_ht_support '0'
        option no_vht_support '0'
        option rssi '0'
        option low_rssi '0'
        option freq '0'
        option chan_util '0'
        option max_chan_util '0'
        option rssi_val '-60'
        option low_rssi_val '-75'
        option chan_util_val '0'
        option max_chan_util_val '0'
        option min_probe_count '0'
        option bandwidth_threshold '0'
        option use_station_count '0'
        option max_station_diff '0'
        option eval_probe_req '0'
        option eval_auth_req '0'
        option eval_assoc_req '0'
        option deny_auth_reason '1'
        option deny_assoc_reason '17'
        option use_driver_recog '1'
        option chan_util_avg_period '3'
        option set_hostapd_nr '1'
        option kicking '1'

config metric '802_11a'
        option rssi_weight '2'
        option rssi_center '-70'
        option initial_score '125'

config metric '802_11g'
        option rssi_weight '2'
        option rssi_center '-70'
        option initial_score '100'

config local
        option loglevel '1'

**Reason for edit** - I thought my config got truncated, but I just didn't scroll the terminal... :man_facepalming:

Btw, looking at this condition, is it right when option bandwidth_threshold is actually '0'? Then it's always true, isn't it?

I don't think it should be 0 by default. Did you set it that way a few days ago, when I suggested it as a workaround for the bug that was there until it was fixed in the most recent updates?


Try setting it higher than 6. Maybe in your setup the RX rate doesn't drop to 1.


Probably yes, but I'm on your latest version (https://github.com/berlin-open-wireless-lab/DAWN/pull/158); I thought this had been fixed? So basically it means I should avoid using this for now:

option bandwidth_threshold '0'

Actually it does go above 1 (see my screenshot above), just not very often, so tuning the right level is probably required. I'll try 7, for example :wink:

Also, another interesting "bug", but I understand that the returned values might sometimes be inconsistent.

FYI, it seems to be much better after increasing it to 7... anyway, it's still suspicious to me :slight_smile: