Dawn: a decentralized wireless controller

Hi, while playing around a bit with Dawn, it looks like there would be a huge difference in settings depending on the network that you're trying to set up, right?

A simple home network, no matter how large, would have:

  • relatively few devices
  • not that many passers-by

While a network on a large site could potentially:

  • have a gazillion devices
  • have many, many, many passers-by.

Is that correct?

Going further, I'm thinking that, for a home situation the remove_client setting could be in hours - maybe days even - instead of the current 15 seconds. But I'm not sure about that. Also, can I just freely set the remove_probe information to hours or days?

The reason for asking is that my relatively new phone with good 5G support is one of these stubborn Android types that refuses to optimally scan the network. Sometimes it attaches to 5G but many times it takes ages for it to appear in a 5G scan and it keeps connecting to the single 2.4GHz net on the ground floor, i.e. where it appeared first. So I'm thinking: having Dawn remember that long, long ago this phone attached to 5G would help it kick. Or am I thinking wrong?

So would that work, setting remove_client to hours?
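
For anyone who wants to try this, here is a minimal sketch of what setting the timeouts to hours could look like. It assumes both values are plain seconds (the current 15 seconds suggests so) and that they live in the first `times` section, i.e. `dawn.@times[0]`. The helper name `dawn_time_commands` is made up for illustration, and whether DAWN behaves well with values this large is exactly the open question.

```python
def dawn_time_commands(remove_client_hours, remove_probe_hours):
    """Build uci commands that set DAWN's removal timeouts, given in hours.

    Assumption: remove_client and remove_probe are expressed in seconds
    and live in the first 'times' section of the dawn config.
    """
    seconds_per_hour = 3600
    settings = {
        "remove_client": int(remove_client_hours * seconds_per_hour),
        "remove_probe": int(remove_probe_hours * seconds_per_hour),
    }
    commands = [
        f"uci set dawn.@times[0].{name}='{value}'"
        for name, value in settings.items()
    ]
    commands.append("uci commit dawn")
    return commands


if __name__ == "__main__":
    # Hypothetical values: keep clients for 2 hours, probes for a day.
    for line in dawn_time_commands(remove_client_hours=2, remove_probe_hours=24):
        print(line)
```

The printed lines can be pasted into a shell on the AP. DAWN most likely needs a restart afterwards to pick up the new values, and it is worth changing one setting at a time and watching the hearing map.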

Since the update from OpenWrt 22.03.3 to 23.05.2 I'm unable to view the DAWN Network Overview and Hearing Map using LuCI. I get the following error message:

Looking from my access points that are still on 22.03.3, I can see the upgraded access points just fine in LuCI, and DAWN seems operational on 23.05.2, since kick messages show up in the logs. Unfortunately I cannot see any more details in LuCI.

The following packages are installed:

Package name                        Version                    Size (.ipk)   Description
dawn                                2023-05-14-e036905a-1      -             -
luci-app-dawn                       git-23.318.36526-7739e9f   -             -
prometheus-node-exporter-lua-dawn   2022.08.08-1               -             -
umdns                               2023-10-19-d45c443a-5      -             -
wpad-mbedtls                        2023-09-08-e5ccbfc6-6      -             -

I did switch from wpad-openssl to wpad-mbedtls to be in sync with OpenWrt's default. But this shouldn't be the issue, right?

Update:
When installing luci-app-dawn git-23.093.42704-9c92e46 from 22.03, it works. I've also tried git-23.311.69135-69eeebe from snapshot, but that failed as well.


Start exploring what to tweak in the configs. Check out these reddit posts: https://www.reddit.com/r/openwrt/comments/v26ybu/comment/ie63ckp/?utm_source=reddit&utm_medium=web2x&context=3 / https://www.reddit.com/r/openwrt/comments/15y9sqs/dawn_the_bss_transition_controller_and_tips_on/. They helped me with figuring out what I needed (I didn't copy-paste blindly; always experiment and observe one logical change at a time).

I recently updated from 23.05.0 (not 22!) to 23.05.2 and I also got this error.
I think this is due to the bug mentioned in Dawn: a decentralized wireless controller - #276 by chaitanyapramod / https://github.com/openwrt/openwrt/issues/14120, which seems to affect MT7621 devices specifically.

I had been running 23.05.0 for almost 4 months before this maintenance upgrade and it didn't have this issue. It appears that on the latest version, 23.05.2, DAWN segfaults, which leaves the UI unable to display anything. This is what's printed in the log periodically:

[   26.981973] do_page_fault(): sending SIGSEGV to umdns for invalid write access to 00000004
[   26.998634] epc = 77d0d62b in libubox.so.20230523[77d0b000+1f000]
[   27.010962] ra  = 77d10a0f in libubox.so.20230523[77d0b000+1f000]

Additionally, I see a lot of entries as follows in between the above:

Thu Feb 29 00:31:16 2024 daemon.warn dawn: Failed to lookup ID!

Some other nodes are able to "see" each other, while others have this in their logs (also repeated continuously):

Wed Feb 28 22:56:22 2024 daemon.err dawn: connect_cb()=tcpsocket.c@319 Connection failed (ERROR)

(they have different hardware and only one has the same openwrt upgrade)

Not likely — the problem is happening for me and I have wpad-openssl.

FWIW I'm going to downgrade to 23.05.0 and wait until this all gets sorted out. Luckily, it's easy for me since I'm using an image builder that generates images that self-configure on first boot and can be deployed as immutable...

cc @PolynomialDivision FYI


That seems reasonable, although I haven't tried it myself. It's an interesting idea, and I'm eager to hear more about your experiments with this setting and maybe follow your example...

That's an interesting thought. I'd also like to add that some Android phones (the three recent ones I've had are the OnePlus 9R, OnePlus 10 Pro 5G and OnePlus 12) have clever mechanisms for using several connections to the internet. Specifically, there's a setting that lets the phone balance between 5G and Wi-Fi while connected to both (I suppose the traffic is split per TCP session in this case). Another setting connects to two Wi-Fi radios simultaneously (like 5GHz+2.4GHz). Judging by my observations, I think these phones have two separate Wi-Fi adapters (how else would they hold two Wi-Fi connections?). On top of that, it's possible to set up a Wi-Fi hotspot while still connected to another Wi-Fi network and share that connection.

All this is to say that many managed Wi-Fi APs / networks would probably be confused, at least in some cases, when one device shows up twice (although the MAC addresses of its radios would be different).

I also know that modern phones generate "private" MAC addresses when connecting to Wi-Fi networks. For some (Android?), that MAC address is (used to be?) unique per Wi-Fi network name, while for others (iPhone?) the MAC addresses are unique per Wi-Fi AP/radio. That means that when such a device roams, DAWN would probably be unable to tell that the same device has connected to a different AP on the same network, and would treat it as a new client.

With that in mind, I'd suggest trying to record observations while changing the settings to see if they are in fact helpful.
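
When recording such observations, it helps to know which of the MAC addresses in the hearing map are randomized at all. Randomized ("private") addresses have the locally administered bit set in the first octet, so a quick check is possible. A minimal sketch follows; the function name and the example addresses are made up for illustration.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the MAC has the locally administered bit set.

    Randomized ("private") Wi-Fi MAC addresses set bit 1 (value 0x02)
    of the first octet; vendor-assigned addresses leave it clear.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


if __name__ == "__main__":
    # Example addresses are invented, not taken from a real device.
    for mac in ("da:a1:19:00:00:01", "00:1a:2b:3c:4d:5e"):
        kind = "randomized/local" if is_locally_administered(mac) else "vendor-assigned"
        print(mac, "->", kind)
```

A client whose address passes this check may show up under a different MAC on another SSID or AP, so its entries should not be assumed to belong to one stable device.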

i've been using DAWN on 3x rt4230 configured as access points on a 192.168.1.x network for the past 6+ months. i've been on 23.02.X and on snapshots. i continue to experience growing lag in wireless transmission the longer the APs are up without a reboot. after rebooting all of them and allowing about 20 min for the clients (about 20 of them in my home) to stabilize, i get excellent ping and speed. as the days pass, both transmission rates and ping times increase.
it looks like the overhead associated with keeping up the network grows with DAWN enabled. see the attached images for interface traffic over a week from the APs - you will see a progressive increase which returns to baseline with reboots.

i've got each AP configured with the same SSID on both 2.4G and 5G radios, with 802.11r fast roaming enabled on each AP, but turning it off makes no difference. i've used 'default' and various modified parameters in DAWN. i've turned cell coverage density off for 2.4G and on (high) for 5G, but turning both off makes no difference.

if i disable DAWN, the progressive increase in latency goes away, but steering clients to a closer/faster AP takes a lot longer (often requiring a manual disconnect/connect from the client).
thus DAWN seems to be working, but it imposes a lot of overhead, and that overhead keeps growing.

thoughts?
anyone else seeing this?

thanks



The GitHub issue points out that the page fault is fixed in the current snapshot. I switched to broadcasts to avoid umdns, as it also seems broken on other devices in the current release: after some days it no longer finds any hosts until the service is restarted.

Sadly, even without umdns I still run into the tcpsocket error. Do you know anything new about the issue?


Nope. How did you switch to broadcasts, by the way? I don't remember seeing a specific toggle for this...

You have to set network_option to 0 to use broadcasts, and also set broadcast_ip to the broadcast address of the network the APs communicate over.

        option broadcast_ip '192.168.177.255'
        option broadcast_port '1025'
        option tcp_port '1026'
        option network_option '0'
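
If you are unsure which value to use for broadcast_ip, it can be derived from the subnet the APs sit in. A minimal sketch with Python's standard `ipaddress` module follows; the /24 prefix length is an assumption (only the broadcast address is given above), and `broadcast_ip_for` is just an illustrative helper name.

```python
import ipaddress


def broadcast_ip_for(cidr: str) -> str:
    """Return the broadcast address of the subnet given in CIDR notation.

    strict=False allows passing a host address such as an AP's own IP
    together with the prefix length.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    # Assumed /24 subnet matching the config shown above.
    print(broadcast_ip_for("192.168.177.0/24"))  # 192.168.177.255
```

The result goes into `option broadcast_ip`, and every AP has to use the same address and ports.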

Is DAWN dead? The last activity was in May 2023. Is there a working alternative? I read above about usteer, but there is not much activity there either.

Activity · berlin-open-wireless-lab/DAWN (github.com)

Not claiming it's dead, but it has been pretty quiet around here lately. I'm also concerned for usteer :disappointed:

Hi,

With the newest updates DAWN seems to be working again (although it has only been running for 24h so far...). On one device I sometimes see the following error in the hearing map in LuCI. Does anybody recognize it?

TypeError

Cannot read properties of undefined (reading '0')

Sometimes see TypeError in OpenWRT LuCi integration · Issue #240 · berlin-open-wireless-lab/DAWN (github.com)

Cheers,
Nils

i have DAWN on 3 AP's in my home network.
am now seeing this error on one of my AP's,
dawn-> hearing map-> red banner "TypeError cannot read properties of undefined"

since about 2 weeks ago. clearing all configs does not stop it.
AP's are rt4230w, running snapshot openwrt builds.

was the mod proposed in https://github.com/openwrt/luci/pull/5992 ever merged?
thanks -

@ghoffman , @NilsRo - Opened a PR regarding this here

Hopefully it gets merged in.


fantastic. thx


Long time user of Dawn. It has been serving me well, except for what I am pretty sure is a bug.

One of the routers in my network has two radios (2.4G and 5G), and naturally they have the same name under "Access Point" (see picture below). What happens is that SOMETIMES the "Signal" values of the two radios are identical, even when the wireless device is at a location where I am a hundred percent certain the signal strength can't be the same.

Whenever this happens (the problem does not happen all the time), DAWN will make incorrect decisions on which radio to bounce the device to.

Here is an example to illustrate what I am trying to say: when the signal strength from the 2.4G radio is -50 and 5G radio is -87, DAWN is supposed to provide a higher score for the 2.4G radio, but what I see instead is the picture below:

Either that, or I would see both radios showing -50, in which case DAWN would calculate a higher score for the 5G radio (based on how I have configured DAWN, so the calculations are right, but the source signal strength based on which the score is calculated is incorrect) and steer the device to the 5G radio, then we go back to the picture above.

Again this problem does not happen all the time, i.e. sometimes the two lines would show their own signal strength values like they are supposed to. But whenever the problem occurs, DAWN will make incorrect guiding instructions to a client device.

My gut feeling tells me SOME of the code in DAWN would see the same "access point" names (even though the MAC addresses are different) and mix up the signal strength values. Although I am not that technically inclined to be able to validate this theory of mine.

Hopefully this will get the attention of someone capable of looking into this issue, and make DAWN even better.
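
To make the effect concrete, here is a toy scoring sketch. It is not DAWN's code: the `BandProfile` fields, the thresholds and the point values are all hypothetical, chosen only so that 5G gets a higher base score than 2.4G, as in a typical "prefer 5G" setup. It shows how a correct calculation still picks the wrong radio once both radios report the same signal.

```python
from dataclasses import dataclass


@dataclass
class BandProfile:
    """Hypothetical per-band scoring parameters (not DAWN's real options)."""
    initial_score: int  # base score for the band
    good_rssi: int      # at or above this RSSI, add good_bonus
    good_bonus: int
    low_rssi: int       # at or below this RSSI, subtract low_penalty
    low_penalty: int


def score(profile: BandProfile, rssi: int) -> int:
    """Threshold-based score: base value plus bonus or minus penalty."""
    total = profile.initial_score
    if rssi >= profile.good_rssi:
        total += profile.good_bonus
    if rssi <= profile.low_rssi:
        total -= profile.low_penalty
    return total


# Invented example values: 5G is preferred through a higher base score.
BAND_2G = BandProfile(initial_score=80, good_rssi=-60, good_bonus=15,
                      low_rssi=-80, low_penalty=15)
BAND_5G = BandProfile(initial_score=100, good_rssi=-60, good_bonus=15,
                      low_rssi=-80, low_penalty=15)

if __name__ == "__main__":
    # Correct readings: 2.4G at -50, 5G at -87, so 2.4G should win.
    print("correct :", score(BAND_2G, -50), "vs", score(BAND_5G, -87))
    # Mixed-up readings: both radios report -50, so 5G wins by mistake.
    print("mixed up:", score(BAND_2G, -50), "vs", score(BAND_5G, -50))
```

With these made-up numbers the 2.4G radio wins on the correct readings, while the 5G radio wins as soon as it inherits the 2.4G signal value, which matches the back-and-forth steering described above.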


L̶o̶o̶k̶s̶ l̶i̶k̶e̶ I̶ h̶a̶v̶e̶ r̶e̶s̶o̶l̶v̶e̶d̶ m̶y̶ i̶s̶s̶u̶e̶ b̶y̶ c̶h̶a̶n̶g̶i̶n̶g̶ "s̶e̶t̶_̶h̶o̶s̶t̶a̶p̶d̶_̶n̶r̶" f̶r̶o̶m̶ 1̶ b̶a̶c̶k̶ t̶o̶ 0̶ (̶t̶h̶e̶ d̶e̶f̶a̶u̶l̶t̶)̶.

... p̶r̶o̶b̶a̶b̶l̶y̶ b̶e̶t̶t̶e̶r̶ t̶o̶ l̶e̶a̶v̶e̶ t̶h̶i̶n̶g̶s̶ a̶s̶-̶i̶s̶ w̶h̶e̶n̶ n̶o̶t̶ s̶u̶r̶e̶ w̶h̶a̶t̶ i̶t̶ i̶s̶ a̶b̶o̶u̶t̶
:slight_smile:


Edit: turns out there's what appears to be a bug in DAWN. See my new replies below.

O̶K̶ I̶ t̶h̶i̶n̶k̶ I̶ f̶i̶n̶a̶l̶l̶y̶ h̶a̶v̶e̶ i̶d̶e̶n̶t̶i̶f̶i̶e̶d̶ t̶h̶e̶ c̶a̶u̶s̶e̶ o̶f̶ t̶h̶e̶s̶e̶ 's̶t̶i̶c̶k̶y̶' s̶i̶g̶n̶a̶l̶ v̶a̶l̶u̶e̶s̶, a̶s̶ t̶h̶e̶y̶ c̶o̶n̶t̶i̶n̶u̶e̶d̶ t̶o̶ c̶r̶e̶e̶p̶ u̶p̶ e̶v̶e̶n̶ a̶f̶t̶e̶r̶ I̶ c̶h̶a̶n̶g̶e̶d̶ t̶h̶e̶ s̶e̶t̶_̶h̶o̶s̶t̶a̶p̶d̶_̶n̶r̶.

M̶y̶ r̶e̶l̶e̶n̶t̶l̶e̶s̶s̶ e̶f̶f̶o̶r̶t̶s̶ a̶n̶d̶ t̶e̶s̶t̶s̶ t̶o̶ t̶r̶y̶ a̶n̶d̶ r̶e̶v̶e̶a̶l̶ t̶h̶e̶ c̶a̶u̶s̶e̶ i̶n̶d̶i̶c̶a̶t̶e̶d̶ t̶h̶a̶t̶

  1. I̶f̶ t̶h̶e̶ r̶o̶u̶t̶e̶r̶ ̶̶t̶h̶i̶n̶k̶s̶̶̶ t̶h̶e̶ c̶l̶i̶e̶n̶t̶ d̶e̶v̶i̶c̶e̶ i̶s̶ s̶t̶i̶l̶l̶ c̶o̶n̶n̶e̶c̶t̶e̶d̶ t̶o̶ i̶t̶s̶ A̶P̶, D̶A̶W̶N̶ w̶i̶l̶l̶ h̶a̶p̶p̶i̶l̶y̶ t̶a̶k̶e̶ t̶h̶e̶ p̶h̶a̶n̶t̶o̶m̶ r̶s̶s̶i̶ v̶a̶l̶u̶e̶ f̶r̶o̶m̶ t̶h̶e̶ A̶P̶ a̶n̶d̶ c̶o̶n̶t̶i̶n̶u̶e̶ t̶o̶ r̶e̶g̶i̶s̶t̶e̶r̶ t̶h̶e̶ v̶a̶l̶u̶e̶ i̶n̶ D̶A̶W̶N̶'s̶ o̶w̶n̶ h̶e̶a̶r̶i̶n̶g̶ m̶a̶p̶.

  2. T̶h̶e̶ l̶a̶s̶t̶ p̶r̶o̶b̶e̶d̶ v̶a̶l̶u̶e̶ i̶n̶ D̶A̶W̶N̶'s̶ o̶w̶n̶ s̶c̶a̶n̶n̶i̶n̶g̶ e̶f̶f̶o̶r̶t̶s̶ w̶i̶l̶l̶ b̶e̶ l̶e̶f̶t̶ i̶n̶ t̶h̶e̶ h̶e̶a̶r̶i̶n̶g̶ m̶a̶p̶ f̶o̶r̶ a̶ c̶e̶r̶t̶a̶i̶n̶ p̶e̶r̶i̶o̶d̶ o̶f̶ t̶i̶m̶e̶, e̶v̶e̶n̶ w̶h̶e̶n̶ t̶h̶e̶ c̶l̶i̶e̶n̶t̶ d̶e̶v̶i̶c̶e̶ h̶a̶s̶ b̶e̶e̶n̶ m̶o̶v̶e̶d̶ t̶o̶ a̶ d̶i̶f̶f̶e̶r̶e̶n̶t̶ l̶o̶c̶a̶t̶i̶o̶n̶ a̶n̶d̶ l̶o̶s̶e̶ s̶i̶g̶h̶t̶ o̶f̶ a̶ p̶a̶r̶t̶i̶c̶u̶l̶a̶r̶ A̶P̶.

T̶h̶e̶s̶e̶ t̶w̶o̶ f̶a̶c̶t̶o̶r̶s̶ c̶o̶m̶b̶i̶n̶e̶d̶ a̶r̶e̶ e̶n̶o̶u̶g̶h̶ t̶o̶ c̶a̶u̶s̶e̶ h̶a̶v̶o̶c̶ i̶n̶ D̶A̶W̶N̶'s̶ c̶a̶l̶c̶u̶l̶a̶t̶i̶o̶n̶s̶ t̶o̶ t̶r̶y̶ a̶n̶d̶ s̶t̶e̶e̶r̶ a̶ d̶e̶v̶i̶c̶e̶ t̶o̶ w̶h̶a̶t̶ D̶A̶W̶N̶ t̶h̶i̶n̶k̶s̶ i̶s̶ s̶t̶i̶l̶l̶ t̶h̶e̶ A̶P̶ w̶i̶t̶h̶ t̶h̶e̶ s̶t̶r̶o̶n̶g̶e̶s̶t̶ s̶i̶g̶n̶a̶l̶, w̶h̶e̶n̶ i̶n̶ f̶a̶c̶t̶ t̶h̶e̶s̶e̶ s̶t̶e̶e̶r̶i̶n̶g̶ d̶e̶c̶i̶s̶i̶o̶n̶s̶ a̶r̶e̶ b̶a̶s̶e̶d̶ o̶n̶ o̶b̶s̶o̶l̶e̶t̶e̶ d̶a̶t̶a̶.

I̶ t̶h̶e̶n̶ t̶r̶i̶e̶d̶ t̶o̶ s̶p̶e̶e̶d̶ u̶p̶ t̶h̶e̶ r̶e̶f̶r̶e̶s̶h̶i̶n̶g̶ o̶f̶ s̶i̶g̶n̶a̶l̶ v̶a̶l̶u̶e̶s̶ t̶o̶ a̶d̶d̶r̶e̶s̶s̶ t̶h̶e̶ t̶w̶o̶ p̶r̶o̶b̶l̶e̶m̶s̶ a̶b̶o̶v̶e̶.

F̶o̶r̶ '1̶', c̶h̶a̶n̶g̶i̶n̶g̶ t̶h̶e̶ w̶i̶r̶e̶l̶e̶s̶s̶ c̶o̶n̶f̶i̶g̶'s̶ m̶a̶x̶_̶i̶n̶a̶c̶t̶i̶v̶i̶t̶y̶ v̶a̶l̶u̶e̶ o̶f̶ e̶a̶c̶h̶ A̶P̶ t̶o̶ '1̶0̶' d̶i̶d̶ t̶h̶e̶ t̶r̶i̶c̶k̶. A̶f̶t̶e̶r̶ t̶h̶e̶ c̶h̶a̶n̶g̶e̶ w̶h̶e̶n̶ a̶ c̶o̶n̶n̶e̶c̶t̶i̶o̶n̶ j̶u̶m̶p̶s̶ f̶r̶o̶m̶ o̶n̶e̶ A̶P̶ t̶o̶ a̶n̶o̶t̶h̶e̶r̶, t̶h̶e̶ r̶o̶u̶t̶e̶r̶ w̶i̶l̶l̶ p̶u̶r̶g̶e̶ t̶h̶e̶ o̶l̶d̶ c̶o̶n̶n̶e̶c̶t̶i̶o̶n̶ w̶i̶t̶h̶i̶n̶ 1̶0̶ s̶e̶c̶o̶n̶d̶s̶.

F̶o̶r̶ '2̶', I̶ a̶d̶d̶r̶e̶s̶s̶e̶d̶ t̶h̶e̶ i̶s̶s̶u̶e̶ b̶y̶ c̶h̶a̶n̶g̶i̶n̶g̶ r̶e̶m̶o̶v̶e̶_̶p̶r̶o̶b̶e̶ i̶n̶ D̶A̶W̶N̶'s̶ c̶o̶n̶f̶i̶g̶ t̶o̶ '1̶0̶'. N̶o̶w̶ w̶e̶ n̶o̶ l̶o̶n̶g̶e̶r̶ h̶a̶v̶e̶ a̶ b̶u̶n̶c̶h̶ o̶f̶ h̶e̶a̶r̶i̶n̶g̶ m̶a̶p̶ e̶n̶t̶r̶i̶e̶s̶ w̶i̶t̶h̶ v̶e̶r̶y̶ o̶l̶d̶ p̶r̶o̶b̶i̶n̶g̶ d̶a̶t̶a̶ t̶h̶a̶t̶ s̶t̶a̶y̶e̶d̶ i̶n̶ D̶A̶W̶N̶'s̶ m̶e̶m̶o̶r̶y̶.

A̶l̶s̶o̶, s̶e̶t̶_̶h̶o̶s̶t̶a̶p̶d̶_̶n̶r̶ i̶s̶ n̶o̶w̶ "2̶", w̶h̶i̶c̶h̶ m̶e̶a̶n̶s̶ (̶a̶s̶ i̶n̶d̶i̶c̶a̶t̶e̶d̶ f̶r̶o̶m̶ m̶y̶ n̶u̶m̶e̶r̶o̶u̶s̶ t̶e̶s̶t̶i̶n̶g̶)̶ D̶A̶W̶N̶ w̶i̶l̶l̶ u̶p̶d̶a̶t̶e̶ i̶t̶s̶ h̶e̶a̶r̶i̶n̶g̶ m̶a̶p̶ b̶a̶s̶e̶d̶ o̶n̶ t̶h̶e̶ l̶i̶s̶t̶ o̶f̶ A̶P̶s̶ s̶e̶e̶n̶ b̶y̶ a̶ c̶l̶i̶e̶n̶t̶ d̶e̶v̶i̶c̶e̶. i̶.e̶. i̶f̶ t̶h̶e̶ d̶e̶v̶i̶c̶e̶ d̶o̶e̶s̶n̶'t̶ s̶e̶e̶ a̶n̶ A̶P̶ b̶e̶c̶a̶u̶s̶e̶ i̶t̶ i̶s̶ t̶o̶o̶ f̶a̶r̶, D̶A̶W̶N̶ w̶i̶l̶l̶ n̶o̶t̶ s̶h̶o̶w̶ t̶h̶a̶t̶ A̶P̶ a̶t̶ a̶l̶l̶ i̶n̶ t̶h̶e̶ h̶e̶a̶r̶i̶n̶g̶ m̶a̶p̶ o̶f̶ t̶h̶a̶t̶ d̶e̶v̶i̶c̶e̶.

I̶ m̶a̶y̶ n̶e̶e̶d̶ t̶o̶ a̶d̶j̶u̶s̶t̶ t̶h̶e̶s̶e̶ v̶a̶l̶u̶e̶s̶ t̶o̶ f̶i̶n̶e̶ t̶u̶n̶e̶ D̶A̶W̶N̶'s̶ r̶e̶s̶p̶o̶n̶s̶i̶v̶e̶n̶e̶s̶s̶, b̶u̶t̶ a̶f̶t̶e̶r̶ a̶ d̶a̶y̶ o̶f̶ n̶o̶r̶m̶a̶l̶ u̶s̶a̶g̶e̶, t̶h̶e̶r̶e̶ i̶s̶ n̶o̶ m̶o̶r̶e̶ j̶i̶t̶t̶e̶r̶y̶ j̶u̶m̶p̶s̶ t̶o̶ d̶i̶f̶f̶e̶r̶e̶n̶t̶ A̶P̶s̶, a̶n̶d̶ w̶h̶e̶n̶ D̶A̶W̶N̶ s̶t̶e̶e̶r̶s̶ a̶ d̶e̶v̶i̶c̶e̶ t̶o̶ a̶ n̶e̶w̶ A̶P̶, i̶t̶ n̶o̶ l̶o̶n̶g̶e̶r̶ g̶o̶e̶s̶ b̶a̶c̶k̶ a̶n̶d̶ f̶o̶r̶t̶h̶.

I̶ h̶o̶p̶e̶ I̶ w̶o̶n̶'t̶ h̶a̶v̶e̶ t̶o̶ c̶o̶m̶e̶ b̶a̶c̶k̶ w̶i̶t̶h̶ y̶e̶t̶ a̶n̶o̶t̶h̶e̶r̶ u̶p̶d̶a̶t̶e̶ o̶n̶ t̶h̶i̶s̶ i̶s̶s̶u̶e̶. T̶h̶i̶s̶ i̶s̶ h̶o̶p̶e̶f̶u̶l̶l̶y̶ m̶y̶ f̶i̶n̶a̶l̶ f̶i̶x̶ o̶n̶c̶e̶ a̶n̶d̶ f̶o̶r̶ a̶l̶l̶.


Does your problem go away with just lowering dawn.@times[0].remove_probe='10'?

I too am facing some weirdness when devices disconnect in a 4 AP environment. From my logs, it also happens when DAWN tries to steer away disconnected clients, and I have been looking for options to mitigate it.

i added these tweaks to my 3-AP home. so far - much better roaming with less sticking. i'll report back.
i'm still working on tweaks for 2.4G ->5G preference.
thx