What kinds of features would you want from an OpenWrt management system? (Dusk)

I am working on a decentralized management and monitoring solution for OpenWrt called Dusk. Dusk is loosely inspired by DAWN, and I am designing it so that there are no centralized points. The original design designated one device as a master, but I have since thrown out that idea. I am still working through the technical details of how I am going to make it work, but I have come up with some new partial designs. I am focusing on security and flexibility, and I hope to make something that will work at both small and large scale. I plan for administration to be done primarily through LuCI.

Here are the targeted features for Dusk:

  • Group policy that can be applied to specific devices, groups, or network-wide (UCI-based; see the sketch after this list)
  • Peer-to-peer operation with self-healing and recovery
  • Users and user permissions
  • Global wireless status with all connected devices
  • Controlled propagation of changes
  • Mass joining operations
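
To make the group-policy bullet concrete, here is a hypothetical sketch of what a policy could look like in UCI. The dusk config file, section type, and option names are all invented for illustration; nothing here is final:

```
# /etc/config/dusk (hypothetical; names are illustrative only)
config policy 'guest_isolation'
	option target 'group'
	option group 'guest-aps'
	list uci_set 'wireless.guest.isolate=1'

config policy 'lobby_hostname'
	option target 'device'
	option device 'ap-lobby'
	list uci_set 'system.@system[0].hostname=ap-lobby'
```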

Here are the bonus things I might do if I actually get this project going (most of these are very hard):

  • Automatic Wi-Fi channel selection
  • Band steering
  • Rogue AP and interference detection
  • High availability for services like DHCP (automatic failover to a new host)
  • Backup and restore
  • Generation of configuration files for services not configured via UCI
  • Decentralized roles (think DHCP as a service that is not pinned to a device)
  • Maybe other things

From a security perspective, all changes must be signed with an authorized key. The key itself will be encrypted with a password, and to make a change you will need to enter that password (which will be verified via a hash). Because of the risk of lockout, there will be a recovery phrase that unlocks a key that can reset the user key. If both the phrase and the password are lost, the network will need to be rebuilt from scratch.
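
Here is a rough sketch of how that sign-and-verify flow could look in a Python prototype, using the cryptography package. The change payload format and password handling are placeholders, not the final design:

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The admin key is generated once and stored encrypted with the password.
key = Ed25519PrivateKey.generate()
encrypted_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(b"admin-password"),
)

# Making a change: the password unlocks the key, which signs the change.
unlocked = serialization.load_pem_private_key(encrypted_pem, password=b"admin-password")
change = b'{"uci": "wireless.radio0.channel=36"}'
signature = unlocked.sign(change)

# Every node verifies against the shared public key before accepting.
unlocked.public_key().verify(signature, change)  # raises InvalidSignature if tampered
```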

One of the things I would like feedback on is the actual use case for Dusk. Are there features that I left out? What would make it easy to use? Peer-to-peer networks are complex, so I am working on an underlying design that makes them as hassle-free as possible. However, I am worried that a network could end up in a state that is hard to correct.

Follow-up question: do you think storing historical data is important? My current designs don't really have a storage solution, but I could look into adding some sort of device history that keeps records of events and relevant data. (This would most likely be stored in a DHT.)

Which programming language are you planning to write this in?

For me, one of the main draws of OpenWISP was the firmware management system: run a test on a test group, then do a batch rollout to everything.

Probably Rust, but for the prototyping I'm using Python or JavaScript.

I'm looking to exploit the distributed nature of the system to do gradual rollouts. A node will apply a change before replicating it. If the change fails, the node will not replicate it.

This works for any change, including firmware upgrades.
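
A minimal sketch of that rule (try_apply and the replicable set are made-up names, and a real version would snapshot the config first and roll back on failure):

```python
import subprocess

def try_apply(change: dict) -> bool:
    """Apply one UCI change locally and report whether it succeeded."""
    set_ok = subprocess.run(["uci", "set", f"{change['option']}={change['value']}"]).returncode == 0
    return set_ok and subprocess.run(["uci", "commit"]).returncode == 0

def handle_change(change: dict, replicable: set) -> None:
    # Apply before replicating: only changes that succeed locally are
    # added to the set peers may pull, so a bad change stops spreading.
    if try_apply(change):
        replicable.add(change["id"])
```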

Awesome. To further elaborate on my use case: it works for me as long as it's designed for multiple groups of devices. For example, I have an A deployment group, a B deployment group, and a test deployment group on one control page =)

I guess that can be challenging with a distributed system, however, as the primary control node would eventually need to update itself. Or you end up with three interfaces, one per group.

As it stands, the configuration is going to be stored in a configuration file that is synced across the network. That file has everything in the network, even if it is not applied to the current node.
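
Purely as an illustration (the real schema is not decided), the synced file could look something like this, with each node applying only the entries that match it:

```json
{
  "version": 42,
  "policies": {
    "all": { "wireless.radio0.country": "US" },
    "group:test": { "wireless.radio0.channel": "36" },
    "device:ap-attic": { "system.@system[0].hostname": "ap-attic" }
  }
}
```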

I'm not sure what you mean by a primary control node; in my design there is no central control point. Originally that was my design, but I soon realized that I don't need perfect consensus, since eventual consistency is enough. I am inspired by libp2p and I2P.

Mm. This is more of an interface consideration.

It's more about the UI/interface, or options for how an interface is going to work, like "single pane of glass" considerations.

I mean even with a distributed system, what would the interface be?

You have to log into one of the nodes? At some point there will be an interface. Unless you plan on doing a control node or a cluster IP, how does one handle updating the node you are on? Or is this something where you can/will have a control node that helps update the config but otherwise doesn't participate?

I'm thinking in terms of how I'd normally do it with Ansible, where you can have a separate node that holds the config and that you run the playbook from, i.e. a push model.

Oh, that makes sense. You would technically lose access temporarily, but you can access LuCI on a different device.

Follow-up question: would you want a way to see historical data? What I mean is a log of things like reboots and RF interference. My thought is it might be useful to have a history you can fall back on to help find issues like faulty hardware or periodic RF interference.

I think initially one would want to limit the scope.

Log analysis and collection can be left to other tools.

On my wishlist is an easy way to log clients roaming on floor plans, etc., but I think it's important to limit the scope to actually reach a major release first. I think logging historical data is fraught with danger, but a good way to see realtime client performance, roaming, and signal-integrity data would be good. What I currently do is read instantaneous RSSI off usteer, run a log server with OpenObserve for associations and other Wi-Fi logs, and use Zabbix for availability, coupled with OpenWISP for everything else.

I absolutely agree that it is important to limit scope. However, I am still designing the underlying system and it is critical that I do it in a way that is stable and flexible.

The reason I ask about storage is that I am thinking about potentially using a Kademlia DHT for data storage and resiliency. It isn't perfectly suited to a small network, but it is very well tested and documented, which makes using it easier. It would also mean that I wouldn't need to worry about eventual consistency.

Basically, I am still working on the base protocol. I want that to be solid so that cool features can be added in the future. I'm leaning toward skipping the DHT thing entirely, as it seems out of scope.

Mm. Getting an architecture in place is good, yeah. Don't want a big ball of mud =P

Is there (considerable) overhead expected on access points with regard to RAM or flash? If so, minimum requirements for access points may need to be mentioned in order for them to work efficiently with this management system.

Regarding scalability: is this intended for environments with, say, up to 10 access points, or many more?

I would suggest a containerized solution. One of the ways OpenWrt is superior is its small size out of the box, and it would be nice not to have to think about adding this into an existing router deployment. Not to mention, spinning up new instances of the management container would make upgrades painless.

The feature list is very ambitious, especially considering P2P operation. My instinct would be to plan for a modular design and initially implement the smallest possible subset of features to achieve the core functionality, probably leaving the P2P idea for later. Maybe you are a genius, in which case please disregard my comment :)

That adds overhead and complexity for little gain. I'm designing it to be fairly small and light.

Moving from a client-server model to P2P would be very hard and would require a total rewrite. As far as features go, I'm mostly leveraging existing functionality in OpenWrt.

My plan is to tackle the gossip protocol and the underlying storage and connections first. Gossip protocols have been around for many years, so I'm not doing anything too crazy. All I need to do is implement conflict resolution and storage.
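
For example, last-writer-wins with a deterministic tie-break is one common approach to conflict resolution; this sketch is illustrative, not the final design:

```python
def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas; the higher (clock, node_id) version wins, and
    the deterministic tie-break means every node converges the same way."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged

# Two nodes disagree on a channel; both merge orders give the same result.
a = {"wifi.channel": ((3, "ap-1"), "36")}
b = {"wifi.channel": ((5, "ap-2"), "149")}
assert merge(a, b) == merge(b, a) == {"wifi.channel": ((5, "ap-2"), "149")}
```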

I'm designing this to work in both small and large environments. As far as access points go, it shouldn't require anything special. The protocol as it stands is just one device pulling from another, which on its own won't scale across buildings.

Right now the plan is for the network to use mDNS for peer discovery within a single building. Once in a while, nodes will also pull from another building. This means status data will be a little dated for far-away nodes, but it is far more efficient than constantly pulling data over a potentially slow link.
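
A sketch of that pull schedule (the 10% remote rate is an illustrative guess, not a tuned value):

```python
import random

def pick_peer(local_peers: list, remote_peers: list) -> str:
    """Pick a peer to pull from: mostly cheap same-building peers found
    via mDNS, with an occasional pull over the slower inter-building link."""
    if remote_peers and random.random() < 0.1:
        return random.choice(remote_peers)  # rare cross-building sync
    return random.choice(local_peers)       # frequent local gossip
```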
