SDN with openvswitch and ospf for mesh network

I'd like to share an SDN configuration I use between 3 openwrt routers. Maybe this will be interesting to some, and maybe someone can point out something I could improve.

Essentially it removes the local interfaces from openwrt's control and connects them into a virtual switch that sits on top of a routed network. This allows multiple links between each AP, such as wifi and ethernet, with the routing protocol deciding which link is preferred (i.e. prefer ethernet over wifi).
The slight drawback on consumer hardware is that the MTU of this virtual network drops to 1474, because each packet is encapsulated in GRE between APs/nodes.

Imagine you have an AP in another building, but your network is connected to it over a pre-existing configuration you can't or don't want to turn into a vlan trunk. With this openvswitch SDN, as long as you can ping a router in the other building, you can point the gre virtual wires at it and create a "flat" network to support wifi roaming on a single SSID.
With some more advanced configuration of the routing table on each AP/node, you could still "exit" the wifi network to the local building's network for local access, by configuring the same "default gateway ip" on multiple routers and blocking ARP requests for that IP from crossing certain openvswitch virtual wire ports.
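
For example, a rough sketch of the ARP-blocking part (the gateway address 192.168.2.1 is only a placeholder for this illustration; the bridge and GRE port names match the examples further down):

# Drop ARP requests for the shared gateway IP arriving from the node1 GRE port,
# so clients resolve the locally configured gateway instead of a remote one
ovs-ofctl add-flow ovs-vswitch "in_port=node1,arp,arp_tpa=192.168.2.1,actions=drop"

Note that flows added with ovs-ofctl don't survive a reboot, so they would belong in the same startup script as the GRE ports described later.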

I use AP/node interchangeably, but this is not an option for wireless access points with slow CPUs; it assumes all devices are "routers" capable of the task. You could, however, connect a slow AP to one of your physical ports attached to openvswitch and set the tag to 0, turning that port into a trunk for multiple SSIDs.

Required packages
frr
frr-vtysh
frr-watchdog
frr-ospfd
frr-staticd
wpad-openssl
dawn
openvswitch (all packages)
kmod-openvswitch-gre (the other protocols like vxlan and geneve use different hardware)
kmod-dummy

Optional
frr-bfdd (optional)
mesh11sd
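
On a stock image these can be installed with opkg, something like the following (using the package names as listed above; adjust to what your release provides):

opkg update
# note: wpad-openssl replaces the default wpad-basic-* package, which may need to be removed first
opkg install frr frr-vtysh frr-watchdog frr-ospfd frr-staticd wpad-openssl dawn openvswitch kmod-openvswitch-gre kmod-dummy
# optional
opkg install frr-bfdd mesh11sd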

Add the following to each router's wireless config, /etc/config/wireless.
This should be the first network listed after the wifi-device sections (important later):

config wifi-iface 'default_radio'
	option device 'radio1'
	option mode 'mesh'
	option mesh_fwding '1'
	option mesh_rssi_threshold '0'
	option key 'MeshPW$Here'
	option encryption 'sae'
	option mesh_id 'MeshNetName'
	option network 'mesh'
	option ifname 'm-11s-0'
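
After saving, reload the wireless configuration so the mesh interface comes up (standard OpenWrt command, not specific to this setup):

wifi reload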

If you are going to use a mesh network for the inter-AP connection, configure mesh11sd as shown below. Alternatively, you could use a series of point-to-point wifi networks and tune the OSPF costs accordingly, or let mesh11sd decide where to route traffic on the mesh.
/etc/config/mesh11sd

config mesh11sd 'setup'
	option enabled '1'
	option debuglevel '1'
	option checkinterval '10'
	option interface_timeout '10'
	option portal_detect '0'
	option auto_config '0'

config mesh11sd 'mesh_params'
	option mesh_fwding '1'
#If your nodes are far away, -100 will allow them to connect but expect poor performance
	option mesh_rssi_threshold '-100'
	option mesh_hwmp_rootmode '3'
	option mesh_max_peer_links '8'
#Uncomment this line on your main node
#	option mesh_connected_to_gate '1' 
#	option mesh_gate_announcements '1'

Create a new firewall zone called "ovs" and set the traffic policy to accept for input, output, and forward.
I also suggest allowing traffic from/to your LAN or admin network in case you need to backdoor an isolated node.
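
As a sketch, the resulting /etc/config/firewall section looks roughly like this (the forwardings are the optional LAN/admin "backdoor" mentioned above):

config zone
	option name 'ovs'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
# the ovs networks created in the next steps get assigned to this zone

config forwarding
	option src 'lan'
	option dest 'ovs'

config forwarding
	option src 'ovs'
	option dest 'lan'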

Set up a new openwrt interface named ovs for the dummy0 device on each node, using a /32 ip address from the example network 192.168.255.x.
For example:

ip: 192.168.255.255
subnetmask: 255.255.255.255

Assign this interface to the ovs firewall zone

The reason for this is that we want to direct vswitch traffic toward whichever node is announcing this /32 ip address, instead of configuring a vswitch port for every single interface connected to the openvswitch network.
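
A minimal /etc/config/network sketch for this interface (assuming a recent OpenWrt release that uses "option device", and using 192.168.255.253 as this node's address to match the GRE examples further down):

config interface 'ovs'
	option proto 'static'
	option device 'dummy0'
	option ipaddr '192.168.255.253'
	option netmask '255.255.255.255'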

For each network interface that will be part of the openvswitch backbone network, create an individual openwrt network and assign it to the ovs firewall zone (a sketch follows the notes below).
notes:

  • If you are using a mesh network, or an ethernet network with multiple nodes in the backbone, use a /24 network such as 192.168.254.0/24
  • If you are using point-to-point networks between each or some nodes, assign them /30 networks such as 192.168.23.0/30
  • IPv6 networks in the backbone aren't useful because of the additional header size
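
For example, a sketch of a backbone interface over the 802.11s device from the wireless config above (the address is just an assumption for one node in the /24; pick a unique one per node):

config interface 'mesh'
	option proto 'static'
	option device 'm-11s-0'
	option ipaddr '192.168.254.3'
	option netmask '255.255.255.0'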

Configure your physical LAN networks, such as guest, devices, lan, etc... using the following notes

  • In order to support attaching wifi to a lan, each LAN network needs to be defined as a bridge
    I.e. br-lan, br-guest, br-devices
  • You must detach your physical networks from any bridge/vlan, etc... At this stage I recommend connecting one interface directly to br-lan or your "admin" network.
  • Also click the "unconfigure" button next to each physical network (except wan if this is the gateway)

Update the openvswitch configuration using the below example.
/etc/config/openvswitch

config ovs ovs
	option disabled 0
#	option ca '/etc/openvswitch/example_ca.crt'
#	option cert '/etc/openvswitch/example_cert.crt'
#	option key '/etc/openvswitch/example_key.crt'

config ovn_northd north
	option disabled 1

config ovn_controller controller
	option disabled 1

config ovs_bridge
	option disabled 0
	option name 'ovs-vswitch'
	option datapath_desc ''
	option datapath_id ''
	option fail_mode 'standalone'

#These "internal" ports are devices that you will attach to your matching br-lan, br-guest, etc... named by the "port" option. Select a different "tag" to specify different vlans, but avoid tag 1 or 0
config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
	option port 'lan'
	option ofport '1'
	option tag '2'
	option type 'internal'

config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
	option port 'devices'
	option ofport '1'
	option tag '3'
	option type 'internal'

config ovs_port
#If you are setting the initial configuration while connected to lan1, set disabled 1. Otherwise use disabled 0 if you are connecting via WiFi
#If you want a vlan trunk, set the "tag" option to 0
	option disabled 1
	option bridge 'ovs-vswitch'
	option port 'lan1'
	option ofport '1'
	option tag '3'
	option type 'system'

config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
	option port 'lan2'
	option ofport '1'
	option tag '2'
	option type 'system'

config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
	option port 'lan3'
	option ofport '1'
	option tag '2'
	option type 'system'

config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
	option port 'lan4'
	option ofport '1'
	option tag '3'
	option type 'system'

config ovs_port
	option disabled 0
	option bridge 'ovs-vswitch'
#Note: this example is from a router being used as a wifi mesh node, so the WAN port is connected to the openvswitch
	option port 'wan'
	option ofport '1'
	option tag '3'
	option type 'system'

After saving, restart the openvswitch service:

service openvswitch restart

Now you can return to the bridge configuration for each of your networks and add the matching openvswitch internal interface to the bridge (a sketch follows the list below):

  • add "lan" to br-lan
  • add "guest" to br-guest
  • add "devices" to br-devices
  • etc...
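
In /etc/config/network this just means the OVS internal port becomes another bridge member, roughly:

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan'
# plus any physical ports you haven't detached yet

config device
	option name 'br-devices'
	option type 'bridge'
	list ports 'devices'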

You can now try connecting your computer to another physical interface and test whether you can still access your router. If that works, you can remove the physical port from your br-lan.
Then set option disabled from 1 to 0 for the lan1 port in the openvswitch configuration and restart the service again.
Move your computer back to port 1 and confirm you still have access.

Set up frr on each node.
Edit /etc/frr/daemons and set ospfd and bfdd to on.
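
i.e. flip these two lines from "no" to "yes" (bfdd only if you use the optional BFD setup):

ospfd=yes
bfdd=yes
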
Use the below as a base for frr.conf

frr version 8.2
frr defaults traditional
hostname nppvmesh02.abman.home
log syslog
!
password zebra
!
# we blackhole the openvswitch network to drop traffic when a network is down
ip route 192.168.254.0/23 blackhole
!
# we want the dummy interface ip to be announced in the ospf network
interface dummy0
 ip ospf area 0.0.0.0
exit
!
#Example mesh network
interface m-11s-0
 ip ospf area 0.0.0.0
 ip ospf bfd
# non-broadcast may be desirable if we want to quiet down traffic on a mesh network. This is optional
 ip ospf network non-broadcast
 no ip ospf passive
exit
!
#Example LAN or PtP wifi network
interface lan4
 ip ospf area 0.0.0.0
 ip ospf bfd
 #In the case of ethernet, we set the cost to 1 to make it the preferred uplink
 ip ospf cost 1
 no ip ospf passive
exit
!
router ospf
 passive-interface default
#when setting a mesh network non-broadcast, we need to specify each node's IP on the openvswitch network
 neighbor 192.168.254.1
 neighbor 192.168.254.2
exit
!
access-list vty seq 5 permit 127.0.0.0/8
access-list vty seq 10 deny any
!
line vty
 access-class vty
exit
!
# bfd peers will allow nodes to be dropped instantly from ospf if they are offline. Makes a mesh network noisy, optional
bfd
 peer 192.168.254.1
 exit
 !
 peer 192.168.254.2
 exit
 !
exit
!
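
Then restart frr on each node so the new configuration is loaded:

service frr restart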

Run the following on each node to set up inter-node traffic. These "ports" encapsulate traffic between each node in a GRE packet, like a virtual vlan trunk (or even as an access port if configured, but that is out of scope for this guide).
In this example we are on "node2", so you will notice that ports for node1 and node3 are specified.
When configuring node1 and the others, switch the node names, select the correct "remote_ip", and use that node's own dummy0 ip for "local_ip".

ovs-vsctl add-port ovs-vswitch node1 -- set interface node1 type=gre options:remote_ip=192.168.255.255 options:local_ip=192.168.255.253
ovs-vsctl add-port ovs-vswitch node3 -- set interface node3 type=gre options:remote_ip=192.168.255.254 options:local_ip=192.168.255.253
ovs-vsctl set int ovs-vswitch mtu_request=1476
ovs-vsctl set Bridge ovs-vswitch stp_enable=false
ovs-vsctl set Bridge ovs-vswitch rstp_enable=true

You can verify the configuration by running:

ovs-vsctl show
0c755014-8508-407a-9a3b-6fa7df4239a4
    Bridge ovs-vswitch
        fail_mode: standalone
        Port node3
            Interface node3
                type: gre
                options: {local_ip="192.168.255.253", remote_ip="192.168.255.254"}
        Port node1
            Interface node1
                type: gre
                options: {local_ip="192.168.255.253", remote_ip="192.168.255.255"}
        Port lan1
            tag: 2
            Interface lan1
                type: system
        Port lan3
            tag: 2
            Interface lan3
                type: system
        Port lan
            tag: 2
            Interface lan
                type: internal
        Port ovs-vswitch
            Interface ovs-vswitch
                type: internal
        Port lan2
            tag: 2
            Interface lan2
                type: system
        Port lan4
            tag: 2
            Interface lan4
                type: system
        Port devices
            tag: 3
            Interface devices
                type: internal
    ovs_version: "2.15.3"

If you goof, just run ovs-vsctl del-port ovs-vswitch node1 to delete the port and reconfigure it.

Next, add the same commands you just ran on each node to the openwrt startup script. This ensures your inter-node ports are configured again on reboot if they ever get damaged.
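
For example, on node2 the lines in /etc/rc.local (before the final "exit 0") would look like this; the --may-exist flag keeps the add-port commands harmless if the ports already exist:

ovs-vsctl --may-exist add-port ovs-vswitch node1 -- set interface node1 type=gre options:remote_ip=192.168.255.255 options:local_ip=192.168.255.253
ovs-vsctl --may-exist add-port ovs-vswitch node3 -- set interface node3 type=gre options:remote_ip=192.168.255.254 options:local_ip=192.168.255.253
ovs-vsctl set int ovs-vswitch mtu_request=1476
ovs-vsctl set Bridge ovs-vswitch stp_enable=false
ovs-vsctl set Bridge ovs-vswitch rstp_enable=true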

Now we can reboot the gateway and each node to confirm they come online. You should be able to ping the openvswitch backbone IP of each node to confirm connectivity between them, and then ping the LAN ip of your networks (provided the firewall is set up).

You can check whether traffic is passing over the openvswitch gre ports by installing tcpdump and running:

tcpdump -i any proto gre

This will dump all traffic going over the virtual openvswitch wires. If no traffic is flowing, check your route table and ospf neighbors by running:

vtysh

sh ip ospf nei

sh ip ospf route

sh bfd peers

sh ip route

If you do not have any neighbors listed, troubleshoot the frr config related to the interface between each node. If you do not see the dummy0 ip addresses in the routing table, troubleshoot your frr config for the dummy0 section. openvswitch will not be able to transmit data to another node until the remote dummy0 address is installed in the kernel routing table, as shown by "sh ip route".

If you chose to use bfd, ospf will still connect via hello packets even if bfd is not working, but you can check whether the peer relationship is up so that you get fast state changes.

At this point, you hopefully have a highly resilient openvswitch routed network across your mesh/AP nodes.
I've found that dawn has issues connecting to other nodes if you use a LAN interface in /etc/config/umdns. So I added

list interface ovs

under the default "list network lan"

Finally, make sure that DHCP is turned off on all nodes except the gateway, unless you have set up primary/backup dhcp servers.
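
On the non-gateway nodes, that amounts to ignoring DHCP on each LAN network, e.g.:

uci set dhcp.lan.ignore='1'
# repeat for guest, devices, etc...
uci commit dhcp
service dnsmasq restart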

If you can't figure this out based on these general examples, this is probably not for you, but it was a fun project, and being able to roam on a single wifi SSID between routers connected by any number of uplinks is great.

What's interesting here is that the nodes in the backbone network are handing vswitch traffic between each other, almost like an underground black market, while the devices on the different LAN vlans think they're on a traditional flat network, including ARP, multicast, and broadcasts.

Bonus:
If a node is isolated, you don't want wifi devices connecting to it. So I run a script every minute that checks whether a particular IP responds to ping and, if not, kindly dissociates clients from its networks (I say kindly because it notifies them that the AP is over capacity, and it's up to them to roam to another BSSID).

In scheduled tasks (crontab), add:

*/1 * * * * /root/dissocwifi.sh

iwinfo

m-11s-0   ESSID: "MeshNet"
wlan0     ESSID: "Netw1"
wlan0-1   ESSID: "Netw2"
wlan1-1   ESSID: "Netw1"

This requires the package hostapd-utils for hostapd_cli.
Run iwinfo to get the interface names used with -i matched up to your SSIDs.
You must configure this for each SSID you want to kick clients off of in an isolated state.
This is superior to turning off SSIDs, because in a mesh11s network, changes to the hostapd configuration via openwrt can reset the mesh11s network.

dissocwifi.sh

#!/bin/sh

# If the gateway is unreachable this node is isolated, so kick all wifi clients
# and let them roam to another AP. Cron re-runs this script every minute.
ping -c4 -q 192.168.2.1 > /dev/null 2>&1

if [ $? -ne 0 ]; then

timetorun=60   # In seconds
i=0
stoptime=$((timetorun + $(date +%s)))
while [ $(date +%s) -lt $stoptime ]; do
	if [ "$(hostapd_cli -i wlan0 list_sta)" ]; then
		i=$((i+1))
		hostapd_cli -i wlan0 list_sta | tr '\n' '\0' | xargs -0 -I '%%' ubus call hostapd.wlan0 del_client '{"addr":"%%","reason":5,"deauth":false,"ban_time":60000}'
	fi
	if [ "$(hostapd_cli -i wlan0-1 list_sta)" ]; then
		i=$((i+1))
		hostapd_cli -i wlan0-1 list_sta | tr '\n' '\0' | xargs -0 -I '%%' ubus call hostapd.wlan0-1 del_client '{"addr":"%%","reason":5,"deauth":false,"ban_time":60000}'
	fi
	if [ "$(hostapd_cli -i wlan1-1 list_sta)" ]; then
		i=$((i+1))
		hostapd_cli -i wlan1-1 list_sta | tr '\n' '\0' | xargs -0 -I '%%' ubus call hostapd.wlan1-1 del_client '{"addr":"%%","reason":5,"deauth":false,"ban_time":60000}'
	fi

	# No clients were found on any SSID, so exit early
	if [ $i -eq 0 ]; then
		exit
	fi
done

fi