Help with Softflowd and exporting netflow data to PRTG

RuralRoots · June 25, 2020, 4:59pm

I set up an old dual-boot laptop that was gathering dust with PRTG and dedicated it as a network monitor, mainly for SNMP stats from the router plus additional sensors covering certain aspects of other connected devices. It has worked quite well.

I wanted more granularity on traffic specifically, so I set up a NetFlow sensor in PRTG to look for NetFlow data on routerIP:5555 and installed softflowd and nfdump.

My softflowd config is:

root@MyDomain:~# cat /etc/config/softflowd
config softflowd
        option enabled        '1'
        option interface      'br-lan'
        option pcap_file      ''
        option timeout        '60'
        option max_flows      '8192'
        option host_port      '10.10.1.100:5555'
        option pid_file       '/var/run/softflowd.pid'
        option control_socket '/var/run/softflowd.ctl'
        option export_version '5'
        option hoplimit       ''
        option tracking_level 'full'
        option track_ipv6     '0'
        option sampling_rate  '1'
root@MyDomain:~#

I did:

root@MyDomain:~# /etc/init.d/softflowd start

root@MyDomain:~# /etc/init.d/softflowd status
running
root@MyDomain:~# /etc/init.d/softflowd running
root@MyDomain:~#

I am not seeing anything going through to the sensor.

I also ran a NetFlow test tool provided by PRTG, and it just sits there waiting for data.

It all looks right, and I get no specific errors from either the router or the monitor, but no flows. ???

dl12345 · June 25, 2020, 5:40pm

I use softflowd to export flows into an ELK stack where I use Elastiflow to give me nice pretty dashboards. For some reason (I can't remember why as I set it up literally years ago), I don't use the init script that ships with the package. It might be that the switches it uses didn't work for me.

Remember, though, that you need to leave it running for a while before it expires flows and sends them to the server. It won't happen immediately.
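If you don't want to wait for flows to time out on their own, softflowctl's expire-all command asks softflowd to expire, and therefore export, every flow it is currently tracking. A minimal sketch, assuming the default control socket /var/run/softflowd.ctl (pass a different path if softflowd was started with -c); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: force softflowd to export all tracked flows now,
# then print its counters. Assumes softflowctl is installed and softflowd
# is listening on the given control socket (default path shown).
softflowd_force_export() {
    ctl=${1:-/var/run/softflowd.ctl}

    # No socket usually means softflowd isn't running (or uses another -c path).
    if [ ! -S "$ctl" ]; then
        echo "no control socket at $ctl" >&2
        return 1
    fi

    # expire-all: expire every tracked flow so it is sent to the collector.
    softflowctl -c "$ctl" expire-all || return 1
    # statistics: show flow/export counters to confirm something went out.
    softflowctl -c "$ctl" statistics
}
```

Run softflowd_force_export on the router and watch the collector; if the counters move but nothing arrives, the problem is between the router and the collector rather than in flow expiry.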

Here's my init script:

#!/bin/sh /etc/rc.common

START=99

IFACE=br-lan
PORT=<listen port of logstash server>
SERVER=<ip of logstash server>
PIDFILE=/var/run/softflowd-logstash.pid
CTLFILE=/var/run/softflowd-logstash.ctl
SOFTFLOWD=/usr/sbin/softflowd
SOFTFLOWCTL=/usr/sbin/softflowctl

EXTRA_COMMANDS="status"
EXTRA_HELP="        status Show softflowd statistics"

USE_PROCD=1

reload_service()
{
	stop_service
	start_service
}

restart()
{
	stop_service
	start_service
}

start_service()
{
	procd_open_instance
	procd_set_param command /usr/bin/taskset 1
	procd_append_param command "${SOFTFLOWD}" -i "${IFACE}" -6 -v 9 -n "${SERVER}:${PORT}" -p "${PIDFILE}"
	procd_close_instance
}


service_triggers()
{
    procd_add_reload_trigger "network"
}

status()
{
	[ -S ${CTLFILE} ] && ${SOFTFLOWCTL} statistics
}

stop_service()
{
	[ -S ${CTLFILE} ] && {
		
		${SOFTFLOWCTL} shutdown
		rm -f ${PIDFILE} 2> /dev/null
		rm -f ${CTLFILE} 2> /dev/null

	}
}

EDIT: if you use this script, just remove the line with taskset on it. I have a multi-core router, so I use this to pin softflowd to the second core. It uses a lot of machine resources on gigabit links.
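A note on the taskset argument: it is a CPU bitmask, not a core number. Bit N selects CPU N, so `taskset 1` pins to the first core (CPU 0); the second core is mask `2`, or `taskset -c 1` using the list form. A small helper that converts a zero-based core index into the hex mask taskset expects (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: convert a zero-based CPU index into a taskset mask.
# Bit N of the mask selects CPU N: core 0 -> 1, core 1 -> 2, core 4 -> 10 (hex).
core_to_mask() {
    printf '%x\n' $((1 << $1))
}
```

In the script above that would read, for example, procd_set_param command /usr/bin/taskset "$(core_to_mask 1)" to target the second core.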

lleachii · June 25, 2020, 5:45pm

To be honest:

I've never been able to configure that file successfully.
Nor was I sure whether it was possible to define multiple PHYs.

So I added lines like these to /etc/rc.local instead (an example with WAN as eth0.2 and an HE IPv6 tunnel):

softflowd -i eth0.2 -v 9 -n 192.168.xxx.xxx:xxxxx -d &
softflowd -i 6in4-henet -v 9 -n 192.168.xxx.xxx:xxxxx -d &
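When running more than one instance this way, giving each its own control socket with -c lets softflowctl -c talk to one specific instance. A sketch under that assumption; the function name and the /var/run/softflowd-<iface>.ctl paths are hypothetical, and the collector stays a placeholder as above:

```shell
#!/bin/sh
# Hypothetical rc.local helper: start one softflowd per interface, kept in
# the foreground with -d and backgrounded with &, as in the lines above.
# Each instance gets its own control socket so softflowctl -c can reach it.
SOFTFLOWD=${SOFTFLOWD:-/usr/sbin/softflowd}

start_flow_instance() {
    iface=$1        # capture interface, e.g. eth0.2
    collector=$2    # collector as host:port
    "$SOFTFLOWD" -i "$iface" -v 9 -n "$collector" -d \
        -c "/var/run/softflowd-$iface.ctl" &
}

# Example calls (fill in the collector placeholder first):
# start_flow_instance eth0.2     192.168.xxx.xxx:xxxxx
# start_flow_instance 6in4-henet 192.168.xxx.xxx:xxxxx
```

With that, softflowctl -c /var/run/softflowd-eth0.2.ctl statistics would query just the WAN instance.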

Hope this helps.

RuralRoots · June 25, 2020, 6:45pm

That was another question I had ("can I offload to CPU2?"), but first things first: get it running.

Your script caught my eye with reference to the softflowd.ctl file. It just doesn’t get created. Something to look at.

dl12345:

It won't happen immediately.

Two hours now and nothing.

Seems that’s now three of us with the same conclusion - it doesn’t complain, but it sure doesn’t work per the docs.

Thank you kindly for your pointers and scripts. They give me some focus as to where to look. My brain is getting so fried from trying so many different approaches that I'm repeating previous failed attempts.

I’ll update my progress.

dl12345 · June 25, 2020, 7:57pm

You can pin a process to a specific core with the taskset utility, which is part of util-linux, although it isn't packaged in OpenWrt. You can add it with this patch to the util-linux Makefile (which also adds the renice utility, another useful tool):

diff --git a/package/utils/util-linux/Makefile b/package/utils/util-linux/Makefile
index 77b4b98b56..4411c4b814 100644
--- a/package/utils/util-linux/Makefile
+++ b/package/utils/util-linux/Makefile
@@ -396,6 +396,15 @@ define Package/partx-utils/description
  contains partx, addpart, delpart
 endef
 
+define Package/renice
+$(call Package/util-linux/Default)
+  TITLE:=Alter the priority of running processes
+endef
+
+define Package/renice/description
+ Alter the priority of running processes
+endef
+
 define Package/script-utils
 $(call Package/util-linux/Default)
   TITLE:=make and replay typescript of terminal session
@@ -441,6 +450,15 @@ define Package/swap-utils/description
  contains: mkswap, swaplabel
 endef
 
+define Package/taskset
+$(call Package/util-linux/Default)
+  TITLE:=Set or retrieve a task's CPU affinity
+endef
+
+define Package/taskset/description
+ Set or retrieve a task's CPU affinity
+endef
+
 define Package/unshare
 $(call Package/util-linux/Default)
   TITLE:=unshare userspace tool
@@ -726,6 +744,11 @@ define Package/partx-utils/install
 	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/sbin/delpart $(1)/usr/sbin/
 endef
 
+define Package/renice/install
+	$(INSTALL_DIR) $(1)/usr/bin
+	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/bin/renice $(1)/usr/bin/
+endef
+
 define Package/script-utils/install
 	$(INSTALL_DIR) $(1)/usr/bin
 	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/bin/script $(1)/usr/bin/
@@ -748,6 +771,11 @@ define Package/swap-utils/install
 	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/sbin/swaplabel $(1)/usr/sbin/
 endef
 
+define Package/taskset/install
+	$(INSTALL_DIR) $(1)/usr/bin
+	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/bin/taskset $(1)/usr/bin/
+endef
+
 define Package/unshare/install
 	$(INSTALL_DIR) $(1)/usr/bin
 	$(INSTALL_BIN) $(PKG_INSTALL_DIR)/usr/bin/unshare $(1)/usr/bin/
@@ -810,10 +838,12 @@ $(eval $(call BuildPackage,nsenter))
 $(eval $(call BuildPackage,prlimit))
 $(eval $(call BuildPackage,rename))
 $(eval $(call BuildPackage,partx-utils))
+$(eval $(call BuildPackage,renice))
 $(eval $(call BuildPackage,script-utils))
 $(eval $(call BuildPackage,setterm))
 $(eval $(call BuildPackage,sfdisk))
 $(eval $(call BuildPackage,swap-utils))
+$(eval $(call BuildPackage,taskset))
 $(eval $(call BuildPackage,unshare))
 $(eval $(call BuildPackage,uuidd))
 $(eval $(call BuildPackage,uuidgen))

RuralRoots · June 27, 2020, 1:04pm

Success - sort of anyway.

BusyBox v1.31.1 () built-in shell (ash)

 OpenWrt SNAPSHOT, r13342-e35e40ad82

root@MyDomain:~# softflowd -D -v 5 -i br-lan -n 10.10.1.100:5555 -T full
Using br-lan (idx: 0)
softflowd v1.0.0 starting data collection
Exporting flows to [10.10.1.100]:5555
ADD FLOW seq:1 [10.10.1.1]:22 <> [xx.xx.x.xxx]:55954 proto:6 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:2 [10.10.1.1]:161 <> [xx.xx.x.xxx]:50811 proto:17 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:3 [10.10.1.1]:0 <> [xx.xx.x.xxx]:0 proto:2 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:4 [10.10.1.1]:161 <> [xx.xx.x.xxx]:50812 proto:17 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:5 [xx.xx.x.xxx]:0 <> [xx.xx.x.xxx]:0 proto:2 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:6 [10.10.1.1]:161 <> [xx.xx.x.xxx]:50813 proto:17 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:7 [xx.xx.x.xxx]:57682 <> [xx.xx.x.xxx]:443 proto:6 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:8 [xx.xx.x.xxx]:53618 <> [xx.xx.x.xxx]:443 proto:6 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:9 [10.10.1.1]:161 <> [xx.xx.x.xxx]:50814 proto:17 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00 
ADD FLOW seq:10 [xxxx::xxxx:xxxx:xxxx:xxxx]:63726 <> [ffxx::c]:1900 proto:17 vlan>:0 vlan<:0  ether:00:00:00:00:00:00 <> 00:00:00:00:00:00

But it won't run in anything other than debug mode (the -D command-line switch). All the other command-line options match the /etc/config/softflowd config file.

Setting it to run as a daemon on boot puts it into a crash loop, i.e.:

Reboot router
Verify daemon status

root@MyDomain:~# /etc/init.d/softflowd status
running

Verify the socket is created

root@MyDomain:~# ls -ltr /var/run/soft*
srwxr-xr-x    1 root     root             0 Jun 26 10:59 softflowd.ctl

Check that the socket is collecting

root@MyDomain:~# softflowctl statistics
ctl connect("/var/run/softflowd.ctl") error: Connection refused

HMMMMM!!!!!

root@MyDomain:~# logread -e softflowd
Fri Jun 26 10:12:37 2020 daemon.info procd: Instance softflowd::instance1 s in a crash loop 6 crashes, 0 seconds since last crash
Fri Jun 26 11:35:19 2020 daemon.info procd: Instance softflowd::instance1 s in a crash loop 7 crashes, 0 seconds since last crash
Fri Jun 26 14:49:20 2020 daemon.info procd: Instance softflowd::instance1 s in a crash loop 6 crashes, 0 seconds since last crash

So

root@MyDomain:~# /etc/init.d/softflowd stop
root@MyDomain:~# /etc/init.d/softflowd status
inactive

And

root@MyDomain:~# softflowd -D -v 5 -i br-lan -n 10.10.1.100:5555 -T full

Works, but only in debug mode.

lleachii · June 27, 2020, 1:40pm

I'm confused. Why didn't you use the command syntax I noted above?

Also, why didn't you place it where I noted?

Lastly, I'm not sure why you're messing with the init.d when you're typing in the command to run it.

RuralRoots · June 27, 2020, 2:01pm

It is, according to the docs I've seen: apparently you just add another instance to the config file. I can't test or verify that at the moment, of course.
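For reference, "another instance in the config file" would look roughly like a second config softflowd section. This assumes the packaged init script starts one procd instance per section (the softflowd::instance1 name in the procd log hints at numbered instances, but check /etc/init.d/softflowd on your build). The second interface, port 5556 and the -henet file names are hypothetical placeholders:

```text
config softflowd
        option enabled        '1'
        option interface      'br-lan'
        option host_port      '10.10.1.100:5555'
        option export_version '5'
        option pid_file       '/var/run/softflowd.pid'
        option control_socket '/var/run/softflowd.ctl'

config softflowd
        option enabled        '1'
        option interface      '6in4-henet'
        option host_port      '10.10.1.100:5556'
        option export_version '5'
        option pid_file       '/var/run/softflowd-henet.pid'
        option control_socket '/var/run/softflowd-henet.ctl'
```

Keeping pid_file and control_socket distinct per section avoids two instances contending for the same paths.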

In any event, I know what's happening now, and I have a workaround in the interim using /etc/rc.local. Thanks to you both for the assist.

Off to get myself informed. Thanks.

dl12345 · June 27, 2020, 2:08pm

Both @lleachii and I use almost identical command-line options, neither of us uses the -D flag, and it works fine for us this way. I also get the softflowd.ctl file created:

root@openwrt:~# ls -l /var/run/soft*
-rw-r--r-- 1 root root 5 Jun 12 02:01 /var/run/softflowd.pid
srwxr-xr-x 1 root root 0 Jun 12 02:01 /var/run/softflowd.ctl

If you really want it controlled by procd, just use the script I posted and change the block

start_service()
{
	procd_open_instance
	procd_set_param command /usr/bin/taskset 1
	procd_append_param command "${SOFTFLOWD}" -i "${IFACE}" -6 -v 9 -n "${SERVER}:${PORT}" -p "${PIDFILE}"
	procd_close_instance
}

to

start_service()
{
	procd_open_instance
	procd_set_param command "${SOFTFLOWD}" -i "${IFACE}" -6 -v 9 -n "${SERVER}:${PORT}" -p "${PIDFILE}"
	procd_close_instance
}

dl12345 · June 27, 2020, 2:10pm

Incidentally, you almost certainly don't want to run multiple instances. It's a real resource hog: on a gigabit link it uses a substantial amount of CPU.

anon50098793 · June 27, 2020, 2:58pm

The -D (and -d) switch keeps softflowd running as root.
Without -D, softflowd runs as nobody...

nobody (the PRIVDROP_USER) is unable to access /dev/null ... so the process fails. You can see this with:

strace softflowd -v 5 -i br-lan -n 10.10.1.100:5555 -T full 2>&1 | grep -E '(chdir|/dev/null)'

RuralRoots · June 27, 2020, 4:00pm

Yes, I understood what you were saying - just invoke the instance through rc.local and be done with it. Unfortunately, I'm a bit of a mule when it comes to things that don't work. I am currently using your workaround in fact until I can figure out why it doesn't work in daemon mode (crash loop).

The "enabled" option was set to 1 in the config, so when the process was first called it was set up to run as a daemon on boot (init.d). Because it was running as a service, it has to be referenced explicitly through init.d to control it. I wasn't using init.d to run it from the CLI, only to control the already existing process:
Reboot router
Verify daemon status (init.d)
Verify the socket is created (init.d)
Check that the socket is collecting
Stop the process (init.d)
Verify the process is stopped (init.d)
Invoke the process from CLI in debug mode to watch process on stdout

That's what started this whole thing, gents.
Invoking it from /usr/sbin by typing 'softflowd' on the command line, expecting it to use /etc/config/softflowd - doesn't work.

Invoking it from the command line without the -D switch - doesn't work
@dl12345 's modded script works
@lleachii 's rc.local example works
Invoking it from the command line with the -D switch - works

Very, very true. I'm just noting that the docs say it is possible.

lleachii · June 27, 2020, 4:05pm

...I'm not sure why you don't just set option enabled '0'... (I guess that's the mule in you, so OK).

Try a stable release of OpenWrt and see if it happens. I've only had issues like that on snapshots (and it changes from one release to the next, so upgrading snapshots may work too).

dl12345 · June 27, 2020, 4:16pm

Any service run via a procd script MUST run in the foreground, not as a daemon.

https://openwrt.org/docs/guide-developer/procd-init-scripts

procd init script parameters

A procd init script is similar to an old init script, but with a few differences:

procd expects services to run in the foreground
Different shebang line: #!/bin/sh /etc/rc.common
Explicitly use procd: USE_PROCD=1
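Applied to the script posted earlier, that would mean passing -d so softflowd stays in the foreground where procd can supervise it, and passing -c so the control socket actually lands at the CTLFILE path that status() and stop_service() test for (without -c, softflowd uses its default /var/run/softflowd.ctl). A sketch of just that block, assuming the same variable names as the earlier script; the SERVER and PORT values are placeholders:

```shell
#!/bin/sh
# Sketch of a procd start_service() that keeps softflowd in the foreground.
# In a real init script this lives under the "#!/bin/sh /etc/rc.common"
# shebang with USE_PROCD=1; the procd_* helpers come from rc.common/procd.sh.
IFACE=br-lan
SERVER=192.0.2.10          # placeholder collector address
PORT=2055                  # placeholder collector port
CTLFILE=/var/run/softflowd-logstash.ctl
SOFTFLOWD=/usr/sbin/softflowd

start_service()
{
	procd_open_instance
	# -d: don't daemonise, so procd tracks the real softflowd process
	# -c: put the control socket where status()/stop_service() look for it
	procd_set_param command "${SOFTFLOWD}" -d -i "${IFACE}" -6 -v 9 \
		-n "${SERVER}:${PORT}" -c "${CTLFILE}"
	# let procd restart softflowd if it exits unexpectedly
	procd_set_param respawn
	procd_close_instance
}
```

The -p pid file is left out here because procd keeps track of the process itself; add it back if something else reads that file.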

lleachii · June 27, 2020, 4:50pm

I'm lost...perhaps you accidentally replied to me.

I'm not sure why you boldfaced this. Let's all look at the script included with OpenWrt:

        procd_open_instance
        procd_set_param command /usr/sbin/softflowd -d $args${pid_file:+ -p $pid_file}
        procd_set_param respawn
        procd_close_instance

As you see, -d is used.

dl12345 · June 27, 2020, 4:50pm

Yes, I replied to the wrong post.

It was simply copy/pasted from the online docs, where it is boldfaced.

The post was a comment to @RuralRoots that a procd service isn't meant to be run in the background anyway, since he's been talking about it not working when run in the background.

lleachii · June 27, 2020, 4:53pm

I was prompting this reply from you...once I realized.

RuralRoots · June 27, 2020, 5:13pm

anon50098793:

The -D (and -d) switch keeps softflowd running as root.

Without -D, softflowd runs as nobody...

nobody (the PRIVDROP_USER) is unable to access /dev/null ... so the process fails.
strace softflowd -v 5 -i br-lan -n 10.10.1.100:5555 -T full 2>&1 | grep -E '(chdir|/dev/null)'

From my references, and thus my understanding (thanks):

-d Don't daemonise (run in foreground)
-D Debug mode: adds verbosity and tracks v6 flows

strace: thanks. It reminds me of stepping through my code in the sandbox.

RuralRoots · June 27, 2020, 5:39pm

Thanks, it was clearly addressed at me and I did take note.

I'm just going to sit back at this point and ruminate on all this until I can digest it all. Thanks to you all. Time for due diligence on my part.

RuralRoots · July 3, 2020, 3:18pm

I believe I found the cause of my issues.

Just for my own information, I wanted to see the impact of additional flows, so I added another instance to rc.local and rebooted. Only the first instance showed up in ‘top’. ????

I added echo commands to rc.local to track the process, and logread -e “echo-value” clearly indicates that both instances were invoked, but again only the first instance appears in the process tree. ?????

Finally, logread -e “instance2 interface-name” showed that the instance2 interface doesn’t exist (yet) at that point.
Adding sleep 1m before instance2 in the rc.local fixed that and all is good.
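Instead of a fixed sleep 1m, rc.local could poll until the interface actually exists and only then start the instance. A sketch, assuming the interface shows up under /sys/class/net once it is created; the function name, the NET_DIR override (only there so the function can be tried without real interfaces) and the timeout are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds (default 60) for network
# interface $1 to exist, checking once per second.
# Returns 0 as soon as it appears, 1 on timeout.
NET_DIR=${NET_DIR:-/sys/class/net}

wait_for_iface() {
    iface=$1
    limit=${2:-60}
    waited=0
    while [ ! -e "$NET_DIR/$iface" ]; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example rc.local use, backgrounded so rc.local itself doesn't block
# (collector placeholder as in the earlier examples):
# ( wait_for_iface 6in4-henet 120 && softflowd -i 6in4-henet -v 9 -n 192.168.xxx.xxx:xxxxx -d ) &
```

Compared with a fixed delay, this starts the instance as soon as the interface exists and gives up cleanly if it never appears.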

So, I turned my attention to the init.d script and the first thing that hit me was that it had a start priority of ‘50’. It was running prior to the existence of the target interface and thus failing. Changed start priority to ‘99’ and all is good - I see both instances in ‘top’ and both collectors are now receiving data.
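One detail worth knowing when changing START: the number is baked into the /etc/rc.d/S<NN><name> symlink when the service is enabled, so after editing it you need to disable and re-enable the service for the new priority to take effect. A sketch; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the START= line of an init script in place.
# Usage: set_start_priority /etc/init.d/softflowd 99
set_start_priority() {
    script=$1
    prio=$2
    sed -i "s/^START=[0-9]*/START=$prio/" "$script"
}

# On the router, refresh the /etc/rc.d symlink afterwards:
# set_start_priority /etc/init.d/softflowd 99
# /etc/init.d/softflowd disable
# /etc/init.d/softflowd enable
# ls /etc/rc.d/ | grep softflowd    # should now list S99softflowd
```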

So, my original issue is solved - thank you all for adding to my personal KB.

Now, I’ve come across another issue - same subject.

Occasionally the instance2 interface restarts based on ‘some’ trigger and instance2 terminates. Is there any way of re-invoking instance2 based on an IFUP of the instance2 interface? In the case of this particular application at least, it would make sense to start the flow acquisition whenever the target interface comes up.
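One way to do that on OpenWrt is a hotplug script: scripts in /etc/hotplug.d/iface/ are run on interface events with ACTION (for example ifup or ifdown) and INTERFACE (the logical interface name from /etc/config/network) set in the environment. A sketch, assuming a logical interface called henet whose device is 6in4-henet; the file name, interface names, socket path and collector are all placeholders:

```shell
#!/bin/sh
# Sketch for a hypothetical /etc/hotplug.d/iface/30-softflowd.
# Restarts one softflowd instance when its interface comes up and stops
# it when the interface goes down.
FLOW_IFACE=henet                   # logical interface name (placeholder)
FLOW_DEV=6in4-henet                # device softflowd captures on (placeholder)
FLOW_CTL=/var/run/softflowd-henet.ctl
COLLECTOR=192.168.xxx.xxx:xxxxx    # placeholder, fill in before use
SOFTFLOWD=${SOFTFLOWD:-/usr/sbin/softflowd}
SOFTFLOWCTL=${SOFTFLOWCTL:-/usr/sbin/softflowctl}

softflowd_stop_instance() {
    # Same idea as stop_service() in the init script posted earlier.
    if [ -S "$FLOW_CTL" ]; then
        "$SOFTFLOWCTL" -c "$FLOW_CTL" shutdown
        rm -f "$FLOW_CTL"
    fi
}

softflowd_iface_event() {
    [ "$INTERFACE" = "$FLOW_IFACE" ] || return 0
    case "$ACTION" in
        ifup)
            softflowd_stop_instance
            # -d keeps softflowd in the foreground; & detaches it from hotplug.
            "$SOFTFLOWD" -i "$FLOW_DEV" -v 9 -n "$COLLECTOR" -d -c "$FLOW_CTL" &
            ;;
        ifdown)
            softflowd_stop_instance
            ;;
    esac
    return 0
}

softflowd_iface_event
```

Other events (such as ifupdate) are ignored, so an already running instance is left alone.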

I am also finding very little information about script interpreters: the softflowd script, for example, uses rc.common as its interpreter. Any hints on where I can find out more about how parameters are passed, how placeholder variables get their values, the interpreter syntax, ...?

Effectively, I would like to learn how to “step through” the process flow, if that makes sense.
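A rough picture of what the #!/bin/sh /etc/rc.common shebang does: running /etc/init.d/softflowd start makes the kernel execute /bin/sh /etc/rc.common /etc/init.d/softflowd start, so rc.common receives the init script path as $1 and the action as $2. It then sources the init script (which defines START, start_service() and so on) and calls the requested action; with USE_PROCD=1, rc.common's own start()/stop() call the script's start_service()/stop_service(). The toy below is a heavily simplified, hypothetical model of that dispatch for experimenting, not the real /etc/rc.common:

```shell
#!/bin/sh
# Toy model (hypothetical, heavily simplified) of /etc/rc.common dispatch:
# $1 is the init script, $2 the action, remaining arguments are passed on.
toy_rc_common() {
    initscript=$1
    action=${2:-help}
    if [ $# -ge 2 ]; then shift 2; else shift $#; fi

    # Source the init script so its variables and functions exist here.
    . "$initscript"

    # procd-style scripts define start_service/stop_service; map actions onto them.
    case "$action" in
        start)   start_service "$@" ;;
        stop)    stop_service "$@" ;;
        restart) stop_service "$@"; start_service "$@" ;;
        *)       echo "toy_rc_common: unsupported action '$action'" >&2; return 1 ;;
    esac
}
```

To step through the real thing on the router, run the same command line the shebang produces with shell tracing on, e.g. sh -x /etc/rc.common /etc/init.d/softflowd start, which prints each line as rc.common and the init script execute it. The procd_* helpers it calls are defined in /lib/functions/procd.sh.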