Open Source DPI and Network Intelligence Engine (Beta)

pbaldwin · January 17, 2020, 10:35pm


Hi OpenWRT Community,

We're a Canadian company that has developed two bits of technology that can help with managing networks at the edge of the Internet:

An open source DPI engine - the Netify Agent (netifyd) - that can detect protocols, applications, and other fun network tidbits. The protocol detection part of the engine is currently based on nDPI, but we’re always adding fun features and hooks to get better network visibility.
A cloud service (gasp!) - Netify - that provides complete network traffic visibility and intelligence (screenshots) using the aforementioned DPI engine.

For network administrators, the solution is a great tool for managing dozens, hundreds or thousands of edge gateways. For integrators, the technologies have been integrated into SD-WAN devices, IoT gateways, and firewalls. Below is a pfSense screenshot showing WhatsApp and Facebook getting blocked by the Netify Agent DPI engine.

Since OpenWrt is a great solution for the edge of the network, we’re starting to provide support for the platform. With the release of OpenWrt 19.07, you can easily install the Netify DPI agent the usual OpenWrt way. More on this below.

Please keep in mind, DPI in tight spaces is tricky, but we’re going to take a crack at it!

Getting Technical

OpenWrt users tend to lean toward the technical side, so let's get down to the details. The open-source Netify Agent (netifyd) uses deep packet inspection (DPI) to extract useful metadata from a network conversation:

Application
Protocol (yes, encrypted protocols too)
SSL ciphers, SNI, certificate names, etc.
Hostnames
User agent strings
Torrent hashes
DHCP fingerprints
SSL fingerprints
and more

The Netify Agent only analyzes the first few packets in a network conversation and then sends a JSON-encoded stream of metadata to a socket interface (TCP/IP and Unix sockets). We’re not interested in the payload, just the metadata in a network connection. Third party tools can connect to this data stream and do all sorts of different things - firewalling, QoS, reporting, etc.

For example, the screenshot below is part of a stream of information coming from an Avast client on a Windows 7 desktop making an HTTPS/TLS 1.2 (version: 0x0303) connection to an Avast server (wildcard SSL certificate: "*.avast.com").
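A third-party tool reading this stream first has to split it back into individual JSON documents. Here is a minimal Python sketch that does that for an arbitrary sequence of text chunks, whatever the chunk boundaries. It assumes each message is a JSON object and that messages are separated only by whitespace; the exact framing netifyd puts on the wire isn't described in this post, so check it against your agent version. The function name iter_json_messages is made up for illustration.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_messages(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield JSON documents from a stream of text chunks.

    Assumption: every message is a JSON object and messages are separated
    only by whitespace/newlines. Chunk boundaries may fall anywhere.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()  # raw_decode() rejects leading whitespace
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # incomplete document: wait for the next chunk
            yield obj
            buf = buf[end:]
```

Note that genuinely malformed input would make this sketch buffer forever; a real client should cap the buffer size.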

As you can imagine, it’s possible to build hooks into OpenWrt’s firewall and QoS engines using this data stream. Do you want to throw BitTorrent traffic into a low-priority QoS queue? That’s the hope for 2020, but we’re first going to concentrate on tuning DPI performance.
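As a rough sketch of such a hook, the hypothetical function below picks a QoS class from one decoded flow message. The "flow" / "detected_protocol_name" field names and the class names are assumptions for illustration, not a documented schema; check the Netify Agent JSON documentation for the real layout.

```python
from typing import Any, Mapping

# Hypothetical mapping from detected protocol names to QoS class names.
# The keys must match the protocol names the agent actually reports.
QOS_CLASSES = {
    "BitTorrent": "bulk",
    "DNS": "priority",
}


def qos_class_for(message: Mapping[str, Any], default: str = "best-effort") -> str:
    """Pick a QoS class for one flow message.

    Assumption: flow messages carry a nested "flow" object with a
    "detected_protocol_name" field; adjust to the real schema.
    """
    flow = message.get("flow") or {}
    protocol = flow.get("detected_protocol_name", "")
    return QOS_CLASSES.get(protocol, default)
```

Turning the returned class into an actual firewall mark or queue is left to whatever QoS setup you run on the router.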

Integrators: Kicking the Tires

Do you want to kick the tires in OpenWrt? First, you should know that deep packet inspection (DPI) requires some horsepower, so please don’t try this on low-end hardware! I have it running on an old TP-Link WDR3600. Memory is not a problem, but the CPU load approaches its maximum when I’m pushing the wireless network as hard as I can.

If you are interested in seeing live data on your network, you can use nc (netcat) and jq (JSON processor) to see data flows like the above screenshot. With the release of OpenWrt 19.07, you can install the netifyd engine the usual OpenWrt way.

opkg update
opkg install netifyd

And jq for viewing pretty JSON output:

opkg install jq

The nc tool in OpenWrt can only connect to TCP sockets, not Unix sockets, so you have to add the listen_address[0] line below to enable a TCP socket in the netifyd engine. In /etc/netifyd.conf:

[socket]
listen_path[0] = /var/run/netifyd/netifyd.sock
listen_address[0] = 127.0.0.1
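Before restarting, you can sanity-check that the [socket] section really lists a TCP address. This Python sketch (the function name socket_endpoints is hypothetical) uses the standard configparser module and assumes the file is plain INI with every option inside a section header.

```python
import configparser


def socket_endpoints(conf_text: str) -> dict:
    """Return the Unix socket paths and TCP addresses from a [socket] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names like "listen_path[0]" verbatim
    parser.read_string(conf_text)
    if not parser.has_section("socket"):
        return {"paths": [], "addresses": []}
    section = parser["socket"]
    paths = [v for k, v in section.items() if k.startswith("listen_path[")]
    addresses = [v for k, v in section.items() if k.startswith("listen_address[")]
    return {"paths": paths, "addresses": addresses}
```

For example, socket_endpoints(open("/etc/netifyd.conf").read()) should show 127.0.0.1 under "addresses" once the line above is added.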

Then restart netifyd:

service netifyd restart

Then you can watch the JSON payloads go by on your terminal using netcat and jq:

nc 127.0.0.1 7150 | jq

If you are a developer, you can do all sorts of fun stuff with the data. You can find more information on the DPI engine integrator page.
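If you'd rather consume the stream from a script than from nc | jq, the following Python sketch connects to the same TCP listener (127.0.0.1:7150, as in the nc example) and pretty-prints the first few JSON documents. It assumes documents are JSON objects separated only by whitespace; the function name watch is made up for illustration.

```python
import codecs
import json
import socket


def watch(host: str = "127.0.0.1", port: int = 7150, max_messages: int = 10) -> list:
    """Connect to the agent's TCP listener and pretty-print up to max_messages documents.

    Assumption: documents are JSON objects separated only by whitespace.
    """
    text = codecs.getincrementaldecoder("utf-8")(errors="replace")
    decoder = json.JSONDecoder()
    messages = []
    buf = ""
    with socket.create_connection((host, port), timeout=10) as sock:
        while len(messages) < max_messages:
            data = sock.recv(4096)
            if not data:
                break  # the agent closed the connection
            buf += text.decode(data)  # safe even if a UTF-8 character spans two reads
            while len(messages) < max_messages:
                buf = buf.lstrip()
                try:
                    obj, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    break  # incomplete document: read more bytes
                buf = buf[end:]
                messages.append(obj)
                print(json.dumps(obj, indent=2))
    return messages
```

The 10-second socket timeout means recv raises a timeout error if the agent stays silent that long; raise it or drop it for long-running use.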

Netify Network Intelligence

Well, those are some of the details about the on-device DPI engine. Now I’m going to put on my product hat for the cloud-based Netify product.

product pitch warning

For those of you who want a deep analysis of what’s happening on your network, read on.

This same JSON network data stream is used for Netify - a cloud-based subscription service that provides network intelligence and visibility. Netify makes it possible to manage network resources, identify & inventory devices, enforce company policies, provide forensics, detect weaknesses, and stay on top of cyberthreats. In essence, Netify provides insights to help manage networks and devices. Here are some screenshots.

Features include:

/product pitch warning

What's Free, What's Not

Just to summarize what's free and what's not:

The underlying netifyd deep packet inspection agent is free, open source, and licensed under the GPLv3. We’re big fans of open source ... we used to manage and maintain a Linux distribution (ClearOS).
The Netify cloud-based service is a paid subscription service, starting at $25 per month. Subscription levels can be found on Netify's pricing page. Please feel free to take a test drive with our no-obligation 7-day free trial.

We know, we know... "cloud" is not everyone's cup of tea, but offloading the horsepower to do analysis is necessary. It’s also important for our clients who manage hundreds or thousands of endpoints on the network. Regardless, there’s an option for private hosting for enterprise deployments.

Installation and Configuration

Though the underlying netifyd engine is mature, the OpenWrt environment is new to us, so expect the usual first-release software quality. If you want to try the Netify cloud-based service on your OpenWrt 19.07 system today, you can find installation and configuration instructions here:

Netify for OpenWrt

From time to time, we’ll post updates on the DPI engine in this thread. Feedback and comments are welcome.

juppin · January 17, 2020, 11:25pm

This looks very promising and I already like it without trying it.

Are you also planning an on-device graphical user interface or an integration into LuCI?

I'm asking because, although I've been using pfSense as a firewall for a long time, I'm planning to switch my edge device to OpenWrt, and I haven't done it yet because LuCI lacks many of the great graphical tools that pfSense has.

lantis1008 · January 18, 2020, 12:51am

I'd like to see this written as a netfilter extension.

pbaldwin · January 19, 2020, 8:38pm

Hi juppin,

Are you also planning an on-device graphical user interface or an integration into LuCI?

At this stage, we're just seeing how far we can get with DPI on a small device. However, LuCI integration is likely going to happen at some point in the future... it's really only a matter of when.

pbaldwin · January 19, 2020, 8:54pm

Hi lantis1008,

We have thought about creating tighter integration with netfilter for a couple of reasons:

Speed - it would be able to make traffic decisions a little faster
Resources - it would require less overhead, important for typical OpenWrt hardware deployments

The big disadvantages: it would be a Linux-only feature, and it would slightly break the Unix philosophy.

Regardless, it's certainly technically possible and doesn't require too much development.

lantis1008 · January 19, 2020, 9:40pm

I've certainly been looking for a Layer7-ish replacement for use in Gargoyle for a while. I'd dabbled in nDPI (and the various forks that turn it into a netfilter extension), and while it was great, it was typically unstable and caused more crashes than it was worth.

If this was done (and I certainly would encourage it!), I'd also like to see the patterns/rules be pluggable. I know that some of the rules are quite complex and don't lend themselves to this concept easily, but it should certainly be possible for some of them.
This allows rules to be updated easily when, for example, the IP range for a match changes, rather than requiring a recompiled version.

Anyway, thanks for sharing. I'll certainly keep an eye on where you guys are heading.

pbaldwin · January 20, 2020, 2:07pm

Hi lantis1008,

In Netify, this has already been done to a certain extent. As you have noticed, the nDPI project has munged together the concepts of "protocols" and "applications". So a bandwidth graph might show totals like:

HTTPS: 40%
HTTP: 25%
YouTube: 20%
Facebook: 10%
DNS: 5%

There are protocols (HTTPS, HTTP, DNS) and there are applications that run on top of those protocols (YouTube, Facebook). Mixing the two concepts was a bit problematic for us, so these are separated in Netify.
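To make the distinction concrete, here's a small Python sketch (hypothetical names, made-up flow records) that totals bytes per protocol and per application in two separate tables, so a YouTube-over-HTTPS flow counts toward HTTPS in one table and toward YouTube in the other, instead of the two competing in a single list. It illustrates the idea only; it is not how Netify computes its graphs.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# One hypothetical flow record: (protocol, application or None, bytes).
Flow = Tuple[str, Optional[str], int]


def _percentages(counter: Counter, total: int) -> Dict[str, float]:
    return {name: round(100 * nbytes / total, 1) for name, nbytes in counter.items()}


def traffic_shares(flows: Iterable[Flow]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Compute per-protocol and per-application byte shares (percent) separately."""
    by_protocol: Counter = Counter()
    by_application: Counter = Counter()
    for protocol, application, nbytes in flows:
        by_protocol[protocol] += nbytes
        by_application[application or "Unknown"] += nbytes
    total = sum(by_protocol.values()) or 1  # avoid dividing by zero
    return _percentages(by_protocol, total), _percentages(by_application, total)
```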

Yup! Detecting things like encrypted BitTorrent requires a proper engine (or "dissector" in the language of nDPI). These protocols need to be compiled into the DPI engine - it's hard to get around that. It's not too bad though, protocols are relatively slow moving targets - FTP in 2015 is the same FTP in 2020. Applications, on the other hand, are fast moving targets. Fortunately, there's no need to recompile netifyd code to make changes to application definitions -- the changes can be done via a configuration file.

For example, here's a sample of the Snapchat application definition:

host:"^snapchat.com$",host:".snapchat.com$"@netify.snapchat
host:"^sc-cdn.net$",host:".sc-cdn.net$"@netify.snapchat
host:"^sc-static.net$",host:".sc-static.net$"@netify.snapchat
etc.

And about a dozen others (here's the current list of Snapchat domains). Any DNS, HTTP, HTTPS/SNI, SSL certificate name, etc. that matches the domain (or IP address block) will get classified into the specified application.
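A rough Python sketch of how such definitions could be applied to a hostname. The grammar is inferred from the three sample lines only (host:"regex" terms joined by commas, then @ and the application tag), and the patterns are treated as regular expressions, as the ^ and $ anchors suggest; the real netify-sink.conf format may support more, and IP-block rules are ignored here. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def parse_app_rules(lines: List[str]) -> Dict[str, List[re.Pattern]]:
    """Parse lines like host:"^snapchat.com$",host:".snapchat.com$"@netify.snapchat.

    Assumption: only the host:"<regex>" form from the sample is handled;
    anything else (IP blocks, comments, "etc.") is skipped.
    """
    rules: Dict[str, List[re.Pattern]] = {}
    for line in lines:
        line = line.strip()
        if "@" not in line:
            continue
        matchers, app = line.rsplit("@", 1)
        for pattern in re.findall(r'host:"([^"]*)"', matchers):
            rules.setdefault(app, []).append(re.compile(pattern, re.IGNORECASE))
    return rules


def classify_host(host: str, rules: Dict[str, List[re.Pattern]]) -> Optional[str]:
    """Return the first application whose host patterns match, else None."""
    for app, patterns in rules.items():
        if any(p.search(host) for p in patterns):
            return app
    return None
```

The same lookup would apply to a DNS query name, an HTTP Host header, a TLS SNI value, or a certificate name alike.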

You can see the list of detected applications shipped with OpenWrt in /etc/netify.d/netify-sink.conf. For Netify cloud customers, this list is updated on a regular basis. For non-cloud customers, we can also provide an updated list that could be refreshed via a cron job or some other way.
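For the cron job route, one detail worth getting right is replacing the file atomically, so the agent never reads a half-written list. A minimal Python sketch, assuming the updated list has already been downloaded (no download location is given in this thread); install_app_list is a hypothetical name, and whether netifyd needs a restart to pick up the change isn't covered here.

```python
import os
import tempfile


def install_app_list(new_contents: str, dest: str = "/etc/netify.d/netify-sink.conf") -> None:
    """Atomically replace the application list file with new_contents."""
    if not new_contents.strip():
        raise ValueError("refusing to install an empty application list")
    directory = os.path.dirname(dest) or "."
    # The temp file must live on the same filesystem for os.replace() to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".netify-sink.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dest)  # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp_path)
        raise
```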

anon50098793 · January 20, 2020, 2:19pm

Trying to ascertain how secure the offsite transport is. After clicking "Learn More" several times on your product site, all I found was this:

User-provided data is 100% private, encrypted with a passphrase known only to you

Which leaves me wondering: where would someone interested in using your product find out how offsite traffic is secured? (Shouldn't this feature more prominently in your post/site?) Does the above statement refer to on-disk encryption in the cloud or to transport only? Is the same key used for both?

pbaldwin · January 20, 2020, 7:31pm

Yes, we should have a landing page that describes all the offsite details. I'll add it to our TODOs.

Short version...

The netifyd agent is open source, so developers can certainly poke around the data that is sent to the cloud. The JSON payload example on the netifyd page gives you a rough idea, though. The network metadata sent to the cloud can be further anonymized via some of the privacy features listed here: Netify Privacy.

The data is transported via HTTPS using a unique netifyd identifier. The encryption key, which is never sent to us, is used to encrypt the following user-provided data:

API resource keys
Owners (e.g. Dave Smith)
Groups (e.g. Sales)
Device Names (e.g. Pete's Mobile)

So encrypted user-provided data, along with the anonymized metadata, provides a path to remove traces of personally identifiable information. In addition, the data stored is not directly linked to a user's account - that's where the API resource keys come into play. Only the encryption key can establish the link between an account and the network metadata.
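The post doesn't spell out the cryptography, so the following is only a generic illustration of the idea that a key held solely by the user can link an account to stored metadata; it is not Netify's actual scheme. It derives a key from a passphrase with PBKDF2 and uses HMAC-SHA256 to produce a pseudonymous identifier, so anyone without the key sees only an opaque hex string. Encrypting fields such as owner or device names would additionally need an authenticated cipher (e.g. AES-GCM), which isn't in Python's standard library and isn't shown. Function names are hypothetical.

```python
import hashlib
import hmac


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a user passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)


def pseudonym(key: bytes, account_id: str) -> str:
    """Keyed identifier: without the key it cannot be linked back to account_id."""
    return hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Metadata stored under pseudonym(key, account_id) can only be matched to the account by someone who can recompute it, i.e. someone holding the passphrase.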

There are more technical details in the "Netify Privacy" link above, but that's the abridged version.

Oh. For large deployments, all of the infrastructure can be hosted on a private network... no public cloud required. For medium-size deployments, we also have the option of storing just the data on a private network.

anon27813507 · May 9, 2020, 10:20am

From the source code of netify-fwa it seems that nftables are not supported?

Any plans to provide/package Netify Console tool for OpenWrt?

The console is currently available for ClearOS

Could not locate

netifyd.conf(5) man page for documentation

/etc/netifyd.conf refers to

See /usr/share/netifyd/netifyd.conf-sample for all possible options.

however, that file is absent.

The package description states:

These detections can be saved locally

Is this related to

[socket]
dump_established_flows = <enable to write all established flows to connecting client>
dump_unknown_flows = <enable to write unknown flows to connecting client>

or to

[netifyd]
json_save = <yes/no>

? If so, how can the dump location be specified?

[flow_hash_cache]
save = <persistent/volatile/disabled> 
[dns_hint_cache] 
save = <persistent/volatile/disabled>

If persistent, how can the dump path be specified?

dsokoloski · May 12, 2020, 2:28am

kuhfufhrbuierf,

Thanks for the questions! I am the principal developer for the Netify Agent.

The Netify Console tool (as currently released) is a PHP application. The packaging for that was done only for ClearOS. There were no plans to release a package for other distributions/platforms because a new version is being designed in C++. The PHP version was first a debugging tool which now should be developed further into a full application with more needed features.

That being said, the PHP version is available here, and can be run from a cloned/manual install on any host that has a PHP interpreter. Netify Agent can then be configured to listen on a network socket (versus the default file socket), enabling remote Netify Console connections. TODO: At the moment, there is no privacy/authentication/encryption on this socket so some thought should be given to secure network access to it.

We don't package/include man pages or other documentation files for OpenWrt. I thought considering it's an embedded platform, perhaps that would be frowned upon. In hindsight, these files are so tiny compared to the rest of the image, we can include them in the next release if that's expected.

In the meantime, you can find the man pages and sample configuration here.

Both, depending on your requirements. An established socket connection will stream real-time detections and other status information (JSON payloads) for applications that want to ingest a stream. The "dump_established_flows" option, when enabled, will send the connecting client the entire current state of the engine. It does not dump the current state to a file. For that, use "json_save".

"json_save" will periodically (15 seconds by default), dump all new detections and all active flows to the file: sink-request.json

"dump_unknown_flows" is more of a debug function. It creates small pcap files (8 - 10 packets, configurable) for unidentified flows. When enabled, these files can be found in the volatile state directory as: nd-flow-xxxxxxxx.cap

This file is saved to the "volatile" state directory, which on OpenWrt is: /run/netifyd/

This path is currently compiled in and cannot be changed at runtime.

Again, this path is compiled into the executable and currently cannot be changed at runtime. For OpenWrt, the "persistent state path" is: /etc/netify.d/
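If you enable dump_unknown_flows and want a quick look at the nd-flow-xxxxxxxx.cap files without copying them to a machine with Wireshark, this Python sketch counts the packet records in a capture. It assumes the files use the classic libpcap format (not pcapng), which is what libpcap's pcap_dump functions write; the function name count_pcap_packets is hypothetical.

```python
import struct


def count_pcap_packets(data: bytes) -> int:
    """Count packet records in a classic libpcap capture (not pcapng)."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    offset, count = 24, 0  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16 + incl_len
        if offset > len(data):
            break  # truncated final record
        count += 1
    return count
```

For example, loop over glob.glob("/run/netifyd/nd-flow-*.cap") and print each path with count_pcap_packets(open(path, "rb").read()).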

dsokoloski · May 12, 2020, 2:33am

kuhfufhrbuierf,

I missed this first question. You are correct... we will be circling back to add support for it. I spent the last development cycle implementing support for BSD PF, which took considerable time. nftables is next on the roadmap.

anon27813507 · May 12, 2020, 10:01am

Thank you for the pointer and explanation. Perhaps you would consider packaging it for OpenWrt, reckon it would add value?

That somewhat lessens the appeal/attraction of the app. Whilst it may make sense for devices that only feature storage prone to extensive wear by intensive disk writes, e.g. NAND flash, there are also devices supported by OpenWrt that feature support for installation of SSD/USB drives.

With only the volatile storage option any dump will vanish during a power cycle.

dsokoloski · May 29, 2020, 5:24pm

It seems a bit heavy (dependency-wise) to run on an embedded router. Netify Console works over the network so most users run it on their laptop/desktop or other server to view real-time flow data from their embedded gateway. That being said, the rewrite in C++ that is underway would run beautifully on tiny systems so we certainly can release that as a separate OpenWrt package.

We will consider making these paths configurable at run-time for the next release.

A moot point, considering this file is overwritten every 15 seconds: you would have lost only the last 15 seconds of activity. If you want to keep the history, you would have to select and copy the data you're interested in, regardless of where the file is stored.

If a complete history was to be kept, a simple shell script could be written to copy the file to a permanent SSD/USB storage location every 15 seconds.
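Here's what such a copy step might look like in Python rather than shell, as a minimal sketch: the destination directory /mnt/usb/netify-history is a made-up example, the function name is hypothetical, and the source path assumes json_save writes sink-request.json to /run/netifyd/ as described above.

```python
import os
import shutil
import time
from typing import Optional


def archive_snapshot(src: str = "/run/netifyd/sink-request.json",
                     dest_dir: str = "/mnt/usb/netify-history",
                     now: Optional[float] = None) -> str:
    """Copy the current json_save snapshot to dest_dir under a UTC-timestamped name."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))  # gmtime(None) = current time
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"sink-request-{stamp}.json")
    shutil.copy2(src, dest)
    return dest
```

Call it from a loop with time.sleep(15); cron's one-minute granularity would miss most snapshots. A copy could also catch the file mid-rewrite if the agent doesn't replace it atomically, which the thread doesn't say either way.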

anon27813507 · May 29, 2020, 8:34pm

Thanks for pointing that out, it was not clear.

I suppose that, among other features, is what's being offered through the cloud-based paid subscription service.

codemarauder · July 13, 2020, 5:21am

Thanks for this and glad that the agent is licensed under a Free Software license.

I have just connected to the socket to look at the stream and it makes me happy to know that there are immense possibilities to write applications that take decisions on the device itself.

I have a question, though. The agent seems to auto-detect the WAN interface, which I suppose is so that the analysis server can merge the LAN and WAN sides of a connection into one end-to-end client-to-server flow instead of treating them as two separate flows. What happens in a multi-WAN scenario?

We extensively use MWAN3 with 2 to 4 WANs. Does the agent support multiple -E options?

Thanks in advance.

pyb74 · August 6, 2020, 10:34am

Hi Netify!

I deployed the agent and was sending JSON to an ELK stack over a TCP socket in a couple of minutes! (Actually it might have been hours, but I'm also new to the ELK stack...)
Currently registered to your cloud service in "trial mode", your dashboards are amazing! So many insights on my family's devices/Internet usage.
One comment though: as an advanced/nerd home user I don't think I would pay for the basic plan monthly fee beyond trial... It would be good to offer a free plan with much limited features or limited to a single site? Self-hosting might be also an option?
Alternatively I would try to build up nice dashboards on Kibana, but do you provide somewhere a full description of the JSON fields and a list/lookup for protocols and applications identifiers?
Thanks and great job!

dsokoloski · August 7, 2020, 3:32pm

First, sorry for the inexcusable delay responding to your message! I have notifications enabled for this thread but for some reason I'm not receiving emails for replies.

Second, thank-you for the feedback and kind words! Glad to hear you like what you see so far...

Regarding Multi-WAN; yes you can specify an "unlimited" number of LAN (internal) and WAN (external) interfaces. The latest source code in the OpenWrt packages repository also has updates that let you configure the Agent by editing /etc/config/netifyd, but I'm not sure if it will allow multiple interfaces (I didn't write that code).

The easiest way to test custom options is to edit: /etc/init.d/netifyd and disable "auto-detect" mode. You can then specify your full command-line in the NETIFY_OPTS variable (in quotes), such as:

NETIFY_OPTS="-I br-lan -E eth0 -E eth1"

dsokoloski · August 7, 2020, 3:48pm

Greetings!

Thanks for the positive feedback -- much appreciated!

Regarding a "free" (or very low) plan for home use -- I hear what you're saying and I've been a proponent of introducing such an option, but I have no updates on that yet.

We have some limited documentation, but it lacks field-by-field descriptions. However, it still may have some information that you find useful. There are also links in there to other applications that use the JSON stream, which you could refer to for example implementations.

The link is here: Netify Agent JSON v1.90

Of course, if there are some specific questions you have regarding the structure, I would be happy to answer them privately.

pyb74 · August 8, 2020, 4:01pm

Thanks Darryl for the explanations.
I was able to capture the "protocols" type message