Switch-firewall-traffic-capture

I would never have asked, but as you offered, yes, please test bridging a couple of ports - that would be a great help. The rest of this project seems like it will be fairly straightforward. The risk is whether I can transparently bridge two ports on one of those APU2 boards and tap the data passing through while still maintaining good speed through the bridge. As I will not be shifting much data off the device, I am not too concerned about that bandwidth. Could I even inject the captured data back into the bridge? I certainly do this on a hardware network tap I use: I occasionally connect my laptop to a hardware tap between a switch and a host and it becomes part of the network as if connected to the switch. This is broadly what I want to replace, except in this case I need to minimise the data captured before it gets uploaded.

I still need to experiment with ZFS, but I am keen to try it now, and you are right, the extra 2 GB is small change compared to the rest.

Yes, I'll see what I can rig up this weekend to test the ability of an APU2 to bridge and monitor traffic.

On injecting packets, it shouldn't matter (within reason) which NICs you're using with a Linux kernel or with FreeBSD. Once you can write raw packets, you can do pretty much whatever you want.
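As a rough, untested sketch of one way to reinject captured traffic (interface and file names here are just placeholders), tcpreplay will resend a pcap file out another interface:

# capture on one bridge member (placeholder interface name)
tcpdump -i igb0 -w sample.pcap
# reinject the captured frames out the other member, preserving the recorded timing
tcpreplay --intf1=igb1 sample.pcap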

On ZFS, I'd try it on a VM with a couple of virtual disks to start. A 16 GB "growable" disk, VirtualBox's default, works fine for FreeBSD and doesn't consume much space on your physical drive. I'd also suggest FreeBSD 11 for several reasons. First is that it can install a system on ZFS with the built-in installer, without jumping through hoops. sudo pkg install beadm will get you the boot environment manager so you can look at that too (the Linux world doesn't have this yet). Also, on a VM, you can see how straightforward it is to replace drives, or even expand an existing pool by adding drives. Moving from 2 TB drives to 6 TB? Add two 6 TB drives to the pool of two 2 TB drives, wait for them to resilver, remove the two 2 TB drives and you have a 6 TB pool. No reformatting, no RAID-card reconfiguration; it's that easy.
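As a rough sketch of that drive swap (pool and device names are made up; I'd try it on the VM first), zpool replace handles the attach, resilver, and detach in one step:

# let the pool grow once every device in the vdev is larger
zpool set autoexpand=on tank
# swap each 2 TB disk for a 6 TB disk; ZFS resilvers onto the new disk
# and drops the old one when it finishes
zpool replace tank ada1 ada3
zpool status tank    # watch the resilver progress
zpool replace tank ada2 ada4
zpool list tank      # capacity now reflects the larger disks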

You can't "convert" another file system to ZFS. You can, however serialize a ZFS data set (anything from a snapshot to an entire filesystem, all its children and all their snapshots) and send it to a "flat file" or most anything else (such as piping it top another process or over ssh to another machine). That serial representation can be received into an other ZFS pool as a child filesystem, replicating the entirety of the original as a subtree of the pool. (In ZFS, the mount point can be specified independently of the position of the file system within the pool.) These serial streams can be snapshot, full, or incremental, with the ability to recover "mid stream" (at least with Solaris and FreeBSD). There's also iSCSI for block-level remote storage, if that's the path you want to pursue.

Thanks Jeff

Very good of you to test that idea out on your hardware.

I did not manage to find time to install ZFS today, but I will over the weekend.

I don’t have any suitable spare hardware at the moment, but next week I will get a new SFF device as I have a need for one of those soon anyway. It will probably have a J4105 or N5000 or N5005. I will try out FreeBSD 11 on it before it gets to its final job, perhaps still running FreeBSD with ZFS if all is well.

The new SFF device will be more powerful than I require for my project, but that means I can judge how much less of a processor I can get away with, whereas if it were underpowered it would be hard to know by how much.

I must say I am increasingly impressed by the capabilities you are telling me about in ZFS. I am surprised that I have not seen the advantages of using it before. Perhaps they were not so concisely put, and before I got to understand them I just dismissed it as yet another file system with a different balance of strengths and weaknesses. Do I need to install a desktop of some sort to support beadm? It sounds like it would be easier to get started with a UI on ZFS.

No, you don't need a desktop. I find that trying out OS features in a VM (I use VirtualBox) to be the easiest and safest, not to mention cheapest as there's no additional hardware needed. It shouldn't take more than 10 minutes or so to get FreeBSD installed on a VirtualBox VM, and then a couple more minutes to install pkg (the package manager) and beadm. The filesystem, if you used the ZFS option for the install, will already be configured to work well with beadm.
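Once installed, day-to-day beadm use is about this simple (the boot environment name is just an example):

# take a bootable copy of the running system before an upgrade
beadm create pre-upgrade
beadm list                 # show existing boot environments
# if the upgrade goes badly, switch back and reboot
beadm activate pre-upgrade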

https://www.freebsd.org/cgi/man.cgi?query=zfs will get you the man page (HTML formatted) without having to install the whole OS. Others of interest would be zpool, beadm, pkg, freebsd-update, and maybe gpart and glabel. The roles of apt, dpkg, or the RPM tools are covered by pkg and freebsd-update -- there is a "strict" separation in FreeBSD between the core OS (/etc/, /bin/, /sbin/, /usr/bin/, /usr/sbin/, /lib/, ...) and non-core packages and their config (/usr/local/etc/, /usr/local/bin/, ...)

No GUI for ZFS, as it came from the "server world" (Solaris, then FreeBSD many years ago). I'm guessing it isn't too popular on Linux-based OSes, partly because it can't be distributed as a binary with those distros and partly because it isn't neatly integrated into GUI disk-management tools. The commands are typically pretty straightforward, and there is lots of good advice on how to accomplish tasks on the (now) Oracle site, the FreeBSD site, and associated forums. You can get yourself into "interesting" situations with clone and send piped into recv where, if you're not careful, you end up with the "wrong" filesystem mounted on your running system, such as on / or /usr, but they can be recovered from. Another reason to try things out on a VM until you're comfortable with it, or are trying something new and "dangerous".

Bottom line is that an APU2C4 appears to be able to bridge and run tcpdump at GigE rates without breaking a sweat.

DUT is an APU2C4 running FreeBSD 11.2-RELEASE. Test-path Ethernet in/out connected on em0 and em1, with ssh management access over em2. I ssh in and run htop to watch CPU utilization. Bridge created as per https://www.freebsd.org/doc/handbook/network-bridging.html
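For reference, the bridge setup amounts to a few commands along the lines of the handbook example (interface names as in the description above):

# create the bridge and add the two test-path interfaces as members
ifconfig bridge0 create
ifconfig bridge0 addm em0 addm em1 up
ifconfig em0 up
ifconfig em1 up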

Test harness is two FreeBSD 11.2-RELEASE machines with Intel NICs, one an i3-7100T on a "normal" motherboard, using the NIC in the PCH, the other a Xeon e3-1265v2 in a Lanner FW7582A with discrete Intel NICs. Test software is netperf-2.7.1.p20170921

Cabling is CAT6, 1-2 m in length.

No interface configuration beyond setting an IPv4 address and netmask on the test-harness NICs, or beyond adding them to the bridge and bringing them up on the APU.


Ethernet Cable

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2           
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     941.47   

Bridge Only

No "tap" -- one or two CPU cores (of four) engaged, typically under 40% for the core or two engaged.

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.07     941.10   

Bridge, tcpdump -w /dev/null

tcpdump consumes 60-90% of one core, remaining cores at relatively low levels

[jeff@apu-too ~]$ sudo tcpdump -i igb0 -w /dev/null
tcpdump: listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C7364551 packets captured
7362628 packets received by filter
0 packets dropped by kernel


[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     940.96   

Bridge, tcpdump Pumping Everything Over ssh

ssh swamps a core when trying to pump 1 Gbps through encryption and over the wire. The remaining cores are running in the 30-60% range.

[jeff@miniup ~]$ ssh jeff@apu-too tcpdump -i igb0 -w - > /dev/null

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     939.62  

ssh Throughput

[jeff@apu-too ~]$ dd if=/dev/random of=random.1000m bs=1m count=1000
[jeff@apu-too ~]$ dd if=random.1000m of=/dev/null bs=1m
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 2.549130 secs (411346640 bytes/sec)
[jeff@miniup ~]$ scp jeff@apu-too:random.1000m /dev/null
random.1000m     100% 1000MB  24.4MB/s   00:41    

Disk read is over 400 MB/s, so 24.4 MB/s is probably dominated by ssh.

Call it 200 Mbps or so (24.4 MB/s × 8 ≈ 195 Mbps), encrypted over the wire, for incompressible data.

Sorry about this Jeff; a commercial dispute that started on Friday has taken a lot of my time, including over the weekend and today, so I am not getting far on this project.

As I currently don't have a host running a hypervisor that I can use for this test, I installed ZFS directly on a Fedora 28 Server.
This server exists to occasionally test server solutions before they are deployed to live systems.
So for my install it was only running my SSH session and Cockpit without any clients logged in.
This took 325 MB RAM out of 16 GB, but after installing ZFS it went to 370 MB, eventually settling down to 368 MB.
In fact I don't use any Linux servers with a GUI, so I am glad I don't need one for beadm.
The minimal install was just two lines, but the second of them, which compiled the kernel modules, took quite a while.
In fact I had just launched another session to check whether there was a problem when it finally finished.

I hope to return to testing ZFS after today.

That is some excellent test work Jeff.

It does seem like the CPU may get into difficulty if the bridge is carrying capacity traffic and I apply tcpdump filtering to reduce the upload size. From experience, dissecting the frames and comparing them to filters pushes the CPU hard, but that much traffic is a special case, and the reduction in data over SSH is going to reduce the CPU load.

The APU2C4 seems close to ideal for this project, having exactly the right number of NICs and a reasonable cost as well.

I may be able to do this for a little less with some other equipment, but I am not sure that it is worth the effort or the expense of testing them now that I know the APU2C4 is good enough.

Thanks so much for all your insights and testing work Jeff.

Will let you know how I get on when I have time to get the hardware, replicate what you have done, and add the filtering.

I don't know what kind of filtering you're needing, but pcap filtering is generally pretty efficient.

With no filtering, the complete mbuf chain for each packet needs to be duplicated and output, so adding a filter would actually reduce that load.

If you really need high-performance filtering, you should probably write a custom filter using eBPF; here's a summary: https://opensource.com/article/17/9/intro-ebpf

I think there is a clang backend to compile to eBPF, so you basically write a single, limited-functionality C function that does the filtering; it compiles to eBPF bytecode, and then you load that bytecode into the appropriate tc action or firewall rule.

edit: you can do iptables matches based on eBPF using iptables ... -m bpf ... and in ingress or egress queueing using tc filter ... bpf ...

see "man iptables-extensions" and "man tc-bpf" for details

Similarly, if you decide to go with FreeBSD, ipfw can perform very efficient packet-header matching and then either tee to a socket or into the netgraph subsystem (which also has a ng_bpf node type). Poke me if you go that way. There are some tricks around both applying ipfw to bridged packets, as well as needing to make sure you don't introduce packet loops (duplicated packet from the tee re-entering the rule set).
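A very rough sketch of the ipfw side (the rule number and divert port are arbitrary, and the bridge sysctl should be checked against if_bridge(4)):

# let ipfw see frames crossing the bridge
sysctl net.link.bridge.ipfw=1
# copy IP packets crossing the bridge to a divert socket on port 5555,
# while the originals continue through the rule set
ipfw add 100 tee 5555 ip from any to any via bridge0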

With a Linux-based OS, you might want to look at nftables as I find it to be much more understandable than iptables.

Please note that using and configuring BSD is off-topic in the OpenWrt forum; you might want to use a dedicated BSD community site to discuss its use and fitness for packet filtering.

If you want to do fancy network stuff, DPDK (available for both Linux and FreeBSD) is probably the way to go to squeeze out as much performance as possible within reasonable time (i.e., not writing your own solution from scratch).
https://www.dpdk.org/
https://doc.dpdk.org/guides/freebsd_gsg/index.html


I guess I am down to testing now on this.

Thank you Daniel
Not heard of eBPF before - seems like an efficient mechanism to try.
I want to extract a few details from frames after simple filtering to minimise the data set.
At the moment the obvious details to extract are IP and MAC source and destination addresses.
I will want to modify the filtering according to need, so for example I might drop frames that are IP or RFC 1918 or ARP or malformed or DHCP or DNS, or the inverse of any of these. I don't expect filtering to get much more complex than a boolean combination of these.
Usually my approach is to capture everything with a laptop on a network tap, then go back and apply these kinds of filters looking for problems. It's hard to be prescriptive about the filters as it depends on the problem, but I rarely need to look at much more than the addresses in a frame.
So if eBPF can do this kind of filtering efficiently, it's a good idea.
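For example, something like this pcap expression is roughly what I have in mind (untested, and the interface name is just a placeholder):

# keep only frames that are not ARP, not DNS/DHCP, and not from RFC 1918 sources
tcpdump -i bridge0 -nn 'not arp and not port 53 and not port 67 and not port 68 and not (src net 10.0.0.0/8 or src net 172.16.0.0/12 or src net 192.168.0.0/16)'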

Thanks again diizzy
DPDK also looks very promising.
Although I don't have sophisticated filtering requirements, as I noted to Daniel they will need to be modified sometimes.
To keep the compute unit small, low-power, and low-cost, the filtering and extraction need to be efficient, as they will probably dominate the load. As Jeff showed, the bridge alone is actually quite efficient, at least on the APU2C4. I suspect that efficiency is strongly related to the design of the hardware; for example, a separate bus and controller for each bridged port may offload work from the CPU, but as I said before I know approximately nothing about electronics.
The upload over TLS or SSH I don't expect to be very expensive when the data is reduced to just a few addresses, so the main cost I suspect is the filtering and extracting to get to that small data set. I could do that work after upload, and that is still an option, but it risks high loads on network nodes if everything is getting sent twice. I suppose it is not essential to use a TLS or SSH tunnel to upload if that substantially reduces the load on the compute unit, as Jeff's tests seem to indicate, but these days that seems to be the expectation.

iptables will efficiently handle matching on MAC or IP addresses, especially with ipsets which can use a single rule to match a whole set of IP addresses or networks. eBPF will be more efficient if you need to do more complicated matching where you'd need ten or twenty or a hundred iptables rules or to match on packet content etc.
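For example (the set name and chain are just illustrative):

# one set holding all the RFC 1918 networks, matched by a single rule
ipset create rfc1918 hash:net
ipset add rfc1918 10.0.0.0/8
ipset add rfc1918 172.16.0.0/12
ipset add rfc1918 192.168.0.0/16
iptables -A FORWARD -m set --match-set rfc1918 src -j ACCEPT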

I use firewalld on CentOS and Fedora, but the principle is the same as with iptables.
On this project I only need simple firewall rules to protect the port used for control and upload, but I could apply rules to the bridge.
I have never tried to firewall a bridge, so I don't know what problems to expect there.
Although the idea got lost thanks to the very useful three-port APU2C4, I may want to use some one- or two-Ethernet-port hardware in the future, or at least have that flexibility. As single-Ethernet-port hardware is a lot more common, and I bet the USB controller on most has a separate bus from the Ethernet port, I could, as Jeff said, use a USB-to-Ethernet adapter. I use these a lot on my laptop, though worryingly they seem to keep failing.
I know it is possible to use the bridge for command and upload because I have used wireless bridges that act as an L2 bridge for traffic to the remote network via the wireless interface, but can be controlled via the bridged port. So they must be intercepting traffic addressed to them rather than forwarding it over the wireless bridge.

On Linux you just turn on a sysctl and the kernel will call iptables on the bridge traffic; it'll look at the FORWARD chain for traffic going through the bridge, and you can filter there. It's pretty straightforward. The sysctls involved are:

net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-ip6tables

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
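So, roughly (on newer kernels the sysctls only appear once br_netfilter is loaded, and the FORWARD rule is just a placeholder):

modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
# bridged traffic now traverses the FORWARD chain and can be filtered there
iptables -A FORWARD -i br0 -j ACCEPT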

Thanks Daniel
That saves some searching; I would not have known where to start looking for that.
Looking back at my comments about 'command and upload' and "intercepting" that traffic on a bridge I think I was making this too complex.
Even if a NIC is bridged it still has an associated IP address, so I suspect the NIC will, as normal, extract the PDU from the traffic addressed to it and forward that to the OS as usual. I'm not sure if that traffic is still sent through the bridge, but if it is, firewall rules on the bridge could drop the 'command and upload' traffic.

If you assign one to it. Like a switch, they don't need an IP address assigned to them.

A bridged set of interfaces will typically work like a Layer 2 switch. Directed frames will be sent out the interface on which the target MAC is found (with the behavior for MACs not already in the forwarding table being implementation specific), and broadcast frames will be sent out all interfaces.