Switch-firewall-traffic-capture

Certainly there is bus sharing on the original model B and RPi 2 and 3. As I am unlikely to use an original model A again, I can't bring myself to check that in greater detail. Bus sharing could be a significant factor for some projects, so it's good to understand it.
RPis have their uses, ultimately enabling many use cases where higher-cost compute units would not be tolerated. I often see them in use in businesses for a whole range of purposes, but they have a low price for a reason, so we should not expect too much of them. I am glad they came to market, as they got me to look seriously at low-power compute units.

I have never used ZFS, but you have got me interested. I will be looking for use cases for it from now on, although, as you say, there are reported issues around using it that will stop me adopting it until I understand it and those issues better.
Performance comparison shows the AMD GX-412HC at 1.2 GHz (I can't find numbers on the 1 GHz GX-412TC used in the APU2C2) is quite close to the Celeron N2920 and probably fast enough. The performance-to-price ratio is very similar too. However, I can get a Chinese J1900 device (circa 22% higher CPU Mark than the GX-412HC and N2920) with 4 Intel NICs, 2 GB RAM and 32 GB eMMC (but no circuit design, so not sure what I am getting) for very similar money.
Glad you mentioned irqbalance; I have not used it. The docs say it is for balancing across processors rather than cores, so I will experiment a bit with it.
Great detail on the bridging requirement. This sounds like it will be the biggest challenge as I clearly don’t understand it well enough yet.
Also interested to hear about the cross-port leaking before boot is complete. In fact, a few weeks back I looked into similar behaviour I have detected multiple times over the years in different running commercial switches. The more I looked, the more special cases I found; when I got to ten I stopped. Broadly, in some special cases a switch may not be certain of traffic's destination port, so rather than just drop it, it broadcasts the traffic so that it will arrive where it should, even if it also arrives where it shouldn't. I have no doubt that a network exploit using this will be created, if it has not been already.
Thanks for all the help; you know a lot.

The "issues" are very straightforward -- Linux and many Linux-based distros are licensed under the very restrictive GPL-family, ZFS is licensed under a much more open license. As such, you can't distribute the two of them together in binary form. You just have to build the ZFS modules from source during install, then on your target machine. It's straightforward. See for example https://github.com/zfsonlinux/zfs/wiki/Debian-Stretch-Root-on-ZFS

Nor the quality or authenticity of the components on the board, or the build quality. Take a gander at, for example, the pfSense forums for some enlightening opinions about the cheap imports. In a similar vein, there are lots of new "Intel" NICs out there selling at "surprisingly reasonable" prices that are certainly counterfeit. See, for example, https://forums.servethehome.com/index.php?threads/comparison-intel-i350-t4-genuine-vs-fake.6917/

On the consumer-grade switch chips "leaking", basically they come up with a "stock" configuration at power-on. In contrast to a good SOHO- or enterprise-grade switch, the ports are not "held off" until the switch is configured. So you get whatever that switch chip has as its power-on configuration for the minute or so until the OS configures it. This can mean that, for example, all the LAN ports are bridged together for that period of time.

Yes, I looked at ZFS for CentOS and Fedora; it seems DKMS is the answer for rebuilding the modules when upgrading the kernel. I can see ZFS being useful for typical servers, but it sounds like it will be a bit costly on performance for this project.
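Roughly what I expect the DKMS side to look like after a kernel update (a sketch only, with the module name assumed):

dkms status                     # should show the zfs module built against the running kernel
dkms autoinstall                # rebuild any registered modules, e.g. after a kernel update
modprobe zfs && zpool status    # confirm the module loads and the pools import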
I suspect I will not find the right compute unit first time on this project, as it is hard to estimate the actual performance cost.
Agree re cheap Chinese copies. I remember one chip maker complaining that a firm over there copied one of their chip designs so closely it even included the designer's name. I have also spoken to someone who had his equipment made in China and turned up at a trade fair to find the manufacturer selling it under their own name.
I only buy from suppliers in my country, so if it's rubbish I can send it back, but it's hard to tell if something contains fake parts, and harder still to convince the seller.

ZFS is for a lot more than servers. I run every system I can using it because of its ability to perform instant, "no cost" snapshots and cloned filesystems, no need to pre-allocate file systems, and a host of other great features. On-the-fly compression and de-dupe can significantly increase useful drive capacity (1.5-2x in my experience). Mirrored drives add redundancy with ease. The speed is comparable to or better than other current filesystems. Throw in some cheap SSDs for L2ARC and ZIL and slow, spinning-platter drives used as backing store become blisteringly fast. I've been using it for many years on FreeBSD and finally Linux has a working implementation (though no well-supported beadm that I've found yet).
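To make that concrete, here is roughly what setting up such a pool looks like; the device names and SSD partition layout are placeholders, so adapt to your hardware:

zpool create tank mirror /dev/ada1 /dev/ada2 log /dev/ada3p1 cache /dev/ada3p2
zfs set compression=lz4 tank                    # on-the-fly compression
zfs snapshot tank@before-upgrade                # instant, essentially free snapshot
zfs clone tank@before-upgrade tank/experiment   # writable clone of that snapshot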

I agree with @jeff here; I'd also go for an x86 (64-bit) solution running FreeBSD or possibly pfSense, although FreeBSD will probably be a better choice in this case.

Depending on network (throughput) speed, I'd be very careful going for a rather old/slow GX-412HC solution.
I'd recommend at least Intel Gemini Lake or newer.

An ASRock J4105M should be a pretty good starting point if you have a few parts around; otherwise https://up-shop.org/28-up-squared might be of interest. Realtek NICs aren't great, but given the price it's not a bad deal.


Interesting boards -- I've bookmarked them.

I agree that the GX-412HC isn't screamingly fast and may have problems with high-bandwidth connections. I could test what one can bridge and either tee to the third interface or tcpdump and send over ssh with FreeBSD. I know these boards are fine at 300 Mbps and would guess that 500 Mbps wouldn't be a problem.

The Realtek NICs are reasonably decent these days (10 years ago they really, really sucked under load), though I'll swap in a real Intel PCIe NIC (lots of counterfeits on the secondary market) if I'm going to depend on "100%" performance.

If you do go with an embedded amd64 / x86_64 board, I'd go with 4 GB or more. While you can run ZFS with only 2 GB, you lose some of the advantages such as prefetch. It's generally a small up-charge that can either improve performance now, or extend the useful lifetime of the board.

You have me convinced on ZFS. I will try it today on a Fedora host I know has a lot of unused SSD space. The bit I have read makes it look like I have a lot to learn about it though. I was under the impression from some of what I read that ZFS required more resources than many file systems, and extra admin to get good performance from it. Have you ever tried to convert another file system to ZFS, or perhaps more to the point, would you consider doing that on a live system?

If I have a ‘serious’ data requirement on a server or workstation I always go for hardware RAID 1. My experience of software RAID has not been good. Having said that, storage space is cheap, so if ZFS is doing software mirroring on hosts where I would not normally install hardware RAID 1 then I suppose it is an advantage, if it comes at reasonable admin effort. It would be extra useful if the mirroring could be across hosts, say to SAN/NAS.

Thanks diizzy

I only have BSD through a pfSense instance. To take on a new OS I would need to find a significant advantage in it over the two I have converged on until now: CentOS and Fedora. It sounds like support for ZFS is better integrated there, so that might be the trigger if I find I can't live without it.

If I can justify the cost, my low-power processors of choice are the Pentium Silver N5000 with a TDP of only 6 W and circa 50% faster than the J1900, or the J5005 with a TDP of 10 W like the J1900 but circa 60% faster than it. The problem at the moment is finding them in any SFF device. As far as I can see only the ASRock J5005-ITX and the Gigabyte GB-BLPD-5005 have the J5005, and an MSI device has the N5000. Either way, by the time you add it all together they come out noticeably more expensive than a J1900 device, as I suppose they should. The J4105 is circa 46% faster than the J1900, also with a TDP of only 6 W, and is a fair bit cheaper than the other two, so I agree it's probably the best choice if the J1900 cannot do the work. It is also available in more devices, like the Gigabyte GB-BLCE-4105C and some ASRock boards. The other problem is they all have only 1 NIC, so I would have to use a USB 3 to NIC adapter like Jeff said. I have had problems with these not being recognised on CentOS, but Fedora has more current drivers, so I may have to use that. Thanks for the link; they have dual NIC devices, but I am in the UK so I suspect the postage will be high.

I would never have asked, but as you offered, yes, please test bridging a couple of ports - that would be a great help. The rest of this project seems like it will be fairly straightforward. The risk is: can I transparently bridge two ports on one of those APU2 boards and tap the data passing through while still maintaining good speed through the bridge. As I will not be shifting much data off the device I am not concerned too much about that bandwidth. Could I even inject the captured data back into the bridge? I certainly do this on a hardware network tap I use. I occasionally connect my laptop to a hardware tap between a switch and a host and it becomes part of the network as if connected to the switch. This is broadly what I want to replace, except in this case I need to minimise the data captured before it gets uploaded.

I need to experiment with ZFS, but I am keen to try it now, and you are right the extra 2 GB is small change compared to the rest.

Yes, I'll see what I can rig up this weekend to test the ability of an APU2 to bridge and monitor traffic.

On injecting packets it shouldn't matter (within reason) what NICs you're using with a Linux kernel or with FreeBSD. As soon as you can write raw packets, you can do pretty much whatever you want.

On ZFS, I'd try it on a VM with a couple of virtual disks to start. A 16 GB "growable" disk (VirtualBox's default) works fine for FreeBSD and doesn't consume much space on your physical drive. I'd also suggest FreeBSD 11 for several reasons. First is that it can install a system on ZFS with the built-in installer, without jumping through hoops. sudo pkg install beadm will get you the boot environment manager so you can look at that too (the Linux world doesn't have this yet). Also, on a VM, you can see how straightforward it is to replace drives, or even expand an existing pool by adding drives. Moving from 2 TB drives to 6 TB? Add two 6 TB drives to the pool of two 2 TB drives, wait for them to resilver, remove the two 2 TB drives and you have a 6 TB pool. No reformatting, no RAID-card reconfiguration; it's that easy.
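As a sketch of that drive swap on a simple two-way mirror (device names are made up; check zpool(8) before doing this on real data):

zpool attach tank ada1 ada3        # add the first 6 TB drive alongside an existing 2 TB drive
zpool attach tank ada2 ada4        # and the second
zpool status tank                  # wait for resilvering to finish
zpool detach tank ada1             # drop the old 2 TB drives
zpool detach tank ada2
zpool set autoexpand=on tank       # allow the pool to grow to the new size
zpool online -e tank ada3          # expand in place if it hasn't grown already
zpool online -e tank ada4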

You can't "convert" another file system to ZFS. You can, however serialize a ZFS data set (anything from a snapshot to an entire filesystem, all its children and all their snapshots) and send it to a "flat file" or most anything else (such as piping it top another process or over ssh to another machine). That serial representation can be received into an other ZFS pool as a child filesystem, replicating the entirety of the original as a subtree of the pool. (In ZFS, the mount point can be specified independently of the position of the file system within the pool.) These serial streams can be snapshot, full, or incremental, with the ability to recover "mid stream" (at least with Solaris and FreeBSD). There's also iSCSI for block-level remote storage, if that's the path you want to pursue.

Thanks Jeff

Very good of you to test that idea out on your hardware.

I did not manage to find time to install ZFS today, but I will over the weekend.

I don’t have any suitable spare hardware at the moment, but next week I will get a new SFF device as I have a need for one of those soon anyway. It will probably have a J4105, N5000 or J5005. I will try out FreeBSD 11 on it before it gets to its final job, perhaps still running FreeBSD with ZFS if all is well.

The new SFF device will be more powerful than I require for my project, but that means I can judge how much less of a processor I can get away with, whereas if it were underpowered it would be hard to know how much by.

I must say I am increasingly impressed by the capabilities you are telling me about in ZFS. I am surprised that I have not seen the advantages of using it before. Perhaps they were not so concisely put, and before I got to understand them I just dismissed it as yet another file system with a different balance of strengths and weaknesses. Do I need to install a desktop of some sort to support beadm? Sounds like it would be easier to get started with a UI on ZFS.

No, you don't need a desktop. I find that trying out OS features in a VM (I use VirtualBox) to be the easiest and safest, not to mention cheapest as there's no additional hardware needed. It shouldn't take more than 10 minutes or so to get FreeBSD installed on a VirtualBox VM, and then a couple more minutes to install pkg (the package manager) and beadm. The filesystem, if you used the ZFS option for the install, will already be configured to work well with beadm.
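If it helps, the beadm part is only a handful of commands once the system is installed on ZFS (the boot-environment name below is just an example):

sudo pkg install beadm
beadm list                          # show existing boot environments
sudo beadm create pre-upgrade       # clone the current root as a new boot environment
sudo beadm activate pre-upgrade     # boot into it on the next reboot
# If something breaks, activate the previous environment again and reboot.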

https://www.freebsd.org/cgi/man.cgi?query=zfs will get you the man page (HTML formatted) without having to install the whole OS. Others of interest would be zpool, beadm, pkg, freebsd-update, and maybe gpart and glabel. apt, dpkg, or the RPM tools get covered by pkg and freebsd-update -- there is a "strict" separation in FreeBSD between core OS (/etc/, /bin/, /sbin/, /usr/bin/, /usr/sbin/, /lib/, ...) and non-core packages and their config (/usr/local/etc/, /usr/local/bin/, ...)

No GUI for ZFS, as it came from the "server world" (Solaris, then FreeBSD many years ago). I'm guessing it isn't too popular on Linux-based OSes, given that it can't be distributed as a binary with those distros and isn't neatly integrated into GUI disk-management tools. The commands are typically pretty straightforward and there is lots of good advice on how to accomplish tasks on the (now) Oracle site, the FreeBSD site, and associated forums. You can get yourself into "interesting" situations with clone and with send piped into recv where, if you're not careful, you end up with the "wrong" filesystem mounted on your running system, such as on / or /usr, but they can be recovered from. Another reason to try things out on a VM until you're comfortable with it, or are trying something new and "dangerous".

Bottom line is that an APU2C4 appears to be able to bridge and run tcpdump at GigE rates without breaking a sweat.

DUT is an APU2C4 running FreeBSD 11.2-RELEASE. Test-path Ethernet in/out connected on em0 and em1 with ssh management access over em2. ssh in and running htop to watch CPU utilization. Bridge created as per https://www.freebsd.org/doc/handbook/network-bridging.html

Test harness is two FreeBSD 11.2-RELEASE machines with Intel NICs, one an i3-7100T on a "normal" motherboard, using the NIC in the PCH, the other a Xeon e3-1265v2 in a Lanner FW7582A with discrete Intel NICs. Test software is netperf-2.7.1.p20170921

Cabling is CAT6, 1-2 m in length.

No interface configuration past setting IPv4 address and netmask on the test-harness NICs, or past adding to a bridge and bringing them up on the APU.
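For reference, the bridge setup from the handbook boils down to a couple of ifconfig commands; shown here with the igb interface names that appear in the capture output below (adjust to your NICs):

ifconfig bridge create                       # creates bridge0
ifconfig bridge0 addm igb0 addm igb1 up      # add the two test-path ports to the bridge
ifconfig igb0 up
ifconfig igb1 up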


Ethernet Cable

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2           
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     941.47   

Bridge Only

No "tap" -- one or two CPU cores (of four) engaged, typically under 40% for the core or two engaged.

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.07     941.10   

Bridge, tcpdump -w /dev/null

tcpdump consumes 60-90% of one core, remaining cores at relatively low levels

[jeff@apu-too ~]$ sudo tcpdump -i igb0 -w /dev/null
tcpdump: listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C7364551 packets captured
7362628 packets received by filter
0 packets dropped by kernel


[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     940.96   

Bridge, tcpdump Pumping Everything Over ssh

ssh swamps a core when trying to pump 1 Gbps through encryption and over the wire. The remaining cores are running in the 30-60% range.

[jeff@miniup ~]$ ssh jeff@apu-too tcpdump -i igb0 -w - > /dev/null

[jeff@lanner ~]$ netperf -H 192.168.100.102 -I 99,2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 () port 0 AF_INET : +/-1.000% @ 99% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 65536  32768  32768    10.04     939.62  

ssh Throughput

[jeff@apu-too ~]$ dd if=/dev/random of=random.1000m bs=1m count=1000
[jeff@apu-too ~]$ dd if=random.1000m of=/dev/null bs=1m
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 2.549130 secs (411346640 bytes/sec)
[jeff@miniup ~]$ scp jeff@apu-too:random.1000m /dev/null
random.1000m     100% 1000MB  24.4MB/s   00:41    

Disk read is over 400 MB/s, so 24.4 MB/s is probably dominated by ssh.

Call it 200 Mbps or so, encrypted over the wire, for incompressible data.

Sorry about this Jeff; a commercial dispute that started on Friday has taken a lot of my time, including over the weekend and today, so I am not getting far on this project.

As I currently don't have a host running a hypervisor that I can use for this test, I installed ZFS directly on a Fedora 28 server.
This server exists to occasionally test server solutions before they are deployed to live systems.
So for my install it was only running my SSH session and Cockpit, without any clients logged in.
That took 325 MB RAM out of 16 GB, but after installing ZFS it went to 370 MB, eventually settling down to 368 MB.
In fact I don't use any Linux servers with a GUI, so I am glad I don't need one for beadm.
The minimal install was just two lines, but the second of them, which compiled the kernel modules, took quite a while.
In fact I had just launched another session to look for a problem when it finally finished.
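For reference, the two lines were roughly as follows; the exact zfs-release package depends on the Fedora release, so treat this as a sketch rather than a recipe:

dnf install <zfs-release repo package for your Fedora version>   # adds the ZFS on Linux repository
dnf install zfs    # pulls in DKMS and builds the kernel modules -- the slow step
# Afterwards, modprobe zfs loads the freshly built module.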

I hope to return to testing ZFS after today.

That is some excellent test work Jeff.

It does seem like the CPU may get into difficulty if the bridge is carrying capacity traffic and I apply tcpdump filtering to reduce the upload size. From experience, dissecting the frames and comparing them against filters pushes the CPU hard. But that much traffic is a special case, and the reduction in data sent over SSH should reduce the CPU load.

The APU2C4 seems close to ideal for this project, having exactly the right number of NICs at a reasonable cost as well.

I may be able to do this for a little less with some other equipment, but I am not sure that it is worth the effort or the expense of testing them now I know the APU2C4 is good enough.

Thanks so much for all your insights and testing work Jeff.

Will let you know how I get on when I have time to get the hardware, replicate what you have done, and add the filtering.

I don't know what kind of filtering you're needing, but pcap filtering is generally pretty efficient.

With no filtering, the complete mbuf chain for each packet needs to be duplicated and output; with a filter in place, that load would be reduced.
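As an illustration of cutting the capture down before it leaves the box (interface name and filter expressions are just examples):

sudo tcpdump -i igb0 -w - 'tcp port 443 and net 192.168.100.0/24' | ssh user@collector 'cat > capture.pcap'
# Or keep a rotating local capture of headers only, excluding the ssh management traffic:
sudo tcpdump -i igb0 -s 96 -C 100 -w /var/tmp/cap 'not port 22'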

If you really need high-performance filtering, you should probably write a custom filter using eBPF; here's a summary: https://opensource.com/article/17/9/intro-ebpf

I think there is a clang backend that compiles to eBPF, so you basically write a single limited-functionality C function that does the filtering; it compiles to eBPF bytecode, and then you load that bytecode into the appropriate tc action or firewall rule.
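The build-and-load steps look roughly like this (filter.c would hold the small restricted-C classifier, and the device and section names are just examples):

clang -O2 -target bpf -c filter.c -o filter.o     # compile the C filter to eBPF bytecode
tc qdisc add dev eth0 clsact                      # add the classifier hook point
tc filter add dev eth0 ingress bpf direct-action obj filter.o sec classifier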

edit: you can do iptables matches based on ebpf using iptables ... -m bpf ... and in ingress or egress queueing using tc filter ... bpf ...

see "man iptables-extensions" and "man tc-bpf" for details

Similarly, if you decide to go with FreeBSD, ipfw can perform very efficient packet-header matching and then either tee to a socket or into the netgraph subsystem (which also has a ng_bpf node type). Poke me if you go that way. There are some tricks around both applying ipfw to bridged packets, as well as needing to make sure you don't introduce packet loops (duplicated packet from the tee re-entering the rule set).
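A rough ipfw sketch of the tee approach (rule numbers, the divert port, and the match are arbitrary, and bridged filtering needs the sysctl shown; check ipfw(8) before relying on it):

sysctl net.link.bridge.ipfw=1                              # let ipfw see bridged packets
ipfw add 1000 tee 8000 tcp from any to any 443 via igb0    # copy matching packets to divert port 8000
ipfw add 65000 allow ip from any to any                    # pass everything else untouched
# A capture process then reads the duplicated packets from divert socket 8000.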

With a Linux-based OS, you might want to look at nftables as I find it to be much more understandable than iptables.
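For example, a minimal nftables sketch for picking out interesting bridged traffic might look like this (table/chain names, the match, and the nflog group are all illustrative):

nft add table bridge capture
nft add chain bridge capture fwd '{ type filter hook forward priority 0; }'
nft add rule bridge capture fwd tcp dport 443 counter log group 5
# A userspace collector (e.g. tcpdump -i nflog:5) can then read the copied packets.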

Please note that using and configuring BSD is offtopic in the OpenWrt forum, you might want to use a dedicated BSD community site to discuss its use and fitness for packet filtering.