Introducing a shell script for subnet aggregation

But the output is the larger subnet. How would this eliminate subnets?

Can you give an example of how you envision someone using this script to solve their subnetting problems?

I personally have an application for this, which is why I wrote the script. When setting up whitelist-based geoip blocking, I want to whitelist all local subnets. So I have to deal with multiple entries like fdxx:xxxx:xxxx:10::7bf/128 and fdxx:xxxx:xxxx:10:xxxx:xxxx:xxxx:xxxx/64.

Now, I could just add all entries to the whitelist, but I think that's lame. So this script lets me eliminate the smaller subnets that are already encompassed by a larger one.
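To illustrate what "eliminating encompassed subnets" means, here is a minimal sketch. It is IPv4-only for brevity and is not the code from the script itself. The helper names `ip_to_int` and `is_within` and the sample list are made up for the example, and it assumes already-validated input (no leading zeros in octets) and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (IPv4, validated input assumed).

# ip_to_int a.b.c.d -> prints the address as a 32-bit integer
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into 4 octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# is_within small/len big/len -> returns 0 if the first subnet lies inside the second
is_within() {
    s_ip=${1%/*}; s_len=${1#*/}
    b_ip=${2%/*}; b_len=${2#*/}
    # a subnet can only be inside another one with an equal or shorter prefix
    [ "$s_len" -ge "$b_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - b_len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$s_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# Print only the subnets that are not covered by another (different) entry
subnets='192.168.1.0/24 192.168.1.7/32 10.0.0.0/8 10.20.0.0/16'
for a in $subnets; do
    covered=
    for b in $subnets; do
        if [ "$a" != "$b" ] && is_within "$a" "$b"; then
            covered=1
            break
        fi
    done
    if [ -z "$covered" ]; then
        echo "$a"
    fi
done
```

With the sample list above, only 192.168.1.0/24 and 10.0.0.0/8 are printed, since the /32 and the /16 are each inside one of them.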

There's no need to aggregate anything, since nftables provides a built-in auto-merge feature:
https://www.netfilter.org/projects/nftables/manpage.html#:~:text=auto-merge,-automatic

That's good to know. I also knew about similar functionality in ipset. However, I thought that creating an ipset (or nftables set) for a handful of subnets (in most cases 2 or 3) would be a waste of memory. Maybe I'm wrong.

I mean, obviously I am using ipsets for the geoip whitelist, but I think there should be a separate firewall rule for local networks, and creating a set for it seemed wasteful.

Updated the description, so hopefully it's clearer now what the script actually does.
Also, if you think the functionality could be usefully extended to some related area, I'm open to suggestions. @psherman

Found a few ways to optimize these scripts, so now they work about 2.5x faster. Still takes almost 2 seconds to aggregate 3 ipv6 subnets on my old router CPU. But it's better than 5 seconds as it was before :slight_smile:

hi,
i have a question too:

ipv4_regex='((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])'

if i read correctly, in this part (actually parts, as it is repeated) (25[0-5]|(2[0-4]|1[0-9]|[1-9]|) there is an empty alternative at the end, right after the last | and before the closing parenthesis. seems to me like a typo. am i wrong and there is a reason to match an empty string?

thanks

No, you are correct. (Edit: see update below). And wow, I wish I had your ability to spot such a detail :slight_smile:
Thank you for letting me know.

Actually, it turns out that it's not a typo. The empty alternative makes the capturing group (2[0-4]|1[0-9]|[1-9]|) optional, which is required to match a single-digit octet of the ipv4 address. That's a peculiar way to design a regex, and I wasn't even aware of it, since I basically copied the regex verbatim from a stackoverflow answer and just adapted it to ERE syntax (the link to the answer is in the code comment, btw).

At the time, I tested so many different regex options that I couldn't go into much detail on each one. I was just happy to find this one, which performed 40x faster than some others (including the shortest and sexiest one in the same answer) while still seemingly being completely watertight. The regex speed doesn't really matter in this script, but I'm using the same regex to validate lists containing thousands of ip addresses in my other scripts, and there it makes a huge difference.
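Here's a quick way to see the effect of that empty alternative, as a rough sketch. The `is_octet` helper is made up for the example, and it assumes a grep that accepts an empty alternative in ERE (GNU grep does; POSIX leaves that case undefined, so other implementations may behave differently).

```shell
#!/bin/sh
# Single-octet part of the regex discussed above
octet='(25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])'

# is_octet (hypothetical helper): returns 0 if the whole argument matches the octet pattern
is_octet() {
    printf '%s\n' "$1" | grep -Eq "^${octet}\$"
}

# 0 and 7 match only because the inner group may be empty;
# 256 and 07 are rejected
for n in 0 7 42 199 255 256 07; do
    if is_octet "$n"; then
        echo "$n: match"
    else
        echo "$n: no match"
    fi
done
```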

thanks for looking into it.

  1. yes, i see now that a one-digit octet will be matched this way, but there will be two capturing groups, which may create a problem if you process capturing groups.
  2. you don't need a one-liner regexp per se, as it is a script. there are other options to find x.x.x.x format and validate using script language - but that's purely personal preference.
  3. strangely though (25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])\. according to regex101.com site matches on 256 too.
  4. this regexp creates capturing groups which i doubt you use, so maybe you can even fine tune by using non-capturing groups (?:) - but that should be measured, may not give any performance benefit at all.

anyhow thanks for the script.

In theory yes, but in practice I don't think implementing ip validation through shell logic can compete performance-wise with a well-designed regex.

If you take the complete regex ^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])$ (I added the anchors in the beginning and in the end because I'm adding them in the code anyway) and check on that website in ECMAscript flavor (which should match ERE syntax? afaik), it doesn't match any numbers above 255 in any octet. Even just the portion you quoted, on its own and without anchors, doesn't match 256 for me. Not sure which of us is doing something wrong.

Yeah, I should test this sometime, thanks for the tip.

Also, since we've had a pretty good chat about code here, I'd like to ask if you would agree to test another script related to the same mother project that I'm working on. It's here, in the comment marked as Solution. I just need some statistics to see if the heuristics I found actually work in environments other than mine. All you need to do is copy the shell code into a new .sh file (like find-local-subnets.sh), download the getsubnet.sh script (which that code depends on) to the same directory, run sh find-local-subnets.sh (no root required) and check the output. Would be very thankful if you agreed.

most probably it's me. or not.

there is a match, although it is 56.0.0.1 and not 256.0.0.1, but either way it is wrong.

The website is trying to be helpful by displaying which portion of the input does match. However as long as the complete input is not highlighted, it means that there is no actual match in reality.

then it is me definitely :slight_smile:

And if you add the ^ anchor in the beginning (^ means start of the line), you'll see that nothing is highlighted anymore.

yes, because that's a totally different regexp.

Actually, I said a silly thing in my previous comment and I have to correct myself. Without the anchors, the regex really would match (a part of the input). But that's exactly why anchors are a must, and I always use them.
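For anyone who wants to reproduce this outside of the website, here's a small sketch using grep. The `is_ipv4` helper name is made up for the example, and `-o` is not required by POSIX, although both GNU and BusyBox grep support it.

```shell
#!/bin/sh
ipv4_regex='((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])'
addr='256.0.0.1'

# Unanchored: grep finds a valid address inside the invalid input
printf '%s\n' "$addr" | grep -Eo "$ipv4_regex"    # prints 56.0.0.1

# Anchored: the whole input has to match, so 256.0.0.1 is rejected
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq "^${ipv4_regex}\$"
}

if is_ipv4 "$addr"; then
    echo "$addr: valid"
else
    echo "$addr: invalid"
fi
```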

Optimized some more and partially rewrote the scripts, for a 2x speed improvement (and even more with a higher number of subnets). This is probably as fast as I can make it in shell code.

Just wanted to put an update here. I have rewritten fairly big parts of the scripts to improve performance on slow CPUs and to streamline the code. That led to another 30-40% gain in performance. I also added a script which detects and aggregates local area networks (which was my ultimate goal for this side project), and another one which does the same but as a stand-alone version that doesn't depend on other scripts. I need help testing it. If anyone is interested in helping, please post in this thread:

Testers wanted for a shell script