Firewall configuration assistance required

I have a minor issue with my firewall configuration which might escalate into a security issue, so let's talk about it. I'm on a snapshot version, and I am using fw3. Let's start with my network configuration...

It's probably not relevant, but the system is x86_64.

LAN is 10.0.0.0/9 (the 10.0.0.0-10.127.255.255 address space)

Actually the LAN is only 10.0.0.0/16 (10.0.0.0-10.0.255.255), but since I have a VPN network with bridging, outside hosts use 10.x.0.0/24 subnets as their address space, so 10.0.0.0/9 covers this. The testing device where I have this "issue" has the address 10.95.0.1.

The issue comes with containerised service(s): I have nginx running inside a podman container. CNI is configured to use 10.129.0.0/24 as its subnet, so:

config interface 'podman'
	option proto 'none'
	option device 'cni-podman0'

and in my firewall configuration I have the following (unrelated parts cut out):

config defaults
	option flow_offloading	1
	option syn_flood	1
	option input		ACCEPT
	option output		ACCEPT
	option forward		REJECT

config zone
	option name		lan
	list   network		'lan'
	option input		ACCEPT
	option output		ACCEPT
	option forward		ACCEPT

...

config zone
	option name		podman
	list   device		'cni-podman0'
	list   subnet		'10.129.0.0/24'
	option input		REJECT
	option output		ACCEPT
	option forward		ACCEPT
	option masq		0
	option mtu_fix		1

...

config forwarding
	option src		lan
	option dest		podman

config forwarding
	option src		podman
	option dest		wan

...

config rule
	option name		Reject-podman-to-lan
	option src		podman
	option dest		*
	option dest_ip		10.0.0.0/9
	option proto		tcp+udp
	option target		REJECT

config redirect
	option name		Allow-HTTP
	option src		wan
	option dest		podman
	option src_dport	80
	option dest_ip		10.129.0.2
	option dest_port	80
	option proto		tcp
	option target		DNAT

config redirect
	option name		Allow-HTTPS
	option src		wan
	option dest		podman
	option src_dport	443
	option dest_ip		10.129.0.2
	option dest_port	443
	option proto		tcp
	option target		DNAT

ifconfig:

cni-podman0 Link encap:Ethernet  HWaddr FE:7D:2E:77:DF:39  
          inet addr:10.129.0.1  Bcast:10.129.0.255  Mask:255.255.255.0
...

br-lan    Link encap:Ethernet  HWaddr 36:16:CD:39:4D:23  
          inet addr:10.95.0.1  Bcast:10.95.0.255  Mask:255.255.255.0
...

firewall.user does not have any rules.

And then I have a pod that is set to use the IP address 10.129.0.2.
With recent podman on OpenWrt it is possible to keep podman's storage on a physical device instead of in RAM, so pod(s) & container(s) can survive a reboot; but when I don't use that setup, I have a kube file which builds the pod and its host (nginx) from rc.local.

Now, as you can see from my firewall configuration, I want to block access from the containers to ANY LAN IP address, and everything works as it should, except... After building my service with the script in rc.local, I also have to add /etc/init.d/firewall restart (maybe reload would be sufficient...) to block access to the LAN from the containers. I have not fully tested this yet, but I am pretty sure I need to reload the firewall EVERY time after creating containers to keep this wanted behaviour.
If I keep my containers on persistent storage, like on a disk, I do not need to restart the firewall.

And that is the security concern I am talking about: I'd like to fix this so that one would not need to remember to restart the firewall to retain the podman -> lan blocking...

Here is what I think this is about: the firewall starts BEFORE podman. So even though cni-podman0 is declared in the network configuration and the zone is listed in the firewall configuration, the rules are ignored. What is strange is that the http(s) redirections work anyway..

I have never really been a firewall (iptables) expert, so these are just my thoughts on why I have this issue. So now I am looking for those experts to tell me where my mistake is and how to fix it.

Hopefully we can stay on topic...


cat <<'PPP' > /etc/hotplug.d/iface/37-podman
if [ "${ACTION}" = ifup ]; then
	case "${DEVICE}" in
		"cni"*) /etc/init.d/firewall reload ;;
	esac
fi
PPP
grep -q '/etc/hotplug.d/iface/37-podman' /etc/sysupgrade.conf || echo '/etc/hotplug.d/iface/37-podman' >> /etc/sysupgrade.conf


That is actually quite a good solution, restarting the firewall on podman start. I first thought I would add a hook to the init script.. or just add it to start().

And that sysupgrade add-on is a nice touch, although personally, in my case, as it's x86_64, upgrading does not happen through LuCI...

But this still doesn't solve the issue. cni-podman0 is active right after the podman socket is up, but the rest of the clients use veth interfaces; they don't expose their IP addresses on the veth interfaces, and the suffixes of the veth names are random (I think). I could of course change that script to reload on ifup of veth*... but what if veth* is also used for something else that does not require such an action...

Anything I could do for my firewall config that would help?

uci add firewall include
uci set firewall.@include[-1].path='/etc/podman.deny'
uci set firewall.@include[-1].reload='1'
uci commit firewall
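
For reference, those uci commands produce this stanza in /etc/config/firewall; fw3 will then run the script on every firewall reload:

config include
	option path		'/etc/podman.deny'
	option reload		'1'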

cat <<'TTT' > /etc/podman.deny
#!/bin/sh
# network that must not reach the containers (adjust to your LAN)
bNET="10.2.3.0"
bPREFIX="25"

# collect the container IP address(es) from the podman inspect output,
# then (re-)insert one block rule per address
for pIP in $(podman inspect -l | grep IPAddress | cut -d'"' -f4 | sort | uniq); do
	# delete any previous copy first so reloads don't stack duplicates
	iptables -t mangle -D FORWARD -s $bNET/$bPREFIX -d $pIP/32 -j DROP 2>/dev/null
	iptables -t mangle -I FORWARD -s $bNET/$bPREFIX -d $pIP/32 -j DROP
done
TTT
chmod +x /etc/podman.deny

/etc/init.d/firewall restart

???

(change cni in the earlier hotplug script to veth, move it to 'net', match 'add' instead of ifup, and match INTERFACE instead of DEVICE)
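
i.e. something like this (an untested sketch; the filename is just an example):

cat <<'PPP' > /etc/hotplug.d/net/37-podman
# 'net' hotplug events carry ACTION=add/remove and INTERFACE=<device name>
if [ "${ACTION}" = add ]; then
	case "${INTERFACE}" in
		veth*) /etc/init.d/firewall reload ;;
	esac
fi
PPP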


scavenging has room for improvement

edit: thinking about it a little, the -D section above is a pretty hacky method that won't really scavenge too well... so we may end up with some stale block rules, as we are not tearing these down on hotplug net remove (etc)... this is probably not a huge deal for a simple local network block... but for a production environment it can be made much better with a dedicated chain that can be flushed with -F (see banIP), or with a --comment on each rule to locate the stale ones...
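
e.g. an untested sketch of the chain variant (the chain name PODMAN_DENY is made up):

#!/bin/sh
# keep all block rules in a dedicated chain so they can be rebuilt in one go
iptables -t mangle -N PODMAN_DENY 2>/dev/null
iptables -t mangle -C FORWARD -j PODMAN_DENY 2>/dev/null || \
	iptables -t mangle -I FORWARD -j PODMAN_DENY
# flush and repopulate instead of -D'ing individual (possibly stale) rules
iptables -t mangle -F PODMAN_DENY
bNET="10.2.3.0"
bPREFIX="25"
for pIP in $(podman inspect -l | grep IPAddress | cut -d'"' -f4 | sort -u); do
	iptables -t mangle -A PODMAN_DENY -s $bNET/$bPREFIX -d $pIP/32 -j DROP
done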


not enough info about the veth interfaces or prefixes to really advise...

ifconfig for veth*:

vethcc114952 Link encap:Ethernet  HWaddr CE:48:08:B0:BF:0C  
          inet6 addr: fe80::cc48:8ff:feb0:bf0c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:206 errors:0 dropped:0 overruns:0 frame:0
          TX packets:349609 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:44214 (43.1 KiB)  TX bytes:14697486 (14.0 MiB)

If I understand correctly, these veths tunnel traffic from the pods/containers to cni-podman0 (the gateway for the pods/containers), and there is no masquerading, so 10.129.0.2 shows up as 10.129.0.2, whereas with masquerading it would have the IP of cni-podman0.

And so, so sorry: by prefix I meant the naming prefix; in this case the random part of the name is cc114952, meaning that I cannot write a script that targets specific veth names, as they seem to be more or less random.

bNet="10.129.0.0"
bPrefix="24"

The script should also have support for not adding the same IP twice, as mine:

root@dev2:~# podman inspect -l|grep IPAddress
            "IPAddress": "10.129.0.2",
                    "IPAddress": "10.129.0.2",
root@dev2:~# 

lists the same IP twice..

Actually I have been working on a podman ubus controller/manager for a while now..

It doesn't have that feature, but it could have: a feature that lists IP addresses. The problem is that my software isn't 100% real-time, as updates happen in the background, but maybe I could add a call that works in real time..

The reason for that tool is a nice LuCI status page / minimal commander, in a manner similar to cockpit-ubus. I am aware of luci-app-docker, which I have for the most part ported to podman (load stats do not work and never will), but I wasn't happy with it: it is almost a full commander for podman, when I think a web UI should not be that complex; complex setup is supposed to be done in a terminal. And there were still issues: pulling a new image took too long and caused a timeout, which then resulted in a failed pull. So in that sense I am not re-inventing the wheel.. It's still work in progress..

At the moment it provides me with this in the LuCI overview: (screenshot omitted)


this is for the non-container network you want to block... the container addresses are the -d side and are already de-duplicated...

then..

bNet="10.0.0.0"
bPrefix="9"

will that still match your container network?

a simple example:

###### br-lan: 192.168.1.0/24
bNET="192.168.1.0"
bPREFIX="24"

you know more about what rules you'd like... the above are all the ingredients to make it happen...

in the above example I've given, the rules disallow a 'network' from reaching each instantiated container (s = all)... as that is how I interpreted your request... adjust as required...
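
so with the example values above, each instantiated container (say 10.129.0.2) ends up behind a rule like:

iptables -t mangle -I FORWARD -s 192.168.1.0/24 -d 10.129.0.2/32 -j DROP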

No it won't.

10.0.0.0/9 is 10.0.0.0-10.127.255.255,
and the container network was explicitly moved out of that range:
10.129.0.0/24
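
A quick check with ipcalc.sh (it ships with OpenWrt) confirms the boundary (output shown approximately):

ipcalc.sh 10.0.0.0/9
# IP=10.0.0.0
# NETMASK=255.128.0.0
# BROADCAST=10.127.255.255
# NETWORK=10.0.0.0
# PREFIX=9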


ok... looks good then...

(it does seem like the services themselves have some facility to manage these types of things, and it's likely less clunky than this method... but as I've never used them before, it's too much hassle for me to dig into cnitool or whatever to extrapolate how they work...)

On my software, there seems to be an issue of miscalculated uptime, as nginx has been running longer... I'd better look into that sometime... but not now: my spouse and I have our anniversary, so we are heading to a restaurant. Glad we worked out a solution, although I will have time to set it up tomorrow at the earliest..

The /9 prefix is a bit wide, but as I have some ongoing projects that I test on live devices, I like to have some address space available when necessary..

/10 would already have been sufficient, but there was the issue that I already had hosts outside that range and didn't want to change their addressing, so I settled on /9 and adjusted cni's configuration instead.

Thanks wulfy23 :slight_smile:


I completely forgot to add here my firewall rule, which for some reason only blocked access to the container host's IP but not to the other IP addresses in the LAN; here goes:

config rule
	option name		Reject-podman-to-lan
	option src		podman
	option dest		*
	option dest_ip		10.0.0.0/9
	option proto		all
	option target		REJECT

This rule is useless anyway, as the zone forwarding policy already rejects connections from podman to lan. I think the firewall does not like the /9, which would explain why the rule does not work towards the bridged VPN network(s).

Anyway, I probably wasn't clear enough, but I wanted to block access the other way around: there is no point in blocking access from LAN -> containers, at least not in my case, as I don't have a guest network. I have adjusted it for the desired behaviour.

Anyway, I found a simpler solution that does work. Using the mangle table seems to be the key here (presumably because the mangle FORWARD chain is traversed before the filter FORWARD chain, where the CNI plugins insert their own accept rules), so even though the hotplug scripts were a nice addition, just adding this to firewall.user seems to do the job:

iptables -t mangle -D FORWARD -s 10.129.0.0/24 -d 10.0.0.0/9 -j DROP 2>/dev/null
iptables -t mangle -I FORWARD -s 10.129.0.0/24 -d 10.0.0.0/9 -m comment --comment "podman_drop_lan_rule" -j DROP

The idea of disallowing access from containers to the LAN is isolation. Consider the following scenario: a web server is accessible from the WAN and runs inside a container; the web server has a security issue and someone is able to gain login access to it. By cutting off its access to the LAN, only containers running in the same pod are at risk, and the host system cannot be reached.

If all files used by the containers outside of the base system (say configurations, certificates and website files) are backed up, then even if a container were completely ruined, restoring it would be a pretty straightforward task: restore the files from backup and re-create the containers. This could even be set up to happen during each boot to be sure, as long as the backups are taken manually at a time when it is certain the system hasn't been compromised.
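
For example, a rough sketch of that restore-on-boot idea (all paths and the kube file name here are made up):

#!/bin/sh
# restore the container payload from a known-good backup...
BACKUP="/root/backup/webstack.tar.gz"
STATE="/srv/containers/webstack"
rm -rf "${STATE}"
mkdir -p "${STATE}"
tar -xzf "${BACKUP}" -C "${STATE}"	# configurations, certificates, website files
# ...then re-create the pod and its containers from the kube file
podman play kube /etc/pods/webstack.yaml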


new partly related issue...

the setup with the containers is the following:

  • proxy server (caddy) with ports 80 and 443+1443 -> proxies to :1085
  • nginx server with port 1085

port 1443 must be there for local connections to be possible, but this is the issue: I'd prefer not to need port 1443 on my pod (container group) at all

I have a firewall rule redirecting from wan:443 to caddy:443, and this works perfectly (Allow-HTTP & Allow-HTTPS).
But from the LAN, connections to 10.129.0.2:443 (or :80) are most likely dropped, as the connection eventually times out (REJECT would notify that it was rejected..).
When I create my pod, I do not publish any ports, as I use those previously mentioned firewall rules, Allow-HTTP and Allow-HTTPS, to redirect to the pod, and I trust that more than publishing with podman, which is targeted at somewhat different systems.

First, I thought this was a uhttpd issue, so I changed the IP addresses it listens on to cover only the host's LAN & VPN IP addresses. That did not help, but it is most likely a good choice anyway in case I want to isolate the containers..

Second, I use zerotier, so I decided to blacklist 10.129.0.0/24, but still no good..
I also checked my zerotier service configuration at my.zerotier.com and found out I had a bridging issue there: instead of 10.0.0.0/16 I should use 10.0.0.0/9. That updated the routing information, but it wasn't the solution either. Eventually I removed zerotier's local blacklist of 10.129.0.0/24, as it wasn't helpful.

I can set it to use 444 instead of 1443, and it works. So it is not a problem of the port being below 1024.

I do have a workaround with that port 1443: I use it to determine that the service is running properly, and this in turn is used by the uacme script when attempting certificate renewal. First the script uses my podman-ubus tool to check that both caddy and nginx are running, then it uses

curl --insecure -H "Host: ${SERVICE_HOSTNAME}" -IL https://${SERVICE_IP}:${SERVICE_PORT}

and then parses the status code, which needs to match 200.
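
The parsing can be as simple as letting curl print just the status code, e.g. (a sketch using the same variables as above):

SERVICE_STATUS=$(curl --insecure -sIL -o /dev/null -w '%{http_code}' \
	-H "Host: ${SERVICE_HOSTNAME}" "https://${SERVICE_IP}:${SERVICE_PORT}")
[ "${SERVICE_STATUS}" = "200" ] || exit 1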

But still, I'd like to get lan -> podman to work on ports (80 and) 443 as well, without needing any extra ports.

I have a solution..

Original route to 10.129.0.0/24:

10.129.0.0/24 dev cni-podman0 proto kernel scope link src 10.129.0.1

I could add it to my rc.local after creating the pod on boot, but I wonder if there is a prettier solution as well..
The following route change fixes the connection:

ip route del 10.129.0.0/24
ip route add 10.129.0.0/24 dev cni-podman0 proto kernel scope link via 10.129.0.1 src 10.95.0.1

I tried to fix this by setting up a static route, but I didn't have any luck with that..
Maybe there should be a hotplug script to fix this (sketched below)...
But for now, I added this to my rc.local, which creates the pod and the containers inside it..

# Fix routing to podman..
LAN_IPADDR=$(uci get network.lan.ipaddr)
ip route del 10.129.0.0/24
ip route add 10.129.0.0/24 dev cni-podman0 proto kernel scope link via 10.129.0.1 src ${LAN_IPADDR}
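
That hotplug variant could look something like this (an untested sketch; the filename is made up):

cat <<'PPP' > /etc/hotplug.d/iface/30-podman-route
# re-apply the source-hinted route whenever the podman interface comes up
if [ "${ACTION}" = ifup ] && [ "${DEVICE}" = "cni-podman0" ]; then
	LAN_IPADDR=$(uci get network.lan.ipaddr)
	ip route del 10.129.0.0/24 2>/dev/null
	ip route add 10.129.0.0/24 dev cni-podman0 proto kernel scope link via 10.129.0.1 src ${LAN_IPADDR}
fi
PPP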

Some thoughts.. There is a CNI plugin, cni-route-override; I started to think that maybe I could patch it to support fields for src and via to achieve this from the CNI configuration.. but that seems like something that should instead be done upstream, directly in CNI's tree. I made a proposal for this through their issue tracker.

later notes:
The change must be made in CNI's repository, or bigger changes (a custom structure for route) would be needed in cni-route-override. Many components use types.go, which contains the Route type, and changing it elsewhere doesn't sound very wise; and making my own patch for CNI won't make a difference on its own, as the tools built on top of CNI would need to be updated as well, so I dropped this idea for now. So back to rc.local or a similar solution.

Though I did make a PR for a new package, cni-route-override, so if it's handy for someone, it should be available soon in recent git builds..

Next issue:
connections using the internet hostname fail due to rejection from the LAN. My host is behind a domain name, and requests from the LAN to that internet domain name get rejected, while requests from outside the LAN (the internet) do work...