Help with uxc containers

I wanted to try uxc, OpenWrt's own minimal container system. It is OCI compliant, so I should be able to use extracted Docker images with it, but let's put that aside for the moment and just get something small running; for example, another instance of OpenWrt in a container shell.

So I downloaded the x86_64 rootfs image for this purpose. Due to issues faced later, I replaced that image with my own build, since cttyhack isn't included by default in BusyBox.

Okay, so here's how to do it, at least in theory:

# mkdir -p /root/cntr/rootfs
# mkdir -p /root/cntr/overlay
# cd /root/cntr
# crun spec
# cd /root/cntr/rootfs
# tar xvfz {DL_PATH}/openwrt-x86-64-generic-rootfs.tar.gz

Now our container is all set, except for the uxc part. Let's continue to that.
The OCI spec generated by crun spec (config.json) contains the path rootfs as the root filesystem path; you can change this to whatever you like. You can also use other runtimes instead of crun, for example runc.
It also contains the command to run, which by default is sh.
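For reference, the relevant defaults in a freshly generated config.json look roughly like this (exact contents may vary between crun/runc versions):

        "process": {
                "terminal": true,
                "args": [
                        "sh"
                ],
                ...
        },
        "root": {
                "path": "rootfs",
                "readonly": true
        },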

# cd /root/cntr
# uxc create cntr --bundle /root/cntr --write-overlay-path /root/cntr/overlay

Now container cntr has been created. With uxc list we get the following:
[ ] cntr created runtime pid: 4566 container pid: 4569

And our syslog shows:

daemon.info ubusd[4570]: loading /usr/share/acl.d/dnsmasq_acl.json
daemon.info ubusd[4570]: loading /usr/share/acl.d/luci-base.json
daemon.info ubusd[4570]: loading /usr/share/acl.d/ntpd.json
daemon.info ubusd[4570]: loading /usr/share/acl.d/wpad_acl.json
daemon.info procd: out: jail: using guest console /dev/pts/2
daemon.info netifd[4571]: jail: exec-ing /sbin/netifd
user.notice : Added device handler type: bonding
user.notice : Added device handler type: 8021ad
user.notice : Added device handler type: 8021q
user.notice : Added device handler type: macvlan
user.notice : Added device handler type: veth
user.notice : Added device handler type: bridge
user.notice : Added device handler type: Network device
user.notice : Added device handler type: tunnel
daemon.err netifd[4571]: netifd_ubus_init(1372): connected as 013760e5
daemon.err netifd[4571]: config_init_wireless(648): No wireless configuration found
daemon.notice netifd: Interface 'loopback' is enabled
daemon.notice netifd: Interface 'loopback' is setting up now
daemon.notice netifd: Interface 'loopback' is now up
daemon.notice netifd: Network device 'lo' link is up
daemon.notice netifd: Interface 'loopback' has link connectivity

So far so good, except for those netifd errors. Wireless obviously isn't working since I don't have wireless hardware on my host, and the other error is probably related to the not-yet-configured network (veth pair), as explained in this somewhat outdated wiki: https://gitlab.com/prpl-foundation/prplos/prplos/-/wikis/uxc

Okay, then we start our container with uxc start cntr. Still no errors, but the system log shows:

daemon.info procd: out: jail: prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, 5, 0, 0) failed: No such file or directory
daemon.notice netifd: Interface 'loopback' is now down
daemon.notice netifd: Interface 'loopback' is disabled
daemon.notice netifd: Network device 'lo' link is down
daemon.notice netifd: Interface 'loopback' has link connectivity loss
daemon.info netifd[4571]: jail: jail (4572) exited with exit: 0

Okay, serious problems with capabilities. man prctl says that raising an ambient capability fails unless that capability is already present in both the permitted and inheritable sets (and securebits can block ambient raises entirely). So I started to dissect this: by removing the capabilities from config.json's ambient section I eventually got rid of these errors, and I actually added new capabilities to all the other sets (all but ambient).
Now, delete your container cntr with uxc delete cntr and make capabilities look like this:

                "capabilities": {
                        "bounding": [
                                "CAP_KILL",
                                "CAP_NET_RAW",
                                "CAP_AUDIT_WRITE",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "effective": [
                                "CAP_KILL",
                                "CAP_NET_RAW",
                                "CAP_AUDIT_WRITE",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "inheritable": [
                                "CAP_KILL",
                                "CAP_NET_RAW",
                                "CAP_AUDIT_WRITE",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "permitted": [
                                "CAP_KILL",
                                "CAP_NET_RAW",
                                "CAP_AUDIT_WRITE",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "ambient": [
                        ]
                },          

For now, also make the rootfs writable:

        "root": {
                "path": "rootfs",
                "readonly": false
        },

I also added /tmp to the mounts:

                {
                        "destination": "/tmp",
                        "type": "tmpfs",
                        "source": "tmpfs",
                        "options": [
                                "nosuid",
                                "noexec",
                                "nodev",
                                "rw"
                        ]
                }

Also, you could install catatonit; or, if you built your own image, the tini build directory also contains a statically built version of tini (catatonit is built statically already). Or get both and see what works for you. If you decide to use one, copy catatonit and/or tini to the root of the container's rootfs.

If you want to use them, change config.json again like this:

                "args": [
                        "/catatonit", "--", "dropbear",
                        "-R", "-F"
                ],

The advantage of these is that they pass signals on to child processes. But to get that to work, we need to add an environment variable. Note that this is for BusyBox (and ash/sh) only; other systems have their own unique ways. So, config.json again:

                "env": [
                        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                        "TERM=xterm",
                        "ENV=/etc/profile"
                ],

Adding "ENV=/etc/profile" causes it to run profile, which I pretty much cleared from stuff that isn't needed in containers, so it looks like this:

export PS1='\u@\h:\w\$ '
export EDITOR=/usr/bin/nano
trap "exit 0" HUP INT QUIT TERM KILL

All this helps uxc's kill command to work; it might still need tweaking. Also, when attached, Ctrl-C kills the terminal.

Those caps, among other things, now allow the use of ping, which everyone wants since it's the easiest way to test for connectivity.

Re-create and start your container, results:

# uxc list
[ ] cntr running runtime pid: 22705 container pid: 22760
# uxc state cntr
{
"ociVersion": "1.0.2",
"id": "cntr",
"status": "running",
"pid": 22760,
"bundle": "/root/cntr"
}

And here is the network setup to enable networking for the container:

config device 'veth0'
	option type 'veth'
	option name 'vhost0'
	option peer_name 'virt0'

config interface 'virt0'
	option proto 'static'
	option device 'virt0'
	option ipaddr '10.0.201.2'
	option netmask '255.255.0.0'
	option gateway '10.0.0.2'
	option jail 'cntr'
	option jail_ifname 'host0'

My LAN netmask is 10.0.0.0/255.255.0.0, and 10.0.0.2 is not the gateway per se; it's the host's IP address, while 10.0.0.1 is the gateway. But this is how it works. Addresses from 10.0.200.0 upward are outside of my dnsmasq's DHCP range, but still accessible in the subnet.
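With that in place, a quick sanity check from a shell inside the container (ping relies on the CAP_NET_RAW we kept in the capability sets, and host0 is the jail_ifname from the config above; applet names assume BusyBox):

# ping -c 1 10.0.0.1
# ip addr show host0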

Now the network works, both to LAN and to WAN, but name resolution doesn't.
With uxc, /etc/resolv.conf is linked to /dev/resolv.conf.d/resolv.conf.auto, and overwriting /etc/resolv.conf doesn't help. On your host (not in the container), check /tmp: you'll find /tmp/resolv.conf.d there, as well as /tmp/resolv.conf-cntr.d. In resolv.conf.d you'll find your current resolv.conf, but in resolv.conf-cntr.d the file is empty.

With OCI/uxc it is possible to create hooks. I tried to create a createRuntime hook that copies the file from resolv.conf.d to resolv.conf-cntr.d, but it did not work out: the container did not start properly, it got stuck in every possible way, and it could not be killed or deleted. (With uxc list you get the pids for the runtime and the container; by killing both pids you can delete the container.) Instead I made this startup script, which also starts the container, to work around the issue; and it does work.

# cat /root/cntr/create.sh
#!/bin/sh
uxc create cntr --bundle /root/cntr --write-overlay-path /root/cntr/overlay
cp /tmp/resolv.conf.d/resolv.conf.auto /tmp/resolv.conf-cntr.d/
sleep 1
uxc start cntr

Without the sleep, it results in errors. One second is a long time, but I didn't start looking for the perfect value to use with usleep.
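If the race is that netifd hasn't populated /tmp/resolv.conf-cntr.d yet (an assumption; I haven't verified which step actually races), a polling variant along these lines could replace the fixed delay:

#!/bin/sh
uxc create cntr --bundle /root/cntr --write-overlay-path /root/cntr/overlay
# wait up to ~5 s for the per-container resolv.conf directory to appear
i=0
while [ ! -d /tmp/resolv.conf-cntr.d ] && [ $i -lt 50 ]; do
	usleep 100000
	i=$((i + 1))
done
cp /tmp/resolv.conf.d/resolv.conf.auto /tmp/resolv.conf-cntr.d/
uxc start cntr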

Now, on to the things that do not work.

Do you want to replace podman/docker with uxc? Well, there's a huge gap. I assume you want to be able to attach to the container's shell from time to time; in podman it would be done like this:
podman exec -it cntr /bin/sh

But first, our service. All attempts to run dropbear failed. Correct command-line arguments for dropbear would be "-R" and "-F", maybe also "-P" and "/tmp/dropbear.pid", where the last two set the location of the pid file, since we don't have /tmp/run initially.
You can add dropbear to config.json's process -> args, but even with tini or catatonit I was not able to get it to work.
I tried to add dropbear to the profile, with and without the -F argument. The -F argument keeps it in the foreground, so if I could get it to work, I would also have a shell. All attempts to do this failed. So I attached to the shell and tried to execute it there: no errors, nothing, just a clean exit of dropbear. This is where I came to the conclusion that uxc limits the container so that processes cannot have child processes. Yet even as a single process, every attempt to run dropbear failed.

At https://github.com/jjlin/docker-image-extract you can find a great script to pull and extract images from Docker Hub. I searched for the tiniest Alpine-based dropbear image and found one. Using it as the root filesystem, and keeping the profile we created earlier, I got some better results. I was even able to run dropbear and connect to it, but it still ended in failure, probably due to that single-process (no forking allowed) limitation: sh isn't allowed to exec as a child process. After logging in, the connection just ends. Running a shell alongside dropbear with the previously mentioned methods failed as well.

So how is this done in podman? Oh, there is no shell. Instead, /bin/sh or whatever command is executed separately in the same namespace. You could probably find out the namespace yourself (check the running pid with uxc list and then look at /proc/PID_HERE/ns) and execute sh there, maybe with nsenter, as sketched below. I didn't test this at the time since my testing ended at dropbear not working. A web server probably would have worked, as it rarely forks. This is probably why containers normally run a single process; in the case of a web server, you have other containers for the proxy and PHP and whatever else, like MySQL. But at least with podman and a little effort, you CAN have child processes and even run all of the above in a single container, though pods are there for that.
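A sketch of that untested idea, using the pids from uxc list (nsenter is part of util-linux; the flags join the mount, UTS, IPC, network and pid namespaces):

# uxc list
[ ] cntr running runtime pid: 22705 container pid: 22760
# ls /proc/22760/ns
# nsenter -t 22760 -m -u -i -n -p /bin/sh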

One really big downside is that when using uxc, you will have to get familiar with the reset command, as something happens to your terminal pretty often: even when viewing logs in another terminal, and especially when attaching.

Also, other container systems are easy: they build the OCI spec automatically from the arguments given when creating the container, and hooks probably actually work there too. As the system has its flaws, it's really time consuming to try to get it working when editing the spec manually.

Killing containers doesn't allow you to restart them. But it's doable with ubus:

ubus call container.state '{ "spawn": true, "name": "cntr" }'

Hmm... a wrapper script would probably be a great help to speed up management while testing.

If you have problems stopping containers, you might want to force it with signal 9:

uxc kill cntr 9

Though if you end up with that previously mentioned issue with hooks, this won't work either; use the methods I mentioned in that part of the guide.

So maybe it's not there yet to compete with other container systems, but it's very promising. Up-to-date and thorough documentation is also needed; this post gives some pointers for a jump start. There might be a capability that allows running child processes; someone else can probably find that out.

But still, very promising. Podman, for example, is great. Indeed great, where I mean big, huge; and most users don't need even a fraction of it. It is possible to write a script that retrieves the CPU usage of a process; I have a mini container management tool (for podman) that works in LuCI. It can only start/stop/restart containers, and it shows which containers are available, their state, CPU usage and memory consumption, because that is all I need, plus exec -it cntr /bin/sh. So uxc isn't that far from it. But then there's the problem that I wasn't able to keep dropbear running with the OpenWrt rootfs, even though I really tried; so: work in progress.

Oh, I also want logs. Also, when attaching to a shell, your syslog is filled with every keypress by procd.

If uxc gets to the level I described using with podman, I would very probably use uxc instead of podman.


Some updates, even though I thought I would leave this for someone else to figure out.

A shell into the container with nsenter:

# uxc list
[ ] cntr running runtime pid: 15642 container pid: 15649
# nsenter -a -t 15649 /bin/sh
# %n@%m:%~# ps ax
PID   USER     TIME  COMMAND
    1 root      0:00 /catatonit -- dropbear -R -F
    2 root      0:00 dropbear -R -F
   42 root      0:00 /bin/sh
   43 root      0:00 ps ax

prctl is the system call used by uxc/ujail to set capabilities; the available prctl operations are listed in the prctl(2) man page.

Probably good to inspect further:

  • PR_SET_CHILD_SUBREAPER
  • PR_SET_NO_NEW_PRIVS

For the list of Linux capabilities, see the capabilities(7) man page.
CAP_SYSLOG might be a solution to logging.

After adding CAP_SYSLOG, the logger command outputs to my host system's syslog. I wonder what would happen if my small dropbear image had a syslog daemon and whether I could run it...
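A quick way to check this (assuming BusyBox's logger and logread applets are present):

Inside the container:
# logger -t cntr "hello from the container"

On the host:
# logread | grep cntr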

Something I got wrong is the ability to run child processes.
It is completely doable, as I can run sh in an attached session. So a child sh works.

@daniel - could you maybe chime in here?

@oskari.rauta Great to see that someone other than me is actually trying to use this :slight_smile:

First off, did you see the uxc attach command, which allows you to attach to the console of a running container?

OCI demands that containers, once killed, have to be created again; it's a finite life cycle by definition. To make things a bit more convenient at least, I've implemented uxc create in such a way that you can skip all the additional parameters if the container existed before. So to kill and restart you have to do

uxc kill cntr 9
uxc create cntr
uxc start cntr

If you want to see console output while the container is starting you can do

uxc start --console cntr

Nice find. We kinda need a good way to download, extract, strip and re-pack Docker containers into squashfs filesystems to be used by APK / uvol, which is the other half of this container ecosystem and which kinda got stuck because APK has not yet replaced OPKG. With APK we are going to have "container packages" which will ship a squashfs image plus some metadata, to be written directly into a uvol storage volume (backed either by LVM2 or UBI).

The distribution aspect is very different from Docker and OCI, simply because OCI demands that you download a tarball and extract it somewhere, i.e. requiring both the space to store the tarball and the space to store the extracted content somewhere on a filesystem. This obviously isn't very feasible on small devices which may not have that much storage space. Hence the idea when using uxc with uvol and APK is that a container image (squashfs) can be downloaded directly into a storage volume automatically created for that purpose, blockd will take care of auto-mounting, and the whole thing is a bit inspired by https://distr1.org/

Yes, this was for debugging console forwarding (the terminal subsystem in Linux and UNIX isn't exactly straightforward or easy to understand; getting terminal attachment working was one of the most difficult things in that whole story). Let's have that disabled by default in the future.

I can't express how much I appreciate that! I have been using uxc myself for years (mostly to run Grafana inside an Alpine container on some more powerful routers, and SpamAssassin inside a Debian container on my OpenWrt-based mailserver) and it was kinda demotivating that nobody except me ever seemed to even try using it.

The one most important thing to move forward with this would be to work on getting #4294 Add APK package building capabilities to OpenWrt into a state where we could merge it. I've resumed working on this with @aparcar and hope we will get it in for the next release.

This will then open the doors for having containers managed via APK, and having binary feeds for containers just like we got for packages.

Yes, I've been following this for a while. Weren't you the person who asked me to take over maintaining podman? There are a few years between that and now, and I decided to give uxc another try. Maybe a feature request would be in place? Integrate nsenter into a command-line argument "console" :slight_smile: and make an alias for kill with the name "stop" :slight_smile:

Oh, it didn't occur to me that I could use attach for log viewing. Still, it isn't quite the same as piping the output to something I could retrieve later.

I thought that was uxc's way to do the dance I made with nsenter..

And then there is the real problem that I wasn't able to get sshd (dropbear in this case) to work: on the OpenWrt root, not at all, and on Alpine's, well, I couldn't log in. Actually, I also tried with an alternative command, but it didn't do anything. Login actually worked and a password was asked, but when the correct password was entered, it just ended the connection right there, when it should have executed sh or the shell of the user's choice.
Maybe not that big a problem, but it's very difficult to trace what is wrong when nothing ends up in the output.

Well, there are some people who spin up containers for everything; and then there is me, who uses them to run exposed servers (and "exposed" here has nothing to do with --expose from docker or podman, which I am greatly against), so they almost never need to change.

Disk space of course is a problem, though not on my end: OpenWrt takes almost no space, and all my routers run with a minimum of 250 GB, up to 1 TB. So I am not that familiar with ways to save space, but it's very nice that you have given some thought to it; it looks promising.

That script I linked is great indeed.
I've also been running my server with OpenWrt for years, using podman on it; like I mentioned, I even have some special tools for it. But as I also said, podman has tons of features that people do not understand or need, me included. Some simpler setup, such as uxc, would be sufficient; it just needs some fine tuning.

So, do you have the resolv.conf stuff working? Could you also take a look at the issue with hooks? My create.sh script is simple and works great, but surely we can do one better?

I also like apk very much. opkg is like a lightweight version of Debian's package manager, and I have never been a fan of Debian's package manager. Ever. So in my case opkg was an upgrade from that, but I'm keeping my eyes open for apk support. Before looking at your PR on apk, I didn't even realise its complexity.

But... why couldn't you have apk in your containers? Or whatever else, for that matter? For example, almost any Alpine-based container has apk. Now just add an overlay, shell into it, set it up, update the package database and install your package, and it's in the overlay and, that way, in the container. I didn't try this in my uxc attempt since I chose the most minimal dropbear image I could find, and it didn't come with the apk binary, but I don't see why this wouldn't work.
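Something along these lines, with the same nsenter trick as before, assuming an Alpine-based container created with --write-overlay-path (untested here, as said):

# nsenter -a -t CONTAINER_PID /bin/sh
/ # apk update
/ # apk add nano

The installed files should then land in the overlay on the host, and so survive re-creation of the container.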

Okay, I'll make a huge confession here that I will surely regret later. I confess that I've never used emacs, and as for vi... I've used it fewer times than I have fingers on my hands (I do have all of them), and on those occasions I checked guidance before proceeding. I always use nano. There was a time when I used Joe, and wasn't it pico that nano used to be called, about 20 or so years ago? So when I work on my containers and notice that I haven't included nano in them, this is the way I usually get it.

In my opinion, uxc is even better than podman for my case. With podman it's rather difficult to make things static. I've done it; if I reboot my server (a KVM server about 5000 km away from me, uptime currently about 84 days, running OpenWrt, with containers for nginx, caddy and PHP; it used to have exim, but Apple's iCloud+ handles that for me now), the servers are ready faster when they don't need to be pulled from online sources.

Actually, with uxc I can have them truly static, like in my example post, which serves me best. Everyone else might not like it that way, but in my opinion it's great work. Just keep doing it :wink:

Please investigate the dropbear problem. Try with an OpenWrt image,
or an external image, such as this one: https://hub.docker.com/r/danielkza/dropbear-static

I think I used that one... Beware, it's very minimal. No apk included :wink:

@daniel

Couldn't give up on it, so I made some further tests. I took the official Alpine image as my base, opened up a shell to it with nsenter, installed all kinds of gimmicks on it, such as dropbear, catatonit and tini, along with my favourite shell zsh, and linked it to the same location as on OpenWrt, so that I won't need to ask nsenter to start sh every time (nsenter tries to start the effective user's shell from the location defined in passwd, disregarding what is in the namespace being entered; which makes perfect sense, as it could be very different).

So I added a new test user and set a password for it, along with setting the root password, and then started dropbear with dropbear -R -F -B, and logging in with ssh worked like a charm. So was something wrong with the image I used previously? No. I changed config.json to start dropbear with the same arguments (in many ways, including via sh -c, tini and catatonit; none of them worked) and attempted to log in. What do I get with that? Well, the same message I got with my previous login, but no shell; once again it disconnected after a successful logon. So I attempted to debug this and connected with something that has a full openssh, so I could use -v and -vvv to find the issue. I did not find one; the connection just ended.

The next thing that came to mind was that maybe, just maybe, the command line is executed before the filesystem with all its bindings is ready; though it shouldn't matter, since otherwise you'd have to restart your ssh server every time you change the system. So I copied the files from the overlay directly to the container's rootfs. As I suspected, this wasn't the culprit and didn't change the end result.

This is impossible to debug, as there is no debug output. Dropbear doesn't die on a successful connection; it just ends the connection, and you can re-connect as many times as you want, but it won't change the fact that you won't get a shell.

This might sound like I'm very keen on getting ssh to work, and I can assure you that is not what I am after. Actually, it's pretty pointless to even be able to connect to a container running nothing more than ssh; it's just the first attempt I made at this, and it ended in failure, so that's a starting point in my opinion. And the fact that dropbear's sshd fails means that other things could fail as well, so I think this issue belongs at the top of the list: isn't the main reason for containers' existence to run software isolated from the host? For the moment, there just seems to be a big problem with that.

EDIT:
Progress has been made. OpenSSH's sshd: far from success. dropbear: I can now log in as root to the container, and I am working on getting pts access for a normal user as well.

I just logged into the forum after seeing the email notification of your message, which did not contain that final part. My intention was to tell you that PTS/PTY allocation or access rights are most likely the problem, but it looks like you have figured it out by yourself already :wink:

As for how to debug: I tend to strace everything when I don't know what's going on, and then I usually do know what's happening afterwards...

Regarding config.json: does it use user namespaces? Because those do make things a bit more tricky, and I have done only a little testing with user namespaces.

Well, PTS/PTY rights are still the problem that causes the test-user login to fail: dropbear fails to chown /dev/pts/0, which should be chowned to test:5, where test is the test user I am trying to log in as, and group 5 is the tty group.

With cap changes, I was able to make root login work, but I'd like to document the changes that are necessary so nobody else attempting the same runs into this trouble. It's just very slow: I have no idea which capabilities I am missing, so it's a matter of trying them one by one to find the correct capability.

I used nsenter to get into the container's namespace; in there, dropbear works. Here's the list of capabilities from that session:

crun:/proc/7# cat status| grep Cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
crun:/proc/7# capsh --decode=000001ffffffffff
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

Whereas when the container starts, this is the current list:

0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap

As I didn't find any info about the caps I'm missing, this will be super slow; there are quite a lot of caps to test. Also, if I knew which set a cap needs to be in (in case it even needs to be in just one set), it would speed things up. Adding caps one by one is not fast; it's the manual labor here that keeps me from success :smiley:
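At least the candidate list can be narrowed down by diffing the two masks above (a sketch; capsh ships in the libcap package):

# capsh --decode=000001ffffffffff | cut -d= -f2 | tr ',' '\n' > /tmp/caps.full
# capsh --decode=00000000a80425fb | cut -d= -f2 | tr ',' '\n' > /tmp/caps.cntr
# grep -vxF -f /tmp/caps.cntr /tmp/caps.full

The last command prints the capabilities present in the nsenter session but missing from the container.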

EDIT:
@daniel

Where in uxc/ujail is the host's /tmp/resolv.conf-${CNTR_NAME}.d/resolv.conf.auto created (or copied from /tmp/resolv.conf.d/resolv.conf.auto)? All references I found were just to creating a symlink from /dev/resolv.conf.d/resolv.conf.auto to /etc/resolv.conf, and /dev/resolv.conf.d/resolv.conf.auto is bind-mounted to the host's /tmp/resolv.conf-${CNTR_NAME}.d/resolv.conf.auto

My create.sh script works and fixes this issue, no problems there, but I would still like to see uxc handle this without my manual assistance.

Are you sure /dev/tty is populated correctly? I get an error with openssh's sshd:

open /dev/tty failed - could not set controlling tty: Permission denied

permissions are:

crw-rw----    1 root     tty         5,   0 Feb 21 09:48 tty

but on a podman container where this error is not present:

crw-rw-rw-    1 root     root        5,   0 Nov 27 16:35 tty

If I chmod it to 666, error is gone.

And one more; if I add "user" to namespaces, container fails.
I checked oci spec for linux, and user namespace should be available.

Well, now I have made it work: dropbear and openssh both run, and login is possible for both user and root (depending of course on your ssh server config).
The relevant parts of config.json are:

user uid: 0, gid: 0
args: /usr/sbin/sshd -D -e
or args: dropbear -R -F -B -E

capabilities:

"CAP_KILL",
"CAP_NET_RAW",
"CAP_AUDIT_WRITE",
"CAP_NET_BIND_SERVICE",
"CAP_SETGID",
"CAP_CHOWN",
"CAP_FOWNER",
"CAP_FSETID",
"CAP_SETUID",
"CAP_DAC_OVERRIDE",
"CAP_SYS_CHROOT",
"CAP_SYS_PTRACE"

These are necessary in the following sets, but you can assign them to all capability sets:
bounding, inheritable, permitted

readonly: false - or mount the necessary paths to a tmpfs, or use an overlay.
namespaces: pid, network, ipc, uts, cgroup, mount

To run as a non-root user, you should add

        "sysctl" : {
                "net.ipv4.ip_unprivileged_port_start": "22"
        }

But I was not able to get this to work.

EDIT2:
If /dev/tty permissions are unified...
010-make-ujail-tty-writable.patch:

Index: procd-2023-11-28-7e6c6efd/jail/jail.c
===================================================================
--- procd-2023-11-28-7e6c6efd.orig/jail/jail.c
+++ procd-2023-11-28-7e6c6efd/jail/jail.c
@@ -585,7 +585,7 @@ static struct mknod_args default_devices
        { .path = "/dev/full", .mode = (S_IFCHR|S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH), .dev = makedev(1, 7) },
        { .path = "/dev/random", .mode = (S_IFCHR|S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH), .dev = makedev(1, 8) },
        { .path = "/dev/urandom", .mode = (S_IFCHR|S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH), .dev = makedev(1, 9) },
-       { .path = "/dev/tty", .mode = (S_IFCHR|S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP), .dev = makedev(5, 0), .gid = 5 },
+       { .path = "/dev/tty", .mode = (S_IFCHR|S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH), .dev = makedev(5, 0), .gid = 5 },
        { 0 },
 };

Hooks are still broken. I thought that maybe I should create resolv.conf with a hook; uxc should first check that the file to run exists and is executable, and at least inform you if it isn't, or ignore it. If the file exists and is executable, the jail gets stuck in the creating state. You cannot kill it, even with signal 9, because the runtime is stuck. The only way to delete the container is to first kill -9 the container's pid and then kill -9 the runtime's pid. I did a quick review of jail.c, where the execution happens: on line 1432 is parseOCIhook, which looks good to me. It even supports using hooks without args, as the OCI spec also does. At line 467 is run_hooklist, where hooks are executed with execve; it looks pretty good to me. I didn't go through the uloop code, but in case it doesn't contain waitpid, I think it should be used. As it doesn't use pthreads, a thread join is not necessary.

But the uloop code might already use either wait or waitpid. And since there seems to be a timeout feature for uloop, it would seem that without a timeout, waitpid is used in the uloop handler.

So this needs further debugging; please check it out. I didn't notice obvious problems, but there seem to be some. I used a shell script as a hook; according to the process list it doesn't linger as a ghost or keep running inside a fork, so this is very strange. It also does get executed. My script:

#!/bin/sh
env > /tmp/cntr_env

with which I tested whether the environment contains a variable with the container's name, to ease populating resolv.conf. It doesn't have such a variable. Eventually I found out that from the script I can do the following:

CNTR_NAME=$(cat /proc/$PPID/cgroup | cut -d '/' -f4)

This way you can find your container's name. You can also replace $PPID with $$ to use the script's own PID.

But yes, for the moment, hook execution fails. The script I wrote earlier, which writes resolv.conf after creation of the container, also fails during a restart of the container (uxc kill cntr, uxc create cntr), as this clears resolv.conf. I also suspect that hooks won't be executed on re-creation of a container; at least ubus call container list did not show them. So the only bulletproof version I can think of is to write the file from inside the container, in the profile, via "ENV=/etc/profile" - but this won't work either, as resolv.conf.auto in /dev/resolv.conf.d inside the container is not writable. DNS resolution definitely needs tweaking.

EDIT3:
If you are interested in adding a "shell" or "console" or "exec/execute" argument to get into a container's shell, I wrote a quite minimal version of nsenter, freely available for adoption :slight_smile:

EDIT4:
Actually, if you are interested in adding an exec function... I've been busy writing one.
Here you go: 020-add-uxc-exec-function.patch

It adds an exec argument, which works like this:
uxc exec container_name [command] [args]
Without a command, it executes /bin/sh.
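For example (hypothetical usage with this patch applied):

# uxc exec cntr
# uxc exec cntr /bin/ps ax
# uxc exec cntr /bin/sh -c "cat /etc/resolv.conf"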

If you decide this is useful, you might want to check it out. While it works and has no errors, I had to override the program's argument handling when exec is used, because the argument parser trimmed arguments such as -c, which you'd need if you want to run, for example, /bin/sh -c my_cmd. Another thing I'd like you to check is the error codes in uxc_exec; I wasn't quite sure what the most suitable return values would be for failed namespace joining or a failed exec.

Yes, the error codes definitely need some checking; the result is all wrong:
root@gateway:~# /tmp/uxc2 exec cntr /bin/sh2

failed to execute /bin/sh2 in container cntr
uxc error: No such device or address

Well, the biggest job is done anyway; the rest is mostly cosmetic changes :slight_smile:
Now... what else could I contribute? Hmm. If the terminal is disabled, output could be redirected to /dev/log, which could accept input and store it in ubus... Ugh, a huge task which would need quite a lot of reworking even on the ubus side; then a socket would be needed to accept that output and store it in ubus. Sounds complicated, and I already feel old and tired.
Maybe I could instead implement restart; that would be pretty simple, just calling kill, create and start, all functions that are already there. Well, some other day..
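In shell terms it would be little more than this wrapper (a sketch; uxc create reusing the previous parameters is per your earlier note):

#!/bin/sh
# usage: uxc-restart <container>
NAME="${1:?usage: $0 container}"
uxc kill "$NAME" 9
uxc create "$NAME"
uxc start "$NAME"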

Thank you again for digging into all those issues.

Your patch 010-make-ujail-tty-writable.patch can be applied as-is imho, unless there is something in the OCI standard about making this configurable (e.g. by having a tty group and only allowing rw access to members of that group).

Hooks not working is quite strange; I haven't tested them in a while, but I did test them extensively when implementing support for hooks. As nobody else apart from me has ever tested them (esp. hooks), it can very well be that there are still problems. The biggest is the obvious lack of documentation of the various convenience hacks (such as the per-container netifd instance, automatic handling of the container's /etc/resolv.conf by netifd(!), ...) which can of course conflict with things done by hooks when using other container environments.

Regarding nsenter: I had a similar minimal implementation for testing; I should have included it as a package. Your implementation is much more beautiful, and of course we should take that now :wink: The idea was kinda that in the end you should not need it; uxc attach ... should be enough.

Regarding uxc exec: I think it can be very useful, though the implementation should be cleaned up a bit more. Listing all the different namespace types there once again feels wrong, esp. given that this list is getting longer from time to time, and we would then have to edit it in various places (e.g. to add support for CLONE_NEWTIME now that the kernel is new enough to support it).

Regarding user namespaces: while user namespaces generally do work afaik, there are problems with configurations combining cgroups and user namespaces. I've checked other implementations and found their work-arounds for the same issue quite ugly, and then the "nice" solution, CLONE_INTO_CGROUP (since Linux 5.7), was just around the corner. Now that we have left Linux 5.4 behind, we could mitigate the problem by using that.

ubus already supports passing open file descriptors, and this is how uxc attach $containername as well as uxc start --console $containername work. In the same way we can also implement more logging -- and the console-pass-through code needs some cleaning anyway, as you already noticed.

I will dedicate my day tomorrow to validating/fixing the hooks functionality and whichever of the other problems you mentioned I manage to reproduce and fix. As they are numerous, maybe we should have a checklist to tick them off and comment on (I think that's enough; no need to overkill by spending time on issue trackers).

Bugs:

  • hook executable existence and permissions aren't checked
  • hook process hangs (uloop/waitpid problem?)

Maybe bugs:

  • /dev/tty access requires membership in gid 5 (tty). Make this configurable somehow (spec?).

Missing:

  • uxc exec (patch suggested)
  • Documentation for netifd integration, make /etc/resolv.conf handling more configurable

uxc exec is different. It's pretty much the same as podman/docker's exec -it [cmd]: all capabilities are there. That's actually how I ended up with that full list of capabilities under which the ssh servers were executable, and then eliminated one by one what wasn't needed. So I think it's very useful: in an exec environment you can do some testing outside of the borders that are normally set for the executed software/shell. If something you try to run isn't working in the container but works from exec, the configuration must be off (unless the problem isn't fixable with configuration; limits exist). Also, if you run one piece of software in your container, as you should, exec allows you to run a shell alongside it in the same environment and namespace; you sometimes need concurrent tasks when perfecting things.

I think the 666 permissions on tty are correct. They are used on host systems, and at least podman uses the same permissions (I didn't test docker or anything else), but judging by that, things just won't go south with write permissions on /dev/tty; I'd say it's correct. I think you definitely should be able to run sshd servers, however pointless that is; or maybe you use your container for development on a different kind of Linux host.

The hook processes (the script would show in ps ax) are nowhere to be seen; they execute (the test output existed) and exit, but uxc/ujail waits there forever or loops on something. I really don't know, but it's stuck anyway.

I would also add an FR there, because I have a tiny feature request regarding the scripts: would it be possible to add an environment variable for the container's name? If you write a script that populates name servers, or whatever else you want to do in a hook, you could then reliably use the same script for extra configuration, since the script/executable would know which container it is dealing with. You could share scripts/executables across all your containers (in the case where you run servers in separate containers).

I think hooks are the best place to populate nameservers. I just checked, and my resolv.conf also contains the nameservers of my VPN; even though the firewall won't allow access there when set up perfectly, I might not want to advertise a VPN connection's existence, nor allow name resolution to it either (I am using NetBird), so a script helps me set up custom name resolution. A command-line argument would be kinda cool too, but then the container-creation line would grow excessively long, and we can't put this in the OCI spec if we want to stay within the standard.

About the growing number of namespaces: well, they don't multiply so often that it would be a problem. You're working on procd from time to time anyway, so if one pops up once a year, which might even be too often, I don't think we have a problem there.

I thought that, if you want and if I have time, I could maybe work on adding a pull command to uxc... The same thing that script does, but integrated into uxc? Would that be useful?

EDIT:
Okay, I figured out how to get name servers into the container's resolv.conf, and it's pretty neat actually:

config device 'veth0'
	option type 'veth'
	option name 'vhost0'
	option peer_name 'virt0'

config interface 'virt0'
	option proto 'static'
	option device 'virt0'
	option ipaddr '10.0.202.2'
	option netmask '255.255.0.0'
	option gateway '10.0.0.1'
	option dns '10.0.0.1'
	option jail 'cntr'
	option jail_ifname 'host0'

There, the option dns of virt0 generates the resolv.conf.