simple-container-runtime

A minimal Linux container runtime written in C (~250 lines), implementing the core isolation mechanisms that production runtimes like runc use under the hood.

This is not a wrapper around Docker or containerd. It directly invokes Linux kernel APIs to build a container from scratch.

What it implements

Isolation layer	Kernel mechanism	Effect
Hostname isolation	`unshare(CLONE_NEWUTS)`	Container has its own hostname
Filesystem isolation	`unshare(CLONE_NEWNS)`	Container has its own mount tree
IPC isolation	`unshare(CLONE_NEWIPC)`	Container has its own SysV IPC / POSIX MQ
PID isolation	`unshare(CLONE_NEWPID)`	Container process becomes PID 1
Root filesystem swap	`mount --bind` + `pivot_root(2)`	Full filesystem root replacement
Memory limit	cgroups v2 `memory.max`	100 MB hard cap enforced by the kernel
Capability drop	`libcap-ng`	All capabilities dropped except a small whitelist
Syscall filtering	`libseccomp` (BPF)	`reboot`, `swapon`, module loading blocked
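The PID isolation row relies on a detail of unshare(2) that the execution flow below depends on: `unshare(CLONE_NEWPID)` does not move the caller into the new namespace. Only processes forked afterwards are placed in it, and the first of those becomes PID 1. The following is a minimal sketch of that behaviour. The helper name `child_is_pid1` is hypothetical and is not taken from simple_container.c. It needs CAP_SYS_ADMIN and returns -1 without it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from simple_container.c).
 * Returns 1 if the forked child saw itself as PID 1, 0 if it did not,
 * and -1 if the namespace or the fork could not be created. */
static int child_is_pid1(void)
{
    /* The caller keeps its own PID; only processes forked after this call
     * are created inside the new PID namespace. Requires CAP_SYS_ADMIN. */
    if (unshare(CLONE_NEWPID) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(getpid() == 1 ? 0 : 1);   /* first process in the new namespace */

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

This is why the parent, not the child, calls `unshare(CLONE_NEWPID)` before `fork()`.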

Why `pivot_root` instead of `chroot`

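chroot(2) only changes how pathnames are resolved for the calling process. A process that still holds CAP_SYS_CHROOT can escape it, for example by chrooting into a subdirectory while its working directory remains outside the new root. pivot_root(2) instead swaps the root mount of the whole mount namespace and moves the old root to a put_old directory. That old root is then unmounted with MNT_DETACH, so the host filesystem is genuinely unreachable from inside the container. This is what runc does.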
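A minimal sketch of the child-side root swap from the execution flow (bind mount, `pivot_root`, lazy unmount, fresh `/proc`), assuming it runs after `unshare(CLONE_NEWNS)` and as root. The helper name `enter_rootfs` and the `old_root` directory name are illustrative assumptions. The recursive `MS_PRIVATE` remount is an addition that the flow does not list. pivot_root(2) refuses to work when the relevant mounts are shared, which is the default on systemd hosts. glibc provides no `pivot_root()` wrapper, so the raw syscall is used.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper, meant to run in the child after unshare(CLONE_NEWNS).
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
static int enter_rootfs(const char *rootfs)
{
    struct stat st;
    /* Validate the target before touching any mounts. */
    if (stat(rootfs, &st) == -1)
        return -1;
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }

    /* Keep mount events from propagating back to the host; pivot_root(2)
     * fails with EINVAL when the relevant mounts are shared. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -1;

    /* new_root must be a mount point, so bind-mount rootfs onto itself. */
    if (mount(rootfs, rootfs, NULL, MS_BIND | MS_REC, NULL) == -1)
        return -1;
    if (chdir(rootfs) == -1)
        return -1;

    /* put_old must be underneath new_root. */
    if (mkdir("old_root", 0700) == -1 && errno != EEXIST)
        return -1;

    /* No glibc wrapper exists for pivot_root, so invoke the syscall directly. */
    if (syscall(SYS_pivot_root, ".", "old_root") == -1)
        return -1;
    if (chdir("/") == -1)
        return -1;

    /* Lazily detach the old root; the host tree becomes unreachable. */
    if (umount2("/old_root", MNT_DETACH) == -1)
        return -1;
    if (rmdir("/old_root") == -1)
        return -1;

    /* Mount a procfs that reflects the new PID namespace. */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        return -1;
    return 0;
}
```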

Execution flow

Parent                                  Child (PID 1 in new PID ns)
──────────────────────────────────      ──────────────────────────────────
unshare(CLONE_NEWPID)
fork() ─────────────────────────────►  block on sync pipe
mkdir /sys/fs/cgroup/simple_container
write memory.max = 100000000
write <child_pid> → cgroup.procs
signal child via pipe ──────────────►  unshare(UTS | NS | IPC)
waitpid()                               sethostname("mycontainer")
rmdir cgroup dir                        mount --bind rootfs → rootfs
                                        pivot_root(., old_root)
                                        umount2(old_root, MNT_DETACH)
                                        mount /proc
                                        drop capabilities (libcap-ng)
                                        seccomp BPF filter (libseccomp)
                                        execvp(cmd)
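The parent column's cgroup steps amount to a `mkdir` followed by two writes into cgroupfs control files. Below is a minimal sketch that assumes cgroups v2 at /sys/fs/cgroup and assumes the memory controller is already enabled in the parent's `cgroup.subtree_control`. Without that, `memory.max` does not exist in the new directory. The helper names `write_control` and `cgroup_place` are hypothetical. Control files already exist in cgroupfs, so they are opened without `O_CREAT`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a string to an existing control file.
 * Returns 0 on success, -1 on failure. */
static int write_control(const char *dir, const char *file, const char *value)
{
    char path[4096];
    if (snprintf(path, sizeof path, "%s/%s", dir, file) >= (int)sizeof path)
        return -1;
    int fd = open(path, O_WRONLY | O_CLOEXEC);   /* no O_CREAT: file must exist */
    if (fd == -1)
        return -1;
    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper mirroring the parent column: create the cgroup,
 * set memory.max, then move the child into it via cgroup.procs. */
static int cgroup_place(const char *cg_dir, const char *mem_max, pid_t child)
{
    if (mkdir(cg_dir, 0755) == -1 && errno != EEXIST)
        return -1;
    if (write_control(cg_dir, "memory.max", mem_max) == -1)
        return -1;
    char pidbuf[32];
    snprintf(pidbuf, sizeof pidbuf, "%d", (int)child);
    return write_control(cg_dir, "cgroup.procs", pidbuf);
}
```

A call matching the diagram would be `cgroup_place("/sys/fs/cgroup/simple_container", "100000000", child_pid)`. After `waitpid()` returns, `rmdir()` on the same directory succeeds because no processes remain in it.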

The sync pipe ensures the child does not exec before the parent places it inside the cgroup — avoiding a race where the process runs unrestricted before memory limits are applied.
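The handshake can be sketched with a plain pipe. The child blocks in `read()` until the parent has finished its setup and written one byte. If setup fails, the parent closes the write end instead, the child reads EOF, and it exits without running anything. The function and callback names are hypothetical. In the real runtime, `setup` would be the cgroup placement and `child_fn` the namespace, `pivot_root`, and `execvp` sequence.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that waits on a pipe until the parent's
 * setup(child_pid) succeeds, then runs child_fn. Returns the child's exit
 * status, or -1 if the pipe, fork, or setup failed. */
static int spawn_after_setup(int (*setup)(pid_t child), int (*child_fn)(void))
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[1]);                  /* child only reads */
        char go;
        /* Blocks until the parent signals. EOF means setup failed, so exit
         * rather than run outside the intended limits. */
        if (read(fds[0], &go, 1) != 1)
            _exit(127);
        close(fds[0]);
        _exit(child_fn());
    }

    close(fds[0]);                      /* parent only writes */
    int ok = setup(pid) == 0;
    if (ok && write(fds[1], "x", 1) != 1)
        ok = 0;
    close(fds[1]);                      /* on failure the child sees EOF */

    int status;
    if (waitpid(pid, &status, 0) == -1 || !ok)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```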

Requirements

Linux kernel ≥ 4.6 (cgroups v2 with memory.max)
libcap-ng — libcap-ng-devel on Fedora/RHEL, libcap-ng-dev on Debian/Ubuntu
libseccomp — libseccomp-devel / libseccomp-dev
gcc, make
Must run as root (or with CAP_SYS_ADMIN)
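Because the runtime supports only cgroups v2, a pre-flight check can confirm that /sys/fs/cgroup really is a cgroup2 mount before the parent tries to write `memory.max`. The sketch below uses statfs(2): on a cgroup v2 mount, `f_type` is 0x63677270, the value of `CGROUP2_SUPER_MAGIC` in `<linux/magic.h>`. The constant is redefined locally so the sketch does not need kernel headers installed. The helper name `is_cgroup2` is hypothetical.

```c
#include <sys/statfs.h>

/* Same value as CGROUP2_SUPER_MAGIC in <linux/magic.h>. */
#define CGROUP2_MAGIC 0x63677270UL

/* Hypothetical helper: 1 if path is a cgroup v2 mount, 0 if it is another
 * filesystem, -1 if statfs(2) fails. */
static int is_cgroup2(const char *path)
{
    struct statfs sfs;
    if (statfs(path, &sfs) == -1)
        return -1;
    return (unsigned long)sfs.f_type == CGROUP2_MAGIC ? 1 : 0;
}
```

A runtime might call `is_cgroup2("/sys/fs/cgroup")` at startup and exit with an error when it returns anything other than 1, instead of failing later on a missing `memory.max`.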

Build

make

Usage

You need a root filesystem directory. One way to get a minimal one:

mkdir alpine-rootfs
docker export $(docker create alpine) | tar -xC alpine-rootfs

Then run:

sudo ./simple_container ./alpine-rootfs /bin/sh

Inside the container:

/ # hostname
mycontainer
/ # echo $$
1
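The `mycontainer` hostname in the session above is set from inside a private UTS namespace, so the host's hostname is left unchanged. A minimal sketch of that child step follows, matching the `unshare(UTS | NS | IPC)` and `sethostname` entries in the execution flow. The helper name `isolate_and_name` is hypothetical, and the call needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper for the child's first step after the sync pipe:
 * new UTS, mount and IPC namespaces, then a container-local hostname.
 * Returns 0 on success, -1 on failure. */
static int isolate_and_name(const char *hostname)
{
    if (unshare(CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWIPC) == -1)
        return -1;
    /* Affects only the new UTS namespace, not the host. */
    return sethostname(hostname, strlen(hostname));
}
```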

Limitations

No network namespace (container shares the host network stack)
No user namespace (requires root on the host)
cgroups v2 only — no v1 fallback
Memory limit only; no CPU or I/O constraints
Single-container use; no lifecycle management

License

MIT
