setns: support user namespaces #13323

Open
shayonj wants to merge 1 commit into google:master from shayonj:issue-13314-userns-setns

shayonj (Contributor) commented May 30, 2026

User namespace entries under /proc/[pid]/ns currently render as fake
namespace symlinks. They look like the other namespace files, but opening
one does not produce an nsfs file that setns(2) can use. Rootless
container tools such as buildah and podman rely on that file to
re-enter the pause process's user namespace, so the second lifecycle command
fails with EINVAL.

Make UserNamespace implement vfs.Namespace and give each user namespace
an nsfs inode when it is created. /proc/[pid]/ns/user now uses the
regular namespace symlink path, so opening it returns a joinable namespace
file instead of a fake link target.

Setns now accepts CLONE_NEWUSER from both nsfds and pidfds. It
follows the Linux restrictions for user namespace joins by rejecting the
caller's current user namespace, requiring CAP_SYS_ADMIN in the target
user namespace, rejecting multithreaded callers, and rejecting callers with
fs state shared outside the thread group. The capability checks for any
other namespaces in the same setns call use the credentials the caller
would have after joining the user namespace.
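
The join restrictions listed above can be modelled as a small decision function. This is a sketch under stated assumptions, not the code in this change: `Caller` and `check_user_ns_join` are hypothetical names, the errno values follow the ERRORS section of setns(2), and the order of the checks is an assumption based on Linux behaviour.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Hypothetical snapshot of the state the join checks look at."""

    user_ns: int                    # id of the caller's current user namespace
    threads_in_group: int           # number of threads in the thread group
    fs_shared_outside_group: bool   # fs state shared outside the thread group
    admin_in: frozenset[int]        # namespaces where caller has CAP_SYS_ADMIN


def check_user_ns_join(caller: Caller, target_ns: int) -> int:
    """Return 0 if the join is allowed, otherwise a positive errno value."""
    if target_ns == caller.user_ns:
        return errno.EINVAL  # cannot re-enter the current user namespace
    if caller.threads_in_group > 1:
        return errno.EINVAL  # multithreaded callers are rejected
    if caller.fs_shared_outside_group:
        return errno.EINVAL  # fs state must not be shared with other processes
    if target_ns not in caller.admin_in:
        return errno.EPERM   # CAP_SYS_ADMIN is required in the target
    return 0
```

For example, a single-threaded caller in namespace 1 with CAP_SYS_ADMIN in namespace 2 may join 2, gets EINVAL for 1, and gets EPERM for any namespace where it lacks the capability.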

Add a syscall regression test that creates a child user namespace, opens
/proc/[pid]/ns/user, and verifies that setns(CLONE_NEWUSER) succeeds.

Fixes #13314


Development

Successfully merging this pull request may close these issues.

/proc/[pid]/ns/user is not usable with setns
