Skip to content

fix(erofs): write per-instance VMDK descriptor to bundle dir; use atomic write#212

Merged
dmcgowan merged 1 commit into
containerd:mainfrom
austinvazquez:fix/erofs-vmdk-stale-descriptor
Jun 2, 2026
Merged

fix(erofs): write per-instance VMDK descriptor to bundle dir; use atomic write#212
dmcgowan merged 1 commit into
containerd:mainfrom
austinvazquez:fix/erofs-vmdk-stale-descriptor

Conversation

@austinvazquez
Copy link
Copy Markdown
Member

@austinvazquez austinvazquez commented Jun 1, 2026

The merged_fs.vmdk descriptor was written to the snapshotter directory and reused across mounts via a bare existence check. If the backing extent files moved or were cleaned up, the cached descriptor became permanently stale with no recovery path.

Switch to a simpler, more correct design:

  • Per-instance descriptor: write to the OCI bundle directory (filepath.Join(bundleDir, "merged_fs.vmdk")) so the file is scoped to one container lifetime and torn down with the bundle. No descriptor is shared across instances.
  • Unconditional regeneration: regenerate on every transformMounts call — no existence check, no cache, no stale state possible.
  • Atomic write: os.CreateTemp + fsync + os.Rename so concurrent writers each get a unique temporary filename and a reader never observes a partially-written file.

bundleDir is threaded through setupMounts and transformMounts; the erofs branch writes the descriptor to filepath.Join(bundleDir, "merged_fs.vmdk").

No new public API is introduced.

The erofs snapshotter now writes the VMDK extent descriptor to the per-instance bundle directory and regenerates it on every mount, eliminating the class of failures caused by stale cached descriptors.

Copilot AI review requested due to automatic review settings June 1, 2026 15:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds self-healing for the cached merged_fs.vmdk descriptor in the erofs snapshotter: descriptors are now validated on each mount, regenerated when stale, written with extent paths relative to the descriptor directory, and persisted via an atomic temp-file + rename.

Changes:

  • internal/erofs/vmdk.go: New ParseVMDKDescriptorExtents; DumpVMDKDescriptor takes a baseDir to resolve relative paths for stat-ing; DumpVMDKDescriptorToFile writes atomically via .tmp + rename.
  • internal/shim/task/mount.go: Computes per-descriptor relative device paths, validates the existing descriptor via new vmdkDescriptorIsStale, and regenerates on stale/parse failure.
  • internal/erofs/vmdk_test.go: Adds round-trip, stale-detection, relative-path, large-extent, and atomic-write tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
internal/erofs/vmdk.go Adds parser, atomic write, baseDir-aware stat, and relative-path support for descriptor writing.
internal/shim/task/mount.go Builds relative extent paths, validates the cached descriptor, and triggers regeneration on stale/parse failure.
internal/erofs/vmdk_test.go New unit tests covering round-trip parse, stale detection scenarios, relative-path resilience, and atomic write.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/erofs/vmdk.go Outdated
@austinvazquez austinvazquez force-pushed the fix/erofs-vmdk-stale-descriptor branch from b27d0b0 to d5a569c Compare June 1, 2026 15:58
@dmcgowan
Copy link
Copy Markdown
Member

dmcgowan commented Jun 1, 2026

Ugh, this is still way too fragile

Comment thread internal/shim/task/mount.go Outdated
@austinvazquez
Copy link
Copy Markdown
Member Author

austinvazquez commented Jun 1, 2026

Agreed on both fronts — generating per-instance and never persisting is the cleaner shape, and matches what the rest of the producer code in mount.go does (the bundle dir is already the natural home for ephemeral per-VM state). I'll rework this PR to:

  • keep the tmp + fsync + rename atomic write
  • move merged_fs.vmdk into the bundle dir and regenerate it unconditionally on each transformMounts call
  • drop ParseVMDKDescriptorExtents and the relative-extent-path logic; they exist only to make the cache safe, and there's no cache anymore

Copilot AI review requested due to automatic review settings June 1, 2026 21:55
@austinvazquez austinvazquez force-pushed the fix/erofs-vmdk-stale-descriptor branch from d5a569c to c51364d Compare June 1, 2026 21:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread internal/shim/task/mount.go Outdated
Comment thread internal/shim/task/mount.go
…mic write

The merged_fs.vmdk descriptor was written to the snapshotter directory
and reused across mounts via an existence check. If the backing extent
files moved or were cleaned up, the cached descriptor became permanently
stale with no recovery path.

Switch to a simpler, more correct design:

- Write the descriptor to the per-instance OCI bundle directory so it
  is naturally scoped to one container lifetime and torn down with the
  bundle. No descriptor is ever shared across instances.
- Regenerate unconditionally on every transformMounts call. There is no
  cache to validate, so there is no stale state possible.
- Use an atomic write (os.CreateTemp + fsync + os.Rename) so concurrent
  writers each get a unique temporary file name and a reader never
  observes a partially-written descriptor.

Thread bundleDir through setupMounts and transformMounts so the erofs
branch can place the file at filepath.Join(bundleDir, "merged_fs.vmdk").

No new public API is introduced.

Signed-off-by: Austin Vazquez <austinvazquez@users.noreply.github.com>
@austinvazquez austinvazquez force-pushed the fix/erofs-vmdk-stale-descriptor branch from c51364d to c9d57bc Compare June 1, 2026 22:01
@austinvazquez austinvazquez changed the title fix(erofs): validate cached VMDK descriptor before reuse and write relative extent paths fix(erofs): write per-instance VMDK descriptor to bundle dir; use atomic write Jun 1, 2026
@austinvazquez austinvazquez requested a review from Copilot June 1, 2026 22:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown
Member

@hsiangkao hsiangkao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If vmdks are generated in the bundle dirs, I think there is no concurrent write anyway.
but leaving write+fsync+rename is also fine.

@dmcgowan dmcgowan merged commit 6b63899 into containerd:main Jun 2, 2026
15 checks passed
@austinvazquez austinvazquez deleted the fix/erofs-vmdk-stale-descriptor branch June 2, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants