Add erofs support to apko. #2249
`format` selects the on-wire layer payload format. Two values are recognized:

- `tar` (default): standard gzip-compressed tar layers (`application/vnd.oci.image.layer.v1.tar+gzip`).
- `erofs`: uncompressed EROFS filesystem images (`application/vnd.erofs`), per the draft [erofs/erofs-image-spec](https://github.com/erofs/erofs-image-spec). EROFS layers advertise `erofs` in the image config's `os.features` so consumers that do not implement the spec can identify and skip them.
This mentions "uncompressed EROFS", which is a bit confusing. EROFS by default uses LZ4 compression on data, unless you pass flags to the erofs tool. The spec talks about EROFS "raw" and EROFS+zstd compressed; but, IIUC, whatever compression erofs uses is handled by the erofs runtime at mount time. What do you mean here w.r.t. use of compression, and where?
I'm not sure whether the spec's "raw" means EROFS with the default lz4 compression, or erofs with no compression at all. And I'm trying to understand what the spec means by EROFS+zstd: does one zstd-compress a "raw" uncompressed EROFS image, or would that zstd-compress a default (lz4) EROFS image?
I see further down you mention that 1) you're not using erofs-utils provided tools for creating erofs images 2) because of (1) there isn't any compression yet.
I'll caution that this sure looks a lot like the trouble that gzip-compressed tar layers have in golang, where the intermediate layers that transform compressed things can end up getting different hashes. In stacker we ran into issues with docker recompressing stacker-generated gzip blobs with a different window.
Not directly relevant to this PR, but we should likely follow up on the erofs-image-spec about that scenario so we can ensure re-compression does not produce different sha256sums.
I hooked up use of `mkfs.erofs`, and it will take that path when you ask for compression via --format= . I don't love it, but it is nice to not wait on go-erofs for it.
The one completely missing feature now is dm-verity data.
Emit OCI image layers as EROFS filesystem images (application/vnd.erofs) instead of tar+gzip. Selected via `--format=erofs` on `apko build` / `apko publish` or `format: erofs` in apko.yaml. Tracks the draft erofs/erofs-image-spec (PR chainguard-dev#1). Single-layer and multi-layer (layering) builds are supported. Multi-layer emits each non-final group with `org.erofs.role=overlay-lower` per spec §3.8 and a per-group partial `usr/lib/apk/db/installed` so per-layer scanners still work. Manifests declare `erofs` in os.features per §5.4. Uses github.com/erofs/go-erofs (Apache-2.0, pure Go) for the writer. Reproducibility via SOURCE_DATE_EPOCH. Tests cover roundtrip via erofs.Open, byte-identical determinism, the full ImageLayoutToLayer dispatch, OSFeatures plumbing, and end-to-end validation via `fsck.erofs` (skipped when the binary isn't on PATH). `+zstd`, dm-verity, and chunk indexes are not implemented in this round. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
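A minimal sketch of how a writer might derive a reproducible timestamp from SOURCE_DATE_EPOCH, as the commit message above describes. The function name `buildTime` and the fallback to the Unix epoch when the variable is unset are assumptions for illustration, not apko's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// buildTime (hypothetical helper) converts a SOURCE_DATE_EPOCH value, given
// in seconds since the Unix epoch, into the timestamp stamped on every inode.
// Assumption: an empty value falls back to the epoch so output stays
// deterministic even when the variable is not set.
func buildTime(raw string) (time.Time, error) {
	if raw == "" {
		return time.Unix(0, 0).UTC(), nil
	}
	secs, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid SOURCE_DATE_EPOCH %q: %w", raw, err)
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	ts, err := buildTime(os.Getenv("SOURCE_DATE_EPOCH"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ts.Format(time.RFC3339))
}
```

Using one fixed timestamp for all file metadata is what makes two builds of the same inputs byte-identical, which is the property the determinism test checks.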
Step-by-step guide for producing EROFS images with --format=erofs, inspecting the layer blob without root (fsck.erofs / dump.erofs / fsck.erofs --extract), mounting it (kernel mount or erofsfuse), pulling layer blobs from a registry, and assembling multi-layer images via overlayfs. Includes the current limitations (no +zstd, dm-verity, chunk index) and links from apko_file.md. All commands shown were verified against a real `apko build` of examples/wolfi-base.yaml. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds an `apko erofs` command group that wraps the EROFS mount workflow: `mount` accepts a raw blob or an OCI image directory (auto-detected, or via `erofs:`/`oci:`/`oci-dir:` prefixes), `umount` reads a per-mount state file to unwind every layer, and `ls` produces a `tar tvf`-style listing without leaving mounts behind. The new pkg/erofsmount library handles source parsing, OCI layout reading, kernel/FUSE drivers with kernel-overlay-over-fuse fallback to fuse-overlayfs, and state-file teardown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
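The source-prefix handling described above could look roughly like the following. This is a sketch only: `parseSource`, `sourceKind`, and the constants are hypothetical names, and treating `oci:` and `oci-dir:` as synonyms is an assumption. A source without a prefix is reported as "auto" so the caller can inspect the path itself.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceKind is a hypothetical classification of a mount source.
type sourceKind string

const (
	kindAuto  sourceKind = "auto"  // no prefix: caller must auto-detect
	kindErofs sourceKind = "erofs" // raw EROFS blob
	kindOCI   sourceKind = "oci"   // OCI image layout directory
)

// prefixes lists the recognized prefixes. "oci-dir:" is checked before
// "oci:" only for readability; neither is a prefix of the other.
var prefixes = []struct {
	prefix string
	kind   sourceKind
}{
	{"erofs:", kindErofs},
	{"oci-dir:", kindOCI},
	{"oci:", kindOCI},
}

// parseSource splits an optional prefix off the source argument.
func parseSource(s string) (sourceKind, string) {
	for _, p := range prefixes {
		if rest, ok := strings.CutPrefix(s, p.prefix); ok {
			return p.kind, rest
		}
	}
	return kindAuto, s
}

func main() {
	for _, s := range []string{"erofs:layer.img", "oci-dir:./out", "./image"} {
		kind, path := parseSource(s)
		fmt.Printf("%-18s kind=%-5s path=%s\n", s, kind, path)
	}
}
```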
`apko erofs ls` now opens each EROFS layer blob directly with go-erofs and walks a layered fs.FS in user space, instead of mounting the layers and walking the merged mountpoint. This removes the kernel/FUSE dependency for `ls` (works on darwin/windows too), eliminates the mount log noise, and is faster. Introduces a reusable pkg/erofsmount.Stack, a layered fs.FS implementing fs.ReadDirFS/StatFS/ReadLinkFS with full AUFS-style overlay semantics: .wh.NAME whiteouts hide siblings, .wh..wh..opq markers hide all lower-layer entries in a directory, ancestor whiteouts hide whole subtrees, type-mismatch in a higher layer shadows lower contents. apko's writer never emits whiteouts (it splits one rootfs into groups, doesn't merge), so 15 unit tests synthesize the whiteout cases via testing/fstest.MapFS. Mount and Unmount remain Linux-only since they genuinely need the kernel or FUSE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
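The whiteout rules listed above can be illustrated with a small visibility check over `testing/fstest.MapFS` layers. This is a simplified sketch, not the pkg/erofsmount.Stack implementation: it answers only "is this path visible in the merged view?", and the names `visible`, `hiddenBy`, `upper`, `lower`, and `stack` are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

const (
	whiteoutPrefix = ".wh."         // .wh.NAME hides sibling NAME below
	opaqueMarker   = ".wh..wh..opq" // hides every lower entry in the directory
)

func exists(fsys fs.FS, name string) bool {
	_, err := fs.Stat(fsys, name)
	return err == nil
}

// hiddenBy reports whether layer hides name from the layers beneath it.
// It checks name and each ancestor for a whiteout, an opaque marker in
// the containing directory, or a non-directory standing in the way.
func hiddenBy(layer fs.FS, name string) bool {
	parts := strings.Split(name, "/")
	for i, part := range parts {
		dir := "."
		if i > 0 {
			dir = path.Join(parts[:i]...)
		}
		if exists(layer, path.Join(dir, whiteoutPrefix+part)) ||
			exists(layer, path.Join(dir, opaqueMarker)) {
			return true
		}
		if i < len(parts)-1 {
			// Type mismatch: a file where lower layers have a directory.
			if info, err := fs.Stat(layer, path.Join(dir, part)); err == nil && !info.IsDir() {
				return true
			}
		}
	}
	return false
}

// visible walks layers top-most first and stops at the first layer that
// either provides the path or hides it.
func visible(layers []fs.FS, name string) bool {
	for _, layer := range layers {
		if exists(layer, name) {
			return true
		}
		if hiddenBy(layer, name) {
			return false
		}
	}
	return false
}

// Example layers: upper whites out etc/motd and marks opt as opaque.
var upper = fstest.MapFS{
	"etc/.wh.motd":     &fstest.MapFile{},
	"opt/.wh..wh..opq": &fstest.MapFile{},
	"opt/new":          &fstest.MapFile{Data: []byte("new")},
}

var lower = fstest.MapFS{
	"etc/motd":   &fstest.MapFile{Data: []byte("hi")},
	"etc/passwd": &fstest.MapFile{Data: []byte("root")},
	"opt/old":    &fstest.MapFile{Data: []byte("old")},
}

var stack = []fs.FS{upper, lower}

func main() {
	for _, name := range []string{"etc/motd", "etc/passwd", "opt/old", "opt/new"} {
		fmt.Printf("%-11s visible=%v\n", name, visible(stack, name))
	}
}
```

A full layered fs.FS would also merge directory listings and filter the marker files out of ReadDir results; that part is omitted here.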
go-erofs cannot write compressed EROFS images yet, so the new compound format value routes compressed builds through `mkfs.erofs` (erofs-utils). ALGO is one of zstd|lz4|lz4hc|deflate, with an optional ,level=N. Plain `--format=erofs` keeps using the pure-Go writer. LayerFormat gains Base/Compressor/CompressionLevel methods; Valid is extended to whitelist the compressor names mkfs.erofs supports. Existing dispatch in build.go/layers.go/oci/image.go switches from Resolved() to Base() so the compressor suffix doesn't break format-kind comparisons. `apko erofs ls` wraps go-erofs's ErrNotImplemented (returned for compressed images on the read side) with a friendly message pointing the user at `apko erofs mount`, which decompresses via the kernel or erofsfuse. This is expected to be temporary — once go-erofs gains read-side compression, `apko erofs ls` will work against compressed images without code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
writeERofs was the only identifier in the repo using mid-word
acronym-style "ERofs"; everywhere else treats it as a word ("Erofs").
Rename writeERofs / writeERofsViaMkfs and the related test names so the
codebase is uniform.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d layers The mkfs.erofs path previously returned the same SHA-256 for Digest and DiffID and read its `Uncompressed()` bytes from the compressed file, which is wrong for OCI: `rootfs.diff_ids` is supposed to identify the uncompressed layer payload, and `org.erofs.uncompressed-digest` was declared as a constant but never set on any descriptor. Run mkfs.erofs a second time without `-z` to materialize the uncompressed-equivalent image, hash it for DiffID, persist it for the lifetime of the returned `v1.Layer`, and surface the digest via the spec's `org.erofs.uncompressed-digest` annotation. Raw EROFS layers keep DiffID == Digest as before. Also fix a double-close in `ImageLayoutToLayer`: the mkfs branch explicitly closed the output file but the deferred close at the top of the function ran again on the same descriptor. Move the defer past the mkfs branch so it only registers for paths that hold the descriptor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
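The Digest/DiffID distinction in this fix can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `describeLayer` and `layerDigests` are hypothetical names, and the byte slices stand in for the compressed image and its uncompressed-equivalent image (the output of the second mkfs.erofs run without `-z`).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Annotation key named in the commit message.
const uncompressedDigestAnnotation = "org.erofs.uncompressed-digest"

// layerDigests (hypothetical) collects what a layer descriptor needs.
type layerDigests struct {
	Digest      string            // hash of the blob as stored
	DiffID      string            // hash of the uncompressed-equivalent payload
	Annotations map[string]string // descriptor annotations, nil for raw layers
}

func sha256Digest(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// describeLayer hashes the stored blob for Digest. For raw EROFS layers
// (uncompressed == nil) DiffID equals Digest; for compressed layers DiffID
// is the hash of the uncompressed-equivalent image, also surfaced through
// the annotation.
func describeLayer(blob, uncompressed []byte) layerDigests {
	d := layerDigests{Digest: sha256Digest(blob)}
	d.DiffID = d.Digest
	if uncompressed != nil {
		d.DiffID = sha256Digest(uncompressed)
		d.Annotations = map[string]string{uncompressedDigestAnnotation: d.DiffID}
	}
	return d
}

func main() {
	raw := describeLayer([]byte("raw-image"), nil)
	fmt.Println("raw:        digest == diffID:", raw.Digest == raw.DiffID)

	comp := describeLayer([]byte("compressed-image"), []byte("raw-image"))
	fmt.Println("compressed: digest == diffID:", comp.Digest == comp.DiffID)
	fmt.Println("annotation matches diffID:   ", comp.Annotations[uncompressedDigestAnnotation] == comp.DiffID)
}
```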
apko_file.md still claimed EROFS layers were uncompressed and that +zstd was unimplemented; the +ALGO variants have shipped since the format field was first documented. Rewrite the format list to enumerate raw and compressed variants, mention the uncompressed-digest annotation, and note the mkfs.erofs runtime dependency. erofs.md's manual-overlay reference snippet had two bugs that prevented it from running end-to-end: $ROOT/../../blobs/sha256/$MANIFEST double-traversed the OCI layout, and the lowerdir chain hard-coded a four-layer count with explicit lower0/lower1 references that wouldn't generalize. Rewrite the loop to derive $BLOBS and $MANIFEST cleanly and accumulate $LOWERS as it mounts. Also fix two small accuracy bugs: --arch on apko takes Go arches (amd64, arm64), not uname -m output (x86_64, aarch64) — replace with --arch=host, which is what the YAML examples in the same file use. And on Debian/Ubuntu erofsfuse ships inside the erofs-utils package; the separate erofsfuse package only exists on Wolfi/Alpine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The --format help on apko build and apko publish previously listed only 'tar' and 'erofs', so users had no way to discover the erofs+zstd|lz4|lz4hc|deflate[,level=N] variants from --help. Spell out the compound form and the supported algorithms. LayerFormat.Valid() also silently accepted any "key=value" trailing options as long as the compressor name parsed: erofs+zstd,level=oops returned valid with level dropped, and erofs+zstd,foo=bar (or any unknown key) was treated as fine. That meant typos produced uncompressed fallback at best and surprising mkfs.erofs invocations at worst. Tighten Valid() to require known keys (currently just "level") with parseable values, and update the table-driven test to cover unknown keys, bare options without '=', and the previously-accepted level=BAD case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
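A strict parser for the compound `erofs+ALGO[,level=N]` form, along the lines this commit describes, might look like the sketch below. It is not apko's `LayerFormat.Valid()`: the `layerFormat` struct and `parseLayerFormat` function are hypothetical, and treating an empty string as `tar` is an assumption based on `tar` being the documented default.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// layerFormat is a hypothetical parsed form of the --format value.
type layerFormat struct {
	Base       string // "tar" or "erofs"
	Compressor string // empty for tar and raw erofs
	Level      int    // meaningful only when HasLevel is true
	HasLevel   bool
}

// Compressor names listed in the commit messages.
var compressors = map[string]bool{"zstd": true, "lz4": true, "lz4hc": true, "deflate": true}

// parseLayerFormat rejects unknown compressors, unknown option keys,
// bare options without '=', and unparseable level values.
func parseLayerFormat(s string) (layerFormat, error) {
	switch s {
	case "", "tar": // assumption: empty means the default
		return layerFormat{Base: "tar"}, nil
	case "erofs":
		return layerFormat{Base: "erofs"}, nil
	}
	rest, ok := strings.CutPrefix(s, "erofs+")
	if !ok {
		return layerFormat{}, fmt.Errorf("unknown format %q", s)
	}
	fields := strings.Split(rest, ",")
	f := layerFormat{Base: "erofs", Compressor: fields[0]}
	if !compressors[f.Compressor] {
		return layerFormat{}, fmt.Errorf("unknown compressor %q", f.Compressor)
	}
	for _, opt := range fields[1:] {
		key, val, found := strings.Cut(opt, "=")
		if !found {
			return layerFormat{}, fmt.Errorf("option %q is missing '='", opt)
		}
		switch key {
		case "level":
			n, err := strconv.Atoi(val)
			if err != nil {
				return layerFormat{}, fmt.Errorf("invalid level %q", val)
			}
			f.Level, f.HasLevel = n, true
		default:
			return layerFormat{}, fmt.Errorf("unknown option %q", key)
		}
	}
	return f, nil
}

func main() {
	for _, s := range []string{"erofs", "erofs+zstd,level=3", "erofs+zstd,level=oops", "erofs+zstd,foo=bar"} {
		f, err := parseLayerFormat(s)
		fmt.Printf("%-24s -> %+v err=%v\n", s, f, err)
	}
}
```

Failing on unknown keys, rather than dropping them, is what turns a typo into an immediate error instead of a silently uncompressed build.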
The application/vnd.erofs media type and the org.erofs.role / overlay-lower / org.erofs.uncompressed-digest annotation strings lived as parallel unexported consts in pkg/build/erofs.go and pkg/erofsmount/oci.go with a "keep in sync" comment guarding the duplicate. Promote them to a single set of exported constants in pkg/build/types/erofs.go so both the writer and the reader/mount tools reference the same source, and test fixtures lock to the same strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mount(8) since util-linux 2.29 autodetects when the source is a regular file and allocates a loop device with O_AUTOCLEAR, freeing it on umount. Asking for "-o loop" explicitly relies on a separate code path whose cleanup semantics differ across util-linux releases and busybox builds — on older or non-GNU versions the loop device can leak after umount. Drop "loop" from the argv. Keep "-o ro" to document intent (EROFS is intrinsically read-only, but the explicit flag tells a reader who is copy-pasting the equivalent shell command that we never plan to write). Update the matching tests and the two "doing it manually" snippets in docs/erofs.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
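For reference, the resulting argv could be assembled like this. It is a sketch: `mountArgv` is a hypothetical name and the `-t erofs` argument is an assumption, since the commit message only states that `loop` was dropped and `-o ro` kept.

```go
package main

import (
	"fmt"
	"strings"
)

// mountArgv (hypothetical) builds the mount(8) command line for one EROFS
// blob. No "loop" option: mount autodetects a regular-file source and
// allocates an autoclear loop device. "-o ro" stays to document intent.
// Assumption: the filesystem type is passed explicitly with -t.
func mountArgv(src, dst string) []string {
	return []string{"mount", "-t", "erofs", "-o", "ro", src, dst}
}

func main() {
	fmt.Println(strings.Join(mountArgv("layer.erofs", "/mnt/layer0"), " "))
}
```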
st.Mounts is recorded overlay-first then per-layer mounts in LIFO order. If the overlay umount fails, every subsequent layer umount returns EBUSY because the overlay still pins them — the previous loop collected and errors.Join()'d every one of those, giving the user a long block of identical "device busy" noise where only the first error described the real problem. Return on the first failed umount with a single error that names which mountpoint the user needs to clear; leave the remaining mounts and the state file in place so a follow-up `apko erofs umount` finishes the job. Deliberately do not fall back to `umount -l`: lazy unmount would let the process exit with the user believing things were torn down while the mounts and pinned files quietly persist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
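The fail-fast teardown described above can be sketched with an injected unmount function so it runs without root. The names `unmountAll` and `errBusy` are hypothetical, and the error wording is illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// errBusy stands in for the EBUSY a real umount would return.
var errBusy = errors.New("device or resource busy")

// unmountAll walks mounts in recorded order (overlay first, then layers)
// and stops at the first failure. It returns the mountpoints that are
// still mounted so the caller can keep them in the state file for a
// follow-up run. No lazy unmount is attempted.
func unmountAll(mounts []string, umount func(string) error) ([]string, error) {
	for i, m := range mounts {
		if err := umount(m); err != nil {
			return mounts[i:], fmt.Errorf("unmounting %s: %w", m, err)
		}
	}
	return nil, nil
}

func main() {
	mounts := []string{"dest/merged", "dest/layers/1", "dest/layers/0"}
	remaining, err := unmountAll(mounts, func(m string) error {
		if m == "dest/merged" {
			return errBusy // simulate a process still using the overlay
		}
		return nil
	})
	fmt.Println("error:    ", err)
	fmt.Println("remaining:", remaining)
}
```

Stopping early yields one error naming the mountpoint that matters, instead of one "busy" error per layer pinned by the overlay.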
erofsmount.Options.ReadOnly was already plumbed into the overlay assembly but had no way to be set from the CLI; the only consumers were external library callers. Wire it through 'apko erofs mount --read-only', omitting the upperdir/workdir overlay just like a library caller would. For single-layer images, overlayfs adds nothing in the read-only case and a lowerdir-only overlay over one EROFS mount has historically been finicky across overlayfs releases. When --read-only is set and the image has exactly one layer, skip the layers/upper/work directories entirely and mount the lone layer straight at DEST/merged. The state file's Mounts slice records that single mountpoint, so Unmount naturally cleans up the same way as a multi-layer mount. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
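The decision described above can be reduced to two small helpers. This is a sketch with hypothetical names (`mountDirect`, `overlayOptions`) and illustrative paths; it only builds the overlayfs option string and does not perform any mount.

```go
package main

import (
	"fmt"
	"strings"
)

// mountDirect reports whether the lone layer should be mounted straight
// at DEST/merged, skipping overlayfs entirely.
func mountDirect(layerCount int, readOnly bool) bool {
	return readOnly && layerCount == 1
}

// overlayOptions builds the overlayfs option string. lowers must be
// ordered top-most first, matching overlayfs lowerdir semantics. In
// read-only mode upperdir and workdir are omitted.
func overlayOptions(lowers []string, upper, work string, readOnly bool) string {
	opts := "lowerdir=" + strings.Join(lowers, ":")
	if !readOnly {
		opts += ",upperdir=" + upper + ",workdir=" + work
	}
	return opts
}

func main() {
	lowers := []string{"dest/layers/1", "dest/layers/0"}
	fmt.Println(mountDirect(1, true)) // single layer, read-only: no overlay
	fmt.Println(overlayOptions(lowers, "dest/upper", "dest/work", false))
	fmt.Println(overlayOptions(lowers, "dest/upper", "dest/work", true))
}
```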
Implement apko writing of erofs images according to draft spec at https://github.com/erofs/erofs-image-spec