vortex.zstd: dictionary-compressed segments not decodable (pure-Java backend lacks dict support) #104

@dfa1

Summary

vortex.zstd segments compressed with a trained dictionary cannot be decoded. ZstdEncodingDecoder fails fast when dictionary_size != 0:

io.github.dfa1.vortex.core.VortexException:
  vortex.zstd: dictionary-compressed Zstd segments are not supported (pure-Java decoder)

This is hit by the upstream compatibility fixture zstd.vortex (v0.75.0).
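
For illustration, the guard described above could look roughly like the following. This is a minimal sketch, not the actual ZstdEncodingDecoder code: the class names `VortexExceptionSketch` and `ZstdDictionaryGuardSketch`, the method `requireNoDictionary`, and the `long` parameter type are all assumptions made for this example; only the `dictionary_size != 0` condition and the error message come from this issue.

```java
// Hypothetical stand-in for io.github.dfa1.vortex.core.VortexException.
final class VortexExceptionSketch extends RuntimeException {
    VortexExceptionSketch(String message) {
        super(message);
    }
}

// Hypothetical helper; the real check lives inside ZstdEncodingDecoder.
final class ZstdDictionaryGuardSketch {
    private ZstdDictionaryGuardSketch() {}

    // dictionarySize mirrors the dictionary_size field of the segment metadata
    // (the parameter type is an assumption).
    static void requireNoDictionary(long dictionarySize) {
        if (dictionarySize != 0) {
            // Fail before any frame is handed to the backend, so the caller
            // sees a clear error instead of a corrupt-frame failure.
            throw new VortexExceptionSketch(
                "vortex.zstd: dictionary-compressed Zstd segments are not supported (pure-Java decoder)");
        }
    }
}
```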

Why

The decode backend (io.airlift:aircompressor-v3) has no Zstd dictionary support:

  • ZstdFrameDecompressor exposes no API to preload a dictionary's entropy tables or seed the back-reference window; reset() only clears state.
  • The upstream Rust implementation uses zstd::bulk::Decompressor::with_dictionary(dict), which is backed by native libzstd.

The published zstd.vortex fixture genuinely requires it:

  • dict buffer begins with 0xEC30A437, the magic number of a trained dictionary (preset Huffman + 3 FSE tables + content), Dictionary_ID 0x22a28c3d.
  • frames carry the Dictionary_ID flag referencing that same ID — preset entropy tables and window seeding are mandatory.
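
The two checks above can be reproduced with a few lines of header parsing. The sketch below follows the Zstandard format (RFC 8878): a dictionary starts with the little-endian magic 0xEC30A437 followed by a 4-byte Dictionary_ID, and a frame starts with the magic 0xFD2FB528, a Frame_Header_Descriptor byte, an optional Window_Descriptor byte, and a Dictionary_ID of 0, 1, 2 or 4 bytes. The class and method names are hypothetical, and it reads from `byte[]` for brevity, whereas the project itself would work on MemorySegment.

```java
final class ZstdHeaderSketch {
    static final int FRAME_MAGIC = 0xFD2FB528;
    static final int DICT_MAGIC = 0xEC30A437;

    private ZstdHeaderSketch() {}

    // Reads `len` bytes (0..8) starting at `off` as an unsigned little-endian value.
    static long readLE(byte[] b, int off, int len) {
        long v = 0;
        for (int i = 0; i < len; i++) {
            v |= (b[off + i] & 0xFFL) << (8 * i);
        }
        return v;
    }

    // True if the buffer starts with the trained-dictionary magic number.
    static boolean isTrainedDictionary(byte[] dict) {
        return dict.length >= 8 && (int) readLE(dict, 0, 4) == DICT_MAGIC;
    }

    // Dictionary_ID stored right after the dictionary magic number.
    static long dictionaryIdOfDictionary(byte[] dict) {
        if (!isTrainedDictionary(dict)) {
            throw new IllegalArgumentException("not a trained Zstd dictionary");
        }
        return readLE(dict, 4, 4);
    }

    // Dictionary_ID declared by a frame header, or 0 if the frame declares none.
    static long dictionaryIdOfFrame(byte[] frame) {
        if (frame.length < 5 || (int) readLE(frame, 0, 4) != FRAME_MAGIC) {
            throw new IllegalArgumentException("not a Zstd frame");
        }
        int descriptor = frame[4] & 0xFF;
        int didFlag = descriptor & 0x03;                   // bits 1-0: Dictionary_ID_flag
        boolean singleSegment = (descriptor & 0x20) != 0;  // bit 5: Single_Segment_flag
        int pos = 5 + (singleSegment ? 0 : 1);             // skip Window_Descriptor if present
        int didLen = (didFlag == 3) ? 4 : didFlag;         // flag 0,1,2,3 -> 0,1,2,4 bytes
        if (frame.length < pos + didLen) {
            throw new IllegalArgumentException("truncated frame header");
        }
        return readLE(frame, pos, didLen);
    }
}
```

A frame whose declared ID is non-zero and equal to `dictionaryIdOfDictionary(buffer[0])` cannot be decoded without that dictionary, which is the situation in the fixture.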

Buffer layout (per Rust encodings/zstd/src/array.rs): buffer[0] = dictionary, buffer[1..] = frames.

Options considered

  1. Pure-Java dict decoder from scratch on MemorySegment (~1500–2500 LOC: dict parse, Huffman incl. Treeless, 3 FSE tables incl. Repeat mode, sequence execution with window-prefix matches). Verifiable against one fixture only — high correctness risk.
  2. Vendor/extend aircompressor's frame decompressor — rejected: its core uses sun.misc.Unsafe (~67 refs), violating the project's no-Unsafe / FFM-only rule.
  3. zstd-jni (native) — rejected: violates the no-JNI rule.
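
To make the "window-prefix matches" part of option 1 concrete: in the Zstandard format the dictionary content acts as data that precedes the frame's output, so a match offset may reach back past the start of the decoded bytes and into the tail of the dictionary. The sketch below shows only that copy step, under simplifying assumptions (no repeat-offset handling, no window-size limit, `byte[]` instead of MemorySegment); `WindowPrefixSketch` and `copyMatch` are hypothetical names.

```java
final class WindowPrefixSketch {
    private WindowPrefixSketch() {}

    // Copies matchLength bytes that start `offset` bytes before outPos.
    // If the source position falls before the start of `out`, the byte is
    // taken from the tail of the dictionary content instead.
    // Returns the new output position.
    static int copyMatch(byte[] dictContent, byte[] out, int outPos, int offset, int matchLength) {
        if (offset <= 0 || offset > outPos + dictContent.length) {
            throw new IllegalArgumentException("offset outside dictionary + output window");
        }
        for (int i = 0; i < matchLength; i++) {
            int src = outPos - offset; // negative means "inside the dictionary"
            out[outPos] = (src < 0) ? dictContent[dictContent.length + src] : out[src];
            outPos++; // byte-by-byte so overlapping matches replicate correctly
        }
        return outPos;
    }
}
```

Without the dictionary content seeded as a prefix, the first branch has nothing to read from, which is why a decoder that cannot preload a dictionary cannot decode these frames.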

Current state

Fail-fast guard retained; covered by VortexHttpReaderIT#scan_zstdVortex_rejectsDictionaryCompression, which asserts the clear error. zstd.vortex is excluded from the encoding smoke test. Re-enable once a pure-Java dictionary path exists (or aircompressor ships dict support).

References

  • Rust decode: encodings/zstd/src/array.rs (decompress, line ~888: Decompressor::with_dictionary)
  • Buffer layout: same file, deserialize line ~204
