fix(writer): exclude I8/I16 from global dict — reader can't decode narrow-int dict#100

Merged
dfa1 merged 1 commit into main from fix-i16-dict-incompat on Jun 20, 2026
dfa1 (Owner) commented Jun 20, 2026

Fixes the write/read incompatibility surfaced by mutation coverage in #99.

The bug

The writer's global dict admitted I8/U8/I16/U16 columns, but the reader's lazy dict decode (ScanIterator.buildLazyDictPrimitive) only supports I32/I64/F32/F64. A low-cardinality I16 column therefore wrote a vortex.dict the reader rejected on read-back:

VortexException: vortex.dict: layout: unsupported ptype for lazy dict: I16

i.e. the writer produced an unreadable file.
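
The mismatch can be pictured as a set difference between the ptypes the writer dict-encodes and the ptypes the reader's lazy dict decode supports. The sketch below is illustrative only: the `PType` enum, the `DictMismatchSketch` class, and the `unreadable` helper are hypothetical names, not the project's actual API. Only the ptype lists are taken from the description above.

```java
import java.util.EnumSet;
import java.util.Set;

public class DictMismatchSketch {
    // Hypothetical enum: lists only the ptypes named in this PR.
    public enum PType { I8, U8, I16, U16, I32, I64, F32, F64 }

    // Per the PR text: the reader's lazy dict decode supports only these.
    public static final Set<PType> READER_LAZY_DICT =
            EnumSet.of(PType.I32, PType.I64, PType.F32, PType.F64);

    // Hypothetical helper: ptypes the writer would dict-encode
    // but the reader could not decode on read-back.
    public static Set<PType> unreadable(Set<PType> writerAdmits) {
        EnumSet<PType> gap = EnumSet.noneOf(PType.class);
        gap.addAll(writerAdmits);
        gap.removeAll(READER_LAZY_DICT);
        return gap;
    }

    public static void main(String[] args) {
        Set<PType> narrowInts =
                EnumSet.of(PType.I8, PType.U8, PType.I16, PType.U16);
        // prints [I8, U8, I16, U16]
        System.out.println(unreadable(narrowInts));
    }
}
```

Every narrow-int ptype lands in the gap, which is why any dict-encoded column of those types was unreadable.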

Cross-checked against Rust (ground truth)

New RustWritesJavaReadsIntegrationTest#jniWriter_javaReader_lowCardinalityI16: the Rust/JNI writer encodes a 10k-row, 3-distinct I16 column and the Java reader reads it back exactly — no dict error. So Rust does not dict-encode narrow ints, and the Java reader is already Rust-conformant. The Java writer was the outlier.

Fix

Exclude I8/U8/I16/U16 from isDictCandidate (alongside F16/F32):

  • for these columns a U8/U16 code is no smaller than the value it replaces → dict gives no real benefit;
  • the exclusion matches the Rust compressor;
  • narrow-int dict only ever produced files the reader couldn't read → removing it is not a regression.
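
The size argument in the first bullet can be checked with rough arithmetic. This is a back-of-the-envelope sketch, not the writer's actual cost model: it assumes the code width equals the value width (as the bullet implies) and ignores metadata, bit-packing, and any further compression of the codes. The class and method names are hypothetical.

```java
public class DictSizeSketch {
    // Bytes for a plain (non-dict) primitive column.
    public static long plainBytes(long rows, int valueWidth) {
        return rows * valueWidth;
    }

    // Bytes for a dict-encoded column: one code per row plus the dictionary values.
    public static long dictBytes(long rows, long distinct, int valueWidth, int codeWidth) {
        return rows * codeWidth + distinct * valueWidth;
    }

    public static void main(String[] args) {
        long rows = 10_000;   // same shape as the I16 test column in this PR
        long distinct = 3;
        int i16Width = 2;     // bytes per I16 value
        int u16CodeWidth = 2; // assumption: U16 codes for a 16-bit column

        System.out.println(plainBytes(rows, i16Width));                        // prints 20000
        System.out.println(dictBytes(rows, distinct, i16Width, u16CodeWidth)); // prints 20006
    }
}
```

Under these assumptions the dict form is slightly larger than the plain column, because the codes cost as much as the values and the dictionary is pure overhead.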

A low-card I16 column now encodes via the cascade and round-trips cleanly (GlobalDictPrimitiveTest#lowCardinality_i16_notDicted_roundTrips), and VortexWriterDictDecisionTest pins the new exclusion.
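
The exclusion that the decision test pins could look roughly like the following. This is a minimal sketch, not the actual `isDictCandidate`: the real method's signature and its other inputs (such as cardinality) are not shown in this PR, and the `PType` enum here is hypothetical. Ptypes that the PR does not list as excluded are assumed to remain candidates.

```java
import java.util.EnumSet;
import java.util.Set;

public class DictCandidateSketch {
    // Hypothetical enum: lists only the ptypes named in this PR.
    public enum PType { I8, U8, I16, U16, I32, I64, F16, F32, F64 }

    // Per the PR text: narrow ints are now excluded alongside F16/F32.
    public static final Set<PType> EXCLUDED = EnumSet.of(
            PType.I8, PType.U8, PType.I16, PType.U16, PType.F16, PType.F32);

    // Assumption: models only the ptype gate; any cardinality check is omitted.
    public static boolean isDictCandidate(PType ptype) {
        return !EXCLUDED.contains(ptype);
    }

    public static void main(String[] args) {
        System.out.println(isDictCandidate(PType.I16)); // prints false
        System.out.println(isDictCandidate(PType.I32)); // prints true
    }
}
```

Keeping the excluded ptypes in one set makes the decision easy to pin in a test: a new exclusion is a one-line change with a matching assertion.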

Verify

./mvnw -pl writer -am test
./mvnw verify -pl integration -am -Dit.test="RustWritesJavaReadsIntegrationTest#jniWriter_javaReader_lowCardinalityI16"

🤖 Generated with Claude Code

…rrow-int dict

Mutation coverage (PR #99) surfaced that the writer's global dict admitted
I8/U8/I16/U16 columns, but the reader's lazy dict decode only supports
I32/I64/F32/F64 — a low-cardinality I16 column wrote a vortex.dict the reader
rejected with "unsupported ptype for lazy dict: I16", i.e. an unreadable file.

Cross-checked against the Rust reference: the JNI writer does NOT dict-encode a
low-cardinality I16 column, and the Java reader reads its output back exactly
(new RustWritesJavaReadsIntegrationTest#jniWriter_javaReader_lowCardinalityI16).
So the Java reader is already Rust-conformant; the Java writer was the outlier.

Fix: exclude I8/U8/I16/U16 from isDictCandidate (alongside F16/F32). Narrow-int
dict gives no real benefit (a U8/U16 code is no smaller than the value), matches
Rust, and only ever produced files the reader couldn't read — so excluding it is
not a regression. A low-card I16 column now encodes via the cascade and
round-trips (GlobalDictPrimitiveTest#lowCardinality_i16_notDicted_roundTrips).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dfa1 merged commit 473256b into main on Jun 20, 2026
6 checks passed
dfa1 deleted the fix-i16-dict-incompat branch on June 20, 2026 at 17:28