GH-3475: Fix parquet-vector compatibility with Java > 17 #3476
iemejia wants to merge 2 commits into apache:master
Conversation
Force-pushed from 4527830 to 313db67
Replace the ByteBuffer-specific vector loads with local helpers that copy the required bytes and then call ByteVector.fromArray. This removes the dependency on JDK-specific ByteVector.fromByteBuffer entry points, which can fail with NoSuchMethodError on newer runtimes.

Assisted-by: OpenCode:gpt-5.4
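The copy-then-`fromArray` approach the commit describes can be sketched with plain `java.nio` (the final `ByteVector.fromArray` call is elided so the sketch runs without the `jdk.incubator.vector` module; the helper name follows the diff discussed below):

```java
import java.nio.ByteBuffer;

public class VectorLoadSketch {

  // Copy byteCount bytes starting at inPos into a fresh array. duplicate()
  // shares the buffer's content but has its own position/limit, so the
  // caller's buffer state is left untouched.
  static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
    byte[] bytes = new byte[byteCount];
    ByteBuffer source = input.duplicate();
    source.position(inPos);
    source.get(bytes);
    return bytes;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50});
    byte[] window = readInputBytes(buf, 1, 3);
    // The copy holds bytes 1..3; the source buffer's position is still 0.
    System.out.println(window[0] + "," + window[1] + "," + window[2] + "," + buf.position());
  }
}
```

In the real plugin the returned array is fed to `ByteVector.fromArray(species, bytes, 0)`, which exists unchanged across JDK 17/21/25, unlike the `fromByteBuffer` entry points this PR removes.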
Pull request overview
Updates the vector-encoding plugin to avoid JDK-specific ByteVector.fromByteBuffer(...) entry points (which can break on newer Java runtimes), and expands CI coverage to build/test the vector plugins on more JDK versions.
Changes:
- Replace `ByteVector.fromByteBuffer(...)` loads with local `fromByteBuffer(...)` helpers that copy bytes into an array and call `ByteVector.fromArray(...)`.
- Add helper methods to read bytes from `ByteBuffer` for vector loads (masked and unmasked).
- Expand the vector-plugins GitHub Actions matrix to run on JDK 17/21/25 and conditionally skip Spotless on non-17 JDKs; ignore Eclipse `.factorypath`.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| parquet-plugins/parquet-encoding-vector/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPacking512VectorLE.java | Reworks ByteBuffer-based vector loads to go through array-backed helpers instead of fromByteBuffer. |
| .github/workflows/vector-plugins.yml | Expands CI to additional Java versions and adjusts Spotless skipping behavior. |
| .gitignore | Ignores .factorypath. |
private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}
In the masked overload, readInputBytes(..., mask.trueCount()) creates a backing array smaller than species.length(). ByteVector.fromArray(species, array, 0, mask) may still perform bounds checks assuming array.length >= species.length() (and also breaks immediately if a mask ever has a true lane at an index >= array.length). To keep masked loads safe across JDK implementations, allocate an array of species.length() and only fill the needed prefix bytes (e.g., mask.trueCount()), leaving the rest as zero.
Good catch — this was wrong. mask.trueCount() only happens to be safe at the call sites in this file because every mask is built as a contiguous-prefix mask, but ByteVector.fromArray(species, array, 0, mask) requires array.length >= species.length() for any mask shape, and the masked overload of OpenJDK 17's ByteVector.fromByteBuffer actually loads the full window unconditionally and then blends with the mask — not a prefix copy. The fix in dffb6e6 is exactly that: always allocate species.length() bytes and let fromArray(..., mask) apply the mask, matching JDK semantics for arbitrary masks. Verified the AVX-512 unpack tests still pass on JDK 17 / 21 / 25.
private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = new byte[byteCount];
  ByteBuffer source = input.duplicate();
  source.position(inPos);
  source.get(bytes);
  return bytes;
}
readInputBytes allocates a new byte[] and creates a new ByteBuffer via duplicate() on every vector unpack from a ByteBuffer. This is a significant per-call allocation/copy regression compared to the previous ByteVector.fromByteBuffer(...) path and can materially impact the performance benefits of the vector plugin. Consider reducing allocations by reusing a scratch buffer (e.g., via ThreadLocal<byte[]> sized to the max species length) and/or fast-pathing heap buffers (input.hasArray()) to load directly from the backing array when possible.
Suggested change (before):

private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = new byte[byteCount];
  ByteBuffer source = input.duplicate();
  source.position(inPos);
  source.get(bytes);
  return bytes;
}

Suggested change (after):

private static final ThreadLocal<byte[]> INPUT_SCRATCH =
    ThreadLocal.withInitial(() -> new byte[B512.length()]);

private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos);
  }
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos, mask);
  }
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = INPUT_SCRATCH.get();
  for (int i = 0; i < byteCount; i++) {
    bytes[i] = input.get(inPos + i);
  }
  return bytes;
}
Agreed on the regression concern, and the heap-buffer fast path is in dffb6e6: when input.hasArray() is true, the helpers return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos[, mask]) directly, which HotSpot intrinsifies to the same vmovdqu64 AVX-512 load as the original fromByteBuffer. Since BytePacker call sites in parquet-mr use ByteBuffer.wrap(byte[]) (heap), this fast path covers the dominant production case with no copy and no duplicate().

I'd like to push back on the ThreadLocal<byte[]> part:

- It only eliminates the byte[] allocation, not the ByteBuffer.duplicate(), which is the larger of the two costs on the direct-buffer path. To eliminate both you'd need a thread-local ByteBuffer view as well, at which point the helper is doing more bookkeeping than the load itself.
- The byte[] is 8–64 bytes, TLAB-allocated, never escapes the helper, and is a strong escape-analysis candidate: HotSpot will frequently scalar-replace it on hot paths. ThreadLocal.get() (~5 ns) never scalarizes, so in the common JIT-hot case the ThreadLocal is a net loss.
- A ThreadLocal holding a mutable byte[] carries the usual operational risks in long-lived JVMs (thread-pool retention, classloader leak surface) for code that is explicitly throwaway: dffb6e6 adds a TODO marking ByteVector.fromMemorySegment (JEP 454, JDK 22+) as the proper replacement, which removes the copy entirely for both heap and direct buffers via ScopedMemoryAccess and is intrinsifiable. Once the project's minimum JDK reaches 22, both helpers and readInputBytes go away.
So the change keeps the direct-buffer path as a simple byte[] + duplicate() fallback (matching the previous PR's behavior on that path), adds the heap fast path you suggested for the dominant case, and defers further optimization to the fromMemorySegment migration tracked in the TODO.
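As a runnable aside on the duplicate() cost discussed above: since JDK 13, ByteBuffer also has an absolute bulk get (`get(int index, byte[] dst, int offset, int length)`) that reads at an index without touching the buffer's position, so the fallback copy needs neither `duplicate()` nor any thread-local state. A minimal sketch in plain `java.nio` (the Vector API call is elided; whether this beats the per-byte loop in dffb6e6 would need benchmarking):

```java
import java.nio.ByteBuffer;

public class AbsoluteGetSketch {

  // Absolute bulk get (JDK 13+): copies byteCount bytes starting at inPos
  // without changing the buffer's position, so no duplicate() is required.
  static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
    byte[] bytes = new byte[byteCount];
    input.get(inPos, bytes, 0, byteCount);
    return bytes;
  }

  public static void main(String[] args) {
    // Works for direct (off-heap) buffers too, i.e. the fallback path.
    ByteBuffer direct = ByteBuffer.allocateDirect(8);
    for (int i = 0; i < 8; i++) direct.put(i, (byte) (i * 2));
    byte[] window = readInputBytes(direct, 2, 4);
    System.out.println(window[0] + "," + window[3] + "," + direct.position());
  }
}
```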
…oad helpers

Mirror OpenJDK 17's masked fromByteBuffer fast path: read the full species.length() window into a backing array, then let ByteVector.fromArray(..., mask) apply the mask. This matches the JDK semantics for arbitrary mask shapes (not just contiguous-prefix masks) and keeps array.length == species.length(), satisfying the bounds-check precondition that the masked fromArray overload may enforce.

Add a heap-buffer fast path: when input.hasArray() is true, load directly from the backing byte[] via ByteVector.fromArray. This avoids the per-call ByteBuffer.duplicate() and the intermediate byte[] allocation/copy that the previous workaround introduced as a regression versus the original ByteVector.fromByteBuffer code path.

The direct (off-heap) buffer case still falls back to a small scratch array; addressing it requires ByteVector.fromMemorySegment, which is only available on JDK 19+ and cannot be called from --release 17 sources.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  // Heap buffers are loaded directly from their backing array: no ByteBuffer.duplicate(),
  // no intermediate byte[] allocation, no extra copy. ByteVector.fromArray is intrinsified by
  // HotSpot and lowered to a single AVX-512 load.
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos);
  }
  // Off-heap (direct) buffers: fall back to copying species.length() bytes into a scratch
  // array. The original JDK 17 fast path used ScopedMemoryAccess.loadFromByteBuffer to avoid
  // the copy, but that intrinsic is not exposed; the alternative ByteVector.fromMemorySegment
  // is only available on JDK 19+ and cannot be called from --release 17 sources.
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  // Mirror the fast path of OpenJDK 17's ByteVector.fromByteBuffer(species, bb, offset, bo, m):
  // when the full vector window fits in the buffer, read the entire window unconditionally and
  // let fromArray(species, array, 0, mask) apply the mask. This matches the JDK semantics for
  // arbitrary mask shapes (not just contiguous-prefix masks) and keeps array.length equal to
  // species.length(), which satisfies the bounds-check precondition of fromArray.
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos, mask);
  }
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0, mask);
}
The new fromByteBuffer(...) helpers introduce a distinct fallback path for non-array-backed buffers (!input.hasArray()), including ByteBuffer.allocateDirect() and read-only heap buffers. The existing vector unpack tests only cover heap ByteBuffer.wrap(byte[]), so this direct/read-only path is currently untested. Add a unit test that runs unpackValuesUsingVector(ByteBuffer, ...) against a direct (and optionally read-only) buffer to validate correctness and protect against regressions in readInputBytes/masked loads on newer JDKs.
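A minimal shape for such a test, sketched without JUnit or the Vector API (`readInputBytes` here is a hypothetical stand-in for the load path under test; a real test would run `unpackValuesUsingVector` against heap, direct, and read-only buffers and compare with the scalar unpacker):

```java
import java.nio.ByteBuffer;

public class DirectBufferPathSketch {

  // Stand-in for the helper under test: absolute per-byte reads work for
  // every ByteBuffer kind (heap, direct, read-only) without touching position.
  static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
    byte[] bytes = new byte[byteCount];
    for (int i = 0; i < byteCount; i++) {
      bytes[i] = input.get(inPos + i);
    }
    return bytes;
  }

  public static void main(String[] args) {
    byte[] packed = {1, 2, 3, 4, 5, 6, 7, 8};
    ByteBuffer heap = ByteBuffer.wrap(packed);            // hasArray() == true
    ByteBuffer direct = ByteBuffer.allocateDirect(packed.length);
    direct.put(packed);
    direct.flip();                                        // hasArray() == false
    ByteBuffer readOnly = heap.asReadOnlyBuffer();        // hasArray() == false
    // All three buffer kinds must yield identical bytes.
    byte[] h = readInputBytes(heap, 2, 4);
    byte[] d = readInputBytes(direct, 2, 4);
    byte[] r = readInputBytes(readOnly, 2, 4);
    System.out.println(java.util.Arrays.equals(h, d) && java.util.Arrays.equals(h, r));
  }
}
```

Note that `asReadOnlyBuffer()` also reports `hasArray() == false`, so a read-only heap buffer exercises the same fallback branch as a direct buffer.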
mvn_install.log
.vscode/*
.DS_Store

EXTRA_JAVA_TEST_ARGS=$(./mvnw help:evaluate -Dexpression=extraJavaTestArgs -q -DforceStdout)
export MAVEN_OPTS="$MAVEN_OPTS $EXTRA_JAVA_TEST_ARGS"
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true -Dspotless.check.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
Is it possible to use a pattern similar to SPOTLESS_ARGS below?
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
I'm not an expert on this, but Codex complains as below:
The new fromByteBuffer(species, input, inPos, mask) helper changes the contract of the old JDK 17 masked load. Per the JDK 17 ByteVector.fromByteBuffer(..., m) docs, only masked-on lanes must be in bounds, but this fallback always copies species.length() bytes for direct buffers before applying the mask. That breaks packers whose mask intentionally covers fewer bytes than the vector width, such as Packer3 (B128 with a 12-byte mask): a direct ByteBuffer with exactly the required packed bytes now throws BufferUnderflowException, even though the old implementation and the array-backed path both accept it. The fix needs to preserve masked-load bounds semantics instead of unconditionally reading the whole vector window.
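One bounds-preserving shape, sketched outside the Vector API (`readMaskedWindow` is a hypothetical name, not the PR's code): keep the array at full vector width so `fromArray`'s precondition holds, but clamp the copy to the buffer's limit so a 12-byte direct buffer loaded as a 16-lane window no longer underflows.

```java
import java.nio.ByteBuffer;

public class MaskedLoadBoundsSketch {

  // Allocate the full vector window (vectorLength bytes) so the masked
  // fromArray precondition array.length >= species.length() holds, but copy
  // only the bytes the buffer actually has past inPos. Unread tail lanes
  // stay zero and are suppressed by the vector mask.
  static byte[] readMaskedWindow(ByteBuffer input, int inPos, int vectorLength) {
    byte[] bytes = new byte[vectorLength];
    int available = Math.min(vectorLength, input.limit() - inPos);
    for (int i = 0; i < available; i++) {
      bytes[i] = input.get(inPos + i);
    }
    return bytes;
  }

  public static void main(String[] args) {
    // The Packer3 case from the comment above: 12 packed bytes read as a
    // 16-lane (B128) window with a 12-lane mask. An unconditional 16-byte
    // read would run past the buffer; the clamped copy does not.
    ByteBuffer direct = ByteBuffer.allocateDirect(12);
    for (int i = 0; i < 12; i++) direct.put(i, (byte) (i + 1));
    byte[] window = readMaskedWindow(direct, 0, 16);
    System.out.println(window.length + "," + window[11] + "," + window[12]);
  }
}
```

Caveat: zero-filling the tail only matches JDK masked-load semantics when the masked-off lanes are exactly the out-of-range ones (true for the contiguous-prefix masks in this file); fully arbitrary masks would need per-lane bounds checks.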