docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer) by AdaWorldAPI · Pull Request #213 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-06-05T10:23:10Z

Summary

Follow-up to #212 (merged). Addresses 2 codex findings on the W1c primitive signatures in the splat-native SIMD substrate plan.

Fixes

Fix 1 — `batched_opacity_blend` ray segmentation (codex P1)

The original signature took a single flat sorted_amplitudes slice and emitted a single out_alpha[ray] — but a renderer composites N independent view rays per frame, each with its own front-to-back-sorted Gaussian sequence. Without per-ray boundaries the implementation could not know which Gaussians belong to which output pixel.

Adds a CSR-style ray_offsets: &[u32] prefix-sum (length n_rays + 1) that segments the flat amplitude buffer into per-ray ranges:

pub fn batched_opacity_blend(
    sorted_amplitudes: &[f32],    // flat; all rays' samples concatenated
    ray_offsets: &[u32],          // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1])
    opacity_lut: &[u8; 256],
    out_alpha: &mut [u8],         // length = n_rays
);

Contract: ray_offsets[0] == 0; ray_offsets[n_rays] == sorted_amplitudes.len(); empty ray (ray_offsets[r] == ray_offsets[r+1]) yields α = 0; rays are independent (outer loop parallelizable); per-frame quantization is caller-side.

Tests added: single-ray + multi-ray reference comparison, empty-ray boundary case, multi-ray independence (concatenated rays ≡ separate calls), ray_offsets invariant debug-asserts, SIMD parity.

Fix 2 — `batched_mahalanobis` Cholesky scratch buffer (codex P2)

The implementation note said L was "heap-free via stack or caller-provided scratch" but the public signature had no scratch parameter. At the documented N = 1_000_000 bench size, the Cholesky cache is 6 * N * size_of::<f32>() = 24 MiB — not stack-feasible.

Adds explicit cholesky_scratch: &mut [f32] parameter (length 6*N):

pub fn batched_mahalanobis(
    query_xyz: &[f32],            // length 3*M
    mu_xyz: &[f32],               // length 3*N
    sigma_packed: &[f32],         // length 6*N (upper-triangle Σ per Gaussian)
    cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian)
    out_dist_sq: &mut [f32],      // length M*N (row-major)
);

Sizing guidance: N ≤ 8192 MAY use stack-resident; N > 8192 MUST allocate once at engine init and re-use across frames; the function MUST NOT allocate internally. Matches the splat-fit engine + registration loop pattern (allocated once per SplatFitActor mailbox at boot).

What's NOT in this PR

Source code: still none. Plan-spec only.
The W1c primitive-addition contract (all three backends mandatory, parity tests gate, VPABSB-correction-style degenerate-input documentation) is unchanged.

Test plan

Codex P1 (ray segmentation) — added per-ray offset + 3 new tests.
Codex P2 (Mahalanobis scratch) — added cholesky_scratch parameter + sizing note + zero-allocation contract.
Codex re-review on this PR.

Companion follow-up PRs (the four-PR fix arc)

Repo	Branch	Fixes
`lance-graph`	`claude/splat-native-ultrasound-v1-fixes`	5 fixes — StepDomain alignment with shipped contract, STATUS_BOARD canonical-status invariant + Sprint column, `pose_se3` 16→24, NR-SPLAT-PHI normative rule, MD040 fence-tag lint, IVD-MDR→MDR Annex VIII (#471 follow-up)
`ndarray`	`claude/splat-native-ultrasound-v1-fixes`	2 fixes — `batched_opacity_blend` ray segmentation (CSR-style `ray_offsets`), `batched_mahalanobis` Cholesky-scratch buffer (#212 follow-up)
`MedCare-rs`	`claude/splat-native-ultrasound-v1-fixes`	1 fix — `pose_se3` 16→24 (#163 follow-up)
`OGAR`	`claude/splat-native-customer-fixes` (PR #35)	1 fix — IVD-MDR → MDR Annex VIII Rule 11 (#34 follow-up)

All four cross-reference each other. No source code in any of the four PRs — design-spec / handover updates only.

Authored by session claude/lance-graph-ontology-review-Pyry3.

Follow-up to PR #212 (merged). Addresses two codex review findings. ## Fix 1 — `batched_opacity_blend` needs ray segmentation (codex P1) The original signature took a single flat `sorted_amplitudes` slice and emitted a single `out_alpha[ray]` — but a renderer composites N independent view rays per frame, each with its own front-to-back- sorted Gaussian sequence. Without per-ray boundaries the implementation could not know which Gaussians belong to which output pixel, so it would either composite the same global sequence for every ray or guess boundaries outside the API. Adds a CSR-style `ray_offsets: &[u32]` prefix-sum (length `n_rays + 1`) that segments the flat amplitude buffer into per-ray ranges. Documented contract: - `ray_offsets[0] == 0` and `ray_offsets[n_rays] == sorted_amplitudes.len()` - Empty ray (`ray_offsets[r] == ray_offsets[r+1]`) yields `out_alpha[r] = 0` - Rays are independent (no cross-ray data dependence) — outer loop is trivially parallelizable - Per-frame amplitude quantization is caller-side; `opacity_lut` is a frame-global constant for that pass Adds three new tests: - Multi-ray independence (concatenated rays match per-ray calls) - Empty-ray boundary case (→ α = 0) - `ray_offsets` invariant debug-asserts ## Fix 2 — `batched_mahalanobis` needs scratch buffer for Cholesky cache (codex P2) The original implementation note said L was "heap-free via stack or caller-provided scratch" but the public signature had no scratch parameter. At the documented `N = 1_000_000` bench size, the Cholesky cache is `6 * N * size_of::<f32>() = 24 MiB` — not stack-feasible. The function would either have to allocate internally (breaking the zero-allocation contract) or recompute factors per query (breaking the throughput contract). Adds explicit `cholesky_scratch: &mut [f32]` parameter (length `6*N`) with documented sizing guidance: - `N ≤ 8192` MAY use a stack-resident buffer - `N > 8192` MUST allocate once at engine init and re-use across frames - The function MUST NOT allocate internally Matches the `splat-fit` engine and registration-loop pattern where the scratch is allocated once per `SplatFitActor` mailbox at boot. ## What's NOT in this PR - Source code: still none. Plan-spec only. - The W1c primitive-addition contract (all three backends mandatory, parity tests gate, VPABSB-correction-style degenerate-input documentation) is unchanged — the fix updates the two signatures but not the testing or backend invariants. ## Test plan - [x] Codex P1 (ray segmentation) — added per-ray offset + 3 new tests to the contract. - [x] Codex P2 (Mahalanobis scratch) — added `cholesky_scratch` parameter + sizing note + zero-allocation contract. - [x] Signatures rebalanced (each line-broken with one arg per line + sized comments) for readability. - [ ] Codex re-review on this PR.

coderabbitai · 2026-06-05T10:23:19Z

Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 15 minutes and 7 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d39e15cc-db31-4036-a3cf-4103cbdb34d5

📥 Commits

Reviewing files that changed from the base of the PR and between 481205a and 3c78901.

📒 Files selected for processing (1)

.claude/plans/splat-native-ultrasound-simd-substrate-v1.md

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c1fb05d81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-05T10:25:16Z

+    query_xyz: &[f32],            // length 3*M
+    mu_xyz: &[f32],               // length 3*N
+    sigma_packed: &[f32],         // length 6*N (upper-triangle Σ per Gaussian)
+    cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian)


Keep the Mahalanobis skeleton signature in sync

The detailed spec now requires cholesky_scratch, but the module skeleton at the top of this same plan still advertises batched_mahalanobis(query_xyz, mu_xyz, sigma_packed, out_dist_sq) without any scratch buffer. Since this document is the handoff for implementing src/simd_splat.rs, an implementer following the skeleton would recreate the exact no-scratch API this change is meant to eliminate, leaving the zero-allocation contract unenforceable.

Useful? React with 👍 / 👎.

Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton at line 50 now shows the cholesky_scratch: &mut [f32] parameter with the inline note pointing to §4.2 for sizing + the "function MUST NOT allocate" contract. Verified on merged master tip e2ec430:

pub fn batched_mahalanobis( query_xyz: &[f32], // M × 3 query points mu_xyz: &[f32], // N × 3 Gaussian centroids sigma_packed: &[f32], // N × 6 packed Σ cholesky_scratch: &mut [f32], // N × 6 — caller-provided packed-L scratch (24 MiB @ N=1M); function MUST NOT allocate (see §4.2) out_dist_sq: &mut [f32], // M × N output (squared Mahalanobis) );

Closing — skeleton and §4.2 detailed spec are now in sync.

chatgpt-codex-connector · 2026-06-05T10:25:16Z

 pub fn batched_opacity_blend(
-    sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8],
+    sorted_amplitudes: &[f32],    // flat; contains all rays' samples concatenated
+    ray_offsets: &[u32],          // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1])


Update the opacity skeleton to include ray offsets

The per-primitive spec now makes ray_offsets mandatory for segmenting concatenated rays, but the module skeleton above still shows batched_opacity_blend(sorted_amplitudes, opacity_lut, out_alpha) with no segmentation parameter. If downstream code is implemented from that skeleton, it will still have no way to map the flat amplitude buffer to independent output rays, preserving the API bug this follow-up is supposed to fix.

Useful? React with 👍 / 👎.

Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton now shows the CSR-style ray_offsets: &[u32] parameter (length n_rays + 1) with the inline note pointing to §4.3 for the segmentation contract; out_alpha length corrected to n_rays; sorted_amplitudes re-described as flat concatenation. Verified on merged master tip e2ec430:

pub fn batched_opacity_blend( sorted_amplitudes: &[f32], // flat; all rays' samples concatenated (front-to-back per ray) ray_offsets: &[u32], // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1]) (see §4.3) opacity_lut: &[u8; 256], // amplitude → opacity LUT out_alpha: &mut [u8], // length = n_rays — composited alpha per ray );

Closing — skeleton and §4.3 detailed spec are now in sync.

…213) Follow-up to codex review on PR #213. The per-primitive specs at §4.2 (`batched_mahalanobis`) and §4.3 (`batched_opacity_blend`) were correctly updated in #213, but the module skeleton at the top of the same file (the `pub fn` declarations in the `src/simd_splat.rs` sketch around line 50) still advertised the OLD no-scratch / no-ray-offsets signatures. Since this document is the implementation handoff for `src/simd_splat.rs`, an implementer scanning the skeleton first would recreate the exact APIs #213 was meant to eliminate, leaving the zero-allocation contract (Mahalanobis) and per-ray segmentation contract (opacity blend) unenforceable. ## Fix Updates the two skeleton signatures to match §4.2 and §4.3 exactly, with cross-references to the per-primitive sections: - `batched_mahalanobis` — adds `cholesky_scratch: &mut [f32]` (length N × 6); inline comment cites §4.2 sizing guidance and the "function MUST NOT allocate" contract. - `batched_opacity_blend` — adds `ray_offsets: &[u32]` (length n_rays + 1, CSR-style); `sorted_amplitudes` re-described as flat concatenation; `out_alpha` length now `n_rays` (not "per pixel"); inline comment cites §4.3 segmentation contract. The two implementations of the §4.2 / §4.3 detailed specs are unchanged in this PR — this commit only syncs the up-front skeleton so the two views of the API agree. ## Test plan - [x] Skeleton + §4.2 + §4.3 now show identical parameter lists. - [x] Comments on the new parameters cite the section that owns the detailed contract (so an implementer who reads the skeleton first gets pointed to the contract section). - [ ] Codex re-review on this PR.

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

AdaWorldAPI merged commit e2ec430 into master Jun 5, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)#213

docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)#213
AdaWorldAPI merged 2 commits into
masterfrom
claude/splat-native-ultrasound-v1-fixes

AdaWorldAPI commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

Review limit reached

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Uh oh!

AdaWorldAPI Jun 5, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Uh oh!

AdaWorldAPI Jun 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Jun 5, 2026

Summary

Fixes

Fix 1 — batched_opacity_blend ray segmentation (codex P1)

Fix 2 — batched_mahalanobis Cholesky scratch buffer (codex P2)

What's NOT in this PR

Test plan

Companion follow-up PRs (the four-PR fix arc)

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

AdaWorldAPI Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

AdaWorldAPI Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix 1 — `batched_opacity_blend` ray segmentation (codex P1)

Fix 2 — `batched_mahalanobis` Cholesky scratch buffer (codex P2)

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading