docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)#213
Conversation
Follow-up to PR #212 (merged). Addresses two codex review findings. ## Fix 1 — `batched_opacity_blend` needs ray segmentation (codex P1) The original signature took a single flat `sorted_amplitudes` slice and emitted a single `out_alpha[ray]` — but a renderer composites N independent view rays per frame, each with its own front-to-back- sorted Gaussian sequence. Without per-ray boundaries the implementation could not know which Gaussians belong to which output pixel, so it would either composite the same global sequence for every ray or guess boundaries outside the API. Adds a CSR-style `ray_offsets: &[u32]` prefix-sum (length `n_rays + 1`) that segments the flat amplitude buffer into per-ray ranges. Documented contract: - `ray_offsets[0] == 0` and `ray_offsets[n_rays] == sorted_amplitudes.len()` - Empty ray (`ray_offsets[r] == ray_offsets[r+1]`) yields `out_alpha[r] = 0` - Rays are independent (no cross-ray data dependence) — outer loop is trivially parallelizable - Per-frame amplitude quantization is caller-side; `opacity_lut` is a frame-global constant for that pass Adds three new tests: - Multi-ray independence (concatenated rays match per-ray calls) - Empty-ray boundary case (→ α = 0) - `ray_offsets` invariant debug-asserts ## Fix 2 — `batched_mahalanobis` needs scratch buffer for Cholesky cache (codex P2) The original implementation note said L was "heap-free via stack or caller-provided scratch" but the public signature had no scratch parameter. At the documented `N = 1_000_000` bench size, the Cholesky cache is `6 * N * size_of::<f32>() = 24 MiB` — not stack-feasible. The function would either have to allocate internally (breaking the zero-allocation contract) or recompute factors per query (breaking the throughput contract). Adds explicit `cholesky_scratch: &mut [f32]` parameter (length `6*N`) with documented sizing guidance: - `N ≤ 8192` MAY use a stack-resident buffer - `N > 8192` MUST allocate once at engine init and re-use across frames - The function MUST NOT allocate internally Matches the `splat-fit` engine and registration-loop pattern where the scratch is allocated once per `SplatFitActor` mailbox at boot. ## What's NOT in this PR - Source code: still none. Plan-spec only. - The W1c primitive-addition contract (all three backends mandatory, parity tests gate, VPABSB-correction-style degenerate-input documentation) is unchanged — the fix updates the two signatures but not the testing or backend invariants. ## Test plan - [x] Codex P1 (ray segmentation) — added per-ray offset + 3 new tests to the contract. - [x] Codex P2 (Mahalanobis scratch) — added `cholesky_scratch` parameter + sizing note + zero-allocation contract. - [x] Signatures rebalanced (each line-broken with one arg per line + sized comments) for readability. - [ ] Codex re-review on this PR.
|
Warning Review limit reached
More reviews will be available in 15 minutes and 7 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c1fb05d81
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| query_xyz: &[f32], // length 3*M | ||
| mu_xyz: &[f32], // length 3*N | ||
| sigma_packed: &[f32], // length 6*N (upper-triangle Σ per Gaussian) | ||
| cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian) |
There was a problem hiding this comment.
Keep the Mahalanobis skeleton signature in sync
The detailed spec now requires cholesky_scratch, but the module skeleton at the top of this same plan still advertises batched_mahalanobis(query_xyz, mu_xyz, sigma_packed, out_dist_sq) without any scratch buffer. Since this document is the handoff for implementing src/simd_splat.rs, an implementer following the skeleton would recreate the exact no-scratch API this change is meant to eliminate, leaving the zero-allocation contract unenforceable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton at line 50 now shows the cholesky_scratch: &mut [f32] parameter with the inline note pointing to §4.2 for sizing + the "function MUST NOT allocate" contract. Verified on merged master tip e2ec430:
pub fn batched_mahalanobis(
query_xyz: &[f32], // M × 3 query points
mu_xyz: &[f32], // N × 3 Gaussian centroids
sigma_packed: &[f32], // N × 6 packed Σ
cholesky_scratch: &mut [f32], // N × 6 — caller-provided packed-L scratch (24 MiB @ N=1M); function MUST NOT allocate (see §4.2)
out_dist_sq: &mut [f32], // M × N output (squared Mahalanobis)
);Closing — skeleton and §4.2 detailed spec are now in sync.
| pub fn batched_opacity_blend( | ||
| sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8], | ||
| sorted_amplitudes: &[f32], // flat; contains all rays' samples concatenated | ||
| ray_offsets: &[u32], // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1]) |
There was a problem hiding this comment.
Update the opacity skeleton to include ray offsets
The per-primitive spec now makes ray_offsets mandatory for segmenting concatenated rays, but the module skeleton above still shows batched_opacity_blend(sorted_amplitudes, opacity_lut, out_alpha) with no segmentation parameter. If downstream code is implemented from that skeleton, it will still have no way to map the flat amplitude buffer to independent output rays, preserving the API bug this follow-up is supposed to fix.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in commit 3c78901 on the same PR branch before merge — the module skeleton now shows the CSR-style ray_offsets: &[u32] parameter (length n_rays + 1) with the inline note pointing to §4.3 for the segmentation contract; out_alpha length corrected to n_rays; sorted_amplitudes re-described as flat concatenation. Verified on merged master tip e2ec430:
pub fn batched_opacity_blend(
sorted_amplitudes: &[f32], // flat; all rays' samples concatenated (front-to-back per ray)
ray_offsets: &[u32], // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1]) (see §4.3)
opacity_lut: &[u8; 256], // amplitude → opacity LUT
out_alpha: &mut [u8], // length = n_rays — composited alpha per ray
);Closing — skeleton and §4.3 detailed spec are now in sync.
…213) Follow-up to codex review on PR #213. The per-primitive specs at §4.2 (`batched_mahalanobis`) and §4.3 (`batched_opacity_blend`) were correctly updated in #213, but the module skeleton at the top of the same file (the `pub fn` declarations in the `src/simd_splat.rs` sketch around line 50) still advertised the OLD no-scratch / no-ray-offsets signatures. Since this document is the implementation handoff for `src/simd_splat.rs`, an implementer scanning the skeleton first would recreate the exact APIs #213 was meant to eliminate, leaving the zero-allocation contract (Mahalanobis) and per-ray segmentation contract (opacity blend) unenforceable. ## Fix Updates the two skeleton signatures to match §4.2 and §4.3 exactly, with cross-references to the per-primitive sections: - `batched_mahalanobis` — adds `cholesky_scratch: &mut [f32]` (length N × 6); inline comment cites §4.2 sizing guidance and the "function MUST NOT allocate" contract. - `batched_opacity_blend` — adds `ray_offsets: &[u32]` (length n_rays + 1, CSR-style); `sorted_amplitudes` re-described as flat concatenation; `out_alpha` length now `n_rays` (not "per pixel"); inline comment cites §4.3 segmentation contract. The two implementations of the §4.2 / §4.3 detailed specs are unchanged in this PR — this commit only syncs the up-front skeleton so the two views of the API agree. ## Test plan - [x] Skeleton + §4.2 + §4.3 now show identical parameter lists. - [x] Comments on the new parameters cite the section that owns the detailed contract (so an implementer who reads the skeleton first gets pointed to the contract section). - [ ] Codex re-review on this PR.
Summary
Follow-up to #212 (merged). Addresses 2 codex findings on the W1c primitive signatures in the splat-native SIMD substrate plan.
Fixes
Fix 1 —
batched_opacity_blendray segmentation (codex P1)The original signature took a single flat
sorted_amplitudesslice and emitted a singleout_alpha[ray]— but a renderer composites N independent view rays per frame, each with its own front-to-back-sorted Gaussian sequence. Without per-ray boundaries the implementation could not know which Gaussians belong to which output pixel.Adds a CSR-style
ray_offsets: &[u32]prefix-sum (lengthn_rays + 1) that segments the flat amplitude buffer into per-ray ranges:Contract:
ray_offsets[0] == 0;ray_offsets[n_rays] == sorted_amplitudes.len(); empty ray (ray_offsets[r] == ray_offsets[r+1]) yieldsα = 0; rays are independent (outer loop parallelizable); per-frame quantization is caller-side.Tests added: single-ray + multi-ray reference comparison, empty-ray boundary case, multi-ray independence (concatenated rays ≡ separate calls),
ray_offsetsinvariant debug-asserts, SIMD parity.Fix 2 —
batched_mahalanobisCholesky scratch buffer (codex P2)The implementation note said L was "heap-free via stack or caller-provided scratch" but the public signature had no scratch parameter. At the documented
N = 1_000_000bench size, the Cholesky cache is6 * N * size_of::<f32>() = 24 MiB— not stack-feasible.Adds explicit
cholesky_scratch: &mut [f32]parameter (length6*N):Sizing guidance:
N ≤ 8192MAY use stack-resident;N > 8192MUST allocate once at engine init and re-use across frames; the function MUST NOT allocate internally. Matches thesplat-fitengine + registration loop pattern (allocated once perSplatFitActormailbox at boot).What's NOT in this PR
Test plan
cholesky_scratchparameter + sizing note + zero-allocation contract.Companion follow-up PRs (the four-PR fix arc)
lance-graphclaude/splat-native-ultrasound-v1-fixespose_se316→24, NR-SPLAT-PHI normative rule, MD040 fence-tag lint, IVD-MDR→MDR Annex VIII (#471 follow-up)ndarrayclaude/splat-native-ultrasound-v1-fixesbatched_opacity_blendray segmentation (CSR-styleray_offsets),batched_mahalanobisCholesky-scratch buffer (#212 follow-up)MedCare-rsclaude/splat-native-ultrasound-v1-fixespose_se316→24 (#163 follow-up)OGARclaude/splat-native-customer-fixes(PR #35)All four cross-reference each other. No source code in any of the four PRs — design-spec / handover updates only.
Authored by session
claude/lance-graph-ontology-review-Pyry3.