docs(splat-native): ndarray SIMD substrate plan (D-SPLAT-2, P1 foundation — 5 W1c primitives, three-backend mandatory)#212
Companion to lance-graph's canonical splat-native-ultrasound-v1 plan (see `lance-graph/.claude/plans/splat-native-ultrasound-v1.md`). This file is the ndarray-side perspective: the one deliverable owed by ndarray, `src/simd_splat.rs`, and its W1c-style five primitives:

- `batched_cholesky_3x3`: 3×3 packed Σ → Cholesky factor L
- `batched_mahalanobis`: squared Mahalanobis over M queries × N Gaussians
- `batched_opacity_blend`: front-to-back alpha composite over sorted Gaussians
- `batched_sh_eval_l3`: degree-3 spherical harmonics at a single view direction
- `batched_se3_transform`: rigid SE(3) transform on centroids + covariance

All three backends mandatory (AVX-512 / NEON / scalar) per the existing consumer-contract knowledge doc + W1a primitive-addition pattern. Parity tests gating; Jirak-grounded significance thresholds for any derived bounds (I-NOISE-FLOOR-JIRAK).

Sprint window: P1 (sprint 1-2). FOUNDATION work: must land before any downstream consumer (lance-graph carriers, splat-fit engine, registration math, AR renderer).

No code in this PR. Spec-only.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f714de8f3c
pub fn batched_opacity_blend(
    sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8],
);
Add ray segmentation to opacity-blend API
This signature is supposed to produce out_alpha[ray] for multiple rays, but it only receives one flat sorted_amplitudes sequence and no per-ray offsets/counts, so an implementation cannot know which Gaussians belong to each output pixel/ray. In a renderer with more than one ray this either composites the same global sequence for every output or has to guess boundaries outside the API, so the plan should include a segmented layout (and any needed per-frame quantization data) before downstream splat-render depends on it.
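One way to picture the segmented layout this comment asks for is a CSR-style offsets array, shown here as a scalar reference sketch only. The function name `batched_opacity_blend_segmented`, the `ray_offsets` parameter, and the `amplitude_scale` quantization factor are all hypothetical; none of them appear in the plan, and the compositing rule (accumulated alpha = 1 − ∏(1 − αᵢ)) is an assumption about what the primitive is meant to compute.

```rust
/// Hypothetical segmented variant (not part of the plan under review).
/// `ray_offsets` holds one start index per ray plus a final end marker, so
/// ray `r` owns `sorted_amplitudes[ray_offsets[r]..ray_offsets[r + 1]]`.
/// `amplitude_scale` is an assumed per-frame factor mapping amplitudes to LUT indices.
pub fn batched_opacity_blend_segmented(
    sorted_amplitudes: &[f32],
    ray_offsets: &[u32],
    amplitude_scale: f32,
    opacity_lut: &[u8; 256],
    out_alpha: &mut [u8],
) {
    assert_eq!(ray_offsets.len(), out_alpha.len() + 1);
    for (ray, out) in out_alpha.iter_mut().enumerate() {
        let start = ray_offsets[ray] as usize;
        let end = ray_offsets[ray + 1] as usize;
        // Front-to-back compositing: track how much light still passes through.
        let mut transmittance = 1.0f32;
        for &amp in &sorted_amplitudes[start..end] {
            // Quantize the amplitude into a LUT index (truncating, clamped to 0..=255).
            let idx = (amp * amplitude_scale).clamp(0.0, 255.0) as usize;
            let alpha = opacity_lut[idx] as f32 / 255.0;
            transmittance *= 1.0 - alpha;
        }
        // Accumulated opacity for this ray; an empty segment yields 0.
        *out = ((1.0 - transmittance) * 255.0).round() as u8;
    }
}
```

With offsets `[0, 1, 1, 2]`, ray 0 owns the first amplitude, ray 1 is empty, and ray 2 owns the second, which is the per-ray boundary information the flat signature cannot express.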
**Implementation note:** internally calls `batched_cholesky_3x3` once on `sigma_packed`, caches L (heap-free via stack or caller-provided scratch), then triangular-solve + squared norm per (m, n) pair.
Expose scratch storage for Mahalanobis caching
The note requires caching Cholesky factors for all N Gaussians while staying heap-free, but the public signature has no scratch parameter; at the stated N=1M benchmark that cache is about 6 * N * sizeof(f32) (~24 MB), which is not stack-feasible. Without caller-provided scratch the implementation must allocate internally on every call or recompute factors for each query, breaking the zero-allocation/throughput contract this plan sets up.
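A caller-provided scratch buffer could look roughly like the scalar sketch below. Everything named here is an assumption for illustration: the function `batched_mahalanobis_with_scratch`, the helper `mahalanobis_scratch_len`, the packed layouts (`[xx, xy, xz, yy, yz, zz]` for Σ and `[l00, l10, l11, l20, l21, l22]` for L), the row-major-by-query output, and the NaN result for a non-positive-definite Σ. The plan's actual signature and sentinel semantics may differ.

```rust
/// Hypothetical helper: f32 slots needed to cache one packed factor per Gaussian.
/// At N = 1_000_000 this is 6_000_000 floats, i.e. 24 MB.
pub const fn mahalanobis_scratch_len(n: usize) -> usize {
    6 * n
}

/// Scalar Cholesky of packed symmetric Σ into packed lower-triangular L.
/// Returns false when Σ is not positive definite.
fn cholesky_3x3(s: &[f32], l: &mut [f32]) -> bool {
    let l00 = s[0].sqrt();
    let l10 = s[1] / l00;
    let l20 = s[2] / l00;
    let l11 = (s[3] - l10 * l10).sqrt();
    let l21 = (s[4] - l20 * l10) / l11;
    let l22 = (s[5] - l20 * l20 - l21 * l21).sqrt();
    l.copy_from_slice(&[l00, l10, l11, l20, l21, l22]);
    l.iter().all(|v| v.is_finite()) && l00 > 0.0 && l11 > 0.0 && l22 > 0.0
}

/// Hypothetical scratch-taking variant: factors every Σ once into `scratch`,
/// then solves L y = (q - mu) per (query, Gaussian) pair and stores |y|^2.
pub fn batched_mahalanobis_with_scratch(
    queries: &[f32],      // M * 3
    means: &[f32],        // N * 3
    sigma_packed: &[f32], // N * 6
    scratch: &mut [f32],  // at least 6 * N, owned by the caller
    out: &mut [f32],      // M * N, row-major by query
) {
    let (m, n) = (queries.len() / 3, means.len() / 3);
    assert_eq!(sigma_packed.len(), 6 * n);
    assert!(scratch.len() >= mahalanobis_scratch_len(n));
    assert_eq!(out.len(), m * n);
    for g in 0..n {
        let ok = cholesky_3x3(&sigma_packed[6 * g..6 * g + 6], &mut scratch[6 * g..6 * g + 6]);
        if !ok {
            scratch[6 * g] = f32::NAN; // assumed sentinel: poisons every distance to this Gaussian
        }
    }
    for q in 0..m {
        for g in 0..n {
            let l = &scratch[6 * g..6 * g + 6];
            let d0 = queries[3 * q] - means[3 * g];
            let d1 = queries[3 * q + 1] - means[3 * g + 1];
            let d2 = queries[3 * q + 2] - means[3 * g + 2];
            // Forward substitution against the cached lower-triangular factor.
            let y0 = d0 / l[0];
            let y1 = (d1 - l[1] * y0) / l[2];
            let y2 = (d2 - l[3] * y0 - l[4] * y1) / l[5];
            out[q * n + g] = y0 * y0 + y1 * y1 + y2 * y2;
        }
    }
}
```

The caller sizes the buffer once with `mahalanobis_scratch_len(n)` and can reuse it across frames, so the primitive itself never allocates and never refactors Σ per query.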
…nd-v1-fixes docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)
Summary
ndarray-side companion to the cross-workspace splat-native-ultrasound-v1 integration plan (canonical at `lance-graph/.claude/plans/splat-native-ultrasound-v1.md`).

This PR ships one new plan file that specs the one deliverable owed by ndarray, `src/simd_splat.rs`, as a W1c-style five-primitive batch module for anisotropic Gaussian math:

- `batched_cholesky_3x3`
- `batched_mahalanobis`
- `batched_opacity_blend`
- `batched_sh_eval_l3`
- `batched_se3_transform` (covariance transformed as R Σ Rᵀ)

All three backends mandatory (AVX-512 / NEON / scalar) per the existing consumer-contract knowledge doc and W1a primitive-addition pattern. Parity tests gating; degenerate-input sentinel semantics documented per primitive.
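For orientation, the R Σ Rᵀ covariance update attached to `batched_se3_transform` can be sketched as a scalar reference, the kind of baseline the AVX-512 and NEON backends would be parity-tested against. The function name `batched_se3_transform_scalar`, the in-place argument shape, and the packed layout `[xx, xy, xz, yy, yz, zz]` are assumptions made for this sketch, not the plan's signature.

```rust
/// Expand the assumed packed symmetric layout [xx, xy, xz, yy, yz, zz] to a full 3x3.
fn unpack(s: &[f32]) -> [[f32; 3]; 3] {
    [[s[0], s[1], s[2]], [s[1], s[3], s[4]], [s[2], s[4], s[5]]]
}

/// Hypothetical scalar reference: applies x' = R x + t to each centroid and
/// Σ' = R Σ Rᵀ to each packed covariance, in place.
pub fn batched_se3_transform_scalar(
    rotation: &[[f32; 3]; 3],
    translation: &[f32; 3],
    centroids: &mut [f32],    // N * 3
    sigma_packed: &mut [f32], // N * 6
) {
    assert_eq!(centroids.len() / 3, sigma_packed.len() / 6);
    for (c, s) in centroids.chunks_exact_mut(3).zip(sigma_packed.chunks_exact_mut(6)) {
        // Centroid: rotate, then translate.
        let p = [c[0], c[1], c[2]];
        for i in 0..3 {
            c[i] = rotation[i][0] * p[0]
                + rotation[i][1] * p[1]
                + rotation[i][2] * p[2]
                + translation[i];
        }
        // Covariance: tmp = R * Σ (translation does not affect Σ).
        let sig = unpack(s);
        let mut tmp = [[0.0f32; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                for k in 0..3 {
                    tmp[i][j] += rotation[i][k] * sig[k][j];
                }
            }
        }
        // Σ' = tmp * Rᵀ; only the upper triangle is written back, in packed order.
        let mut idx = 0;
        for i in 0..3 {
            for j in i..3 {
                s[idx] = (0..3).map(|k| tmp[i][k] * rotation[j][k]).sum::<f32>();
                idx += 1;
            }
        }
    }
}
```

A 90° rotation about z swaps the x and y variances of a diagonal Σ, which makes a convenient first parity case because every intermediate value is exact in f32.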
Sprint window
P1, sprint 1-2. Foundation work — must land before every downstream consumer in the splat-native arc (lance-graph carriers, splat-fit engine, registration math, AR renderer).
What this PR ships
`.claude/plans/splat-native-ultrasound-simd-substrate-v1.md` (~380 lines).

No code. Spec-only. The implementation PR comes after the user ratifies the OQ-SPLAT-1..5 open questions in the canonical plan.
Anchored to
- `I-NOISE-FLOOR-JIRAK` (Jirak weak-dependence Berry-Esseen for any significance threshold derived in downstream registration math)
- `lance-graph/.claude/knowledge/ndarray-vertical-simd-alien-magic.md` (W1a/W1c primitive-addition pattern; mandatory three-backend; parity-test gates; VPABSB-correction-style semantic carve-out for degenerate Σ)

Inherits (no new build)
- `simd_caps()` runtime dispatch (already battle-tested across simd_avx2/simd_avx512/simd_neon)
- `core::arch` intrinsics (no new third-party deps)
- `permute_bytes`, relevant for `batched_opacity_blend`)
- `simd-splat` feature flag pattern)

Acceptance criteria
Test plan
Cross-PR coordination
This is one of four coordinated PRs for the splat-native-ultrasound-v1 cross-workspace plan. All four reference each other; lance-graph is canonical.
- `lance-graph`: `claude/splat-native-ultrasound-v1`
- `ndarray`: `claude/splat-native-ultrasound-v1`
- `MedCare-rs`: `claude/splat-native-ultrasound-v1`
- `OGAR`: `claude/splat-native-customer`

Triggered by user-supplied architecture diagrams (English 6-stage technical + German business-facing). No source code in any of the four PRs: design-spec / handover only.
Authored by session
claude/lance-graph-ontology-review-Pyry3.