
docs(splat-native): ndarray SIMD substrate plan (D-SPLAT-2, P1 foundation — 5 W1c primitives, three-backend mandatory)#212

Merged
AdaWorldAPI merged 1 commit into master from claude/splat-native-ultrasound-v1 on Jun 5, 2026
Conversation

@AdaWorldAPI

Owner

Summary

ndarray-side companion to the cross-workspace splat-native-ultrasound-v1 integration plan (canonical at lance-graph/.claude/plans/splat-native-ultrasound-v1.md).

This PR ships one new plan file that specifies the one deliverable owed by ndarray, src/simd_splat.rs, as a W1c-style five-primitive batch module for anisotropic Gaussian math:

| Primitive | Purpose |
| --- | --- |
| `batched_cholesky_3x3` | 3×3 packed Σ → Cholesky factor L (degenerate-Σ NaN sentinel) |
| `batched_mahalanobis` | Squared Mahalanobis over M queries × N Gaussians (Σ-sandwich-ready) |
| `batched_opacity_blend` | Front-to-back alpha composite over sorted Gaussians (CPU rasterization) |
| `batched_sh_eval_l3` | Degree-3 spherical harmonics at one view direction |
| `batched_se3_transform` | Rigid SE(3) on N centroids + covariance (R Σ Rᵀ) |
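To make the first row concrete, here is a minimal scalar sketch of a batched 3×3 Cholesky with a NaN sentinel for degenerate Σ. The function name suffix, the 6-float packed layouts, and the "non-positive pivot means degenerate" rule are assumptions for illustration; the plan file defines the real layout and sentinel semantics.

```rust
/// Scalar reference sketch: Cholesky-factor N packed symmetric 3x3 matrices.
/// Assumed (hypothetical) layouts, 6 f32 per matrix:
///   input  Σ: [s00, s01, s02, s11, s12, s22]
///   output L: [l00, l10, l11, l20, l21, l22]  (lower triangular, Σ = L Lᵀ)
/// A non-positive or NaN pivot marks Σ as degenerate; all six outputs become NaN.
pub fn batched_cholesky_3x3_scalar(sigma_packed: &[f32], l_packed: &mut [f32]) {
    assert_eq!(sigma_packed.len() % 6, 0);
    assert_eq!(sigma_packed.len(), l_packed.len());
    for (s, l) in sigma_packed.chunks_exact(6).zip(l_packed.chunks_exact_mut(6)) {
        // Column 0.
        let d0 = s[0];
        let l00 = d0.sqrt();
        let l10 = s[1] / l00;
        let l20 = s[2] / l00;
        // Column 1.
        let d1 = s[3] - l10 * l10;
        let l11 = d1.sqrt();
        let l21 = (s[4] - l20 * l10) / l11;
        // Column 2.
        let d2 = s[5] - l20 * l20 - l21 * l21;
        let l22 = d2.sqrt();
        // `!(d > 0.0)` is true for zero, negative, and NaN pivots alike.
        let degenerate = !(d0 > 0.0) || !(d1 > 0.0) || !(d2 > 0.0);
        let out = if degenerate {
            [f32::NAN; 6]
        } else {
            [l00, l10, l11, l20, l21, l22]
        };
        l.copy_from_slice(&out);
    }
}
```

Computing every lane unconditionally and selecting the sentinel at the end keeps the loop branch-free in its arithmetic, which is the shape a SIMD backend would likely mirror with a mask.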

All three backends are mandatory (AVX-512 / NEON / scalar) per the existing consumer-contract knowledge doc and W1a primitive-addition pattern. Parity tests are gating, and degenerate-input sentinel semantics are documented per primitive.

Sprint window

P1, sprint 1-2. Foundation work — must land before every downstream consumer in the splat-native arc (lance-graph carriers, splat-fit engine, registration math, AR renderer).

What this PR ships

.claude/plans/splat-native-ultrasound-simd-substrate-v1.md (~380 lines).

No code. Spec-only. The implementation PR follows once the user ratifies the OQ-SPLAT-1..5 open questions in the canonical plan.

Anchored to

  • I-NOISE-FLOOR-JIRAK (Jirak weak-dependence Berry-Esseen for any significance threshold derived in downstream registration math)
  • The ndarray vertical-SIMD consumer contract at lance-graph/.claude/knowledge/ndarray-vertical-simd-alien-magic.md (W1a/W1c primitive-addition pattern; mandatory three-backend; parity-test gates; VPABSB-correction-style semantic carve-out for degenerate Σ)

Inherits (no new build)

  • Existing simd_caps() runtime dispatch (already battle-tested across simd_avx2/simd_avx512/simd_neon)
  • Existing AVX-512 / NEON parity-test infrastructure
  • Existing core::arch intrinsics (no new third-party deps)
  • ndarray PR #142, "fix(simd): VBMI gate for permute_bytes + Inf clamp for simd_exp_f32" (VBMI gate pattern for permute_bytes, relevant for batched_opacity_blend)
  • ndarray PR #463 (ndarray-is-mandatory in lance-graph; default-on simd-splat feature flag pattern)

Acceptance criteria

  • Correctness: each primitive matches its reference implementation within 4 ULP (nalgebra Cholesky / scipy Mahalanobis / analytical SH closed-form).
  • Parity: AVX-512 ≡ NEON ≡ scalar within ULP tolerance.
  • Bench: ≥ 2× scalar throughput on AVX-512 at N=1M for Cholesky; ≥ 1.4× at M=1k, N=1M for Mahalanobis (the registration regime).
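The ULP-based correctness and parity criteria above could be checked with a helper along these lines. This is a sketch only: the names `ulp_distance` and `within_ulps` are hypothetical, and treating two NaN values as a match (so that a degenerate-Σ sentinel agrees across backends) is an assumption about how the parity gate would be defined.

```rust
/// Map an f32 onto an integer line where adjacent floats differ by exactly 1.
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    // Negative floats are sign-magnitude; flip them so the mapping is monotonic
    // and +0.0 / -0.0 land on the same integer.
    let ordered = if bits < 0 { i32::MIN - bits } else { bits };
    ordered as i64
}

/// Number of representable f32 steps between `a` and `b` (both non-NaN).
pub fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}

/// Parity predicate: both NaN counts as agreement (assumed sentinel rule),
/// exactly one NaN is a mismatch, otherwise compare the ULP distance.
pub fn within_ulps(a: f32, b: f32, max_ulps: u64) -> bool {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => true,
        (false, false) => ulp_distance(a, b) <= max_ulps,
        _ => false,
    }
}
```

A parity test would then compare each backend's output slice element by element against the scalar path with `within_ulps(x, y, 4)`.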

Test plan

  • Docs/plan only — no source code in this PR.
  • User ratification of OQ-SPLAT-1..5 in canonical plan before implementation PR opens.
  • Implementation PR (separate) lands the five primitives, all three backends, parity tests, bench gates.

Cross-PR coordination

This is one of four coordinated PRs for the splat-native-ultrasound-v1 cross-workspace plan. All four reference each other; lance-graph is canonical.

| Repo | Branch | What |
| --- | --- | --- |
| lance-graph | claude/splat-native-ultrasound-v1 | canonical plan + board hygiene (14 D-SPLAT-* deliverables) |
| ndarray | claude/splat-native-ultrasound-v1 | D-SPLAT-2 SIMD substrate plan (5 batch primitives, three-backend mandatory) |
| MedCare-rs | claude/splat-native-ultrasound-v1 | D-SPLAT-10/11 HIPAA wire handover |
| OGAR | claude/splat-native-customer | §6 FMA-litmus customer narrative |

Triggered by user-supplied architecture diagrams (English 6-stage technical + German business-facing). No source code in any of the four PRs — design-spec / handover only.


Authored by session claude/lance-graph-ontology-review-Pyry3.

Companion to lance-graph's canonical splat-native-ultrasound-v1 plan
(see `lance-graph/.claude/plans/splat-native-ultrasound-v1.md`). This
file is the ndarray-side perspective: the one deliverable owed by
ndarray — `src/simd_splat.rs` — and its W1c-style five primitives:

- `batched_cholesky_3x3` — 3×3 packed Σ → Cholesky factor L
- `batched_mahalanobis` — squared Mahalanobis over M queries × N Gaussians
- `batched_opacity_blend` — front-to-back alpha composite over sorted Gaussians
- `batched_sh_eval_l3` — degree-3 spherical harmonics at a single view direction
- `batched_se3_transform` — rigid SE(3) transform on centroids + covariance
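As an illustration of the last primitive in the list, a rigid transform maps each centroid as μ' = R μ + t and each covariance as Σ' = R Σ Rᵀ. The scalar sketch below assumes a row-major 3×3 rotation, N×3 centroids, and the 6-float packed Σ order [s00, s01, s02, s11, s12, s22]; the function name and these layouts are hypothetical, not taken from the plan file.

```rust
/// Scalar sketch of a rigid SE(3) transform applied in place to N Gaussians.
/// `r` is a row-major rotation matrix, `t` a translation (assumed layouts).
pub fn batched_se3_transform_scalar(
    r: &[f32; 9],
    t: &[f32; 3],
    centroids: &mut [f32],    // N * 3
    sigma_packed: &mut [f32], // N * 6, upper triangle row by row
) {
    assert_eq!(centroids.len() % 3, 0);
    assert_eq!(sigma_packed.len(), centroids.len() * 2);

    // Centroids: mu' = R mu + t.
    for mu in centroids.chunks_exact_mut(3) {
        let p = [mu[0], mu[1], mu[2]];
        for i in 0..3 {
            mu[i] = r[3 * i] * p[0] + r[3 * i + 1] * p[1] + r[3 * i + 2] * p[2] + t[i];
        }
    }

    // Covariances: Sigma' = R Sigma R^T (translation does not affect Sigma).
    for s in sigma_packed.chunks_exact_mut(6) {
        let full = [[s[0], s[1], s[2]], [s[1], s[3], s[4]], [s[2], s[4], s[5]]];
        // tmp = R * Sigma
        let mut tmp = [[0.0f32; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                for k in 0..3 {
                    tmp[i][j] += r[3 * i + k] * full[k][j];
                }
            }
        }
        // out = tmp * R^T, keeping only the upper triangle (result is symmetric).
        let mut idx = 0;
        for i in 0..3 {
            for j in i..3 {
                let mut acc = 0.0f32;
                for k in 0..3 {
                    acc += tmp[i][k] * r[3 * j + k];
                }
                s[idx] = acc;
                idx += 1;
            }
        }
    }
}
```

For example, a 90° rotation about the z axis swaps the x and y variances of a diagonal Σ while leaving the z variance unchanged.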

All three backends mandatory (AVX-512 / NEON / scalar) per the existing
consumer-contract knowledge doc + W1a primitive-addition pattern. Parity
tests gating; Jirak-grounded significance thresholds for any derived
bounds (I-NOISE-FLOOR-JIRAK).

Sprint window: P1 (sprint 1-2). FOUNDATION work — must land before any
downstream consumer (lance-graph carriers, splat-fit engine, registration
math, AR renderer).

No code in this PR. Spec-only.
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Contributor

Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 0129b5c and f714de8.

📒 Files selected for processing (1)
  • .claude/plans/splat-native-ultrasound-simd-substrate-v1.md


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f714de8f3c


Comment on lines +228 to +230
pub fn batched_opacity_blend(
sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8],
);


P1: Add ray segmentation to opacity-blend API

This signature is supposed to produce out_alpha[ray] for multiple rays, but it only receives one flat sorted_amplitudes sequence and no per-ray offsets/counts, so an implementation cannot know which Gaussians belong to each output pixel/ray. In a renderer with more than one ray this either composites the same global sequence for every output or has to guess boundaries outside the API, so the plan should include a segmented layout (and any needed per-frame quantization data) before downstream splat-render depends on it.
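One way the segmented layout could look is a CSR-style offsets array, where ray `i` owns `sorted_amplitudes[ray_offsets[i]..ray_offsets[i + 1]]`. The sketch below is an assumption-laden illustration, not the plan's API: the `_segmented` name, the `ray_offsets` parameter, the mapping of amplitudes in [0, 1] to LUT indices, and the early-exit threshold are all hypothetical.

```rust
/// Front-to-back alpha composite, one output alpha per ray.
/// `ray_offsets` has `out_alpha.len() + 1` entries (CSR style, assumed layout).
pub fn batched_opacity_blend_segmented(
    sorted_amplitudes: &[f32],
    ray_offsets: &[u32],
    opacity_lut: &[u8; 256],
    out_alpha: &mut [u8],
) {
    assert_eq!(ray_offsets.len(), out_alpha.len() + 1);
    assert_eq!(*ray_offsets.last().unwrap() as usize, sorted_amplitudes.len());

    for (ray, out) in out_alpha.iter_mut().enumerate() {
        let start = ray_offsets[ray] as usize;
        let end = ray_offsets[ray + 1] as usize;
        // Fraction of light still passing through after the Gaussians seen so far.
        let mut transmittance = 1.0f32;
        for &amp in &sorted_amplitudes[start..end] {
            // Assumed quantization: amplitude in [0, 1] scaled to a LUT index.
            let idx = (amp.clamp(0.0, 1.0) * 255.0).round() as usize;
            let alpha = opacity_lut[idx] as f32 / 255.0;
            transmittance *= 1.0 - alpha;
            // Below this threshold the output already rounds to 255.
            if transmittance < 1.0 / 512.0 {
                break;
            }
        }
        *out = ((1.0 - transmittance) * 255.0).round() as u8;
    }
}
```

With explicit offsets, an empty ray (equal consecutive offsets) naturally yields alpha 0, and the per-ray loops are independent, so they can be distributed across lanes or threads without guessing boundaries.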



**Implementation note:** internally calls `batched_cholesky_3x3` once on `sigma_packed`, caches L (heap-free via stack or caller-provided scratch), then triangular-solve + squared norm per (m, n) pair.


P2: Expose scratch storage for Mahalanobis caching

The note requires caching Cholesky factors for all N Gaussians while staying heap-free, but the public signature has no scratch parameter; at the stated N=1M benchmark that cache is about 6 * N * sizeof(f32) (~24 MB), which is not stack-feasible. Without caller-provided scratch the implementation must allocate internally on every call or recompute factors for each query, breaking the zero-allocation/throughput contract this plan sets up.
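The size estimate checks out: 6 × 1,000,000 × 4 bytes = 24,000,000 bytes. A caller-provided scratch slice is one way to keep the call allocation-free, sketched below under stated assumptions. The `_with_scratch` name, the parameter names, the row-major M×N output, and the packed layouts ([s00, s01, s02, s11, s12, s22] for Σ, [l00, l10, l11, l20, l21, l22] for L) are hypothetical; the inline Cholesky here simply lets non-finite values propagate instead of writing the plan's NaN sentinel.

```rust
use std::mem::size_of;

/// Bytes of scratch needed to cache one packed Cholesky factor per Gaussian.
pub const fn cholesky_scratch_bytes(n: usize) -> usize {
    6 * n * size_of::<f32>()
}

/// Minimal packed 3x3 Cholesky (no sentinel handling in this sketch).
fn cholesky_packed(s: &[f32], l: &mut [f32]) {
    let l00 = s[0].sqrt();
    let (l10, l20) = (s[1] / l00, s[2] / l00);
    let l11 = (s[3] - l10 * l10).sqrt();
    let l21 = (s[4] - l20 * l10) / l11;
    let l22 = (s[5] - l20 * l20 - l21 * l21).sqrt();
    l.copy_from_slice(&[l00, l10, l11, l20, l21, l22]);
}

/// Squared Mahalanobis distance for every (query, Gaussian) pair.
/// `l_scratch` (N * 6 floats) is owned by the caller, so nothing is allocated here.
pub fn batched_mahalanobis_with_scratch(
    queries: &[f32],       // M * 3
    centroids: &[f32],     // N * 3
    sigma_packed: &[f32],  // N * 6
    l_scratch: &mut [f32], // N * 6
    out_d2: &mut [f32],    // M * N, row-major by query
) {
    let (m, n) = (queries.len() / 3, centroids.len() / 3);
    assert_eq!(sigma_packed.len(), 6 * n);
    assert_eq!(l_scratch.len(), 6 * n);
    assert_eq!(out_d2.len(), m * n);
    if m == 0 || n == 0 {
        return;
    }
    // Factor each Sigma once; the factors are reused for all M queries.
    for (s, l) in sigma_packed.chunks_exact(6).zip(l_scratch.chunks_exact_mut(6)) {
        cholesky_packed(s, l);
    }
    for (q, row) in queries.chunks_exact(3).zip(out_d2.chunks_exact_mut(n)) {
        let pairs = centroids.chunks_exact(3).zip(l_scratch.chunks_exact(6));
        for ((mu, l), out) in pairs.zip(row.iter_mut()) {
            let d = [q[0] - mu[0], q[1] - mu[1], q[2] - mu[2]];
            // Forward substitution: solve L y = d, then d^2 = y . y.
            let y0 = d[0] / l[0];
            let y1 = (d[1] - l[1] * y0) / l[2];
            let y2 = (d[2] - l[3] * y0 - l[4] * y1) / l[5];
            *out = y0 * y0 + y1 * y1 + y2 * y2;
        }
    }
}
```

A caller would size the buffer once with `cholesky_scratch_bytes(n)` (or `6 * n` floats) and reuse it across frames, which avoids both per-call allocation and refactoring Σ for each query.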


@AdaWorldAPI AdaWorldAPI merged commit 481205a into master Jun 5, 2026
18 checks passed
AdaWorldAPI added a commit that referenced this pull request Jun 5, 2026
…nd-v1-fixes

docs(splat-native): address review feedback on #212 (ray segmentation + Cholesky scratch buffer)