
feat(userApi): layer factory API PR 2 — Conv1d, ConvT1d, Pool1d, Softmax + bit-parity CI#175

Open
LeoBuron wants to merge 4 commits into layer-factory-api-pr1 from layer-factory-api-pr2

@LeoBuron

Summary

Second of a two-PR series migrating the layer construction API to
designated-initializer-based factories (PR 1 added Linear + ReLU +
Flatten + infrastructure). This PR:

  • Adds Borrowing + Owning factories for Conv1d, Conv1dTransposed,
    MaxPool1d, AvgPool1d, and Softmax — same pattern as PR 1's Linear /
    ReLU.
  • Wires CONV1D and CONV1D_TRANSPOSED dispatch into the
    layerLoadWeights primitive that PR 1 stubbed.
  • Extracts the per-file deepCopyQuantization helpers that PR 1 added
    to LinearApi.c and ReluApi.c into a shared utility in
    LayerQuant.{h,c}, used by every new Owning factory.
  • Adds _v2 versions of the two example training binaries in new
    directories, built on the new API (coexistence strategy Z: the legacy
    binaries stay in place, so there is no rename churn). Each v2 binary
    supports a BIT_PARITY env-var mode that loads the PyTorch state_dict
    via modelLoadStateDict and runs inference, skipping training.
  • Adds a c-bit-parity CI job that trains PyTorch (emitting per-layer
    weight .npy files), runs the v2 binaries with BIT_PARITY=1, and
    diffs their inference outputs against the PyTorch reference
    predictions/reconstructions.
  • Adds `_Static_assert(VALID == 0, ...)` guards in `Conv1dApi.h`,
    `Conv1dTransposedApi.h`, and `Pool1dApi.h`, so that an omitted
    `.padding` field keeps defaulting to `VALID`.

Design refinements from the spec

  • `pool1dInit_t` is split into `maxPool1dInit_t` (with `inputChannels`
    and `inputLength` for argmax pre-allocation) and `avgPool1dInit_t`
    (without). Spec section 3.3 anticipated this split for the
    `ceil_mode` case; we make it now because the argmax-sizing
    requirement is structurally similar.

Test plan

  • `ctest --preset unit_test_debug` — 50/50 tests pass (47 from PR 1 + 3 new test files: UnitTestConv1dApi, UnitTestConv1dTransposedApi, UnitTestPool1dApi; plus new tests added to UnitTestLayerQuant, UnitTestLayerWeightsApi, UnitTestSoftmax).
  • `ctest --preset unit_test_error` — 50/50 pass at strictest log level.
  • Both v2 example binaries build cleanly (`train_c_har_classifier_v2`, `train_c_ecg_anomaly_ae_v2`).
  • Both legacy example binaries still build (confirms the *Legacy rename sweeps missed no call sites).
  • alloc-locality grep gate passes.
  • clang-format dry-run is clean.
  • CI `c-bit-parity` job goes green on the PR.

References

  • Spec: `docs/superpowers/specs/2026-05-15-layer-factory-api-design.md` (gitignored)
  • Prior PR: `layer-factory-api-pr1` (this branch is stacked on top —
    rebase onto `develop` after PR 1 lands)

🤖 Generated with Claude Code

LeoBuron force-pushed the layer-factory-api-pr1 branch from 2588296 to b6a8a51 on May 23, 2026 09:58
LeoBuron added 4 commits May 23, 2026 11:58
… bias memcpy

Mirrors the CONV1D case but routes through conv1dTransposedConfig_t.
The caller-side weight buffer shape is [inChannels, outChannels/groups,
kernelSize]; this inChannels/outChannels swap relative to Conv1d
([outChannels, inChannels/groups, kernelSize]) is documented in the
Conv1dTransposedApi header.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
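To illustrate the swap, here is a minimal C sketch (not code from this PR) of the two caller-side layouts. It assumes PyTorch's conventions: `Conv1d.weight` is [outChannels, inChannels/groups, kernelSize] and `ConvTranspose1d.weight` is [inChannels, outChannels/groups, kernelSize]. `weightShape1d_t`, `conv1dWeightShape` and `weightElementCount` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type; not part of the PR's API. */
typedef struct {
    size_t dim0;
    size_t dim1;
    size_t kernelSize;
} weightShape1d_t;

/* Caller-side weight buffer shape (PyTorch conventions, assumed):
 *   Conv1d:           [outChannels, inChannels / groups, kernelSize]
 *   Conv1dTransposed: [inChannels, outChannels / groups, kernelSize] */
static weightShape1d_t conv1dWeightShape(size_t inChannels, size_t outChannels,
                                         size_t groups, size_t kernelSize,
                                         bool transposed) {
    weightShape1d_t shape;
    if (transposed) {
        shape.dim0 = inChannels;
        shape.dim1 = outChannels / groups;
    } else {
        shape.dim0 = outChannels;
        shape.dim1 = inChannels / groups;
    }
    shape.kernelSize = kernelSize;
    return shape;
}

/* Number of floats the caller must supply for the weight buffer. */
static size_t weightElementCount(weightShape1d_t s) {
    return s.dim0 * s.dim1 * s.kernelSize;
}
```

Both layouts hold the same number of floats, so a buffer in the wrong layout passes any size check. A layout mix-up would therefore appear as wrong outputs rather than as a size error.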

refactor(userApi): rename Softmax factories to *Legacy for new API coexistence

Frees the canonical softmaxLayerInit / freeSoftmaxLayer names. Legacy
bodies are functionally unchanged, except that they now set
softmaxConfig->ownsQuantizations = false explicitly (defensive; this
matches the calloc default).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md

feat(userApi): declare conv1dInit_t and new Conv1d factory signatures

Adds the per-layer init struct, Borrowing/Owning factory decls, and
freeConv1dLayer decl. A _Static_assert guarantees that VALID remains
value 0 of paddingType_t, so a zero-initialized .padding defaults to
VALID. No implementation yet; that follows in the next two commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4
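A minimal sketch of why the guard matters. It uses hypothetical stand-ins: `conv1dInitSketch_t`, and a `paddingType_t` whose second enumerator `SAME` is assumed. Members omitted from a designated initializer are zero-initialized, so the default padding is whichever enumerator has value 0.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real paddingType_t and conv1dInit_t live in
 * the project's headers and have more fields. */
typedef enum { VALID, SAME } paddingType_t;

typedef struct {
    size_t inChannels;
    size_t outChannels;
    size_t kernelSize;
    paddingType_t padding; /* omitted in a designated initializer -> 0 */
} conv1dInitSketch_t;

/* If the enum is ever reordered, compilation fails instead of silently
 * changing what an omitted .padding means. */
_Static_assert(VALID == 0, "zero-initialized .padding must mean VALID");

static conv1dInitSketch_t exampleInit(void) {
    /* .padding is not named, so C zero-initializes it, i.e. VALID. */
    conv1dInitSketch_t init = {.inChannels = 3, .outChannels = 16, .kernelSize = 5};
    return init;
}
```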

test(userApi): failing tests for new conv1dLayerInit Borrowing variant

Four tests: shape correctness with explicit fields, BIAS_DEFAULT
resolution, BIAS_FALSE leaving bias NULL, and zero-initialized
padding/stride/dilation/groups resolving to VALID/1/1/1. Fails at link
until the implementation lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4, 5.1
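The "zero means default" convention that these tests pin down could look like the following sketch. Several details are assumptions: the `biasMode_t` enumerator order (with `BIAS_DEFAULT` first, so an omitted `.bias` zero-initializes to it), and the rule that `BIAS_DEFAULT` resolves to "has bias", matching PyTorch's `bias=True` default. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state; BIAS_DEFAULT is assumed to be value 0 so an
 * omitted .bias zero-initializes to it. The real enum may differ. */
typedef enum { BIAS_DEFAULT, BIAS_TRUE, BIAS_FALSE } biasMode_t;

typedef struct {
    size_t stride;
    size_t dilation;
    size_t groups;
    biasMode_t bias;
} convHyperparamsSketch_t;

/* Treat 0 as "not set" and substitute the PyTorch default of 1. */
static size_t orOne(size_t v) { return v == 0 ? 1 : v; }

/* Assumption: BIAS_DEFAULT means "has bias" (PyTorch's bias=True). */
static bool resolveBias(biasMode_t b) { return b != BIAS_FALSE; }

static convHyperparamsSketch_t resolveDefaults(convHyperparamsSketch_t in) {
    convHyperparamsSketch_t out = in;
    out.stride = orOne(in.stride);
    out.dilation = orOne(in.dilation);
    out.groups = orOne(in.groups);
    return out;
}
```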

feat(userApi): declare conv1dTransposedInit_t and Conv1dTransposed factory signatures

New Conv1dTransposedApi.h header with conv1dTransposedInit_t struct,
Borrowing/Owning factory decls, freeConv1dTransposedLayer decl, and
_Static_assert(VALID == 0). Stub .c file registered in CMake.
Implementation in subsequent commits.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4

test(userApi): failing tests for new conv1dTransposedLayerInit Borrowing

Three tests: shape correctness (with inChannels/outChannels weight
shape SWAP relative to Conv1d), BIAS_FALSE leaves bias NULL,
outputPadding propagates to internal config. Fails at link until impl
lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4

feat(userApi): declare Pool1d factory signatures (Max + Avg)

Splits the spec's shared pool1dInit_t into maxPool1dInit_t (with
inputChannels + inputLength for argmax pre-allocation) and
avgPool1dInit_t (no input geometry, no dilation). Both factory pairs
declared in one header; impl stubbed in Pool1dApi.c for the CMake
graph. Per-layer init code follows.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (with documented split), 3.4, 4
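Below is a sketch of the split using assumed field names. The real structs in `Pool1dApi.h` may differ, and the `*Sketch_t` names are hypothetical. Only the max-pool variant carries input geometry, because its argmax tensor is sized at construction time.

```c
#include <stddef.h>

/* Hypothetical sketches of the two init structs. */
typedef struct {
    size_t kernelSize;
    size_t stride;        /* 0 -> defaults to kernelSize */
    size_t dilation;      /* MaxPool only */
    size_t inputChannels; /* needed to pre-allocate the argmax tensor */
    size_t inputLength;
} maxPool1dInitSketch_t;

typedef struct {
    size_t kernelSize;
    size_t stride; /* 0 -> defaults to kernelSize */
    /* No dilation and no input geometry: AvgPool keeps no argmax tensor. */
} avgPool1dInitSketch_t;
```

With designated initializers, a caller names only what it needs, e.g. `(maxPool1dInitSketch_t){.kernelSize = 2, .inputChannels = 4, .inputLength = 128}`.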

test(userApi): failing tests for maxPool1dLayerInit Borrowing + Owning

Four tests: kernel + argmax shape correctness, stride defaulting to
kernelSize (PyTorch pool convention), Owning deep-copy of forwardMath
and backwardMath into the two pool config slots (forwardQ + propLossQ),
and a leak-check loop. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4

test(userApi): failing tests for avgPool1dLayerInit Borrowing + Owning

Three tests: kernel correctness, stride defaulting to kernelSize, and
Owning deep-copy. AvgPool has no dilation (struct field omitted) and
no argmax tensor (no input geometry required).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4

test(userApi): failing tests for layerLoadWeights CONV1D case

Two tests: weight + bias memcpy, and no-bias accepts NULL biasData.
Fails because the current CONV1D dispatch is the PR 1 PRINT_ERROR
stub. Implementation in next commit.

Also adds TensorApi include + MORE_LIBS entry (provides freeQuantization,
which the conv1d tests need but the plan omitted).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

feat(userApi): layerLoadWeights CONV1D dispatch — weight + bias memcpy

Replaces PR 1's TODO stub. Same shape as the LINEAR case: memcpy from
the caller-provided float* buffer into the factory-allocated tensor
data, with bias presence/absence enforcement matching the bool resolved
from conv1dInit_t::bias.

Also links Conv1d into LayerWeightsApi CMake target and adds TensorApi
to the test's MORE_LIBS (provides freeQuantization, omitted from spec).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
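A minimal sketch of the copy-with-enforcement step. It assumes a simplified parameter holder; `conv1dParamsSketch_t` and `loadConv1dWeightsSketch` are hypothetical, and the real code copies into tensor data. To keep the behavior testable, this sketch reports a bias-presence mismatch by returning -1, and it validates before copying anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified parameter holder. */
typedef struct {
    float *weightData;
    size_t weightCount;
    float *biasData; /* NULL when the layer was built without bias */
    size_t biasCount;
} conv1dParamsSketch_t;

/* Copies caller-owned buffers into factory-allocated storage.
 * Returns 0 on success, -1 if bias presence does not match the layer. */
static int loadConv1dWeightsSketch(conv1dParamsSketch_t *p,
                                   const float *weights, const float *bias) {
    bool layerHasBias = p->biasData != NULL;
    if (layerHasBias != (bias != NULL)) {
        fprintf(stderr, "bias presence mismatch\n");
        return -1;
    }
    memcpy(p->weightData, weights, p->weightCount * sizeof(float));
    if (layerHasBias) {
        memcpy(p->biasData, bias, p->biasCount * sizeof(float));
    }
    return 0;
}
```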

test(userApi): failing test for layerLoadWeights CONV1D_TRANSPOSED case

Verifies weight + bias memcpy into the factory-allocated Conv1dTransposed
parameter tensors. Fails because the current dispatch is a
PRINT_ERROR + exit stub.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing tests for shared deepCopyQuantization in LayerQuant

Three tests cover the null-input shortcut, FLOAT32 (no qConfig), and
SYM_INT32 (qConfig bytes duplicated). Fails at link until the impl
lands in LayerQuant.c.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3
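The three tested behaviors map onto a sketch like the one below: NULL in gives NULL out, FLOAT32 has no qConfig, and SYM_INT32 gets its qConfig bytes duplicated. The types are simplified, hypothetical stand-ins for `quantization_t`. In particular, storing `qConfigSize` next to the pointer is an assumption made to keep the copy generic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified quantization types. */
typedef enum { FLOAT32, SYM_INT32 } qTypeSketch_t;

typedef struct {
    int placeholder; /* stands in for the real SYM_INT32 qConfig fields */
} symInt32QConfigSketch_t;

typedef struct {
    qTypeSketch_t type;
    void *qConfig;      /* NULL for FLOAT32 */
    size_t qConfigSize; /* assumption: size travels with the pointer */
} quantizationSketch_t;

/* Returns a fresh heap copy of src whose qConfig shares no memory with
 * src. NULL in -> NULL out. */
static quantizationSketch_t *deepCopyQuantizationSketch(const quantizationSketch_t *src) {
    if (src == NULL) {
        return NULL;
    }
    quantizationSketch_t *dst = malloc(sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    *dst = *src;
    dst->qConfig = NULL;
    if (src->qConfig != NULL && src->qConfigSize > 0) {
        dst->qConfig = malloc(src->qConfigSize);
        if (dst->qConfig == NULL) {
            free(dst);
            return NULL;
        }
        memcpy(dst->qConfig, src->qConfig, src->qConfigSize);
    }
    return dst;
}

static void freeQuantizationSketch(quantizationSketch_t *q) {
    if (q != NULL) {
        free(q->qConfig);
        free(q);
    }
}
```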

refactor(userApi): extract deepCopyQuantization to shared LayerQuant utility

PR 1 inlined deepCopyQuantization into LinearApi.c and an equivalent
reluDeepCopyQuantization into ReluApi.c. Hoists both into a single
externally-linked function in LayerQuant.c. LinearApi and ReluApi now
share the same code path; the new Conv1d / Conv1dTransposed / Pool1d /
Softmax Owning factories that follow in this PR use it directly.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): rename Conv1d factories to *Legacy for new API coexistence

Frees the canonical conv1dLayerInit / freeConv1dLayer names for the new
conv1dInit_t-based factories landing in this PR. Legacy bodies are
functionally unchanged; only adds 'conv1dConfig->ownsQuantizations =
false' (no behavior change since the new field defaults to false via
calloc anyway, but explicit is clearer).

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
…SoftmaxLayer

Factory takes layerQuant_t* and stores .forwardMath / .backwardMath as
the layer's forward/backward quantizations. Owning deep-copies both
via deepCopyQuantization. freeSoftmaxLayer reads ownsQuantizations to
decide whether to also tear down the two quantizations and qConfigs.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

feat(layer): add ownsQuantizations flag to five internal layer configs

Mirrors the field PR 1 added to linearConfig_t and reluConfig_t.
Foundation for the new factory API in subsequent commits — each new
*LayerInitOwning sets the flag to true and the canonical free*Layer
branches on it. Because the configs are calloc-allocated, the flag
defaults to false, which preserves the existing borrowing semantics for
legacy callers.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.3

feat(layer): implement conv1dLayerInit Borrowing variant + freeConv1dLayer

Factory allocates kernel, weight, and bias internally. KAIMING_UNIFORM
weights / ZEROS bias (calloc-implicit). Stores the four lq quantization
pointers verbatim; sets ownsQuantizations=false. freeConv1dLayer tears
down parameters + kernel unconditionally and the quantizations only
when ownsQuantizations=true (defensive against pointer aliasing, since
borrowed slots may point at the same quantization_t).
Fixes the pre-existing layer->config leak that the legacy free* path
still has.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3
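The sketch below shows the ownership rule on the free path, using a trimmed-down, hypothetical config struct (`conv1dConfigSketch_t`). Parameters are always released. Quantizations are released only when the layer owns them, which is also why the Owning variant makes four separate copies instead of sharing one.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down config; the real conv1dConfig_t also holds
 * the kernel and typed quantization_t pointers. */
typedef struct {
    float *weights;
    float *bias;
    void *quantizations[4]; /* forward/backward slots, simplified */
    bool ownsQuantizations;
} conv1dConfigSketch_t;

static void freeConv1dConfigSketch(conv1dConfigSketch_t *cfg) {
    if (cfg == NULL) {
        return;
    }
    /* Parameters always belong to the layer. */
    free(cfg->weights);
    free(cfg->bias);
    /* Borrowed quantizations belong to the caller and may alias one
     * another, so they are released only when the layer owns them. */
    if (cfg->ownsQuantizations) {
        for (int i = 0; i < 4; i++) {
            free(cfg->quantizations[i]);
        }
    }
    /* Release the config itself too. */
    free(cfg);
}
```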

test(layer): failing tests for conv1dLayerInitOwning

Verifies deep copy of all four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown without leaks.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dLayerInitOwning (deep-copy variant)

Factory deep-copies each of the four quantization_t in lq via the
shared deepCopyQuantization helper. Always four separate copies (no
aliasing), keeping freeConv1dLayer simple. Caller can drop lq + all
four quantizations immediately after the call.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInit Borrowing variant + freeConv1dTransposedLayer

Allocates kernel, weights ([inChannels, outChannels/groups, kernelSize]),
optional bias. KAIMING_UNIFORM weight init / ZEROS bias.
Stores the four lq pointers verbatim. freeConv1dTransposedLayer tears
down parameters + kernel unconditionally and quantizations only when
ownsQuantizations=true.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dTransposedLayerInitOwning

Verifies deep-copy of the four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInitOwning (deep-copy variant)

Deep-copies each of the four quantization_t via deepCopyQuantization.
Always four separate copies (no aliasing). Caller can drop lq + all
four quantizations immediately after the call.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement maxPool1dLayerInit Borrowing + Owning + freeMaxPool1dLayer

Factory pre-allocates kernel + argmaxIndices INT32 tensor (shape
[1, inputChannels, outputLength]). outputLength derived via
computePool1dOutputLength replicating the geometry rule from
windowGeometry1dCalc. Stride defaults to kernelSize (PyTorch
convention). Owning deep-copies forwardMath + backwardMath into the
config's forwardQ + propLossQ slots.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4, 5
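For reference, PyTorch documents the 1-D pooling output length (floor mode) as `L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. Below is a C sketch of that rule, together with the stride default and the argmax element count. `pool1dOutputLength` and `argmaxElementCount` are hypothetical stand-ins, not `computePool1dOutputLength` itself, and the `padding` parameter is included only for generality.

```c
#include <stddef.h>

/* PyTorch floor-mode rule, written with the effective window span:
 *   span  = dilation * (kernelSize - 1) + 1
 *   L_out = (L_in + 2 * padding - span) / stride + 1
 * Assumes kernelSize >= 1. Returns 0 if the window does not fit. */
static size_t pool1dOutputLength(size_t inputLength, size_t padding,
                                 size_t kernelSize, size_t stride,
                                 size_t dilation) {
    if (stride == 0) {
        stride = kernelSize; /* pool convention: stride defaults to kernel size */
    }
    if (dilation == 0) {
        dilation = 1;
    }
    size_t span = dilation * (kernelSize - 1) + 1;
    size_t padded = inputLength + 2 * padding;
    if (padded < span) {
        return 0;
    }
    return (padded - span) / stride + 1;
}

/* Elements in an argmax tensor of shape [1, inputChannels, outputLength]. */
static size_t argmaxElementCount(size_t inputChannels, size_t outputLength) {
    return 1 * inputChannels * outputLength;
}
```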

feat(layer): implement avgPool1dLayerInit Borrowing + Owning + freeAvgPool1dLayer

Factory pre-allocates kernel only (no argmax). Stride defaults to
kernelSize. Owning deep-copies forwardMath + backwardMath into the
forwardQ + propLossQ slots.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

test(layer): failing tests for new softmaxLayerInit Borrowing + Owning

Two tests: Borrowing stores lq pointers verbatim with
ownsQuantizations=false; Owning deep-copies them with
ownsQuantizations=true. Fails at link until impl lands.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
…AR + ECG)

train_pytorch.py for both examples now writes the trained model's
per-layer weight + bias tensors to examples/<dir>/weights/. This is the
input to the v2 binary's BIT_PARITY mode introduced in PR 2 Tasks 20/21,
which loads these via modelLoadStateDict and runs inference for the CI
bit-parity gate.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10

feat(examples): add har_classifier_v2 using new factory API

Same architecture as the legacy har_classifier binary (Conv->ReLU->Pool
x3 + Flatten + Linear + Softmax) but constructed via conv1dLayerInit,
reluLayerInit, maxPool1dLayerInit, avgPool1dLayerInit, flattenLayerInit,
linearLayerInit, softmaxLayerInit. Shares the legacy data directory.
Outputs to examples/har_classifier_v2/{logs,outputs}/. Supports
BIT_PARITY env-var mode (used by the bit-parity CI step) which loads
PyTorch state_dict via modelLoadStateDict and skips training.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 (coexistence strategy Z)
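Here is a sketch of the mode switch. It assumes that BIT_PARITY counts as enabled for any non-empty value other than "0"; the PR only shows CI setting BIT_PARITY=1, so both this parsing rule and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Pure parsing rule (assumed): set, non-empty, and not "0". */
static bool bitParityEnabledFrom(const char *value) {
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Reads the environment; a v2 main() would branch on this to either load
 * the PyTorch weights and run inference, or train as usual. */
static bool bitParityEnabled(void) {
    return bitParityEnabledFrom(getenv("BIT_PARITY"));
}
```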

feat(examples): add ecg_anomaly_ae_v2 using new factory API

Same encoder/decoder autoencoder as the legacy ecg_anomaly_ae, but built
via conv1dLayerInit / reluLayerInit / maxPool1dLayerInit /
avgPool1dLayerInit / conv1dTransposedLayerInit. Supports the BIT_PARITY
env-var mode via modelLoadStateDict.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
…ch reference

New job c-bit-parity runs in parallel with c-build-and-test. Steps:
  1. PyTorch trains HAR + ECG (emits pytorch_*.npy + per-layer weights)
  2. Builds the two v2 binaries via cmake --preset examples
  3. Runs both v2 binaries with BIT_PARITY=1 (loads state_dict via
     modelLoadStateDict, skips training, writes inference outputs)
  4. uv-run examples/_shared/compare_predictions.py per example —
     exact match for HAR int32, allclose (rtol=1e-4, atol=1e-5) for
     ECG float32

The job fails the CI if the new factories produce different inference
outputs than PyTorch with the same weights — catches factory-wiring
regressions immediately.

Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
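Below is a C rendering of the two comparison rules. It assumes `compare_predictions.py` uses NumPy `allclose` semantics, where the tolerance scales with the reference value: `|actual - reference| <= atol + rtol * |reference|`. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static double absd(double x) { return x < 0 ? -x : x; }

/* allclose-style check against the PyTorch reference. A NaN on either
 * side makes the comparison false, so NaN never counts as close. */
static bool allCloseF32(const float *actual, const float *reference, size_t n,
                        double rtol, double atol) {
    for (size_t i = 0; i < n; i++) {
        double a = actual[i];
        double b = reference[i];
        if (!(absd(a - b) <= atol + rtol * absd(b))) {
            return false;
        }
    }
    return true;
}

/* HAR predictions are int32, so the gate is exact equality. */
static bool exactMatchI32(const int32_t *actual, const int32_t *reference, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (actual[i] != reference[i]) {
            return false;
        }
    }
    return true;
}
```

For the ECG gate this would be called as `allCloseF32(out, ref, n, 1e-4, 1e-5)`.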
LeoBuron force-pushed the layer-factory-api-pr2 branch from 4705c7b to 9798260 on May 23, 2026 09:58