feat(userApi): layer factory API PR 2 — Conv1d, ConvT1d, Pool1d, Softmax + bit-parity CI by LeoBuron · Pull Request #175 · es-ude/OnDeviceTraining

LeoBuron · 2026-05-15T20:26:10Z

Summary

Second of the two-PR series migrating the layer construction API to
designated-initializer-based factories (PR 1 landed Linear + ReLU +
Flatten + infrastructure). This PR:

- Adds Borrowing + Owning factories for Conv1d, Conv1dTransposed,
  MaxPool1d, AvgPool1d, and Softmax, following the same pattern as
  PR 1's Linear / ReLU.
- Wires CONV1D and CONV1D_TRANSPOSED dispatch into the
  layerLoadWeights primitive that PR 1 stubbed.
- Extracts the per-file deepCopyQuantization helpers that PR 1
  introduced into LinearApi.c and ReluApi.c into a shared utility
  in LayerQuant.{h,c}, exercised by every new Owning factory.
- Migrates the two example training binaries to new _v2 directories
  (coexistence strategy Z, no rename churn). Each v2 binary supports
  a BIT_PARITY env-var mode that loads the PyTorch state_dict via
  modelLoadStateDict and runs inference, skipping training.
- Adds a c-bit-parity CI job that trains PyTorch (emitting per-layer
  weight .npy files), runs the v2 binaries with BIT_PARITY=1, and
  diffs their inference outputs against the PyTorch reference
  predictions/reconstructions.
- Adds `_Static_assert(VALID == 0, ...)` guards in `Conv1dApi.h`,
  `Conv1dTransposedApi.h`, `Pool1dApi.h`.
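
The `_Static_assert(VALID == 0, ...)` guard matters because fields omitted from a designated initializer are zero-initialized, so `.padding` silently becomes whatever enumerator has value 0. A minimal sketch of that interaction, assuming a 0-means-default convention for stride/dilation/groups; every `sketch*` / `SKETCH_*` name here is a hypothetical stand-in, not the actual `conv1dInit_t` or `paddingType_t`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for paddingType_t; the real enum may have other members. */
typedef enum { SKETCH_VALID = 0, SKETCH_SAME = 1 } sketchPadding_t;

/* Compile-time guard: a zero-initialized .padding field must mean VALID. */
_Static_assert(SKETCH_VALID == 0, "zero-initialized .padding must mean VALID");

/* Hypothetical stand-in for conv1dInit_t; field names are assumptions. */
typedef struct {
    size_t inChannels;
    size_t outChannels;
    size_t kernelSize;
    size_t stride;   /* 0 is treated as "not set" */
    size_t dilation; /* 0 is treated as "not set" */
    size_t groups;   /* 0 is treated as "not set" */
    sketchPadding_t padding;
} sketchConv1dInit_t;

/* Resolves unset (zero) fields to the defaults 1/1/1; padding needs no
 * resolution because the static assert pins VALID to 0. */
static sketchConv1dInit_t sketchResolveDefaults(sketchConv1dInit_t init) {
    if (init.stride == 0) { init.stride = 1; }
    if (init.dilation == 0) { init.dilation = 1; }
    if (init.groups == 0) { init.groups = 1; }
    return init;
}
```

A caller that writes only `{.inChannels = 3, .outChannels = 8, .kernelSize = 5}` would then end up with VALID padding and stride, dilation, and groups of 1, matching the defaults the Conv1d tests describe.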

Design refinements from the spec

`pool1dInit_t` is split into `maxPool1dInit_t` (with `inputChannels`
+ `inputLength` for argmax pre-allocation) and `avgPool1dInit_t`
(without). Spec section 3.3 anticipated this split for the
`ceil_mode` case; we trigger it now because the argmax-sizing
requirement is structurally similar.
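
To illustrate why MaxPool needs the input geometry up front, here is a rough sketch of the sizing arithmetic. It assumes the unpadded, floor-mode output-length rule that PyTorch documents for `MaxPool1d`; the repository's own `computePool1dOutputLength` is not shown in this PR text, so the helper names and signatures below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: output length of a 1-D pooling window without padding,
 * floor mode. A stride of 0 falls back to kernelSize (PyTorch pool convention);
 * a dilation of 0 falls back to 1. Returns 0 when the window does not fit. */
static size_t sketchPool1dOutputLength(size_t inputLength, size_t kernelSize,
                                       size_t stride, size_t dilation) {
    if (kernelSize == 0) { return 0; }
    if (stride == 0) { stride = kernelSize; }
    if (dilation == 0) { dilation = 1; }
    size_t span = dilation * (kernelSize - 1) + 1; /* effective window width */
    if (inputLength < span) { return 0; }
    return (inputLength - span) / stride + 1;
}

/* Element count of an argmax tensor shaped [1, inputChannels, outputLength]. */
static size_t sketchArgmaxElementCount(size_t inputChannels, size_t outputLength) {
    return inputChannels * outputLength;
}
```

AvgPool needs neither value at construction time because it stores no argmax indices, which is the reason given for dropping the input-geometry fields from `avgPool1dInit_t`.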

Test plan

- `ctest --preset unit_test_debug`: 50/50 tests pass (47 from PR 1 + 3 new test files: UnitTestConv1dApi, UnitTestConv1dTransposedApi, UnitTestPool1dApi; plus new tests added to UnitTestLayerQuant, UnitTestLayerWeightsApi, UnitTestSoftmax).
- `ctest --preset unit_test_error`: 50/50 pass at strictest log level.
- Both v2 example binaries build cleanly (`train_c_har_classifier_v2`, `train_c_ecg_anomaly_ae_v2`).
- Both legacy example binaries still build (proves the rename sweeps are correct).
- alloc-locality grep gate passes.
- clang-format dry-run is clean.
- CI `c-bit-parity` job goes green on the PR.

References

- Spec: `docs/superpowers/specs/2026-05-15-layer-factory-api-design.md` (gitignored)
- Prior PR: `layer-factory-api-pr1` (this branch is stacked on top;
  rebase onto `develop` after PR 1 lands)

🤖 Generated with Claude Code

… bias memcpy
Mirrors the CONV1D case but routes through conv1dTransposedConfig_t. Caller-side weight buffer shape is [inChannels, outChannels/groups, kernelSize]; the SWAP relative to Conv1d is documented in the Conv1dTransposedApi header.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

refactor(userApi): rename Softmax factories to *Legacy for new API coexistence
Frees the canonical softmaxLayerInit / freeSoftmaxLayer names. Legacy bodies are functionally unchanged except for the explicit softmaxConfig->ownsQuantizations = false (defensive; matches the calloc default).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md

feat(userApi): declare conv1dInit_t and new Conv1d factory signatures
Adds the per-layer init struct, Borrowing/Owning factory decls, and freeConv1dLayer decl. _Static_assert guards that paddingType_t::VALID remains enum value 0 so .padding zero-init defaults to VALID. No impl yet; that follows in the next two commits.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4

test(userApi): failing tests for new conv1dLayerInit Borrowing variant
Four tests: shape correctness with explicit fields, BIAS_DEFAULT resolution, BIAS_FALSE leaves bias NULL, padding/stride/dilation/groups zero-init defaults (VALID/1/1/1). Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4, 5.1

feat(userApi): declare conv1dTransposedInit_t and Conv1dTransposed factory signatures
New Conv1dTransposedApi.h header with conv1dTransposedInit_t struct, Borrowing/Owning factory decls, freeConv1dTransposedLayer decl, and _Static_assert(VALID == 0). Stub .c file registered in CMake. Implementation in subsequent commits.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4

test(userApi): failing tests for new conv1dTransposedLayerInit Borrowing
Three tests: shape correctness (with inChannels/outChannels weight shape SWAP relative to Conv1d), BIAS_FALSE leaves bias NULL, outputPadding propagates to internal config. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4

feat(userApi): declare Pool1d factory signatures (Max + Avg)
Splits the spec's shared pool1dInit_t into maxPool1dInit_t (with inputChannels + inputLength for argmax pre-allocation) and avgPool1dInit_t (no input geometry, no dilation). Both factory pairs declared in one header; impl stubbed in Pool1dApi.c for the CMake graph. Per-layer init code follows.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (with documented split), 3.4, 4

test(userApi): failing tests for maxPool1dLayerInit Borrowing + Owning
Four tests: kernel + argmax shape correctness, stride defaulting to kernelSize (PyTorch pool convention), Owning deep-copy of forwardMath and backwardMath into the two pool config slots (forwardQ + propLossQ), and a leak-check loop. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4

test(userApi): failing tests for avgPool1dLayerInit Borrowing + Owning
Three tests: kernel correctness, stride defaulting to kernelSize, and Owning deep-copy. AvgPool has no dilation (struct field omitted) and no argmax tensor (no input geometry required).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4

test(userApi): failing tests for layerLoadWeights CONV1D case
Two tests: weight + bias memcpy, and no-bias accepts NULL biasData. Fails because the current CONV1D dispatch is the PR 1 PRINT_ERROR stub. Implementation in next commit. Also adds TensorApi include + MORE_LIBS entry (provides freeQuantization, which the conv1d tests need but the plan omitted).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

feat(userApi): layerLoadWeights CONV1D dispatch — weight + bias memcpy
Replaces PR 1's TODO stub. Same shape as the LINEAR case: memcpy from the caller-provided float* buffer into the factory-allocated tensor data, with bias presence/absence enforcement matching the bool resolved from conv1dInit_t::bias. Also links Conv1d into LayerWeightsApi CMake target and adds TensorApi to the test's MORE_LIBS (provides freeQuantization, omitted from spec).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing test for layerLoadWeights CONV1D_TRANSPOSED case
Verifies weight + bias memcpy into the factory-allocated Conv1dTransposed parameter tensors. Fails because the current dispatch is a PRINT_ERROR + exit stub.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1

test(userApi): failing tests for shared deepCopyQuantization in LayerQuant
Three tests cover the null-input shortcut, FLOAT32 (no qConfig), and SYM_INT32 (qConfig bytes duplicated). Fails at link until the impl lands in LayerQuant.c.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): extract deepCopyQuantization to shared LayerQuant utility
PR 1 inlined deepCopyQuantization into LinearApi.c and an equivalent reluDeepCopyQuantization into ReluApi.c. Hoists both into a single externally-linked function in LayerQuant.c. LinearApi and ReluApi now share the same code path; the new Conv1d / Conv1dTransposed / Pool1d / Softmax Owning factories that follow in this PR use it directly.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3

refactor(userApi): rename Conv1d factories to *Legacy for new API coexistence
Frees the canonical conv1dLayerInit / freeConv1dLayer names for the new conv1dInit_t-based factories landing in this PR. Legacy bodies are functionally unchanged; only adds 'conv1dConfig->ownsQuantizations = false' (no behavior change since the new field defaults to false via calloc anyway, but explicit is clearer).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
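
The layerLoadWeights commits above describe a memcpy into factory-allocated tensors plus a bias presence/absence check. A minimal sketch of that idea follows, under stated assumptions: the `sketch*` types and functions are hypothetical, the Conv1d weight layout `[outChannels, inChannels/groups, kernelSize]` is taken from the PyTorch convention rather than from this repository, and the sketch reports a mismatch by returning `false` where the real code may log and exit instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a layer's factory-allocated parameters.
 * bias == NULL models a layer constructed without bias. */
typedef struct {
    float *weight;
    size_t weightCount;
    float *bias;
    size_t biasCount;
} sketchConvParams_t;

/* Conv1d weights, assumed layout [outChannels, inChannels/groups, kernelSize]. */
static size_t sketchConv1dWeightCount(size_t inC, size_t outC, size_t groups, size_t k) {
    return outC * (inC / groups) * k;
}

/* Conv1dTransposed weights: [inChannels, outChannels/groups, kernelSize],
 * i.e. the channel roles are swapped relative to Conv1d. */
static size_t sketchConv1dTransposedWeightCount(size_t inC, size_t outC, size_t groups, size_t k) {
    return inC * (outC / groups) * k;
}

/* Copies caller buffers into the layer's tensors. Fails when the caller's
 * bias buffer does not match whether the layer was built with a bias. */
static bool sketchLoadWeights(sketchConvParams_t *p, const float *weightData,
                              const float *biasData) {
    if (p == NULL || p->weight == NULL || weightData == NULL) { return false; }
    bool layerHasBias = (p->bias != NULL);
    bool callerHasBias = (biasData != NULL);
    if (layerHasBias != callerHasBias) { return false; }
    memcpy(p->weight, weightData, p->weightCount * sizeof(float));
    if (layerHasBias) {
        memcpy(p->bias, biasData, p->biasCount * sizeof(float));
    }
    return true;
}
```

With groups > 1 the two element counts differ whenever inChannels != outChannels, which is why the caller-side buffer shape has to follow the layer type.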

…SoftmaxLayer
Factory takes layerQuant_t* and stores .forwardMath / .backwardMath as the layer's forward/backward quantizations. Owning deep-copies both via deepCopyQuantization. freeSoftmaxLayer reads ownsQuantizations to decide whether to also tear down the two quantizations and qConfigs.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

feat(layer): add ownsQuantizations flag to five internal layer configs
Mirrors the field PR 1 added to linearConfig_t and reluConfig_t. Foundation for the new factory API in subsequent commits: each new *LayerInitOwning sets the flag to true and the canonical free*Layer branches on it. Calloc-backed allocation makes the default false, which preserves the existing borrowing semantics for legacy callers.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.3

feat(layer): implement conv1dLayerInit Borrowing variant + freeConv1dLayer
Factory allocates kernel, weight, and bias internally. KAIMING_UNIFORM weights / ZEROS bias (calloc-implicit). Stores the four lq quantization pointers verbatim; sets ownsQuantizations=false. freeConv1dLayer tears down parameters + kernel unconditionally and the quantizations only when ownsQuantizations=true (defensive dedup against pointer aliasing). Fixes the pre-existing layer->config leak that the legacy free* path still has.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dLayerInitOwning
Verifies deep copy of all four quantization_t into fresh allocations, ownsQuantizations=true, and clean teardown without leaks.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dLayerInitOwning (deep-copy variant)
Factory deep-copies each of the four quantization_t in lq via the shared deepCopyQuantization helper. Always four separate copies (no aliasing), keeping freeConv1dLayer simple. Caller can drop lq + all four quantizations immediately after the call.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInit Borrowing variant + freeConv1dTransposedLayer
Allocates kernel, weights ([inChannels, outChannels/groups, kernelSize]), optional bias. KAIMING_UNIFORM weight init / ZEROS bias. Stores the four lq pointers verbatim. freeConv1dTransposedLayer tears down parameters + kernel unconditionally and quantizations only when ownsQuantizations=true.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3

test(layer): failing tests for conv1dTransposedLayerInitOwning
Verifies deep-copy of the four quantization_t into fresh allocations, ownsQuantizations=true, and clean teardown.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement conv1dTransposedLayerInitOwning (deep-copy variant)
Deep-copies each of the four quantization_t via deepCopyQuantization. Always four separate copies (no aliasing). Caller can drop lq + all four quantizations immediately after the call.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2

feat(layer): implement maxPool1dLayerInit Borrowing + Owning + freeMaxPool1dLayer
Factory pre-allocates kernel + argmaxIndices INT32 tensor (shape [1, inputChannels, outputLength]). outputLength derived via computePool1dOutputLength replicating the geometry rule from windowGeometry1dCalc. Stride defaults to kernelSize (PyTorch convention). Owning deep-copies forwardMath + backwardMath into the config's forwardQ + propLossQ slots.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4, 5

feat(layer): implement avgPool1dLayerInit Borrowing + Owning + freeAvgPool1dLayer
Factory pre-allocates kernel only (no argmax). Stride defaults to kernelSize. Owning deep-copies forwardMath + backwardMath into the forwardQ + propLossQ slots.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5

test(layer): failing tests for new softmaxLayerInit Borrowing + Owning
Two tests: Borrowing stores lq pointers verbatim with ownsQuantizations=false; Owning deep-copies them with ownsQuantizations=true. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
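
The Borrowing/Owning split described in these commits can be sketched as follows. This is an illustration under assumptions, not the repository's code: the struct layouts (a quantization carrying an opaque `qConfig` byte buffer with an explicit size) and all `sketch*` names are hypothetical; only the idea of a deep copy plus an `ownsQuantizations` flag that the free function branches on comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical quantization: a type tag plus optional opaque config bytes. */
typedef struct {
    int type;
    void *qConfig;      /* NULL when the type needs no config */
    size_t qConfigSize;
} sketchQuantization_t;

typedef struct {
    sketchQuantization_t *forwardQ;
    sketchQuantization_t *backwardQ;
    bool ownsQuantizations; /* false after calloc, i.e. borrowing by default */
} sketchLayerConfig_t;

static void sketchFreeQuantization(sketchQuantization_t *q) {
    if (q == NULL) { return; }
    free(q->qConfig);
    free(q);
}

/* Deep copy: NULL in gives NULL out; config bytes are duplicated if present. */
static sketchQuantization_t *sketchDeepCopyQuantization(const sketchQuantization_t *src) {
    if (src == NULL) { return NULL; }
    sketchQuantization_t *dst = calloc(1, sizeof *dst);
    if (dst == NULL) { return NULL; }
    dst->type = src->type;
    if (src->qConfig != NULL && src->qConfigSize > 0) {
        dst->qConfig = malloc(src->qConfigSize);
        if (dst->qConfig == NULL) { free(dst); return NULL; }
        memcpy(dst->qConfig, src->qConfig, src->qConfigSize);
        dst->qConfigSize = src->qConfigSize;
    }
    return dst;
}

/* Borrowing: stores the caller's pointers verbatim; the caller keeps ownership. */
static sketchLayerConfig_t *sketchInitBorrowing(sketchQuantization_t *fwd,
                                                sketchQuantization_t *bwd) {
    sketchLayerConfig_t *cfg = calloc(1, sizeof *cfg);
    if (cfg == NULL) { return NULL; }
    cfg->forwardQ = fwd;
    cfg->backwardQ = bwd;
    cfg->ownsQuantizations = false;
    return cfg;
}

/* Owning: takes separate deep copies, so the caller may drop its originals. */
static sketchLayerConfig_t *sketchInitOwning(const sketchQuantization_t *fwd,
                                             const sketchQuantization_t *bwd) {
    sketchLayerConfig_t *cfg = calloc(1, sizeof *cfg);
    if (cfg == NULL) { return NULL; }
    cfg->forwardQ = sketchDeepCopyQuantization(fwd);
    cfg->backwardQ = sketchDeepCopyQuantization(bwd);
    if ((fwd != NULL && cfg->forwardQ == NULL) || (bwd != NULL && cfg->backwardQ == NULL)) {
        sketchFreeQuantization(cfg->forwardQ);
        sketchFreeQuantization(cfg->backwardQ);
        free(cfg);
        return NULL;
    }
    cfg->ownsQuantizations = true;
    return cfg;
}

/* Frees the config always, the quantizations only when the layer owns them. */
static void sketchFreeConfig(sketchLayerConfig_t *cfg) {
    if (cfg == NULL) { return; }
    if (cfg->ownsQuantizations) {
        sketchFreeQuantization(cfg->forwardQ);
        sketchFreeQuantization(cfg->backwardQ);
    }
    free(cfg);
}
```

Because the Owning path always makes separate copies, the free function never has to detect two slots aliasing the same quantization, which is the simplification the conv1dLayerInitOwning commit points out.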

…AR + ECG)
train_pytorch.py for both examples now writes the trained model's per-layer weight + bias tensors to examples/<dir>/weights/. This is the input to the v2 binary's BIT_PARITY mode introduced in PR 2 Tasks 20/21, which loads these via modelLoadStateDict and runs inference for the CI bit-parity gate.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10

feat(examples): add har_classifier_v2 using new factory API
Same architecture as the legacy har_classifier binary (Conv->ReLU->Pool x3 + Flatten + Linear + Softmax) but constructed via conv1dLayerInit, reluLayerInit, maxPool1dLayerInit, avgPool1dLayerInit, flattenLayerInit, linearLayerInit, softmaxLayerInit. Shares the legacy data directory. Outputs to examples/har_classifier_v2/{logs,outputs}/. Supports BIT_PARITY env-var mode (used by the bit-parity CI step) which loads PyTorch state_dict via modelLoadStateDict and skips training.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 (coexistence strategy Z)

feat(examples): add ecg_anomaly_ae_v2 using new factory API
Encoder/decoder AE same as legacy ecg_anomaly_ae but built via conv1dLayerInit / reluLayerInit / maxPool1dLayerInit / avgPool1dLayerInit / conv1dTransposedLayerInit. Supports BIT_PARITY env-var mode using modelLoadStateDict.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
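
The BIT_PARITY switch mentioned above is an environment variable, so a binary would typically read it with the standard `getenv`. The sketch below shows one plausible way to gate the mode; how the v2 binaries actually parse the value is not stated in this PR text, so treating an unset, empty, or "0" value as disabled is an assumption, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parsing rule: unset, empty, or "0" means disabled; anything else
 * (for example "1", as used by the CI job) means enabled. */
static bool sketchFlagEnabled(const char *value) {
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Reads the BIT_PARITY environment variable via the standard getenv(). */
static bool sketchBitParityRequested(void) {
    return sketchFlagEnabled(getenv("BIT_PARITY"));
}
```

A training binary following this pattern would branch once at startup: when the flag is set it loads the exported weights and runs inference only, otherwise it trains as usual.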

…ch reference
New job c-bit-parity runs in parallel with c-build-and-test. Steps:
1. PyTorch trains HAR + ECG (emits pytorch_*.npy + per-layer weights)
2. Builds the two v2 binaries via cmake --preset examples
3. Runs both v2 binaries with BIT_PARITY=1 (loads state_dict via modelLoadStateDict, skips training, writes inference outputs)
4. uv-run examples/_shared/compare_predictions.py per example: exact match for HAR int32, allclose (rtol=1e-4, atol=1e-5) for ECG float32
The job fails the CI if the new factories produce different inference outputs than PyTorch with the same weights, which catches factory-wiring regressions immediately.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
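
For readers unfamiliar with the two comparison modes in step 4, here is a sketch of what they mean, written in C for consistency with the rest of the codebase. The actual check lives in compare_predictions.py, whose contents are not shown here; the tolerance test below assumes the NumPy `allclose` definition, |actual - reference| <= atol + rtol * |reference|, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Exact element-wise equality, as used for integer class predictions. */
static bool sketchExactMatchInt32(const int32_t *actual, const int32_t *reference, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (actual[i] != reference[i]) { return false; }
    }
    return true;
}

static double sketchAbs(double x) { return x < 0.0 ? -x : x; }

/* Tolerance comparison in the style of NumPy allclose. The negated <= also
 * rejects NaN, since any comparison involving NaN is false. */
static bool sketchAllClose(const float *actual, const float *reference, size_t n,
                           double rtol, double atol) {
    for (size_t i = 0; i < n; ++i) {
        double diff = sketchAbs((double)actual[i] - (double)reference[i]);
        double tol = atol + rtol * sketchAbs((double)reference[i]);
        if (!(diff <= tol)) { return false; }
    }
    return true;
}
```

Exact matching suits the HAR outputs because they are integer class labels, while the ECG reconstructions are float32 values where small rounding differences between the two implementations are expected.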

LeoBuron force-pushed the layer-factory-api-pr1 branch from 2588296 to b6a8a51 on May 23, 2026 09:58

LeoBuron added 4 commits May 23, 2026 11:58

LeoBuron force-pushed the layer-factory-api-pr2 branch from 4705c7b to 9798260 on May 23, 2026 09:58

LeoBuron wants to merge 4 commits into layer-factory-api-pr1 from layer-factory-api-pr2
