feat(userApi): layer factory API PR 2 — Conv1d, ConvT1d, Pool1d, Softmax + bit-parity CI#175
Open
LeoBuron wants to merge 4 commits into
2588296 to b6a8a51
… bias memcpy
Mirrors the CONV1D case but routes through conv1dTransposedConfig_t.
Caller-side weight buffer shape is [inChannels, outChannels/groups,
kernelSize]; the SWAP relative to Conv1d is documented in the
Conv1dTransposedApi header.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
refactor(userApi): rename Softmax factories to *Legacy for new API coexistence
Frees the canonical softmaxLayerInit / freeSoftmaxLayer names. Legacy
bodies are functionally unchanged except for the explicit
softmaxConfig->ownsQuantizations = false (defensive; matches the calloc
default).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
feat(userApi): declare conv1dInit_t and new Conv1d factory signatures
Adds the per-layer init struct, Borrowing/Owning factory decls, and
freeConv1dLayer decl. _Static_assert guards that paddingType_t::VALID
remains enum value 0 so .padding zero-init defaults to VALID. No impl
yet; that follows in the next two commits.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4
test(userApi): failing tests for new conv1dLayerInit Borrowing variant
Four tests: shape correctness with explicit fields, BIAS_DEFAULT
resolution, BIAS_FALSE leaves bias NULL, padding/stride/dilation/groups
zero-init defaults (VALID/1/1/1). Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4, 5.1
feat(userApi): declare conv1dTransposedInit_t and Conv1dTransposed factory signatures
New Conv1dTransposedApi.h header with conv1dTransposedInit_t struct,
Borrowing/Owning factory decls, freeConv1dTransposedLayer decl, and
_Static_assert(VALID == 0). Stub .c file registered in CMake.
Implementation in subsequent commits.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 3.4, 4
test(userApi): failing tests for new conv1dTransposedLayerInit Borrowing
Three tests: shape correctness (with inChannels/outChannels weight shape
SWAP relative to Conv1d), BIAS_FALSE leaves bias NULL, outputPadding
propagates to internal config. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3, 4
feat(userApi): declare Pool1d factory signatures (Max + Avg)
Splits the spec's shared pool1dInit_t into maxPool1dInit_t (with
inputChannels + inputLength for argmax pre-allocation) and
avgPool1dInit_t (no input geometry, no dilation). Both factory pairs
declared in one header; impl stubbed in Pool1dApi.c for the CMake graph.
Per-layer init code follows.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (with documented split), 3.4, 4
test(userApi): failing tests for maxPool1dLayerInit Borrowing + Owning
Four tests: kernel + argmax shape correctness, stride defaulting to
kernelSize (PyTorch pool convention), Owning deep-copy of forwardMath
and backwardMath into the two pool config slots (forwardQ + propLossQ),
and a leak-check loop. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4
test(userApi): failing tests for avgPool1dLayerInit Borrowing + Owning
Three tests: kernel correctness, stride defaulting to kernelSize, and
Owning deep-copy. AvgPool has no dilation (struct field omitted) and no
argmax tensor (no input geometry required).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
test(userApi): failing tests for layerLoadWeights CONV1D case
Two tests: weight + bias memcpy, and no-bias accepts NULL biasData.
Fails because the current CONV1D dispatch is the PR 1 PRINT_ERROR stub.
Implementation in next commit.
Also adds TensorApi include + MORE_LIBS entry (provides
freeQuantization, which the conv1d tests need but the plan omitted).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
feat(userApi): layerLoadWeights CONV1D dispatch - weight + bias memcpy
Replaces PR 1's TODO stub. Same shape as the LINEAR case: memcpy from
the caller-provided float* buffer into the factory-allocated tensor
data, with bias presence/absence enforcement matching the bool resolved
from conv1dInit_t::bias. Also links Conv1d into LayerWeightsApi CMake
target and adds TensorApi to the test's MORE_LIBS (provides
freeQuantization, omitted from spec).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
test(userApi): failing test for layerLoadWeights CONV1D_TRANSPOSED case
Verifies weight + bias memcpy into the factory-allocated
Conv1dTransposed parameter tensors. Fails because the current dispatch
is a PRINT_ERROR + exit stub.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 8.1
test(userApi): failing tests for shared deepCopyQuantization in LayerQuant
Three tests cover the null-input shortcut, FLOAT32 (no qConfig), and
SYM_INT32 (qConfig bytes duplicated). Fails at link until the impl
lands in LayerQuant.c.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3
refactor(userApi): extract deepCopyQuantization to shared LayerQuant utility
PR 1 inlined deepCopyQuantization into LinearApi.c and an equivalent
reluDeepCopyQuantization into ReluApi.c. Hoists both into a single
externally-linked function in LayerQuant.c. LinearApi and ReluApi now
share the same code path; the new Conv1d / Conv1dTransposed / Pool1d /
Softmax Owning factories that follow in this PR use it directly.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 5.2, 5.3
refactor(userApi): rename Conv1d factories to *Legacy for new API coexistence
Frees the canonical conv1dLayerInit / freeConv1dLayer names for the new
conv1dInit_t-based factories landing in this PR. Legacy bodies are
functionally unchanged; only adds
'conv1dConfig->ownsQuantizations = false' (no behavior change since the
new field defaults to false via calloc anyway, but explicit is clearer).
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md
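The three deepCopyQuantization test cases named in the commits above (NULL input, FLOAT32 with no qConfig, SYM_INT32 with qConfig bytes duplicated) suggest roughly the following shape for the shared helper. This is a minimal sketch, not the code in LayerQuant.c: the struct layout, the qConfigSize field, and every identifier ending in Sketch are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the project's quantization types. */
typedef enum { Q_FLOAT32_SKETCH, Q_SYM_INT32_SKETCH } qtypeSketch_t;

typedef struct {
    qtypeSketch_t type;
    void *qConfig;       /* NULL when the type carries no config (FLOAT32) */
    size_t qConfigSize;  /* number of bytes behind qConfig (assumed field) */
} quantizationSketch_t;

/* Returns a fresh heap copy of src, duplicating the qConfig bytes if any.
 * NULL input yields NULL (the "null-input shortcut"). */
quantizationSketch_t *deepCopyQuantizationSketch(const quantizationSketch_t *src) {
    if (src == NULL) {
        return NULL;
    }
    quantizationSketch_t *dst = calloc(1, sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    dst->type = src->type;
    if (src->qConfig != NULL && src->qConfigSize > 0) {
        dst->qConfig = malloc(src->qConfigSize);
        if (dst->qConfig == NULL) {
            free(dst);
            return NULL;
        }
        memcpy(dst->qConfig, src->qConfig, src->qConfigSize);
        dst->qConfigSize = src->qConfigSize;
    }
    return dst;
}

/* Releases a copy produced by deepCopyQuantizationSketch. */
void freeQuantizationSketch(quantizationSketch_t *q) {
    if (q == NULL) {
        return;
    }
    free(q->qConfig);
    free(q);
}
```

Because the copy owns its own qConfig bytes, the caller's original quantization can be released right after the copy is made, which is the property the Owning factories rely on.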
…SoftmaxLayer
Factory takes layerQuant_t* and stores .forwardMath / .backwardMath as
the layer's forward/backward quantizations. Owning deep-copies both via
deepCopyQuantization. freeSoftmaxLayer reads ownsQuantizations to decide
whether to also tear down the two quantizations and qConfigs.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5
feat(layer): add ownsQuantizations flag to five internal layer configs
Mirrors the field PR 1 added to linearConfig_t and reluConfig_t.
Foundation for the new factory API in subsequent commits: each new
*LayerInitOwning sets the flag to true and the canonical free*Layer
branches on it. Calloc-backed allocation makes the default false, which
preserves the existing borrowing semantics for legacy callers.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.3
feat(layer): implement conv1dLayerInit Borrowing variant + freeConv1dLayer
Factory allocates kernel, weight, and bias internally. KAIMING_UNIFORM
weights / ZEROS bias (calloc-implicit). Stores the four lq quantization
pointers verbatim; sets ownsQuantizations=false. freeConv1dLayer tears
down parameters + kernel unconditionally and the quantizations only when
ownsQuantizations=true (defensive dedup against pointer aliasing). Fixes
the pre-existing layer->config leak that the legacy free* path still has.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3
test(layer): failing tests for conv1dLayerInitOwning
Verifies deep copy of all four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown without leaks.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2
feat(layer): implement conv1dLayerInitOwning (deep-copy variant)
Factory deep-copies each of the four quantization_t in lq via the shared
deepCopyQuantization helper. Always four separate copies (no aliasing),
keeping freeConv1dLayer simple.
Caller can drop lq + all four quantizations immediately after the call.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2
feat(layer): implement conv1dTransposedLayerInit Borrowing variant + freeConv1dTransposedLayer
Allocates kernel, weights ([inChannels, outChannels/groups,
kernelSize]), optional bias. KAIMING_UNIFORM weight init / ZEROS bias.
Stores the four lq pointers verbatim. freeConv1dTransposedLayer tears
down parameters + kernel unconditionally and quantizations only when
ownsQuantizations=true.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5.1, 5.3
test(layer): failing tests for conv1dTransposedLayerInitOwning
Verifies deep-copy of the four quantization_t into fresh allocations,
ownsQuantizations=true, and clean teardown.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2
feat(layer): implement conv1dTransposedLayerInitOwning (deep-copy variant)
Deep-copies each of the four quantization_t via deepCopyQuantization.
Always four separate copies (no aliasing). Caller can drop lq + all four
quantizations immediately after the call.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 5.2
feat(layer): implement maxPool1dLayerInit Borrowing + Owning + freeMaxPool1dLayer
Factory pre-allocates kernel + argmaxIndices INT32 tensor (shape
[1, inputChannels, outputLength]). outputLength derived via
computePool1dOutputLength replicating the geometry rule from
windowGeometry1dCalc. Stride defaults to kernelSize (PyTorch
convention). Owning deep-copies forwardMath + backwardMath into the
config's forwardQ + propLossQ slots.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 3.3 (split), 4, 5
feat(layer): implement avgPool1dLayerInit Borrowing + Owning + freeAvgPool1dLayer
Factory pre-allocates kernel only (no argmax). Stride defaults to
kernelSize.
Owning deep-copies forwardMath + backwardMath into the forwardQ +
propLossQ slots.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md sections 4, 5
test(layer): failing tests for new softmaxLayerInit Borrowing + Owning
Two tests: Borrowing stores lq pointers verbatim with
ownsQuantizations=false; Owning deep-copies them with
ownsQuantizations=true. Fails at link until impl lands.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 4
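The Borrowing/Owning split described in these commits comes down to one flag that the free function branches on. The sketch below illustrates that contract for a two-quantization layer such as Softmax. It is a hedged illustration only: the types, field names, and functions are hypothetical stand-ins, and the real factories take a layerQuant_t* rather than two separate pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified quantization record. */
typedef struct { int bits; } quantSketch_t;

typedef struct {
    quantSketch_t *forwardQ;
    quantSketch_t *backwardQ;
    bool ownsQuantizations;  /* false after calloc, i.e. borrowing by default */
} softmaxConfigSketch_t;

static quantSketch_t *copyQuantSketch(const quantSketch_t *src) {
    quantSketch_t *dst = malloc(sizeof *dst);
    if (dst != NULL) {
        *dst = *src;
    }
    return dst;
}

/* Borrowing: store the caller's pointers verbatim; the caller keeps ownership. */
softmaxConfigSketch_t *softmaxInitBorrowingSketch(quantSketch_t *fwd, quantSketch_t *bwd) {
    softmaxConfigSketch_t *cfg = calloc(1, sizeof *cfg);
    if (cfg == NULL) {
        return NULL;
    }
    cfg->forwardQ = fwd;
    cfg->backwardQ = bwd;
    return cfg;
}

/* Owning: take private copies so the caller may free its originals at once. */
softmaxConfigSketch_t *softmaxInitOwningSketch(const quantSketch_t *fwd, const quantSketch_t *bwd) {
    softmaxConfigSketch_t *cfg = calloc(1, sizeof *cfg);
    if (cfg == NULL) {
        return NULL;
    }
    cfg->forwardQ = copyQuantSketch(fwd);
    cfg->backwardQ = copyQuantSketch(bwd);
    if (cfg->forwardQ == NULL || cfg->backwardQ == NULL) {
        free(cfg->forwardQ);
        free(cfg->backwardQ);
        free(cfg);
        return NULL;
    }
    cfg->ownsQuantizations = true;
    return cfg;
}

/* Frees the config always, the quantizations only when the layer owns them. */
void freeSoftmaxSketch(softmaxConfigSketch_t *cfg) {
    if (cfg == NULL) {
        return;
    }
    if (cfg->ownsQuantizations) {
        free(cfg->forwardQ);
        if (cfg->backwardQ != cfg->forwardQ) {  /* defensive dedup against aliasing */
            free(cfg->backwardQ);
        }
    }
    free(cfg);
}
```

Keeping a single free function per layer, rather than one per ownership mode, means callers cannot pair the wrong free with the wrong factory.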
…AR + ECG)
train_pytorch.py for both examples now writes the trained model's
per-layer weight + bias tensors to examples/<dir>/weights/. This is the
input to the v2 binary's BIT_PARITY mode introduced in PR 2 Tasks 20/21,
which loads these via modelLoadStateDict and runs inference for the CI
bit-parity gate.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
feat(examples): add har_classifier_v2 using new factory API
Same architecture as the legacy har_classifier binary (Conv->ReLU->Pool
x3 + Flatten + Linear + Softmax) but constructed via conv1dLayerInit,
reluLayerInit, maxPool1dLayerInit, avgPool1dLayerInit, flattenLayerInit,
linearLayerInit, softmaxLayerInit. Shares the legacy data directory.
Outputs to examples/har_classifier_v2/{logs,outputs}/. Supports
BIT_PARITY env-var mode (used by the bit-parity CI step) which loads
PyTorch state_dict via modelLoadStateDict and skips training.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10 (coexistence strategy Z)
feat(examples): add ecg_anomaly_ae_v2 using new factory API
Encoder/decoder AE same as legacy ecg_anomaly_ae but built via
conv1dLayerInit / reluLayerInit / maxPool1dLayerInit / avgPool1dLayerInit
/ conv1dTransposedLayerInit. Supports BIT_PARITY env-var mode using
modelLoadStateDict.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
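Both v2 binaries switch behavior on the BIT_PARITY environment variable. One way such a switch could be written is sketched below. The parsing rule (unset, empty, or "0" means disabled) and all function and enum names are assumptions; the commits only state that the mode is selected by an env var.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { RUN_MODE_TRAIN, RUN_MODE_BIT_PARITY } runModeSketch_t;

/* Pure helper: interprets the raw value of an env var as a boolean flag.
 * Assumed rule: NULL (unset), "" and "0" are off; anything else is on. */
bool flagValueEnabled(const char *value) {
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Reads BIT_PARITY from the environment and picks the run mode.
 * In parity mode a binary would load the exported weights and run
 * inference only; otherwise it would train as usual. */
runModeSketch_t selectRunMode(void) {
    return flagValueEnabled(getenv("BIT_PARITY")) ? RUN_MODE_BIT_PARITY
                                                  : RUN_MODE_TRAIN;
}
```

Separating the value check from the getenv call keeps the parsing rule testable without mutating the process environment.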
…ch reference
New job c-bit-parity runs in parallel with c-build-and-test. Steps:
1. PyTorch trains HAR + ECG (emits pytorch_*.npy + per-layer weights)
2. Builds the two v2 binaries via cmake --preset examples
3. Runs both v2 binaries with BIT_PARITY=1 (loads state_dict via
modelLoadStateDict, skips training, writes inference outputs)
4. uv-run examples/_shared/compare_predictions.py per example —
exact match for HAR int32, allclose (rtol=1e-4, atol=1e-5) for
ECG float32
The job fails the CI if the new factories produce different inference
outputs than PyTorch with the same weights — catches factory-wiring
regressions immediately.
Refs spec: docs/superpowers/specs/2026-05-15-layer-factory-api-design.md section 10
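The two comparison rules in step 4 can be written down compactly. The actual check lives in the Python script compare_predictions.py; the C sketch below only illustrates the arithmetic, using the element-wise rule that NumPy documents for allclose, |actual - reference| <= atol + rtol * |reference|. Treating the PyTorch output as the reference argument is an assumption, as are the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Exact element-wise equality, as used for the HAR int32 class predictions. */
bool exactMatchInt32(const int32_t *actual, const int32_t *reference, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (actual[i] != reference[i]) {
            return false;
        }
    }
    return true;
}

/* Tolerance check in the style of numpy.allclose, as used for the ECG
 * float32 reconstructions. NaN in either input counts as a mismatch. */
bool allCloseFloat32(const float *actual, const float *reference, size_t n,
                     double rtol, double atol) {
    for (size_t i = 0; i < n; ++i) {
        double diff = (double)actual[i] - (double)reference[i];
        if (diff < 0.0) {
            diff = -diff;
        }
        double magnitude = (double)reference[i];
        if (magnitude < 0.0) {
            magnitude = -magnitude;
        }
        if (!(diff <= atol + rtol * magnitude)) {
            return false;
        }
    }
    return true;
}
```

With rtol=1e-4 and atol=1e-5, a reference value of 1.0 tolerates an absolute deviation of about 1.1e-4, while a reference value of 100.0 tolerates about 1.0e-2.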
4705c7b to 9798260
Summary
Second of the two-PR series migrating the layer construction API to
designated-initializer-based factories (PR 1 landed Linear + ReLU +
Flatten + infrastructure). This PR:
- … MaxPool1d, AvgPool1d, and Softmax, following the same pattern as PR 1's Linear / ReLU.
- … `layerLoadWeights` primitive that PR 1 stubbed.
- … `deepCopyQuantization` helpers that PR 1 introduced into `LinearApi.c` and `ReluApi.c` into a shared utility in `LayerQuant.{h,c}`, exercised by every new Owning factory.
- … `_v2` directories (coexistence strategy Z, no rename churn). Each v2 binary supports a `BIT_PARITY` env-var mode that loads the PyTorch state_dict via `modelLoadStateDict` and runs inference, skipping training.
- … `c-bit-parity` CI job that trains PyTorch (emitting per-layer weight `.npy` files), runs the v2 binaries with `BIT_PARITY=1`, and diffs their inference outputs against the PyTorch reference predictions/reconstructions.
- … `Conv1dTransposedApi.h`, `Pool1dApi.h`.
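The designated-initializer pattern works because any field the caller omits is zero-initialized, so the factory can treat zero as "use the default". The sketch below illustrates that idea with the defaults the Conv1d tests describe (VALID/1/1/1) and a `_Static_assert` that pins the zero value of the padding enum. The struct, enum, and function names are hypothetical; the real `conv1dInit_t` has more fields (for example bias) and its layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical padding enum; VALID must stay at 0 for zero-init to mean VALID. */
typedef enum { PAD_VALID_SKETCH = 0, PAD_SAME_SKETCH = 1 } paddingSketch_t;
_Static_assert(PAD_VALID_SKETCH == 0, "zero-initialized .padding must mean VALID");

/* Hypothetical init struct: omitted fields are zero and resolved to defaults. */
typedef struct {
    size_t inChannels;
    size_t outChannels;
    size_t kernelSize;
    size_t stride;    /* 0 means default 1 */
    size_t dilation;  /* 0 means default 1 */
    size_t groups;    /* 0 means default 1 */
    paddingSketch_t padding;
} conv1dInitSketch_t;

static size_t valueOrDefault(size_t value, size_t fallback) {
    return value == 0 ? fallback : value;
}

/* Returns a copy of init with every zero-valued optional field defaulted. */
conv1dInitSketch_t resolveConv1dDefaults(conv1dInitSketch_t init) {
    init.stride = valueOrDefault(init.stride, 1);
    init.dilation = valueOrDefault(init.dilation, 1);
    init.groups = valueOrDefault(init.groups, 1);
    return init;
}
```

A call site then names only the fields it cares about, for example `resolveConv1dDefaults((conv1dInitSketch_t){ .inChannels = 3, .outChannels = 8, .kernelSize = 5 })`, and adding a new optional field later does not break existing callers.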
Design refinements from the spec
(without). Spec section 3.3 anticipated this split for the
`ceil_mode` case; we trigger it now because the argmax-sizing
requirement is structurally similar.
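The argmax-sizing requirement means the MaxPool1d factory has to know the output length at construction time. A hedged sketch of that computation follows, using the floor-mode formula PyTorch documents for MaxPool1d with zero padding: L_out = floor((L_in - dilation * (kernelSize - 1) - 1) / stride) + 1. Whether the project's own geometry rule matches this exactly is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Output length of a 1-D pooling window without padding (floor mode).
 * stride == 0 defaults to kernelSize, dilation == 0 defaults to 1.
 * Returns 0 when the dilated window does not fit into the input. */
size_t pool1dOutputLengthSketch(size_t inputLength, size_t kernelSize,
                                size_t stride, size_t dilation) {
    if (kernelSize == 0) {
        return 0;
    }
    if (stride == 0) {
        stride = kernelSize;
    }
    if (dilation == 0) {
        dilation = 1;
    }
    size_t span = dilation * (kernelSize - 1) + 1;  /* input samples one window covers */
    if (inputLength < span) {
        return 0;
    }
    return (inputLength - span) / stride + 1;
}

/* Element count of an argmax tensor shaped [1, inputChannels, outputLength]. */
size_t argmaxElementCountSketch(size_t inputChannels, size_t outputLength) {
    return inputChannels * outputLength;
}
```

AvgPool1d needs no such tensor, which is why its init struct can omit the input geometry entirely.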
Test plan
References
rebase onto `develop` after PR 1 lands)
🤖 Generated with Claude Code