inline: LPIR inliner through M5 #27

Open
Yona-Appletree wants to merge 11 commits into main from feature/inline


Created from branch feature/inline

- Stage i (M0 CalleeRef) and stage ii (M1 CompilerConfig + compile-opt) plan files
- Update M1 roadmap: compile-opt directive, middle-end framing, config on all backends

Made-with: Cursor

- Replace flat CalleeRef index with Import(ImportId) and Local(FuncId)
- Store local functions in BTreeMap keyed by FuncId in LpirModule
- Update builder, parse, print, validate, interp, and lpvm-native/wasm/cranelift/emu paths
- Adjust lps-frontend lower and lp-cli shader debug helpers for new CalleeRef
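
A minimal sketch of the shape this change describes, not the actual `lpir` definitions: `ImportId`, `FuncId`, the simplified `Module` stand-in for `LpirModule`, and the `callee_name` helper are all hypothetical and only illustrate a two-variant callee reference with locals keyed by `FuncId`.

```rust
use std::collections::BTreeMap;

// Hypothetical newtypes; the real ImportId / FuncId definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ImportId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FuncId(pub u32);

/// A call target is either an imported function or a module-local one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CalleeRef {
    Import(ImportId),
    Local(FuncId),
}

/// Simplified stand-in for LpirModule: imports stay positional,
/// local functions are keyed by FuncId (here reduced to their names).
#[derive(Debug, Default)]
pub struct Module {
    pub imports: Vec<String>,
    pub functions: BTreeMap<FuncId, String>,
}

impl Module {
    /// Resolve a callee to its name, or None if the reference is dangling.
    pub fn callee_name(&self, callee: CalleeRef) -> Option<&str> {
        match callee {
            CalleeRef::Import(ImportId(i)) => {
                self.imports.get(i as usize).map(String::as_str)
            }
            CalleeRef::Local(id) => self.functions.get(&id).map(String::as_str),
        }
    }
}
```

Keying locals by `FuncId` means a lookup no longer depends on the function's position in a list, so an ID stays valid when other functions are removed.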

Made-with: Cursor

- Add lpir::compiler_config (InlineConfig, apply keys, ConfigError)
- Thread CompilerConfig through native, Cranelift, and WASM compile options
- Filetests: parse // compile-opt, merge into config before compile; strip from GLSL
- Complete stage-ii plan: move to docs/plans-done with summary.md
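
The config flow above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `lpir::compiler_config`: the field layout, the key spelling `inline.small_func_threshold`, the `ConfigError` variants, and the helper `extract_compile_opts` are hypothetical. The directive form `// compile-opt(key, value)` is inferred from the `compile-opt(dead_func_elim.mode, auto)` example later in this description.

```rust
/// Hypothetical subset of the inliner settings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InlineConfig {
    pub small_func_threshold: usize,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompilerConfig {
    pub inline: InlineConfig,
}

impl Default for CompilerConfig {
    fn default() -> Self {
        // 16 is the threshold this PR settles on after tuning.
        Self { inline: InlineConfig { small_func_threshold: 16 } }
    }
}

/// Hypothetical error variants; the real ConfigError may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigError {
    UnknownKey(String),
    InvalidValue { key: String, value: String },
}

impl CompilerConfig {
    /// Apply a single key/value override.
    pub fn apply(&mut self, key: &str, value: &str) -> Result<(), ConfigError> {
        match key {
            // Assumed key spelling.
            "inline.small_func_threshold" => {
                let parsed = value.parse::<usize>().map_err(|_| ConfigError::InvalidValue {
                    key: key.to_string(),
                    value: value.to_string(),
                })?;
                self.inline.small_func_threshold = parsed;
                Ok(())
            }
            _ => Err(ConfigError::UnknownKey(key.to_string())),
        }
    }
}

/// Parse a `// compile-opt(key, value)` line into its key/value pair.
fn parse_directive(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim().strip_prefix("//")?.trim_start();
    let args = rest.strip_prefix("compile-opt(")?.trim_end().strip_suffix(')')?;
    let (key, value) = args.split_once(',')?;
    Some((key.trim(), value.trim()))
}

/// Merge directives into `config` and return the source with those lines stripped.
pub fn extract_compile_opts(
    src: &str,
    config: &mut CompilerConfig,
) -> Result<String, ConfigError> {
    let mut glsl = String::new();
    for line in src.lines() {
        match parse_directive(line) {
            Some((key, value)) => config.apply(key, value)?,
            None => {
                glsl.push_str(line);
                glsl.push('\n');
            }
        }
    }
    Ok(glsl)
}
```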

Made-with: Cursor

Add LpirOp::Continuing for explicit loop-continue labels, and thread it
through the builder, printer, validator, interpreter, const folding, and VM
backends as a no-op control edge.

Add inline_module with callgraph construction, offset recompute, callee-body
remap/splice, and heuristic-driven inlining controlled by CompilerConfig.
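
As a rough illustration of the ordering a bottom-up inliner needs, the sketch below walks a call graph in DFS post-order so that each callee is visited before its callers. It is not `inline_module` itself: the `FuncId` alias, the adjacency-map representation, and the function name `bottom_up_order` are assumptions, and a back edge from a recursive call is simply not followed.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the real FuncId type.
type FuncId = u32;

/// Return functions in callee-before-caller order (DFS post-order).
/// `calls` maps each function to the local functions it calls.
pub fn bottom_up_order(calls: &BTreeMap<FuncId, Vec<FuncId>>) -> Vec<FuncId> {
    fn visit(
        f: FuncId,
        calls: &BTreeMap<FuncId, Vec<FuncId>>,
        seen: &mut BTreeSet<FuncId>,
        out: &mut Vec<FuncId>,
    ) {
        // Already visited (or currently on the stack, for recursion): stop.
        if !seen.insert(f) {
            return;
        }
        if let Some(callees) = calls.get(&f) {
            for &callee in callees {
                visit(callee, calls, seen, out);
            }
        }
        // Emit the function only after all of its callees.
        out.push(f);
    }

    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    for &f in calls.keys() {
        visit(f, calls, &mut seen, &mut out);
    }
    out
}
```

Processing functions in this order means that when a callee body is spliced into a caller, the callee has already had its own calls considered for inlining.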

Document stage III plans and refresh lpir-inliner roadmap notes.

Made-with: Cursor

Adds three pub weight candidates (body_len, markers_zero, heavy_bias)
under lpir::inline_weights and a --weights flag on lp-cli shader-debug
that emits per-function body_len/mz/hb columns next to the existing
LPIR / disasm counts. Includes a small inline-weights.glsl corpus
under lps-filetests/filetests/debug/.

Used to tune small_func_threshold in the follow-up commit.

Made-with: Cursor

small_func_threshold is empirically tuned against the rv32n cost model on the new
inline-weights.glsl corpus and the existing rainbow.glsl. body_len is
the best simple correlate of rv32n_insns (Pearson r=0.98 combined);
the other two candidates (markers-zero, heavy-bias) tracked slightly
worse and add complexity for no gain. Threshold 16 is the largest
body_len at which every corpus function lowers to ≤ 51 rv32n insns;
the next size up (body=18) jumps to 85.
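
The two calculations behind this tuning can be sketched as below. This is a generic illustration, not the tooling used for M3.1: `pearson_r` is the textbook correlation coefficient, and `largest_safe_threshold` is a hypothetical helper that picks the largest body length for which every sample at or below it stays within an instruction budget. Samples are `(body_len, rv32n_insns)` pairs supplied by the caller.

```rust
/// Pearson correlation coefficient of two equally long samples.
/// Returns None if the lengths differ, there are fewer than two points,
/// or either sample has zero variance.
pub fn pearson_r(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;

    let (mut cov, mut var_x, mut var_y) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (x, y) in xs.iter().zip(ys) {
        let (dx, dy) = (x - mean_x, y - mean_y);
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x * var_y).sqrt())
}

/// Largest body length T such that every sample with body_len <= T
/// lowers to at most `max_insns` instructions. Hypothetical helper.
pub fn largest_safe_threshold(samples: &[(usize, usize)], max_insns: usize) -> Option<usize> {
    samples
        .iter()
        .map(|&(body_len, _)| body_len)
        .filter(|&t| {
            samples
                .iter()
                .all(|&(body_len, insns)| body_len > t || insns <= max_insns)
        })
        .max()
}
```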

func_weight, the three weight candidates, and the new
lp-cli shader-debug --weights flag remain available for future
re-tuning.

See docs/roadmaps/2026-04-15-lpir-inliner/m3.1-tune-inline-weights.md.

Made-with: Cursor

Captures the M2.5 + M3 + M3.1 work in docs/design/optimization/inline.md:
algorithm (bottom-up topo splice), splicer mechanics (vmctx aliasing,
param scan-then-alias-or-copy, return-shape wrap), offset recompute and
the Continuing marker that enables it, configuration table with current
defaults, heuristic decision matrix, the three weight candidates with
empirical Pearson-r results from the M3.1 corpus, file layout, and
alternatives considered.

Made-with: Cursor

- m4-wire-and-validate: full plan rewrite + Outcome section with
  suite-wide and rainbow.glsl A/B numbers; firmware override skipped
  (3.7% rv32n_insns growth << 25% threshold)
- impl-notes: notes for the upcoming unified lps-shader crate
- future-work: CI optimization-profile sweeps, examples corpus,
  call-order.glsl forced-inline triage
- notes: debug/rainbow.glsl → examples/rainbow.glsl
- scripts/glsl-filetests.sh: --force-opt KEY=VALUE wrapper
- scripts/shader-debug.sh: documented bare --compiler-opt help behavior
- run-tests.sh: track rainbow.glsl move
- Cargo.lock: lpvm-cranelift log dep

Made-with: Cursor

Add `dead_func_elim` pass that removes local functions not transitively
reachable from a caller-supplied root set. Wired into all four backend
entry points (`lpvm-native`, `lpvm-cranelift::{jit,object}_module`,
`lpvm-wasm`) and `lp-cli shader-debug`. Default mode is `Never`, opted
into via `compile-opt(dead_func_elim.mode, auto)` or
`--compiler-opt dead_func_elim.mode=auto`.

`lps-frontend` now marks `render` and the synthesized `__shader_init`
with `is_entry = true` so they survive DFE.
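
A minimal sketch of reachability-based elimination, assuming the module is reduced to a map from each local function to its local callees. The `FuncId` alias and this signature are hypothetical and not the pass's real interface; in the real pipeline the roots would come from functions marked `is_entry`.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the real FuncId type.
type FuncId = u32;

/// Remove every function not transitively reachable from `roots`.
/// `funcs` maps each local function to the local functions it calls.
/// Returns the number of functions removed.
pub fn dead_func_elim(funcs: &mut BTreeMap<FuncId, Vec<FuncId>>, roots: &[FuncId]) -> usize {
    let mut live = BTreeSet::new();
    let mut worklist: Vec<FuncId> = roots.to_vec();

    // Standard worklist traversal over the call graph.
    while let Some(f) = worklist.pop() {
        if !live.insert(f) {
            continue; // already marked live
        }
        if let Some(callees) = funcs.get(&f) {
            worklist.extend(callees.iter().copied());
        }
    }

    let before = funcs.len();
    funcs.retain(|id, _| live.contains(id));
    before - funcs.len()
}
```

Surviving functions keep their original IDs, so the map can end up with gaps (for example IDs 0 and 2 with 1 removed).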

Fix WASM emitter's `wasm_func_index` to look up local FuncIds via a
`BTreeMap<FuncId, u32>` rather than `filtered_import_count + id`. The
old indexing assumed contiguous FuncIds starting at 0, which DFE
breaks by leaving gaps in the function map.
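
To illustrate why the arithmetic form breaks once FuncIds are sparse, here is a small sketch. It is not the emitter's code: the helper names and the `FuncId` alias are assumptions. It relies only on the WASM rule that imported functions occupy the first function indices and defined functions follow densely.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the real FuncId type.
type FuncId = u32;

/// Assign dense WASM function indices to the surviving local functions:
/// imports take indices 0..filtered_import_count, locals follow in order.
pub fn build_local_index_map(
    filtered_import_count: u32,
    local_ids: &[FuncId],
) -> BTreeMap<FuncId, u32> {
    local_ids
        .iter()
        .enumerate()
        .map(|(slot, &id)| (id, filtered_import_count + slot as u32))
        .collect()
}

/// Map-based lookup: correct even when FuncIds have gaps.
pub fn wasm_func_index(map: &BTreeMap<FuncId, u32>, id: FuncId) -> Option<u32> {
    map.get(&id).copied()
}

/// The old arithmetic form, valid only for contiguous FuncIds starting at 0.
pub fn wasm_func_index_contiguous(filtered_import_count: u32, id: FuncId) -> u32 {
    filtered_import_count + id
}
```

With two imports and surviving locals `[0, 2, 5]`, the map gives FuncId 5 the index 4, while the arithmetic form would produce 7, which is past the end of the function index space.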

End-to-end filetest under `optimizer/dead_func_elim/` exercises the
pass across `rv32n.q32`, `rv32c.q32`, and `wasm.q32`.

Known limitations and follow-ups (inliner stale-index bug, marking
`test_*` as `is_entry`) captured in `future-work.md`.

Plan: docs/plans/2026-04-19-lpir-inliner-m5-dead-func-elim/
Made-with: Cursor

These were untracked local development files that slipped into the M5
commit via 'git add -A'. Restore them as untracked.

Made-with: Cursor