inline: LPIR inliner through M5 #27
Open
Yona-Appletree wants to merge 11 commits into
- Stage i (M0 CalleeRef) and stage ii (M1 CompilerConfig + compile-opt) plan files
- Update M1 roadmap: compile-opt directive, middle-end framing, config on all backends
Made-with: Cursor

- Replace flat CalleeRef index with Import(ImportId) and Local(FuncId)
- Store local functions in BTreeMap keyed by FuncId in LpirModule
- Update builder, parse, print, validate, interp, and lpvm-native/wasm/cranelift/emu paths
- Adjust lps-frontend lower and lp-cli shader debug helpers for new CalleeRef
Made-with: Cursor

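To illustrate the CalleeRef change described above, here is a minimal sketch of a two-variant callee reference resolved against a module that keeps local functions in a `BTreeMap`. Only `CalleeRef`, `Import(ImportId)`, `Local(FuncId)`, and the `BTreeMap` keyed by `FuncId` come from the commit message; the `Module` struct, its fields, and `callee_name` are hypothetical stand-ins, not the real `LpirModule` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical newtype ids; the real definitions in `lpir` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ImportId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FuncId(pub u32);

/// A call target is either an imported function or a local one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CalleeRef {
    Import(ImportId),
    Local(FuncId),
}

/// Hypothetical stand-in for `LpirModule`: functions are reduced to names.
#[derive(Debug, Default)]
pub struct Module {
    pub imports: Vec<String>,
    /// Keyed by id, so removing a function does not renumber the others.
    pub functions: BTreeMap<FuncId, String>,
}

impl Module {
    /// Resolve a callee to its name, or `None` if the id is not present.
    pub fn callee_name(&self, callee: CalleeRef) -> Option<&str> {
        match callee {
            CalleeRef::Import(ImportId(i)) => {
                self.imports.get(i as usize).map(String::as_str)
            }
            CalleeRef::Local(id) => self.functions.get(&id).map(String::as_str),
        }
    }
}
```

Keying by id rather than by position means a lookup for a missing `FuncId` fails explicitly instead of silently resolving to a neighbouring function.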
- Add lpir::compiler_config (InlineConfig, apply keys, ConfigError)
- Thread CompilerConfig through native, Cranelift, and WASM compile options
- Filetests: parse // compile-opt, merge into config before compile; strip from GLSL
- Complete stage-ii plan: move to docs/plans-done with summary.md
Made-with: Cursor

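The filetest flow above (parse the directive, merge it into the config, strip it from the GLSL) might look roughly like the sketch below. It assumes the directive is written as `// compile-opt(key, value)` on its own line, and it handles only the `dead_func_elim.mode` key, which appears later in this PR. The struct layout, the `"never"` value spelling, the `ConfigError` variants, and both function names are assumptions for illustration, not the actual `lpir::compiler_config` API.

```rust
/// Hypothetical mode enum; the PR mentions `Never` (default) and `auto`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DfeMode {
    Never,
    Auto,
}

/// Hypothetical error shape; the real `ConfigError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    UnknownKey(String),
    InvalidValue { key: String, value: String },
}

#[derive(Debug, PartialEq, Eq)]
pub struct CompilerConfig {
    pub dead_func_elim_mode: DfeMode,
}

impl Default for CompilerConfig {
    fn default() -> Self {
        Self { dead_func_elim_mode: DfeMode::Never }
    }
}

impl CompilerConfig {
    /// Apply one `key = value` pair, rejecting unknown keys and values.
    pub fn apply(&mut self, key: &str, value: &str) -> Result<(), ConfigError> {
        match key {
            "dead_func_elim.mode" => {
                self.dead_func_elim_mode = match value {
                    "never" => DfeMode::Never,
                    "auto" => DfeMode::Auto,
                    _ => {
                        return Err(ConfigError::InvalidValue {
                            key: key.to_string(),
                            value: value.to_string(),
                        })
                    }
                };
                Ok(())
            }
            _ => Err(ConfigError::UnknownKey(key.to_string())),
        }
    }
}

/// Parse a line of the assumed form `// compile-opt(key, value)`.
fn parse_directive(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim().strip_prefix("//")?.trim();
    let args = rest.strip_prefix("compile-opt(")?.strip_suffix(')')?;
    let (key, value) = args.split_once(',')?;
    Some((key.trim(), value.trim()))
}

/// Merge every directive into `config` and return the source without them.
pub fn extract_compile_opts(
    source: &str,
    config: &mut CompilerConfig,
) -> Result<String, ConfigError> {
    let mut glsl = String::new();
    for line in source.lines() {
        match parse_directive(line) {
            Some((key, value)) => config.apply(key, value)?,
            None => {
                glsl.push_str(line);
                glsl.push('\n');
            }
        }
    }
    Ok(glsl)
}
```

Ordinary comments pass through untouched because they do not match the `compile-opt(` prefix; only recognised directives are consumed.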
Add LpirOp::Continuing for explicit loop-continue labels, and thread it through the builder, printer, validator, interpreter, const folding, and VM backends as a no-op control edge.
Add inline_module with callgraph construction, offset recompute, callee-body remap/splice, and heuristic-driven inlining controlled by CompilerConfig.
Document stage III plans and refresh lpir-inliner roadmap notes.
Made-with: Cursor

Adds three pub weight candidates (body_len, markers_zero, heavy_bias) under lpir::inline_weights and a --weights flag on lp-cli shader-debug that emits per-function body_len/mz/hb columns next to the existing LPIR / disasm counts.
Includes a small inline-weights.glsl corpus under lps-filetests/filetests/debug/. Used to tune small_func_threshold in the follow-up commit.
Made-with: Cursor

Empirically tuned against the rv32n cost model on the new inline-weights.glsl corpus and the existing rainbow.glsl. body_len is the best simple correlate of rv32n_insns (Pearson r=0.98 combined); the other two candidates (markers-zero, heavy-bias) tracked slightly worse and add complexity for no gain.
Threshold 16 is the largest body_len at which every corpus function lowers to ≤ 51 rv32n insns; the next size up (body=18) jumps to 85.
func_weight, the three weight candidates, and the new lp-cli shader-debug --weights flag remain available for future re-tuning.
See docs/roadmaps/2026-04-15-lpir-inliner/m3.1-tune-inline-weights.md.
Made-with: Cursor

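The two measurements quoted above can be reproduced with very little code. The sketch below is not the tooling used for the tuning; it only shows the standard Pearson correlation formula and one plausible way to pick "the largest body_len whose functions all stay within an instruction budget". Both function names and the `(body_len, rv32n_insns)` sample layout are assumptions.

```rust
/// Pearson correlation coefficient of two equally long samples.
/// Returns `None` for mismatched lengths, fewer than two points,
/// or a sample with zero variance.
pub fn pearson_r(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let mut cov = 0.0;
    let mut var_x = 0.0;
    let mut var_y = 0.0;
    for (x, y) in xs.iter().zip(ys) {
        let dx = x - mean_x;
        let dy = y - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x * var_y).sqrt())
}

/// Given `(body_len, rv32n_insns)` samples, return the largest body_len
/// such that every sample at or below it stays within `budget` instructions.
pub fn largest_safe_threshold(samples: &[(u32, u32)], budget: u32) -> Option<u32> {
    // Smallest body_len that already blows the budget, if any.
    let min_bad = samples
        .iter()
        .filter(|&&(_, insns)| insns > budget)
        .map(|&(len, _)| len)
        .min();
    samples
        .iter()
        .map(|&(len, _)| len)
        .filter(|&len| min_bad.map_or(true, |bad| len < bad))
        .max()
}
```

With a budget of 51 and samples that include a 16-op body at 51 instructions and an 18-op body at 85, `largest_safe_threshold` returns 16, matching the threshold chosen in this commit.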
Captures the M2.5 + M3 + M3.1 work in docs/design/optimization/inline.md:
- algorithm (bottom-up topo splice)
- splicer mechanics (vmctx aliasing, param scan-then-alias-or-copy, return-shape wrap)
- offset recompute and the Continuing marker that enables it
- configuration table with current defaults
- heuristic decision matrix
- the three weight candidates with empirical Pearson-r results from the M3.1 corpus
- file layout
- alternatives considered
Made-with: Cursor

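As a rough illustration of what "bottom-up topo" ordering means for an inliner, the sketch below computes a callee-before-caller order with a post-order depth-first walk, so that a callee body is already final by the time it is spliced into its callers. The call graph is modelled as plain `u32` ids for brevity; the function name, the graph representation, and the handling of cycles (a recursive edge is simply skipped once the node has been visited) are assumptions, not a description of how `inline_module` actually works.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return function ids so that every callee appears before its callers.
/// `callgraph` maps a function id to the ids it calls.
pub fn bottom_up_order(callgraph: &BTreeMap<u32, Vec<u32>>) -> Vec<u32> {
    fn visit(
        id: u32,
        graph: &BTreeMap<u32, Vec<u32>>,
        seen: &mut BTreeSet<u32>,
        out: &mut Vec<u32>,
    ) {
        // Already visited (or currently on the stack, for recursive calls).
        if !seen.insert(id) {
            return;
        }
        if let Some(callees) = graph.get(&id) {
            for &callee in callees {
                visit(callee, graph, seen, out);
            }
        }
        // Post-order: emit a function only after all of its callees.
        out.push(id);
    }

    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    for &id in callgraph.keys() {
        visit(id, callgraph, &mut seen, &mut out);
    }
    out
}
```

For an acyclic graph this is a reverse topological order; with recursion the result is only best-effort, which is one reason a real inliner needs an explicit policy for recursive calls.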
- m4-wire-and-validate: full plan rewrite + Outcome section with suite-wide and rainbow.glsl A/B numbers; firmware override skipped (3.7% rv32n_insns growth << 25% threshold)
- impl-notes: notes for the upcoming unified lps-shader crate
- future-work: CI optimization-profile sweeps, examples corpus, call-order.glsl forced-inline triage
- notes: debug/rainbow.glsl → examples/rainbow.glsl
- scripts/glsl-filetests.sh: --force-opt KEY=VALUE wrapper
- scripts/shader-debug.sh: documented bare --compiler-opt help behavior
- run-tests.sh: track rainbow.glsl move
- Cargo.lock: lpvm-cranelift log dep
Made-with: Cursor

Add `dead_func_elim` pass that removes local functions not transitively
reachable from a caller-supplied root set. Wired into all four backend
entry points (`lpvm-native`, `lpvm-cranelift::{jit,object}_module`,
`lpvm-wasm`) and `lp-cli shader-debug`. The default mode is `Never`; opt in
via `compile-opt(dead_func_elim.mode, auto)` or
`--compiler-opt dead_func_elim.mode=auto`.
`lps-frontend` now marks `render` and the synthesized `__shader_init`
with `is_entry = true` so they survive DFE.
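
The reachability rule described above can be sketched as a worklist traversal from the root set, followed by dropping every function that was never reached. This is an illustrative approximation only: `Func`, its `callees` field, `entry_roots`, and the return value are hypothetical, and the real pass operates on LPIR function bodies rather than a precomputed callee list.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FuncId(pub u32);

/// Hypothetical function record: only what reachability needs.
#[derive(Debug, Default)]
pub struct Func {
    pub is_entry: bool,
    pub callees: Vec<FuncId>,
}

/// Collect the ids of functions flagged as entry points.
pub fn entry_roots(funcs: &BTreeMap<FuncId, Func>) -> Vec<FuncId> {
    funcs
        .iter()
        .filter(|(_, f)| f.is_entry)
        .map(|(id, _)| *id)
        .collect()
}

/// Remove every function not transitively reachable from `roots`.
/// Returns how many functions were removed.
pub fn dead_func_elim(funcs: &mut BTreeMap<FuncId, Func>, roots: &[FuncId]) -> usize {
    let mut live = BTreeSet::new();
    let mut worklist: Vec<FuncId> = roots.to_vec();
    while let Some(id) = worklist.pop() {
        if !live.insert(id) {
            continue; // already processed
        }
        if let Some(func) = funcs.get(&id) {
            worklist.extend(func.callees.iter().copied().filter(|c| !live.contains(c)));
        }
    }
    let before = funcs.len();
    // Surviving functions keep their original ids, which can leave gaps.
    funcs.retain(|id, _| live.contains(id));
    before - funcs.len()
}
```

Because survivors keep their ids, the function map is no longer guaranteed to be contiguous after the pass runs.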
Fix WASM emitter's `wasm_func_index` to look up local FuncIds via a
`BTreeMap<FuncId, u32>` rather than `filtered_import_count + id`. The
old indexing assumed contiguous FuncIds starting at 0, which DFE
breaks by leaving gaps in the function map.
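
A small sketch of why the explicit map matters: in a WASM module the function index space lists imports first and locally defined functions after them, so each surviving local function needs a dense slot regardless of its `FuncId`. The helper names below and the `Option` return type are assumptions; only `wasm_func_index`, `filtered_import_count`, and the `BTreeMap<FuncId, u32>` come from the commit message.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FuncId(pub u32);

/// Assign dense WASM function indices to local functions, after the imports.
/// Ids are sorted first so the assignment is deterministic.
pub fn build_wasm_index_map(
    filtered_import_count: u32,
    local_ids: impl IntoIterator<Item = FuncId>,
) -> BTreeMap<FuncId, u32> {
    let mut sorted: Vec<FuncId> = local_ids.into_iter().collect();
    sorted.sort();
    sorted.dedup();
    sorted
        .into_iter()
        .enumerate()
        .map(|(slot, id)| (id, filtered_import_count + slot as u32))
        .collect()
}

/// Look up the WASM function index for a local function, if it survived.
pub fn wasm_func_index(map: &BTreeMap<FuncId, u32>, id: FuncId) -> Option<u32> {
    map.get(&id).copied()
}
```

With two imports and surviving ids 0, 3, and 5, the map yields indices 2, 3, and 4, whereas `filtered_import_count + id` would have produced 2, 5, and 7 and pointed past the end of the function section.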
End-to-end filetest under `optimizer/dead_func_elim/` exercises the
pass across `rv32n.q32`, `rv32c.q32`, and `wasm.q32`.
Known limitations and follow-ups (inliner stale-index bug, marking
`test_*` as `is_entry`) captured in `future-work.md`.
Plan: docs/plans/2026-04-19-lpir-inliner-m5-dead-func-elim/
Made-with: Cursor

These were untracked local development files that slipped into the M5 commit via 'git add -A'. Restore them as untracked.
Made-with: Cursor
Created from branch feature/inline