ci: cap cargo build parallelism to fit 6Gi runner (fixes 1.85 OOM) #27
Open
27Bslash6 wants to merge 2 commits into
Temporary diagnostic PR to reproduce the failing 1.85 (MSRV) CI job on a fresh ARC runner so its logs can be captured live (GitHub blob logs are not persisting — see #25). Will be closed once root cause is captured.
cargo reads the node CPU count (32t), not the pod cgroup quota, so it spawns ~32 parallel rustc in the 6Gi cachekit-lean runner and the build is OOM-killed (SIGKILL 137) mid-`cargo test`. The 1.85 job is hit because it skips clippy (which would warm artifacts first) and does the full cold compile in one step; stable/beta survive via incremental reuse. Capping to 4 jobs fits the ceiling. Diagnosed live on the ARC runner (logs were not persisting to GitHub). Ref #25.
Walkthrough
This PR introduces a single environment variable to the CI workflow that caps Rust build parallelism at 4 jobs, preventing out-of-memory failures during test compilation on the runner.
Changes
CI Parallelism Configuration
Estimated code review effort: 1 (Trivial), ~2 minutes
Pre-merge checks: 5 passed
Fixes the 6-day red CI on `main` (#25).
Root cause (diagnosed live on the ARC runner)
The `1.85` matrix job was killed (SIGKILL/137) ~10 min into `cargo test`, in the test step, with no step conclusion. The runner died before it could upload its log blob, which is why GitHub returned `BlobNotFound` and the failure was undiagnosable from the UI.
cargo reads the node's CPU count (32 threads), not the pod's cgroup CPU quota, so it spawned ~32 parallel `rustc` in the 6Gi `cachekit-lean` runner. The build's memory peak (codegen/link) blew the 6Gi ceiling → cgroup memory pressure → thrash for ~10 min → OOM-kill.
Why only `1.85`:
`stable`/`beta` run `clippy` first (warming build artifacts), so their `cargo test` is incremental and lower-peak.
`1.85` skips fmt/clippy and does the full cold 32-way compile in one step → hits the ceiling.
Fix + proof
Cap `CARGO_BUILD_JOBS: "4"` so peak memory fits the 6Gi limit. Result on this branch:
The build doesn't just survive; it's an order of magnitude faster, because it's no longer thrashing under memory pressure.
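A back-of-the-envelope sketch of why the cap helps, assuming (as a simplification) that the 6Gi limit is split evenly across concurrent jobs. Real `rustc` processes peak at different times and sizes, so these are rough budgets rather than measurements; `mib_per_job` is a hypothetical helper, not something from the workflow.

```rust
/// Memory budget per build job if the limit were split evenly.
/// Even division is an illustrative assumption, not how rustc actually behaves.
fn mib_per_job(limit_mib: u64, jobs: u64) -> u64 {
    limit_mib / jobs.max(1)
}

fn main() {
    let limit_mib = 6 * 1024; // the 6Gi runner limit described above
    for jobs in [32, 4] {
        // prints "32 jobs -> 192 MiB each", then "4 jobs -> 1536 MiB each"
        println!("{jobs} jobs -> {} MiB each", mib_per_job(limit_mib, jobs));
    }
}
```

At 32-way parallelism each job gets under 200 MiB on average, which leaves little room for a codegen or link peak; at 4 jobs the budget is 1.5 GiB each.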
Note
A more principled fix is making cargo honor the cgroup CPU quota (derive `-j` from the request), but the env cap is the pragmatic, proven fix. The same footgun likely affects other cachekit Rust repos on these runners (e.g. cachekit-core).
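One way the quota-derived job count might look, as a minimal sketch rather than a tested replacement for the env cap. It assumes a cgroup v2 runner where the pod's CPU limit is visible at `/sys/fs/cgroup/cpu.max` in the form `<quota> <period>`. A pod with only a CPU request and no limit would read `max` there, in which case the sketch falls back to this PR's cap of 4. The function name `cpus_from_cpu_max` is hypothetical.

```rust
use std::fs;

/// Parse cgroup v2 `cpu.max` contents ("<quota> <period>" or "max <period>")
/// and return the number of CPUs the quota allows, rounded up (minimum 1).
/// Returns None when no quota is set or the contents are malformed.
fn cpus_from_cpu_max(contents: &str) -> Option<usize> {
    let mut parts = contents.split_whitespace();
    let quota = parts.next()?;
    let period: u64 = parts.next()?.parse().ok()?;
    if quota == "max" || period == 0 {
        return None;
    }
    let quota: u64 = quota.parse().ok()?;
    // Ceiling division without risking overflow on quota + period.
    let cpus = quota / period + u64::from(quota % period != 0);
    Some(cpus.max(1) as usize)
}

fn main() {
    // Assumed path: cgroup v2 unified hierarchy as seen from inside the container.
    let jobs = fs::read_to_string("/sys/fs/cgroup/cpu.max")
        .ok()
        .and_then(|s| cpus_from_cpu_max(&s))
        .unwrap_or(4); // no quota visible: fall back to the cap from this PR
    println!("{jobs}");
}
```

The printed value could then be exported as `CARGO_BUILD_JOBS` in a step before the build. Note that a CPU-derived count does not by itself guarantee the memory ceiling is respected, so a fixed upper bound may still be worth keeping.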