ci: cap cargo build parallelism to fit 6Gi runner (fixes 1.85 OOM)#27

Open: 27Bslash6 wants to merge 2 commits into main from ci/diag-185-capture

Conversation

27Bslash6 (Contributor) commented Jun 6, 2026

Fixes the 6-day red CI on main (#25).

Root cause (diagnosed live on the ARC runner)

The 1.85 matrix job was killed (SIGKILL/137) ~10 min into cargo test, in the test step, with no step conclusion — and the runner died before it could upload its log blob, which is why GitHub returned BlobNotFound and the failure was undiagnosable from the UI.
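For readers unfamiliar with the `137` status: shells report a process killed by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of that arithmetic, where `signal_from_status` is a hypothetical helper name and not something from this repo:

```shell
# Decode a shell-style exit status above 128 into the number of the
# signal that terminated the process (status = 128 + signal).
signal_from_status() {
  echo $(( $1 - 128 ))
}

signal_from_status 137   # prints 9, the number of SIGKILL
```

A 137 on its own only says the process received SIGKILL. Attributing it to the OOM killer still relies on the live observation of memory pressure described here.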

cargo reads the node's CPU count (32 threads), not the pod's cgroup CPU quota, so it spawned ~32 parallel rustc in the 6Gi cachekit-lean runner. The build's memory peak (codegen/link) blew the 6Gi ceiling → cgroup memory pressure → thrash for ~10 min → OOM-kill.

Why only 1.85: stable/beta run clippy first (warming build artifacts), so their cargo test is incremental and lower-peak. 1.85 skips fmt/clippy and does the full cold 32-way compile in one step → hits the ceiling.

Fix + proof

Cap CARGO_BUILD_JOBS: "4" so peak memory fits the 6Gi limit. Result on this branch:

| | before | after (capped) |
|---|---|---|
| 1.85 | ❌ killed at ~10:00 (137) | 75s |
| stable / beta / wasm | | |

The build doesn't just survive: it finishes in 75s where the uncapped run was killed after ~10 min, because it's no longer thrashing under memory pressure.
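A rough way to see why a 4-job cap fits where 32 jobs did not is to split the 6Gi limit evenly across the parallel `rustc` processes. This is only a back-of-the-envelope sketch: real per-crate memory use is uneven and peaks in codegen and linking, and `per_job_mib` is a hypothetical helper, not part of the workflow.

```shell
# Memory available to each parallel rustc if the pod limit were
# split evenly. Args: limit in MiB, number of parallel jobs.
per_job_mib() {
  echo $(( $1 / $2 ))
}

per_job_mib 6144 32   # uncapped: 192 MiB per job in a 6Gi pod
per_job_mib 6144 4    # capped:  1536 MiB per job
```

Under this simplification the cap gives each compiler process eight times the headroom, which is consistent with the build no longer hitting the ceiling.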

Note

A more principled fix is making cargo honor the cgroup CPU quota (derive -j from the request), but the env cap is the pragmatic, proven fix. The same footgun likely affects other cachekit Rust repos on these runners (e.g. cachekit-core).
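One possible shape for the quota-derived approach, as a sketch only: read the cgroup v2 `cpu.max` value (`<quota> <period>`, or `max <period>` when unlimited) and round it up to a job count. This assumes the runner uses cgroup v2 and that the pod sets a CPU limit; a CPU request alone does not populate `cpu.max`, so the fallback would apply. `jobs_from_cpu_max` and the fallback of 4 are illustrative names and values, not anything from this PR.

```shell
# Derive a cargo job count from a cgroup v2 cpu.max value.
# Args: the cpu.max contents, and a fallback used when no quota is set.
jobs_from_cpu_max() {
  cpu_max=$1
  fallback=$2
  quota=${cpu_max%% *}    # first field: quota in microseconds, or "max"
  period=${cpu_max##* }   # last field: period in microseconds
  if [ -z "$quota" ] || [ "$quota" = "max" ]; then
    echo "$fallback"
    return
  fi
  # Round up so a 3.5-CPU quota yields 4 jobs; never go below 1.
  jobs=$(( (quota + period - 1) / period ))
  [ "$jobs" -lt 1 ] && jobs=1
  echo "$jobs"
}

# Usage inside the container (path assumes cgroup v2 at /sys/fs/cgroup):
#   CARGO_BUILD_JOBS=$(jobs_from_cpu_max "$(cat /sys/fs/cgroup/cpu.max)" 4)
#   export CARGO_BUILD_JOBS
```

A CPU-derived count does not bound memory by itself, so on a memory-constrained runner it may still need to be combined with a fixed ceiling like the one in this PR.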

27Bslash6 added 2 commits June 6, 2026 23:35
Temporary diagnostic PR to reproduce the failing 1.85 (MSRV) CI job on a
fresh ARC runner so its logs can be captured live (GitHub blob logs are not
persisting; see #25). Will be closed once root cause is captured.

cargo reads the node CPU count (32t), not the pod cgroup quota, so it spawns
~32 parallel rustc in the 6Gi cachekit-lean runner and the build is OOM-killed
(SIGKILL 137) mid-`cargo test`. The 1.85 job is hit because it skips clippy
(which would warm artifacts first) and does the full cold compile in one step;
stable/beta survive via incremental reuse. Capping to 4 jobs fits the ceiling.

Diagnosed live on the ARC runner (logs were not persisting to GitHub). Ref #25.

coderabbitai Bot commented Jun 6, 2026

Walkthrough

This PR introduces a single environment variable to the CI workflow that caps Rust build parallelism at 4 jobs, preventing out-of-memory failures during test compilation on the runner.

Changes

CI Parallelism Configuration

| Layer / File(s) | Summary |
|---|---|
| Cargo build jobs environment variable (`.github/workflows/ci.yml`) | Adds `CARGO_BUILD_JOBS: "4"` to the workflow-level `env` block alongside the existing `RUSTUP_HOME` and `CARGO_HOME` environment variables, with comments explaining the rationale for limiting parallelism to prevent OOM kills during `cargo test`. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: capping cargo build parallelism to address OOM issues on 6Gi runners for Rust 1.85. |

27Bslash6 changed the title from "ci: capture 1.85 job diagnostics (temporary, do not merge)" to "ci: cap cargo build parallelism to fit 6Gi runner (fixes 1.85 OOM)" on Jun 7, 2026