CNTRLPLANE-3351: e2e: add opt-in CPU resource request overrides for control plane components#8385
CNTRLPLANE-3351: e2e: add opt-in CPU resource request overrides for control plane components#8385bryan-cox wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@bryan-cox: This pull request references CNTRLPLANE-3351 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThe pull request refactors default cluster creation annotations in the E2E test utilities by moving static annotation construction from 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d09cc78 to
064d927
Compare
|
/lgtm |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8385 +/- ##
=======================================
Coverage 36.42% 36.42%
=======================================
Files 765 765
Lines 93302 93302
=======================================
Hits 33981 33981
Misses 56606 56606
Partials 2715 2715
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/util/options.go (1)
604-625: ⚡ Quick winAdd a focused unit test for annotation gating and contents.
Given the hardcoded override list, a small table-driven test for
e2eDefaultAnnotations()(env unset vsE2E_RESOURCE_REQUEST_OVERRIDES=1) would catch typos and accidental list regressions early.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util/options.go` around lines 604 - 625, Add a table-driven unit test for the e2eDefaultAnnotations() function that validates both cases (env unset and E2E_RESOURCE_REQUEST_OVERRIDES="1") by setting and restoring the environment variable, calling e2eDefaultAnnotations(), and asserting the returned slice contains exactly the expected base annotations and, when set, the full hardcoded override entries; implement subtests (t.Run) for each case, compare results deterministically (e.g., sort slices or compare as sets) and fail with clear messages if any expected annotation is missing or any unexpected entry is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/util/options.go`:
- Around line 604-625: Add a table-driven unit test for the
e2eDefaultAnnotations() function that validates both cases (env unset and
E2E_RESOURCE_REQUEST_OVERRIDES="1") by setting and restoring the environment
variable, calling e2eDefaultAnnotations(), and asserting the returned slice
contains exactly the expected base annotations and, when set, the full hardcoded
override entries; implement subtests (t.Run) for each case, compare results
deterministically (e.g., sort slices or compare as sets) and fail with clear
messages if any expected annotation is missing or any unexpected entry is
present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 32de6cd1-2218-4161-b3c4-92a6e225a378
📒 Files selected for processing (1)
test/e2e/util/options.go
|
Scheduling tests matching the |
…ponents Control plane pods have CPU requests far below actual usage (e.g. ignition-server requests 10m but peaks at 960m). This causes the scheduler to over-pack hosted clusters onto management nodes, leading to CPU starvation and e2e test failures from pods timing out. Add resource-request-override annotations to e2e HostedCluster defaults to better reflect actual CPU consumption. This raises per-cluster total CPU requests from ~3015m to ~4700m, giving the scheduler accurate capacity signals to spread workloads across nodes. The overrides are opt-in via the E2E_RESOURCE_REQUEST_OVERRIDES=1 env var to avoid inadvertently breaking jobs that are running fine today. Based-on: openshift#8350 Original-author: Cesar Wong <cewong@redhat.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
064d927 to
53b5ee6
Compare
|
New changes are detected. LGTM label has been removed. |
|
Now I have the complete analysis. Here is the report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe project's
The current imports in This Recommendations
Evidence
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/util/options.go (1)
604-625: ⚡ Quick winAdd focused unit tests for env-gated default annotations.
Please add table-driven tests for
e2eDefaultAnnotations()covering env var off/on and asserting expected annotation presence. This is a small change that will protect against future regressions in key names/values.Suggested test cases
+// options_test.go +func TestE2EDefaultAnnotations(t *testing.T) { + t.Run("overrides disabled", func(t *testing.T) { + t.Setenv("E2E_RESOURCE_REQUEST_OVERRIDES", "") + got := e2eDefaultAnnotations() + // assert base annotations exist; assert override annotation absent + }) + + t.Run("overrides enabled", func(t *testing.T) { + t.Setenv("E2E_RESOURCE_REQUEST_OVERRIDES", "1") + got := e2eDefaultAnnotations() + // assert base annotations exist; assert representative override annotations exist + }) +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util/options.go` around lines 604 - 625, Add table-driven unit tests for e2eDefaultAnnotations that validate both environments: when E2E_RESOURCE_REQUEST_OVERRIDES is unset/empty and when it is "1". For each case set/clear the env var in the test, call e2eDefaultAnnotations(), and assert the returned slice contains the base annotations (CleanupCloudResourcesAnnotation and SkipReleaseImageValidation) and, when env is "1", also contains the expected resource-request override entries (verify presence of strings built from hyperv1.ResourceRequestOverrideAnnotationPrefix plus keys like "kube-apiserver.kube-apiserver=cpu=500m" and "control-plane-operator.control-plane-operator=cpu=500m"); also assert lengths or absence of those overrides when env is off. Use table-driven subtests to avoid flakiness and ensure you restore the original env var after each case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/util/options.go`:
- Around line 604-625: Add table-driven unit tests for e2eDefaultAnnotations
that validate both environments: when E2E_RESOURCE_REQUEST_OVERRIDES is
unset/empty and when it is "1". For each case set/clear the env var in the test,
call e2eDefaultAnnotations(), and assert the returned slice contains the base
annotations (CleanupCloudResourcesAnnotation and SkipReleaseImageValidation)
and, when env is "1", also contains the expected resource-request override
entries (verify presence of strings built from
hyperv1.ResourceRequestOverrideAnnotationPrefix plus keys like
"kube-apiserver.kube-apiserver=cpu=500m" and
"control-plane-operator.control-plane-operator=cpu=500m"); also assert lengths
or absence of those overrides when env is off. Use table-driven subtests to
avoid flakiness and ensure you restore the original env var after each case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 93dfc7a2-c7a5-4966-9e54-469dca05820b
📒 Files selected for processing (1)
test/e2e/util/options.go
What this PR does / why we need it:
Cherry-picks the resource request override annotations from #8350 (by @csrwng) but makes them opt-in via
E2E_RESOURCE_REQUEST_OVERRIDES=1env var.Control plane pod CPU requests are far below actual usage (e.g. ignition-server requests 10m but peaks at 960m). The scheduler sees ~3,015m per cluster and packs too many clusters onto a few nodes. Actual peak is ~8,000m per cluster, causing CPU starvation that delays CRD establishment and triggers test timeouts.
When enabled, overrides CPU requests for control plane pods to better reflect actual consumption (~4700m per cluster vs. ~3015m default), preventing scheduler over-packing. The env var lets us enable it selectively for affected jobs (e.g. Azure e2e) first, without inadvertently breaking other jobs that are running fine.
Which issue(s) this PR fixes:
Fixes CNTRLPLANE-3351
Special notes for your reviewer:
TEST_CPO_OVERRIDEE2E_RESOURCE_REQUEST_OVERRIDES=1in the job envChecklist:
Summary by CodeRabbit