Skip to content

CNTRLPLANE-3351: e2e: add opt-in CPU resource request overrides for control plane components#8385

Open
bryan-cox wants to merge 1 commit intoopenshift:mainfrom
bryan-cox:CNTRLPLANE-3351
Open

CNTRLPLANE-3351: e2e: add opt-in CPU resource request overrides for control plane components#8385
bryan-cox wants to merge 1 commit intoopenshift:mainfrom
bryan-cox:CNTRLPLANE-3351

Conversation

@bryan-cox
Copy link
Copy Markdown
Member

@bryan-cox bryan-cox commented Apr 30, 2026

What this PR does / why we need it:

Cherry-picks the resource request override annotations from #8350 (by @csrwng) but makes them opt-in via E2E_RESOURCE_REQUEST_OVERRIDES=1 env var.

Control plane pod CPU requests are far below actual usage (e.g. ignition-server requests 10m but peaks at 960m). The scheduler sees ~3,015m per cluster and packs too many clusters onto a few nodes. Actual peak is ~8,000m per cluster, causing CPU starvation that delays CRD establishment and triggers test timeouts.

When enabled, overrides CPU requests for control plane pods to better reflect actual consumption (~4700m per cluster vs. ~3015m default), preventing scheduler over-packing. The env var lets us enable it selectively for affected jobs (e.g. Azure e2e) first, without inadvertently breaking other jobs that are running fine.

Which issue(s) this PR fixes:

Fixes CNTRLPLANE-3351

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
    • Centralized default test annotations and made them environment-driven.
    • Default cleanup and validation annotations are preserved.
    • New opt-in behavior: setting an environment switch adds CPU resource-override annotations for control-plane and operator components during e2e runs.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 30, 2026

@bryan-cox: This pull request references CNTRLPLANE-3351 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Cherry-picks the resource request override annotations from WIP: e2e: add CPU resource request overrides for control plane components #8350 (by @csrwng) but makes them opt-in via E2E_RESOURCE_REQUEST_OVERRIDES=1 env var
  • When enabled, overrides CPU requests for control plane pods to better reflect actual consumption (~4700m per cluster vs. ~3015m default), preventing scheduler over-packing
  • Follows the same pattern as the existing TEST_CPO_OVERRIDE env var

Problem

Control plane pod CPU requests are far below actual usage (e.g. ignition-server requests 10m but peaks at 960m). The scheduler sees ~3,015m per cluster and packs too many clusters onto a few nodes. Actual peak is ~8,000m per cluster, causing CPU starvation that delays CRD establishment and triggers test timeouts.

Why opt-in?

Making this unconditional could inadvertently break other jobs (e.g. AWS e2e) that are running fine with current scheduling. The env var lets us enable it selectively for affected jobs (e.g. Azure e2e) first.

Test plan

  • go vet ./test/e2e/util/... passes
  • Run Azure self-managed e2e with E2E_RESOURCE_REQUEST_OVERRIDES=1 and verify pods start within expected timeframes
  • Verify no regression on AWS e2e without the env var set

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Walkthrough

The pull request refactors default cluster creation annotations in the E2E test utilities by moving static annotation construction from DefaultClusterOptions into a new unexported helper e2eDefaultAnnotations(). The helper always includes CleanupCloudResourcesAnnotation=true and SkipReleaseImageValidation=true. If the environment variable E2E_RESOURCE_REQUEST_OVERRIDES equals "1", it appends a fixed set of CPU request override annotations for specific control-plane and component pods. No exported/public APIs were changed.

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding opt-in CPU resource request overrides for control plane components via environment variable, which is the core functionality introduced in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed This PR modifies a utility file for e2e test configuration, not a Ginkgo test file. No Ginkgo test declarations (It(), Describe(), Context(), When(), etc.) are added or modified.
Test Structure And Quality ✅ Passed The modified file is a utility configuration module containing helper functions that provide default cluster options and annotations, not test code. It contains no Ginkgo test blocks, assertions, or test interactions requiring quality checks.
Microshift Test Compatibility ✅ Passed PR does not add any new Ginkgo e2e tests; only refactors test configuration by extracting annotation logic into a helper function.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This pull request does not add any new Ginkgo e2e tests. Changes are confined to utility/configuration file with no test function definitions.
Topology-Aware Scheduling Compatibility ✅ Passed Pull request modifies only test utility code and does not introduce deployment manifests, operator code, or controllers with scheduling constraints. Resource request override annotations do not constitute topology-dependent scheduling logic.
Ote Binary Stdout Contract ✅ Passed The new e2eDefaultAnnotations() function uses only fmt.Sprintf() for in-memory string building and returns data structures without writing to stdout. The function is not process-level code and does not emit output that could corrupt the OTE JSON protocol.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The PR only modifies test/e2e/util/options.go without adding new Ginkgo e2e tests. The introduced helper function contains no IPv4 hardcoded addresses, IPv4-specific parsing logic, IPv4 CIDR blocks, or external connectivity calls.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from jparrill and sjenning April 30, 2026 14:00
@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label Apr 30, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Apr 30, 2026
@csrwng
Copy link
Copy Markdown
Contributor

csrwng commented Apr 30, 2026

/lgtm

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.42%. Comparing base (5eaee74) to head (53b5ee6).
⚠️ Report is 29 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8385   +/-   ##
=======================================
  Coverage   36.42%   36.42%           
=======================================
  Files         765      765           
  Lines       93302    93302           
=======================================
  Hits        33981    33981           
  Misses      56606    56606           
  Partials     2715     2715           
Flag Coverage Δ
cmd-support 30.37% <ø> (ø)
cpo-hostedcontrolplane 37.08% <ø> (ø)
cpo-other 35.69% <ø> (ø)
hypershift-operator 47.88% <ø> (ø)
other 27.76% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/util/options.go (1)

604-625: ⚡ Quick win

Add a focused unit test for annotation gating and contents.

Given the hardcoded override list, a small table-driven test for e2eDefaultAnnotations() (env unset vs E2E_RESOURCE_REQUEST_OVERRIDES=1) would catch typos and accidental list regressions early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/options.go` around lines 604 - 625, Add a table-driven unit
test for the e2eDefaultAnnotations() function that validates both cases (env
unset and E2E_RESOURCE_REQUEST_OVERRIDES="1") by setting and restoring the
environment variable, calling e2eDefaultAnnotations(), and asserting the
returned slice contains exactly the expected base annotations and, when set, the
full hardcoded override entries; implement subtests (t.Run) for each case,
compare results deterministically (e.g., sort slices or compare as sets) and
fail with clear messages if any expected annotation is missing or any unexpected
entry is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/util/options.go`:
- Around line 604-625: Add a table-driven unit test for the
e2eDefaultAnnotations() function that validates both cases (env unset and
E2E_RESOURCE_REQUEST_OVERRIDES="1") by setting and restoring the environment
variable, calling e2eDefaultAnnotations(), and asserting the returned slice
contains exactly the expected base annotations and, when set, the full hardcoded
override entries; implement subtests (t.Run) for each case, compare results
deterministically (e.g., sort slices or compare as sets) and fail with clear
messages if any expected annotation is missing or any unexpected entry is
present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 32de6cd1-2218-4161-b3c4-92a6e225a378

📥 Commits

Reviewing files that changed from the base of the PR and between d09cc78 and 064d927.

📒 Files selected for processing (1)
  • test/e2e/util/options.go

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

…ponents

Control plane pods have CPU requests far below actual usage (e.g.
ignition-server requests 10m but peaks at 960m). This causes the
scheduler to over-pack hosted clusters onto management nodes, leading
to CPU starvation and e2e test failures from pods timing out.

Add resource-request-override annotations to e2e HostedCluster defaults
to better reflect actual CPU consumption. This raises per-cluster total
CPU requests from ~3015m to ~4700m, giving the scheduler accurate
capacity signals to spread workloads across nodes.

The overrides are opt-in via the E2E_RESOURCE_REQUEST_OVERRIDES=1 env
var to avoid inadvertently breaking jobs that are running fine today.

Based-on: openshift#8350
Original-author: Cesar Wong <cewong@redhat.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 30, 2026

New changes are detected. LGTM label has been removed.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

Now I have the complete analysis. Here is the report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

test/e2e/util/options.go:229:1: File is not properly formatted (gci)
			Annotations: e2eDefaultAnnotations(),
^
1 issues:
* gci: 1
make: *** [Makefile:120: lint] Error 1

Summary

The gci (Go import ordering) formatter detected that test/e2e/util/options.go has import grouping/formatting that does not match the project's configured gci sections in .golangci.yml. This is a pre-existing import ordering violation — it was not introduced by PR #8385. The violation only surfaced now because golangci-lint's diff processor restricts checking to files modified in the PR, and this PR modified options.go (refactoring annotations into the new e2eDefaultAnnotations() function). The previous PR to touch this file (#8320, merged April 24) would have passed because gci's behavior or the diff boundary was different. The fix is to run make lint-fix to auto-correct the import order.

Root Cause

The project's .golangci.yml configures the gci formatter under formatters.settings.gci.sections with a strict 9-group import ordering:

  1. standard (stdlib)
  2. dot (dot imports)
  3. prefix(github.com/openshift/hypershift)
  4. prefix(github.com/openshift)
  5. prefix(github.com/aws)
  6. prefix(github.com/Azure)
  7. prefix(k8s.io)
  8. prefix(sigs.k8s.io)
  9. default (all other third-party)

The current imports in test/e2e/util/options.go do not match the exact formatting gci expects — likely incorrect blank-line separators between import groups or subtle ordering within groups. The gci formatter rewrites the entire file's imports and compares; when the reformatted file differs from the original, it reports "File is not properly formatted" at the first divergent line (line 229, which is in the code body, indicating the import reformatting shifted line numbers).

This gci config has been in place since the golangci-lint v2 upgrade (commit 3be7ddede6ac, November 30, 2025). The violation in options.go has existed since at least that date but remained dormant because golangci-lint only checks files changed in the PR (diff processor: in/out = 1/1). PR #8385 modified options.go to refactor the Annotations field, triggering gci to validate the whole file and exposing the pre-existing import issue.

Recommendations
  1. Fix the import ordering — Run make lint-fix locally, which invokes golangci-lint run --fix. This will auto-reformat the imports in test/e2e/util/options.go to match the configured gci sections. Commit the result.

  2. Alternative manual fix — Reorder the imports in test/e2e/util/options.go to match the 9-section grouping in .golangci.yml, ensuring each group is separated by a blank line:

    import (
        // standard library
        
        // github.com/openshift/hypershift/...
        
        // k8s.io/...
        
        // sigs.k8s.io/...
        
        // default (github.com/blang/semver)
    )
  3. Consider a repo-wide gci fix — Other files may have similar dormant gci violations that will surface when touched. Running golangci-lint run --fix on the entire repo would proactively fix all import ordering issues. The processing stats show 915 total issues found before filtering — some of those may be gci violations in other files.

Evidence
Evidence Detail
Lint error test/e2e/util/options.go:229:1: File is not properly formatted (gci)
Linter version golangci-lint v2.11.4, built with go1.25.7
Diff filter stats diff: 1/1 — 1 issue survived diff filtering (file was modified by PR)
Processing pipeline 915 → generated_file_filter: 496 → exclusion_rules: 57 → nolint_filter: 1 → diff: 1/1
gci config source .golangci.ymlformatters.settings.gci.sections (9 sections, custom-order: true)
Config introduced Commit 3be7ddede6ac (2025-11-30) — golangci-lint v2 upgrade
PR change in file Refactored Annotations field from inline []string{...} to e2eDefaultAnnotations() call; added new function e2eDefaultAnnotations()
Files changed by PR 1 file: test/e2e/util/options.go (+26, -6)
Pre-existing issue Imports identical on main branch — violation existed before this PR

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/util/options.go (1)

604-625: ⚡ Quick win

Add focused unit tests for env-gated default annotations.

Please add table-driven tests for e2eDefaultAnnotations() covering env var off/on and asserting expected annotation presence. This is a small change that will protect against future regressions in key names/values.

Suggested test cases
+// options_test.go
+func TestE2EDefaultAnnotations(t *testing.T) {
+  t.Run("overrides disabled", func(t *testing.T) {
+    t.Setenv("E2E_RESOURCE_REQUEST_OVERRIDES", "")
+    got := e2eDefaultAnnotations()
+    // assert base annotations exist; assert override annotation absent
+  })
+
+  t.Run("overrides enabled", func(t *testing.T) {
+    t.Setenv("E2E_RESOURCE_REQUEST_OVERRIDES", "1")
+    got := e2eDefaultAnnotations()
+    // assert base annotations exist; assert representative override annotations exist
+  })
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util/options.go` around lines 604 - 625, Add table-driven unit tests
for e2eDefaultAnnotations that validate both environments: when
E2E_RESOURCE_REQUEST_OVERRIDES is unset/empty and when it is "1". For each case
set/clear the env var in the test, call e2eDefaultAnnotations(), and assert the
returned slice contains the base annotations (CleanupCloudResourcesAnnotation
and SkipReleaseImageValidation) and, when env is "1", also contains the expected
resource-request override entries (verify presence of strings built from
hyperv1.ResourceRequestOverrideAnnotationPrefix plus keys like
"kube-apiserver.kube-apiserver=cpu=500m" and
"control-plane-operator.control-plane-operator=cpu=500m"); also assert lengths
or absence of those overrides when env is off. Use table-driven subtests to
avoid flakiness and ensure you restore the original env var after each case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/util/options.go`:
- Around line 604-625: Add table-driven unit tests for e2eDefaultAnnotations
that validate both environments: when E2E_RESOURCE_REQUEST_OVERRIDES is
unset/empty and when it is "1". For each case set/clear the env var in the test,
call e2eDefaultAnnotations(), and assert the returned slice contains the base
annotations (CleanupCloudResourcesAnnotation and SkipReleaseImageValidation)
and, when env is "1", also contains the expected resource-request override
entries (verify presence of strings built from
hyperv1.ResourceRequestOverrideAnnotationPrefix plus keys like
"kube-apiserver.kube-apiserver=cpu=500m" and
"control-plane-operator.control-plane-operator=cpu=500m"); also assert lengths
or absence of those overrides when env is off. Use table-driven subtests to
avoid flakiness and ensure you restore the original env var after each case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 93dfc7a2-c7a5-4966-9e54-469dca05820b

📥 Commits

Reviewing files that changed from the base of the PR and between 064d927 and 53b5ee6.

📒 Files selected for processing (1)
  • test/e2e/util/options.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants