SDCICD-1840 Add MCSimulation operator restart logic#3255
Conversation
|
There are test jobs defined for this repository which are not configured to run automatically. Comment |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: YiqinZhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override ci/prow/hypershift-pr-check |
|
@YiqinZhang: Overrode contexts on behalf of YiqinZhang: ci/prow/hypershift-pr-check DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
0845cc6 to
6a62f99
Compare
6a62f99 to
a17640a
Compare
|
/override ci/prow/hypershift-pr-check |
|
@YiqinZhang: Overrode contexts on behalf of YiqinZhang: ci/prow/hypershift-pr-check DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/common/mcsimulation/restart.go (1)
58-58: ⚡ Quick winHardcoded timeout lacks flexibility.
The 2-minute timeout may be insufficient for operators with slow startup or excessive for fast-starting operators. Consider making this configurable via function parameter.
♻️ Proposed refactor
-func RestartOperator(ctx context.Context, clientset kubernetes.Interface, namespace, labelSelector string) error { +func RestartOperator(ctx context.Context, clientset kubernetes.Interface, namespace, labelSelector string, readinessTimeout time.Duration) error { ... - if err := waitForDeploymentReady(ctx, clientset, namespace, deploy.Name, 2*time.Minute); err != nil { + if err := waitForDeploymentReady(ctx, clientset, namespace, deploy.Name, readinessTimeout); err != nil {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/mcsimulation/restart.go` at line 58, The call to waitForDeploymentReady uses a hardcoded 2*time.Minute which is inflexible; update the code to accept a configurable timeout instead: add a timeout parameter (e.g., ctxTimeout or readyTimeout time.Duration) to the function that calls waitForDeploymentReady and to waitForDeploymentReady itself, replace the hardcoded 2*time.Minute with that parameter, and propagate the new parameter through any callers (or provide a sensible default constant and overload/variant) so operators can adjust the readiness wait duration without changing the code.pkg/common/mcsimulation/restart_test.go (2)
1-108: ⚡ Quick winAdd test for multiple deployments scenario.
Given the multiple-deployment ambiguity in
RestartOperator(see restart.go comment), add a test case that creates two deployments matching the label selector and verifies the function returns an appropriate error.📝 Proposed test case
func TestRestartOperator_MultipleDeployments(t *testing.T) { deploy1 := newDeployment(1) deploy2 := newDeployment(1) deploy2.Name = "controller-manager-2" client := fake.NewSimpleClientset(deploy1, deploy2) err := RestartOperator(context.Background(), client, testNS, "app=test-operator") if err == nil || !strings.Contains(err.Error(), "multiple deployments") { t.Fatalf("expected 'multiple deployments' error, got: %v", err) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/mcsimulation/restart_test.go` around lines 1 - 108, Add a new unit test named TestRestartOperator_MultipleDeployments that constructs two deployments via newDeployment (change the second's Name to e.g. "controller-manager-2"), seeds both into fake.NewSimpleClientset, calls RestartOperator with the selector "app=test-operator", and asserts the call returns a non-nil error whose message contains "multiple deployments" to validate RestartOperator rejects ambiguous matches; use context.Background() for the call and follow the same error assertion pattern as the other tests.
48-63: 💤 Low valueTest validation incomplete.
The test verifies pod deletion but doesn't validate the full restart cycle. The fake client doesn't simulate the Deployment controller recreating pods, so this test only confirms deletion succeeds and the initial deployment readiness passes immediately.
Consider adding a comment documenting this limitation or using a reactor to simulate pod recreation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/mcsimulation/restart_test.go` around lines 48 - 63, The test TestRestartOperator_DeletesPodsAndWaitsReady currently only asserts pod deletion because fake.NewSimpleClientset doesn't simulate Deployment controller behavior; either add a clear comment above TestRestartOperator_DeletesPodsAndWaitsReady explaining this limitation (that RestartOperator's wait-for-ready relies on real controller behavior and the fake client only exercises deletion), or enhance the fake client with a reactor: attach a delete-reactor on Pods (or a watch-reactor) that, when the controller-manager pod is deleted, programmatically creates a replacement pod (using newPod) and/or updates the Deployment object (from newDeployment) to simulate readiness so RestartOperator's full restart-and-wait logic is exercised; reference RestartOperator, TestRestartOperator_DeletesPodsAndWaitsReady, newPod, newDeployment and fake.NewSimpleClientset when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/mcsimulation/restart.go`:
- Around line 22-33: The current RestartOperator function silently picks the
first match when multiple Deployments match the labelSelector; update it to
detect when len(deployments.Items) > 1 and handle that case explicitly: either
return an error detailing how many matches were found and listing their names
(use deployments.Items[i].Name) so the caller can resolve the ambiguity, or
choose a deterministic selection rule (e.g., require exact one, or sort by name)
and document it; modify the code around the deployments variable and deploy
assignment in RestartOperator to implement this behavior and surface a clear
error message when multiple matches exist.
---
Nitpick comments:
In `@pkg/common/mcsimulation/restart_test.go`:
- Around line 1-108: Add a new unit test named
TestRestartOperator_MultipleDeployments that constructs two deployments via
newDeployment (change the second's Name to e.g. "controller-manager-2"), seeds
both into fake.NewSimpleClientset, calls RestartOperator with the selector
"app=test-operator", and asserts the call returns a non-nil error whose message
contains "multiple deployments" to validate RestartOperator rejects ambiguous
matches; use context.Background() for the call and follow the same error
assertion pattern as the other tests.
- Around line 48-63: The test TestRestartOperator_DeletesPodsAndWaitsReady
currently only asserts pod deletion because fake.NewSimpleClientset doesn't
simulate Deployment controller behavior; either add a clear comment above
TestRestartOperator_DeletesPodsAndWaitsReady explaining this limitation (that
RestartOperator's wait-for-ready relies on real controller behavior and the fake
client only exercises deletion), or enhance the fake client with a reactor:
attach a delete-reactor on Pods (or a watch-reactor) that, when the
controller-manager pod is deleted, programmatically creates a replacement pod
(using newPod) and/or updates the Deployment object (from newDeployment) to
simulate readiness so RestartOperator's full restart-and-wait logic is
exercised; reference RestartOperator,
TestRestartOperator_DeletesPodsAndWaitsReady, newPod, newDeployment and
fake.NewSimpleClientset when making the change.
In `@pkg/common/mcsimulation/restart.go`:
- Line 58: The call to waitForDeploymentReady uses a hardcoded 2*time.Minute
which is inflexible; update the code to accept a configurable timeout instead:
add a timeout parameter (e.g., ctxTimeout or readyTimeout time.Duration) to the
function that calls waitForDeploymentReady and to waitForDeploymentReady itself,
replace the hardcoded 2*time.Minute with that parameter, and propagate the new
parameter through any callers (or provide a sensible default constant and
overload/variant) so operators can adjust the readiness wait duration without
changing the code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c6411e1a-ede7-4294-9d59-954a2632a46d
📒 Files selected for processing (2)
pkg/common/mcsimulation/restart.gopkg/common/mcsimulation/restart_test.go
|
/override ci/prow/hypershift-pr-check |
|
@YiqinZhang: Overrode contexts on behalf of YiqinZhang: ci/prow/hypershift-pr-check DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/hypershift-pr-check |
|
@YiqinZhang: Overrode contexts on behalf of YiqinZhang: ci/prow/hypershift-pr-check DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@YiqinZhang: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Adds RestartOperator to the mcsimulation package. After CRD installation, operators that detect CRD presence at startup (like RMO's HostedControlPlane controller) need to be restarted to pick up the newly registered CRDs. This function handles that restart safely.
Approach mirrors RMO's restartRMODeployment pattern: delete matching pods (Deployment controller recreates them automatically) then poll until ReadyReplicas > 0. No Deployment spec mutation — equivalent to kubectl rollout restart without the annotation.
Co-authored-by: Cursor cursor@cursor.com
Summary by CodeRabbit
Release Notes
New Features
Tests