3 changes: 3 additions & 0 deletions charts/openstack-hypervisor-operator/alerts/eviction.yaml
@@ -11,6 +11,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/evictionfailed
annotations:
summary: "Eviction {{ $labels.name }} has failed"
description: "The eviction {{ $labels.name }} for hypervisor {{ $labels.hypervisor }} has reached a terminal failure state. Manual intervention is required — check if the hypervisor exists in OpenStack."
@@ -24,6 +25,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/evictionmigrationfailing
annotations:
summary: "Eviction {{ $labels.name }} has failing instance migrations for over 1 hour"
description: "The eviction {{ $labels.name }} has had MigratingInstance=Failed for more than 1 hour while still running. Instances may be in ERROR state, blocking eviction progress."
@@ -37,6 +39,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/evictionoutstandingram
annotations:
summary: "Eviction {{ $labels.name }} has outstanding RAM for over 6 hours"
description: "The eviction {{ $labels.name }} has had {{ $value }}MB of outstanding RAM for more than 6 hours. Check for stuck live-migrations or instances that cannot be moved."
7 changes: 7 additions & 0 deletions charts/openstack-hypervisor-operator/alerts/operator.yaml
@@ -11,6 +11,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisoronboardingstuck

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify operator playbook documentation exists
# Expected: Find references to these playbook paths in the repository or documentation

echo "Searching for operator playbook path references..."
rg -i "hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown" \
  --type md --type yaml -g '!alerts/' -C2

echo -e "\nSearching for playbook documentation structure..."
fd -t f -e md -i hypervisor | head -20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 676


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Search for exact referenced playbook paths..."
rg -n --hidden --no-ignore-vcs \
  -S "docs/compute/kvm/playbooks/(hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown)" \
  . || true

echo
echo "2) Locate any files by playbook name substring (any location, case-insensitive)..."
fd -t f -i "hypervisoronboardingstuck" . || true
fd -t f -i "hypervisorevictionstuck" . || true
fd -t f -i "hypervisorevictedtoolong" . || true
fd -t f -i "hypervisortraitsyncfailed" . || true
fd -t f -i "hypervisoraggregatesyncfailed" . || true
fd -t f -i "hypervisorreconcileerrors" . || true
fd -t f -i "hypervisoroperatordown" . || true

echo
echo "3) Check if docs/compute/kvm/playbooks exists and show a sample listing..."
if [ -d "docs/compute/kvm/playbooks" ]; then
  ls -la "docs/compute/kvm/playbooks" | head -200
else
  echo "Directory not found: docs/compute/kvm/playbooks"
fi

echo
echo "4) Search for 'playbooks/' directory references in charts docs..."
rg -n "compute/kvm/playbooks/" -S . || true

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 2620


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Search for exact referenced playbook paths..."
rg -n --hidden --no-ignore-vcs \
  -S "docs/compute/kvm/playbooks/(hypervisoronboardingstuck|hypervisorevictionstuck|hypervisorevictedtoolong|hypervisortraitsyncfailed|hypervisoraggregatesyncfailed|hypervisorreconcileerrors|hypervisoroperatordown)" \
  . || true

echo
echo "2) Locate any files by playbook name substring (any location, case-insensitive)..."
fd -t f -i "hypervisoronboardingstuck" .
fd -t f -i "hypervisorevictionstuck" .
fd -t f -i "hypervisorevictedtoolong" .
fd -t f -i "hypervisortraitsyncfailed" .
fd -t f -i "hypervisoraggregatesyncfailed" .
fd -t f -i "hypervisorreconcileerrors" .
fd -t f -i "hypervisoroperatordown" .

echo
echo "3) Check if docs/compute/kvm/playbooks exists and show a sample listing..."
if [ -d "docs/compute/kvm/playbooks" ]; then
  ls -la "docs/compute/kvm/playbooks" | head -200
else
  echo "Directory not found: docs/compute/kvm/playbooks"
fi

echo
echo "4) Search for playbooks/ directory references..."
rg -n "compute/kvm/playbooks/" -S . || true

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 2620


Fix broken playbook doc references (docs/compute/kvm/playbooks/*) in operator.yaml

charts/openstack-hypervisor-operator/alerts/operator.yaml adds playbook links for:

  • hypervisoronboardingstuck, hypervisorevictionstuck, hypervisorevictedtoolong, hypervisortraitsyncfailed, hypervisoraggregatesyncfailed, hypervisorreconcileerrors, hypervisoroperatordown

However, docs/compute/kvm/playbooks/ does not exist in this repository; the only occurrences of that path are these labels and the matching ones in alerts/eviction.yaml. Before merging, either add the corresponding playbook pages under docs/compute/kvm/playbooks/ or update the paths to point at their actual location.
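This finding can also be re-checked mechanically by scanning every `playbook:` label and testing whether its target exists. Below is a minimal shell sketch, not part of this PR. `check_playbooks` is a hypothetical helper, and it assumes a playbook path `P` resolves when `<docs-root>/P.md`, `<docs-root>/P/index.md` or the directory `<docs-root>/P` exists; the real documentation layout may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR.
# ASSUMPTION: a playbook path P is considered present when
# "$DOCS_ROOT/P.md", "$DOCS_ROOT/P/index.md" or the directory "$DOCS_ROOT/P" exists.

# check_playbooks ALERTS_DIR DOCS_ROOT
# Prints every unresolved playbook path, one per line; returns 1 if any is missing.
check_playbooks() {
  local alerts_dir=$1 docs_root=$2 missing=0 paths path
  # Collect the value of every "playbook:" label, strip quotes and trailing
  # whitespace, then de-duplicate. Assumes paths contain no whitespace.
  paths=$(grep -h -E '^[[:space:]]*playbook:' "$alerts_dir"/*.yaml \
    | sed -E 's/^[[:space:]]*playbook:[[:space:]]*//; s/"//g; s/[[:space:]]+$//' \
    | sort -u)
  for path in $paths; do
    if [ -f "$docs_root/$path.md" ] || [ -f "$docs_root/$path/index.md" ] \
      || [ -d "$docs_root/$path" ]; then
      continue
    fi
    echo "$path"
    missing=1
  done
  return "$missing"
}
```

From the repository root, `check_playbooks charts/openstack-hypervisor-operator/alerts .` prints each unresolved path and exits non-zero, so it could gate CI. If the playbooks live in a separate documentation checkout, pass that directory as the second argument instead.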

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@charts/openstack-hypervisor-operator/alerts/operator.yaml` at line 14,
operator.yaml contains broken playbook links pointing at
docs/compute/kvm/playbooks/* (e.g., hypervisoronboardingstuck,
hypervisorevictionstuck, hypervisorevictedtoolong, hypervisortraitsyncfailed,
hypervisoraggregatesyncfailed, hypervisorreconcileerrors,
hypervisoroperatordown) that do not exist; either create corresponding
documentation pages at docs/compute/kvm/playbooks/<playbook-name> (preferably
markdown files with the playbook content and frontmatter) or update the paths in
charts/openstack-hypervisor-operator/alerts/operator.yaml (and any other files
referencing the same paths such as alerts/eviction.yaml) to point to the correct
existing doc location so all listed playbook links resolve.

annotations:
summary: "Hypervisor {{ $labels.name }} onboarding stuck for over 1 hour"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has been onboarding for more than 1 hour. Check nova registration, test VM status, or trait/aggregate sync."
@@ -22,6 +23,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisorevictionstuck
annotations:
summary: "Hypervisor {{ $labels.name }} eviction running for over 4 hours"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had an active eviction for more than 4 hours. Check for stuck live-migrations or failed VMs."
@@ -35,6 +37,7 @@ groups:
labels:
severity: info
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisorevictedtoolong
annotations:
summary: "Hypervisor {{ $labels.name }} has been evicted for over 7 days"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has been evicted for more than 7 days without being offboarded. Consider re-enabling or decommissioning."
@@ -50,6 +53,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisortraitsyncfailed
annotations:
summary: "Hypervisor {{ $labels.name }} trait sync has been failing"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had TraitsUpdated=False for more than 30 minutes outside of onboarding. Check OpenStack Placement API connectivity."
@@ -65,6 +69,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisoraggregatesyncfailed
annotations:
summary: "Hypervisor {{ $labels.name }} aggregate sync has been failing"
description: "The hypervisor {{ $labels.name }} in zone {{ $labels.zone }} has had AggregatesUpdated=False for more than 30 minutes outside of onboarding and eviction. Check OpenStack Nova API connectivity."
@@ -78,6 +83,7 @@ groups:
labels:
severity: warning
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisorreconcileerrors
annotations:
summary: "Hypervisor operator controller {{ $labels.controller }} has persistent reconcile errors"
description: "The controller {{ $labels.controller }} has been producing sustained reconciliation errors for more than 15 minutes."
@@ -89,6 +95,7 @@ groups:
labels:
severity: critical
type: hypervisor_operator
playbook: docs/compute/kvm/playbooks/hypervisoroperatordown
annotations:
summary: "Hypervisor operator is down"
description: "The hypervisor operator metrics endpoint has been unreachable for more than 5 minutes."
27 changes: 27 additions & 0 deletions charts/openstack-hypervisor-operator/templates/servicemonitor.yaml
@@ -0,0 +1,27 @@
# SPDX-FileCopyrightText: 2025 SAP SE or an SAP affiliate company and cobaltcore-dev contributors
# SPDX-License-Identifier: Apache-2.0

{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "openstack-hypervisor-operator.fullname" . }}
  labels:
    {{- include "openstack-hypervisor-operator.labels" . | nindent 4 }}
    {{- with .Values.serviceMonitor.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
      {{- include "openstack-hypervisor-operator.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: https
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
      {{- with .Values.serviceMonitor.interval }}
      interval: {{ . }}
      {{- end }}
{{- end }}
4 changes: 4 additions & 0 deletions charts/openstack-hypervisor-operator/values.yaml
@@ -45,6 +45,10 @@ metricsService:
    protocol: TCP
    targetPort: 8443
  type: ClusterIP
serviceMonitor:
  enabled: true
  labels: {}
  interval: 60s
secret:
  servicePassword: ""
serviceAccount: