selenosis does not delete Failed Browser CRDs; pods stuck NotReady (1/2) indefinitely #8

Description

@mideeff

Summary

Browser pods can remain stuck at 1/2 NotReady indefinitely after seleniferous exits, while the corresponding Browser CRD stays in the Failed phase. The condition persists well beyond SESSION_IDLE_TIMEOUT, and cleanup requires manual intervention.
Observed when running many browser sessions in parallel under cluster resource pressure (CPU, memory, or scheduling; many Pending or slow-starting pods). The core problem is that cleanup is not guaranteed once a Browser reaches the terminal Failed phase: selenosis does not delete the CRD, so nothing removes it.

Environment

Component               Version
selenosis               alcounit/selenosis:v2.0.7
browser-controller      alcounit/browser-controller:v0.0.6
browser-service         alcounit/browser-service:v0.0.6
browser-ui              alcounit/browser-ui:v0.0.7
seleniferous (sidecar)  alcounit/seleniferous:v2.0.6

Steps to reproduce

  1. Run automated tests that create many browser sessions in parallel (enough to stress CPU/memory or scheduling in the namespace).
  2. Optionally observe scheduling pressure: kubectl get pods -n <namespace> shows multiple Pending or slow-starting browser pods.
  3. Wait for tests to finish.
  4. Check pods: kubectl get pods -n <namespace>
  5. Check CRDs: kubectl get browsers -n <namespace>
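With many parallel sessions, steps 4 and 5 are easier to check if the pod listing is filtered for the symptom. A minimal sketch, assuming kubectl get pods is run with --no-headers so the columns are NAME READY STATUS RESTARTS AGE; stuck_pods is a hypothetical helper name, not part of selenosis:

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the names of pods with READY=1/2 and STATUS=NotReady.
stuck_pods() {
  awk '$2 == "1/2" && $3 == "NotReady" { print $1 }'
}
```

Usage: kubectl get pods -n <namespace> --no-headers | stuck_pods. In the example under Actual behavior, pod names match Browser names, so the result can be compared directly with kubectl get browsers.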

Expected behavior

After SESSION_IDLE_TIMEOUT elapses (or once a definitive failure has been detected), selenosis should delete the Browser CRD; browser-controller should then complete finalizer handling, and the pod should terminate.

Actual behavior

Pods remain at 1/2 NotReady, and Browser CRDs remain Failed and are not removed by selenosis (in our cluster they persisted for more than 3 days).
Example:

kubectl get browsers -n selenosis
NAME                                   BROWSER   VERSION     PHASE    AGE
eee39701-45c9-4427-a7bd-ace1f6502377   chrome    145.0-csp   Failed   33m
f31c9a1a-0933-4607-9bff-448326efb3ca   chrome    145.0-csp   Failed   33m
kubectl get pods -n selenosis
eee39701-45c9-4427-a7bd-ace1f6502377   1/2   NotReady   0   33m
f31c9a1a-0933-4607-9bff-448326efb3ca   1/2   NotReady   0   33m

Logs

browser-controller

browser-controller marks the Browser Failed, sees that the seleniferous container is not ready, and then stops reconciling it (it waits for selenosis to delete the CRD):

{"level":"info","Browser":{"name":"eee39701-45c9-4427-a7bd-ace1f6502377","namespace":"selenosis"},"message":"Browser status set to Failed"}
{"level":"info","Browser":{"name":"eee39701-45c9-4427-a7bd-ace1f6502377","namespace":"selenosis"},"containerName":"seleniferous","containerReady":"false","restartCount":0,"message":"Browser Pod container statuses"}
{"level":"info","Browser":{"name":"eee39701-45c9-4427-a7bd-ace1f6502377","namespace":"selenosis"},"finalizers":["browserpod.selenosis.io/finalizer"],"message":"current finalizers on Browser"}
{"level":"info","Browser":{"name":"eee39701-45c9-4427-a7bd-ace1f6502377","namespace":"selenosis"},"message":"Browser is in Failed state, nothing to do"}
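To list every Browser the controller has stopped reconciling, the controller log can be filtered for the last message above. A minimal sketch, assuming one JSON object per line with the keys in exactly the order shown; it uses grep and sed rather than a JSON parser, so a different key order would break it. gave_up_browsers is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: reads browser-controller JSON log lines on stdin and
# prints each Browser name logged with "Browser is in Failed state, nothing to do".
gave_up_browsers() {
  grep -F '"message":"Browser is in Failed state, nothing to do"' |
    sed -n 's/.*"Browser":{"name":"\([^"]*\)".*/\1/p' |
    sort -u
}
```

Usage: kubectl logs -n selenosis deploy/browser-controller | gave_up_browsers. The namespace and deployment name here are assumptions; adjust them to your install.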

Suspected root cause

  1. Under resource pressure, parallel runs produce partial failures: for example, seleniferous exits or never becomes ready while the browser container may still be running → pod at 1/2, Browser Failed.
  2. selenosis may not complete the normal session lifecycle or idle-timeout cleanup for these objects (for example, because the session was never fully registered or its state was lost under load).
  3. browser-controller sets the phase to Failed and then logs "nothing to do", expecting selenosis to delete the Browser CRD.
  4. Deadlock: selenosis does not delete the CRD → finalizer and pod cleanup never complete → pods linger in NotReady.
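Until selenosis handles this path, the cycle in point 4 could be broken from outside with a periodic reaper that deletes Browsers stuck in Failed longer than a grace period. A minimal sketch under several assumptions: the phase is read from .status.phase (inferred from the PHASE column, not confirmed against the CRD), creation time stands in for time spent in Failed, GNU date is available for parsing timestamps, and both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical filter. stdin: "<name> <phase> <creation-epoch-seconds>" per line.
# Prints names of Browsers that are Failed and at least TTL_SECONDS old.
# Usage: reap_candidates TTL_SECONDS NOW_EPOCH
reap_candidates() {
  awk -v ttl="$1" -v now="$2" '$2 == "Failed" && now - $3 >= ttl { print $1 }'
}

# Hypothetical feeder: emits "<name> <phase> <epoch>" for each Browser in
# namespace $1. Assumes .status.phase holds the phase and GNU date (-d).
list_browsers_with_age() {
  kubectl get browsers -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.metadata.creationTimestamp}{"\n"}{end}' |
    while read -r name phase created; do
      echo "$name $phase $(date -d "$created" +%s)"
    done
}
```

Usage: list_browsers_with_age selenosis | reap_candidates 3600 "$(date +%s)" prints the candidates, which can then be passed to kubectl delete browsers -n selenosis. Picking a TTL comfortably above SESSION_IDLE_TIMEOUT should keep such a reaper from racing the normal cleanup path.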

Workaround

kubectl delete browsers -n selenosis --all
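Note that --all also deletes Browsers backing sessions that are still in use. A narrower variant deletes only the Failed ones. A minimal sketch, assuming the default table layout shown above (PHASE is the 4th column, with no empty columns); delete_failed_browsers is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: deletes only Browsers whose PHASE column reads Failed,
# leaving running sessions alone. Namespace defaults to selenosis.
delete_failed_browsers() {
  ns="${1:-selenosis}"
  # Default columns: NAME BROWSER VERSION PHASE AGE, so PHASE is $4.
  names=$(kubectl get browsers -n "$ns" --no-headers | awk '$4 == "Failed" { print $1 }')
  if [ -z "$names" ]; then
    echo "no Failed Browsers in $ns"
    return 0
  fi
  # Left unquoted on purpose: each Browser name becomes a separate argument.
  kubectl delete browsers -n "$ns" $names
}
```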

Metadata

Labels

bug (Something isn't working), v0.0.7

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
