Replace Redis with Garage S3 and migrate workflow scaling to metrics API#51
Replace Redis with Garage S3 and migrate workflow scaling to metrics API#51
Conversation
Replace the removed redis-list KEDA triggers with per-workflow metrics-api triggers that poll /jobs/v2/counts on the web api and scale on the pending count for each workflow. Add api-web to scaled_job_deps in Tiltfile so ScaledJobs come up after the endpoint is reachable.
Run the bucket/key bootstrap as a native sidecar inside the garage pod instead of a separate Job. Sidecar readiness gates pod readiness so dependents only proceed after bootstrap completes.
There was a problem hiding this comment.
Code Review
This pull request replaces Redis with Garage (S3-compatible storage) and transitions KEDA triggers from Redis lists to a metrics-based API. It also updates several service image versions, refactors the tag update script, and introduces a wipe utility. Key feedback includes a critical deadlock in the Garage StatefulSet where an initContainer waits for the main container to be healthy, and the inefficient practice of installing packages during container startup. Additionally, improvements were suggested for the portability of shell scripts, handling version prefixes in release tags, and addressing security concerns regarding the use of sudo in cleanup operations.
| initContainers: | ||
| - name: garage-init | ||
| image: alpine:3.20 | ||
| restartPolicy: Always | ||
| command: ["/bin/sh", "-c", "/scripts/init.sh && touch /tmp/ready && exec sleep infinity"] | ||
| resources: | ||
| requests: | ||
| cpu: 50m | ||
| memory: 32Mi | ||
| limits: | ||
| cpu: 100m | ||
| memory: 64Mi | ||
| readinessProbe: | ||
| exec: | ||
| command: ["test", "-f", "/tmp/ready"] | ||
| periodSeconds: 2 | ||
| volumeMounts: | ||
| - name: init-script | ||
| mountPath: /scripts |
There was a problem hiding this comment.
This initContainer will cause a deadlock. In Kubernetes, all initContainers (including sidecars with restartPolicy: Always) must either complete or become ready before the main containers start. Since init.sh waits for the garage service (which is in the containers section) to be healthy (line 41), it will never become ready, preventing the garage container from ever starting.
To fix this, move the bootstrap logic into a regular container within the containers section so they start concurrently. Note that restartPolicy: Always is not a valid field for regular containers and should be removed.
| ACCESS_KEY_ID="GK000000000000000000000001" | ||
| SECRET_ACCESS_KEY="0000000000000000000000000000000000000000000000000000000000000001" | ||
|
|
||
| apk add --no-cache curl jq >/dev/null |
| # Update the file if it exists | ||
| if [[ -f "$file" ]]; then | ||
| sed -i "s/${prefix}[0-9.]\+/${prefix}${tag}/" "$file" | ||
| sed -i "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file" |
There was a problem hiding this comment.
The sed -i command is not portable between GNU and BSD (macOS) versions. On macOS, sed -i requires an extension argument (e.g., sed -i '' ...). A portable approach is to use a temporary backup extension and then remove it.
| sed -i "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file" | |
| sed -i.bak "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file" && rm "${file}.bak" |
|
|
||
| echo "Clearing host data at /virtool/data..." | ||
| if [ -d /virtool/data ]; then | ||
| rm -rf /virtool/data/* 2>/dev/null || sudo rm -rf /virtool/data/* |
There was a problem hiding this comment.
Using sudo rm -rf on a host path is risky and can lead to accidental data loss if the path is misconfigured. It also interrupts the workflow by prompting for a password. Consider if this cleanup can be handled via a Kubernetes Job with appropriate volume mounts or by ensuring the user has the necessary permissions without sudo.
- scripts/wipe.sh: refuse to run unless kubectl current-context and the running Minikube profile are both 'virtool'; replace `sudo rm -rf /virtool/data/*` with `find -mindepth 1` so dotfiles are removed too. - scripts/init.sh: pin every minikube/kubectl call to the dedicated 'virtool' profile/context. - scripts/pull.sh: strip leading 'v' from upstream tags and clean up the sed backup file. - manifests/db/garage.yaml: move garage-init from an initContainer running apk-installed curl/jq to a sidecar using the badouralix/curl-jq image.
Swap the Garage S3 store for Azurite (the Azure Blob Storage emulator) so the dev cluster matches the SDK and semantics used in production. Drops the Garage StatefulSet, init sidecar, and admin/layout/key bootstrap dance in favor of a single Azurite container with a /data PVC. Updates Tiltfile, wipe.sh, and AGENTS.md to reference azurite. App-side env wiring (connection string for devstoreaccount1) is a follow-up in the virtool repo; the deployments in this repo do not currently set any S3/Blob env vars.
Summary
/jobs/counts), eliminating the Redis dependency for job queue scalingpull.shscript to keep both kustomization and migration manifest in syncAGENTS.md(withCLAUDE.mdsymlinked to it) documenting the dev environment layout and common tasks