Skip to content

Replace Redis with Garage S3 and migrate workflow scaling to metrics API#51

Merged
igboyes merged 9 commits intomainfrom
big-changes
May 6, 2026
Merged

Replace Redis with Garage S3 and migrate workflow scaling to metrics API#51
igboyes merged 9 commits intomainfrom
big-changes

Conversation

@igboyes
Copy link
Copy Markdown
Member

@igboyes igboyes commented Apr 30, 2026

Summary

  • Remove Redis from the dev environment entirely; replace with Garage (a self-hosted S3-compatible object store) for storage, backed by a StatefulSet with an init-container that bootstraps the bucket and access key
  • Switch KEDA workflow ScaledJob triggers from Redis list-length polling to the Virtool jobs counts HTTP metrics API (/jobs/counts), eliminating the Redis dependency for job queue scaling
  • Bump all component image tags (virtool, UI, and all workflow images) to their latest versions and update the pull.sh script to keep both kustomization and migration manifest in sync
  • Add AGENTS.md (with CLAUDE.md symlinked to it) documenting the dev environment layout and common tasks

igboyes added 5 commits April 16, 2026 15:27
Replace the removed redis-list KEDA triggers with per-workflow
metrics-api triggers that poll /jobs/v2/counts on the web api and
scale on the pending count for each workflow.

Add api-web to scaled_job_deps in Tiltfile so ScaledJobs come up
after the endpoint is reachable.
Run the bucket/key bootstrap as a native sidecar inside the garage
pod instead of a separate Job. Sidecar readiness gates pod readiness
so dependents only proceed after bootstrap completes.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces Redis with Garage (S3-compatible storage) and transitions KEDA triggers from Redis lists to a metrics-based API. It also updates several service image versions, refactors the tag update script, and introduces a wipe utility. Key feedback includes a critical deadlock in the Garage StatefulSet where an initContainer waits for the main container to be healthy, and the inefficient practice of installing packages during container startup. Additionally, improvements were suggested for the portability of shell scripts, handling version prefixes in release tags, and addressing security concerns regarding the use of sudo in cleanup operations.

Comment thread manifests/db/garage.yaml Outdated
Comment on lines +88 to +106
initContainers:
- name: garage-init
image: alpine:3.20
restartPolicy: Always
command: ["/bin/sh", "-c", "/scripts/init.sh && touch /tmp/ready && exec sleep infinity"]
resources:
requests:
cpu: 50m
memory: 32Mi
limits:
cpu: 100m
memory: 64Mi
readinessProbe:
exec:
command: ["test", "-f", "/tmp/ready"]
periodSeconds: 2
volumeMounts:
- name: init-script
mountPath: /scripts
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This initContainer will cause a deadlock. In Kubernetes, all initContainers (including sidecars with restartPolicy: Always) must either complete or become ready before the main containers start. Since init.sh waits for the garage service (which is in the containers section) to be healthy (line 41), it will never become ready, preventing the garage container from ever starting.

To fix this, move the bootstrap logic into a regular container within the containers section so they start concurrently. Note that restartPolicy: Always is not a valid field for regular containers and should be removed.

Comment thread manifests/db/garage.yaml Outdated
ACCESS_KEY_ID="GK000000000000000000000001"
SECRET_ACCESS_KEY="0000000000000000000000000000000000000000000000000000000000000001"

apk add --no-cache curl jq >/dev/null
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Installing packages via apk add during container startup is inefficient and makes the service dependent on external network availability. It's better to use a container image that already includes curl and jq (e.g., badouralix/curl-jq or a custom image).

Comment thread scripts/pull.sh
Comment thread scripts/pull.sh Outdated
# Update the file if it exists
if [[ -f "$file" ]]; then
sed -i "s/${prefix}[0-9.]\+/${prefix}${tag}/" "$file"
sed -i "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sed -i command is not portable between GNU and BSD (macOS) versions. On macOS, sed -i requires an extension argument (e.g., sed -i '' ...). A portable approach is to use a temporary backup extension and then remove it.

Suggested change
sed -i "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file"
sed -i.bak "s|${prefix}[0-9.]\+|${prefix}${tag}|" "$file" && rm "${file}.bak"

Comment thread scripts/wipe.sh Outdated

echo "Clearing host data at /virtool/data..."
if [ -d /virtool/data ]; then
rm -rf /virtool/data/* 2>/dev/null || sudo rm -rf /virtool/data/*
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using sudo rm -rf on a host path is risky and can lead to accidental data loss if the path is misconfigured. It also interrupts the workflow by prompting for a password. Consider if this cleanup can be handled via a Kubernetes Job with appropriate volume mounts or by ensuring the user has the necessary permissions without sudo.

@igboyes igboyes requested a review from ReeceHoffmann April 30, 2026 18:29
Comment thread scripts/wipe.sh Outdated
Comment thread scripts/wipe.sh Outdated
Comment thread Tiltfile
igboyes added 4 commits May 1, 2026 09:37
- scripts/wipe.sh: refuse to run unless kubectl current-context and
  the running Minikube profile are both 'virtool'; replace
  `sudo rm -rf /virtool/data/*` with `find -mindepth 1` so dotfiles
  are removed too.
- scripts/init.sh: pin every minikube/kubectl call to the dedicated
  'virtool' profile/context.
- scripts/pull.sh: strip leading 'v' from upstream tags and clean up
  the sed backup file.
- manifests/db/garage.yaml: move garage-init from an initContainer
  running apk-installed curl/jq to a sidecar using the
  badouralix/curl-jq image.
Swap the Garage S3 store for Azurite (the Azure Blob Storage emulator)
so the dev cluster matches the SDK and semantics used in production.

Drops the Garage StatefulSet, init sidecar, and admin/layout/key
bootstrap dance in favor of a single Azurite container with a /data
PVC. Updates Tiltfile, wipe.sh, and AGENTS.md to reference azurite.

App-side env wiring (connection string for devstoreaccount1) is a
follow-up in the virtool repo; the deployments in this repo do not
currently set any S3/Blob env vars.
@igboyes igboyes merged commit 243af91 into main May 6, 2026
2 checks passed
@igboyes igboyes deleted the big-changes branch May 6, 2026 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants