
Phase 1: docker-based deploy pipeline #43

Merged
cyberb merged 19 commits into master from docker-deploy on May 17, 2026

Conversation

@cyberb (Member) commented May 17, 2026

Summary

  • Build a syncloud/redirect docker image from a root Dockerfile (multi-stage, distroless static, api/www/cli + emails baked in)
  • New deploy/deploy.sh migrates a host from the existing systemd units (redirect.api, redirect.www) to two --restart=unless-stopped containers, mounting /var/www/redirect so unix sockets stay where Apache already proxies to
  • Pipeline gains a deploy test step that runs the script against the same www.syncloud.test service the integration step provisioned via systemd, so the systemd-to-docker migration is exercised end-to-end before any UAT/prod run
  • UI tests now run against the docker-migrated host, proving the cutover doesn't break the API or web
  • deploy uat and deploy prod are wired up like the store pipeline; they will be red on CI until uat_deploy_* / prod_deploy_* secrets are added to Drone — that gating is deliberate so the first real UAT run is the only thing left untested

The legacy tarball + systemd flow is left in place so running prod is undisturbed by this branch. Removal will be a follow-up phase.
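
The cutover that deploy/deploy.sh performs can be sketched roughly as follows. This is a dry-run illustration, not the script itself: the unit and container names come from the summary above, the image entrypoint arguments are assumptions, and run() echoes commands instead of executing them.

```shell
# Dry-run sketch of the systemd-to-docker cutover described above.
# run() echoes each command; the real script would execute them.
run() { echo "+ $*"; }

migrate() {
  # Stop and disable the legacy systemd units.
  for unit in redirect.api redirect.www; do
    run systemctl stop "$unit"
    run systemctl disable "$unit"
  done
  # Start one container per service, mounting /var/www/redirect so the
  # unix sockets stay where Apache already proxies to.
  for svc in api www; do
    run docker rm -f "redirect-$svc"
    run docker run -d --name "redirect-$svc" \
      --restart=unless-stopped \
      -v /var/www/redirect:/var/www/redirect \
      syncloud/redirect "$svc"
  done
}

migrate
```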

Test plan

  • Drone build 1239 — every step before deploy uat is green (clone, services, build web/backend, package, test-integration, docker push, deploy test, test-ui-desktop, test-ui-mobile, artifact, all testapi stages); deploy uat is red because secrets aren't set; deploy prod is skipped on this non-stable branch
  • Add uat_deploy_host / uat_deploy_user / uat_deploy_key / uat_deploy_url in Drone and push to trigger a real UAT migration rehearsal
  • After UAT is verified, add prod_deploy_* secrets and merge to stable to migrate production

cyberb added 19 commits May 17, 2026 00:06

Build a syncloud/redirect image and push it from CI. Add a deploy step
that exercises the systemd->docker migration on the same test platform
that the integration step already provisioned, so the real UAT/prod
migration has no surprises.

UAT and prod steps are wired like the store pipeline; they will be red
until the corresponding deploy_* secrets are configured.

The bookworm tag does not exist for that Go version; the rest of the
pipeline already pins to buster.

On a freshly-provisioned host where docker.io was just installed
the daemon is not auto-started inside containerised systemd
contexts (deb-systemd-invoke can't reach systemd from postinst).

CI uses a privileged platform-bookworm service container in which
docker.service refuses to start under systemd. Bring up dockerd in
the background with the vfs storage driver so nested execution is
not gated on overlay/cgroup features. Real UAT/prod hosts will be
served by the systemctl branch above.
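
A minimal sketch of that branch, with systemctl and dockerd stubbed so the control flow is visible outside a real host (the actual script calls the real binaries and backgrounds dockerd):

```shell
# Stubbed illustration of the dockerd fallback. On a real UAT/prod host
# systemctl start docker succeeds and the fallback never runs; in the
# CI service container it fails, so dockerd is launched directly with
# the vfs storage driver (no overlay/cgroup features required).
systemctl() { return 1; }           # stub: simulate the CI case
dockerd()   { echo "dockerd $*"; }  # stub: record the invocation

start_docker() {
  if systemctl start docker 2>/dev/null; then
    echo "docker.service started via systemd"
  else
    dockerd --storage-driver vfs    # real script runs this in the background
  fi
}

start_docker
```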

The CI 'deploy test' step targets the in-pipeline www.syncloud.test
service, but the URL the verifier curls (api.syncloud.test) is not
resolvable from the runner without an alias. UAT/prod use real DNS
so the alias is only added when the URL host fails to resolve.
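
The conditional alias can be sketched as a small helper. The function name mirrors the fixture mentioned later in this PR, but the shape and the hosts-file parameter are hypothetical; the real step edits /etc/hosts directly.

```shell
# Append "<ip> <host>" to a hosts file only when <host> does not already
# resolve (real DNS on UAT/prod; the alias is only needed on the runner).
# The third argument is a test hook; the real step writes /etc/hosts.
add_host_alias() {
  host="$1"; ip="$2"; hosts_file="${3:-/etc/hosts}"
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "$host already resolves, no alias needed"
  else
    echo "$ip $host" >> "$hosts_file"
  fi
}
```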

The 'build backend' pipeline step already produces static binaries at
build/bin/{api,www,cli}; rebuilding inside the image was duplicate
work and forced a heavier multi-stage layout. Now the image just
copies what the previous step produced, matching how ../store does
it.

- Move the Go build and test invocations out of .drone.jsonnet into
  backend/build.sh and backend/test.sh so the same recipe can be run
  locally and by CI without copy-paste drift.
- Drop -linkmode external -extldflags -static; with CGO_ENABLED=0 the
  Go toolchain already produces a fully static binary suitable for
  distroless/static.
- Add a backend/version package and inject GitSha/BuildNumber/BuildTime
  via -X ldflags, matching the store layout. Each main.go prints its
  build info on startup so it lands in journal/docker logs.
- Scope the long-standing 'api' .gitignore entry to the repo root so
  backend/cmd/api stops being shadowed.
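
The build invocation described above might look like this. The values and the version-package import path are placeholders for illustration; only the CGO_ENABLED=0 static-build and -X ldflags pattern is taken from the commit.

```shell
# Assemble ldflags that bake build info into the binary via the version
# package. With CGO_ENABLED=0 the Go toolchain already emits a fully
# static binary, so no -linkmode external / -extldflags -static is needed.
GIT_SHA=abc1234                               # placeholder values
BUILD_NUMBER=1239
BUILD_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)

PKG=github.com/syncloud/redirect/backend/version   # assumed import path
LDFLAGS="-X $PKG.GitSha=$GIT_SHA -X $PKG.BuildNumber=$BUILD_NUMBER -X $PKG.BuildTime=$BUILD_TIME"

# The real build.sh would run this per binary (api, www, cli):
echo "CGO_ENABLED=0 go build -ldflags \"$LDFLAGS\" -o build/bin/api ./cmd/api"
```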

Rename 'test-integration' to 'systemd install' and have it run only
the install path plus a smoke index check on the systemd-deployed
redirect. Add a new 'test-api' step that runs the full API suite from
verify.py *after* deploy test migrates the host to docker, so the
upgrade path (the critical one for the first UAT) is exercised end
to end and every API assertion runs against what is actually deployed.

Root cause of the intermittent 'cannot mount squashfs ... failed to
setup loop device' failure in test-integration / systemd install: the
verify.py test_start fixture invokes 'snap remove platform' on the
in-pipeline www.syncloud.test service, which forces snapd to run its
syscheck and try to loop-mount a squashfs probe. Under load on the
shared CI host loop devices are scarce and that mount fails about
half the time.

Switch the test service from syncloud/platform-bookworm-amd64 (which
ships the platform snap pre-installed) to syncloud/bootstrap-bookworm-amd64
(same image ../store uses). Bootstrap has no snap, so no probe runs
and the loop-device contention is gone. Drop the now-unnecessary
'snap remove platform' line; the rest of test_start only relies on
apt + ssh + systemctl which bootstrap provides.

test_start (skipped in this step) was the only place that called
add_host_alias to make api.syncloud.test and auth.syncloud.test
resolve to the device. Re-add the alias inline before pytest so the
post-deploy API suite can reach the deployed host.

test_backup hits /var/www/redirect/current/bin/redirectdb after the
docker migration and fails with exit 127. Dump the post-migration
state of the redirect dir and check mysqldump availability so the
next CI run shows which piece is missing.

test_backup (and any future ssh-based check) shells out to sshpass via
device.run_ssh; the test-api runner image had only default-mysql-client
so sshpass returned 127. Add both packages to match what 'systemd
install' installs. Drop the deploy-verify diagnostic now that the
cause is identified.

verify.py was doing two jobs: installing redirect on a fresh systemd
host (test_start), and asserting API behaviour against the running
service. The pipeline now has two distinct phases (systemd install,
then docker deploy), so put each phase in its own file:

- test-systemd.py owns the tarball install plus a smoke index check
  and the apache/journal log-collection teardown.
- test.py (renamed from verify.py) holds the API suite that runs
  against the docker-deployed redirect.

Move the add_host_alias setup into an autouse session-scoped fixture
in conftest.py so both files get it without any /etc/hosts bash hack
in the Drone yaml. Drop the now-unused imports left in test.py and
the corresponding --deselect flags from the Drone commands.

deploy-verify.sh now also curls the www host (https://www.<domain>/)
to confirm the user-facing UI is up, and POSTs /domain/update with
$SMOKE_TOKEN to exercise the critical DB+Route53 path end to end. The
smoke step is opt-in: deploy uat/prod read SMOKE_TOKEN from new
uat_smoke_token / prod_smoke_token secrets, and the check is skipped
when the env is empty (so deploy test still works as before).

/status only proves the api process is alive and apache is routing
to it; it does not exercise the DB. POST /domain/update with a known
bogus token: the handler must query the DB to know the token does
not exist, so a working DB returns success:false with 'unknown
domain update token' while a broken DB would 500 or hang. Runs on
every deploy (test/uat/prod) — no setup needed.

Bogus-token DB smoke is enough until we set up a dedicated smoke
account; the SMOKE_TOKEN branch was dead code that referenced
secrets we agreed not to create yet.

Validation runs before the DB query, so a token-only payload short
circuits with 'web_protocol Missing' / 'web_local_port Missing'
instead of touching the DB. Include both with default values so the
handler reaches GetDomainByToken and we can assert 'unknown domain
update token' as the DB-alive signal.
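
The resulting check might look like the sketch below. curl is stubbed here so the payload is visible without a live host; the token value and function name are made up for illustration, and the real deploy-verify.sh posts to the live api host and greps the response for 'unknown domain update token'.

```shell
# Sketch of the DB-alive smoke check. The stub records the curl
# invocation instead of hitting the network.
curl() { echo "curl $*"; }

db_smoke() {
  domain="$1"
  # The token is intentionally bogus; web_protocol/web_local_port are
  # included with defaults so validation passes and the handler reaches
  # GetDomainByToken, the actual DB query.
  payload='{"token":"00000000-smoke","web_protocol":"https","web_local_port":443}'
  curl -s -X POST "https://api.$domain/domain/update" \
    -H 'Content-Type: application/json' -d "$payload"
  # real script: | grep -q 'unknown domain update token'
}

db_smoke syncloud.test
```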

cyberb merged commit ce84189 into master May 17, 2026
1 check passed
cyberb deleted the docker-deploy branch May 17, 2026 18:09