feature(ssh-tunnel): backend service and client API routes (Part 3 / Steps 1-3 + 4a) #1
Closed
CodeLieutenant wants to merge 21 commits into master
Signed-off-by: Dusan Malusev <dusan@dusanmalusev.dev>
Removing because there is no subscription anymore.
Add design document for routing Argus client traffic through an SSH tunnel via a proxy host, avoiding Cloudflare HTTPS costs. Covers client-generated ephemeral keys with ScyllaDB TTL, real-time key lookup via sshd AuthorizedKeysCommand, and automated proxy host provisioning from the admin panel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ents/nemeses commands
- Return an actionable 'unauthorized' error when the server responds with a non-JSON Content-Type (e.g. an HTML Cloudflare login page) instead of a confusing decode error, allowing LLMs and humans to self-correct
- Make --type optional on 'argus run get': the plugin type is now resolved automatically via GET /run/<run_id>/type and cached with a 24-hour TTL (run type is immutable), so subsequent calls skip the network round-trip
- Add 'argus run details' command: returns only the basic information shown on the Argus Details tab (no logs, screenshots, events, nemeses, resources or histograms); heavy data is accessible via dedicated subcommands
- Add 'argus run events' command: fetches CRITICAL and ERROR events directly from GET /client/sct/<run_id>/events/<severity>/get with --before, --after, and --limit flags; time flags accept Unix timestamps, RFC3339, or YYYY-MM-DD
- Add 'argus run nemeses' command: fetches nemesis records from the new GET /client/sct/<run_id>/nemesis/get endpoint with --before/--after filters
- Backend: add GET /client/sct/<run_id>/nemesis/get endpoint and SCTService.get_nemeses() with before/after timestamp filtering
- Backend: add 'after' query parameter support to both events endpoints (GET /events/get and GET /events/<severity>/get) and propagate it through SCTService.get_events() and SCTTestRun.get_events_limited()
…tive flag
After CF login, immediately exchange the session for a PAT via GET /api/v1/user/token and store it as the primary credential, discarding the session. This makes auth more robust in CI and across CF token expiry.
- ArgusService.Login: PAT-first fast-path; always exchanges session → PAT
- ArgusService.fetchPAT: new helper calling the user-token endpoint
- ErrFetchingToken / ErrStoringPAT: new sentinel errors
- UserToken route and UserTokenResponse model added
- root.go: extract buildAPIClientRaw; add --non-interactive persistent flag stored on context via cmdctx.WithNonInteractive
- RunWithAuthRetry: wraps command RunE; on unauthorized error either returns ErrUnauthorized (--non-interactive) or re-auths and retries once
- Auth tests updated to verify PAT is stored and session deleted post-login
…e; RunDetails view
- run get: SCTTestRun now renders as a structured two-section Details table
(Run Details + System Information + Summary) mirroring the Argus web UI
Details tab; events and nemesis rows show counts only, keeping AI context
windows small. All other run types keep the generic KV table.
- models: add SCTEventsResponse (Tabular, caching-aware), NemesisResponse
(extracted from run object, no extra API call), and RunDetails wrapper.
Retain SCTRunDetails/GenericRunDetails/DriverRunDetails/SirenadaRunDetails
and NemesisRecord for the run details and nemeses subcommands.
- cache/keys: rename TTLEvents→TTLSCTEvents (5 min, matching run TTL);
add TTLNemesis alias; rename EventsKey→SCTEventsKey (takes severity +
before-cursor so each pagination page is cached independently under
sct-events/{runID}/{severity}/{cursor}); add NemesisKey alongside the
existing NemesesKey so both endpoint-based and run-object-derived paths
have their own cache namespace.
- api/routes: fix SCT event route prefix from /client/sct/ to /sct/
(matches actual Flask blueprint mount point); add SCTEventsCountBySeverity
and SCTNemesisGet; keep SCTEventsBySeverity name used by run_nemeses_events.
- events/nemeses commands now follow a 4-step cache-first strategy: full-dataset cache hit → filter locally (no network); exact filtered cache hit → return directly; cache miss → fetch from API; store under full key (unfiltered) or filtered key (filtered)
- cache/keys: SCTEventsKey now encodes both before+after cursors; add SCTEventsFullKey, NemesesFilteredKey, and update NemesesKey to store under a 'full' sentinel sub-path
- fix all 19 linter errors (errcheck, staticcheck, unused): propagate fmt.Fprint* errors in cache.go; extract extractTarFile helper in logs.go to capture close errors; replace bare defer resp.Body.Close() with draining defers in api, auth, and cloudflared; fix Stats() nil-before-deref in cache.go; remove unused isCacheMiss and getCFToken; fix QF1011 in testid_test.go
- Suppress CF JWT from terminal output; only browser-URL lines are printed
- Add ErrPATNotFound sentinel and DeletePAT(); guard against empty PAT in LoadPAT()
- Add session fast-path in Login() to recover from failed PAT exchanges
- Pass CF token alongside session cookie in fetchPAT() for CF Access passthrough
- Add jwt.IsOlderThan() with iat-based age check; enforce 12h max CF token age
- Add ErrUnauthorized sentinel to api.DoJSON; use errors.Is in isUnauthorizedErr()
- Fix Rows() slice aliasing bug in RunDetails (out := rows[:0] → make)
- Unify SCT run details path: use RunDetails{Run: full}, delete SCTRunDetails
- Remove unused NemesisKey, TTLNemeses, TTLNemesis from cache/keys.go
- Rewrite logging: JSON file + opt-in text console with independent level filters
- Add -v/-vv/-vvv count flag wiring WithConsoleWriter to cmd.ErrOrStderr()
- Print success message to cmd.OutOrStdout() after argus auth completes
- Update logging tests: JSON file assertions, new console writer coverage
Explicitly discard return values from fmt.Fprintln/Fprintf calls and wrap deferred resp.Body.Close() in anonymous funcs to satisfy errcheck. Replace redundant runtime type assertion with idiomatic compile-time interface check to fix staticcheck S1040.
… silence usage on runtime errors
- Fix log download route: /testrun/tests/... -> /api/v1/tests/...
- Fix SCT events routes: add missing /client prefix to match actual Flask blueprint mount path
- Fix run nemeses: remove reference to non-existent GET endpoint; extract NemesisData from full run response instead
- Remove dead NemesisRecord type and SCTNemesisGet route constant; add NewNemesisResponse constructor
- run get now uses generic KVTabular full dump; run details keeps the curated sectioned view
- RunDetails.MarshalJSON emits only the fields shown in the text table (runDetailsJSON) for consistent JSON/text output
- Suppress usage output on runtime errors (API failures) by setting cmd.SilenceUsage=true at the start of every RunE; flag-parse and required-flag errors still show usage as before
…m run_details switch
Move the per-run-type details projection logic out of the inline switch in run_details.go into a RunTypeDetailsHandlers registry and a DispatchDetails helper in testrun.go, keeping it alongside the existing RunTypeHandlers. Adding support for a new plugin now only requires a single map entry in each registry. The run details command body shrinks from 47 lines to 3.
run get now shows the lightweight details summary via DispatchDetails, while run details fetches the full run object via RunTypeHandlers with KVTabular output and caching.
… handlers
All run, log, nemesis, event, comment and discussion commands now emit scoped zerolog entries at the appropriate level:
- Debug: entry-point with input flags/IDs, cache hit/miss, route, counts
- Info: one success summary per command with outcome fields
- Warn: non-fatal cache write failures that don't abort the operation
- Error: every error-return site, with full context fields
Also fix a latent bug in auth_token.go where log.Err(nil) was a zerolog no-op; replaced with log.Error() so the message is actually emitted.
…t 401/403 in DoJSON
Add SkipAuthRetryAnnotation to cache clear, cache info, and auth-token commands so they are excluded from the transparent re-authentication wrapper. Teach DoJSON to recognise HTTP 401 and 403 responses as ErrUnauthorized before attempting JSON decoding, enabling the auth retry logic to trigger on explicit server rejections.
Replace the single CF token keychain entry with three headless CF Access entries: cf-access-client-id, cf-access-client-secret, and cf-access-argus-token. Add StoreCFAccess, LoadCFAccess, HasCFAccess, and DeleteCFAccess functions so the CLI can persist and retrieve service-token credentials that bypass the cloudflared browser-based login flow. Includes tests for round-trip, partial storage, deletion, and HasCFAccess.
Add 'argus auth headless' which interactively prompts for three secrets (CF Access Client ID, CF Access Client Secret, Argus API Token) with masked input via golang.org/x/term and stores them in the OS keychain. This enables authentication without cloudflared or a browser. Add 'argus auth logout' which removes all stored credentials from the keychain: PAT, session cookie, and headless CF Access credentials. Both commands are registered as subcommands of 'argus auth' and carry SkipAuthRetryAnnotation.
Add Set, Get, GetAll, Keys, and IsValidKey functions to the config package so individual settings can be read and written to the config file on disk without disturbing other keys. Add 'argus config list', 'argus config get <key>', and 'argus config set <key> <value>' commands with shell completion for recognised keys (url, use_cloudflare). The auth headless command now automatically sets use_cloudflare=false in the config file after storing headless credentials.
…keychain probing
Replace all keychain.HasCFAccess() / keychain.LoadCFAccess() decision points with cfg.UseCf so the auth mode is driven by the use_cloudflare config flag rather than the presence of keychain entries.
When use_cloudflare=false (headless mode):
- buildAPIClientRaw loads CF Access headers + Argus token from keychain
- runWithAuthRetry reports an actionable error instead of launching cloudflared
- auth command directs the user to 'auth headless' or 'config set'
When use_cloudflare=true (cloudflared mode):
- buildAPIClientRaw loads PAT or session from keychain
- runWithAuthRetry triggers the full cloudflared browser login
- auth command verifies existing credentials before re-authenticating
Remove short-circuit early-returns from ArgusService.Login() so callers verify their credentials first and only call Login() when re-auth is needed. Update the corresponding test.
This lets users keep both sets of credentials in the keychain and switch between modes with 'argus config set use_cloudflare true/false'.
… repeated_at Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…(Part 3, Steps 1-3 + 4a)
Implements the server-side foundation required for the SSH tunnel feature
described in docs/plans/ssh-tunnel-design.md. This is the first stage that
must land before any client-side or proxy-host work can be tested end-to-end.
## What is included
### DB models (Step 1 + Step 2)
- argus/backend/models/ssh_key.py
  - SSHTunnelKey: stores a client-registered ed25519 public key scoped to a
    specific (user, tunnel) pair. Rows carry a ScyllaDB TTL (default 24 h) so
    expired keys are auto-deleted with no manual cleanup. expires_at is stored
    as an informational timestamp so clients know when to re-register.
  - ProxyTunnelConfig: stores the connection details of a proxy tunnel server
    (host, port, proxy_user, target_host, target_port, host_key_fingerprint,
    service_user_id, is_active). Only one config has is_active=True at a time.
- argus/backend/models/web.py: both models added to USED_MODELS so they are
  created automatically by sync_models / at test startup.
### Backend service (Step 3)
- argus/backend/service/tunnel_service.py (TunnelService class):
  - register_tunnel(user, public_key, tunnel_id?, ttl_seconds?): validates and
    fingerprints the submitted public key (SHA256 derived server-side via the
    cryptography library; the private key never touches the server), stores an
    SSHTunnelKey row with ScyllaDB TTL, returns the full proxy connection dict
    including expires_at (UTC ISO-8601).
  - get_authorized_keys(tunnel_id?): returns all non-expired public keys as a
    newline-separated OpenSSH authorized_keys string. Called by the proxy host
    via AuthorizedKeysCommand → argus-cli ssh-keys.
  - list_keys(tunnel_id?) / delete_key(key_id): key inventory and revocation.
  - get_proxy_tunnel_config(tunnel_id?) / save_proxy_tunnel_config(payload):
    config CRUD. save_ deactivates the previous active config and creates a
    dedicated Argus service user (proxy-tunnel-<host>) with a fresh API token
    that is returned once to the admin for proxy-host provisioning.
  - TunnelServiceException for all expected business errors.
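
As an illustration of the validate, fingerprint and expiry steps that register_tunnel is described as performing, here is a minimal sketch. It is not the code in tunnel_service.py: the PR states that the real service uses the cryptography library, while this sketch uses only the standard library, and the function names (`parse_ed25519_public_key`, `sha256_fingerprint`, `expires_at_iso`) are hypothetical.

```python
import base64
import hashlib
import struct
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24 h default mentioned above


def parse_ed25519_public_key(line: str) -> bytes:
    """Return the raw key blob of an 'ssh-ed25519 <base64> [comment]' line.

    Hypothetical helper: raises ValueError on anything that does not look
    like an OpenSSH ed25519 public key.
    """
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-ed25519":
        raise ValueError("expected an 'ssh-ed25519 <base64> [comment]' line")
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError("public key is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    # The blob begins with a length-prefixed algorithm name that must match.
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != b"ssh-ed25519":
        raise ValueError("key blob does not declare ssh-ed25519")
    return blob


def sha256_fingerprint(blob: bytes) -> str:
    """OpenSSH-style fingerprint: 'SHA256:' + unpadded base64 of SHA-256(blob)."""
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def expires_at_iso(ttl_seconds: int = DEFAULT_TTL_SECONDS,
                   now: Optional[datetime] = None) -> str:
    """Informational expiry timestamp in UTC ISO-8601, derived from the TTL."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(seconds=ttl_seconds)).isoformat()
```

Only the public key is ever parsed here, which matches the point that the private key never reaches the server. The expiry timestamp is derived from the same TTL that would be applied to the row, so the client can tell when to re-register.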
### Client API routes (Step 4a)
- argus/backend/controller/ssh_api.py (Blueprint registered at /ssh):
  - POST /ssh/tunnel @api_login_required: register a public key, get proxy
    config back. Accepts optional ttl_seconds and tunnel_id.
  - GET /ssh/keys @api_login_required: return authorized_keys text.
    Accepts optional ?tunnel_id= query param.
- argus/backend/controller/client_api.py: ssh_api blueprint registered as a
  sub-blueprint → final URLs are /api/v1/client/ssh/tunnel and
  /api/v1/client/ssh/keys.
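
The text body that GET /ssh/keys is described as returning can be pictured with the sketch below. It is an assumption-laden illustration, not the route's code: `KeyRecord` is a hypothetical in-memory stand-in for an SSHTunnelKey row, the explicit `expires_at` check only mimics what the ScyllaDB TTL does in the real system, and the trailing newline is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional
from uuid import UUID


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical stand-in for a stored SSHTunnelKey row."""
    public_key: str
    tunnel_id: UUID
    expires_at: datetime


def render_authorized_keys(records: Iterable[KeyRecord],
                           tunnel_id: Optional[UUID] = None,
                           now: Optional[datetime] = None) -> str:
    """Build authorized_keys text: one non-expired key per line.

    When tunnel_id is given, only keys registered for that tunnel are kept,
    mirroring the optional ?tunnel_id= query parameter.
    """
    now = now or datetime.now(timezone.utc)
    lines = [
        record.public_key.strip()
        for record in records
        if record.expires_at > now
        and (tunnel_id is None or record.tunnel_id == tunnel_id)
    ]
    # An empty result yields an empty body rather than a lone newline.
    return "\n".join(lines) + ("\n" if lines else "")
```

Scoping by tunnel keeps each proxy host from accepting keys that were registered for a different proxy.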
### Tests
- argus/backend/tests/tunnel/test_tunnel_service.py: 14 docker_required tests
covering register_tunnel (happy path, custom TTL, no active config, invalid
key, explicit tunnel_id), get_authorized_keys (format + tunnel scoping),
delete_key, save_proxy_tunnel_config (service user creation, old config
deactivation, missing fields), get_proxy_tunnel_config.
- argus/backend/tests/tunnel/test_ssh_api.py: 10 docker_required integration
tests via the Flask test client covering both routes (success, ttl, explicit
tunnel_id, missing/invalid key, malformed UUID, unauthenticated access,
tunnel scoping, empty response).
## No new dependencies
cryptography is already a transitive dependency via PyJWT[crypto].
## What is NOT included (follow-up PRs)
- Step 4b: admin API endpoints (proxy tunnel config + key list/delete)
- Step 4c: proxy host provisioning Jinja template
- Step 4d: Admin Panel UI (ProxyTunnelManager.svelte)
- Step 5/6: argus-client tunnel module and base.py integration
- Step 7b: argus-cli ssh-keys command
## How to run the tests
uv run pytest argus/backend/tests/tunnel/ -m docker_required -v
## Context

This PR implements Part 3 of the SSH Tunnel design (`docs/plans/ssh-tunnel-design.md`): the server-side foundation that must be in place before any client-side work (Steps 5/6) or proxy-host provisioning (Steps 4c/7b) can be tested end-to-end.

The goal of this stage: a client can POST a public key to Argus and receive proxy tunnel connection details in return; the proxy host can GET all valid public keys at any time via `AuthorizedKeysCommand`.

## What's in this PR

### Step 1 + 2: DB models (`argus/backend/models/ssh_key.py`)

Two new CQLEngine models added to `USED_MODELS` (auto-created by `sync_models` / test startup, no manual CQL migration needed):

#### `SSHTunnelKey`

Stores a client-registered public key scoped to a `(user_id, tunnel_id)` pair.

- Rows carry a ScyllaDB TTL (default 24 h, set via `ttl_seconds`). Expired rows are auto-deleted, so no cleanup job is needed.
- `expires_at` is stored as an informational timestamp so clients know when to re-register.
- `fingerprint` is computed server-side (SHA256 of raw key bytes, `SHA256:<base64>` format); the private key never touches the server.

#### `ProxyTunnelConfig`

Stores the connection details of an SSH proxy tunnel server (host, port, proxy_user, target_host, target_port, host_key_fingerprint, service_user_id, is_active).

Only one config has `is_active=True` at a time.

### Step 3: Backend service (`argus/backend/service/tunnel_service.py`)

`TunnelService` with the following methods:

| Method | Description |
|---|---|
| `register_tunnel(user, public_key, tunnel_id?, ttl_seconds?)` | Validate and fingerprint the key, store an `SSHTunnelKey` with TTL, return proxy connection dict + `expires_at` |
| `get_authorized_keys(tunnel_id?)` | Return all non-expired keys as `authorized_keys` text; called by proxy host via `AuthorizedKeysCommand` |
| `list_keys(tunnel_id?)` | Key inventory |
| `delete_key(key_id)` | Key revocation |
| `get_proxy_tunnel_config(tunnel_id?)` | Return the proxy tunnel config (the active one, or by id) |
| `save_proxy_tunnel_config(payload)` | Deactivate the previous active config and create a dedicated service user (`proxy-tunnel-<host>`) with a fresh API token for the proxy host |

`TunnelServiceException` covers all expected business errors (no active config, invalid key, missing fields, not found).

### Step 4a: Client API routes (`argus/backend/controller/ssh_api.py`)

New `ssh_api` Blueprint registered as a sub-blueprint of `client_api` → final URLs under `/api/v1/client/ssh/`:

#### `POST /api/v1/client/ssh/tunnel`

Register a public key, receive proxy connection details.

Request body: `{ "public_key": "ssh-ed25519 ...", "ttl_seconds": 86400, "tunnel_id": "<uuid>" }`

Response: `{ "status": "ok", "response": { "key_id", "tunnel_id", "proxy_host", "proxy_port", "proxy_user", "target_host", "target_port", "host_key_fingerprint", "expires_at" } }`

#### `GET /api/v1/client/ssh/keys`

Returns all non-expired public keys as `text/plain` (`authorized_keys` format, one key per line).

Optional query param: `?tunnel_id=<uuid>` to scope to a specific proxy host.

Called by the proxy host's `AuthorizedKeysCommand` via `argus-cli ssh-keys` (Step 7b).

### Tests (`argus/backend/tests/tunnel/`)

All tests are `@pytest.mark.docker_required` and use the shared ScyllaDB Docker fixture from `conftest.py`.

`test_tunnel_service.py`: 14 unit tests:

- `register_tunnel`: happy path, custom TTL, no active config raises, invalid key, missing key, explicit tunnel_id
- `get_authorized_keys`: OpenSSH format validation, tunnel scoping
- `delete_key`: row removed, nonexistent raises
- `save_proxy_tunnel_config`: service user created, old config deactivated, missing fields raises
- `get_proxy_tunnel_config`: returns active, by id, None when none active

`test_ssh_api.py`: 10 integration tests via Flask test client:

- `POST /ssh/tunnel`: success, ttl_seconds, explicit tunnel_id, missing key, invalid key, malformed UUID (400), unauthenticated (403)
- `GET /ssh/keys`: success + key in output, tunnel scoping, malformed UUID (400), unauthenticated (403), empty when no keys

## No new dependencies

`cryptography` is already a transitive dependency via `PyJWT[crypto]`.

## How to test

Run all tunnel tests (Docker required): `uv run pytest argus/backend/tests/tunnel/ -m docker_required -v`

To exercise the API manually after the server is running:
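
One way to drive the two routes by hand is sketched below with the standard library. Several things here are assumptions rather than facts from this PR: the base URL `http://localhost:5000`, the `Authorization: token <api token>` header scheme, and the helper names `build_register_request` and `build_keys_request`. Only the URL paths, the request body fields and the `tunnel_id` query parameter come from the description above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: a locally running Argus instance; adjust to your deployment.
BASE_URL = "http://localhost:5000"


def _auth_headers(api_token: str) -> dict:
    # Assumption: token-style Authorization header; not confirmed by this PR.
    return {"Authorization": f"token {api_token}"}


def build_register_request(api_token: str, public_key: str,
                           ttl_seconds: Optional[int] = None,
                           tunnel_id: Optional[str] = None,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST that registers a public key; optional fields are omitted when unset."""
    body = {"public_key": public_key}
    if ttl_seconds is not None:
        body["ttl_seconds"] = ttl_seconds
    if tunnel_id is not None:
        body["tunnel_id"] = tunnel_id
    return urllib.request.Request(
        f"{base_url}/api/v1/client/ssh/tunnel",
        data=json.dumps(body).encode("utf-8"),
        headers={**_auth_headers(api_token), "Content-Type": "application/json"},
        method="POST",
    )


def build_keys_request(api_token: str, tunnel_id: Optional[str] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET that fetches authorized_keys text, optionally scoped to one tunnel."""
    query = "?" + urllib.parse.urlencode({"tunnel_id": tunnel_id}) if tunnel_id else ""
    return urllib.request.Request(
        f"{base_url}/api/v1/client/ssh/keys{query}",
        headers=_auth_headers(api_token),
        method="GET",
    )
```

Each request can then be sent with `urllib.request.urlopen(request)`. The POST reply is JSON with `status` and `response` keys, and the GET reply is plain text with one key per line.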
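Note: a `ProxyTunnelConfig` row must exist with `is_active=True` before `POST /ssh/tunnel` will work. That can be inserted directly via a Python shell until Step 4b (admin API) lands.

## What is NOT in this PR (follow-up)

- Step 4b: admin API endpoints (proxy tunnel config + key list/delete)
- Step 4c: proxy host provisioning Jinja template
- Step 4d: Admin Panel UI (`ProxyTunnelManager.svelte`)
- Step 5/6: `argus-client` tunnel module + `base.py` integration
- `--use-tunnel` flag + `argus-cli ssh-keys` command (the latter is Step 7b)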