
feature(ssh-tunnel): backend service and client API routes (Part 3 / Steps 1-3 + 4a)#1

Closed
CodeLieutenant wants to merge 21 commits into master from feature/ssh-tunnel-backend-part3

Conversation

@CodeLieutenant
Owner

Context

This PR implements Part 3 of the SSH Tunnel design (docs/plans/ssh-tunnel-design.md) — the server-side foundation that must be in place before any client-side work (Steps 5/6) or proxy-host provisioning (Steps 4c/7b) can be tested end-to-end.

The goal of this stage: a client can POST a public key to Argus and receive proxy tunnel connection details in return; the proxy host can GET all valid public keys at any time via AuthorizedKeysCommand.


What's in this PR

Step 1 + 2 — DB models (argus/backend/models/ssh_key.py)

Two new CQLEngine models added to USED_MODELS (auto-created by sync_models/test startup, no manual CQL migration needed):

SSHTunnelKey
Stores a client-registered public key scoped to a (user_id, tunnel_id) pair.

  • Inserted with a ScyllaDB TTL (default 24 h, overridable via ttl_seconds). Expired rows are auto-deleted — no cleanup job needed.
  • expires_at is stored as an informational timestamp so clients know when to re-register.
  • fingerprint is computed server-side (SHA256 of raw key bytes, SHA256:<base64> format) — the private key never touches the server.
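For reviewers unfamiliar with the SHA256:<base64> format, here is a minimal sketch of how such a fingerprint can be derived from an authorized_keys-style line. It uses only hashlib and base64 and does no key-type validation, so it is not the code in this PR (which validates via the cryptography library); the function name is hypothetical.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(public_key_line: str) -> str:
    """Return a SHA256:<base64> fingerprint for '<type> <base64-blob> [comment]'.

    Hypothetical helper for illustration; it does not check that the blob
    is a well-formed key of the declared type.
    """
    parts = public_key_line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The second field is the base64-encoded raw public key blob.
    raw = base64.b64decode(parts[1], validate=True)
    digest = hashlib.sha256(raw).digest()
    # OpenSSH prints the digest as base64 with the '=' padding stripped.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Only the public key blob is hashed, which is why the private key never has to leave the client.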

ProxyTunnelConfig
Stores the connection details of an SSH proxy tunnel server (host, port, proxy_user, target_host, target_port, host_key_fingerprint, service_user_id, is_active).
Only one config has is_active=True at a time.
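The single-active invariant can be illustrated with an in-memory sketch. ProxyConfig and activate below are hypothetical stand-ins, not the CQLEngine model or the service method from this PR, and the real implementation writes to ScyllaDB rather than a list.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


@dataclass
class ProxyConfig:
    """Hypothetical, reduced stand-in for ProxyTunnelConfig."""
    host: str
    port: int
    is_active: bool = False
    id: UUID = field(default_factory=uuid4)


def activate(configs: List[ProxyConfig], new: ProxyConfig) -> None:
    """Store `new` as the only active config, deactivating any earlier one."""
    for cfg in configs:
        cfg.is_active = False
    new.is_active = True
    configs.append(new)
```

Deactivating before appending keeps the list at no more than one active entry after each call.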


Step 3 — Backend service (argus/backend/service/tunnel_service.py)

TunnelService with the following methods:

| Method | Purpose |
| --- | --- |
| register_tunnel(user, public_key, tunnel_id?, ttl_seconds?) | Validate + fingerprint key, store SSHTunnelKey with TTL, return proxy connection dict + expires_at |
| get_authorized_keys(tunnel_id?) | Return all non-expired keys as OpenSSH authorized_keys text; called by proxy host via AuthorizedKeysCommand |
| list_keys(tunnel_id?) | Key inventory (used by admin panel in a later PR) |
| delete_key(key_id) | Immediate revocation |
| get_proxy_tunnel_config(tunnel_id?) | Read active or specific config |
| save_proxy_tunnel_config(payload) | Create new config, deactivate the old active one, create a dedicated service user (proxy-tunnel-<host>) with a fresh API token for the proxy host |

TunnelServiceException covers all expected business errors (no active config, invalid key, missing fields, not found).
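As a rough illustration of the register_tunnel contract (default 24 h TTL, expires_at in the response, business errors raised as exceptions), here is a self-contained sketch. TunnelError, register, and the positive-TTL check are assumptions made for the example; the real method also fingerprints the key and writes the row with a ScyllaDB TTL.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 h default, as stated in this PR


class TunnelError(Exception):
    """Hypothetical stand-in for TunnelServiceException."""


def register(
    active_config: Optional[dict],
    public_key: str,
    ttl_seconds: Optional[int] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Return the proxy connection details plus an ISO-8601 expires_at."""
    if not public_key:
        raise TunnelError("public_key is required")
    if active_config is None:
        raise TunnelError("no active proxy tunnel config")
    ttl = DEFAULT_TTL_SECONDS if ttl_seconds is None else ttl_seconds
    if ttl <= 0:  # assumption: the sketch rejects non-positive TTLs
        raise TunnelError("ttl_seconds must be positive")
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=ttl)
    return {**active_config, "expires_at": expires_at.isoformat()}
```

Passing `now` explicitly keeps the expiry calculation deterministic in tests.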


Step 4a — Client API routes (argus/backend/controller/ssh_api.py)

New ssh_api Blueprint registered as a sub-blueprint of client_api → final URLs under /api/v1/client/ssh/:

POST /api/v1/client/ssh/tunnel    @api_login_required

Register a public key, receive proxy connection details.
Request body: { "public_key": "ssh-ed25519 ...", "ttl_seconds": 86400, "tunnel_id": "<uuid>" }
Response: { "status": "ok", "response": { "key_id", "tunnel_id", "proxy_host", "proxy_port", "proxy_user", "target_host", "target_port", "host_key_fingerprint", "expires_at" } }
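A client could build this request and unwrap the response envelope roughly as follows. This is a sketch using only the standard library; the helper names are hypothetical, the Authorization header format follows the curl example in this description, and the shape of error responses is not specified here, so the sketch only checks for status "ok".

```python
import json
import urllib.request


def build_register_request(
    base_url: str, token: str, public_key: str, ttl_seconds: int = 86400
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/client/ssh/tunnel request."""
    body = {"public_key": public_key, "ttl_seconds": ttl_seconds}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/client/ssh/tunnel",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_register_response(payload: dict) -> dict:
    """Return the inner 'response' dict from a successful reply."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"registration failed: {payload!r}")
    return payload["response"]
```

To send it, pass the request to `urllib.request.urlopen` and feed the decoded JSON body to `parse_register_response`.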

GET /api/v1/client/ssh/keys    @api_login_required

Returns all non-expired public keys as text/plain (authorized_keys format, one key per line).
Optional query param: ?tunnel_id=<uuid> to scope to a specific proxy host.
Called by the proxy host's AuthorizedKeysCommand via argus-cli ssh-keys (Step 7b).
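The text/plain body can be thought of as a filtered join over the stored keys. The sketch below is an in-memory approximation: KeyRow and render_authorized_keys are hypothetical names, and the explicit expires_at comparison is an assumption for illustration, since in this PR expiry is handled by the ScyllaDB TTL removing rows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class KeyRow:
    """Hypothetical, reduced stand-in for an SSHTunnelKey row."""
    public_key: str
    tunnel_id: str
    expires_at: datetime


def render_authorized_keys(
    rows: Iterable[KeyRow], now: datetime, tunnel_id: Optional[str] = None
) -> str:
    """Return one public key per line, optionally scoped to a tunnel."""
    lines = [
        row.public_key.strip()
        for row in rows
        if row.expires_at > now
        and (tunnel_id is None or row.tunnel_id == tunnel_id)
    ]
    # Empty input yields an empty body rather than a lone newline.
    return "\n".join(lines) + ("\n" if lines else "")
```

sshd reads this output line by line, so each key must stay on a single line.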


Tests (argus/backend/tests/tunnel/)

All tests are @pytest.mark.docker_required and use the shared ScyllaDB Docker fixture from conftest.py.

test_tunnel_service.py — 14 unit tests:

  • register_tunnel: happy path, custom TTL, no active config raises, invalid key, missing key, explicit tunnel_id
  • get_authorized_keys: OpenSSH format validation, tunnel scoping
  • delete_key: row removed, nonexistent raises
  • save_proxy_tunnel_config: service user created, old config deactivated, missing fields raises
  • get_proxy_tunnel_config: returns active, by id, None when none active

test_ssh_api.py — 10 integration tests via Flask test client:

  • POST /ssh/tunnel: success, ttl_seconds, explicit tunnel_id, missing key, invalid key, malformed UUID (400), unauthenticated (403)
  • GET /ssh/keys: success + key in output, tunnel scoping, malformed UUID (400), unauthenticated (403), empty when no keys
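The malformed-UUID cases listed above imply validation of tunnel_id before it reaches the service. A minimal sketch of that kind of check, assuming a hypothetical parse_tunnel_id helper whose ValueError the route would map to HTTP 400 (treating an empty string as "not provided" is also an assumption):

```python
from typing import Optional
from uuid import UUID


def parse_tunnel_id(raw: Optional[str]) -> Optional[UUID]:
    """Parse an optional tunnel_id; raise ValueError if it is not a UUID."""
    if raw is None or raw == "":
        return None
    try:
        return UUID(raw)
    except ValueError as exc:
        raise ValueError(f"malformed tunnel_id: {raw!r}") from exc
```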

No new dependencies

cryptography is already a transitive dependency via PyJWT[crypto].


How to test

# Run all tunnel tests (Docker required)
uv run pytest argus/backend/tests/tunnel/ -m docker_required -v

To exercise the API manually after the server is running:

# Register a key as a client:
curl -X POST http://localhost:5000/api/v1/client/ssh/tunnel \
  -H "Authorization: token <your_token>" \
  -H "Content-Type: application/json" \
  -d '{"public_key": "ssh-ed25519 AAAA..."}'

# Fetch authorized keys (as the proxy host would via AuthorizedKeysCommand):
curl http://localhost:5000/api/v1/client/ssh/keys \
  -H "Authorization: token <proxy_service_user_token>"

Note: a ProxyTunnelConfig row must exist with is_active=True before POST /ssh/tunnel will work. That can be inserted directly via a Python shell until Step 4b (admin API) lands.


What is NOT in this PR (follow-up)

| Step | Description |
| --- | --- |
| 4b | Admin API endpoints for proxy tunnel config + key management |
| 4c | Proxy host provisioning Jinja template |
| 4d | Admin Panel UI: ProxyTunnelManager.svelte |
| 5/6 | argus-client tunnel module + base.py integration |
| 7/7b | CLI --use-tunnel flag + argus-cli ssh-keys command |

CodeLieutenant and others added 21 commits April 7, 2026 13:24
Signed-off-by: Dusan Malusev <dusan@dusanmalusev.dev>
Removing because there is no subscription anymore.
Add design document for routing Argus client traffic through an SSH
tunnel via a proxy host, avoiding Cloudflare HTTPS costs. Covers
client-generated ephemeral keys with ScyllaDB TTL, real-time key
lookup via sshd AuthorizedKeysCommand, and automated proxy host
provisioning from the admin panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ents/nemeses commands

- Return an actionable 'unauthorized' error when the server responds with
  a non-JSON Content-Type (e.g. an HTML Cloudflare login page) instead of
  a confusing decode error, allowing LLMs and humans to self-correct

- Make --type optional on 'argus run get': the plugin type is now resolved
  automatically via GET /run/<run_id>/type and cached with a 24-hour TTL
  (run type is immutable), so subsequent calls skip the network round-trip

- Add 'argus run details' command: returns only the basic information shown
  on the Argus Details tab (no logs, screenshots, events, nemeses, resources
  or histograms); heavy data is accessible via dedicated subcommands

- Add 'argus run events' command: fetches CRITICAL and ERROR events directly
  from GET /client/sct/<run_id>/events/<severity>/get with --before, --after,
  and --limit flags; time flags accept Unix timestamps, RFC3339, or YYYY-MM-DD

- Add 'argus run nemeses' command: fetches nemesis records from the new
  GET /client/sct/<run_id>/nemesis/get endpoint with --before/--after filters

- Backend: add GET /client/sct/<run_id>/nemesis/get endpoint and
  SCTService.get_nemeses() with before/after timestamp filtering

- Backend: add 'after' query parameter support to both events endpoints
  (GET /events/get and GET /events/<severity>/get) and propagate it through
  SCTService.get_events() and SCTTestRun.get_events_limited()
…tive flag

After CF login, immediately exchange the session for a PAT via
GET /api/v1/user/token and store it as the primary credential, discarding
the session. This makes auth more robust in CI and across CF token expiry.

- ArgusService.Login: PAT-first fast-path; always exchanges session → PAT
- ArgusService.fetchPAT: new helper calling the user-token endpoint
- ErrFetchingToken / ErrStoringPAT: new sentinel errors
- UserToken route and UserTokenResponse model added
- root.go: extract buildAPIClientRaw; add --non-interactive persistent flag
  stored on context via cmdctx.WithNonInteractive
- RunWithAuthRetry: wraps command RunE; on unauthorized error either returns
  ErrUnauthorized (--non-interactive) or re-auths and retries once
- Auth tests updated to verify PAT is stored and session deleted post-login
…e; RunDetails view

- run get: SCTTestRun now renders as a structured two-section Details table
  (Run Details + System Information + Summary) mirroring the Argus web UI
  Details tab; events and nemesis rows show counts only, keeping AI context
  windows small.  All other run types keep the generic KV table.

- models: add SCTEventsResponse (Tabular, caching-aware), NemesisResponse
  (extracted from run object, no extra API call), and RunDetails wrapper.
  Retain SCTRunDetails/GenericRunDetails/DriverRunDetails/SirenadaRunDetails
  and NemesisRecord for the run details and nemeses subcommands.

- cache/keys: rename TTLEvents→TTLSCTEvents (5 min, matching run TTL);
  add TTLNemesis alias; rename EventsKey→SCTEventsKey (takes severity +
  before-cursor so each pagination page is cached independently under
  sct-events/{runID}/{severity}/{cursor}); add NemesisKey alongside the
  existing NemesesKey so both endpoint-based and run-object-derived paths
  have their own cache namespace.

- api/routes: fix SCT event route prefix from /client/sct/ to /sct/
  (matches actual Flask blueprint mount point); add SCTEventsCountBySeverity
  and SCTNemesisGet; keep SCTEventsBySeverity name used by run_nemeses_events.
- events/nemeses commands now follow a 4-step cache-first strategy:
  full-dataset cache hit → filter locally (no network); exact filtered
  cache hit → return directly; cache miss → fetch from API; store under
  full key (unfiltered) or filtered key (filtered)
- cache/keys: SCTEventsKey now encodes both before+after cursors;
  add SCTEventsFullKey, NemesesFilteredKey, and update NemesesKey to
  store under a 'full' sentinel sub-path
- fix all 19 linter errors (errcheck, staticcheck, unused):
  propagate fmt.Fprint* errors in cache.go; extract extractTarFile
  helper in logs.go to capture close errors; replace bare
  defer resp.Body.Close() with draining defers in api, auth, and
  cloudflared; fix Stats() nil-before-deref in cache.go; remove unused
  isCacheMiss and getCFToken; fix QF1011 in testid_test.go
- Suppress CF JWT from terminal output; only browser-URL lines are printed
- Add ErrPATNotFound sentinel and DeletePAT(); guard against empty PAT in LoadPAT()
- Add session fast-path in Login() to recover from failed PAT exchanges
- Pass CF token alongside session cookie in fetchPAT() for CF Access passthrough
- Add jwt.IsOlderThan() with iat-based age check; enforce 12h max CF token age
- Add ErrUnauthorized sentinel to api.DoJSON; use errors.Is in isUnauthorizedErr()
- Fix Rows() slice aliasing bug in RunDetails (out := rows[:0] → make)
- Unify SCT run details path: use RunDetails{Run: full}, delete SCTRunDetails
- Remove unused NemesisKey, TTLNemeses, TTLNemesis from cache/keys.go
- Rewrite logging: JSON file + opt-in text console with independent level filters
- Add -v/-vv/-vvv count flag wiring WithConsoleWriter to cmd.ErrOrStderr()
- Print success message to cmd.OutOrStdout() after argus auth completes
- Update logging tests: JSON file assertions, new console writer coverage
Explicitly discard return values from fmt.Fprintln/Fprintf calls and
wrap deferred resp.Body.Close() in anonymous funcs to satisfy errcheck.
Replace redundant runtime type assertion with idiomatic compile-time
interface check to fix staticcheck S1040.
… silence usage on runtime errors

- Fix log download route: /testrun/tests/... -> /api/v1/tests/...
- Fix SCT events routes: add missing /client prefix to match actual Flask blueprint mount path
- Fix run nemeses: remove reference to non-existent GET endpoint; extract NemesisData from full run response instead
- Remove dead NemesisRecord type and SCTNemesisGet route constant; add NewNemesisResponse constructor
- run get now uses generic KVTabular full dump; run details keeps the curated sectioned view
- RunDetails.MarshalJSON emits only the fields shown in the text table (runDetailsJSON) for consistent JSON/text output
- Suppress usage output on runtime errors (API failures) by setting cmd.SilenceUsage=true at the start of every RunE; flag-parse and required-flag errors still show usage as before
…m run_details switch

Move the per-run-type details projection logic out of the inline switch in
run_details.go into a RunTypeDetailsHandlers registry and a DispatchDetails
helper in testrun.go, keeping it alongside the existing RunTypeHandlers.
Adding support for a new plugin now only requires a single map entry in each
registry. The run details command body shrinks from 47 lines to 3.
run get now shows the lightweight details summary via DispatchDetails,
while run details fetches the full run object via RunTypeHandlers with
KVTabular output and caching.
… handlers

All run, log, nemesis, event, comment and discussion commands now emit
scoped zerolog entries at the appropriate level:
- Debug: entry-point with input flags/IDs, cache hit/miss, route, counts
- Info:  one success summary per command with outcome fields
- Warn:  non-fatal cache write failures that don't abort the operation
- Error: every error-return site, with full context fields

Also fix a latent bug in auth_token.go where log.Err(nil) was a zerolog
no-op; replaced with log.Error() so the message is actually emitted.
…t 401/403 in DoJSON

Add SkipAuthRetryAnnotation to cache clear, cache info, and auth-token
commands so they are excluded from the transparent re-authentication
wrapper.

Teach DoJSON to recognise HTTP 401 and 403 responses as ErrUnauthorized
before attempting JSON decoding, enabling the auth retry logic to trigger
on explicit server rejections.
Replace the single CF token keychain entry with three headless CF Access
entries: cf-access-client-id, cf-access-client-secret, and
cf-access-argus-token.

Add StoreCFAccess, LoadCFAccess, HasCFAccess, and DeleteCFAccess
functions so the CLI can persist and retrieve service-token credentials
that bypass the cloudflared browser-based login flow.

Includes tests for round-trip, partial storage, deletion, and HasCFAccess.
Add 'argus auth headless' which interactively prompts for three secrets
(CF Access Client ID, CF Access Client Secret, Argus API Token) with
masked input via golang.org/x/term and stores them in the OS keychain.
This enables authentication without cloudflared or a browser.

Add 'argus auth logout' which removes all stored credentials from the
keychain: PAT, session cookie, and headless CF Access credentials.

Both commands are registered as subcommands of 'argus auth' and carry
SkipAuthRetryAnnotation.
Add Set, Get, GetAll, Keys, and IsValidKey functions to the config
package so individual settings can be read and written to the config
file on disk without disturbing other keys.

Add 'argus config list', 'argus config get <key>', and
'argus config set <key> <value>' commands with shell completion for
recognised keys (url, use_cloudflare).

The auth headless command now automatically sets use_cloudflare=false
in the config file after storing headless credentials.
…keychain probing

Replace all keychain.HasCFAccess() / keychain.LoadCFAccess() decision
points with cfg.UseCf so the auth mode is driven by the use_cloudflare
config flag rather than the presence of keychain entries.

When use_cloudflare=false (headless mode):
- buildAPIClientRaw loads CF Access headers + Argus token from keychain
- runWithAuthRetry reports an actionable error instead of launching cloudflared
- auth command directs the user to 'auth headless' or 'config set'

When use_cloudflare=true (cloudflared mode):
- buildAPIClientRaw loads PAT or session from keychain
- runWithAuthRetry triggers the full cloudflared browser login
- auth command verifies existing credentials before re-authenticating

Remove short-circuit early-returns from ArgusService.Login() so callers
verify their credentials first and only call Login() when re-auth is
needed.  Update the corresponding test.

This lets users keep both sets of credentials in the keychain and switch
between modes with 'argus config set use_cloudflare true/false'.
… repeated_at

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…(Part 3, Steps 1-3 + 4a)

Implements the server-side foundation required for the SSH tunnel feature
described in docs/plans/ssh-tunnel-design.md. This is the first stage that
must land before any client-side or proxy-host work can be tested end-to-end.

## What is included

### DB models  (Step 1 + Step 2)
- argus/backend/models/ssh_key.py
  - SSHTunnelKey: stores a client-registered ed25519 public key scoped to a
    specific (user, tunnel) pair. Rows carry a ScyllaDB TTL (default 24 h) so
    expired keys are auto-deleted with no manual cleanup. expires_at is stored
    as an informational timestamp so clients know when to re-register.
  - ProxyTunnelConfig: stores the connection details of a proxy tunnel server
    (host, port, proxy_user, target_host, target_port, host_key_fingerprint,
    service_user_id, is_active). Only one config has is_active=True at a time.
- argus/backend/models/web.py: both models added to USED_MODELS so they are
  created automatically by sync_models / at test startup.

### Backend service  (Step 3)
- argus/backend/service/tunnel_service.py  —  TunnelService class:
  - register_tunnel(user, public_key, tunnel_id?, ttl_seconds?): validates and
    fingerprints the submitted public key (SHA256 derived server-side via the
    cryptography library — private key never touches the server), stores an
    SSHTunnelKey row with ScyllaDB TTL, returns the full proxy connection dict
    including expires_at (UTC ISO-8601).
  - get_authorized_keys(tunnel_id?): returns all non-expired public keys as a
    newline-separated OpenSSH authorized_keys string. Called by the proxy host
    via AuthorizedKeysCommand → argus-cli ssh-keys.
  - list_keys(tunnel_id?) / delete_key(key_id): key inventory and revocation.
  - get_proxy_tunnel_config(tunnel_id?) / save_proxy_tunnel_config(payload):
    config CRUD. save_ deactivates the previous active config and creates a
    dedicated Argus service user (proxy-tunnel-<host>) with a fresh API token
    that is returned once to the admin for proxy-host provisioning.
  - TunnelServiceException for all expected business errors.

### Client API routes  (Step 4a)
- argus/backend/controller/ssh_api.py  —  Blueprint registered at /ssh:
  - POST /ssh/tunnel   @api_login_required: register a public key, get proxy
    config back. Accepts optional ttl_seconds and tunnel_id.
  - GET  /ssh/keys     @api_login_required: return authorized_keys text.
    Accepts optional ?tunnel_id= query param.
- argus/backend/controller/client_api.py: ssh_api blueprint registered as a
  sub-blueprint → final URLs are /api/v1/client/ssh/tunnel and
  /api/v1/client/ssh/keys.

### Tests
- argus/backend/tests/tunnel/test_tunnel_service.py: 14 docker_required tests
  covering register_tunnel (happy path, custom TTL, no active config, invalid
  key, explicit tunnel_id), get_authorized_keys (format + tunnel scoping),
  delete_key, save_proxy_tunnel_config (service user creation, old config
  deactivation, missing fields), get_proxy_tunnel_config.
- argus/backend/tests/tunnel/test_ssh_api.py: 10 docker_required integration
  tests via the Flask test client covering both routes (success, ttl, explicit
  tunnel_id, missing/invalid key, malformed UUID, unauthenticated access,
  tunnel scoping, empty response).

## No new dependencies
cryptography is already a transitive dependency via PyJWT[crypto].

## What is NOT included (follow-up PRs)
- Step 4b: admin API endpoints (proxy tunnel config + key list/delete)
- Step 4c: proxy host provisioning Jinja template
- Step 4d: Admin Panel UI (ProxyTunnelManager.svelte)
- Step 5/6: argus-client tunnel module and base.py integration
- Step 7b: argus-cli ssh-keys command

## How to run the tests
    uv run pytest argus/backend/tests/tunnel/ -m docker_required -v
@github-actions github-actions Bot added the ai-assisted AI-assisted contribution label Apr 15, 2026
