diff --git a/.github/workflows/release-pypi.yml b/.github/workflows/release-pypi.yml
index db7ed8b..ad5d4c4 100644
--- a/.github/workflows/release-pypi.yml
+++ b/.github/workflows/release-pypi.yml
@@ -1,4 +1,4 @@
-# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.1.2).
+# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.2.0).
 # Configure "trusted publishing" on PyPI for this workflow + repository + optional GitHub environment.
 # https://docs.pypi.org/trusted-publishers/

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d783c4c..04e96ae 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,38 +6,36 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

 ## Unreleased

+## 1.2.0 - 2026-05-03
+
 ### Breaking

 - **`POST /v1/events`:** uses the same **`FLIGHTDECK_LOCAL_API_TOKEN`** / loopback policy as promotion and rollback. Remote unauthenticated ingest is no longer accepted; set the env var and send **`Authorization: Bearer`** (Python SDK **`api_token=`**, or **`--api-token`** / env in **[examples/integration/emit_sample_events.node.mjs](examples/integration/emit_sample_events.node.mjs)**).
 - **`GET /v1/*`:** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set, read APIs require **`Authorization: Bearer`** (same header as writes); previously only mutations were Bearer-gated.
 - **Python:** **`requires-python`** is **`>=3.11,<4`** (replaces **`>=3.14,<3.15`**). **`[tool.ruff] target-version`** is **`py311`**. CI follows **`.python-version`** (currently **3.12**).

-### Changed
-
-- **Docs / examples:** **`DEVELOPMENT.md`**, **`AGENTS.md`**, **`docs/sdk.md`**, **`docs/troubleshooting.md`**, **`examples/integration/README.md`**, **`examples/integration/adoption/README.md`**, **`examples/deploy/README.md`** — align with the Python range and ledger-write ingest model.
-
 ### Added

+- **`flightdeck init`** (default): migrates the ledger, imports **bundled** OpenAI / Anthropic / Google pricing tables (**`pricing_version` `flightdeck-bundled-2026-05`**, illustrative snapshot), writes **`.flightdeck/pricing-catalog.yaml`**, and sets **`pricing_catalog_path`** in **`flightdeck.yaml`** so diffs can show cost signals without manual **`pricing import`**. Opt out with **`--no-bundled-pricing`**. Bundled YAML ships under **`src/flightdeck/bundled_pricing/`** (wheel package data).
 - **`GET /health`:** **`read_auth`** (`open` vs `bearer`) describes whether **`GET /v1/*`** requires **`Authorization: Bearer`** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set (aligned with writes).
 - **SQLite:** bounded retries on **`database is locked` / busy** for ledger **`execute`** paths; **`flightdeck serve --sqlite-lock-timeout`** / **`--retry-sqlite-lock`** (and env **`FLIGHTDECK_SQLITE_*`**) plus **`docs/operations-and-policy.md`** concurrency notes.
 - **CI / dev:** **`pytest-cov`** with **`--cov-fail-under=80`** on **`src/flightdeck`** (**`integrations/*`**, **`quickstart_smoke`**, and **`sdk/client.py`** omitted from the denominator — see **`[tool.coverage.run]`** in **`pyproject.toml`**).
 - **Experimental `flightdeck.integrations`:** optional extras **`integrations-langchain`**, **`integrations-temporal`**, **`integrations-openai-agents`**, and meta **`integrations-ci`** (CI job); thin mappers from OpenAI chat completions, Anthropic messages, OpenAI Agents–style results, LangChain callbacks, CrewAI-style manual totals, and Temporal-oriented **`labels`**. Docs: **`docs/sdk-integrations.md`**; examples: **`examples/integration/adoption/`**. Contributor policy updates in **`AGENTS.md`** / **`CLAUDE.md`**.
+- **PostgreSQL ledger:** optional **`database_url`** in **`flightdeck.yaml`** (`postgresql://` or `postgres://`); install **`psycopg`** with **`uv sync --extra postgres`** (or **`pip install 'flightdeck-ai[postgres]'`**). Same schema migrations and API behavior as SQLite; run filters use **`::json`** predicates on **`event_json`**. **`flightdeck doctor --backup`** stays SQLite-only (use **`pg_dump`** for Postgres). Optional integration tests: **`FLIGHTDECK_TEST_POSTGRES_URL`** with the **`postgres`** extra.
+- **`GET /v1/runs/export`** — NDJSON stream of the same filtered slice as **`GET /v1/runs`** (optional response headers when truncated).
+- **`session_id`** / **`span_id`** query filters on **`GET /v1/runs`**, matching CLI/SDK, and **`offset`** pagination on run listings (with **`runs list`** / **`runs export`**).
+- **Web Runs** page — query **`GET /v1/runs`** from the bundled UI.

 ### Changed

+- **Docs / examples:** **`DEVELOPMENT.md`**, **`AGENTS.md`**, **`README.md`**, **`ROADMAP.md`**, **`SUPPORT.md`**, **`CONTRIBUTING.md`**, **`docs/sdk.md`**, **`docs/troubleshooting.md`**, **`docs/pricing-catalog.md`**, **`examples/integration/README.md`**, **`examples/integration/adoption/README.md`**, **`examples/deploy/README.md`** — align with the Python range, ledger-write ingest model, bundled init, ICP/sustainability copy, and outcome-oriented roadmap language.
 - **Web Runs:** forensics — empty / offset / truncation messaging, export copy, trace band rows or **Group by trace_id**, **View** drawer (structured fields + full JSON, **session_id** / **span_id**, focus trap + return focus, **`aria-haspopup="dialog"`**), trace/status columns; **run-query** failures show a typed error card with **Retry**.
 - **Web Diff:** scannable sections (policy, evidence window, pricing/catalog/hints, rollups), pre-query hint, `evaluated_at` when present; warn when imported **pricing table versions** or **providers** differ baseline vs candidate.
 - **Web Actions:** workspace loading skeleton; numbered approval steps; pending **Refresh list** / **Use for confirm**; clearer confirms; approval-reason placeholder; **Rollback** danger-styled; **Actions** shows whether **`VITE_FLIGHTDECK_LOCAL_API_TOKEN`** is set (no value) and an inline hint when the server uses **Bearer** and the UI token is missing.
 - **Web shell / Overview / CSS:** **Langfuse-style** left sidebar + main column (stacks on narrow viewports); skeleton loading on first load; **Overview** auto-polls timeline + metrics every **30s** when the tab is visible (silent refresh; no manual **Refresh** button); updates after **Actions** mutations via context; ledger metrics hints + links to **Diff** / **Runs**; Diff query **`aria-busy`**; **Security strip** `/health` loading + **Bearer** + client-token reassurance line; shared **focus-visible** / type scale / narrow breakpoints; **skip to main** (HashRouter-safe); **[ROADMAP.md](ROADMAP.md)** adds **Visual system** backlog item and theme deferral.
-- **Examples / deploy / SECURITY / web README:** [examples/README.md](examples/README.md) end-to-end loop + **UI polish / operator flow** blurb; deploy checklist + **`restart: unless-stopped`**; **[SECURITY.md](SECURITY.md)** deploy pointer; **[web/README.md](web/README.md)** Playwright approval vs default runs.
-- **Playwright:** `e2e-server.mjs` gates approval workspace on **`PW_FORCE_APPROVAL_WORKSPACE`** (set from config); **`reuseExistingServer: false`**; config sets approval workspace only when the CLI lists **exactly one** `e2e/*.spec.ts` path and it is **`actions-approval.spec.ts`** (avoids multi-spec argv; **`PW_WEBSERVER_APPROVAL`** no longer toggles the server so a stale value cannot break **`npm run test:e2e`**); **`actions-approval.spec.ts`** skips when **`GET /v1/workspace`** shows approval off (e.g. full suite with **`FD_E2E_FORCE_APPROVAL=1`**).
-
-### Added
-
-- **PostgreSQL ledger:** optional **`database_url`** in **`flightdeck.yaml`** (`postgresql://` or `postgres://`); install **`psycopg`** with **`uv sync --extra postgres`** (or **`pip install 'flightdeck-ai[postgres]'`**). Same schema migrations and API behavior as SQLite; run filters use **`::json`** predicates on **`event_json`**. **`flightdeck doctor --backup`** stays SQLite-only (use **`pg_dump`** for Postgres). Optional integration tests: **`FLIGHTDECK_TEST_POSTGRES_URL`** with the **`postgres`** extra.
-- **`GET /v1/runs/export`** — NDJSON stream of the same filtered slice as **`GET /v1/runs`** (optional response headers when truncated).
-- **`session_id`** / **`span_id`** query filters on **`GET /v1/runs`**, matching CLI/SDK, and **`offset`** pagination on run listings (with **`runs list`** / **`runs export`**).
-- **Web Runs** page — query **`GET /v1/runs`** from the bundled UI.
+- **Examples / deploy / SECURITY / web README:** [examples/README.md](examples/README.md) end-to-end loop + **UI polish / operator flow** blurb; deploy checklist + **`restart: unless-stopped`**; **[SECURITY.md](SECURITY.md)** deploy pointer; **[web/README.md](web/README.md)** Playwright approval vs default runs and **`init --no-bundled-pricing`** for stable **`GET /v1/workspace`** probes.
+- **Playwright:** `e2e-server.mjs` gates approval workspace on **`PW_FORCE_APPROVAL_WORKSPACE`** (set from config); **`reuseExistingServer: false`**; config sets approval workspace only when the CLI lists **exactly one** `e2e/*.spec.ts` path and it is **`actions-approval.spec.ts`** (avoids multi-spec argv; **`PW_WEBSERVER_APPROVAL`** no longer toggles the server so a stale value cannot break **`npm run test:e2e`**); **`actions-approval.spec.ts`** skips when **`GET /v1/workspace`** shows approval off (e.g. full suite with **`FD_E2E_FORCE_APPROVAL=1`**); default e2e workspace uses **`flightdeck init --no-bundled-pricing`** so **`pricing_catalog_configured`** stays **`false`** for **`e2e/smoke.spec.ts`**.
+- **Examples / CI / deploy / Helm pins:** **`flightdeck-ai>=1.2.0`** where version pins apply.

 ## 1.1.2 - 2026-05-03

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 24574e3..24081f1 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,6 +6,10 @@ Contributions are accepted under the **Apache License, Version 2.0** (see **`LIC

 Human and AI contributors: follow **[AGENTS.md](AGENTS.md)** (full rules). For a short index, see **[CLAUDE.md](CLAUDE.md)**. In **Cursor**, the project rule **[`.cursor/rules/flightdeck-ci-artifacts.mdc`](.cursor/rules/flightdeck-ci-artifacts.mdc)** (`alwaysApply`) summarizes the **web `static/`** and **`schemas/`** drift gates CI enforces.

+## Who we are building for
+
+The product ICP is **platform or ML engineering teams** (often about **5–30** people) at **Series B+**-style companies shipping **at least two** **LLM-backed agents** to production—teams that have already been burned by a **cost spike** or **quality regression** tied to a **prompt** or **model** change. Contributions should shorten their path to **versioned releases**, **ingested evidence**, **economic diffs**, and **policy-gated promote**—not broaden scope into orchestration or hosted tracing (see **[AGENTS.md](AGENTS.md)** non-goals).
+
 ## Local Setup

 Recommended (**[uv](https://docs.astral.sh/uv/)** — see **`DEVELOPMENT.md`**):
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index f42ccf7..f1d2c5c 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -85,7 +85,7 @@ Full command flags and exit codes: [README.md](https://github.com/flightdeckdev/

 `flightdeck-quickstart-verify` (entry point for `src/flightdeck/quickstart_smoke.py`) runs the full quickstart workflow end-to-end in an isolated temp directory:

-1. `flightdeck init`
+1. `flightdeck init` (bundled OpenAI / Anthropic / Google snapshot + catalog; additive with the imports below)
 2. Import both pricing tables from `examples/quickstart/`
 3. `flightdeck policy set`
 4. Register baseline and candidate releases — capture the `release_id` printed to stdout
@@ -143,7 +143,7 @@ Merging to **`main` does not publish packages** — PyPI uploads are **tag-drive
 1. **PyPI:** add a **trusted publisher** for **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** — workflow **`release-pypi.yml`**. If PyPI offers **Environment name: (Any)**, you can still use a GitHub **Environment** named **`pypi`** for approval gates; otherwise match whatever you register on PyPI ([trusted publishers](https://docs.pypi.org/trusted-publishers/)).
 2. **GitHub:** Settings → **Environments** → create **`pypi`** (optional: required reviewers / wait timer before OIDC publish).
 3. Bump **`version`** in **`pyproject.toml`** and **`src/flightdeck/__init__.py`**, update **`CHANGELOG.md`**, merge to **`main`**.
-4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.1.2`**) then **`git push origin vX.Y.Z`**.
+4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.2.0`**) then **`git push origin vX.Y.Z`**.

 The workflow runs **ruff**, **pytest**, schema drift, **`uv build`**, publishes **sdist + wheel** to **PyPI** via **OIDC** (no long-lived API token in repo secrets), enables **publish attestations**, and creates a **GitHub Release** with generated notes and **`dist/*`** assets.

diff --git a/README.md b/README.md
index 752229a..9c49321 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 **Ship AI agents safely with release diffs, runtime evidence, and policy gates.**

-FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI). It is not an agent framework, prompt IDE, tracing dashboard, or gateway — it is where **what shipped**, **what ran**, **what it cost**, and **whether promote is allowed** are recorded and compared.
+FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI): run evidence, pricing tables, and the ledger **stay on disk in your environment** by default—**no trace or billing payload is sent to FlightDeck as a vendor**. That posture matters for **regulated**, **air-gapped**, and **data-sovereignty** teams that cannot ship telemetry to a third-party SaaS observability backend. It is not an agent framework, prompt IDE, tracing dashboard, or gateway — it is where **what shipped**, **what ran**, **what it cost**, and **whether promote is allowed** are recorded and compared.

 ## In ~20 seconds

@@ -13,12 +13,14 @@ FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI

 ## Example outcome

-You ship a candidate whose **system prompt drifts by a handful of tokens**; under your imported tariffs the diff shows **cost per run up ~31%** while policy caps spend. **`flightdeck release promote`** (or the HTTP promote path) **stays blocked** until you change the model, relax policy with intent, or widen evidence — not because CI is slow, but because the **governed ledger** says no.
+You ship a candidate whose **system prompt drifts by a handful of tokens**; under your tariffs the diff shows **cost per run up ~31%** while policy caps spend. **`flightdeck release promote`** (or the HTTP promote path) **stays blocked** until you change the model, relax policy with intent, or widen evidence — not because CI is slow, but because the **governed ledger** says no. (The **~31%** story uses the **two custom pricing YAMLs** in **[examples/quickstart/](examples/quickstart/)**; **`flightdeck init`** alone seeds a **bundled snapshot** so your **first** cost-aware diff does not start from an empty pricing ledger.)

 ## Who should use this?

+- **Primary buyer / ICP:** **Platform or ML engineering teams** (often **5–30** people) at **growth-stage** companies shipping **two or more** **LLM agents** to production—especially teams that already had a **cost** or **regression** incident from a **prompt** or **model** change and need a **governed** promote path.
 - Teams that **version agent builds** (prompts, tools, model pins) and need a **durable audit trail**.
 - Engineers who want **one command** to answer “is this candidate safe to roll forward?” with **numbers**, not gut feel.
+- **Healthcare, fintech, and enterprise** operators who **cannot** default to sending traces or cost data to a **hosted** observability vendor—**local-first** evidence and pricing imports are the default integration model.
 - Anyone who has outgrown **ad hoc** folder diffs or **spreadsheet** promote checklists.

 ## How FlightDeck fits your stack
@@ -53,6 +55,7 @@ flowchart LR
 | **Primary job** | **Release + promote governance** for agents (ledger, diff, policy) | Tracing, sessions, evals, LLM observability | ML / model observability and monitoring | Source control and generic pipelines |
 | **Immutable release artifact** | Yes (`release.yaml` + checksum) | No | No | Only if you build it |
 | **Evidence + cost/latency diff** | Yes (runs + pricing tables / optional catalog) | Different lens (trace-level) | Different lens | DIY |
+| **Default data residency** | **On your machine** (CLI / SQLite / local HTTP) | Typically SaaS-hosted | Cloud offerings | Your repo |
 | **Policy gate on promote** | First-class | No | No | DIY |

 **Try the UI:** run **`flightdeck serve`**, then open **http://127.0.0.1:8765/** — Overview, Diff, and Actions (see [docs/web-ui.md](docs/web-ui.md)).
@@ -61,7 +64,7 @@ flowchart LR

 Small prompt or model changes can silently move **cost**, **latency**, and **error rate**. FlightDeck turns those moves into **explicit promote decisions** backed by ingested runs — before production pointers advance.

-**Current local spine:** versioned **`release.yaml`** + checksums · **`RunEvent`** ingest (JSONL or arrays) · immutable **pricing** imports · **`flightdeck release diff`** · policy-gated **`release promote`** / rollback · full **audit history**.
+**Current local spine:** versioned **`release.yaml`** + checksums · **`RunEvent`** ingest (JSONL or arrays) · **bundled default pricing** on **`flightdeck init`** (plus optional **`pricing import`**) · **`flightdeck release diff`** · policy-gated **`release promote`** / rollback · full **audit history**.

 ## Status

@@ -70,9 +73,11 @@ FlightDeck is **local-first** and ships as a Python CLI backed by SQLite.
 **v1.0.0** froze **SemVer-stable public contracts** for the documented CLI, committed **`schemas/v1/`**, and **`POST /v1/events`** with **`api_version` `v1`**.
 **v1.1.x** adds catalog-aware diffs, approval flows, and forensics slices (optional pricing catalog on diffs, promotion request/confirm, read-only runs listing, **`GET /v1/workspace`** for UI and automation, Helm/fleet examples)
-without breaking those v1.0 shapes. See **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.
+without breaking those v1.0 shapes. **v1.2.0** raises the Python floor to **3.11+**, tightens **Bearer** gating for **`POST /v1/events`** and **`GET /v1/*`** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set, adds optional **PostgreSQL**, **bundled default pricing** on **`flightdeck init`**, and experimental **`flightdeck.integrations`**. See **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.
 The product scope is still intentionally narrow (release governance, not a hosted agent platform).

+**Maintenance and sustainability:** the project is **Apache-2.0** with **no required commercial license**. If FlightDeck matters to your production stack, use **[SUPPORT.md](SUPPORT.md)** for security, commercial, and sponsorship pointers, and the **Sponsor** affordance on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** when it is enabled—signals like that answer “what happens if maintenance stops?” more credibly than roadmap prose alone.
+
 Not implemented yet:

 - hosted control plane
@@ -123,10 +128,12 @@ Or use the bash wrapper (Git Bash / WSL on Windows):
 ./scripts/smoke.sh
 ```

-Or walk through the core commands:
+**Bundled pricing (default `init`):** **`flightdeck init`** migrates the ledger, imports **OpenAI**, **Anthropic**, and **Google** (Gemini-class) tables at **`pricing_version` `flightdeck-bundled-2026-05`**, and writes **`.flightdeck/pricing-catalog.yaml`** with **`pricing_catalog_path`** set in **`flightdeck.yaml`**. In **`release.yaml`**, set **`spec.pricing_reference`** to `{ provider: openai | anthropic | google, pricing_version: flightdeck-bundled-2026-05 }` to get **per-table** and **catalog** cost lines on diffs without authoring YAML. These rates are a **convenience snapshot**, not live vendor billing—**`flightdeck pricing import`** your own files for production. Use **`flightdeck init --no-bundled-pricing`** for an empty ledger.
+
+Or walk through the **full quickstart** (policy + **two** custom tariffs for the **~31%** narrative—same flow CI runs):

 ```bash
-flightdeck init
+flightdeck init # omit --no-bundled-pricing; bundled tables are additive with the imports below
 flightdeck pricing import examples/quickstart/pricing-baseline.yaml
 flightdeck pricing import examples/quickstart/pricing-candidate.yaml
 flightdeck policy set examples/quickstart/policy.yaml
@@ -166,6 +173,7 @@ Substitute them before ingestion, or run **`uv run flightdeck-quickstart-verify`
 - [Development](DEVELOPMENT.md)
 - [Contributing](CONTRIBUTING.md)
 - [Security](SECURITY.md)
+- [Support and sustainability](SUPPORT.md)
 - [CLAUDE.md](CLAUDE.md) and [AGENTS.md](AGENTS.md)

 ## Development
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index a16aea3..a1a53ed 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -4,24 +4,19 @@ High-level notes for **shipping FlightDeck**. Detailed history: **[CHANGELOG.md]

 Narrative docs (including the CLI reference) are maintained on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** `main`; this file and **`schemas/`** ship in minimal clones.

-## Upcoming — Optional Python integrations (experimental)
-
-Patch-line documentation: optional **`flightdeck.integrations`** mappers behind **`integrations-*`**
-extras (see **`pyproject.toml`**, **`docs/sdk-integrations.md`**, **`examples/integration/adoption/`**).
-**Stable payload contract:** **`RunEvent`** JSON for **`POST /v1/events`** (shape unchanged; **v1.2.0** tightens **HTTP access** for ingest — see **v1.2.0** notes below). **`AGENTS.md`** clarifies
-that these adapters are adoption glue, not in-product orchestration or a plugin registry. CI adds
-an **`integrations`** job (**`uv sync --frozen --extra dev --extra integrations-ci`**) for LangChain
-callback coverage.
-
-## v1.2.0 — Python 3.11+ floor and protected event ingest
-
-Minor release (see **[CHANGELOG.md](CHANGELOG.md)**): **`requires-python`** is **`>=3.11,<4`** so
-installs work on common production interpreters (**3.11–3.14**). **`POST /v1/events`** is a **ledger
-write** and now matches the promote/rollback access model: **loopback-only** when
-**`FLIGHTDECK_LOCAL_API_TOKEN`** is unset; **Bearer required** when it is set (covers Docker **`--host
-0.0.0.0`** and private LANs). **`POST /v1/diff`** remains read-only and ungated. Migration: remote
-emitters must send **`Authorization: Bearer`** whenever the server uses a local API token; loopback
-scripts without a token are unchanged.
+## v1.2.0 — Python 3.11+, protected ingest and reads, bundled pricing, Postgres, integrations
+
+Minor release (see **[CHANGELOG.md](CHANGELOG.md)** for the full list).
+
+- **Python floor:** **`requires-python`** is **`>=3.11,<4`** so installs work on common production interpreters (**3.11–3.14**). **`[tool.ruff] target-version`** is **`py311`**.
+- **HTTP / trust:** **`POST /v1/events`** is a **ledger write** and matches the promote/rollback access model: **loopback-only** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is unset; **Bearer required** when it is set. When a token is set, **`GET /v1/*`** read APIs require the same **Bearer** header (previously only mutations were gated). **`POST /v1/diff`** stays read-only and ungated. **`GET /health`** adds **`read_auth`** (`open` vs `bearer`). **Migration:** remote emitters must send **`Authorization: Bearer`** whenever the server uses a local API token; loopback scripts without a token are unchanged.
+- **`flightdeck init`:** by default seeds **bundled** OpenAI / Anthropic / Google (**`google`** = Gemini-class) pricing at **`flightdeck-bundled-2026-05`**, writes **`.flightdeck/pricing-catalog.yaml`**, and sets **`pricing_catalog_path`** (additive for **new** workspaces). **`flightdeck init --no-bundled-pricing`** restores config-only init.
+- **Ledger backends:** optional **`database_url`** (**PostgreSQL**) with **`psycopg`** extra; SQLite busy retries and **`flightdeck serve`** SQLite tuning flags.
+- **Evidence / UI:** **`GET /v1/runs/export`**, **`session_id`** / **`span_id`** filters and **`offset`** on run listings; bundled **Web Runs** page and substantial **Runs / Diff / Actions / shell** UX improvements (see changelog).
+- **Experimental `flightdeck.integrations`:** optional **`integrations-*`** extras and CI **`integrations`** job; **`RunEvent`** wire shape unchanged — adapters are adoption glue per **`AGENTS.md`**.
+- **Quality:** **`pytest-cov`** with **`--cov-fail-under=80`** on core **`flightdeck`**; Playwright **`e2e-server.mjs`** uses **`init --no-bundled-pricing`** for stable default **`GET /v1/workspace`** expectations.
+
+**Stable contracts:** breaking items are the **Python range**, **ingest + read Bearer** rules when a token is set, and the new **default init** workspace layout; HTTP and **`v1`** payload shapes remain additive aside from those access changes.

 ## v1.1.2 — Forensics filters, JSONL export, productization closure slice

diff --git a/ROADMAP.md b/ROADMAP.md
index 6a2c47e..2fb80da 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6,7 +6,7 @@ This document is **strategy and ordering**, not a second changelog. It goes from

 **Reality check:** FlightDeck is intentionally **local-first** (CLI + SQLite + optional `flightdeck serve`). That keeps trust boundaries explicit; teams still supply integration glue to run it broadly in production.

-**Version detail:** The current shipping line is **v1.1.2**. For SemVer-by-SemVer behavior and migrations, use **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.
+**Version detail:** The current shipping line is **v1.2.0**. For SemVer-by-SemVer behavior and migrations, use **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.

 ---

@@ -35,25 +35,25 @@ Strategic UX intent for the bundled React app (routing and components: **[docs/w

 **Shipped surfaces**

-| Surface | Role |
-|--------|------|
-| **Overview** | Ledger / promotion snapshot, ledger metrics |
-| **Diff** | Release comparison, pricing / catalog / hints, policy outcome |
-| **Runs** | Forensics filters, listing, export |
-| **Actions / Promote** | Direct promote vs approval request/confirm, rollback |
-| **Shell** | Primary nav, security/status strip, optional read-only build |
+| Surface | Role | Operator outcome (intent) |
+|--------|------|----------------------------|
+| **Overview** | Ledger / promotion snapshot, ledger metrics | See promotion posture and ledger health at a glance before opening Diff or Runs. |
+| **Diff** | Release comparison, pricing / catalog / hints, policy outcome | Decide promote vs blocked with scannable economics and policy, not raw JSON first. |
+| **Runs** | Forensics filters, listing, export | Narrow to the slice that explains a spike or incident without re-ingesting elsewhere. |
+| **Actions / Promote** | Direct promote vs approval request/confirm, rollback | Complete an auditable promotion or rollback with clear guardrails. |
+| **Shell** | Primary nav, security/status strip, optional read-only build | Trust posture (token, read-only) stays visible while navigating. |

 **UX and UI backlog (grouped)**

 These map to **What is next** items **1**, **2**, and **5**; ship notes stay in **RELEASE_NOTES** / **CHANGELOG**.

-1. **Runs and forensics (web)** — Run or trace **detail** (drawer or page), clearer **empty and error** states, optional **timeline** grouping by `trace_id` / session, export affordances consistent with server limits.
-2. **Diff comprehension** — Stronger **scannability** for policy blocks and pricing/catalog lines; surface **version skew** and hint copy when the API exposes it.
-3. **Promotion and approval** — **Progressive disclosure** for approval vs direct promote, clearer confirmation copy, **pending requests** table polish.
-4. **Overview and trust** — Metrics **context** (what a counter means), light cross-links to Diff/Runs—not a metrics dashboard product.
-5. **Shell and quality bar** — **Loading** states, consistent spacing and type rhythm, keyboard **focus** and labels, layouts that tolerate narrow viewports where cheap.
-6. **Security ergonomics (UI)** — Token/env/mutation visibility, read-only build behavior, cautious affordances for destructive actions.
-7. **Visual system** — Shared typography scale, spacing rhythm, **focus-visible** affordances, and narrow-layout breakpoints so the operator surfaces stay legible without a separate design system product.
+1. **Outcome:** an engineer can open a **single run or trace** view and answer “what happened on this request?” without leaving the app — **Runs and forensics (web):** run or trace **detail** (drawer or page), clearer **empty and error** states, optional **timeline** grouping by `trace_id` / session, export affordances consistent with server limits.
+2. **Outcome:** a reviewer spots **policy blocks** and **pricing skew** in seconds — **Diff comprehension:** stronger **scannability** for policy blocks and pricing/catalog lines; surface **version skew** and hint copy when the API exposes it.
+3. **Outcome:** an approver completes **request → confirm** without ambiguity — **Promotion and approval:** **progressive disclosure** for approval vs direct promote, clearer confirmation copy, **pending requests** table polish.
+4. **Outcome:** counters on Overview are **interpretable**, not decorative — **Overview and trust:** metrics **context** (what a counter means), light cross-links to Diff/Runs—not a metrics dashboard product.
+5. **Outcome:** the UI feels **fast and accessible** on a laptop — **Shell and quality bar:** **loading** states, consistent spacing and type rhythm, keyboard **focus** and labels, layouts that tolerate narrow viewports where cheap.
+6. **Outcome:** operators **see** when mutations or tokens apply — **Security ergonomics (UI):** token/env/mutation visibility, read-only build behavior, cautious affordances for destructive actions.
+7. **Outcome:** dense operator layouts stay **readable** without a bespoke design system — **Visual system:** shared typography scale, spacing rhythm, **focus-visible** affordances, and narrow-layout breakpoints so the operator surfaces stay legible without a separate design system product.
**Security ergonomics** — Continue explicit token/env status, mutation guardrails, and optional read-only UI patterns for local and bounded remote use. *UI details: **[Web UI and operator experience](#web-ui-and-operator-experience)**.* -6. **OTLP-oriented integration (mid term)** — Documented or thin adapter-style paths for correlated telemetry; not a commitment to an in-product APM. -7. **Fleet / cross-workspace (conditional)** — Broader governance surfaces only after the signals in **Horizons and conditions** below; default remains one workspace, one ledger. +1. **Outcome:** operators **pinpoint the run or trace** behind a regression or cost jump from the web — **Evidence and forensics (web):** replay/trace-oriented views and richer export semantics on top of `runs list`, `trace_id`, and JSONL export, so operators can reason over evidence without leaving the product surface. *UI details: **[Web UI and operator experience](#web-ui-and-operator-experience)**.* +2. **Outcome:** economic diffs **surface version and naming skew** before a bad promote — **Catalog lifecycle and diff diagnostics:** stronger mismatch signals beyond pricing-table row presence (for example version skew hints), strengthening economic governance on diffs. *UI details: **[Web UI and operator experience](#web-ui-and-operator-experience)**.* +3. **Outcome:** a new service reaches **register → ingest → diff → gate** using **maintained examples** — **Integration glue:** maintain app runtime emitters, CI/GitOps examples, and `serve` deployment recipes so the path from code to gated promotion is copy-pasteable. +4. **Outcome:** **`flightdeck serve`** in production is **boring to operate** (health, restarts, backups) — **Serve and deployment hardening:** clear operator narrative for health checks, supervision, and backup/restore alongside existing Compose/Helm references. +5. 
**Outcome:** teams using **Bearer** and read-only builds **do not foot-gun** — **Security ergonomics:** continue explicit token/env status, mutation guardrails, and optional read-only UI patterns for local and bounded remote use. *UI details: **[Web UI and operator experience](#web-ui-and-operator-experience)**.* +6. **Outcome:** correlated **infra** telemetry can sit **next to** ledger evidence without becoming an APM product — **OTLP-oriented integration (mid term):** documented or thin adapter-style paths for correlated telemetry; not a commitment to an in-product APM. +7. **Outcome (conditional):** multi-team governance **without** breaking one-ledger trust — **Fleet / cross-workspace (conditional):** broader governance surfaces only after the signals in **Horizons and conditions** below; default remains one workspace, one ledger. -Optional milestone framing (headline only): a **v1.2** line might emphasize **forensics + catalog diagnostics**; ship notes still land in **RELEASE_NOTES** / **CHANGELOG**. +**v1.2.0** ships the Python **3.11+** floor, **HTTP access** tightening for ingest and read APIs when a local token is set, **bundled default pricing** on **`flightdeck init`**, optional **PostgreSQL**, **runs export** / filters, substantial **web** operator UX, and experimental **`flightdeck.integrations`**. Deeper **catalog diagnostics** and **forensics** workstreams continue under **What is next**; ship notes live in **RELEASE_NOTES** / **CHANGELOG**. --- @@ -139,8 +139,8 @@ Use **[examples/README.md](examples/README.md)** as a discoverability pass again **Operator experience (web):** -- An operator can reach a **promote vs blocked-by-policy** conclusion from **Diff** and **Actions** without opening raw JSON first. -- A forensics task (for example trace-scoped triage) is completed from **Runs** without falling back to the CLI for the same filters and slice. 
+- **Outcome:** within one **Diff** + **Actions** pass, an operator states **promote vs blocked-by-policy** without opening raw JSON first. +- **Outcome:** within about **two minutes**, an engineer **isolates the run or trace** responsible for a cost or error spike using **Runs** filters and export—without re-running the CLI for the same slice. --- diff --git a/SUPPORT.md b/SUPPORT.md new file mode 100644 index 0000000..21499ce --- /dev/null +++ b/SUPPORT.md @@ -0,0 +1,15 @@ +# Support + +## Security issues + +Follow **[SECURITY.md](SECURITY.md)** for vulnerability reporting. Do not open public issues for undisclosed security problems. + +## Commercial inquiries and sustainability + +FlightDeck is **Apache-2.0** open source. There is no required hosted tier; the default product runs **on your machines** with the CLI and optional local **`flightdeck serve`**. + +For **commercial partnerships**, **support arrangements**, or **sustainability** questions, use **[GitHub Discussions or Issues](https://github.com/flightdeckdev/flightdeck)** on the canonical repository. If **GitHub Sponsors** is enabled on that org or repository, the **Sponsor** button there is the preferred low-friction way to signal that ongoing maintenance matters to your team. + +## Bugs and features + +Use **[GitHub Issues](https://github.com/flightdeckdev/flightdeck/issues)** for bug reports and scoped feature requests, with a minimal repro or workspace description when possible. diff --git a/docs/cli.md b/docs/cli.md index 526d07b..5591085 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -16,8 +16,8 @@ serve` see [http-api.md](http-api.md). | `--help` | Print help for any command or subcommand | All commands require a `flightdeck.yaml` in the working directory (or the default path -`./flightdeck.yaml`). Run `flightdeck init` to create one. The only exception is -`flightdeck init` itself — it writes the file and does not call `load_config`. +`./flightdeck.yaml`). Run `flightdeck init` to create one. 
**`flightdeck init`** writes the +config, then loads it to migrate the ledger and (by default) import bundled pricing. ## Actor resolution @@ -44,28 +44,35 @@ field in the request body (defaults to `"http"` when omitted). ## `flightdeck init` -Create a default `flightdeck.yaml` workspace config in the current directory. +Create a default `flightdeck.yaml` workspace config in the current directory. By default +this also **migrates the ledger**, **imports bundled** OpenAI / Anthropic / Google pricing +tables (snapshot **`flightdeck-bundled-2026-05`**), writes **`.flightdeck/pricing-catalog.yaml`**, +and sets **`pricing_catalog_path`** so diffs can show **catalog** rollups without a manual +**`pricing import`**. Use **`--no-bundled-pricing`** for an empty ledger (air-gapped or +custom-only). ```bash -flightdeck init [--path PATH] +flightdeck init [--path PATH] [--no-bundled-pricing] ``` | Option | Default | Description | |--------|---------|-------------| | `--path` | `flightdeck.yaml` | Target path for the config file | +| `--no-bundled-pricing` | off | Skip bundled pricing import and catalog; omit `pricing_catalog_path` from the new config | Fails with exit 1 if the file already exists. -**Example output:** +**Example output (default):** ``` Wrote flightdeck.yaml +Bundled pricing snapshot (flightdeck-bundled-2026-05): imported openai, anthropic, google; wrote catalog to .flightdeck/pricing-catalog.yaml ``` -The generated file uses all defaults. Edit `diff.*` thresholds or `db_path` before using -in a shared repo. For **PostgreSQL**, set **`database_url`** to a `postgresql://…` (or -`postgres://…`) DSN and install **`psycopg`** (`uv sync --extra postgres`); **`db_path`** -is ignored when **`database_url`** is set. **`flightdeck doctor --backup`** remains -SQLite-only. See [release-artifact.md § Workspace config](release-artifact.md). +The generated file uses defaults except **`pricing_catalog_path`** when bundled pricing is +enabled. 
Edit `diff.*` thresholds or `db_path` before using in a shared repo. For **PostgreSQL**, +set **`database_url`** to a `postgresql://…` (or `postgres://…`) DSN and install **`psycopg`** +(`uv sync --extra postgres`); **`db_path`** is ignored when **`database_url`** is set. +**`flightdeck doctor --backup`** remains SQLite-only. See [release-artifact.md § Workspace config](release-artifact.md) and [pricing-catalog.md](pricing-catalog.md) (bundled snapshot). --- diff --git a/docs/http-api.md b/docs/http-api.md index e6b2f06..ec2cf72 100644 --- a/docs/http-api.md +++ b/docs/http-api.md @@ -109,7 +109,7 @@ Read-only flags derived from `flightdeck.yaml` plus the running package version. "kind": "WorkspacePublic", "promotion_requires_approval": false, "pricing_catalog_configured": false, - "server_version": "1.1.2" + "server_version": "1.2.0" } ``` diff --git a/docs/pricing-catalog.md b/docs/pricing-catalog.md index 7f2e235..0edb51a 100644 --- a/docs/pricing-catalog.md +++ b/docs/pricing-catalog.md @@ -10,6 +10,18 @@ Imported **pricing tables** (`flightdeck pricing import …`) drive per-model to `POST /v1/diff` / `release diff`) and diagnostics in **`pricing.hints`** when multiple pricing table versions or naming patterns appear in the evidence window. +## Bundled snapshot (`flightdeck init`) + +Unless you pass **`--no-bundled-pricing`**, **`flightdeck init`** imports three convenience tables +(**`openai`**, **`anthropic`**, **`google`**) at **`pricing_version` `flightdeck-bundled-2026-05`** +(illustrative USD/1k token rates, **not** live vendor APIs). It copies the matching **PricingCatalog** +to **`.flightdeck/pricing-catalog.yaml`** and sets **`pricing_catalog_path`** in **`flightdeck.yaml`**. + +Pin **`spec.pricing_reference`** in **`release.yaml`** to **`provider` + `flightdeck-bundled-2026-05`** +for the side you want priced. For **Gemini-class** models, use **`provider: google`** in both the +release runtime and pricing reference. 
For production accuracy, **`flightdeck pricing import`** +your own YAML (and optionally **`--replace`** with **`--reason`**). + ## Relationship to `pricing.prices` On a diff, **`pricing.prices`** (when present) reflects **per-side imported tables** for the resolved baseline/candidate diff --git a/examples/ci/README.md b/examples/ci/README.md index 2fe7e31..001ab73 100644 --- a/examples/ci/README.md +++ b/examples/ci/README.md @@ -37,7 +37,7 @@ uv run python examples/ci/ledger_gate.py Example (**PyPI** install): ```bash -pip install "flightdeck-ai>=1.1.2" +pip install "flightdeck-ai>=1.2.0" export WORKSPACE="$(mktemp -d)" export QUICKSTART_ROOT=/path/to/flightdeck/examples/quickstart python /path/to/flightdeck/examples/ci/ledger_gate.py diff --git a/examples/ci/github-actions/policy-gate-pypi.yml b/examples/ci/github-actions/policy-gate-pypi.yml index 74c1dff..c6618df 100644 --- a/examples/ci/github-actions/policy-gate-pypi.yml +++ b/examples/ci/github-actions/policy-gate-pypi.yml @@ -11,7 +11,7 @@ on: env: # Pin to a tag or SHA that matches your installed flightdeck-ai version when possible. 
FLIGHTDECK_REF: main - FLIGHTDECK_AI_SPEC: ">=1.1.2" + FLIGHTDECK_AI_SPEC: ">=1.2.0" jobs: ledger-gate: diff --git a/examples/deploy/Dockerfile b/examples/deploy/Dockerfile index 823e9b7..837c767 100644 --- a/examples/deploy/Dockerfile +++ b/examples/deploy/Dockerfile @@ -2,7 +2,7 @@ FROM python:3.14-slim RUN pip install --no-cache-dir --upgrade pip \ - && pip install --no-cache-dir "flightdeck-ai>=1.1.2" + && pip install --no-cache-dir "flightdeck-ai>=1.2.0" WORKDIR /workspace diff --git a/examples/deploy/chart/flightdeck/Chart.yaml b/examples/deploy/chart/flightdeck/Chart.yaml index b6e9a2e..fce0b69 100644 --- a/examples/deploy/chart/flightdeck/Chart.yaml +++ b/examples/deploy/chart/flightdeck/Chart.yaml @@ -2,5 +2,5 @@ apiVersion: v2 name: flightdeck description: Optional Helm chart for flightdeck serve (single-replica reference) type: application -version: 0.1.0 -appVersion: "1.1.2" +version: 0.1.1 +appVersion: "1.2.0" diff --git a/src/flightdeck/bundled_pricing/__init__.py b/src/flightdeck/bundled_pricing/__init__.py new file mode 100644 index 0000000..1e24330 --- /dev/null +++ b/src/flightdeck/bundled_pricing/__init__.py @@ -0,0 +1 @@ +"""Bundled convenience pricing snapshots shipped with FlightDeck (see bundled_pricing_bootstrap).""" diff --git a/src/flightdeck/bundled_pricing/anthropic.yaml b/src/flightdeck/bundled_pricing/anthropic.yaml new file mode 100644 index 0000000..4d4e2a0 --- /dev/null +++ b/src/flightdeck/bundled_pricing/anthropic.yaml @@ -0,0 +1,15 @@ +# FlightDeck bundled snapshot — illustrative public list prices, not live vendor APIs. +# For production accuracy, run: flightdeck pricing import +# Snapshot id: flightdeck-bundled-2026-05 (see README / docs/pricing-catalog.md). 
+provider: anthropic +pricing_version: flightdeck-bundled-2026-05 +entries: + - model: claude-3-5-haiku-20241022 + input_usd_per_1k_tokens: 0.25 + output_usd_per_1k_tokens: 1.25 + - model: claude-3-5-sonnet-20241022 + input_usd_per_1k_tokens: 3.0 + output_usd_per_1k_tokens: 15.0 + - model: claude-sonnet-4-20250514 + input_usd_per_1k_tokens: 3.0 + output_usd_per_1k_tokens: 15.0 diff --git a/src/flightdeck/bundled_pricing/catalog.yaml b/src/flightdeck/bundled_pricing/catalog.yaml new file mode 100644 index 0000000..7ad2f34 --- /dev/null +++ b/src/flightdeck/bundled_pricing/catalog.yaml @@ -0,0 +1,23 @@ +# Bundled PricingCatalog — maps bundled pricing tables to one comparable slot per tier. +# api_version / kind per schemas/v1/pricing_catalog.schema.json +api_version: v1 +kind: PricingCatalog +catalog_version: flightdeck-bundled-2026-05 +mappings: + - provider: openai + pricing_version: flightdeck-bundled-2026-05 + model: gpt-4o-mini + catalog_slot_id: comparable_flash_tier + - provider: anthropic + pricing_version: flightdeck-bundled-2026-05 + model: claude-3-5-haiku-20241022 + catalog_slot_id: comparable_flash_tier + - provider: google + pricing_version: flightdeck-bundled-2026-05 + model: gemini-2.0-flash + catalog_slot_id: comparable_flash_tier +tariffs: + comparable_flash_tier: + input_usd_per_1k_tokens: 0.12 + output_usd_per_1k_tokens: 0.45 + cached_input_usd_per_1k_tokens: 0.03 diff --git a/src/flightdeck/bundled_pricing/google.yaml b/src/flightdeck/bundled_pricing/google.yaml new file mode 100644 index 0000000..85c43c1 --- /dev/null +++ b/src/flightdeck/bundled_pricing/google.yaml @@ -0,0 +1,16 @@ +# FlightDeck bundled snapshot — illustrative public list prices, not live vendor APIs. +# Provider key "google" is the supported convention for Gemini-class models in release.yaml. +# For production accuracy, run: flightdeck pricing import +# Snapshot id: flightdeck-bundled-2026-05 (see README / docs/pricing-catalog.md). 
+provider: google +pricing_version: flightdeck-bundled-2026-05 +entries: + - model: gemini-2.0-flash + input_usd_per_1k_tokens: 0.10 + output_usd_per_1k_tokens: 0.40 + - model: gemini-2.0-flash-001 + input_usd_per_1k_tokens: 0.10 + output_usd_per_1k_tokens: 0.40 + - model: gemini-1.5-pro + input_usd_per_1k_tokens: 1.25 + output_usd_per_1k_tokens: 5.0 diff --git a/src/flightdeck/bundled_pricing/openai.yaml b/src/flightdeck/bundled_pricing/openai.yaml new file mode 100644 index 0000000..cc104dc --- /dev/null +++ b/src/flightdeck/bundled_pricing/openai.yaml @@ -0,0 +1,15 @@ +# FlightDeck bundled snapshot — illustrative public list prices, not live vendor APIs. +# For production accuracy, run: flightdeck pricing import +# Snapshot id: flightdeck-bundled-2026-05 (see README / docs/pricing-catalog.md). +provider: openai +pricing_version: flightdeck-bundled-2026-05 +entries: + - model: gpt-4o-mini + input_usd_per_1k_tokens: 0.15 + output_usd_per_1k_tokens: 0.60 + - model: gpt-4o + input_usd_per_1k_tokens: 2.5 + output_usd_per_1k_tokens: 10.0 + - model: gpt-4.1-mini + input_usd_per_1k_tokens: 0.40 + output_usd_per_1k_tokens: 1.60 diff --git a/src/flightdeck/bundled_pricing_bootstrap.py b/src/flightdeck/bundled_pricing_bootstrap.py new file mode 100644 index 0000000..6473cc4 --- /dev/null +++ b/src/flightdeck/bundled_pricing_bootstrap.py @@ -0,0 +1,48 @@ +"""Load bundled pricing YAML from the wheel and seed a new workspace.""" + +from __future__ import annotations + +from importlib import resources +from pathlib import Path +from typing import Any + +import yaml + +from flightdeck.models import PricingTable +from flightdeck.storage import Storage + +BUNDLED_PRICING_VERSION = "flightdeck-bundled-2026-05" +BUNDLED_TABLE_FILENAMES = ("openai.yaml", "anthropic.yaml", "google.yaml") +BUNDLED_CATALOG_FILENAME = "catalog.yaml" +DEFAULT_CATALOG_RELATIVE_PATH = ".flightdeck/pricing-catalog.yaml" + + +def _read_resource_text(name: str) -> str: + root = 
resources.files("flightdeck.bundled_pricing") + return root.joinpath(name).read_text(encoding="utf-8") + + +def load_bundled_pricing_tables() -> list[PricingTable]: + tables: list[PricingTable] = [] + for name in BUNDLED_TABLE_FILENAMES: + data: Any = yaml.safe_load(_read_resource_text(name)) + tables.append(PricingTable.model_validate(data)) + return tables + + +def load_bundled_catalog_yaml_text() -> str: + return _read_resource_text(BUNDLED_CATALOG_FILENAME) + + +def bootstrap_bundled_pricing( + *, + storage: Storage, + actor: str, + catalog_dest: Path, +) -> None: + """Import bundled pricing tables and write the catalog file to ``catalog_dest``.""" + catalog_dest.parent.mkdir(parents=True, exist_ok=True) + catalog_dest.write_text(load_bundled_catalog_yaml_text(), encoding="utf-8", newline="\n") + + for table in load_bundled_pricing_tables(): + storage.insert_pricing_table(table, replace=False, actor=actor, reason=None) diff --git a/src/flightdeck/cli/main.py b/src/flightdeck/cli/main.py index b7b1fd1..cb8e445 100644 --- a/src/flightdeck/cli/main.py +++ b/src/flightdeck/cli/main.py @@ -14,6 +14,11 @@ from flightdeck import __version__ from flightdeck.bundle import bundle_checksum +from flightdeck.bundled_pricing_bootstrap import ( + BUNDLED_PRICING_VERSION, + DEFAULT_CATALOG_RELATIVE_PATH, + bootstrap_bundled_pricing, +) from flightdeck.config import DEFAULT_CONFIG_FILENAME, load_config, write_default_config from flightdeck.doctor import run_doctor from flightdeck.models import ( @@ -74,13 +79,31 @@ def cli() -> None: @cli.command() @click.option("--path", "path_", default=DEFAULT_CONFIG_FILENAME, show_default=True) -def init(path_: str) -> None: +@click.option( + "--no-bundled-pricing", + is_flag=True, + default=False, + help="Skip bundled OpenAI/Anthropic/Google pricing import and catalog (air-gapped or custom-only).", +) +def init(path_: str, no_bundled_pricing: bool) -> None: """Create a local `flightdeck.yaml` workspace config.""" p = Path(path_) if 
p.exists(): raise click.ClickException(f"{p} already exists") - written = write_default_config(p) + catalog_rel: str | None = None if no_bundled_pricing else DEFAULT_CATALOG_RELATIVE_PATH + written = write_default_config(p, pricing_catalog_path=catalog_rel) click.echo(f"Wrote {written}") + cfg = load_config(written) + storage = storage_from_config(cfg) + storage.migrate() + if not no_bundled_pricing: + rel = cfg.pricing_catalog_path or DEFAULT_CATALOG_RELATIVE_PATH + catalog_dest = (Path.cwd() / Path(rel)).resolve() + bootstrap_bundled_pricing(storage=storage, actor=actor_name(), catalog_dest=catalog_dest) + click.echo( + f"Bundled pricing snapshot ({BUNDLED_PRICING_VERSION}): imported openai, anthropic, google; " + f"wrote catalog to {rel}" + ) @cli.command("doctor") diff --git a/src/flightdeck/config.py b/src/flightdeck/config.py index 6bcfa3e..1568f80 100644 --- a/src/flightdeck/config.py +++ b/src/flightdeck/config.py @@ -28,8 +28,13 @@ def load_config(path: str | Path = DEFAULT_CONFIG_FILENAME) -> WorkspaceConfig: return WorkspaceConfig.model_validate(data) -def write_default_config(path: str | Path = DEFAULT_CONFIG_FILENAME) -> Path: - cfg = WorkspaceConfig() +def write_default_config( + path: str | Path = DEFAULT_CONFIG_FILENAME, + *, + pricing_catalog_path: str | None = None, +) -> Path: + """Write a new ``flightdeck.yaml``. 
Pass ``pricing_catalog_path`` to enable bundled catalog layout.""" + cfg = WorkspaceConfig(pricing_catalog_path=pricing_catalog_path) p = Path(path) p.parent.mkdir(parents=True, exist_ok=True) diff --git a/tests/test_cli.py b/tests/test_cli.py index ffe3525..742db66 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -25,7 +25,7 @@ def test_cli_version() -> None: def test_doctor_backup_writes_valid_sqlite(tmp_path, monkeypatch: pytest.MonkeyPatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 dest = tmp_path / "snap" / "ledger.db" res = runner.invoke(cli, ["doctor", "--backup", str(dest)]) assert res.exit_code == 0 diff --git a/tests/test_cli_contract.py b/tests/test_cli_contract.py index 2b76cb4..9a22303 100644 --- a/tests/test_cli_contract.py +++ b/tests/test_cli_contract.py @@ -13,7 +13,7 @@ def test_release_verify_checksum_mismatch_exits_2(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 rel_dir = write_release( @@ -35,7 +35,7 @@ def test_release_verify_checksum_mismatch_exits_2(tmp_path: Path, monkeypatch) - def test_release_diff_fail_on_policy_exits_1(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=0.000001) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", 
pricing_version="openai-2026-04-30") @@ -116,7 +116,7 @@ def test_release_diff_fail_on_policy_exits_1(tmp_path: Path, monkeypatch) -> Non def test_release_diff_contract_invalid_window_is_nonzero(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 @@ -144,7 +144,7 @@ def test_release_diff_contract_invalid_window_is_nonzero(tmp_path: Path, monkeyp def test_release_promote_policy_fail_contract(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=0.0001) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -190,7 +190,7 @@ def test_release_promote_policy_fail_contract(tmp_path: Path, monkeypatch) -> No def test_release_verify_ok_exits_zero(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 rel_dir = write_release( @@ -209,7 +209,7 @@ def test_release_verify_ok_exits_zero(tmp_path: Path, monkeypatch) -> None: def test_release_diff_unknown_baseline_nonzero(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 
+ assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 cand = write_release( @@ -228,7 +228,7 @@ def test_release_diff_unknown_baseline_nonzero(tmp_path: Path, monkeypatch) -> N def test_release_history_shows_promote_line(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -272,7 +272,7 @@ def test_release_history_shows_promote_line(tmp_path: Path, monkeypatch) -> None def test_release_rollback_exits_zero_and_history_shows_rollback(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") diff --git a/tests/test_doctor.py b/tests/test_doctor.py index a703756..08a1beb 100644 --- a/tests/test_doctor.py +++ b/tests/test_doctor.py @@ -16,7 +16,7 @@ def test_doctor_passes_after_init(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 res = runner.invoke(cli, ["doctor"]) assert res.exit_code == 0 assert "schema_migrations" in res.output @@ -26,7 +26,7 @@ def 
test_doctor_passes_after_init(tmp_path: Path, monkeypatch) -> None: def test_doctor_audit_seq_ok_after_two_promotions(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -152,7 +152,7 @@ def test_insert_promotion_record_uses_immediate_transaction(tmp_path: Path) -> N def test_doctor_fails_when_promoted_release_missing(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 assert runner.invoke(cli, ["doctor"]).exit_code == 0 db_path = tmp_path / ".flightdeck" / "flightdeck.db" @@ -176,7 +176,7 @@ def test_doctor_fails_when_promoted_release_missing(tmp_path: Path, monkeypatch) def test_release_actions_audit_seq_is_contiguous_direct_check(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") diff --git a/tests/test_init_bundled_pricing.py b/tests/test_init_bundled_pricing.py new file mode 100644 index 0000000..a2a0ec3 --- /dev/null +++ b/tests/test_init_bundled_pricing.py @@ -0,0 +1,55 @@ +from __future__ import annotations + +from pathlib import Path + +import yaml +from click.testing import CliRunner + +from flightdeck.bundled_pricing_bootstrap 
import BUNDLED_PRICING_VERSION, DEFAULT_CATALOG_RELATIVE_PATH +from flightdeck.cli.main import cli +from flightdeck.config import load_config +from flightdeck.storage import storage_from_config + + +def test_init_seeds_bundled_pricing_and_catalog(tmp_path: Path, monkeypatch) -> None: + monkeypatch.chdir(tmp_path) + runner = CliRunner() + assert runner.invoke(cli, ["init"]).exit_code == 0 + + cfg = load_config(tmp_path / "flightdeck.yaml") + assert cfg.pricing_catalog_path == DEFAULT_CATALOG_RELATIVE_PATH + + cat = tmp_path / DEFAULT_CATALOG_RELATIVE_PATH + assert cat.is_file() + data = yaml.safe_load(cat.read_text(encoding="utf-8")) + assert data.get("kind") == "PricingCatalog" + assert BUNDLED_PRICING_VERSION in str(data.get("catalog_version", "")) + + storage = storage_from_config(cfg) + for provider in ("openai", "anthropic", "google"): + t = storage.get_pricing_table(provider, BUNDLED_PRICING_VERSION) + assert t is not None, f"missing {provider}" + assert t.entries + + +def test_init_no_bundled_pricing_skips_imports(tmp_path: Path, monkeypatch) -> None: + monkeypatch.chdir(tmp_path) + runner = CliRunner() + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + + cfg = load_config(tmp_path / "flightdeck.yaml") + assert cfg.pricing_catalog_path is None + + storage = storage_from_config(cfg) + assert storage.get_pricing_table("openai", BUNDLED_PRICING_VERSION) is None + + assert not (tmp_path / ".flightdeck" / "pricing-catalog.yaml").is_file() + + +def test_bundled_resources_readable() -> None: + from flightdeck.bundled_pricing_bootstrap import load_bundled_pricing_tables + + tables = load_bundled_pricing_tables() + assert len(tables) == 3 + providers = {t.provider for t in tables} + assert providers == {"openai", "anthropic", "google"} diff --git a/tests/test_operator_slice.py b/tests/test_operator_slice.py index dea692e..6bdbe15 100644 --- a/tests/test_operator_slice.py +++ b/tests/test_operator_slice.py @@ -35,7 +35,7 @@ def 
_enable_promotion_approval(path: Path) -> None: def test_pricing_hints_when_alternate_pricing_version_exists(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy( tmp_path, min_candidate_runs=0, @@ -63,7 +63,7 @@ def test_pricing_hints_when_alternate_pricing_version_exists(tmp_path: Path, mon def test_catalog_comparable_cost_on_cross_provider_diff(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy( tmp_path, min_candidate_runs=0, @@ -173,7 +173,7 @@ def test_promotion_request_and_confirm(tmp_path: Path) -> None: ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(ws, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(ws, provider="openai", pricing_version="openai-2026-04-30") @@ -212,7 +212,7 @@ def test_get_v1_runs(tmp_path: Path) -> None: ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(ws, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(ws, provider="openai", pricing_version="openai-2026-04-30") @@ -240,7 +240,7 @@ def test_get_v1_runs_export_ndjson_and_headers(tmp_path: Path) -> None: ws.mkdir(parents=True, exist_ok=True) runner = 
CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(ws, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(ws, provider="openai", pricing_version="openai-2026-04-30") @@ -270,7 +270,7 @@ def test_get_v1_runs_session_filter_and_offset(tmp_path: Path) -> None: ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(ws, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(ws, provider="openai", pricing_version="openai-2026-04-30") @@ -318,7 +318,7 @@ def test_runs_trace_id_filter_http_and_cli(tmp_path: Path) -> None: ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(ws, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(ws, provider="openai", pricing_version="openai-2026-04-30") @@ -366,7 +366,7 @@ def test_runs_trace_id_filter_http_and_cli(tmp_path: Path) -> None: def test_cli_runs_list_json(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = 
write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -384,7 +384,7 @@ def test_cli_runs_list_json(tmp_path: Path, monkeypatch) -> None: def test_cli_runs_export_jsonl_truncation_and_stderr(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, min_candidate_runs=0, min_baseline_runs=0, min_low_runs=0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -410,7 +410,7 @@ def test_diff_survives_malformed_catalog_yaml_syntax(tmp_path: Path, monkeypatch """Invalid YAML in pricing catalog must not crash diff (YAMLError → catalog warning).""" monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 (tmp_path / "bad_catalog.yaml").write_text("catalog_version: 'unterminated\n", encoding="utf-8") fd = tmp_path / "flightdeck.yaml" cfg = yaml.safe_load(fd.read_text(encoding="utf-8")) or {} diff --git a/tests/test_release_verify.py b/tests/test_release_verify.py index 473c144..e4fbcdc 100644 --- a/tests/test_release_verify.py +++ b/tests/test_release_verify.py @@ -17,7 +17,7 @@ def test_release_verify_ok(tmp_path: Path, monkeypatch) -> None: bundle = tmp_path / "bundle" shutil.copytree(FIXTURE, bundle) - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = { "provider": "openai", "pricing_version": "p", @@ -39,7 +39,7 @@ def test_release_verify_exit_2_on_mismatch(tmp_path: Path, monkeypatch) -> None: bundle = tmp_path / "bundle" shutil.copytree(FIXTURE, bundle) - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, 
["init", "--no-bundled-pricing"]).exit_code == 0 pricing = { "provider": "openai", "pricing_version": "p", diff --git a/tests/test_server_actions.py b/tests/test_server_actions.py index 72507b8..7226c13 100644 --- a/tests/test_server_actions.py +++ b/tests/test_server_actions.py @@ -32,7 +32,7 @@ def _seed_workspace(path: Path) -> tuple[CliRunner, str, str]: path.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(path): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(path, provider="openai", pricing_version="openai-2026-04-30") @@ -115,7 +115,7 @@ def test_http_v1_diff_pricing_warnings_when_model_missing(tmp_path: Path) -> Non ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy( ws, max_cost_per_run_usd=10.0, @@ -267,7 +267,7 @@ def test_http_v1_workspace_reflects_catalog_and_approval_flags(tmp_path: Path) - ws.mkdir(parents=True, exist_ok=True) runner = CliRunner() with _cwd(ws): - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 cfg_path = ws / "flightdeck.yaml" data: dict[str, Any] = yaml.safe_load(cfg_path.read_text(encoding="utf-8")) or {} data["pricing_catalog_path"] = "pricing/catalog.yaml" @@ -285,7 +285,7 @@ def test_http_v1_workspace_reflects_catalog_and_approval_flags(tmp_path: Path) - def test_ui_root_serves_vite_index(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() with TestClient(app) 
as client: r = client.get("/") diff --git a/tests/test_server_health.py b/tests/test_server_health.py index edaa544..56e3acd 100644 --- a/tests/test_server_health.py +++ b/tests/test_server_health.py @@ -42,7 +42,7 @@ def test_health_whitespace_only_token_treated_as_loopback(monkeypatch: pytest.Mo def test_get_v1_metrics_401_without_bearer_when_token_set(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "metrics-read-gate") monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 with TestClient(create_app()) as client: r = client.get("/v1/metrics") assert r.status_code == 401 @@ -52,7 +52,7 @@ def test_get_v1_metrics_401_without_bearer_when_token_set(tmp_path: Path, monkey def test_get_v1_metrics_200_with_bearer_when_token_set(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "metrics-read-ok") monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 with TestClient(create_app()) as client: r = client.get("/v1/metrics", headers={"Authorization": "Bearer metrics-read-ok"}) assert r.status_code == 200 @@ -62,7 +62,7 @@ def test_get_v1_metrics_200_with_bearer_when_token_set(tmp_path: Path, monkeypat def test_v1_metrics_returns_counters(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 diff --git a/tests/test_server_ingest.py b/tests/test_server_ingest.py index 23fddc3..615f727 100644 --- 
a/tests/test_server_ingest.py +++ b/tests/test_server_ingest.py @@ -16,7 +16,7 @@ def test_post_v1_events_ingests(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = { "provider": "openai", @@ -84,7 +84,7 @@ def test_post_v1_events_ingests(tmp_path: Path, monkeypatch) -> None: def test_post_v1_events_rejects_non_v1_api_version(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() client = TestClient(app) @@ -150,7 +150,7 @@ def _make_run_event_dict(*, api_version: str | None = "v1") -> dict: def test_post_v1_events_rejects_empty_api_version_string(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() client = TestClient(app) ev = _make_run_event_dict(api_version="") @@ -163,7 +163,7 @@ def test_post_v1_events_rejects_empty_api_version_string(tmp_path: Path, monkeyp def test_post_v1_events_rejects_wrong_casing_v1(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() client = TestClient(app) ev = _make_run_event_dict(api_version="V1") @@ -176,7 +176,7 @@ def test_post_v1_events_rejects_wrong_casing_v1(tmp_path: Path, monkeypatch) -> def test_post_v1_events_rejects_null_api_version(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 
0 app = create_app() client = TestClient(app) ev = _make_run_event_dict() @@ -191,7 +191,7 @@ def test_post_v1_events_rejects_null_api_version(tmp_path: Path, monkeypatch) -> def test_post_v1_events_accepts_omitted_api_version(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = { "provider": "openai", @@ -256,7 +256,7 @@ def test_post_v1_events_accepts_omitted_api_version(tmp_path: Path, monkeypatch) def test_post_v1_events_rejects_non_loopback_without_token(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() transport = httpx.ASGITransport(app=app, client=("198.51.100.2", 44444)) @@ -272,7 +272,7 @@ async def _run() -> httpx.Response: def test_post_v1_events_non_loopback_requires_bearer_when_token_set(tmp_path: Path, monkeypatch) -> None: monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "ingest-test-secret") monkeypatch.chdir(tmp_path) - assert CliRunner().invoke(cli, ["init"]).exit_code == 0 + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 app = create_app() transport = httpx.ASGITransport(app=app, client=("198.51.100.3", 44444)) @@ -289,7 +289,7 @@ def test_post_v1_events_accepts_non_loopback_with_bearer(tmp_path: Path, monkeyp monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "ingest-bearer-ok") monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = { "provider": "openai", diff --git a/tests/test_spine.py b/tests/test_spine.py index 7da7d1a..425bd86 100644 --- a/tests/test_spine.py +++ b/tests/test_spine.py @@ -206,7 +206,7 @@ def 
test_bundle_checksum_order_independent(tmp_path: Path) -> None: def test_release_diff_invalid_window(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 @@ -224,7 +224,7 @@ def test_release_diff_invalid_window(tmp_path: Path, monkeypatch) -> None: def test_pricing_replace_requires_reason(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 @@ -241,7 +241,7 @@ def test_rollback_promotes_prior_release(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 assert runner.invoke(cli, ["policy", "set", str(write_policy(tmp_path))]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -317,7 +317,7 @@ def test_end_to_end_local_diff(tmp_path: Path, monkeypatch) -> None: runner = CliRunner() # init workspace config - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 assert (tmp_path / "flightdeck.yaml").exists() @@ -373,7 +373,7 @@ def test_diff_rejects_cross_agent(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", 
"--no-bundled-pricing"]) assert res.exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -393,7 +393,7 @@ def test_pricing_show_and_missing_table_error(tmp_path: Path, monkeypatch) -> No monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -420,7 +420,7 @@ def test_diff_reports_missing_baseline_pricing_table(tmp_path: Path, monkeypatch monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 candidate_pricing = write_pricing(tmp_path, provider="openai", pricing_version="candidate-pricing") @@ -453,7 +453,7 @@ def test_diff_reports_missing_candidate_pricing_table(tmp_path: Path, monkeypatc monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 baseline_pricing = write_pricing(tmp_path, provider="openai", pricing_version="baseline-pricing") @@ -486,7 +486,7 @@ def test_diff_reports_missing_model_entry_in_pricing_table(tmp_path: Path, monke monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 pricing = write_pricing( @@ -551,7 +551,7 @@ def test_diff_uses_separate_baseline_and_candidate_pricing_tables( monkeypatch.chdir(tmp_path) runner = CliRunner() - res = runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 baseline_pricing = write_pricing( @@ -606,7 +606,7 @@ def test_policy_set_and_show(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - res = 
runner.invoke(cli, ["init"]) + res = runner.invoke(cli, ["init", "--no-bundled-pricing"]) assert res.exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=1.5) @@ -624,7 +624,7 @@ def test_first_promotion_requires_reason_and_records_history(tmp_path: Path, mon monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 @@ -674,7 +674,7 @@ def test_second_promotion_fails_when_policy_fails_and_keeps_current_release( monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=1.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -753,7 +753,7 @@ def test_passing_second_promotion_replaces_current_release(tmp_path: Path, monke monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, max_cost_per_run_usd=10.0) assert runner.invoke(cli, ["policy", "set", str(policy)]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") @@ -825,7 +825,7 @@ def test_diff_medium_confidence_blocks_promotion(tmp_path: Path, monkeypatch) -> monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy(tmp_path, require_high_diff_confidence=True) assert runner.invoke(cli, 
["policy", "set", str(policy)]).exit_code == 0 @@ -882,7 +882,7 @@ def test_diff_medium_confidence_blocks_promotion(tmp_path: Path, monkeypatch) -> def test_runs_ingest_empty_file(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 empty = tmp_path / "empty.jsonl" empty.write_text("", encoding="utf-8") @@ -895,7 +895,7 @@ def test_runs_ingest_empty_file(tmp_path: Path, monkeypatch) -> None: def test_runs_ingest_rejects_malformed_jsonl(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 bad = tmp_path / "bad.jsonl" bad.write_text("{not valid json}\n", encoding="utf-8") @@ -907,7 +907,7 @@ def test_runs_ingest_rejects_malformed_jsonl(tmp_path: Path, monkeypatch) -> Non def test_runs_ingest_accepts_json_array(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 pricing = write_pricing(tmp_path, provider="openai", pricing_version="openai-2026-04-30") assert runner.invoke(cli, ["pricing", "import", str(pricing)]).exit_code == 0 @@ -960,7 +960,7 @@ def test_diff_cross_provider_releases(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 openai_pricing = write_pricing( tmp_path, @@ -1058,7 +1058,7 @@ def test_diff_cross_model_same_provider(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", 
"--no-bundled-pricing"]).exit_code == 0 pricing_path = tmp_path / "pricing_openai_multi.yaml" pricing_data = { @@ -1144,7 +1144,7 @@ def test_diff_cross_model_same_provider(tmp_path: Path, monkeypatch) -> None: def test_release_diff_output_json_shape(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy( tmp_path, min_candidate_runs=0, @@ -1178,7 +1178,7 @@ def test_release_diff_output_json_shape(tmp_path: Path, monkeypatch) -> None: def test_release_diff_pricing_warnings_when_model_not_in_table(tmp_path: Path, monkeypatch) -> None: monkeypatch.chdir(tmp_path) runner = CliRunner() - assert runner.invoke(cli, ["init"]).exit_code == 0 + assert runner.invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 policy = write_policy( tmp_path, min_candidate_runs=0, diff --git a/web/README.md b/web/README.md index d5285e2..f4024dc 100644 --- a/web/README.md +++ b/web/README.md @@ -50,7 +50,7 @@ npm run test:e2e **`playwright.config.ts`** starts **`scripts/e2e-server.mjs`**: a fresh workspace under **`.tmp/playwright-fd-workspace/`**, then **`flightdeck serve`** on **`http://127.0.0.1:9876`**. On GitHub Actions the server uses **`uv run flightdeck …`**; locally it uses **`python -m flightdeck.cli.main`** or **`py -3`**. -The default **`npm run test:e2e`** suite expects **`promotion_requires_approval: false`** in that workspace. A stray shell **`FD_E2E_FORCE_APPROVAL=1`** does **not** flip the server by itself: **`e2e-server.mjs`** only patches YAML when **`PW_FORCE_APPROVAL_WORKSPACE=1`**, which **`playwright.config.ts`** sets when the Playwright CLI lists **exactly one** `e2e/*.spec.ts` argument and it is **`e2e/actions-approval.spec.ts`**. 
Run approval tests with **`FD_E2E_FORCE_APPROVAL=1 npx playwright test e2e/actions-approval.spec.ts`** (that single-file form both enables the approval workspace and un-skips the describe block). Do not pass **`e2e/actions-approval.spec.ts`** together with other spec paths unless you intend a split server mode (the server is one workspace for the whole run). +The default **`npm run test:e2e`** suite expects **`promotion_requires_approval: false`** and **`pricing_catalog_configured: false`** in that workspace; **`e2e-server.mjs`** runs **`flightdeck init --no-bundled-pricing`** so the probe matches **`e2e/smoke.spec.ts`**. A stray shell **`FD_E2E_FORCE_APPROVAL=1`** does **not** flip the server by itself: **`e2e-server.mjs`** only patches YAML when **`PW_FORCE_APPROVAL_WORKSPACE=1`**, which **`playwright.config.ts`** sets when the Playwright CLI lists **exactly one** `e2e/*.spec.ts` argument and it is **`e2e/actions-approval.spec.ts`**. Run approval tests with **`FD_E2E_FORCE_APPROVAL=1 npx playwright test e2e/actions-approval.spec.ts`** (that single-file form both enables the approval workspace and un-skips the describe block). Do not pass **`e2e/actions-approval.spec.ts`** together with other spec paths unless you intend a split server mode (the server is one workspace for the whole run). Run **`npm`** commands from this **`web/`** directory (repo root is one level up: **`cd web`**). diff --git a/web/scripts/e2e-server.mjs b/web/scripts/e2e-server.mjs index 79e2657..2c79ffe 100644 --- a/web/scripts/e2e-server.mjs +++ b/web/scripts/e2e-server.mjs @@ -30,12 +30,14 @@ function run(cmd, args, opts) { fs.rmSync(ws, { recursive: true, force: true }); fs.mkdirSync(ws, { recursive: true }); +// Minimal workspace for UI/e2e: no bundled catalog path so GET /v1/workspace matches +// smoke.spec.ts (pricing_catalog_configured: false) and actions approval tests stay predictable. 
if (inCi) { - await run("uv", ["run", "flightdeck", "init"], { cwd: ws }); + await run("uv", ["run", "flightdeck", "init", "--no-bundled-pricing"], { cwd: ws }); } else if (process.platform === "win32") { - await run("py", ["-3", "-m", "flightdeck.cli.main", "init"], { cwd: ws }); + await run("py", ["-3", "-m", "flightdeck.cli.main", "init", "--no-bundled-pricing"], { cwd: ws }); } else { - await run("python", ["-m", "flightdeck.cli.main", "init"], { cwd: ws }); + await run("python", ["-m", "flightdeck.cli.main", "init", "--no-bundled-pricing"], { cwd: ws }); } // Set by `playwright.config.ts` only when the Playwright CLI targets `e2e/actions-approval.spec.ts`