Merged
2 changes: 1 addition & 1 deletion .github/workflows/release-pypi.yml
@@ -1,4 +1,4 @@
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.1.2).
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.2.0).
# Configure "trusted publishing" on PyPI for this workflow + repository + optional GitHub environment.
# https://docs.pypi.org/trusted-publishers/

24 changes: 11 additions & 13 deletions CHANGELOG.md
@@ -6,38 +6,36 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

## Unreleased

## 1.2.0 - 2026-05-03

### Breaking

- **`POST /v1/events`:** uses the same **`FLIGHTDECK_LOCAL_API_TOKEN`** / loopback policy as promotion and rollback. Remote unauthenticated ingest is no longer accepted; set the env var and send **`Authorization: Bearer`** (Python SDK **`api_token=`**, or **`--api-token`** / env in **[examples/integration/emit_sample_events.node.mjs](examples/integration/emit_sample_events.node.mjs)**).
- **`GET /v1/*`:** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set, read APIs require **`Authorization: Bearer`** (same header as writes); previously only mutations were Bearer-gated.
- **Python:** **`requires-python`** is **`>=3.11,<4`** (replaces **`>=3.14,<3.15`**). **`[tool.ruff] target-version`** is **`py311`**. CI follows **`.python-version`** (currently **3.12**).
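
A client-side sketch of the Bearer requirement described in the entries above, using only the Python standard library. Only the `FLIGHTDECK_LOCAL_API_TOKEN` variable, the `/v1/events` and `/v1/runs` paths, and the `Authorization: Bearer` header come from this changelog; the helper name `build_request`, the placeholder token fallback, and the one-field payload are illustrative assumptions, not the project's documented event schema.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # local address shown in the README; adjust as needed


def build_request(path, token, payload=None):
    """Build (but do not send) a request that carries the Bearer token."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "POST" if data is not None else "GET"
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Placeholder fallback so the sketch runs without the env var set.
token = os.environ.get("FLIGHTDECK_LOCAL_API_TOKEN", "example-token")

# Hypothetical payload: the real event fields are defined by the project's schemas.
write_req = build_request("/v1/events", token, payload={"api_version": "v1"})
read_req = build_request("/v1/runs", token)
```

Passing either object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without a server.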

### Changed

- **Docs / examples:** **`DEVELOPMENT.md`**, **`AGENTS.md`**, **`docs/sdk.md`**, **`docs/troubleshooting.md`**, **`examples/integration/README.md`**, **`examples/integration/adoption/README.md`**, **`examples/deploy/README.md`** — align with the Python range and ledger-write ingest model.

### Added

- **`flightdeck init`** (default): migrates the ledger, imports **bundled** OpenAI / Anthropic / Google pricing tables (**`pricing_version` `flightdeck-bundled-2026-05`**, illustrative snapshot), writes **`.flightdeck/pricing-catalog.yaml`**, and sets **`pricing_catalog_path`** in **`flightdeck.yaml`** so diffs can show cost signals without manual **`pricing import`**. Opt out with **`--no-bundled-pricing`**. Bundled YAML ships under **`src/flightdeck/bundled_pricing/`** (wheel package data).
- **`GET /health`:** **`read_auth`** (`open` vs `bearer`) describes whether **`GET /v1/*`** requires **`Authorization: Bearer`** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set (aligned with writes).
- **SQLite:** bounded retries on **`database is locked` / busy** for ledger **`execute`** paths; **`flightdeck serve --sqlite-lock-timeout`** / **`--retry-sqlite-lock`** (and env **`FLIGHTDECK_SQLITE_*`**) plus **`docs/operations-and-policy.md`** concurrency notes.
- **CI / dev:** **`pytest-cov`** with **`--cov-fail-under=80`** on **`src/flightdeck`** (**`integrations/*`**, **`quickstart_smoke`**, and **`sdk/client.py`** omitted from the denominator — see **`[tool.coverage.run]`** in **`pyproject.toml`**).
- **Experimental `flightdeck.integrations`:** optional extras **`integrations-langchain`**, **`integrations-temporal`**, **`integrations-openai-agents`**, and meta **`integrations-ci`** (CI job); thin mappers from OpenAI chat completions, Anthropic messages, OpenAI Agents–style results, LangChain callbacks, CrewAI-style manual totals, and Temporal-oriented **`labels`**. Docs: **`docs/sdk-integrations.md`**; examples: **`examples/integration/adoption/`**. Contributor policy updates in **`AGENTS.md`** / **`CLAUDE.md`**.
- **PostgreSQL ledger:** optional **`database_url`** in **`flightdeck.yaml`** (`postgresql://` or `postgres://`); install **`psycopg`** with **`uv sync --extra postgres`** (or **`pip install 'flightdeck-ai[postgres]'`**). Same schema migrations and API behavior as SQLite; run filters use **`::json`** predicates on **`event_json`**. **`flightdeck doctor --backup`** stays SQLite-only (use **`pg_dump`** for Postgres). Optional integration tests: **`FLIGHTDECK_TEST_POSTGRES_URL`** with the **`postgres`** extra.
- **`GET /v1/runs/export`** — NDJSON stream of the same filtered slice as **`GET /v1/runs`** (optional response headers when truncated).
- **`session_id`** / **`span_id`** query filters on **`GET /v1/runs`**, matching CLI/SDK, and **`offset`** pagination on run listings (with **`runs list`** / **`runs export`**).
- **Web Runs** page — query **`GET /v1/runs`** from the bundled UI.
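
The bounded-retry idea behind the SQLite entry above can be sketched with the standard `sqlite3` module. This is not FlightDeck's implementation: `execute_with_retry`, its parameters, and the backoff values are hypothetical and only show the general pattern of retrying lock errors a fixed number of times.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), attempts=5, base_delay=0.05):
    """Run one statement, retrying a bounded number of times on lock/busy errors."""
    for attempt in range(1, attempts + 1):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            transient = "locked" in message or "busy" in message
            if not transient or attempt == attempts:
                raise  # non-lock errors and the final failed attempt propagate
            time.sleep(base_delay * attempt)  # linear backoff between tries


# timeout is SQLite's own busy wait (seconds) before it raises "database is locked".
conn = sqlite3.connect(":memory:", timeout=1.0)
execute_with_retry(conn, "CREATE TABLE runs (id INTEGER PRIMARY KEY, note TEXT)")
execute_with_retry(conn, "INSERT INTO runs (note) VALUES (?)", ("ok",))
rows = execute_with_retry(conn, "SELECT note FROM runs").fetchall()
```

Keeping the retry count bounded means a persistently locked database still surfaces as an error instead of hanging the caller.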

### Changed

- **Docs / examples:** **`DEVELOPMENT.md`**, **`AGENTS.md`**, **`README.md`**, **`ROADMAP.md`**, **`SUPPORT.md`**, **`CONTRIBUTING.md`**, **`docs/sdk.md`**, **`docs/troubleshooting.md`**, **`docs/pricing-catalog.md`**, **`examples/integration/README.md`**, **`examples/integration/adoption/README.md`**, **`examples/deploy/README.md`** — align with the Python range, ledger-write ingest model, bundled init, ICP/sustainability copy, and outcome-oriented roadmap language.
- **Web Runs:** forensics — empty / offset / truncation messaging, export copy, trace band rows or **Group by trace_id**, **View** drawer (structured fields + full JSON, **session_id** / **span_id**, focus trap + return focus, **`aria-haspopup="dialog"`**), trace/status columns; **run-query** failures show a typed error card with **Retry**.
- **Web Diff:** scannable sections (policy, evidence window, pricing/catalog/hints, rollups), pre-query hint, `evaluated_at` when present; warn when imported **pricing table versions** or **providers** differ baseline vs candidate.
- **Web Actions:** workspace loading skeleton; numbered approval steps; pending **Refresh list** / **Use for confirm**; clearer confirms; approval-reason placeholder; **Rollback** danger-styled; **Actions** shows whether **`VITE_FLIGHTDECK_LOCAL_API_TOKEN`** is set (no value) and an inline hint when the server uses **Bearer** and the UI token is missing.
- **Web shell / Overview / CSS:** **Langfuse-style** left sidebar + main column (stacks on narrow viewports); skeleton loading on first load; **Overview** auto-polls timeline + metrics every **30s** when the tab is visible (silent refresh; no manual **Refresh** button); updates after **Actions** mutations via context; ledger metrics hints + links to **Diff** / **Runs**; Diff query **`aria-busy`**; **Security strip** `/health` loading + **Bearer** + client-token reassurance line; shared **focus-visible** / type scale / narrow breakpoints; **skip to main** (HashRouter-safe); **[ROADMAP.md](ROADMAP.md)** adds **Visual system** backlog item and theme deferral.
- **Examples / deploy / SECURITY / web README:** [examples/README.md](examples/README.md) end-to-end loop + **UI polish / operator flow** blurb; deploy checklist + **`restart: unless-stopped`**; **[SECURITY.md](SECURITY.md)** deploy pointer; **[web/README.md](web/README.md)** Playwright approval vs default runs.
- **Playwright:** `e2e-server.mjs` gates approval workspace on **`PW_FORCE_APPROVAL_WORKSPACE`** (set from config); **`reuseExistingServer: false`**; config sets approval workspace only when the CLI lists **exactly one** `e2e/*.spec.ts` path and it is **`actions-approval.spec.ts`** (avoids multi-spec argv; **`PW_WEBSERVER_APPROVAL`** no longer toggles the server so a stale value cannot break **`npm run test:e2e`**); **`actions-approval.spec.ts`** skips when **`GET /v1/workspace`** shows approval off (e.g. full suite with **`FD_E2E_FORCE_APPROVAL=1`**).

### Added

- **PostgreSQL ledger:** optional **`database_url`** in **`flightdeck.yaml`** (`postgresql://` or `postgres://`); install **`psycopg`** with **`uv sync --extra postgres`** (or **`pip install 'flightdeck-ai[postgres]'`**). Same schema migrations and API behavior as SQLite; run filters use **`::json`** predicates on **`event_json`**. **`flightdeck doctor --backup`** stays SQLite-only (use **`pg_dump`** for Postgres). Optional integration tests: **`FLIGHTDECK_TEST_POSTGRES_URL`** with the **`postgres`** extra.
- **`GET /v1/runs/export`** — NDJSON stream of the same filtered slice as **`GET /v1/runs`** (optional response headers when truncated).
- **`session_id`** / **`span_id`** query filters on **`GET /v1/runs`**, matching CLI/SDK, and **`offset`** pagination on run listings (with **`runs list`** / **`runs export`**).
- **Web Runs** page — query **`GET /v1/runs`** from the bundled UI.
- **Examples / deploy / SECURITY / web README:** [examples/README.md](examples/README.md) end-to-end loop + **UI polish / operator flow** blurb; deploy checklist + **`restart: unless-stopped`**; **[SECURITY.md](SECURITY.md)** deploy pointer; **[web/README.md](web/README.md)** Playwright approval vs default runs and **`init --no-bundled-pricing`** for stable **`GET /v1/workspace`** probes.
- **Playwright:** `e2e-server.mjs` gates approval workspace on **`PW_FORCE_APPROVAL_WORKSPACE`** (set from config); **`reuseExistingServer: false`**; config sets approval workspace only when the CLI lists **exactly one** `e2e/*.spec.ts` path and it is **`actions-approval.spec.ts`** (avoids multi-spec argv; **`PW_WEBSERVER_APPROVAL`** no longer toggles the server so a stale value cannot break **`npm run test:e2e`**); **`actions-approval.spec.ts`** skips when **`GET /v1/workspace`** shows approval off (e.g. full suite with **`FD_E2E_FORCE_APPROVAL=1`**); default e2e workspace uses **`flightdeck init --no-bundled-pricing`** so **`pricing_catalog_configured`** stays **`false`** for **`e2e/smoke.spec.ts`**.
- **Examples / CI / deploy / Helm pins:** **`flightdeck-ai>=1.2.0`** where version pins apply.
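
Consuming the NDJSON export mentioned above amounts to decoding one JSON object per line. A minimal sketch follows; the sample records and the `run_id` field are invented for illustration and are not the documented export shape (`session_id` is the filter name from the entry above).

```python
import io
import json


def iter_ndjson(lines):
    """Yield one decoded object per non-empty line of an NDJSON stream."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)


# Stand-in for a streamed response body; blank lines are tolerated.
sample = io.StringIO(
    '{"run_id": "a", "session_id": "s1"}\n'
    "\n"
    '{"run_id": "b", "session_id": "s1"}\n'
)
records = list(iter_ndjson(sample))
```

Because the function accepts any iterable of text lines, the same loop should work over a file object or a decoded HTTP response without loading the whole export into memory.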

## 1.1.2 - 2026-05-03

4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -6,6 +6,10 @@ Contributions are accepted under the **Apache License, Version 2.0** (see **`LIC

Human and AI contributors: follow **[AGENTS.md](AGENTS.md)** (full rules). For a short index, see **[CLAUDE.md](CLAUDE.md)**. In **Cursor**, the project rule **[`.cursor/rules/flightdeck-ci-artifacts.mdc`](.cursor/rules/flightdeck-ci-artifacts.mdc)** (`alwaysApply`) summarizes the **web `static/`** and **`schemas/`** drift gates CI enforces.

## Who we are building for

The product ICP is **platform or ML engineering teams** (often about **5–30** people) at **Series B+**-style companies shipping **at least two** **LLM-backed agents** to production—teams that have already been burned by a **cost spike** or **quality regression** tied to a **prompt** or **model** change. Contributions should shorten their path to **versioned releases**, **ingested evidence**, **economic diffs**, and **policy-gated promote**—not broaden scope into orchestration or hosted tracing (see **[AGENTS.md](AGENTS.md)** non-goals).

## Local Setup

Recommended (**[uv](https://docs.astral.sh/uv/)** — see **`DEVELOPMENT.md`**):
4 changes: 2 additions & 2 deletions DEVELOPMENT.md
@@ -85,7 +85,7 @@ Full command flags and exit codes: [README.md](https://github.com/flightdeckdev/
`flightdeck-quickstart-verify` (entry point for `src/flightdeck/quickstart_smoke.py`) runs the full
quickstart workflow end-to-end in an isolated temp directory:

1. `flightdeck init`
1. `flightdeck init` (bundled OpenAI / Anthropic / Google snapshot + catalog; additive with the imports below)
2. Import both pricing tables from `examples/quickstart/`
3. `flightdeck policy set`
4. Register baseline and candidate releases — capture the `release_id` printed to stdout
@@ -143,7 +143,7 @@ Merging to **`main` does not publish packages** — PyPI uploads are **tag-drive
1. **PyPI:** add a **trusted publisher** for **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** — workflow **`release-pypi.yml`**. If PyPI offers **Environment name: (Any)**, you can still use a GitHub **Environment** named **`pypi`** for approval gates; otherwise match whatever you register on PyPI ([trusted publishers](https://docs.pypi.org/trusted-publishers/)).
2. **GitHub:** Settings → **Environments** → create **`pypi`** (optional: required reviewers / wait timer before OIDC publish).
3. Bump **`version`** in **`pyproject.toml`** and **`src/flightdeck/__init__.py`**, update **`CHANGELOG.md`**, merge to **`main`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.1.2`**) then **`git push origin vX.Y.Z`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.2.0`**) then **`git push origin vX.Y.Z`**.

The workflow runs **ruff**, **pytest**, schema drift, **`uv build`**, publishes **sdist + wheel** to **PyPI** via **OIDC** (no long-lived API token in repo secrets), enables **publish attestations**, and creates a **GitHub Release** with generated notes and **`dist/*`** assets.

20 changes: 14 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

**Ship AI agents safely with release diffs, runtime evidence, and policy gates.**

FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI). It is not an agent framework, prompt IDE, tracing dashboard, or gateway — it is where **what shipped**, **what ran**, **what it cost**, and **whether promote is allowed** are recorded and compared.
FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI): run evidence, pricing tables, and the ledger **stay on disk in your environment** by default—**no trace or billing payload is sent to FlightDeck as a vendor**. That posture matters for **regulated**, **air-gapped**, and **data-sovereignty** teams that cannot ship telemetry to a third-party SaaS observability backend. It is not an agent framework, prompt IDE, tracing dashboard, or gateway — it is where **what shipped**, **what ran**, **what it cost**, and **whether promote is allowed** are recorded and compared.

## In ~20 seconds

@@ -13,12 +13,14 @@ FlightDeck is **local-first** (CLI + SQLite + optional **`flightdeck serve`** UI

## Example outcome

You ship a candidate whose **system prompt drifts by a handful of tokens**; under your imported tariffs the diff shows **cost per run up ~31%** while policy caps spend. **`flightdeck release promote`** (or the HTTP promote path) **stays blocked** until you change the model, relax policy with intent, or widen evidence — not because CI is slow, but because the **governed ledger** says no.
You ship a candidate whose **system prompt drifts by a handful of tokens**; under your tariffs the diff shows **cost per run up ~31%** while policy caps spend. **`flightdeck release promote`** (or the HTTP promote path) **stays blocked** until you change the model, relax policy with intent, or widen evidence — not because CI is slow, but because the **governed ledger** says no. (The **~31%** story uses the **two custom pricing YAMLs** in **[examples/quickstart/](examples/quickstart/)**; **`flightdeck init`** alone seeds a **bundled snapshot** so your **first** cost-aware diff does not start from an empty pricing ledger.)
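
The arithmetic behind a "cost per run up ~31%" signal is a relative change compared against a policy cap. In the sketch below every number and the `max_increase` cap are hypothetical; FlightDeck's actual policy fields and diff computation are not shown in this README excerpt.

```python
def relative_change(baseline, candidate):
    """Fractional change of candidate relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


baseline_cost = 0.0200   # hypothetical USD per run for the baseline release
candidate_cost = 0.0262  # hypothetical USD per run for the candidate release
max_increase = 0.10      # hypothetical policy cap: at most +10% cost per run

change = relative_change(baseline_cost, candidate_cost)
promote_allowed = change <= max_increase
print(f"cost per run change: {change:+.0%}; promote allowed: {promote_allowed}")
```

With these inputs the change is about 31%, which exceeds the assumed 10% cap, so the promote decision comes out as blocked.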

## Who should use this?

- **Primary buyer / ICP:** **Platform or ML engineering teams** (often **5–30** people) at **growth-stage** companies shipping **two or more** **LLM agents** to production—especially teams that already had a **cost** or **regression** incident from a **prompt** or **model** change and need a **governed** promote path.
- Teams that **version agent builds** (prompts, tools, model pins) and need a **durable audit trail**.
- Engineers who want **one command** to answer “is this candidate safe to roll forward?” with **numbers**, not gut feel.
- **Healthcare, fintech, and enterprise** operators who **cannot** default to sending traces or cost data to a **hosted** observability vendor—**local-first** evidence and pricing imports are the default integration model.
- Anyone who has outgrown **ad hoc** folder diffs or **spreadsheet** promote checklists.

## How FlightDeck fits your stack
@@ -53,6 +55,7 @@ flowchart LR
| **Primary job** | **Release + promote governance** for agents (ledger, diff, policy) | Tracing, sessions, evals, LLM observability | ML / model observability and monitoring | Source control and generic pipelines |
| **Immutable release artifact** | Yes (`release.yaml` + checksum) | No | No | Only if you build it |
| **Evidence + cost/latency diff** | Yes (runs + pricing tables / optional catalog) | Different lens (trace-level) | Different lens | DIY |
| **Default data residency** | **On your machine** (CLI / SQLite / local HTTP) | Typically SaaS-hosted | Cloud offerings | Your repo |
| **Policy gate on promote** | First-class | No | No | DIY |

**Try the UI:** run **`flightdeck serve`**, then open **http://127.0.0.1:8765/** — Overview, Diff, and Actions (see [docs/web-ui.md](docs/web-ui.md)).
@@ -61,7 +64,7 @@ flowchart LR

Small prompt or model changes can silently move **cost**, **latency**, and **error rate**. FlightDeck turns those moves into **explicit promote decisions** backed by ingested runs — before production pointers advance.

**Current local spine:** versioned **`release.yaml`** + checksums · **`RunEvent`** ingest (JSONL or arrays) · immutable **pricing** imports · **`flightdeck release diff`** · policy-gated **`release promote`** / rollback · full **audit history**.
**Current local spine:** versioned **`release.yaml`** + checksums · **`RunEvent`** ingest (JSONL or arrays) · **bundled default pricing** on **`flightdeck init`** (plus optional **`pricing import`**) · **`flightdeck release diff`** · policy-gated **`release promote`** / rollback · full **audit history**.

## Status

@@ -70,9 +73,11 @@ FlightDeck is **local-first** and ships as a Python CLI backed by SQLite.
**v1.0.0** froze **SemVer-stable public contracts** for the documented CLI, committed **`schemas/v1/`**,
and **`POST /v1/events`** with **`api_version` `v1`**. **v1.1.x** adds catalog-aware diffs, approval flows, and forensics slices (optional pricing catalog on diffs,
promotion request/confirm, read-only runs listing, **`GET /v1/workspace`** for UI and automation, Helm/fleet examples)
without breaking those v1.0 shapes. See **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.
without breaking those v1.0 shapes. **v1.2.0** raises the Python floor to **3.11+**, tightens **Bearer** gating for **`POST /v1/events`** and **`GET /v1/*`** when **`FLIGHTDECK_LOCAL_API_TOKEN`** is set, adds optional **PostgreSQL**, **bundled default pricing** on **`flightdeck init`**, and experimental **`flightdeck.integrations`**. See **[RELEASE_NOTES.md](RELEASE_NOTES.md)** and **[CHANGELOG.md](CHANGELOG.md)**.
The product scope is still intentionally narrow (release governance, not a hosted agent platform).
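
A client can decide whether reads need the Bearer header from the **`read_auth`** value (`open` vs `bearer`) that the changelog describes on **`GET /health`**. This is a sketch under assumptions: it treats `read_auth` as a top-level key of the parsed `/health` body and falls back to `open` when the key is absent; neither detail is confirmed by this excerpt, and `read_headers` is a hypothetical helper.

```python
import json


def read_headers(health, token):
    """Headers to send on GET /v1/* given a parsed /health body."""
    mode = health.get("read_auth", "open")  # assumed default when the key is missing
    if mode == "bearer":
        if not token:
            raise ValueError("server requires a Bearer token for reads")
        return {"Authorization": f"Bearer {token}"}
    return {}


# Stand-ins for two possible /health responses.
open_health = json.loads('{"read_auth": "open"}')
bearer_health = json.loads('{"read_auth": "bearer"}')

open_headers = read_headers(open_health, token=None)
bearer_headers = read_headers(bearer_health, token="example-token")
```

Failing early when the server reports `bearer` and no token is configured gives a clearer error than a rejected request later.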

**Maintenance and sustainability:** the project is **Apache-2.0** with **no required commercial license**. If FlightDeck matters to your production stack, use **[SUPPORT.md](SUPPORT.md)** for security, commercial, and sponsorship pointers, and the **Sponsor** affordance on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** when it is enabled—signals like that answer “what happens if maintenance stops?” more credibly than roadmap prose alone.

Not implemented yet:

- hosted control plane
@@ -123,10 +128,12 @@ Or use the bash wrapper (Git Bash / WSL on Windows):
./scripts/smoke.sh
```

Or walk through the core commands:
**Bundled pricing (default `init`):** **`flightdeck init`** migrates the ledger, imports **OpenAI**, **Anthropic**, and **Google** (Gemini-class) tables at **`pricing_version` `flightdeck-bundled-2026-05`**, and writes **`.flightdeck/pricing-catalog.yaml`** with **`pricing_catalog_path`** set in **`flightdeck.yaml`**. In **`release.yaml`**, set **`spec.pricing_reference`** to `{ provider: openai | anthropic | google, pricing_version: flightdeck-bundled-2026-05 }` to get **per-table** and **catalog** cost lines on diffs without authoring YAML. These rates are a **convenience snapshot**, not live vendor billing—**`flightdeck pricing import`** your own files for production. Use **`flightdeck init --no-bundled-pricing`** for an empty ledger.
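
A small check that a **`spec.pricing_reference`** mapping points at one of the bundled tables, using only the provider names and **`pricing_version`** quoted in the paragraph above. The function name and the dict layout standing in for a parsed **`release.yaml`** are illustrative assumptions; the real file may carry more fields.

```python
BUNDLED_VERSION = "flightdeck-bundled-2026-05"
BUNDLED_PROVIDERS = {"openai", "anthropic", "google"}


def is_bundled_reference(ref):
    """True when a pricing_reference mapping names a bundled provider and version."""
    return (
        ref.get("provider") in BUNDLED_PROVIDERS
        and ref.get("pricing_version") == BUNDLED_VERSION
    )


# Stand-in for the spec section of a parsed release.yaml.
release_spec = {
    "pricing_reference": {"provider": "anthropic", "pricing_version": BUNDLED_VERSION}
}
uses_bundled = is_bundled_reference(release_spec["pricing_reference"])
```

A reference that fails this check would need a matching table from **`flightdeck pricing import`**, which is the production path the paragraph recommends anyway.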

Or walk through the **full quickstart** (policy + **two** custom tariffs for the **~31%** narrative—same flow CI runs):

```bash
flightdeck init
flightdeck init # omit --no-bundled-pricing; bundled tables are additive with the imports below
flightdeck pricing import examples/quickstart/pricing-baseline.yaml
flightdeck pricing import examples/quickstart/pricing-candidate.yaml
flightdeck policy set examples/quickstart/policy.yaml
@@ -166,6 +173,7 @@ Substitute them before ingestion, or run **`uv run flightdeck-quickstart-verify`
- [Development](DEVELOPMENT.md)
- [Contributing](CONTRIBUTING.md)
- [Security](SECURITY.md)
- [Support and sustainability](SUPPORT.md)
- [CLAUDE.md](CLAUDE.md) and [AGENTS.md](AGENTS.md)

## Development
Expand Down