2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
10 changes: 5 additions & 5 deletions docs/concepts/llms.md
@@ -196,7 +196,7 @@ response post-receive against the supplied schema; strict is a
wire-level optimization, not a correctness requirement.

`strict_mode_supported(schema)` (exported from `openarmature.llm`)
performs the deep recursive check. The heuristic is conservative
performs the deep recursive check. The heuristic is conservative:
anything not on the list below trips to `strict: false`:

- Top-level schema is `type: "object"`.
@@ -240,7 +240,7 @@ A text block is the array-form equivalent of a text-string message:
text block is normatively equivalent to one with `content="describe
this"`.

An image block carries one source URL or inline base64 plus an
An image block carries one source (URL or inline base64) plus an
optional `detail` hint:

```python
@@ -302,7 +302,7 @@ fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
which **omits the field from the wire** and lets the provider apply
its own default (conceptually `"auto"`). Setting `detail="auto"`
explicitly on the spec block forces the wire to carry an explicit
`"auto"`usually unnecessary, since the provider's default is the
`"auto"`, usually unnecessary since the provider's default is the
same value.

### When the model can't handle the block
@@ -324,12 +324,12 @@ provider on this category) compose cleanly against it.
"audio", "video") and `reason` (the provider's human-readable
message) when those are recoverable from the rejection.

`OpenAIProvider` detects content rejection via the response body
`OpenAIProvider` detects content rejection via the response body:
HTTP 400 with an error code like `image_content_not_supported` or a
message like "does not support image inputs." Pre-send capability
checks (failing fast before the wire trip when you know the model
doesn't support images) live above the provider as userland
middleware the provider doesn't ship a static model-capability
middleware; the provider doesn't ship a static model-capability
catalog.

## Routing on parsed fields
278 changes: 278 additions & 0 deletions docs/concepts/prompts.md
@@ -0,0 +1,278 @@
# Prompts

Named, versioned, content-addressed prompts. OpenArmature's
prompt-management capability separates *fetching* a template
from *rendering* it, lets you compose multiple backends with
explicit fallback, and propagates prompt identity to your
observability backend so trace UIs can pivot on the prompt
that produced a call.

Skip ahead to [a minimal example](#a-minimal-example) if you
want code first.

## The two halves: fetch and render

A `PromptBackend` knows how to find a template by `name` and
`label`; nothing more. A `PromptManager` composes one or more
backends and adds rendering on top:

```python
from openarmature.prompts import PromptManager, FilesystemPromptBackend

manager = PromptManager(FilesystemPromptBackend("./prompts"))

# Fetch returns a Prompt (the raw template + identity metadata).
prompt = await manager.fetch("greeting", "production")

# Render applies variables and returns a PromptResult (the
# rendered messages plus a content-addressed identity).
result = manager.render(prompt, {"user": "Alice"})

# Or do both in one shot:
result = await manager.get("greeting", "production", {"user": "Alice"})
```

Why two operations instead of one? Three reasons:

- **Inspect templates without binding variables.** Schema
validation, prompt diffing, tooling that walks the prompt
catalogue.
- **Cache templates separately from rendered output.** The
fetch step is the I/O step; rendering is pure local
computation.
- **Render the same template with different variables in
tight loops.** Map-reduce over chunks, batch evaluation,
fan-out fixtures.

The convenience `get()` operation gives you the single-call
shape when you want it, without giving up the separation.

## Prompt identity

Every `Prompt` carries five identity fields:

- `name`: your stable identifier (`"greeting"`).
- `version`: the backend's version string. Implementation-defined:
a backend MAY use semver, monotonic integers, content
hashes, git short-SHAs, or any stable identifier. The
filesystem backend derives it from the template content
hash.
- `label`: the slot the prompt was fetched from
(`"production"`, `"latest"`, `"variant-a"`). The label is
part of the query.
- `template_hash`: SHA-256 of the raw template source.
Two prompts with different content always have different
hashes.
- `fetched_at`: when the prompt was fetched. Cached
backends preserve the original fetch time, not the
cache-hit time.

The `name + version + label` triple identifies the prompt;
the `template_hash` lets you tell two prompts apart by
*content*, which matters when a vendor backend serves
different content under the same `latest` label over time.

A `PromptResult` propagates all of those, plus:

- `rendered_hash`: SHA-256 over the rendered messages.
Same template + same variables → same hash. This is the
cache-key value a memoization layer wants.
- `messages`: the rendered output as an LLM-ready
`list[Message]`. Directly consumable by
`Provider.complete()`.
- `variables`: what was applied. Audit-trail friendly.
- `rendered_at`: when the render happened. Distinct from
`fetched_at`.
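
`Prompt` and `PromptResult` are plain Pydantic models, so the
identity fields are ordinary attributes. A sketch of pulling
them for an audit log:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})

# Identity travels with the result; no extra lookups needed.
print(result.name, result.version, result.label)
print(result.template_hash[:12], result.rendered_hash[:12])
print(result.fetched_at, result.rendered_at)
```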

## Strict variables by default

A template that references a variable not in the mapping
raises `PromptRenderError`:

```python
prompt = await manager.fetch("greeting", "production") # "Hello, {{ user }}! Today is {{ day }}."
manager.render(prompt, {"user": "Alice"}) # raises: "day" is undefined
```

This is intentional. Silently substituting empty strings for
missing variables masks bugs: a typo'd variable name produces
a working-but-wrong prompt, often invisibly. If you need
lenient behavior, wrap your variables in your own defaulting
layer before passing them to `render()`.
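
For example, a defaulting layer can be an explicit dict merge in
your own code (a sketch, not an openarmature feature):

```python
# Merge visible defaults under the caller's variables, then render strictly.
defaults = {"day": "unspecified"}
result = manager.render(prompt, {**defaults, "user": "Alice"})
```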

The Python implementation uses Jinja2's `StrictUndefined`.

## Composite backends and fallback

A manager constructed with multiple backends consults them in
order. The fallback rule distinguishes infrastructure failure
from logical absence:

```python
from openarmature.prompts import FilesystemPromptBackend, PromptManager
from openarmature_langfuse import LangfusePromptBackend # hypothetical sibling

manager = PromptManager(
    LangfusePromptBackend(api_key=...),
    FilesystemPromptBackend("./prompts"),  # local fallback
)
```

- **`PromptStoreUnavailable` from a backend → try the next.**
Network's down, vendor API is 5xx-ing, filesystem hiccupped,
so the manager falls back. This is the "Langfuse is degraded,
use the local copy" case.
- **`PromptNotFound` from a backend → STOP the chain.** The
error propagates. This is the "operator deliberately deleted
the prompt from Langfuse to retire it" case; falling back here
would silently resurface a stale local copy under a name the
operator wanted gone.
- **All backends `PromptStoreUnavailable` → manager raises
`PromptStoreUnavailable`.** Everything's down.

The two error categories have different operational
meanings; the manager keeps them separated.
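
A minimal sketch of that rule, assuming a backend exposes an
async `fetch(name, label)` (the exact `PromptBackend` method
signature may differ):

```python
from openarmature.prompts import PromptStoreUnavailable

async def fetch_with_fallback(backends, name, label):
    last_exc = None
    for backend in backends:
        try:
            return await backend.fetch(name, label)
        except PromptStoreUnavailable as exc:
            # Infrastructure failure: remember it and try the next backend.
            # PromptNotFound is deliberately not caught; logical absence
            # propagates immediately and stops the chain.
            last_exc = exc
    # Every backend reported infrastructure failure.
    raise last_exc or PromptStoreUnavailable("no prompt backend reachable")
```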

## Errors

Three categories cover every failure mode:

| Error | When | Transient |
| ------------------------- | ------------------------------------------------------------------- | --------- |
| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after §8 rules) | No |
| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No |
| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes |

`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
retry-middleware classifiers, matching the pattern
`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.
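
A retry classifier built on it is a type-membership check (a
sketch; assumes the frozenset holds the error classes
themselves):

```python
from openarmature.prompts import PROMPT_TRANSIENT_CATEGORIES

def is_transient(exc: Exception) -> bool:
    # True only for PromptStoreUnavailable today; iterating the frozenset
    # keeps the classifier stable if transient categories are added later.
    return any(isinstance(exc, category) for category in PROMPT_TRANSIENT_CATEGORIES)
```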

## PromptGroup: tracing related prompts together

A `PromptGroup` is a structural grouping of two or more
`PromptResult` instances under a stable `group_name`. The
group itself doesn't execute anything; it gives observability
a shared name to render related calls under.

```python
from openarmature.prompts import PromptGroup, with_active_prompt_group

classify = await manager.get("classify", variables={"input": user_query})
answer = await manager.get("answer", variables={"input": user_query, ...})

group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
with with_active_prompt_group(group):
    # Every LLM call in this scope carries
    # openarmature.prompt.group_name="classifier_chain".
    classification = await provider.complete(classify.messages, ...)
    final = await provider.complete(answer.messages, ...)
```

Canonical patterns the primitive covers:

- **Multi-stage classification**: `[coarse, fine, answer]`.
- **RAG with reranking**: `[query_rewrite, retrieve, rerank, answer]`.
- **Self-correction loops**: `[generate, critique, revise]`.
- **Map-reduce over chunks**: `[chunk_classify_1..N, synthesize]`.

The N=2 case ("classifier + follow-up") is the simplest;
larger groups work under the same primitive. The group rejects
empty and single-member shapes; single-prompt tagging is
already served by the per-prompt observability attributes
below.

## Observability propagation

When an LLM call fires inside `with_active_prompt(result)` (or
`with_active_prompt_group(group)`), the OTel observer surfaces
six normative attributes on the `openarmature.llm.complete`
span:

- `openarmature.prompt.name`
- `openarmature.prompt.version`
- `openarmature.prompt.label`
- `openarmature.prompt.template_hash`
- `openarmature.prompt.rendered_hash`
- `openarmature.prompt.group_name`

Pattern:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})
with with_active_prompt(result):
    response = await provider.complete(result.messages, ...)
```

Trace UIs can then pivot on `prompt.name`, filter on
`prompt.template_hash` to find every call that used a given
template version, or surface `prompt.group_name` to group
related calls into a single workflow view.

Nesting is innermost-wins. If you activate a result inside
another active result, the inner one wins for the duration
of the inner block.
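
A sketch of the nesting rule (the result names are illustrative):

```python
with with_active_prompt(router_result):
    with with_active_prompt(answer_result):
        # Spans emitted here carry answer_result's identity.
        await provider.complete(answer_result.messages, ...)
    # Back outside the inner block, router_result's identity applies again.
    await provider.complete(router_result.messages, ...)
```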

## Determinism and content-addressed caching

`render` is deterministic: same `Prompt`, same `variables` →
bytewise-identical `messages` and `rendered_hash` across
calls. This is the cache-key contract: `rendered_hash`
gives a downstream memoization layer the right equivalence
relation for free.
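
A sketch of what that buys (user code, not an openarmature API):
a dict keyed on `rendered_hash` that skips the provider call for
a repeat render:

```python
_cache: dict[str, object] = {}

async def complete_cached(provider, result, **kwargs):
    # Same Prompt + same variables → same rendered_hash → cache hit.
    if result.rendered_hash not in _cache:
        _cache[result.rendered_hash] = await provider.complete(result.messages, **kwargs)
    return _cache[result.rendered_hash]
```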

Templates MAY reference user-supplied variables that capture
nondeterministic values (`now=datetime.utcnow()`); the
determinism contract applies to the render operation given
fixed inputs, not to user-supplied variable content.

## A minimal example

```python
import asyncio
from pathlib import Path

from openarmature.prompts import FilesystemPromptBackend, PromptManager


async def main() -> None:
    manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
    result = await manager.get(
        "greeting",
        "production",
        variables={"user": "Alice"},
    )
    print(result.messages[0].content)  # rendered text
    print(result.rendered_hash)  # cache key


asyncio.run(main())
```

The filesystem backend layout is
`<root>/<label>/<name>.j2`; for the example above,
`./prompts/production/greeting.j2`.

## What's out of scope (for now)

- **Specific vendor backends**: Langfuse, PromptLayer, etc.,
ship as sibling packages (`openarmature-langfuse`, …). The
core ships the protocol + a filesystem reference.
- **Prompt versioning workflows**: how versions are assigned,
promoted, pinned. Per project. The spec defines the
`version` field; the discipline is yours.
- **Cache invalidation policies**: `template_hash` and
`rendered_hash` are the keys; the cache itself is a
separate concern.
- **Prompt linting / evaluation**: quality checks belong to
separate tools (or the future eval capability).
- **Multi-message render decomposition**: v1 emits a single
`UserMessage` carrying the rendered text. If you need
`system + user` splits, construct the messages list
manually outside `render()` for now (see the sketch just below).
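
A sketch of that manual composition, assuming a `SystemMessage`
class lives alongside `UserMessage` (check `openarmature.llm`
for the actual name and import path):

```python
from openarmature.llm import SystemMessage  # assumed import path

result = await manager.get("greeting", "production", {"user": "Alice"})
messages = [SystemMessage(content="Answer in one sentence."), *result.messages]
response = await provider.complete(messages, ...)
```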

## Where to next

- **[Model Providers](../model-providers/index.md)**:
what to pass `result.messages` into.
- **[API reference: `openarmature.prompts`](../reference/prompts.md)**:
the full public surface.
6 changes: 3 additions & 3 deletions docs/model-providers/authoring.md
@@ -69,7 +69,7 @@ class MyProvider:
response_schema: dict[str, Any] | type[BaseModel] | None = None,
) -> Response:
# response_schema is part of the Protocol; a skeleton provider
# MUST NOT silently ignore it callers expect either
# MUST NOT silently ignore it: callers expect either
# Response.parsed populated or a StructuredOutputInvalid raise.
# Until the wire path is implemented, raise
# ProviderInvalidRequest when response_schema is set. A
@@ -206,8 +206,8 @@ of:
`ImageSourceInline`) are stable across providers; only the wire
shape differs. Provider authors targeting non-multimodal models
MUST surface `ProviderUnsupportedContentBlock` when the request
carries blocks the bound model can't serve pre-send or
post-receive per §7.
carries blocks the bound model can't serve (pre-send or
post-receive per §7).
- **Structured output.** Threading `response_schema` through the
request body (native `response_format` if the underlying wire
supports it; prompt-augmentation fallback otherwise) and validating
2 changes: 1 addition & 1 deletion docs/model-providers/index.md
@@ -89,7 +89,7 @@ in the LLMs concept page for the multimodal contract; see

`OpenAIProvider` detects unsupported-content-block rejections via
the response body (HTTP 400 with an error code or message indicating
content rejection) a post-receive mapping rather than a static
content rejection): a post-receive mapping rather than a static
pre-send capability check. Pre-send protection is a userland
middleware pattern when callers know the bound model's capabilities
up front.
7 changes: 7 additions & 0 deletions docs/reference/prompts.md
@@ -0,0 +1,7 @@
# openarmature.prompts

::: openarmature.prompts
options:
show_root_heading: false
show_source: false
heading_level: 2
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -95,6 +95,7 @@ nav:
- Composition: concepts/composition.md
- Fan-out: concepts/fan-out.md
- LLMs: concepts/llms.md
- Prompts: concepts/prompts.md
- Observability: concepts/observability.md
- Checkpointing: concepts/checkpointing.md
- Model Providers:
@@ -104,6 +105,7 @@
- reference/index.md
- openarmature.graph: reference/graph.md
- openarmature.llm: reference/llm.md
- openarmature.prompts: reference/prompts.md
- openarmature.checkpoint: reference/checkpoint.md
- openarmature.observability: reference/observability.md

1 change: 1 addition & 0 deletions pyproject.toml
@@ -25,6 +25,7 @@ dependencies = [
"pydantic>=2.7",
"httpx>=0.27",
"jsonschema>=4.0",
"jinja2>=3.1",
]

[project.optional-dependencies]