2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
10 changes: 5 additions & 5 deletions docs/concepts/llms.md
@@ -196,7 +196,7 @@ response post-receive against the supplied schema; strict is a
wire-level optimization, not a correctness requirement.

`strict_mode_supported(schema)` (exported from `openarmature.llm`)
performs the deep recursive check. The heuristic is conservative
performs the deep recursive check. The heuristic is conservative:
anything not on the list below trips to `strict: false`:

- Top-level schema is `type: "object"`.
@@ -240,7 +240,7 @@ A text block is the array-form equivalent of a text-string message:
text block is normatively equivalent to one with `content="describe
this"`.

An image block carries one source URL or inline base64 plus an
An image block carries one source (URL or inline base64) plus an
optional `detail` hint:

```python
@@ -302,7 +302,7 @@ fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
which **omits the field from the wire** and lets the provider apply
its own default (conceptually `"auto"`). Setting `detail="auto"`
explicitly on the spec block forces the wire to carry an explicit
`"auto"`usually unnecessary, since the provider's default is the
`"auto"`, usually unnecessary since the provider's default is the
same value.

### When the model can't handle the block
@@ -324,12 +324,12 @@ provider on this category) compose cleanly against it.
"audio", "video") and `reason` (the provider's human-readable
message) when those are recoverable from the rejection.

`OpenAIProvider` detects content rejection via the response body
`OpenAIProvider` detects content rejection via the response body:
HTTP 400 with an error code like `image_content_not_supported` or a
message like "does not support image inputs." Pre-send capability
checks (failing fast before the wire trip when you know the model
doesn't support images) live above the provider as userland
middleware the provider doesn't ship a static model-capability
middleware; the provider doesn't ship a static model-capability
catalog.

## Routing on parsed fields
278 changes: 278 additions & 0 deletions docs/concepts/prompts.md
@@ -0,0 +1,278 @@
# Prompts

Named, versioned, content-addressed prompts. OpenArmature's
prompt-management capability separates *fetching* a template
from *rendering* it, lets you compose multiple backends with
explicit fallback, and propagates prompt identity to your
observability backend so trace UIs can pivot on the prompt
that produced a call.

Skip ahead to [a minimal example](#a-minimal-example) if you
want code first.

## The two halves: fetch and render

A `PromptBackend` knows how to find a template by `name` and
`label`; nothing more. A `PromptManager` composes one or more
backends and adds rendering on top:

```python
from openarmature.prompts import PromptManager, FilesystemPromptBackend

manager = PromptManager(FilesystemPromptBackend("./prompts"))

# Fetch returns a Prompt (the raw template + identity metadata).
prompt = await manager.fetch("greeting", "production")

# Render applies variables and returns a PromptResult (the
# rendered messages plus a content-addressed identity).
result = manager.render(prompt, {"user": "Alice"})

# Or do both in one shot:
result = await manager.get("greeting", "production", {"user": "Alice"})
```

Why two operations instead of one? Three reasons:

- **Inspect templates without binding variables.** Schema
validation, prompt diffing, tooling that walks the prompt
catalogue.
- **Cache templates separately from rendered output.** The
fetch step is the I/O step; rendering is pure local
computation.
- **Render the same template with different variables in
tight loops.** Map-reduce over chunks, batch evaluation,
fan-out fixtures.

The convenience `get()` operation gives you the single-call
shape when you want it, without giving up the separation.

## Prompt identity

Every `Prompt` carries five identity fields:

- `name`: your stable identifier (`"greeting"`).
- `version`: the backend's version string. Implementation-defined:
a backend MAY use semver, monotonic integers, content
hashes, git short-SHAs, or any stable identifier. The
filesystem backend derives it from the template content
hash.
- `label`: the slot the prompt was fetched from
(`"production"`, `"latest"`, `"variant-a"`). The label is
part of the query.
- `template_hash`: SHA-256 of the raw template source.
Two prompts with different content always have different
hashes.
- `fetched_at`: when the prompt was fetched. Cached
backends preserve the original fetch time, not the
cache-hit time.

The `name + version + label` triple identifies the prompt;
the `template_hash` lets you tell two prompts apart by
*content*, which matters when a vendor backend serves
different content under the same `latest` label over time.

A `PromptResult` propagates all of those, plus:

- `rendered_hash`: SHA-256 over the rendered messages.
Same template + same variables → same hash. This is the
cache-key value a memoization layer wants.
- `messages`: the rendered output as an LLM-ready
`list[Message]`. Directly consumable by
`Provider.complete()`.
- `variables`: what was applied. Audit-trail friendly.
- `rendered_at`: when the render happened. Distinct from
`fetched_at`.
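
`Prompt` and `PromptResult` are plain Pydantic models, so the
identity fields are ordinary attributes. A sketch of pulling
them for an audit log:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})

# Identity travels with the result; no extra lookups needed.
print(result.name, result.version, result.label)
print(result.template_hash[:12], result.rendered_hash[:12])
print(result.fetched_at, result.rendered_at)
```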

## Strict variables by default

A template that references a variable not in the mapping
raises `PromptRenderError`:

```python
prompt = await manager.fetch("greeting", "production") # "Hello, {{ user }}! Today is {{ day }}."
manager.render(prompt, {"user": "Alice"}) # raises: "day" is undefined
```

This is intentional. Silently substituting empty strings for
missing variables masks bugs: a typo'd variable name produces
a working-but-wrong prompt, often invisibly. If you need
lenient behavior, wrap your variables in your own defaulting
layer before passing them to `render()`.
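
For example, a defaulting layer can be an explicit dict merge in
your own code (a sketch, not an openarmature feature):

```python
# Merge visible defaults under the caller's variables, then render strictly.
defaults = {"day": "unspecified"}
result = manager.render(prompt, {**defaults, "user": "Alice"})
```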

The Python implementation uses Jinja2's `StrictUndefined`.

## Composite backends and fallback

A manager constructed with multiple backends consults them in
order. The fallback rule distinguishes infrastructure failure
from logical absence:

```python
from openarmature.prompts import FilesystemPromptBackend, PromptManager
from openarmature_langfuse import LangfusePromptBackend # hypothetical sibling

manager = PromptManager(
    LangfusePromptBackend(api_key=...),
    FilesystemPromptBackend("./prompts"),  # local fallback
)
```

- **`PromptStoreUnavailable` from a backend → try the next.**
Network's down, vendor API is 5xx-ing, filesystem hiccupped,
so the manager falls back. This is the "Langfuse is degraded,
use the local copy" case.
- **`PromptNotFound` from a backend → STOP the chain.** The
error propagates. This is the "operator deliberately deleted
the prompt from Langfuse to retire it" case; falling back here
would silently resurface a stale local copy under a name the
operator wanted gone.
- **All backends `PromptStoreUnavailable` → manager raises
`PromptStoreUnavailable`.** Everything's down.

The two error categories have different operational
meanings; the manager keeps them separated.
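
A minimal sketch of that rule, assuming a backend exposes an
async `fetch(name, label)` (the exact `PromptBackend` method
signature may differ):

```python
from openarmature.prompts import PromptStoreUnavailable

async def fetch_with_fallback(backends, name, label):
    last_exc = None
    for backend in backends:
        try:
            return await backend.fetch(name, label)
        except PromptStoreUnavailable as exc:
            # Infrastructure failure: remember it and try the next backend.
            # PromptNotFound is deliberately not caught; logical absence
            # propagates immediately and stops the chain.
            last_exc = exc
    # Every backend reported infrastructure failure.
    raise last_exc or PromptStoreUnavailable("no prompt backend reachable")
```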

## Errors

Three categories cover every failure mode:

| Error | When | Transient |
| ------------------------- | ------------------------------------------------------------------- | --------- |
| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after §8 rules) | No |
| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No |
| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes |

`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
retry-middleware classifiers, matching the pattern
`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.
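
A retry classifier built on it is a type-membership check (a
sketch; assumes the frozenset holds the error classes
themselves):

```python
from openarmature.prompts import PROMPT_TRANSIENT_CATEGORIES

def is_transient(exc: Exception) -> bool:
    # True only for PromptStoreUnavailable today; iterating the frozenset
    # keeps the classifier stable if transient categories are added later.
    return any(isinstance(exc, category) for category in PROMPT_TRANSIENT_CATEGORIES)
```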

## PromptGroup: tracing related prompts together

A `PromptGroup` is a structural grouping of two or more
`PromptResult` instances under a stable `group_name`. The
group itself doesn't execute anything; it gives observability
a shared name to render related calls under.

```python
from openarmature.prompts import PromptGroup, with_active_prompt_group

classify = await manager.get("classify", variables={"input": user_query})
answer = await manager.get("answer", variables={"input": user_query, ...})

group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
with with_active_prompt_group(group):
    # Every LLM call in this scope carries
    # openarmature.prompt.group_name="classifier_chain".
    classification = await provider.complete(classify.messages, ...)
    final = await provider.complete(answer.messages, ...)
```

Canonical patterns the primitive covers:

- **Multi-stage classification**: `[coarse, fine, answer]`.
- **RAG with reranking**: `[query_rewrite, retrieve, rerank, answer]`.
- **Self-correction loops**: `[generate, critique, revise]`.
- **Map-reduce over chunks**: `[chunk_classify_1..N, synthesize]`.

The N=2 case ("classifier + follow-up") is the simplest;
larger groups work under the same primitive. The group rejects
empty and single-member shapes; single-prompt tagging is
already served by the per-prompt observability attributes
below.

## Observability propagation

When an LLM call fires inside `with_active_prompt(result)` (or
`with_active_prompt_group(group)`), the OTel observer surfaces
six normative attributes on the `openarmature.llm.complete`
span:

- `openarmature.prompt.name`
- `openarmature.prompt.version`
- `openarmature.prompt.label`
- `openarmature.prompt.template_hash`
- `openarmature.prompt.rendered_hash`
- `openarmature.prompt.group_name`

Pattern:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})
with with_active_prompt(result):
    response = await provider.complete(result.messages, ...)
```

Trace UIs can then pivot on `prompt.name`, filter on
`prompt.template_hash` to find every call that used a given
template version, or surface `prompt.group_name` to group
related calls into a single workflow view.

Nesting is innermost-wins. If you activate a result inside
another active result, the inner one wins for the duration
of the inner block.
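
A sketch of the nesting rule (the result names are illustrative):

```python
with with_active_prompt(router_result):
    with with_active_prompt(answer_result):
        # Spans emitted here carry answer_result's identity.
        await provider.complete(answer_result.messages, ...)
    # Back outside the inner block, router_result's identity applies again.
    await provider.complete(router_result.messages, ...)
```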

## Determinism and content-addressed caching

`render` is deterministic: same `Prompt`, same `variables` →
bytewise-identical `messages` and `rendered_hash` across
calls. This is the cache-key contract: `rendered_hash`
gives a downstream memoization layer the right equivalence
relation for free.
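
A sketch of what that buys (user code, not an openarmature API):
a dict keyed on `rendered_hash` that skips the provider call for
a repeat render:

```python
_cache: dict[str, object] = {}

async def complete_cached(provider, result, **kwargs):
    # Same Prompt + same variables → same rendered_hash → cache hit.
    if result.rendered_hash not in _cache:
        _cache[result.rendered_hash] = await provider.complete(result.messages, **kwargs)
    return _cache[result.rendered_hash]
```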

Templates MAY reference user-supplied variables that capture
nondeterministic values (`now=datetime.utcnow()`); the
determinism contract applies to the render operation given
fixed inputs, not to user-supplied variable content.

## A minimal example

```python
import asyncio
from pathlib import Path

from openarmature.prompts import FilesystemPromptBackend, PromptManager


async def main() -> None:
    manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
    result = await manager.get(
        "greeting",
        "production",
        variables={"user": "Alice"},
    )
    print(result.messages[0].content)  # rendered text
    print(result.rendered_hash)  # cache key


asyncio.run(main())
```

The filesystem backend layout is
`<root>/<label>/<name>.j2`; for the example above,
`./prompts/production/greeting.j2`.

## What's out of scope (for now)

- **Specific vendor backends**: Langfuse, PromptLayer, etc.,
ship as sibling packages (`openarmature-langfuse`, …). The
core ships the protocol + a filesystem reference.
- **Prompt versioning workflows**: how versions are assigned,
promoted, pinned. Per project. The spec defines the
`version` field; the discipline is yours.
- **Cache invalidation policies**: `template_hash` and
`rendered_hash` are the keys; the cache itself is a
separate concern.
- **Prompt linting / evaluation**: quality checks belong to
separate tools (or the future eval capability).
- **Multi-message render decomposition**: v1 emits a single
`UserMessage` carrying the rendered text. If you need
`system + user` splits, construct the messages list
manually outside `render()` for now (see the sketch just below).
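
A sketch of that manual composition, assuming a `SystemMessage`
class lives alongside `UserMessage` (check `openarmature.llm`
for the actual name and import path):

```python
from openarmature.llm import SystemMessage  # assumed import path

result = await manager.get("greeting", "production", {"user": "Alice"})
messages = [SystemMessage(content="Answer in one sentence."), *result.messages]
response = await provider.complete(messages, ...)
```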

## Where to next

- **[Model Providers](../model-providers/index.md)**:
what to pass `result.messages` into.
- **[API reference: `openarmature.prompts`](../reference/prompts.md)**:
the full public surface.
6 changes: 3 additions & 3 deletions docs/model-providers/authoring.md
@@ -69,7 +69,7 @@ class MyProvider:
response_schema: dict[str, Any] | type[BaseModel] | None = None,
) -> Response:
# response_schema is part of the Protocol; a skeleton provider
# MUST NOT silently ignore it callers expect either
# MUST NOT silently ignore it: callers expect either
# Response.parsed populated or a StructuredOutputInvalid raise.
# Until the wire path is implemented, raise
# ProviderInvalidRequest when response_schema is set. A
@@ -206,8 +206,8 @@ of:
`ImageSourceInline`) are stable across providers; only the wire
shape differs. Provider authors targeting non-multimodal models
MUST surface `ProviderUnsupportedContentBlock` when the request
carries blocks the bound model can't serve pre-send or
post-receive per §7.
carries blocks the bound model can't serve (pre-send or
post-receive per §7).
- **Structured output.** Threading `response_schema` through the
request body (native `response_format` if the underlying wire
supports it; prompt-augmentation fallback otherwise) and validating
2 changes: 1 addition & 1 deletion docs/model-providers/index.md
@@ -89,7 +89,7 @@ in the LLMs concept page for the multimodal contract; see

`OpenAIProvider` detects unsupported-content-block rejections via
the response body (HTTP 400 with an error code or message indicating
content rejection) a post-receive mapping rather than a static
content rejection): a post-receive mapping rather than a static
pre-send capability check. Pre-send protection is a userland
middleware pattern when callers know the bound model's capabilities
up front.
7 changes: 7 additions & 0 deletions docs/reference/prompts.md
@@ -0,0 +1,7 @@
# openarmature.prompts

::: openarmature.prompts
options:
show_root_heading: false
show_source: false
heading_level: 2
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -95,6 +95,7 @@ nav:
- Composition: concepts/composition.md
- Fan-out: concepts/fan-out.md
- LLMs: concepts/llms.md
- Prompts: concepts/prompts.md
- Observability: concepts/observability.md
- Checkpointing: concepts/checkpointing.md
- Model Providers:
@@ -104,6 +105,7 @@
- reference/index.md
- openarmature.graph: reference/graph.md
- openarmature.llm: reference/llm.md
- openarmature.prompts: reference/prompts.md
- openarmature.checkpoint: reference/checkpoint.md
- openarmature.observability: reference/observability.md

1 change: 1 addition & 0 deletions pyproject.toml
@@ -25,6 +25,7 @@ dependencies = [
"pydantic>=2.7",
"httpx>=0.27",
"jsonschema>=4.0",
"jinja2>=3.1",
]

[project.optional-dependencies]