feat(prompts): prompt-management core (proposal 0017) #45
Merged
12 commits
35907f1 feat(prompts): error classes and category constants (chris-colinsky)
fe1a779 feat(prompts): Prompt, PromptResult, PromptGroup types (chris-colinsky)
f448e2a feat(prompts): PromptBackend protocol, PromptManager, jinja2 dep (chris-colinsky)
e5d8f1b feat(prompts): FilesystemPromptBackend + OTel attribute propagation (chris-colinsky)
9bdbcb7 test(conformance): prompt-management harness and 12 fixtures (chris-colinsky)
853b6d5 test(unit): prompts subpackage + OTel attribute propagation (chris-colinsky)
ad2e896 docs: prompts concept page, API reference, changelog (chris-colinsky)
0f458ed fix: CoPilot review pass on PR #45 (chris-colinsky)
f5eba2a docs(prompts): drop em dashes from concept page (chris-colinsky)
85b3561 docs: drop em dashes from llms.md, model-providers/{index,authoring}.md (chris-colinsky)
bcf63a2 fix: CoPilot review round-2 pass on PR #45 (chris-colinsky)
420ce20 fix: CoPilot review round-3 pass on PR #45 (chris-colinsky)

@@ -0,0 +1,278 @@
# Prompts

Named, versioned, content-addressed prompts. OpenArmature's
prompt-management capability separates *fetching* a template
from *rendering* it, lets you compose multiple backends with
explicit fallback, and propagates prompt identity to your
observability backend so trace UIs can pivot on the prompt
that produced a call.

Skip ahead to [a minimal example](#a-minimal-example) if you
want code first.

## The two halves: fetch and render

A `PromptBackend` knows how to find a template by `name` and
`label`; nothing more. A `PromptManager` composes one or more
backends and adds rendering on top:

```python
from openarmature.prompts import PromptManager, FilesystemPromptBackend

manager = PromptManager(FilesystemPromptBackend("./prompts"))

# Fetch returns a Prompt (the raw template + identity metadata).
prompt = await manager.fetch("greeting", "production")

# Render applies variables and returns a PromptResult (the
# rendered messages plus a content-addressed identity).
result = manager.render(prompt, {"user": "Alice"})

# Or do both in one shot:
result = await manager.get("greeting", "production", {"user": "Alice"})
```

Why two operations instead of one? Three reasons:

- **Inspect templates without binding variables.** Schema
  validation, prompt diffing, tooling that walks the prompt
  catalogue.
- **Cache templates separately from rendered output.** The
  fetch step is the I/O step; rendering is pure local
  computation.
- **Render the same template with different variables in
  tight loops.** Map-reduce over chunks, batch evaluation,
  fan-out fixtures.

The convenience `get()` operation gives you the single-call
shape when you want it without removing the separability.
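
That third reason is the pattern worth showing: fetch once, render many.
A minimal sketch (the `chunk_classify` name and the `document_chunks`
iterable are illustrative, not part of the API):

```python
# One I/O round-trip, then pure local renders in a tight loop.
prompt = await manager.fetch("chunk_classify", "production")

results = [
    manager.render(prompt, {"chunk": chunk})  # no await: rendering is local
    for chunk in document_chunks
]
```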

## Prompt identity

Every `Prompt` carries five identity fields:

- `name`: your stable identifier (`"greeting"`).
- `version`: the backend's version string. Implementation-defined:
  a backend MAY use semver, monotonic integers, content
  hashes, git short-SHAs, or any stable identifier. The
  filesystem backend derives it from the template content
  hash.
- `label`: the slot the prompt was fetched from
  (`"production"`, `"latest"`, `"variant-a"`). The label is
  part of the query.
- `template_hash`: SHA-256 of the raw template source.
  Two prompts with different content always have different
  hashes.
- `fetched_at`: when the prompt was fetched. Cached
  backends preserve the original fetch time, not the
  cache-hit time.

The `name + version + label` triple identifies the prompt;
the `template_hash` lets you tell two prompts apart by
*content*, which matters when a vendor backend serves
different content under the same `latest` label over time.

A `PromptResult` propagates all of those, plus:

- `rendered_hash`: SHA-256 over the rendered messages.
  Same template + same variables → same hash. This is the
  cache-key value a memoization layer wants (see the sketch
  after this list).
- `messages`: the rendered output as an LLM-ready
  `list[Message]`. Directly consumable by
  `Provider.complete()`.
- `variables`: what was applied. Audit-trail friendly.
- `rendered_at`: when the render happened. Distinct from
  `fetched_at`.
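
For example, a memoization layer keyed on `rendered_hash` might look
like this sketch (the `memo` dict and `complete_cached` helper are
hypothetical, not part of the API):

```python
# Hypothetical in-process cache keyed on the content-addressed hash.
memo: dict[str, object] = {}

async def complete_cached(result, provider):
    if result.rendered_hash not in memo:
        # Only identical template + identical variables share a key,
        # so two different rendered prompts are never conflated.
        memo[result.rendered_hash] = await provider.complete(result.messages)
    return memo[result.rendered_hash]
```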

## Strict variables by default

A template that references a variable not in the mapping
raises `PromptRenderError`:

```python
prompt = await manager.fetch("greeting", "production")  # "Hello, {{ user }}! Today is {{ day }}."
manager.render(prompt, {"user": "Alice"})  # raises: "day" is undefined
```

This is intentional. Silently substituting empty strings for
missing variables masks bugs: a typo'd variable name produces
a working-but-wrong prompt, often invisibly. If you need
lenient behavior, wrap your variables in your own defaulting
layer before passing them to `render()`.
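
A minimal sketch of such a defaulting layer (the `with_defaults`
helper is yours to write, not part of the API):

```python
def with_defaults(variables: dict, defaults: dict) -> dict:
    # Explicit, auditable defaulting: missing keys are filled before
    # render() ever sees them, so strictness still catches true typos
    # in the template itself.
    return {**defaults, **variables}

result = manager.render(prompt, with_defaults({"user": "Alice"}, {"day": "today"}))
```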

The Python implementation uses Jinja2's `StrictUndefined`.

## Composite backends and fallback

A manager constructed with multiple backends consults them in
order. The fallback rule distinguishes infrastructure failure
from logical absence:

```python
from openarmature.prompts import PromptManager, FilesystemPromptBackend
from openarmature_langfuse import LangfusePromptBackend  # hypothetical sibling

manager = PromptManager(
    LangfusePromptBackend(api_key=...),
    FilesystemPromptBackend("./prompts"),  # local fallback
)
```

- **`PromptStoreUnavailable` from a backend → try the next.**
  The network is down, the vendor API is 5xx-ing, or the
  filesystem hiccupped, so the manager falls back. This is the
  "Langfuse is degraded, use the local copy" case.
- **`PromptNotFound` from a backend → STOP the chain.** The
  error propagates. This is the "operator deliberately deleted
  the prompt from Langfuse to retire it" case; falling back here
  would silently resurface a stale local copy under a name the
  operator wanted gone.
- **All backends `PromptStoreUnavailable` → manager raises
  `PromptStoreUnavailable`.** Everything's down.

The two error categories have different operational
meanings; the manager keeps them separated.
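
The rule is small enough to sketch. A simplified picture of what the
manager's fetch loop does, assuming the backend protocol exposes an
async `fetch(name, label)` (illustrative, not the actual implementation):

```python
from openarmature.prompts import PromptStoreUnavailable

async def fetch_with_fallback(backends, name, label):
    last_error = None
    for backend in backends:
        try:
            return await backend.fetch(name, label)
        except PromptStoreUnavailable as exc:
            last_error = exc  # infrastructure failure: try the next backend
        # PromptNotFound is deliberately NOT caught: logical absence
        # propagates and stops the chain.
    raise last_error  # every backend was unavailable
```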

## Errors

Three categories cover every failure mode:

| Error | When | Transient |
| ------------------------- | ---------------------------------------------------------------------------- | --------- |
| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after the fallback rules above) | No |
| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No |
| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes |

`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
retry-middleware classifiers, matching the pattern
`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.
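
A sketch of such a classifier, assuming prompt errors expose a
`category` attribute (that attribute name is an assumption, not
confirmed API):

```python
from openarmature.prompts import PROMPT_TRANSIENT_CATEGORIES

def is_retryable(error: Exception) -> bool:
    # Assumed shape: prompt errors carry a `category` attribute that
    # can be tested against the exported frozenset.
    category = getattr(error, "category", None)
    return category in PROMPT_TRANSIENT_CATEGORIES
```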

## PromptGroup: tracing related prompts together

A `PromptGroup` is a structural grouping of two or more
`PromptResult` instances under a stable `group_name`. The
group itself doesn't execute anything; it gives observability
a shared name to render related calls under.

```python
from openarmature.prompts import PromptGroup, with_active_prompt_group

classify = await manager.get("classify", variables={"input": user_query})
answer = await manager.get("answer", variables={"input": user_query, ...})

group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
with with_active_prompt_group(group):
    # Every LLM call in this scope carries
    # openarmature.prompt.group_name="classifier_chain".
    classification = await provider.complete(classify.messages, ...)
    final = await provider.complete(answer.messages, ...)
```

Canonical patterns the primitive covers:

- **Multi-stage classification**: `[coarse, fine, answer]`.
- **RAG with reranking**: `[query_rewrite, retrieve, rerank, answer]`.
- **Self-correction loops**: `[generate, critique, revise]`.
- **Map-reduce over chunks**: `[chunk_classify_1..N, synthesize]`.

The N=2 case ("classifier + follow-up") is the simplest;
larger groups work under the same primitive. The group rejects
empty and single-member shapes; single-prompt tagging is
already served by the per-prompt observability attributes
below.
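
For instance, the map-reduce pattern builds one group from N chunk
renders plus a synthesis prompt (the prompt names and variables are
illustrative):

```python
chunk_prompt = await manager.fetch("chunk_classify", "production")
chunk_results = [manager.render(chunk_prompt, {"chunk": c}) for c in chunks]
synthesize = await manager.get("synthesize", "production", variables={"n": len(chunks)})

group = PromptGroup(
    group_name="map_reduce_classify",
    members=[*chunk_results, synthesize],  # N renders + the reduce step
)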

## Observability propagation

When an LLM call fires inside `with_active_prompt(result)` (or
`with_active_prompt_group(group)`), the OTel observer surfaces
six normative attributes on the `openarmature.llm.complete`
span:

- `openarmature.prompt.name`
- `openarmature.prompt.version`
- `openarmature.prompt.label`
- `openarmature.prompt.template_hash`
- `openarmature.prompt.rendered_hash`
- `openarmature.prompt.group_name`

Pattern:

```python
result = await manager.get("greeting", "production", {"user": "Alice"})
with with_active_prompt(result):
    response = await provider.complete(result.messages, ...)
```

Trace UIs can then pivot on `prompt.name`, filter on
`prompt.template_hash` to find every call that used a given
template version, or surface `prompt.group_name` to group
related calls into a single workflow view.

Nesting is innermost-wins: if you activate a result inside
another active result, the inner one takes precedence for the
duration of the inner block.
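
A sketch of the nesting rule (the two results are illustrative):

```python
with with_active_prompt(outer_result):
    # Spans here carry outer_result's identity attributes.
    with with_active_prompt(inner_result):
        # Spans here carry inner_result's attributes instead.
        response = await provider.complete(inner_result.messages, ...)
    # Back to outer_result's attributes.
```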

## Determinism and content-addressed caching

`render` is deterministic: same `Prompt`, same `variables` →
bytewise-identical `messages` and `rendered_hash` across
calls. This is the cache-key contract: `rendered_hash`
gives a downstream memoization layer the right equivalence
relation for free.

Templates MAY reference user-supplied variables that capture
nondeterministic values (`now=datetime.utcnow()`); the
determinism contract applies to the render operation given
fixed inputs, not to user-supplied variable content.
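
The contract is directly testable; a sketch (where
`timestamped_prompt` is assumed to be a template that references
`{{ now }}`):

```python
a = manager.render(prompt, {"user": "Alice"})
b = manager.render(prompt, {"user": "Alice"})
assert a.rendered_hash == b.rendered_hash  # same template + same variables

# Any nondeterminism lives in the variables you pass, not in render():
from datetime import datetime, timezone

stamped = manager.render(
    timestamped_prompt,
    {"now": datetime.now(timezone.utc).isoformat()},
)
```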

## A minimal example

```python
import asyncio
from pathlib import Path

from openarmature.prompts import FilesystemPromptBackend, PromptManager


async def main() -> None:
    manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
    result = await manager.get(
        "greeting",
        "production",
        variables={"user": "Alice"},
    )
    print(result.messages[0].content)  # rendered text
    print(result.rendered_hash)  # cache key


asyncio.run(main())
```

The filesystem backend layout is
`<root>/<label>/<name>.j2`; for the example above,
`./prompts/production/greeting.j2`.
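
To run the example from an empty directory, seed the template file
first (the greeting text is illustrative):

```python
from pathlib import Path

template = Path("./prompts/production/greeting.j2")  # <root>/<label>/<name>.j2
template.parent.mkdir(parents=True, exist_ok=True)
template.write_text("Hello, {{ user }}!")
```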

## What's out of scope (for now)

- **Specific vendor backends**: Langfuse, PromptLayer, etc.,
  ship as sibling packages (`openarmature-langfuse`, …). The
  core ships the protocol + a filesystem reference.
- **Prompt versioning workflows**: how versions are assigned,
  promoted, pinned. Per project. The spec defines the
  `version` field; the discipline is yours.
- **Cache invalidation policies**: `template_hash` and
  `rendered_hash` are the keys; the cache itself is a
  separate concern.
- **Prompt linting / evaluation**: quality checks belong to
  separate tools (or the future eval capability).
- **Multi-message render decomposition**: v1 emits a single
  `UserMessage` carrying the rendered text. If you need
  `system + user` splits, construct the messages list
  manually outside `render()` for now (see the sketch below).
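
A sketch of the manual split, assuming `SystemMessage` and
`UserMessage` live in `openarmature.llm` (both the import path and the
`SystemMessage` name are assumptions; adjust to the actual message types):

```python
from openarmature.llm import SystemMessage, UserMessage  # assumed import path

system = await manager.get("system_rules", "production", variables={})
user = await manager.get("greeting", "production", variables={"user": "Alice"})

# render() emits a single UserMessage per prompt; recombine the
# rendered text manually for a system + user split.
messages = [
    SystemMessage(content=system.messages[0].content),
    UserMessage(content=user.messages[0].content),
]
response = await provider.complete(messages, ...)
```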

## Where to next

- **[Model Providers](../model-providers/index.md)**:
  what to pass `result.messages` into.
- **[API reference: `openarmature.prompts`](../reference/prompts.md)**:
  the full public surface.

@@ -0,0 +1,7 @@
# openarmature.prompts

::: openarmature.prompts
    options:
      show_root_heading: false
      show_source: false
      heading_level: 2