feat(prompts): prompt-management core (proposal 0017) #45
Establishes the prompt-management subpackage with the three canonical error categories from spec §10:
- PromptNotFound (non-transient): no prompt matches (name, label).
- PromptRenderError (non-transient): undefined variable, template parse error, or variable-coercion failure.
- PromptStoreUnavailable (transient): backend infrastructure failure (network, I/O, vendor API).

Exports PROMPT_TRANSIENT_CATEGORIES mirroring the TRANSIENT_CATEGORIES frozenset in openarmature.llm.errors, so retry-middleware classifiers can identify transient prompt-management failures by category.
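A minimal sketch of how a retry classifier might use the category set, for illustration only. The `PromptError` base class, the `category` attribute, the `"prompt_render_error"` string, and the `is_transient` helper are assumptions; only the three class names, `PROMPT_TRANSIENT_CATEGORIES`, and the `prompt_not_found` / `prompt_store_unavailable` strings come from this PR.

```python
class PromptError(Exception):
    """Hypothetical common base; the PR does not name one."""

    category: str = "prompt_error"


class PromptNotFound(PromptError):
    category = "prompt_not_found"


class PromptRenderError(PromptError):
    category = "prompt_render_error"  # assumed string, not confirmed by the PR


class PromptStoreUnavailable(PromptError):
    category = "prompt_store_unavailable"


# Only the infrastructure failure is worth retrying.
PROMPT_TRANSIENT_CATEGORIES = frozenset({"prompt_store_unavailable"})


def is_transient(exc: BaseException) -> bool:
    """Hypothetical classifier: decide by category string, not by exception type."""
    return getattr(exc, "category", None) in PROMPT_TRANSIENT_CATEGORIES
```

Classifying by category string rather than by `isinstance` lets retry middleware treat prompt and LLM failures uniformly without importing either error hierarchy.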
Pydantic models for the prompt-management capability shapes from spec §3, §4, and §9.

Prompt carries the raw template source string plus identity metadata (name, version, label, template_hash, fetched_at, optional metadata). The raw-string representation keeps Prompt serializable and engine-agnostic; compilation happens on render.

PromptResult propagates identity from the source Prompt and carries the rendered messages list (compatible with openarmature.llm.Message and directly consumable by Provider.complete()), the variables used, rendered_hash, and rendered_at.

PromptGroup wraps an ordered N>=2 sequence of PromptResult instances with a stable group_name. The validator rejects empty and single-member groups per §9 (single-prompt tagging is already served by per-prompt observability attributes).

Hashing helpers compute SHA-256 over UTF-8 bytes (template) and over a canonical JSON serialization with sort_keys + minimal separators (rendered). Both hashes are prefixed with 'sha256:' so future algorithm changes are self-describing.
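The hashing scheme described above can be sketched with the standard library alone. The helper names match the PR's exports, but the message payload shape (a list of plain dicts) and the default `ensure_ascii` behaviour of `json.dumps` are assumptions rather than the spec's wording.

```python
import hashlib
import json


def compute_template_hash(template_source: str) -> str:
    """SHA-256 over the UTF-8 bytes of the raw template source."""
    digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def compute_rendered_hash(messages: list) -> str:
    """SHA-256 over a canonical JSON form: sorted keys, minimal separators."""
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Sorting keys and stripping whitespace means two logically identical message lists hash identically regardless of dict insertion order.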
PromptBackend is a runtime-checkable Protocol with a single async fetch(name, label) method, matching the openarmature.llm.Provider pattern. The docstring restates the §5 contract: reentrant, no mutation, raises PromptNotFound / PromptStoreUnavailable, and the rule that cached results MUST preserve the original fetched_at.

PromptManager composes one or more PromptBackends and exposes:
- fetch: §8 fallback semantics. First successful fetch wins; PromptNotFound STOPS the chain (logical absence MUST NOT silently substitute); PromptStoreUnavailable continues to the next backend; all-exhausted raises PromptStoreUnavailable with the last unavailable chained as __cause__. WARN-level log on each fallback per §8.
- render: synchronous string transform via Jinja2 with StrictUndefined per §7. Produces a single UserMessage in v1 (multi-message decomposition deferred). UndefinedError and TemplateError both map to PromptRenderError carrying the prompt's identity + the variables + a description. Pydantic ValidationError on the UserMessage(content=rendered_text) construction (empty-string render case) also maps to PromptRenderError per §10's 'variable's value not coercible' framing.
- get: convenience equivalent to render(await fetch(...), variables).

Adds jinja2>=3.1 to runtime dependencies.
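The fetch fallback rules can be sketched as a standalone loop. This is not the PR's `PromptManager`; `StubBackend`, `fetch_with_fallback`, and the locally defined exception classes are hypothetical stand-ins that exist only to make the control flow runnable.

```python
import asyncio
import logging

logger = logging.getLogger("prompts.sketch")


class PromptNotFound(Exception): ...
class PromptStoreUnavailable(Exception): ...


class StubBackend:
    """Hypothetical in-memory backend used only for this sketch."""

    def __init__(self, prompts=None, unavailable=False):
        self.prompts = prompts or {}
        self.unavailable = unavailable
        self.call_count = 0

    async def fetch(self, name, label="production"):
        self.call_count += 1
        if self.unavailable:
            raise PromptStoreUnavailable(f"backend down for ({name!r}, {label!r})")
        try:
            return self.prompts[(name, label)]
        except KeyError:
            raise PromptNotFound(f"no prompt for ({name!r}, {label!r})") from None


async def fetch_with_fallback(backends, name, label="production"):
    last_unavailable = None
    for backend in backends:
        try:
            return await backend.fetch(name, label)  # first success wins
        except PromptStoreUnavailable as exc:
            # Transient failure: log at WARN and try the next backend.
            logger.warning("backend unavailable, falling back: %s", exc)
            last_unavailable = exc
        # PromptNotFound is deliberately not caught: it stops the chain.
    raise PromptStoreUnavailable("all backends unavailable") from last_unavailable
```

Letting `PromptNotFound` propagate is the point of the design: a later backend must not silently substitute a prompt that the authoritative backend says does not exist.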
FilesystemPromptBackend reads prompts from <root>/<label>/<name>.j2.
The subdirectory-per-label layout keeps name-collisions across
labels distinct without prefix-escape concerns. version is
derived from the first 12 hex chars of the template_hash so two
file contents map deterministically to two distinct versions
without needing a sidecar metadata file (spec §3 lets backends
pick any stable identifier). The docstring notes that future
caching backends MUST preserve the original fetched_at on
returned Prompts per spec §3.
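As a rough illustration of the layout and version derivation in this commit, the two rules reduce to a path join and a hash-prefix slice. `prompt_path`, `derive_version`, and `VERSION_PREFIX_LEN` are hypothetical names; the 12-character prefix is the value stated in this commit message.

```python
import hashlib
from pathlib import Path

VERSION_PREFIX_LEN = 12  # hex chars, as stated in this commit message


def prompt_path(root: Path, name: str, label: str = "production") -> Path:
    """Subdirectory-per-label layout: <root>/<label>/<name>.j2."""
    return root / label / f"{name}.j2"


def derive_version(template_source: str) -> str:
    """Content-derived version: a prefix of the template's SHA-256 hex digest."""
    hex_digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    return hex_digest[:VERSION_PREFIX_LEN]
```

Because the version is a pure function of the file contents, editing a template changes its version without any sidecar metadata.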
Adds the context-variable propagation mechanism for spec §11
LLM-call span attributes:
- openarmature.prompts.context module exposes
with_active_prompt(result) and with_active_prompt_group(group)
context managers plus current_prompt_result() /
current_prompt_group() inspectors.
- OTelObserver._on_llm_event reads the two ContextVars at LLM-
call span start and surfaces:
openarmature.prompt.name
openarmature.prompt.version
openarmature.prompt.label
openarmature.prompt.template_hash
openarmature.prompt.rendered_hash
openarmature.prompt.group_name
- Nesting is innermost-wins (matches Python's natural ContextVar
token-stacking behavior; spec §11 doesn't mandate a policy).
The attribute names match spec §11's normative list. The
mechanism (context variables) is one of the two example
mechanisms §11 names; bundling it now keeps the §11 surface
discoverable from the moment prompt-management lands.
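The innermost-wins behaviour falls out of `ContextVar` token handling. A minimal sketch, assuming a single module-level variable; `_active_prompt` is a hypothetical name and the real module also covers prompt groups.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Any, Iterator, Optional

# Hypothetical module-level variable; None means "no active prompt".
_active_prompt: ContextVar[Optional[Any]] = ContextVar("active_prompt", default=None)


@contextmanager
def with_active_prompt(result: Any) -> Iterator[Any]:
    token = _active_prompt.set(result)  # remember the previous value
    try:
        yield result
    finally:
        _active_prompt.reset(token)  # restore it, even on error


def current_prompt_result() -> Optional[Any]:
    return _active_prompt.get()
```

Each `set` returns a token that restores the prior value on `reset`, so nested blocks unwind in order and the outer prompt becomes visible again once the inner block exits.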
Adds prompt-management as the fifth conformance capability:
- harness/prompt_management.py: typed YAML models for the new fixture shape (backends + manager + calls with target / operation / capture_as, plus per-call and top-level expected blocks for raises / result_equivalence / prompt_group / rendered_hash_equal / rendered_hash_different).
- harness/fixtures.py: PromptManagementFixture added to the discriminated union; the discriminator recognizes top-level 'backends:' (without 'mock_provider:') as the prompt-management shape.
- harness/loader.py: 'prompt-management' added to CAPABILITIES so test_fixture_parsing.py discovers and parses the new fixtures.

test_prompt_management.py drives all 12 spec fixtures (001-fetch-success through 012-prompt-result-rendered-hash-stability) against the real PromptManager + a MockPromptBackend that implements the protocol with optional simulate_unavailable + preloaded prompts + a call_count for fixtures that assert fallback chain visits. All 12 fixtures pass.
Adds tests/unit/test_prompts.py (25 tests) covering gaps the conformance fixtures don't exercise directly:
- error categories match spec §10 strings; PROMPT_TRANSIENT_CATEGORIES contains only prompt_store_unavailable.
- error attribute carriage (PromptNotFound name/label/backend, PromptRenderError name/version/label/variables/description).
- template_hash / rendered_hash determinism, prefix, and length; divergence for different inputs.
- Prompt extra-field rejection; PromptGroup 0/1-member rejection and 2+ acceptance.
- PromptManager construction (zero-backend rejection).
- Empty-string render output boundary wrap (the spec-agent's concern about Jinja2 cleanly rendering '' but UserMessage rejecting empty content; verified to surface as PromptRenderError).
- Identity-field propagation from Prompt to PromptResult on render.
- FilesystemPromptBackend disk I/O: success path, missing file raises PromptNotFound, OSError that isn't FileNotFoundError raises PromptStoreUnavailable.
- Context-var propagation: with_active_prompt / _prompt_group set + reset, innermost-wins nesting, async-task visibility.
- PromptManager fallback gaps: first-match short-circuits later backends; render returns a UserMessage carrying the rendered text.

Adds two OTel observer tests under tests/unit/test_observability_otel.py:
- Active prompt + active prompt group propagates the six openarmature.prompt.* span attributes (name, version, label, template_hash, rendered_hash, group_name) on the openarmature.llm.complete span.
- Without an active prompt, the LLM-call span carries no openarmature.prompt.* attributes.
docs/concepts/prompts.md walks through the prompt-management capability: the fetch + render split (and why both, not just get()), Prompt identity fields, strict-by-default variables, composite-backend fallback (PromptStoreUnavailable continues, PromptNotFound stops), the three error categories, PromptGroup for tracing related prompts, observability propagation via with_active_prompt and the six normative openarmature.prompt.* attributes, determinism + content-addressed caching, a minimal example, and what's out of scope (vendor backends, versioning workflows, cache invalidation, multi-message decomposition).

docs/reference/prompts.md is an mkdocstrings autodoc page in the same shape as docs/reference/llm.md.

mkdocs.yml gains the two new pages in the Concepts and Reference nav sections.

CHANGELOG.md adds two entries under [Unreleased]:
- the new openarmature.prompts subpackage with PromptManager, the three error categories, FilesystemPromptBackend, and the jinja2>=3.1 runtime dependency.
- the observability propagation surface in openarmature.prompts.context plus the OTel observer wiring.
Pull request overview
Implements proposal 0017 (prompt-management core) in a new openarmature.prompts subpackage: typed Prompt / PromptResult / PromptGroup models, a PromptBackend protocol, a PromptManager that composes backends with §8 fallback semantics and renders via Jinja2 StrictUndefined, three canonical error categories, a FilesystemPromptBackend reference implementation, and with_active_prompt / with_active_prompt_group context managers wired into the OTel observer for §11 span-attribute propagation. Adds a conformance harness shape for the 12 new prompt-management fixtures.
Changes:
- New `openarmature.prompts` subpackage (models, errors, manager, FS backend, hashing helpers, context vars) with `jinja2>=3.1` as a new runtime dep.
- OTel observer surfaces six `openarmature.prompt.*` attributes on `openarmature.llm.complete` spans when called inside the new context managers.
- Conformance harness extended with `PromptManagementFixture`, a YAML model layer, and a parametrized test runner using an in-process `MockPromptBackend`.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/openarmature/prompts/__init__.py` | Public API re-exports for the new subpackage. |
| `src/openarmature/prompts/prompt.py` | Prompt / PromptResult Pydantic models with `extra="forbid"`. |
| `src/openarmature/prompts/group.py` | PromptGroup with N≥2 member validator. |
| `src/openarmature/prompts/errors.py` | Three canonical error classes + category constants + transient frozenset. |
| `src/openarmature/prompts/backend.py` | Runtime-checkable PromptBackend Protocol. |
| `src/openarmature/prompts/manager.py` | Fallback fetch, Jinja2 strict render with empty-string boundary wrap, get convenience. |
| `src/openarmature/prompts/hashing.py` | SHA-256 helpers (`sha256:` prefix) over template source and canonical message JSON. |
| `src/openarmature/prompts/context.py` | ContextVar-backed with_active_prompt / with_active_prompt_group. |
| `src/openarmature/prompts/backends/filesystem.py` | Reference FS backend; reads `<root>/<label>/<name>.j2`, derives version from hash prefix. |
| `src/openarmature/prompts/backends/__init__.py` | Backend re-export. |
| `src/openarmature/observability/otel/observer.py` | LLM-span attribute propagation reading the prompt/group context vars. |
| `pyproject.toml`, `uv.lock` | Add `jinja2>=3.1` runtime dependency. |
| `tests/conformance/harness/prompt_management.py` | Typed YAML fixture models for the new shape. |
| `tests/conformance/harness/fixtures.py` | Discriminated-union registration of PromptManagementFixture. |
| `tests/conformance/harness/loader.py` | Adds prompt-management to known capabilities. |
| `tests/conformance/test_prompt_management.py` | Fixture runner with MockPromptBackend + per-call / top-level assertions. |
| `tests/unit/test_prompts.py` | Unit coverage for types, hashing, FS backend, context vars, manager edge cases. |
| `tests/unit/test_observability_otel.py` | OTel-attribute propagation present/absent cases. |
| `docs/concepts/prompts.md`, `docs/reference/prompts.md`, `mkdocs.yml` | Concept page, API reference, nav update. |
| `CHANGELOG.md` | `[Unreleased]` entries for the prompt-management capability and observability propagation. |
Comments suppressed due to low confidence (2)
src/openarmature/prompts/backends/filesystem.py:53
`path.read_text(encoding="utf-8")` can raise `UnicodeDecodeError` if a template file contains invalid UTF-8. `UnicodeDecodeError` is a subclass of `ValueError`, not `OSError`, so it bypasses both `except` branches and propagates as an uncategorized exception out of `fetch()`. This means a corrupt prompt file surfaces as a generic Python exception instead of the spec's canonical `PromptStoreUnavailable` (or arguably `PromptNotFound`), defeating the §8 fallback chain (`PromptManager.fetch` won't fall through to the next backend). Consider catching `UnicodeDecodeError` explicitly (or `Exception` narrowly around `read_text`) and mapping to `PromptStoreUnavailable`.
    try:
        template_source = await asyncio.to_thread(path.read_text, encoding="utf-8")
    except FileNotFoundError as exc:
        raise PromptNotFound(
            f"prompt ({name!r}, {label!r}) not found under {self._root}",
            name=name,
            label=label,
            backend=str(self._root),
        ) from exc
    except OSError as exc:
        raise PromptStoreUnavailable(
            f"filesystem I/O error reading ({name!r}, {label!r}): {exc}"
        ) from exc
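One way the suggested fix could look, shown as a synchronous sketch rather than the backend's actual async code. `read_template` and the locally defined exception classes are hypothetical; the real backend wraps the read in `asyncio.to_thread` and passes `name` / `label` keyword arguments.

```python
from pathlib import Path


class PromptNotFound(Exception): ...
class PromptStoreUnavailable(Exception): ...


def read_template(path: Path) -> str:
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        # Logical absence: the fallback chain should stop here.
        raise PromptNotFound(f"no template at {path}") from exc
    except (OSError, UnicodeDecodeError) as exc:
        # UnicodeDecodeError is a ValueError, so it must be listed explicitly;
        # treating it as a store failure lets the fallback chain continue.
        raise PromptStoreUnavailable(f"cannot read {path}: {exc}") from exc
```

`FileNotFoundError` must stay ahead of the broader clause because it is itself an `OSError` subclass.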
src/openarmature/prompts/backends/filesystem.py:49
`name` and `label` are passed into the backend without any path-component validation. If a caller (or upstream code that forwards user input) passes a value containing `..` or an absolute path, `self._root / label / f"{name}.j2"` will silently traverse outside the configured root (joining an absolute component discards everything to its left, so `Path("/srv/prompts") / "/etc" / "passwd.j2"` yields `/etc/passwd.j2`, and `..` segments are not normalized away by `/`). Even though prompt names are typically developer-controlled, this is a reference backend and the public API contract on `PromptBackend.fetch` doesn't forbid externally-sourced names. Consider rejecting names/labels containing path separators or `..` components, or resolving and verifying `path.is_relative_to(self._root.resolve())` before reading.
    async def fetch(self, name: str, label: str = "production") -> Prompt:
        path = self._root / label / f"{name}.j2"
        try:
            template_source = await asyncio.to_thread(path.read_text, encoding="utf-8")
        except FileNotFoundError as exc:
            raise PromptNotFound(
                f"prompt ({name!r}, {label!r}) not found under {self._root}",
                name=name,
                label=label,
                backend=str(self._root),
            ) from exc
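A sketch of the validation the comment proposes, combining both suggestions (component rejection plus a resolved-path containment check). `safe_prompt_path` is a hypothetical helper, and raising `ValueError` is an assumption; the review does not say which error category a rejected name should map to.

```python
from pathlib import Path


def safe_prompt_path(root: Path, name: str, label: str) -> Path:
    """Build <root>/<label>/<name>.j2, refusing anything that escapes root."""
    for part in (name, label):
        # Reject empty parts, dot segments, and anything with a separator.
        if not part or part in {".", ".."} or "/" in part or "\\" in part:
            raise ValueError(f"invalid prompt path component: {part!r}")
    resolved_root = root.resolve()
    candidate = (resolved_root / label / f"{name}.j2").resolve()
    # Defence in depth: symlinks could still point outside the root.
    if not candidate.is_relative_to(resolved_root):  # Python 3.9+
        raise ValueError(f"{candidate} escapes {resolved_root}")
    return candidate
```

The component check alone handles `..` and absolute paths; the `is_relative_to` check additionally covers a symlink inside the root that points elsewhere.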
- manager.py: hoist Jinja2 Environment to module-level singleton (stateless config; thread-safe for compile + render; avoids re-parsing config on every render call), keep the autoescape-disabled-by-design comment.
- errors.py: PromptStoreUnavailable carries optional name / label / backends_tried for operator diagnosability; PromptManager's aggregate raise populates backends_tried with the ordered list of consulted backends. PromptRenderError docstring documents spec §10's non-transient mandate.
- backends/filesystem.py: widen the version-prefix length from 12 to 16 hex chars (~64 bits; birthday-paradox boundary at ~4B templates), document the rationale + the wider-prefix / alternative-identifier guidance for higher-scale backends. Also carries name / label on PromptStoreUnavailable raises.
- observability/otel/observer.py: hoist prompts.context import to module top-level (no longer optional; cost off the per-event hot path).
- harness/fixtures.py: tighten the prompt-management discriminator from `backends:` alone to `backends:` co-occurring with `calls:` AND absence of graph-shape keys; avoids silently misrouting future fixtures that introduce a backends list for some other purpose.
- test_prompt_management.py: lift per-call call-count assertions out of the raises branch so they apply on both success and error paths; add internal-consistency check that a fixture's fields_must_match and fields_may_differ sets don't overlap.
- test_prompts.py: mock Path.read_text for the OSError-routing test instead of relying on platform-dependent NotADirectoryError behavior; update the version-prefix length assertion to match the widened 16-char prefix.
Memory rule: no em dashes in user-facing copy. Reworded the new docs/concepts/prompts.md to use colons, semicolons, parens, or sentence restructuring in place of em dashes.
Sweep of leftover em dashes from PR-1/PR-2 docs that slipped past the no-em-dashes-in-user-facing-copy rule. Same substitutions as the prompts.md cleanup (colons, semicolons, parens, or sentence restructuring).
- CHANGELOG.md: update 12 → 16 hex chars to match the widened FilesystemPromptBackend.version derivation.
- prompt.py: PromptResult.messages gains Field(min_length=1) so the spec §4 'Ordered non-empty sequence' mandate is enforced at the type boundary, not just by the construction path.
- errors.py: PromptStoreUnavailable gains an optional causes list[BaseException] attribute carrying per-backend exceptions index-aligned to backends_tried.
- manager.py: aggregate raise populates causes with the per-backend exceptions in fallback order, while keeping the __cause__ chain pointing at the last unavailable for stack-trace continuity.
- manager.py: PromptManager carries a per-instance dict[str, jinja2.Template] keyed by template_hash. Render consults the cache and only re-parses on miss. Unbounded for v1 (typical apps have O(10) prompts; an LRU follow-on can land if benchmarks show memory pressure). template_hash is content-derived, so cache invalidation is automatic when a backend returns updated content.
- test_prompts.py: new tests for empty-messages rejection and for the compiled-template cache hit behavior.
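The content-addressed cache idea can be illustrated without Jinja2. In this sketch `string.Template` stands in for a compiled `jinja2.Template`, and `TemplateCache` with its `misses` counter is a hypothetical name; the PR keeps the dict directly on `PromptManager`.

```python
import hashlib
from string import Template


class TemplateCache:
    """Unbounded per-instance cache keyed by a content-derived hash."""

    def __init__(self) -> None:
        self._compiled: dict = {}
        self.misses = 0  # exposed only so the behaviour is observable

    def get(self, template_source: str) -> Template:
        digest = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
        key = f"sha256:{digest}"
        compiled = self._compiled.get(key)
        if compiled is None:
            # Miss: parse once and remember the result under the content hash.
            self.misses += 1
            compiled = Template(template_source)
            self._compiled[key] = compiled
        return compiled
```

Because the key is derived from the template text, updated content produces a new key and a fresh parse, so there is no explicit invalidation step; stale entries simply stop being hit.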
- harness/prompt_management.py: fix misleading comment on FixtureExpectedRaises.carries (secondary_backend_call_count is a sibling field on FixtureExpectedPerCall, not inside carries).
- manager.py: replace 'assert causes' with an explicit 'if not causes: raise RuntimeError(...)' guard so the invariant holds under 'python -O' (asserts stripped) and surfaces as a clear RuntimeError rather than an opaque IndexError if a future change ever silently swallows an exception in the fallback loop.
- test_prompts.py: rewrite the active-prompt-in-nested-async-function test to spawn via asyncio.create_task so it actually exercises context-copy across the task boundary, matching the function name's implied claim. The previous form's await ran in the same context where ContextVar propagation is trivially expected.
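The difference between a plain `await` and `asyncio.create_task` can be seen in a few lines. This is an illustrative sketch, not the rewritten test; `active_prompt`, `read_in_task`, `set_in_task`, and `main` are hypothetical names.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

active_prompt: ContextVar[Optional[str]] = ContextVar("active_prompt", default=None)


async def read_in_task() -> Optional[str]:
    # Runs in a copy of the context captured when the task was created.
    return active_prompt.get()


async def set_in_task() -> None:
    # Mutates only the task's own copy of the context.
    active_prompt.set("set-in-child")


async def main():
    active_prompt.set("outer")
    seen_by_child = await asyncio.create_task(read_in_task())
    await asyncio.create_task(set_in_task())
    return seen_by_child, active_prompt.get()
```

`asyncio.create_task` snapshots the current context, so the child sees the parent's value while the child's own `set` never leaks back; a test that only awaits a coroutine directly shares one context and cannot distinguish these cases.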
Summary
- `openarmature.prompts` subpackage. PR-3 of the five-PR batch following PR-1 (feat(llm): structured output (proposal 0016) #42, proposal 0016) and PR-2 (feat(llm): image content blocks (proposal 0015) #44, proposal 0015).
- `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback contract (`prompt_store_unavailable` falls through to the next backend; `prompt_not_found` stops the chain), and renders templates through Jinja2's `StrictUndefined` per §7.
- `Prompt` / `PromptResult` / `PromptGroup` Pydantic models match spec §3 / §4 / §9 exactly. `PromptGroup` requires `len(members) >= 2`.
- Three canonical error classes (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with category-string constants and `PROMPT_TRANSIENT_CATEGORIES = frozenset({"prompt_store_unavailable"})` exported in the same shape as `openarmature.llm.errors.TRANSIENT_CATEGORIES`.
- `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 12 hex chars of the template's SHA-256 hash).
- `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When an LLM call fires inside one of those contexts, the OTel observer surfaces the six normative `openarmature.prompt.*` attributes on the `openarmature.llm.complete` span. Nesting is innermost-wins.

What's new
- `openarmature.prompts` (`__init__` re-exports the full public API).
- `Prompt`, `PromptResult`, `PromptGroup`
- `PromptBackend`: `fetch(name, label)` method.
- `PromptManager`: `fetch` (with fallback) / `render` (sync, Jinja2 strict) / `get` convenience.
- `FilesystemPromptBackend`: `<root>/<label>/<name>.j2` from disk.
- `PromptNotFound` / `PromptRenderError` / `PromptStoreUnavailable`
- `PROMPT_TRANSIENT_CATEGORIES`
- `compute_template_hash`, `compute_rendered_hash`: `"sha256:"` prefix.
- `with_active_prompt`, `with_active_prompt_group`
- `OTelObserver`
- `jinja2>=3.1`
- `docs/concepts/prompts.md` + `docs/reference/prompts.md`

Release gate
PR-3 of a five-PR batch (`0016` → `0015` → `0017` → `0014` → `0011`). Do not tag a release until all five land; the CHANGELOG `[Unreleased]` Notes section carries the gate from PR-1.

Commits
- feat(prompts): error classes and category constants
- feat(prompts): Prompt, PromptResult, PromptGroup types
- feat(prompts): PromptBackend protocol, PromptManager, jinja2 dep
- feat(prompts): FilesystemPromptBackend + OTel attribute propagation
- test(conformance): prompt-management harness and 12 fixtures
- test(unit): prompts subpackage + OTel attribute propagation
- docs: prompts concept page, API reference, changelog

Notable implementation details
- Rendering `""` through Jinja2 (e.g., `{{ x if x else '' }}` with `x=None`) would construct `UserMessage(content="")`, which Pydantic rejects. The render path catches that `ValidationError` and re-raises as `PromptRenderError` per §10's "variable's value not coercible" framing. Surfaces the prompt's identity + the variables + the underlying description.
- The six span attribute names (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`) match the normative §11 list.
- When both `with_active_prompt` and `with_active_prompt_group` are active, the per-prompt attributes AND the `group_name` attribute fire on the span.
- Caching backends MUST preserve the original `fetched_at` on returned Prompts per §3; the FS backend's docstring notes the rule applies to caching backends even though FS doesn't cache.
- v1 render produces `[UserMessage(content=rendered_text)]`. Multi-message split convention deferred to a follow-on if real patterns surface.

Conformance harness extensions
- `tests/conformance/harness/prompt_management.py` adds typed YAML models for the new fixture shape (`backends:` + `manager:` + `calls:` with `target` / `operation` / `capture_as`, plus per-call and top-level `expected` blocks for `raises` / `result_equivalence` / `prompt_group` / `rendered_hash_equal` / `rendered_hash_different`).
- `harness/fixtures.py` registers `PromptManagementFixture` in the discriminated union; the discriminator recognizes top-level `backends:` (without `mock_provider:`) as the prompt-management shape.
- `harness/loader.py` adds `prompt-management` to `CAPABILITIES` so `test_fixture_parsing.py` discovers and parses the 12 new fixtures.
- `tests/conformance/test_prompt_management.py` parametrizes over the 12 fixtures and drives them against the real `PromptManager` + an in-process `MockPromptBackend`. No I/O.

Test plan
- `uv run pytest`: 602 pass, 79 skipped (up from 73; +6 new docs example snippets in `prompts.md`), 0 failed.
- `uv run pyright`: clean.
- `uv run ruff check` + `uv run ruff format`: clean.
- `uv run --group docs mkdocs build --strict`: clean.
- All 12 conformance fixtures (`001-fetch-success` through `012-prompt-result-rendered-hash-stability`) pass.
- 25 unit tests in `tests/unit/test_prompts.py` covering construction, error attribute carriage, hashing determinism, FilesystemPromptBackend disk I/O, context-var propagation, empty-string render boundary wrap, and PromptManager fallback semantics.
- Two OTel observer tests in `tests/unit/test_observability_otel.py` covering attribute propagation (both present-with-context and absent-without-context cases).
- `docs/concepts/prompts.md` end-to-end against `mkdocs serve` to confirm rendering of code blocks and cross-links.

Pre-1.0 SemVer
Additive change. New subpackage; no existing surface modified except the OTel observer (which gained a no-op-when-no-active-prompt branch). Existing callers see no behavior change. Jinja2 is added to required runtime dependencies; downstream `openarmature` consumers will pick it up on next `pip install -U` / `uv sync`.