31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- **Firewall redaction now fails closed at the depth boundary (#149).**
`redact()` previously returned any subtree nested at/below `max_depth`
verbatim, so PII/secrets nested beyond the cap reached the LLM unscanned. The
boundary now scrubs leaf strings and *elides* nested containers
(`[REDACTED: nested data beyond depth limit]`) instead of returning them raw.
- **Handle expansion is now redacted (#150).** `HandleStore.expand()` routes its
projected rows through the same `redact()` the firewall applies on first
invocation, so a secret inline in a grant-permitted field (e.g. a Bearer token
in a `note` value) is scrubbed on the expand path. Expansion Frames now carry
redaction `warnings`.
- **Cross-chunk redaction safety for streaming Frames (#151).**
`Firewall.apply_stream()` keeps a per-field `StreamRedactor` that holds back a
trailing overlap window, so a secret whose characters are split across two
streamed chunks is reassembled and redacted before either half is emitted.
Documented limit: patterns containing internal whitespace split exactly at the
held boundary may still evade detection (see `docs/security.md`).
- **Trace argument and error redaction extended beyond `memory.*` (#172).**
`ActionTrace.args` for **every** capability — and driver `error` text — now
pass through the firewall redactor before persistence, so the trace store no
longer becomes a sensitive-data sink when arguments carry secrets/PII or when a
`DriverError` embeds a raw response body. Memory-payload stripping for
`memory.*` capabilities is unchanged.
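
The fail-closed depth boundary from the first fix above can be illustrated with a minimal sketch. This is not the project's `redact()`: the function name `redact_sketch`, the single email pattern, and the way `depth` is incremented are assumptions made for illustration only.

```python
import re
from typing import Any

# Illustrative pattern only; the real module carries a larger pattern set.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
REDACTED = "[REDACTED]"
DEPTH_ELIDED = "[REDACTED: nested data beyond depth limit]"


def scrub(text: str) -> str:
    """Replace every email-like substring with the redaction marker."""
    return EMAIL_RE.sub(REDACTED, text)


def redact_sketch(data: Any, *, max_depth: int = 2, depth: int = 0) -> Any:
    """Redact recursively, failing closed once depth reaches max_depth."""
    if depth >= max_depth:
        if isinstance(data, str):
            return scrub(data)       # a leaf scan is cheap, so still do it
        if isinstance(data, (dict, list)):
            return DEPTH_ELIDED      # never hand back an unscanned subtree
        return data                  # other scalars pass through
    if isinstance(data, dict):
        return {
            key: redact_sketch(value, max_depth=max_depth, depth=depth + 1)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [
            redact_sketch(value, max_depth=max_depth, depth=depth + 1)
            for value in data
        ]
    if isinstance(data, str):
        return scrub(data)
    return data


nested = {
    "user": {
        "contact": {"email": "a@example.com"},  # container at the boundary
        "note": "mail a@example.com",           # string at the boundary
    }
}
result = redact_sketch(nested, max_depth=2)
```

With `max_depth=2`, the `contact` dict sits at the boundary and is replaced by the elision marker, while the `note` string at the same level is still pattern-scrubbed.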

### Added
- **Secret-canary regression suite (#206).** `tests/test_firewall_canary.py`
plants distinctive canary secrets and asserts they never appear in any kernel
egress — summary/table/raw Frames, handle expansions, streamed chunks, trace
args/errors, and adapter-rendered payloads — turning the I-01 boundary into an
executable invariant and a regression net for the fixes above.
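
A canary check of this kind might look roughly like the sketch below. It is not the contents of `tests/test_firewall_canary.py`: the canary value, the `scrub` stand-in for the firewall redactor, and the egress surface names are all hypothetical.

```python
import json
import re
from typing import Any

# Hypothetical canary: shaped like a Bearer token so the pattern under test
# has to catch it, with a distinctive body that is easy to search for.
CANARY_BODY = "wkcanary7f3a9c1e5b"
CANARY = f"Bearer {CANARY_BODY}"

BEARER_RE = re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")


def scrub(value: Any) -> Any:
    """Stand-in redactor: scrub Bearer tokens in strings, dicts, and lists."""
    if isinstance(value, str):
        return BEARER_RE.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


def find_leaks(egress: dict[str, Any]) -> list[str]:
    """Name every egress surface whose serialized form contains the canary."""
    return [
        name
        for name, payload in egress.items()
        if CANARY_BODY in json.dumps(payload)
    ]


# Plant the canary in several stand-in egress surfaces.
raw = {
    "table": [{"id": 1, "note": f"token {CANARY}"}],
    "trace_args": {"headers": {"Authorization": CANARY}},
    "error": f"upstream rejected {CANARY}",
}
scrubbed = {name: scrub(payload) for name, payload in raw.items()}
```

Serializing each surface and searching for the canary body keeps the assertion independent of where in the payload the secret was planted.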

## [0.11.0] - 2026-06-13

### Added
21 changes: 21 additions & 0 deletions docs/context_firewall.md
@@ -60,6 +60,27 @@ When a capability has `SensitivityTag.PII` or `SensitivityTag.PCI`:

Principals with the `pii_reader` role bypass `allowed_fields` enforcement.

Redaction is applied on **every** path that returns data to the LLM, not just
the first `transform()`:

- **Depth boundary (fail-closed).** The `max_depth` cap bounds recursion cost.
At the boundary, scalar strings are still pattern-scrubbed, but a nested
container is *elided* (`[REDACTED: nested data beyond depth limit]`) rather
than returned verbatim — a deeply nested subtree never reaches the LLM
unscanned.
- **Handle expansion.** `HandleStore.expand()` runs its projected rows through
the same `redact()` as the first invocation, so a secret inline in a
permitted field (e.g. a token in a `note` value) is scrubbed on expand too.
- **Streaming.** `Firewall.apply_stream()` keeps a per-field `StreamRedactor`
that holds back a trailing overlap window, so a secret split across two
chunks is reassembled and redacted before either half is emitted. Patterns
containing internal whitespace (phone/SSN/spaced card numbers) split exactly
at the held boundary may still evade detection — see `docs/security.md`.
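
The hold-back idea behind the streaming bullet can be sketched as follows. This is a simplified illustration, not the project's `StreamRedactor`: the class name `HoldBackRedactor`, the single `sk-` pattern, and the 16-character overlap are assumptions, and the sketch omits the buffer bound that the real class applies to very long unbroken tokens.

```python
import re

# Illustrative stand-ins for the real pattern set and token alphabet.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")
TOKEN_CHAR_RE = re.compile(r"[A-Za-z0-9-]")
REDACTED = "[REDACTED]"


class HoldBackRedactor:
    """Holds back a trailing window so split secrets are reassembled."""

    def __init__(self, overlap: int = 16) -> None:
        self._pending = ""
        self._overlap = overlap

    def feed(self, text: str) -> str:
        """Buffer the chunk and return the redacted text safe to emit."""
        self._pending += text
        cut = len(self._pending) - self._overlap
        # Back the cut off any run of token characters so a contiguous
        # secret is never severed at the commit boundary.
        while cut > 0 and TOKEN_CHAR_RE.match(self._pending[cut - 1]):
            cut -= 1
        if cut <= 0:
            return ""
        committed, self._pending = self._pending[:cut], self._pending[cut:]
        return SECRET_RE.sub(REDACTED, committed)

    def flush(self) -> str:
        """Redact and return whatever is still buffered at end-of-stream."""
        out = SECRET_RE.sub(REDACTED, self._pending)
        self._pending = ""
        return out


# The secret "sk-AbCd1234EfGh5678" is split across the two chunks.
chunks = ["key=sk-AbCd", "1234EfGh5678 done, and some trailing text here"]

# A naive per-chunk pass sees neither half as a full match.
naive = "".join(SECRET_RE.sub(REDACTED, chunk) for chunk in chunks)

redactor = HoldBackRedactor(overlap=16)
safe = "".join(redactor.feed(chunk) for chunk in chunks) + redactor.flush()
```

The naive pass emits the secret intact once the chunks are concatenated, while the hold-back version only commits text after the boundary has been backed off to a separator.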

Invocation **arguments** recorded on `ActionTrace.args`, and driver **error**
text, are run through the same redactor before persistence, so the trace store
never becomes a sensitive-data sink (see `docs/security.md`).
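
As a rough sketch of what scrubbing trace arguments and error text before persistence could look like, under the assumption of a hypothetical `TraceSketch` record and `record_trace` helper (neither is the kernel's actual `ActionTrace` API) and a single Bearer pattern:

```python
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Illustrative pattern only; the real redactor applies a larger set.
BEARER_RE = re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")
REDACTED = "[REDACTED]"


def scrub(value: Any) -> Any:
    """Scrub Bearer tokens from strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return BEARER_RE.sub(REDACTED, value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


@dataclass
class TraceSketch:
    """Hypothetical stand-in for a persisted trace record."""

    capability_id: str
    args: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


def record_trace(
    capability_id: str, args: dict[str, Any], error: Optional[str] = None
) -> TraceSketch:
    """Build the trace from scrubbed inputs so raw values are never stored."""
    return TraceSketch(
        capability_id=capability_id,
        args=scrub(args),
        error=scrub(error) if error is not None else None,
    )


trace = record_trace(
    "billing.refund",
    {"headers": {"Authorization": "Bearer abc.def-123"}, "amount": 5},
    error="upstream 401: Bearer abc.def-123 rejected",
)
```

Scrubbing at construction time, rather than at read time, means the stored record never holds the raw value in the first place.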

## Summarization

Summaries are produced deterministically:
13 changes: 10 additions & 3 deletions docs/security.md
@@ -9,12 +9,14 @@
| Token forgery / tampering | HMAC-SHA256 signature; any bit flip → `TokenInvalid` |
| Token replay after expiry | Expiry checked on every `verify()` call |
| Context injection via raw tool output | Firewall always transforms `RawResult → Frame`; raw data never reaches LLM by default |
| PII / PCI leakage | Redaction + `allowed_fields` enforcement in the firewall |
| PII / PCI leakage | Redaction + `allowed_fields` enforcement in the firewall, applied on every egress path (summary/table/raw, handle expansion, streaming) |
| PII / secret leak below the depth budget | Redaction fails *closed* at `max_depth`: leaf strings are scrubbed; nested containers are elided rather than returned verbatim (#149) |
| Inline secret leak via handle expansion | `HandleStore.expand()` runs projected rows through the firewall redactor, so a secret in a permitted field is scrubbed (#150) |
| Cross-chunk secret split in streaming | `Firewall.apply_stream()` holds back a per-field overlap window so a secret spanning two chunks is reassembled before redaction (#151) |
| Privilege escalation via WRITE/DESTRUCTIVE | Policy engine enforces role requirements |
| Audit evasion | Every `invoke()` creates an immutable `ActionTrace` |
| Handle scope escape (expand exceeds grant) | Handles persist grant constraints; `HandleStore.expand` rechecks `max_rows`, `allowed_fields`, `scope`, and principal binding (#76) |
| Memory exfiltration via tool output | `SensitivityTag.MEMORY` capabilities gate sensitive reads and durable writes; `ActionTrace.args` redacts payload-like fields for `memory.*` capabilities (#75) |
| Raw memory payload reaching audit log | Kernel strips `payload`/`content`/`value`/`memory`/`text`/`body` from `ActionTrace.args` for `memory.*` capabilities |
| Sensitive data reaching the audit log via args/errors | `ActionTrace.args` and driver `error` text are run through the firewall redactor for **every** capability; memory payloads (`payload`/`content`/`value`/`memory`/`text`/`body`) are additionally stripped wholesale for `memory.*` capabilities (#75, #172) |
| Scanned content / raw result reaching audit log | `ActionTrace.result_summary` is built only from the post-firewall `Frame` (counts and flags, never raw driver data), so the audit trail records an invocation's outcome without re-introducing the data the firewall removed |

## Token scopes
@@ -126,6 +128,11 @@ audit.db` exits non-zero on any divergence (see [cli.md](cli.md)).
- The `WEAVER_KERNEL_SECRET` must be kept secret. Rotate it if compromised.
- The default `InMemoryDriver` has no persistence — suitable for testing only.
- PII redaction is heuristic (regex-based). It is not a substitute for proper data governance.
- Streaming redaction (`Firewall.apply_stream`) reassembles patterns split across
chunks by holding back a bounded overlap window. A contiguous secret
(JWT/Bearer/API-key/connection-string body) is not split across a commit
boundary unless its unbroken run outgrows the redactor's buffer bound (four
times the overlap window), at which point it is force-committed. A pattern
containing internal whitespace (phone, SSN, spaced card number) split exactly
at the held boundary may still evade detection.
- Rate limiting is enforced per `(principal_id, capability_id)` pair using a sliding window.
Default limits: 60 READ / 10 WRITE / 2 DESTRUCTIVE invocations per 60-second window.
Principals with the `"service"` role receive 10× the default limits. Limits are
129 changes: 114 additions & 15 deletions src/weaver_kernel/firewall/redaction.py
@@ -65,12 +65,39 @@
"""Matches connection strings containing embedded credentials (``scheme://user:pass@host``)."""

_REDACTED = "[REDACTED]"
_DEPTH_ELIDED = "[REDACTED: nested data beyond depth limit]"


def _is_sensitive_field_name(name: str) -> bool:
    return name.lower() in _SENSITIVE_FIELDS


def _redact_string(data: str) -> tuple[str, list[str]]:
    """Redact inline sensitive patterns from a single string.

    Pure leaf helper shared by :func:`redact` and :class:`StreamRedactor` so
    the pattern set lives in exactly one place.

    Args:
        data: The string to scrub.

    Returns:
        A tuple of ``(redacted_string, warnings)``.
    """
    original = data
    data = _EMAIL_RE.sub(_REDACTED, data)
    data = _PHONE_RE.sub(_REDACTED, data)
    data = _CARD_RE.sub(_REDACTED, data)
    data = _SSN_RE.sub(_REDACTED, data)
    data = _BEARER_RE.sub(_REDACTED, data)
    data = _JWT_RE.sub(_REDACTED, data)
    data = _API_KEY_RE.sub(r"\1" + _REDACTED, data)
    data = _CONN_STR_RE.sub(r"\1" + _REDACTED + r"\2", data)
    if data != original:
        return data, ["String value contained sensitive patterns and was redacted."]
    return data, []


def redact(
data: Any,
*,
@@ -84,6 +111,13 @@ def redact(
all others are removed. Sensitive field names are replaced with
``[REDACTED]`` regardless.

The ``max_depth`` cap bounds recursion cost; it must **fail closed**. At
the boundary, scalar strings are still pattern-redacted (a leaf scan is
cheap and cannot recurse), but any nested container is *elided* and
replaced with a marker rather than returned verbatim — a deeply nested
subtree must never reach the LLM unscanned (the I-01 boundary; see
``docs/agent-context/invariants.md``).

Args:
data: The data to redact.
allowed_fields: If non-empty, only keep these field names in dicts.
@@ -94,10 +128,18 @@ def redact(
A tuple of ``(redacted_data, warnings)`` where *warnings* is a list of
human-readable strings describing what was redacted.
"""
warnings: list[str] = []

if depth >= max_depth:
return data, warnings
# Fail closed at the depth boundary: scrub leaf strings, elide nested
# containers (they would otherwise flow through unredacted).
if isinstance(data, str):
return _redact_string(data)
if isinstance(data, dict | list):
return _DEPTH_ELIDED, [
"Nested data beyond the configured max_depth was elided (not scanned)."
]
return data, []

warnings: list[str] = []

if isinstance(data, dict):
result: dict[str, Any] = {}
@@ -127,17 +169,74 @@ def redact(
return redacted_list, warnings

if isinstance(data, str):
original = data
data = _EMAIL_RE.sub(_REDACTED, data)
data = _PHONE_RE.sub(_REDACTED, data)
data = _CARD_RE.sub(_REDACTED, data)
data = _SSN_RE.sub(_REDACTED, data)
data = _BEARER_RE.sub(_REDACTED, data)
data = _JWT_RE.sub(_REDACTED, data)
data = _API_KEY_RE.sub(r"\1" + _REDACTED, data)
data = _CONN_STR_RE.sub(r"\1" + _REDACTED + r"\2", data)
if data != original:
warnings.append("String value contained sensitive patterns and was redacted.")
return data, warnings
return _redact_string(data)

return data, warnings


# Characters that can appear *inside* a contiguous secret token (JWT, Bearer
# value, API key, connection-string body). A commit boundary is never placed
# inside a run of these, so such a token is never split across chunks.
_TOKEN_CHAR_RE = re.compile(r"[A-Za-z0-9._~+/:=@-]")

# How many trailing characters of a string stream are held back before
# emission so a pattern split across two chunks is reassembled first.
_STREAM_OVERLAP = 256


class StreamRedactor:
    """Redacts an incrementally delivered text stream with cross-chunk safety.

    A per-chunk regex pass cannot catch a secret whose characters are split
    across two chunks (e.g. ``"...eyJ"`` then ``"abc.def..."``). This buffer
    holds back the trailing :data:`_STREAM_OVERLAP` characters of the stream
    and only commits text once enough right-context has arrived, so a pattern
    straddling a chunk boundary is reassembled before either half is emitted.

    Commit boundaries are placed only at non-token separators, so a contiguous
    secret (JWT/Bearer/API-key/connection-string body) is never severed across
    a commit. Patterns that contain internal whitespace (phone, SSN, spaced
    card numbers) and are split exactly at the held boundary may still evade
    detection — a documented limit, mirrored in ``docs/security.md``.

    The redactor is single-stream and stateful: feed chunks in order, then
    call :meth:`flush` once at end-of-stream.
    """

    __slots__ = ("_pending", "_overlap", "_max_pending")

    def __init__(self, *, overlap: int = _STREAM_OVERLAP) -> None:
        self._pending = ""
        self._overlap = overlap
        # Bound the buffer: a single unbroken token longer than this is
        # force-committed rather than held indefinitely (memory safety).
        self._max_pending = overlap * 4

    def feed(self, text: str) -> tuple[str, list[str]]:
        """Accept the next chunk; return ``(redacted_committed_text, warnings)``.

        The returned text is the portion now safe to emit; the trailing
        overlap window is retained until a later :meth:`feed` or :meth:`flush`.
        """
        if text:
            self._pending += text
        if len(self._pending) <= self._overlap:
            return "", []
        cut = len(self._pending) - self._overlap
        if len(self._pending) <= self._max_pending:
            # Back the cut off a contiguous token so we never sever one.
            while cut > 0 and _TOKEN_CHAR_RE.match(self._pending[cut - 1]):
                cut -= 1
            if cut <= 0:
                return "", []
        committed = self._pending[:cut]
        self._pending = self._pending[cut:]
        return _redact_string(committed)

    def flush(self) -> tuple[str, list[str]]:
        """Redact and return any buffered remainder at end-of-stream."""
        if not self._pending:
            return "", []
        out = _redact_string(self._pending)
        self._pending = ""
        return out
66 changes: 63 additions & 3 deletions src/weaver_kernel/firewall/transform.py
@@ -17,7 +17,7 @@
ResponseMode,
)
from .budgets import Budgets
from .redaction import redact
from .redaction import StreamRedactor, redact
from .summarize import summarize

logger = logging.getLogger(__name__)
@@ -232,9 +232,15 @@ async def apply_stream(
budget caps that apply to a single-shot :meth:`transform` apply to
*every* chunk — PII never leaks even when results stream in.

Cross-chunk redaction safety: top-level string fields are routed
through a per-field :class:`StreamRedactor`, which holds back a
trailing overlap window so a secret whose characters span two chunks
is reassembled and redacted before either half is emitted. Non-string
and nested values are redacted per chunk by :meth:`transform`.

Mode escalation across chunks (e.g. dropping from ``table`` to
``summary`` as budget drains) is the caller's responsibility — the
Firewall itself is stateless. ``Kernel.invoke_stream`` orchestrates
Firewall itself does not escalate. ``Kernel.invoke_stream`` orchestrates
escalation via :class:`BudgetManager.suggested_mode`.

The synthetic key ``"__is_final__"`` on a chunk is stripped before
@@ -256,9 +262,13 @@ async def apply_stream(
Yields:
:class:`Frame` chunks with ``is_final`` set on the last one.
"""
redactors: dict[str, StreamRedactor] = {}
async for chunk in response_chunks:
is_final = bool(chunk.get("__is_final__", False))
payload = {k: v for k, v in chunk.items() if k != "__is_final__"}
raw_payload = {k: v for k, v in chunk.items() if k != "__is_final__"}
payload, stream_warnings = _apply_stream_redactors(
raw_payload, redactors, is_final=is_final
)
synthetic_raw = RawResult(
capability_id=capability_id,
data=payload,
@@ -272,6 +282,8 @@ async def apply_stream(
response_mode=response_mode,
constraints=constraints,
)
if stream_warnings:
frame = replace(frame, warnings=[*frame.warnings, *stream_warnings])
if is_final:
frame = replace(frame, is_final=True)
yield frame
@@ -295,6 +307,54 @@ def _make_table(self, data: Any, *, max_rows: int) -> list[dict[str, Any]]:
return result


def _apply_stream_redactors(
    payload: dict[str, Any],
    redactors: dict[str, StreamRedactor],
    *,
    is_final: bool,
) -> tuple[dict[str, Any], list[str]]:
    """Route a chunk's top-level string fields through per-field redactors.

    String values are fed to a :class:`StreamRedactor` (created lazily per
    field) so patterns split across chunks are reassembled before emission.
    Non-string values are passed through unchanged — :meth:`Firewall.transform`
    still redacts them per chunk. On the final chunk every active redactor is
    flushed, including fields absent from the final payload (their held tail is
    re-attached under the original key) so no buffered text is dropped.

    Args:
        payload: The chunk payload (``__is_final__`` already stripped).
        redactors: Mutable per-field redactor state carried across chunks.
        is_final: Whether this is the last chunk (triggers flush).

    Returns:
        ``(redacted_payload, warnings)``.
    """
    out: dict[str, Any] = {}
    warnings: list[str] = []
    for key, value in payload.items():
        if isinstance(value, str):
            redactor = redactors.setdefault(key, StreamRedactor())
            committed, warns = redactor.feed(value)
            if is_final:
                tail, tail_warns = redactor.flush()
                committed += tail
                warns = [*warns, *tail_warns]
            out[key] = committed
            warnings.extend(warns)
        else:
            out[key] = value
    if is_final:
        for key, redactor in redactors.items():
            if key in out:
                continue
            tail, tail_warns = redactor.flush()
            if tail:
                out[key] = tail
            warnings.extend(tail_warns)
    return out, warnings


def _cap_facts(facts: list[str], max_chars: int) -> list[str]:
"""Return as many facts as fit within *max_chars* total."""
total = 0
14 changes: 13 additions & 1 deletion src/weaver_kernel/handles.py
@@ -7,6 +7,7 @@
from typing import Any

from .errors import HandleConstraintViolation, HandleExpired, HandleNotFound
from .firewall.redaction import redact
from .models import Frame, Handle, Provenance, ResponseMode
from .policy_reasons import DenialReason

@@ -319,11 +320,22 @@ def expand(
else:
table_preview = [{"value": r} for r in rows]

# ── Redaction ───────────────────────────────────────────────────────────
# expand() builds its Frame directly from the raw stored dataset, which
# is persisted pre-firewall. Field-level grant constraints
# (allowed_fields / scope) are already enforced above, but a permitted
# field can still carry inline secrets (e.g. a Bearer token in a `note`
# value). Route the projected rows through the same redactor the
# Firewall applies on first invocation so the I-01 boundary holds on the
# expansion path too (see docs/agent-context/invariants.md).
redacted_preview, warnings = redact(table_preview)

return Frame(
action_id=action_id,
capability_id=handle.capability_id,
response_mode=response_mode,
table_preview=table_preview,
table_preview=redacted_preview,
warnings=warnings,
handle=handle,
provenance=Provenance(
capability_id=handle.capability_id,