Verifiable Browser Runtime (VBR) is a compact contract-first, verifier-first, trace-first reliability runtime for controlled local browser agent tasks.
It turns supported local browser goals into explicit task contracts, routes actions through deterministic guards and adapters, verifies completion with objective success criteria, and records structured traces for failure attribution. It is not a general browser agent, not a public-web automation system, and not a benchmark leaderboard.
Browser agents can report "done" while the page state is wrong, unsafe, or unreviewable. VBR keeps a small set of controlled local browser tasks inside a contracted runtime loop: plan the task, guard the action, execute locally, verify objectively, and leave trace evidence a reviewer can inspect.
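The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VBR's actual code: `TaskContract` is reduced to three hypothetical fields, and `run_task`, `fake_execute`, and the action names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskContract:
    """Hypothetical contract: a goal, forbidden actions, and an objective check."""
    goal: str
    forbidden_actions: frozenset
    success_check: Callable[[dict], bool]


@dataclass
class RunResult:
    status: str  # "succeeded", "failed", or "blocked"
    trace: list = field(default_factory=list)


def run_task(contract: TaskContract, plan: list,
             execute: Callable[[str, dict], None]) -> RunResult:
    """Guard each planned action, execute it, then verify the final page state."""
    page_state: dict = {}
    trace: list = []
    for action in plan:
        # Guard: refuse forbidden actions before anything executes.
        if action in contract.forbidden_actions:
            trace.append({"step": "policy", "action": action, "decision": "blocked"})
            return RunResult("blocked", trace)
        trace.append({"step": "policy", "action": action, "decision": "allowed"})
        execute(action, page_state)
        trace.append({"step": "executor", "action": action})
    # Verify: the outcome depends on page state, not on executor self-report.
    passed = contract.success_check(page_state)
    trace.append({"step": "verifier", "passed": passed})
    return RunResult("succeeded" if passed else "failed", trace)


def fake_execute(action: str, state: dict) -> None:
    """Stand-in executor: only "click_real_done" changes the page state."""
    if action == "click_real_done":
        state["done"] = True


contract = TaskContract(
    goal="mark the task done",
    forbidden_actions=frozenset({"submit_payment"}),
    success_check=lambda state: state.get("done") is True,
)
for plan in (["click_real_done"], ["click_fake_done"], ["submit_payment"]):
    print(plan[0], "->", run_task(contract, plan, fake_execute).status)
```

The three plans mirror the succeeded, failed, and blocked outcomes of the demo set: the no-effect click fails verification even though the stand-in executor raised no error.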
| Proof | Status | Evidence |
|---|---|---|
| Contracted local runtime path | passed | demo-evidence.md |
| False-success interception | failed as expected | false-success-summary.md |
| Bounded verifier-driven retry/replan | passed | verifier-retry-summary.md |
| Unsafe action policy block | blocked | policy-block-summary.md |
| Static trace summaries | tracked | trace-summaries/ |
| Optional local vision evidence | bounded local | vision-screenshot-fixture-smoke.md, real-vision-end-to-end-controlled-task.md |
flowchart LR
goal["Supported local goal"] --> planner["ContractPlanner<br/>TaskContract + initial plan"]
planner --> router["RuleRouter<br/>explicit route decision"]
router --> policy["PolicyGuard<br/>pre-execution safety gate"]
policy -->|allowed| executor["PlaywrightExecutor<br/>controlled local actions"]
policy -->|blocked| trace["TraceWriter<br/>trace.json + summary.md + summary.html + curated evidence"]
router -. "DOM-insufficient core demo" .-> mock["MockVisionGrounder<br/>adapter contract only"]
router -. "optional local evidence" .-> buv["browser-use-vision fixture smoke<br/>controlled screenshot + local Florence backend<br/>not core dependency"]
executor --> verifier["Verifier<br/>objective success conditions"]
mock --> verifier
buv --> verifier
verifier --> trace
python3 -m pip install -e .[dev]
python3 -m playwright install chromium
vbr-demo all
python3 -m pytest -q
OPENSPEC_TELEMETRY=0 openspec validate --all --strict
Representative demo output:
dom: succeeded -> runs/<run_id>
visual: succeeded -> runs/<run_id>
false-success: failed -> runs/<run_id>
verifier-retry: succeeded -> runs/<run_id>
policy-block: blocked -> runs/<run_id>
- Detailed evidence map: docs/evidence/demo-evidence.md.
- CLI evidence: docs/evidence/demo-cli-output.txt.
- Static trace summaries: docs/evidence/trace-summaries/.
- Release notes: docs/release-notes/v0.1.0.md.
- Public readiness evidence: docs/evidence/public-release-readiness.md.
- Claim boundaries: Non-Goals And Claim Boundaries.
The optional vision rows are intentionally narrow: they are controlled local fixture and deterministic fake-real task results, not public-web generalization, not universal icon localization, and not benchmark-grade vision performance.
| Proof point | Current outcome | What it proves | Evidence |
|---|---|---|---|
| Core runtime demo | passed | Supported local goals can flow through contract planning, routing, guarded execution, objective verification, and trace writing. | demo-evidence.md, demo-cli-output.txt |
| False-success verifier proof | failed as expected | Executor self-report is not accepted when objective success conditions are missing. | false-success-summary.md |
| Verifier-driven retry proof | passed | One controlled deterministic verifier-driven retry/replan proof: a no-effect self-report fails verification, one bounded recovery replan runs, and final objective verification passes. | verifier-retry-summary.md |
| Policy-block proof | blocked | PolicyGuard stops an unsafe submit action before executor execution. | policy-block-summary.md |
| Mock visual fallback route | passed | The core runtime can route DOM-insufficient targets through a typed vision-grounder adapter path. This is mock adapter routing only. | visual-summary.md |
| Static trace summary | tracked | Runtime runs now write summary.html as a static trace summary for reviewer-readable contract, route, policy, executor, verifier, retry, failure, and claim-boundary evidence. It is not a trace replay UI or benchmark evidence. | trace-summaries/ |
| Optional browser-use-vision + local Florence fixture | passed | A controlled local screenshot fixture returned a non-mock bbox grounding result through the optional backend-backed adapter path. | vision-screenshot-fixture-smoke.md, vision-screenshot-fixture-smoke.json |
| Optional real-vision end-to-end controlled task | passed | A deterministic fake-real controlled run passed screenshot bytes to a non-mock provider-shaped grounder, executed a normalized bbox click, and verified the controlled page objective. | real-vision-end-to-end-controlled-task.md, real-vision-end-to-end-controlled-task.json |
VBR uses explicit component handoffs rather than a single black-box browser agent. Safety-critical roles are deterministic-first.
| Role | Runtime object | Responsibility |
|---|---|---|
| Contract/planning | ContractPlanner | Converts supported local goals into TaskContract plus an initial plan. |
| Routing | RuleRouter | Selects Playwright, mock vision adapter, policy stop, or verifier route. |
| Safety gate | PolicyGuard | Blocks forbidden actions, high-risk actions, credential-like entry, and off-allowlist navigation before executor execution. |
| Browser execution | PlaywrightExecutor | Executes deterministic local browser actions and captures observations/screenshots. |
| Visual fallback | MockVisionGrounder | Demonstrates an adapter path for DOM-insufficient targets; it is not real visual grounding. |
| Success gate | Verifier | Checks task-specific success conditions instead of trusting executor self-report. |
| Evidence | TraceWriter | Emits trace.json, summary.md, and static summary.html for review and failure attribution. |
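As an illustration of the safety-gate role in the table above, here is a minimal sketch of a deterministic pre-execution check. It assumes a simplified `Action` shape and invented rule sets; only the `unsafe_action_blocked` code appears in VBR's demo output, and the other names, reason codes, and the credential pattern are hypothetical.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "click", "fill", "navigate", "submit"
    target: str = ""  # selector or URL
    value: str = ""   # text typed into a field


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


# All three rule sets are illustrative assumptions, not VBR's real policy.
FORBIDDEN_KINDS = frozenset({"submit"})
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1"})
CREDENTIAL_FIELD = re.compile(r"pass(word)?|secret|token|api[_-]?key", re.IGNORECASE)


def check(action: Action) -> Decision:
    """Return a decision before the executor ever sees the action."""
    if action.kind in FORBIDDEN_KINDS:
        return Decision(False, "unsafe_action_blocked")
    if action.kind == "navigate":
        # Assumes local pages are served over HTTP on an allowlisted host.
        host = urlparse(action.target).hostname
        if host not in ALLOWED_HOSTS:
            return Decision(False, "off_allowlist_navigation")
    if action.kind == "fill" and CREDENTIAL_FIELD.search(action.target):
        return Decision(False, "credential_like_entry")
    return Decision(True, "allowed")


print(check(Action("submit", "#pay")))
print(check(Action("navigate", "https://example.com/")))
print(check(Action("fill", "#name", "Ada")))
```

Keeping the gate a pure function of the action makes every block decision reproducible and easy to record in a trace.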
For developer-facing module paths, handoff objects, and current adapter
boundaries, see docs/adapter-interface.md. It
documents today's VisionGrounder injection point and current Executor /
Verifier shapes without adding a stable plugin registry, production SDK,
public-web automation, or benchmark evidence.
core demo outcomes:
dom: succeeded
visual: succeeded (mock adapter routing only)
false-success: failed (button_click_no_effect)
verifier-retry: succeeded
first verification: failed (button_click_no_effect)
retry_decision: retryable_verifier_failure
replan_created: #real-done
final verification: passed
claim: one controlled deterministic verifier-driven retry/replan proof
policy-block: blocked (unsafe_action_blocked)
optional vision fixture:
source_status=importable
smoke_status=passed
adapter_executed=true
run_kind=controlled-local-screenshot-fixture
provider=browser-use-vision
live_backend=true
backend_configured=true
is_mock=false
method=florence-phrase-grounding
selected_target_ref=bbox:0.2415,0.3395,0.7615,0.6865
claim=controlled local screenshot fixture returned non-mock bbox grounding result
optional real-vision controlled task:
run_kind=deterministic-fake-real-controlled-task
provider_shape=browser-use-vision
live_backend=false
is_mock=false
method=fake-real-bbox-grounding
selected_target_ref=bbox:0.0050,0.0700,0.0550,0.1700
bbox_execution=succeeded
verification=passed
claim=optional real-vision end-to-end controlled task exercised the full controlled visual demo chain; not live backend evidence
The tracked evidence records only sanitized fixture metadata. It does not record backend URLs, hosts, IP addresses, credentials, raw screenshot bytes, unredacted screenshot content, backend logs, or absolute local paths.
The repository includes an MIT license and a curated public-release readiness
note:
docs/evidence/public-release-readiness.md.
The v0.1.0 source-review release is documented in
docs/release-notes/v0.1.0.md. The
tag/GitHub Release publication records the bounded source snapshot after local
validation, Reviewer pass, and remote CI success. A manual GitHub visibility
change or tag/release publication is repository-display evidence, not runtime
capability evidence.
The default runtime remains mock-compatible: vbr-demo all and CI use the mock
vision adapter path unless a real adapter is explicitly selected. This keeps the
core reliability loop deterministic and avoids requiring optional local
packages, credentials, GPU services, backend URLs, or repository secrets.
browser-use-vision and the local Florence backend remain optional local
integrations. They are not core dependencies and are not part of required CI.
If a compatible adapter is installed, it can be selected explicitly:
VBR_BROWSER_USE_VISION_MODULE=<importable_module> vbr-demo --vision-provider browser-use-vision visual
A local-only smoke command records whether the optional adapter is available:
BUV_SOURCE_PATH=/path/to/browser-use-vision vbr-demo vision-smoke
# or
VBR_BROWSER_USE_VISION_SOURCE_PATH=/path/to/browser-use-vision vbr-demo vision-smoke
# or
VBR_BROWSER_USE_VISION_MODULE=<importable_module> vbr-demo vision-smoke
# or
VBR_BROWSER_USE_VISION_MODULE=<importable_module> vbr-vision-smoke
There is also an opt-in controlled screenshot fixture smoke:
BUV_SOURCE_PATH=/path/to/browser-use-vision vbr-demo vision-fixture-smoke
# optional, only when a local screenshot backend is already running:
VBR_BROWSER_USE_VISION_BACKEND_URL=<local-backend-url> \
BUV_SOURCE_PATH=/path/to/browser-use-vision vbr-demo vision-fixture-smoke
The tracked smoke artifact currently records a structured local pass: a local
browser-use-vision source path was supplied, the package exposed a compatible
ground(...) adapter entrypoint, and VBR normalized a non-mock adapter result:
docs/evidence/vision-adapter-smoke.md
and
docs/evidence/vision-adapter-smoke.json.
It also records source status separately from smoke status in
docs/evidence/vision-source-discovery.json:
current source status is importable, current smoke status is passed, and
adapter_executed=true. The adapter evidence method is
local-contract-smoke, which means the smoke proves the optional entrypoint and
contract normalization path, not screenshot-based visual grounding.
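The distinction between source status and smoke status can be sketched as follows. This is a hypothetical illustration, not VBR's discovery code: the function name, the `entrypoint_callable` field, and every status value other than `importable` are assumptions. The sketch deliberately stops at checking that a `ground` attribute is callable, because the adapter's call signature is not documented here.

```python
import importlib
import os


def discover_adapter(env_var: str = "VBR_BROWSER_USE_VISION_MODULE") -> dict:
    """Report whether the optional module imports and exposes a ground entrypoint."""
    report = {"source_status": "not_configured", "entrypoint_callable": False}
    module_name = os.environ.get(env_var)
    if not module_name:
        # Default path: nothing selected, so the mock adapter stays in use.
        return report
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        report["source_status"] = "not_importable"
        return report
    report["source_status"] = "importable"
    # Importability alone proves nothing about the adapter contract,
    # so the entrypoint check is recorded as a separate field.
    report["entrypoint_callable"] = callable(getattr(module, "ground", None))
    return report


print(discover_adapter())
```

Recording the two facts separately lets a reviewer tell "the package is missing" apart from "the package is present but does not satisfy the adapter contract".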
The tracked controlled screenshot fixture artifact currently records a bounded local pass:
the source is importable, the backend-aware adapter was executed with
controlled PNG screenshot bytes, and Florence phrase grounding returned a
normalized non-mock bounding-box result:
docs/evidence/vision-screenshot-fixture-smoke.md
and
docs/evidence/vision-screenshot-fixture-smoke.json.
The fixture evidence records only backend_configured=true/false and
live_backend=true/false; it does not record backend URLs, hosts, IP
addresses, credentials, raw screenshot bytes, or backend logs.
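One way to enforce that kind of boundary is an allowlist filter applied before evidence is written. The sketch below is an assumption about how such a filter could look, not a description of VBR's writer: the allowlisted keys are taken from the evidence fields shown in this README, while `sanitize_evidence` and the raw `backend_url` and `screenshot_bytes` field names are hypothetical.

```python
# Keys taken from the evidence fields listed in this README.
ALLOWED_KEYS = frozenset({
    "run_kind", "provider", "live_backend", "is_mock",
    "method", "selected_target_ref", "claim",
})


def sanitize_evidence(raw: dict) -> dict:
    """Keep allowlisted metadata and reduce the backend URL to a boolean."""
    clean = {key: value for key, value in raw.items() if key in ALLOWED_KEYS}
    # "backend_url" is a hypothetical raw field; only its presence is recorded.
    clean["backend_configured"] = bool(raw.get("backend_url"))
    return clean


raw_result = {
    "provider": "browser-use-vision",
    "live_backend": True,
    "is_mock": False,
    "backend_url": "http://127.0.0.1:9000",  # placeholder value
    "screenshot_bytes": b"\x89PNG...",       # placeholder value
}
print(sanitize_evidence(raw_result))
```

An allowlist fails closed: a new raw field is dropped by default instead of leaking into tracked evidence.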
The tracked optional real-vision end-to-end controlled task artifact records a
deterministic fake-real controlled run through the full controlled visual demo
chain: screenshot bytes were passed to a non-mock provider-shaped grounder,
provider_shape=browser-use-vision and live_backend=false are recorded, a
normalized bbox target was executed by Playwright, and objective verification
passed:
docs/evidence/real-vision-end-to-end-controlled-task.md
and
docs/evidence/real-vision-end-to-end-controlled-task.json.
This artifact proves the runtime wiring and trace shape for the controlled
task. It is not evidence from a live backend run, and it does not make
optional vision components required for CI or core runtime use.
This optional path does not claim universal visual grounding, public-web automation, or benchmark-grade visual performance.
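To make the "normalized bbox target" step concrete, the sketch below converts a `bbox:` reference like the ones in the evidence output into a pixel click point. It assumes the four numbers are `x_min, y_min, x_max, y_max` normalized to the viewport, which this README does not state explicitly; the function names and the 1280x720 viewport are illustrative.

```python
def parse_bbox_ref(ref: str) -> tuple:
    """Parse "bbox:x0,y0,x1,y1" into four floats, validating the range."""
    prefix = "bbox:"
    if not ref.startswith(prefix):
        raise ValueError(f"not a bbox reference: {ref!r}")
    parts = [float(part) for part in ref[len(prefix):].split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(parts)}")
    x0, y0, x1, y1 = parts
    # Assumed layout: normalized (x_min, y_min, x_max, y_max).
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"coordinates out of range or inverted: {ref!r}")
    return x0, y0, x1, y1


def click_point(ref: str, viewport_width: int, viewport_height: int) -> tuple:
    """Return the pixel coordinates of the bbox center for a given viewport."""
    x0, y0, x1, y1 = parse_bbox_ref(ref)
    return ((x0 + x1) / 2 * viewport_width, (y0 + y1) / 2 * viewport_height)


x, y = click_point("bbox:0.2415,0.3395,0.7615,0.6865", 1280, 720)
print(round(x, 2), round(y, 2))
```

With Playwright's Python API, the resulting point could then be passed to `page.mouse.click(x, y)`; whether VBR's executor does exactly that is not confirmed by this document.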
The controlled local demo set includes these proof points:
- dom: completes a DOM-rich form task through Playwright and objective verification.
- visual: routes a DOM-insufficient target through the mock VisionGrounder adapter path. This proves adapter routing only; it does not claim real visual grounding.
- false-success: clicks a no-effect button that self-reports completion, then catches the false success through objective verification and failure attribution.
- verifier-retry: starts from the same false-success failure, records button_click_no_effect, permits one contract-bounded retry decision, creates one deterministic recovery replan for #real-done, and passes final objective verification. This is one controlled deterministic verifier-driven retry/replan proof; it does not prove arbitrary replanning, public-web recovery, benchmark reliability, or production autonomy.
- policy-block: attempts a forbidden submit action and is blocked by PolicyGuard before an executor result is recorded.
These demo tasks are regression evidence for runtime behavior, not a benchmark leaderboard.
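The verifier-retry proof point can be sketched as a bounded loop. This is a hedged illustration rather than VBR's implementation: `button_click_no_effect`, `retryable_verifier_failure`, and `#real-done` come from the demo output, while `run_with_retry`, the `#fake-done` selector, and the recovery table are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verification:
    passed: bool
    failure_code: Optional[str] = None


# Deterministic recovery table: failure code -> replanned target (assumed shape).
RECOVERY_TARGETS = {"button_click_no_effect": "#real-done"}


def run_with_retry(first_target: str, click: Callable[[str], None],
                   verify: Callable[[], Verification], max_retries: int = 1):
    """Click, verify objectively, and allow at most max_retries replans."""
    events = []
    target = first_target
    for attempt in range(max_retries + 1):
        click(target)
        result = verify()
        events.append({"attempt": attempt, "target": target,
                       "passed": result.passed, "failure_code": result.failure_code})
        if result.passed:
            return "succeeded", events
        # Stop when the retry budget is spent or the failure is not retryable.
        if attempt == max_retries or result.failure_code not in RECOVERY_TARGETS:
            break
        target = RECOVERY_TARGETS[result.failure_code]
        events.append({"retry_decision": "retryable_verifier_failure",
                       "replan_created": target})
    return "failed", events


page = {"done": False}


def click(selector: str) -> None:
    # Only the real button changes page state; the fake one has no effect.
    if selector == "#real-done":
        page["done"] = True


def verify() -> Verification:
    if page["done"]:
        return Verification(True)
    return Verification(False, "button_click_no_effect")


status, events = run_with_retry("#fake-done", click, verify)
print(status, len(events))
```

Setting `max_retries=0` reproduces the false-success outcome: the same first failure is recorded, but no replan is permitted.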
Representative output:
dom: succeeded -> runs/<run_id>
visual: succeeded -> runs/<run_id>
false-success: failed -> runs/<run_id>
verifier-retry: succeeded -> runs/<run_id>
policy-block: blocked -> runs/<run_id>
Curated evidence is tracked in
docs/evidence/demo-evidence.md and
docs/evidence/demo-cli-output.txt.
Curated static trace summary HTML evidence is tracked under
docs/evidence/trace-summaries/; these
summary.html examples are static reviewer-readable summaries, not a trace
replay UI. The false-success evidence shows the key verifier-first proof:
executor self-report is not accepted when objective success conditions are
missing.
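For readers unfamiliar with static trace summaries, the sketch below renders a trace dictionary into a single HTML page with no scripts and no replay behavior. The trace schema used here (`task`, `steps`, `role`, `outcome`) is a hypothetical stand-in; the real trace.json layout is defined by the project, not by this example.

```python
import html


def render_summary(trace: dict) -> str:
    """Render a trace dict into a self-contained static HTML page."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(step.get("role", ""))),
            html.escape(str(step.get("outcome", ""))),
        )
        for step in trace.get("steps", [])
    )
    title = html.escape(str(trace.get("task", "untitled")))
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1>\n"
        "<table><tr><th>Role</th><th>Outcome</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>\n"
    )


example_trace = {
    "task": "false-success",
    "steps": [
        {"role": "executor", "outcome": "self-reported <done>"},
        {"role": "verifier", "outcome": "failed (button_click_no_effect)"},
    ],
}
print(render_summary(example_trace))
```

Escaping every value with `html.escape` keeps page content captured during a run from being interpreted as markup in the summary.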
Run them:
PYTHONPATH=src python3 -m verifiable_browser_runtime.cli all
or install the package and use:
vbr-demo all
Render an existing run trace into its static HTML summary:
vbr-trace-summary runs/<run_id>/trace.json
If Playwright browsers are not installed in a clean environment, install the Chromium browser once:
python3 -m playwright install chromium
Run the test suite:
python3 -m pytest -q
This project intentionally does not claim or implement:
- a general browser agent;
- arbitrary public-web automation;
- a benchmark leaderboard or success-rate competition;
- benchmark-grade visual performance;
- universal icon localization or universal visual grounding;
- production visual support;
- trace replay UI;
- arbitrary verifier-driven replanning or production autonomy;
- voice input;
- required browser-use-vision or Florence backend dependency in core;
- required CI dependency on optional vision packages, GPU services, backend URLs, credentials, or local project paths;
- Stagehand, Browser Use, or Playwright MCP backend integration.
The local pages are controlled proof cases for contract, policy, verification, and trace behavior. The optional visual evidence is limited to controlled local fixture and deterministic fake-real task results that returned or executed non-mock bbox grounding outputs.
The minimal GitHub Actions workflow mirrors the local review gates by running:
python -m pytest -q
openspec validate --all --strict
The workflow also installs Playwright Chromium so browser-backed tests can run
on a fresh runner. It is validation-only: it does not publish, deploy, or create releases, and it does not require repository secrets. CI runs deterministic
fake-real runtime tests, but it does not run the optional live
browser-use-vision / Florence backend fixture evidence path.