issue #66: add no-LLM CI (ephemeral anvil tier-1 + scaffolded test-broker tier-2) by hanwencheng · Pull Request #98 · litentry/agentKeys

hanwencheng · 2026-05-21T01:27:14Z

Closes #66.

Builds on top of #95 (ERC-7730 + EIP-712 typed-data signing). This PR must be re-targeted to main once #95 lands.

One file, no scaffolding

Per operator feedback, this PR adds exactly one file: .github/workflows/harness-ci.yml. It invokes the PRODUCTION harness scripts (harness/v2-stage{1,2,3}-demo.sh) unchanged. The only difference from a prod operator's invocation is that scripts/operator-workstation.env is materialized at CI time with TEST resource names from GitHub secrets.

Mirroring production on Heima mainnet — answer to "is this possible with identical .sol files?"

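Yes. With plain CREATE, an EVM contract address is derived from (deployer_address, nonce); with CREATE2 it is derived from (deployer_address, salt, keccak256(init_code)). Solidity compilation is deterministic for a fixed compiler version and settings. So the identical crates/agentkeys-chain/src/*.sol files, compiled and deployed via the identical DeployAgentKeysV1.s.sol script but broadcast by a different deployer wallet on Heima mainnet, produce a parallel set of contracts at new addresses on the production chain. Same code, same chain, separate storage: the test contracts neither read nor write prod contract state. (This relies on CREATE semantics. If a deploy goes through a shared CREATE2 factory, the factory is the deployer term, so the test deployment would also need a different salt to land at new addresses.)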
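A minimal, dependency-free Python sketch of the two address rules, assuming nothing about the AgentKeys deploy script itself: CREATE hashes the RLP encoding of [deployer, nonce], and CREATE2 (EIP-1014) hashes 0xff ++ deployer ++ salt ++ keccak256(init_code); both keep the last 20 bytes of a Keccak-256 digest. The pure-Python Keccak-256 is included only so the block runs without third-party packages. Python's hashlib.sha3_256 is NIST SHA3, which uses different padding and gives different digests, so it is not a drop-in substitute. The sample deployer is anvil's well-known default account #0.

```python
"""Sketch: EVM contract address derivation for CREATE and CREATE2."""

MASK64 = (1 << 64) - 1
# Keccak-f[1600] round constants and rotation offsets ROT[x][y].
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
ROT = [[0, 36, 3, 41, 18], [1, 44, 10, 45, 2], [62, 6, 43, 15, 61],
       [28, 55, 25, 21, 56], [27, 20, 39, 8, 14]]


def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_f(a):
    for rc in RC:
        # theta: column parities, then mix each column with its neighbours
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):  # rho + pi: rotate each lane and move it
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROT[x][y])
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])  # chi
              for y in range(5)] for x in range(5)]
        a[0][0] ^= rc  # iota
    return a


def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256 (original Keccak padding, not NIST SHA3-256)."""
    rate = 136  # bytes absorbed per permutation
    msg = bytearray(data) + b"\x01"
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):  # lane i lives at (x, y) = (i % 5, i // 5)
            a[i % 5][i // 5] ^= int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], "little")
        a = _keccak_f(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def _rlp_bytes(b: bytes) -> bytes:
    """RLP string encoding, short form only (enough for addresses and nonces)."""
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b


def create_address(deployer: str, nonce: int) -> str:
    """CREATE: keccak256(rlp([deployer, nonce]))[12:]."""
    sender = bytes.fromhex(deployer.removeprefix("0x"))
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")  # 0 -> b""
    payload = _rlp_bytes(sender) + _rlp_bytes(nonce_bytes)
    return "0x" + keccak256(bytes([0xC0 + len(payload)]) + payload)[12:].hex()


def create2_address(deployer: str, salt: bytes, init_code: bytes) -> str:
    """CREATE2 (EIP-1014): keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))[12:]."""
    data = b"\xff" + bytes.fromhex(deployer.removeprefix("0x")) + salt + keccak256(init_code)
    return "0x" + keccak256(data)[12:].hex()


if __name__ == "__main__":
    anvil_account0 = "0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
    print(create_address(anvil_account0, 0))  # 0x5fbdb2315678afecb367f032d93f642f64180aa3
    print(create_address(anvil_account0, 1))  # 0xe7f1725e7734ce288f8367e1bb143e90bb3f0512
```

Changing only the deployer argument changes every CREATE address, which is the property the test deployment relies on. Keeping the deployer and bumping the nonce also yields a new address, which is why a re-deploy from the same test wallet lands at fresh addresses that then have to be copied into the TEST_*_HEIMA secrets.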

The deploy is a one-shot operator action per test-environment refresh, not a per-CI-run step: the test contract addresses are pinned in GitHub secrets so CI doesn't burn HEI on every push. To re-deploy (e.g., after a contract revision), the operator funds the test wallet, runs AGENTKEYS_CHAIN=heima HEIMA_DEPLOYER_KEY_FILE=~/.agentkeys/heima-deployer-test.key bash scripts/heima-bring-up.sh, and then updates the TEST_*_HEIMA secrets.

What the workflow does

rust-checks: cargo fmt --check, cargo clippy --workspace -- -D warnings, cargo test --workspace -- --test-threads=1. Covers ~600 tests, including all the in-process broker integration tests that already mock STS + SES.
preflight: gates the E2E job on TEST_OIDC_AWS_ROLE_ARN being set. Until the operator activates the test infra, the harness job is skipped cleanly with a ::warning:: annotation instead of failing.
harness-e2e: assumes the test IAM role via GitHub Actions OIDC (no long-lived AWS credentials), writes the test deployer key, overwrites scripts/operator-workstation.env with test resource names, then runs the unmodified production scripts: harness/v2-stage1-demo.sh --skip-deploy --skip-email, harness/v2-stage2-demo.sh --stub --skip-build, and harness/v2-stage3-demo.sh.
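The preflight gate and the env-file step are simple enough to sketch. The following Python is hypothetical: the real workflow does this in YAML and shell, and the env-file keys are an assumption (each TEST_FOO secret becomes FOO, the name the production scripts are assumed to read). The deployer key is excluded because the job writes it to its own file. `::warning::` is GitHub Actions' workflow-command syntax for a warning annotation, and a step output is published by appending name=value to the file named by $GITHUB_OUTPUT.

```python
from __future__ import annotations

import shlex
from collections.abc import Mapping

GATE = "TEST_OIDC_AWS_ROLE_ARN"
# Written to a separate key file by the job, never into the env file (assumption).
KEY_SECRET = "TEST_HEIMA_DEPLOYER_KEY"


def preflight(env: Mapping[str, str]) -> tuple[bool, str | None]:
    """Return (run_e2e, annotation): inert until the operator sets the gate secret."""
    if not env.get(GATE):
        return False, f"::warning::{GATE} is not set; skipping harness-e2e"
    return True, None


def github_output_line(run_e2e: bool) -> str:
    """Line to append to $GITHUB_OUTPUT so the harness-e2e job can branch on it."""
    return f"run_e2e={'true' if run_e2e else 'false'}\n"


def render_operator_env(env: Mapping[str, str]) -> str:
    """Hypothetical TEST_FOO -> FOO mapping for scripts/operator-workstation.env."""
    lines = []
    for name in sorted(env):
        if not name.startswith("TEST_") or name == KEY_SECRET:
            continue
        # shlex.quote keeps the file safe to `source` even if a value has spaces.
        lines.append(f"{name.removeprefix('TEST_')}={shlex.quote(env[name])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import os

    run, note = preflight(os.environ)
    if note:
        print(note)
    print(github_output_line(run), end="")
```

Keeping the gate as a pure function of the environment makes the "inert until activated" behaviour easy to reason about: no secret, no AWS call, no failure.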

Operator secrets (one-shot setup)

TEST_OIDC_AWS_ROLE_ARN (gate), TEST_ACCOUNT_ID, TEST_AWS_REGION, TEST_BROKER_HOST, TEST_{VAULT,MEMORY}_BUCKET, TEST_{VAULT,MEMORY,DATA}_ROLE_ARN, TEST_HEIMA_DEPLOYER_KEY, plus the six TEST_*_CONTRACT_ADDRESS_HEIMA secrets for the pre-deployed contracts. The full list is documented in the workflow file header (no separate docs file, per the "no new files" rule).
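The brace shorthand above expands to a concrete list of repository secrets. A small Python sketch of that expansion, useful for a preflight check that reports every missing secret at once instead of failing on the first; the helper is hypothetical, and the six contract names are taken from the later comment in this PR rather than from the workflow file.

```python
import itertools
import re
from collections.abc import Mapping

# Patterns as listed above; the six contract names come from the later PR comment.
PATTERNS = [
    "TEST_OIDC_AWS_ROLE_ARN", "TEST_ACCOUNT_ID", "TEST_AWS_REGION", "TEST_BROKER_HOST",
    "TEST_{VAULT,MEMORY}_BUCKET",
    "TEST_{VAULT,MEMORY,DATA}_ROLE_ARN",
    "TEST_HEIMA_DEPLOYER_KEY",
    "TEST_{SCOPE,SIDECAR_REGISTRY,K3_EPOCH_COUNTER,CREDENTIAL_AUDIT,"
    "P256_VERIFIER,K11_VERIFIER}_CONTRACT_ADDRESS_HEIMA",
]


def expand(pattern: str) -> list[str]:
    """Expand non-nested shell-style {a,b} groups into every combination."""
    # With a capturing group, re.split alternates literal text and group bodies.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [[p] if i % 2 == 0 else p.split(",") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


REQUIRED = [name for pattern in PATTERNS for name in expand(pattern)]


def missing(secrets: Mapping[str, str]) -> list[str]:
    """Every required secret that is absent or empty, reported all at once."""
    return [name for name in REQUIRED if not secrets.get(name)]


if __name__ == "__main__":
    import os

    # Assumes the workflow maps each secret into this step's environment.
    for name in missing(os.environ):
        print(f"::error::missing secret {name}")
```

The expansion yields 16 names in total (4 singletons, 2 buckets, 3 role ARNs, the deployer key, and 6 contract addresses).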

What did NOT land

Operator-side bring-up scripts for the test broker, IAM roles, S3 buckets, and contract deploy: by rule, this PR adds only the CI workflow. The operator follows the existing production runbook (docs/cloud-setup.md + scripts/setup-broker-host.sh --issuer-url https://test-broker.litentry.org) but substitutes the test-suffixed identifiers everywhere. Same scripts, different inputs.
Per-PR fast feedback on stage-3: stage-3's PrincipalTag isolation needs real AWS STS and a publicly reachable JWKS, so this workflow is serialized per ref. PRs that want faster feedback can rely on cargo test --workspace from the rust-checks job, which exercises the same per-data-class isolation logic against in-process mocks.

Test plan

bash -n does not apply to YAML, so .github/workflows/harness-ci.yml was checked with python3 -c 'import yaml; yaml.safe_load(...)' instead; it parses cleanly.
Single-file diff confirmed: git diff origin/claude/gallant-ride-cec4d7 --stat → 1 file changed, 260 insertions(+).
First CI run on this PR (rust-checks job) — exercises 600+ tests.
Operator activates test infra (one-shot), sets TEST_OIDC_AWS_ROLE_ARN → harness-e2e job runs end-to-end against Heima mainnet test contracts.

@claude

Two-tier CI matching issue #66's "shared test broker for CI + dev" vision:

Tier 1: ephemeral (every push/PR, fully self-contained, ~10–15 min):
* .github/workflows/harness-ci.yml: cargo fmt + clippy + test + harness/ci-ephemeral-stack.sh. No LLM, no @claude invocation.
* harness/ci-ephemeral-stack.sh: spins up anvil (new chain), runs forge build + test, deploys fresh v2 stage-1 contracts via DeployAgentKeysV1.s.sol (new contracts, from a fresh anvil-prefunded deployer), verifies them via scripts/verify-heima-contracts.sh, then stands up mock-server + agentkeys-broker-server with --skip-startup-check (StubSts path) and probes the OIDC discovery surface. An EXIT trap tears everything down.

Tier 2: long-lived test broker (nightly + workflow_dispatch; scaffolded here, operator-activated via the TEST_OIDC_AWS_ROLE_ARN secret):
* .github/workflows/harness-e2e.yml: gated workflow that targets test-broker.litentry.org with real test AWS resources and runs all three stage demos against the long-lived parallel infra. Includes nightly cleanup of stale ci/ S3 prefixes. Uses GitHub Actions OIDC (id-token: write) for AWS auth, never long-lived AWS keys.
* scripts/provision-test-environment.sh: operator-run one-shot provisioner that walks the 7 steps to stand up test-broker (separate OIDC provider, separate IAM roles, separate buckets, separate deployer wallet, fresh contracts on Heima-Paseo).
* scripts/test-environment.env.example: committed env template mirroring operator-workstation.env with -test suffixes.
* docs/test-environment.md: bring-up runbook, secret list, rotation, cleanup, and the two-tier design rationale.

WebAuthn: harness scripts default to WEBAUTHN_MODE=0 (stage-1 line 131, stage-2 --stub), so no Touch ID prompt is ever needed; --webauthn is opt-in and never passed by either workflow.

Validated locally: bash harness/ci-ephemeral-stack.sh --skip-broker passes all 8 steps (anvil up, 33 forge tests, 6 contracts deployed + verified, clean teardown). YAML + shell syntax checked.
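The tier-1 script's last step, probing the broker's OIDC discovery surface, can be sketched as follows. This minimal Python version assumes the broker serves a standard OpenID Connect Discovery document at /.well-known/openid-configuration; the actual probe in the dropped ci-ephemeral-stack.sh is not shown in this PR, so these checks are illustrative. The issuer must equal the configured issuer URL exactly, and jwks_uri must point at a key set with at least one key; those are the pieces a relying party such as AWS STS uses to validate tokens.

```python
import json
from urllib.request import urlopen


def check_discovery(doc: dict, expected_issuer: str) -> list[str]:
    """Problems with a discovery document; an empty list means it looks usable."""
    problems = []
    # OIDC Discovery requires an exact issuer match (a trailing slash counts).
    if doc.get("issuer") != expected_issuer:
        problems.append(f"issuer {doc.get('issuer')!r} != {expected_issuer!r}")
    jwks_uri = doc.get("jwks_uri")
    if not isinstance(jwks_uri, str) or not jwks_uri:
        problems.append("jwks_uri missing")
    return problems


def check_jwks(jwks: dict) -> list[str]:
    keys = jwks.get("keys")
    return [] if isinstance(keys, list) and keys else ["JWKS has no keys"]


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def probe(base_url: str, expected_issuer: str) -> list[str]:
    """Fetch the discovery document, then the JWKS it advertises."""
    doc = fetch_json(base_url.rstrip("/") + "/.well-known/openid-configuration")
    problems = check_discovery(doc, expected_issuer)
    if not problems:
        problems += check_jwks(fetch_json(doc["jwks_uri"]))
    return problems
```

Splitting the pure checks from the network fetch keeps the validation logic testable without a running broker.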

@claude

Per operator feedback:

1. "do not create new files, only add the test file": drop the ephemeral-stack helper, provisioner, env template, e2e workflow, and docs. Single deliverable: .github/workflows/harness-ci.yml.
2. "onchain solution should test on Heima mainnet with a new smart contract address": confirmed possible. Solidity compiles deterministically and EVM contract addresses derive from (deployer, nonce). Identical crates/agentkeys-chain/src/*.sol + identical DeployAgentKeysV1.s.sol + a different deployer key on Heima mainnet = an isolated parallel contract set at new addresses on the production chain.
3. "CI mirrors the production env": the workflow now invokes the PRODUCTION harness scripts (harness/v2-stage{1,2,3}-demo.sh) unchanged. The only thing CI does differently from a prod operator is materialize scripts/operator-workstation.env with TEST_* resource names from GitHub secrets:
   - TEST_OIDC_AWS_ROLE_ARN (gate; until it is set, the harness job skips)
   - TEST_ACCOUNT_ID / TEST_AWS_REGION / TEST_BROKER_HOST
   - TEST_VAULT_BUCKET / TEST_MEMORY_BUCKET
   - TEST_{VAULT,MEMORY,DATA}_ROLE_ARN
   - TEST_HEIMA_DEPLOYER_KEY (raw 0x-prefixed mainnet key for a test wallet, distinct from the prod deployer)
   - TEST_{SCOPE,SIDECAR_REGISTRY,K3_EPOCH_COUNTER,CREDENTIAL_AUDIT,P256_VERIFIER,K11_VERIFIER}_CONTRACT_ADDRESS_HEIMA (pre-deployed once per test-env refresh; the harness skips deploy via --skip-deploy so CI doesn't burn HEI on every push)

AWS auth goes through GitHub Actions OIDC (id-token: write), so no long-lived AWS credentials are stored. Per-run S3 prefix isolation. The workflow gates itself on TEST_OIDC_AWS_ROLE_ARN being set, so it stays inert until the operator activates the test infra.

WebAuthn: never invoked. Harness scripts default to WEBAUTHN_MODE=0 (stage-1 line 131), and stage-2's --stub flag is passed explicitly.

LLM: zero. Plain cargo/forge/aws-cli/curl orchestration, distinct from claude.yml + claude-code-review.yml, which intentionally do call @claude.
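Per-run S3 prefix isolation, together with the nightly cleanup of stale ci/ prefixes from the earlier tier-2 design, reduces to two small functions. In this hedged Python sketch the layout ci/<run_id>-<attempt>/ is an assumption for illustration, while GITHUB_RUN_ID and GITHUB_RUN_ATTEMPT are real GitHub Actions default variables. The cleanup selector takes (key, last-modified) pairs so it can be fed from any S3 listing without tying the sketch to a particular SDK.

```python
from collections.abc import Iterable, Mapping
from datetime import datetime, timedelta, timezone


def run_prefix(env: Mapping[str, str]) -> str:
    """Hypothetical per-run layout: ci/<run id>-<attempt>/ keeps retries apart too."""
    return f"ci/{env['GITHUB_RUN_ID']}-{env.get('GITHUB_RUN_ATTEMPT', '1')}/"


def stale_prefixes(
    listing: Iterable[tuple[str, datetime]],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> list[str]:
    """Return ci/<run>/ prefixes whose NEWEST object is older than max_age.

    Keys outside ci/ are ignored so cleanup can never touch real vault/memory data.
    """
    newest: dict[str, datetime] = {}
    for key, modified in listing:
        parts = key.split("/")
        if len(parts) < 3 or parts[0] != "ci":
            continue
        prefix = f"ci/{parts[1]}/"
        if prefix not in newest or modified > newest[prefix]:
            newest[prefix] = modified
    return sorted(p for p, ts in newest.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [("ci/7-1/vault/a", now - timedelta(days=3)), ("vault/real", now - timedelta(days=30))]
    print(run_prefix({"GITHUB_RUN_ID": "7"}), stale_prefixes(demo, now))  # ci/7-1/ ['ci/7-1/']
```

Judging staleness by a prefix's newest object avoids deleting a run that is still writing. An S3 lifecycle expiration rule filtered on the ci/ prefix is a zero-code alternative, at day granularity.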

…ima}.sh

Per operator request: pivot cloud-setup.md from a verbose manual-bash runbook to a concise prereq/script-pointer split, add new heima-setup.md + ci-setup.md for the chain + CI flows, and move troubleshooting into the ./wiki/ folder.

What changed:

docs/cloud-setup.md (UPDATE, 970 → 314 lines)
- Add a TL;DR with the three-command operator flow: manual §1-§4 prereqs, then setup-broker-host.sh, then setup-heima.sh.
- Slim §1-§4 to invariants, helper-script pointers, and brief command blocks (DKIM bulk records, the receipt rule, and per-data-class role provisioning all delegate to the existing scripts/*.sh).
- Replace the verbose §5/§6/§7 (EC2 broker, signer, workers; each with 100+ lines of inline bash) with one §5 "Run setup-broker-host.sh" section that names what the script does (build, systemd, nginx, certbot, keypairs, env files) and what it doesn't (DNS, IAM, OIDC provider; those stay in §1-§4).
- Keep §0 (identities table) and §6 (cleanup recipe).

docs/heima-setup.md (NEW, 106 lines)
- The 15-step pipeline in scripts/setup-heima.sh, with an idempotency check and a helper-script pointer per step.
- Mainnet vs. Paseo vs. Anvil tradeoff table.
- Per-step re-run examples.
- Explanation of the Heima London EVM pin.

docs/ci-setup.md (NEW, 184 lines)
- The 7-step operator bring-up for the no-LLM .github/workflows/harness-ci.yml workflow: provision the test broker via setup-broker-host.sh with a -test suffix, provision parallel AWS resources, register the test OIDC provider, generate + fund the test deployer wallet, deploy fresh test contracts on Heima mainnet with the same .sol source (different deployer → different addresses → isolated parallel contract set), register the GitHub Actions OIDC role, and set the repo secrets.
- Includes the full TEST_* secret list, manual-dispatch instructions, and a secret-hygiene reminder.

wiki/cloud-setup-faq.md (NEW, 94 lines), wiki/heima-setup-faq.md (NEW, 111 lines), wiki/ci-setup-faq.md (NEW, 96 lines)
- Troubleshooting + edge cases for each setup doc. These live under ./wiki/ per the CLAUDE.md "Wiki-location policy" and are auto-published to the GitHub wiki on every push to main.

Constraints applied:
- Concise: every doc fits in a few screens.
- Idempotent: every flow reuses the existing idempotent helper scripts (setup-broker-host.sh, setup-heima.sh, provision-*-role.sh, apply-*-bucket-policy.sh).
- No project credentials exposed: account IDs, role ARNs, bucket names, deployer keys, and contract addresses are all referenced via ${ACCOUNT_ID} / ${BROKER_HOST} / ${REGION} placeholders or via "read from operator-workstation.env" / "from step N" pointers. Real values live only in the operator's local env file + the GitHub repo secrets store.

All internal links verified via a python url-walker (every relative link resolves to an existing file).
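The "python url-walker" itself is not part of the PR. A minimal sketch of such a checker, assuming plain inline markdown links of the form [text](target): it skips absolute URLs and pure #anchors, drops any #fragment, and resolves the rest relative to the linking file (or to the repo root for a leading "/", which is how GitHub renders those).

```python
import re
from pathlib import Path

# Inline markdown links/images: [text](target) or [text](target "title").
LINK = re.compile(r'\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # http:, https:, mailto:, ...


def broken_links(root: Path) -> list[tuple[str, str]]:
    """Return (markdown file, target) pairs whose relative target does not exist."""
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            if SCHEME.match(target) or target.startswith("#"):
                continue  # external URL or in-page anchor: out of scope here
            path = target.split("#", 1)[0]
            # A leading "/" is treated as repo-root relative.
            resolved = root / path.lstrip("/") if path.startswith("/") else md.parent / path
            if not resolved.exists():
                broken.append((md.relative_to(root).as_posix(), target))
    return broken
```

Calling broken_links(Path(".")) from the repo root and failing the step on a non-empty result gives the same pass/fail signal described above; anchors inside other files are not validated.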

hanwencheng · 2026-05-21T02:06:21Z

Added concise setup docs aligned with the existing idempotent scripts (commit 5a66a85):

Docs (concise, idempotent, no project credentials exposed):

| Doc | Lines | What it covers |
| --- | --- | --- |
| docs/cloud-setup.md (updated, 970 → 314) | 314 | TL;DR + §1–§4 prereqs (DNS, SES, IAM, OIDC) + §5 "Run `setup-broker-host.sh`" + §6 cleanup. The script is named as the single entry point per the CLAUDE.md "Remote broker host" rule. |
| docs/heima-setup.md (new) | 106 | The 15-step `setup-heima.sh` pipeline (from #95), with per-step idempotency check + helper-script pointer. Mainnet / Paseo / Anvil tradeoff table. |
| docs/ci-setup.md (new) | 184 | Operator bring-up for the harness-ci.yml workflow: 7 steps, then the `TEST_*` secret list. Confirms the "same `.sol` source, different deployer → isolated parallel contracts on Heima mainnet" story. |

FAQ wiki pages (independent, per the request):

wiki/cloud-setup-faq.md — 94 lines
wiki/heima-setup-faq.md — 111 lines
wiki/ci-setup-faq.md — 96 lines

Secret hygiene: every account ID / role ARN / bucket name / deployer key / contract address in the docs is a ${PLACEHOLDER} or a "read from operator-workstation.env" pointer. Real values live only in the operator's local env file and the GitHub repo secrets. The lone hex address in the heima FAQ (anvil's first-nonce P256Verifier deploy) is deterministic public knowledge: anvil ships with a fixed default mnemonic for reproducible test setups, so its default deployer's first CREATE address is the same on every fresh anvil chain.
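The grep-based hygiene check can be approximated in a few lines of Python. This is a hedged sketch, not the check that was actually run: the three patterns (12-digit AWS account IDs, 32-byte hex strings shaped like private keys, 20-byte hex addresses) are assumptions about what counts as sensitive. ${PLACEHOLDER} forms pass untouched, and deliberate exceptions such as the FAQ's public anvil address go in the `allow` set.

```python
import re
from collections.abc import Iterable

# Shapes that should never appear literally in committed docs (assumed list).
PATTERNS = {
    "aws-account-id": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    "hex-32-bytes": re.compile(r"\b0x[0-9a-fA-F]{64}\b"),  # private-key sized
    "evm-address": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
}


def findings(text: str, allow: Iterable[str] = ()) -> list[tuple[int, str, str]]:
    """(line number, label, match) for every sensitive-looking literal not allowed."""
    allowed = {a.lower() for a in allow}
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, rx in PATTERNS.items():
            for m in rx.finditer(line):
                if m.group(0).lower() not in allowed:
                    hits.append((lineno, label, m.group(0)))
    return hits
```

Word boundaries keep the 40-hex pattern from firing inside a 64-hex key, and the allowlist is compared case-insensitively because EIP-55 checksummed addresses mix cases.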

Net doc diff: -839 / +782 lines (cloud-setup.md alone shrank from 970 to 314 lines by collapsing inline bash into script pointers).

Per operator request: the very-beginning cloud-account provisioning (IAM users + role, DNS, SES, S3 buckets, instance profile) needs to live in a separate doc so it stays reachable when:
- Adding a second AWS account (test instance, regional shard)
- Migrating to AliCloud / GCP / Tencent Cloud
- Re-bootstrapping after a teardown
- Auditing the identity surface

The previous condense pass collapsed those sections into cloud-setup.md's slim §1-§3. That is convenient for day-to-day operators but stripped the depth needed for the migration / second-account use cases.

What changed:

docs/cloud-bootstrap.md (NEW, 365 lines): first-time, per-account, cloud-provider-portable bootstrap doc.
- §1 Identities: four IAM principals, cloud-agnostic
- §2 Domain + DNS: subdomain map, parent-zone confirmation
- §3 Email backend: SES domain verification + receipt rule + inbound S3 bucket creation
- §4 IAM users + roles: agentkeys-daemon + agentkeys-data-role + per-data-class vault/memory roles
- §5 Initial bucket policy: static-IAM variant (pre-OIDC)
- §6 Instance profile: agentkeys-broker-host (EC2 optional)
- §7 Security audit: strip legacy over-broad attached policies (`AmazonS3FullAccess` checklist from the pre-condense §3.4a)
- §8 Cloud-provider port: AWS / AliCloud / GCP / Tencent Cloud 1:1 mapping table + migration playbook

It restores the operational depth (DKIM bulk-record bash, daemon user creation, role trust shape, broker-host instance profile, security audit) that the previous condense pass removed, and adds the portability framing (concept first, AWS-specific commands as ONE implementation) so the doc is the durable reference for non-AWS deployments.

docs/cloud-setup.md (UPDATE, 314 → 202 lines): now covers only what comes AFTER bootstrap, namely OIDC federation activation (§1, was §4), the setup-broker-host.sh runtime entry point (§2, was §5), and cleanup (§3, was §6). The duplicate §1-§3 prereqs are dropped, a clear cross-ref to cloud-bootstrap.md is added at the top, and the sections are renumbered.

wiki/cloud-setup-faq.md (minor header tweak): the FAQ now covers both cloud-bootstrap.md and cloud-setup.md, since operators hit the same gotchas in both phases.

Constraints applied:
- Concise: every doc still fits in a few screens (bootstrap is longest at 365 lines because it carries the actual provisioning commands; cloud-setup.md is now 202 lines, down from 970 originally).
- Idempotent: every flow uses the existing idempotent helper scripts.
- No project credentials exposed: same placeholder convention as the prior pass (${ACCOUNT_ID}, ${ZONE}, etc.), verified via grep.

All internal links verified (python url-walker).

WildmetaAgent added 2 commits May 21, 2026 09:33

hanwencheng force-pushed the claude/romantic-ardinghelli-34d7a7 branch from bbe72b8 to cd25bde May 21, 2026 01:34

hanwencheng changed the base branch from main to claude/gallant-ride-cec4d7 May 21, 2026 01:34

hanwencheng added the documentation Improvements or additions to documentation label May 21, 2026

hanwencheng wants to merge 4 commits into claude/gallant-ride-cec4d7 from claude/romantic-ardinghelli-34d7a7
