issue #66: add no-LLM CI (ephemeral anvil tier-1 + scaffolded test-broker tier-2) #98

Open: hanwencheng wants to merge 4 commits into claude/gallant-ride-cec4d7 from claude/romantic-ardinghelli-34d7a7

@hanwencheng (Member) commented May 21, 2026

Closes #66.

Builds on top of #95 (ERC-7730 + EIP-712 typed-data signing). Will need to re-target main once #95 lands.

One file, no scaffolding

Per operator feedback, this PR adds exactly one file: .github/workflows/harness-ci.yml. It invokes the PRODUCTION harness scripts (harness/v2-stage{1,2,3}-demo.sh) unchanged. The only delta from a prod operator's invocation is that scripts/operator-workstation.env is materialized at CI time with TEST resource names from GitHub secrets.
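The materialization step can be pictured roughly as follows. This is a minimal sketch, not the workflow's actual step: the env-file keys (`ACCOUNT_ID`, `REGION`, `BROKER_HOST`) and the helpers `render_env` / `materialize` are assumptions for illustration, and only the `TEST_*` secret names are taken from this PR.

```python
import os

# Hypothetical mapping: env-file key -> GitHub secret exposed as an env var.
# The secret names come from this PR; the env-file keys are assumed.
SECRET_MAP = {
    "ACCOUNT_ID": "TEST_ACCOUNT_ID",
    "REGION": "TEST_AWS_REGION",
    "BROKER_HOST": "TEST_BROKER_HOST",
}

def render_env(environ, secret_map=SECRET_MAP):
    """Return KEY=VALUE lines; fail loudly if a secret is unset or empty."""
    lines = []
    for key, secret in secret_map.items():
        value = environ.get(secret, "")
        if not value:
            raise KeyError(f"missing secret: {secret}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

def materialize(path, environ=os.environ):
    """Overwrite the env file at `path` with the test resource names."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(render_env(environ))
```

Failing on an empty secret, rather than writing an empty value, keeps a half-configured test environment from silently falling back to whatever the production scripts default to.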

Mirroring production on Heima mainnet — answer to "is this possible with identical .sol files?"

Yes. EVM contract addresses derive from (deployer_address, nonce) for CREATE, or from (deployer_address, salt, init-code hash) for CREATE2, and Solidity → bytecode compilation is deterministic for a fixed compiler version and settings. The identical crates/agentkeys-chain/src/*.sol files, compiled by the identical DeployAgentKeysV1.s.sol script and broadcast by a different deployer wallet on Heima mainnet, produce a parallel set of contracts at new addresses on the production chain. Same code, same chain, isolated storage: the test contracts can't see or write to prod contract state.
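To make the (deployer_address, nonce) rule concrete, here is a dependency-free sketch of the standard CREATE derivation: the address is the last 20 bytes of keccak256(rlp([deployer, nonce])). It assumes Heima follows standard EVM CREATE semantics; the Keccak-256 routine is a compact textbook implementation written out only so the example is self-contained (a real tool would use an audited library), and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1

def _rol64(a, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64

def _keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant, generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def keccak256(data):
    """Keccak-256 as used by the EVM (padding byte 0x01, not SHA3-256's 0x06)."""
    rate = 136  # bytes absorbed per block (1088-bit rate, 512-bit capacity)
    padded = bytearray(data) + b"\x01" + b"\x00" * (rate - 1 - len(data) % rate)
    padded[-1] |= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = _keccak_f1600(state)
    return bytes(state[:32])

def _rlp_bytes(b):
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b

def create_address(deployer, nonce):
    """Address of a contract deployed via CREATE by `deployer` at `nonce`."""
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")  # 0 -> b""
    payload = _rlp_bytes(deployer) + _rlp_bytes(nonce_bytes)
    return keccak256(bytes([0xC0 + len(payload)]) + payload)[12:]
```

Calling `create_address` with two different 20-byte deployer addresses and the same nonce yields two different contract addresses, which is why a separate test deployer wallet is enough to get an isolated contract set from unchanged sources.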

The deploy is one-shot per test-environment refresh (operator action), not per CI run — the test contract addresses are pinned in GitHub secrets so CI doesn't burn HEI on every push. To re-deploy (e.g., after a contract revision), the operator funds the test wallet, runs AGENTKEYS_CHAIN=heima HEIMA_DEPLOYER_KEY_FILE=~/.agentkeys/heima-deployer-test.key bash scripts/heima-bring-up.sh, then updates the TEST_*_HEIMA secrets.

What the workflow does

  1. rust-checks: cargo fmt --check, cargo clippy --workspace -- -D warnings, cargo test --workspace -- --test-threads=1. Covers ~600 tests including all the in-process broker integration tests that already mock STS + SES.
  2. preflight — gates the E2E job on TEST_OIDC_AWS_ROLE_ARN being set. Until the operator activates the test infra, the harness job is a clean ::warning:: skip.
  3. harness-e2e — assumes the test IAM role via GitHub Actions OIDC (no long-lived secrets), writes the test deployer key, overwrites scripts/operator-workstation.env with test resource names, then runs harness/v2-stage1-demo.sh --skip-deploy --skip-email, harness/v2-stage2-demo.sh --stub --skip-build, harness/v2-stage3-demo.sh — the unmodified production scripts.
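The preflight gate in step 2 amounts to a single presence check. The sketch below shows one way to express it; the output name `run_e2e` and the function names are assumptions, while `::warning::` and the `GITHUB_OUTPUT` file are standard GitHub Actions mechanisms.

```python
import os

def preflight(environ):
    """Return (enabled, annotation); annotation is None when the gate is open."""
    enabled = bool(environ.get("TEST_OIDC_AWS_ROLE_ARN", "").strip())
    annotation = None
    if not enabled:
        annotation = "::warning::TEST_OIDC_AWS_ROLE_ARN is not set; skipping harness-e2e"
    return enabled, annotation

def main(environ=os.environ):
    """Print the annotation (if any) and record the decision as a step output."""
    enabled, annotation = preflight(environ)
    if annotation:
        print(annotation)
    out_path = environ.get("GITHUB_OUTPUT")
    if out_path:
        # `run_e2e` is a hypothetical output name a later job could gate on.
        with open(out_path, "a", encoding="utf-8") as fh:
            fh.write(f"run_e2e={'true' if enabled else 'false'}\n")
    return 0
```

Emitting a warning and exiting 0, instead of failing, is what keeps the workflow green and inert until the operator sets the secret.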

Operator secrets (one-shot setup)

TEST_OIDC_AWS_ROLE_ARN (gate), TEST_ACCOUNT_ID, TEST_AWS_REGION, TEST_BROKER_HOST, TEST_{VAULT,MEMORY}_BUCKET, TEST_{VAULT,MEMORY,DATA}_ROLE_ARN, TEST_HEIMA_DEPLOYER_KEY, plus the six TEST_*_CONTRACT_ADDRESS_HEIMA for pre-deployed contracts. Full list documented in the workflow file header (no separate docs file per the "no new files" rule).
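For reference, the brace shorthand above expands to 16 secret names. The sketch below expands it and reports which names are unset; the six contract names are the ones listed in this PR's commit message, and the helper names are illustrative.

```python
BUCKET_CLASSES = ("VAULT", "MEMORY")
ROLE_CLASSES = ("VAULT", "MEMORY", "DATA")
CONTRACTS = (
    "SCOPE", "SIDECAR_REGISTRY", "K3_EPOCH_COUNTER",
    "CREDENTIAL_AUDIT", "P256_VERIFIER", "K11_VERIFIER",
)

def required_secrets():
    """Expand the shorthand secret list into concrete names."""
    names = ["TEST_OIDC_AWS_ROLE_ARN", "TEST_ACCOUNT_ID",
             "TEST_AWS_REGION", "TEST_BROKER_HOST"]
    names += [f"TEST_{c}_BUCKET" for c in BUCKET_CLASSES]
    names += [f"TEST_{c}_ROLE_ARN" for c in ROLE_CLASSES]
    names.append("TEST_HEIMA_DEPLOYER_KEY")
    names += [f"TEST_{c}_CONTRACT_ADDRESS_HEIMA" for c in CONTRACTS]
    return names

def missing_secrets(environ):
    """Return the required names that are unset or empty in `environ`."""
    return [name for name in required_secrets() if not environ.get(name)]
```

Running `missing_secrets(os.environ)` at the top of the E2E job would turn a forgotten secret into one readable error instead of a failure deep inside a stage script.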

What did NOT land

  • Operator-side bring-up scripts for the test broker / IAM roles / S3 buckets / contract deploy — by rule, this PR adds only the CI workflow. The operator follows the existing production runbook (docs/cloud-setup.md + scripts/setup-broker-host.sh --issuer-url https://test-broker.litentry.org) but substitutes the test-suffixed identifiers everywhere. Same scripts, different inputs.
  • Per-PR fast feedback on stage-3 — stage-3's PrincipalTag isolation needs real AWS STS + a publicly-reachable JWKS, so this workflow is serialized per-ref. PRs that want faster feedback can rely on cargo test --workspace from the rust-checks job (which exercises the same per-data-class isolation logic against in-process mocks).

Test plan

  • bash -n .github/workflows/harness-ci.yml is not applicable to YAML; python3 -c 'import yaml; yaml.safe_load(...)' parses clean.
  • Single-file diff confirmed: git diff origin/claude/gallant-ride-cec4d7 --stat → 1 file changed, 260 insertions(+).
  • First CI run on this PR (rust-checks job) — exercises 600+ tests.
  • Operator activates test infra (one-shot), sets TEST_OIDC_AWS_ROLE_ARN → harness-e2e job runs end-to-end against Heima mainnet test contracts.

Two-tier CI matching issue #66's "shared test broker for CI + dev" vision:

  Tier 1 — ephemeral (every push/PR, fully self-contained, ~10–15 min):
    * .github/workflows/harness-ci.yml — cargo fmt + clippy + test +
      harness/ci-ephemeral-stack.sh. No LLM, no @claude invocation.
    * harness/ci-ephemeral-stack.sh — spins up anvil (new chain), runs
      forge build + test, deploys fresh v2 stage-1 contracts via
      DeployAgentKeysV1.s.sol (new contracts, new anvil-prefunded
      deployer), verifies via scripts/verify-heima-contracts.sh, then
      stands up mock-server + agentkeys-broker-server with
      --skip-startup-check (StubSts path) and probes OIDC discovery
      surface. EXIT trap tears everything down.
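The "EXIT trap tears everything down" pattern translates to a context manager in Python. This is only an illustration of the teardown guarantee, not a port of harness/ci-ephemeral-stack.sh: the helper names are hypothetical, and the commands passed in would be whatever starts anvil, the mock server, and the broker.

```python
import contextlib
import subprocess
import sys

@contextlib.contextmanager
def background(cmd):
    """Run `cmd` in the background and always stop it on exit."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the process ignores the first signal
            proc.wait()

def run_stack(commands, body):
    """Start each command in order, run `body`, then stop all in reverse order."""
    with contextlib.ExitStack() as stack:
        procs = [stack.enter_context(background(cmd)) for cmd in commands]
        return body(procs)

# Stand-in for a long-running service; a real run would launch the actual binaries.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Because `ExitStack` unwinds on exceptions as well as on normal return, a failed probe inside `body` still leaves no stray processes behind, which is the same property the shell trap provides.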

  Tier 2 — long-lived test broker (nightly + workflow_dispatch, scaffolded
  here, operator-activated via TEST_OIDC_AWS_ROLE_ARN secret):
    * .github/workflows/harness-e2e.yml — gated workflow that targets
      test-broker.litentry.org with real test AWS resources, runs all
      three stage demos against the long-lived parallel infra. Includes
      nightly cleanup of stale ci/ S3 prefixes. Uses GitHub Actions
      OIDC (id-token: write) for AWS auth, never long-lived secrets.
    * scripts/provision-test-environment.sh — operator-run one-shot
      provisioner that walks the 7 steps to stand up test-broker
      (separate OIDC provider, separate IAM roles, separate buckets,
      separate deployer wallet, fresh contracts on Heima-Paseo).
    * scripts/test-environment.env.example — committed env template
      mirroring operator-workstation.env with -test suffixes.
    * docs/test-environment.md — bring-up runbook, secret list,
      rotation, cleanup, and the two-tier design rationale.
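The nightly cleanup of stale ci/ prefixes mentioned for the Tier 2 workflow reduces to an age filter. A minimal sketch, assuming the caller has already listed each prefix with its most recent modification time; the 7-day retention window and the function name are assumptions, not values from the workflow.

```python
from datetime import datetime, timedelta, timezone

def stale_prefixes(last_modified, now, max_age=timedelta(days=7)):
    """Return ci/ prefixes whose newest object is older than `max_age`.

    `last_modified` maps an S3 prefix to a timezone-aware datetime.
    """
    cutoff = now - max_age
    return sorted(
        prefix for prefix, stamp in last_modified.items()
        if prefix.startswith("ci/") and stamp < cutoff
    )
```

Restricting the filter to keys under `ci/` means a bug in the listing step cannot mark non-CI data for deletion.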

WebAuthn: harness scripts default to WEBAUTHN_MODE=0 (stage-1 line 131,
stage-2 --stub) so no Touch ID prompt is ever needed; --webauthn is
opt-in and never passed by either workflow.

Validated locally: bash harness/ci-ephemeral-stack.sh --skip-broker
passes all 8 steps (anvil up, 33 forge tests, 6 contracts deployed +
verified, clean teardown). YAML + shell syntax checked.

Per operator feedback:

1. "do not create new files, only add the test file" — drop the
   ephemeral-stack helper, provisioner, env template, e2e workflow,
   and docs. Single deliverable: .github/workflows/harness-ci.yml.

2. "onchain solution should test on Heima mainnet with a new smart
   contract address" — confirmed possible: Solidity compiles
   deterministically and EVM contract addresses derive from
   (deployer, nonce). Identical crates/agentkeys-chain/src/*.sol +
   identical DeployAgentKeysV1.s.sol + a different deployer key on
   Heima mainnet = isolated parallel contract set at new addresses on
   the production chain.

3. "CI mirrors the production env" — the workflow now invokes the
   PRODUCTION harness scripts (harness/v2-stage{1,2,3}-demo.sh)
   unchanged. The only thing CI does differently from a prod operator
   is materialize scripts/operator-workstation.env with TEST_*
   resource names from GitHub secrets:

     - TEST_OIDC_AWS_ROLE_ARN  (gate; until set, harness job skips)
     - TEST_ACCOUNT_ID / TEST_AWS_REGION / TEST_BROKER_HOST
     - TEST_VAULT_BUCKET / TEST_MEMORY_BUCKET
     - TEST_{VAULT,MEMORY,DATA}_ROLE_ARN
     - TEST_HEIMA_DEPLOYER_KEY  (raw 0x-prefixed mainnet key — test
                                 wallet, distinct from prod deployer)
     - TEST_{SCOPE,SIDECAR_REGISTRY,K3_EPOCH_COUNTER,
            CREDENTIAL_AUDIT,P256_VERIFIER,K11_VERIFIER}_CONTRACT_ADDRESS_HEIMA
       (pre-deployed once per test-env refresh; harness skips deploy
        via --skip-deploy so CI doesn't burn HEI on every push)

   AWS auth via GitHub Actions OIDC (id-token: write), no long-lived
   secrets. Per-run S3 prefix isolation. The workflow gates itself on
   TEST_OIDC_AWS_ROLE_ARN being set so it's inert until the operator
   activates the test infra.
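Per-run S3 prefix isolation can be derived from identifiers GitHub Actions already exports. The sketch below assumes a `ci/<run_id>-<attempt>/` layout, which is a guess at the format rather than something this PR specifies; `GITHUB_RUN_ID` and `GITHUB_RUN_ATTEMPT` are standard runner environment variables.

```python
def run_prefix(environ):
    """Build a prefix unique to this workflow run and attempt."""
    run_id = environ["GITHUB_RUN_ID"]
    attempt = environ.get("GITHUB_RUN_ATTEMPT", "1")
    return f"ci/{run_id}-{attempt}/"

def object_key(environ, name):
    """Place `name` under the run's prefix, ignoring any leading slash."""
    return run_prefix(environ) + name.lstrip("/")
```

Including the attempt number keeps a re-run of the same workflow from reading objects left behind by the failed first attempt.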

WebAuthn: never invoked — harness scripts default to WEBAUTHN_MODE=0
(stage-1 line 131) and stage-2's --stub flag is passed explicitly.

LLM: zero. Plain cargo/forge/aws-cli/curl orchestration. Distinct from
claude.yml + claude-code-review.yml which intentionally do call @claude.
@hanwencheng force-pushed the claude/romantic-ardinghelli-34d7a7 branch from bbe72b8 to cd25bde, May 21, 2026 01:34
@hanwencheng changed the base branch from main to claude/gallant-ride-cec4d7, May 21, 2026 01:34
…ima}.sh

Per operator request: pivot cloud-setup.md from a verbose manual-bash
runbook to a concise prereq/script-pointer split, add new heima-setup.md
+ ci-setup.md for the chain + CI flows, and move troubleshooting into
the ./wiki/ folder.

What changed:

  docs/cloud-setup.md  — UPDATE, 970 → 314 lines
    Add a TL;DR with the three-command operator flow (manual §1-§4
    prereqs, then setup-broker-host.sh, then setup-heima.sh). Slim
    §1-§4 to invariants + helper-script pointers + brief command
    blocks (DKIM bulk-record / receipt rule / per-data-class role
    provisioning all delegate to the existing scripts/*.sh). Replace
    the verbose §5/§6/§7 (EC2 broker / signer / workers, each with
    100+ lines of inline bash) with one §5 "Run setup-broker-host.sh"
    section that names what the script does (build, systemd, nginx,
    certbot, keypairs, env files) + what it doesn't (DNS, IAM, OIDC
    provider — those stay in §1-§4). Keep §0 (identities table) and
    §6 (cleanup recipe).

  docs/heima-setup.md  — NEW, 106 lines
    The 15-step pipeline in scripts/setup-heima.sh, with idempotency
    check + helper-script pointer per step. Mainnet vs Paseo vs Anvil
    tradeoff table. Per-step re-run examples. Heima London EVM pin
    explanation.

  docs/ci-setup.md  — NEW, 184 lines
    The 7-step operator bring-up for the no-LLM
    .github/workflows/harness-ci.yml workflow: provision test broker
    via setup-broker-host.sh with -test suffix, provision parallel
    AWS resources, register the test OIDC provider, generate + fund
    the test deployer wallet, deploy fresh test contracts on Heima
    mainnet with the same .sol source (different deployer →
    different addresses → isolated parallel contract set), register
    the GitHub Actions OIDC role, set the repo secrets. Includes
    the full TEST_* secret list, manual-dispatch instructions, and
    a secret-hygiene reminder.

  wiki/cloud-setup-faq.md     — NEW, 94 lines
  wiki/heima-setup-faq.md     — NEW, 111 lines
  wiki/ci-setup-faq.md        — NEW, 96 lines
    Troubleshooting + edge cases for each setup doc. Lives under
    ./wiki/ per CLAUDE.md "Wiki-location policy" — auto-published
    to the GitHub wiki on every push to main.

Constraints applied:

  - Concise: every doc fits in a few screens.
  - Idempotent: every flow reuses the existing idempotent helper
    scripts (setup-broker-host.sh, setup-heima.sh, provision-*-role.sh,
    apply-*-bucket-policy.sh).
  - No project credentials exposed: account IDs, role ARNs, bucket
    names, deployer keys, contract addresses all referenced via
    ${ACCOUNT_ID} / ${BROKER_HOST} / ${REGION} placeholders or via
    "read from operator-workstation.env" / "from step N" pointers.
    Real values live only in the operator's local env file + the
    GitHub repo secrets store.
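The idempotency constraint in the list above (a check before each step, as in the per-step idempotency checks described for setup-heima.sh) follows a check-then-apply shape. A minimal sketch of that shape, with hypothetical `Step` and `run_steps` names; the real scripts are shell and may structure this differently.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    is_done: Callable[[], bool]   # cheap probe: is the resource already in place?
    apply: Callable[[], None]     # side-effecting action, run only when needed

def run_steps(steps: List[Step]) -> List[str]:
    """Apply only the steps that are not already done; return their names."""
    applied = []
    for step in steps:
        if step.is_done():
            continue
        step.apply()
        if not step.is_done():
            raise RuntimeError(f"step {step.name!r} did not converge")
        applied.append(step.name)
    return applied
```

Re-running the same list after a successful pass returns an empty list, which is the property that makes per-step re-runs safe.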

All internal links verified via a python url-walker (every relative
link resolves to an existing file).
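A link check of this kind fits in a few lines. The sketch below is not the walker used for this PR (which is not included in the diff); it assumes Markdown inline links of the form `[text](target)`, skips absolute URLs and same-page anchors, and resolves everything else relative to the file that contains the link.

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")

def broken_links(root):
    """Return (file, target) pairs for relative links that do not resolve."""
    root = Path(root)
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # absolute URL (https:, mailto:) or same-page anchor
            path = target.split("#", 1)[0]
            if not (md.parent / path).exists():
                broken.append((md.relative_to(root).as_posix(), target))
    return broken
```

An empty return value from `broken_links(".")` at the repository root corresponds to the "every relative link resolves" claim above.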
@hanwencheng hanwencheng added the documentation Improvements or additions to documentation label May 21, 2026
@hanwencheng
Member Author

Added concise setup docs aligned with the existing idempotent scripts (commit 5a66a85):

Docs (concise, idempotent, no project credentials exposed):

| Doc | Lines | What it covers |
| --- | --- | --- |
| docs/cloud-setup.md (updated, 970 → 314) | 314 | TL;DR + §1–§4 prereqs (DNS, SES, IAM, OIDC) + §5 "Run setup-broker-host.sh" + §6 cleanup. The script is named as the single entry point per CLAUDE.md "Remote broker host" rule. |
| docs/heima-setup.md (new) | 106 | The 15-step setup-heima.sh pipeline (from #95), with per-step idempotency check + helper-script pointer. Mainnet / Paseo / Anvil tradeoff table. |
| docs/ci-setup.md (new) | 184 | Operator bring-up for the harness-ci.yml workflow: 7 steps, then the TEST_* secret list. Confirms the "same .sol source, different deployer → isolated parallel contracts on Heima mainnet" story. |

FAQ wiki pages (independent, per the request):

Secret hygiene: every account ID / role ARN / bucket name / deployer key / contract address in the docs is a ${PLACEHOLDER} or a "read from operator-workstation.env" pointer. Real values live only in the operator's local env file + GitHub repo secrets. The lone hex address in the heima FAQ (anvil's first-nonce P256Verifier deploy) is deterministic public knowledge — anvil ships with a fixed default mnemonic for reproducible test setups.

Net doc diff: -839 / +782 lines (cloud-setup.md alone shrunk 970 → 314 by collapsing inline bash into script pointers).

Per operator request: the very-beginning cloud-account provisioning
(IAM users + role, DNS, SES, S3 buckets, instance profile) needs to
live in a separate doc so it stays reachable when:

  - Adding a second AWS account (test instance, regional shard)
  - Migrating to AliCloud / GCP / Tencent Cloud
  - Re-bootstrapping after a teardown
  - Auditing the identity surface

The previous condense pass collapsed those sections into cloud-setup.md's
slim §1-§3 — convenient for day-to-day operators but stripped the depth
needed for the migration / second-account use cases.

What changed:

  docs/cloud-bootstrap.md  — NEW, 365 lines
    First-time, per-account, cloud-provider-portable bootstrap doc:

      §1  Identities             — four IAM principals, cloud-agnostic
      §2  Domain + DNS           — subdomain map, parent-zone confirm
      §3  Email backend          — SES domain verify + receipt rule +
                                    inbound S3 bucket creation
      §4  IAM users + roles      — agentkeys-daemon + agentkeys-data-role +
                                    per-data-class vault/memory roles
      §5  Initial bucket policy  — static-IAM variant (pre-OIDC)
      §6  Instance profile       — agentkeys-broker-host (EC2 optional)
      §7  Security audit         — strip legacy over-broad attached policies
                                    (`AmazonS3FullAccess` checklist from the
                                    pre-condense §3.4a)
      §8  Cloud-provider port    — AWS / AliCloud / GCP / Tencent Cloud
                                    1:1 mapping table + migration playbook

    Restores the operational depth (DKIM bulk-record bash, daemon user
    create, role trust shape, broker-host instance profile, security
    audit) that the previous condense pass removed. Adds the portability
    framing (concept first, AWS-specific commands as ONE implementation)
    so the doc is the durable reference for non-AWS deployments.

  docs/cloud-setup.md  — UPDATE, 314 → 202 lines
    Refocus on what comes AFTER bootstrap: OIDC federation activation
    (§1, was §4) + the setup-broker-host.sh runtime entry point (§2,
    was §5) + cleanup (§3, was §6). Drop the duplicate §1-§3 prereqs;
    add a clear cross-ref to cloud-bootstrap.md at the top. Section
    numbers renumbered.

  wiki/cloud-setup-faq.md — minor header tweak
    The FAQ now covers both cloud-bootstrap.md and cloud-setup.md
    (operators hit the same gotchas across both phases).

Constraints applied:

  - Concise: every doc still fits in a few screens (bootstrap is
    longest at 365 lines because it carries the actual provisioning
    commands; cloud-setup.md is now 202 lines, down from 970 originally).
  - Idempotent: every flow uses the existing idempotent helper scripts.
  - No project credentials exposed: same placeholder convention as the
    prior pass (${ACCOUNT_ID}, ${ZONE}, etc.). Verified via grep.
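The grep check can be approximated with three patterns. This sketch is an assumption about what such a scan looks for, not the command that was actually run: 12-digit AWS account IDs, 32-byte hex private keys, and 20-byte hex EVM addresses. A known-public value such as the anvil address mentioned in the heima FAQ would need an allowlist on top.

```python
import re

PATTERNS = {
    "aws_account_id": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    "private_key": re.compile(r"0x[0-9a-fA-F]{64}(?![0-9a-fA-F])"),
    "evm_address": re.compile(r"0x[0-9a-fA-F]{40}(?![0-9a-fA-F])"),
}

def scan(text):
    """Return (line_number, label, match) for every literal that looks sensitive."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits
```

Placeholders such as `${ACCOUNT_ID}` pass cleanly because they contain no digit or hex runs of the targeted lengths.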

All internal links verified (python url-walker).
Development

Successfully merging this pull request may close these issues.

Stage 7: shared test broker (test-broker.litentry.org) for CI + dev — parallel infra, isolated AWS resources