Migrate from policyengine v0.x to v4 (prerequisite for TRACE TRO emission) #3487

Draft
MaxGhenis wants to merge 2 commits into `master` from `migrate-to-policyengine-v4`

@MaxGhenis
Collaborator

Draft. Requires Modal simulation-worker coordination before merge.

Summary

  • Bumps `policyengine` from `>0.12.0,<1` to `>=4.3.1,<5`; `policyengine_us` 1.634.9 → 1.653.3; `policyengine_uk` 2.78.0 → 2.88.0.
  • Removes the two pre-v4 imports (`from policyengine.simulation import SimulationOptions` and `from policyengine.utils.data.datasets import get_default_dataset`) in favor of api-local equivalents in a new `policyengine_api.libs.simulation_types` module.
  • The new `SimulationOptions` is a Pydantic model matching the Modal worker's existing JSON wire contract (country, scope, reform, baseline, time_period, include_cliffs, region, data, model_version, data_version). Owning the type api-side decouples the api from pe.py's internal class layout.
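  • The new `get_default_dataset` returns the default dataset URI for a (country, region) pair, porting the pre-v4 GCS-URI contract (`gs://policyengine-{country}-data{-private}/...`). As of the second commit, US state, congressional-district, and place regions route to per-state and per-district datasets (places reuse the parent state), and unknown regions raise instead of falling back to the national default.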
  • 9 smoke tests for the new module. Tests pass in isolation; the full api pytest suite requires CloudSQL credentials to import, so end-to-end integration verification lands in CI.
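
To illustrate what "owning the type api-side" means for the wire contract, here is a minimal sketch of an options type that round-trips through JSON. It uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without dependencies. The field names come from the list above; the class name, field types, defaults, and the `to_wire` / `from_wire` helpers are assumptions for illustration, not the code in this PR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class SimulationOptionsSketch:
    """Hypothetical stand-in for the api-local SimulationOptions model."""

    # Field names follow the PR summary; types and defaults are assumed.
    country: str
    scope: str
    time_period: str
    reform: Optional[dict[str, Any]] = None
    baseline: Optional[dict[str, Any]] = None
    include_cliffs: bool = False
    region: Optional[str] = None
    data: Optional[str] = None
    model_version: Optional[str] = None
    data_version: Optional[str] = None

    def to_wire(self) -> str:
        # Serialize to the JSON payload a worker would receive.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_wire(cls, payload: str) -> "SimulationOptionsSketch":
        # Unknown keys raise TypeError, so contract drift fails loudly.
        return cls(**json.loads(payload))


# Illustrative values only; "macro" and "state/ca" are not taken from the PR.
options = SimulationOptionsSketch(
    country="us", scope="macro", time_period="2026", region="state/ca"
)
round_tripped = SimulationOptionsSketch.from_wire(options.to_wire())
```

A round-trip check of this kind is one way to confirm that the api and the worker agree on the payload shape before bumping either pin.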

Why this is draft

The Modal simulation worker is pinned to its own pe.py version (independent of this api). Bumping the api's pe.py pin without coordinating with Modal could produce JSON payloads that the worker does not decode correctly. Whoever merges this should:

  1. Confirm the Modal worker is either on a pe.py version that accepts the api-local `SimulationOptions` JSON shape, or bump the worker in the same change window.
  2. Verify the api-pytest suite runs green in CI (couldn't run locally because tests require `POLICYENGINE_DB_PASSWORD`).
  3. Test a reform economy-wide simulation end-to-end through a staging deploy before promoting.

Why this unblocks a lot

Once this merges, #3485 (webapp TRACE TRO emission) becomes a straightforward feature PR:

```python
from policyengine.provenance.trace import (
    build_trace_tro_from_release_bundle,
    build_simulation_trace_tro,
    serialize_trace_tro,
)

# ... on every simulation completion, build and persist a TRO.
```

Those imports only exist on the v4 line.

Test plan

  • New module smoke-tests pass in isolation
  • `uv run python -c "from policyengine.provenance.trace import build_trace_tro_from_release_bundle"` succeeds against the new dep pins
  • `uv lock` resolves cleanly
  • CI green (needs `POLICYENGINE_DB_PASSWORD` secrets)
  • Modal-worker side confirmed compatible
  • Staging deploy smoke test against a real economy-wide reform

Fixes #3486. Unblocks #3485.

🤖 Generated with Claude Code

MaxGhenis and others added 2 commits April 21, 2026 15:11
Prerequisite for #3485 (webapp TRACE TRO
emission). The v4 provenance primitives live in
policyengine.provenance.trace.* which only ship on the 4.x line; the
api was pinned to 0.x and could not use them.

Changes:
- pyproject.toml: bump policyengine >0.12.0,<1 -> >=4.3.1,<5;
  policyengine_us 1.634.9 -> 1.653.3; policyengine_uk 2.78.0 -> 2.88.0.
- Remove two imports that relied on the pre-v4 orchestrator:
    from policyengine.simulation import SimulationOptions
    from policyengine.utils.data.datasets import get_default_dataset
- Replace with api-local equivalents in a new
  policyengine_api.libs.simulation_types module:
    * SimulationOptions: Pydantic model matching the Modal simulation
      worker's existing JSON wire contract. Owning the type api-side
      decouples the api from pe.py's internal class layout.
    * get_default_dataset(country_id, region): api-local helper
      returning the default HF-hosted dataset URI for (country, region).
      State / district regions fall back to the national default,
      matching prior behavior.
- 9 smoke tests for the new module. Tests pass in isolation; the
  api's full pytest suite requires CloudSQL credentials to import,
  so end-to-end integration verification lands in CI.

Draft status: this PR ships the import-level migration and the type
contract. It does NOT exercise the full v4 code paths in production,
because the Modal simulation-worker side also pins pe.py — whoever
merges this should coordinate with Modal-side version alignment.
After that, the follow-up to #3485 becomes tractable: the api can
call policyengine.provenance.trace.build_trace_tro_from_release_bundle
to emit institutionally-signed TROs for every simulation run.

Related:
- #3485 (webapp TRO emission — unblocked by this)
- #3486 (this migration — issue scoped)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review caught two real bugs in the first pass of this PR:

1. The UK default URI pointed at the wrong dataset (enhanced_frs_2022_23
   in the public bucket), while both the pre-v4 contract and the
   canonical policyengine.py release manifest expect
   enhanced_frs_2023_24 in the -private bucket (UKDS-licensed data).
   Shipping as-was would have broken every UK webapp run.

2. US state / congressional-district / place regions were collapsed to
   the national default, contradicting the existing test contract in
   tests/unit/services/test_economy_service.py:1353,1371 which requires
   per-state (states/CA.h5), per-district (districts/CA-37.h5), and
   place-to-parent-state (place/NJ-57000 -> states/NJ.h5) routing. This
   would have broken per-state and per-district economy-wide runs.

Both classes of bug surface as provenance drift: the api metadata
would show one dataset, the worker would run against another, and
any TRO emitted on top of that would still look authoritative.

The rewritten get_default_dataset faithfully ports the pre-v4 GCS-URI
contract (gs://policyengine-{country}-data{-private}/...) to the
api-local module. State codes and district IDs are upper-cased;
place regions parse the NJ-57000 shape and reuse the parent state.
Unknown regions raise rather than silently falling through.
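
The US routing rules described above can be sketched as follows. This is an
illustration under stated assumptions, not the function in this PR: the name
route_us_region is hypothetical, the "state/" and "congressional_district/"
region prefixes are assumed (only the "place/NJ-57000" shape appears above),
and the sketch returns paths relative to the bucket root. The national default
is left out because the US national file name is not given here.

```python
def route_us_region(region: str) -> str:
    """Map a US region string to a dataset path relative to the bucket root.

    Hypothetical helper; the region prefixes other than "place/" are assumed.
    """
    kind, _, code = region.partition("/")
    code = code.upper()  # state codes and district IDs are upper-cased
    if not code:
        raise ValueError(f"Unknown region: {region!r}")
    if kind == "state":
        return f"states/{code}.h5"
    if kind == "congressional_district":
        return f"districts/{code}.h5"
    if kind == "place":
        # Places look like NJ-57000 and reuse the parent state's dataset.
        state, sep, _place_id = code.partition("-")
        if not sep or not state:
            raise ValueError(f"Malformed place region: {region!r}")
        return f"states/{state}.h5"
    # Raise rather than silently falling through to a national default.
    raise ValueError(f"Unknown region: {region!r}")
```

Raising on unrecognized input keeps the api metadata and the dataset the
worker actually loads from diverging unnoticed.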

Tests updated to match the real contract instead of the earlier
naive fallback assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis force-pushed the migrate-to-policyengine-v4 branch from 6da4f35 to 0be9e1a Compare April 21, 2026 20:30