Migrate from policyengine v0.x to v4 (prerequisite for TRACE TRO emission) by MaxGhenis · Pull Request #3487 · PolicyEngine/policyengine-api

MaxGhenis · 2026-04-21T19:12:26Z

Draft. Requires Modal simulation-worker coordination before merge.

Summary

- Bumps `policyengine` from `>0.12.0,<1` to `>=4.3.1,<5`; `policyengine_us` 1.634.9 → 1.653.3; `policyengine_uk` 2.78.0 → 2.88.0.
- Removes the two pre-v4 imports (`from policyengine.simulation import SimulationOptions` and `from policyengine.utils.data.datasets import get_default_dataset`) in favor of api-local equivalents in a new `policyengine_api.libs.simulation_types` module.
- The new `SimulationOptions` is a Pydantic model matching the Modal worker's existing JSON wire contract (country, scope, reform, baseline, time_period, include_cliffs, region, data, model_version, data_version). Owning the type api-side decouples the api from pe.py's internal class layout.
- The new `get_default_dataset` returns the default HuggingFace-hosted dataset URI for a (country, region) pair. State / district regions fall back to the national default, matching prior behavior.
- 9 smoke tests for the new module. Tests pass in isolation; the full api pytest suite requires CloudSQL credentials to import, so end-to-end integration verification lands in CI.
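
To make the wire-contract idea concrete, here is a minimal sketch of what an api-local options model could look like. Only the ten field names come from the summary above; the class name `SimulationOptionsSketch`, every field type and default, and the example values (`"macro"`, `"state/ca"`, `"2026"`) are assumptions for illustration and may not match the module in this PR. It assumes Pydantic v2.

```python
import json
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict


class SimulationOptionsSketch(BaseModel):
    """Hypothetical stand-in for the api-local SimulationOptions."""

    # Pydantic v2 treats the "model_" prefix as a protected namespace;
    # clearing it lets the wire field `model_version` keep its name.
    model_config = ConfigDict(protected_namespaces=())

    # Field names follow the PR summary; types and defaults are assumed.
    country: str
    scope: str
    reform: Optional[dict[str, Any]] = None
    baseline: Optional[dict[str, Any]] = None
    time_period: Optional[str] = None
    include_cliffs: bool = False
    region: Optional[str] = None
    data: Optional[str] = None
    model_version: Optional[str] = None
    data_version: Optional[str] = None


# Round-trip through JSON, as the api would when calling the worker.
options = SimulationOptionsSketch(
    country="us", scope="macro", region="state/ca", time_period="2026"
)
payload = options.model_dump_json()
restored = SimulationOptionsSketch.model_validate_json(payload)
decoded = json.loads(payload)
```

Because the model is owned api-side, a change to pe.py's internal classes would not alter `payload`; only an edit to this class would.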

Why this is draft

The Modal simulation worker is pinned to its own pe.py version (independent of this api). Bumping the api's pe.py pin without coordinating with Modal could produce JSON payloads that the worker does not decode correctly. Whoever merges this should:

- Confirm the Modal worker is either on a pe.py version that accepts the api-local `SimulationOptions` JSON shape, or bump the worker in the same change window.
- Verify the api-pytest suite runs green in CI (couldn't run locally because tests require `POLICYENGINE_DB_PASSWORD`).
- Test a reform economy-wide simulation end-to-end through a staging deploy before promoting.

Why this unblocks a lot

Once this merges, #3485 (webapp TRACE TRO emission) becomes a straightforward feature PR:

```python
from policyengine.provenance.trace import (
    build_trace_tro_from_release_bundle,
    build_simulation_trace_tro,
    serialize_trace_tro,
)

# ... on every simulation completion, build and persist a TRO.
```

Those imports only exist on the v4 line.

Test plan

- New module smoke-tests pass in isolation
- `uv run python -c "from policyengine.provenance.trace import build_trace_tro_from_release_bundle"` succeeds against the new dep pins
- `uv lock` resolves cleanly
- CI green (needs `POLICYENGINE_DB_PASSWORD` secrets)
- Modal-worker side confirmed compatible
- Staging deploy smoke test against a real economy-wide reform

Fixes #3486. Unblocks #3485.

🤖 Generated with Claude Code

Prerequisite for #3485 (webapp TRACE TRO emission). The v4 provenance primitives live in policyengine.provenance.trace.* which only ship on the 4.x line; the api was pinned to 0.x and could not use them.

Changes:

- pyproject.toml: bump policyengine >0.12.0,<1 -> >=4.3.1,<5; policyengine_us 1.634.9 -> 1.653.3; policyengine_uk 2.78.0 -> 2.88.0.
- Remove two imports that relied on the pre-v4 orchestrator:
    from policyengine.simulation import SimulationOptions
    from policyengine.utils.data.datasets import get_default_dataset
- Replace with api-local equivalents in a new policyengine_api.libs.simulation_types module:
  * SimulationOptions: Pydantic model matching the Modal simulation worker's existing JSON wire contract. Owning the type api-side decouples the api from pe.py's internal class layout.
  * get_default_dataset(country_id, region): api-local helper returning the default HF-hosted dataset URI for (country, region). State / district regions fall back to the national default, matching prior behavior.
- 9 smoke tests for the new module. Tests pass in isolation; the api's full pytest suite requires CloudSQL credentials to import, so end-to-end integration verification lands in CI.

Draft status: this PR ships the import-level migration and the type contract. It does NOT exercise the full v4 code paths in production, because the Modal simulation-worker side also pins pe.py; whoever merges this should coordinate with Modal-side version alignment.

After that, the follow-up to #3485 becomes tractable: the api can call policyengine.provenance.trace.build_trace_tro_from_release_bundle to emit institutionally-signed TROs for every simulation run.

Related:
- #3485 (webapp TRO emission, unblocked by this)
- #3486 (this migration, issue scoped)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex review caught two real bugs in the first pass of this PR:

1. The UK default URI pointed at the wrong dataset (enhanced_frs_2022_23 in the public bucket), while both the pre-v4 contract and the canonical policyengine.py release manifest expect enhanced_frs_2023_24 in the -private bucket (UKDS-licensed data). Shipping as-was would have broken every UK webapp run.
2. US state / congressional-district / place regions were collapsed to the national default, contradicting the existing test contract in tests/unit/services/test_economy_service.py:1353,1371 which requires per-state (states/CA.h5), per-district (districts/CA-37.h5), and place-to-parent-state (place/NJ-57000 -> states/NJ.h5) routing. This would have broken per-state and per-district economy-wide runs.

Both classes of bug surface as provenance drift: the api metadata would show one dataset, the worker would run against another, and any TRO emitted on top of that would still look authoritative.

The rewritten get_default_dataset faithfully ports the pre-v4 GCS-URI contract (gs://policyengine-{country}-data{-private}/...) to the api-local module. State codes and district IDs are upper-cased; place regions parse the NJ-57000 shape and reuse the parent state. Unknown regions raise rather than silently falling through.

Tests updated to match the real contract instead of the earlier naive fallback assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
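
The routing described in the commit message above can be sketched roughly as follows. This is not the code in this PR. The function name `get_default_dataset_sketch`, the region prefixes `state/` and `congressional_district/`, the `.h5` suffix on the UK file, and the `US_NATIONAL_FILE` placeholder (the commit message elides the national file name) are all assumptions; the bucket names are derived from the `gs://policyengine-{country}-data{-private}/...` template.

```python
from typing import Optional

US_BUCKET = "gs://policyengine-us-data"
UK_BUCKET = "gs://policyengine-uk-data-private"  # UKDS-licensed data
# Placeholder only: the source does not give the national US file name.
US_NATIONAL_FILE = "<national-default>.h5"


def get_default_dataset_sketch(country_id: str, region: Optional[str] = None) -> str:
    """Hypothetical port of the per-region dataset routing."""
    country = country_id.lower()
    if country == "uk":
        # Assumption: every UK region uses the single private dataset.
        return f"{UK_BUCKET}/enhanced_frs_2023_24.h5"
    if country != "us":
        raise ValueError(f"Unknown country: {country_id!r}")
    if region is None or region.lower() == "us":
        return f"{US_BUCKET}/{US_NATIONAL_FILE}"

    kind, _, value = region.partition("/")
    value = value.upper()  # state codes and district IDs are upper-cased
    if kind == "state" and value:
        return f"{US_BUCKET}/states/{value}.h5"
    if kind == "congressional_district" and value:
        return f"{US_BUCKET}/districts/{value}.h5"
    if kind == "place" and "-" in value:
        # "NJ-57000" reuses the dataset of its parent state, NJ.
        parent_state = value.split("-", 1)[0]
        return f"{US_BUCKET}/states/{parent_state}.h5"
    # Fail loudly instead of silently falling back to the national file.
    raise ValueError(f"Unknown US region: {region!r}")
```

Raising on unrecognized regions is the point of the second fix: a silent national fallback would let the api record one dataset while the worker runs another.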

MaxGhenis and others added 2 commits April 21, 2026 15:11

MaxGhenis force-pushed the migrate-to-policyengine-v4 branch from 6da4f35 to 0be9e1a Compare April 21, 2026 20:30

MaxGhenis wants to merge 2 commits into master from migrate-to-policyengine-v4
