Migrate from policyengine v0.x to v4 (prerequisite for TRACE TRO emission)#3487
Draft
Prerequisite for #3485 (webapp TRACE TRO emission). The v4 provenance primitives live in policyengine.provenance.trace.* which only ship on the 4.x line; the api was pinned to 0.x and could not use them.

Changes:
- pyproject.toml: bump policyengine >0.12.0,<1 -> >=4.3.1,<5; policyengine_us 1.634.9 -> 1.653.3; policyengine_uk 2.78.0 -> 2.88.0.
- Remove two imports that relied on the pre-v4 orchestrator:
    from policyengine.simulation import SimulationOptions
    from policyengine.utils.data.datasets import get_default_dataset
- Replace with api-local equivalents in a new policyengine_api.libs.simulation_types module:
  * SimulationOptions: Pydantic model matching the Modal simulation worker's existing JSON wire contract. Owning the type api-side decouples the api from pe.py's internal class layout.
  * get_default_dataset(country_id, region): api-local helper returning the default HF-hosted dataset URI for (country, region). State / district regions fall back to the national default, matching prior behavior.
- 9 smoke tests for the new module. Tests pass in isolation; the api's full pytest suite requires CloudSQL credentials to import, so end-to-end integration verification lands in CI.

Draft status: this PR ships the import-level migration and the type contract. It does NOT exercise the full v4 code paths in production, because the Modal simulation-worker side also pins pe.py. Whoever merges this should coordinate with Modal-side version alignment. After that, the follow-up to #3485 becomes tractable: the api can call policyengine.provenance.trace.build_trace_tro_from_release_bundle to emit institutionally-signed TROs for every simulation run.

Related:
- #3485 (webapp TRO emission, unblocked by this)
- #3486 (this migration, issue scoped)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review caught two real bugs in the first pass of this PR:
1. The UK default URI pointed at the wrong dataset (enhanced_frs_2022_23
in the public bucket), while both the pre-v4 contract and the
canonical policyengine.py release manifest expect
enhanced_frs_2023_24 in the -private bucket (UKDS-licensed data).
Shipping as-was would have broken every UK webapp run.
2. US state / congressional-district / place regions were collapsed to
the national default, contradicting the existing test contract in
tests/unit/services/test_economy_service.py:1353,1371 which requires
per-state (states/CA.h5), per-district (districts/CA-37.h5), and
place-to-parent-state (place/NJ-57000 -> states/NJ.h5) routing. This
would have broken per-state and per-district economy-wide runs.
Both classes of bug surface as provenance drift: the api metadata
would show one dataset, the worker would run against another, and
any TRO emitted on top of that would still look authoritative.
The rewritten get_default_dataset faithfully ports the pre-v4 GCS-URI
contract (gs://policyengine-{country}-data{-private}/...) to the
api-local module. State codes and district IDs are upper-cased;
place regions parse the NJ-57000 shape and reuse the parent state.
Unknown regions raise rather than silently falling through.
Tests updated to match the real contract instead of the earlier
naive fallback assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
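The US routing rules described in this commit can be sketched roughly as follows. This is an illustration only, not the code in policyengine_api.libs.simulation_types: the function name `route_us_region`, the region prefixes `state/` and `congressional_district/`, and the `national_default` parameter are assumptions made for the example. Only the output shapes (states/CA.h5, districts/CA-37.h5, place/NJ-57000 -> states/NJ.h5) and the raise-on-unknown behavior come from the description above. The sketch returns a bucket-relative path and leaves the bucket prefix to the caller.

```python
import re

# Hypothetical region shapes, assumed for illustration:
#   "us"                            -> national default (supplied by caller)
#   "state/ca"                      -> states/CA.h5
#   "congressional_district/ca-37"  -> districts/CA-37.h5
#   "place/NJ-57000"                -> states/NJ.h5 (parent state)
_STATE = re.compile(r"[A-Za-z]{2}")
_DISTRICT = re.compile(r"[A-Za-z]{2}-\d{1,2}")
_PLACE = re.compile(r"([A-Za-z]{2})-\d+")


def route_us_region(region: str, national_default: str) -> str:
    """Return a bucket-relative dataset path for a US region.

    Raises ValueError for unrecognized regions instead of silently
    falling back to the national dataset.
    """
    if region == "us":
        return national_default

    kind, _, value = region.partition("/")

    if kind == "state" and _STATE.fullmatch(value):
        # State codes are upper-cased.
        return f"states/{value.upper()}.h5"

    if kind == "congressional_district" and _DISTRICT.fullmatch(value):
        # District IDs are upper-cased.
        return f"districts/{value.upper()}.h5"

    if kind == "place":
        match = _PLACE.fullmatch(value)
        if match:
            # Places reuse the parent state's dataset.
            return f"states/{match.group(1).upper()}.h5"

    raise ValueError(f"Unknown US region: {region!r}")
```

Raising on an unknown region is the design choice that matters here: a silent fallback to the national dataset is the kind of mismatch the commit describes as provenance drift.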
6da4f35 to 0be9e1a
Draft. Requires Modal simulation-worker coordination before merge.
Summary
Why this is draft
The Modal simulation worker is pinned to its own pe.py version (independent of this api). Bumping the api's pe.py pin without coordinating with Modal could produce JSON payloads that the worker does not decode correctly. Whoever merges this should:
Why this unblocks a lot
Once this merges, #3485 (webapp TRACE TRO emission) becomes a straightforward feature PR:
```python
from policyengine.provenance.trace import (
build_trace_tro_from_release_bundle,
build_simulation_trace_tro,
serialize_trace_tro,
)
# ... on every simulation completion, build and persist a TRO.
```
Those imports only exist on the v4 line.
Test plan
Fixes #3486. Unblocks #3485.
🤖 Generated with Claude Code