Lift by_path + controls gate (DID^X residualization) by igerber · Pull Request #378 · igerber/diff-diff

igerber · 2026-04-25T21:57:18Z

Summary

Wave 3 Add comprehensive code review for diff-diff library #5 of the dCDH by_path follow-up sequence: combine by_path=k with controls=[...] (DID^X residualization)
Pure gate-lift — the existing per-baseline OLS residualization at diff_diff/chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE / bootstrap SE / per-path placebos / per-path joint sup-t bands) auto-inherit the residualized Y_mat (Frisch-Waugh-Lovell)
New R parity scenario multi_path_reversible_by_path_controls validates per-path point estimates against canonical R did_multiplegt_dyn(..., by_path=3, controls="X1") — exact match (rtol ~1e-11) on the single-baseline DGP

Methodology notes

R does per-path re-residualization with a path-restricted subsample (path's switchers + same-baseline not-yet-treated controls). Our architecture residualizes once globally, then disaggregates. On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool — residualization coefficients coincide and per-path point estimates match exactly. Per-path SE inherits the documented cross-path cohort-sharing deviation from R (Phase 2 envelope, ~6.5% rtol on this scenario).

Per-period effects remain unadjusted. The per-period DID path uses raw Y_mat per the existing controls + per-period contract documented at chaisemartin_dhaultfoeuille.py:1493-1496. Pinned by test_per_period_effects_unadjusted_with_by_path_controls.

Files changed

File	Change
`diff_diff/chaisemartin_dhaultfoeuille.py`	Delete gate at `:988-992`; update `by_path` docstring (remove `controls` from incompatible list, add inheritance paragraph)
`tests/test_chaisemartin_dhaultfoeuille.py`	Remove `controls` parametrize entry from `TestByPathGates`; add `TestByPathControls` class (12 tests)
`tests/test_chaisemartin_dhaultfoeuille_parity.py`	Add `TestDCDHDynRParityByPathControls` class
`benchmarks/R/generate_dcdh_dynr_test_values.R`	Add Scenario 16 `multi_path_reversible_by_path_controls`
`benchmarks/data/dcdh_dynr_golden_values.json`	Regenerated via `Rscript benchmarks/R/generate_dcdh_dynr_test_values.R`
`docs/methodology/REGISTRY.md`	Remove `controls` from gated-combos list in the Phase-3 by_path Note; add inheritance sub-paragraph
`CHANGELOG.md`	Unreleased > Added entry

TestByPathControls covers all four downstream surfaces explicitly (analytical / bootstrap / placebo / sup-t bands), plus covariate_residuals round-trip, to_dataframe(level="by_path") cband columns under controls + bootstrap, multi-covariate, path-enumeration-unaffected-by-controls, and the per-period-unadjusted contract.

Test plan

pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls -m "slow or not slow" — 12 passed
pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls — 1 passed (point estimates exact match, SE within Phase 2 envelope)
pytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath" — 88 passed (all existing by_path classes regression-clean)
pytest tests/test_chaisemartin_dhaultfoeuille.py::TestCovariateAdjustment — 8 passed (existing controls regression-clean)
pytest tests/test_chaisemartin_dhaultfoeuille_parity.py — 19 passed
Full dCDH suite (tests/test_chaisemartin_dhaultfoeuille.py + test_methodology_chaisemartin_dhaultfoeuille.py + test_chaisemartin_dhaultfoeuille_parity.py) — 231 passed

🤖 Generated with Claude Code

github-actions · 2026-04-25T22:02:29Z

Overall Assessment

⚠️ Needs changes

Executive Summary

by_path + controls is now enabled globally, but the PR’s own parity note narrows exact R agreement to a single-baseline fixture and acknowledges multi-baseline point-estimate drift; that deviation is not documented in REGISTRY.md or surfaced at runtime.
The underlying data flow is otherwise consistent: DID^X residualization happens before multi-horizon/by-path computation, and per-period DID stays on raw outcomes.
The PR deletes the only explicit Tutorial 19 drift-regression test without an evident replacement, while docs still claim the tutorial has drift guards.
I did not find new inline-inference/NaN-guard regressions in the changed estimator path.
Static review only: pytest is not installed in this sandbox.

Methodology

Severity: P1. diff_diff/chaisemartin_dhaultfoeuille.py:L416-L424 now presents by_path + controls as generally supported after the gate lift in the by_path precondition block (diff_diff/chaisemartin_dhaultfoeuille.py:L992-L1000). But the new parity docstring explicitly says exact parity only holds on the single-baseline fixture and that multi-baseline panels can produce different point estimates because Python residualizes once globally while R re-runs per path (tests/test_chaisemartin_dhaultfoeuille_parity.py:L761-L777). REGISTRY.md documents only the inherited SE deviation, not this point-estimate deviation (docs/methodology/REGISTRY.md:L641-L643), and the new R-generator comment says the opposite about what R does (benchmarks/R/generate_dcdh_dynr_test_values.R:L702-L714). Impact: multi-baseline users can silently get by-path DID^X estimates that do not match the documented R reference behavior. Concrete fix: either add a Deviation from R note in REGISTRY.md plus a runtime warning/guard when by_path + controls is used with multiple switcher baselines, or re-gate the combination until a multi-baseline parity fixture and rationale are added.

Code Quality

No material findings in the changed runtime code beyond the methodology-contract issue above.

Performance

No material findings.

Maintainability

No separate findings beyond the methodology-contract inconsistency already noted.

Tech Debt

No material findings.

Security

No findings.

Documentation/Tests

Severity: P2. The PR deletes tests/test_t19_marketing_pulse_drift.py:L1-L304 (deleted), which was the explicit numeric drift guard for Tutorial 19, but still advertises “drift guards” in CHANGELOG.md:L15 and docs/tutorials/README.md:L90-L96. I also could not find replacement assertion cells in docs/tutorials/19_dcdh_marketing_pulse.ipynb. Impact: the tutorial’s quoted numbers can now go stale silently as long as the notebook still executes. Concrete fix: restore the deleted test or move equivalent executed assertions into the notebook, then update the README/changelog if drift guards are intentionally being removed.

Path to Approval

Resolve the by_path + controls contract for multi-baseline panels: document it in docs/methodology/REGISTRY.md as an explicit deviation from R and warn/guard at runtime, or keep the gate until broader parity support is established.
Restore Tutorial 19 drift protection, either by undeleting tests/test_t19_marketing_pulse_drift.py or by adding equivalent executed assertions to docs/tutorials/19_dcdh_marketing_pulse.ipynb, and align the README/changelog text with that choice.

…note CI reviewer flagged that the new by_path + controls combination silently produces point-estimate divergence from R on multi-baseline switcher panels (R re-runs per-baseline residualization on each path's restricted subsample; we residualize once globally). The parity test docstring documented the deviation but REGISTRY.md and the runtime did not. Fixes: - Emit UserWarning in fit() when by_path + controls is used on a panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py inside the controls residualization block, after _compute_group_switch_metadata) - Update the by_path docstring with an explicit "Deviation from R on multi-baseline switcher panels" paragraph - Update REGISTRY.md "Per-path covariate residualization (DID^X)" paragraph to document the point-estimate deviation alongside the existing SE deviation - Update CHANGELOG entry to call out the multi-baseline deviation - Update R-generator scenario 16 comment to correctly describe R's per-path re-residualization (the prior comment misstated R's behavior as "residualize once globally") - Update parity test class docstring to be precise about R's per-path call site (R/R/did_multiplegt_dyn.R lines 393-411) - Add two regression tests: * test_multi_baseline_panel_emits_r_deviation_warning — joiner + leaver + always-treated + never-treated panel triggers the warning * test_single_baseline_panel_does_not_emit_r_deviation_warning — standard 3-path joiners-only fixture does NOT trigger the warning The single-baseline R-parity scenario (multi_path_reversible_by_path_controls) remains exact-match (rtol ~1e-11) because all switchers in the DGP share D_{g,1}=0 and R's per-path control pool reduces to the global control pool we use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T22:37:41Z

/ai-review

github-actions · 2026-04-25T22:42:36Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a3de936dc77d572584975f16d62c42b5942e2e52

Overall Assessment

✅ Looks good

Executive Summary

Affected methods: ChaisemartinDHaultfoeuille.by_path and DID^X covariate residualization (controls=[...]), with inherited per-path bootstrap/placebo/sup-t surfaces.
Previous methodology P1 is resolved: the by_path + controls R-divergence is now documented in the estimator docstring and docs/methodology/REGISTRY.md, and a fit-time UserWarning is emitted on multi-baseline switcher panels (diff_diff/chaisemartin_dhaultfoeuille.py:L416-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1524, docs/methodology/REGISTRY.md:L641-L643).
I did not find new inline-inference or NaN/SE anti-pattern regressions in the changed estimator path.
Previous documentation/tests P2 remains open: the supplied diff deletes tests/test_t19_marketing_pulse_drift.py, but Tutorial 19 is still advertised as having “drift guards” and the notebook itself does not contain executable assertion cells (CHANGELOG.md:L15-L15, docs/tutorials/README.md:L90-L96, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L319-L400).
Static review only; I did not run pytest in this sandbox.

Methodology

Severity: P3 informational. The prior undocumented deviation for by_path + controls is now mitigated. The implementation explicitly documents that Python residualizes once before path enumeration, warns when multi-baseline switcher panels can diverge from R, and adds parity/warning tests for the supported single-baseline case and the warning path (diff_diff/chaisemartin_dhaultfoeuille.py:L416-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1524, docs/methodology/REGISTRY.md:L641-L643, tests/test_chaisemartin_dhaultfoeuille.py:L6561-L6650, tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L870). Impact: no blocker under the review rubric because the deviation is documented in REGISTRY.md and surfaced at runtime. Concrete fix: none.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No separate findings. The live Tutorial 19 issue below is not tracked in TODO.md, so it remains an unmitigated documentation/tests item rather than accepted deferred work.

Security

No findings.

Documentation/Tests

Severity: P2. The PR removes the dedicated Tutorial 19 numeric drift test (tests/test_t19_marketing_pulse_drift.py:L1-L304 in the supplied diff), but the public-facing docs still claim the tutorial has drift guards (CHANGELOG.md:L15-L15, docs/tutorials/README.md:L90-L96). The notebook workflow still uses pytest --nbmake docs/tutorials/, which checks execution rather than quoted-number stability (.github/workflows/notebooks.yml:L50-L58), and the notebook tail is narrative-only rather than assert-backed (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L319-L400). Impact: the tutorial’s quoted headline, joiner/leaver, placebo, and event-study numbers can now drift silently without CI catching prose mismatch. Concrete fix: either keep tests/test_t19_marketing_pulse_drift.py, or move equivalent executed assertions into docs/tutorials/19_dcdh_marketing_pulse.ipynb and keep the “drift guards” claim; otherwise, remove the “drift guards” wording from the changelog and tutorial catalog.

PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X). Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract. Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort- sharing deviation). Changes: - Delete the gate at chaisemartin_dhaultfoeuille.py:988-992 - Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph) - New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values - New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py - New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round- trip / multi-covariate) - Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs - Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto- inheritance) - CHANGELOG: Unreleased > Added entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…note CI reviewer flagged that the new by_path + controls combination silently produces point-estimate divergence from R on multi-baseline switcher panels (R re-runs per-baseline residualization on each path's restricted subsample; we residualize once globally). The parity test docstring documented the deviation but REGISTRY.md and the runtime did not. Fixes: - Emit UserWarning in fit() when by_path + controls is used on a panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py inside the controls residualization block, after _compute_group_switch_metadata) - Update the by_path docstring with an explicit "Deviation from R on multi-baseline switcher panels" paragraph - Update REGISTRY.md "Per-path covariate residualization (DID^X)" paragraph to document the point-estimate deviation alongside the existing SE deviation - Update CHANGELOG entry to call out the multi-baseline deviation - Update R-generator scenario 16 comment to correctly describe R's per-path re-residualization (the prior comment misstated R's behavior as "residualize once globally") - Update parity test class docstring to be precise about R's per-path call site (R/R/did_multiplegt_dyn.R lines 393-411) - Add two regression tests: * test_multi_baseline_panel_emits_r_deviation_warning — joiner + leaver + always-treated + never-treated panel triggers the warning * test_single_baseline_panel_does_not_emit_r_deviation_warning — standard 3-path joiners-only fixture does NOT trigger the warning The single-baseline R-parity scenario (multi_path_reversible_by_path_controls) remains exact-match (rtol ~1e-11) because all switchers in the DGP share D_{g,1}=0 and R's per-path control pool reduces to the global control pool we use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T22:45:13Z

/ai-review

github-actions · 2026-04-25T23:06:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 028e7e180952f627642d9f545fb2f43d51451202

Overall Assessment

✅ Looks good

Executive Summary

Affected methodology surface: ChaisemartinDHaultfoeuille.fit(..., by_path, controls) plus inherited per-path bootstrap/placebo/sup-t outputs. The gate lift is wired through the existing residualized-Y_mat pipeline rather than a separate estimator branch, which is the right shape for this feature. diff_diff/chaisemartin_dhaultfoeuille.py:L416-L445, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1535, diff_diff/chaisemartin_dhaultfoeuille.py:L1679-L1693
The Python-vs-R divergence on multi-baseline switcher panels is now documented in both the estimator docstring and the Methodology Registry, and surfaced at runtime via UserWarning; under the review rubric that is a documented deviation, so I do not count it as a methodology defect. diff_diff/chaisemartin_dhaultfoeuille.py:L424-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1505-L1524, docs/methodology/REGISTRY.md:L641-L643
I did not find new NaN/inference anti-patterns on the modified estimator path; bootstrap path/placebo propagation still goes through safe_inference and writes full-NaN inference tuples when bootstrap SEs are invalid. diff_diff/chaisemartin_dhaultfoeuille.py:L2990-L3048
One non-blocking issue remains: CHANGELOG.md now contradicts itself inside the same Unreleased section. The new entry says by_path + controls is supported, but the older by_path entry still says controls raises NotImplementedError. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
Re-review note: I am not carrying forward the prior Tutorial 19 drift-guard concern; the dedicated drift test file is present in the current tree. tests/test_t19_marketing_pulse_drift.py:L1-L16
Static review only; I could not execute the targeted pytest slices here because this sandbox does not have the pytest module installed.

Methodology

Severity: P3 informational. Impact: the changed estimator path matches the documented contract: residualize once before path enumeration, keep per-period DID on raw outcomes, and warn on the documented multi-baseline deviation from R. No unmitigated P0/P1 methodology findings. Concrete fix: none. diff_diff/chaisemartin_dhaultfoeuille.py:L416-L445, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1535, diff_diff/chaisemartin_dhaultfoeuille.py:L1679-L1693, docs/methodology/REGISTRY.md:L641-L643

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No material findings.

Tech Debt

No separate findings.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: the public Unreleased changelog is self-contradictory about whether by_path + controls is supported, which will mislead users and reviewers even though the code and registry are aligned. Concrete fix: remove controls from the older by_path incompatibility list, or consolidate the overlapping by_path bullets so compatibility is stated in one canonical place. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
Verification note: targeted test execution was not possible in this sandbox because pytest is unavailable.

Older Unreleased by_path entry (PR #357 origin) listed `controls` in the gated-combos list, contradicting the new entry that says `by_path + controls` is supported. Same internal contradiction applied implicitly to the bootstrap, placebo, and sup-t bands extensions that landed in the same Unreleased block — those weren't in the gated list to begin with, but the entry didn't acknowledge they're now supported. Updated the by_path bullet to remove `controls` from the gated list and explicitly acknowledge `n_bootstrap > 0`, `placebo=True`, joint sup-t bands, and `controls` as supported (see dedicated entries elsewhere in [Unreleased]). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T23:10:22Z

/ai-review

github-actions · 2026-04-25T23:21:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e49dff9c73cb73afbeb785dbe9a4d017c76b667

Overall Assessment

⚠️ Needs changes

Executive Summary

Re-review: the prior changelog contradiction appears fixed; CHANGELOG.md now consistently says by_path + controls is supported. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
The gate lift itself is routed through the existing DID^X residualization path; I did not find new NaN/SE anti-patterns in the modified by_path plumbing.
P1 methodology blocker: the new docs/runtime warning treat “all switchers share one D_{g,1}” as sufficient for Python/R point-estimate parity, but the DID^X first stage is estimated on all baseline-matching, not-yet-switched observations. That means same-baseline panels can still differ from R if different paths contribute different pre-switch switcher rows.
Because the new warning only keys on multiple baseline values, those same-baseline divergence cases can be silent in production.
The new tests and benchmark comments reinforce that too-broad parity claim instead of covering the constant-baseline / heterogeneous-switch-timing case.
Static review only; pytest is not installed in this sandbox.

Methodology

Severity: P1
Impact: ChaisemartinDHaultfoeuille.fit(..., by_path, controls) now documents and warns on the wrong parity condition. The new docstring/registry note say Python matches R whenever switchers share a single baseline, and the runtime warning only fires when switchers span multiple D_{g,1} values. But _compute_covariate_residualization() fits theta_hat_d on all baseline-matching, not-yet-switched observations (valid = d_mask & not_yet_switched & both_observed), so pre-switch rows from later-treated switchers remain in the Python first stage even when D_{g,1} is constant. R’s by_path + controls reruns that first stage on each path-restricted subsample. So parity depends on first-stage sample equivalence, not just baseline multiplicity, and same-baseline panels can still yield different per-path point estimates with no warning. This is an undocumented methodology deviation from the cited R behavior and from the new registry/docstring contract. diff_diff/chaisemartin_dhaultfoeuille.py:L424-L434, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1519, diff_diff/chaisemartin_dhaultfoeuille.py:L4152-L4210, docs/methodology/REGISTRY.md:L641-L641, tests/test_chaisemartin_dhaultfoeuille_parity.py:L761-L784, tests/test_chaisemartin_dhaultfoeuille.py:L6562-L6659, benchmarks/R/generate_dcdh_dynr_test_values.R:L706-L719
Concrete fix: base the warning/documentation on whether the DID^X first-stage sample differs path-by-path, not on np.unique(_switcher_baselines).size > 1 alone. If you do not want to implement that broader detection, then narrow the docs/REGISTRY/CHANGELOG/parity comments so they only claim exact parity for the specific validated special case, and add a regression for a constant-D_{g,1} panel with heterogeneous F_g / pre-switch rows.

Code Quality

No material findings beyond the methodology issue above.

Performance

No material findings.

Maintainability

No separate findings. The main maintainability risk is the incorrect methodology contract being duplicated across code comments, docs, tests, and benchmark notes.

Tech Debt

No separate findings. I did not find a TODO.md entry that would mitigate the P1 methodology issue.

Security

No findings.

Documentation/Tests

No separate blocker beyond the P1 methodology issue. The previous re-review P2 on contradictory changelog text appears resolved. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
Verification note: static review only; pytest is not available in this sandbox.

Path to Approval

Fix the by_path + controls methodology contract: either broaden the runtime warning to cover any case where the path-restricted DID^X first-stage sample differs from the global first stage, or narrow the docs so they no longer claim exact parity on all single-baseline panels.
Add a regression test for a by_path + controls panel with constant D_{g,1} but heterogeneous switch timing / pre-switch switcher rows across paths, and assert the intended behavior explicitly (warning fires, or parity holds if you implement a stronger equivalence condition).

…egression Reviewer flagged that the parity condition documented for by_path + controls (single-baseline switcher panel) might not be sufficient, hypothesizing that R's per-path subset could exclude pre-switch rows of other-path switchers and produce a different first-stage residualization sample. Verified the hypothesis is empirically falsified and analytically incorrect by reading R/R/did_multiplegt_dyn.R lines 401-405 line-by- line. R's per-path subset for path B includes: - Rows where path_XX == B (path-B switchers, all rows) - OR rows where yet_to_switch=1 AND baseline matches (pre-switch rows of any group with matching baseline, regardless of path) So R's per-path first-stage sample equals (pre-switch rows of all switchers with matching baseline + all rows of never-switchers with matching baseline) — bit-identical to our global first-stage sample under single-baseline switcher panels, regardless of how F_g varies across paths or within a path. Empirical confirmation: scenario 16 (`multi_path_reversible_by_path_controls`) has switcher F_g spanning [0..6] across 4 distinct paths under D_{g,1}=0 and Python matches R to rtol ~1e-11 across all (path, horizon) cells. Strengthened the contract: - Expanded the warning code comment to spell out R's per-path subset construction (citing R source line numbers) and why single-baseline- switcher is the precise parity condition (control-pool equivalence via the OR clause), with the empirical scenario reference baked in - Updated REGISTRY.md "Per-path covariate residualization (DID^X)" paragraph to cite R lines 401-405 and clarify never-switcher baselines do not affect parity - New regression test `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r` uses the golden-value scenario (single-baseline, heterogeneous F_g across paths) to assert: (a) no UserWarning fires, (b) per-path point estimates are produced finite. The numeric R-parity is locked separately in TestDCDHDynRParityByPathControls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T23:39:46Z

/ai-review

github-actions · 2026-04-25T23:44:58Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d24ae257aca4b7153cc6942dfbbe9295ee629ce6

Overall Assessment

Looks good

Executive Summary

Re-review: the prior P1 on the parity condition appears addressed. The updated reasoning around single-baseline panels matches the first-stage sample actually used in diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1548, and the PR adds targeted warning/parity coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6562-L6745 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L758-L860.
I did not find new P0/P1 estimator bugs in the by_path + controls gate lift. The residualized Y_mat now flows through the existing analytical/bootstrap/placebo/sup-t paths, while per_period_effects correctly stays on raw outcomes via diff_diff/chaisemartin_dhaultfoeuille.py:L1567-L1572 and diff_diff/chaisemartin_dhaultfoeuille.py:L1703-L1718.
One P3 informational issue remains: the new “single-baseline => exact R match” wording is broader than the library’s already-documented DID^X equal-cell-weighting deviation from R in docs/methodology/REGISTRY.md:L629-L629.
Static review only; I did not run pytest in this sandbox.

Methodology

Severity: P3. Impact: the new docstring/changelog/registry/benchmark/test commentary says single-baseline switcher panels “match R exactly,” but the existing DID^X note still documents a separate Python-vs-R first-stage weighting difference on uneven (g,t) cell sizes. So single-baseline panels can still differ from R for that already-documented reason even though the new multi-baseline warning will not fire. References: docs/methodology/REGISTRY.md:L629-L629, diff_diff/chaisemartin_dhaultfoeuille.py:L424-L434, docs/methodology/REGISTRY.md:L641-L641, CHANGELOG.md:L11-L11, benchmarks/R/generate_dcdh_dynr_test_values.R:L706-L719, tests/test_chaisemartin_dhaultfoeuille_parity.py:L770-L785. Concrete fix: qualify the exact-match claim to one-observation-per-cell / cell-aggregated inputs, or explicitly cross-reference the existing DID^X weighting-deviation note when describing the single-baseline parity condition.

Previous P1 resolved: the updated sample-equivalence argument for heterogeneous F_g on single-baseline panels is consistent with _compute_covariate_residualization() and the new regression coverage.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

Severity: P3. Impact: the Step 7b header comment still says DID^X makes “ALL downstream DID computations (per-period and per-group multi-horizon) automatically produce covariate-adjusted estimates,” but the actual implementation intentionally excludes per_period_effects. This is comment drift, not a runtime bug, but it now conflicts with the contract added elsewhere in the PR. References: diff_diff/chaisemartin_dhaultfoeuille.py:L1466-L1472, diff_diff/chaisemartin_dhaultfoeuille.py:L1567-L1572, diff_diff/chaisemartin_dhaultfoeuille.py:L1703-L1718. Concrete fix: narrow that header comment to the per-group multi-horizon path and explicitly exclude the per-period DID surface.

Tech Debt

No separate findings. I did not see a TODO.md entry that changes the assessment of the items above.

Security

No findings.

Documentation/Tests

No separate blocker beyond the P3 wording/comment issues above.
The new tests cover the critical re-review cases well: lifted gate, multi-baseline warning, single-baseline heterogeneous-F_g no-warning case, bootstrap/placebo/sup-t inheritance, and the new R parity fixture in tests/test_chaisemartin_dhaultfoeuille.py:L6148-L6745 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L758-L860.
Verification note: static review only; tests not run in this sandbox.

…scope P3 #1 (Methodology): qualified the "exact R match" claim across docstring / REGISTRY / CHANGELOG / R-generator comment / parity test docstring with a cross-reference to the existing DID^X cell-weighting deviation (Python's first-stage uses equal cell weights, R weights by N_gt). The two coincide on one-observation-per-(g,t) panels (the common cell-aggregated regime that the parity scenario uses). The multi-observation-per-cell deviation is independent of the by_path lift and was already documented in REGISTRY's "Note (Phase 3 DID^X covariate adjustment)". P3 #2 (Maintainability): narrowed the Step 7b header comment in chaisemartin_dhaultfoeuille.py:1465-1473 to spell out that DID^X residualization applies to the per-group multi-horizon path (event_study_effects, overall_att, joiners/leavers, by_path, placebos, sup-t bands) but intentionally excludes per_period_effects which stays on raw outcomes per the existing "Note (Phase 3 DID^X covariate adjustment)" contract. Documentation-only fix; no runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T23:48:00Z

/ai-review

github-actions · 2026-04-25T23:53:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fb7a9129a35d08afabe0fdfe1625bc8080eb452e

Overall Assessment
✅ Looks good

Executive Summary

Severity P3: the new test at tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6745 is named/docstringed as if it verifies R parity, but it only checks warning suppression and finite effects; the actual numeric parity assertions live in tests/test_chaisemartin_dhaultfoeuille_parity.py:L800-L875.
Re-review: the prior wording/comment-drift concerns appear addressed. Step 7b now explicitly excludes per_period_effects from DID^X residualization at diff_diff/chaisemartin_dhaultfoeuille.py:L1470-L1481, and the registry note now qualifies the single-baseline parity claim with the existing DID^X cell-weighting deviation at docs/methodology/REGISTRY.md:L641.
The estimator change itself is narrow: removing the gate at diff_diff/chaisemartin_dhaultfoeuille.py:L997-L1043 now lets controls flow into the existing residualization step before the existing by-path / placebo / bootstrap / sup-t branches at diff_diff/chaisemartin_dhaultfoeuille.py:L1486-L1581, diff_diff/chaisemartin_dhaultfoeuille.py:L2051-L2215, and diff_diff/chaisemartin_dhaultfoeuille.py:L2978-L3115.
I did not find a new P0/P1 methodology, control-group, variance/SE, or NaN-propagation defect on the now-enabled by_path + controls path.
Added coverage is solid overall: gate lift, cross-surface inheritance, per-period raw-outcome behavior, multi-baseline warning, and R parity are all represented in tests/test_chaisemartin_dhaultfoeuille.py:L6148-L6745 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L758-L875.
Verification note: static review only; pytest is not installed in this sandbox.

Methodology

No material findings. The multi-baseline Python-vs-R point-estimate difference is now documented in the registry and surfaced as a runtime UserWarning, so this is a documented deviation rather than an undocumented mismatch.

Code Quality

No material findings.

Performance

No findings.

Maintainability

No material findings. The earlier Step 7b comment drift is fixed at diff_diff/chaisemartin_dhaultfoeuille.py:L1470-L1481.

Tech Debt

No new findings. I did not see a relevant TODO.md entry needed for this change.

Security

No findings.

Documentation/Tests

Severity P3. Impact: tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6745 is presented as if it locks the “single-baseline heterogeneous F_g still matches R” claim, but the body only checks that the multi-baseline warning does not fire and that effects are finite. The actual golden-value parity check is separate, in tests/test_chaisemartin_dhaultfoeuille_parity.py:L800-L875. Concrete fix: rename the test/docstring to reflect its real scope, or add direct golden-value assertions there if duplicate parity coverage is intentional.

Test name `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r` oversold its scope — the body only checks warning suppression and finite effects. The numeric R-parity assertion (rtol ~1e-11 per-path) lives in TestDCDHDynRParityByPathControls which fits the same scenario and compares against the golden values. Renamed to `test_single_baseline_heterogeneous_F_g_does_not_warn` and updated docstring + inline comment to spell out: this test locks the warning-suppression invariant; the numeric R-parity check is in the parity test class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T01:06:14Z

/ai-review

github-actions · 2026-04-26T01:17:05Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6785d6a8a29da219c361f2e28029fd64f87fbcd7

Overall Assessment
✅ Looks good

Executive Summary

No unmitigated P0/P1 issues found in the by_path + controls gate lift.
The estimator change is narrow: by_path now reuses the existing DID^X residualization before path enumeration, and the downstream per-path analytical, bootstrap, placebo, and sup-t paths all consume the residualized Y_mat, while per_period_effects stays on raw outcomes as documented in diff_diff/chaisemartin_dhaultfoeuille.py:L1469-L1581, diff_diff/chaisemartin_dhaultfoeuille.py:L1701-L1728, and diff_diff/chaisemartin_dhaultfoeuille.py:L2048-L2215.
The only substantive R divergence on the new path is the multi-baseline-switcher case, and that is now explicitly documented in docs/methodology/REGISTRY.md:L641-L643 and surfaced with a fit-time UserWarning in diff_diff/chaisemartin_dhaultfoeuille.py:L1538-L1557, so this is a documented deviation rather than an undocumented methodology mismatch.
Re-review: the prior P3 wording/naming concern appears addressed. The heterogeneous-F_g test now clearly scopes itself to warning behavior and points numeric parity to the dedicated parity fixture in tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L876.
Coverage on the new combination is solid: analytical path, bootstrap path, per-path placebos, sup-t bands, raw per-period contract, warning behavior, and R parity all have direct regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6147-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L876.
Static review only: pytest, numpy, and pandas are not installed in this sandbox, so I did not execute the suite.

Methodology

Severity P3 (informational). Impact: by_path + controls intentionally diverges from R on multi-baseline switcher panels because Python residualizes once globally while R re-runs residualization per path; this is now documented in docs/methodology/REGISTRY.md:L641-L643 and warned at runtime in diff_diff/chaisemartin_dhaultfoeuille.py:L1538-L1557, so users are not exposed to a silent methodology mismatch. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked deferrable item identified. I did not see anything in the changed path that needs a new TODO.md entry beyond the documented deviations already carried in docs/methodology/REGISTRY.md and the existing review-debt table at TODO.md:L51-L59.

Security

No findings.

Documentation/Tests

No findings. The prior AI-review P3 appears resolved: the warning-suppression test is now described consistently with what it actually asserts, and the numeric parity check lives in the dedicated parity test file at tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L800-L876.

igerber and others added 2 commits April 25, 2026 18:44

igerber force-pushed the dcdh-by-path-controls branch from a3de936 to 028e7e1 Compare April 25, 2026 22:44

Conversation

igerber commented Apr 25, 2026

Summary

Methodology notes

Files changed

Test plan

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant