Skip to content

Lift by_path + controls gate (DID^X residualization)#378

Open
igerber wants to merge 6 commits intomainfrom
dcdh-by-path-controls
Open

Lift by_path + controls gate (DID^X residualization)#378
igerber wants to merge 6 commits intomainfrom
dcdh-by-path-controls

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 25, 2026

Summary

  • Wave 3 Add comprehensive code review for diff-diff library #5 of the dCDH by_path follow-up sequence: combine by_path=k with controls=[...] (DID^X residualization)
  • Pure gate-lift — the existing per-baseline OLS residualization at diff_diff/chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE / bootstrap SE / per-path placebos / per-path joint sup-t bands) auto-inherit the residualized Y_mat (Frisch-Waugh-Lovell)
  • New R parity scenario multi_path_reversible_by_path_controls validates per-path point estimates against canonical R did_multiplegt_dyn(..., by_path=3, controls="X1") — exact match (rtol ~1e-11) on the single-baseline DGP

Methodology notes

R does per-path re-residualization with a path-restricted subsample (path's switchers + same-baseline not-yet-treated controls). Our architecture residualizes once globally, then disaggregates. On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool — residualization coefficients coincide and per-path point estimates match exactly. Per-path SE inherits the documented cross-path cohort-sharing deviation from R (Phase 2 envelope, ~6.5% rtol on this scenario).

Per-period effects remain unadjusted. The per-period DID path uses raw Y_mat per the existing controls + per-period contract documented at chaisemartin_dhaultfoeuille.py:1493-1496. Pinned by test_per_period_effects_unadjusted_with_by_path_controls.

Files changed

File Change
diff_diff/chaisemartin_dhaultfoeuille.py Delete gate at :988-992; update by_path docstring (remove controls from incompatible list, add inheritance paragraph)
tests/test_chaisemartin_dhaultfoeuille.py Remove controls parametrize entry from TestByPathGates; add TestByPathControls class (12 tests)
tests/test_chaisemartin_dhaultfoeuille_parity.py Add TestDCDHDynRParityByPathControls class
benchmarks/R/generate_dcdh_dynr_test_values.R Add Scenario 16 multi_path_reversible_by_path_controls
benchmarks/data/dcdh_dynr_golden_values.json Regenerated via Rscript benchmarks/R/generate_dcdh_dynr_test_values.R
docs/methodology/REGISTRY.md Remove controls from gated-combos list in the Phase-3 by_path Note; add inheritance sub-paragraph
CHANGELOG.md Unreleased > Added entry

TestByPathControls covers all four downstream surfaces explicitly (analytical / bootstrap / placebo / sup-t bands), plus covariate_residuals round-trip, to_dataframe(level="by_path") cband columns under controls + bootstrap, multi-covariate, path-enumeration-unaffected-by-controls, and the per-period-unadjusted contract.

Test plan

  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls -m "slow or not slow" — 12 passed
  • pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls — 1 passed (point estimates exact match, SE within Phase 2 envelope)
  • pytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath" — 88 passed (all existing by_path classes regression-clean)
  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestCovariateAdjustment — 8 passed (existing controls regression-clean)
  • pytest tests/test_chaisemartin_dhaultfoeuille_parity.py — 19 passed
  • Full dCDH suite (tests/test_chaisemartin_dhaultfoeuille.py + test_methodology_chaisemartin_dhaultfoeuille.py + test_chaisemartin_dhaultfoeuille_parity.py) — 231 passed

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • by_path + controls is now enabled globally, but the PR’s own parity note narrows exact R agreement to a single-baseline fixture and acknowledges multi-baseline point-estimate drift; that deviation is not documented in REGISTRY.md or surfaced at runtime.
  • The underlying data flow is otherwise consistent: DID^X residualization happens before multi-horizon/by-path computation, and per-period DID stays on raw outcomes.
  • The PR deletes the only explicit Tutorial 19 drift-regression test without an evident replacement, while docs still claim the tutorial has drift guards.
  • I did not find new inline-inference/NaN-guard regressions in the changed estimator path.
  • Static review only: pytest is not installed in this sandbox.

Methodology

Code Quality

  • No material findings in the changed runtime code beyond the methodology-contract issue above.

Performance

  • No material findings.

Maintainability

  • No separate findings beyond the methodology-contract inconsistency already noted.

Tech Debt

  • No material findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. The PR deletes tests/test_t19_marketing_pulse_drift.py:L1-L304 (deleted), which was the explicit numeric drift guard for Tutorial 19, but still advertises “drift guards” in CHANGELOG.md:L15 and docs/tutorials/README.md:L90-L96. I also could not find replacement assertion cells in docs/tutorials/19_dcdh_marketing_pulse.ipynb. Impact: the tutorial’s quoted numbers can now go stale silently as long as the notebook still executes. Concrete fix: restore the deleted test or move equivalent executed assertions into the notebook, then update the README/changelog if drift guards are intentionally being removed.

Path to Approval

  1. Resolve the by_path + controls contract for multi-baseline panels: document it in docs/methodology/REGISTRY.md as an explicit deviation from R and warn/guard at runtime, or keep the gate until broader parity support is established.
  2. Restore Tutorial 19 drift protection, either by undeleting tests/test_t19_marketing_pulse_drift.py or by adding equivalent executed assertions to docs/tutorials/19_dcdh_marketing_pulse.ipynb, and align the README/changelog text with that choice.

igerber added a commit that referenced this pull request Apr 25, 2026
…note

CI reviewer flagged that the new by_path + controls combination
silently produces point-estimate divergence from R on multi-baseline
switcher panels (R re-runs per-baseline residualization on each
path's restricted subsample; we residualize once globally). The
parity test docstring documented the deviation but REGISTRY.md and
the runtime did not.

Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a
  panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py
  inside the controls residualization block, after _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on
  multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)"
  paragraph to document the point-estimate deviation alongside the
  existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's
  per-path re-residualization (the prior comment misstated R's
  behavior as "residualize once globally")
- Update parity test class docstring to be precise about R's
  per-path call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
  * test_multi_baseline_panel_emits_r_deviation_warning — joiner +
    leaver + always-treated + never-treated panel triggers the warning
  * test_single_baseline_panel_does_not_emit_r_deviation_warning —
    standard 3-path joiners-only fixture does NOT trigger the warning

The single-baseline R-parity scenario (multi_path_reversible_by_path_controls)
remains exact-match (rtol ~1e-11) because all switchers in the DGP
share D_{g,1}=0 and R's per-path control pool reduces to the global
control pool we use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: a3de936dc77d572584975f16d62c42b5942e2e52


Overall Assessment

✅ Looks good

Executive Summary

  • Affected methods: ChaisemartinDHaultfoeuille.by_path and DID^X covariate residualization (controls=[...]), with inherited per-path bootstrap/placebo/sup-t surfaces.
  • Previous methodology P1 is resolved: the by_path + controls R-divergence is now documented in the estimator docstring and docs/methodology/REGISTRY.md, and a fit-time UserWarning is emitted on multi-baseline switcher panels (diff_diff/chaisemartin_dhaultfoeuille.py:L416-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1524, docs/methodology/REGISTRY.md:L641-L643).
  • I did not find new inline-inference or NaN/SE anti-pattern regressions in the changed estimator path.
  • Previous documentation/tests P2 remains open: the supplied diff deletes tests/test_t19_marketing_pulse_drift.py, but Tutorial 19 is still advertised as having “drift guards” and the notebook itself does not contain executable assertion cells (CHANGELOG.md:L15-L15, docs/tutorials/README.md:L90-L96, docs/tutorials/19_dcdh_marketing_pulse.ipynb:L319-L400).
  • Static review only; I did not run pytest in this sandbox.

Methodology

  • Severity: P3 informational. The prior undocumented deviation for by_path + controls is now mitigated. The implementation explicitly documents that Python residualizes once before path enumeration, warns when multi-baseline switcher panels can diverge from R, and adds parity/warning tests for the supported single-baseline case and the warning path (diff_diff/chaisemartin_dhaultfoeuille.py:L416-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1524, docs/methodology/REGISTRY.md:L641-L643, tests/test_chaisemartin_dhaultfoeuille.py:L6561-L6650, tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L870). Impact: no blocker under the review rubric because the deviation is documented in REGISTRY.md and surfaced at runtime. Concrete fix: none.

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings.

Tech Debt

  • No separate findings. The live Tutorial 19 issue below is not tracked in TODO.md, so it remains an unmitigated documentation/tests item rather than accepted deferred work.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. The PR removes the dedicated Tutorial 19 numeric drift test (tests/test_t19_marketing_pulse_drift.py:L1-L304 in the supplied diff), but the public-facing docs still claim the tutorial has drift guards (CHANGELOG.md:L15-L15, docs/tutorials/README.md:L90-L96). The notebook workflow still uses pytest --nbmake docs/tutorials/, which checks execution rather than quoted-number stability (.github/workflows/notebooks.yml:L50-L58), and the notebook tail is narrative-only rather than assert-backed (docs/tutorials/19_dcdh_marketing_pulse.ipynb:L319-L400). Impact: the tutorial’s quoted headline, joiner/leaver, placebo, and event-study numbers can now drift silently without CI catching prose mismatch. Concrete fix: either keep tests/test_t19_marketing_pulse_drift.py, or move equivalent executed assertions into docs/tutorials/19_dcdh_marketing_pulse.ipynb and keep the “drift guards” claim; otherwise, remove the “drift guards” wording from the changelog and tutorial catalog.

igerber and others added 2 commits April 25, 2026 18:44
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the
inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins
design-variant extensions; this PR is item #5: combine by_path=k with
controls=[...] (DID^X).

Architecture: the per-baseline OLS residualization at
chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel
BEFORE path enumeration, so all four downstream surfaces (analytical
SE, bootstrap SE, per-path placebos, per-path joint sup-t bands)
consume the residualized Y_mat automatically (Frisch-Waugh-Lovell).
Per-period effects remain unadjusted, consistent with the existing
controls + per-period DID contract.

Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)`
re-runs the per-baseline residualization on each path's restricted
subsample (path's switchers + same-baseline not-yet-treated controls).
On the multi_path_reversible DGP all switchers share baseline
D_{g,1}=0, so R's per-path control pool equals our global control
pool and the residualization coefficients coincide. Per-path point
estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5%
(Phase 2 envelope, inheriting the documented cross-path cohort-
sharing deviation).

Changes:
- Delete the gate at chaisemartin_dhaultfoeuille.py:988-992
- Update by_path docstring (remove `controls` from incompatible list,
  add inheritance paragraph)
- New R parity scenario `multi_path_reversible_by_path_controls` in
  benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden
  values
- New TestDCDHDynRParityByPathControls in
  tests/test_chaisemartin_dhaultfoeuille_parity.py
- New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py
  (12 tests covering analytical / bootstrap / placebo / sup-t / cband
  to_dataframe / per-period unadjusted / covariate_residuals round-
  trip / multi-covariate)
- Remove the `controls` parametrize entry from
  TestByPathGates::test_forbids_phase3_fit_kwargs
- Update REGISTRY.md (remove `controls` from gated-combos list, add
  inheritance sub-paragraph documenting the four-surface auto-
  inheritance)
- CHANGELOG: Unreleased > Added entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…note

CI reviewer flagged that the new by_path + controls combination
silently produces point-estimate divergence from R on multi-baseline
switcher panels (R re-runs per-baseline residualization on each
path's restricted subsample; we residualize once globally). The
parity test docstring documented the deviation but REGISTRY.md and
the runtime did not.

Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a
  panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py
  inside the controls residualization block, after _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on
  multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)"
  paragraph to document the point-estimate deviation alongside the
  existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's
  per-path re-residualization (the prior comment misstated R's
  behavior as "residualize once globally")
- Update parity test class docstring to be precise about R's
  per-path call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
  * test_multi_baseline_panel_emits_r_deviation_warning — joiner +
    leaver + always-treated + never-treated panel triggers the warning
  * test_single_baseline_panel_does_not_emit_r_deviation_warning —
    standard 3-path joiners-only fixture does NOT trigger the warning

The single-baseline R-parity scenario (multi_path_reversible_by_path_controls)
remains exact-match (rtol ~1e-11) because all switchers in the DGP
share D_{g,1}=0 and R's per-path control pool reduces to the global
control pool we use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the dcdh-by-path-controls branch from a3de936 to 028e7e1 Compare April 25, 2026 22:44
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 028e7e180952f627642d9f545fb2f43d51451202


Overall Assessment

✅ Looks good

Executive Summary

  • Affected methodology surface: ChaisemartinDHaultfoeuille.fit(..., by_path, controls) plus inherited per-path bootstrap/placebo/sup-t outputs. The gate lift is wired through the existing residualized-Y_mat pipeline rather than a separate estimator branch, which is the right shape for this feature. diff_diff/chaisemartin_dhaultfoeuille.py:L416-L445, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1535, diff_diff/chaisemartin_dhaultfoeuille.py:L1679-L1693
  • The Python-vs-R divergence on multi-baseline switcher panels is now documented in both the estimator docstring and the Methodology Registry, and surfaced at runtime via UserWarning; under the review rubric that is a documented deviation, so I do not count it as a methodology defect. diff_diff/chaisemartin_dhaultfoeuille.py:L424-L435, diff_diff/chaisemartin_dhaultfoeuille.py:L1505-L1524, docs/methodology/REGISTRY.md:L641-L643
  • I did not find new NaN/inference anti-patterns on the modified estimator path; bootstrap path/placebo propagation still goes through safe_inference and writes full-NaN inference tuples when bootstrap SEs are invalid. diff_diff/chaisemartin_dhaultfoeuille.py:L2990-L3048
  • One non-blocking issue remains: CHANGELOG.md now contradicts itself inside the same Unreleased section. The new entry says by_path + controls is supported, but the older by_path entry still says controls raises NotImplementedError. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
  • Re-review note: I am not carrying forward the prior Tutorial 19 drift-guard concern; the dedicated drift test file is present in the current tree. tests/test_t19_marketing_pulse_drift.py:L1-L16
  • Static review only; I could not execute the targeted pytest slices here because this sandbox does not have the pytest module installed.

Methodology

  • Severity: P3 informational. Impact: the changed estimator path matches the documented contract: residualize once before path enumeration, keep per-period DID on raw outcomes, and warn on the documented multi-baseline deviation from R. No unmitigated P0/P1 methodology findings. Concrete fix: none. diff_diff/chaisemartin_dhaultfoeuille.py:L416-L445, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1535, diff_diff/chaisemartin_dhaultfoeuille.py:L1679-L1693, docs/methodology/REGISTRY.md:L641-L643

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings.

Tech Debt

  • No separate findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the public Unreleased changelog is self-contradictory about whether by_path + controls is supported, which will mislead users and reviewers even though the code and registry are aligned. Concrete fix: remove controls from the older by_path incompatibility list, or consolidate the overlapping by_path bullets so compatibility is stated in one canonical place. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
  • Verification note: targeted test execution was not possible in this sandbox because pytest is unavailable.

Older Unreleased by_path entry (PR #357 origin) listed `controls`
in the gated-combos list, contradicting the new entry that says
`by_path + controls` is supported. Same internal contradiction
applied implicitly to the bootstrap, placebo, and sup-t bands
extensions that landed in the same Unreleased block — those weren't
in the gated list to begin with, but the entry didn't acknowledge
they're now supported.

Updated the by_path bullet to remove `controls` from the gated list
and explicitly acknowledge `n_bootstrap > 0`, `placebo=True`, joint
sup-t bands, and `controls` as supported (see dedicated entries
elsewhere in [Unreleased]).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e49dff9c73cb73afbeb785dbe9a4d017c76b667


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review: the prior changelog contradiction appears fixed; CHANGELOG.md now consistently says by_path + controls is supported. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
  • The gate lift itself is routed through the existing DID^X residualization path; I did not find new NaN/SE anti-patterns in the modified by_path plumbing.
  • P1 methodology blocker: the new docs/runtime warning treat “all switchers share one D_{g,1}” as sufficient for Python/R point-estimate parity, but the DID^X first stage is estimated on all baseline-matching, not-yet-switched observations. That means same-baseline panels can still differ from R if different paths contribute different pre-switch switcher rows.
  • Because the new warning only keys on multiple baseline values, those same-baseline divergence cases can be silent in production.
  • The new tests and benchmark comments reinforce that too-broad parity claim instead of covering the constant-baseline / heterogeneous-switch-timing case.
  • Static review only; pytest is not installed in this sandbox.

Methodology

  • Severity: P1
    Impact: ChaisemartinDHaultfoeuille.fit(..., by_path, controls) now documents and warns on the wrong parity condition. The new docstring/registry note say Python matches R whenever switchers share a single baseline, and the runtime warning only fires when switchers span multiple D_{g,1} values. But _compute_covariate_residualization() fits theta_hat_d on all baseline-matching, not-yet-switched observations (valid = d_mask & not_yet_switched & both_observed), so pre-switch rows from later-treated switchers remain in the Python first stage even when D_{g,1} is constant. R’s by_path + controls reruns that first stage on each path-restricted subsample. So parity depends on first-stage sample equivalence, not just baseline multiplicity, and same-baseline panels can still yield different per-path point estimates with no warning. This is an undocumented methodology deviation from the cited R behavior and from the new registry/docstring contract. diff_diff/chaisemartin_dhaultfoeuille.py:L424-L434, diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1519, diff_diff/chaisemartin_dhaultfoeuille.py:L4152-L4210, docs/methodology/REGISTRY.md:L641-L641, tests/test_chaisemartin_dhaultfoeuille_parity.py:L761-L784, tests/test_chaisemartin_dhaultfoeuille.py:L6562-L6659, benchmarks/R/generate_dcdh_dynr_test_values.R:L706-L719
    Concrete fix: base the warning/documentation on whether the DID^X first-stage sample differs path-by-path, not on np.unique(_switcher_baselines).size > 1 alone. If you do not want to implement that broader detection, then narrow the docs/REGISTRY/CHANGELOG/parity comments so they only claim exact parity for the specific validated special case, and add a regression for a constant-D_{g,1} panel with heterogeneous F_g / pre-switch rows.

Code Quality

No material findings beyond the methodology issue above.

Performance

No material findings.

Maintainability

No separate findings. The main maintainability risk is the incorrect methodology contract being duplicated across code comments, docs, tests, and benchmark notes.

Tech Debt

No separate findings. I did not find a TODO.md entry that would mitigate the P1 methodology issue.

Security

No findings.

Documentation/Tests

  • No separate blocker beyond the P1 methodology issue. The previous re-review P2 on contradictory changelog text appears resolved. CHANGELOG.md:L11-L11, CHANGELOG.md:L29-L29
  • Verification note: static review only; pytest is not available in this sandbox.

Path to Approval

  1. Fix the by_path + controls methodology contract: either broaden the runtime warning to cover any case where the path-restricted DID^X first-stage sample differs from the global first stage, or narrow the docs so they no longer claim exact parity on all single-baseline panels.
  2. Add a regression test for a by_path + controls panel with constant D_{g,1} but heterogeneous switch timing / pre-switch switcher rows across paths, and assert the intended behavior explicitly (warning fires, or parity holds if you implement a stronger equivalence condition).

…egression

Reviewer flagged that the parity condition documented for by_path +
controls (single-baseline switcher panel) might not be sufficient,
hypothesizing that R's per-path subset could exclude pre-switch
rows of other-path switchers and produce a different first-stage
residualization sample.

Verified the hypothesis is empirically falsified and analytically
incorrect by reading R/R/did_multiplegt_dyn.R lines 401-405 line-by-
line. R's per-path subset for path B includes:
  - Rows where path_XX == B  (path-B switchers, all rows)
  - OR rows where yet_to_switch=1 AND baseline matches  (pre-switch
    rows of any group with matching baseline, regardless of path)

So R's per-path first-stage sample equals (pre-switch rows of all
switchers with matching baseline + all rows of never-switchers with
matching baseline) — bit-identical to our global first-stage sample
under single-baseline switcher panels, regardless of how F_g varies
across paths or within a path. Empirical confirmation: scenario 16
(`multi_path_reversible_by_path_controls`) has switcher F_g spanning
[0..6] across 4 distinct paths under D_{g,1}=0 and Python matches
R to rtol ~1e-11 across all (path, horizon) cells.

Strengthened the contract:
- Expanded the warning code comment to spell out R's per-path subset
  construction (citing R source line numbers) and why single-baseline-
  switcher is the precise parity condition (control-pool equivalence
  via the OR clause), with the empirical scenario reference baked in
- Updated REGISTRY.md "Per-path covariate residualization (DID^X)"
  paragraph to cite R lines 401-405 and clarify never-switcher
  baselines do not affect parity
- New regression test
  `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r`
  uses the golden-value scenario (single-baseline, heterogeneous F_g
  across paths) to assert: (a) no UserWarning fires, (b) per-path
  point estimates are produced finite. The numeric R-parity is locked
  separately in TestDCDHDynRParityByPathControls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d24ae257aca4b7153cc6942dfbbe9295ee629ce6


Overall Assessment

Looks good

Executive Summary

  • Re-review: the prior P1 on the parity condition appears addressed. The updated reasoning around single-baseline panels matches the first-stage sample actually used in diff_diff/chaisemartin_dhaultfoeuille.py:L1493-L1548, and the PR adds targeted warning/parity coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6562-L6745 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L758-L860.
  • I did not find new P0/P1 estimator bugs in the by_path + controls gate lift. The residualized Y_mat now flows through the existing analytical/bootstrap/placebo/sup-t paths, while per_period_effects correctly stays on raw outcomes via diff_diff/chaisemartin_dhaultfoeuille.py:L1567-L1572 and diff_diff/chaisemartin_dhaultfoeuille.py:L1703-L1718.
  • One P3 informational issue remains: the new “single-baseline => exact R match” wording is broader than the library’s already-documented DID^X equal-cell-weighting deviation from R in docs/methodology/REGISTRY.md:L629-L629.
  • Static review only; I did not run pytest in this sandbox.

Methodology

  • Severity: P3. Impact: the new docstring/changelog/registry/benchmark/test commentary says single-baseline switcher panels “match R exactly,” but the existing DID^X note still documents a separate Python-vs-R first-stage weighting difference on uneven (g,t) cell sizes. So single-baseline panels can still differ from R for that already-documented reason even though the new multi-baseline warning will not fire. References: docs/methodology/REGISTRY.md:L629-L629, diff_diff/chaisemartin_dhaultfoeuille.py:L424-L434, docs/methodology/REGISTRY.md:L641-L641, CHANGELOG.md:L11-L11, benchmarks/R/generate_dcdh_dynr_test_values.R:L706-L719, tests/test_chaisemartin_dhaultfoeuille_parity.py:L770-L785. Concrete fix: qualify the exact-match claim to one-observation-per-cell / cell-aggregated inputs, or explicitly cross-reference the existing DID^X weighting-deviation note when describing the single-baseline parity condition.

Previous P1 resolved: the updated sample-equivalence argument for heterogeneous F_g on single-baseline panels is consistent with _compute_covariate_residualization() and the new regression coverage.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

  • Severity: P3. Impact: the Step 7b header comment still says DID^X makes “ALL downstream DID computations (per-period and per-group multi-horizon) automatically produce covariate-adjusted estimates,” but the actual implementation intentionally excludes per_period_effects. This is comment drift, not a runtime bug, but it now conflicts with the contract added elsewhere in the PR. References: diff_diff/chaisemartin_dhaultfoeuille.py:L1466-L1472, diff_diff/chaisemartin_dhaultfoeuille.py:L1567-L1572, diff_diff/chaisemartin_dhaultfoeuille.py:L1703-L1718. Concrete fix: narrow that header comment to the per-group multi-horizon path and explicitly exclude the per-period DID surface.

Tech Debt

No separate findings. I did not see a TODO.md entry that changes the assessment of the items above.

Security

No findings.

Documentation/Tests

  • No separate blocker beyond the P3 wording/comment issues above.
  • The new tests cover the critical re-review cases well: lifted gate, multi-baseline warning, single-baseline heterogeneous-F_g no-warning case, bootstrap/placebo/sup-t inheritance, and the new R parity fixture in tests/test_chaisemartin_dhaultfoeuille.py:L6148-L6745 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L758-L860.
  • Verification note: static review only; tests not run in this sandbox.

…scope

P3 #1 (Methodology): qualified the "exact R match" claim across
docstring / REGISTRY / CHANGELOG / R-generator comment / parity test
docstring with a cross-reference to the existing DID^X cell-weighting
deviation (Python's first-stage uses equal cell weights, R weights
by N_gt). The two coincide on one-observation-per-(g,t) panels (the
common cell-aggregated regime that the parity scenario uses). The
multi-observation-per-cell deviation is independent of the by_path
lift and was already documented in REGISTRY's "Note (Phase 3 DID^X
covariate adjustment)".

P3 #2 (Maintainability): narrowed the Step 7b header comment in
chaisemartin_dhaultfoeuille.py:1465-1473 to spell out that DID^X
residualization applies to the per-group multi-horizon path
(event_study_effects, overall_att, joiners/leavers, by_path,
placebos, sup-t bands) but intentionally excludes per_period_effects
which stays on raw outcomes per the existing "Note (Phase 3 DID^X
covariate adjustment)" contract. Documentation-only fix; no runtime
behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fb7a9129a35d08afabe0fdfe1625bc8080eb452e


Overall Assessment
✅ Looks good

Executive Summary

Methodology

  • No material findings. The multi-baseline Python-vs-R point-estimate difference is now documented in the registry and surfaced as a runtime UserWarning, so this is a documented deviation rather than an undocumented mismatch.

Code Quality

  • No material findings.

Performance

  • No findings.

Maintainability

Tech Debt

  • No new findings. I did not see a relevant TODO.md entry needed for this change.

Security

  • No findings.

Documentation/Tests

Test name `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r`
oversold its scope — the body only checks warning suppression and
finite effects. The numeric R-parity assertion (rtol ~1e-11 per-path)
lives in TestDCDHDynRParityByPathControls which fits the same
scenario and compares against the golden values.

Renamed to `test_single_baseline_heterogeneous_F_g_does_not_warn`
and updated docstring + inline comment to spell out: this test
locks the warning-suppression invariant; the numeric R-parity check
is in the parity test class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6785d6a8a29da219c361f2e28029fd64f87fbcd7


Overall Assessment
✅ Looks good

Executive Summary

  • No unmitigated P0/P1 issues found in the by_path + controls gate lift.
  • The estimator change is narrow: by_path now reuses the existing DID^X residualization before path enumeration, and the downstream per-path analytical, bootstrap, placebo, and sup-t paths all consume the residualized Y_mat, while per_period_effects stays on raw outcomes as documented in diff_diff/chaisemartin_dhaultfoeuille.py:L1469-L1581, diff_diff/chaisemartin_dhaultfoeuille.py:L1701-L1728, and diff_diff/chaisemartin_dhaultfoeuille.py:L2048-L2215.
  • The only substantive R divergence on the new path is the multi-baseline-switcher case, and that is now explicitly documented in docs/methodology/REGISTRY.md:L641-L643 and surfaced with a fit-time UserWarning in diff_diff/chaisemartin_dhaultfoeuille.py:L1538-L1557, so this is a documented deviation rather than an undocumented methodology mismatch.
  • Re-review: the prior P3 wording/naming concern appears addressed. The heterogeneous-F_g test now clearly scopes itself to warning behavior and points numeric parity to the dedicated parity fixture in tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L876.
  • Coverage on the new combination is solid: analytical path, bootstrap path, per-path placebos, sup-t bands, raw per-period contract, warning behavior, and R parity all have direct regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6147-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L756-L876.
  • Static review only: pytest, numpy, and pandas are not installed in this sandbox, so I did not execute the suite.

Methodology

  • Severity P3 (informational). Impact: by_path + controls intentionally diverges from R on multi-baseline switcher panels because Python residualizes once globally while R re-runs residualization per path; this is now documented in docs/methodology/REGISTRY.md:L641-L643 and warned at runtime in diff_diff/chaisemartin_dhaultfoeuille.py:L1538-L1557, so users are not exposed to a silent methodology mismatch. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked deferrable item identified. I did not see anything in the changed path that needs a new TODO.md entry beyond the documented deviations already carried in docs/methodology/REGISTRY.md and the existing review-debt table at TODO.md:L51-L59.

Security

  • No findings.

Documentation/Tests

  • No findings. The prior AI-review P3 appears resolved: the warning-suppression test is now described consistently with what it actually asserts, and the numeric parity check lives in the dedicated parity test file at tests/test_chaisemartin_dhaultfoeuille.py:L6663-L6757 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L800-L876.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant