From 963a409df50c0c5dbb72b0cd945f439f75fd0bd2 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 17:56:53 -0400
Subject: [PATCH 1/6] Lift by_path + controls gate (DID^X residualization)

PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the
inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins
design-variant extensions; this PR is item #5: combine by_path=k with
controls=[...] (DID^X).

Architecture: the per-baseline OLS residualization at
chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE
path enumeration, so all four downstream surfaces (analytical SE,
bootstrap SE, per-path placebos, per-path joint sup-t bands) consume
the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period
effects remain unadjusted, consistent with the existing controls +
per-period DID contract.

Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)`
re-runs the per-baseline residualization on each path's restricted
subsample (path's switchers + same-baseline not-yet-treated controls).
On the multi_path_reversible DGP all switchers share baseline
D_{g,1}=0, so R's per-path control pool equals our global control pool
and the residualization coefficients coincide. Per-path point estimates
match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2
envelope, inheriting the documented cross-path cohort-sharing
deviation).
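A minimal sketch of the ordering in the Architecture paragraph above:
residualize once per baseline, then let each path index into the
already-residualized array. `residualize_by_baseline`, the array names,
and the noise-free toy data are hypothetical illustrations, not the code
at chaisemartin_dhaultfoeuille.py:1498; the real fit may include
regressors that this sketch omits.

```python
import numpy as np


def residualize_by_baseline(dy, dx, baseline, is_control):
    """Hypothetical helper (not the library's API): per-baseline OLS of dy on
    dx, fit on control rows only, with the fitted covariate part subtracted
    from every row of that baseline stratum."""
    dy_res = np.asarray(dy, dtype=float).copy()
    for b in np.unique(baseline):
        stratum = baseline == b
        fit_rows = stratum & is_control
        # Coefficients come from controls only; switchers never enter the fit.
        theta, *_ = np.linalg.lstsq(dx[fit_rows], dy[fit_rows], rcond=None)
        dy_res[stratum] = dy[stratum] - dx[stratum] @ theta
    return dy_res


# Toy data: two baselines with different covariate slopes, a constant
# switcher effect of 2.0, and no noise so the checks are exact up to rounding.
rng = np.random.default_rng(0)
n = 400
baseline = rng.integers(0, 2, size=n)
is_control = rng.random(n) < 0.5
dx = rng.normal(size=(n, 1))
slope = np.where(baseline == 0, 1.5, -0.5)
effect = np.where(is_control, 0.0, 2.0)
dy = effect + slope * dx[:, 0]

# Step 1: residualize once, before any path is defined.
dy_res = residualize_by_baseline(dy, dx, baseline, is_control)

# Step 2: "paths" are just masks over the same residualized array.
switcher = ~is_control
even = np.arange(n) % 2 == 0
path_masks = {"even": switcher & even, "odd": switcher & ~even}
path_means = {name: dy_res[mask].mean() for name, mask in path_masks.items()}
print(path_means)
```

Because the covariate part is removed before any path mask is applied,
every per-path consumer in this sketch sees the same residualized
values; no per-path refit is involved.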

Changes:
- Delete the gate at chaisemartin_dhaultfoeuille.py:988-992
- Update by_path docstring (remove `controls` from incompatible list,
  add inheritance paragraph)
- New R parity scenario `multi_path_reversible_by_path_controls` in
  benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden
  values
- New TestDCDHDynRParityByPathControls in
  tests/test_chaisemartin_dhaultfoeuille_parity.py
- New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py
  (12 tests covering analytical / bootstrap / placebo / sup-t / cband
  to_dataframe / per-period unadjusted / covariate_residuals
  round-trip / multi-covariate)
- Remove the `controls` parametrize entry from
  TestByPathGates::test_forbids_phase3_fit_kwargs
- Update REGISTRY.md (remove `controls` from gated-combos list, add
  inheritance sub-paragraph documenting the four-surface
  auto-inheritance)
- CHANGELOG: Unreleased > Added entry

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                                  |   1 +
 benchmarks/R/generate_dcdh_dynr_test_values.R |  39 ++
 benchmarks/data/dcdh_dynr_golden_values.json  | 114 ++++
 diff_diff/chaisemartin_dhaultfoeuille.py      |  23 +-
 docs/methodology/REGISTRY.md                  |   2 +-
 tests/test_chaisemartin_dhaultfoeuille.py     | 493 +++++++++++++++++-
 ...test_chaisemartin_dhaultfoeuille_parity.py | 109 ++++
 7 files changed, 770 insertions(+), 11 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ff37dbc2..efc0c411 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`.
The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract. 
- **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. 
Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology. - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. 
The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). 
See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract. - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). 
See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
index 6da79b8d..f507bcfe 100644
--- a/benchmarks/R/generate_dcdh_dynr_test_values.R
+++ b/benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -699,6 +699,45 @@ scenarios$multi_path_reversible_by_path_placebo <- list(
   results = extract_dcdh_by_path(res15, n_effects = 3, n_placebos = 2)
 )
 
+# Scenario 16: multi_path_reversible + by_path=3 + controls="X1" (Phase 3
+# Wave 3 #5: by_path + DID^X residualization). Same deterministic DGP
+# and n_periods=10 as scenarios 14/15, with a confounding covariate X1
+# added via the same `add_covariate` helper used by scenario 10's
+# `joiners_only_controls`. Per-baseline OLS residualization runs once
+# globally before path enumeration on both Python and R sides
+# (verified against `chaisemartinPackages/did_multiplegt_dyn` source —
+# `did_multiplegt_by_path` calls `did_multiplegt_main()` once with the
+# global controls residualization, then disaggregates per-path through
+# aggregation). Per-path event-study point estimates and switcher
+# counts must match R exactly; per-path SE within the documented Phase
+# 2 envelope and inherits the cross-path cohort-sharing deviation from
+# R documented for `path_effects`. Single covariate keeps the scenario
+# tight; multi-covariate is exercised via internal regression tests.
+cat(" Scenario 16: multi_path_reversible_by_path_controls\n")
+d16 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
+                      pattern = "multi_path_reversible", seed = 116,
+                      L_max = 3)
+d16 <- add_covariate(d16, seed = 216, x_effect = 1.5)
+res16 <- did_multiplegt_dyn(
+  df = d16, outcome = "outcome", group = "group", time = "period",
+  treatment = "treatment", effects = 3, by_path = 3, controls = "X1",
+  ci_level = 95
+)
+scenarios$multi_path_reversible_by_path_controls <- list(
+  data = list(
+    group = as.numeric(d16$group),
+    period = as.numeric(d16$period),
+    treatment = as.numeric(d16$treatment),
+    outcome = as.numeric(d16$outcome),
+    X1 = as.numeric(d16$X1)
+  ),
+  params = list(pattern = "multi_path_reversible",
+                n_switcher_groups = N_GOLDEN, n_realized_groups = N_GOLDEN + 40L,
+                n_periods = 10, seed = 116, effects = 3, by_path = 3,
+                controls = "X1", ci_level = 95),
+  results = extract_dcdh_by_path(res16, n_effects = 3)
+)
+
 # ---------------------------------------------------------------------------
 # Write output
 # ---------------------------------------------------------------------------
diff --git a/benchmarks/data/dcdh_dynr_golden_values.json b/benchmarks/data/dcdh_dynr_golden_values.json
index 0014e0dd..da2a821a 100644
--- a/benchmarks/data/dcdh_dynr_golden_values.json
+++ b/benchmarks/data/dcdh_dynr_golden_values.json
@@ -909,6 +909,120 @@
         }
       ]
     }
+  },
+  "multi_path_reversible_by_path_controls": {
+    "data": {
+      "group": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 
66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 71, 71, 71, 71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 82, 82, 82, 82, 82, 82, 82, 82, 82, 82, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 89, 89, 89, 89, 89, 89, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 92, 92, 92, 92, 92, 93, 93, 93, 93, 93, 93, 93, 93, 93, 93, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 96, 96, 96, 96, 96, 96, 96, 96, 96, 96, 97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 103, 103, 103, 104, 104, 104, 104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 109, 109, 109, 109, 109, 109, 109, 109, 109, 109, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 111, 111, 111, 111, 111, 111, 111, 111, 111, 111, 112, 112, 112, 112, 112, 112, 112, 112, 
112, 112, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 117, 117, 117, 117, 117, 117, 117, 117, 117, 117, 118, 118, 118, 118, 118, 118, 118, 118, 118, 118, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119], + "period": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 
2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], + "treatment": [0, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], + "outcome": [14.5657763521, 16.4315531713, 16.5448960407, 17.7507011996, 18.4631249524, 18.9280948508, 18.8075994031, 19.6637467868, 19.7962620006, 21.2471126124, 10.1284194484, 12.3601565176, 12.7777051536, 12.7653019317, 13.3450639448, 14.6872748896, 14.4056790723, 15.6207638829, 16.702943209, 16.8774656389, 15.7866352064, 18.4951442814, 19.0571910118, 20.2235251484, 20.125310386, 20.5368502826, 
21.4975884993, 20.3934393581, 21.5459227858, 22.8925614833, 12.7035813034, 14.375545679, 16.5999000162, 17.0191745167, 16.3429034042, 16.8747500709, 18.4401308315, 18.8146607145, 17.8670358242, 20.3277992285, 9.33368404, 11.2956239127, 11.7589281047, 14.0452754663, 13.1054095039, 14.1975233967, 14.8472027751, 15.9193203557, 16.2556871391, 18.0404048159, 9.4615906633, 10.8610991043, 11.8999113247, 13.1370129373, 14.0294404193, 14.5795298566, 13.2090892551, 14.9197915287, 14.8059926677, 15.0473301581, 11.2072537142, 14.1713446324, 15.6004650118, 16.1095134046, 17.3096706701, 16.5563675517, 17.0859502797, 17.9986543971, 18.218523947, 19.6231400429, 18.2596996064, 21.3804531545, 21.7126465111, 22.1909239891, 22.7143877969, 23.0506931585, 24.6056328237, 24.3400011039, 25.8795649319, 25.1766487235, 6.0539332691, 9.2827363552, 10.1419158332, 10.1804100218, 10.1041361372, 11.0130625455, 13.006337731, 13.3153829831, 12.4276581547, 12.810364752, 9.2724266834, 12.5741069777, 11.6597577994, 12.7113096058, 12.9509999892, 13.6423328621, 14.8342207764, 15.9621708028, 15.4689039967, 17.3072434697, 11.3219486881, 14.0840590324, 13.795564971, 15.6570827197, 14.938412928, 16.1481233354, 17.1322090331, 17.8858162993, 17.5668166816, 19.0545207839, 12.2914805711, 15.6629748933, 14.6914887807, 15.3766924372, 16.6390957942, 17.0822943908, 18.1280500905, 19.1822672074, 18.9732195092, 20.3874655862, 10.6825771133, 15.0546444976, 13.4833947045, 15.4148767803, 15.5657994444, 16.4199373824, 16.2889765181, 16.3087368004, 18.621061747, 18.0998481159, 11.8955581479, 14.7734643956, 14.324237986, 16.5703338307, 16.629521326, 16.6795762162, 16.7152895526, 16.2734294204, 18.1190854382, 19.0253597805, 7.2540157915, 10.7247914627, 10.8447831756, 12.2050597282, 12.9154242814, 12.5076217942, 13.4976181511, 13.8831055294, 14.1504293352, 14.5822070875, 11.6117225585, 13.3019157147, 13.887799041, 14.2466626349, 13.8029829907, 16.1803409705, 15.677893397, 17.11020835, 17.2663475931, 17.6938417319, 
9.1622754401, 10.543076325, 12.0120982581, 13.0585709292, 11.9914860973, 13.0708440072, 14.2151541934, 14.6876325281, 14.2324164084, 15.8867899197, 8.3370346131, 10.799599941, 12.2430260329, 12.3800347049, 12.2118112757, 13.6412349656, 13.8337387201, 15.016356586, 16.1434473654, 16.0286274743, 8.0722426953, 11.5002529442, 11.4053534371, 13.0022956926, 13.0727130409, 12.6914697693, 14.0972149707, 13.7882953765, 14.704792619, 15.1616940487, 10.2899984004, 13.2596927701, 12.7779327001, 13.4588385023, 15.2797864525, 14.6878757718, 15.0941321692, 16.4916001094, 16.0066987722, 16.5833035965, 5.3540052463, 5.6269585209, 8.5873162307, 8.9947299427, 8.9186139692, 8.82370695, 11.6886704837, 10.6880480326, 11.5783427219, 13.0820601659, 12.0450967673, 11.933890191, 15.46602847, 15.4110632993, 15.7399761936, 15.5075454386, 17.5233999346, 16.9954057407, 18.7992544013, 18.1548317776, 10.1848225763, 10.5010514932, 11.4723535116, 13.2442991142, 14.6483569246, 15.3485384828, 15.1545739673, 15.8735028116, 14.6112934805, 17.9358010251, 13.9118473843, 13.9899552426, 17.4560877621, 17.5109050004, 17.4406289053, 17.5551688351, 19.2315146529, 18.3554671954, 20.1194508074, 20.6950402272, 6.1739878805, 7.1777420003, 10.2832471659, 10.3824069545, 11.8157674581, 12.0437980311, 13.1461370615, 13.6212692497, 13.0916129549, 14.118381701, 16.4900716813, 17.7046422758, 21.5271683741, 19.6375835618, 22.092084663, 23.2363915744, 22.1440063166, 23.203085255, 23.2345996687, 25.4605878861, 11.4988438501, 10.5003714863, 14.3644676428, 15.281022548, 16.7908299872, 15.4379766157, 16.716033074, 17.8194291416, 19.2650305648, 18.3938390059, 14.4219763071, 14.785499764, 17.8792819676, 17.4200682002, 18.8438347794, 18.9525199733, 20.4273345363, 20.7709021388, 19.9026586054, 20.4739998031, 8.5182158215, 8.5984605112, 11.6572179916, 11.5715369865, 14.0105578192, 13.3521097633, 14.473586392, 14.5889268059, 14.9482975597, 15.4283386394, 11.9322938308, 14.1492251143, 15.8140224473, 17.6485464971, 16.3129829229, 
18.1187402972, 17.9490665436, 19.3125165947, 19.5918207695, 20.1325185552, 11.9442576342, 11.6059135305, 15.9376754373, 15.4085308956, 15.5178288282, 16.5539932409, 16.467134288, 17.7566365393, 18.0713437424, 18.6638065693, 9.0711303102, 10.5721016112, 12.5411652057, 13.2984567827, 14.2484386622, 13.7220057248, 14.8280269976, 15.2830517496, 15.5752258776, 16.5698283231, 7.3126972697, 6.7983964406, 9.9781371615, 10.5085474831, 10.7713446035, 11.9694551511, 12.7699720165, 12.8354835507, 12.7995899433, 13.8145715668, 5.5003288257, 7.2536953191, 9.7603807827, 10.7265193804, 10.304479232, 11.2536927418, 12.27719387, 12.5777584433, 12.4964554955, 13.3156529311, 7.2137455779, 8.0948212638, 10.6333685527, 11.8181285867, 11.3844640723, 11.5904181485, 11.6523673301, 13.5039131014, 13.9380243503, 14.3149706968, 9.6540766772, 9.8084694314, 13.9731406831, 13.8542678069, 13.5848136862, 15.0561127096, 14.1997486244, 16.1026971856, 16.6886253553, 16.742756502, 8.5787492059, 8.7873491009, 10.3534009194, 11.142973218, 11.3927467924, 12.9279827713, 14.5382598969, 14.4094882036, 14.4341664466, 14.586350965, 8.579988086, 10.5423980062, 11.5804808682, 12.2083800518, 12.8869413471, 13.2623184334, 14.2098851174, 13.2863937495, 14.1005981266, 15.5006591545, 7.6441278173, 8.6325037293, 11.0952413093, 11.6729015348, 12.3466541144, 12.5807345267, 13.222186613, 12.0146985994, 13.722051794, 14.5994965593, 11.8207510234, 13.7605153922, 16.1443166865, 16.2887772274, 17.3487912812, 19.0892577184, 17.4529109317, 17.8857245393, 19.848776404, 20.3986789384, 13.3409649621, 13.4282585107, 13.5742501859, 17.3963755538, 15.8329435915, 15.1994877385, 15.71495415, 16.3397960058, 17.5063321693, 17.7614922034, 13.9130140845, 13.4248320168, 14.097572696, 17.381796019, 16.7529372098, 17.0138715208, 16.879122302, 17.2248705803, 17.4304539674, 17.7935284122, 6.9193101486, 7.5198491075, 6.7421146014, 11.1140817179, 11.1472080733, 9.6894081375, 10.0308042258, 9.839260373, 11.0730993296, 11.7545731454, 
7.2035935625, 7.6876094111, 9.7038449702, 11.3983197828, 12.5610189005, 11.2816834897, 11.5011542545, 12.0454867116, 12.952061067, 13.1055798298, 10.0634990922, 11.4008124366, 11.488542128, 13.8130509498, 14.4005867666, 13.5172061465, 13.4175873433, 15.509104808, 14.2041752237, 14.7293351556, 11.3875853905, 12.9071328972, 12.5307578314, 14.962156757, 15.4188852619, 13.0943310555, 15.1811474116, 15.4747812387, 16.5202013658, 15.8184678004, 10.4053531236, 9.4001936622, 10.3758721103, 13.3825417576, 14.3097833368, 12.1682463105, 12.3283770445, 12.9364999894, 13.6247829303, 13.7879118841, 15.7509067054, 16.2380760002, 16.3355434807, 18.7886473142, 19.7603495752, 18.7743574914, 18.7133050645, 18.0127794516, 19.1549611947, 19.3785918431, 13.3527982185, 13.5300842272, 13.0769921963, 16.7786943516, 16.7904402302, 15.6208289168, 15.2284276002, 15.8520508447, 16.4858379602, 17.5825096387, 8.2801667283, 9.7191703298, 10.2109312935, 12.5049641541, 12.9029230528, 11.2913124676, 11.8787046793, 12.4268419066, 13.6759198097, 13.7439941894, 7.0291214542, 6.1889604277, 8.6652642277, 9.6062245637, 9.8816855302, 8.7794614498, 8.773800402, 9.7834578534, 10.5277432491, 10.7204518362, 15.1221707937, 15.5887588405, 15.1506118254, 18.6539008209, 19.70129324, 16.3999990294, 18.2905670002, 17.6004039124, 19.0851874041, 20.0389204952, 11.2961520844, 12.2750471229, 11.6709837987, 14.7511536733, 14.6128097635, 14.6075144028, 14.9106901608, 16.4099689981, 15.6512784803, 16.8978637512, 11.8162778667, 12.185081922, 12.4536027774, 16.428888552, 15.7613364402, 14.5979195222, 14.9366590903, 15.5307021482, 15.9896210212, 16.2147282393, 9.2875586724, 10.0295859516, 8.8932584783, 11.73609812, 11.9542338529, 10.6325119047, 11.7257438913, 11.5609345709, 12.3731188328, 12.7591980812, 6.7402031348, 8.4412593186, 9.6977718343, 9.7831791021, 11.347426083, 12.1584506317, 11.1626912093, 11.3832335735, 10.8164452991, 12.5023313357, 12.3793124505, 12.6840120922, 13.7926846747, 14.1849385033, 15.4109344261, 
17.8421576056, 15.1826300933, 16.3319813242, 16.461603734, 16.6944855123, 7.9571505221, 8.7885252938, 8.9279724081, 8.2582777344, 12.1050412615, 12.1668136103, 11.5214410063, 11.7192982432, 13.0075693192, 12.8260365083, 11.682189358, 11.247572521, 11.9969743821, 14.0713627372, 16.4589816647, 17.5772216764, 14.9184650387, 16.3931346293, 17.0666344707, 16.7231109093, 6.8657141913, 6.0565169565, 7.0019183144, 8.7121090463, 10.6211464569, 11.0738569906, 9.3274322486, 11.0261313089, 11.3557752621, 11.3341912597, 12.2839027881, 14.0053484469, 13.128342705, 14.6745131865, 16.6209948988, 16.8361231432, 16.4233192687, 15.7775032918, 16.8207629923, 17.3846079455, 9.7588940362, 10.7779665214, 9.6522999476, 11.1569815384, 13.6066967184, 14.3953674265, 12.6429942556, 13.6183832757, 15.2689473038, 15.5707086684, 10.1269469671, 10.7700890894, 10.4258006925, 11.2684316412, 15.2076882422, 15.4544076813, 14.0084157571, 14.3262372474, 14.4558209785, 15.2805859908, 10.5241069116, 9.5962627544, 10.580139289, 12.0069914797, 14.4128163148, 13.8897058584, 13.5893758655, 14.7429656101, 14.3147587383, 15.7328922661, 4.8438949515, 6.587106256, 5.9392680152, 7.1215764805, 9.6997302943, 9.8964004735, 9.3881454205, 9.1179502458, 9.7469106314, 10.307942024, 11.0935423114, 11.778403317, 11.9203539045, 11.9172399712, 13.2836629563, 15.6065546773, 15.0786184505, 15.4086396311, 15.4007641038, 15.923070026, 8.1812795393, 10.0108301258, 10.7355125987, 11.8176238156, 10.4636236953, 14.4268236345, 12.2865306902, 11.6446483062, 14.1122021078, 16.1959973122, 6.8393038097, 6.6916220638, 8.280017756, 8.6645340794, 8.5848375893, 11.9887102554, 10.1910752162, 10.8429874664, 11.588487525, 11.3826743434, 10.320702028, 10.1361483842, 10.5414079433, 10.5798719294, 11.9067606554, 14.3007952709, 12.705472364, 12.786762742, 13.77022693, 14.01671708, 6.860308975, 7.040954534, 7.2978541904, 7.3288527166, 8.3983927516, 11.2774260678, 9.7614498638, 9.1395301284, 11.1085583904, 10.407607537, 7.9795182913, 8.2202407764, 
9.8878306274, 9.924751601, 9.6739570342, 13.2236088883, 11.5959625407, 12.6062133297, 14.1510495821, 13.6304449721, 8.3775527591, 8.2564522388, 9.7140362202, 9.6065916016, 9.6582400641, 11.7354953591, 10.4108631181, 11.6334952076, 11.7836884677, 13.2359234782, 12.7294562428, 13.140982459, 14.51121146, 14.4926690306, 14.200609898, 17.8916342244, 16.8854239118, 16.3761514534, 17.6430236622, 18.3735078954, 10.3097707972, 11.0558735193, 12.673773866, 12.8168636342, 12.7391359408, 15.44103716, 13.7308061468, 12.8387670254, 13.640055089, 14.9558993768, 7.1931432346, 7.4624261663, 7.4311404359, 9.1951962113, 9.5298558719, 11.709198488, 11.4538805844, 11.5510935737, 11.3959153198, 12.6733050737, 11.5693768753, 11.9237395426, 13.6197300442, 12.6856386302, 14.0832270053, 14.5366609329, 16.2729030932, 15.6886464583, 18.0304562748, 19.0156881983, 9.6340738858, 9.351455107, 10.4699955083, 10.7502109214, 12.868074309, 11.9076735882, 14.0542282633, 13.8692549631, 17.3918179835, 17.5303543246, 6.9554328546, 8.049630549, 7.6419902105, 8.4614092239, 9.0187852196, 10.5816291609, 13.0626038902, 10.3278252973, 13.7282770029, 13.6176366427, 10.8525228314, 10.610389946, 10.4428673947, 12.1398532976, 11.8534005547, 13.3484767509, 17.123793655, 14.4835070345, 16.9159173803, 17.3990154202, 9.1102099063, 8.5810329512, 9.087225583, 10.4370381965, 9.5306743127, 10.931725487, 14.545985431, 12.0668969588, 14.8818338198, 14.1341374005, 12.4876383975, 10.1951499518, 11.0351082082, 12.8474434025, 12.4875266234, 13.2070530238, 13.3871521322, 15.5081801432, 16.2533199335, 15.5644585267, 8.2250819944, 10.3973030034, 9.9613076506, 11.7779639711, 10.9688495762, 12.1115793627, 12.3696989466, 12.5840563943, 14.2990244491, 14.1894233239, 9.7828593904, 9.3947837331, 10.8455152414, 11.1496054767, 12.4865980715, 12.5910281492, 11.9817460825, 12.8461291683, 13.1485720693, 14.7037444349, 11.4388124831, 12.7753750645, 11.1674337068, 12.5052065428, 13.054307334, 14.0952122549, 13.8029699513, 14.2308191071, 
15.0435433138, 15.8241745023, 14.2030429959, 12.7811975842, 15.2354842723, 15.3392359656, 15.0888292533, 15.3836596898, 16.672367426, 16.9981559822, 17.321031339, 17.5003089103, 11.0838898191, 11.8961216637, 11.5791078133, 12.2604807927, 12.3935988404, 13.1733124437, 13.295398874, 14.742475762, 14.6647216417, 15.5592641772, 8.2171313686, 9.4436191515, 9.4326031779, 10.2478935176, 10.9751972241, 11.3213989511, 12.2576021787, 11.5358232025, 13.1062635691, 13.2433062797, 10.3303696721, 11.2820228209, 12.8347796591, 12.0175503988, 13.7161801334, 13.7206880992, 13.2039148852, 14.8303020288, 16.0208486626, 16.4056155847, 3.7568423631, 4.2406618617, 4.1718059424, 4.6011807045, 5.1390312254, 5.5514219452, 7.4198139193, 7.3190029225, 7.4538768108, 7.8583036016, 10.8267116454, 11.1295251, 11.7509838565, 13.8012619502, 12.2868804441, 14.3175922843, 13.0663037182, 14.5742292117, 14.8115407052, 15.5518266977, 14.6462449315, 16.7933150099, 16.3282269565, 17.5308929384, 17.8717178291, 19.1554218629, 19.074679525, 20.3850121543, 21.1794166537, 20.268278164, 7.7219206011, 8.0236491234, 8.5111675732, 9.9043170934, 9.7323366963, 10.5674596524, 10.7095797317, 12.1568342619, 12.2290734908, 12.6508304786, 8.5707478854, 8.9435264515, 9.5865134209, 10.595232762, 10.1227466467, 11.9090659754, 11.6749502286, 12.9308143337, 13.4581253188, 13.7731538045, 10.9838856864, 11.8819649664, 12.320490454, 12.6971778938, 13.1864032152, 12.8898650472, 14.2031075177, 15.5676802909, 14.9248528644, 16.0524798841, 10.6983688349, 9.5495753513, 9.9996536035, 9.9856168881, 11.3466480813, 12.7094662386, 13.0380948639, 13.1303104397, 14.9721288193, 14.5234582724, 9.3837998046, 9.3942979703, 9.2855678601, 10.0746588312, 11.173619008, 12.0618644799, 11.5987242415, 12.3284882986, 13.6134381406, 11.6046773676, 13.3644867716, 13.2424552355, 13.1295398831, 14.7304657332, 15.2473000111, 15.2583286352, 16.464791424, 17.1296594061, 17.5390030436, 18.7110437222, 5.8456382695, 5.3311861879, 5.965229824, 6.7037114263, 
8.0981326968, 8.8492189692, 8.5644862895, 9.4682323815, 10.0758046765, 11.1343907635, 2.2580690506, 3.6303480841, 4.2002244916, 5.0878329427, 5.7297301002, 6.1630746976, 6.3814203956, 7.8697823884, 7.5952865417, 8.086904876, 9.0189024861, 10.6210293615, 10.3845773263, 10.4796382471, 11.4746526775, 11.3876825744, 11.4428459555, 12.5694539368, 14.0091944606, 14.6256465975, 8.9641801048, 8.7997603557, 9.7567710069, 10.7599483284, 12.2338612406, 12.2529468072, 12.4831995235, 12.7760036913, 13.1116590262, 14.0312437786, 10.1345917381, 10.4270107849, 11.6603861839, 12.631764838, 13.2009610273, 12.4877246584, 15.0236452117, 13.7485908633, 15.3781404522, 14.9697105256, 10.1226724757, 11.2566463287, 11.6638074589, 11.9902527013, 12.9871950586, 13.8648086563, 14.2354578186, 14.5778459121, 15.2707826036, 16.3231208366, 13.671693195, 13.7875794307, 14.975340831, 14.300838196, 15.7099383499, 15.0169641124, 15.1420825195, 17.3381575149, 15.0573481403, 18.1691486173, 9.1272291916, 10.3195691613, 9.7599762821, 11.1952949308, 11.0888637055, 12.089931351, 12.8033641772, 13.9516800194, 13.4669174225, 13.737763548, 15.2616717468, 14.6716648107, 16.1022179953, 16.5985084116, 16.7925911542, 17.0180677627, 17.7389863575, 17.7307393488, 18.6270643049, 19.8022795598, 12.217796317, 12.2790360103, 13.2442227828, 14.5262264653, 14.5418187478, 15.0272333735, 16.4861689983, 15.7114726241, 15.947900875, 16.1865698378, 14.6971500474, 16.078895434, 17.074024203, 16.6105540715, 18.2426187237, 18.1069548849, 17.4302114699, 18.8015806179, 19.7275302105, 20.0955182301, 10.264057409, 11.5456417933, 11.3586416183, 13.2914892716, 13.2778048925, 13.9565439433, 14.2834141443, 14.9483742276, 15.6702207749, 16.4493624417, 8.1707476742, 8.932828363, 9.272619524, 10.4043554271, 10.905796066, 10.3593896096, 12.8374495658, 12.2872977617, 13.6616336415, 12.7851581093, 12.8098096367, 14.3980880134, 13.7844784616, 14.8880986302, 15.0441594041, 14.6516113046, 16.6894525993, 17.586057235, 18.8614050811, 
19.3787012754, 13.2645174754, 14.4402411683, 15.8161511437, 14.715234678, 15.487753236, 15.8829865703, 17.4003939769, 17.491176027, 18.0872322497, 19.3737454491, 12.9641766104, 13.7874408461, 13.5190329784, 16.0099296437, 15.8708476821, 16.5059530637, 16.322812549, 16.8687396871, 17.4687589626, 18.8846853823, 10.7952753677, 10.6523938905, 12.4384590279, 11.4783850427, 12.1896737933, 12.7618279857, 13.2548194771, 14.203182058, 14.3679839569, 14.552570345, 8.5953828017, 9.652904691, 8.6779184465, 10.472117567, 10.3851019233, 11.2895867378, 10.9554722141, 11.1536345972, 12.1767399604, 13.937308457, 11.8902592002, 11.8075160228, 11.8429425868, 14.5165744358, 14.3973209846, 13.1822183543, 15.9082298984, 15.2785049291, 15.3850377601, 15.6200642863, 16.3178404382, 17.9497817224, 16.7731289251, 17.0336113964, 18.4939120506, 18.8878605182, 19.1480588468, 20.0986621159, 20.1179551679, 19.6476091014, 10.9073853706, 11.0307662911, 11.9769704784, 10.9524129215, 13.0284360547, 13.1781473262, 12.6896019925, 13.773682184, 13.7871282753, 15.8066802412, 12.8327036859, 14.8132227014, 13.4882314095, 13.6291273972, 15.7303039945, 15.4398124149, 16.1081680036, 16.9110988261, 18.326266592, 18.2936670773, 10.7249379683, 12.4861222741, 12.5090415094, 12.5329158726, 13.7330331415, 14.4546035974, 14.5082950602, 15.3247586488, 15.1628563061, 16.3843795909], + "X1": [1.7388402869, 1.9440817873, 2.2910585559, 2.7663750025, 2.7646436884, 3.4277739209, 3.30634629, 3.5587004436, 3.8424535228, 4.3184848303, 0.9496363157, 1.3309369, 1.328249767, 1.8853948165, 2.4472706272, 2.3889672923, 2.6612907613, 3.3127745919, 3.4277543313, 3.7958315124, -0.13131943019, 0.24792974686, 0.7470427081, 0.78882838389, 1.2387226036, 1.4556772825, 1.5362625525, 1.8666572031, 1.961215124, 2.6234257668, -0.57322319387, 0.080221185587, 0.22197696471, 0.52210052246, 0.37835502473, 0.7130208595, 1.0672998466, 1.5582442033, 1.6657430727, 2.3415455369, 0.020342345207, 0.56384450951, 0.48860463468, 0.9898299714, 1.2013731501, 
1.3670544826, 1.7816400556, 2.3750586858, 2.492916421, 3.0526810626, -0.52080114692, -0.84217517685, 0.25398168406, 0.12296020224, 0.26609722997, 0.80074912822, 0.81591960381, 1.3289737704, 1.7613140373, 1.4652679931, 1.2969365489, 1.6997696962, 2.1118995662, 2.5819428261, 2.6955825519, 2.9120175217, 3.2210773601, 3.3117793697, 3.7597981411, 4.3521858658, 3.7990065614, 4.0650529378, 4.4179808188, 4.3040128773, 4.941139948, 4.8734862648, 5.591948347, 5.6945783881, 6.3674122001, 5.9313398757, -1.1088537545, -0.59447046792, -0.40286091464, 0.071341875082, 0.48124066481, 0.41898693933, 1.410443218, 1.3311378725, 1.8434689317, 1.7265260873, -0.6506733271, -0.25657008553, -0.27126794705, 0.059623848461, 0.44667147049, 0.66137845776, 1.3136552297, 1.7413126584, 1.6574385337, 2.5745631537, 1.2530586511, 1.0533948601, 1.6240999918, 1.5462054267, 2.1331732705, 2.3301870268, 2.6519047504, 3.2679075385, 3.3042384256, 3.7022238487, 0.96553620438, 1.8209848008, 1.8565644063, 1.5485346093, 2.5123139283, 2.4902491051, 3.2575361464, 3.1095880775, 3.543139837, 3.6782044769, -0.56545111667, 0.24822127161, -0.14630511559, 0.55017783216, 0.7209767591, 1.06516098, 1.2198337637, 1.3730617537, 2.0697776531, 1.9266474307, 0.76114574863, 0.77477898654, 1.2441604516, 1.3992429505, 1.8079563241, 1.8522629104, 2.4637310808, 2.5050928276, 2.7621038391, 3.2789610082, 0.10226645213, 0.71821582169, 0.52094593045, 1.0681437974, 1.5535776151, 1.7076034082, 1.8745656173, 2.4598477481, 2.6897887475, 2.7737401468, 2.2038630278, 2.6340551812, 2.7247290196, 3.0970067146, 2.984402521, 3.8420265156, 4.0916048544, 4.2403626522, 4.5414951623, 4.9016706056, 0.10415052056, 0.51724639537, 1.0485795016, 1.2648662251, 1.6180636377, 1.5686925053, 2.6042025858, 2.4658615526, 2.6385396439, 3.3102596743, 1.8330168377, 1.7157897375, 2.393619004, 2.7444952051, 2.9119560483, 3.5750056834, 3.7798173537, 4.2250274603, 4.1980219588, 4.4661210104, -0.231894494, 0.45046514491, 0.4842415772, 0.98393277705, 1.1466569137, 
1.5469521192, 2.1507024116, 1.8879944884, 2.4561586404, 2.8621345191, 1.1212861123, 1.5518577547, 1.3626204561, 1.8753588772, 2.6415401594, 2.1573255552, 2.6930365964, 3.4172238888, 3.5416808254, 3.7738710565, -1.6156085581, -1.0133665107, -1.0087338555, -0.53569994563, -0.55087026425, -0.3503007191, 0.56148260007, 0.52730960174, 0.78679079091, 1.1492419578, 0.61659043623, 0.7931061939, 1.2858435441, 1.3550248427, 1.7607856448, 1.9497441317, 2.3148475589, 2.2470421147, 2.9043782045, 3.2314387852, 0.73705448049, 1.2520167007, 1.2405805987, 2.0506579288, 2.1927235369, 2.1404179046, 2.3369623846, 2.7771305937, 2.7486511158, 3.3796136594, 1.2434290598, 1.4825459168, 1.680495653, 2.0251439181, 2.0166391503, 2.1793728166, 2.7746585519, 3.1262425764, 3.3205369848, 3.3720639497, -1.0635696097, -0.94000908185, -0.58259869466, -0.51522666513, 0.5045929707, 0.30522788237, 0.81012730986, 0.60029184976, 1.4960679451, 1.6942452066, 2.9789710963, 3.2119555565, 3.6227962954, 3.3917878104, 4.551827671, 4.5483587023, 4.6835302505, 4.8030309509, 5.1021999593, 5.7542690087, 1.1189914921, 1.056288753, 1.2466978641, 1.7243142349, 2.4924439914, 2.3262369007, 2.7905578179, 3.2093275443, 3.4420441964, 3.7688080745, 2.0417818053, 2.1440989404, 2.9275976864, 2.8251877845, 3.4393773498, 3.4622860204, 4.1402867076, 4.4643812839, 4.5697091582, 4.273478103, -0.82877940591, -1.0177736778, -0.2811293118, -0.26233673538, 0.61788391764, 0.45189836352, 0.99682760057, 1.078105357, 1.0887457018, 1.6758840935, 0.12738992604, 1.1014879604, 0.77438603648, 1.5817725418, 1.4260541884, 1.943657644, 2.3756611584, 2.8072579045, 2.884123729, 3.3527389108, 0.3890559857, 0.8588996245, 1.7622662023, 1.6905933625, 1.9895707275, 2.3721827026, 2.740675616, 2.9238565259, 2.9545117674, 3.7412116524, -0.63404362932, -0.26919364868, 0.56781809275, 0.61371076805, 1.1211077714, 0.99184593499, 1.4584451523, 1.8250083881, 1.7341440958, 2.2813665369, -0.81181642053, -0.70003546079, -0.66685114933, -0.21223258333, 
-0.48657864268, 0.42682970583, 0.82388296972, 0.99027334459, 0.93768796987, 1.5700585915, 0.12145126703, 0.85784385611, 1.2314310061, 1.4210255783, 1.7689570536, 1.9585855408, 2.2651037871, 2.5330613964, 2.6008327926, 3.2027934539, -2.3180129235, -1.9129115839, -1.912872001, -1.158349536, -1.369014259, -1.1537779239, -0.73749119698, -0.060179973365, 0.36738985255, 0.27413578256, -0.80513185139, -0.61784146742, -0.059550093072, 0.037110175751, 0.40598026637, 0.56282846293, 0.93569685844, 1.3283073463, 1.379617105, 2.0910675179, 0.48478410689, 0.61553350503, 0.75430161538, 0.95619736442, 1.3464414902, 2.0789688948, 2.3844791084, 2.5977175024, 2.7319402408, 2.8795808283, 0.11839542928, 0.85185210177, 0.71837598722, 1.0550031885, 1.796960875, 1.5459030038, 1.9726620471, 2.2348078118, 2.4764340247, 3.1872588676, -0.65938067691, -0.37786371428, 0.10340204655, 0.53425713059, 0.55910418185, 0.81135351972, 1.4998400195, 1.1928928268, 1.6377285316, 1.8040089906, 0.83655947047, 1.4167080214, 1.6064302665, 2.1639892654, 2.3951891253, 2.860770408, 2.6072424857, 2.857057487, 3.697056916, 3.9473071493, 0.93620350795, 1.2439379122, 1.684360508, 2.1958380157, 1.9418896587, 2.3205341964, 2.9379011821, 3.0663717859, 3.6670783392, 3.5204531655, 1.9298980092, 2.3121664357, 2.5843273751, 2.8473097791, 3.1869521021, 3.6799279463, 3.6672005012, 4.0536456721, 4.1221442422, 4.7370835648, -0.80959988527, -0.37885188458, -0.21153208134, 0.26892353464, 0.41485095688, 0.60889944518, 1.1725659358, 1.4981011461, 1.2128697722, 1.7963940774, 0.11456906275, 0.53206267243, 1.1322776445, 1.2782501073, 2.0230283498, 1.9915233051, 2.1959958513, 2.5076104605, 2.9247882054, 3.4157145609, 0.3852005693, 0.42376419032, 0.90493353318, 1.1324272906, 1.5161872215, 1.7811916043, 1.7667723985, 2.2449288859, 2.4037856827, 2.7784604693, -0.51336492118, 0.21300381007, 0.61156640344, 0.15831304603, 1.1199927187, 1.5377708586, 1.8424376155, 2.1793501992, 1.9773502152, 2.1474627218, -1.3855462082, -1.0773913769, 
-0.92275273008, -0.71596295456, -0.26213137655, 0.10144534041, 0.17104011112, -0.026739149711, 0.74621093133, 1.0307273785, 2.2237294938, 2.949370338, 2.8841904995, 3.2980611592, 3.3521200524, 3.6120943293, 3.9504106077, 3.5903604164, 4.6928423773, 4.6559327956, 0.96833102426, 1.1117959357, 1.3647723995, 1.9929815744, 2.3855746853, 2.4752299178, 2.6213110041, 2.9808082799, 3.2354533044, 3.6005611771, -1.9419045637, -1.6947144558, -1.5589118318, -1.069404662, -0.80972169404, -0.551539557, -0.4446693382, 0.049276996059, 0.37623204588, 0.55961514833, -0.28994414707, 0.12664851664, 0.86727262945, 0.43810088385, 1.2753094796, 1.4406262769, 1.2735992038, 1.8076835393, 2.4125246863, 2.7362140816, 0.56475662993, 0.42043275169, 0.59138004566, 1.2246501974, 1.7369392951, 1.5825989704, 2.5540926935, 2.1512428724, 2.9958856916, 3.0899334187, 0.93583122538, 0.76813045008, 1.2758032285, 1.5491685304, 1.7727657548, 2.2562249182, 2.9105411409, 2.9896783551, 3.2707855641, 3.9053588655, 1.0175376224, 1.3220460085, 1.2790694519, 2.3390997546, 1.8515325124, 2.481234804, 2.6027534312, 3.0393300872, 3.3874529446, 3.9070091749, -0.94323774503, -0.40689689063, -0.54995129984, -0.32581904978, 0.27416175575, 0.38214436978, 0.77157258115, 1.0036287182, 1.1699807534, 1.7629150171, -0.87639188833, -0.093734034204, 0.09909643643, 0.21854905281, 0.40379507986, 0.97557007881, 1.1571556248, 1.4276796232, 1.6775527389, 2.1757145013, 2.5816861702, 2.654677903, 3.2186642768, 3.7852754233, 3.7156336616, 4.0998196507, 4.1447925625, 4.5834339787, 5.0792086169, 5.3247616469, -1.2418788131, -0.82757230955, -0.63889531958, -0.62886946011, 0.13321800992, 0.35956822614, 0.59773950137, 1.2547640956, 1.2753729779, 1.7276954741, -0.70633157642, -0.16110707671, -0.078369245329, 0.65643988762, 1.3080461446, 1.3036160504, 1.4784813388, 1.9448084133, 2.4191516437, 2.5501108238, -0.81000415935, -0.65206686698, -0.0019396291668, 0.15853850916, 0.54581383378, 0.69455601407, 0.74892670911, 1.1983971124, 1.7903904013, 
1.7853498998, -0.048997920503, 0.38986343755, 0.44782621084, 0.87222129244, 1.2736317662, 1.2555347297, 1.5111118779, 2.1321836262, 2.186961843, 2.7507516664, -0.43609820517, 0.084850193919, 0.063614295948, 0.27603896886, 0.50879070751, 0.99320676371, 1.3743626895, 1.7797052546, 2.0706981141, 2.4368375081, -0.45550655614, -0.27801379331, -0.033747868759, 0.51698484453, 0.92377504919, 1.1297419305, 1.3972505604, 1.7745574468, 2.0216185887, 2.1991027462, 0.33328407893, 0.26061953624, 0.77933140982, 1.4143051284, 1.2363626437, 1.3785296185, 2.1938583295, 2.7957598099, 2.5509815884, 3.2511411522, -1.5722757465, -1.0688675654, -0.80518364041, -0.73624243231, -0.21112476585, 0.16038955032, 0.60313994108, 0.74057117668, 1.0008858806, 1.2436127998, -0.62787341828, -0.50405845049, -0.3020157314, -0.1780850948, 0.2953416225, 0.47669509585, 1.1323482514, 1.0919617368, 1.3530287351, 1.7707833234, -0.12442186133, 0.83540788811, 0.74355731968, 1.0471687387, 1.0843736725, 1.5695944143, 2.0002610197, 1.535043578, 2.0988272166, 3.3551031056, -1.966335027, -1.2845496005, -1.2801071086, -0.96452341731, -0.78532760555, -0.64344224454, -0.025101084767, 0.075333191933, 0.61664861452, 0.66719483779, -1.9282645974, -1.5811769762, -1.4469377638, -1.2953933466, -0.66574700062, -0.32420429785, -0.10721359471, 0.38628917695, 0.52977777633, 0.9330787378, 0.027748693921, 0.28579130732, 0.34624338398, 0.91483195676, 1.2012688055, 1.3731761932, 1.8672318044, 2.0335716943, 2.3815523868, 2.4222180967, 0.29085371824, 0.65051092828, 1.0653827541, 1.2178194795, 1.4584290776, 1.8334151937, 2.061013457, 2.3264613477, 2.8076583096, 2.6364562126, -0.23213734419, -0.27952877888, 0.79821485115, 0.50424888081, 1.1099935794, 0.96792837094, 1.312294532, 1.6462889627, 2.0388595339, 2.5344962424, 0.73015913823, 1.2039441288, 1.8292156946, 2.1405063187, 1.9968433789, 2.182769002, 3.0319208881, 3.2155103208, 3.5408506532, 3.4444027223, 0.19888184261, 0.3971312949, 1.2772913943, 1.1655273831, 1.6134989754, 
1.7585131539, 1.7786411447, 2.0037515176, 2.7998943255, 2.7921593602, -0.2066004224, 0.15549613689, -0.1556552863, 0.26848098137, 0.93073305902, 1.0933692431, 1.4461940369, 1.963242531, 1.9685108244, 2.4873139042, 0.70195171382, 1.647173931, 1.6779718114, 1.8541715766, 2.179324411, 2.2483860129, 2.5820278115, 2.7735064113, 3.1934581462, 3.8529942522, 1.072076406, 0.93210593441, 1.2560123846, 1.6338627319, 2.1579578901, 2.2031744335, 2.6853246105, 2.8930992116, 3.6115700624, 3.7430010213, -1.7223909182, -1.4109067975, -1.2625469806, -1.028624538, -0.7752264338, -0.10643222864, -0.039354323153, 0.43686907849, 0.41675133699, 0.50549209893, 1.1555447802, 1.2016654701, 1.2749520664, 2.2558642614, 2.2458860677, 2.1882219781, 3.1411483671, 3.1476508649, 3.2020657145, 3.8390342458, 0.26360719793, 0.43032461527, 0.70607955929, 1.4881425789, 1.1211521201, 1.5518466269, 1.9744996412, 1.9262133228, 2.6589044095, 2.8249986374, -0.42779245438, -1.0330623865, -0.72485044997, -0.044712352253, 0.13780622635, 0.72044356498, 0.86317570657, 1.1723699383, 1.6041722087, 1.8936144451, 1.1283419391, 1.8440651226, 2.1605103826, 2.6750443814, 2.2965871251, 2.9517872601, 3.2972769782, 3.6077986197, 3.9224240533, 4.1976374023, -0.34844042928, -0.084978780565, 0.46225520052, 0.45875406342, 0.65996273804, 1.2614117403, 0.95957496832, 1.5779061818, 1.9681057961, 2.6046964153, 0.39586128004, 0.83597893672, 0.53577159539, 0.97452178021, 1.3998913016, 1.8558334596, 2.0811413459, 2.0172419468, 2.3014815067, 3.0424884214, 1.8108060695, 1.694409471, 2.6742664507, 2.9333297834, 2.7501196304, 3.3088462235, 3.8754982201, 3.6584881404, 3.7914586751, 4.3339580659, -0.26980318071, -0.181871401, 0.026483019243, 0.37995168698, 0.4267898338, 1.0789890092, 1.1722244548, 1.590261629, 2.1747792389, 1.9384910014, 0.65212354166, 1.1418561995, 1.1573659282, 1.6084906557, 1.8750886657, 2.0205638149, 2.3640353508, 2.1735763311, 3.0264552177, 3.3213048777, -1.1068927129, -0.99729398895, -9.2597608488e-05, 
-0.20399087979, 0.31433473785, 0.43602080075, 0.65658340551, 0.65938836489, 1.6582126762, 2.0024432029, -1.4279580869, -0.97251127875, -1.054383421, -0.46768983617, 0.16496465068, -0.12703642611, 0.64734139551, 0.68780781544, 0.88914721748, 0.99647237375, -0.25066704443, -0.00070623090938, 0.32251644163, 0.79759091269, 1.0358128794, 1.603047373, 1.5645010495, 2.0201091498, 2.311980212, 2.4828065118, 0.7278959372, 1.3490584697, 1.454862867, 1.7901644262, 1.7594044793, 2.4275730214, 2.5705520471, 3.2367603884, 3.5115302831, 3.5571989989, -0.73979180312, 0.15927338676, 0.039300932037, 0.29434999935, 0.70073982989, 1.068565035, 1.5630504786, 1.6092035894, 1.8101026602, 2.2443277212, 0.55262997585, 0.96489449466, 1.0781284711, 1.5781727824, 1.9270948245, 2.2606097168, 2.296702209, 2.8443302363, 3.2877043915, 3.3317160054, -0.22931996777, 0.063243893795, 0.55447129728, 0.46898029796, 0.77942625236, 0.94990807887, 1.6783147135, 1.6751189255, 2.0103284103, 2.5532533426, 0.8968494973, 1.3062106691, 1.3979895897, 1.5279195391, 2.0908385394, 2.7660513164, 2.744385478, 2.8701081019, 4.0294685448, 3.734017466, 0.92367956786, 0.6287345374, 1.3990645751, 1.2421141953, 1.5704398171, 1.8111123022, 2.2378212046, 2.7603815818, 2.7551524776, 3.0480328228, 0.96607611762, 1.2449213763, 1.2355866633, 1.9098294131, 2.2964752946, 2.3207553745, 2.8399532186, 3.3038322992, 3.0465382372, 3.7189088783, -2.5395367389, -2.63181283, -2.0524625276, -2.1880238372, -0.8231795258, -0.79134305772, -0.70411493131, -0.24518803199, -0.18970389001, 0.096384178244, -2.1241569406, -1.8482603934, -1.5440339318, -1.3243440965, -1.0005254793, -0.36504107885, -0.035653216925, -0.16406242861, 0.24464910425, 0.77944216207, -0.72806231545, -0.61494318818, -0.25744025188, -0.28380205202, 0.26126819232, 0.73738444279, 0.97612132482, 1.1251728102, 1.565948397, 1.7295467545, -1.6673450074, -1.3192305558, -1.2503653573, -0.56398495281, -0.4714219366, -0.0016038593432, 0.18890582342, 0.43819658608, 0.56810549896, 
0.92777826338, -0.49442764506, -0.31996888257, 0.15106278712, 0.50601827183, 0.68994456026, 0.84442575807, 1.4843153279, 1.4001768326, 2.3060820366, 1.998646351, -1.0729163723, -0.2522245871, 0.21085894408, 0.29194113375, 0.6284314779, 1.3461024637, 1.1924836198, 1.5457163396, 2.0018197805, 2.4366475547, 1.0032305727, 1.1771356533, 1.4891008031, 1.1693790998, 1.9305279513, 2.2360794045, 2.3994969166, 2.9047362426, 3.1024592597, 3.434466321, -0.4551890581, -0.34430967131, 0.17518149431, 0.48698742437, 0.82175749057, 1.3422077731, 1.5574998595, 1.8802972916, 1.7639222823, 2.3389415784, -1.135683434, -1.0407915442, -0.28204995645, -0.12893398856, 0.21426782573, 0.12195379886, 1.0204591662, 1.032482652, 1.1363052242, 1.6126000857, 0.14540028358, 0.46732799842, 1.023117271, 1.2008374151, 1.338156175, 1.6260740471, 2.3306137273, 2.5430140585, 2.3738826324, 3.0395705048, 0.27066249806, 0.79810869337, 1.3481570958, 1.4565751888, 2.0118832626, 2.1907508758, 2.5318328946, 2.9597835043, 3.4801295354, 3.30021854, 0.84038198078, 1.6570167913, 1.9529163739, 2.3373025577, 2.4478557346, 2.8627824495, 3.3574660849, 3.5958345181, 3.9162719391, 4.0223877031, 0.14182978134, 0.41315277203, 0.53272575802, 1.0507506529, 1.3218261859, 1.4926383843, 2.0613971679, 2.1431513342, 2.8511247588, 2.5097604262, 0.51330675232, 1.1563702159, 1.246173972, 1.5150915729, 1.9098672979, 2.0452892511, 2.9157239392, 3.1810305009, 3.0528065919, 3.7945524832, -1.145576382, -0.86552690806, 0.099267618542, 0.15162029775, 0.30401658814, 0.52730073676, 0.86775713422, 1.0615046305, 1.5862643494, 1.9337531275, 0.49986430036, 0.92727032194, 0.88486521262, 1.4251932292, 1.9671197303, 2.3914775114, 2.3829985221, 2.3272305641, 2.8953541089, 3.2102820741, -0.5229676599, -0.54545794074, 0.1639319757, 0.20482810417, 0.80450008939, 0.99258724893, 0.7300606498, 1.5075843642, 2.1546064188, 2.1203480926, -0.93935482369, -0.73909857472, -0.53284504113, 0.12401402629, 0.044075090342, 0.53749441417, 0.43414409209, 
0.77630725347, 1.3444888229, 1.8065788897, -0.0029535150072, 0.0014463739369, 0.32489035361, 0.80004633141, 1.1680962876, 1.229713721, 1.959642011, 1.5927424383, 2.2358970523, 2.2481333624, -0.61032154479, 0.41084487961, 0.3935257912, 0.94586771459, 1.1455912989, 1.4608485065, 1.4079219687, 2.0646871035, 2.1357387633, 2.1575804069, -0.66708622848, -0.95625869774, -0.52479299152, -0.57411412822, 0.046850177398, 0.62751186653, 0.48639144868, 0.65195103754, 1.1764910321, 1.7119156825, 0.80598450744, 1.1666817112, 1.2674040319, 1.325229956, 1.9102203689, 1.8296360734, 2.7418399073, 2.6220243642, 3.3077116636, 3.8321074974, 1.5161796993, 2.2250884659, 2.4826011278, 2.6832013129, 2.9849434104, 3.2321031828, 3.523055892, 4.0232226436, 3.9057438341, 4.6764668762] + }, + "params": { + "pattern": "multi_path_reversible", + "n_switcher_groups": 80, + "n_realized_groups": 120, + "n_periods": 10, + "seed": 116, + "effects": 3, + "by_path": 3, + "controls": "X1", + "ci_level": 95 + }, + "results": { + "by_path": [ + { + "path": "0,1,1,1", + "frequency_rank": 1, + "horizons": { + "1": { + "effect": 2.2207980992, + "se": 0.1389110462, + "ci_lo": 1.9485374516, + "ci_hi": 2.4930587468, + "n_switchers": 40, + "n_obs": 180 + }, + "2": { + "effect": 2.0309912182, + "se": 0.12594455913, + "ci_lo": 1.7841444182, + "ci_hi": 2.2778380181, + "n_switchers": 40, + "n_obs": 145 + }, + "3": { + "effect": 2.2134560437, + "se": 0.13937947014, + "ci_lo": 1.9402773021, + "ci_hi": 2.4866347854, + "n_switchers": 40, + "n_obs": 120 + } + } + }, + { + "path": "0,1,1,0", + "frequency_rank": 2, + "horizons": { + "1": { + "effect": 2.0938609016, + "se": 0.15371701433, + "ci_lo": 1.7925810897, + "ci_hi": 2.3951407135, + "n_switchers": 25, + "n_obs": 105 + }, + "2": { + "effect": 1.9592356796, + "se": 0.16449972251, + "ci_lo": 1.636822148, + "ci_hi": 2.2816492111, + "n_switchers": 25, + "n_obs": 85 + }, + "3": { + "effect": 0.18541804831, + "se": 0.18432327889, + "ci_lo": -0.17584893982, + "ci_hi": 
0.54668503644, + "n_switchers": 25, + "n_obs": 70 + } + } + }, + { + "path": "0,1,0,0", + "frequency_rank": 3, + "horizons": { + "1": { + "effect": 2.4205005071, + "se": 0.23016818353, + "ci_lo": 1.969379157, + "ci_hi": 2.8716218573, + "n_switchers": 10, + "n_obs": 35 + }, + "2": { + "effect": 0.70170860301, + "se": 0.19437003934, + "ci_lo": 0.32075032623, + "ci_hi": 1.0826668798, + "n_switchers": 10, + "n_obs": 30 + }, + "3": { + "effect": -0.059399676238, + "se": 0.28864046959, + "ci_lo": -0.62512460111, + "ci_hi": 0.50632524864, + "n_switchers": 10, + "n_obs": 30 + } + } + } + ] + } } }, "generator": "generate_reversible_did_data v1", diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index 2d11e252..30db7c7c 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -408,10 +408,20 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin): the object of interest) and ``L_max >= 1`` (the path window depends on ``L_max``). Binary treatment only — non-binary treatment + ``by_path`` is deferred. Also incompatible with - ``controls``, ``trends_linear``, ``trends_nonparam``, - ``heterogeneity``, ``design2``, ``honest_did``, and - ``survey_design`` (each combination raises - ``NotImplementedError`` in the current release). + ``trends_linear``, ``trends_nonparam``, ``heterogeneity``, + ``design2``, ``honest_did``, and ``survey_design`` (each + combination raises ``NotImplementedError`` in the current + release). + + Compatible with ``controls`` (DID^X residualization) -- the + per-baseline OLS residualization runs once on first-differenced + ``Y`` BEFORE path enumeration, so per-path point estimates, + bootstrap SE, per-path placebos, and per-path sup-t bands all + consume the residualized ``Y_mat`` automatically (Frisch- + Waugh-Lovell). Per-period effects remain unadjusted, consistent + with the existing ``controls`` + per-period DID contract. 
The + cross-path cohort-sharing SE deviation from R documented for + ``path_effects`` is inherited unchanged. Compatible with ``n_bootstrap > 0`` -- the top-k paths are enumerated once on the observed data (paths held fixed across @@ -985,11 +995,6 @@ def fit( "[F_g - 1, F_g - 1 + L_max] and therefore depends on " "the event-study horizon. Set L_max when calling fit()." ) - if controls is not None: - raise NotImplementedError( - "by_path combined with controls (DID^X residualization) " - "is deferred to a future release." - ) if trends_linear: raise NotImplementedError( "by_path combined with trends_linear (DID^{fd}) is " diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 45101494..344957f0 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -638,7 +638,7 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups. -- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. 
Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. 
**Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. 
**R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). +- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_g - 1, F_g - 1 + l]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`.
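The ranking rule just described (descending frequency, lexicographic tie-break on the path tuple) can be sketched in a few lines. This is an illustrative sketch only, not the library's code: `rank_top_k_paths` and its input mapping are hypothetical names, and the example paths are the six switcher groups of the three-path test fixture in this patch.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

PathTuple = Tuple[int, ...]


def rank_top_k_paths(
    paths_by_group: Dict[Hashable, PathTuple], k: int
) -> List[Tuple[PathTuple, int, int]]:
    """Return (path, n_groups, frequency_rank) for the k most common paths.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not part of the library's API.
    """
    counts = Counter(paths_by_group.values())
    # Descending count first; ascending tuple order breaks ties, so the
    # ordering is deterministic and every rank is unique.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (path, n_groups, rank)
        for rank, (path, n_groups) in enumerate(ordered[:k], start=1)
    ]


# Switcher groups 1-6 of the three-path fixture (never-treated groups omitted).
fixture_paths = {
    1: (0, 1, 1, 1),
    2: (0, 1, 1, 1),
    3: (0, 1, 1, 1),
    4: (0, 1, 0, 0),
    5: (0, 1, 0, 0),
    6: (0, 1, 1, 0),
}
top = rank_top_k_paths(fixture_paths, k=3)
```

Slicing with `ordered[:k]` returns every observed path when `k` exceeds the number of distinct paths; this sketch does not emit the accompanying warning.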
If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). 
**Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). 
**Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. 
**SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. 
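To make the percentile-versus-normal-theory distinction concrete, here is a minimal standard-library sketch of a Rademacher multiplier bootstrap over a vector of centered influence-function values. Everything in it is an assumption for illustration: the function names, the `1/n` scaling, and the nearest-rank quantile rule are hypothetical and may differ from what `compute_effect_bootstrap_stats` actually does.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def multiplier_draws(
    if_values: Sequence[float], n_bootstrap: int, seed: int
) -> List[float]:
    """Rademacher multiplier draws of the estimator's perturbation (sketch)."""
    rng = random.Random(seed)
    n = len(if_values)
    draws = []
    for _ in range(n_bootstrap):
        # One +/-1 weight per group, applied to that group's IF value.
        total = sum(rng.choice((-1.0, 1.0)) * v for v in if_values)
        draws.append(total / n)  # assumed scaling, for illustration only
    return draws


def percentile_ci(
    effect: float, draws: Sequence[float], alpha: float = 0.05
) -> Tuple[float, float]:
    """Percentile interval from the shifted draws (nearest-rank quantiles)."""
    shifted = sorted(effect + d for d in draws)
    last = len(shifted) - 1
    lo = shifted[math.floor((alpha / 2) * last)]
    hi = shifted[math.ceil((1 - alpha / 2) * last)]
    return lo, hi


def normal_ci(
    effect: float, draws: Sequence[float], alpha: float = 0.05
) -> Tuple[float, float]:
    """Normal-theory interval around the bootstrap SE (about 1.96 * se)."""
    se = statistics.stdev(draws)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return effect - z * se, effect + z * se


if_values = [0.5, -0.3, 0.1, -0.2, 0.4, -0.5]  # made-up centered IF values
draws = multiplier_draws(if_values, n_bootstrap=999, seed=0)
pct = percentile_ci(2.0, draws)
norm = normal_ci(2.0, draws)
```

The normal-theory interval is symmetric around the point estimate by construction, while the percentile interval follows the empirical distribution of the draws, so the two generally differ in finite samples.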
**Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). 
R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario; cross-surface inheritance is regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns). - **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. 
**Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. 
The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`. diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py index ed80f83c..7a18595c 100644 --- a/tests/test_chaisemartin_dhaultfoeuille.py +++ b/tests/test_chaisemartin_dhaultfoeuille.py @@ -3786,7 +3786,13 @@ def test_requires_lmax(self): @pytest.mark.parametrize( "fit_kwargs, msg", [ - ({"controls": ["outcome"]}, "controls"), + # NB: the prior `controls` entry was removed when the + # `by_path + controls` gate was lifted (Wave 3 #5). 
The + # entry used `controls=["outcome"]` as a "any column works + # because the gate fires first" shortcut; after gate + # removal the column-existence path runs and `outcome` is + # itself a valid column, so there is no equivalent + # NotImplementedError-raising input to migrate to. ({"trends_linear": True}, "trends_linear"), ({"trends_nonparam": "group"}, "trends_nonparam"), ({"heterogeneity": "group"}, "heterogeneity"), @@ -6065,3 +6071,488 @@ def fake_generator( f"l={l_h}: OVERALL cband_conf_int written despite " f"strict-majority gate failure at exactly 50% finite" ) + + +# --------------------------------------------------------------------------- +# Wave 3 #5: by_path + controls (DID^X residualization) +# --------------------------------------------------------------------------- + + +def _by_path_three_path_data_with_controls( + seed: int = 42, x_effect: float = 3.0 +) -> pd.DataFrame: + """Three-path panel with confounding covariate X1. + + Extends ``_by_path_three_path_data``: same 8-group / 4-period + structure with the same path assignment, but adds an X1 column + whose group-level mean is tied to the group identity (group g + has X1 base = 0.3*g) and outcome includes ``x_effect * X1`` as + a confounding term. Designed so that fitting WITHOUT controls + produces a biased per-path estimate and WITH ``controls=["X1"]`` + recovers the underlying treatment effect (= 2.0) via FWL + residualization. 
+ """ + rng = np.random.default_rng(seed) + rows = [] + + def _build(group, treatment_path, x_base): + for t, d in enumerate(treatment_path): + x = x_base + 0.2 * t + rng.normal(0, 0.1) + y = d * 2.0 + x_effect * x + rng.normal(0, 0.1) + rows.append( + { + "group": group, + "period": t, + "treatment": d, + "outcome": y, + "X1": x, + } + ) + + for g in (1, 2, 3): + _build(g, [0, 1, 1, 1], x_base=0.1 * g) + for g in (4, 5): + _build(g, [0, 1, 0, 0], x_base=0.1 * g) + _build(6, [0, 1, 1, 0], x_base=0.1 * 6) + for g in (7, 8): + _build(g, [0, 0, 0, 0], x_base=0.1 * g) + return pd.DataFrame(rows) + + +def _load_by_path_controls_scenario(): + """Load the golden-value scenario for by_path + controls. + + Returns the data frame including X1, or pytest.skip if the golden + file is missing (CI's isolated-install job ships only tests/, not + benchmarks/, per ``feedback_golden_file_pytest_skip.md``). + """ + golden_path = ( + Path(__file__).parents[1] + / "benchmarks" + / "data" + / "dcdh_dynr_golden_values.json" + ) + if not golden_path.exists(): + pytest.skip( + f"dCDH golden values file not found at {golden_path}; " + "run: Rscript benchmarks/R/generate_dcdh_dynr_test_values.R" + ) + with open(golden_path) as f: + sc = json.load(f)["scenarios"].get("multi_path_reversible_by_path_controls") + if sc is None: + pytest.skip("scenario 'multi_path_reversible_by_path_controls' absent") + return pd.DataFrame(sc["data"]) + + +class TestByPathControls: + """Wave 3 #5: ``by_path`` + ``controls`` (DID^X residualization). + + Tests the gate-lift PR. Validates that all four downstream surfaces + (analytical SE, bootstrap SE, per-path placebos, per-path sup-t + bands) auto-inherit residualized ``Y_mat`` produced once at + ``chaisemartin_dhaultfoeuille.py:1498`` (the residualization runs + BEFORE path enumeration, so the per-path computation consumes the + residualized outcome). 
+ + R parity for per-path point estimates is validated separately at + ``tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls``. + """ + + # Gate removal ------------------------------------------------------- + def test_no_longer_raises(self): + """``by_path + controls`` no longer raises NotImplementedError.""" + data = _by_path_three_path_data_with_controls() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + assert res.path_effects is not None + assert len(res.path_effects) >= 1 + + # Analytical SE ------------------------------------------------------ + def test_residualization_changes_per_path_estimates(self): + """Strongly-confounded DGP: with vs without controls per-path + coefficients differ for at least one (path, horizon) by a + non-trivial margin.""" + data = _by_path_three_path_data_with_controls(x_effect=5.0) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est_no = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res_no = est_no.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + L_max=3, + ) + est_yes = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res_yes = est_yes.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + + max_diff = 0.0 + for path, entry_yes in res_yes.path_effects.items(): + entry_no = res_no.path_effects.get(path, {"horizons": {}}) + for l_h, vals_yes in entry_yes["horizons"].items(): + vals_no = entry_no["horizons"].get(l_h, {}) + if "effect" in vals_no and np.isfinite(vals_no["effect"]): + max_diff = max(max_diff, abs(vals_yes["effect"] - vals_no["effect"])) + + # At least one 
(path, horizon) must differ noticeably + assert max_diff > 0.5, ( + f"Residualization had no effect on any per-path estimate " + f"(max abs diff = {max_diff}). Expected confounding to be " + f"corrected by controls=['X1']." + ) + + def test_path_enumeration_unaffected_by_controls(self): + """Path enumeration depends only on D_mat / first_switch_idx, + not on residualized Y_mat — same paths enumerated with or + without controls.""" + data = _by_path_three_path_data_with_controls() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + _, res_no = _fit_by_path(data, by_path=3, L_max=3) + est_yes = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res_yes = est_yes.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + + assert set(res_no.path_effects.keys()) == set(res_yes.path_effects.keys()), ( + f"Path set differs between no-controls and controls fits: " + f"no={sorted(res_no.path_effects.keys())} " + f"yes={sorted(res_yes.path_effects.keys())}" + ) + # Frequency rank must also match (path counts unchanged) + for path, entry_yes in res_yes.path_effects.items(): + entry_no = res_no.path_effects[path] + assert entry_yes["frequency_rank"] == entry_no["frequency_rank"] + assert entry_yes["n_groups"] == entry_no["n_groups"] + + def test_multi_covariate_works(self): + """``controls=["X1", "X2"]`` fits successfully and produces + finite per-path estimates and SEs.""" + data = _by_path_three_path_data_with_controls() + # Add a second covariate + rng = np.random.default_rng(99) + data = data.assign( + X2=lambda d: 0.5 * d["X1"] + rng.normal(0, 0.5, size=len(d)) + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1", "X2"], + L_max=3, + ) + 
assert res.path_effects is not None + for path, entry in res.path_effects.items(): + for l_h, vals in entry["horizons"].items(): + assert np.isfinite(vals["effect"]), ( + f"path={path} l={l_h}: effect not finite under multi-covariate" + ) + + # Bootstrap SE inheritance ------------------------------------------ + @pytest.mark.slow + def test_bootstrap_with_controls_finite_se(self): + """Bootstrap SE is finite > 0 on a non-degenerate panel under + ``controls`` — verifies the per-path bootstrap pipeline + consumes the residualized Y_mat without breaking.""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, n_bootstrap=200, seed=42 + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + any_finite = False + for path, entry in res.path_effects.items(): + for _l_h, vals in entry["horizons"].items(): + if np.isfinite(vals["se"]) and vals["se"] > 0: + any_finite = True + break + assert any_finite, "No (path, horizon) produced a finite > 0 bootstrap SE" + + @pytest.mark.slow + def test_bootstrap_point_estimates_unchanged(self): + """Bootstrap perturbs SE only; point estimates equal the + analytical-only fit on the same seed.""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est_a = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, seed=42 + ) + res_a = est_a.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + est_b = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, n_bootstrap=200, seed=42 + ) + res_b = est_b.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + for path, entry_a 
in res_a.path_effects.items(): + entry_b = res_b.path_effects[path] + for l_h, vals_a in entry_a["horizons"].items(): + vals_b = entry_b["horizons"][l_h] + np.testing.assert_allclose( + vals_a["effect"], + vals_b["effect"], + rtol=1e-12, + atol=1e-12, + err_msg=f"path={path} l={l_h}: bootstrap changed point estimate", + ) + + # Per-path placebos inheritance ------------------------------------- + @pytest.mark.slow + def test_per_path_placebos_with_controls_present(self): + """``placebo=True + controls=['X1']`` populates + ``path_placebo_event_study[path][-l]`` with finite values for at + least one (path, l).""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, placebo=True, seed=42 + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + assert res.path_placebo_event_study is not None + any_finite = False + for path, lags in res.path_placebo_event_study.items(): + for lag, vals in lags.items(): + if np.isfinite(vals.get("effect", np.nan)): + any_finite = True + break + assert any_finite, ( + "No per-path placebo lag produced a finite effect under " + "controls + by_path + placebo" + ) + + @pytest.mark.slow + def test_per_path_placebos_with_controls_bootstrap(self): + """Bootstrap SEs on the per-path placebo surface are finite under + ``controls + by_path + placebo + n_bootstrap``.""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, + by_path=3, + placebo=True, + n_bootstrap=200, + seed=42, + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + assert res.path_placebo_event_study is not None + any_finite_se = 
False + for path, lags in res.path_placebo_event_study.items(): + for lag, vals in lags.items(): + se = vals.get("se", np.nan) + if np.isfinite(se) and se > 0: + any_finite_se = True + break + assert any_finite_se, ( + "No per-path placebo lag produced a finite > 0 bootstrap SE" + ) + + # Per-path sup-t bands inheritance ---------------------------------- + @pytest.mark.slow + def test_sup_t_bands_with_controls_finite_crit(self): + """``path_sup_t_bands[path]['crit_value']`` is finite > 0 for + paths passing the >=2 valid horizons + strict-majority gates + under ``controls``. Uses ``n_bootstrap=400`` to keep the gate + margin comfortable on the small per-path samples.""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, n_bootstrap=400, seed=42 + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + assert res.path_sup_t_bands is not None + # At least one path should pass both gates + any_finite = any( + np.isfinite(entry.get("crit_value", np.nan)) + and entry.get("crit_value", -1) > 0 + for entry in res.path_sup_t_bands.values() + ) + assert any_finite, ( + "No path produced a finite > 0 sup-t crit_value under controls; " + f"path_sup_t_bands keys: {list(res.path_sup_t_bands.keys())}" + ) + + # Edge cases -------------------------------------------------------- + def test_per_period_effects_unadjusted_with_by_path_controls(self): + """Per-period DID does not support residualization + (``chaisemartin_dhaultfoeuille.py:1493-1496``); the per-period + effects surface returned by ``fit()`` must be unaffected by + controls when by_path is also set, mirroring the existing + controls + per-period contract.""" + data = _by_path_three_path_data_with_controls() + # Fit with by_path + controls AND with by_path alone, comparing + # 
per_period_effects (raw Y path). + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + _, res_no = _fit_by_path(data, by_path=3, L_max=3) + est_yes = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res_yes = est_yes.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + # Per-period DID is unaffected by controls residualization + # (operates on raw Y, not residualized Y) — both fits produce + # identical per_period_effects. Per-period dicts contain + # `did_plus_t` and `did_minus_t` (not `effect`); both fields + # must match bit-identically across the no-controls / controls + # fits to lock in the unadjusted contract. + if res_no.per_period_effects is not None: + assert res_yes.per_period_effects is not None + for t in res_no.per_period_effects: + assert t in res_yes.per_period_effects, ( + f"per_period_effects period {t} missing under controls" + ) + for field in ("did_plus_t", "did_minus_t"): + np.testing.assert_allclose( + res_no.per_period_effects[t][field], + res_yes.per_period_effects[t][field], + rtol=1e-12, + atol=1e-12, + err_msg=( + f"per_period_effects[{t}][{field}] differs " + f"under controls — per-period DID was expected " + f"to remain unadjusted (raw Y_mat)" + ), + ) + + def test_covariate_residuals_round_trip_with_by_path(self): + """``results.covariate_residuals`` is a non-empty DataFrame + after fitting ``by_path + controls`` — the field is set + unconditionally on the controls path + (``chaisemartin_dhaultfoeuille_results.py:532``) and must + surface intact regardless of whether by_path is also active.""" + data = _by_path_three_path_data_with_controls() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + 
L_max=3, + ) + assert res.covariate_residuals is not None + assert isinstance(res.covariate_residuals, pd.DataFrame) + assert len(res.covariate_residuals) > 0 + + @pytest.mark.slow + def test_to_dataframe_by_path_with_controls_and_bootstrap(self): + """``results.to_dataframe(level='by_path')`` populates + ``cband_lower`` / ``cband_upper`` for paths passing the PR #374 + sup-t gates under ``controls`` — pre-empts the cross-surface + adjacency CI reviewers cycle on per + ``feedback_cross_surface_parity_audit.md``.""" + data = _load_by_path_controls_scenario() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, by_path=3, n_bootstrap=400, seed=42 + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + df_long = res.to_dataframe(level="by_path") + assert "cband_lower" in df_long.columns + assert "cband_upper" in df_long.columns + # At least one row must have a finite cband + any_finite_cband = ( + df_long["cband_lower"].notna() & df_long["cband_upper"].notna() + ).any() + assert any_finite_cband, ( + "to_dataframe(level='by_path') produced no rows with finite " + "cband columns under controls + bootstrap" + ) diff --git a/tests/test_chaisemartin_dhaultfoeuille_parity.py b/tests/test_chaisemartin_dhaultfoeuille_parity.py index c2081cf2..d0797458 100644 --- a/tests/test_chaisemartin_dhaultfoeuille_parity.py +++ b/tests/test_chaisemartin_dhaultfoeuille_parity.py @@ -751,3 +751,112 @@ def test_parity_multi_path_reversible_by_path_placebo(self, golden_values): f"path={path_key} lag={h} placebo SE: " f"py={py_se:.4f} vs r={r_se:.4f}" ) + + +class TestDCDHDynRParityByPathControls: + """ + Parity tests for ``by_path + controls`` (DID^X residualization) + against R DIDmultiplegtDYN 2.3.3. 
+ + R's ``did_multiplegt_dyn(..., by_path=k, controls="X1")`` re-runs + the estimator per path with a path-restricted subsample (path's + switchers + same-baseline not-yet-treated controls). Our + architecture residualizes once on the full panel before path + enumeration. On the ``multi_path_reversible`` DGP, all switchers + share baseline ``D_{g,1}=0``, so the per-path control pool that R + feeds to its per-baseline OLS residualization equals the global + control pool we use — and the residualization coefficients (and + therefore the residualized outcomes) coincide. Per-path point + estimates then match R exactly (rtol ~1e-11). Per-path SE + inherits the documented cross-path cohort-sharing deviation + (Phase 2 envelope). + + On multi-baseline DGPs the residualization coefficients can + diverge across paths under R's per-path call, producing a small + deviation in point estimates. The fixture intentionally sticks to + the single-baseline scenario to keep the parity claim tight. + """ + + POINT_RTOL = 1e-9 + SE_RTOL = 0.12 + + def _path_key_from_r_label(self, r_label: str): + return tuple(int(x) for x in r_label.split(",")) + + def test_parity_multi_path_reversible_by_path_controls(self, golden_values): + """3-path case with covariate residualization: by_path=3, controls=X1.""" + import math + import warnings + + scenario = golden_values.get("multi_path_reversible_by_path_controls") + if scenario is None: + pytest.skip( + "scenario 'multi_path_reversible_by_path_controls' not in golden values" + ) + + df = _golden_to_df_with_covariates(scenario["data"]) + est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + results = est.fit( + df, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + + r_by_path = scenario["results"]["by_path"] + assert results.path_effects is not None + + py_keys = 
set(results.path_effects.keys()) + r_keys = {self._path_key_from_r_label(e["path"]) for e in r_by_path} + assert py_keys == r_keys, ( + f"Path-set mismatch.\n" + f" Python only: {py_keys - r_keys}\n" + f" R only: {r_keys - py_keys}" + ) + + for r_path_entry in r_by_path: + path_key = self._path_key_from_r_label(r_path_entry["path"]) + py_path = results.path_effects[path_key] + + assert py_path["frequency_rank"] == r_path_entry["frequency_rank"], ( + f"path={path_key}: frequency_rank mismatch " + f"py={py_path['frequency_rank']} vs r={r_path_entry['frequency_rank']}" + ) + + for h_str, r_h in r_path_entry["horizons"].items(): + h = int(h_str) + assert ( + h in py_path["horizons"] + ), f"path={path_key}: horizon {h} missing from Python path_effects" + py_h = py_path["horizons"][h] + + assert py_h["n_obs"] == int(r_h["n_switchers"]), ( + f"path={path_key} h={h}: switcher-count mismatch " + f"py={py_h['n_obs']} vs r={int(r_h['n_switchers'])}" + ) + + assert py_h["effect"] == pytest.approx( + r_h["effect"], rel=self.POINT_RTOL + ), ( + f"path={path_key} h={h}: " + f"py={py_h['effect']:.4f} vs r={r_h['effect']:.4f}" + ) + + py_se = py_h["se"] + r_se = r_h["se"] + py_finite_positive = math.isfinite(py_se) and py_se > 0.0 + r_finite_positive = math.isfinite(r_se) and r_se > 0.0 + assert py_finite_positive == r_finite_positive, ( + f"path={path_key} h={h} SE state mismatch " + f"(py_se={py_se}, r_se={r_se})" + ) + if py_finite_positive and r_finite_positive: + assert py_se == pytest.approx(r_se, rel=self.SE_RTOL), ( + f"path={path_key} h={h} SE: " + f"py={py_se:.4f} vs r={r_se:.4f}" + ) From 028e7e180952f627642d9f545fb2f43d51451202 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 18:37:19 -0400 Subject: [PATCH 2/6] Address PR #378 R0 P1: multi-baseline R-deviation warning + REGISTRY note MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI reviewer flagged that the new by_path + controls combination silently produces 
point-estimate divergence from R on multi-baseline switcher panels (R
re-runs per-baseline residualization on each path's restricted subsample;
we residualize once globally). The parity test docstring documented the
deviation but REGISTRY.md and the runtime did not.

Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a panel
  with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py
  inside the controls residualization block, after
  _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on
  multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)"
  paragraph to document the point-estimate deviation alongside the
  existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's
  per-path re-residualization (the prior comment misstated R's behavior
  as "residualize once globally")
- Update parity test class docstring to be precise about R's per-path
  call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
  * test_multi_baseline_panel_emits_r_deviation_warning — joiner +
    leaver + always-treated + never-treated panel triggers the warning
  * test_single_baseline_panel_does_not_emit_r_deviation_warning —
    standard 3-path joiners-only fixture does NOT trigger the warning

The single-baseline R-parity scenario
(multi_path_reversible_by_path_controls) remains exact-match
(rtol ~1e-11) because all switchers in the DGP share D_{g,1}=0 and R's
per-path control pool reduces to the global control pool we use.

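For readers skimming the series, the detection rule behind the new warning
can be sketched in isolation. This is a minimal illustration, not the code in
fit(): the helper name `warn_if_multi_baseline` and its two array arguments
are hypothetical, and it assumes (as the patched block does) that a negative
first-switch index marks a group that never switches.

```python
import warnings

import numpy as np


def warn_if_multi_baseline(baselines, first_switch_idx):
    """Hypothetical helper: warn when switchers have more than one D_{g,1}.

    Returns True if the warning was emitted, False otherwise.
    """
    baselines = np.asarray(baselines)
    first_switch_idx = np.asarray(first_switch_idx)

    # Assumption: first_switch_idx < 0 means the group never switches,
    # so only switchers' baselines are inspected.
    switcher_mask = first_switch_idx >= 0
    if not switcher_mask.any():
        return False

    # More than one distinct baseline among switchers is the configuration
    # in which global and per-path residualization can disagree.
    if np.unique(baselines[switcher_mask]).size > 1:
        warnings.warn(
            "by_path + controls: switcher baselines D_{g,1} take multiple "
            "values; per-path point estimates can diverge from R.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Non-switchers (never-treated or always-treated groups) are ignored by the
check, so a panel whose switchers all start at D_{g,1}=0 stays silent even if
always-treated groups with baseline 1 are present.
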
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- benchmarks/R/generate_dcdh_dynr_test_values.R | 29 +++-- diff_diff/chaisemartin_dhaultfoeuille.py | 51 ++++++++- docs/methodology/REGISTRY.md | 2 +- tests/test_chaisemartin_dhaultfoeuille.py | 103 ++++++++++++++++++ ...test_chaisemartin_dhaultfoeuille_parity.py | 38 ++++--- 6 files changed, 195 insertions(+), 30 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index efc0c411..d60f513f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). 
Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract. +- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline OLS residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients vary per path when switchers have different baseline values. Our global-residualization architecture coincides with R on single-baseline switcher panels (every switcher shares the same `D_{g,1}`) — per-path point estimates match R exactly there. 
On multi-baseline panels, point estimates can diverge; the estimator emits a `UserWarning` at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. **SE inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` single-baseline golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph (with the multi-baseline caveat) and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance + multi-baseline `UserWarning` regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline warning). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract. - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. 
Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. 
**Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology. - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. 
Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract. 
- **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. 
diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
index f507bcfe..ed168644 100644
--- a/benchmarks/R/generate_dcdh_dynr_test_values.R
+++ b/benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -703,16 +703,25 @@ scenarios$multi_path_reversible_by_path_placebo <- list(
 # Wave 3 #5: by_path + DID^X residualization). Same deterministic DGP
 # and n_periods=10 as scenarios 14/15, with a confounding covariate X1
 # added via the same `add_covariate` helper used by scenario 10's
-# `joiners_only_controls`. Per-baseline OLS residualization runs once
-# globally before path enumeration on both Python and R sides
-# (verified against `chaisemartinPackages/did_multiplegt_dyn` source —
-# `did_multiplegt_by_path` calls `did_multiplegt_main()` once with the
-# global controls residualization, then disaggregates per-path through
-# aggregation). Per-path event-study point estimates and switcher
-# counts must match R exactly; per-path SE within the documented Phase
-# 2 envelope and inherits the cross-path cohort-sharing deviation from
-# R documented for `path_effects`. Single covariate keeps the scenario
-# tight; multi-covariate is exercised via internal regression tests.
+# `joiners_only_controls`. **R re-runs `did_multiplegt_main()` per path**
+# with a path-restricted subsample (path's switchers + same-baseline
+# not-yet-treated controls), so its per-baseline OLS residualization
+# coefficients can vary per path (verified against
+# `chaisemartinPackages/did_multiplegt_dyn` source —
+# `R/R/did_multiplegt_dyn.R` lines 393-411 dispatch the per-path loop;
+# `did_multiplegt_by_path` is a path-classifier preprocessor only).
+# Python residualizes once on the full panel before path enumeration,
+# then disaggregates per path. **The two strategies coincide on
+# single-baseline switcher panels** (every switcher shares D_{g,1}=0)
+# because R's per-path control pool then equals the global control pool
+# — `multi_path_reversible` is built precisely for this property, so
+# per-path event-study point estimates and switcher counts must match R
+# exactly. Per-path SE inherits the documented cross-path cohort-sharing
+# deviation from R for `path_effects`. On multi-baseline switcher panels
+# the residualization coefficients can diverge per path between Python
+# and R; the production fit emits a `UserWarning` in that configuration.
+# Single covariate keeps the scenario tight; multi-covariate is
+# exercised via internal regression tests.
 cat(" Scenario 16: multi_path_reversible_by_path_controls\n")
 d16 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
                       pattern = "multi_path_reversible", seed = 116,
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index 30db7c7c..3cc43eb5 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -419,9 +419,21 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         bootstrap SE, per-path placebos, and per-path sup-t bands all
         consume the residualized ``Y_mat`` automatically (Frisch-
         Waugh-Lovell). Per-period effects remain unadjusted, consistent
-        with the existing ``controls`` + per-period DID contract. The
-        cross-path cohort-sharing SE deviation from R documented for
-        ``path_effects`` is inherited unchanged.
+        with the existing ``controls`` + per-period DID contract.
+
+        **Deviation from R on multi-baseline switcher panels:** R
+        ``did_multiplegt_dyn(..., by_path, controls)`` re-runs the
+        per-baseline residualization on each path's restricted
+        subsample (path's switchers + same-baseline not-yet-treated
+        controls), so its residualization coefficients vary per path
+        when switchers have different baseline values. Our global-
+        residualization architecture coincides with R on single-
+        baseline panels (every switcher shares the same ``D_{g,1}``)
+        and per-path point estimates match exactly. On multi-baseline
+        panels, point estimates can diverge — a ``UserWarning`` is
+        emitted at fit-time when this configuration is detected.
+        SE inherits the cross-path cohort-sharing deviation from R
+        documented for ``path_effects``.
 
         Compatible with ``n_bootstrap > 0`` -- the top-k paths are
         enumerated once on the observed data (paths held fixed across
@@ -1478,6 +1490,39 @@ def fit(
             )
             _switch_metadata_computed = True
 
+            # by_path + controls multi-baseline deviation from R: R re-runs
+            # the per-baseline OLS residualization on each path's restricted
+            # subsample (path's switchers + same-baseline not-yet-treated
+            # controls), so its residualization coefficients can differ per
+            # path. We residualize once on the full panel before path
+            # enumeration. On single-baseline switcher panels (every
+            # switcher has the same D_{g,1}) the two strategies coincide
+            # and per-path point estimates match R exactly. On multi-
+            # baseline switcher panels they can diverge — warn the user
+            # explicitly so they don't silently consume estimates that
+            # disagree with R. SE inheritance (cross-path cohort-sharing)
+            # is documented separately in REGISTRY.md.
+            if self.by_path is not None:
+                _switcher_mask = first_switch_idx_arr >= 0
+                if _switcher_mask.any():
+                    _switcher_baselines = baselines[_switcher_mask]
+                    if np.unique(_switcher_baselines).size > 1:
+                        warnings.warn(
+                            "by_path + controls: switcher baselines D_{g,1} "
+                            "take multiple values in this panel. Python "
+                            "residualizes once on the full panel before path "
+                            "enumeration; R `did_multiplegt_dyn(..., by_path, "
+                            "controls)` re-runs residualization per path on "
+                            "the path-restricted subsample, so per-path point "
+                            "estimates can diverge between Python and R on "
+                            "this panel. See `docs/methodology/REGISTRY.md` "
+                            "(`Note (Phase 3 by_path ...)` -> Per-path "
+                            "covariate residualization) for the full "
+                            "deviation contract.",
+                            UserWarning,
+                            stacklevel=2,
+                        )
+
             Y_mat_residualized, covariate_diagnostics, _failed_baselines = (
                 _compute_covariate_residualization(
                     Y_mat=Y_mat,
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 344957f0..7693f543 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -638,7 +638,7 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and
 - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups.
-- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`.
**Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). 
The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). 
Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. 
Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). 
Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. 
Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. 
R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario; cross-surface inheritance is regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns).
+- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`.
This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. 
Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. 
When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. 
**SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. 
**Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). 
R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients can vary per path. Our global-residualization architecture coincides with R when every switcher shares the same baseline value (`D_{g,1}` constant across switchers — e.g., joiners-only or leavers-only panels), and per-path point estimates match R exactly on those panels. On multi-baseline switcher panels, point estimates can diverge — a `UserWarning` is emitted at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. 
**Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario (single-baseline DGP, exact point-estimate match measured rtol ~1e-11); cross-surface inheritance and the multi-baseline warning are regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline `UserWarning`).
 - **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`.
Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). 
**Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`.
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 7a18595c..01611da8 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -6556,3 +6556,106 @@ def test_to_dataframe_by_path_with_controls_and_bootstrap(self):
             "to_dataframe(level='by_path') produced no rows with finite "
             "cband columns under controls + bootstrap"
         )
+
+    # Multi-baseline R-deviation warning ---------------------------------
+    def test_multi_baseline_panel_emits_r_deviation_warning(self):
+        """When ``by_path + controls`` is fit on a panel where switchers
+        have multiple ``D_{g,1}`` baseline values, the estimator must
+        emit a ``UserWarning`` documenting the deviation from R's
+        per-path re-residualization. Verified against a panel with both
+        joiner switchers (``D_{g,1}=0``) and leaver switchers
+        (``D_{g,1}=1``), plus a longer panel and always-treated
+        controls so per-baseline residualization stays well-conditioned
+        on both baseline values."""
+        # 6 joiners (D_{g,1}=0) + 6 leavers (D_{g,1}=1) + 4 always-
+        # treated (D_{g,1}=1 controls) + 4 never-treated (D_{g,1}=0
+        # controls), 6 periods.
+        rng = np.random.default_rng(7)
+        rows = []
+
+        def _add(group, treatment_path):
+            for t, d in enumerate(treatment_path):
+                x = 0.05 * group + 0.15 * t + rng.normal(0, 0.1)
+                y = d * 2.0 + 1.0 * x + rng.normal(0, 0.1)
+                rows.append(
+                    {"group": group, "period": t, "treatment": d, "outcome": y, "X1": x}
+                )
+
+        for g in (1, 2, 3):
+            _add(g, [0, 0, 1, 1, 1, 1])  # joiner-late path 0,0,1,1,1,1
+        for g in (4, 5, 6):
+            _add(g, [0, 1, 1, 1, 1, 1])  # joiner-early path 0,1,1,1,1,1
+        for g in (7, 8, 9):
+            _add(g, [1, 0, 0, 0, 0, 0])  # leaver-early path 1,0,0,0,0,0
+        for g in (10, 11, 12):
+            _add(g, [1, 1, 1, 0, 0, 0])  # leaver-late path 1,1,1,0,0,0
+        for g in (13, 14, 15, 16):
+            _add(g, [1, 1, 1, 1, 1, 1])  # always-treated controls
+        for g in (17, 18, 19, 20):
+            _add(g, [0, 0, 0, 0, 0, 0])  # never-treated controls
+        data = pd.DataFrame(rows)
+
+        # Sanity: panel has switcher baselines {0, 1}
+        baselines_seen = data[data["period"] == 0].groupby("group")["treatment"].first()
+        assert sorted(baselines_seen.unique()) == [0, 1]
+
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False, by_path=2
+            )
+            est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                controls=["X1"],
+                L_max=3,
+            )
+
+        deviation_msgs = [
+            str(w.message)
+            for w in caught
+            if issubclass(w.category, UserWarning)
+            and "by_path + controls" in str(w.message)
+            and "multi-baseline" not in str(w.message).lower()
+            or (
+                issubclass(w.category, UserWarning)
+                and "switcher baselines" in str(w.message)
+            )
+        ]
+        assert deviation_msgs, (
+            "Expected a UserWarning mentioning 'by_path + controls' and "
+            "'switcher baselines D_{g,1}' on a multi-baseline panel. "
+            f"Captured warnings: {[str(w.message) for w in caught]}"
+        )
+
+    def test_single_baseline_panel_does_not_emit_r_deviation_warning(self):
+        """The multi-baseline R-deviation warning must NOT fire on a
+        single-baseline panel (every switcher has the same ``D_{g,1}``).
+        Pinned against the standard 3-path fixture (joiners-only, all
+        ``D_{g,1}=0``)."""
+        data = _by_path_three_path_data_with_controls()
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3)
+            est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                controls=["X1"],
+                L_max=3,
+            )
+        deviation_msgs = [
+            str(w.message)
+            for w in caught
+            if issubclass(w.category, UserWarning)
+            and "switcher baselines" in str(w.message)
+        ]
+        assert not deviation_msgs, (
+            "Multi-baseline deviation warning fired on a single-baseline "
+            f"panel: {deviation_msgs}"
+        )
diff --git a/tests/test_chaisemartin_dhaultfoeuille_parity.py b/tests/test_chaisemartin_dhaultfoeuille_parity.py
index d0797458..98f6c5f2 100644
--- a/tests/test_chaisemartin_dhaultfoeuille_parity.py
+++ b/tests/test_chaisemartin_dhaultfoeuille_parity.py
@@ -760,21 +760,29 @@ class TestDCDHDynRParityByPathControls:

     R's ``did_multiplegt_dyn(..., by_path=k, controls="X1")`` re-runs
     the estimator per path with a path-restricted subsample (path's
-    switchers + same-baseline not-yet-treated controls). Our
-    architecture residualizes once on the full panel before path
-    enumeration. On the ``multi_path_reversible`` DGP, all switchers
-    share baseline ``D_{g,1}=0``, so the per-path control pool that R
-    feeds to its per-baseline OLS residualization equals the global
-    control pool we use — and the residualization coefficients (and
-    therefore the residualized outcomes) coincide. Per-path point
-    estimates then match R exactly (rtol ~1e-11). Per-path SE
-    inherits the documented cross-path cohort-sharing deviation
-    (Phase 2 envelope).
-
-    On multi-baseline DGPs the residualization coefficients can
-    diverge across paths under R's per-path call, producing a small
-    deviation in point estimates. The fixture intentionally sticks to
-    the single-baseline scenario to keep the parity claim tight.
+    switchers + same-baseline not-yet-treated controls); see the
+    canonical R source at ``R/R/did_multiplegt_dyn.R`` lines 393-411
+    where the per-path dispatch loop calls ``did_multiplegt_main()``
+    once per path with ``df_main`` filtered to that path's groups
+    plus same-baseline not-yet-treated controls. Our architecture
+    residualizes once on the full panel before path enumeration.
+
+    On the ``multi_path_reversible`` DGP, every switcher shares
+    baseline ``D_{g,1}=0``, so R's per-path control pool reduces to
+    the global control pool we use — and the per-baseline OLS
+    residualization coefficients coincide. Per-path point estimates
+    then match R exactly (measured rtol ~1e-11). Per-path SE inherits
+    the documented cross-path cohort-sharing deviation (Phase 2
+    envelope, ~6.5% rtol on this scenario).
+
+    **On multi-baseline switcher panels** the residualization
+    coefficients vary per path under R's per-path call, producing a
+    point-estimate deviation between Python and R. The fixture
+    intentionally restricts to the single-baseline scenario to keep
+    the parity claim tight, and the production estimator emits a
+    ``UserWarning`` whenever ``by_path + controls`` is fit on a
+    multi-baseline panel so practitioners do not silently consume
+    estimates that disagree with R.
     """

     POINT_RTOL = 1e-9
From 1e49dff9c73cb73afbeb785dbe9a4d017c76b667 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 19:10:13 -0400
Subject: [PATCH 3/6] Address PR #378 R2 P2: CHANGELOG consistency for by_path gated combos
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Older Unreleased by_path entry (PR #357 origin) listed `controls` in the
gated-combos list, contradicting the new entry that says
`by_path + controls` is supported. Same internal contradiction applied
implicitly to the bootstrap, placebo, and sup-t bands extensions that
landed in the same Unreleased block — those weren't in the gated list to
begin with, but the entry didn't acknowledge they're now supported.

Updated the by_path bullet to remove `controls` from the gated list and
explicitly acknowledge `n_bootstrap > 0`, `placebo=True`, joint sup-t
bands, and `controls` as supported (see dedicated entries elsewhere in
[Unreleased]).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d60f513f..5f951c39 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -26,7 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated).
`HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)". - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). 
Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication). - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. 
`HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`. -- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` raise `NotImplementedError` and are deferred to follow-up PRs (`n_bootstrap > 0` is now supported — see the dedicated entry below). Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract. 
+- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` raise `NotImplementedError` and are deferred to follow-up PRs (`n_bootstrap > 0`, `placebo=True`, joint sup-t bands, and `controls` are now supported — see the dedicated entries elsewhere in `[Unreleased]`). Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract. - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0`** — bootstrap SE for per-path event-study effects. 
The top-k paths are enumerated once on the observed data (R-faithful path-stability semantics: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`, confirmed empirically against `DIDmultiplegtDYN 2.3.3`), and the existing multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` propagate the **bootstrap percentile** statistics (library Round-10 convention, same as `overall` / `joiners` / `leavers` / `multi_horizon`); `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation is *conditional on the observed path set* — practitioners wanting unconditional inference capturing path-selection uncertainty need a pairs-bootstrap (no R precedent). **SE inherits the analytical cross-path cohort-sharing deviation:** bootstrap input is the same full-panel cohort-centered path IF as the analytical path, so the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the existing analytical-path divergence from R on mixed-path cohorts (see REGISTRY.md for the full mechanism). On single-path-cohort panels, bootstrap and analytical SE both track R up to the Phase 2 envelope. **Deviation from R (CI method):** R's per-path bootstrap CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. 
Positive regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (`@pytest.mark.slow`): point-estimate invariance, finite SE on non-degenerate panels, bootstrap-vs-analytical SE within 30% rtol on cohort-clean panels, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → **Bootstrap SE** for the full write-up. - **R-parity for `ChaisemartinDHaultfoeuille.by_path`** against `DIDmultiplegtDYN 2.3.3`. Two new scenarios in `benchmarks/data/dcdh_dynr_golden_values.json` generated from `did_multiplegt_dyn(..., by_path=k)`: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 observed paths, `by_path=3`, via a new deterministic multi-path DGP pattern in the R generator). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path scenario, ≤ 4.2% on the 4-path scenario). Parity tests live at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath`, matching paths by tuple label via set-equality (robust to R's undocumented frequency-tie tiebreak) and cross-checking per-path switcher counts before SE comparison. **Deviation documented:** cross-path cohort sharing — our full-panel cohort-centered plug-in vs R's per-path re-run diverges materially when a `(D_{g,1}, F_g, S_g)` cohort spans multiple observed paths; the two coincide when every cohort is single-path. The parity scenarios are constructed to keep cohorts single-path (scenario 13 by design, scenario 14 via path-assignment-deterministic-on-F_g). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path...)` for the full write-up. 
- **`profile_panel()` utility + `llms-autonomous.txt` reference guide (agent-facing)** — new `diff_diff.profile_panel(df, *, unit, time, treatment, outcome)` returns a frozen `PanelProfile` dataclass of structural facts (panel balance, treatment-type classification — `"binary_absorbing"` / `"binary_non_absorbing"` / `"continuous"` / `"categorical"`, cohort structure, outcome characteristics, and a `tuple[Alert, ...]` of factual observations). `.to_dict()` returns a JSON-serializable view. Paired with a new bundled `"autonomous"` variant on `get_llm_guide()` — `get_llm_guide("autonomous")` returns a reference-shaped guide (distinct from the existing workflow-prose `"practitioner"` variant) with §1 audience disclaimer, §2 `PanelProfile` field reference, §3 embedded 17-estimator × 9-design-feature support matrix, §4 per-design-feature reasoning citing Baker et al. (2025) and Roth / Sant'Anna (2023), §5 post-fit validation index, §6 BR/DR schema reference, §7 citations, §8 intentional omissions. Both pieces are bundled inside the wheel (no GitHub / RTD dependency at runtime); `diff_diff/__init__.py` module docstring leads with an agent-entry block listing `profile_panel`, `get_llm_guide("autonomous")`, `get_llm_guide("practitioner")`, and `BusinessReport` so `help(diff_diff)` surfaces them. Descriptive, not opinionated — `profile_panel` alerts never recommend a specific estimator, and the guide enumerates trade-offs rather than dispatching. Exports: `profile_panel`, `PanelProfile`, `Alert` from top-level `diff_diff`. 
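
To illustrate the `by_path` ranking rule stated in the CHANGELOG entry above (top-k most common observed paths, ties in frequency broken lexicographically on the path tuple, overflow returning all observed paths), here is a minimal sketch. The names `rank_top_k_paths` and `paths_by_group` are hypothetical and are not part of `diff_diff`; the sketch also omits the `UserWarning` that the estimator is documented to emit on overflow.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[int, ...]


def rank_top_k_paths(paths_by_group: Dict[int, Path], k: int) -> List[Tuple[Path, int]]:
    """Hypothetical helper: return the top-k paths as (path, n_groups) pairs."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter(paths_by_group.values())
    # Descending frequency first, then ascending path tuple (lexicographic tie-break).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Slicing past the end returns every observed path (the overflow case).
    return ranked[:k]


# Illustrative input: one observed treatment path per switcher group.
example = {
    1: (0, 1, 1, 1), 2: (0, 1, 1, 1), 3: (0, 1, 1, 1),
    4: (0, 1, 1, 0), 5: (0, 1, 1, 0),
    6: (0, 1, 0, 0), 7: (0, 1, 0, 0),
}
top2 = rank_top_k_paths(example, k=2)
print(top2)
```

In this example the two paths with two groups each are tied, and `(0, 1, 0, 0)` is selected ahead of `(0, 1, 1, 0)` because it sorts first as a tuple, so every selected path gets a unique rank.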
From d24ae257aca4b7153cc6942dfbbe9295ee629ce6 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 19:39:35 -0400
Subject: [PATCH 4/6] Address PR #378 R3 P1: precise parity condition + heterogeneous-F_g regression
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reviewer flagged that the parity condition documented for by_path +
controls (single-baseline switcher panel) might not be sufficient,
hypothesizing that R's per-path subset could exclude pre-switch rows of
other-path switchers and produce a different first-stage
residualization sample.

Verified the hypothesis is empirically falsified and analytically
incorrect by reading R/R/did_multiplegt_dyn.R lines 401-405
line-by-line. R's per-path subset for path B includes:
- Rows where path_XX == B (path-B switchers, all rows)
- OR rows where yet_to_switch=1 AND baseline matches (pre-switch rows
  of any group with matching baseline, regardless of path)

So R's per-path first-stage sample equals (pre-switch rows of all
switchers with matching baseline + all rows of never-switchers with
matching baseline) — bit-identical to our global first-stage sample
under single-baseline switcher panels, regardless of how F_g varies
across paths or within a path.

Empirical confirmation: scenario 16
(`multi_path_reversible_by_path_controls`) has switcher F_g spanning
[0..6] across 4 distinct paths under D_{g,1}=0 and Python matches R to
rtol ~1e-11 across all (path, horizon) cells.

Strengthened the contract:
- Expanded the warning code comment to spell out R's per-path subset
  construction (citing R source line numbers) and why
  single-baseline-switcher is the precise parity condition
  (control-pool equivalence via the OR clause), with the empirical
  scenario reference baked in
- Updated REGISTRY.md "Per-path covariate residualization (DID^X)"
  paragraph to cite R lines 401-405 and clarify never-switcher
  baselines do not affect parity
- New regression test
  `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r`
  uses the golden-value scenario (single-baseline, heterogeneous F_g
  across paths) to assert: (a) no UserWarning fires, (b) per-path point
  estimates are produced finite. The numeric R-parity is locked
  separately in TestDCDHDynRParityByPathControls.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 diff_diff/chaisemartin_dhaultfoeuille.py  | 48 +++++++++----
 docs/methodology/REGISTRY.md              |  2 +-
 tests/test_chaisemartin_dhaultfoeuille.py | 86 +++++++++++++++++++++++
 3 files changed, 123 insertions(+), 13 deletions(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index 3cc43eb5..fdcb3054 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -1490,18 +1490,42 @@ def fit(
             )
             _switch_metadata_computed = True

-        # by_path + controls multi-baseline deviation from R: R re-runs
-        # the per-baseline OLS residualization on each path's restricted
-        # subsample (path's switchers + same-baseline not-yet-treated
-        # controls), so its residualization coefficients can differ per
-        # path. We residualize once on the full panel before path
-        # enumeration. On single-baseline switcher panels (every
-        # switcher has the same D_{g,1}) the two strategies coincide
-        # and per-path point estimates match R exactly. On multi-
-        # baseline switcher panels they can diverge — warn the user
-        # explicitly so they don't silently consume estimates that
-        # disagree with R. SE inheritance (cross-path cohort-sharing)
-        # is documented separately in REGISTRY.md.
+        # by_path + controls residualization-sample deviation from R.
+        # R's `did_multiplegt_dyn(..., by_path, controls)` calls
+        # `did_multiplegt_main()` once per path with `df_main` filtered
+        # to: rows of the path's switchers OR rows where
+        # `yet_to_switch=1 AND baseline matches the path's baseline`
+        # (R/R/did_multiplegt_dyn.R lines 401-405). Inside the per-path
+        # `did_multiplegt_main()` call, the per-baseline first-stage
+        # residualization regression uses `(g, t)` cells where g's
+        # treatment hasn't changed yet at t. Critically, R's path-
+        # restricted subset INCLUDES the pre-switch rows of OTHER-path
+        # switchers via the `yet_to_switch=1 AND baseline matches`
+        # clause, so the first-stage SAMPLE that R uses for path B
+        # equals: pre-switch rows of all switchers with matching
+        # baseline + all rows of never-switchers with matching
+        # baseline. This is BIT-IDENTICAL to the first-stage sample
+        # we use under our global residualization — first-stage
+        # coefficients (and therefore residualized outcomes) coincide,
+        # and per-path point estimates match R exactly **under single-
+        # baseline switcher panels** (every switcher has the same
+        # `D_{g,1}`, regardless of how `F_g` varies across paths or
+        # within a path). Empirical confirmation: the
+        # `multi_path_reversible_by_path_controls` R-parity scenario
+        # has 4 paths with switcher `F_g` values spanning [0..6] under
+        # `D_{g,1}=0` for every switcher, and Python matches R to
+        # rtol ~1e-11 across all `(path, horizon)` cells.
+        #
+        # On MULTI-baseline switcher panels the per-baseline regression
+        # coefficients diverge per path under R (R's per-path subset
+        # for path B drops switchers whose baseline differs from B's
+        # baseline), so point estimates can diverge between Python and
+        # R — warn the user explicitly. The check filters to switcher
+        # groups only (never-switchers do not contribute to "switcher
+        # baseline" multiplicity even if they appear at multiple
+        # `D_{g,1}` values across the never-treated / always-treated
+        # control mix). SE inheritance (cross-path cohort-sharing) is
+        # documented separately in REGISTRY.md.
         if self.by_path is not None:
             _switcher_mask = first_switch_idx_arr >= 0
             if _switcher_mask.any():
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 7693f543..ba1c4ce3 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -638,7 +638,7 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and
 - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups.
-- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. 
This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. 
**R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. 
**Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients can vary per path. Our global-residualization architecture coincides with R when every switcher shares the same baseline value (`D_{g,1}` constant across switchers — e.g., joiners-only or leavers-only panels), and per-path point estimates match R exactly on those panels. On multi-baseline switcher panels, point estimates can diverge — a `UserWarning` is emitted at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario (single-baseline DGP, exact point-estimate match measured rtol ~1e-11); cross-surface inheritance and the multi-baseline warning are regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline `UserWarning`). +- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). 
**Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). 
The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). 
Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. 
Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). 
Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. 
Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline residualization on each path's restricted subsample (`R/R/did_multiplegt_dyn.R` lines 401-405: rows of the path's switchers OR rows where `yet_to_switch=1 AND baseline matches the path's baseline`). 
The first-stage residualization sample R uses for path B equals: pre-switch rows of all switchers with matching baseline + all rows of never-switchers with matching baseline — bit-identical to our global first-stage sample under single-baseline switcher panels (every switcher shares the same `D_{g,1}`, regardless of how `F_g` or path identity varies across switchers). Per-path point estimates therefore match R exactly on those panels (the `multi_path_reversible_by_path_controls` parity fixture has 4 paths with switcher `F_g` values spanning [0..6] under `D_{g,1}=0`, and Python matches R to rtol ~1e-11). On multi-baseline switcher panels (some switchers have `D_{g,1}=0`, others have `D_{g,1}=1`) R's per-path subset drops switchers whose baseline differs from the path's baseline, so the per-baseline regression coefficients diverge per path under R and point estimates can diverge between Python and R — a `UserWarning` is emitted at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. The warning filters to switcher groups only; never-switchers (never-treated + always-treated controls) at multiple baseline values do NOT trigger the warning because they don't affect R's per-path subset construction. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. 
R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario (single-baseline DGP, exact point-estimate match measured rtol ~1e-11); cross-surface inheritance and the multi-baseline warning are regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline `UserWarning`). - **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. 
**Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). 
Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`. diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py index 01611da8..993cebde 100644 --- a/tests/test_chaisemartin_dhaultfoeuille.py +++ b/tests/test_chaisemartin_dhaultfoeuille.py @@ -6659,3 +6659,89 @@ def test_single_baseline_panel_does_not_emit_r_deviation_warning(self): "Multi-baseline deviation warning fired on a single-baseline " f"panel: {deviation_msgs}" ) + + def test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r(self): + """Pin the precise parity condition: single-baseline switcher + panel with HETEROGENEOUS ``F_g`` across paths produces (a) no + multi-baseline UserWarning and (b) per-path point estimates that + match R to rtol ~1e-11. Uses the + ``multi_path_reversible_by_path_controls`` golden-value scenario, + whose switchers all share ``D_{g,1}=0`` while ``F_g`` spans + [0..6] across 4 distinct observed paths. + + Why this is the right parity condition (not just a global + baseline check): R's per-path subset + (``R/R/did_multiplegt_dyn.R`` lines 401-405) includes + ``yet_to_switch=1`` rows with matching baseline regardless of + which path the row's group belongs to. So R's per-path first- + stage residualization sample equals (pre-switch rows of all + switchers with matching baseline + all rows of never-switchers + with matching baseline) — bit-identical to our global first- + stage sample under single-baseline conditions, even when ``F_g`` + and path identity vary across switchers.""" + data = _load_by_path_controls_scenario() + + # Sanity: panel has multiple distinct switcher F_g values but a + # single switcher baseline. A "switcher" is a group whose + # treatment changes over time; always-treated and never-treated + # groups are NOT switchers regardless of their D_{g,1} value.
+ treatment_per_group = data.groupby("group")["treatment"] + is_switcher_per_group = treatment_per_group.nunique() > 1 + switcher_groups = is_switcher_per_group[is_switcher_per_group].index + baselines_at_t0 = data[data["period"] == 0].set_index("group")["treatment"] + switcher_baselines = baselines_at_t0.loc[switcher_groups] + assert switcher_baselines.nunique() == 1, ( + f"Fixture invariant violated: switcher baselines should be a " + f"single value, got {sorted(switcher_baselines.unique())}" + ) + first_treat = ( + data[ + (data["treatment"] == 1) & data["group"].isin(switcher_groups) + ] + .groupby("group")["period"] + .min() + ) + assert first_treat.nunique() > 1, ( + f"Fixture invariant violated: switcher F_g should span " + f"multiple values, got {sorted(first_treat.unique())}" + ) + + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + controls=["X1"], + L_max=3, + ) + + deviation_msgs = [ + str(w.message) + for w in caught + if issubclass(w.category, UserWarning) + and "switcher baselines" in str(w.message) + ] + assert not deviation_msgs, ( + "Multi-baseline deviation warning fired on a single-baseline " + f"panel with heterogeneous F_g: {deviation_msgs}. The parity " + "condition is single-baseline-switcher (regardless of F_g " + "heterogeneity), so this scenario must NOT trigger the warning." + ) + + # Locked numeric checks: per-path point estimates from this fit + # match the R `did_multiplegt_dyn(..., by_path=3, controls="X1")` + # output to rtol ~1e-11 on this scenario (the parity test in + # `test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` + # asserts this against the golden values; here we lock the + # internal-only invariant that the estimates are produced). 
+ assert res.path_effects is not None and len(res.path_effects) >= 1 + for path, entry in res.path_effects.items(): + for l_h, vals in entry["horizons"].items(): + assert np.isfinite(vals["effect"]), ( + f"path={path} l={l_h}: effect not finite under " + f"single-baseline + heterogeneous F_g" + ) From fb7a9129a35d08afabe0fdfe1625bc8080eb452e Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 19:47:49 -0400 Subject: [PATCH 5/6] Address PR #378 R3 P3 ×2: cell-weighting cross-ref + Step 7b comment scope MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit P3 #1 (Methodology): qualified the "exact R match" claim across docstring / REGISTRY / CHANGELOG / R-generator comment / parity test docstring with a cross-reference to the existing DID^X cell-weighting deviation (Python's first-stage uses equal cell weights, R weights by N_gt). The two coincide on one-observation-per-(g,t) panels (the common cell-aggregated regime that the parity scenario uses). The multi-observation-per-cell deviation is independent of the by_path lift and was already documented in REGISTRY's "Note (Phase 3 DID^X covariate adjustment)". P3 #2 (Maintainability): narrowed the Step 7b header comment in chaisemartin_dhaultfoeuille.py:1465-1473 to spell out that DID^X residualization applies to the per-group multi-horizon path (event_study_effects, overall_att, joiners/leavers, by_path, placebos, sup-t bands) but intentionally excludes per_period_effects which stays on raw outcomes per the existing "Note (Phase 3 DID^X covariate adjustment)" contract. Documentation-only fix; no runtime behavior change.
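Illustration for P3 #1 (a minimal sketch, not the library's first stage): the toy regression below assumes an intercept plus a single first-differenced covariate, and the helper name `first_stage_coef` and the simulated data are hypothetical. It only shows why equal cell weights and N_gt weights give the same coefficients when every (g,t) cell holds one observation, and generally different ones otherwise.

```python
import numpy as np


def first_stage_coef(dX, dY, weights=None):
    """Fit dY on [1, dX] by least squares.

    weights=None means equal cell weights (plain OLS); otherwise the rows
    are weighted by the supplied cell counts (WLS via sqrt-weight scaling).
    Hypothetical helper for illustration only.
    """
    X = np.column_stack([np.ones(len(dX)), dX])
    w = np.ones(len(dY)) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], dY * sw, rcond=None)
    return coef


rng = np.random.default_rng(0)
dX = rng.normal(size=50)             # toy first-differenced covariate
dY = 0.5 * dX + rng.normal(size=50)  # toy first-differenced outcome

n_gt_one = np.ones(50)                      # one observation per (g,t) cell
n_gt_varied = rng.integers(1, 10, size=50)  # several observations per cell

equal = first_stage_coef(dX, dY)                         # equal cell weights
weighted_one = first_stage_coef(dX, dY, n_gt_one)        # N_gt weights, all 1
weighted_varied = first_stage_coef(dX, dY, n_gt_varied)  # N_gt weights, varying
```

With `n_gt_one` the weighted and unweighted fits solve the same least-squares problem, so `equal` and `weighted_one` agree; with `n_gt_varied` the two objectives differ and so, in general, do the coefficients.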
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- benchmarks/R/generate_dcdh_dynr_test_values.R | 10 ++++++-- diff_diff/chaisemartin_dhaultfoeuille.py | 25 +++++++++++++------ docs/methodology/REGISTRY.md | 2 +- ...test_chaisemartin_dhaultfoeuille_parity.py | 14 ++++++++--- 5 files changed, 37 insertions(+), 16 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 5f951c39..dc67d0e2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added -- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline OLS residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients vary per path when switchers have different baseline values. Our global-residualization architecture coincides with R on single-baseline switcher panels (every switcher shares the same `D_{g,1}`) — per-path point estimates match R exactly there. 
On multi-baseline panels, point estimates can diverge; the estimator emits a `UserWarning` at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. **SE inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` single-baseline golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph (with the multi-baseline caveat) and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance + multi-baseline `UserWarning` regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline warning). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract. +- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). 
Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline OLS residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients vary per path when switchers have different baseline values. Our global-residualization architecture coincides with R on single-baseline switcher panels (every switcher shares the same `D_{g,1}`) — per-path point estimates match R exactly there. On multi-baseline panels, point estimates can diverge; the estimator emits a `UserWarning` at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. **SE inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` single-baseline golden-value scenario (per-path point estimates match R bit-exactly — measured rtol ~1e-11 across all path × horizon cells — on this one-observation-per-cell scenario; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). 
On panels with multiple observations per `(g, t)` cell, our equal-cell-weighting first stage diverges from R's `N_gt`-weighted first stage, per the existing DID^X cell-weighting deviation documented in `docs/methodology/REGISTRY.md` `Note (Phase 3 DID^X covariate adjustment)`; that deviation is independent of the `by_path` lift. Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph (with the multi-baseline caveat) and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance + multi-baseline `UserWarning` regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline warning). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract. - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering.
Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. 
Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology. - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. 
Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract. - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. 
Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R index ed168644..69d37bf2 100644 --- a/benchmarks/R/generate_dcdh_dynr_test_values.R +++ b/benchmarks/R/generate_dcdh_dynr_test_values.R @@ -714,9 +714,15 @@ scenarios$multi_path_reversible_by_path_placebo <- list( # then disaggregates per path. 
**The two strategies coincide on # single-baseline switcher panels** (every switcher shares D_{g,1}=0) # because R's per-path control pool then equals the global control pool -# — `multi_path_reversible` is built precisely for this property, so +# # — `multi_path_reversible` is built precisely for this property, so # per-path event-study point estimates and switcher counts must match R -# exactly. Per-path SE inherits the documented cross-path cohort-sharing +# bit-exactly on the one-observation-per-(g,t) DGP this generator +# produces. (On panels with multiple observations per `(g, t)` cell, the +# library's equal-cell-weighting first stage diverges from R's `N_gt`- +# weighted first stage per the existing DID^X cell-weighting deviation +# in `docs/methodology/REGISTRY.md` "Note (Phase 3 DID^X covariate +# adjustment)" — that deviation is independent of the by_path lift.) +# Per-path SE inherits the documented cross-path cohort-sharing # deviation from R for `path_effects`. On multi-baseline switcher panels # the residualization coefficients can diverge per path between Python # and R; the production fit emits a `UserWarning` in that configuration. diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index fdcb3054..9d138fed 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -429,11 +429,15 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin): when switchers have different baseline values. Our global- residualization architecture coincides with R on single- baseline panels (every switcher shares the same ``D_{g,1}``) - and per-path point estimates match exactly. On multi-baseline - panels, point estimates can diverge — a ``UserWarning`` is - emitted at fit-time when this configuration is detected. - SE inherits the cross-path cohort-sharing deviation from R - documented for ``path_effects``. 
+ and per-path point estimates match exactly on the one- + observation-per-``(g, t)`` regime; on multi-observation-per- + cell panels the existing DID^X cell-weighting deviation from + R applies (see ``docs/methodology/REGISTRY.md`` "Note (Phase + 3 DID^X covariate adjustment)"; independent of the by_path + lift). On multi-baseline switcher panels, point estimates can + diverge — a ``UserWarning`` is emitted at fit-time when this + configuration is detected. SE inherits the cross-path cohort- + sharing deviation from R documented for ``path_effects``. Compatible with ``n_bootstrap > 0`` -- the top-k paths are enumerated once on the observed data (paths held fixed across @@ -1467,9 +1471,14 @@ def fit( # # When controls are specified, residualize Y_mat by partialling # out covariate effects per baseline treatment group. This - # transforms Y_mat in-place so ALL downstream DID computations - # (per-period and per-group multi-horizon) automatically produce - # covariate-adjusted estimates. See Web Appendix Section 1.2. + # transforms Y_mat so the per-group multi-horizon DID path + # (event_study_effects, overall_att, joiners/leavers, by_path + # surfaces, placebos, sup-t bands) automatically produces + # covariate-adjusted estimates. The per-period DID path + # (per_period_effects) intentionally remains on raw outcomes — + # it uses binary joiner/leaver categorization and is not part + # of the DID^X contract per REGISTRY.md "Note (Phase 3 DID^X + # covariate adjustment)". See Web Appendix Section 1.2. 
# ------------------------------------------------------------------ covariate_diagnostics: Optional[Dict[str, Any]] = None _switch_metadata_computed = False diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index ba1c4ce3..2b43c61e 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -638,7 +638,7 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups. -- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. 
**Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). 
The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). 
Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. 
Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). 
Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. 
Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline residualization on each path's restricted subsample (`R/R/did_multiplegt_dyn.R` lines 401-405: rows of the path's switchers OR rows where `yet_to_switch=1 AND baseline matches the path's baseline`). 
The first-stage residualization sample R uses for path B equals: pre-switch rows of all switchers with matching baseline + all rows of never-switchers with matching baseline — bit-identical to our global first-stage sample under single-baseline switcher panels (every switcher shares the same `D_{g,1}`, regardless of how `F_g` or path identity varies across switchers). Per-path point estimates therefore match R exactly on those panels (the `multi_path_reversible_by_path_controls` parity fixture has 4 paths with switcher `F_g` values spanning [0..6] under `D_{g,1}=0`, and Python matches R to rtol ~1e-11). On multi-baseline switcher panels (some switchers have `D_{g,1}=0`, others have `D_{g,1}=1`) R's per-path subset drops switchers whose baseline differs from the path's baseline, so the per-baseline regression coefficients diverge per path under R and point estimates can diverge between Python and R — a `UserWarning` is emitted at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. The warning filters to switcher groups only; never-switchers (never-treated + always-treated controls) at multiple baseline values do NOT trigger the warning because they don't affect R's per-path subset construction. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. 
R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario (single-baseline DGP, exact point-estimate match measured rtol ~1e-11); cross-surface inheritance and the multi-baseline warning are regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline `UserWarning`). +- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. 
After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. 
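The window and ranking conventions in this note can be illustrated with a minimal sketch. It assumes a plain dict mapping each group to its binary treatment sequence, uses 0-based period indices, and ignores unobserved cells. `path_window` and `top_k_paths` are hypothetical helper names for illustration only, not functions from the library.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Path = Tuple[int, ...]


def path_window(d: Sequence[int], l_max: int) -> Optional[Path]:
    """Return (D_{F_g-1}, ..., D_{F_g-1+L_max}), or None when no window exists."""
    # F_g (0-based here): first period whose treatment differs from baseline d[0].
    f_g = next((t for t, v in enumerate(d) if v != d[0]), None)
    if f_g is None or f_g - 1 + l_max >= len(d):
        return None  # never-switcher, or the window runs past the panel
    return tuple(d[f_g - 1 : f_g + l_max])  # length l_max + 1


def top_k_paths(
    treatment: Dict[str, Sequence[int]], k: int, l_max: int
) -> List[Tuple[Path, int, int]]:
    """Rank observed paths by descending frequency, ties broken lexicographically."""
    counts: Counter = Counter()
    for d in treatment.values():
        path = path_window(d, l_max)
        if path is not None:
            counts[path] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Each entry is (path, n_groups, frequency_rank); ranks are unique by construction.
    return [(path, n, rank) for rank, (path, n) in enumerate(ranked[:k], start=1)]


# Illustrative panel: two paths tied at frequency 2, plus one never-switcher.
panel = {
    "a": [0, 1, 1, 1],
    "b": [0, 1, 1, 1],
    "c": [0, 1, 0, 0],
    "d": [0, 0, 1, 0],
    "e": [0, 0, 0, 0],
}
ranked = top_k_paths(panel, k=2, l_max=2)
```

In this toy panel the tie between `(0, 1, 0)` and `(0, 1, 1)` is resolved by tuple order, so `(0, 1, 0)` receives rank 1.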
Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. 
When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. 
**SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. 
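The difference between a normal-theory CI and a percentile CI can be seen in a small sketch. It assumes a synthetic centered influence function `psi`, Rademacher weights, and a `1/n` scaling of the perturbation. `multiplier_bootstrap_cis` is a hypothetical name and is not the library's `_bootstrap_one_target` or `compute_effect_bootstrap_stats`.

```python
from statistics import NormalDist

import numpy as np


def multiplier_bootstrap_cis(effect, psi, n_bootstrap=999, alpha=0.05, seed=0):
    """Return (se, normal_theory_ci, percentile_ci) from one set of multiplier draws."""
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    # Rademacher multipliers: one +/-1 weight per group and bootstrap replicate.
    w = rng.choice([-1.0, 1.0], size=(n_bootstrap, n))
    draws = effect + (w @ psi) / n  # perturbed estimates (assumed 1/n scaling)
    se = float(draws.std(ddof=1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    normal_ci = (effect - z * se, effect + z * se)  # symmetric around the estimate
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    percentile_ci = (float(lo), float(hi))  # read off the draw distribution
    return se, normal_ci, percentile_ci


# Synthetic, centered influence-function values for 200 groups.
rng = np.random.default_rng(1)
psi = rng.normal(size=200)
psi -= psi.mean()
se, normal_ci, percentile_ci = multiplier_bootstrap_cis(0.5, psi)
```

Both intervals share the same bootstrap SE. Only the normal-theory interval is forced to be symmetric around the point estimate, so the two generally differ slightly in finite samples.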
**Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). 
R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). **Per-path covariate residualization (DID^X):** when `controls=[...]` is set with `by_path=k`, the per-baseline OLS residualization (Web Appendix Section 1.2) runs once on the first-differenced outcome BEFORE path enumeration. All four downstream surfaces — analytical per-path SE, bootstrap SE, per-path placebos, and per-path joint sup-t bands — consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline residualization on each path's restricted subsample (`R/R/did_multiplegt_dyn.R` lines 401-405: rows of the path's switchers OR rows where `yet_to_switch=1 AND baseline matches the path's baseline`). The first-stage residualization sample R uses for path B equals: pre-switch rows of all switchers with matching baseline + all rows of never-switchers with matching baseline — bit-identical to our global first-stage sample under single-baseline switcher panels (every switcher shares the same `D_{g,1}`, regardless of how `F_g` or path identity varies across switchers). 
Per-path point estimates therefore coincide with R on those panels up to the existing **DID^X first-stage cell-weighting deviation** documented above in `Note (Phase 3 DID^X covariate adjustment)` (Python's first-stage OLS uses equal cell weights — one observation per `(g, t)` cell, consistent with the library's cell-aggregated input convention; R weights by `N_gt`). On panels with one observation per `(g, t)` cell (the common case after the cell-aggregation step in `fit()`), Python matches R bit-exactly: the `multi_path_reversible_by_path_controls` parity fixture has 4 paths with switcher `F_g` values spanning [0..6] under `D_{g,1}=0` and Python matches R to rtol ~1e-11. On multi-baseline switcher panels (some switchers have `D_{g,1}=0`, others have `D_{g,1}=1`) R's per-path subset drops switchers whose baseline differs from the path's baseline, so the per-baseline regression coefficients diverge per path under R and point estimates can diverge between Python and R — a `UserWarning` is emitted at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. The warning filters to switcher groups only; never-switchers (never-treated + always-treated controls) at multiple baseline values do NOT trigger the warning because they don't affect R's per-path subset construction. **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. 
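The warning condition described here (multiple baselines among switcher groups only) could be checked along the following lines. This is a sketch under stated assumptions: treatment is given as one sequence per group, and `warn_if_multi_baseline_switchers` is a hypothetical name, not the predicate the library actually uses.

```python
import warnings
from typing import Dict, Sequence


def warn_if_multi_baseline_switchers(treatment: Dict[str, Sequence[int]]) -> bool:
    """Warn when switcher groups start from more than one baseline D_{g,1}."""
    # Only groups whose treatment ever changes count as switchers; never-treated
    # and always-treated groups are excluded from the baseline set.
    switcher_baselines = {
        d[0] for d in treatment.values() if any(v != d[0] for v in d)
    }
    if len(switcher_baselines) > 1:
        warnings.warn(
            "Switchers have multiple baselines; per-path estimates with controls "
            "may differ from R's per-path residualization.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Never-switchers at both baselines, but every switcher starts untreated.
single_baseline = {"a": [0, 1, 1], "b": [0, 0, 1], "nt": [0, 0, 0], "at": [1, 1, 1]}
# Adding a leaver (baseline 1) makes the switcher baselines heterogeneous.
multi_baseline = dict(single_baseline, c=[1, 0, 0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flag_single = warn_if_multi_baseline_switchers(single_baseline)
    flag_multi = warn_if_multi_baseline_switchers(multi_baseline)
n_warnings = len(caught)
```

The always-treated group `at` has baseline 1 but never switches, so it does not trigger the warning on its own.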
R-parity is confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls` on the `multi_path_reversible_by_path_controls` scenario (single-baseline DGP, exact point-estimate match measured rtol ~1e-11); cross-surface inheritance and the multi-baseline warning are regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline `UserWarning`). - **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. 
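The shared-draw construction and the two gates can be sketched as follows. The sketch assumes Rademacher weights, a `1/n` scaling of the perturbation, and synthetic influence-function columns. `sup_t_band` is a hypothetical helper, not the library's bootstrap code.

```python
import numpy as np


def sup_t_band(effects, ses, psi, n_bootstrap=999, alpha=0.05, seed=0):
    """Joint band across horizons of one path; None when either gate fails.

    effects, ses: shape (L,); psi: shape (n, L), one centered IF column per horizon.
    """
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    valid = np.isfinite(ses) & (ses > 0)
    if valid.sum() < 2:  # gate 1: at least two valid horizons
        return None
    # One weight matrix shared by every horizon, so draws are correlated across l.
    w = rng.choice([-1.0, 1.0], size=(n_bootstrap, n))
    t_draws = (w @ psi[:, valid]) / n / ses[valid]
    sup_t = np.abs(t_draws).max(axis=1)
    finite = np.isfinite(sup_t)
    if finite.sum() <= 0.5 * n_bootstrap:  # gate 2: strict majority of finite draws
        return None
    crit = float(np.quantile(sup_t[finite], 1 - alpha))
    lower = np.where(valid, effects - crit * ses, np.nan)
    upper = np.where(valid, effects + crit * ses, np.nan)
    return crit, lower, upper


# Synthetic inputs: 300 groups, 3 horizons, SEs matched to the draw scale.
rng = np.random.default_rng(2)
psi = rng.normal(size=(300, 3))
psi -= psi.mean(axis=0)
effects = np.array([0.2, 0.3, 0.1])
ses = np.sqrt((psi ** 2).sum(axis=0)) / psi.shape[0]
band = sup_t_band(effects, ses, psi)
too_few = sup_t_band(effects, np.array([ses[0], np.nan, np.nan]), psi)
```

Because the critical value is a quantile of the maximum absolute t-statistic over horizons, it exceeds the pointwise normal value of about 1.96, which is what widens the joint band relative to per-horizon intervals.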
**Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). 
Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`.
diff --git a/tests/test_chaisemartin_dhaultfoeuille_parity.py b/tests/test_chaisemartin_dhaultfoeuille_parity.py
index 98f6c5f2..550a43c6 100644
--- a/tests/test_chaisemartin_dhaultfoeuille_parity.py
+++ b/tests/test_chaisemartin_dhaultfoeuille_parity.py
@@ -770,10 +770,16 @@ class TestDCDHDynRParityByPathControls:
     On the ``multi_path_reversible`` DGP, every switcher shares baseline ``D_{g,1}=0``, so R's per-path control pool reduces to the global control pool we use — and the per-baseline OLS
-    residualization coefficients coincide. Per-path point estimates
-    then match R exactly (measured rtol ~1e-11). Per-path SE inherits
-    the documented cross-path cohort-sharing deviation (Phase 2
-    envelope, ~6.5% rtol on this scenario).
+    residualization coefficients coincide on the one-observation-per-
+    ``(g, t)`` panel this fixture produces. Per-path point estimates
+    then match R bit-exactly (measured rtol ~1e-11). On
+    multi-observation-per-cell panels the library's equal-cell-
+    weighting first stage diverges from R's ``N_gt``-weighted first
+    stage per the existing DID^X cell-weighting deviation documented
+    in ``docs/methodology/REGISTRY.md`` "Note (Phase 3 DID^X covariate
+    adjustment)" — that deviation is independent of the by_path lift.
+    Per-path SE inherits the documented cross-path cohort-sharing
+    deviation (Phase 2 envelope, ~6.5% rtol on this scenario).
     **On multi-baseline switcher panels** the residualization coefficients vary per path under R's per-path call, producing a

From 6785d6a8a29da219c361f2e28029fd64f87fbcd7 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 21:06:02 -0400
Subject: [PATCH 6/6] Address PR #378 R4 P3: rename test to reflect actual scope
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Test name `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r` oversold its scope — the body only checks warning suppression and finite effects. The numeric R-parity assertion (rtol ~1e-11 per-path) lives in TestDCDHDynRParityByPathControls which fits the same scenario and compares against the golden values.

Renamed to `test_single_baseline_heterogeneous_F_g_does_not_warn` and updated docstring + inline comment to spell out: this test locks the warning-suppression invariant; the numeric R-parity check is in the parity test class.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/test_chaisemartin_dhaultfoeuille.py | 36 +++++++++++++++--------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 993cebde..8df6954c 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -6660,16 +6660,16 @@ def test_single_baseline_panel_does_not_emit_r_deviation_warning(self):
                 f"panel: {deviation_msgs}"
             )
 
-    def test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r(self):
-        """Pin the precise parity condition: single-baseline switcher
-        panel with HETEROGENEOUS ``F_g`` across paths produces (a) no
-        multi-baseline UserWarning and (b) per-path point estimates that
-        match R bit-exactly.
Uses the + def test_single_baseline_heterogeneous_F_g_does_not_warn(self): + """Pin the precise warning condition: single-baseline switcher + panel with HETEROGENEOUS ``F_g`` across paths must NOT trigger + the multi-baseline R-deviation warning, and the fit must + produce finite per-path effects. Uses the ``multi_path_reversible_by_path_controls`` golden-value scenario, whose switchers all share ``D_{g,1}=0`` while ``F_g`` spans [0..6] across 4 distinct observed paths. - Why this is the right parity condition (not just a global + Why this is the right warning condition (not just a global baseline check): R's per-path subset (``R/R/did_multiplegt_dyn.R`` lines 401-405) includes ``yet_to_switch=1`` rows with matching baseline regardless of @@ -6678,7 +6678,17 @@ def test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r(self): switchers with matching baseline + all rows of never-switchers with matching baseline) — bit-identical to our global first- stage sample under single-baseline conditions, even when ``F_g`` - and path identity vary across switchers.""" + and path identity vary across switchers. + + The actual numeric R-parity assertion (rtol ~1e-11 on per-path + point estimates) lives in + ``tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls::test_parity_multi_path_reversible_by_path_controls``, + which fits the same scenario and compares cell-by-cell against + the R-generated golden values. This test deliberately does NOT + duplicate that numeric check; it locks the warning-suppression + invariant on the same fixture so future changes to either the + warning predicate or the parity scenario keep both surfaces + coherent.""" data = _load_by_path_controls_scenario() # Sanity: panel has multiple distinct switcher F_g values but a @@ -6732,12 +6742,12 @@ def test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r(self): "heterogeneity), so this scenario must NOT trigger the warning." 
         )
 
-        # Locked numeric checks: per-path point estimates from this fit
-        # match the R `did_multiplegt_dyn(..., by_path=3, controls="X1")`
-        # output to rtol ~1e-11 on this scenario (the parity test in
-        # `test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`
-        # asserts this against the golden values; here we lock the
-        # internal-only invariant that the estimates are produced).
+        # Lock the local invariant that the fit produces non-empty
+        # finite per-path estimates on this scenario. The numeric R-
+        # parity assertion (per-path point estimates within rtol ~1e-11
+        # of R) is locked separately in
+        # `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`
+        # against the golden values.
         assert res.path_effects is not None and len(res.path_effects) >= 1
         for path, entry in res.path_effects.items():
             for l_h, vals in entry["horizons"].items():