3 changes: 2 additions & 1 deletion CHANGELOG.md

Large diffs are not rendered by default.

54 changes: 54 additions & 0 deletions benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -699,6 +699,60 @@ scenarios$multi_path_reversible_by_path_placebo <- list(
   results = extract_dcdh_by_path(res15, n_effects = 3, n_placebos = 2)
 )

+# Scenario 16: multi_path_reversible + by_path=3 + controls="X1" (Phase 3
+# Wave 3 #5: by_path + DID^X residualization). Same deterministic DGP
+# and n_periods=10 as scenarios 14/15, with a confounding covariate X1
+# added via the same `add_covariate` helper used by scenario 10's
+# `joiners_only_controls`. **R re-runs `did_multiplegt_main()` per path**
+# with a path-restricted subsample (path's switchers + same-baseline
+# not-yet-treated controls), so its per-baseline OLS residualization
+# coefficients can vary per path (verified against the
+# `chaisemartinPackages/did_multiplegt_dyn` source:
+# `R/R/did_multiplegt_dyn.R` lines 393-411 dispatch the per-path loop;
+# `did_multiplegt_by_path` is a path-classifier preprocessor only).
+# Python residualizes once on the full panel before path enumeration,
+# then disaggregates per path. **The two strategies coincide on
+# single-baseline switcher panels** (every switcher shares D_{g,1}=0)
+# because R's per-path control pool then equals the global control pool.
+# `multi_path_reversible` is built precisely for this property, so
+# per-path event-study point estimates and switcher counts must match R
+# up to floating-point tolerance on the one-observation-per-(g,t) DGP
+# this generator produces. (On panels with multiple observations per
+# `(g, t)` cell, the library's equal-cell-weighting first stage diverges
+# from R's `N_gt`-weighted first stage per the existing DID^X
+# cell-weighting deviation in `docs/methodology/REGISTRY.md` "Note
+# (Phase 3 DID^X covariate adjustment)"; that deviation is independent
+# of the by_path lift.)
+# Per-path SE inherits the documented cross-path cohort-sharing
+# deviation from R for `path_effects`. On multi-baseline switcher panels
+# the residualization coefficients can diverge per path between Python
+# and R; the production fit emits a `UserWarning` in that configuration.
+# Single covariate keeps the scenario tight; multi-covariate is
+# exercised via internal regression tests.
+cat(" Scenario 16: multi_path_reversible_by_path_controls\n")
+d16 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
+                      pattern = "multi_path_reversible", seed = 116,
+                      L_max = 3)
+d16 <- add_covariate(d16, seed = 216, x_effect = 1.5)
+res16 <- did_multiplegt_dyn(
+  df = d16, outcome = "outcome", group = "group", time = "period",
+  treatment = "treatment", effects = 3, by_path = 3, controls = "X1",
+  ci_level = 95
+)
+scenarios$multi_path_reversible_by_path_controls <- list(
+  data = list(
+    group = as.numeric(d16$group),
+    period = as.numeric(d16$period),
+    treatment = as.numeric(d16$treatment),
+    outcome = as.numeric(d16$outcome),
+    X1 = as.numeric(d16$X1)
+  ),
+  params = list(pattern = "multi_path_reversible",
+                n_switcher_groups = N_GOLDEN, n_realized_groups = N_GOLDEN + 40L,
+                n_periods = 10, seed = 116, effects = 3, by_path = 3,
+                controls = "X1", ci_level = 95),
+  results = extract_dcdh_by_path(res16, n_effects = 3)
+)
+
 # ---------------------------------------------------------------------------
 # Write output
 # ---------------------------------------------------------------------------
114 changes: 114 additions & 0 deletions benchmarks/data/dcdh_dynr_golden_values.json

Large diffs are not rendered by default.

107 changes: 95 additions & 12 deletions diff_diff/chaisemartin_dhaultfoeuille.py
@@ -408,10 +408,36 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
 the object of interest) and ``L_max >= 1`` (the path window
 depends on ``L_max``). Binary treatment only -- non-binary
 treatment + ``by_path`` is deferred. Also incompatible with
-``controls``, ``trends_linear``, ``trends_nonparam``,
-``heterogeneity``, ``design2``, ``honest_did``, and
-``survey_design`` (each combination raises
-``NotImplementedError`` in the current release).
+``trends_linear``, ``trends_nonparam``, ``heterogeneity``,
+``design2``, ``honest_did``, and ``survey_design`` (each
+combination raises ``NotImplementedError`` in the current
+release).
+
+Compatible with ``controls`` (DID^X residualization) -- the
+per-baseline OLS residualization runs once on first-differenced
+``Y`` BEFORE path enumeration, so per-path point estimates,
+bootstrap SE, per-path placebos, and per-path sup-t bands all
+consume the residualized ``Y_mat`` automatically (Frisch-
+Waugh-Lovell). Per-period effects remain unadjusted, consistent
+with the existing ``controls`` + per-period DID contract.
+
+**Deviation from R on multi-baseline switcher panels:** R
+``did_multiplegt_dyn(..., by_path, controls)`` re-runs the
+per-baseline residualization on each path's restricted
+subsample (path's switchers + same-baseline not-yet-treated
+controls), so its residualization coefficients vary per path
+when switchers have different baseline values. Our global-
+residualization architecture coincides with R on single-
+baseline panels (every switcher shares the same ``D_{g,1}``)
+and per-path point estimates match exactly on the one-
+observation-per-``(g, t)`` regime; on multi-observation-per-
+cell panels the existing DID^X cell-weighting deviation from
+R applies (see ``docs/methodology/REGISTRY.md`` "Note (Phase
+3 DID^X covariate adjustment)"; independent of the by_path
+lift). On multi-baseline switcher panels, point estimates can
+diverge -- a ``UserWarning`` is emitted at fit-time when this
+configuration is detected. SE inherits the cross-path cohort-
+sharing deviation from R documented for ``path_effects``.

 Compatible with ``n_bootstrap > 0`` -- the top-k paths are
 enumerated once on the observed data (paths held fixed across
@@ -985,11 +1011,6 @@ def fit(
         "[F_g - 1, F_g - 1 + L_max] and therefore depends on "
         "the event-study horizon. Set L_max when calling fit()."
     )
-if controls is not None:
-    raise NotImplementedError(
-        "by_path combined with controls (DID^X residualization) "
-        "is deferred to a future release."
-    )
 if trends_linear:
     raise NotImplementedError(
         "by_path combined with trends_linear (DID^{fd}) is "
@@ -1450,9 +1471,14 @@ def fit(
 #
 # When controls are specified, residualize Y_mat by partialling
 # out covariate effects per baseline treatment group. This
-# transforms Y_mat in-place so ALL downstream DID computations
-# (per-period and per-group multi-horizon) automatically produce
-# covariate-adjusted estimates. See Web Appendix Section 1.2.
+# transforms Y_mat so the per-group multi-horizon DID path
+# (event_study_effects, overall_att, joiners/leavers, by_path
+# surfaces, placebos, sup-t bands) automatically produces
+# covariate-adjusted estimates. The per-period DID path
+# (per_period_effects) intentionally remains on raw outcomes:
+# it uses binary joiner/leaver categorization and is not part
+# of the DID^X contract per REGISTRY.md "Note (Phase 3 DID^X
+# covariate adjustment)". See Web Appendix Section 1.2.
 # ------------------------------------------------------------------
 covariate_diagnostics: Optional[Dict[str, Any]] = None
 _switch_metadata_computed = False
@@ -1473,6 +1499,63 @@ def fit(
 )
 _switch_metadata_computed = True

+# by_path + controls residualization-sample deviation from R.
+# R's `did_multiplegt_dyn(..., by_path, controls)` calls
+# `did_multiplegt_main()` once per path with `df_main` filtered
+# to: rows of the path's switchers OR rows where
+# `yet_to_switch=1 AND baseline matches the path's baseline`
+# (R/R/did_multiplegt_dyn.R lines 401-405). Inside the per-path
+# `did_multiplegt_main()` call, the per-baseline first-stage
+# residualization regression uses `(g, t)` cells where g's
+# treatment hasn't changed yet at t. Critically, R's path-
+# restricted subset INCLUDES the pre-switch rows of OTHER-path
+# switchers via the `yet_to_switch=1 AND baseline matches`
+# clause, so the first-stage SAMPLE that R uses for path B
+# equals: pre-switch rows of all switchers with matching
+# baseline + all rows of never-switchers with matching
+# baseline. This is BIT-IDENTICAL to the first-stage sample
+# we use under our global residualization, so first-stage
+# coefficients (and therefore residualized outcomes) coincide,
+# and per-path point estimates match R exactly **under single-
+# baseline switcher panels** (every switcher has the same
+# `D_{g,1}`, regardless of how `F_g` varies across paths or
+# within a path). Empirical confirmation: the
+# `multi_path_reversible_by_path_controls` R-parity scenario
+# has 4 paths with switcher `F_g` values spanning [0..6] under
+# `D_{g,1}=0` for every switcher, and Python matches R to
+# rtol ~1e-11 across all `(path, horizon)` cells.
+#
+# On MULTI-baseline switcher panels the per-baseline regression
+# coefficients diverge per path under R (R's per-path subset
+# for path B drops switchers whose baseline differs from B's
+# baseline), so point estimates can diverge between Python and
+# R; warn the user explicitly. The check filters to switcher
+# groups only (never-switchers do not contribute to "switcher
+# baseline" multiplicity even if they appear at multiple
+# `D_{g,1}` values across the never-treated / always-treated
+# control mix). SE inheritance (cross-path cohort-sharing) is
+# documented separately in REGISTRY.md.
+if self.by_path is not None:
+    _switcher_mask = first_switch_idx_arr >= 0
+    if _switcher_mask.any():
+        _switcher_baselines = baselines[_switcher_mask]
+        if np.unique(_switcher_baselines).size > 1:
+            warnings.warn(
+                "by_path + controls: switcher baselines D_{g,1} "
+                "take multiple values in this panel. Python "
+                "residualizes once on the full panel before path "
+                "enumeration; R `did_multiplegt_dyn(..., by_path, "
+                "controls)` re-runs residualization per path on "
+                "the path-restricted subsample, so per-path point "
+                "estimates can diverge between Python and R on "
+                "this panel. See `docs/methodology/REGISTRY.md` "
+                "(`Note (Phase 3 by_path ...)` -> Per-path "
+                "covariate residualization) for the full "
+                "deviation contract.",
+                UserWarning,
+                stacklevel=2,
+            )
+
 Y_mat_residualized, covariate_diagnostics, _failed_baselines = (
     _compute_covariate_residualization(
         Y_mat=Y_mat,
Loading
Loading
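
For readers who want to try the switcher-baseline check from the last `fit()` hunk outside the estimator, here is a minimal standalone sketch. The function name `has_multiple_switcher_baselines` is hypothetical, and treating a negative first-switch index as "never switches" is an assumption inferred from the `first_switch_idx_arr >= 0` mask in the diff; it is not the library's API.

```python
import numpy as np


def has_multiple_switcher_baselines(first_switch_idx, baselines):
    """Return True when switcher groups start from more than one baseline.

    Assumptions (illustrative, not taken from the library):
    - first_switch_idx[g] is negative for groups that never switch;
    - baselines[g] holds the period-1 treatment D_{g,1} of group g.
    """
    first_switch_idx = np.asarray(first_switch_idx)
    baselines = np.asarray(baselines)

    # Only switchers count: never-switchers may sit at any baseline
    # without making the panel "multi-baseline" in the sense of the diff.
    switcher_mask = first_switch_idx >= 0
    if not switcher_mask.any():
        return False

    return bool(np.unique(baselines[switcher_mask]).size > 1)


# Two switchers at baseline 0 plus an always-treated control at baseline 1:
# a single-baseline switcher panel, so no divergence warning would be needed.
single = has_multiple_switcher_baselines([2, 4, -1], [0, 0, 1])

# Switchers at baselines 0 and 1: the configuration the diff warns about.
multi = has_multiple_switcher_baselines([2, 4, -1], [0, 1, 1])
```

In the diff, a `True` result is what triggers the `UserWarning`; keeping the predicate separate from the warning call makes the condition easy to unit-test on small arrays.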