Skip to content
Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **`ChaisemartinDHaultfoeuille.by_path` is now compatible with `trends_linear` (DID^{fd} group-specific linear trends) and `trends_nonparam` (state-set trends).** For `trends_linear`, the first-differencing transform runs once globally before path enumeration, so per-path raw second-differences `DID^{fd}_{path, l}` surface on `path_effects[path]["horizons"][l]` automatically. Per-path **cumulated level effects** `delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}` (the quantity R returns under `did_multiplegt_dyn(..., by_path, trends_lin)`) surface on the new `results.path_cumulated_event_study[path][l]` field, mirroring the global `linear_trends_effects` cumulation. `to_dataframe(level="by_path")` exposes `cumulated_effect` / `cumulated_se` columns (always present, NaN-when-None — mirrors the `cband_*` convention from PR #374); `summary()` renders a "Cumulated Level Effects (DID^{fd}, trends_linear)" sub-section under each per-path block. SE on the cumulated layer is the conservative upper bound (sum of per-horizon component SEs, NaN-consistent), matching the global `linear_trends_effects` convention. Path enumeration runs on the post-first-differenced `N_mat_fd`: switchers with `F_g==2` fail the window-eligibility check and are dropped from path enumeration entirely (the existing global `F_g >= 3` warning still surfaces the issue), so a path whose switchers all have `F_g < 3` is silently absent from `path_effects` rather than present-with-NaN. Placebo under `trends_linear` returns RAW per-horizon values — there is no per-path placebo cumulation surface in either Python or R. For `trends_nonparam`, the set membership column is validated and stored once globally as `set_ids_arr`; the `set_ids` parameter is now threaded through the four per-path IF helpers (`_compute_path_effects`, `_compute_path_placebos`, `_collect_path_bootstrap_inputs`, `_collect_path_placebo_bootstrap_inputs`) so per-path analytical SE, bootstrap, placebos, and sup-t bands all consume the set-restricted control pool automatically. Per-period effects remain unadjusted under both extensions, consistent with the existing per-period DID contract. Validated against R via two new golden-value scenarios: `single_baseline_multi_path_by_path_trends_lin` (n_periods=13, F_g >= 4, cohort-single-path; per-path cumulated point estimates match R bit-exactly with `POINT_RTOL=1e-9`, cumulated SE within `CUM_SE_RTOL=0.20`) and `multi_path_reversible_by_path_trends_nonparam` (per-path point estimates AND placebos match R bit-exactly with `POINT_RTOL=1e-9`, per-path SE within `SE_RTOL=0.15`). **F_g=3 boundary-case divergence (`by_path + trends_linear`):** `F_g=3` switchers have only 1 valid pre-window Z value after first-differencing, triggering 30%+ relative divergence between Python and R per-path point estimates on paths whose switchers include `F_g=3`. A targeted `UserWarning` fires at fit-time on this regime; R parity is asserted only on the `F_g >= 4` parity fixture. Placebo parity for `trends_linear` is intentionally skipped (R's per-path placebo computation re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture surfaces; placebo + `trends_linear` is exercised via internal regression only). Cross-path cohort-sharing SE deviation from R documented for `path_effects` is inherited unchanged. Gates at `chaisemartin_dhaultfoeuille.py:1014-1023` removed; `by_path` docstring updated to add the two new compatibility paragraphs and remove `trends_linear` / `trends_nonparam` from the incompatible list. R-parity tests at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathTrendsLinear` and `::TestDCDHDynRParityByPathTrendsNonparam`; cross-surface regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathTrendsLinear` and `::TestByPathTrendsNonparam`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path linear-trends DID^{fd}" and "Per-path state-set trends" for the full contract.
- **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg).
- **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`) on the **`design="continuous_at_zero"` (Design 1') surface**. New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall, event-study, placebo, yatchew, trends_lin). The harness explicitly forces `HeterogeneousAdoptionDiD(design="continuous_at_zero")` because R `did_had` always evaluates the local-linear at `d=0` regardless of dose distribution; our default `design="auto"` may legitimately choose `continuous_near_d_lower` or `mass_point` on dose distributions with boundary density bounded away from zero (e.g., Beta(2,2)) and thereby diverge from R numerically — that divergence is methodologically defensible but out of scope for this parity test. Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately — our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention (parity is bit-exact after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up.

Expand Down
174 changes: 174 additions & 0 deletions benchmarks/R/generate_dcdh_dynr_test_values.R
Original file line number Diff line number Diff line change
Expand Up @@ -753,6 +753,180 @@ scenarios$multi_path_reversible_by_path_controls <- list(
results = extract_dcdh_by_path(res16, n_effects = 3)
)

# Scenario 17: single-baseline multi-path + by_path=3 + trends_lin=TRUE
# (Phase 3 Wave 3 #6: by_path + DID^{fd} group-specific linear trends).
# Custom inline single-baseline DGP — `multi_path_reversible` (Scenarios
# 13-16) concentrates each path on 1-2 F_g values, which after R's
# per-path subset + trends_lin's F_g==2 filter collapses to a single F_g,
# violating the dCDH staggered design restriction inside R's per-path
# `did_multiplegt_main` call. `mixed_single_switch` (Scenario 17's first
# attempt) is MULTI-baseline (joiners + leavers), which triggers the
# documented multi-baseline divergence between Python's global-then-
# disaggregate architecture and R's per-path full-pipeline call (same
# divergence pattern as `controls`; rel diffs of 7-19% on point estimates
# observed). To get clean single-baseline parity AND avoid the trends_lin
# F_g concentration trap, this scenario uses a custom DGP: all groups
# start at D=0 (single baseline), 3 paths span F_g ∈ {3,4,5} (all >= 3
# so trends_lin's F_g==2 filter is a no-op), per-path counts unequal so
# top-k ranking is deterministic. **R returns the cumulated level effect
# delta_l per horizon, NOT the raw second-difference DID^{fd}_l** —
# verified empirically against the existing `joiners_only_trends_lin`
# parity test (`tests/test_chaisemartin_dhaultfoeuille_parity.py:403-409`
# documents this convention). The Python parity test compares Python's
# `path_cumulated_event_study[path][l]` against R's per-path Effect_l.
# Placebos under trends_lin remain RAW per-horizon (no cumulated placebo
# surface in R either), so the Python parity test compares
# `path_placebo_event_study[path][-l]` against R's per-path Placebo_l
# directly. Cumulated SE_RTOL is widened (~0.20 vs 0.12 used for
# non-cumulated by_path parity) because the conservative upper-bound SE
# (sum of per-horizon component SEs) compounds the cross-path
# cohort-sharing deviation under summation.
cat(" Scenario 17: single_baseline_multi_path_by_path_trends_lin\n")
{
# Custom DGP: 80 switchers across 3 paths × 2 distinct F_g per path,
# all single-baseline (D_{g,1}=0). Each F_g maps to exactly ONE path
# (cohort-single-path, eliminating the cross-path cohort-sharing
# deviation from R that PR #360 documented for path_effects). All
# F_g >= 3 so trends_lin's F_g==2 filter is a no-op. Two distinct
# F_g per path satisfies R's staggered-design requirement inside the
# per-path `did_multiplegt_main` call. n_periods=11, L_max=3 gives
# F_g in [2,8] = 7 values; we use F_g in {3..8} = 6 values, two per
# path. Plus 20 never-treated + 20 always-treated controls
# (n_realized_groups = 120).
set.seed(117)
n_periods17 <- 13
L_max17 <- 3
target_paths17 <- list(
c(0L, 1L, 1L, 1L), # path 1, sustained on (rank 1)
c(0L, 1L, 1L, 0L), # path 2, on-then-off (rank 2)
c(0L, 1L, 0L, 0L) # path 3, on briefly (rank 3)
)
# F_g-to-path mapping (each F_g unique to one path).
# All F_g >= 4 to avoid the F_g=3 boundary case under trends_lin —
# F_g=3 leaves only 1 valid pre-window Z value after the time==1
# filter, causing R's per-path call (which re-runs the full pipeline
# on the path subset) to handle pre-window/control eligibility
# differently from Python's global first-differencing. F_g >= 4 gives
# both implementations 2+ valid pre-window Z values, eliminating the
# boundary-case divergence.
# F_g=4 -> path 1, F_g=5 -> path 1 (path 1 = 38)
# F_g=6 -> path 2, F_g=7 -> path 2 (path 2 = 24)
# F_g=8 -> path 3, F_g=9 -> path 3 (path 3 = 18)
# Total switchers = 80
fg_path_counts17 <- list(
list(F_g = 4L, path_idx = 1L, count = 20L),
list(F_g = 5L, path_idx = 1L, count = 18L),
list(F_g = 6L, path_idx = 2L, count = 13L),
list(F_g = 7L, path_idx = 2L, count = 11L),
list(F_g = 8L, path_idx = 3L, count = 11L),
list(F_g = 9L, path_idx = 3L, count = 7L)
)
n_switchers17 <- sum(sapply(fg_path_counts17, function(x) x$count))
stopifnot(n_switchers17 == 80L)
D17 <- matrix(0L, nrow = n_switchers17, ncol = n_periods17)
g17 <- 1L
for (entry in fg_path_counts17) {
F_g <- entry$F_g
target <- target_paths17[[entry$path_idx]]
n_here <- entry$count
for (k in seq_len(n_here)) {
# Pre-baseline [1..F_g-2]: D=0 (single-baseline contract)
if (F_g >= 3L) D17[g17, 1:(F_g - 2L)] <- 0L
# Window [F_g-1..F_g-1+L_max]: target path
for (j in 0:L_max17) D17[g17, F_g - 1L + j] <- target[j + 1L]
# Post-window: stable at path[L_max+1]
if (F_g + L_max17 <= n_periods17) {
D17[g17, (F_g + L_max17):n_periods17] <- target[L_max17 + 1L]
}
g17 <- g17 + 1L
}
}
# Append 20 never-treated and 20 always-treated controls
D17 <- rbind(D17,
matrix(0L, nrow = 20L, ncol = n_periods17),
matrix(1L, nrow = 20L, ncol = n_periods17))
n_total17 <- nrow(D17)
# Generate fixed effects, treatment effects, outcomes (mirror gen_reversible
# parameters: group_fe_sd=2.0, treatment_effect=2.0, time_trend=0.1, noise_sd=0.5)
set.seed(117L)
group_fe17 <- rnorm(n_total17, 0, 2.0)
noise17 <- matrix(rnorm(n_total17 * n_periods17, 0, 0.5),
nrow = n_total17, ncol = n_periods17)
period_arr17 <- 0:(n_periods17 - 1L)
Y17 <- 10.0 +
matrix(group_fe17, nrow = n_total17, ncol = n_periods17) +
matrix(0.1 * period_arr17, nrow = n_total17, ncol = n_periods17, byrow = TRUE) +
2.0 * D17 +
noise17
# Build long data frame
d17 <- data.frame(
group = rep(seq_len(n_total17) - 1L, each = n_periods17),
period = rep(period_arr17, n_total17),
treatment = as.vector(t(D17)),
outcome = as.vector(t(Y17))
)
# Inject per-group linear trends (Scenario 11 pattern)
set.seed(217L)
groups17 <- sort(unique(d17$group))
g_trends17 <- setNames(rnorm(length(groups17), 0, 0.5),
as.character(groups17))
d17$outcome <- d17$outcome +
g_trends17[as.character(d17$group)] * d17$period
res17 <- did_multiplegt_dyn(
df = d17, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
trends_lin = TRUE, ci_level = 95
)
scenarios$single_baseline_multi_path_by_path_trends_lin <- list(
data = list(
group = as.numeric(d17$group),
period = as.numeric(d17$period),
treatment = as.numeric(d17$treatment),
outcome = as.numeric(d17$outcome)
),
params = list(pattern = "single_baseline_multi_path",
n_switcher_groups = 80L, n_realized_groups = 120L,
n_periods = 13L, seed = 117L, effects = 3, placebo = 1,
by_path = 3, trends_lin = TRUE, ci_level = 95),
results = extract_dcdh_by_path(res17, n_effects = 3, n_placebos = 1)
)
}

# Scenario 18: multi_path_reversible + by_path=3 + trends_nonparam="state"
# (Phase 3 Wave 3 #7: by_path + state-set trends). Same deterministic DGP
# and n_periods=10 as Scenarios 16/17, with a 3-state column added
# (deterministic per-group assignment via `((group - 1) %% 3) + 1` so
# within-set controls are guaranteed to exist for each path). **R does
# NOT cumulate or first-difference under trends_nonparam** — Effect_l
# per horizon is a normal DID with set-restricted control pool. The
# Python parity test compares per-path raw DID per (path, l) directly
# against R's per-path Effect_l. Placebos likewise are raw per-horizon.
# Per-path R parity matches exactly on single-baseline panels.
cat(" Scenario 18: multi_path_reversible_by_path_trends_nonparam\n")
d18 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
pattern = "multi_path_reversible", seed = 118,
L_max = 3)
d18$state <- ((d18$group - 1) %% 3) + 1
res18 <- did_multiplegt_dyn(
df = d18, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
trends_nonparam = "state", ci_level = 95
)
scenarios$multi_path_reversible_by_path_trends_nonparam <- list(
data = list(
group = as.numeric(d18$group),
period = as.numeric(d18$period),
treatment = as.numeric(d18$treatment),
outcome = as.numeric(d18$outcome),
state = as.numeric(d18$state)
),
params = list(pattern = "multi_path_reversible",
n_switcher_groups = N_GOLDEN, n_realized_groups = N_GOLDEN + 40L,
n_periods = 10, seed = 118, effects = 3, placebo = 1,
by_path = 3, trends_nonparam = "state", ci_level = 95),
results = extract_dcdh_by_path(res18, n_effects = 3, n_placebos = 1)
)

# ---------------------------------------------------------------------------
# Write output
# ---------------------------------------------------------------------------
Expand Down
Loading
Loading