igerber · igerber · Apr 26, 2026 · Apr 26, 2026 · Apr 26, 2026 · Apr 26, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`ChaisemartinDHaultfoeuille.by_path` is now compatible with `trends_linear` (DID^{fd} group-specific linear trends) and `trends_nonparam` (state-set trends).** For `trends_linear`, the first-differencing transform runs once globally before path enumeration, so per-path raw second-differences `DID^{fd}_{path, l}` surface on `path_effects[path]["horizons"][l]` automatically. Per-path **cumulated level effects** `delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}` (the quantity R returns under `did_multiplegt_dyn(..., by_path, trends_lin)`) surface on the new `results.path_cumulated_event_study[path][l]` field, mirroring the global `linear_trends_effects` cumulation. `to_dataframe(level="by_path")` exposes `cumulated_effect` / `cumulated_se` columns (always present, NaN-when-None — mirrors the `cband_*` convention from PR #374); `summary()` renders a "Cumulated Level Effects (DID^{fd}, trends_linear)" sub-section under each per-path block. SE on the cumulated layer is the conservative upper bound (sum of per-horizon component SEs, NaN-consistent), matching the global `linear_trends_effects` convention. Path enumeration runs on the post-first-differenced `N_mat_fd`: switchers with `F_g==2` fail the window-eligibility check and are dropped from path enumeration entirely (the existing global `F_g >= 3` warning still surfaces the issue), so a path whose switchers all have `F_g < 3` is silently absent from `path_effects` rather than present-with-NaN. Placebo under `trends_linear` returns RAW per-horizon values — there is no per-path placebo cumulation surface in either Python or R. For `trends_nonparam`, the set membership column is validated and stored once globally as `set_ids_arr`; the `set_ids` parameter is now threaded through the four per-path IF helpers (`_compute_path_effects`, `_compute_path_placebos`, `_collect_path_bootstrap_inputs`, `_collect_path_placebo_bootstrap_inputs`) so per-path analytical SE, bootstrap, placebos, and sup-t bands all consume the set-restricted control pool automatically. Per-period effects remain unadjusted under both extensions, consistent with the existing per-period DID contract. Validated against R via two new golden-value scenarios: `single_baseline_multi_path_by_path_trends_lin` (n_periods=13, F_g >= 4, cohort-single-path; per-path cumulated point estimates match R bit-exactly with `POINT_RTOL=1e-9`, cumulated SE within `CUM_SE_RTOL=0.20`) and `multi_path_reversible_by_path_trends_nonparam` (per-path point estimates AND placebos match R bit-exactly with `POINT_RTOL=1e-9`, per-path SE within `SE_RTOL=0.15`). **F_g=3 boundary-case divergence (`by_path + trends_linear`):** `F_g=3` switchers have only 1 valid pre-window Z value after first-differencing, triggering 30%+ relative divergence between Python and R per-path point estimates on paths whose switchers include `F_g=3`. A targeted `UserWarning` fires at fit-time on this regime; R parity is asserted only on the `F_g >= 4` parity fixture. Placebo parity for `trends_linear` is intentionally skipped (R's per-path placebo computation re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture surfaces; placebo + `trends_linear` is exercised via internal regression only). Cross-path cohort-sharing SE deviation from R documented for `path_effects` is inherited unchanged. Gates at `chaisemartin_dhaultfoeuille.py:1014-1023` removed; `by_path` docstring updated to add the two new compatibility paragraphs and remove `trends_linear` / `trends_nonparam` from the incompatible list. R-parity tests at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathTrendsLinear` and `::TestDCDHDynRParityByPathTrendsNonparam`; cross-surface regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathTrendsLinear` and `::TestByPathTrendsNonparam`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path linear-trends DID^{fd}" and "Per-path state-set trends" for the full contract.
 - **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg).
 - **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`) on the **`design="continuous_at_zero"` (Design 1') surface**. New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall, event-study, placebo, yatchew, trends_lin). The harness explicitly forces `HeterogeneousAdoptionDiD(design="continuous_at_zero")` because R `did_had` always evaluates the local-linear at `d=0` regardless of dose distribution; our default `design="auto"` may legitimately choose `continuous_near_d_lower` or `mass_point` on dose distributions with boundary density bounded away from zero (e.g., Beta(2,2)) and thereby diverge from R numerically — that divergence is methodologically defensible but out of scope for this parity test. Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately — our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention (parity is bit-exact after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up.
 

diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -753,6 +753,180 @@ scenarios$multi_path_reversible_by_path_controls <- list(
   results = extract_dcdh_by_path(res16, n_effects = 3)
 )
 
+# Scenario 17: single-baseline multi-path + by_path=3 + trends_lin=TRUE
+# (Phase 3 Wave 3 #6: by_path + DID^{fd} group-specific linear trends).
+# Custom inline single-baseline DGP — `multi_path_reversible` (Scenarios
+# 13-16) concentrates each path on 1-2 F_g values, which after R's
+# per-path subset + trends_lin's F_g==2 filter collapses to a single F_g,
+# violating the dCDH staggered design restriction inside R's per-path
+# `did_multiplegt_main` call. `mixed_single_switch` (Scenario 17's first
+# attempt) is MULTI-baseline (joiners + leavers), which triggers the
+# documented multi-baseline divergence between Python's global-then-
+# disaggregate architecture and R's per-path full-pipeline call (same
+# divergence pattern as `controls`; rel diffs of 7-19% on point estimates
+# observed). To get clean single-baseline parity AND avoid the trends_lin
+# F_g concentration trap, this scenario uses a custom DGP: all groups
+# start at D=0 (single baseline), 3 paths span F_g ∈ {3,4,5} (all >= 3
+# so trends_lin's F_g==2 filter is a no-op), per-path counts unequal so
+# top-k ranking is deterministic. **R returns the cumulated level effect
+# delta_l per horizon, NOT the raw second-difference DID^{fd}_l** —
+# verified empirically against the existing `joiners_only_trends_lin`
+# parity test (`tests/test_chaisemartin_dhaultfoeuille_parity.py:403-409`
+# documents this convention). The Python parity test compares Python's
+# `path_cumulated_event_study[path][l]` against R's per-path Effect_l.
+# Placebos under trends_lin remain RAW per-horizon (no cumulated placebo
+# surface in R either), so the Python parity test compares
+# `path_placebo_event_study[path][-l]` against R's per-path Placebo_l
+# directly. Cumulated SE_RTOL is widened (~0.20 vs 0.12 used for
+# non-cumulated by_path parity) because the conservative upper-bound SE
+# (sum of per-horizon component SEs) compounds the cross-path
+# cohort-sharing deviation under summation.
+cat("  Scenario 17: single_baseline_multi_path_by_path_trends_lin\n")
+{
+  # Custom DGP: 80 switchers across 3 paths × 2 distinct F_g per path,
+  # all single-baseline (D_{g,1}=0). Each F_g maps to exactly ONE path
+  # (cohort-single-path, eliminating the cross-path cohort-sharing
+  # deviation from R that PR #360 documented for path_effects). All
+  # F_g >= 3 so trends_lin's F_g==2 filter is a no-op. Two distinct
+  # F_g per path satisfies R's staggered-design requirement inside the
+  # per-path `did_multiplegt_main` call. n_periods=11, L_max=3 gives
+  # F_g in [2,8] = 7 values; we use F_g in {3..8} = 6 values, two per
+  # path. Plus 20 never-treated + 20 always-treated controls
+  # (n_realized_groups = 120).
+  set.seed(117)
+  n_periods17 <- 13
+  L_max17 <- 3
+  target_paths17 <- list(
+    c(0L, 1L, 1L, 1L),  # path 1, sustained on (rank 1)
+    c(0L, 1L, 1L, 0L),  # path 2, on-then-off  (rank 2)
+    c(0L, 1L, 0L, 0L)   # path 3, on briefly   (rank 3)
+  )
+  # F_g-to-path mapping (each F_g unique to one path).
+  # All F_g >= 4 to avoid the F_g=3 boundary case under trends_lin —
+  # F_g=3 leaves only 1 valid pre-window Z value after the time==1
+  # filter, causing R's per-path call (which re-runs the full pipeline
+  # on the path subset) to handle pre-window/control eligibility
+  # differently from Python's global first-differencing. F_g >= 4 gives
+  # both implementations 2+ valid pre-window Z values, eliminating the
+  # boundary-case divergence.
+  #   F_g=4 -> path 1, F_g=5 -> path 1   (path 1 = 38)
+  #   F_g=6 -> path 2, F_g=7 -> path 2   (path 2 = 24)
+  #   F_g=8 -> path 3, F_g=9 -> path 3   (path 3 = 18)
+  # Total switchers = 80
+  fg_path_counts17 <- list(
+    list(F_g = 4L, path_idx = 1L, count = 20L),
+    list(F_g = 5L, path_idx = 1L, count = 18L),
+    list(F_g = 6L, path_idx = 2L, count = 13L),
+    list(F_g = 7L, path_idx = 2L, count = 11L),
+    list(F_g = 8L, path_idx = 3L, count = 11L),
+    list(F_g = 9L, path_idx = 3L, count = 7L)
+  )
+  n_switchers17 <- sum(sapply(fg_path_counts17, function(x) x$count))
+  stopifnot(n_switchers17 == 80L)
+  D17 <- matrix(0L, nrow = n_switchers17, ncol = n_periods17)
+  g17 <- 1L
+  for (entry in fg_path_counts17) {
+    F_g <- entry$F_g
+    target <- target_paths17[[entry$path_idx]]
+    n_here <- entry$count
+    for (k in seq_len(n_here)) {
+      # Pre-baseline [1..F_g-2]: D=0 (single-baseline contract)
+      if (F_g >= 3L) D17[g17, 1:(F_g - 2L)] <- 0L
+      # Window [F_g-1..F_g-1+L_max]: target path
+      for (j in 0:L_max17) D17[g17, F_g - 1L + j] <- target[j + 1L]
+      # Post-window: stable at path[L_max+1]
+      if (F_g + L_max17 <= n_periods17) {
+        D17[g17, (F_g + L_max17):n_periods17] <- target[L_max17 + 1L]
+      }
+      g17 <- g17 + 1L
+    }
+  }
+  # Append 20 never-treated and 20 always-treated controls
+  D17 <- rbind(D17,
+               matrix(0L, nrow = 20L, ncol = n_periods17),
+               matrix(1L, nrow = 20L, ncol = n_periods17))
+  n_total17 <- nrow(D17)
+  # Generate fixed effects, treatment effects, outcomes (mirror gen_reversible
+  # parameters: group_fe_sd=2.0, treatment_effect=2.0, time_trend=0.1, noise_sd=0.5)
+  set.seed(117L)
+  group_fe17 <- rnorm(n_total17, 0, 2.0)
+  noise17 <- matrix(rnorm(n_total17 * n_periods17, 0, 0.5),
+                    nrow = n_total17, ncol = n_periods17)
+  period_arr17 <- 0:(n_periods17 - 1L)
+  Y17 <- 10.0 +
+    matrix(group_fe17, nrow = n_total17, ncol = n_periods17) +
+    matrix(0.1 * period_arr17, nrow = n_total17, ncol = n_periods17, byrow = TRUE) +
+    2.0 * D17 +
+    noise17
+  # Build long data frame
+  d17 <- data.frame(
+    group = rep(seq_len(n_total17) - 1L, each = n_periods17),
+    period = rep(period_arr17, n_total17),
+    treatment = as.vector(t(D17)),
+    outcome = as.vector(t(Y17))
+  )
+  # Inject per-group linear trends (Scenario 11 pattern)
+  set.seed(217L)
+  groups17 <- sort(unique(d17$group))
+  g_trends17 <- setNames(rnorm(length(groups17), 0, 0.5),
+                         as.character(groups17))
+  d17$outcome <- d17$outcome +
+    g_trends17[as.character(d17$group)] * d17$period
+  res17 <- did_multiplegt_dyn(
+    df = d17, outcome = "outcome", group = "group", time = "period",
+    treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
+    trends_lin = TRUE, ci_level = 95
+  )
+  scenarios$single_baseline_multi_path_by_path_trends_lin <- list(
+    data = list(
+      group = as.numeric(d17$group),
+      period = as.numeric(d17$period),
+      treatment = as.numeric(d17$treatment),
+      outcome = as.numeric(d17$outcome)
+    ),
+    params = list(pattern = "single_baseline_multi_path",
+                  n_switcher_groups = 80L, n_realized_groups = 120L,
+                  n_periods = 13L, seed = 117L, effects = 3, placebo = 1,
+                  by_path = 3, trends_lin = TRUE, ci_level = 95),
+    results = extract_dcdh_by_path(res17, n_effects = 3, n_placebos = 1)
+  )
+}
+
+# Scenario 18: multi_path_reversible + by_path=3 + trends_nonparam="state"
+# (Phase 3 Wave 3 #7: by_path + state-set trends). Same deterministic DGP
+# and n_periods=10 as Scenarios 16/17, with a 3-state column added
+# (deterministic per-group assignment via `((group - 1) %% 3) + 1` so
+# within-set controls are guaranteed to exist for each path). **R does
+# NOT cumulate or first-difference under trends_nonparam** — Effect_l
+# per horizon is a normal DID with set-restricted control pool. The
+# Python parity test compares per-path raw DID per (path, l) directly
+# against R's per-path Effect_l. Placebos likewise are raw per-horizon.
+# Per-path R parity matches exactly on single-baseline panels.
+cat("  Scenario 18: multi_path_reversible_by_path_trends_nonparam\n")
+d18 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
+                      pattern = "multi_path_reversible", seed = 118,
+                      L_max = 3)
+d18$state <- ((d18$group - 1) %% 3) + 1
+res18 <- did_multiplegt_dyn(
+  df = d18, outcome = "outcome", group = "group", time = "period",
+  treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
+  trends_nonparam = "state", ci_level = 95
+)
+scenarios$multi_path_reversible_by_path_trends_nonparam <- list(
+  data = list(
+    group = as.numeric(d18$group),
+    period = as.numeric(d18$period),
+    treatment = as.numeric(d18$treatment),
+    outcome = as.numeric(d18$outcome),
+    state = as.numeric(d18$state)
+  ),
+  params = list(pattern = "multi_path_reversible",
+                n_switcher_groups = N_GOLDEN, n_realized_groups = N_GOLDEN + 40L,
+                n_periods = 10, seed = 118, effects = 3, placebo = 1,
+                by_path = 3, trends_nonparam = "state", ci_level = 95),
+  results = extract_dcdh_by_path(res18, n_effects = 3, n_placebos = 1)
+)
+
 # ---------------------------------------------------------------------------
 # Write output
 # ---------------------------------------------------------------------------