15 commits
0f98f19
Restore SDID survey support for placebo and jackknife variance methods
igerber Apr 24, 2026
96c2de3
Coverage MC extension + REGISTRY Notes + docs sweep for SDID survey (…
igerber Apr 24, 2026
c2b97e0
Address PR #365 R1 P0 + P1: NaN on undefined jackknife replicate; blo…
igerber Apr 24, 2026
ddb77b2
Address PR #365 R2 P1 + P3: route PSU/FPC-only placebo through survey…
igerber Apr 24, 2026
473c6d7
Address PR #365 R3 P3 docs nits: bootstrap-only stragglers + Case B/C…
igerber Apr 24, 2026
f039e2f
Address PR #365 R4 P1 + P3: Case D guard for exact-count placebo stra…
igerber Apr 24, 2026
ffd2e50
Address PR #365 R5 P1 + P3: zero-variance vs NaN; lonely_psu contract…
igerber Apr 24, 2026
6d27a57
Address PR #365 R6 P1 + P3: full-census stratum short-circuit + docs …
igerber Apr 24, 2026
0bcda79
Address PR #365 R7 P3: gate _loo_unit_ids on unit-granularity; refres…
igerber Apr 24, 2026
cdb42fe
Address PR #365 R8 P1: drop FPC from placebo dispatch + document FPC …
igerber Apr 24, 2026
3399a71
Address PR #365 R9 P1 + P3: Case E weight-aware placebo guard + per-d…
igerber Apr 25, 2026
312f78f
Address PR #365 R10 P1 + P3: gate implicit-PSU FPC validator on boots…
igerber Apr 25, 2026
a17c8a0
Address PR #365 R11 P1: drop FPC pre-resolve on placebo + Case D effe…
igerber Apr 25, 2026
fbdba34
Address PR #365 R12 P1: distinguish lonely_psu='certainty' from 'remo…
igerber Apr 25, 2026
087fc94
Address PR #365 R13 P3: validate FPC column existence on placebo pre-…
igerber Apr 25, 2026
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -37,7 +37,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **SDID `variance_method="bootstrap"` survey support restored** via a hybrid pairs-bootstrap + Rao-Wu rescaling composed with a weighted Frank-Wolfe kernel. Each bootstrap draw first performs the unit-level pairs-bootstrap resampling specified by Arkhangelsky et al. (2021) Algorithm 2 (`boot_idx = rng.choice(n_total)`), and *then* applies Rao-Wu rescaled per-unit weights (Rao & Wu 1988) sliced over the resampled units — NOT a standalone Rao-Wu bootstrap. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; weighted column-centering + row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` survey branch composes the per-draw `rw` (Rao-Wu rescaling for full designs, constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests, zero-treated-mass retry, deterministic Rao-Wu × boot_idx slice) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion). See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full objective and the argmin-set caveat.
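The weighted objective and the post-optimization composition quoted in this entry can be illustrated with a few lines of NumPy. This is a minimal sketch, not the library's Rust kernel: the function names `weighted_fw_objective` and `compose_effective_weights` are hypothetical, the toy arrays are invented for illustration, and the zero-mass handling is an assumption (the real bootstrap path's retry behavior is not reproduced here).

```python
import numpy as np


def weighted_fw_objective(A, b, omega, zeta, reg_w):
    """Evaluate ||A @ omega - b||^2 + zeta^2 * sum_j reg_w[j] * omega[j]^2."""
    resid = A @ omega - b
    return float(resid @ resid + zeta**2 * np.sum(reg_w * omega**2))


def compose_effective_weights(omega, rw):
    """Return omega_eff = rw * omega / sum(rw * omega)."""
    omega = np.asarray(omega, dtype=float)
    rw = np.asarray(rw, dtype=float)
    mass = float(np.sum(rw * omega))
    if mass <= 0.0:
        # Assumption for this sketch only: treat zero effective mass as an error.
        raise ValueError("rw * omega has zero total mass")
    return rw * omega / mass


# Toy inputs (illustrative only): three controls, two pre-period rows.
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
omega = np.array([0.5, 0.3, 0.2])
rw = np.array([1.0, 2.0, 0.0])  # per-control survey weights for one draw

objective_uniform = weighted_fw_objective(A, b, omega, zeta=1.0, reg_w=np.ones(3))
omega_eff = compose_effective_weights(omega, rw)
```

With uniform `reg_w` the penalty reduces to the ordinary ridge term, and a control with `rw = 0` drops out of `omega_eff` while the remaining weights are renormalized to sum to one.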

### Changed (PR #355)
- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).
- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` on the v3.2.x line; full-design support for those methods lands separately in the entries below.

### Added
- **SDID `variance_method="placebo"` and `"jackknife"` now support strata/PSU/FPC designs.** Closes the last SDID survey gap. All three variance methods (bootstrap from PR #355, plus placebo and jackknife here) now handle full survey designs. New private methods `SyntheticDiD._placebo_variance_se_survey` and `_jackknife_se_survey` route the full-design path through method-specific allocators:
- **Placebo** — stratified permutation (Pesarin 2001). Each draw samples pseudo-treated indices uniformly without replacement from controls *within each stratum* containing actual treated units; non-treated strata contribute their controls unconditionally. The weighted Frank-Wolfe kernel from PR #355 (`compute_sdid_unit_weights_survey` / `compute_time_weights_survey`) re-estimates ω and λ per draw with per-control survey weights threaded into both loss and regularization; post-optimization composition `ω_eff = rw·ω/Σ(rw·ω)`. Arkhangelsky Algorithm 4 SE formula unchanged.
- **Jackknife** — PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` with `f_h = n_h_sampled / fpc[h]` (population-count FPC form). λ held fixed across LOOs; ω subsetted, composed with rw, renormalized. Strata with `n_h < 2` silently skipped (matches R `survey::svyjkn` with `lonely_psu="remove"` / `"certainty"`; `"adjust"` raises `NotImplementedError`). Full-census strata (`f_h ≥ 1`) short-circuit to zero contribution before any LOO feasibility check. `SE = 0` is returned for legitimate zero variance (e.g., every stratum full-census); `SE = NaN` with a targeted `UserWarning` is reserved for undefined cases — all strata skipped, or any delete-one replicate in a non-full-census contributing stratum is undefined (all-treated-in-one-PSU LOO, kept ω_eff / w_treated mass zero, estimator raises). Unstratified single-PSU short-circuits to NaN.
- **Fit-time feasibility guards** (placebo): `ValueError` on stratum-level infeasibility with targeted messages distinguishing three cases — **Case B** (treated-containing stratum has zero controls), **Case C** (fewer controls than treated in a treated stratum), **Case D** (every treated stratum is exact-count `n_c_h == n_t_h` → permutation support is 1, null distribution collapses). Partial-permutation fallback rejected because it would silently change the null-distribution semantics.
- **Gate relaxed**: the fit-time guard at `synthetic_did.py:352-369` that rejected placebo/jackknife + strata/PSU/FPC is removed. Replicate-weight designs remain rejected (separate methodology: replicate variance is closed-form and would double-count with Rao-Wu-like rescaling). Non-survey and pweight-only paths are bit-identical by construction, because the new code is gated on `resolved_survey_unit.(strata|psu|fpc) is not None`.
- **Coverage MC**: `benchmarks/data/sdid_coverage.json` extended with jackknife on `stratified_survey`. Bootstrap validates near-nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife reported with an anti-conservatism caveat: with only 2 PSUs per stratum the stratified jackknife formula has 1 effective DoF per stratum, a well-documented limitation of Rust & Rao (1996) — `se_over_truesd ≈ 0.46` on this DGP. Users needing tight SE calibration with few PSUs should prefer `variance_method="bootstrap"`. Placebo is structurally infeasible on the existing `stratified_survey` DGP (its cohort packs into one stratum with 0 never-treated units — by design a bootstrap-suited DGP); the placebo survey path is exercised via unit tests on a feasible fixture.
- **Regression tests** across `tests/test_survey_phase5.py`: two new classes `TestSDIDSurveyPlaceboFullDesign` and `TestSDIDSurveyJackknifeFullDesign`. Placebo: pseudo-treated-stratum contract, Case B / Case C front-door guards with targeted-message regression, SE-differs-from-pweight-only, deterministic dispatch. Jackknife: stratum-aggregation self-consistency, **FPC magnitude regression** (2-stratum handcrafted panel asserts `SE_fpc == SE_nofpc · sqrt(1-f)` at `rtol=1e-10`), single-PSU-stratum skip, unstratified short-circuit, all-strata-skipped warning + NaN, SE-differs-from-pweight-only, deterministic dispatch. Existing `test_full_design_placebo_raises` and `test_full_design_jackknife_raises` flipped to `_succeeds` assertions. All 19 existing pweight-only and non-survey placebo/jackknife tests pass unchanged (bit-identity preserved via the new-path gating).
- **Allocator asymmetry** (documented in REGISTRY): placebo ignores the PSU axis (unit-level within-stratum permutation — the classical stratified permutation test; PSU-level permutation on few PSUs is near-degenerate); jackknife respects PSU (PSU-level LOO is the canonical survey jackknife). Both respect strata. See `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + placebo composition)` and `Note (survey + jackknife composition)`.
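The placebo allocator and its fit-time guards described in the entries above can be sketched as follows. This is an illustration under stated assumptions, not the code in `SyntheticDiD._placebo_variance_se_survey`: the helper names `check_placebo_feasibility` and `stratified_placebo_draw`, the error-message wording, and the toy arrays are all hypothetical, and only Cases B, C, and D from the changelog text are modeled.

```python
import numpy as np


def check_placebo_feasibility(strata, treated):
    """Raise ValueError for the stratum-level infeasibility cases B, C, D."""
    strata = np.asarray(strata)
    treated = np.asarray(treated, dtype=bool)
    exact_count = []
    for h in np.unique(strata[treated]):  # only strata that contain treated units
        in_h = strata == h
        n_t = int(np.sum(in_h & treated))
        n_c = int(np.sum(in_h & ~treated))
        if n_c == 0:
            raise ValueError(f"Case B: stratum {h} has treated units but zero controls")
        if n_c < n_t:
            raise ValueError(f"Case C: stratum {h} has {n_c} controls < {n_t} treated")
        exact_count.append(n_c == n_t)
    if exact_count and all(exact_count):
        raise ValueError("Case D: every treated stratum has n_c == n_t (one permutation)")


def stratified_placebo_draw(strata, treated, rng):
    """Sample pseudo-treated control indices without replacement within each treated stratum."""
    strata = np.asarray(strata)
    treated = np.asarray(treated, dtype=bool)
    check_placebo_feasibility(strata, treated)
    pseudo = []
    for h in np.unique(strata[treated]):
        controls_h = np.flatnonzero((strata == h) & ~treated)
        n_t = int(np.sum((strata == h) & treated))
        pseudo.extend(rng.choice(controls_h, size=n_t, replace=False))
    return np.sort(np.array(pseudo, dtype=int))


# Toy design: stratum 0 holds the one treated unit plus three controls;
# stratum 1 is controls only and is never sampled from.
strata = np.array([0, 0, 0, 0, 1, 1, 1])
treated = np.array([True, False, False, False, False, False, False])
pseudo_idx = stratified_placebo_draw(strata, treated, np.random.default_rng(0))

try:
    check_placebo_feasibility([0, 0, 1, 1], [True, True, False, False])
    case_b_message = ""
except ValueError as exc:
    case_b_message = str(exc)
```

Controls in strata without treated units never become pseudo-treated, which matches the "contribute their controls unconditionally" wording; the per-draw re-estimation of ω and λ with the weighted Frank-Wolfe helpers is outside this sketch.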

## [3.2.0] - 2026-04-19

1 change: 0 additions & 1 deletion TODO.md
@@ -107,7 +107,6 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
| **SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. | `synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se` | follow-up | Low |
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |

#### Performance
18 changes: 9 additions & 9 deletions benchmarks/data/sdid_coverage.json
@@ -4,7 +4,7 @@
"n_bootstrap": 200,
"library_version": "3.2.0",
"backend": "rust",
"generated_at": "2026-04-24T13:01:54.876774+00:00",
"generated_at": "2026-04-24T21:08:20.185764+00:00",
"total_elapsed_sec": 2420.61,
"methods": [
"placebo",
@@ -156,17 +156,17 @@
"se_over_truesd": 1.1297002530566618
},
"jackknife": {
"n_successful_fits": 0,
"n_successful_fits": 500,
"rejection_rate": {
"0.01": null,
"0.05": null,
"0.10": null
"0.01": 0.358,
"0.05": 0.45,
"0.10": 0.512
},
"mean_se": null,
"true_sd_tau_hat": null,
"se_over_truesd": null
"mean_se": 0.20686834633234263,
"true_sd_tau_hat": 0.4512243070193919,
"se_over_truesd": 0.4584601119980272
},
"_elapsed_sec": 16.48
"_elapsed_sec": 18.62
}
}
}
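The jackknife cell above can be read against the stratified aggregation formula from the CHANGELOG entry, `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²`. The sketch below is a simplified stand-in for `_jackknife_se_survey`, not its actual code: the function name `stratified_jackknife_se`, the dict-based inputs, and the toy replicate values are hypothetical, and checking `n_h < 2` before the full-census test is an assumption of this sketch. The last line recomputes `se_over_truesd` from the `mean_se` and `true_sd_tau_hat` values in the JSON.

```python
import math

import numpy as np


def stratified_jackknife_se(replicates, fpc=None):
    """Aggregate delete-one-PSU estimates per stratum into a jackknife SE.

    replicates: dict mapping stratum -> delete-one estimates (one per PSU).
    fpc: optional dict mapping stratum -> population PSU count.
    """
    var = 0.0
    any_contribution = False
    for h, tau in replicates.items():
        tau = np.asarray(tau, dtype=float)
        n_h = tau.size
        if n_h < 2:
            continue  # lonely-PSU stratum is skipped
        f_h = n_h / fpc[h] if fpc is not None else 0.0
        any_contribution = True
        if f_h >= 1.0:
            continue  # full-census stratum contributes zero variance
        if not np.all(np.isfinite(tau)):
            return math.nan  # an undefined replicate makes the SE undefined
        var += (1.0 - f_h) * (n_h - 1) / n_h * float(np.sum((tau - tau.mean()) ** 2))
    return math.sqrt(var) if any_contribution else math.nan


# Toy replicates: 2 strata x 2 PSUs, so each stratum has n_h - 1 = 1 DoF.
replicates = {"s1": [1.0, 1.4], "s2": [0.9, 1.1]}
se_nofpc = stratified_jackknife_se(replicates)
se_fpc = stratified_jackknife_se(replicates, fpc={"s1": 4, "s2": 4})
se_census = stratified_jackknife_se(replicates, fpc={"s1": 2, "s2": 2})
se_lonely = stratified_jackknife_se({"s1": [1.0]})

# Ratio of mean estimated SE to empirical SD, using the JSON values above.
ratio = 0.20686834633234263 / 0.4512243070193919
```

With a common sampling fraction `f` in every stratum, the FPC version equals the no-FPC version times `sqrt(1 - f)`, which is the property the FPC magnitude regression test asserts. The all-full-census call returns 0 and the all-strata-skipped call returns NaN, mirroring the zero-variance versus undefined distinction in the changelog.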
50 changes: 39 additions & 11 deletions benchmarks/python/coverage_sdid.py
@@ -8,10 +8,18 @@
rates at α ∈ {0.01, 0.05, 0.10} plus the ratio of mean estimated SE to
the empirical sampling SD of τ̂.

The ``stratified_survey`` DGP is bootstrap-only — placebo and jackknife
still reject full strata/PSU/FPC survey designs (tracked in ``TODO.md``),
so the harness skips those method × DGP cells via the per-DGP
``survey_design_factory`` in the ``DGPSpec`` registry (PR #352 R5 P3).
The ``stratified_survey`` DGP runs bootstrap and jackknife; placebo is
skipped because its cohort packs all treated units into stratum 1,
which has 0 never-treated units, so the stratified-permutation
allocator is structurally infeasible on this DGP (it raises Case B,
a treated-containing stratum with zero controls, at fit-time).
Jackknife is reported with a documented anti-conservatism caveat: with
only 2 PSUs per stratum, the stratified PSU-level jackknife formula has
1 effective DoF per stratum, a known limitation (see REGISTRY
§SyntheticDiD "Note (survey + jackknife composition)"). The harness
skips unsupported method × DGP cells via the per-DGP
``survey_design_factory`` in the ``DGPSpec`` registry.

The output JSON underwrites the calibration table in
``docs/methodology/REGISTRY.md`` §SyntheticDiD, including the
@@ -227,13 +235,29 @@ def _stratified_survey_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]:
def _stratified_survey_design(df: pd.DataFrame) -> Tuple[Any, Tuple[str, ...]]:
"""Build the SurveyDesign for the stratified_survey DGP.

Methods supported: bootstrap only — placebo / jackknife reject
strata/PSU/FPC at fit-time (separate methodology gap).
Methods supported on this DGP:
* **bootstrap** — weighted-FW + Rao-Wu (PR #355). Calibration
validated here.
* **jackknife** — PSU-level LOO with stratum aggregation (Rust &
Rao 1996). Reported here with a known anti-conservatism caveat:
with ``psu_per_stratum=2``, within-stratum jackknife has only
``n_h - 1 = 1`` effective DoF per stratum, which is a well-
documented limitation of the stratified jackknife formula when
PSU counts are low. The reported ``se_over_truesd`` is expected
to land below 1; this is not a bug — users needing tight SE
calibration with few PSUs should prefer ``bootstrap``.
* **placebo** — NOT supported on this DGP: the treated cohort packs
into stratum 1 (which has 0 never-treated units by construction),
so the stratified-permutation allocator raises Case B (zero
controls in a treated-containing stratum) at fit-time. This is a
property of the DGP, not of the placebo allocator; the placebo
survey method is exercised by
``tests/test_survey_phase5.py::TestSDIDSurveyPlaceboFullDesign``.
"""
from diff_diff import SurveyDesign
return (
SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc"),
("bootstrap",),
("bootstrap", "jackknife"),
)


@@ -273,10 +297,14 @@ def _fit_one(
"""Fit SDID and return (att, se, p_value); (None, None, None) on failure.

For survey DGPs the harness passes a SurveyDesign via ``survey_design``;
fit() routes it through the bootstrap survey path (PR #352) when
method=='bootstrap'. The DGP's ``survey_design_factory`` declares which
methods are supported, so the caller skips unsupported methods entirely
rather than catching the resulting NotImplementedError here.
``fit()`` routes strata/PSU/FPC designs through the method-specific
survey variance path — bootstrap (PR #355 weighted-FW + Rao-Wu),
placebo (stratified permutation + weighted-FW), or jackknife (PSU-
level LOO with stratum aggregation). The DGP's
``survey_design_factory`` declares which methods are supported on
that specific DGP, so the caller skips unsupported methods entirely
rather than catching the resulting NotImplementedError / Case B-D
ValueError here.
"""
try:
with warnings.catch_warnings():
4 changes: 2 additions & 2 deletions diff_diff/guides/llms-full.txt
@@ -1032,7 +1032,7 @@ Returned by `SyntheticDiD.fit()`.

**Validation diagnostics** (call after `fit()`):
- `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units
- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"`; raises `ValueError` if LOO is unavailable (see the method docstring for the full set of conditions, e.g. single treated unit or only one control with nonzero effective weight)
- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.placebo_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
- `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design
- `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega

@@ -1674,7 +1674,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
**Key features:**
- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #352): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo and jackknife paths still reject strata/PSU/FPC (separate methodology gap, tracked in TODO.md)
- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs
- DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
- Repeated cross-sections: `CallawaySantAnna(panel=False)`
- Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section