diff --git a/CHANGELOG.md b/CHANGELOG.md index dafbebc2..5f72a884 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -37,7 +37,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **SDID `variance_method="bootstrap"` survey support restored** via a hybrid pairs-bootstrap + Rao-Wu rescaling composed with a weighted Frank-Wolfe kernel. Each bootstrap draw first performs the unit-level pairs-bootstrap resampling specified by Arkhangelsky et al. (2021) Algorithm 2 (`boot_idx = rng.choice(n_total)`), and *then* applies Rao-Wu rescaled per-unit weights (Rao & Wu 1988) sliced over the resampled units — NOT a standalone Rao-Wu bootstrap. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; weighted column-centering + row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` survey branch composes the per-draw `rw` (Rao-Wu rescaling for full designs, constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). 
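The per-draw objective and the post-optimization composition described above can be sketched in plain NumPy. This is illustrative only: `weighted_fw_sketch` and `compose_effective_weights` are hypothetical names, not the library's Rust `sc_weight_fw_weighted` kernel, and the toy Frank-Wolfe loop stands in for the real two-pass sparsify-refit dispatcher.

```python
import numpy as np

def weighted_fw_sketch(A, b, rw, zeta, n_iter=500):
    """Toy Frank-Wolfe on the unit simplex for the weighted objective
    min_w ||A @ diag(rw) @ w - b||^2 + zeta^2 * sum_j rw[j] * w[j]^2.
    Sketch only; the library's kernel (sc_weight_fw_weighted) may
    parametrize the loss and penalty differently."""
    M = A * rw[np.newaxis, :]          # column-scale donors by rw for the loss
    n = A.shape[1]
    w = np.full(n, 1.0 / n)            # start at the simplex barycenter
    for k in range(n_iter):
        grad = 2.0 * M.T @ (M @ w - b) + 2.0 * zeta**2 * rw * w
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0       # linear-minimization oracle: a vertex
        gamma = 2.0 / (k + 2.0)        # classic FW step size
        w = (1.0 - gamma) * w + gamma * s
    return w

def compose_effective_weights(rw, w):
    """Post-optimization Rao-Wu composition: w_eff = rw*w / sum(rw*w)."""
    ww = rw * w
    return ww / ww.sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))            # pre-period donor matrix (toy)
b = rng.normal(size=8)                 # treated pre-period trajectory (toy)
rw = rng.uniform(0.5, 1.5, size=5)     # one draw of Rao-Wu rescaled weights
w = weighted_fw_sketch(A, b, rw, zeta=0.1)
w_eff = compose_effective_weights(rw, w)
```

Because every Frank-Wolfe iterate is a convex combination of simplex points, `w` stays nonnegative and sums to one, and `w_eff` inherits the same normalization after the Rao-Wu composition.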
New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only `_succeeds` tests, zero-treated-mass retry, deterministic Rao-Wu × boot_idx slice) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE-differs assertion). See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full objective and the argmin-set caveat. ### Changed (PR #355) -- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR). +- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` on the v3.2.x line; full-design support for those methods lands separately in the entries below. + +### Added +- **SDID `variance_method="placebo"` and `"jackknife"` now support strata/PSU/FPC designs.** Closes the last SDID survey gap. 
All three variance methods (bootstrap from PR #355, plus placebo and jackknife here) now handle full survey designs. New private methods `SyntheticDiD._placebo_variance_se_survey` and `_jackknife_se_survey` route the full-design path through method-specific allocators: + - **Placebo** — stratified permutation (Pesarin 2001). Each draw samples pseudo-treated indices uniformly without replacement from controls *within each stratum* containing actual treated units; non-treated strata contribute their controls unconditionally. The weighted Frank-Wolfe kernel from PR #355 (`compute_sdid_unit_weights_survey` / `compute_time_weights_survey`) re-estimates ω and λ per draw with per-control survey weights threaded into both loss and regularization; post-optimization composition `ω_eff = rw·ω/Σ(rw·ω)`. Arkhangelsky Algorithm 4 SE formula unchanged. + - **Jackknife** — PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` with `f_h = n_h_sampled / fpc[h]` (population-count FPC form). λ held fixed across LOOs; ω subsetted, composed with rw, renormalized. Strata with `n_h < 2` silently skipped (matches R `survey::svyjkn` with `lonely_psu="remove"` / `"certainty"`; `"adjust"` raises `NotImplementedError`). Full-census strata (`f_h ≥ 1`) short-circuit to zero contribution before any LOO feasibility check. `SE = 0` is returned for legitimate zero variance (e.g., every stratum full-census); `SE = NaN` with a targeted `UserWarning` is reserved for undefined cases — all strata skipped, or any delete-one replicate in a non-full-census contributing stratum is undefined (all-treated-in-one-PSU LOO, kept ω_eff / w_treated mass zero, estimator raises). Unstratified single-PSU short-circuits to NaN. 
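The stratum-aggregated jackknife SE formula in the bullet above can be sketched directly, including the `n_h < 2` skip, the full-census zero-contribution short-circuit, and the all-strata-skipped NaN contract. `stratified_jackknife_se` is a hypothetical helper, not the library's `_jackknife_se_survey`; it takes pre-computed PSU-level LOO replicates rather than refitting anything.

```python
import math

def stratified_jackknife_se(replicates_by_stratum, fpc_by_stratum):
    """Sketch of the Rust & Rao (1996) stratum-aggregated jackknife:
    SE^2 = sum_h (1 - f_h) * (n_h - 1) / n_h
                * sum_j (tau_hat_(h,j) - taubar_h)^2,
    with f_h = n_h / fpc_h (population-count FPC form). Hypothetical
    helper; the real path also handles undefined-replicate cases."""
    var, contributing = 0.0, 0
    for h, reps in replicates_by_stratum.items():
        n_h = len(reps)
        if n_h < 2:                    # lonely-PSU stratum: skipped
            continue
        f_h = n_h / fpc_by_stratum[h]
        if f_h >= 1.0:                 # full-census stratum: zero contribution
            contributing += 1
            continue
        mean_h = sum(reps) / n_h
        ssq = sum((r - mean_h) ** 2 for r in reps)
        var += (1.0 - f_h) * (n_h - 1) / n_h * ssq
        contributing += 1
    if contributing == 0:
        return float("nan")            # every stratum skipped: undefined
    return math.sqrt(var)
```

For example, a stratum with replicates `[1.0, 2.0]` and `fpc = 10` contributes `(1 - 0.2) * (1/2) * 0.5 = 0.2` to the variance, while a single-replicate stratum is skipped.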
+ - **Fit-time feasibility guards** (placebo): `ValueError` on stratum-level infeasibility with targeted messages distinguishing three cases — **Case B** (treated-containing stratum has zero controls), **Case C** (fewer controls than treated in a treated stratum), **Case D** (every treated stratum is exact-count `n_c_h == n_t_h` → permutation support is 1, null distribution collapses). Partial-permutation fallback rejected because it would silently change the null-distribution semantics. + - **Gate relaxed**: the fit-time guard at `synthetic_did.py:352-369` that rejected placebo/jackknife + strata/PSU/FPC is removed. Replicate-weight designs remain rejected (separate methodology — replicate variance is closed-form and would double-count with Rao-Wu-like rescaling). Non-survey and pweight-only paths bit-identical by construction — the new code is gated on `resolved_survey_unit.(strata|psu|fpc) is not None`. + - **Coverage MC**: `benchmarks/data/sdid_coverage.json` extended with jackknife on `stratified_survey`. Bootstrap validates near-nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife reported with an anti-conservatism caveat: with only 2 PSUs per stratum the stratified jackknife formula has 1 effective DoF per stratum, a well-documented limitation of Rust & Rao (1996) — `se_over_truesd ≈ 0.46` on this DGP. Users needing tight SE calibration with few PSUs should prefer `variance_method="bootstrap"`. Placebo is structurally infeasible on the existing `stratified_survey` DGP (its cohort packs into one stratum with 0 never-treated units — by design a bootstrap-suited DGP); the placebo survey path is exercised via unit tests on a feasible fixture. + - **Regression tests** across `tests/test_survey_phase5.py`: two new classes `TestSDIDSurveyPlaceboFullDesign` and `TestSDIDSurveyJackknifeFullDesign`. 
Placebo: pseudo-treated-stratum contract, Case B / Case C front-door guards with targeted-message regression, SE-differs-from-pweight-only, deterministic dispatch. Jackknife: stratum-aggregation self-consistency, **FPC magnitude regression** (2-stratum handcrafted panel asserts `SE_fpc == SE_nofpc · sqrt(1-f)` at `rtol=1e-10`), single-PSU-stratum skip, unstratified short-circuit, all-strata-skipped warning + NaN, SE-differs-from-pweight-only, deterministic dispatch. Existing `test_full_design_placebo_raises` and `test_full_design_jackknife_raises` flipped to `_succeeds` assertions. All 19 existing pweight-only and non-survey placebo/jackknife tests pass unchanged (bit-identity preserved via the new-path gating). + - **Allocator asymmetry** (documented in REGISTRY): placebo ignores the PSU axis (unit-level within-stratum permutation — the classical stratified permutation test; PSU-level permutation on few PSUs is near-degenerate); jackknife respects PSU (PSU-level LOO is the canonical survey jackknife). Both respect strata. See `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + placebo composition)` and `Note (survey + jackknife composition)`. ## [3.2.0] - 2026-04-19 diff --git a/TODO.md b/TODO.md index 5c34735b..0ac73b0a 100644 --- a/TODO.md +++ b/TODO.md @@ -107,7 +107,6 @@ Deferred items from PR reviews that were not addressed before merge. | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low | | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). 
A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low | | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium | -| **SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. | `synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se` | follow-up | Low | | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. 
Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low | #### Performance diff --git a/benchmarks/data/sdid_coverage.json b/benchmarks/data/sdid_coverage.json index 8f18a6e3..e8fab705 100644 --- a/benchmarks/data/sdid_coverage.json +++ b/benchmarks/data/sdid_coverage.json @@ -4,7 +4,7 @@ "n_bootstrap": 200, "library_version": "3.2.0", "backend": "rust", - "generated_at": "2026-04-24T13:01:54.876774+00:00", + "generated_at": "2026-04-24T21:08:20.185764+00:00", "total_elapsed_sec": 2420.61, "methods": [ "placebo", @@ -156,17 +156,17 @@ "se_over_truesd": 1.1297002530566618 }, "jackknife": { - "n_successful_fits": 0, + "n_successful_fits": 500, "rejection_rate": { - "0.01": null, - "0.05": null, - "0.10": null + "0.01": 0.358, + "0.05": 0.45, + "0.10": 0.512 }, - "mean_se": null, - "true_sd_tau_hat": null, - "se_over_truesd": null + "mean_se": 0.20686834633234263, + "true_sd_tau_hat": 0.4512243070193919, + "se_over_truesd": 0.4584601119980272 }, - "_elapsed_sec": 16.48 + "_elapsed_sec": 18.62 } } } \ No newline at end of file diff --git a/benchmarks/python/coverage_sdid.py b/benchmarks/python/coverage_sdid.py index 75460f1e..daace08b 100644 --- a/benchmarks/python/coverage_sdid.py +++ b/benchmarks/python/coverage_sdid.py @@ -8,10 +8,18 @@ rates at α ∈ {0.01, 0.05, 0.10} plus the ratio of mean estimated SE to the empirical sampling SD of τ̂. -The ``stratified_survey`` DGP is bootstrap-only — placebo and jackknife -still reject full strata/PSU/FPC survey designs (tracked in ``TODO.md``), -so the harness skips those method × DGP cells via the per-DGP -``survey_design_factory`` in the ``DGPSpec`` registry (PR #352 R5 P3). 
+The ``stratified_survey`` DGP runs bootstrap and jackknife; placebo is +skipped because its cohort packs all treated units into stratum 1, +which has 0 never-treated units, so the stratified-permutation +allocator is structurally infeasible on this DGP (raises Case B — +treated-containing stratum with zero controls — at fit-time). +Jackknife is reported +with a documented anti-conservatism caveat — with only 2 PSUs per +stratum, the stratified PSU-level jackknife formula has 1 effective DoF +per stratum, a known limitation (see REGISTRY §SyntheticDiD "Note +(survey + jackknife composition)"). The harness skips unsupported +method × DGP cells via the per-DGP ``survey_design_factory`` in the +``DGPSpec`` registry. The output JSON underwrites the calibration table in ``docs/methodology/REGISTRY.md`` §SyntheticDiD, including the @@ -227,13 +235,29 @@ def _stratified_survey_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]: def _stratified_survey_design(df: pd.DataFrame) -> Tuple[Any, Tuple[str, ...]]: """Build the SurveyDesign for the stratified_survey DGP. - Methods supported: bootstrap only — placebo / jackknife reject - strata/PSU/FPC at fit-time (separate methodology gap). + Methods supported on this DGP: + * **bootstrap** — weighted-FW + Rao-Wu (PR #355). Calibration + validated here. + * **jackknife** — PSU-level LOO with stratum aggregation (Rust & + Rao 1996). Reported here with a known anti-conservatism caveat: + with ``psu_per_stratum=2``, within-stratum jackknife has only + ``n_h - 1 = 1`` effective DoF per stratum, which is a well- + documented limitation of the stratified jackknife formula when + PSU counts are low. The reported ``se_over_truesd`` is expected + to land below 1; this is not a bug — users needing tight SE + calibration with few PSUs should prefer ``bootstrap``. 
+ * **placebo** — NOT supported on this DGP: the treated cohort packs + into stratum 1 (which has 0 never-treated units by construction), + so the stratified-permutation allocator raises Case B (zero + controls in a treated-containing stratum) at fit-time. This is a + property of the DGP, not of the placebo allocator; the placebo + survey method is exercised by + ``tests/test_survey_phase5.py::TestSDIDSurveyPlaceboFullDesign``. """ from diff_diff import SurveyDesign return ( SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc"), - ("bootstrap",), + ("bootstrap", "jackknife"), ) @@ -273,10 +297,14 @@ def _fit_one( """Fit SDID and return (att, se, p_value); (None, None, None) on failure. For survey DGPs the harness passes a SurveyDesign via ``survey_design``; - fit() routes it through the bootstrap survey path (PR #352) when - method=='bootstrap'. The DGP's ``survey_design_factory`` declares which - methods are supported, so the caller skips unsupported methods entirely - rather than catching the resulting NotImplementedError here. + ``fit()`` routes strata/PSU/FPC designs through the method-specific + survey variance path — bootstrap (PR #355 weighted-FW + Rao-Wu), + placebo (stratified permutation + weighted-FW), or jackknife (PSU- + level LOO with stratum aggregation). The DGP's + ``survey_design_factory`` declares which methods are supported on + that specific DGP, so the caller skips unsupported methods entirely + rather than catching the resulting NotImplementedError / Case B-D + ValueError here. """ try: with warnings.catch_warnings(): diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 6e2556a8..00f9c3bb 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -1032,7 +1032,7 @@ Returned by `SyntheticDiD.fit()`. 
**Validation diagnostics** (call after `fit()`): - `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units -- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"`; raises `ValueError` if LOO is unavailable (see the method docstring for the full set of conditions, e.g. single treated unit or only one control with nonzero effective weight) +- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.placebo_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.) - `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design - `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega @@ -1674,7 +1674,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F' **Key features:** - Taylor Series Linearization (TSL) variance with strata + PSU + FPC - Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH) -- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). 
SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #352): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo and jackknife paths still reject strata/PSU/FPC (separate methodology gap, tracked in TODO.md) +- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`) - Repeated cross-sections: `CallawaySantAnna(panel=False)` - Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section diff --git a/diff_diff/results.py b/diff_diff/results.py index ebdd1549..69d48b8f 100644 --- a/diff_diff/results.py +++ b/diff_diff/results.py @@ -882,6 +882,12 @@ def __post_init__(self): # Plain attributes rather than dataclass fields so asdict()-style # recursion cannot serialize internal panel state. self._loo_unit_ids: Optional[List[Any]] = None + # Granularity of the `placebo_effects` LOO array: "unit" (non- + # survey + pweight-only jackknife), "psu" (full-design survey + # jackknife), or None (non-jackknife variance methods). Governs + # which accessors are well-defined. Set by `fit()` at result + # construction time. 
+ self._loo_granularity: Optional[str] = None self._loo_roles: Optional[List[str]] = None self._fit_snapshot: Optional[_SyntheticDiDFitSnapshot] = None @@ -1103,9 +1109,25 @@ def get_loo_effects_df(self) -> pd.DataFrame: """ Per-unit leave-one-out ATT from the jackknife variance pass. - Requires ``variance_method='jackknife'``; raises ValueError otherwise. - - The underlying values come from the jackknife loops in + Requires ``variance_method='jackknife'`` (``ValueError`` otherwise) + and unit-level LOO granularity (``NotImplementedError`` for the + full-design survey jackknife path, which uses PSU-level LOO). + + Available on: + * non-survey jackknife fits (classical Arkhangelsky Algorithm 3). + * pweight-only survey jackknife fits (Algorithm 3 with post-hoc + ω_eff composition; PSU labels in ``survey_metadata`` come from + implicit-PSU metadata but the LOO remains unit-level). + + Blocked on: + * full-design survey jackknife fits (strata / PSU / FPC set in + ``SurveyDesign``) — the underlying replicates are PSU-level + ``τ̂_{(h,j)}`` (Rust & Rao 1996), not unit-level. See + ``result.placebo_effects`` for the raw PSU-level replicate + array and REGISTRY §SyntheticDiD "Note (survey + jackknife + composition)" for the aggregation formula. + + The underlying unit-level values come from the jackknife loops in ``SyntheticDiD._jackknife_se``: control LOO estimates fill the first ``n_control`` positions (in the order of the control units seen by fit), then treated LOO estimates fill the next @@ -1133,6 +1155,30 @@ def get_loo_effects_df(self) -> pd.DataFrame: "Re-fit with SyntheticDiD(variance_method='jackknife') to " "obtain per-unit leave-one-out estimates." ) + # Survey-jackknife fits use PSU-level LOO (Rust & Rao 1996) with + # stratum aggregation rather than unit-level LOO. 
The returned + # ``placebo_effects`` array in that path is a flat list of + # PSU-level τ̂_{(h,j)} replicates (variable length, ordered by + # stratum then PSU), not a length-N unit-indexed array. Mapping + # these onto the fit-time unit IDs would mislabel PSU replicates + # as unit effects. Block the accessor when the explicit + # granularity flag set by ``fit()`` is "psu". We key off the + # granularity flag rather than ``survey_metadata.n_psu`` because + # pweight-only survey jackknife fits also populate ``n_psu`` via + # implicit-PSU metadata (``survey.py`` L749-L753) but still run + # unit-level LOO, so the ``n_psu`` heuristic would false-positive. + if getattr(self, "_loo_granularity", None) == "psu": + raise NotImplementedError( + "get_loo_effects_df() is unit-level-LOO only. This fit used " + "the full-design survey jackknife (PSU-level LOO with " + "stratum aggregation, Rust & Rao 1996); the underlying " + "replicates are PSU-level, not unit-level, so joining them " + "back to fit-time unit IDs is not well-defined. See " + "``result.placebo_effects`` for the raw PSU-level replicate " + "array and ``docs/methodology/REGISTRY.md`` §SyntheticDiD " + "\"Note (survey + jackknife composition)\" for the " + "aggregation formula." + ) if self._loo_unit_ids is None or self._loo_roles is None or self.placebo_effects is None: raise ValueError( "Leave-one-out estimates are unavailable (jackknife returned " diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py index 7146ba19..ba056418 100644 --- a/diff_diff/synthetic_did.py +++ b/diff_diff/synthetic_did.py @@ -247,18 +247,29 @@ def fit( # type: ignore[override] out before computing the SDID estimator. survey_design : SurveyDesign, optional Survey design specification. Only pweight weight_type is - supported. Support matrix (PR #352): + supported. Replicate-weight designs are rejected. 
All three + variance methods support both pweight-only and full + strata/PSU/FPC designs: method pweight-only strata/PSU/FPC - bootstrap ✓ weighted FW ✓ weighted FW + Rao-Wu - placebo ✓ ✗ NotImplementedError - jackknife ✓ ✗ NotImplementedError - - The bootstrap path composes Rao-Wu rescaled weights per draw - with the weighted-Frank-Wolfe kernel; see REGISTRY.md - §SyntheticDiD ``Note (survey + bootstrap composition)``. - ``placebo`` and ``jackknife`` still reject strata/PSU/FPC - (separate methodology gap tracked in TODO.md). + bootstrap ✓ weighted FW ✓ weighted FW + Rao-Wu (PR #355) + placebo ✓ ✓ stratified permutation + weighted FW + jackknife ✓ ✓ PSU-level LOO + stratum aggregation + + - **Bootstrap** composes Rao-Wu rescaled weights per draw with + the weighted-Frank-Wolfe kernel; see REGISTRY.md §SyntheticDiD + ``Note (survey + bootstrap composition)``. + - **Placebo** under full design uses within-stratum permutation + (pseudo-treated sampled from controls in each treated-containing + stratum) with weighted-FW refit per draw; fit-time feasibility + guards raise ``ValueError`` when a treated stratum has fewer + controls than treated units (see ``Note (survey + placebo + composition)``). + - **Jackknife** under full design uses PSU-level LOO with + stratum aggregation (Rust & Rao 1996); anti-conservative with + few PSUs per stratum — prefer ``bootstrap`` when tight SE + calibration matters in that regime (see ``Note (survey + + jackknife composition)``). Returns ------- @@ -293,10 +304,12 @@ def fit( # type: ignore[override] targeted error instead of a bootstrap-exhaustion failure (PR #355 R8 P1). NotImplementedError - If ``survey_design`` with strata/PSU/FPC is provided with - ``variance_method='placebo'`` or ``'jackknife'``. Bootstrap - + any survey design (pweight-only or full design) is - supported via PR #352's weighted-FW + Rao-Wu composition. 
+ If ``survey_design`` carries replicate weights (BRR/Fay/JK1/ + JKn/SDR) — SyntheticDiD has no replicate-weight variance + path. All three variance methods (placebo, bootstrap, + jackknife) accept pweight-only and full strata/PSU/FPC + analytical designs; only replicate-weight designs are + rejected. """ # Validate inputs if outcome is None or treatment is None or unit is None or time is None: @@ -317,23 +330,68 @@ def fit( # type: ignore[override] _validate_unit_constant_survey, ) + # R11 P1 fix: FPC is a documented no-op on placebo (Pesarin 2001 + # §1.5 — permutation tests condition on the observed sample), but + # ``SurveyDesign.resolve()`` itself enforces ``FPC >= n_PSU`` + # design-validity constraints (survey.py:349-368). On placebo, + # those constraints would block legitimate fits for a design + # element that doesn't enter the placebo math. Drop FPC from a + # copy of the survey design before resolution so placebo + # bypasses the validator entirely; emit the FPC no-op warning + # at the same time. The original survey_design object is + # preserved (caller's reference unchanged). + survey_design_for_resolve = survey_design + if ( + self.variance_method == "placebo" + and survey_design is not None + and getattr(survey_design, "fpc", None) is not None + ): + # R13 P3 fix: validate the FPC column name exists in `data` + # before dropping. Otherwise a typoed ``fpc="fpc_typo"`` is + # silently ignored on the placebo path (the missing-column + # check inside ``SurveyDesign.resolve()`` never runs because + # we strip FPC pre-resolve). Raise the same targeted error + # ``resolve()`` would have raised so input-spec mistakes + # surface even when the value is mathematically a no-op. + fpc_col = survey_design.fpc + if fpc_col not in data.columns: + raise ValueError( + f"FPC column '{fpc_col}' not found in data" + ) + import dataclasses as _dc + warnings.warn( + "SurveyDesign(fpc=...) 
is a no-op on " + "variance_method='placebo': permutation tests are " + "conditional on the observed sample (Pesarin 2001 §1.5), " + "so the sampling fraction does not enter Algorithm 4 or " + "its stratified-permutation survey extension. The FPC " + "column is dropped from the resolved survey design for " + "the placebo fit (this also bypasses the FPC >= n_PSU " + "design-validity check in SurveyDesign.resolve()). Use " + "variance_method='bootstrap' or 'jackknife' if you need " + "FPC to participate in the variance computation.", + UserWarning, + stacklevel=2, + ) + survey_design_for_resolve = _dc.replace(survey_design, fpc=None) + resolved_survey, survey_weights, survey_weight_type, survey_metadata = ( - _resolve_survey_for_fit(survey_design, data, "analytical") + _resolve_survey_for_fit(survey_design_for_resolve, data, "analytical") ) # Reject replicate-weight designs — SyntheticDiD has no replicate- # weight variance path. Analytical (pweight / strata / PSU / FPC) - # designs are supported per the PR #352 matrix (bootstrap covers - # full design via weighted-FW + Rao-Wu; placebo / jackknife - # accept pweight-only, reject strata/PSU/FPC). + # designs are supported across all three variance methods: + # bootstrap via weighted-FW + Rao-Wu (PR #355); placebo via + # stratified permutation + weighted FW; jackknife via PSU-level + # LOO with stratum aggregation (Rust & Rao 1996). if resolved_survey is not None and resolved_survey.uses_replicate_variance: raise NotImplementedError( "SyntheticDiD does not support replicate-weight survey " - "designs. Analytical survey designs are supported: " - "variance_method='bootstrap' accepts both pweight-only " - "and strata/PSU/FPC designs (PR #352), while " - "variance_method='placebo' and 'jackknife' accept " - "pweight-only. See docs/methodology/REGISTRY.md " - "§SyntheticDiD for the full survey support matrix." + "designs. 
Analytical designs are supported across all " + "three variance methods (placebo, bootstrap, jackknife), " + "for both pweight-only and full strata/PSU/FPC. See " + "docs/methodology/REGISTRY.md §SyntheticDiD for the " + "full survey support matrix." ) # Validate pweight only if resolved_survey is not None and resolved_survey.weight_type != "pweight": @@ -342,32 +400,14 @@ def fit( # type: ignore[override] f"Got '{resolved_survey.weight_type}'." ) - # Strata/PSU/FPC support matrix (PR #352): - # bootstrap → supported via weighted Frank-Wolfe + Rao-Wu rescaling - # (this PR; see _bootstrap_se Rao-Wu branch). - # placebo / jackknife → NotImplemented for full designs (separate - # methodology gap; resampling allocators differ between - # bootstrap pairs and placebo permutations / jackknife - # LOO). Tracked in TODO.md as a follow-up. - if ( - resolved_survey is not None - and ( - resolved_survey.strata is not None - or resolved_survey.psu is not None - or resolved_survey.fpc is not None - ) - and self.variance_method in ("placebo", "jackknife") - ): - raise NotImplementedError( - f"SyntheticDiD with variance_method='{self.variance_method}' " - "does not yet support survey designs with strata/PSU/FPC. " - "Pweight-only pseudo-population weights work with placebo / " - "jackknife. Strata/PSU/FPC support requires per-method " - "Rao-Wu / wild-bootstrap derivations on the placebo " - "allocator and the jackknife LOO mass; tracked in TODO.md " - "(SDID survey support follow-up). Use " - "variance_method='bootstrap' for full survey designs." - ) + # Strata/PSU/FPC support matrix: + # bootstrap → supported via weighted Frank-Wolfe + hybrid + # pairs-bootstrap + Rao-Wu rescaling (PR #355; + # see _bootstrap_se Rao-Wu branch). + # placebo → supported via stratified permutation + weighted + # Frank-Wolfe (this PR; _placebo_variance_se_survey). + # jackknife → supported via PSU-level LOO with stratum + # aggregation (this PR; _jackknife_se_survey). 
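The stratified-permutation allocator sketched in the support matrix above (pseudo-treated sampled within each treated-containing stratum, other strata's controls kept unconditionally) reduces to a short draw routine. `draw_pseudo_treated` is a hypothetical illustration, not the library's `_placebo_variance_se_survey` internals.

```python
import numpy as np

def draw_pseudo_treated(control_strata, treated_strata, rng):
    """One stratified-permutation placebo draw (sketch). For each
    stratum containing actual treated units, sample that stratum's
    treated count of pseudo-treated indices uniformly without
    replacement from the stratum's controls; strata with no treated
    units keep all of their controls as donors. Hypothetical helper."""
    control_strata = np.asarray(control_strata)
    treated_strata = np.asarray(treated_strata)
    pseudo = []
    for h in np.unique(treated_strata):
        pool = np.flatnonzero(control_strata == h)   # controls in stratum h
        n_t_h = int((treated_strata == h).sum())
        if len(pool) < n_t_h:
            # mirrors the fit-time feasibility guard's hard failure
            raise ValueError(f"stratum {h}: fewer controls than treated")
        pseudo.append(rng.choice(pool, size=n_t_h, replace=False))
    return np.sort(np.concatenate(pseudo))

rng = np.random.default_rng(42)
# 3 controls in stratum 0, 3 in stratum 1; one treated unit in stratum 0
idx = draw_pseudo_treated([0, 0, 0, 1, 1, 1], [0], rng)
```

Per the allocator-asymmetry note, this intentionally permutes at the unit level within strata (the classical stratified permutation test) rather than at the PSU level.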
# Validate treatment is binary validate_binary(data[treatment].values, "treatment") @@ -469,8 +509,17 @@ def fit( # type: ignore[override] # sees a generic bootstrap-exhaustion message instead of a # targeted FPC/design error. Validate upstream so the user # gets a clean error before the bootstrap loop even starts. + # + # R10 P1 fix: gate this validator on variance methods that + # actually use FPC. Bootstrap (Rao-Wu) and jackknife (Rust + # & Rao stratum aggregation) both consume FPC; placebo is + # documented as FPC-no-op (Pesarin 2001 §1.5 — permutation + # tests condition on the observed sample). Running the + # validator on placebo would block legitimate placebo fits + # for a constraint that doesn't apply to permutation math. if ( - resolved_survey_unit.psu is None + self.variance_method in ("bootstrap", "jackknife") + and resolved_survey_unit.psu is None and resolved_survey_unit.fpc is not None ): if resolved_survey_unit.strata is None: @@ -742,6 +791,231 @@ def fit( # type: ignore[override] treated_pre_trajectory = Y_pre_treated_mean treated_post_trajectory = Y_post_treated_mean + # Detect full-design survey (strata/PSU/FPC). The unit-collapsed + # ``resolved_survey_unit`` carries the per-unit strata/psu/fpc + # arrays ordered as [control..., treated...] to match the + # downstream variance-method column layout. 
+        _full_design_survey = (
+            resolved_survey_unit is not None
+            and (
+                resolved_survey_unit.strata is not None
+                or resolved_survey_unit.psu is not None
+                or resolved_survey_unit.fpc is not None
+            )
+        )
+        if _full_design_survey:
+            _n_c = len(control_units)
+            _strata_control = (
+                resolved_survey_unit.strata[:_n_c]
+                if resolved_survey_unit.strata is not None
+                else None
+            )
+            _strata_treated = (
+                resolved_survey_unit.strata[_n_c:]
+                if resolved_survey_unit.strata is not None
+                else None
+            )
+            _psu_control = (
+                resolved_survey_unit.psu[:_n_c]
+                if resolved_survey_unit.psu is not None
+                else None
+            )
+            _psu_treated = (
+                resolved_survey_unit.psu[_n_c:]
+                if resolved_survey_unit.psu is not None
+                else None
+            )
+            _fpc_control = (
+                resolved_survey_unit.fpc[:_n_c]
+                if resolved_survey_unit.fpc is not None
+                else None
+            )
+            _fpc_treated = (
+                resolved_survey_unit.fpc[_n_c:]
+                if resolved_survey_unit.fpc is not None
+                else None
+            )
+        else:
+            _strata_control = None
+            _strata_treated = None
+            _psu_control = None
+            _psu_treated = None
+            _fpc_control = None
+            _fpc_treated = None
+
+        # Placebo routes to the survey allocator whenever **strata or
+        # PSU** is declared (FPC alone does NOT flip dispatch). For
+        # PSU-without-strata designs, the whole panel is synthesized
+        # as a single stratum (stratified permutation degenerates to
+        # a global within-stratum permutation, still dispatched
+        # through the weighted-FW path).
+        #
+        # FPC handling on placebo (R8 P1 fix): permutation tests are
+        # conditional on the observed sample (Pesarin 2001 §1.5), so
+        # the sampling fraction does not enter Algorithm 4 or its
+        # stratified-permutation extension. Including FPC in the
+        # dispatch trigger would silently switch numerics (weighted-FW
+        # vs unweighted-FW + post-hoc composition) on a survey design
+        # element that has no place in the placebo math. Drop FPC from
+        # the dispatch condition; the ``UserWarning`` emitted earlier
+        # (pre-resolve) surfaces the no-op contract when FPC is set
+        # with placebo.
+ _placebo_use_survey_path = ( + self.variance_method == "placebo" + and resolved_survey_unit is not None + and ( + resolved_survey_unit.strata is not None + or resolved_survey_unit.psu is not None + ) + ) + # NOTE: the FPC no-op warning for placebo is emitted earlier + # (before ``_resolve_survey_for_fit``); ``resolved_survey_unit.fpc`` + # is already None on the placebo path because the FPC column is + # dropped from a copy of the survey design pre-resolve. No + # duplicate warning here. + + # Jackknife routes to the survey allocator whenever PSU or FPC or + # strata is declared. PSU-without-strata is treated as a single + # stratum (Rust & Rao 1996 JK1 form) inside + # ``_jackknife_se_survey``. + _jackknife_use_survey_path = ( + _full_design_survey and self.variance_method == "jackknife" + ) + + # Synthesize a single stratum for PSU/FPC-without-strata designs + # so the placebo / jackknife survey paths can treat them as the + # JK1 / global-permutation degenerate case of the stratified + # allocator. The `_strata_*_eff` arrays are passed to the survey + # methods; the original `_strata_*` arrays stay None so other + # code paths (REGISTRY, metadata) see the true design. + if _full_design_survey and _strata_control is None: + _strata_control_eff: np.ndarray = np.zeros( + len(control_units), dtype=np.int64 + ) + _strata_treated_eff: np.ndarray = np.zeros( + len(treated_units), dtype=np.int64 + ) + else: + _strata_control_eff = _strata_control # type: ignore[assignment] + _strata_treated_eff = _strata_treated # type: ignore[assignment] + + # Fit-time feasibility guard for stratified-permutation placebo + # (per `feedback_front_door_over_retry_swallow.md`). Case B / Case C + # are hard failures — partial-permutation fallback would silently + # change the null-distribution semantics and produce an incoherent + # test. Must run *before* the retry loop below swallows ValueErrors + # via `except (ValueError, LinAlgError, ZeroDivisionError): continue`. 
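The per-stratum feasibility conditions the guard below enforces (at least one control, at least `n_treated` controls, and at least `n_treated` positive-weight controls per treated stratum) can be sketched standalone. The function name is illustrative, not part of the estimator:

```python
import numpy as np

def stratified_placebo_feasible(strata_control, strata_treated, w_control):
    """Check the three per-stratum guards for stratified-permutation placebo."""
    strata, counts = np.unique(strata_treated, return_counts=True)
    for h, n_t_h in zip(strata, counts):
        in_h = strata_control == h
        n_c_h = int(in_h.sum())
        n_pos = int((w_control[in_h] > 0).sum())
        if n_c_h == 0 or n_c_h < n_t_h or n_pos < n_t_h:
            return False
    return True

strata_c = np.array([0, 0, 1, 1])
w = np.array([1.0, 1.0, 1.0, 0.0])
# Stratum 0: 2 positive-weight controls for 1 treated unit -> feasible.
assert stratified_placebo_feasible(strata_c, np.array([0]), w)
# Stratum 1: 2 controls but only 1 with positive weight for 2 treated -> infeasible.
assert not stratified_placebo_feasible(strata_c, np.array([1, 1]), w)
```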
+ if _placebo_use_survey_path: + unique_treated_strata, treated_counts = np.unique( + _strata_treated_eff, return_counts=True + ) + has_nondegenerate_stratum = False + assert w_control is not None # always set on full-design survey + for h, n_t_h in zip(unique_treated_strata, treated_counts): + n_c_h = int(np.sum(_strata_control_eff == h)) + if n_c_h == 0: + raise ValueError( + "Stratified-permutation placebo requires at least " + f"one control per stratum containing treated units; " + f"stratum {h} has 0 controls and {int(n_t_h)} " + "treated units. Either rebalance the panel, drop " + f"stratum {h} from the design, or use " + "variance_method='bootstrap' (which supports the " + "same full survey design via weighted-FW + Rao-Wu " + "without a permutation-feasibility constraint)." + ) + if n_c_h < int(n_t_h): + raise ValueError( + "Stratified-permutation placebo requires at least " + "n_treated controls per stratum containing treated " + "units (for exact-count within-stratum " + f"permutation); stratum {h} has {n_c_h} controls " + f"but {int(n_t_h)} treated units. Either rebalance " + "the panel, drop the undersupplied stratum, or use " + "variance_method='bootstrap' (which supports the " + "same full survey design via weighted-FW + Rao-Wu " + "without a permutation-feasibility constraint)." + ) + # Case E (R9 P1) — row-count guards passed (n_c_h ≥ n_t_h) + # but the stratum has fewer positive-weight controls + # than treated. The placebo allocator computes pseudo- + # treated means as ``np.average(Y, weights=w_control[idx])``; + # if too few controls have positive weight, draws can + # pick all-zero-weight subsets (ZeroDivisionError on + # np.average) and the retry loop swallows them as a + # generic ``n_successful=0`` warning + ``SE=0.0``. + # Front-door the targeted error. 
+ w_in_h = w_control[_strata_control_eff == h] + n_c_h_positive = int(np.sum(w_in_h > 0)) + if n_c_h_positive < int(n_t_h): + raise ValueError( + "Stratified-permutation placebo requires at least " + "n_treated controls with positive survey weight " + "per stratum containing treated units (the " + "pseudo-treated mean uses survey-weighted " + f"averaging); stratum {h} has {n_c_h_positive} " + f"positive-weight controls (out of {n_c_h} total) " + f"but {int(n_t_h)} treated units. Either rebalance " + "the panel, drop the undersupplied stratum, or use " + "variance_method='bootstrap' (which supports the " + "same full survey design via weighted-FW + Rao-Wu " + "without a per-draw positive-mass constraint)." + ) + # Non-degenerate iff this stratum yields ≥2 distinct + # positive-mass pseudo-treated draws. Two necessary + # conditions, both required: + # * ``n_c_h > n_t_h`` — raw without-replacement count + # allows multiple subsets (otherwise only the + # "all-controls-as-pseudo-treated" subset exists, + # regardless of weights — Case D classical shape). + # * ``n_c_h_positive >= 2`` — at least 2 distinct + # positive-mass means are reachable. With only 1 + # positive-weight control, every successful pick + # reduces to that single control's mean (zero- + # weight cohabitants contribute 0 to numerator and + # denominator), regardless of how many subsets the + # raw allocator can construct (Case D effective + # single-support shape, R11 P1). + if n_c_h > int(n_t_h) and n_c_h_positive >= 2: + has_nondegenerate_stratum = True + # Case D: every treated stratum is effectively single- + # support, so the placebo null collapses to a single + # positive-mass allocation. Two paths into this: + # * Raw exact-count (``n_c_h == n_t_h`` for every treated + # stratum, R4 P1): the without-replacement permutation + # yields a single subset, every draw is identical. 
+            #   * Effective single-support (``n_c_h_positive < 2`` for
+            #     every treated stratum, R11 P1): positive-mass picks
+            #     reduce to a single distinct mean even when raw
+            #     counts are larger, because zero-weight controls
+            #     contribute 0 to numerator and denominator. Successful
+            #     draws all collapse to the unique positive-weight
+            #     subset.
+            # Both shapes produce SE = FP noise (~1e-16) — reject up
+            # front rather than silently reporting a near-zero SE.
+            if not has_nondegenerate_stratum:
+                detail = ", ".join(
+                    f"stratum {h}: n_c={int(np.sum(_strata_control_eff == h))}, "
+                    f"n_c_positive={int(np.sum(w_control[_strata_control_eff == h] > 0))}, "
+                    f"n_t={int(n_t_h)}"
+                    for h, n_t_h in zip(unique_treated_strata, treated_counts)
+                )
+                raise ValueError(
+                    "Stratified-permutation placebo support is degenerate: "
+                    "no treated-containing stratum admits more than one "
+                    "distinct positive-mass pseudo-treated allocation "
+                    "(either n_c == n_t exactly, or fewer than 2 "
+                    f"positive-weight controls; {detail}). The resulting "
+                    "placebo distribution collapses to one point and SE "
+                    "is not a meaningful null estimate. At least one "
+                    "treated stratum must have n_c > n_t and ≥2 positive-"
+                    "weight controls. Either rebalance the panel, or use "
+                    "variance_method='bootstrap' (which supports the "
+                    "same full survey design via weighted-FW + Rao-Wu "
+                    "without a permutation-feasibility constraint)."
+                )
+
        # Compute standard errors on normalized Y, rescale to original units.
        # Variance procedures resample / permute indices (independent of Y
        # values) so RNG streams stay aligned across scales.
@@ -749,7 +1023,7 @@
         # Paper-faithful pairs bootstrap (Algorithm 2 step 2): re-estimate
         # ω̂_b and λ̂_b via Frank-Wolfe on each draw. With survey designs
         # the FW switches to the weighted-FW variant and Rao-Wu rescaling
-        # supplies per-draw weights (PR #352).
Pweight-only designs use + # supplies per-draw weights (PR #355). Pweight-only designs use # constant w_control across draws; full designs use Rao-Wu draws. # Determine which survey path the bootstrap should use: # - resolved_survey_unit + strata/PSU/FPC → Rao-Wu rescaling @@ -758,17 +1032,9 @@ def fit( # type: ignore[override] # bootstrap branch sets `_pweight_only` from `w_control` # when resolved_survey is None). # - non-survey → pass nothing (legacy path). - full_design = ( - resolved_survey_unit is not None - and ( - resolved_survey_unit.strata is not None - or resolved_survey_unit.psu is not None - or resolved_survey_unit.fpc is not None - ) - ) - _boot_resolved_survey = resolved_survey_unit if full_design else None - _boot_w_control = w_control if not full_design else None - _boot_w_treated = w_treated if not full_design else None + _boot_resolved_survey = resolved_survey_unit if _full_design_survey else None + _boot_w_control = w_control if not _full_design_survey else None + _boot_w_treated = w_treated if not _full_design_survey else None se_n, bootstrap_estimates_n = self._bootstrap_se( Y_pre_control_n, @@ -788,17 +1054,66 @@ def fit( # type: ignore[override] placebo_effects = np.asarray(bootstrap_estimates_n) * Y_scale inference_method = "bootstrap" elif self.variance_method == "jackknife": - # Fixed-weight jackknife (R's synthdid Algorithm 3) - se_n, jackknife_estimates_n = self._jackknife_se( - Y_pre_control_n, - Y_post_control_n, - Y_pre_treated_n, - Y_post_treated_n, - unit_weights, - time_weights, - w_treated=w_treated, - w_control=w_control, - ) + if _jackknife_use_survey_path: + # PSU-level LOO + stratum aggregation (Rust & Rao 1996). + assert w_control is not None and w_treated is not None + # R5 P1 fix: validate ``lonely_psu`` mode. 
The survey + # jackknife currently skips singleton strata (n_h < 2) + # unconditionally — equivalent to R ``survey::svyjkn``'s + # ``"remove"`` and ``"certainty"`` modes (both zero- + # contribution for singleton strata). ``"adjust"`` (use + # overall mean for singleton strata) is not implemented + # for SDID jackknife; reject upfront rather than silently + # treating it as ``"remove"``. + _lonely_psu_mode = getattr( + resolved_survey_unit, "lonely_psu", "remove" + ) + if _lonely_psu_mode not in ("remove", "certainty"): + raise NotImplementedError( + f"SurveyDesign(lonely_psu={_lonely_psu_mode!r}) is " + "not supported on the SDID jackknife survey path. " + "'remove' and 'certainty' are equivalent here " + "(both contribute 0 variance for singleton strata, " + "which is the canonical Rust & Rao 1996 behavior). " + "'adjust' requires an overall-mean fallback per " + "stratum that is not yet implemented for SDID " + "jackknife; use variance_method='bootstrap' (which " + "supports all three ``lonely_psu`` modes via the " + "weighted-FW + Rao-Wu path) or switch the design " + "to lonely_psu='remove'." + ) + # Unstratified designs use the synthesized single stratum + # (``_strata_*_eff``) so the loop reduces to classical + # JK1 (single-stratum PSU-LOO). 
+ se_n, jackknife_estimates_n = self._jackknife_se_survey( + Y_pre_control_n, + Y_post_control_n, + Y_pre_treated_n, + Y_post_treated_n, + unit_weights, + time_weights, + w_control=w_control, + w_treated=w_treated, + strata_control=_strata_control_eff, + strata_treated=_strata_treated_eff, + psu_control=_psu_control, + psu_treated=_psu_treated, + fpc_control=_fpc_control, + fpc_treated=_fpc_treated, + lonely_psu=_lonely_psu_mode, + ) + else: + # Fixed-weight jackknife (R's synthdid Algorithm 3) + se_n, jackknife_estimates_n = self._jackknife_se( + Y_pre_control_n, + Y_post_control_n, + Y_pre_treated_n, + Y_post_treated_n, + unit_weights, + time_weights, + w_treated=w_treated, + w_control=w_control, + ) se = se_n * Y_scale placebo_effects = np.asarray(jackknife_estimates_n) * Y_scale inference_method = "jackknife" @@ -806,18 +1121,39 @@ def fit( # type: ignore[override] # Use placebo-based variance (R's synthdid Algorithm 4). # Placebo re-estimates ω, λ inside the loop; it must receive the # normalized zetas and operate on normalized Y. - se_n, placebo_effects_n = self._placebo_variance_se( - Y_pre_control_n, - Y_post_control_n, - Y_pre_treated_mean_n, - Y_post_treated_mean_n, - n_treated=len(treated_units), - zeta_omega=zeta_omega_n, - zeta_lambda=zeta_lambda_n, - min_decrease=min_decrease, - replications=self.n_bootstrap, - w_control=w_control, - ) + if _placebo_use_survey_path: + # Stratified permutation + weighted-FW (Pesarin 2001). + # PSU/FPC-without-strata designs use a synthesized single + # stratum (``_strata_*_eff``), which makes the stratified + # permutation degenerate to a global within-stratum + # permutation dispatched through the weighted-FW path. 
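The stratum aggregation that `_jackknife_se_survey` applies (the Rust & Rao 1996 formula `V_J = Σ_h (1 - f_h)·(n_h - 1)/n_h·Σ_{j∈h} (τ̂_{(h,j)} - τ̄_h)²`) can be checked numerically on toy replicates. A standalone sketch of the aggregation step only, with an illustrative function name; singleton strata contribute zero, matching the lonely-PSU "remove" behavior described above:

```python
import numpy as np

def rust_rao_jackknife_var(tau_by_stratum, f_by_stratum):
    """Aggregate per-stratum PSU-LOO estimates into a jackknife variance."""
    V = 0.0
    for taus, f_h in zip(tau_by_stratum, f_by_stratum):
        n_h = len(taus)
        if n_h < 2:  # lonely-PSU "remove": singleton stratum contributes 0
            continue
        V += (1 - f_h) * (n_h - 1) / n_h * np.sum((taus - taus.mean()) ** 2)
    return V

# Two strata: one with two PSU-LOO replicates, one lonely (skipped).
taus = [np.array([1.0, 3.0]), np.array([2.0])]
V = rust_rao_jackknife_var(taus, [0.0, 0.0])
# Stratum 1: (2-1)/2 * ((1-2)^2 + (3-2)^2) = 0.5 * 2 = 1.0
assert np.isclose(V, 1.0)
se = np.sqrt(V)
```

Setting `f_h = 1` for a stratum zeroes its contribution, which is the full-census short-circuit discussed further down.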
+ assert w_control is not None + se_n, placebo_effects_n = self._placebo_variance_se_survey( + Y_pre_control_n, + Y_post_control_n, + Y_pre_treated_mean_n, + Y_post_treated_mean_n, + strata_control=_strata_control_eff, + treated_strata=_strata_treated_eff, + zeta_omega=zeta_omega_n, + zeta_lambda=zeta_lambda_n, + min_decrease=min_decrease, + replications=self.n_bootstrap, + w_control=w_control, + ) + else: + se_n, placebo_effects_n = self._placebo_variance_se( + Y_pre_control_n, + Y_post_control_n, + Y_pre_treated_mean_n, + Y_post_treated_mean_n, + n_treated=len(treated_units), + zeta_omega=zeta_omega_n, + zeta_lambda=zeta_lambda_n, + min_decrease=min_decrease, + replications=self.n_bootstrap, + w_control=w_control, + ) se = se_n * Y_scale placebo_effects = np.asarray(placebo_effects_n) * Y_scale inference_method = "placebo" @@ -915,8 +1251,31 @@ def fit( # type: ignore[override] treated_post_trajectory=treated_post_trajectory, time_weights_array=time_weights, ) - self.results_._loo_unit_ids = loo_unit_ids - self.results_._loo_roles = loo_roles + # Explicit LOO granularity flag for ``get_loo_effects_df``. The + # non-survey and pweight-only jackknife paths run unit-level LOO + # (one estimate per unit, matching ``control_unit_ids + + # treated_unit_ids``); the full-design survey jackknife runs + # PSU-level LOO and returns a flat PSU-indexed replicate array. + # Unit-level positional join onto ``_loo_unit_ids`` is well- + # defined only for the unit-level path. + if inference_method == "jackknife": + self.results_._loo_granularity = ( + "psu" if _jackknife_use_survey_path else "unit" + ) + else: + self.results_._loo_granularity = None + # Only populate unit-level LOO bookkeeping when the granularity + # is actually unit-level (R7 P3). 
Leaving ``_loo_unit_ids`` / + # ``_loo_roles`` populated on the PSU path would cause + # ``_loo_unit_ids is not None`` availability checks (e.g., + # ``practitioner.py`` / canned guidance) to call + # ``get_loo_effects_df()`` and hit ``NotImplementedError``. + if self.results_._loo_granularity == "unit": + self.results_._loo_unit_ids = loo_unit_ids + self.results_._loo_roles = loo_roles + else: + self.results_._loo_unit_ids = None + self.results_._loo_roles = None self.results_._fit_snapshot = fit_snapshot self._unit_weights = unit_weights @@ -1391,13 +1750,12 @@ def _placebo_variance_se( # Ensure we have enough controls for the split n_pseudo_control = n_control - n_treated if n_pseudo_control < 1: - # Fallback guidance. Placebo and jackknife reject strata/PSU/FPC, - # but bootstrap (PR #352) supports both pweight-only and - # full-design surveys, so it's always a valid fallback. + # Fallback guidance. All three variance methods support + # pweight-only and full-design surveys (PR #355 and this PR). fallback = ( - "variance_method='bootstrap' (supports pweight-only and " - "strata/PSU/FPC survey designs), variance_method='jackknife' " - "(pweight-only only), or adding more control units" + "variance_method='bootstrap' or 'jackknife' (both support " + "pweight-only and strata/PSU/FPC survey designs), or adding " + "more control units" if w_control is not None else "variance_method='bootstrap', variance_method='jackknife', " "or adding more control units" @@ -1495,13 +1853,14 @@ def _placebo_variance_se( if n_successful < 2: # Same fallback guidance as the pre-replication guard above. - # Bootstrap (PR #352) supports pweight-only + strata/PSU/FPC - # survey designs, so it's always a valid fallback for survey - # users even when placebo fails. 
+ # Bootstrap and jackknife both support pweight-only + full + # strata/PSU/FPC survey designs, so either is a valid + # fallback for survey users (though jackknife is anti- + # conservative with few PSUs per stratum — see REGISTRY). fallback = ( - "variance_method='bootstrap' (supports pweight-only and " - "strata/PSU/FPC survey designs), variance_method='jackknife' " - "(pweight-only only), or increasing the number of control units" + "variance_method='bootstrap' or 'jackknife' (both support " + "pweight-only and strata/PSU/FPC survey designs), or " + "increasing the number of control units" if w_control is not None else "variance_method='bootstrap' or variance_method='jackknife' " "or increasing the number of control units" @@ -1531,6 +1890,191 @@ def _placebo_variance_se( return se, placebo_estimates + def _placebo_variance_se_survey( + self, + Y_pre_control: np.ndarray, + Y_post_control: np.ndarray, + Y_pre_treated_mean: np.ndarray, + Y_post_treated_mean: np.ndarray, + strata_control: np.ndarray, + treated_strata: np.ndarray, + zeta_omega: float = 0.0, + zeta_lambda: float = 0.0, + min_decrease: float = 1e-5, + replications: int = 200, + w_control: Optional[np.ndarray] = None, + ) -> Tuple[float, np.ndarray]: + """Stratified-permutation placebo variance for survey designs. + + Extends Algorithm 4 of Arkhangelsky et al. (2021) to strata/PSU/FPC + designs by restricting pseudo-treated sampling to controls in the + same stratum as actual treated units (Pesarin 2001 stratified + permutation test). Weighted Frank-Wolfe re-estimates ω and λ per + draw on the pseudo-panel with per-control survey weights flowing + into both the loss and the regularizer. + + The PSU axis is intentionally not randomized — within-stratum unit- + level permutation is the classical stratified permutation test + (Pesarin 2001 Ch. 3-4). PSU-level permutation on few PSUs (2-8 + typical) produces near-degenerate permutation support and poor + power. 
Asymmetry with the jackknife allocator (which respects PSU) + is by design; see REGISTRY.md §SyntheticDiD "Allocator asymmetry". + + Parameters + ---------- + Y_pre_control, Y_post_control : np.ndarray + Control outcomes, shapes (n_pre, n_control) / (n_post, n_control). + Y_pre_treated_mean, Y_post_treated_mean : np.ndarray + Survey-weighted treated means, shapes (n_pre,) / (n_post,). + strata_control : np.ndarray + Per-control stratum labels (already resolved via + ``collapse_survey_to_unit_level``), shape (n_control,). + treated_strata : np.ndarray + Per-treated-unit stratum labels, shape (n_treated,). + zeta_omega, zeta_lambda, min_decrease : float + Weighted-FW hyperparameters (already normalized by Y_scale). + replications : int, default 200 + Number of placebo draws. + w_control : np.ndarray + Per-control survey weights, shape (n_control,). Required for + survey path (passed through from fit-time resolved weights). + + Returns + ------- + tuple + ``(se, placebo_effects)`` where ``se = sqrt((r-1)/r) * std(...)`` + (Algorithm 4 SE formula) and ``placebo_effects`` is the array + of successful pseudo-τ̂ values. + + References + ---------- + Arkhangelsky et al. (2021), *American Economic Review*, Algorithm 4. + Pesarin (2001), *Multivariate Permutation Tests*, Ch. 3-4. + Pesarin & Salmaso (2010), *Permutation Tests for Complex Data*. 
+ """ + rng = np.random.default_rng(self.seed) + + # Build per-stratum control index map (strata containing treated units) + unique_treated_strata, treated_counts_per_stratum = np.unique( + treated_strata, return_counts=True + ) + control_idx_per_stratum: Dict[Any, np.ndarray] = {} + for h in unique_treated_strata: + control_idx_per_stratum[h] = np.where(strata_control == h)[0] + + placebo_estimates = [] + + for _ in range(replications): + try: + pseudo_treated_parts = [] + for h, n_treated_h in zip( + unique_treated_strata, treated_counts_per_stratum + ): + controls_in_h = control_idx_per_stratum[h] + pseudo_treated_h = rng.choice( + controls_in_h, size=int(n_treated_h), replace=False + ) + pseudo_treated_parts.append(pseudo_treated_h) + pseudo_treated_idx = np.concatenate(pseudo_treated_parts) + + # Pseudo-control = all controls \ pseudo-treated. Keep the + # non-treated-stratum controls AND the unsampled controls + # within each treated stratum. + sampled_set = set(pseudo_treated_idx.tolist()) + pseudo_control_mask = np.array( + [i not in sampled_set for i in range(len(strata_control))] + ) + pseudo_control_idx = np.where(pseudo_control_mask)[0] + + # Pseudo-panel + Y_pre_pseudo_control = Y_pre_control[:, pseudo_control_idx] + Y_post_pseudo_control = Y_post_control[:, pseudo_control_idx] + pseudo_w_tr = w_control[pseudo_treated_idx] + pseudo_w_co = w_control[pseudo_control_idx] + + # Pseudo-treated means (survey-weighted) + Y_pre_pseudo_treated_mean = np.average( + Y_pre_control[:, pseudo_treated_idx], + axis=1, + weights=pseudo_w_tr, + ) + Y_post_pseudo_treated_mean = np.average( + Y_post_control[:, pseudo_treated_idx], + axis=1, + weights=pseudo_w_tr, + ) + + # Weighted FW for unit weights + pseudo_omega = compute_sdid_unit_weights_survey( + Y_pre_pseudo_control, + Y_pre_pseudo_treated_mean, + rw_control=pseudo_w_co, + zeta_omega=zeta_omega, + min_decrease=min_decrease, + ) + + # Compose ω_eff = rw · ω / Σ(rw · ω). 
Zero-mass guard: + # degenerate draw where FW sparsified onto zero-survey- + # weight controls; retry (same convention as bootstrap + # PR #355 R12 P1). + omega_scaled = pseudo_w_co * pseudo_omega + total = omega_scaled.sum() + if total <= 0: + continue + omega_eff = omega_scaled / total + + # Weighted FW for time weights + pseudo_lambda = compute_time_weights_survey( + Y_pre_pseudo_control, + Y_post_pseudo_control, + rw_control=pseudo_w_co, + zeta_lambda=zeta_lambda, + min_decrease=min_decrease, + ) + + tau = compute_sdid_estimator( + Y_pre_pseudo_control, + Y_post_pseudo_control, + Y_pre_pseudo_treated_mean, + Y_post_pseudo_treated_mean, + omega_eff, + pseudo_lambda, + ) + if np.isfinite(tau): + placebo_estimates.append(float(tau)) + + except (ValueError, LinAlgError, ZeroDivisionError): + continue + + placebo_estimates_arr = np.array(placebo_estimates) + n_successful = len(placebo_estimates_arr) + + if n_successful < 2: + warnings.warn( + f"Only {n_successful} placebo replications completed successfully " + f"on the survey path. Standard error cannot be estimated reliably. " + "Consider variance_method='bootstrap' (supports the same full " + "design via weighted-FW + Rao-Wu) or rebalance the panel.", + UserWarning, + stacklevel=3, + ) + return 0.0, placebo_estimates_arr + + failure_rate = 1 - (n_successful / replications) + if failure_rate > 0.05: + warnings.warn( + f"Only {n_successful}/{replications} stratified-permutation " + f"placebo replications succeeded ({failure_rate:.1%} failure " + "rate). 
Standard errors may be unreliable.", + UserWarning, + stacklevel=3, + ) + + se = np.sqrt((n_successful - 1) / n_successful) * np.std( + placebo_estimates_arr, ddof=1 + ) + return se, placebo_estimates_arr + def _jackknife_se( self, Y_pre_control: np.ndarray, @@ -1723,6 +2267,380 @@ def _jackknife_se( return se, jackknife_estimates + def _jackknife_se_survey( + self, + Y_pre_control: np.ndarray, + Y_post_control: np.ndarray, + Y_pre_treated: np.ndarray, + Y_post_treated: np.ndarray, + unit_weights: np.ndarray, + time_weights: np.ndarray, + w_control: np.ndarray, + w_treated: np.ndarray, + strata_control: np.ndarray, + strata_treated: np.ndarray, + psu_control: Optional[np.ndarray], + psu_treated: Optional[np.ndarray], + fpc_control: Optional[np.ndarray], + fpc_treated: Optional[np.ndarray], + lonely_psu: str = "remove", + ) -> Tuple[float, np.ndarray]: + """PSU-level leave-one-out jackknife with stratum aggregation. + + Extends Algorithm 3 of Arkhangelsky et al. (2021) to survey designs + via the stratified PSU-level jackknife variance estimator (Rust & + Rao 1996):: + + V_J = Σ_h (1 - f_h) · (n_h - 1)/n_h · Σ_{j∈h} (τ̂_{(h,j)} - τ̄_h)² + + where ``n_h`` is the number of PSUs in stratum h, ``f_h = n_h/N_h`` + is the sampling fraction (0 if no FPC), and ``τ̄_h`` is the + stratum-level mean of LOO estimates. + + Semantics: + * **λ fixed** (not re-estimated per LOO) — matches non-survey + Algorithm 3. Jackknife is variance-approximation; re-estimating + λ per LOO conflates weight-estimation uncertainty (bootstrap's + domain) with sampling uncertainty. + * **ω subset + rw-composed-renormalize** (not re-estimated) — same + rationale. Control units inside the dropped PSU are removed; + remaining ω is composed with remaining survey weights and + renormalized. + * **Strata with n_h < 2 are silently skipped** (lonely-PSU case, + matches R ``survey::svyjkn``). They contribute 0 to the total + variance. If every stratum is skipped, ``SE=NaN`` with a + ``UserWarning``. 
+ * **Undefined LOOs within a contributing stratum → SE=NaN.** The + Rust & Rao formula requires every PSU-LOO in a contributing + stratum (``n_h ≥ 2``) to produce a defined ``τ̂_{(h,j)}``. If + any single LOO is undefined — (a) deletion removes all treated + units, (b) kept ``ω_eff`` mass is zero, (c) kept treated + survey mass is zero, (d) the SDID estimator raises or returns + non-finite τ̂ — the overall SE is undefined and the method + returns ``NaN`` with a targeted ``UserWarning`` naming the + stratum / PSU / reason. Silently skipping the missing LOO + while still applying the full ``(n_h-1)/n_h`` factor would + systematically under-scale variance (silently wrong SE). + + PSU-None fallback: if ``psu_control is None``, each unit is treated + as its own PSU within its stratum (matches PR #355 R8 P1 + implicit-PSU Rao-Wu semantics). + + Parameters + ---------- + Y_pre_control, Y_post_control : np.ndarray + Control outcomes. + Y_pre_treated, Y_post_treated : np.ndarray + Treated outcomes (raw per-unit, not averaged). + unit_weights, time_weights : np.ndarray + Fit-time ω, λ (kept fixed across LOOs). + w_control, w_treated : np.ndarray + Per-unit survey weights. + strata_control, strata_treated : np.ndarray + Per-unit stratum labels. + psu_control, psu_treated : np.ndarray, optional + Per-unit PSU labels. ``None`` → each unit is its own PSU. + fpc_control, fpc_treated : np.ndarray, optional + Per-unit FPC values (stratum-constant population counts from + ``survey.py::SurveyDesign.resolve``, validated constant within + each stratum). + + Returns + ------- + tuple + ``(se, tau_loo_all)`` where ``se`` is the stratum-aggregated + jackknife SE (NaN if every stratum was skipped) and + ``tau_loo_all`` is the flat array of successful LOO estimates + (not grouped per stratum). + + References + ---------- + Arkhangelsky et al. (2021), *American Economic Review*, Algorithm 3. 
+ Rust & Rao (1996), *Statistical Methods in Medical Research*, 5(3), + 283-310, "Variance Estimation for Complex Surveys Using Replication + Techniques". + """ + n_control = Y_pre_control.shape[1] + n_treated = Y_pre_treated.shape[1] + + # Build unit-level (stratum, psu, fpc, is_control, local_idx) index. + # ``local_idx`` is the position in its arm (control_idx in [0,n_c) + # or treated_idx in [0,n_t)). We loop over (stratum, psu) groups. + if psu_control is None: + # Each control unit is its own PSU — use the control's own index. + psu_control_eff = np.arange(n_control, dtype=np.int64) + else: + psu_control_eff = np.asarray(psu_control) + if psu_treated is None: + psu_treated_eff = np.arange(n_control, n_control + n_treated, dtype=np.int64) + else: + psu_treated_eff = np.asarray(psu_treated) + + # Per-stratum PSU enumeration. PSU labels are globally unique + # within strata by ``SurveyDesign.resolve`` (see survey.py + # L308-L320 ``nest=False`` validation), so a (stratum, psu) pair + # uniquely identifies a PSU. + unique_strata_all = np.unique( + np.concatenate([strata_control, strata_treated]) + ) + + # Short-circuit: unstratified single-PSU design. ``strata_*`` arrays + # are always populated after ``_resolve_survey_for_fit``, so a + # single-stratum + single-PSU design is detectable as one unique + # PSU across both arms. + all_psus = np.concatenate([psu_control_eff, psu_treated_eff]) + if len(unique_strata_all) == 1 and len(np.unique(all_psus)) < 2: + return np.nan, np.array([]) + + # Precompute fixed-ω composition for the FULL sample (for LOOs that + # drop only treated PSUs — control ω/w_control unchanged). + omega_eff_full = unit_weights * w_control + if omega_eff_full.sum() <= 0: + # Fit-time guard should have caught this, but double-check for + # defense-in-depth. 
+ warnings.warn( + "Jackknife survey SE cannot be computed: the effective " + "control omega mass (ω · w_control) sums to zero.", + UserWarning, + stacklevel=3, + ) + return np.nan, np.array([]) + omega_eff_full = omega_eff_full / omega_eff_full.sum() + + total_variance = 0.0 + tau_loo_all: List[float] = [] + any_stratum_contributed = False + # Undefined-replicate tracking (PR #365 R1 P0 fix). The Rust & Rao + # (1996) formula assumes every sampled PSU within a contributing + # stratum has a defined delete-one replicate `τ̂_{(h,j)}`. If any + # LOO within a contributing stratum (n_h ≥ 2) is undefined — e.g., + # all treated units are in that PSU, or the kept ω_eff mass is + # zero, or the SDID estimator raises — the stratified SE formula + # does not apply and the overall SE is undefined. Return NaN + # rather than silently skipping the missing replicate while still + # applying the full (n_h-1)/n_h factor (which would underscale). + undefined_replicate_stratum: Optional[Any] = None + undefined_replicate_psu: Optional[Any] = None + undefined_replicate_reason: str = "" + + for h in unique_strata_all: + # PSUs in stratum h (across both arms) + control_in_h_mask = strata_control == h + treated_in_h_mask = strata_treated == h + psus_in_h_control = psu_control_eff[control_in_h_mask] + psus_in_h_treated = psu_treated_eff[treated_in_h_mask] + psus_in_h = np.unique( + np.concatenate([psus_in_h_control, psus_in_h_treated]) + ) + n_h = len(psus_in_h) + if n_h < 2: + # Singleton-stratum handling. R12 P1 fix: distinguish + # ``"certainty"`` from ``"remove"`` semantics. Both end + # up adding zero variance for this stratum, but + # ``"certainty"`` is an *explicit* zero-variance + # contributor (the stratum is sampled with certainty, + # so no sampling variance — this is a documented + # legitimate zero, not a "skipped/undefined" case). 
+ # Mark as contributing so the all-singleton design + # under ``"certainty"`` returns ``SE = 0.0`` instead + # of falling through to the "every stratum was + # skipped → NaN" branch (matches `survey.py`'s + # ``test_all_certainty_psu_zero_vcov`` contract). + # ``"remove"`` continues to silently skip — matches R + # ``survey::svyjkn`` lonely-PSU="remove". + if lonely_psu == "certainty": + any_stratum_contributed = True + continue + + # Per-stratum FPC. ``fpc_*`` arrays are stratum-constant by + # SurveyDesign.resolve (survey.py L343-L347). Read from either + # arm; prefer control if any controls in the stratum. + if fpc_control is not None and control_in_h_mask.any(): + fpc_h = float(fpc_control[control_in_h_mask][0]) + f_h = n_h / fpc_h if fpc_h > 0 else 0.0 + elif fpc_treated is not None and treated_in_h_mask.any(): + fpc_h = float(fpc_treated[treated_in_h_mask][0]) + f_h = n_h / fpc_h if fpc_h > 0 else 0.0 + else: + f_h = 0.0 + + # R6 P1 fix: full-census short-circuit. When f_h >= 1 the + # Rust & Rao factor ``(1 - f_h) <= 0`` zeros this stratum's + # contribution to total variance regardless of within- + # stratum dispersion. Skip the delete-one feasibility loop + # entirely — otherwise an undefined LOO inside a full- + # census stratum (e.g., all treated in the dropped PSU) + # would mistakenly short-circuit the whole design to + # ``SE=NaN``, even though the stratum contributes zero by + # legitimate design. Mark as contributing (so the overall + # result returns ``SE=0`` or a finite non-zero from other + # strata, not ``NaN`` from the "no stratum contributed" + # branch). 
+ if f_h >= 1.0: + any_stratum_contributed = True + continue + + tau_loo_h: List[float] = [] + stratum_has_undefined_replicate = False + for j in psus_in_h: + # Mask: kept units across both arms + control_kept_mask = psu_control_eff != j + treated_kept_mask = psu_treated_eff != j + + # If this PSU contains no units in either arm, it cannot + # produce a meaningful LOO (and should not have been + # enumerated); treat as undefined for defensive consistency. + if control_kept_mask.all() and treated_kept_mask.all(): + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + undefined_replicate_psu = j + undefined_replicate_reason = ( + "PSU contains no units in either arm" + ) + break + + # All treated removed → LOO yields an undefined SDID + # estimator (no treated mean to compare). The Rust & Rao + # formula expects τ̂_{(h,j)} defined for every j; skipping + # this PSU while keeping the (n_h-1)/n_h factor would + # underscale variance (R1 P0). + if not treated_kept_mask.any(): + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + undefined_replicate_psu = j + undefined_replicate_reason = ( + "deletion removes all treated units (no treated " + "mean for the LOO SDID estimator)" + ) + break + + # Control ω composition on kept controls + omega_kept = unit_weights[control_kept_mask] + w_control_kept = w_control[control_kept_mask] + omega_eff_kept = omega_kept * w_control_kept + if omega_eff_kept.sum() <= 0: + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + undefined_replicate_psu = j + undefined_replicate_reason = ( + "kept omega_eff mass is zero (all remaining " + "controls have zero fit-time or survey weight)" + ) + break + omega_eff_kept = omega_eff_kept / omega_eff_kept.sum() + + # Treated mean on kept treated units (survey-weighted) + w_treated_kept = w_treated[treated_kept_mask] + if w_treated_kept.sum() <= 0: + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + 
undefined_replicate_psu = j + undefined_replicate_reason = ( + "kept treated survey mass is zero" + ) + break + Y_pre_t_mean = np.average( + Y_pre_treated[:, treated_kept_mask], + axis=1, + weights=w_treated_kept, + ) + Y_post_t_mean = np.average( + Y_post_treated[:, treated_kept_mask], + axis=1, + weights=w_treated_kept, + ) + + try: + tau_j = compute_sdid_estimator( + Y_pre_control[:, control_kept_mask], + Y_post_control[:, control_kept_mask], + Y_pre_t_mean, + Y_post_t_mean, + omega_eff_kept, + time_weights, + ) + except (ValueError, LinAlgError, ZeroDivisionError): + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + undefined_replicate_psu = j + undefined_replicate_reason = ( + "SDID estimator raised on the LOO panel" + ) + break + + if not np.isfinite(tau_j): + stratum_has_undefined_replicate = True + undefined_replicate_stratum = h + undefined_replicate_psu = j + undefined_replicate_reason = ( + "SDID estimator returned non-finite τ̂" + ) + break + tau_loo_h.append(float(tau_j)) + + if stratum_has_undefined_replicate: + # Record the partial LOOs for the returned array (useful + # for debugging) but stop accumulating variance — the + # stratified Rust & Rao formula requires all n_h + # replicates. + tau_loo_all.extend(tau_loo_h) + break + + if len(tau_loo_h) == n_h: + tau_bar_h = np.mean(tau_loo_h) + ss_h = float( + np.sum((np.asarray(tau_loo_h) - tau_bar_h) ** 2) + ) + total_variance += (1.0 - f_h) * (n_h - 1) / n_h * ss_h + any_stratum_contributed = True + tau_loo_all.extend(tau_loo_h) + + tau_loo_arr = np.asarray(tau_loo_all) + if undefined_replicate_stratum is not None: + # R1 P0 fix: Rust & Rao's stratified jackknife formula requires + # every LOO within a contributing stratum to be defined. When + # one is missing, the design is not covered by the formula and + # the SE is undefined; returning a finite value on the + # remaining replicates (still multiplied by (n_h-1)/n_h) would + # systematically under-scale variance. 
+ warnings.warn( + "Jackknife survey SE is undefined: delete-one replicate " + f"for stratum {undefined_replicate_stratum} PSU " + f"{undefined_replicate_psu} is not computable " + f"({undefined_replicate_reason}). The stratified Rust & " + "Rao (1996) jackknife formula requires τ̂_{(h,j)} defined " + "for every j in every contributing stratum. Returning " + "SE=NaN. Consider variance_method='bootstrap' (supports " + "the same full design without a per-LOO feasibility " + "constraint) or rebalance the panel so every PSU has at " + "least one treated and one non-zero-mass control unit " + "after deletion.", + UserWarning, + stacklevel=3, + ) + return np.nan, tau_loo_arr + if not any_stratum_contributed: + warnings.warn( + "Jackknife survey SE is undefined because every stratum " + "was skipped (insufficient PSUs per stratum for variance " + "contribution). Returning SE=NaN. Consider " + "variance_method='bootstrap' (supports the same full " + "design) or rebalance the panel.", + UserWarning, + stacklevel=3, + ) + return np.nan, tau_loo_arr + + # R5 P1 fix: legitimate zero variance (e.g., full-census FPC with + # f_h = 1 for every contributing stratum → (1 - f_h) = 0 factor + # zeros the contribution even when within-stratum dispersion is + # non-zero; or exact-zero within-stratum dispersion when all + # LOOs produce identical τ̂). Rust & Rao gives V_J = 0, not + # undefined. Reserve NaN for the "all strata skipped" / + # undefined-replicate cases above; compute SE = 0 otherwise. 
+ variance_nonneg = max(total_variance, 0.0) + return float(np.sqrt(variance_nonneg)), tau_loo_arr + def get_params(self) -> Dict[str, Any]: """Get estimator parameters.""" return { diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 44da7543..35d97579 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -1549,21 +1549,25 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi - **Jackknife with single nonzero-weight control**: Returns NaN SE. Leaving out the only effective control is not meaningful. - **Jackknife with non-finite LOO estimate**: Returns NaN SE. Unlike bootstrap/placebo, jackknife is deterministic and cannot skip failed iterations; NaN propagates through `var()` (matches R behavior). - **Jackknife with survey weights**: Guards on effective positive support (omega * w_control > 0 and w_treated > 0) after composition, not raw FW counts. Returns NaN SE if fewer than 2 effective controls or 2 positive-weight treated units. Per-iteration zero-sum guards return NaN for individual LOO iterations when remaining composed weights sum to zero. -- **Note (survey support matrix — PR #352):** +- **Note (survey support matrix):** | variance_method | pweight-only | strata/PSU/FPC | |-----------------|:------------:|:--------------:| - | `bootstrap` | ✓ weighted FW | ✓ weighted FW + Rao-Wu rescaling | - | `placebo` | ✓ | ✗ NotImplementedError (separate gap) | - | `jackknife` | ✓ | ✗ NotImplementedError (separate gap) | + | `bootstrap` | ✓ weighted FW | ✓ weighted FW + Rao-Wu rescaling (PR #355) | + | `placebo` | ✓ | ✓ stratified permutation + weighted FW | + | `jackknife` | ✓ | ✓ PSU-level LOO with stratum aggregation | **Pweight-only path** (placebo / jackknife / bootstrap): treated-side means are survey-weighted (Frank-Wolfe target and ATT formula); control-side synthetic weights are composed with survey weights post-optimization (ω_eff = ω * w_co, renormalized). 
Fit-time Frank-Wolfe is unweighted — survey importance enters after trajectory-matching. Covariate residualization uses WLS with survey weights. - **Bootstrap survey path** (PR #352): for pweight-only the per-draw FW uses constant `rw = w_control`; for full design (strata/PSU/FPC) the per-draw `rw = generate_rao_wu_weights(resolved_survey, rng)` rescaling is composed with the same weighted-FW kernel. See "Note (survey + bootstrap composition)" below for the full objective and the argmin-set caveat. + **Bootstrap survey path** (PR #355): for pweight-only the per-draw FW uses constant `rw = w_control`; for full design (strata/PSU/FPC) the per-draw `rw = generate_rao_wu_weights(resolved_survey, rng)` rescaling is composed with the same weighted-FW kernel. See "Note (survey + bootstrap composition)" below for the full objective and the argmin-set caveat. - **Placebo / jackknife full-design rejection**: separate methodology gap. Placebo permutes control indices (the resampling unit is a control unit, not a PSU); jackknife leaves out one unit at a time. Both allocators need their own weighted derivations to compose with strata/PSU; tracked in TODO.md as a follow-up. -- **Note (default variance_method deviation from R):** R's `synthdid::vcov()` defaults to `method="bootstrap"`; our `SyntheticDiD.__init__` defaults to `variance_method="placebo"`. Library deviation rationale: (a) placebo's default unconditional availability across all survey configurations (full design supported on bootstrap only); (b) placebo avoids the ~5–30× per-draw Frank-Wolfe refit slowdown. Users can opt into R's default with `variance_method="bootstrap"`. Placebo (Algorithm 4) and bootstrap (Algorithm 2 step 2) both track nominal calibration in the committed coverage MC; see the calibration table below. -- **Note (survey + bootstrap composition — PR #352):** Restored capability. 
The bootstrap survey path solves the **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in loss and regularization. For unit weights: + **Placebo survey path**: for pweight-only the existing Algorithm 4 flow applies with survey-weighted pseudo-treated means + post-hoc ω_eff composition. For designs with explicit `strata` and/or `psu` the allocator switches to **stratified permutation** (Pesarin 2001): pseudo-treated indices are drawn within each stratum containing actual treated units; weighted-FW re-estimates ω and λ per draw with per-control survey weights threaded into both loss and regularization. See "Note (survey + placebo composition)" below. **FPC is a documented no-op for placebo** — permutation tests are conditional on the observed sample (Pesarin 2001 §1.5), so the sampling fraction does not enter Algorithm 4 or its survey extension; an `fpc=` column on a placebo fit emits a `UserWarning` and is preserved in the design metadata but never enters the variance computation. Routing is gated on `strata` / `psu` only — FPC alone does not flip dispatch from the non-survey to the survey placebo path. + + **Jackknife survey path**: for pweight-only the existing Algorithm 3 flow applies (unit-level LOO with subset + rw-composed-renormalized ω; λ fixed). For full design the allocator switches to **PSU-level LOO with stratum aggregation** (Rust & Rao 1996): leave out one PSU at a time within each stratum, aggregate as `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²`. See "Note (survey + jackknife composition)" below. + + **Allocator asymmetry** (placebo ignores PSU axis; jackknife respects it): intentional. Placebo is a null-distribution test — within-stratum unit-level permutation is the classical stratified permutation test (Pesarin 2001 Ch. 3-4); PSU-level permutation on few PSUs (2-8 typical for survey designs) produces near-degenerate permutation support and poor power. 
Jackknife is a design-based variance approximation — PSU-level LOO within strata is the canonical survey jackknife (Rust & Rao 1996); unit-level LOO under clustering would underestimate SE. Both allocators respect strata (the primary survey-design axis). Neither is "right" in all dimensions; each is the defensible analog for its hypothesis-testing vs variance-approximation role.
+- **Note (default variance_method deviation from R):** R's `synthdid::vcov()` defaults to `method="bootstrap"`; our `SyntheticDiD.__init__` defaults to `variance_method="placebo"`. Library deviation rationale: placebo avoids the ~5–30× per-draw Frank-Wolfe refit slowdown. All three variance methods (placebo, bootstrap, jackknife) now support both pweight-only and full strata/PSU/FPC survey designs (see the survey support matrix above); users can opt into R's default with `variance_method="bootstrap"`, which is also the recommended choice on surveys with few PSUs per stratum (jackknife is anti-conservative in that regime per the "Note (survey + jackknife composition)" below). Placebo (Algorithm 4) and bootstrap (Algorithm 2 step 2) both track nominal calibration in the committed coverage MC; see the calibration table below.
+- **Note (survey + bootstrap composition — PR #355):** Restored capability. The bootstrap survey path solves the **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in loss and regularization. For unit weights:
 ```
 min_{ω simplex} Σ_t (Σ_i rw_i · ω_i · Y_i,pre[t] - treated_pre[t])² + ζ²·Σ_i rw_i · ω_i²
 ```
@@ -1575,8 +1579,54 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 - pweight-only → `rw = w_control[boot_idx_control]` (constant per draw, no Rao-Wu).
 - full design → `rw = generate_rao_wu_weights(resolved_survey_unit, rng)` per draw, sliced over the resampled units. Rao-Wu rescales weights by `(n_h/m_h)·r_hi` within each stratum; degenerate-retry on zero-mass control or treated draws.
- **single-PSU short-circuit**: unstratified single-PSU designs return NaN SE (resampling one PSU yields the same subset every draw — bootstrap distribution is unidentified).
+- **Note (survey + placebo composition):** Stratified-permutation allocator composed with the same weighted Frank-Wolfe kernel from the bootstrap survey path. Each placebo draw:
+ 1. For each stratum `h` containing actual treated units, draws `n_treated_h` pseudo-treated indices uniformly without replacement from `controls_in_h`. Non-treated strata contribute their controls unconditionally to the pseudo-control set.
+ 2. Pseudo-treated means are survey-weighted: `Y_pseudo_t = np.average(Y[:, pseudo_treated_idx], axis=1, weights=w_control[pseudo_treated_idx])`.
+ 3. Weighted Frank-Wolfe re-estimates ω and λ on the pseudo-panel using `compute_sdid_unit_weights_survey(rw_control=w_control[pseudo_control_idx], ...)` and `compute_time_weights_survey(...)`. Post-optimization composition `ω_eff = rw·ω/Σ(rw·ω)` with zero-mass retry.
+ 4. SDID estimator on the pseudo-panel; Algorithm 4 SE `sqrt((r-1)/r)·std(placebo_estimates, ddof=1)`.
+
+ **Fit-time feasibility guards** (per `feedback_front_door_over_retry_swallow.md`): four distinct failure cases are rejected *before* entering the retry loop, each with a targeted `ValueError`:
+ * **Case B** (`n_controls_h == 0` for some treated-containing stratum): the stratum has treated units but no controls — no pseudo-treated set can be drawn.
+ * **Case C** (`0 < n_controls_h < n_treated_h`): the stratum has fewer controls than treated units, so exact-count without-replacement sampling is impossible.
+ * **Case E** (row-count guards passed but `n_positive_weight_controls_h < n_treated_h`): the stratum has enough raw controls but too few have positive survey weight.
Since the pseudo-treated mean uses `np.average(Y, weights=w_control[idx])`, draws can pick all-zero-weight subsets (ZeroDivisionError on np.average) and the retry loop would swallow them as a generic ``n_successful=0`` warning + ``SE=0.0``. + * **Case D** (effective single-support — *every* treated stratum collapses to one positive-mass mean): two shapes trigger this. **(D-classical)** `n_controls_h == n_treated_h` so the without-replacement permutation has only one subset. **(D-effective)** `n_c_h > n_t_h` (raw count allows multiple subsets) but `n_positive_weight_controls_h < 2` — every successful pseudo-treated mean reduces to the unique positive-weight control's outcome (zero-weight cohabitants contribute 0 to numerator and denominator). Both shapes give a degenerate null (SE = FP noise ~1e-16). Non-degeneracy requires **both** `n_c_h > n_t_h` AND `n_positive_weight_controls_h >= 2` for at least one treated stratum. + + Partial-permutation fallback is rejected for all four cases — it would silently change the null distribution and produce an incoherent test. + + **Scope note — what is NOT randomized:** the stratum marginal is preserved exactly by construction (each draw pulls the same count per treated stratum). The PSU axis is not randomized (permutation is unit-level within strata). This is conservative under clustering (ignores within-stratum PSU correlation in the null) but aligns with the classical stratified permutation test literature. See Pesarin (2001) *Multivariate Permutation Tests*, Ch. 3-4; Pesarin & Salmaso (2010) *Permutation Tests for Complex Data*. + + **Validation:** no external R/Julia parity anchor (neither package defines survey-weighted SDID placebo). 
Correctness rests on: (a) stratum-membership contract enforced by construction + per-draw `rng.choice` interception regression that captures every actual sampled `pseudo_treated_idx` and asserts each sampled control's stratum membership ⊆ treated-strata set, (b) Case B / C / D / E front-door guards with targeted-message regression tests, (c) SE-differs-from-pweight-only cross-surface sanity, (d) deterministic-dispatch regression.
+
+- **Note (survey + jackknife composition):** PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). For a design with strata `h = 1..H` and PSUs `j = 1..n_h` within each stratum:
+
+ ```
+ SE² = Σ_h (1 - f_h) · (n_h - 1)/n_h · Σ_{j∈h} (τ̂_{(h,j)} - τ̄_h)²
+ ```
+
+ where `τ̂_{(h,j)}` is the SDID estimator computed after leaving out all units in PSU `j` of stratum `h`; `τ̄_h` is the stratum-level mean of the `n_h` LOO estimates; `f_h = n_h_sampled / fpc[h]` is the per-stratum sampling fraction. FPC is stored as a per-unit **population-count** array by `SurveyDesign.resolve` (see `survey.py:338-356`, which rejects `fpc_h < n_psu_h` at validation, since the stored population count must be at least the sampled PSU count), so `f_h` is recovered by `f_h = n_h / fpc[strata == h][0]`. No FPC → `f_h = 0`.
+
+ **Fixed weights per LOO:** matches Algorithm 3 of Arkhangelsky et al. (2021). ω is subsetted over kept controls, composed with kept `w_control`, renormalized (`ω_eff_kept = rw·ω / Σ(rw·ω)`); λ is held at the fit-time value. Rationale: jackknife is a design-based variance approximation, not a refit-variance bootstrap. Re-estimating λ or ω per LOO would conflate weight-estimation uncertainty (bootstrap's domain) with sampling uncertainty (jackknife's domain).
+
+ **Undefined-replicate handling** (return NaN, do NOT silently skip): the Rust & Rao formula requires `τ̂_{(h,j)}` to be defined for every PSU `j` in every contributing stratum.
If any single LOO in a contributing stratum (`n_h ≥ 2`) is not computable — (a) deletion removes all treated units (e.g., all treated in one PSU), (b) `ω_eff_kept.sum() ≤ 0` after composition, (c) `w_treated_kept.sum() ≤ 0`, (d) the SDID estimator raises or returns non-finite τ̂ — the overall SE is **undefined** and the method returns `SE=NaN` with a targeted `UserWarning` naming the stratum / PSU / reason. Silently skipping the missing LOO while still applying the `(n_h-1)/n_h` factor would systematically under-scale variance (silently wrong SE). Users needing a variance estimator that accommodates PSU-deletion infeasibility should use `variance_method="bootstrap"`, whose pairs-bootstrap has no per-LOO feasibility constraint. + + **Zero-variance vs undefined distinction:** when every stratum contributes but `total_variance == 0.0` by legitimate design — full-census FPC (`f_h = 1` → `(1 - f_h) = 0` zeros the contribution even when within-stratum dispersion is non-zero) or exact-zero within-stratum dispersion — the jackknife SE is **zero**, not undefined. `_jackknife_se_survey` returns `SE = 0.0` in that case. `SE = NaN` is reserved for the truly-undefined cases documented above (all strata skipped; any undefined delete-one replicate). + + **`lonely_psu` contract:** `SurveyDesign(lonely_psu="remove")` (default) and `"certainty"` are both accepted, but with **different semantics** when *every* stratum is singleton: + * `"remove"` silently skips singleton strata (matches R `survey::svyjkn` — they're dropped from the variance computation). If every stratum is skipped, returns ``SE = NaN`` with the "every stratum was skipped" warning (no contributing stratum, undefined). + * `"certainty"` treats singleton strata as **explicit zero-variance contributors** (sampled with certainty, no sampling variance). 
Singleton strata still contribute 0 to total variance, but the stratum counts as "contributing" to the overall design — so an all-singleton design returns ``SE = 0.0`` (legitimate zero variance), not NaN. Mirrors `compute_survey_vcov`'s ``test_all_certainty_psu_zero_vcov`` contract for other estimators.
+
+ `lonely_psu="adjust"` (R's overall-mean fallback) is **not yet supported** on the SDID jackknife path and raises `NotImplementedError` at fit-time; users needing that semantic should pick `variance_method="bootstrap"` (which supports all three modes via the weighted-FW + Rao-Wu path) or switch the design to `"remove"` / `"certainty"`.
+
+ **Stratum-skip handling** (silent, documented): strata with `n_h < 2` are silently skipped (stratum-level variance unidentified — the `lonely-PSU` case in R `survey::svyjkn`). If every stratum is skipped, returns `SE=NaN` with a separate `UserWarning`. PSU-None designs: each unit is treated as its own PSU within its stratum (matches the implicit-PSU convention established in PR #355 R8 P1). Unstratified single-PSU short-circuits to `SE=NaN`.
+
+ **Scope note — what is NOT randomized:** stratum membership and PSU composition are fixed by design. The formula only captures within-stratum variation; between-stratum variance is absorbed into the analytical Taylor-series-linearization (TSL) / design assumption. This is canonical survey-jackknife behavior (Rust & Rao 1996) and matches R's `survey::svyjkn` under stratified designs.
+
+ **Known limitation — anti-conservatism with few PSUs per stratum:** with `n_h = 2` per stratum (the minimum for variance identifiability), within-stratum jackknife has only 1 effective DoF per stratum — a well-documented limitation of the stratified jackknife formula. On the coverage MC `stratified_survey` DGP (2 PSUs × 2 strata), rejection at α=0.05 is 0.450 and `se_over_truesd ≈ 0.46`. **Users needing tight SE calibration with few PSUs should prefer `variance_method="bootstrap"`**, which validates at near-nominal calibration on the same DGP.
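The stratum-aggregation arithmetic above can be checked standalone. A minimal numpy sketch with made-up inputs (a hypothetical helper, not the library's `_jackknife_se_survey`, and without its undefined-replicate or `lonely_psu` machinery):

```python
import numpy as np

def stratified_jackknife_se(tau_loo_by_stratum, f_by_stratum=None):
    """SE^2 = sum_h (1 - f_h) * (n_h - 1)/n_h * sum_j (tau_(h,j) - taubar_h)^2.

    tau_loo_by_stratum: {stratum: delete-one-PSU estimates tau_(h,j)}
    f_by_stratum:       {stratum: sampling fraction f_h}; missing -> 0.0 (no FPC)
    """
    f_by_stratum = f_by_stratum or {}
    total_var = 0.0
    for h, taus in tau_loo_by_stratum.items():
        taus = np.asarray(taus, dtype=float)
        n_h = taus.size
        if n_h < 2:  # lonely PSU: skipped here, i.e. "remove" semantics
            continue
        ss_h = float(np.sum((taus - taus.mean()) ** 2))
        total_var += (1.0 - f_by_stratum.get(h, 0.0)) * (n_h - 1) / n_h * ss_h
    return float(np.sqrt(max(total_var, 0.0)))

# Two strata, two PSUs each, no FPC:
se = stratified_jackknife_se({1: [1.0, 1.4], 2: [0.9, 1.1]})
# Full-census stratum (f_h = 1) contributes zero regardless of dispersion:
se_census = stratified_jackknife_se({1: [1.0, 2.0]}, {1: 1.0})
```

The full-census call returns exactly `0.0`, mirroring the zero-variance vs undefined distinction documented above.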
+ + **Validation:** (a) hand-computed 2-stratum FPC magnitude regression (`test_jackknife_full_design_fpc_reduces_se_magnitude` — asserts `SE_fpc == SE_nofpc · sqrt(1 - f)` at `rtol=1e-10`), (b) self-consistency between the returned SE and the stratum-aggregation formula applied to the returned LOO estimates, (c) single-PSU-stratum skip, (d) all-strata-skipped UserWarning + NaN, (e) unstratified single-PSU short-circuit, (f) deterministic-dispatch regression. + - **Note:** P-value computation is variance-method dependent. Placebo (Algorithm 4) uses the empirical null formula `max(mean(|placebo_effects| ≥ |att|), 1/(r+1))` because permuting control indices generates draws from the null distribution (centered on 0). Bootstrap (Algorithm 2) and jackknife (Algorithm 3) use the analytical p-value from `safe_inference(att, se)` (normal-theory): bootstrap draws are centered on `τ̂` (sampling distribution of the estimator) and jackknife pseudo-values are not null draws, so the empirical null formula is invalid for them. This matches R's `synthdid::vcov()` convention, where variance is returned and inference is normal-theory from the SE. -- **Note (coverage Monte Carlo calibration):** `benchmarks/data/sdid_coverage.json` carries empirical rejection rates across the three variance methods on 4 representative null-panel DGPs (500 seeds × B=200, regenerable via `benchmarks/python/coverage_sdid.py`). The fourth DGP (`stratified_survey`, added in PR #352) validates the survey-bootstrap calibration; bootstrap is the only method evaluated on it because placebo / jackknife reject strata/PSU/FPC at fit-time. Under H0 the nominal rejection rate at each α equals α; rates substantially above α indicate anti-conservatism, rates below indicate over-coverage. 
+- **Note (coverage Monte Carlo calibration):** `benchmarks/data/sdid_coverage.json` carries empirical rejection rates across the three variance methods on 4 representative null-panel DGPs (500 seeds × B=200, regenerable via `benchmarks/python/coverage_sdid.py`). The fourth DGP (`stratified_survey`, added in PR #355) validates the survey-bootstrap calibration; jackknife is also reported with a documented anti-conservatism caveat; placebo is N/A on this DGP because its cohort packs into a single stratum with 0 never-treated units (stratified-permutation allocator is structurally infeasible — see `test_placebo_full_design_raises_on_zero_control_stratum` / `_undersupplied_stratum` for the enforced behavior). Under H0 the nominal rejection rate at each α equals α; rates substantially above α indicate anti-conservatism, rates below indicate over-coverage. | DGP | method | α=0.01 | α=0.05 | α=0.10 | mean SE / true SD | |-----------------------------------------------------------|------------|--------|--------|--------|-------------------| @@ -1590,10 +1640,15 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi | AER §6.3 | bootstrap | 0.010 | 0.040 | 0.078 | 1.05 | | AER §6.3 | jackknife | 0.030 | 0.080 | 0.150 | 0.90 | | stratified_survey (N=40, strata=2, PSU=2/stratum, ICC≈0.84) | bootstrap | 0.024 | 0.058 | 0.094 | 1.13 | + | stratified_survey | jackknife | 0.358 | 0.450 | 0.512 | 0.46 | Reading: **`bootstrap` (paper-faithful refit)** and **`placebo`** both track nominal calibration across all three non-survey DGPs (rates within Monte Carlo noise at 500 seeds; 2σ MC band ≈ 0.02–0.05 at p ≈ 0.05–0.10). **`jackknife`** is slightly anti-conservative on the smaller panels (balanced, AER §6.3) at α=0.05 (rejection 0.112 and 0.080 vs the 0.05 target). Arkhangelsky et al. 
(2021) §6.3 reports mixed jackknife evidence (98% coverage — slightly conservative — under iid, and 93% coverage — slightly anti-conservative — under AR(1) ρ=0.7), so the direction of our observation is consistent with the AR(1) branch of the paper's evidence rather than the iid branch. The `mean SE / true SD` column compares mean estimated SE to the empirical sampling SD of τ̂ across seeds. - **`stratified_survey × bootstrap` (PR #352)**: validates the weighted-FW + Rao-Wu composition added in this PR. Rejection at α=0.05 is 0.058 (inside the calibration gate [0.02, 0.10] widened from a 2σ band to accommodate the high ICC ≈ 0.84 induced by `psu_re_sd=1.5` with only 4 PSUs total). `mean SE / true SD = 1.13` indicates the bootstrap is slightly conservative (overestimates the empirical sampling SD by ~13%) — the safer direction; expected under Rao-Wu rescaling with few PSUs because the per-draw weights inflate variance from the resampling structure on top of the fit-time uncertainty. Placebo and jackknife rows are `null` here because both methods reject strata/PSU/FPC at fit-time (tracked as a separate methodology gap in TODO.md). Bootstrap is the only available variance method for full-design SDID fits in this release. + **`stratified_survey × bootstrap` (PR #355)**: validates the weighted-FW + Rao-Wu composition added in that PR. Rejection at α=0.05 is 0.058 (inside the calibration gate [0.02, 0.10] widened from a 2σ band to accommodate the high ICC ≈ 0.84 induced by `psu_re_sd=1.5` with only 4 PSUs total). `mean SE / true SD = 1.13` indicates the bootstrap is slightly conservative (overestimates the empirical sampling SD by ~13%) — the safer direction; expected under Rao-Wu rescaling with few PSUs because the per-draw weights inflate variance from the resampling structure on top of the fit-time uncertainty. + + **`stratified_survey × jackknife`**: reported with an anti-conservative caveat. 
Rejection at α=0.05 is 0.450 (far outside any reasonable calibration gate) and `se_over_truesd ≈ 0.46`. This is the documented limitation of the stratified PSU-level jackknife formula with `n_h = 2` PSUs per stratum: within-stratum variance has only 1 effective DoF per stratum, and between-stratum variation is absorbed into the design assumption rather than the SE. The bootstrap row on the same DGP demonstrates that the fix is to pick `variance_method="bootstrap"` when the design has few PSUs per stratum. This row is committed for transparency; the methodology Note above (§"Note (survey + jackknife composition)") explicitly flags this regime and recommends bootstrap. + + **`stratified_survey × placebo`**: N/A on this DGP by construction (its cohort packs all treated units into stratum 1, which has 0 never-treated units, so the stratified-permutation allocator raises **Case B** — treated-containing stratum with zero controls — at fit-time; see Case B / C definitions in "Note (survey + placebo composition)" above). The placebo survey path is exercised under feasible structures in `tests/test_survey_phase5.py::TestSDIDSurveyPlaceboFullDesign`; calibration on a placebo-feasible DGP is a future MC extension. The schema smoke test is `TestCoverageMCArtifact::test_coverage_artifacts_present`; regenerate the JSON via `python benchmarks/python/coverage_sdid.py --n-seeds 500 --n-bootstrap 200 --output benchmarks/data/sdid_coverage.json` (~15–40 min on M-series Mac, Rust backend — warm-start convergence makes newer runs faster than the original cold-start one). @@ -1603,7 +1658,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi *Validation diagnostics (post-fit methods on `SyntheticDiDResults`):* - **Trajectories** (`synthetic_pre_trajectory`, `synthetic_post_trajectory`, `treated_pre_trajectory`, `treated_post_trajectory`): retained on results to support plotting and custom fit metrics. 
`synthetic_pre_trajectory = Y_pre_control @ ω_eff`; `treated_pre_trajectory` is the survey-weighted treated mean (matches the Frank-Wolfe target). `pre_treatment_fit` is recoverable as `RMSE(treated_pre_trajectory, synthetic_pre_trajectory)`. -- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `placebo_effects`) to the underlying unit identities. First `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids` — positional ordering that mirrors `_jackknife_se`. `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`. Requires `variance_method='jackknife'`; raises `ValueError` otherwise. +- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `placebo_effects`) to the underlying unit identities. **Unit-level LOO only** — available on the non-survey and pweight-only jackknife paths (classical Algorithm 3: one LOO per unit, first `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids`; `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`). Under the full-design survey jackknife path (PSU-level LOO with stratum aggregation, Rust & Rao 1996), the underlying replicates are PSU-level rather than unit-level — the accessor raises `NotImplementedError` pointing to `result.placebo_effects` for the raw PSU-level replicate array. Dispatch is gated by an explicit `_loo_granularity` flag set at fit-time (`"unit"` vs `"psu"`). Requires `variance_method='jackknife'`; raises `ValueError` otherwise. - **`get_weight_concentration(top_k=5)`**: returns `effective_n = 1/Σω²` (inverse Herfindahl), `herfindahl`, `top_k_share`, `top_k`. Operates on `self.unit_weights` which stores the composed `ω_eff`; for survey-weighted fits the metrics reflect the population-weighted concentration, not the raw Frank-Wolfe solution. 
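The documented concentration definitions (`effective_n = 1/Σω²`, Herfindahl, top-k share) are reproducible from any composed weight vector. A hedged sketch with a hypothetical helper name, not the `get_weight_concentration` accessor itself:

```python
import numpy as np

def weight_concentration(omega, top_k=5):
    """Concentration metrics for a (re)normalized weight vector."""
    omega = np.asarray(omega, dtype=float)
    omega = omega / omega.sum()                 # ensure simplex scaling
    herfindahl = float(np.sum(omega ** 2))
    top_idx = np.argsort(omega)[::-1][:top_k]   # heaviest units first
    return {
        "effective_n": 1.0 / herfindahl,        # inverse Herfindahl
        "herfindahl": herfindahl,
        "top_k_share": float(omega[top_idx].sum()),
        "top_k": top_idx.tolist(),
    }

# Uniform weights over 4 controls: effective_n == 4, top-2 share == 0.5
metrics = weight_concentration([0.25, 0.25, 0.25, 0.25], top_k=2)
```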
- **`in_time_placebo(fake_treatment_periods=None, zeta_omega_override=None, zeta_lambda_override=None)`**: re-slices the pre-window at each fake treatment period and re-fits both ω and λ via Frank-Wolfe. Default sweeps every feasible pre-period (position index `i ≥ 2` so ≥2 pre-fake periods remain for weight estimation, `i ≤ n_pre - 1` so ≥1 post-fake period exists). Credible designs produce near-zero placebo ATTs; departures indicate pre-treatment dynamics the estimator is picking up. - **Note:** Regularization reuses `self.zeta_omega` / `self.zeta_lambda` from the original fit (matches R `synthdid` convention of treating regularization as a property of the fit). `*_override` re-fits with new values. @@ -1624,8 +1679,9 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi - [x] Sparsification: v[v <= max(v)/4] = 0; v = v/sum(v) - [x] Placebo SE formula: sqrt((r-1)/r) * sd(placebo_estimates) - [x] Placebo SE: re-estimates omega and lambda per replication (matching R's update.omega=TRUE, update.lambda=TRUE) -- [x] Bootstrap: paper-faithful Algorithm 2 step 2 — re-estimates ω̂_b and λ̂_b per draw via two-pass sparsified Frank-Wolfe on the resampled panel using the fit-time normalized-scale zeta. Matches R's default `synthdid::vcov(method="bootstrap")` (which rebinds `attr(estimate, "opts")` so the renormalized ω serves only as Frank-Wolfe initialization). Survey designs (pweight-only AND strata/PSU/FPC) are supported via the weighted-FW + hybrid pairs-bootstrap + Rao-Wu rescaling composition described in the "Note (survey + bootstrap composition)" above (PR #352). -- [x] Jackknife SE: fixed weights, LOO all units, formula `sqrt((n-1)/n * sum((u-ubar)^2))` +- [x] Bootstrap: paper-faithful Algorithm 2 step 2 — re-estimates ω̂_b and λ̂_b per draw via two-pass sparsified Frank-Wolfe on the resampled panel using the fit-time normalized-scale zeta. 
Matches R's default `synthdid::vcov(method="bootstrap")` (which rebinds `attr(estimate, "opts")` so the renormalized ω serves only as Frank-Wolfe initialization). Survey designs (pweight-only AND strata/PSU/FPC) are supported via the weighted-FW + hybrid pairs-bootstrap + Rao-Wu rescaling composition described in the "Note (survey + bootstrap composition)" above (PR #355). +- [x] Placebo: Survey-weighted pseudo-treated means + weighted-FW re-estimation on pseudo-panel for both pweight-only and full-design paths. Full-design path (strata/PSU/FPC) uses stratified-permutation allocator — see "Note (survey + placebo composition)" above. +- [x] Jackknife SE: fixed weights, LOO all units, formula `sqrt((n-1)/n * sum((u-ubar)^2))`. Full-design path (strata/PSU/FPC) uses PSU-level LOO with stratum aggregation — see "Note (survey + jackknife composition)" above. - [x] Jackknife: NaN SE for single treated or single nonzero-weight control - [x] Jackknife: analytical p-value (not empirical) - [x] Returns both unit and time weights for interpretation diff --git a/docs/methodology/survey-theory.md b/docs/methodology/survey-theory.md index 019cc4ca..c5d642c0 100644 --- a/docs/methodology/survey-theory.md +++ b/docs/methodology/survey-theory.md @@ -358,14 +358,25 @@ an IF representation. Two estimators in diff-diff --- **SyntheticDiD** and **TROP** --- involve non-smooth optimization steps (synthetic control weight selection, optimal transport maps) that do not fit cleanly into the smooth-functional framework. -Their survey support is limited to bootstrap-only variance estimation: the -bootstrap resamples PSUs within strata (Rao-Wu rescaled), bypassing the need -for an IF. For SyntheticDiD, each draw re-runs the full estimator on resampled -data. 
For TROP, per-observation treatment effects (tau_it) are deterministic -given the data and do not depend on survey weights, so the Rao-Wu path -precomputes tau values once and only varies the ATT aggregation weights across -draws (see REGISTRY.md for the documented optimization). The TSL/IF-based -argument in this document does not extend to these estimators. +Their survey variance estimators bypass the TSL/IF framework entirely and use +resampling / permutation-style allocators tailored to each estimator's structure: + +- **TROP** uses the Rao-Wu rescaled bootstrap at the PSU level. Per-observation + treatment effects (tau_it) are deterministic given the data and do not + depend on survey weights, so the Rao-Wu path precomputes tau values once + and only varies the ATT aggregation weights across draws (see REGISTRY.md + for the documented optimization). +- **SyntheticDiD** supports all three variance methods under full + strata/PSU/FPC designs. ``bootstrap`` uses hybrid pairs-bootstrap + Rao-Wu + rescaling composed with a weighted Frank-Wolfe kernel (each draw re-runs + the full estimator on the resampled panel). ``placebo`` uses stratified + permutation + weighted Frank-Wolfe (pseudo-treated sampled within each + treated-containing stratum). ``jackknife`` uses PSU-level leave-one-out + with Rust & Rao stratum aggregation (fixed ω, λ — no refit per LOO). See + the bullets under "4.2b. SyntheticDiD survey resampling allocators" below + and REGISTRY.md §SyntheticDiD for the full derivations. Replicate-weight + designs remain rejected (no replicate-weight variance path). + The TSL/IF-based argument in this document does not extend to these estimators. ### 4.3.
Under survey weighting, the same IF form applies @@ -689,7 +700,7 @@ Each estimator uses one of three variance strategies under survey designs: | EfficientDiD | TSL on EIFs | all weight types | | ContinuousDiD | TSL sandwich | all weight types | | StackedDiD | TSL sandwich | pweight only | -| SyntheticDiD | Bootstrap only | Not IF-amenable (Section 4.2a) | +| SyntheticDiD | Bootstrap / permutation / PSU-LOO | Not IF-amenable (Section 4.2a); all three variance methods support full strata/PSU/FPC designs | | TROP | Bootstrap only | Not IF-amenable (Section 4.2a) | | BaconDecomposition | Diagnostic only | Weighted descriptives, no inference | @@ -725,7 +736,7 @@ Two bootstrap strategies interact with survey designs: - **Rao-Wu rescaled bootstrap** (SunAbraham, TROP): Draws PSUs with replacement within strata and rescales observation weights. Each draw re-runs the full estimator on the resampled data. -- **Hybrid pairs-bootstrap + Rao-Wu rescaling** (SyntheticDiD, PR #352): +- **Hybrid pairs-bootstrap + Rao-Wu rescaling** (SyntheticDiD, PR #355): SDID's full-design bootstrap is NOT a standalone Rao-Wu bootstrap. Each draw first performs the unit-level pairs-bootstrap resampling that Arkhangelsky et al. (2021) Algorithm 2 specifies (``boot_idx = rng.choice(n_total)``), @@ -735,10 +746,59 @@ Two bootstrap strategies interact with survey designs: ``min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²`` on the resampled panel, and ``ω_eff = rw·ω / Σ(rw·ω)`` is composed for the SDID estimator. See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` - for the full objective and the argmin-set caveat. SDID's `placebo` and - `jackknife` methods still reject strata/PSU/FPC (the placebo permutation - allocator and jackknife LOO mass need their own weighted derivations; - tracked in TODO.md as a follow-up). + for the full objective and the argmin-set caveat. 
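The weighted objective and the ``ω_eff`` composition above can be sketched in a few lines of NumPy (illustrative only: the actual minimization lives in the Rust ``sc_weight_fw_weighted`` kernel, and the toy ``A``, ``b``, ``rw`` values below are invented for the example):

```python
import numpy as np

def weighted_fw_objective(A, b, omega, rw, zeta):
    # ||A·diag(rw)·ω - b||² + ζ²·Σ_i rw_i·ω_i²  — evaluates the
    # survey-weighted Frank-Wolfe objective; the minimization over the
    # simplex is done elsewhere (Rust kernel), this is just the criterion.
    resid = A @ (rw * omega) - b
    return float(resid @ resid + zeta ** 2 * np.sum(rw * omega ** 2))

def compose_effective_weights(omega, rw):
    # Post-optimization composition ω_eff = rw·ω / Σ(rw·ω): rescale the FW
    # solution by the per-draw Rao-Wu weights, renormalize to the simplex.
    w = rw * omega
    return w / w.sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))           # toy pre-period control matrix
b = rng.normal(size=6)                # toy treated pre-period target
omega = np.full(4, 0.25)              # a feasible simplex point
rw = np.array([1.2, 0.8, 1.0, 1.0])   # toy per-draw Rao-Wu rescaled weights
obj = weighted_fw_objective(A, b, omega, rw, zeta=0.5)
omega_eff = compose_effective_weights(omega, rw)
```

With ``rw ≡ 1`` the objective reduces to the unweighted Frank-Wolfe criterion, which is a quick way to spot-check the composition.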
+ +- **Stratified permutation placebo** (SyntheticDiD): SDID's full-design + placebo variance allocator (triggered when ``strata`` and/or ``psu`` + is declared on the ``SurveyDesign``). For each placebo draw, + pseudo-treated indices are sampled uniformly without replacement + from controls *within each stratum containing actual treated units* + (classical stratified permutation test — Pesarin 2001). + Pseudo-treated means are survey-weighted; weighted-FW re-estimates + ω and λ per draw with ``rw_control`` threaded into both loss and + regularization. Post-optimization composition + ``ω_eff = rw · ω / Σ(rw · ω)`` with zero-mass retry. SE follows + Arkhangelsky Algorithm 4: + ``sqrt((r-1)/r) · std(placebo_estimates, ddof=1)``. Fit-time + feasibility guards raise ``ValueError`` on three failure cases: + Case B (treated stratum has 0 controls), Case C (fewer controls + than treated in a treated stratum), and Case D (every treated + stratum is exact-count ``n_c == n_t`` → permutation support = 1). + ``SurveyDesign(fpc=...)`` is a documented no-op for placebo — + permutation tests are conditional on the observed sample (Pesarin + 2001 §1.5), so the sampling fraction does not enter Algorithm 4 or + its survey extension. An ``fpc=`` column emits a ``UserWarning`` and + is not part of the placebo dispatch trigger. See REGISTRY.md + §SyntheticDiD ``Note (survey + placebo composition)``. + +- **PSU-level leave-one-out with stratum aggregation** (SyntheticDiD): + SDID's full-design jackknife variance allocator, matching the + canonical Rust & Rao (1996) stratified jackknife form: + ``SE² = Σ_h (1 - f_h) · (n_h - 1)/n_h · Σ_{j∈h} (τ̂_{(h,j)} - τ̄_h)²`` + where ``f_h = n_h_sampled / fpc[h]`` is the per-stratum sampling + fraction (population-count FPC form, matching ``SurveyDesign.resolve``). + Fixed weights per LOO: ω subsetted over kept controls, composed with + kept ``w_control``, renormalized; λ held at the fit-time value. 
Strata + with ``n_h < 2`` are silently skipped (stratum-level variance + unidentified; matches R ``survey::svyjkn`` under + ``lonely_psu="remove"`` / ``"certainty"``). Full-census strata + (``f_h ≥ 1``) short-circuit to zero contribution before any LOO + feasibility check. ``SE = 0`` is returned for legitimate zero + variance (every stratum full-census, or exact-zero within-stratum + dispersion); ``SE = NaN`` with a ``UserWarning`` is reserved for + undefined cases (all strata skipped, or a delete-one replicate in + a non-full-census contributing stratum being undefined). Unstratified + single-PSU designs short-circuit to ``SE = NaN``. + ``SurveyDesign(lonely_psu="adjust")`` is **not yet supported** on + this path and raises ``NotImplementedError``; use + ``variance_method="bootstrap"`` or ``lonely_psu="remove"`` / + ``"certainty"``. **Known limitation**: with ``n_h = 2`` per stratum, + the stratified PSU-level jackknife has only 1 effective DoF per + stratum and tends to be anti-conservative (see REGISTRY §SyntheticDiD + calibration table for the ``stratified_survey × jackknife`` row). + Users with few PSUs per stratum should prefer + ``variance_method="bootstrap"``, which validates at near-nominal + calibration on the same DGP. --- diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index db49b416..1a335718 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -44,7 +44,7 @@ Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as | Estimator | Survey Support | Notes | |-----------|----------------|-------| -| SyntheticDiD | pweight (placebo / jackknife / bootstrap); strata/PSU/FPC (bootstrap only via PR #352 weighted FW + Rao-Wu) | Treated means survey-weighted; omega composed with control weights post-optimization. 
Bootstrap survey path uses weighted-FW + Rao-Wu rescaling per draw | +| SyntheticDiD | pweight (placebo / jackknife / bootstrap); strata/PSU/FPC (all three methods — bootstrap via PR #355 weighted FW + Rao-Wu; placebo via stratified permutation + weighted FW; jackknife via PSU-level LOO with stratum aggregation). `lonely_psu="adjust"` not supported on the jackknife path (use `"remove"` / `"certainty"` or switch to `bootstrap`). | Treated means survey-weighted; omega composed with control weights post-optimization. Bootstrap survey path uses weighted-FW + Rao-Wu rescaling per draw. Placebo full-design permutes pseudo-treated within strata containing actual treated units (requires at least one stratum with `n_c > n_t`; exact-count designs raise Case D `ValueError`). Jackknife full-design leaves out one PSU at a time and aggregates per Rust & Rao (1996); full-census strata (`f_h ≥ 1`) short-circuit to zero contribution. | | TROP | pweight | Population-weighted ATT aggregation; model fitting unchanged | ### Phase 6: Advanced Features (v2.7.6) @@ -53,12 +53,16 @@ Weighted `solve_logit()` in `linalg.py` — survey weights enter IRLS as multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient) and Rao-Wu rescaled (SA, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the **weighted Frank-Wolfe** - variant (PR #352): each draw solves the weighted objective + variant (PR #355): each draw solves the weighted objective ``min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²`` and composes ``ω_eff = rw·ω/Σ(rw·ω)`` for the SDID estimator. See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full - derivation. SDID's `placebo` and `jackknife` paths still reject - strata/PSU/FPC (separate methodology gap; tracked in TODO.md). + derivation. 
SyntheticDiD's `placebo` and `jackknife` methods now also + support full strata/PSU/FPC designs: placebo via stratified permutation + + the same weighted FW kernel; jackknife via PSU-level LOO with + stratum aggregation (Rust & Rao 1996). See REGISTRY.md §SyntheticDiD + "Note (survey + placebo composition)" and "Note (survey + jackknife + composition)" for objectives and limitations. - **Replicate weight variance**: BRR, Fay's BRR, JK1, JKn, SDR. 12 of 16 estimators supported (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD) - **DEFF diagnostics**: per-coefficient design effects vs SRS baseline @@ -223,8 +227,7 @@ the limitation and suggested alternative. | Estimator | Limitation | Alternative | |-----------|-----------|-------------| -| SyntheticDiD | `variance_method='placebo'` or `'jackknife'` + strata/PSU/FPC | Use `variance_method='bootstrap'` for full-design surveys (PR #352 weighted-FW + Rao-Wu composition). Placebo's control-index permutation and jackknife's LOO allocator need their own weighted derivations on top of the weighted-FW kernel; tracked in TODO.md as a follow-up. | -| SyntheticDiD | Replicate weights | Pre-existing limitation: no replicate-weight survey support on SDID. | +| SyntheticDiD | Replicate weights | Pre-existing limitation: no replicate-weight survey support on SDID. All three variance methods (bootstrap, placebo, jackknife) now support pweight-only and strata/PSU/FPC designs; replicate-weight designs remain rejected. 
| | TROP | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap | | BaconDecomposition | Replicate weights | Diagnostic only, no inference | | ImputationDiD | `pretrends=True` + replicate weights | Use analytical survey design instead | diff --git a/docs/tutorials/16_survey_did.ipynb b/docs/tutorials/16_survey_did.ipynb index 790b6425..caf7ff27 100644 --- a/docs/tutorials/16_survey_did.ipynb +++ b/docs/tutorials/16_survey_did.ipynb @@ -1087,7 +1087,7 @@ "cell_type": "markdown", "id": "cell-35-f1ef376c", "metadata": {}, - "source": "## 9. Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only | bootstrap only (PR #352) | -- | Rao-Wu rescaled (bootstrap only) |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types 
(pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **bootstrap only** (TSL column): Strata/PSU/FPC supported only on `variance_method=\"bootstrap\"` via the weighted Frank-Wolfe + Rao-Wu composition (PR #352); placebo and jackknife reject full designs\n- **Diagnostic**: Weighted descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note on SyntheticDiD (PR #352):** the bootstrap survey path composes per-draw Rao-Wu rescaled weights with a **weighted Frank-Wolfe** variant of `_sc_weight_fw`. Each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use the constant per-control survey weight as `rw`; full designs use Rao-Wu rescaling per draw. SDID's `placebo` and `jackknife` methods still reject `strata/PSU/FPC` (a separate methodology gap — placebo permutes control indices, jackknife leaves out one unit at a time, both need their own weighted derivations; tracked in `TODO.md`). See `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)` for the full objective and the argmin-set caveat.\n\n**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n\nFor full details, see `docs/survey-roadmap.md`." + "source": "## 9. 
Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only | Full (all three variance methods) | -- | Hybrid pairs-bootstrap + Rao-Wu (bootstrap); stratified permutation (placebo); PSU-LOO (jackknife) |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL 
column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Full (all three variance methods)** (SyntheticDiD TSL column): Strata/PSU/FPC supported on all three `variance_method` choices — `bootstrap` via weighted Frank-Wolfe + Rao-Wu, `placebo` via stratified permutation + weighted FW, `jackknife` via PSU-level LOO with stratum aggregation. Replicate-weight designs remain rejected (pre-existing limitation).\n- **Diagnostic**: Weighted descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note on SyntheticDiD:** all three variance methods now support full strata/PSU/FPC designs.\n\n- **Bootstrap** (PR #355) composes per-draw Rao-Wu rescaled weights with a weighted Frank-Wolfe variant of `_sc_weight_fw`. Each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use the constant per-control survey weight as `rw`; full designs use Rao-Wu rescaling per draw.\n- **Placebo** uses a stratified permutation allocator: pseudo-treated indices are drawn from controls *within each stratum* containing actual treated units; weighted FW re-estimates ω and λ per draw with per-control survey weights flowing into both loss and regularization. SE follows Arkhangelsky Algorithm 4. The allocator requires at least `n_treated_h` controls per treated-containing stratum; fit-time guards raise targeted `ValueError` on infeasible configurations.\n- **Jackknife** uses PSU-level leave-one-out with stratum aggregation: `SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` (Rust & Rao 1996). FPC folded via `(1-f_h)`; strata with fewer than 2 PSUs are silently skipped. 
Known anti-conservatism with few PSUs per stratum — for tight SE calibration in that regime, prefer `variance_method=\"bootstrap\"`.\n\nSee `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap / placebo / jackknife composition)` for the full objectives, allocator asymmetry rationale (placebo ignores PSU axis, jackknife respects it), and validation details.\n\n**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n\nFor full details, see `docs/survey-roadmap.md`." }, { "cell_type": "markdown", diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py index e43ebc8e..d1a6a93e 100644 --- a/tests/test_methodology_sdid.py +++ b/tests/test_methodology_sdid.py @@ -3432,14 +3432,17 @@ def test_cross_period_ramping_trend(self): class TestCoverageMCArtifact: """Schema smoke-check on ``benchmarks/data/sdid_coverage.json``. - The full Monte Carlo study (500 seeds × B=200 × 4 DGPs × 3 methods, - PR #352) runs outside CI; its JSON output underwrites the calibration - table in REGISTRY.md §SyntheticDiD. The 4th DGP (``stratified_survey``) - is bootstrap-only — placebo / jackknife reject strata/PSU/FPC at - fit-time. This test verifies the artifact is present and structured - correctly. Per ``feedback_golden_file_pytest_skip.md``, skip if - missing — CI's isolated-install job copies only ``tests/``, not - ``benchmarks/``. + The full Monte Carlo study (500 seeds × B=200 × 4 DGPs × 3 methods) + runs outside CI; its JSON output underwrites the calibration table in + REGISTRY.md §SyntheticDiD. 
The 4th DGP (``stratified_survey``) + exercises the bootstrap survey-composition path (PR #355) and the + jackknife PSU-level LOO path (this PR); placebo is structurally + infeasible on this DGP because all of its treated units pack into a + single stratum with 0 never-treated units, so the harness skips placebo + for the `stratified_survey` block. This test verifies the artifact is present + and structured correctly. Per ``feedback_golden_file_pytest_skip.md``, + skip if missing — CI's isolated-install job copies only ``tests/``, + not ``benchmarks/``. """ def test_coverage_artifacts_present(self): @@ -3494,9 +3497,12 @@ def test_coverage_artifacts_present(self): f"missing alpha {alpha_key} in {dgp}/{method} rejection_rate" ) - # PR #352: stratified_survey is bootstrap-only — placebo and - # jackknife reject strata/PSU/FPC at fit-time, so their blocks - # report n_successful_fits=0. Bootstrap must have the full 500 + # Post-PR #355: stratified_survey runs bootstrap (validation + # gate) + jackknife (anti-conservative but reported for + # transparency); placebo is skipped on this DGP because its + # cohort packs all treated units into stratum 1, which has 0 never- + # treated units (Case B at fit-time), so its block reports + # n_successful_fits=0. Bootstrap must have the full 500 # successful fits + finite rejection rate at α=0.05 inside the # calibration gate [0.02, 0.10]. survey_block = payload["per_dgp"]["stratified_survey"] @@ -3508,14 +3514,27 @@ assert 0.02 <= rej_05 <= 0.10, ( f"stratified_survey bootstrap α=0.05 rejection {rej_05} outside " "calibration gate [0.02, 0.10]; weighted FW + Rao-Wu is " - "miscalibrated. See PR #352 §3c rollback protocol." + "miscalibrated. See PR #355 §3c rollback protocol." 
) + # Placebo is structurally infeasible on this DGP: the DGP packs + # all treated units into stratum 1, which has 0 never-treated + # units, so the stratified-permutation allocator raises Case B + # (zero controls in a treated-containing stratum) at fit-time. assert survey_block["placebo"]["n_successful_fits"] == 0, ( "stratified_survey placebo should have 0 successful fits " - "(strata/PSU/FPC raises NotImplementedError at fit-time)" - ) - assert survey_block["jackknife"]["n_successful_fits"] == 0, ( - "stratified_survey jackknife should have 0 successful fits " - "(strata/PSU/FPC raises NotImplementedError at fit-time)" + "(stratified-permutation allocator raises Case B at fit-time " + "because stratum 1 has 0 never-treated units — the entire treated " + "cohort packs into stratum 1 by DGP construction)." + ) + # Jackknife should now succeed (full-design support added). Its SE + # is known to be anti-conservative with only 2 PSUs per stratum — that's + # a methodology limitation documented in REGISTRY, not a regression. + # Here we just check that enough fits succeeded (i.e. the survey + # path dispatched correctly); calibration-gate bands are intentionally + # not asserted for jackknife on this DGP. + assert survey_block["jackknife"]["n_successful_fits"] >= 100, ( + "stratified_survey jackknife must have ≥100 successful fits; " + "the PSU-level LOO + stratum aggregation path is broken if " + "this drops below 100." 
) diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py index 46e85373..113b77bf 100644 --- a/tests/test_survey_phase5.py +++ b/tests/test_survey_phase5.py @@ -208,33 +208,76 @@ def test_full_design_bootstrap_succeeds(self, sdid_survey_data, survey_design_fu assert "Survey Design" in summary assert "Bootstrap replications" in summary - def test_full_design_placebo_raises(self, sdid_survey_data, survey_design_full): - """Placebo variance with full design raises NotImplementedError.""" + def test_full_design_placebo_succeeds(self, sdid_survey_data_full_design): + """Placebo variance with full design now succeeds (restored capability). + + Stratified-permutation allocator draws pseudo-treated indices + within each stratum containing treated units; weighted-FW + re-estimates ω and λ per draw on the pseudo-panel. Uses the + non-degenerate full-design fixture (stratum 0 has 5 treated + + 10 controls, so the within-stratum permutation has ``C(10, 5) = + 252`` distinct allocations — SE reflects a genuine null + distribution, not FP noise from a single-allocation collapse). + See REGISTRY §SyntheticDiD "Note (survey + placebo composition)". 
+ """ + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) - with pytest.raises(NotImplementedError, match="does not yet support survey designs with strata/PSU/FPC"): - est.fit( - sdid_survey_data, - outcome="outcome", - treatment="treated", - unit="unit", - time="time", - post_periods=[6, 7, 8, 9], - survey_design=survey_design_full, - ) + result = est.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + assert np.isfinite(result.att) + assert np.isfinite(result.se) + # SE must be materially positive, not sub-ULP FP noise from a + # degenerate single-allocation permutation (R4 P1 regression — + # the prior fixture had n_c == n_t in stratum 0, yielding + # SE ≈ 1e-16; the Case D guard below rejects that shape). + assert result.se > 1e-6 + assert result.variance_method == "placebo" + assert result.survey_metadata is not None + assert result.survey_metadata.n_strata is not None + assert result.survey_metadata.n_psu is not None + # summary() renders without exception + summary = result.summary() + assert "Survey Design" in summary - def test_full_design_jackknife_raises(self, sdid_survey_data, survey_design_full): - """Jackknife variance with full design raises NotImplementedError.""" + def test_full_design_jackknife_succeeds( + self, sdid_survey_data_jk_well_formed + ): + """Jackknife variance with full design now succeeds (restored capability). + + PSU-level LOO with stratum aggregation (Rust & Rao 1996): + SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)². Uses + the well-formed jackknife fixture so every PSU-LOO in every + contributing stratum is defined (treated units spread across two + PSUs). See REGISTRY §SyntheticDiD "Note (survey + jackknife + composition)". 
+ """ + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") est = SyntheticDiD(variance_method="jackknife", seed=42) - with pytest.raises(NotImplementedError, match="does not yet support survey designs with strata/PSU/FPC"): - est.fit( - sdid_survey_data, - outcome="outcome", - treatment="treated", - unit="unit", - time="time", - post_periods=[6, 7, 8, 9], - survey_design=survey_design_full, - ) + result = est.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + assert np.isfinite(result.att) + assert np.isfinite(result.se) + assert result.se > 0 + assert result.variance_method == "jackknife" + assert result.survey_metadata is not None + assert result.survey_metadata.n_strata is not None + assert result.survey_metadata.n_psu is not None + summary = result.summary() + assert "Survey Design" in summary def test_placebo_with_pweight_only_full_design_stripped_att_match( self, sdid_survey_data @@ -548,6 +591,1301 @@ def test_effective_weights_returned(self, sdid_survey_data, survey_design_weight assert eff_vals != pytest.approx(uni_vals, abs=1e-6) +# ============================================================================= +# SyntheticDiD Full-Design Placebo & Jackknife Tests +# ============================================================================= + + +@pytest.fixture +def sdid_survey_data_full_design(): + """Balanced 30-unit panel with adequate stratum structure for full-design. + + 30 units (5 treated 0-4, 25 control 5-29), 10 periods. Treated all in + stratum 0 PSU 0. Controls spread across multiple strata + PSUs so + stratified-permutation placebo has >1 permutation (stratum 0 has + 10 controls, n_t=5 → C(10,5)=252 draws) and PSU-level LOO jackknife + has ≥2 PSUs per stratum so every stratum contributes to variance. 
+ + Layout: + stratum 0: treated PSU 0 (units 0-4), control PSUs 1 & 2 (units 5-14) + stratum 1: control PSUs 3, 4, 5 (units 15-29) + """ + np.random.seed(7) + n_units = 30 + n_periods = 10 + n_treated = 5 + + units = list(range(n_units)) + periods = list(range(n_periods)) + + rows = [] + for u in units: + is_treated = 1 if u < n_treated else 0 + base = np.random.randn() * 2 + for t in periods: + y = base + 0.5 * t + np.random.randn() * 0.5 + if is_treated and t >= 6: + y += 2.0 + rows.append({"unit": u, "time": t, "outcome": y, "treated": is_treated}) + + data = pd.DataFrame(rows) + + unit_weight = 1.0 + np.arange(n_units) * 0.05 + unit_stratum = np.array([0] * 15 + [1] * 15) + unit_psu = np.array( + [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5 + ) + unit_map = {u: i for i, u in enumerate(units)} + idx = data["unit"].map(unit_map).values + + data["weight"] = unit_weight[idx] + data["stratum"] = unit_stratum[idx] + data["psu"] = unit_psu[idx] + + return data + + +@pytest.fixture +def sdid_survey_design_full(): + return SurveyDesign(weights="weight", strata="stratum", psu="psu") + + +@pytest.fixture +def sdid_survey_data_jk_well_formed(): + """30-unit panel where every jackknife PSU-LOO is defined. + + The Rust & Rao (1996) stratified jackknife formula requires that + every LOO within a contributing stratum produce a defined + ``τ̂_{(h,j)}``. In particular, **every PSU that contains treated + units must also leave enough treated units behind when dropped** — + otherwise the LOO removes all treated and the SDID estimator is + undefined. The "treated all in one PSU" fixture used for the placebo + tests triggers this by design; this fixture distributes the 5 + treated units across **two PSUs within stratum 0** so that LOO of + any treated-containing PSU still leaves ≥1 treated unit. 
+ + Layout: + stratum 0 (13 units): + PSU 0: treated units 0, 1 + control units 5, 6 + PSU 1: treated units 2, 3, 4 + control units 7, 8 + PSU 2: control units 9, 10, 11, 12 + stratum 1 (17 units): + PSU 3: control units 13-17 + PSU 4: control units 18-22 + PSU 5: control units 23-29 + """ + np.random.seed(7) + n_units = 30 + n_periods = 10 + # Treated at unit IDs 0-4. + treated_ids = {0, 1, 2, 3, 4} + + units = list(range(n_units)) + periods = list(range(n_periods)) + + rows = [] + for u in units: + is_treated = 1 if u in treated_ids else 0 + base = np.random.randn() * 2 + for t in periods: + y = base + 0.5 * t + np.random.randn() * 0.5 + if is_treated and t >= 6: + y += 2.0 + rows.append({"unit": u, "time": t, "outcome": y, "treated": is_treated}) + + data = pd.DataFrame(rows) + + unit_weight = 1.0 + np.arange(n_units) * 0.05 + # Stratum: units 0-12 → 0, units 13-29 → 1 + unit_stratum = np.array([0] * 13 + [1] * 17) + # PSU layout (13 stratum-0 units spread across PSU 0/1/2; 17 + # stratum-1 units across PSU 3/4/5). Treated units 0-4 straddle + # PSU 0 (units 0-1) and PSU 1 (units 2-4). + unit_psu = np.zeros(n_units, dtype=int) + unit_psu[0:2] = 0 # PSU 0: treated 0, 1 + unit_psu[2:5] = 1 # PSU 1: treated 2, 3, 4 + unit_psu[5:7] = 0 # PSU 0: control 5, 6 + unit_psu[7:9] = 1 # PSU 1: control 7, 8 + unit_psu[9:13] = 2 # PSU 2: control 9-12 + unit_psu[13:18] = 3 # PSU 3: control 13-17 + unit_psu[18:23] = 4 # PSU 4: control 18-22 + unit_psu[23:30] = 5 # PSU 5: control 23-29 + + unit_map = {u: i for i, u in enumerate(units)} + idx = data["unit"].map(unit_map).values + + data["weight"] = unit_weight[idx] + data["stratum"] = unit_stratum[idx] + data["psu"] = unit_psu[idx] + + return data + + +class TestSDIDSurveyPlaceboFullDesign: + """Stratified-permutation placebo allocator under strata/PSU/FPC (this PR). 
+ + Allocator: pseudo-treated indices are drawn WITHIN each stratum + containing actual treated units; weighted-FW re-estimates ω and λ per + draw with per-control survey weights. See REGISTRY §SyntheticDiD + "Note (survey + placebo composition)". + """ + + def test_placebo_full_design_pseudo_treated_stays_within_treated_strata( + self, sdid_survey_data_full_design, sdid_survey_design_full + ): + """Every draw's pseudo-treated units have stratum ∈ treated-strata set. + + Stratified permutation preserves the treated-stratum marginal + exactly — pseudo-treated never picks from strata with no actual + treated units. Patches ``rng.choice`` inside the per-draw loop + to record every actual ``pseudo_treated_idx`` and asserts each + sampled control's stratum membership ⊆ treated-strata set + across every draw (R9 P3 — direct allocator inspection rather + than just dispatch arg recording). + """ + # Capture every np.random.Generator.choice call inside the per- + # draw loop. ``_placebo_variance_se_survey`` constructs a fresh + # rng = np.random.default_rng(self.seed), so monkey-patching at + # the class level on Generator doesn't intercept it. Instead we + # wrap ``np.random.default_rng`` to return a recording-aware rng. + captured_pseudo_treated_strata: list[set] = [] + + # Run the fit and intercept rng.choice via a thin Generator wrapper. + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=123) + + # Build the resolved strata arrays the same way fit() does so we + # can map sampled control indices back to their stratum. + # sdid_survey_data_full_design layout: stratum 0 has treated + # 0-4 + controls 5-14, stratum 1 has controls 15-29. 
+ treated_units = list(range(5)) + control_units = list(range(5, 30)) + unit_to_stratum = ( + sdid_survey_data_full_design.groupby("unit")["stratum"].first().to_dict() + ) + strata_control = np.array([unit_to_stratum[u] for u in control_units]) + treated_strata_set = set(unit_to_stratum[u] for u in treated_units) + + # Wrap np.random.default_rng to install a Generator with an + # instrumented `choice`. + import numpy as _np + + original_default_rng = _np.random.default_rng + + class _RecordingChoiceGenerator: + def __init__(self, inner: _np.random.Generator): + self._inner = inner + + def choice(self, a, *args, **kwargs): # type: ignore[override] + idx = self._inner.choice(a, *args, **kwargs) + # `a` is controls_in_h (control-array positions). + # idx is the picked subset (also control-array positions). + picked_strata = strata_control[np.asarray(idx)] + captured_pseudo_treated_strata.append(set(picked_strata.tolist())) + return idx + + def permutation(self, *args, **kwargs): # type: ignore[override] + return self._inner.permutation(*args, **kwargs) + + def __getattr__(self, name): + return getattr(self._inner, name) + + def fake_default_rng(seed=None): + return _RecordingChoiceGenerator(original_default_rng(seed)) + + try: + _np.random.default_rng = fake_default_rng # type: ignore[assignment] + est.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sdid_survey_design_full, + ) + finally: + _np.random.default_rng = original_default_rng # type: ignore[assignment] + + # We expect ≥1 rng.choice call per replication × treated-stratum. + # All treated are in stratum 0, so each draw produces exactly one + # choice over stratum-0 controls. + assert len(captured_pseudo_treated_strata) >= 30 + # Every sampled pseudo-treated subset must come from a stratum + # that contains actual treated units (here, only stratum 0). 
+ for draw_strata in captured_pseudo_treated_strata: + assert draw_strata.issubset(treated_strata_set), ( + f"sampled stratum {draw_strata} not subset of " + f"treated_strata_set {treated_strata_set}" + ) + + def test_placebo_full_design_raises_on_zero_control_stratum( + self, sdid_survey_data_full_design + ): + """Case B: stratum with treated units but zero controls → ValueError.""" + df = sdid_survey_data_full_design.copy() + # Move all controls out of stratum 0; treated stays in stratum 0. + df.loc[df["unit"].isin(range(5, 15)), "stratum"] = 1 + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=7) + with pytest.raises( + ValueError, match=r"at least one control per stratum.*has 0 controls" + ): + est.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + + def test_placebo_full_design_raises_on_effective_single_support( + self, sdid_survey_data_full_design + ): + """R11 P1 fix: Case D effective single-support — n_t_h == 1 with + only one positive-weight control + zero-weight cohabitants. + + Row-count guards pass (``n_c_h > n_t_h``) and Case E passes + (``n_c_h_positive == n_t_h``), but every successful pseudo- + treated draw collapses to the single positive-weight control's + outcome (zero-weight cohabitants contribute 0 to numerator and + denominator). The placebo distribution is degenerate: SE ≈ 0 + from FP noise across identical means, not a meaningful null. + + Without this guard, the previous code marked the stratum as + non-degenerate based on ``n_c_h > n_t_h`` (raw count), and the + retry loop would silently succeed on any positive-mass pick + with the same effective mean → ``SE = 0.0``. + """ + # Build a fixture where stratum 0 has 1 treated + 1 positive- + # weight control + multiple zero-weight controls; stratum 1 + # has only controls. 
This sets up effective single-support + # in stratum 0 even though raw n_c_h > n_t_h. + # Reuse sdid_survey_data_full_design but trim to 1 treated and + # zero out most stratum-0 controls' weights. + df = sdid_survey_data_full_design.copy() + # Drop treated units 1-4, keep unit 0 as the sole treated. + df = df[~df["unit"].isin([1, 2, 3, 4])].copy() + # Stratum 0 controls (5-14): keep unit 5 with positive weight, + # zero out 6-14. + df.loc[df["unit"].isin(range(6, 15)), "weight"] = 0.0 + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + with pytest.raises( + ValueError, + match=r"single distinct positive-mass pseudo-treated mean", + ): + est.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + + def test_placebo_full_design_raises_on_zero_weight_controls_in_stratum( + self, sdid_survey_data_full_design + ): + """R9 P1 fix: Case E — treated stratum has raw controls but + zero positive-weight controls. + + Row-count guards (Case B/C) pass because stratum 0 has 10 raw + controls vs 5 treated. But the placebo allocator computes + pseudo-treated means as ``np.average(Y, weights=w_control)``; + if every stratum-0 control has weight 0, every draw's pseudo- + treated subset has zero weight sum (ZeroDivisionError on + np.average). Previously the retry loop swallowed each failure + and the fit reported ``SE=0.0`` with a generic + ``n_successful=0`` warning. The new Case E fit-time guard + rejects up-front with a targeted ValueError. + """ + df = sdid_survey_data_full_design.copy() + # Zero out survey weights for all stratum-0 controls (units 5-14). + # Treated units (0-4) and stratum-1 controls (15-29) keep + # positive weights, so Cases B/C still pass. 
+ df.loc[df["unit"].isin(range(5, 15)), "weight"] = 0.0 + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=42) + with pytest.raises( + ValueError, + match=r"at least n_treated controls with positive survey weight", + ): + est.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + + def test_placebo_full_design_raises_on_exact_count_stratum( + self, sdid_survey_data, survey_design_full + ): + """R4 P1 fix: Case D — every treated stratum has n_c == n_t. + + The ``sdid_survey_data`` fixture has 5 treated units + 5 controls + in stratum 0 and 10 controls in stratum 1 (with no treated + units). For placebo stratified permutation, the pseudo-treated + set within stratum 0 is chosen from 5 controls, sized 5 — only + one allocation is possible. Every placebo draw reproduces the + same pseudo-treated set, the placebo null collapses to a + single point, and SE = FP noise (~1e-16). The new Case D guard + rejects this design at fit-time rather than silently reporting + a near-zero SE that would pass a naïve ``result.se > 0`` check. + """ + est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + with pytest.raises( + ValueError, + match=r"single distinct positive-mass pseudo-treated mean across all draws", + ): + est.fit( + sdid_survey_data, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=survey_design_full, + ) + + def test_placebo_full_design_raises_on_undersupplied_stratum( + self, sdid_survey_data_full_design + ): + """Case C: stratum with n_controls < n_treated → ValueError.""" + df = sdid_survey_data_full_design.copy() + # Move 8 of the 10 stratum-0 controls out; leaves 2 controls + # in stratum 0 with 5 treated → n_c=2 < n_t=5 → Case C. Using + # ``nest=True`` so the shifted PSUs stay unique-within-stratum. 
+ df.loc[df["unit"].isin(range(7, 15)), "stratum"] = 1 + + sd = SurveyDesign( + weights="weight", strata="stratum", psu="psu", nest=True + ) + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=7) + with pytest.raises( + ValueError, + match=r"at least n_treated controls.*2 controls but 5 treated", + ): + est.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + + def test_placebo_full_design_se_differs_from_pweight_only( + self, sdid_survey_data_full_design + ): + """Full-design placebo SE differs from pweight-only placebo SE. + + Pweight-only path permutes across ALL controls (unstratified); + full-design permutes WITHIN treated-strata only. Different + permutation supports ⇒ different null distributions ⇒ different + SEs. Analog of the bootstrap differs-test. + """ + sd_pweight = SurveyDesign(weights="weight") + sd_full = SurveyDesign(weights="weight", strata="stratum", psu="psu") + + est_pw = SyntheticDiD(variance_method="placebo", n_bootstrap=100, seed=42) + result_pw = est_pw.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_pweight, + ) + est_full = SyntheticDiD(variance_method="placebo", n_bootstrap=100, seed=42) + result_full = est_full.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_full, + ) + assert result_pw.att == pytest.approx(result_full.att, abs=1e-10) + assert result_pw.se != pytest.approx(result_full.se, abs=1e-6) + + def test_placebo_fpc_alone_no_op_warns_and_matches_pweight_only( + self, sdid_survey_data_full_design + ): + """R8 P1 fix: ``fpc=`` alone does not flip placebo dispatch. 
+ + Permutation tests condition on the observed sample (Pesarin 2001 + §1.5), so FPC's sampling-fraction adjustment doesn't enter + Algorithm 4 or its stratified-permutation survey extension. The + previous dispatcher routed any ``fpc is not None`` design through + ``_placebo_variance_se_survey`` (weighted-FW per draw), silently + changing numerics relative to the no-FPC fit even though FPC + played no role in the math. + + The fix gates placebo's survey-path dispatch on + ``strata is not None OR psu is not None`` only, and emits a + ``UserWarning`` whenever FPC is set on a placebo fit. This test + asserts both: (a) the warning fires and (b) ``SE`` matches the + pweight-only-no-FPC fit at ``rel=1e-12`` (FPC truly is a no-op). + """ + df = sdid_survey_data_full_design.copy() + df["fpc_col"] = 1000.0 # any positive value — no-op on placebo + + sd_fpc_only = SurveyDesign(weights="weight", fpc="fpc_col") + sd_pweight_only = SurveyDesign(weights="weight") + + est_fpc = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + with pytest.warns( + UserWarning, + match=r"SurveyDesign\(fpc=\.\.\.\) is a no-op on variance_method='placebo'", + ): + r_fpc = est_fpc.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_fpc_only, + ) + + est_pw = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + r_pw = est_pw.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_pweight_only, + ) + + # FPC is documented as no-op for placebo: the SE under FPC must + # exactly match the SE without FPC (same dispatch path, no + # numerical drift from the routing flip the dispatcher used to + # introduce on `fpc is not None`). 
+ assert r_fpc.se == pytest.approx(r_pw.se, rel=1e-12) + assert r_fpc.att == pytest.approx(r_pw.att, abs=1e-12) + + def test_placebo_typo_fpc_column_still_raises( + self, sdid_survey_data_full_design + ): + """R13 P3 fix: typoed FPC column name must still raise on placebo. + + The FPC pre-resolve drop on placebo (R11 P1) bypasses + ``SurveyDesign.resolve()``'s missing-column check. A typoed + ``fpc="fpc_typo"`` would be silently ignored behind the no-op + warning, hiding a genuine input-spec mistake. The fix + validates the FPC column name against ``data.columns`` BEFORE + dropping; missing columns surface the same targeted error + ``resolve()`` would have raised. + """ + sd_typo = SurveyDesign(weights="weight", fpc="nonexistent_col") + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=42) + with pytest.raises( + ValueError, + match=r"FPC column 'nonexistent_col' not found in data", + ): + est.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_typo, + ) + + def test_placebo_low_fpc_with_explicit_psu_skips_resolve_validator( + self, sdid_survey_data_full_design + ): + """R11 P1 fix: ``SurveyDesign.resolve()`` itself enforces + ``FPC >= n_PSU`` design-validity, but FPC is a placebo no-op + (Pesarin 2001 §1.5). On the placebo path, FPC is dropped from + a copy of the SurveyDesign before resolution so the + resolve-time validator never fires; the user sees the + documented FPC no-op warning and the fit succeeds. + + Test: explicit ``psu`` + low ``fpc`` (below the per-stratum + ``n_PSU`` threshold) — would normally raise inside + ``SurveyDesign.resolve()`` with "FPC must be >= n_PSU". On + placebo, it succeeds with the no-op warning. Bootstrap on the + same design still raises (validator-skip is variance-method- + gated). 
+ """ + df = sdid_survey_data_full_design.copy() + # sdid_survey_data_full_design has 3 PSUs per stratum: stratum 0 + # with PSUs {0,1,2}, stratum 1 with PSUs {3,4,5}. fpc=2 is below + # the 3-PSU-per-stratum threshold. + df["fpc_low"] = 2.0 + + sd_low_fpc_psu = SurveyDesign( + weights="weight", strata="stratum", psu="psu", fpc="fpc_low" + ) + sd_no_fpc = SurveyDesign(weights="weight", strata="stratum", psu="psu") + + est_fpc = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + with pytest.warns( + UserWarning, + match=r"SurveyDesign\(fpc=\.\.\.\) is a no-op on variance_method='placebo'", + ): + r_fpc = est_fpc.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_low_fpc_psu, + ) + + est_no_fpc = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + r_no_fpc = est_no_fpc.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_no_fpc, + ) + + # FPC truly is a no-op for placebo even with explicit psu: SE + # matches the no-FPC fit at machine precision. + assert r_fpc.se == pytest.approx(r_no_fpc.se, rel=1e-12) + assert r_fpc.att == pytest.approx(r_no_fpc.att, abs=1e-12) + # Bootstrap on the same low-FPC design still raises the resolve- + # time validator error (validator-skip stays placebo-only). + est_boot = SyntheticDiD(variance_method="bootstrap", n_bootstrap=20, seed=42) + with pytest.raises( + ValueError, + match=r"FPC \(2\.0\) is less than the number of PSUs", + ): + est_boot.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_low_fpc_psu, + ) + + def test_placebo_low_fpc_no_psu_warns_no_validator_block( + self, sdid_survey_data_full_design + ): + """R10 P1 fix: implicit-PSU FPC validator skipped on placebo. 
+ + The implicit-PSU FPC validator (PR #355 R8 P1) rejects designs + where ``psu is None`` and ``fpc < n_units`` because Rao-Wu + bootstrap treats each unit as its own PSU and would fail mid- + draw. But placebo doesn't use FPC at all (Pesarin 2001 §1.5 — + permutation tests condition on the observed sample), so the + validator should be skipped on the placebo path. Otherwise a + legitimate placebo fit raises on a constraint that doesn't + apply to its math. + + Test: ``fpc_col = 5`` is well below ``n_units = 30``, which + would trip the bootstrap validator. With ``variance_method= + "placebo"``, the fit must succeed, emit the documented FPC + no-op ``UserWarning``, and produce SE matching the no-FPC + pweight-only fit (true no-op). + """ + df = sdid_survey_data_full_design.copy() + # 5 << n_units=30 — would trip the bootstrap implicit-PSU + # FPC validator (FPC must be >= n_units when psu is None). + df["fpc_col"] = 5.0 + + sd_low_fpc = SurveyDesign(weights="weight", fpc="fpc_col") + sd_pweight = SurveyDesign(weights="weight") + + est_fpc = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + with pytest.warns( + UserWarning, + match=r"SurveyDesign\(fpc=\.\.\.\) is a no-op on variance_method='placebo'", + ): + r_fpc = est_fpc.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_low_fpc, + ) + + est_pw = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42) + r_pw = est_pw.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_pweight, + ) + + # FPC truly is a no-op for placebo: SE/ATT must match the no-FPC + # fit at machine precision regardless of FPC value. 
+ assert r_fpc.se == pytest.approx(r_pw.se, rel=1e-12) + assert r_fpc.att == pytest.approx(r_pw.att, abs=1e-12) + # And bootstrap on the same low-FPC design must still raise the + # implicit-PSU validator error (validator stays gated on + # bootstrap/jackknife only). + est_boot = SyntheticDiD(variance_method="bootstrap", n_bootstrap=50, seed=42) + with pytest.raises( + ValueError, + match=r"FPC \(5\.0\) is less than the number of units", + ): + est_boot.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_low_fpc, + ) + + def test_placebo_full_design_psu_only_routes_through_survey_path( + self, sdid_survey_data_jk_well_formed + ): + """R2 P1 regression: PSU/FPC-without-strata placebo routes through + ``_placebo_variance_se_survey`` (with a synthesized single + stratum), matching the documented full-design contract. + + The original implementation only routed through the survey path + when ``strata`` was declared; PSU/FPC-only designs fell through + to the non-survey placebo allocator even though REGISTRY + declares full-design support. Now all three designs (strata, + psu, fpc) dispatch to the weighted-FW survey path. + """ + sd_psu_only = SurveyDesign(weights="weight", psu="psu") + + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=42) + # Monkeypatch to verify dispatch (sentinel returns distinct from + # both paths). + est._placebo_variance_se_survey = lambda *a, **kw: (42.0, np.array([1.0])) # type: ignore[assignment] + est._placebo_variance_se = lambda *a, **kw: (99.0, np.array([1.0])) # type: ignore[assignment] + result = est.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_psu_only, + ) + # Survey path should have fired — SE reflects 42.0 sentinel, + # rescaled by Y_scale. 
+ assert result.se > 40.0 and result.se < 90.0 + + def test_placebo_dispatches_to_survey_method_under_full_design( + self, sdid_survey_data_full_design, sdid_survey_design_full + ): + """Full design → _placebo_variance_se_survey; pweight-only → _placebo_variance_se. + + Deterministic dispatch test via monkeypatch. Sentinel return + value verifies the right branch fires. + """ + # Full-design dispatch + est = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=42) + sentinel = (42.0, np.array([1.0, 2.0, 3.0])) + est._placebo_variance_se_survey = lambda *a, **kw: sentinel # type: ignore[assignment] + est._placebo_variance_se = lambda *a, **kw: (99.0, np.array([])) # type: ignore[assignment] + result = est.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sdid_survey_design_full, + ) + # se rescales by Y_scale (normalization applied in fit), so check + # ordering rather than exact sentinel. + assert result.se > 40.0 # distinguishes 42.0 sentinel from 99.0 + assert result.variance_method == "placebo" + + # Pweight-only dispatch + est2 = SyntheticDiD(variance_method="placebo", n_bootstrap=30, seed=42) + est2._placebo_variance_se_survey = lambda *a, **kw: (42.0, np.array([])) # type: ignore[assignment] + est2._placebo_variance_se = lambda *a, **kw: (99.0, np.array([1.0])) # type: ignore[assignment] + sd_pw = SurveyDesign(weights="weight") + result2 = est2.fit( + sdid_survey_data_full_design, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_pw, + ) + # Pweight-only should dispatch to the non-survey method (99.0 * Y_scale) + assert result2.se > 90.0 # distinguishes 99.0 from 42.0 + + +class TestSDIDSurveyJackknifeFullDesign: + """PSU-level LOO jackknife with stratum aggregation (Rust & Rao 1996). 
+ + Variance formula: SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)² + with f_h = n_h_sampled / fpc[h]. See REGISTRY §SyntheticDiD + "Note (survey + jackknife composition)". + """ + + def test_jackknife_full_design_stratum_aggregation_formula_magnitude( + self, sdid_survey_data_jk_well_formed + ): + """SE² matches the Rust & Rao stratum-aggregation formula exactly. + + Independently recomputes SE from the returned tau_loo_all array + using ``Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²``; asserts + rtol=1e-12 match. Catches off-by-one in (n_h-1)/n_h, wrong + tau_bar_h, or missing (1-f_h). Uses the well-formed fixture so + every PSU-LOO is defined (6 strata-level replicates total). + """ + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + est = SyntheticDiD(variance_method="jackknife", seed=42) + result = est.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + assert np.isfinite(result.se) + assert result.se > 0 + # Fixture structure: stratum 0 has PSUs {0, 1, 2} (n_h=3), stratum 1 + # has PSUs {3, 4, 5} (n_h=3). No FPC → f_h=0 for both. Every + # PSU-LOO is well-defined, so tau_loo_all has 3 + 3 = 6 entries + # ordered as [s0 PSU 0, s0 PSU 1, s0 PSU 2, s1 PSU 3, s1 PSU 4, s1 PSU 5]. + taus = np.asarray(result.placebo_effects, dtype=float) + assert len(taus) == 6 + # Apply the Rust & Rao formula by hand. Y_scale rescaling is + # applied uniformly to tau_loo_all inside fit(), so the formula + # holds on the rescaled values. 
+ s0 = taus[:3] + s1 = taus[3:6] + n_h = 3 + factor = (n_h - 1) / n_h # f_h = 0 → (1 - f_h) = 1 + ss0 = np.sum((s0 - s0.mean()) ** 2) + ss1 = np.sum((s1 - s1.mean()) ** 2) + expected_se = np.sqrt(factor * (ss0 + ss1)) + assert result.se == pytest.approx(expected_se, rel=1e-12) + + def test_jackknife_full_design_fpc_reduces_se_magnitude( + self, sdid_survey_data_jk_well_formed + ): + """With FPC, SE is reduced by the (1-f_h) multiplier per stratum. + + Two fits: one without FPC (f_h=0 so (1-f_h)=1); one with FPC set + to a population count such that f_h = n_h/fpc = 3/6 = 0.5. + Expected: SE_fpc = SE_nofpc * sqrt(1-0.5) = SE_nofpc / sqrt(2). + Uses the well-formed fixture so every LOO is defined. + """ + df_no_fpc = sdid_survey_data_jk_well_formed + df_fpc = sdid_survey_data_jk_well_formed.copy() + df_fpc["fpc_col"] = 6.0 # n_h=3 per stratum, f_h = 3/6 = 0.5 + + sd_no_fpc = SurveyDesign(weights="weight", strata="stratum", psu="psu") + sd_fpc = SurveyDesign( + weights="weight", strata="stratum", psu="psu", fpc="fpc_col" + ) + + est1 = SyntheticDiD(variance_method="jackknife", seed=42) + result_no_fpc = est1.fit( + df_no_fpc, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_no_fpc, + ) + est2 = SyntheticDiD(variance_method="jackknife", seed=42) + result_fpc = est2.fit( + df_fpc, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_fpc, + ) + # Expected magnitude ratio: SE_fpc/SE_no_fpc = sqrt(1 - 0.5) = 1/sqrt(2) + assert result_fpc.se == pytest.approx( + result_no_fpc.se / np.sqrt(2), rel=1e-10 + ) + + def test_jackknife_full_design_se_differs_from_pweight_only( + self, sdid_survey_data_jk_well_formed + ): + """Full-design jackknife SE differs from pweight-only jackknife SE. + + Full-design: PSU-level LOO + stratum aggregation. Pweight-only: + unit-level LOO (classical fixed-weight jackknife). 
Different + resampling granularity ⇒ different SE. + """ + sd_pweight = SurveyDesign(weights="weight") + sd_full = SurveyDesign(weights="weight", strata="stratum", psu="psu") + + est_pw = SyntheticDiD(variance_method="jackknife", seed=42) + result_pw = est_pw.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_pweight, + ) + est_full = SyntheticDiD(variance_method="jackknife", seed=42) + result_full = est_full.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd_full, + ) + assert result_pw.att == pytest.approx(result_full.att, abs=1e-10) + assert result_pw.se != pytest.approx(result_full.se, abs=1e-6) + + def test_get_loo_effects_df_raises_on_survey_jackknife( + self, sdid_survey_data_jk_well_formed + ): + """R1 P1 fix: get_loo_effects_df blocks only on full-design survey + jackknife (PSU-level replicates), not on pweight-only jackknife. + + Keys off the explicit ``_loo_granularity`` flag (R2 P1 — the old + ``survey_metadata.n_psu`` heuristic false-positives on pweight- + only fits, which also populate ``n_psu`` via implicit-PSU + metadata but still run unit-level LOO). 
+ """ + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + est = SyntheticDiD(variance_method="jackknife", seed=42) + result = est.fit( + sdid_survey_data_jk_well_formed, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + assert getattr(result, "_loo_granularity", None) == "psu" + with pytest.raises( + NotImplementedError, + match=r"unit-level-LOO only.*PSU-level LOO with stratum aggregation", + ): + result.get_loo_effects_df() + + def test_get_loo_effects_df_works_on_pweight_only_jackknife( + self, sdid_survey_data, survey_design_weights + ): + """R2 P1 regression: pweight-only jackknife still exposes unit-level + LOO diagnostics through ``get_loo_effects_df``. + + Pweight-only fits populate ``survey_metadata.n_psu`` (via the + implicit-PSU metadata path) but run the non-survey unit-level + jackknife (classical Algorithm 3). The accessor must stay + available on this path — blocking it would regress a documented + diagnostic surface. + """ + est = SyntheticDiD(variance_method="jackknife", seed=42) + result = est.fit( + sdid_survey_data, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=survey_design_weights, + ) + assert getattr(result, "_loo_granularity", None) == "unit" + # Accessor returns a unit-indexed DataFrame with the expected + # schema; positional join is well-defined on the pweight-only + # path because ``placebo_effects`` has length n_control + n_treated. + df = result.get_loo_effects_df() + assert len(df) == result.n_control + result.n_treated + assert set(df.columns) == {"unit", "role", "att_loo", "delta_from_full"} + assert set(df["role"].unique()) <= {"control", "treated"} + + def test_jackknife_full_design_full_census_fpc_returns_zero_se( + self, sdid_survey_data_jk_well_formed + ): + """R5 P1 fix: full-census FPC → SE=0, not NaN. 
+ + Rust & Rao's stratified jackknife formula has an explicit + ``(1 - f_h)`` factor. When ``fpc[h] == n_h`` for every + contributing stratum, ``f_h = 1``, ``(1 - f_h) = 0``, and every + stratum contribution is zero → ``total_variance = 0`` by + legitimate design, not by "every stratum skipped". The correct + jackknife SE in that case is **zero** (full census: no sampling + variance), not NaN. Reserve NaN for the truly-undefined cases + (all strata skipped, undefined PSU-LOO replicate). + """ + df = sdid_survey_data_jk_well_formed.copy() + # Each stratum has n_h=3 PSUs. Setting fpc=3 gives f_h=1 and + # (1 - f_h) = 0 — the formula collapses the stratum contribution + # to zero for legitimate design reasons. + df["fpc_full_census"] = 3.0 + + sd = SurveyDesign( + weights="weight", + strata="stratum", + psu="psu", + fpc="fpc_full_census", + ) + est = SyntheticDiD(variance_method="jackknife", seed=42) + result = est.fit( + df, + outcome="outcome", + treatment="treated", + unit="unit", + time="time", + post_periods=[6, 7, 8, 9], + survey_design=sd, + ) + # SE must be exactly zero (legitimate full-census no-sampling + # variance), not NaN (undefined) and not a tiny positive number. + assert np.isfinite(result.se) + assert result.se == 0.0 + + def test_jackknife_full_design_full_census_short_circuits_undefined_loo( + self, sdid_survey_data_full_design + ): + """R6 P1 fix: full-census stratum short-circuits before undefined-LOO. + + ``sdid_survey_data_full_design`` packs all 5 treated units into + stratum 0 PSU 0 → LOO PSU 0 removes all treated → undefined + replicate → would normally return ``SE=NaN``. But if every + stratum has ``fpc = n_h`` (full census, ``f_h = 1`` → + ``(1 - f_h) = 0``), every stratum's variance contribution is + zero regardless of LOO feasibility. The correct jackknife SE + in that case is exactly zero (full census: no sampling + variance), not NaN from an undefined replicate that doesn't + actually enter the formula. 
+        """
+        df = sdid_survey_data_full_design.copy()
+        # Each stratum has n_h=3 PSUs → fpc=3 gives f_h=1 per stratum.
+        df["fpc_full_census"] = 3.0
+
+        sd = SurveyDesign(
+            weights="weight",
+            strata="stratum",
+            psu="psu",
+            fpc="fpc_full_census",
+        )
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        # Fit must succeed without the undefined-LOO warning, and SE
+        # must be exactly zero (not NaN and not a non-zero number).
+        result = est.fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd,
+        )
+        assert np.isfinite(result.se)
+        assert result.se == 0.0
+
+    def test_jackknife_full_design_lonely_psu_adjust_raises(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """R5 P1 fix: ``SurveyDesign(lonely_psu='adjust')`` on the jackknife
+        survey path raises NotImplementedError rather than silently being
+        treated as ``"remove"``.
+
+        ``"remove"`` and ``"certainty"`` both contribute 0 variance for
+        singleton strata on the jackknife path, matching canonical R
+        ``survey::svyjkn`` behavior. ``"adjust"`` requires an overall-
+        mean fallback per stratum that is not yet implemented; rejecting
+        upfront prevents silent variance miscomputation.
+        """
+        sd = SurveyDesign(
+            weights="weight",
+            strata="stratum",
+            psu="psu",
+            lonely_psu="adjust",
+        )
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        with pytest.raises(
+            NotImplementedError,
+            match=r"lonely_psu='adjust'.*not supported on the SDID jackknife",
+        ):
+            est.fit(
+                sdid_survey_data_jk_well_formed,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=sd,
+            )
+
+    def test_jackknife_full_design_all_certainty_psu_returns_zero_se(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """R12 P1 fix: all-singleton-strata + ``lonely_psu='certainty'``
+        returns SE=0 (legitimate zero variance), not NaN.
+
+        Mirrors the broader survey contract from
+        ``tests/test_survey.py::test_all_certainty_psu_zero_vcov``:
+        certainty PSUs are sampled with certainty (no sampling
+        variance). When every stratum is singleton + certainty, the
+        Rust & Rao stratified jackknife sums zero variance
+        contributions across strata — this is a legitimate zero, not
+        an "every stratum was skipped → undefined" case. The previous
+        code conflated the two and returned ``SE = NaN``.
+
+        Test: collapse the well-formed jackknife fixture so every unit
+        is its own stratum AND its own PSU (all 30 strata are
+        singletons). Under ``lonely_psu='certainty'``, the fit must
+        return SE = 0 exactly, with NaN inference fields downstream
+        (via ``safe_inference``). Under ``lonely_psu='remove'``, the
+        same design returns SE = NaN with the "every stratum was
+        skipped" warning.
+        """
+        df = sdid_survey_data_jk_well_formed.copy()
+        # Each unit is its own stratum AND its own PSU (all strata
+        # singletons).
+        df["stratum"] = df["unit"]
+        df["psu"] = df["unit"]
+
+        # Under "certainty": legitimate zero-variance contributors → SE=0.
+        sd_certainty = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="certainty"
+        )
+        est_cert = SyntheticDiD(variance_method="jackknife", seed=42)
+        result_cert = est_cert.fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_certainty,
+        )
+        assert np.isfinite(result_cert.se)
+        assert result_cert.se == 0.0
+        # Inference downstream from SE=0 is NaN via safe_inference (zero
+        # SE → undefined t-statistic / p-value / CI).
+        assert np.isnan(result_cert.t_stat)
+        assert np.isnan(result_cert.p_value)
+
+        # Under "remove": same design returns SE=NaN with the "every
+        # stratum was skipped" warning (no contributing stratum).
+        sd_remove = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="remove"
+        )
+        est_rem = SyntheticDiD(variance_method="jackknife", seed=42)
+        with pytest.warns(UserWarning, match=r"every stratum was skipped"):
+            result_rem = est_rem.fit(
+                df,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=sd_remove,
+            )
+        assert np.isnan(result_rem.se)
+
+    def test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """``lonely_psu='certainty'`` is accepted and produces the same SE
+        as ``lonely_psu='remove'`` (both contribute 0 for singleton
+        strata on the jackknife path).
+        """
+        sd_remove = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="remove"
+        )
+        sd_certainty = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="certainty"
+        )
+
+        est1 = SyntheticDiD(variance_method="jackknife", seed=42)
+        result_remove = est1.fit(
+            sdid_survey_data_jk_well_formed,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_remove,
+        )
+        est2 = SyntheticDiD(variance_method="jackknife", seed=42)
+        result_certainty = est2.fit(
+            sdid_survey_data_jk_well_formed,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_certainty,
+        )
+        assert result_remove.se == pytest.approx(result_certainty.se, rel=1e-14)
+
+    def test_jackknife_full_design_undefined_replicate_returns_nan(
+        self, sdid_survey_data_full_design
+    ):
+        """R1 P0 fix: if any LOO in a contributing stratum is undefined,
+        the stratified Rust & Rao formula does not apply and SE is NaN.
+
+        ``sdid_survey_data_full_design`` has all treated units in stratum
+        0 PSU 0. LOO of PSU 0 removes all treated and the SDID estimator
+        τ̂_{(0,0)} is undefined. The old code silently skipped this LOO
+        while still applying the full ``(n_h-1)/n_h = 2/3`` factor,
+        under-scaling variance (silently wrong SE). The new code returns
+        NaN + a targeted UserWarning instead.
+        """
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        with pytest.warns(
+            UserWarning,
+            match=r"delete-one replicate for stratum 0 PSU 0 is not "
+            r"computable.*deletion removes all treated units",
+        ):
+            result = est.fit(
+                sdid_survey_data_full_design,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=sd,
+            )
+        assert np.isnan(result.se)
+
+    def test_jackknife_full_design_single_psu_stratum_skipped(
+        self, sdid_survey_data_full_design
+    ):
+        """Stratum with only 1 PSU contributes 0 to total variance.
+
+        Degenerate strata: relabel stratum-0 PSUs 1 and 2 to new strata
+        2 and 3, each with only 1 PSU. Jackknife should silently skip
+        them and produce SE only from stratum 1 (which still has 3 PSUs).
+        """
+        df = sdid_survey_data_full_design.copy()
+        # Units 5-9 → stratum 2, PSU 1 alone; units 10-14 → stratum 3, PSU 2 alone
+        df.loc[df["unit"].isin(range(5, 10)), "stratum"] = 2
+        df.loc[df["unit"].isin(range(10, 15)), "stratum"] = 3
+
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        result = est.fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd,
+        )
+        # Stratum 0 now has only PSU 0 (treated, degenerate LOO).
+        # Strata 2, 3 each have 1 PSU → skipped.
+        # Stratum 1 has 3 PSUs → contributes.
+        # Fit should proceed; SE reflects only stratum 1.
+        assert np.isfinite(result.se)
+        assert result.se > 0
+
+    def test_jackknife_full_design_unstratified_short_circuit(
+        self, sdid_survey_data_full_design
+    ):
+        """No strata + single PSU → SE=NaN (unidentified variance)."""
+        df = sdid_survey_data_full_design.copy()
+        df["psu"] = 0  # all units in a single PSU
+
+        # Unstratified single-PSU design
+        sd = SurveyDesign(weights="weight", psu="psu")
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        result = est.fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd,
+        )
+        assert np.isnan(result.se)
+
+    def test_jackknife_full_design_all_strata_skipped_warns_and_returns_nan(
+        self, sdid_survey_data_full_design
+    ):
+        """Every stratum has <2 PSUs → UserWarning + NaN SE."""
+        df = sdid_survey_data_full_design.copy()
+        # Collapse so every stratum has only 1 PSU: unit 0→psu0/s0, unit 1→psu1/s1, etc.
+        df["psu"] = df["unit"]
+        df["stratum"] = df["unit"]  # each unit is its own stratum
+
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        with pytest.warns(UserWarning, match=r"every stratum was skipped|SE is undefined"):
+            result = est.fit(
+                df,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=sd,
+            )
+        assert np.isnan(result.se)
+
+    def test_jackknife_dispatches_to_survey_method_under_full_design(
+        self, sdid_survey_data_full_design, sdid_survey_design_full
+    ):
+        """Full design → _jackknife_se_survey; pweight-only → _jackknife_se."""
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        est._jackknife_se_survey = lambda *a, **kw: (42.0, np.array([1.0, 2.0]))  # type: ignore[assignment]
+        est._jackknife_se = lambda *a, **kw: (99.0, np.array([]))  # type: ignore[assignment]
+        result = est.fit(
+            sdid_survey_data_full_design,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sdid_survey_design_full,
+        )
+        assert result.se > 40.0  # from 42.0 sentinel
+
+        est2 = SyntheticDiD(variance_method="jackknife", seed=42)
+        est2._jackknife_se_survey = lambda *a, **kw: (42.0, np.array([]))  # type: ignore[assignment]
+        est2._jackknife_se = lambda *a, **kw: (99.0, np.array([1.0]))  # type: ignore[assignment]
+        sd_pw = SurveyDesign(weights="weight")
+        result2 = est2.fit(
+            sdid_survey_data_full_design,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_pw,
+        )
+        assert result2.se > 90.0  # from 99.0 sentinel
+
 # =============================================================================
 # TROP Survey Tests
 # =============================================================================
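The full-design tests above all exercise the same quantity: the stratified delete-one-PSU jackknife variance with a finite-population correction. A minimal standalone sketch of that formula shows why a full-census `fpc` forces SE = 0 while singleton strata simply contribute nothing. The function name and argument shapes here are illustrative, not the package's API:

```python
def stratified_jackknife_variance(theta_full, replicates, fpc=None):
    """Stratified delete-one-PSU jackknife variance (illustrative sketch).

    theta_full: full-sample point estimate.
    replicates: dict mapping stratum label -> list of leave-one-PSU-out
        estimates, one per PSU in that stratum. Strata with fewer than
        two PSUs contribute zero (the "remove"/"certainty" convention).
    fpc: optional dict mapping stratum label -> population PSU count N_h,
        supplying the finite-population factor (1 - n_h / N_h).
    """
    total = 0.0
    for h, thetas in replicates.items():
        n_h = len(thetas)
        if n_h < 2:
            continue  # singleton stratum: zero variance contribution
        f_h = n_h / fpc[h] if fpc else 0.0  # sampling fraction f_h
        total += (
            (1.0 - f_h)
            * (n_h - 1)
            / n_h
            * sum((t - theta_full) ** 2 for t in thetas)
        )
    return total


# Full census (fpc == n_h in every stratum) collapses each (1 - f_h)
# factor to zero, so the variance is exactly 0.0, not NaN:
reps = {0: [1.0, 1.2, 0.9], 1: [1.1, 1.05, 0.95]}
print(stratified_jackknife_variance(1.0, reps, fpc={0: 3, 1: 3}))  # prints 0.0
```

Under these assumptions, the distinction the tests enforce falls out directly: a full-census design or an all-singleton design yields a legitimate zero, whereas "every stratum skipped" or an undefined replicate has no well-defined value at all and is the only case that warrants NaN.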