Restore SDID survey support for placebo and jackknife variance methods#365
|
Overall Assessment
⛔ Blocker: the new full-design SDID jackknife path can report a finite SE even when a required PSU delete-one replicate is undefined, so the returned survey jackknife variance no longer matches the Rust–Rao formula the PR cites.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Closes the last SDID survey gap (TODO.md row 107). PR #355 restored variance_method="bootstrap" for strata/PSU/FPC via hybrid pairs-bootstrap + Rao-Wu + weighted-FW. This commit extends the same full-design capability to variance_method="placebo" and "jackknife".
Placebo allocator: stratified permutation (Pesarin 2001). Pseudo-treated indices are drawn within each stratum containing actual treated units; weighted-FW re-estimates ω and λ per draw with per-control survey weights threaded into both loss and regularization (reuses compute_sdid_unit_weights_survey + compute_time_weights_survey from PR #355). New private method _placebo_variance_se_survey. Fit-time front-door guards (per feedback_front_door_over_retry_swallow.md) distinguish two infeasible permutation configurations with targeted ValueError messages: Case B (stratum with treated units has zero controls) and Case C (stratum with treated units has fewer controls than treated). Partial-permutation fallback rejected because it silently changes the null-distribution semantics.
Jackknife allocator: PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)². FPC form: f_h = n_h_sampled / fpc[h] (population-count form from survey.py::SurveyDesign.resolve; confirmed via survey.py:338-356 where fpc_h < n_psu_h is the validation constraint). λ held fixed across LOOs; ω subset + rw-composed-renormalized (matches Arkhangelsky Algorithm 3 non-survey semantics; jackknife is variance-approximation, not refit-variance). Strata with n_h < 2 skip silently; total-zero-variance → NaN + UserWarning. Unstratified designs with PSU treated as single-stratum JK1. New private method _jackknife_se_survey.
Gate relaxation: deletes the placebo+jackknife+strata/PSU/FPC raise at synthetic_did.py:352-369. Replicate-weight gate at L329-337 unchanged (separate methodology; closed-form replicate variance double-counts with Rao-Wu-like rescaling).
fit() dispatcher adds _placebo_use_survey_path / _jackknife_use_survey_path flags routing to the new methods when appropriate; non-survey and pweight-only paths bit-identical by construction (guarded by the same branch isolation pattern used in PR #355 _bootstrap_se).
Allocator asymmetry: placebo ignores PSU axis; jackknife respects it. Intentional: placebo is a null-distribution test (stratified unit-level permutation is classical; PSU-level permutation on few PSUs is near-degenerate), while jackknife is a design-based variance approximation (PSU-level LOO is canonical per Rust & Rao). Both respect strata. Rationale documented in method docstrings and REGISTRY (follow-up commit).
Tests (tests/test_survey_phase5.py):
- TestSyntheticDiDSurvey: flip test_full_design_placebo_raises and test_full_design_jackknife_raises from NotImplementedError→succeeds; assert finite SE > 0, populated survey_metadata, .summary() round-trip.
- TestSDIDSurveyPlaceboFullDesign (new class): pseudo-treated-stays-within-treated-strata (monkeypatched recorder), Case B / Case C front-door guards (targeted ValueError match), se-differs-from-pweight-only, deterministic dispatch.
- TestSDIDSurveyJackknifeFullDesign (new class): stratum-aggregation self-consistency, fpc-reduces-se magnitude (SE_fpc = SE_nofpc/sqrt(2) at f=0.5, rtol=1e-10), se-differs-from-pweight-only, single-PSU-stratum silently skipped, unstratified short-circuit, all-strata-skipped UserWarning + NaN, deterministic dispatch.
Non-survey and pweight-only regressions: all 32 tests in TestBootstrapSE + TestPlaceboSE + TestJackknifeSE pass unchanged; bit-identity preserved by the new-path-gating pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
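The stratified-permutation draw described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `draw_pseudo_treated`, its arguments, and the toy stratum layout are hypothetical names, not the library's `_placebo_variance_se_survey` internals, and the weighted-FW refit that follows each draw is omitted.

```python
import numpy as np


def draw_pseudo_treated(strata_control, n_treated_by_stratum, rng):
    """Pick pseudo-treated controls stratum by stratum (hypothetical helper).

    strata_control: stratum label of each control unit (length n_control).
    n_treated_by_stratum: {stratum: number of actual treated units in it}.
    Returns sorted positions into the control array.
    """
    picked = []
    for h, n_t in n_treated_by_stratum.items():
        pool = np.flatnonzero(strata_control == h)  # controls in stratum h
        # replace=False raises ValueError when the pool is smaller than n_t,
        # which is the situation the Case B / Case C guards reject up front.
        picked.append(rng.choice(pool, size=n_t, replace=False))
    return np.sort(np.concatenate(picked))


rng = np.random.default_rng(0)
strata_control = np.array([0] * 10 + [1] * 10)  # 10 controls per stratum
idx = draw_pseudo_treated(strata_control, {0: 5}, rng)  # 5 treated, all in stratum 0
```

Every drawn index stays inside stratum 0 here, since stratum 1 contains no treated units and therefore never enters the loop.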
…placebo, jackknife)
Second commit for the SDID survey-placebo/jackknife PR. Extends the
coverage Monte Carlo artifact with jackknife on the stratified_survey
DGP (bootstrap calibration unchanged); promotes the deferred REGISTRY
§SyntheticDiD gap bullets to two landed Notes; updates user-facing
docs to reflect restored capability.
Coverage MC changes
-------------------
* benchmarks/python/coverage_sdid.py: _stratified_survey_design now
returns ("bootstrap", "jackknife") on the methods tuple. Placebo is
omitted because the DGP's cohort packs into a single stratum with 0
never-treated units — stratified-permutation placebo is structurally
infeasible on this DGP (raises Case C at fit-time). Module docstring
explains the exclusion and the jackknife anti-conservatism caveat.
* benchmarks/data/sdid_coverage.json: regenerated stratified_survey
block at n_seeds=500, n_bootstrap=200. Bootstrap validates near-
nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife row
reports α=0.05 rejection = 0.45, SE/trueSD = 0.46 — documented anti-
conservatism from the stratified jackknife formula with 2 PSUs per
stratum (1 effective DoF per stratum, Rust & Rao 1996 limitation).
REGISTRY.md §SyntheticDiD
-------------------------
* Survey support matrix updated: all three variance methods now
support strata/PSU/FPC (not just bootstrap).
* Two new landed Notes:
- "Note (survey + placebo composition)": stratified-permutation
allocator, weighted-FW refit, ω_eff composition, fit-time
feasibility guards (Case B / Case C), scope note on what is NOT
randomized (within-stratum PSU axis). Cites Pesarin (2001) /
Pesarin & Salmaso (2010).
- "Note (survey + jackknife composition)": PSU-level LOO algorithm,
explicit stratum-aggregation SE² formula, FPC handling (population-
count form from survey.py:338-356), fixed-weights rationale,
degenerate-LOO skip semantics, scope note, known anti-conservatism
with few PSUs per stratum. Cites Rust & Rao (1996).
* "Allocator asymmetry" paragraph in the survey support matrix
documents the intentional asymmetry (placebo ignores PSU, jackknife
respects it) with rationale rooted in each method's role (null-
distribution test vs design-based variance approximation).
* Coverage MC table adds the stratified_survey × jackknife row with
anti-conservatism narrative; placebo row explicitly marked N/A-on-
this-DGP (with pointer to the unit-test coverage).
* Requirements checklist entries updated to describe full-design
support for placebo and jackknife.
Docs sweep
----------
* docs/methodology/survey-theory.md: new bullets describing the
stratified-permutation placebo allocator and the PSU-level LOO
jackknife, parallel to the existing hybrid-bootstrap bullet.
* docs/tutorials/16_survey_did.ipynb cell 35: support matrix SDID
row updated from "bootstrap only (PR #352)" to "Full (all three
variance methods)"; legend amended; "Note on SyntheticDiD" block
rewritten to describe all three allocators with the jackknife
few-PSU caveat.
* docs/survey-roadmap.md: Phase 5 matrix row closes the placebo/
jackknife gap; Phase 6 bullet updated to describe all three
allocators; Current Limitations table entry removed (only replicate-
weight limitation remains, merged into one row).
* CHANGELOG.md: "### Added" entry for placebo + jackknife full-design
support (no new section header — folded into existing Unreleased
block); "### Changed (PR #355)" tweaked to note the separate
follow-up for placebo/jackknife.
* TODO.md row 107 deleted (capability gap closed).
* diff_diff/synthetic_did.py __init__ docstring: survey_design
parameter description rewritten to describe all three methods.
Placebo fallback-guidance comment updated to remove stale "placebo
and jackknife reject strata/PSU/FPC" line.
* diff_diff/guides/llms-full.txt: Phase 5 bootstrap bullet updated
to describe all three survey allocators (UTF-8 fingerprint
preserved — `D'Haultfœuille` still appears throughout).
* tests/test_methodology_sdid.py::TestCoverageMCArtifact: narrative
and assertions updated to reflect that placebo=0-fits is expected
structurally on stratified_survey (documented Case C), while
jackknife now runs successfully with the known anti-conservatism
caveat intentionally unasserted at the calibration-gate level.
Verification
------------
* pytest tests/test_survey_phase5.py::TestSDIDSurveyPlaceboFullDesign
tests/test_survey_phase5.py::TestSDIDSurveyJackknifeFullDesign
tests/test_survey_phase5.py::TestSyntheticDiDSurvey
tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact}
tests/test_guides.py → 82 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck get_loo_effects_df on survey jackknife
P0 (Methodology — survey jackknife silently skipping undefined LOO):
The Rust & Rao (1996) stratified jackknife formula `SE² =
Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)²` requires every
PSU-LOO `τ̂_{(h,j)}` to be defined. The previous implementation
silently skipped PSUs whose deletion removed all treated units (or
zeroed control ω_eff mass, or raised in the estimator) while still
applying the full `(n_h-1)/n_h` factor, under-scaling variance on
designs where treated units pack into a single PSU.
Fix: `_jackknife_se_survey` now tracks any undefined replicate in a
contributing stratum (n_h ≥ 2) and short-circuits to `SE=NaN` with a
targeted `UserWarning` naming the stratum / PSU / reason (deletion
removes all treated, kept ω_eff zero, kept treated survey mass zero,
estimator raised, estimator returned non-finite). Partial LOOs are
still returned in `placebo_effects` for debugging; users needing a
variance estimator that accommodates PSU-deletion infeasibility
should use `variance_method="bootstrap"`. Silent stratum-level skip
for `n_h < 2` is preserved (canonical lonely-PSU handling matching
R `survey::svyjkn`).
New regression `test_jackknife_full_design_undefined_replicate_returns_nan`
exercises the fix on the original `sdid_survey_data_full_design`
fixture (treated all in stratum 0 PSU 0 → LOO PSU 0 removes all
treated) and asserts both the `UserWarning` match and `np.isnan(se)`.
The existing jackknife tests that asserted finite SE now use a new
`sdid_survey_data_jk_well_formed` fixture where treated units are
spread across two PSUs within stratum 0 (so every LOO leaves ≥1
treated). The self-consistency test
(`test_jackknife_full_design_stratum_aggregation_formula_magnitude`)
was rewritten from a flaccid finite-positive check to a real
recomputation of the Rust & Rao formula on the returned 6-entry
`placebo_effects` array, asserting `result.se == pytest.approx(
expected, rel=1e-12)`.
Coverage MC (`benchmarks/data/sdid_coverage.json`) is unchanged:
the `stratified_survey` DGP spreads its 32 treated units across
PSUs 2 and 3 within stratum 1 and PSUs 0 and 1 within stratum 0,
so every LOO is defined there too. The previously-reported jackknife
anti-conservatism (α=0.05 rejection = 0.45, SE/trueSD = 0.46) is
the documented few-PSU limitation (1 effective DoF per stratum
with `n_h = 2`), not the P0 silent-skip bug.
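A minimal sketch of the stratum aggregation and the undefined-replicate rule described in the P0 note, assuming the delete-one estimates have already been computed. The function name and the dict-based inputs are hypothetical; the full-census and `lonely_psu` handling from later commits is left out.

```python
import math


def stratified_jackknife_se(replicates, fpc=None):
    """Rust & Rao style aggregation over PSU delete-one estimates (sketch).

    replicates: {stratum: [tau_hat with PSU j deleted, ...]}, NaN = undefined.
    fpc: optional {stratum: population PSU count}; f_h = n_h / fpc[h].
    """
    total = 0.0
    contributed = False
    for h, taus in replicates.items():
        n_h = len(taus)
        if n_h < 2:
            continue  # lonely-PSU stratum: skipped silently
        if any(not math.isfinite(t) for t in taus):
            return float("nan")  # the formula needs every delete-one replicate
        f_h = n_h / fpc[h] if fpc is not None else 0.0
        tau_bar = sum(taus) / n_h
        ss = sum((t - tau_bar) ** 2 for t in taus)
        total += (1.0 - f_h) * (n_h - 1) / n_h * ss
        contributed = True
    return math.sqrt(max(total, 0.0)) if contributed else float("nan")


reps = {0: [1.0, 1.2, 0.9], 1: [1.1, 1.0, 1.3]}
se_nofpc = stratified_jackknife_se(reps)
se_fpc = stratified_jackknife_se(reps, fpc={0: 6, 1: 6})  # f_h = 0.5 in both strata
```

With `f_h = 0.5` in every stratum each variance term is halved, so `se_fpc` equals `se_nofpc / sqrt(2)`, the same relation the fpc-reduces-se test asserts.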
P1 (Code Quality — get_loo_effects_df on survey jackknife):
`SyntheticDiDResults.get_loo_effects_df()` assumes a length-N
unit-indexed `placebo_effects` array (first n_control are control-
LOO, next n_treated are treated-LOO). Survey-jackknife fits return
a flat PSU-level replicate array of variable length; joining onto
the fit-time `control_unit_ids + treated_unit_ids` would mislabel
PSU replicates as unit-level effects.
Fix: `get_loo_effects_df()` now raises `NotImplementedError` with a
targeted message pointing to `result.placebo_effects` for the raw
PSU-level array and REGISTRY §SyntheticDiD "Note (survey + jackknife
composition)" for the aggregation formula. New regression
`test_get_loo_effects_df_raises_on_survey_jackknife` asserts the
raise on a survey fit. Non-survey and pweight-only jackknife fits
continue to use `get_loo_effects_df()` as before (unit-level LOO).
P3 (Documentation — stale default variance_method note):
`docs/methodology/REGISTRY.md:L1569` default-variance-method note
rewritten to reflect that all three variance methods now support
full survey designs (removing "full design supported on bootstrap
only" language) and to recommend bootstrap specifically on surveys
with few PSUs per stratum.
Branch also rebased onto current origin/main to pick up PR #356
(agent-profile-panel) and PR #361 — the R1 Maintainability finding
about "unrelated API deletions" was a stale-base-drift artifact
(my branch was created before #356 merged). After rebase the diff
against main shows only SDID-survey changes.
Verification
------------
pytest tests/test_survey_phase5.py
tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact}
→ 87 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
c0e14c7 to c2b97e0
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
No material findings in the touched variance code.
Maintainability
No material maintainability finding beyond the documentation/message drift noted below.
Tech Debt
No blocker-level tech-debt finding. Removing the old SDID survey-gap row from
Security
No security findings in the touched code/docs/artifact updates.
Documentation/Tests
Path to Approval
|
… path; explicit LOO granularity flag
P1 (Methodology — PSU/FPC-only placebo mismatch with documented contract):
The dispatcher previously routed placebo to ``_placebo_variance_se_survey``
only when ``strata`` was present. PSU-only and FPC-only designs fell
through to the non-survey ``_placebo_variance_se`` path — silently
inconsistent with REGISTRY §SyntheticDiD "Note (survey + placebo
composition)" and the ``fit()`` docstring, which document the
weighted-FW stratified-permutation allocator for any full-design
survey (strata OR PSU OR FPC).
Fix: gate the placebo survey dispatch on ``_full_design_survey`` (the
same flag already used for bootstrap and jackknife). For
PSU/FPC-without-strata designs, ``fit()`` synthesizes a single
stratum (``_strata_control_eff = zeros(n_control)``,
``_strata_treated_eff = zeros(n_treated)``) so the stratified-
permutation allocator degenerates to a global within-stratum
permutation dispatched through the weighted-FW path. Jackknife
dispatch was already stratum-synthesizing; unified both methods on
the same ``_strata_*_eff`` arrays.
New regression ``test_placebo_full_design_psu_only_routes_through_survey_path``
monkeypatches both placebo methods with distinct sentinels and
asserts ``SurveyDesign(weights=..., psu=...)`` (no strata) dispatches
to the survey method on SE magnitude.
P1 (Code Quality — get_loo_effects_df over-broad block):
The R1 fix keyed the accessor guard off ``survey_metadata.n_psu is not
None``. But pweight-only survey fits populate ``n_psu`` too (via the
implicit-PSU metadata path in ``survey.py`` L749-L753); the guard
would false-positive and raise ``NotImplementedError`` on the
previously-supported unit-level LOO diagnostics.
Fix: add an explicit ``_loo_granularity`` attribute on
``SyntheticDiDResults`` set by ``fit()`` to ``"unit"`` (non-survey or
pweight-only jackknife — classical Algorithm 3 unit-level LOO),
``"psu"`` (full-design survey jackknife — PSU-level LOO), or
``None`` (non-jackknife variance methods). ``get_loo_effects_df()``
now keys the raise off ``_loo_granularity == "psu"`` rather than
``survey_metadata.n_psu``.
Two regression tests:
* ``test_get_loo_effects_df_raises_on_survey_jackknife`` — verifies
``_loo_granularity == "psu"`` on a full-design fit and that the
accessor raises ``NotImplementedError`` with the PSU-level pointer
message.
* ``test_get_loo_effects_df_works_on_pweight_only_jackknife`` —
verifies ``_loo_granularity == "unit"`` on a pweight-only fit and
that the accessor returns a unit-indexed DataFrame with the
expected schema (columns ``unit``, ``role``, ``att_loo``,
``delta_from_full``; length ``n_control + n_treated``).
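The accessor gate described above can be sketched with a small stand-in class. Everything here is an assumption for illustration: `LooResultSketch`, the list-of-dicts return type (the real accessor returns a DataFrame), and the definition of `delta_from_full` as the LOO estimate minus the full-sample ATT.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LooResultSketch:
    """Hypothetical stand-in for the results object, not the library class."""
    att: float
    placebo_effects: Sequence[float]
    unit_ids: Sequence[str]
    roles: Sequence[str]
    loo_granularity: Optional[str]  # "unit", "psu", or None

    def get_loo_effects(self):
        if self.loo_granularity is None:
            raise ValueError("requires variance_method='jackknife'")
        if self.loo_granularity == "psu":
            # PSU-level replicates cannot be joined onto unit ids.
            raise NotImplementedError(
                "survey jackknife replicates are PSU-level; "
                "see placebo_effects for the raw array"
            )
        return [
            {"unit": u, "role": r, "att_loo": e, "delta_from_full": e - self.att}
            for u, r, e in zip(self.unit_ids, self.roles, self.placebo_effects)
        ]


unit_fit = LooResultSketch(1.0, [0.9, 1.2], ["c1", "t1"], ["control", "treated"], "unit")
rows = unit_fit.get_loo_effects()
psu_fit = LooResultSketch(1.0, [0.9, 1.2, 1.1], [], [], "psu")
```

Keying the raise off an explicit granularity flag, instead of inferring it from survey metadata, is what keeps pweight-only fits on the unit-level branch.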
P3 (Documentation — stale messages after R1 fix):
* ``_placebo_variance_se``'s fallback warning (two sites) described
jackknife as "pweight-only only" — no longer true after this PR.
Rewrote to describe both bootstrap and jackknife as supporting
full survey designs (with the jackknife few-PSU caveat pointing to
REGISTRY).
* ``_jackknife_se_survey``'s docstring described "Degenerate LOOs
are skipped per iteration" — stale after the R1 P0 fix switched to
"undefined-LOO → SE=NaN + targeted UserWarning". Rewrote the
bullet to describe the four undefined-replicate conditions and the
NaN-return semantics, distinguishing them from the silent
stratum-skip for ``n_h < 2`` (lonely-PSU case).
* ``coverage_sdid.py`` module docstring and ``REGISTRY.md`` placebo
calibration-row narrative labeled the ``stratified_survey`` placebo
infeasibility as "Case C" (fewer controls than treated). Correct
label is **Case B** (zero controls in a treated-containing stratum)
— the DGP packs all treated into stratum 1, which has 0 never-
treated units.
Verification
------------
pytest tests/test_survey_phase5.py
tests/test_methodology_sdid.py::{TestBootstrapSE,TestPlaceboSE,TestJackknifeSE,TestCoverageMCArtifact}
→ 89 passed (2 new tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
No material findings.
Performance
No material findings.
Maintainability
No material findings.
Tech Debt
No material findings. Removing the old TODO entry is consistent with the code/tests now present.
Security
No material findings.
Documentation/Tests
|
… labels
R3 approved the PR with only P3 documentation nits remaining. Stragglers fixed:
- ``diff_diff/synthetic_did.py`` ``fit()`` docstring Raises clause and replicate-weight-rejection error message still described the placebo/jackknife full-design paths as unsupported / pweight-only. Rewritten to describe the current contract (all three variance methods accept pweight-only and full strata/PSU/FPC; only replicate-weight designs remain rejected).
- ``docs/methodology/survey-theory.md`` §4.2a "Where the IF chain does not apply" still said SyntheticDiD survey support is bootstrap-only. Rewritten to describe all three survey allocators (bootstrap hybrid pairs-bootstrap + Rao-Wu + weighted FW; placebo stratified permutation + weighted FW; jackknife PSU-level LOO + Rust-Rao aggregation).
- ``benchmarks/python/coverage_sdid.py`` ``_stratified_survey_design`` docstring and ``tests/test_methodology_sdid.py::TestCoverageMCArtifact`` narrative labeled the ``stratified_survey`` placebo infeasibility as "Case C" (fewer controls than treated). Correct label is **Case B** (zero controls in a treated-containing stratum): the DGP packs all treated units into stratum 1, which has 0 never-treated units.
Verification: 89 passed (no behavior change; docs/messages only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No material findings.
Performance
No material findings.
Maintainability
No material findings.
Tech Debt
No material findings.
Security
No material findings.
Documentation/Tests
Path to Approval
|
…ta; non-degenerate test fixture
P1 (Methodology: degenerate exact-count placebo strata):
The Case B / Case C front-door guards rejected ``n_c_h == 0`` and ``n_c_h < n_t_h`` respectively, but allowed ``n_c_h == n_t_h``. For the stratified-permutation allocator, the per-stratum support is ``C(n_c_h, n_t_h)``: when every treated-containing stratum has ``n_c_h == n_t_h``, the only allocation is to pick all ``n_c_h`` controls as pseudo-treated on every draw. All placebo draws produce the same pseudo-treated set, the placebo null collapses to a single point, and SE equals FP noise (~1e-16) from the np.average call order-dependence. A naïve ``result.se > 0`` check spuriously passes.
Concretely, ``sdid_survey_data`` (stratum 0: 5 treated + 5 controls, stratum 1: 10 controls, 0 treated) would return SE ≈ 3.79e-16 from placebo, and the R2/R3-era ``test_full_design_placebo_succeeds`` test was passing only because of that sub-ULP noise: the test assertion ``result.se > 0`` is satisfied even when the semantic SE is zero.
Fix: add a Case D fit-time guard that rejects the design when every treated-containing stratum has exactly ``n_c_h == n_t_h``. At least one treated stratum must have ``n_c_h > n_t_h`` for the overall permutation support (``∏_h C(n_c_h, n_t_h)``) to be ≥2. ValueError message enumerates the per-stratum (n_c, n_t) counts and points to ``variance_method='bootstrap'`` as the unconstrained alternative.
Test changes:
* ``test_full_design_placebo_succeeds`` switched from ``sdid_survey_data`` (degenerate exact-count) to ``sdid_survey_data_full_design`` (stratum 0: 5 treated + 10 controls → ``C(10, 5) = 252`` distinct allocations). Tightened the SE assertion from ``> 0`` to ``> 1e-6`` so future regressions back to sub-ULP-noise SE fail loudly.
* New ``test_placebo_full_design_raises_on_exact_count_stratum`` asserts the Case D ValueError fires on the old ``sdid_survey_data`` fixture (the regression target that surfaced this issue).
P3 (Documentation: remaining bootstrap-only stragglers):
* ``docs/methodology/survey-theory.md`` §"Estimator survey variance dispatch" table row for SyntheticDiD still said "Bootstrap only". Updated to "Bootstrap / permutation / PSU-LOO" with a note that all three variance methods support full strata/PSU/FPC designs.
* ``tests/test_methodology_sdid.py::TestCoverageMCArtifact`` comment described ``stratified_survey`` as "bootstrap-only: placebo and jackknife reject strata/PSU/FPC at fit-time". Updated to reflect current state: bootstrap is the validation gate, jackknife is reported with anti-conservatism caveat, placebo is skipped due to DGP-specific Case B (all-treated-stratum packs).
Verification: 90 passed (1 new Case D regression test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
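The row-count feasibility guards (Cases B, C, D) reduce to a small computation on per-stratum counts. The sketch below assumes a hypothetical `check_placebo_support` helper and a `{stratum: (n_controls, n_treated)}` input; it is not the library's guard code and its messages are illustrative.

```python
import math


def check_placebo_support(counts):
    """Return prod_h C(n_c_h, n_t_h) or raise for Cases B, C, D (sketch)."""
    treated = {h: c for h, c in counts.items() if c[1] > 0}
    for h, (n_c, n_t) in treated.items():
        if n_c == 0:
            raise ValueError(f"Case B: stratum {h} has treated units but no controls")
        if n_c < n_t:
            raise ValueError(f"Case C: stratum {h} has {n_c} controls < {n_t} treated")
    # C(n, k) == 1 with 1 <= k <= n only when k == n, so support < 2 means
    # every treated-containing stratum has exactly n_c == n_t.
    support = math.prod(math.comb(n_c, n_t) for n_c, n_t in treated.values())
    if support < 2:
        raise ValueError(
            f"Case D: only one allocation exists; (n_c, n_t) per stratum = {treated}; "
            "consider variance_method='bootstrap'"
        )
    return support


ok_support = check_placebo_support({0: (10, 5), 1: (10, 0)})  # full-design fixture layout
```

On the 10-control, 5-treated layout the support is `C(10, 5) = 252`, while the old 5-and-5 layout collapses to a single allocation and trips Case D.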
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…; REGISTRY docs
P1 (Methodology: zero computed variance conflated with undefined):
``_jackknife_se_survey`` previously collapsed ``total_variance <= 0.0`` into ``SE=NaN`` with an "every stratum was skipped" warning. That is correct for the "no stratum contributed" branch (undefined per Rust & Rao) but wrong for legitimate zero-variance outcomes: full-census FPC (``fpc[h] == n_h`` → ``f_h = 1`` → ``(1 - f_h) = 0`` zeros every stratum contribution even when within-stratum dispersion is non-zero) and exact-zero within-stratum dispersion both give ``total_variance = 0`` by construction, not by "undefined".
Fix: split the terminal branch. Return ``SE=NaN`` only when no stratum contributed; otherwise return ``SE = sqrt(max(total_variance, 0.0))``. The ``max(..., 0.0)`` protects against sub-FP-epsilon negatives and preserves the legitimate zero case at bit precision.
New regression ``test_jackknife_full_design_full_census_fpc_returns_zero_se``: fits on ``sdid_survey_data_jk_well_formed`` with ``fpc=3`` (n_h=3 per stratum → f_h=1 → zero SE by design). Asserts ``result.se == 0.0`` (not NaN).
P1 (Methodology: lonely_psu silently ignored on jackknife path):
The full-design jackknife always skipped singleton strata (``n_h < 2``) unconditionally, regardless of the user's ``SurveyDesign(lonely_psu=...)`` choice. ``"certainty"`` and ``"adjust"`` were silently degraded to ``"remove"``, which understates SE when the user intended ``"certainty"`` (equivalent to skip on jackknife) or flips what should be a zero-variance certainty case into NaN otherwise.
Fix: validate ``resolved_survey_unit.lonely_psu`` at fit-time on the survey jackknife path. ``"remove"`` and ``"certainty"`` are both accepted (they produce the same SE on this path: singleton strata contribute 0 variance under both, matching canonical Rust & Rao / ``survey::svyjkn`` behavior for JKn). ``"adjust"`` (R's overall-mean fallback for singleton strata) is rejected with ``NotImplementedError`` and a targeted message pointing to bootstrap as the unconstrained alternative.
Two regressions:
* ``test_jackknife_full_design_lonely_psu_adjust_raises`` verifies the rejection message.
* ``test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove`` asserts ``SE_remove == SE_certainty`` at ``rel=1e-14`` on the well-formed fixture.
P3 (Documentation: REGISTRY lag):
* Placebo feasibility Notes documented Cases B and C but missed Case D (the exact-count degeneracy guard added in R4). Split the "Fit-time feasibility guards" paragraph into an explicit 3-case enumeration (B: zero-control-stratum; C: undersupplied stratum; D: all-exact-count strata → single allocation).
* ``get_loo_effects_df()`` description still said "Requires variance_method='jackknife'; raises ValueError otherwise." after R2 taught it to also raise ``NotImplementedError`` on PSU-level survey jackknife. Rewrote to distinguish unit-level (available) vs PSU-level (blocked, with pointer to ``result.placebo_effects``).
* Added a Zero-variance-vs-undefined distinction paragraph and a "lonely_psu contract" paragraph to the jackknife survey Note, matching the shipped behavior from the two P1 fixes above.
Verification: 93 passed (3 new regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…sync
P1 (Methodology: full-census strata could still return NaN under undefined-LOO):
The R5 zero-variance-vs-NaN fix correctly returned ``SE=0`` when ``total_variance == 0`` and at least one stratum contributed, but the LOO feasibility loop still ran per-stratum regardless of ``f_h``. If a full-census stratum (``f_h ≥ 1`` → ``(1 - f_h) ≤ 0`` zeros its variance contribution) ALSO had an undefined delete-one replicate (e.g., all treated in the dropped PSU), the code exited via the undefined-replicate branch with ``SE=NaN``. That is wrong, because the stratum's contribution is mathematically zero regardless of replicate feasibility.
Fix: short-circuit strata with ``f_h >= 1.0`` before the delete-one feasibility loop. Mark as contributing (so ``any_stratum_contributed`` becomes True), skip LOO computation, continue to the next stratum.
New regression ``test_jackknife_full_design_full_census_short_circuits_undefined_loo``: uses ``sdid_survey_data_full_design`` (all 5 treated in stratum 0 PSU 0, so LOO PSU 0 removes all treated, triggering the undefined-replicate branch in non-full-census fits) with ``fpc = n_h = 3`` (full census) and asserts ``SE == 0.0``, not NaN.
P3 (Documentation: stale zero-variance + PSU-LOO wording):
* ``CHANGELOG.md`` still said "total-zero-variance → NaN + UserWarning" (R5-era). Rewrote to spell out the full contract: legitimate zero variance → ``SE=0``; undefined replicates / all-strata-skipped → ``SE=NaN`` + targeted warning; full-census short-circuit + lonely_psu ``"remove"``/``"certainty"`` acceptance vs ``"adjust"`` rejection. Also enumerated Case B/C/D on the placebo feasibility line.
* ``diff_diff/results.py::get_loo_effects_df`` docstring described only the unit-level unit-id-join behavior; after the R2 fix the accessor raises ``NotImplementedError`` on PSU-level survey jackknife. Rewrote docstring with explicit "Available on" / "Blocked on" sections pointing to ``result.placebo_effects`` for the raw PSU-level replicate array.
* ``diff_diff/guides/llms-full.txt`` ``get_loo_effects_df()`` bullet still described it as generic unit-level only; updated to call out the NotImplementedError on full-design survey jackknife (PSU-level replicates).
* ``docs/survey-roadmap.md`` Phase 5 SDID row and ``docs/methodology/survey-theory.md`` §4.2b PSU-level LOO bullet updated to surface (a) the ``lonely_psu="adjust"`` rejection, (b) the full-census short-circuit, (c) the Case D placebo guard, and (d) the zero-variance-vs-NaN distinction, all aligned with the shipped behavior and REGISTRY.
Verification: 115 passed (1 new full-census regression; all previously passing tests plus guides unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
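The ordering of per-stratum decisions that the R5 and R6 fixes settle on (lonely-PSU skip, full-census short-circuit, undefined-replicate exit, then the zero-vs-NaN terminal branch) can be sketched as below. The function and its input layout are hypothetical; warnings and the `lonely_psu` validation are omitted.

```python
import math


def jackknife_se_contract(strata):
    """strata: list of {'taus': delete-one estimates (NaN = undefined),
    'f': sampling fraction n_h / fpc[h], 0.0 when no FPC}. Sketch only."""
    total, contributed = 0.0, False
    for s in strata:
        taus, f_h = s["taus"], s["f"]
        n_h = len(taus)
        if n_h < 2:
            continue  # lonely PSU: no contribution
        if f_h >= 1.0:
            contributed = True  # full census: contributes exactly zero,
            continue            # so replicate feasibility is never checked
        if any(not math.isfinite(t) for t in taus):
            return float("nan")  # undefined replicate in a live stratum
        mean = sum(taus) / n_h
        total += (1.0 - f_h) * (n_h - 1) / n_h * sum((t - mean) ** 2 for t in taus)
        contributed = True
    if not contributed:
        return float("nan")  # nothing to aggregate: undefined
    return math.sqrt(max(total, 0.0))  # a legitimate zero stays 0.0


nan = float("nan")
se_census = jackknife_se_contract([{"taus": [nan, 1.0, 1.1], "f": 1.0}])
se_live = jackknife_se_contract([{"taus": [nan, 1.0, 1.1], "f": 0.5}])
```

The same undefined replicate yields `0.0` under a full census and NaN otherwise, which is the distinction the new regression pins down.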
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…h harness docstring
P3 (Maintainability: survey jackknife still populated _loo_unit_ids):
``fit()`` was unconditionally setting ``_loo_unit_ids`` / ``_loo_roles`` on every jackknife fit, including full-design survey fits where the underlying replicates are PSU-level and ``get_loo_effects_df()`` now raises ``NotImplementedError``. Internal / canned guidance keyed off ``_loo_unit_ids is not None`` as the availability check (e.g., ``practitioner.py``) would still call the accessor on a survey fit and hit the new raise.
Fix: only populate ``_loo_unit_ids`` / ``_loo_roles`` when ``_loo_granularity == "unit"``; leave them ``None`` on the PSU path so ``_loo_unit_ids is not None`` correctly reports availability. ``_loo_granularity`` is the authoritative accessor gate; the legacy ``_loo_unit_ids`` sentinel now agrees with it.
P3 (Documentation: harness docstring stale):
``coverage_sdid.py::_fit_one`` docstring said "fit() routes [survey designs] through the bootstrap survey path (PR #352) when method=='bootstrap'", which is stale after the placebo + jackknife full-design paths landed. Rewrote to describe the three method-specific survey variance paths (weighted-FW + Rao-Wu bootstrap; stratified-permutation + weighted-FW placebo; PSU-LOO + stratum-aggregation jackknife) and mention the Case B-D ValueError failure modes alongside NotImplementedError.
Verification: 94 passed (no behavior change on the gating fix; it's a state-gating tightening, not a correctness change).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…no-op contract
P1 (Methodology: placebo dispatch flipped on FPC alone, but FPC plays no role in placebo math): The dispatcher gated placebo's survey-path routing on ``_full_design_survey = strata is not None OR psu is not None OR fpc is not None``. Adding an ``fpc=`` column to a SurveyDesign therefore silently switched dispatch from the non-survey placebo path (unweighted-FW + post-hoc ω composition) to the weighted-FW survey placebo path (different numerics), even though permutation tests are conditional on the observed sample (Pesarin 2001 §1.5) and the sampling fraction never enters Algorithm 4 or its stratified-permutation survey extension. The reviewer correctly flagged this as an undocumented methodology mismatch on a public variance method.
Fix:
* Gate ``_placebo_use_survey_path`` on ``strata is not None OR psu is not None`` (FPC dropped from the trigger). FPC alone now keeps placebo on the non-survey path with no numerical drift relative to the no-FPC fit.
* Emit a ``UserWarning`` whenever ``fpc`` is set with ``variance_method="placebo"``, regardless of whether ``strata`` or ``psu`` are also set, so users get an explicit signal that the FPC column is preserved in design metadata but does not enter placebo math. Recommends ``variance_method="bootstrap"`` or ``"jackknife"`` for FPC participation.
* REGISTRY §SyntheticDiD "Note (survey support matrix)" placebo bullet rewritten to spell out the contract: "for designs with explicit ``strata`` and/or ``psu`` … FPC is a documented no-op for placebo: permutation tests are conditional on the observed sample (Pesarin 2001 §1.5)."
* survey-theory.md placebo bullet picks up the same FPC no-op language plus the Case B/C/D guard enumeration from R5.
New regression ``test_placebo_fpc_alone_no_op_warns_and_matches_pweight_only`` asserts both contracts: (a) ``UserWarning`` fires when fpc is set on placebo, (b) SE under ``SurveyDesign(weights, fpc)`` matches SE under ``SurveyDesign(weights)`` at ``rel=1e-12`` (true no-op, not a silent dispatch flip introducing weighted-FW drift).
Bootstrap and jackknife paths unchanged; they use FPC legitimately (Rao-Wu rescaling for bootstrap, ``(1 - f_h)`` factor in the Rust & Rao 1996 jackknife formula). Only placebo's contract narrows.
Verification: 95 passed (1 new FPC no-op regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
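The gating rule described in this commit can be sketched in a few lines. This is only an illustration under stated assumptions: ``DesignStub`` and ``placebo_uses_survey_path`` are hypothetical names, not the library's ``SurveyDesign`` or its dispatcher, and the warning text is paraphrased from the commit message.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DesignStub:
    """Hypothetical stand-in for SurveyDesign; fields hold column names."""
    weights: Optional[str] = None
    strata: Optional[str] = None
    psu: Optional[str] = None
    fpc: Optional[str] = None


def placebo_uses_survey_path(design: DesignStub) -> bool:
    """Return True when placebo should take the weighted-FW survey path.

    FPC is deliberately absent from the trigger: it only produces a warning.
    """
    if design.fpc is not None:
        warnings.warn(
            "fpc is kept in design metadata but does not enter placebo "
            "variance; use variance_method='bootstrap' or 'jackknife' "
            "for FPC participation.",
            UserWarning,
            stacklevel=2,
        )
    return design.strata is not None or design.psu is not None
```

With this shape, a weights + FPC design stays on the non-survey placebo path (and warns), while adding ``strata`` or ``psu`` flips the dispatch.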
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…raw allocator regression
P1 (Methodology: placebo Case E weight-aware feasibility): The Case B / C / D fit-time guards count raw rows per stratum, but the allocator computes pseudo-treated means as ``np.average(Y[:, pseudo_treated_idx], weights=w_control[pseudo_treated_idx])``. A treated-containing stratum can pass row-count guards while having fewer positive-weight controls than treated units: every draw can then pick a pseudo-treated subset whose weights all sum to zero (``ZeroDivisionError`` inside np.average), the per-draw retry loop swallows the failure as a generic ``n_successful=0`` warning, and the fit reports ``SE=0.0`` instead of a targeted methodology error.
Fix: add a Case E front-door guard that rejects any treated-containing stratum with ``n_positive_weight_controls_h < n_treated_h``. Ordered after Case B/C (row-count failures) so the existing row-count error messages still fire when relevant; Case E catches the remaining "rows present, weights insufficient" gap.
New regression ``test_placebo_full_design_raises_on_zero_weight_controls_in_stratum``: zeros out ``weight`` for all stratum-0 controls (units 5-14) on ``sdid_survey_data_full_design`` (which has 10 stratum-0 controls, 5 treated). Row-count guards pass (10 ≥ 5) but Case E now rejects with the targeted "at least n_treated controls with positive survey weight" message instead of the late ``SE=0.0`` warning.
REGISTRY enumeration updated to four cases (B, C, E, D) with the weight-aware language; Validation bullet bumped to reflect the new regression.
P3 (Documentation/Tests: placebo allocator regression too weak): ``test_placebo_full_design_pseudo_treated_stays_within_treated_strata`` previously only recorded the dispatch arguments to ``_placebo_variance_se_survey``; it did not observe any actual pseudo-treated indices, so it would not catch an allocator bug inside the per-draw loop.
Fix: install a recording wrapper around ``np.random.default_rng`` that intercepts every ``rng.choice`` call inside the per-draw loop and records the sampled control indices' stratum memberships. Assert every recorded draw's sampled stratum ⊆ treated-strata set across all 30 replications, directly verifying the within-stratum permutation contract from REGISTRY.
Verification: 96 passed (2 new regressions; existing Case B/C/D guards still fire on their fixtures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
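A minimal sketch of the guard ordering described above (Case B, then Case C, then the weight-aware Case E), assuming unit-level arrays of stratum labels, treatment flags, and survey weights. The function name ``check_placebo_feasibility`` and the message wording are illustrative assumptions, not the library's code.

```python
import numpy as np


def check_placebo_feasibility(strata, treated, weights):
    """Raise ValueError if within-stratum permutation is infeasible.

    Hypothetical helper: checks only strata that contain treated units,
    row-count cases first (B, C), then the positive-weight case (E).
    """
    strata = np.asarray(strata)
    treated = np.asarray(treated, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    for h in np.unique(strata[treated]):
        in_h = strata == h
        n_t = int(np.sum(in_h & treated))
        n_c = int(np.sum(in_h & ~treated))
        n_c_pos = int(np.sum(in_h & ~treated & (weights > 0)))
        if n_c == 0:  # Case B: no controls at all
            raise ValueError(f"stratum {h}: treated units but zero controls")
        if n_c < n_t:  # Case C: too few control rows
            raise ValueError(f"stratum {h}: {n_c} controls < {n_t} treated")
        if n_c_pos < n_t:  # Case E: rows present, weights insufficient
            raise ValueError(
                f"stratum {h}: needs at least n_treated controls with "
                f"positive survey weight ({n_c_pos} < {n_t})"
            )
```

Checking the row counts before the weights keeps the earlier, more specific messages reachable, which is the ordering rationale the commit gives.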
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…trap/jackknife only
P1 (Methodology: implicit-PSU FPC validator leaked into placebo): PR #355 R8 P1 added a fit-time validator that rejects ``psu=None`` + ``fpc < n_units`` designs, because Rao-Wu bootstrap treats each unit as its own PSU and would fail mid-draw with the bootstrap loop swallowing the error as a generic exhaustion message. The validator ran unconditionally on every survey fit. After R8 documented FPC as a placebo no-op (Pesarin 2001 §1.5: permutation tests condition on the observed sample), this validator became inconsistent: a placebo fit with low FPC and no explicit ``psu`` would still raise a "FPC must be ≥ n_units" error for a constraint that doesn't apply to the placebo math.
Fix: gate the implicit-PSU FPC validator on ``self.variance_method in ("bootstrap", "jackknife")``. Both methods genuinely consume FPC (Rao-Wu rescaling for bootstrap, Rust & Rao ``(1 - f_h)`` factor for jackknife). Placebo proceeds to the documented no-op warning path regardless of FPC value.
New regression ``test_placebo_low_fpc_no_psu_warns_no_validator_block``: sets ``fpc_col = 5`` (well below n_units=30) with no PSU. Asserts (a) placebo fit succeeds, (b) emits the documented FPC-no-op ``UserWarning``, (c) SE matches the no-FPC pweight-only fit at ``rel=1e-12``, AND (d) bootstrap on the same low-FPC design still raises the validator error (gating preserves bootstrap/jackknife behavior; only placebo's FPC contract changes).
Verification: 97 passed (1 new low-FPC placebo regression; existing bootstrap/jackknife FPC validation regressions still fire on their fixtures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ctive-support guard
P1 #1 (FPC validator in SurveyDesign.resolve fires on placebo with explicit psu): The R10 fix gated the in-fit implicit-PSU FPC validator on bootstrap/jackknife only, but ``SurveyDesign.resolve()`` itself enforces ``FPC >= n_PSU`` design-validity (survey.py:349-368) before ``synthetic_did.fit()`` even sees the resolved object. So a placebo fit with explicit ``psu`` and low ``fpc`` would still raise: the same parameter-interaction problem one layer earlier in resolution.
Fix: when ``variance_method == "placebo"`` and ``survey_design.fpc is not None``, construct an FPC-stripped copy of the SurveyDesign (``dataclasses.replace(survey_design, fpc=None)``) BEFORE calling ``_resolve_survey_for_fit``. Emit the FPC no-op ``UserWarning`` at the same time. The original ``survey_design`` object is preserved (caller's reference unchanged); the resolved unit-level survey design carries no FPC on placebo, so the in-fit validators (and the downstream FPC-related dispatch flags) all correctly skip FPC handling. The duplicate downstream FPC no-op warning (added in R8 keyed on ``resolved_survey_unit.fpc``) becomes unreachable on placebo and is removed.
New regression ``test_placebo_low_fpc_with_explicit_psu_skips_resolve_validator``: asserts (a) placebo with explicit psu + ``fpc < n_PSU`` succeeds + emits no-op warning, (b) SE matches the no-FPC fit at ``rel=1e-12``, (c) bootstrap on the same low-FPC design still raises ``"FPC (2.0) is less than the number of PSUs"`` from ``SurveyDesign.resolve()``, so the validator-skip is correctly variance-method-gated.
P1 #2 (Case D missed effective single-support): The Case D guard for placebo degeneracy keyed on raw control counts (``n_c_h > n_t_h`` for at least one stratum). It missed the case where ``n_c_h_positive < 2`` for every treated stratum: rows allow multiple subsets, but every successful pseudo-treated mean reduces to the unique positive-weight control's outcome (zero-weight cohabitants contribute 0 to numerator and denominator, R11 P1). The placebo null collapses to a single point and SE = FP noise.
Fix: extend the non-degeneracy invariant to require **both** ``n_c_h > n_t_h`` AND ``n_c_h_positive >= 2`` for at least one treated stratum. The classical Case D shape (raw exact-count ``n_c_h == n_t_h``) and the new "effective single-support" shape (positive-weight controls < 2 even with extra zero-weight rows) both trigger Case D. Updated the Case D error message to enumerate ``n_c_positive`` alongside ``n_c`` / ``n_t`` per stratum.
New regression ``test_placebo_full_design_raises_on_effective_single_support``: constructs a fixture with 1 treated unit + 1 positive-weight control + 9 zero-weight controls in stratum 0; raw guards (B/C/E) pass but Case D fires with the new "single distinct positive-mass pseudo-treated mean" message. Updated existing ``test_placebo_full_design_raises_on_exact_count_stratum`` regex to match the new message (same Case D path, slightly different wording).
REGISTRY §SyntheticDiD Case enumeration updated: Case D now documents both the classical (``n_c == n_t``) and effective single-support (``n_c_positive < 2``) shapes, with the combined non-degeneracy invariant.
Verification: 98 passed (2 new regressions; existing Case B/C/E/D-classical guards still fire on their fixtures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
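The combined Case D invariant can be stated as a one-line predicate. The sketch below assumes the per-stratum counts have already been tallied for treated-containing strata; ``placebo_null_is_nondegenerate`` and the tuple layout are hypothetical, chosen only to illustrate the rule.

```python
from typing import Dict, Hashable, Tuple


def placebo_null_is_nondegenerate(
    counts: Dict[Hashable, Tuple[int, int, int]]
) -> bool:
    """counts maps stratum -> (n_t, n_c, n_c_positive).

    The placebo null has more than one support point only if at least one
    treated stratum has spare control rows AND at least two controls with
    positive survey weight.
    """
    return any(
        n_c > n_t and n_c_pos >= 2
        for (n_t, n_c, n_c_pos) in counts.values()
    )
```

On the fixture described in the commit (1 treated, 1 positive-weight control, 9 zero-weight controls) the predicate is false, so Case D would fire; the classical exact-count shape (``n_c == n_t``) is false as well.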
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Paper-level cross-checks otherwise look coherent: the jackknife path still treats learned SDID weights as fixed during variance estimation, which matches the paper’s large-sample inference framing, and the placebo path remains permutation-like rather than bootstrap-like; the survey-specific stratified-placebo / PSU-jackknife extension is a local adaptation documented in
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ve' on jackknife survey
P1 (Methodology: all-singleton certainty design returned NaN instead of zero): ``_jackknife_se_survey`` treated ``lonely_psu="certainty"`` and ``"remove"`` as equivalent; both silently skipped singleton strata (``n_h < 2``). When every stratum is singleton + certainty, the fit fell through the "every stratum was skipped → NaN" branch even though the library's broader survey contract (and ``tests/test_survey.py::test_all_certainty_psu_zero_vcov``) defines certainty PSUs as zero-variance contributors: an all-certainty design yields ``vcov = 0``, not NaN.
Fix: thread ``resolved_survey_unit.lonely_psu`` into ``_jackknife_se_survey``. Distinguish the singleton-stratum branch:
* ``"remove"`` (default): silent skip, matching R ``survey::svyjkn`` lonely-PSU="remove". All-singleton design → ``SE = NaN`` (no contributing stratum).
* ``"certainty"``: stratum still adds 0 variance, but is marked ``any_stratum_contributed = True`` (an explicit zero-variance contribution). All-certainty design → ``SE = 0.0`` (legitimate zero, downstream ``safe_inference`` propagates NaN to t-stat / p-value / CI as the SE=0 contract requires).
New regression ``test_jackknife_full_design_all_certainty_psu_returns_zero_se``: mirrors ``test_all_certainty_psu_zero_vcov`` from the broader survey suite. Constructs a 30-stratum 1-PSU/stratum design from the well-formed jackknife fixture, asserts:
* ``"certainty"`` → ``SE = 0`` exactly, ``t_stat`` and ``p_value`` NaN via ``safe_inference``;
* ``"remove"`` → ``SE = NaN`` with the "every stratum was skipped" warning.
Method signature: ``lonely_psu`` parameter added at the end (after ``fpc_treated``) to keep the existing arg order intact.
REGISTRY ``lonely_psu`` contract updated to spell out the ``"remove"`` vs ``"certainty"`` semantic split for all-singleton designs.
Verification: 100 passed (1 new all-certainty regression; existing ``test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove`` still passes: on a fixture with at least one non-singleton stratum, the two modes still produce the same SE because the singleton-stratum branch isn't reached).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
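For orientation, the stratum aggregation SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)² and the ``lonely_psu`` split can be sketched as follows. This is a simplified illustration, not ``_jackknife_se_survey`` itself: it takes precomputed leave-one-PSU-out estimates as input, and the name ``stratified_jackknife_se`` and its arguments are assumptions. The warnings and the handling of undefined replicates discussed in the review are omitted.

```python
import math
from typing import Dict, Hashable, Optional, Sequence


def stratified_jackknife_se(
    loo_by_stratum: Dict[Hashable, Sequence[float]],
    fpc_by_stratum: Optional[Dict[Hashable, float]] = None,
    lonely_psu: str = "remove",
) -> float:
    """Aggregate PSU-level leave-one-out estimates into a jackknife SE.

    loo_by_stratum[h] holds one estimate per deleted PSU in stratum h.
    fpc_by_stratum[h] is a population PSU count, so f_h = n_h / fpc[h].
    """
    total = 0.0
    any_stratum_contributed = False
    for h, taus in loo_by_stratum.items():
        n_h = len(taus)
        if n_h < 2:
            # Singleton stratum: "certainty" counts as an explicit zero
            # contribution, "remove" skips the stratum entirely.
            if lonely_psu == "certainty":
                any_stratum_contributed = True
            continue
        f_h = 0.0 if fpc_by_stratum is None else n_h / fpc_by_stratum[h]
        tau_bar = sum(taus) / n_h
        ss = sum((t - tau_bar) ** 2 for t in taus)
        total += (1.0 - f_h) * (n_h - 1) / n_h * ss
        any_stratum_contributed = True
    if not any_stratum_contributed:
        return float("nan")  # every stratum was skipped
    return math.sqrt(total)
```

With a common sampling fraction f in every stratum, the result scales by sqrt(1 - f) relative to the no-FPC value, which is the relationship the FPC magnitude regression in the PR description checks.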
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…resolve drop
P3 (Code Quality — typoed FPC column silently ignored on placebo):
The R11 P1 fix dropped ``fpc`` from a copy of ``SurveyDesign`` BEFORE
``_resolve_survey_for_fit()`` to bypass the resolve-time
``FPC >= n_PSU`` validator on placebo. Side effect: the missing-
column check inside ``SurveyDesign.resolve()`` (survey.py:326-329 —
``raise ValueError(f"FPC column '{self.fpc}' not found in data")``)
also no longer ran on placebo. A typoed ``fpc="fpc_typo"`` would be
silently dropped behind the no-op warning, hiding a genuine input-
spec mistake even though the value is mathematically harmless.
Fix: validate the original ``survey_design.fpc`` column name exists
in ``data.columns`` BEFORE replacing it with ``None``. Raise the same
targeted error string ``SurveyDesign.resolve()`` would have raised so
input-spec mistakes still surface on placebo, even when FPC's
*value* doesn't enter the variance computation.
New regression ``test_placebo_typo_fpc_column_still_raises``:
asserts ``ValueError`` with the exact "FPC column 'nonexistent_col'
not found in data" message on a typoed FPC + placebo fit. Existing
low-FPC + placebo regressions still pass (column exists; FPC value
is dropped post-validation as before).
Verification: 101 passed (1 new column-validation regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
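The validate-then-strip order from this commit can be sketched as below. ``SurveyDesignStub`` and ``strip_fpc_for_placebo`` are hypothetical names used for illustration; only ``dataclasses.replace`` and the error string are taken from the commit messages.

```python
import dataclasses
from typing import Iterable, Optional


@dataclasses.dataclass(frozen=True)
class SurveyDesignStub:
    """Hypothetical stand-in for SurveyDesign; fields hold column names."""
    weights: Optional[str] = None
    strata: Optional[str] = None
    psu: Optional[str] = None
    fpc: Optional[str] = None


def strip_fpc_for_placebo(
    design: SurveyDesignStub, columns: Iterable[str]
) -> SurveyDesignStub:
    """Return a copy without FPC, after checking the FPC column exists."""
    if design.fpc is None:
        return design
    if design.fpc not in set(columns):
        # Same message the resolve-time check is quoted as raising.
        raise ValueError(f"FPC column '{design.fpc}' not found in data")
    # replace() builds a new object; the caller's design is left untouched.
    return dataclasses.replace(design, fpc=None)
```

Validating first means a typoed column name still fails loudly, while a valid FPC column is dropped only from the copy that goes on to resolution.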
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
No material findings.
Performance
No material findings.
Maintainability
No material findings.
Tech Debt
No material findings. The old SDID placebo/jackknife full-design gap was removed from
Security
No material findings.
Documentation/Tests
|
Summary
* variance_method="bootstrap" for strata/PSU/FPC via hybrid pairs-bootstrap + Rao-Wu + weighted-FW. This PR extends full-design support to variance_method="placebo" and "jackknife", so all three SDID variance methods now handle both pweight-only and strata/PSU/FPC designs.
* compute_sdid_unit_weights_survey / compute_time_weights_survey from PR #355, "Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition"). Fit-time feasibility guards distinguish Case B (zero controls in a treated stratum) and Case C (fewer controls than treated in a treated stratum) with targeted ValueError messages; partial-permutation fallback rejected since it would silently change the null distribution.
* SE² = Σ_h (1-f_h)·(n_h-1)/n_h·Σ_{j∈h}(τ̂_{(h,j)} - τ̄_h)² with f_h = n_h_sampled / fpc[h] (population-count FPC form from survey.py:338-356). λ held fixed across LOOs; ω subsetted, composed with rw, renormalized. Strata with n_h < 2 silently skipped; total-zero-variance → NaN + UserWarning. Unstratified single-PSU short-circuits to NaN.
* NotImplementedError guard at synthetic_did.py:352-369 that previously rejected placebo/jackknife + strata/PSU/FPC. Replicate-weight designs remain rejected (closed-form variance double-counts with Rao-Wu-like rescaling).
* benchmarks/data/sdid_coverage.json extended with jackknife on stratified_survey. Bootstrap validates near-nominal (α=0.05 rejection = 0.058, SE/trueSD = 1.13). Jackknife reported with anti-conservatism caveat: se_over_truesd ≈ 0.46 with only 2 PSUs per stratum is a well-documented limitation of the stratified jackknife formula (1 effective DoF per stratum). Placebo is structurally infeasible on the existing stratified_survey DGP (its cohort packs into one stratum with 0 never-treated units); the placebo survey path is exercised via unit tests on a feasible fixture.
Methodology references
* SyntheticDiD._placebo_variance_se_survey (stratified permutation + weighted-FW), SyntheticDiD._jackknife_se_survey (PSU-level LOO with stratum aggregation).
* variance_method="bootstrap".
Validation
* tests/test_survey_phase5.py): test_full_design_placebo_raises → test_full_design_placebo_succeeds + same for jackknife (assert finite SE > 0, populated survey_metadata, .summary() round-trip).
* TestSDIDSurveyPlaceboFullDesign: pseudo-treated-stratum contract (monkeypatched recorder), Case B / Case C front-door guards with targeted-message regression, SE-differs-from-pweight-only, deterministic dispatch.
* TestSDIDSurveyJackknifeFullDesign: FPC magnitude regression (2-stratum handcrafted panel asserts SE_fpc == SE_nofpc · sqrt(1 - f) at rtol=1e-10 with f=0.5), stratum-aggregation self-consistency, SE-differs-from-pweight-only, single-PSU-stratum silently skipped, unstratified short-circuit returns NaN, all-strata-skipped warning + NaN, deterministic dispatch.
* tests/test_methodology_sdid.py::TestCoverageMCArtifact narrative + assertions updated.
* TestBootstrapSE + TestPlaceboSE + TestJackknifeSE, 19 in TestSyntheticDiDSurvey) by gating the new code path on _full_design_survey.
* n_seeds=500, n_bootstrap=200. Bootstrap + stratified_survey: α=0.05 rejection = 0.058, SE/trueSD = 1.13 (unchanged from PR #355, "Restore SDID survey-bootstrap via weighted Frank-Wolfe + Rao-Wu composition"). Jackknife + stratified_survey: α=0.05 rejection = 0.45, SE/trueSD = 0.46 (documented anti-conservatism caveat).
* n_seeds=100 confirmed that placebo is structurally infeasible on this DGP; stratified_survey_design_factory now omits placebo with documented rationale.
Security / privacy