diff --git a/CHANGELOG.md b/CHANGELOG.md index c7607cec..f7c52d80 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based). Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work. - **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants. +- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values AND non-negative support, i.e. `value_min >= 0`; flags WooldridgeDiD QMLE consideration over linear OLS — the non-negativity clause aligns the routing signal with `WooldridgeDiD(method="poisson")`'s hard rejection of negative outcomes at `wooldridge.py:1105-1109`), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are mostly descriptive context. **`profile_panel` does not see the separate `first_treat` column** that `ContinuousDiD.fit()` consumes; the estimator's actual fit-time gates key off `first_treat` (defines never-treated controls as `first_treat == 0`, force-zeroes nonzero `dose` on those rows with a `UserWarning`, and rejects negative dose only among treated units `first_treat > 0`; see `continuous_did.py:276-327` and `:348-360`). In the canonical `ContinuousDiD` setup (Callaway, Goodman-Bacon, Sant'Anna 2024), the dose `D_i` is **time-invariant per unit** and `first_treat` is a **separate column** the caller supplies (not derived from the dose column). Under that setup, several facts on the dose column predict `fit()` outcomes: `PanelProfile.has_never_treated` (proxies `P(D=0) > 0` because the canonical convention ties `first_treat == 0` to `D_i == 0`); `PanelProfile.treatment_varies_within_unit == False` (the actual fit-time gate at line 222-228, holds regardless of `first_treat`); `PanelProfile.is_balanced` (the actual fit-time gate at line 329-338); absence of the `duplicate_unit_time_rows` alert (silent last-row-wins overwrite, must deduplicate before fit); and `treatment_dose.dose_min > 0` (predicts the strictly-positive-treated-dose requirement at line 287-294 because treated units carry their constant dose across all periods). When `has_never_treated == False` (no zero-dose controls but all observed doses non-negative), `ContinuousDiD` does not apply (Remark 3.1 lowest-dose-as-control is not implemented); `HeterogeneousAdoptionDiD` IS a routing alternative on this branch (HAD's own contract requires non-negative dose, which is satisfied). When `dose_min <= 0` (negative treated doses), `ContinuousDiD` does not apply AND `HeterogeneousAdoptionDiD` is **not** a fallback — HAD also raises on negative post-period dose (`had.py:1450-1459`); the applicable alternative is linear DiD with the treatment as a signed continuous covariate. Re-encoding the treatment column is an agent-side preprocessing choice that changes the estimand and is not documented in REGISTRY as a supported fallback. The estimator's force-zero coercion on `first_treat == 0` rows with nonzero `dose` is implementation behavior for inconsistent inputs, not a documented method for manufacturing never-treated controls. The agent must validate the supplied `first_treat` column independently — `profile_panel` does not see it. The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples. - **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)". - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication). - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`. diff --git a/ROADMAP.md b/ROADMAP.md index fcc02f22..76bd19fc 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -138,14 +138,14 @@ Long-running program, framed as "building toward" rather than with discrete ship - Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`. - `practitioner_next_steps()` context-aware guidance. - Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`, `llms-autonomous.txt`), bundled in the wheel. -- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. +- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions add descriptive distributional context (count-likeness / bounded-support hints on numeric outcomes; dose support and zero-dose presence on continuous treatments). Most fields are descriptive context. `outcome_shape.is_count_like` informs the WooldridgeDiD-QMLE-vs-linear-OLS judgment but does not gate it. `profile_panel` does not see the separate `first_treat` column that `ContinuousDiD.fit()` consumes; under the canonical `ContinuousDiD` setup (per-unit time-invariant dose `D_i` + separate `first_treat`), several preflight checks become predictive on the dose column: `has_never_treated` (proxies `P(D=0) > 0`), `treatment_varies_within_unit == False` (actual fit-time gate), `is_balanced` (actual fit-time gate), absence of the `duplicate_unit_time_rows` alert (silent last-row-wins overwrite path), and `treatment_dose.dose_min > 0` (predicts strictly-positive-treated-dose). When `has_never_treated == False` but all doses are non-negative, `ContinuousDiD` does not apply (Remark 3.1 not implemented); `HeterogeneousAdoptionDiD` is a routing alternative on this branch. When `dose_min <= 0` (negative doses), neither `ContinuousDiD` nor `HeterogeneousAdoptionDiD` apply (HAD raises on negative post-period dose); linear DiD with a signed continuous covariate is the applicable alternative. Re-encoding the treatment column is an agent-side preprocessing choice that is not documented in REGISTRY as a supported fallback. The estimator's force-zero coercion on inconsistent `first_treat == 0 + nonzero dose` inputs is implementation behavior, not a documented method for manufacturing controls. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples. - Package docstring leads with an "For AI agents" entry block so `help(diff_diff)` surfaces the agent entry points automatically. - Silent-operation warnings so agents and humans see the same signals at the same time. **Next blocks toward the vision.** -- **Post-hoc mismatch detection in BR/DR output** - surfaces structured warnings like "you fit TWFE on staggered data with 37% forbidden-comparison weights" when the profile and the fitted estimator disagree. Safety net, not a pre-emptive rules engine. -- **Structured `sanity_checks` block in BR/DR** - machine-legible pass / warn / fail signals (pretrends, power, forbidden-comparisons, event-study cleanliness, placebo, sensitivity) so agents can dispatch on a stable schema rather than parsing prose. +- **Structured `sanity_checks` block in BR/DR** - machine-legible pass / warn / fail signals for pretrends, power, forbidden-comparisons, event-study cleanliness, placebo, and sensitivity, so agents dispatch on a stable schema rather than parsing prose. Highest-leverage net-new agent decision surface; orthogonal to existing `caveats` and to fit-time validators. +- **Post-hoc mismatch detection in BR/DR output** - originally proposed as Wave 2 but rescoped after a plan review showed most candidate checks duplicate fit-time validators (which raise `ValueError` before any fitted result exists) or the existing `caveats` block (TWFE-on-staggered is already surfaced via `bacon_contamination`). Held for revisiting only if the `sanity_checks` rollout uncovers genuine post-fit mismatch signals not caught by current surfaces. - **Context-aware `practitioner_next_steps()`** that substitutes actual column names - turns guidance into executable recommendations. - **Unified `assess_*` verb** across estimator native-diagnostic methods for a single discoverable convention. - **End-to-end scenario walkthrough templates** - reusable orchestration recipes an agent can adapt from data ingest through business-ready output. diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py index 0247eecd..661fa775 100644 --- a/diff_diff/__init__.py +++ b/diff_diff/__init__.py @@ -250,7 +250,13 @@ DiagnosticReportResults, ) from diff_diff._guides_api import get_llm_guide -from diff_diff.profile import Alert, PanelProfile, profile_panel +from diff_diff.profile import ( + Alert, + OutcomeShape, + PanelProfile, + TreatmentDoseShape, + profile_panel, +) from diff_diff.datasets import ( clear_cache, list_datasets, @@ -498,6 +504,8 @@ "profile_panel", "PanelProfile", "Alert", + "OutcomeShape", + "TreatmentDoseShape", # LLM guide accessor "get_llm_guide", ] diff --git a/diff_diff/guides/llms-autonomous.txt b/diff_diff/guides/llms-autonomous.txt index b571911e..a320ac71 100644 --- a/diff_diff/guides/llms-autonomous.txt +++ b/diff_diff/guides/llms-autonomous.txt @@ -26,10 +26,11 @@ discusses and that this guide does not pretend to resolve. - §2. PanelProfile field reference - §3. Estimator-support matrix - §4. Estimator-choice reasoning by design feature -- §5. Post-fit validation utilities -- §6. How to read BusinessReport / DiagnosticReport output -- §7. Glossary + citations -- §8. Intentional omissions +- §5. Worked examples +- §6. Post-fit validation utilities +- §7. How to read BusinessReport / DiagnosticReport output +- §8. Glossary + citations +- §9. Intentional omissions ## §1. What this guide is (and is not) @@ -37,14 +38,16 @@ discusses and that this guide does not pretend to resolve. **What it is.** A reference you consult after running `profile_panel()` and before calling any estimator's `fit()`. The matrix in §3 and the per-design- feature discussions in §4 tell you which estimators are well-suited to the -panel shape reported by the profile; the post-fit index in §5 tells you +panel shape reported by the profile; the worked examples in §5 walk through +several end-to-end PanelProfile -> reasoning -> validation flows; the +post-fit index in §6 tells you which diagnostics apply once you have a fitted result. **What it is not.** A deterministic recommender. No function in diff-diff returns "pick estimator X." This guide does not either. When several estimators fit a design, it enumerates them and names the trade-offs. The agent is responsible for weighing those trade-offs (often with the cited -references in §7) and justifying the choice in the final write-up. +references in §8) and justifying the choice in the final write-up. **Why this shape.** A rules-engine recommender would lock in a policy that ages poorly as new estimators land and as the applied-econometrics @@ -169,6 +172,128 @@ view. Every field below appears as a top-level key in that dict. outcome column is NaN, in `[0, 1]`. - **`outcome_summary: Mapping[str, float]`** - `{min, max, mean, std}` computed with NaN-skipping; empty for non-numeric outcomes. +- **`outcome_shape: Optional[OutcomeShape]`** - distributional facts for + numeric outcomes; `None` when the outcome dtype is non-numeric. Sub-fields: + - `n_distinct_values: int` - count of distinct non-NaN outcome values. + - `pct_zeros: float` - share of non-NaN observations equal to zero, + in `[0, 1]`. + - `value_min: float`, `value_max: float` - range of observed values. + - `skewness: Optional[float]` - sample skewness via the canonical + `m3 / std^3` form. `None` when `n_distinct_values < 3` or variance + is zero. + - `excess_kurtosis: Optional[float]` - `m4 / m2^2 - 3`, gated the + same way as `skewness`. + - `is_integer_valued: bool` - all non-NaN values are whole numbers + (covers integer dtype and floats that happen to be integer-valued). + - `is_count_like: bool` - heuristic for count-shaped outcomes: + `is_integer_valued AND pct_zeros > 0 AND skewness > 0.5 AND + n_distinct_values > 2 AND value_min >= 0`. When `True`, OLS + DiD imposes an additive functional form on a non-negative + count outcome (cluster-robust SEs are still calibrated, but + the model can be inefficient and may produce counterfactual + predictions outside the non-negative support); + `WooldridgeDiD(method="poisson")` (QMLE) is the multiplicative + (log-link) ETWFE alternative that respects the non-negative + support and matches the typical generative process for count + data, with QMLE sandwich SEs robust to distributional + misspecification. The Poisson fitter rejects negative outcomes + at fit time, which is why the heuristic gates on + `value_min >= 0`. See §5.3 for a worked example. + - `is_bounded_unit: bool` - all non-NaN values lie in `[0, 1]`. + When `True` and the linear-DiD point estimate is near the + boundary of feasible support, interpret with care (the linear + model can predict outside `[0, 1]`). +- **`treatment_dose: Optional[TreatmentDoseShape]`** - distributional + facts for continuous-treatment dose columns; `None` unless + `treatment_type == "continuous"`. Most sub-fields are descriptive + distributional context. **`profile_panel` does not see the + separate `first_treat` column** that `ContinuousDiD.fit()` + consumes: the estimator's actual fit-time gates key off + `first_treat` (defines never-treated controls as + `first_treat == 0`, force-zeroes nonzero `dose` on those rows + with a `UserWarning`, drops units where `first_treat > 0` AND + `dose == 0`, and rejects negative dose only among treated units + where `first_treat > 0`; see `continuous_did.py:276-327` and + `:348-360`). + + In the canonical `ContinuousDiD` setup (Callaway, Goodman-Bacon, + Sant'Anna 2024), the dose `D_i` is **time-invariant per unit** + (`D_i = 0` for never-treated, `D_i > 0` constant across all + periods for treated unit `i`) and `first_treat` is a **separate + column** the caller supplies — it is NOT derived from the dose + column (the dose column has no within-unit time variation in + this setup, so it cannot encode timing). Under the canonical + setup, several facts on the dose column predict + `ContinuousDiD.fit()` outcomes: + `has_never_treated == True` (proxy for `P(D=0) > 0` under both + `control_group="never_treated"` and + `control_group="not_yet_treated"`; Remark 3.1 + lowest-dose-as-control is not yet implemented); + `treatment_varies_within_unit == False` (the actual fit-time + gate, matching `ContinuousDiD.fit()`'s + `df.groupby(unit)[dose].nunique() > 1` rejection at line + 222-228; holds regardless of `first_treat`); `is_balanced == + True` (the actual fit-time gate at line 329-338); absence of the + `duplicate_unit_time_rows` alert (the precompute path silently + resolves duplicate cells via last-row-wins); and + `treatment_dose.dose_min > 0` (predicts the + strictly-positive-treated-dose requirement at line 287-294; + treated units carry their constant dose across all periods so + `dose_min` over non-zero values is the smallest treated dose). + + When `has_never_treated == False` (no zero-dose controls but + all observed doses non-negative), `ContinuousDiD` as currently + implemented does not apply (Remark 3.1 lowest-dose-as-control + is not implemented). Routing alternatives that do not require + `P(D=0) > 0`: `HeterogeneousAdoptionDiD` for graded-adoption + designs (HAD's own contract requires non-negative dose, which + this branch satisfies), or linear DiD with the treatment as a + continuous covariate. When `dose_min <= 0` (negative treated + doses), the situation is different: `ContinuousDiD` does not + apply, and `HeterogeneousAdoptionDiD` is **not** a fallback + either — HAD raises on negative post-period dose + (`had.py:1450-1459`). The applicable routing alternative on + the negative-dose branch is linear DiD with the treatment as + a signed continuous covariate. Re-encoding the treatment + column to a non-negative scale (shifting, absolute value, etc.) + is an agent-side preprocessing choice that changes the + estimand and is not documented in REGISTRY as a supported + fallback; if the agent does re-encode, both `ContinuousDiD` + and `HeterogeneousAdoptionDiD` become candidates again on the + re-encoded scale. Do not relabel positive- or negative-dose + units as `first_treat == 0`: that triggers the force-zero + coercion path, which is implementation behavior for + inconsistent inputs (e.g., an accidentally-nonzero row on a + never-treated unit), not a documented routing option. + + The agent must still validate the supplied `first_treat` column + independently: it must contain at least one `first_treat == 0` + unit (`P(D=0) > 0`), be non-negative integer-valued (or `+inf` / + 0 for never-treated), and be consistent with the dose column on + per-unit treated/untreated status. `profile_panel` does not see + `first_treat` and cannot validate it. See §5.2 for the worked + example. Sub-fields: + - `n_distinct_doses: int` - count of distinct non-NaN dose values + (including zero if observed). Useful supplement to the gate + checks for understanding the dose support. + - `has_zero_dose: bool` - at least one unit-period has dose + exactly zero. **Row-level fact**: a panel can have + `has_zero_dose == True` (some pre-treatment rows are zero) while + `has_never_treated == False` (every unit eventually treated), in + which case the panel still fails the ContinuousDiD never-treated + gate. Consult `has_never_treated` for the unit-level gate. + - `dose_min: float`, `dose_max: float`, `dose_mean: float` - + computed over the strictly non-zero doses; useful for + effect-size context and dose-response interpretation. As noted + above, under the standard workflow `dose_min > 0` is the + profile-side proxy for ContinuousDiD's strictly-positive- + treated-dose requirement. A continuous panel with negative + non-zero doses (e.g. `dose_min == -1.5`) labeled as + `first_treat > 0` would be rejected at fit time + (``continuous_did.py:287-294``); the same negative-dose units + labeled as `first_treat == 0` would be coerced to dose=0 with + a `UserWarning` instead. See §5.2 for the standard-workflow + walkthrough. ### Alerts @@ -294,6 +419,12 @@ estimator from the remaining rows (CS/SA/dCDH/Imputation/TwoStage/ Stacked/ETWFE all accept unbalanced input, with some caveats in their own docs). +For two common reasoning patterns walked through end-to-end (continuous +dose checked against the existing `has_never_treated` / +`treatment_varies_within_unit` / `is_balanced` gates with +`treatment_dose` providing descriptive context, and count-shaped +outcome with `outcome_shape` introspection), see §5.2 and §5.3. + ## §4. Estimator-choice reasoning by design feature @@ -419,16 +550,39 @@ worth considering. When `treatment_type == "continuous"`: - `ContinuousDiD` (Callaway, Goodman-Bacon, Sant'Anna 2024) - - continuous / dose-response treatment. **Three eligibility - prerequisites**: (a) zero-dose control units must exist - (`P(D=0) > 0`) because Remark 3.1 (lowest-dose-as-control) is not - yet implemented, (b) dose must be time-invariant per unit (rule out - panels where `PanelProfile.treatment_varies_within_unit == True`), - and (c) the panel must be balanced (`PanelProfile.is_balanced == - True`). `fit()` raises `ValueError` in any of the three cases. Note that - staggered adoption IS supported natively (adoption timing is - expressed via the `first_treat` column, not via within-unit dose - variation). The estimator exposes several dose-indexed targets that + continuous / dose-response treatment. The estimator's canonical + setup expects a **time-invariant unit dose** `D_i` (constant + across all periods for each unit, `0` for never-treated, `> 0` + for treated) and a **separate `first_treat` column** carrying + timing information — the dose column does not encode timing. + Under that canonical setup, five facts on the dose column predict + `fit()` outcomes (full discussion in the paragraph immediately + below): (a) zero-dose control units must exist + (`PanelProfile.has_never_treated == True`, proxying + `ContinuousDiD`'s `P(D=0) > 0` requirement under both + `control_group` options because Remark 3.1 lowest-dose-as-control + is not yet implemented); (b) dose must be time-invariant per unit + (rule out panels where + `PanelProfile.treatment_varies_within_unit == True`); (c) the + panel must be balanced (`PanelProfile.is_balanced == True`); + (d) no `duplicate_unit_time_rows` alert (the precompute path + silently resolves duplicate cells via last-row-wins); and (e) + strictly positive treated doses (`treatment_dose.dose_min > 0`). + `fit()` raises `ValueError` on (b) and (c) regardless of how + `first_treat` is constructed; duplicate rows in (d) are silently + overwritten with last-row-wins (a hard preflight veto, not a + fit-time raise — the agent must deduplicate before fitting); (a) + and (e) hold under the canonical setup. When (a) or (e) fails, + see §2 for the full routing-alternatives discussion (the two + branches differ: HAD applies on the no-never-treated branch but + not on the negative-dose branch, since HAD requires non-negative + dose support per `had.py:1450-1459`). + Note that staggered adoption IS supported natively (adoption + timing is expressed via the `first_treat` column, not via + within-unit dose variation), and `ContinuousDiD.fit()` applies + additional validation on the `first_treat` column itself — see + the paragraph below and §2 for the full list. The estimator exposes several dose-indexed targets + that require different assumptions: `ATT(d|d)` (effect of dose `d` on units that received `d`) and `ATT^{loc}` (binarized overall ATT) are identified under Parallel Trends; `ATT(d)` (full dose-response @@ -441,6 +595,12 @@ When `treatment_type == "continuous"`: scalar first-stage adoption summary. Useful when adoption is graded rather than binary. +See the `TreatmentDoseShape` field reference in §2 for the full +preflight-vs-gate breakdown and the routing-alternative discussion +when (a) or (e) fails. The remaining `treatment_dose` sub-fields +are descriptive context only; §5.2 walks through the screen -> fit +-> validation flow. + ### §4.8 Few treated units (one or a handful) When few treated units exist (not a separate `PanelProfile` field yet, @@ -540,8 +700,289 @@ docs explicitly say otherwise. When routing, also: the individual-respondent level, before interpreting the estimate as a group-time ATT. - -## §5. Post-fit validation utilities +### §4.11 Outcome-shape considerations + +The matrix in §3 routes by treatment shape. Outcome shape is a separate +axis: a panel that is binary-absorbing and staggered may still have a +count-shaped outcome (e.g., number of incidents per unit-period), and +on such an outcome linear DiD imposes an additive functional form that +can be inefficient and may produce counterfactual predictions outside +the non-negative support — even though the cluster-robust SEs remain +calibrated. The functional-form choice is what matters here, not SE +calibration. `PanelProfile.outcome_shape` exposes the relevant facts: + +- `outcome_shape.is_count_like == True` (integer-valued, non-negative, + has zeros, right-skewed, more than two distinct values) - linear-OLS + DiD imposes an additive functional form on a non-negative count + outcome: estimates are unbiased for the linear ATT (and the + estimator already uses cluster-robust SEs that do not assume + Gaussian errors), but the linear model can be inefficient on count + data and can produce counterfactual predictions outside the + non-negative support. `WooldridgeDiD(method="poisson")` is the + multiplicative (log-link) ETWFE alternative — it respects the + non-negative support, matches the typical generative process for + count data, and uses QMLE sandwich SEs that are robust to + distributional misspecification (Wooldridge 2023). It estimates the + overall ATT as an ASF-based outcome-scale difference (per-cell + average of `E[exp(η_1)] - E[exp(η_0)]`; see REGISTRY.md + §WooldridgeDiD nonlinear / ASF path). The headline `overall_att` is + a difference on the outcome scale, NOT a multiplicative ratio; a + proportional interpretation can be derived post-hoc as + `overall_att / E[Y_0]` if desired but is not the estimator's + reported scalar. The choice between linear-OLS DiD and Poisson + ETWFE is about which functional form (additive vs. multiplicative) + best summarizes the treatment effect on the count outcome, not + about whether SEs are calibrated. The shape field flags the + consideration; §5.3 walks through this pattern with a concrete + profile. +- `outcome_shape.is_bounded_unit == True` (values in `[0, 1]`, + e.g. a proportion) - linear DiD can produce predictions outside + `[0, 1]` and inference at the boundary is questionable. No + estimator in the suite handles this differently from a numeric + outcome; flag the consideration in the write-up. +- `outcome_shape.is_integer_valued == True` without + `is_count_like` (e.g., 0/1 binary, ordinal Likert) - the binary + case has its own caveats (logit/log-odds alternative per Roth + and Sant'Anna 2023). Ordinal outcomes generally need a + domain-specific design that the current suite does not provide. + + +## §5. Worked examples + +Each subsection shows a realistic `profile_panel` output, traces the +agent reasoning that maps it to an estimator (or rules estimators out), +and points at the validation step. Examples are illustrative: they do +not exhaust the design space and they do not collapse a multi-path +choice to a single mandated answer. + +### §5.1 Binary staggered panel with never-treated controls + +A long panel of 200 stores observed across 20 quarters, with treatment +applied to subsets of stores in three different quarters and a fourth +group never treated. `profile_panel(...)` returns: + +``` +PanelProfile( + n_units=200, n_periods=20, n_obs=4000, + is_balanced=True, observation_coverage=1.0, + treatment_type="binary_absorbing", + is_staggered=True, n_cohorts=3, + cohort_sizes={5: 40, 9: 35, 13: 45}, + has_never_treated=True, has_always_treated=False, + treatment_varies_within_unit=True, + first_treatment_period=5, last_treatment_period=13, + min_pre_periods=4, min_post_periods=7, + outcome_dtype="float64", outcome_is_binary=False, + outcome_has_zeros=False, outcome_has_negatives=False, + outcome_missing_fraction=0.0, + outcome_summary={"min": 12.4, "max": 88.1, + "mean": 47.3, "std": 14.2}, + outcome_shape=OutcomeShape( + n_distinct_values=2841, pct_zeros=0.0, + value_min=12.4, value_max=88.1, + skewness=0.18, excess_kurtosis=-0.41, + is_integer_valued=False, + is_count_like=False, is_bounded_unit=False, + ), + treatment_dose=None, + alerts=(), +) +``` + +Reasoning chain: + +1. `treatment_type == "binary_absorbing"` and `is_staggered == True` -> + §3 row narrows to the staggered-robust set: `CallawaySantAnna`, + `SunAbraham`, `ChaisemartinDHaultfoeuille`, `ImputationDiD`, + `TwoStageDiD`, `StackedDiD`, `WooldridgeDiD` (ETWFE), + `EfficientDiD`. `TwoWayFixedEffects` is `warn` per the §3 footnote + on cohort weights. `SyntheticDiD` is `✗` on staggered (block + treatment only). `ContinuousDiD` and `HeterogeneousAdoptionDiD` + are out (binary). +2. `has_never_treated == True` AND `n_cohorts == 3` (multi-cohort) -> + `CallawaySantAnna(control_group="never_treated")` and `SunAbraham` + are both well-suited; the never-treated controls preserve power. + `EfficientDiD` (Hausman-pretested between PT-All and PT-Post) is + another applicable path with the same control set. +3. `min_pre_periods == 4` -> parallel-trends and event-study pretests + have meaningful power; no `short_pre_panel` alert fires. +4. Pick `CallawaySantAnna(control_group="never_treated")` for the + group-time ATT decomposition; fit; then validate via + `compute_pretrends_power(results)` and + `compute_honest_did(results)` before reporting through + `BusinessReport(results, data=df)`. + +### §5.2 Continuous-dose panel with zero-dose controls + +A panel of 100 firms observed across 6 years, with 20 untreated firms +(dose 0 in every period), 30 firms at dose 1.0 (in every period), +30 at dose 2.5 (in every period), and 20 at dose 4.0 (in every +period). The dose column is time-invariant per unit; adoption timing +is carried separately via the `first_treat` column passed to +`ContinuousDiD.fit()` (e.g. `first_treat=3` for treated firms, +`first_treat=0` for the never-treated). `profile_panel(...)` returns +the relevant facts: + +``` +PanelProfile( + treatment_type="continuous", + treatment_varies_within_unit=False, + has_never_treated=True, + is_balanced=True, + treatment_dose=TreatmentDoseShape( + n_distinct_doses=4, + has_zero_dose=True, + dose_min=1.0, dose_max=4.0, dose_mean=2.4, + ), + outcome_shape=OutcomeShape( + is_count_like=False, is_bounded_unit=False, ... + ), + alerts=(), +) +``` + +Reasoning chain: + +1. `treatment_type == "continuous"` -> §3 row narrows to + `ContinuousDiD` (`✓`) and `HeterogeneousAdoptionDiD` (`partial`, + for graded adoption). All other estimators are `✗` on continuous. +2. The example matches the canonical `ContinuousDiD` setup + (per-unit time-invariant `D_i`; `first_treat` will be a + separate column the caller supplies, NOT derived from the dose + column). On the dose column alone, profile_panel exposes five + facts that predict `fit()` outcomes under that canonical + setup: `has_never_treated == True` (proxy for `P(D=0) > 0` + under both `control_group` options, since Remark 3.1 + lowest-dose-as-control is not yet implemented), + `treatment_varies_within_unit == False` (the actual fit-time + gate matching `ContinuousDiD.fit()`'s + `df.groupby(unit)[dose].nunique() > 1` rejection at line + 222-228; not first_treat-dependent), `is_balanced == True` + (actual fit-time gate at line 329-338), absence of a + `duplicate_unit_time_rows` alert (the precompute path silently + resolves duplicate cells via last-row-wins; the agent must + deduplicate before fit), and `treatment_dose.dose_min > 0` + (predicts the strictly-positive-treated-dose requirement at + line 287-294 because treated units carry their constant dose + across all periods so `dose_min` over non-zero values is the + smallest treated dose). All five pass (`dose_min == 1.0 > 0`), + so `ContinuousDiD` is a candidate. The remaining + `treatment_dose` sub-fields (`n_distinct_doses`, + `has_zero_dose`, `dose_max`, `dose_mean`) provide descriptive + context — useful for reasoning about dose support and the + eventual dose-response interpretation, but not themselves + preflight checks. + + See §2 `TreatmentDoseShape` for the full preflight-vs-gate + breakdown and the explicit warning against relabeling-to- + manufacture-controls. `fit()` also rejects NaN `first_treat` + rows, recodes `+inf` to 0 with a `UserWarning`, rejects negative + `first_treat`, and drops units with `first_treat > 0` AND + `dose == 0`. +3. Counter-example: had `treatment_varies_within_unit == True` (any + unit's full dose path - including pre-treatment zeros - has more + than one distinct value, e.g., a `0,0,d,d` adoption path with + varying nonzero `d`), `ContinuousDiD` would not apply. The two + paths from there are (a) `HeterogeneousAdoptionDiD` if a scalar + adoption summary fits, or (b) aggregate the dose to a binary + indicator and fall back to a binary staggered estimator. +4. Counter-example: had `has_never_treated == False` (every unit + eventually treated, even if some pre-treatment rows have zero + dose so `treatment_dose.has_zero_dose == True`), + `ContinuousDiD.fit()` would reject the panel under both + `control_group="never_treated"` and + `control_group="not_yet_treated"` because Remark 3.1 + lowest-dose-as-control is not yet implemented. On this branch + (no never-treated controls but doses still non-negative), + `HeterogeneousAdoptionDiD` IS a routing alternative for + graded-adoption designs, and linear DiD with the treatment as + a continuous covariate is another; see §2 for the full routing + discussion. +5. Counter-example: had `treatment_dose.dose_min < 0` (continuous + panel with some negative-valued treated doses, e.g. a + centered-around-zero treatment encoding), with a `first_treat` + column consistent with the dose column, `ContinuousDiD.fit()` + would raise at line 287-294 ("Dose must be strictly positive + for treated units"). `HeterogeneousAdoptionDiD` is **not** a + routing alternative here either — HAD requires non-negative + dose support (`had.py:1450-1459`, paper Section 2). The + applicable alternative is linear DiD with the treatment as a + signed continuous covariate; see §2 for the full routing + discussion. +6. Fit `ContinuousDiD`; the result object exposes the dose-response + curve (`ATT(d)`) and average causal response (`ACRT(d)`); choose + the headline estimand based on the business question (overall + ATT under PT, or the dose-response curve under Strong PT). + +### §5.3 Count-shaped outcome on a binary-staggered panel + +A panel of 300 retail outlets observed across 12 months, with three +adoption cohorts. The outcome is "number of customer complaints per +outlet per month" - integer-valued, lots of zero months, right-skewed. +`profile_panel(...)` returns: + +``` +PanelProfile( + treatment_type="binary_absorbing", is_staggered=True, + has_never_treated=True, n_cohorts=3, + outcome_dtype="int64", outcome_is_binary=False, + outcome_has_zeros=True, outcome_has_negatives=False, + outcome_summary={"min": 0, "max": 47, "mean": 1.8, "std": 3.2}, + outcome_shape=OutcomeShape( + n_distinct_values=18, pct_zeros=0.43, + value_min=0, value_max=47, + skewness=2.1, excess_kurtosis=6.4, + is_integer_valued=True, + is_count_like=True, + is_bounded_unit=False, + ), + treatment_dose=None, + alerts=(), +) +``` + +Reasoning chain: + +1. Same staggered-binary narrowing as §5.1 (CS/SA/dCDH/Imputation/ + TwoStage/Stacked/ETWFE/EfficientDiD applicable). +2. `outcome_shape.is_count_like == True` AND + `outcome_shape.value_min >= 0` -> linear-OLS DiD imposes an + additive functional form on a non-negative count outcome: + estimates are unbiased for the linear ATT and the implementation + already uses cluster-robust SEs that do not assume Gaussian + errors. The trade-off is functional-form / efficiency, not + inference calibration: the linear model can be inefficient on + count data and may produce counterfactual predictions outside the + non-negative support, while `WooldridgeDiD(method="poisson")` + (QMLE) imposes a multiplicative (log-link) functional form that + respects the non-negative support and matches the typical + generative process for count data. It estimates the overall ATT + as an ASF-based outcome-scale difference: the per-cell average of + `E[exp(η_1)] - E[exp(η_0)]` (Wooldridge 2023; see REGISTRY.md + §WooldridgeDiD nonlinear / ASF path), with QMLE sandwich SEs that + are robust to distributional misspecification. The Poisson fitter + hard-rejects negative outcomes (`y < 0` raises `ValueError` at + line ~1105 of `wooldridge.py`), which is why `is_count_like` + gates on `value_min >= 0`. +3. Decision: fit `WooldridgeDiD(method="poisson")` if you want a + multiplicative effect summary (with the outcome-scale headline + reported as a difference; a percent-change reading can be derived + post-hoc as `overall_att / E[Y_0]`). Fit linear-OLS DiD if the + additive ATT is the right summary and counterfactual predictions + stay safely within the non-negative support — the cluster-robust + SEs are calibrated either way. Document which functional form + the headline reflects either way; the two estimands are on the + same outcome scale but parameterize the treatment effect + differently. +4. Caveat: when the outcome includes structural zeros that violate + Poisson conditional moments (overdispersion), consider negative + binomial QMLE or a hurdle model; the current suite does not + provide these natively, but the linear DiD with cluster-robust + SEs remains defensible at sufficient sample size. The shape + field flags the consideration; the choice is yours. + + +## §6. Post-fit validation utilities After any `fit()`, the Baker et al. (2025) 8-step workflow recommends a diagnostic sequence. The library exposes utilities covering each step. @@ -625,7 +1066,7 @@ Event-study plots are also a diagnostic - pre-treatment coefficients close to zero support parallel trends. -## §6. How to read BusinessReport / DiagnosticReport output +## §7. How to read BusinessReport / DiagnosticReport output `BusinessReport(results)` and `DiagnosticReport(results)` are experimental in the 3.2 line. Their schema is versioned (`BUSINESS_REPORT_SCHEMA_VERSION` @@ -728,7 +1169,7 @@ Forthcoming schema additions (not yet shipped): a top-level queued for a later wave. Treat their current absence as expected. -## §7. Glossary + citations +## §8. Glossary + citations **ATT**: Average Treatment Effect on the Treated. The target parameter of most DiD estimators. @@ -820,14 +1261,14 @@ or the outcome model is correctly specified. (Callaway-Sant'Anna). -## §8. Intentional omissions +## §9. Intentional omissions This guide does **not**: - Recommend a specific estimator for a specific dataset. When multiple estimators fit, §4 lists them and names the trade-offs; the choice is the agent's. -- Enumerate every possible design edge case. The literature cited in §7 +- Enumerate every possible design edge case. The literature cited in §8 covers them; this guide is a navigation aid, not a substitute. - Promise forward-compatibility of the BR / DR schema or the alert catalogue. Treat these as experimental until the 12-item foundation- @@ -840,5 +1281,5 @@ This guide does **not**: treated unit). When those apply, point the user at dedicated libraries. -**If in doubt, consult the primary references in §7 and use +**If in doubt, consult the primary references in §8 and use `get_llm_guide("practitioner")` for the Baker et al. workflow.** diff --git a/diff_diff/profile.py b/diff_diff/profile.py index b343f9d4..62c6b4b8 100644 --- a/diff_diff/profile.py +++ b/diff_diff/profile.py @@ -40,6 +40,127 @@ class Alert: observed: Any +@dataclass(frozen=True) +class OutcomeShape: + """Distributional shape of a numeric outcome column. + + Populated on :class:`PanelProfile` when the outcome dtype is integer or + float (``np.dtype(...).kind in {"i", "u", "f"}``); ``None`` otherwise. + Descriptive only — these fields surface what is observed in the outcome + distribution. They never recommend a specific estimator family. + """ + + n_distinct_values: int + pct_zeros: float + value_min: float + value_max: float + skewness: Optional[float] + excess_kurtosis: Optional[float] + is_integer_valued: bool + is_count_like: bool + is_bounded_unit: bool + + +@dataclass(frozen=True) +class TreatmentDoseShape: + """Distributional shape of a continuous treatment dose. + + Populated on :class:`PanelProfile` only when ``treatment_type == + "continuous"``; ``None`` otherwise. Most fields are descriptive + distributional context. + + **profile_panel only sees the dose column**, not the separate + ``first_treat`` column ``ContinuousDiD.fit()`` consumes. In the + canonical ``ContinuousDiD`` setup (Callaway, Goodman-Bacon, + Sant'Anna 2024) the dose ``D_i`` is **time-invariant per unit** + (``D_i = 0`` for never-treated, ``D_i > 0`` constant across all + periods for treated unit i) and ``first_treat`` is a **separate + column** the caller supplies — not derived from the dose column. + Under that canonical setup, several profile-side facts on the + dose column predict ``ContinuousDiD.fit()`` outcomes: + + 1. ``PanelProfile.has_never_treated == True`` (some unit has + dose 0 in every period). Predicts the estimator's + ``P(D=0) > 0`` requirement under both + ``control_group="never_treated"`` and + ``control_group="not_yet_treated"`` (Remark 3.1 + lowest-dose-as-control not yet implemented), because the + canonical setup ties ``first_treat == 0`` to ``D_i == 0``. + Failure means no never-treated controls exist on the dose + column; see routing notes below. + 2. ``PanelProfile.treatment_varies_within_unit == False`` + (per-unit full-path dose constancy on the dose column). This + IS the actual fit-time gate, matching + ``ContinuousDiD.fit()``'s + ``df.groupby(unit)[dose].nunique() > 1`` rejection at line + 222-228; holds regardless of ``first_treat``. ``True`` rules + ``ContinuousDiD`` out — for graded-adoption panels with + dose changes use ``HeterogeneousAdoptionDiD``. + 3. ``PanelProfile.is_balanced == True``. Actual fit-time gate + (``continuous_did.py:329-338``); not ``first_treat``-dependent. + 4. Absence of the ``duplicate_unit_time_rows`` alert. The + precompute path silently resolves duplicate ``(unit, time)`` + cells via last-row-wins (``continuous_did.py:818-823``); + **not** a fit-time raise. The agent must deduplicate before + fit because ``ContinuousDiD`` will otherwise overwrite + silently. + 5. ``treatment_dose.dose_min > 0`` (over non-zero doses). + Predicts ``ContinuousDiD.fit()``'s strictly-positive-treated- + dose requirement (raises ``ValueError`` on negative dose for + ``first_treat > 0`` units, ``continuous_did.py:287-294``). + Failure means some treated units have negative dose; see + routing notes below. + + Routing alternatives when (1) or (5) fails: + + - When (1) fails (no never-treated controls but all observed + doses non-negative): ``ContinuousDiD`` does not apply (Remark + 3.1 lowest-dose-as-control is not implemented). + ``HeterogeneousAdoptionDiD`` IS a candidate for graded-adoption + designs (HAD's contract requires non-negative dose, satisfied + here); linear DiD with the treatment as a continuous covariate + is another. + - When (5) fails (negative treated doses): + ``HeterogeneousAdoptionDiD`` is **not** a fallback either — + HAD raises on negative post-period dose (``had.py:1450-1459``, + paper Section 2). Linear DiD with the treatment as a signed + continuous covariate is the applicable routing alternative. + - Re-encoding the treatment column (shifting, absolute value, + etc.) is an agent-side preprocessing choice that changes the + estimand and is not documented in REGISTRY as a supported + fallback; if the agent re-encodes to non-negative support, + both ``ContinuousDiD`` and ``HeterogeneousAdoptionDiD`` + become candidates again on the re-encoded scale. + - Do **not** relabel positive- or negative-dose units as + ``first_treat == 0``: that triggers ``ContinuousDiD.fit()``'s + force-zero coercion path, which is implementation behavior + for inconsistent inputs (e.g., an accidentally-nonzero row on + a never-treated unit), not a documented routing option. + + The agent must still validate the supplied ``first_treat`` + column independently: it must contain at least one + ``first_treat == 0`` unit (``P(D=0) > 0``), be non-negative + integer-valued (or ``+inf`` / 0 for never-treated), and be + consistent with the dose column on per-unit treated/untreated + status. ``profile_panel`` does not see ``first_treat`` and + cannot validate it. + + ``has_zero_dose`` is a row-level fact ("at least one observation has + dose == 0"); it is NOT a substitute for ``has_never_treated``, which + is the unit-level field. A panel can have ``has_zero_dose == True`` + (pre-treatment zero rows) while ``has_never_treated == False`` (every + unit eventually treated), in which case the standard-workflow agent + would conclude no never-treated controls exist before calling + ``ContinuousDiD.fit()``. + """ + + n_distinct_doses: int + has_zero_dose: bool + dose_min: float + dose_max: float + dose_mean: float + + @dataclass(frozen=True) class PanelProfile: """Structural facts about a DiD panel. @@ -78,6 +199,12 @@ class PanelProfile: alerts: Tuple[Alert, ...] + # Wave 2 additions are kept defaulted so direct PanelProfile(...) + # construction by external callers does not break when the new + # fields are not supplied. profile_panel() always populates both. + outcome_shape: Optional[OutcomeShape] = None + treatment_dose: Optional[TreatmentDoseShape] = None + def to_dict(self) -> Dict[str, Any]: """Return a JSON-serializable dict representation of the profile.""" return { @@ -103,6 +230,40 @@ def to_dict(self) -> Dict[str, Any]: "outcome_has_negatives": self.outcome_has_negatives, "outcome_missing_fraction": self.outcome_missing_fraction, "outcome_summary": {k: float(v) for k, v in self.outcome_summary.items()}, + "outcome_shape": ( + None + if self.outcome_shape is None + else { + "n_distinct_values": int(self.outcome_shape.n_distinct_values), + "pct_zeros": float(self.outcome_shape.pct_zeros), + "value_min": float(self.outcome_shape.value_min), + "value_max": float(self.outcome_shape.value_max), + "skewness": ( + None + if self.outcome_shape.skewness is None + else float(self.outcome_shape.skewness) + ), + "excess_kurtosis": ( + None + if self.outcome_shape.excess_kurtosis is None + else float(self.outcome_shape.excess_kurtosis) + ), + "is_integer_valued": bool(self.outcome_shape.is_integer_valued), + "is_count_like": bool(self.outcome_shape.is_count_like), + "is_bounded_unit": bool(self.outcome_shape.is_bounded_unit), + } + ), + "treatment_dose": ( + None + if self.treatment_dose is None + else { + "n_distinct_doses": int(self.treatment_dose.n_distinct_doses), + "has_zero_dose": bool(self.treatment_dose.has_zero_dose), + "dose_min": float(self.treatment_dose.dose_min), + "dose_max": float(self.treatment_dose.dose_max), + "dose_mean": float(self.treatment_dose.dose_mean), + } + ), "alerts": [ { "code": a.code, @@ -281,6 +442,8 @@ def profile_panel( outcome_summary = _summarize_outcome(valid) dtype_kind = getattr(outcome_col.dtype, "kind", "O") + outcome_shape = _compute_outcome_shape(valid, dtype_kind) + treatment_dose = _compute_treatment_dose(df, treatment=treatment, treatment_type=treatment_type) alerts = _compute_alerts( n_periods=n_periods, observation_coverage=observation_coverage, @@ -318,6 +481,8 @@ def profile_panel( outcome_has_negatives=outcome_has_negatives, outcome_missing_fraction=outcome_missing_fraction, outcome_summary=outcome_summary, + outcome_shape=outcome_shape, + treatment_dose=treatment_dose, alerts=tuple(alerts), ) @@ -370,11 +535,19 @@ def _classify_treatment( # has_never_treated has a single well-defined meaning across binary # and continuous numeric treatment: some unit has treatment == 0 in # every observed non-NaN row. For binary this is the clean-control - # group; for continuous this is the zero-dose control required by - # ContinuousDiD (P(D=0) > 0). + # group; for continuous this is the zero-dose control proxy used + # alongside ContinuousDiD's `P(D=0) > 0` requirement. + # + # On binary panels (values in {0, 1}), `unit_max == 0` is equivalent + # to "all observed values are 0". On continuous panels with negative + # dose support, that equivalence breaks: a unit with path `[-1, 0]` + # would falsely satisfy `unit_max == 0` even though no row is a + # zero-dose control. Check both endpoints (`unit_max == 0` AND + # `unit_min == 0`) to enforce the documented contract — for any + # numeric panel this is exactly "every observed dose is 0". unit_max = df.groupby(unit)[treatment].max().to_numpy() unit_min = df.groupby(unit)[treatment].min().to_numpy() - has_never_treated = bool(np.any(unit_max == 0)) + has_never_treated = bool(np.any((unit_max == 0) & (unit_min == 0))) is_binary_valued = values_set <= {0, 1, 0.0, 1.0} # has_always_treated has binary-only semantics: "unit is treated in @@ -507,6 +680,117 @@ def _summarize_outcome(valid: pd.Series) -> Dict[str, float]: } +def _compute_outcome_shape(valid: pd.Series, outcome_dtype_kind: str) -> Optional[OutcomeShape]: + """Compute distributional shape for a numeric outcome. + + Returns ``None`` for non-numeric dtypes (kind not in ``{"i", "u", "f"}``) + or empty series. Skewness and excess kurtosis are gated on + ``n_distinct_values >= 3`` and non-zero variance; otherwise ``None``. + """ + if outcome_dtype_kind not in {"i", "u", "f"}: + return None + if len(valid) == 0: + return None + + arr = np.asarray(valid, dtype=float) + n = int(arr.size) + n_distinct = int(np.unique(arr).size) + pct_zeros = float(np.sum(arr == 0)) / n + value_min = float(arr.min()) + value_max = float(arr.max()) + + # Tolerance-aware integer detection: a CSV-roundtripped count column + # may carry float64 representation noise (e.g., 1.0 stored as + # 1.0000000000000002), and that should still classify as + # integer-valued for the purpose of the count-like heuristic. + is_integer_valued = bool(np.all(np.isclose(arr, np.round(arr), rtol=0.0, atol=1e-12))) + is_bounded_unit = bool(np.all((arr >= 0.0) & (arr <= 1.0))) + + skewness: Optional[float] = None + excess_kurtosis: Optional[float] = None + if n_distinct >= 3: + mean = float(np.mean(arr)) + m2 = float(np.mean((arr - mean) ** 2)) + if m2 > 0.0: + std = m2**0.5 + m3 = float(np.mean((arr - mean) ** 3)) + m4 = float(np.mean((arr - mean) ** 4)) + skewness = float(m3 / (std**3)) + excess_kurtosis = float(m4 / (m2**2) - 3.0) + + # Non-negativity is part of the contract: `is_count_like == True` + # is the routing signal toward `WooldridgeDiD(method="poisson")`, + # which hard-rejects negative outcomes at fit time + # (`wooldridge.py:1105` raises `ValueError` on `y < 0`). Without the + # `value_min >= 0` guard, a right-skewed integer outcome with zeros + # and some negatives could set `is_count_like=True` and steer an + # agent toward an estimator that will then refuse to fit. + is_count_like = bool( + is_integer_valued + and pct_zeros > 0.0 + and skewness is not None + and skewness > 0.5 + and n_distinct > 2 + and value_min >= 0.0 + ) + + return OutcomeShape( + n_distinct_values=n_distinct, + pct_zeros=pct_zeros, + value_min=value_min, + value_max=value_max, + skewness=skewness, + excess_kurtosis=excess_kurtosis, + is_integer_valued=is_integer_valued, + is_count_like=is_count_like, + is_bounded_unit=is_bounded_unit, + ) + + +def _compute_treatment_dose( + df: pd.DataFrame, + *, + treatment: str, + treatment_type: str, +) -> Optional[TreatmentDoseShape]: + """Compute distributional shape for a continuous-treatment dose column. + + Returns ``None`` unless ``treatment_type == "continuous"``. Most + fields are descriptive distributional context; ``dose_min > 0`` + is one of the profile-side screening checks for ``ContinuousDiD`` + (see :class:`TreatmentDoseShape` docstring for the full screening + set and the ``first_treat`` validation that + ``ContinuousDiD.fit()`` applies separately). + """ + if treatment_type != "continuous": + return None + + col = df[treatment].dropna() + if col.empty: + return None + + n_distinct_doses = int(col.nunique()) + has_zero_dose = bool((col == 0).any()) + + # `treatment_type == "continuous"` is reached only when the + # treatment column has more than two distinct values OR a 2-valued + # numeric outside `{0, 1}` (see `_classify_treatment`). An all-zero + # numeric column is classified as `binary_absorbing` and never + # reaches this branch, so `nonzero` is guaranteed non-empty. + nonzero = col[col != 0] + dose_min = float(nonzero.min()) + dose_max = float(nonzero.max()) + dose_mean = float(nonzero.mean()) + + return TreatmentDoseShape( + n_distinct_doses=n_distinct_doses, + has_zero_dose=has_zero_dose, + dose_min=dose_min, + dose_max=dose_max, + dose_mean=dose_mean, + ) + + def _compute_alerts( *, n_periods: int, diff --git a/tests/test_continuous_did.py b/tests/test_continuous_did.py index 610ca07d..bc4d025b 100644 --- a/tests/test_continuous_did.py +++ b/tests/test_continuous_did.py @@ -166,9 +166,7 @@ def flaky_bspline(knots, c, degree): with patch.object(bspline_mod, "BSpline", side_effect=flaky_bspline): with _w.catch_warnings(record=True) as caught: _w.simplefilter("always") - dB = bspline_derivative_design_matrix( - x, knots, degree=deg, include_intercept=True - ) + dB = bspline_derivative_design_matrix(x, knots, degree=deg, include_intercept=True) deriv_warnings = [ w for w in caught if "B-spline derivative construction failed" in str(w.message) @@ -192,9 +190,7 @@ def flaky_bspline(knots, c, degree): # (except columns 1 and 3 which were forced to zero). This guards # a regression that would zero or corrupt the entire derivative # matrix on any ValueError. - dB_baseline = bspline_derivative_design_matrix( - x, knots, degree=deg, include_intercept=True - ) + dB_baseline = bspline_derivative_design_matrix(x, knots, degree=deg, include_intercept=True) for col in range(dB.shape[1]): if col in (1, 3): continue @@ -290,13 +286,15 @@ def test_missing_column(self): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") def test_non_time_invariant_dose(self): - data = pd.DataFrame({ - "unit": [1, 1, 2, 2], - "period": [1, 2, 1, 2], - "outcome": [1.0, 2.0, 1.0, 2.0], - "first_treat": [2, 2, 0, 0], - "dose": [1.0, 2.0, 0.0, 0.0], # Dose changes over time! - }) + data = pd.DataFrame( + { + "unit": [1, 1, 2, 2], + "period": [1, 2, 1, 2], + "outcome": [1.0, 2.0, 1.0, 2.0], + "first_treat": [2, 2, 0, 0], + "dose": [1.0, 2.0, 0.0, 0.0], # Dose changes over time! + } + ) est = ContinuousDiD() with pytest.raises(ValueError, match="time-invariant"): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") @@ -307,18 +305,24 @@ def test_drop_zero_dose_treated(self): rows = [] uid = 0 # 1 treated unit with zero dose (should be dropped) - rows += [{"unit": uid, "period": 1, "outcome": 1.0, "first_treat": 2, "dose": 0.0}, - {"unit": uid, "period": 2, "outcome": 3.0, "first_treat": 2, "dose": 0.0}] + rows += [ + {"unit": uid, "period": 1, "outcome": 1.0, "first_treat": 2, "dose": 0.0}, + {"unit": uid, "period": 2, "outcome": 3.0, "first_treat": 2, "dose": 0.0}, + ] uid += 1 # 4 treated units with positive dose (should remain) for d in [1.0, 2.0, 3.0, 4.0]: - rows += [{"unit": uid, "period": 1, "outcome": 0.0, "first_treat": 2, "dose": d}, - {"unit": uid, "period": 2, "outcome": 2 * d, "first_treat": 2, "dose": d}] + rows += [ + {"unit": uid, "period": 1, "outcome": 0.0, "first_treat": 2, "dose": d}, + {"unit": uid, "period": 2, "outcome": 2 * d, "first_treat": 2, "dose": d}, + ] uid += 1 # 3 control units for _ in range(3): - rows += [{"unit": uid, "period": 1, "outcome": 0.0, "first_treat": 0, "dose": 0.0}, - {"unit": uid, "period": 2, "outcome": 0.0, "first_treat": 0, "dose": 0.0}] + rows += [ + {"unit": uid, "period": 1, "outcome": 0.0, "first_treat": 0, "dose": 0.0}, + {"unit": uid, "period": 2, "outcome": 0.0, "first_treat": 0, "dose": 0.0}, + ] uid += 1 data = pd.DataFrame(rows) @@ -329,52 +333,61 @@ def test_drop_zero_dose_treated(self): assert results.n_treated_units == 4 def test_unbalanced_panel_error(self): - data = pd.DataFrame({ - "unit": [1, 1, 2], - "period": [1, 2, 1], - "outcome": [1.0, 2.0, 1.0], - "first_treat": [2, 2, 0], - "dose": [1.0, 1.0, 0.0], - }) + data = pd.DataFrame( + { + "unit": [1, 1, 2], + "period": [1, 2, 1], + "outcome": [1.0, 2.0, 1.0], + "first_treat": [2, 2, 0], + "dose": [1.0, 1.0, 0.0], + } + ) est = ContinuousDiD() with pytest.raises(ValueError, match="[Uu]nbalanced"): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") def test_unbalanced_panel_same_count_different_periods(self): """Units with same period count but different periods should be caught.""" - data = pd.DataFrame({ - "unit": [1, 1, 1, 2, 2, 2], - "period": [1, 2, 3, 1, 2, 4], # Same count (3) but unit 2 has {1,2,4} vs {1,2,3} - "outcome": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0], - "first_treat": [2, 2, 2, 0, 0, 0], - "dose": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0], - }) + data = pd.DataFrame( + { + "unit": [1, 1, 1, 2, 2, 2], + "period": [1, 2, 3, 1, 2, 4], # Same count (3) but unit 2 has {1,2,4} vs {1,2,3} + "outcome": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0], + "first_treat": [2, 2, 2, 0, 0, 0], + "dose": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0], + } + ) est = ContinuousDiD() with pytest.raises(ValueError, match="[Uu]nbalanced"): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") def test_invalid_aggregate_raises(self): """Invalid aggregate value should raise ValueError.""" - data = pd.DataFrame({ - "unit": [1, 1, 2, 2], - "period": [1, 2, 1, 2], - "outcome": [1.0, 2.0, 1.0, 2.0], - "first_treat": [2, 2, 0, 0], - "dose": [1.0, 1.0, 0.0, 0.0], - }) + data = pd.DataFrame( + { + "unit": [1, 1, 2, 2], + "period": [1, 2, 1, 2], + "outcome": [1.0, 2.0, 1.0, 2.0], + "first_treat": [2, 2, 0, 0], + "dose": [1.0, 1.0, 0.0, 0.0], + } + ) est = ContinuousDiD() with pytest.raises(ValueError, match="Invalid aggregate"): - est.fit(data, "outcome", "unit", "period", "first_treat", "dose", - aggregate="event_study") + est.fit( + data, "outcome", "unit", "period", "first_treat", "dose", aggregate="event_study" + ) def test_no_never_treated_error(self): - data = pd.DataFrame({ - "unit": [1, 1, 2, 2], - "period": [1, 2, 1, 2], - "outcome": [1.0, 3.0, 1.0, 4.0], - "first_treat": [2, 2, 2, 2], - "dose": [1.0, 1.0, 2.0, 2.0], - }) + data = pd.DataFrame( + { + "unit": [1, 1, 2, 2], + "period": [1, 2, 1, 2], + "outcome": [1.0, 3.0, 1.0, 4.0], + "first_treat": [2, 2, 2, 2], + "dose": [1.0, 1.0, 2.0, 2.0], + } + ) est = ContinuousDiD(control_group="never_treated") with pytest.raises(ValueError, match="[Nn]ever-treated"): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") @@ -385,22 +398,16 @@ class TestContinuousDiDFit: @pytest.fixture def basic_data(self): - return generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5 - ) + return generate_continuous_did_data(n_units=100, n_periods=3, seed=42, noise_sd=0.5) def test_fit_returns_results(self, basic_data): est = ContinuousDiD() - results = est.fit( - basic_data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(basic_data, "outcome", "unit", "period", "first_treat", "dose") assert isinstance(results, ContinuousDiDResults) def test_dose_response_shapes(self, basic_data): est = ContinuousDiD() - results = est.fit( - basic_data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(basic_data, "outcome", "unit", "period", "first_treat", "dose") n_grid = len(results.dose_grid) assert results.dose_response_att.effects.shape == (n_grid,) assert results.dose_response_acrt.effects.shape == (n_grid,) @@ -409,17 +416,13 @@ def test_dose_response_shapes(self, basic_data): def test_overall_parameters(self, basic_data): est = ContinuousDiD() - results = est.fit( - basic_data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(basic_data, "outcome", "unit", "period", "first_treat", "dose") assert np.isfinite(results.overall_att) assert np.isfinite(results.overall_acrt) def test_group_time_effects_populated(self, basic_data): est = ContinuousDiD() - results = est.fit( - basic_data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(basic_data, "outcome", "unit", "period", "first_treat", "dose") assert len(results.group_time_effects) > 0 def test_results_contain_init_params(self, basic_data): @@ -431,9 +434,7 @@ def test_results_contain_init_params(self, basic_data): seed=123, rank_deficient_action="error", ) - results = est.fit( - basic_data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(basic_data, "outcome", "unit", "period", "first_treat", "dose") assert results.base_period == "universal" assert results.anticipation == 0 assert results.n_bootstrap == 49 @@ -443,12 +444,13 @@ def test_results_contain_init_params(self, basic_data): def test_not_yet_treated_control(self): data = generate_continuous_did_data( - n_units=100, n_periods=4, cohort_periods=[2, 3], seed=42, + n_units=100, + n_periods=4, + cohort_periods=[2, 3], + seed=42, ) est = ContinuousDiD(control_group="not_yet_treated") - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert isinstance(results, ContinuousDiDResults) @@ -457,13 +459,9 @@ class TestContinuousDiDResults: @pytest.fixture def results(self): - data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.1 - ) + data = generate_continuous_did_data(n_units=100, n_periods=3, seed=42, noise_sd=0.1) est = ContinuousDiD(n_bootstrap=49, seed=42) - return est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + return est.fit(data, "outcome", "unit", "period", "first_treat", "dose") def test_summary(self, results): s = results.summary() @@ -515,12 +513,20 @@ class TestDoseAggregation: def test_multi_period_aggregation(self): data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.1, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.1, ) est = ContinuousDiD(degree=1, num_knots=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="dose", ) # With linear DGP (ATT(d) = 1 + 2d) and degree=1, should recover well @@ -529,11 +535,19 @@ def test_multi_period_aggregation(self): def test_single_cohort_aggregation(self): data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.1, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.1, ) est = ContinuousDiD(degree=1, num_knots=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="dose", ) assert len(results.groups) == 1 @@ -545,12 +559,20 @@ class TestEventStudyAggregation: def test_event_study_computed(self): data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=49, seed=42) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None @@ -561,12 +583,20 @@ def test_event_study_computed(self): def test_event_study_to_dataframe(self): data = generate_continuous_did_data( - n_units=200, n_periods=4, cohort_periods=[2, 3], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=4, + cohort_periods=[2, 3], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD() results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) df = results.to_dataframe(level="event_study") @@ -576,12 +606,20 @@ def test_event_study_to_dataframe(self): def test_event_study_not_yet_treated(self): """Event study with control_group='not_yet_treated' and analytic SE.""" data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(control_group="not_yet_treated", n_bootstrap=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None @@ -595,12 +633,20 @@ def test_event_study_not_yet_treated(self): def test_event_study_universal_base_period(self): """Event study with base_period='universal' and analytic SE.""" data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(base_period="universal", n_bootstrap=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None @@ -615,14 +661,24 @@ def test_event_study_not_yet_treated_bootstrap(self, ci_params): """Event study with not_yet_treated control group and bootstrap SE.""" n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD( - control_group="not_yet_treated", n_bootstrap=n_boot, seed=42, + control_group="not_yet_treated", + n_bootstrap=n_boot, + seed=42, ) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None @@ -641,12 +697,13 @@ class TestBootstrap: def test_bootstrap_ses_positive(self, ci_params): n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert results.overall_att_se > 0 assert results.overall_acrt_se > 0 # Dose-response SEs should be positive @@ -655,12 +712,13 @@ def test_bootstrap_ses_positive(self, ci_params): def test_bootstrap_ci_contains_estimate(self, ci_params): n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") lo, hi = results.overall_att_conf_int assert lo <= results.overall_att <= hi @@ -668,13 +726,16 @@ def test_bootstrap_acrt_ci_centered(self, ci_params): """Bootstrap ACRT CI should bracket the point estimate, not zero.""" n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=200, n_periods=3, seed=42, noise_sd=0.5, - att_function="linear", att_slope=2.0, att_intercept=1.0, + n_units=200, + n_periods=3, + seed=42, + noise_sd=0.5, + att_function="linear", + att_slope=2.0, + att_intercept=1.0, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") lo, hi = results.overall_acrt_conf_int assert lo <= results.overall_acrt <= hi, ( f"ACRT CI [{lo:.4f}, {hi:.4f}] does not bracket " @@ -691,12 +752,13 @@ def test_bootstrap_acrt_ci_centered(self, ci_params): def test_bootstrap_p_values_valid(self, ci_params): n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert 0 <= results.overall_att_p_value <= 1 assert 0 <= results.overall_acrt_p_value <= 1 @@ -704,25 +766,26 @@ def test_bootstrap_dose_response_p_values(self, ci_params): """Bootstrap dose-response should use bootstrap p-values, not normal approx.""" n_boot = ci_params.bootstrap(99) data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") for curve in [results.dose_response_att, results.dose_response_acrt]: df = curve.to_dataframe() # Bootstrap mode: t-stat is undefined - assert all(np.isnan(df["t_stat"])), ( - f"t_stat should be NaN in bootstrap mode for {curve.target}" - ) + assert all( + np.isnan(df["t_stat"]) + ), f"t_stat should be NaN in bootstrap mode for {curve.target}" # Bootstrap p-values should be present and valid - assert all(np.isfinite(df["p_value"])), ( - f"p_value should be finite in bootstrap mode for {curve.target}" - ) - assert all((df["p_value"] >= 0) & (df["p_value"] <= 1)), ( - f"p_value out of [0,1] range for {curve.target}" - ) + assert all( + np.isfinite(df["p_value"]) + ), f"p_value should be finite in bootstrap mode for {curve.target}" + assert all( + (df["p_value"] >= 0) & (df["p_value"] <= 1) + ), f"p_value out of [0,1] range for {curve.target}" class TestAnalyticalSE: @@ -730,23 +793,25 @@ class TestAnalyticalSE: def test_analytical_se_positive(self): data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=0) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert results.overall_att_se > 0 assert results.overall_acrt_se > 0 def test_analytical_ci(self): data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, noise_sd=0.5, + n_units=100, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=0) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") lo, hi = results.overall_att_conf_int assert lo < results.overall_att < hi @@ -757,13 +822,13 @@ class TestEdgeCases: def test_few_treated_units(self): """Estimator should handle very few treated units.""" data = generate_continuous_did_data( - n_units=30, n_periods=3, seed=42, + n_units=30, + n_periods=3, + seed=42, never_treated_frac=0.8, # Only ~6 treated ) est = ContinuousDiD(degree=1, num_knots=0) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert isinstance(results, ContinuousDiDResults) def test_inf_first_treat_normalization(self): @@ -781,9 +846,7 @@ def test_inf_first_treat_normalization(self): UserWarning, match=rf"{n_inf_rows} row\(s\) have inf in 'first_treat'", ): - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert results.n_control_units > 0 def test_no_inf_first_treat_no_warning(self): @@ -814,10 +877,15 @@ def test_nonzero_dose_on_never_treated_warns(self): else: ft, dose_val = 2.0, 1.0 # treated for t in range(1, 4): - rows.append({ - "unit": unit, "period": t, "outcome": float(unit + t), - "first_treat": ft, "dose": dose_val, - }) + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": dose_val, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD() @@ -832,9 +900,60 @@ def test_nonzero_dose_on_never_treated_warns(self): # treated for OLS); we only care about the dose-coercion warning. pass + def test_negative_dose_on_never_treated_coerces_not_rejects(self): + """Force-zero coercion applies to ANY nonzero dose on `first_treat=0` + rows, including negative values. The negative-dose rejection at + line 287-294 of continuous_did.py applies only to treated units + (`first_treat > 0`); never-treated rows are coerced to dose=0 + with a `UserWarning` regardless of sign. This is observed + implementation behavior for inconsistent inputs (an + accidentally-nonzero dose on a never-treated row), NOT a + documented routing option for manufacturing never-treated + controls — REGISTRY does not list relabeling as a fallback. + The test locks in the coercion contract; the autonomous guide + §5.2 counter-example #5 explicitly tells agents not to use this + path methodologically.""" + rows = [] + for unit in range(4): + if unit < 2: + ft, dose_val = 0.0, -1.5 # never-treated with NEGATIVE dose + else: + ft, dose_val = 2.0, 1.0 # treated, positive dose + for t in range(1, 4): + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": dose_val, + } + ) + data = pd.DataFrame(rows) + est = ContinuousDiD() + with pytest.warns( + UserWarning, + match=r"6 row\(s\) have 'first_treat'=0 \(never-treated\) but nonzero 'dose'", + ): + try: + est.fit(data, "outcome", "unit", "period", "first_treat", "dose") + except ValueError as e: + # Downstream may reject the minimal panel for other reasons, + # but the rejection MUST NOT be the "negative dose" message + # (which only applies to treated units). + assert "negative dose" not in str(e).lower(), ( + "Negative dose on never-treated rows must coerce, not " + "raise the treated-unit negative-dose error." + ) + except Exception: + # Other downstream errors (small-panel OLS) are acceptable; + # the warning emission is what we are guarding here. + pass + def test_clean_never_treated_doses_silent(self): """Never-treated rows with dose=0 must not trigger the coercion warning.""" import warnings + data = generate_continuous_did_data(n_units=50, n_periods=3, seed=42) # generate_continuous_did_data already sets dose=0 for never-treated. est = ContinuousDiD() @@ -844,8 +963,7 @@ def test_clean_never_treated_doses_silent(self): est.fit(data, "outcome", "unit", "period", "first_treat", "dose") coerce_warnings = [ - x for x in w - if "never-treated" in str(x.message) and "nonzero 'dose'" in str(x.message) + x for x in w if "never-treated" in str(x.message) and "nonzero 'dose'" in str(x.message) ] assert coerce_warnings == [] @@ -864,10 +982,15 @@ def test_negative_first_treat_raises_with_row_count(self): else: ft = 0.0 for t in range(1, 4): - rows.append({ - "unit": unit, "period": t, "outcome": float(unit + t), - "first_treat": ft, "dose": 0.0, - }) + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": 0.0, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD() @@ -887,10 +1010,15 @@ def test_nan_first_treat_raises_with_row_count(self): # Unit 0 has NaN first_treat across all 3 periods (3 NaN rows). ft = np.nan if unit == 0 else 0.0 for t in range(1, 4): - rows.append({ - "unit": unit, "period": t, "outcome": float(unit + t), - "first_treat": ft, "dose": 0.0, - }) + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": 0.0, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD() @@ -905,14 +1033,20 @@ def test_positive_inf_warning_silent_when_no_inf(self): non-negative values (including just 0 and positive periods) must never trigger the recategorization warning.""" import warnings + rows = [] for unit in range(4): ft = 0.0 if unit < 2 else 2.0 for t in range(1, 4): - rows.append({ - "unit": unit, "period": t, "outcome": float(unit + t), - "first_treat": ft, "dose": 0.0 if unit < 2 else 1.0, - }) + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": 0.0 if unit < 2 else 1.0, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD() @@ -937,10 +1071,15 @@ def test_inf_first_treat_warning_counts_rows_not_units(self): ft = np.inf if unit < 2 else 2.0 dose = 0.0 if unit < 2 else 1.0 for t in range(1, 4): - rows.append({ - "unit": unit, "period": t, "outcome": float(unit + t), - "first_treat": ft, "dose": dose, - }) + rows.append( + { + "unit": unit, + "period": t, + "outcome": float(unit + t), + "first_treat": ft, + "dose": dose, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD() @@ -960,9 +1099,7 @@ def test_custom_dvals(self): data = generate_continuous_did_data(n_units=100, n_periods=3, seed=42) custom_grid = np.array([1.0, 2.0, 3.0]) est = ContinuousDiD(dvals=custom_grid) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") np.testing.assert_array_equal(results.dose_grid, custom_grid) assert len(results.dose_response_att.effects) == 3 @@ -992,10 +1129,15 @@ def test_not_yet_treated_excludes_own_cohort(self): for i in range(n_per_group): uid = i for t in periods: - rows.append({ - "unit": uid, "period": t, "first_treat": 0, "dose": 0.0, - "outcome": rng.normal(0, 0.5), - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 0, + "dose": 0.0, + "outcome": rng.normal(0, 0.5), + } + ) # Group 2: treated at period 2 (g=2), moderate dose for i in range(n_per_group): uid = n_per_group + i @@ -1004,10 +1146,15 @@ def test_not_yet_treated_excludes_own_cohort(self): y = rng.normal(0, 0.5) if t >= 2: y += 5.0 * dose_i # strong treatment effect - rows.append({ - "unit": uid, "period": t, "first_treat": 2, "dose": dose_i, - "outcome": y, - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 2, + "dose": dose_i, + "outcome": y, + } + ) # Group 3: treated at period 3 (g=3), high dose for i in range(n_per_group): uid = 2 * n_per_group + i @@ -1016,25 +1163,37 @@ def test_not_yet_treated_excludes_own_cohort(self): y = rng.normal(0, 0.5) if t >= 3: y += 5.0 * dose_i - rows.append({ - "unit": uid, "period": t, "first_treat": 3, "dose": dose_i, - "outcome": y, - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 3, + "dose": dose_i, + "outcome": y, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD( - control_group="not_yet_treated", degree=1, num_knots=0, n_bootstrap=0, + control_group="not_yet_treated", + degree=1, + num_knots=0, + n_bootstrap=0, ) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", ) # Pre-treatment cells for g=2 should be near zero (t=1 is pre-treatment) # If cohort g=2 were included in its own control set, the pre-treatment # difference would be contaminated by the cohort's own outcomes pre_treatment_effects = { - (g, t): v for (g, t), v in results.group_time_effects.items() - if t < g + (g, t): v for (g, t), v in results.group_time_effects.items() if t < g } for (g, t), cell in pre_treatment_effects.items(): att_glob = cell.get("att_glob", 0) @@ -1051,12 +1210,13 @@ def test_analytical_se_matches_bootstrap(self, ci_params): """Analytical SEs should be within ~50% of bootstrap SEs.""" n_boot = ci_params.bootstrap(999, min_n=199) data = generate_continuous_did_data( - n_units=200, n_periods=3, seed=42, noise_sd=1.0, + n_units=200, + n_periods=3, + seed=42, + noise_sd=1.0, ) est_boot = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results_boot = est_boot.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results_boot = est_boot.fit(data, "outcome", "unit", "period", "first_treat", "dose") est_analytic = ContinuousDiD(n_bootstrap=0) results_analytic = est_analytic.fit( data, "outcome", "unit", "period", "first_treat", "dose" @@ -1075,7 +1235,9 @@ class TestDiscreteDoseWarning: def test_discrete_dose_warning(self): """Integer-valued doses should trigger a discrete dose warning.""" data = generate_continuous_did_data( - n_units=100, n_periods=3, seed=42, + n_units=100, + n_periods=3, + seed=42, ) data["dose"] = data["dose"].round().astype(float) data.loc[data["first_treat"] == 0, "dose"] = 0.0 @@ -1090,20 +1252,28 @@ class TestAnticipationEventStudy: def test_anticipation_event_study(self): """Event study with anticipation > 0 should include anticipation periods.""" data = generate_continuous_did_data( - n_units=100, n_periods=5, cohort_periods=[3], seed=42, + n_units=100, + n_periods=5, + cohort_periods=[3], + seed=42, ) est = ContinuousDiD(anticipation=1, n_bootstrap=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None # With anticipation=1 and g=3, post-treatment starts at t=2 (g - anticipation). # Relative times e = t - g, so t=2 → e=-1 (the anticipation period). rel_times = sorted(results.event_study_effects.keys()) - assert -1 in rel_times, ( - f"Anticipation period e=-1 missing from event study; got {rel_times}" - ) + assert ( + -1 in rel_times + ), f"Anticipation period e=-1 missing from event study; got {rel_times}" assert np.isfinite(results.event_study_effects[-1]["effect"]) def test_anticipation_event_study_excludes_contaminated_periods(self): @@ -1116,33 +1286,45 @@ def test_anticipation_event_study_excludes_contaminated_periods(self): # Never-treated for i in range(n_per_group): for t in periods: - rows.append({ - "unit": i, "period": t, "first_treat": 0, - "dose": 0.0, "outcome": rng.normal(0, 0.5), - }) + rows.append( + { + "unit": i, + "period": t, + "first_treat": 0, + "dose": 0.0, + "outcome": rng.normal(0, 0.5), + } + ) # Cohort g=5 — treatment at t=5, anticipation=2 means post at t>=3 for i in range(n_per_group): uid = n_per_group + i d = rng.uniform(0.5, 2.0) for t in periods: y = rng.normal(0, 0.5) + (2.0 * d if t >= 5 else 0) - rows.append({ - "unit": uid, "period": t, "first_treat": 5, - "dose": d, "outcome": y, - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 5, + "dose": d, + "outcome": y, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD(anticipation=2, n_bootstrap=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None for e in results.event_study_effects.keys(): - assert e >= -2, ( - f"Found relative period e={e} with anticipation=2; " - f"expected e >= -2" - ) + assert e >= -2, f"Found relative period e={e} with anticipation=2; " f"expected e >= -2" def test_anticipation_not_yet_treated_excludes_anticipation_window(self): """Not-yet-treated controls must exclude cohorts in the anticipation window. @@ -1161,10 +1343,15 @@ def test_anticipation_not_yet_treated_excludes_anticipation_window(self): for i in range(n_per_group): uid = i for t in periods: - rows.append({ - "unit": uid, "period": t, "first_treat": 0, - "dose": 0.0, "outcome": rng.normal(0, 0.5), - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 0, + "dose": 0.0, + "outcome": rng.normal(0, 0.5), + } + ) # Early cohort: g=3, treatment effect = +5*dose at t>=3 for i in range(n_per_group): @@ -1172,10 +1359,15 @@ def test_anticipation_not_yet_treated_excludes_anticipation_window(self): d = rng.uniform(1, 3) for t in periods: y = rng.normal(0, 0.5) + (5.0 * d if t >= 3 else 0) - rows.append({ - "unit": uid, "period": t, "first_treat": 3, - "dose": d, "outcome": y, - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 3, + "dose": d, + "outcome": y, + } + ) # Late cohort: g=5, treatment effect = +5*dose at t>=5 for i in range(n_per_group): @@ -1183,26 +1375,36 @@ def test_anticipation_not_yet_treated_excludes_anticipation_window(self): d = rng.uniform(1, 3) for t in periods: y = rng.normal(0, 0.5) + (5.0 * d if t >= 5 else 0) - rows.append({ - "unit": uid, "period": t, "first_treat": 5, - "dose": d, "outcome": y, - }) + rows.append( + { + "unit": uid, + "period": t, + "first_treat": 5, + "dose": d, + "outcome": y, + } + ) data = pd.DataFrame(rows) est = ContinuousDiD( - anticipation=1, control_group="not_yet_treated", n_bootstrap=0, + anticipation=1, + control_group="not_yet_treated", + n_bootstrap=0, ) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", ) - assert np.isfinite(results.overall_att), ( - "overall_att should be finite with anticipation + not_yet_treated" - ) - assert results.dose_response_att is not None, ( - "dose-response curve should exist" - ) + assert np.isfinite( + results.overall_att + ), "overall_att should be finite with anticipation + not_yet_treated" + assert results.dose_response_att is not None, "dose-response curve should exist" class TestEmptyPostTreatment: @@ -1211,13 +1413,14 @@ class TestEmptyPostTreatment: def test_no_post_treatment_cells_warns(self): """When no post-treatment cells exist, should warn and return NaN.""" data = generate_continuous_did_data( - n_units=50, n_periods=3, cohort_periods=[5], seed=42, + n_units=50, + n_periods=3, + cohort_periods=[5], + seed=42, ) est = ContinuousDiD() with pytest.warns(UserWarning, match="[Nn]o post-treatment"): - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") assert np.isnan(results.overall_att) assert np.isnan(results.overall_acrt) @@ -1255,12 +1458,13 @@ def test_bootstrap_percentile_ci(self, ci_params): """Bootstrap CIs should use percentile method (generally asymmetric).""" n_boot = ci_params.bootstrap(499, min_n=199) data = generate_continuous_did_data( - n_units=200, n_periods=3, seed=42, noise_sd=0.5, + n_units=200, + n_periods=3, + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=n_boot, seed=42) - results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + results = est.fit(data, "outcome", "unit", "period", "first_treat", "dose") lo, hi = results.overall_att_conf_int estimate = results.overall_att # CI should contain estimate @@ -1290,9 +1494,7 @@ def test_no_never_treated_raises(self): ) est = ContinuousDiD(control_group="not_yet_treated", degree=1, num_knots=0) with pytest.raises(ValueError, match="D=0"): - est.fit( - data, "outcome", "unit", "period", "first_treat", "dose" - ) + est.fit(data, "outcome", "unit", "period", "first_treat", "dose") class TestEventStudyAnalyticalSE: @@ -1301,12 +1503,20 @@ class TestEventStudyAnalyticalSE: def test_event_study_analytical_se_finite(self): """Event study with n_bootstrap=0 should produce finite SE/t/p for all bins.""" data = generate_continuous_did_data( - n_units=200, n_periods=5, cohort_periods=[2, 4], - seed=42, noise_sd=0.5, + n_units=200, + n_periods=5, + cohort_periods=[2, 4], + seed=42, + noise_sd=0.5, ) est = ContinuousDiD(n_bootstrap=0) results = est.fit( - data, "outcome", "unit", "period", "first_treat", "dose", + data, + "outcome", + "unit", + "period", + "first_treat", + "dose", aggregate="eventstudy", ) assert results.event_study_effects is not None @@ -1317,6 +1527,4 @@ def test_event_study_analytical_se_finite(self): assert np.isfinite(info["p_value"]), f"p_value is NaN for e={e}" assert 0 <= info["p_value"] <= 1, f"p_value out of range for e={e}" lo, hi = info["conf_int"] - assert np.isfinite(lo) and np.isfinite(hi), ( - f"conf_int contains NaN for e={e}" - ) + assert np.isfinite(lo) and np.isfinite(hi), f"conf_int contains NaN for e={e}" diff --git a/tests/test_guides.py b/tests/test_guides.py index 2d08871d..e51f6920 100644 --- a/tests/test_guides.py +++ b/tests/test_guides.py @@ -20,10 +20,19 @@ def test_default_is_concise(): def test_full_is_largest(): - lengths = {v: len(get_llm_guide(v)) for v in ("concise", "full", "practitioner", "autonomous")} + """`llms-full.txt` is the API-docs roll-up; it should remain larger + than the short `concise` summary and the workflow-prose + `practitioner` guide. The `autonomous` reference guide is + deliberately excluded from this comparison: it serves a different + audience (LLM agents reasoning about estimator choice) and has + grown organically through Wave 1 + Wave 2 review rounds with + estimator-matrix detail, worked examples, and contract citations + that don't have a counterpart in `llms-full.txt`'s API roll-up. + Either of the two can be larger without violating any user-facing + invariant.""" + lengths = {v: len(get_llm_guide(v)) for v in ("concise", "full", "practitioner")} assert lengths["full"] > lengths["concise"] assert lengths["full"] > lengths["practitioner"] - assert lengths["full"] > lengths["autonomous"] def test_content_stability_practitioner_workflow(): @@ -38,6 +47,185 @@ def test_content_stability_autonomous_fingerprints(): text = get_llm_guide("autonomous") assert "profile_panel" in text assert "estimator-support matrix" in text.lower() + # Wave 2 additions: outcome / dose shape field references. + assert "outcome_shape" in text + assert "treatment_dose" in text + assert "is_count_like" in text + # has_never_treated is the authoritative ContinuousDiD gate; + # treatment_dose fields are descriptive only. + assert "has_never_treated" in text + # The ContinuousDiD prerequisite summary must continue to mention + # the duplicate-row hard stop alongside the field-based gates - + # `_precompute_structures()` silently resolves duplicate cells via + # last-row-wins, so a reader treating the summary as exhaustive + # could route duplicate-containing panels into a silent-overwrite + # path. Guard against that wording regression. + assert "duplicate_unit_time_rows" in text, ( + "ContinuousDiD prerequisite summary must mention the " + "`duplicate_unit_time_rows` alert: the precompute path resolves " + "duplicate (unit, time) cells via last-row-wins, so duplicates " + "must be removed before fitting." + ) + # ContinuousDiD also requires strictly positive treated doses + # (`continuous_did.py:287-294` raises on negative dose support). + # The autonomous guide must list `dose_min > 0` so an agent reading + # `treatment_dose.dose_min == -1.5` knows to route the panel away + # from ContinuousDiD before paying for the failed fit. + assert "dose_min > 0" in text, ( + "ContinuousDiD prerequisite summary must mention " + "`dose_min > 0`: the estimator hard-rejects negative treated " + "dose support at line 287-294 of continuous_did.py." + ) + # The five profile-side screening checks are necessary but not + # sufficient: `ContinuousDiD.fit()` takes a separate `first_treat` + # column (which `profile_panel` does not see) and applies + # additional validation. The autonomous guide must explicitly + # mention the `first_treat` validation surface so an agent + # passing the profile-side screen still knows to validate the + # `first_treat` column they will supply to `fit()`. + assert "first_treat" in text, ( + "ContinuousDiD documentation must mention the separate " + "`first_treat` column that `ContinuousDiD.fit()` validates " + "(NaN/inf/negative rejection, dose=0 unit drops, force-zero " + "coercion). The five profile-side screening checks alone are " + "necessary but not sufficient for fit-time success." + ) + + +def test_autonomous_contains_worked_examples_section(): + """The §5 worked-examples section walks an agent through three + end-to-end PanelProfile -> reasoning -> validation flows. Each + example carries a unique fingerprint phrase keyed off its + PanelProfile -> estimator path; these regressions guard the + examples from accidental deletion or scope drift.""" + text = get_llm_guide("autonomous") + assert "## §5. Worked examples" in text + # §5.1: binary staggered with never-treated -> CallawaySantAnna + assert "§5.1 Binary staggered panel with never-treated controls" in text + assert 'control_group="never_treated"' in text + # §5.2: continuous dose -> ContinuousDiD prerequisites via treatment_dose + assert "§5.2 Continuous-dose panel with zero-dose controls" in text + assert "TreatmentDoseShape(" in text + # §5.3: count-shaped outcome -> WooldridgeDiD QMLE + assert "§5.3 Count-shaped outcome" in text + assert 'WooldridgeDiD(method="poisson")' in text + assert 'WooldridgeDiD(family="poisson")' not in text, ( + "WooldridgeDiD takes `method=` not `family=`; the wrong kwarg " + "in the autonomous guide would produce runtime errors when an " + "agent follows the worked example." + ) + + +def test_autonomous_count_outcome_uses_asf_outcome_scale_estimand(): + """§4.11 and §5.3 must describe `WooldridgeDiD(method="poisson")`'s + `overall_att` as an ASF-based outcome-scale difference (matching the + estimator at `wooldridge.py:1225` and the reporting helper at + `_reporting_helpers.py:262-281`), NOT as a multiplicative / + proportional / log-link effect. An agent following an example that + described the headline as "multiplicative" would misreport the + scalar - the library's reported `overall_att` is `E[exp(η_1)] - + E[exp(η_0)]`, a difference on the natural outcome scale. + + Guards against regressing the wording back to "multiplicative + effect" / "proportional change" framing. Multiplicative + interpretations may appear in the guide as a clearly-marked + derived post-hoc reading, but never as the description of the + estimator's reported `overall_att`.""" + text = get_llm_guide("autonomous") + # Locate §4.11 and §5.3 blocks; check that within them the Poisson + # path is described with ASF / outcome-scale wording, NOT as the + # estimator's reported scalar being multiplicative or proportional. + sec_4_11_start = text.index("### §4.11 Outcome-shape considerations") + sec_4_11_end = text.index("## §5. Worked examples") + sec_4_11 = text[sec_4_11_start:sec_4_11_end] + + sec_5_3_start = text.index("### §5.3 Count-shaped outcome") + sec_5_3_end = text.index("## §6. Post-fit validation utilities") + sec_5_3 = text[sec_5_3_start:sec_5_3_end] + + forbidden_phrases = ( + "multiplicative effect under qmle", + "estimates the multiplicative effect", + "multiplicative (log-link) effect", + "report the multiplicative effect", + "report the multiplicative", + ) + for section_name, body in (("§4.11", sec_4_11), ("§5.3", sec_5_3)): + lowered = body.lower() + for phrase in forbidden_phrases: + assert phrase not in lowered, ( + f"{section_name} of the autonomous guide describes the " + f"WooldridgeDiD Poisson `overall_att` with the phrase " + f"{phrase!r}; the estimator returns an ASF-based " + f"outcome-scale difference (`E[exp(η_1)] - E[exp(η_0)]`), " + f"not a multiplicative ratio. See `wooldridge.py:1225` " + f"and `_reporting_helpers.py:262-281`." + ) + + # Positive: each block must explicitly anchor the estimand to the + # ASF / outcome-scale framing so future edits can't silently weaken + # the description. + assert "ASF" in sec_5_3, "§5.3 must reference the ASF interpretation" + assert "outcome scale" in sec_5_3.lower(), ( + "§5.3 must label the WooldridgeDiD `overall_att` as an " + "outcome-scale quantity to prevent multiplicative-ratio drift." + ) + + +def test_autonomous_negative_dose_path_does_not_route_to_had(): + """The §5.2 negative-dose counter-example must not present + `HeterogeneousAdoptionDiD` as a direct routing alternative + when `dose_min < 0`. HAD's contract requires non-negative + dose support and raises on negative post-period dose + (`had.py:1450-1459`, paper Section 2). Routing to HAD on a + negative-dose panel without re-encoding would steer the agent + into an unsupported estimator path. Guards against the wording + regressing back to a too-broad "HAD as fallback" framing on + this branch.""" + text = get_llm_guide("autonomous") + # Locate counter-example #5 (negative-dose path) within §5.2. + sec_5_2_start = text.index("### §5.2 Continuous-dose panel") + sec_5_3_start = text.index("### §5.3 Count-shaped outcome") + sec_5_2 = text[sec_5_2_start:sec_5_3_start] + # The negative-dose paragraph must explicitly state HAD is NOT a + # routing alternative on this branch. We assert the disqualifying + # phrase is present; we do not forbid `HeterogeneousAdoptionDiD` + # entirely because the section may legitimately mention it as a + # candidate AFTER re-encoding. + assert "HAD" in sec_5_2 or "HeterogeneousAdoptionDiD" in sec_5_2, ( + "§5.2 must mention HAD by name on the negative-dose branch " + "so its non-applicability can be explicitly called out." + ) + assert "had.py:1450-1459" in sec_5_2, ( + "§5.2 must cite `had.py:1450-1459` on the negative-dose " + "branch to anchor HAD's non-negative-dose contract (HAD " + "raises on negative post-period dose, paper Section 2). " + "Without this citation, the agent could route a " + "negative-dose panel directly to HAD and hit a fit-time " + "error." + ) + + +def test_autonomous_worked_examples_avoid_recommender_language(): + """Worked examples must mirror the rest of the guide's discipline: + no prescriptive language in the example reasoning. Multiple paths + must remain explicit.""" + text = get_llm_guide("autonomous") + # Locate the §5 block; check its body for forbidden phrasing. + start = text.index("## §5. Worked examples") + end = text.index("## §6. Post-fit validation utilities") + section_5 = text[start:end].lower() + forbidden = ( + "you should always", + "always pick", + "we recommend", + "the best estimator is", + ) + for phrase in forbidden: + assert phrase not in section_5, ( + f"§5 worked examples contain prescriptive phrase {phrase!r}; " + "the guide must keep multiple paths explicit." + ) def test_autonomous_contains_intact_estimator_matrix(): diff --git a/tests/test_profile_panel.py b/tests/test_profile_panel.py index b1d7a9f5..ff1d4f90 100644 --- a/tests/test_profile_panel.py +++ b/tests/test_profile_panel.py @@ -317,6 +317,8 @@ def test_to_dict_is_json_serializable(): "outcome_has_negatives", "outcome_missing_fraction", "outcome_summary", + "outcome_shape", + "treatment_dose", "alerts", } @@ -400,14 +402,23 @@ def test_all_nan_treatment_is_categorical(): def test_top_level_import_surface(): - """profile_panel, PanelProfile, and Alert must be importable from the - top-level namespace so `help(diff_diff)` points at real symbols.""" + """profile_panel, PanelProfile, Alert, OutcomeShape, and + TreatmentDoseShape must be importable from the top-level namespace + so `help(diff_diff)` points at real symbols.""" import diff_diff assert callable(diff_diff.profile_panel) assert diff_diff.PanelProfile.__name__ == "PanelProfile" assert diff_diff.Alert.__name__ == "Alert" - for name in ("profile_panel", "PanelProfile", "Alert"): + assert diff_diff.OutcomeShape.__name__ == "OutcomeShape" + assert diff_diff.TreatmentDoseShape.__name__ == "TreatmentDoseShape" + for name in ( + "profile_panel", + "PanelProfile", + "Alert", + "OutcomeShape", + "TreatmentDoseShape", + ): assert name in diff_diff.__all__, f"{name} missing from __all__" @@ -466,6 +477,38 @@ def test_continuous_zero_dose_controls_flag_has_never_treated(): assert profile.has_always_treated is False +def test_continuous_negative_then_zero_does_not_count_as_never_treated(): + """Regression: on continuous panels with negative dose support, a + unit path like `[-1, 0]` has `groupby(unit).max() == 0` but is NOT + a zero-dose never-treated control (`-1` is non-zero). The implementation + must check both endpoints (`unit_max == 0` AND `unit_min == 0`) so + `has_never_treated` matches the documented contract: "some unit has + treatment == 0 in every observed non-NaN row." The autonomous guide's + ContinuousDiD preflight uses this field as the `P(D=0) > 0` proxy, so + a false-positive here would silently mislead an agent about control + availability.""" + rng = np.random.default_rng(7) + rows = [] + for u in range(1, 21): + for t in range(4): + if u <= 5: + # Mixed negative/zero path - max is 0 but min < 0; + # must NOT count as never-treated. + dose = -1.0 if t < 2 else 0.0 + else: + dose = float(rng.uniform(0.5, 3.0)) + rows.append({"u": u, "t": t, "tr": dose, "y": float(rng.normal())}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + assert profile.treatment_type == "continuous" + assert profile.has_never_treated is False, ( + "Unit path [-1, -1, 0, 0] has unit_max == 0 but is not a " + "zero-dose never-treated control (some rows are negative). " + "Profile must enforce the documented contract: 'treatment == 0 " + "in every observed non-NaN row' (i.e., unit_min == unit_max == 0)." + ) + + def test_guide_api_strings_resolve_against_public_api(): """Sanity-check that every estimator referenced in the autonomous guide exists in the public API, plus the `hausman_pretest` classmethod location @@ -866,3 +909,526 @@ def test_empty_after_id_drop_raises_value_error(): ) with pytest.raises(ValueError, match="no rows remain"): profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + + +# --------------------------------------------------------------------------- +# Outcome shape (numeric outcome distributional facts) +# --------------------------------------------------------------------------- + + +def test_outcome_shape_count_like_poisson(): + """A Poisson-distributed outcome (integer-valued, has zeros, right-skewed) + must satisfy is_count_like=True so an agent can gate WooldridgeDiD over + linear OLS pre-fit. Uses Poisson(lambda=0.5) on a 200-unit panel for a + reliably-skewed empirical sample (theoretical skew = 1/sqrt(0.5) ~ 1.41).""" + rng = np.random.default_rng(7) + first_treat = {u: 2 for u in range(101, 201)} + df = _make_panel( + n_units=200, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: int(rng.poisson(0.5 + 0.2 * tr)), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.is_integer_valued is True + assert shape.is_count_like is True + assert shape.pct_zeros > 0.1 + assert shape.skewness is not None and shape.skewness > 0.5 + assert shape.n_distinct_values >= 3 + + +def test_outcome_shape_binary_outcome_not_count_like(): + """Binary 0/1 outcome must NOT trigger is_count_like (only 2 distinct + values; skewness gate fires None) and is_bounded_unit must be True.""" + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel( + n_units=20, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: int(tr), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.is_count_like is False + assert shape.is_bounded_unit is True + assert shape.skewness is None + assert shape.excess_kurtosis is None + assert shape.n_distinct_values == 2 + + +def test_outcome_shape_count_like_excludes_negative_support(): + """`is_count_like` is the routing signal toward + `WooldridgeDiD(method="poisson")`, which raises `ValueError` on + negative outcomes (`wooldridge.py:1105`). The heuristic must + therefore gate on `value_min >= 0`: a right-skewed integer outcome + with zeros AND some negative values must NOT set `is_count_like`, + even though the other four conditions (integer-valued, has zeros, + skewness > 0.5, > 2 distinct values) are satisfied. Otherwise the + autonomous-guide §5.3 reasoning chain would steer agents toward an + estimator that will then refuse to fit the panel.""" + rng = np.random.default_rng(37) + first_treat = {u: 2 for u in range(101, 201)} + + # Mix Poisson with a small share of negative integers - integer-valued, + # has zeros (Poisson(0.5) yields ~60% zeros), and right-skewed; but + # value_min < 0 should kill is_count_like. + def _outcome_fn(u, t, tr, _rng): + base = int(rng.poisson(0.5 + 0.2 * tr)) + # 5% of cells become a small negative integer, e.g. -1 or -2. + if rng.random() < 0.05: + return -int(rng.integers(1, 3)) + return base + + df = _make_panel( + n_units=200, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=_outcome_fn, + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.value_min < 0, "fixture must include some negative outcomes" + assert shape.is_integer_valued is True + assert shape.pct_zeros > 0.1 + assert shape.skewness is not None + assert shape.is_count_like is False, ( + "is_count_like must be False when value_min < 0; otherwise an " + "agent following the §5.3 worked example would route the panel " + "to WooldridgeDiD(method='poisson'), which raises ValueError on " + "negative outcomes." + ) + + +def test_outcome_shape_continuous_normal(): + """Normally-distributed float outcome must have is_integer_valued=False, + is_count_like=False, is_bounded_unit=False (signed values exit [0,1]).""" + rng = np.random.default_rng(11) + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel( + n_units=20, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: float(rng.normal(0.0, 1.0)), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.is_integer_valued is False + assert shape.is_count_like is False + assert shape.is_bounded_unit is False + assert shape.skewness is not None + assert shape.excess_kurtosis is not None + + +def test_outcome_shape_bounded_unit(): + """Outcome confined to [0, 1] (e.g., a proportion / probability) must + set is_bounded_unit=True so an agent can flag the bounded-support + consideration when interpreting linear-OLS estimates.""" + rng = np.random.default_rng(13) + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel( + n_units=20, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: float(rng.uniform(0.0, 1.0)), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.is_bounded_unit is True + assert 0.0 <= shape.value_min <= shape.value_max <= 1.0 + + +def test_outcome_shape_categorical_returns_none(): + """Object-dtype outcome (string labels) must return outcome_shape=None; + skewness/kurtosis are not defined for non-numeric data.""" + first_treat = {u: 2 for u in range(11, 21)} + rows = [] + for u in range(1, 21): + for t in range(4): + tr = 1 if (u in first_treat and t >= first_treat[u]) else 0 + rows.append({"u": u, "t": t, "tr": tr, "y": "high" if tr else "low"}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + assert profile.outcome_shape is None + + +def test_outcome_shape_skewness_kurtosis_gated_on_distinct_values(): + """Outcomes with fewer than 3 distinct values must report + skewness=None and excess_kurtosis=None — moments are undefined or + degenerate at that distinctness floor.""" + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel( + n_units=20, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: float(tr), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + shape = profile.outcome_shape + assert shape is not None + assert shape.n_distinct_values == 2 + assert shape.skewness is None + assert shape.excess_kurtosis is None + + +def test_outcome_shape_to_dict_roundtrips_through_json(): + """outcome_shape must serialize to a JSON-compatible nested dict and + survive ``json.loads(json.dumps(...))`` without loss.""" + rng = np.random.default_rng(17) + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel( + n_units=20, + periods=range(0, 4), + first_treat=first_treat, + outcome_fn=lambda u, t, tr, _rng: int(rng.poisson(2.0)), + ) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + payload = profile.to_dict() + roundtrip = json.loads(json.dumps(payload)) + shape_dict = roundtrip["outcome_shape"] + assert shape_dict is not None + assert set(shape_dict.keys()) == { + "n_distinct_values", + "pct_zeros", + "value_min", + "value_max", + "skewness", + "excess_kurtosis", + "is_integer_valued", + "is_count_like", + "is_bounded_unit", + } + + +# --------------------------------------------------------------------------- +# Treatment dose (continuous-treatment distributional facts) +# --------------------------------------------------------------------------- + + +def test_treatment_dose_continuous_zero_baseline(): + """Continuous panel with zero-dose controls and multiple distinct + treated doses populates treatment_dose with has_zero_dose=True and + the correct n_distinct_doses + dose support.""" + rng = np.random.default_rng(19) + rows = [] + for u in range(1, 21): + # 5 zero-dose units, 15 with one of three nonzero dose levels + if u <= 5: + dose = 0.0 + elif u <= 10: + dose = 1.0 + elif u <= 15: + dose = 2.5 + else: + dose = 4.0 + for t in range(4): + rows.append({"u": u, "t": t, "tr": dose, "y": float(rng.normal())}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + dose = profile.treatment_dose + assert profile.treatment_type == "continuous" + assert dose is not None + assert dose.has_zero_dose is True + assert dose.n_distinct_doses == 4 # {0, 1, 2.5, 4} + assert dose.dose_min == pytest.approx(1.0) + assert dose.dose_max == pytest.approx(4.0) + + +def test_treatment_dose_descriptive_fields_supplement_existing_gates(): + """Regression: most TreatmentDoseShape fields are descriptive + distributional context. In the canonical ContinuousDiD setup + (Callaway, Goodman-Bacon, Sant'Anna 2024) the dose `D_i` is + time-invariant per unit and `first_treat` is a separate + caller-supplied column — `profile_panel` sees only the dose + column. Profile-side screening on the dose column includes + `treatment_varies_within_unit` (the actual fit-time gate; rules + out `0,0,d,d` paths) and `has_never_treated` (predicts + `P(D=0) > 0` under the canonical convention that `first_treat == + 0` ties to `D_i == 0`). The two cases below exercise those + signals: + + 1. `0,0,d,d` within-unit dose path: a single unit toggles between + zero (pre-treatment) and a single nonzero dose `d` (post). The + PanelProfile.treatment_varies_within_unit field correctly fires + True. This IS the actual fit-time gate (line 222-228 of + continuous_did.py uses `df.groupby(unit)[dose].nunique() > 1` + independent of `first_treat`). The canonical ContinuousDiD + setup expects time-invariant per-unit dose, so a `0,0,d,d` + path is incompatible regardless of how `first_treat` is + constructed. + 2. Row-level zeros without never-treated: every unit eventually + gets treated, but pre-treatment rows have dose=0. has_zero_dose + fires True (row-level), while has_never_treated correctly fires + False (unit-level). With a `first_treat` column consistent + with the dose column on per-unit treated/untreated status, + `ContinuousDiD.fit()` will reject the panel via + `n_control == 0`. `ContinuousDiD` as currently implemented + does not apply (Remark 3.1 lowest-dose-as-control is not + implemented). Routing alternatives that do not require + `P(D=0) > 0` include linear DiD with the treatment as a + continuous covariate and `HeterogeneousAdoptionDiD` for + graded-adoption designs. Re-encoding the treatment column + is an agent-side preprocessing choice that changes the + estimand and is not documented in REGISTRY. Do not relabel + units to manufacture controls via the force-zero coercion + path.""" + # Case 1: 0,0,d,d within-unit path + rows1 = [] + for u in range(1, 21): + for t in range(4): + if u <= 5: + dose = 0.0 # never-treated controls + else: + # 0,0,2.5,2.5 path - constant nonzero dose, but full + # path nunique == 2 + dose = 0.0 if t < 2 else 2.5 + rows1.append({"u": u, "t": t, "tr": dose, "y": 0.0}) + df1 = pd.DataFrame(rows1) + profile1 = profile_panel(df1, unit="u", time="t", treatment="tr", outcome="y") + assert profile1.treatment_type == "continuous" + assert profile1.treatment_varies_within_unit is True, ( + "treatment_varies_within_unit must fire True on `0,0,d,d` paths " + "(matching ContinuousDiD.fit() unit-level full-path nunique check at " + "line 222-228 of continuous_did.py); this is an actual fit-time gate " + "that holds independent of `first_treat`." + ) + + # Case 2: row-level zeros, no never-treated units + rows2 = [] + for u in range(1, 21): + # Every unit eventually treated; pre-treatment dose=0, post-treatment + # constant nonzero dose. No unit is never-treated. + adopt_period = 1 if u <= 10 else 2 + for t in range(4): + dose = 1.5 if t >= adopt_period else 0.0 + rows2.append({"u": u, "t": t, "tr": dose, "y": 0.0}) + df2 = pd.DataFrame(rows2) + profile2 = profile_panel(df2, unit="u", time="t", treatment="tr", outcome="y") + assert profile2.treatment_type == "continuous" + dose2 = profile2.treatment_dose + assert dose2 is not None + assert dose2.has_zero_dose is True, "row-level zero dose rows are present" + assert profile2.has_never_treated is False, ( + "every unit eventually treated -> has_never_treated must be False, " + "even though row-level has_zero_dose==True. Under the canonical " + "ContinuousDiD setup, has_never_treated==False signals to the " + "agent that no `first_treat == 0` units exist; ContinuousDiD.fit() " + "will reject the panel via `n_control == 0`. ContinuousDiD as " + "currently implemented does not apply on this panel; routing " + "alternatives include HeterogeneousAdoptionDiD or linear DiD with " + "a continuous covariate. Re-encoding is an agent-side preprocessing " + "choice not documented in REGISTRY as a supported fallback." + ) + + +def test_treatment_dose_min_flags_negative_dose_continuous_panels(): + """Regression: a balanced, never-treated, time-invariant continuous + panel with negative non-zero treated doses fails the canonical- + setup `dose_min > 0` preflight check. The canonical ContinuousDiD + setup uses time-invariant per-unit dose `D_i` and a separate + caller-supplied `first_treat` column; with a `first_treat` + consistent with the dose column on per-unit treated/untreated + status (negative-dose units labeled `first_treat > 0`), + `ContinuousDiD.fit()` would raise `ValueError` at + `continuous_did.py:287-294` ("Dose must be strictly positive for + treated units (D > 0)"). `ContinuousDiD` as currently + implemented does not apply on this panel. `HeterogeneousAdoptionDiD` + is also NOT a routing alternative here: HAD requires non-negative + dose support (`had.py:1450-1459`, paper Section 2). The applicable + alternative on the negative-dose branch is linear DiD with the + treatment as a signed continuous covariate. Re-encoding the + treatment to a non-negative scale is an agent-side preprocessing + choice that changes the estimand and is not documented in + REGISTRY as a supported fallback. The force-zero coercion path + on `first_treat == 0` rows is implementation behavior for + inconsistent inputs and is not a documented routing option. This + test asserts that the profile correctly surfaces `dose_min < 0` + so an agent can choose an applicable alternative before reaching + `fit()`.""" + rng = np.random.default_rng(41) + rows = [] + for u in range(1, 21): + # 5 zero-dose (never-treated) units, 15 with one of three + # NEGATIVE non-zero dose levels held constant across periods. + if u <= 5: + dose = 0.0 + elif u <= 10: + dose = -1.0 + elif u <= 15: + dose = -2.5 + else: + dose = -4.0 + for t in range(4): + rows.append({"u": u, "t": t, "tr": dose, "y": float(rng.normal())}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + # All other canonical-setup preflight checks pass: + assert profile.treatment_type == "continuous" + assert profile.has_never_treated is True + assert profile.treatment_varies_within_unit is False + assert profile.is_balanced is True + assert "duplicate_unit_time_rows" not in {a.code for a in profile.alerts} + # But dose_min > 0 fails: under the canonical ContinuousDiD setup + # (per-unit time-invariant dose + separate first_treat with + # negative-dose units labeled first_treat > 0), fit() would raise + # at line 287-294 ("Dose must be strictly positive for treated + # units"). ContinuousDiD as currently implemented does not apply. + # HeterogeneousAdoptionDiD is also NOT a routing alternative on + # this branch: HAD requires non-negative dose support (had.py: + # 1450-1459, paper Section 2). The applicable alternative is + # linear DiD with the treatment as a signed continuous covariate. + # Re-encoding is an agent-side preprocessing choice not + # documented in REGISTRY as a fallback. Relabeling-to- + # first_treat==0 is not a documented routing option. + dose = profile.treatment_dose + assert dose is not None + assert dose.dose_min < 0, ( + "Fixture must have negative dose_min so the canonical-setup " + "preflight check (`dose_min > 0`) correctly fires False on " + "this panel." + ) + + +def test_treatment_dose_continuous_no_zero_dose(): + """If every unit has a strictly positive dose throughout, has_zero_dose + must be False — descriptive flag for the absence of dose-zero rows in + the panel.""" + rng = np.random.default_rng(29) + rows = [] + for u in range(1, 21): + dose = float(rng.uniform(0.5, 3.0)) + for t in range(4): + rows.append({"u": u, "t": t, "tr": dose, "y": float(rng.normal())}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + dose = profile.treatment_dose + assert dose is not None + assert dose.has_zero_dose is False + + +def test_treatment_dose_binary_treatment_returns_none(): + """Binary treatment must produce treatment_dose=None — dose semantics + only apply to continuous treatments.""" + first_treat = {u: 2 for u in range(11, 21)} + df = _make_panel(n_units=20, periods=range(0, 4), first_treat=first_treat) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + assert profile.treatment_type == "binary_absorbing" + assert profile.treatment_dose is None + + +def test_treatment_dose_categorical_treatment_returns_none(): + """Categorical (object-dtype) treatment must produce treatment_dose=None.""" + rows = [] + for u in range(1, 11): + arm = "A" if u <= 5 else "B" + for t in range(4): + rows.append({"u": u, "t": t, "tr": arm, "y": float(u) + 0.1 * t}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + assert profile.treatment_type == "categorical" + assert profile.treatment_dose is None + + +def test_treatment_dose_to_dict_roundtrips_through_json(): + """treatment_dose must serialize to a JSON-compatible nested dict.""" + rng = np.random.default_rng(31) + rows = [] + for u in range(1, 21): + dose = 0.0 if u <= 5 else 2.5 + for t in range(4): + rows.append({"u": u, "t": t, "tr": dose, "y": float(rng.normal())}) + df = pd.DataFrame(rows) + profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y") + payload = profile.to_dict() + roundtrip = json.loads(json.dumps(payload)) + dose_dict = roundtrip["treatment_dose"] + assert dose_dict is not None + assert set(dose_dict.keys()) == { + "n_distinct_doses", + "has_zero_dose", + "dose_min", + "dose_max", + "dose_mean", + } + + +# --------------------------------------------------------------------------- +# Frozen invariants +# --------------------------------------------------------------------------- + + +def test_outcome_shape_dataclass_is_frozen(): + from diff_diff.profile import OutcomeShape + + s = OutcomeShape( + n_distinct_values=3, + pct_zeros=0.1, + value_min=0.0, + value_max=1.0, + skewness=0.5, + excess_kurtosis=0.0, + is_integer_valued=False, + is_count_like=False, + is_bounded_unit=True, + ) + with pytest.raises(dataclasses.FrozenInstanceError): + s.n_distinct_values = 999 # type: ignore[misc] + + +def test_panel_profile_direct_construction_without_wave2_fields(): + """`PanelProfile` is a public top-level type. Wave 2 added two + optional fields (`outcome_shape`, `treatment_dose`) with `None` + defaults so direct callers that instantiated `PanelProfile(...)` + pre-Wave-2 do not break with `TypeError: missing keyword + argument`. Verify direct construction without the new fields + succeeds and yields the documented defaults.""" + profile = PanelProfile( + n_units=2, + n_periods=2, + n_obs=4, + is_balanced=True, + observation_coverage=1.0, + treatment_type="binary_absorbing", + is_staggered=False, + n_cohorts=1, + cohort_sizes={1: 1}, + has_never_treated=True, + has_always_treated=False, + treatment_varies_within_unit=True, + first_treatment_period=1, + last_treatment_period=1, + min_pre_periods=1, + min_post_periods=1, + outcome_dtype="float64", + outcome_is_binary=False, + outcome_has_zeros=False, + outcome_has_negatives=False, + outcome_missing_fraction=0.0, + outcome_summary={"min": 0.0, "max": 1.0, "mean": 0.5, "std": 0.5}, + alerts=(), + ) + assert profile.outcome_shape is None + assert profile.treatment_dose is None + # to_dict() must serialize the defaulted fields as None values + payload = profile.to_dict() + assert payload["outcome_shape"] is None + assert payload["treatment_dose"] is None + + +def test_treatment_dose_dataclass_is_frozen(): + from diff_diff.profile import TreatmentDoseShape + + d = TreatmentDoseShape( + n_distinct_doses=3, + has_zero_dose=True, + dose_min=1.0, + dose_max=3.0, + dose_mean=2.0, + ) + with pytest.raises(dataclasses.FrozenInstanceError): + d.has_zero_dose = False # type: ignore[misc]