From ed6af39035dc9e60825c14307582df5be31681d4 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 23 Jun 2026 18:38:55 +0000
Subject: [PATCH 1/3] Restore timeseries conformal benchmark (corrected) +
 point-model duel (#57)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replaces the withdrawn gists/timeseries/conformal benchmark, which (per #57)
leaked each evaluation chunk into NNS.ARMA.optim and compared methods across
mismatched forecasting protocols.

gists/timeseries/conformal/ — corrected interval study. The NNS block point
forecast is held FIXED (model selected on a strictly historical validation tail;
the scored block is never shown to the optimizer) and only the interval
construction varies: NNS native PI vs. split-conformal (flat) vs. split-conformal
(per-lead-time) vs. Gaussian, all on identical residuals. Framed explicitly as an
adaptation to discern coverage guarantees on a heteroskedastic process. Finding:
native PI is the efficiency winner and is essentially a flat split-conformal band;
every flat band under-covers the volatile regime (exchangeability failure); only
the horizon-adaptive per-lead wrapper recovers near-nominal coverage, at a width
cost.

gists/timeseries/point_duel/ — fair point-forecast comparison under one block
protocol (no online updating): NNS block vs recursive-block ridge vs persistence.
NNS wins decisively (MAE 1.51 vs 2.66 vs 2.49); recursive ridge degrades below
persistence once the h=1 true-lag crutch is removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TX6vFofX1vnanE147tJFWF
---
 gists/timeseries/conformal/README.md          | 127 +++++++
 .../conformal/results_block/scores.csv        |   5 +
 .../conformal/results_block/scores_all.csv    |  41 +++
 .../conformal/run_block_nns_intervals.py      | 304 +++++++++++++++++
 gists/timeseries/point_duel/README.md         |  93 ++++++
 .../point_duel/results_duel/interval.csv      |   5 +
 .../point_duel/results_duel/interval_all.csv  |  41 +++
 .../point_duel/results_duel/point.csv         |   4 +
 .../point_duel/results_duel/point_all.csv     |  31 ++
 gists/timeseries/point_duel/run_point_duel.py | 316 ++++++++++++++++++
 10 files changed, 967 insertions(+)
 create mode 100644 gists/timeseries/conformal/README.md
 create mode 100644 gists/timeseries/conformal/results_block/scores.csv
 create mode 100644 gists/timeseries/conformal/results_block/scores_all.csv
 create mode 100644 gists/timeseries/conformal/run_block_nns_intervals.py
 create mode 100644 gists/timeseries/point_duel/README.md
 create mode 100644 gists/timeseries/point_duel/results_duel/interval.csv
 create mode 100644 gists/timeseries/point_duel/results_duel/interval_all.csv
 create mode 100644 gists/timeseries/point_duel/results_duel/point.csv
 create mode 100644 gists/timeseries/point_duel/results_duel/point_all.csv
 create mode 100644 gists/timeseries/point_duel/run_point_duel.py

diff --git a/gists/timeseries/conformal/README.md b/gists/timeseries/conformal/README.md
new file mode 100644
index 0000000..79ec107
--- /dev/null
+++ b/gists/timeseries/conformal/README.md
@@ -0,0 +1,127 @@
+# NNS prediction intervals vs. a conformal wrapper — apples to apples
+
+A controlled test of **prediction-interval construction** on a single,
+heteroskedastic data-generating process. The **point forecast is held fixed**:
+every interval in the table below wraps the *same* `NNS.ARMA.optim` block
+forecast and the *same* NNS residuals. The only thing that varies is how the
+band is drawn. This isolates interval construction with zero point-model and
+zero protocol confounding.
+
+## Why this framing
+
+This is deliberately **not** the usual online conformal benchmark, and we want
+to be explicit that it is an **adaptation**. Standard adaptive conformal methods
+(ACI, PID, NexCP, …) are *online, h=1* procedures: they re-anchor on the realized
+observation every step, and their coverage is a control-theoretic property of
+that feedback loop rather than a statement about any forecast trajectory. An
+online method that re-anchors each step will hit its nominal coverage rate on
+almost any sequence — it is dragged across each one-step gap rather than
+forecasting it. Comparing a structural multi-step forecaster against that on a
+1-step task flatters the reactive method and tests the wrong thing.
+
+So here we **adapt conformal into a forecast wrapper** and ask a sharper
+question instead:
+
+> Given a genuine multi-step forecast made at time *t* with **no online
+> updating**, how well does each interval method discern **coverage guarantees
+> on a heteroskedastic process** — including *conditional* coverage across the
+> volatility regimes, not just the marginal rate?
+
+## Protocol
+
+- **DGP:** non-linear, heteroskedastic AR(1) with slow trend, two seasonal
+  components (periods 50, 200), and piecewise volatility regimes
+  (σ jumps to 2.5, drops to 0.55, settles at 1.8). 10 seeds, T = 3500.
+- **Forecast (fixed for all methods):** at each origin *t*, `NNS.ARMA.optim`
+  emits a single `h`-step block forecast, with `h = implied_h =
+  t·(1−0.9)/0.9`. The model (periods/method/bias) is selected on a strictly
+  **historical** validation tail — the scored block is never shown to the
+  optimizer (the leak from issue #57 is fixed). No observation inside the block
+  is fed back: this is forecasting by design, exposed to its own compounding
+  error.
+- **Intervals compared (all on the identical NNS forecast & residuals):**
+  - **NNS native PI** — `results ± pi_width`, NNS's own nonparametric rule
+    (a single flat half-width from the residual distribution + bias).
+  - **NNS + split-CP (flat)** — empirical (1−α) conformal quantile of the same
+    NNS validation residuals. Same residuals, textbook conformal quantile.
+  - **NNS + split-CP (per-lead-time)** — a conformal quantile *per lead-time k*,
+    pooled across past blocks (with a local-window thickening and a flat
+    fallback while pools are thin). The only method whose band **widens with
+    horizon**.
+  - **NNS + Gaussian (flat)** — `z·std(residuals)`, a parametric reference.
+- **Scoring:** interval (Winkler) score, marginal coverage, **worst rolling-window
+  coverage** and **low-/high-vol stratum coverage** (the heteroskedasticity
+  discriminators), mean width, and Gaussian-proxy CRPS / log-score.
+
+## Results
+
+```
+=== BLOCK NNS INTERVAL COMPARISON  (point = NNS fixed; mean over 10 seeds, alpha=0.1, target cov=0.9) ===
+
+                   method  marg_cov  worst_win_cov  cov_lowvol  cov_hivol  cond_cov_gap  width  interval_score   CRPS  logscore
+            NNS native PI     0.845          0.561       0.927      0.853         0.149  5.658           8.674  1.090     2.076
+    NNS + split-CP (flat)     0.824          0.515       0.899      0.850         0.169  5.398           8.735  1.091     2.102
+    NNS + Gaussian (flat)     0.821          0.516       0.894      0.842         0.170  5.332           8.773  1.092     2.105
+NNS + split-CP (per-lead)     0.890          0.582       0.983      0.844         0.099  7.995          10.521  1.169     2.268
+```
+
+Sorted by interval (Winkler) score, lower is better. Point-forecast MAE was
+identical across rows (~1.51, the fixed NNS block forecast); only the bands differ.
+
+## Findings
+
+- **NNS's native PI is the efficiency winner and ≈ a flat split-conformal band.**
+  It posts the best Winkler score (8.674) with the tightest width (5.66), and it
+  actually covers slightly *better* than the textbook split-conformal quantile on
+  the *same* residuals (marg 0.845 vs 0.824). NNS's `upm_var`+bias rule is a hair
+  more conservative than the empirical conformal quantile, but the two track
+  closely — confirming the native interval is, in effect, a flat split-conformal
+  interval.
+- **Every flat band under-covers this heteroskedastic process.** All three
+  flat methods land at ~0.82–0.85 marginal vs. the 0.90 target, collapse to
+  ~0.51–0.56 in the *worst* 100-step window, and carry a ~0.15–0.17 calm-vs-volatile
+  conditional gap (≈0.90–0.93 in the calm regime, ≈0.84–0.85 in the volatile one).
+  A width calibrated on historical validation residuals does not transport to a
+  multi-step block that compounds error and crosses volatility regimes.
+- **Only the horizon-adaptive conformal wrapper recovers the guarantee — and pays
+  for it.** `split-CP (per-lead)` is the lone method to reach near-nominal marginal
+  coverage (0.890) and the smallest conditional gap (0.099), because it widens with
+  lead-time and calibrates on realized block residuals. The cost is ~45% wider
+  intervals (7.995) and the worst Winkler score (10.521).
+
+## Takeaway
+
+On a heteroskedastic process, under honest multi-step forecasting with **no online
+updating**, there is no free lunch. NNS's native interval is the *efficiency*
+winner but does **not** deliver the nominal 90% coverage — least of all in the
+volatile regime and the worst windows, exactly where guarantees matter most.
+Recovering the guarantee requires a horizon-adaptive conformal wrapper and paying
+for it in width. This coverage-vs-efficiency tension — not a single "winner" — is
+what the benchmark is built to expose, and it is why we frame this as an
+*adaptation to discern coverage guarantees on a heteroskedastic process* rather
+than a claim that any one band dominates.
+
+## How to read it
+
+- **`NNS native PI` vs `NNS + split-CP (flat)`** is the headline head-to-head:
+  the *same* residuals, NNS's `upm_var`+bias rule vs. the empirical conformal
+  quantile. If they track closely, NNS's native band already *is*, in effect, a
+  flat split-conformal interval.
+- **`per-lead-time`** is the only band that adapts to horizon. Watch
+  **worst-window** and **high-vol stratum** coverage: that is where letting the
+  width grow with lead-time should pay off on a heteroskedastic process, at some
+  cost in mean width.
+- **Marginal coverage near 0.90 is necessary but cheap.** The honest test on a
+  heteroskedastic DGP is the **conditional** coverage gap between the calm and
+  volatile regimes — a flat band that hits 0.90 on average by over-covering the
+  calm regime and under-covering the volatile one has not actually solved the
+  problem.
+
+## Run it
+
+```bash
+pip install ovvo-nns numpy pandas scipy
+python run_block_nns_intervals.py
+```
+
+Writes `results_block/scores.csv` (aggregated) and `scores_all.csv` (per seed).
diff --git a/gists/timeseries/conformal/results_block/scores.csv b/gists/timeseries/conformal/results_block/scores.csv
new file mode 100644
index 0000000..8252db9
--- /dev/null
+++ b/gists/timeseries/conformal/results_block/scores.csv
@@ -0,0 +1,5 @@
+method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score,CRPS,logscore
+NNS native PI,0.845217041800643,0.561,0.9271704180064309,0.852572347266881,0.14855305466237945,5.658247904936664,8.673563693193246,1.0902136839659236,2.076059884663743
+NNS + split-CP (flat),0.8243569131832797,0.515,0.8993569131832798,0.84983922829582,0.1686495176848875,5.397754711147717,8.734633715776543,1.090996705962063,2.101657125801346
+NNS + Gaussian (flat),0.8206993569131832,0.516,0.8940514469453376,0.8421221864951768,0.17025723472668813,5.332100605902861,8.772635078222827,1.091613092749769,2.104833304629708
+NNS + split-CP (per-lead),0.890112540192926,0.5820000000000001,0.9829581993569132,0.8437299035369776,0.09909967845659165,7.995348220035973,10.520733085082481,1.1692625763837228,2.267904409014462
diff --git a/gists/timeseries/conformal/results_block/scores_all.csv b/gists/timeseries/conformal/results_block/scores_all.csv
new file mode 100644
index 0000000..0e62ba6
--- /dev/null
+++ b/gists/timeseries/conformal/results_block/scores_all.csv
@@ -0,0 +1,41 @@
+method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score,CRPS,logscore
+NNS native PI,0.8476688102893891,0.67,0.9131832797427653,0.8070739549839229,0.09453376205787789,5.3336289551117595,8.347311881079282,1.0521806181725764,1.9988765275223412
+NNS + split-CP (flat),0.8271704180064309,0.58,0.8954983922829582,0.8311897106109325,0.14919614147909965,5.227161385540743,8.534113534345261,1.0564514057462153,2.047717813637678
+NNS + split-CP (per-lead),0.8886655948553055,0.65,0.9839228295819936,0.8086816720257235,0.09131832797427653,7.846909211270708,10.496079109394278,1.140506122590597,2.250303555178376
+NNS + Gaussian (flat),0.8295819935691319,0.58,0.8858520900321544,0.8279742765273312,0.14276527331189715,5.220493265073913,8.535512774975702,1.0556550754409413,2.0458238791253365
+NNS native PI,0.8287781350482315,0.52,0.9163987138263665,0.8617363344051447,0.2022508038585209,5.538396478733216,9.039315486588494,1.1017051722008695,2.123941052068062
+NNS + split-CP (flat),0.815112540192926,0.51,0.9067524115755627,0.8440514469453376,0.23118971061093252,5.423595785308306,9.16356799285496,1.1045221353056585,2.145312293713436
+NNS + split-CP (per-lead),0.8830385852090032,0.6,0.9887459807073955,0.8295819935691319,0.12025723472668814,7.990812843044477,10.886441408223009,1.185924631575878,2.292922429135039
+NNS + Gaussian (flat),0.8163183279742765,0.52,0.9163987138263665,0.8408360128617364,0.23118971061093252,5.371605324250339,9.132929020698203,1.1034425720507015,2.1344452670414986
+NNS native PI,0.8480707395498392,0.54,0.9453376205787781,0.8713826366559485,0.14115755627009652,5.498313881263709,8.568987367833104,1.0769091090418628,2.088025725407269
+NNS + split-CP (flat),0.8348070739549839,0.47,0.9115755627009646,0.8858520900321544,0.16366559485530552,5.475440096762711,8.646211618108355,1.0779606714369134,2.1374529629188985
+NNS + split-CP (per-lead),0.889871382636656,0.49,0.9758842443729904,0.8697749196141479,0.11061093247588427,7.666059526775997,10.240968683617762,1.1422851084449035,2.299705192621768
+NNS + Gaussian (flat),0.8340032154340836,0.47,0.905144694533762,0.8681672025723473,0.17009646302250803,5.3825939727945,8.623802541408669,1.0773609694299997,2.127147377666515
+NNS native PI,0.8404340836012861,0.55,0.9469453376205788,0.8762057877813505,0.17009646302250803,5.417943415993259,8.509212166486481,1.0513799372860273,2.0645442083490733
+NNS + split-CP (flat),0.8183279742765274,0.51,0.8858520900321544,0.8569131832797428,0.15241157556270102,5.073486836478882,8.35866609307891,1.0468951694182635,2.0402247395377504
+NNS + split-CP (per-lead),0.8971061093247589,0.59,0.9790996784565916,0.8553054662379421,0.08006430868167203,7.868177918914598,10.290785210091181,1.1317381317261397,2.2247260687632786
+NNS + Gaussian (flat),0.8207395498392283,0.54,0.8697749196141479,0.8569131832797428,0.13151125401929264,5.187680907668864,8.312439020985863,1.0459231999241414,2.026847378341962
+NNS native PI,0.8472668810289389,0.51,0.9405144694533762,0.7733118971061094,0.12668810289389065,5.912994900178406,8.952518643444352,1.1094460776961108,2.105079957413961
+NNS + split-CP (flat),0.8243569131832797,0.58,0.9067524115755627,0.7765273311897106,0.15401929260450165,5.307560736619756,8.610536381423858,1.099087789113522,2.0849945815307396
+NNS + split-CP (per-lead),0.8866559485530546,0.6,0.9823151125401929,0.7845659163987139,0.11543408360128615,7.9404115839516605,10.336416034686959,1.1735005767309783,2.2413111688768863
+NNS + Gaussian (flat),0.8171221864951769,0.57,0.887459807073955,0.7733118971061094,0.1508038585209004,5.2117633612580585,8.673499632507543,1.0995952439049514,2.1011013227051474
+NNS native PI,0.860128617363344,0.58,0.9501607717041801,0.8906752411575563,0.15562700964630227,5.662363541166043,8.391587116904702,1.0718817368677682,2.0521000120264206
+NNS + split-CP (flat),0.8243569131832797,0.37,0.887459807073955,0.8745980707395499,0.14919614147909965,5.3533136605690315,8.60722523849902,1.0765820133621553,2.1197718297356456
+NNS + split-CP (per-lead),0.8922829581993569,0.6,0.9871382636655949,0.8778135048231511,0.09453376205787789,7.897905901117482,10.200940248313179,1.1494769211085452,2.242593936246776
+NNS + Gaussian (flat),0.8247588424437299,0.37,0.9019292604501608,0.8729903536977492,0.15241157556270102,5.31108027242601,8.649115111814341,1.0773573984920315,2.1247815699699717
+NNS native PI,0.8432475884244373,0.62,0.9019292604501608,0.8408360128617364,0.10900321543408364,5.958905251551655,8.884737288789054,1.1533883414727397,2.112495798167121
+NNS + split-CP (flat),0.802652733118971,0.55,0.9019292604501608,0.8344051446945338,0.18778135048231515,5.514395594522053,9.282531631715036,1.1632614958510543,2.169479165301043
+NNS + split-CP (per-lead),0.8882636655948553,0.62,0.9823151125401929,0.842443729903537,0.08231511254019286,8.567512061462239,11.159806161705042,1.2453577968361804,2.3305953801588606
+NNS + Gaussian (flat),0.7930064308681672,0.54,0.887459807073955,0.8215434083601286,0.1958199356913184,5.443032862596285,9.404878484224922,1.165957599219953,2.1892321453449055
+NNS native PI,0.8404340836012861,0.42,0.9501607717041801,0.8665594855305466,0.19742765273311902,5.533785215198311,8.749410650539787,1.0714713083435363,2.076368446829223
+NNS + split-CP (flat),0.8336012861736335,0.47,0.9212218649517685,0.8488745980707395,0.16045016077170415,5.394905745099468,8.708916464271185,1.069461483415126,2.071863509791649
+NNS + split-CP (per-lead),0.8858520900321544,0.49,0.9887459807073955,0.837620578778135,0.11382636655948553,7.695935282614414,10.322007696267946,1.145102489810678,2.2482944871036135
+NNS + Gaussian (flat),0.8211414790996785,0.46,0.9115755627009646,0.8408360128617364,0.18135048231511253,5.203825372900099,8.80921517840864,1.071527536672299,2.088411117524039
+NNS native PI,0.8484726688102894,0.58,0.905144694533762,0.8778135048231511,0.13151125401929264,6.028058412975543,8.799144860895987,1.1309121052212678,2.0927473976307103
+NNS + split-CP (flat),0.8311897106109325,0.52,0.887459807073955,0.8697749196141479,0.1620578778135049,5.725218656291451,8.758728314500287,1.130064761156996,2.113226263951555
+NNS + split-CP (per-lead),0.8914790996784566,0.53,0.977491961414791,0.8569131832797428,0.09292604501607715,8.23717370280603,10.704707152434425,1.212676341149698,2.3029740791509625
+NNS + Gaussian (flat),0.8255627009646302,0.52,0.8858520900321544,0.8569131832797428,0.1620578778135049,5.649387814291777,8.818937346230603,1.1313117903011027,2.117039408279818
+NNS native PI,0.8476688102893891,0.62,0.9019292604501608,0.860128617363344,0.1572347266881029,5.69808899719474,8.493411469371221,1.0828624333564756,2.046419721223251
+NNS + split-CP (flat),0.8319935691318328,0.59,0.8890675241157556,0.8762057877813505,0.17652733118971065,5.482468614284766,8.675839888968568,1.0856801348147251,2.0865280978950644
+NNS + split-CP (per-lead),0.8979099678456591,0.65,0.9839228295819936,0.8745980707395499,0.0897106109324759,8.242584168402129,10.569179146091031,1.166057643863629,2.24561779290906
+NNS + Gaussian (flat),0.8247588424437299,0.59,0.8890675241157556,0.8617363344051447,0.1845659163987139,5.339542905768758,8.766021670973783,1.0879995420615687,2.0935035802978836
diff --git a/gists/timeseries/conformal/run_block_nns_intervals.py b/gists/timeseries/conformal/run_block_nns_intervals.py
new file mode 100644
index 0000000..382bbf3
--- /dev/null
+++ b/gists/timeseries/conformal/run_block_nns_intervals.py
@@ -0,0 +1,304 @@
+"""
+run_block_nns_intervals.py
+================================================================================
+Apples-to-apples prediction-interval comparison, NNS point forecast held FIXED.
+
+Question answered
+-----------------
+At each origin t, given only data through t, NNS.ARMA.optim emits a single
+h-step block forecast (h = implied_h = current_train*(1-0.9)/0.9, NO online
+updating inside the block). We then build several prediction intervals on the
+SAME NNS point forecast and the SAME NNS residuals, and ask:
+
+    Does NNS's native prediction interval beat conformalizing NNS's own
+    residuals — and does letting the band widen with lead-time help?
+
+Because the point model is identical across every interval method, this isolates
+interval construction with zero protocol mismatch. (Contrast issue #57, where the
+NNS optimizer also saw the scored block — fixed here: the model is selected on a
+strictly historical validation tail, and the scored block is never shown to it.)
+
+Interval methods compared (all on the identical NNS block point forecast):
+  * NNS native PI            — results ± pi_width  (flat width, NNS's own rule)
+  * NNS + split-CP (flat)    — empirical 0.9 quantile of NNS validation residuals
+  * NNS + split-CP (per-lead)— quantile per lead-time k, pooled across past blocks
+                               (widens with horizon; data-starved -> fallback early)
+  * NNS + Gaussian (flat)    — z * std(NNS validation residuals)
+
+Run:
+    pip install ovvo-nns numpy pandas scipy matplotlib
+    python run_block_nns_intervals.py
+"""
+from __future__ import annotations
+
+import os
+import math
+import time
+import warnings
+from collections import defaultdict
+
+import numpy as np
+import pandas as pd
+from scipy import stats
+
+warnings.filterwarnings("ignore")
+
+try:
+    import nns as NNS
+except ImportError:
+    raise SystemExit("ovvo-nns is required. Run: pip install ovvo-nns")
+
+# ── Constants ────────────────────────────────────────────────────────────────
+ALPHA       = 0.10
+TARGET_COV  = 1.0 - ALPHA
+N_LAGS      = 12
+CAL_END     = 1000          # first NNS origin = N_LAGS + CAL_END
+WINDOW      = 100
+N_SEEDS     = 10
+TRAINING_FRAC = 0.90        # implied_h = current_train*(1-frac)/frac
+VAL_FRAC    = 0.10          # historical validation-tail fraction for model selection
+CP_MIN_POOL = 8             # min per-lead samples before trusting per-lead CP
+LOCAL_WIN   = 2             # +/- lead-time window to thicken per-lead pools
+
+os.makedirs("results_block", exist_ok=True)
+
+_MSE = lambda predicted, actual: np.mean((predicted - actual) ** 2)
+
+# ── DGP ──────────────────────────────────────────────────────────────────────
+def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> dict:
+    rng = np.random.default_rng(seed + 1)
+    tt    = np.arange(1, T + 1, dtype=float)
+    level = 0.002 * tt + 1.50 * np.sin(2*np.pi*tt/50) + 0.75 * np.sin(2*np.pi*tt/200)
+    sigma = np.ones(T)
+    sigma[(tt > 900)  & (tt <= 1400)] = 2.5
+    sigma[(tt > 1900) & (tt <= 2450)] = 0.55
+    sigma[(tt > 2800)]                = 1.8
+    eps = rng.standard_t(5, size=T) / math.sqrt(5/3) if heavy_tail else rng.standard_normal(T)
+    y   = np.empty(T)
+    y[0] = level[0] + sigma[0] * eps[0]
+    for i in range(1, T):
+        y[i] = level[i] + 0.55 * (y[i-1] - level[i-1]) + sigma[i] * eps[i]
+    return {"y": y, "level": level, "sigma": sigma}
+
+# ── Scoring helpers ──────────────────────────────────────────────────────────
+def z_alpha(alpha: float = ALPHA) -> float:
+    return float(stats.norm.ppf(1 - alpha / 2))
+
+def coverage(lo, hi, y) -> float:
+    lo, hi, y = map(np.asarray, (lo, hi, y))
+    return float(np.mean((y >= lo) & (y <= hi)))
+
+def mean_width(lo, hi) -> float:
+    return float(np.mean(np.asarray(hi) - np.asarray(lo)))
+
+def rolling_coverage(lo, hi, y, window: int = WINDOW) -> np.ndarray:
+    lo, hi, y = map(np.asarray, (lo, hi, y))
+    n = len(y)
+    if n < window:
+        return np.array([])
+    return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window])
+                     for i in range(n - window + 1)])
+
+def worst_window_coverage(lo, hi, y, window: int = WINDOW) -> float:
+    rc = rolling_coverage(lo, hi, y, window)
+    return float(np.min(rc)) if len(rc) > 0 else float("nan")
+
+def interval_score(lo, hi, y, alpha: float = ALPHA) -> float:
+    lo, hi, y = map(np.asarray, (lo, hi, y))
+    return float(np.mean((hi - lo)
+                         + (2/alpha) * np.maximum(lo - y, 0)
+                         + (2/alpha) * np.maximum(y - hi, 0)))
+
+def coverage_by_stratum(lo, hi, y, sigma, k: int = 4) -> list[float]:
+    lo, hi, y, sigma = map(np.asarray, (lo, hi, y, sigma))
+    ranks = stats.rankdata(sigma, method="ordinal")
+    grp   = np.ceil(ranks / len(ranks) * k).astype(int).clip(1, k)
+    return [coverage(lo[grp == j], hi[grp == j], y[grp == j]) for j in range(1, k+1)]
+
+def crps_gaussian(mu, sigma, y) -> float:
+    sigma = np.maximum(np.asarray(sigma, float), 1e-8)
+    z = (np.asarray(y) - np.asarray(mu)) / sigma
+    return float(np.mean(sigma * (z * (2*stats.norm.cdf(z) - 1)
+                                  + 2*stats.norm.pdf(z) - 1/math.sqrt(math.pi))))
+
+def log_score_gaussian(mu, sigma, y) -> float:
+    sigma = np.maximum(np.asarray(sigma, float), 1e-8)
+    return float(np.mean(-stats.norm.logpdf(np.asarray(y), loc=np.asarray(mu), scale=sigma)))
+
+def score_method(name, lo, hi, y, sigma, mu_=None, s_=None) -> dict:
+    lo_v = np.minimum(np.asarray(lo, float), np.asarray(hi, float))
+    hi_v = np.maximum(np.asarray(lo, float), np.asarray(hi, float))
+    cbs  = coverage_by_stratum(lo_v, hi_v, y, sigma)
+    row = {
+        "method":         name,
+        "marg_cov":       coverage(lo_v, hi_v, y),
+        "worst_win_cov":  worst_window_coverage(lo_v, hi_v, y),
+        "cov_lowvol":     cbs[0],
+        "cov_hivol":      cbs[-1],
+        "cond_cov_gap":   max(abs(c - TARGET_COV) for c in cbs if not math.isnan(c)),
+        "width":          mean_width(lo_v, hi_v),
+        "interval_score": interval_score(lo_v, hi_v, y),
+        "CRPS":           float("nan"),
+        "logscore":       float("nan"),
+    }
+    if mu_ is not None and s_ is not None:
+        row["CRPS"]     = crps_gaussian(mu_, s_, y)
+        row["logscore"] = log_score_gaussian(mu_, s_, y)
+    return row
+
+# ── NNS seasonal period discovery (full-window) ──────────────────────────────
+def get_nns_seas_periods(series: np.ndarray, train_n: int) -> list[int]:
+    max_period = int(train_n / 3) - 1
+    try:
+        raw = NNS.nns_seas(series, plot=False).get("periods", np.array([]))
+        seen, valid = set(), []
+        for p in raw:
+            pi = int(p)
+            if 1 < pi < max_period and pi not in seen:
+                seen.add(pi); valid.append(pi)
+        return valid if valid else [2]
+    except Exception:
+        return [2]
+
+def _emp_q(scores, level=TARGET_COV):
+    """Finite-sample conformal quantile of |scores| at coverage `level`."""
+    s = np.sort(np.abs(np.asarray(scores, float)))
+    n = len(s)
+    if n == 0:
+        return None
+    k = int(np.ceil((n + 1) * level)) - 1
+    return float("inf") if k >= n else float(s[k])
+
+# ── Block walk-forward: NNS point fixed, intervals swapped ────────────────────
+def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> dict:
+    d = make_timeseries(T=3500, seed=seed, heavy_tail=heavy_tail)
+    y, sig = d["y"], d["sigma"]
+    T_raw = len(y)
+    z = z_alpha(ALPHA)
+
+    # accumulators: per interval-method lo/hi, shared point/y/sigma per scored step
+    acc_y, acc_sig, acc_pt = [], [], []
+    lo_acc = defaultdict(list); hi_acc = defaultdict(list); sig_acc = defaultdict(list)
+    per_lead_pool: dict[int, list] = defaultdict(list)     # realized |resid| by lead-time
+
+    current_train = N_LAGS + CAL_END
+    t0 = time.time(); n_chunks = 0
+
+    while current_train < T_raw:
+        remaining = T_raw - current_train
+        implied_h = max(1, int(current_train * (1 - TRAINING_FRAC) / TRAINING_FRAC))
+        h = min(remaining, implied_h)
+        train_slice = y[:current_train]
+        val_h = max(1, int(round(VAL_FRAC * current_train)))
+        periods = get_nns_seas_periods(train_slice, current_train)
+
+        try:
+            fit = NNS.nns_arma_optim(
+                variable=train_slice, h=h, training_set=current_train - val_h,
+                seasonal_factor=periods, negative_values=True,
+                obj_fn=_MSE, objective="min", linear_approximation=True,
+                pred_int=TARGET_COV, print_trace=False, plot=False,
+            )
+        except Exception as exc:
+            if verbose:
+                print(f"  [seed {seed} chunk {n_chunks+1} failed: {exc}] – skipping")
+            current_train += h
+            continue
+
+        point     = np.asarray(fit["results"], float)            # bias-corrected block point
+        nat_lo    = np.asarray(fit["lower.pred.int"], float)
+        nat_hi    = np.asarray(fit["upper.pred.int"], float)
+        val_err   = np.asarray(fit["errors"], float)             # validation-tail residuals
+        bias_shift = float(fit["bias.shift"])
+        # residuals of the *reported* (bias-corrected) forecast on the validation tail
+        cal_resid = val_err + bias_shift
+
+        yb  = y[current_train:current_train + h]
+        sgb = sig[current_train:current_train + h]
+        n_chunks += 1
+
+        # ── interval method 1: NNS native PI ──────────────────────────────
+        lo_acc["NNS native PI"].extend(nat_lo)
+        hi_acc["NNS native PI"].extend(nat_hi)
+        nat_hw = np.maximum((nat_hi - nat_lo) / 2.0, 1e-8)
+        sig_acc["NNS native PI"].extend(nat_hw / z)
+
+        # ── interval method 2: split-CP (flat) on NNS validation residuals ─
+        q_flat = _emp_q(cal_resid, TARGET_COV)
+        q_flat = nat_hw.mean() if (q_flat is None or not np.isfinite(q_flat)) else q_flat
+        lo_acc["NNS + split-CP (flat)"].extend(point - q_flat)
+        hi_acc["NNS + split-CP (flat)"].extend(point + q_flat)
+        sig_acc["NNS + split-CP (flat)"].extend(np.full(h, max(q_flat / z, 1e-8)))
+
+        # ── interval method 3: split-CP (per-lead-time) ───────────────────
+        pl_lo = np.empty(h); pl_hi = np.empty(h); pl_s = np.empty(h)
+        for k in range(h):
+            pool = []
+            for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1):
+                pool.extend(per_lead_pool.get(kk, []))
+            if len(pool) >= CP_MIN_POOL:
+                qk = _emp_q(pool, TARGET_COV)
+                if qk is None or not np.isfinite(qk):
+                    qk = q_flat
+            else:
+                qk = q_flat                                       # fallback while pool is thin
+            pl_lo[k] = point[k] - qk; pl_hi[k] = point[k] + qk; pl_s[k] = max(qk / z, 1e-8)
+        lo_acc["NNS + split-CP (per-lead)"].extend(pl_lo)
+        hi_acc["NNS + split-CP (per-lead)"].extend(pl_hi)
+        sig_acc["NNS + split-CP (per-lead)"].extend(pl_s)
+
+        # ── interval method 4: Gaussian (flat) ────────────────────────────
+        s_flat = max(float(np.std(cal_resid)), 1e-8)
+        lo_acc["NNS + Gaussian (flat)"].extend(point - z * s_flat)
+        hi_acc["NNS + Gaussian (flat)"].extend(point + z * s_flat)
+        sig_acc["NNS + Gaussian (flat)"].extend(np.full(h, s_flat))
+
+        # shared truth / point
+        acc_y.extend(yb); acc_sig.extend(sgb); acc_pt.extend(point)
+
+        # update per-lead pools with realized residuals AFTER scoring (now data <= next t)
+        realized = np.abs(yb - point)
+        for k in range(h):
+            per_lead_pool[k].append(realized[k])
+
+        current_train += h
+
+    y_arr = np.asarray(acc_y); sig_arr = np.asarray(acc_sig); pt_arr = np.asarray(acc_pt)
+    rows = []
+    for name in lo_acc:
+        lo = np.asarray(lo_acc[name]); hi = np.asarray(hi_acc[name])
+        s_ = np.asarray(sig_acc[name])
+        rows.append(score_method(name, lo, hi, y_arr, sig_arr, mu_=pt_arr, s_=s_))
+
+    if verbose:
+        print(f"  [seed {seed}] {n_chunks} blocks, "
+              f"point MAE={np.mean(np.abs(pt_arr - y_arr)):.3f}, {time.time()-t0:.0f}s")
+
+    return {"scores": pd.DataFrame(rows), "lo": lo_acc, "hi": hi_acc,
+            "y": y_arr, "sigma": sig_arr, "point": pt_arr}
+
+# ── Aggregate over seeds ──────────────────────────────────────────────────────
+def run_all() -> dict:
+    all_scores = []
+    for s in range(N_SEEDS):
+        print(f"\n=== seed {s} ===")
+        all_scores.append(run_once(seed=s, verbose=True)["scores"])
+
+    scores_dt = pd.concat(all_scores, ignore_index=True)
+    metric_cols = ["marg_cov", "worst_win_cov", "cov_lowvol", "cov_hivol",
+                   "cond_cov_gap", "width", "interval_score", "CRPS", "logscore"]
+    agg = (scores_dt.groupby("method", sort=False)[metric_cols]
+           .mean().reset_index().sort_values("interval_score"))
+
+    scores_dt.to_csv("results_block/scores_all.csv", index=False)
+    agg.to_csv("results_block/scores.csv", index=False)
+
+    print(f"\n=== BLOCK NNS INTERVAL COMPARISON  "
+          f"(point=NNS fixed; mean over {N_SEEDS} seeds, alpha={ALPHA}, target={TARGET_COV}) ===\n")
+    print(agg.round(3).to_string(index=False))
+    print("\nWrote results_block/scores.csv and scores_all.csv")
+    return {"scores": scores_dt, "summary": agg}
+
+
+if __name__ == "__main__":
+    run_all()
diff --git a/gists/timeseries/point_duel/README.md b/gists/timeseries/point_duel/README.md
new file mode 100644
index 0000000..854f635
--- /dev/null
+++ b/gists/timeseries/point_duel/README.md
@@ -0,0 +1,93 @@
+# Point-model duel — NNS vs. recursive ridge, one honest protocol
+
+A fair head-to-head on **point forecasting**, with prediction intervals as a
+secondary check. Every method makes the same commitment NNS makes:
+
+> At each origin *t*, given only data through *t*, forecast an `h`-step block
+> (`h = implied_h = t·(1−0.9)/0.9`) with **no online updating**. No method may
+> peek at a realized value to predict the next one.
+
+This is the comparison the earlier conformal benchmark lacked: there the
+baselines forecast one step ahead with the *true* lag and refreshed every step,
+while NNS forecast a multi-step block blind. Here everyone forecasts the block
+blind, so the contest measures forecasting-by-design, not access to the truth.
+
+## Contenders
+
+- **NNS block** — `NNS.ARMA.optim`, model (periods/method/bias) selected on a
+  strictly historical validation tail (leak-free), native block forecast.
+- **Ridge (recursive)** — ridge on `N_LAGS` lags, fit on all data ≤ *t*, then
+  projected `h` steps **recursively** (its own predictions become the lags; no
+  true intermediate values). This is the baseline stripped of the h=1 crutch.
+- **Persistence** — last value carried forward `h` steps (floor).
+
+Intervals: each point method is wrapped with the **same** per-lead-time
+split-conformal band (calibrated on that method's own realized block residuals,
+pooled across past origins). NNS also reports its native PI.
+
+## Results
+
+Mean over 10 seeds, T = 3500, ~12 blocks/seed. Lower is better throughout.
+
+**Point forecast (the headline):**
+
+```
+           method   MAE  RMSE  median_AE
+        NNS block 1.514 2.056      1.122
+Ridge (recursive) 2.664 3.230      2.403
+      Persistence 2.493 3.241      1.991
+```
+
+**Intervals (same protocol; per-method split-CP + NNS native PI):**
+
+```
+                method  marg_cov  worst_win_cov  width  interval_score
+         NNS native PI     0.845          0.561  5.658           8.674
+        NNS block + CP     0.894          0.288  8.306          11.339
+Ridge (recursive) + CP     0.835          0.124  9.819          15.189
+      Persistence + CP     0.860          0.000 11.701          18.674
+```
+
+## Interpretation
+
+- **NNS wins the point duel decisively.** MAE 1.51 vs. recursive ridge 2.66
+  (~43% lower) and persistence 2.49; same ordering on RMSE and median absolute
+  error. The result is consistent across all 10 seeds, not seed-driven.
+- **Recursive ridge is *worse than persistence* on MAE.** Once you remove the
+  h=1 crutch (true lag every step) and force a genuine multi-step block, the
+  linear AR model's error compounds over the horizon until naive carry-forward
+  beats it. This is the concrete version of the forecaster-by-design vs.
+  forecaster-by-compulsion distinction: a structural seasonal model projects a
+  real trajectory; a reactive linear model that only ever learned the one-step
+  map degrades to noise without the truth fed back.
+- **Intervals follow point quality.** NNS's native PI is far the best Winkler
+  score (8.67) at the tightest width — because better, smaller residuals make a
+  tighter, better-centred band. The conformal wrappers on the weaker point
+  models inherit their larger errors and score worse.
+
+### Caveat on the interval table
+
+The shared per-lead-time split-conformal wrapper here has a cold-start fallback
+(it has no realized block residuals for the first origins), which depresses
+**worst-window** coverage for the `+ CP` rows (e.g. NNS block + CP at 0.288).
+Read the **point** table (MAE / RMSE / median AE) as the clean, primary result
+and the interval table as directional. The carefully-calibrated interval study
+lives separately; this branch's job is the point-model duel.
+
+## Run it
+
+```bash
+pip install ovvo-nns numpy pandas scipy scikit-learn
+python run_point_duel.py
+```
+
+Writes `results_duel/point.csv`, `interval.csv` (aggregated) and the per-seed
+`*_all.csv`.
+
+## Note
+
+Marginal coverage of every flat band falls short of the 0.90 target on this
+heteroskedastic DGP — that is the exchangeability-violation penalty, common to
+all methods, not specific to any one. The point tables (MAE / RMSE) are the
+clean comparison; the interval tables should be read as efficiency-at-similar-
+(under)coverage, not as a coverage guarantee.
diff --git a/gists/timeseries/point_duel/results_duel/interval.csv b/gists/timeseries/point_duel/results_duel/interval.csv
new file mode 100644
index 0000000..54abe05
--- /dev/null
+++ b/gists/timeseries/point_duel/results_duel/interval.csv
@@ -0,0 +1,5 @@
+method,marg_cov,worst_win_cov,width,interval_score
+NNS native PI,0.845217041800643,0.561,5.658247904936664,8.673563693193246
+NNS block + CP,0.8942926045016077,0.288,8.30644088604347,11.339456828288366
+Ridge (recursive) + CP,0.834766881028939,0.124,9.819236038266109,15.188952711334176
+Persistence + CP,0.8596061093247588,0.0,11.700845119008003,18.674312892485847
diff --git a/gists/timeseries/point_duel/results_duel/interval_all.csv b/gists/timeseries/point_duel/results_duel/interval_all.csv
new file mode 100644
index 0000000..1f87543
--- /dev/null
+++ b/gists/timeseries/point_duel/results_duel/interval_all.csv
@@ -0,0 +1,41 @@
+method,marg_cov,worst_win_cov,width,interval_score
+NNS native PI,0.8476688102893891,0.67,5.3336289551117595,8.347311881079282
+NNS block + CP,0.8906752411575563,0.22,8.177099486196127,11.529523035347692
+Ridge (recursive) + CP,0.8372186495176849,0.1,10.278876556971408,16.046670825709686
+Persistence + CP,0.8436495176848875,0.0,10.213249743026658,16.33320386075765
+NNS native PI,0.8287781350482315,0.52,5.538396478733216,9.039315486588494
+NNS block + CP,0.8902733118971061,0.37,8.390728807696368,11.499697971060435
+Ridge (recursive) + CP,0.7990353697749196,0.08,9.39395853105691,15.636645535901216
+Persistence + CP,0.8581189710610932,0.0,11.002413778926346,16.322217439495994
+NNS native PI,0.8480707395498392,0.54,5.498313881263709,8.568987367833104
+NNS block + CP,0.8870578778135049,0.32,7.850041608639307,10.881610133044617
+Ridge (recursive) + CP,0.8331993569131833,0.03,9.461311846092478,15.290577454252144
+Persistence + CP,0.8388263665594855,0.0,13.454901313710085,20.728177627283294
+NNS native PI,0.8404340836012861,0.55,5.417943415993259,8.509212166486481
+NNS block + CP,0.8971061093247589,0.18,8.178784348833949,11.442682746189892
+Ridge (recursive) + CP,0.842443729903537,0.21,9.937090055812732,15.019945691471802
+Persistence + CP,0.9091639871382636,0.0,14.085426811335523,22.670411720818084
+NNS native PI,0.8472668810289389,0.51,5.912994900178406,8.952518643444352
+NNS block + CP,0.8979099678456591,0.27,8.346058510609698,11.244196033911283
+Ridge (recursive) + CP,0.8336012861736335,0.13,9.626844810715227,14.755633310870019
+Persistence + CP,0.9011254019292605,0.0,11.723029193525093,15.965745582710372
+NNS native PI,0.860128617363344,0.58,5.662363541166043,8.391587116904702
+NNS block + CP,0.8922829581993569,0.25,8.141163390003173,11.244453556266834
+Ridge (recursive) + CP,0.8420418006430869,0.25,9.708049657419254,14.244410697181051
+Persistence + CP,0.8942926045016077,0.0,15.174974529034394,25.340887411818077
+NNS native PI,0.8432475884244373,0.62,5.958905251551655,8.884737288789054
+NNS block + CP,0.9115755627009646,0.41,9.046462585915824,11.49001377083638
+Ridge (recursive) + CP,0.8512861736334405,0.13,10.192315755071641,15.29971782370236
+Persistence + CP,0.8110932475884244,0.0,11.285499121470306,19.764628557431646
+NNS native PI,0.8404340836012861,0.42,5.533785215198311,8.749410650539787
+NNS block + CP,0.8854501607717041,0.21,7.999981741192585,11.219855763022137
+Ridge (recursive) + CP,0.8283762057877814,0.1,9.51494956048423,14.79895664009931
+Persistence + CP,0.8802250803858521,0.0,9.804629243119559,14.085680509142973
+NNS native PI,0.8484726688102894,0.58,6.028058412975543,8.799144860895987
+NNS block + CP,0.8959003215434084,0.46,8.40742772515753,11.10618124347616
+Ridge (recursive) + CP,0.8259646302250804,0.06,9.845777723130777,15.51537851501687
+Persistence + CP,0.8046623794212219,0.0,9.304325751647738,16.85406180705558
+NNS native PI,0.8476688102893891,0.62,5.69808899719474,8.493411469371221
+NNS block + CP,0.8946945337620579,0.19,8.52666065619013,11.736354029728231
+Ridge (recursive) + CP,0.8545016077170418,0.15,10.233185885906426,15.281590619137305
+Persistence + CP,0.854903536977492,0.0,10.960001704284322,18.678114408344804
diff --git a/gists/timeseries/point_duel/results_duel/point.csv b/gists/timeseries/point_duel/results_duel/point.csv
new file mode 100644
index 0000000..3cfd679
--- /dev/null
+++ b/gists/timeseries/point_duel/results_duel/point.csv
@@ -0,0 +1,4 @@
+method,MAE,RMSE,median_AE
+NNS block,1.5136011272156582,2.0556925586936883,1.12208606811395
+Ridge (recursive),2.6644732816036893,3.2304601906237487,2.4028239883431057
+Persistence,2.493290248083914,3.24141533222641,1.9910934283341892
diff --git a/gists/timeseries/point_duel/results_duel/point_all.csv b/gists/timeseries/point_duel/results_duel/point_all.csv
new file mode 100644
index 0000000..7900c80
--- /dev/null
+++ b/gists/timeseries/point_duel/results_duel/point_all.csv
@@ -0,0 +1,31 @@
+method,MAE,RMSE,median_AE
+NNS block,1.4624613757188,2.0095523514648352,1.0714033811834391
+Ridge (recursive),2.7093079489173206,3.2849148002794246,2.4276541206659417
+Persistence,2.5207425403649664,3.1838137342389228,2.0779938586077042
+NNS block,1.5179195200934121,2.0875213420368293,1.105426032360385
+Ridge (recursive),2.8282510401794556,3.3952268823991485,2.5875729210928973
+Persistence,2.3736672434422657,3.040742657230395,1.8980380282638447
+NNS block,1.4996940127463412,2.00757652089875,1.115767753513877
+Ridge (recursive),2.5709301773652182,3.1339434936155377,2.335881443966338
+Persistence,2.9117732238980736,3.794140173915152,2.234972066371844
+NNS block,1.4593534764791098,2.001246583495766,1.0864650913903005
+Ridge (recursive),2.6298954565866794,3.190132803482008,2.3755657648263804
+Persistence,2.4740376467312735,3.304377116943708,1.9953112501835901
+NNS block,1.5271989128921286,2.049606520740077,1.1602056950481388
+Ridge (recursive),2.596901417871215,3.1481544415297216,2.3307565425306667
+Persistence,2.3415372011880113,2.992295950923396,1.9604867694701609
+NNS block,1.4920674612633236,2.008429290064173,1.1172318673345731
+Ridge (recursive),2.614711558422066,3.1746112295444617,2.308934798983637
+Persistence,2.6319063703191286,3.543647928791187,2.0564798773344357
+NNS block,1.6060225425685897,2.1818865891046224,1.1965950466152333
+Ridge (recursive),2.717725103031881,3.295389524149246,2.4449283484676214
+Persistence,2.6267172703935375,3.4629309634408636,1.9490510884493548
+NNS block,1.4824820449499643,2.0065783979058245,1.1035396755182818
+Ridge (recursive),2.6981442061600074,3.249449182036675,2.5191619066463744
+Persistence,2.16538422214265,2.7236495214185332,1.867662248183055
+NNS block,1.5806024483770957,2.126087726777259,1.1775229045548874
+Ridge (recursive),2.6103148900670363,3.1982342648198507,2.298518235730201
+Persistence,2.4716622184905397,3.181652504844425,1.9899147900891014
+NNS block,1.5082094770678165,2.0784402644487474,1.086703233620384
+Ridge (recursive),2.6685510174360134,3.2345452843814164,2.3992658005209986
+Persistence,2.4154745438686924,3.1869027705175195,1.8810243063888008
diff --git a/gists/timeseries/point_duel/run_point_duel.py b/gists/timeseries/point_duel/run_point_duel.py
new file mode 100644
index 0000000..4623b56
--- /dev/null
+++ b/gists/timeseries/point_duel/run_point_duel.py
@@ -0,0 +1,316 @@
+"""
+run_point_duel.py
+================================================================================
+Point-model duel under one honest protocol.
+
+Every method makes the SAME commitment NNS makes: at each origin t, given only
+data through t, forecast an h-step block (h = implied_h = t*(1-0.9)/0.9) with NO
+online updating inside the block. No method gets to peek at a realized value to
+predict the next one. This is the apples-to-apples that the original benchmark
+lacked (where the baselines forecast h=1 with true lags and refreshed every step).
+
+Point forecasters
+-----------------
+  * NNS block        — NNS.ARMA.optim, model selected on a strictly historical
+                       validation tail (leak-free), native block forecast.
+  * Ridge (recursive)— ridge on N_LAGS lags, fit on all data <= t, then projected
+                       h steps RECURSIVELY (its own predictions become the lags;
+                       no true intermediate values). This removes the h=1 crutch.
+  * Persistence      — last observed value carried forward h steps (floor).
+
+Intervals (secondary, same protocol)
+------------------------------------
+Each point method is wrapped with the SAME per-lead-time split-conformal band,
+calibrated on that method's own realized block residuals pooled across past
+origins (flat fallback while thin). NNS additionally reports its native PI.
+
+Outputs a POINT table (MAE / RMSE, the headline) and an INTERVAL table.
+
+Run:
+    pip install ovvo-nns numpy pandas scipy scikit-learn
+    python run_point_duel.py
+"""
+from __future__ import annotations
+
+import os
+import math
+import time
+import warnings
+from collections import defaultdict
+
+import numpy as np
+import pandas as pd
+from scipy import stats
+
+warnings.filterwarnings("ignore")
+
+try:
+    import nns as NNS
+except ImportError:
+    raise SystemExit("ovvo-nns is required. Run: pip install ovvo-nns")
+
+try:
+    from sklearn.linear_model import Ridge
+    from sklearn.pipeline import make_pipeline
+    from sklearn.preprocessing import StandardScaler
+    HAS_SKLEARN = True
+except ImportError:
+    HAS_SKLEARN = False
+
+# ── Constants ────────────────────────────────────────────────────────────────
+ALPHA       = 0.10
+TARGET_COV  = 1.0 - ALPHA
+N_LAGS      = 12
+CAL_END     = 1000
+WINDOW      = 100
+N_SEEDS     = 10
+TRAINING_FRAC = 0.90
+VAL_FRAC    = 0.10
+CP_MIN_POOL = 8
+LOCAL_WIN   = 2
+
+os.makedirs("results_duel", exist_ok=True)
+_MSE = lambda predicted, actual: np.mean((predicted - actual) ** 2)
+
+# ── DGP ──────────────────────────────────────────────────────────────────────
+def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> dict:
+    rng = np.random.default_rng(seed + 1)
+    tt    = np.arange(1, T + 1, dtype=float)
+    level = 0.002 * tt + 1.50 * np.sin(2*np.pi*tt/50) + 0.75 * np.sin(2*np.pi*tt/200)
+    sigma = np.ones(T)
+    sigma[(tt > 900)  & (tt <= 1400)] = 2.5
+    sigma[(tt > 1900) & (tt <= 2450)] = 0.55
+    sigma[(tt > 2800)]                = 1.8
+    eps = rng.standard_t(5, size=T) / math.sqrt(5/3) if heavy_tail else rng.standard_normal(T)
+    y   = np.empty(T)
+    y[0] = level[0] + sigma[0] * eps[0]
+    for i in range(1, T):
+        y[i] = level[i] + 0.55 * (y[i-1] - level[i-1]) + sigma[i] * eps[i]
+    return {"y": y, "level": level, "sigma": sigma}
+
+# ── Lag features ─────────────────────────────────────────────────────────────
+def lag_features(y: np.ndarray, n_lags: int = N_LAGS):
+    n  = len(y)
+    yy = y[n_lags:]
+    X  = np.column_stack([y[n_lags - k: n - k] for k in range(1, n_lags + 1)])
+    return X, yy
+
+# ── Point forecasters (all produce an h-step block from the end of y[:t]) ────
+def forecast_ridge_recursive(y_hist: np.ndarray, h: int, n_lags: int = N_LAGS) -> np.ndarray:
+    """Fit ridge on all data <= t, then project h steps recursively."""
+    X, yy = lag_features(y_hist, n_lags)
+    if HAS_SKLEARN:
+        model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
+        model.fit(X, yy)
+        predict = lambda lags: float(model.predict(lags.reshape(1, -1))[0])
+    else:
+        Xtr = np.column_stack([X, np.ones(len(X))])
+        coef, *_ = np.linalg.lstsq(Xtr, yy, rcond=None)
+        predict = lambda lags: float(np.append(lags, 1.0) @ coef)
+    hist = list(y_hist[-n_lags:])
+    out = []
+    for _ in range(h):
+        lags = np.array(hist[-1::-1][:n_lags])   # [y_{t}, y_{t-1}, ..., y_{t-n_lags+1}]
+        yhat = predict(lags)
+        out.append(yhat)
+        hist.append(yhat)                         # feed prediction back (no true value)
+    return np.asarray(out)
+
+def forecast_persistence(y_hist: np.ndarray, h: int) -> np.ndarray:
+    return np.full(h, float(y_hist[-1]))
+
+def get_nns_seas_periods(series: np.ndarray, train_n: int) -> list[int]:
+    max_period = int(train_n / 3) - 1
+    try:
+        raw = NNS.nns_seas(series, plot=False).get("periods", np.array([]))
+        seen, valid = set(), []
+        for p in raw:
+            pi = int(p)
+            if 1 < pi < max_period and pi not in seen:
+                seen.add(pi); valid.append(pi)
+        return valid if valid else [2]
+    except Exception:
+        return [2]
+
+def forecast_nns_block(y_hist: np.ndarray, h: int):
+    """Returns (point, native_lo, native_hi, cal_resid) — leak-free block forecast."""
+    n = len(y_hist)
+    val_h = max(1, int(round(VAL_FRAC * n)))
+    periods = get_nns_seas_periods(y_hist, n)
+    fit = NNS.nns_arma_optim(
+        variable=y_hist, h=h, training_set=n - val_h, seasonal_factor=periods,
+        negative_values=True, obj_fn=_MSE, objective="min",
+        linear_approximation=True, pred_int=TARGET_COV, print_trace=False, plot=False,
+    )
+    point  = np.asarray(fit["results"], float)
+    nat_lo = np.asarray(fit["lower.pred.int"], float)
+    nat_hi = np.asarray(fit["upper.pred.int"], float)
+    cal_resid = np.asarray(fit["errors"], float) + float(fit["bias.shift"])
+    return point, nat_lo, nat_hi, cal_resid
+
+# ── Scoring helpers ──────────────────────────────────────────────────────────
+def z_alpha(alpha: float = ALPHA) -> float:
+    return float(stats.norm.ppf(1 - alpha / 2))
+
+def coverage(lo, hi, y) -> float:
+    lo, hi, y = map(np.asarray, (lo, hi, y))
+    return float(np.mean((y >= lo) & (y <= hi)))
+
+def mean_width(lo, hi) -> float:
+    return float(np.mean(np.asarray(hi) - np.asarray(lo)))
+
+def rolling_coverage(lo, hi, y, window: int = WINDOW) -> np.ndarray:
+    lo, hi, y = map(np.asarray, (lo, hi, y)); n = len(y)
+    if n < window:
+        return np.array([])
+    return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window])
+                     for i in range(n - window + 1)])
+
+def worst_window_coverage(lo, hi, y, window: int = WINDOW) -> float:
+    rc = rolling_coverage(lo, hi, y, window)
+    return float(np.min(rc)) if len(rc) > 0 else float("nan")
+
+def interval_score(lo, hi, y, alpha: float = ALPHA) -> float:
+    lo, hi, y = map(np.asarray, (lo, hi, y))
+    return float(np.mean((hi - lo) + (2/alpha)*np.maximum(lo - y, 0)
+                         + (2/alpha)*np.maximum(y - hi, 0)))
+
+def _emp_q(scores, level=TARGET_COV):
+    s = np.sort(np.abs(np.asarray(scores, float))); n = len(s)
+    if n == 0:
+        return None
+    k = int(np.ceil((n + 1) * level)) - 1
+    return float("inf") if k >= n else float(s[k])
+
+# ── One seed ──────────────────────────────────────────────────────────────────
+POINT_METHODS = ["NNS block", "Ridge (recursive)", "Persistence"]
+
+def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> dict:
+    d = make_timeseries(T=3500, seed=seed, heavy_tail=heavy_tail)
+    y, sig = d["y"], d["sigma"]
+    T_raw = len(y); z = z_alpha(ALPHA)
+
+    pt_pred  = defaultdict(list)            # point forecasts by method
+    acc_y, acc_sig = [], []
+    lo_acc = defaultdict(list); hi_acc = defaultdict(list)
+    pools  = {m: defaultdict(list) for m in POINT_METHODS}   # per-method per-lead resid pool
+    nat_lo_all, nat_hi_all = [], []        # NNS native PI
+
+    current_train = N_LAGS + CAL_END
+    t0 = time.time(); n_chunks = 0
+
+    while current_train < T_raw:
+        remaining = T_raw - current_train
+        implied_h = max(1, int(current_train * (1 - TRAINING_FRAC) / TRAINING_FRAC))
+        h = min(remaining, implied_h)
+        y_hist = y[:current_train]
+        yb  = y[current_train:current_train + h]
+        sgb = sig[current_train:current_train + h]
+
+        try:
+            nns_pt, nat_lo, nat_hi, _ = forecast_nns_block(y_hist, h)
+        except Exception as exc:
+            if verbose:
+                print(f"  [seed {seed} chunk {n_chunks+1} NNS failed: {exc}] skip")
+            current_train += h
+            continue
+
+        preds = {
+            "NNS block":         nns_pt,
+            "Ridge (recursive)": forecast_ridge_recursive(y_hist, h),
+            "Persistence":       forecast_persistence(y_hist, h),
+        }
+        n_chunks += 1
+
+        # per-lead split-CP on each method's own past realized residuals
+        for m in POINT_METHODS:
+            p = preds[m]
+            lo = np.empty(h); hi = np.empty(h)
+            for k in range(h):
+                pool = []
+                for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1):
+                    pool.extend(pools[m].get(kk, []))
+                qk = _emp_q(pool, TARGET_COV) if len(pool) >= CP_MIN_POOL else None
+                if qk is None or not np.isfinite(qk):
+                    # fallback: flat quantile of whatever realized resid we have, else inf->wide
+                    flat_pool = [v for vs in pools[m].values() for v in vs]
+                    qf = _emp_q(flat_pool, TARGET_COV) if flat_pool else None
+                    qk = qf if (qf is not None and np.isfinite(qf)) else float(np.std(p) + 1e-6)
+                lo[k] = p[k] - qk; hi[k] = p[k] + qk
+            lo_acc[f"{m} + CP"].extend(lo); hi_acc[f"{m} + CP"].extend(hi)
+
+        nat_lo_all.extend(nat_lo); nat_hi_all.extend(nat_hi)
+        for m in POINT_METHODS:
+            pt_pred[m].extend(preds[m])
+        acc_y.extend(yb); acc_sig.extend(sgb)
+
+        # update pools AFTER scoring
+        for m in POINT_METHODS:
+            r = np.abs(yb - preds[m])
+            for k in range(h):
+                pools[m][k].append(r[k])
+
+        current_train += h
+
+    y_arr = np.asarray(acc_y); sig_arr = np.asarray(acc_sig)
+
+    # POINT metrics
+    point_rows = []
+    for m in POINT_METHODS:
+        p = np.asarray(pt_pred[m])
+        err = p - y_arr
+        point_rows.append({
+            "method": m,
+            "MAE":    float(np.mean(np.abs(err))),
+            "RMSE":   float(np.sqrt(np.mean(err**2))),
+            "median_AE": float(np.median(np.abs(err))),
+        })
+
+    # INTERVAL metrics
+    int_rows = []
+    def _introw(name, lo, hi):
+        lo = np.asarray(lo); hi = np.asarray(hi)
+        return {"method": name, "marg_cov": coverage(lo, hi, y_arr),
+                "worst_win_cov": worst_window_coverage(lo, hi, y_arr),
+                "width": mean_width(lo, hi), "interval_score": interval_score(lo, hi, y_arr)}
+    int_rows.append(_introw("NNS native PI", nat_lo_all, nat_hi_all))
+    for name in lo_acc:
+        int_rows.append(_introw(name, lo_acc[name], hi_acc[name]))
+
+    if verbose:
+        mae = {m: np.mean(np.abs(np.asarray(pt_pred[m]) - y_arr)) for m in POINT_METHODS}
+        print(f"  [seed {seed}] {n_chunks} blocks  MAE: "
+              + "  ".join(f"{m}={mae[m]:.3f}" for m in POINT_METHODS)
+              + f"  ({time.time()-t0:.0f}s)")
+
+    return {"point": pd.DataFrame(point_rows), "interval": pd.DataFrame(int_rows)}
+
+# ── Aggregate ────────────────────────────────────────────────────────────────
+def run_all() -> dict:
+    pts, ints = [], []
+    for s in range(N_SEEDS):
+        print(f"\n=== seed {s} ===")
+        r = run_once(seed=s, verbose=True)
+        pts.append(r["point"]); ints.append(r["interval"])
+
+    point_dt = pd.concat(pts, ignore_index=True)
+    int_dt   = pd.concat(ints, ignore_index=True)
+    point_agg = point_dt.groupby("method", sort=False).mean().reset_index().sort_values("RMSE")
+    int_agg   = int_dt.groupby("method", sort=False).mean().reset_index().sort_values("interval_score")
+
+    point_dt.to_csv("results_duel/point_all.csv", index=False)
+    int_dt.to_csv("results_duel/interval_all.csv", index=False)
+    point_agg.to_csv("results_duel/point.csv", index=False)
+    int_agg.to_csv("results_duel/interval.csv", index=False)
+
+    print(f"\n=== POINT-MODEL DUEL  (block h-step, no online updating; "
+          f"mean over {N_SEEDS} seeds) ===\n")
+    print(point_agg.round(3).to_string(index=False))
+    print(f"\n=== INTERVALS (same protocol, per-method CP + NNS native) ===\n")
+    print(int_agg.round(3).to_string(index=False))
+    print("\nWrote results_duel/{point,interval}{,_all}.csv")
+    return {"point": point_agg, "interval": int_agg}
+
+
+if __name__ == "__main__":
+    run_all()

From db400139a150e0cc082d4d54b508ae0fe22f5199 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 23 Jun 2026 19:19:55 +0000
Subject: [PATCH 2/3] Consolidate timeseries benchmark into one routine (#57)

Merge the point-model duel and the interval study into a single walk-forward in
gists/timeseries/conformal/run_conformal.py. One leak-free NNS block forecast per
origin now feeds both analyses (previously two scripts each recomputed the
expensive NNS forecast), and the conformal calibration is unified so the per-lead
split-CP shares one well-seeded fallback across point models -- removing the
cold-start that depressed worst-window coverage in the standalone duel.

Outputs results/{point,interval}{,_all}.csv and one README covering both tables.
Removes the separate gists/timeseries/point_duel/ directory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TX6vFofX1vnanE147tJFWF
---
 gists/timeseries/conformal/README.md          | 218 ++++++-------
 .../timeseries/conformal/results/interval.csv |   7 +
 .../conformal/results/interval_all.csv        |  61 ++++
 .../results}/point.csv                        |   0
 .../results}/point_all.csv                    |   0
 .../conformal/results_block/scores.csv        |   5 -
 .../conformal/results_block/scores_all.csv    |  41 ---
 .../conformal/run_block_nns_intervals.py      | 304 ------------------
 .../run_conformal.py}                         | 232 ++++++-------
 gists/timeseries/point_duel/README.md         |  93 ------
 .../point_duel/results_duel/interval.csv      |   5 -
 .../point_duel/results_duel/interval_all.csv  |  41 ---
 12 files changed, 297 insertions(+), 710 deletions(-)
 create mode 100644 gists/timeseries/conformal/results/interval.csv
 create mode 100644 gists/timeseries/conformal/results/interval_all.csv
 rename gists/timeseries/{point_duel/results_duel => conformal/results}/point.csv (100%)
 rename gists/timeseries/{point_duel/results_duel => conformal/results}/point_all.csv (100%)
 delete mode 100644 gists/timeseries/conformal/results_block/scores.csv
 delete mode 100644 gists/timeseries/conformal/results_block/scores_all.csv
 delete mode 100644 gists/timeseries/conformal/run_block_nns_intervals.py
 rename gists/timeseries/{point_duel/run_point_duel.py => conformal/run_conformal.py} (54%)
 delete mode 100644 gists/timeseries/point_duel/README.md
 delete mode 100644 gists/timeseries/point_duel/results_duel/interval.csv
 delete mode 100644 gists/timeseries/point_duel/results_duel/interval_all.csv

diff --git a/gists/timeseries/conformal/README.md b/gists/timeseries/conformal/README.md
index 79ec107..2aabb0d 100644
--- a/gists/timeseries/conformal/README.md
+++ b/gists/timeseries/conformal/README.md
@@ -1,127 +1,127 @@
-# NNS prediction intervals vs. a conformal wrapper — apples to apples
-
-A controlled test of **prediction-interval construction** on a single,
-heteroskedastic data-generating process. The **point forecast is held fixed**:
-every interval in the table below wraps the *same* `NNS.ARMA.optim` block
-forecast and the *same* NNS residuals. The only thing that varies is how the
-band is drawn. This isolates interval construction with zero point-model and
-zero protocol confounding.
-
-## Why this framing
-
-This is deliberately **not** the usual online conformal benchmark, and we want
-to be explicit that it is an **adaptation**. Standard adaptive conformal methods
-(ACI, PID, NexCP, …) are *online, h=1* procedures: they re-anchor on the realized
-observation every step, and their coverage is a control-theoretic property of
-that feedback loop rather than a statement about any forecast trajectory. An
-online method that re-anchors each step will hit its nominal coverage rate on
-almost any sequence — it is dragged across each one-step gap rather than
-forecasting it. Comparing a structural multi-step forecaster against that on a
-1-step task flatters the reactive method and tests the wrong thing.
-
-So here we **adapt conformal into a forecast wrapper** and ask a sharper
-question instead:
-
-> Given a genuine multi-step forecast made at time *t* with **no online
-> updating**, how well does each interval method discern **coverage guarantees
-> on a heteroskedastic process** — including *conditional* coverage across the
-> volatility regimes, not just the marginal rate?
-
-## Protocol
-
-- **DGP:** non-linear, heteroskedastic AR(1) with slow trend, two seasonal
-  components (periods 50, 200), and piecewise volatility regimes
-  (σ jumps to 2.5, drops to 0.55, settles at 1.8). 10 seeds, T = 3500.
-- **Forecast (fixed for all methods):** at each origin *t*, `NNS.ARMA.optim`
-  emits a single `h`-step block forecast, with `h = implied_h =
-  t·(1−0.9)/0.9`. The model (periods/method/bias) is selected on a strictly
-  **historical** validation tail — the scored block is never shown to the
-  optimizer (the leak from issue #57 is fixed). No observation inside the block
-  is fed back: this is forecasting by design, exposed to its own compounding
-  error.
-- **Intervals compared (all on the identical NNS forecast & residuals):**
-  - **NNS native PI** — `results ± pi_width`, NNS's own nonparametric rule
-    (a single flat half-width from the residual distribution + bias).
-  - **NNS + split-CP (flat)** — empirical (1−α) conformal quantile of the same
-    NNS validation residuals. Same residuals, textbook conformal quantile.
-  - **NNS + split-CP (per-lead-time)** — a conformal quantile *per lead-time k*,
-    pooled across past blocks (with a local-window thickening and a flat
-    fallback while pools are thin). The only method whose band **widens with
-    horizon**.
-  - **NNS + Gaussian (flat)** — `z·std(residuals)`, a parametric reference.
-- **Scoring:** interval (Winkler) score, marginal coverage, **worst rolling-window
-  coverage** and **low-/high-vol stratum coverage** (the heteroskedasticity
-  discriminators), mean width, and Gaussian-proxy CRPS / log-score.
-
-## Results
+# NNS under one honest forecasting protocol
 
+A single block walk-forward over one heteroskedastic data-generating process,
+emitting two coherent analyses from the *same* leak-free NNS forecast:
+
+1. **Point duel** — does NNS actually forecast better than a fair baseline?
+2. **Interval study** — given NNS's forecast, is its native prediction interval
+   better or worse than conformalizing the same residuals?
+
+This replaces the benchmark withdrawn under issue #57, which (a) leaked each
+evaluation chunk into `NNS.ARMA.optim`'s validation tail and (b) compared methods
+across mismatched protocols (online h=1 baselines vs. an NNS multi-step block).
+
+## The protocol (one rule for everyone)
+
+> At each origin *t*, given only data through *t*, every method forecasts an
+> `h`-step block (`h = implied_h = t·(1−0.9)/0.9`) with **no online updating** —
+> no method may peek at a realized value to predict the next one.
+
+This is *forecasting by design*, exposed to its own compounding error — the
+opposite of an online method that re-anchors on the truth every step. NNS's model
+(periods/method/bias) is selected on a strictly **historical** validation tail, so
+the scored block is never shown to the optimizer (the #57 leak is fixed).
+
+- **DGP:** non-linear, heteroskedastic AR(1), slow trend, two seasonal components
+  (periods 50, 200), piecewise volatility regimes (σ → 2.5, 0.55, 1.8). 10 seeds,
+  T = 3500, ~12 blocks/seed.
+
+## Point duel
+
+- **NNS block** — `NNS.ARMA.optim` native block forecast (leak-free selection).
+- **Ridge (recursive)** — ridge on `N_LAGS` lags, fit on all data ≤ *t*, projected
+  `h` steps recursively (its own predictions become the lags; no true intermediate
+  values — this removes the h=1 true-lag crutch the original baselines enjoyed).
+- **Persistence** — last value carried forward (floor).
+
+```
+           method   MAE  RMSE  median_AE
+        NNS block 1.514 2.056      1.122
+Ridge (recursive) 2.664 3.230      2.403
+      Persistence 2.493 3.241      1.991
 ```
-=== BLOCK NNS INTERVAL COMPARISON  (point = NNS fixed; mean over 10 seeds, alpha=0.1, target cov=0.9) ===
 
-                   method  marg_cov  worst_win_cov  cov_lowvol  cov_hivol  cond_cov_gap  width  interval_score   CRPS  logscore
-            NNS native PI     0.845          0.561       0.927      0.853         0.149  5.658           8.674  1.090     2.076
-    NNS + split-CP (flat)     0.824          0.515       0.899      0.850         0.169  5.398           8.735  1.091     2.102
-    NNS + Gaussian (flat)     0.821          0.516       0.894      0.842         0.170  5.332           8.773  1.092     2.105
-NNS + split-CP (per-lead)     0.890          0.582       0.983      0.844         0.099  7.995          10.521  1.169     2.268
+## Interval study
+
+We deliberately **adapt** conformal into a forecast wrapper here, and say so. Online
+adaptive conformal (ACI/PID/NexCP) gets its coverage from a per-step feedback loop —
+a control-theoretic property, not forecast skill — so pitting it against a multi-step
+block forecaster on a 1-step task tests the wrong thing. Instead we hold the NNS point
+forecast **fixed** and vary only the band, to discern **coverage guarantees on a
+heteroskedastic process** — conditional (per-regime / worst-window) coverage, not just
+the marginal rate:
+
+- **NNS native PI** — `results ± pi_width`, NNS's own flat nonparametric rule.
+- **NNS + split-CP (flat)** — empirical (1−α) conformal quantile of NNS residuals.
+- **NNS + split-CP (per-lead)** — a quantile *per lead-time k* (the only band that
+  widens with horizon).
+- **NNS + Gaussian (flat)** — `z · std(residuals)`.
+- **Ridge / Persistence + split-CP (per-lead)** — same wrapper on the weaker point
+  models, to show interval quality follows point quality.
+
+```
+                                 method  marg_cov  worst_win_cov  cov_lowvol  cov_hivol  cond_cov_gap  width  interval_score
+                          NNS native PI     0.845          0.561       0.927      0.853         0.149  5.658           8.674
+                  NNS + split-CP (flat)     0.824          0.515       0.899      0.850         0.169  5.398           8.735
+                  NNS + Gaussian (flat)     0.821          0.516       0.894      0.842         0.170  5.332           8.773
+              NNS + split-CP (per-lead)     0.918          0.649       0.996      0.858         0.097  8.595          10.505
+Ridge (recursive) + split-CP (per-lead)     0.865          0.553       0.955      0.826         0.149 10.169          13.853
+      Persistence + split-CP (per-lead)     0.887          0.333       0.981      0.807         0.162 12.107          16.440
 ```
 
-Sorted by interval (Winkler) score, lower is better. Point-forecast MAE was
-identical across rows (~1.51, the fixed NNS block forecast); only the bands differ.
+Point table sorted by RMSE, interval table by interval (Winkler) score; lower is
+better throughout.
 
 ## Findings
 
+- **NNS wins the point duel decisively** — MAE 1.51 vs. recursive ridge 2.66
+  (~43% lower) and persistence 2.49, same ordering on RMSE/median, consistent
+  across all 10 seeds. Recursive ridge falls *below* persistence on MAE: strip the
+  h=1 true-lag crutch and a linear AR model's error compounds over the block until
+  naive carry-forward beats it. Structural forecasting by design vs. reactive by
+  compulsion, made concrete.
 - **NNS's native PI is the efficiency winner and ≈ a flat split-conformal band.**
-  It posts the best Winkler score (8.674) with the tightest width (5.66), and it
-  actually covers slightly *better* than the textbook split-conformal quantile on
-  the *same* residuals (marg 0.845 vs 0.824). NNS's `upm_var`+bias rule is a hair
-  more conservative than the empirical conformal quantile, but the two track
-  closely — confirming the native interval is, in effect, a flat split-conformal
-  interval.
-- **Every flat band under-covers this heteroskedastic process.** All three
-  flat methods land at ~0.82–0.85 marginal vs. the 0.90 target, collapse to
-  ~0.51–0.56 in the *worst* 100-step window, and carry a ~0.15–0.17 calm-vs-volatile
-  conditional gap (≈0.90–0.93 in the calm regime, ≈0.84–0.85 in the volatile one).
-  A width calibrated on historical validation residuals does not transport to a
-  multi-step block that compounds error and crosses volatility regimes.
-- **Only the horizon-adaptive conformal wrapper recovers the guarantee — and pays
-  for it.** `split-CP (per-lead)` is the lone method to reach near-nominal marginal
-  coverage (0.890) and the smallest conditional gap (0.099), because it widens with
-  lead-time and calibrates on realized block residuals. The cost is ~45% wider
-  intervals (7.995) and the worst Winkler score (10.521).
-
-## Takeaway
-
-On a heteroskedastic process, under honest multi-step forecasting with **no online
-updating**, there is no free lunch. NNS's native interval is the *efficiency*
-winner but does **not** deliver the nominal 90% coverage — least of all in the
-volatile regime and the worst windows, exactly where guarantees matter most.
-Recovering the guarantee requires a horizon-adaptive conformal wrapper and paying
-for it in width. This coverage-vs-efficiency tension — not a single "winner" — is
-what the benchmark is built to expose, and it is why we frame this as an
-*adaptation to discern coverage guarantees on a heteroskedastic process* rather
-than a claim that any one band dominates.
+  Best interval score (8.674) at the tightest competitive width, and it tracks the
+  empirical split-conformal quantile on the *same* residuals (8.735) — confirming
+  the native band is, in effect, flat split conformal, slightly more conservative.
+- **Every flat band under-covers the volatile regime** (marg 0.82–0.85, worst-window
+  ~0.51–0.56, `cov_hivol` ~0.84–0.85). Calibrating a single width on historical
+  residuals cannot transport to a multi-step block that compounds error and crosses
+  σ-regimes — a textbook exchangeability failure under heteroskedasticity.
+- **Only the horizon-adaptive per-lead wrapper recovers the guarantee** — near-nominal
+  0.918 marginal, the smallest conditional gap (0.097) and best worst-window (0.649) —
+  at ~52% wider intervals and the worst Winkler score among the NNS bands. No free
+  lunch: coverage on a heteroskedastic process costs width.
+- **Interval quality follows point quality.** The *same* per-lead wrapper on the
+  weaker point models scores far worse (NNS 10.5 ≪ ridge 13.9 ≪ persistence 16.4)
+  with much wider bands — a better forecast makes a tighter, better-centred interval.
 
 ## How to read it
 
-- **`NNS native PI` vs `NNS + split-CP (flat)`** is the headline head-to-head:
-  the *same* residuals, NNS's `upm_var`+bias rule vs. the empirical conformal
-  quantile. If they track closely, NNS's native band already *is*, in effect, a
-  flat split-conformal interval.
-- **`per-lead-time`** is the only band that adapts to horizon. Watch
-  **worst-window** and **high-vol stratum** coverage: that is where letting the
-  width grow with lead-time should pay off on a heteroskedastic process, at some
-  cost in mean width.
-- **Marginal coverage near 0.90 is necessary but cheap.** The honest test on a
-  heteroskedastic DGP is the **conditional** coverage gap between the calm and
-  volatile regimes — a flat band that hits 0.90 on average by over-covering the
-  calm regime and under-covering the volatile one has not actually solved the
-  problem.
+- **Marginal coverage near 0.90 is necessary but cheap.** On a heteroskedastic DGP
+  the honest test is the **conditional** gap between calm (`cov_lowvol`) and volatile
+  (`cov_hivol`) regimes and the **worst rolling window** — a flat band that hits 0.90
+  on average by over-covering calm and under-covering volatile has not solved the
+  problem. Every flat band here under-covers the volatile regime: that is the
+  exchangeability-violation penalty, common to all of them, not specific to any one.
+- **Width is only a fair tie-breaker at matched coverage.** Among methods with
+  different coverage, a narrower band may just be under-covering more. The interval
+  (Winkler) score combines the two (`width + (2/α)·exceedance`), which is why it is
+  the sort key.
 
 ## Run it
 
 ```bash
-pip install ovvo-nns numpy pandas scipy
-python run_block_nns_intervals.py
+pip install ovvo-nns numpy pandas scipy scikit-learn
+python run_conformal.py
 ```
 
-Writes `results_block/scores.csv` (aggregated) and `scores_all.csv` (per seed).
+Writes `results/point.csv`, `results/interval.csv` (aggregated) and the per-seed
+`*_all.csv`.
+
+## Not yet addressed
+
+`NNS.ARMA.optim` still treats the tail after `training_set` as its validation target
+with no guard or warning when out-of-range data is passed — the upstream gap behind
+#57. Adding that guard is left to a follow-up; until then, treat this as the
+corrected-content restoration, not a closed case.
diff --git a/gists/timeseries/conformal/results/interval.csv b/gists/timeseries/conformal/results/interval.csv
new file mode 100644
index 0000000..206495f
--- /dev/null
+++ b/gists/timeseries/conformal/results/interval.csv
@@ -0,0 +1,7 @@
+method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score
+NNS native PI,0.845217041800643,0.561,0.9271704180064309,0.852572347266881,0.14855305466237945,5.658247904936664,8.673563693193246
+NNS + split-CP (flat),0.8243569131832797,0.515,0.8993569131832798,0.84983922829582,0.1686495176848875,5.397754711147717,8.734633715776543
+NNS + Gaussian (flat),0.8206993569131832,0.516,0.8940514469453376,0.8421221864951768,0.17025723472668813,5.332100605902861,8.772635078222827
+NNS + split-CP (per-lead),0.917604501607717,0.649,0.9959807073954984,0.8577170418006432,0.09655948553054661,8.59459297384414,10.505483113927781
+Ridge (recursive) + split-CP (per-lead),0.865032154340836,0.553,0.9546623794212218,0.8263665594855306,0.14855305466237945,10.16897349633998,13.853289701795891
+Persistence + split-CP (per-lead),0.88725884244373,0.333,0.9810289389067524,0.807395498392283,0.1617684887459807,12.106951264773958,16.439867386859973
diff --git a/gists/timeseries/conformal/results/interval_all.csv b/gists/timeseries/conformal/results/interval_all.csv
new file mode 100644
index 0000000..cb5ba21
--- /dev/null
+++ b/gists/timeseries/conformal/results/interval_all.csv
@@ -0,0 +1,61 @@
+method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score
+NNS native PI,0.8476688102893891,0.67,0.9131832797427653,0.8070739549839229,0.09453376205787789,5.3336289551117595,8.347311881079282
+NNS + split-CP (flat),0.8271704180064309,0.58,0.8954983922829582,0.8311897106109325,0.14919614147909965,5.227161385540743,8.534113534345261
+NNS + Gaussian (flat),0.8295819935691319,0.58,0.8858520900321544,0.8279742765273312,0.14276527331189715,5.220493265073913,8.535512774975702
+NNS + split-CP (per-lead),0.9147909967845659,0.65,0.9967845659163987,0.8183279742765274,0.09678456591639872,8.425865285679825,10.568242001830663
+Ridge (recursive) + split-CP (per-lead),0.8661575562700965,0.64,0.9598070739549839,0.7909967845659164,0.10900321543408364,10.581823596303991,14.592993775300465
+Persistence + split-CP (per-lead),0.8721864951768489,0.44,0.9710610932475884,0.707395498392283,0.19260450160771703,10.549102615950632,14.52448029198288
+NNS native PI,0.8287781350482315,0.52,0.9163987138263665,0.8617363344051447,0.2022508038585209,5.538396478733216,9.039315486588494
+NNS + split-CP (flat),0.815112540192926,0.51,0.9067524115755627,0.8440514469453376,0.23118971061093252,5.423595785308306,9.16356799285496
+NNS + Gaussian (flat),0.8163183279742765,0.52,0.9163987138263665,0.8408360128617364,0.23118971061093252,5.371605324250339,9.132929020698203
+NNS + split-CP (per-lead),0.9111736334405145,0.6,0.9967845659163987,0.8456591639871383,0.09678456591639872,8.679187273640938,10.817084198706139
+Ridge (recursive) + split-CP (per-lead),0.8356109324758842,0.55,0.932475884244373,0.7909967845659164,0.1909967845659164,9.768945508312589,14.299961658445255
+Persistence + split-CP (per-lead),0.8934887459807074,0.52,0.9967845659163987,0.7508038585209004,0.14919614147909965,11.420957332270556,14.557742149799997
+NNS native PI,0.8480707395498392,0.54,0.9453376205787781,0.8713826366559485,0.14115755627009652,5.498313881263709,8.568987367833104
+NNS + split-CP (flat),0.8348070739549839,0.47,0.9115755627009646,0.8858520900321544,0.16366559485530552,5.475440096762711,8.646211618108355
+NNS + Gaussian (flat),0.8340032154340836,0.47,0.905144694533762,0.8681672025723473,0.17009646302250803,5.3825939727945,8.623802541408669
+NNS + split-CP (per-lead),0.9127813504823151,0.6,0.9871382636655949,0.8810289389067524,0.09292604501607715,8.192814278346807,10.105807673835704
+Ridge (recursive) + split-CP (per-lead),0.8701768488745981,0.6,0.9517684887459807,0.8520900321543409,0.16045016077170415,9.911929457282502,13.517878973369866
+Persistence + split-CP (per-lead),0.8782154340836013,0.24,0.9919614147909968,0.7893890675241158,0.15562700964630227,13.918516762035877,18.90446027710822
+NNS native PI,0.8404340836012861,0.55,0.9469453376205788,0.8762057877813505,0.17009646302250803,5.417943415993259,8.509212166486481
+NNS + split-CP (flat),0.8183279742765274,0.51,0.8858520900321544,0.8569131832797428,0.15241157556270102,5.073486836478882,8.35866609307891
+NNS + Gaussian (flat),0.8207395498392283,0.54,0.8697749196141479,0.8569131832797428,0.13151125401929264,5.187680907668864,8.312439020985863
+NNS + split-CP (per-lead),0.9232315112540193,0.65,0.9935691318327974,0.8665594855305466,0.09356913183279736,8.47780821511154,10.420173113316652
+Ridge (recursive) + split-CP (per-lead),0.8629421221864951,0.56,0.954983922829582,0.8054662379421221,0.14115755627009652,10.224738158689307,13.735472897847638
+Persistence + split-CP (per-lead),0.9143890675241158,0.11,1.0,0.8118971061093248,0.09999999999999998,14.479797580928631,19.260873784808723
+NNS native PI,0.8472668810289389,0.51,0.9405144694533762,0.7733118971061094,0.12668810289389065,5.912994900178406,8.952518643444352
+NNS + split-CP (flat),0.8243569131832797,0.58,0.9067524115755627,0.7765273311897106,0.15401929260450165,5.307560736619756,8.610536381423858
+NNS + Gaussian (flat),0.8171221864951769,0.57,0.887459807073955,0.7733118971061094,0.1508038585209004,5.2117633612580585,8.673499632507543
+NNS + split-CP (per-lead),0.9212218649517685,0.67,0.9903536977491961,0.8327974276527331,0.0903536977491961,8.612047167435271,10.189042171041397
+Ridge (recursive) + split-CP (per-lead),0.8621382636655949,0.48,0.9421221864951769,0.8183279742765274,0.15401929260450165,9.940825035097003,13.418993354272594
+Persistence + split-CP (per-lead),0.9308681672025724,0.6,0.9453376205787781,0.8569131832797428,0.07588424437299035,12.099187370057201,13.957050019151211
+NNS native PI,0.860128617363344,0.58,0.9501607717041801,0.8906752411575563,0.15562700964630227,5.662363541166043,8.391587116904702
+NNS + split-CP (flat),0.8243569131832797,0.37,0.887459807073955,0.8745980707395499,0.14919614147909965,5.3533136605690315,8.60722523849902
+NNS + Gaussian (flat),0.8247588424437299,0.37,0.9019292604501608,0.8729903536977492,0.15241157556270102,5.31108027242601,8.649115111814341
+NNS + split-CP (per-lead),0.917604501607717,0.62,0.9967845659163987,0.8971061093247589,0.09678456591639872,8.460963622012667,10.275595768590357
+Ridge (recursive) + split-CP (per-lead),0.8677652733118971,0.5,0.9308681672025724,0.8504823151125402,0.14437299035369777,10.008320672419277,13.318984886956887
+Persistence + split-CP (per-lead),0.8995176848874598,0.09,0.9967845659163987,0.7877813504823151,0.1122186495176849,15.609723603863243,21.539983068871113
+NNS native PI,0.8432475884244373,0.62,0.9019292604501608,0.8408360128617364,0.10900321543408364,5.958905251551655,8.884737288789054
+NNS + split-CP (flat),0.802652733118971,0.55,0.9019292604501608,0.8344051446945338,0.18778135048231515,5.514395594522053,9.282531631715036
+NNS + Gaussian (flat),0.7930064308681672,0.54,0.887459807073955,0.8215434083601286,0.1958199356913184,5.443032862596285,9.404878484224922
+NNS + split-CP (per-lead),0.9228295819935691,0.68,1.0,0.8440514469453376,0.09999999999999998,9.20007850046642,10.997501962538344
+Ridge (recursive) + split-CP (per-lead),0.8782154340836013,0.59,0.9839228295819936,0.8456591639871383,0.14758842443729903,10.477958983260796,14.128418910843173
+Persistence + split-CP (per-lead),0.8380225080385852,0.24,0.9823151125401929,0.7379421221864951,0.24565916398713827,11.622172326163268,17.7186519202099
+NNS native PI,0.8404340836012861,0.42,0.9501607717041801,0.8665594855305466,0.19742765273311902,5.533785215198311,8.749410650539787
+NNS + split-CP (flat),0.8336012861736335,0.47,0.9212218649517685,0.8488745980707395,0.16045016077170415,5.394905745099468,8.708916464271185
+NNS + Gaussian (flat),0.8211414790996785,0.46,0.9115755627009646,0.8408360128617364,0.18135048231511253,5.203825372900099,8.80921517840864
+NNS + split-CP (per-lead),0.9143890675241158,0.65,1.0,0.864951768488746,0.09999999999999998,8.344194523078558,10.216779329031802
+Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.5,0.9630225080385852,0.837620578778135,0.17813504823151127,9.899806431265855,13.626610825171564
+Persistence + split-CP (per-lead),0.9184083601286174,0.72,0.9758842443729904,0.8585209003215434,0.07588424437299035,10.229330031021911,12.202300632363936
+NNS native PI,0.8484726688102894,0.58,0.905144694533762,0.8778135048231511,0.13151125401929264,6.028058412975543,8.799144860895987
+NNS + split-CP (flat),0.8311897106109325,0.52,0.887459807073955,0.8697749196141479,0.1620578778135049,5.725218656291451,8.758728314500287
+NNS + Gaussian (flat),0.8255627009646302,0.52,0.8858520900321544,0.8569131832797428,0.1620578778135049,5.649387814291777,8.818937346230603
+NNS + split-CP (per-lead),0.9172025723472669,0.66,0.9983922829581994,0.8520900321543409,0.09839228295819935,8.733533232757578,10.717532145849189
+Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.51,0.9565916398713826,0.8279742765273312,0.14276527331189715,10.30781624713549,14.048835006470576
+Persistence + split-CP (per-lead),0.8464630225080386,0.2,0.9565916398713826,0.8842443729903537,0.28906752411575565,9.805366045553008,15.29539421577076
+NNS native PI,0.8476688102893891,0.62,0.9019292604501608,0.860128617363344,0.1572347266881029,5.69808899719474,8.493411469371221
+NNS + split-CP (flat),0.8319935691318328,0.59,0.8890675241157556,0.8762057877813505,0.17652733118971065,5.482468614284766,8.675839888968568
+NNS + Gaussian (flat),0.8247588424437299,0.59,0.8890675241157556,0.8617363344051447,0.1845659163987139,5.339542905768758,8.766021670973783
+NNS + split-CP (per-lead),0.9208199356913184,0.71,1.0,0.8745980707395499,0.09999999999999998,8.819437639911786,10.747072774537559
+Ridge (recursive) + split-CP (per-lead),0.8806270096463023,0.6,0.9710610932475884,0.8440514469453376,0.11704180064308689,10.567570873632986,13.844746729280885
+Persistence + split-CP (per-lead),0.8810289389067524,0.17,0.9935691318327974,0.8890675241157556,0.22154340836012865,11.335358979895252,16.437737508532983
diff --git a/gists/timeseries/point_duel/results_duel/point.csv b/gists/timeseries/conformal/results/point.csv
similarity index 100%
rename from gists/timeseries/point_duel/results_duel/point.csv
rename to gists/timeseries/conformal/results/point.csv
diff --git a/gists/timeseries/point_duel/results_duel/point_all.csv b/gists/timeseries/conformal/results/point_all.csv
similarity index 100%
rename from gists/timeseries/point_duel/results_duel/point_all.csv
rename to gists/timeseries/conformal/results/point_all.csv
diff --git a/gists/timeseries/conformal/results_block/scores.csv b/gists/timeseries/conformal/results_block/scores.csv
deleted file mode 100644
index 8252db9..0000000
--- a/gists/timeseries/conformal/results_block/scores.csv
+++ /dev/null
@@ -1,5 +0,0 @@
-method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score,CRPS,logscore
-NNS native PI,0.845217041800643,0.561,0.9271704180064309,0.852572347266881,0.14855305466237945,5.658247904936664,8.673563693193246,1.0902136839659236,2.076059884663743
-NNS + split-CP (flat),0.8243569131832797,0.515,0.8993569131832798,0.84983922829582,0.1686495176848875,5.397754711147717,8.734633715776543,1.090996705962063,2.101657125801346
-NNS + Gaussian (flat),0.8206993569131832,0.516,0.8940514469453376,0.8421221864951768,0.17025723472668813,5.332100605902861,8.772635078222827,1.091613092749769,2.104833304629708
-NNS + split-CP (per-lead),0.890112540192926,0.5820000000000001,0.9829581993569132,0.8437299035369776,0.09909967845659165,7.995348220035973,10.520733085082481,1.1692625763837228,2.267904409014462
diff --git a/gists/timeseries/conformal/results_block/scores_all.csv b/gists/timeseries/conformal/results_block/scores_all.csv
deleted file mode 100644
index 0e62ba6..0000000
--- a/gists/timeseries/conformal/results_block/scores_all.csv
+++ /dev/null
@@ -1,41 +0,0 @@
-method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score,CRPS,logscore
-NNS native PI,0.8476688102893891,0.67,0.9131832797427653,0.8070739549839229,0.09453376205787789,5.3336289551117595,8.347311881079282,1.0521806181725764,1.9988765275223412
-NNS + split-CP (flat),0.8271704180064309,0.58,0.8954983922829582,0.8311897106109325,0.14919614147909965,5.227161385540743,8.534113534345261,1.0564514057462153,2.047717813637678
-NNS + split-CP (per-lead),0.8886655948553055,0.65,0.9839228295819936,0.8086816720257235,0.09131832797427653,7.846909211270708,10.496079109394278,1.140506122590597,2.250303555178376
-NNS + Gaussian (flat),0.8295819935691319,0.58,0.8858520900321544,0.8279742765273312,0.14276527331189715,5.220493265073913,8.535512774975702,1.0556550754409413,2.0458238791253365
-NNS native PI,0.8287781350482315,0.52,0.9163987138263665,0.8617363344051447,0.2022508038585209,5.538396478733216,9.039315486588494,1.1017051722008695,2.123941052068062
-NNS + split-CP (flat),0.815112540192926,0.51,0.9067524115755627,0.8440514469453376,0.23118971061093252,5.423595785308306,9.16356799285496,1.1045221353056585,2.145312293713436
-NNS + split-CP (per-lead),0.8830385852090032,0.6,0.9887459807073955,0.8295819935691319,0.12025723472668814,7.990812843044477,10.886441408223009,1.185924631575878,2.292922429135039
-NNS + Gaussian (flat),0.8163183279742765,0.52,0.9163987138263665,0.8408360128617364,0.23118971061093252,5.371605324250339,9.132929020698203,1.1034425720507015,2.1344452670414986
-NNS native PI,0.8480707395498392,0.54,0.9453376205787781,0.8713826366559485,0.14115755627009652,5.498313881263709,8.568987367833104,1.0769091090418628,2.088025725407269
-NNS + split-CP (flat),0.8348070739549839,0.47,0.9115755627009646,0.8858520900321544,0.16366559485530552,5.475440096762711,8.646211618108355,1.0779606714369134,2.1374529629188985
-NNS + split-CP (per-lead),0.889871382636656,0.49,0.9758842443729904,0.8697749196141479,0.11061093247588427,7.666059526775997,10.240968683617762,1.1422851084449035,2.299705192621768
-NNS + Gaussian (flat),0.8340032154340836,0.47,0.905144694533762,0.8681672025723473,0.17009646302250803,5.3825939727945,8.623802541408669,1.0773609694299997,2.127147377666515
-NNS native PI,0.8404340836012861,0.55,0.9469453376205788,0.8762057877813505,0.17009646302250803,5.417943415993259,8.509212166486481,1.0513799372860273,2.0645442083490733
-NNS + split-CP (flat),0.8183279742765274,0.51,0.8858520900321544,0.8569131832797428,0.15241157556270102,5.073486836478882,8.35866609307891,1.0468951694182635,2.0402247395377504
-NNS + split-CP (per-lead),0.8971061093247589,0.59,0.9790996784565916,0.8553054662379421,0.08006430868167203,7.868177918914598,10.290785210091181,1.1317381317261397,2.2247260687632786
-NNS + Gaussian (flat),0.8207395498392283,0.54,0.8697749196141479,0.8569131832797428,0.13151125401929264,5.187680907668864,8.312439020985863,1.0459231999241414,2.026847378341962
-NNS native PI,0.8472668810289389,0.51,0.9405144694533762,0.7733118971061094,0.12668810289389065,5.912994900178406,8.952518643444352,1.1094460776961108,2.105079957413961
-NNS + split-CP (flat),0.8243569131832797,0.58,0.9067524115755627,0.7765273311897106,0.15401929260450165,5.307560736619756,8.610536381423858,1.099087789113522,2.0849945815307396
-NNS + split-CP (per-lead),0.8866559485530546,0.6,0.9823151125401929,0.7845659163987139,0.11543408360128615,7.9404115839516605,10.336416034686959,1.1735005767309783,2.2413111688768863
-NNS + Gaussian (flat),0.8171221864951769,0.57,0.887459807073955,0.7733118971061094,0.1508038585209004,5.2117633612580585,8.673499632507543,1.0995952439049514,2.1011013227051474
-NNS native PI,0.860128617363344,0.58,0.9501607717041801,0.8906752411575563,0.15562700964630227,5.662363541166043,8.391587116904702,1.0718817368677682,2.0521000120264206
-NNS + split-CP (flat),0.8243569131832797,0.37,0.887459807073955,0.8745980707395499,0.14919614147909965,5.3533136605690315,8.60722523849902,1.0765820133621553,2.1197718297356456
-NNS + split-CP (per-lead),0.8922829581993569,0.6,0.9871382636655949,0.8778135048231511,0.09453376205787789,7.897905901117482,10.200940248313179,1.1494769211085452,2.242593936246776
-NNS + Gaussian (flat),0.8247588424437299,0.37,0.9019292604501608,0.8729903536977492,0.15241157556270102,5.31108027242601,8.649115111814341,1.0773573984920315,2.1247815699699717
-NNS native PI,0.8432475884244373,0.62,0.9019292604501608,0.8408360128617364,0.10900321543408364,5.958905251551655,8.884737288789054,1.1533883414727397,2.112495798167121
-NNS + split-CP (flat),0.802652733118971,0.55,0.9019292604501608,0.8344051446945338,0.18778135048231515,5.514395594522053,9.282531631715036,1.1632614958510543,2.169479165301043
-NNS + split-CP (per-lead),0.8882636655948553,0.62,0.9823151125401929,0.842443729903537,0.08231511254019286,8.567512061462239,11.159806161705042,1.2453577968361804,2.3305953801588606
-NNS + Gaussian (flat),0.7930064308681672,0.54,0.887459807073955,0.8215434083601286,0.1958199356913184,5.443032862596285,9.404878484224922,1.165957599219953,2.1892321453449055
-NNS native PI,0.8404340836012861,0.42,0.9501607717041801,0.8665594855305466,0.19742765273311902,5.533785215198311,8.749410650539787,1.0714713083435363,2.076368446829223
-NNS + split-CP (flat),0.8336012861736335,0.47,0.9212218649517685,0.8488745980707395,0.16045016077170415,5.394905745099468,8.708916464271185,1.069461483415126,2.071863509791649
-NNS + split-CP (per-lead),0.8858520900321544,0.49,0.9887459807073955,0.837620578778135,0.11382636655948553,7.695935282614414,10.322007696267946,1.145102489810678,2.2482944871036135
-NNS + Gaussian (flat),0.8211414790996785,0.46,0.9115755627009646,0.8408360128617364,0.18135048231511253,5.203825372900099,8.80921517840864,1.071527536672299,2.088411117524039
-NNS native PI,0.8484726688102894,0.58,0.905144694533762,0.8778135048231511,0.13151125401929264,6.028058412975543,8.799144860895987,1.1309121052212678,2.0927473976307103
-NNS + split-CP (flat),0.8311897106109325,0.52,0.887459807073955,0.8697749196141479,0.1620578778135049,5.725218656291451,8.758728314500287,1.130064761156996,2.113226263951555
-NNS + split-CP (per-lead),0.8914790996784566,0.53,0.977491961414791,0.8569131832797428,0.09292604501607715,8.23717370280603,10.704707152434425,1.212676341149698,2.3029740791509625
-NNS + Gaussian (flat),0.8255627009646302,0.52,0.8858520900321544,0.8569131832797428,0.1620578778135049,5.649387814291777,8.818937346230603,1.1313117903011027,2.117039408279818
-NNS native PI,0.8476688102893891,0.62,0.9019292604501608,0.860128617363344,0.1572347266881029,5.69808899719474,8.493411469371221,1.0828624333564756,2.046419721223251
-NNS + split-CP (flat),0.8319935691318328,0.59,0.8890675241157556,0.8762057877813505,0.17652733118971065,5.482468614284766,8.675839888968568,1.0856801348147251,2.0865280978950644
-NNS + split-CP (per-lead),0.8979099678456591,0.65,0.9839228295819936,0.8745980707395499,0.0897106109324759,8.242584168402129,10.569179146091031,1.166057643863629,2.24561779290906
-NNS + Gaussian (flat),0.8247588424437299,0.59,0.8890675241157556,0.8617363344051447,0.1845659163987139,5.339542905768758,8.766021670973783,1.0879995420615687,2.0935035802978836
diff --git a/gists/timeseries/conformal/run_block_nns_intervals.py b/gists/timeseries/conformal/run_block_nns_intervals.py
deleted file mode 100644
index 382bbf3..0000000
--- a/gists/timeseries/conformal/run_block_nns_intervals.py
+++ /dev/null
@@ -1,304 +0,0 @@
-"""
-run_block_nns_intervals.py
-================================================================================
-Apples-to-apples prediction-interval comparison, NNS point forecast held FIXED.
-
-Question answered
------------------
-At each origin t, given only data through t, NNS.ARMA.optim emits a single
-h-step block forecast (h = implied_h = current_train*(1-0.9)/0.9, NO online
-updating inside the block). We then build several prediction intervals on the
-SAME NNS point forecast and the SAME NNS residuals, and ask:
-
-    Does NNS's native prediction interval beat conformalizing NNS's own
-    residuals — and does letting the band widen with lead-time help?
-
-Because the point model is identical across every interval method, this isolates
-interval construction with zero protocol mismatch. (Contrast issue #57, where the
-NNS optimizer also saw the scored block — fixed here: the model is selected on a
-strictly historical validation tail, and the scored block is never shown to it.)
-
-Interval methods compared (all on the identical NNS block point forecast):
-  * NNS native PI            — results ± pi_width  (flat width, NNS's own rule)
-  * NNS + split-CP (flat)    — empirical 0.9 quantile of NNS validation residuals
-  * NNS + split-CP (per-lead)— quantile per lead-time k, pooled across past blocks
-                               (widens with horizon; data-starved -> fallback early)
-  * NNS + Gaussian (flat)    — z * std(NNS validation residuals)
-
-Run:
-    pip install ovvo-nns numpy pandas scipy matplotlib
-    python run_block_nns_intervals.py
-"""
-from __future__ import annotations
-
-import os
-import math
-import time
-import warnings
-from collections import defaultdict
-
-import numpy as np
-import pandas as pd
-from scipy import stats
-
-warnings.filterwarnings("ignore")
-
-try:
-    import nns as NNS
-except ImportError:
-    raise SystemExit("ovvo-nns is required. Run: pip install ovvo-nns")
-
-# ── Constants ────────────────────────────────────────────────────────────────
-ALPHA       = 0.10
-TARGET_COV  = 1.0 - ALPHA
-N_LAGS      = 12
-CAL_END     = 1000          # first NNS origin = N_LAGS + CAL_END
-WINDOW      = 100
-N_SEEDS     = 10
-TRAINING_FRAC = 0.90        # implied_h = current_train*(1-frac)/frac
-VAL_FRAC    = 0.10          # historical validation-tail fraction for model selection
-CP_MIN_POOL = 8             # min per-lead samples before trusting per-lead CP
-LOCAL_WIN   = 2             # +/- lead-time window to thicken per-lead pools
-
-os.makedirs("results_block", exist_ok=True)
-
-_MSE = lambda predicted, actual: np.mean((predicted - actual) ** 2)
-
-# ── DGP ──────────────────────────────────────────────────────────────────────
-def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> dict:
-    rng = np.random.default_rng(seed + 1)
-    tt    = np.arange(1, T + 1, dtype=float)
-    level = 0.002 * tt + 1.50 * np.sin(2*np.pi*tt/50) + 0.75 * np.sin(2*np.pi*tt/200)
-    sigma = np.ones(T)
-    sigma[(tt > 900)  & (tt <= 1400)] = 2.5
-    sigma[(tt > 1900) & (tt <= 2450)] = 0.55
-    sigma[(tt > 2800)]                = 1.8
-    eps = rng.standard_t(5, size=T) / math.sqrt(5/3) if heavy_tail else rng.standard_normal(T)
-    y   = np.empty(T)
-    y[0] = level[0] + sigma[0] * eps[0]
-    for i in range(1, T):
-        y[i] = level[i] + 0.55 * (y[i-1] - level[i-1]) + sigma[i] * eps[i]
-    return {"y": y, "level": level, "sigma": sigma}
-
-# ── Scoring helpers ──────────────────────────────────────────────────────────
-def z_alpha(alpha: float = ALPHA) -> float:
-    return float(stats.norm.ppf(1 - alpha / 2))
-
-def coverage(lo, hi, y) -> float:
-    lo, hi, y = map(np.asarray, (lo, hi, y))
-    return float(np.mean((y >= lo) & (y <= hi)))
-
-def mean_width(lo, hi) -> float:
-    return float(np.mean(np.asarray(hi) - np.asarray(lo)))
-
-def rolling_coverage(lo, hi, y, window: int = WINDOW) -> np.ndarray:
-    lo, hi, y = map(np.asarray, (lo, hi, y))
-    n = len(y)
-    if n < window:
-        return np.array([])
-    return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window])
-                     for i in range(n - window + 1)])
-
-def worst_window_coverage(lo, hi, y, window: int = WINDOW) -> float:
-    rc = rolling_coverage(lo, hi, y, window)
-    return float(np.min(rc)) if len(rc) > 0 else float("nan")
-
-def interval_score(lo, hi, y, alpha: float = ALPHA) -> float:
-    lo, hi, y = map(np.asarray, (lo, hi, y))
-    return float(np.mean((hi - lo)
-                         + (2/alpha) * np.maximum(lo - y, 0)
-                         + (2/alpha) * np.maximum(y - hi, 0)))
-
-def coverage_by_stratum(lo, hi, y, sigma, k: int = 4) -> list[float]:
-    lo, hi, y, sigma = map(np.asarray, (lo, hi, y, sigma))
-    ranks = stats.rankdata(sigma, method="ordinal")
-    grp   = np.ceil(ranks / len(ranks) * k).astype(int).clip(1, k)
-    return [coverage(lo[grp == j], hi[grp == j], y[grp == j]) for j in range(1, k+1)]
-
-def crps_gaussian(mu, sigma, y) -> float:
-    sigma = np.maximum(np.asarray(sigma, float), 1e-8)
-    z = (np.asarray(y) - np.asarray(mu)) / sigma
-    return float(np.mean(sigma * (z * (2*stats.norm.cdf(z) - 1)
-                                  + 2*stats.norm.pdf(z) - 1/math.sqrt(math.pi))))
-
-def log_score_gaussian(mu, sigma, y) -> float:
-    sigma = np.maximum(np.asarray(sigma, float), 1e-8)
-    return float(np.mean(-stats.norm.logpdf(np.asarray(y), loc=np.asarray(mu), scale=sigma)))
-
-def score_method(name, lo, hi, y, sigma, mu_=None, s_=None) -> dict:
-    lo_v = np.minimum(np.asarray(lo, float), np.asarray(hi, float))
-    hi_v = np.maximum(np.asarray(lo, float), np.asarray(hi, float))
-    cbs  = coverage_by_stratum(lo_v, hi_v, y, sigma)
-    row = {
-        "method":         name,
-        "marg_cov":       coverage(lo_v, hi_v, y),
-        "worst_win_cov":  worst_window_coverage(lo_v, hi_v, y),
-        "cov_lowvol":     cbs[0],
-        "cov_hivol":      cbs[-1],
-        "cond_cov_gap":   max(abs(c - TARGET_COV) for c in cbs if not math.isnan(c)),
-        "width":          mean_width(lo_v, hi_v),
-        "interval_score": interval_score(lo_v, hi_v, y),
-        "CRPS":           float("nan"),
-        "logscore":       float("nan"),
-    }
-    if mu_ is not None and s_ is not None:
-        row["CRPS"]     = crps_gaussian(mu_, s_, y)
-        row["logscore"] = log_score_gaussian(mu_, s_, y)
-    return row
-
-# ── NNS seasonal period discovery (full-window) ──────────────────────────────
-def get_nns_seas_periods(series: np.ndarray, train_n: int) -> list[int]:
-    max_period = int(train_n / 3) - 1
-    try:
-        raw = NNS.nns_seas(series, plot=False).get("periods", np.array([]))
-        seen, valid = set(), []
-        for p in raw:
-            pi = int(p)
-            if 1 < pi < max_period and pi not in seen:
-                seen.add(pi); valid.append(pi)
-        return valid if valid else [2]
-    except Exception:
-        return [2]
-
-def _emp_q(scores, level=TARGET_COV):
-    """Finite-sample conformal quantile of |scores| at coverage `level`."""
-    s = np.sort(np.abs(np.asarray(scores, float)))
-    n = len(s)
-    if n == 0:
-        return None
-    k = int(np.ceil((n + 1) * level)) - 1
-    return float("inf") if k >= n else float(s[k])
-
-# ── Block walk-forward: NNS point fixed, intervals swapped ────────────────────
-def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> dict:
-    d = make_timeseries(T=3500, seed=seed, heavy_tail=heavy_tail)
-    y, sig = d["y"], d["sigma"]
-    T_raw = len(y)
-    z = z_alpha(ALPHA)
-
-    # accumulators: per interval-method lo/hi, shared point/y/sigma per scored step
-    acc_y, acc_sig, acc_pt = [], [], []
-    lo_acc = defaultdict(list); hi_acc = defaultdict(list); sig_acc = defaultdict(list)
-    per_lead_pool: dict[int, list] = defaultdict(list)     # realized |resid| by lead-time
-
-    current_train = N_LAGS + CAL_END
-    t0 = time.time(); n_chunks = 0
-
-    while current_train < T_raw:
-        remaining = T_raw - current_train
-        implied_h = max(1, int(current_train * (1 - TRAINING_FRAC) / TRAINING_FRAC))
-        h = min(remaining, implied_h)
-        train_slice = y[:current_train]
-        val_h = max(1, int(round(VAL_FRAC * current_train)))
-        periods = get_nns_seas_periods(train_slice, current_train)
-
-        try:
-            fit = NNS.nns_arma_optim(
-                variable=train_slice, h=h, training_set=current_train - val_h,
-                seasonal_factor=periods, negative_values=True,
-                obj_fn=_MSE, objective="min", linear_approximation=True,
-                pred_int=TARGET_COV, print_trace=False, plot=False,
-            )
-        except Exception as exc:
-            if verbose:
-                print(f"  [seed {seed} chunk {n_chunks+1} failed: {exc}] – skipping")
-            current_train += h
-            continue
-
-        point     = np.asarray(fit["results"], float)            # bias-corrected block point
-        nat_lo    = np.asarray(fit["lower.pred.int"], float)
-        nat_hi    = np.asarray(fit["upper.pred.int"], float)
-        val_err   = np.asarray(fit["errors"], float)             # validation-tail residuals
-        bias_shift = float(fit["bias.shift"])
-        # residuals of the *reported* (bias-corrected) forecast on the validation tail
-        cal_resid = val_err + bias_shift
-
-        yb  = y[current_train:current_train + h]
-        sgb = sig[current_train:current_train + h]
-        n_chunks += 1
-
-        # ── interval method 1: NNS native PI ──────────────────────────────
-        lo_acc["NNS native PI"].extend(nat_lo)
-        hi_acc["NNS native PI"].extend(nat_hi)
-        nat_hw = np.maximum((nat_hi - nat_lo) / 2.0, 1e-8)
-        sig_acc["NNS native PI"].extend(nat_hw / z)
-
-        # ── interval method 2: split-CP (flat) on NNS validation residuals ─
-        q_flat = _emp_q(cal_resid, TARGET_COV)
-        q_flat = nat_hw.mean() if (q_flat is None or not np.isfinite(q_flat)) else q_flat
-        lo_acc["NNS + split-CP (flat)"].extend(point - q_flat)
-        hi_acc["NNS + split-CP (flat)"].extend(point + q_flat)
-        sig_acc["NNS + split-CP (flat)"].extend(np.full(h, max(q_flat / z, 1e-8)))
-
-        # ── interval method 3: split-CP (per-lead-time) ───────────────────
-        pl_lo = np.empty(h); pl_hi = np.empty(h); pl_s = np.empty(h)
-        for k in range(h):
-            pool = []
-            for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1):
-                pool.extend(per_lead_pool.get(kk, []))
-            if len(pool) >= CP_MIN_POOL:
-                qk = _emp_q(pool, TARGET_COV)
-                if qk is None or not np.isfinite(qk):
-                    qk = q_flat
-            else:
-                qk = q_flat                                       # fallback while pool is thin
-            pl_lo[k] = point[k] - qk; pl_hi[k] = point[k] + qk; pl_s[k] = max(qk / z, 1e-8)
-        lo_acc["NNS + split-CP (per-lead)"].extend(pl_lo)
-        hi_acc["NNS + split-CP (per-lead)"].extend(pl_hi)
-        sig_acc["NNS + split-CP (per-lead)"].extend(pl_s)
-
-        # ── interval method 4: Gaussian (flat) ────────────────────────────
-        s_flat = max(float(np.std(cal_resid)), 1e-8)
-        lo_acc["NNS + Gaussian (flat)"].extend(point - z * s_flat)
-        hi_acc["NNS + Gaussian (flat)"].extend(point + z * s_flat)
-        sig_acc["NNS + Gaussian (flat)"].extend(np.full(h, s_flat))
-
-        # shared truth / point
-        acc_y.extend(yb); acc_sig.extend(sgb); acc_pt.extend(point)
-
-        # update per-lead pools with realized residuals AFTER scoring (now data <= next t)
-        realized = np.abs(yb - point)
-        for k in range(h):
-            per_lead_pool[k].append(realized[k])
-
-        current_train += h
-
-    y_arr = np.asarray(acc_y); sig_arr = np.asarray(acc_sig); pt_arr = np.asarray(acc_pt)
-    rows = []
-    for name in lo_acc:
-        lo = np.asarray(lo_acc[name]); hi = np.asarray(hi_acc[name])
-        s_ = np.asarray(sig_acc[name])
-        rows.append(score_method(name, lo, hi, y_arr, sig_arr, mu_=pt_arr, s_=s_))
-
-    if verbose:
-        print(f"  [seed {seed}] {n_chunks} blocks, "
-              f"point MAE={np.mean(np.abs(pt_arr - y_arr)):.3f}, {time.time()-t0:.0f}s")
-
-    return {"scores": pd.DataFrame(rows), "lo": lo_acc, "hi": hi_acc,
-            "y": y_arr, "sigma": sig_arr, "point": pt_arr}
-
-# ── Aggregate over seeds ──────────────────────────────────────────────────────
-def run_all() -> dict:
-    all_scores = []
-    for s in range(N_SEEDS):
-        print(f"\n=== seed {s} ===")
-        all_scores.append(run_once(seed=s, verbose=True)["scores"])
-
-    scores_dt = pd.concat(all_scores, ignore_index=True)
-    metric_cols = ["marg_cov", "worst_win_cov", "cov_lowvol", "cov_hivol",
-                   "cond_cov_gap", "width", "interval_score", "CRPS", "logscore"]
-    agg = (scores_dt.groupby("method", sort=False)[metric_cols]
-           .mean().reset_index().sort_values("interval_score"))
-
-    scores_dt.to_csv("results_block/scores_all.csv", index=False)
-    agg.to_csv("results_block/scores.csv", index=False)
-
-    print(f"\n=== BLOCK NNS INTERVAL COMPARISON  "
-          f"(point=NNS fixed; mean over {N_SEEDS} seeds, alpha={ALPHA}, target={TARGET_COV}) ===\n")
-    print(agg.round(3).to_string(index=False))
-    print("\nWrote results_block/scores.csv and scores_all.csv")
-    return {"scores": scores_dt, "summary": agg}
-
-
-if __name__ == "__main__":
-    run_all()
diff --git a/gists/timeseries/point_duel/run_point_duel.py b/gists/timeseries/conformal/run_conformal.py
similarity index 54%
rename from gists/timeseries/point_duel/run_point_duel.py
rename to gists/timeseries/conformal/run_conformal.py
index 4623b56..9245bc0 100644
--- a/gists/timeseries/point_duel/run_point_duel.py
+++ b/gists/timeseries/conformal/run_conformal.py
@@ -1,34 +1,39 @@
 """
-run_point_duel.py
+run_conformal.py
 ================================================================================
-Point-model duel under one honest protocol.
-
-Every method makes the SAME commitment NNS makes: at each origin t, given only
-data through t, forecast an h-step block (h = implied_h = t*(1-0.9)/0.9) with NO
-online updating inside the block. No method gets to peek at a realized value to
-predict the next one. This is the apples-to-apples that the original benchmark
-lacked (where the baselines forecast h=1 with true lags and refreshed every step).
-
-Point forecasters
------------------
-  * NNS block        — NNS.ARMA.optim, model selected on a strictly historical
-                       validation tail (leak-free), native block forecast.
-  * Ridge (recursive)— ridge on N_LAGS lags, fit on all data <= t, then projected
-                       h steps RECURSIVELY (its own predictions become the lags;
-                       no true intermediate values). This removes the h=1 crutch.
-  * Persistence      — last observed value carried forward h steps (floor).
-
-Intervals (secondary, same protocol)
-------------------------------------
-Each point method is wrapped with the SAME per-lead-time split-conformal band,
-calibrated on that method's own realized block residuals pooled across past
-origins (flat fallback while thin). NNS additionally reports its native PI.
-
-Outputs a POINT table (MAE / RMSE, the headline) and an INTERVAL table.
+Time-series block benchmark — NNS under one honest protocol (issue #57 redux).
+
+A single walk-forward emits BOTH analyses, from the same leak-free NNS block
+forecast (computed once per origin):
+
+  POINT DUEL — who forecasts best?
+    At each origin t, given only data through t, every method forecasts an
+    h-step block (h = implied_h = t*(1-0.9)/0.9) with NO online updating — no
+    method peeks at a realized value to predict the next one.
+      * NNS block        — NNS.ARMA.optim, model selected on a strictly
+                           historical validation tail (leak-free).
+      * Ridge (recursive)— ridge on N_LAGS lags, fit on all data <= t, projected
+                           h steps recursively (own predictions become lags).
+      * Persistence      — last value carried forward (floor).
+
+  INTERVAL STUDY — given the NNS forecast, how should the band be drawn?
+    The NNS point forecast is held FIXED and only the interval construction
+    varies, on identical residuals:
+      * NNS native PI            — results +/- pi_width (flat, NNS's own rule)
+      * NNS + split-CP (flat)    — empirical (1-a) quantile of NNS residuals
+      * NNS + split-CP (per-lead)— quantile per lead-time k (widens w/ horizon)
+      * NNS + Gaussian (flat)    — z * std(residuals)
+    Plus Ridge/Persistence + split-CP (per-lead) to show that interval quality
+    follows point quality.
+
+Framed as an adaptation to discern coverage guarantees on a heteroskedastic
+process: marginal coverage is cheap; the test is conditional (per-regime /
+worst-window) coverage and whether width adapts.
 
 Run:
     pip install ovvo-nns numpy pandas scipy scikit-learn
-    python run_point_duel.py
+    python run_conformal.py
+Writes results/{point,interval}{,_all}.csv
 """
 from __future__ import annotations
 
@@ -69,8 +74,9 @@
 CP_MIN_POOL = 8
 LOCAL_WIN   = 2
 
-os.makedirs("results_duel", exist_ok=True)
+os.makedirs("results", exist_ok=True)
 _MSE = lambda predicted, actual: np.mean((predicted - actual) ** 2)
+POINT_METHODS = ["NNS block", "Ridge (recursive)", "Persistence"]
 
 # ── DGP ──────────────────────────────────────────────────────────────────────
 def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> dict:
@@ -88,16 +94,14 @@ def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> d
         y[i] = level[i] + 0.55 * (y[i-1] - level[i-1]) + sigma[i] * eps[i]
     return {"y": y, "level": level, "sigma": sigma}
 
-# ── Lag features ─────────────────────────────────────────────────────────────
 def lag_features(y: np.ndarray, n_lags: int = N_LAGS):
     n  = len(y)
     yy = y[n_lags:]
     X  = np.column_stack([y[n_lags - k: n - k] for k in range(1, n_lags + 1)])
     return X, yy
 
-# ── Point forecasters (all produce an h-step block from the end of y[:t]) ────
+# ── Point forecasters (each returns an h-step block from end of y_hist) ──────
 def forecast_ridge_recursive(y_hist: np.ndarray, h: int, n_lags: int = N_LAGS) -> np.ndarray:
-    """Fit ridge on all data <= t, then project h steps recursively."""
     X, yy = lag_features(y_hist, n_lags)
     if HAS_SKLEARN:
         model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
@@ -107,13 +111,10 @@ def forecast_ridge_recursive(y_hist: np.ndarray, h: int, n_lags: int = N_LAGS) -
         Xtr = np.column_stack([X, np.ones(len(X))])
         coef, *_ = np.linalg.lstsq(Xtr, yy, rcond=None)
         predict = lambda lags: float(np.append(lags, 1.0) @ coef)
-    hist = list(y_hist[-n_lags:])
-    out = []
+    hist = list(y_hist[-n_lags:]); out = []
     for _ in range(h):
-        lags = np.array(hist[-1::-1][:n_lags])   # [y_{t}, y_{t-1}, ..., y_{t-n_lags+1}]
-        yhat = predict(lags)
-        out.append(yhat)
-        hist.append(yhat)                         # feed prediction back (no true value)
+        lags = np.array(hist[-1::-1][:n_lags])
+        yhat = predict(lags); out.append(yhat); hist.append(yhat)
     return np.asarray(out)
 
 def forecast_persistence(y_hist: np.ndarray, h: int) -> np.ndarray:
@@ -133,7 +134,7 @@ def get_nns_seas_periods(series: np.ndarray, train_n: int) -> list[int]:
         return [2]
 
 def forecast_nns_block(y_hist: np.ndarray, h: int):
-    """Returns (point, native_lo, native_hi, cal_resid) — leak-free block forecast."""
+    """Leak-free NNS block: returns (point, native_lo, native_hi, cal_resid)."""
     n = len(y_hist)
     val_h = max(1, int(round(VAL_FRAC * n)))
     periods = get_nns_seas_periods(y_hist, n)
@@ -148,57 +149,59 @@ def forecast_nns_block(y_hist: np.ndarray, h: int):
     cal_resid = np.asarray(fit["errors"], float) + float(fit["bias.shift"])
     return point, nat_lo, nat_hi, cal_resid
 
-# ── Scoring helpers ──────────────────────────────────────────────────────────
+# ── Metrics ──────────────────────────────────────────────────────────────────
 def z_alpha(alpha: float = ALPHA) -> float:
     return float(stats.norm.ppf(1 - alpha / 2))
-
 def coverage(lo, hi, y) -> float:
-    lo, hi, y = map(np.asarray, (lo, hi, y))
-    return float(np.mean((y >= lo) & (y <= hi)))
-
+    lo, hi, y = map(np.asarray, (lo, hi, y)); return float(np.mean((y >= lo) & (y <= hi)))
 def mean_width(lo, hi) -> float:
     return float(np.mean(np.asarray(hi) - np.asarray(lo)))
-
 def rolling_coverage(lo, hi, y, window: int = WINDOW) -> np.ndarray:
     lo, hi, y = map(np.asarray, (lo, hi, y)); n = len(y)
-    if n < window:
-        return np.array([])
-    return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window])
-                     for i in range(n - window + 1)])
-
+    if n < window: return np.array([])
+    return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window]) for i in range(n-window+1)])
 def worst_window_coverage(lo, hi, y, window: int = WINDOW) -> float:
-    rc = rolling_coverage(lo, hi, y, window)
-    return float(np.min(rc)) if len(rc) > 0 else float("nan")
-
+    rc = rolling_coverage(lo, hi, y, window); return float(np.min(rc)) if len(rc) > 0 else float("nan")
 def interval_score(lo, hi, y, alpha: float = ALPHA) -> float:
     lo, hi, y = map(np.asarray, (lo, hi, y))
-    return float(np.mean((hi - lo) + (2/alpha)*np.maximum(lo - y, 0)
-                         + (2/alpha)*np.maximum(y - hi, 0)))
-
+    return float(np.mean((hi - lo) + (2/alpha)*np.maximum(lo - y, 0) + (2/alpha)*np.maximum(y - hi, 0)))
+def coverage_by_stratum(lo, hi, y, sigma, k: int = 4) -> list[float]:
+    lo, hi, y, sigma = map(np.asarray, (lo, hi, y, sigma))
+    ranks = stats.rankdata(sigma, method="ordinal")
+    grp   = np.ceil(ranks / len(ranks) * k).astype(int).clip(1, k)
+    return [coverage(lo[grp == j], hi[grp == j], y[grp == j]) for j in range(1, k+1)]
 def _emp_q(scores, level=TARGET_COV):
     s = np.sort(np.abs(np.asarray(scores, float))); n = len(s)
-    if n == 0:
-        return None
+    if n == 0: return None
     k = int(np.ceil((n + 1) * level)) - 1
     return float("inf") if k >= n else float(s[k])
 
-# ── One seed ──────────────────────────────────────────────────────────────────
-POINT_METHODS = ["NNS block", "Ridge (recursive)", "Persistence"]
-
+# ── One seed: single walk-forward, both analyses ─────────────────────────────
 def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> dict:
     d = make_timeseries(T=3500, seed=seed, heavy_tail=heavy_tail)
     y, sig = d["y"], d["sigma"]
     T_raw = len(y); z = z_alpha(ALPHA)
 
-    pt_pred  = defaultdict(list)            # point forecasts by method
+    pt_pred = defaultdict(list)
     acc_y, acc_sig = [], []
-    lo_acc = defaultdict(list); hi_acc = defaultdict(list)
-    pools  = {m: defaultdict(list) for m in POINT_METHODS}   # per-method per-lead resid pool
-    nat_lo_all, nat_hi_all = [], []        # NNS native PI
+    lo_acc = defaultdict(list); hi_acc = defaultdict(list); sig_acc = defaultdict(list)
+    per_lead = {m: defaultdict(list) for m in POINT_METHODS}   # realized |resid| by lead
+    flat_pool = {m: [] for m in POINT_METHODS}                 # realized |resid| pooled
 
     current_train = N_LAGS + CAL_END
     t0 = time.time(); n_chunks = 0
 
+    def cp_width(m, k, q_fallback):
+        pool = []
+        for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1):
+            pool.extend(per_lead[m].get(kk, []))
+        if len(pool) >= CP_MIN_POOL:
+            q = _emp_q(pool, TARGET_COV)
+            if q is not None and np.isfinite(q):
+                return q
+        qf = _emp_q(flat_pool[m], TARGET_COV) if flat_pool[m] else None
+        return qf if (qf is not None and np.isfinite(qf)) else q_fallback
+
     while current_train < T_raw:
         remaining = T_raw - current_train
         implied_h = max(1, int(current_train * (1 - TRAINING_FRAC) / TRAINING_FRAC))
@@ -208,12 +211,10 @@ def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> d
         sgb = sig[current_train:current_train + h]
 
         try:
-            nns_pt, nat_lo, nat_hi, _ = forecast_nns_block(y_hist, h)
+            nns_pt, nat_lo, nat_hi, cal_resid = forecast_nns_block(y_hist, h)
         except Exception as exc:
-            if verbose:
-                print(f"  [seed {seed} chunk {n_chunks+1} NNS failed: {exc}] skip")
-            current_train += h
-            continue
+            if verbose: print(f"  [seed {seed} chunk {n_chunks+1} NNS failed: {exc}] skip")
+            current_train += h; continue
 
         preds = {
             "NNS block":         nns_pt,
@@ -222,66 +223,73 @@ def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> d
         }
         n_chunks += 1
 
-        # per-lead split-CP on each method's own past realized residuals
+        q_flat_nns = _emp_q(cal_resid, TARGET_COV)
+        if q_flat_nns is None or not np.isfinite(q_flat_nns):
+            q_flat_nns = float(np.std(cal_resid) + 1e-6)
+        s_flat_nns = max(float(np.std(cal_resid)), 1e-8)
+
+        # ── INTERVAL STUDY on the fixed NNS point ────────────────────────────
+        lo_acc["NNS native PI"].extend(nat_lo); hi_acc["NNS native PI"].extend(nat_hi)
+        sig_acc["NNS native PI"].extend(np.maximum((nat_hi - nat_lo)/2.0, 1e-8)/z)
+
+        lo_acc["NNS + split-CP (flat)"].extend(nns_pt - q_flat_nns)
+        hi_acc["NNS + split-CP (flat)"].extend(nns_pt + q_flat_nns)
+        sig_acc["NNS + split-CP (flat)"].extend(np.full(h, max(q_flat_nns/z, 1e-8)))
+
+        lo_acc["NNS + Gaussian (flat)"].extend(nns_pt - z*s_flat_nns)
+        hi_acc["NNS + Gaussian (flat)"].extend(nns_pt + z*s_flat_nns)
+        sig_acc["NNS + Gaussian (flat)"].extend(np.full(h, s_flat_nns))
+
+        # ── per-lead split-CP for each point method (shared machinery) ───────
         for m in POINT_METHODS:
             p = preds[m]
-            lo = np.empty(h); hi = np.empty(h)
+            name = f"{m} + split-CP (per-lead)" if m != "NNS block" else "NNS + split-CP (per-lead)"
+            lo = np.empty(h); hi = np.empty(h); ss = np.empty(h)
             for k in range(h):
-                pool = []
-                for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1):
-                    pool.extend(pools[m].get(kk, []))
-                qk = _emp_q(pool, TARGET_COV) if len(pool) >= CP_MIN_POOL else None
-                if qk is None or not np.isfinite(qk):
-                    # fallback: flat quantile of whatever realized resid we have, else inf->wide
-                    flat_pool = [v for vs in pools[m].values() for v in vs]
-                    qf = _emp_q(flat_pool, TARGET_COV) if flat_pool else None
-                    qk = qf if (qf is not None and np.isfinite(qf)) else float(np.std(p) + 1e-6)
-                lo[k] = p[k] - qk; hi[k] = p[k] + qk
-            lo_acc[f"{m} + CP"].extend(lo); hi_acc[f"{m} + CP"].extend(hi)
-
-        nat_lo_all.extend(nat_lo); nat_hi_all.extend(nat_hi)
+                qk = cp_width(m, k, q_flat_nns)
+                lo[k] = p[k] - qk; hi[k] = p[k] + qk; ss[k] = max(qk/z, 1e-8)
+            lo_acc[name].extend(lo); hi_acc[name].extend(hi); sig_acc[name].extend(ss)
+
         for m in POINT_METHODS:
             pt_pred[m].extend(preds[m])
         acc_y.extend(yb); acc_sig.extend(sgb)
 
-        # update pools AFTER scoring
+        # update pools AFTER scoring (now data <= next origin)
         for m in POINT_METHODS:
             r = np.abs(yb - preds[m])
+            flat_pool[m].extend(r.tolist())
             for k in range(h):
-                pools[m][k].append(r[k])
+                per_lead[m][k].append(r[k])
 
         current_train += h
 
     y_arr = np.asarray(acc_y); sig_arr = np.asarray(acc_sig)
 
-    # POINT metrics
+    # POINT table
     point_rows = []
     for m in POINT_METHODS:
-        p = np.asarray(pt_pred[m])
-        err = p - y_arr
-        point_rows.append({
-            "method": m,
-            "MAE":    float(np.mean(np.abs(err))),
-            "RMSE":   float(np.sqrt(np.mean(err**2))),
-            "median_AE": float(np.median(np.abs(err))),
-        })
+        err = np.asarray(pt_pred[m]) - y_arr
+        point_rows.append({"method": m, "MAE": float(np.mean(np.abs(err))),
+                           "RMSE": float(np.sqrt(np.mean(err**2))),
+                           "median_AE": float(np.median(np.abs(err)))})
 
-    # INTERVAL metrics
+    # INTERVAL table
     int_rows = []
-    def _introw(name, lo, hi):
-        lo = np.asarray(lo); hi = np.asarray(hi)
-        return {"method": name, "marg_cov": coverage(lo, hi, y_arr),
-                "worst_win_cov": worst_window_coverage(lo, hi, y_arr),
-                "width": mean_width(lo, hi), "interval_score": interval_score(lo, hi, y_arr)}
-    int_rows.append(_introw("NNS native PI", nat_lo_all, nat_hi_all))
     for name in lo_acc:
-        int_rows.append(_introw(name, lo_acc[name], hi_acc[name]))
+        lo = np.asarray(lo_acc[name]); hi = np.asarray(hi_acc[name]); s_ = np.asarray(sig_acc[name])
+        cbs = coverage_by_stratum(lo, hi, y_arr, sig_arr)
+        int_rows.append({
+            "method": name, "marg_cov": coverage(lo, hi, y_arr),
+            "worst_win_cov": worst_window_coverage(lo, hi, y_arr),
+            "cov_lowvol": cbs[0], "cov_hivol": cbs[-1],
+            "cond_cov_gap": max(abs(c - TARGET_COV) for c in cbs if not math.isnan(c)),
+            "width": mean_width(lo, hi), "interval_score": interval_score(lo, hi, y_arr),
+        })
 
     if verbose:
         mae = {m: np.mean(np.abs(np.asarray(pt_pred[m]) - y_arr)) for m in POINT_METHODS}
         print(f"  [seed {seed}] {n_chunks} blocks  MAE: "
-              + "  ".join(f"{m}={mae[m]:.3f}" for m in POINT_METHODS)
-              + f"  ({time.time()-t0:.0f}s)")
+              + "  ".join(f"{m}={mae[m]:.3f}" for m in POINT_METHODS) + f"  ({time.time()-t0:.0f}s)")
 
     return {"point": pd.DataFrame(point_rows), "interval": pd.DataFrame(int_rows)}
 
@@ -298,17 +306,17 @@ def run_all() -> dict:
     point_agg = point_dt.groupby("method", sort=False).mean().reset_index().sort_values("RMSE")
     int_agg   = int_dt.groupby("method", sort=False).mean().reset_index().sort_values("interval_score")
 
-    point_dt.to_csv("results_duel/point_all.csv", index=False)
-    int_dt.to_csv("results_duel/interval_all.csv", index=False)
-    point_agg.to_csv("results_duel/point.csv", index=False)
-    int_agg.to_csv("results_duel/interval.csv", index=False)
+    point_dt.to_csv("results/point_all.csv", index=False)
+    int_dt.to_csv("results/interval_all.csv", index=False)
+    point_agg.to_csv("results/point.csv", index=False)
+    int_agg.to_csv("results/interval.csv", index=False)
 
-    print(f"\n=== POINT-MODEL DUEL  (block h-step, no online updating; "
-          f"mean over {N_SEEDS} seeds) ===\n")
+    pd.set_option("display.width", 200, "display.max_columns", 20)
+    print(f"\n=== POINT DUEL  (block h-step, no online updating; mean over {N_SEEDS} seeds) ===\n")
     print(point_agg.round(3).to_string(index=False))
-    print(f"\n=== INTERVALS (same protocol, per-method CP + NNS native) ===\n")
+    print(f"\n=== INTERVAL STUDY  (point = NNS fixed; alpha={ALPHA}, target={TARGET_COV}) ===\n")
     print(int_agg.round(3).to_string(index=False))
-    print("\nWrote results_duel/{point,interval}{,_all}.csv")
+    print("\nWrote results/{point,interval}{,_all}.csv")
     return {"point": point_agg, "interval": int_agg}
 
 
diff --git a/gists/timeseries/point_duel/README.md b/gists/timeseries/point_duel/README.md
deleted file mode 100644
index 854f635..0000000
--- a/gists/timeseries/point_duel/README.md
+++ /dev/null
@@ -1,93 +0,0 @@
-# Point-model duel — NNS vs. recursive ridge, one honest protocol
-
-A fair head-to-head on **point forecasting**, with prediction intervals as a
-secondary check. Every method makes the same commitment NNS makes:
-
-> At each origin *t*, given only data through *t*, forecast an `h`-step block
-> (`h = implied_h = t·(1−0.9)/0.9`) with **no online updating**. No method may
-> peek at a realized value to predict the next one.
-
-This is the comparison the earlier conformal benchmark lacked: there the
-baselines forecast one step ahead with the *true* lag and refreshed every step,
-while NNS forecast a multi-step block blind. Here everyone forecasts the block
-blind, so the contest measures forecasting-by-design, not access to the truth.
-
-## Contenders
-
-- **NNS block** — `NNS.ARMA.optim`, model (periods/method/bias) selected on a
-  strictly historical validation tail (leak-free), native block forecast.
-- **Ridge (recursive)** — ridge on `N_LAGS` lags, fit on all data ≤ *t*, then
-  projected `h` steps **recursively** (its own predictions become the lags; no
-  true intermediate values). This is the baseline stripped of the h=1 crutch.
-- **Persistence** — last value carried forward `h` steps (floor).
-
-Intervals: each point method is wrapped with the **same** per-lead-time
-split-conformal band (calibrated on that method's own realized block residuals,
-pooled across past origins). NNS also reports its native PI.
-
-## Results
-
-Mean over 10 seeds, T = 3500, ~12 blocks/seed. Lower is better throughout.
-
-**Point forecast (the headline):**
-
-```
-           method   MAE  RMSE  median_AE
-        NNS block 1.514 2.056      1.122
-Ridge (recursive) 2.664 3.230      2.403
-      Persistence 2.493 3.241      1.991
-```
-
-**Intervals (same protocol; per-method split-CP + NNS native PI):**
-
-```
-                method  marg_cov  worst_win_cov  width  interval_score
-         NNS native PI     0.845          0.561  5.658           8.674
-        NNS block + CP     0.894          0.288  8.306          11.339
-Ridge (recursive) + CP     0.835          0.124  9.819          15.189
-      Persistence + CP     0.860          0.000 11.701          18.674
-```
-
-## Interpretation
-
-- **NNS wins the point duel decisively.** MAE 1.51 vs. recursive ridge 2.66
-  (~43% lower) and persistence 2.49; same ordering on RMSE and median absolute
-  error. The result is consistent across all 10 seeds, not seed-driven.
-- **Recursive ridge is *worse than persistence* on MAE.** Once you remove the
-  h=1 crutch (true lag every step) and force a genuine multi-step block, the
-  linear AR model's error compounds over the horizon until naive carry-forward
-  beats it. This is the concrete version of the forecaster-by-design vs.
-  forecaster-by-compulsion distinction: a structural seasonal model projects a
-  real trajectory; a reactive linear model that only ever learned the one-step
-  map degrades to noise without the truth fed back.
-- **Intervals follow point quality.** NNS's native PI is far the best Winkler
-  score (8.67) at the tightest width — because better, smaller residuals make a
-  tighter, better-centred band. The conformal wrappers on the weaker point
-  models inherit their larger errors and score worse.
-
-### Caveat on the interval table
-
-The shared per-lead-time split-conformal wrapper here has a cold-start fallback
-(it has no realized block residuals for the first origins), which depresses
-**worst-window** coverage for the `+ CP` rows (e.g. NNS block + CP at 0.288).
-Read the **point** table (MAE / RMSE / median AE) as the clean, primary result
-and the interval table as directional. The carefully-calibrated interval study
-lives separately; this branch's job is the point-model duel.
-
-## Run it
-
-```bash
-pip install ovvo-nns numpy pandas scipy scikit-learn
-python run_point_duel.py
-```
-
-Writes `results_duel/point.csv`, `interval.csv` (aggregated) and the per-seed
-`*_all.csv`.
-
-## Note
-
-Marginal coverage of every flat band falls short of the 0.90 target on this
-heteroskedastic DGP — that is the exchangeability-violation penalty, common to
-all methods, not specific to any one. The point tables (MAE / RMSE) are the
-clean comparison; the interval tables should be read as efficiency-at-similar-
-(under)coverage, not as a coverage guarantee.
diff --git a/gists/timeseries/point_duel/results_duel/interval.csv b/gists/timeseries/point_duel/results_duel/interval.csv
deleted file mode 100644
index 54abe05..0000000
--- a/gists/timeseries/point_duel/results_duel/interval.csv
+++ /dev/null
@@ -1,5 +0,0 @@
-method,marg_cov,worst_win_cov,width,interval_score
-NNS native PI,0.845217041800643,0.561,5.658247904936664,8.673563693193246
-NNS block + CP,0.8942926045016077,0.288,8.30644088604347,11.339456828288366
-Ridge (recursive) + CP,0.834766881028939,0.124,9.819236038266109,15.188952711334176
-Persistence + CP,0.8596061093247588,0.0,11.700845119008003,18.674312892485847
diff --git a/gists/timeseries/point_duel/results_duel/interval_all.csv b/gists/timeseries/point_duel/results_duel/interval_all.csv
deleted file mode 100644
index 1f87543..0000000
--- a/gists/timeseries/point_duel/results_duel/interval_all.csv
+++ /dev/null
@@ -1,41 +0,0 @@
-method,marg_cov,worst_win_cov,width,interval_score
-NNS native PI,0.8476688102893891,0.67,5.3336289551117595,8.347311881079282
-NNS block + CP,0.8906752411575563,0.22,8.177099486196127,11.529523035347692
-Ridge (recursive) + CP,0.8372186495176849,0.1,10.278876556971408,16.046670825709686
-Persistence + CP,0.8436495176848875,0.0,10.213249743026658,16.33320386075765
-NNS native PI,0.8287781350482315,0.52,5.538396478733216,9.039315486588494
-NNS block + CP,0.8902733118971061,0.37,8.390728807696368,11.499697971060435
-Ridge (recursive) + CP,0.7990353697749196,0.08,9.39395853105691,15.636645535901216
-Persistence + CP,0.8581189710610932,0.0,11.002413778926346,16.322217439495994
-NNS native PI,0.8480707395498392,0.54,5.498313881263709,8.568987367833104
-NNS block + CP,0.8870578778135049,0.32,7.850041608639307,10.881610133044617
-Ridge (recursive) + CP,0.8331993569131833,0.03,9.461311846092478,15.290577454252144
-Persistence + CP,0.8388263665594855,0.0,13.454901313710085,20.728177627283294
-NNS native PI,0.8404340836012861,0.55,5.417943415993259,8.509212166486481
-NNS block + CP,0.8971061093247589,0.18,8.178784348833949,11.442682746189892
-Ridge (recursive) + CP,0.842443729903537,0.21,9.937090055812732,15.019945691471802
-Persistence + CP,0.9091639871382636,0.0,14.085426811335523,22.670411720818084
-NNS native PI,0.8472668810289389,0.51,5.912994900178406,8.952518643444352
-NNS block + CP,0.8979099678456591,0.27,8.346058510609698,11.244196033911283
-Ridge (recursive) + CP,0.8336012861736335,0.13,9.626844810715227,14.755633310870019
-Persistence + CP,0.9011254019292605,0.0,11.723029193525093,15.965745582710372
-NNS native PI,0.860128617363344,0.58,5.662363541166043,8.391587116904702
-NNS block + CP,0.8922829581993569,0.25,8.141163390003173,11.244453556266834
-Ridge (recursive) + CP,0.8420418006430869,0.25,9.708049657419254,14.244410697181051
-Persistence + CP,0.8942926045016077,0.0,15.174974529034394,25.340887411818077
-NNS native PI,0.8432475884244373,0.62,5.958905251551655,8.884737288789054
-NNS block + CP,0.9115755627009646,0.41,9.046462585915824,11.49001377083638
-Ridge (recursive) + CP,0.8512861736334405,0.13,10.192315755071641,15.29971782370236
-Persistence + CP,0.8110932475884244,0.0,11.285499121470306,19.764628557431646
-NNS native PI,0.8404340836012861,0.42,5.533785215198311,8.749410650539787
-NNS block + CP,0.8854501607717041,0.21,7.999981741192585,11.219855763022137
-Ridge (recursive) + CP,0.8283762057877814,0.1,9.51494956048423,14.79895664009931
-Persistence + CP,0.8802250803858521,0.0,9.804629243119559,14.085680509142973
-NNS native PI,0.8484726688102894,0.58,6.028058412975543,8.799144860895987
-NNS block + CP,0.8959003215434084,0.46,8.40742772515753,11.10618124347616
-Ridge (recursive) + CP,0.8259646302250804,0.06,9.845777723130777,15.51537851501687
-Persistence + CP,0.8046623794212219,0.0,9.304325751647738,16.85406180705558
-NNS native PI,0.8476688102893891,0.62,5.69808899719474,8.493411469371221
-NNS block + CP,0.8946945337620579,0.19,8.52666065619013,11.736354029728231
-Ridge (recursive) + CP,0.8545016077170418,0.15,10.233185885906426,15.281590619137305
-Persistence + CP,0.854903536977492,0.0,10.960001704284322,18.678114408344804

From 8e4e3b065c03e3258ab92126b97565f2e268eb01 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 23 Jun 2026 19:22:48 +0000
Subject: [PATCH 3/3] Drop upstream-guard note from benchmark README (#57)

Correct `variable` specification at the call site (variable=train_slice, never
y[:end_i]) prevents the evaluation-chunk leak, so there is no separate upstream
guard to flag as outstanding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TX6vFofX1vnanE147tJFWF
---
 gists/timeseries/conformal/README.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/gists/timeseries/conformal/README.md b/gists/timeseries/conformal/README.md
index 2aabb0d..8761992 100644
--- a/gists/timeseries/conformal/README.md
+++ b/gists/timeseries/conformal/README.md
@@ -118,10 +118,3 @@ python run_conformal.py
 
 Writes `results/point.csv`, `results/interval.csv` (aggregated) and the per-seed
 `*_all.csv`.
-
-## Not yet addressed
-
-`NNS.ARMA.optim` still treats the tail after `training_set` as its validation target
-with no guard or warning when out-of-range data is passed — the upstream gap behind
-#57. Adding that guard is left to a follow-up; until then, treat this as the
-corrected-content restoration, not a closed case.