120 changes: 120 additions & 0 deletions gists/timeseries/conformal/README.md
@@ -0,0 +1,120 @@
# NNS under one honest forecasting protocol

A single block walk-forward over one heteroskedastic data-generating process,
emitting two coherent analyses from the *same* leak-free NNS forecast:

1. **Point duel** — does NNS actually forecast better than a fair baseline?
2. **Interval study** — given NNS's forecast, is its native prediction interval
better or worse than conformalizing the same residuals?

This replaces the benchmark withdrawn under issue #57, which (a) leaked each
evaluation chunk into `NNS.ARMA.optim`'s validation tail and (b) compared methods
across mismatched protocols (online h=1 baselines vs. an NNS multi-step block).

## The protocol (one rule for everyone)

> At each origin *t*, given only data through *t*, every method forecasts an
> `h`-step block (`h = implied_h = t·(1−0.9)/0.9`) with **no online updating** —
> no method may peek at a realized value to predict the next one.

This is *forecasting by design*, exposed to its own compounding error — the
opposite of an online method that re-anchors on the truth every step. NNS's model
(periods/method/bias) is selected on a strictly **historical** validation tail, so
the scored block is never shown to the optimizer (the #57 leak is fixed).
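
The block schedule implied by the rule above can be sketched as follows. This is a minimal illustration, not the code in `run_conformal.py`: the function name, the `first_origin` value, and rounding `h` to an integer are all assumptions.

```python
def walk_forward_blocks(n_obs, first_origin, train_frac=0.9):
    """Yield (origin, h) pairs; each block starts where the previous one ended."""
    t = first_origin  # hypothetical starting origin, not taken from the script
    while t < n_obs:
        # implied_h = t * (1 - 0.9) / 0.9, rounded to an integer (assumption)
        h = max(1, round(t * (1 - train_frac) / train_frac))
        h = min(h, n_obs - t)  # cap the last block at the end of the series
        yield t, h
        t += h  # no online updating: the next origin is the end of this block


blocks = list(walk_forward_blocks(n_obs=3500, first_origin=1000))
for origin, h in blocks:
    print(f"origin={origin:5d}  h={h:4d}")
```

Because `h` grows in proportion to `t`, later blocks are longer. With the hypothetical first origin of 1000, the sketch produces 12 blocks on T = 3500.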

- **DGP:** non-linear, heteroskedastic AR(1), slow trend, two seasonal components
(periods 50, 200), piecewise volatility regimes (σ → 2.5, 0.55, 1.8). 10 seeds,
T = 3500, ~12 blocks/seed.

## Point duel

- **NNS block** — `NNS.ARMA.optim` native block forecast (leak-free selection).
- **Ridge (recursive)** — ridge on `N_LAGS` lags, fit on all data ≤ *t*, projected
`h` steps recursively (its own predictions become the lags; no true intermediate
values — this removes the h=1 true-lag crutch the original baselines enjoyed).
- **Persistence** — last value carried forward (floor).
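
The two baselines can be sketched in a few lines. This is an illustration under stated assumptions, not the benchmark's implementation: it uses a closed-form ridge in NumPy in place of whatever estimator the script calls, and the function names, `alpha`, and the toy series are hypothetical.

```python
import numpy as np


def fit_ridge_ar(y, n_lags, alpha=1.0):
    """Ridge regression of y[t] on its previous n_lags values (closed form)."""
    y = np.asarray(y, dtype=float)
    # Column k holds lag k+1, aligned with the targets y[n_lags:].
    X = np.column_stack(
        [y[n_lags - k - 1 : len(y) - k - 1] for k in range(n_lags)]
    )
    target = y[n_lags:]
    x_mean, y_mean = X.mean(axis=0), target.mean()
    Xc = X - x_mean  # centre so the intercept is not penalised
    coef = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(n_lags), Xc.T @ (target - y_mean)
    )
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def recursive_forecast(history, coef, intercept, h):
    """Project h steps ahead, feeding each prediction back in as a lag."""
    lags = list(np.asarray(history, dtype=float)[-len(coef):][::-1])  # newest first
    out = []
    for _ in range(h):
        pred = intercept + float(np.dot(coef, lags))
        out.append(pred)
        lags = [pred] + lags[:-1]  # no realized value is ever used here
    return np.array(out)


def persistence_forecast(history, h):
    """Carry the last observed value forward for the whole block."""
    return np.full(h, float(history[-1]))


# Toy check on a noise-free AR(1) with coefficient 0.9 (hypothetical data).
y = 10.0 * 0.9 ** np.arange(60)
coef, intercept = fit_ridge_ar(y, n_lags=1, alpha=1e-9)
ridge_fc = recursive_forecast(y, coef, intercept, h=3)
naive_fc = persistence_forecast(y, h=3)
```

The point of the recursion is that errors in early predictions propagate into later ones, which is exactly the exposure the protocol asks every method to carry.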

```
method             MAE    RMSE   median_AE
NNS block          1.514  2.056  1.122
Ridge (recursive)  2.664  3.230  2.403
Persistence        2.493  3.241  1.991
```

## Interval study

We deliberately **adapt** conformal into a forecast wrapper here, and say so. Online
adaptive conformal (ACI/PID/NexCP) gets its coverage from a per-step feedback loop —
a control-theoretic property, not forecast skill — so pitting it against a multi-step
block forecaster on a 1-step task tests the wrong thing. Instead we hold the NNS point
forecast **fixed** and vary only the band, to test which bands keep their **coverage
guarantees on a heteroskedastic process**, judged on conditional (per-regime /
worst-window) coverage and not just the marginal rate:

- **NNS native PI** — `results ± pi_width`, NNS's own flat nonparametric rule.
- **NNS + split-CP (flat)** — empirical (1−α) conformal quantile of NNS residuals.
- **NNS + split-CP (per-lead)** — a quantile *per lead-time k* (the only band that
widens with horizon).
- **NNS + Gaussian (flat)** — `z · std(residuals)`.
- **Ridge / Persistence + split-CP (per-lead)** — same wrapper on the weaker point
models, to show interval quality follows point quality.
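
The three residual-based half-widths listed above can be sketched as below. The assumptions are loud ones: the calibration residuals are taken to be a rectangular `(n_blocks, h)` array (the real blocks vary in length), the conformal quantile uses the usual finite-sample `ceil((n+1)(1-α))` rank, and every function name is hypothetical. The NNS native `pi_width` rule is not reproduced here.

```python
import numpy as np
from statistics import NormalDist


def conformal_quantile(abs_residuals, alpha):
    """Finite-sample split-conformal quantile of absolute residuals."""
    r = np.sort(np.asarray(abs_residuals, dtype=float).ravel())
    n = r.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    return float(r[k - 1]) if k <= n else float("inf")


def flat_halfwidth(residuals, alpha):
    """One half-width for every lead time, pooled over all residuals."""
    return conformal_quantile(np.abs(residuals), alpha)


def per_lead_halfwidths(residuals, alpha):
    """One half-width per lead time k (column k of the residual matrix)."""
    residuals = np.asarray(residuals, dtype=float)
    return np.array(
        [conformal_quantile(np.abs(residuals[:, k]), alpha)
         for k in range(residuals.shape[1])]
    )


def gaussian_halfwidth(residuals, alpha):
    """z * std(residuals), with z the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * float(np.std(residuals))


# Toy residuals whose spread grows with lead time (hypothetical data).
toy_res = np.outer(np.arange(1.0, 10.0), [1.0, 2.0, 3.0])  # 9 blocks, 3 leads
flat_w = flat_halfwidth(toy_res, alpha=0.5)
lead_w = per_lead_halfwidths(toy_res, alpha=0.5)
```

On residuals that grow with horizon, the per-lead half-widths increase with `k` while the flat ones use a single pooled value, which is the contrast the interval table measures.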

```
method                                   marg_cov  worst_win_cov  cov_lowvol  cov_hivol  cond_cov_gap  width   interval_score
NNS native PI                            0.845     0.561          0.927       0.853      0.149         5.658   8.674
NNS + split-CP (flat)                    0.824     0.515          0.899       0.850      0.169         5.398   8.735
NNS + Gaussian (flat)                    0.821     0.516          0.894       0.842      0.170         5.332   8.773
NNS + split-CP (per-lead)                0.918     0.649          0.996       0.858      0.097         8.595   10.505
Ridge (recursive) + split-CP (per-lead)  0.865     0.553          0.955       0.826      0.149         10.169  13.853
Persistence + split-CP (per-lead)        0.887     0.333          0.981       0.807      0.162         12.107  16.440
```

Point table sorted by RMSE, interval table by interval (Winkler) score; lower is
better throughout.

## Findings

- **NNS wins the point duel decisively.** MAE 1.51 vs. recursive ridge 2.66
(~43% lower) and persistence 2.49; NNS is also best on RMSE and median AE, in
every one of the 10 seeds. Recursive ridge is *worse* than persistence on
aggregate MAE and median AE and only marginally better on RMSE (3.230 vs. 3.241):
strip the h=1 true-lag crutch and a linear AR model's error compounds over the
block until naive carry-forward matches or beats it. The ridge-vs-persistence
order is not uniform, though: ridge has the lower MAE in 2 of the 10 seeds.
Structural forecasting by design vs. reactive by compulsion, made concrete.
- **NNS's native PI is the efficiency winner and ≈ a flat split-conformal band.**
Best interval score (8.674) at a width (5.66) only slightly above the two flat
bands (5.40 and 5.33), and it tracks the empirical split-conformal quantile on the
*same* residuals (8.735). That is consistent with the native band behaving, in
effect, like flat split conformal, slightly more conservative.
- **Every flat band under-covers the volatile regime** (marg 0.82–0.85, worst-window
~0.51–0.56, `cov_hivol` ~0.84–0.85). Calibrating a single width on historical
residuals cannot transport to a multi-step block that compounds error and crosses
σ-regimes — a textbook exchangeability failure under heteroskedasticity.
- **Only the horizon-adaptive per-lead wrapper recovers the marginal guarantee**:
near-nominal 0.918 marginal, the smallest conditional gap (0.097) and best
worst-window (0.649), at ~52% wider intervals than the native PI (8.59 vs. 5.66)
and the worst Winkler score among the NNS bands. Even then `cov_hivol` only
reaches 0.858. No free lunch: coverage on a heteroskedastic process costs width.
- **Interval quality follows point quality.** The *same* per-lead wrapper on the
weaker point models scores far worse (NNS 10.5 < ridge 13.9 < persistence 16.4)
with much wider bands (8.6, 10.2, 12.1): a better forecast makes a tighter,
better-centred interval.

## How to read it

- **Marginal coverage near 0.90 is necessary but cheap.** On a heteroskedastic DGP
the honest test is the **conditional** gap between calm (`cov_lowvol`) and volatile
(`cov_hivol`) regimes and the **worst rolling window** — a flat band that hits 0.90
on average by over-covering calm and under-covering volatile has not solved the
problem. Every flat band here under-covers the volatile regime: that is the
exchangeability-violation penalty, common to all of them, not specific to any one.
- **Width is only a fair tie-breaker at matched coverage.** Among methods with
different coverage, a narrower band may just be under-covering more. The interval
(Winkler) score combines the two (`width + (2/α)·exceedance`), which is why it is
the sort key.
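
A minimal sketch of the diagnostics discussed above, for readers who want to recompute them on their own bands. The function names are hypothetical and the rolling-window length of 100 is an assumption; neither is taken from `run_conformal.py`.

```python
import numpy as np


def interval_score(y, lower, upper, alpha):
    """Mean Winkler score: width + (2/alpha) * exceedance outside the band."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    below = np.clip(lower - y, 0.0, None)  # how far y falls under the band
    above = np.clip(y - upper, 0.0, None)  # how far y rises over the band
    return float(np.mean(width + (2.0 / alpha) * (below + above)))


def coverage_summary(y, lower, upper, window=100):
    """Return (marginal coverage, worst rolling-window coverage)."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    hit = ((y >= lower) & (y <= upper)).astype(float)
    marginal = float(hit.mean())
    if hit.size < window:
        return marginal, marginal  # too short for a full window
    rolling = np.convolve(hit, np.ones(window) / window, mode="valid")
    return marginal, float(rolling.min())


# Toy example: three covered points and one miss by 2 units (hypothetical data).
y_toy = np.array([0.0, 0.0, 0.0, 3.0])
lo_toy, hi_toy = np.full(4, -1.0), np.full(4, 1.0)
toy_score = interval_score(y_toy, lo_toy, hi_toy, alpha=0.1)
toy_marg, toy_worst = coverage_summary(y_toy, lo_toy, hi_toy, window=2)
```

In the toy example the single miss contributes `2 + (2/0.1)·2 = 42` to the score while each covered point contributes only its width of 2, which shows how heavily the score penalizes exceedance relative to width at α = 0.1.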

## Run it

```bash
pip install ovvo-nns numpy pandas scipy scikit-learn
python run_conformal.py
```

Writes the aggregated `results/point.csv` and `results/interval.csv`, plus the
per-seed `results/point_all.csv` and `results/interval_all.csv`.
7 changes: 7 additions & 0 deletions gists/timeseries/conformal/results/interval.csv
@@ -0,0 +1,7 @@
method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score
NNS native PI,0.845217041800643,0.561,0.9271704180064309,0.852572347266881,0.14855305466237945,5.658247904936664,8.673563693193246
NNS + split-CP (flat),0.8243569131832797,0.515,0.8993569131832798,0.84983922829582,0.1686495176848875,5.397754711147717,8.734633715776543
NNS + Gaussian (flat),0.8206993569131832,0.516,0.8940514469453376,0.8421221864951768,0.17025723472668813,5.332100605902861,8.772635078222827
NNS + split-CP (per-lead),0.917604501607717,0.649,0.9959807073954984,0.8577170418006432,0.09655948553054661,8.59459297384414,10.505483113927781
Ridge (recursive) + split-CP (per-lead),0.865032154340836,0.553,0.9546623794212218,0.8263665594855306,0.14855305466237945,10.16897349633998,13.853289701795891
Persistence + split-CP (per-lead),0.88725884244373,0.333,0.9810289389067524,0.807395498392283,0.1617684887459807,12.106951264773958,16.439867386859973
61 changes: 61 additions & 0 deletions gists/timeseries/conformal/results/interval_all.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score
NNS native PI,0.8476688102893891,0.67,0.9131832797427653,0.8070739549839229,0.09453376205787789,5.3336289551117595,8.347311881079282
NNS + split-CP (flat),0.8271704180064309,0.58,0.8954983922829582,0.8311897106109325,0.14919614147909965,5.227161385540743,8.534113534345261
NNS + Gaussian (flat),0.8295819935691319,0.58,0.8858520900321544,0.8279742765273312,0.14276527331189715,5.220493265073913,8.535512774975702
NNS + split-CP (per-lead),0.9147909967845659,0.65,0.9967845659163987,0.8183279742765274,0.09678456591639872,8.425865285679825,10.568242001830663
Ridge (recursive) + split-CP (per-lead),0.8661575562700965,0.64,0.9598070739549839,0.7909967845659164,0.10900321543408364,10.581823596303991,14.592993775300465
Persistence + split-CP (per-lead),0.8721864951768489,0.44,0.9710610932475884,0.707395498392283,0.19260450160771703,10.549102615950632,14.52448029198288
NNS native PI,0.8287781350482315,0.52,0.9163987138263665,0.8617363344051447,0.2022508038585209,5.538396478733216,9.039315486588494
NNS + split-CP (flat),0.815112540192926,0.51,0.9067524115755627,0.8440514469453376,0.23118971061093252,5.423595785308306,9.16356799285496
NNS + Gaussian (flat),0.8163183279742765,0.52,0.9163987138263665,0.8408360128617364,0.23118971061093252,5.371605324250339,9.132929020698203
NNS + split-CP (per-lead),0.9111736334405145,0.6,0.9967845659163987,0.8456591639871383,0.09678456591639872,8.679187273640938,10.817084198706139
Ridge (recursive) + split-CP (per-lead),0.8356109324758842,0.55,0.932475884244373,0.7909967845659164,0.1909967845659164,9.768945508312589,14.299961658445255
Persistence + split-CP (per-lead),0.8934887459807074,0.52,0.9967845659163987,0.7508038585209004,0.14919614147909965,11.420957332270556,14.557742149799997
NNS native PI,0.8480707395498392,0.54,0.9453376205787781,0.8713826366559485,0.14115755627009652,5.498313881263709,8.568987367833104
NNS + split-CP (flat),0.8348070739549839,0.47,0.9115755627009646,0.8858520900321544,0.16366559485530552,5.475440096762711,8.646211618108355
NNS + Gaussian (flat),0.8340032154340836,0.47,0.905144694533762,0.8681672025723473,0.17009646302250803,5.3825939727945,8.623802541408669
NNS + split-CP (per-lead),0.9127813504823151,0.6,0.9871382636655949,0.8810289389067524,0.09292604501607715,8.192814278346807,10.105807673835704
Ridge (recursive) + split-CP (per-lead),0.8701768488745981,0.6,0.9517684887459807,0.8520900321543409,0.16045016077170415,9.911929457282502,13.517878973369866
Persistence + split-CP (per-lead),0.8782154340836013,0.24,0.9919614147909968,0.7893890675241158,0.15562700964630227,13.918516762035877,18.90446027710822
NNS native PI,0.8404340836012861,0.55,0.9469453376205788,0.8762057877813505,0.17009646302250803,5.417943415993259,8.509212166486481
NNS + split-CP (flat),0.8183279742765274,0.51,0.8858520900321544,0.8569131832797428,0.15241157556270102,5.073486836478882,8.35866609307891
NNS + Gaussian (flat),0.8207395498392283,0.54,0.8697749196141479,0.8569131832797428,0.13151125401929264,5.187680907668864,8.312439020985863
NNS + split-CP (per-lead),0.9232315112540193,0.65,0.9935691318327974,0.8665594855305466,0.09356913183279736,8.47780821511154,10.420173113316652
Ridge (recursive) + split-CP (per-lead),0.8629421221864951,0.56,0.954983922829582,0.8054662379421221,0.14115755627009652,10.224738158689307,13.735472897847638
Persistence + split-CP (per-lead),0.9143890675241158,0.11,1.0,0.8118971061093248,0.09999999999999998,14.479797580928631,19.260873784808723
NNS native PI,0.8472668810289389,0.51,0.9405144694533762,0.7733118971061094,0.12668810289389065,5.912994900178406,8.952518643444352
NNS + split-CP (flat),0.8243569131832797,0.58,0.9067524115755627,0.7765273311897106,0.15401929260450165,5.307560736619756,8.610536381423858
NNS + Gaussian (flat),0.8171221864951769,0.57,0.887459807073955,0.7733118971061094,0.1508038585209004,5.2117633612580585,8.673499632507543
NNS + split-CP (per-lead),0.9212218649517685,0.67,0.9903536977491961,0.8327974276527331,0.0903536977491961,8.612047167435271,10.189042171041397
Ridge (recursive) + split-CP (per-lead),0.8621382636655949,0.48,0.9421221864951769,0.8183279742765274,0.15401929260450165,9.940825035097003,13.418993354272594
Persistence + split-CP (per-lead),0.9308681672025724,0.6,0.9453376205787781,0.8569131832797428,0.07588424437299035,12.099187370057201,13.957050019151211
NNS native PI,0.860128617363344,0.58,0.9501607717041801,0.8906752411575563,0.15562700964630227,5.662363541166043,8.391587116904702
NNS + split-CP (flat),0.8243569131832797,0.37,0.887459807073955,0.8745980707395499,0.14919614147909965,5.3533136605690315,8.60722523849902
NNS + Gaussian (flat),0.8247588424437299,0.37,0.9019292604501608,0.8729903536977492,0.15241157556270102,5.31108027242601,8.649115111814341
NNS + split-CP (per-lead),0.917604501607717,0.62,0.9967845659163987,0.8971061093247589,0.09678456591639872,8.460963622012667,10.275595768590357
Ridge (recursive) + split-CP (per-lead),0.8677652733118971,0.5,0.9308681672025724,0.8504823151125402,0.14437299035369777,10.008320672419277,13.318984886956887
Persistence + split-CP (per-lead),0.8995176848874598,0.09,0.9967845659163987,0.7877813504823151,0.1122186495176849,15.609723603863243,21.539983068871113
NNS native PI,0.8432475884244373,0.62,0.9019292604501608,0.8408360128617364,0.10900321543408364,5.958905251551655,8.884737288789054
NNS + split-CP (flat),0.802652733118971,0.55,0.9019292604501608,0.8344051446945338,0.18778135048231515,5.514395594522053,9.282531631715036
NNS + Gaussian (flat),0.7930064308681672,0.54,0.887459807073955,0.8215434083601286,0.1958199356913184,5.443032862596285,9.404878484224922
NNS + split-CP (per-lead),0.9228295819935691,0.68,1.0,0.8440514469453376,0.09999999999999998,9.20007850046642,10.997501962538344
Ridge (recursive) + split-CP (per-lead),0.8782154340836013,0.59,0.9839228295819936,0.8456591639871383,0.14758842443729903,10.477958983260796,14.128418910843173
Persistence + split-CP (per-lead),0.8380225080385852,0.24,0.9823151125401929,0.7379421221864951,0.24565916398713827,11.622172326163268,17.7186519202099
NNS native PI,0.8404340836012861,0.42,0.9501607717041801,0.8665594855305466,0.19742765273311902,5.533785215198311,8.749410650539787
NNS + split-CP (flat),0.8336012861736335,0.47,0.9212218649517685,0.8488745980707395,0.16045016077170415,5.394905745099468,8.708916464271185
NNS + Gaussian (flat),0.8211414790996785,0.46,0.9115755627009646,0.8408360128617364,0.18135048231511253,5.203825372900099,8.80921517840864
NNS + split-CP (per-lead),0.9143890675241158,0.65,1.0,0.864951768488746,0.09999999999999998,8.344194523078558,10.216779329031802
Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.5,0.9630225080385852,0.837620578778135,0.17813504823151127,9.899806431265855,13.626610825171564
Persistence + split-CP (per-lead),0.9184083601286174,0.72,0.9758842443729904,0.8585209003215434,0.07588424437299035,10.229330031021911,12.202300632363936
NNS native PI,0.8484726688102894,0.58,0.905144694533762,0.8778135048231511,0.13151125401929264,6.028058412975543,8.799144860895987
NNS + split-CP (flat),0.8311897106109325,0.52,0.887459807073955,0.8697749196141479,0.1620578778135049,5.725218656291451,8.758728314500287
NNS + Gaussian (flat),0.8255627009646302,0.52,0.8858520900321544,0.8569131832797428,0.1620578778135049,5.649387814291777,8.818937346230603
NNS + split-CP (per-lead),0.9172025723472669,0.66,0.9983922829581994,0.8520900321543409,0.09839228295819935,8.733533232757578,10.717532145849189
Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.51,0.9565916398713826,0.8279742765273312,0.14276527331189715,10.30781624713549,14.048835006470576
Persistence + split-CP (per-lead),0.8464630225080386,0.2,0.9565916398713826,0.8842443729903537,0.28906752411575565,9.805366045553008,15.29539421577076
NNS native PI,0.8476688102893891,0.62,0.9019292604501608,0.860128617363344,0.1572347266881029,5.69808899719474,8.493411469371221
NNS + split-CP (flat),0.8319935691318328,0.59,0.8890675241157556,0.8762057877813505,0.17652733118971065,5.482468614284766,8.675839888968568
NNS + Gaussian (flat),0.8247588424437299,0.59,0.8890675241157556,0.8617363344051447,0.1845659163987139,5.339542905768758,8.766021670973783
NNS + split-CP (per-lead),0.9208199356913184,0.71,1.0,0.8745980707395499,0.09999999999999998,8.819437639911786,10.747072774537559
Ridge (recursive) + split-CP (per-lead),0.8806270096463023,0.6,0.9710610932475884,0.8440514469453376,0.11704180064308689,10.567570873632986,13.844746729280885
Persistence + split-CP (per-lead),0.8810289389067524,0.17,0.9935691318327974,0.8890675241157556,0.22154340836012865,11.335358979895252,16.437737508532983
4 changes: 4 additions & 0 deletions gists/timeseries/conformal/results/point.csv
@@ -0,0 +1,4 @@
method,MAE,RMSE,median_AE
NNS block,1.5136011272156582,2.0556925586936883,1.12208606811395
Ridge (recursive),2.6644732816036893,3.2304601906237487,2.4028239883431057
Persistence,2.493290248083914,3.24141533222641,1.9910934283341892
31 changes: 31 additions & 0 deletions gists/timeseries/conformal/results/point_all.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
method,MAE,RMSE,median_AE
NNS block,1.4624613757188,2.0095523514648352,1.0714033811834391
Ridge (recursive),2.7093079489173206,3.2849148002794246,2.4276541206659417
Persistence,2.5207425403649664,3.1838137342389228,2.0779938586077042
NNS block,1.5179195200934121,2.0875213420368293,1.105426032360385
Ridge (recursive),2.8282510401794556,3.3952268823991485,2.5875729210928973
Persistence,2.3736672434422657,3.040742657230395,1.8980380282638447
NNS block,1.4996940127463412,2.00757652089875,1.115767753513877
Ridge (recursive),2.5709301773652182,3.1339434936155377,2.335881443966338
Persistence,2.9117732238980736,3.794140173915152,2.234972066371844
NNS block,1.4593534764791098,2.001246583495766,1.0864650913903005
Ridge (recursive),2.6298954565866794,3.190132803482008,2.3755657648263804
Persistence,2.4740376467312735,3.304377116943708,1.9953112501835901
NNS block,1.5271989128921286,2.049606520740077,1.1602056950481388
Ridge (recursive),2.596901417871215,3.1481544415297216,2.3307565425306667
Persistence,2.3415372011880113,2.992295950923396,1.9604867694701609
NNS block,1.4920674612633236,2.008429290064173,1.1172318673345731
Ridge (recursive),2.614711558422066,3.1746112295444617,2.308934798983637
Persistence,2.6319063703191286,3.543647928791187,2.0564798773344357
NNS block,1.6060225425685897,2.1818865891046224,1.1965950466152333
Ridge (recursive),2.717725103031881,3.295389524149246,2.4449283484676214
Persistence,2.6267172703935375,3.4629309634408636,1.9490510884493548
NNS block,1.4824820449499643,2.0065783979058245,1.1035396755182818
Ridge (recursive),2.6981442061600074,3.249449182036675,2.5191619066463744
Persistence,2.16538422214265,2.7236495214185332,1.867662248183055
NNS block,1.5806024483770957,2.126087726777259,1.1775229045548874
Ridge (recursive),2.6103148900670363,3.1982342648198507,2.298518235730201
Persistence,2.4716622184905397,3.181652504844425,1.9899147900891014
NNS block,1.5082094770678165,2.0784402644487474,1.086703233620384
Ridge (recursive),2.6685510174360134,3.2345452843814164,2.3992658005209986
Persistence,2.4154745438686924,3.1869027705175195,1.8810243063888008