diff --git a/gists/timeseries/conformal/README.md b/gists/timeseries/conformal/README.md new file mode 100644 index 0000000..8761992 --- /dev/null +++ b/gists/timeseries/conformal/README.md @@ -0,0 +1,120 @@ +# NNS under one honest forecasting protocol + +A single block walk-forward over one heteroskedastic data-generating process, +emitting two coherent analyses from the *same* leak-free NNS forecast: + +1. **Point duel** — does NNS actually forecast better than a fair baseline? +2. **Interval study** — given NNS's forecast, is its native prediction interval + better or worse than conformalizing the same residuals? + +This replaces the benchmark withdrawn under issue #57, which (a) leaked each +evaluation chunk into `NNS.ARMA.optim`'s validation tail and (b) compared methods +across mismatched protocols (online h=1 baselines vs. an NNS multi-step block). + +## The protocol (one rule for everyone) + +> At each origin *t*, given only data through *t*, every method forecasts an +> `h`-step block (`h = implied_h = t·(1−0.9)/0.9`) with **no online updating** — +> no method may peek at a realized value to predict the next one. + +This is *forecasting by design*, exposed to its own compounding error — the +opposite of an online method that re-anchors on the truth every step. NNS's model +(periods/method/bias) is selected on a strictly **historical** validation tail, so +the scored block is never shown to the optimizer (the #57 leak is fixed). + +- **DGP:** non-linear, heteroskedastic AR(1), slow trend, two seasonal components + (periods 50, 200), piecewise volatility regimes (σ → 2.5, 0.55, 1.8). 10 seeds, + T = 3500, ~12 blocks/seed. + +## Point duel + +- **NNS block** — `NNS.ARMA.optim` native block forecast (leak-free selection). +- **Ridge (recursive)** — ridge on `N_LAGS` lags, fit on all data ≤ *t*, projected + `h` steps recursively (its own predictions become the lags; no true intermediate + values — this removes the h=1 true-lag crutch the original baselines enjoyed). +- **Persistence** — last value carried forward (floor). + +``` + method MAE RMSE median_AE + NNS block 1.514 2.056 1.122 +Ridge (recursive) 2.664 3.230 2.403 + Persistence 2.493 3.241 1.991 +``` + +## Interval study + +We deliberately **adapt** conformal into a forecast wrapper here, and say so. Online +adaptive conformal (ACI/PID/NexCP) gets its coverage from a per-step feedback loop — +a control-theoretic property, not forecast skill — so pitting it against a multi-step +block forecaster on a 1-step task tests the wrong thing. Instead we hold the NNS point +forecast **fixed** and vary only the band, to discern **coverage guarantees on a +heteroskedastic process** — conditional (per-regime / worst-window) coverage, not just +the marginal rate: + +- **NNS native PI** — `results ± pi_width`, NNS's own flat nonparametric rule. +- **NNS + split-CP (flat)** — empirical (1−α) conformal quantile of NNS residuals. +- **NNS + split-CP (per-lead)** — a quantile *per lead-time k* (the only band that + widens with horizon). +- **NNS + Gaussian (flat)** — `z · std(residuals)`. +- **Ridge / Persistence + split-CP (per-lead)** — same wrapper on the weaker point + models, to show interval quality follows point quality. + +``` + method marg_cov worst_win_cov cov_lowvol cov_hivol cond_cov_gap width interval_score + NNS native PI 0.845 0.561 0.927 0.853 0.149 5.658 8.674 + NNS + split-CP (flat) 0.824 0.515 0.899 0.850 0.169 5.398 8.735 + NNS + Gaussian (flat) 0.821 0.516 0.894 0.842 0.170 5.332 8.773 + NNS + split-CP (per-lead) 0.918 0.649 0.996 0.858 0.097 8.595 10.505 +Ridge (recursive) + split-CP (per-lead) 0.865 0.553 0.955 0.826 0.149 10.169 13.853 + Persistence + split-CP (per-lead) 0.887 0.333 0.981 0.807 0.162 12.107 16.440 +``` + +Point table sorted by RMSE, interval table by interval (Winkler) score; lower is +better throughout. + +## Findings + +- **NNS wins the point duel decisively** — MAE 1.51 vs. recursive ridge 2.66 + (~43% lower) and persistence 2.49, same ordering on RMSE/median, consistent + across all 10 seeds. Recursive ridge falls *below* persistence on MAE: strip the + h=1 true-lag crutch and a linear AR model's error compounds over the block until + naive carry-forward beats it. Structural forecasting by design vs. reactive by + compulsion, made concrete. +- **NNS's native PI is the efficiency winner and ≈ a flat split-conformal band.** + Best interval score (8.674) at the tightest competitive width, and it tracks the + empirical split-conformal quantile on the *same* residuals (8.735) — confirming + the native band is, in effect, flat split conformal, slightly more conservative. +- **Every flat band under-covers the volatile regime** (marg 0.82–0.85, worst-window + ~0.51–0.56, `cov_hivol` ~0.84–0.85). Calibrating a single width on historical + residuals cannot transport to a multi-step block that compounds error and crosses + σ-regimes — a textbook exchangeability failure under heteroskedasticity. +- **Only the horizon-adaptive per-lead wrapper recovers the guarantee** — near-nominal + 0.918 marginal, the smallest conditional gap (0.097) and best worst-window (0.649) — + at ~52% wider intervals and the worst Winkler score among the NNS bands. No free + lunch: coverage on a heteroskedastic process costs width. +- **Interval quality follows point quality.** The *same* per-lead wrapper on the + weaker point models scores far worse (NNS 10.5 ≪ ridge 13.9 ≪ persistence 16.4) + with much wider bands — a better forecast makes a tighter, better-centred interval. + +## How to read it + +- **Marginal coverage near 0.90 is necessary but cheap.** On a heteroskedastic DGP + the honest test is the **conditional** gap between calm (`cov_lowvol`) and volatile + (`cov_hivol`) regimes and the **worst rolling window** — a flat band that hits 0.90 + on average by over-covering calm and under-covering volatile has not solved the + problem. Every flat band here under-covers the volatile regime: that is the + exchangeability-violation penalty, common to all of them, not specific to any one. +- **Width is only a fair tie-breaker at matched coverage.** Among methods with + different coverage, a narrower band may just be under-covering more. The interval + (Winkler) score combines the two (`width + (2/α)·exceedance`), which is why it is + the sort key. + +## Run it + +```bash +pip install ovvo-nns numpy pandas scipy scikit-learn +python run_conformal.py +``` + +Writes `results/point.csv`, `results/interval.csv` (aggregated) and the per-seed +`*_all.csv`. diff --git a/gists/timeseries/conformal/results/interval.csv b/gists/timeseries/conformal/results/interval.csv new file mode 100644 index 0000000..206495f --- /dev/null +++ b/gists/timeseries/conformal/results/interval.csv @@ -0,0 +1,7 @@ +method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score +NNS native PI,0.845217041800643,0.561,0.9271704180064309,0.852572347266881,0.14855305466237945,5.658247904936664,8.673563693193246 +NNS + split-CP (flat),0.8243569131832797,0.515,0.8993569131832798,0.84983922829582,0.1686495176848875,5.397754711147717,8.734633715776543 +NNS + Gaussian (flat),0.8206993569131832,0.516,0.8940514469453376,0.8421221864951768,0.17025723472668813,5.332100605902861,8.772635078222827 +NNS + split-CP (per-lead),0.917604501607717,0.649,0.9959807073954984,0.8577170418006432,0.09655948553054661,8.59459297384414,10.505483113927781 +Ridge (recursive) + split-CP (per-lead),0.865032154340836,0.553,0.9546623794212218,0.8263665594855306,0.14855305466237945,10.16897349633998,13.853289701795891 +Persistence + split-CP (per-lead),0.88725884244373,0.333,0.9810289389067524,0.807395498392283,0.1617684887459807,12.106951264773958,16.439867386859973 diff --git a/gists/timeseries/conformal/results/interval_all.csv b/gists/timeseries/conformal/results/interval_all.csv new file mode 100644 index 0000000..cb5ba21 --- /dev/null +++ b/gists/timeseries/conformal/results/interval_all.csv @@ -0,0 +1,61 @@ +method,marg_cov,worst_win_cov,cov_lowvol,cov_hivol,cond_cov_gap,width,interval_score +NNS native PI,0.8476688102893891,0.67,0.9131832797427653,0.8070739549839229,0.09453376205787789,5.3336289551117595,8.347311881079282 +NNS + split-CP (flat),0.8271704180064309,0.58,0.8954983922829582,0.8311897106109325,0.14919614147909965,5.227161385540743,8.534113534345261 +NNS + Gaussian (flat),0.8295819935691319,0.58,0.8858520900321544,0.8279742765273312,0.14276527331189715,5.220493265073913,8.535512774975702 +NNS + split-CP (per-lead),0.9147909967845659,0.65,0.9967845659163987,0.8183279742765274,0.09678456591639872,8.425865285679825,10.568242001830663 +Ridge (recursive) + split-CP (per-lead),0.8661575562700965,0.64,0.9598070739549839,0.7909967845659164,0.10900321543408364,10.581823596303991,14.592993775300465 +Persistence + split-CP (per-lead),0.8721864951768489,0.44,0.9710610932475884,0.707395498392283,0.19260450160771703,10.549102615950632,14.52448029198288 +NNS native PI,0.8287781350482315,0.52,0.9163987138263665,0.8617363344051447,0.2022508038585209,5.538396478733216,9.039315486588494 +NNS + split-CP (flat),0.815112540192926,0.51,0.9067524115755627,0.8440514469453376,0.23118971061093252,5.423595785308306,9.16356799285496 +NNS + Gaussian (flat),0.8163183279742765,0.52,0.9163987138263665,0.8408360128617364,0.23118971061093252,5.371605324250339,9.132929020698203 +NNS + split-CP (per-lead),0.9111736334405145,0.6,0.9967845659163987,0.8456591639871383,0.09678456591639872,8.679187273640938,10.817084198706139 +Ridge (recursive) + split-CP (per-lead),0.8356109324758842,0.55,0.932475884244373,0.7909967845659164,0.1909967845659164,9.768945508312589,14.299961658445255 +Persistence + split-CP (per-lead),0.8934887459807074,0.52,0.9967845659163987,0.7508038585209004,0.14919614147909965,11.420957332270556,14.557742149799997 +NNS native PI,0.8480707395498392,0.54,0.9453376205787781,0.8713826366559485,0.14115755627009652,5.498313881263709,8.568987367833104 +NNS + split-CP (flat),0.8348070739549839,0.47,0.9115755627009646,0.8858520900321544,0.16366559485530552,5.475440096762711,8.646211618108355 +NNS + Gaussian (flat),0.8340032154340836,0.47,0.905144694533762,0.8681672025723473,0.17009646302250803,5.3825939727945,8.623802541408669 +NNS + split-CP (per-lead),0.9127813504823151,0.6,0.9871382636655949,0.8810289389067524,0.09292604501607715,8.192814278346807,10.105807673835704 +Ridge (recursive) + split-CP (per-lead),0.8701768488745981,0.6,0.9517684887459807,0.8520900321543409,0.16045016077170415,9.911929457282502,13.517878973369866 +Persistence + split-CP (per-lead),0.8782154340836013,0.24,0.9919614147909968,0.7893890675241158,0.15562700964630227,13.918516762035877,18.90446027710822 +NNS native PI,0.8404340836012861,0.55,0.9469453376205788,0.8762057877813505,0.17009646302250803,5.417943415993259,8.509212166486481 +NNS + split-CP (flat),0.8183279742765274,0.51,0.8858520900321544,0.8569131832797428,0.15241157556270102,5.073486836478882,8.35866609307891 +NNS + Gaussian (flat),0.8207395498392283,0.54,0.8697749196141479,0.8569131832797428,0.13151125401929264,5.187680907668864,8.312439020985863 +NNS + split-CP (per-lead),0.9232315112540193,0.65,0.9935691318327974,0.8665594855305466,0.09356913183279736,8.47780821511154,10.420173113316652 +Ridge (recursive) + split-CP (per-lead),0.8629421221864951,0.56,0.954983922829582,0.8054662379421221,0.14115755627009652,10.224738158689307,13.735472897847638 +Persistence + split-CP (per-lead),0.9143890675241158,0.11,1.0,0.8118971061093248,0.09999999999999998,14.479797580928631,19.260873784808723 +NNS native PI,0.8472668810289389,0.51,0.9405144694533762,0.7733118971061094,0.12668810289389065,5.912994900178406,8.952518643444352 +NNS + split-CP (flat),0.8243569131832797,0.58,0.9067524115755627,0.7765273311897106,0.15401929260450165,5.307560736619756,8.610536381423858 +NNS + Gaussian (flat),0.8171221864951769,0.57,0.887459807073955,0.7733118971061094,0.1508038585209004,5.2117633612580585,8.673499632507543 +NNS + split-CP (per-lead),0.9212218649517685,0.67,0.9903536977491961,0.8327974276527331,0.0903536977491961,8.612047167435271,10.189042171041397 +Ridge (recursive) + split-CP (per-lead),0.8621382636655949,0.48,0.9421221864951769,0.8183279742765274,0.15401929260450165,9.940825035097003,13.418993354272594 +Persistence + split-CP (per-lead),0.9308681672025724,0.6,0.9453376205787781,0.8569131832797428,0.07588424437299035,12.099187370057201,13.957050019151211 +NNS native PI,0.860128617363344,0.58,0.9501607717041801,0.8906752411575563,0.15562700964630227,5.662363541166043,8.391587116904702 +NNS + split-CP (flat),0.8243569131832797,0.37,0.887459807073955,0.8745980707395499,0.14919614147909965,5.3533136605690315,8.60722523849902 +NNS + Gaussian (flat),0.8247588424437299,0.37,0.9019292604501608,0.8729903536977492,0.15241157556270102,5.31108027242601,8.649115111814341 +NNS + split-CP (per-lead),0.917604501607717,0.62,0.9967845659163987,0.8971061093247589,0.09678456591639872,8.460963622012667,10.275595768590357 +Ridge (recursive) + split-CP (per-lead),0.8677652733118971,0.5,0.9308681672025724,0.8504823151125402,0.14437299035369777,10.008320672419277,13.318984886956887 +Persistence + split-CP (per-lead),0.8995176848874598,0.09,0.9967845659163987,0.7877813504823151,0.1122186495176849,15.609723603863243,21.539983068871113 +NNS native PI,0.8432475884244373,0.62,0.9019292604501608,0.8408360128617364,0.10900321543408364,5.958905251551655,8.884737288789054 +NNS + split-CP (flat),0.802652733118971,0.55,0.9019292604501608,0.8344051446945338,0.18778135048231515,5.514395594522053,9.282531631715036 +NNS + Gaussian (flat),0.7930064308681672,0.54,0.887459807073955,0.8215434083601286,0.1958199356913184,5.443032862596285,9.404878484224922 +NNS + split-CP (per-lead),0.9228295819935691,0.68,1.0,0.8440514469453376,0.09999999999999998,9.20007850046642,10.997501962538344 +Ridge (recursive) + split-CP (per-lead),0.8782154340836013,0.59,0.9839228295819936,0.8456591639871383,0.14758842443729903,10.477958983260796,14.128418910843173 +Persistence + split-CP (per-lead),0.8380225080385852,0.24,0.9823151125401929,0.7379421221864951,0.24565916398713827,11.622172326163268,17.7186519202099 +NNS native PI,0.8404340836012861,0.42,0.9501607717041801,0.8665594855305466,0.19742765273311902,5.533785215198311,8.749410650539787 +NNS + split-CP (flat),0.8336012861736335,0.47,0.9212218649517685,0.8488745980707395,0.16045016077170415,5.394905745099468,8.708916464271185 +NNS + Gaussian (flat),0.8211414790996785,0.46,0.9115755627009646,0.8408360128617364,0.18135048231511253,5.203825372900099,8.80921517840864 +NNS + split-CP (per-lead),0.9143890675241158,0.65,1.0,0.864951768488746,0.09999999999999998,8.344194523078558,10.216779329031802 +Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.5,0.9630225080385852,0.837620578778135,0.17813504823151127,9.899806431265855,13.626610825171564 +Persistence + split-CP (per-lead),0.9184083601286174,0.72,0.9758842443729904,0.8585209003215434,0.07588424437299035,10.229330031021911,12.202300632363936 +NNS native PI,0.8484726688102894,0.58,0.905144694533762,0.8778135048231511,0.13151125401929264,6.028058412975543,8.799144860895987 +NNS + split-CP (flat),0.8311897106109325,0.52,0.887459807073955,0.8697749196141479,0.1620578778135049,5.725218656291451,8.758728314500287 +NNS + Gaussian (flat),0.8255627009646302,0.52,0.8858520900321544,0.8569131832797428,0.1620578778135049,5.649387814291777,8.818937346230603 +NNS + split-CP (per-lead),0.9172025723472669,0.66,0.9983922829581994,0.8520900321543409,0.09839228295819935,8.733533232757578,10.717532145849189 +Ridge (recursive) + split-CP (per-lead),0.8633440514469454,0.51,0.9565916398713826,0.8279742765273312,0.14276527331189715,10.30781624713549,14.048835006470576 +Persistence + split-CP (per-lead),0.8464630225080386,0.2,0.9565916398713826,0.8842443729903537,0.28906752411575565,9.805366045553008,15.29539421577076 +NNS native PI,0.8476688102893891,0.62,0.9019292604501608,0.860128617363344,0.1572347266881029,5.69808899719474,8.493411469371221 +NNS + split-CP (flat),0.8319935691318328,0.59,0.8890675241157556,0.8762057877813505,0.17652733118971065,5.482468614284766,8.675839888968568 +NNS + Gaussian (flat),0.8247588424437299,0.59,0.8890675241157556,0.8617363344051447,0.1845659163987139,5.339542905768758,8.766021670973783 +NNS + split-CP (per-lead),0.9208199356913184,0.71,1.0,0.8745980707395499,0.09999999999999998,8.819437639911786,10.747072774537559 +Ridge (recursive) + split-CP (per-lead),0.8806270096463023,0.6,0.9710610932475884,0.8440514469453376,0.11704180064308689,10.567570873632986,13.844746729280885 +Persistence + split-CP (per-lead),0.8810289389067524,0.17,0.9935691318327974,0.8890675241157556,0.22154340836012865,11.335358979895252,16.437737508532983 diff --git a/gists/timeseries/conformal/results/point.csv b/gists/timeseries/conformal/results/point.csv new file mode 100644 index 0000000..3cfd679 --- /dev/null +++ b/gists/timeseries/conformal/results/point.csv @@ -0,0 +1,4 @@ +method,MAE,RMSE,median_AE +NNS block,1.5136011272156582,2.0556925586936883,1.12208606811395 +Ridge (recursive),2.6644732816036893,3.2304601906237487,2.4028239883431057 +Persistence,2.493290248083914,3.24141533222641,1.9910934283341892 diff --git a/gists/timeseries/conformal/results/point_all.csv b/gists/timeseries/conformal/results/point_all.csv new file mode 100644 index 0000000..7900c80 --- /dev/null +++ b/gists/timeseries/conformal/results/point_all.csv @@ -0,0 +1,31 @@ +method,MAE,RMSE,median_AE +NNS block,1.4624613757188,2.0095523514648352,1.0714033811834391 +Ridge (recursive),2.7093079489173206,3.2849148002794246,2.4276541206659417 +Persistence,2.5207425403649664,3.1838137342389228,2.0779938586077042 +NNS block,1.5179195200934121,2.0875213420368293,1.105426032360385 +Ridge (recursive),2.8282510401794556,3.3952268823991485,2.5875729210928973 +Persistence,2.3736672434422657,3.040742657230395,1.8980380282638447 +NNS block,1.4996940127463412,2.00757652089875,1.115767753513877 +Ridge (recursive),2.5709301773652182,3.1339434936155377,2.335881443966338 +Persistence,2.9117732238980736,3.794140173915152,2.234972066371844 +NNS block,1.4593534764791098,2.001246583495766,1.0864650913903005 +Ridge (recursive),2.6298954565866794,3.190132803482008,2.3755657648263804 +Persistence,2.4740376467312735,3.304377116943708,1.9953112501835901 +NNS block,1.5271989128921286,2.049606520740077,1.1602056950481388 +Ridge (recursive),2.596901417871215,3.1481544415297216,2.3307565425306667 +Persistence,2.3415372011880113,2.992295950923396,1.9604867694701609 +NNS block,1.4920674612633236,2.008429290064173,1.1172318673345731 +Ridge (recursive),2.614711558422066,3.1746112295444617,2.308934798983637 +Persistence,2.6319063703191286,3.543647928791187,2.0564798773344357 +NNS block,1.6060225425685897,2.1818865891046224,1.1965950466152333 +Ridge (recursive),2.717725103031881,3.295389524149246,2.4449283484676214 +Persistence,2.6267172703935375,3.4629309634408636,1.9490510884493548 +NNS block,1.4824820449499643,2.0065783979058245,1.1035396755182818 +Ridge (recursive),2.6981442061600074,3.249449182036675,2.5191619066463744 +Persistence,2.16538422214265,2.7236495214185332,1.867662248183055 +NNS block,1.5806024483770957,2.126087726777259,1.1775229045548874 +Ridge (recursive),2.6103148900670363,3.1982342648198507,2.298518235730201 +Persistence,2.4716622184905397,3.181652504844425,1.9899147900891014 +NNS block,1.5082094770678165,2.0784402644487474,1.086703233620384 +Ridge (recursive),2.6685510174360134,3.2345452843814164,2.3992658005209986 +Persistence,2.4154745438686924,3.1869027705175195,1.8810243063888008 diff --git a/gists/timeseries/conformal/run_conformal.py b/gists/timeseries/conformal/run_conformal.py new file mode 100644 index 0000000..9245bc0 --- /dev/null +++ b/gists/timeseries/conformal/run_conformal.py @@ -0,0 +1,324 @@ +""" +run_conformal.py +================================================================================ +Time-series block benchmark — NNS under one honest protocol (issue #57 redux). + +A single walk-forward emits BOTH analyses, from the same leak-free NNS block +forecast (computed once per origin): + + POINT DUEL — who forecasts best? + At each origin t, given only data through t, every method forecasts an + h-step block (h = implied_h = t*(1-0.9)/0.9) with NO online updating — no + method peeks at a realized value to predict the next one. + * NNS block — NNS.ARMA.optim, model selected on a strictly + historical validation tail (leak-free). + * Ridge (recursive)— ridge on N_LAGS lags, fit on all data <= t, projected + h steps recursively (own predictions become lags). + * Persistence — last value carried forward (floor). + + INTERVAL STUDY — given the NNS forecast, how should the band be drawn? + The NNS point forecast is held FIXED and only the interval construction + varies, on identical residuals: + * NNS native PI — results +/- pi_width (flat, NNS's own rule) + * NNS + split-CP (flat) — empirical (1-a) quantile of NNS residuals + * NNS + split-CP (per-lead)— quantile per lead-time k (widens w/ horizon) + * NNS + Gaussian (flat) — z * std(residuals) + Plus Ridge/Persistence + split-CP (per-lead) to show that interval quality + follows point quality. + +Framed as an adaptation to discern coverage guarantees on a heteroskedastic +process: marginal coverage is cheap; the test is conditional (per-regime / +worst-window) coverage and whether width adapts. + +Run: + pip install ovvo-nns numpy pandas scipy scikit-learn + python run_conformal.py +Writes results/{point,interval}{,_all}.csv +""" +from __future__ import annotations + +import os +import math +import time +import warnings +from collections import defaultdict + +import numpy as np +import pandas as pd +from scipy import stats + +warnings.filterwarnings("ignore") + +try: + import nns as NNS +except ImportError: + raise SystemExit("ovvo-nns is required. Run: pip install ovvo-nns") + +try: + from sklearn.linear_model import Ridge + from sklearn.pipeline import make_pipeline + from sklearn.preprocessing import StandardScaler + HAS_SKLEARN = True +except ImportError: + HAS_SKLEARN = False + +# ── Constants ──────────────────────────────────────────────────────────────── +ALPHA = 0.10 +TARGET_COV = 1.0 - ALPHA +N_LAGS = 12 +CAL_END = 1000 +WINDOW = 100 +N_SEEDS = 10 +TRAINING_FRAC = 0.90 +VAL_FRAC = 0.10 +CP_MIN_POOL = 8 +LOCAL_WIN = 2 + +os.makedirs("results", exist_ok=True) +_MSE = lambda predicted, actual: np.mean((predicted - actual) ** 2) +POINT_METHODS = ["NNS block", "Ridge (recursive)", "Persistence"] + +# ── DGP ────────────────────────────────────────────────────────────────────── +def make_timeseries(T: int = 3500, seed: int = 0, heavy_tail: bool = False) -> dict: + rng = np.random.default_rng(seed + 1) + tt = np.arange(1, T + 1, dtype=float) + level = 0.002 * tt + 1.50 * np.sin(2*np.pi*tt/50) + 0.75 * np.sin(2*np.pi*tt/200) + sigma = np.ones(T) + sigma[(tt > 900) & (tt <= 1400)] = 2.5 + sigma[(tt > 1900) & (tt <= 2450)] = 0.55 + sigma[(tt > 2800)] = 1.8 + eps = rng.standard_t(5, size=T) / math.sqrt(5/3) if heavy_tail else rng.standard_normal(T) + y = np.empty(T) + y[0] = level[0] + sigma[0] * eps[0] + for i in range(1, T): + y[i] = level[i] + 0.55 * (y[i-1] - level[i-1]) + sigma[i] * eps[i] + return {"y": y, "level": level, "sigma": sigma} + +def lag_features(y: np.ndarray, n_lags: int = N_LAGS): + n = len(y) + yy = y[n_lags:] + X = np.column_stack([y[n_lags - k: n - k] for k in range(1, n_lags + 1)]) + return X, yy + +# ── Point forecasters (each returns an h-step block from end of y_hist) ────── +def forecast_ridge_recursive(y_hist: np.ndarray, h: int, n_lags: int = N_LAGS) -> np.ndarray: + X, yy = lag_features(y_hist, n_lags) + if HAS_SKLEARN: + model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)) + model.fit(X, yy) + predict = lambda lags: float(model.predict(lags.reshape(1, -1))[0]) + else: + Xtr = np.column_stack([X, np.ones(len(X))]) + coef, *_ = np.linalg.lstsq(Xtr, yy, rcond=None) + predict = lambda lags: float(np.append(lags, 1.0) @ coef) + hist = list(y_hist[-n_lags:]); out = [] + for _ in range(h): + lags = np.array(hist[-1::-1][:n_lags]) + yhat = predict(lags); out.append(yhat); hist.append(yhat) + return np.asarray(out) + +def forecast_persistence(y_hist: np.ndarray, h: int) -> np.ndarray: + return np.full(h, float(y_hist[-1])) + +def get_nns_seas_periods(series: np.ndarray, train_n: int) -> list[int]: + max_period = int(train_n / 3) - 1 + try: + raw = NNS.nns_seas(series, plot=False).get("periods", np.array([])) + seen, valid = set(), [] + for p in raw: + pi = int(p) + if 1 < pi < max_period and pi not in seen: + seen.add(pi); valid.append(pi) + return valid if valid else [2] + except Exception: + return [2] + +def forecast_nns_block(y_hist: np.ndarray, h: int): + """Leak-free NNS block: returns (point, native_lo, native_hi, cal_resid).""" + n = len(y_hist) + val_h = max(1, int(round(VAL_FRAC * n))) + periods = get_nns_seas_periods(y_hist, n) + fit = NNS.nns_arma_optim( + variable=y_hist, h=h, training_set=n - val_h, seasonal_factor=periods, + negative_values=True, obj_fn=_MSE, objective="min", + linear_approximation=True, pred_int=TARGET_COV, print_trace=False, plot=False, + ) + point = np.asarray(fit["results"], float) + nat_lo = np.asarray(fit["lower.pred.int"], float) + nat_hi = np.asarray(fit["upper.pred.int"], float) + cal_resid = np.asarray(fit["errors"], float) + float(fit["bias.shift"]) + return point, nat_lo, nat_hi, cal_resid + +# ── Metrics ────────────────────────────────────────────────────────────────── +def z_alpha(alpha: float = ALPHA) -> float: + return float(stats.norm.ppf(1 - alpha / 2)) +def coverage(lo, hi, y) -> float: + lo, hi, y = map(np.asarray, (lo, hi, y)); return float(np.mean((y >= lo) & (y <= hi))) +def mean_width(lo, hi) -> float: + return float(np.mean(np.asarray(hi) - np.asarray(lo))) +def rolling_coverage(lo, hi, y, window: int = WINDOW) -> np.ndarray: + lo, hi, y = map(np.asarray, (lo, hi, y)); n = len(y) + if n < window: return np.array([]) + return np.array([coverage(lo[i:i+window], hi[i:i+window], y[i:i+window]) for i in range(n-window+1)]) +def worst_window_coverage(lo, hi, y, window: int = WINDOW) -> float: + rc = rolling_coverage(lo, hi, y, window); return float(np.min(rc)) if len(rc) > 0 else float("nan") +def interval_score(lo, hi, y, alpha: float = ALPHA) -> float: + lo, hi, y = map(np.asarray, (lo, hi, y)) + return float(np.mean((hi - lo) + (2/alpha)*np.maximum(lo - y, 0) + (2/alpha)*np.maximum(y - hi, 0))) +def coverage_by_stratum(lo, hi, y, sigma, k: int = 4) -> list[float]: + lo, hi, y, sigma = map(np.asarray, (lo, hi, y, sigma)) + ranks = stats.rankdata(sigma, method="ordinal") + grp = np.ceil(ranks / len(ranks) * k).astype(int).clip(1, k) + return [coverage(lo[grp == j], hi[grp == j], y[grp == j]) for j in range(1, k+1)] +def _emp_q(scores, level=TARGET_COV): + s = np.sort(np.abs(np.asarray(scores, float))); n = len(s) + if n == 0: return None + k = int(np.ceil((n + 1) * level)) - 1 + return float("inf") if k >= n else float(s[k]) + +# ── One seed: single walk-forward, both analyses ───────────────────────────── +def run_once(seed: int = 0, heavy_tail: bool = False, verbose: bool = True) -> dict: + d = make_timeseries(T=3500, seed=seed, heavy_tail=heavy_tail) + y, sig = d["y"], d["sigma"] + T_raw = len(y); z = z_alpha(ALPHA) + + pt_pred = defaultdict(list) + acc_y, acc_sig = [], [] + lo_acc = defaultdict(list); hi_acc = defaultdict(list); sig_acc = defaultdict(list) + per_lead = {m: defaultdict(list) for m in POINT_METHODS} # realized |resid| by lead + flat_pool = {m: [] for m in POINT_METHODS} # realized |resid| pooled + + current_train = N_LAGS + CAL_END + t0 = time.time(); n_chunks = 0 + + def cp_width(m, k, q_fallback): + pool = [] + for kk in range(max(0, k - LOCAL_WIN), k + LOCAL_WIN + 1): + pool.extend(per_lead[m].get(kk, [])) + if len(pool) >= CP_MIN_POOL: + q = _emp_q(pool, TARGET_COV) + if q is not None and np.isfinite(q): + return q + qf = _emp_q(flat_pool[m], TARGET_COV) if flat_pool[m] else None + return qf if (qf is not None and np.isfinite(qf)) else q_fallback + + while current_train < T_raw: + remaining = T_raw - current_train + implied_h = max(1, int(current_train * (1 - TRAINING_FRAC) / TRAINING_FRAC)) + h = min(remaining, implied_h) + y_hist = y[:current_train] + yb = y[current_train:current_train + h] + sgb = sig[current_train:current_train + h] + + try: + nns_pt, nat_lo, nat_hi, cal_resid = forecast_nns_block(y_hist, h) + except Exception as exc: + if verbose: print(f" [seed {seed} chunk {n_chunks+1} NNS failed: {exc}] skip") + current_train += h; continue + + preds = { + "NNS block": nns_pt, + "Ridge (recursive)": forecast_ridge_recursive(y_hist, h), + "Persistence": forecast_persistence(y_hist, h), + } + n_chunks += 1 + + q_flat_nns = _emp_q(cal_resid, TARGET_COV) + if q_flat_nns is None or not np.isfinite(q_flat_nns): + q_flat_nns = float(np.std(cal_resid) + 1e-6) + s_flat_nns = max(float(np.std(cal_resid)), 1e-8) + + # ── INTERVAL STUDY on the fixed NNS point ──────────────────────────── + lo_acc["NNS native PI"].extend(nat_lo); hi_acc["NNS native PI"].extend(nat_hi) + sig_acc["NNS native PI"].extend(np.maximum((nat_hi - nat_lo)/2.0, 1e-8)/z) + + lo_acc["NNS + split-CP (flat)"].extend(nns_pt - q_flat_nns) + hi_acc["NNS + split-CP (flat)"].extend(nns_pt + q_flat_nns) + sig_acc["NNS + split-CP (flat)"].extend(np.full(h, max(q_flat_nns/z, 1e-8))) + + lo_acc["NNS + Gaussian (flat)"].extend(nns_pt - z*s_flat_nns) + hi_acc["NNS + Gaussian (flat)"].extend(nns_pt + z*s_flat_nns) + sig_acc["NNS + Gaussian (flat)"].extend(np.full(h, s_flat_nns)) + + # ── per-lead split-CP for each point method (shared machinery) ─────── + for m in POINT_METHODS: + p = preds[m] + name = f"{m} + split-CP (per-lead)" if m != "NNS block" else "NNS + split-CP (per-lead)" + lo = np.empty(h); hi = np.empty(h); ss = np.empty(h) + for k in range(h): + qk = cp_width(m, k, q_flat_nns) + lo[k] = p[k] - qk; hi[k] = p[k] + qk; ss[k] = max(qk/z, 1e-8) + lo_acc[name].extend(lo); hi_acc[name].extend(hi); sig_acc[name].extend(ss) + + for m in POINT_METHODS: + pt_pred[m].extend(preds[m]) + acc_y.extend(yb); acc_sig.extend(sgb) + + # update pools AFTER scoring (now data <= next origin) + for m in POINT_METHODS: + r = np.abs(yb - preds[m]) + flat_pool[m].extend(r.tolist()) + for k in range(h): + per_lead[m][k].append(r[k]) + + current_train += h + + y_arr = np.asarray(acc_y); sig_arr = np.asarray(acc_sig) + + # POINT table + point_rows = [] + for m in POINT_METHODS: + err = np.asarray(pt_pred[m]) - y_arr + point_rows.append({"method": m, "MAE": float(np.mean(np.abs(err))), + "RMSE": float(np.sqrt(np.mean(err**2))), + "median_AE": float(np.median(np.abs(err)))}) + + # INTERVAL table + int_rows = [] + for name in lo_acc: + lo = np.asarray(lo_acc[name]); hi = np.asarray(hi_acc[name]); s_ = np.asarray(sig_acc[name]) + cbs = coverage_by_stratum(lo, hi, y_arr, sig_arr) + int_rows.append({ + "method": name, "marg_cov": coverage(lo, hi, y_arr), + "worst_win_cov": worst_window_coverage(lo, hi, y_arr), + "cov_lowvol": cbs[0], "cov_hivol": cbs[-1], + "cond_cov_gap": max(abs(c - TARGET_COV) for c in cbs if not math.isnan(c)), + "width": mean_width(lo, hi), "interval_score": interval_score(lo, hi, y_arr), + }) + + if verbose: + mae = {m: np.mean(np.abs(np.asarray(pt_pred[m]) - y_arr)) for m in POINT_METHODS} + print(f" [seed {seed}] {n_chunks} blocks MAE: " + + " ".join(f"{m}={mae[m]:.3f}" for m in POINT_METHODS) + f" ({time.time()-t0:.0f}s)") + + return {"point": pd.DataFrame(point_rows), "interval": pd.DataFrame(int_rows)} + +# ── Aggregate ──────────────────────────────────────────────────────────────── +def run_all() -> dict: + pts, ints = [], [] + for s in range(N_SEEDS): + print(f"\n=== seed {s} ===") + r = run_once(seed=s, verbose=True) + pts.append(r["point"]); ints.append(r["interval"]) + + point_dt = pd.concat(pts, ignore_index=True) + int_dt = pd.concat(ints, ignore_index=True) + point_agg = point_dt.groupby("method", sort=False).mean().reset_index().sort_values("RMSE") + int_agg = int_dt.groupby("method", sort=False).mean().reset_index().sort_values("interval_score") + + point_dt.to_csv("results/point_all.csv", index=False) + int_dt.to_csv("results/interval_all.csv", index=False) + point_agg.to_csv("results/point.csv", index=False) + int_agg.to_csv("results/interval.csv", index=False) + + pd.set_option("display.width", 200, "display.max_columns", 20) + print(f"\n=== POINT DUEL (block h-step, no online updating; mean over {N_SEEDS} seeds) ===\n") + print(point_agg.round(3).to_string(index=False)) + print(f"\n=== INTERVAL STUDY (point = NNS fixed; alpha={ALPHA}, target={TARGET_COV}) ===\n") + print(int_agg.round(3).to_string(index=False)) + print("\nWrote results/{point,interval}{,_all}.csv") + return {"point": point_agg, "interval": int_agg} + + +if __name__ == "__main__": + run_all()