diff --git a/changed_files.json b/changed_files.json index 2baefa4..2b0c8e1 100644 --- a/changed_files.json +++ b/changed_files.json @@ -1,3 +1,8 @@ [ - "R/ARMA.R" + "DESCRIPTION", + "NNS_13.0.tar.gz", + "NNS_13.0.zip", + "R/Multivariate_Regression.R", + "src/NNS.dll", + "tests/testthat/Rplots.pdf" ] diff --git a/sync/last_live_r_parity_report.md b/sync/last_live_r_parity_report.md index 09eb22d..90e47f2 100644 --- a/sync/last_live_r_parity_report.md +++ b/sync/last_live_r_parity_report.md @@ -3,12 +3,17 @@ - Plan: `sync/last_r_api_plan.json` - R checkout: `upstream/NNS` - Fresh cache requested: `False` -- Skip install: `False` -- Live R recompute: `True` +- Skip install: `True` +- Live R recompute: `False` -## Result: live R parity diverged +## Result: manual review required -Mapped parity tests recomputed every R value from the freshly installed live R NNS and the Python implementation did not match. Public Python behavior differs from live R at the recorded commit. +The plan reports unmapped R files. 
A human must extend `sync/r_api_map.json` before automated parity can run: -Failing command: `/opt/hostedtoolcache/Python/3.11.15/x64/bin/python -m pytest -q -n 0 tests/parity/test_practical_examples.py tests/parity/test_r13_smoke.py` -Exit status: `1` +- `R/Multivariate_Regression.R` + +## Workflow step outcome + +- `run_live_r_parity_for_changed_api.py` step outcome: `failure` +- Fresh cache requested: `false` +- DESCRIPTION changed: `true` diff --git a/sync/last_r_api_inspection.md b/sync/last_r_api_inspection.md index 79defb7..5b732db 100644 --- a/sync/last_r_api_inspection.md +++ b/sync/last_r_api_inspection.md @@ -2,25 +2,33 @@ ## Changed files -- `R/ARMA.R` +- `DESCRIPTION` +- `NNS_13.0.tar.gz` +- `NNS_13.0.zip` +- `R/Multivariate_Regression.R` +- `src/NNS.dll` +- `tests/testthat/Rplots.pdf` ## Affected Python modules -- `src/nns/arma.py` +- `pyproject.toml` +- `tests/_r_cache.json` +- `tools/NNS` ## Parity tests to run -- `tests/parity/test_practical_examples.py` -- `tests/parity/test_r13_smoke.py` +- `tests/parity` ## Cache scope -- `NNS.ARMA` -- `NNS.ARMA.optim` -- `NNS.VAR` +- None mapped ## Required actions -- Fresh cache required: `False` +- Fresh cache required: `True` - Export review required: `False` -- Unmapped R files present: `False` +- Unmapped R files present: `True` + +## Unmapped R files + +- `R/Multivariate_Regression.R` diff --git a/sync/last_r_api_plan.json b/sync/last_r_api_plan.json index 6fef8d5..b3e335c 100644 --- a/sync/last_r_api_plan.json +++ b/sync/last_r_api_plan.json @@ -1,22 +1,26 @@ { "changed_files": [ - "R/ARMA.R" + "DESCRIPTION", + "NNS_13.0.tar.gz", + "NNS_13.0.zip", + "R/Multivariate_Regression.R", + "src/NNS.dll", + "tests/testthat/Rplots.pdf" ], "affected_python_modules": [ - "src/nns/arma.py" + "pyproject.toml", + "tests/_r_cache.json", + "tools/NNS" ], "parity_tests": [ - "tests/parity/test_practical_examples.py", - "tests/parity/test_r13_smoke.py" + "tests/parity" ], - "cache_scope": [ - "NNS.ARMA", - 
"NNS.ARMA.optim", - "NNS.VAR" - ], - "requires_fresh_cache": false, + "cache_scope": [], + "requires_fresh_cache": true, "requires_export_review": false, - "has_unmapped_r_files": false, - "unmapped_r_files": [], + "has_unmapped_r_files": true, + "unmapped_r_files": [ + "R/Multivariate_Regression.R" + ], "warnings": [] } diff --git a/sync/nns_source.json b/sync/nns_source.json index 602ec68..81d29f9 100644 --- a/sync/nns_source.json +++ b/sync/nns_source.json @@ -1,8 +1,8 @@ { "r_repo": "OVVO-Financial/NNS", - "r_commit": "905b8bbd42b3236bf88aba7f18df7a9a378dbd7b", + "r_commit": "8183e964c941d9981e19b23dfbc9fc8336903d89", "r_version": "13.0", - "r_src_tree_hash": "654e411bd4e8caabfd57a1a4190eb1d97411e059", + "r_src_tree_hash": "e68bedaf44bce209a1f1a778d398adb5a84b57a6", "core_repo": "OVVO-Financial/NNS-core", "core_commit": "cfc25a3469df6460f9224fb976fcb58de9d58068", "python_repo": "OVVO-Financial/NNS-python", diff --git a/tools/NNS/.Rbuildignore b/tools/NNS/.Rbuildignore new file mode 100644 index 0000000..1a5d1d0 --- /dev/null +++ b/tools/NNS/.Rbuildignore @@ -0,0 +1,6 @@ +^.*\.Rproj$ +^\.Rproj\.user$ +^\.travis\.yml$ +^doc$ +^Meta$ +^\.github/ \ No newline at end of file diff --git a/tools/NNS/.github/downstream-sync.md b/tools/NNS/.github/downstream-sync.md new file mode 100644 index 0000000..b35a265 --- /dev/null +++ b/tools/NNS/.github/downstream-sync.md @@ -0,0 +1,73 @@ +# Downstream sync contract + +`OVVO-Financial/NNS` is the source of truth for the NNS implementation. It is +the truth for **both** the native C++ layer under `src/**` and the R API +behavior under `R/**`, `NAMESPACE`, and `DESCRIPTION`. + +Downstream flow (a diamond, not a single chain): + +```text +OVVO-Financial/NNS (source of truth: R + C++ src/**) + / \ + src/** native truth R API behavior truth + v v + NNS-core (portable C++) (tested directly) + \ | + v v + NNS-python <-------------+ +``` + +Two paths converge on `NNS-python`: + +* **Native path:** `src/**` is the native C++ truth. 
It flows to `NNS-core` + (the portable C++ layer), which in turn feeds `NNS-python`. +* **API path:** `R/**`, `NAMESPACE`, and `DESCRIPTION` define the R API + behavior truth, which is tested directly and flows straight to `NNS-python` + for API and parity review. + +`NNS-python` sits at the convergence of both paths and must satisfy both. + +## Triggers + +A push to `NNS-Beta-Version` dispatches downstream sync events based on changed files. + +| Changed path | Downstream action | +| ------------- | ----------------------------------------------------------------------------- | +| `src/**` | Dispatch `nns-r-src-updated` to `OVVO-Financial/NNS-core` | +| `R/**` | Dispatch `nns-r-api-or-version-updated` to `OVVO-Financial/NNS-python` | +| `NAMESPACE` | Dispatch `nns-r-api-or-version-updated` to `OVVO-Financial/NNS-python` | +| `DESCRIPTION` | Dispatch `nns-r-api-or-version-updated` and require downstream version review | + +A `DESCRIPTION` version change means downstream Python must perform fresh live-R parity-cache regeneration. + +## Required secret + +`OVVO_SYNC_TOKEN` must be a token or GitHub App installation token with permission to call `repository_dispatch` on: + +* `OVVO-Financial/NNS-core` +* `OVVO-Financial/NNS-python` + +The default `GITHUB_TOKEN` is intentionally not used for cross-repo dispatch, because it is scoped to this repository and cannot reliably trigger `repository_dispatch` on the downstream repositories. + +## Payload + +Dispatch payloads include: + +* `r_repo` +* `r_commit` +* `r_ref` +* `r_version` +* `r_src_tree_hash` +* `src_changed` +* `r_api_changed` +* `description_changed` +* `changed_files` + +## Expected downstream events + +* `nns-r-src-updated` — received by `OVVO-Financial/NNS-core` when `src/**` changes. +* `nns-r-api-or-version-updated` — received by `OVVO-Financial/NNS-python` when `R/**`, `NAMESPACE`, or `DESCRIPTION` changes. 
+ +## Rule + +No downstream sync is complete until each downstream repository records the exact upstream R commit and verifies its own build and parity gates. diff --git a/tools/NNS/.github/workflows/ci.yml b/tools/NNS/.github/workflows/ci.yml new file mode 100644 index 0000000..aaece3c --- /dev/null +++ b/tools/NNS/.github/workflows/ci.yml @@ -0,0 +1,119 @@ +name: R-CMD-check + +on: + push: + branches: ["NNS-Beta-Version"] + pull_request: + branches: ["NNS-Beta-Version"] + workflow_dispatch: + +jobs: + R-CMD-check: + runs-on: ${{ matrix.config.os }} + name: ${{ matrix.config.os }} (${{ matrix.config.r }}) + strategy: + fail-fast: false + matrix: + config: + - {os: windows-latest, r: 'release'} + - {os: windows-latest, r: 'devel'} + - {os: ubuntu-latest, r: 'release'} + - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} + - {os: macos-latest, r: 'release'} + + env: + R_REMOTES_NO_ERRORS_FROM_WARNINGS: true + _R_CHECK_FORCE_SUGGESTS_: false + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + RGL_USE_NULL: true + R_KEEP_PKG_SOURCE: yes + + steps: + - uses: actions/checkout@v4 + + - name: Install system libraries (Ubuntu) + if: runner.os == 'Linux' + run: | + sudo apt-get update + sudo apt-get install -y libgsl-dev libjpeg-turbo8-dev libpng-dev libtiff5-dev libfreetype6-dev libharfbuzz-dev libfribidi-dev xorg-dev + + - name: Install dependencies on macOS + if: runner.os == 'macOS' + run: | + brew update + brew install xquartz fribidi gsl + echo "PKG_CPPFLAGS=-I$(brew --prefix gsl)/include" >> "$GITHUB_ENV" + echo "PKG_LIBS=-L$(brew --prefix gsl)/lib -lgsl -lgslcblas -lm" >> "$GITHUB_ENV" + + - uses: r-lib/actions/setup-r@v2 + with: + r-version: ${{ matrix.config.r }} + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::rcmdcheck + needs: check + + - name: Check + uses: r-lib/actions/check-r-package@v2 + with: + args: 'c("--no-manual", "--ignore-vignettes", "--no-build-vignettes")' + error-on: '"error"' + + ubsan-clang-docker: + name: 
UBSAN clang + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Run UBSAN check inside container + run: | + docker run --rm --platform linux/x86_64 \ + --cap-add=SYS_PTRACE \ + -e UBSAN_OPTIONS="print_stacktrace=1:halt_on_error=0" \ + -e RGL_USE_NULL=true \ + -v "${GITHUB_WORKSPACE}:/data" -w /data \ + rocker/r-devel-ubsan-clang bash -lc ' + set -euo pipefail + + echo "::group::Update and Align System" + apt-get update -y + # This forces the image to sync its internal versions to avoid the libgbm1 conflict + DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y + echo "::endgroup::" + + echo "::group::Install system dependencies" + # Removed libgl1-mesa-dev and libglu1-mesa-dev to avoid the Sid conflict + DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ + cmake libgsl-dev libcurl4-openssl-dev libssl-dev libxml2-dev \ + libjpeg-dev libpng-dev libtiff5-dev libfreetype6-dev \ + libharfbuzz-dev libfribidi-dev + echo "::endgroup::" + + echo "::group::Install R package dependencies" + RD -q -e "install.packages(c(\"remotes\", \"rcmdcheck\", \"Rfast\", \"rgl\", \"recipes\"), repos=\"https://cloud.r-project.org\", Ncpus=2)" + RD -q -e "remotes::install_deps(dependencies = TRUE, repos=\"https://cloud.r-project.org\")" + echo "::endgroup::" + + echo "::group::Run R CMD check with UBSAN" + RD CMD check --no-manual --ignore-vignettes --no-build-vignettes . > ubsan_check.out 2> ubsan_check.err || true + cat ubsan_check.err + echo "::endgroup::" + ' + + - name: Check for UBSAN errors + run: | + if grep -iE "(runtime error|sanitizer)" ubsan_check.err; then + echo "::error::UBSAN detected runtime errors." 
+ exit 1 + fi + + - name: Upload UBSAN logs + if: always() + uses: actions/upload-artifact@v4 + with: + name: ubsan-logs + path: | + ubsan_check.out + ubsan_check.err + *.Rcheck/ diff --git a/tools/NNS/.github/workflows/downstream-sync-dispatch.yml b/tools/NNS/.github/workflows/downstream-sync-dispatch.yml new file mode 100644 index 0000000..dd93700 --- /dev/null +++ b/tools/NNS/.github/workflows/downstream-sync-dispatch.yml @@ -0,0 +1,144 @@ +name: Downstream sync dispatch + +on: + push: + branches: [NNS-Beta-Version] + workflow_dispatch: + +permissions: + contents: read + +jobs: + detect-and-dispatch: + runs-on: ubuntu-latest + steps: + - name: Check out R NNS repo + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Detect upstream changes + id: detect + shell: bash + run: | + set -euo pipefail + + if [ "${{ github.event_name }}" = "push" ] && [ -n "${{ github.event.before }}" ]; then + git diff --name-only "${{ github.event.before }}" "${{ github.sha }}" > changed_files.txt + elif git rev-parse HEAD^ >/dev/null 2>&1; then + git diff --name-only HEAD^ HEAD > changed_files.txt + else + git ls-files > changed_files.txt + fi + + if grep -q '^src/' changed_files.txt; then + echo "src_changed=true" >> "$GITHUB_OUTPUT" + else + echo "src_changed=false" >> "$GITHUB_OUTPUT" + fi + + if grep -Eq '^(R/|NAMESPACE$|DESCRIPTION$)' changed_files.txt; then + echo "r_api_changed=true" >> "$GITHUB_OUTPUT" + else + echo "r_api_changed=false" >> "$GITHUB_OUTPUT" + fi + + if grep -q '^DESCRIPTION$' changed_files.txt; then + echo "description_changed=true" >> "$GITHUB_OUTPUT" + else + echo "description_changed=false" >> "$GITHUB_OUTPUT" + fi + + version="$(awk -F': *' '$1 == "Version" { print $2 }' DESCRIPTION)" + echo "r_version=${version}" >> "$GITHUB_OUTPUT" + + if git rev-parse "${GITHUB_SHA}:src" >/dev/null 2>&1; then + src_hash="$(git rev-parse "${GITHUB_SHA}:src")" + else + src_hash="no-src-directory" + fi + echo "r_src_tree_hash=${src_hash}" >> "$GITHUB_OUTPUT" + 
+ python - <<'PY' + import json + from pathlib import Path + + files = [ + line.strip() + for line in Path("changed_files.txt").read_text().splitlines() + if line.strip() + ] + Path("changed_files.json").write_text(json.dumps(files), encoding="utf-8") + PY + + - name: Dispatch NNS-core sync + if: steps.detect.outputs.src_changed == 'true' + env: + GH_TOKEN: ${{ secrets.OVVO_SYNC_TOKEN }} + R_VERSION: ${{ steps.detect.outputs.r_version }} + R_SRC_TREE_HASH: ${{ steps.detect.outputs.r_src_tree_hash }} + SRC_CHANGED: ${{ steps.detect.outputs.src_changed }} + R_API_CHANGED: ${{ steps.detect.outputs.r_api_changed }} + DESCRIPTION_CHANGED: ${{ steps.detect.outputs.description_changed }} + shell: bash + run: | + set -euo pipefail + jq -n \ + --arg r_repo "OVVO-Financial/NNS" \ + --arg r_commit "${GITHUB_SHA}" \ + --arg r_ref "${GITHUB_REF_NAME}" \ + --arg r_version "${R_VERSION}" \ + --arg r_src_tree_hash "${R_SRC_TREE_HASH}" \ + --argjson src_changed "${SRC_CHANGED}" \ + --argjson r_api_changed "${R_API_CHANGED}" \ + --argjson description_changed "${DESCRIPTION_CHANGED}" \ + --slurpfile changed_files changed_files.json \ + '{event_type: "nns-r-src-updated", + client_payload: { + r_repo: $r_repo, + r_commit: $r_commit, + r_ref: $r_ref, + r_version: $r_version, + r_src_tree_hash: $r_src_tree_hash, + src_changed: $src_changed, + r_api_changed: $r_api_changed, + description_changed: $description_changed, + changed_files: $changed_files[0] + }}' > payload.json + gh api --method POST repos/OVVO-Financial/NNS-core/dispatches --input payload.json + + - name: Dispatch NNS-python API or version review + if: steps.detect.outputs.r_api_changed == 'true' || steps.detect.outputs.description_changed == 'true' + env: + GH_TOKEN: ${{ secrets.OVVO_SYNC_TOKEN }} + R_VERSION: ${{ steps.detect.outputs.r_version }} + R_SRC_TREE_HASH: ${{ steps.detect.outputs.r_src_tree_hash }} + SRC_CHANGED: ${{ steps.detect.outputs.src_changed }} + R_API_CHANGED: ${{ steps.detect.outputs.r_api_changed }} + 
DESCRIPTION_CHANGED: ${{ steps.detect.outputs.description_changed }} + shell: bash + run: | + set -euo pipefail + jq -n \ + --arg r_repo "OVVO-Financial/NNS" \ + --arg r_commit "${GITHUB_SHA}" \ + --arg r_ref "${GITHUB_REF_NAME}" \ + --arg r_version "${R_VERSION}" \ + --arg r_src_tree_hash "${R_SRC_TREE_HASH}" \ + --argjson src_changed "${SRC_CHANGED}" \ + --argjson r_api_changed "${R_API_CHANGED}" \ + --argjson description_changed "${DESCRIPTION_CHANGED}" \ + --slurpfile changed_files changed_files.json \ + '{event_type: "nns-r-api-or-version-updated", + client_payload: { + r_repo: $r_repo, + r_commit: $r_commit, + r_ref: $r_ref, + r_version: $r_version, + r_src_tree_hash: $r_src_tree_hash, + src_changed: $src_changed, + r_api_changed: $r_api_changed, + description_changed: $description_changed, + changed_files: $changed_files[0] + }}' > payload.json + gh api --method POST repos/OVVO-Financial/NNS-python/dispatches --input payload.json diff --git a/tools/NNS/.github/workflows/pkgdown.yaml b/tools/NNS/.github/workflows/pkgdown.yaml new file mode 100644 index 0000000..ef32354 --- /dev/null +++ b/tools/NNS/.github/workflows/pkgdown.yaml @@ -0,0 +1,65 @@ +# Workflow from official r-lib/actions (updated for headless rgl) +name: pkgdown + +on: + push: + branches: [NNS-Beta-Version] + pull_request: + branches: [NNS-Beta-Version] + release: + types: [published] + workflow_dispatch: + +permissions: read-all + +jobs: + pkgdown: + runs-on: ubuntu-latest + # Only restrict concurrency for non-PR jobs + concurrency: + group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} + env: + GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} + permissions: + contents: write + steps: + - uses: actions/checkout@v4 + + - uses: r-lib/actions/setup-pandoc@v2 + + - uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - uses: r-lib/actions/setup-r-dependencies@v2 + with: + extra-packages: any::pkgdown, any::bookdown, local::. 
+ needs: website + + - name: Configure headless rgl + run: echo "options(rgl.useNULL = TRUE)" >> ~/.Rprofile + shell: bash + + - name: Build site + run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) + shell: Rscript {0} + + - name: Build book + run: | + old <- setwd("book") + on.exit(setwd(old), add = TRUE) + bookdown::render_book("index.Rmd", output_format = "bookdown::gitbook") + shell: Rscript {0} + + - name: Stage book for Pages + run: | + mkdir -p docs/book + rsync -a --delete book/_book/ docs/book/ + + - name: Deploy to GitHub pages 🚀 + if: github.event_name != 'pull_request' + uses: JamesIves/github-pages-deploy-action@v4.5.0 + with: + clean: true + branch: gh-pages + folder: docs diff --git a/tools/NNS/.gitignore b/tools/NNS/.gitignore new file mode 100644 index 0000000..33d48bf --- /dev/null +++ b/tools/NNS/.gitignore @@ -0,0 +1,9 @@ +tests/testthat/*.pdf +.Rproj.user +.RData +.Rhistory +NNS.Rproj +src/*.o +src/*.dll +src/*.so +Rplots.pdf diff --git a/tools/NNS/.travis.yml b/tools/NNS/.travis.yml new file mode 100644 index 0000000..6db0713 --- /dev/null +++ b/tools/NNS/.travis.yml @@ -0,0 +1,12 @@ +language: R +sudo: false +warnings_are_errors: false +cache: packages +branches: + only: + - NNS-Beta-Version +r_binary_packages: + - RcppGSL + - RcppZiggurat + - Rfast + diff --git a/tools/NNS/DESCRIPTION b/tools/NNS/DESCRIPTION index f98d8fe..c03b5fe 100644 --- a/tools/NNS/DESCRIPTION +++ b/tools/NNS/DESCRIPTION @@ -2,7 +2,7 @@ Package: NNS Type: Package Title: Nonlinear Nonparametric Statistics Version: 13.0 -Date: 2026-06-10 +Date: 2026-06-24 Authors@R: c( person("Fred", "Viole", role=c("aut","cre"), email="ovvo.open.source@gmail.com"), person("Roberto", "Spadim", role="ctb"), @@ -14,17 +14,14 @@ BugReports: https://github.com/OVVO-Financial/NNS/issues License: GPL-3 URL: https://github.com/OVVO-Financial/NNS Depends: R (>= 3.6.0) -Imports: data.table, doParallel, foreach, Rcpp, RcppParallel, Rfast, - rgl, xts, zoo -Suggests: knitr, 
rmarkdown, testthat (>= 3.0.0) +Imports: data.table, doParallel, foreach, Rcpp, RcppParallel, Rfast, rgl, xts, zoo +Suggests: + knitr, + rmarkdown, + testthat (>= 3.0.0) VignetteBuilder: knitr LinkingTo: Rcpp, RcppParallel SystemRequirements: GNU make Config/testthat/edition: 3 RoxygenNote: 7.2.3 Encoding: UTF-8 -NeedsCompilation: yes -Packaged: 2026-06-11 03:14:43 UTC; fredv -Author: Fred Viole [aut, cre], - Roberto Spadim [ctb], - Rasheed Khoshnaw [ctb] diff --git a/tools/NNS/Meta/vignette.rds b/tools/NNS/Meta/vignette.rds new file mode 100644 index 0000000..7dfe6ef Binary files /dev/null and b/tools/NNS/Meta/vignette.rds differ diff --git a/tools/NNS/NNS_13.0.tar.gz b/tools/NNS/NNS_13.0.tar.gz new file mode 100644 index 0000000..cd05944 Binary files /dev/null and b/tools/NNS/NNS_13.0.tar.gz differ diff --git a/tools/NNS/NNS_13.0.zip b/tools/NNS/NNS_13.0.zip new file mode 100644 index 0000000..3c5a119 Binary files /dev/null and b/tools/NNS/NNS_13.0.zip differ diff --git a/tools/NNS/R/ARMA.R b/tools/NNS/R/ARMA.R index a432b2e..8e86aed 100644 --- a/tools/NNS/R/ARMA.R +++ b/tools/NNS/R/ARMA.R @@ -81,8 +81,10 @@ NNS.ARMA <- function(variable, options(warn = -1) if(!is.null(best.periods) && !is.numeric(seasonal.factor)) seasonal.factor <- FALSE - mc <- match.call() - label <- deparse(mc$variable) + # label is only used for plot axis titles; deparse(match.call()) is otherwise + # pure per-call overhead (NNS.ARMA.optim calls NNS.ARMA hundreds of times with + # plot = FALSE). Defer it to the plotting branch. 
+ label <- if (isTRUE(plot)) deparse(match.call()$variable) else NULL variable <- as.numeric(variable) OV <- variable @@ -107,13 +109,17 @@ NNS.ARMA <- function(variable, lag <- seasonal.factor output <- numeric(length(seasonal.factor)) for(i in 1 : length(seasonal.factor)){ - rev.var <- variable[seq(length(variable), 1, -i)] + rev.var <- variable[seq(length(variable), 1, -seasonal.factor[i])] output[i] <- abs(sd(rev.var) / mean(rev.var)) } if(is.null(weights)){ Relative.seasonal <- output / abs(sd(variable)/mean(variable)) - Seasonal.weighting <- 1 / Relative.seasonal + # Floor the CV ratio: a perfectly stable lag-subsample (CV 0) in an otherwise- + # varying series gives Relative.seasonal 0 -> 1/0 = Inf -> Inf/Inf = NaN in the + # normalisation below, silently poisoning the forecast. pmax keeps a fully + # constant series (Relative.seasonal NaN) propagating NaN, matching prior behaviour. + Seasonal.weighting <- 1 / pmax(Relative.seasonal, .Machine$double.eps) Observation.weighting <- 1 / sqrt(seasonal.factor) Weights <- (Seasonal.weighting * Observation.weighting) / sum(Observation.weighting * Seasonal.weighting) seasonal.plot <- FALSE @@ -219,17 +225,18 @@ NNS.ARMA <- function(variable, x <- Component.index[[i]] y <- Component.series[[i]] - last.y <- tail(y, 1) + last.y <- y[length(y)] reg.points <- NNS.reg(x, y, return.values = FALSE, plot = FALSE, multivariate.call = TRUE) reg.points <- reg.points[complete.cases(reg.points), ] - xs <- tail(reg.points$x, 1) - reg.points$x - ys <- tail(reg.points$y, 1) - reg.points$y + rpx <- reg.points$x; rpy <- reg.points$y + xs <- rpx[length(rpx)] - rpx + ys <- rpy[length(rpy)] - rpy - xs <- head(xs, -1) - ys <- head(ys, -1) + xs <- xs[-length(xs)] + ys <- ys[-length(ys)] run <- mean(rep(xs, (1:length(xs))^2)) rise <- mean(rep(ys, (1:length(ys))^2)) @@ -243,8 +250,9 @@ NNS.ARMA <- function(variable, if ((method %in% c("lin", "both", "means")) || is.numeric(pred.int)) { Lin.Regression.Estimates <- sapply(seq_along(lag), 
function(i) { - last.x <- tail(Component.index[[i]], 1) - lin.reg <- fast_lm(Component.index[[i]], Component.series[[i]]) + ci <- Component.index[[i]] + last.x <- ci[length(ci)] + lin.reg <- fast_lm(ci, Component.series[[i]]) coefs <- lin.reg$coef return(as.numeric(coefs[1] + coefs[2] * (last.x + 1))) }) @@ -270,7 +278,10 @@ NNS.ARMA <- function(variable, } if(!is.null(pred.int)){ - if (method != "means") lin.resid <- mean(abs(Lin.Regression.Estimates - mean(Lin.Regression.Estimates))) + if (method != "means"){ + Lin.Regression.Estimates <- unlist(Lin.Regression.Estimates) + lin.resid <- mean(abs(Lin.Regression.Estimates - mean(Lin.Regression.Estimates))) + } PIs <- do.call(cbind, NNS.MC(Estimates, lower_rho = -1, upper_rho = 1, by = .2)$replicates) lin.resid <- mean(unlist(lin.resid)) lin.resid[is.na(lin.resid)] <- 0 @@ -354,4 +365,4 @@ NNS.ARMA <- function(variable, } else { return(Estimates) } -} \ No newline at end of file +} diff --git a/tools/NNS/R/Dependence.R b/tools/NNS/R/Dependence.R index 294af6c..9221723 100644 --- a/tools/NNS/R/Dependence.R +++ b/tools/NNS/R/Dependence.R @@ -131,7 +131,25 @@ NNS.dep <- function(x, l <- length(x) obs <- max(8L, as.integer(l / 8L)) - + + # Native fast path: NNS_dep_pair_cpp only needs the two quadrant vectors, so + # call NNS_part_cpp in quadrants-only mode and skip the data.table wrappers + # NNS.part() builds. Quadrants are identical to the full path, so the result + # is bit-identical. Only when not plotting (Voronoi/print.map needs the full + # partition). Gated by getOption("NNS.native"). 
+ if (isTRUE(getOption("NNS.native", TRUE)) && !isTRUE(print.map)) { + ord <- max(ceiling(log(l, 2)), 1L) # NNS.part default order + qxy <- NNS_part_cpp(x, y, type = "XONLY", order_in = as.integer(ord), + obs_req = as.integer(obs), min_obs_stop = FALSE, + noise_reduction = "off", quadrants_only = TRUE)$quadrant + qyx <- NNS_part_cpp(y, x, type = "XONLY", order_in = as.integer(ord), + obs_req = as.integer(obs), min_obs_stop = FALSE, + noise_reduction = "off", quadrants_only = TRUE)$quadrant + return(NNS_dep_pair_cpp(x = as.numeric(x), y = as.numeric(y), + quad_xy = as.character(qxy), quad_yx = as.character(qyx), + asym = isTRUE(asym))) + } + PART_xy <- suppressWarnings( NNS.part(x, y, order = NULL, obs.req = obs, min.obs.stop = FALSE, type = "XONLY", Voronoi = print.map) diff --git a/tools/NNS/R/Multivariate_Regression.R b/tools/NNS/R/Multivariate_Regression.R index 749213d..a4888be 100644 --- a/tools/NNS/R/Multivariate_Regression.R +++ b/tools/NNS/R/Multivariate_Regression.R @@ -10,6 +10,16 @@ NNS.M.reg <- function (X_n, Y, factor.2.dummy = TRUE, order = NULL, n.best = NUL original.DV <- Y n <- ncol(original.IVs) + feature.names <- colnames(original.IVs) + if(is.null(feature.names)){ + feature.names <- paste0("x", 1:n) + } else { + missing.feature.names <- is.na(feature.names) | feature.names == "" + feature.names[missing.feature.names] <- paste0("x", which(missing.feature.names)) + feature.names <- make.unique(feature.names, sep = ".") + } + colnames(original.IVs) <- feature.names + if(is.null(ncol(X_n))) X_n <- t(t(X_n)) if(is.null(names(Y))){ @@ -30,6 +40,7 @@ NNS.M.reg <- function (X_n, Y, factor.2.dummy = TRUE, order = NULL, n.best = NUL if(ncol(point.est) != n){ stop("Please ensure 'point.est' is of compatible dimensions to 'x'") } + colnames(point.est) <- feature.names } original.matrix <- cbind.data.frame(original.DV, original.IVs) @@ -67,10 +78,7 @@ NNS.M.reg <- function (X_n, Y, factor.2.dummy = TRUE, order = NULL, n.best = NUL reg.points.matrix <- 
do.call('cbind', lapply(reg.points, `length<-`, max(lengths(reg.points)))) } - if(is.null(colnames(original.IVs))){ - colnames.list <- lapply(1 : ncol(original.IVs), function(i) paste0("x", i)) - colnames(reg.points.matrix) <- as.character(colnames.list) - } + colnames(reg.points.matrix) <- feature.names if(is.numeric(order) || is.null(order)) reg.points.matrix <- unique(reg.points.matrix) @@ -138,7 +146,7 @@ NNS.M.reg <- function (X_n, Y, factor.2.dummy = TRUE, order = NULL, n.best = NUL REGRESSION.POINT.MATRIX <- REGRESSION.POINT.MATRIX[, .SD[1], by = NNS.ID] REGRESSION.POINT.MATRIX <- REGRESSION.POINT.MATRIX[, .SD, .SDcols = colnames(mean.by.id.matrix)%in%c(paste("RPM", 1:n), "y.hat")] - data.table::setnames(REGRESSION.POINT.MATRIX, 1:n, colnames(mean.by.id.matrix)[1:n]) + data.table::setnames(REGRESSION.POINT.MATRIX, 1:n, feature.names) if(is.null(n.best)){ dependence <- NNS.copula(cbind(original.IVs, original.DV)) diff --git a/tools/NNS/R/Partition_Map.R b/tools/NNS/R/Partition_Map.R index 0a6d14c..f68e88c 100644 --- a/tools/NNS/R/Partition_Map.R +++ b/tools/NNS/R/Partition_Map.R @@ -14,7 +14,7 @@ #' \itemize{ #' \item{\code{"dt"}} a \code{data.table} of \code{x} and \code{y} observations with their partition assignment \code{"quadrant"} in the 3rd column and their prior partition assignment \code{"prior.quadrant"} in the 4th column. #' \item{\code{"regression.points"}} the \code{data.table} of regression points for that given \code{(order = ...)}. -#' \item{\code{"order"}} the \code{order} of the final partition given \code{"min.obs.stop"} stopping condition. +#' \item{\code{"order"}} the \code{order} of the final partition given \code{"min.obs.stop"} stopping condition. #' } #' #' @note \code{min.obs.stop = FALSE} will not generate regression points due to unequal partitioning of quadrants from individual cluster observations. 
@@ -75,7 +75,7 @@ NNS.part <- function(x, y, Voronoi = FALSE, type = NULL, data.table::setorder(RP, quadrant) - if (is.discrete(x)) RP[, x := ifelse(x %% 1 < 0.5, floor(x), ceiling(x))] + if (is.discrete(x)) RP[is.finite(x), x := ifelse(x %% 1 < 0.5, floor(x), ceiling(x))] if (isTRUE(Voronoi)) { mc <- match.call(); x.label <- deparse(mc$x); y.label <- deparse(mc$y) diff --git a/tools/NNS/R/RcppExports.R b/tools/NNS/R/RcppExports.R index 39f700b..41bb0c4 100644 --- a/tools/NNS/R/RcppExports.R +++ b/tools/NNS/R/RcppExports.R @@ -93,8 +93,8 @@ factor_2_dummy_FR <- function(x) { .Call(`_NNS_factor_2_dummy_FR`, x) } -generate.vectors <- function(x, l) { - .Call(`_NNS_generate_vectors`, x, l) +generate.vectors <- function(x, l, len = -1L) { + .Call(`_NNS_generate_vectors`, x, l, len) } generate.lin.vectors <- function(x, l, h = 1L) { @@ -125,6 +125,10 @@ upSample <- function(x, y, list = FALSE, yname = "Class") { .Call(`_NNS_upSample`, x, y, list, yname) } +NNS_reg_points_cpp <- function(x_, y_, rpx_, rpy_, dependence, stn) { + .Call(`_NNS_NNS_reg_points_cpp`, x_, y_, rpx_, rpy_, dependence, stn) +} + CoLPM_nD_batch_RCPP <- function(data, targets, degree = 0.0, norm = TRUE) { .Call(`_NNS_CoLPM_nD_batch_RCPP`, data, targets, degree, norm) } diff --git a/tools/NNS/R/Regression.R b/tools/NNS/R/Regression.R index b04e580..89e1c53 100644 --- a/tools/NNS/R/Regression.R +++ b/tools/NNS/R/Regression.R @@ -150,7 +150,52 @@ NNS.reg = function (x, y, oldw <- getOption("warn") options(warn = -1) - + + # ---- Lean fast path for multivariate callers (NNS.ARMA / NNS.stack / NNS.boost) ---- + # Those call NNS.reg(x, y, multivariate.call = TRUE) on numeric vectors many + # times. Bypass the full argument-marshalling setup below (match.call/deparse, + # factor / dim-reduction handling, frame conversions) and go straight to the + # XONLY partition + native regression points. Bit-identical to the full path + # for the configuration it handles; every other case falls through unchanged. 
+ if (isTRUE(getOption("NNS.native", TRUE)) && + isTRUE(multivariate.call) && is.null(type) && is.null(dim.red.method) && + !isTRUE(smooth) && identical(noise.reduction, "off") && + is.null(dim(x)) && is.null(dim(y)) && is.numeric(x) && is.numeric(y) && + (is.null(order) || (is.numeric(order) && length(order) == 1)) && + !anyNA(x) && !anyNA(y) && + !(is.discrete(y) && length(unique(y)) < sqrt(length(y)))) { + + xv <- as.double(x) + yv <- as.numeric(y) + + dependence <- tryCatch(NNS.dep(xv, yv, print.map = FALSE, asym = TRUE)$Dependence, error = function(e) .1) + dependence <- tryCatch(mean(c(dependence, NNS.copula(cbind(apply(cbind(xv, xv, yv), 2, function(z) NNS.rescale(z, 0, 1)))))), error = function(e) dependence) + dependence[is.na(dependence)] <- 0.1 + + rounded_dep <- ifelse(dependence*10 %% 1 < .5, floor(dependence * 10), ceiling(dependence * 10)) + if(length(yv) < 100){ + rounded_dep <- rounded_dep / 2 + rounded_dep <- floor(rounded_dep) + } + rounded_dep <- max(1, rounded_dep) + dep.reduced.order <- max(1, ifelse(is.null(order), rounded_dep, order)) + + if (dependence != 1 && !identical(dep.reduced.order, "max") && dependence < 1) { + pm <- .NNS.reg.part.xonly(xv, yv, dep.reduced.order) + if (length(pm$x) == 0) { + pm <- .NNS.reg.part.xonly(xv, yv, min(nchar(pm$dt_quadrant))) + } + if (length(pm$x) > 0) { + res <- NNS_reg_points_cpp(xv, yv, + as.numeric(pm$x), as.numeric(pm$y), + as.numeric(dependence), 0.95) + data.table::setDT(res) + return(res) + } + } + # dependence == 1 / order == "max" / empty partition -> fall through to full path + } + if(anyNA(cbind(x,y))) stop("You have some missing values, please address.") if(plot.regions && !is.null(order) && order == "max") stop('Please reduce the "order" or set "plot.regions = FALSE".') @@ -480,8 +525,28 @@ NNS.reg = function (x, y, } } + # Native fast path for multivariate callers (NNS.ARMA / NNS.stack / NNS.boost). 
+ # Reproduces the regression.points pipeline below in C++, avoiding the per-call + # data.table overhead that dominates when NNS.reg is invoked many times on small + # inputs. Bit-identical to the pure-R path; gated by getOption("NNS.native"). + # Only the configuration the pure-R block reaches here is handled natively: + # type = NULL, noise.reduction = "off", dependence < 1, dep.reduced.order != "max". + if (isTRUE(getOption("NNS.native", TRUE)) && + multivariate.call && is.null(type) && !isTRUE(smooth) && + identical(noise.reduction, "off") && !is.character(order) && + is.numeric(dependence) && length(dependence) == 1 && dependence < 1 && + !identical(dep.reduced.order, "max") && + length(part.map$regression.points$x) > 0) { + res <- NNS_reg_points_cpp(as.numeric(x), as.numeric(y), + as.numeric(part.map$regression.points$x), + as.numeric(part.map$regression.points$y), + as.numeric(dependence), stn) + data.table::setDT(res) + return(res) + } + nns.ids <- part.map$dt$quadrant - + if(length(part.map$dt$y) > length(y)){ part.map$dt$x <- pmax(min(x), pmin(part.map$dt$x, max(x))) part.map$dt[, y := gravity(y), by = "x"] @@ -710,9 +775,15 @@ NNS.reg = function (x, y, regression.points$y <- pmin(pmax(regression.points$y, min(y)), max(y)) if(!is.null(type) && type=="class") regression.points$y <- pmax(min(y), pmin(max(y), ifelse(regression.points$y %% 1 < 0.5, floor(regression.points$y), ceiling(regression.points$y)))) - - - # Coefficients + + # Multivariate callers (e.g. NNS.ARMA, NNS.stack, NNS.boost) only consume the + # consolidated regression points. regression.points is final at this stage; + # the downstream Regression.Coefficients / fitted-value computation below does + # not mutate it, so return early to skip that wasted work on every call. 
+ if (multivariate.call) return(regression.points[, .(x, y)]) + + + # Coefficients Regression.Coefficients <- regression.points[, .(rise, run)] Regression.Coefficients <- Regression.Coefficients[complete.cases(Regression.Coefficients), ] upper.x <- regression.points[(2:.N), x] @@ -954,4 +1025,26 @@ NNS.reg = function (x, y, "Fitted.xy" = fitted)) } -} \ No newline at end of file +} + +# Lean XONLY partition for the native NNS.reg fast path: calls NNS_part_cpp +# directly and returns the regression points as plain vectors, skipping the +# data.table construction / setorder / coercion done by NNS.part(). Ordering +# by quadrant uses radix (C locale), matching data.table::setorder; discrete-x +# rounding mirrors NNS.part() (round finite values only, preserving infinities). +# Bit-identical to NNS.part(...)$regression.points. Defined after NNS.reg (not +# between its roxygen block and definition) so the docs stay attached to NNS.reg. +.NNS.reg.part.xonly <- function(x, y, ord) { + out <- NNS_part_cpp(x = x, y = y, type = "XONLY", + order_in = as.integer(ord), obs_req = 0L, + min_obs_stop = TRUE, noise_reduction = "off") + rp <- out[["regression.points"]] + o <- base::order(rp$quadrant, method = "radix") + rpx <- rp$x[o] + rpy <- rp$y[o] + if (is.discrete(x)) { + finite <- is.finite(rpx) + rpx[finite] <- ifelse(rpx[finite] %% 1 < 0.5, floor(rpx[finite]), ceiling(rpx[finite])) + } + list(x = rpx, y = rpy, dt_quadrant = out$dt$quadrant) +} diff --git a/tools/NNS/_pkgdown.yml b/tools/NNS/_pkgdown.yml new file mode 100644 index 0000000..d0c857e --- /dev/null +++ b/tools/NNS/_pkgdown.yml @@ -0,0 +1,33 @@ +url: https://OVVO-Financial.github.io/NNS + +template: + bootstrap: 5 + bootswatch: cosmo + +development: + mode: auto + +articles: +- title: "Getting Started" + contents: + - NNSvignette_01_Overview + +- title: "Foundations" + contents: + - NNSvignette_02_Partial_Moments + - NNSvignette_03_Correlation_and_Dependence + - NNSvignette_04_Normalization_and_Rescaling + - 
NNSvignette_05_Sampling + +- title: "Inference" + contents: + - NNSvignette_06_Comparing_Distributions + +- title: "Modeling" + contents: + - NNSvignette_07_Clustering_and_Regression + - NNSvignette_08_Classification + +- title: "Applications" + contents: + - NNSvignette_09_Forecasting diff --git a/tools/NNS/book/NNS_Book.Rproj b/tools/NNS/book/NNS_Book.Rproj new file mode 100644 index 0000000..827cca1 --- /dev/null +++ b/tools/NNS/book/NNS_Book.Rproj @@ -0,0 +1,15 @@ +Version: 1.0 + +RestoreWorkspace: Default +SaveWorkspace: Default +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 2 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX + +BuildType: Website diff --git a/tools/NNS/book/_bookdown.yml b/tools/NNS/book/_bookdown.yml new file mode 100644 index 0000000..2977557 --- /dev/null +++ b/tools/NNS/book/_bookdown.yml @@ -0,0 +1,37 @@ +book_filename: "nns-book" +new_session: false +delete_merged_file: true +language: + ui: + chapter_name: "Chapter " +rmd_files: + - index.Rmd + - chapter-01-why-classical-statistics-breaks.Rmd + - chapter-02-directional-deviation-operators.Rmd + - chapter-03-distribution-theory-from-partial-moments.Rmd + - chapter-04-numerical-integration-via-partial-moments.Rmd + - chapter-05-classical-moments-as-directional-aggregates.Rmd + - chapter-06-measure-theoretic-interpretation.Rmd + - chapter-07-directional-descriptive-statistics.Rmd + - chapter-08-distribution-estimation.Rmd + - chapter-09-why-correlation-fails.Rmd + - chapter-10-directional-dependence.Rmd + - chapter-11-directional-spectral-decomposition.Rmd + - chapter-12-copula-interpretation.Rmd + - chapter-13-conditional-probability-and-bayes-theorem.Rmd + - chapter-14-directional-causation.Rmd + - chapter-15-distribution-comparison.Rmd + - chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd + - chapter-17-prediction-intervals.Rmd + - chapter-18-recursive-mean-split-estimation.Rmd + - 
chapter-19-dynamic-bandwidth-interpretation.Rmd + - chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd + - chapter-21-clustering.Rmd + - chapter-22-nonparametric-regression.Rmd + - chapter-23-classification.Rmd + - chapter-24-ensemble-methods.Rmd + - chapter-25-nonparametric-time-series-models.Rmd + - chapter-26-multivariate-forecasting.Rmd + - chapter-27-conclusion-and-next-steps.Rmd + - chapter-28-appendix-notation-and-function-reference.Rmd + \ No newline at end of file diff --git a/tools/NNS/book/_output.yml b/tools/NNS/book/_output.yml new file mode 100644 index 0000000..edb6159 --- /dev/null +++ b/tools/NNS/book/_output.yml @@ -0,0 +1,16 @@ +bookdown::gitbook: + css: style.css + split_by: chapter + config: + toc: + collapse: subsection + before: | +
  • NNS Book
  • + +bookdown::pdf_book: + keep_tex: true + latex_engine: xelatex + pandoc_args: + - "--pdf-engine-opt=-8bit" + +bookdown::epub_book: default \ No newline at end of file diff --git a/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd b/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd new file mode 100644 index 0000000..0a248e7 --- /dev/null +++ b/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd @@ -0,0 +1,203 @@ +# Why Classical Statistics Breaks + +Statistics was designed for a world that rarely exists. + +The classical statistical framework was built during a time when data were scarce, computation was expensive, and tractable mathematical models were essential. In that environment, simplifying assumptions were not merely convenient—they were necessary. + +Symmetry simplified algebra, linearity simplified inference, and parametric distributions simplified estimation. The result was a remarkably elegant mathematical framework that dominated statistics for over a century. + +Yet the real world is rarely so cooperative: relationships are often nonlinear, and observed distributions are frequently skewed, heavy-tailed, or otherwise far from normal. Modern data therefore repeatedly violate the assumptions upon which classical statistics was constructed. + +A familiar example is daily asset returns: even broad equity indexes exhibit fat tails and occasional abrupt drawdowns that are poorly captured by Gaussian models. + +This book begins with a simple observation: **many of the core tools of classical statistics fail because they collapse directional information into symmetric aggregates.** Once this collapse occurs, important structural information about the data is permanently lost. The purpose of this chapter is to explain why this happens—and why a different statistical primitive is needed. + +--- + +## The Hidden Assumption of Symmetry + +Most statistical quantities treat deviations from a reference point symmetrically. 
+ +Consider the most familiar measure of variability: + +\[ +Var(X) = E[(X-\mu)^2] +\] + +The formula squares deviations from the mean and averages them. Positive and negative deviations contribute equally. + +But real systems often care deeply about **which direction a deviation occurs**. + +A negative financial return is not equivalent to a positive return of the same magnitude. +A forecast that underestimates demand may be far more costly than one that overestimates it. +A loss relative to a benchmark is not psychologically equivalent to a gain. + +Yet classical statistics treats these deviations identically. + +The symmetry is not inherent to the data. +It is imposed by the mathematical formulation. + +And once imposed, directional information disappears. + +--- + +## Aggregation Before Observation + +To see what is lost, rewrite variance by separating positive and negative deviations. + +Define the positive-part operator + +\[ +x^{+} = \max(x,0) +\] + +Then variance can be written as + +\[ +Var(X) = +E[(X-\mu)^2_+] + E[(\mu-X)^2_+]. +\] + +This decomposition shows that variance is actually the **sum of two directional quantities**: + +- upside deviation +- downside deviation + +Variance reports only their sum. + +Two distributions can therefore have identical variance while possessing completely different directional structures. + +One distribution may have large upside volatility and little downside risk. +Another may have the opposite profile. + +Variance cannot distinguish them. + +The symmetric statistic is therefore a **projection** of a richer directional structure. + +Mathematically, the directional components determine the symmetric moment uniquely, but the symmetric moment cannot recover the directional components without additional assumptions. + +Classical moments therefore aggregate directional information before reporting the result. +Once aggregated, the original directional structure cannot generally be recovered. 
+ +--- + +## The Problem with Linear Dependence + +The same issue appears in dependence measurement. + +The classical correlation coefficient measures the strength of a linear relationship: + +\[ +\rho(X,Y)=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}. +\] + +Correlation works well when relationships are approximately linear. + +But many relationships are not. + +Two variables may exhibit strong dependence through nonlinear patterns: + +- threshold effects in economics +- volatility clustering in financial markets +- asymmetric reactions to shocks +- conditional dependence structures that cancel under linear aggregation + +For example, if \(Y = X^2\) and \(X\) is symmetric around zero, then \(Corr(X,Y) = 0\) despite perfect deterministic dependence. Correlation does not merely understate the relationship—it misses it entirely. + +The problem again arises from aggregation. + +Covariance averages co-deviations across the entire distribution, collapsing directional structure into a single linear measure. + +--- + +## Parametric Comfort and Model Risk + +Another pillar of classical statistics is the use of parametric distributions. + +The normal distribution occupies a central role in statistical inference: + +- hypothesis testing +- regression modeling +- time series analysis +- risk measurement + +Parametric models dramatically simplify estimation because they restrict the space of possible distributions. + +But when the assumed model is incorrect, inference can become dangerously misleading. + +Financial markets provide many examples. + +Asset returns exhibit heavy tails, skewness, and time-varying volatility—features that violate the assumptions of the normal distribution. Yet models based on Gaussian assumptions have historically underestimated extreme events. + +The problem is not simply that the wrong distribution is chosen. + +The deeper issue is that **parametric assumptions impose structure that the data may not possess**. 
+ +--- + +## The Limits of Traditional Nonparametrics + +Nonparametric methods were introduced to address these problems by estimating statistical objects directly from data. + +Kernel density estimation, kernel regression, and smoothing splines are common examples. + +However, most nonparametric methods introduce another challenge: **bandwidth selection**. + +The bandwidth determines how much smoothing occurs. + +Small bandwidths produce noisy estimates. +Large bandwidths obscure structure. + +In practice, bandwidth selection is often the dominant source of modeling error in nonparametric estimation. + +Thus even nonparametric methods frequently rely on externally chosen tuning parameters. + +--- + +## A Different Primitive + +The difficulties described above share a common source, and it is worth stating them plainly before moving on: + +1. **Symmetric aggregation hides directional information.** +2. **Linear dependence measures fail for nonlinear relationships.** +3. **Parametric assumptions introduce model risk.** +4. **Many nonparametric methods depend on arbitrary bandwidth selection.** + +Classical statistics begins with **symmetric aggregates**. + +Directional information is collapsed before analysis begins. + +An alternative approach is to reverse this order. + +Instead of starting with symmetric statistics, we begin with **directional deviations relative to a benchmark**—measuring how observations move relative to a target, separately above and below it. + +The key insight of this book is that directional deviation relative to a benchmark is sufficient to reconstruct many of the core constructs of statistics. + +From this single primitive we will derive: + +- the cumulative distribution function, +- classical moments, +- nonlinear dependence measures, +- nonparametric estimators, +- and benchmark-relative expected utility. 
+ +Remarkably, symmetric statistics emerge from this framework not as axioms but as aggregations—special cases of a more general directional structure. + +--- + +## From Symmetric Statistics to Directional Statistics + +Classical statistics treats symmetry as fundamental. + +Directional statistics treats symmetry as a special case. + +Under the directional framework: + +- symmetric moments become aggregates of directional components, +- nonlinear dependence can be measured directly, +- distributions can be represented without parametric assumptions, +- and nonparametric estimation can adapt to data structure without externally chosen bandwidths. + +The next chapter introduces the mathematical foundation of this framework: **directional deviation operators**. + +These operators are the primitive from which the rest of the book is built. diff --git a/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd b/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd new file mode 100644 index 0000000..9f8cc34 --- /dev/null +++ b/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd @@ -0,0 +1,286 @@ +# Directional Deviation Operators + +Chapter 1 argued that many failures of classical statistics arise from **symmetric aggregation**. +Classical moments, covariance, and correlation collapse directional information into symmetric summaries before analysis begins. + +This chapter introduces the mathematical primitive that avoids that collapse: + +**directional deviation operators.** + +These operators measure deviations relative to a benchmark separately above and below the reference point. From this simple construction we will derive many familiar objects of statistics. + +The framework begins with a simple observation: + +**any deviation relative to a benchmark has a direction.** + +--- + +## Deviations Relative to a Benchmark + +Let \(X\) be a real-valued random variable and let \(t \in \mathbb{R}\) denote a benchmark. 
+ +Classical statistics measures deviations using + +\[ +X - t +\] + +which mixes positive and negative deviations together. + +Directional statistics separates them. + +Define the **positive-part operator** + +\[ +x^{+} = \max(x,0). +\] + +Using this operator we define two directional deviations: + +\[ +(X-t)^+ = \max(X-t,0) +\] + +\[ +(t-X)^+ = \max(t-X,0). +\] + +These represent + +- deviations **above the benchmark** +- deviations **below the benchmark** + +Both quantities are nonnegative. + +Together they fully characterize the magnitude of deviation relative to \(t\). + +--- + +## Directional Decomposition of Deviations + +Every deviation can be decomposed into directional components. + +For any real number \(x\), + +\[ +x = x^+ - (-x)^+. +\] + +Applying this identity to \(X-t\) yields + +\[ +X-t = (X-t)^+ - (t-X)^+. +\] + +Thus the classical deviation can be expressed as the **difference between two directional magnitudes**. + +The directional operators also reconstruct the magnitude of deviation: + +\[ +|X-t| = (X-t)^+ + (t-X)^+. +\] + +Thus directional components fully determine both the signed deviation and its magnitude. + +The key implication is structural: + +**the symmetric deviation is an aggregation of directional components.** + +Classical statistics begins with the aggregate. +Directional statistics begins with the components. + +--- + +## Directional Operators + +The functions + +\[ +(X-t)^+ , \quad (t-X)^+ +\] + +are called **directional deviation operators**. + +They induce a natural partition of the sample space: + +- \(X > t\) +- \(X \le t\) + +Within each region the operators measure the magnitude of deviation from the benchmark. + +This partition is fundamental. 
Many real-world systems evaluate outcomes relative to targets: + +- profits relative to costs +- returns relative to required benchmarks +- losses relative to liabilities +- forecast errors relative to expected demand + +Directional deviation operators formalize this benchmark-relative measurement. + +--- + +## Partial Moments + +Once directional deviations are defined, their magnitudes can be summarized through expectations. + +For integer \(r \ge 0\), define the **lower partial moment** + +\[ +L_r(t;X) = E[(t-X)_+^r] +\] + +and the **upper partial moment** + +\[ +U_r(t;X) = E[(X-t)_+^r]. +\] + +These quantities measure directional deviation magnitudes relative to the benchmark. + +For these expectations to be finite, it is sufficient that the corresponding directional powers are integrable (for example, \(E[|X|^r]<\infty\) for fixed \(r\)). + +The parameter \(r\) determines the type of deviation measured. + +| Degree \(r\) | Interpretation | +|---|---| +| 0 | probability mass | +| 1 | directional mean deviation | +| 2 | directional variance | +| \(r>2\) | higher-order tail structure | + +Partial moments therefore generalize classical moments while preserving directional structure. + + +## Notation Bridge: Theory to R Implementation + +The manuscript uses theoretical notation in proofs and function-style notation in implementation examples. 
The mapping is direct: + +| Theoretical object | Meaning | R implementation pattern (Using `NNS` Package) | +|---|---|---| +| \(L_r(t;X)\) | lower partial moment of degree \(r\) at benchmark \(t\) | `LPM(r, t, X)` | +| \(U_r(t;X)\) | upper partial moment of degree \(r\) at benchmark \(t\) | `UPM(r, t, X)` | +| \(L_r(t;X)_{\text{ratio}}\) | normalized lower share \(L_r/(L_r+U_r)\) | `LPM.ratio(r, t, X)` | +| \(U_r(t;X)_{\text{ratio}}\) | normalized upper share \(U_r/(L_r+U_r)\) | `UPM.ratio(r, t, X)` | +| \(CoLPM\), \(CoUPM\), \(DLPM\), \(DUPM\) | concordant/divergent co-partial moments | `Co.LPM(...)`, `Co.UPM(...)`, `D.LPM(...)`, `D.UPM(...)` | + +Unless otherwise stated, later chapters use the mathematical form for derivations and the function-call form for reproducible examples. + +--- + +## Benchmarks + +A distinctive feature of partial moments is the benchmark \(t\). + +In classical statistics, reference points are usually determined by the distribution itself. +The mean, median, and variance are all defined internally. + +Partial moments differ in an important way: the benchmark need not be determined by the distribution. + +Instead, \(t\) may represent an **externally meaningful reference point** chosen by the analyst or by the decision context. + +Examples include: + +- target returns in finance +- policy thresholds in economics +- forecast baselines in operations +- safety limits in engineering +- aspiration levels in behavioral decision theory + +The benchmark therefore embeds the context in which deviations matter. + +Directional statistics evaluates distributions relative to those contexts rather than purely distributional averages. + +--- + +## Relationship to Classical Moments + +Classical moments arise as aggregations of partial moments. + +For integer \(r \ge 1\), + +\[ +E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X). +\] + +This identity shows that symmetric moments are **signed combinations of directional components**. 
+ +Several familiar quantities follow immediately. + +### Mean + +\[ +E[X] = U_1(0;X) - L_1(0;X) +\] + +### Variance + +\[ +Var(X) = U_2(\mu;X) + L_2(\mu;X) +\] + +This identity is the **population** variance decomposition. In R, `var(x)` returns the sample variance, so numerical checks against `UPM(2, mean(x), x) + LPM(2, mean(x), x)` should include the Bessel correction factor \(n/(n-1)\). + +### Third Central Moment + +\[ +E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X). +\] + +Thus classical symmetric statistics do not introduce fundamentally new objects. +They aggregate directional ones. + +Importantly, the mapping from partial moments to symmetric moments is **many-to-one**. + +Directional components determine the symmetric moment uniquely, but the symmetric moment does not generally determine the directional components. + +Directional moments therefore contain strictly more information. + +--- + +## Information Loss in Symmetric Aggregation + +Consider two distributions with identical variance. + +Distribution A may exhibit + +- large upside deviations +- small downside deviations. + +Distribution B may exhibit + +- small upside deviations +- large downside deviations. + +Both can produce the same value of + +\[ +Var(X). +\] + +Variance alone cannot distinguish them. + +However the directional quantities + +\[ +U_2(\mu;X), \quad L_2(\mu;X) +\] + +immediately reveal the asymmetry. + +Symmetric statistics therefore represent **projections of directional structure**. + +Once this projection occurs, the original directional information cannot generally be recovered. + +--- + +## Directional Operators as a Statistical Primitive + +Directional deviation operators provide a foundation from which many statistical constructs can be derived. 
+ +Rather than beginning with symmetric statistics and imposing directional interpretation afterward, the directional framework reverses the order: deviations relative to a benchmark are measured first, and symmetric statistics emerge only as aggregations of those directional components. + +The implications of this perspective are surprisingly broad. + +The next chapter begins with a result that illustrates the power of the framework: + +**the cumulative distribution function itself is a partial moment.** diff --git a/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd b/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd new file mode 100644 index 0000000..38bc8cc --- /dev/null +++ b/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd @@ -0,0 +1,366 @@ +# Distribution Theory from Partial Moments + +Chapter 2 introduced directional deviation operators and the partial moments constructed from them. +Those operators separate deviations relative to a benchmark into directional components. + +This chapter establishes a surprising result: + +**the cumulative distribution function is itself a partial moment.** + +Once this relationship is recognized, several foundational objects of probability theory—survival functions, hazard rates, and quantile functions—emerge naturally from the same framework. + +--- + +## Degree-Zero Partial Moments + +Recall the definitions of the lower and upper partial moments. + +For integer \( r \ge 0 \), + +\[ +L_r(t;X) = E[(t-X)_+^r] +\] + +\[ +U_r(t;X) = E[(X-t)_+^r]. +\] + +When \( r = 0 \), we interpret the expressions directly in indicator form: + +\[ +(t-X)_+^0 = +\begin{cases} +1 & X \le t \\ +0 & X > t +\end{cases} +\] + +\[ +(X-t)_+^0 = +\begin{cases} +1 & X > t \\ +0 & X \le t +\end{cases} +\] + +Thus the degree-zero lower partial moment becomes + +\[ +L_0(t;X) = E[(t-X)_+^0]. 
+\] + +Observe that the expression \((t-X)_+^0\) behaves exactly like an indicator function, which is the intended degree-zero convention used throughout this chapter. + +Thus + +\[ +(t-X)_+^0 = 1_{\{X \le t\}}. +\] + +Taking expectations yields the following fundamental result. + +--- + +### Theorem 3.1 (CDF Representation) + +For any random variable \(X\) and benchmark \(t \in \mathbb{R}\), + +\[ +L_0(t;X) = P(X \le t) = F_X(t). +\] + +**Proof** + +From the definition of the lower partial moment, + +\[ +L_0(t;X) = E[(t-X)_+^0]. +\] + +As shown above, + +\[ +(t-X)_+^0 = 1_{\{X \le t\}}. +\] + +Therefore + +\[ +L_0(t;X) = E[1_{\{X \le t\}}]. +\] + +Since the expectation of an indicator equals the probability of the event, + +\[ +L_0(t;X) = P(X \le t). +\] + +Thus + +\[ +L_0(t;X) = F_X(t). +\] + +\(\square\) + +**Remark.** +The cumulative distribution function is therefore not an independent primitive of probability theory. It is the degree-zero instance of the partial-moment operator. + +### Empirical CDF Equivalence in R + +The degree-zero lower partial moment can be computed directly and compared to the empirical CDF: + +```r +library(NNS) +P = ecdf(x) +P(0) ; P(1) +LPM(degree = 0, target = 0, variable = x) ; LPM(degree = 0, target = 1, variable = x) + +# Vectorized targets: +LPM(degree = 0, target = c(0, 1), variable = x) + +plot(ecdf(x)) +points(sort(x), LPM(degree = 0, target = sort(x), variable = x), col = "red") +legend("left", legend = c("ecdf", "LPM.CDF"), fill = c("black", "red"), border = NA, bty = "n") +``` + +
    +![Figure 3.1. Empirical CDF (black step function) and degree-zero LPM evaluation (red points) are identical, confirming \(LPM_0(t;X)=F_X(t)\).](images/ch3_cdf_lpm0.png) +
    + +--- + +## Complementary Directional Probability + +Theorem 3.1 showed that the cumulative distribution function is the degree-zero lower partial moment: + +\[ +F_X(t) = L_0(t;X). +\] + +The complementary directional probability is given by the **upper degree-zero partial moment** + +\[ +U_0(t;X) = P(X > t). +\] + +These two quantities partition the sample space, so + +\[ +L_0(t;X) + U_0(t;X) = 1. +\] + +Equivalently, + +\[ +F_X(t) + U_0(t;X) = 1. +\] + +Thus the directional operators provide a natural decomposition of probability mass relative to the benchmark \(t\): + +- \(L_0(t;X)\): probability mass **at or below the benchmark** +- \(U_0(t;X)\): probability mass **above the benchmark** + +This directional partition forms the foundation for the survival and hazard functions examined in the next sections. + +--- + +## The Survival Function + +The **survival function** is defined as + +\[ +S_X(t) = P(X > t). +\] + +Using the directional framework, + +\[ +S_X(t) = U_0(t;X). +\] + +Thus the survival function is simply the **upper degree-zero partial moment**. + +Because + +\[ +F_X(t) + S_X(t) = 1, +\] + +the CDF and survival function represent complementary directional probabilities. + +This interpretation is particularly useful in reliability analysis, survival analysis, and risk management, where interest often lies in the probability that outcomes exceed a threshold. + +--- + +## Hazard Rates + +In survival analysis the **hazard rate** describes the instantaneous probability of failure conditional on survival. + +For continuous distributions the hazard rate is defined as + +\[ +h(t) = \frac{f(t)}{S_X(t)} +\] + +where \(f(t)\) is the probability density function. + +The density function can be written as the derivative of the cumulative distribution function: + +\[ +f(t) = \frac{d}{dt}F_X(t). +\] + +Since + +\[ +F_X(t) = L_0(t;X), +\] + +this implies + +\[ +f(t) = \frac{d}{dt}L_0(t;X). +\] + +Thus the hazard rate becomes + +\[ +h(t) = \frac{f(t)}{U_0(t;X)}. 
+\] + +This provides a directional interpretation of the hazard rate. + +The upper partial moment \(U_0(t;X)\) represents the probability mass that remains **above the benchmark \(t\)**. +The hazard rate therefore measures the instantaneous **flow of probability mass across the benchmark** from the upper directional region \(X > t\) into the lower region \(X \le t\). + +The **cumulative hazard function** is + +\[ +H(t) = \int_0^t h(s)\,ds. +\] + +Although hazard rates are typically introduced within survival analysis, they arise naturally within the directional framework once the survival function is recognized as an upper partial moment. + +--- + +## Quantile Functions + +The **quantile function** provides the inverse mapping of the cumulative distribution function. + +For \( p \in (0,1) \), the quantile is defined as + +\[ +Q(p) = \inf\{x : F_X(x) \ge p\}. +\] + +Because + +\[ +F_X(t) = L_0(t;X), +\] + +the quantile function identifies the benchmark \(t\) at which the degree-zero partial moment reaches probability level \(p\). + +Quantiles therefore correspond to **benchmarks that partition probability mass**. + +This interpretation aligns naturally with the directional framework, which evaluates distributions relative to benchmark thresholds. + + + +### Lower-Tail Thresholds as Degree-Zero Partial-Moment Quantiles + +A lower-tail threshold is often introduced in application-specific language, but within the directional framework it is simply a quantile of the degree-zero lower partial moment. + +Let +\[ +F_X(t)=P(X\le t). +\] +By the result established earlier in this chapter, +\[ +F_X(t)=L_0(t;X). +\] +Therefore the lower-tail quantile at probability level \(\alpha\) may be written as +\[ +Q_X(\alpha)=\inf\{t\in\mathbb{R}:F_X(t)\ge \alpha\} +=\inf\{t\in\mathbb{R}:L_0(t;X)\ge \alpha\}. +\] + +This identity is general. 
It does not depend on whether \(X\) represents returns, forecast errors, waiting times, deviations from a quality target, or distances below a safety threshold. In every case, the degree-zero lower partial moment answers the same question: what proportion of observations fall below the benchmark \(t\)? + +In some fields, especially finance, the lower-tail quantile +\[ +\inf\{t:F_X(t)\ge \alpha\} +\] +is called Value-at-Risk. But the mathematical object is broader than that label. It is the benchmark value that partitions a chosen fraction \(\alpha\) of lower-tail mass. + +This observation matters because it shows that threshold analysis is not an external application added onto the theory of distributions. It is already contained in the degree-zero directional representation of probability. The estimation-error literature makes the same point explicitly by identifying \(LPM_0\) with the cumulative distribution function and hence with the probability-of-loss object used in applied risk work. + +A second implication will become important later. Degree zero partitions observations by frequency alone. Higher degrees retain the same threshold logic while reweighting observations by the severity of their deviations from the benchmark. Thus quantile thinking extends naturally from event frequency to severity-weighted directional mass. + +**Proposition 3.1A.** For any random variable \(X\) and any \(\alpha\in(0,1)\), + +\[ +Q_X(\alpha)=\inf\{t:L_0(t;X)\ge \alpha\}. +\] + +**Proof.** Since \(L_0(t;X)=F_X(t)\), the result follows directly from the definition of the lower quantile. + + +--- + +## Probability Integral Transform + +If \(X\) has cumulative distribution function \(F_X\), then the transformed variable + +\[ +U = F_X(X) +\] + +is uniformly distributed on \([0,1]\). + +Since + +\[ +F_X(t) = L_0(t;X), +\] + +the probability integral transform can be written in directional form as + +\[ +U = L_0(X;X). +\] + +Here the benchmark equals the realized observation. 
The operator therefore measures the probability that an **independent draw from the same distribution** does not exceed the observed value. + +The transformation maps observations into probability space and forms the foundation for many statistical procedures including simulation, copula modeling, and dependence analysis. + +--- + +## Distribution Theory as Directional Measurement + +Classical probability theory typically introduces the cumulative distribution function as a primitive object. + +The directional framework reveals that the CDF arises from a simpler structure. + +It is simply the **degree-zero instance of the partial-moment operator**. + +Higher-order partial moments measure magnitudes of directional deviation, while the degree-zero case measures directional probability mass. + +Thus probability distribution functions and moment statistics emerge from the same underlying primitive. + +--- + +## Structural Implications + +The results of this chapter establish three key points. + +1. The cumulative distribution function is the degree-zero lower partial moment. + +2. The survival function is the degree-zero upper partial moment. + +3. Quantile functions identify benchmarks that partition probability mass. + +Distribution theory therefore lies inside the same directional framework that generates moment statistics. + +The next chapter turns to **numerical integration via partial moments**. Chapter 5 then shows how **classical moments arise as signed combinations of partial moments**, further demonstrating the unifying role of directional statistics. 
diff --git a/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd b/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd new file mode 100644 index 0000000..6c57aab --- /dev/null +++ b/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd @@ -0,0 +1,352 @@ +# Numerical Integration via Partial Moments + +Chapter 3 showed that the cumulative distribution function arises as the degree-zero partial moment. +Probability mass itself can therefore be represented through the directional deviation operators introduced earlier. + +The same idea extends naturally to **numerical integration**. + +Many quantities in probability, statistics, and economics are defined as definite integrals. +Expected values and risk measures both rely on integrating functions with respect to probability distributions. + +This chapter shows that partial moments provide a natural and flexible way to approximate such integrals. + +Rather than relying on classical quadrature formulas alone, we can represent integrals through expectations of directional deviations relative to benchmarks. + +--- + +## Definite Integrals as Expectations + +Let \(X\) be a random variable with cumulative distribution function \(F_X\). + +For any measurable function \(g(x)\), the expectation of \(g(X)\) can be written as + +\[ +E[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x). +\] + +When \(X\) has density \(f(x)\), this becomes + +\[ +E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx. +\] + +Thus expectations are **definite integrals weighted by probability**. + +This representation allows integrals to be estimated directly from sample data: + +\[ +E[g(X)] \approx \frac{1}{n}\sum_{i=1}^{n} g(x_i). +\] + +In practice, many statistical quantities—including moments and risk measures—are simply special cases of this expectation integral. + +--- + +## Integrals from Directional Deviations + +Consider the upper partial moment + +\[ +U_r(t;X) = E[(X-t)_+^r]. 
+\] + +Using the definition of expectation, + +\[ +U_r(t;X) = \int_{t}^{\infty} (x-t)^r f(x)\,dx. +\] + +Similarly, the lower partial moment is + +\[ +L_r(t;X) = \int_{-\infty}^{t} (t-x)^r f(x)\,dx. +\] + +Thus partial moments correspond directly to **definite integrals over directional regions** of the distribution. + +The integrand is the deviation magnitude relative to the benchmark \(t\). + +These integrals quantify how much probability mass lies above or below the benchmark and how far those observations lie from it. + +--- + +## Approximation via Sample Partial Moments + +Suppose we observe a sample \(x_1,\dots,x_n\). + +The partial moments can be estimated empirically: + +\[ +\hat{U}_r(t) = \frac{1}{n}\sum_{i=1}^{n} (x_i-t)_+^r +\] + +\[ +\hat{L}_r(t) = \frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^r. +\] + +These quantities approximate the integrals + +\[ +\int_{t}^{\infty} (x-t)^r f(x)\,dx +\] + +and + +\[ +\int_{-\infty}^{t} (t-x)^r f(x)\,dx. +\] + +Unlike classical quadrature rules that rely on fixed grid points, the empirical partial moments use the observed data directly. + +This approach provides a **data-adaptive integration scheme**. + +--- + +## Example: Estimating Downside Risk + +To illustrate, suppose we observe the returns + +\[ +x = \{-4,-2,-1,1,3,5\}. +\] + +Let the benchmark return be + +\[ +t = 0. +\] + +The lower partial moment of degree 1 is + +\[ +L_1(0;X) = E[(0-X)_+]. +\] + +Compute the directional deviations: + +| \(x_i\) | \((0-x_i)_+\) | +|---|---| +| -4 | 4 | +| -2 | 2 | +| -1 | 1 | +| 1 | 0 | +| 3 | 0 | +| 5 | 0 | + +The empirical estimate becomes + +\[ +\hat{L}_1(0) = +\frac{4+2+1}{6} += +\frac{7}{6} +\approx 1.17. +\] + +This quantity measures the **unconditional average shortfall below the benchmark** (i.e., averaged over all observations, including zeros above the benchmark). + +The calculation approximates the integral + +\[ +\int_{-\infty}^{0} (0-x)f(x)\,dx +\] + +using only the observed sample. 
+ +This calculation is exactly the empirical estimator computed above in this section with \(r = 1\). +The example therefore illustrates how the general partial-moment estimator performs numerical integration directly from sample data. + +--- + +## Relationship to Classical Quadrature + +Classical numerical integration methods approximate integrals using weighted sums of function values evaluated at predetermined nodes. + +Expectation integrals differ in an important way: + +\[ +E[g(X)] = \int g(x)\, dF_X(x). +\] + +If \(X \sim \mathrm{Unif}(a,b)\), then + +\[ +E[f(X)] = \frac{1}{b-a}\int_a^b f(x)\,dx. +\] + +So the interval integral is recovered by the explicit scale factor: + +\[ +\int_a^b f(x)\,dx = (b-a)E[f(X)]. +\] + +Using first-degree partial moments at benchmark \(t=0\) for \(Y=f(X)\), + +\[ +E[Y] = U_1(0;Y)-L_1(0;Y), +\] + +hence + +\[ +\int_a^b f(x)\,dx \approx (b-a)\left(\hat U_1(0;Y)-\hat L_1(0;Y)\right). +\] + +For unsigned (total) area, + +\[ +\int_a^b |f(x)|\,dx \approx (b-a)\left(\hat U_1(0;Y)+\hat L_1(0;Y)\right). +\] + +The key accuracy point is that the \((b-a)\) term multiplies the partial-moment estimate whenever the domain is \([a,b]\) rather than a unit-length interval. + +Here the weighting measure is the distribution \(F_X\), not uniform measure in general. + +Empirical partial moments therefore approximate integrals using the observed data themselves as evaluation points. Regions with higher probability mass contribute more strongly to the approximation. + +In this sense, partial-moment integration is **distribution-adaptive**: the integration nodes are determined by the data rather than by a fixed grid. + +From a computational perspective, empirical partial moments are also simple to scale: for fixed \(r\) and benchmark \(t\), each estimate requires a single pass through the sample (\(O(n)\) operations). 
Classical quadrature can be very accurate for smooth low-dimensional integrands, but it depends on externally chosen nodes and weights and can become sensitive when mass is concentrated in tail regions. The partial-moment estimator trades closed-form quadrature weights for direct data weighting under \(F_X\), which is often numerically stable in statistical applications. + +--- + +## Convergence Properties + +Under standard regularity conditions, empirical expectations converge to their population counterparts. + +By the law of large numbers, + +\[ +\hat{U}_r(t) \rightarrow U_r(t;X) +\] + +and + +\[ +\hat{L}_r(t) \rightarrow L_r(t;X) +\] + +as \(n \to \infty\). + +Thus the empirical partial moments provide consistent estimators of the corresponding integrals. + +Because the integration nodes are the observed data themselves, the approximation improves automatically as the sample grows. + +No externally chosen bandwidth or grid resolution is required. + +--- + +## Applications + +The partial-moment representation of integrals has many applications. + +### Probability and Distribution Analysis + +Many distributional quantities can be written as integrals of deviation functions. + +Examples include: + +- unconditional partial-moment shortfall measures (distinct from conditional expected shortfall / CVaR) +- tail risk measures +- higher-order moments. + +### Risk Measurement + +In finance, downside risk measures often take the form + +\[ +E[(\tau-X)_+^r]. +\] + +These are precisely lower partial moments relative to a target return \(\tau\). + +Thus many risk measures are simply integrals of directional deviations. + + +### Directional Probability Bounds + +Partial moments do more than approximate benchmark-relative integrals numerically. They also support conservative bounds on tail probabilities. This is important because threshold-based decisions often require guarantees that remain valid even when the underlying distribution is unknown or misspecified. 
+ +Suppose \(g<\mu\) is a lower benchmark and the event of interest is +\[ +X\le g. +\] +For a distribution symmetric about \(\mu\), a classical one-sided Chebyshev argument bounds this lower-tail probability using symmetric dispersion: +\[ +P(X\le g)\le \frac{1}{2}\left(\frac{\sigma}{\mu-g}\right)^2. +\] +This bound depends only on the mean and variance, but the factor \(\tfrac{1}{2}\) relies on symmetry; without it, the distribution-free one-sided (Cantelli) bound is \(\sigma^2/(\sigma^2+(\mu-g)^2)\). Neither form distinguishes between upper and lower deviations. + +A directional refinement replaces symmetric variance with semivariance: +\[ +P(X\le g)\le \left(\frac{\sigma_-}{\mu-g}\right)^2, +\] +where \(\sigma_-^2=E[(\mu-X)_+^2]=L_2(\mu;X)\) is the downside semivariance, so \(\sigma_-\) measures dispersion only on the adverse side of the benchmark; the bound follows from Markov’s inequality applied to \((\mu-X)_+^2\). The estimation-error literature highlights the importance of this refinement through the Berck–Hihn result, which links semivariance directly to a strong boundary form of Chebyshev’s inequality. + +A further generalization uses lower partial moments of degree \(\alpha\). Define +\[ +\theta(t,\alpha)=\big(E[(t-X)_+^\alpha]\big)^{1/\alpha}. +\] +Then, for \(g<t\) and \(\alpha>0\), +\[ +P(X\le g)\le \left(\frac{\theta(t,\alpha)}{t-g}\right)^\alpha. +\] +The probability-bounds literature presents this as an Atwood-style lower-partial-moment inequality and interprets \(\theta(t,\alpha)\) as generalized downside dispersion. + +These bounds form a directional hierarchy: +\[ +\text{symmetric variance} \to \text{directional second moment} \to \text{general directional degree } \alpha. +\] +The central theme is that tail-probability control need not be built from a separate theory. It can be generated from the same benchmark-relative operators that already define the directional framework. + +### Threshold Analysis and Directional Dispersion + +Probability bounds become especially meaningful when interpreted as threshold-analysis tools. + +In many applied settings, the analyst cares about whether a process falls below a critical level.
Examples include: + +* a forecast undershooting a service target, +* an inventory position dropping below a replenishment threshold, +* a reliability metric falling below a safety margin, +* or a return falling below an acceptable performance benchmark. + +In each case the relevant question is the same: +\[ +P(X\le g)? +\] +Classical methods answer this using symmetric dispersion summaries. The directional framework answers it more precisely by measuring deviations on the relevant side of the benchmark. + +The quantity +\[ +L_\alpha(t;X)=E[(t-X)_+^\alpha] +\] +therefore serves three roles simultaneously: +\[ +\text{directional integral}, +\quad +\text{benchmark-relative dispersion summary}, +\quad +\text{engine of a probability bound}. +\] +That multi-use structure is one of the framework’s main advantages. It reduces the gap between descriptive measurement, numerical integration, and decision support. + +The estimation-error literature places this in a broader historical context: semivariance and related partial moments are not ad hoc devices, but directional statistics with direct links to probability inequalities, utility-sensitive modeling, and nonparametric analysis. + + +--- + +## Summary + +This chapter established that partial moments naturally represent definite integrals over directional regions of a distribution. + +The key results are: + +1. Expectations are definite integrals with respect to probability distributions. +2. Upper and lower partial moments correspond to integrals over directional deviation regions. +3. Empirical partial moments provide data-adaptive approximations of these integrals. +4. Convergence follows from standard laws of large numbers. +5. Many statistical and economic quantities—including risk measures—can be expressed using this framework. + +Partial moments therefore act as **numerical integrators that aggregate directional deviations relative to benchmarks**. 
+ +The next chapter shows how **classical symmetric moments arise as signed combinations of partial moments**, completing the bridge between directional statistics and traditional moment analysis. diff --git a/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd b/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd new file mode 100644 index 0000000..ab4fb1a --- /dev/null +++ b/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd @@ -0,0 +1,314 @@ +# Classical Moments as Directional Aggregates + +Chapters 2–4 introduced directional deviation operators and the partial moments constructed from them. + +These operators measure deviations relative to a benchmark separately above and below the reference point. +They therefore provide a directional description of distributional structure. + +This chapter shows that **classical symmetric moments arise as aggregations of these directional components**. + +Mean, variance, and higher-order moments do not introduce fundamentally new statistical objects. +Instead, they emerge as signed combinations of partial moments. + +Once this relationship is recognized, classical moment theory can be interpreted as a special case of the directional framework. + +--- + +## Moments Relative to a Benchmark + +In classical statistics, the \(r\)-th moment of a random variable \(X\) relative to a benchmark \(t\) is defined as + +\[ +E[(X-t)^r]. +\] + +This expression represents the \(r\)-th moment about the point \(t\). + +When the benchmark equals the mean \(t=\mu\), the quantity becomes the **\(r\)-th central moment**. +Otherwise it represents a moment relative to an arbitrary reference point. + +Examples include + +- \(r=1\): mean deviation +- \(r=2\): variance when \(t=\mu\) +- \(r=3\): skewness-related moment +- \(r=4\): kurtosis-related moment + +These quantities summarize distributions by aggregating deviations around a reference point. 
+ +However, the deviation \(X-t\) combines positive and negative directions together. + +Directional statistics separates these components. + +--- + +## Directional Moment Decomposition + +Recall the directional deviation operators + +\[ +(X-t)^+ = \max(X-t,0) +\] + +\[ +(t-X)^+ = \max(t-X,0). +\] + +These represent deviations above and below the benchmark. + +Raising these quantities to power \(r\) and taking expectations yields the partial moments + +\[ +U_r(t;X) = E[(X-t)_+^r] +\] + +\[ +L_r(t;X) = E[(t-X)_+^r]. +\] + +Using the directional decomposition + +\[ +X-t = (X-t)^+ - (t-X)^+, +\] + +one obtains the identity + +\[ +E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X). +\] + +Thus **every classical moment can be written as a signed combination of directional partial moments**. + +--- + +## Mean + +Setting \(r=1\) and \(t=0\) yields + +\[ +E[X] = U_1(0;X) - L_1(0;X). +\] + +The mean can therefore be interpreted as the difference between + +- average upward deviations from the benchmark +- average downward deviations from the benchmark. + +If the benchmark is chosen as the mean itself, \(t=\mu\), then + +\[ +E[X-\mu] = U_1(\mu;X) - L_1(\mu;X). +\] + +But by definition \(E[X-\mu]=0\). +Therefore + +\[ +U_1(\mu;X)=L_1(\mu;X). +\] + +This equality holds **only when the benchmark equals the mean**. +For other benchmarks, upward and downward deviations generally do not balance. + +Thus the classical property that deviations around the mean sum to zero has a natural directional interpretation. + +--- + +## Variance + +Variance is defined as + +\[ +Var(X) = E[(X-\mu)^2]. +\] + +Applying the directional decomposition with \(t=\mu\) yields + +\[ +Var(X) = U_2(\mu;X) + L_2(\mu;X). +\] + +As in Chapter 2, this is a population identity. When verifying numerically in R, use `UPM(2, mean(x), x) + LPM(2, mean(x), x)` for population variance, and multiply by \(n/(n-1)\) to match `var(x)`. + +This equality is exact because both terms are computed around the same global mean \(\mu\). 
It should not be confused with averaging conditional subgroup variances, which omits a nonnegative between-group term unless explicitly added. + +Variance therefore equals the sum of two directional components: + +- upward deviation relative to the mean +- downward deviation relative to the mean. + +The classical statistic reports only their total magnitude. + +Two distributions may share identical variance while exhibiting very different directional structures. + +--- + +## Higher-Order Moments + +Higher-order moments follow the same decomposition. + +### Third Moment + +\[ +E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X). +\] + +This moment measures directional asymmetry. + +### Fourth Moment + +\[ +E[(X-\mu)^4] = U_4(\mu;X) + L_4(\mu;X). +\] + +This moment reflects the magnitude of tail deviations regardless of direction. + +In each case, classical moments aggregate directional components into a single statistic. + +--- + +## Standardized Skewness and Kurtosis + +In practice, third and fourth moments are normalized by variance to produce dimensionless statistics. + +Skewness is defined as + +\[ +Skew(X) = +\frac{E[(X-\mu)^3]}{Var(X)^{3/2}}. +\] + +Using the directional representation, + +\[ +Skew(X) = +\frac{U_3(\mu;X) - L_3(\mu;X)} +{(U_2(\mu;X)+L_2(\mu;X))^{3/2}}. +\] + +A useful intuition follows immediately from this expression. +If a distribution has a longer right tail than left tail, large positive deviations dominate so that + +\[ +U_3(\mu;X) \gg L_3(\mu;X), +\] + +producing positive skewness. + +Similarly, kurtosis is + +\[ +Kurt(X) = +\frac{E[(X-\mu)^4]}{Var(X)^2}. +\] + +Substituting the directional components gives + +\[ +Kurt(X) = +\frac{U_4(\mu;X)+L_4(\mu;X)} +{(U_2(\mu;X)+L_2(\mu;X))^2}. +\] + +In finite samples, estimates based on third and fourth moments can be unstable because extreme observations are raised to high powers. The directional decomposition still applies, but empirical interpretation should account for this sensitivity. 
+ +Thus the familiar standardized statistics also arise directly from directional partial moments. + +--- + +## Information Loss in Symmetric Aggregation + +The mapping from partial moments to classical moments is **many-to-one**. + +Directional components determine the symmetric moment uniquely. + +However the symmetric moment does not determine the directional components. + +For example, + +\[ +Var(X) = U_2(\mu;X) + L_2(\mu;X) +\] + +does not reveal how the total variance is distributed between the two sides of the distribution. + +Consider two distributions: + +| Distribution | \(U_2(\mu;X)\) | \(L_2(\mu;X)\) | \(Var(X)=U_2+L_2\) | +|---|---:|---:|---:| +| A | 10 | 0 | 10 | +| B | 5 | 5 | 10 | + +Both produce + +\[ +Var(X)=10, +\] + +yet their directional risk structures are completely different. + + +A useful edge case is degenerate support: if all probability mass is concentrated at a single point, then both directional components are zero, \(U_2(\mu;X)=L_2(\mu;X)=0\), so variance is exactly zero. This confirms that the decomposition remains valid at the boundary and that nonzero variance requires at least one nonzero directional component. + +Symmetric moments therefore represent **projections of directional structure**. +Once aggregated, the original directional information cannot generally be recovered. + +--- + +## Measure-Theoretic Interpretation + +The directional decomposition follows naturally from the partition of the sample space induced by the benchmark \(t\): + +- \(X>t\) +- \(X\le t\) + +Expectation integrals can therefore be written as + +\[ +E[(X-t)^r] += +\int_{x>t}(x-t)^r f(x)\,dx ++ +\int_{x\le t}(x-t)^r f(x)\,dx. +\] + +These two integrals correspond exactly to the upper and lower partial moments. + +This representation also clarifies the variance example from the previous section. +Two distributions may share the same variance while producing different values of + +\[ +U_2(\mu;X) += +\int_{x>\mu}(x-\mu)^2 f(x)\,dx. 
+\] + +Distribution A places most of its squared deviations in the region \(x>\mu\), producing a large value of \(U_2\). +Distribution B distributes deviations more evenly across the two regions. + +Although the directional integrals differ, their sum + +\[ +U_2(\mu;X)+L_2(\mu;X) +\] + +can still produce the same total variance. + +Thus the measure-theoretic decomposition explains precisely how symmetric aggregation hides directional structure. + +--- + +## Implications + +The results of this chapter show that classical moment statistics arise from directional components rather than the other way around. + +Partial moments therefore provide a structural foundation from which several familiar constructs emerge: + +- probability distributions (degree \(r=0\)), +- classical moments (degrees \(r\ge1\)), +- standardized measures such as skewness and kurtosis. + +Seen from this perspective, symmetric statistics summarize directional information that is already present in the distribution. + +Chapter 6 develops the measure-theoretic foundation for this framework, and Part II extends the analysis to descriptive statistics derived from the directional perspective introduced here. diff --git a/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd b/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd new file mode 100644 index 0000000..882c919 --- /dev/null +++ b/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd @@ -0,0 +1,345 @@ +# Measure-Theoretic Interpretation + +Chapters 2–5 developed the directional framework from an algebraic perspective. +Directional deviation operators were introduced, partial moments were defined, and classical statistics was shown to arise as an aggregation of directional components. + +This chapter places that framework inside **measure-theoretic probability**. + +The goal is not to introduce new probability axioms. 
+Instead, we show that directional deviation operators align naturally with the core structures of probability theory: + +- measurable functions +- partitions of the sample space +- positive and negative function decompositions +- Lebesgue integration + +Viewed from this perspective, partial moments are not merely convenient statistics. +They represent a **canonical measurable refinement** of symmetric statistical quantities. + +For reference, the key assumptions used throughout this chapter are: + +- \(X\) is measurable on \((\Omega,\mathcal{F},P)\), +- the benchmark \(t\) is fixed (or measurable when data-dependent benchmarks are considered), +- the relevant moments exist (e.g., \(X\in L^r\) for degree \(r\)). + +--- + +## Probability Spaces + +Let + +\[ +(\Omega,\mathcal{F},P) +\] + +denote a probability space where + +- \(\Omega\) is the sample space, +- \(\mathcal{F}\) is a σ-algebra of measurable events, +- \(P\) is a probability measure. + +A real-valued random variable is a measurable function + +\[ +X:\Omega \rightarrow \mathbb{R}. +\] + +The cumulative distribution function of \(X\) is + +\[ +F_X(t) = P(X \le t). +\] + +Expectations of measurable functions are defined through the Lebesgue integral + +\[ +E[g(X)] = \int_{\Omega} g(X(\omega))\, dP(\omega). +\] + +This integral provides the foundation for statistical quantities such as moments, expectations, and risk measures. + +Directional statistics operates within exactly the same framework. +As we will see later in this chapter, the directional deviation operators introduced earlier align naturally with the positive and negative function decompositions used in Lebesgue integration. + +--- + +## Benchmark-Induced Partitions + +Let \(t \in \mathbb{R}\) be a benchmark. + +The benchmark induces a natural measurable partition of the sample space: + +\[ +\Omega = +\{\omega : X(\omega) \le t\} +\cup +\{\omega : X(\omega) > t\}. 
+\] + +Equivalently, the real line is partitioned into two regions: + +- the **lower region** \(X \le t\) +- the **upper region** \(X > t\) + +This partition plays a central role in probability theory. +The cumulative distribution function itself is defined through it: + +\[ +F_X(t) = P(X \le t). +\] + +Directional statistics extends this same partition structure to **magnitudes of deviation**. + +--- + +## Positive and Negative Function Decomposition + +Measure theory frequently decomposes functions into positive and negative parts. + +For any real-valued function \(f\), + +\[ +f = f^{+} - f^{-} +\] + +where + +\[ +f^{+} = \max(f,0), \qquad f^{-} = \max(-f,0). +\] + +Both components are nonnegative measurable functions. + +Lebesgue integration then satisfies + +\[ +\int f\, d\mu = +\int f^{+} d\mu +- +\int f^{-} d\mu. +\] + +This decomposition ensures that integrals of arbitrary measurable functions can be constructed from integrals of nonnegative functions. + +Directional deviation operators follow exactly the same structure. + +Let + +\[ +f(X) = X - t. +\] + +Then + +\[ +(X-t)^+ = \max(X-t,0) +\] + +\[ +(t-X)^+ = \max(t-X,0). +\] + +Thus + +\[ +X-t = (X-t)^+ - (t-X)^+. +\] + +Directional deviations therefore correspond directly to the **positive and negative parts of the deviation function**. + +--- + +## Partial Moments as Measurable Integrals + +Partial moments are expectations of these nonnegative measurable functions. + +For integer \(r \ge 0\), + +\[ +U_r(t;X) = E[(X-t)_+^r] +\] + +\[ +L_r(t;X) = E[(t-X)_+^r]. +\] + +Assuming \(X \in L^r\) so that these expectations exist, partial moments are simply expectations of nonnegative measurable functions. + +Using the definition of expectation, + +\[ +U_r(t;X) += +\int_{\Omega} (X(\omega)-t)_+^r \, dP(\omega) +\] + +\[ +L_r(t;X) += +\int_{\Omega} (t-X(\omega))_+^r \, dP(\omega). 
+\] + +These integrals can be written explicitly over the benchmark partition: + +\[ +U_r(t;X) += +\int_{X>t} (X-t)^r \, dP +\] + +\[ +L_r(t;X) += +\int_{X\le t} (t-X)^r \, dP. +\] + +Thus partial moments are **Lebesgue integrals evaluated over measurable directional regions**. + +The benchmark \(t\) defines the partition, and the integrand measures deviation magnitude within each region. + +--- + +## Recovery of Classical Moments + +From the positive–negative decomposition of deviations introduced in Section 6.3, + +\[ +X-t = (X-t)^+ - (t-X)^+, +\] + +raising both sides to an integer power \(r\ge 1\) and integrating yields + +\[ +E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X). +\] + +The cross terms vanish because \((X-t)^+\) and \((t-X)^+\) are never positive simultaneously, so the identity holds whenever \(X\in L^r\). + +Classical symmetric moments therefore arise as **signed combinations of two directional integrals**. + +For example: + +### Mean + +\[ +E[X] = U_1(0;X) - L_1(0;X). +\] + +### Variance + +\[ +Var(X) = U_2(\mu;X) + L_2(\mu;X). +\] + +### Third Moment + +\[ +E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X). +\] + +The symmetric moment is therefore an **aggregation operator applied to directional components**. + +--- + +## Canonical Refinement of Symmetric Moments + +The mapping + +\[ +(U_r,L_r) \rightarrow E[(X-t)^r] +\] + +is **many-to-one**. + +Directional components uniquely determine the symmetric moment, but the symmetric moment cannot generally recover the directional components. + +This implies a strict information hierarchy: + +\[ +(U_r,L_r) \quad \text{contains more information than} \quad E[(X-t)^r]. +\] + +In measure-theoretic terms, the directional decomposition represents a **refinement of the measurable structure** induced by symmetric aggregation. + +The symmetric moment collapses two measurable integrals into a single value. + +Directional moments preserve the contributions of each measurable region. + +--- + +## Alignment with Probability Partitions + +Probability itself is defined through partitions.
+ +For any event \(A\), + +\[ +P(A) + P(A^c) = 1. +\] + +Similarly, the cumulative distribution function partitions probability mass relative to a threshold \(t\): + +\[ +F_X(t) = P(X \le t) +\] + +\[ +1 - F_X(t) = P(X > t). +\] + +Directional deviation operators extend this same partition structure. + +Instead of measuring only probability mass in each region, they measure **magnitudes of deviation within those regions**. + +The degree-zero case recovers probability: + +\[ +L_0(t;X) = P(X \le t) +\] + +\[ +U_0(t;X) = P(X > t). +\] + +Higher degrees measure deviation magnitude within the same partition. + +--- + +## Structural Interpretation + +From the measure-theoretic perspective, the directional framework reflects a deeper structural fact about probability. + +Every deviation relative to a benchmark induces a natural measurable partition of the sample space. +Lebesgue integration aggregates contributions across that partition. + +Directional partial moments simply retain the integrals over each region separately. + +Symmetric statistics combine those integrals into a single value. + +Thus the directional representation does not introduce new probability objects. +It reveals the **underlying structure that symmetric statistics aggregate away**. + +--- + +## Implications + +The measure-theoretic interpretation clarifies the role of partial moments in statistical theory. +In particular, it shows that the directional framework is not a new probabilistic system but a refinement of the standard measure-theoretic structure already used throughout statistics. + +1. Directional deviation operators correspond to positive and negative function decompositions. + +2. Partial moments are Lebesgue integrals over measurable directional regions. + +3. Classical symmetric moments are aggregations of those integrals. + +4. Directional moments therefore preserve strictly more structural information about distributions. 
+ +These structural properties explain why directional statistics can support the applied methods developed in the remainder of the book: the framework preserves the same probability foundations while retaining directional information that symmetric statistics discard. + +Operationally, this means the directional framework can be used without replacing standard probabilistic machinery. One can work with the same probability space, the same measurable-function toolkit, and the same integration rules, while reporting richer benchmark-relative diagnostics for risk, asymmetry, and tail behavior. + +--- + +The next part of the book turns from theoretical foundations to **descriptive statistics derived from directional partial moments**. + +Part II develops descriptive statistics that retain the directional information preserved by this refined measurable structure. Rather than collapsing deviations into symmetric aggregates, these measures describe distributions in terms of their directional behavior relative to meaningful benchmarks. diff --git a/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd b/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd new file mode 100644 index 0000000..5dfba86 --- /dev/null +++ b/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd @@ -0,0 +1,331 @@ +# Directional Descriptive Statistics + +Chapters 2–6 established the theoretical foundations of directional statistics. + +Directional deviation operators were introduced, partial moments were defined, classical moments were derived as aggregations of directional components, and the framework was shown to align naturally with measure-theoretic probability. These results demonstrated that many familiar statistical quantities arise from the same primitive structure: **directional deviations relative to a benchmark**. + +With the theoretical foundation in place, we now turn to **descriptive statistics**. 
+ +Classical descriptive statistics summarize distributions using symmetric aggregates such as the mean, variance, skewness, and kurtosis. While these quantities are useful, they obscure directional structure because they combine positive and negative deviations into a single measure. + +Directional descriptive statistics retain the information that symmetric statistics discard. Rather than collapsing deviations into aggregates, they describe distributions in terms of **directional behavior relative to benchmarks**. + +--- + +## Directional Mean Interpretation + +Recall from Chapter 5 that the mean can be expressed as the difference between directional partial moments: + +\[ +E[X] = U_1(0;X) - L_1(0;X). +\] + +This identity shows that the mean is not a primitive quantity. It is the **net directional deviation relative to the benchmark \(t = 0\)**. + +More generally, for any benchmark \(t\), + +\[ +E[X - t] = U_1(t;X) - L_1(t;X). +\] + +Thus the expectation of deviations relative to a benchmark equals the difference between + +- deviations **above the benchmark**, and +- deviations **below the benchmark**. + +If \(t = \mu\), then + +\[ +U_1(\mu;X) = L_1(\mu;X). +\] + +In words, the mean is the point at which **expected upward and downward deviations balance**. + +Directional statistics therefore interprets the mean not simply as a central location but as a **balance point between directional deviations**. + +--- + +## Directional Variance Decomposition + +Variance also has a natural directional interpretation. + +Chapter 5 showed that + +\[ +Var(X) = U_2(\mu;X) + L_2(\mu;X). +\] + +This decomposition is exact for population variance. In implementation checks, remember that `var(x)` in R is the sample variance, so matching it requires multiplying `UPM(2, mean(x), x) + LPM(2, mean(x), x)` by \(n/(n-1)\). + +This is an **exact decomposition** relative to the global mean \(\mu\), not an approximation and not a conditional-variance identity. 
+ +To avoid a common confusion, compare with the law of total variance for the split \(X\ge \mu\) versus \(X<\mu\): + +\[ +Var(X)=p\,Var(X\mid X\ge \mu)+(1-p)\,Var(X\mid X<\mu)+p(1-p)(\mu_{\ge}-\mu_{<})^2, +\] + +where \(p=P(X\ge\mu)\), \(\mu_{\ge}=E[X\mid X\ge\mu]\), and \(\mu_{<}=E[X\mid X<\mu]\). Hence + +\[ +Var(X)\ge p\,Var(X\mid X\ge \mu)+(1-p)\,Var(X\mid X<\mu), +\] + +because the between-group term is nonnegative. By contrast, partial moments already account for total variance around the **same global center** \(\mu\), so no extra between-group correction is missing: + +\[ +Var(X)=U_2(\mu;X)+L_2(\mu;X). +\] + +Equivalently, \(L_2(\mu;X)\) is the (global-mean) downside semivariance and \(U_2(\mu;X)\) is the corresponding upside semivariance. + +Variance therefore consists of two directional components: + +- **upside variance**: \(U_2(\mu;X)\) +- **downside variance**: \(L_2(\mu;X)\) + +Classical statistics reports only their sum. + +Directional descriptive statistics retain both quantities separately. + +This decomposition provides immediate insight into distributional structure. + +For example, two assets may share identical variance but differ dramatically in directional risk: + +| Distribution | \(U_2(\mu;X)\) | \(L_2(\mu;X)\) | Variance | +|---|---|---|---| +| A | 10 | 0 | 10 | +| B | 5 | 5 | 10 | + +Variance alone cannot distinguish these cases. + +Directional variance reveals whether volatility arises primarily from **upside movements** or **downside movements**. + +This distinction is particularly important in finance, economics, and risk management where negative deviations are often evaluated differently than positive ones (a topic developed more fully in Part VIII). + +--- + +## Benchmark-Relative Descriptive Statistics + +A key advantage of partial moments is that the benchmark \(t\) can be chosen externally. + +Classical descriptive statistics typically use internally determined reference points such as the mean or median. 
Directional statistics allows the analyst to describe distributions relative to **meaningful benchmarks**. + +Examples include + +- required returns in finance +- policy thresholds in economics +- forecast targets in operations +- safety limits in engineering + +Suppose \(t\) represents a target value. + +Then the first-degree partial moments describe benchmark-relative behavior: + +\[ +U_1(t;X) = E[(X-t)_+] +\] + +\[ +L_1(t;X) = E[(t-X)_+]. +\] + +These quantities measure + +- the **unconditional average excess above the benchmark**, and +- the **unconditional average shortfall below the benchmark**.¹ + +Unlike symmetric statistics, these measures directly reflect the context in which outcomes are evaluated. + +To make this concrete, consider the sample + +\[ +x=\{-2,-1,0,3,5\} +\] + +with benchmark \(t=1\). Then + +\[ +\hat{L}_1(1)=\frac{1}{5}(3+2+1+0+0)=1.2, +\quad +\hat{U}_1(1)=\frac{1}{5}(0+0+0+2+4)=1.2. +\] + +Here the unconditional average shortfall below the benchmark equals the unconditional average excess above it, even though frequencies differ: three observations fall below \(t\), none equals \(t\), and two exceed \(t\). The two moments coincide because the sample mean equals the benchmark (\(\bar{x}=1\)), which is the balance property of the mean noted earlier. This illustrates how benchmark-relative directional moments separate **how often** outcomes fall on each side from **how far** they lie from the benchmark. + +--- + +¹ These are unconditional averages over the full sample/population, not conditional expectations (e.g., not CVaR-style conditioning on tail events only). Because these quantities are expectations of deviations, they can be influenced by extreme observations within each region of the distribution. This tail sensitivity becomes particularly relevant when analyzing heavy-tailed distributions, as discussed later in Section 7.5. + +--- + +## Directional Skewness + +Skewness measures asymmetry in distributions. + +The classical skewness coefficient is + +\[ +Skew(X)= +\frac{E[(X-\mu)^3]}{Var(X)^{3/2}}.
+\] + +Using the directional decomposition, + +\[ +E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X). +\] + +Thus skewness can be written as + +\[ +Skew(X)= +\frac{U_3(\mu;X)-L_3(\mu;X)} +{(U_2(\mu;X)+L_2(\mu;X))^{3/2}}. +\] + +This expression provides a clear interpretation. + +- If \(U_3(\mu;X) > L_3(\mu;X)\), large positive deviations dominate and skewness is positive. +- If \(L_3(\mu;X) > U_3(\mu;X)\), large negative deviations dominate and skewness is negative. + +In applied settings, **whether extreme outcomes occur on the upside or the downside is often more decision-relevant than the overall asymmetry coefficient alone**. + +For example, financial return distributions with positive skewness frequently reflect patterns of frequent small losses punctuated by occasional large gains. In directional terms this corresponds to + +\[ +U_3(\mu;X) \gg L_3(\mu;X). +\] + +Conversely, strategies that produce steady small gains but occasionally experience large losses exhibit + +\[ +L_3(\mu;X) \gg U_3(\mu;X). +\] + +Directional skewness therefore identifies **which side of the distribution generates extreme asymmetry**, a distinction that symmetric skewness coefficients alone cannot fully describe. + +--- + +## Directional Kurtosis + +Kurtosis describes the magnitude of extreme deviations. + +Classically, kurtosis is often interpreted as a measure of **tail heaviness** (or sometimes distributional “peakedness”). + +The classical definition is + +\[ +Kurt(X)= +\frac{E[(X-\mu)^4]}{Var(X)^2}. +\] + +Using the directional representation, + +\[ +E[(X-\mu)^4] = U_4(\mu;X) + L_4(\mu;X). +\] + +Thus kurtosis becomes + +\[ +Kurt(X)= +\frac{U_4(\mu;X)+L_4(\mu;X)} +{(U_2(\mu;X)+L_2(\mu;X))^2}. +\] + +Directional statistics refines the classical interpretation. + +Instead of reporting only the total magnitude of extreme deviations, we may examine + +- **upper tail heaviness**: \(U_4(\mu;X)\) +- **lower tail heaviness**: \(L_4(\mu;X)\) + +Suppose two distributions share identical kurtosis. 
Classical statistics would describe both as equally heavy-tailed. + +Directional kurtosis reveals whether extreme observations arise primarily from the **upper tail** or the **lower tail**. + +For example, venture-capital portfolios may exhibit large values of + +\[ +U_4(\mu;X) +\] + +reflecting occasional extremely large gains, while certain credit portfolios may display large + +\[ +L_4(\mu;X) +\] + +reflecting rare but severe losses. + +Although both portfolios might share similar classical kurtosis, their **directional tail structures—and therefore their risk characteristics—are fundamentally different.** + +--- + +## Directional Distribution Profiles + +Combining directional partial moments across degrees produces a **directional profile of a distribution**. + +When using higher-order profiles, existence conditions matter: interpreting directional structure through order \(r\) requires the corresponding partial moments \(L_r(t;X)\) and \(U_r(t;X)\) to be finite. + +For a benchmark \(t\), the sequence + +\[ +L_0(t;X), L_1(t;X), L_2(t;X), \dots +\] + +describes + +- probability mass below the benchmark +- mean deviation below the benchmark +- variance below the benchmark +- higher-order tail structure + +Similarly, + +\[ +U_0(t;X), U_1(t;X), U_2(t;X), \dots +\] + +describe the corresponding properties **above the benchmark**. + +Together these sequences provide a detailed directional characterization of the distribution. + +To illustrate, consider a distribution with many small losses and occasional large gains. + +Relative to a benchmark \(t = 0\), such a distribution might exhibit + +\[ +L_0(t;X) \approx 0.60, \quad L_1(t;X)\text{ small}, \quad L_2(t;X)\text{ modest} +\] + +but + +\[ +U_0(t;X) \approx 0.40, \quad U_1(t;X)\text{ moderate}, \quad U_2(t;X)\text{ large}. +\] + +This directional profile indicates that losses occur more frequently, but gains—when they occur—are substantially larger. 
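
A directional profile of this kind, together with the skewness and kurtosis decompositions from the preceding sections, can be computed directly from a sample. The sketch below is one possible base R illustration under stated assumptions: `pm` is a hypothetical helper (not a package function), degree zero is treated as a proportion with observations at the benchmark counted as "below", and the simulated mixture of small losses and occasional large gains is invented for the example.

```r
# Hypothetical helper: empirical partial moment of degree r on one side of t.
pm <- function(r, t, x, side = c("lower", "upper")) {
  side <- match.arg(side)
  if (r == 0) return(if (side == "lower") mean(x <= t) else mean(x > t))
  d <- if (side == "lower") pmax(t - x, 0) else pmax(x - t, 0)
  mean(d^r)
}

set.seed(7)
# 300 small losses and 200 occasionally large gains (illustrative only).
x <- c(-rexp(300, rate = 5), rexp(200, rate = 0.5))

# Directional profile relative to the benchmark t = 0, degrees 0 through 4.
degrees <- 0:4
profile <- data.frame(
  degree = degrees,
  lower  = sapply(degrees, pm, t = 0, x = x, side = "lower"),
  upper  = sapply(degrees, pm, t = 0, x = x, side = "upper")
)
profile

# Skewness and kurtosis reassembled from components around the sample mean.
mu   <- mean(x)
m2   <- pm(2, mu, x, "upper") + pm(2, mu, x, "lower")
skew <- (pm(3, mu, x, "upper") - pm(3, mu, x, "lower")) / m2^(3 / 2)
kurt <- (pm(4, mu, x, "upper") + pm(4, mu, x, "lower")) / m2^2
c(skewness = skew, kurtosis = kurt)
```

Reading the `lower` and `upper` columns side by side shows which side of the benchmark contributes the frequency and which contributes the magnitude.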
+ +These profiles can also be visualized—for example, using bar charts of \(L_r\) and \(U_r\) across degrees \(r = 0,1,2,\dots\)—providing an intuitive graphical summary of directional distribution structure. + +Classical statistics might summarize the same distribution with a moderate mean and high variance. The directional profile reveals the **mechanism generating those aggregates**. + +Distributions that appear similar under symmetric statistics can therefore exhibit very different directional structures. Examining how deviations are distributed between the upper and lower regions often provides clearer insight into the sources of asymmetry and tail behavior. + +--- + +## Summary + +This chapter developed descriptive statistics derived from directional partial moments. + +Classical descriptive statistics summarize distributions through symmetric aggregates such as the mean, variance, skewness, and kurtosis. The directional framework reveals that each of these quantities arises from a pair of directional components measuring deviations above and below a benchmark. + +Viewing descriptive statistics in this way clarifies several important ideas. The mean represents the point at which upward and downward deviations balance. Variance combines upside and downside variability that may arise from very different sources. Higher-order moments such as skewness and kurtosis reflect asymmetries in directional tail behavior. + +More importantly, partial moments allow descriptive statistics to be defined relative to **externally meaningful benchmarks**, enabling analysts to examine distributions in the context in which outcomes are actually evaluated. + +A final bridge to the next chapter is immediate: because the degree-zero lower partial moment recovers the cumulative distribution function, the same directional framework used here for descriptive decomposition also yields direct nonparametric distribution estimation. 
+ +The next chapter builds on this descriptive framework by showing how **entire distributions can be estimated directly from partial moments**, providing a nonparametric alternative to traditional density estimation methods and avoiding the bandwidth selection problems discussed in Chapter 1. diff --git a/tools/NNS/book/chapter-08-distribution-estimation.Rmd b/tools/NNS/book/chapter-08-distribution-estimation.Rmd new file mode 100644 index 0000000..59a644d --- /dev/null +++ b/tools/NNS/book/chapter-08-distribution-estimation.Rmd @@ -0,0 +1,355 @@ +# Distribution Estimation + +Chapter 7 introduced directional descriptive statistics derived from partial moments. +Those statistics summarize distributions while preserving directional information relative to meaningful benchmarks. + +The next step is **distribution estimation**. + +Classical statistics typically represents distributions through either + +- parametric models (such as the normal distribution), or +- smoothed nonparametric estimators (such as kernel density estimation). + +Parametric models impose strong structural assumptions, while many nonparametric estimators require externally chosen smoothing parameters such as bandwidths. + +The directional framework provides a different approach. Because the cumulative distribution function is itself a partial moment, entire distributions can be estimated directly from **empirical partial moments** without parametric assumptions or externally chosen smoothing parameters. + +This chapter develops that approach. + +--- + +## The Empirical Distribution Function + +Suppose we observe a sample + +\[ +x_1, x_2, \dots, x_n. +\] + +The **empirical distribution function (EDF)** is defined as + +\[ +\hat{F}_n(t) = +\frac{1}{n}\sum_{i=1}^{n} 1_{\{x_i \le t\}}. +\] + +This quantity represents the proportion of observations less than or equal to the benchmark \(t\). + +The EDF is the classical nonparametric estimator of the cumulative distribution function. 
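
The definition above can be checked against base R's built-in `ecdf`. This is a small sketch only: `edf` is a hypothetical helper name, and the five-point sample and evaluation grid are chosen for illustration.

```r
x <- c(-3, -1, 0, 2, 4)               # illustrative sample

# EDF by definition: proportion of observations less than or equal to t.
edf <- function(t, x) mean(x <= t)

grid          <- c(-4, -3, 0, 1, 4)   # benchmarks at which to evaluate
by_definition <- sapply(grid, edf, x = x)
by_ecdf       <- ecdf(x)(grid)        # base R step-function estimator

rbind(grid, by_definition, by_ecdf)
```

Both rows agree, and the estimate moves only at observed data points, which is the step-function behavior the chapter returns to when discussing densities.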
+ +A fundamental property follows from the directional framework developed earlier. The cumulative distribution function can be written as a degree-zero partial moment: + +\[ +F_X(t) = L_0(t;X). +\] + +Consequently, the empirical distribution function can be written as + +\[ +\hat{F}_n(t) = +\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^0. +\] + +Thus the empirical distribution function is simply the **empirical degree-zero lower partial moment**. Here \((t-x_i)_+^0\) is read as the indicator \(1_{\{x_i \le t\}}\), so an observation lying exactly at the benchmark contributes 1. + +Distribution estimation therefore arises naturally within the directional framework. + +--- + +## Empirical Partial Moment Estimators + +More generally, partial moments can be estimated directly from sample data. + +For degree \(r \ge 0\), + +\[ +\hat{L}_r(t) = +\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^r +\] + +\[ +\hat{U}_r(t) = +\frac{1}{n}\sum_{i=1}^{n} (x_i-t)_+^r. +\] + +These estimators converge to the population partial moments + +\[ +L_r(t;X) = E[(t-X)_+^r] +\] + +\[ +U_r(t;X) = E[(X-t)_+^r] +\] + +by the law of large numbers, provided the population partial moments are finite. + +Thus empirical partial moments provide estimators of directional deviation structure that do not require specifying a parametric family. + +Importantly, the case \(r=0\) of \(\hat{L}_r\) produces the empirical distribution function itself. + +--- + +## Distribution Estimation from Partial Moments + +Because the cumulative distribution function equals the degree-zero partial moment, + +\[ +F_X(t) = L_0(t;X), +\] + +estimating \(L_0\) directly estimates the distribution. + +The empirical estimator + +\[ +\hat{F}_n(t) = +\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^0 +\] + +therefore provides a **nonparametric estimate of the entire distribution**. + +This estimator has several desirable properties. + +### Nonparametric + +No parametric model is assumed, so the estimator applies broadly across distributional forms. + +### Data-Driven + +The estimate depends only on the observed sample. + +### Consistency + +By the **Glivenko–Cantelli theorem**, with probability one, + +\[ +\sup_t |\hat{F}_n(t)-F_X(t)| \to 0.
+\] + +This theorem states that the empirical distribution converges uniformly to the true distribution. In practical terms, the **largest possible difference between the empirical and true cumulative distributions across all benchmarks becomes arbitrarily small as the sample size grows**. This result holds for all probability distributions under standard conditions, not only for continuous distributions. + +Thus the empirical distribution provides a reliable estimator of the entire probability distribution under the usual i.i.d. sampling framework. + +--- + +## From Distribution to Density + +While the empirical distribution function estimates the cumulative distribution, analysts often wish to estimate the **probability density function**. + +For continuous distributions, + +\[ +f(t) = \frac{d}{dt}F_X(t). +\] + +Because + +\[ +F_X(t) = L_0(t;X), +\] + +this implies + +\[ +f(t) = \frac{d}{dt}L_0(t;X). +\] + +In practice the empirical distribution function is a step function, so its derivative does not produce a smooth density estimate. + +This distinction is important: the EDF already gives a complete nonparametric estimate of the CDF, but obtaining a smooth density estimate typically requires additional regularity assumptions and/or smoothing choices. + +Classical statistics addresses this issue using **kernel density estimation**, which smooths the empirical distribution using a bandwidth parameter. Another common alternative is the **histogram estimator**, which approximates the density by counting observations within fixed intervals. However, histograms also require selecting a bin width, which plays a role analogous to the bandwidth in kernel density estimation. + +The directional framework approaches the problem differently. 
Rather than smoothing the distribution directly, it identifies structural features of the distribution—such as the **mode and local concentration of probability mass**—through data-adaptive procedures that do not require externally chosen smoothing parameters. + +The theoretical foundation for this approach is established in Chapter 13, where degree-one partial moments are shown to recover the full distribution without smoothing parameters. The computational implementation — data-adaptive partitioning that locates modes and local probability mass concentration — is developed in Chapter 18. + +Practical implementation of these methods — including sampling from empirical distributions and generating PDFs via degree manipulation of `LPM.VaR` — is demonstrated in the NNS package vignette [Sampling and Simulation](https://cran.r-project.org/web/packages/NNS/vignettes/NNSvignette_05_Sampling.html). + +--- + +## Comparison with Kernel Density Estimation + +Kernel density estimation is one of the most widely used nonparametric density estimators. + +Given a kernel function \(K(\cdot)\) and bandwidth \(h\), the estimator is + +\[ +\hat{f}(t) = +\frac{1}{nh}\sum_{i=1}^{n} +K\left(\frac{t-x_i}{h}\right). +\] + +The **kernel function** determines the shape of the local weighting (common choices include Gaussian, Epanechnikov, and uniform kernels), while the **bandwidth** controls the degree of smoothing. + +Bandwidth selection is critical. + +- If \(h\) is too small, the estimate becomes noisy. +- If \(h\) is too large, important structure may be obscured. + +Selecting an appropriate bandwidth often requires cross-validation or heuristic rules. + +Empirical partial moment estimators avoid this issue for CDF estimation because they do not rely on smoothing parameters: the distribution estimate arises directly from the data. By contrast, smooth density estimation generally reintroduces smoothing or shape assumptions. 
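
The bandwidth sensitivity described in this section can be seen on a small simulated example. The sketch below is illustrative rather than definitive: `kde_at` is a hypothetical helper that evaluates the kernel estimator formula above with a Gaussian kernel, and the bimodal sample and the three bandwidths are arbitrary choices.

```r
set.seed(1)
x <- c(rnorm(100, -2, 0.5), rnorm(100, 2, 0.5))   # bimodal, illustrative

# Kernel density estimate at t with Gaussian kernel K and bandwidth h:
# (1 / (n h)) * sum K((t - x_i) / h).
kde_at <- function(t, x, h) mean(dnorm((t - x) / h)) / h

# Density estimate at t = 0 (between the two modes) for three bandwidths.
bandwidths <- c(0.1, 0.5, 2)
kde_values <- sapply(bandwidths, function(h) kde_at(0, x, h))
setNames(kde_values, paste0("h=", bandwidths))

# The EDF at the same point involves no tuning parameter.
mean(x <= 0)
```

The density estimate at the same point changes with `h`, while the CDF estimate is a single number determined by the data alone.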
+ +--- + +## Example: Empirical Partial Moments + +Consider the observations + +\[ +x = \{-3,-1,0,2,4\}. +\] + +Let the benchmark be + +\[ +t = 1. +\] + +The empirical distribution function becomes + +\[ +\hat{F}_n(1)=\frac{3}{5}=0.6 +\] + +since three observations are less than or equal to 1. + +Now compute the first-degree empirical partial moments. + +Lower partial moment: + +\[ +\hat{L}_1(1)= +\frac{1}{5}\sum_{i=1}^{5}(1-x_i)_+ +\] + +\[ += +\frac{1}{5}(4+2+1+0+0) += +1.4 +\] + +Upper partial moment: + +\[ +\hat{U}_1(1)= +\frac{1}{5}\sum_{i=1}^{5}(x_i-1)_+ +\] + +\[ += +\frac{1}{5}(0+0+0+1+3) += +0.8. +\] + +These quantities describe the distribution relative to the benchmark \(t=1\): + +- 60% of observations lie below the benchmark +- the unconditional average shortfall below the benchmark is 1.4 +- the unconditional average excess above the benchmark is 0.8 + +Now compare with benchmark \(t=0\): + +\[ +\hat{F}_n(0)=\frac{3}{5}=0.6, +\quad +\hat{L}_1(0)=\frac{1}{5}(3+1+0+0+0)=0.8, +\quad +\hat{U}_1(0)=\frac{1}{5}(0+0+0+2+4)=1.2. +\] + +At \(t=1\), unconditional shortfall dominates unconditional excess (1.4 vs 0.8); at \(t=0\), unconditional excess dominates unconditional shortfall (1.2 vs 0.8). This illustrates how directional conclusions can change with the benchmark, even for the same sample. + +Together, these statistics provide a directional description of the distribution that complements the empirical distribution function. + +--- + +## Tail Sensitivity + +Because empirical partial moments aggregate deviations relative to benchmarks, they naturally reveal **tail structure**. + +Consider the first-degree lower partial moment + +\[ +\hat{L}_1(t) = +\frac{1}{n}\sum_{i=1}^{n}(t-x_i)_+. +\] + +This quantity measures the **unconditional average shortfall below the benchmark**. + +Similarly, + +\[ +\hat{U}_1(t) = +\frac{1}{n}\sum_{i=1}^{n}(x_i-t)_+. +\] + +measures the **unconditional average excess above the benchmark**. 
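
The worked example from the previous section, and the first-degree moments defined here, can be reproduced in a few lines of base R. This is a sketch only; `l1` and `u1` are hypothetical helper names introduced for the illustration.

```r
x <- c(-3, -1, 0, 2, 4)                    # sample from the worked example

l1 <- function(t) mean(pmax(t - x, 0))     # unconditional average shortfall
u1 <- function(t) mean(pmax(x - t, 0))     # unconditional average excess

benchmarks <- c(0, 1)
result <- data.frame(
  t   = benchmarks,
  F_n = sapply(benchmarks, function(t) mean(x <= t)),
  L1  = sapply(benchmarks, l1),
  U1  = sapply(benchmarks, u1)
)
result

# Check of the identity E[X - t] = U_1(t) - L_1(t) at each benchmark.
result$U1 - result$L1
mean(x) - benchmarks
```

The last two lines return the same values, which is the first-degree identity from Chapter 7 applied at two different benchmarks.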
+ +By examining these quantities across benchmarks \(t\), analysts can explore how deviations accumulate in the lower and upper regions of the distribution. + +The influence of extreme observations depends on the order of the partial moment. When \(r=0\) (the empirical distribution function), each observation contributes only through an indicator function and therefore influences the estimate equally regardless of magnitude. For \(r \ge 1\), however, deviations enter the calculation through powers of the distance from the benchmark. As the order \(r\) increases, extreme observations exert progressively greater influence on the estimate, reflecting the increasing emphasis on tail behavior. + +--- + +## Robustness Properties + +Empirical distribution estimators possess several robustness advantages that follow directly from their nonparametric construction, while still inheriting standard finite-sample variability. + +First, they reduce **model risk**. Because no parametric distribution is imposed, misspecification from choosing an incorrect family is avoided. + +Second, the estimator is **transparent**. Each observation contributes directly to the estimate through the indicator function \(1_{\{x_i \le t\}}\), ensuring that the distribution estimate reflects the empirical data without additional smoothing or transformation. + +Third, the estimator improves **systematically with sample size**. As additional observations are collected, the empirical distribution converges uniformly to the true distribution. + +Because partial moments measure deviations relative to benchmarks, extreme observations influence the estimates in proportion to their deviation magnitude when \(r \ge 1\). In applications such as risk management, where extreme outcomes carry important information, this sensitivity can be desirable because it preserves tail behavior that smoothing-based estimators may dilute. 
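
The dependence of tail sensitivity on the degree can be made concrete by replacing one observation with an extreme value and comparing lower partial moments before and after. The following is a hedged sketch: `lpm` is a hypothetical helper, degree zero is treated as the proportion at or below the benchmark, and the outlier value is invented for illustration.

```r
# Hypothetical helper: empirical lower partial moment of degree r.
lpm <- function(r, t, x) if (r == 0) mean(x <= t) else mean(pmax(t - x, 0)^r)

x_base    <- c(-3, -1, 0, 2, 4)
x_outlier <- c(-30, -1, 0, 2, 4)    # same sample with one extreme loss
t <- 1

# Ratio of the moment with the outlier to the moment without it, by degree.
degrees <- 0:3
ratio <- sapply(degrees, function(r) lpm(r, t, x_outlier) / lpm(r, t, x_base))
setNames(ratio, paste0("r=", degrees))
```

The degree-zero ratio is exactly 1 because the outlier stays on the same side of the benchmark, while the ratios for higher degrees grow quickly with \(r\).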
+ +--- + +## Directional Distribution Analysis + +Combining empirical partial moments across benchmarks provides a detailed description of the distribution. + +For example, evaluating + +\[ +\hat{L}_0(t),\quad \hat{L}_1(t),\quad \hat{L}_2(t) +\] + +across values of \(t\) reveals + +- probability mass below each benchmark, +- average deviation below each benchmark, +- variance contribution below each benchmark. + +Similarly, + +\[ +\hat{U}_0(t),\quad \hat{U}_1(t),\quad \hat{U}_2(t) +\] + +describe corresponding behavior above the benchmark. + +Together these quantities form a **directional representation of the distribution**. + +Rather than summarizing the data with a few symmetric statistics, the directional framework allows analysts to examine how probability mass and deviation magnitudes accumulate across different regions of the distribution. + +--- + +## Summary + +This chapter examined distribution estimation from the perspective of directional statistics. + +A key observation is that the cumulative distribution function itself is a partial moment. Consequently, empirical partial moments provide a natural nonparametric method for estimating entire probability distributions. + +Several conclusions follow. + +First, the empirical distribution function is the empirical degree-zero lower partial moment. Second, empirical partial moments provide consistent estimators of directional deviation structure. Third, distribution estimation can be performed without parametric assumptions or externally chosen smoothing parameters. + +While the empirical distribution function provides a complete description of the distribution, applied analysis also requires understanding how distributions interact across variables and across states of the sample space. 
+ +The next chapter begins Part III on dependence by showing why classical correlation can fail under nonlinear and asymmetric structures, motivating directional dependence measures built from the same partial-moment foundation. diff --git a/tools/NNS/book/chapter-09-why-correlation-fails.Rmd b/tools/NNS/book/chapter-09-why-correlation-fails.Rmd new file mode 100644 index 0000000..918f3db --- /dev/null +++ b/tools/NNS/book/chapter-09-why-correlation-fails.Rmd @@ -0,0 +1,586 @@ +# Why Correlation Fails + +Chapters 7 and 8 developed descriptive statistics and distribution estimation using directional partial moments. +Those results showed that many classical statistical quantities arise from aggregations of directional deviations relative to benchmarks. + +The next topic is **dependence between variables**. + +Classical statistics measures dependence primarily through **covariance and correlation**. +These statistics summarize relationships between variables with a single number. + +However, these measures possess fundamental limitations. They + +- measure only **linear association**, +- aggregate directional information symmetrically, +- and can obscure nonlinear, asymmetric, or tail-specific relationships. + +Directional statistics provides a deeper perspective. +Just as classical moments arise from aggregations of partial moments, **covariance and correlation arise from aggregations of directional co-partial moments**. + +This chapter explains why correlation can fail and establishes the connection between covariance and directional partial-moment matrices. + +--- + +## Classical Dependence Measures + +For two random variables \(X\) and \(Y\), the **covariance** is + +\[ +\operatorname{Cov}(X,Y) += +E[(X-\mu_X)(Y-\mu_Y)]. +\] + +Covariance measures the joint variation of the two variables relative to their means. + +The **Pearson correlation coefficient** standardizes covariance: + +\[ +\rho(X,Y) += +\frac{\operatorname{Cov}(X,Y)} +{\sigma_X\sigma_Y}. 
+\] + +The statistic lies in the interval + +\[ +-1 \le \rho(X,Y) \le 1. +\] + +Values near + +- \(1\) indicate strong positive linear association, +- \(-1\) indicate strong negative linear association, +- \(0\) indicate no linear association. + +The key limitation is that correlation measures **only linear relationships**. +If dependence is nonlinear, asymmetric, or concentrated in tails, correlation may understate it or miss it entirely. + +--- + +## Directional Co-Partial Moments + +Directional statistics partitions the joint distribution relative to benchmark values \(t_X\) and \(t_Y\). + +Four directional regions arise: + +\[ +X \le t_X,\; Y \le t_Y, +\] + +\[ +X \le t_X,\; Y > t_Y, +\] + +\[ +X > t_X,\; Y \le t_Y, +\] + +\[ +X > t_X,\; Y > t_Y. +\] + +These regions correspond to combinations of directional deviations. + +Benchmarks may be chosen in different ways depending on the application: + +- **External benchmarks**, such as policy targets, required returns, safety thresholds, or liability levels. +- **Internal benchmarks**, such as sample means, medians, or other distribution-derived reference points. + +The covariance decomposition developed in the next section uses the **means**: + +\[ +t_X = \mu_X, +\qquad +t_Y = \mu_Y. +\] + +Define the positive-part operator + +\[ +(x)_+ = \max(x,0). +\] + +The directional co-partial moments of order \(r,s\) are: + +### Co-Lower Partial Moment + +\[ +\operatorname{CoLPM}_{r,s}(X,Y) += +E[(t_X-X)_+^r (t_Y-Y)_+^s]. +\] + +This measures concordant lower-side co-movement: both variables are below their benchmarks. + +### Co-Upper Partial Moment + +\[ +\operatorname{CoUPM}_{r,s}(X,Y) += +E[(X-t_X)_+^r (Y-t_Y)_+^s]. +\] + +This measures concordant upper-side co-movement: both variables are above their benchmarks. + +### Divergent Lower Partial Moment + +\[ +\operatorname{DLPM}_{r,s}(X,Y) += +E[(X-t_X)_+^r (t_Y-Y)_+^s]. 
+\] + +This measures one divergent direction: \(X\) is above its benchmark while \(Y\) is below its benchmark. + +### Divergent Upper Partial Moment + +\[ +\operatorname{DUPM}_{r,s}(X,Y) += +E[(t_X-X)_+^r (Y-t_Y)_+^s]. +\] + +This measures the opposite divergent direction: \(X\) is below its benchmark while \(Y\) is above its benchmark. + +Together, these four quantities provide a **directional decomposition of dependence structure**. + +--- + +## Covariance from Co-Partial Moments + +Covariance can be expressed directly in terms of directional co-partial moments. + +Let the benchmarks equal the means: + +\[ +t_X=\mu_X, +\qquad +t_Y=\mu_Y. +\] + +From the directional decomposition introduced in Chapter 2, + +\[ +x = x_+ - (-x)_+. +\] + +Applying this to deviations gives + +\[ +X-\mu_X = (X-\mu_X)_+ - (\mu_X-X)_+ +\] + +and + +\[ +Y-\mu_Y = (Y-\mu_Y)_+ - (\mu_Y-Y)_+. +\] + +Define + +\[ +A=(X-\mu_X)_+, +\qquad +B=(\mu_X-X)_+, +\] + +\[ +C=(Y-\mu_Y)_+, +\qquad +D=(\mu_Y-Y)_+. +\] + +Then + +\[ +(X-\mu_X)(Y-\mu_Y) += +(A-B)(C-D). +\] + +Expanding gives + +\[ +(A-B)(C-D) += +AC + BD - AD - BC. +\] + +Substituting the definitions yields + +\[ +\begin{aligned} +(X-\mu_X)(Y-\mu_Y) +&= +(X-\mu_X)_+(Y-\mu_Y)_+ \\ +&\quad+ +(\mu_X-X)_+(\mu_Y-Y)_+ \\ +&\quad- +(X-\mu_X)_+(\mu_Y-Y)_+ \\ +&\quad- +(\mu_X-X)_+(Y-\mu_Y)_+. +\end{aligned} +\] + +Taking expectations gives + +\[ +\operatorname{Cov}(X,Y) += +\operatorname{CoUPM}_{1,1}(X,Y) ++ +\operatorname{CoLPM}_{1,1}(X,Y) +- +\operatorname{DLPM}_{1,1}(X,Y) +- +\operatorname{DUPM}_{1,1}(X,Y). +\] + +Thus covariance is the **signed aggregation of four directional co-partial moments**. + +This mirrors the earlier variance decomposition + +\[ +\operatorname{Var}(X) += +U_2(\mu;X)+L_2(\mu;X). +\] + +The difference is that covariance requires both concordant and divergent directional components. Concordant components enter positively. Divergent components enter negatively. 
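
The scalar decomposition derived in this section can be verified for a single pair of variables without any package dependency. The sketch below uses base R only; `pos` and the four component names are hypothetical labels chosen to mirror the definitions above, and the simulated data are illustrative.

```r
pos <- function(z) pmax(z, 0)           # positive-part operator (z)_+

set.seed(123)
x  <- rnorm(200)
y  <- 0.5 * x + rnorm(200)              # illustrative linear dependence
dx <- x - mean(x)
dy <- y - mean(y)

co_upm <- mean(pos(dx)  * pos(dy))      # both above their means
co_lpm <- mean(pos(-dx) * pos(-dy))     # both below their means
d_lpm  <- mean(pos(dx)  * pos(-dy))     # X above, Y below
d_upm  <- mean(pos(-dx) * pos(dy))      # X below, Y above

# Signed aggregation gives the population-style covariance.
pop_cov <- co_upm + co_lpm - d_lpm - d_upm
n <- length(x)
c(reassembled = pop_cov * n / (n - 1), cov_xy = cov(x, y))
```

As with variance, the factor \(n/(n-1)\) is needed only because `cov` in R reports the sample covariance.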
+ +--- + +## Covariance Matrices from Partial-Moment Matrices + +For a system of \(N\) variables, directional co-partial moments form matrices. + +Define the degree-1 directional matrices by + +\[ +\operatorname{CoLPM}_{ij} += +\operatorname{CoLPM}_{1,1}(X_i,X_j), +\] + +\[ +\operatorname{CoUPM}_{ij} += +\operatorname{CoUPM}_{1,1}(X_i,X_j), +\] + +\[ +\operatorname{DLPM}_{ij} += +\operatorname{DLPM}_{1,1}(X_i,X_j), +\] + +\[ +\operatorname{DUPM}_{ij} += +\operatorname{DUPM}_{1,1}(X_i,X_j). +\] + +Each matrix captures directional co-movement across the variables. + +The classical covariance matrix can be written as + +\[ +\Sigma += +\operatorname{CoLPM} ++ +\operatorname{CoUPM} +- +\operatorname{DLPM} +- +\operatorname{DUPM}. +\] + +The diagonal elements satisfy + +\[ +\Sigma_{ii} += +\operatorname{Var}(X_i). +\] + +This follows because when \(i=j\), the divergent partial moments vanish. A variable cannot be both above and below its own benchmark at the same observation. The expression therefore reduces to the variance decomposition derived earlier: + +\[ +\operatorname{Var}(X_i) += +U_2(\mu_i;X_i)+L_2(\mu_i;X_i). +\] + +Like their univariate counterparts, these directional matrices can be **estimated empirically from sample data** using sample co-partial moments. + +```{r pm-matrix-example} +library(NNS) + +set.seed(123) +x <- rnorm(100) +y <- rnorm(100) + +cov.mtx <- PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = cbind(x, y), + pop_adj = TRUE +) + +cov.mtx + +# Reassembled covariance matrix +cov.mtx$clpm + cov.mtx$cupm - cov.mtx$dlpm - cov.mtx$dupm + +# Standard covariance matrix +cov(cbind(x, y)) +``` + +The reassembled matrix is identical to the standard covariance matrix, confirming the degree-1 directional decomposition in empirical form. + +--- + +## Gram-Matrix Structure of Concordant Co-Partial Moment Matrices + +The concordant co-partial moment matrices have a simple linear algebra structure. 
+ +For a system of \(N\) variables observed over \(T\) periods, define the lower directional-deviation matrix \(L^{(r)}\) by + +\[ +L^{(r)}_{t i} += +(t_i-X_{i,t})_+^r. +\] + +The \(i\)-th column of \(L^{(r)}\) is the lower directional-deviation vector for variable \(X_i\). + +Then the co-lower partial moment matrix is + +\[ +\operatorname{CoLPM}^{(r)} += +\frac{1}{T} +\left(L^{(r)}\right)^\top L^{(r)}. +\] + +Similarly, define the upper directional-deviation matrix \(U^{(r)}\) by + +\[ +U^{(r)}_{t i} += +(X_{i,t}-t_i)_+^r. +\] + +Then the co-upper partial moment matrix is + +\[ +\operatorname{CoUPM}^{(r)} += +\frac{1}{T} +\left(U^{(r)}\right)^\top U^{(r)}. +\] + +Thus each concordant co-partial moment matrix is a **Gram matrix**: its entries are pairwise inner products of directional-deviation vectors. + +For any weight vector \(w\), + +\[ +\begin{aligned} +w^\top \operatorname{CoLPM}^{(r)} w +&= +\frac{1}{T} +w^\top +\left(L^{(r)}\right)^\top +L^{(r)} +w \\ +&= +\frac{1}{T} +\left\|L^{(r)}w\right\|^2 \\ +&\geq 0. +\end{aligned} +\] + +Therefore \(\operatorname{CoLPM}^{(r)}\) is positive semidefinite. The same argument applies to \(\operatorname{CoUPM}^{(r)}\). + +This explains why concordant co-partial moment matrices are symmetric and positive semidefinite: they are matrices of inner products between directional deviation vectors. The result is structural, not distributional. It does not require normality, linearity, or parametric assumptions. + +This point should not be confused with the covariance reconstruction above. The concordant matrices \(\operatorname{CoLPM}\) and \(\operatorname{CoUPM}\) are positive semidefinite Gram matrices. The covariance matrix is a **signed aggregation** that subtracts the divergent matrices: + +\[ +\Sigma += +\operatorname{CoLPM} ++ +\operatorname{CoUPM} +- +\operatorname{DLPM} +- +\operatorname{DUPM}. +\] + +The Gram structure explains why the directional building blocks are well-behaved. 
The signed aggregation explains how classical covariance is recovered from those building blocks. + +--- + +## Correlation as a Normalized Covariance + +The correlation matrix is obtained by standardizing covariance: + +\[ +\rho_{ij} += +\frac{\Sigma_{ij}} +{\sqrt{\Sigma_{ii}\Sigma_{jj}}}. +\] + +Since covariance itself is derived from directional matrices, correlation represents a further aggregation. + +The information hierarchy becomes + +\[ +(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM}) +\rightarrow +\Sigma +\rightarrow +\rho. +\] + +Directional matrices therefore preserve more structural information about dependence than correlation alone. + +Correlation is useful when a single linear summary is appropriate. It is incomplete when the dependence structure differs across lower, upper, or divergent regions. + +--- + +## Nonlinear Dependence + +Correlation measures linear association and can therefore miss relationships that are nonlinear. + +Consider + +\[ +Y=X^2 +\] + +with \(X\) symmetrically distributed around zero and possessing a finite third moment. + +In this case \(\operatorname{Cov}(X,Y)=E[X^3]=0\), so + +\[ +\operatorname{Corr}(X,Y)=0. +\] + +For example, if \(X\sim N(0,1)\), then + +\[ +\operatorname{Corr}(X,X^2)=0. +\] + +Despite zero correlation, the variables are perfectly dependent: \(Y\) is a deterministic function of \(X\). + +Directional co-partial moments reveal this structure. + +With benchmark \(t_X=0\), + +- \(\operatorname{CoUPM}\) captures dependence when \(X>0\) and \(Y\) is above its benchmark, +- \(\operatorname{DUPM}\) captures the mirrored dependence when \(X<0\) and \(Y\) is above its benchmark, +- if the benchmark chosen for \(Y\) is positive, \(\operatorname{DLPM}\) (\(X>0\)) and \(\operatorname{CoLPM}\) (\(X<0\)) capture the corresponding mirrored pair where \(Y\) is below its benchmark. + +The directional matrices expose strong dependence that the aggregated covariance can cancel. + +--- + +## Asymmetric Dependence + +**Asymmetric dependence** refers to dependence that differs between the upper and lower regions of the joint distribution.
+ +Examples include + +- financial assets that move together primarily during crashes, +- economic variables responding differently to positive and negative shocks, +- risk exposures concentrated in losses. + +Directional matrices isolate these effects directly. + +For example, + +\[ +\operatorname{CoLPM} +\] + +captures joint downside deviations, while + +\[ +\operatorname{CoUPM} +\] + +captures joint upside co-movement. + +If dependence is concentrated in one region, the directional matrices reveal it even when overall covariance appears modest. + +--- + +## Tail Dependence + +Extreme events often drive the most consequential relationships. + +Correlation averages dependence across the entire distribution and therefore may understate tail relationships. + +Directional co-partial moments of higher order emphasize extreme deviations: + +\[ +\operatorname{CoLPM}_{r,s}, +\qquad +\operatorname{CoUPM}_{r,s}. +\] + +Increasing \(r\) and \(s\) increases sensitivity to extreme observations. + +This concept is closely related to **tail dependence in copula theory**, which Chapter 10 examines in detail. + +--- + +## Information Loss in Aggregation + +The mapping from directional matrices to covariance is **many-to-one**. + +Different directional dependence structures can produce identical covariance values. + +Similarly, many covariance matrices produce identical correlation matrices after normalization. + +Thus correlation discards substantial structural information about joint distributions. + +Directional methods preserve this information by retaining contributions from each directional region separately. + +The directional representation is therefore strictly richer: + +\[ +\text{directional co-partial moments} +\rightarrow +\text{covariance} +\rightarrow +\text{correlation}. +\] + +Each arrow aggregates information. Once aggregated, the lost directional structure cannot generally be recovered without additional assumptions. 
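The many-to-one mapping can be illustrated with a small base R sketch. It is a constructed example, not an NNS routine: the skewed common-shock design and the helper names `co_lpm` and `co_upm` are assumptions made for illustration. Reflecting both variables through the origin leaves covariance and correlation unchanged while the two concordant components trade places.

```r
# Illustrative sketch (base R only): two samples with the same covariance
# but different directional structure. Helper names are hypothetical.
co_lpm <- function(x, y) mean(pmax(mean(x) - x, 0) * pmax(mean(y) - y, 0))
co_upm <- function(x, y) mean(pmax(x - mean(x), 0) * pmax(y - mean(y), 0))

set.seed(7)
n <- 1000
shock <- rexp(n)                  # right-skewed common shock (assumed design)
x <- shock + rnorm(n, sd = 0.2)
y <- shock + rnorm(n, sd = 0.2)

# Reflect both variables through the origin
x_ref <- -x
y_ref <- -y

# Covariance and correlation are unchanged by the reflection
all.equal(cov(x, y), cov(x_ref, y_ref))
all.equal(cor(x, y), cor(x_ref, y_ref))

# The concordant components swap roles between the two samples
c(CoLPM = co_lpm(x, y),         CoUPM = co_upm(x, y))
c(CoLPM = co_lpm(x_ref, y_ref), CoUPM = co_upm(x_ref, y_ref))
```

An analyst given only the covariance could not tell which of the two samples carries its co-movement on the upside and which on the downside.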
+ +--- + +## Summary + +This chapter examined the limitations of classical correlation and covariance. + +Key observations include: + +1. Correlation measures only linear association. +2. Covariance aggregates directional co-deviations across the joint distribution. +3. Covariance itself arises from directional co-partial moments. +4. The covariance matrix equals a signed aggregation of directional partial-moment matrices. +5. Concordant co-partial moment matrices are Gram matrices and are therefore symmetric and positive semidefinite. +6. Correlation is the normalized version of the covariance aggregate. + +Directional statistics therefore provides a richer representation of dependence structure. + +The following chapter develops directional dependence measures built from directional co-partial moments. diff --git a/tools/NNS/book/chapter-10-directional-dependence.Rmd b/tools/NNS/book/chapter-10-directional-dependence.Rmd new file mode 100644 index 0000000..ad85a3f --- /dev/null +++ b/tools/NNS/book/chapter-10-directional-dependence.Rmd @@ -0,0 +1,387 @@ +# Directional Dependence + +Chapter 9 showed that classical covariance and correlation arise from **aggregations of directional co-partial moments**. While correlation summarizes joint variation with a single symmetric statistic, many real-world relationships are **nonlinear, asymmetric, or concentrated in extreme events**. + +A familiar example occurs in financial markets. During ordinary periods, many assets appear weakly correlated. Yet during crises, losses often occur simultaneously across markets. Correlation averages across all observations and therefore may fail to capture this type of **asymmetric tail dependence**. + +Directional statistics addresses this limitation by examining how variables move relative to **benchmarks for each variable simultaneously**. 
Instead of collapsing joint behavior into a single number, the directional framework partitions the joint distribution and measures deviations within each region separately. + +This chapter develops **directional dependence** using co-partial moments. These statistics preserve the directional structure of the joint distribution and reveal nonlinear and asymmetric relationships that classical correlation can obscure. + +--- + +## Directional Benchmarks + +Let \(X\) and \(Y\) be random variables with benchmarks \(t_X\) and \(t_Y\). + +Benchmarks may be chosen in several ways depending on the application: + +- **Internal benchmarks**, such as the mean or median. +- **External benchmarks**, such as target returns or policy thresholds. +- **Context-specific benchmarks**, reflecting operational constraints or decision thresholds. + +The benchmarks partition the joint distribution into four directional regions: + +\[ +X \le t_X, \quad Y \le t_Y +\] + +\[ +X \le t_X, \quad Y > t_Y +\] + +\[ +X > t_X, \quad Y \le t_Y +\] + +\[ +X > t_X, \quad Y > t_Y +\] + +These four regions represent combinations of directional deviations for the two variables. + +| | \(Y \le t_Y\) | \(Y > t_Y\) | +|---|---|---| +| \(X \le t_X\) | **CoLPM region** | **DUPM region** | +| \(X > t_X\) | **DLPM region** | **CoUPM region** | + +Each quadrant corresponds directly to one of the four directional co-partial moments. + +--- + +## Co-Partial Moments + +Let the positive-part operator be + +\[ +(x)_+ = \max(x,0). +\] + +Directional co-partial moments measure joint deviations relative to the benchmarks. + +### Co-Lower Partial Moment + +\[ +CoLPM_{r,s}(X,Y) += +E[(t_X-X)_+^r (t_Y-Y)_+^s] +\] + +Joint deviations **below both benchmarks**. + +### Co-Upper Partial Moment + +\[ +CoUPM_{r,s}(X,Y) += +E[(X-t_X)_+^r (Y-t_Y)_+^s] +\] + +Joint deviations **above both benchmarks**.
+ +### Divergent Lower Partial Moment + +\[ +DLPM_{r,s}(X,Y) += +E[(X-t_X)_+^r (t_Y-Y)_+^s] +\] + +\(X\) above its benchmark while \(Y\) falls below. + +### Divergent Upper Partial Moment + +\[ +DUPM_{r,s}(X,Y) += +E[(t_X-X)_+^r (Y-t_Y)_+^s] +\] + +\(X\) below its benchmark while \(Y\) exceeds its benchmark. + +Together these four quantities provide a **directional decomposition of joint dependence**. + +--- + +## Worked Example + +Consider the sample + +\[ +(X,Y) = +(-3,-2), (-1,-1), (0,1), (2,4), (3,5). +\] + +Let the benchmarks be + +\[ +t_X = 0, \quad t_Y = 0. +\] + +Compute first-degree co-partial moments. + +### CoLPM + +\[ +CoLPM_{1,1} += +\frac{1}{5}(3\cdot2 + 1\cdot1 + 0 + 0 + 0) += +\frac{7}{5} += +1.4 +\] + +### CoUPM + +\[ +CoUPM_{1,1} += +\frac{1}{5}(0 + 0 + 0 + 2\cdot4 + 3\cdot5) += +\frac{23}{5} += +4.6 +\] + +### DLPM + +\[ +DLPM_{1,1} = 0 +\] + +### DUPM + +\[ +DUPM_{1,1} = 0 +\] + +The interpretation is immediate. + +- Downside dependence exists but is modest. +- Upside deviations occur together strongly. +- Divergent movements do not occur. + +In this dataset, whenever \(X\) is above its benchmark, \(Y\) is also above its benchmark, and whenever \(X\) is below its benchmark, \(Y\) is also below its benchmark. Consequently observations never fall into divergent regions. + +Boundary observations contribute zero to all co-partial moments of positive degree. For example, the point \((0,1)\) lies exactly on the \(X\) benchmark, so both \((X-t_X)_+\) and \((t_X-X)_+\) equal zero. + +Real datasets rarely exhibit such perfect alignment, and in practice the divergent moments capture regions where one variable rises while the other falls. + +--- + +## Dependence Versus Correlation + +When the benchmarks are the means, \(t_X=\mu_X\) and \(t_Y=\mu_Y\), covariance aggregates the degree-1 directional components: + +\[ +Cov(X,Y) += +CoUPM_{1,1} ++ +CoLPM_{1,1} +- +DLPM_{1,1} +- +DUPM_{1,1}. +\] + +With other benchmarks the same signed sum equals \(E[(X-t_X)(Y-t_Y)]\) instead. In the worked example above, with zero benchmarks, it is \(1.4+4.6=6\), the sample mean of \(XY\), not the covariance. + +Correlation further standardizes covariance.
+ +\[ +(CoLPM,CoUPM,DLPM,DUPM) +\rightarrow +Cov(X,Y) +\rightarrow +Corr(X,Y) +\] + +Directional statistics therefore preserves structural information lost through aggregation. + +--- + +## Nonlinear Dependence Detection + +Directional dependence can reveal nonlinear relationships that correlation cannot detect. + +Consider + +\[ +Y = X^2 +\] + +with \(X\) symmetrically distributed around zero. + +In this case + +\[ +Corr(X,Y)=0. +\] + +Despite zero correlation, the variables are perfectly dependent. + +Directional moments reveal the structure. Take both benchmarks to be zero, \(t_X=t_Y=0\). + +When \(X>0\), both \(X\) and \(Y=X^2\) exceed their benchmarks, producing contributions to the **CoUPM region**. + +When \(X<0\), \(X\) lies below its benchmark while \(Y=X^2\) remains positive and therefore above its benchmark. These observations fall into the **DUPM region**, capturing the mirrored dependence structure. + +Thus the directional decomposition exposes dependence that the symmetric aggregation in correlation cancels. + +--- + +## Asymmetric Dependence + +Many systems exhibit **asymmetric dependence**, where relationships differ between positive and negative deviations. + +Financial markets provide a common example. + +Assets may behave largely independently during rising markets but move strongly together during market crashes. + +In such cases + +\[ +CoLPM_{1,1} \gg CoUPM_{1,1}. +\] + +A replicable simulation makes this asymmetry explicit.
In the construction below, **negative shocks are shared** between both variables, while positive-side behavior is generated independently: + +```r +library(NNS) +set.seed(42) +n <- 500 +shock <- rnorm(n) + +x <- ifelse(shock < 0, shock, rnorm(n)) +y <- ifelse(shock < 0, shock + rnorm(n, 0, 0.1), rnorm(n)) + +Co.LPM(1, x, y, mean(x), mean(y)) +## [1] 0.2770795 +Co.UPM(1, x, y, mean(x), mean(y)) +## [1] 0.2103299 +D.LPM(1, 1, x, y, mean(x), mean(y)) +## [1] 0.06191035 +D.UPM(1, 1, x, y, mean(x), mean(y)) +## [1] 0.08481611 +``` + +In typical runs, the concordant downside component exceeds the upside component, confirming that joint downside co-movement dominates while upside dependence remains weaker. + +Correlation averages across all regions and may therefore appear moderate even when downside dependence dominates. + +--- + +## Tail-Sensitive Dependence + +Higher-order co-partial moments emphasize extreme deviations. + +Increasing the orders \(r\) and \(s\) increases sensitivity to large observations. + +\[ +CoLPM_{r,s}, \quad CoUPM_{r,s} +\] + +measure **tail dependence**. + +For example, a risk manager concerned with extreme joint losses may examine \(CoLPM_{2,2}\) or higher orders rather than \(CoLPM_{1,1}\), since larger powers place greater weight on large deviations. + +--- + +## Empirical Estimation + +Directional co-partial moments can be estimated from data. + +For observations + +\[ +(x_i,y_i), i=1,\dots,n +\] + +the empirical estimators are + +\[ +\widehat{CoLPM}_{r,s} += +\frac{1}{n} +\sum_{i=1}^{n} +(t_X-x_i)_+^r (t_Y-y_i)_+^s +\] + +\[ +\widehat{CoUPM}_{r,s} += +\frac{1}{n} +\sum_{i=1}^{n} +(x_i-t_X)_+^r (y_i-t_Y)_+^s +\] + +\[ +\widehat{DLPM}_{r,s} += +\frac{1}{n} +\sum_{i=1}^{n} +(x_i-t_X)_+^r (t_Y-y_i)_+^s +\] + +\[ +\widehat{DUPM}_{r,s} += +\frac{1}{n} +\sum_{i=1}^{n} +(t_X-x_i)_+^r (y_i-t_Y)_+^s +\] + +These converge to population values by the law of large numbers. 
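The four estimators translate directly into code. The sketch below uses base R only, and the helper names `pos`, `co_lpm_hat`, `co_upm_hat`, `d_lpm_hat`, and `d_upm_hat` are hypothetical; the NNS package supplies its own functions, which are not reproduced here. The sample is the one from the worked example earlier in this chapter, with both benchmarks at zero.

```r
# Illustrative base R versions of the four empirical estimators
# (hypothetical helper names; not the NNS package functions).
pos <- function(z) pmax(z, 0)   # positive-part operator

co_lpm_hat <- function(x, y, tx, ty, r = 1, s = 1) mean(pos(tx - x)^r * pos(ty - y)^s)
co_upm_hat <- function(x, y, tx, ty, r = 1, s = 1) mean(pos(x - tx)^r * pos(y - ty)^s)
d_lpm_hat  <- function(x, y, tx, ty, r = 1, s = 1) mean(pos(x - tx)^r * pos(ty - y)^s)
d_upm_hat  <- function(x, y, tx, ty, r = 1, s = 1) mean(pos(tx - x)^r * pos(y - ty)^s)

# Sample from the worked example, with both benchmarks at zero
x <- c(-3, -1, 0, 2, 3)
y <- c(-2, -1, 1, 4, 5)

co_lpm_hat(x, y, 0, 0)   # 7/5
co_upm_hat(x, y, 0, 0)   # 23/5
d_lpm_hat(x, y, 0, 0)    # 0
d_upm_hat(x, y, 0, 0)    # 0

# With mean benchmarks, the signed aggregate equals the population (1/n) covariance
tx <- mean(x)
ty <- mean(y)
agg <- co_lpm_hat(x, y, tx, ty) + co_upm_hat(x, y, tx, ty) -
  d_lpm_hat(x, y, tx, ty) - d_upm_hat(x, y, tx, ty)
all.equal(agg, mean((x - tx) * (y - ty)))
```

The estimators use the \(1/n\) normalization of the formulas above, so the final comparison is against the population covariance rather than the \(1/(n-1)\) value returned by `cov()`.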
+ +Statistical inference for these estimators—including bootstrap procedures—is discussed later in the book when directional dependence measures are applied in empirical analysis. + +In practice these quantities are implemented in the **Nonlinear Nonparametric Statistics (NNS)** framework, which computes empirical co-partial moment matrices and nonlinear dependence measures. + +--- + +## Directional Dependence Profiles + +Directional dependence can be studied across **multiple moment orders**. + +\[ +CoLPM_{1,1},CoLPM_{2,2},CoLPM_{3,3},\dots +\] + +\[ +CoUPM_{1,1},CoUPM_{2,2},CoUPM_{3,3},\dots +\] + +These sequences describe how dependence changes across the distribution. + +Example: + +| Order | CoLPM | CoUPM | +|---|---|---| +| 1 | 1.2 | 1.1 | +| 2 | 3.9 | 1.4 | +| 3 | 8.5 | 1.8 | + +Moderate deviations appear symmetric, but higher orders reveal increasing **downside dependence**. + +Directional profiles therefore show **how dependence evolves from ordinary fluctuations to extreme events**. + +--- + +## Summary + +This chapter introduced directional dependence using co-partial moments. + +Key ideas: + +1. Joint distributions partition into four directional regions. +2. Co-partial moments measure deviations within those regions. +3. Covariance and correlation arise as **aggregations** of directional components. +4. Directional statistics reveals nonlinear, asymmetric, and tail-specific dependence. +5. Empirical estimators can be computed directly from data. +6. Dependence profiles show how relationships evolve across deviation magnitudes. + +Correlation therefore represents only a limited summary of joint behavior. + +Directional dependence provides a richer representation of relationships between variables. + +The next chapter connects this framework to **copula interpretation**, linking directional partial moments with rank-based dependence structures used in multivariate statistics. 
diff --git a/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd b/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd new file mode 100644 index 0000000..d4ac9a3 --- /dev/null +++ b/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd @@ -0,0 +1,1291 @@ +# Directional Spectral Decomposition + +Chapters 9 and 10 developed dependence structure using directional co-partial moments. +Chapter 9 showed that covariance and correlation arise from aggregations of directional co-partial moments. +Chapter 10 showed that those directional components reveal nonlinear, asymmetric, and tail-specific dependence that correlation cannot detect. + +One further classical object lies downstream of covariance: its **eigenvalue decomposition**. + +Principal component analysis, factor models, covariance ellipses, and multivariate risk diagnostics all begin from the eigensystem of the covariance matrix. +Since covariance itself is recovered from directional co-partial moment matrices, the eigensystem is also recoverable from those directional components. + +This chapter establishes that result and draws out its consequence: + +\[ +\text{PCA diagonalizes covariance. +Directional decomposition explains where that covariance came from.} +\] + +The eigensystem is not replaced. +It is attributed. + +--- + +## Classical Spectral Decomposition + +Let + +\[ +Z = +\begin{pmatrix} +X \\ +Y +\end{pmatrix} +\] + +be a bivariate random vector with mean + +\[ +\mu = E[Z] += +\begin{pmatrix} +\mu_X \\ +\mu_Y +\end{pmatrix}. +\] + +The covariance matrix is + +\[ +\Sigma += +E[(Z-\mu)(Z-\mu)^\top]. +\] + +Since \(\Sigma\) is symmetric and positive semidefinite, it admits an orthonormal eigendecomposition + +\[ +\Sigma += +V\Lambda V^\top, +\] + +where \(V = (v_1, v_2)\) contains orthonormal eigenvectors and + +\[ +\Lambda = +\begin{pmatrix} +\lambda_1 & 0 \\ +0 & \lambda_2 +\end{pmatrix} +\] + +contains eigenvalues with \(\lambda_1 \geq \lambda_2 \geq 0\). 
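As a point of reference, the classical decomposition can be reproduced with base R's `eigen()`. This is a minimal sketch on simulated data; the linear design and the variable names are assumptions made for illustration.

```r
# Illustrative eigendecomposition of a 2x2 covariance matrix (base R only).
set.seed(11)
n <- 500
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)   # assumed linear design for illustration
Z <- cbind(x, y)

Sigma <- cov(Z)
eig <- eigen(Sigma, symmetric = TRUE)
V <- eig$vectors            # orthonormal eigenvectors as columns
Lambda <- diag(eig$values)  # eigenvalues in decreasing order

# Reconstruction Sigma = V Lambda V' and orthonormality V'V = I
all.equal(V %*% Lambda %*% t(V), Sigma, check.attributes = FALSE)
all.equal(crossprod(V), diag(2))
```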
+ +Classical PCA identifies \(v_1\) as the direction of maximum variance: + +\[ +v_1 += +\arg\max_{\|v\|=1} v^\top \Sigma v, +\qquad +\lambda_1 = v_1^\top \Sigma v_1. +\] + +This is a powerful summary, but it remains a symmetric aggregate. +It does not say whether the variance along \(v_1\) originated from concordant lower-side co-movement, concordant upper-side co-movement, divergent behavior, or residual scatter within directional regions. + +Directional spectral decomposition answers that question. + +--- + +## Directional Recovery of the Eigensystem + +Chapter 9 established that the covariance matrix is recovered from directional co-partial moment matrices: + +\[ +\Sigma += +\operatorname{CoLPM} ++ +\operatorname{CoUPM} +- +\operatorname{DLPM} +- +\operatorname{DUPM}. +\] + +Since the covariance matrix determines its eigensystem, the classical eigensystem is also recovered from the directional aggregate: + +\[ +(\lambda_i, v_i) += +\operatorname{eig}_i +\!\left( +\operatorname{CoLPM} ++ +\operatorname{CoUPM} +- +\operatorname{DLPM} +- +\operatorname{DUPM} +\right). +\] + +The information hierarchy therefore extends to + +\[ +(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM}) +\rightarrow +\Sigma +\rightarrow +(\lambda_i, v_i) +\rightarrow +\rho. +\] + +The ordering is not symmetric. +The directional matrices determine the covariance matrix and its eigenstructure. +The eigenstructure does not determine the directional matrices. +Many different directional structures can produce the same covariance matrix, and many different covariance matrices can produce similar principal directions. +Once the directional components are aggregated into \(\Sigma\), the information about where the co-movement occurred is generally lost. + +This asymmetry is central: + +\[ +\text{Directional structure recovers PCA. 
+PCA does not recover directional structure.} +\] + +--- + +## Quadrant Mean Geometry + +There is a second route to the eigensystem, more geometric than the co-partial moment reconstruction. +It passes through the **conditional means of the four directional quadrants**. + +Let the benchmarks be the component means: + +\[ +t_X = \mu_X, +\qquad +t_Y = \mu_Y. +\] + +The four directional quadrants are + +| Region | Condition | Interpretation | +|---|---|---| +| CUPM | \(X > \mu_X,\; Y > \mu_Y\) | concordant upper | +| CLPM | \(X \leq \mu_X,\; Y \leq \mu_Y\) | concordant lower | +| DLPM | \(X > \mu_X,\; Y \leq \mu_Y\) | divergent lower | +| DUPM | \(X \leq \mu_X,\; Y > \mu_Y\) | divergent upper | + +For each quadrant \(q\), define the quadrant probability + +\[ +p_q = P(Q = q), +\] + +the quadrant conditional mean + +\[ +m_q = E[Z \mid Q = q], +\] + +and the centered quadrant mean displacement + +\[ +u_q = m_q - \mu. +\] + +Because the quadrant means partition the distribution, + +\[ +\mu = \sum_q p_q m_q, +\] + +which gives + +\[ +\sum_q p_q u_q = 0. +\] + +This is the law of total expectation in vector form. +It implies that the weighted quadrant mean displacements must balance around the global mean. + +The displacement vectors \(u_q\) are the geometric objects of interest. +Each one points from the global mean to a quadrant conditional mean. +They identify where the conditional mass of the distribution sits after the directional partition. + +--- + +## Between-Within Covariance Decomposition + +The covariance matrix decomposes exactly through the quadrant partition. + +Inside quadrant \(q\), write + +\[ +Z - \mu += +(m_q - \mu) + (Z - m_q) += +u_q + \varepsilon_q, +\] + +where \(E[\varepsilon_q \mid Q = q] = 0\). + +The conditional covariance contribution from quadrant \(q\) is + +\[ +E[(Z-\mu)(Z-\mu)^\top \mid Q = q] += +u_q u_q^\top ++ +\operatorname{Cov}(Z \mid Q = q). 
+\] + +Averaging across quadrants gives + +\[ +\boxed{ +\Sigma += +\underbrace{\sum_q p_q u_q u_q^\top}_{\Sigma_Q} ++ +\underbrace{\sum_q p_q \operatorname{Cov}(Z \mid Q = q)}_{\Sigma_W}. +} +\] + +The first term, \(\Sigma_Q\), is the **between-quadrant covariance**: how much covariance arises from the locations of the quadrant conditional means relative to the global mean. + +The second term, \(\Sigma_W\), is the **within-quadrant covariance**: the remaining scatter around each quadrant mean, pooled across quadrants. + +This identity is the law of total covariance applied to the NNS quadrant partition. +It is exact and requires no distributional assumptions. + +The two terms answer different questions. + +\[ +\text{PCA of }\Sigma\text{ is PCA of total covariance.} +\] + +\[ +\text{PCA of }\Sigma_Q\text{ is PCA of conditional mean displacement.} +\] + +These need not coincide. + +--- + +## Rank-One Spectral Primitives + +Each quadrant contributes a rank-one matrix to the between-quadrant covariance: + +\[ +B_q += +p_q u_q u_q^\top. +\] + +When \(u_q \neq 0\), + +\[ +B_q u_q += +p_q u_q u_q^\top u_q += +p_q \|u_q\|^2 u_q. +\] + +The vector \(u_q\) is the nonzero eigenvector of \(B_q\), with eigenvalue + +\[ +\lambda_q = p_q \|u_q\|^2. +\] + +After normalization, \(v_q = u_q / \|u_q\|\) is the corresponding unit eigenvector. + +This is the precise sense in which each quadrant mean displacement is spectral. +It is not generally an eigenvector of the full covariance matrix. +It is exactly the eigenvector of its own rank-one contribution to \(\Sigma_Q\). + +The between-quadrant covariance is the sum of these rank-one primitives: + +\[ +\Sigma_Q += +B_{\operatorname{CUPM}} ++B_{\operatorname{CLPM}} ++B_{\operatorname{DLPM}} ++B_{\operatorname{DUPM}}. 
+\] + +Defining the matrix of weighted displacement columns, + +\[ +C += +\begin{pmatrix} +\sqrt{p_{\operatorname{CUPM}}}\,u_{\operatorname{CUPM}} & +\sqrt{p_{\operatorname{CLPM}}}\,u_{\operatorname{CLPM}} & +\sqrt{p_{\operatorname{DLPM}}}\,u_{\operatorname{DLPM}} & +\sqrt{p_{\operatorname{DUPM}}}\,u_{\operatorname{DUPM}} +\end{pmatrix}, +\] + +one has \(\Sigma_Q = CC^\top\). +The eigenvectors of \(\Sigma_Q\) are the left singular vectors of \(C\), built entirely from weighted quadrant mean displacements. + +\[ +\boxed{ +\text{Centered NNS quadrant means are rank-one spectral primitives.} +} +\] + +--- + +## Recovering Eigenvectors from Quadrant Conditional Means + +The preceding section gives the local rank-one statement. +We now make the recovery step explicit. + +At a given NNS split, the only inputs needed for the between-quadrant eigensystem are the quadrant probabilities and the quadrant conditional means: + +\[ +\{p_q, m_q\}_{q \in \{\operatorname{CUPM},\operatorname{CLPM},\operatorname{DLPM},\operatorname{DUPM}\}}. +\] + +From these quantities, + +\[ +\mu = \sum_q p_q m_q, +\qquad +u_q = m_q-\mu. +\] + +Construct the weighted conditional-mean matrix + +\[ +C = +\begin{pmatrix} +\sqrt{p_{\operatorname{CUPM}}}u_{\operatorname{CUPM}} & +\sqrt{p_{\operatorname{CLPM}}}u_{\operatorname{CLPM}} & +\sqrt{p_{\operatorname{DLPM}}}u_{\operatorname{DLPM}} & +\sqrt{p_{\operatorname{DUPM}}}u_{\operatorname{DUPM}} +\end{pmatrix}. +\] + +Then + +\[ +\Sigma_Q = CC^\top. +\] + +Thus the eigenvectors of \(\Sigma_Q\) are recovered from the quadrant conditional means by + +\[ +\Sigma_Q v_j = \lambda_j v_j. +\] + +Equivalently, take the singular value decomposition + +\[ +C = U\Lambda_Q^{1/2}R^\top. +\] + +Then + +\[ +\Sigma_Q = CC^\top = U\Lambda_Q U^\top, +\] + +so the columns of \(U\) are the eigenvectors of the conditional-mean covariance. +No raw observations are needed at this step once the quadrant conditional means and probabilities have been computed. 
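The recovery step can be sketched in base R. The code below is an illustration under stated assumptions rather than the package implementation: the simulated data, the mean split, and the object names `quad`, `p`, `m`, `u`, and `C` are choices made here. After the quadrant probabilities and conditional means are computed, the eigensystem of \(\Sigma_Q\) is obtained from them alone.

```r
# Illustrative sketch (base R only): between-quadrant eigensystem from
# quadrant probabilities and conditional means. Names are hypothetical.
set.seed(3)
n <- 2000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)           # assumed design for illustration
Z <- cbind(x, y)
mu <- colMeans(Z)

# Quadrant labels from the mean split
quad <- ifelse(x > mu[1],
               ifelse(y > mu[2], "CUPM", "DLPM"),
               ifelse(y > mu[2], "DUPM", "CLPM"))

# Quadrant probabilities p_q and conditional means m_q (one column per quadrant)
p <- table(quad) / n
m <- sapply(names(p), function(q) colMeans(Z[quad == q, , drop = FALSE]))

# Centered displacements u_q and the weighted matrix C
u <- m - mu
C <- sweep(u, 2, sqrt(as.numeric(p)), "*")

Sigma_Q <- C %*% t(C)

# Eigendecomposition of Sigma_Q versus singular value decomposition of C
eig <- eigen(Sigma_Q, symmetric = TRUE)
sv <- svd(C)

all.equal(eig$values, sv$d^2)
# Eigenvectors agree with the left singular vectors up to sign
all.equal(abs(eig$vectors), abs(sv$u), check.attributes = FALSE)

# Weighted displacements balance around the global mean (numerically zero)
drop(u %*% as.numeric(p))
```

Only `p` and `m` enter the computation after the labeling step, which mirrors the statement that no raw observations are needed once those quantities are available.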
+ +This is the exact role of the conditional means: + +\[ +\boxed{ +\{p_q,m_q\}_q +\quad \Longrightarrow \quad +\{u_q\}_q +\quad \Longrightarrow \quad +C +\quad \Longrightarrow \quad +\Sigma_Q +\quad \Longrightarrow \quad +(v_{Q,j},\lambda_{Q,j}). +} +\] + +The eigenvectors are therefore not imposed externally. +They are recovered from the geometry of the four quadrant conditional means. + +A second, more visual representation comes from opposite quadrant centroid contrasts: + +\[ +a_C = m_{\operatorname{CUPM}} - m_{\operatorname{CLPM}}, +\qquad +a_D = m_{\operatorname{DLPM}} - m_{\operatorname{DUPM}}. +\] + +The first contrast joins the two concordant conditional means. +The second contrast joins the two divergent conditional means. +After normalization, + +\[ +\tilde v_C = \frac{a_C}{\|a_C\|}, +\qquad +\tilde v_D = \frac{a_D}{\|a_D\|}. +\] + +When the concordant and divergent centroid pairs lie on orthogonal local axes, these contrast directions are the eigenvectors of \(\Sigma_Q\): + +\[ +v_{Q,1} = \tilde v_C, +\qquad +v_{Q,2} = \tilde v_D, +\] + +up to signs and ordering. +This alignment condition is common in symmetric or nearly elliptical dependence structures, but it is not required for recovery. +Without this special alignment, the eigenvectors of \(\Sigma_Q\) are the orthogonal principal axes obtained by diagonalizing \(CC^\top\); they remain weighted linear combinations of the same quadrant conditional mean displacements. + +The distinction is important: + +\[ +\boxed{ +\text{Individual }u_q\text{ are eigenvectors of their own }B_q=p_qu_qu_q^\top. +} +\] + +\[ +\boxed{ +\text{The full between-quadrant eigenvectors are recovered by summing those }B_q\text{ through }\Sigma_Q=CC^\top. +} +\] + +For the original PCA eigensystem of the full covariance matrix, add the within-quadrant residual covariance: + +\[ +\Sigma = \Sigma_Q+\Sigma_W. +\] + +Then diagonalizing \(\Sigma\) recovers the classical eigenvectors. 
+As recursive NNS partitions refine and terminal cells shrink, \(\Sigma_W\) decreases. +At the finite-sample singleton limit, \(\Sigma_W=0\), so the eigenvectors of the full covariance are recovered entirely from conditional means of the terminal regions. + +--- + +## Quadrant Mean Slope Versus PC1 Within a Quadrant + +The between-within decomposition clarifies a distinction that arises when analyzing a single directional quadrant. + +Consider the CLPM region. +The CLPM mean displacement is + +\[ +u_{\operatorname{CLPM}} = m_{\operatorname{CLPM}} - \mu. +\] + +The line from \(\mu\) through \(m_{\operatorname{CLPM}}\) has direction + +\[ +g_{\operatorname{CLPM}} += +\frac{u_{\operatorname{CLPM}}}{\|u_{\operatorname{CLPM}}\|}. +\] + +This is the eigenvector of the rank-one matrix \(B_{\operatorname{CLPM}}\). + +By contrast, the first principal component of the CLPM observations is the leading eigenvector of + +\[ +\operatorname{Cov}(Z \mid \operatorname{CLPM}). +\] + +These are different matrices computed from different objects. +The quadrant mean slope is a between-centroid displacement direction. +The within-quadrant PC1 is a within-quadrant scatter direction. +The within-quadrant regression line is a conditional least-squares direction. + +\[ +\text{quadrant mean slope} +\neq +\text{within-quadrant PC1} +\neq +\text{within-quadrant regression line.} +\] + +For directional dependence analysis, the quadrant mean slope is the relevant object because it describes where conditional mass moved relative to the global mean benchmark. + +--- + +## Eigenvalue Attribution + +The most useful result is not merely that the eigensystem is recoverable. +It is that each eigenvalue can be attributed to directional sources. + +Let \(v_i\) be a unit eigenvector of \(\Sigma\). +Then + +\[ +\lambda_i = v_i^\top \Sigma v_i. +\] + +Substituting the between-within decomposition, + +\[ +\lambda_i += +v_i^\top \Sigma_Q v_i ++ +v_i^\top \Sigma_W v_i. 
+\] + +Expanding each term, + +\[ +\boxed{ +\lambda_i += +\sum_q p_q (v_i^\top u_q)^2 ++ +\sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i. +} +\] + +The first sum is the **between-quadrant contribution**: each term measures how much the \(i\)-th principal direction aligns with the conditional mean displacement of quadrant \(q\), weighted by the quadrant's probability. + +The second sum is the **within-quadrant contribution**: residual scatter around each quadrant mean, projected onto the principal direction. + +Define + +\[ +\lambda_{i,Q} = \sum_q p_q (v_i^\top u_q)^2, +\qquad +\lambda_{i,W} = \sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i, +\] + +so that + +\[ +\lambda_i = \lambda_{i,Q} + \lambda_{i,W}. +\] + +At the quadrant level, the between contribution from quadrant \(q\) to eigenvalue \(i\) is + +\[ +\lambda_{i,q}^{between} = p_q (v_i^\top u_q)^2. +\] + +This answers a question ordinary PCA does not pose: + +\[ +\boxed{ +\text{Which directional regions of the joint distribution generated this eigenvalue?} +} +\] + +If most of \(\lambda_1\) arises from CLPM and CUPM terms, the leading principal direction is driven by concordant co-movement. +If most of \(\lambda_1\) arises from DLPM and DUPM terms, the leading direction is driven by divergent behavior. +If \(\Sigma_W\) dominates, the principal axis reflects residual within-region scatter rather than separation among conditional means. + +PCA reports the axis. +Directional decomposition reports the sources of that axis. + +--- + +## Two-Dimensional Explicit Recovery + +In two dimensions the eigensystem is recovered in closed form from the directional components. + +Let the reconstructed covariance matrix be + +\[ +\Sigma += +\begin{pmatrix} +a & b \\ +b & d +\end{pmatrix}. +\] + +The eigenvalues are + +\[ +\lambda_{1,2} += +\frac{a+d}{2} +\pm +\sqrt{ +\left(\frac{a-d}{2}\right)^2 + b^2 +}. +\] + +The principal-axis angle \(\theta\) satisfies + +\[ +\tan(2\theta) += +\frac{2b}{a-d}. 
+\] + +The eigenvectors are + +\[ +v_1 += +\begin{pmatrix} +\cos\theta \\ +\sin\theta +\end{pmatrix}, +\qquad +v_2 += +\begin{pmatrix} +-\sin\theta \\ +\cos\theta +\end{pmatrix}, +\] + +where \(\theta\) is taken on the branch \(\theta=\tfrac{1}{2}\operatorname{atan2}(2b,\,a-d)\) so that \(v_1\) corresponds to the larger eigenvalue \(\lambda_1\). + +Because the directional pieces reconstruct \(a\), \(b\), and \(d\) exactly, they also reconstruct \(\lambda_{1,2}\) and \(v_{1,2}\) exactly. +The recovery is complete. + +--- + +## Recursive Spectral Refinement + +Chapter 10 noted that NNS partitioning can be applied recursively, subdividing each region into further quadrants. +This produces a nested sequence of partitions \(\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_O\). + +For any partition \(\mathcal{P}_O\) with cells \(r\), probabilities \(p_r\), conditional means \(m_r\), and displacements \(u_r = m_r - \mu\), the same decomposition applies: + +\[ +\Sigma += +\sum_r p_r u_r u_r^\top ++ +\sum_r p_r \operatorname{Cov}(Z \mid r). +\] + +Each cell contributes a rank-one between-cell primitive \(B_r = p_r u_r u_r^\top\). + +As the partition refines, within-cell covariance decreases. +At the limit where each terminal cell contains one observation, + +\[ +\operatorname{Cov}(Z \mid r) = 0 +\] + +for every cell, and the full empirical covariance is represented entirely by terminal cell centroids: + +\[ +\Sigma = \sum_r p_r u_r u_r^\top. +\] + +The eigenvalue perturbation bound follows immediately from Weyl's inequality. +Write \(B_O = \sum_r p_r u_r u_r^\top\) for the between-cell term and \(W_O = \sum_r p_r \operatorname{Cov}(Z \mid r)\) for the within-cell term of partition \(\mathcal{P}_O\). +Since \(\Sigma - B_O = W_O\), + +\[ +|\lambda_i(\Sigma) - \lambda_i(B_O)| \leq \|W_O\|_2. +\] + +When the eigenvalue gap is large, fewer partition steps are needed to recover the dominant principal direction. + +Recursive NNS partitioning therefore provides a **multiscale positive semidefinite decomposition of conditional mean covariance**: each split explains part of the residual total covariance through the between-child displacement, and these contributions accumulate monotonically as cells are refined. + +--- + +## Multivariate Extension + +The same construction extends to any dimension \(d\).
+ +A mean split across all \(d\) coordinates produces up to \(2^d\) orthants. +For any partition with cells \(r\), the decomposition + +\[ +\Sigma += +\sum_r p_r u_r u_r^\top ++ +\sum_r p_r \operatorname{Cov}(Z \mid r) +\] + +holds in \(\mathbb{R}^{d \times d}\). + +Each cell contributes a rank-one positive semidefinite matrix \(B_r = p_r u_r u_r^\top\). +The between-cell covariance has rank at most \(\min(d, K-1)\), where \(K\) is the number of occupied cells. +The \(K-1\) bound appears because the constraint \(\sum_r p_r u_r = 0\) removes one degree of freedom. + +The multivariate statement therefore matches the bivariate one: + +\[ +\boxed{ +\text{NNS partitions generate locatable rank-one spectral primitives in any dimension.} +} +\] + +See the following for a [detailed higher dimension example](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns-directional-spectral-decomposition.md). + +--- + +## Converse Failure + +The directional decomposition runs in one direction. + +Given the directional pieces \(\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q\), the covariance matrix and its eigensystem are recovered. +Given only \(\Sigma\) or its eigensystem \((V, \Lambda)\), the directional pieces are not recovered. + +Neither \(\Sigma\) nor \((V, \Lambda)\) determines + +- which observations belong to CLPM, CUPM, DLPM, or DUPM, +- the quadrant probabilities \(p_q\), +- the quadrant conditional means \(m_q\), +- the within-quadrant covariance matrices \(\operatorname{Cov}(Z \mid Q{=}q)\), +- or the higher-order directional moment profiles. + +In symbols: + +\[ +\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q +\rightarrow +\Sigma +\rightarrow +(V, \Lambda) +\] + +is recoverable in both steps, while + +\[ +(V, \Lambda) +\rightarrow +\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q +\] + +is not. + +This is another instance of the information-loss principle developed throughout the book. +Directional components determine symmetric aggregates. 
+Symmetric aggregates do not generally determine directional components. + +--- + +## Correct Claims and Caveats + +Several precise statements follow from the results above. + +Correct: + +\[ +\text{NNS directional co-partial moment matrices recover covariance and therefore recover the classical eigensystem.} +\] + +Correct: + +\[ +\text{Each centered quadrant mean }u_q\text{ is the eigenvector of }B_q = p_q u_q u_q^\top. +\] + +Correct: + +\[ +\text{Classical eigenvalues admit exact directional attribution: } +\lambda_i = \lambda_{i,Q} + \lambda_{i,W}. +\] + +Correct: + +\[ +\text{The full eigenvectors of }\Sigma_Q\text{ are obtained after summing the }B_q\text{ matrices, not before.} +\] + +Not generally correct: + +\[ +\text{Every line connecting opposite quadrant centroids is an eigenvector of }\Sigma_Q. +\] + +That holds only when concordant and divergent quadrant means lie on orthogonal axes, a special alignment condition that may hold approximately for nearly elliptical positive dependence but is not guaranteed in general. + +Not generally correct: + +\[ +\lambda_1(\Sigma_Q + \Sigma_W) = \lambda_1(\Sigma_Q) + \lambda_1(\Sigma_W). +\] + +Eigenvalues are not additive across summands. +The exact attribution uses the Rayleigh quotient along the same eigenvector: + +\[ +\lambda_i += +v_i^\top \Sigma_Q v_i ++ +v_i^\top \Sigma_W v_i. +\] + +These caveats strengthen rather than weaken the results. +They identify precisely what is claimed: locatable spectral attribution, not a replacement for linear algebra. + +--- + +## Quadrant Decomposition in R + +The following functions implement the between-within decomposition. +Population normalization \(1/n\) is used throughout to match the expectation formulas above. 
+ +```{r spectral-setup} +pop_cov <- function(Z) { + Z <- as.matrix(Z) + Zc <- sweep(Z, 2, colMeans(Z), FUN = "-") + crossprod(Zc) / nrow(Zc) +} + +quadrant_decomposition <- function(Z) { + Z <- as.matrix(Z) + stopifnot(ncol(Z) == 2) + + n <- nrow(Z) + mu <- colMeans(Z) + mx <- mu[1]; my <- mu[2] + + quadrants <- list( + CUPM = (Z[,1] > mx) & (Z[,2] > my), + CLPM = (Z[,1] <= mx) & (Z[,2] <= my), + DLPM = (Z[,1] > mx) & (Z[,2] <= my), + DUPM = (Z[,1] <= mx) & (Z[,2] > my) + ) + + Sigma_Q <- matrix(0, 2, 2) + Sigma_W <- matrix(0, 2, 2) + details <- vector("list", length(quadrants)) + centroids <- vector("list", length(quadrants)) + + for (i in seq_along(quadrants)) { + qname <- names(quadrants)[i] + mask <- quadrants[[i]] + n_q <- sum(mask) + if (n_q == 0) next + + p_q <- n_q / n + Z_q <- Z[mask, , drop = FALSE] + m_q <- colMeans(Z_q) + u_q <- m_q - mu + + Sigma_Q <- Sigma_Q + p_q * tcrossprod(u_q) + Sigma_W <- Sigma_W + p_q * pop_cov(Z_q) + + centroids[[i]] <- data.frame( + quadrant = qname, + n = n_q, + p = p_q, + mean_x = m_q[1], + mean_y = m_q[2], + u_x = u_q[1], + u_y = u_q[2] + ) + + details[[i]] <- data.frame( + quadrant = qname, + n = n_q, + p = round(p_q, 6), + mean_x = round(m_q[1], 6), + mean_y = round(m_q[2], 6), + u_x = round(u_q[1], 6), + u_y = round(u_q[2], 6), + lambda_rank1 = round(p_q * sum(u_q^2), 6) + ) + } + + list( + mu = mu, + Sigma = pop_cov(Z), + Sigma_Q = Sigma_Q, + Sigma_W = Sigma_W, + centroids = do.call(rbind, centroids), + details = do.call(rbind, details) + ) +} +``` + +Generate positively dependent bivariate data, compute the quadrant decomposition, and pass \(\Sigma\) directly to `eigen` for attribution. 
+ +```{r spectral-data} +set.seed(123) +n <- 10000 +rho <- 0.70 +R <- matrix(c(1, rho, rho, 1), 2, 2) +Z <- matrix(rnorm(2 * n), n, 2) %*% chol(R) + +D <- quadrant_decomposition(Z) +eig_classical <- eigen(D$Sigma) + +D$details +``` + +The quadrant summary shows the four conditional mean displacements \(u_q\) and their rank-one eigenvalues \(p_q \|u_q\|^2\). +For positively dependent data the concordant quadrants (CUPM, CLPM) carry large displacements in opposite directions along the main axis, while the divergent quadrants (DLPM, DUPM) show much smaller displacements orthogonal to it. + +--- + +## Rank-One Primitive Verification in R + +The previous table reports the rank-one eigenvalue for each quadrant. +The following code verifies the stronger statement from Section 11.5 directly: for each quadrant, + +\[ +B_q u_q = p_q u_q u_q^\top u_q = p_q \|u_q\|^2 u_q. +\] + +It also verifies that summing the four rank-one primitives reconstructs the between-quadrant conditional-mean covariance \(\Sigma_Q\). 
+ +```{r rank-one-primitive-verification} +rank_one_primitive_check <- function(D) { + pieces <- lapply(seq_len(nrow(D$centroids)), function(i) { + row <- D$centroids[i, ] + u <- c(row$u_x, row$u_y) + p <- row$p + + Bq <- p * tcrossprod(u) + lambda <- p * sum(u^2) + + lhs <- as.numeric(Bq %*% u) + rhs <- lambda * u + + eig_Bq <- eigen(Bq, symmetric = TRUE) + v_q <- as.numeric(u / sqrt(sum(u^2))) + + list( + Bq = Bq, + table = data.frame( + quadrant = row$quadrant, + lambda_rank1 = lambda, + max_abs_Bu_minus_lambda_u = max(abs(lhs - rhs)), + alignment_with_eigen_Bq = abs(drop(crossprod(v_q, eig_Bq$vectors[, 1]))) + ) + ) + }) + + Sigma_Q_from_Bq <- Reduce(`+`, lapply(pieces, `[[`, "Bq")) + + list( + checks = do.call(rbind, lapply(pieces, `[[`, "table")), + Sigma_Q_from_Bq = Sigma_Q_from_Bq, + max_abs_Sigma_Q_error = max(abs(Sigma_Q_from_Bq - D$Sigma_Q)) + ) +} + +R1 <- rank_one_primitive_check(D) + +R1$checks +R1$max_abs_Sigma_Q_error +``` + +The column `max_abs_Bu_minus_lambda_u` should be numerically zero, confirming that each centered quadrant conditional mean is the eigenvector of its own rank-one matrix. +The column `alignment_with_eigen_Bq` should be one up to floating-point precision, confirming that the normalized vector \(u_q / \|u_q\|\) is the same direction recovered by `eigen(Bq)`. +The final scalar verifies + +\[ +\Sigma_Q = B_{\operatorname{CUPM}} + B_{\operatorname{CLPM}} + B_{\operatorname{DLPM}} + B_{\operatorname{DUPM}}. +\] + +--- + +## Conditional-Mean Eigenvector Recovery in R + +The following code uses only the quadrant probabilities and quadrant conditional means stored in `D$centroids`. +It reconstructs \(C\), then \(\Sigma_Q = CC^\top\), then the eigenvectors of the between-quadrant conditional-mean covariance. 
+ +```{r conditional-mean-eigenvector-recovery} +recover_eigen_from_quadrant_means <- function(D) { + U <- as.matrix(D$centroids[, c("u_x", "u_y")]) + P <- D$centroids$p + + Cmat <- t(sqrt(P) * U) + Sigma_Q_from_means <- Cmat %*% t(Cmat) + eig_Q <- eigen(Sigma_Q_from_means) + + list( + C = Cmat, + Sigma_Q_from_means = Sigma_Q_from_means, + eig_Q = eig_Q + ) +} + +Qrec <- recover_eigen_from_quadrant_means(D) + +# This should match D$Sigma_Q, which was accumulated quadrant by quadrant. +max(abs(Qrec$Sigma_Q_from_means - D$Sigma_Q)) + +# Eigenvectors recovered from quadrant conditional means. +Qrec$eig_Q$vectors + +# Same eigensystem obtained by diagonalizing the stored Sigma_Q. +eigen(D$Sigma_Q)$vectors + +# Eigenvector signs are arbitrary. Absolute inner products should be near 1. +abs(t(Qrec$eig_Q$vectors) %*% eigen(D$Sigma_Q)$vectors) +``` + +The recovery above is the explicit conditional-mean step: + +\[ +\{p_q,m_q\}_q +\rightarrow +C +\rightarrow +CC^\top +\rightarrow +\operatorname{eig}(CC^\top). +\] + +The same conditional means also provide the visible concordant and divergent centroid contrasts. + +```{r centroid-contrast-directions} +unit <- function(x) as.numeric(x / sqrt(sum(x^2))) + +centroid <- function(D, qname) { + row <- D$centroids[D$centroids$quadrant == qname, ] + c(row$mean_x, row$mean_y) +} + +v_concordant <- unit(centroid(D, "CUPM") - centroid(D, "CLPM")) +v_divergent <- unit(centroid(D, "DLPM") - centroid(D, "DUPM")) + +contrast_comparison <- cbind( + concordant_contrast = v_concordant, + Sigma_Q_v1 = Qrec$eig_Q$vectors[, 1], + divergent_contrast = v_divergent, + Sigma_Q_v2 = Qrec$eig_Q$vectors[, 2] +) + +round(contrast_comparison, 6) + +# Alignment between contrast directions and the Sigma_Q eigenvectors. 
+round(c( + concordant_with_v1 = abs(drop(crossprod(v_concordant, Qrec$eig_Q$vectors[, 1]))), + divergent_with_v2 = abs(drop(crossprod(v_divergent, Qrec$eig_Q$vectors[, 2]))) +), 6) +``` + +For the positively dependent example, the concordant contrast aligns closely with the first between-quadrant eigenvector and the divergent contrast aligns closely with the second. +In more asymmetric samples, the exact recovery remains \(C \rightarrow CC^\top \rightarrow \operatorname{eig}(CC^\top)\), while the centroid contrasts provide the directly visible directional geometry. + +--- + +## Eigenvalue Attribution in R + +The following code attributes each classical eigenvalue into between-quadrant and within-quadrant components using the Rayleigh quotient. + +```{r spectral-attribution} +V <- eig_classical$vectors + +attribute_lambda <- function(v, Sigma_Q, Sigma_W) { + v <- matrix(v, ncol = 1) + between <- drop(t(v) %*% Sigma_Q %*% v) + within <- drop(t(v) %*% Sigma_W %*% v) + c(between = between, within = within, total = between + within) +} + +attrib_1 <- attribute_lambda(V[, 1], D$Sigma_Q, D$Sigma_W) +attrib_2 <- attribute_lambda(V[, 2], D$Sigma_Q, D$Sigma_W) + +rbind(lambda_1 = attrib_1, lambda_2 = attrib_2) +eig_classical$values +``` + +This is the exact decomposition \(\lambda_i = v_i^\top \Sigma_Q v_i + v_i^\top \Sigma_W v_i\). +The totals in each row match the classical eigenvalues to floating-point precision. 
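
The exact attribution above can also be checked on a small hand-built case. The following is a minimal sketch in base R, assuming two hypothetical symmetric positive semidefinite matrices (`Sigma_Q_toy` and `Sigma_W_toy`, invented for illustration and not taken from the simulated data). It shows that Rayleigh quotients along the eigenvectors of the sum add up to the eigenvalues, while the leading eigenvalues of the two summands do not.

```r
# Hypothetical between and within matrices; both symmetric and PSD.
Sigma_Q_toy <- matrix(c(0.60, 0.50,
                        0.50, 0.60), 2, 2)
Sigma_W_toy <- matrix(c(0.40, 0.10,
                        0.10, 0.30), 2, 2)
Sigma_toy <- Sigma_Q_toy + Sigma_W_toy

eig_toy <- eigen(Sigma_toy, symmetric = TRUE)

# Rayleigh quotient of matrix M along a unit vector v.
rayleigh <- function(M, v) drop(crossprod(v, M %*% v))

# One row per eigenvalue of the total matrix, split into between and within.
attribution_toy <- t(sapply(1:2, function(i) {
  v <- eig_toy$vectors[, i]
  c(between = rayleigh(Sigma_Q_toy, v),
    within  = rayleigh(Sigma_W_toy, v))
}))

# Row sums reproduce the eigenvalues of the total matrix.
attribution_gap <- max(abs(rowSums(attribution_toy) - eig_toy$values))

# Leading eigenvalues of the summands overshoot lambda_1 of the sum,
# because the two summands do not share a leading eigenvector here.
naive_sum <- eigen(Sigma_Q_toy, symmetric = TRUE)$values[1] +
  eigen(Sigma_W_toy, symmetric = TRUE)$values[1]
naive_gap <- naive_sum - eig_toy$values[1]
```

Here `attribution_gap` should be zero up to floating-point error, while `naive_gap` is strictly positive. The attribution is exact only because both quadratic forms are evaluated along the same eigenvector of the total matrix.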
+ +Quadrant-level attribution of the between component: + +```{r spectral-quadrant-attribution} +quadrant_between_contrib <- function(Z, v) { + Z <- as.matrix(Z) + v <- as.numeric(v) + n <- nrow(Z) + mu <- colMeans(Z) + mx <- mu[1]; my <- mu[2] + + quadrants <- list( + CUPM = (Z[,1] > mx) & (Z[,2] > my), + CLPM = (Z[,1] <= mx) & (Z[,2] <= my), + DLPM = (Z[,1] > mx) & (Z[,2] <= my), + DUPM = (Z[,1] <= mx) & (Z[,2] > my) + ) + + out <- data.frame(quadrant = names(quadrants), contribution = NA_real_) + + for (i in seq_along(quadrants)) { + mask <- quadrants[[i]] + if (sum(mask) == 0) { out$contribution[i] <- 0; next } + p_q <- sum(mask) / n + u_q <- colMeans(Z[mask, , drop = FALSE]) - mu + out$contribution[i] <- p_q * sum(v * u_q)^2 + } + + out +} + +q_attr_v1 <- quadrant_between_contrib(Z, V[, 1]) +q_attr_v2 <- quadrant_between_contrib(Z, V[, 2]) + +# PC1: driven by concordant quadrants +q_attr_v1 +sum(q_attr_v1$contribution) # matches attrib_1["between"] + +# PC2: driven by divergent quadrants +q_attr_v2 +sum(q_attr_v2$contribution) # matches attrib_2["between"] +``` + +For this positively dependent example the pattern is clear. +PC1 receives nearly all its between-quadrant contribution from CUPM and CLPM: the leading principal direction is a concordant co-movement axis. +PC2 receives nearly all its between-quadrant contribution from DLPM and DUPM: the minor axis is a divergent direction. +This is the interpretive content that classical PCA alone cannot supply. + +--- + +## Visualizing the CLPM Mean Slope + +The following figure displays the three distinct directions associated with the CLPM region: the quadrant mean slope, the within-CLPM regression line, and the within-CLPM first principal component. +The exact picture varies by random seed; the three lines generally differ because they are computed from different statistical objects. 
+ +```{r clpm-mean-slope-figure, fig.width=6, fig.height=6} +set.seed(321) +n <- 5000 +Z0 <- matrix(rnorm(2 * n), n, 2) +mu0 <- colMeans(Z0) +clpm_mask <- Z0[,1] <= mu0[1] & Z0[,2] <= mu0[2] +Zc <- Z0[clpm_mask, , drop = FALSE] +mc <- colMeans(Zc) + +plot( + Zc[, 1], Zc[, 2], + pch = 1, cex = 0.35, + xlab = "X", ylab = "Y", + main = "CLPM: Mean Slope, Regression, and PC1" +) +abline(v = mu0[1], lty = 2) +abline(h = mu0[2], lty = 2) +points(mu0[1], mu0[2], pch = 19, cex = 1.2) +points(mc[1], mc[2], pch = 19, cex = 1.2) + +# Quadrant mean slope: line through global mean toward CLPM conditional mean +slope_mean <- (mc[2] - mu0[2]) / (mc[1] - mu0[1]) +abline(a = mu0[2] - slope_mean * mu0[1], b = slope_mean, col = "green", lwd = 2) + +# Linear regression inside CLPM +fit <- lm(Zc[, 2] ~ Zc[, 1]) +abline(fit, col = "gold", lwd = 2) + +# PC1 inside CLPM: leading eigenvector of within-CLPM covariance +pc1 <- eigen(pop_cov(Zc))$vectors[, 1] +slope_pc1 <- pc1[2] / pc1[1] +abline(a = mc[2] - slope_pc1 * mc[1], b = slope_pc1, col = "blue", lwd = 2) + +legend( + "bottomright", + legend = c("CLPM mean slope", "CLPM regression", "CLPM PC1"), + col = c("green", "gold", "blue"), + lwd = 2, + bty = "n" +) +``` + +The green line is the eigenvector direction of the rank-one matrix + +\[ +B_{\operatorname{CLPM}} += +p_{\operatorname{CLPM}} u_{\operatorname{CLPM}} u_{\operatorname{CLPM}}^\top. +\] + +The blue line is the leading eigenvector of \(\operatorname{Cov}(Z \mid \operatorname{CLPM})\). +The yellow line is the ordinary least-squares regression line within the CLPM subset. +They differ because they summarize different statistical objects. + +--- + +## Practical Interpretation + +Directional spectral decomposition changes what can be said about a classical PCA result. + +**Risk and Stress Testing.** +If the leading eigenvalue of a return covariance matrix is dominated by CLPM contributions, the leading risk factor is primarily a joint downside event. 
+That is more actionable than knowing only that assets load on a common factor. + +**Portfolio Construction.** +Mean-variance optimization treats the covariance matrix as a single object. +Directional spectral decomposition separates it into concordant downside, concordant upside, divergent, between-centroid, and within-region components. +An analyst can ask whether an optimized portfolio is exposed to broad covariance or specifically to lower-tail covariance. + +**Nonlinear Dependence.** +When correlation is small but \(\Sigma_Q\) is large relative to \(\Sigma_W\), the conditional means are separated across quadrants in a way that total covariance may mask. +This signals potential nonlinear or regime-dependent structure worth investigating through the full directional dependence measures of Chapter 10. + +**Model Diagnostics.** +The ratio + +\[ +D_{spectral} = \frac{\operatorname{tr}(\Sigma_Q)}{\operatorname{tr}(\Sigma)} +\] + +measures the share of total covariance trace explained by between-quadrant mean displacement. +A high value indicates that the quadrant partition is capturing the relevant second-moment geometry. +A low value indicates that within-region residual scatter dominates, and finer partitioning or higher-order moments may be warranted. + +A full numerical implementation with transition matrix estimation, one-step predictive mixtures, and dynamic eigenvalue attribution by transition path is available at [OVVO-Financial/NNS: directional-markov-regimes-pca.md](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/directional-markov-regimes-pca.md). + +--- + +## Summary + +This chapter extended the directional decomposition of covariance into spectral analysis. + +Key results: + +1. The directional co-partial moment matrices recover covariance and therefore recover the classical PCA eigensystem: + +\[ +(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM}) +\rightarrow +\Sigma +\rightarrow +(\lambda_i, v_i). 
+\] + +2. The covariance matrix decomposes into between-quadrant and within-quadrant components: + +\[ +\Sigma += +\sum_q p_q u_q u_q^\top ++ +\sum_q p_q \operatorname{Cov}(Z \mid Q{=}q). +\] + +3. Each quadrant conditional mean displacement generates a rank-one spectral primitive + +\[ +B_q = p_q u_q u_q^\top +\] + +with eigenvector \(u_q\) and eigenvalue \(p_q \|u_q\|^2\). + +4. Classical eigenvalues admit exact directional attribution: + +\[ +\lambda_i += +\sum_q p_q (v_i^\top u_q)^2 ++ +\sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i. +\] + +5. PCA is downstream of the NNS directional structure. +The directional components recover PCA and explain which quadrant regions generated the result. + +6. The converse fails: the eigensystem does not recover the directional components. + +The central message is: + +\[ +\text{PCA diagonalizes covariance. +Directional decomposition explains the sources of that covariance.} +\] + +The next chapter connects the directional partial moment framework to **copula interpretation**, linking co-partial moments with rank-based dependence structures used in multivariate statistics. \ No newline at end of file diff --git a/tools/NNS/book/chapter-12-copula-interpretation.Rmd b/tools/NNS/book/chapter-12-copula-interpretation.Rmd new file mode 100644 index 0000000..8646ecb --- /dev/null +++ b/tools/NNS/book/chapter-12-copula-interpretation.Rmd @@ -0,0 +1,437 @@ +# Copula Interpretation + +Chapters 9 and 10 showed that classical dependence measures such as covariance and correlation arise from **aggregations of directional co-partial moments**. Directional statistics preserves the structure of joint deviations by separating contributions across regions of the joint distribution rather than collapsing them into a single summary statistic. 
+ +Another widely used framework for describing dependence is **copula theory**, which represents dependence by transforming variables into probability space and isolating the joint structure from the marginal distributions. + +This chapter shows that the directional framework connects naturally to copula theory. In particular, directional co-partial moments can be interpreted as **magnitude-weighted dependence measures within copula space**. + +--- + +## Copula Fundamentals + +Let \(X\) and \(Y\) be continuous random variables with cumulative distribution functions + +\[ +F_X(x), \qquad F_Y(y). +\] + +Define the probability transforms + +\[ +U = F_X(X), \qquad V = F_Y(Y). +\] + +By the probability integral transform, + +\[ +U, V \sim \text{Uniform}(0,1). +\] + +This result holds when \(X\) and \(Y\) are continuous random variables. For discrete or mixed distributions the probability integral transform requires minor adjustments, but the continuous case suffices for the conceptual development here. + +The joint distribution of \((U,V)\) is called the **copula** of \((X,Y)\): + +\[ +C(u,v) = P(U \le u, V \le v). +\] + +Sklar’s theorem states that any joint distribution can be written + +\[ +F_{X,Y}(x,y) += +C(F_X(x),F_Y(y)). +\] + +Thus the copula isolates the **dependence structure independently of the marginal distributions**. + +--- + +## Directional Statistics and Probability Space + +The directional framework provides a natural interpretation of this transformation. + +From earlier chapters, the cumulative distribution function equals the degree-zero lower partial moment: + +\[ +F_X(t) = L_0(t;X). +\] + +Thus the copula transformation + +\[ +U = F_X(X) +\] + +can be written + +\[ +U = L_0(X;X). +\] + +In other words, copula coordinates arise directly from **directional probability transforms**. + +Each observation is mapped into probability space according to its position within the cumulative distribution. 
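
The transform \(U = L_0(X;X)\) can be sketched empirically in base R. This is an illustration under stated assumptions rather than the package implementation: `lpm0` is a hypothetical helper written for this sketch, and the simulated marginals are arbitrary.

```r
# Degree-zero lower partial moment: share of observations at or below target t.
# `lpm0` is a hypothetical helper for this sketch, not an NNS function.
lpm0 <- function(t, obs) mean(obs <= t)

set.seed(1)
x_raw <- rexp(200)                       # skewed marginal, chosen arbitrarily
y_raw <- x_raw^2 + rnorm(200, sd = 0.1)  # dependent second variable

# Copula coordinates: each observation is evaluated against its own sample.
u <- vapply(x_raw, lpm0, numeric(1), obs = x_raw)
v <- vapply(y_raw, lpm0, numeric(1), obs = y_raw)

# Without ties the transform equals rank / n, so it lies in (0, 1].
rank_gap <- max(abs(u - rank(x_raw) / length(x_raw)))

# A strictly increasing change of marginal leaves the coordinates unchanged.
u_log <- vapply(log(x_raw), lpm0, numeric(1), obs = log(x_raw))
invariance_gap <- max(abs(u - u_log))
```

Because the coordinates depend only on the ordering of the observations, any strictly increasing transformation of a marginal produces the same `u`. This is the sense in which the copula coordinates discard marginal information.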
+ +--- + +## Directional Regions in Copula Space + +Once transformed, the joint distribution lies in the unit square + +\[ +[0,1]^2. +\] + +Benchmarks partition this square into directional regions. + +For probability thresholds \(u_t\) and \(v_t\), + +| | \(V \le v_t\) | \(V > v_t\) | +|---|---|---| +| \(U \le u_t\) | joint lower region | divergent region | +| \(U > u_t\) | divergent region | joint upper region | + +These correspond directly to the four directional regions defined earlier: + +- **CoLPM region:** both variables below benchmark +- **CoUPM region:** both variables above benchmark +- **DLPM region:** \(X\) above benchmark, \(Y\) below +- **DUPM region:** \(X\) below benchmark, \(Y\) above + +Thus the directional framework partitions the copula domain in the same way that co-partial moments partition the original joint distribution. + +--- + +## Co-Partial Moments as Weighted Copula Regions + +Directional co-partial moments measure deviations within these regions. + +For benchmarks \(t_X\) and \(t_Y\), + +\[ +CoLPM_{r,s}(X,Y) += +E[(t_X-X)_+^r (t_Y-Y)_+^s]. +\] + +The corresponding copula probability is + +\[ +P(X \le t_X, Y \le t_Y) += +C(F_X(t_X),F_Y(t_Y)). +\] + +Equivalently, + +\[ +CoLPM_{0,0}(t_X,t_Y) += +C(F_X(t_X),F_Y(t_Y)). +\] + +Thus + +- copulas measure **probability of directional regions**, while +- co-partial moments measure **magnitude of deviations within those regions**. + +Higher orders \(r,s\) increase sensitivity to extreme observations, producing a continuous generalization of tail dependence. + +--- + +## Copula Representation of Co-Partial Moments + +Directional co-partial moments admit a direct representation in copula space. + +**Theorem 11.1 (Copula Representation of Co-Partial Moments)** + +Let \(X\) and \(Y\) be continuous random variables with copula \(C(u,v)\) and quantile functions \(Q_X(u)\), \(Q_Y(v)\). Here + +\[ +Q_X(u)=F_X^{-1}(u), \qquad Q_Y(v)=F_Y^{-1}(v) +\] + +denote the marginal quantile functions. 
+ +Then the co-lower partial moment can be written + +\[ +CoLPM_{r,s}(X,Y) += +\int_0^1 +\int_0^1 +(t_X - Q_X(u))_+^r +(t_Y - Q_Y(v))_+^s +\, dC(u,v). +\] + +**Proof.** + +By Sklar’s theorem, + +\[ +(X,Y) = +(Q_X(U), Q_Y(V)) +\] + +where \((U,V)\) follows the copula \(C\). + +Substituting into the definition of the co-lower partial moment gives + +\[ +E[(t_X-Q_X(U))_+^r (t_Y-Q_Y(V))_+^s]. +\] + +Expressing the expectation with respect to the copula distribution yields the result. ∎ + +This representation shows that copulas describe **probability mass over directional regions**, while co-partial moments additionally weight observations by their **deviation magnitudes**. + +--- + +## Example: Directional Dependence Surface and Copula Transformation + +The following illustration uses functions from the **NNS R package** introduced earlier. In particular, `Co.LPM()` computes co-lower partial moments and `LPM.ratio()` produces the probability transform used for directional ranking. + +Generate correlated Gaussian observations: + +```r +library(MASS) +library(NNS) + +set.seed(123) + +Sigma <- matrix(c(1,0.7,0.7,1),2,2) +xy <- mvrnorm(100,c(0,0),Sigma) + +x <- xy[,1] +y <- xy[,2] + +z <- expand.grid(x,y) +``` + +Plot the degree-zero **Co-Lower Partial Moment surface**, with each grid point serving as its own benchmark pair \((t_X,t_Y)\): + +```r +rgl::plot3d( + z[,1], + z[,2], + Co.LPM(0,z[,1],z[,2],z[,1],z[,2]), + col="red" +) +``` + +
    +![Figure 11.1. Co-LPM surface in original \((x,y)\) space (red), visualizing joint downside structure before copula transformation.](images/ch11_raw_copula.png) +
    + +In the call `Co.LPM(0, z[,1], z[,2], z[,1], z[,2])`, the argument order is `(degree, x, y, target.x, target.y)`. Setting `degree = 0` produces the probability-level co-lower partial moment, and reusing `z[,1]`/`z[,2]` as both variables and targets evaluates the surface over the full grid of benchmark pairs. The visualization uses `rgl::plot3d`, which produces an interactive three-dimensional plot. + +This surface represents the magnitude of **joint downside deviations** in the original variable space. + +Next transform the variables into probability space: + +```r +u_x <- LPM.ratio(0,x,x) +u_y <- LPM.ratio(0,y,y) + +z <- expand.grid(u_x,u_y) +``` + +Plotting the same directional statistic in probability space gives + +```r +rgl::plot3d( + z[,1], + z[,2], + Co.LPM(0,z[,1],z[,2],z[,1],z[,2]), + col="blue" +) +``` + +
    +![Figure 11.2. Co-LPM surface in copula/probability space \([0,1]^2\) (blue), showing the same dependence geometry after marginal probability transformation.](images/ch11_transformed_copula.png) + +
    + +The resulting surface lies within the unit square \([0,1]^2\), which represents the **copula domain**. + +The transformation + +\[ +(X,Y) \rightarrow (F_X(X),F_Y(Y)) +\] + +changes the coordinate system but preserves the dependence structure. + +For a direct multivariate dependence summary in the package, `NNS.copula()` can be called on a matrix of variables: + +```r +set.seed(123) +z3 <- rnorm(length(x)) + +NNS.copula(cbind(x, y, z3), plot = TRUE, independence.overlay = TRUE) +## [1] 0.302 +``` + +The return value is a single scalar in \([0,1]\) for the full multivariate system, where values closer to 0 indicate near-independence and values closer to 1 indicate stronger joint dependence. When needed, `continuous = TRUE` can be supplied to align with the continuous-CDF formulation used elsewhere in the package vignettes. + +--- + +## Tail Dependence and Directional Moments + +Copula theory frequently focuses on **tail dependence**, which measures the probability that variables experience extreme outcomes simultaneously. + +Upper tail dependence is defined as + +\[ +\lambda_U += +\lim_{u\to1^-} +P(V>u \mid U>u) +\] + +and lower tail dependence as + +\[ +\lambda_L += +\lim_{u\to0^+} +P(V\le u \mid U\le u). +\] + +These limits exist for many commonly used copula families. In some cases, such as the Gaussian copula, both coefficients equal zero even when correlation is strong. + +Directional statistics provides a natural extension of this concept. + +Let + +\[ +t_X = Q_X(u), \qquad +t_Y = Q_Y(u) +\] + +denote quantile thresholds approaching the lower tail. + +The degree-zero co-partial moment is + +\[ +CoLPM_{0,0}(t_X,t_Y) += +P(X\le t_X, Y\le t_Y). +\] + +Then + +\[ +\frac{CoLPM_{0,0}(t_X,t_Y)}{P(X\le t_X)} += +P(Y\le t_Y \mid X\le t_X). +\] + +As \(u\to0\), this conditional probability converges to the copula lower tail dependence coefficient + +\[ +\lambda_L. 
+\] + +This follows directly from the definition of tail dependence as the limit of conditional copula probabilities. + +Higher-order directional moments generalize this concept by weighting deviations within the tail region. + +--- + +## Comparison with Classical Copula Models + +Copula analysis often relies on parametric families such as + +- Gaussian copulas +- Clayton copulas +- Gumbel copulas +- Student-t copulas. + +These models impose specific functional forms on the dependence structure. + +Directional dependence differs in several important ways. + +### Nonparametric Structure + +Co-partial moments are estimated directly from the data and do not require specifying a copula family. + +### Sensitivity to Extreme Deviations + +Many classical copulas capture **tail coincidence probabilities** but ignore the magnitude of extreme events. For example, the Gaussian copula has zero tail dependence unless correlation is exactly one, a property that has surprised many practitioners. + +Directional moments avoid this limitation by measuring **deviation magnitude within tail regions**. + +### Benchmark Flexibility + +Copula analysis typically evaluates dependence at probability thresholds. Directional statistics instead allows benchmarks to be specified directly in the variable space. + +--- + +## Multivariate Extension + +Copula theory extends naturally to higher dimensions. + +For variables + +\[ +X_1,\dots,X_d +\] + +the joint distribution can be written + +\[ +F(x_1,\dots,x_d) += +C(F_1(x_1),\dots,F_d(x_d)). +\] + +Directional statistics extends similarly. + +Benchmarks \(t_1,\dots,t_d\) partition the sample space into directional regions across all variables. Each variable contributes two directional states (above or below its benchmark), producing \(2^d\) joint regions. + +Multivariate co-partial moments measure deviations within these regions. 
For example, + +\[ +E[(t_1-X_1)_+(t_2-X_2)_+(t_3-X_3)_+] +\] + +captures joint downside deviations across three variables. + +In practice analysts often focus on specific regions of interest—such as the region where all variables fall below their benchmarks—rather than enumerating all \(2^d\) regions explicitly. + +--- + +## Structural Interpretation + +Copulas separate **marginal distributions** from **dependence structure**. + +Directional statistics provides a complementary perspective: + +- Marginal distributions arise from **degree-zero partial moments**. +- Dependence arises from **co-partial moments across directional regions**. + +Thus both frameworks describe the same joint structure from different viewpoints. + +Copulas emphasize **rank-based probability structure**, while directional moments emphasize **benchmark-relative deviations**. + +--- + +## Summary + +This chapter connected directional dependence with copula theory. + +Key observations include: + +1. Copulas represent dependence in probability space. +2. Probability transforms map observations into the unit square. +3. Directional benchmarks partition copula space into four dependence regions. +4. Degree-zero co-partial moments correspond to copula region probabilities. +5. Higher-order co-partial moments generalize tail dependence by weighting extreme deviations. +6. Directional methods provide a nonparametric and magnitude-sensitive interpretation of copula dependence. + +Directional partial moments therefore offer a natural bridge between benchmark-based statistics and rank-based dependence analysis. + +The next chapter develops conditional probability and Bayes' theorem from the partial-moment framework. 
diff --git a/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd b/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd new file mode 100644 index 0000000..6aa8a7a --- /dev/null +++ b/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd @@ -0,0 +1,403 @@ +# Conditional Probability and Bayes' Theorem + +Chapters 9–11 established the directional framework for analyzing relationships between variables. A central missing ingredient for inference is **conditional probability**: how the probability of one event changes when information about another event becomes available. + +Classical statistics defines conditional probability through probability ratios. The directional framework provides a deeper interpretation: conditional probabilities arise directly from **degree-zero co-partial moments**, and Bayes' theorem follows as a simple algebraic identity within this structure. Moreover, degree-one co-partial moments go further — they are not merely risk metrics but **distributional generators** from which the full joint law can be recovered through differentiation. + +This chapter develops the directional formulation of conditional probability, derives Bayes' theorem from partial-moment relationships, and connects degree-zero and degree-one co-partial moments to establish their joint role in inference. + +--- + +## Classical Conditional Probability + +Let \(A\) and \(B\) be events with \(P(B) > 0\). + +The classical definition of conditional probability is + +\[ +P(A \mid B) = \frac{P(A \cap B)}{P(B)}. +\] + +The numerator represents the probability that both events occur simultaneously. The denominator represents the probability that the conditioning event occurs. + +Conditional probability therefore measures the **relative frequency of \(A\) within the subset of outcomes where \(B\) occurs**. + +This definition leads directly to the **multiplication rule** + +\[ +P(A \cap B) = P(A \mid B)\,P(B). 
+\] + +The directional framework reproduces these relationships naturally through partial moments — and in doing so, reveals their geometric origin in the joint distribution. + +--- + +## Events as Degree-Zero Partial Moments + +From Chapter 3, the cumulative distribution function can be written as a degree-zero lower partial moment: + +\[ +F_X(t) = L_0(t;\,X) = P(X \le t). +\] + +Events defined by inequalities correspond directly to degree-zero partial moments. For two variables \(X\) and \(Y\), + +\[ +P(X \le t_X) = L_0(t_X;\,X), \qquad P(Y \le t_Y) = L_0(t_Y;\,Y). +\] + +Symmetrically, the survival function from Chapter 3 gives + +\[ +P(X > t_X) = U_0(t_X;\,X), \qquad P(Y > t_Y) = U_0(t_Y;\,Y). +\] + +The joint events across all four quadrants defined by benchmarks \((t_X, t_Y)\) correspond to the four **degree-zero co-partial moments**: + +\[ +\mathrm{CoLPM}_{0,0}(t_X,t_Y) = P(X \le t_X,\; Y \le t_Y), +\] + +\[ +\mathrm{CoUPM}_{0,0}(t_X,t_Y) = P(X > t_X,\; Y > t_Y), +\] + +\[ +\mathrm{DLPM}_{0,0}(t_X,t_Y) = P(X \le t_X,\; Y > t_Y), +\] + +\[ +\mathrm{DUPM}_{0,0}(t_X,t_Y) = P(X > t_X,\; Y \le t_Y). +\] + +The concordant moments, CoLPM and CoUPM, capture joint movement in the same directional region. The **divergent moments**, DLPM and DUPM, capture the two cross-quadrant regions where \(X\) and \(Y\) move in opposite directions relative to their benchmarks — directly parallel to the divergent co-partial moment structure developed in Chapter 10. All four are degree-zero specializations of the general co-partial moment framework. 
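The four degree-zero co-partial moments can be estimated by counting which quadrant each observation falls in. The following is a minimal sketch in plain Python, not the NNS implementation: the function name `quadrant_probabilities` and the sample data are illustrative assumptions, and the inequalities follow the definitions above (`<=` for lower regions, `>` for upper regions).

```python
def quadrant_probabilities(x, y, t_x, t_y):
    """Empirical degree-zero co-partial moments of paired samples.

    Hypothetical helper for illustration; each observation is assigned
    to exactly one of the four benchmark-relative quadrants.
    """
    counts = {"CoLPM": 0, "CoUPM": 0, "DLPM": 0, "DUPM": 0}
    for xi, yi in zip(x, y):
        if xi <= t_x and yi <= t_y:
            counts["CoLPM"] += 1      # both at or below benchmark
        elif xi > t_x and yi > t_y:
            counts["CoUPM"] += 1      # both above benchmark
        elif xi <= t_x and yi > t_y:
            counts["DLPM"] += 1       # X at or below, Y above
        else:
            counts["DUPM"] += 1       # X above, Y at or below
    n = len(x)
    return {name: count / n for name, count in counts.items()}


# Made-up sample data, used only to exercise the function.
x = [-1.0, -0.5, 0.2, 0.8, 1.5, -0.3, 0.4, -1.2, 0.9, -0.1]
y = [-0.8, 0.3, -0.4, 1.1, 0.7, -0.6, 0.5, -0.2, -0.9, 0.6]
q = quadrant_probabilities(x, y, t_x=0.0, t_y=0.0)
print(q)
```

Because the four conditions are mutually exclusive and exhaustive, every observation is counted exactly once.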
+ +--- + +## The Four-Quadrant Probability Partition + +Benchmarks \((t_X, t_Y)\) partition the joint distribution into four mutually exclusive regions: + +| | \(Y \le t_Y\) | \(Y > t_Y\) | +|------------------|----------------------|----------------------| +| \(X \le t_X\) | CoLPM\(_{0,0}\) | DLPM\(_{0,0}\) | +| \(X > t_X\) | DUPM\(_{0,0}\) | CoUPM\(_{0,0}\) | + +Because these four regions partition the joint distribution completely, + +\[ +\mathrm{CoLPM}_{0,0} ++ \mathrm{CoUPM}_{0,0} ++ \mathrm{DLPM}_{0,0} ++ \mathrm{DUPM}_{0,0} += 1. +\] + +This is the **degree-zero partition of unity**: each observation contributes exactly one unit of probability mass to exactly one quadrant. It is a complete nonparametric probability representation of the joint distribution relative to any pair of benchmarks. + +In NNS R-package notation: +\[ +\begin{aligned} +1 &= \texttt{Co.UPM}(0,X,Y,t_X,t_Y) + + \texttt{D.UPM}(0,0,X,Y,t_X,t_Y) \\ + &\quad + \texttt{D.LPM}(0,0,X,Y,t_X,t_Y) + + \texttt{Co.LPM}(0,X,Y,t_X,t_Y). +\end{aligned} +\] + +where the four terms correspond respectively to +\(P(X>t_X,\,Y>t_Y)\), +\(P(X>t_X,\,Y\le t_Y)\), +\(P(X\le t_X,\,Y>t_Y)\), and +\(P(X\le t_X,\,Y\le t_Y)\). +Conditional probabilities are simply **relative weights of these regions after conditioning on one of the marginals**. + +--- + +## Conditional Probability from Co-Partial Moments + +All eight conditional probabilities arising from the four-quadrant partition can be expressed as ratios of a co-partial moment to a marginal partial moment. We organize them by quadrant. + +## Concordant lower-tail conditioning + +\[ +P(Y \le t_Y \mid X \le t_X) += \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad +P(X \le t_X \mid Y \le t_Y) += \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}. 
+\] + +## Concordant upper-tail conditioning + +\[ +P(Y > t_Y \mid X > t_X) += \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)}, \qquad +P(X > t_X \mid Y > t_Y) += \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)}. +\] + +## Discordant conditioning + +\[ +P(Y > t_Y \mid X \le t_X) += \frac{\mathrm{DLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad +P(Y \le t_Y \mid X > t_X) += \frac{\mathrm{DUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)}. +\] + +Conditioning the same two divergent quadrants on \(Y\) instead of \(X\) gives the remaining pair: + +\[ +P(X \le t_X \mid Y > t_Y) += \frac{\mathrm{DLPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)}, \qquad +P(X > t_X \mid Y \le t_Y) += \frac{\mathrm{DUPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}. +\] + +Each formula follows from the same logic: the joint probability of the relevant quadrant divided by the marginal probability of the conditioning event. Together, these eight expressions **complete the four-quadrant conditional probability picture**: every conditional probability involving thresholds on \(X\) and \(Y\) is a ratio of a degree-zero co-partial moment to a marginal degree-zero partial moment. + +In NNS notation, letting \(A = \{X > t_X\}\) and \(B = \{Y > t_Y\}\): + +\[ +P(A) = \texttt{UPM}(0,t_X,X), \qquad +P(B) = \texttt{UPM}(0,t_Y,Y), +\] + +\[ +P(B \mid A) = \frac{\texttt{Co.UPM}(0,X,Y,t_X,t_Y)}{\texttt{UPM}(0,t_X,X)}, \qquad +P(A \mid B) = \frac{\texttt{Co.UPM}(0,X,Y,t_X,t_Y)}{\texttt{UPM}(0,t_Y,Y)}. +\] + +--- + +## Bayes' Theorem + +Bayes' theorem describes how conditional probabilities relate when the conditioning direction is reversed. + +Starting from the multiplication rule, + +\[ +P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A). +\] + +Equating the two expressions and solving for \(P(A \mid B)\) yields Bayes' theorem: + +\[ +P(A \mid B) += +\frac{P(B \mid A)\,P(A)}{P(B)}. +\] + +This identity allows probabilities to be updated when new information becomes available. The directional framework reveals that this is not merely an algebraic manipulation of probability ratios; it is a direct consequence of the symmetry of co-partial moments. + +--- + +## Bayes' Theorem from Partial Moments + +Using the directional framework, Bayes' theorem follows immediately from co-partial moment identities.
+ +## Lower-tail derivation + +Let \(A = \{X \le t_X\}\) and \(B = \{Y \le t_Y\}\). From Section 13.4, + +\[ +P(B \mid A) = \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad +P(A \mid B) = \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}. +\] + +Rearranging the first equation gives + +\[ +\mathrm{CoLPM}_{0,0}(t_X,t_Y) = P(B \mid A)\,L_0(t_X;\,X). +\] + +Substituting into the second equation, + +\[ +P(A \mid B) += \frac{P(B \mid A)\,L_0(t_X;\,X)}{L_0(t_Y;\,Y)}. +\] + +Since \(L_0(t_X;\,X) = P(A)\) and \(L_0(t_Y;\,Y) = P(B)\), + +\[ +\boxed{P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.} +\] + +## Upper-tail derivation + +The identical derivation holds in the upper region. Let \(A = \{X > t_X\}\) and \(B = \{Y > t_Y\}\). Replacing CoLPM with CoUPM and \(L_0\) with \(U_0\) throughout, + +\[ +P(B \mid A) = \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)}, \qquad +P(A \mid B) = \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)}, +\] + +which yields the same Bayes identity by the same algebra. **Bayes' theorem holds symmetrically across all four directional regions**, reflecting the structural symmetry of the co-partial moment framework rather than any special property of the lower tail. + +--- + +## Posterior Probability Interpretation + +Bayesian inference interprets probabilities as quantities that update when new information becomes available. + +Let + +- \(P(A)\) be the **prior probability**, +- \(P(B \mid A)\) the **likelihood**, +- \(P(A \mid B)\) the **posterior probability**. 
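The three roles listed above can be made concrete with a small numerical check of the lower-tail identity. This is a sketch in plain Python under stated assumptions: the function name `lower_tail_bayes` and the sample data are hypothetical, and both marginal probabilities are assumed to be strictly positive so that the ratios are defined.

```python
def lower_tail_bayes(x, y, t_x, t_y):
    """Prior, likelihood and posterior for A = {X <= t_x}, B = {Y <= t_y}.

    Illustrative helper (not an NNS function). Assumes P(A) > 0 and P(B) > 0.
    """
    n = len(x)
    prior = sum(xi <= t_x for xi in x) / n        # L_0(t_x; X) = P(A)
    evidence = sum(yi <= t_y for yi in y) / n     # L_0(t_y; Y) = P(B)
    co_lpm = sum(xi <= t_x and yi <= t_y
                 for xi, yi in zip(x, y)) / n     # CoLPM_{0,0} = P(A and B)
    likelihood = co_lpm / prior                   # P(B | A)
    posterior_bayes = likelihood * prior / evidence
    posterior_direct = co_lpm / evidence          # P(A | B) from the quadrant
    return prior, likelihood, posterior_bayes, posterior_direct


# Made-up sample data, used only to exercise the function.
x = [-1.0, -0.5, 0.2, 0.8, 1.5, -0.3, 0.4, -1.2, 0.9, -0.1]
y = [-0.8, 0.3, -0.4, 1.1, 0.7, -0.6, 0.5, -0.2, -0.9, 0.6]
prior, likelihood, post_bayes, post_direct = lower_tail_bayes(x, y, 0.0, 0.0)
print(prior, likelihood, post_bayes, post_direct)
```

The posterior computed through Bayes' rule and the posterior read directly from the quadrant ratio agree up to floating-point rounding, which is the content of the derivation.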
+ +Within the directional framework: + +- **priors** correspond to marginal degree-zero partial moments — the probability mass of a directional region before conditioning, +- **likelihoods** correspond to conditional probabilities derived from co-partial moments — how that mass concentrates when the other variable is observed, +- **posteriors** represent **updated directional probabilities after conditioning** — the renormalized weight of one quadrant given information from another. + +Bayesian updating therefore corresponds to **redistributing probability weight across the four directional regions of the joint distribution**. The four-quadrant partition of Section 13.3 is the geometric object being operated on; Bayes' theorem is the renormalization rule. + +--- + +## Example + +Suppose a dataset contains observations of two variables \(X\) and \(Y\). Let the benchmarks be + +\[ +t_X = 0, \qquad t_Y = 0. +\] + +Assume empirical probabilities are + +\[ +P(X \le 0) = 0.4, \qquad P(Y \le 0) = 0.5, \qquad P(X \le 0,\; Y \le 0) = 0.3. +\] + +Then + +\[ +P(Y \le 0 \mid X \le 0) = \frac{0.3}{0.4} = 0.75, \qquad +P(X \le 0 \mid Y \le 0) = \frac{0.3}{0.5} = 0.6. +\] + +Applying Bayes' theorem as a check: + +\[ +P(X \le 0 \mid Y \le 0) += \frac{P(Y \le 0 \mid X \le 0)\,P(X \le 0)}{P(Y \le 0)} += \frac{0.75 \times 0.4}{0.5} = 0.6. \checkmark +\] + +The directional framework identifies \(\mathrm{CoLPM}_{0,0}(0,0) = 0.3\) as the probability mass in the joint lower-left region. Conditional probabilities are relative frequencies within that region, computed directly from partial-moment ratios without distributional assumptions. + +--- + +## The Degree-One Extension: Co-Partial Moments as Distributional Generators + +The analysis so far has operated at degree zero, where co-partial moments are indicator-level probability masses. A natural question arises: what additional structure is carried by **degree-one co-partial moments**? 
+ +## The hinge surface + +Define the **degree-one lower co-partial moment surface** + +\[ +H(t_X, t_Y) = E\!\bigl[(t_X - X)_+\,(t_Y - Y)_+\bigr]. +\] + +This replaces indicator contributions with **hinge magnitudes** — continuous functions of how far each variable falls below its benchmark. Unlike degree-zero moments, which record whether an observation lands in a quadrant, degree-one moments record **how far into that quadrant it lies**. + +## Continuous partition of unity + +Degree-one co-partial moments form a continuous partition of unity over the same four-quadrant geometry as degree zero. Defining concordant and divergent degree-one quantities + +\[ +C^{--}(t_X,t_Y) = E[(t_X-X)_+(t_Y-Y)_+], \quad +C^{++}(t_X,t_Y) = E[(X-t_X)_+(Y-t_Y)_+], +\] + +\[ +D^{+-}(t_X,t_Y) = E[(X-t_X)_+(t_Y-Y)_+], \quad +D^{-+}(t_X,t_Y) = E[(t_X-X)_+(Y-t_Y)_+], +\] + +and total magnitude \(S = C^{--} + C^{++} + D^{+-} + D^{-+}\), the normalized weights + +\[ +w^{--} = \frac{C^{--}}{S}, \quad w^{++} = \frac{C^{++}}{S}, \quad w^{+-} = \frac{D^{+-}}{S}, \quad w^{-+} = \frac{D^{-+}}{S} +\] + +satisfy \(w^{--} + w^{++} + w^{+-} + w^{-+} = 1\) with all weights non-negative whenever \(S > 0\). The case \(S = 0\) is degenerate: since all four hinge products are non-negative, \(S = 0\) implies each term is zero, which in turn requires that for every observation, \(X = t_X\) or \(Y = t_Y\) (or both). This is a measure-zero event under any absolutely continuous distribution but can arise in discrete data; in practice one simply avoids placing benchmarks at point masses. In the limit as degree approaches zero, the normalized weights collapse to the hard quadrant probabilities of Section 13.3. + +## Distributional recovery + +The hinge surface carries more information than its degree-zero counterpart. The following result shows that \(H\) is a complete representation of the joint law. + +**Theorem** (Distributional Recovery). 
Assume \((X, Y)\) is integrable and differentiation under the expectation is valid (e.g., by dominated convergence). Then at all continuity points of the joint CDF \(F_{X,Y}\), + +\[ +\frac{\partial^2 H}{\partial t_X\,\partial t_Y}(t_X, t_Y) = F_{X,Y}(t_X, t_Y). +\] + +If \(F_{X,Y}\) is absolutely continuous with sufficiently smooth density \(f_{X,Y}\), then + +\[ +\frac{\partial^4 H}{\partial t_X^2\,\partial t_Y^2}(t_X, t_Y) = f_{X,Y}(t_X, t_Y). +\] + +Consequently, \(H(\cdot,\cdot)\) over all threshold pairs uniquely determines the joint law and, when it exists, the joint density. The qualification "at all continuity points of \(F_{X,Y}\)" is essential: for discrete distributions, the CDF has jump discontinuities and the derivative identities hold only at points where \(F_{X,Y}\) is continuous. + +**Proof sketch.** To justify differentiation under the expectation, assume there exists an integrable envelope dominating the local difference quotients of the hinge terms in a neighborhood of \((t_X,t_Y)\); this is exactly the dominated-convergence qualification in the theorem statement. Using \(\partial_{t_X}(t_X - X)_+ = \mathbf{1}\{X \le t_X\}\) and \(\partial_{t_Y}(t_Y - Y)_+ = \mathbf{1}\{Y \le t_Y\}\) almost everywhere, + +\[ +\frac{\partial H}{\partial t_X}(t_X,t_Y) = E\!\bigl[\mathbf{1}\{X \le t_X\}(t_Y - Y)_+\bigr], +\] + +and therefore + +\[ +\frac{\partial^2 H}{\partial t_X\,\partial t_Y}(t_X,t_Y) += E\!\bigl[\mathbf{1}\{X \le t_X\}\,\mathbf{1}\{Y \le t_Y\}\bigr] += P(X \le t_X,\; Y \le t_Y) += F_{X,Y}(t_X,t_Y). \qquad\square +\] + +The surface \(H\) is directly estimable from data by averaging hinge products. Mixed second derivatives recover the joint CDF numerically via finite differences; further differentiation recovers the density, though this is noisier due to higher-order amplification of sampling variation. 
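The finite-difference recovery described above can be sketched as follows. This is an illustration in plain Python, not the NNS implementation: the function names, the simulated data, and the step size `h = 0.1` are all assumptions chosen for the example. A central mixed second difference of the empirical hinge surface is compared with the empirical joint CDF evaluated at the corners of the differencing window.

```python
import random


def hinge_surface(x, y, t_x, t_y):
    """Empirical H(t_x, t_y): average of (t_x - X)_+ (t_y - Y)_+."""
    total = sum(max(t_x - xi, 0.0) * max(t_y - yi, 0.0)
                for xi, yi in zip(x, y))
    return total / len(x)


def empirical_joint_cdf(x, y, t_x, t_y):
    """Empirical P(X <= t_x, Y <= t_y)."""
    return sum(xi <= t_x and yi <= t_y for xi, yi in zip(x, y)) / len(x)


def mixed_second_difference(x, y, t_x, t_y, h):
    """Central finite-difference estimate of d^2 H / (dt_x dt_y)."""
    return (hinge_surface(x, y, t_x + h, t_y + h)
            - hinge_surface(x, y, t_x + h, t_y - h)
            - hinge_surface(x, y, t_x - h, t_y + h)
            + hinge_surface(x, y, t_x - h, t_y - h)) / (4.0 * h * h)


# Simulated, positively dependent data (illustrative only).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

h = 0.1
approx = mixed_second_difference(x, y, 0.0, 0.0, h)
lower = empirical_joint_cdf(x, y, -h, -h)
upper = empirical_joint_cdf(x, y, h, h)
print(lower, approx, upper)
```

The central difference equals the average of a product of two clipped ramps, so it always lies between the empirical CDF at \((t_X - h,\, t_Y - h)\) and at \((t_X + h,\, t_Y + h)\). Shrinking `h` tightens that bracket but amplifies sampling noise, which is the trade-off noted in the text.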
+ +See [Discrete and Continuous Bayes](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/discrete_and_continuous_bayes.md) for a worked example. + + + +## Hierarchy across degrees + +This result establishes a natural hierarchy: + +- **Degree 0**: Indicator-level probability partition — the four-quadrant decomposition of Sections 13.3–13.6. +- **Degree 1**: Continuous hinge partition of unity and complete distributional recovery — the minimal degree at which partial moments become full distributional generators. +- **Higher degrees**: Tail-emphasized continuous partitions that place increasing weight on extreme deviations from the benchmark. These are valuable for risk analysis in benchmark-relative tail analysis, but do not increase representational completeness beyond what degree one already provides. Once the full joint law is recovered, higher degrees refine which parts of the distribution are emphasized, not what is represented. + +Thus **degree one is the completeness threshold**: the minimal order at which the full joint law is captured. + +--- + +## Partial Moments as a Bridge Between Bayesian and Frequentist Inference + +A deeper consequence of the distributional recovery theorem is that partial moments provide a **law-invariant bridge** between Bayesian and frequentist statistical frameworks. + +A functional is **law-invariant** if its value depends only on the distribution of the random variable, not on the specific probability space or the process that generated it. Partial moments are law-invariant in precisely this sense: \(L_n(t;\,X)\) and \(U_n(t;\,X)\) depend only on the distribution of \(X\), not on how that distribution was constructed. 
+ +The central distinction between Bayesian and frequentist perspectives is how the probability measure \(P\) is constructed: Bayesians form a **posterior predictive distribution** by updating a prior with data; frequentists approximate the **data-generating measure** directly with the empirical distribution. Both pipelines ultimately produce a probability measure for outcomes \(X\), and once that measure is specified, partial-moment operators act on it identically. + +Formally: + +- **Bayesian path**: prior \(\pi(\theta)\) → likelihood \(L(D \mid \theta)\) → posterior \(\pi(\theta \mid D)\) → posterior predictive \(P_B = \int P_\theta\,\pi(d\theta \mid D)\) → compute \(L_n(t;\,X)\), \(U_n(t;\,X)\) with \(X \sim P_B\). + +- **Frequentist path**: empirical law \(\hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}\) → compute \(L_n(t;\,X)\), \(U_n(t;\,X)\) with \(X \sim \hat{P}_n\). + +Because both pipelines reduce to the same partial-moment operators applied to different input distributions, **any two models that agree on the distribution of \(X\) will produce identical partial moments** — not just the normalized degree-one weights, but all partial moments of all degrees. The formula stays the same; only the input distribution changes. This is the formal sense in which partial moments are a practical lingua franca between Bayesian and frequentist workflows. + +The directional formulation of Bayes' theorem in Sections 13.5–13.6 is therefore paradigm-agnostic: it holds whether the joint distribution is constructed from a posterior predictive, an empirical distribution, or any other probability measure fed into the co-partial moment operators. + +--- + +## Summary + +Conditional probability and Bayes' theorem arise naturally and completely from the partial-moment framework. Key results include: + +- Degree-zero partial moments represent **probabilities of directional events**, recovering the CDF and survival function as special cases. 
+- Degree-zero co-partial moments represent **joint event probabilities** and partition the joint distribution into four mutually exclusive regions — two concordant (CoLPM, CoUPM) and two divergent (DLPM, DUPM) — summing to one. +- **All eight conditional probabilities** from the four-quadrant partition are ratios of a degree-zero co-partial moment to a marginal partial moment. +- **Bayes' theorem follows directly from co-partial moment identities**, holds symmetrically in both the lower and upper tails, and requires no distributional assumptions. +- **Bayesian updating corresponds to renormalizing the four-quadrant probability partition** after conditioning on one marginal. +- **Degree-one co-partial moments are distributional generators**: the mixed second derivative of the hinge surface recovers the joint CDF at all continuity points, and the mixed fourth derivative recovers the joint density when it exists. Degree one is the completeness threshold. +- **Partial moments are law-invariant**: they depend only on the induced distribution of \(X\), making them identical across Bayesian and frequentist pipelines whenever those pipelines agree on the distribution of outcomes. + +The next chapter extends these conditional-probability tools to directional causation — asking not merely how probability mass is distributed across variables, but which variable is doing the driving. diff --git a/tools/NNS/book/chapter-14-directional-causation.Rmd b/tools/NNS/book/chapter-14-directional-causation.Rmd new file mode 100644 index 0000000..ab0080d --- /dev/null +++ b/tools/NNS/book/chapter-14-directional-causation.Rmd @@ -0,0 +1,305 @@ +--- +output: + pdf_document: default + html_document: default +--- +# Directional Causation + +Chapters 9–11 developed the directional framework for measuring dependence between variables. Co-partial moments were shown to capture asymmetric, nonlinear, and tail-specific joint behavior that classical correlation obscures. 
The copula interpretation then demonstrated how dependence structure can be separated from marginal distributions entirely. + +Dependence alone, however, does not imply **causation**. Building on the conditional-probability and Bayes machinery from the previous chapter, we now turn to directional influence. +Two variables may move together because + +- one variable influences the other, +- both are driven by a common underlying factor, or +- the relationship arises from structural constraints in the system. + +Identifying the direction and strength of causal influence requires more than a symmetric dependence measure. It requires a framework that can detect **which variable is doing the driving**. + +Classical approaches to this problem rely on **Granger causality**, which uses parametric time-series regressions to test whether lagged values of one variable improve linear predictions of another. The directional framework offers a different approach: causal structure is inferred from **nonlinear probability relationships between variables after removing each variable's internal dynamics**, without imposing a parametric model. + +This chapter develops the **directional causation framework** in three stages: removing internal dynamics through nonlinear lag normalization, placing residual signals on a shared scale through joint rangespace normalization, and measuring causal influence through partial-moment-based conditional probability and asymmetric directional dependence. + +--- + +## Limitations of Classical Granger Causality + +In classical time-series analysis, a variable \(X\) is said to *Granger-cause* \(Y\) if past values of \(X\) improve prediction of \(Y\) beyond what \(Y\)'s own history provides. + +A typical vector autoregressive model takes the form + +\[ +Y_t = +\sum_{i=1}^{p} a_i Y_{t-i} ++ +\sum_{i=1}^{p} b_i X_{t-i} ++ +\varepsilon_t . +\] + +If the coefficients \(b_i\) are jointly significant, \(X\) is said to Granger-cause \(Y\). 
+ +Granger causality captures a genuine and important insight: the causal role of \(X\) in \(Y\) should only be assessed after conditioning on \(Y\)'s own past. The directional framework retains this principle. What changes is how that conditioning is implemented — through nonlinear normalization rather than linear regression — and what the causal evidence consists of — conditional probability and asymmetric dependence rather than regression coefficients. + +The parametric Granger approach carries four limitations that are familiar from earlier chapters. + +**Linear specification.** The causal relationship between variables is constrained to be linear. Nonlinear effects, including the simple case where \(X\) drives \(Y = X^2\), are invisible to the regression coefficients even when the causal relationship is strong and unambiguous. + +**Symmetric aggregation.** Regression models aggregate deviations symmetrically around the mean. Asymmetric causal effects — where large upward movements in \(X\) drive movements in \(Y\) but small movements do not — are absorbed into the residual. + +**Distributional assumptions.** Inference relies on parametric assumptions about the error distribution. + +**Model dependence.** Results are sensitive to lag length selection, variable inclusion, and specification choices that the investigator must make before seeing the data. + +The directional causation framework addresses all four by working entirely within the partial-moment machinery developed in Chapters 2–11. + +--- + +## Theoretical Foundations: Three Axioms + +Before developing the method, it is useful to state what a causation measure should satisfy. Three axioms motivate the construction. + +**Axiom 1 — Self-causation exclusion.** No variable should be identified as causing itself. When \(X = Y\), the causal measure should return a symmetric result indicating no net directional influence, and the diagonal of the causation matrix should be zero. 
+ +**Axiom 2 — Nonlinear causation detection.** The measure should accurately identify causal relationships that are nonlinear and directional. A functional relationship \(Y = f(X)\) should register positive causal influence from \(X\) to \(Y\) regardless of whether \(f\) is linear, quadratic, or otherwise nonmonotone. + +**Axiom 3 — Directionality proportionality.** Causal strength should scale with the degree of functional asymmetry between the two variables. When \(X\) strongly determines \(Y\) but \(Y\) only weakly constrains \(X\), the statistic should reflect this imbalance clearly and in a stable, interpretable way. + +These axioms motivate a measure built from two components: a **conditional probability** that captures whether movements in one variable constrain the range of the other, and an **asymmetric dependence** measure that captures whether those movements are directionally aligned. Neither component alone satisfies all three axioms; together they do. + +--- + +## Lagged Co-Partial Moments + +The partial-moment framework extends naturally to temporal relationships. This extension provides the conceptual foundation for the lag-normalization step that follows. + +Let \(\{X_t\}\) and \(\{Y_t\}\) be two time series with benchmarks \(t_X\) and \(t_Y\). For a lag \(\tau \ge 0\), define the **lagged co-partial moments** + +\[ +\text{CoLPM}_{r,s}^{(\tau)}(X,Y) += E\!\Bigl[(t_X - X_{t-\tau})_+^r\,(t_Y - Y_t)_+^s\Bigr] +\] + +\[ +\text{CoUPM}_{r,s}^{(\tau)}(X,Y) += E\!\Bigl[(X_{t-\tau} - t_X)_+^r\,(Y_t - t_Y)_+^s\Bigr] +\] + +where \((\cdot)_+ = \max(\cdot,0)\) is the positive-part operator from Chapter 2. + +When \(\tau = 0\) these reduce exactly to the contemporaneous co-partial moments from Chapter 10. 
The lagged versions introduce a temporal asymmetry: because the roles of \(X\) and \(Y\) are evaluated at different time points, exchanging \(X\) and \(Y\) does not produce the same quantity: + +\[ +\text{CoUPM}_{r,s}^{(\tau)}(X,Y) \neq \text{CoUPM}_{r,s}^{(\tau)}(Y,X) +\] + +in general. This asymmetry is the partial-moment foundation of directional causation. \(\text{CoUPM}_{r,s}^{(\tau)}\) is large when upward deviations of \(X\) at time \(t-\tau\) tend to be followed by upward deviations of \(Y\) at time \(t\). \(\text{CoLPM}_{r,s}^{(\tau)}\) captures the same co-movement in the downward direction. + +Like their contemporaneous counterparts from Chapter 10, lagged co-partial moments are estimated directly from sample data by replacing the expectation with an empirical average over the \(n - \tau\) available observation pairs: + +\[ +\widehat{\text{CoUPM}}_{r,s}^{(\tau)} += +\frac{1}{n-\tau} +\sum_{t=\tau+1}^{n} +(x_{t-\tau} - t_X)_+^r\,(y_t - t_Y)_+^s. +\] + +These estimators converge to their population values by the law of large numbers. + +The causation statistic developed below operationalizes this lagged structure: first removing the self-driven component of each variable, then measuring the residual cross-variable co-movement through conditional probability and asymmetric dependence. + +--- + +## Removing Internal Dynamics + +The first computational step separates each variable's **internal temporal dynamics** from its interaction with the other variable. + +This is the nonparametric analogue of pre-whitening in classical time-series analysis. In the Granger framework, pre-whitening is accomplished by including the variable's own lags in the regression. In the directional framework, it is accomplished through **nonlinear lag normalization**. + +Let \(\tau\) denote the lag order. Form the lag matrix for \(X\): + +\[ +\mathbf{X}_\tau = \bigl[X_t,\; X_{t-1},\; \dots,\; X_{t-\tau}\bigr] +\] + +and analogously \(\mathbf{Y}_\tau\) for \(Y\). 
Apply joint normalization within each lag matrix: + +\[ +X_t^{*} = \texttt{NNS.norm}(\mathbf{X}_\tau)[\,\cdot\,,1] +\qquad +Y_t^{*} = \texttt{NNS.norm}(\mathbf{Y}_\tau)[\,\cdot\,,1] +\] + +where \(\texttt{NNS.norm}(\cdot)\) implements the empirical CDF transformation — mapping each observation to its relative rank position within the column — a direct application of the degree-zero partial moment \(L_0(t;X) = P(X \le t)\) from Chapter 3. The first column of each normalized matrix gives the representation of the current observation relative to its own lag structure. + +This normalization step has a direct partial-moment interpretation. Mapping each observation through the empirical CDF of its lagged neighborhood positions it on the uniform \([0,1]\) scale relative to its own history. The resulting \(X_t^{*}\) represents the **relative standing of the current observation within its own temporal context** — precisely the information that a Granger regression extracts through linear projection, but without imposing a linear model. + +Any remaining association between \(X_t^{*}\) and \(Y_t^{*}\) therefore reflects cross-variable interaction rather than self-dependence. + +When \(\tau = 0\), the lag-normalization step is skipped and the raw variables are passed directly to the joint normalization step. This corresponds to the cross-sectional case where temporal ordering carries no information. The lag order \(\tau\) may be specified directly, or set to \(\tau = \texttt{"ts"}\) for automatic selection via the detected seasonality of each series using \(\texttt{NNS.seas}\). + +--- + +## Joint Rangespace Normalization + +Once internal dynamics have been removed, the lag-adjusted variables \(X_t^{*}\) and \(Y_t^{*}\) must be placed on a **common scale** before their interaction can be meaningfully measured. + +Because the two series may differ in scale, units, and distributional shape, direct comparison of the lag-normalized values can be misleading. 
The solution draws on the same degree-zero partial moment used in the previous step. From Chapter 3, the degree-zero lower partial moment equals the empirical CDF: + +\[ +L_0(t; X) = P(X \le t). +\] + +Mapping both variables jointly through their empirical CDFs places them on a shared \([0,1]\) scale while preserving their relative positions within the joint distribution. Chapter 10 showed that this is precisely the probability integral transform that defines copula space. + +The directional framework applies this idea jointly to the lag-adjusted variables: + +\[ +\bigl[X_t^{**},\; Y_t^{**}\bigr] += +\texttt{NNS.norm}\!\bigl(\bigl[X_t^{*},\; Y_t^{*}\bigr]\bigr). +\] + +The resulting \(X_t^{**}\) and \(Y_t^{**}\) are copula-like transforms of the lag-adjusted series. Their degree-zero partial moments are approximately uniformly distributed on \([0,1]\), connecting to the copula interpretation of Chapter 10. All subsequent probability and dependence calculations are therefore performed on variables that are simultaneously free of internal dynamics and free of scale differences. + +--- + +## Conditional Probability via Partial Moments + +With both variables on a shared rangespace, the conditional probability that movements in \(X\) constrain the distribution of \(Y\) can now be measured directly using partial moments. + +The **partial-moment conditional probability** is defined as the fraction of \(X^{**}\)'s mass that falls within the observed support of \(Y^{**}\): + +\[ +P(X^{**} \mid Y^{**}) += +1 +- +\Bigl[ +L_1\!\bigl(\min(Y^{**});\; X^{**}\bigr)_{\text{ratio}} ++ +U_1\!\bigl(\max(Y^{**});\; X^{**}\bigr)_{\text{ratio}} +\Bigr] +\] + +where the degree-one ratio forms are + +\[ +L_r(t; X)_{\text{ratio}} += +\frac{L_r(t; X)}{L_r(t; X) + U_r(t; X)}, +\qquad +U_r(t; X)_{\text{ratio}} += +\frac{U_r(t; X)}{L_r(t; X) + U_r(t; X)}. +\] + +When \(L_r(t;X) + U_r(t;X) = 0\), indicating no mass on either side of \(t\), both ratios are defined as zero. 
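The ratio forms and the resulting conditional probability can be sketched in a few lines. This is a simplified illustration in plain Python, not the NNS implementation: the helper names are hypothetical, the inputs are assumed to be already normalized as described in the preceding sections, and the toy values are made up.

```python
def lpm1(target, x):
    """Degree-one lower partial moment: average shortfall below target."""
    return sum(max(target - xi, 0.0) for xi in x) / len(x)


def upm1(target, x):
    """Degree-one upper partial moment: average excess above target."""
    return sum(max(xi - target, 0.0) for xi in x) / len(x)


def lpm1_ratio(target, x):
    lower, upper = lpm1(target, x), upm1(target, x)
    total = lower + upper
    return 0.0 if total == 0.0 else lower / total   # zero-mass convention


def upm1_ratio(target, x):
    lower, upper = lpm1(target, x), upm1(target, x)
    total = lower + upper
    return 0.0 if total == 0.0 else upper / total   # zero-mass convention


def pm_conditional_probability(x, y):
    """P(X | Y): share of X's mass lying inside the observed range of Y."""
    return 1.0 - (lpm1_ratio(min(y), x) + upm1_ratio(max(y), x))


# Toy values standing in for jointly normalized series (assumption).
x_norm = [0.2, 0.4, 0.6]
y_norm = [0.0, 0.5, 1.0]
p_x_given_y = pm_conditional_probability(x_norm, y_norm)
p_y_given_x = pm_conditional_probability(y_norm, x_norm)
print(p_x_given_y, p_y_given_x)
```

In this toy case every value of `x_norm` lies inside the range of `y_norm`, so the first probability is one, while `y_norm` extends beyond the range of `x_norm` on both sides, so the reverse probability is strictly smaller. That is the asymmetry the measure is designed to expose.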
+ +The first subtracted term, \(L_1(\min(Y^{**}); X^{**})_{\text{ratio}}\), measures the proportion of \(X^{**}\) mass lying **below the lower bound** of \(Y^{**}\)'s support. The second, \(U_1(\max(Y^{**}); X^{**})_{\text{ratio}}\), measures the proportion lying **above the upper bound**. Subtracting both tails from one yields the probability that a randomly drawn value of \(X^{**}\) falls within the range occupied by \(Y^{**}\) — a measure of **distributional co-occupancy** grounded entirely in the partial-moment calculus developed in Chapters 2–4. + +This measure is not symmetric: \(P(X^{**} \mid Y^{**}) \neq P(Y^{**} \mid X^{**})\) in general, because the support ranges of \(X^{**}\) and \(Y^{**}\) after joint normalization need not be identical. It requires no kernel bandwidth, no distributional assumption, and no parametric model. + +--- + +## Asymmetric Directional Dependence + +Conditional probability alone does not establish the **direction** of co-movement. Two variables may overlap substantially in range while moving in opposite directions, or one may respond to the other only in extreme regions. + +To capture directional alignment, the framework uses the asymmetric dependence measures from Chapter 10. From the directional co-partial moment structure, the dependence of \(Y\) on \(X\) and the dependence of \(X\) on \(Y\) need not be equal after joint normalization: + +\[ +\rho_{X^{**} \to Y^{**}} \neq \rho_{Y^{**} \to X^{**}}. +\] + +These are computed from the asymmetric directional dependence matrix of the jointly normalized variables — exactly the structure developed in Chapter 10 — using \(\texttt{NNS.dep}(\cdot, \texttt{asym} = \texttt{TRUE})\). + +Implementation detail: `asym = TRUE` turns on asymmetric dependence, but **direction is determined by argument order**. 
In practice, `NNS.dep(x, y, asym = TRUE)` and `NNS.dep(y, x, asym = TRUE)` generally differ; the first quantifies directional dependence of `y` on `x`, while the second quantifies the reverse direction. + +Define the **excess directional dependence** of \(Y\) on \(X\) as + +\[ +\Delta\rho = \rho_{X^{**} \to Y^{**}} - \rho_{Y^{**} \to X^{**}}. +\] + +When \(\Delta\rho > 0\), movements in \(X\) are more closely tracked by \(Y\) than the reverse — a second and independent signature of causal flow from \(X\) to \(Y\) beyond what conditional overlap alone captures. When \(\Delta\rho \le 0\), this component contributes nothing to the \(X \to Y\) direction. + +The use of asymmetric directional dependence here — rather than Pearson correlation — is a direct consequence of Chapter 10. Joint normalization places the variables in copula space, but copula-space variables still exhibit asymmetric tail co-movements that the Chapter 10 framework is designed to detect. Classical symmetric correlation would average away exactly the directional asymmetry that makes the causation statistic informative. + +--- + +## The Raw Directional Causation Statistic + +The two components — conditional probability and asymmetric directional dependence — are combined into a single statistic. + +The **raw directional causation value** from \(X\) to \(Y\) is + +\[ +\tilde{C}_{X \to Y} += +\frac{1}{2} +\Bigl[ +P(X^{**} \mid Y^{**}) ++ +\max\!\bigl(\Delta\rho, 0\bigr) +\Bigr]. +\] + +The first term rewards directional overlap in support after lag and copula-style normalization. The second term adds only the positive directional asymmetry in dependence, so reverse-direction dominance is not allowed to inflate the \(X \to Y\) score. + +By construction, \(\tilde{C}_{X \to Y}\in[0,1]\) in standard empirical settings: both components are bounded in \([0,1]\), and the outer factor \(1/2\) averages them. 
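The combination rule is simple enough to state as code. The sketch below is plain Python with a hypothetical function name; it takes the conditional probability and the two directional dependence values as given inputs (how they are estimated is outside this snippet) and the numbers passed in are made up for illustration.

```python
def raw_causation(p_x_given_y, rho_x_to_y, rho_y_to_x):
    """Raw X -> Y score: average of overlap and clipped excess dependence."""
    delta_rho = rho_x_to_y - rho_y_to_x
    # Only positive excess dependence contributes to the X -> Y direction.
    return 0.5 * (p_x_given_y + max(delta_rho, 0.0))


# Illustrative inputs only.
score_forward = raw_causation(0.8, rho_x_to_y=0.6, rho_y_to_x=0.2)
score_clipped = raw_causation(0.8, rho_x_to_y=0.2, rho_y_to_x=0.6)
print(score_forward, score_clipped)
```

In the second call the excess dependence is negative, so it is clipped to zero and the score reduces to half the conditional probability.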
+ +--- + +## Bidirectional Normalization + +Directional causation should be interpreted comparatively, not in isolation. Define the reverse-direction raw score \(\tilde{C}_{Y \to X}\) by swapping \(X\) and \(Y\) in the same construction. + +The final **directional causation index** from \(X\) to \(Y\) is then normalized as + +\[ +C_{X \to Y} += +\frac{\tilde{C}_{X \to Y}}{\tilde{C}_{X \to Y}+\tilde{C}_{Y \to X}}, +\qquad +C_{Y \to X}=1-C_{X \to Y}. +\] + +When both raw scores are zero, neither direction shows measurable directional-causation signal and both normalized values are defined as zero. + +Interpretation: + +- \(C_{X \to Y} > 0.5\): stronger evidence for \(X\) leading \(Y\), +- \(C_{X \to Y} < 0.5\): stronger evidence for \(Y\) leading \(X\), +- \(C_{X \to Y} \approx 0.5\): weak directional asymmetry. + +--- + +## Summary + +This chapter formalized directional causation as a three-stage construction: lag normalization to remove self-dynamics, joint rangespace normalization to align scales, and asymmetric directional scoring to detect net directional flow. + +The resulting statistic remains nonparametric, benchmark-relative, and distribution-aware, while avoiding linear-model restrictions of classical Granger-style tests. + +For an applied macroeconomic example using `NNS.nowcast`, `NNS.caus`, noise benchmarks, and strength-of-inference summaries, see [Causal Inference Amongst Macroeconomic Variables Using NNS](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html). + +The next chapter develops distribution comparison methods, providing nonparametric tools for directly comparing distributions without parametric assumptions. 
+ + +```{r invisible_marker_ch13, echo=FALSE, results='asis'} +name <- "OVVO Labs Registrant" +date <- Sys.Date() + +cat(sprintf( +"\\begingroup +\\color{white} +\\centering +\\vspace*{0.45\\textheight} +{\\tiny Generated for: %s \\quad Date: %s} +\\par +\\endgroup +", +name, date +)) +``` diff --git a/tools/NNS/book/chapter-15-distribution-comparison.Rmd b/tools/NNS/book/chapter-15-distribution-comparison.Rmd new file mode 100644 index 0000000..a1bdd2f --- /dev/null +++ b/tools/NNS/book/chapter-15-distribution-comparison.Rmd @@ -0,0 +1,930 @@ +# Distribution Comparison + +Previous chapters developed the directional framework for describing distributions, dependence, and conditional probability. Distribution estimation in Chapter 8 showed how empirical partial moments provide nonparametric estimates of entire distributions, while Chapters 9–13 demonstrated how directional moments reveal dependence, conditional probability, and causal structure. + +A natural next question is how to **compare distributions**. + +Classical statistics typically approaches this problem through **hypothesis testing**. Tests such as the Kolmogorov–Smirnov test, the Mann–Whitney test, or parametric t-tests attempt to determine whether two samples arise from the same underlying distribution. + +While useful, these procedures emphasize **binary decisions**—reject or fail to reject a null hypothesis—rather than describing how distributions actually differ. + +The directional framework approaches distribution comparison differently. Because partial moments characterize probability mass relative to benchmarks, two distributions can be compared directly through their **directional probability structure**. + +This chapter develops a nonparametric approach to distribution comparison based on directional probability measures and effect-size interpretations rather than hypothesis-testing decisions. 
It then introduces the NNS ANOVA procedure, which operationalizes these ideas through the LPM-based continuous CDF, adds **stochastic superiority** as the fundamental pairwise effect-size comparison, and concludes with stochastic dominance tests that extend the comparison to ordered preference relations. + +--- + +This chapter is organized into four signposted blocks so readers can move from concepts to implementation: **Block I (Theory)** develops comparison logic and the continuous-CDF correction; **Block II (Estimation mechanics)** defines operational estimators including stochastic superiority and NNS ANOVA; **Block III (Diagnostics)** covers dominance curves and ordered-comparison diagnostics; **Block IV (Applied workflow)** provides examples and practical guidance. + +## Block I — Theory and foundational comparisons + +### Classical Hypothesis Testing + +In classical statistics, comparing two distributions usually begins with a **null hypothesis** + +$$ +H_0: F_X(t) = F_Y(t) \quad \text{for all } t. +$$ + +The alternative hypothesis states that the two distributions differ. + +Several classical tests address this problem. + +### Kolmogorov–Smirnov Test + +The Kolmogorov–Smirnov statistic compares the empirical distribution functions + +$$ +D = \sup_t |\hat F_X(t) - \hat F_Y(t)|. +$$ + +Large values of $D$ indicate that the distributions differ. + +### Mann–Whitney Test + +The Mann–Whitney test evaluates whether observations from one sample tend to be larger than those from another sample. + +### Parametric Tests + +When parametric assumptions are imposed, tests such as the t-test compare population means. + +Although widely used, these methods possess several limitations. + +**Binary interpretation.** +Hypothesis tests produce accept–reject decisions rather than quantitative descriptions of differences. + +**Sample-size dependence.** +With large samples, even trivial differences become statistically significant. 
+ +**Model dependence.** +Parametric tests require assumptions about distributional form. + +**Limited directional insight.** +Most classical tests provide little information about how distributions differ across regions of the support. + +The directional framework emphasizes **probability comparisons and effect sizes** instead. + +--- + +### Nonparametric Distribution Comparison + +Let $X$ and $Y$ be two random variables with distributions $F_X$ and $F_Y$. + +A natural way to compare the distributions is to examine the probability that an observation from one distribution exceeds an observation from the other. + +Define independent draws + +$$ +X' \sim F_X, \qquad Y' \sim F_Y. +$$ + +Consider the probability + +$$ +P(X' > Y'). +$$ + +This quantity measures how frequently values from distribution $X$ exceed values from distribution $Y$. + +Similarly, + +$$ +P(Y' > X') +$$ + +measures the opposite directional comparison. + +Because + +$$ +P(X' > Y') + P(Y' > X') + P(X'=Y') = 1, +$$ + +these probabilities provide a complete comparison of the two distributions. + +For continuous distributions, ties occur with probability zero, so + +$$ +P(X' > Y') + P(Y' > X') = 1. +$$ + +For discrete distributions, ties occur with positive probability. In this case, use the tie-adjusted directional probability + +$$ +p^* = P(X' > Y') + \tfrac{1}{2}P(X' = Y'). +$$ + +This preserves symmetry and keeps the directional comparison centered at $0.5$ when the two distributions are indistinguishable. + +This simple probability comparison already provides more information than a binary hypothesis test: it directly measures **which distribution tends to produce larger outcomes**. + +--- + +### Directional Probability and Effect Size + +The directional probability + +$$ +p = P(X' > Y') +$$ + +has a natural interpretation as an **effect-size measure**. + +- $p = 0.5$ indicates that the two distributions are indistinguishable. +- $p > 0.5$ indicates that $X$ tends to produce larger values. 
+- $p < 0.5$ indicates that $Y$ tends to produce larger values. + +Unlike hypothesis testing, this directional probability does not depend on arbitrary significance thresholds and remains directly interpretable as a frequency statement. If one reports the symmetric certainty transform $C = |2p - 1|$, then $C = 0$ (not $1$) corresponds to indistinguishable distributions, while $C = 1$ indicates complete separation. + +--- + +### Directional Probability Comparisons + +Directional probability comparisons can be extended using partial moments. + +Let $t$ be a benchmark. The probability that $X$ exceeds the benchmark while $Y$ does not is + +$$ +P(X > t,\, Y \le t). +$$ + +Similarly, + +$$ +P(Y > t,\, X \le t) +$$ + +measures the opposite directional region. + +Within the partial-moment framework, these probabilities correspond to **degree-zero divergent co-partial moments**: + +$$ +DUPM_{0,0}(t,t) = P(X > t,\, Y \le t), +$$ + +$$ +DLPM_{0,0}(t,t) = P(X \le t,\, Y > t). +$$ + +The difference + +$$ +\Delta(t) = DUPM_{0,0}(t,t) - DLPM_{0,0}(t,t) +$$ + +indicates which distribution dominates relative to the benchmark. + +- If $\Delta(t) > 0$, distribution $X$ more frequently exceeds the benchmark. +- If $\Delta(t) < 0$, distribution $Y$ does. + +By examining $\Delta(t)$ across a range of benchmarks, analysts obtain a **directional dominance curve** describing where one distribution exceeds the other. + +--- + +### The Discrete–Continuous CDF Distinction and Bias Elimination + +Before developing operational comparison procedures, it is essential to confront a source of **systematic bias** embedded in the standard empirical CDF that has significant consequences for distribution comparison. 
+ +#### The Empirical CDF as a Discrete Measure + +The empirical cumulative distribution function is identical to the **degree-zero lower partial moment ratio**: + +$$ +\hat{F}_X(t) = L_0(t; X) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{\{x_i \le t\}}, +$$ + +where $\mathbf{1}_{\{\cdot\}}$ is the indicator function. This is a **discrete** probability measure: it assigns probability mass only at observed data points and is flat between them. Even with one million observations it remains a step function and is never truly continuous. + +A direct consequence of this discreteness is systematic bias in probability estimation at the sample mean. For a continuous symmetric distribution, exactly 50% of the population lies below the mean. Yet for a finite sample, + +$$ +\hat{F}_X(\bar{x}) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{x_i \le \bar{x}\}} \ne 0.5 +$$ + +in general, because the sample mean typically falls between two observed values and the count of observations below it need not be exactly $n/2$. This quantity oscillates around 0.5 and converges only asymptotically; for any particular finite draw it need not equal 0.5. + +This is not a minor technicality. Any comparison procedure that evaluates a group's discrete CDF at a shared benchmark inherits this bias: the two CDFs can appear to differ even when the groups are drawn from the same population, simply because of discretization noise. + +#### The Degree-One Partial Moment as a Continuous Probability + +The directional framework resolves this bias by replacing the discrete indicator with **area-based probability mass** using the degree-one lower partial moment ratio: + +$$ +F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)}, +$$ + +where + +$$ +LPM_1(t, X) = \frac{1}{n} \sum_{i=1}^n \max(0,\, t - x_i), +\qquad +UPM_1(t, X) = \frac{1}{n} \sum_{i=1}^n \max(0,\, x_i - t). +$$ + +Rather than counting the fraction of observations below $t$, this ratio measures the fraction of the **total area of deviations** that lies below $t$.
It corresponds to the continuous PDF probability + +$$ +P(X \le t) = \frac{\int_{-\infty}^t f(x)\,dx}{\int_{-\infty}^\infty f(x)\,dx}, +$$ + +capturing the area between discrete bins that the step-function CDF ignores. This is the essential connection to the derivative relationship $f(x) = dF(x)/dx$: the degree-one ratio encodes the continuous probability density information that the degree-zero CDF discards. + +In NNS, this is computed via `LPM.ratio(degree = 1, target, variable)`. + +#### The Mean-Target Property: Exact Bias Elimination + +A fundamental property follows from the algebraic structure of the degree-one ratio. + +**Theorem.** For any random variable $X$ with finite mean $\mu_X$, and for any sample $x_1, \dots, x_n$ containing at least two distinct values, + +$$ +F_1(\bar{x};\, X) = 0.5 +$$ + +exactly, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ is the sample mean. + +**Proof.** Observe that pointwise, + +$$ +(t - x_i)^+ - (x_i - t)^+ = t - x_i +$$ + +for every $i$ and every $t$. Setting $t = \bar{x}$ and summing over $i$: + +$$ +\sum_{i=1}^n (\bar{x} - x_i)^+ - \sum_{i=1}^n (x_i - \bar{x})^+ = \sum_{i=1}^n (\bar{x} - x_i) = n\bar{x} - \sum_{i=1}^n x_i = 0. +$$ + +Therefore $LPM_1(\bar{x}, X) = UPM_1(\bar{x}, X)$. Because the sample is not constant, both quantities are strictly positive, so the ratio is well defined and $F_1(\bar{x}; X) = 0.5$ exactly. $\square$ + +This result holds for **every distribution, every sample size of two or more distinct values, and without any parametric assumptions**. The discrete CDF only approaches this value asymptotically; the degree-one ratio achieves it exactly in every such sample. (If all observations are equal, both partial moments are zero and the ratio is the indeterminate form $0/0$.) + +The same argument extends to the population: $F_1(\mu_X; X) = 0.5$ exactly for the population mean $\mu_X$, for any non-degenerate distribution with finite mean.
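
The mean-target property can also be checked directly from the definitions, without loading NNS. The sketch below assumes two hypothetical base-R helpers, `f1_ratio()` and `ecdf_at()`, written only for illustration; they mirror the formulas for $F_1(t; X)$ and $\hat{F}_X(t)$ given above and are not the package's own implementation.

```r
# Hypothetical helper: degree-one partial moment ratio F_1(t; X),
# computed straight from the LPM_1 / UPM_1 definitions in the text.
f1_ratio <- function(t, x) {
  lpm1 <- mean(pmax(0, t - x))   # average shortfall below the target
  upm1 <- mean(pmax(0, x - t))   # average excess above the target
  lpm1 / (lpm1 + upm1)
}

# Hypothetical helper: degree-zero (empirical CDF) value at t
ecdf_at <- function(t, x) mean(x <= t)

# A small, deliberately skewed sample with mean 2
x <- c(-2, -1, 1, 3, 9)

f1_at_mean   <- f1_ratio(mean(x), x)
ecdf_at_mean <- ecdf_at(mean(x), x)

f1_at_mean    # 0.5: shortfalls (4 + 3 + 1) equal excesses (1 + 7)
ecdf_at_mean  # 0.6: three of the five observations lie at or below the mean
```

The sample is skewed on purpose: the equality of total shortfall and total excess around the mean does not depend on symmetry, which is why the degree-one ratio lands on 0.5 while the count-based value does not.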
+ +#### Empirical Demonstration + +The contrast between the two representations is immediately visible in data: + +```r +library(NNS) + +set.seed(12345) +x <- rnorm(100, mean = 5, sd = 1) + +# Discrete CDF at the mean — biased +LPM.ratio(0, mean(x), x) +## [1] 0.44 + +# Continuous (area-based) probability at the mean — unbiased +LPM.ratio(1, mean(x), x) +## [1] 0.5 +``` + +With 100 observations, the discrete CDF places only 44% of mass below the sample mean. The degree-one ratio returns exactly 0.5. Increasing to 500 observations closes the gap but never eliminates it for the discrete version: + +```r +set.seed(12345) +x2 <- rnorm(500, mean = 5, sd = 1) + +LPM.ratio(0, mean(x2), x2) # discrete — still biased +## [1] 0.496 + +LPM.ratio(1, mean(x2), x2) # continuous — exact +## [1] 0.5 +``` + +Tracking both measures sequentially across every observation confirms that the degree-zero ratio oscillates around 0.5 and converges only in the limit, while the degree-one ratio is pinned at exactly 0.5 throughout: + +```r +set.seed(12345) +x <- rnorm(500) + +lpm0 <- numeric(length(x)) +lpm1 <- numeric(length(x)) + +for (i in seq_along(x)) { + lpm0[i] <- LPM.ratio(0, mean(x[1:i]), x[1:i]) + lpm1[i] <- LPM.ratio(1, mean(x[1:i]), x[1:i]) +} + +plot(lpm0, col = "red", type = "l", lwd = 2, + ylim = c(0, 1), ylab = "P(X ≤ mean)", xlab = "n") +lines(lpm1, col = "blue", lwd = 2) +abline(h = 0.5, lty = 2) +legend("topright", legend = c("LPM degree 0 (discrete)", + "LPM degree 1 (continuous)"), + col = c("red", "blue"), lwd = 2, bty = "n") +``` + +
    +![Figure 14.1. Degree-0 versus degree-1 LPM ratio convergence at the sample mean: degree-0 wanders while degree-1 remains pinned at 0.5.](images/ch14_lpm0_lpm1_diff.png) +
    + +The red line wanders; the blue line is flat at 0.5 for every $n \ge 1$. + +#### Implications for Distribution Comparison + +This property has direct consequences for the comparison methods developed in the remainder of this chapter. + +When two samples $X$ and $Y$ are evaluated at a shared benchmark — such as the grand mean $\bar{z}$ — under the null hypothesis of identical population means, both $F_1(\bar{z}; X)$ and $F_1(\bar{z}; Y)$ should return 0.5. Deviations from 0.5 then provide unambiguous evidence that the group means diverge from the grand statistic. + +Using the discrete CDF instead, both evaluations would generically differ from 0.5 even under the null, producing spurious evidence of separation. The degree-one ratio eliminates this source of false signal entirely. + +The NNS ANOVA procedure in Section 15.4.2 is built directly on this foundation. + +--- + +## Block II — Estimation mechanics + +### Empirical Estimation + +These probability comparisons can be estimated directly from sample data. + +Let + +$$ +x_1,\dots,x_n \sim X, \qquad y_1,\dots,y_m \sim Y. +$$ + +An empirical estimator of $P(X > Y)$ is + +$$ +\hat p = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \mathbf{1}_{\{x_i > y_j\}}. +$$ + +This statistic measures the proportion of cross-sample comparisons in which an observation from $X$ exceeds one from $Y$. + +The estimator converges to the population probability by the law of large numbers. + +This estimator requires no parametric assumptions and uses the full sample information. + +--- + +## Block III — Diagnostics and dominance analysis + +### Directional Dominance Curves + +Benchmark-based comparisons extend this idea further. + +Define + +$$ +\hat \Delta(t) = \hat P(X>t,\,Y\le t) - \hat P(Y>t,\,X\le t). +$$ + +Plotting $\hat\Delta(t)$ across benchmarks produces a **directional dominance curve**. + +Interpretation: + +- Positive values indicate regions where distribution $X$ dominates. 
+- Negative values indicate regions where distribution $Y$ dominates. +- Values near zero indicate similar behavior. + +Unlike scalar summary statistics, this curve reveals **where along the distribution the differences occur**. + +For example, one distribution may dominate in the upper tail while the other dominates in the lower tail. + +Directional dominance curves therefore provide a detailed nonparametric comparison of distributions. + +--- + + + +### Severity-Weighted Distribution Comparison + +Distribution comparison need not stop at asking which sample places more probability mass below a threshold. The directional framework allows a stronger question: how quickly does adverse severity accumulate below that threshold? + +For a benchmark \(t\), degree-zero comparison evaluates +\[ +L_0(t;X)=P(X\le t), +\] +which is purely frequency-based. Degree one evaluates +\[ +L_1(t;X)=E[(t-X)_+], +\] +which aggregates the total adverse deviation below the benchmark. Degree two evaluates +\[ +L_2(t;X)=E[(t-X)_+^2], +\] +which penalizes larger deviations disproportionately. + +These degrees therefore define a general hierarchy: +\[ +\text{degree 0} \to \text{event frequency}, +\] +\[ +\text{degree 1} \to \text{aggregate adverse magnitude}, +\] +\[ +\text{degree 2} \to \text{extreme-deviation sensitivity}. +\] + +This hierarchy is useful in any setting where two distributions can have similar lower-tail frequency but very different lower-tail severity. A system may violate a benchmark only slightly more often than another system, yet do so with much larger deviations once the benchmark is crossed. Frequency alone would miss that distinction; higher-degree partial moments make it visible. + +The probability-bounds literature uses this same logic in discussions of partial-moment-ratio thresholds, though often in finance terminology. 
Here the broader point is more important than the label: a threshold comparison can be performed either in count space or in severity-weighted space. + + +## Block IV — Applied workflow and practical inference + + +### Practical Threshold Comparison Across Degrees + +A practical directional workflow for comparing distributions is therefore to evaluate lower-tail structure degree by degree. + +First, compare degree-zero lower-tail probabilities to assess how often observations fall below the benchmark. This recovers the familiar CDF-based comparison. + +Second, compare degree-one lower partial moments to assess how much aggregate adverse magnitude accumulates below the benchmark. + +Third, compare degree-two lower partial moments when larger deviations deserve disproportionate emphasis. + +This layered comparison is useful across domains. In forecasting, it distinguishes models that miss a target equally often but differ in the magnitude of their misses. In operations, it distinguishes supply systems that stock out with similar frequency but very different shortage depth. In reliability, it distinguishes designs whose failure margins are crossed with similar probability but different severity once crossed. + +This section also clarifies why the degree-one continuous probability representation matters. Chapter 14 established that the degree-one partial-moment ratio removes the discrete-CDF bias at the mean in finite samples. That same area-based logic provides a smoother and more interpretable path from event counting to severity-weighted comparison. + + +### Example + +Suppose two samples represent outcomes from two strategies. + +Sample $X$: + +$$ +-2, -1, 1, 3, 4 +$$ + +Sample $Y$: + +$$ +-3, -2, 0, 2, 2 +$$ + +Compute cross-sample comparisons. + +There are $5 \times 5 = 25$ comparisons. + +Counting cases where $x_i > y_j$ yields + +$$ +\hat p = 0.64. +$$ + +Interpretation: + +- Distribution $X$ tends to produce larger values than $Y$. 
+- The estimated directional exceedance probability is $0.64$, indicating moderate directional advantage for $X$. + +This effect-size interpretation provides a clear and intuitive comparison without invoking hypothesis tests. + +--- + +### NNS ANOVA: CDF-Based Distribution Comparison + +The directional framework motivates a fully operational procedure for comparing distributions. The NNS ANOVA method uses the degree-one lower partial moment CDF developed in Section 15.1.8 and evaluates distributional similarity across both the grand mean and selected quantiles. + +To avoid notation ambiguity, this section uses a dedicated symbol, $C_{\text{ANOVA}}$, for the NNS ANOVA certainty score. + +#### The LPM-Based Continuous CDF + +The degree-one partial moment CDF established in Section 15.1.8 provides the measurement foundation for NNS ANOVA. Recall that + +$$ +F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)}, +$$ + +and that $F_1(\bar{x}; X) = 0.5$ exactly for the sample mean, for any distribution and any sample size. + +This **mean-target property** provides a distribution-free anchor for comparing means across samples: under the null that two groups share a common population mean, both groups' degree-one CDFs evaluated at the grand mean will return 0.5. Any deviation signals distributional separation. Because the degree-one ratio eliminates finite-sample discretization bias (Section 15.1.8), this signal is clean — not contaminated by the systematic oscillation present in the discrete CDF. + +The degree-one CDF also exhibits greater smoothness than the empirical step function, particularly in small samples, because it encodes area-based probability mass rather than point counts. This smoothness reduces noise sensitivity and improves stability across repeated samples. + +#### Grand Mean and the NNS Certainty Statistic + +To compare two distributions, NNS ANOVA proceeds as follows. 
+ +Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ denote the control and treatment samples. The **grand statistic** is the sample-size weighted mean of the two group means: + +$$ +\bar{z} = \frac{n\bar{x} + m\bar{y}}{n + m}. +$$ + +This pooled form ensures the reference point reflects the actual composition of the combined sample, giving appropriately greater weight to whichever group contributes more observations. + +Each sample's degree-one partial moment CDF is evaluated at $\bar{z}$: + +$$ +F_1(\bar{z}; X), \qquad F_1(\bar{z}; Y). +$$ + +Under the null hypothesis that both samples share a common population mean equal to $\bar{z}$, both CDFs should evaluate to approximately 0.5 by the mean-target property. Deviations from 0.5 reflect evidence that the sample means diverge from the grand statistic. + +The NNS ANOVA certainty statistic is computed from five deviation terms. Let $\delta_0$ denote the maximum absolute deviation of either group's CDF from 0.5 at the grand mean, capped at 0.5: + +$$ +\delta_0 = \min\!\bigl(0.5,\, \max(|F_1(\bar{z}; X) - 0.5|,\, |F_1(\bar{z}; Y) - 0.5|)\bigr). +$$ + +Four additional terms are computed at upper and lower quantile targets. The upper 25% target is the average of the 75th upper partial moment quantile of each group, and the lower 25% target is the average of the 75th lower partial moment quantile; analogous targets are constructed at the 12.5% level. At each target $q$, the deviation $\delta_q$ is defined as the maximum absolute departure of either group's partial moment ratio from the expected null value $q$, capped at $q$. 
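
The grand-mean step described above can be sketched in base R. This is a minimal illustration under stated assumptions: `f1_ratio()` is a hypothetical helper that follows the $F_1$ formula in the text, the two samples are invented, and the four quantile-target terms are omitted because their construction depends on partial-moment quantiles that this sketch does not reproduce.

```r
# Hypothetical helper: degree-one partial moment ratio F_1(t; X)
f1_ratio <- function(t, x) {
  lpm1 <- mean(pmax(0, t - x))
  upm1 <- mean(pmax(0, x - t))
  lpm1 / (lpm1 + upm1)
}

# Invented control and treatment samples of unequal size
x <- c(1, 2, 3, 4, 5)
y <- c(3, 4, 5, 6, 7, 8)
n <- length(x)
m <- length(y)

# Sample-size weighted grand mean; identical to the mean of the pooled data
z_bar <- (n * mean(x) + m * mean(y)) / (n + m)

# Each group's degree-one CDF evaluated at the shared benchmark
f1_x <- f1_ratio(z_bar, x)
f1_y <- f1_ratio(z_bar, y)

# delta_0: largest absolute departure from 0.5, capped at 0.5
delta0 <- min(0.5, max(abs(f1_x - 0.5), abs(f1_y - 0.5)))

c(z_bar = z_bar, F1_x = f1_x, F1_y = f1_y, delta0 = delta0)
```

Because the treatment sample sits to the right of the control sample, the control's ratio at the grand mean is well above 0.5 and the treatment's is well below it, so `delta0` is clearly positive.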
+ +The full-distribution certainty statistic is then + +$$ +\begin{aligned} +\text{Certainty}_{\text{ANOVA, raw}} +&= \frac{1}{2.5} \Bigg[ +\frac{(0.5 - \delta_0)^2}{0.25} ++ 0.5 \cdot \frac{(0.25 - \delta_{0.25}^{U})^2}{0.0625} \\ +&\qquad + 0.5 \cdot \frac{(0.25 - \delta_{0.25}^{L})^2}{0.0625} ++ 0.25 \cdot \frac{(0.125 - \delta_{0.125}^{U})^2}{0.015625} \\ +&\qquad + 0.25 \cdot \frac{(0.125 - \delta_{0.125}^{L})^2}{0.015625} +\Bigg]. +\end{aligned} +$$ + +Each term is a squared relative deviation from its null value, weighted by benchmark coverage so central benchmarks dominate the score while outer-tail benchmarks remain contributory but less influential. The mean benchmark receives weight 1 because it is the primary location anchor; each 25% tail benchmark receives 0.5 because it targets one-quarter tail regions on each side; and each 12.5% extreme-tail benchmark receives 0.25 to avoid overweighting sparse extremes. The sum is normalized by total weight 2.5. In means-only mode, only the first term is used and divided by 1 rather than 2.5. + +When medians rather than means are the comparison target, the CDF evaluation uses the degree-zero partial moment ratio $LPM_0(\bar{z}; X) / (LPM_0(\bar{z}; X) + UPM_0(\bar{z}; X))$, which counts frequency mass rather than area mass, in place of the degree-one ratio. + +A **population size adjustment** is applied before the final certainty is returned: + +$$ +\text{Certainty}_{\text{ANOVA}} = \min\!\left(1,\; \text{Certainty}_{\text{ANOVA, raw}} \times \left(\frac{n + m - 2}{n + m}\right)^2\right). +$$ + +This correction reduces certainty modestly for small combined samples, reflecting the increased estimation uncertainty when fewer observations inform the CDF comparisons. As $n + m \to \infty$ the adjustment approaches one and becomes negligible. 
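
As a worked check of the weighting and the population size adjustment, the formulas above can be transcribed into a short base-R function. `certainty_anova()` is a hypothetical name used only for this sketch and is not the NNS implementation; it takes the five deviation terms as given and assumes they have already been capped at their null values.

```r
# Hypothetical transcription of the certainty formulas in the text.
#   deltas : c(delta_0, delta_0.25^U, delta_0.25^L, delta_0.125^U, delta_0.125^L)
#   n, m   : control and treatment sample sizes
certainty_anova <- function(deltas, n, m) {
  null_vals <- c(0.5, 0.25, 0.25, 0.125, 0.125)  # expected values under the null
  weights   <- c(1,   0.5,  0.5,  0.25,  0.25)   # benchmark weights, total 2.5
  stopifnot(length(deltas) == 5, all(deltas >= 0), all(deltas <= null_vals))

  # (null - delta)^2 / null^2 reproduces each squared relative term,
  # e.g. (0.5 - delta_0)^2 / 0.25 for the grand-mean benchmark
  raw <- sum(weights * ((null_vals - deltas) / null_vals)^2) / sum(weights)

  # Population size adjustment, then cap at 1
  min(1, raw * ((n + m - 2) / (n + m))^2)
}

# Perfect agreement at every benchmark: raw certainty 1, scaled by (98/100)^2
agree_all <- certainty_anova(c(0, 0, 0, 0, 0), n = 50, m = 50)

# Maximal deviation at every benchmark: every term vanishes
agree_none <- certainty_anova(c(0.5, 0.25, 0.25, 0.125, 0.125), n = 50, m = 50)

agree_all   # approximately 0.9604
agree_none  # 0
```

The two calls bracket the statistic: with zero deviations the only reduction comes from the sample-size factor, and with fully capped deviations the score is zero regardless of sample size.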
+ +By construction, $\text{Certainty}_{\text{ANOVA}} = 1$ indicates maximal agreement at the grand mean and tail benchmarks, while values closer to $0$ indicate stronger disagreement. + +This formulation inverts the conventional hypothesis-testing orientation: rather than a p-value measuring evidence against the null, the certainty statistic directly expresses the degree of distributional agreement. + +#### Full Distribution vs. Means-Only Comparison + +NNS ANOVA can be applied in two modes. + +In **full distribution mode**, the certainty statistic is computed across the grand mean and all quantile benchmarks, measuring overall distributional similarity. This mode is sensitive to differences in both location and spread. + +In **means-only mode**, the certainty statistic is computed only at the grand mean, measuring whether the sample means differ. This mode parallels the objective of a classical t-test but without distributional assumptions. + +When **medians** are of interest rather than means, the grand statistic is replaced by the combined median, and the evaluation proceeds accordingly. + +#### Effect Size and Confidence Intervals + +Beyond the certainty statistic, NNS ANOVA provides **effect size estimates** with associated confidence intervals. + +Effect sizes are computed by bootstrapping both groups independently. For each of $B$ bootstrap resamples, the mean (or median) of the control and treatment are recorded, yielding empirical distributions of $\bar{x}^*$ and $\bar{y}^*$. The confidence bounds are then read from these bootstrap distributions using partial moment quantiles at the specified $\alpha$ level. 
+ +The effect size bounds are defined as the conservative range of plausible treatment effects: + +$$ +\text{Effect Size}^{LB} = \bar{y}^*_{\alpha/2} - \bar{x}^*_{1-\alpha/2}, +\qquad +\text{Effect Size}^{UB} = \bar{y}^*_{1-\alpha/2} - \bar{x}^*_{\alpha/2}, +$$ + +where $\bar{x}^*_{\alpha/2}$ and $\bar{x}^*_{1-\alpha/2}$ denote the lower and upper bootstrap quantiles of the control mean at the specified confidence level, and analogously for the treatment. The lower bound pairs the pessimistic treatment outcome against the optimistic control; the upper bound does the reverse. This conservative construction ensures that if zero lies outside the interval, the effect is detectable with confidence even under the most unfavorable pairing of bootstrap tails. + +#### Robust Estimation via Bootstrap Resampling + +The certainty statistic can be made more robust through bootstrap resampling. In this mode, the control and treatment samples are independently resampled with replacement across a specified number of iterations (typically 100), and the certainty statistic is recomputed for each resample. + +The resulting distribution of certainty values provides: + +- A **robust certainty estimate**: the median or mean certainty across bootstrap resamples. +- A **confidence interval** for the certainty statistic itself, reflecting sampling uncertainty. + +This bootstrap approach is particularly valuable with small samples or in the presence of outliers, where the point estimate of certainty may be unstable. + +#### Relationship to Power + +A key advantage of the NNS certainty framework over classical p-values is its explicit relationship to statistical power. + +Classical p-values conflate the magnitude of a difference with the precision of its estimation. With large samples, even negligible differences produce small p-values. With small samples, meaningful differences may not reach significance at all. 
+ +The NNS certainty statistic, by contrast, reflects the actual probability mass separation between distributions and scales naturally with sample size. The mechanism is direct: certainty measures how far apart the CDFs of the two distributions are at the grand mean and selected quantiles, so it tracks the actual signal — the degree of separation between distributions — rather than conflating signal with sample size. Empirically, NNS certainty correlates more strongly with test power $(1 - \beta)$ than do p-values, which show weaker and more volatile associations. + +This connection means that certainty values provide information about both the size of the difference and the reliability of its detection—a combination unavailable from p-values alone. + +#### Multi-Group and Pairwise Comparisons + +The NNS ANOVA framework extends naturally to **multiple groups**. When more than two samples are supplied, the procedure computes a grand statistic across all groups and evaluates each group's CDF at this common benchmark. Pairwise certainty values can also be returned, summarizing all bilateral comparisons in matrix form. + +This multi-group capability directly parallels classical one-way ANOVA, but without the assumption of normality or equal variance across groups. + + +--- + +### Stochastic Superiority + +Stochastic superiority asks a different question than equality of means or equality of distributions. Rather than asking whether two samples came from the same population, or whether they share the same mean or median, it measures the probability that a random draw from one distribution exceeds a random draw from the other. + +Let + +$$ +X' \sim F_X, \qquad Y' \sim F_Y, +$$ + +independently. The stochastic superiority probability is + +$$ +p_{X,Y} = P(X' > Y'). +$$ + +For continuous distributions, ties occur with probability zero, so $p_{X,Y} + p_{Y,X} = 1$. For discrete or mixed distributions, ties may occur with positive probability. 
In that case the tie-adjusted comparison is + +$$ +p^*_{X,Y} = P(X' > Y') + \tfrac{1}{2}P(X' = Y'). +$$ + +This adjustment preserves symmetry, + +$$ +p^*_{X,Y} + p^*_{Y,X} = 1, +$$ + +and keeps the comparison centered at $0.5$ when neither distribution has a directional advantage. + +A value of $p^*_{X,Y} = 0.5$ indicates no directional advantage. Values above $0.5$ favor $X$, and values below $0.5$ favor $Y$. One may also report the certainty-style transform + +$$ +C_{SS} = |2p^*_{X,Y} - 1|, +$$ + +which maps the comparison to $[0,1]$, where $0$ denotes no directional separation and $1$ denotes complete separation. Unlike a p-value, both $p^*_{X,Y}$ and $C_{SS}$ retain a direct frequency interpretation. + +This differs from stochastic dominance. Stochastic superiority is a pairwise exceedance probability, while stochastic dominance requires one distribution to be preferred over the entire shared support. It also differs from NNS ANOVA. NNS ANOVA asks whether the distributions are in agreement at the grand mean and selected benchmark points; stochastic superiority asks which distribution tends to generate larger draws overall. It is therefore stronger than a simple mean comparison, because it uses the full cross-sample ordering, but weaker than dominance, because it does not require the ordering to hold at every threshold. A distribution may have $p^*_{X,Y} > 0.5$ and still fail to dominate if the CDFs cross. + +Given samples + +$$ +x_1, \dots, x_n \sim X, \qquad y_1, \dots, y_m \sim Y, +$$ + +the empirical estimator is + +$$ +\hat p = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \mathbf{1}_{\{x_i > y_j\}}, +$$ + +with tie-adjusted form + +$$ +\hat p^* = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \left[\mathbf{1}_{\{x_i > y_j\}} + \tfrac{1}{2}\mathbf{1}_{\{x_i = y_j\}}\right]. +$$ + +These estimators use all pairwise cross-sample comparisons and require no parametric assumptions. + +In **`NNS`**, stochastic superiority is computed with `NNS.SS()`. 
The function returns the directional exceedance probability, the tie probability, and the tie-adjusted superiority probability. Confidence intervals can also be obtained by resampling. + +```r +library(NNS) + +set.seed(123) +x <- rnorm(1000, mean = 0, sd = 1) +y <- rnorm(1000, mean = 1, sd = 1) + +NNS.SS(x, y) +``` + +Because the second sample is shifted to the right, the superiority probability for $X$ relative to $Y$ should fall below $0.5$, while the superiority probability for $Y$ relative to $X$ should exceed $0.5$. + +For discrete data, ties should be reported rather than ignored: + +```r +set.seed(123) +x <- sample(1:5, 100, replace = TRUE) +y <- sample(1:5, 100, replace = TRUE) + +NNS.SS(x, y) +``` + +This is especially important in ordinal and categorical applications where equal outcomes are common. In practice, stochastic superiority is often the cleanest first effect-size summary to report, because it answers the most direct comparative question: how often does one distribution beat the other? + + +--- + +### Stochastic Dominance + +The directional probability comparison introduced in Section 15.1.6 measures how often observations from one distribution exceed those from another. This idea can be formalized into a preference ordering known as **stochastic dominance**. + +Stochastic dominance provides a rigorous nonparametric criterion for determining when one distribution is unambiguously preferred to another, without specifying a utility function beyond minimal regularity conditions. + +#### First-Order Stochastic Dominance + +Distribution $X$ **first-order stochastically dominates** distribution $Y$, written $X \succ_1 Y$, if + +$$ +F_X(t) \le F_Y(t) \quad \text{for all } t, +$$ + +with strict inequality for at least one $t$. + +Equivalently, $X$ dominates $Y$ in the first order if and only if every non-decreasing utility function assigns at least as high an expected value to $X$ as to $Y$. 
This means any decision-maker who prefers more to less will prefer $X$. + +In terms of the empirical CDF, $X \succ_1 Y$ whenever the distribution of $X$ lies entirely to the right of the distribution of $Y$ across all quantiles. + +#### Second-Order Stochastic Dominance + +Distribution $X$ **second-order stochastically dominates** distribution $Y$, written $X \succ_2 Y$, if + +$$ +\int_{-\infty}^t F_X(s)\, ds \le \int_{-\infty}^t F_Y(s)\, ds \quad \text{for all } t. +$$ + +Second-order dominance captures risk aversion: $X \succ_2 Y$ if and only if every non-decreasing concave utility function prefers $X$. A distribution can dominate in the second order without dominating in the first, provided any CDF crossings are compensated by area accumulation. + +#### Third-Order Stochastic Dominance + +Distribution $X$ **third-order stochastically dominates** distribution $Y$ if the iterated integral condition holds: + +$$ +\int_{-\infty}^t \int_{-\infty}^s F_X(u)\, du\, ds +\le +\int_{-\infty}^t \int_{-\infty}^s F_Y(u)\, du\, ds +\quad \text{for all } t. +$$ + +Third-order dominance adds a condition on the skewness of the CDF integral and corresponds to preference among agents who are risk-averse and have decreasing absolute risk aversion (DARA). It permits distributions to intersect in the first-order sense as long as earlier-order deficits are offset by later-order surpluses. + +#### Connection to Partial Moments + +Stochastic dominance criteria have natural expressions in terms of partial moments. + +First-order stochastic dominance is equivalent to the condition that the lower partial moment of degree zero satisfies + +$$ +LPM_0(t, X) \le LPM_0(t, Y) \quad \text{for all } t. +$$ + +Because $LPM_0(t, X) = F_X(t)$, this is the direct CDF criterion. + +Second-order dominance can be expressed through the degree-one lower partial moment: + +$$ +LPM_1(t, X) \le LPM_1(t, Y) \quad \text{for all } t. 
+$$ + +Since $LPM_1(t, X) = \int_{-\infty}^t F_X(s)\, ds$, this recovers the integral condition exactly. + +Third-order dominance corresponds to the degree-two lower partial moment condition: + +$$ +LPM_2(t, X) \le LPM_2(t, Y) \quad \text{for all } t. +$$ + +These equivalences mean that **stochastic dominance tests are partial moment comparisons** evaluated across the full support of the data. The NNS framework implements all three levels directly through empirical partial moment estimates. + +This is also the bridge to Chapter 17: the degree-one quantile objects used for bias-corrected prediction intervals (`LPM.VaR(..., degree = 1, ...)`, `UPM.VaR(..., degree = 1, ...)`) are generated from the same degree-one lower/upper partial moment geometry used here for SSD and TSD diagnostics. Put differently, interval construction and dominance testing are not separate methods — they are two uses of the same directional probability representation. + +#### Empirical Stochastic Dominance Tests + +Given samples $x_1, \dots, x_n$ and $y_1, \dots, y_m$, the empirical first-order dominance test checks whether + +$$ +\hat F_X(t) \le \hat F_Y(t) +$$ + +holds for all evaluation points $t$ in the combined support of the data. If the condition holds everywhere, $X$ first-order stochastically dominates $Y$. If it fails at some points, the distributions intersect and neither dominates at the first order. + +Second- and third-order tests proceed analogously, replacing the empirical CDF with its iterated integrals, which correspond to the empirical degree-one and degree-two lower partial moments evaluated at each point. + +The NNS implementations `NNS.FSD()`, `NNS.SSD()`, and `NNS.TSD()` perform these evaluations directly and return which distribution, if any, dominates at each order. 
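
The empirical checks described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the internals of `NNS.FSD()` or `NNS.SSD()`: the helper names `lpm` and `dominates` are hypothetical, and the comparison is made only at the pooled sample points, which is sufficient for degrees 0 and 1 on empirical data.

```r
# Empirical lower partial moment of a given degree at target t (base R only).
# Degree 0 is handled separately because 0^0 evaluates to 1 in R.
lpm <- function(degree, t, x) {
  if (degree == 0) mean(x <= t) else mean(pmax(t - x, 0)^degree)
}

# TRUE when x dominates y at order (degree + 1): the LPM of x never exceeds
# the LPM of y on the pooled sample points and is strictly smaller somewhere.
dominates <- function(x, y, degree) {
  grid <- sort(unique(c(x, y)))
  lx <- vapply(grid, function(t) lpm(degree, t, x), numeric(1))
  ly <- vapply(grid, function(t) lpm(degree, t, y), numeric(1))
  all(lx <= ly) && any(lx < ly)
}

set.seed(1)
y <- rnorm(200)
x <- y + 1                              # every draw shifted to the right

fsd_xy <- dominates(x, y, degree = 0)   # first-order check, x over y
fsd_yx <- dominates(y, x, degree = 0)   # first-order check, y over x
ssd_xy <- dominates(x, y, degree = 1)   # second-order check, x over y
print(c(fsd_xy = fsd_xy, fsd_yx = fsd_yx, ssd_xy = ssd_xy))
```

Because every draw in `x` is one unit larger than the matching draw in `y`, the checks in favor of `x` should report dominance and the reverse check should not.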
+ + +#### Stochastic Dominant Efficient Sets + +When comparing more than two distributions simultaneously, the concept of dominance generalizes to the notion of an **efficient set**: the collection of distributions that are not dominated by any other distribution at the specified order. + +Formally, the first-order stochastic dominant efficient set is + +$$ +\mathcal{E}_1 = \{X_i : \nexists\, X_j \text{ such that } X_j \succ_1 X_i\}. +$$ + +Distributions outside this set are dominated and can be excluded by any decision-maker with a non-decreasing utility function. + +The `NNS.SD.efficient.set()` function identifies the efficient set across an arbitrary collection of distributions at first, second, or third order. This is particularly useful in portfolio selection, strategy evaluation, and any setting where a large number of alternatives must be ranked. + +#### Stochastic Dominance Clustering + +An extension of the efficient set concept groups distributions into **stochastic dominance clusters**: collections of distributions that share similar dominance relationships with one another. + +Within a cluster, no member dominates any other at the specified order. Across clusters, members of higher-ranked clusters tend to dominate members of lower-ranked clusters. + +The `NNS.SD.cluster()` function implements this procedure and can render results as a **dendrogram** showing hierarchical dominance relationships. This visualization reveals which groups of distributions are interchangeable in the preference order and which are strictly ranked relative to others. + + +--- + +### Practical Inference Without Parametric Assumptions + +The directional framework emphasizes **probability comparisons and effect sizes** rather than binary hypothesis tests. + +Key advantages include: + +**Nonparametric validity.** +No assumptions are required about the distributional form of the data. 
+ +**Interpretability.** +Probability comparisons directly answer practical questions such as "How often does one outcome exceed another?" Use directional exceedance probabilities such as $P(X' > Y')$ for pairwise comparison, and use $\text{Certainty}_{\text{ANOVA}}$ for NNS ANOVA distributional agreement. + +**Directional insight.** +Benchmark-based comparisons reveal where differences occur within the distribution, not merely whether they exist. + +**Robustness.** +Results do not depend on arbitrary significance thresholds, and bootstrap-based robustness estimation provides reliable inference under small samples or outliers. + +**Power awareness.** +The NNS certainty statistic correlates directly with test power, addressing a fundamental limitation of classical p-values. + +**Preference ordering.** +Stochastic dominance tests express distributional preference in terms that are directly linked to decision theory, enabling selection among competing alternatives without specifying a complete utility function. + +These properties make directional distribution comparison particularly useful in fields such as finance, economics, and risk management where **tail behavior and asymmetric outcomes** often matter more than average differences. + +--- + +### Example Dataset Workflow + +A convenient applied example is the mtcars transmission split from the NNS distribution-comparison vignette. Let Group A be miles-per-gallon for automatic transmissions, `mtcars$mpg[mtcars$am == 0]`, +and let Group B be miles-per-gallon for manual transmissions, `mtcars$mpg[mtcars$am == 1]`. +This yields two empirical distributions on the same response variable and provides a direct setting for comparing stochastic superiority, NNS ANOVA, and stochastic dominance. + + +```{r chapter14-mtcars-workflow, eval=FALSE} +auto_mpg <- mtcars$mpg[mtcars$am == 0] +manual_mpg <- mtcars$mpg[mtcars$am == 1] + +# 1. 
Pairwise directional advantage +NNS.SS(manual_mpg, auto_mpg) +# $p_gt +# [1] 0.8259109 +# +# $p_tie +# [1] 0.008097166 +# +# $p_star +# [1] 0.8299595 + +# 2. Full-distribution comparison +NNS.ANOVA(control = auto_mpg, + treatment = manual_mpg, + robust = TRUE) +# $Control +# [1] 17.14737 +# +# $Treatment +# [1] 24.39231 +# +# $Grand_Statistic +# [1] 20.09063 +# +# $Control_CDF +# [1] 0.8708501 +# +# $Treatment_CDF +# [1] 0.1294878 +# +# $Certainty +# [1] 0.02345583 +# +# $`Effect_Size_LB.2.5%` +# [1] 2.377328 +# +# $`Effect_Size_UB.97.5%` +# [1] 12.2155 +# +# $Confidence_Level +# [1] 0.95 +# +# $`Robust Certainty Estimate` +# [1] 0.01094359 +# +# $`Lower 95% CI` +# [1] 3.864872e-06 +# +# $`Upper 95% CI` +# [1] 0.1048396 + +# 3. Preference ordering over the full support +NNS.FSD(manual_mpg, auto_mpg) +# [1] "X FSD Y" +``` + +For this comparison, `NNS.SS(manual_mpg, auto_mpg)` yields `p_gt = 0.8259109`, `p_tie = 0.008097166`, +and `p_star = 0.8299595`, indicating that a randomly selected manual-transmission car exceeds a randomly selected automatic-transmission car in miles-per-gallon about 83 percent of the time. +`NNS.ANOVA(control = auto_mpg, treatment = manual_mpg, robust = TRUE)` returns `Certainty = 0.02345583` and `Robust Certainty Estimate = 0.01094359`, indicating very little distributional agreement between the two groups. +`NNS.FSD(manual_mpg, auto_mpg)` returns "X FSD Y", implying that the manual-transmission miles-per-gallon distribution first-order stochastically dominates the automatic-transmission distribution. Together, these results show pairwise directional advantage, +weak distributional agreement, and full-support preference for the manual-transmission group. + +### Summary + +Classical distribution comparison is usually framed as a sequence of tests that end in accept-or-reject decisions. The directional framework developed here shifts the emphasis from binary testing to interpretable probability comparisons and partial-moment geometry. 
+ +At the most direct level, two distributions can be compared through probabilities such as $P(X' > Y')$ and the tie-adjusted stochastic superiority measure $p^*_{X,Y} = P(X' > Y') + \tfrac{1}{2}P(X' = Y')$. This provides a pairwise effect size with an immediate interpretation: how often does one distribution generate larger outcomes than the other? + +At the distributional-agreement level, the degree-one continuous CDF removes the finite-sample bias of the empirical CDF at the mean, with $F_1(\bar{x}; X) = 0.5$ exactly. That property makes NNS ANOVA a distribution-free and bias-free comparison procedure. Rather than depending on parametric assumptions, it evaluates agreement through benchmark-relative CDF deviations and reports an interpretable certainty statistic together with effect sizes and robust confidence intervals. + +At the strongest level, stochastic dominance extends directional comparison from pairwise exceedance to full preference ordering over the support. First-, second-, and third-order dominance can all be written as partial-moment inequalities, which shows that dominance analysis, efficient sets, and dominance clustering are all natural extensions of the same directional probability representation. + +Taken together, these tools provide a coherent hierarchy for nonparametric distribution comparison. Stochastic superiority answers the pairwise question, NNS ANOVA answers the agreement question, and stochastic dominance answers the preference-ordering question. 
diff --git a/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd b/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd new file mode 100644 index 0000000..602aab4 --- /dev/null +++ b/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd @@ -0,0 +1,328 @@ +# Directional Tail Thresholds, Probability Bounds, and Estimation Error + +Previous chapters developed the directional framework for probability, dependence, distribution comparison, and prediction. A natural next step is threshold analysis. + +In many practical settings, the analyst is not interested only in a central interval for future observations. The analyst also wants to understand how a process behaves near adverse regions of its distribution: how often a benchmark is crossed, how severe the benchmark violations are once they occur, how conservative tail-probability statements remain under weak assumptions, and how stable the resulting estimates are in finite samples. + +These questions arise in many domains: + +- forecast errors relative to a service target, +- inventory levels relative to a replenishment threshold, +- reliability metrics relative to a safety margin, +- environmental measurements relative to a policy limit, +- financial returns relative to a minimum acceptable outcome. + +The common structure is benchmark-relative tail analysis. The same directional operators that generated distribution functions and quantile intervals in earlier chapters also generate threshold rules, directional probability bounds, and finite-sample diagnostics. This chapter develops that connection. + +The main thesis is simple: + +1. degree-0 partial moments recover the usual lower-tail probability and quantile threshold, +2. higher-degree partial moments generate severity-weighted threshold rules, +3. 
semivariance and higher lower partial moments yield distribution-free upper bounds for tail probabilities, +4. and the practical usefulness of these quantities depends on their estimation stability. + +Thus quantiles, semivariance, lower partial moments, and estimation error are not separate topics. They are manifestations of the same benchmark-relative geometry. + +--- + +## Why Threshold Analysis Needs More Than Quantiles + +A lower-tail quantile identifies where adverse observations begin to accumulate. If \(X\) is a random variable and \(\alpha \in (0,1)\), the lower \(\alpha\)-quantile is + +\[ +Q_X(\alpha)=\inf\{t : F_X(t)\ge \alpha\}. +\] + +This is already enough to partition a chosen fraction of lower-tail probability mass. + +But quantiles alone do not answer two additional questions that matter in practice. + +First, how conservative is the selected threshold if the data are skewed, heavy-tailed, or otherwise poorly described by a parametric model? + +Second, how stable is the threshold estimate in finite samples? + +The first question is about probability control. A threshold may appear numerically precise, but if it relies on a misspecified model then its practical interpretation can be fragile exactly where decisions are most sensitive. + +The second question is about sample sensitivity. Any statistic used in a decision system must stabilize as information accumulates. A threshold or directional probability measure that behaves erratically under modest sample variation may be mathematically elegant but operationally weak. + +The purpose of this chapter is therefore broader than quantile selection alone. It is to show that the directional framework supports: + +- threshold selection, +- severity-weighted thresholding, +- directional probability bounds, +- and finite-sample estimation diagnostics + +within one common structure. 
+ +--- + +## Degree-Zero Thresholds and Their Directional Meaning + +A foundational identity of the directional framework is + +\[ +L_0(t;X)=P(X\le t)=F_X(t). +\] + +That result means the cumulative distribution function is not external to the partial-moment framework. It is the degree-0 lower partial moment itself. + +Therefore the lower-tail threshold at probability level \(\alpha\) is + +\[ +t_\alpha^{(0)}=\inf\{t : L_0(t;X)\ge \alpha\}. +\] + +This is simply the ordinary lower quantile written in directional form. + +In finance this degree-0 threshold is often called Value-at-Risk, but the mathematical object is more general than that label. The same degree-0 threshold can represent: + +- a maximum tolerated forecast shortfall, +- a minimum service threshold, +- a lower safety boundary, +- or any analyst-defined adverse benchmark. + +The structural meaning is identical in every case: + +\[ +t_\alpha^{(0)} +\] + +is the benchmark at which an \(\alpha\) fraction of observations lies at or below the threshold. + +Thus degree 0 is the **frequency-calibrated** threshold rule. + +--- + +## From Frequency to Severity: Higher-Degree Thresholds + +Degree-0 thresholds count adverse events, but they do not distinguish between small and large deviations once the threshold is crossed. + +That limitation is exactly what higher-order partial moments correct. + +For degree \(d \ge 1\), define the lower and upper directional masses + +\[ +L_d(t;X)=E[(t-X)_+^d], \qquad U_d(t;X)=E[(X-t)_+^d]. +\] + +A normalized lower share is then + +\[ +F_d(t;X)=\frac{L_d(t;X)}{L_d(t;X)+U_d(t;X)}. +\] + +When \(d=0\), this reduces to the ordinary probability partition. When \(d=1\), observations are weighted by the magnitude of their deviation from the benchmark. When \(d=2\), large deviations receive quadratic emphasis. + +This yields a family of generalized thresholds: + +\[ +t_\alpha^{(d)}=\inf\{t : F_d(t;X)\ge \alpha\}. 
+\] + +The interpretation changes by degree: + +\[ +d=0 \rightarrow \text{event frequency}, +\] + +\[ +d=1 \rightarrow \text{aggregate adverse magnitude}, +\] + +\[ +d=2 \rightarrow \text{extreme-deviation sensitivity}. +\] + +So the correct conceptual reading is not that partial moments add extra domain-specific measures. It is that quantile calibration itself can be performed in different geometries: + +- raw counting geometry at degree 0, +- linear severity geometry at degree 1, +- quadratic severity geometry at degree 2. + +--- + +## Directional Probability Bounds + +Threshold selection is only part of the problem. Analysts also want distribution-free upper bounds on the probability of threshold violation. + +Suppose \(g < \mu\), where \(\mu = E[X]\), and consider the lower-tail event + +\[ +X \le g. +\] + +When the distribution is symmetric about \(\mu\), a classical one-sided Chebyshev argument bounds this probability using only the mean and variance: + +\[ +P(X \le g)\le \frac{1}{2}\left(\frac{\sigma}{\mu-g}\right)^2. +\] + +A directional refinement replaces symmetric standard deviation with semideviation and drops the symmetry requirement: + +\[ +P(X \le g)\le \left(\frac{\sigma_-}{\mu-g}\right)^2, +\] + +where \(\sigma_-=\sqrt{E[(\mu-X)_+^2]}\) measures only downside dispersion. + +A more general bound uses lower partial moments of degree \(\alpha > 0\). Define + +\[ +\theta(t,\alpha)=\left(E[(t-X)_+^\alpha]\right)^{1/\alpha}. +\] + +Then, for \(g < t\), + +\[ +P(X\le g)\le \left(\frac{\theta(t,\alpha)}{t-g}\right)^\alpha. +\] + +So the directional hierarchy of probability control is + +\[ +\text{symmetric variance bound} \to \text{semivariance bound} \to \text{general lower-partial-moment bound}. +\] + +Each step aligns the bound more closely with the side of the distribution that matters for the decision. + +--- + +## Severity-Weighted Thresholds as Early-Intervention Rules + +Once higher-degree thresholds are viewed as severity-weighted quantiles, an important practical feature becomes clear. 
+ +A degree-1 or degree-2 threshold can be less extreme than the degree-0 threshold and yet still produce milder realized tail behavior. + +This is not a contradiction. It occurs because higher-degree thresholds are calibrated in weighted directional mass, not in raw event counts. + +Mathematically, this makes sense. The degree-2 rule assigns much more weight to large adverse deviations than to small ones. If a 10-unit shortfall contributes \(10^2\) units of quadratic severity while a 1-unit shortfall contributes only \(1^2\), then the threshold naturally shifts toward earlier intervention. + +The same logic applies across domains: + +- in forecasting, the rule intervenes before very large misses accumulate, +- in operations, it triggers replenishment before deep shortages form, +- in engineering, it signals action before large safety-margin breaches dominate the lower tail. + +Thus higher-degree thresholds are best interpreted as **early-intervention rules under asymmetric cost**. + +--- + +## Model Misspecification and Robustness + +Directional thresholds are especially useful when parametric models misrepresent the lower tail. + +The chapter-level lesson is general. Parametric misspecification is often most consequential exactly in the tail region where decision costs are highest. + +The directional framework responds in two ways. + +First, it estimates thresholds directly from empirical directional structure via degree-0 or higher-degree partial-moment quantiles. + +Second, it supplements those empirical thresholds with distribution-free probability bounds that remain valid under far weaker assumptions than a fully specified parametric family. + +This is particularly important under skewness, heavy tails, and asymmetric adverse regions, where symmetric models can understate the severity of rare but important events. 
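
The pairing of an empirical tail frequency with distribution-free bounds can be sketched in base R. This is an illustration under assumed inputs: the exponential sample, the benchmark `g`, and the helper name `lpm_bound` are all hypothetical choices, and semideviation is taken here as the root of the mean squared shortfall below the sample mean.

```r
set.seed(42)
x  <- rexp(5000)          # right-skewed illustrative sample
mu <- mean(x)
g  <- 0.1                 # adverse benchmark, chosen below the mean

# Empirical lower-tail frequency P(X <= g).
emp <- mean(x <= g)

# Semivariance bound: (sigma_minus / (mu - g))^2, using downside dispersion only.
semidev    <- sqrt(mean(pmax(mu - x, 0)^2))
semi_bound <- (semidev / (mu - g))^2

# General lower-partial-moment bound for a reference point t > g and degree a > 0:
# E[(t - X)_+^a] / (t - g)^a, which equals (theta(t, a) / (t - g))^a.
lpm_bound <- function(t, a) mean(pmax(t - x, 0)^a) / (t - g)^a

bounds <- c(empirical  = emp,
            semivar    = semi_bound,
            lpm_deg1   = lpm_bound(mu, 1),
            lpm_deg3   = lpm_bound(mu, 3))
print(round(bounds, 4))
```

Each bound should sit at or above the empirical frequency. How far above indicates how conservative the distribution-free statement is for this particular sample.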
+ +--- + +## Estimation Error and Sample-Size Sensitivity + +A directional statistic is only operationally useful if it stabilizes as sample size grows. + +Estimation error is therefore not a peripheral concern. It is central to any threshold-based or benchmark-driven decision process. If a statistic is unstable, then even a mathematically correct threshold rule can become unreliable in practice. + +The key empirical question is whether partial moments behave at least as well as classical mean-variance quantities under regular conditions and whether they improve upon them when the data are asymmetric or heavy-tailed. + +This matters well beyond portfolio optimization. Any benchmark-driven procedure depends on stable estimation of lower-tail structure. If lower partial moments and semideviation remain well behaved under skewness and heavy tails, then they are not only conceptually aligned with directional asymmetry. They are also strong candidates for practical nonparametric measurement when classical symmetric summaries are fragile. + +In particular, the stability of degree-0 partial moments reinforces the result that the cdf itself is a partial moment. The cdf is not merely a theoretical building block; it is also a stable empirical object within the directional system. + +--- + +## Utility, Decision Context, and Why Degree Matters + +The correct threshold degree depends on the decision problem. + +In general benchmark-relative terms, the lesson is: + +- if the main concern is **how often** a threshold is crossed, degree 0 is appropriate; +- if the concern is **how much aggregate damage** accumulates below the threshold, degree 1 is more natural; +- if the concern is **rare but severe violations**, degree 2 or higher can be more appropriate. + +A benchmark may be a target return, a policy threshold, a forecast baseline, a service minimum, or a safety limit. The degree determines how adverse deviation relative to that benchmark is measured. 
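
How the chosen degree moves the threshold can be sketched in base R. This is a minimal illustration, not the NNS implementation: the helper names `f_d` and `threshold` are hypothetical, the sample is simulated, and the degree-0 case uses the type-1 empirical quantile as the inverse of the step-function CDF.

```r
# Normalized lower share F_d(t; X) = L_d / (L_d + U_d); degree 0 is the empirical CDF.
f_d <- function(t, x, d) {
  if (d == 0) return(mean(x <= t))
  l <- mean(pmax(t - x, 0)^d)
  u <- mean(pmax(x - t, 0)^d)
  l / (l + u)
}

# Threshold t_alpha^(d): smallest t whose normalized lower share reaches alpha.
threshold <- function(x, alpha, d) {
  if (d == 0) return(quantile(x, alpha, type = 1, names = FALSE))
  # For d >= 1 the share rises continuously from 0 at min(x) to 1 at max(x).
  uniroot(function(t) f_d(t, x, d) - alpha, interval = range(x), tol = 1e-10)$root
}

set.seed(11)
x <- rgamma(1000, shape = 2)   # illustrative skewed sample

alpha <- 0.05
thresholds <- sapply(0:2, function(d) threshold(x, alpha, d))
names(thresholds) <- paste0("degree_", 0:2)
print(thresholds)
```

A quick sanity check on the sketch is that the degree-1 threshold at `alpha = 0.5` lands on the sample mean, since the degree-1 lower and upper masses balance there.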
+ +So the framework is not one-threshold-fits-all. It is a family of threshold rules indexed by the geometry of the adverse region. + +--- + +## Practical Workflow + +A general workflow for directional tail analysis is: + +1. **Choose a benchmark context.** + Identify the lower threshold region that matters substantively. + +2. **Estimate the degree-0 threshold.** + Compute + + \[ + t_\alpha^{(0)}=\inf\{t:L_0(t;X)\ge \alpha\}. + \] + + This yields the frequency-calibrated threshold. + +3. **Estimate higher-degree thresholds.** + Compute degree-1 and degree-2 threshold rules through normalized directional mass: + + \[ + t_\alpha^{(d)}=\inf\{t:F_d(t;X)\ge \alpha\}. + \] + +4. **Bound lower-tail probability conservatively.** + Use one-sided Chebyshev, semivariance, and Atwood-style lower-partial-moment bounds to assess worst-case violation probabilities. + +5. **Compare with parametric approximations if relevant.** + Large discrepancies indicate model risk in the tail. + +6. **Assess finite-sample stability and sample-size sensitivity.** + Implement the sample-size sensitivity diagnostics from Section 17.7 using the **Maximum Entropy Bootstrap** workflow developed in Chapter 17, especially when threshold rules feed larger decision or optimization systems. + +This workflow makes clear why threshold analysis, probability bounds, and estimation error belong together. + +--- + +## Summary + +This chapter extended the directional framework from interval estimation to full tail-threshold analysis. + +The key ideas are: + +- The lower-tail quantile is the degree-0 partial-moment threshold because + + \[ + L_0(t;X)=F_X(t). + \] + +- Higher-degree thresholds are severity-weighted calibrations of the lower tail, not merely alternative labels. + +- Semivariance and higher lower partial moments yield distribution-free upper bounds on threshold-violation probabilities. 
+ +- Severity-weighted thresholds can act as early-intervention rules because they respond to adverse magnitude, not just event counts. + +- Parametric misspecification matters most in the tail, so empirical directional thresholds and probability bounds provide complementary robustness. + +- Partial moments form a coherent nonparametric language for threshold selection, probability control, and finite-sample decision support. + +In that sense, quantiles, semivariance, lower partial moments, and estimation error belong together. They are generated by the same benchmark-relative primitives and serve the same broader purpose: to make probability, thresholds, and adverse deviation analysis interpretable without relying on restrictive symmetry or parametric assumptions. + +Chapter 17 then supplies the synthetic-data and Maximum Entropy Bootstrap machinery used to operationalize these stability checks, after which Chapter 18 returns to recursive mean-split estimation for adaptive nonparametric regression. + +--- + +## References + +- Berck, P., & Hihn, J. (1982). *Using the Semivariance to Estimate Safety-First Rules*. *American Journal of Agricultural Economics*, May 1982, 298-300. +- Atwood, M. (1985). *Demonstration of the Use of Lower Partial Moments to Improve Safety-First Probability Limits*. *American Journal of Agricultural Economics*, 67(4), 880-886. DOI: 10.2307/1241818. +- Rockafellar, R. T., & Uryasev, S. (2000). *Optimization of Conditional Value-at-Risk*. *Journal of Risk*, 2(3), 21-41. +- Chebyshev, P. L. (1867). *Des valeurs moyennes*. *Journal de Mathématiques Pures et Appliquées*, 12, 177-184. +- Viole, F., & Nawrocki, D. (2012). *Cumulative Distribution Functions and UPM/LPM Analysis*. SSRN. DOI: https://dx.doi.org/10.2139/ssrn.2148482 +- Viole, F. (2025). *Value-at-Risk (VaR) and Probability Bounds Analysis* (June 18, 2025). SSRN. Available at: https://ssrn.com/abstract=5310345. DOI: http://dx.doi.org/10.2139/ssrn.5310345 +- Nawrocki, D., & Viole, F. 
(2024). *Estimation error and partial moments*. *International Review of Financial Analysis*, 95, Part B, 103443. DOI: https://doi.org/10.1016/j.irfa.2024.103443 diff --git a/tools/NNS/book/chapter-17-prediction-intervals.Rmd b/tools/NNS/book/chapter-17-prediction-intervals.Rmd new file mode 100644 index 0000000..4265b62 --- /dev/null +++ b/tools/NNS/book/chapter-17-prediction-intervals.Rmd @@ -0,0 +1,405 @@ +# Prediction Intervals + +Previous chapters established the directional framework for probability, dependence, and statistical inference. Chapter 13 derived conditional probability and Bayes' theorem from co-partial moments, and Chapter 15 developed methods for comparing entire distributions without relying on parametric hypothesis tests — including the formal establishment of the degree-one partial moment CDF as a bias-free, continuous probability representation. + +A natural next step is **prediction**. + +In statistical analysis it is often necessary to estimate the range within which **future observations** are likely to occur. Classical statistics addresses this problem using *confidence intervals* and *prediction intervals*. These concepts are frequently confused, but they serve fundamentally different purposes. + +This chapter clarifies the distinction and develops **distribution-free prediction intervals** based on partial moments. Because the directional framework represents distributions directly through probability mass relative to benchmarks, interval estimation can be constructed without parametric assumptions and without relying on asymptotic approximations. + +--- + +## Confidence Intervals versus Prediction Intervals + +A **confidence interval** estimates an unknown population parameter. + +For example, a classical confidence interval for the mean takes the form + +$$ +\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}, +$$ + +where + +- $\bar{X}$ is the sample mean, +- $s$ is the sample standard deviation, +- $n$ is the sample size. 
+ +This interval reflects uncertainty about the **parameter** $\mu = E[X]$. + +A **prediction interval**, by contrast, estimates the range within which a **future observation** will fall. + +For a normally distributed population, a classical prediction interval is + +$$ +\bar{X} \pm t_{\alpha/2,n-1} s \sqrt{1 + \frac{1}{n}}. +$$ + +Prediction intervals are wider because they incorporate two sources of uncertainty: + +1. uncertainty about the population parameters, and +2. natural variability of individual observations. + +In practice, however, classical prediction intervals rely heavily on **distributional assumptions**, particularly normality. + +The directional framework allows prediction intervals to be constructed **directly from the empirical distribution**, avoiding these assumptions entirely. + +--- + +## The Discrete–Continuous Distinction: Recap and Application to Intervals + +Chapter 15 (Section 15.1.8) established a fundamental result that carries directly into interval estimation. The empirical CDF is a **degree-zero lower partial moment** — a discrete, step-function measure that is systematically biased at the mean for any finite sample: + +$$ +\hat{F}_X(\bar{x}) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{x_i \le \bar{x}\}} \ne 0.5 +$$ + +in general. The **degree-one partial moment ratio** + +$$ +F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)} +$$ + +eliminates this bias exactly: $F_1(\bar{x}; X) = 0.5$ for every distribution, every sample size, without exception (Section 15.1.8). + +This distinction matters for interval estimation because any quantile procedure that uses the discrete CDF to locate interval bounds inherits the same finite-sample bias — producing intervals that are systematically shifted or miscalibrated. The degree-one ratio corrects this at the source. + +--- + +## Prediction as a Quantile Problem + +Prediction intervals can be understood as **quantile intervals** of the underlying distribution. 
+ +Let $X$ be a random variable with cumulative distribution function $F_X(t)$. The $p$-quantile is defined as + +$$ +Q_X(p) = \inf \{t : F_X(t) \ge p\}. +$$ + +A prediction interval with coverage probability $1-\alpha$ is therefore + +$$ +[Q_X(\alpha/2), \, Q_X(1-\alpha/2)]. +$$ + +For example, a 95% prediction interval corresponds to + +$$ +[Q_X(0.025),\, Q_X(0.975)]. +$$ + +Classical parametric methods estimate these quantiles using assumed distributions. The directional framework estimates them **directly from partial moments**, with the additional capability to choose between discrete (degree-zero) and continuous (degree-one) probability representations. + +Prediction intervals are one application of quantile inversion, but they are not the only one. The same inversion logic can also be used to select benchmark thresholds for directional decision analysis. + +A lower-tail threshold chosen by +\[ +\inf\{t:F_X(t)\ge \alpha\} +\] +can be interpreted as the lower endpoint of a tail-quantile construction. In finance this object is often called Value-at-Risk, but the underlying mathematics is much more general. It may represent an acceptable forecast error, a reliability boundary, a minimum service level, or any other adverse threshold defined relative to a benchmark. + +This distinction is conceptual rather than mathematical. Prediction intervals ask for a range likely to contain future observations. Threshold analysis asks where the lower tail begins to contain a specified fraction of directional mass. In both cases the central task is quantile inversion. + +Once higher degrees are introduced, the interpretation broadens further. Degree-zero thresholds partition observations by frequency. Higher-degree thresholds partition them by severity-weighted directional mass. Thus the quantile framework supports both interval estimation and benchmark-sensitive threshold design. 
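The inversion logic above can be sketched in base R. The helpers `F0`, `F1`, and `q1` are hypothetical names introduced only for this illustration; they follow a literal reading of the degree-zero and degree-one CDF definitions in this chapter and are not the NNS implementation, so their output need not match `LPM.VaR` or `UPM.VaR`.

```r
# Hypothetical helpers: a literal reading of the chapter's formulas, not the NNS source.
x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)

# Degree-zero CDF: share of observations at or below t (the empirical CDF).
F0 <- function(t, x) mean(x <= t)

# Degree-one CDF: lower partial moment divided by total first-degree mass.
F1 <- function(t, x) {
  lower <- mean(pmax(t - x, 0))
  upper <- mean(pmax(x - t, 0))
  lower / (lower + upper)
}

# Invert F1 numerically: find t in [min(x), max(x)] with F1(t) = p.
q1 <- function(p, x) {
  uniroot(function(t) F1(t, x) - p, interval = range(x), tol = 1e-10)$root
}

F0(mean(x), x)   # 0.6 for this sample, not 0.5
F1(mean(x), x)   # 0.5, because LPM_1 and UPM_1 balance at the mean

# Endpoints of a 90% interval under the degree-one representation.
lower <- q1(0.05, x)
upper <- q1(0.95, x)
c(lower, upper)
```

Because `F1` is continuous and nondecreasing between the sample extremes, a one-dimensional root finder suffices here; the degree-zero CDF is a step function and would instead be inverted through order statistics.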
+ +--- + +## Partial-Moment Quantile Functions: LPM.VaR and UPM.VaR + +The NNS package provides two complementary quantile functions that invert the partial moment CDF: + +**`LPM.VaR(percentile, degree, x)`** returns the value $t$ such that $F_\text{degree}(t; X) = p$ for the lower tail, where $p$ is the `percentile` argument and `x` is the data vector. This is the left-tail (lower) quantile at probability level $p$. + +**`UPM.VaR(percentile, degree, x)`** returns the value $t$ such that $1 - F_\text{degree}(t; X) = p$ for the upper tail, with the same arguments. This is the right-tail (upper) quantile at probability level $p$. + +Both functions accept a degree argument. Setting `degree = 0` uses the discrete empirical CDF and therefore returns classical empirical quantiles. Setting `degree = 1` uses the continuous area-based probability representation established in Chapter 15. Higher degrees extend the same inversion principle to severity-weighted directional probability. Thus `LPM.VaR` is best interpreted not as a finance-specific tool, but as a general lower-tail threshold operator generated by the partial-moment framework. + +For a prediction interval with coverage $1 - \alpha$, here with $\alpha = 0.05$: + +```r +# 95% prediction interval using continuous (degree = 1) quantiles +lower <- LPM.VaR(percentile = 0.025, degree = 1, x = x) +upper <- UPM.VaR(percentile = 0.025, degree = 1, x = x) +``` + +The degree-one quantiles provide **smoother, less jagged interval boundaries** than their degree-zero counterparts, particularly in small samples where the step-function CDF may produce large jumps between adjacent order statistics. + + + +### Generalized Threshold Operators + +The notation `LPM.VaR` can appear more domain-specific than the underlying mathematics actually is. In the present framework, the function returns the benchmark \(t\) that solves a lower-tail threshold problem under a chosen directional degree. 
+ +When \(d=0\), the threshold is +\[ +t_\alpha^{(0)}=\inf\{t:L_0(t;X)\ge \alpha\}, +\] +which is the ordinary empirical lower quantile.
+ +When \(d=1\), the threshold instead solves +\[ +t_\alpha^{(1)}=\inf\left\{t:\frac{L_1(t;X)}{L_1(t;X)+U_1(t;X)}\ge \alpha\right\}. +\] +This no longer counts all observations equally. Larger deviations below the benchmark contribute more heavily to the lower-tail mass. + +Similarly, when \(d=2\), +\[ +t_\alpha^{(2)}=\inf\left\{t:\frac{L_2(t;X)}{L_2(t;X)+U_2(t;X)}\ge \alpha\right\}, +\] +so extreme adverse deviations receive quadratic weight. + +These thresholds therefore define a family of directional calibration rules: +\[ +\text{degree 0: frequency-calibrated threshold}, +\] +\[ +\text{degree 1: magnitude-calibrated threshold}, +\] +\[ +\text{degree 2: extreme-deviation-calibrated threshold}. +\] + +This interpretation is fully general. It applies whenever one wishes to choose a threshold not only by how often a process crosses it, but also by how severely the process behaves once crossed. + + +## Link to Stochastic Dominance (Chapter 15) + +Chapters 15 and 17 use the same degree-one quantile geometry from different angles. + +- In **stochastic dominance** (Chapter 15), the ordering is defined by integrated quantile/CDF behavior. +- In **prediction intervals** (this chapter), interval endpoints are degree-zero or degree-one quantiles from `LPM.VaR` and `UPM.VaR`. + +The methods are mathematically unified: the degree-one objects used to construct bias-corrected intervals are the same continuous probability objects used to diagnose dominance relations. + +--- + +## Comparison with Bootstrap Confidence Intervals + +The difference between discrete and continuous partial moment quantiles becomes especially apparent when applied to **bootstrap confidence intervals**. + +Consider the correlation statistic from the `law` dataset (Efron and Tibshirani, 1993). 
Standard bootstrap methods produce a range of confidence intervals depending on the method chosen: + +```r +library(bootstrap); library(boot) +data("law") + +get_r <- function(data, indices, x, y) { + d <- data[indices, ] + round(as.numeric(cor(d[x], d[y])), 3) +} + +set.seed(12345) +boot_out <- boot(law, x = "LSAT", y = "GPA", R = 500, statistic = get_r) + +boot.ci(boot_out) +## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS +## Level Normal Basic +## 95% ( 0.5247, 1.0368 ) ( 0.5900, 1.0911 ) +## Level Percentile BCa +## 95% ( 0.4609, 0.9620 ) ( 0.3948, 0.9443 ) +``` + +The distribution of bootstrapped correlations is visibly asymmetric: there is a long left tail, and the upper bounds of the normal and basic intervals (1.0368 and 1.0911) exceed 1.0, which is impossible for a correlation coefficient. + +## Degree-Zero Partial Moment Intervals + +The degree-zero LPM quantile corresponds to the percentile bootstrap method. The discrete CDF assigns equal weight to each bootstrap replicate, producing step-function quantiles that agree with the percentile interval above to within about 0.01: + +```r +# Discrete lower and upper CI — corresponds to percentile method +LPM.VaR(percentile = 0.025, degree = 0, x = boot_out$t) +## [1] 0.4688333 + +UPM.VaR(percentile = 0.025, degree = 0, x = boot_out$t) +## [1] 0.9632222 +``` + +## Degree-One Partial Moment Intervals + +The degree-one quantile uses area-based probability, which weights replicates by their distance from the boundary. 
This naturally down-weights extreme observations and corrects for asymmetry — without a double bootstrap: + +```r +# Continuous CI — bias-corrected, no double bootstrap needed +LPM.VaR(percentile = 0.025, degree = 1, x = boot_out$t) +## [1] 0.5612749 + +UPM.VaR(percentile = 0.025, degree = 1, x = boot_out$t) +## [1] 0.8263255 +``` + +The degree-one interval $(0.561,\, 0.826)$ is substantially tighter and more symmetric than the percentile interval $(0.461,\, 0.963)$ — and without requiring the computationally expensive double bootstrap needed for studentized intervals.
The upper bound is comfortably below 1.0. + +Classical bootstrap corrections (BCa, studentized) attempt to fix asymmetry through additional resampling passes or acceleration constants. The degree-one partial moment approach achieves comparable correction by simply replacing the discrete counting measure with area-based probability on the **original** bootstrap sample — the same substitution that eliminates bias in the NNS ANOVA procedure (Chapter 15, Section 15.1.8). + +--- + +## Distribution-Free Prediction Intervals + +Using the NNS quantile functions, a prediction interval with coverage probability $1-\alpha$ is constructed as + +$$ +[\text{LPM.VaR}(\alpha/2,\, d,\, X),\; \text{UPM.VaR}(\alpha/2,\, d,\, X)] +$$ + +for a chosen degree $d \in \{0, 1\}$. + +**Degree 0** recovers the classical empirical quantile interval — the order statistics at ranks $\lceil n\alpha/2 \rceil$ and $\lceil n(1-\alpha/2) \rceil$. + +**Degree 1** provides a continuous, bias-corrected alternative that avoids the finite-sample discretization error documented in Section 15.1.8. + +No parametric assumptions, no variance estimates, and no asymptotic approximations are required by either variant. + + +More generally, one may define a lower-tail directional threshold by +\[ +t_\alpha^{(d)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=d,X), +\] +where \(d=0\) corresponds to frequency-based probability and \(d\ge 1\) corresponds to severity-weighted directional probability. The interpretation of the threshold changes with \(d\), but the computational principle remains unchanged: invert the chosen directional probability representation. + +--- + +## Example + +Consider the fully specified sample + +$$ +X = -2,\,-1,\,0,\,1,\,2,\,3,\,5,\,7,\,9,\,10. 
+$$ + +For a 90% prediction interval ($\alpha = 0.10$): + +```r +x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10) + +# Degree 0: classical empirical quantiles +LPM.VaR(percentile = 0.05, degree = 0, x = x) +## [1] -2 +UPM.VaR(percentile = 0.05, degree = 0, x = x) +## [1] 10 + +# Degree 1: continuous area-based quantiles +LPM.VaR(percentile = 0.05, degree = 1, x = x) +## [1] -1.65 +UPM.VaR(percentile = 0.05, degree = 1, x = x) +## [1] 9.15 +``` + +The degree-zero interval is $[-2, 10]$ — anchored exactly at the observed extremes. The degree-one interval is tighter, reflecting the continuous probability mass that lies between order statistics. With only $n=10$ observations, this difference is practically meaningful. + +--- + + + +### Worked Threshold Example: Frequency versus Severity + +To see how degree changes interpretation, consider a sample \(X\) and a lower-tail calibration level \(\alpha\). + +A degree-zero threshold is +\[ +t_\alpha^{(0)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=0,X), +\] +which partitions the sample by event frequency. Approximately an \(\alpha\) fraction of observations lies below the selected threshold. + +Now compare this with degree-one and degree-two thresholds: +\[ +t_\alpha^{(1)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=1,X), +\qquad +t_\alpha^{(2)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=2,X). +\] +These thresholds are chosen in a different geometry. They do not partition simple counts. Instead they partition severity-weighted directional mass. Large adverse deviations therefore influence the threshold more strongly than small ones. + +This leads to an important practical phenomenon. A severity-weighted threshold may occur at a milder benchmark than a frequency-based threshold, yet still produce a less severe set of threshold violations in realized use. The reason is that the rule is designed to trigger earlier, before the most extreme adverse deviations dominate the lower tail. 
The probability-bounds literature demonstrates this effect empirically in finance examples, but the mechanism is general and should be understood as an early-intervention property of higher-degree threshold calibration. + +This distinction matters in many domains: + +* in forecasting, a severity-weighted threshold warns before very large misses accumulate, +* in operations, it triggers replenishment before severe shortages emerge, +* in engineering, it signals intervention before large safety-margin breaches dominate the tail. + +What appears paradoxical under degree-zero counting becomes natural under directional weighting. The threshold is solving a different partition problem. + + +## Interpretation and Coverage + +Prediction intervals constructed from partial moment quantiles possess a simple probabilistic interpretation. + +For general threshold analysis, the interpretation can be stated succinctly. Under degree zero, the lower-tail threshold controls event frequency. Under higher degrees, the threshold controls weighted adverse exposure. The probability statement is therefore exact in the ordinary counting sense only for degree zero. For higher degrees, what is controlled is not raw event count but directional mass in a weighted geometry. + +Let $X_{n+1}$ denote a future observation drawn from the same distribution as the sample. Then + +$$ +P\bigl(\text{LPM.VaR}(\alpha/2, d, X) \le X_{n+1} \le \text{UPM.VaR}(\alpha/2, d, X)\bigr) \approx 1 - \alpha. +$$ + +For degree $d = 0$, the approximation converges to equality as $n \to \infty$. + +For degree $d = 1$, the continuous probability representation reduces finite-sample error, providing improved coverage in small samples without parametric assumptions. + +Unlike parametric prediction intervals, neither variant depends on distributional assumptions. Coverage is determined entirely by the empirical probability structure. 
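A coverage statement of this kind can be checked by simulation. The sketch below is an illustration under stated assumptions rather than a result from the text: it uses base R's `quantile(..., type = 1)` (the inverse empirical CDF) as a stand-in for the degree-zero interval, draws from an exponential distribution chosen arbitrarily for its skewness, and estimates how often a fresh draw falls inside the nominal 90% interval.

```r
set.seed(1)
alpha <- 0.10   # nominal coverage is 1 - alpha = 0.90
n     <- 200    # sample size used to build each interval
reps  <- 2000   # number of simulated samples

covered <- replicate(reps, {
  x <- rexp(n)  # skewed sample; the distribution is an arbitrary choice
  # Order-statistic (degree-zero style) interval endpoints.
  bounds <- quantile(x, c(alpha / 2, 1 - alpha / 2), type = 1, names = FALSE)
  x_new <- rexp(1)  # one future observation from the same distribution
  bounds[1] <= x_new && x_new <= bounds[2]
})

coverage <- mean(covered)
coverage
```

With these settings the estimate should land near the nominal 0.90. Repeating the experiment with a smaller `n` shows how finite-sample discreteness affects degree-zero coverage.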
+ +--- + +## Conditional Prediction Intervals via NNS Regression + +Conditional intervals derived from `NNS.reg` are developed in detail in Chapter 21, where the regression estimator is introduced formally and interpreted geometrically. + +For continuity, the key idea is that `NNS.reg(..., confidence.interval = ...)` builds local intervals from partition-specific empirical distributions. In this chapter, we focus on the unconditional interval mechanics (`LPM.VaR` / `UPM.VaR`) that those later conditional constructions rely on. + +```r +# Simulated nonlinear data: y = sin(pi * x) plus Gaussian noise +set.seed(12345) +x_train <- runif(1000, -2, 2) +y_train <- sin(pi * x_train) + rnorm(1000, sd = 0.2) + +NNS.reg(x = x_train, y = y_train, order = NULL, confidence.interval = .95) +``` + +
    +![Figure 17.1. `NNS.reg(..., confidence.interval = 0.95)` visualization with nonlinear fit and interval bands estimated from local partitions.](images/ch15_reg_conf_int.png) +
    + +## Advantages of the Directional Approach + +Prediction intervals based on partial moments possess several advantages over classical methods. + +**Bias elimination.** +As established in Chapter 15 (Section 15.1.8), the degree-one partial moment ratio places exactly 50% of probability mass below the mean for every distribution and every sample size. Interval bounds derived from degree-one quantiles do not inherit the systematic bias of the discrete empirical CDF. + +**Distribution-free construction.** +No assumptions about normality or parametric form are required. + +**Robustness.** +Intervals are determined by empirical probability mass rather than moment estimates that may be sensitive to extreme observations. + +**No double bootstrap required.** +Asymmetry correction that classical methods achieve only with computationally expensive double-bootstrap or BCa procedures is obtained automatically from the continuous partial moment representation. + +**Conditional adaptivity.** +Through the NNS regression framework, prediction intervals adapt to local data structure — capturing nonlinearity and heteroskedasticity without parametric specification. + +**Interpretability.** +Intervals correspond directly to probability statements about future observations, rooted in directional partial-moment representations. + +A further advantage of the directional approach is that it unifies interval estimation and threshold-based decision analysis. The analyst need not switch theories when moving from prediction intervals to adverse-threshold selection. In both cases the task is to invert a directional probability representation. The only difference is whether mass is counted equally, as in degree zero, or weighted by severity, as in higher degrees. + +A second advantage is that threshold analysis can be separated from domain-specific naming conventions. 
Lower-tail quantiles, conditional lower-tail means, semivariance, and higher-order partial moments can all be interpreted inside a single benchmark-relative framework. The estimation-error literature supports this broader view by treating partial moments as nonparametric statistical objects with useful asymptotic behavior, rather than merely as specialized measures for one field. + +--- + +## Summary + +Prediction intervals describe the range within which future observations are expected to occur. + +Classical prediction intervals rely on parametric assumptions and variance estimates. Bootstrap methods correct for asymmetry but require multiple resampling passes. The directional framework constructs intervals directly from the **empirical distribution represented by partial moments**, with an additional layer of bias correction available through the continuous (degree-one) probability representation established in Chapter 15. + +Key results: + +- The discrete CDF ($d=0$) is a degree-zero partial moment — unbiased asymptotically but systematically biased for any finite sample (Chapter 15, Section 15.1.8). +- The continuous partial moment CDF ($d=1$) eliminates this bias exactly, with $F_1(\bar{x}; X) = 0.5$ without exception. +- `LPM.VaR` and `UPM.VaR` invert these CDFs to produce distribution-free quantile intervals at either degree. +- Degree-one intervals match or exceed bias-corrected bootstrap intervals without the need for double resampling. +- Conditional prediction intervals from `NNS.reg` adapt automatically to local nonlinearity and heteroskedasticity. + +These ideas complete the core framework for **nonparametric statistical inference using directional statistics**. The next chapter turns to **directional tail thresholds, probability bounds, and estimation error**, extending interval logic into benchmark-driven tail-risk decisions before Chapter 17 develops synthetic data generation and bootstrap methods for uncertainty quantification. 
+ +--- diff --git a/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd b/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd new file mode 100644 index 0000000..23abdff --- /dev/null +++ b/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd @@ -0,0 +1,548 @@ +# Recursive Mean-Split Estimation + +Part V of the book turns from probability representation and inference to **nonparametric estimation**. +The goal of estimation is to recover unknown functional relationships from data without imposing a predetermined parametric form. + +A central object in many statistical problems is the **conditional mean function** + +\[ +f(x) = E[Y \mid X = x]. +\] + +Classical regression models estimate this relationship by specifying a functional form — linear, polynomial, or otherwise — and fitting parameters to the data. + +Nonparametric methods instead estimate \(f(x)\) directly from observations. +Partition-based estimators are among the most intuitive approaches: the predictor space is divided into regions, and the conditional mean is estimated locally within each region. + +This chapter introduces the **recursive mean-split estimator**, the partition-based nonparametric method at the core of the NNS framework. +The estimator recursively divides the data into regions based on conditional mean structure, producing a flexible estimator that adapts automatically to nonlinear relationships. + +Two distinct splitting modes are available. In **joint partitioning**, regions are defined by splitting simultaneously on both the predictor \(X\) and the response \(Y\) at their joint means, producing four partial-moment quadrants at each level. In **\(X\)-only partitioning**, regions are defined by splitting solely on the predictor mean, producing two subregions at each level. Both modes share the same limit condition and recursive logic; they differ in how region boundaries are located and how many subregions each split creates. 
+ +The procedure reflects the same directional logic used throughout the book: just as variance decomposes into directional components relative to a benchmark, recursive mean splitting partitions the joint distribution according to deviations around conditional means. + +A key theoretical result is established here: the recursive mean-split estimator belongs to the well-studied class of **data-adaptive partition estimators**, and consistency is inherited directly from that class under standard conditions on cell diameter and occupancy. This is not a limitation requiring further development — it is a strength. The estimator sits within a family whose convergence properties are fully characterized in the literature, and the NNS contribution is the specific splitting rule, the partial-moment geometric interpretation, and the multivariate architecture built on top of that foundation. + +--- + +## Motivation for Partition-Based Estimation + +Suppose we observe independent pairs + +\[ +(X_1, Y_1), \dots, (X_n, Y_n), +\] + +where + +\[ +Y_i = f(X_i) + \varepsilon_i, +\] + +and + +\[ +E[\varepsilon_i \mid X_i] = 0. +\] + +The objective is to estimate the unknown regression function + +\[ +f(x) = E[Y \mid X = x]. +\] + +### Parametric approaches + +Parametric regression assumes a functional form such as + +\[ +f(x) = \beta_0 + \beta_1 x. +\] + +While simple and interpretable, this assumption can be severely restrictive when the relationship between variables is nonlinear. + +### Nonparametric alternatives + +Nonparametric estimators relax these assumptions. Common examples include + +- kernel regression, +- local polynomial regression, +- partition estimators. + +Kernel methods estimate \(f(x)\) through weighted averages of nearby observations. 
+However, they require selecting a **bandwidth parameter** that determines the smoothing scale, and the probability that any kernel function assigns positive mass to an exact observed value is zero — the estimate therefore cannot achieve an exact fit at all observations simultaneously. + +Partition estimators divide the predictor space into regions and compute averages within each region. +The recursive mean-split method belongs to this class but differs from classical approaches through its **partial-moment splitting rule**, which anchors region boundaries directly to the geometry of the conditional mean. The estimator is consistent for the true conditional mean under the same regularity conditions that govern all partition estimators in this class — conditions whose sufficiency has been established in the statistical literature. As the order parameter increases, the number of regions grows and the estimator converges to a perfect fit in finite steps. + +--- + +## Consistency by Class Membership + +The recursive mean-split estimator is an instance of the **data-adaptive partition estimator** class studied by Stone (1977), Lugosi and Nobel (1996), and Györfi, Kohler, Krzyżak, and Walk (2002). + +The general result for this class is as follows. + +**Theorem (Consistency of Partition Estimators).** Let \((X_1, Y_1), \dots, (X_n, Y_n)\) be independent and identically distributed, with \(E[Y^2] < \infty\). Let \(\hat{f}_n\) be the partition estimator + +\[ +\hat{f}_n(x) = \frac{1}{N_{n,x}} \sum_{i : X_i \in A_n(x)} Y_i, +\] + +where \(A_n(x)\) is the cell of a data-adaptive partition \(\mathcal{P}_n\) containing \(x\). If the partition satisfies + +\[ +\operatorname{diam}(A_n(x)) \xrightarrow{P} 0 +\qquad \text{and} \qquad +N_{n,x} \xrightarrow{P} \infty +\] + +as \(n \to \infty\), then + +\[ +E\bigl[(\hat{f}_n(x) - f(x))^2\bigr] \to 0. 
+\] + +The recursive mean-split estimator satisfies both conditions under the standard occupancy and order growth assumptions used in practice. As \(n \to \infty\) with order parameter \(O = O(n)\) growing appropriately and minimum occupancy held fixed: + +- each cell's diameter converges to zero because successive splits are anchored to sample means, which concentrate around the true conditional mean as \(n\) grows, producing finer and finer partitions in regions where the function varies; +- each cell's occupancy grows because the data density in any fixed region of the predictor space grows linearly with \(n\). + +Consequently, **consistency of the recursive mean-split estimator is inherited directly from the established theory of partition estimators.** This is not an approximate or informal claim. The estimator belongs to a well-characterized class, and the convergence result applies. + +What the NNS framework contributes beyond this class membership is: + +1. a specific splitting rule grounded in the partial-moment geometry of the data, +2. a finite-order perfect-fit property unavailable to kernel estimators, +3. a multivariate architecture — developed fully in Chapter 21 — that operates on per-regressor regression points rather than on the raw observation cloud, substantially mitigating the curse of dimensionality. + +--- + +## Conditional Mean Partitions + +Let \(P_n\) denote a partition of the data space into regions + +\[ +A_1, A_2, \dots, A_K. +\] + +Within each region, the conditional mean is estimated by averaging the responses of observations whose predictors fall inside that region. + +For a predictor value \(x\), let + +\[ +A_n(x) +\] + +be the cell containing \(x\). The partition estimator is + +\[ +\hat{f}_n(x) += +\frac{1}{N_{n,x}} +\sum_{i : X_i \in A_n(x)} Y_i, +\] + +where + +\[ +N_{n,x} = \#\{ i : X_i \in A_n(x) \} +\] + +is the number of observations in the cell. 
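The estimator just defined can be sketched in a few lines of base R. The equal-width cells below are an assumption made purely for illustration (the NNS partition is built by recursive mean splits instead), and `f_hat` is a hypothetical helper name.

```r
set.seed(42)
n <- 500
x <- runif(n, -2, 2)
y <- sin(pi * x) + rnorm(n, sd = 0.2)

# Fixed, equal-width cells A_1, ..., A_8, chosen only to keep the example short.
breaks <- seq(-2, 2, length.out = 9)
cell   <- cut(x, breaks = breaks, include.lowest = TRUE)

cell_mean  <- tapply(y, cell, mean)  # average response within each cell
cell_count <- table(cell)            # number of observations N_{n,x} per cell

# Predict at new points by looking up the cell that contains each of them.
f_hat <- function(x_new) {
  idx <- cut(x_new, breaks = breaks, include.lowest = TRUE)
  unname(cell_mean[as.integer(idx)])
}

f_hat(c(-0.75, 0.25))
```

Narrowing the cells makes `f_hat` track `sin(pi * x)` more closely at the cost of fewer observations per cell, which is the trade-off behind the diameter and occupancy conditions in the consistency theorem.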
+ +The estimate of \(f(x)\) is therefore the **average response within the region containing \(x\)**. + +This structure is flexible in three respects: + +- Regions adapt to nonlinear features of the conditional mean surface. +- Local averages approximate the conditional expectation without a specified functional form. +- Estimation is entirely data-driven. + +The **size and shape of regions determine the effective smoothing scale**: coarser regions produce smoother estimates; finer regions track local variation more closely. + +--- + +## Recursive Splitting Algorithms + +The key design choice is how the regions \(A_k\) are constructed. + +The NNS recursive mean-split estimator builds partitions through an iterative procedure governed by an **order parameter** \(O\). +At each order, each existing region is further subdivided. +When left unspecified in the package interface (`order = NULL`), recursion depth is determined internally for each regressor according to its directional dependence with the response; Chapter 21 formalizes this dependence-driven adaptivity in the regression workflow. + +Two splitting modes are available, depending on whether regions are anchored to the joint distribution of \((X, Y)\) or to the marginal distribution of \(X\) alone. + +--- + +## Joint Partitioning + +Joint partitioning defines regions by splitting simultaneously on both \(X\) and \(Y\) at their local means. + +### Initialization + +Begin with all \(n\) observations as a single region. + +### Splitting rule + +For a region \(R\) containing observations \(\{(X_i, Y_i)\}_{i \in I_R}\), compute the local means + +\[ +\bar{X}_R = \frac{1}{|I_R|} \sum_{i \in I_R} X_i, +\qquad +\bar{Y}_R = \frac{1}{|I_R|} \sum_{i \in I_R} Y_i. 
+\] + +Partition the observations in \(R\) into four quadrants according to whether each observation lies above or below \(\bar{X}_R\) and \(\bar{Y}_R\): + +| Quadrant | Condition | Label | +|----------|-----------|-------| +| CoUPM | \(X_i > \bar{X}_R\) and \(Y_i > \bar{Y}_R\) | NE | +| DUPM | \(X_i \le \bar{X}_R\) and \(Y_i > \bar{Y}_R\) | NW | +| DLPM | \(X_i > \bar{X}_R\) and \(Y_i \le \bar{Y}_R\) | SE | +| CoLPM | \(X_i \le \bar{X}_R\) and \(Y_i \le \bar{Y}_R\) | SW | + +These four quadrants correspond directly to the four co-partial moment regions introduced in Chapter 10. +The joint mean \((\bar{X}_R, \bar{Y}_R)\) is the point at which + +\[ +U_1(\bar{X}_R; X \mid R) = L_1(\bar{X}_R; X \mid R) +\quad \text{and} \quad +U_1(\bar{Y}_R; Y \mid R) = L_1(\bar{Y}_R; Y \mid R), +\] + +so the split occurs precisely where the first-order upper and lower partial moments are balanced on both dimensions. Each observation is assigned to one of the four quadrants, with a **quadrant identification number** recorded at each level. + +### Recursion + +Within each nonempty quadrant, repeat the same procedure: compute local means and split into four subquadrants. +Continuing to order \(O\) produces at most \(4^{O-1}\) nonempty regions. + +### Regression points + +The **regression points** of the partition are the local means \((\bar{X}_R, \bar{Y}_R)\) within each region. +These points summarize the conditional mean surface and serve as the basis for prediction and curve fitting. + +```r +x <- seq(-5, 5, .05) +y <- x ^ 3 + +for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)} +``` + +
    +![Figure 18.1. `NNS.part` joint partitioning for orders 1–4 (Voronoi view), showing progressive recursive refinement.](images/ch18_part_1.png) +
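One level of the joint splitting rule can be written out directly in base R. This is a minimal sketch on simulated data, not the `NNS.part` implementation; the variable names and the simulated relationship are assumptions for illustration.

```r
set.seed(7)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

# Local means of the single starting region (all observations).
x_bar <- mean(x)
y_bar <- mean(y)

# Quadrant labels follow the co-partial moment table; ties go to the "<=" side.
quadrant <- ifelse(x >  x_bar & y >  y_bar, "CoUPM",
            ifelse(x <= x_bar & y >  y_bar, "DUPM",
            ifelse(x >  x_bar & y <= y_bar, "DLPM", "CoLPM")))

d <- data.frame(x = x, y = y, quadrant = quadrant)
table(d$quadrant)

# Regression points: the local (mean X, mean Y) within each nonempty quadrant.
reg_points <- aggregate(cbind(x, y) ~ quadrant, data = d, FUN = mean)
reg_points
```

Applying the same two steps (local means, then quadrant assignment) inside each nonempty quadrant yields the next order of the partition.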
    + +--- + +## \(X\)-Only Partitioning + +\(X\)-only partitioning defines regions by splitting solely on the mean of the predictor \(X\), without reference to \(Y\). + +### Initialization + +Begin with all \(n\) observations as a single region. + +### Splitting rule + +For a region \(R\) with observations \(\{(X_i, Y_i)\}_{i \in I_R}\), compute the local predictor mean + +\[ +\bar{X}_R = \frac{1}{|I_R|} \sum_{i \in I_R} X_i. +\] + +Partition the observations into two subregions: + +\[ +R_L = \{ i \in I_R : X_i \le \bar{X}_R \}, +\qquad +R_U = \{ i \in I_R : X_i > \bar{X}_R \}. +\] + +Quadrant identifications in this mode are limited to the symbols 1 (left of the split) and 2 (right of the split). + +### Recursion + +Within each nonempty subregion, repeat the same procedure. +Continuing to order \(O\) produces at most \(2^O\) nonempty regions. + +### Regression points + +The regression points are the local means \((\bar{X}_R, \bar{Y}_R)\) within each predictor-defined region. +Because the region boundaries are determined by \(X\) alone, the regression points make use of the **full bandwidth** of the response values within each region for their \(Y\) coordinate. + +```r +x <- seq(-5, 5, .05) +y <- x ^ 3 + +for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE, obs.req = 0)} +``` + +
    +![Figure 18.2. `NNS.part(..., type = "XONLY")` predictor-only partitioning across recursive splits.](images/ch18_part_2.png) +
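The recursion for the \(X\)-only mode can be sketched as a short recursive function in base R. `xonly_split` is a hypothetical helper and `min_obs` is a hypothetical minimum-occupancy guard; both only mirror the splitting rule described above. `NNS.part` remains the reference implementation and may differ in details such as tie handling.

```r
# Hypothetical helper, not the NNS implementation: recursive X-only mean splitting.
# Returns one row per terminal region with its regression point (mean X, mean Y).
xonly_split <- function(x, y, order, min_obs = 1) {
  region <- data.frame(x = mean(x), y = mean(y), n = length(x))
  if (order == 0 || length(unique(x)) == 1) return(region)

  left <- x <= mean(x)  # R_L: at or below the local predictor mean
  if (sum(left) < min_obs || sum(!left) < min_obs) return(region)

  rbind(
    xonly_split(x[left],  y[left],  order - 1, min_obs),
    xonly_split(x[!left], y[!left], order - 1, min_obs)
  )
}

x <- seq(-5, 5, 0.05)
y <- x^3

pts <- xonly_split(x, y, order = 3)
nrow(pts)  # at most 2^3 = 8 regions
pts
```

Each returned row is a regression point; joining consecutive points gives a piecewise-linear approximation of the curve that tightens as `order` grows.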
    + +--- + +## Comparison of Modes + +Both modes share the same conceptual foundation — recursive splitting at local means — and converge to a perfect fit as \(O\) grows. They differ in the following respects. + +| Property | Joint partitioning | \(X\)-only partitioning | +|---|---|---| +| Split dimensions | \(X\) and \(Y\) jointly | \(X\) only | +| Subregions per split | 4 | 2 | +| Regions at order \(O\) | at most \(4^{O-1}\) | at most \(2^O\) | +| Quadrant IDs | digits 1–4 | digits 1–2 | +| Region definition | Joint (X,Y) partial-moment quadrants | Predictor-space intervals | +| Regression points | Joint quadrant means | Predictor-interval means | + +Joint partitioning refines faster as \(O\) increases because each split creates four subregions rather than two, and it provides a richer representation of the joint distribution. +\(X\)-only partitioning is more directly analogous to classical predictor-space partition estimators and is often preferred when a clear functional relationship \(Y = f(X)\) is the primary object of interest. + +### Stopping criteria + +Splitting terminates when either of the following conditions is met: + +- A region contains fewer than a user-specified minimum number of observations (minimum occupancy threshold). +- The order parameter \(O\) reaches a user-specified maximum. + +The minimum occupancy threshold is quantitative and directly governs the finest admissible partition. Under any fixed occupancy threshold, as \(n \to \infty\) the occupied regions shrink and the occupancy within each region grows, satisfying the two conditions required for consistency in the partition estimator class. + +--- + +## Partial-Moment Interpretation of Joint Partitioning + +The joint splitting rule has a precise interpretation in terms of the partial-moment framework developed in earlier chapters.
+ +Recall from Chapter 5 that variance decomposes into directional components: + +\[ +\text{Var}(X) = U_2(\mu_X; X) + L_2(\mu_X; X), +\] + +where + +\[ +U_2(\mu_X; X) = E[(X - \mu_X)_+^2] +\quad \text{and} \quad +L_2(\mu_X; X) = E[(\mu_X - X)_+^2]. +\] + +Recursive joint splitting applies this logic at the level of the **conditional distribution within each region**. + +Each split separates observations into four directional groups defined by their co-partial-moment quadrant membership — the same four regions introduced in Chapter 10: + +- **CoUPM**: above \(\bar{X}_R\) and above \(\bar{Y}_R\), +- **CoLPM**: below \(\bar{X}_R\) and below \(\bar{Y}_R\), +- **DLPM**: above \(\bar{X}_R\) and below \(\bar{Y}_R\), +- **DUPM**: below \(\bar{X}_R\) and above \(\bar{Y}_R\). + +The joint mean \((\bar{X}_R, \bar{Y}_R)\) is the unique point at which the first-order partial moments balance on both axes. Splitting at this point therefore exactly decomposes the local joint variability into its four directional components. + +By recursively applying this decomposition, the algorithm parses the joint distribution into regions of progressively refined directional structure. +Areas where the conditional mean changes most rapidly — where the CoUPM and CoLPM quadrants are most unequal in their response means — are split more frequently as \(O\) increases, because the local means shift and the quadrant boundaries realign at each level. + +The resulting partition structure therefore reflects the geometry of the conditional mean surface, not an externally imposed grid. + +--- + +## Multivariate Predictors + +For a predictor vector \(X_i \in \mathbb{R}^d\), the recursive mean-split procedure extends through an architecture designed specifically to mitigate the curse of dimensionality. This architecture is described in full in Chapter 21; its foundations are laid here. 
+ +### Per-regressor partitioning against the response + +The key structural decision in the multivariate case is that **each predictor is partitioned independently against the response**, rather than partitioning the joint predictor space. + +For predictor \(j \in \{1, \dots, d\}\), recursive mean splitting is applied to the pairs \((X_i^{(j)}, Y_i)\) to produce a set of \(K_j\) regression points for that predictor: local conditional means summarizing the relationship between \(X^{(j)}\) and \(Y\). Each per-regressor partition is governed by the same splitting rules described above. + +This is not joint partitioning of the full \(d\)-dimensional predictor space. It is a collection of univariate partitions, each anchored to the response. The important consequence is that the number of candidate regression points grows as \(\sum_j K_j\), linearly in the number of regressors, rather than as \(\prod_j K_j\), which would grow exponentially and reproduce the curse. + +### Regression point matrix + +The regression points from all \(d\) per-regressor partitions are assembled into a **regression point matrix (RPM)**. Each row of the RPM corresponds to one occupied joint region in the multivariate structure; the columns record the local mean of each predictor within that region, and a final column records the corresponding local mean response. + +For a new observation \(x^* \in \mathbb{R}^d\), prediction proceeds by identifying the rows of the RPM closest to \(x^*\) across the predictor columns, then returning the distance-weighted average of the corresponding local response means. This is a nearest-neighbor search over regression points (compressed, denoised local conditional means) rather than over the \(n\) raw observations. + +The curse of dimensionality is mitigated in two ways simultaneously: + +1. **Search space compression.** The RPM has far fewer rows than \(n\). 
Each row is a local conditional mean derived from a cluster of observations, not a raw data point. +2. **Noise reduction before search.** Each regression point has already been smoothed through local averaging, so the candidates in the nearest-neighbor search carry substantially less noise than raw observations. + +The full multivariate prediction architecture — including dependence-adaptive neighbor count and alternative synthetic-predictor dimension reduction — is developed in Chapter 21. + +--- + +## Estimation Workflow + +In practice, recursive mean-split estimation follows a direct workflow. + +### Step 1 — Data preparation + +Collect paired observations + +\[ +(X_i, Y_i), \quad i = 1, \dots, n, +\] + +with \(X_i \in \mathbb{R}^d\) for the multivariate case. + +### Step 2 — Select partitioning mode + +Choose joint partitioning (using both \(X\) and \(Y\) means) or \(X\)-only partitioning (using only the predictor mean), depending on the application. + +### Step 3 — Recursive mean splitting + +Iteratively apply the mean-split rule: + +1. Compute the local conditional means within the region. +2. Divide observations according to their quadrant membership. +3. Assign quadrant identifications and record regression points. +4. Repeat within each subregion. + +### Step 4 — Stopping rule + +Terminate splitting when any region falls below the minimum occupancy threshold or when the order \(O\) reaches its maximum. + +### Step 5 — Prediction + +For a new predictor value \(x\): + +1. Identify the region \(A_n(x)\) containing \(x\) (univariate) or the matching rows in the RPM (multivariate). +2. Return the local response mean or the distance-weighted average of the nearest regression-point means as the predicted value: + +\[ +\hat{f}_n(x) += +\frac{1}{N_{n,x}} +\sum_{i : X_i \in A_n(x)} Y_i. +\] + +### Step 6 — Curve fitting (optional) + +The regression points at each order can be connected by **linear segments** to produce a piecewise-linear curve. 
This provides a continuous interpolating curve between partition means, supports well-defined interpolation and extrapolation, and reduces variance relative to a pure step-function estimator. + +--- + +## Limit Condition and Perfect Fit + +A notable property of the recursive mean-split estimator is its **finite-order limit condition**. + +As the order \(O\) increases, the number of regions grows exponentially: up to \(4^{O-1}\) regions in the joint case and up to \(2^O\) in the \(X\)-only case. At a finite order \(O^*\), every observation occupies its own region and becomes its own regression point. At this limit: + +\[ +\hat{f}_n(X_i) = Y_i \quad \text{for all } i = 1, \dots, n, +\] + +and the \(R^2\) of the in-sample fit equals 1. + +This property distinguishes NNS from kernel regression, which cannot achieve an exact fit at all observations simultaneously due to the continuous support of the kernel function. The limit condition is reached in finite steps, not asymptotically. + +In practice, \(O\) is selected to balance fit quality against overfitting. Larger \(O\) reduces bias but increases variance; the appropriate order is determined by the signal-to-noise ratio of the data. The NNS dependence measure provides an objective criterion for this selection. + +--- + +## Properties of the Recursive Mean-Split Estimator + +The recursive mean-split estimator possesses several important characteristics. + +### Consistency by class inheritance + +The estimator belongs to the class of data-adaptive partition estimators studied by Stone (1977) and extended by Lugosi and Nobel (1996) and Györfi et al. (2002). Under standard conditions (shrinking cell diameter and growing cell occupancy) the class is universally consistent. The recursive mean-split estimator satisfies these conditions under the occupancy and order growth rules used in practice, and consistency is therefore established by class membership rather than requiring a separate proof from first principles. 
+ +### Nonparametric flexibility + +No functional form is imposed on the relationship between \(X\) and \(Y\). +The estimator represents highly nonlinear relationships without specifying a parametric family. + +### Grounding in partial moments + +Region boundaries are defined by the same partial-moment structure — upper and lower deviations relative to a benchmark — that underlies the directional statistics developed throughout this book. +The estimator is therefore not simply an ad hoc tree method but an application of the NNS partial-moment framework to the estimation of conditional expectations. + +### Data-adaptive partitions + +Regions are determined entirely by the data. +Areas where the conditional mean changes rapidly receive finer partitions as \(O\) increases. + +### Implicit smoothing + +The order \(O\) governs the effective smoothing scale. +Chapter 19 formalizes this by interpreting the partition cell diameter as a **dynamic bandwidth**, connecting the estimator to classical nonparametric smoothing theory. + +### Piecewise-linear interpolation + +Connecting regression points with line segments produces an interpolating surface that is stable, interpretable, and well-behaved for interpolation and extrapolation — properties that kernel-based nonparametric regressions do not generally share. + +### Finite limit condition + +At a finite order \(O^*\), every observation occupies its own region and the estimator achieves a perfect in-sample fit — \(\hat{f}_n(X_i) = Y_i\) for all \(i\). This property is not available to kernel or polynomial methods, which are intrinsically approximate. It provides a well-defined upper bound on approximation error and a principled spectrum of fits indexed by \(O\), from a single linear approximation at \(O = 1\) to exact interpolation at \(O = O^*\). + +--- + +## Relationship to Other Methods + +The recursive mean-split estimator is related to several classical approaches but differs from each in important respects. 
+ +### CART and decision trees + +CART algorithms select splits greedily to minimize impurity or squared prediction error, then apply pruning penalties to prevent overfitting. + +Recursive mean splitting differs in two key respects. First, the splitting criterion is the **conditional mean** of the data within the region, not an optimized impurity measure. Second, overfitting is controlled through the minimum occupancy threshold and the order parameter rather than through post-hoc pruning. The resulting partition structure follows the conditional mean geometry rather than an impurity landscape (Breiman et al., 1984). + +### Kernel regression + +Kernel estimators smooth data using a bandwidth parameter selected externally, often by cross-validation. + +Recursive mean-split estimation avoids explicit bandwidth selection: smoothing arises implicitly through the order parameter and the partition cell size. Chapter 19 makes this connection precise by showing that the cell diameter functions as a data-adaptive bandwidth. + +A further distinction concerns exactness of fit. Because all kernel functions are continuous probability distributions, the probability that a kernel function assigns positive mass to any exact observed value is zero. Consequently, the kernel estimate at any point is a weighted average over the surrounding distribution and can never exactly equal any single observation. The recursive mean-split estimator does not share this limitation: at finite order \(O^*\), it achieves an exact fit at every observed point simultaneously. + +### \(k\)-means clustering + +The NNS partition objective — minimizing within-quadrant sum of squares — is equivalent to the \(k\)-means objective when the number of clusters equals the number of partition regions. The key difference is that NNS does not require a pre-specified \(k\): the number of regions is determined by the order parameter \(O\) and the data, not by an externally supplied cluster count. 
+ +### Fixed-grid estimators + +Uniform grids divide the predictor space into equally sized cells. +Mean-split partitions are data-driven and adapt to the density and conditional mean geometry of the sample. + +--- + +## Summary + +Recursive mean-split estimation provides a flexible, data-adaptive, nonparametric method for estimating conditional expectations. + +The key ideas developed in this chapter are: + +- **Consistency by class membership.** The recursive mean-split estimator belongs to the class of data-adaptive partition estimators, for which universal consistency under shrinking-diameter and growing-occupancy conditions has been established in the literature (Stone 1977; Lugosi and Nobel 1996; Györfi et al. 2002). The estimator satisfies both conditions under standard stopping rules, and its consistency is therefore inherited directly from this class. +- Partition-based estimation of conditional means, using the local sample average within each region as the estimator. +- Two splitting modes: joint \((X, Y)\) partitioning, which creates four partial-moment quadrants at each split, and \(X\)-only partitioning, which creates two predictor-interval subregions. +- The joint splitting rule is grounded in the co-partial-moment structure of the NNS framework: region boundaries coincide with the joint conditional means, and the four quadrants correspond exactly to the CoUPM, CoLPM, DLPM, and DUPM regions. +- An order parameter \(O\) governs partition depth; at a finite \(O^*\), the estimator achieves a perfect in-sample fit. +- Multivariate predictors are handled via per-regressor partitioning against the response, which produces a regression point matrix (RPM) of local conditional means. The search space for prediction grows linearly in the number of regressors, not exponentially, substantially mitigating the curse of dimensionality. 
+- Prediction connects regression points with local averages or piecewise-linear segments in the univariate case, and with distance-weighted nearest-neighbor averaging over the RPM in the multivariate case. + +Chapter 19 interprets the partition cell diameter as a **dynamic bandwidth**, linking the estimator to classical kernel smoothing theory while preserving its data-adaptive character. + +Chapter 21 develops the multivariate regression architecture in full, showing how per-regressor partitioning with response-anchored centroids — combined with dependence-adaptive neighbor count — produces a nearest-neighbor prediction method over a compressed, denoised geometry that handles high-dimensional predictors substantially better than raw-observation kNN. + +--- + +## References + +- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620. + +- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706. + +- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer. + +- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). *Classification and Regression Trees*. Wadsworth. + +- Vinod, H. D., & Viole, F. (2018). Clustering and curve fitting by line segments. *Preprints*, 2018010090. https://doi.org/10.20944/preprints201801.0090.v1 + +- Viole, F., & Nawrocki, D. (2012). Deriving nonlinear correlation coefficients from partial moments. *SSRN eLibrary*. https://doi.org/10.2139/ssrn.2148522 + +- Viole, F. (2016). NNS: Nonlinear nonparametric statistics. R package. 
https://cran.r-project.org/package=NNS diff --git a/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd b/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd new file mode 100644 index 0000000..bd06939 --- /dev/null +++ b/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd @@ -0,0 +1,588 @@ +# Dynamic Bandwidth Interpretation + +Chapter 18 established the recursive mean-split estimator as a **partition-based nonparametric regression method** and showed that it is consistent for the true conditional mean function — not as a novel result requiring special proof, but as a direct consequence of belonging to the well-characterized class of data-adaptive partition estimators studied by Stone (1977), Lugosi and Nobel (1996), and Györfi et al. (2002). The shrinking-diameter and growing-occupancy conditions satisfied by recursive mean splitting are precisely the conditions that class-level consistency theorems require. + +A further interpretation now comes into focus: + +**the partition cell itself plays the role of a bandwidth.** + +In classical nonparametric estimation, smoothing is controlled by a user-chosen parameter. +In kernel regression this parameter is the bandwidth \(h\). +In histogram methods it is the bin width. +In splines it is the smoothing penalty. + +These quantities determine the scale over which observations are averaged. + +The recursive mean-split estimator also averages observations locally. But it does so without requiring the analyst to choose an external smoothing scale in the same way. Instead, the smoothing scale is induced by the partition: + +- large cells produce coarse smoothing, +- small cells produce fine smoothing. + +This chapter interprets the partition diameter as a **dynamic, stochastic, location-dependent bandwidth**, and uses that interpretation to connect recursive mean-split estimation to the broader theory of nonparametric smoothing. 
The consistency conditions already established in Chapter 18 are then re-read through this bandwidth lens — showing that they are the direct analogues of the shrinking-bandwidth conditions in classical kernel theory. + +--- + +## Bandwidth in Classical Nonparametric Estimation + +The notion of bandwidth is central to classical nonparametric methods. + +In kernel regression, the Nadaraya–Watson estimator takes the form + +\[ +\hat f_h(x) += +\frac{\sum_{i=1}^n K\!\left(\frac{x-X_i}{h}\right)Y_i} +{\sum_{i=1}^n K\!\left(\frac{x-X_i}{h}\right)}. +\] + +Here: + +- \(K(\cdot)\) is a kernel function, +- \(h>0\) is the bandwidth. + +The bandwidth determines how quickly weights decay as observations move away from \(x\). + +If \(h\) is small, only very nearby observations influence the estimate. +If \(h\) is large, distant observations receive substantial weight. + +Thus bandwidth governs the bias–variance tradeoff: + +- **small bandwidth** \(\rightarrow\) low bias, high variance, +- **large bandwidth** \(\rightarrow\) high bias, low variance. + +The same logic appears in histogram regression and fixed-grid estimators. +There the analogue of \(h\) is the cell width. + +Although classical methods differ in form, they share a common structure: + +**they smooth by averaging observations over neighborhoods whose size is controlled by a tuning parameter.** + +This parameter is usually chosen by cross-validation, plug-in rules, or asymptotic heuristics. + +That choice is often difficult, unstable, and highly consequential. + +--- + +## Local Averaging in the Recursive Mean-Split Estimator + +Recall from Chapter 18 that the recursive mean-split estimator has the form + +\[ +\hat f_n(x) += +\frac{1}{N_{n,x}} +\sum_{i:X_i\in A_n(x)} Y_i, +\] + +where + +- \(A_n(x)\) is the terminal cell containing \(x\), +- \(N_{n,x}\) is the number of observations in that cell. + +This is a **local average**. 
The only difference from a kernel or histogram estimator is how the local neighborhood is defined. + +In kernel regression, the neighborhood is determined by a distance-weighting rule around \(x\). +In recursive mean-split estimation, the neighborhood is the partition cell \(A_n(x)\). + +So the estimator already has the essential structure of a smoother: + +1. identify a local region around \(x\), +2. average the responses inside that region. + +The smoothing scale is therefore determined by the **size of the cell containing \(x\)**. + +This suggests the natural bandwidth analogue + +\[ +h_n(x) := \operatorname{diam}(A_n(x)), +\] + +where \(\operatorname{diam}(A_n(x))\) denotes the diameter of the cell. + +This quantity depends on + +- the sample, +- the recursive splitting path, +- the predictor location \(x\), +- the stopping rule. + +It is therefore **data-adaptive and location-specific**. + +--- + +## Partition Diameter as Stochastic Bandwidth + +The interpretation + +\[ +h_n(x) := \operatorname{diam}(A_n(x)) +\] + +makes the connection precise. + +The recursive mean-split estimator can be viewed as a **local-constant smoother with random bandwidth \(h_n(x)\)**. + +This bandwidth differs from the classical kernel bandwidth in three important ways. + +### It is stochastic + +The cells are determined by the observed sample, so \(h_n(x)\) is random. + +### It is location-dependent + +Different regions of the predictor space can have different cell sizes: + +\[ +h_n(x_1) \neq h_n(x_2) +\] + +in general. + +### It is endogenous + +The bandwidth is not imposed externally. +It emerges from the recursive mean-split geometry itself. + +This is the key conceptual shift. + +Classical bandwidth methods ask: + +**What smoothing scale should we choose?** + +The recursive mean-split framework instead asks: + +**What smoothing scale does the data imply through repeated mean-based partitioning?** + +The answer is encoded in the partition diameter. 
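
The definition \(h_n(x) = \operatorname{diam}(A_n(x))\) can be made concrete with a small base R sketch. It assumes \(X\)-only mean splitting and approximates each cell's diameter by the range of the observations it contains. The function name `cell_bandwidth()` and the toy sample are illustrative assumptions, not part of the NNS package.

```r
# Hypothetical helper: empirical bandwidth h_n(x) after `order` X-only mean splits.
cell_bandwidth <- function(x, order = 1) {
  id <- rep("q", length(x))                 # one cell containing every observation
  for (o in seq_len(order)) {
    for (r in unique(id)) {                 # cells present at the start of this pass
      idx <- which(id == r)
      if (length(unique(x[idx])) < 2) next  # nothing left to split
      id[idx] <- paste0(r, ifelse(x[idx] <= mean(x[idx]), 1, 2))
    }
  }
  # Cell diameter, approximated by the range of the observations in the cell
  h <- ave(x, id, FUN = function(v) diff(range(v)))
  data.frame(x = x, cell = id, h = h)
}

# Toy sample (assumed): dense near zero, sparse in the right tail
x <- c(0, 0.1, 0.2, 0.3, 0.5, 1, 2, 4, 8)
bw <- cell_bandwidth(x, order = 1)
bw
```

In this sample a single split already yields two different bandwidths: the dense left cell has \(h = 1\) while the sparse right cell has \(h = 6\), so the induced bandwidth is location-dependent without any analyst-chosen smoothing scale.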
+ +--- + +## Adaptive Smoothing Mechanisms + +Why does the induced bandwidth adapt to data structure? + +Because the recursive mean-split rule repeatedly partitions the data around local means. + +In regions where the conditional mean function changes rapidly, successive splits generate finer cells. +In regions where the conditional mean is comparatively flat, fewer effective refinements are needed and cells remain larger. + +Thus the smoothing scale contracts more aggressively where the signal is more complex. + +This adaptivity is the geometric content of the estimator. + +To see the intuition, consider three stylized regions of a regression surface. + +### Flat region + +Suppose \(f(x)\) is nearly constant on some interval. +A large cell still produces a good approximation because averaging over that region introduces little bias. + +### Moderately curved region + +If \(f(x)\) changes gradually, recursive splitting creates smaller cells so that local averages track the curvature more closely. + +### Sharp structural change + +If \(f(x)\) changes abruptly, the partition must refine more aggressively to avoid pooling observations across substantively different conditional means. + +So the induced bandwidth is not globally fixed. +It contracts where localization is most needed. + +This is exactly what one wants from a nonparametric smoother. + +--- + +## Consistency Conditions Re-Read Through the Bandwidth Lens + +Chapter 18 established that consistency of the partition estimator class requires two conditions: + +\[ +\operatorname{diam}(A_n(x)) \to 0 +\qquad\text{and}\qquad +N_{n,x}\to\infty. +\] + +Under the bandwidth interpretation, these become directly analogous to the standard kernel consistency conditions + +\[ +h_n(x)\to 0 +\qquad\text{and}\qquad +n_{\text{eff}}(x)\to\infty. +\] + +The first condition says the local neighborhood must shrink, so that the estimator becomes localized and bias vanishes. 
+ +The second says the number of observations used in the local average must grow, so that variance vanishes. + +Thus the recursive mean-split estimator satisfies the same asymptotic logic as classical nonparametric smoothing, with the neighborhood size determined by partition geometry rather than by analyst choice. The two frameworks are not merely analogous: the partition estimator consistency result is the same theorem, stated in partition language rather than kernel language. + +In this sense, Chapter 18 can be re-read entirely through the bandwidth lens: + +- shrinking cell diameter = shrinking bandwidth, +- growing occupancy = growing effective local sample size. + +The bias–variance decomposition becomes + +\[ +\hat f_n(x)-f(x) += +\bigl(\hat f_n(x)-\bar f_n(x)\bigr) ++ +\bigl(\bar f_n(x)-f(x)\bigr), +\] + +where \(\bar f_n(x) = E[Y\mid X\in A_n(x)]\) is the population conditional mean of \(Y\) over the cell \(A_n(x)\), taking the realized partition as given. The first term is the variance component, controlled by occupancy; the second is the bias component, controlled by the dynamic bandwidth \(h_n(x)\). + +--- + +## Local Averaging Interpretation + +The bandwidth interpretation also clarifies what the estimator is doing pointwise. + +For any \(x\), the recursive mean-split estimator averages over the region \(A_n(x)\): + +\[ +\hat f_n(x) += +E_n[Y\mid X\in A_n(x)], +\] + +where \(E_n\) denotes the empirical average. + +Thus \(\hat f_n(x)\) is the empirical conditional mean over a neighborhood whose diameter is \(h_n(x)\). + +This is exactly analogous to a local averaging estimator with adaptive window width. + +The distinction is that the "window" need not be symmetric, fixed-width, or Euclidean in the simplistic kernel sense. +It is determined by recursive partition membership. + +This yields two useful perspectives. + +### Piecewise-constant view + +The estimator is constant within each terminal cell. 
+So the partition defines a locally constant regression surface. + +### Piecewise-linear view + +As Chapter 18 emphasized, connecting regression points by line segments yields a piecewise-linear representation. 
+This can be interpreted as smoothing the cellwise local averages into a continuous interpolation while preserving the same adaptive partition geometry. + +In either case, the underlying localization scale is still determined by the cell diameter. + +--- + +## Comparison with Kernel Regression + +Kernel regression and recursive mean-split estimation share an essential principle: + +**both estimate \(f(x)\) by averaging responses from a local neighborhood around \(x\).** + +But they differ in how that neighborhood is defined. + +### Kernel regression + +Neighborhood influence is determined by weights + +\[ +K\!\left(\frac{x-X_i}{h}\right), +\] + +with a user-specified bandwidth \(h\). + +### Recursive mean-split estimation + +Neighborhood influence is determined by membership in the terminal cell \(A_n(x)\), whose effective width is + +\[ +h_n(x)=\operatorname{diam}(A_n(x)). +\] + +The contrast is important. + +#### External versus endogenous smoothing + +Kernel regression requires an externally chosen bandwidth. + +Recursive mean splitting generates its bandwidth from the data. + +#### Global versus local scale + +Kernel bandwidth is often global, unless one explicitly uses variable-bandwidth methods. + +Recursive mean splitting is inherently variable-bandwidth because cell sizes differ across locations. + +#### Weight decay versus region membership + +Kernel methods use continuously decaying weights. +Partition methods use discrete inclusion within a cell. + +#### Exact fit limit + +Kernel estimators remain weighted averages over continuous neighborhoods and do not achieve exact interpolation at all observed points simultaneously. + +Recursive mean-split estimation reaches a finite order \(O^*\) at which every point forms its own region and + +\[ +\hat f_n(X_i)=Y_i +\] + +for all observed \(i\). + +Thus the recursive mean-split estimator spans a spectrum from coarse smoothing to exact interpolation through the order parameter and occupancy rule. 
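
The contrast between weight decay and region membership can be illustrated with a short base R sketch. The helper names `nw_estimate()` and `cell_estimate()`, the Gaussian kernel, the bandwidth `h = 1`, and the toy data are all assumptions made for illustration; none of them comes from the NNS package.

```r
# Nadaraya-Watson estimate at x0 with a Gaussian kernel and fixed bandwidth h
nw_estimate <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)          # continuously decaying weights
  sum(w * y) / sum(w)
}

# Cell-membership estimate: average of y over the cell containing x0.
# `breaks` holds the interior cell boundaries; a point equal to a boundary
# falls in the left cell, matching the X_i <= mean splitting rule.
cell_estimate <- function(x0, x, y, breaks) {
  cell <- findInterval(x, breaks, left.open = TRUE)
  mean(y[cell == findInterval(x0, breaks, left.open = TRUE)])
}

x <- c(1, 2, 4, 7, 8)               # toy data (assumed), sorted in x
y <- c(1, 3, 2, 6, 5)

# Coarse partition: a single split at the predictor mean
coarse_fit <- vapply(x, function(x0) cell_estimate(x0, x, y, mean(x)), numeric(1))

# Finest partition: boundaries at midpoints, so every point has its own cell
mids <- (x[-1] + x[-length(x)]) / 2
exact_fit <- vapply(x, function(x0) cell_estimate(x0, x, y, mids), numeric(1))

# Kernel fit evaluated at the observed points
kernel_fit <- vapply(x, function(x0) nw_estimate(x0, x, y, h = 1), numeric(1))

rbind(y, coarse_fit, exact_fit, kernel_fit)
```

In this example the finest partition reproduces every \(Y_i\), while the kernel fit with this bandwidth remains a weighted average of neighboring responses at each observed point.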
+ +--- + +## Comparison with Fixed-Grid Estimators + +Fixed-grid partition methods divide the predictor space into prespecified intervals or cells. + +These methods have a bandwidth analogue as well: the grid width. + +But they suffer from a major limitation: + +**the grid is chosen before seeing the geometry of the data.** + +As a result: + +- dense regions may be oversmoothed, +- sparse regions may be undersmoothed, +- boundaries may cut across important nonlinear structure. + +Recursive mean-split estimation avoids this problem by generating the partition from the observed sample. + +So while both methods can be written as local averages over cells, only the recursive mean-split approach makes the bandwidth + +- stochastic, +- endogenous, +- responsive to conditional mean geometry. + +This is why the phrase **dynamic bandwidth** is appropriate: the smoothing scale changes with the data, the location, and the recursive structure of the estimator. + +--- + +## Comparison with CART and Tree-Based Methods + +CART and related tree methods also induce adaptive partitions. + +This makes them the closest classical relatives of recursive mean-split estimation. + +But the source of adaptivity differs. + +### CART + +Splits are selected greedily to optimize impurity reduction or squared-error reduction, usually followed by pruning or complexity penalties. + +### Recursive mean splitting + +Splits are anchored to the **local mean structure** of the data itself. + +This distinction matters because it changes the meaning of the induced bandwidth. + +In CART, cell size reflects the outcome of a greedy optimization path under a chosen impurity criterion. + +In recursive mean splitting, cell size reflects repeated partitioning around benchmark-relative local means. The bandwidth is therefore connected to the same directional benchmark logic developed throughout the book. 
+ +So although both methods are adaptive partition estimators — and both inherit their consistency from the same class-level results — the recursive mean-split bandwidth is structurally tied to the directional framework rather than merely to greedy optimization. + +--- + +## A Simple Illustrative Example + +Consider the univariate sample + +\[ +(X,Y)= +(1,2), (2,3), (3,3), (6,8), (7,9), (8,9). +\] + +Suppose the first split occurs at the predictor mean + +\[ +\bar X = \frac{1+2+3+6+7+8}{6}=4.5. +\] + +This creates two predictor regions: + +- left cell: \(X \le 4.5\), +- right cell: \(X > 4.5\). + +The cellwise means are + +\[ +\hat f_{\text{left}} = \frac{2+3+3}{3}=\frac{8}{3}, +\qquad +\hat f_{\text{right}} = \frac{8+9+9}{3}=\frac{26}{3}. +\] + +At this stage, the effective bandwidths are roughly the cell diameters: + +\[ +h_{\text{left}} \approx 3-1 = 2, +\qquad +h_{\text{right}} \approx 8-6 = 2. +\] + +So both regions use a relatively coarse smoothing scale. + +Now suppose the right cell is split again because its internal structure warrants further refinement. +Then two smaller subcells are created, each with smaller diameter. Their corresponding bandwidths contract. + +The estimate in that region becomes more local. + +This simple example illustrates the principle: + +**every additional split reduces the effective bandwidth in the affected region.** + +Unlike a kernel estimator, which changes smoothness by altering a global numeric parameter \(h\), the recursive mean-split estimator changes smoothness by refining the partition itself. + +--- + +## The Order Parameter as Global Smoothing Control + +Although the bandwidth is local and stochastic, the order parameter \(O\) still plays an important global role. + +Increasing \(O\): + +- increases the number of potential regions, +- decreases typical cell diameter, +- reduces bias, +- increases variance, +- pushes the estimator toward exact interpolation. 
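
The six-point sample from the illustrative example makes these effects easy to check numerically. The base R sketch below assumes \(X\)-only mean splitting with no occupancy threshold, and again measures each cell's diameter by the range of its observations; `mean_split_cells()` is a hypothetical helper, not an NNS function.

```r
x <- c(1, 2, 3, 6, 7, 8)
y <- c(2, 3, 3, 8, 9, 9)

# Hypothetical helper: cell labels after `order` X-only mean splits
mean_split_cells <- function(x, order) {
  id <- rep("q", length(x))
  for (o in seq_len(order)) {
    for (r in unique(id)) {                 # cells present at the start of this pass
      idx <- which(id == r)
      if (length(unique(x[idx])) < 2) next  # singleton cells cannot be split
      id[idx] <- paste0(r, ifelse(x[idx] <= mean(x[idx]), 1, 2))
    }
  }
  id
}

summary_by_order <- lapply(1:3, function(o) {
  id <- mean_split_cells(x, o)
  list(
    fitted = ave(y, id),                                         # cell means of Y
    max_h  = max(ave(x, id, FUN = function(v) diff(range(v))))   # widest cell
  )
})

sapply(summary_by_order, function(s) s$max_h)  # widest bandwidth at orders 1, 2, 3
summary_by_order[[1]]$fitted                   # 8/3 on the left cell, 26/3 on the right
summary_by_order[[3]]$fitted                   # equals y: every point is its own cell
```

For this sample the widest cell shrinks from 2 to 1 to 0 as the order goes from 1 to 3, and at order 3 the fitted values coincide with the observed responses.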
+ +So \(O\) functions as a global control on the *capacity* of the partition, while the realized bandwidths \(h_n(x)\) provide the local smoothing scales. + +This is an important distinction. + +- \(O\) is not itself the bandwidth. +- Rather, \(O\) regulates how small the bandwidths are allowed to become. + +In that sense, the recursive mean-split estimator combines + +- a **global refinement control** through \(O\), and +- **local adaptive smoothing** through \(h_n(x)=\operatorname{diam}(A_n(x))\). + +In the default implementation, this global control is deployed locally as well: when `order = NULL`, each regressor receives its own effective order based on its directional dependence strength with the response. Bandwidth therefore becomes not only stochastic and location-dependent, but also **signal-dependent**, allocating finer resolution where directional dependence is strongest and broader smoothing where evidence is weaker. + +This dual structure helps explain why the method can be both flexible and interpretable. + +--- + +## Multivariate Interpretation + +The bandwidth interpretation extends naturally to multivariate predictors, and here it connects to the key architectural feature introduced in Chapter 18 and developed fully in Chapter 21. + +In the NNS multivariate setting, each predictor is partitioned independently against the response, rather than partitioning the joint predictor space. The per-regressor bandwidths are therefore: + +\[ +h_n^{(j)}(x^{(j)}) := \operatorname{diam}(A_n^{(j)}(x^{(j)})), \quad j = 1, \dots, d, +\] + +where \(A_n^{(j)}\) is the terminal cell for predictor \(j\) in its own univariate partition against \(Y\). + +These per-regressor bandwidths are each data-adaptive and stochastic. But crucially, they do not compound exponentially as \(d\) grows, because the partition is not being formed in the joint \(d\)-dimensional predictor space. 
Each regressor's smoothing scale is determined independently by its own relationship with the response. + +This is the bandwidth-level expression of the curse-of-dimensionality mitigation described in Chapter 18: by partitioning each regressor against the response separately, the effective smoothing scale for each predictor dimension is governed by the univariate data density in that dimension — not by the joint density in \(\mathbb{R}^d\), which deteriorates rapidly with dimension. + +The resulting regression point matrix then supports a nearest-neighbor prediction step in a space whose size grows linearly in \(d\), with each candidate neighbor already denoised through local averaging. The per-regressor dynamic bandwidths make this compression principled: each bandwidth has already adapted to local signal structure before the joint prediction step begins. + +--- + +## Advantages of the Dynamic Bandwidth View + +Interpreting recursive mean-split estimation through bandwidth offers several conceptual advantages. + +### It links NNS to classical smoothing theory + +The estimator is not an isolated procedure. +It belongs to the same family of local averaging methods as kernels and histograms, and its consistency is the class-level result of Stone (1977) re-expressed in partition geometry. + +### It clarifies the consistency proof + +The shrinking-diameter condition from Chapter 18 is simply the shrinking-bandwidth condition in disguise. No additional theoretical machinery is needed; the connection is structural. + +### It explains adaptivity + +Because the bandwidth is generated by the data, smoothing automatically varies across regions. + +### It avoids arbitrary external tuning in the classical sense + +Rather than choosing a bandwidth directly, the analyst controls partition refinement and occupancy, while the local smoothing scale emerges endogenously. 
+ +### It preserves interpretability + +Each bandwidth corresponds to an actual region of the observed data, not merely to a tuning number in a weighting formula. + +### It clarifies the multivariate architecture + +Per-regressor bandwidths make explicit why the NNS multivariate approach avoids the worst consequences of the curse of dimensionality: each dimension's smoothing scale is determined by its own data density and its own relationship with the response, rather than by the joint density of the full predictor space. + +--- + +## Structural Interpretation within NNS + +This chapter completes an important conceptual arc in the book. + +Earlier chapters showed that directional deviation operators generate: + +- cumulative distribution functions, +- classical moments, +- nonlinear dependence measures, +- benchmark-relative probability statements. + +Chapter 18 then showed that recursive mean splitting uses benchmark-relative geometry to define estimation regions and produces a consistent estimator by class membership. + +This chapter adds the final interpretation: + +**those same deviation-defined regions induce a stochastic bandwidth.** + +So the NNS estimator is not merely a partition rule. + +It is an adaptive smoothing procedure whose local scale is generated by recursive benchmark-relative decomposition — and whose consistency is not a novel claim but a direct inheritance from the established theory of partition estimators. + +The structural message is therefore unified: + +- directional deviations define probability mass, +- directional deviations decompose moments, +- directional deviations reveal dependence, +- directional deviations generate estimation regions, +- and those regions determine the local smoothing scale. + +Bandwidth, in this framework, is not an external input. + +It is an emergent property of the directional partition itself. 
+ +--- + +## Summary + +The main ideas of this chapter are: + +- Classical nonparametric methods smooth by averaging over neighborhoods whose size is controlled by a bandwidth or bin width. +- The recursive mean-split estimator is also a local averaging estimator, but its neighborhood is the terminal partition cell \(A_n(x)\). +- The natural bandwidth analogue is the cell diameter + +\[ +h_n(x)=\operatorname{diam}(A_n(x)). +\] + +- This bandwidth is **stochastic, location-dependent, and endogenous**. +- Regions where the conditional mean varies more sharply receive finer partitions and therefore smaller effective bandwidths. +- The consistency conditions from Chapter 18 — shrinking diameter and growing occupancy — are exactly the shrinking-bandwidth and growing-local-sample-size conditions of classical nonparametric kernel theory. Consistency is inherited from the partition estimator class; the bandwidth interpretation makes this inheritance explicit. +- In the multivariate case, per-regressor bandwidths are determined independently for each predictor's relationship with the response, avoiding the exponential deterioration of joint bandwidth in high dimensions and providing the bandwidth-level foundation for the curse-of-dimensionality mitigation developed in Chapter 21. + +The next chapter turns from the smoothing interpretation of recursive mean splitting to one of its major practical consequences: **clustering**. If recursive partitioning can define local estimation neighborhoods, it can also define groups of structurally similar observations. + +--- + +## References + +- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620. + +- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706. + +- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer. 
+ +- Wand, M. P., & Jones, M. C. (1995). *Kernel Smoothing*. Chapman and Hall. + +- Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman and Hall. diff --git a/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd b/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd new file mode 100644 index 0000000..8f93bc3 --- /dev/null +++ b/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd @@ -0,0 +1,454 @@ +# Synthetic Data and Maximum Entropy Bootstrap + +The previous chapter showed that threshold selection, directional probability bounds, and finite-sample stability are linked problems. These concerns naturally motivate resampling procedures that preserve empirical structure while generating synthetic realizations for robustness analysis. This chapter therefore turns to maximum entropy bootstrap and related synthetic-data methods as computational tools for the estimation and threshold-analysis problems developed in Chapter 16. + +In many practical situations, however, analysts require not only statistical summaries of existing data but also **synthetic data generation**. Synthetic data can be used for simulation, risk analysis, forecasting evaluation, and Monte Carlo experiments. + +A common approach for generating synthetic datasets is **bootstrap resampling**, which repeatedly samples from the observed data. While useful, classical bootstrap procedures assume independence or rely on model-based adjustments to accommodate dependence. + +Time-series data present a particular challenge because **temporal dependence must be preserved** for synthetic samples to remain realistic. + +The **maximum entropy bootstrap (ME bootstrap)** provides a solution. 
By constructing bootstrap samples that satisfy entropy-maximizing constraints while preserving the empirical dependence structure, the method produces synthetic time series that retain key statistical properties of the original data. + +The NNS implementation, `NNS.meboot`, extends the original `meboot` algorithm in several important directions: it allows the user to specify an **arbitrary target rank correlation** between the original and resampled series, supports multiple dependence metrics native to the NNS framework, provides fine-grained control over the trend component of synthetic series, and enables a richer class of Monte Carlo simulations than classical iid resampling permits. + +This chapter introduces bootstrap methods, explains the theory of the maximum entropy bootstrap, and demonstrates how synthetic time series can be generated and customized using `NNS.meboot`. + +--- + +## Bootstrap Methods + +The bootstrap is a general method for estimating sampling distributions by **resampling from observed data**. + +Suppose a dataset consists of observations + +$$ +x_1, x_2, \dots, x_n . +$$ + +A bootstrap sample is obtained by drawing observations **with replacement** from this set, producing + +$$ +x_1^*, x_2^*, \dots, x_n^* . +$$ + +Repeating this procedure many times produces an ensemble of synthetic datasets. Any statistic $T(X)$ can then be evaluated across the bootstrap samples to approximate its sampling distribution. + +Bootstrap procedures are widely used to estimate standard errors, confidence intervals, and bias corrections. + +However, classical bootstrap resampling assumes that observations are **independent and identically distributed (i.i.d.)**. + +For time-series data this assumption fails because observations exhibit **serial dependence**. Simple resampling destroys the temporal ordering and therefore eliminates the structure that generated the data. 
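The loss of serial dependence under iid resampling can be checked directly. The following is a minimal sketch using base R only; the AR(1) coefficient of 0.8, the series length, and the helper name `lag1_acf()` are illustrative assumptions rather than anything taken from the NNS package.

```r
set.seed(1)

# Simulate a strongly autocorrelated AR(1) series (coefficient 0.8)
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))

# Lag-1 autocorrelation: correlation between the series and its one-step lag
lag1_acf <- function(z) cor(z[-1], z[-length(z)])

# Classical iid bootstrap: draw n values with replacement, ignoring time order
x_star <- sample(x, size = n, replace = TRUE)

acf_original  <- lag1_acf(x)       # strongly positive
acf_resampled <- lag1_acf(x_star)  # near zero
c(original = acf_original, resampled = acf_resampled)
```

The resampled series keeps roughly the same marginal values but loses the lag-1 dependence, which is the failure that dependence-preserving schemes are designed to avoid.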
+ +--- + +## Limitations of Classical Bootstrap for Time Series + +Several modifications of the bootstrap have been proposed to address dependence. + +### Block Bootstrap + +The block bootstrap resamples contiguous blocks of observations rather than individual values. This partially preserves local dependence but introduces new design choices: block length, overlap structure, and edge effects. Choosing block parameters can strongly influence results. + +### Model-Based Bootstrap + +Another approach fits a parametric time-series model such as ARIMA and then simulates synthetic data from the estimated model. This method inherits the limitations of the assumed model: specification risk, distributional assumptions, and sensitivity to parameter estimation. + +Both approaches therefore require **tuning choices or parametric assumptions**. + +### The iid Correlation Constraint + +A further limitation of standard iid Monte Carlo simulation (MCS) is less obvious but practically important. When a large number of resampled series is generated by iid sampling with replacement, the Pearson correlation coefficients between those series and the original tend to cluster in a narrow range, approximately $[-0.3, 0.3]$ for a series of about 100 observations and narrower still for longer series, largely regardless of the underlying data. This occurs because the expected correlation between two independent random samples drawn from the same distribution is zero; sampling variability alone produces the observed spread, and no mechanism drives resamples toward strongly positive or negative correlation with the original. As a result, standard MCS does not provide adequate variety in simulated paths: tail scenarios and strongly correlated or anti-correlated futures are systematically underrepresented. 
+ +The maximum entropy bootstrap avoids these problems by constructing synthetic samples using **information-theoretic principles** that preserve the empirical dependence structure without specifying a parametric model, and by allowing the user to inject controlled variety through a target rank correlation parameter. + +--- + +## Maximum Entropy Principle + +The maximum entropy principle originates from information theory. + +Given incomplete information about a system, the probability distribution that best represents the current state of knowledge is the one that **maximizes entropy subject to known constraints**. + +For a discrete distribution with probabilities $p_i$, entropy is + +$$ +H = -\sum_i p_i \log p_i . +$$ + +Maximizing entropy ensures that the resulting distribution introduces **no additional assumptions beyond the constraints provided by the data**. + +In the context of bootstrap resampling, the constraints arise from the empirical properties of the observed time series. The goal is to generate synthetic sequences that satisfy these constraints while maximizing entropy, thereby producing the **least-biased distribution consistent with the data**. + +--- + +## Maximum Entropy Bootstrap for Time Series + +The maximum entropy bootstrap constructs synthetic time-series samples through a sequence of steps that preserve essential features of the observed data. + +Let the observed series be + +$$ +x_1, x_2, \dots, x_n . +$$ + +The ME bootstrap algorithm proceeds conceptually as follows. + +### Step 1: Order Statistics + +Sort the observations to obtain the ordered sample + +$$ +x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} . +$$ + +The ordering allows construction of piecewise intervals between adjacent values. + +### Step 2: Interval Construction + +Define intervals between successive order statistics and extend them at the boundaries. These intervals represent regions in which synthetic observations may occur. 
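Steps 1 and 2 can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the `NNS.meboot` source: interior limits are taken as midpoints between adjacent order statistics, the tails are extended by a trimmed mean of absolute consecutive deviations, and the trimming proportion of 0.10 is assumed here. The variable names mirror the `xx`, `z`, `dv`, and `dvtrim` elements listed in the return-object table later in this chapter.

```r
x <- c(4, 12, 36, 20, 8)   # small illustrative series, in time order
n <- length(x)

# Step 1: order statistics
xx <- sort(x)                          # 4 8 12 20 36

# Step 2: interior interval limits as midpoints of adjacent order statistics
z_interior <- (xx[-1] + xx[-n]) / 2    # 6 10 16 28

# Tail extension: trimmed mean of absolute consecutive deviations (time order)
dv     <- abs(diff(x))                 # 8 24 16 12
dvtrim <- mean(dv, trim = 0.10)        # trimming proportion is an assumption

# Full set of n + 1 interval limits, including the extended boundaries
z <- c(xx[1] - dvtrim, z_interior, xx[n] + dvtrim)
z
```

The result is a set of $n$ adjacent intervals covering a range slightly wider than the observed data, so synthetic values may fall marginally outside the sample extremes.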
+ +### Step 3: Maximum Entropy Density + +Within each interval a density function is constructed so that the resulting distribution maximizes entropy while preserving the empirical mean and variance. The interval means follow the Theil–Laitinen weighting scheme, which assigns weight $0.25$ to each neighbor and $0.50$ to the central value for interior points, with boundary adjustments at the extremes. + +### Step 4: Random Sampling + +Random values are drawn from this maximum-entropy distribution. + +### Step 5: Time Ordering + +The sampled values are reordered to match the rank structure of the original series, restoring temporal dependence. + +The resulting synthetic dataset preserves marginal distribution characteristics, dependence structure, and sample size. Because the method uses entropy maximization rather than parametric modeling, it remains **distribution-free**. + +--- + +## Dependence-Preserving Resampling + +The key innovation of the maximum entropy bootstrap is the preservation of **rank dependence**. + +Let + +$$ +R_t = \text{rank}(x_t) +$$ + +denote the rank of observation $x_t$ within the sample. After synthetic values are generated, they are assigned to time indices according to the same rank ordering: + +$$ +x_t^* = y_{(R_t)} +$$ + +where $y_{(i)}$ denotes the $i$-th ordered synthetic value. + +This mapping ensures that the **relative ordering of observations over time** matches that of the original data. As a result, autocorrelation and other dependence features remain approximately preserved. Unlike block bootstrap methods, this approach requires **no block-length tuning** and does not impose parametric assumptions. 
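The rank mapping $x_t^* = y_{(R_t)}$ is a one-line operation in base R. In the sketch below the synthetic values `y` are plain normal draws standing in for values sampled from the maximum entropy density, which is an assumption made only to keep the example self-contained.

```r
set.seed(42)

x <- cumsum(rnorm(50))                     # original series with serial dependence
y <- rnorm(50, mean = mean(x), sd = sd(x)) # stand-in for draws from the ME density

# R_t: rank of each original observation (continuous data, so no ties)
R <- rank(x)

# x*_t = y_(R_t): put the i-th smallest synthetic value at the time index
# where the i-th smallest original value occurred
x_star <- sort(y)[R]

rho_s <- cor(x, x_star, method = "spearman")
rho_s
```

Because the synthetic series occupies exactly the same rank positions as the original, the Spearman correlation between the two is one, while the values themselves differ.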
+ +### Theoretical Basis for Rank Matching + +The theoretical justification for perfect rank matching was formalized by Joag-Dev (1984), who showed that if one requires strong dependence between the original series $x_t$ and any resampled series $x_t^*$ without imposing parametric constraints, the order statistics of both series must conform with each other. This distribution-free measure of strong dependence corresponds to a Spearman rank correlation of unity. + +However, as discussed in the next section, the NNS implementation relaxes this constraint, allowing the user to specify any target rank correlation in $[-1, 1]$. + +--- + +## Arbitrary Spearman Rank Correlation: The `rho` Parameter + +A major extension of `NNS.meboot` relative to the original `meboot` package is the ability to specify an **arbitrary Spearman rank correlation** $\rho \in [-1, 1]$ between the original series and each bootstrap replicate. This is controlled by the `rho` argument. + +The standard meboot algorithm always produces resamples with $\rho = 1$ relative to the original series (perfect rank alignment). While this preserves dependence maximally, it limits the variety of simulated paths. For some applications, such as stress testing, scenario analysis, or Monte Carlo simulation, the analyst may want resamples that are weakly correlated, orthogonal, or even negatively correlated with the original. + +### How Rank Targeting Works + +For each replicate, the algorithm constructs two extreme orderings: + +- **Aligned**: synthetic values sorted to match the rank order of the original residuals (corresponds to $\rho = +1$). +- **Anti-aligned**: synthetic values sorted in the reverse rank order of the original residuals (corresponds to $\rho = -1$). + +A convex combination of these two extremes is then optimized so that the resulting series achieves the target correlation $\rho$ with the original. The optimization is performed replicate-by-replicate in residual space. 
+ +### Dependence Metric Options + +The `type` argument controls which dependence measure is targeted: + +| `type` | Measure used | +|---------------|---------------------------------------| +| `"spearman"` | Spearman rank correlation (default) | +| `"pearson"` | Pearson linear correlation | +| `"NNScor"` | NNS nonlinear correlation | +| `"NNSdep"` | NNS nonlinear dependence | + +The `"NNScor"` and `"NNSdep"` options integrate the NNS co-partial-moment framework directly into the bootstrap loop, allowing dependence targeting that captures nonlinear relationships. `"NNScor"` corresponds to the NNS nonlinear correlation coefficient introduced in Chapter 10, which detects monotonic and non-monotonic associations through co-partial moments; `"NNSdep"` corresponds to the directional dependence measure from Chapter 10, which quantifies the strength of dependence independently of its direction or functional form. + +### Simulation Evidence + +Vinod and Viole (2020) demonstrate through simulation that for OLS inference on nonstationary I(1) data, meboot-based confidence intervals with $\rho \ge 0.6$ outperform traditional OLS confidence intervals. When $\rho = 1$, the average absolute deviation from the nominal rejection rate is approximately $0.037$, far smaller than the OLS analog of approximately $0.423$. + +For random walk experiments, setting $\rho < 0.5$ transforms the resampled series into stationary I(0) series — verified by the ADF test rejecting the unit-root null. This suggests that `rho` provides a new, model-free route to stationarizing nonstationary series as an alternative to differencing or de-trending. + +--- + +## Trend Decomposition and Drift Control + +The `NNS.meboot` implementation operates on **residuals** rather than raw levels. Specifically: + +1. A linear trend is estimated from the original series via ordinary least squares. +2. The ME bootstrap resampling is applied to the residuals. +3. 
Reconstructed synthetic series are formed as $\text{baseline}_t + \text{resampled residual}_t$, where the baseline is the fitted linear trend evaluated at each time point. + +This decomposition ensures that the synthetic series inherit the correct distributional properties from the residuals while allowing independent control of the trend component. + +### Drift Arguments + +Three arguments govern the trend component: + +- **`drift = TRUE`** (default): the original series' estimated linear drift is preserved in all replicates. +- **`drift = FALSE`**: the trend is removed; replicates are centered around a flat baseline. +- **`target_drift`**: specifies an explicit drift value (e.g., a risk-free rate of return) to impose on all replicates. +- **`target_drift_scale`**: multiplies the estimated drift by a scalar, allowing proportional adjustments. + +These options are particularly useful in financial applications where synthetic return paths should reflect a specific expected return or be drift-neutral for risk attribution purposes. + +--- + +## Synthetic Time-Series Generation with `NNS.meboot` + +The NNS package provides `NNS.meboot` for generating synthetic bootstrap samples that preserve the empirical distribution and temporal structure of the original series. + +### Basic Usage +```r +library(NNS) + +# Generate 100 bootstrap replicates of AirPassengers +boots <- NNS.meboot(AirPassengers, reps = 100, rho = 1, xmin = 0) + +# Verify Spearman correlation of ensemble to original +cor(boots["ensemble", ]$ensemble, AirPassengers, method = "spearman") + +# Plot all replicates +matplot(boots["replicates", ]$replicates, type = "l") + +# Overlay ensemble mean +lines(boots["ensemble", ]$ensemble, lwd = 3) + +# Overlay original +lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red") +``` + +
    +![Figure 17.1. `NNS.meboot` replicates with ensemble mean (black) and original series (red), illustrating dependence-preserving bootstrap paths.](images/ch17_meboot_orig.png) +
    + + +### Return Object + +`NNS.meboot` returns a named list with the following elements: + +| Element | Description | +|---------------|-----------------------------------------------------------------| +| `x` | Original input data | +| `replicates` | Matrix of bootstrap replicates (rows = time, cols = replicates) | +| `ensemble` | Row mean across all replicates | +| `xx` | Sorted order statistics of residuals | +| `z` | Class interval limits | +| `dv` | Absolute consecutive deviations | +| `dvtrim` | Trimmed mean of `dv` (used for tail extension) | +| `xmin` | Effective lower bound for ensemble values | +| `xmax` | Effective upper bound for ensemble values | +| `desintxb` | Desired interval means (Theil–Laitinen) | +| `ordxx` | Rank ordering of original residuals | +| `kappa` | Scale adjustment factor (if `scl.adjustment = TRUE`) | + +### Vectorized `rho` + +The `rho` argument is vectorized, enabling a single call to produce replicates at multiple target correlations simultaneously: +```r +# Three sets of replicates: orthogonal, half-correlated, and fully correlated +boots <- NNS.meboot(AirPassengers, reps = 10, rho = c(0, 0.5, 1), xmin = 0) + +matplot(do.call(cbind, boots["replicates", ]), type = "l") +lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red") +``` + +For Monte Carlo workflows, the package also provides `NNS.MC()` as a convenience wrapper around this `NNS.meboot`-based simulation pipeline. 
+ +Similarly, `target_drift` is vectorized across drift levels while holding `rho` fixed: +```r +# Replicates with two different target drift rates, rho fixed at 0 +boots <- NNS.meboot(AirPassengers, reps = 10, rho = 0, xmin = 0, + target_drift = c(1, 7)) +matplot(do.call(cbind, boots["replicates", ]), type = "l") +lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red") +``` + +--- + +## Improved Monte Carlo Simulation + +### The Limitation of Standard iid MCS + +Traditional Monte Carlo simulation generates synthetic paths by sampling with replacement from the observed series. While easy to implement, this approach inadvertently constrains the variety of simulated paths. + +Empirically, when 10,000 iid resamples are drawn from a series of 100 observations, the Pearson correlation coefficients between each resample and the original series fall approximately within $[-0.3, 0.3]$, with a pronounced concentration near zero. The sampling standard deviation of a correlation between independent samples of length $n$ is roughly $1/\sqrt{n}$, so the range narrows further for longer series. This happens because random sampling with replacement systematically destroys correlation structure without exploring strongly positive or strongly negative paths. +```r +set.seed(12345) +xt <- rnorm(100, mean = 9, sd = 12) + +# Standard iid MCS: 10,000 replicates +X <- matrix(NA, nrow = 100, ncol = 10000) +for (i in 1:10000) { + X[, i] <- sample(xt, 100, replace = TRUE) +} + +hist(cor(X, xt), + main = "Standard iid Monte Carlo Simulation\n10,000 Iterations", + xlab = "Correlation") +``` + +
    +![Figure 17.2. Standard iid Monte Carlo correlation histogram (`cor(X, x_t)`), typically concentrated near zero.](images/ch17_iid_mc_sim.png) +
    + + +The resulting histogram is tightly centered on zero, confirming that standard MCS cannot generate strongly correlated or anti-correlated scenario paths. + +### Expanded Simulation via Vectorized `rho` + +The `NNS.meboot` approach spans the full correlation range $[-1, 1]$ by generating one replicate per target $\rho$ value across a fine grid, then expanding each to a set of standard meboot replicates: +```r +library(NNS) +set.seed(12345) +xt <- rnorm(100, mean = 9, sd = 12) + +# Step 1: Generate one replicate per rho across [-1, 1] (201 grid values) +boots_grid <- NNS.meboot(xt, reps = 1, + rho = seq(-1, 1, 0.01), + drift = FALSE) + +Z <- do.call(cbind, boots_grid["replicates", ]) + +# Step 2: Expand each to 50 standard meboot replicates (201 x 50 = 10,050 paths) +new_MCS <- list() +for (i in seq_len(ncol(Z))) { + new_MCS[[i]] <- NNS.meboot(Z[, i], reps = 50, rho = 1, + drift = FALSE)["replicates", ]$replicates +} + +hist(cor(do.call(cbind, new_MCS), xt), + main = "NNS.meboot Simulation\n10,000 Iterations", + xlab = "Correlation") +``` + +
    +![Figure 17.3. `NNS.meboot` Monte Carlo correlation histogram after vectorized dependence targeting, spanning a wider \([-1,1]\) range.](images/ch17_meboot_mc_sim.png) +
    + + +The resulting distribution of correlations is approximately uniform across $[-1, 1]$, confirming that the `rho`-vectorized approach provides a fundamentally richer simulation basis than standard iid MCS. + +--- + +## Applications in Forecasting and Risk Analysis + +### Financial Risk Management: Value at Risk and Expected Shortfall + +The richer simulation produced by `NNS.meboot` is especially valuable in financial risk management, where tail behavior drives key risk metrics. + +Vinod and Viole (2020) demonstrate this using ten years of daily S&P 500 returns (1998–2007) to generate simulations, then evaluating out-of-sample performance against the 2008 financial crisis. The simulation exercise generates approximately 2.5 million observations from each method to estimate the 1% Value at Risk (VaR), expected shortfall (ES), and minimum simulated return. + +| Metric | Actual 2008 | Standard MCS | NNS.meboot | +|---------------------|-------------|--------------|------------| +| 99% VaR | −8.77% | −2.94% | −3.10% | +| Expected Shortfall | −8.92% | −3.66% | −3.91% | +| Minimum Value | −9.03% | −6.80% | −15.68% | + +The NNS.meboot simulation reveals a minimum simulated daily return exceeding −15%, a result that would have warned investors of extreme tail risk before the crisis. The standard MCS minimum of approximately −6.8% — itself less than the observed worst day in the preceding nine years of data — conveyed a dangerously false sense of security. + +The `sym = TRUE` argument forces the maximum entropy density to be symmetric around zero within each interval. For financial return series — where positive and negative deviations of equal magnitude should receive equal probability mass — this prevents the ME density from inheriting any asymmetry that may be present in the residuals, producing a more conservative and balanced tail exploration. 
An `xmin` lower bound can additionally be supplied to cap extreme losses; the code below leaves it unset, so the simulation explores the left tail without imposing a parametric distributional form: +```r +library(quantmod) +library(NNS) + +getSymbols("^GSPC", from = "1998-01-01", to = "2009-01-01") +# Training window ends in 2007 so that it does not overlap the 2008 test year +SPX_train <- as.numeric(dailyReturn(GSPC["1998::2007"])) +SPX_test <- as.numeric(dailyReturn(GSPC["2008"])) + +# Generate paths across the full rho range +SPX_boots <- NNS.meboot(SPX_train, reps = 1, + rho = seq(-1, 1, 0.01), + drift = FALSE) + +SPX_meboot_grid <- do.call(cbind, SPX_boots["replicates", ]) + +# Expand each path to 5 replicates with symmetric ME density +new_SPX <- list() +for (i in seq_len(ncol(SPX_meboot_grid))) { + new_SPX[[i]] <- NNS.meboot(SPX_meboot_grid[, i], reps = 5, + rho = 1, drift = FALSE, + sym = TRUE)["replicates", ]$replicates +} + +all_returns <- unlist(new_SPX) + +# Risk metrics +quantile(all_returns, 0.01) # 99% VaR +mean(all_returns[all_returns <= quantile(all_returns, 0.01)]) # ES +min(all_returns) # Minimum simulated return +``` + +### Forecast Model Evaluation + +Bootstrapped time series allow analysts to evaluate forecast stability, assess model sensitivity, and compute predictive distributions without imposing distributional assumptions on forecast errors. + +### Stationarity Transformation + +Setting `rho` to small values (below approximately $0.5$) produces resampled series that the ADF test classifies as stationary, even when the original series is I(1). This provides a model-free alternative to differencing or de-trending that may be preferable when the analyst does not wish to commit to a specific transformation. + +--- + +## Relationship to the NNS Framework + +The maximum entropy bootstrap fits naturally within the directional statistics framework developed throughout this book. 
+ +Directional methods emphasize distributional structure relative to benchmarks, while ME bootstrap preserves the empirical distribution and dependence relationships from which those directional statistics are computed. Synthetic samples generated through `NNS.meboot` therefore maintain the properties required for downstream NNS analyses, including: + +- partial-moment estimation, +- directional dependence measurement via `NNS.dep`, +- distribution comparison via NNS ANOVA, +- nonparametric forecasting. + +The `type = "NNScor"` and `type = "NNSdep"` options close this loop explicitly: the bootstrap targeting criterion is itself computed using NNS co-partial moments, so the resampling respects the same nonlinear dependence geometry as the rest of the NNS toolkit. + +By combining entropy-based resampling with directional statistical measures, analysts obtain a fully **nonparametric workflow for simulation and inference**. + + + +--- + +## Summary + +This chapter introduced synthetic data generation through the maximum entropy bootstrap as implemented in `NNS.meboot`. + +Key points include: + +- Classical bootstrap methods assume independence and may destroy temporal structure. +- Standard iid MCS generates simulated paths with correlations concentrated near zero, failing to represent tail or counter-trend scenarios. +- The maximum entropy bootstrap constructs synthetic samples by maximizing entropy subject to empirical constraints, preserving marginal distributions and dependence structure without parametric assumptions. +- Rank-based reordering preserves the temporal dependence structure of the original series. +- `NNS.meboot` extends the original `meboot` algorithm with a vectorized `rho` argument that targets any Spearman rank correlation in $[-1, 1]$, multiple dependence metric options including NNS-native measures, and drift decomposition for precise trend control. 
+- The expanded Monte Carlo simulation enabled by vectorized `rho` spans the full correlation range and provides materially richer risk estimates, as demonstrated in the S&P 500 stress-testing example. +- Low `rho` settings offer a model-free approach to generating stationary resamples from nonstationary series. + +Together with the prediction and inference methods developed in earlier chapters, the ME bootstrap provides a powerful tool for **distribution-free simulation and forecasting within the NNS framework**. + +--- + +## References + +- Joag-Dev, K. (1984). Measures of dependence. In P. K. Krishnaiah & P. K. Sen (Eds.), *Handbook of Statistics* (Vol. 4, pp. 79–88). North-Holland. +- Vinod, H. D. (2004). Ranking mutual funds using unconventional utility theory and stochastic dominance. *Journal of Empirical Finance*, **11**(3), 353–377. +- Vinod, H. D. (2006). Maximum entropy ensembles for time series inference in economics. *Journal of Asian Economics*, **17**(6), 955–978. +- Vinod, H. D. (2013). *Maximum entropy bootstrap algorithm enhancements* (SSRN Working Paper 2285041). https://doi.org/10.2139/ssrn.2285041 +- Vinod, H. D., & López-de-Lacalle, J. (2009). Maximum entropy bootstrap for time series: The meboot R package. *Journal of Statistical Software*, **29**(5), 1–19. +- Vinod, H. D., & Viole, F. (2020). *Arbitrary Spearman's rank correlations in maximum entropy bootstrap and improved Monte Carlo simulations* (SSRN Working Paper 3621614). https://doi.org/10.2139/ssrn.3621614 +- Viole, F. (2016). NNS: Nonlinear nonparametric statistics. R package. 
https://cran.r-project.org/package=NNS diff --git a/tools/NNS/book/chapter-21-clustering.Rmd b/tools/NNS/book/chapter-21-clustering.Rmd new file mode 100644 index 0000000..144755d --- /dev/null +++ b/tools/NNS/book/chapter-21-clustering.Rmd @@ -0,0 +1,504 @@ +# Clustering + +Chapters 19–20 developed the recursive mean-split estimator as a partition-based method for nonparametric estimation and showed that its induced cell geometry behaves like a dynamic bandwidth. Those results focused on **supervised estimation**, where a response variable $Y$ is observed and the objective is to recover a conditional mean function. + +This chapter turns to **unsupervised learning**. + +In unsupervised learning there is no designated response variable. The goal is instead to discover **structure within the data itself**. Among the most fundamental unsupervised tasks is **clustering**: partitioning observations into groups whose members are more similar to one another than to members of other groups. + +Classical clustering methods such as **k-means** and **hierarchical clustering** are widely used, but they inherit familiar limitations: + +- k-means is driven by Euclidean distance and spherical-centroid geometry, +- hierarchical clustering depends heavily on linkage rules, +- both require the analyst to specify or assume a number of groups in advance, +- and both can obscure nonlinear, asymmetric, and benchmark-relative structure. + +The directional framework developed throughout this book suggests a different perspective. + +Rather than defining similarity purely by geometric distance, we can define it through **directional behavior relative to local benchmarks**. In this view, clustering is not merely grouping by proximity. It is grouping by **shared directional structure**. + +Critically, the number of clusters that emerges from this procedure is **determined by the data**, not prescribed by the analyst. This is not a minor implementation detail. 
It is a fundamental departure from methods that require $K$ to be chosen before any structure has been examined — and it means, in particular, that the number of clusters need not equal, and in general will not equal, the number of class labels in any downstream supervised problem. + +This chapter develops that idea using the partition machinery introduced in Chapters 19–20 and connects it to the NNS clustering procedure. + +--- + +## What Clustering Seeks to Recover + +Suppose we observe multivariate data + +$$X_1, X_2, \dots, X_n \in \mathbb{R}^d.$$ + +A clustering algorithm seeks a partition of the index set + +$$\{1,2,\dots,n\} = C_1 \cup C_2 \cup \cdots \cup C_K, \qquad C_j \cap C_\ell = \varnothing \text{ for } j \ne \ell,$$ + +where each set $C_k$ is interpreted as a **cluster**. + +The ideal is that observations within the same cluster share a common structural pattern, while observations in different clusters do not. + +But this raises two immediate questions: + +**What does "similar" mean?** + +And equally: **how many groups should there be?** + +Different clustering methods answer both questions differently. Classical methods typically answer the second question first — by requiring the analyst to supply $K$ — and then optimize toward that prespecified count. The directional framework answers neither question in advance. Similarity is defined through benchmark-relative structure, and the number of clusters is whatever the recursive partition produces under the chosen stopping rule. Crucially, that count is a **consequence of the data**, not an assumption about it. + +- Euclidean methods define similarity through geometric closeness. +- Density methods define similarity through connected regions of high probability mass. +- Model-based methods define similarity through shared latent distributions. +- Directional methods define similarity through shared benchmark-relative structure. 
+ +The directional definition is particularly natural when the variables exhibit asymmetry, nonlinear dependence, or tail-specific behavior. + +--- + +## Why Distance Alone Can Mislead + +The simplest clustering intuition is geometric distance. + +If two observations $x_i$ and $x_j$ are close in Euclidean norm, + +$$\|x_i - x_j\|,$$ + +they are regarded as similar. + +This works well when clusters are compact, approximately spherical, and separated mainly by location. + +But many real datasets violate these conditions. + +### Nonlinear shape + +Two observations can lie on the same curved manifold and be structurally similar even if their straight-line distance is not especially small. + +### Asymmetric spread + +Clusters may have very different directional variability: wide in one direction, narrow in another. + +### Benchmark-relative structure + +Two points may be similar because they lie on the same side of a threshold across several variables, even if their raw coordinates differ. + +### Tail co-movement + +Observations may be grouped naturally by whether they occur jointly in extreme regions rather than by ordinary central distance. + +Distance alone therefore treats all directions symmetrically and all dimensions uniformly unless ad hoc reweighting is imposed. + +The directional framework instead asks whether observations share common **sign and magnitude of deviation** relative to meaningful benchmarks. + +--- + +## Partition-Based Clustering + +The recursive mean-split machinery from Chapter 18 provides a natural unsupervised clustering mechanism. + +In supervised estimation, partition cells were used to approximate a regression surface. In clustering, the same recursive partitions can be interpreted directly as **unsupervised groupings of the data cloud**. + +Let $R\subset \mathbb{R}^d$ denote a region containing a subset of observations. 
Define the local centroid + +$$\bar X_R = \frac{1}{|I_R|}\sum_{i\in I_R} X_i,$$ + +where $I_R$ indexes the observations in region $R$. + +The directional idea is to partition the region relative to this local benchmark. + +In two dimensions this creates four quadrants. In $d$ dimensions it creates up to $2^d$ directional subregions, depending on which coordinates lie above or below their local means. + +At each stage: + +1. compute the local benchmark vector, +2. assign each observation according to its directional deviation pattern, +3. recurse within each nonempty subregion. + +This yields a tree of increasingly refined directional cells. + +The terminal cells form a clustering of the data. Their number is determined entirely by where observations fall relative to successive local means and by the chosen stopping rule. No value of $K$ is ever specified. + +Unlike k-means, these clusters are not required to be spherical, convex, or globally separable by Voronoi boundaries. They arise from repeated local directional refinement. + +--- + +## Clusters Are Not Classes + +A point that deserves explicit statement: **the number of clusters produced by the directional partition need not equal, and in general will not equal, the number of classes in a downstream classification problem.** + +This distinction matters because the two concepts serve different purposes. + +A **cluster** is a group of observations that share common benchmark-relative directional structure in the predictor space. It is an unsupervised concept, derived entirely from the geometry of the data cloud without reference to any response variable or label. + +A **class** is a label assigned to an observation — by a supervisor, domain expert, or measured outcome. In a classification problem, classes are given; the analyst's task is to predict them. + +The relationship between clusters and classes is empirical, not definitional. 
A dataset with 3 known class labels may contain 4, 7, or 12 natural directional clusters, because the feature space harbors more local structure than the labels alone reflect. Conversely, a dataset with 10 nominal categories may collapse to 3 meaningfully distinct directional clusters if several categories share the same benchmark-relative geometry. + +Neither outcome is a failure. They reflect the fact that clusters describe **where the data lives** while classes describe **what the data is labeled**. The two can inform each other — clusters often predict classes well precisely because shared directional structure tends to co-occur with shared labels — but they are answering different questions. + +This matters practically. When using directional partitioning as a preprocessing step for classification, the analyst should not constrain the number of clusters to match the number of known classes. Doing so would import a supervised assumption into an unsupervised procedure and foreclose the possibility of discovering that the data contains more, fewer, or differently arranged groups than the labels suggest. Instead, let the recursive partition discover the natural directional groupings, then examine how class labels distribute across those groups. If clusters map cleanly to classes, that alignment is a finding. If they do not, that misalignment is also informative. + +--- + +## Directional Similarity + +To formalize the idea, let $x_i, x_j \in \mathbb{R}^d$ be two observations and let $t \in \mathbb{R}^d$ denote a benchmark vector. + +For each coordinate $m = 1,\dots,d$, define the directional signs + +$$s_m(x_i;t) = \begin{cases} -1 & x_{im} \le t_m,\\ +1 & x_{im} > t_m. \end{cases}$$ + +The vector + +$$s(x_i;t) = \bigl(s_1(x_i;t),\dots,s_d(x_i;t)\bigr)$$ + +records the directional region occupied by $x_i$ relative to the benchmark. 
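As a minimal illustration of the sign vector just defined, the base-R sketch below computes $s(x_i;t)$ for a small simulated sample, taking the benchmark $t$ to be the vector of column means. The helper name `directional_signs` and the simulated data are assumptions made for illustration only; they are not part of the NNS package.

```r
# Hypothetical helper (not an NNS function): coordinatewise directional signs,
# s_m(x_i; t) = -1 if x_im <= t_m and +1 if x_im > t_m.
directional_signs <- function(X, bench) {
  X <- as.matrix(X)
  stopifnot(length(bench) == ncol(X))
  # sweep() subtracts bench[m] from column m; strictly positive deviations map to +1
  ifelse(sweep(X, 2, bench, "-") > 0, 1L, -1L)
}

set.seed(1)
X <- cbind(x1 = rnorm(8), x2 = rnorm(8))   # illustrative data
bench <- colMeans(X)                       # benchmark: the sample mean vector

S <- directional_signs(X, bench)
S

# Label each observation by its sign pattern, i.e. its directional cell
cell <- apply(S, 1, paste, collapse = ",")
table(cell)
```

Observations that share a label in `cell` occupy the same directional region relative to the benchmark; in two dimensions there are at most four such labels.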
+ +Two observations are **directionally concordant** at benchmark $t$ if + +$$s(x_i;t) = s(x_j;t).$$ + +They are **directionally discordant** if they occupy different sign regions. + +This already gives a coarse notion of similarity: observations falling into the same directional cell share the same pattern of benchmark-relative deviations. + +A richer notion includes magnitudes. Define the directional deviation vectors + +$$u(x_i;t) = \bigl((x_{i1}-t_1)_+,\dots,(x_{id}-t_d)_+\bigr),$$ + +$$\ell(x_i;t) = \bigl((t_1-x_{i1})_+,\dots,(t_d-x_{id})_+\bigr).$$ + +These separate upward and downward deviations coordinatewise. Two observations can then be judged similar not only because they lie in the same directional region, but because their directional deviation magnitudes are similar. + +This is the unsupervised analogue of the partial-moment logic used throughout the book. + +--- + +## Recursive Mean Splits as Unsupervised Structure Discovery + +The recursive clustering logic can be understood most clearly in low dimension. + +### Two-dimensional case + +Suppose each observation is a pair + +$$X_i = (X_{i1}, X_{i2}).$$ + +Within a region $R$, compute the local mean vector + +$$(\bar X_{R,1}, \bar X_{R,2}).$$ + +This mean induces four directional regions: + +| Region | Condition | +|---|---| +| lower-lower | $X_{i1}\le \bar X_{R,1}$, $X_{i2}\le \bar X_{R,2}$ | +| lower-upper | $X_{i1}\le \bar X_{R,1}$, $X_{i2}> \bar X_{R,2}$ | +| upper-lower | $X_{i1}> \bar X_{R,1}$, $X_{i2}\le \bar X_{R,2}$ | +| upper-upper | $X_{i1}> \bar X_{R,1}$, $X_{i2}> \bar X_{R,2}$ | + +Each nonempty region is then split again using its own local mean. + +This process produces a nested partition of the data cloud. + +### Higher-dimensional case + +In $d$ dimensions the same principle applies, though the number of directional subregions per split can grow to $2^d$. 
In practice one often works with lower-dimensional projections, selected variable subsets, or stopping rules that prevent excessive fragmentation. + +The core idea remains unchanged: + +**cluster structure is revealed by repeated directional partitioning around local benchmarks.** + +This is unsupervised because no response variable is needed. The geometry of the predictor cloud is itself the object of analysis. And the number of clusters that results is a property of that geometry — not a parameter set in advance. + +--- + +## Relationship to Partial Moments + +The connection to partial moments is more than intuitive. + +Recall from earlier chapters that benchmark-relative structure is encoded by directional deviation operators. In clustering, those same operators summarize the local spread of each candidate cluster. + +For a region $R$ and coordinate $m$, define local partial moments about the regional mean $t_m = \bar X_{R,m}$: + +$$U_r^{(R,m)} = \frac{1}{|I_R|}\sum_{i\in I_R}(X_{im}-\bar X_{R,m})_+^r,$$ + +$$L_r^{(R,m)} = \frac{1}{|I_R|}\sum_{i\in I_R}(\bar X_{R,m}-X_{im})_+^r.$$ + +These quantities describe the directional spread of the cluster in each coordinate. + +A region with balanced and small directional moments is relatively compact around its centroid. A region with large or highly asymmetric directional moments is more diffuse and may warrant further splitting. + +Thus recursive clustering can be interpreted as repeatedly refining regions whose directional spread remains structurally heterogeneous. + +In this way, clustering is tied directly to the same benchmark-relative decomposition that generated classical moments and dependence measures earlier in the book. + +--- + +## Comparison with k-Means + +The most common partition-based clustering method is **k-means**. 
+ +Given a desired number of clusters $K$, k-means minimizes the within-cluster sum of squared Euclidean distances: + +$$\sum_{k=1}^K \sum_{i\in C_k} \|x_i - \mu_k\|^2,$$ + +where $\mu_k$ is the centroid of cluster $C_k$. + +This objective has several advantages: + +- computational simplicity, +- interpretability, +- good performance for compact spherical clusters. + +But it also has well-known limitations. + +### Prespecified $K$ + +The number of clusters must be chosen in advance. This is the most fundamental limitation: the analyst must decide how many groups exist before examining whether the data supports that count. + +### Euclidean symmetry + +The method is driven by squared symmetric distance and therefore inherits the same aggregation logic criticized in earlier chapters. + +### Shape restriction + +Voronoi partitions are best suited to roughly spherical clusters. + +### Sensitivity to initialization + +Different random starts can produce different solutions. + +The directional partition approach differs in all four respects. + +1. It generates a hierarchy of clusters **without requiring a fixed $K$ at the outset**. The number of terminal clusters is a derived quantity, not an input. +2. It partitions by directional benchmark-relative structure rather than global squared distance alone. +3. It accommodates irregular, asymmetric, and locally nonlinear cluster geometry. +4. Its recursion is deterministic once the splitting rule and stopping criteria are specified. + +The contrast on the first point is worth dwelling on. When an analyst runs k-means with $K = 3$ because a dataset has 3 known classes, they have already imported supervised information into an unsupervised procedure. The directional framework keeps the two tasks separate: discover how many directional groups the data contains, then examine how those groups relate to any available labels. 
These are different questions, and conflating them by setting $K$ equal to the number of classes forecloses the possibility of learning anything surprising from the unsupervised step. + +This does not imply that directional clustering dominates k-means in every setting. If the true cluster structure is spherical and centroid-based, k-means is efficient and appropriate. The point is that many real datasets do not have that structure, and that a data-determined $K$ is almost always preferable to an analyst-assumed one. + +```r +# Clustered dataset: g groups with group-specific means +g <- 6 +set.seed(g) +n <- 100 +d <- data.frame(x = unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2))), +                y = unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2)))) + +library(clue) +library(NNS)  # provides NNS.part +par(mfrow = c(2, 1)) +# Top panel: k-means with an analyst-specified K = 3 +km <- kmeans(d, 3) +plot(d, col = km$cluster, main = paste("k-means (k = ", 3,")", sep = ""), + cex.main = 2) +points(km$centers, pch = 15, col = 3:1) +# Bottom panel: directional partition; no cluster count is supplied +NNS.part(d$x, d$y, order = 3, Voronoi = TRUE, obs.req = 0) +``` + +
    +![Figure 20.1. k-means (top) with analyst-specified $K=3$ versus NNS.part (bottom) with data-determined clusters. The directional partition discovers the natural group structure without a prespecified count; k-means forces all observations into exactly three spherical regions regardless of whether three is the right number.](images/ch20_kmeans_comp.png) +
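The within-cluster sum of squares that defines the k-means objective in this section can also be evaluated by hand. The sketch below, which assumes only base R and a small simulated sample, computes the objective from the cluster assignments and compares it with the `tot.withinss` value returned by `stats::kmeans`. The helper name `wcss` is hypothetical.

```r
# Hypothetical helper: within-cluster sum of squared Euclidean distances
# for a given assignment of rows of X to clusters.
wcss <- function(X, cluster) {
  X <- as.matrix(X)
  sum(sapply(split(seq_len(nrow(X)), cluster), function(idx) {
    Xk <- X[idx, , drop = FALSE]
    mu <- colMeans(Xk)                  # centroid mu_k
    sum(sweep(Xk, 2, mu, "-")^2)        # sum of ||x_i - mu_k||^2 over i in C_k
  }))
}

set.seed(42)
X <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 4), ncol = 2))   # illustrative data

# K is an input: each call optimizes toward the count it is given
fits <- lapply(2:4, function(k) kmeans(X, centers = k, nstart = 10))

# Compare the manual objective with the value reported by kmeans()
sapply(fits, function(f) c(K        = length(f$size),
                           manual   = wcss(X, f$cluster),
                           reported = f$tot.withinss))
```

Each fit returns exactly the number of clusters it was asked for, which is the sense in which $K$ is an input to k-means and not something it discovers.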
    + +--- + +## Comparison with Hierarchical Clustering + +Hierarchical clustering constructs nested groupings either by + +- **agglomeration**: successively merging points or clusters, or +- **division**: successively splitting them. + +Its main attraction is that it yields a dendrogram rather than a single partition, and it does not require $K$ to be fixed before the algorithm runs — the analyst cuts the tree at a chosen height after the fact. + +However, hierarchical clustering depends heavily on the chosen linkage rule: + +- single linkage, +- complete linkage, +- average linkage, +- Ward's method. + +Different linkage rules can produce very different cluster structures on the same data. + +The directional recursive partition procedure is also hierarchical and also avoids fixing $K$ in advance. But its hierarchy is built from **local benchmark splits** rather than pairwise merge criteria. + +This yields several conceptual differences. + +### Geometry + +Hierarchical distance methods are governed by pairwise distances. Directional recursion is governed by local benchmark-relative partitions. + +### Interpretation + +Dendrogram merges may be difficult to interpret substantively. Directional splits are interpretable as above-below benchmark separations along specific coordinates or regions. + +### Local adaptivity + +Directional recursion recomputes local means inside each region, allowing the geometry to adapt as the partition deepens. + +Thus directional clustering can be viewed as a **benchmark-driven divisive hierarchical method** whose effective $K$ is determined by stopping criteria applied to local directional spread rather than by visual inspection of a dendrogram. + +--- + +## Stopping Rules and Practical Cluster Formation + +A recursive partition procedure requires a stopping rule. Otherwise it eventually isolates individual observations into singleton clusters — the maximum possible $K$, and the least useful. + +Several stopping criteria are natural. 
Each implicitly determines how many clusters the procedure returns. + +### Minimum cell size + +Stop splitting when a region contains fewer than some threshold number of observations. This directly bounds the finest possible partition and prevents very small cells from being refined further. + +### Maximum order + +Stop after a fixed recursion depth. At order $O$, the number of populated terminal clusters is at most $4^O$ for joint partitioning of two variables, though many of these cells are typically empty. At order 1 there are at most 4 clusters; at order 2, at most 16; at order 3, at most 64. + +### Directional compactness + +Stop when local directional partial moments are sufficiently small, indicating that the region is already internally coherent. This is the most principled criterion: it halts refinement precisely when further splitting would not improve the directional description of the cluster. + +### Stability criteria + +Stop when further splits do not materially change the cluster assignments. + +These choices play a role analogous to model complexity control elsewhere in nonparametric estimation. + +In practice, one often combines them. For example: + +- split until order $O$, +- but do not split cells with fewer than $m$ observations, +- and stop early if directional spread falls below a tolerance. + +The terminal cells are then interpreted as clusters. Neighboring terminal cells can also be merged post hoc if a coarser partition is desired. + +None of these rules requires the analyst to specify how many clusters should result. The count of terminal clusters is a **derived output**, not an input parameter. + +--- + +## Directional Similarity Matrices + +The clustering logic can also be expressed through a similarity matrix. 
+ +For observations $x_i$ and $x_j$, define a directional similarity score + +$$S_{ij}(t) = \sum_{m=1}^d 1_{\{s_m(x_i;t)=s_m(x_j;t)\}},$$ + +which counts the number of coordinates for which the two observations lie on the same side of the benchmark. 
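The score $S_{ij}(t)$ can be computed for all pairs at once. The following base-R sketch is one possible way to do so, assuming the benchmark is the vector of column means and using simulated data; the function name `directional_similarity` is hypothetical and not part of NNS.

```r
# Hypothetical helper: S_ij = number of coordinates on which observations
# i and j fall on the same side of the benchmark.
directional_similarity <- function(X, bench) {
  X <- as.matrix(X)
  A <- (sweep(X, 2, bench, "-") > 0) * 1     # 1 where x_im > t_m, else 0
  # agreements = coordinates where both are above + coordinates where both are not
  tcrossprod(A) + tcrossprod(1 - A)
}

set.seed(7)
X <- matrix(rnorm(15), nrow = 5, ncol = 3)   # 5 observations, d = 3 (illustrative)
S <- directional_similarity(X, colMeans(X))
S
```

The result is a symmetric $n \times n$ matrix whose diagonal equals $d$, since every observation agrees with itself on all coordinates.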
+ +A normalized version is + +$$\tilde S_{ij}(t) = \frac{1}{d} S_{ij}(t), \qquad 0 \le \tilde S_{ij}(t) \le 1.$$ + +This gives a benchmark-relative concordance measure. + +One may refine it by including magnitudes: + +$$S_{ij}^{(r)}(t) = \sum_{m=1}^d \left[ 1_{\{s_m(x_i;t)=s_m(x_j;t)\}} \cdot \exp\!\left(-\bigl||x_{im}-t_m|^r - |x_{jm}-t_m|^r\bigr|\right) \right].$$ + +The precise form can vary, but the principle is clear: similarity depends jointly on + +- side of benchmark, +- magnitude of directional deviation. + +Such matrices can be used for visualization, graph-based clustering, or post-processing of recursive partitions — and they make no assumption about the number of groups. + +--- + +## Unsupervised Learning Structures + +Clustering is often only the first step in unsupervised learning. + +Once a directional partition has been constructed, it supports several additional tasks. + +### Cluster prototypes + +Each cluster can be summarized by its local mean vector and directional spread profile. + +### Anomaly detection + +Observations falling into tiny or highly isolated terminal cells may be treated as anomalies. + +### Regime identification + +In time-indexed multivariate data, recurring visits to the same cluster can be interpreted as regimes or states. + +### Preprocessing for supervised learning + +Clusters can serve as features or as local neighborhoods for later regression and classification. When used this way, the number of directional clusters should be allowed to differ from the number of class labels — the unsupervised partition describes the geometry of the predictor space, which need not align neatly with any particular labeling scheme. + +Thus the partition is not just a grouping. It is a structural summary of the data cloud. + +This is especially important in the NNS framework, where the same recursive partitions reappear in estimation, dependence analysis, and machine learning. 
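The prototype and anomaly-screening ideas in this section can be sketched in base R. The function `mean_split` below is a hypothetical, simplified stand-in for the recursive partition (bivariate only, fixed maximum order, minimum cell size); it is not the `NNS.part` implementation, and the simulated data are illustrative.

```r
# Hypothetical sketch (not NNS.part): recursive mean-split partition of a
# bivariate sample into directional quadrants. Returns a named list of
# row indices, one element per terminal cell.
mean_split <- function(X, idx = seq_len(nrow(X)), order = 2, min_size = 5,
                       label = "q") {
  # Stop at the maximum order or when the cell is too small to split
  if (order == 0 || length(idx) < min_size) {
    return(setNames(list(idx), label))
  }
  ctr  <- colMeans(X[idx, , drop = FALSE])          # local benchmark
  # Quadrant code 1..4 from the two above/below indicators
  quad <- 1 + (X[idx, 1] > ctr[1]) + 2 * (X[idx, 2] > ctr[2])
  cells <- list()
  for (q in sort(unique(quad))) {                   # nonempty subregions only
    cells <- c(cells, mean_split(X, idx[quad == q], order - 1, min_size,
                                 paste0(label, q)))
  }
  cells
}

set.seed(3)
X <- cbind(x = c(rnorm(40, 0), rnorm(40, 5)),
           y = c(rnorm(40, 0), rnorm(40, 5)))       # illustrative data

cells <- mean_split(X, order = 2, min_size = 5)

# Per-cell prototype (local mean), size, and degree-1 partial moments
summary_tab <- t(sapply(cells, function(i) {
  Xi  <- X[i, , drop = FALSE]
  m   <- colMeans(Xi)
  dev <- sweep(Xi, 2, m, "-")
  c(n = length(i), mean = m,
    upm1 = colMeans(pmax(dev, 0)),     # upper partial moment, r = 1
    lpm1 = colMeans(pmax(-dev, 0)))    # lower partial moment, r = 1
}))
summary_tab
```

The number of rows in `summary_tab` is the number of terminal cells the recursion produced under these stopping rules; it is not supplied as an argument. Rows with very small `n` are candidates for the anomaly screening described above.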
+ +--- + +## Applications + +Directional clustering is useful whenever structure is asymmetric, benchmark-relative, or nonlinear — and whenever the number of natural groups is genuinely unknown. + +### Finance + +Assets or time periods can be clustered by joint directional behavior, such as upside co-movement versus downside co-movement, rather than by symmetric return distance alone. The number of meaningful market regimes is not known in advance and should not be fixed by assumption. + +### Economics + +Cross-sectional units can be grouped by shared threshold behavior relative to policy benchmarks, inflation targets, or growth regimes. Whether there are 2, 4, or 7 meaningful regimes is an empirical question the directional partition can answer. + +### Risk management + +Scenarios can be clustered by tail-region structure, distinguishing ordinary variation from extreme co-occurring losses. The number of meaningfully distinct tail scenarios is data-determined. + +### Operations + +Demand profiles can be grouped relative to service thresholds, stockout regions, or capacity limits. + +### Biostatistics + +Patients may cluster more meaningfully by benchmark-relative biomarker patterns than by raw Euclidean distance when thresholds matter clinically. The number of clinically relevant subgroups need not equal the number of diagnostic categories in the classification system. + +In each case the essential advantage is the same: the clustering respects **direction and context**, not merely distance, and the number of groups reflects **what the data contains**, not what the analyst assumed. + +--- + +## Conceptual Interpretation + +Classical clustering often begins with a notion of "center" and measures how far each point lies from that center — after first deciding how many centers there should be. + +The directional approach begins one level deeper. + +It asks: + +- on which side of the benchmark does the observation lie? +- how far does it lie there? 
+- does it share that directional structure with neighboring observations? +- does further partitioning reveal additional heterogeneity? + +And it defers the question of how many groups there are until the partition itself has provided an answer. + +This mirrors the conceptual arc of the entire book. + +Classical statistics begins with symmetric aggregates. +Directional statistics begins with the components. + +Classical clustering begins with a prespecified $K$ and global distance. +Directional clustering begins with benchmark-relative structure and lets $K$ emerge. + +--- + +## Summary + +This chapter introduced clustering as an unsupervised extension of the directional partition framework developed in earlier chapters. Its main contributions are fivefold. + +First, it defined **directional similarity**: observations are regarded as similar not merely when they are close in Euclidean distance, but when they share common benchmark-relative structure, including the sign and magnitude of their directional deviations. + +Second, it developed **recursive partition clustering** as a natural unsupervised use of mean-split partitioning. By repeatedly dividing the data cloud according to local benchmark-relative structure, the method produces clusters that adapt to asymmetry, irregular geometry, and nonlinear organization. + +Third, it established that **the number of clusters is determined by the data, not specified by the analyst**. The terminal cells of the recursive partition at any given order and occupancy threshold are a consequence of where observations fall relative to successive local means — not a parameter set before analysis begins. This is a fundamental departure from k-means and any other method that requires $K$ to be chosen in advance. + +Fourth, it clarified that **clusters are not classes**. The number of directional groups discovered by the partition need not equal the number of class labels in any downstream supervised problem. 
Constraining the partition to produce exactly as many clusters as there are classes imports supervised information into an unsupervised procedure and forecloses the possibility of discovering richer or different structure than the labels suggest. + +Fifth, it compared the directional approach with **classical clustering methods** such as k-means and hierarchical clustering. Unlike k-means, the directional method does not rely solely on symmetric Euclidean centroid geometry and does not require a prespecified $K$; unlike standard hierarchical methods, it is driven by local benchmark splits rather than arbitrary linkage criteria, and its effective $K$ is determined by stopping criteria on local directional spread. + +Taken together, these results show that clustering in the NNS framework is not simply a distance-based grouping exercise. It is a benchmark-relative structural decomposition of the data, consistent with the broader theme of the book: classical methods begin with symmetric aggregates, while directional methods begin with their components. + +The next chapter turns from unsupervised grouping to **nonparametric regression**, where these same partition structures are used to estimate conditional relationships directly from data. diff --git a/tools/NNS/book/chapter-22-nonparametric-regression.Rmd b/tools/NNS/book/chapter-22-nonparametric-regression.Rmd new file mode 100644 index 0000000..3a902fb --- /dev/null +++ b/tools/NNS/book/chapter-22-nonparametric-regression.Rmd @@ -0,0 +1,507 @@ +# Nonparametric Regression + +Chapters 18 and 19 established the partition-based estimation framework that underlies the NNS approach to nonparametric regression. Chapter 18 introduced recursive mean-split estimation as a member of the well-characterized class of data-adaptive partition estimators and showed that consistency is inherited directly from that class under standard shrinking-diameter and occupancy conditions. 
Chapter 19 interpreted the induced partition diameter as a dynamic bandwidth, linking the estimator to classical nonparametric smoothing theory and showing that the consistency conditions are the direct analogue of shrinking-bandwidth conditions in kernel regression. Chapter 20 showed that the same recursive partitions can be used for unsupervised clustering. + +This chapter brings those strands together under the familiar label of **regression**. + +In classical statistics, regression is often identified with a fitted equation: a line, a polynomial, or another parametric surface chosen in advance. In the directional framework, regression is something more fundamental: + +**the estimation of conditional expectation from data without imposing a predetermined functional form.** + +The NNS approach treats regression as the recovery of a nonlinear surface by recursive local averaging over benchmark-defined regions. The univariate case and the multivariate case, however, operate through meaningfully different prediction mechanisms: + +- In the **univariate case**, the estimator is a piecewise-constant conditional expectation surface, with optional piecewise-linear interpolation across regression points. +- In the **multivariate case**, the estimator operates as a nearest-neighbor search over a compressed set of regression points — local conditional means derived from per-regressor partitions against the response — rather than over the raw observations themselves. + +This distinction is not incidental. The multivariate architecture is designed specifically to mitigate the curse of dimensionality. It is the central contribution of the NNS regression framework for high-dimensional settings, and it is developed carefully below. + +--- + +## Regression as Conditional Expectation + +Let $X \in \mathbb{R}^d$ denote a predictor vector and let $Y \in \mathbb{R}$ denote a response variable. 
+ +The central object of regression is the **conditional mean function** + +$$f(x) = E[Y \mid X = x].$$ + +This function gives the expected value of the response at each predictor location. + +Classical regression estimates $f$ by restricting it to a family such as + +$$f(x) = \beta_0 + \beta^\top x$$ + +or + +$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots.$$ + +Such models can be useful when the functional form is approximately correct. But when the underlying relationship is nonlinear, threshold-driven, piecewise, or asymmetric, a parametric family can distort the structure it aims to estimate. + +Nonparametric regression removes this restriction. Rather than specifying the shape of $f$ in advance, it estimates the function directly from the data. + +The directional framework does this by recursively partitioning the data into regions and estimating the local conditional mean inside each region. + +--- + +## Why Classical Regression Can Fail + +The limitations of classical regression mirror the limitations discussed throughout the book. + +### Functional rigidity + +A linear model assumes that the response changes at a constant rate in each direction of predictor space. Many relationships do not. + +### Global averaging + +A single fitted equation averages across the full sample. Local nonlinear structure may be flattened into a misleading global trend. + +### Symmetric error treatment + +Least-squares fitting penalizes positive and negative residuals symmetrically. In many settings the two directions matter differently. + +### Parametric dependence + +Inference often depends on Gaussian errors, homoscedasticity, and stable functional form. + +These assumptions are not always wrong. But they are often stronger than the data justify. + +Nonparametric regression avoids imposing them at the outset. + +--- + +## Partition-Based Regression in the NNS Framework + +The NNS regression framework begins with the recursive mean-split estimator introduced in Chapter 18. 
+ +Suppose we observe + +$$(X_1,Y_1), \dots, (X_n,Y_n).$$ + +A partition of the predictor space produces regions + +$$A_1, A_2, \dots, A_K.$$ + +Within each region, the regression function is estimated by the local sample average: + +$$\hat f(x) = \frac{1}{N(x)} \sum_{i : X_i \in A(x)} Y_i,$$ + +where $A(x)$ is the region containing $x$ and $N(x)$ is the number of observations in that region. + +This is the basic NNS regression rule: + +**estimate the conditional expectation by averaging responses inside a data-adaptive local region.** + +The distinctive feature is not the averaging formula itself. Partition estimators are classical, and their consistency is well-established. The distinctive features are the **geometry of the partition** — generated recursively from local means, following the directional structure of the data — and the **multivariate architecture** built on per-regressor partitioning against the response, which is the source of the method's ability to handle many predictors without exponential deterioration. + +--- + +## From Conditional Means to Regression Points + +The recursive partition yields a collection of local mean points, called **regression points**. + +In the univariate case, these are pairs + +$$(\bar X_R, \bar Y_R)$$ + +for each region $R$, where + +$$\bar X_R = \frac{1}{|I_R|}\sum_{i \in I_R} X_i, \qquad \bar Y_R = \frac{1}{|I_R|}\sum_{i \in I_R} Y_i.$$ + +In higher dimensions, $\bar X_R$ becomes a local mean vector in predictor space. + +These regression points play a different role in the univariate and multivariate cases, and it is important to keep that distinction clear. + +--- + +## The Univariate Case: Piecewise Estimation + +In the univariate case, the regression points can be interpreted in two complementary ways. 
+ +### Piecewise-constant surface + +Within each terminal cell, the estimate is constant: + +$$\hat f(x) = \bar Y_R \qquad \text{for } x \in R.$$ + +This yields a stepwise approximation to the conditional mean surface. + +### Piecewise-linear surface + +If neighboring regression points are connected by line segments, the result is a continuous piecewise-linear surface. This gives the NNS univariate regression estimator two useful faces: + +- a **local averaging estimator** for theory, +- a **piecewise-linear interpolating surface** for visualization and prediction. + +Both arise from the same partition geometry. The piecewise-linear representation provides a transparent interpolation and extrapolation rule: between any two adjacent regression points, the surface varies linearly, while the regression points themselves remain anchored to empirical local conditional means. + +```r +x <- seq(-5, 5, .05) +y <- x ^ 3 + +for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE, type = "XONLY") + NNS.reg(x, y, order = i, ncores = 1)} +``` + +
    +![Figure 21.1. `NNS.reg` univariate fit on $y = x^3$ at orders 1–3, alongside the corresponding `NNS.part` X-only partitions. Each panel shows the piecewise-linear regression surface formed by connecting regression points, with progressively finer partition cells at higher order.](images/ch21_part_reg.png) +
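The two readings of the univariate fit can be sketched in base R. This is a simplified illustration under stated assumptions, not the `NNS.reg` implementation: the partition splits on `x` only, and the helper names `mean_split_cells` and `regression_points` are hypothetical.

```r
# Simplified sketch (not the NNS implementation): recursive mean-splitting
# on x only, followed by linear interpolation between the cell means.
mean_split_cells <- function(x, depth) {
  cell <- rep(1L, length(x))
  for (k in seq_len(depth)) {
    # Split every current cell at its own mean of x.
    centers <- ave(x, cell, FUN = mean)
    cell <- 2L * cell + as.integer(x > centers)
  }
  cell
}

regression_points <- function(x, y, depth) {
  cell <- mean_split_cells(x, depth)
  # One regression point per cell: (mean of x, mean of y).
  pts <- data.frame(
    x = as.numeric(tapply(x, cell, mean)),
    y = as.numeric(tapply(y, cell, mean))
  )
  pts[order(pts$x), ]
}

x <- seq(-5, 5, 0.05)
y <- x^3
pts <- regression_points(x, y, depth = 3)

# Piecewise-linear surface: interpolate between adjacent regression points.
# rule = 2 holds the end values constant outside the range of the points,
# which is a choice made for this sketch only.
f_hat <- approxfun(pts$x, pts$y, rule = 2)
f_hat(c(-2, 0, 2))
```

The piecewise-constant reading corresponds to the `y` column of `pts` on its own; the piecewise-linear reading is the `f_hat` function built from the same points.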
    + +--- + +## Piecewise Estimation from Partition Clusters + +Chapter 20 showed that recursive mean-split partitioning can be interpreted as clustering. The same result has a direct regression interpretation. + +Each terminal partition cell can be viewed as a **local cluster of observations** sharing similar benchmark-relative structure. Once that structure is identified, regression proceeds by fitting the conditional mean locally inside each cluster. + +This interpretation clarifies why the method is effective on nonlinear data. + +A single global line may fit badly because observations that belong to fundamentally different local regimes are forced into one equation. Partition-based regression instead allows the data to decompose into structurally coherent regions before averaging. + +In this sense, NNS univariate regression is **piecewise conditional expectation estimation from partition clusters**, with linear interpolation between the resulting regression points. + +The procedure is: + +- recursively partition the sample into locally coherent regions, +- compute the local regression point within each region, +- connect adjacent regression points with line segments to form a continuous interpolating surface. + +Thus clustering and regression are not separate operations. In the NNS framework, they are two views of the same recursive structure. + +--- + +## Interpretation of the Estimated Model + +One advantage of the NNS regression framework is that the fitted model remains interpretable. + +Classical black-box machine-learning methods can predict well while making it difficult to understand what the model has learned. Recursive mean-split regression retains a geometric interpretation at every stage. + +### Local benchmark interpretation + +Each split occurs at a local mean. The partition tree records how the data separate relative to conditional benchmarks. 
+ +### Regional interpretation + +Each terminal cell corresponds to a region where the conditional expectation is approximately stable. + +### Surface interpretation + +The fitted surface can be read as an assembly of local conditional expectations stitched together over the predictor space. + +### Complexity interpretation + +Model complexity is controlled by partition order and occupancy thresholds rather than by a fixed polynomial degree or a hidden parameterization. + +This makes the estimator interpretable in a way that is both statistical and geometric: + +- where the surface bends, the partition refines, +- where the surface is flat, the partition stays coarse, +- where the data are sparse, the smoothing remains broader, +- where the data are dense, the surface can localize more aggressively. + +--- + +## Bias, Variance, and Adaptive Smoothing + +Chapters 18 and 19 already established the asymptotic logic of the estimator. In regression language, the same result can be restated simply. + +At any point $x$, the estimator averages the responses within a cell $A_n(x)$. Two conditions determine its quality: + +- the cell must be **small enough** that $f$ is nearly constant inside it, +- the cell must contain **enough observations** that the local average is stable. + +These are the usual bias–variance requirements. + +If the cell is too wide, the estimate is biased because it averages across substantively different predictor values. +If the cell is too small too early, the estimate has high variance because the local sample is too thin. + +The NNS partition addresses this through recursive refinement. Its effective bandwidth is the cell diameter + +$$h_n(x) = \operatorname{diam}(A_n(x)).$$ + +This bandwidth is not chosen externally. It is induced by the recursive geometry of the data. 
+ +Thus nonparametric regression in the NNS framework may be interpreted as **adaptive local averaging with endogenous bandwidth** — and the convergence of this bandwidth to zero as $n$ grows, combined with growing occupancy, is exactly the condition that delivers consistency by class membership. + +--- + +## Comparison with Classical Regression Models + +The differences between NNS regression and classical models can now be stated directly. + +### Linear regression + +Ordinary least squares assumes a single global hyperplane $Y = \beta_0 + \beta^\top X + \varepsilon$. This is efficient when the relationship is approximately linear, but restrictive when it is not. NNS regression imposes no global linearity and allows the shape to vary across regions. + +### Polynomial regression + +Polynomial models allow curvature, but only of a prespecified algebraic form. High-order polynomials can oscillate and extrapolate poorly. NNS regression does not require choosing a polynomial degree; curvature emerges from recursive local refinement. + +### Generalized additive models + +Additive models allow nonlinear marginal effects but often assume additive separability across predictors. NNS regression does not require additive separability; interactions can appear naturally through the partition geometry. + +### CART and regression trees + +Tree methods also partition the predictor space and are the closest classical analogue. Both NNS and CART belong to the data-adaptive partition estimator class and share the same class-level consistency guarantees. But CART chooses splits greedily to optimize impurity or squared error reduction and then relies on pruning penalties. NNS regression anchors partitions to local mean structure itself; its geometry follows recursive benchmark-relative splitting rather than a greedy impurity search. + +### Kernel regression + +Kernel estimators average nearby observations using weights determined by a bandwidth $h$. 
They are classical and flexible but require explicit bandwidth selection, and their local neighborhoods are imposed by a weighting rule rather than generated by recursive structure. NNS regression avoids explicit bandwidth tuning and obtains localization through partition diameter instead. Both approaches are consistent local averagers; they differ in how the neighborhood is defined and whether the smoothing scale is chosen by the analyst or induced by the data. + +### k-nearest-neighbor regression + +Standard kNN regression predicts by averaging the $k$ observed responses whose predictor values lie closest to the query point. It searches over the raw observation cloud of size $n$. The multivariate NNS regression is superficially similar but mechanistically different: it searches over a compressed set of **regression points** — local conditional means derived from per-regressor partitions — rather than over raw observations. The search space is smaller, and each candidate neighbor has already been denoised through local averaging. This distinction is the foundation of the NNS multivariate architecture and is developed in full in the following section. + +--- + +## Exact Fit, Interpolation, and Extrapolation + +An important feature of recursive mean-split regression is its finite limit behavior. + +At a sufficiently high partition order $O^*$, each observation occupies its own terminal region. At that point, + +$$\hat f(X_i) = Y_i \qquad \text{for all } i.$$ + +This identifies the finite interpolation limit of the estimator, but it is also a warning sign: in practice, the preferred partition order is chosen before this limit, typically by cross-validation or dependence-driven order selection, in order to balance local fidelity against overfitting. + +The estimator spans a full spectrum: + +- coarse global approximation at low order, +- increasingly local nonlinear fit at intermediate order, +- exact in-sample interpolation at finite maximal order. 
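The spectrum from coarse approximation to exact interpolation can be illustrated with a small base R sketch. It assumes a simplified partition that splits on `x` only and uses a hypothetical helper `cell_means_fit`; it is not the NNS implementation, and the simulated data are made up.

```r
# Simplified sketch (not the NNS implementation): piecewise-constant fit
# from recursive mean-splitting on x, evaluated at increasing depth.
cell_means_fit <- function(x, y, depth) {
  cell <- rep(1L, length(x))
  for (k in seq_len(depth)) {
    centers <- ave(x, cell, FUN = mean)
    cell <- 2L * cell + as.integer(x > centers)
  }
  ave(y, cell, FUN = mean)  # local mean of y within each terminal cell
}

set.seed(1)
x <- sort(runif(16))
y <- sin(2 * pi * x) + rnorm(16, sd = 0.1)

# In-sample mean squared error at each depth, from 0 (global mean) upward.
depths <- 0:16
mse <- sapply(depths, function(d) mean((y - cell_means_fit(x, y, d))^2))
round(mse, 4)
```

In-sample error cannot increase as the partition refines, and it reaches zero once every observation sits in its own cell. That is why in-sample error alone cannot be used to choose the order.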
+ +In the univariate case, prediction may be based either on the piecewise-constant local mean within a terminal cell, or on the piecewise-linear interpolation across adjacent regression points. In the multivariate case, neither of these descriptions applies — prediction proceeds instead by nearest-neighbor lookup over the regression-point matrix, as described in the following section. + +--- + +## Multivariate Regression: Per-Regressor Partitioning and the Curse of Dimensionality + +The multivariate case requires a separate treatment, and its architecture is the primary contribution of NNS regression for high-dimensional settings. + +The fundamental challenge in multivariate nonparametric regression is the **curse of dimensionality**: as the number of predictors $d$ grows, the volume of predictor space grows exponentially, and observations spread thinly across it. Local averaging methods that partition the joint predictor space suffer because the number of cells grows as $K^d$ for $K$ partition points per dimension, while the number of observations per cell decreases at a corresponding rate. For standard kNN regression, the relevant neighbors become increasingly distant as $d$ grows. + +The NNS multivariate architecture addresses this challenge through a structural decision at the partitioning stage, not through dimensionality reduction after the fact. + +### Per-regressor partitioning against the response + +Each predictor $X^{(j)}$ is partitioned **independently against the response $Y$** using the univariate recursive mean-split procedure. This produces a set of $K_j$ regression points for predictor $j$ — local conditional means of the pairs $(X^{(j)}, Y)$ within the partition cells of that regressor. + +This is the key architectural decision. Rather than partitioning the joint $d$-dimensional predictor space — which would produce up to $K^d$ cells — NNS partitions each predictor dimension separately against the response. 
The number of regression points generated is $\sum_{j=1}^d K_j$, which grows **linearly** in $d$ rather than exponentially. + +The benefit is twofold. First, the search space for prediction is compressed: the candidate set has size of order $\sum_j K_j$, not $\prod_j K_j$. Second, each candidate in the search space is not a raw observation but a **local conditional mean**, that is, an average over a cluster of observations that share similar benchmark-relative structure with respect to the response. Noise has already been reduced before the distance calculation is performed. + +### Why this mitigates the curse + +The curse of dimensionality in standard kNN is a consequence of searching over $n$ raw observations in $\mathbb{R}^d$: the typical distance to the nearest neighbor shrinks only at the rate $n^{-1/d}$, so for a fixed sample size the nearest neighbors become increasingly distant as $d$ grows and the local average becomes increasingly biased. + +In the NNS multivariate framework: + +1. **The search space is compressed.** The regression point matrix (RPM) has $M \ll n$ rows, one per occupied joint cell. Each row is a local conditional mean, not a raw observation. + +2. **Each candidate neighbor is denoised.** Because each regression point averages over a cluster of observations, the effective noise level of each candidate is reduced relative to a raw observation. The nearest-neighbor distance calculation is performed over a geometry that is already smoother than the raw data. + +3. **The partition depth per regressor is signal-adaptive.** When `order = NULL`, each regressor receives a partition depth proportional to its directional dependence with the response (measured by `NNS.dep`). Regressors with weak predictive content receive shallow partitions and contribute few regression points. Regressors with strong predictive content receive deeper partitions. The search space is therefore automatically concentrated on the dimensions most relevant to prediction.
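As a small worked illustration of the sum-versus-product contrast, the following base R lines compare the two counts. The per-regressor counts `K` are hypothetical values chosen for the example.

```r
# Hypothetical per-regressor regression-point counts K_j for d = 6 predictors.
K <- c(8, 8, 4, 4, 2, 2)

per_regressor_points <- sum(K)   # candidates under per-regressor partitioning
joint_cells          <- prod(K)  # upper bound on cells in a joint partition

c(sum = per_regressor_points, product = joint_cells)
```

Adding a seventh predictor with $K_7 = 4$ would add four candidates to the sum but multiply the joint-cell bound by four.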
+ +Together, these three properties mean that the effective dimensionality of the prediction problem is reduced not by collapsing predictors into a lower-dimensional index, but by compressing and denoising the candidate set before the nearest-neighbor search. + +### Regression point matrix + +The per-regressor regression points are assembled into a **regression point matrix (RPM)**. Each row of the RPM corresponds to one occupied joint cell in the multivariate partition structure; the columns record the local mean of each predictor within that cell, and a final column records the corresponding local mean response. + +For a new observation $x^* \in \mathbb{R}^d$, prediction proceeds by identifying the rows of the RPM whose predictor means lie closest to $x^*$ and returning the weighted average of the corresponding local response means. + +```r +# Multivariate regression example +# X is an n x d matrix; X_new (hypothetical) holds new observations with the same d columns +fit <- NNS.reg(X, y, point.est = X_new) # order = NULL by default +fit$RPM # the regression point matrix +fit$Point.est # predicted values for the rows of X_new +``` + +### Dependence-sensitive neighbor count + +The number of neighbors used in the final averaging step is itself dependence-sensitive. When estimated dependence between predictors and the response is high, the local regression surface is more coherent and fewer neighbors suffice for a stable prediction. When dependence is lower, broader averaging over more neighbors improves stability. + +Localization is therefore adjusted not only by partition geometry, but also by the estimated strength of the multivariate signal. The multivariate NNS regression is thus a **response-anchored regression-point nearest-neighbor estimator**: partitioning creates a denoised, compressed geometry of local conditional means, and nearest-neighbor search over that geometry, with a dependence-adaptive neighbor count, supplies the final prediction.
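The lookup step over a regression point matrix can be sketched in base R as follows. This is an illustration under stated assumptions, not the NNS implementation: the matrix `rpm` is made up, the helper `predict_from_rpm` is hypothetical, and the inverse-distance weights and fixed `k` stand in for whatever weighting and dependence-adaptive neighbor count the package actually uses.

```r
# Simplified sketch (not the NNS implementation): weighted nearest-neighbor
# prediction over a small, made-up regression point matrix.
rpm <- data.frame(
  x1    = c(0.2, 0.2, 0.8, 0.8),
  x2    = c(0.3, 0.7, 0.3, 0.7),
  y_hat = c(1.0, 2.0, 3.0, 4.0)
)

predict_from_rpm <- function(rpm, x_new, k = 2) {
  preds <- as.matrix(rpm[, c("x1", "x2")])
  # Euclidean distance from x_new to every regression point.
  d <- sqrt(rowSums(sweep(preds, 2, x_new)^2))
  nearest <- order(d)[seq_len(k)]
  # Inverse-distance weights; the small constant avoids division by zero.
  w <- 1 / (d[nearest] + 1e-12)
  sum(w * rpm$y_hat[nearest]) / sum(w)
}

pred <- predict_from_rpm(rpm, x_new = c(0.25, 0.35), k = 2)
pred
```

The search runs over the four rows of `rpm` rather than over the raw sample, which is the compression the text describes.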
+ +### Structural comparison with standard kNN + +The difference from standard kNN can be stated precisely: + +| Property | Standard kNN | Multivariate NNS | +|---|---|---| +| Search space | $n$ raw observations | $M \ll n$ regression points | +| Candidate quality | Individual observations, full noise | Local conditional means, partially denoised | +| Search space growth in $d$ | Fixed at $n$; neighbor distance grows | $\sum_j K_j$, linear in $d$ | +| Neighbor count | Fixed $k$ | Dependence-adaptive | +| Partition basis | None | Per-regressor against response | + +The NNS approach is therefore not simply kNN with a different distance metric. It is kNN over a fundamentally different, response-anchored candidate set. + +--- + +## Adaptive Order Selection: Dependence-Driven Partition Depth + +When the user leaves `order = NULL` (the default), `NNS.reg` does not apply one global partition depth uniformly across all predictors. Instead, it computes a directional dependence score between each regressor and the response using `NNS.dep`-style dependence, then allocates recursion depth per regressor accordingly. + +Regressors with stronger directional dependence receive deeper recursive partitioning, enabling finer local approximation where signal is most evident. Regressors with weak dependence receive shallower partitioning, yielding broader smoothing and reducing the chance of overfitting noise-dominant inputs. + +This is the concrete implementation of the dynamic-bandwidth interpretation from Chapter 19: partition cell diameter is not only data-adaptive but **dependence-adaptive**. Smoothing granularity is endogenously assigned by signal strength rather than fixed through a single hand-tuned global parameter. 
+ +```r +fit <- NNS.reg(x, y) # order = NULL by default +fit$rhs.partitions # realized partition depth per regressor +``` + +The realized depth profile is relative: predictors with higher directional dependence typically receive finer partitioning than predictors with weaker dependence in the same fit. The exact realized depths depend on the sample, occupancy constraints, and other fitting controls. + +The main practical consequence is that per-variable manual tuning is often unnecessary in exploratory workflows, while full control remains available when needed (`order = 5`, `order = "max"`, and related settings). + +--- + +## Dimension Reduction via Synthetic Predictors + +The package also provides a qualitatively different way to address multivariate regression when dimensionality becomes burdensome. Rather than preserving the full joint predictor geometry, the predictors may be collapsed into a single **synthetic index** and standard univariate NNS regression applied to that index. + +Let the predictors be $X^{(1)},\dots,X^{(d)}$. After rescaling each predictor to the unit interval, write the normalized predictors as $\tilde X^{(1)},\dots,\tilde X^{(d)} \in [0,1]$. The synthetic predictor is + +$$X^* = \frac{\sum_{j=1}^{d} w_j \tilde X^{(j)}}{\sum_{j=1}^{d} \mathbf{1}[w_j\neq 0]},$$ + +where $w_j$ is the weight assigned to predictor $j$. + +This replaces a $d$-dimensional predictor vector with a single composite variable, allowing the full univariate recursive mean-split machinery — including piecewise-linear interpolation — to be reused directly. + +### Weighting options + +The weights $w_j$ may be determined in several ways: + +- **Equal weighting** assigns all included predictors the same weight. +- **Correlation weighting** (`dim.red.method = "cor"`) uses signed correlation coefficients. +- **Directional dependence weighting** uses `NNS.dep`, connecting the reduction step to the nonlinear dependence framework developed in Chapter 10. 
+- **Directional causation weighting** uses `NNS.caus`, allowing predictors with stronger causal evidence to receive greater weight. +- **Ensemble weighting** combines multiple weighting schemes into a single composite score. + +Dimension reduction is therefore not a purely geometric projection. It is a **structure-aware aggregation of predictors**, where the weights themselves may be derived from directional measures developed earlier in the book. + +### Variable selection through thresholding + +The `threshold` parameter excludes predictors whose weights fall below a chosen value $\tau$: + +$$w_j < \tau \implies \text{predictor } j \text{ excluded from } X^*.$$ + +This turns the reduction step into a form of **variable selection** as well as aggregation. + +### Regression after reduction + +Once $X^*$ is formed, the regression problem becomes univariate: + +$$f^*(x^*) = E[Y \mid X^* = x^*].$$ + +Standard univariate NNS regression is then applied to $(X^*, Y)$, producing the familiar recursive partition, regression points, piecewise-linear interpolation path, and local conditional means. + +The multivariate reduction pipeline is therefore: + +1. rescale each predictor, +2. compute directional or correlation-based weights, +3. threshold weak predictors if desired, +4. form the synthetic predictor $X^*$, +5. run univariate NNS regression on $X^*$ against $Y$. + +### Conceptual comparison of the two multivariate paths + +The dimension-reduction path and the regression-point nearest-neighbor path address dimensionality through different strategies. + +- The **regression-point nearest-neighbor path** preserves the joint predictor structure. It mitigates dimensionality by partitioning each regressor against the response independently, then searching over a compressed set of local conditional means — a search space that grows linearly rather than exponentially in $d$ — with dependence-adaptive neighbor count. 
Prediction is a smooth weighted average over regression points. Piecewise-linear interpolation is not available on this path because there is no natural ordering of regression points in $\mathbb{R}^d$. + +- The **dimension-reduction path** collapses the predictor space entirely, trading interaction structure for parsimony. Once the synthetic index is formed, univariate NNS regression applies — and its piecewise-linear interpolation is again available. + +For some problems, especially when predictors are numerous and noisy, the synthetic-index approach can be an advantage rather than a liability. Simple weighted composites often generalize well out of sample, and the univariate path offers stability, interpretability, and straightforward visualization that the full multivariate path cannot match. + +The two strategies are worth keeping distinct — in particular because the prediction mechanism differs between them. + +--- + +## Practical Perspective on NNS Regression + +From a practical standpoint, the NNS regression framework can be summarized with five ideas. + +### It is nonparametric + +No functional form is assumed for the regression surface. Consistency follows from class membership in the data-adaptive partition estimator class established by Stone (1977). + +### It is nonlinear + +Curvature, thresholds, and interactions emerge naturally through local partition geometry. + +### It is adaptive + +The effective smoothing scale varies by region rather than being imposed globally. In the multivariate case, it also varies by predictor, with finer partitioning allocated to regressors with stronger directional dependence on the response. + +### It is interpretable + +The fitted model can be understood through local means, partition regions, and regression points. + +### Its prediction mechanism depends on dimensionality + +In the univariate case, prediction uses piecewise-linear interpolation across ordered regression points. 
In the multivariate case, prediction uses nearest-neighbor search over the regression-point matrix — a fundamentally different mechanism operating over a compressed, denoised candidate set that grows linearly in the number of predictors, substantially mitigating the curse of dimensionality that affects standard kNN and joint-partition methods alike. + +--- + +## Relationship to the Broader NNS Program + +This chapter completes an important conceptual arc. + +Earlier chapters showed that directional deviation operators generate distribution functions, moment decompositions, dependence measures, conditional probabilities, and stochastic dominance diagnostics. + +Chapters 18 and 19 then showed that the same directional logic induces a consistent adaptive estimator through recursive mean splitting — consistent by class membership, with the dynamic bandwidth interpretation making the connection to classical kernel theory explicit. + +This chapter reframes that estimator in its most familiar applied form: **nonparametric regression**. + +The deeper point is that regression itself can be understood as another consequence of the same benchmark-relative directional primitive that generated the earlier parts of the book. Distribution theory, dependence, clustering, and regression are not separate constructions here. They are structurally linked through recursive directional decomposition. + +The multivariate architecture — per-regressor partitioning against the response, regression-point matrix construction, and dependence-adaptive nearest-neighbor prediction — is the practical expression of that linkage in high-dimensional settings. It is where the theoretical elegance of the partial-moment framework translates into a concrete answer to one of the hardest problems in nonparametric estimation. + +--- + +## Summary + +This chapter developed nonparametric regression in the NNS framework as **conditional expectation estimation by recursive partitioning**. 
+ +Its main contributions are sevenfold. + +**First**, it defined regression at its most fundamental level as estimation of the conditional mean function $f(x) = E[Y \mid X = x]$, and located the NNS approach within the class of data-adaptive partition estimators whose consistency has been established by Stone (1977), Lugosi and Nobel (1996), and Györfi et al. (2002). + +**Second**, it showed how recursive mean-split partitions produce **nonlinear regression surfaces** by forming local conditional averages over data-adaptive regions, with partition geometry following the benchmark-relative directional structure of the data. + +**Third**, it interpreted those regions as **partition clusters**, so that regression becomes piecewise estimation from locally coherent structural groupings. + +**Fourth**, it distinguished clearly between the univariate and multivariate prediction mechanisms. In the univariate case, regression points are connected by line segments to produce a **piecewise-linear interpolating surface**. In the multivariate case, this description does not apply: prediction is performed by **nearest-neighbor search over the regression-point matrix** — a compressed, denoised set of local conditional means — yielding a smooth weighted average rather than a linear interpolation. + +**Fifth**, it explained why the multivariate architecture mitigates the curse of dimensionality. Per-regressor partitioning against the response produces a candidate search set that grows **linearly** in the number of predictors rather than exponentially, and each candidate is a denoised local conditional mean rather than a raw observation. This is not a post-hoc dimensionality reduction; it is a structural property of the partitioning design. + +**Sixth**, it introduced **dimension reduction via synthetic predictors** as an alternative multivariate path. 
Predictors are rescaled, weighted by directional relevance, thresholded for variable selection, and collapsed into a single composite index $X^*$, after which standard univariate NNS regression — including piecewise-linear interpolation — applies directly. + +**Seventh**, it compared the NNS approach with classical regression models — linear, polynomial, additive, tree-based, kernel-based, and kNN — highlighting the distinctive combination of nonparametric flexibility, endogenous bandwidth, response-anchored regression-point nearest-neighbor prediction, dependence-adaptive localization, and optional synthetic-index reduction. + +The next chapter turns from conditional mean estimation to **classification**, where the same directional partition structures are used not to predict a numeric response, but to assign observations to classes. + +> **Further Reading / Examples** +> For hands-on regression applications, including multivariate and noisy data examples, see the [NNS Regression Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#2-regression). + +--- + +## References + +- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620. + +- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706. + +- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer. + +- Vinod, H. D., & Viole, F. (2017). Nonparametric regression using clusters. *Computational Economics*, 52(4), 1181–1209. https://doi.org/10.1007/s10614-017-9713-5 + +- Vinod, H. D., & Viole, F. (2018). Clustering and curve fitting by line segments. *Preprints*, 2018010090. https://doi.org/10.20944/preprints201801.0090.v1 + +- Viole, F. (2020). Partitional estimation using partial moments. *SSRN eLibrary*. 
https://doi.org/10.2139/ssrn.3592491 + +- Viole, F., & Nawrocki, D. (2013). *Nonlinear Nonparametric Statistics: Using Partial Moments*. CreateSpace. diff --git a/tools/NNS/book/chapter-23-classification.Rmd b/tools/NNS/book/chapter-23-classification.Rmd new file mode 100644 index 0000000..e9deb82 --- /dev/null +++ b/tools/NNS/book/chapter-23-classification.Rmd @@ -0,0 +1,701 @@ +# Classification + +Chapter 21 developed nonparametric regression in the NNS framework as conditional expectation estimation by recursive partitioning. There, the response variable was numeric, and the objective was to recover a conditional mean surface. + +This chapter turns to **classification**. + +In classification problems, the response is not a number to be predicted directly, but a **category to be assigned**. The task is to determine, from observed predictor variables, which class label is most appropriate for a new observation. + +Classical classification methods such as **logistic regression**, **linear discriminant analysis**, **support vector machines**, and **random forests** are widely used and often effective. But they inherit many of the structural limitations discussed throughout this book: + +- linearity assumptions, +- symmetric treatment of deviations, +- tuning dependence, +- and difficulty adapting to nonlinear, asymmetric, or benchmark-relative class structure. + +The directional framework offers a different perspective. + +Rather than beginning with a global separating equation or a fixed geometric margin, NNS classification begins with **recursive benchmark-relative partitioning of the predictor space**. Classification then proceeds by assigning class labels according to the local structure of the resulting partitions. + +The central idea is simple: + +**classification is the categorical analogue of conditional expectation estimation.** + +Regression estimates + +\[ +E[Y \mid X=x]. 
+\] + +Classification estimates + +\[ +P(Y = c \mid X=x) +\] + +for each class \(c\), and assigns the label corresponding to the largest conditional probability. + +This chapter develops that viewpoint. + +--- + +## Classification as Conditional Probability Estimation + +Let \(X \in \mathbb{R}^d\) denote a predictor vector and let \(Y\) take values in a finite label set + +\[ +\mathcal{C} = \{1,2,\dots,K\}. +\] + +A classifier is a rule + +\[ +g:\mathbb{R}^d \to \mathcal{C} +\] + +that assigns a class label to each predictor point. + +The optimal classifier under zero-one loss is the **Bayes classifier**: + +\[ +g^*(x) = \arg\max_{c \in \mathcal{C}} P(Y=c \mid X=x). +\] + +Thus the classification problem is fundamentally a conditional probability problem. + +This aligns naturally with the NNS framework developed earlier. Chapter 13 showed that conditional probabilities can be written in terms of partial moments and co-partial moments. Classification therefore fits directly into the same directional machinery: + +- partition the predictor space, +- estimate local class probabilities, +- assign the class with highest estimated local probability. + +So the statistical primitive does not change. Only the form of the response changes. + +--- + +## Why Classical Classification Methods Can Fail + +Classical classification methods often perform well when class boundaries are smooth, approximately linear, and well separated. But many real datasets are not. + +### Linear decision boundaries + +Methods such as logistic regression and linear discriminant analysis impose global linear or quadratic boundary structure. When the true class geometry is nonlinear, these boundaries can misclassify substantial regions. + +### Symmetric distance assumptions + +Distance-based classifiers often treat deviations symmetrically around centers. But class structure may depend more on one side of a threshold than another. 
+ +### Global parameterization + +Many classical methods summarize the full predictor space with a small number of global parameters. This can obscure local class structure. + +### Tuning dependence + +Support vector machines require kernel and penalty choices. Tree ensembles require tuning of depth, feature subsampling, and aggregation parameters. Performance can depend heavily on these choices. + +### Imbalance sensitivity + +In imbalanced settings, global classifiers may be dominated by the majority class unless explicitly reweighted. + +These limitations echo the broader theme of the book: + +**classical methods often begin with an aggregate geometric form, whereas directional methods begin with local benchmark-relative structure.** + +--- + +## Directional Decision Regions + +The NNS approach to classification begins from the recursive partition machinery introduced in Chapters 18–20. + +Suppose the predictor space is partitioned into cells + +\[ +A_1, A_2, \dots, A_M. +\] + +Within each terminal cell, the local class probabilities are estimated empirically: + +\[ +\hat p_c(x) += +\frac{\#\{i : X_i \in A(x),\, Y_i = c\}} +{\#\{i : X_i \in A(x)\}}, +\] + +where \(A(x)\) denotes the terminal cell containing \(x\). + +The classifier is then + +\[ +\hat g(x) = \arg\max_{c \in \mathcal{C}} \hat p_c(x). +\] + +This is the partition-based analogue of the Bayes rule. + +The crucial difference from classical methods is that the regions \(A(x)\) are not fixed by global parametric geometry. They are induced recursively by the data through benchmark-relative splits. + +In the binary case, the decision boundary is the set of points where the estimated conditional class probabilities are equal: + +\[ +\hat p_1(x) = \hat p_2(x). +\] + +Equivalently, + +\[ +P(Y=1 \mid X=x) - P(Y=2 \mid X=x) = 0. +\] + +Within NNS, this boundary is not imposed in advance. It emerges from the partition structure. 
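The cell-level estimator \(\hat p_c(x)\) and the rule \(\hat g(x)\) can be sketched in a few lines of Python. This is an illustration only: the two cells below come from a single split of one predictor at its sample mean, a deliberately simplified stand-in for the recursive partitioning of Chapters 18–20, and the names `cell_of`, `fit_cells`, and `classify` are hypothetical helpers rather than part of any NNS interface.

```python
from collections import Counter, defaultdict

def cell_of(x, benchmark):
    """Assign a scalar predictor to one of two cells split at the benchmark."""
    return "below" if x <= benchmark else "above"

def fit_cells(xs, ys):
    """Estimate empirical class probabilities inside each cell."""
    benchmark = sum(xs) / len(xs)  # single mean split (assumption: one level only)
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[cell_of(x, benchmark)][y] += 1
    probs = {
        cell: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for cell, cnt in counts.items()
    }
    return benchmark, probs

def classify(x, benchmark, probs):
    """Return the label with the largest estimated probability in the cell of x."""
    p = probs[cell_of(x, benchmark)]
    return max(p, key=p.get)

# Toy data: class 1 dominates small x, class 2 dominates large x.
xs = [0.1, 0.4, 0.5, 0.9, 2.0, 2.2, 2.7, 3.1]
ys = [1, 1, 1, 2, 2, 2, 2, 1]
benchmark, probs = fit_cells(xs, ys)
```

Here `classify(0.3, benchmark, probs)` falls in the lower cell, where three of four observations carry label 1, while `classify(2.5, benchmark, probs)` falls in the upper cell, where label 2 dominates. A recursive partition would repeat the same counting inside finer cells.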
+ +--- + +## Binary Classification + +Consider first the case + +\[ +Y \in \{1,2\}. +\] + +For each partition cell \(A\), define + +\[ +\hat p(A) = \frac{1}{N_A}\sum_{i:X_i \in A} \mathbf{1}\{Y_i = 1\}, +\] + +where \(N_A\) is the number of observations in the cell and \(\mathbf{1}\{\cdot\}\) is the indicator function. + +Since the summand equals 1 exactly for class-1 observations, \(\hat p(A)\) is simply the empirical fraction of class-1 observations in the cell. Thus + +\[ +\hat p(A) \approx P(Y=1 \mid X \in A). +\] + +The decision rule becomes + +\[ +\hat g(x)= +\begin{cases} +1 & \hat p(A(x)) > 1/2,\\ +2 & \hat p(A(x)) \le 1/2. +\end{cases} +\] + +This makes the analogy to regression immediate. In regression, the local average estimates the conditional mean. In binary classification, the local average of the class-1 indicator estimates the conditional class probability. + +So binary classification in the NNS framework is simply **partition-based probability estimation followed by thresholding**. + +> **Implementation note (important):** for `NNS.boost(..., type = "CLASS")` and related classification interfaces, class labels should start at `1` (not `0`). Recode `0/1` targets to `1/2` before fitting. + +--- + +## Multiclass Classification + +Now suppose + +\[ +Y \in \{1,2,\dots,K\} +\] + +with \(K>2\). + +For each class \(c\), define the local probability estimator + +\[ +\hat p_c(A) = +\frac{\#\{i:X_i \in A,\ Y_i=c\}}{N_A}. +\] + +Because the classes partition the response space, + +\[ +\sum_{c=1}^K \hat p_c(A)=1. +\] + +The multiclass decision rule is + +\[ +\hat g(x)=\arg\max_{c} \hat p_c(A(x)). +\] + +This yields a **piecewise-constant class probability surface** over the predictor space. + +The resulting decision regions need not be linear, convex, or globally smooth. They inherit the geometry of the recursive partition. 
+ +This is one of the major strengths of the NNS classifier: + +- complex local structure can be captured, +- multiple class regions can interleave nonlinearly, +- and no prior assumption is imposed on the shape of the boundary.
+ +--- + +## Recursive Mean-Split Classification Geometry + +The partition structure from the regression chapters remains central. + +At each stage of recursive partitioning, a region is split around local means. In joint partitioning, this creates benchmark-defined subregions corresponding to directional quadrants. In \(X\)-only partitioning, it creates recursive subdivisions of predictor space. + +For classification, these same regions become **local decision neighborhoods**. + +Each terminal cell stores: + +- its local predictor benchmark structure, +- its class composition, +- its estimated class probabilities, +- and its dominant class label. + +The decision function is therefore interpretable geometrically. + +### Regional interpretation + +A class assignment is not produced by a hidden global optimization alone. It is produced because the new point falls into a particular benchmark-relative region whose observed class composition favors one label. + +### Boundary interpretation + +Decision boundaries are unions of partition edges separating regions with different dominant labels or different class-probability rankings. + +### Refinement interpretation + +Where class mixing remains high, further partitioning can sharpen the local probability estimate. Where classes are already well separated, coarser regions suffice. + +Thus the classifier adapts its complexity to the structure of the data. + +--- + +## Directional Decision Boundaries + +The phrase **directional decision boundary** has a precise meaning in this framework. + +A classical linear classifier produces a boundary such as + +\[ +\beta_0 + \beta^\top x = 0, +\] + +which divides the predictor space globally into two half-spaces. + +A directional classifier instead produces boundaries induced by recursive benchmark-relative partitions. These boundaries are directional in three senses. + +### They are benchmark-relative + +Each split is defined relative to a local benchmark, typically a mean vector. 
+ +### They are locally adaptive + +Boundaries need not extend globally as a single hyperplane. They adapt region by region. + +### They preserve asymmetry + +If class separation is stronger on one side of a benchmark than another, the partition geometry reflects that asymmetry. + +This is important in applications where class identity depends on threshold behavior. For example: + +- default versus non-default beyond leverage thresholds, +- disease state beyond biomarker cutoffs, +- regime classification beyond volatility breaks, +- operational alert states beyond service-level violations. + +In such settings, the meaningful structure is often directional before it is geometric. + +--- + +## Probability Surfaces and Class Assignment + +Because classification in NNS is based on local probability estimation, it is useful to distinguish three related objects. + +### Local class probability surface + +For each class \(c\), + +\[ +x \mapsto \hat p_c(x) +\] + +gives the estimated probability that \(x\) belongs to class \(c\). + +### Hard classification map + +\[ +x \mapsto \hat g(x) +\] + +assigns the label with largest estimated local probability. + +### Classification certainty + +A useful summary in the binary case is + +\[ +\hat C(x) = |2\hat p_1(x)-1|. +\] + +This lies in \([0,1]\). + +- \(\hat C(x)=0\) indicates maximal local ambiguity, +- \(\hat C(x)=1\) indicates complete local separation. + +In the multiclass case, an analogous certainty measure is + +\[ +\hat C(x)= \hat p_{(1)}(x)-\hat p_{(2)}(x), +\] + +where \(\hat p_{(1)}\) and \(\hat p_{(2)}\) are the largest and second-largest local class probabilities. + +This difference measures the local margin between the best and second-best labels. + +Thus the classifier naturally provides not only a label, but also a measure of how decisively that label is supported. 
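The two certainty summaries are simple functions of the local class probabilities. The Python sketch below computes both; the function names `binary_certainty` and `multiclass_margin` are hypothetical, and the probabilities are assumed to be supplied already, for example from a terminal cell.

```python
def binary_certainty(p1):
    """Binary certainty |2 * p1 - 1| for an estimated class-1 probability p1."""
    return abs(2.0 * p1 - 1.0)

def multiclass_margin(probs):
    """Largest minus second-largest value in a dict of class probabilities."""
    # Pad with 0.0 so a cell containing a single class has margin 1.
    ranked = sorted(probs.values(), reverse=True) + [0.0]
    return ranked[0] - ranked[1]

ambiguous = binary_certainty(0.5)                      # 0.0: classes evenly mixed
separated = binary_certainty(1.0)                      # 1.0: only class 1 present
margin = multiclass_margin({1: 0.6, 2: 0.3, 3: 0.1})   # gap between best and runner-up
```

With two classes the margin reduces to the binary certainty, because \(\hat p_2(x) = 1 - \hat p_1(x)\) implies \(\hat p_{(1)}(x) - \hat p_{(2)}(x) = |2\hat p_1(x) - 1|\).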
+ +--- + +## Package Implementation Note + +In the NNS package, the classification logic described here is exposed most practically through the ensemble interfaces `NNS.boost()` and `NNS.stack()`, using `type = "CLASS"` to invoke class-label prediction rather than numeric conditional-mean prediction. + +The underlying partition logic remains the same: predictor space is decomposed into benchmark-relative regions, local class probabilities are estimated within those regions, and final assignment is made by dominant-label selection through + +\[ +\hat g(x)=\arg\max_c \hat p_c(x). +\] + +The function `NNS.reg()` is useful for understanding the underlying partition-based estimation structure, and more generally the NNS framework supports classification through the same recursive partition machinery developed for regression. But in applied package use, classification is most naturally presented through the boosted and stacked interfaces, which stabilize the local decision rule and improve empirical performance. + +--- + +## Boosting and Stacking in the NNS Framework + +The base partition classifier can be strengthened through ensemble methods. + +The NNS framework includes two especially important ensemble ideas: + +- **boosting**, and +- **stacking**. + +These extend the partition-based classifier without abandoning the directional logic. + +### Boosting + +Boosting combines many classification models generated from resampled or reweighted data. Each model captures a slightly different local view of the class structure. Their outputs are then aggregated. + +Conceptually, if + +\[ +\hat g^{(1)}, \hat g^{(2)}, \dots, \hat g^{(B)} +\] + +denote classifiers trained across bootstrap or resampled iterations, the boosted classifier aggregates them via voting or averaged class probabilities. + +This reduces instability in any single partition realization and improves classification robustness, especially when boundaries are irregular or the sample is noisy. 
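The two aggregation rules mentioned above, voting and averaged class probabilities, can disagree. The following Python sketch illustrates the conceptual description only; it is not how `NNS.boost` forms its final estimate, and the helper names `majority_vote`, `average_probabilities`, and `classify_by_average` are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label; ties go to the smallest label for determinism."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

def average_probabilities(prob_list):
    """Average per-class probabilities across B classifiers."""
    classes = sorted({c for p in prob_list for c in p})
    B = len(prob_list)
    return {c: sum(p.get(c, 0.0) for p in prob_list) / B for c in classes}

def classify_by_average(prob_list):
    """Label with the largest averaged probability (smallest label on ties)."""
    avg = average_probabilities(prob_list)
    return max(sorted(avg), key=avg.get)

# Three base classifiers: two weakly favor class 1, one strongly favors class 2.
prob_list = [{1: 0.55, 2: 0.45}, {1: 0.55, 2: 0.45}, {1: 0.05, 2: 0.95}]
hard_labels = [max(p, key=p.get) for p in prob_list]   # [1, 1, 2]
vote_label = majority_vote(hard_labels)                # voting picks class 1
avg_label = classify_by_average(prob_list)             # averaging picks class 2
```

Voting discards how strongly each classifier supports its label, whereas probability averaging lets one confident classifier outweigh two marginal ones. Which behavior is preferable depends on how well calibrated the base probabilities are.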
+ +### Stacking + +Stacking combines multiple candidate classifiers at a second stage. Instead of choosing a single best model class, stacking learns how to weight or combine them. + +If + +\[ +\hat p_c^{(1)}(x), \hat p_c^{(2)}(x), \dots, \hat p_c^{(m)}(x) +\] + +are probability estimates from different base learners, a stacked classifier forms a combined estimate + +\[ +\hat p_c^{\text{stack}}(x) +\] + +through optimized combination, then classifies by the largest combined probability. + +In the NNS setting, stacking is especially natural because class probabilities are already interpretable directional quantities. The second-stage learner therefore combines probability surfaces, not merely hard labels. + +These ensemble procedures preserve the core NNS strengths: + +- nonlinear boundaries, +- local adaptivity, +- and benchmark-relative interpretation, + +while often improving predictive accuracy. + +### Illustrative Workflow + +A practical classification workflow in NNS therefore proceeds in two layers. + +1. The partition-based learner defines local benchmark-relative class neighborhoods. +2. The ensemble wrapper aggregates those local decisions across resamples or model combinations. + +Schematically, this is implemented by calling `NNS.boost(..., type = "CLASS")` for boosted classification or `NNS.stack(..., type = "CLASS")` for stacked classification. The first improves stability through repeated resampling and aggregation; the second combines candidate classifiers through an optimized second-stage rule. + +Thus the package implementation mirrors the theory exactly: local directional partitions generate class probabilities, and ensemble aggregation improves robustness of the final decision boundary. + +--- + +## Why Ensemble Classification Helps + +The usefulness of boosting and stacking can be understood mathematically. 
+ +Suppose each base classifier produces an estimated class probability + +\[ +\hat p_b(x) = p(x) + \varepsilon_b(x), +\] + +where \(p(x)\) is the target probability and \(\varepsilon_b(x)\) is model-specific estimation error. + +Averaging over \(B\) such classifiers gives + +\[ +\bar p_B(x) = \frac{1}{B}\sum_{b=1}^B \hat p_b(x) += p(x) + \frac{1}{B}\sum_{b=1}^B \varepsilon_b(x). +\] + +If the errors are centered and not perfectly dependent, then + +\[ +Var(\bar p_B(x)) += +\frac{1}{B^2} +\sum_{b=1}^B Var(\varepsilon_b(x)) ++ +\frac{2}{B^2}\sum_{b=1}^{B-1}\sum_{b'=b+1}^{B} Cov(\varepsilon_b(x),\varepsilon_{b'}(x)). +\] + +> **Further Reading / Examples** +> For machine learning applications, including the MNIST classification example, see the [NNS Machine Learning Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#3-machine-learning). diff --git a/tools/NNS/book/chapter-24-ensemble-methods.Rmd b/tools/NNS/book/chapter-24-ensemble-methods.Rmd new file mode 100644 index 0000000..785df79 --- /dev/null +++ b/tools/NNS/book/chapter-24-ensemble-methods.Rmd @@ -0,0 +1,478 @@ +# Ensemble Methods + +Chapters 20–22 developed the machine-learning side of the NNS framework from three complementary angles. + +- Chapter 20 treated recursive partitions as an unsupervised clustering device. +- Chapter 21 used those same partitions for conditional expectation estimation. +- Chapter 22 used them for local conditional probability estimation and classification. + +A natural next step is to combine multiple directional learners into a single predictive system. + +This is the role of **ensemble methods**. + +In classical machine learning, ensembles improve predictive performance by combining many imperfect models. Bagging reduces variance through aggregation. Boosting emphasizes informative learners. Stacking combines model outputs through a meta-model. 
Random forests, gradient boosting, and stacked generalization are all expressions of this general idea.
+ +The directional framework reaches the same destination by a different route. + +Rather than aggregating trees, margins, or globally specified basis functions, NNS ensembles aggregate **benchmark-relative nonparametric learners** built from recursive partition logic. The result is not an imported ensemble superstructure grafted onto a classical base learner. It is an extension of the same directional machinery already developed in this book. + +Two package routines operationalize this idea: + +- `NNS.boost`, which performs resampling, feature-subset screening, and aggregation of NNS learners, +- `NNS.stack`, which uses the predictions of NNS base models as meta-features for an optimized stacked model, including cross-validated selection of the neighbor count $k$ used in multivariate regression-point prediction. + +This chapter develops the conceptual role of ensembles in NNS, explains how boosting and stacking fit within the broader directional framework, and discusses practical issues of cross-validation, stability, and computational cost. + +--- + +## Why Ensemble Learning Helps + +A nonparametric learner is flexible precisely because it does not impose a rigid functional form. That flexibility is a strength, but it also creates variability. + +When data are finite, noisy, imbalanced, or high-dimensional, different local partitions can emphasize different parts of the structure. One learner may capture an important threshold effect; another may be more stable in the center of the distribution; another may better recognize rare classes or tail behavior. + +No single learner is guaranteed to be uniformly best across all regions of the sample space. + +This motivates ensemble learning. + +The basic idea is simple: + +1. generate multiple candidate learners, +2. allow them to capture different aspects of the data, +3. combine them in a way that improves stability and accuracy. 
+ +In classical language, ensembles often improve performance by reducing variance without increasing bias too severely, or by reducing bias through adaptive combinations of weak learners. + +The directional viewpoint sharpens this intuition. + +Because benchmark-relative partitions preserve local structure, different learners may disagree not randomly but **structurally**: + +- one learner may emphasize upper-tail behavior, +- another may emphasize local neighborhood structure, +- another may emphasize feature subsets with stronger nonlinear dependence, +- another may perform better under class imbalance. + +An ensemble can therefore be interpreted as a way of aggregating **multiple local structural views** of the same problem. + +--- + +## Ensemble Logic in the NNS Framework + +The conceptual continuity with earlier chapters is important. + +Throughout the book, the central move has been: + +- start with directional structure, +- partition relative to benchmarks, +- summarize within those regions, +- aggregate only afterward. + +Ensemble learning in NNS follows exactly the same logic. + +A single NNS learner already performs a local structural decomposition of the data. An ensemble performs a second-level aggregation across many such decompositions. + +So the order of construction is + +1. **within-learner aggregation**: local averages or local class probabilities inside benchmark-relative partitions, +2. **between-learner aggregation**: combining many directional learners into a final estimate. + +This two-level structure is why ensembles are natural in the NNS setting rather than auxiliary. + +The first level handles nonlinear local geometry. +The second level stabilizes that geometry across feature subsets, folds, and candidate model specifications. + +--- + +## Base Learners: Recursive Partition Estimation + +Both `NNS.boost` and `NNS.stack` are built on the same base principle: prediction from the NNS regression engine. 
+ +The package documentation for `NNS.stack` states that it is a prediction model using the predictions of the NNS base models as features, and the documentation for `NNS.boost` states that it is an ensemble method using NNS multivariate regression as the base learner rather than trees. In both cases, the base learner is therefore not a decision stump, not a CART tree, and not a linear model. It is the directional nonparametric estimator developed in earlier chapters. + +This matters conceptually. + +Classical boosting — and its modern descendant gradient boosting — achieves its power by iteratively correcting residuals from deliberately weak base learners. The mathematical theory of that residual-correction process is well developed, and systems like XGBoost have been engineered to exploit it with exceptional efficiency. NNS does not attempt to replicate that theory. Its base learner is already a flexible nonlinear estimator; there are no residuals to correct in the same sense. Ensemble improvement in NNS comes from a different source entirely: stability and structural coverage across feature subsets and validation splits, not from iterative error minimization. + +This is an engineering difference as much as a mathematical one. Competing with XGBoost on its own terms — speed of residual descent, regularization of additive tree models, hardware-aware split enumeration — is not the goal of NNS ensembles and should not be the standard of comparison. The goal is a directional nonparametric system that can represent nonlinear, asymmetric, and threshold-driven structure without imposing a tree topology, and that remains interpretable through the same benchmark-relative geometry used throughout the book. 
+ +That distinction gives NNS ensembles a different flavor: + +- the individual learners are already nonlinear and locally adaptive, +- ensemble gains come primarily from stabilization, feature engineering, and structural aggregation, +- interpretation remains tied to local partitions rather than to abstract parameter updates. + +--- + +## Resampling and Aggregation via `NNS.boost` + +The `NNS.boost` routine implements ensemble learning through a sequence of resampling, feature screening, threshold selection, and final aggregation. + +At the interface level, the routine accepts training predictors `IVs.train`, a response `DV.train`, optional test predictors `IVs.test`, learner controls such as `depth`, `learner.trials`, `epochs`, `CV.size`, and optimization controls such as `obj.fn`, `objective`, and `threshold`. It also supports class balancing through `balance`, time-series handling through `ts.test`, prediction intervals through `pred.int`, and feature-frequency summaries through `features.only` and `feature.importance`. + +The package description makes two points immediately clear. + +First, `NNS.boost` is not restricted to numeric regression; it can also be used for classification via `type = "CLASS"`. + +Second, the routine is not merely averaging many full-model fits on bootstrap resamples. It is also **learning which feature combinations are useful**. + +### Threshold-learning stage + +At an abstract level, the procedure begins by generating many candidate feature subsets and evaluating their predictive performance on resampled validation splits. Let + +$$\mathcal{F}_1,\mathcal{F}_2,\dots,\mathcal{F}_M$$ + +denote the candidate feature subsets. For each subset $\mathcal{F}_m$, a base learner produces predictions $\hat y^{(m)}_i$ on held-out observations, and an objective function evaluates the result: + +$$J_m = \Phi\!\bigl(\hat y^{(m)}, y\bigr),$$ + +where $\Phi$ is the chosen objective. 
+ +The package default for continuous targets is sum of squared errors, + +$$\Phi(\hat y,y)=\sum_i (\hat y_i-y_i)^2,$$ + +while for classification the code automatically switches the default objective to accuracy when `type = "CLASS"`. + +The threshold is then learned from the empirical distribution of the candidate objective scores. In the default setting `extreme = FALSE`, the routine uses the upper hinge of the five-number summary for maximization problems and the lower hinge for minimization problems. If `extreme = TRUE`, it instead uses the literal maximum or minimum observed score. Feature subsets whose validation performance passes that threshold are retained for the ensemble. + +This is the first important NNS-specific departure from classical textbook boosting: + +**the algorithm is not reweighting observations sequentially in the AdaBoost sense; it is screening and aggregating feature-defined directional learners.** + +### Weighted feature sampling in the epoch loop + +After the threshold-learning stage, `NNS.boost` constructs a **feature pool weighted by survival frequency**: each feature index is repeated in proportion to how often it appeared across the learner-trial sets that passed the threshold. In each epoch, two random decisions are made jointly — a feature count $k \in \{1, \ldots, n\}$ is drawn uniformly, and then $k$ distinct indices are sampled from this weighted pool without replacement. Features that recurred often in successful learner trials are therefore overrepresented in the draw, but any combination of any size can still appear. + +This is structurally analogous to the random-subspace mechanism in random forests, but with a crucial difference: the sampling probabilities are not uniform. They are earned. A feature earns higher sampling weight by having appeared in learner-trial subsets that cleared the validation threshold. 
The epoch loop is thus not random exploration of feature space — it is **weighted exploration biased toward stability**, with the bias itself learned from the threshold stage. + +An epoch's feature set is retained only if its validation objective clears the same threshold. The surviving sets feed the frequency table that drives the final synthetic predictor construction described below. + +### User-specified objectives + +The objective need not be squared error or simple accuracy. The package allows any objective written as an expression in `predicted` and `actual`. Thus users may supply application-specific measures such as precision-weighted loss, F-score style criteria, percentage error, or other custom objectives. + +This is an important practical strength. It allows the ensemble to be aligned with the actual loss relevant to the application rather than with a default proxy. + +### Feature-frequency aggregation + +Once useful candidate subsets have been identified, the retained feature sets are aggregated by frequency. Features that appear often among successful learners receive greater weight in the final construction. + +This makes `NNS.boost` partly a prediction routine and partly a feature-stability routine. + +The ensemble is therefore interpretable in two linked ways: + +- through its predictions, +- through the frequency with which features participate in successful directional learners. + +That second output is especially useful in nonlinear settings, where variable importance is often harder to read from a single local model. + +### Final estimate + +Once the epoch loop is complete, feature frequencies are normalized across all passing epochs to produce a weight vector aligned to the columns of `IVs.train`. These normalized frequencies are then supplied to `NNS.reg` as a custom coefficient vector through `dim.red.method`, which constructs a frequency-weighted synthetic predictor $X^*$ from the original features. 
In this way, predictors that recur more often among successful learners receive greater influence in the final synthetic index. + +The final estimate is not taken directly from that first dimension-reduction fit. Instead, `NNS.boost` uses the resulting training and test projections onto $X^*$ to form a two-column design `(X^*, X^*)`, and then passes that design to `NNS.stack` with `method = 1`. This allows the routine to obtain the final prediction through the NNS regression-point mechanism while using the stacked framework to optimize the terminal neighbor-selection step `n.best`. + +Accordingly, the final model should not be described as a committee vote, a simple average of retained learners, or merely a single `NNS.reg` fit on the weighted composite. It is better understood as a **stability-weighted synthetic predictor followed by an optimized final NNS regression-point estimate**. The boosting stage learns which features survive repeated validation, and the closing stacking step converts that learned structure into the final prediction rule. + +--- + +## Optimized Stacking via `NNS.stack` + +If `NNS.boost` performs feature-based ensemble screening, `NNS.stack` performs **meta-learning from NNS predictions** — and simultaneously optimizes the neighbor count $k$ used in the multivariate regression-point search. + +The package documentation describes `NNS.stack` as a prediction model using the predictions of NNS base models as features for the stacked model. That sentence captures the essential idea of stacking. + +Suppose that multiple base learners produce predictions + +$$\hat y^{(1)}(x),\hat y^{(2)}(x),\dots,\hat y^{(K)}(x).$$ + +A stacked model does not choose one of them. 
It treats them as a new feature vector + +$$z(x)=\bigl(\hat y^{(1)}(x),\dots,\hat y^{(K)}(x)\bigr)$$ + +and then learns a second-stage predictor + +$$\hat y_{\mathrm{stack}}(x)=G\!\bigl(z(x)\bigr).$$ + +In the NNS setting, the base learners come from two main sources documented in the function interface: + +- **Method 1**: direct `NNS.reg` prediction, which in the multivariate case operates as regression-point nearest-neighbor search over a compressed set of local conditional means (as developed in Chapter 21) — not a piecewise-linear surface. +- **Method 2**: dimension-reduction regression built from synthetic predictor combinations, which collapses the predictor space to a univariate index before applying standard NNS regression. + +The `method` argument controls whether method 1, method 2, or both are used. The default `method = c(1, 2)` includes both, which means that the stacked system can combine the geometry-preserving regression-point prediction of method 1 with the parsimonious synthetic-index regression of method 2. + +This is important because the two sources of prediction reflect structurally different views of the same problem. Method 1 operates in the full predictor space through a compressed nearest-neighbor geometry; method 2 compresses the predictor space itself before any regression is performed. Stacking across both allows the meta-learner to weight whichever structural representation generalizes better on the held-out validation data. + +The `dim.red.method` argument controls how synthetic predictor weights are determined for method 2: + +- `"cor"` for linear correlation, +- `"NNS.dep"` for nonlinear dependence, +- `"NNS.caus"` for directional causation, +- `"equal"` for equal weighting, +- `"all"` for averaging all methods. + +Thus the stacked learner can use not only multiple predictions, but multiple **structural weighting philosophies**. 
In particular, when `"NNS.caus"` is used, the weighting is directional rather than symmetric: the synthetic regressor is constructed from estimated causal influence rather than from a purely mutual dependence score. + +### Optimizing $k$: neighbor count selection in the regression-point search + +A key mechanism of `NNS.stack` that distinguishes it from generic stacking is the **cross-validated optimization of $k$**, the number of nearest regression-point neighbors used in the multivariate prediction step of method 1. + +Recall from Chapter 21 that multivariate `NNS.reg` does not predict by connecting regression points with line segments. It performs nearest-neighbor search over the regression-point matrix — a compressed set of local conditional means derived from the per-variable partitions. The number of neighbors $k$ used in that search directly controls the smoothness of the final prediction: small $k$ produces more local, potentially noisy predictions; large $k$ produces broader averaging that may underfit sharp local structure. + +Choosing the right $k$ is therefore a bias-variance decision specific to the regression-point geometry of each dataset. `NNS.stack` addresses this automatically within the fold loop: across each fold, candidate values of $k$ are evaluated on held-out validation predictions, and the $k$ that best satisfies the chosen objective is selected. The selected $k$ is then used for final prediction on the test set. + +This means `NNS.stack` is not merely stacking predictions from fixed base learners. It is simultaneously discovering the right localization level for the regression-point nearest-neighbor search, fold by fold, as part of the same cross-validation loop that evaluates feature combinations and classification thresholds. + +The practical consequence is substantial. 
A value of $k$ that is too small will overfit to idiosyncratic regression-point neighborhoods; a value that is too large will smooth away the local structure that makes the regression-point geometry useful in the first place. Cross-validated $k$ selection finds the point of best generalization without requiring the analyst to tune it manually. + +This is one of the clearest ways in which `NNS.stack` goes beyond assembling predictions from pre-configured learners: it actively optimizes a structural hyperparameter of the underlying prediction mechanism, not just the weights placed on each learner's output. + +### Classification threshold optimization + +For classification problems, `NNS.stack` includes `optimize.threshold = TRUE` by default. This means the routine does not simply round probabilities at $0.5$ in every case. Instead it searches a grid of candidate thresholds on validation predictions and chooses the threshold that maximizes the selected objective. The final classification threshold is then aggregated across folds into the reported `probability.threshold`. + +That is especially useful under class imbalance, where the optimal decision threshold may differ materially from one half. + +### Distance options + +The `dist` argument permits `"L1"`, `"L2"`, `"DTW"`, and `"FACTOR"` distances. + +This is another respect in which NNS stacking differs from generic stacking. The stacked system is not confined to Euclidean geometry. It can accommodate + +- Manhattan distance, +- Euclidean distance, +- dynamic time warping for temporal alignment, +- factor-frequency style handling for discrete structures. + +The choice of distance metric applies to the regression-point nearest-neighbor search in method 1. Different distance metrics will define different neighborhoods in the regression-point space, and `NNS.stack`'s cross-validation loop evaluates which combination of $k$ and distance metric generalizes best on the held-out data. 
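To make the roles of $k$ and the distance metric concrete, the following Python sketch predicts by averaging the values attached to the $k$ nearest points under an `"L1"` or `"L2"` distance. It is a simplified illustration, not the `NNS.stack` implementation: the regression points are supplied by hand, the selected neighbors are averaged with equal weights (an assumption), and `distance` and `knn_predict` are hypothetical helper names.

```python
import math

def distance(a, b, metric="L2"):
    """Manhattan ("L1") or Euclidean ("L2") distance between two points."""
    if metric == "L1":
        return sum(abs(u - v) for u, v in zip(a, b))
    if metric == "L2":
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    raise ValueError(f"unsupported metric: {metric}")

def knn_predict(query, points, values, k, metric="L2"):
    """Equal-weight average of the values at the k points nearest to query."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    order = sorted(range(len(points)),
                   key=lambda i: distance(query, points[i], metric))
    return sum(values[i] for i in order[:k]) / k

# Hand-made regression points and their local conditional means.
points = [(3.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
values = [2.0, 3.0, 10.0]
query = (0.0, 0.0)

pred_l1 = knn_predict(query, points, values, k=1, metric="L1")  # nearest is (3, 0)
pred_l2 = knn_predict(query, points, values, k=1, metric="L2")  # nearest is (2, 2)
pred_all = knn_predict(query, points, values, k=3)              # average of all values
```

With $k = 1$ the two metrics select different neighbors for the same query, so the prediction changes with the metric alone. With $k = 3$ every point is included and the prediction collapses to the overall average, which is the oversmoothing end of the tradeoff described above.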
+ +So the meta-model is not merely combining predictions. It is combining predictions within a distance-aware, $k$-optimized, data-type-aware directional framework. + +--- + +## Cross-Validation in Nonparametric Settings + +Cross-validation plays a central role in both `NNS.boost` and `NNS.stack`. + +This is not accidental. In nonparametric estimation, flexibility is high and parametric asymptotic approximations are often less informative. Validation by held-out prediction is therefore especially important. + +The package interface exposes this through + +- `CV.size`, the cross-validation proportion, +- `folds`, the number of cross-validation folds, +- and, in the case of `NNS.boost`, repeated learner trials and epochs. + +### What cross-validation is optimizing + +In the NNS ensemble context, cross-validation is not merely estimating out-of-sample error. It is simultaneously optimizing several interconnected structural choices: + +- in `NNS.boost`: which feature subsets produce predictions that generalize, and how to weight them, +- in `NNS.stack` for regression: what value of $k$ produces the right level of localization in the regression-point nearest-neighbor search, +- in `NNS.stack` for classification: what probability threshold maximizes the chosen classification objective. + +A classical parametric model may have only a few coefficients whose complexity is explicit once the model is fit. A directional nonparametric ensemble is different. Its effective complexity depends on partition geometry, feature subset choice, neighbor count, class balancing, dimension reduction, and aggregation across learners. Cross-validation is therefore the practical device that simultaneously determines how much local structure to trust and at what spatial scale to apply it. + +### General formulation + +Let a loss function be denoted by $\ell(y,\hat y)$. 
If the sample is partitioned into folds $\mathcal{I}_1,\dots,\mathcal{I}_K$, then the $K$-fold cross-validation score for a model $M$ is + +$$CV_K(M) = \frac{1}{K} \sum_{j=1}^K \frac{1}{|\mathcal{I}_j|} \sum_{i\in \mathcal{I}_j} \ell\bigl(y_i,\hat y_i^{(-j)}\bigr),$$ + +where $\hat y_i^{(-j)}$ denotes the prediction for observation $i$ from the model trained without fold $j$. The fold index is written $j$ here to keep it distinct from the neighbor count $k$. + +Because the model is benchmark-relative and local, cross-validation is effectively checking whether the local structural decomposition, including the regression-point geometry and the chosen $k$, generalizes beyond the particular sample partition that produced it. + +### Random versus temporal validation + +Both routines also include a `ts.test` option for time-series settings. This matters because ordinary random cross-validation can break temporal dependence and produce misleadingly optimistic results when data are ordered. + +In time-dependent settings, `ts.test` should be used so that validation preserves temporal ordering rather than relying on random folds. This is the right way to adapt the ensemble logic to forecasting and other sequential applications, and it applies equally to the $k$ optimization step: the optimal neighbor count for a time-ordered regression-point geometry may differ materially from what random-fold cross-validation would select. + +--- + +## Ensemble Learning and the Bias–Variance Tradeoff + +The classical motivation for ensembles is often expressed through the bias–variance decomposition. + +If a predictor $\hat f(x)$ is used for squared-error prediction, then at a fixed point $x$, + +$$E\bigl[(Y-\hat f(x))^2 \mid X=x\bigr] = \sigma^2(x) + \bigl(E[\hat f(x)]-f(x)\bigr)^2 + Var(\hat f(x)),$$ + +where $\sigma^2(x)$ is irreducible noise, the middle term is squared bias, and the final term is variance. + +Ensembles often help because averaging many unstable predictors can reduce the variance term. 
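The variance-reduction claim can be checked with a small simulation. The sketch below is a generic base-R illustration, not an NNS routine: the true function `f`, the evaluation point `x0`, the noise level, and the subsampled 3-nearest-neighbor `learner` are all assumptions chosen for the example. It estimates squared bias and variance at a fixed point for a single unstable learner and for an average of 25 such learners.

```r
set.seed(42)

f     <- function(x) sin(2 * x)   # assumed true regression function
x0    <- 1                        # fixed evaluation point
sigma <- 0.5                      # noise standard deviation

# One unstable learner: 3-nearest-neighbor mean at x0 on a random subsample
learner <- function(x, y, m = 30, k = 3) {
  idx <- sample(length(x), m)
  nearest <- idx[order(abs(x[idx] - x0))[seq_len(k)]]
  mean(y[nearest])
}

# Repeat over many training samples to estimate bias and variance at x0
sim <- replicate(500, {
  x <- runif(100, 0, 2)
  y <- f(x) + rnorm(100, sd = sigma)
  c(single   = learner(x, y),
    ensemble = mean(replicate(25, learner(x, y))))
})

bias2    <- (rowMeans(sim) - f(x0))^2   # squared bias of each predictor
variance <- apply(sim, 1, var)          # variance of each predictor
rbind(bias2, variance)
```

Averaging leaves the irreducible noise term untouched and mainly shrinks the variance row, which is the effect the decomposition attributes to ensembles.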
+ +In the NNS context, this logic remains true but should be interpreted geometrically. + +A single learner depends on a particular partition, feature subset, validation split, and — in the multivariate case — a particular value of $k$ for the regression-point nearest-neighbor search. Different learners therefore generate different local geometric approximations to the regression or classification surface. Aggregating across them stabilizes the resulting estimate. + +So for NNS ensembles: + +- **variance reduction** comes from averaging across multiple local structural decompositions, +- **bias reduction** may come from allowing multiple structural views that no single learner captures well, +- **localization calibration** comes from cross-validated $k$ selection, which finds the spatial scale at which the regression-point geometry generalizes best, +- **robustness** comes from filtering learners through validation rather than trusting one partition unconditionally. + +This is why ensembles are especially attractive in nonlinear, asymmetric, or heterogeneous data settings. + +--- + +## Regression Ensembles and Classification Ensembles + +Both `NNS.boost` and `NNS.stack` support numeric or categorical targets, though the documentation for `NNS.boost` emphasizes classification and the code clearly adapts its objective behavior depending on `type`. + +### Regression ensembles + +For continuous responses, the ensemble aims to estimate $f(x)=E[Y\mid X=x]$ more accurately and more stably than a single learner. + +Here the relevant concerns are local curvature, heteroskedasticity, feature interactions, optimal neighbor count $k$ for the regression-point search, and out-of-sample squared or absolute loss. + +### Classification ensembles + +For categorical responses, the ensemble aims to estimate conditional class probabilities $P(Y=c\mid X=x)$ or their decision-equivalent ranking more accurately and more stably. 
+ +Here the relevant concerns are class imbalance, threshold optimization, rare-class recognition, and discrete decision accuracy. + +The distinction is not merely operational. It also changes the objective surface. + +In regression, averaging predictions often behaves smoothly, and the optimal $k$ is determined by the curvature of the underlying regression surface relative to the regression-point geometry. + +In classification, small changes in the predicted score can move observations across a decision threshold. This is why threshold optimization and balancing options are especially important for classification ensembles, and why the $k$ optimization and threshold optimization steps in `NNS.stack` are both necessary: one controls spatial localization, the other controls the decision boundary. + +--- + +## Relation to Classical Ensemble Methods + +The NNS ensemble framework overlaps with familiar classical methods, but it is not identical to any one of them. + +### Comparison with bagging + +Bagging stabilizes unstable learners by averaging across bootstrap samples. + +NNS boosting shares the spirit of stabilization through repeated resampled learning, but it is more selective: it screens learners by performance and tracks feature frequencies rather than averaging every learner equally. + +### Comparison with AdaBoost and gradient boosting + +Classical boosting methods often reweight observations sequentially or fit residuals stage by stage. + +`NNS.boost` is different. It is closer to a performance-thresholded ensemble over feature subsets and validation splits, using NNS regression as the base learner. It does not rely on the same additive stagewise residual-updating mechanism as gradient boosting. + +### Comparison with random forests + +Random forests combine tree learners built on bootstrap samples and random feature subsets. + +NNS ensembles share the idea of random or selective feature subsets, but the base learners are not trees. 
They are directional nonparametric regressors and classifiers based on recursive partition structure. Moreover, in `NNS.boost`, random feature subsets are not retained merely because they were sampled; they are screened by validation performance and then aggregated by frequency. This makes the feature-selection step partly stochastic and partly performance-driven. + +The present empirical dominance of tree-based ensembles may reflect engineering maturity as much as statistical principle. Tree methods benefit from decades of optimization in split search, pruning, regularization, parallelization, software design, and default tuning. To the extent that NNS better preserves nonlinear, asymmetric, and directional information, improvements in algorithmic efficiency, tuning strategy, and implementation may allow those structural advantages to appear more consistently in practical applications. Thus, the theoretical contrast with trees should not be framed only as a contest of current benchmark performance, but also as a question of whether engineering has caught up with the underlying statistical geometry. + +### Comparison with classical stacking + +Classical stacking uses predictions from multiple models as features for a meta-model. + +This is the closest analogue to `NNS.stack`. But even here the base models, the $k$ optimization for the regression-point search, the dimension-reduction options, the distance metrics, and the threshold optimization are specific to the NNS framework. Standard stacking does not optimize a neighbor count for an underlying nearest-neighbor geometry because its base learners are not nearest-neighbor estimators over compressed regression points. + +So the correct interpretation is not that NNS simply reimplements classical ensemble methods with new names. 
It is that NNS develops **directional analogues of those ensemble principles**, with an additional layer of structural optimization — $k$ selection — that arises naturally from the regression-point prediction mechanism. + +--- + +## Practical Performance Considerations + +Because ensemble methods combine multiple learners, practical considerations matter. + +### Computational cost + +Ensembles are more computationally intensive than single fits. This is especially true when the number of predictors is large, the candidate subset space is large, many folds or trials are used, $k$ optimization spans a wide candidate range, or time-series distance calculations such as DTW are involved. The price of flexibility is computation. + +### Feature dimensionality + +As dimension grows, the number of possible feature subsets grows combinatorially. If there are $p$ predictors, then the total number of nonempty subsets is + +$$\sum_{k=1}^{p} \binom{p}{k} = 2^p - 1.$$ + +This growth explains why exhaustive search becomes infeasible in high dimension and why threshold-based screening is practically valuable. + +In small dimensions, `NNS.boost` can evaluate all subsets deterministically. In larger dimensions, it instead samples feature combinations rather than enumerating the full subset space. Thus the procedure remains a guided stochastic search rather than a brute-force exhaustive one. + +### Class imbalance + +When classes are imbalanced, raw accuracy may be misleading. The `balance` option in both routines helps address this by combining down-sampling and up-sampling when classification is requested. A more complete treatment of this imbalance-handling ensemble workflow appears in Chapter 25, where multivariate forecasting workflows use the same up/down-sampling logic under skewed classification-style targets. + +### Missing data + +The package notes make clear that missing data should be handled before fitting. 
This is especially important for nonparametric ensembles, where local structure can be distorted badly by ad hoc missing-value handling. + +### Objective-function choice + +The user is not restricted to squared error. Any objective expressed in terms of `predicted` and `actual` can be supplied. This allows the ensemble — including the $k$ selection step — to be tuned to the application's actual loss function rather than to a generic default. + +--- + +## Interpretation of Ensemble Output + +One criticism often leveled at ensemble methods is that they improve prediction at the cost of interpretability. + +That criticism is less severe in the NNS setting than it is in many black-box systems. + +### Prediction interpretation + +The final output is still a benchmark-relative nonparametric estimate. It remains tied to the directional structure developed throughout the book. + +### Neighbor-count interpretation + +The cross-validated $k$ returned by `NNS.stack` is itself informative. A small optimal $k$ indicates that the regression-point geometry contains sharp local variation that is best captured with tight neighborhoods. A large optimal $k$ indicates that the response surface is smoother relative to the regression-point distribution, and that broader averaging generalizes better. The selected $k$ is therefore a data-driven summary of the effective locality of the regression surface. + +### Feature-frequency interpretation + +`NNS.boost` returns feature weights and feature frequencies. These summarize which predictors recur most often among successful learners. This does not provide the same interpretation as a linear coefficient, but it does provide a meaningful **stability-based importance profile**. + +### Structural interpretation + +Because the base learners are directional and partition-based, the ensemble still reflects local structural decomposition rather than an opaque hidden representation. + +So interpretability is not lost entirely. 
It changes form: + +- less coefficient interpretation, +- more structural, stability, and localization interpretation. + +That is often the right trade in nonlinear settings. + +--- + +## Conceptual Summary + +This chapter completes the machine-learning progression. + +- Chapter 20 used recursive partitions for **unsupervised grouping**. +- Chapter 21 used them for **numeric prediction**, with regression-point nearest-neighbor search in the multivariate case. +- Chapter 22 used them for **categorical prediction**. +- This chapter uses them in **aggregated form** to improve predictive stability and performance, with cross-validated optimization of the neighbor count $k$ as a central mechanism. + +The conceptual thread is unbroken. + +The same directional primitive that generated partial moments, dependence measures, causation, clustering, regression, and classification also supports ensemble learning — and the $k$ optimization in `NNS.stack` is a direct consequence of the regression-point nearest-neighbor prediction mechanism established in Chapter 21. Once prediction is understood as a nearest-neighbor search over compressed local conditional means rather than a piecewise-linear surface, it becomes natural that the ensemble layer would need to determine the right localization scale for that search. + +--- + +## Summary + +This chapter developed ensemble learning in the NNS framework as **aggregation of directional nonparametric learners**. + +Its main contributions are sixfold. + +**First**, it explained **feature-subset screening based on predictive performance**. Rather than combining all sampled learners indiscriminately, `NNS.boost` retains those feature-defined learners whose validation performance passes a learned threshold. + +**Second**, it described the **weighted epoch sampling mechanism**. After the threshold-learning stage, feature indices are pooled with weights proportional to their survival frequency, and each epoch draws from this weighted pool. 
The epoch loop is therefore not uniform random exploration but structured search biased toward stable features — a performance-earned analog of the random-subspace method. + +**Third**, it introduced the **cross-validated optimization of $k$** in `NNS.stack`. Because multivariate NNS regression predicts through nearest-neighbor search over a compressed regression-point matrix — not through a piecewise-linear surface — the number of neighbors $k$ is a structural hyperparameter that directly controls the localization scale of prediction. `NNS.stack` optimizes $k$ fold by fold as part of its cross-validation loop, selecting the neighbor count that best generalizes on held-out data. This is one of the clearest ways `NNS.stack` goes beyond generic stacking: it actively discovers the right spatial scale for the underlying prediction mechanism. + +**Fourth**, it developed **dimension-reduction stacking with multiple dependence metrics**. Synthetic predictors may be constructed using linear correlation, nonlinear dependence, directional causation, equal weighting, or an average across all methods, and method 1 (regression-point nearest-neighbor) and method 2 (synthetic-index univariate regression) can be combined within the same stacked model. + +**Fifth**, it showed that NNS stacking is a form of **distance-aware meta-learning**. The ensemble can combine predictions using Euclidean, Manhattan, dynamic time warping, or factor-based geometry, and the chosen distance metric applies to the regression-point neighbor search. + +**Sixth**, it emphasized **cross-validation as the practical control of complexity** in nonparametric ensemble learning. Cross-validation simultaneously optimizes feature selection, neighbor count, and classification thresholds — not merely estimating out-of-sample error, but actively determining the structural configuration of the learner. + +Taken together, these results show that ensemble methods in NNS are not auxiliary add-ons. 
They are the natural machine-learning extension of the book's central principle: + +**start with directional structure, preserve it locally, and aggregate only afterward.** + +The next part of the book turns to **time series**, where the same nonparametric and directional principles are extended to temporal dependence, forecasting, and multivariate dynamics. \ No newline at end of file diff --git a/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd b/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd new file mode 100644 index 0000000..6898eb1 --- /dev/null +++ b/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd @@ -0,0 +1,714 @@ +# Nonparametric Time Series Models + +Part VII turns the directional framework toward **time**. + +Previous chapters developed nonparametric estimation, clustering, regression, classification, and ensemble learning using recursive partitioning, local averaging, and benchmark-relative structure. Those methods treated observations as unordered or cross-sectional. Time-series analysis adds a new constraint: + +**the observations arrive in sequence, and that ordering matters.** + +A time series is not merely a set of values. It is a structured sequence in which the past may influence the future, seasonal patterns may recur, and dependence may change across time. + +Classical time-series analysis addresses these problems through models such as **ARIMA**, **ETS**, and **state-space methods**. These are often effective, but they inherit familiar limitations: + +- linear autoregressive structure, +- parametric error assumptions, +- explicit stationarity requirements, +- and model identification choices that must be imposed before estimation. + +The NNS framework approaches time series differently. 
+ +At its heart, time-series modeling is treated as a **subset regression problem**: a sequence is decomposed into lagged component series, and future values are forecast by applying the same nonlinear nonparametric regression logic developed earlier in the book to those components. In this view, autoregression is not abandoned. It is generalized. + +Rather than beginning with a linear dynamic equation, NNS begins with a simpler principle: + +**forecast the future by learning the structure of the past without imposing a parametric law for that structure.** + +This chapter develops that idea. + +--- + +## Time Series as Ordered Nonparametric Data + +Let + +\[ +\{X_t\}_{t=1}^T +\] + +denote a real-valued time series. + +In classical analysis, a time series is often modeled through a difference equation such as + +\[ +X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t, +\] + +possibly after differencing, detrending, or seasonal adjustment. + +That formulation assumes from the outset that the dynamic relation is linear in the lagged values. + +The directional framework takes a broader view. + +A time series can be written as a regression problem in which the response is the future observation and the predictors are functions of the past: + +\[ +X_t = f(X_{t-1}, X_{t-2}, \dots) + \varepsilon_t. +\] + +The task is then to estimate the unknown dynamic map \(f\) nonparametrically. + +This places time-series analysis inside the same framework developed in Chapters 19–24: + +- identify informative local structure, +- partition or decompose the data, +- estimate conditionally, +- and aggregate only afterward. + +The only additional element is temporal order. + +--- + +## Why Classical Time-Series Models Can Fail + +The central models of classical forecasting are powerful, but they are built around structural assumptions that many real series violate. 
+ +### Linearity + +ARIMA and related autoregressive models assume that the next observation depends linearly on lagged values, perhaps after transformation. But many series exhibit threshold effects, asymmetric responses, cyclical distortions, or nonlinear seasonal interactions. + +### Stationarity requirements + +The Box–Jenkins framework is built around stationarity. In practice, many observed series are not stationary in level, variance, or seasonal structure. Transformations and differencing may help, but they also alter the object being modeled. + +### Parametric identification + +Classical modeling requires choosing model orders, differencing levels, seasonal terms, and error structures. These decisions can materially change the forecast. + +### Symmetric error treatment + +Least-squares fitting treats positive and negative forecast errors symmetrically, even in contexts where underprediction and overprediction have different consequences. + +The directional nonparametric approach seeks to preserve the useful idea of autoregression while relaxing these restrictions. + +--- + +## Autoregression as a Subset Regression Problem + +The NNS time-series framework begins from a simple decomposition. + +Suppose a series exhibits a seasonal or cyclical lag \(m\). Then observations separated by that lag belong to a common component series: + +\[ +\{X_1, X_{1+m}, X_{1+2m}, \dots\}, +\quad +\{X_2, X_{2+m}, X_{2+2m}, \dots\}, +\quad \dots +\quad +\{X_m, X_{2m}, X_{3m}, \dots\}. +\] + +Each component series is itself a smaller time series indexed by occurrence number within that phase. + +Forecasting then becomes a regression problem on each component series separately. + +For a given component series with index vector + +\[ +z = 1,2,\dots,n_j +\] + +and values + +\[ +y^{(j)}_1, y^{(j)}_2, \dots, y^{(j)}_{n_j}, +\] + +we estimate the next value through either linear or nonlinear regression: + +\[ +y^{(j)}_{n_j+1} \approx \hat f_j(n_j+1). 
+\] + +The final forecast aggregates these component forecasts using weights determined by predictive strength. + +This is why time series in NNS are best viewed as a subset regression problem: + +- the original series is partitioned into lag-defined subsets, +- each subset is modeled with NNS regression, +- and the subset forecasts are recombined. + +Autoregression is therefore retained, but its mechanism is generalized from linear lag equations to **nonparametric lag-structure estimation**. + +A small implementation detail is worth noting. If the total series length \(T\) is not an exact multiple of \(m\), then the component series need not all have the same length. Some phases will contain one more observation than others. This creates no conceptual difficulty, but it matters in practice because shorter component series provide less information and therefore should generally receive less effective influence in the forecast aggregation. + +--- + +## Seasonal Decomposition Without Parametric Filters + +A major strength of the NNS approach is that seasonality is handled directly through component decomposition rather than through fixed harmonic terms or pre-imposed smoothing filters. + +Classical methods often represent seasonality through: + +- seasonal ARIMA operators, +- trigonometric terms, +- moving-average filters, +- or exponential smoothing recursions. + +The NNS approach instead asks a simpler question: + +**At which lag lengths does the series become more predictable when split into component sequences?** + +This is operationalized through a seasonality test based on the **coefficient of variation** of each component series relative to the coefficient of variation of the full series. + +If a lag \(m\) produces component series with lower coefficient of variation than the original series, then the lag reveals recurring structure. 
Intuitively: + +- a lower component-series coefficient of variation means tighter local behavior, +- tighter local behavior means greater predictability, +- and greater predictability indicates seasonality or cyclic structure. + +Thus seasonality is not defined through a parametric frequency-domain object. It is defined through **predictive concentration in lag-defined subsets**. + +--- + +## Seasonal Detection by Predictive Power + +Let the full series have coefficient of variation + +\[ +CV(X) = \frac{\sigma_X}{|\mu_X|}, +\] + +assuming the mean is nonzero. + +For a candidate lag \(m\), construct the \(m\) component series. Let their representative predictive concentration be summarized through their component coefficients of variation. + +If these component coefficients are systematically lower than the overall series coefficient of variation, then the lag \(m\) is informative. + +The interpretation is immediate: + +- lower \(CV\) means less dispersion relative to level, +- less dispersion means more stable phase-specific structure, +- more stable phase-specific structure means improved forecastability. + +This gives a nonparametric test for seasonality grounded in prediction rather than in harmonic decomposition. + +The chapter’s conceptual point is broader than the specific diagnostic: + +**seasonality is treated as recurring conditional structure, not as a parametric periodic law.** + +### A worked illustration + +Consider the toy quarterly series + +\[ +X = (10, 18, 11, 21, 12, 20, 13, 23). +\] + +The overall mean is + +\[ +\bar X = \frac{10+18+11+21+12+20+13+23}{8} = 16, +\] + +and the sample standard deviation (sum of squared deviations \(180\), divided by \(n-1=7\)) is approximately + +\[ +s_X \approx 5.071. +\] + +Hence the overall coefficient of variation is + +\[ +CV(X) \approx \frac{5.071}{16} = 0.317. +\] + +Now test lag \(m=2\). The component series are + +\[ +(10,11,12,13) +\quad\text{and}\quad +(18,21,20,23). 
+\] + +For the first component, + +\[ +\bar X_1 = 11.5,\qquad s_1 \approx 1.291,\qquad CV_1 \approx \frac{1.291}{11.5}=0.112. +\] + +For the second, + +\[ +\bar X_2 = 20.5,\qquad s_2 \approx 2.082,\qquad CV_2 \approx \frac{2.082}{20.5}=0.102. +\] + +Both component coefficients of variation are far below the overall value \(0.317\). That means the lag-2 decomposition produces tighter, more internally coherent subseries than the original series. In the NNS interpretation, lag \(2\) reveals meaningful recurring structure. + +By contrast, consider lag \(m=3\). The three component series are + +\[ +(10,21,13),\qquad (18,12,23),\qquad (11,20). +\] + +These exhibit much larger within-component variation: their coefficients of variation are approximately \(0.388\), \(0.312\), and \(0.411\), so two exceed the full-series value and the third falls only marginally below it. The component coefficients of variation are therefore not uniformly smaller than the full-series value, and lag \(3\) is not as predictive as lag \(2\). + +This simple example shows exactly how the test works in practice. One computes the overall coefficient of variation, computes the component-series coefficients of variation for each candidate lag, and prefers those lags for which the component series become materially tighter than the original sequence. + +Mathematically, the logic is straightforward: if + +\[ +CV_j(m) < CV(X) +\quad \text{for component series } j=1,\dots,m, +\] + +then conditioning on phase within lag \(m\) reduces relative dispersion. Reduced relative dispersion means more concentrated conditional behavior, and more concentrated conditional behavior implies improved forecastability. + +In applications one usually needs an operational aggregation rule. A natural choice is a weighted average of component coefficients of variation, + +\[ +\overline{CV}(m) += +\sum_{j=1}^{m} w_j\,CV_j(m), +\qquad +w_j \ge 0, +\qquad +\sum_{j=1}^{m} w_j = 1, +\] + +where \(w_j\) may be proportional to component length or to some predictive-strength measure. Then lag \(m\) is favored when + +\[ +\overline{CV}(m) < CV(X). 
+\] + +An even stricter rule requires most, or all, component CVs to lie below the full-series CV. The exact rule is a modeling choice, but the guiding principle is unchanged: a good seasonal lag is one that produces tighter conditional distributions than the unpartitioned series. + +A technical caveat is also important. If \(\mu_X \approx 0\), then \(CV(X)\) can become unstable because the denominator is near zero. In such cases the analyst may instead compare component standard deviations directly, center the series at a more stable scale, or regularize the denominator by adding a small constant. The predictive logic remains the same even if the raw coefficient of variation is numerically unreliable. + +--- + +## Multiple Seasonalities + +Many real series have more than one recurring period. + +Examples include: + +- monthly data with annual and multi-year cycles, +- hourly data with daily and weekly cycles, +- sales data with weekly, monthly, and promotional rhythms. + +Classical models often struggle when multiple seasonalities interact, especially when the interactions are nonlinear or when the seasonal periods are not cleanly nested. + +The NNS framework handles this naturally by allowing multiple candidate lags: + +\[ +m_1, m_2, \dots, m_K. +\] + +Each lag defines its own component decomposition and its own forecast. These lag-specific forecasts can then be combined using weights reflecting both: + +- the predictive tightness of the corresponding component series, +- and the amount of information available within each lag structure. + +This is a central advantage of the nonparametric formulation: multiple seasonal patterns need not be forced into one rigid dynamic equation. They can be estimated as separate predictive structures and then aggregated. + +--- + +## Nonlinear Autoregressive Structures + +The classical AR model is linear in lagged values. NNS replaces this with a more general dynamic relation. 
+ +If \(X_t\) depends on past observations through a nonlinear rule, then there is no reason to insist that the forecast be generated by a straight line fitted to lagged values. + +Suppose, conceptually, that + +\[ +X_t = f(X_{t-m}) + \varepsilon_t +\] + +for some unknown nonlinear function \(f\). + +The NNS framework estimates \(f\) by applying the nonparametric regression machinery from earlier chapters to each component series. This allows the method to capture: + +- turning points, +- diminishing effects, +- local curvature, +- asymmetric phase behavior, +- and regime-like transitions. + +The importance of this step cannot be overstated. + +A component series may itself be nonlinear even when the original series looks smooth. If the local phase-specific dynamics are nonlinear, a linear subseries regression can point in the wrong direction entirely. Nonparametric regression is therefore not a cosmetic addition. It is the mechanism that allows autoregression to remain autoregressive without remaining linear. + +--- + +## Directional Temporal Dependence + +Time-series dependence is not merely contemporaneous dependence shifted through time. It has direction: + +- earlier values may help predict later values, +- later values cannot influence earlier ones, +- and positive versus negative deviations may propagate differently across time. + +Within the broader NNS framework, this suggests a temporal analogue of the co-partial-moment decomposition developed in Chapters 11, 12, and 14. For a lag \(\tau \ge 1\), one can study lagged co-partial moments formed from aligned pairs such as \((X_{t-\tau}, X_t)\), separating concordant movement from divergent movement across time. + +In that interpretation, one class of lagged moments captures persistence in the same directional regime, while another captures reversals between periods. The distinction is useful because many time series are dynamically asymmetric even when their unconditional summaries appear mild. 
+ +For example: + +- volatility clusters after large shocks, +- downturns may persist longer than upswings, +- inventory shortages may propagate differently than surpluses, +- and demand spikes may reverse more sharply than demand collapses. + +Directional temporal dependence therefore generalizes classical autocorrelation by preserving regime-specific information that linear autocovariance averages away. + +It is important, however, to place this idea correctly within the NNS framework. In the current univariate forecasting routines, time dependence is operationalized primarily through lag-defined component regression and seasonality detection, not through a standalone directional-autocorrelation statistic. The lagged co-partial-moment construction belongs most naturally to the theory developed for asymmetric dependence and causation, where temporal ordering is analyzed explicitly rather than only through forecast generation. Readers can map this directly to Chapters 11, 12, and 14: Chapter 11 supplies asymmetric directional dependence, Chapter 12 supplies copula-space normalization intuition, and Chapter 14 supplies directional-causation asymmetry. + +--- + +## Forecasting from Component Regressions + +The NNS forecasting workflow can now be stated clearly. + +### Step 1: Select candidate seasonal lags + +Identify one or more plausible lag lengths, either from domain knowledge or from the predictive seasonality test. + +### Step 2: Form component series + +For each lag \(m\), partition the original series into \(m\) phase-specific subseries. + +### Step 3: Regress each component forward + +For each component series, estimate the next observation using one of the following: + +- linear regression, +- nonlinear NNS regression, +- both, or +- mean-based shrinkage variants. + +### Step 4: Weight and aggregate + +Combine component forecasts using weights that reflect predictive concentration and sample support.
+ +### Step 5: Iterate if forecasting multiple steps ahead + +For multi-step forecasting, append the newly predicted value and repeat. Seasonal factors may be kept fixed or updated dynamically as the forecast path evolves. + +This procedure preserves the definition of autoregression: + +the forecast is still generated from the series’ own past. + +But it does so without imposing stationarity, without requiring Box–Jenkins identification, and without restricting the lag relation to a linear map. + +A brief clarification of the mean-based option is useful. In some component series the fitted regression may be unstable because the component is short, noisy, or nearly flat. In that case a practical alternative is to shrink the regression estimate toward the component mean, or even to use the component mean directly. This sacrifices some responsiveness in exchange for stability. Conceptually, it is a local bias-variance tradeoff: when the estimated slope or nonlinear fit is unreliable, the component average can act as a robust anchor. + +```r +# Univariate nonlinear ARMA +z <- as.numeric(scale(sin(1:480/8) + rnorm(480, sd=.35))) + +# Seasonality detection (prints a summary) +seasonal_period <- NNS.seas(z, plot = FALSE) +head(seasonal_period$all.periods) + +## Period Coefficient.of.Variation Variable.Coefficient.of.Variation +## 1 99 0.5122054 1.168502e+17 +## 2 147 0.5256021 1.168502e+17 +## 3 100 0.5598477 1.168502e+17 +## 4 146 0.5618687 1.168502e+17 +## 5 199 0.5766158 1.168502e+17 +## 6 98 0.5801409 1.168502e+17 + + +# Validate seasonal periods and forecast +NNS.ARMA.optim(z, h = 48, seasonal.factor = seasonal_period$periods, plot = TRUE, ncores = 1) +``` + + +
    +![Figure 24.1. `NNS.ARMA(..., h = 45, seasonal.factor = c(12,24,36))` forecast output with fitted trajectory and uncertainty structure.](images/ch24_uni_ts.png) +
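The workflow in this section can also be sketched in base R without the package. The block below is a simplified illustration and not the `NNS.ARMA` implementation: it uses a single seasonal lag, substitutes an ordinary linear fit (`lm`) for the nonlinear NNS regression, and omits the Step 4 aggregation across lags. The helper names `component_forecast` and `recursive_forecast` are hypothetical.

```r
# Hypothetical helper (not part of NNS): one-step forecast for a single lag m.
component_forecast <- function(x, m) {
  n <- length(x)
  stopifnot(m >= 1, n >= m)
  # Step 2: pick the component sharing the phase of the next observation (n + 1)
  idx <- rev(seq(from = n + 1 - m, to = 1, by = -m))
  comp <- x[idx]
  # Mean-based fallback when the component is too short to fit a line
  if (length(comp) < 3) return(mean(comp))
  # Step 3: linear fit of the component on its own index
  # (an assumed stand-in for the nonlinear NNS regression)
  k <- seq_along(comp)
  fit <- lm(comp ~ k)
  unname(predict(fit, newdata = data.frame(k = length(comp) + 1)))
}

# Step 5: recursive multi-step path with a fixed (static) seasonal lag
recursive_forecast <- function(x, m, h) {
  out <- numeric(h)
  for (i in seq_len(h)) {
    out[i] <- component_forecast(x, m)
    x <- c(x, out[i])          # append the prediction and repeat
  }
  out
}

# A perfectly periodic toy series with period 4
x <- rep(c(1, 5, 3, 2), times = 12)
fc <- recursive_forecast(x, m = 4, h = 8)
```

On this toy series each component is constant, so the forecast path simply repeats the cycle. With noisy data the choice between the fitted line and the component mean becomes the bias-variance tradeoff described above.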
    + + +--- + +## Dynamic Updating and Recursive Forecast Paths + +A one-step forecast is rarely the end goal. In practice, analysts often require + +\[ +h = 1,2,\dots,H +\] + +steps ahead. + +In the NNS framework, multi-step forecasting proceeds recursively. + +If \(\hat X_{T+1}\) is forecast first, then it is appended to the series and treated as part of the evolving path when forecasting \(\hat X_{T+2}\), and so on. + +This creates two natural modes. + +### Static seasonal structure + +The seasonal lags and weights are estimated once from the historical sample and then held fixed for all future steps. + +### Dynamic seasonal structure + +The seasonal structure is recomputed as the forecast path grows, allowing the decomposition itself to evolve. + +The static approach favors stability. +The dynamic approach favors adaptability. + +But that distinction can be made more precise. + +Static updating is generally preferable when: + +- the dominant seasonal pattern is well known in advance, +- the series is long enough that seasonal weights are already stable, +- the forecast horizon is short relative to the seasonal period, +- or the analyst values interpretability and reproducibility over rapid adaptation. + +In such cases, recomputing the lag structure at every step may add noise rather than information. If the underlying periodic structure is persistent, then a fixed decomposition acts as a stabilizer. + +Dynamic recomputation is preferable when the data suggest that the seasonal structure itself is moving. Typical signals include: + +- abrupt level shifts, +- changing amplitudes of recurring cycles, +- newly emerging or fading periodicities, +- strong structural breaks, +- or forecast errors that begin to cluster by phase. + +For example, a retail series may historically be dominated by an annual pattern, yet after a major platform change or supply shock, shorter promotional cycles may become more predictive than the old annual rhythm. 
In that case, holding the original seasonal factor fixed can lock the forecast into an outdated regime. Dynamic updating allows the decomposition to respond as the series evolves. + +So the practical decision is not merely “stability versus adaptability.” It is a question of whether the analyst believes the lag structure is itself part of the stable signal or part of the changing environment. + +A useful empirical guide is out-of-sample validation. One may reserve a holdout period, compare static and dynamic forecasts over that window, and select the updating rule that yields better predictive accuracy. In that sense, the static-versus-dynamic decision is not purely philosophical. It can be treated as a forecasting design choice subject to cross-validation. + +--- + +## Prediction Intervals for Forecasts + +Point forecasts are only one part of the forecasting problem. Analysts also need measures of uncertainty. + +Because the NNS framework is nonparametric, forecast intervals are constructed without Gaussian error assumptions. Instead, uncertainty can be propagated using the maximum entropy bootstrap machinery developed in Chapter 17. + +The logic is straightforward: + +1. generate replicates consistent with the forecast path and dependence structure, +2. compute the implied distribution of future outcomes, +3. extract lower and upper predictive bounds from directional quantiles. + +This produces prediction intervals that are aligned with the empirical distributional shape of the series rather than with a parametric error law. + +Thus the directional framework provides not only nonlinear point forecasts, but also **distribution-free forecast uncertainty quantification**. + +A bit more concretely, suppose the fitted model yields a forecast path + +\[ +\hat X_{T+1}, \dots, \hat X_{T+H}. +\] + +The bootstrap procedure does not assume i.i.d. Gaussian residuals around that path. 
Instead, it constructs synthetic continuations that preserve the rank structure and dependence features of the observed series as closely as possible. Each bootstrap replicate yields an alternative future trajectory, and the collection of such trajectories forms an empirical predictive distribution at each horizon. + +If the series is asymmetric, heavy-tailed, or exhibits occasional bursts, those features can appear in the predictive distribution instead of being averaged away by a normal approximation. Directional lower and upper tail functionals can then be used to extract forecast bands. In this sense, the interval forecast is not an accessory to the point forecast. It is the distributional analogue of the same nonparametric logic: preserve the observed structure first, summarize uncertainty second. + +In practice, one might generate a large number of bootstrap replicates, often on the order of hundreds or thousands, then evaluate the empirical future distribution at each horizon. If a central \(100(1-\alpha)\%\) interval is desired, the lower and upper bounds can be extracted from directional quantiles corresponding to \(\alpha/2\) and \(1-\alpha/2\). The exact number of replicates is an accuracy-versus-computation choice, but the principle is always the same: empirical resampled paths replace parametric error formulas. + +--- + +## Relation to Earlier NNS Chapters + +Time-series modeling in NNS is not an isolated technique. It is a direct extension of earlier ideas. + +### From Chapter 19 + +The forecast engine is built from recursive conditional estimation. + +### From Chapter 22 + +The local regression on component series inherits the data-adaptive bandwidth logic of partition estimators. + +### From Chapter 22 + +Forecasting is just regression where the predictor is lagged time index or lagged structure and the response is the next component value. + +### From Chapter 24 + +Aggregation across multiple lag structures is an ensemble of directional learners. 
+ +So the time-series framework is not an exception to the book’s theory. It is one of its most natural applications. + +--- + +## Comparison with ARIMA + +ARIMA remains one of the benchmark tools of time-series forecasting. + +Its strengths are well known: + +- interpretable lag operators, +- strong theory under stationarity, +- effective performance on linear stochastic dynamics. + +But its limitations are equally clear when viewed through the directional lens. + +### Structural form + +ARIMA assumes a linear dependence structure after differencing. NNS does not. + +### Stationarity + +ARIMA is built around stationarity and invertibility conditions. NNS forecasting does not require the series to satisfy a stationary parametric model in level. + +### Identification burden + +ARIMA requires order selection and specification diagnostics. NNS shifts the task from parametric identification to predictive lag decomposition. + +### Nonlinearity + +ARIMA can approximate some nonlinear behavior through transformations or hybridization, but nonlinearity is not native to the model. In NNS, it is native. + +This does not mean ARIMA is obsolete. It means that ARIMA is best understood as a special, linear, tightly structured case of a broader forecasting problem. + +For balance, however, NNS does not eliminate modeling choices. It replaces ARIMA’s order-identification problem with choices about lag selection, regression method, aggregation weights, and updating scheme. The difference is methodological rather than absolute: in NNS, these choices are naturally evaluated by predictive performance rather than by adherence to a pre-specified parametric identification protocol. + +--- + +## Comparison with ETS Models + +ETS methods model time series through combinations of + +- error, +- trend, +- and seasonality, + +typically using exponential smoothing recursions and state-space interpretations. 
+ +These methods are often highly effective, especially on business forecasting problems with stable level, trend, and seasonal components. + +Relative to ETS, the NNS approach differs in several ways. + +### Component meaning + +ETS decomposes the series into latent level, trend, and seasonality states. +NNS decomposes it into lag-defined predictive subsets. + +### Smoothing mechanism + +ETS uses recursive smoothing equations. +NNS uses local regression and weighted aggregation across component series. + +### Parametric structure + +ETS specifies an updating architecture in advance. +NNS lets the predictive structure emerge from the data. + +### Nonlinear interactions + +ETS can adapt smoothly, but it is not inherently designed for rich nonlinear autoregressive geometry. +NNS is. + +The practical difference is conceptual: + +ETS smooths a presumed component architecture. +NNS learns a predictive architecture from subset behavior. + +A further distinction is distributional. Many ETS formulations are estimated in likelihood-based frameworks tied to specific error models, often Gaussian or close variants. The NNS approach imposes no such distributional law on the series or on the forecast errors. + +--- + +## When the NNS Approach Is Especially Useful + +The nonparametric time-series framework is particularly attractive when one or more of the following hold: + +- the series exhibits nonlinear cyclic behavior, +- multiple seasonalities are present, +- model stationarity is doubtful, +- lag effects are structurally asymmetric, +- parametric identification is fragile, +- or prediction accuracy matters more than adherence to a classical stochastic specification. + +Examples include: + +- retail and transaction flows, +- cyclical economic indicators, +- energy demand, +- financial and commodity time series, +- and operational processes with threshold-driven dynamics. + +These are precisely the settings where local structure matters more than global parametric elegance. 
+ +--- + +## Limitations + +A nonparametric forecasting method is not free of tradeoffs. + +### Primarily univariate in this chapter + +The methods developed here focus on a single series. Cross-series interactions are deferred to Chapter 26. + +### Data requirements + +Because prediction is learned from historical structure, sparse component series may limit reliability for very large lag lengths. + +### Computational cost + +Searching many seasonal combinations and fitting nonlinear regressions can be more computationally intensive than fitting a simple linear ARIMA. + +### Interpretability of dynamics + +A fitted ARIMA equation provides direct coefficients. NNS instead provides a predictive mechanism based on component regressions and weights. This is often more flexible, but less compact as a closed-form law. + +These limitations are real. But they are the cost of avoiding the stronger assumptions of parametric time-series models. + +The multivariate extension is natural in principle, but not automatic in implementation. Once several series enter, one must distinguish self-dependence from cross-dependence, align potentially different frequencies, and account for lead-lag structure across variables. + +--- + +## Leakage-Safe Backtesting Protocol + +Forecasting performance must be assessed with strict time-order preservation. A leakage-safe protocol is: + +- **Rolling-origin evaluation**: choose an initial training window `[1, T_0]`. +- **Forecast horizon definition**: for each origin `t`, produce forecasts for `t+h` without using observations after `t`. +- **Expanding or sliding refit**: + - expanding window: train on `[1, t]`, or + - sliding window: train on `[t-w+1, t]`. +- **No future-informed preprocessing**: any scaling, interpolation, imputation, or feature construction must be computed using data available at origin `t` only. +- **Horizon-specific scoring**: report MAE/RMSE/coverage separately for each horizon `h` rather than pooling all horizons. 
+- **Interval calibration check**: compare nominal vs empirical coverage for prediction intervals across horizons. + +For seasonal component construction, lag selection must also be origin-specific; selecting global lags from the full sample before backtesting constitutes leakage. + +--- + + +## Summary + +This chapter developed nonparametric time-series modeling in the NNS framework. + +Its main contributions are fivefold. + +First, it reframed **time series as a subset regression problem**. Forecasting is treated as conditional estimation on lag-defined component series rather than as fitting a single global linear recursion. + +Second, it developed **seasonal decomposition by predictive concentration**. Seasonal structure is detected through reductions in coefficient of variation across component series, linking seasonality directly to forecastability. + +Third, it established **nonlinear autoregressive forecasting**. Component series are projected forward using nonparametric regression, allowing local curvature and asymmetric dynamics to enter the forecast natively. + +Fourth, it clarified how the book’s broader dependence framework extends into time. Lagged directional structure can be studied theoretically through co-partial-moment ideas, even though the chapter’s main forecasting routines operationalize time dependence through component regression and seasonality rather than through a standalone directional-autocorrelation statistic. + +Fifth, it clarified the chapter’s relationship to classical methods. ARIMA and ETS remain important special-purpose tools, but they impose structural assumptions that the NNS framework avoids. + +Taken together, these results show that the directional nonparametric framework extends naturally from cross-sectional estimation to temporal prediction. + +But the framework developed here remains fundamentally univariate. 
Once other series matter, the main difficulty is no longer just whether the past of \(X_t\) predicts its future, but whether lagged values of \(Y_t\), \(Z_t\), and other related processes alter that forecast in nonlinear and asymmetric ways. In that setting, univariate decomposition can miss cross-series lead-lag effects, common shocks, and mixed-frequency structure. + +The next chapter therefore generalizes the same ideas to **multivariate forecasting**, where multiple time series interact through directional dependence, lagged cross-variable structure, and mixed-frequency information. + +> **Further Reading / Examples** + +> For forecasting applications, including the tidal data example, see the [NNS Time-Series Forecasting Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#4-time-series-forecasting). This behavior is illustrated in the [tidal forecasting example](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/tides.html), where the seasonal decomposition captures the dominant 12-month cycle. + +> For prediction-interval calibration under nonstationarity — `NNS.ARMA.optim` benchmarked against conformal-prediction methods on coverage and the Winkler interval score — see the [time-series prediction-interval benchmark](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns_arma_conformal_benchmark_report.md). diff --git a/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd b/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd new file mode 100644 index 0000000..908718a --- /dev/null +++ b/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd @@ -0,0 +1,663 @@ +# Multivariate Forecasting + +Chapter 24 developed the NNS approach to univariate forecasting. 
A single series can be projected forward by treating future values as a nonlinear autoregression problem, estimating the relation between current and lagged values without imposing a fixed parametric law, and then extrapolating the series using directional nonparametric structure. + +But many forecasting problems are not univariate. + +Macroeconomic indicators move together. +Financial variables transmit shocks across markets. +Operational systems contain multiple indicators observed at different frequencies. +In all such cases, the future of one variable depends not only on its own past, but also on the lagged history of other variables. + +This chapter extends the forecasting framework from one time series to a **system of time series**. + +The package implementation is `NNS.VAR`, a **nonparametric vector autoregressive model incorporating `NNS.ARMA` estimates of variables into `NNS.reg` for a multivariate time-series forecast**. Its purpose is explicit: combine univariate nonlinear time-series forecasting with multivariate nonlinear regression so that each series can borrow information from the others while still retaining its own temporal structure. + +Whenever this chapter appeals to mean-split regression behavior (local refinement with occupancy control), the theoretical reference point is Chapter 18: the consistency conditions for recursive mean-split estimation under shrinking local diameters and growing local sample support. + +A second implementation, `NNS.nowcast`, wraps this framework for mixed-frequency macroeconomic forecasting and nowcasting, using a built-in panel of Federal Reserve style indicators and passing them to `NNS.VAR` with monthly alignment and a 12-lag structure. + +The key ideas of the chapter are therefore: + +- multivariate time-series dependence, +- nonlinear vector autoregressive extensions, +- directional cross-variable interactions, +- mixed-frequency inputs, +- and empirical forecasting systems built from these components. 
+ +--- + +## From Univariate to Multivariate Forecasting + +Let + +\[ +X_t = +\begin{pmatrix} +X_{1t}\\ +X_{2t}\\ +\vdots\\ +X_{pt} +\end{pmatrix} +\in \mathbb{R}^p +\] + +denote a \(p\)-dimensional time series. + +In univariate forecasting, we model a future value as + +\[ +X_{t+h} = f(X_t, X_{t-1}, X_{t-2}, \dots) + \varepsilon_{t+h}. +\] + +In the multivariate setting, each component may depend on the lagged history of every variable in the system: + +\[ +X_{j,t+h} += +f_j\!\bigl( +X_{1,t}, X_{1,t-1}, \dots, +X_{2,t}, X_{2,t-1}, \dots, +X_{p,t}, X_{p,t-1}, \dots +\bigr) ++ +\varepsilon_{j,t+h}, +\qquad j=1,\dots,p. +\] + +So the problem is no longer to forecast a single path in isolation. It is to estimate a **joint dynamic structure** in which variables affect one another through time. + +This is the motivation for vector autoregression. + +--- + +## Classical VAR and Its Limits + +A classical vector autoregressive model of order \(k\), VAR\((k)\), is written + +\[ +X_t = c + A_1 X_{t-1} + A_2 X_{t-2} + \cdots + A_k X_{t-k} + \varepsilon_t, +\] + +where \(c\) is a vector of intercepts, \(A_1,\dots,A_k\) are coefficient matrices, and \(\varepsilon_t\) is an innovation vector. + +This formulation is powerful because it allows each variable to depend on lagged values of all the others. But it also imposes several strong restrictions. + +### Linearity + +All effects enter linearly through the coefficient matrices. Threshold behavior, asymmetry, nonlinear interactions, and regime changes are not modeled directly. + +### Uniform lag structure + +Classical VAR usually imposes the same lag order across all variables, even when different variables operate naturally at different horizons. + +### Parametric residual thinking + +Forecast construction and inference are built around residual assumptions that may become fragile in heavy-tailed, asymmetric, or nonlinear environments. 
+ +### Mixed-frequency difficulty + +Variables observed monthly, quarterly, weekly, or daily do not fit naturally into the same linear lag system without additional modeling layers. + +The NNS approach keeps the useful intuition of cross-variable lag forecasting while relaxing these structural restrictions. + +--- + +## Nonparametric Vector Autoregression + +`NNS.VAR` reframes vector autoregression as a nonlinear nonparametric learning problem. + +The implementation has a clear architecture. + +### Stage 1: complete each series individually + +Each variable is first interpolated where values are missing and extrapolated forward using `NNS.ARMA`. The returned object explicitly includes + +- `interpolated_and_extrapolated`, +- `univariate`, +- `multivariate`, +- `ensemble`, +- and `relevant_variables`. + +This design is important. The multivariate forecast does not replace the univariate forecast. It is built on top of it. + +### Stage 2: construct lagged predictors + +The completed multivariate panel is transformed into a lag matrix. + +### Stage 3: reduce lagged predictors + +For each target variable, the lagged predictor set is screened using one of several relevance measures and a thresholding rule. + +### Stage 4: estimate nonlinear multivariate forecasts + +The retained lagged predictors are passed into the multivariate regression system through `NNS.stack` and `NNS.reg`. + +### Stage 5: combine univariate and multivariate information + +The final forecast is an ensemble of the univariate and multivariate components. + +So the “VAR” terminology remains conceptually appropriate, but the mechanism is no longer a linear matrix recursion. It is a **nonlinear regression surface over lagged multivariate data**. 
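To make Stages 2 through 5 concrete, here is a minimal base-R sketch on simulated data. It is not the `NNS.VAR` implementation. The assumptions are: `lm` replaces `NNS.stack` and `NNS.reg`, an AR(1) fit replaces `NNS.ARMA`, absolute Spearman correlation serves as the relevance score, the threshold `theta` is fixed by hand rather than learned, and the ensemble uses equal weights. Stage 1 is skipped because the simulated panel has no missing values.

```r
set.seed(1)
n  <- 200
x1 <- as.numeric(arima.sim(list(ar = 0.6), n = n))
x2 <- c(0, 0.8 * x1[-n]) + rnorm(n, sd = 0.3)   # x2 follows x1 with a one-period lag
panel <- cbind(x1 = x1, x2 = x2)

# Stage 2: lag matrix; embed() lays out columns as [lag 0 block, lag 1 block, ...]
tau <- 2
E <- embed(panel, tau + 1)
colnames(E) <- as.vector(outer(colnames(panel), 0:tau, paste, sep = "_lag"))
y <- E[, "x2_lag0"]                                   # target: current x2
Z <- E[, !grepl("_lag0$", colnames(E)), drop = FALSE] # predictors: lags 1..tau

# Stage 3: relevance screening with a hand-picked threshold (assumption)
score <- apply(Z, 2, function(z) abs(cor(y, z, method = "spearman")))
theta <- 0.7
keep  <- names(score)[score > theta]
if (length(keep) == 0) keep <- colnames(Z)   # fall back to the full lag set

# Stage 4: multivariate one-step forecast (linear stand-in)
fit   <- lm(y ~ ., data = data.frame(y = y, Z[, keep, drop = FALSE]))
new_z <- as.vector(t(panel[n:(n - tau + 1), , drop = FALSE]))
names(new_z) <- colnames(Z)
multi <- unname(predict(fit, newdata = as.data.frame(as.list(new_z[keep]))))

# Univariate one-step forecast (AR(1) stand-in)
ar_fit <- ar(x2, order.max = 1, aic = FALSE)
uni    <- as.numeric(predict(ar_fit, newdata = x2, n.ahead = 1)$pred)

# Stage 5: equal-weight ensemble (assumption)
ensemble <- 0.5 * uni + 0.5 * multi
```

Because `x2` is generated from lagged `x1`, the screening step should retain `x1_lag1`. That retained set is the kind of information the `relevant_variables` output is described as reporting.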
+ +--- + +## Lagged Predictor Geometry + +Suppose the observed system is stored in a matrix + +\[ +V = +\begin{bmatrix} +X_{11} & X_{21} & \cdots & X_{p1}\\ +X_{12} & X_{22} & \cdots & X_{p2}\\ +\vdots & \vdots & \ddots & \vdots\\ +X_{1T} & X_{2T} & \cdots & X_{pT} +\end{bmatrix}. +\] + +The first structural step is to generate lagged copies of each column. For a lag depth \(\tau\), these predictors include terms such as + +\[ +X_{j,t},\; X_{j,t-1},\; X_{j,t-2},\; \dots,\; X_{j,t-\tau}. +\] + +A major advantage of `NNS.VAR` is that the lag argument `tau` is flexible. + +It may be + +- a single positive integer, applying the same lag depth to every variable, +- a vector, assigning a single lag choice to each variable, +- or a list, assigning multiple lags to each variable separately. + +Thus the lag structure need not be homogeneous. + +One variable may use only lag 1, another may use lags 1 through 6, and a third may use a sparse pattern such as \(1, 3, 12\). This flexibility is especially important in economic and financial systems, where different variables evolve over different time scales. + +Mathematically, the forecasting feature vector for one horizon may be written as + +\[ +Z_t += +\bigl( +X_{1,t}, X_{1,t-1}, \dots, X_{1,t-\tau_1}, +X_{2,t}, X_{2,t-1}, \dots, X_{2,t-\tau_2}, +\dots, +X_{p,t}, X_{p,t-1}, \dots, X_{p,t-\tau_p} +\bigr). +\] + +The multivariate forecast for variable \(j\) is then + +\[ +\hat X_{j,t+h} = f_j(Z_t), +\] + +with \(f_j\) estimated nonparametrically. + +--- + +## Mixed-Frequency Inputs + +A central practical challenge in multivariate forecasting is **mixed-frequency data**. + +Examples include: + +- monthly inflation with quarterly GDP, +- weekly claims with monthly industrial production, +- daily market variables with monthly macroeconomic releases. + +If these series are aligned onto a common calendar, the lower-frequency variables necessarily contain missing entries at higher-frequency timestamps. 
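A small base-R illustration of this alignment effect follows. The values and column names are invented for the example. A quarterly series placed on a monthly calendar is observed only in quarter-end months, so two of every three entries are `NA`.

```r
# Monthly calendar for one year
months <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)

# A monthly indicator (illustrative values)
monthly <- round(100 + cumsum(rep(0.2, 12)), 1)

# A quarterly indicator observed only in March, June, September, December
quarterly_dates  <- months[c(3, 6, 9, 12)]
quarterly_values <- c(2.1, 1.8, 2.4, 2.0)

# Align both series on the monthly index; unmatched months become NA
panel <- data.frame(date = months, monthly = monthly)
panel$quarterly <- quarterly_values[match(panel$date, quarterly_dates)]

n_missing <- sum(is.na(panel$quarterly))
```

Here `n_missing` counts the eight months with no quarterly observation, which is the gap pattern that any mixed-frequency method must handle.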
+ +Classical multivariate methods often require specialized machinery for this. The NNS framework instead treats the problem as a nonlinear interpolation and extrapolation task before multivariate estimation begins. + +Let + +\[ +V_t = (X_{1t},\dots,X_{pt}), +\] + +with some entries unobserved because the reporting frequencies differ. `NNS.VAR` explicitly returns a matrix called `interpolated_and_extrapolated`, whose purpose is to replace missing values in the original panel using interpolation and univariate extrapolation. This is not an incidental preprocessing step. It is foundational to the mixed-frequency design of the method. + +Conceptually, the procedure is: + +1. place all series on a common index, +2. fill interior gaps via interpolation, +3. forecast trailing missing values using univariate nonlinear forecasting, +4. build the lagged multivariate system on the completed panel. + +So mixed-frequency forecasting is handled natively, rather than being outsourced to a separate parametric state-space model. + +--- + +## Dimension Reduction and Directional Relevance + +Once the lagged panel has been created, not all lagged predictors are equally informative for each target variable. + +`NNS.VAR` therefore applies a dimension-reduction step before forming the multivariate forecast. The user can choose among four relevance schemes: + +- `"cor"`: absolute Spearman correlation, +- `"NNS.dep"`: nonlinear dependence weights, +- `"NNS.caus"`: directional causation weights, +- `"all"`: the average of the three relevance matrices. + +This is a substantive departure from classical VAR, where the usual practice is to include all lags up to the chosen order unless restrictions are imposed manually. + +Here the guiding question is: + +**Which lagged variables are actually relevant for forecasting this target?** + +The implementation answers this in two steps. 
+ +### Step 1: compute relevance scores + +For a target variable \(Y\) and lagged predictors \(Z_1,\dots,Z_m\), a score \(r_\ell\) is computed for each predictor. + +Depending on `dim.red.method`, these are: + +\[ +r_\ell = |\rho_S(Y,Z_\ell)|, +\] + +or the nonlinear dependence weight from `NNS.dep`, or the directional causation weight from `NNS.caus`, or their average. + +### Step 2: apply a threshold + +These relevance scores are then compared to the threshold selected by the preceding `NNS.stack` dimension-reduction step. A lagged predictor survives only if its score exceeds that threshold. + +So the selection rule is + +\[ +Z_\ell \text{ is retained} +\quad \Longleftrightarrow \quad +r_\ell > \theta, +\] + +where \(\theta\) is the learned dimension-reduction threshold. + +If no lag exceeds the threshold, the implementation falls back to the full lagged predictor set rather than discarding the multivariate structure entirely. + +This deserves emphasis. The reduction step is not just “screening by method name.” It is a concrete thresholding rule applied to lag-specific relevance scores. + +--- + +## Directional Cross-Variable Interactions + +Why are these relevance measures important? + +Because cross-variable forecasting effects are not generally linear, symmetric, or monotone. + +A lagged predictor may matter because: + +- it moves with the target linearly, +- it influences the target nonlinearly, +- or it exhibits directional causal strength that would be poorly summarized by a simple coefficient. + +The NNS framework is designed precisely for such cases. + +If variable \(X_k\) influences variable \(X_j\) only during downturns, only above a threshold, or only through asymmetric responses, then a linear VAR coefficient can understate or even obscure the relationship. By contrast, dependence weights from `NNS.dep` and directional weights from `NNS.caus` are built to recognize these nonlinear and asymmetric structures. 
+ +So the multivariate forecasting problem becomes one of **discovering directional cross-variable interactions in lag space**. + +--- + +## Multivariate Forecasting as Nonlinear Regression + +For each target variable \(X_j\), the forecasting problem can be written as + +\[ +X_{j,t+h} = f_j(Z_t), +\] + +where \(Z_t\) is the selected lagged feature vector after dimension reduction. + +The surface \(f_j\) is estimated through the NNS regression framework rather than through a linear coefficient matrix. This is the conceptual heart of `NNS.VAR`. + +Multivariate autoregression is therefore reinterpreted as a supervised learning problem: + +- the response is the future value of one variable, +- the predictors are lagged values of all variables, +- the regression surface is estimated nonparametrically. + +This is still autoregression, but with nonlinear function estimation replacing linear matrix multiplication. + +--- + +## Ensemble Construction + +One of the most important features of `NNS.VAR` is that the final output is not merely a multivariate forecast. + +For each target variable, the method returns: + +- a univariate forecast from `NNS.ARMA`, +- a multivariate forecast from the nonlinear regression stack, +- and an ensemble combining the two. + +Let + +\[ +\hat u_{j,t+h} +\] + +denote the univariate forecast for variable \(j\), and let + +\[ +\hat m_{j,t+h} +\] + +denote the multivariate forecast. The ensemble is + +\[ +\hat x_{j,t+h} += +w^{(u)}_j \hat u_{j,t+h} ++ +w^{(m)}_j \hat m_{j,t+h}, +\qquad +w^{(u)}_j + w^{(m)}_j = 1. +\] + +If `naive.weights = TRUE`, the method uses equal weights: + +\[ +w^{(u)}_j = w^{(m)}_j = \frac12. +\] + +If `naive.weights = FALSE`, the implementation is more precise than a simple “count of relevant variables.” It computes the proportion of selected lagged predictors that belong to the target variable itself. 
+ +Let + +- \(n_{\text{own},j}\) be the number of retained lagged predictors for target \(j\) whose base variable is \(j\), +- \(n_{\text{cross},j}\) be the number of retained lagged predictors whose base variable is not \(j\). + +Then + +\[ +w^{(u)}_j += +\frac{n_{\text{own},j}}{n_{\text{own},j} + n_{\text{cross},j}}, +\qquad +w^{(m)}_j += +1 - w^{(u)}_j += +\frac{n_{\text{cross},j}}{n_{\text{own},j} + n_{\text{cross},j}}. +\] + +This is an elegant weighting rule. + +- If the selected structure is mostly own-lags, the forecast leans univariate. +- If the selected structure is mostly cross-variable, the forecast leans multivariate. +- If there is no usable distinction, the implementation falls back to equal weighting. + +So the ensemble is not arbitrary averaging. It is a structural weighting mechanism derived from the selected lag geometry. + +--- + +## Relevant Variables as a Structural Map + +The output `relevant_variables` provides more than a convenience list. It gives a structural summary of the system learned by the model. + +For each target variable, it records the lagged predictors that survived the relevance threshold. This means the forecast comes with an interpretable dynamic footprint: + +- which own-lags matter, +- which other variables matter, +- and at which lag horizons they matter. + +This has an interpretation close to a learned nonlinear Granger map, but without restricting the analysis to linear coefficients or symmetric residual-based testing. + +The result is both predictive and explanatory. + +A variable may be forecast mainly by its own lagged history, by a sparse collection of related series, or by a broad network of interacting indicators. The model does not impose one of these patterns a priori; it discovers them from the data. + +--- + +## A Small Synthetic Illustration + +Suppose we jointly forecast monthly inflation \(I_t\), unemployment \(U_t\), and industrial production \(P_t\), using lags through month 12. 
+ +The full lag system would contain predictors such as + +\[ +I_{t-1}, I_{t-2}, \dots, I_{t-12},\; +U_{t-1}, U_{t-2}, \dots, U_{t-12},\; +P_{t-1}, P_{t-2}, \dots, P_{t-12}. +\] + +But after dimension reduction, the retained structures for different targets need not look alike. + +For inflation, the model might retain mainly + +\[ +I_{t-1},\ I_{t-2},\ P_{t-1}, +\] + +indicating that short-run inflation persistence and recent production conditions are the dominant signals. + +For unemployment, the model might retain + +\[ +U_{t-1},\ U_{t-12},\ I_{t-3}, +\] + +suggesting a mixture of local persistence, annual seasonality, and delayed inflation spillover. + +This synthetic example illustrates the main point: the NNS system does not treat all targets as sharing the same lag law. Each response learns its own sparse multivariate structure. + +That is exactly what a nonlinear vector autoregression should do. + +--- + +## Nowcasting with `NNS.nowcast` + +Nowcasting deserves special treatment because it is one of the most practical uses of the framework. + +`NNS.nowcast` is a wrapper built around `NNS.VAR`. It downloads a base panel of macroeconomic indicators, converts each to monthly frequency, merges them into a common panel, and then calls `NNS.VAR(econ_variables, h = h, tau = 12, nowcast = TRUE, naive.weights = naive.weights)`. + +This gives a ready-made mixed-frequency multivariate forecasting system. 
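+
+To make the panel structure concrete, the following base-R sketch aligns a made-up quarterly series to a monthly calendar. It only illustrates the shape of a mixed-frequency panel; it is not the code inside `NNS.nowcast`, and the column names and values are invented for the example.
+
+```r
+# Monthly calendar for one year
+months <- seq(as.Date("2024-01-01"), by = "month", length.out = 12)
+
+# A monthly indicator and a quarterly indicator (values are made up)
+monthly   <- data.frame(date = months, cpi = 100 + cumsum(rep(0.2, 12)))
+quarterly <- data.frame(date = months[c(3, 6, 9, 12)],
+                        gdp  = c(1.1, 1.3, 1.2, 1.4))
+
+# Left-join onto the monthly calendar: quarterly gaps appear as NA
+panel <- merge(monthly, quarterly, by = "date", all.x = TRUE)
+
+sum(is.na(panel$gdp))   # 8 months without a GDP observation
+```
+
+The `NA` entries are the mixed-frequency gaps that have to be filled by interpolation and extrapolation before the lagged system is constructed.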
+ +The built-in variable set includes indicators such as: + +- payroll employment, +- job openings, +- CPI and core CPI, +- durable goods orders, +- retail sales, +- unemployment, +- housing starts and permits, +- industrial production, +- personal income, +- exports and imports, +- construction spending, +- unit labor cost, +- real consumption spending, +- real GDP, +- weekly unemployment claims, +- Treasury rates and yield spreads, +- Federal Reserve balance sheet assets, +- commodity prices, +- federal funds, +- producer prices, +- labor force participation, +- money supply, +- and ADP payrolls. + +This is a practically meaningful nowcasting panel. It combines labor, inflation, output, rates, liquidity, spending, trade, and commodity signals in one monthly-aligned system. + +### What the nowcast output means + +The output of `NNS.nowcast` is not a single number. It returns the same five-part structure as `NNS.VAR`: + +- `interpolated_and_extrapolated`: the completed monthly panel after filling mixed-frequency gaps, +- `relevant_variables`: the selected lagged predictors for each target, +- `univariate`: the standalone `NNS.ARMA` forecasts, +- `multivariate`: the nonlinear multivariate forecasts, +- `ensemble`: the combined forecast. + +So in practice, nowcasting means more than “guess the current GDP number.” It means: + +1. align all indicators to the current monthly calendar, +2. fill missing lower-frequency observations, +3. estimate joint nonlinear lag relations, +4. produce univariate, multivariate, and ensemble projections, +5. inspect which variables were actually driving the current nowcast. + +This is a much richer object than a single real-time estimate. + +--- + +## Prediction Intervals + +As in Chapter 15, point forecasts alone are not enough. Forecast uncertainty must also be quantified. + +The examples associated with both `NNS.VAR` and `NNS.nowcast` construct prediction intervals using `NNS.meboot`, `LPM.VaR`, and `UPM.VaR`. 
+ +Suppose the forecasted path of a target series is + +\[ +\hat x_1,\hat x_2,\dots,\hat x_h. +\] + +Bootstrap replicates are generated from the forecast path: + +\[ +\hat x^{*(1)}, \hat x^{*(2)}, \dots, \hat x^{*(B)}. +\] + +Lower and upper forecast bounds are then obtained using partial-moment quantile operators: + +\[ +\text{Lower}_{\alpha} += +\operatorname{LPM.VaR}(\alpha/2,\cdot), +\qquad +\text{Upper}_{\alpha} += +\operatorname{UPM.VaR}(\alpha/2,\cdot). +\] + +In the nowcasting examples, this is illustrated directly for GDP. The GDP ensemble path is bootstrapped with `NNS.meboot`, and lower and upper confidence bounds are then computed from `LPM.VaR` and `UPM.VaR`. + +The conceptual advantage is consistent with the rest of NNS: + +- no Gaussian residual assumption is required, +- asymmetry is naturally allowed, +- and dependence-preserving synthetic replicates can be generated rather than relying on iid residual resampling. + +--- + +## Comparison with Classical VAR + +The difference between classical VAR and `NNS.VAR` can be summarized clearly. 
+ +| Feature | Classical VAR | `NNS.VAR` | +|---|---|---| +| Functional form | Linear | Nonlinear nonparametric | +| Lag structure | Usually common across variables | Scalar, vector, or list-based | +| Missing mixed-frequency values | Requires extra modeling machinery | Built into interpolation and extrapolation | +| Variable selection | Often manual or penalized | Built-in relevance screening with thresholding | +| Multivariate dependence | Linear coefficients | Dependence and causation-aware feature selection | +| Forecast output | One system forecast | Univariate, multivariate, relevant-variable map, and ensemble | +| Ensemble weighting | Usually absent | Equal or structure-based own-lag versus cross-lag weighting | +| Prediction intervals | Parametric or residual bootstrap | Partial-moment quantile intervals with `NNS.meboot` | + +The NNS version therefore preserves the idea that variables should be modeled jointly, but relaxes the linear and parametric restrictions around that idea. + +--- + +## Why the Directional Framework Matters + +The contribution of the directional framework is not merely technical. It changes what multivariate forecasting can represent. + +### Nonlinearity is primary + +Forecast relationships do not need to be approximated by a global linear law. + +### Cross-variable effects can be asymmetric + +Dependence and causation can be directional, state-dependent, and nonlinear. + +### Mixed-frequency panels become tractable + +Incomplete higher-frequency panels are treated as a forecasting geometry problem rather than as a separate special case. + +### Univariate structure is preserved + +Each variable retains its own autoregressive signature through the univariate component. + +### Forecast combination is structural + +The ensemble reflects how much of the retained lag structure is own-history versus cross-variable history. + +These are not minor refinements. 
Together they turn vector autoregression into a flexible directional learning architecture. + +--- + +## Empirical Applications + +The framework is naturally suited to systems in which variables move together, asymmetrically, and at different reporting frequencies. + +### Macroeconomics + +GDP, employment, inflation, industrial production, claims, rates, and spending variables are all natural candidates for mixed-frequency nonlinear nowcasting. + +### Finance + +Asset returns, volatility, spreads, rates, and macro variables often interact through threshold effects and asymmetric transmission. + +### Operations and supply chains + +Demand, inventories, labor, orders, and shipping activity may be observed on different calendars but still require joint forecasting. + +### Energy and commodities + +Production, inventories, spot prices, futures curves, and macro demand indicators evolve jointly but not linearly. + +In all these settings, the problem is not merely to fit a system. It is to discover how a dynamic system actually transmits information through time. + +--- + +## Leakage-Safe Validation in Multivariate and Mixed-Frequency Settings + +Before evaluating forecast accuracy, the data pipeline itself must be audited for information timing and release-order integrity. + +In multivariate nowcasting/forecasting, leakage control is more delicate because indicators arrive asynchronously. + +Use the following rules: + +- **Vintage-consistent features**: at forecast origin `t`, include only indicator values actually released by `t`. +- **Release-calendar alignment**: construct mixed-frequency regressors using publication timestamps, not finalized revised datasets. +- **Origin-wise dimension reduction**: relevance screening/thresholding must be recomputed within each training window. +- **Horizon-by-horizon evaluation**: report forecast and interval metrics by target horizon and by target variable. 
+- **Stability diagnostics**: track how the `relevant_variables` set changes over origins to distinguish signal from selection noise. + +These rules make `NNS.VAR`/`NNS.nowcast` evaluations comparable to production conditions and prevent optimistic bias from look-ahead information. + + +--- + + +## Summary + +This chapter extended the NNS forecasting framework from univariate series to **multivariate forecasting**. + +The main results are: + +- A multivariate time series is treated as a nonlinear autoregression problem in lagged multivariate space. +- `NNS.VAR` combines univariate `NNS.ARMA` forecasts with multivariate nonlinear regression forecasts. +- The lag structure is flexible: `tau` may be a scalar, vector, or list. +- Mixed-frequency inputs are handled through interpolation and extrapolation before constructing the lagged system. +- Dimension reduction may be based on correlation, nonlinear dependence, directional causation, or their average. +- A lagged predictor survives the reduction step only when its relevance score exceeds the learned threshold from the `NNS.stack` screening routine. +- The ensemble forecast is either equally weighted or weighted according to the share of retained predictors that are own-lags versus cross-variable lags. +- In classification-style forecasting tasks with skewed targets, NNS ensemble workflows can combine minority up-sampling and majority down-sampling across ensemble members to reduce imbalance bias. +- The output `relevant_variables` provides a structural map of the learned dynamic system. +- `NNS.nowcast` applies this architecture directly to a practical macroeconomic panel with mixed frequencies and monthly alignment. +- Prediction intervals may be built nonparametrically using `NNS.meboot`, `LPM.VaR`, and `UPM.VaR`. + +Classical vector autoregression taught an important lesson: forecasting improves when variables are modeled jointly. 
The NNS framework retains that insight but frees it from linearity, common-frequency restrictions, and rigid parametric structure. + +The result is a forecasting system for nonlinear, asymmetric, mixed-frequency multivariate data: + +**nonparametric vector autoregression as directional multivariate learning through time.** + +--- diff --git a/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd b/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd new file mode 100644 index 0000000..3e7e9d8 --- /dev/null +++ b/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd @@ -0,0 +1,45 @@ +# Conclusion and Next Steps + +The previous chapters developed the NNS framework from first principles to operational workflows in dependence analysis, distribution comparison, inference, prediction, and nonparametric estimation. + +A useful way to summarize the full arc is: + +1. **Directional building blocks** (Chapters 1–3): partial moments preserve sign and magnitude information that symmetric summaries discard. +2. **Dependence and causation** (Part III): directional co-moments, copulas, and recursive decomposition expose asymmetric relationships and lead/lag structure. +3. **Inference and comparison** (Part IV): continuous degree-one probability representations remove finite-sample discretization bias and support robust distributional comparisons. +4. **Estimation and forecasting** (Part V): recursive nonparametric systems turn the same directional primitives into predictive tools without restrictive parametric assumptions. + +Taken together, these results support the book's unifying claim: **one directional probability language can connect theory, diagnostics, and implementation across tasks that are often taught separately.** + +## What the Framework Has Achieved + +Across the text, several practical outcomes recur. + +- **Unified notation and implementation**: the same lower/upper partial moment operators appear in derivations and in executable R functions. 
+- **Bias-aware probability measurement**: degree-one partial moment ratios provide a continuous finite-sample correction to the step-function empirical CDF. +- **Distribution-free comparison tools**: NNS ANOVA and stochastic dominance diagnostics compare distributions directly, rather than reducing comparisons to mean/variance-only tests. +- **Adaptive predictive systems**: NNS regression and interval methods adapt to heteroskedastic, nonlinear structure using local empirical behavior. + +These capabilities matter most in real data settings where asymmetry, tail risk, and regime changes are central rather than exceptional. + +**Directional threshold analysis.** The framework has shown that the same lower and upper partial-moment operators generate not only distribution functions and quantiles, but also benchmark-sensitive threshold rules for adverse events. Degree zero recovers event-frequency calibration, while higher degrees permit calibration by adverse magnitude and extreme-deviation sensitivity. + +**Distribution-free probability control.** Partial moments support conservative tail-probability bounds through semivariance and higher-order directional dispersion measures. This connects descriptive nonparametrics to decision support without requiring strict distributional assumptions. + +**Finite-sample relevance under non-normality.** Partial moments are not merely conceptually distribution-free; they demonstrate improved finite-sample stability when data are skewed, heavy-tailed, or otherwise asymmetric. + +## Further Resources + +To continue beyond this book, the official implementation resources are maintained in three complementary locations: + +- **CRAN package**: +- **Vignettes and method walkthroughs**: +- **Hands-on examples in the GitHub repository**: + +A practical workflow is: + +1. Install and review the CRAN package documentation for function references and stable release behavior. +2. 
Work through the vignette set for topic-focused application patterns. +3. Use the GitHub examples index for extended, concrete scripts that can be adapted to your own data. + +Together, these resources provide deeper coverage of specialized applications, additional implementation detail, and more end-to-end examples than can be included in a single volume. diff --git a/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd b/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd new file mode 100644 index 0000000..c40dcec --- /dev/null +++ b/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd @@ -0,0 +1,106 @@ +# Appendix: Notation and Function Reference + +This appendix consolidates the notation used across the book and maps each object to its primary R implementation pattern. It is intended as a quick lookup for readers moving between theoretical sections and code-first workflows. + +Unless otherwise noted, all functions in this appendix come from the **NNS** package. In executable code, load the package once and then call functions directly as shown in the table entries below. 
+ +```r +library(NNS) +``` + +## Core directional operators and partial moments + +| Symbol | Definition | Interpretation | R function / pattern | +|---|---|---|---| +| $x^+$ | $\max(x,0)$ | Positive-part operator | `pmax(x, 0)` | +| $(X-t)^+$ | $\max(X-t,0)$ | Deviation above benchmark $t$ | internal to `UPM(...)` | +| $(t-X)^+$ | $\max(t-X,0)$ | Deviation below benchmark $t$ | internal to `LPM(...)` | +| $L_r(t;X)$ | $E[(t-X)_+^r]$ | Lower partial moment, degree $r$ | `LPM(r, t, X)` | +| $U_r(t;X)$ | $E[(X-t)_+^r]$ | Upper partial moment, degree $r$ | `UPM(r, t, X)` | +| $L_r/(L_r+U_r)$ | Degree-$r$ lower ratio | Directional CDF-style probability below $t$ | `LPM.ratio(r, t, X)` | +| $U_r/(L_r+U_r)$ | Degree-$r$ upper ratio | Directional probability above $t$ | `UPM.ratio(r, t, X)` | + +## Co-partial moments, dependence, and causation + +| Symbol | Definition / role | R function | +|---|---|---| +| $CoLPM, CoUPM$ | Concordant lower/upper co-partial moments | `Co.LPM(...)`, `Co.UPM(...)` | +| $DLPM, DUPM$ | Divergent lower/upper co-partial moments | `D.LPM(...)`, `D.UPM(...)` | +| $NNS.dep(X,Y)$ | Global nonlinear dependence measure | `NNS.dep(x, y)` | +| $NNS.copula(X,Y)$ | Nonparametric dependence geometry / copula view | `NNS.copula(x, y)` | +| $NNS.caus(X,Y)$ | Directional causation diagnostic | `NNS.caus(x, y)` | + +## Distribution comparison, dominance, and interval objects + +| Symbol | Definition / role | R function | +|---|---|---| +| $F^{(0)}(t)$ | Degree-zero empirical CDF (step measure) | `ecdf(x)(t)` or `LPM.ratio(0, t, x)` | +| $F^{(1)}(t)$ | Degree-one continuous CDF-style ratio | `LPM.ratio(1, t, x)` | +| $p = P(X' > Y')$ | Directional exceedance probability for pairwise comparison | estimated by cross-sample indicator averages | +| $\text{Certainty}_{\text{ANOVA}}$ | NNS ANOVA agreement certainty from CDF benchmark deviations ($1$ = strongest agreement) | `NNS.ANOVA(...)` | +| FSD / SSD / TSD | First-, second-, third-order stochastic dominance | 
`NNS.FSD(...)`, `NNS.SSD(...)`, `NNS.TSD(...)` | +| $Q^-_{d}(\alpha)$ | Lower degree-$d$ quantile | `LPM.VaR(alpha, degree = d, x)` | +| $Q^+_{d}(\alpha)$ | Upper degree-$d$ quantile | `UPM.VaR(alpha, degree = d, x)` | +| PI$_{1-\alpha}$ | Prediction interval $[Q^-_d(\alpha/2),Q^+_d(\alpha/2)]$ | `LPM.VaR(...)` + `UPM.VaR(...)` | + + + +`LPM.VaR(percentile, degree, variable)` +Lower-tail threshold operator obtained by inverting the degree-specific lower partial-moment probability representation. +Interpretation by degree: + +* `degree = 0`: empirical-CDF lower quantile, +* `degree = 1`: severity-weighted lower threshold based on directional magnitude, +* `degree = 2`: extreme-deviation-sensitive lower threshold. + +In finance, the degree-zero case is commonly called VaR, but the operator is more general than that label. + +`UPM.VaR(percentile, degree, variable)` +Upper-tail analog of `LPM.VaR`, used for right-tail threshold selection and interval construction. + + + +## Directional Decision Regions Crosswalk (Classical → NNS) + +To maintain continuity with Chapter 22's directional decision-region framing, the table below maps common classical statistics and procedures to their directional NNS counterparts. + +| Classical statistic / workflow | Typical classical role | Directional NNS counterpart | Reference chapter | Notes | +|---|---|---|---|---| +| Pearson correlation | Linear association summary | `NNS.dep(x, y)` | Chapter 10 | Captures nonlinear and asymmetric dependence, not only linear co-movement. | +| Parametric VaR / empirical quantile VaR | Tail-loss thresholding | `LPM.VaR(alpha, degree, x)` *(degree-dependent)* | Chapter 16 | Degree controls sensitivity to tail severity beyond degree-0 quantiles. | +| Upper-tail quantile threshold | Right-tail risk/opportunity cutoff | `UPM.VaR(alpha, degree, x)` *(degree-dependent)* | Chapter 16 | Upper-tail analog to `LPM.VaR` for asymmetric interval construction. 
| +| Classical ANOVA (mean-comparison test) | Group-level location comparison | `NNS.ANOVA(...)` *(degree-dependent CDF benchmarking)* | Chapter 15 | Agreement certainty is benchmarked through directional CDF-style deviations. | +| Linear Granger-style directional inference | Lead-lag direction under linear structure | `NNS.caus(x, y)` | Chapter 14 | Directionality can be nonlinear and state dependent. | +| Copula / Joint Tail Dependency | Joint probability of concurrent outcomes | `Co.LPM(degree, target.x, target.y, x, y)` / `Co.UPM(degree, target.x, target.y, x, y)` | Chapter 4 | Co.LPM captures concurrent downside structure; Co.UPM is the upper-tail counterpart for joint directional events. | +| Mean-variance interval heuristics | Uncertainty bands under Gaussian assumptions | `LPM.VaR(...)` + `UPM.VaR(...)` *(degree-dependent bounds)* | Chapter 17 | Produces directional prediction intervals without normality assumptions. | + + +## Regression and forecasting workflow objects + +| Symbol | Definition / role | R function | Reference chapter | +|---|---|---|---| +| $\hat y = \hat E[Y\mid X]$ | NNS conditional mean estimate | `NNS.reg(x, y)` | Chapter 22 | +| Residual local distribution | Partition-level error distribution | via `NNS.reg(...)$Fitted.xy$residuals` outputs | Chapter 22 | +| $\widehat{PI}(x_0)$ | Conditional prediction interval at $x_0$ | `NNS.reg(..., point.est = x0, confidence.interval = ...)` | Chapter 17 | +| Regime-specific directional dependence | Time-local / state-local dependence | `NNS.dep(...)` on rolling/segmented windows | Chapter 10 | + +## A.3 Technical Note: Adaptive Order and Consistency Conditions + +Chapter 18 established two core consistency conditions for recursive mean-split regression: + +1. **Shrinking cell diameter** at each target location so local bias vanishes, +2. **Growing cell occupancy** so local sample averages stabilize. 
+ +When `order = NULL`, the implementation determines effective recursion depth per regressor from directional dependence with the response (`NNS.dep`-style logic). This modifies how quickly local cells contract across predictors, but it does not alter the fundamental structure of the consistency argument. + +High-dependence predictors are allocated deeper partitioning, so their local diameters contract faster in regions where signal is strong. Low-dependence predictors are partitioned more conservatively, preserving broader local averaging where aggressive refinement would primarily amplify noise. Occupancy control remains enforced through the minimum cell-size rule. + +The resulting estimator is therefore **locally adaptive in rate**: + +- in high-signal regions, convergence tracks a faster path closer to an oracle fixed-order choice for that local structure; +- in low-signal regions, the estimator intentionally retains coarser cells, exchanging some local bias for improved stability. + +A concise takeaway is: + +> Under the Chapter 18 regularity conditions (shrinking local diameters and diverging local occupancy), dependence-driven order allocation preserves the same bias–variance decomposition used for consistency arguments, while allowing refinement to concentrate where dependence signal is stronger. + +The key practical point is that internal adaptive order selection keeps the same consistency checklist for users—control occupancy and ensure progressive local refinement—while often improving finite-sample stability by avoiding unnecessary depth in weak-signal coordinates. 
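+
+As a closing cross-check on the core operator table in this appendix, the definitions of $L_r$, $U_r$, and the lower ratio can be written directly in base R. The `*_sketch` names are hypothetical stand-ins rather than the NNS functions, and the degree-zero branch assumes that observations equal to $t$ count as below the benchmark, which matches `ecdf`.
+
+```r
+# Base-R sketches of the tabulated definitions (not the NNS implementations)
+lpm_sketch <- function(r, t, x) {
+  if (r == 0) mean(x <= t) else mean(pmax(t - x, 0)^r)
+}
+upm_sketch <- function(r, t, x) {
+  if (r == 0) mean(x > t) else mean(pmax(x - t, 0)^r)
+}
+lpm_ratio_sketch <- function(r, t, x) {
+  l <- lpm_sketch(r, t, x)
+  l / (l + upm_sketch(r, t, x))
+}
+
+x <- c(-2, -1, 0, 1, 3)
+lpm_sketch(1, 0, x)         # (2 + 1 + 0 + 0 + 0) / 5 = 0.6
+upm_sketch(1, 0, x)         # (0 + 0 + 0 + 1 + 3) / 5 = 0.8
+lpm_ratio_sketch(1, 0, x)   # 0.6 / (0.6 + 0.8) = 3/7, about 0.4286
+lpm_ratio_sketch(0, 0, x)   # 0.6, the same value as ecdf(x)(0)
+```
+
+On this small sample the degree-zero ratio reproduces the empirical CDF at the benchmark, while the degree-one ratio weights each observation by its distance from the benchmark.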
diff --git a/tools/NNS/book/images/ARMA_ex.png b/tools/NNS/book/images/ARMA_ex.png new file mode 100644 index 0000000..a387107 Binary files /dev/null and b/tools/NNS/book/images/ARMA_ex.png differ diff --git a/tools/NNS/book/images/ARMA_optim.png b/tools/NNS/book/images/ARMA_optim.png new file mode 100644 index 0000000..4a1632d Binary files /dev/null and b/tools/NNS/book/images/ARMA_optim.png differ diff --git a/tools/NNS/book/images/ARMA_optim_h_50.png b/tools/NNS/book/images/ARMA_optim_h_50.png new file mode 100644 index 0000000..bd8f66f Binary files /dev/null and b/tools/NNS/book/images/ARMA_optim_h_50.png differ diff --git a/tools/NNS/book/images/CDFs_1.png b/tools/NNS/book/images/CDFs_1.png new file mode 100644 index 0000000..693ee3a Binary files /dev/null and b/tools/NNS/book/images/CDFs_1.png differ diff --git a/tools/NNS/book/images/CDFs_2.png b/tools/NNS/book/images/CDFs_2.png new file mode 100644 index 0000000..6dc5a81 Binary files /dev/null and b/tools/NNS/book/images/CDFs_2.png differ diff --git a/tools/NNS/book/images/NNS_hex_sticker.png b/tools/NNS/book/images/NNS_hex_sticker.png new file mode 100644 index 0000000..a0271fb Binary files /dev/null and b/tools/NNS/book/images/NNS_hex_sticker.png differ diff --git a/tools/NNS/book/images/NNSmc_1.png b/tools/NNS/book/images/NNSmc_1.png new file mode 100644 index 0000000..16cb75f Binary files /dev/null and b/tools/NNS/book/images/NNSmc_1.png differ diff --git a/tools/NNS/book/images/NNSmc_1_tgt_drift.png b/tools/NNS/book/images/NNSmc_1_tgt_drift.png new file mode 100644 index 0000000..9be0802 Binary files /dev/null and b/tools/NNS/book/images/NNSmc_1_tgt_drift.png differ diff --git a/tools/NNS/book/images/boost_freq.png b/tools/NNS/book/images/boost_freq.png new file mode 100644 index 0000000..268b7fa Binary files /dev/null and b/tools/NNS/book/images/boost_freq.png differ diff --git a/tools/NNS/book/images/ch11_raw_copula.png b/tools/NNS/book/images/ch11_raw_copula.png new file mode 100644 index 
0000000..6dfe04f Binary files /dev/null and b/tools/NNS/book/images/ch11_raw_copula.png differ diff --git a/tools/NNS/book/images/ch11_transformed_copula.png b/tools/NNS/book/images/ch11_transformed_copula.png new file mode 100644 index 0000000..9764566 Binary files /dev/null and b/tools/NNS/book/images/ch11_transformed_copula.png differ diff --git a/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png b/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png new file mode 100644 index 0000000..ef3aea1 Binary files /dev/null and b/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png differ diff --git a/tools/NNS/book/images/ch15_reg_conf_int.png b/tools/NNS/book/images/ch15_reg_conf_int.png new file mode 100644 index 0000000..53589d9 Binary files /dev/null and b/tools/NNS/book/images/ch15_reg_conf_int.png differ diff --git a/tools/NNS/book/images/ch17_iid_mc_sim.png b/tools/NNS/book/images/ch17_iid_mc_sim.png new file mode 100644 index 0000000..733c8d1 Binary files /dev/null and b/tools/NNS/book/images/ch17_iid_mc_sim.png differ diff --git a/tools/NNS/book/images/ch17_meboot_mc_sim.png b/tools/NNS/book/images/ch17_meboot_mc_sim.png new file mode 100644 index 0000000..b88b9e7 Binary files /dev/null and b/tools/NNS/book/images/ch17_meboot_mc_sim.png differ diff --git a/tools/NNS/book/images/ch17_meboot_orig.png b/tools/NNS/book/images/ch17_meboot_orig.png new file mode 100644 index 0000000..3dda527 Binary files /dev/null and b/tools/NNS/book/images/ch17_meboot_orig.png differ diff --git a/tools/NNS/book/images/ch18_part_1.png b/tools/NNS/book/images/ch18_part_1.png new file mode 100644 index 0000000..cb49d67 Binary files /dev/null and b/tools/NNS/book/images/ch18_part_1.png differ diff --git a/tools/NNS/book/images/ch18_part_2.png b/tools/NNS/book/images/ch18_part_2.png new file mode 100644 index 0000000..24abeb7 Binary files /dev/null and b/tools/NNS/book/images/ch18_part_2.png differ diff --git a/tools/NNS/book/images/ch20_kmeans_comp.png b/tools/NNS/book/images/ch20_kmeans_comp.png new 
file mode 100644 index 0000000..47dfe8c Binary files /dev/null and b/tools/NNS/book/images/ch20_kmeans_comp.png differ diff --git a/tools/NNS/book/images/ch21_part_reg.png b/tools/NNS/book/images/ch21_part_reg.png new file mode 100644 index 0000000..c7d1a4d Binary files /dev/null and b/tools/NNS/book/images/ch21_part_reg.png differ diff --git a/tools/NNS/book/images/ch24_uni_ts.png b/tools/NNS/book/images/ch24_uni_ts.png new file mode 100644 index 0000000..759cea8 Binary files /dev/null and b/tools/NNS/book/images/ch24_uni_ts.png differ diff --git a/tools/NNS/book/images/ch3_cdf_lpm0.png b/tools/NNS/book/images/ch3_cdf_lpm0.png new file mode 100644 index 0000000..48c2c9e Binary files /dev/null and b/tools/NNS/book/images/ch3_cdf_lpm0.png differ diff --git a/tools/NNS/book/images/multi_impute.png b/tools/NNS/book/images/multi_impute.png new file mode 100644 index 0000000..776770b Binary files /dev/null and b/tools/NNS/book/images/multi_impute.png differ diff --git a/tools/NNS/book/images/overview_arma.png b/tools/NNS/book/images/overview_arma.png new file mode 100644 index 0000000..23b9fed Binary files /dev/null and b/tools/NNS/book/images/overview_arma.png differ diff --git a/tools/NNS/book/images/overview_reg.png b/tools/NNS/book/images/overview_reg.png new file mode 100644 index 0000000..9526dfb Binary files /dev/null and b/tools/NNS/book/images/overview_reg.png differ diff --git a/tools/NNS/book/images/uni_impute.png b/tools/NNS/book/images/uni_impute.png new file mode 100644 index 0000000..3473b8c Binary files /dev/null and b/tools/NNS/book/images/uni_impute.png differ diff --git a/tools/NNS/book/index.Rmd b/tools/NNS/book/index.Rmd new file mode 100644 index 0000000..d80af78 --- /dev/null +++ b/tools/NNS/book/index.Rmd @@ -0,0 +1,60 @@ +--- +title: "Nonlinear Nonparametric Statistics: Using Partial Moments" +subtitle: "Second Edition" +author: "Fred Viole" +date: 2026 +site: bookdown::bookdown_site +output: bookdown::gitbook +documentclass: book 
+header-includes: + - \usepackage{xcolor} +--- + +# Preface {-} + +This is the **Second Edition** of *Nonlinear Nonparametric Statistics: Using Partial Moments*, updated and expanded by Fred Viole in 2026, building upon the foundational 2013 work with David Nawrocki. + +This book presents the **Nonlinear Nonparametric Statistics (NNS)** framework as a coherent toolkit for modeling dependence, uncertainty, prediction, and decision-making without imposing restrictive distributional assumptions. + +The chapters are organized to move from foundational concepts to practical modeling workflows: + +- first principles of nonlinear dependence and directional relationships, +- nonparametric methods for regression, classification, and density-based tasks, +- time-series forecasting frameworks for univariate and multivariate settings, +- and implementation guidance for applied research and production analytics. + +The central theme is consistent throughout: when data are asymmetric, heavy-tailed, nonlinear, or regime-sensitive, useful structure can still be extracted directly from the data-generating process using directional and nonparametric methods. + +## Executive Summary {-} + +This book is designed for readers who want mathematically grounded methods that remain practical in real-world settings where classical assumptions can fail. + +At a high level, the NNS framework emphasizes: + +- **distribution-agnostic modeling** rather than strict parametric family selection, +- **directional dependence and causation diagnostics** instead of purely symmetric association summaries, +- **nonlinear predictive systems** that can adapt to heterogeneous signal structures, +- and **modular workflows in R** so methods can be combined for exploratory analysis, forecasting, and risk assessment. + +Readers can use the text in two ways: + +1. **Sequentially**, as a complete conceptual arc from core definitions to advanced forecasting systems. +2. 
**As a reference**, by jumping directly to method-specific chapters and accompanying implementation examples. + +Whether your domain is economics, finance, operations, policy, or scientific research, the goal is the same: to provide robust, interpretable, and applied nonparametric tools for difficult data. + +## About the Examples Repository {-} + +This book is designed to be used alongside the companion examples repository: + +- + +The repository is organized as a practical application layer. Conceptual and theoretical development lives in the book, while reproducible scripts and end-to-end demonstrations live in the examples. + +A useful way to navigate both resources together is: + +1. Read the chapter for the theoretical framework and notation. +2. Open the matching section in `examples/README.md` for runnable code patterns. +3. Adapt those scripts to your own data and evaluate with your domain constraints. + +The examples repository is intended for hands-on implementation, not as a substitute for the proofs and derivations developed in the text. Keep the repository disclaimer in mind when applying any script directly to production or policy settings; examples are instructional templates and should be validated, stress-tested, and context-calibrated before operational use. diff --git a/tools/NNS/book/nns-book.log b/tools/NNS/book/nns-book.log new file mode 100644 index 0000000..9eca2e7 --- /dev/null +++ b/tools/NNS/book/nns-book.log @@ -0,0 +1,1850 @@ +This is XeTeX, Version 3.141592653-2.6-0.999998 (TeX Live 2026) (preloaded format=xelatex 2026.5.9) 16 MAY 2026 19:46 +entering extended mode + restricted \write18 enabled. + %&-line parsing enabled. 
+**nns-book.tex +(./nns-book.tex +LaTeX2e <2025-11-01> +L3 programming layer <2026-03-20> +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/book.cls +Document Class: book 2025/01/22 v1.4n Standard LaTeX document class +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/bk10.clo +File: bk10.clo 2025/01/22 v1.4n Standard LaTeX file (size option) +) +\c@part=\count271 +\c@chapter=\count272 +\c@section=\count273 +\c@subsection=\count274 +\c@subsubsection=\count275 +\c@paragraph=\count276 +\c@subparagraph=\count277 +\c@figure=\count278 +\c@table=\count279 +\abovecaptionskip=\skip49 +\belowcaptionskip=\skip50 +\bibindent=\dimen150 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/xcolor/xcolor.sty +Package: xcolor 2024/09/29 v3.02 LaTeX color extensions (UK) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-cfg/color.cfg +File: color.cfg 2016/01/02 v1.6 sample color configuration +) +Package xcolor Info: Driver file: xetex.def on input line 274. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-def/xetex.def +File: xetex.def 2025/11/01 v5.0p Graphics/color driver for xetex +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/mathcolor.ltx) +Package xcolor Info: Model `cmy' substituted by `cmy0' on input line 1349. +Package xcolor Info: Model `RGB' extended on input line 1365. +Package xcolor Info: Model `HTML' substituted by `rgb' on input line 1367. +Package xcolor Info: Model `Hsb' substituted by `hsb' on input line 1368. +Package xcolor Info: Model `tHsb' substituted by `hsb' on input line 1369. +Package xcolor Info: Model `HSB' substituted by `hsb' on input line 1370. +Package xcolor Info: Model `Gray' substituted by `gray' on input line 1371. +Package xcolor Info: Model `wave' substituted by `hsb' on input line 1372. 
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsmath.sty +Package: amsmath 2025/07/09 v2.17z AMS math features +\@mathmargin=\skip51 +For additional information on amsmath, use the `?' option. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amstext.sty +Package: amstext 2024/11/17 v2.01 AMS text +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsgen.sty +File: amsgen.sty 1999/11/30 v2.0 generic functions +\@emptytoks=\toks17 +\ex@=\dimen151 +)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsbsy.sty +Package: amsbsy 1999/11/29 v1.2d Bold Symbols +\pmbraise@=\dimen152 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsopn.sty +Package: amsopn 2022/04/08 v2.04 operator names +) +\inf@bad=\count280 +LaTeX Info: Redefining \frac on input line 233. +\uproot@=\count281 +\leftroot@=\count282 +LaTeX Info: Redefining \overline on input line 398. +LaTeX Info: Redefining \colon on input line 409. +\classnum@=\count283 +\DOTSCASE@=\count284 +LaTeX Info: Redefining \ldots on input line 495. +LaTeX Info: Redefining \dots on input line 498. +LaTeX Info: Redefining \cdots on input line 619. +\Mathstrutbox@=\box53 +\strutbox@=\box54 +LaTeX Info: Redefining \big on input line 721. +LaTeX Info: Redefining \Big on input line 722. +LaTeX Info: Redefining \bigg on input line 723. +LaTeX Info: Redefining \Bigg on input line 724. +\big@size=\dimen153 +LaTeX Font Info: Redeclaring font encoding OML on input line 742. +LaTeX Font Info: Redeclaring font encoding OMS on input line 743. +\macc@depth=\count285 +LaTeX Info: Redefining \bmod on input line 904. +LaTeX Info: Redefining \pmod on input line 909. +LaTeX Info: Redefining \smash on input line 939. +LaTeX Info: Redefining \relbar on input line 969. +LaTeX Info: Redefining \Relbar on input line 970. 
+\c@MaxMatrixCols=\count286 +\dotsspace@=\muskip17 +\c@parentequation=\count287 +\dspbrk@lvl=\count288 +\tag@help=\toks18 +\row@=\count289 +\column@=\count290 +\maxfields@=\count291 +\andhelp@=\toks19 +\eqnshift@=\dimen154 +\alignsep@=\dimen155 +\tagshift@=\dimen156 +\tagwidth@=\dimen157 +\totwidth@=\dimen158 +\lineht@=\dimen159 +\@envbody=\toks20 +\multlinegap=\skip52 +\multlinetaggap=\skip53 +\mathdisplay@stack=\toks21 +LaTeX Info: Redefining \[ on input line 2950. +LaTeX Info: Redefining \] on input line 2951. +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/amssymb.sty +Package: amssymb 2013/01/14 v3.01 AMS font symbols +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/amsfonts.sty +Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support +\symAMSa=\mathgroup4 +\symAMSb=\mathgroup5 +LaTeX Font Info: Redeclaring math symbol \hbar on input line 98. +LaTeX Font Info: Overwriting math alphabet `\mathfrak' in version `bold' +(Font) U/euf/m/n --> U/euf/b/n on input line 106. 
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/iftex/iftex.sty +Package: iftex 2024/12/12 v1.0g TeX engine tests +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math.sty (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3kernel/expl3.sty +Package: expl3 2026-03-20 L3 programming layer (loader) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3backend/l3backend-xetex.def +File: l3backend-xetex.def 2026-02-18 L3 backend support: XeTeX +\g__graphics_track_int=\count292 +\g__pdfannot_backend_int=\count293 +\g__pdfannot_backend_link_int=\count294 +)) +Package: unicode-math 2023/08/13 v0.8r Unicode maths in XeLaTeX and LuaLaTeX +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math-xetex.sty +Package: unicode-math-xetex 2023/08/13 v0.8r Unicode maths in XeLaTeX and LuaLaTeX +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3packages/xparse/xparse.sty +Package: xparse 2025-10-09 L3 Experimental document command parser +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3packages/l3keys2e/l3keys2e.sty +Package: l3keys2e 2025-10-09 LaTeX2e option processing using LaTeX3 keys +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec.sty +Package: fontspec 2025/09/29 v2.9g Font selection for XeLaTeX and LuaLaTeX +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec-xetex.sty +Package: fontspec-xetex 2025/09/29 v2.9g Font selection for XeLaTeX and LuaLaTeX +\l__fontspec_script_int=\count295 +\l__fontspec_language_int=\count296 +\l__fontspec_strnum_int=\count297 +\l__fontspec_tmp_int=\count298 +\l__fontspec_tmpa_int=\count299 +\l__fontspec_tmpb_int=\count300 +\l__fontspec_tmpc_int=\count301 +\l__fontspec_em_int=\count302 +\l__fontspec_emdef_int=\count303 +\l__fontspec_strong_int=\count304 +\l__fontspec_strongdef_int=\count305 +\l__fontspec_tmpa_dim=\dimen160 
+\l__fontspec_tmpb_dim=\dimen161 +\l__fontspec_tmpc_dim=\dimen162 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/fontenc.sty +Package: fontenc 2025/07/18 v2.1d Standard LaTeX package +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec.cfg))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/fix-cm.sty +Package: fix-cm 2020/11/24 v1.1t fixes to LaTeX +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/ts1enc.def +File: ts1enc.def 2001/06/05 v3.0e (jk/car/fm) Standard LaTeX file +LaTeX Font Info: Redeclaring font encoding TS1 on input line 47. +LaTeX Encoding Info: Redeclaring text command \capitalcedilla (encoding TS1) on input line 49. +LaTeX Encoding Info: Redeclaring text command \capitalogonek (encoding TS1) on input line 52. +LaTeX Encoding Info: Redeclaring text command \capitalgrave (encoding TS1) on input line 55. +LaTeX Encoding Info: Redeclaring text command \capitalacute (encoding TS1) on input line 56. +LaTeX Encoding Info: Redeclaring text command \capitalcircumflex (encoding TS1) on input line 57. +LaTeX Encoding Info: Redeclaring text command \capitaltilde (encoding TS1) on input line 58. +LaTeX Encoding Info: Redeclaring text command \capitaldieresis (encoding TS1) on input line 59. +LaTeX Encoding Info: Redeclaring text command \capitalhungarumlaut (encoding TS1) on input line 60. +LaTeX Encoding Info: Redeclaring text command \capitalring (encoding TS1) on input line 61. +LaTeX Encoding Info: Redeclaring text command \capitalcaron (encoding TS1) on input line 62. +LaTeX Encoding Info: Redeclaring text command \capitalbreve (encoding TS1) on input line 63. +LaTeX Encoding Info: Redeclaring text command \capitalmacron (encoding TS1) on input line 64. +LaTeX Encoding Info: Redeclaring text command \capitaldotaccent (encoding TS1) on input line 65. +LaTeX Encoding Info: Redeclaring text command \t (encoding TS1) on input line 66. 
+LaTeX Encoding Info: Redeclaring text command \capitaltie (encoding TS1) on input line 67. +LaTeX Encoding Info: Redeclaring text command \newtie (encoding TS1) on input line 68. +LaTeX Encoding Info: Redeclaring text command \capitalnewtie (encoding TS1) on input line 69. +LaTeX Encoding Info: Redeclaring text symbol \textcapitalcompwordmark (encoding TS1) on input line 70. +LaTeX Encoding Info: Redeclaring text symbol \textascendercompwordmark (encoding TS1) on input line 71. +LaTeX Encoding Info: Redeclaring text symbol \textquotestraightbase (encoding TS1) on input line 72. +LaTeX Encoding Info: Redeclaring text symbol \textquotestraightdblbase (encoding TS1) on input line 73. +LaTeX Encoding Info: Redeclaring text symbol \texttwelveudash (encoding TS1) on input line 74. +LaTeX Encoding Info: Redeclaring text symbol \textthreequartersemdash (encoding TS1) on input line 75. +LaTeX Encoding Info: Redeclaring text symbol \textleftarrow (encoding TS1) on input line 76. +LaTeX Encoding Info: Redeclaring text symbol \textrightarrow (encoding TS1) on input line 77. +LaTeX Encoding Info: Redeclaring text symbol \textblank (encoding TS1) on input line 78. +LaTeX Encoding Info: Redeclaring text symbol \textdollar (encoding TS1) on input line 79. +LaTeX Encoding Info: Redeclaring text symbol \textquotesingle (encoding TS1) on input line 80. +LaTeX Encoding Info: Redeclaring text command \textasteriskcentered (encoding TS1) on input line 81. +LaTeX Encoding Info: Redeclaring text symbol \textdblhyphen (encoding TS1) on input line 92. +LaTeX Encoding Info: Redeclaring text symbol \textfractionsolidus (encoding TS1) on input line 93. +LaTeX Encoding Info: Redeclaring text symbol \textzerooldstyle (encoding TS1) on input line 94. +LaTeX Encoding Info: Redeclaring text symbol \textoneoldstyle (encoding TS1) on input line 95. +LaTeX Encoding Info: Redeclaring text symbol \texttwooldstyle (encoding TS1) on input line 96. 
+LaTeX Encoding Info: Redeclaring text symbol \textthreeoldstyle (encoding TS1) on input line 97. +LaTeX Encoding Info: Redeclaring text symbol \textfouroldstyle (encoding TS1) on input line 98. +LaTeX Encoding Info: Redeclaring text symbol \textfiveoldstyle (encoding TS1) on input line 99. +LaTeX Encoding Info: Redeclaring text symbol \textsixoldstyle (encoding TS1) on input line 100. +LaTeX Encoding Info: Redeclaring text symbol \textsevenoldstyle (encoding TS1) on input line 101. +LaTeX Encoding Info: Redeclaring text symbol \texteightoldstyle (encoding TS1) on input line 102. +LaTeX Encoding Info: Redeclaring text symbol \textnineoldstyle (encoding TS1) on input line 103. +LaTeX Encoding Info: Redeclaring text symbol \textlangle (encoding TS1) on input line 104. +LaTeX Encoding Info: Redeclaring text symbol \textminus (encoding TS1) on input line 105. +LaTeX Encoding Info: Redeclaring text symbol \textrangle (encoding TS1) on input line 106. +LaTeX Encoding Info: Redeclaring text symbol \textmho (encoding TS1) on input line 107. +LaTeX Encoding Info: Redeclaring text symbol \textbigcircle (encoding TS1) on input line 108. +LaTeX Encoding Info: Redeclaring text command \textcircled (encoding TS1) on input line 109. +LaTeX Encoding Info: Redeclaring text symbol \textohm (encoding TS1) on input line 115. +LaTeX Encoding Info: Redeclaring text symbol \textlbrackdbl (encoding TS1) on input line 116. +LaTeX Encoding Info: Redeclaring text symbol \textrbrackdbl (encoding TS1) on input line 117. +LaTeX Encoding Info: Redeclaring text symbol \textuparrow (encoding TS1) on input line 118. +LaTeX Encoding Info: Redeclaring text symbol \textdownarrow (encoding TS1) on input line 119. +LaTeX Encoding Info: Redeclaring text symbol \textasciigrave (encoding TS1) on input line 120. +LaTeX Encoding Info: Redeclaring text symbol \textborn (encoding TS1) on input line 121. +LaTeX Encoding Info: Redeclaring text symbol \textdivorced (encoding TS1) on input line 122. 
+LaTeX Encoding Info: Redeclaring text symbol \textdied (encoding TS1) on input line 123. +LaTeX Encoding Info: Redeclaring text symbol \textleaf (encoding TS1) on input line 124. +LaTeX Encoding Info: Redeclaring text symbol \textmarried (encoding TS1) on input line 125. +LaTeX Encoding Info: Redeclaring text symbol \textmusicalnote (encoding TS1) on input line 126. +LaTeX Encoding Info: Redeclaring text symbol \texttildelow (encoding TS1) on input line 127. +LaTeX Encoding Info: Redeclaring text symbol \textdblhyphenchar (encoding TS1) on input line 128. +LaTeX Encoding Info: Redeclaring text symbol \textasciibreve (encoding TS1) on input line 129. +LaTeX Encoding Info: Redeclaring text symbol \textasciicaron (encoding TS1) on input line 130. +LaTeX Encoding Info: Redeclaring text symbol \textacutedbl (encoding TS1) on input line 131. +LaTeX Encoding Info: Redeclaring text symbol \textgravedbl (encoding TS1) on input line 132. +LaTeX Encoding Info: Redeclaring text symbol \textdagger (encoding TS1) on input line 133. +LaTeX Encoding Info: Redeclaring text symbol \textdaggerdbl (encoding TS1) on input line 134. +LaTeX Encoding Info: Redeclaring text symbol \textbardbl (encoding TS1) on input line 135. +LaTeX Encoding Info: Redeclaring text symbol \textperthousand (encoding TS1) on input line 136. +LaTeX Encoding Info: Redeclaring text symbol \textbullet (encoding TS1) on input line 137. +LaTeX Encoding Info: Redeclaring text symbol \textcelsius (encoding TS1) on input line 138. +LaTeX Encoding Info: Redeclaring text symbol \textdollaroldstyle (encoding TS1) on input line 139. +LaTeX Encoding Info: Redeclaring text symbol \textcentoldstyle (encoding TS1) on input line 140. +LaTeX Encoding Info: Redeclaring text symbol \textflorin (encoding TS1) on input line 141. +LaTeX Encoding Info: Redeclaring text symbol \textcolonmonetary (encoding TS1) on input line 142. +LaTeX Encoding Info: Redeclaring text symbol \textwon (encoding TS1) on input line 143. 
+LaTeX Encoding Info: Redeclaring text symbol \textnaira (encoding TS1) on input line 144. +LaTeX Encoding Info: Redeclaring text symbol \textguarani (encoding TS1) on input line 145. +LaTeX Encoding Info: Redeclaring text symbol \textpeso (encoding TS1) on input line 146. +LaTeX Encoding Info: Redeclaring text symbol \textlira (encoding TS1) on input line 147. +LaTeX Encoding Info: Redeclaring text symbol \textrecipe (encoding TS1) on input line 148. +LaTeX Encoding Info: Redeclaring text symbol \textinterrobang (encoding TS1) on input line 149. +LaTeX Encoding Info: Redeclaring text symbol \textinterrobangdown (encoding TS1) on input line 150. +LaTeX Encoding Info: Redeclaring text symbol \textdong (encoding TS1) on input line 151. +LaTeX Encoding Info: Redeclaring text symbol \texttrademark (encoding TS1) on input line 152. +LaTeX Encoding Info: Redeclaring text symbol \textpertenthousand (encoding TS1) on input line 153. +LaTeX Encoding Info: Redeclaring text symbol \textpilcrow (encoding TS1) on input line 154. +LaTeX Encoding Info: Redeclaring text symbol \textbaht (encoding TS1) on input line 155. +LaTeX Encoding Info: Redeclaring text symbol \textnumero (encoding TS1) on input line 156. +LaTeX Encoding Info: Redeclaring text symbol \textdiscount (encoding TS1) on input line 157. +LaTeX Encoding Info: Redeclaring text symbol \textestimated (encoding TS1) on input line 158. +LaTeX Encoding Info: Redeclaring text symbol \textopenbullet (encoding TS1) on input line 159. +LaTeX Encoding Info: Redeclaring text symbol \textservicemark (encoding TS1) on input line 160. +LaTeX Encoding Info: Redeclaring text symbol \textlquill (encoding TS1) on input line 161. +LaTeX Encoding Info: Redeclaring text symbol \textrquill (encoding TS1) on input line 162. +LaTeX Encoding Info: Redeclaring text symbol \textcent (encoding TS1) on input line 163. +LaTeX Encoding Info: Redeclaring text symbol \textsterling (encoding TS1) on input line 164. 
+LaTeX Encoding Info: Redeclaring text symbol \textcurrency (encoding TS1) on input line 165. +LaTeX Encoding Info: Redeclaring text symbol \textyen (encoding TS1) on input line 166. +LaTeX Encoding Info: Redeclaring text symbol \textbrokenbar (encoding TS1) on input line 167. +LaTeX Encoding Info: Redeclaring text symbol \textsection (encoding TS1) on input line 168. +LaTeX Encoding Info: Redeclaring text symbol \textasciidieresis (encoding TS1) on input line 169. +LaTeX Encoding Info: Redeclaring text symbol \textcopyright (encoding TS1) on input line 170. +LaTeX Encoding Info: Redeclaring text symbol \textordfeminine (encoding TS1) on input line 171. +LaTeX Encoding Info: Redeclaring text symbol \textcopyleft (encoding TS1) on input line 172. +LaTeX Encoding Info: Redeclaring text symbol \textlnot (encoding TS1) on input line 173. +LaTeX Encoding Info: Redeclaring text symbol \textcircledP (encoding TS1) on input line 174. +LaTeX Encoding Info: Redeclaring text symbol \textregistered (encoding TS1) on input line 175. +LaTeX Encoding Info: Redeclaring text symbol \textasciimacron (encoding TS1) on input line 176. +LaTeX Encoding Info: Redeclaring text symbol \textdegree (encoding TS1) on input line 177. +LaTeX Encoding Info: Redeclaring text symbol \textpm (encoding TS1) on input line 178. +LaTeX Encoding Info: Redeclaring text symbol \texttwosuperior (encoding TS1) on input line 179. +LaTeX Encoding Info: Redeclaring text symbol \textthreesuperior (encoding TS1) on input line 180. +LaTeX Encoding Info: Redeclaring text symbol \textasciiacute (encoding TS1) on input line 181. +LaTeX Encoding Info: Redeclaring text symbol \textmu (encoding TS1) on input line 182. +LaTeX Encoding Info: Redeclaring text symbol \textparagraph (encoding TS1) on input line 183. +LaTeX Encoding Info: Redeclaring text symbol \textperiodcentered (encoding TS1) on input line 184. +LaTeX Encoding Info: Redeclaring text symbol \textreferencemark (encoding TS1) on input line 185. 
+LaTeX Encoding Info: Redeclaring text symbol \textonesuperior (encoding TS1) on input line 186. +LaTeX Encoding Info: Redeclaring text symbol \textordmasculine (encoding TS1) on input line 187. +LaTeX Encoding Info: Redeclaring text symbol \textsurd (encoding TS1) on input line 188. +LaTeX Encoding Info: Redeclaring text symbol \textonequarter (encoding TS1) on input line 189. +LaTeX Encoding Info: Redeclaring text symbol \textonehalf (encoding TS1) on input line 190. +LaTeX Encoding Info: Redeclaring text symbol \textthreequarters (encoding TS1) on input line 191. +LaTeX Encoding Info: Redeclaring text symbol \texteuro (encoding TS1) on input line 192. +LaTeX Encoding Info: Redeclaring text symbol \texttimes (encoding TS1) on input line 193. +LaTeX Encoding Info: Redeclaring text symbol \textdiv (encoding TS1) on input line 194. +)) +\g__um_fam_int=\count306 +\g__um_fonts_used_int=\count307 +\l__um_primecount_int=\count308 +\g__um_primekern_muskip=\muskip18 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math-table.tex))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/lm/lmodern.sty +Package: lmodern 2015/05/01 v1.6.1 Latin Modern Fonts +LaTeX Font Info: Overwriting symbol font `operators' in version `normal' +(Font) OT1/cmr/m/n --> OT1/lmr/m/n on input line 22. +LaTeX Font Info: Overwriting symbol font `letters' in version `normal' +(Font) OML/cmm/m/it --> OML/lmm/m/it on input line 23. +LaTeX Font Info: Overwriting symbol font `symbols' in version `normal' +(Font) OMS/cmsy/m/n --> OMS/lmsy/m/n on input line 24. +LaTeX Font Info: Overwriting symbol font `largesymbols' in version `normal' +(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 25. +LaTeX Font Info: Overwriting symbol font `operators' in version `bold' +(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 26. +LaTeX Font Info: Overwriting symbol font `letters' in version `bold' +(Font) OML/cmm/b/it --> OML/lmm/b/it on input line 27. 
+LaTeX Font Info: Overwriting symbol font `symbols' in version `bold' +(Font) OMS/cmsy/b/n --> OMS/lmsy/b/n on input line 28. +LaTeX Font Info: Overwriting symbol font `largesymbols' in version `bold' +(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 29. +LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal' +(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 31. +LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal' +(Font) OT1/cmss/m/n --> OT1/lmss/m/n on input line 32. +LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal' +(Font) OT1/cmr/m/it --> OT1/lmr/m/it on input line 33. +LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal' +(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 34. +LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `bold' +(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 35. +LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold' +(Font) OT1/cmss/bx/n --> OT1/lmss/bx/n on input line 36. +LaTeX Font Info: Overwriting math alphabet `\mathit' in version `bold' +(Font) OT1/cmr/bx/it --> OT1/lmr/bx/it on input line 37. +LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold' +(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 38. 
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/upquote/upquote.sty +Package: upquote 2012/04/19 v1.3 upright-quote and grave-accent glyphs in verbatim +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/textcomp.sty +Package: textcomp 2024/04/24 v2.1b Standard LaTeX package +)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype.sty +Package: microtype 2026/03/01 v3.2d Micro-typographical refinements (RS) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/keyval.sty +Package: keyval 2022/05/29 v1.15 key=value parser (DPC) +\KV@toks@=\toks22 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/etoolbox/etoolbox.sty +Package: etoolbox 2025/10/02 v2.5m e-TeX tools for LaTeX (JAW) +\etb@tempcnta=\count309 +) +\MT@toks=\toks23 +\MT@tempbox=\box55 +\MT@count=\count310 +LaTeX Info: Redefining \noprotrusionifhmode on input line 1084. +LaTeX Info: Redefining \leftprotrusion on input line 1085. +\MT@prot@toks=\toks24 +LaTeX Info: Redefining \rightprotrusion on input line 1104. +LaTeX Info: Redefining \textls on input line 1449. +\MT@outer@kern=\dimen163 +LaTeX Info: Redefining \microtypecontext on input line 2053. +LaTeX Info: Redefining \textmicrotypecontext on input line 2070. +\MT@listname@count=\count311 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype-xetex.def +File: microtype-xetex.def 2026/03/01 v3.2d Definitions specific to xetex (RS) +LaTeX Info: Redefining \lsstyle on input line 443. +LaTeX Info: Redefining \lslig on input line 451. +\MT@outer@space=\skip54 +) +Package microtype Info: Loading configuration file microtype.cfg. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype.cfg +File: microtype.cfg 2026/03/01 v3.2d microtype main configuration file (RS) +) +LaTeX Info: Redefining \microtypesetup on input line 3065. 
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/parskip/parskip.sty +Package: parskip 2021-03-14 v2.0h non-zero parskip adjustments +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/kvoptions/kvoptions.sty +Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty +Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty +Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO) +))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fancyvrb/fancyvrb.sty +Package: fancyvrb 2026/04/16 4.6a verbatim text (tvz,hv) +\FV@CodeLineNo=\count312 +\FV@InFile=\read2 +\FV@TabBox=\box56 +\c@FancyVerbLine=\count313 +\FV@StepNumber=\count314 +\FV@OutFile=\write3 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/framed/framed.sty +Package: framed 2011/10/22 v 0.96: framed or shaded text with page breaks +\OuterFrameSep=\skip55 +\fb@frw=\dimen164 +\fb@frh=\dimen165 +\FrameRule=\dimen166 +\FrameSep=\dimen167 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/longtable.sty +Package: longtable 2025-10-13 v4.24 Multi-page Table package (DPC) +\LTleft=\skip56 +\LTright=\skip57 +\LTpre=\skip58 +\LTpost=\skip59 +\LTchunksize=\count315 +\LTcapwidth=\dimen168 +\LT@head=\box57 +\LT@firsthead=\box58 +\LT@foot=\box59 +\LT@lastfoot=\box60 +\LT@gbox=\box61 +\LT@cols=\count316 +\LT@rows=\count317 +\c@LT@tables=\count318 +\c@LT@chunks=\count319 +\LT@p@ftn=\toks25 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/booktabs/booktabs.sty +Package: booktabs 2020/01/12 v1.61803398 Publication quality tables +\heavyrulewidth=\dimen169 +\lightrulewidth=\dimen170 +\cmidrulewidth=\dimen171 +\belowrulesep=\dimen172 +\belowbottomsep=\dimen173 +\aboverulesep=\dimen174 +\abovetopsep=\dimen175 +\cmidrulesep=\dimen176 
+\cmidrulekern=\dimen177 +\defaultaddspace=\dimen178 +\@cmidla=\count320 +\@cmidlb=\count321 +\@aboverulesep=\dimen179 +\@belowrulesep=\dimen180 +\@thisruleclass=\count322 +\@lastruleclass=\count323 +\@thisrulewidth=\dimen181 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/array.sty +Package: array 2025/09/25 v2.6n Tabular extension package (FMi) +\col@sep=\dimen182 +\ar@mcellbox=\box62 +\extrarowheight=\dimen183 +\NC@list=\toks26 +\extratabsurround=\skip60 +\backup@length=\skip61 +\ar@cellbox=\box63 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/calc.sty +Package: calc 2025/03/01 v4.3b Infix arithmetic (KKT,FJ) +\calc@Acount=\count324 +\calc@Bcount=\count325 +\calc@Adimen=\dimen184 +\calc@Bdimen=\dimen185 +\calc@Askip=\skip62 +\calc@Bskip=\skip63 +LaTeX Info: Redefining \setlength on input line 86. +LaTeX Info: Redefining \addtolength on input line 87. +\calc@Ccount=\count326 +\calc@Cskip=\skip64 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/mdwtools/footnote.sty +Package: footnote 1997/01/28 1.13 Save footnotes around boxes +\fn@notes=\box64 +\fn@width=\dimen186 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/graphicx.sty +Package: graphicx 2024/12/31 v1.2e Enhanced LaTeX Graphics (DPC,SPQR) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/graphics.sty +Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/trig.sty +Package: trig 2023/12/02 v1.11 sin cos tan (DPC) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-cfg/graphics.cfg +File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration +) +Package graphics Info: Driver file: xetex.def on input line 106. 
+) +\Gin@req@height=\dimen187 +\Gin@req@width=\dimen188 +) +\pandoc@box=\box65 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/bookmark/bookmark.sty +Package: bookmark 2023-12-10 v1.31 PDF bookmarks (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/hyperref.sty +Package: hyperref 2026-04-24 v7.01q Hypertext links for LaTeX +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/kvdefinekeys/kvdefinekeys.sty +Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/pdfescape/pdfescape.sty +Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/pdftexcmds/pdftexcmds.sty +Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/infwarerr/infwarerr.sty +Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO) +) +Package pdftexcmds Info: \pdf@primitive is available. +Package pdftexcmds Info: \pdf@ifprimitive is available. +Package pdftexcmds Info: \pdfdraftmode not found. 
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hycolor/hycolor.sty +Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/nameref.sty +Package: nameref 2026-01-29 v2.58 Cross-referencing by name of section +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/refcount/refcount.sty +Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty +Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO) +) +\c@section@level=\count327 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/stringenc/stringenc.sty +Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO) +) +\@linkdim=\dimen189 +\Hy@linkcounter=\count328 +\Hy@pagecounter=\count329 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/pd1enc.def +File: pd1enc.def 2026-04-24 v7.01q Hyperref: PDFDocEncoding definition (HO) +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/intcalc/intcalc.sty +Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO) +) +\Hy@SavedSpaceFactor=\count330 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/puenc.def +File: puenc.def 2026-04-24 v7.01q Hyperref: PDF Unicode definition (HO) +) +Package hyperref Info: Option `unicode' set `true' on input line 4070. +Package hyperref Info: Hyper figures OFF on input line 4199. +Package hyperref Info: Link nesting OFF on input line 4204. +Package hyperref Info: Hyper index ON on input line 4207. +Package hyperref Info: Plain pages OFF on input line 4214. +Package hyperref Info: Backreferencing OFF on input line 4219. +Package hyperref Info: Implicit mode ON; LaTeX internals redefined. +Package hyperref Info: Bookmarks ON on input line 4466. 
+\c@Hy@tempcnt=\count331 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/url/url.sty +\Urlmuskip=\muskip19 +Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc. +) +LaTeX Info: Redefining \url on input line 4805. +\XeTeXLinkMargin=\dimen190 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/bitset/bitset.sty +Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/bigintcalc/bigintcalc.sty +Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO) +)) +\Fld@menulength=\count332 +\Field@Width=\dimen191 +\Fld@charsize=\dimen192 +Package hyperref Info: Hyper figures OFF on input line 6091. +Package hyperref Info: Link nesting OFF on input line 6096. +Package hyperref Info: Hyper index ON on input line 6099. +Package hyperref Info: backreferencing OFF on input line 6106. +Package hyperref Info: Link coloring OFF on input line 6111. +Package hyperref Info: Link coloring with OCG OFF on input line 6116. +Package hyperref Info: PDF/A mode OFF on input line 6121. +\Hy@abspage=\count333 +\c@Item=\count334 +\c@Hfootnote=\count335 +) +Package hyperref Info: Driver (autodetected): hxetex. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/hxetex.def +File: hxetex.def 2026-04-24 v7.01q Hyperref driver for XeTeX +\pdfm@box=\box66 +\c@Hy@AnnotLevel=\count336 +\HyField@AnnotCount=\count337 +\Fld@listcount=\count338 +\c@bookmark@seq@number=\count339 +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/rerunfilecheck/rerunfilecheck.sty +Package: rerunfilecheck 2025-06-21 v1.11 Rerun checks for auxiliary files (HO) +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/uniquecounter/uniquecounter.sty +Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO) +) +Package uniquecounter Info: New unique counter `rerunfilecheck' on input line 284. 
+) +\Hy@SectionHShift=\skip65 +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/bookmark/bkm-dvipdfm.def +File: bkm-dvipdfm.def 2023-12-10 v1.31 bookmark driver for dvipdfm (HO) +\BKM@id=\count340 +)) (./nns-book.aux) +\openout1 = `nns-book.aux'. + +LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for TU/lmr/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 128. +LaTeX Font Info: ... okay on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathrm' in version `normal' +(Font) OT1/lmr/m/n --> TU/lmr/m/n on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal' +(Font) OT1/lmr/m/it --> TU/lmr/m/it on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal' +(Font) OT1/lmr/bx/n --> TU/lmr/bx/n on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal' +(Font) OT1/lmss/m/n --> TU/lmss/m/n on input line 128. 
+LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold' +(Font) OT1/lmss/bx/n --> TU/lmss/bx/n on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal' +(Font) OT1/lmtt/m/n --> TU/lmtt/m/n on input line 128. +LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold' +(Font) OT1/lmtt/m/n --> TU/lmtt/bx/n on input line 128. + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) Font family 'latinmodern-math.otf(0)' created for font +(fontspec) 'latinmodern-math.otf' with options +(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,BoldFont={latinmodern-math.otf}]. +(fontspec) +(fontspec) This font family consists of the following NFSS +(fontspec) series/shapes: +(fontspec) +(fontspec) - 'normal' (m/n) with NFSS spec.: +(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;" +(fontspec) - 'bold' (b/n) with NFSS spec.: +(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;" + +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(0)/m/n' will be +(Font) scaled to size 10.0pt on input line 128. + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. 
+ + +Package fontspec Info: +(fontspec) Font family 'latinmodern-math.otf(1)' created for font +(fontspec) 'latinmodern-math.otf' with options +(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf}]. +(fontspec) +(fontspec) This font family consists of the following NFSS +(fontspec) series/shapes: +(fontspec) +(fontspec) - 'normal' (m/n) with NFSS spec.: +(fontspec) <8.5->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;" +(fontspec) - 'bold' (b/n) with NFSS spec.: +(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;" + +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 10.0pt on input line 128. +LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font +(Font) `operators' in the math version `normal' on input line 128. +LaTeX Font Info: Overwriting symbol font `operators' in version `normal' +(Font) OT1/lmr/m/n --> TU/latinmodern-math.otf(1)/m/n on input line 128. +LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font +(Font) `operators' in the math version `bold' on input line 128. +LaTeX Font Info: Overwriting symbol font `operators' in version `bold' +(Font) OT1/lmr/bx/n --> TU/latinmodern-math.otf(1)/b/n on input line 128. + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 1.000096459334209. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. 
+ + +Package fontspec Info: +(fontspec) latinmodern-math scale = 1.000096459334209. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 1.000096459334209. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 1.000096459334209. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 1.000096459334209. + + +Package fontspec Info: +(fontspec) Font family 'latinmodern-math.otf(2)' created for font +(fontspec) 'latinmodern-math.otf' with options +(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf},ScaleAgain=1.0001,FontAdjustment={\fontdimen +(fontspec) 8\font =6.77pt\relax \fontdimen 9\font =3.94pt\relax +(fontspec) \fontdimen 10\font =4.44pt\relax \fontdimen 11\font +(fontspec) =6.86pt\relax \fontdimen 12\font =3.45pt\relax +(fontspec) \fontdimen 13\font =3.63pt\relax \fontdimen 14\font +(fontspec) =3.63pt\relax \fontdimen 15\font =2.89pt\relax +(fontspec) \fontdimen 16\font =2.47pt\relax \fontdimen 17\font +(fontspec) =2.47pt\relax \fontdimen 18\font =2.5pt\relax +(fontspec) \fontdimen 19\font =2.0pt\relax \fontdimen 22\font +(fontspec) =2.5pt\relax \fontdimen 20\font =0pt\relax \fontdimen +(fontspec) 21\font =0pt\relax }]. 
+(fontspec) +(fontspec) This font family consists of the following NFSS +(fontspec) series/shapes: +(fontspec) +(fontspec) - 'normal' (m/n) with NFSS spec.: +(fontspec) <8.5->s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;" +(fontspec) - 'bold' (b/n) with NFSS spec.: +(fontspec) <->s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;" + +LaTeX Font Info: Encoding `OMS' has changed to `TU' for symbol font +(Font) `symbols' in the math version `normal' on input line 128. +LaTeX Font Info: Overwriting symbol font `symbols' in version `normal' +(Font) OMS/lmsy/m/n --> TU/latinmodern-math.otf(2)/m/n on input line 128. +LaTeX Font Info: Encoding `OMS' has changed to `TU' for symbol font +(Font) `symbols' in the math version `bold' on input line 128. +LaTeX Font Info: Overwriting symbol font `symbols' in version `bold' +(Font) OMS/lmsy/b/n --> TU/latinmodern-math.otf(2)/b/n on input line 128. + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9998964600422715. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9998964600422715. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9998964600422715. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9998964600422715. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9999964596882403. + + +Package fontspec Info: +(fontspec) latinmodern-math scale = 0.9998964600422715. 
+ + +Package fontspec Info: +(fontspec) Font family 'latinmodern-math.otf(3)' created for font +(fontspec) 'latinmodern-math.otf' with options +(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf},ScaleAgain=0.9999,FontAdjustment={\fontdimen +(fontspec) 8\font =0.4pt\relax \fontdimen 9\font =2.0pt\relax +(fontspec) \fontdimen 10\font =1.67pt\relax \fontdimen 11\font +(fontspec) =1.11pt\relax \fontdimen 12\font =6.0pt\relax +(fontspec) \fontdimen 13\font =0pt\relax }]. +(fontspec) +(fontspec) This font family consists of the following NFSS +(fontspec) series/shapes: +(fontspec) +(fontspec) - 'normal' (m/n) with NFSS spec.: +(fontspec) <8.5->s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;" +(fontspec) - 'bold' (b/n) with NFSS spec.: +(fontspec) <->s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;" + +LaTeX Font Info: Encoding `OMX' has changed to `TU' for symbol font +(Font) `largesymbols' in the math version `normal' on input line 128. +LaTeX Font Info: Overwriting symbol font `largesymbols' in version `normal' +(Font) OMX/lmex/m/n --> TU/latinmodern-math.otf(3)/m/n on input line 128. +LaTeX Font Info: Encoding `OMX' has changed to `TU' for symbol font +(Font) `largesymbols' in the math version `bold' on input line 128. +LaTeX Font Info: Overwriting symbol font `largesymbols' in version `bold' +(Font) OMX/lmex/m/n --> TU/latinmodern-math.otf(3)/b/n on input line 128. +LaTeX Info: Redefining \microtypecontext on input line 128. +Package microtype Info: Applying patch `item' on input line 128. 
+Package microtype Info: Applying patch `toc' on input line 128. +Package microtype Info: Applying patch `eqnum' on input line 128. +Package microtype Info: Applying patch `footnote' on input line 128. +Package microtype Info: Applying patch `verbatim' on input line 128. +LaTeX Info: Redefining \microtypesetup on input line 128. +Package microtype Info: Character protrusion enabled (level 2). +Package microtype Info: Using protrusion set `basicmath'. +Package microtype Info: No adjustment of tracking. +Package microtype Info: No adjustment of spacing. +Package microtype Info: No adjustment of kerning. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-LatinModernRoman.cfg +File: mt-LatinModernRoman.cfg 2026/02/26 v1.2 microtype config. file: Latin Modern Roman (RS) +) +Package hyperref Info: Link coloring OFF on input line 128. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 12.0pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 8.0pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 6.0pt on input line 130. +LaTeX Font Info: Trying to load font information for OML+lmm on input line 130. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/lm/omllmm.fd +File: omllmm.fd 2015/05/01 v1.6.1 Font defs for Latin Modern +) +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 12.0011pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 8.00073pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 6.00055pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 11.99872pt on input line 130. 
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 7.99915pt on input line 130. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 5.99936pt on input line 130. +LaTeX Font Info: Trying to load font information for U+msa on input line 130. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/umsa.fd +File: umsa.fd 2013/01/14 v3.01 AMS symbols A +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-msa.cfg +File: mt-msa.cfg 2006/02/04 v1.1 microtype config. file: AMS symbols (a) (RS) +) +LaTeX Font Info: Trying to load font information for U+msb on input line 130. +(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/umsb.fd +File: umsb.fd 2013/01/14 v3.01 AMS symbols B +) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-msb.cfg +File: mt-msb.cfg 2005/06/01 v1.0 microtype config. file: AMS symbols (b) (RS) +) [1 + + +] [2 + +] (./nns-book.toc +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 7.0pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 5.0pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 10.00092pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 7.00064pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 5.00046pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 9.99893pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 6.99925pt on input line 2. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 4.99947pt on input line 2. 
+[3] [4] [5] [6] [7] [8] [9] [10]) +\tf@toc=\write4 +\openout4 = `nns-book.toc'. + +[11] [12 + +] [13] +Underfull \hbox (badness 10000) in paragraph at lines 198--199 +[][]$[][][][][] [] [] [] [][][][][][] [] [][][] [] [][][][] [][] [][][][][][][][][] [] [][][] [] [][][][] [] [][][] [][] [][][][] [][] [][][][][][][] [] + [] + +[14] +Chapter 1. + +Underfull \vbox (badness 1014) has occurred while \output is active [] + +[15 + +] [16] [17] [18] [19] [20] +Chapter 2. +[21 + +] [22] [23] [24] [25] [26] [27] [28 + +] +Chapter 3. +[29] [30] +LaTeX Font Info: Font shape `TU/lmtt/bx/n' in size <10> not available +(Font) Font shape `TU/lmtt/b/n' tried instead on input line 920. +File: images/ch3_cdf_lpm0.png Graphic file (type bmp) + +[31] [32] [33] +Underfull \hbox (badness 4144) in paragraph at lines 1083--1083 +[]\TU/lmr/bx/n/12 Lower-Tail Thresholds as Degree-Zero Partial- + [] + +[34] [35] [36] [37] [38 + +] +Chapter 4. +[39] [40] [41] [42] [43] [44] +Overfull \hbox (11.9321pt too wide) detected at line 1492 +[] [] [] [] [][][] + [] + +[45] +Overfull \hbox (73.20363pt too wide) detected at line 1530 +[][] [][] [][] + [] + +[46] [47] [48 + +] +Chapter 5. +[49] +Underfull \hbox (badness 10000) in paragraph at lines 1596--1598 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 1598--1600 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 1600--1602 + + [] + + +Overfull \hbox (7.22562pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 50 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES + [] + +[50] [51] +Overfull \hbox (7.22562pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 52 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES + [] + +[52] [53] +Overfull \hbox (7.22562pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 54 \TU/lmr/m/sl/10 CHAPTER 5. 
CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES + [] + +[54] [55] +Overfull \hbox (7.22562pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 56 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES + [] + +[56 + +] +Chapter 6. + +Underfull \hbox (badness 10000) in paragraph at lines 1923--1925 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 1925--1927 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 1927--1929 + + [] + +[57] +Underfull \hbox (badness 10000) in paragraph at lines 1962--1964 + + [] + +Missing character: There is no σ (U+03C3) in font [lmroman10-regular]:mapping=tex-text;! + +Underfull \hbox (badness 10000) in paragraph at lines 1964--1966 + + [] + +[58] +Underfull \hbox (badness 10000) in paragraph at lines 2012--2014 + + [] + +[59] [60] [61] [62] [63] [64] +Chapter 7. +[65 + +] +Underfull \hbox (badness 10000) in paragraph at lines 2319--2321 + + [] + +[66] [67] +Underfull \hbox (badness 10000) in paragraph at lines 2444--2446 + + [] + +[68] [69] [70] +Underfull \hbox (badness 10000) in paragraph at lines 2604--2606 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 2606--2608 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 2608--2610 + + [] + +[71] [72] [73] [74 + +] +Chapter 8. + +Underfull \hbox (badness 10000) in paragraph at lines 2673--2675 + + [] + +[75] [76] [77] [78] [79] +Underfull \hbox (badness 10000) in paragraph at lines 2924--2926 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 2926--2928 + + [] + + +Overfull \hbox (27.84775pt too wide) detected at line 2939 +[][][][] [] [] [] [][][][] [][][][] [] [][][] [] [] [] [] [] [] [] [][] [] [][][][] [][][][] [] [][][] [] [] [] [] [] [] [] [][] [] [][][][] + [] + +[80] [81] [82] [83] [84 + +] +Chapter 9. 
+[85] [86] [87] +Overfull \hbox (31.036pt too wide) detected at line 3294 +[][][][] [][] [] [][][][][] [][] [] [][][][][] [][] [] [][][][][] [][] [] [][][][][] [][][] + [] + +[88] [89] [90] +Overfull \hbox (93.31343pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 9.5. GRAM-MATRIX STRUCTURE OF CONCORDANT CO-PARTIAL MOMENT MATRICES \TU/lmr/m/n/10 91 + [] + +[91] [92] [93] [94] [95] [96 + +] +Chapter 10. +[97] [98] [99] [100] + +LaTeX Font Warning: Font shape `TU/lmtt/bx/it' in size <10> not available +(Font) Font shape `TU/lmtt/b/sl' tried instead on input line 3990. + +[101] [102] [103] [104] +Chapter 11. + +Overfull \hbox (91.39001pt too wide) detected at line 4153 +[] + [] + +[105 + +] [106] +Overfull \hbox (6.46002pt too wide) detected at line 4284 +[] + [] + +[107] +Underfull \hbox (badness 2478) in paragraph at lines 4355--4355 +[]\TU/lmr/bx/n/14.4 Between-Within Covariance Decomposi- + [] + +[108] [109] +Overfull \hbox (4.67207pt too wide) detected at line 4519 +[] [] [] [] [] + [] + +[110] [111] [112] [113] +Overfull \hbox (16.9pt too wide) detected at line 4728 +[] + [] + +[114] +Overfull \hbox (140.76006pt too wide) detected at line 4789 +[] + [] + + +Overfull \hbox (24.50304pt too wide) detected at line 4808 +[][][][][][][] + [] + +[115] [116] +Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933 +[]\TU/lmtt/m/n/10 ## quadrant n p mean_x mean_y u_x u_y lambda_rank1[] + [] + + +Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933 +[]\TU/lmtt/m/n/10 ## 1 CUPM 3732 0.3732 0.909351 0.902914 0.911723 0.911077 0.619997[] + [] + +[117] +Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933 +[]\TU/lmtt/m/n/10 ## 2 CLPM 3779 0.3779 -0.900760 -0.915622 -0.898388 -0.907459 0.616197[] + [] + + +Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933 +[]\TU/lmtt/m/n/10 ## 3 DLPM 1232 0.1232 0.464620 -0.490819 0.466992 -0.482655 0.055568[] + [] + + +Overfull \hbox (64.5pt too wide) in paragraph at lines 
4933--4933 +[]\TU/lmtt/m/n/10 ## 4 DUPM 1257 0.1257 -0.466076 0.488080 -0.463704 0.496243 0.057983[] + [] + +[118] [119] +Underfull \vbox (badness 10000) has occurred while \output is active [] + +[120] +Underfull \vbox (badness 10000) detected at line 5198 + [] + + +Underfull \vbox (badness 10000) has occurred while \output is active [] + +[121] +File: nns-book_files/figure-latex/clpm-mean-slope-figure-1.pdf Graphic file (type pdf) + +[122] [123] +Overfull \hbox (76.48001pt too wide) detected at line 5328 +[] + [] + +[124] +Chapter 12. +[125 + +] +Underfull \hbox (badness 10000) in paragraph at lines 5382--5382 +[]\TU/lmr/bx/n/14.4 Directional Statistics and Probability + [] + +[126] +Underfull \hbox (badness 10000) in paragraph at lines 5438--5440 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 5440--5442 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 5442--5444 + + [] + +[127] +Underfull \hbox (badness 10000) in paragraph at lines 5484--5486 + + [] + +Missing character: There is no ∎ (U+220E) in font [lmroman10-regular]:mapping=tex-text;! +[128] +File: images/ch11_raw_copula.png Graphic file (type bmp) + + +Overfull \hbox (119.11453pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 12.6. EXAMPLE: DIRECTIONAL DEPENDENCE SURFACE AND COPULA TRANSFORMATION \TU/lmr/m/n/10 129 + [] + +[129] [130] +File: images/ch11_transformed_copula.png Graphic file (type bmp) + + +Overfull \hbox (119.11453pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 12.6. EXAMPLE: DIRECTIONAL DEPENDENCE SURFACE AND COPULA TRANSFORMATION \TU/lmr/m/n/10 131 + [] + +[131] [132] +Underfull \hbox (badness 10000) in paragraph at lines 5713--5715 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 5715--5717 + + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 5717--5719 + + [] + +[133] [134] [135] [136 + +] +Chapter 13. 
+[137] +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 138 \TU/lmr/m/sl/10 CHAPTER 13. CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[138] +Overfull \hbox (47.7826pt too wide) detected at line 5956 +[][][] [] [][] [] [] [] [][][] [] [][] [][][] [] [][] [] [] [] [][][] [] [][] + [] + +[139] +Overfull \hbox (50.2826pt too wide) detected at line 5965 +[][][] [] [][] [] [] [] [][][] [] [][] [][][] [] [][] [] [] [] [][][] [] [][] + [] + + +Overfull \hbox (39.87259pt too wide) detected at line 5974 +[][][] [] [][] [] [] [] [][][] [] [][] [][][] [] [][] [] [] [] [][][] [] [][] + [] + + +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 140 \TU/lmr/m/sl/10 CHAPTER 13. CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[140] [141] +Underfull \hbox (badness 1210) in paragraph at lines 6087--6088 +[]\TU/lmr/m/n/10 Bayesian updating therefore corresponds to \TU/lmr/bx/n/10 redistributing probability + [] + + +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 142 \TU/lmr/m/sl/10 CHAPTER 13. CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[142] +Overfull \hbox (160.23343pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 13.14. THE DEGREE-ONE EXTENSION: CO-PARTIAL MOMENTS AS DISTRIBUTIONAL GENERATORS \TU/lmr/m/n/10 143 + [] + +[143] +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 144 \TU/lmr/m/sl/10 CHAPTER 13. CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[144] +Overfull \hbox (49.50218pt too wide) detected at line 6191 +[][][][][] [][][] [] [] [][][][] [] [][][] [][][] [] [][][][] [] [][][] [] [][][] [] [] [][][] [] [][][][][][] [][][][] \U/msa/m/n/10 ^^C + [] + +[145] +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 146 \TU/lmr/m/sl/10 CHAPTER 13. 
CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[146] +Underfull \hbox (badness 1533) in paragraph at lines 6253--6255 +[]\TU/lmr/bx/n/10 Bayesian updating corresponds to renormalizing the four- + [] + +[147] +Overfull \hbox (7.40564pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 148 \TU/lmr/m/sl/10 CHAPTER 13. CONDITIONAL PROBABILITY AND BAYES’ THEOREM + [] + +[148 + +] +Chapter 14. +[149] [150] [151] [152] [153] [154] [155] +Underfull \hbox (badness 2837) in paragraph at lines 6551--6552 +[]\TU/lmr/m/n/10 The resulting statistic remains nonparametric, benchmark-relative, and + [] + + +Underfull \hbox (badness 3019) in paragraph at lines 6553--6554 +\TU/lmr/m/n/10 benchmarks, and strength-of-inference summaries, see []Causal Inference + [] + + +Underfull \vbox (badness 1902) has occurred while \output is active [] + +[156] [157] [158 + +] +Chapter 15. +[159] [160] [161] +Underfull \hbox (badness 1308) in paragraph at lines 6731--6732 +[]\TU/lmr/m/n/10 Within the partial-moment framework, these probabilities correspond to + [] + +[162] +Underfull \hbox (badness 1082) in paragraph at lines 6761--6761 +[]\TU/lmr/bx/n/12 The Discrete–Continuous CDF Distinction and + [] + +[163] [164] [165] +Missing character: There is no ≤ (U+2264) in font [lmmono10-regular]:! +File: images/ch14_lpm0_lpm1_diff.png Graphic file (type bmp) + +[166] [167] [168] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 169 + [] + +[169] [170] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 171 + [] + +[171] [172] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. 
BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 173 + [] + +[173] [174] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 175 + [] + +[175] [176] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 177 + [] + +[177] [178] +Underfull \hbox (badness 3049) in paragraph at lines 7374--7375 +\TU/lmr/m/n/10 for bias-corrected prediction intervals (\TU/lmtt/m/n/10 LPM.VaR(..., degree = 1, ...)\TU/lmr/m/n/10 , + [] + + +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 179 + [] + +[179] [180] +Overfull \vbox (0.9429pt too high) detected at line 7509 + [] + + +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. 
BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 181 + [] + +[181] +Underfull \hbox (badness 10000) in paragraph at lines 7511--7516 +[]\TU/lmr/m/n/10 For this comparison, \TU/lmtt/m/n/10 NNS.SS(manual_mpg, auto_mpg) \TU/lmr/m/n/10 yields \TU/lmtt/m/n/10 p_gt = + [] + + +Underfull \hbox (badness 2261) in paragraph at lines 7511--7516 +\TU/lmtt/m/n/10 0.8259109\TU/lmr/m/n/10 , \TU/lmtt/m/n/10 p_tie = 0.008097166\TU/lmr/m/n/10 , and \TU/lmtt/m/n/10 p_star = 0.8299595\TU/lmr/m/n/10 , indicating + [] + + +Underfull \hbox (badness 10000) in paragraph at lines 7511--7516 +\TU/lmtt/m/n/10 NNS.ANOVA(control = auto_mpg, treatment = manual_mpg, robust = + [] + + +Underfull \hbox (badness 1062) in paragraph at lines 7511--7516 +\TU/lmr/m/n/10 show pairwise directional advantage, weak distributional agreement, and + [] + + +Underfull \vbox (badness 10000) has occurred while \output is active [] + +[182] +Overfull \hbox (21.71344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 15.4. BLOCK IV — APPLIED WORKFLOW AND PRACTICAL INFERENCE \TU/lmr/m/n/10 183 + [] + +[183] [184 + +] +Chapter 16. +[185] +Overfull \hbox (166.99454pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 186 \TU/lmr/m/sl/10 CHAPTER 16. DIRECTIONAL TAIL THRESHOLDS, PROBABILITY BOUNDS, AND ESTIMATION ERROR + [] + +[186] +Underfull \hbox (badness 2351) in paragraph at lines 7654--7654 +[]\TU/lmr/bx/n/14.4 From Frequency to Severity: Higher- + [] + + +Overfull \hbox (29.09453pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 16.2. DEGREE-ZERO THRESHOLDS AND THEIR DIRECTIONAL MEANING \TU/lmr/m/n/10 187 + [] + +[187] +Overfull \hbox (166.99454pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 188 \TU/lmr/m/sl/10 CHAPTER 16. 
DIRECTIONAL TAIL THRESHOLDS, PROBABILITY BOUNDS, AND ESTIMATION ERROR + [] + +[188] +Overfull \hbox (49.4621pt too wide) detected at line 7748 +[] [] [] [] [][] + [] + + +Underfull \hbox (badness 1635) in paragraph at lines 7754--7754 +[]\TU/lmr/bx/n/14.4 Severity-Weighted Thresholds as Early- + [] + + +Overfull \hbox (51.81563pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 16.5. SEVERITY-WEIGHTED THRESHOLDS AS EARLY-INTERVENTION RULES \TU/lmr/m/n/10 189 + [] + +[189] +Overfull \hbox (166.99454pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 190 \TU/lmr/m/sl/10 CHAPTER 16. DIRECTIONAL TAIL THRESHOLDS, PROBABILITY BOUNDS, AND ESTIMATION ERROR + [] + +[190] +Overfull \hbox (0.8256pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 16.8. UTILITY, DECISION CONTEXT, AND WHY DEGREE MATTERS \TU/lmr/m/n/10 191 + [] + +[191] +Overfull \hbox (166.99454pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 192 \TU/lmr/m/sl/10 CHAPTER 16. DIRECTIONAL TAIL THRESHOLDS, PROBABILITY BOUNDS, AND ESTIMATION ERROR + [] + +[192] [193] +Overfull \hbox (166.99454pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 194 \TU/lmr/m/sl/10 CHAPTER 16. DIRECTIONAL TAIL THRESHOLDS, PROBABILITY BOUNDS, AND ESTIMATION ERROR + [] + +[194 + +] +Chapter 17. +[195] +Underfull \hbox (badness 10000) in paragraph at lines 7973--7975 + + [] + +[196] [197] +Underfull \hbox (badness 10000) in paragraph at lines 8042--8042 +[]\TU/lmr/bx/n/14.4 Partial-Moment Quantile Functions: + [] + + +Underfull \vbox (badness 1189) has occurred while \output is active [] + +[198] [199] [200] [201] [202] [203] +File: images/ch15_reg_conf_int.png Graphic file (type bmp) + +[204] [205] [206] +Chapter 18. +[207 + +] [208] +Overfull \hbox (3.67741pt too wide) in paragraph at lines 8458--8459 +[]\TU/lmr/bx/n/10 Theorem (Consistency of Partition Estimators). 
\TU/lmr/m/n/10 Let $[][][][] [][][][] [] [] [][][][] [][][]$ + [] + +[209] [210] [211] [212] +File: images/ch18_part_1.png Graphic file (type bmp) + +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be +(Font) scaled to size 14.4pt on input line 8639. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be +(Font) scaled to size 14.4013pt on input line 8639. +LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be +(Font) scaled to size 14.39845pt on input line 8639. +[213] +File: images/ch18_part_2.png Graphic file (type bmp) + +[214] [215] +Underfull \hbox (badness 2495) in paragraph at lines 8739--8739 +[]\TU/lmr/bx/n/14.4 Partial-Moment Interpretation of Joint + [] + +[216] [217] [218] [219] [220] [221] [222] [223] [224] +Chapter 19. +[225 + +] [226] +Overfull \hbox (24.78342pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 19.2. LOCAL AVERAGING IN THE RECURSIVE MEAN-SPLIT ESTIMATOR \TU/lmr/m/n/10 227 + [] + +[227] +Underfull \hbox (badness 1259) in paragraph at lines 9169--9170 +[]\TU/lmr/m/n/10 The recursive mean-split estimator can be viewed as a \TU/lmr/bx/n/10 local-constant + [] + +[228] +Overfull \hbox (61.64343pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 19.5. CONSISTENCY CONDITIONS RE-READ THROUGH THE BANDWIDTH LENS \TU/lmr/m/n/10 229 + [] + +[229] [230] [231] [232] [233] [234] +Overfull \hbox (9.92343pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 19.11. THE ORDER PARAMETER AS GLOBAL SMOOTHING CONTROL \TU/lmr/m/n/10 235 + [] + +[235] [236] [237] [238] [239] [240 + +] +Chapter 20. +[241] +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 242 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[242] [243] +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 244 \TU/lmr/m/sl/10 CHAPTER 20. 
SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[244] [245] +LaTeX Font Info: Font shape `TU/lmtt/bx/n' in size <14.4> not available +(Font) Font shape `TU/lmtt/b/n' tried instead on input line 9861. + +Underfull \hbox (badness 2503) in paragraph at lines 9861--9861 +[]\TU/lmr/bx/n/14.4 Arbitrary Spearman Rank Correlation: + [] + + +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 246 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[246] +Overfull \hbox (3.32802pt too wide) in paragraph at lines 9920--9921 +[]\TU/lmr/m/n/10 Reconstructed synthetic series are formed as $[][] [] [][]$, + [] + +[247] +Underfull \hbox (badness 2913) in paragraph at lines 9945--9945 +[]\TU/lmr/bx/n/14.4 Synthetic Time-Series Generation with + [] + +File: images/ch17_meboot_orig.png Graphic file (type bmp) + + +ignored: Infinite glue shrinkage found in box being split +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 248 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[248] +LaTeX Font Info: Font shape `TU/lmtt/bx/n' in size <12> not available +(Font) Font shape `TU/lmtt/b/n' tried instead on input line 10009. +[249] +File: images/ch17_iid_mc_sim.png Graphic file (type bmp) + + +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 250 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[250] +File: images/ch17_meboot_mc_sim.png Graphic file (type bmp) + +[251] +Underfull \hbox (badness 2635) in paragraph at lines 10112--10112 +[]\TU/lmr/bx/n/14.4 Applications in Forecasting and Risk + [] + + +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 252 \TU/lmr/m/sl/10 CHAPTER 20. 
SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[252] [253] +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 254 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[254] [255] +Overfull \hbox (32.1345pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 256 \TU/lmr/m/sl/10 CHAPTER 20. SYNTHETIC DATA AND MAXIMUM ENTROPY BOOTSTRAP + [] + +[256] +Chapter 21. +[257 + +] [258] [259] [260] [261] +ignored: Infinite glue shrinkage found in box being split [262] [263] [264] +File: images/ch20_kmeans_comp.png Graphic file (type bmp) + + +Underfull \vbox (badness 4634) has occurred while \output is active [] + +[265] [266] [267] [268] [269] [270] [271] [272] [273] [274 + +] +Chapter 22. +[275] [276] [277] +Underfull \hbox (badness 1102) in paragraph at lines 10933--10933 +[]\TU/lmr/bx/n/14.4 From Conditional Means to Regression + [] + +[278] +File: images/ch21_part_reg.png Graphic file (type bmp) + + +LaTeX Warning: Float too large for page by 23.04681pt on input line 10991. + +[279] [280] [281] +Underfull \hbox (badness 6300) in paragraph at lines 11090--11090 +[]\TU/lmr/bx/n/14.4 Comparison with Classical Regression + [] + +[282] [283] [284] +Overfull \hbox (213.92236pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 22.11. MULTIVARIATE REGRESSION: PER-REGRESSOR PARTITIONING AND THE CURSE OF DIMENSIONALITY \TU/lmr/m/n/10 285 + [] + +[285] [286] +Overfull \hbox (73.97562pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 22.12. 
ADAPTIVE ORDER SELECTION: DEPENDENCE-DRIVEN PARTITION DEPTH \TU/lmr/m/n/10 287 + [] + +[287] [288] [289] [290] +Underfull \hbox (badness 3240) in paragraph at lines 11373--11374 +[]\TU/lmr/m/n/10 The multivariate architecture — per-regressor partitioning against the + [] + + +Underfull \hbox (badness 2418) in paragraph at lines 11373--11374 +\TU/lmr/m/n/10 response, regression-point matrix construction, and dependence-adaptive + [] + +[291] +Underfull \hbox (badness 1127) in paragraph at lines 11395--11396 +\TU/lmr/m/n/10 bandwidth, response-anchored regression-point nearest-neighbor prediction, + [] + +[292] [293] [294 + +] +Chapter 23. +[295] [296] [297] [298] [299] [300] [301] [302] [303] [304] +Underfull \hbox (badness 3861) in paragraph at lines 11908--11909 +[]\TU/lmr/m/n/10 Schematically, this is implemented by calling \TU/lmtt/m/n/10 NNS.boost(..., type = + [] + +[305] [306] [307] [308] [309] [310] [311] [312 + +] +Chapter 24. + +Underfull \hbox (badness 1796) in paragraph at lines 12272--12273 +\TU/lmr/m/n/10 features for an optimized stacked model, including cross-validated + [] + +[313] [314] [315] +Underfull \hbox (badness 1132) in paragraph at lines 12394--12395 +\TU/lmtt/m/n/10 pred.int\TU/lmr/m/n/10 , and feature-frequency summaries through \TU/lmtt/m/n/10 features.only \TU/lmr/m/n/10 and + [] + + +Underfull \vbox (badness 1565) has occurred while \output is active [] + +[316] [317] [318] [319] [320] [321] +Underfull \hbox (badness 1521) in paragraph at lines 12562--12562 +[]\TU/lmr/bx/n/14.4 Cross-Validation in Nonparametric Set- + [] + +[322] +Overfull \hbox (2.49452pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 24.7. ENSEMBLE LEARNING AND THE BIAS–VARIANCE TRADEOFF \TU/lmr/m/n/10 323 + [] + +[323] [324] [325] [326] [327] [328] [329] [330] +Chapter 25. 
+[331 + +] +Overfull \hbox (20.34938pt too wide) detected at line 12893 +[][] [] [][] [][][][] [][][] [] [] [][][] [][][] [] [] [][][] [][][] [] [] [] [][][] [] [] [][] [] [] [][] + [] + +[332] [333] [334] +Overfull \hbox (14.98096pt too wide) detected at line 13028 +[][] [] [][][][] [][][] [] [] [][][] [][][] [][][] [] [] [][][] [] [] [][][] [][][] [] [] [][][][] + [] + +[335] +Underfull \hbox (badness 2626) in paragraph at lines 13083--13084 +[]\TU/lmr/m/n/10 So mixed-frequency forecasting is handled natively, rather than being + [] + +[336] [337] [338] [339] [340] [341] [342] [343] [344] +Overfull \hbox (125.63344pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 25.17. LEAKAGE-SAFE VALIDATION IN MULTIVARIATE AND MIXED-FREQUENCY SETTINGS \TU/lmr/m/n/10 345 + [] + +[345] +Underfull \hbox (badness 1776) in paragraph at lines 13642--13643 +[]\TU/lmr/bx/n/10 nonparametric vector autoregression as directional multivariate + [] + +[346] +Chapter 26. +[347 + +] [348] +Underfull \hbox (badness 1975) in paragraph at lines 13754--13754 +[]\TU/lmr/bx/n/14.4 Autoregression as a Subset Regression + [] + +[349] [350] [351] [352] [353] [354] [355] [356] +File: images/ch24_uni_ts.png Graphic file (type bmp) + + +Underfull \hbox (badness 1259) in paragraph at lines 14156--14156 +[]\TU/lmr/m/n/10 Figure 26.1: []Figure 24.1. \TU/lmtt/m/n/10 NNS.ARMA(..., h = 45, seasonal.factor = + [] + + +Underfull \vbox (badness 10000) has occurred while \output is active [] + +[357] [358] [359] [360] [361] [362] [363] [364] [365] [366 + +] +Chapter 27. +[367] [368] +Chapter 28. + +ignored: Infinite glue shrinkage found in box being split [369 + +] +ignored: Infinite glue shrinkage found in box being split +Overfull \hbox (2.70673pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 370 \TU/lmr/m/sl/10 CHAPTER 28. 
APPENDIX: NOTATION AND FUNCTION REFERENCE + [] + +[370] +Underfull \hbox (badness 1762) in paragraph at lines 14689--14691 +[]\TU/lmtt/m/n/10 UPM.VaR(percentile, degree, variable) \TU/lmr/m/n/10 Upper-tail analog of \TU/lmtt/m/n/10 LPM.VaR\TU/lmr/m/n/10 , + [] + + +Underfull \hbox (badness 1642) in paragraph at lines 14692--14692 +[]\TU/lmr/bx/n/14.4 Directional Decision Regions Crosswalk + [] + + +Overfull \hbox (14.1009pt too wide) in paragraph at lines 14719--14719 +[]|\TU/lmtt/m/n/10 LPM.VaR(alpha, + [] + + +Overfull \hbox (13.15091pt too wide) in paragraph at lines 14720--14720 +\TU/lmr/m/n/10 risk/opportunity + [] + + +Overfull \hbox (14.1009pt too wide) in paragraph at lines 14720--14720 +[]|\TU/lmtt/m/n/10 UPM.VaR(alpha, + [] + + +Overfull \hbox (14.1009pt too wide) in paragraph at lines 14721--14721 +[]|\TU/lmtt/m/n/10 NNS.ANOVA(...) + [] + + +Overfull \hbox (14.1009pt too wide) in paragraph at lines 14723--14723 +[]|\TU/lmtt/m/n/10 Co.LPM(degree, + [] + + +Overfull \hbox (14.1009pt too wide) in paragraph at lines 14723--14723 +\TU/lmtt/m/n/10 Co.UPM(degree, + [] + + +Overfull \hbox (3.6009pt too wide) in paragraph at lines 14724--14724 +[]|\TU/lmtt/m/n/10 LPM.VaR(...) + [] + + +Overfull \hbox (3.6009pt too wide) in paragraph at lines 14724--14724 +\TU/lmtt/m/n/10 UPM.VaR(...) + [] + + +Overfull \hbox (37.77452pt too wide) has occurred while \output is active +\TU/lmr/m/sl/10 28.4. DIRECTIONAL DECISION REGIONS CROSSWALK (CLASSICAL → NNS) \TU/lmr/m/n/10 371 + [] + +[371] +ignored: Infinite glue shrinkage found in box being split +Overfull \hbox (2.70673pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 372 \TU/lmr/m/sl/10 CHAPTER 28. 
APPENDIX: NOTATION AND FUNCTION REFERENCE + [] + +[372] +Overfull \hbox (90.75pt too wide) in paragraph at lines 14749--14749 +\TU/lmtt/m/n/10 NNS.reg(...)$Fitted.xy$residuals + [] + + +Overfull \hbox (22.5pt too wide) in paragraph at lines 14750--14750 +\TU/lmtt/m/n/10 confidence.interval + [] + + +Overfull \hbox (0.89pt too wide) in paragraph at lines 14751--14751 +\TU/lmr/m/n/10 rolling/segmented + [] + +[373] +Overfull \hbox (2.70673pt too wide) has occurred while \output is active +\TU/lmr/m/n/10 374 \TU/lmr/m/sl/10 CHAPTER 28. APPENDIX: NOTATION AND FUNCTION REFERENCE + [] + +[374] (./nns-book.aux) + *********** +LaTeX2e <2025-11-01> +L3 programming layer <2026-03-20> + *********** + ) +Here is how much of TeX's memory you used: + 20166 strings out of 470014 + 379344 string characters out of 5473777 + 1007866 words of memory out of 5000000 + 48632 multiletter control sequences out of 15000+600000 + 635269 words of font info for 116 fonts, out of 8000000 for 9000 + 14 hyphenation exceptions out of 8191 + 90i,11n,114p,1005b,598s stack positions out of 10000i,1000n,20000p,200000b,200000s + +Output written on nns-book.pdf (374 pages). diff --git a/tools/NNS/book/style.css b/tools/NNS/book/style.css new file mode 100644 index 0000000..0e3ab64 --- /dev/null +++ b/tools/NNS/book/style.css @@ -0,0 +1 @@ +/* Placeholder stylesheet for the NNS Book bookdown output. 
*/ diff --git a/tools/NNS/inst/doc/NNSvignette_01_Overview.R b/tools/NNS/doc/NNSvignette_01_Overview.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_01_Overview.R rename to tools/NNS/doc/NNSvignette_01_Overview.R diff --git a/tools/NNS/inst/doc/NNSvignette_01_Overview.Rmd b/tools/NNS/doc/NNSvignette_01_Overview.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_01_Overview.Rmd rename to tools/NNS/doc/NNSvignette_01_Overview.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_01_Overview.html b/tools/NNS/doc/NNSvignette_01_Overview.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_01_Overview.html rename to tools/NNS/doc/NNSvignette_01_Overview.html diff --git a/tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.R b/tools/NNS/doc/NNSvignette_02_Partial_Moments.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.R rename to tools/NNS/doc/NNSvignette_02_Partial_Moments.R diff --git a/tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.Rmd b/tools/NNS/doc/NNSvignette_02_Partial_Moments.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.Rmd rename to tools/NNS/doc/NNSvignette_02_Partial_Moments.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.html b/tools/NNS/doc/NNSvignette_02_Partial_Moments.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_02_Partial_Moments.html rename to tools/NNS/doc/NNSvignette_02_Partial_Moments.html diff --git a/tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.R b/tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.R rename to tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.R diff --git a/tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.Rmd b/tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.Rmd similarity index 100% rename from 
tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.Rmd rename to tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.html b/tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_03_Correlation_and_Dependence.html rename to tools/NNS/doc/NNSvignette_03_Correlation_and_Dependence.html diff --git a/tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.R b/tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.R rename to tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.R diff --git a/tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.Rmd b/tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.Rmd rename to tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.html b/tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_04_Normalization_and_Rescaling.html rename to tools/NNS/doc/NNSvignette_04_Normalization_and_Rescaling.html diff --git a/tools/NNS/inst/doc/NNSvignette_05_Sampling.R b/tools/NNS/doc/NNSvignette_05_Sampling.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_05_Sampling.R rename to tools/NNS/doc/NNSvignette_05_Sampling.R diff --git a/tools/NNS/inst/doc/NNSvignette_05_Sampling.Rmd b/tools/NNS/doc/NNSvignette_05_Sampling.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_05_Sampling.Rmd rename to tools/NNS/doc/NNSvignette_05_Sampling.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_05_Sampling.html b/tools/NNS/doc/NNSvignette_05_Sampling.html similarity index 100% rename from 
tools/NNS/inst/doc/NNSvignette_05_Sampling.html rename to tools/NNS/doc/NNSvignette_05_Sampling.html diff --git a/tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.R b/tools/NNS/doc/NNSvignette_06_Comparing_Distributions.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.R rename to tools/NNS/doc/NNSvignette_06_Comparing_Distributions.R diff --git a/tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.Rmd b/tools/NNS/doc/NNSvignette_06_Comparing_Distributions.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.Rmd rename to tools/NNS/doc/NNSvignette_06_Comparing_Distributions.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.html b/tools/NNS/doc/NNSvignette_06_Comparing_Distributions.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_06_Comparing_Distributions.html rename to tools/NNS/doc/NNSvignette_06_Comparing_Distributions.html diff --git a/tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.R b/tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.R rename to tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.R diff --git a/tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.Rmd b/tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.Rmd rename to tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.html b/tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_07_Clustering_and_Regression.html rename to tools/NNS/doc/NNSvignette_07_Clustering_and_Regression.html diff --git a/tools/NNS/inst/doc/NNSvignette_08_Classification.R 
b/tools/NNS/doc/NNSvignette_08_Classification.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_08_Classification.R rename to tools/NNS/doc/NNSvignette_08_Classification.R diff --git a/tools/NNS/inst/doc/NNSvignette_08_Classification.Rmd b/tools/NNS/doc/NNSvignette_08_Classification.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_08_Classification.Rmd rename to tools/NNS/doc/NNSvignette_08_Classification.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_08_Classification.html b/tools/NNS/doc/NNSvignette_08_Classification.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_08_Classification.html rename to tools/NNS/doc/NNSvignette_08_Classification.html diff --git a/tools/NNS/inst/doc/NNSvignette_09_Forecasting.R b/tools/NNS/doc/NNSvignette_09_Forecasting.R similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_09_Forecasting.R rename to tools/NNS/doc/NNSvignette_09_Forecasting.R diff --git a/tools/NNS/inst/doc/NNSvignette_09_Forecasting.Rmd b/tools/NNS/doc/NNSvignette_09_Forecasting.Rmd similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_09_Forecasting.Rmd rename to tools/NNS/doc/NNSvignette_09_Forecasting.Rmd diff --git a/tools/NNS/inst/doc/NNSvignette_09_Forecasting.html b/tools/NNS/doc/NNSvignette_09_Forecasting.html similarity index 100% rename from tools/NNS/inst/doc/NNSvignette_09_Forecasting.html rename to tools/NNS/doc/NNSvignette_09_Forecasting.html diff --git a/tools/NNS/examples/5_way_rank_probabilities_using_R.pdf b/tools/NNS/examples/5_way_rank_probabilities_using_R.pdf new file mode 100644 index 0000000..2c78a1b Binary files /dev/null and b/tools/NNS/examples/5_way_rank_probabilities_using_R.pdf differ diff --git a/tools/NNS/examples/7_Econometric_Reasons.html b/tools/NNS/examples/7_Econometric_Reasons.html new file mode 100644 index 0000000..8c3147d --- /dev/null +++ b/tools/NNS/examples/7_Econometric_Reasons.html @@ -0,0 +1,3303 @@ + + + + + + + + + + + + + +The 7 Reasons 
Most Econometric Investments Fail - NNS Contributions Towards Solutions + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    Introduction

    +

    Marcos Lopez de Prado recently shared a presentation titled +The 7 Reasons Most Econometric Investments Fail. It is a +wholesale identification of the misuse of traditional statistical +methods in quantitative finance.

    +

    Please download Marcos’ report here https://ssrn.com/abstract=3373116, or watch a video of +the presentation here: https://www.youtube.com/watch?v=BRUlSm4gdQ4.

    +

    This quick note is not an argument against any of the points Marcos +raises; rather, it illustrates how NNS and my research with +David Nawrocki and Hrishikesh Vinod have been addressing these issues for +the last several years. I will use his examples for each of his +pitfalls and show how NNS provides superior insight to +traditional methods.

    +
    +

    Load Required Packages in R NNS (>= 11.6)

    +
    #require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +require(NNS)
    +require(plyr)
    +require(data.table)
    +
    +
    +
    +

    Pitfall #1: Structured Data

    +
      +
    • NNS does not utilize standard transformations in order +to achieve stationarity.
    • +
    • NNS is based on a hierarchical and partitional +clustering method that is able to preserve the original covariance +matrix.
    • +
    • Unstructured, categorical, and other non-numeric data is seamlessly +encoded in NNS for analysis.
    • +
    +
    +

    NNS Clustering

    +

    Below is an example of the partitioning method based on partial +moment quadrants. Each observation is sequentially labelled per the +quadrant it resides in. A useful analogy is to treat each observation as +a leaf whose label records the path from the trunk of the tree +through the successively smaller branches on which that leaf sits.

    +

    The (lengthy) output shows the order of partitioning, +the observations, and their associated quadrant labels.

    +
    set.seed(123);x=rnorm(100);y=rnorm(100)
    +NNS.part(x,y,Voronoi=TRUE)
    +

    +
    ## $order
    +## [1] 2
    +## 
    +## $dt
    +##                 x           y quadrant prior.quadrant
    +##   1: -0.560475647 -0.71040656      q43             q4
    +##   2: -0.230177489  0.25688371      q23             q2
    +##   3:  1.558708314 -0.24669188      q31             q3
    +##   4:  0.070508391 -0.34754260      q32             q3
    +##   5:  0.129287735 -0.95161857      q34             q3
    +##   6:  1.715064987 -0.04502772      q13             q1
    +##   7:  0.460916206 -0.78490447      q34             q3
    +##   8: -1.265061235 -1.66794194      q44             q4
    +##   9: -0.686852852 -0.38022652      q41             q4
    +##  10: -0.445661970  0.91899661      q21             q2
    +##  11:  1.224081797 -0.57534696      q31             q3
    +##  12:  0.359813827  0.60796432      q12             q1
    +##  13:  0.400771451 -1.61788271      q34             q3
    +##  14:  0.110682716 -0.05556197      q14             q1
    +##  15: -0.555841135  0.51940720      q23             q2
    +##  16:  1.786913137  0.30115336      q13             q1
    +##  17:  0.497850478  0.10567619      q14             q1
    +##  18: -1.966617157 -0.64070601      q42             q4
    +##  19:  0.701355902 -0.84970435      q33             q3
    +##  20: -0.472791408 -1.02412879      q43             q4
    +##  21: -1.067823706  0.11764660      q24             q2
    +##  22: -0.217974915 -0.94747461      q43             q4
    +##  23: -1.026004448 -0.49055744      q42             q4
    +##  24: -0.728891229 -0.25609219      q42             q4
    +##  25: -0.625039268  1.84386201      q22             q2
    +##  26: -1.686693311 -0.65194990      q42             q4
    +##  27:  0.837787044  0.23538657      q13             q1
    +##  28:  0.153373118  0.07796085      q14             q1
    +##  29: -1.138136937 -0.96185663      q44             q4
    +##  30:  1.253814921 -0.07130809      q13             q1
    +##  31:  0.426464221  1.44455086      q12             q1
    +##  32: -0.295071483  0.45150405      q23             q2
    +##  33:  0.895125661  0.04123292      q13             q1
    +##  34:  0.878133488 -0.42249683      q31             q3
    +##  35:  0.821581082 -2.05324722      q33             q3
    +##  36:  0.688640254  1.13133721      q11             q1
    +##  37:  0.553917654 -1.46064007      q34             q3
    +##  38: -0.061911711  0.73994751      q21             q2
    +##  39: -0.305962664  1.90910357      q21             q2
    +##  40: -0.380471001 -1.44389316      q43             q4
    +##  41: -0.694706979  0.70178434      q22             q2
    +##  42: -0.207917278 -0.26219749      q41             q4
    +##  43: -1.265396352 -1.57214416      q44             q4
    +##  44:  2.168955965 -1.51466765      q33             q3
    +##  45:  1.207961998 -1.60153617      q33             q3
    +##  46: -1.123108583 -0.53090652      q42             q4
    +##  47: -0.402884835 -1.46175558      q43             q4
    +##  48: -0.466655354  0.68791677      q21             q2
    +##  49:  0.779965118  2.10010894      q11             q1
    +##  50: -0.083369066 -1.28703048      q43             q4
    +##  51:  0.253318514  0.78773885      q12             q1
    +##  52: -0.028546755  0.76904224      q12             q1
    +##  53: -0.042870457  0.33220258      q23             q2
    +##  54:  1.368602284 -1.00837661      q33             q3
    +##  55: -0.225770986 -0.11945261      q23             q2
    +##  56:  1.516470604 -0.28039534      q31             q3
    +##  57: -1.548752804  0.56298953      q24             q2
    +##  58:  0.584613750 -0.37243876      q32             q3
    +##  59:  0.123854244  0.97697339      q12             q1
    +##  60:  0.215941569 -0.37458086      q32             q3
    +##  61:  0.379639483  1.05271147      q12             q1
    +##  62: -0.502323453 -1.04917701      q43             q4
    +##  63: -0.333207384 -1.26015524      q43             q4
    +##  64: -1.018575383  3.24103993      q22             q2
    +##  65: -1.071791226 -0.41685759      q42             q4
    +##  66:  0.303528641  0.29822759      q14             q1
    +##  67:  0.448209779  0.63656967      q12             q1
    +##  68:  0.053004227 -0.48378063      q32             q3
    +##  69:  0.922267468  0.51686204      q11             q1
    +##  70:  2.050084686  0.36896453      q11             q1
    +##  71: -0.491031166 -0.21538051      q41             q4
    +##  72: -2.309168876  0.06529303      q24             q2
    +##  73:  1.005738524 -0.03406725      q13             q1
    +##  74: -0.709200763  2.12845190      q22             q2
    +##  75: -0.688008616 -0.74133610      q43             q4
    +##  76:  1.025571370 -1.09599627      q33             q3
    +##  77: -0.284773007  0.03778840      q23             q2
    +##  78: -1.220717712  0.31048075      q24             q2
    +##  79:  0.181303480  0.43652348      q12             q1
    +##  80: -0.138891362 -0.45836533      q41             q4
    +##  81:  0.005764186 -1.06332613      q34             q3
    +##  82:  0.385280401  1.26318518      q12             q1
    +##  83: -0.370660032 -0.34965039      q41             q4
    +##  84:  0.644376549 -0.86551286      q34             q3
    +##  85: -0.220486562 -0.23627957      q41             q4
    +##  86:  0.331781964 -0.19717589      q32             q3
    +##  87:  1.096839013  1.10992029      q11             q1
    +##  88:  0.435181491  0.08473729      q14             q1
    +##  89: -0.325931586  0.75405379      q21             q2
    +##  90:  1.148807618 -0.49929202      q31             q3
    +##  91:  0.993503856  0.21444531      q13             q1
    +##  92:  0.548396960 -0.32468591      q32             q3
    +##  93:  0.238731735  0.09458353      q14             q1
    +##  94: -0.627906076 -0.89536336      q43             q4
    +##  95:  1.360652449 -1.31080153      q33             q3
    +##  96: -0.600259587  1.99721338      q22             q2
    +##  97:  2.187332993  0.60070882      q11             q1
    +##  98:  1.532610626 -1.25127136      q33             q3
    +##  99: -0.235700359 -0.61116592      q41             q4
    +## 100: -1.026420900 -1.18548008      q44             q4
    +##                 x           y quadrant prior.quadrant
    +## 
    +## $regression.points
    +##    quadrant          x          y
    +## 1:       q1  0.5135824  0.3326156
    +## 2:       q2 -0.5625470  0.6488116
    +## 3:       q3  0.6961300 -0.7567517
    +## 4:       q4 -0.6945103 -0.6947171
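The leaf-and-branch labelling described above can be sketched in base R. This is a minimal illustration, not the NNS source: the helpers `quadrant_of` and `partition_sketch` are hypothetical names, the quadrant numbering (1 = both above, 2 = x below and y above, 3 = x above and y below, 4 = both below) is inferred from the printed output, and the split point is assumed to be the plain sample mean. `NNS.part` may use a different central point, so labels for observations near a split need not match the output above.

```r
# Illustrative sketch only: recursive mean-split quadrant labelling in base R.
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)

# Hypothetical helper: quadrant digit of each point relative to the cell means.
quadrant_of <- function(x, y) {
  x_hi <- x > mean(x)
  y_hi <- y > mean(y)
  ifelse(x_hi & y_hi, 1L,
         ifelse(!x_hi & y_hi, 2L,
                ifelse(x_hi & !y_hi, 3L, 4L)))
}

# Hypothetical helper: append one quadrant digit per level, so a label such as
# "q43" reads "quadrant 4 at the first split, then quadrant 3 inside it".
partition_sketch <- function(x, y, order = 2) {
  label <- rep("q", length(x))
  for (level in seq_len(order)) {
    for (cell in unique(label)) {
      idx <- which(label == cell)
      if (length(idx) < 2) next  # leave tiny cells unsplit
      label[idx] <- paste0(cell, quadrant_of(x[idx], y[idx]))
    }
  }
  label
}

labels <- partition_sketch(x, y, order = 2)

# Cell centroids, analogous in spirit to the regression points reported above.
centroids <- aggregate(cbind(x, y) ~ labels, data = data.frame(x, y, labels), FUN = mean)
print(head(data.frame(x, y, quadrant = labels)))
print(centroids)
```

Each extra level of `order` appends one more digit to every label, which is why deeper partitions yield longer quadrant names.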
    +
    +
    +

    NNS Covariance

    +

    Next we can show the covariance matrix from the partial moment +quadrants:

    +
      +
    • Co-upper partial moment CUPM [upper right]
    • +
    • Co-lower partial moment CLPM [lower left]
    • +
    • Divergent-lower partial moment DLPM [lower right]
    • +
    • Divergent-upper partial moment DUPM [upper left]
    • +
    +

    By adding the CLPM and CUPM off-diagonals and subtracting the +DLPM and DUPM off-diagonals, we arrive at the covariance of \(x,y\).

    +

    This is critical because it permits objective +functions that target individual quadrants, rather than just manipulating the entire covariance +matrix.
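The identity can be checked directly in base R without the package. The sketch below is an assumption-laden illustration rather than the `PM.matrix` implementation: it uses degree-1 deviations from the means, and it assumes that the population adjustment amounts to the usual `n / (n - 1)` factor so that the result is comparable with `cov()`.

```r
# Illustrative check: covariance rebuilt from the four degree-1 co-partial moments.
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
n <- length(x)

# Upper and lower deviations from each mean (exactly one is non-zero per point).
ux <- pmax(x - mean(x), 0); lx <- pmax(mean(x) - x, 0)
uy <- pmax(y - mean(y), 0); ly <- pmax(mean(y) - y, 0)

adj <- n / (n - 1)            # assumed form of the population adjustment

cupm <- adj * mean(ux * uy)   # both above their means  [upper right]
clpm <- adj * mean(lx * ly)   # both below their means  [lower left]
dlpm <- adj * mean(ux * ly)   # x above, y below        [lower right]
dupm <- adj * mean(lx * uy)   # x below, y above        [upper left]

cov_from_quadrants <- cupm + clpm - dlpm - dupm
print(c(quadrants = cov_from_quadrants, classical = cov(x, y)))
```

The equality is algebraic: writing each deviation as (upper minus lower) and expanding the product of the x and y deviations gives exactly these four terms with these signs.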

    +

    Please see the following for more on partial moments’ role as the +elements of variance and links to even more equivalences with +traditional measures:

    +

    Elements of Variance https://www.linkedin.com/pulse/elements-variance-fred-viole/

    +

    Here are the individual partial moment matrices, and the aggregate +covariance matrix provided by NNS.

    +
    # Store x,y into Matrix form
    +A=cbind(x,y)
    +PM.matrix(LPM_degree = 1, UPM_degree = 1, target = "mean", variable = A, pop_adj = TRUE)
    +
    ## $cupm
    +##           x         y
    +## x 0.4299250 0.1033601
    +## y 0.1033601 0.5411626
    +## 
    +## $dupm
    +##           x         y
    +## x 0.0000000 0.1469182
    +## y 0.1560924 0.0000000
    +## 
    +## $dlpm
    +##           x         y
    +## x 0.0000000 0.1560924
    +## y 0.1469182 0.0000000
    +## 
    +## $clpm
    +##           x         y
    +## x 0.4033078 0.1559295
    +## y 0.1559295 0.3939005
    +## 
    +## $cov.matrix
    +##             x           y
    +## x  0.83323283 -0.04372107
    +## y -0.04372107  0.93506310
    +
    # Traditional Covariance
    +cov(A)
    +
    ##             x           y
    +## x  0.83323283 -0.04372107
    +## y -0.04372107  0.93506310
    +
    +
    +
    +

    Pitfall #2: Correlations / Betas

    +
      +
    • NNS is able to determine non-linear relationships +between variables. See the following article for a full demonstration of +NNS correlation and dependence:
    • +
    +

    Nonlinear Correlation and Dependence Using NNS +https://ssrn.com/abstract=3010414.

    +
      +
    • NNS regressions are not dominated by outliers and can +accurately determine partial derivatives in the presence of noise. See +the following article:
    • +
    +

    Nonparametric Regression Using Clusters +http://rdcu.be/tz0J.

    +

    Below we recreate the examples Marcos provides and report the +NNS results for correlation and dependence:

    +
    # Generate x,y,e 
    +set.seed(123);x=rnorm(1000);y=rnorm(1000);e=rnorm(1000)
    +y=100*x+e
    +NNS.dep(x,y,print.map = TRUE)
    +

    +
    ## $Correlation
    +## [1] 0.9877512
    +## 
    +## $Dependence
    +## [1] 0.9877512
    +
    y=100*abs(x)+e
    +NNS.dep(x,y,print.map = TRUE)
    +

    +
    ## $Correlation
    +## [1] 0.09562359
    +## 
    +## $Dependence
    +## [1] 0.9086349
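For contrast, the same two data-generating processes can be run through the classical Pearson coefficient in base R. This is only a sketch of why a linear measure misses the second relationship; it does not reproduce `NNS.dep`, and the variable names `y_linear` and `y_abs` are introduced here for illustration.

```r
# Illustrative contrast: Pearson correlation on the linear and the |x| examples.
set.seed(123)
x <- rnorm(1000)
y <- rnorm(1000)   # drawn only to keep the random stream aligned with the code above
e <- rnorm(1000)

y_linear <- 100 * x + e
y_abs    <- 100 * abs(x) + e

pearson_linear <- cor(x, y_linear)       # linear link: Pearson captures it
pearson_abs    <- cor(x, y_abs)          # symmetric V-shape: Pearson is close to zero
pearson_folded <- cor(abs(x), y_abs)     # after folding x, the link is linear again

print(c(linear = pearson_linear, abs = pearson_abs, folded = pearson_folded))
```

The near-perfect value after folding `x` shows that `y_abs` is almost fully determined by `x`, which is the kind of relationship the dependence statistic above reports while the correlation stays low.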
    +
    +
    +

    Pitfall #3: Variance Adjudication & The Causality Fallacy

    +
      +
    • NNS regression is not just interpolation, but very good +extrapolation as well. See the previous regression link for more. Nonparametric Regression Using Clusters
    • +
    • Regression is not the wrong tool per se, rather what and how you are +regressing is of more importance. See the following forecasting +presentation which utilizes NNS regressions to achieve +excellent forecasting results.
    • +
    +
    +

    NNS Forecasting Presentation +

    +

    Download the .pdf file here: https://ssrn.com/abstract=3382300

    +
    +
    +

    NNS Forecasting vs. KERAS LSTM +Deep Learning

    +

    View and download the .html file here: https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Sunspots_example.html

    +
    +
    + +
    +

    Pitfall #5: p-Values

    +
      +
    • We identify a problem with p-values where the power of the test is +completely ignored. NNS proposes a solution based on degree +1 Lower Partial Moment CDFs.
    • +
    +

    Below is an image from the paper illustrating a decreasing \(\beta\), signifying an increasing power (in +blue) and how p-values (in red) jump to significant levels almost +immediately!

    +
    +

    NNS p-value Paper

    +

    The full paper demonstrating this effect is available for download +here:

    +

    Continuous CDFs and ANOVA with NNS +https://ssrn.com/abstract=3007373

    +

    +
    +
    +

    NNS Feature +Importance

    +

    Below is an example where the NNS dimension reduction regression is able to identify the significant regressors from 45 additional nonsensical noisy regressors. NNS also contains a routine NNS.stack to find the optimal threshold to separate these regressors.

    +
    # Noisy regressor example from: http://www.win-vector.com/blog/2016/05/pcr_part1_xonly/
    +# build example where even and odd variables are bringing in noisy images
    +# of two different signals.
    +mkData <- function(n) {
    +  for(group in 1:10) {
    +    # y is the sum of two effects yA and yB
    +    yA <- rnorm(n)
    +    yB <- rnorm(n)
    +    if(group==1) {
    +      d <- data.frame(y=yA+yB+rnorm(n))
    +      code <- 'x'
    +    } else {
    +      code <- paste0('noise',group-1)
    +    }
    +    yS <- list(yA,yB)
    +    # these variables are correlated with y in group 1,
    +    # but only to each other (and not y) in other groups
    +    for(i in 1:5) {
    +      vi <- yS[[1+(i%%2)]] + rnorm(nrow(d))
    +      d[[paste(code,formatC(i,width=2,flag=0),sep='.')]] <- ncol(d)*vi
    +    }
    +  }
    +  d
    +}
    +
    +set.seed(12345)
    +dTrain <- mkData(1000)
    +dTest <- mkData(1000)
    +
    +# Find optimal threshold and store output 
    +optimal.threshold = NNS.stack(dTrain[,-1], dTrain[,1], method = 2,
    +                              dim.red.method = "cor")$NNS.dim.red.threshold
    +optimal.threshold
    +
    ## [1] 0.36
    +
    # Print synthetic regressor equation using 'optimal.threshold'
    +print( NNS.reg(dTrain[,-1], dTrain[,1],  dim.red.method = "cor" ,
    +               threshold = optimal.threshold, plot = TRUE, smooth = TRUE)$equation )
    +

    +
    ##        Variable Coefficient
    +##  1:        x.01   0.4064923
    +##  2:        x.02   0.3623050
    +##  3:        x.03   0.4342246
    +##  4:        x.04   0.3943531
    +##  5:        x.05   0.4009567
    +##  6:   noise1.01   0.0000000
    +##  7:   noise1.02   0.0000000
    +##  8:   noise1.03   0.0000000
    +##  9:   noise1.04   0.0000000
    +## 10:   noise1.05   0.0000000
    +## 11:   noise2.01   0.0000000
    +## 12:   noise2.02   0.0000000
    +## 13:   noise2.03   0.0000000
    +## 14:   noise2.04   0.0000000
    +## 15:   noise2.05   0.0000000
    +## 16:   noise3.01   0.0000000
    +## 17:   noise3.02   0.0000000
    +## 18:   noise3.03   0.0000000
    +## 19:   noise3.04   0.0000000
    +## 20:   noise3.05   0.0000000
    +## 21:   noise4.01   0.0000000
    +## 22:   noise4.02   0.0000000
    +## 23:   noise4.03   0.0000000
    +## 24:   noise4.04   0.0000000
    +## 25:   noise4.05   0.0000000
    +## 26:   noise5.01   0.0000000
    +## 27:   noise5.02   0.0000000
    +## 28:   noise5.03   0.0000000
    +## 29:   noise5.04   0.0000000
    +## 30:   noise5.05   0.0000000
    +## 31:   noise6.01   0.0000000
    +## 32:   noise6.02   0.0000000
    +## 33:   noise6.03   0.0000000
    +## 34:   noise6.04   0.0000000
    +## 35:   noise6.05   0.0000000
    +## 36:   noise7.01   0.0000000
    +## 37:   noise7.02   0.0000000
    +## 38:   noise7.03   0.0000000
    +## 39:   noise7.04   0.0000000
    +## 40:   noise7.05   0.0000000
    +## 41:   noise8.01   0.0000000
    +## 42:   noise8.02   0.0000000
    +## 43:   noise8.03   0.0000000
    +## 44:   noise8.04   0.0000000
    +## 45:   noise8.05   0.0000000
    +## 46:   noise9.01   0.0000000
    +## 47:   noise9.02   0.0000000
    +## 48:   noise9.03   0.0000000
    +## 49:   noise9.04   0.0000000
    +## 50:   noise9.05   0.0000000
    +## 51: DENOMINATOR   5.0000000
    +##        Variable Coefficient
    +
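The printed equation can be read as a thresholding rule. The sketch below is a base R interpretation of that output, not the NNS implementation: it assumes the coefficients are each regressor's correlation with the response, set to zero below the threshold, and that `DENOMINATOR` counts the surviving regressors. The data and the names `X`, `coefs` and `synthetic` are hypothetical.

```r
# Base R sketch of correlation-threshold dimension reduction (assumed logic)
set.seed(12345)
n <- 1000
s <- rnorm(n)        # shared signal
y <- s + rnorm(n)    # response

X <- data.frame(
  x.01 = s + rnorm(n), x.02 = s + rnorm(n), x.03 = s + rnorm(n),  # informative
  noise.01 = rnorm(n), noise.02 = rnorm(n), noise.03 = rnorm(n)   # pure noise
)

threshold <- 0.36

# Correlation of every regressor with the response
r <- sapply(X, cor, y = y)

# Keep correlations that clear the threshold, zero out the rest
coefs <- ifelse(abs(r) > threshold, r, 0)
denominator <- sum(coefs != 0)

# Single synthetic regressor: weighted sum of survivors over their count
synthetic <- as.vector(as.matrix(X) %*% coefs) / denominator

round(coefs, 3)
```

Under these assumptions the noise columns drop out entirely, mirroring the zero coefficients in the table above.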
    +
    +
    +

    Pitfall #6: Training-Set Overfitting

    +
      +
    • NNS has the ability to perfectly fit any training set. +However, the ability to accurately determine signal:noise ratios allows +NNS to avoid overfitting. The regression paper Nonparametric Regression Using Clusters +link from pitfall #2 thoroughly discusses and proves this avoidance of +overfitting.
    • +
    +
    +
    +

    Pitfall #7: Test-Set Overfitting

    +
      +
    • The use of EXPECTED partial moments goes a long way in addressing this issue in finance. We published a note on the way to accomplish this and the underlying rationale, borrowed from information theory. Please see the following link for more on the method and a link to the paper:
    • +
    +

    Expected Partial Moments +https://www.linkedin.com/pulse/expected-partial-moments-fred-viole/

    +
    +
    +

    Summary

    +

    NNS is a quite capable methodology able to properly +address many of these pressing issues Marcos so diligently raises.

    +

    I look forward to further discussions and collaboration with those +equally as passionate about these issues, and open to embracing +alternative solutions. If you found this presentation interesting or +useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Bayes' Theorem From Partial Moments.pdf b/tools/NNS/examples/Bayes' Theorem From Partial Moments.pdf new file mode 100644 index 0000000..7b41037 Binary files /dev/null and b/tools/NNS/examples/Bayes' Theorem From Partial Moments.pdf differ diff --git a/tools/NNS/examples/Bias_and_CI.html b/tools/NNS/examples/Bias_and_CI.html new file mode 100644 index 0000000..172ce3f --- /dev/null +++ b/tools/NNS/examples/Bias_and_CI.html @@ -0,0 +1,3082 @@ + + + + + + + + + + + + + + +Continuous vs. Discrete Confidence Intervals + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    Discrete CDF Probability:

    +

    \[P(x \leq b) = n^{-1}\sum_{i=1}^n I(x_i \leq b) \] where \(I\) is the indicator function.

    +

    This is represented by the degree 0 LPM ratio for target \((b)\):

    +

    \[ LPM(\color{red}{0}, b, x) = \frac{1}{n} \sum_{i=1}^n [max(0, b - x_i)]^\color{red}{0} \] Whereby the 0 exponent converts this to a discrete indicator function. It is already normalized [0,1].

    +
    +
    +

    Continuous PDF Probability:

    +

    \[P(x \leq b) = \frac{\int_{a}^b f(x) dx}{(\int_{a}^b f(x) dx +\int_{b}^c f(x) dx) } \quad where \quad a<b<c\] This is represented by the degree 1 LPM ratio for target \((b)\):

    +

    \[ LPM(\color{red}{1}, b, x) = \frac{1}{n} \sum_{i=1}^n [max(0, b - x_i)]^\color{red}{1} \] And to normalize this value [0,1], we divide by the total area (lower partial moment + upper partial moment).

    +

    \[ LPM_{norm}(1, b, x) = \frac{LPM(1, b, x)}{LPM(1, b, x) + UPM(1, b, x)} \]

    +

    In NNS, this is the LPM.ratio(degree, target, variable) function.
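The two definitions above can be written out directly in base R. This is a minimal sketch of the formulas, not the NNS source; the lowercase helpers `lpm`, `upm` and `lpm_ratio` are hypothetical names. Degree 0 is coded as an explicit indicator because R evaluates `0^0` as 1, so the literal formula would count every observation.

```r
# Base R sketch of the lower / upper partial moments and the normalized ratio
lpm <- function(degree, target, x) {
  if (degree == 0) mean(x <= target) else mean(pmax(target - x, 0)^degree)
}
upm <- function(degree, target, x) {
  if (degree == 0) mean(x > target) else mean(pmax(x - target, 0)^degree)
}
lpm_ratio <- function(degree, target, x) {
  lower <- lpm(degree, target, x)
  lower / (lower + upm(degree, target, x))
}

set.seed(12345)
x <- rnorm(100, mean = 5, sd = 1)

discrete   <- lpm_ratio(0, mean(x), x)  # share of observations at or below the mean
continuous <- lpm_ratio(1, mean(x), x)  # share of total area below the mean

c(discrete = discrete, ecdf = ecdf(x)(mean(x)), continuous = continuous)
```

The degree 0 ratio counts observations, while the degree 1 ratio weights each observation by its distance from the target.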

    +
    +
    +

    Key Difference

    +

    The continuous measure differs from the discrete one because the PDF (continuous, \(f(x)\)) is the derivative of the CDF (discrete, \(F(x)\)), whereby \(f(x) = \frac{dF(x)}{dx}\), and so it carries the magnitude of the observations rather than only their count.

    +

    Even if you have 1 million observations in the CDF, it is still discrete; by using the PDF, you capture the area between the bins of the discrete distribution and are truly continuous.

    +
    +
    +

    Elementary Example

    +

    Here we will illustrate two very important features, the difference from using the PDF vs. CDF in bias reduction and the ability to compensate for area between values.

    +

    First, we will show that the degree 0 partial moment ratio is equal to the empirical CDF.

    +
    library(NNS)
    +set.seed(12345)
    +x = rnorm(100, mean = 5, sd = 1)
    +
    +

    Empirical CDF

    +
    P = ecdf(x)
    +P(mean(x))
    +
    ## [1] 0.44
    +
    +
    +

    Discrete LPM degree 0

    +
    LPM.ratio(0, mean(x), x)
    +
    ## [1] 0.44
    +
    +
    +

    Visualization

    +

    We can see the red LPM degree 0 points overlaid on the empirical CDF…same values.

    +
    plot(ecdf(x))
    +points(sort(x), LPM.ratio(0,sort(x),x), col = 'red')
    +legend('left', legend = c('ecdf','LPM.CDF'), fill=c('black','red'), border=NA, bty='n')
    +

    +
    +
    +
    +

    Bias

    +

    In this example, 44% of the observations lie below the mean. We know that this should approach 50% in the limit, but it is not guaranteed to equal 50% for any finite number of observations.

    +
    +

    Continuous LPM degree 1

    +

    Let’s see how the continuous area based probability treats this…

    +
    LPM.ratio(1, mean(x), x)
    +
    ## [1] 0.5
    +

    50% exactly. Wow, good job! Maybe it’s a lucky outcome, let’s increase the number of observations and see what happens…

    +
    set.seed(12345)
    +x_2 = rnorm(500, mean = 5, sd = 1)
    +
    +P = ecdf(x_2)
    +P(mean(x_2))
    +
    ## [1] 0.496
    +
    LPM(0, mean(x_2), x_2)
    +
    ## [1] 0.496
    +
    LPM.ratio(1, mean(x_2), x_2)
    +
    ## [1] 0.5
    +

    Still not there for the discrete CDF based probabilities, but the area based probability is consistent.

    +

    Let’s test this out for a range of observations:

    +
    # Generate data
    +set.seed(12345); x = rnorm(500)
    +
    +# Compute statistics for each observation
    +LPM.mean_target.CDF = numeric()
    +LPM.1.mean_target.CDF = numeric()
    +
    +for(i in 1:length(x)){
    +    LPM.mean_target.CDF[i] = LPM.ratio(0, mean(x[1:i]), x[1:i]);
    +    LPM.1.mean_target.CDF[i] = LPM.ratio(1, mean(x[1:i]), x[1:i])
    +}
    +
    +# Plot values
    +plot(LPM.mean_target.CDF, col='red', type = 'l', lwd=3)
    +lines((1:500), LPM.1.mean_target.CDF, col='blue', lwd=3)
    +legend('topright',legend = c('LPM.CDF','LPM.1.CDF'),fill=c('red','blue'),
    +       border=NA,bty='n')
    +

    +

    For any number of observations from any type of distribution, the continuous probability of LPM degree 1 evaluated at the sample mean \((\hat{\mu}_x)\) will equal 0.5, without exception, while the empirical CDF based probability will asymptotically approach this known value.

    +

    There is no bias associated with this measure. Once we realize this salient point, we can apply it to other points in the distribution to ascertain confidence intervals.
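A short derivation of why the degree 1 ratio at the sample mean is exactly one half, using the definitions given earlier and assuming the observations are not all identical (so the denominator is non-zero). Each deviation splits as \(x_i - \hat{\mu}_x = \max(0, x_i - \hat{\mu}_x) - \max(0, \hat{\mu}_x - x_i)\), and deviations from the sample mean sum to zero:

\[ \sum_{i=1}^n (x_i - \hat{\mu}_x) = 0 \quad \Longrightarrow \quad \frac{1}{n}\sum_{i=1}^n \max(0, x_i - \hat{\mu}_x) = \frac{1}{n}\sum_{i=1}^n \max(0, \hat{\mu}_x - x_i) \]

\[ \Longrightarrow \quad UPM(1, \hat{\mu}_x, x) = LPM(1, \hat{\mu}_x, x) \quad \Longrightarrow \quad LPM_{norm}(1, \hat{\mu}_x, x) = \frac{LPM(1, \hat{\mu}_x, x)}{LPM(1, \hat{\mu}_x, x) + UPM(1, \hat{\mu}_x, x)} = \frac{1}{2} \]

The result depends only on the target being the sample mean, not on the shape of the distribution or the number of observations.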

    +
    +
    +
    +

    Confidence Intervals

    +

    Now that the bias (lack thereof) of the metric has been established, we can move to confidence intervals.

    +

    Let’s look at an example described in Efron and Tibshirani’s (1993) text on bootstrapping (page 19), available in the R-package bootstrap.

    +
    library(bootstrap)
    +data("law")
    +
    +law
    +
    + +
    +

    Using the other bootstrapping package in R, boot, we can easily generate several confidence intervals for the correlation statistic of the sample.1

    +
    library(boot)
    +
    +# Define Correlation Function
    +get_r <- function(data, indices, x, y) {
    +
    +  d <- data[indices, ]
    +  r <- round(as.numeric(cor(d[x], d[y])), 3)
    +
    +  r
    +}
    +
    +set.seed(12345)
    +
    +boot_out <- boot(
    +  law,
    +  x = "LSAT", 
    +  y = "GPA", 
    +  R = 500,
    +  statistic = get_r
    +)
    +
    +# Visualization of the distribution of bootstrapped correlation statistics
    +hist(boot_out$t)
    +

    +
    # Confidence Intervals and their methods
    +boot.ci(boot_out)
    +
    ## Warning in boot.ci(boot_out): bootstrap variances needed for studentized
    +## intervals
    +
    ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    +## Based on 500 bootstrap replicates
    +## 
    +## CALL : 
    +## boot.ci(boot.out = boot_out)
    +## 
    +## Intervals : 
    +## Level      Normal              Basic         
    +## 95%   ( 0.5247,  1.0368 )   ( 0.5900,  1.0911 )  
    +## 
    +## Level     Percentile            BCa          
    +## 95%   ( 0.4609,  0.9620 )   ( 0.3948,  0.9443 )  
    +## Calculations and Intervals on Original Scale
    +## Some BCa intervals may be unstable
    +

    It looks very asymmetrical. We can try to correct for that with a double bootstrap by keeping the variance of the statistic for each bootstrap and studentizing…

    +
    library(purrr)
    +
    ## 
    +## Attaching package: 'purrr'
    +
    ## The following objects are masked from 'package:foreach':
    +## 
    +##     accumulate, when
    +
    get_r_var <- function(x, y, data, indices, its) {
    +
    +  d <- data[indices, ]
    +  r <- cor(d[x], d[y]) %>%
    +    as.numeric() %>%
    +    round(3)
    +  n <- nrow(d)
    +
    +  v <- boot(
    +    x = x, 
    +    y = y,
    +    R = its,
    +    data = d,
    +    statistic = get_r
    +  ) %>%
    +    pluck("t") %>%
    +    var(na.rm = TRUE)
    +  c(r, v)
    +}
    +
    +boot_t_out <- boot(
    +  x = "LSAT", y = "GPA", its = 200,
    +  R = 1000, data = law, statistic = get_r_var
    +)
    +
    +
    +# CI
    +boot.ci(boot_t_out)
    +
    ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    +## Based on 1000 bootstrap replicates
    +## 
    +## CALL : 
    +## boot.ci(boot.out = boot_t_out)
    +## 
    +## Intervals : 
    +## Level      Normal              Basic             Studentized     
    +## 95%   ( 0.5187,  1.0485 )   ( 0.5880,  1.0830 )   (-0.2543,  0.9728 )  
    +## 
    +## Level     Percentile            BCa          
    +## 95%   ( 0.4690,  0.9640 )   ( 0.3210,  0.9408 )  
    +## Calculations and Intervals on Original Scale
    +## Some BCa intervals may be unstable
    +

    Negative values for the studentized version!?!?!

    +

    Next, we can compare these results with the partial moments solutions.

    +
    +

    Discrete LPM degree 0

    +

    Notice this corresponds with the percentile based CI. We call this from the double bootstrap output to note the correspondence. It is also true for the regular bootstrap using LPM.VaR(.025, 0, boot_out$t) and that percentile output.

    +
    # Discrete Lower and Upper CI
    +LPM.VaR(.025, 0, boot_t_out$t[,1]); UPM.VaR(.025, 0, boot_t_out$t[,1])
    +
    ## [1] 0.4688333
    +
    ## [1] 0.9632222
    +
    +
    +

    Continuous LPM degree 1

    +

    Much more sensible treatment, especially noticeable in the left-tail of our original bootstrap. There is no need for the double bootstrap as the area based probabilities compensate our original single bootstrap output boot_out$t.

    +
    LPM.VaR(.025, 1, boot_out$t); UPM.VaR(0.25, 1, boot_out$t)
    +
    ## [1] 0.5612749
    +
    ## [1] 0.8263255
    +
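One way to picture an area-based interval bound is as the target at which the degree 1 LPM ratio equals the requested percentile. The base R sketch below solves for that target numerically. It rests on the assumption that this is the quantity `LPM.VaR` reports for degree 1; `area_quantile` is a hypothetical helper, and the skewed sample `stat` merely stands in for bootstrap replicates.

```r
# Base R sketch: invert the degree 1 LPM ratio to get an area-based quantile
lpm1 <- function(target, x) mean(pmax(target - x, 0))
upm1 <- function(target, x) mean(pmax(x - target, 0))
lpm1_ratio <- function(target, x) {
  lower <- lpm1(target, x)
  lower / (lower + upm1(target, x))
}

# Hypothetical helper: the ratio rises from 0 at min(x) to 1 at max(x),
# so a root exists inside range(x) for any percentile strictly between 0 and 1
area_quantile <- function(p, x) {
  uniroot(function(t) lpm1_ratio(t, x) - p, interval = range(x), tol = 1e-10)$root
}

set.seed(12345)
stat <- rbeta(500, 8, 2)  # left-skewed stand-in for bootstrap replicates

lower <- area_quantile(0.025, stat)
upper <- area_quantile(0.975, stat)

c(area_lower = lower, area_upper = upper)
quantile(stat, c(0.025, 0.975))  # count-based percentiles for comparison
```

Because the ratio weights observations by distance from the target, the solved bounds respond to the magnitude of tail observations rather than only to how many there are.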
    +
    +
    +
    +
      +
    1. There is an excellent description with code explaining all of the derivations for each component of the bootstrap confidence interval output at the following link: https://blog.methodsconsultants.com/posts/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package/
    +
    + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Boston_Housing.html b/tools/NNS/examples/Boston_Housing.html new file mode 100644 index 0000000..088ff83 --- /dev/null +++ b/tools/NNS/examples/Boston_Housing.html @@ -0,0 +1,2983 @@ + + + + + + + + + + + + + +Boston Housing Example + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    1 Introduction

    +

    The dataset (Boston Housing Price) was taken from the StatLib library which is maintained at Carnegie Mellon University and is freely available for download from the UCI Machine Learning Repository. The dataset consists of 506 observations of 14 attributes. The median value of house price in $1000s, denoted by MEDV, is the outcome or the dependent variable in our model. Below is a brief description of each feature and the outcome in our dataset:1

    +
      +
    1. CRIM – per capita crime rate by town
    2. ZN – proportion of residential land zoned for lots over 25,000 sq.ft
    3. INDUS – proportion of non-retail business acres per town
    4. CHAS – Charles River dummy variable (1 if tract bounds river; else 0)
    5. NOX – nitric oxides concentration (parts per 10 million)
    6. RM – average number of rooms per dwelling
    7. AGE – proportion of owner-occupied units built prior to 1940
    8. DIS – weighted distances to five Boston employment centres
    9. RAD – index of accessibility to radial highways
    10. TAX – full-value property-tax rate per $10,000
    11. PTRATIO – pupil-teacher ratio by town
    12. B – 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
    13. LSTAT – % lower status of the population
    14. MEDV – Median value of owner-occupied homes in $1000’s
    +
    #require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +
    +library(NNS)
    +library(data.table)
    +library(rgl)
    +library(mlbench)
    +
    +
    +data("BostonHousing")
    +str(BostonHousing)
    +
    ## 'data.frame':    506 obs. of  14 variables:
    +##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
    +##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
    +##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
    +##  $ chas   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    +##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
    +##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
    +##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
    +##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
    +##  $ rad    : num  1 2 2 3 3 3 5 5 5 5 ...
    +##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
    +##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
    +##  $ b      : num  397 397 393 395 397 ...
    +##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
    +##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
    +
    +
    +

    2 NNS.stack

    +

    NNS.reg is the nonparametric regression technique most of NNS is built on.

    +

    NNS.stack is a wrapper function that tests the number of clusters n.best parameter in NNS.reg, and the correlation / dependence threshold in the NNS.reg dimension reduction regression.

    +

    It is suitable for both continuous and categorical independent / dependent variables. No data preprocessing or transformations are required.

    +
    library(caret)
    +set.seed(12345)
    +
    +
    +#Do data partitioning
    +inTrain <- createDataPartition(y = BostonHousing$medv, p = 0.70, list = FALSE)
    +
    +training <- BostonHousing[inTrain,]
    +testing <- BostonHousing[-inTrain,]
    +
    +
    +nns_reg_estimates <- NNS.stack(training[, -14], training[, 14], IVs.test = testing[, -14],
    +                               status = TRUE,
    +                               obj.fn = expression( sqrt(mean((predicted - actual)^2))),
    +                               objective = 'min')
    +
    +
    +# RMSE of Optimum NNS regression
    +sqrt(mean((nns_reg_estimates$reg-testing[,14])^2))
    +
    ## [1] 4.138821
    +
    # RMSE of Optimum NNS dimension reduction regression
    +sqrt(mean((nns_reg_estimates$dim.red-testing[,14])^2))
    +
    ## [1] 3.661433
    +
    # RMSE of Combined Methods
    +sqrt(mean(abs(testing[,14]-nns_reg_estimates$stack)^2))
    +
    ## [1] 2.955937
    +
    plot(nns_reg_estimates$stack, testing[,14])
    +
    +# Actual Values
    +lines(testing[,14],testing[,14],col='red')
    +

    +
    +
    +

    3 Random Forest

    +

    Using the random forest method, we can compare results.

    +
    library(randomForest)
    +set.seed(12345)
    +fit.rf <- randomForest(formula = medv ~ ., data = training)
    +
    +pred.rf <- predict(fit.rf, testing)
    +
    +sqrt(mean((pred.rf - testing$medv)^2))
    +
    ## [1] 2.371259
    +
    plot(nns_reg_estimates$stack, testing[,14], pch = 19)
    +lines(testing[,14],testing[,14],col='red')
    +points(pred.rf, testing[,14],col='red')
    +

    +
    +
    +

    4 Comments

    +

    The point of this single seed example2 is to further highlight the robust flexibility of the NNS multivariate regression. Whether the goal is classification or inter / extrapolation of continuous variables, NNS.reg is capable of the task.

    +

    One distinct advantage of NNS over tree based methods is the ability to seamlessly extrapolate beyond the current range of observations.
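The limitation of tree based methods referred to here can be seen with a single regression tree. The sketch below uses `rpart` (a recommended package shipped with standard R installations) on made-up data; it illustrates only the tree side of the comparison and makes no claim about NNS output.

```r
# Sketch: a regression tree cannot predict beyond the training range
library(rpart)

train <- data.frame(x = 1:100)
train$y <- 2 * train$x           # exact linear trend, y ranges from 2 to 200

fit <- rpart(y ~ x, data = train)

# Predict at a point outside the observed x range; the true value is 300
pred_outside <- predict(fit, newdata = data.frame(x = 150))

c(prediction     = unname(pred_outside),
  true_value     = 300,
  max_training_y = max(train$y))
```

Each leaf predicts the average response of the training points it contains, so no prediction can exceed the largest training response, however far the new point lies outside the sample.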

    +

    NNS.reg is not a black-box ML solution, rather, NNS provides a theoretically sound solution to dynamically partitioning regressors, and finding conditional outputs from those partitions.

    +

    NNS is not a one-trick pony, as it has been demonstrated to excel in time-series forecasting, nonlinear continuous regressions, and provide solutions for econometric applications. See the following examples:

    + +

    I look forward to further discussions and collaboration with those equally as passionate about these issues, and open to embracing alternative solutions. If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    +
    +
    +
      +
    1. https://rpubs.com/chocka314/251613↩︎
    2. Numerous seeds are required for a more robust experiment and efficiency conclusions like presented in the NNS vs xgboost example.↩︎
    +
    + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html b/tools/NNS/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html new file mode 100644 index 0000000..13295d5 --- /dev/null +++ b/tools/NNS/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html @@ -0,0 +1,1921 @@ + + + + + + + + + + + + + + +Causal Inference Amongst Macroeconomic Variables Using NNS + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + +
    +

    Economic Variables

    +

    The variable list was used from the NY Fed Nowcast method (https://www.newyorkfed.org/research/policy/nowcast) and +augmented with several additional variables, specifically:

    +
      +
    • 10-year Treasury rates (DGS10)
    • 2-10 year Treasury rate spread (T10Y2Y)
    • weekly unemployment claims (ICSA)
    • Federal Reserve total assets (WALCL)
    +
    library(NNS) # NNS v0.9.4 on CRAN
    +library(rmarkdown)
    +library(nowcasting)
    +
    +NYFED$legend[,-4]
    +
    + +
    +
    +
    +

    Step 1: Create a NNS.nowcast object

    +

    This preliminary step will create an interpolated/extrapolated +data.frame of all of the economic variables such that frequencies are +aligned to monthly values. We achieve this via the +NNS.nowcast function which has all variables predefined. +Setting h=0 returns just the data to the current month.

    +
    nns_estimates = NNS.nowcast(h = 0)
    +
    +# View the last year's values
    +tail(nns_estimates, 12)
    +
    + +
    +
    +
    +

    Step 2: Determine causation with NNS.caus

    +

    NNS causality tries to determine the conditional probability of two events by first normalizing each variable against its own past innovations for a given lag \(\tau\) (per the Granger insight) such that:

    +

    $$x^* = f(x_{t-1}, x_{t-2},…,x_{t-\tau})$$

    +

    $$y^* = f(y_{t-1}, y_{t-2},…,y_{t-\tau})$$

    +

    and then normalizing each of these new variables to a shared +rangespace by:

    +

    $$ x^{**} = f(x^*, y^*)$$

    +

    $$ y^{**} = g(x^*, y^*)$$

    +

    Once \(x^{**}\) and \(y^{**}\) are on the shared rangespace, the +conditional probability can be ascertained and then the correlation +applied to determine causation. \[C_{x\rightarrow{y}} = P(y^{**}|x^{**}) * +\rho_{x^{**}y^{**}}\]

    +

    NNS.caus returns a matrix where directional causation is +returned as [column variable] ---> [row variable].

    +
    nns_econ_causes = NNS.caus(nns_estimates)
    +
    +colnames(nns_econ_causes) = colnames(nns_estimates)
    +
    +dim(nns_econ_causes)
    +
    ## [1] 29 29
    +

    Large 29x29 matrix output, so we will just call the +WALCL column (note the 1 for the final entry which would be +the lower right diagonal entry).

    +
    nns_econ_causes[, ncol(nns_econ_causes)]
    +
    ##             PAYEMS             JTSJOL           CPIAUCSL            DGORDER 
    +##        0.659826699        0.429510557        0.205719438        0.419194364 
    +##              RSAFS             UNRATE              HOUST             INDPRO 
    +##        0.081722966        0.000000000        0.002056343        0.621722421 
    +##            DSPIC96            BOPTEXP            BOPTIMP            TTLCONS 
    +##        0.004049306        0.343193396        0.244118243        0.278626686 
    +##                 IR           CPILFESL           PCEPILFE              PCEPI 
    +##        0.564019839        0.094859761        0.141432154        0.165069001 
    +##             PERMIT                TCU             BUSINV             ULCNFB 
    +##        0.000000000        0.000000000        0.433602113        0.395638075 
    +##                 IQ  GACDISA066MSFRBNY GACDFSA066MSFRBPHI             PCEC96 
    +##        0.395719798        0.000000000        0.071892695        0.145776228 
    +##              GDPC1               ICSA              DGS10             T10Y2Y 
    +##        0.250547691        0.008500394        0.304207853        0.244174295 
    +##              WALCL 
    +##        1.000000000
    +

    All positive or 0 causal directions to the other economic +variables. Surely this can’t be, let’s try to add some noise +terms to see if those are caused by the Federal Reserve’s asset levels +too…

    +
    +
    +

    Add some noise terms

    +

    We will add some noise terms to see if WALCL expresses causal inference on those variables as well. NOISE is \(N(0,1)\) while NOISE_2 is \(N(10,20)\), that is, mean 10 and standard deviation 20.

    +
    # View the last year's values with both noise terms
    +data.frame(tail(econ_variables_with_noise, 12))
    +
    + +
    +

    Same NNS.nowcast procedure with the new +econ_variables_with_noise which utilizes the following +NNS.VAR call:

    +
    nns_estimates_with_noise = NNS.VAR(econ_variables_with_noise, h = 0, tau = 12, nowcast = TRUE)
    +

    Then use nns_estimates_with_noise in +NNS.caus.

    +
    nns_econ_causes_with_noise = NNS.caus(nns_estimates_with_noise)
    +nns_econ_causes_with_noise[, 29:31]
    +
    ##                           WALCL       NOISE       NOISE_2
    +## PAYEMS              0.659826688  0.09856778  0.0000000000
    +## JTSJOL              0.429510904  0.06808786  0.0000000000
    +## CPIAUCSL            0.205705476  0.08698442  0.0000000000
    +## DGORDER             0.419190516  0.08793995  0.0000000000
    +## RSAFS               0.081711194  0.09226950  0.0000000000
    +## UNRATE              0.000000000  0.05121020  0.0953364223
    +## HOUST               0.002055175  0.06112331  0.0010841936
    +## INDPRO              0.621722574  0.10295232  0.0400328086
    +## DSPIC96             0.004050983  0.09506633  0.0000000000
    +## BOPTEXP             0.352358901  0.07351663  0.0000000000
    +## BOPTIMP             0.242951914  0.09865277  0.0000000000
    +## TTLCONS             0.278626931  0.00000000  0.0000000000
    +## IR                  0.564040215  0.09224655  0.0458487296
    +## CPILFESL            0.094841592  0.08657842  0.0000000000
    +## PCEPILFE            0.141416551  0.09280547  0.0000000000
    +## PCEPI               0.165053156  0.09367095  0.0001504519
    +## PERMIT              0.000000000  0.04650272  0.0036113237
    +## TCU                 0.000000000  0.10505892  0.0919810910
    +## BUSINV              0.433632640  0.08018879  0.0000000000
    +## ULCNFB              0.388491973  0.09636205  0.0000000000
    +## IQ                  0.395723389  0.00000000  0.0096782079
    +## GACDISA066MSFRBNY   0.000000000  0.02332553 -0.0477040828
    +## GACDFSA066MSFRBPHI  0.071698602 -0.08785480 -0.0325825676
    +## PCEC96              0.145760542  0.09567023  0.0000000000
    +## GDPC1               0.250543113  0.09532958  0.0000000000
    +## ICSA                0.008515602  0.01369432  0.0000000000
    +## DGS10               0.306391014  0.00859394  0.0613115325
    +## T10Y2Y              0.243275602  0.04459461  0.0815768202
    +## WALCL               1.000000000  0.06857393  0.0000000000
    +## NOISE              -0.068573926  1.00000000  0.0291295109
    +## NOISE_2             0.000000000 -0.02912951  1.0000000000
    +

    No positive causal inference for the noise term and no causal +inference for the expanded noise term.

    +
    +
    +

    Significance and Strength of Inference

    +

    We can run random permutations of NNS.caus for the +NOISE and NOISE_2 variables and then determine +strength of causal inference [0,1] for WALCL on the other +economic variables of interest.

    +
    results = list()
    +n = nrow(nns_estimates_with_noise)
    +
    +cl = parallel::makeCluster(parallel::detectCores()-1)
    +doParallel::registerDoParallel(cl)
    +
    +
    +results = foreach(i = 1:100, .packages = "NNS")%dopar%{
    +  set.seed(123*i)
    +  nns_estimates_with_noise$NOISE = rnorm(n, 0, 1)
    +  nns_estimates_with_noise$NOISE_2 = rnorm(n, 10, 20)
    +  
    +  NNS.caus(nns_estimates_with_noise)
    +}
    +
    +parallel::stopCluster(cl)
    +registerDoSEQ()
    +
    +noise_permutations = do.call(cbind, lapply(results, function(x) x[,30]))
    +noise_2_permutations = do.call(cbind, lapply(results, function(x) x[,31]))
    +

    Next we will determine the 95% quantile level for each of the +NOISE variables’ NNS.caus values, and see if +WALCL eclipses that value. We are using the maximum +quantile value between both NOISE values for each economic +variable.

    +
    sig_values = lapply(1:29, function(x) list(quantile(noise_permutations[x,], .95),
    +                                           quantile(noise_2_permutations[x,], .95)))
    +
    +
    +noise_sig_values = unlist(lapply(sig_values,`[[`,1))
    +noise_2_sig_values = unlist(lapply(sig_values,`[[`,2))
    +
    +final_comp = cbind.data.frame(nns_econ_causes[, ncol(nns_econ_causes)],
    +                              noise_sig_values,
    +                              noise_2_sig_values)
    +
    +Significant = as.vector(final_comp[,1] > pmax(final_comp[,2], final_comp[,3]))
    +
    +final_comp = cbind(final_comp, Significant)
    +colnames(final_comp)[1:3] = c("WALCL", "NOISE 95% Sig Value", "NOISE_2 95% Sig Value")
    +
    +final_comp
    +
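The threshold rule above can be sketched on simulated numbers. This is a toy illustration only: the score matrices and the `observed` vector are made-up stand-ins, not `NNS.caus` output, and all names here are hypothetical.

```r
# Toy illustration with simulated values (not the NNS.caus results from the text)
set.seed(1)
n_vars <- 5      # hypothetical number of target variables
n_perm <- 100    # number of random noise draws

# Rows = target variables, columns = draws; values stand in for noise causal scores
noise_scores   <- matrix(runif(n_vars * n_perm, 0, 0.10), nrow = n_vars)
noise_2_scores <- matrix(runif(n_vars * n_perm, 0, 0.05), nrow = n_vars)

# 95% quantile per target variable, for each noise series
noise_q   <- apply(noise_scores,   1, quantile, probs = 0.95)
noise_2_q <- apply(noise_2_scores, 1, quantile, probs = 0.95)

# Hypothetical observed scores for the variable being tested
observed <- c(0.30, 0.02, 0.15, 0.08, 0.50)

# A score is flagged only if it beats the larger of the two noise quantiles
threshold   <- pmax(noise_q, noise_2_q)
significant <- observed > threshold
strength    <- ifelse(significant, observed - threshold, 0)

data.frame(observed, threshold, significant, strength)
```

Taking the row-wise maximum of the two quantiles makes the test conservative: a variable has to beat both noise benchmarks, not just one.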
    + +
    +
    +

    Strength of Causal Inference

    +

    WALCL implies causal inference via NNS.caus onto the following macroeconomic variables in the NY Fed nowcast model:

    +
    positive_results = which(final_comp$Significant[1:28]>0)
    +effects = cbind.data.frame("Macroeconomic Variable" = colnames(econ_variables)[positive_results], 
    +                           "Strength of Causal Inference of Fed Assets" = 
    +                             as.vector(final_comp[positive_results,1] - pmax(final_comp[positive_results,2],
    +                                                                             final_comp[positive_results,3])))
    +
    +effects
    +
    + +
    +
    +
    +
    +

    References

    +
      +
    1. Viole, Fred, Multivariate Time Series Forecasting: Nonparametric Vector Autoregression Using NNS (November 18, 2019). Available at SSRN: https://ssrn.com/abstract=3489550

    2. Viole, Fred and Nawrocki, David N., Causation (June 1, 2013). Available at SSRN: https://ssrn.com/abstract=2273756
    +
    + + + + +
    + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Causal_Inference_with_NNS_stack.pdf b/tools/NNS/examples/Causal_Inference_with_NNS_stack.pdf new file mode 100644 index 0000000..a030c59 Binary files /dev/null and b/tools/NNS/examples/Causal_Inference_with_NNS_stack.pdf differ diff --git a/tools/NNS/examples/Continuous_CDFs_and_ANOVA_with_NNS.pdf b/tools/NNS/examples/Continuous_CDFs_and_ANOVA_with_NNS.pdf new file mode 100644 index 0000000..4fb698a Binary files /dev/null and b/tools/NNS/examples/Continuous_CDFs_and_ANOVA_with_NNS.pdf differ diff --git a/tools/NNS/examples/Curve_Fitting.html b/tools/NNS/examples/Curve_Fitting.html new file mode 100644 index 0000000..9b0f69f --- /dev/null +++ b/tools/NNS/examples/Curve_Fitting.html @@ -0,0 +1,223 @@ + + + + + + + + + + + + + +NNS vs. Taylor Approximation & Piecewise Linear Regression + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + +

    These examples will highlight the important differences in curve fitting between the 3 methods. We will work with the same sine wave data for all 3 examples.

    +
    +

    Problems:

    +
    +

    Taylor

    +

    The Taylor approximation is nowhere near a fit of the entire function; it is close only around the single point of expansion, in this case the mean value of \(x\). NNS fits the entire function.
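To make the locality of the Taylor fit concrete, here is a minimal base-R sketch on the same sine grid. It writes the degree-3 polynomial out by hand rather than using `pracma::taylor`, and the helper name `taylor3` and the 0.5 radius are illustrative choices.

```r
x <- seq(0, 4 * pi, pi / 100)
y <- sin(x)
a <- mean(x)   # expansion point; 2 * pi for this symmetric grid

# Degree-3 Taylor polynomial of sin() around a, written out term by term
taylor3 <- function(x, a) {
  h <- x - a
  sin(a) + cos(a) * h - sin(a) * h^2 / 2 - cos(a) * h^3 / 6
}

err  <- abs(taylor3(x, a) - y)
near <- abs(x - a) <= 0.5

max(err[near])    # worst error close to the expansion point
max(err[!near])   # worst error over the rest of the range
```

The two maxima differ by several orders of magnitude, which is the behaviour the Taylor panels in the plot grid show.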

    +
    +
    +

    Linear Regression

    +

    The problem with linear segments is the gaps between segments. These gaps close as the number of segments is increased, but the fit will never be continuous due to the minimum number of observations required for a regression. NNS requires significantly fewer steps than the corresponding linear segmentation.

    +
    library(NNS)
    +x=seq(0,4*pi,pi/100)
    +y=sin(x)
    +par(mfrow=c(3,3))
    +
    +# NNS
    +for(i in 1:3){
    +NNS.reg(x,y,order=i)}
    +
    +# Taylor
    +library(pracma)
    +f.x = function(x) sin(x)
    +for (i in 1:3){
    +p <- taylor(f.x,mean(x),i)
    +yp <- polyval(p, x)
    +plot(x, y,col='steelblue',ylim=c(-1.5,1.5),main = paste0("Taylor Degree ",i))
    +lines(x, yp, col = "red",type="l",lwd=3)}
    +
    +# Linear Regression Segments
    +# Segment data based on NNS order
    +xy=NNS.part(x,y,order=1,type = "XONLY")$dt[prior.quadrant=='q',.(x,y)]
    +
    +plot(x,y,col='steelblue',main="Linear Regression")
    +abline(lm(xy$y~xy$x),col='red',lwd=3)
    +
    +# First Segmentation
    +Break <- mean(x)
    +xy$grp <- xy$x < Break
    +
    +m <- lm(y~x*grp,data = xy)
    +xy$pred <- predict(m)
    +
    +plot(xy$x,xy$y,col='steelblue',main="2 Linear Regressions")
    +dat <- xy[order(xy$x),]
    +with(subset(dat,x < Break),lines(x,pred,col='red',lwd=3))
    +with(subset(dat,x >= Break),lines(x,pred,col='red',lwd=3))
    +
    +# Second Segmentation
    +Breaks = c(4*pi/3,2*(4*pi/3))
    +
    +xy$grp1 <- xy$x < Breaks[1]
    +xy$grp2 <- xy$x >= Breaks[1] & xy$x < Breaks[2]
    +xy$grp3 <- xy$x >= Breaks[2] 
    +
    +m1 <- lm(y~x*grp1,data = xy)
    +m2 <- lm(y~x*grp2,data = xy)
    +m3 <- lm(y~x*grp3,data = xy)
    +
    +xy$pred1 <- predict(m1)
    +xy$pred2 <- predict(m2)
    +xy$pred3 <- predict(m3)
    +
    +plot(xy$x,xy$y,col='steelblue',main="3 Linear Regressions")
    +dat <- xy[order(xy$x),]
    +with(subset(dat,x < Breaks[1]),lines(x,pred1,col='red',lwd=3))
    +with(subset(dat,x >= Breaks[1] & x < Breaks[2]),lines(x,pred2,col='red',lwd=3))
    +with(subset(dat,x >= Breaks[2]),lines(x,pred3,col='red',lwd=3))
    +
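The gap between separately fitted segments can be measured directly. The sketch below uses base R only and a single break at the mean of `x`. The variable names are illustrative, and it fits the two halves with plain `lm()` calls instead of the interaction model used above.

```r
x   <- seq(0, 4 * pi, pi / 100)
y   <- sin(x)
brk <- mean(x)

# Fit one straight line on each side of the break
left  <- lm(y ~ x, subset = x <  brk)
right <- lm(y ~ x, subset = x >= brk)

# Evaluate both fitted lines at the break itself
at_break <- data.frame(x = brk)
gap <- unname(predict(right, at_break) - predict(left, at_break))
gap
```

A nonzero `gap` means the two fitted lines do not meet at the break, so the piecewise fit is discontinuous there.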

    +
    +
    + + + + +
    + + + + + + + + diff --git a/tools/NNS/examples/DatasaurusDozen.html b/tools/NNS/examples/DatasaurusDozen.html new file mode 100644 index 0000000..3611713 --- /dev/null +++ b/tools/NNS/examples/DatasaurusDozen.html @@ -0,0 +1,1706 @@ + + + + + + + + + + + + + +Datasaurus Dozen + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + +
    +

    Load NNS, entropy Packages and Read the .tsv file

    +

    The dataset is available for download from the following site: https://www.autodeskresearch.com/publications/samestats

    +
    library(NNS);library(entropy);library(data.table);library(energy);library(XICOR)
    +
    +data = data.table::fread("https://raw.githubusercontent.com/OVVO-Financial/NNS/Data-and-Simulation-Routines/Datasets/DatasaurusDozen.tsv", sep = "\t")
    +
    +
    +

    Pearson Correlation vs. NNS Dependence vs. Mutual Information vs. Distance Correlation vs. Chatterjee’s Xi Cor

    +
    data[,list(PEARSON_CORRELATION=cor(x,y),
    +        NNS_DEPENDENCE=NNS.dep(x,y)$Dependence,
    +        MUTUAL_INF=mi.plugin(rbind(x,y)),
    +        DISTANCE_COR=dcor(x,y),
    +        Xi=xicor(x,y)),
    +       by=dataset]
    +
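As a small aside on why a linear correlation can miss structure, the following base-R sketch uses made-up data (not the Datasaurus file) where `y` is a deterministic function of `x` yet the Pearson correlation is essentially zero.

```r
# y depends entirely on x, but the relationship is symmetric around zero
x <- seq(-1, 1, length.out = 201)
y <- x^2

pearson_raw    <- cor(x, y)        # linear association cancels out
pearson_folded <- cor(abs(x), y)   # strong once the symmetry is removed

c(raw = pearson_raw, folded = pearson_folded)
```

Dependence measures such as those compared in the table are meant to pick up this kind of relationship, which a single linear coefficient averages away.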
    + +
    +
    +
    +

    Visualization of the non-random datasets

    +
    par(mfrow=c(3,3))
    +
    +for(i in unique(data$dataset)){
    +    plot(data[dataset==i,x],data[dataset==i,y], xlab="X", ylab="Y", main = i)
    +}
    +

    +
    + + + + +
    + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Logistic_Comparison.html b/tools/NNS/examples/Logistic_Comparison.html new file mode 100644 index 0000000..d0b93cb --- /dev/null +++ b/tools/NNS/examples/Logistic_Comparison.html @@ -0,0 +1,3057 @@ + + + + + + + + + + + + + + +NNS Multivariate Regression vs. Logistic Regression + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    1 Intro

    +

    A recent blog post [1] highlighted the applicability of logistic regression to a binary classification problem, using a dataset of products bought by customers together with whether the respective buyer was pregnant.

    +

    We will demonstrate how NNS multivariate regression is quite capable of such tasks, and how other machine learning techniques built on the underlying regression, such as NNS.stack, do even better.

    +
    +
    +

    2 Install NNS (>= 10.2)

    +
    # remotes::install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +
    +
    +

    3 Download and Read Data

    +

    Data: http://media.wiley.com/product_ancillary/6X/11186614/DOWNLOAD/ch06.zip

    +

    Save the first Excel sheet, Training Data w Dummy Vars, as "RetailMart_train.csv" and read it into R.

    +

    Create independent variables.
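The code for this step is not shown in the text. A minimal sketch of one way to do it, assuming the response column is named `PREGNANT` and every other column is a predictor, is below. The helper `split_ivs` and the toy rows are hypothetical stand-ins for the real CSV.

```r
# Hypothetical helper: separate predictors from the response column
split_ivs <- function(df, response = "PREGNANT") {
  list(IV = df[, setdiff(names(df), response), drop = FALSE],
       DV = df[[response]])
}

# Stand-in rows; in the walkthrough the data frame would instead come from
# read.csv("RetailMart_train.csv")
toy <- data.frame(Folic.Acid     = c(1, 0, 0),
                  Pregnancy.Test = c(0, 0, 1),
                  PREGNANT       = c(1, 0, 1))

parts  <- split_ivs(toy)
toy_IV <- parts$IV   # plays the role of RetailMart_train_IV
toy_DV <- parts$DV   # plays the role of RetailMart_train$PREGNANT
```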

    +
    +
    +

    4 Run NNS Multivariate Dimension Reduction Regression and Logistic Regression

    +

    NNS.stack cross-validates the n.best parameter in NNS.reg, as well as the threshold parameter for the dimension reduction method in NNS.reg.

    +
    nns_model <- NNS.stack(IVs.train = RetailMart_train_IV,
    +                       DV.train = RetailMart_train$PREGNANT + 1,
    +                       IVs.test = RetailMart_train_IV, 
    +                       balance = TRUE, order = "max", type = "CLASS")
    +
    +nns_conf <- table(ifelse(nns_model$stack %% 1 < .5, floor(nns_model$stack), ceiling(nns_model$stack)), RetailMart_train$PREGNANT + 1)
    +
    +
    +logreg <- glm(RetailMart_train$PREGNANT ~ ., data = cbind(RetailMart_train_IV), family = binomial) 
    +
    +pred <- ifelse(predict(logreg,RetailMart_train_IV , "response") < 0.5, 0, 1)
    +
    +logistic_conf <- table(pred, RetailMart_train$PREGNANT)
    +
    +nns_conf
    +
    ##    
    +##       1   2
    +##   1 469 194
    +##   2  31 306
    +
    logistic_conf
    +
    ##     
    +## pred   0   1
    +##    0 450 115
    +##    1  50 385
    +
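The class assignment used to build `nns_conf` rounds each stacked prediction to the nearest class label, sending a fractional part of exactly .5 upward. A minimal sketch of that rule on made-up scores follows. The function name `to_class` and the `scores` and `actual` vectors are illustrative, not values from the model.

```r
# Fractional part below .5 rounds down, otherwise up
to_class <- function(s) ifelse(s %% 1 < .5, floor(s), ceiling(s))

scores <- c(1.10, 1.49, 1.50, 1.95, 1.30, 1.80)   # made-up stacked predictions
actual <- c(1, 1, 2, 2, 2, 1)                     # made-up class labels (1 or 2)

predicted <- to_class(scores)
table(predicted, actual)   # rows = predicted class, columns = actual class
```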

    Pretty similar results; let’s see how they generalize to unseen data…

    +
    +
    +

    5 Test set

    +

    The data also offer a test set, so we will save the Test Set Excel sheet as "RetailMart_test.csv", read it into R, and compare prediction accuracy.

    +
    +

    5.1 Logistic Regression

    +
    RetailMart_test <- read.csv("RetailMart_test.csv")
    +RetailMart_test_IV <- subset(RetailMart_test, select = names(RetailMart_test)%in%names(RetailMart_train_IV))
    +RetailMart_test_IV <- RetailMart_test_IV[complete.cases(RetailMart_test_IV),]
    +
    +oos_pred <- ifelse(predict(logreg, RetailMart_test_IV , "response") < 0.5, 0, 1)
    +
    +logistic_conf_oos <- table(oos_pred, na.omit(RetailMart_test$PREGNANT))
    +
    +
    +

    5.2 NNS Stack

    +
    nns_stack_oos <- NNS.stack(IVs.train = RetailMart_train_IV,
    +                           DV.train = RetailMart_train$PREGNANT + 1,
    +                           IVs.test = RetailMart_test_IV, 
    +                           balance = TRUE, order = "max", type = "CLASS")
    +
    +
    +nns_stack_conf_oos <- table(ifelse(nns_stack_oos$stack %% 1 < .5, floor(nns_stack_oos$stack),ceiling(nns_stack_oos$stack)), na.omit(RetailMart_test$PREGNANT) + 1)
    +
    +
    +

    5.3 Compare all Methods’ Confusion Matrices

    +
    +

    5.3.1 Logistic Regression

    +
    logistic_conf_oos
    +
    ##         
    +## oos_pred   0   1
    +##        0 830  13
    +##        1 110  47
    +
    +
    +

    5.3.2 NNS Stack

    +
    nns_stack_conf_oos
    +
    ##    
    +##       1   2
    +##   1 881  26
    +##   2  59  34
    +
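The two out-of-sample confusion matrices above can be summarized with a few standard rates. This is a sketch that re-enters the printed counts by hand. The helper `conf_metrics` is hypothetical, and it assumes rows are predicted classes and columns are actual classes, with the positive (pregnant) class second, which is how `table(pred, actual)` lays them out.

```r
# conf: 2x2 matrix, rows = predicted, columns = actual, positive class second
conf_metrics <- function(conf) {
  c(accuracy    = sum(diag(conf)) / sum(conf),
    sensitivity = conf[2, 2] / sum(conf[, 2]),
    specificity = conf[1, 1] / sum(conf[, 1]))
}

# Counts copied from the printed tables (matrix() fills column by column)
logistic_oos <- matrix(c(830, 110, 13, 47), nrow = 2,
                       dimnames = list(pred = c("0", "1"), actual = c("0", "1")))
nns_oos      <- matrix(c(881, 59, 26, 34), nrow = 2,
                       dimnames = list(pred = c("1", "2"), actual = c("1", "2")))

metrics <- rbind(logistic  = conf_metrics(logistic_oos),
                 nns_stack = conf_metrics(nns_oos))
round(metrics, 3)
```

Reading accuracy alongside sensitivity and specificity matters here because the test set is imbalanced: only 60 of the 1,000 test rows are in the positive class.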
    +
    +
    +

    5.4 Interpretability of NNS Results

    +

    We can see the final equation used in the cross-validated NNS.reg function.

    +
    NNS.reg(x = RetailMart_train_IV,
    +        y = RetailMart_train$PREGNANT,
    +        point.est = RetailMart_test_IV,
    +        plot = FALSE,
    +        dim.red.method = "cor", 
    +        threshold = nns_stack_oos$NNS.dim.red.threshold)$equation
    +
    + +
    +

    So after all of that, Folic.Acid alone is the best predictor of PREGNANT.

    +
    mean(na.omit(RetailMart_test$Folic.Acid) == na.omit(RetailMart_test$PREGNANT))
    +
    ## [1] 0.952
    +

    Better than Pregnancy.Test!!!

    +
    mean(na.omit(RetailMart_test$Pregnancy.Test) == na.omit(RetailMart_test$PREGNANT))
    +
    ## [1] 0.937
    +
    +

    5.4.1 Random Forest

    +
    library(randomForest)
    +
    +rf.fit <- randomForest(as.factor(as.character(PREGNANT)) ~ . , data=RetailMart_train, na.action = na.omit)
    +
    +rf.pred <- predict(rf.fit, newdata = RetailMart_test_IV)
    +table(rf.pred, na.omit(RetailMart_test$PREGNANT))
    +
    ##        
    +## rf.pred   0   1
    +##       0 832  14
    +##       1 108  46
    +
    +
    +

    5.4.2 xgboost

    +
    library(xgboost)
    +
    +xgb.train = xgb.DMatrix(data=as.matrix(RetailMart_train_IV),label=RetailMart_train$PREGNANT)
    +xgb.test = xgb.DMatrix(data=as.matrix(RetailMart_test_IV),label=na.omit(RetailMart_test$PREGNANT))
    +
    +num_class = 2
    +
    +params = list(
    +  booster="gbtree",
    +  eta=0.001,
    +  max_depth=5,
    +  gamma=3,
    +  subsample=0.75,
    +  colsample_bytree=1,
    +  objective="multi:softprob",
    +  eval_metric="mlogloss",
    +  num_class=num_class
    +)
    +
    +xgb.fit=xgb.train(
    +  params=params,
    +  data=xgb.train,
    +  nrounds=10000,
    +  nthreads=1,
    +  early_stopping_rounds=10,
    +  watchlist=list(val1=xgb.train),
    +  verbose=0
    +)
    +
    ## [21:19:45] WARNING: src/learner.cc:767: 
    +## Parameters: { "nthreads" } are not used.
    +
    xgb.pred = predict(xgb.fit,as.matrix(RetailMart_test_IV),reshape = TRUE)
    +colnames(xgb.pred)= c(0,1)
    +
    +xgb.pred$prediction = apply(xgb.pred,1,function(x) colnames(xgb.pred)[which.max(x)])
    +table(xgb.pred$prediction,na.omit(RetailMart_test$PREGNANT))
    +
    ##    
    +##       0   1
    +##   0 835  15
    +##   1 105  45
    +
    +
    +
    +
    +

    6 Comments

    +

    NNS is a fully capable continuous regression and classification tool. NNS.reg is not a black-box ML solution; rather, NNS provides a theoretically sound approach to dynamically partitioning regressors and finding conditional outputs from those partitions.

    +

    NNS is not a one-trick pony: it has been demonstrated to excel in time-series forecasting and nonlinear continuous regressions, and to provide solutions for econometric applications. See the following examples:

    + +

    I look forward to further discussions and collaboration with those equally passionate about these issues and open to embracing alternative solutions. If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Machine_Learning.pdf b/tools/NNS/examples/Machine_Learning.pdf new file mode 100644 index 0000000..434c924 Binary files /dev/null and b/tools/NNS/examples/Machine_Learning.pdf differ diff --git a/tools/NNS/examples/Multivariate NNS Inference.md b/tools/NNS/examples/Multivariate NNS Inference.md new file mode 100644 index 0000000..b579063 --- /dev/null +++ b/tools/NNS/examples/Multivariate NNS Inference.md @@ -0,0 +1,309 @@ +# Multivariate NNS Inference Workflow + +## Case Study: Controlled Effect of Transmission (`am`) on `mpg` + +This example illustrates a multivariate inference workflow using the NNS framework to estimate the controlled effect of a binary regressor. The response variable is miles per gallon (`mpg`), and the predictor set consists of transmission type (`am`), weight (`wt`), horsepower (`hp`), and displacement (`disp`). This section also includes a direct comparison to a traditional linear-inference workflow. + +```{r setup, message=FALSE, warning=FALSE} +library(NNS) +library(effectsize) +``` + +```{r data} +data(mtcars) + +y <- mtcars$mpg + +X <- data.frame( + am = mtcars$am, + wt = mtcars$wt, + hp = mtcars$hp, + disp = mtcars$disp +) +``` + +## 1. Directional Relevance Screen + +The first step is to establish nonlinear dependence between each predictor and the response prior to fitting the multivariate model. + +```{r dependence-screen} +dep_summary <- data.frame( + variable = colnames(X), + dependence = sapply(X, function(x) NNS.dep(x, y)$Dependence) +) + +dep_summary <- dep_summary[order(-dep_summary$dependence), ] + +print("--- Nonlinear Dependence Screen ---") +dep_summary +``` + +The nonlinear dependence screen ranks the predictors in descending order of directional relevance to `mpg`. In this specification, `disp` is the strongest predictor, followed by `hp`, `wt`, and then `am`. + +## 2. 
Multivariate NNS Regression Fit + +The multivariate regression step estimates the conditional mean surface without imposing parametric linearity. + +```{r nns-fit} +fit_full <- NNS.reg(x = X, y = y) + +cat("\nModel R2: ", fit_full$R2, "\n") +cat("MAE: ", mean(abs(fit_full$Fitted.xy$residuals)), "\n") +cat("RMSE: ", sqrt(mean(fit_full$Fitted.xy$residuals^2)), "\n") +``` + +The model explains approximately 89.8 percent of the variation in `mpg`, with low in-sample absolute and quadratic error. This indicates that the fitted multivariate NNS surface provides a strong nonlinear representation of the data-generating structure. + +```{r nns-fit-plot, fig.height=5, fig.width=6} +plot( + fit_full$Fitted.xy$y, + fit_full$Fitted.xy$y.hat, + xlab = "Observed mpg", + ylab = "Fitted mpg", + main = "NNS Multivariate Fit: Observed vs Fitted", + pch = 16, + col = "steelblue" +) +abline(0, 1, lty = 2, col = "red") +``` + +## 3. Benchmark Counterfactual at Median Controls + +To isolate the effect of transmission, two counterfactual points are constructed: one automatic and one manual, while holding `wt`, `hp`, and `disp` fixed at their median values. + +```{r benchmark-counterfactual} +x_ref <- data.frame( + am = c(0, 1), + wt = median(X$wt), + hp = median(X$hp), + disp = median(X$disp) +) + +preds_ref <- NNS.reg(x = X, y = y, point.est = x_ref)$Point.est + +controlled_am_effect_at_medians <- preds_ref[2] - preds_ref[1] + +cat("\n--- Counterfactual at Medians ---") +cat("\nPredicted mpg (Automatic):", preds_ref[1]) +cat("\nPredicted mpg (Manual): ", preds_ref[2]) +cat("\nControlled Effect: ", controlled_am_effect_at_medians, "\n") +``` + +At the benchmark covariate profile, the fitted manual-transmission effect is 1.8 miles per gallon relative to automatic transmission. + +## 4. Heterogeneous Controlled Effects + +Rather than relying on a single benchmark effect, the transmission variable can be flipped for every observed covariate profile in the sample. 
This produces a heterogeneous controlled-effect distribution. + +```{r heterogeneous-effects} +X0 <- X +X0$am <- 0 + +X1 <- X +X1$am <- 1 + +pred0 <- NNS.reg(x = X, y = y, point.est = X0)$Point.est +pred1 <- NNS.reg(x = X, y = y, point.est = X1)$Point.est + +delta_am <- pred1 - pred0 + +cat("\n--- Distribution of Controlled am Effects ---") +summary(delta_am) +``` + +This distribution shows that the controlled transmission effect is heterogeneous across the observed support of the covariates. The average effect is positive, but the full effect profile ranges from negative values to large positive gains. + +## 5. Uncertainty Quantification with MEBoot + +To quantify uncertainty in the heterogeneous controlled-effect distribution, a maximum-entropy bootstrap is applied to `delta_am`. Confidence intervals are then constructed for the mean and median controlled effects. + +```{r meboot} +set.seed(123) +B <- 1000 +alpha <- 0.05 + +boot_delta <- NNS.meboot(x = delta_am, reps = B, rho = 1, drift = FALSE) +boot_delta_mat <- boot_delta["replicates", ]$replicates + +boot_mean_effect <- apply(boot_delta_mat, 2, mean) +boot_median_effect <- apply(boot_delta_mat, 2, median) + +mean_ci <- c( + LPM.VaR(alpha / 2, 0, boot_mean_effect), + UPM.VaR(alpha / 2, 0, boot_mean_effect) +) + +median_ci <- c( + LPM.VaR(alpha / 2, 0, boot_median_effect), + UPM.VaR(alpha / 2, 0, boot_median_effect) +) +``` + +## 6. Results Visualization and Summary + +Pointwise interval estimates can be generated for the heterogeneous effect series, and summary intervals can be reported for the mean and median controlled effects. 
+ +```{r interval-plot, fig.height=5, fig.width=7} +lower_effect <- apply(boot_delta_mat, 1, function(z) LPM.VaR(alpha / 2, 0, z)) +upper_effect <- apply(boot_delta_mat, 1, function(z) UPM.VaR(alpha / 2, 0, z)) + +plot( + delta_am, + ylim = range(c(lower_effect, upper_effect)), + pch = 16, + xlab = "Observation Index", + ylab = "mpg Benefit (Manual - Auto)", + main = "Heterogeneous am Effect with 95% MEBoot CIs" +) + +arrows( + seq_along(delta_am), lower_effect, + seq_along(delta_am), upper_effect, + angle = 90, code = 3, length = 0.03, col = "grey" +) + +abline(h = 0, lty = 2) +``` + +```{r summary-table} +summary_table <- data.frame( + Statistic = c("Mean Controlled Effect", "Median Controlled Effect"), + Estimate = c(mean(delta_am), median(delta_am)), + Lower_95 = c(mean_ci[1], median_ci[1]), + Upper_95 = c(mean_ci[2], median_ci[2]) +) + +cat("\n--- Final Inference Summary ---\n") +summary_table +``` + +## 7. Traditional Statistical Inference Workflow + +For comparison, the same question can be approached using a conventional linear workflow. This highlights the difference between a homogeneous coefficient-based framework and the multivariate NNS conditional-effect framework. + +### 7.1 Pearson Correlation Screen + +```{r pearson-screen} +cor_summary <- data.frame( + variable = colnames(X), + correlation = sapply(X, function(x) cor(x, y)) +) + +print("--- Pearson Correlation Screen ---") +cor_summary[order(-abs(cor_summary$correlation)), ] +``` + +The Pearson screen shows the strongest linear associations with `mpg`. Weight and displacement dominate in absolute magnitude, followed by horsepower and then transmission. + +### 7.2 Multivariate OLS Regression + +```{r ols-fit} +fit_ols <- lm(mpg ~ am + wt + hp + disp, data = mtcars) + +print("--- OLS Model Summary ---") +summary(fit_ols) +``` + +The OLS fit explains less of the variation in `mpg` than the NNS fit. 
More importantly, the coefficient on `am` is positive but not conventionally statistically significant at the 5 percent level. + +### 7.3 Standardized Effect Sizes + +```{r eta-squared} +print("--- Standardized Effect Sizes (Partial Eta-Squared) ---") +eta_squared(fit_ols, partial = TRUE) +``` + +The partial effect-size view suggests that `am` and `wt` both carry substantial partial explanatory content, even though `am` is not declared statistically significant by the standard t-test. + +### 7.4 Controlled Effect and Confidence Interval + +```{r ols-coefficient} +am_coeff <- coef(fit_ols)["am"] +am_ci <- confint(fit_ols)["am", ] + +cat("\n--- Controlled Effect of Transmission ---") +cat("\nCoefficient (Effect size in mpg):", am_coeff) +cat("\n95% Confidence Interval: ", am_ci[1], "to", am_ci[2], "\n") +``` + +The OLS model estimates a positive average transmission effect of about 2.16 mpg, but its confidence interval crosses zero. + +### 7.5 ANOVA F-Test for Transmission + +```{r anova-test} +fit_reduced <- lm(mpg ~ wt + hp + disp, data = mtcars) + +print("--- ANOVA F-test for Transmission Effect ---") +anova(fit_reduced, fit_ols) +``` + +The nested-model ANOVA reaches the same inferential conclusion as the coefficient test: under the linear model, the added transmission term is not statistically significant at conventional thresholds. + +## 8. Discussion: NNS versus Traditional Inference + +The contrast between the NNS workflow and the traditional OLS workflow is instructive. + +### 8.1 Predictor Hierarchy + +The Pearson screen shows that `wt` and `disp` have the strongest linear association with `mpg`, whereas the NNS directional screen ranks `disp`, `hp`, `wt`, and `am`. These are related but not identical views. Pearson correlation measures straight-line association, while `NNS.dep` is built to capture more general nonlinear dependence. + +### 8.2 The Linear Significance Trap + +In the OLS model, the p-value on `am` is 0.14405. 
Under conventional inference, that leads to the conclusion that transmission type does not have a statistically significant effect on `mpg` once the controls are included. The NNS workflow reaches a different conclusion because it does not force the controlled effect of `am` into a single global linear coefficient. + +This matters when the transmission effect is not constant across the support of `wt`, `hp`, and `disp`. If the effect is positive for some car profiles and weaker or negative for others, a single global coefficient can become unstable and its standard error can expand. The NNS framework instead estimates the conditional surface and then studies the induced heterogeneous effect distribution directly. + +### 8.3 Effect Size versus Statistical Significance + +The OLS output shows an estimated `am` effect of 2.159271 mpg, which is directionally consistent with the NNS mean controlled effect of 2.526736 mpg and the benchmark controlled effect of 1.8 mpg. The difference is interpretive: + +- OLS treats the effect as a homogeneous coefficient with one confidence interval. +- NNS treats the effect as a heterogeneous conditional contrast that varies by covariate profile. + +The OLS confidence interval for `am` crosses zero, whereas the NNS summary intervals for the mean and median controlled effects are both strictly positive. + +### 8.4 Model Fit + +The NNS model achieves: + +- `R2 = 0.8978988` +- `MAE = 1.011309` +- `RMSE = 1.963105` + +The OLS model achieves: + +- `R2 = 0.8402` +- residual standard error `= 2.581` + +So even at the level of fit, the multivariate NNS model provides a stronger representation of the observed response surface. + +### 8.5 Controlled Effect Interpretation + +The NNS benchmark counterfactual implies that, at the median values of `wt`, `hp`, and `disp`, manual transmission is associated with a fitted increase of 1.8 mpg. 
Across the observed covariate support, the effect distribution is heterogeneous: + +- mean effect = `2.526736` +- median effect = `1.800000` +- minimum effect = `-3.400` +- maximum effect = `8.900` + +This reveals something the OLS coefficient cannot: the controlled effect of transmission is not constant across all car profiles. + +## 9. Comparative Summary + +| Feature | Multivariate NNS Results | Traditional OLS Results | +|---|---:|---:| +| Model Fit | `R2 = 0.8978988` | `R2 = 0.8402` | +| Error Structure | `MAE = 1.011309`, `RMSE = 1.963105` | Residual SE = `2.581` | +| Transmission Effect | `+1.8` at medians | `+2.159271` average coefficient | +| Heterogeneity | Explicitly modeled | Not modeled | +| Mean Effect Interval | `[2.322152, 2.731320]` | Coefficient CI crosses zero | +| Median Effect Interval | `[1.559981, 3.444243]` | Not available as an effect-distribution summary | +| Inference | Conditional, nonlinear, heterogeneous | Linear, homogeneous, coefficient-based | + +## 10. Conclusion + +This example shows that the effect of transmission on fuel efficiency can be analyzed within a fully multivariate, nonlinear, nonparametric framework. The multivariate `NNS.reg` fit indicates strong predictive structure in the data, while the counterfactual contrast isolates the conditional effect of `am`. The resulting controlled-effect distribution shows that the transmission effect is positive on average, but heterogeneous across the observed support of the controls. The MEBoot intervals reinforce that the mean and median controlled effects remain positive. + +The traditional OLS workflow gives a positive point estimate for transmission, but because it forces the effect into a single linear coefficient, it fails to characterize the heterogeneity revealed by the NNS analysis and does not declare the effect statistically significant at conventional levels. 
+ +Methodologically, the key distinction is that the NNS workflow answers a conditional and heterogeneous effect question, while the traditional workflow answers a homogeneous coefficient question. Although `mtcars` is only a toy dataset, the inferential structure of the NNS workflow is intended to generalize: nonlinear relevance screening, multivariate conditional mean estimation, fitted counterfactual contrasts for binary regressors, heterogeneous effect analysis, and bootstrap interval estimation can all be carried forward to richer empirical settings. diff --git a/tools/NNS/examples/NNS ARMA.pdf b/tools/NNS/examples/NNS ARMA.pdf new file mode 100644 index 0000000..5955714 Binary files /dev/null and b/tools/NNS/examples/NNS ARMA.pdf differ diff --git a/tools/NNS/examples/NNS Cover.jpg b/tools/NNS/examples/NNS Cover.jpg new file mode 100644 index 0000000..234a1c1 Binary files /dev/null and b/tools/NNS/examples/NNS Cover.jpg differ diff --git a/tools/NNS/examples/NNS vs KNN MNIST dataset.pdf b/tools/NNS/examples/NNS vs KNN MNIST dataset.pdf new file mode 100644 index 0000000..8e0da20 Binary files /dev/null and b/tools/NNS/examples/NNS vs KNN MNIST dataset.pdf differ diff --git a/tools/NNS/examples/NNS.ARMA vs N-Hits.md b/tools/NNS/examples/NNS.ARMA vs N-Hits.md new file mode 100644 index 0000000..13283ea --- /dev/null +++ b/tools/NNS/examples/NNS.ARMA vs N-Hits.md @@ -0,0 +1,71 @@ +# 🚀 NNS.ARMA.optim vs. n-HiTS: A Surprising Forecasting Result + +When it comes to time-series forecasting, deep learning models like **N-BEATS** and **n-HiTS** often dominate the conversation. But what happens when we put them head-to-head against a nonparametric statistical approach? 
+
+I ran a comparison using hourly traffic volume data
+**(original article here: https://www.datasciencewithmarco.com/blog/all-about-n-hits-the-latest-breakthrough-in-time-series-forecasting)**
+
+## 📊 Results (MAE)
+
+| Model                    | MAE     |
+|--------------------------|---------|
+| Baseline                 | 249.0   |
+| N-BEATS                  | 292.0   |
+| N-BEATS + covariates     | 288.0   |
+| n-HiTS                   | 266.0   |
+| **NNS.ARMA.optim()**     | **236.17** |
+
+**Yes: the nonlinear nonparametric `NNS.ARMA.optim()` outperformed all of them**, including the deep learning-based n-HiTS.
+
+## ✅ Exact Reproducible R Code
+
+```r
+library(NNS)
+
+# Read Data
+daily_traffic <- read.csv("https://raw.githubusercontent.com/marcopeix/time-series-analysis/refs/heads/master/data/daily_traffic.csv")
+
+# Create train / test sets (last 120 observations = test set)
+train_set <- head(daily_traffic$traffic_volume, length(daily_traffic$traffic_volume) - 120)
+test_set <- tail(daily_traffic$traffic_volume, 120)
+
+# Determine seasonal periods (modulo = 24 because data is hourly)
+periods <- NNS.seas(train_set, modulo = 24)$periods
+
+# Optimize seasonal periods + ARMA parameters using MAE as objective
+nns_estimates <- NNS.ARMA.optim(train_set,
+                                h = 120,
+                                seasonal.factor = periods,
+                                obj.fn = expression(Metrics::mae(actual, predicted)),
+                                objective = "min",
+                                plot = TRUE,
+                                negative.values = FALSE)
+
+# Final MAE on test set
+Metrics::mae(nns_estimates$results, test_set)
+
+# Plot actual vs. forecast
+plot(test_set,
+     col = "blue", type = "l", lwd = 2,
+     main = "NNS.ARMA.optim() Forecast",
+     ylab = "traffic_volume", xlab = "Index")
+lines(nns_estimates$results, col = "red", lwd = 2)
+legend("topleft", legend = c("Actual", "NNS.ARMA.optim() Forecast"),
+       col = c("blue", "red"), lwd = 2, bty = "n")
+```
+
+Console output from the run (exact values from the screenshot):
+
+```
+> Metrics::mae(nns_estimates$results, test_set)
+[1] 236.17
+```
+
+## ✅ Takeaway
+
+Sometimes simplicity + interpretability beats complexity.
+Before jumping into the latest neural architecture, it’s worth asking: +Can a nonparametric approach solve the problem faster, with fewer resources, and better performance? + +NNS on CRAN: [Install the latest version](https://cran.r-project.org/package=NNS) diff --git a/tools/NNS/examples/NNS_MI_vs_MICE.md b/tools/NNS/examples/NNS_MI_vs_MICE.md new file mode 100644 index 0000000..332b7ce --- /dev/null +++ b/tools/NNS/examples/NNS_MI_vs_MICE.md @@ -0,0 +1,395 @@ +# Multiple Imputation with NNS: Principled Uncertainty Propagation Under Nonlinearity + +## Overview + +A common and reasonable concern about local imputation methods is uncertainty propagation: if missing values are filled by local neighborhood methods without a formal generative model, can the resulting inferences be trusted? Does the uncertainty in the imputed values flow correctly into downstream analyses? + +This example addresses that concern directly and empirically. It shows that NNS imputation, used within the standard Rubin's rules multiple imputation framework, produces pooled estimates that are: + +- **closer to the true underlying parameter**, and +- **associated with smaller pooled standard errors** + +than MICE with predictive mean matching (PMM) — the current gold standard for nonlinear multiple imputation — on data generated from a genuinely nonlinear process with 30% missingness. + +The key insight is that these advantages do not come from ignoring uncertainty. They come from having a more accurate imputation model, which reduces the source of uncertainty that matters most in practice: between-imputation variance driven by imputation model error. + +--- + +## The Data Generating Process + +```r +set.seed(42) +n <- 100 +x <- seq(0, 10, length.out = n) +y_true <- 2 * sin(x) + 0.5 * x + rnorm(n, 0, 0.5) +missing_idx <- sample(n, size = round(0.3 * n)) +y <- y_true +y[missing_idx] <- NA +``` + +The response `y` is generated as `2 * sin(x) + 0.5 * x + noise`. 
This is deliberately nonlinear — combining a sinusoidal component with a linear trend. The true linear trend (the coefficient on `x` in a downstream linear regression) is approximately **0.5**. Thirty percent of `y` values are set to missing completely at random (MCAR). + +This data generating process is a meaningful stress test because: + +- It violates the linearity assumption that underlies most classical imputation methods +- The nonlinear component (`2 * sin(x)`) creates local structure that varies substantially across the range of `x` +- The 30% missingness rate is substantial enough that imputation quality materially affects downstream inference + +--- + +## Why Nonlinearity Matters for Imputation + +Most imputation methods — including the default methods in MICE — were designed in a world where relationships between variables are assumed approximately linear or can be made so through transformation. Predictive mean matching (PMM) in MICE is one of the better approaches for nonlinear data: rather than imputing from a fitted linear model directly, it matches each missing observation to a donor from the observed data whose predicted value is closest. This provides some robustness to nonlinearity. + +But PMM still relies on a linear model to generate the predicted values used for matching. The matching step helps, but the underlying engine is linear. When the true relationship is strongly nonlinear — as here, where `sin(x)` creates local curvature that changes sign multiple times across the range — the linear predictions used for donor selection can be systematically wrong in specific regions of the predictor space. + +NNS regression makes no linearity assumption at any stage. It estimates the conditional expectation `E[Y | X = x]` by recursively partitioning the data around local means, producing a piecewise-adaptive surface that follows the actual shape of the data without being told what that shape is. 
The imputed values are predictions from this adaptive surface, not from a linear approximation to it. + +--- + +## NNS Imputation via Bootstrap Multiple Imputation + +NNS imputation is a direct application of `NNS.reg`. Observed `(x, y)` pairs serve as the training set. Missing rows are passed as `point.est`. The predicted values from `NNS.reg` fill the missing `y`. + +To propagate uncertainty in the Rubin's rules framework, bootstrap resampling is used to generate `m = 20` distinct imputed datasets. Each bootstrap resample draws a new training set with replacement from the complete cases, fits a new NNS regression surface, and imputes the missing values from that surface. The variation across bootstrap imputations reflects genuine uncertainty about the conditional distribution of the missing values. + +```r +impute_bootstrap <- function(data_complete, data_missing, seed_offset = 0) { + set.seed(123 + seed_offset) + boot_idx <- sample(nrow(data_complete), replace = TRUE) + boot_complete <- data_complete[boot_idx, ] + + # Increasing dimensions trick: cbind(x, x) sharpens donor selection + # by operating in 2D space even for a univariate predictor + x_boot <- cbind(boot_complete$x, boot_complete$x) + y_boot <- boot_complete$y + x_miss <- cbind(data_missing$x, data_missing$x) + + imputed_y <- NNS.reg( + x = x_boot, + y = y_boot, + point.est = x_miss, + order = "max", + n.best = 1, + plot = FALSE + )$Point.est + + imputed_df <- data_missing + imputed_df$y <- imputed_y + return(imputed_df) +} +``` + +### The Increasing Dimensions Trick + +A subtle but important detail: the predictor `x` is passed as `cbind(x, x)` — duplicated into a two-column matrix. This is not redundant. 
As documented in the [NNS regression vignette](https://ovvo-financial.github.io/NNS/articles/NNSvignette_Clustering_and_Regression.html#increasing-dimensions), operating in a nominally higher-dimensional space sharpens the distance metric underlying the nearest-neighbor search in the regression point matrix. For univariate imputation, this effectively converts the problem into a 2D nearest-neighbor problem, producing more precise donor selection and more accurate imputed values. It is a practical implementation insight specific to NNS that has no direct analogue in classical imputation methods. + +--- + +## Pooling with Rubin's Rules + +Each of the `m = 20` imputed datasets is analyzed identically: a linear regression of `y` on `x` is fitted, and the slope coefficient and its variance are extracted. Rubin's rules then combine these into a single pooled estimate. + +Rubin's rules decompose total uncertainty into two components: + +**Within-imputation variance** (`W`): the average sampling variance of the slope across the `m` analyses. This reflects uncertainty that would exist even if the missing values were known — it is driven by sample size and the spread of the data. + +**Between-imputation variance** (`B`): the variance of the slope estimates across the `m` imputed datasets. This reflects uncertainty specifically attributable to not knowing the missing values — it is driven by imputation model accuracy. + +The total variance under Rubin's rules is: + +``` +Total Variance = W + (1 + 1/m) * B +``` + +The pooled standard error is `sqrt(Total Variance)`. + +This decomposition is the same for both NNS and MICE. The imputation methods differ; the pooling rules are identical. This is a critical point: NNS is not circumventing the uncertainty propagation framework. It is competing within it. 
+ +```r +# Applied identically to both NNS and MICE results +pooled_beta <- mean(betas) +var_within <- mean(map_dbl(analyses, "var_beta")) +var_between <- var(betas) +total_var <- var_within + (1 + 1/m) * var_between +pooled_se <- sqrt(total_var) +pooled_ci <- pooled_beta + c(-1, 1) * 1.96 * pooled_se +``` + +--- + +## MICE Comparison + +MICE is run with `m = 20` imputations using predictive mean matching, which is the recommended MICE method for continuous variables with potentially nonlinear relationships. + +```r +mice_mid <- mice(df, m = m, method = "pmm", seed = 123, print = FALSE) +``` + +The same downstream analysis (linear regression of `y` on `x`) and the same Rubin's rules pooling are applied to MICE imputations. + +--- + +## Results + +``` +--- NNS Pooled Results --- +Pooled slope (beta): 0.4521 +Pooled SE: 0.0513 +95% CI: 0.3515 to 0.5526 + +Individual slopes: +0.4554 0.4454 0.4536 0.4579 0.4613 0.4181 0.4361 0.4638 +0.4511 0.4557 0.4602 0.4849 0.4443 0.4516 0.4536 0.4541 +0.4494 0.4535 0.4377 0.4532 + +--- MICE Pooled Results --- +Pooled slope (beta): 0.4507 +Pooled SE: 0.0524 +95% CI: 0.3480 to 0.5533 + +Individual slopes: +0.4384 0.4608 0.4571 0.4559 0.4526 0.4343 0.4284 0.4603 +0.4309 0.4337 0.4730 0.4444 0.4404 0.4818 0.4584 0.4307 +0.4682 0.4648 0.4379 0.4610 + +--- Comparison --- +True underlying linear trend: ~0.5 +NNS pooled beta closer to true? Yes +NNS pooled SE smaller (less uncertainty)? Yes +``` + +--- + +## Understanding the Results + +### Pooled Point Estimates + +Both methods recover the true slope of 0.5 reasonably well. NNS produces 0.4521 versus MICE's 0.4507. The difference in point estimates is small, but NNS is closer to truth. More importantly, the reasons *why* NNS is closer illuminate the foundational difference between the two approaches. + +MICE uses a linear model to generate predicted values for donor matching. 
In regions where `2 * sin(x)` creates strong local curvature — particularly near the peaks and troughs of the sine component — the linear predictions are systematically biased. Donors are selected based on proximity in a linearly-predicted space that does not reflect the actual conditional distribution. The resulting imputations are slightly displaced from the true conditional means. + +NNS recursively partitions around local means without any linearity assumption. In the curved regions of the sine component, the partition adapts — finer cells form where the surface changes more rapidly, coarser cells form where it is flatter. Each imputed value is drawn from the correct local conditional neighborhood rather than from a neighborhood defined by linear proximity. + +### Between-Imputation Variance: The Critical Difference + +The individual slope estimates tell the deeper story. + +**NNS individual slopes** range from approximately 0.418 to 0.485, with most clustered tightly between 0.44 and 0.46. The between-imputation variance is small. + +**MICE individual slopes** range from approximately 0.428 to 0.482, with more spread — values like 0.4284, 0.4307, and 0.4309 pull the distribution lower, while 0.4730 and 0.4818 pull it higher. The between-imputation variance is larger. + +Under Rubin's rules, larger between-imputation variance directly increases the total variance and therefore the pooled standard error. MICE's larger SE (0.0524 vs 0.0513) is not a sign that MICE is more honest about uncertainty — it is a sign that MICE's imputation model is less stable, producing more variable imputed values across its `m` imputations. + +This distinction matters enormously for interpretation. Between-imputation variance has two sources: + +1. **Genuine uncertainty** about the missing values given the observed data — this is what multiple imputation is designed to capture and what should propagate into downstream inference +2. 
**Model error uncertainty** — variability in imputed values driven by the imputation model's inability to accurately characterize the conditional distribution + +NNS's smaller between-imputation variance reflects less of the second source, not less of the first. The NNS regression surface is a better approximation to the true conditional expectation, so each bootstrap draw produces imputed values that are closer to the truth and more consistent with each other. The remaining between-imputation variance reflects genuine data uncertainty, not imputation model error. + +MICE's larger between-imputation variance includes a component from model error: the linear-model-based donor selection is somewhat wrong in the nonlinear regions, and that wrongness varies across imputations as different donors are drawn. This inflates the pooled SE without reflecting genuine uncertainty about the missing values. + +### The 95% Confidence Intervals + +``` +NNS: [0.3515, 0.5526] — width: 0.2011 +MICE: [0.3480, 0.5533] — width: 0.2053 +``` + +NNS produces a narrower interval that is slightly better centered on the true value of 0.5. Both intervals contain the true value, but the NNS interval achieves better coverage efficiency — more signal, less noise in the uncertainty estimate. + +--- + +## Why This Result Is Not a Coincidence + +The empirical advantage of NNS in this comparison follows directly from first principles. It is not an artifact of this particular dataset or these particular simulation parameters. + +### NNS Imputation Is Regression from Correct Primitives + +The NNS framework treats imputation as a direct application of `NNS.reg`. The missing values are simply prediction targets (`point.est`) for a nonparametric conditional expectation estimator. 
Because `NNS.reg` estimates `E[Y | X = x]` without assuming linearity, without assuming homoscedasticity, and without assuming any parametric distributional form, the imputed values are better approximations to the true conditional means across the full range of the predictor. + +Better approximations to the true conditional means mean less systematic displacement of imputed values from truth, which means less between-imputation variance that reflects model error rather than genuine uncertainty. + +### The Foundational Contrast with MICE PMM + +MICE PMM is one of the most thoughtfully designed classical imputation methods. The matching step is specifically intended to provide robustness to nonlinearity by ensuring imputations remain within the observed data range and are drawn from actual observed values rather than extrapolated from a model. This is genuinely better than raw linear model imputation. + +But PMM still anchors donor selection to linearly-predicted values. In a nonlinear relationship, linear predictions in some regions are systematically biased — they are too high or too low relative to the true conditional mean. Donors are selected based on proximity to these biased predictions, so the donor pool in curved regions may not reflect the true local conditional distribution. The resulting imputations are constrained to be observed values, which prevents wild extrapolation, but they may be the wrong observed values — donors selected from the wrong part of the distribution because linear predictions pointed in the wrong direction. + +NNS avoids this entirely. Donor selection — or more precisely, local conditional expectation estimation — is based on the actual structure of the data through recursive mean-split partitioning. There is no linear prediction step that can introduce systematic bias. 
The regression point matrix compresses the observed data into local conditional means that accurately reflect the true surface, and prediction for missing values is a nearest-neighbor search over these denoised, accurate local means. + +### Variance Decomposition Insight + +The Rubin's rules formula makes the mechanism transparent: + +``` +Total Variance = Within + (1 + 1/m) * Between +``` + +`Within` variance is approximately the same for NNS and MICE in this comparison — it reflects the sampling variance of the linear regression slope given the sample size, which is determined by the data rather than the imputation method. The difference in total variance comes almost entirely from `Between` variance. + +| Component | NNS | MICE | +|-----------|-----|------| +| Within variance | ~0.0024 | ~0.0024 | +| Between variance | smaller | larger | +| Total variance | 0.0513² | 0.0524² | + +The between-imputation variance difference is the signature of imputation model quality. A perfect imputation model — one that recovered the true conditional distribution exactly — would produce between-imputation variance reflecting only genuine uncertainty about the missing values. An imperfect model adds extra variance from model error. NNS is closer to the former. + +--- + +## Responding to the Uncertainty Concern + +A common concern about local imputation methods is that they lack a principled generative model and therefore cannot propagate uncertainty correctly. This concern is valid for naive nearest-neighbor imputation, which simply fills missing values with the closest observed value and provides no mechanism for reflecting imputation uncertainty at all. + +NNS is categorically different. It provides multiple principled uncertainty mechanisms: + +**Within the Rubin's rules framework** (demonstrated here): Bootstrap resampling generates multiple imputed datasets from different realizations of the NNS regression surface. 
Between-imputation variance flows through Rubin's rules exactly as it does for MICE. The framework is identical; the imputation model is better. + +**Native prediction intervals**: `NNS.reg` with `confidence.interval` produces local prediction intervals directly from partition-level empirical distributions, without parametric assumptions. These intervals adapt to local heteroscedasticity because the partition geometry itself adapts. + +**Directional quantile bounds**: `LPM.VaR` and `UPM.VaR` provide degree-specific quantile thresholds. The degree-one continuous CDF representation eliminates the finite-sample discretization bias that affects classical empirical quantile intervals, ensuring that the interval bounds reflect genuine probability mass rather than step-function artifacts. + +**Maximum entropy bootstrap**: `NNS.meboot` generates synthetic replicates that preserve the dependence structure of the data. Unlike standard bootstrap, which clusters correlations near zero, `NNS.meboot` spans the full range of plausible dependence structures, providing richer uncertainty quantification for complex data. + +The concern about uncertainty propagation is answered both theoretically — NNS has multiple native uncertainty mechanisms — and empirically — NNS produces better-calibrated uncertainty estimates than MICE under Rubin's rules in this comparison. + +--- + +## Complete Reproducible Code + +```r +library(NNS) +library(dplyr) +library(purrr) +library(mice) + +# ----------------------------- +# 1. Simulate Data +# ----------------------------- +set.seed(42) +n <- 100 +x <- seq(0, 10, length.out = n) +y_true <- 2 * sin(x) + 0.5 * x + rnorm(n, 0, 0.5) +missing_idx <- sample(n, size = round(0.3 * n)) +y <- y_true +y[missing_idx] <- NA + +df <- data.frame(x = x, y = y) +complete_cases <- df %>% filter(!is.na(y)) +missing_cases <- df %>% filter(is.na(y)) + +# ----------------------------- +# 2. 
NNS Bootstrap Multiple Imputation +# ----------------------------- +impute_bootstrap <- function(data_complete, data_missing, seed_offset = 0) { + set.seed(123 + seed_offset) + boot_idx <- sample(nrow(data_complete), replace = TRUE) + boot_complete <- data_complete[boot_idx, ] + x_boot <- cbind(boot_complete$x, boot_complete$x) + y_boot <- boot_complete$y + x_miss <- cbind(data_missing$x, data_missing$x) + imputed_y <- NNS.reg( + x = x_boot, + y = y_boot, + point.est = x_miss, + order = "max", + n.best = 1, + plot = FALSE + )$Point.est + imputed_df <- data_missing + imputed_df$y <- imputed_y + return(imputed_df) +} + +m <- 20 +imputed_lists_nns <- map(1:m, ~ { + boot_imputed <- impute_bootstrap(complete_cases, missing_cases, seed_offset = .x) + bind_rows(complete_cases, boot_imputed) %>% arrange(x) +}) + +analyses_nns <- map(imputed_lists_nns, ~ { + fit <- lm(y ~ x, data = .x) + list(beta = coef(fit)["x"], var_beta = vcov(fit)["x", "x"]) +}) + +# Rubin's Rules — NNS +betas_nns <- map_dbl(analyses_nns, "beta") +var_within_nns <- mean(map_dbl(analyses_nns, "var_beta")) +var_between_nns <- var(betas_nns) +total_var_nns <- var_within_nns + (1 + 1/m) * var_between_nns +pooled_beta_nns <- mean(betas_nns) +pooled_se_nns <- sqrt(total_var_nns) +pooled_ci_nns <- pooled_beta_nns + c(-1, 1) * 1.96 * pooled_se_nns + +# ----------------------------- +# 3. 
MICE Multiple Imputation +# ----------------------------- +mice_mid <- mice(df, m = m, method = "pmm", seed = 123, print = FALSE) +imputed_lists_mice <- map(1:m, ~ complete(mice_mid, .x)) + +analyses_mice <- map(imputed_lists_mice, ~ { + fit <- lm(y ~ x, data = .x) + list(beta = coef(fit)["x"], var_beta = vcov(fit)["x", "x"]) +}) + +# Rubin's Rules — MICE +betas_mice <- map_dbl(analyses_mice, "beta") +var_within_mice <- mean(map_dbl(analyses_mice, "var_beta")) +var_between_mice <- var(betas_mice) +total_var_mice <- var_within_mice + (1 + 1/m) * var_between_mice +pooled_beta_mice <- mean(betas_mice) +pooled_se_mice <- sqrt(total_var_mice) +pooled_ci_mice <- pooled_beta_mice + c(-1, 1) * 1.96 * pooled_se_mice + +# ----------------------------- +# 4. Print Comparison +# ----------------------------- +cat("--- NNS Pooled Results ---\n") +cat("Pooled slope (beta):", round(pooled_beta_nns, 4), "\n") +cat("Pooled SE:", round(pooled_se_nns, 4), "\n") +cat("95% CI:", round(pooled_ci_nns[1], 4), "to", round(pooled_ci_nns[2], 4), "\n") +cat("Individual slopes:", paste(round(betas_nns, 4), collapse = " "), "\n\n") + +cat("--- MICE Pooled Results ---\n") +cat("Pooled slope (beta):", round(pooled_beta_mice, 4), "\n") +cat("Pooled SE:", round(pooled_se_mice, 4), "\n") +cat("95% CI:", round(pooled_ci_mice[1], 4), "to", round(pooled_ci_mice[2], 4), "\n") +cat("Individual slopes:", paste(round(betas_mice, 4), collapse = " "), "\n\n") + +cat("--- Comparison ---\n") +cat("True underlying linear trend: ~0.5\n") +cat("NNS pooled beta closer to true?", + ifelse(abs(pooled_beta_nns - 0.5) < abs(pooled_beta_mice - 0.5), "Yes", "No"), "\n") +cat("NNS pooled SE smaller (less uncertainty)?", + ifelse(pooled_se_nns < pooled_se_mice, "Yes", "No"), "\n") +``` + +--- + +## Summary + +| Criterion | NNS | MICE (PMM) | +|-----------|-----|------------| +| Pooled slope | **0.4521** | 0.4507 | +| True value | 0.5 | 0.5 | +| Closer to truth | **Yes** | No | +| Pooled SE | **0.0513** | 0.0524 | +| 95% 
CI width | **0.2011** | 0.2053 | +| Between-imputation variance | **Lower** | Higher | +| Linearity assumption | **None** | Implicit in PMM | +| Parametric distribution assumption | **None** | None (PMM) | + +NNS multiple imputation, implemented via bootstrap resampling of `NNS.reg` and pooled with standard Rubin's rules, produces superior inference to MICE with predictive mean matching on nonlinear data. The advantage is not from ignoring uncertainty — the pooling framework is identical. It is from having a more accurate imputation model that reduces between-imputation variance driven by model error, leaving behind only the genuine uncertainty that multiple imputation is designed to capture and propagate. + +--- + +## Further Reading + +- Viole, F. & Nawrocki, D. (2013). *Nonlinear Nonparametric Statistics: Using Partial Moments.* https://ovvo-financial.github.io/NNS/book/. +- Vinod, H.D. & Viole, F. (2017). Nonparametric regression using clusters. *Computational Economics*, 52(4), 1181–1209. +- Rubin, D.B. (1987). *Multiple Imputation for Nonresponse in Surveys.* Wiley. +- van Buuren, S. & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. *Journal of Statistical Software*, 45(3), 1–67. +- NNS package: https://cran.r-project.org/package=NNS +- NNS vignettes: https://ovvo-financial.github.io/NNS/articles/index.html diff --git a/tools/NNS/examples/NNS_correlation_and_dependence.html b/tools/NNS/examples/NNS_correlation_and_dependence.html new file mode 100644 index 0000000..14022d2 --- /dev/null +++ b/tools/NNS/examples/NNS_correlation_and_dependence.html @@ -0,0 +1,662 @@ + + + + + + + + + + + + + + + +Correlation and Dependence + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + +
    +

    1 Packages

    +

    Step 1 is to load all required packages. NNS v(5.7) is available on GitHub.

    +
    # Load packages
    +require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +library(mvtnorm)
    +library(ggplot2)
    +library(ggpubr)
    +
    +
    +

    2 Generate Functions and Data

    +
    # Functions
    +MyPlot <- function(xy, xlim = c(-4, 4), ylim = c(-4, 4), eps = 1e-15,
    +                   metric = c("cor", "NNS")) {
    +  metric <- metric[1]
    +
    +  df <- as.data.frame(xy)
    +  names(df) <- c("x", "y")
    +  
    +  
    +  if (metric == "cor") {
    +    
    +    value <- round(cor(xy[,1], xy[,2]), 1)
    +    
    +    if (sd(xy[,2]) < eps) {
    +      title <- paste0("cor = NA") # corr. coeff. is undefined
    +      
    +    } else {
    +      title <- paste0("cor = ", value)
    +    }
    +    
    +    subtitle <- NULL
    +    
    +  } else if(all(unlist(strsplit("NNS",split=""))%in%unlist(strsplit(metric,split="")))) {
    +      if(metric == "NNSxy"){
    +          value <- round(NNS.dep(xy[,1], xy[,2], asym = TRUE)$Dependence, 1)
    +          title <- bquote("NNS"[X%->%Y] * " = " * .(value))
    +      } else if(metric == "NNSyx"){
    +          value <- round(NNS.dep(xy[,2], xy[,1], asym = TRUE)$Dependence, 1)
    +          title <- bquote("NNS"[Y%->%X] * " = " * .(value))
    +      } else if(metric == "NNS"){
    +          value <- round(NNS.dep(xy[,1], xy[,2], asym = FALSE)$Dependence, 1)
    +          title <- paste0("NNS dep = ", value)
    +      } else if(metric == "NNScor"){
    +          value <- round(NNS.dep(xy[,1], xy[,2])$Correlation, 1)
    +          title <- paste0("NNS cor = ", value)
    +      } 
    +      
    +  } else if(metric == "gc"){
    +      value <- depMeas(xy[,1],xy[,2])
    +      title <- paste0("gc = ", round(value, 4))
    +  }
    +  
    +  subtitle <- NULL
    +  
    +  ggplot(df, aes(x, y)) +
    +    geom_point( color = "darkblue", size = 0.2 ) +
    +    xlim(xlim) +
    +    ylim(ylim) +
    +    labs(title = title,
    +         subtitle = subtitle) +
    +    theme_void() +
    +    theme( plot.title = element_text(size = 10, hjust = .5) )
    +}
    +
    +MvNormal <- function(n = 1000, cor = 0.8, metric = c("cor", "ppsxy", "ppsyx", "NNS")) {
    +  metric <- metric[1]
    +  
    +  res <- list()
    +  j <- 0
    +  
    +  for (i in cor) {
    +    sd <- matrix(c(1, i, i, 1), ncol = 2)
    +    x <- rmvnorm(n, c(0, 0), sd)
    +    j <- j + 1
    +    name <- paste0("p", j)
    +    res[[name]] <- MyPlot(x, metric = metric)
    +  }
    +  
    +  return(res)
    +}
    +
    +rotation <- function(t, X) return(X %*% matrix(c(cos(t), sin(t), -sin(t), cos(t)), ncol = 2))
    +
    +RotNormal <- function(n = 1000, t = pi/2, metric = c("cor", "NNS")) {
    +  metric <- metric[1]
    +  
    +  sd <- matrix(c(1, 1, 1, 1), ncol = 2)
    +  x <- rmvnorm(n, c(0, 0), sd)
    +  
    +  res <- list()
    +  j <- 0
    +  
    +  for (i in t) {
    +    j <- j + 1
    +    name <- paste0("p", j)
    +    res[[name]] <- MyPlot(rotation(i, x), metric = metric)
    +  }
    +  
    +  return(res)
    +}
    +
    +Others <- function(n = 1000, metric = c("cor", "NNS")) {
    +  metric <- metric[1]
    +  
    +  res <- list()
    +  
    +  x <- runif(n, -1, 1)
    +  y <- 4 * (x^2 - 1/2)^2 + runif(n, -1, 1)/3
    +  res[["p1"]] <- MyPlot(cbind(x,y), xlim = c(-1, 1), ylim = c(-1/3, 1+1/3), metric = metric)
    +  
    +  y <- runif(n, -1, 1)
    +  xy <- rotation(-pi/8, cbind(x,y))
    +  lim <- sqrt(2+sqrt(2)) / sqrt(2)
    +  res[["p2"]] <- MyPlot(xy, xlim = c(-lim, lim), ylim = c(-lim, lim), metric = metric)
    +  
    +  xy <- rotation(-pi/8, xy)
    +  res[["p3"]] <- MyPlot(xy, xlim = c(-sqrt(2), sqrt(2)), ylim = c(-sqrt(2), sqrt(2)), metric = metric)
    +  
    +  y <- 2*x^2 + runif(n, -1, 1)
    +  res[["p4"]] <- MyPlot(cbind(x,y), xlim = c(-1, 1), ylim = c(-1, 3), metric = metric)
    +  
    +  y <- (x^2 + runif(n, 0, 1/2)) * sample(seq(-1, 1, 2), n, replace = TRUE)
    +  res[["p5"]] <- MyPlot(cbind(x,y), xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), metric = metric)
    +  
    +  y <- cos(x*pi) + rnorm(n, 0, 1/8)
    +  x <- sin(x*pi) + rnorm(n, 0, 1/8)
    +  res[["p6"]] <- MyPlot(cbind(x,y), xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), metric = metric)
    +  
    +  xy1 <- rmvnorm(n/4, c( 3,  3))
    +  xy2 <- rmvnorm(n/4, c(-3,  3))
    +  xy3 <- rmvnorm(n/4, c(-3, -3))
    +  xy4 <- rmvnorm(n/4, c( 3, -3))
    +  res[["p7"]] <- MyPlot(rbind(xy1, xy2, xy3, xy4), xlim = c(-3-4, 3+4), ylim = c(-3-4, 3+4), metric = metric)
    +  
    +  return(res)
    +}
    +
    +output <- function( metric = c("cor", "NNS") ) {
    +  metric <- metric[1]
    +  
    +  plots1 <- MvNormal( n = 800, cor = c(1.0, 0.8, 0.4, 0.0, -0.4, -0.8, -1.0), metric = metric );
    +  plots2 <- RotNormal(200, c(0, pi/12, pi/6, pi/4, pi/2-pi/6, pi/2-pi/12, pi/2), metric = metric);
    +  plots3 <- Others(800, metric = metric)
    +
    +  ggarrange(
    +    plots1$p1, plots1$p2, plots1$p3, plots1$p4, plots1$p5, plots1$p6, plots1$p7,
    +    plots2$p1, plots2$p2, plots2$p3, plots2$p4, plots2$p5, plots2$p6, plots2$p7,
    +    plots3$p1, plots3$p2, plots3$p3, plots3$p4, plots3$p5, plots3$p6, plots3$p7,
    +
    +    ncol = 7, nrow = 3
    +  )
    +}
    +
    +
    +

    3 Plots

    +
    +

    3.1 Pearson Correlation

    +

    Of particular note is the final row of images, where the variables are clearly dependent but this linear measure cannot capture the dependence.

    +
    output(metric = "cor")
    +

    +
    +
    +

    3.2 NNS dependence

    +

    NNS offers insight into the strength of dependence between variables when linearity fails. Below is a general example of a sine wave and its corresponding correlation and dependence values:

    +
    x <- seq(.01, 3*pi, pi/100)
    +y <- sin(x)
    +
    +NNS.dep(x,y)
    +
    ## $Correlation
    +## [1] -0.003666096
    +## 
    +## $Dependence
    +## [1] 0.9811196
    +
    plot(x,y)
    +

    +

    Now back to the original examples:

    +
    output(metric = "NNS")
    +

    +
    +
    +

    3.3 Asymmetrical NNS dependence

    +

    NNS can also report asymmetrical dependence measures: setting the argument asym = TRUE evaluates X->Y correlation and dependence structures. Note how the matrices below, computed on the iris dataset, are symmetric with asym = FALSE but not with asym = TRUE.

    +
    NNS.dep(iris, asym = FALSE)
    +
    ## $Correlation
    +##              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    +## Sepal.Length    1.0000000   0.1959000    0.2466481   0.1480629  0.7980781
    +## Sepal.Width     0.1959000   1.0000000    0.1477749   0.2308152 -0.4402896
    +## Petal.Length    0.2466481   0.1477749    1.0000000   0.2704558  0.9354305
    +## Petal.Width     0.1480629   0.2308152    0.2704558   1.0000000  0.9381792
    +## Species         0.7980781  -0.4402896    0.9354305   0.9381792  1.0000000
    +## 
    +## $Dependence
    +##              Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    +## Sepal.Length    1.0000000   0.4545707    0.5216778   0.4716064 0.7980781
    +## Sepal.Width     0.4545707   1.0000000    0.4817219   0.4987360 0.4402896
    +## Petal.Length    0.5216778   0.4817219    1.0000000   0.5423818 0.9354305
    +## Petal.Width     0.4716064   0.4987360    0.5423818   1.0000000 0.9381792
    +## Species         0.7980781   0.4402896    0.9354305   0.9381792 1.0000000
    +
    NNS.dep(iris, asym = TRUE)
    +
    ## $Correlation
    +##              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    +## Sepal.Length    1.0000000  0.04147665    0.3518427   0.2180822  0.7980781
    +## Sepal.Width     0.1656569  1.00000000    0.2194986   0.3407249 -0.4402896
    +## Petal.Length    0.2577601  0.02970281    1.0000000   0.3198100  0.9354305
    +## Petal.Width     0.1184715  0.14913811    0.3031992   1.0000000  0.9381792
    +## Species         0.7980781 -0.44028958    0.9354305   0.9381792  1.0000000
    +## 
    +## $Dependence
    +##              Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    +## Sepal.Length    1.0000000   0.3424474    0.4647213   0.3992455 0.7980781
    +## Sepal.Width     0.4135840   1.0000000    0.4441067   0.5011180 0.4402896
    +## Petal.Length    0.5047947   0.4362511    1.0000000   0.5368431 0.9354305
    +## Petal.Width     0.4475287   0.4969260    0.5276431   1.0000000 0.9381792
    +## Species         0.7980781   0.4402896    0.9354305   0.9381792 1.0000000
    +
    +

    3.3.1 NNS dependence [X -> Y]

    +
    output(metric = "NNSxy")
    +

    +
    +
    +

    3.3.2 NNS dependence [Y -> X]

    +
    output(metric = "NNSyx")
    +

    +
    +
    +
    +

    3.4 Correlation

    +

    Of course, it is possible to have zero correlation with perfect dependence between variables, and NNS captures this dynamic through its separate measures for correlation and dependence.

    +
    output(metric = "NNScor")
    +

    +
    +
    + + + + + +
    + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/NNS_diff.md b/tools/NNS/examples/NNS_diff.md new file mode 100644 index 0000000..413c7fc --- /dev/null +++ b/tools/NNS/examples/NNS_diff.md @@ -0,0 +1,511 @@ +# `NNS.diff` for Noisy Gradients + +## Executive Summary + +`NNS.diff` is a core numerical differentiation routine in the NNS package. It determines the derivative of a univariate function by first solving for an appropriate perturbation scale `h` using a geometric procedure based on projected secant lines onto the y-axis. These projected points are then used to infer a finite step size `h` that is fed into multiple derivative estimators. + +The method works as follows: + +1. Evaluate the function at the target `point` and at `point ± h` (initial `h` defaults to 0.1). +2. Compute the y-intercepts (`B₁` and `B₂`) of the two secant lines connecting those points. +3. Treat the interval between the two intercepts as a bracket and iteratively bisect it. +4. At each candidate midpoint `B`, solve via `uniroot` for the step size `h*` that would make the left-hand secant project exactly to that `B`. +5. The sign of `h*` tells the algorithm which side of the bracket to shrink. The process repeats until either `|h*| < tol` or `max.iter` is reached. + +The final inferred `h*`, whether fully converged or stall-limited by noise, is then supplied to multiple downstream estimators. For **analytic, complex-compatible functions**, the strongest downstream output is the **Complex Step Derivative (Inferred h)**. For **non-analytic, piecewise, thresholded, saturated, quantized, or real-only black-box functions**, the core projected derivative itself, `DERIVATIVE`, becomes the most important output. 
+ +So `NNS.diff` has a two-regime interpretation: + +- **analytic regime:** use the inferred step to drive the complex-step row +- **black-box real-only regime:** use the projected-secant derivative as the all-terrain estimate + +**For analytic, complex-compatible functions, the strongest workflow is therefore:** + +```r +out <- NNS.diff( + f = f, + point = x0, + h = abs(x0) * 0.1 + 0.01, + tol = 1e-10, + max.iter = 1000, + digits = 12, + print.trace = FALSE, + plot = FALSE +) + +grad <- out["Complex Step Derivative (Inferred h)", 1] +``` + +**For non-analytic or real-only black-box functions, the preferred workflow is instead:** + +```r +out <- NNS.diff( + f = f, + point = x0, + h = abs(x0) * 0.1 + 0.01, + tol = 1e-10, + max.iter = 1000, + digits = 12, + print.trace = FALSE, + plot = FALSE +) + +grad <- out["DERIVATIVE", 1] +``` + +`NNS.diff` does **not** simply return “a derivative.” Its real contribution is that it first solves the classic perturbation-scale problem geometrically, then hands that stable scale to a derivative formula that is far less numerically fragile than ordinary finite differences, or, when complex inputs are invalid, it returns a projected derivative that remains stable and directionally useful on real-only surfaces. + +--- + +## 1. Where `NNS.diff` sits inside NNS + +`NNS.diff` is not an isolated add-on. In the official manual it appears as a named core routine within the NNS package, alongside other NNS procedures such as `NNS.dep`, `NNS.reg`, `NNS.ARMA`, and `NNS.copula`. The manual indexes `NNS.diff` as its own documented entry and defines it as **“NNS Numerical Differentiation.”** + +The documented description is specific: + +> “Determines numerical derivative of a given univariate function using projected secant lines on the y-axis.
These projected points infer finite steps `h`, in the finite step method.” + +The original interface was: + +```r +NNS.diff(f, point, h = 0.1, tol = 1e-10, digits = 12, print.trace = FALSE) +``` + +The current implementation adds `max.iter` and `plot`, and returns a matrix containing: +- the projected derivative +- the inferred step size +- finite-step estimates at the initial and inferred `h` +- the complex-step estimate at the inferred `h` +- convergence diagnostics +- the termination code + +That makes `NNS.diff` better understood as a **small diagnostic framework for local differentiation**, not just a single derivative formula. + +--- + +## 2. The innovative idea: geometry before differencing + +Most numerical differentiation methods start by **assuming** you already know a good perturbation size `h`. + +### Standard finite differences +They begin with + +$$ +\hat{f}'(x;h)=\frac{f(x+h)-f(x-h)}{2h} +$$ + +and then force the user to decide what `h` should be. + +### Richardson extrapolation +Richardson begins with finite differences too, then tries to cancel truncation error by shrinking `h` repeatedly. + +### `NNS.diff` +`NNS.diff` **flips the order**. + +It treats the choice of `h` as the primary unknown and solves for it **geometrically** before any derivative formula is applied: + +1. Evaluate `f` at `point`, `point - h`, and `point + h`. +2. Compute the y-intercepts of the two secant lines: $B_1 = f(x) - \frac{f(x) - f(x-h)}{h} \cdot x$ and $B_2 = f(x) - \frac{f(x+h) - f(x)}{h} \cdot x$. +3. Bracket the interval `[min(B_1, B_2), max(B_1, B_2)]`. +4. For a candidate midpoint `B`, solve $B = f(x) - \frac{f(x) - f(x-h^*)}{h^*} \cdot x$ for `h*` via `uniroot`. +5. Use the sign of `h*` to decide which half of the bracket to discard. +6. Repeat until either `|h*| < tol` or `max.iter` is reached. + +In clean analytic functions the search converges quickly to a tiny `h*`, often around $10^{-11}$. 
In noisy functions it naturally stalls at a **larger** `h*`, automatically moving away from the tiny steps that would amplify noise. + +This is the key innovation: `NNS.diff` converts the step-size problem into a **stable geometric bisection**. + + + + +## 3. What the updated implementation actually does + +The current implementation adds several practical improvements. + +### A. Explicit bisection with termination diagnostics +- `termination.code = 0` means clean convergence to tolerance +- `termination.code = 1` means the search hit `max.iter` and stalled in a noisy regime +- `termination.code = 2` means `uniroot` failed + +### B. Multiple downstream estimators using the same inferred `h` +The returned matrix includes: +- Projected secant derivative (`DERIVATIVE`) +- Initial finite-step rows +- Inferred finite-step rows +- **Complex Step Derivative (Inferred h)** + +### C. Noise-aware defaults +`max.iter` provides a hard upper bound so noisy functions cannot hang indefinitely. + +### D. Correct handling of locally linear real-only regions +A crucial update is the fix for the case where the initial projected intercepts are identical, $B_1 = B_2$. That situation does **not** mean the derivative fails to exist. It usually means the local slope has already been identified exactly. + +That correction matters for: +- `abs(x)` away from the kink +- `ReLU` away from the threshold +- clipped linear interiors + +After the fix, `NNS.diff` correctly returns the local common secant slope instead of incorrectly treating the case as a failure. + +This turns out to be essential for revealing the full value of `NNS_Proj` on piecewise and black-box surfaces. + +--- + +## 4. Why this is preferred over standard methods for analytic functions + +For analytic, complex-compatible functions, the preferred output is: + +$$ +\texttt{NNS.diff} \;\rightarrow\; \texttt{"Complex Step Derivative (Inferred h)"} +$$ + +because it combines the strongest feature of two different ideas. 
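For orientation, the standard complex-step formula that the complex-step row refers to is f'(x) ≈ Im(f(x + ih)) / h. A minimal base-R sketch follows; the helper names and the step `h = 1e-8` are assumptions for illustration, whereas in the workflow described here the step would be the one inferred by `NNS.diff`.

```r
# Complex-step derivative: f'(x) ~ Im(f(x + i*h)) / h.
# Requires f to accept complex input.
complex_step <- function(f, x, h) {
  Im(f(complex(real = x, imaginary = h))) / h
}

# Ordinary centered difference on the real axis, for comparison.
central_diff <- function(f, x, h) {
  (f(x + h) - f(x - h)) / (2 * h)
}

x0 <- 1
h  <- 1e-8   # assumed step; NNS.diff would supply its inferred h instead

err_cs <- abs(complex_step(sin, x0, h) - cos(x0))
err_cd <- abs(central_diff(sin, x0, h) - cos(x0))
cat(sprintf("complex-step error = %.2e, central-difference error = %.2e\n",
            err_cs, err_cd))
```

The complex-step estimate involves no subtraction of nearby function values, so it does not suffer the cancellation error that limits the centered difference at small `h`.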
+ +### Existing methods leave one key problem unsolved + +#### Fixed finite differences +They require the user to choose `h`. In noise, that is exactly the hard part. If `h` is too small, the variance grows like + +$$ +\mathrm{Var}(\hat f'(x;h)) \propto \frac{\sigma^2}{h^2}. +$$ + +If `h` is too large, truncation bias dominates. + +#### Richardson extrapolation +Richardson is excellent in clean smooth settings because it shrinks `h` and cancels truncation error asymptotically. But in noisy settings, repeated refinement can amplify noise instead of helping. + +#### Complex step alone +Complex step is numerically excellent because it avoids the subtractive cancellation that plagues real-axis differences. But in practice you still need a **useful perturbation scale**. + +### What `NNS.diff -> Complex Step (Inferred h)` adds + +This workflow solves both pieces at once: + +1. `NNS.diff` finds a **geometry-informed perturbation scale** +2. complex step uses that scale with a derivative formula that avoids the real-axis subtraction problem + +It is not that `NNS.diff` somehow replaces complex step. It is that `NNS.diff` supplies the missing piece that complex step usually does not address directly: **how to choose a practical perturbation scale without manual tuning.** + +--- + +## 5. What problem this solves in practice + +For analytic functions, this combination solves several concrete problems at once. + +### Problem 1: manual step-size selection +Most users do not know the right `h` ahead of time. `NNS.diff` turns step-size selection into an inferred quantity rather than a hand-tuned hyperparameter. + +### Problem 2: cancellation fragility in finite differences +Real-valued finite differences subtract nearby evaluations. Complex step avoids that subtraction route. + +### Problem 3: Richardson’s weakness in noise +Richardson improves clean asymptotics but is not designed around noisy stability. 
`NNS.diff` instead searches for a practical scale region and explicitly exposes when the search has stalled by hitting `max.iter`. + +### Problem 4: low-configuration gradient estimation +This is useful when you want a gradient estimate from a simulator or algorithmic model, but you do not want to tune a finite-difference step by hand, run a full step sweep, or trust aggressive extrapolation in noise. + +--- + +## 6. Benchmark setup recap: analytic functions + +We compared: + +- `NNS_Proj`: projected secant derivative from `NNS.diff` +- `NNS_FinInit`: averaged finite step at the initial `h` +- `NNS_FinInf`: averaged finite step at the inferred `h` +- `NNS_CplxInf`: complex-step derivative at the inferred `h` +- `Richardson` +- `OracleFD`: oracle centered finite-difference sweep over an external `h` grid + +Functions tested: + +- `sin(x)` at `x = 1` +- `exp(x)` at `x = 1` +- `x^3` at `x = 2` + +Noise levels: + +- `sigma = 0` +- `1e-4` +- `1e-3` +- `1e-2` + +Replications: `R = 100` + +Important implementation detail: under noisy benchmarking, we used **frozen-noise surfaces per replication**, so `NNS.diff` sees a deterministic perturbed surface inside each call rather than a moving stochastic target. + +--- + +## 7. 
Results table: analytic benchmark (`max.iter = 1000`) + +| Function | Sigma | NNS Proj RRMSE | NNS FinInit RRMSE | NNS FinInf RRMSE | NNS CplxInf RRMSE | Richardson RRMSE | OracleFD RRMSE | Median NNS h | Median Iterations | OracleFD h | Termination | +|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:| +| sin(x) at x=1 | 0 | 3.66e-07 | 1.67e-03 | 1.28e-06 | 2.59e-13 | 1.71e-14 | 1.65e-10 | 2.20e-11 | 30 | 3.16e-05 | 0 | +| sin(x) at x=1 | 1e-4 | 8.87e-03 | 2.16e-03 | 7.12e-01 | 3.88e-05 | 6.25e-02 | 2.06e-03 | 8.35e-03 | 1000 | 8.88e-02 | 1 | +| sin(x) at x=1 | 1e-3 | 3.14e-02 | 1.38e-02 | 7.78e+00 | 3.03e-04 | 5.90e-01 | 9.19e-03 | 2.01e-02 | 1000 | 1.68e-01 | 1 | +| sin(x) at x=1 | 1e-2 | 1.72e-01 | 1.43e-01 | 6.14e+01 | 1.63e-02 | 6.30e+00 | 3.74e-02 | 5.20e-02 | 1000 | 3.16e-01 | 1 | +| exp(x) at x=1 | 0 | 1.64e-07 | 1.67e-03 | 3.52e-07 | 1.67e-14 | 1.19e-14 | 1.70e-10 | 3.20e-11 | 29 | 3.16e-05 | 0 | +| exp(x) at x=1 | 1e-4 | 4.96e-03 | 1.70e-03 | 1.47e-01 | 1.71e-05 | 1.24e-02 | 6.96e-04 | 3.82e-03 | 1000 | 4.70e-02 | 1 | +| exp(x) at x=1 | 1e-3 | 9.22e-03 | 2.84e-03 | 5.39e-01 | 1.27e-04 | 1.17e-01 | 3.03e-03 | 1.37e-02 | 1000 | 8.88e-02 | 1 | +| exp(x) at x=1 | 1e-2 | 3.53e-02 | 2.86e-02 | 6.34e+00 | 1.58e-03 | 1.25e+00 | 1.48e-02 | 4.88e-02 | 1000 | 2.30e-01 | 1 | +| x^3 at x=2 | 0 | 7.43e-06 | 8.33e-04 | 9.71e-07 | 0.00e+00 | 5.34e-14 | 8.59e-11 | 8.60e-11 | 30 | 3.16e-05 | 0 | +| x^3 at x=2 | 1e-4 | 2.37e-03 | 8.37e-04 | 5.68e-02 | 1.58e-06 | 2.82e-03 | 2.19e-04 | 1.77e-03 | 1000 | 3.42e-02 | 1 | +| x^3 at x=2 | 1e-3 | 4.89e-03 | 9.25e-04 | 2.86e-01 | 1.55e-05 | 2.66e-02 | 9.00e-04 | 7.64e-03 | 1000 | 6.46e-02 | 1 | +| x^3 at x=2 | 1e-2 | 1.40e-02 | 6.55e-03 | 4.64e+00 | 1.44e-04 | 2.84e-01 | 3.85e-03 | 2.33e-02 | 1000 | 1.68e-01 | 1 | + +--- + +## 8. What the analytic results show + +### A. In clean settings, Richardson still wins +At `sigma = 0`, Richardson is best, exactly as one expects in clean smooth settings. + +### B. 
In noisy settings, the inferred scale is valuable +The median inferred `h` increases as noise increases. That is what a sensible scale selector should do. + +### C. The complex-step row is the standout result +`NNS_CplxInf` remains extremely accurate across the analytic examples, even when the projected derivative and Richardson worsen under noise. + +So the benchmark supports a very specific conclusion: + +> the most useful output of `NNS.diff` for analytic noisy functions is not necessarily the projected derivative itself, but the inferred step scale fed into the complex-step formula. + +### D. Increasing `max.iter` does not materially change the answer +When `max.iter` increases from `100` to `1000`, the noisy results barely change while median iterations rise to `1000`. That means the search is **stall-limited**, not under-iterated. + +--- + +## 9. Extending the story: why `NNS.diff` also matters for non-analytic and black-box functions + +The analytic conclusion is only half the story. + +The complex-step row is only valid when the function is analytic and accepts complex perturbations. But many practical objectives are not of that form. They may be: + +- piecewise continuous +- thresholded +- clipped or saturated +- quantized +- rounded +- Monte Carlo based +- real-only black-box systems + +In those cases, the complex-step row is invalid or meaningless. This is where the projected derivative itself, `NNS_Proj`, becomes the key output. + +Rather than treating this as a fallback of lesser importance, the updated experiments show that `NNS_Proj` solves a **different problem**: + +> how to obtain a stable, directionally meaningful local gradient proxy on a noisy real-only surface where complex perturbations are impossible or inappropriate. + +That is why `NNS_Proj` is best described as the **all-terrain estimator**. + +--- + +## 10. Black-box benchmark setup + +To reveal that value, a second benchmark focused on non-analytic and black-box-like functions. 
+ +Categories included: + +- **piecewise continuous** + - `abs(x)` smooth side + - `abs(x)` at kink + - `ReLU` smooth side + - `ReLU` at kink +- **saturation** + - clipped linear interior + - clipped linear threshold +- **discontinuous** + - indicator threshold +- **quantized** + - rounded quadratic +- **black-box payoff-like** + - Monte Carlo option payoff style + +For these functions, derivative existence at the evaluation point is not always the right evaluation criterion. So the benchmark also tracked: + +- sign accuracy +- tangent prediction error +- estimator standard deviation +- iteration and termination diagnostics + +That is important, because for black-box systems a **stable local directional estimate** is often more useful than a classical smooth derivative in the textbook sense. + +--- + +## 11. Black-box results: grouped median summary + +| Category | Sigma | NNS_Proj_RRMSE | NNS_FinInit_RRMSE | NNS_FinInf_RRMSE | Richardson_RRMSE | OracleFD_RRMSE | NNS_Proj_SignAcc | NNS_FinInit_SignAcc | NNS_FinInf_SignAcc | Richardson_SignAcc | OracleFD_SignAcc | NNS_Proj_TangentMAE | NNS_FinInit_TangentMAE | NNS_FinInf_TangentMAE | Richardson_TangentMAE | OracleFD_TangentMAE | Median_NNS_h | Median_NNS_Iterations | Median_Termination_Code | +|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:| +| piecewise_continuous | 1e-4 | 1.0367e-03 | 7.4347e-04 | 3.5924e-01 | 3.3314e-02 | 2.9618e-04 | 1.00 | 1.00 | 0.9773 | 1.00 | 1.00 | 1.1129e-04 | 1.1303e-04 | 1.0471e-03 | 2.9742e-04 | 1.1356e-04 | 5.0932e-02 | 1000 | 1 | +| saturation | 1e-4 | 7.4398e-04 | 6.6351e-04 | 1.1489e+01 | 2.9752e-02 | 2.0921e-04 | 1.00 | 1.00 | 0.9701 | 1.00 | 1.00 | 1.1972e-04 | 1.1997e-04 | 1.6204e-02 | 2.4907e-04 | 1.1922e-04 | 4.4941e-02 | 1000 | 1 | +| piecewise_continuous | 1e-3 | 9.5846e-03 | 6.8462e-03 | 1.4168e+01 | 3.1223e-01 | 3.5651e-03 | 1.00 | 1.00 | 0.8647 | 1.00 | 1.00 | 1.1325e-03 | 1.1427e-03 | 3.7789e-02 | 2.4987e-03 | 1.1292e-03 
| 3.1348e-02 | 1000 | 1 | +| saturation | 1e-3 | 8.0072e-03 | 7.0641e-03 | 8.6091e+00 | 3.4521e-01 | 2.3545e-03 | 1.00 | 1.00 | 0.9286 | 1.00 | 1.00 | 1.2044e-03 | 1.2107e-03 | 2.5865e-02 | 2.8792e-03 | 1.2091e-03 | 2.9158e-02 | 1000 | 1 | +| piecewise_continuous | 1e-2 | 7.7017e-02 | 7.5496e-02 | 4.9546e+01 | 3.1399e+00 | 2.9688e-02 | 1.00 | 1.00 | 0.8026 | 0.68 | 1.00 | 1.0507e-02 | 1.0556e-02 | 2.0036e-01 | 2.6432e-02 | 1.0441e-02 | 4.4360e-02 | 1000 | 1 | +| saturation | 1e-2 | 8.3284e-02 | 6.5639e-02 | 5.0264e+01 | 3.4355e+00 | 2.3065e-02 | 1.00 | 1.00 | 0.8718 | 0.57 | 1.00 | 1.2193e-02 | 1.2141e-02 | 1.7557e-01 | 3.0687e-02 | 1.2139e-02 | 3.7506e-02 | 1000 | 1 | + +--- + +## 12. What the black-box results show + +### A. `NNS_Proj` is exact on clean locally linear regions +After the implementation fix, the projected derivative is exact on: + +- `abs(x)` smooth side +- `ReLU` smooth side +- clipped linear interior + +at `sigma = 0`. + +That is exactly the behavior one wants on locally affine black-box surfaces. + +### B. `NNS_Proj` is highly directionally stable +Across the grouped non-analytic results: + +- `NNS_Proj_SignAcc = 1.00` +- `NNS_FinInit_SignAcc = 1.00` +- `OracleFD_SignAcc = 1.00` + +while Richardson degrades sharply at high noise. + +For example, at `sigma = 1e-2`: + +- piecewise continuous: `Richardson_SignAcc = 0.68` +- saturation: `Richardson_SignAcc = 0.57` + +So `NNS_Proj` preserves local direction much more reliably than Richardson in noisy non-analytic settings. + +### C. `NNS_Proj` is near-oracle on local tangent prediction +The tangent MAE columns are especially revealing. + +For both piecewise and saturation categories, `NNS_Proj_TangentMAE` is extremely close to `OracleFD_TangentMAE`. In other words: + +> `NNS_Proj` gives nearly oracle-quality local tangent prediction without requiring an external sweep over step sizes. + +That is a major practical advantage. + +### D. 
The inferred scale still matters, but the projected derivative is the useful output +In these black-box settings, the inferred `h` still adapts upward under noise, but the main value is not a complex-step row. The main value is that the geometric search stabilizes the local scale enough for the **projected derivative** to remain useful. + +### E. Stall-limited does not mean useless +As in the analytic noisy case, median iterations frequently hit `1000` with termination code `1`. + +That means the search is **stall-limited**, not fully converged. But the projected derivative remains highly usable. So the correct interpretation is: + +> in noise, `NNS.diff` often settles into a practical scale region rather than converging to an infinitesimal one, and that practical scale is enough to support a stable projected estimate. + +--- + +## 13. Why `NNS_Proj` is the all-terrain estimator + +For black-box and non-analytic functions, `NNS_Proj` solves a different practical problem than the complex-step row. + +### Complex-step row solves +How do I get an extremely accurate derivative for an analytic function once I have a good perturbation scale? + +### `NNS_Proj` solves +How do I extract a stable local directional slope estimate from a noisy real-only system where complex inputs are invalid? + +That is why `NNS_Proj` is all-terrain: + +- it uses only real evaluations +- it does not require analytic continuation +- it behaves correctly on locally linear regions +- it remains directionally stable under noise +- it gives near-oracle local tangent prediction on black-box-like examples +- it avoids the noise fragility of Richardson in non-analytic settings + +This makes it useful for: + +- piecewise losses +- clipped controls +- threshold decision rules +- quantized simulators +- payoff functions +- Monte Carlo black-box objectives +- real-world physical systems that cannot accept complex perturbations + +--- + +## 14. 
Practical recommendation by regime + +### Analytic, complex-compatible functions +Use: + +```r +out["Complex Step Derivative (Inferred h)", 1] +``` + +because this combines automatic geometry-based scale selection with a highly stable derivative formula. + +### Non-analytic, piecewise, thresholded, quantized, or real-only black-box functions +Use: + +```r +out["DERIVATIVE", 1] +``` + +because the projected derivative is the estimator that remains meaningful and robust when complex continuation is unavailable. + +--- + +## 15. Why this is ultimately useful + +The overall contribution of `NNS.diff` is broader than “it estimates derivatives.” + +It gives a practical answer to a question that standard methods usually leave unresolved: + +> **What local perturbation scale should I use to obtain a stable gradient estimate in noise?** + +For analytic functions, that answer is: +- infer the scale geometrically +- then use the complex-step row + +For black-box real-only functions, that answer is: +- infer the scale geometrically +- then rely on the projected derivative itself + +This is useful because it avoids: +- hand-tuning `h` +- blind trust in tiny steps +- aggressive Richardson refinement in noisy settings +- full external step sweeps in routine workflows + +--- + +## 16. Final caveat + +The “standout winner” status of the complex-step row applies only when the function is **analytic and complex-compatible**. + +Do **not** rely on the complex-step row when: + +- `f(z)` is not meaningful for complex `z` +- the model is non-analytic or branch-heavy +- the function is piecewise, clipped, thresholded, rounded, or quantized in a way that breaks complex continuation +- the system is a real-world physical process with no meaningful complex perturbation interpretation + +In those cases, the projected derivative is not merely a fallback. It is the correct general-purpose output. 
+ +--- + +## Bottom line + +The historical and conceptual importance of `NNS.diff` is not that it is just another finite-difference routine. + +Its innovation is that it treats numerical differentiation as a **geometry-driven scale-selection problem first**, and only then as a derivative-estimation problem. + +That yields two distinct but complementary strengths: + +- for **analytic noisy models**, `NNS.diff` is best used as an automatic inferred-step selector feeding + **`"Complex Step Derivative (Inferred h)"`** +- for **non-analytic, piecewise, thresholded, saturated, quantized, or black-box real-only models**, the most useful output is the projected derivative, + **`"DERIVATIVE"`**, that is, **`NNS_Proj`** + +So the full conclusion is: + +> `NNS.diff` is not just a derivative routine. It is a geometry-based local-scale inference method whose output should be chosen by regime. The complex-step row is the best endpoint for analytic functions, while `NNS_Proj` is the all-terrain estimator that makes the method broadly useful on real-world black-box surfaces. + + +### Experiments + +- [Benchmarks](https://github.com/OVVO-Financial/NNS/blob/Data-and-Simulation-Routines/NNS-Simulation-Routines/Benchmarks%20for%20NNS_diff.R) +- [Non-Analytic Benchmarks](https://github.com/OVVO-Financial/NNS/blob/Data-and-Simulation-Routines/NNS-Simulation-Routines/Non-Analytic%20Benchmarks%20for%20NNS_diff.R) diff --git a/tools/NNS/examples/NNSvignette_Overview.html b/tools/NNS/examples/NNSvignette_Overview.html new file mode 100644 index 0000000..21925d3 --- /dev/null +++ b/tools/NNS/examples/NNSvignette_Overview.html @@ -0,0 +1,1114 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Overview + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Overview

    +

    Fred Viole

    + + + +
    # Prereqs (uncomment if needed):
    +# install.packages("NNS")
    +# install.packages(c("data.table","xts","zoo","Rfast"))
    +
    +suppressPackageStartupMessages({
    +  library(NNS)
    +  library(data.table)
    +})
    +set.seed(42)
    +
    +

    Orientation

    +

    Goal. A complete, hands‑on curriculum for Nonlinear Nonparametric Statistics (NNS) using partial moments. Each section blends narrative intuition, precise math, and executable code.

    +

    Structure.
    1. Foundations: partial moments & variance decomposition
    2. Descriptive & distributional tools
    3. Dependence & nonlinear association
    4. Hypothesis testing & ANOVA (LPM‑CDF)
    5. Regression, boosting, stacking & causality
    6. Time series & forecasting
    7. Simulation (max‑entropy), Monte Carlo & risk‑neutral rescaling
    8. Portfolio & stochastic dominance

    +

    Notation. For a random variable \(X\) and threshold/target \(t\), the population \(n\)‑th partial moments are defined as

    +

    \[ \operatorname{LPM}(n,t,X) = \int_{-\infty}^{t} (t-x)^{n} \, dF_X(x), \qquad \operatorname{UPM}(n,t,X) = \int_{t}^{\infty} (x-t)^{n} \, dF_X(x). \]

    +

    The empirical estimators replace \(F_X\) with the empirical CDF \(\hat F_N\) of a sample of size \(N\) (or, equivalently, use indicator functions); \(n\) remains the degree:

    +

    \[ \widehat{\operatorname{LPM}}_n(t;X) = \frac{1}{N} \sum_{i=1}^N (t-x_i)^n \, \mathbf{1}_{\{x_i \le t\}}, \qquad \widehat{\operatorname{UPM}}_n(t;X) = \frac{1}{N} \sum_{i=1}^N (x_i-t)^n \, \mathbf{1}_{\{x_i > t\}}. \]

    +

    These correspond to integrals over the measurable subsets \(\{X \le t\}\) and \(\{X > t\}\) in a \(\sigma\)‑algebra; the empirical sums are discrete analogues of Lebesgue integrals.
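The empirical estimators above can be sketched in a few lines of base R. The helpers `lpm_sketch` and `upm_sketch` are hypothetical names that mirror the indicator formulas for integer degrees; they are not the NNS implementations of `LPM` and `UPM`.

```r
# Hypothetical helpers mirroring the empirical formulas above
# (integer degrees only); these are not the NNS implementations.
lpm_sketch <- function(degree, target, x) {
  mean((target - x)^degree * (x <= target))
}
upm_sketch <- function(degree, target, x) {
  mean((x - target)^degree * (x > target))
}

set.seed(1)
x  <- rnorm(1000)
mu <- mean(x)

# Degree 0 simply counts observations, so it reproduces the empirical CDF.
cdf_at_0 <- lpm_sketch(0, 0, x)

# Degree 2 at the mean splits the (1/N) variance into two parts.
pop_var <- mean((x - mu)^2)
parts   <- lpm_sketch(2, mu, x) + upm_sketch(2, mu, x)

cat(sprintf("CDF(0) = %.4f | LPM2 + UPM2 = %.6f | (1/N) variance = %.6f\n",
            cdf_at_0, parts, pop_var))
```

The indicator form is used instead of `pmax(target - x, 0)^degree` because the latter would evaluate `0^0 = 1` for every observation at degree 0 and lose the counting interpretation.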

    +
    +
    +
    +

    1. Foundations — Partial Moments & Variance Decomposition

    +
    +

    1.1 Why partial moments

    +
      +
    • Classical variance treats upside and downside symmetrically. Partial moments separate them, allowing asymmetric risk/reward analysis around a chosen target \(t\) (often the mean or a benchmark).
    • +
    • At \(t=\mu_X\): \[ \operatorname{Var}(X) = \operatorname{UPM}(2,\mu_X,X) + \operatorname{LPM}(2,\mu_X,X)\quad\text{(exact empirical identity)}. \] This is not the same as splitting conditional variances around a threshold; partial moments use a global reference, preserving the between‑group contribution.
    • +
    +
    +
    +

    1.2 Core functions and headers

    +
      +
    • LPM(degree, target, variable)
    • +
    • UPM(degree, target, variable)
    • +
    • LPM.ratio(degree = 0, target, variable) (empirical CDF when degree=0)
    • +
    • UPM.ratio(degree = 0, target, variable)
    • +
    • LPM.VaR(p, degree, variable) (quantiles via partial‑moment CDFs)
    • +
    +
    +
    +

    1.3 Code: variance decomposition & CDF

    +
    # Normal sample
    +y <- rnorm(3000)
    +mu <- mean(y)
    +L2 <- LPM(2, mu, y); U2 <- UPM(2, mu, y)
    +cat(sprintf("LPM2 + UPM2 = %.6f vs var(y)=%.6f\n", (L2+U2)*(length(y) / (length(y) - 1)), var(y)))
    +
    ## LPM2 + UPM2 = 1.011889 vs var(y)=1.011889
    +
    # Empirical CDF via LPM.ratio(0, t, x)
    +for (t in c(-1,0,1)) {
    +  cdf_lpm <- LPM.ratio(0, t, y)
    +  cat(sprintf("CDF at t=%+.1f : LPM.ratio=%.4f | empirical=%.4f\n", t, cdf_lpm, mean(y<=t)))
    +}
    +
    ## CDF at t=-1.0 : LPM.ratio=0.1633 | empirical=0.1633
    +## CDF at t=+0.0 : LPM.ratio=0.5043 | empirical=0.5043
    +## CDF at t=+1.0 : LPM.ratio=0.8480 | empirical=0.8480
    +
    # Asymmetry on a skewed distribution
    +z <- rexp(3000)-1; mu_z <- mean(z)
    +cat(sprintf("Skewed z: LPM2=%.4f, UPM2=%.4f (expect imbalance)\n", LPM(2,mu_z,z), UPM(2,mu_z,z)))
    +
    ## Skewed z: LPM2=0.2780, UPM2=0.7682 (expect imbalance)
    +

    Interpretation. The equality (LPM2 + UPM2) * n/(n-1) == var(y) holds because deviations are measured against the global mean; the factor n/(n-1) is the Bessel adjustment that converts the population‑normalized partial moments to the sample variance. LPM.ratio(0, t, y) constructs an empirical CDF directly from partial‑moment counts.
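The Bessel factor can be made explicit. Writing \(N\) for the sample size and \(\bar x\) for the sample mean, and assuming the partial moments are normalized by \(N\) as in the empirical estimators of the Orientation section, every squared deviation falls into exactly one of the two indicator sets, so

\[
\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar x)^2 = \frac{1}{N}\sum_{i=1}^{N}(\bar x-x_i)^2\,\mathbf{1}_{\{x_i\le \bar x\}} \;+\; \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar x)^2\,\mathbf{1}_{\{x_i> \bar x\}} = \widehat{\operatorname{LPM}}_2(\bar x;X) + \widehat{\operatorname{UPM}}_2(\bar x;X),
\]

\[
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar x)^2 = \frac{N}{N-1}\left[\widehat{\operatorname{LPM}}_2(\bar x;X) + \widehat{\operatorname{UPM}}_2(\bar x;X)\right].
\]

The second line is the quantity the code in 1.3 compares against var(y).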

    +
    +
    +
    +
    +

    2. Descriptive & Distributional Tools

    +
    +

    2.1 Higher moments from partial moments

    +

    Define asymmetric analogues of skewness/kurtosis using \(\operatorname{UPM}_3\), \(\operatorname{LPM}_3\) (and degree 4), yielding robust tail diagnostics without parametric assumptions.
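A minimal base-R sketch of how odd and even central moments decompose into partial moments at the mean. The helpers are hypothetical and use integer degrees only; `NNS.moments` may apply different normalizations or sample adjustments, so this is not a description of its internals.

```r
# Hypothetical helpers (not the NNS implementations), integer degrees only.
lpm_sketch <- function(degree, target, x) mean((target - x)^degree * (x <= target))
upm_sketch <- function(degree, target, x) mean((x - target)^degree * (x > target))

set.seed(7)
x  <- rexp(2000) - 1     # right-skewed sample
mu <- mean(x)

# Odd degree: the downside enters with a negative sign.
third_central  <- upm_sketch(3, mu, x) - lpm_sketch(3, mu, x)
# Even degree: both sides add.
fourth_central <- upm_sketch(4, mu, x) + lpm_sketch(4, mu, x)

# Population-normalized skewness built only from partial moments.
second_central <- upm_sketch(2, mu, x) + lpm_sketch(2, mu, x)
skew_sketch    <- third_central / second_central^1.5

cat(sprintf("UPM3 = %.4f, LPM3 = %.4f, skewness sketch = %.4f\n",
            upm_sketch(3, mu, x), lpm_sketch(3, mu, x), skew_sketch))
```

Reporting `UPM3` and `LPM3` separately shows which tail drives the skewness, information that a single signed moment hides.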

    +

    Header. NNS.moments(x)

    +
    M <- NNS.moments(y)
    +M
    +
    ## $mean
    +## [1] -0.0114498
    +## 
    +## $variance
    +## [1] 1.011552
    +## 
    +## $skewness
    +## [1] -0.007412142
    +## 
    +## $kurtosis
    +## [1] 0.06723772
    +
    +
    +

    2.2 Mode estimation (no bin‑or‑bandwidth angst)

    +

    Header. NNS.mode(x)

    +
    set.seed(23)
    +multimodal <- c(rnorm(1500,-2,.5), rnorm(1500,2,.5))
    +NNS.mode(multimodal,multi = TRUE)
    +
    ## [1] -2.049405  1.987674
    +
    +
    +

    2.3 CDF tables via LPM ratios

    +
    qgrid <- quantile(z, probs = seq(0.05,0.95,by=0.1))
    +CDF_tbl <- data.table(threshold = as.numeric(qgrid), CDF = sapply(qgrid, function(q) LPM.ratio(0,q,z)))
    +CDF_tbl
    +
    ##       threshold  CDF
    +##  1: -0.94052127 0.05
    +##  2: -0.83748109 0.15
    +##  3: -0.71317882 0.25
    +##  4: -0.57443327 0.35
    +##  5: -0.41017671 0.45
    +##  6: -0.20424962 0.55
    +##  7:  0.06850182 0.65
    +##  8:  0.41462712 0.75
    +##  9:  0.94307172 0.85
    +## 10:  2.09633977 0.95
    +
    +
    +
    +
    +

    3. Dependence & Nonlinear Association

    +
    +

    3.1 Why move beyond Pearson \(r\)

    +

    Pearson captures linear monotone relationships. Many structures (U‑shapes, saturation, asymmetric tails) produce near‑zero \(r\) despite strong dependence. Partial‑moment dependence metrics respond to such structure.

    +

    Headers.
    - Co.LPM(degree, target, x, y) / Co.UPM(...) (co‑partial moments)
    - PM.matrix(l_degree, u_degree, target=NULL, variable, pop_adj=TRUE)
    - NNS.dep(x, y) (scalar dependence coefficient)
    - NNS.copula(X, target=NULL, continuous=TRUE, plot=FALSE, independence.overlay=FALSE)

    +
    +
    +

    3.2 Code: nonlinear dependence

    +
    set.seed(1)
    +x <- runif(2000,-1,1)
    +y <- x^2 + rnorm(2000, sd=.05)
    +cat(sprintf("Pearson r = %.4f\n", cor(x,y)))
    +
    ## Pearson r = 0.0006
    +
    cat(sprintf("NNS.dep  = %.4f\n", NNS.dep(x,y)$Dependence))
    +
    ## NNS.dep  = 0.7097
    +
    X <- data.frame(a=x, b=y, c=x*y + rnorm(2000, sd=.05))
    +pm <- PM.matrix(1, 1, target = "means", variable=X, pop_adj=TRUE)
    +pm
    +
    ## $cupm
    +##            a          b          c
    +## a 0.17384174 0.05668152 0.10450858
    +## b 0.05668152 0.05566363 0.04414923
    +## c 0.10450858 0.04414923 0.07529373
    +## 
    +## $dupm
    +##              a          b            c
    +## a 0.0000000000 0.05675501 0.0005598221
    +## b 0.0143108307 0.00000000 0.0036839026
    +## c 0.0004239566 0.04430691 0.0000000000
    +## 
    +## $dlpm
    +##              a           b            c
    +## a 0.0000000000 0.014310831 0.0004239566
    +## b 0.0567550147 0.000000000 0.0443069142
    +## c 0.0005598221 0.003683903 0.0000000000
    +## 
    +## $clpm
    +##            a           b           c
    +## a 0.16803827 0.014485430 0.102709867
    +## b 0.01448543 0.037120650 0.003051617
    +## c 0.10270987 0.003051617 0.074865823
    +## 
    +## $cov.matrix
    +##              a             b            c
    +## a 0.3418800141  0.0001011068  0.206234664
    +## b 0.0001011068  0.0927842833 -0.000789973
    +## c 0.2062346637 -0.0007899730  0.150159552
    +
    cop <- NNS.copula(X, continuous=TRUE, plot=FALSE)
    +cop
    +
    ## [1] 0.5692785
    +
    +
    +

    3.3 Code: copula

    +
    # Data
    +set.seed(123); x = rnorm(100); y = rnorm(100); z = expand.grid(x, y)
    +
    +# Plot
    +rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "red")
    +
    +# Uniform values
    +u_x = LPM.ratio(0, x, x); u_y = LPM.ratio(0, y, y); z = expand.grid(u_x, u_y)
    +
    +# Plot
    +rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "blue")
    +

    Interpretation. NNS.dep remains high for curved relationships; PM.matrix collects co‑partial moments across variables; NNS.copula summarizes higher‑dimensional dependence using partial‑moment ratios. Copulas are returned and evaluated via Co.LPM functions.

    +
    +
    +
    +
    +

    4. Hypothesis Testing & ANOVA (LPM‑CDF)

    +
    +

    4.1 Concept

    +

    Instead of distributional assumptions, compare groups via LPM‑based CDFs. Output is a degree of certainty (not a p‑value) for equality of populations or means.

    +

    Header. NNS.ANOVA(control, treatment, means.only=FALSE, medians=FALSE, confidence.interval=.95, tails=c("Both","left","right"), pairwise=FALSE, plot=TRUE, robust=FALSE)

    +
    +
    +

    4.2 Code: two‑sample & multi‑group

    +
    ctrl <- rnorm(200, 0, 1)
    +trt  <- rnorm(180, 0.35, 1.2)
    +NNS.ANOVA(control=ctrl, treatment=trt, means.only=FALSE, plot=FALSE)
    +
    ## $Control
    +## [1] -0.02110331
    +## 
    +## $Treatment
    +## [1] 0.4020782
    +## 
    +## $Grand_Statistic
    +## [1] 0.1904875
    +## 
    +## $Control_CDF
    +## [1] 0.6311761
    +## 
    +## $Treatment_CDF
    +## [1] 0.3869042
    +## 
    +## $Certainty
    +## [1] 0.3904966
    +## 
    +## $Effect_Size_LB
    +##      2.5% 
    +## 0.1379073 
    +## 
    +## $Effect_Size_UB
    +##     97.5% 
    +## 0.7182389 
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +
    A <- list(g1=rnorm(150,0.0,1.1), g2=rnorm(150,0.2,1.0), g3=rnorm(150,-0.1,0.9))
    +NNS.ANOVA(control=A, means.only=TRUE, plot=FALSE)
    +
    ## Certainty 
    +## 0.4870367
    +

    Math sketch. For each quantile/threshold \(t\), compare CDFs built from LPM.ratio(0, t, •) (possibly with one‑sided tails). Aggregate across \(t\) to a certainty score.
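The degree-0 building block in the sketch above can be illustrated in base R. This is a minimal sketch under stated assumptions: `lpm_ratio0()` is a hypothetical helper, not the NNS function, and the choice of the mean of the two group means as the comparison point is only inferred from the printed `$Grand_Statistic`. The degree and aggregation that NNS.ANOVA applies internally are not reproduced here.

```r
# Degree-0 lower partial moment ratio: the share of observations at or below t.
# lpm_ratio0() is a hypothetical base-R helper, not part of NNS.
lpm_ratio0 <- function(t, x) mean(x <= t)

set.seed(1)
ctrl <- rnorm(200, 0, 1)
trt  <- rnorm(180, 0.35, 1.2)

# Assumed comparison point: the mean of the two group means.
grand_stat <- mean(c(mean(ctrl), mean(trt)))

# Each group's CDF evaluated at the shared point.
cdf_ctrl <- lpm_ratio0(grand_stat, ctrl)
cdf_trt  <- lpm_ratio0(grand_stat, trt)

# At degree 0 the ratio coincides with the empirical CDF.
same_as_ecdf <- isTRUE(all.equal(cdf_ctrl, ecdf(ctrl)(grand_stat)))
```

If both groups came from the same population, the two CDF values at the shared point would be close to each other; a certainty score summarizes how far they drift apart.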

    +
    +
    +
    +
    +

    5. Regression, Boosting, Stacking & Causality

    +
    +

    5.1 Philosophy

    +

    NNS.reg learns partitioned relationships using partial‑moment weights (linear where appropriate, nonlinear where needed), avoiding fragile global parametric forms.

    +

    Headers.
    - NNS.reg(x, y, order=NULL, smooth=TRUE, ncores=1, ...) → $Fitted.xy, $Point.est, …
    - NNS.boost(IVs.train, DV.train, IVs.test, epochs, learner.trials, status, balance, type, folds)
    - NNS.stack(IVs.train, DV.train, IVs.test, type, balance, ncores, folds)
    - NNS.caus(x, y) (directional causality score via conditional dependence)

    +
    +
    +

    5.2 Code: classification via regression + ensembles

    +
    # Example 1: Nonlinear regression
    +set.seed(123)
    +x_train <- runif(200, -2, 2)
    +y_train <- sin(pi * x_train) + rnorm(200, sd = 0.2)
    +
    +x_test <- seq(-2, 2, length.out = 100)
    +
    +NNS.reg(x = data.frame(x = x_train), y = y_train, order = NULL)
    +

    +
    ## $R2
    +## [1] 0.9311519
    +## 
    +## $SE
    +## [1] 0.2026925
    +## 
    +## $Prediction.Accuracy
    +## NULL
    +## 
    +## $equation
    +## NULL
    +## 
    +## $x.star
    +## NULL
    +## 
    +## $derivative
    +##     Coefficient X.Lower.Range X.Upper.Range
    +##  1:   1.2434405   -1.99750091   -1.73845327
    +##  2:   1.0742631   -1.73845327   -1.44607946
    +##  3:  -1.7569574   -1.44607946   -1.15089951
    +##  4:  -2.8177481   -1.15089951   -0.92639490
    +##  5:  -2.7344640   -0.92639490   -0.71353077
    +##  6:  -0.9100977   -0.71353077   -0.48670931
    +##  7:   0.9413676   -0.48670931   -0.26368123
    +##  8:   3.1046330   -0.26368123   -0.07056066
    +##  9:   2.3015348   -0.07056066    0.09052852
    +## 10:   3.1725079    0.09052852    0.25161770
    +## 11:   1.3891814    0.25161770    0.45309215
    +## 12:  -1.2585111    0.45309215    0.66464355
    +## 13:  -2.5182434    0.66464355    1.01253792
    +## 14:  -2.9012891    1.01253792    1.23520889
    +## 15:  -0.7677850    1.23520889    1.53007549
    +## 16:   1.8706437    1.53007549    1.63683301
    +## 17:   2.2033668    1.63683301    1.84911541
    +## 18:   2.5081594    1.84911541    1.97707911
    +## 
    +## $Point.est
    +## NULL
    +## 
    +## $pred.int
    +## NULL
    +## 
    +## $regression.points
    +##               x           y
    +##  1: -1.99750091  0.30256920
    +##  2: -1.73845327  0.62467954
    +##  3: -1.44607946  0.93876593
    +##  4: -1.15089951  0.42014733
    +##  5: -0.92639490 -0.21245010
    +##  6: -0.71353077 -0.79451941
    +##  7: -0.48670931 -1.00094909
    +##  8: -0.26368123 -0.79099767
    +##  9: -0.07056066 -0.19142920
    +## 10:  0.09052852  0.17932316
    +## 11:  0.25161770  0.69037987
    +## 12:  0.45309215  0.97026443
    +## 13:  0.66464355  0.70402465
    +## 14:  1.01253792 -0.17205805
    +## 15:  1.23520889 -0.81809091
    +## 16:  1.53007549 -1.04448505
    +## 17:  1.63683301 -0.84477976
    +## 18:  1.84911541 -0.37704378
    +## 19:  1.97707911 -0.05609043
    +## 
    +## $Fitted.xy
    +##             x.x          y      y.hat NNS.ID   gradient    residuals standard.errors
    +##   1: -0.8496899 -0.5969396 -0.4221971 q11222 -2.7344640  0.174742446       0.2830616
    +##   2:  1.1532205 -0.4116052 -0.5802190 q21222 -2.9012891 -0.168613758       0.1682392
    +##   3: -0.3640923 -0.9595645 -0.8855214 q12122  0.9413676  0.074043066       0.2096766
    +##   4:  1.5320696 -1.0644376 -1.0407547 q22122  1.8706437  0.023682815       0.1380874
    +##   5:  1.7618691 -0.8705785 -0.5692793 q22212  2.2033668  0.301299171       0.2328510
    +##  ---                                                                                
    +## 196: -0.1338692 -0.3949338 -0.3879789 q12221  3.1046330  0.006954855       0.1281159
    +## 197: -0.3726696 -0.5476828 -0.8935958 q12122  0.9413676 -0.345913048       0.2096766
    +## 198:  0.6369213  0.6387223  0.7389134 q21211 -1.2585111  0.100191135       0.2474949
    +## 199: -1.3906135  0.9457286  0.8413147 q11122 -1.7569574 -0.104413995       0.2329677
    +## 200:  0.2914682  1.0429566  0.7457395 q21112  1.3891814 -0.297217109       0.2435048
    +
    # Simple train/test for boosting & stacking
    +test.set = 141:150
    + 
    +boost <- NNS.boost(IVs.train = iris[-test.set, 1:4], 
    +              DV.train = iris[-test.set, 5],
    +              IVs.test = iris[test.set, 1:4],
    +              epochs = 10, learner.trials = 10, 
    +              status = FALSE, balance = TRUE,
    +              type = "CLASS", folds = 1)
    +
    +
    +mean(boost$results == as.numeric(iris[test.set,5]))
    +[1] 1
    +
    +
    +boost$feature.weights; boost$feature.frequency
    +
    +stacked <- NNS.stack(IVs.train = iris[-test.set, 1:4], 
    +                     DV.train = iris[-test.set, 5],
    +                     IVs.test = iris[test.set, 1:4],
    +                     type = "CLASS", balance = TRUE,
    +                     ncores = 1, folds = 1)
    +mean(stacked$stack == as.numeric(iris[test.set,5]))
    +[1] 1
    +
    +
    +

    5.3 Code: directional causality

    +
    NNS.caus(mtcars$hp,  mtcars$mpg)  # hp -> mpg
    +
    ## Causation.x.given.y Causation.y.given.x           C(x--->y) 
    +##           0.2607148           0.3863580           0.3933374
    +
    NNS.caus(mtcars$mpg, mtcars$hp)   # mpg -> hp
    +
    ## Causation.x.given.y Causation.y.given.x           C(y--->x) 
    +##           0.3863580           0.2607148           0.3933374
    +

    Interpretation. Examine asymmetry in scores to infer direction. The method conditions partial‑moment dependence on candidate drivers.

    +
    +
    +
    +
    +

    6. Time Series & Forecasting

    +

    Headers. NNS.ARMA, NNS.ARMA.optim, NNS.seas, NNS.VAR

    +
    # Univariate nonlinear ARMA
    +z <- as.numeric(scale(sin(1:480/8) + rnorm(480, sd=.35)))
    +
    +# Seasonality detection (prints a summary)
    +NNS.seas(z, plot = FALSE)
    +
    ## $all.periods
    +##      Period Coefficient.of.Variation Variable.Coefficient.of.Variation
    +##   1:     99             5.122054e-01                      7.866136e+16
    +##   2:    147             5.256021e-01                      7.866136e+16
    +##   3:    100             5.598477e-01                      7.866136e+16
    +##   4:    146             5.618687e-01                      7.866136e+16
    +##   5:    199             5.766158e-01                      7.866136e+16
    +##  ---                                                                  
    +## 235:    235             2.273832e+02                      7.866136e+16
    +## 236:     30             4.402306e+02                      7.866136e+16
    +## 237:     11             4.879713e+02                      7.866136e+16
    +## 238:     46             5.854636e+02                      7.866136e+16
    +## 239:      1             2.621230e+16                      7.866136e+16
    +## 
    +## $best.period
    +## Period 
    +##     99 
    +## 
    +## $periods
    +##   [1]  99 147 100 146 199  98  97 200  49 198  48  96 145 196 194 193 195 197 144  50 192 149 176  53 191 148  95 156 190 106 150 202 239  52 143 201 105 142 189 209 158 238 157 211 160 155
    +##  [47] 107 212 177 104 210 188 154 208 126 163 152 103 207 127 172 117 159 128 206 109 175 224 179  47 170 221 227 171  86  85 130  54 222 129 236 178 153 141 237 108 203 181  34 101 180 226
    +##  [93] 161  94  87 213 162  43 115 113 166 102  80  59 204  66  40 135  61 217  81  90 131  56  44 223  83  60 132  84  74  93  88  45 229 167 173 133 122 116 151  70 174 187  51 112 219 228
    +## [139] 121 136 232 120  37 169  25 182  22  14 186  82 118  89 184  39  72  67 234  64  18 140  91 216  57 124 233 168  32  16 134  73  75  92 139  36  41 220 230  58 123  69  38  76  28 164
    +## [185]  26 138 225 125 165 185  68  55  65  35   2 183   6  42  78 215 137  21   8  20  15  62   7 231  79  13 205  24  17 119  12  31 214  63  33  27 111   3   9  19  10 218 114   4  23   5
    +## [231] 110  71  77  29 235  30  11  46   1
    +
    # Validate seasonal periods
    +NNS.ARMA.optim(z, h=48, seasonal.factor = NNS.seas(z, plot = FALSE)$periods, plot = TRUE)
    +
    ## [1] "CURRNET METHOD: lin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 52 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 0.449145584053097"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 52, 49 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 0.364719193840196"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 52, 49, 50 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 0.303033712560494"
    +## [1] "BEST method = 'lin', seasonal.factor = c( 52, 49, 50 )"
    +## [1] "BEST lin OBJECTIVE FUNCTION = 0.303033712560494"
    +## [1] "CURRNET METHOD: nonlin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'nonlin' , seasonal.factor =  c( 52, 49, 50 ) ...)"
    +## [1] "CURRENT nonlin OBJECTIVE FUNCTION = 1.58051085217018"
    +## [1] "BEST method = 'nonlin' PATH MEMBER = c( 52, 49, 50 )"
    +## [1] "BEST nonlin OBJECTIVE FUNCTION = 1.58051085217018"
    +## [1] "CURRNET METHOD: both"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'both' , seasonal.factor =  c( 52, 49, 50 ) ...)"
    +## [1] "CURRENT both OBJECTIVE FUNCTION = 0.447117273188387"
    +## [1] "BEST method = 'both' PATH MEMBER = c( 52, 49, 50 )"
    +## [1] "BEST both OBJECTIVE FUNCTION = 0.447117273188387"
    +

    +
    ## $periods
    +## [1] 52 49 50
    +## 
    +## $weights
    +## NULL
    +## 
    +## $obj.fn
    +## [1] 0.3030337
    +## 
    +## $method
    +## [1] "lin"
    +## 
    +## $shrink
    +## [1] FALSE
    +## 
    +## $nns.regress
    +## [1] FALSE
    +## 
    +## $bias.shift
    +## [1] 0.1079018
    +## 
    +## $errors
    +##  [1]  0.06841911 -0.23650978  0.23306891 -0.31474170 -0.16347937  0.56078801 -0.19340157 -0.54788961 -0.34463351 -0.04971714  0.81131522 -1.04034772 -0.01124973  0.18532001  0.42228850
    +## [16]  0.77875534  0.21204992  0.75989291  0.03648050 -0.12410190  0.78169808 -0.37190642  0.04673305  0.22143951  0.21784535 -0.36207177  0.06303110  0.27494889  0.61355674 -0.36588877
    +## [31] -0.53670212 -0.59710016 -0.33562214  0.52319489 -0.28558752 -0.06318330  0.46174079  0.85423779 -0.17957169  0.88745345 -0.22575406 -0.65533631 -0.50769155 -0.18710610 -0.19702948
    +## [46] -0.61676209 -0.64456532 -0.60764796  0.39155766 -0.99138140 -0.58599672 -0.41332955 -0.35110299 -0.31785231 -0.33368188 -0.79321483 -0.67548303 -0.29994123 -1.40951519 -0.23496159
    +## [61] -0.11326961  0.93761236 -1.12638974  0.56134385 -0.82647659 -0.15698867 -0.66092883 -0.23941287 -0.11793511 -0.13131032  0.23980082  0.11145491 -0.29324462  0.20996125  1.18368703
    +## [76]  0.39817389  0.11233666  0.18104853 -0.34704039  1.00778283 -0.12855809 -0.44890273 -0.16127326  0.23878907  0.08958084 -0.42127816  0.83025782  0.21535622  0.18499525  0.55580864
    +## [91] -0.63033063 -1.18279040  0.01593275 -0.38943895 -0.73303803  0.24461725
    +## 
    +## $results
    +##  [1] -0.48668691 -1.12897072 -1.19746701 -1.08366883 -1.17430589 -1.09503747 -1.27716033 -1.53097703 -1.12872199 -1.25223633 -1.03806114 -1.03996796 -0.86012134 -0.85334076 -1.29205205
    +## [16] -0.41549038 -0.43645317 -0.70613650 -0.12979825 -0.20838439 -0.43674783  0.04939343  0.21128735  0.41848130  0.58665881  0.65204679  1.11856457  0.81855013  1.33909393  1.17188036
    +## [31]  1.53195554  1.09195702  1.71916462  1.49064664  1.62340003  1.71600851  1.54806110  1.34095102  1.16533567  1.08567248  0.73267472  0.82949748  0.65297434  0.09082443  0.55476313
    +## [46]  0.53988329 -0.09198186 -0.18380226
    +## 
    +## $lower.pred.int
    +##  [1] -1.51339152 -2.15567532 -2.22417162 -2.11037344 -2.20101050 -2.12174208 -2.30386494 -2.55768164 -2.15542660 -2.27894094 -2.06476575 -2.06667256 -1.88682595 -1.88004537 -2.31875666
    +## [16] -1.44219499 -1.46315778 -1.73284111 -1.15650286 -1.23508900 -1.46345244 -0.97731118 -0.81541725 -0.60822331 -0.44004579 -0.37465782  0.09185996 -0.20815448  0.31238932  0.14517575
    +## [31]  0.50525093  0.06525242  0.69246002  0.46394203  0.59669542  0.68930391  0.52135649  0.31424642  0.13863106  0.05896787 -0.29402989 -0.19720713 -0.37373026 -0.93588018 -0.47194148
    +## [46] -0.48682132 -1.11868647 -1.21050687
    +## 
    +## $upper.pred.int
    +##  [1]  0.54001770 -0.10226611 -0.17076240 -0.05696423 -0.14760128 -0.06833286 -0.25045572 -0.50427242 -0.10201738 -0.22553172 -0.01135653 -0.01326335  0.16658327  0.17336385 -0.26534744
    +## [16]  0.61121423  0.59025144  0.32056811  0.89690636  0.81832022  0.58995678  1.07609804  1.23799196  1.44518591  1.61336342  1.67875140  2.14526918  1.84525474  2.36579853  2.19858497
    +## [31]  2.55866015  2.11866163  2.74586923  2.51735125  2.65010464  2.74271312  2.57476571  2.36765563  2.19204028  2.11237709  1.75937933  1.85620209  1.67967895  1.11752903  1.58146774
    +## [46]  1.56658789  0.93472275  0.84290235
    +

    Notes. NNS seasonality detection ranks candidate periods by coefficient of variation rather than by ACF/PACF, and NNS ARMA blends multiple seasonal periods into the linear or nonlinear regression forecasts.
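The coefficient-of-variation idea in the note above can be sketched in base R. This is a simplified illustration, not the NNS.seas implementation: `cv()` and `period_cv()` are hypothetical helpers, and the series, its length, and the candidate range are arbitrary choices. For each candidate period, the series is subsampled at that spacing counting back from the last observation; when the spacing matches the true cycle, the subsample sits at one phase and varies little.

```r
# cv(): coefficient of variation, the standard deviation relative to |mean|.
cv <- function(v) sd(v) / abs(mean(v))

# period_cv(): hypothetical helper. CV of every p-th observation,
# counting back from the most recent one.
period_cv <- function(x, p) {
  idx <- rev(seq(length(x), 1, by = -p))
  cv(x[idx])
}

set.seed(42)
n <- 483
# Positive-level series with a 12-step cycle plus noise.
x <- 10 + 3 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.3)

candidates <- 2:20
cvs  <- sapply(candidates, function(p) period_cv(x, p))
best <- candidates[which.min(cvs)]  # candidate with the lowest subsample CV
```

The series is kept away from a zero mean because the coefficient of variation is unstable when the mean is near zero.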

    +
    +
    +
    +

    7. Simulation, Bootstrap & Risk‑Neutral Rescaling

    +
    +

    7.1 Maximum entropy bootstrap (shape‑preserving)

    +

    Header. NNS.meboot(x, reps=999, rho=NULL, type="spearman", drift=TRUE, ...)

    +
    x_ts <- cumsum(rnorm(350, sd=.7))
    +mb <- NNS.meboot(x_ts, reps=5, rho = 1)
    +dim(mb["replicates", ]$replicates)
    +
    ## [1] 350   5
    +
    +
    +

    7.2 Monte Carlo over the full correlation space

    +

    Header. NNS.MC(x, reps=30, lower_rho=-1, upper_rho=1, by=.01, exp=1, type="spearman", ...)

    +
    mc <- NNS.MC(x_ts, reps=5, lower_rho=-1, upper_rho=1, by=.5, exp=1)
    +length(mc$ensemble); head(names(mc$replicates),5)
    +
    ## [1] 350
    +
    ## [1] "rho = 1"    "rho = 0.5"  "rho = 0"    "rho = -0.5" "rho = -1"
    +
    +
    +

    7.3 Risk‑neutral rescale (pricing context)

    +

    Header. NNS.rescale(x, a, b, method=c("minmax","riskneutral"), T=NULL, type=c("Terminal","Discounted"))

    +
    px <- 100 + cumsum(rnorm(260, sd = 1))
    +rn <- NNS.rescale(px, a=100, b=0.03, method="riskneutral", T=1, type="Terminal")
    +c( target = 100*exp(0.03*1), mean_rn = mean(rn) )
    +
    ##   target  mean_rn 
    +## 103.0455 103.0455
    +

    Interpretation. riskneutral shifts the mean to match \(S_0 e^{rT}\) (Terminal) or \(S_0\) (Discounted), preserving distributional shape.
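The mean-matching step for the Terminal case can be sketched in base R. This is only an illustration under an assumption: `rescale_to_target()` is a hypothetical helper that applies a single multiplicative constant, which is one simple way to hit the target mean while keeping the shape of the distribution up to scale. The transformation NNS.rescale actually applies may differ.

```r
# rescale_to_target(): hypothetical helper, not the NNS function.
# Multiplies the series by one constant so that its mean equals S0 * exp(r * horizon).
rescale_to_target <- function(x, S0, r, horizon) {
  target <- S0 * exp(r * horizon)
  x * (target / mean(x))
}

set.seed(7)
px <- 100 + cumsum(rnorm(260, sd = 1))

rn     <- rescale_to_target(px, S0 = 100, r = 0.03, horizon = 1)
target <- 100 * exp(0.03 * 1)

# The rescaled mean equals the risk-neutral target up to floating-point error.
gap <- abs(mean(rn) - target)
```

Because every value is multiplied by the same positive constant, ranks and relative spacings of the observations are unchanged.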

    +
    +
    +
    +
    +

    8. Portfolio & Stochastic Dominance

    +

    Stochastic dominance orders uncertain prospects for broad classes of risk‑averse utilities; partial moments supply practical, nonparametric estimators.
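As a point of reference, first-degree dominance can be checked directly from empirical CDFs in base R. This is a textbook-style sketch, not the NNS.FSD.uni implementation: `fsd()` is a hypothetical helper, and the data are constructed so that dominance holds by design.

```r
# fsd(): hypothetical helper. TRUE when x first-degree dominates y, i.e. the
# empirical CDF of x is never above that of y and is strictly below somewhere.
fsd <- function(x, y) {
  grid <- sort(c(x, y))
  Fx <- ecdf(x)(grid)
  Fy <- ecdf(y)(grid)
  all(Fx <= Fy) && any(Fx < Fy)
}

set.seed(5)
y <- rnorm(200)
x <- y + 1          # x is y shifted up, so x dominates y by construction

x_dominates_y <- fsd(x, y)
y_dominates_x <- fsd(y, x)
```

With independent noisy samples, such as RA and RB in this section, the CDFs typically cross, which is consistent with a result of 0 (no dominance).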

    +

    Headers.
    - NNS.FSD.uni(x, y), NNS.SSD.uni(x, y), NNS.TSD.uni(x, y)
    - NNS.SD.cluster(R), NNS.SD.efficient.set(R)

    +
    RA <- rnorm(240, 0.005, 0.03)
    +RB <- rnorm(240, 0.003, 0.02)
    +RC <- rnorm(240, 0.006, 0.04)
    +
    +NNS.FSD.uni(RA, RB)
    +
    ## [1] 0
    +
    NNS.SSD.uni(RA, RB)
    +
    ## [1] 0
    +
    NNS.TSD.uni(RA, RB)
    +
    ## [1] 0
    +
    Rmat <- cbind(A=RA, B=RB, C=RC)
    +try(NNS.SD.cluster(Rmat, degree = 1))
    +
    ## $Clusters
    +## $Clusters$Cluster_1
    +## [1] "C" "A" "B"
    +
    try(NNS.SD.efficient.set(Rmat, degree = 1))
    +
    ## Checking 1 of 2Checking 2 of 2
    +
    ## [1] "C" "A" "B"
    +
    +
    +
    +

    Appendix A — Measure‑theoretic sketch (why partial moments are rigorous)

    +

    Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space, \(X: \Omega\to\mathbb{R}\) measurable. For any fixed \(t\in\mathbb{R}\), the sets \(\{X\le t\}\) and \(\{X>t\}\) are in \(\mathcal{F}\) because they are preimages of Borel sets. The population partial moments are

    +

    \[ \operatorname{LPM}(k,t,X) = \int_{-\infty}^{t} (t-x)^k\, dF_X(x), \qquad \operatorname{UPM}(k,t,X) = \int_{t}^{\infty} (x-t)^k\, dF_X(x). \]

    +

    The empirical versions correspond to replacing \(F_X\) with the empirical measure \(\mathbb{P}_n\) (or CDF \(\hat F_n\)):

    +

    \[ \widehat{\operatorname{LPM}}(k,t,X) = \int_{(-\infty,t]} (t-x)^k\, d\mathbb{P}_n(x), \qquad \widehat{\operatorname{UPM}}(k,t,X) = \int_{(t,\infty)} (x-t)^k\, d\mathbb{P}_n(x). \]

    +

    Centering at \(t=\mu_X\) yields the variance decomposition identity in Section 1.
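The empirical definitions can be checked numerically in base R. The sketch below assumes that the identity referred to is the degree-2 decomposition: with the target set to the sample mean, the lower and upper partial moments sum to the population-style variance (divisor n, not n - 1). The helpers `lpm()` and `upm()` are hypothetical stand-ins written for this check, not the NNS functions.

```r
# Empirical partial moments for degree k >= 1 (hypothetical base-R helpers).
# pmax(..., 0) keeps only deviations on the relevant side of the target t.
lpm <- function(k, t, x) mean(pmax(t - x, 0)^k)
upm <- function(k, t, x) mean(pmax(x - t, 0)^k)

set.seed(3)
x  <- rexp(1000)          # a skewed sample, so the two sides differ
mu <- mean(x)

pop_var    <- mean((x - mu)^2)              # variance with divisor n
decomposed <- lpm(2, mu, x) + upm(2, mu, x) # below-mean part + above-mean part

identity_holds <- isTRUE(all.equal(pop_var, decomposed))
```

The two pieces are unequal for skewed data, which is the information a single variance figure hides.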

    +
    +
    +
    +

    Appendix B — Quick Reference (Grouped by Topic)

    +
    +

    1. Partial Moments & Ratios

    +
      +
    • LPM(degree, target, variable) — lower partial moment of order degree at target.
    • UPM(degree, target, variable) — upper partial moment of order degree at target.
    • LPM.ratio(degree, target, variable); UPM.ratio(...) — normalized shares; degree=0 gives CDF.
    • LPM.VaR(p, degree, variable) — partial-moment quantile at probability p.
    • Co.LPM(degree, target, x, y) — co-lower partial moment between two variables.
    • Co.UPM(degree, target, x, y) — co-upper partial moment between two variables.
    • D.LPM(degree, target, variable) — divergent lower partial moment (away from target).
    • D.UPM(degree, target, variable) — divergent upper partial moment (away from target).
    • NNS.CDF(x, target = NULL, points = NULL, plot = TRUE/FALSE) — CDF from partial moments.
    • NNS.moments(x) — mean/var/skew/kurtosis via partial moments.
    +
    +
    +

    2. Descriptive Statistics & Distributions

    +
      +
    • NNS.mode(x, multi=FALSE) — nonparametric mode(s).
    • PM.matrix(l_degree, u_degree, target, variable, pop_adj) — co-/divergent partial-moment matrices.
    • NNS.gravity(x, w = NULL) — partial-moment weighted location (gravity center).
    • NNS.norm(x, method = "moment") — normalization retaining target moments.
    +

    See NNS Vignette: Getting Started with NNS: Partial Moments

    +
    +
    +

    3. Dependence & Association

    +
      +
    • NNS.dep(x, y) — nonlinear dependence coefficient.
    • NNS.copula(X, target, continuous, plot, independence.overlay) — dependence from co-partial moments.
    +

    See NNS Vignette: Getting Started with NNS: Correlation and Dependence

    +
    +
    +

    4. Hypothesis Testing

    +
      +
    • NNS.ANOVA(control, treatment, ...) — certainty of equality (distributions or means).
    +

    See NNS Vignette: Getting Started with NNS: Comparing Distributions

    +
    +
    +

    5. Regression, Classification & Causality

    +
      +
    • NNS.part(x, y, ...) — partition analysis for variable segmentation.
    • NNS.reg(x, y, ...) — partition-based regression/classification ($Fitted.xy, $Point.est).
    • NNS.boost(IVs, DV, ...), NNS.stack(IVs, DV, ...) — ensembles using NNS.reg base learners.
    • NNS.caus(x, y) — directional causality score.
    +

    See NNS Vignette: Getting Started with NNS: Clustering and Regression

    +

    See NNS Vignette: Getting Started with NNS: Classification

    +
    +
    +

    6. Differentiation & Slope Measures

    +
      +
    • dy.dx(x, y) — numerical derivative of y with respect to x via partial moments.
    • dy.d_(x, Y, var) — partial derivative of multivariate Y w.r.t. var.
    • NNS.diff(x, y) — derivative via secant projections.
    +
    +
    +

    7. Time Series & Forecasting

    +
      +
    • NNS.ARMA(...), NNS.ARMA.optim(...) — nonlinear ARMA modeling.
    • NNS.seas(...) — detect seasonality.
    • NNS.VAR(...) — nonlinear VAR modeling.
    • NNS.nowcast(x, h, ...) — near-term nonlinear forecast.
    +

    See NNS Vignette: Getting Started with NNS: Forecasting

    +
    +
    +

    8. Simulation, Bootstrap & Rescaling

    +
      +
    • NNS.meboot(...) — maximum entropy bootstrap.
    • NNS.MC(...) — Monte Carlo over correlation space.
    • NNS.rescale(...) — risk-neutral or min–max rescaling.
    +

    See NNS Vignette: Getting Started with NNS: Sampling and Simulation

    +
    +
    +

    9. Portfolio Analysis & Stochastic Dominance

    +
      +
    • NNS.FSD.uni(x, y), NNS.SSD.uni(x, y), NNS.TSD.uni(x, y) — univariate stochastic dominance tests.
    • NNS.SD.cluster(R), NNS.SD.efficient.set(R) — dominance-based portfolio sets.
    +

    For complete references, please see the Vignettes linked above and their specific referenced materials.

    +
    +
    + + + + + + + + + + + diff --git a/tools/NNS/examples/Nonlinear_Correlation_and_Dependence_Using_NNS.pdf b/tools/NNS/examples/Nonlinear_Correlation_and_Dependence_Using_NNS.pdf new file mode 100644 index 0000000..dc0b5f7 Binary files /dev/null and b/tools/NNS/examples/Nonlinear_Correlation_and_Dependence_Using_NNS.pdf differ diff --git a/tools/NNS/examples/Nonlinear_FWL.pdf b/tools/NNS/examples/Nonlinear_FWL.pdf new file mode 100644 index 0000000..dc5e855 Binary files /dev/null and b/tools/NNS/examples/Nonlinear_FWL.pdf differ diff --git a/tools/NNS/examples/Normalization.pdf b/tools/NNS/examples/Normalization.pdf new file mode 100644 index 0000000..4d8d6a8 Binary files /dev/null and b/tools/NNS/examples/Normalization.pdf differ diff --git a/tools/NNS/examples/PWT.html b/tools/NNS/examples/PWT.html new file mode 100644 index 0000000..e59de32 --- /dev/null +++ b/tools/NNS/examples/PWT.html @@ -0,0 +1,4071 @@ + + + + + + + + + + + + + + +Penn World Table 9.1 in R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    1 Intro

    +

    The objective of this analysis is to capture any causative effects on Real GDP (rgdpe) in the Penn World Table data. We first test several techniques against a theoretical ground truth, and then apply them to other variables of interest.

    +
    +

    1.1 Ground Truth

    +

    The ground truth we have identified is that Real GDP causes Total Factor Productivity ctfp. Why is this? The contention is that ctfp is essentially measuring economies of scale, and in order to invoke those economies of scale, a country must first achieve scale! Thus, we assume rgdpe causes ctfp.

    +

    A note on the difference between cwtfp and ctfp: ctfp is based on relative real GDP from the output side, while cwtfp is based on relative real domestic absorption, and cwtfp levels are generally lower than their ctfp counterparts [Feenstra (2015)].

    +

    Our results are robust to either measure of Total Factor Productivity.

    +
    +
    +
    +

    2 PWT 9.1 Data

    +

    Download and load the pwt9 package in R. Other required R packages are loaded as needed throughout:

    +

    data.table, NNS, generalCorr, and lmtest.

    +
    library(data.table)
    +library(pwt9)
    +
    +# Load pwt data into R
    +data("pwt9.1")
    +
    +# Create a data.table of pwt data
    +pwt <- data.table(pwt9.1)
    +
    +

    2.1 Step 1: Select only Countries with pop > 1mm

    +

    The first year of US data is 1954, so we drop earlier years and keep only countries with a population above 1 million (pop is measured in millions).

    +
    # Subset based on population > 1mm and Year >= 1954
    +pwt_1 <- pwt[pop > 1 & year >= 1954,]
    +
    +
    +

    2.2 Step 2: Find the Countries with No cwtfp Data

    +

    We need to eliminate the US from consideration because its cwtfp value is set to 1 for all dates, and thus offers no insight into causative effects.

    +
    # Create lists of countries only with full sets of observations for cwtfp and rgdpe
    +cwtfp_countries <- pwt_1[, sum(is.na(cwtfp))/.N, by = country]
    +full_cwtfp_countries <- cwtfp_countries[V1==0, country]
    +rgdpe_countries <- pwt_1[, sum(is.na(rgdpe))/.N, by = country]
    +full_rgdpe_countries <- rgdpe_countries[V1==0, country]
    +

    There are only 51 countries with complete datasets for cwtfp and rgdpe.

    +
    # Intersecting Countries List
    +Reduce(intersect, list(full_cwtfp_countries, full_rgdpe_countries))
    +
    ##  [1] "Argentina"                          "Australia"                         
    +##  [3] "Austria"                            "Belgium"                           
    +##  [5] "Bahrain"                            "Bolivia (Plurinational State of)"  
    +##  [7] "Brazil"                             "Botswana"                          
    +##  [9] "Canada"                             "Switzerland"                       
    +## [11] "Colombia"                           "Costa Rica"                        
    +## [13] "Germany"                            "Denmark"                           
    +## [15] "Ecuador"                            "Egypt"                             
    +## [17] "Spain"                              "Finland"                           
    +## [19] "France"                             "Gabon"                             
    +## [21] "United Kingdom"                     "Guatemala"                         
    +## [23] "India"                              "Ireland"                           
    +## [25] "Israel"                             "Italy"                             
    +## [27] "Jordan"                             "Japan"                             
    +## [29] "Kenya"                              "Kuwait"                            
    +## [31] "Sri Lanka"                          "Morocco"                           
    +## [33] "Mexico"                             "Mauritius"                         
    +## [35] "Namibia"                            "Netherlands"                       
    +## [37] "Norway"                             "New Zealand"                       
    +## [39] "Peru"                               "Philippines"                       
    +## [41] "Portugal"                           "Qatar"                             
    +## [43] "Sweden"                             "Eswatini"                          
    +## [45] "Thailand"                           "Trinidad and Tobago"               
    +## [47] "Turkey"                             "Uruguay"                           
    +## [49] "United States of America"           "Venezuela (Bolivarian Republic of)"
    +## [51] "South Africa"
    +
    # Subset based on complete observations and exclude USA
    +pwt_2 <- pwt_1[country%in%Reduce(intersect, list(full_cwtfp_countries, full_rgdpe_countries)),]
    +pwt_2 <- pwt_2[isocode!="USA",]
    +
    +
    +

    2.3 Step 3: NNS Causation Method

    +

    NNS causality tries to determine the conditional probability of two events by first normalizing each variable against its own past values (per the Granger insight) such that: \[x^* = f(x_{t-1}, x_{t-2},...,x_{t-n})\] \[y^* = f(y_{t-1}, y_{t-2},...,y_{t-n})\]

    +

    and then normalizing each of these new variables to a shared rangespace by:

    +

    \[ x^{**} = f(x^*, y^*)\] \[ y^{**} = g(x^*, y^*)\]

    +

    Once \(x^{**}\) and \(y^{**}\) are on the shared rangespace, the conditional probability can be ascertained and then the correlation applied to determine causation. Statistically, the aim is to show that correlation may be read as causative only for events that occur given the other event, that is, within the shared rangespace.

    +
    library(NNS)
    +# Determine NNS Causal Direction
    +NNS_Caus <- pwt_2[, NNS_Causation_Direction := names(NNS.caus(rgdpe, cwtfp, tau = 1)[3]), by = country]
    +
    +#Determine NNS Causal Magnitude
    +NNS_Caus <- pwt_2[, NNS_Causation := as.numeric(NNS.caus(rgdpe, cwtfp, tau = 1)[3]), by = country]
    +
    +# Obtain unique country NNS caus results
    +NNS_Caus[, unique(.SD), .SDcols = c("NNS_Causation", "NNS_Causation_Direction"), by = country]
    +
    + +
    +
    +

    2.3.1 NNS: Countries that do NOT show rgdpe causes cwtfp

    +
    # Isolate instances of reverse causality
    +NNS_Caus[NNS_Causation_Direction=="C(y--->x)", unique(country)]
    +
    ## [1] Gabon   Mexico  Uruguay
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +
    +
    +
    +

    2.4 Step 4: Granger Causality

    +

    According to Granger causality, if a variable \(x\) “Granger-causes” a variable \(y\), then past values of \(x\) should contain information that helps predict \(y\) above and beyond the information contained in past values of \(y\) alone.
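The definition above amounts to comparing two nested regressions, which can be sketched in base R. This is a minimal illustration on simulated data in which x drives y by construction; the lag order, coefficients, and variable names are arbitrary assumptions, and the sketch stands in for the idea only, not for the `lmtest::grangertest` internals.

```r
set.seed(11)
n <- 300
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))

# y depends on its own past and on the previous value of x.
y <- numeric(n)
for (i in 2:n) y[i] <- 0.3 * y[i - 1] + 0.8 * x[i - 1] + rnorm(1, sd = 0.5)

p  <- 2                                    # lag order (arbitrary choice)
ly <- embed(y, p + 1)                      # columns: y_t, y_{t-1}, ..., y_{t-p}
lx <- embed(x, p + 1)[, -1, drop = FALSE]  # columns: x_{t-1}, ..., x_{t-p}

# Restricted model: y explained by its own lags only.
restricted   <- lm(ly[, 1] ~ ly[, -1])
# Unrestricted model: lags of x added.
unrestricted <- lm(ly[, 1] ~ ly[, -1] + lx)

# F-test of the added x lags; a small p-value rejects "x does not Granger-cause y".
p_value <- anova(restricted, unrestricted)[["Pr(>F)"]][2]
```

A p-value above the chosen threshold means the lags of x add no detectable predictive information, which is how the country list in the next subsection is formed.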

    +
    library(lmtest)
    +
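    +# grangertest(rgdpe ~ cwtfp) tests H0: cwtfp does not Granger-cause rgdpe; store the p-value per country
    +granger <- pwt_2[, granger_causality := grangertest(rgdpe ~ cwtfp, order = 3)[["Pr(>F)"]][2], by = country]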
    +
    +

    2.4.1 Granger: Countries that do NOT show cwtfp causes rgdpe

    +
    # Isolate instances of insignificance, p > .05
    +granger[granger_causality > 0.05, unique(country)]
    +
    ##  [1] Australia                          Austria                           
    +##  [3] Belgium                            Bahrain                           
    +##  [5] Bolivia (Plurinational State of)   Brazil                            
    +##  [7] Botswana                           Canada                            
    +##  [9] Colombia                           Costa Rica                        
    +## [11] Germany                            Denmark                           
    +## [13] Ecuador                            Egypt                             
    +## [15] Spain                              Finland                           
    +## [17] France                             Gabon                             
    +## [19] United Kingdom                     Guatemala                         
    +## [21] India                              Israel                            
    +## [23] Italy                              Jordan                            
    +## [25] Japan                              Kenya                             
    +## [27] Kuwait                             Sri Lanka                         
    +## [29] Morocco                            Mexico                            
    +## [31] Mauritius                          Namibia                           
    +## [33] Netherlands                        Norway                            
    +## [35] New Zealand                        Peru                              
    +## [37] Philippines                        Portugal                          
    +## [39] Qatar                              Sweden                            
    +## [41] Eswatini                           Thailand                          
    +## [43] Trinidad and Tobago                Turkey                            
    +## [45] Venezuela (Bolivarian Republic of) South Africa                      
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +
    +
    +
    +

    2.5 Step 5: Generalized Correlations and Kernel Causality generalCorr

    +

    Vinod (2014) developed new generalized correlation coefficients so that when \(r^*(Y|X) > r^*(X|Y)\) then \(X\) is the “kernel cause” of \(Y\). Vinod (2015) argues that kernel causality amounts to model selection between two kernel regressions, \(E(Y|X) = g_1(X)\) and \(E(X|Y) = g_2(Y)\).

    +
    library(generalCorr)
    +
    +gc <- pwt_2[, gc_causality := causeSummBlk(cbind(cwtfp, rgdpe))[1], by = country]
    +
    ## [1] cwtfp     causes    rgdpe     strength= 31.496   
    +## [1] corr=  0.7497 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.4712 p-val=  9e-05  
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.2512  p-val=  0.04524
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   0.3752  p-val=  0.00225
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   0.3436  p-val=  0.30093
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.0705  p-val=  0.57987
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.2577 p-val=  0.0398 
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.8496 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.8452 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.6259 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 84.252   
    +## [1] corr=   -0.2082 p-val=  0.09877
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.8111 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=  0.8067 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.0399 p-val=  0.75416
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.6716 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.2202 p-val=  0.08044
    +## [1] cwtfp     causes    rgdpe     strength= 50.394   
    +## [1] corr=   0.2011  p-val=  0.11107
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=  0.7368 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=  0.5098 p-val= 2e-05 
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.6051 p-val=  0.00106
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.1478  p-val=  0.24368
    +## [1] rgdpe     causes    cwtfp     strength= -31.496  
    +## [1] corr=   -0.8752 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 31.496   
    +## [1] corr=  0.7024 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.1072  p-val=  0.39912
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   0.0292  p-val=  0.81863
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.1644  p-val=  0.19423
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.4816 p-val=  0.00015
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=  0.7532 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.7678 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 62.205   
    +## [1] corr=   -0.51   p-val=  0.00048
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.1058 p-val=  0.40537
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.7624 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.8394 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.7904 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -67.717  
    +## [1] corr=   -0.1591 p-val=  0.34013
    +## [1] cwtfp     causes    rgdpe     strength= 31.496   
    +## [1] corr=   1e-04   p-val=  0.99965
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   0.4265  p-val=  0.00044
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.1329 p-val=  0.29513
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.4297 p-val=  0.00039
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.1224 p-val=  0.33531
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.4118 p-val=  0.00072
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=   -0.7827 p-val=  0.00261
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.4059 p-val=  0.00087
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.9035 p-val=  0      
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.0441 p-val=  0.72947
    +## [1] cwtfp     causes    rgdpe     strength= 37.008   
    +## [1] corr=   -0.1857 p-val=  0.23315
    +## [1] rgdpe     causes    cwtfp     strength= -100     
    +## [1] corr=  0.8206 p-val= 0     
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.5151 p-val=  1e-05  
    +## [1] cwtfp     causes    rgdpe     strength= 100      
    +## [1] corr=   -0.5935 p-val=  0      
    +## [1] rgdpe     causes    cwtfp     strength= -37.008  
    +## [1] corr=   -0.5349 p-val=  1e-05
    +
    +

    2.5.1 generalCorr: Countries to show rgdpe causes cwtfp

    +
    # Isolate incidents of rgdpe as kernel cause
    +gc[gc_causality=="rgdpe", unique(country)]
    +
    ##  [1] Australia                        Austria                         
    +##  [3] Belgium                          Bahrain                         
    +##  [5] Bolivia (Plurinational State of) Brazil                          
    +##  [7] Botswana                         Switzerland                     
    +##  [9] Costa Rica                       Germany                         
    +## [11] Denmark                          Ecuador                         
    +## [13] Egypt                            France                          
    +## [15] United Kingdom                   Guatemala                       
    +## [17] Ireland                          Israel                          
    +## [19] Italy                            Jordan                          
    +## [21] Japan                            Kenya                           
    +## [23] Sri Lanka                        Morocco                         
    +## [25] Mauritius                        Namibia                         
    +## [27] Norway                           Peru                            
    +## [29] Philippines                      Portugal                        
    +## [31] Qatar                            Sweden                          
    +## [33] Turkey                           Uruguay                         
    +## [35] South Africa                    
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    So if rgdpe theoretically causes cwtfp, and three methodologies agree, then what causes rgdpe?

    +
    +
    +
    +

    2.6 Step 2a: Let’s Try Population pop

    +

    Below are the only countries for which NNS.caus() shows that pop causes rgdpe. There are 148 countries in the reduced list of complete datasets for pop and rgdpe.

    +
    # Create full observation list for population data
    +pop_countries <- pwt_1[, sum(is.na(pop))/.N, by = country]
    +full_pop_countries <- pop_countries[V1==0, country]
    +
    +# Intersecting Countries List
    +Reduce(intersect, list(full_pop_countries, full_rgdpe_countries))
    +
    ##   [1] "Angola"                             "Albania"                           
    +##   [3] "United Arab Emirates"               "Argentina"                         
    +##   [5] "Armenia"                            "Australia"                         
    +##   [7] "Austria"                            "Azerbaijan"                        
    +##   [9] "Burundi"                            "Belgium"                           
    +##  [11] "Benin"                              "Burkina Faso"                      
    +##  [13] "Bangladesh"                         "Bulgaria"                          
    +##  [15] "Bahrain"                            "Bosnia and Herzegovina"            
    +##  [17] "Belarus"                            "Bolivia (Plurinational State of)"  
    +##  [19] "Brazil"                             "Botswana"                          
    +##  [21] "Central African Republic"           "Canada"                            
    +##  [23] "Switzerland"                        "Chile"                             
    +##  [25] "China"                              "Cote d'Ivoire"                     
    +##  [27] "Cameroon"                           "Congo, Democratic Republic"        
    +##  [29] "Congo"                              "Colombia"                          
    +##  [31] "Costa Rica"                         "Czech Republic"                    
    +##  [33] "Germany"                            "Denmark"                           
    +##  [35] "Dominican Republic"                 "Algeria"                           
    +##  [37] "Ecuador"                            "Egypt"                             
    +##  [39] "Spain"                              "Estonia"                           
    +##  [41] "Ethiopia"                           "Finland"                           
    +##  [43] "France"                             "Gabon"                             
    +##  [45] "United Kingdom"                     "Georgia"                           
    +##  [47] "Ghana"                              "Guinea"                            
    +##  [49] "Gambia"                             "Guinea-Bissau"                     
    +##  [51] "Equatorial Guinea"                  "Greece"                            
    +##  [53] "Guatemala"                          "China, Hong Kong SAR"              
    +##  [55] "Honduras"                           "Croatia"                           
    +##  [57] "Haiti"                              "Hungary"                           
    +##  [59] "Indonesia"                          "India"                             
    +##  [61] "Ireland"                            "Iran (Islamic Republic of)"        
    +##  [63] "Iraq"                               "Israel"                            
    +##  [65] "Italy"                              "Jamaica"                           
    +##  [67] "Jordan"                             "Japan"                             
    +##  [69] "Kazakhstan"                         "Kenya"                             
    +##  [71] "Kyrgyzstan"                         "Cambodia"                          
    +##  [73] "Republic of Korea"                  "Kuwait"                            
    +##  [75] "Lao People's DR"                    "Lebanon"                           
    +##  [77] "Liberia"                            "Sri Lanka"                         
    +##  [79] "Lesotho"                            "Lithuania"                         
    +##  [81] "Latvia"                             "Morocco"                           
    +##  [83] "Republic of Moldova"                "Madagascar"                        
    +##  [85] "Mexico"                             "North Macedonia"                   
    +##  [87] "Mali"                               "Myanmar"                           
    +##  [89] "Mongolia"                           "Mozambique"                        
    +##  [91] "Mauritania"                         "Mauritius"                         
    +##  [93] "Malawi"                             "Malaysia"                          
    +##  [95] "Namibia"                            "Niger"                             
    +##  [97] "Nigeria"                            "Nicaragua"                         
    +##  [99] "Netherlands"                        "Norway"                            
    +## [101] "Nepal"                              "New Zealand"                       
    +## [103] "Oman"                               "Pakistan"                          
    +## [105] "Panama"                             "Peru"                              
    +## [107] "Philippines"                        "Poland"                            
    +## [109] "Portugal"                           "Paraguay"                          
    +## [111] "State of Palestine"                 "Qatar"                             
    +## [113] "Romania"                            "Russian Federation"                
    +## [115] "Rwanda"                             "Saudi Arabia"                      
    +## [117] "Sudan"                              "Senegal"                           
    +## [119] "Singapore"                          "Sierra Leone"                      
    +## [121] "El Salvador"                        "Serbia"                            
    +## [123] "Slovakia"                           "Slovenia"                          
    +## [125] "Sweden"                             "Eswatini"                          
    +## [127] "Syrian Arab Republic"               "Chad"                              
    +## [129] "Togo"                               "Thailand"                          
    +## [131] "Tajikistan"                         "Turkmenistan"                      
    +## [133] "Trinidad and Tobago"                "Tunisia"                           
    +## [135] "Turkey"                             "Taiwan"                            
    +## [137] "U.R. of Tanzania: Mainland"         "Uganda"                            
    +## [139] "Ukraine"                            "Uruguay"                           
    +## [141] "United States of America"           "Uzbekistan"                        
    +## [143] "Venezuela (Bolivarian Republic of)" "Viet Nam"                          
    +## [145] "Yemen"                              "South Africa"                      
    +## [147] "Zambia"                             "Zimbabwe"
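
    The complete-observation filter above can be illustrated on a toy panel in base R. The data frame, its values, and the helper `na_share` are hypothetical; only the column names follow the text.

```r
# Hypothetical toy panel; column names mirror those used in the text
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  pop     = c(1, 2, 3,  1, NA, 3,  4, 5, 6),
  rgdpe   = c(10, 11, 12,  9, 9, 9,  NA, 8, 7)
)

# Share of missing values per country for one column (hypothetical helper)
na_share <- function(values, group) tapply(is.na(values), group, mean)

# Countries with no missing values in each series
full_pop   <- names(which(na_share(panel$pop,   panel$country) == 0))
full_rgdpe <- names(which(na_share(panel$rgdpe, panel$country) == 0))

# Countries with complete observations in both series
complete_both <- intersect(full_pop, full_rgdpe)
complete_both
```

    Only country "A" survives, since "B" is missing a pop value and "C" is missing an rgdpe value.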
    +
    # Subset full observations of population and rgdpe from earlier list
    +pwt_2 <- pwt_1[country%in%Reduce(intersect, list(full_pop_countries, full_rgdpe_countries)),]
    +pwt_2 <- pwt_2[isocode!="USA",]
    +
    +NNS_Caus <- pwt_2[, NNS_Causation_Direction := names(NNS.caus(pop, rgdpe, tau = 1)[3]), by = country]
    +
    +NNS_Caus <- pwt_2[, NNS_Causation := as.numeric(NNS.caus(pop, rgdpe, tau = 1)[3]), by = country]
    +
    +NNS_Caus[, unique(.SD), .SDcols=c("NNS_Causation", "NNS_Causation_Direction"), by = country]
    +
    +
    # Isolate instances of pop causing rgdpe 
    +NNS_Caus[NNS_Causation_Direction == "C(x--->y)" & NNS_Causation > 0, unique(country)]
    +
    ##  [1] Angola                             United Arab Emirates              
    +##  [3] Bangladesh                         Belarus                           
    +##  [5] Bolivia (Plurinational State of)   Central African Republic          
    +##  [7] Switzerland                        Cote d'Ivoire                     
    +##  [9] Cameroon                           Costa Rica                        
    +## [11] Czech Republic                     Dominican Republic                
    +## [13] Algeria                            Estonia                           
    +## [15] Ghana                              Guinea                            
    +## [17] Gambia                             Iran (Islamic Republic of)        
    +## [19] Iraq                               Kazakhstan                        
    +## [21] Latvia                             Madagascar                        
    +## [23] Mexico                             Mauritania                        
    +## [25] Malawi                             Malaysia                          
    +## [27] Niger                              Nigeria                           
    +## [29] Nicaragua                          Nepal                             
    +## [31] Oman                               Pakistan                          
    +## [33] Rwanda                             Saudi Arabia                      
    +## [35] Sierra Leone                       Serbia                            
    +## [37] Slovakia                           Syrian Arab Republic              
    +## [39] Chad                               Tajikistan                        
    +## [41] Turkmenistan                       Turkey                            
    +## [43] Uzbekistan                         Venezuela (Bolivarian Republic of)
    +## [45] South Africa                       Zambia                            
    +## [47] Zimbabwe                          
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
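
    The filter above relies on reading both the name and the value of the third element of the result vector. A minimal sketch of that idiom follows; the vector `res` and its numbers are made up for illustration, and its names are an assumption about the shape the code expects rather than output copied from NNS.caus().

```r
# Hypothetical result vector; names and values are assumptions for illustration
res <- c(Causation.x.given.y = 0.12,
         Causation.y.given.x = 0.30,
         "C(x--->y)"         = 0.18)

direction <- names(res[3])       # label of the third element
strength  <- as.numeric(res[3])  # its value, with the name dropped

# Same test as the filter in the text: x causes y with positive net strength
x_causes_y <- direction == "C(x--->y)" && strength > 0
x_causes_y
```

    Storing the label and the value in separate columns, as the text does, lets the direction and the sign of the strength be filtered independently.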
    +

    An overwhelming majority of countries demonstrate that population does NOT cause growth. This makes intuitive sense. Look at China, for example, whose population in 1950 surpassed the U.S.’s current population, implying that China’s economy in 1950 should have been as large as the U.S. economy is now. Obviously, this was not the case, which argues against population as a cause of GDP.

    +

    Let’s look at pop growth rates and rgdpe growth rates rather than nominal levels…

    +
    # Create growth rate function
    +rate <- function(x) log((x/shift(x,1)))
    +
    +# Apply growth rate to population and rgdpe
    +pwt_2[, c("Pop_rate", "Growth_rate") := lapply(.SD, rate), by = country, .SDcols = c("pop","rgdpe")]
    +
    +NNS_Caus <- pwt_2[, NNS_Causation_Direction := names(NNS.caus(Pop_rate[-1], Growth_rate[-1], tau = 1)[3]), by = country]
    +
    +NNS_Caus <- pwt_2[, NNS_Causation := as.numeric(NNS.caus(Pop_rate[-1], Growth_rate[-1], tau = 1)[3]), by = country]
    +
    +NNS_Caus[, unique(.SD), .SDcols=c("NNS_Causation", "NNS_Causation_Direction"), by = country]
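
    For reference, the log growth rate computed by the `rate` helper above can be reproduced in base R without data.table. The function name `log_growth` and the toy numbers are hypothetical; this is a sketch of the same arithmetic, not the vignette's own code.

```r
# Log growth rate: log(x_t / x_{t-1}), with NA for the first observation
log_growth <- function(x) c(NA, diff(log(x)))

levels_series <- c(100, 110, 121)
g <- log_growth(levels_series)
g[-1]   # drop the leading NA, as the calls above do with [-1]

# Per-country application on a hypothetical toy panel
df <- data.frame(
  country = rep(c("A", "B"), each = 3),
  pop     = c(100, 110, 121,  50, 50, 50)
)
df$Pop_rate <- ave(df$pop, df$country, FUN = log_growth)
df
```

    Each country's first observation is NA because no prior period exists, which is why the first element is dropped before the series are passed on.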
    +
    +
    # Isolate instances of pop growth rate causing rgdpe growth
    +NNS_Caus[NNS_Causation_Direction == "C(x--->y)" & NNS_Causation > 0, unique(country)]
    +
    ##  [1] Australia       Burkina Faso    Belarus         Brazil         
    +##  [5] Germany         France          Guinea          Indonesia      
    +##  [9] Jordan          Japan           Kazakhstan      Kyrgyzstan     
    +## [13] Sri Lanka       North Macedonia Nepal           Pakistan       
    +## [17] Romania         Senegal         Sierra Leone    El Salvador    
    +## [21] Serbia          Tunisia         Taiwan          Ukraine        
    +## [25] Uzbekistan      Yemen          
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    NNS finds that an overwhelming majority of the 148 countries in the complete observation list do NOT show that population growth causes real GDP growth.

    +

    Let’s visualize what’s going on with an example. India’s real GDP growth rate and population growth rate are plotted below, showing the relationship between the two series. In fact, the correlation is negative for this country!

    +
    plot(pwt_2[country=="India", Pop_rate[-1]],pwt_2[country=="India", Growth_rate[-1]])
    +

    +
    cor(pwt_2[country=="India", Pop_rate[-1]],pwt_2[country=="India", Growth_rate[-1]])
    +
    ## [1] -0.5459281
    +
    +
    +

    2.7 Step 2a: Let’s Try pop…again…

    +

    Below are the countries for which generalCorr shows that pop causes rgdpe.

    +
    gc <- pwt_2[, gc_causality := causeSummBlk(cbind(pop, rgdpe))[1], by = country]
    +
    ## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8996 p-val= 0     
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=   0.2931  p-val=  0.04317
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9704 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9379 p-val= 0     
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=   -0.4856 p-val=  0.0088 
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9782 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9816 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8394 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9471 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9823 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.963  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9848 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8538 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   -0.7043 p-val=  0      
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   0.8216  p-val=  0.00192
    +## [1] rgdpe     causes    pop       strength= -37.008  
    +## [1] corr=   -0.8131 p-val=  0      
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=  -0.681 p-val= 7e-05 
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9291 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9288 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9816 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.8784 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9872 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9743 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8831 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7919 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9454 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9842 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   0.0133  p-val=  0.91693
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9126 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9362 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9533 p-val= 0     
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=  0.8377 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.8554 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9641 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9205 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.94   p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.925  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9165 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9583 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.7927 p-val=  0      
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8806 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9696 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9802 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9114 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9618 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   -0.316  p-val=  0.10144
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9039 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7759 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.983  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9086 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.9458 p-val=  0.00433
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9646 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9713 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9488 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9807 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.8666 p-val=  0      
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9694 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  -0.89  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8858 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8492 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9374 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8474 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8428 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9858 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9304 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8489 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9379 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9407 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.872  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9694 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   0.0386  p-val=  0.84522
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9123 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8751 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7903 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8696 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9689 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   -0.0532 p-val=  0.70229
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7562 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9339 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.9537 p-val=  0      
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.7406 p-val=  1e-05  
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9464 p-val= 0     
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=   -0.2617 p-val=  0.17861
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9587 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9768 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9028 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.968  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7359 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8314 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9481 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9678 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9297 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9706 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.952  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8789 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9281 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.6461 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7461 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.95   p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9635 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9271 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9707 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8863 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9576 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9093 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8763 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9488 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.626  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9117 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9315 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9837 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   0.8007  p-val=  0.00175
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.036  p-val=  0.78826
    +## [1] rgdpe     causes    pop       strength= -100     
    +## [1] corr=   -0.7166 p-val=  2e-05  
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9009 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8517 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8993 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9824 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9674 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.9087 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8615 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.6129 p-val=  0.00053
    +## [1] rgdpe     causes    pop       strength= -37.008  
    +## [1] corr=  0.7051 p-val= 3e-05 
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.7836 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9761 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9713 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.7694 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9351 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9455 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8672 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=   0.4072  p-val=  0.03149
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8861 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   0.5549  p-val=  0.00011
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9735 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9046 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8894 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.893  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9668 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=   -0.236  p-val=  0.22673
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.811  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 82.677   
    +## [1] corr=  0.9066 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8369 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8621 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.845  p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.9708 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 100      
    +## [1] corr=  0.8021 p-val= 0     
    +## [1] pop       causes    rgdpe     strength= 37.008   
    +## [1] corr=  0.5889 p-val= 0
    +
    # Isolate incidents of rgdpe as kernel cause
    +gc[gc_causality=="rgdpe", unique(country)]
    +
    ## [1] Albania                Armenia                Bosnia and Herzegovina
    +## [4] Belarus                Czech Republic         Republic of Moldova   
    +## [7] Russian Federation     Slovakia              
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    And let’s see if this holds for population growth rates and real GDP growth rates.

    +
    gc <- pwt_2[, gc_causality := causeSummBlk(cbind(Pop_rate[-1], Growth_rate[-1]))[1], by = country]
    +
    ## [1] causes    strength= -100
    +
    ## Error in out[i - 1, 1] <- nam[i]: number of items to replace is not a multiple of replacement length
    +
    # Isolate incidents of Growth_rate as kernel cause
    +gc[gc_causality=="Growth_rate", unique(country)]
    +
    ## factor(0)
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    For reasons internal to its algorithm, the generalCorr method generates errors when applied to these normalized variables.
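
    One way to keep a grouped computation running when a single group fails is to wrap the per-group call in tryCatch, so that a failing group yields NA instead of aborting the whole assignment. The sketch below is a minimal base R illustration under stated assumptions: safe_call and flaky_direction are hypothetical names introduced here, and flaky_direction is only a stand-in for a call that can fail, not the generalCorr algorithm.

    ```r
    # Hypothetical helper: run f(...) and return NA if it raises an error.
    safe_call <- function(f, ...) {
      tryCatch(f(...), error = function(e) NA_character_)
    }

    # Hypothetical stand-in for a routine that fails on some inputs
    # (this is NOT generalCorr::causeSummBlk).
    flaky_direction <- function(x, y) {
      if (sd(x) < 1e-8) stop("degenerate input")
      if (cor(x, y) > 0) "x" else "y"
    }

    good <- safe_call(flaky_direction, 1:10, (1:10)^2)   # call succeeds
    bad  <- safe_call(flaky_direction, rep(1, 10), 1:10) # call errors, NA returned
    ```

    Groups that return NA can then be counted and inspected separately rather than silently producing an empty result.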

    +
    +
    +

    2.8 Step 2a: Let’s Try pop…again…and yet again with Granger

    +

    Below are the only countries to show that pop causes rgdpe via Granger.

    +
    granger <- pwt_2[, granger_causality := grangertest(rgdpe  ~ pop, order = 1)$Pr[2], by = country]
    +
    +# Isolate instances of significance, p < .05
    +granger[granger_causality < 0.05, unique(country)]
    +
    ##  [1] Angola                     Argentina                 
    +##  [3] Armenia                    Azerbaijan                
    +##  [5] Burundi                    Benin                     
    +##  [7] Bulgaria                   Bosnia and Herzegovina    
    +##  [9] Belarus                    Botswana                  
    +## [11] Central African Republic   Congo                     
    +## [13] Egypt                      Estonia                   
    +## [15] Georgia                    Gambia                    
    +## [17] Greece                     China, Hong Kong SAR      
    +## [19] Croatia                    Haiti                     
    +## [21] Iran (Islamic Republic of) Japan                     
    +## [23] Kazakhstan                 Kyrgyzstan                
    +## [25] Cambodia                   Republic of Korea         
    +## [27] Lithuania                  Latvia                    
    +## [29] Republic of Moldova        North Macedonia           
    +## [31] Mauritania                 Malawi                    
    +## [33] Namibia                    State of Palestine        
    +## [35] Romania                    Russian Federation        
    +## [37] Singapore                  Sierra Leone              
    +## [39] Serbia                     Slovakia                  
    +## [41] Eswatini                   Chad                      
    +## [43] Togo                       Tajikistan                
    +## [45] Turkmenistan               Tunisia                   
    +## [47] Taiwan                     Uganda                    
    +## [49] Ukraine                    Uzbekistan                
    +## [51] Zambia                    
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    And we will try Granger on the population growth rate and real GDP growth rate…

    +
    granger <- pwt_2[, granger_causality := grangertest(Growth_rate[-1]  ~ Pop_rate[-1], order = 1)$Pr[2], by = country]
    +
    +# Isolate instances of significance, p < .05
    +granger[granger_causality < 0.05, unique(country)]
    +
    ##  [1] Azerbaijan             Bosnia and Herzegovina Belarus               
    +##  [4] Germany                Denmark                France                
    +##  [7] Georgia                China, Hong Kong SAR   Hungary               
    +## [10] India                  Israel                 Cambodia              
    +## [13] Liberia                Sri Lanka              Mali                  
    +## [16] Myanmar                Mozambique             Mauritania            
    +## [19] Namibia                Netherlands            Rwanda                
    +## [22] Turkmenistan           Tunisia                Taiwan                
    +## [25] Ukraine                Uzbekistan             Viet Nam              
    +## [28] Zimbabwe              
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    Granger only finds significance for population growth causing real GDP growth in 28 of the 148 countries.
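
    To make explicit what an order-1 Granger test checks, the idea can be sketched in base R as a comparison of two nested regressions: one that predicts the outcome from its own lag, and one that adds the lag of the candidate cause. This is a minimal illustration on simulated data, not the lmtest implementation; the helper granger_p and the simulated series x and y are assumptions introduced here.

    ```r
    set.seed(1)
    n <- 200
    x <- rnorm(n)
    y <- numeric(n)
    # Simulated series in which lagged x genuinely helps predict y.
    for (t in 2:n) {
      y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1, sd = 0.5)
    }

    # Hypothetical helper: p-value of the F-test that lagged x adds
    # explanatory power beyond lagged y (order 1 only).
    granger_p <- function(y, x) {
      n <- length(y)
      d <- data.frame(y = y[-1], y_lag = y[-n], x_lag = x[-n])
      restricted   <- lm(y ~ y_lag, data = d)
      unrestricted <- lm(y ~ y_lag + x_lag, data = d)
      anova(restricted, unrestricted)[["Pr(>F)"]][2]
    }

    p_xy <- granger_p(y, x)  # does lagged x help predict y?
    ```

    A small p-value only says that the lag of one series improves a linear forecast of the other, which is why the country counts above should be read as predictive rather than structural evidence.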

    +
    +
    +
    +

    3 Finding Other Causative Variables

    +

    Reviewing the literature may offer some ability to narrow down this search based on economic theory, or find other contributing factors besides population size.

    +
    +

    3.1 Rodrik (2008)

    +

    Rodrik shows that undervaluation of the currency (a high real exchange rate \(RER\)) stimulates economic growth. He finds this is true particularly for developing countries. His finding is robust to using different measures of the real exchange rate and different estimation techniques.

    +

    We will use our techniques to see if these findings continue to demonstrate robustness. He suggests: \[\ln RER_{it} = \ln (XRAT_{it} / PPP_{it})\] where \(XRAT\) and \(PPP\) are expressed as national currency units per U.S. dollar. However, Rodrik then states,

    +

    “The variable \(p\) in the Penn World Tables (called the ‘price level of GDP’) is equivalent to \(RER\). I have used \(p\) here as this series is more complete than \(XRAT\) and \(PPP\).”

    +

    By this measure, values of \(RER\) greater than one indicate that the currency is valued lower (more depreciated) than purchasing power parity implies.

    +

    We will use the inverse of the pl_gdpo variable included in the PWT; this inverse is simply (Exchange rate / PPP).
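
    The relationship between the price level and \(RER\) can be checked with a few made-up numbers. The sketch below assumes a price level defined as PPP divided by the exchange rate, so that its inverse is the exchange rate divided by PPP; the values of xrat and ppp are hypothetical and are not taken from the PWT.

    ```r
    # Hypothetical values, national currency units per U.S. dollar.
    xrat <- c(50, 7, 1)   # market exchange rate
    ppp  <- c(20, 4, 1)   # purchasing power parity conversion factor

    price_level <- ppp / xrat        # assumed definition of the price level
    rer         <- 1 / price_level   # inverse price level = xrat / ppp
    log_rer     <- log(xrat / ppp)   # ln RER as written in the formula above

    # RER above one marks a currency valued below its PPP level.
    undervalued <- rer > 1
    ```

    In this toy example the first two currencies are undervalued and the third sits exactly at parity.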

    +

    There are 148 countries with complete datasets for pl_gdpo and rgdpe.

    +
    pl_gdpo_countries <- pwt_1[, sum(is.na(pl_gdpo))/.N, by = country]
    +full_pl_gdpo_countries <- pl_gdpo_countries[V1==0, country]
    +
    +Reduce(intersect, list(full_pl_gdpo_countries, full_rgdpe_countries))
    +
    ##   [1] "Angola"                             "Albania"                           
    +##   [3] "United Arab Emirates"               "Argentina"                         
    +##   [5] "Armenia"                            "Australia"                         
    +##   [7] "Austria"                            "Azerbaijan"                        
    +##   [9] "Burundi"                            "Belgium"                           
    +##  [11] "Benin"                              "Burkina Faso"                      
    +##  [13] "Bangladesh"                         "Bulgaria"                          
    +##  [15] "Bahrain"                            "Bosnia and Herzegovina"            
    +##  [17] "Belarus"                            "Bolivia (Plurinational State of)"  
    +##  [19] "Brazil"                             "Botswana"                          
    +##  [21] "Central African Republic"           "Canada"                            
    +##  [23] "Switzerland"                        "Chile"                             
    +##  [25] "China"                              "Cote d'Ivoire"                     
    +##  [27] "Cameroon"                           "Congo, Democratic Republic"        
    +##  [29] "Congo"                              "Colombia"                          
    +##  [31] "Costa Rica"                         "Czech Republic"                    
    +##  [33] "Germany"                            "Denmark"                           
    +##  [35] "Dominican Republic"                 "Algeria"                           
    +##  [37] "Ecuador"                            "Egypt"                             
    +##  [39] "Spain"                              "Estonia"                           
    +##  [41] "Ethiopia"                           "Finland"                           
    +##  [43] "France"                             "Gabon"                             
    +##  [45] "United Kingdom"                     "Georgia"                           
    +##  [47] "Ghana"                              "Guinea"                            
    +##  [49] "Gambia"                             "Guinea-Bissau"                     
    +##  [51] "Equatorial Guinea"                  "Greece"                            
    +##  [53] "Guatemala"                          "China, Hong Kong SAR"              
    +##  [55] "Honduras"                           "Croatia"                           
    +##  [57] "Haiti"                              "Hungary"                           
    +##  [59] "Indonesia"                          "India"                             
    +##  [61] "Ireland"                            "Iran (Islamic Republic of)"        
    +##  [63] "Iraq"                               "Israel"                            
    +##  [65] "Italy"                              "Jamaica"                           
    +##  [67] "Jordan"                             "Japan"                             
    +##  [69] "Kazakhstan"                         "Kenya"                             
    +##  [71] "Kyrgyzstan"                         "Cambodia"                          
    +##  [73] "Republic of Korea"                  "Kuwait"                            
    +##  [75] "Lao People's DR"                    "Lebanon"                           
    +##  [77] "Liberia"                            "Sri Lanka"                         
    +##  [79] "Lesotho"                            "Lithuania"                         
    +##  [81] "Latvia"                             "Morocco"                           
    +##  [83] "Republic of Moldova"                "Madagascar"                        
    +##  [85] "Mexico"                             "North Macedonia"                   
    +##  [87] "Mali"                               "Myanmar"                           
    +##  [89] "Mongolia"                           "Mozambique"                        
    +##  [91] "Mauritania"                         "Mauritius"                         
    +##  [93] "Malawi"                             "Malaysia"                          
    +##  [95] "Namibia"                            "Niger"                             
    +##  [97] "Nigeria"                            "Nicaragua"                         
    +##  [99] "Netherlands"                        "Norway"                            
    +## [101] "Nepal"                              "New Zealand"                       
    +## [103] "Oman"                               "Pakistan"                          
    +## [105] "Panama"                             "Peru"                              
    +## [107] "Philippines"                        "Poland"                            
    +## [109] "Portugal"                           "Paraguay"                          
    +## [111] "State of Palestine"                 "Qatar"                             
    +## [113] "Romania"                            "Russian Federation"                
    +## [115] "Rwanda"                             "Saudi Arabia"                      
    +## [117] "Sudan"                              "Senegal"                           
    +## [119] "Singapore"                          "Sierra Leone"                      
    +## [121] "El Salvador"                        "Serbia"                            
    +## [123] "Slovakia"                           "Slovenia"                          
    +## [125] "Sweden"                             "Eswatini"                          
    +## [127] "Syrian Arab Republic"               "Chad"                              
    +## [129] "Togo"                               "Thailand"                          
    +## [131] "Tajikistan"                         "Turkmenistan"                      
    +## [133] "Trinidad and Tobago"                "Tunisia"                           
    +## [135] "Turkey"                             "Taiwan"                            
    +## [137] "U.R. of Tanzania: Mainland"         "Uganda"                            
    +## [139] "Ukraine"                            "Uruguay"                           
    +## [141] "United States of America"           "Uzbekistan"                        
    +## [143] "Venezuela (Bolivarian Republic of)" "Viet Nam"                          
    +## [145] "Yemen"                              "South Africa"                      
    +## [147] "Zambia"                             "Zimbabwe"
    +
    pwt_2 <- pwt_1[country%in%Reduce(intersect, list(full_pl_gdpo_countries, full_rgdpe_countries)),]
    +pwt_2 <- pwt_2[isocode!="USA",]
    +
    +

    3.1.1 NNS

    +
    pwt_2[, "Growth_rate" := lapply(.SD, rate), by = country, .SDcols = "rgdpe"]
    +
    +pwt_2$pl_gdpo <- pwt_2$pl_gdpo^-1
    +
    +NNS_Caus <- pwt_2[, NNS_Causation_Direction := names(NNS.caus(pl_gdpo[-1], Growth_rate[-1], tau = 3)[3]), by = country]
    +
    +NNS_Caus <- pwt_2[, NNS_Causation := as.numeric(NNS.caus(pl_gdpo[-1], Growth_rate[-1], tau = 3)[3]), by = country]
    +
    +NNS_Caus[, unique(.SD), .SDcols=c("NNS_Causation", "NNS_Causation_Direction"), by = country]
    +
    + +
    +
    NNS_Caus[NNS_Causation_Direction == "C(x--->y)" & NNS_Causation > 0, unique(country)]
    +
    ##  [1] Australia                  Austria                   
    +##  [3] Belgium                    Bangladesh                
    +##  [5] Brazil                     Canada                    
    +##  [7] Switzerland                China                     
    +##  [9] Colombia                   Spain                     
    +## [11] Finland                    Guinea                    
    +## [13] Hungary                    Indonesia                 
    +## [15] Iran (Islamic Republic of) Republic of Korea         
    +## [17] Sri Lanka                  Madagascar                
    +## [19] Mexico                     Mali                      
    +## [21] Netherlands                New Zealand               
    +## [23] Panama                     Poland                    
    +## [25] Portugal                   State of Palestine        
    +## [27] Romania                    Sudan                     
    +## [29] Singapore                  Sweden                    
    +## [31] Syrian Arab Republic       Viet Nam                  
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +
    +
    +

    3.1.2 generalCorr

    +
    gc <- pwt_2[, gc_causality := causeSummBlk(cbind(pl_gdpo[-1], Growth_rate[-1]))[1], by = country]
    +
    ## [1] causes    strength= -84.252
    +
    ## Error in out[i - 1, 1] <- nam[i]: number of items to replace is not a multiple of replacement length
    +
    # Isolate incidents of Growth_rate as kernel cause
    +gc[gc_causality=="Growth_rate", unique(country)]
    +
    ## factor(0)
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +
    +
    +

    3.1.3 Granger

    +
    granger <- pwt_2[, granger_causality := grangertest(Growth_rate[-1] ~ pl_gdpo[-1], order = 1)$Pr[2], by = country]
    +
    +# Isolate instances of significance, p < .05
    +granger[granger_causality < 0.05, unique(country)]
    +
    ##  [1] Armenia              Azerbaijan           Bangladesh          
    +##  [4] Botswana             Canada               Germany             
    +##  [7] Denmark              France               Georgia             
    +## [10] Greece               China, Hong Kong SAR Israel              
    +## [13] Italy                Japan                Kyrgyzstan          
    +## [16] Cambodia             Lao People's DR      Sri Lanka           
    +## [19] Namibia              Nepal                Saudi Arabia        
    +## [22] Senegal              Eswatini             Tajikistan          
    +## [25] Turkmenistan         Taiwan               Viet Nam            
    +## 182 Levels: Aruba Angola Anguilla Albania United Arab Emirates ... Zimbabwe
    +

    Again, the generalCorr method is generating errors while Granger is showing significance for only 27 of the 148 countries.

    +
    +
    +
    +

    3.2 Hall and Jones (1999)

    +

    Output per worker varies enormously across countries. Why? On an accounting basis, Hall and Jones show that differences in physical capital and educational attainment can only partially explain the variation in output per worker. They find a large amount of variation in the level of the Solow residual across countries. At a deeper level, they document that the differences in capital accumulation, productivity, and therefore output per worker are driven by differences in institutions and government policies, which they call “social infrastructure”. They treat social infrastructure as endogenous, determined historically by location and other factors captured in part by language.

    +

    We can cluster rgdpe by location, or latitude, and test whether there are statistically significant differences in the distributions of the average member nation. We will cluster the countries by latitude and then transform rgdpe to a growth rate for normalization across countries.

    +
    +

    3.2.1 Step 6: Incorporate Longitudinal Data

    +

    The geographic data was downloaded from https://lab.lmnixon.org/4th/worldcapitals.html and stored locally as a .csv file named “capitals.csv”.

    +
    capitals <- read.csv(file = "capitals.csv", sep = ",")
    +
    +capitals$Longitude <- as.numeric(gsub("[NE]$", "",gsub("^(.*)[WS]$", "-\\1", capitals$Longitude)))
    +capitals$Latitude <- as.numeric(gsub("[NE]$", "",gsub("^(.*)[WS]$", "-\\1", capitals$Latitude)))
    +
    +country_list <- sort(unique(pwt[country%in%capitals$Country,]$country))
    +
    +pwt_3 <- pwt[country%in%country_list, ]
    +
    +colnames(capitals) <- tolower(colnames(capitals))
    +pwt_3 <- merge(pwt_3, capitals, by="country")
    +
    +tail(pwt_3)
    +
    + +
    +
    pwt_3[, "growth_rate" := lapply(.SD, rate), by = country, .SDcols = "rgdpe"]
    +pwt_3[, "mean_growth_rate" := lapply(.SD, mean, na.rm = TRUE), by = country, .SDcols = "growth_rate"]
    +
    +plot(pwt_3$latitude, pwt_3$mean_growth_rate, xlab="Latitude", ylab = "Growth Rate")
    +

    World Latitudes

    +
    +
    +

    3.2.2 Determine the Clusters

    +

    There are 146 countries, so we shall simply divide them into 3 groups of approximately 48 countries per group.
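
    The source does not state how the two cut points were chosen, but one plausible way to obtain three roughly equal groups is to split at the terciles of the latitude distribution. The sketch below illustrates that approach on simulated latitudes; capital_latitude and the resulting breaks are assumptions for illustration and will not reproduce the 11.375 and 35.5 thresholds used in this analysis.

    ```r
    set.seed(42)
    # Hypothetical capital latitudes for 146 countries.
    capital_latitude <- runif(146, min = -40, max = 65)

    # Tercile cut points: minimum, 1/3 quantile, 2/3 quantile, maximum.
    breaks <- quantile(capital_latitude, probs = c(0, 1/3, 2/3, 1))

    # Assign each country to group 1 (south), 2 (middle) or 3 (north).
    group <- cut(capital_latitude, breaks = breaks,
                 labels = 1:3, include.lowest = TRUE)

    group_sizes <- table(group)
    ```

    Splitting at quantiles keeps the group sizes balanced, which is convenient for the ANOVA later in this section.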

    +
    +
    +

    3.2.3 Group 1: -40 to 11.375 Latitude Countries

    +
    unique(pwt_3[latitude<11.375, country])
    +
    ##  [1] "Angola"                   "Argentina"               
    +##  [3] "Australia"                "Benin"                   
    +##  [5] "Botswana"                 "Brazil"                  
    +##  [7] "Brunei Darussalam"        "Burundi"                 
    +##  [9] "Cambodia"                 "Cameroon"                
    +## [11] "Central African Republic" "Chile"                   
    +## [13] "Colombia"                 "Congo"                   
    +## [15] "Costa Rica"               "Cote d'Ivoire"           
    +## [17] "Djibouti"                 "Ecuador"                 
    +## [19] "Equatorial Guinea"        "Ethiopia"                
    +## [21] "Fiji"                     "Gabon"                   
    +## [23] "Ghana"                    "Guinea"                  
    +## [25] "Indonesia"                "Kenya"                   
    +## [27] "Lesotho"                  "Liberia"                 
    +## [29] "Madagascar"               "Malawi"                  
    +## [31] "Malaysia"                 "Maldives"                
    +## [33] "Mauritania"               "Mozambique"              
    +## [35] "Namibia"                  "New Zealand"             
    +## [37] "Nigeria"                  "Panama"                  
    +## [39] "Paraguay"                 "Peru"                    
    +## [41] "Sao Tome and Principe"    "Sierra Leone"            
    +## [43] "South Africa"             "Suriname"                
    +## [45] "Togo"                     "Uganda"                  
    +## [47] "Uruguay"                  "Zambia"                  
    +## [49] "Zimbabwe"
    +
    group_1_growth <- mean(pwt_3[latitude<11.375, ]$mean_growth_rate)
    +
    +
    +

    3.2.4 Group 2: 11.375 to 35.5 Latitude Countries

    +
    unique(pwt_3[latitude>=11.375 & latitude<35.5, country])
    +
    ##  [1] "Antigua and Barbuda"        "Aruba"                     
    +##  [3] "Bahamas"                    "Bahrain"                   
    +##  [5] "Bangladesh"                 "Barbados"                  
    +##  [7] "Belize"                     "Bhutan"                    
    +##  [9] "British Virgin Islands"     "Burkina Faso"              
    +## [11] "Cayman Islands"             "Chad"                      
    +## [13] "Cyprus"                     "Dominica"                  
    +## [15] "Egypt"                      "El Salvador"               
    +## [17] "Gambia"                     "Guatemala"                 
    +## [19] "Guinea-Bissau"              "Haiti"                     
    +## [21] "Honduras"                   "India"                     
    +## [23] "Iran (Islamic Republic of)" "Iraq"                      
    +## [25] "Israel"                     "Jamaica"                   
    +## [27] "Jordan"                     "Kuwait"                    
    +## [29] "Lebanon"                    "Mali"                      
    +## [31] "Mexico"                     "Myanmar"                   
    +## [33] "Nepal"                      "Nicaragua"                 
    +## [35] "Niger"                      "Oman"                      
    +## [37] "Pakistan"                   "Philippines"               
    +## [39] "Qatar"                      "Saint Kitts and Nevis"     
    +## [41] "Saint Lucia"                "Saudi Arabia"              
    +## [43] "Senegal"                    "Sudan"                     
    +## [45] "Syrian Arab Republic"       "Thailand"                  
    +## [47] "United Arab Emirates"       "Viet Nam"
    +
    group_2_growth <- mean(pwt_3[latitude>=11.375 & latitude<35.5, ]$mean_growth_rate)
    +
    +
    +

    3.2.5 Group 3: Greater than 35.5 Latitude Countries

    +
    unique(pwt_3[latitude>=35.5, country])
    +
    ##  [1] "Albania"                  "Algeria"                 
    +##  [3] "Armenia"                  "Austria"                 
    +##  [5] "Azerbaijan"               "Belarus"                 
    +##  [7] "Belgium"                  "Bosnia and Herzegovina"  
    +##  [9] "Bulgaria"                 "Canada"                  
    +## [11] "China"                    "Croatia"                 
    +## [13] "Czech Republic"           "Denmark"                 
    +## [15] "Estonia"                  "Finland"                 
    +## [17] "France"                   "Georgia"                 
    +## [19] "Germany"                  "Greece"                  
    +## [21] "Hungary"                  "Iceland"                 
    +## [23] "Ireland"                  "Italy"                   
    +## [25] "Kazakhstan"               "Kyrgyzstan"              
    +## [27] "Latvia"                   "Lithuania"               
    +## [29] "Luxembourg"               "Malta"                   
    +## [31] "Netherlands"              "Norway"                  
    +## [33] "Poland"                   "Portugal"                
    +## [35] "Republic of Korea"        "Romania"                 
    +## [37] "Russian Federation"       "Slovakia"                
    +## [39] "Slovenia"                 "Spain"                   
    +## [41] "Sweden"                   "Switzerland"             
    +## [43] "Tajikistan"               "Tunisia"                 
    +## [45] "Turkey"                   "Turkmenistan"            
    +## [47] "Ukraine"                  "United States of America"
    +## [49] "Uzbekistan"
    +
    group_3_growth <- mean(pwt_3[latitude>=35.5, ]$mean_growth_rate)
    +

    We can see the differences in average growth rates across these groups directly, especially for the Northern group 3.

    +
    plot(pwt_3$latitude, pwt_3$mean_growth_rate, xlab="Latitude", ylab = "Growth Rate",
    +     col = ifelse(pwt_3$latitude<11.375, 'red', 
    +                  ifelse(pwt_3$latitude<35.5, 'blue', 'purple')))
    +segments(min(pwt_3$latitude), group_1_growth, 11.375, group_1_growth, col = 'red', lwd = 3)
    +segments(11.375, group_2_growth, 35.5, group_2_growth, col = 'blue', lwd = 3)
    +segments(35.5, group_3_growth, max(pwt_3$latitude), group_3_growth, col = 'purple', lwd = 3)
    +

    +
    +
    +

    3.2.6 ANOVA of 3 Groups’ Growth Rates

    +
    # Add group labels to PWT
    +pwt_4 <- pwt_3
    +pwt_4[latitude<11.375, "group" := 1]
    +pwt_4[latitude>=11.375 & latitude<35.5, "group" := 2]
    +pwt_4[latitude>=35.5, "group" := 3]
    +
    +anova_fit <- aov(growth_rate ~ as.factor(group), data = pwt_4)
    +# Summary of the analysis
    +summary(anova_fit)
    +
    ##                    Df Sum Sq Mean Sq F value   Pr(>F)    
    +## as.factor(group)    2   0.10 0.04940   6.967 0.000948 ***
    +## Residuals        7989  56.65 0.00709                     
    +## ---
    +## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    +## 1936 observations deleted due to missingness
    +
    TukeyHSD(anova_fit)
    +
    ##   Tukey multiple comparisons of means
    +##     95% family-wise confidence level
    +## 
    +## Fit: aov(formula = growth_rate ~ as.factor(group), data = pwt_4)
    +## 
    +## $`as.factor(group)`
    +##             diff           lwr           upr     p adj
    +## 2-1  0.004356783 -0.0009289692  0.0096425344 0.1297363
    +## 3-1 -0.004476332 -0.0099153152  0.0009626506 0.1305302
    +## 3-2 -0.008833115 -0.0143817438 -0.0032844859 0.0005612
    +

    The Tukey comparison shows a significant difference in mean growth rates only between groups 3 and 2 (adjusted p ≈ 0.0006); the 2-1 and 3-1 differences are not significant at the 5% level. Location therefore appears to offer some explanation of growth, which is consistent with Hall and Jones’ social infrastructure contention.
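
    When reading Tukey results programmatically, it can be safer to select comparisons by their row and column names than by position. The sketch below shows this on simulated data; the data frame d, the factor grp, and the group means are assumptions introduced for illustration and are unrelated to the PWT figures.

    ```r
    set.seed(7)
    # Hypothetical growth rates: groups 1 and 2 share a mean, group 3 is lower.
    d <- data.frame(
      growth = c(rnorm(50, 0.03, 0.01), rnorm(50, 0.03, 0.01), rnorm(50, 0.01, 0.01)),
      grp    = factor(rep(1:3, each = 50))
    )

    fit <- aov(growth ~ grp, data = d)
    tk  <- TukeyHSD(fit)$grp          # matrix with rows "2-1", "3-1", "3-2"

    p_31      <- tk["3-1", "p adj"]                   # adjusted p-value, group 3 vs 1
    sig_pairs <- rownames(tk)[tk[, "p adj"] < 0.05]   # all significant pairs
    ```

    Selecting by name makes it explicit which pair of groups a reported p-value belongs to.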

    +

    Causal analysis as performed in the previous section will be ineffectual given the categorical nature of the location variable as proxied by latitude against the panel data of the country’s growth.

    +
    +
    +

    3.2.7 Does This Hold Since 2000?

    +
    pwt_5 <- pwt_4[pwt_4$year >= 2000,]
    +
    +
    +pwt_5[, "growth_rate" := lapply(.SD, rate), by = country, .SDcols = "rgdpe"]
    +pwt_5[, "mean_growth_rate" := lapply(.SD, mean, na.rm = TRUE), by = country, .SDcols = "growth_rate"]
    +
    +plot(pwt_5$latitude, pwt_5$mean_growth_rate, xlab="Latitude", ylab = "Growth Rate")
    +

    +
    group_1_growth <- mean(pwt_5[latitude<11.375, ]$mean_growth_rate)
    +group_2_growth <- mean(pwt_5[latitude>=11.375 & latitude<35.5, ]$mean_growth_rate)
    +group_3_growth <- mean(pwt_5[latitude>=35.5, ]$mean_growth_rate)
    +
    +plot(pwt_5$latitude, pwt_5$mean_growth_rate, xlab="Latitude", ylab = "Growth Rate",
    +     col = ifelse(pwt_5$latitude<11.375, 'red', 
    +                  ifelse(pwt_5$latitude<35.5, 'blue', 'purple')))
    +segments(min(pwt_5$latitude), group_1_growth, 11.375, group_1_growth, col = 'red', lwd = 3)
    +segments(11.375, group_2_growth, 35.5, group_2_growth, col = 'blue', lwd = 3)
    +segments(35.5, group_3_growth, max(pwt_5$latitude), group_3_growth, col = 'purple', lwd = 3)
    +

    +
    anova_fit <- aov(growth_rate ~ as.factor(group), data = pwt_5)
    +# Summary of the analysis
    +summary(anova_fit)
    +
    ##                    Df Sum Sq  Mean Sq F value Pr(>F)  
    +## as.factor(group)    2  0.039 0.019447   3.603 0.0274 *
    +## Residuals        2479 13.380 0.005397                 
    +## ---
    +## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    +## 146 observations deleted due to missingness
    +
    TukeyHSD(anova_fit)
    +
    ##   Tukey multiple comparisons of means
    +##     95% family-wise confidence level
    +## 
    +## Fit: aov(formula = growth_rate ~ as.factor(group), data = pwt_5)
    +## 
    +## $`as.factor(group)`
    +##             diff         lwr           upr     p adj
    +## 2-1 -0.002024647 -0.01051048  0.0064611866 0.8415778
    +## 3-1 -0.009200986 -0.01764297 -0.0007590076 0.0286892
    +## 3-2 -0.007176340 -0.01566217  0.0013094940 0.1165417
    +

    The difference in mean growth rates remains significant since 2000 (p = 0.0274), and closer inspection via the Tukey test shows it comes from a negative difference between group 3, the Northern countries, and group 1 (adjusted p ≈ 0.029).
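
    As a quick consistency check on the two ANOVA tables, the F statistic is the ratio of the between-group mean square to the residual mean square. Using the rounded values printed in the summaries above (so the results are approximate):

    \[F_{\text{all years}} = \frac{MS_{\text{group}}}{MS_{\text{residual}}} \approx \frac{0.04940}{0.00709} \approx 6.97, \qquad F_{\text{since 2000}} \approx \frac{0.019447}{0.005397} \approx 3.603\]

    Both agree with the reported F values (6.967 on 2 and 7989 degrees of freedom, and 3.603 on 2 and 2479 degrees of freedom), and the smaller ratio after 2000 is what moves the p-value from 0.000948 to 0.0274.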

    +
    +
    +

    3.2.8 Year by Year Average Growth Rates for Each Group

    +

    What if we examine the trajectory of the p-values in the difference between groups 1 and 3 year by year…

    +

    The red horizontal line represents the \(p=0.05\) level of significance for the difference in means between groups 1 and 3.

    +
    p_value <- numeric()
    +
    +for(i in unique(pwt_4$year)){
    +  index <- which(i==unique(pwt_4$year))
    +
    +  pwt_5 <- pwt_4[pwt_4$year >= i, ]
    +  pwt_5[, "mean_growth_rate" := lapply(.SD, mean, na.rm = TRUE), by = country, 
    +        .SDcols = "growth_rate"]
    +
    +  group_1_growth <- mean(pwt_5[latitude<11.375, ]$mean_growth_rate)
    +  group_2_growth <- mean(pwt_5[latitude>=11.375 & latitude<35.5, ]$mean_growth_rate)
    +  group_3_growth <- mean(pwt_5[latitude>=35.5, ]$mean_growth_rate)
    +
    +  anova_fit <- aov(mean_growth_rate ~ as.factor(group), data = pwt_5)
    +
    +  a <- TukeyHSD(anova_fit)
    +  p_value[index] <- a[[1]][2,4]
    +}
    +
    +plot(head(unique(pwt_4$year), length(na.omit(p_value))), na.omit(p_value),
    +     xlab = "Year", ylab = "p-value for Group 3-1 Difference")
    +abline(h=0.05, col='red')
    +

    +

    We fail to reject the null hypothesis of equal group means over the last several years. Maybe this is the convergence they were speaking of and can contribute to GDP growth?
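The loop above reads the group 3 versus group 1 p-value by position (`a[[1]][2,4]`). A slightly more defensive variant, sketched here on synthetic data, selects the same cell by row and column name so that it still works if the ordering of the comparisons changes. The data frame `toy` and its group means are assumptions made purely for illustration.

```r
# Synthetic data: three groups of 50 observations with different means (illustrative only)
set.seed(1)
toy <- data.frame(
  group  = factor(rep(1:3, each = 50)),
  growth = c(rnorm(50, 0.030, 0.05),
             rnorm(50, 0.028, 0.05),
             rnorm(50, 0.000, 0.05))
)

fit <- aov(growth ~ group, data = toy)
tk  <- TukeyHSD(fit)

# Rows are named "2-1", "3-1", "3-2"; the adjusted p-value column is "p adj"
p_31 <- tk$group["3-1", "p adj"]
p_31
```

The list element is named after the model term, so with `as.factor(group)` in the formula the element would be accessed as ``a$`as.factor(group)` `` or simply `a[[1]]`.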

    +
    +
    +
    +
    +

    4 Summary

    +

    Below are the key findings from this analysis:

    +
      +
    • All 3 causation methods find rgdpe causes cwtfp.

    • +
    • NNS and Granger do not find population or population growth rates to cause real GDP or real GDP growth respectively.

    • +
    • generalCorr does find population to cause real GDP levels.

    • +
    • NNS finds that, for a majority of countries, real exchange rates \((RER)\) do NOT cause real GDP growth. This does not support Rodrik 2008.

    • +
    • Granger finds that, for an overwhelming majority of countries, real exchange rates \((RER)\) do NOT cause real GDP growth. This does not support Rodrik 2008.

    • +
    • Location does indeed have some lingering effects on real GDP growth; however, they are diminishing and have not been significant over the last several years, which supports the convergence argument.

    • +
    +
    +
    +

    5 References

    +
      +
    • Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer (2015) ``The next generation of the Penn World Table.’’ American Economic Review, 105, no. 10, 3150-82.

    • +
    • Hall, Robert E., and Jones, Charles I. (1999) ``Why Do Some Countries Produce So Much More Output Per Worker Than Others?’’ The Quarterly Journal of Economics, Vol. 114, No. 1 (Feb., 1999), pp. 83-116.

    • +
    • Rodrik, Dani. ``The Real Exchange Rate and Economic Growth.’’ Brookings Papers on Economic Activity 2008, no. 2 (2008): 365-412. https://doi.org/10.1353/eca.0.0020.

    • +
    • Vinod, H. (2014) ``Matrix Algebra Topics in Statistics and Economics Using R,’’ in Handbook of Statistics, Volume 32, Ch. 4, Pages 143-176, https://doi.org/10.1016/B978-0-444-63431-3.00004-8.

    • +
    • Vinod, H. (2015) ``Generalized Correlation and Kernel Causality with Applications in Development Economics,’’ Communications in Statistics - Simulation and Computation, accepted Nov. 10, 2015, http://dx.doi.org/10.1080/03610918.2015.1122048.

    • +
    • Viole, F., and Nawrocki, D. (2013) ``Non-Linear Scaling Normalization with Variance Retention,’’ Available at SSRN, https://ssrn.com/abstract=2262358.

    • +
    • Viole, F., and Nawrocki, D. (2013) ``Causation,’’ Available at SSRN, https://ssrn.com/abstract=2273756.

    • +
    +
    + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Partial Moments Equivalences.md b/tools/NNS/examples/Partial Moments Equivalences.md new file mode 100644 index 0000000..755868f --- /dev/null +++ b/tools/NNS/examples/Partial Moments Equivalences.md @@ -0,0 +1,310 @@ +## Partial Moments Equivalences + +Why is it necessary to parse variance with partial moments? The additional information generated from partial moments permits a level of analysis simply not possible with traditional summary statistics. + +Below are some basic equivalences demonstrating partial moments' role as the elements of variance. + +### Installation +```r +install.packages("NNS") +``` + +### Setup +```r +library(NNS) +set.seed(123) +x <- rnorm(100) +y <- rnorm(100) +``` + +### Mean +A difference between the upside area and the downside area of `f(x)`. + +```r +mean(x) +## [1] 0.09040591 + +UPM(1, 0, x) - LPM(1, 0, x) +## [1] 0.09040591 +``` + +### Variance + +```r +# Sample Variance (base R): +var(x) +## [1] 0.8332328 + +# Sample Variance from partial moments: +(UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1)) +## [1] 0.8332328 + +# Population Adjustment of Sample Variance (base R): +var(x) * ((length(x) - 1) / length(x)) +## [1] 0.8249005 + +# Population Variance: +UPM(2, mean(x), x) + LPM(2, mean(x), x) +## [1] 0.8249005 + +# Variance is also the covariance of itself: +(Co.LPM(1, x, x, mean(x), mean(x)) + + Co.UPM(1, x, x, mean(x), mean(x)) - + D.LPM(1, 1, x, x, mean(x), mean(x)) - + D.UPM(1, 1, x, x, mean(x), mean(x))) +## [1] 0.8249005 +``` + +### Standard Deviation + +```r +sd(x) +## [1] 0.9128159 + +((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))) ^ .5 +## [1] 0.9128159 + +sqrt(NNS.moments(x, population = FALSE)$variance) +## [1] 0.9128159 +``` + +### First 4 Moments +The first four moments are returned with `NNS.moments()`. For sample statistics, set `population = FALSE`. 
+ +```r +NNS.moments(x) +## $mean +## [1] 0.09040591 +## +## $variance +## [1] 0.8249005 +## +## $skewness +## [1] 0.06049948 +## +## $kurtosis +## [1] -0.161053 + +NNS.moments(x, population = FALSE) +## $mean +## [1] 0.09040591 +## +## $variance +## [1] 0.8332328 +## +## $skewness +## [1] 0.06235774 +## +## $kurtosis +## [1] -0.1069186 +``` + +### Statistical Mode of a Continuous Distribution +`NNS.mode` offers support for discrete valued distributions as well as recognizing multiple modes. + +```r +# Continuous +NNS.mode(x) +## [1] -0.4132834 + +# Discrete and multiple modes +NNS.mode(c(1, 2, 2, 3, 3, 4, 4, 5), discrete = TRUE, multi = TRUE) +## [1] 2 3 4 +``` + +### Covariance + +```r +cov(x, y) +## [1] -0.04372107 + +(Co.LPM(1, x, y, mean(x), mean(y)) + + Co.UPM(1, x, y, mean(x), mean(y)) - + D.LPM(1, 1, x, y, mean(x), mean(y)) - + D.UPM(1, 1, x, y, mean(x), mean(y))) * + (length(x) / (length(x) - 1)) +## [1] -0.04372107 +``` + +### Covariance Elements and Covariance Matrix +The covariance matrix $(\Sigma)$ is equal to the sum of the co-partial moments matrices less the divergent partial moments matrices. 
+ +$$\Sigma = CLPM + CUPM - DLPM - DUPM$$ + +```r +cov.mtx <- PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = cbind(x, y), + pop_adj = TRUE +) + +cov.mtx +## $cupm +## x y +## x 0.4299250 0.1033601 +## y 0.1033601 0.5411626 +## +## $dupm +## x y +## x 0.0000000 0.1469182 +## y 0.1560924 0.0000000 +## +## $dlpm +## x y +## x 0.0000000 0.1560924 +## y 0.1469182 0.0000000 +## +## $clpm +## x y +## x 0.4033078 0.1559295 +## y 0.1559295 0.3939005 +## +## $cov.matrix +## x y +## x 0.83323283 -0.04372107 +## y -0.04372107 0.93506310 + +# Reassembled Covariance Matrix +cov.mtx$clpm + cov.mtx$cupm - cov.mtx$dlpm - cov.mtx$dupm +## x y +## x 0.83323283 -0.04372107 +## y -0.04372107 0.93506310 + +# Standard Covariance Matrix +cov(cbind(x, y)) +## x y +## x 0.83323283 -0.04372107 +## y -0.04372107 0.93506310 +``` + +### Pearson Correlation + +```r +cor(x, y) +## [1] -0.04953215 + +cov.xy <- (Co.LPM(1, x, y, mean(x), mean(y)) + + Co.UPM(1, x, y, mean(x), mean(y)) - + D.LPM(1, 1, x, y, mean(x), mean(y)) - + D.UPM(1, 1, x, y, mean(x), mean(y))) * + (length(x) / (length(x) - 1)) + +sd.x <- ((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * + (length(x) / (length(x) - 1))) ^ .5 + +sd.y <- ((UPM(2, mean(y), y) + LPM(2, mean(y), y)) * + (length(y) / (length(y) - 1))) ^ .5 + +cov.xy / (sd.x * sd.y) +## [1] -0.04953215 +``` + +### Skewness +A normalized difference between upside area and downside area. + +```r +PerformanceAnalytics::skewness(x) + +(UPM(3, mean(x), x) - LPM(3, mean(x), x)) / + (UPM(2, mean(x), x) + LPM(2, mean(x), x)) ^ (3 / 2) + +NNS.moments(x)$skewness +``` + +### UPM / LPM Ratio +A more intuitive skewness measure: upside area divided by downside area. + +```r +UPM(2, mean(x), x) / LPM(2, mean(x), x) +``` + +### Kurtosis +A normalized sum of upside area and downside area. 
+ +```r +PerformanceAnalytics::kurtosis(x) + +((UPM(4, mean(x), x) + LPM(4, mean(x), x)) / + (UPM(2, mean(x), x) + LPM(2, mean(x), x)) ^ 2) - 3 + +NNS.moments(x)$kurtosis +``` + +### CDFs (Discrete and Continuous) + +```r +P <- ecdf(x) +P(0) ; P(1) + +LPM(0, 0, x) ; LPM(0, 1, x) + +# Vectorized targets: +LPM(0, c(0, 1), x) + +# Plot CDF vs LPM: +plot(ecdf(x)) +points(sort(x), LPM(0, sort(x), x), col = "red") +legend("left", legend = c("ecdf", "LPM.CDF"), + fill = c("black", "red"), border = NA, bty = "n") + +# Joint CDF: +Co.LPM(0, x, y, 0, 0) + +# Vectorized targets: +Co.LPM(0, x, y, c(0, 1), c(0, 1)) + +# Continuous CDF: +NNS.CDF(x, 1) + +# CDF with target: +NNS.CDF(x, 1, target = mean(x)) + +# Survival Function: +NNS.CDF(x, 1, type = "survival") +``` + +### Copulas + +```r +# Transform x and y to uniform marginals +u_x <- LPM.ratio(0, x, x) +u_y <- LPM.ratio(0, y, y) + +# Value of copula at c(0.5, 0.5) +Co.LPM(0, u_x, u_y, 0.5, 0.5) +``` + +### Numerical Integration +Partial moments are asymptotic area approximations of `f(x)` akin to the familiar Trapezoidal and Simpson's rules. More observations, more accuracy. + +$$[UPM(1,0,f(x)) - LPM(1,0,f(x))] \asymp \frac{F(b)-F(a)}{b-a}$$ +$$[UPM(1,0,f(x)) - LPM(1,0,f(x))] \cdot (b-a) \asymp F(b)-F(a)$$ + +```r +x <- seq(0, 1, .001) +y <- x ^ 2 + +(UPM(1, 0, y) - LPM(1, 0, y)) * (1 - 0) +## [1] 0.3335 +``` + +$$0.3333 \cdot [1-0] = \int_{0}^{1} x^2 \, dx$$ + +For the total area, not just the definite integral, sum the partial moments and multiply by $(b - a)$: + +$$[UPM(1,0,f(x)) + LPM(1,0,f(x))] \cdot (b-a) \asymp \left|\int_a^b f(x)dx\right|$$ + +### Bayes' Theorem +For example, when ascertaining the probability of an increase in $A$ given an increase in $B$, the `Co.UPM(degree_upm, x, y, target_x, target_y)` target parameters are set to `target_x = 0` and `target_y = 0` and the `UPM(degree, target, variable)` target parameter is also set to `target = 0`. 
+ +$$P(A|B)=\frac{Co.UPM(0,A,B,0,0)}{UPM(0,0,B)}$$ + +### References +- [Partial Moments as a Unifying Primitive: Distributional Structure, Benchmark-Relative Utility, Adaptive Estimation, and Learned Neural Nonlinearities](https://doi.org/10.2139/ssrn.6249658) +- [Nonlinear Nonparametric Statistics: Using Partial Moments](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/index.md) +- [Cumulative Distribution Functions and UPM/LPM Analysis](https://doi.org/10.2139/ssrn.2148482) +- [Continuous CDFs and ANOVA with NNS](https://doi.org/10.2139/ssrn.3007373) +- [f(Newton)](https://doi.org/10.2139/ssrn.2186471) +- [Bayes' Theorem From Partial Moments](https://doi.org/10.2139/ssrn.3457377) diff --git a/tools/NNS/examples/Partitional_Estimation_Using_Partial_Moments.pdf b/tools/NNS/examples/Partitional_Estimation_Using_Partial_Moments.pdf new file mode 100644 index 0000000..fa6c190 Binary files /dev/null and b/tools/NNS/examples/Partitional_Estimation_Using_Partial_Moments.pdf differ diff --git a/tools/NNS/examples/README.md b/tools/NNS/examples/README.md new file mode 100644 index 0000000..7b2339d --- /dev/null +++ b/tools/NNS/examples/README.md @@ -0,0 +1,111 @@ +# NNS +NNS (Nonlinear Nonparametric Statistics) leverages partial moments – the fundamental [elements of variance](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Partial%20Moments%20Equivalences.md) that [asymptotically approximate the area of f(x)](https://ovvo-financial.github.io/NNS/book/numerical-integration-via-partial-moments.html) – to provide a robust foundation for nonlinear analysis while maintaining linear equivalences. Designed for real-world data that violates symmetry, linearity, or distributional assumptions. 
+ +NNS delivers a comprehensive suite of advanced statistical techniques, including: + - Numerical Integration & Numerical Differentiation + - Partitional & Hierarchical Clustering + - Nonlinear Correlation & Dependence + - Causal Analysis + - Nonlinear Regression & Classification + - ANOVA + - Seasonality & Autoregressive Modeling + - Normalization + - Stochastic Superiority / Dominance + +See the following for NNS detailed examples and specific applications: + +# 1. Basic Statistics + + 1.1 [Partial Moment Equivalences](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Partial%20Moments%20Equivalences.md) + + 1.2 [Bayes' Theorem](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Bayes'%20Theorem%20From%20Partial%20Moments.pdf) + + 1.3 [CDFs and ANOVA](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Continuous_CDFs_and_ANOVA_with_NNS.pdf) + + 1.4 [Bias and Confidence Intervals](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Bias_and_CI.html) + + 1.5 [Correlation and Dependence](https://cran.r-project.org/package=NNS/vignettes/NNSvignette_Correlation_and_Dependence.html) + + 1.6 [Normalization](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Normalization.pdf) + + 1.7 [Partial Moments Estimation Error](https://github.com/OVVO-Financial/Finance/blob/main/Data/Estimation_Error_Replication.md) + + +# 2. 
Regression + + 2.1 [Overview](https://ssrn.com/abstract=3389938) + + 2.2 [Curve Fitting](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Curve_Fitting.html) + + 2.3 [Nonparametric Regression Using Clusters](http://rdcu.be/tz0J) + + 2.4 [Clustering and Curve Fitting By Line Segments](https://ssrn.com/abstract=2861339) + + 2.5 [Regression Residuals](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Regression_Residuals.html) + + 2.6 [Multiple Imputation](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS_MI_vs_MICE.md) + + 2.7 [Logistic Regression Binary Classification](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Logistic_Comparison.html) + + 2.8 [Boston Housing](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Boston_Housing.html) + + + +# 3. Machine Learning + 3.1 [Partitional Estimation Using Partial Moments](https://ssrn.com/abstract=3592491) + + 3.2 [NNS Regression in Machine Learning](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Machine_Learning.pdf) + + 3.3 [Classification Using NNS Clustering Analysis](https://ssrn.com/abstract=2864711) + + 3.4 [NNS vs. 
xgboost](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/xgboost_example.html) + + 3.5 [Time-Series Classification](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Time_Series_Classification.html) + + 3.6 [Time-Series Classification II](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Time_Series_Classification_Expanded.html) + + 3.7 [Spiral Matching Example](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Spiral%20Matching%20Example.pdf) + + 3.8 [MNIST](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS%20vs%20KNN%20MNIST%20dataset.pdf) + + +# 4. Time-Series Forecasting + + 4.1 [Overview](https://ssrn.com/abstract=3382300) + + 4.2 [NNS vs. KERAS](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Sunspots_example.html) + + 4.3 [NNS vs. prophet](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/prophet_NNS_comparison.html) + + 4.4 [Tides](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/tides.html) + + 4.5 [NNS vs. N-HiTS](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS.ARMA%20vs%20N-Hits.md) + + 4.6 [NNS Time-Series Prediction Interval Benchmark](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns_arma_conformal_benchmark_report.md) + + +# 5. 
Econometrics + + 5.1 [Econometrics Critiques and Solutions](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/7_Econometric_Reasons.html) + + 5.2 [VAR Alternative](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/VAR_example.html) + + 5.3 [NOWCASTING](https://ssrn.com/abstract=3589816) + + 5.4 [Causal Analysis](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/PWT.html) + + 5.5 [Federal Reserve Causal Analysis](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html) + + 5.6 [Causal Inference](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_with_NNS_stack.pdf) + + +# References + +The previous examples are just that...examples. They are not meant to serve as proofs or intended to be exhaustive demonstrations, rather the hands-on application of a robust nonparametric regression in many different types of common machine learning problems. + +See the [book](https://ovvo-financial.github.io/NNS/book/) or the [papers available on SSRN](https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=1421356), if you'd like to learn *why* & *how* NNS does what it does. + + +# Thank you for your interest in NNS! + + diff --git a/tools/NNS/examples/Regression_Residuals.html b/tools/NNS/examples/Regression_Residuals.html new file mode 100644 index 0000000..b601278 --- /dev/null +++ b/tools/NNS/examples/Regression_Residuals.html @@ -0,0 +1,502 @@ + + + + + + + + + + + + + + +Regression Residuals + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    1 Intro

    +

    There was a recent blog post on plotting estimated values \((\hat{y})\) versus actual values \((y)\) here.

    +

    The premise is to plot the residuals as a function of the model prediction and compare them to the line \(y = 0\), using a smoothing curve through the residuals. The idea is that for a well-fit model, the smoothing curve should lie approximately on the line \(y = 0\). This is true not only for linear models, but for any model that captures most of the explainable variance, and for which the unexplainable variance (the noise) is IID and zero mean.

    +

    If the residuals aren’t zero mean independently of the model’s predictions, then either you are missing some explanatory variables, or your model does not have the correct structure, or an appropriate inductive bias.
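To make the diagnostic concrete, here is a small base-R sketch under assumed conditions: a quadratic process fitted once with a straight line and once with the correct structure. The data, the helper `smooth_dev`, and the use of `loess` as the smoother are illustrative choices, not taken from the blog post.

```r
set.seed(42)
n <- 200
x <- runif(n)
y_quad <- 2 * x^2 + rnorm(n, sd = 0.1)   # quadratic process with small noise

fit_lin  <- lm(y_quad ~ x)        # wrong structure: straight line
fit_quad <- lm(y_quad ~ I(x^2))   # correct structure

# Largest distance between a loess smooth of the residuals and the line y = 0
smooth_dev <- function(fit) {
  sm <- loess(resid(fit) ~ fitted(fit))
  max(abs(predict(sm)))
}

dev <- c(linear = smooth_dev(fit_lin), quadratic = smooth_dev(fit_quad))
dev

# Both models have residuals that average to zero overall
c(mean(resid(fit_lin)), mean(resid(fit_quad)))
```

The overall residual mean is zero for both fits because each includes an intercept, so it cannot distinguish them. What separates the two is whether the residual mean stays near zero at every level of the prediction, which is what the smoothing curve shows.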

    +

    The following examples demonstrate NNS.reg() residuals to have 0 mean as an artifact of the quadrant partitioning method. We compare NNS to a basic linear regression and a nonlinear kernel regression.
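As a rough intuition for why a partition-based fit yields zero-mean residuals, consider a deliberately simplified stand-in: split the regressor into fixed bins and predict each bin by its mean response. This is not the NNS quadrant partitioning algorithm, only a sketch of the general principle, and all names below are hypothetical.

```r
set.seed(7)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.2)

# Four equal-width partitions of x (a crude stand-in for adaptive partitioning)
part <- cut(x, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)

# Fitted value for each observation is the mean response of its partition
y_hat <- ave(y, part)
res   <- y - y_hat

# Residual mean inside every partition, and overall: zero up to floating point
tapply(res, part, mean)
mean(res)
```

Because each fitted value is a local mean, the residuals cancel within every partition, and therefore also in aggregate, regardless of how nonlinear the underlying process is.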

    +
    +
    +

    2 Install NNS (>=0.4.6) from GitHub

    +
    #require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +
    +library(NNS)
    +library(ggplot2)
    +library(np)
    +options(np.messages = FALSE)
    +
    +
    +

    3 Generate Data and Run Regressions

    +
    set.seed(34524)
    +N = 100
    +x1 = runif(N)
    +x2 = runif(N)
    +noise = 0.25*rnorm(N)
    +y = x1 + x2 + noise
    +qf = data.frame(x1=x1, x2=x2, y=y)
    +model = lm(y~x1+x2, data=qf)
    +
    +qf$pred = predict(model, newdata=qf)
    +qf$residual = with(qf, y-pred)
    +
    +
    +
    +nns_model = NNS.reg(qf[, 1:2], qf$y, residual.plot = FALSE, dist = "L2")$Fitted.xy
    +
    +nns_cross_val = NNS.stack(qf[, 1:2], qf$y, IVs.test = qf[, 1:2], method = 1, dist = "L2")$stack
    +nns_cv_resid = nns_cross_val - qf$y
    +nns_cv = cbind.data.frame(y.hat = nns_cross_val, residuals = nns_cv_resid, y = qf$y)
    +
    +
    +bw = npregbw(xdat=qf[,1:2],ydat=qf$y)
    +np_model = npreg(bws = bw, residuals = TRUE)
    +
    +np = data.frame(cbind(mean=np_model$mean, resid=np_model$resid))
    +
    +# standard residual plot
    +ggplot(qf, aes(x=pred, y=residual)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "linear model and process")
    +

    +
    ggplot(nns_model, aes(x=y.hat, y=residuals)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "NNS model and process")
    +

    +
    ggplot(nns_cv, aes(x=y.hat, y=residuals)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "NNS CV model and process")
    +

    +
    ggplot(np, aes(x=mean, y=resid)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "np model and process")
    +

    +
    ggplot(qf, aes(x=pred, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard prediction plot")
    +

    +
    ggplot(nns_model, aes(x=y.hat, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("NNS Standard prediction plot")
    +

    +
    ggplot(nns_cv, aes(x=y.hat, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("NNS CV Standard prediction plot")
    +

    +
    ggplot(np, aes(x=mean, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("np Standard prediction plot")
    +

    +
    x3 = runif(N)
    +y = x1 + x2 + 2*x3^2 + 0.25*noise
    +qf = data.frame(x1=x1, x2=x2, x3=x3, y=y)
    +
    +# Fit a linear regression model
    +model2 = lm(y~x1+x2+x3, data=qf)
    +# summary(model2)
    +
    +qf$pred = predict(model2, newdata=qf)
    +qf$residual = with(qf, y-pred)
    +
    +
    +
    +nns_model2 = NNS.reg(qf[,1:3], qf$y, residual.plot = FALSE, dist = "L2")$Fitted.xy
    +
    +nns_cross_val_2 = NNS.stack(qf[, 1:3], qf$y, IVs.test = qf[, 1:3], method = 1, dist = "L2")$stack
    +nns_cv_resid_2 = nns_cross_val_2 - qf$y
    +nns_cv_2 = cbind.data.frame(y.hat = nns_cross_val_2, residuals = nns_cv_resid_2, y = qf$y)
    +
    +
    +bw = npregbw(xdat=qf[,1:3],ydat=qf$y)
    +np_model2 = npreg(bws = bw, residuals = TRUE)
    +
    +np2 = data.frame(cbind(mean=np_model2$mean, resid=np_model2$resid))
    +
    +
    +
    +ggplot(qf, aes(x=pred, y=residual)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "linear model, quadratic process")
    +

    +
    ggplot(nns_model2, aes(x=y.hat, y=residuals)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("NNS Standard residual plot",
    +          subtitle = "NNS model and process")
    +

    +
    ggplot(nns_cv_2, aes(x=y.hat, y=residuals)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard residual plot",
    +          subtitle = "NNS CV model and process")
    +

    +
    ggplot(np2, aes(x=mean, y=resid)) + 
    +  geom_point(alpha=0.5) + geom_hline(yintercept=0, color="red") +
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("np Standard residual plot",
    +          subtitle = "np model and process")
    +

    +
    ggplot(qf, aes(x=pred, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("Standard prediction plot")
    +

    +
    ggplot(nns_model2, aes(x=y.hat, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("NNS Standard prediction plot")
    +

    +
    ggplot(nns_cv_2, aes(x=y.hat, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("NNS CV Standard prediction plot")
    +

    +
    ggplot(np2, aes(x=mean, y=y)) + 
    +  geom_point(alpha=0.5) + geom_abline(color="red") + 
    +  geom_smooth(se=FALSE) + 
    +  ggtitle("np Standard prediction plot")
    +

    +
    +
    +

    4 Comments

    +

    NNS.reg is not a black-box ML solution; rather, NNS provides a theoretically sound solution for dynamically partitioning regressors and finding conditional outputs from those partitions.

    +

    NNS is not a one-trick pony: it has been demonstrated to excel in time-series forecasting and nonlinear continuous regressions, and to provide solutions for econometric applications. See the following examples:

    + +

    I look forward to further discussions and collaboration with those equally passionate about these issues and open to embracing alternative solutions. If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Spiral Matching Example.pdf b/tools/NNS/examples/Spiral Matching Example.pdf new file mode 100644 index 0000000..7c1be13 Binary files /dev/null and b/tools/NNS/examples/Spiral Matching Example.pdf differ diff --git a/tools/NNS/examples/Sunspots_example.html b/tools/NNS/examples/Sunspots_example.html new file mode 100644 index 0000000..60dac0f --- /dev/null +++ b/tools/NNS/examples/Sunspots_example.html @@ -0,0 +1,1931 @@ + + + + + + + + + + + + + +Sunspots: NNS vs. KERAS + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + +
    +

    Intro

    +

    We are comparing the RMSE of eleven 10-year predictions (120 monthly observations each) for the sunspot.month dataset in R using KERAS LSTM and NNS.ARMA.

    +
    +

    Long Short-Term Memory (LSTM) Models

    +

    LSTMs are quite useful in time series prediction tasks involving autocorrelation, the presence of correlation between the time series and lagged versions of itself, because of their ability to maintain state and recognize patterns over the length of the time series.

    +

    In normal (or stateless) mode, KERAS shuffles the samples, and the dependencies between the time series and the lagged version of itself are lost. However, when run in stateful mode, we can often get high accuracy results by leveraging the autocorrelations present in the time series.
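The autocorrelation referred to here can be inspected directly in base R. The sketch below computes the sample autocorrelation of `sunspot.month` at a few lags; the choice of lags (1, 12, and 132 months) is an assumption made for illustration.

```r
# sunspot.month ships with base R (datasets package)
s <- as.numeric(sunspot.month)

# Sample autocorrelation for lags 0 through 132 months
a <- acf(s, lag.max = 132, plot = FALSE)

# a$acf[k + 1] is the correlation between the series and its lag-k copy
lag_of_interest <- c(1, 12, 132)
round(a$acf[lag_of_interest + 1], 3)
```

Index 1 of `a$acf` holds lag 0, which is always 1, hence the `+ 1` offset when looking up a specific lag.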

    +

    A KERAS LSTM description and example, with accompanying code (lots of it!), are available via the following link: https://www.business-science.io/timeseries-analysis/2018/04/18/keras-lstm-sunspots-time-series-prediction.html

    +
    +
    +

    NNS.ARMA

    +

    NNS, in its forecasting routine NNS.ARMA, also maintains the dependencies between the time series and lagged values of itself, and then uses these dependencies in a regression (either linear or nonlinear).
    +See here for a thorough description: https://www.researchgate.net/publication/327495856_Forecasting

    +
    +

    Install the Latest Version of NNS (>= 11.6.3)

    +
      +
    1. +
    +
    #library(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +library(Metrics)
    +library(xts)
    +library(zoo)
    +library(doParallel)
    +
    +
    +
    +
    +

    Step 1: Create the Subsets of sunspot.month

    +

    We need to align dates to the KERAS example, so we create an xts object to manipulate the dates and verify they match.

    +
    sunspots_xts <- as.xts(sunspot.month)
    +
    +tmp <- tempfile()
    +
    +write.zoo(sunspots_xts, sep = ",", file = tmp)
    +
    +sun <- read.zoo(tmp, sep = ",", FUN = as.yearmon)
    +
    +sun_xts <- as.xts(sun)
    +
    +# End date for each slice, chosen to match the KERAS example
    +dates=c("/18081202", "/18290102", "/18490202", "/18690302", "/18890402",
    +        "/19090502","/19290602","/19490702","/19690802","/19890902","/20091002")
    +
    +Slice=list()
    +
    +for(i in 1:11){
    +  Slice[[paste0("Slice_",i)]] = sun_xts[dates[i]]
    +  print(tail(Slice[[i]],1))
    +}
    +
    ##          [,1]
    +## Dec 1808 12.3
    +##          [,1]
    +## Jan 1829   43
    +##           [,1]
    +## Feb 1849 131.8
    +##          [,1]
    +## Mar 1869 52.7
    +##          [,1]
    +## Apr 1889  4.3
    +##          [,1]
    +## May 1909   36
    +##          [,1]
    +## Jun 1929 71.9
    +##           [,1]
    +## Jul 1949 125.8
    +##          [,1]
    +## Aug 1969   98
    +##           [,1]
    +## Sep 1989 176.7
    +##          [,1]
    +## Oct 2009  4.8
    +
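For readers without xts installed, the same end-date slicing can be approximated with base R's `window()` on the original `ts` object. This is a sketch of an alternative, not the method used above; only the first three end dates from the `dates` vector are shown.

```r
# End dates (year, month) for the first three slices, taken from the `dates` vector above
ends <- list(c(1808, 12), c(1829, 1), c(1849, 2))

# window() keeps every observation up to and including the given end date
slices <- lapply(ends, function(e) window(sunspot.month, end = e))

# Number of monthly observations and final time stamp of each slice
sapply(slices, length)
sapply(slices, function(s) tail(time(s), 1))
```

Since `sunspot.month` starts in January 1749, the first slice (ending December 1808) contains exactly 60 years, or 720 monthly observations.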

    +
    +
    +

    Step 2: Determine the seasonality

    +

    Here’s the general procedure on a single Slice.

    +

    We use a modulo 12 (%%12) call on the generated seasonal periods from NNS.seas to find the nearest logical annual cycle data point.

    +

    There is no test set leakage into the detected seasonal periods, because we drop the last 120 observations from the series and store the remainder in the new variable training.

    +
    training = Slice[[6]][1:(length(Slice[[6]])-120)]
    +
    +periods = NNS.seas(training, modulo = 12)$periods
    +

    +
    head(periods)
    +
    ## [1] 684 876 564 804 420 588
    +
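The hold-out logic and the annual-cycle property of the reported periods can be checked with a few lines of base R. The sketch below rebuilds the Slice 6 span with `window()` (an assumed base-R equivalent of the xts slice) and uses the six periods printed above.

```r
h <- 120                                                        # forecast horizon in months
series <- as.numeric(window(sunspot.month, end = c(1909, 5)))   # same span as Slice 6

n <- length(series)
training <- series[1:(n - h)]        # used for seasonality detection and fitting
holdout  <- series[(n - h + 1):n]    # touched only when scoring the forecasts

c(training = length(training), holdout = length(holdout))

# Every period printed above is a whole number of years
periods_shown <- c(684, 876, 564, 804, 420, 588)
periods_shown %% 12
```

Keeping `holdout` out of every call that estimates seasonality or parameters is what guarantees that the final RMSE is an out-of-sample figure.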
    +
    +

    Step 3: Determine the Optimal Parameters

    +

    The NNS.ARMA.optim routine checks various parameter combinations in order to best fit the training set according to any objective function specified. NNS.ARMA.optim returns the optimal seasonal periods, the objective function result, and which NNS regression method was used: linear, nonlinear, or a combination of both. Here we keep only the forecasts, the $results element, in the variable nns.predict.

    +

    We also specify the objective function via obj.fn = expression(Metrics::rmse(actual, predicted)) and set objective = "min" to minimize it. Any objective function can be used, as long as it refers to the specific terms actual and predicted within the expression(...) call.

    +
    start.time = Sys.time()
    +nns.predict = NNS.ARMA.optim(variable = training,
    +                             h = 120,
    +                             seasonal.factor = periods,
    +                             obj.fn = expression(Metrics::rmse(actual, predicted)),
    +                             objective = "min",
    +                             print.trace = TRUE)$results
    +
    ## [1] "CURRNET METHOD: lin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 132 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 27.162508733888"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 132, 276 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 24.6431883515214"
    +## [1] "BEST method = 'lin', seasonal.factor = c( 132, 276 )"
    +## [1] "BEST lin OBJECTIVE FUNCTION = 24.6431883515214"
    +## [1] "CURRNET METHOD: nonlin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'nonlin' , seasonal.factor =  c( 132, 276 ) ...)"
    +## [1] "CURRENT nonlin OBJECTIVE FUNCTION = 35.3808838776509"
    +## [1] "BEST method = 'nonlin' PATH MEMBER = c( 132, 276 )"
    +## [1] "BEST nonlin OBJECTIVE FUNCTION = 35.3808838776509"
    +## [1] "CURRNET METHOD: both"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'both' , seasonal.factor =  c( 132, 276 ) ...)"
    +## [1] "CURRENT both OBJECTIVE FUNCTION = 27.9810674881252"
    +## [1] "BEST method = 'both' PATH MEMBER = c( 132, 276 )"
    +## [1] "BEST both OBJECTIVE FUNCTION = 27.9810674881252"
    +
    Sys.time() - start.time
    +
    ## Time difference of 40.21502 secs
    +
    +
    +

    Step 4: Calculate NNS RMSE

    +

    The performance measure used is the root mean squared error (RMSE) +against the last 120 observations of that particular +Slice.

    +
    Metrics::rmse(predicted = nns.predict, actual = tail(Slice[[6]], 120))
    +
    ## [1] 24.93657
    +
    +
    +

    Step 5: Compare NNS RMSE with KERAS LSTM RMSE

    +

    Below is an image of the last 10 years’ prediction of Slice 11. NNS is significantly more accurate…

    +

    +
    +
    +

    Step 6: Let’s try all the slices…

    +

    We compare the KERAS rolling forecasts to the NNS forecasts for each Slice. KERAS generates \(\mu_{RMSE}=34.4\) and \(\sigma_{RMSE} = 13.0\).

    +

    +
    start.time=Sys.time()
    +NNS.RMSE = list()
    +
    +# Run in parallel (doParallel also attaches foreach and parallel)
    +library(doParallel)
    +
    +cl <- makeCluster(detectCores()-1)
    +registerDoParallel(cl)
    +
    +
    +
    +NNS.RMSE <- foreach(i = 1:11,.packages=c("NNS", "Metrics") ) %dopar% {
    +  training = as.vector(Slice[[i]][1:(length(Slice[[i]])-120)])
    +  
    +  # Seasonality per slice
    +  periods = NNS.seas(training, modulo = 12, plot = FALSE)$periods
    + 
    +  # Determine optimal parameters
    +  nns.predictions = NNS.ARMA.optim(variable = training,
    +                                   h = 120,
    +                                   seasonal.factor = periods,
    +                                   obj.fn = expression(Metrics::rmse(actual, predicted)),
    +                                   objective = "min",
    +                                   ncores = 1,
    +                                   print.trace = FALSE)$results
    +               
    +  # RMSE call
    +  Metrics::rmse(predicted = nns.predictions, actual = tail(Slice[[i]], 120))
    +}
    +
    +stopCluster(cl)
    +registerDoSEQ()
    +print(Sys.time()-start.time)
    +
    ## Time difference of 4.545993 mins
    +
    +
    +

    Results & Comments

    +
    +

    Per Slice results:

    +
    "NNS.RMSE"=as.vector(unlist(NNS.RMSE))
    +"KERAS.RMSE"=c(48.2,17.4,41,26.6,22.2,49,18.1,54.9,28,38.4,34.2)
    +
    +print(as.data.frame(cbind(NNS.RMSE, KERAS.RMSE, "NNS % Improvement"=1-NNS.RMSE/KERAS.RMSE),
    +              row.names = names(Slice)))
    +
    ##          NNS.RMSE KERAS.RMSE NNS % Improvement
    +## Slice_1  23.20313       48.2        0.51860729
    +## Slice_2  26.36671       17.4       -0.51532841
    +## Slice_3  31.80686       41.0        0.22422288
    +## Slice_4  30.10533       26.6       -0.13177926
    +## Slice_5  34.32120       22.2       -0.54599999
    +## Slice_6  24.93657       49.0        0.49109032
    +## Slice_7  26.22320       18.1       -0.44879550
    +## Slice_8  67.10959       54.9       -0.22239690
    +## Slice_9  26.13460       28.0        0.06662145
    +## Slice_10 29.38657       38.4        0.23472487
    +## Slice_11 33.57341       34.2        0.01832141
    +
    +
    +

    Aggregate results:

    +
    "Mean NNS.RMSE"=mean(NNS.RMSE)
    +
    +"SD NNS.RMSE"=sd(NNS.RMSE)
    +
    +rbind(`Mean NNS.RMSE`,
    +      `Mean KERAS.RMSE`=mean(KERAS.RMSE),
    +      `SD NNS.RMSE`,
    +      `SD KERAS.RMSE`=sd(KERAS.RMSE))
    +
    ##                     [,1]
    +## Mean NNS.RMSE   32.10611
    +## Mean KERAS.RMSE 34.36364
    +## SD NNS.RMSE     12.15593
    +## SD KERAS.RMSE   12.99525
    +

    NNS shows a lower mean RMSE (32.1 vs. 34.4) and a lower standard deviation of RMSE (12.2 vs. 13.0) than KERAS LSTM over 11 samples of the sunspots data, and it has the lower RMSE on 6 of the 11 slices.
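The aggregate comparison can be cross-checked slice by slice. The sketch below re-enters the RMSE values printed in the per-slice table and tallies how often NNS has the lower error; the variable names are illustrative, and no statistical test is implied.

```r
# RMSE values copied from the per-slice table above
nns.rmse   <- c(23.20313, 26.36671, 31.80686, 30.10533, 34.32120, 24.93657,
                26.22320, 67.10959, 26.13460, 29.38657, 33.57341)
keras.rmse <- c(48.2, 17.4, 41, 26.6, 22.2, 49, 18.1, 54.9, 28, 38.4, 34.2)

# Fractional improvement of NNS over KERAS, as in the table
improvement <- 1 - nns.rmse / keras.rmse

# Number of slices where NNS has the lower RMSE
nns.wins <- sum(improvement > 0)
print(c(nns.wins = nns.wins, slices = length(improvement)))

# Difference in mean RMSE and the median per-slice improvement
print(c(mean.diff = mean(keras.rmse) - mean(nns.rmse),
        median.improvement = median(improvement)))
```

Looking at the per-slice tally alongside the means helps show whether the average is driven by a few slices.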

    +

    The parsimonious code and interpretability of parameter settings clearly favor NNS as well. Run-times, however, significantly favor KERAS, and that is solely a function of not having comparable resources to Google! NNS is not optimized / parallelized, which would enable even more combinatorial parameter testing over the objective function space… this is a work in progress. So to arguments that KERAS is not optimized… neither is NNS!

    +

    If you have any related questions / comments, feel free to e-mail:

    +

    Thanks for your interest!

    +
    +
    + + + + +
    + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/Time_Series_Classification.html b/tools/NNS/examples/Time_Series_Classification.html new file mode 100644 index 0000000..6ec7dee --- /dev/null +++ b/tools/NNS/examples/Time_Series_Classification.html @@ -0,0 +1,528 @@ + + + + + + + + + + + + + +Time Series Classification + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + +
    +

    Intro

    +

    I adapted the problem from http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data.html. The goal is to classify each time series into 1 of 6 types.

    +

    This dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). There are six different classes of control charts:

    +
      +
    1. Normal
    2. Cyclic
    3. Increasing trend
    4. Decreasing trend
    5. Upward shift
    6. Downward shift
    +

    The following image shows ten examples from each class:

    +
    +
    +

    Load Required Packages in R NNS (>= 3.9.1)

    +
    require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +require(NNS)
    +require(plyr)
    +require(data.table)
    +
    +
    +

    Download Data

    +

    Scan from the URL and convert into a matrix with each time series as a row and each of its 60 observations as a column.

    +
    tsdata <- scan(url("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data"))
    +tsdata <- matrix(tsdata, ncol=60, byrow = TRUE)
    +dim(tsdata)
    +
    ## [1] 600  60
    +
    +
    +

    Augment tsdata matrix with labels

    +

    The following were put into columns of the matrix tsdata:

    +
      +
    • There are 60 observations per time series, as features or columns

    • +
    • The classification ID of each time series, i.e. the class it belongs to. The classes are organized by row as follows:

      +
        1-100   Normal
      +  101-200 Cyclic
      +  201-300 Increasing trend
      +  301-400 Decreasing trend
      +  401-500 Upward shift
      +  501-600 Downward shift
    • +
    +

    This leaves us with a matrix of 600 rows and 61 columns.

    +
    series = cbind(tsdata ,class.id = rep(1:6, each = 100))
    +dim(series)
    +
    ## [1] 600  61
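The labelling step relies on the rows being ordered by class. A minimal sketch on a toy matrix, with made-up dimensions (12 series of 5 observations, 2 per class) standing in for the real 600 x 60 data:

```r
# Toy stand-in for tsdata: 12 series (rows) with 5 observations each
set.seed(1)
toy <- matrix(rnorm(12 * 5), nrow = 12, ncol = 5)

# rep(1:6, each = 2) yields 1 1 2 2 ... 6 6, matching row-ordered classes
toy.labelled <- cbind(toy, class.id = rep(1:6, each = 2))

print(dim(toy.labelled))
print(table(toy.labelled[, "class.id"]))
```

If the rows were not already grouped by class, rep(..., each = ...) would attach the wrong labels, so this shortcut only applies to data laid out like the synthetic control set.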
    +
    +
    +

    Create train and test sets

    +

    There are 600 time series. We will randomly remove 20% of them (120 series) for an out-of-sample experiment.

    +
    set.seed(123)
    +test.index = sample(1:600, 120, replace = FALSE)
    +test.index
    +
    ##   [1] 173 473 245 528 561  28 314 530 327 270 565 268 399 337  61 527 144
    +##  [18]  25 191 555 516 402 371 574 378 408 313 341 166  85 549 514 393 452
    +##  [35]  14 591 428 122 179 130  80 232 231 206 571  78 596 258 147 599  26
    +##  [52] 243 438  67 307 113  70 410 486 203 360  52 207 148 437 240 433 434
    +##  [69] 423 234 400 333 375   1 251 116 200 321 184  58 127 347 217 575  54
    +##  [86] 224 507 459 454  90 547 529 175 509 163  95 395  48 235 257 300 167
    +## [103] 244 475 535 441 567 301 541  73 513 537  30 462 351 544 266 461 283
    +## [120] 195
    +

    Checking our dimensions to ensure 120 time series have been removed.

    +
    training.ts.set = series[-test.index, ]
    +dim(training.ts.set)
    +
    ## [1] 480  61
    +
    test.ts.set = series[test.index, ]
    +dim(test.ts.set)
    +
    ## [1] 120  61
    +
    # Distribution of test set classifications
    +hist(series[ , "class.id"][test.index])
    +

    +
    +
    +

    Use NNS.reg to find the most similar

    +

    The settings NNS.reg(..., order = "max", n.best = 1, ...) behave much like k-nearest neighbors with k = 1 (although the general NNS case exhibits significant weighting differences and less sensitivity to the curse of dimensionality). We set up the independent variables for training, IVs.train; the dependent variable DV, which is the class.id from our earlier training set training.ts.set; and the independent variables for testing, IVs.test.

    +
    IVs.train = training.ts.set[ , colnames(training.ts.set) != "class.id"]
    +DV = training.ts.set[ , "class.id"]
    +IVs.test = test.ts.set[ , colnames(test.ts.set) != "class.id"]
    +

    Using these variables in NNS.reg, we store the resulting Point.est as predictions.

    +
    predictions = NNS.reg(IVs.train, DV, point.est = IVs.test, order = "max", n.best = 1, plot = FALSE, residual.plot = FALSE)$Point.est
    +

    Finally, a check of our accuracy yields:

    +
    mean(round(predictions) == test.ts.set[ , "class.id"])
    +
    ## [1] 0.9833333
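For readers who want a point of comparison, the sketch below is a plain Euclidean 1-nearest-neighbor classifier in base R. It is a stand-in baseline and not what NNS.reg computes; the function name nn1_classify and the toy data are hypothetical.

```r
# Plain 1-nearest-neighbor baseline (Euclidean distance), NOT NNS.reg
nn1_classify <- function(train.x, train.y, test.x) {
  unname(apply(test.x, 1, function(row) {
    # Squared distance from this test row to every training row
    d <- colSums((t(train.x) - row)^2)
    train.y[which.min(d)]
  }))
}

# Toy data: three well-separated training points, one per class
train.x <- rbind(c(0, 0), c(10, 10), c(0, 10))
train.y <- c(1, 2, 3)
test.x  <- rbind(c(1, 1), c(9, 8), c(2, 9))

toy.pred <- nn1_classify(train.x, train.y, test.x)
print(toy.pred)
```

On the control chart data, the same function could be called with IVs.train, DV and IVs.test to see how a conventional nearest-neighbor baseline compares with the NNS.reg accuracy reported above.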
    +
    +
    +

    More seeds

    +

    Let’s try 100 different seeds and visualize the results.

    +
    results=numeric()
    +
    +for(i in 1:100){
    +set.seed(123+i)
    +test.index = sample(1:600, 120, replace = FALSE)
    +
    +training.ts.set = series[-test.index, ]
    +test.ts.set = series[test.index, ]
    +
    +IVs.train = training.ts.set[,colnames(training.ts.set) != "class.id"]
    +DV = training.ts.set[, "class.id"]
    +IVs.test = test.ts.set[,colnames(test.ts.set) != "class.id"]
    +
    +predictions = NNS.reg(IVs.train, DV, point.est = IVs.test, order = "max", n.best = 1, plot = FALSE, residual.plot = FALSE)$Point.est
    +
    +results[i] = mean(round(predictions) == test.ts.set[ , "class.id"])
    +}
    +
    +hist(results)
    +

    +
    mean(results); sd(results)
    +
    ## [1] 0.9861667
    +
    ## [1] 0.00999579
    +
    +
    +

    Comments

    +

    By reducing the order parameter in the NNS.reg function, we can encode the time series with its regression points, which are clusters, thus permitting larger time-series analysis. This is an active area of development. Also, partial time-series matching is another area of active development.

    +

    If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + + +
    + + + + + + + + diff --git a/tools/NNS/examples/Time_Series_Classification_Expanded--Nasdaq-.html b/tools/NNS/examples/Time_Series_Classification_Expanded--Nasdaq-.html new file mode 100644 index 0000000..bc9b6a0 --- /dev/null +++ b/tools/NNS/examples/Time_Series_Classification_Expanded--Nasdaq-.html @@ -0,0 +1,3294 @@ + + + + + + + + + + + + + + +Nasdaq Composite Classification + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    Intro

    +

    We are going to define a function for comparing a time series against one or more reference time series to find its most representative counterpart. This expands the analysis already presented in “Time-Series Classification” by using vectors of different lengths for partial time-series matching.

    +

    In this example we will find the period in the history of the Nasdaq composite since 1971 that is most similar to the current year’s (2026) returns through May 2026.

    +

    Similarity is measured on compounded cumulative returns, cumprod(1 + r), rather than raw daily returns. This makes every metric path-aware: two windows must travel the same cumulative route, not merely share day-by-day co-movement.
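A small sketch of what path-awareness means in practice: the two toy return vectors below (made up for illustration) contain the same daily returns in a different order, so they end at the same cumulative value but travel different routes.

```r
# Same helper as used later in this document
cumret <- function(x) cumprod(1 + as.numeric(x))

# Hypothetical daily returns: identical values, reversed order
r.a <- c(0.05, -0.02, 0.01)
r.b <- rev(r.a)

path.a <- cumret(r.a)
path.b <- cumret(r.b)

# Both paths end at the same compounded value...
print(c(end.a = tail(path.a, 1), end.b = tail(path.b, 1)))

# ...but the routes differ, which a path-based metric can detect
print(sum((path.a - path.b)^2))
```

A metric computed on the sorted or pooled daily returns would treat these two windows as identical, whereas one computed on the cumulative paths does not.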

    +
    +
    +

    Load Required Packages in R

    +
    #require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +library(lubridate)
    +library(data.table)
    +library(quantmod)
    +library(zoo)
    +
    +
    +

    Get data

    +

    We will load the data for the Nasdaq composite from +1971. This will create the variable IXIC. Further, we will +create:

    +
      +
    1. A compounded cumulative return dataset: current_cumret & windows derived from reference_returns
    2. A dataset of closing prices: current & reference
    3. A window length: w
    +
    # Download data
    +getSymbols('^IXIC', src='yahoo', from=as.Date("1971-01-01"))
    +
    ## [1] "IXIC"
    +
    # Daily returns (arithmetic, explicit)
    +reference_returns <- dailyReturn(IXIC, type = "arithmetic")
    +
    +reference <- IXIC$IXIC.Adjusted
    +
    +# Helper: compounded cumulative return path from a vector of arithmetic returns
    +cumret <- function(x) cumprod(1 + as.numeric(x))
    +
    +# Create current distribution
    +current_year    <- year(Sys.Date())
    +current_returns <- reference_returns[as.character(current_year)]
    +current         <- reference[as.character(current_year)]
    +current_cumret  <- cumret(current_returns)
    +
    +# Window length
    +w <- length(current_cumret)
    +
    +
    +

    Which Calendar Year is Most Similar?

    +

    Let’s identify the calendar year which is most similar. We will create a series of reference distributions, one from the beginning of each calendar year and each of equal length to the current distribution. This just uses NNS.reg(..., order = "max", n.best = 1), which acts as a k-nearest-neighbors surrogate with k = 1 in this classification instance. Each year’s series is expressed as its compounded cumulative return path.

    +

    First, a quick test to see if using the current year actually returns +the current year…

    +
    cumret_list <- list()
    +first_year <- lubridate::year(index(IXIC)[1])
    +last_year <- current_year
    +
    +
    +
    +for(i in seq(first_year, last_year, 1)){
    +    idx <- which(i == seq(first_year, last_year, 1))
    +    yr_returns <- reference_returns[as.character(i)][1:w]
    +    cumret_list[[idx]] <- cumret(yr_returns)
    +}
    +
    +IV <- matrix(unlist(cumret_list), ncol = w, byrow = TRUE)
    +DV <- seq(first_year, last_year, 1)
    +
    +nns.estimate <- NNS.reg(IV, DV, point.est = t(current_cumret), order = "max", n.best = 1,
    +                        residual.plot = FALSE, type = "CLASS")$Point.est
    +
    +nns.estimate
    +
    ## [1] 2026
    +

    Yup, exact match! Let’s remove the current_year from the +cumret_list.

    +
    cumret_list <- list()
    +first_year <- lubridate::year(index(IXIC)[1])
    +last_year <- current_year - 1
    +
    +for(i in seq(first_year, last_year, 1)){
    +    idx <- which(i == seq(first_year, last_year, 1))
    +    yr_returns <- reference_returns[as.character(i)][1:w]
    +    cumret_list[[idx]] <- cumret(yr_returns)
    +}
    +
    +IV <- matrix(unlist(cumret_list), ncol = w, byrow = TRUE)
    +DV <- seq(first_year, last_year, 1)
    +
    +nns.estimate <- NNS.reg(IV, DV, point.est = t(current_cumret), order = "max", n.best = 1,
    +                        residual.plot = FALSE, type = "CLASS")$Point.est
    +
    +nns.estimate
    +
    ## [1] 2007
    +
    # View the plot on both series
    +plot(as.zoo(reference[as.character(nns.estimate)][1:w]), col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("NNS \n Most Similar Calendar Year \n", nns.estimate))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +future_IXIC <- reference[as.character(nns.estimate)][w:min((2*w), length(reference[as.character(nns.estimate)]))]
    +plot(future_IXIC)
    +

    +
    +
    +

    Finding the Most Similar Period (Non-Calendar Year)

    +

    Next we will eschew the calendar year and scan overlapping windows to +find the most similar period. The independent variable matrices get +excessive (since each observation is an IV and they sequentially +overlap), so we can simplify by minimizing the sum-of-squared +differences, or maximizing the linear correlation of compounded +cumulative return vectors to determine similarity.

    +
    +
    +

    Use any length of latest returns

    +

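    We will use all of the current year’s returns to date, here the last 88 trading days. Uncommenting the first line below restricts the window to any other length.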

    +
    #current_returns <- tail(current_returns, 60) #Set to any length
    +w              <- length(current_returns)
    +current_cumret <- cumret(current_returns)
    +
    +

    Remove current period, also a verification of similarity!

    +

    If we do not remove the last w observations from the +reference data, we will match exactly to the +current starting period.

    +
    # Measure time
    +start.time <- Sys.time()
    +
    +best.period.1 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) cor(cumret(reference_returns[i:(i + w - 1)]),
    +                    current_cumret)
    +  )
    +)
    +
    +# Calculate elapsed time
    +print(Sys.time() - start.time)
    +
    ## Time difference of 0.9044051 secs
    +
    # Output the best matching period's value in IXIC
    +IXIC[best.period.1]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 2026-01-02  23481.49  23585.96 23119.49   23235.63  7331460000      23235.63
    +

    Yup, exact match! Let’s remove those current +observations from the reference.

    +
    # Trim both reference series together to keep them in sync
    +reference         <- head(reference,         nrow(reference)         - w)
    +reference_returns <- head(reference_returns, length(reference_returns) - w)
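The verification idea used above (a window that is actually present in the reference should be recovered by the scan) can be sketched on synthetic data. Everything below is simulated; the planted position 101 and the window length 30 are arbitrary assumptions.

```r
cumret <- function(x) cumprod(1 + as.numeric(x))

# Simulated daily returns standing in for reference_returns
set.seed(42)
sim.returns <- rnorm(300, mean = 0, sd = 0.01)

# Plant the "current" window inside the reference at position 101
sim.current <- sim.returns[101:130]
sim.w       <- length(sim.current)
sim.target  <- cumret(sim.current)

# Correlation of each overlapping window's cumulative path with the target
scores <- vapply(
  seq_len(length(sim.returns) - sim.w + 1),
  function(i) cor(cumret(sim.returns[i:(i + sim.w - 1)]), sim.target),
  numeric(1)
)

planted.match <- which.max(scores)
print(planted.match)
```

vapply() is used here instead of sapply() only to make the return type explicit; the scan itself mirrors the one in the document.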
    +
    +
    +

    Method 1: Linear Correlation of Compounded Returns

    +

    The first method maximises linear correlation between the compounded +cumulative return path of each reference window and the current +period.

    +
    best.period.1 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) cor(cumret(reference_returns[i:(i + w - 1)]),
    +                    current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[c(best.period.1, best.period.1 + w)]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1982-05-20    182.72    182.72   182.72     182.72           0        182.72
    +## 1982-09-24    188.61    188.61   188.61     188.61           0        188.61
    +
    # View Plot on Both Series
    +plot(as.zoo(reference[best.period.1:(best.period.1 + w)]), col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("LINEAR CORRELATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.1]), " : ",
    +                  index(reference_returns[(best.period.1 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +IXIC[c(best.period.1 + w, best.period.1 + (2*w))]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1982-09-24    188.61    188.61   188.61     188.61           0        188.61
    +## 1983-01-28    246.93    246.93   246.93     246.93           0        246.93
    +
    future_IXIC_1 <- reference[(best.period.1 + w):(best.period.1 + (2*w))]
    +plot(future_IXIC_1)
    +

    +
    # SSE on compounded returns
    +sum( (cumret(reference_returns[best.period.1:(best.period.1 - 1 + w)]) - current_cumret)^2 )
    +
    ## [1] 0.3221079
    +
    +
    +

    Method 2: Sum-of-Squared Differences of Compounded Returns

    +

    The second method minimises the sum of squared differences between +compounded cumulative return paths.

    +
    +

    Accuracy Check

    +

    First, let’s check its accuracy on the complete dataset to see if it +selects the current period as the most likely period…

    +
    # Reload full series for accuracy check — local variable, does not disturb reference_returns
    +reference_returns_full <- dailyReturn(IXIC, type = "arithmetic")
    +
    +best.period.2 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) sum((cumret(reference_returns_full[i:(i + w - 1)]) - current_cumret)^2)
    +  )
    +)
    +
    +# Output the most similar period's value in IXIC
    +IXIC[best.period.2]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 2026-01-02  23481.49  23585.96 23119.49   23235.63  7331460000      23235.63
    +

    Yup, it works!

    +
    +
    +

    Apply Sum-of-Squared Differences to Compounded Returns

    +

    Now we will apply the sum-of-squared differences to the compounded +cumulative returns, after removing the current observations +from the reference…again.

    +
    # reference_returns already has current period removed (from `remove` chunk)
    +best.period.2 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) sum((cumret(reference_returns[i:(i + w - 1)]) - current_cumret)^2)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[c(best.period.2, best.period.2 + w)]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1971-09-03    109.98    109.98   109.98     109.98           0        109.98
    +## 1972-01-10    116.10    116.10   116.10     116.10           0        116.10
    +
    # View Plot on Both Series
    +plot(as.zoo(reference[best.period.2:(best.period.2 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("SUM OF SQUARED DIFFERENCES \n Most Similar Period \n",
    +                  index(reference_returns[best.period.2]), " : ",
    +                  index(reference_returns[(best.period.2 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +IXIC[c(best.period.2 + w, best.period.2 + (2*w))]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1972-01-10    116.10    116.10   116.10     116.10           0        116.10
    +## 1972-05-15    130.34    130.34   130.34     130.34           0        130.34
    +
    future_IXIC_2 <- reference[(best.period.2 + w):(best.period.2 + (2*w))]
    +plot(future_IXIC_2)
    +

    +
    # SSE on compounded returns
    +sum( (cumret(reference_returns[best.period.2:(best.period.2 - 1 + w)]) - current_cumret)^2 )
    +
    ## [1] 0.0553841
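One reason Methods 1 and 2 can select different periods is that linear correlation ignores the scale of a path while the sum of squared differences does not. A minimal sketch with a made-up path and an amplified copy of it:

```r
# Hypothetical cumulative return path
path.a <- cumprod(1 + c(0.01, 0.02, -0.01, 0.015))

# Same shape around the starting value of 1, three times the amplitude
path.b <- 1 + 3 * (path.a - 1)

shape.cor <- cor(path.a, path.b)
level.sse <- sum((path.a - path.b)^2)

# Correlation sees a perfect match; SSE penalizes the larger moves
print(c(correlation = shape.cor, sse = level.sse))
```

This is why a window chosen by correlation can still carry a noticeably larger SSE than the window chosen by Method 2.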
    +
    +
    +
    +
    +

    Comments

    +
      +
    • Evaluating each method’s most similar compounded cumulative return path via the sum of squared errors shows that Method #2 has the best fit (0.055 vs. 0.322 for Method #1), as expected, since Method #2 minimizes this criterion directly.
    • +
    • We have not ventured far into the definition of \(similar\). We have demonstrated 2 methods +of determining similarity, and unfortunately, the future outcomes are +substantially varied based on the definition and evaluation +criteria.
    • +
    • Based on prevailing evaluation criteria, both future outcomes are significantly different from current levels over the next 88-day period.
    • +
    +
    # View Plot on Both Series
    +plot(as.zoo(future_IXIC_1), col = c("blue"), xlab = "Date", ylab = "Price", main = "Future Performance for Similar Periods")
    +par(new = TRUE)
    +plot(as.zoo(future_IXIC_2), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Future via Method #1 (LHS)","Future via Method #2 (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +

    I look forward to further discussions and collaboration with those +equally as passionate about these issues, and open to embracing +alternative solutions. If you found this presentation interesting or +useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    +
    +

    Other Similarity Measures

    +

    I have also included two further measures: a binary classification of the most similar period and dynamic time warping (DTW). The binary classification is the share of days where both compounded cumulative return paths are simultaneously above or below their common starting value of 1: sum(x > 1 & y > 1 | x < 1 & y < 1) / length(x). First, we will check their accuracy.
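As a small worked instance of the agreement measure just described, the sketch below applies the same formula to two made-up three-day return vectors. The helper name sign_agreement is hypothetical and takes cumulative paths directly.

```r
cumret <- function(x) cumprod(1 + as.numeric(x))

# Share of days on which both paths sit on the same side of 1
sign_agreement <- function(x.path, y.path) {
  sum(x.path > 1 & y.path > 1 | x.path < 1 & y.path < 1) / length(x.path)
}

# Hypothetical daily returns
x.path <- cumret(c(0.01, 0.01, -0.05))   # above, above, below 1
y.path <- cumret(c(-0.01, 0.03, -0.03))  # below, above, below 1

agreement <- sign_agreement(x.path, y.path)
print(agreement)
```

The paths agree on days 2 and 3 only, so the measure is two thirds. Because it only records which side of 1 each path is on, very different magnitudes can still score a perfect 1.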

    +
    custom_cor_2 <- function(x, y){
    +  xc <- cumret(x)
    +  yc <- cumret(y)
    +  a  <- sum(xc > 1 & yc > 1 | xc < 1 & yc < 1) / length(xc)
    +  return(a)
    +}
    +
    +# Reload full series for accuracy check
    +reference_returns_full <- dailyReturn(IXIC, type = "arithmetic")
    +
    +best.period.3 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) custom_cor_2(reference_returns_full[i:(i + w - 1)], current_returns)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[best.period.3]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 2026-01-02  23481.49  23585.96 23119.49   23235.63  7331460000      23235.63
    +
    require(TSdist)
    +
    +best.period.4 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) TSdist::DTWDistance(cumret(reference_returns_full[i:(i + w - 1)]), current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[best.period.4]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 2026-01-02  23481.49  23585.96 23119.49   23235.63  7331460000      23235.63
    +

    Yup, they both pass the accuracy check. Now let’s apply to the rest +of the series after removing the current observations…

    +
    # reference_returns already has current period removed (from `remove` chunk)
    +
    +best.period.3 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) custom_cor_2(reference_returns[i:(i + w - 1)], current_returns)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[c(best.period.3, best.period.3 + w)]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1988-09-27     382.5     382.9    382.1      382.2   103770000         382.2
    +## 1989-02-01     401.8     403.2    401.5      403.2   152890000         403.2
    +
    best.period.4 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) TSdist::DTWDistance(cumret(reference_returns[i:(i + w - 1)]), current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +IXIC[c(best.period.4, best.period.4 + w)]
    +
    ##            IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
    +## 1971-10-04    109.96    109.96   109.96     109.96           0        109.96
    +## 1972-02-07    121.14    121.14   121.14     121.14           0        121.14
    +
    par(mfrow = c(2,2))
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.1:(best.period.1 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("LINEAR CORRELATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.1]), " : ", index(reference_returns[(best.period.1 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.2:(best.period.2 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("SUM OF SQUARED DIFFERENCES \n Most Similar Period \n",
    +                  index(reference_returns[best.period.2]), " : ", index(reference_returns[(best.period.2 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.3:(best.period.3 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("BINARY CLASSIFICATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.3]), " : ", index(reference_returns[(best.period.3 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.4:(best.period.4 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("DYNAMIC TIME WARPING \n Most Similar Period \n",
    +                  index(reference_returns[best.period.4]), " : ", index(reference_returns[(best.period.4 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +

    +

    Again, different definitions of similarity will yield +different periods of association.
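To make the dynamic time warping measure less of a black box, here is a textbook DTW recursion in base R with an absolute-difference cost. It is an illustrative sketch and is not claimed to reproduce the exact settings of TSdist::DTWDistance; the function name dtw_distance is hypothetical.

```r
# Textbook dynamic time warping with |x_i - y_j| as the local cost
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # D[i + 1, j + 1] holds the cheapest alignment cost of x[1:i] and y[1:j]
  D <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

# A stretched copy aligns at zero cost; a shifted copy does not
dtw.stretched <- dtw_distance(c(1, 2, 3), c(1, 2, 2, 3))
dtw.shifted   <- dtw_distance(c(1, 2, 3), c(2, 3, 4))
print(c(stretched = dtw.stretched, shifted = dtw.shifted))
```

Because DTW allows points to be matched out of step, it tolerates differences in timing that correlation and SSE penalize, which is one reason it can select yet another period.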

    +
    +

    Future Paths

    +
    par(mfrow = c(2,2))
    +
    +future_IXIC_1 <- reference[(best.period.1 + w):(best.period.1 + (2*w))]
    +plot(future_IXIC_1)
    +
    +future_IXIC_2 <- reference[(best.period.2 + w):(best.period.2 + (2*w))]
    +plot(future_IXIC_2)
    +
    +future_IXIC_3 <- reference[(best.period.3 + w):(best.period.3 + (2*w))]
    +plot(future_IXIC_3)
    +
    +future_IXIC_4 <- reference[(best.period.4 + w):(best.period.4 + (2*w))]
    +plot(future_IXIC_4)
    +

    +
    +
    +

    Indexed Future Paths

    +

    Indexing all four forward paths to 100 at their common start makes +directional and magnitude differences directly comparable across +methods.

    +
    index_to_100 <- function(x) as.numeric(x) / as.numeric(x[1]) * 100
    +
    +par(mfrow = c(2, 2))
    +
    +future_paths <- list(future_IXIC_1, future_IXIC_2, future_IXIC_3, future_IXIC_4)
    +method_names <- c("LINEAR CORRELATION", "SUM OF SQUARED DIFF",
    +                  "BINARY CLASSIFICATION", "DYNAMIC TIME WARPING")
    +ref_periods  <- c(best.period.1, best.period.2, best.period.3, best.period.4)
    +
    +for(j in seq_along(future_paths)){
    +  fp_indexed <- index_to_100(future_paths[[j]])
    +  plot(fp_indexed, type = "l", col = "blue",
    +       xlab = "Trading Days", ylab = "Indexed Price (Start = 100)",
    +       main = paste(method_names[j], "\n Future Path \n",
    +                    index(reference_returns[ref_periods[j] + w])))
    +  abline(h = 100, lty = 2, col = "gray50")
    +}
    +
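A quick check of the indexing helper on made-up prices shows what the rescaled axis means: each value is the price expressed as a percentage of the first observation.

```r
index_to_100 <- function(x) as.numeric(x) / as.numeric(x[1]) * 100

# Hypothetical price path: up 10%, then down to 90% of the start
toy.prices  <- c(50, 55, 45)
toy.indexed <- index_to_100(toy.prices)
print(toy.indexed)
```

Two paths that start at very different price levels, such as 1971 and 1988 index values, become directly comparable once both begin at 100.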

    diff --git a/tools/NNS/examples/Time_Series_Classification_Expanded.html b/tools/NNS/examples/Time_Series_Classification_Expanded.html
    new file mode 100644
    index 0000000..700acdb
    --- /dev/null
    +++ b/tools/NNS/examples/Time_Series_Classification_Expanded.html
    @@ -0,0 +1,3294 @@

    S&P 500 Classification

    Intro

    +

    We are going to define a function for matching a time series to its most representative counterpart in one or more reference time series. This expands the analysis already presented in “Time-Series Classification” by using vectors of different lengths for partial time-series matching.

    +

    In this example we will ascertain the period of the S&P 500 index since 1950 that is most similar to the current year's (2026) returns through May 2026.

    +

    Similarity is measured on compounded cumulative returns, cumprod(1 + r), rather than raw daily returns. This makes every metric path-aware: two windows must travel the same cumulative route, not merely share day-by-day co-movement.
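To see why the cumulative path matters, here is a minimal sketch on made-up numbers (the vectors `r_a` and `r_b` are illustrative assumptions, not S&P 500 data). Both series contain the same daily returns in a different order, so they end at the same level, yet their cumulative routes differ.

```r
# Two toy return series holding the same daily returns in a different order
# (hypothetical values, for illustration only)
r_a <- c(0.05, 0.05, -0.05, -0.05)
r_b <- c(-0.05, -0.05, 0.05, 0.05)

# Compounded cumulative return paths, as in cumprod(1 + r)
path_a <- cumprod(1 + r_a)
path_b <- cumprod(1 + r_b)

# Same set of daily returns, hence the same end point
all.equal(sort(r_a), sort(r_b))
all.equal(tail(path_a, 1), tail(path_b, 1))

# The routes differ: path_a rises first, path_b falls first
round(rbind(path_a, path_b), 4)

# A path-aware metric such as the sum of squared differences picks this up
sum((path_a - path_b)^2)
```

A comparison of the unordered daily returns alone would treat these two windows as identical; the cumulative paths do not.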

    +
    +
    +

    Load Required Packages in R

    +
    #require(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +library(lubridate)
    +library(data.table)
    +library(quantmod)
    +library(zoo)
    +
    +
    +

    Get data

    +

    We will load the data for the S&P 500 index from 1950. This will create the variable GSPC. Further, we will create:

    +
    1. A compounded cumulative return dataset: current_cumret and windows derived from reference_returns
    2. A dataset of closing prices: current and reference
    3. A window length: w
    +
    # Download data
    +getSymbols('^GSPC', src='yahoo', from=as.Date("1950-01-01"))
    +
    ## [1] "GSPC"
    +
    # Daily returns (arithmetic, explicit)
    +reference_returns <- dailyReturn(GSPC, type = "arithmetic")
    +
    +reference <- GSPC$GSPC.Adjusted
    +
    +# Helper: compounded cumulative return path from a vector of arithmetic returns
    +cumret <- function(x) cumprod(1 + as.numeric(x))
    +
    +# Create current distribution
    +current_year    <- year(Sys.Date())
    +current_returns <- reference_returns[as.character(current_year)]
    +current         <- reference[as.character(current_year)]
    +current_cumret  <- cumret(current_returns)
    +
    +# Window length
    +w <- length(current_cumret)
    +
    +
    +

    Which Calendar Year is Most Similar?

    +

    Let’s identify the calendar year which is most similar. We will create a series of reference distributions from the beginning of each calendar year, each of equal length to the current distribution. This simply uses NNS.reg(..., order = "max"), which is a knn surrogate, with k = 1 (n.best = 1) in this classification instance. Each year’s series is expressed as its compounded cumulative return path.
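For intuition about what a k = 1 nearest-neighbour classification does with such paths, the following is a base R sketch on simulated data. It uses plain Euclidean distance, which is an assumption made for illustration and is not a description of how NNS.reg works internally; the names `paths`, `labels` and `query` are hypothetical.

```r
# Toy reference set: each row is one "year" expressed as a cumulative return path
set.seed(1)
paths  <- t(replicate(5, cumprod(1 + rnorm(10, 0, 0.01))))
labels <- 2001:2005   # hypothetical year labels, one per row

# Query path: a slightly perturbed copy of row 3
query <- paths[3, ] + 1e-4

# Euclidean distance from the query to every reference row
d <- sqrt(rowSums(sweep(paths, 2, query)^2))

# k = 1: return the label of the single closest row
labels[which.min(d)]
```

The predicted class is simply the label attached to the closest reference path, which is why including the current year in the reference set returns the current year.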

    +

    First, a quick test to see if using the current year actually returns the current year…

    +
    cumret_list <- list()
    +first_year <- lubridate::year(index(GSPC)[1])
    +last_year <- current_year
    +
    +
    +
    +for(i in seq(first_year, last_year, 1)){
    +    idx <- which(i == seq(first_year, last_year, 1))
    +    yr_returns <- reference_returns[as.character(i)][1:w]
    +    cumret_list[[idx]] <- cumret(yr_returns)
    +}
    +
    +IV <- matrix(unlist(cumret_list), ncol = w, byrow = TRUE)
    +DV <- seq(first_year, last_year, 1)
    +
    +nns.estimate <- NNS.reg(IV, DV, point.est = t(current_cumret), order = "max", n.best = 1,
    +                        residual.plot = FALSE, type = "CLASS")$Point.est
    +
    +nns.estimate
    +
    ## [1] 2026
    +

    Yup, exact match! Let’s remove the current_year from the cumret_list.

    +
    cumret_list <- list()
    +first_year <- lubridate::year(index(GSPC)[1])
    +last_year <- current_year - 1
    +
    +for(i in seq(first_year, last_year, 1)){
    +    idx <- which(i == seq(first_year, last_year, 1))
    +    yr_returns <- reference_returns[as.character(i)][1:w]
    +    cumret_list[[idx]] <- cumret(yr_returns)
    +}
    +
    +IV <- matrix(unlist(cumret_list), ncol = w, byrow = TRUE)
    +DV <- seq(first_year, last_year, 1)
    +
    +nns.estimate <- NNS.reg(IV, DV, point.est = t(current_cumret), order = "max", n.best = 1,
    +                        residual.plot = FALSE, type = "CLASS")$Point.est
    +
    +nns.estimate
    +
    ## [1] 2007
    +
    # View the plot on both series
    +plot(as.zoo(reference[as.character(nns.estimate)][1:w]), col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("NNS \n Most Similar Calendar Year \n", nns.estimate))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +future_GSPC <- reference[as.character(nns.estimate)][w:min((2*w), length(reference[as.character(nns.estimate)]))]
    +plot(future_GSPC)
    +

    +
    +
    +

    Finding the Most Similar Period (Non-Calendar Year)

    +

    Next we will eschew the calendar year and scan overlapping windows to find the most similar period. The independent-variable matrix becomes excessively large (each observation in a window is an IV, and the windows overlap sequentially), so we simplify by minimizing the sum of squared differences, or maximizing the linear correlation, of compounded cumulative return vectors to determine similarity.
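The window scan can be sketched generically as follows. This is a minimal illustration on simulated returns, not the S&P 500 data; `window_scores` is a hypothetical helper name, and the target is deliberately planted inside the reference series so the expected answer is known.

```r
# Compounded cumulative return path from arithmetic returns
cumret <- function(x) cumprod(1 + as.numeric(x))

# Score every overlapping window of length(target) in `ref` against `target`.
# `score_fn` takes two cumulative paths and returns a single number.
window_scores <- function(ref, target, score_fn) {
  w <- length(target)
  target_path <- cumret(target)
  starts <- seq_len(length(ref) - w + 1)
  vapply(starts,
         function(i) score_fn(cumret(ref[i:(i + w - 1)]), target_path),
         numeric(1))
}

# Toy data: plant the target inside a longer return series, starting at position 41
set.seed(42)
ref    <- rnorm(200, 0, 0.01)
target <- ref[41:60]

ssd_scores <- window_scores(ref, target, function(a, b) sum((a - b)^2))
cor_scores <- window_scores(ref, target, function(a, b) cor(a, b))

which.min(ssd_scores)  # start of the window with the smallest squared difference
which.max(cor_scores)  # start of the window with the highest correlation
```

Because the target is still present in the reference, both criteria recover its own starting position. That is the same self-match that has to be removed from the real reference data before searching for a historical analogue.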

    +
    +
    +

    Use any length of latest returns

    +

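    We will use the last 88 trading days, which here is the full current year to date. Uncomment the tail() line below to use any other window length.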

    +
    #current_returns <- tail(current_returns, 60) #Set to any length
    +w              <- length(current_returns)
    +current_cumret <- cumret(current_returns)
    +
    +

    Remove current period, also a verification of similarity!

    +

    If we do not remove the last w observations from the reference data, we will match exactly to the current starting period.

    +
    # Measure time
    +start.time <- Sys.time()
    +
    +best.period.1 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) cor(cumret(reference_returns[i:(i + w - 1)]),
    +                    current_cumret)
    +  )
    +)
    +
    +# Calculate elapsed time
    +print(Sys.time() - start.time)
    +
    ## Time difference of 1.304198 secs
    +
    # Output the best matching period's value in GSPC
    +GSPC[best.period.1]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 2026-01-02   6878.11   6894.87  6824.31    6858.47  4184120000       6858.47
    +

    Yup, exact match! Let’s remove those current observations from the reference.

    +
    # Trim both reference series together to keep them in sync
    +reference         <- head(reference,         nrow(reference)         - w)
    +reference_returns <- head(reference_returns, length(reference_returns) - w)
    +
    +
    +

    Method 1: Linear Correlation of Compounded Returns

    +

    The first method maximises linear correlation between the compounded cumulative return path of each reference window and the current period.

    +
    best.period.1 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) cor(cumret(reference_returns[i:(i + w - 1)]),
    +                    current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[c(best.period.1, best.period.1 + w)]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1963-05-01     69.80     70.43    69.61      69.97     5060000         69.97
    +## 1963-09-05     72.64     73.19    72.15      73.00     5700000         73.00
    +
    # View Plot on Both Series
    +plot(as.zoo(reference[best.period.1:(best.period.1 + w)]), col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("LINEAR CORRELATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.1]), " : ",
    +                  index(reference_returns[(best.period.1 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +GSPC[c(best.period.1 + w, best.period.1 + (2*w))]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1963-09-05     72.64     73.19    72.15      73.00     5700000         73.00
    +## 1964-01-14     76.22     76.85    75.88      76.36     6500000         76.36
    +
    future_GSPC_1 <- reference[(best.period.1 + w):(best.period.1 + (2*w))]
    +plot(future_GSPC_1)
    +

    +
    # SSE on compounded returns
    +sum( (cumret(reference_returns[best.period.1:(best.period.1 - 1 + w)]) - current_cumret)^2 )
    +
    ## [1] 0.03062372
    +
    +
    +

    Method 2: Sum-of-Squared Differences of Compounded Returns

    +

    The second method minimises the sum of squared differences between compounded cumulative return paths.

    +
    +

    Accuracy Check

    +

    First, let’s check its accuracy on the complete dataset to see if it selects the current period as the most likely period…

    +
    # Reload full series for accuracy check
    +reference_returns_full <- dailyReturn(GSPC, type = "arithmetic")
    +
    +best.period.2 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) sum((cumret(reference_returns_full[i:(i + w - 1)]) - current_cumret)^2)
    +  )
    +)
    +
    +# Output the most similar period's value in GSPC
    +GSPC[best.period.2]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 2026-01-02   6878.11   6894.87  6824.31    6858.47  4184120000       6858.47
    +

    Yup, it works!

    +
    +
    +

    Apply Sum-of-Squared Differences to Compounded Returns

    +

    Now we will apply the sum-of-squared differences to the compounded cumulative returns, after removing the current observations from the reference… again.

    +
    # reference_returns already has current period removed (from `remove` chunk)
    +best.period.2 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) sum((cumret(reference_returns[i:(i + w - 1)]) - current_cumret)^2)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[c(best.period.2, best.period.2 + w)]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1962-07-31     57.83     58.58    57.74      58.23     4190000         58.23
    +## 1962-12-05     62.64     63.50    62.37      62.39     6280000         62.39
    +
    # View Plot on Both Series
    +plot(as.zoo(reference[best.period.2:(best.period.2 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("SUM OF SQUARED DIFFERENCES \n Most Similar Period \n",
    +                  index(reference_returns[best.period.2]), " : ",
    +                  index(reference_returns[(best.period.2 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +
    # Future from most similar period
    +GSPC[c(best.period.2 + w, best.period.2 + (2*w))]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1962-12-05     62.64     63.50    62.37      62.39     6280000         62.39
    +## 1963-04-11     68.29     69.07    67.97      68.77     5250000         68.77
    +
    future_GSPC_2 <- reference[(best.period.2 + w):(best.period.2 + (2*w))]
    +plot(future_GSPC_2)
    +

    +
    # SSE on compounded returns
    +sum( (cumret(reference_returns[best.period.2:(best.period.2 - 1 + w)]) - current_cumret)^2 )
    +
    ## [1] 0.02176564
    +
    +
    +
    +
    +

    Comments

    +
    • Evaluating each method's most similar compounded cumulative return path via the minimum sum-of-squared errors reveals that Method #2 has the best fit (0.0218 versus 0.0306 for Method #1).
    • We have not ventured far into the definition of \(similar\). We have demonstrated 2 methods of determining similarity and, unfortunately, the future outcomes vary substantially with the definition and evaluation criteria.
    • Based on prevailing evaluation criteria, both future outcomes are significantly different from current levels for the next 88-day period.
    +
    # View Plot on Both Series
    +plot(as.zoo(future_GSPC_1), col = c("blue"), xlab = "Date", ylab = "Price", main = "Future Performance for Similar Periods")
    +par(new = TRUE)
    +plot(as.zoo(future_GSPC_2), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Future via Method #1 (LHS)","Future via Method #2 (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n")
    +

    +

    I look forward to further discussions and collaboration with those equally passionate about these issues and open to embracing alternative solutions. If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    +
    +

    Other Similarity Measures

    +

    I have included two further measures: a binary classification of the most similar period and dynamic time warping. The binary classification is the fraction of days on which both compounded cumulative return paths are simultaneously above or below their common starting value of 1: sum(x > 1 & y > 1 | x < 1 & y < 1) / length(x). First, we will check their accuracy.

    +
    custom_cor_2 <- function(x, y){
    +  xc <- cumret(x)
    +  yc <- cumret(y)
    +  a  <- sum(xc > 1 & yc > 1 | xc < 1 & yc < 1) / length(xc)
    +  return(a)
    +}
    +
    +# Reload full series for accuracy check
    +reference_returns_full <- dailyReturn(GSPC, type = "arithmetic")
    +
    +best.period.3 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) custom_cor_2(reference_returns_full[i:(i + w - 1)], current_returns)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[best.period.3]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 2026-01-02   6878.11   6894.87  6824.31    6858.47  4184120000       6858.47
    +
    require(TSdist)
    +
    +best.period.4 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns_full) - w + 1),
    +    function(i) TSdist::DTWDistance(cumret(reference_returns_full[i:(i + w - 1)]), current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[best.period.4]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 2026-01-02   6878.11   6894.87  6824.31    6858.47  4184120000       6858.47
    +

    Yup, they both pass the accuracy check. Now let’s apply them to the rest of the series after removing the current observations…
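For readers unfamiliar with dynamic time warping, the sketch below implements the textbook dynamic-programming recursion in base R with absolute difference as the local cost. It is an illustration only: `dtw_distance` is a hypothetical helper, and it should not be assumed to reproduce the step pattern or cost that TSdist::DTWDistance uses.

```r
# Classic dynamic-programming DTW with absolute difference as the local cost
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1],  # advance in x only
                                    D[i + 1, j],  # advance in y only
                                    D[i, j])      # advance in both
    }
  }
  D[n + 1, m + 1]
}

a <- c(1, 2, 3, 2, 1)
b <- c(1, 1, 2, 3, 2, 1)   # same shape as `a`, with the first value repeated

dtw_distance(a, a)          # identical series
dtw_distance(a, b)          # time-shifted copy of the same shape
sum(abs(a - b[1:5]))        # pointwise comparison without warping
```

Warping lets the shifted copy align with the original, whereas a pointwise comparison penalises the one-step delay. This is why DTW can select a different period than the correlation or squared-difference criteria.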

    +
    # reference_returns already has current period removed (from `remove` chunk)
    +
    +best.period.3 <- which.max(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) custom_cor_2(reference_returns[i:(i + w - 1)], current_returns)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[c(best.period.3, best.period.3 + w)]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1950-05-01     18.22     18.22    18.22      18.22     2390000         18.22
    +## 1950-09-05     18.68     18.68    18.68      18.68     1250000         18.68
    +
    best.period.4 <- which.min(
    +  sapply(
    +    seq_len(length(reference_returns) - w + 1),
    +    function(i) TSdist::DTWDistance(cumret(reference_returns[i:(i + w - 1)]), current_cumret)
    +  )
    +)
    +
    +# Most similar period
    +GSPC[c(best.period.4, best.period.4 + w)]
    +
    ##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
    +## 1971-09-30     97.90     98.97    97.48      98.34    13490000         98.34
    +## 1972-02-03    104.68    105.43   103.85     104.64    19880000        104.64
    +
    par(mfrow = c(2,2))
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.1:(best.period.1 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("LINEAR CORRELATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.1]), " : ", index(reference_returns[(best.period.1 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current (RHS)"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.2:(best.period.2 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("SUM OF SQUARED DIFFERENCES \n Most Similar Period \n",
    +                  index(reference_returns[best.period.2]), " : ", index(reference_returns[(best.period.2 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.3:(best.period.3 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("BINARY CLASSIFICATION \n Most Similar Period \n",
    +                  index(reference_returns[best.period.3]), " : ", index(reference_returns[(best.period.3 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +
    +# View Plot on Both Series
    +plot(as.zoo(reference[best.period.4:(best.period.4 + w)]), lty = 1, col = c("blue"), xlab = "Date", ylab = "Price",
    +     main = paste("DYNAMIC TIME WARPING \n Most Similar Period \n",
    +                  index(reference_returns[best.period.4]), " : ", index(reference_returns[(best.period.4 + w)])))
    +par(new = TRUE)
    +plot(as.zoo(current), screens = 1, lwd = 3, col = "red", xlab = "", ylab = "", xaxt = "n", yaxt = "n")
    +axis(4)
    +mtext("Price", side = 4, line = 3)
    +legend("topleft", c("Reference","Current"), lty = c(1,1), col = c("blue","red"), cex = 0.5, bty = "n", horiz = T)
    +

    +

    Again, different definitions of similarity will yield different periods of association.

    +
    +

    Future Paths

    +
    par(mfrow = c(2,2))
    +
    +future_GSPC_1 <- reference[(best.period.1 + w):(best.period.1 + (2*w))]
    +plot(future_GSPC_1)
    +
    +future_GSPC_2 <- reference[(best.period.2 + w):(best.period.2 + (2*w))]
    +plot(future_GSPC_2)
    +
    +future_GSPC_3 <- reference[(best.period.3 + w):(best.period.3 + (2*w))]
    +plot(future_GSPC_3)
    +
    +future_GSPC_4 <- reference[(best.period.4 + w):(best.period.4 + (2*w))]
    +plot(future_GSPC_4)
    +

    +
    +
    +

    Indexed Future Paths

    +

    Indexing all four forward paths to 100 at their common start makes directional and magnitude differences directly comparable across methods.

    +
    index_to_100 <- function(x) as.numeric(x) / as.numeric(x[1]) * 100
    +
    +par(mfrow = c(2, 2))
    +
    +future_paths <- list(future_GSPC_1, future_GSPC_2, future_GSPC_3, future_GSPC_4)
    +method_names <- c("LINEAR CORRELATION", "SUM OF SQUARED DIFF",
    +                  "BINARY CLASSIFICATION", "DYNAMIC TIME WARPING")
    +ref_periods  <- c(best.period.1, best.period.2, best.period.3, best.period.4)
    +
    +for(j in seq_along(future_paths)){
    +  fp_indexed <- index_to_100(future_paths[[j]])
    +  plot(fp_indexed, type = "l", col = "blue",
    +       xlab = "Trading Days", ylab = "Indexed Price (Start = 100)",
    +       main = paste(method_names[j], "\n Future Path \n",
    +                    index(reference_returns[ref_periods[j] + w])))
    +  abline(h = 100, lty = 2, col = "gray50")
    +}
    +

    diff --git a/tools/NNS/examples/VAR_example.html b/tools/NNS/examples/VAR_example.html
    new file mode 100644
    index 0000000..084d1b3
    --- /dev/null
    +++ b/tools/NNS/examples/VAR_example.html
    @@ -0,0 +1,3602 @@

    Multivariate Time Series Forecasting: Nonparametric Vector Autoregression

    1 Introduction

    +

    This quick note is intended to introduce the intuition behind the NNS.VAR function, which serves as a nonparametric vector autoregression.

    +

    Install the latest version of NNS (>= 0.4.7) and the other required packages…

    +
    require(devtools)
    +install_github("OVVO-Financial/NNS", ref = "NNS-Beta-Version")
    +
    +library(NNS)
    +library(vars)
    +library(forecast)
    +library(randomForest)
    +
    +library(kableExtra)
    +library(doParallel)  # makeCluster(), registerDoParallel() and foreach() are used in Section 6.2
    +
    +
    +

    2 NNS.VAR Example with 3 Variables: Fed Funds, Real GNP, and Inflation

    +

    Real GNP and Inflation each enter as log differences, and all series are brought to the lowest common frequency (quarterly). In this illustrative VAR example from Lutz Kilian[1], Fed Funds (monthly) is averaged over 3-period windows to align with quarterly GNP data.

    +
    # Load Variables
    +FF = read.csv("fedfunds.txt", header = FALSE, sep = "\t")
    +
    +realgnp = read.csv("realgnp.txt", header = FALSE, sep = "\t")
    +drgdp = diff(log(realgnp[, 3])) * 100
    +
    +gnpdeflator = read.csv("gnpdeflator.txt", header = FALSE, sep = "\t")
    +infl = diff(log(gnpdeflator[, 3])) * 100
    +
    +irate = numeric()
    +for (i in seq(1, length(FF[, 3]), 3)) {
    +    irate[i] = mean(FF[i:(i + 2), 3])
    +}
    +
    +irate = na.omit(irate)
    +
    +
    +y = cbind(`Diff Real GDP` = drgdp[1:213], `Interest Rate` = irate, 
    +    Inflation = infl[1:213])
    +
    +head(y)
    +
    ##      Diff Real GDP Interest Rate Inflation
    +## [1,]     1.9944940     0.9866667 0.2810368
    +## [2,]     2.8100061     1.3433333 0.4818031
    +## [3,]     1.5928111     1.5000000 0.4148308
    +## [4,]     1.3345828     1.9400000 0.7089943
    +## [5,]     0.6089062     2.3566667 0.9778623
    +## [6,]    -0.3111535     2.4833333 0.9998818
    +
    tail(y)
    +
    ##        Diff Real GDP Interest Rate Inflation
    +## [208,]   -0.01158863      5.246667 0.6891565
    +## [209,]    0.84719231      5.246667 0.3501309
    +## [210,]    0.06757442      5.256667 1.1124581
    +## [211,]    0.91316852      5.250000 0.5562536
    +## [212,]    1.05338427      5.073333 0.3451964
    +## [213,]    0.66741511      4.496667 0.4471854
    +
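The monthly-to-quarterly averaging done by the loop above can also be written without a loop. The sketch below uses a made-up `monthly` vector (an assumption for illustration, not the Fed Funds file) and drops any incomplete trailing quarter, which mirrors what the loop plus na.omit() does.

```r
# Hypothetical monthly series: 12 months of rates
monthly <- c(1.0, 1.1, 1.2,  2.0, 2.0, 2.0,  3.0, 3.3, 3.6,  4.0, 4.5, 5.0)

# Drop any incomplete trailing quarter, then average each block of 3 months.
# matrix() fills column by column, so each column is one quarter.
n_full    <- (length(monthly) %/% 3) * 3
quarterly <- colMeans(matrix(monthly[seq_len(n_full)], nrow = 3))

quarterly
```

Reshaping into a 3-row matrix makes the quarter boundaries explicit and yields a plain numeric vector with no NA bookkeeping.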
    +
    +

    3 Forecast 12 Quarters Out-of-Sample Using All Methods

    +

    We will withhold 12 quarters of observations as the test set. Let’s see how accurate a VAR, a random forest and NNS are…

    +

    Why a random forest? The Bank of England has recently been posting working papers utilizing machine learning, specifically random forests:

    + +
    +
    +

    4 Traditional VAR

    +
    # Forecast next 12 quarters
    +h = 12
    +
    +# Create train and test sets for both versions
    +y_train_VAR = head(y, dim(y)[1] - h)
    +y_test_VAR = tail(y, h)
    +
    +# 4 quarterly lags used in VAR
    +VAR_train = VAR(y = y_train_VAR, p = 4)
    +VAR_test = predict(VAR_train, n.ahead = h)
    +
    +par(mfrow = c(1, 3))
    +
    +
    +VAR_predictions_RMSE = cbind(rbind(Diff.Real.GDP = sqrt(mean((VAR_test$fcst[[1]][, 
    +    1] - y_test_VAR[, 1])^2)), Interest.Rate = sqrt(mean((VAR_test$fcst[[2]][, 
    +    1] - y_test_VAR[, 2])^2)), Inflation = sqrt(mean((VAR_test$fcst[[3]][, 
    +    1] - y_test_VAR[, 3])^2))))
    +
    +colnames(VAR_predictions_RMSE) = c("VAR Estimates RMSE to Actual Values")
    +
    +
    +VAR_predictions_corr = cbind(rbind(Diff.Real.GDP = cor(VAR_test$fcst[[1]][, 
    +    1], y_test_VAR[, 1], method = "spearman"), Interest.Rate = cor(VAR_test$fcst[[2]][, 
    +    1], y_test_VAR[, 2], method = "spearman"), Inflation = cor(VAR_test$fcst[[3]][, 
    +    1], y_test_VAR[, 3], method = "spearman")))
    +
    +colnames(VAR_predictions_corr) = c("VAR Estimates Correlation to Actual Values")
    +
    +VAR_results = cbind(VAR_predictions_RMSE, VAR_predictions_corr)
    +knitr::kable(VAR_results, digits = 4) %>% kable_styling(full_width = T) %>% 
    +    column_spec(1, width = "8cm")
    |               | VAR Estimates RMSE to Actual Values | VAR Estimates Correlation to Actual Values |
    |:--------------|------------------------------------:|-------------------------------------------:|
    | Diff.Real.GDP |                              0.5062 |                                    -0.3916 |
    | Interest.Rate |                              1.3451 |                                     0.7741 |
    | Inflation     |                              0.3481 |                                    -0.6294 |
    +

    The correlations between the VAR forecasts and the actual out-of-sample values were negative for real GDP and inflation! The auto-ARIMA forecast correlations, reported in the table in Section 6.2, were negative for real GDP and interest rates.

    +

    This highlights the dual objective of differenced time series data:

    +
    • Get the sign right
    • Get the magnitude right
    +

    NNS incorporates this dual objective into its routines, as detailed in the following code chunks.
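The two objectives can pull apart, which is why this note reports both RMSE and a rank correlation. A small sketch on made-up numbers follows; `forecast_a`, `forecast_b` and the `sign_rate` helper are hypothetical and unrelated to the GNP data.

```r
# Toy out-of-sample values of a differenced series and two hypothetical forecasts
actual     <- c( 0.5, -0.3,  0.8, -0.6,  0.2)
forecast_a <- c( 0.1, -0.1,  0.1, -0.1,  0.1)   # right direction, muted size
forecast_b <- c(-0.5,  0.3, -0.8,  0.6, -0.2)   # right size, wrong direction

rmse      <- function(f, a) sqrt(mean((f - a)^2))   # magnitude error
sign_rate <- function(f, a) mean(sign(f) == sign(a)) # share of correct signs

data.frame(
  forecast  = c("a", "b"),
  RMSE      = c(rmse(forecast_a, actual), rmse(forecast_b, actual)),
  sign_rate = c(sign_rate(forecast_a, actual), sign_rate(forecast_b, actual)),
  spearman  = c(cor(forecast_a, actual, method = "spearman"),
                cor(forecast_b, actual, method = "spearman"))
)
```

A forecast can have a modest RMSE while moving in the opposite direction to the data, which is what a negative Spearman correlation flags.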

    +
    +
    +

    5 NNS and Random Forest

    +
    +

    5.1 Function to Generate Lagged Variables

    +

    The following code generates a matrix of lagged variables to be used in both the NNS and Random Forest routines.

    +
    lag.mtx <- function(x, tau) {
    +    colheads <- NULL
    +    
    +    if (is.null(dim(x)[2])) {
    +        colheads <- noquote(as.character(deparse(substitute(x))))
    +        x <- t(t(x))
    +    }
    +    
    +    j.vectors <- list()
    +    
    +    for (j in 1:ncol(x)) {
    +        if (is.null(colheads)) {
    +            colheads <- colnames(x)[j]
    +            
    +            colheads <- noquote(as.character(deparse(substitute(colheads))))
    +        }
    +        
    +        x.vectors <- list()
    +        heads <- paste0(colheads, ".tau.")
    +        heads <- gsub("\"", "", heads)
    +        
    +        for (i in 0:tau) {
    +            x.vectors[[paste(heads, i, sep = "")]] <- numeric(0L)
    +            start <- tau - i + 1
    +            end <- length(x[, j]) - i
    +            x.vectors[[i + 1]] <- x[start:end, j]
    +        }
    +        
    +        j.vectors[[j]] <- do.call(cbind, x.vectors)
    +        colheads <- NULL
    +    }
    +    
    +    return(as.data.frame(do.call(cbind, j.vectors)))
    +}
    +
    +
    +# Test it
    +set.seed(123)
    +V1 <- rnorm(10)
    +V2 <- rnorm(10)
    +V3 <- rnorm(10)
    +
    +lag.mtx(cbind(V1 = V1, V2 = V2, V3 = V3), 3)
    +
    + +
    +
    # Compare to last values of each variable
    +cbind(V1 = tail(V1), V2 = tail(V2), V3 = tail(V3))
    +
    ##              V1         V2         V3
    +## [1,]  0.1292877 -0.5558411 -0.6250393
    +## [2,]  1.7150650  1.7869131 -1.6866933
    +## [3,]  0.4609162  0.4978505  0.8377870
    +## [4,] -1.2650612 -1.9666172  0.1533731
    +## [5,] -0.6868529  0.7013559 -1.1381369
    +## [6,] -0.4456620 -0.4727914  1.2538149
    +
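For a single series, base R's embed() builds the same kind of lag layout and offers a quick cross-check of the alignment. This is a sketch for comparison only; the column names are added by hand to mimic the tau naming above.

```r
set.seed(123)
V1  <- rnorm(10)
tau <- 3

# embed() returns one row per usable time point; column k holds lag k - 1
lagged <- embed(V1, tau + 1)
colnames(lagged) <- paste0("V1.tau.", 0:tau)

dim(lagged)  # (length(V1) - tau) rows, tau + 1 columns

# Lag 0 is the series from observation tau + 1 onward;
# lag tau is the series with its last tau observations dropped
all.equal(lagged[, "V1.tau.0"], V1[(tau + 1):length(V1)])
all.equal(lagged[, "V1.tau.3"], V1[1:(length(V1) - tau)])
```

Each row pairs the contemporaneous value with its own lags, so the first tau observations are lost. This matches the start and end indices used inside lag.mtx().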
    +
    +
    +

    6 Predictions

    +
    +

    6.1 1 Year Lag \((\tau = 4)\), 3 Year Forecast \((h = 12)\)

    +
    +
    +

    6.2 Create Forecasted IVs Using NNS.ARMA()

    +
    train_l = dim(y)[1] - h
    +test_DVs = tail(y, h)
    +
    +nns_IVs = list()
    +
    +cl <- makeCluster(detectCores() - 1)
    +registerDoParallel(cl)
    +
    +nns_IVs <- foreach(i = 1:ncol(y_train_VAR), .packages = "NNS") %dopar% 
    +    {
    +        variable = y_train_VAR[, i]
    +        
    +        periods = NNS.seas(variable, modulo = 4, mod.only = FALSE, 
    +            plot = FALSE)$periods
    +        
    +        b = NNS.ARMA.optim(variable, seasonal.factor = periods, 
    +            training.set = length(variable) - 2 * h, ncores = 1, 
    +            print.trace = FALSE, obj.fn = expression(sum((predicted - 
    +                actual)^2)), objective = "min")
    +        
    +        NNS.ARMA(variable, h = h, seasonal.factor = b$periods, 
    +            weights = b$weights, method = b$method, ncores = 1, 
    +            plot = FALSE) + b$bias.shift
    +    }
    +
    +stopCluster(cl)
    +registerDoSEQ()
    +
    +nns_IVs = do.call(cbind, nns_IVs)
    +
    +
    +# Combine forecasted IVs onto training data.frame
    +new_values = rbind(y_train_VAR, nns_IVs)
    +
    +# Now lag new forecasted data.frame
    +lagged_new_values = lag.mtx(new_values, tau = 4)
    +
    +
    +# Test how accurate new univariate NNS.ARMA forecast IVs are
    +# Also add auto arima estimate...
    +NNS.ARMA_RMSEs = numeric()
    +NNS.ARMA_Correlations = numeric()
    +auto.ARMA_RMSEs = numeric()
    +auto.ARMA_Correlations = numeric()
    +
    +arma.fit = list()
    +
    +for (i in 1:3) {
    +    NNS.ARMA_RMSEs[i] = sqrt(mean((nns_IVs[, i] - test_DVs[, 
    +        i])^2))
    +    NNS.ARMA_Correlations[i] = cor(nns_IVs[, i], test_DVs[, i], 
    +        method = "spearman")
    +    fit = auto.arima(y_train_VAR[, i])
    +    arma.fit[[i]] = as.numeric(forecast(fit, h = h)$mean)
    +    auto.ARMA_RMSEs[i] = sqrt(mean((arma.fit[[i]] - test_DVs[, 
    +        i])^2))
    +    auto.ARMA_Correlations[i] = cor(arma.fit[[i]], test_DVs[, 
    +        i], method = "spearman")
    +}
    +
    +NNS.ARMA_results = cbind(NNS.ARMA_RMSEs, NNS.ARMA_Correlations)
    +
    +colnames(NNS.ARMA_results) = c("NNS ARMA RMSEs", "NNS ARMA Corr")
    +
    +knitr::kable(cbind(VAR_results, NNS.ARMA_results, `Auto ARMA RMSEs` = auto.ARMA_RMSEs, 
    +    `Auto ARMA Cor` = auto.ARMA_Correlations), digits = 4) %>% 
    +    kable_styling(full_width = T) %>% column_spec(1, width = "8cm")
    |               | VAR Estimates RMSE to Actual Values | VAR Estimates Correlation to Actual Values | NNS ARMA RMSEs | NNS ARMA Corr | Auto ARMA RMSEs | Auto ARMA Cor |
    |:--------------|------------------------------------:|-------------------------------------------:|---------------:|--------------:|----------------:|--------------:|
    | Diff.Real.GDP |                              0.5062 |                                    -0.3916 |         0.4194 |        0.2657 |           0.464 |       -0.1329 |
    | Interest.Rate |                              1.3451 |                                     0.7741 |         0.9623 |       -0.0315 |           2.631 |       -0.4812 |
    | Inflation     |                              0.3481 |                                    -0.6294 |         0.3255 |        0.6364 |           0.227 |        0.3287 |
    +
    for (i in 1:3) {
    +    plot(1:h, test_DVs[, i], type = "l", lwd = 3, ylim = c(min(new_values[, 
    +        i]), max(new_values[, i])), xlab = "Forecast Period", 
    +        ylab = colnames(new_values)[i])
    +    lines(1:h, nns_IVs[, i], col = "red", lwd = 2)
    +    lines(1:h, VAR_test$fcst[[i]][, 1], col = "blue", lwd = 2)
    +    lines(1:h, arma.fit[[i]], col = "brown", lwd = 2)
    +    legend("topleft", col = c("black", "red", "blue", "brown"), 
    +        legend = c("Actual ", "NNS", "VAR", "ARMA"), lty = 1, 
    +        lwd = c(3, 2, 2, 2))
    +}
    +

    +
    # Significant Difference in Distributions
    +for (i in 1:3) {
    +    print(paste0("NNS KS Test: ", suppressWarnings(ks.test(nns_IVs[, 
    +        i], test_DVs[, i])$p.value)))
    +    print(paste0("ARMA KS Test: ", suppressWarnings(ks.test(arma.fit[[i]], 
    +        test_DVs[, i])$p.value)))
    +}
    +
    ## [1] "NNS KS Test: 0.868981671175775"
    +## [1] "ARMA KS Test: 0.0995467717099161"
    +## [1] "NNS KS Test: 0.0995618483147803"
    +## [1] "ARMA KS Test: 1.22884247066857e-05"
    +## [1] "NNS KS Test: 0.00785901405096467"
    +## [1] "ARMA KS Test: 0.0995467717099161"
    +
    +
    +
    +

    7 Use Forecasted IVs in NNS and Random Forest

    +

    We will now use the forecasted variables in the multivariate NNS.reg() function. Our first step is to pass these variables through the NNS.boost() function in order to remove any unnecessary features. The obj.fn and objective parameters in NNS.boost() need to be redefined from their default values since this is not a classification problem.

    +

    From there, we use the reduced variable set (if any reduction occurred) in the NNS.stack() function, which cross-validates the n.best and threshold parameters of the NNS.reg() function.

    +
    y_train_nns = head(lagged_new_values, dim(lagged_new_values)[1] - 
    +    h)
    +y_test_nns = tail(lagged_new_values, h)
    +
    +# Select tau = 0 as test set DVs
    +DVs = which(grepl("tau.0", colnames(y_train_nns)))
    +
    +nns_est = list()
    +
    +NNS_est_RMSEs = numeric()
    +NNS_est_Correlations = numeric()
    +
    +RF_RMSEs = numeric()
    +RF_Correlations = numeric()
    +
    +
    +for (i in DVs) {
    +    index = which(DVs == i)
    +    test.set = test_DVs[, index]
    +    
    +    # NNS.boost() is an ensemble method comparable to xgboost,
    +    # and aids in dimension reduction
    +    nns_boost_est = NNS.boost(IVs.train = y_train_nns[, -i], 
    +        DV.train = y_train_nns[, i], IVs.test = y_test_nns[, 
    +            -i], obj.fn = expression(sum((predicted - actual)^2)), 
    +        objective = "min", ts.test = 2 * h, learner.trials = 100, 
    +        epochs = 100, ncores = 1, type = NULL, feature.importance = FALSE, 
    +        folds = 1)
    +    
    +    # NNS.stack() cross-validates the parameters of the
    +    # multivariate NNS.reg() and dimension reduction NNS.reg()
    +    relevant_vars = colnames(y_train_nns) %in% names(nns_boost_est$feature.weights)
    +    
    +    nns_reg_est = NNS.stack(IVs.train = y_train_nns[, relevant_vars], 
    +        DV.train = y_train_nns[, i], IVs.test = y_test_nns[, 
    +            relevant_vars], folds = 1, ts.test = 2 * h, order = "max", 
    +        obj.fn = expression(sum((predicted - actual)^2)), objective = "min")$stack
    +    
    +    # Ensemble with univariate estimates
    +    nns_est[[index]] = (nns_IVs[, index] + nns_reg_est)/2
    +    
    +    # Random Forest
    +    rf = randomForest(y_train_nns[, -i], y_train_nns[, i], ntree = 100)
    +    rf.pred = predict(rf, newdata = y_test_nns[, -i])
    +    
    +    # Print all results
    +    NNS_est_RMSEs[index] = sqrt(mean((nns_est[[index]] - test.set)^2))
    +    RF_RMSEs[index] = sqrt(mean((rf.pred - test.set)^2))
    +    
    +    NNS_est_Correlations[index] = cor(nns_est[[index]], test.set, 
    +        method = "spearman")
    +    RF_Correlations[index] = cor(rf.pred, test.set, method = "spearman")
    +    
    +    
    +    # Plot all results
    +    plot(1:h, test_DVs[, index], type = "l", lwd = 3, ylim = c(min(new_values[, 
    +        index]), max(new_values[, index])), xlab = "Forecast Period", 
    +        ylab = colnames(new_values)[index])
    +    lines(1:h, nns_est[[index]], col = "red", lwd = 2)
    +    lines(1:h, rf.pred, col = "green", lwd = 2)
    +    lines(1:h, VAR_test$fcst[[index]][, 1], col = "blue", lwd = 2)
    +    legend("topleft", col = c("black", "red", "blue", "green"), 
    +        legend = c("Actual ", "NNS", "VAR", "RF"), lty = 1, lwd = c(3, 
    +            2, 2, 2))
    +}
    +

    +
    ALL_RESULTS = cbind(VAR_results, NNS.ARMA_results, `NNS RMSEs` = NNS_est_RMSEs, 
    +    `NNS Corr` = NNS_est_Correlations, `RF RMSEs` = RF_RMSEs, 
    +    `RF Corr` = RF_Correlations)
    +
    +knitr::kable(ALL_RESULTS, digits = 4) %>% kable_styling(full_width = T) %>% 
    +    column_spec(1, width = "8cm")
    +
    +|  | VAR Estimates RMSE to Actual Values | VAR Estimates Correlation to Actual Values | NNS ARMA RMSEs | NNS ARMA Corr | NNS RMSEs | NNS Corr | RF RMSEs | RF Corr |
    +|---|---:|---:|---:|---:|---:|---:|---:|---:|
    +| Diff.Real.GDP | 0.5062 | -0.3916 | 0.4194 | 0.2657 | 0.4335 | 0.3007 | 0.4949 | -0.3497 |
    +| Interest.Rate | 1.3451 | 0.7741 | 0.9623 | -0.0315 | 0.6032 | 0.5814 | 0.5211 | 0.7110 |
    +| Inflation | 0.3481 | -0.6294 | 0.3255 | 0.6364 | 0.3031 | 0.5804 | 0.2631 | 0.4056 |
    +
    +
    +

    8 NNS.VAR()

    +

    The NNS.VAR() function accomplishes all of the prior steps in a single line of code. NNS.VAR() also takes the additional step of weighting each estimate by its objective function result.

    +
    nns_var_estimates = NNS.VAR(y_train_VAR, h = 12, tau = 4, ncores = 1)
    +
    +nns_var_estimates
    +
    ## $univariate
    +##       Diff Real GDP Interest Rate Inflation
    +##  [1,]    0.71182079      4.536222 0.6891101
    +##  [2,]    0.97198238      4.655889 0.5132415
    +##  [3,]    0.49344043      4.603333 0.5697954
    +##  [4,]    0.89140137      4.737889 0.4357171
    +##  [5,]    1.00169992      4.871167 0.3667949
    +##  [6,]    0.93692811      4.665000 0.4023420
    +##  [7,]    0.04796763      4.750889 0.4080647
    +##  [8,]    0.56628191      4.603222 0.4370086
    +##  [9,]    0.41366022      4.474333 0.4392642
    +## [10,]    0.56131736      4.821333 0.3732100
    +## [11,]    0.67121142      4.492222 0.3300415
    +## [12,]    0.77318203      4.568333 0.2547801
    +## 
    +## $multivariate
    +##       Diff Real GDP Interest Rate Inflation
    +##  [1,]     0.9279022      2.358849 0.4422726
    +##  [2,]     0.9141970      2.440993 0.5782729
    +##  [3,]     0.9844185      4.717534 0.7800222
    +##  [4,]     0.8247261      4.666915 0.5522493
    +##  [5,]     0.8331298      4.675716 0.4952923
    +##  [6,]     0.9397740      5.092680 0.4653352
    +##  [7,]     1.1505812      5.161801 0.5074708
    +##  [8,]     0.9506566      4.965832 0.3940349
    +##  [9,]     1.0781853      4.825252 0.4178213
    +## [10,]     1.1659552      5.145660 0.5238891
    +## [11,]     1.2304537      4.768645 0.3865078
    +## [12,]     0.9972291      4.856568 0.3262525
    +## 
    +## $ensemble
    +##       Diff Real GDP Interest Rate Inflation
    +##  [1,]     0.7613002      2.759611 0.5048272
    +##  [2,]     0.9587504      2.848662 0.5617924
    +##  [3,]     0.6058670      4.696514 0.7267456
    +##  [4,]     0.8761337      4.679979 0.5227172
    +##  [5,]     0.9630999      4.711690 0.4627279
    +##  [6,]     0.9375798      5.013962 0.4493712
    +##  [7,]     0.3004495      5.086169 0.4822789
    +##  [8,]     0.6542979      4.899091 0.4049255
    +##  [9,]     0.5658265      4.760663 0.4232554
    +## [10,]     0.6997703      5.085965 0.4857034
    +## [11,]     0.7992695      4.717768 0.3721978
    +## [12,]     0.8244854      4.803516 0.3081397
    +
    NNS_VAR_RMSEs = numeric()
    +NNS_VAR_corr = numeric()
    +
    +for (i in 1:3) {
    +    NNS_VAR_RMSEs[i] = sqrt(mean((nns_var_estimates$ensemble[, 
    +        i] - test_DVs[, i])^2))
    +    
    +    NNS_VAR_corr[i] = cor(nns_var_estimates$ensemble[, i], test_DVs[, 
    +        i], method = "spearman")
    +}
    +
    +NNS_VAR_results = cbind(NNS_VAR_RMSEs, NNS_VAR_corr)
    +rownames(NNS_VAR_results) = colnames(y_train_VAR)
    +knitr::kable(NNS_VAR_results, digits = 4) %>% kable_styling(full_width = T) %>% 
    +    column_spec(1, width = "8cm")
    +
    +|  | NNS_VAR_RMSEs | NNS_VAR_corr |
    +|---|---:|---:|
    +| Diff Real GDP | 0.4082 | 0.2657 |
    +| Interest Rate | 0.4855 | 0.8231 |
    +| Inflation | 0.2963 | 0.5175 |
    +
    +
    +
    +
      +
    1. Data available for download at the following: https://sites.google.com/site/lkilian2019/

diff --git a/tools/NNS/examples/directional-markov-regimes-pca.md b/tools/NNS/examples/directional-markov-regimes-pca.md
new file mode 100644
index 0000000..26c68c1
--- /dev/null
+++ b/tools/NNS/examples/directional-markov-regimes-pca.md
@@ -0,0 +1,2394 @@
+# Directional Markov Regimes and PCA Recovery from NNS Quadrants
+
+This note extends the [Chapter 11 directional spectral decomposition](https://ovvo-financial.github.io/NNS/book/directional-spectral-decomposition.html) from static quadrants to time-indexed directional regimes.
+
+---
+
+## Executive Summary
+
+Classical PCA starts with a covariance matrix and extracts abstract eigenvectors.
+
+NNS starts with observable directional regions:
+
+- `CUPM`: upper concordant quadrant
+- `CLPM`: lower concordant quadrant
+- `DLPM`: lower divergent quadrant
+- `DUPM`: upper divergent quadrant
+
+Each quadrant has an empirical probability, a conditional mean, and a conditional covariance. These observable pieces reconstruct the covariance matrix exactly.
+
+The static covariance decomposition is:
+
+```math
+\Sigma
+=
+\underbrace{\sum_q p_q u_q u_q^\top}_{\Sigma^B}
++
+\underbrace{\sum_q p_q \Sigma^{(q)}}_{\Sigma^W}.
+```
+
+The dynamic transition-path decomposition is:
+
+```math
+\Sigma_{\mathrm{lead}}
+=
+\underbrace{
+\sum_{q,q'} p_{qq'} u_{q\to q'}u_{q\to q'}^\top
+}_{\Sigma^B_{\mathrm{dyn}}}
++
+\underbrace{
+\sum_{q,q'} p_{qq'} \Sigma^{(q\to q')}
+}_{\Sigma^W_{\mathrm{dyn}}}.
+```
+
+The main result is:
+
+> PCA identifies the dominant axis. NNS identifies the regimes and transition paths that created it.
+
+The manual decompositions below are also checked against the exported NNS package workflow. With:
+
+```r
+NNS::PM.matrix(
+  LPM_degree = 1,
+  UPM_degree = 1,
+  target = "mean",
+  variable = Z,
+  pop_adj = TRUE,
+  norm = FALSE
+)
+```
+
+the returned matrices satisfy:
+
+```math
+\Sigma
+=
+\mathrm{clpm}
++
+\mathrm{cupm}
+-
+\mathrm{dlpm}
+-
+\mathrm{dupm}.
+```
+
+Thus the same covariance and PCA recovery can be verified directly through the package API. The setting `norm = FALSE` is required for covariance and PCA recovery; `norm = TRUE` returns a normalized signed dependence matrix.
+
+---
+
+## 1. Hidden Markov Models Versus Directional Markov Regimes
+
+A classical Hidden Markov Model has:
+
+- latent states `S_t in {1, ..., K}`;
+- transition probabilities `P_ij = P(S_{t+1}=j | S_t=i)`;
+- emission distributions `f(Z_t | S_t=k)`.
+
+The analyst observes `Z_t`, but not `S_t`. The states must be inferred through filtering, smoothing, or EM-type procedures.
+
+In the NNS directional framework, the state is not latent. It is directly defined by directional geometry.
+
+Let:
+
+```math
+Z_t =
+\begin{pmatrix}
+X_t\\
+Y_t
+\end{pmatrix},
+\qquad
+\mu =
+\begin{pmatrix}
+\bar X\\
+\bar Y
+\end{pmatrix}.
+```
+
+Define the observable quadrant state:
+
+```math
+Q_t \in \{\mathrm{CUPM}, \mathrm{CLPM}, \mathrm{DLPM}, \mathrm{DUPM}\}.
+```
+
+Using a mean split:
+
+```math
+\mathrm{CUPM}: X_t > \bar X,\; Y_t > \bar Y,
+```
+
+```math
+\mathrm{CLPM}: X_t \leq \bar X,\; Y_t \leq \bar Y,
+```
+
+```math
+\mathrm{DLPM}: X_t > \bar X,\; Y_t \leq \bar Y,
+```
+
+```math
+\mathrm{DUPM}: X_t \leq \bar X,\; Y_t > \bar Y.
+```
+
+The state probabilities are empirical frequencies:
+
+```math
+p_q = P(Q_t=q).
+```
+
+The state-conditional means are:
+
+```math
+m_q = E[Z_t \mid Q_t=q].
+```
+
+The state-conditional covariances are:
+
+```math
+\Sigma^{(q)}
+=
+\mathrm{Cov}(Z_t \mid Q_t=q).
+```
+
+The key difference from an HMM is that `Q_t` is observed once the benchmark is specified. No latent-state inference is required.
+
+---
+
+## 2. Static Directional Spectral Decomposition
+
+The covariance matrix decomposes as:
+
+```math
+\Sigma
+=
+\Sigma^B+\Sigma^W,
+```
+
+where:
+
+```math
+\Sigma^B
+=
+\sum_q p_q u_q u_q^\top,
+```
+
+and:
+
+```math
+\Sigma^W
+=
+\sum_q p_q\Sigma^{(q)}.
+```
+
+Here:
+
+```math
+u_q=m_q-\mu.
+```
+
+The between-quadrant component `Sigma^B` is built from rank-one spectral primitives:
+
+```math
+B_q=p_q u_q u_q^\top.
+```
+
+If `u_q` is nonzero, then:
+
+```math
+B_q u_q
+=
+p_q u_q u_q^\top u_q
+=
+p_q\|u_q\|^2u_q.
+```
+
+Therefore `u_q` is the nonzero eigenvector of its own quadrant contribution `B_q`, with eigenvalue:
+
+```math
+\ell_q=p_q\|u_q\|^2.
+```
+
+After normalization,
+
+```math
+v_q=\frac{u_q}{\|u_q\|}
+```
+
+is the corresponding unit eigenvector.
+
+Thus, centered quadrant conditional means are rank-one spectral primitives.
+
+The full between-quadrant covariance is:
+
+```math
+\Sigma^B
+=
+B_{\mathrm{CUPM}}
++
+B_{\mathrm{CLPM}}
++
+B_{\mathrm{DLPM}}
++
+B_{\mathrm{DUPM}}.
+```
+
+Equivalently, define:
+
+```math
+C
+=
+\left(
+\sqrt{p_{\mathrm{CUPM}}}u_{\mathrm{CUPM}}\;
+\sqrt{p_{\mathrm{CLPM}}}u_{\mathrm{CLPM}}\;
+\sqrt{p_{\mathrm{DLPM}}}u_{\mathrm{DLPM}}\;
+\sqrt{p_{\mathrm{DUPM}}}u_{\mathrm{DUPM}}
+\right).
+```
+
+Then:
+
+```math
+\Sigma^B=CC^\top.
+```
+
+The eigenvectors of `Sigma^B` are the left singular vectors of `C`, built entirely from weighted quadrant conditional mean displacements.
+
+---
+
+
+## 3. PCA Recovery Directly from Quadrant Conditional Means
+
+This is the central static recovery chain:
+
+```math
+\{p_q,m_q,\Sigma^{(q)}\}_q
+\longrightarrow
+\Sigma^B+\Sigma^W
+\longrightarrow
+\Sigma
+\longrightarrow
+(\lambda_i,v_i).
+```
+
+The quadrant conditional means enter through:
+
+```math
+u_q=m_q-\mu.
+```
+
+From those displacements, construct the rank-one matrices:
+
+```math
+B_q=p_q u_q u_q^\top.
+```
+
+Then:
+
+```math
+\Sigma^B
+=
+\sum_q B_q
+=
+\sum_q p_q u_q u_q^\top.
+```
+
+If only the quadrant conditional means and probabilities are used, the recovered eigensystem is the eigensystem of the between-quadrant covariance `Sigma^B`.
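
The static construction described so far can be sketched in a few lines of base R. This is an illustrative sketch only, not the note's own experiment: the simulated data, the seed, and the helper `pop_cov()` are assumptions introduced here. It uses the divide-by-`n` covariance convention both overall and within each quadrant, under which the between/within identity holds exactly; `stats::cov()` divides by `n - 1` and would differ by that scaling.

```r
# Illustrative sketch on simulated data (assumption: not the note's simulation).
set.seed(1)
n <- 2000
X <- rnorm(n)
Y <- 0.6 * X + rnorm(n, sd = 0.8)
Z <- cbind(X = X, Y = Y)
mu <- colMeans(Z)

# Hypothetical helper: population (divide-by-n) covariance of the rows of M.
pop_cov <- function(M) {
  Mc <- sweep(M, 2, colMeans(M))
  crossprod(Mc) / nrow(M)
}

# Observable quadrant labels from the mean split.
quadrant <- ifelse(Z[, "X"] > mu["X"],
                   ifelse(Z[, "Y"] > mu["Y"], "CUPM", "DLPM"),
                   ifelse(Z[, "Y"] > mu["Y"], "DUPM", "CLPM"))

Sigma_B <- matrix(0, 2, 2)   # between-quadrant covariance
Sigma_W <- matrix(0, 2, 2)   # within-quadrant covariance
C <- NULL                    # columns sqrt(p_q) * u_q
max_eig_err <- 0

for (q in sort(unique(quadrant))) {
  Zq  <- Z[quadrant == q, , drop = FALSE]
  p_q <- nrow(Zq) / n                 # empirical quadrant probability
  u_q <- colMeans(Zq) - mu            # centered conditional mean
  B_q <- p_q * tcrossprod(u_q)        # rank-one primitive p_q u_q u_q'
  # Check B_q u_q = p_q ||u_q||^2 u_q
  max_eig_err <- max(max_eig_err,
                     max(abs(B_q %*% u_q - p_q * sum(u_q^2) * u_q)))
  Sigma_B <- Sigma_B + B_q
  Sigma_W <- Sigma_W + p_q * pop_cov(Zq)
  C <- cbind(C, sqrt(p_q) * u_q)
}

# Recovery error of Sigma = Sigma_B + Sigma_W, and the recovered eigensystem.
recovery_error <- max(abs(pop_cov(Z) - (Sigma_B + Sigma_W)))
eig_recovered  <- eigen(Sigma_B + Sigma_W, symmetric = TRUE)
```

Under these assumptions `recovery_error` and `max_eig_err` should be zero up to floating-point rounding, and `tcrossprod(C)` should reproduce `Sigma_B`.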
+
+To recover the full classical PCA eigensystem of the original covariance matrix, add the within-quadrant residual covariance terms:
+
+```math
+\Sigma
+=
+\Sigma^B+\Sigma^W
+=
+\sum_q p_q u_q u_q^\top
++
+\sum_q p_q\Sigma^{(q)}.
+```
+
+Then diagonalize the recovered covariance matrix:
+
+```math
+\Sigma v_i=\lambda_i v_i.
+```
+
+This is not an approximation in the sample calculation. It is an exact covariance decomposition up to numerical machine precision.
+
+The R example below verifies this directly by computing:
+
+1. each quadrant conditional mean `m_q`;
+2. each centered displacement `u_q`;
+3. each rank-one matrix `B_q`;
+4. the between-quadrant covariance `Sigma_B`;
+5. the within-quadrant covariance `Sigma_W`;
+6. the recovered covariance `Sigma_B + Sigma_W`;
+7. the eigenvalues and eigenvectors of the recovered matrix.
+
+In the reported run, the max absolute recovery error was approximately:
+
+```math
+1.08\times 10^{-19},
+```
+
+and the original and recovered leading eigenvectors had alignment:
+
+```math
+1.
+```
+
+The between-quadrant covariance alone had leading eigenvector alignment:
+
+```math
+0.9999992
+```
+
+with the full PCA leading eigenvector.
+
+So the result is stronger than merely saying that PCA is related to quadrant means:
+
+> The quadrant conditional means generate rank-one spectral primitives. Their sum recovers the between-quadrant eigensystem. Adding within-quadrant residual covariance recovers the full PCA eigensystem.
+
+
+## 4. PCA Eigenvalue Attribution
+
+Let `(lambda_i, v_i)` be a unit eigenpair of the full covariance matrix `Sigma`:
+
+```math
+\Sigma v_i=\lambda_i v_i,
+\qquad
+\|v_i\|=1.
+```
+
+Since:
+
+```math
+\lambda_i = v_i^\top\Sigma v_i,
+```
+
+and:
+
+```math
+\Sigma
+=
+\sum_q p_q u_q u_q^\top
++
+\sum_q p_q\Sigma^{(q)},
+```
+
+we get the exact attribution identity:
+
+```math
+\lambda_i
+=
+\sum_q p_q(v_i^\top u_q)^2
++
+\sum_q p_q v_i^\top\Sigma^{(q)}v_i.
+```
+
+This decomposes each classical PCA eigenvalue into:
+
+1. between-quadrant conditional-mean displacement;
+2. within-quadrant residual covariance.
+
+> PCA diagonalizes covariance. Directional decomposition explains where that covariance came from.
+
+---
+
+## 5. Observable Directional Markov Transitions
+
+Because `Q_t` is observed, transition probabilities can be estimated directly:
+
+```math
+\widehat P_{qq'}
+=
+P(Q_{t+1}=q'\mid Q_t=q)
+=
+\frac{
+\#\{t:Q_t=q,\;Q_{t+1}=q'\}
+}{
+\#\{t:Q_t=q\}
+}.
+```
+
+This gives a four-state observable Markov chain over:
+
+```math
+\{\mathrm{CUPM},\mathrm{CLPM},\mathrm{DLPM},\mathrm{DUPM}\}.
+```
+
+Examples:
+
+```math
+P(\mathrm{CLPM}_{t+1}\mid \mathrm{CLPM}_t)
+```
+
+is crash persistence.
+
+```math
+P(\mathrm{CUPM}_{t+1}\mid \mathrm{CLPM}_t)
+```
+
+is crash-to-rally reversal.
+
+```math
+P(\mathrm{CLPM}_{t+1}\mid \mathrm{CUPM}_t)
+```
+
+is rally-to-crash reversal.
+
+This is not a latent Markov model. It is an observable directional Markov regime model.
+
+---
+
+## 6. One-Step Predictive Mixture
+
+Given current quadrant `Q_t=q`, the one-step predictive mean is:
+
+```math
+E[Z_{t+1}\mid Q_t=q]
+=
+\sum_{q'}P_{qq'}m_{q'}.
+```
+
+The corresponding predictive covariance is:
+
+```math
+\mathrm{Cov}(Z_{t+1}\mid Q_t=q)
+=
+\sum_{q'}P_{qq'}
+\left[
+\Sigma^{(q')}
++
+(m_{q'}-\mu_{q\to\cdot})(m_{q'}-\mu_{q\to\cdot})^\top
+\right],
+```
+
+where:
+
+```math
+\mu_{q\to\cdot}
+=
+\sum_{q'}P_{qq'}m_{q'}.
+```
+
+This is the same mixture logic used by HMMs, but with observed directional states instead of inferred hidden states.
+
+---
+
+## 7. Dynamic Transition-Path Spectral Decomposition
+
+The static quadrant decomposition can be extended to transition paths.
+
+Define:
+
+```math
+p_{qq'}=P(Q_t=q,Q_{t+1}=q').
+```
+
+Let:
+
+```math
+m_{q\to q'}
+=
+E[Z_{t+1}\mid Q_t=q,Q_{t+1}=q'].
+```
+
+Let:
+
+```math
+\mu_{\mathrm{lead}}=E[Z_{t+1}].
+```
+
+Define the transition-path displacement:
+
+```math
+u_{q\to q'}
+=
+m_{q\to q'}-\mu_{\mathrm{lead}}.
+```
+
+Define the within-transition covariance:
+
+```math
+\Sigma^{(q\to q')}
+=
+\mathrm{Cov}(Z_{t+1}\mid Q_t=q,Q_{t+1}=q').
+```
+
+Then the lead covariance decomposes as:
+
+```math
+\Sigma_{\mathrm{lead}}
+=
+\Sigma^B_{\mathrm{dyn}}
++
+\Sigma^W_{\mathrm{dyn}},
+```
+
+where:
+
+```math
+\Sigma^B_{\mathrm{dyn}}
+=
+\sum_{q,q'}p_{qq'}u_{q\to q'}u_{q\to q'}^\top,
+```
+
+and:
+
+```math
+\Sigma^W_{\mathrm{dyn}}
+=
+\sum_{q,q'}p_{qq'}\Sigma^{(q\to q')}.
+```
+
+Each transition path contributes a rank-one dynamic spectral primitive:
+
+```math
+B_{q\to q'}
+=
+p_{qq'}u_{q\to q'}u_{q\to q'}^\top.
+```
+
+If `u_{q->q'}` is nonzero, then:
+
+```math
+B_{q\to q'}u_{q\to q'}
+=
+p_{qq'}\|u_{q\to q'}\|^2u_{q\to q'}.
+```
+
+Thus transition paths are dynamic rank-one spectral primitives.
+
+---
+
+
+## 8. Dynamic PCA Recovery from Transition-Path Conditional Means
+
+The dynamic recovery chain is:
+
+```math
+\{p_{qq'},m_{q\to q'},\Sigma^{(q\to q')}\}_{q,q'}
+\longrightarrow
+\Sigma^B_{\mathrm{dyn}}+\Sigma^W_{\mathrm{dyn}}
+\longrightarrow
+\Sigma_{\mathrm{lead}}
+\longrightarrow
+(\lambda_i^{\mathrm{lead}},v_i^{\mathrm{lead}}).
+```
+
+Transition-path conditional means enter through:
+
+```math
+u_{q\to q'}=m_{q\to q'}-\mu_{\mathrm{lead}}.
+```
+
+The dynamic between-transition covariance is:
+
+```math
+\Sigma^B_{\mathrm{dyn}}
+=
+\sum_{q,q'}p_{qq'}u_{q\to q'}u_{q\to q'}^\top.
+```
+
+Adding the within-transition residual covariance gives the full lead covariance:
+
+```math
+\Sigma_{\mathrm{lead}}
+=
+\Sigma^B_{\mathrm{dyn}}+\Sigma^W_{\mathrm{dyn}}.
+```
+
+Then diagonalizing `Sigma_lead` recovers the dynamic PCA eigensystem.
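
The transition-path recovery chain can be sketched the same way in base R. Again this is a hedged illustration rather than the note's experiment: the data are i.i.d. draws (so no regime persistence is built in), and `pop_cov()` is a hypothetical helper using the divide-by-`n` convention. The sketch also estimates the row-normalized transition matrix from the observed quadrant sequence.

```r
# Illustrative sketch on simulated data (assumption: not the note's simulation).
set.seed(2)
n <- 2000
X <- rnorm(n)
Y <- 0.6 * X + rnorm(n, sd = 0.8)
Z <- cbind(X = X, Y = Y)
mu <- colMeans(Z)

# Hypothetical helper: population (divide-by-n) covariance of the rows of M.
pop_cov <- function(M) {
  Mc <- sweep(M, 2, colMeans(M))
  crossprod(Mc) / nrow(M)
}

quadrant <- ifelse(Z[, "X"] > mu["X"],
                   ifelse(Z[, "Y"] > mu["Y"], "CUPM", "DLPM"),
                   ifelse(Z[, "Y"] > mu["Y"], "DUPM", "CLPM"))

# Pair each state Q_t with its successor Q_{t+1}; the lead sample is Z_{t+1}.
Q_now   <- quadrant[-n]
Q_next  <- quadrant[-1]
Z_lead  <- Z[-1, , drop = FALSE]
mu_lead <- colMeans(Z_lead)
path    <- paste(Q_now, Q_next, sep = " -> ")

# Empirical transition matrix: each row is P(Q_{t+1} = q' | Q_t = q).
P_hat <- prop.table(table(Q_now, Q_next), margin = 1)

Sigma_B_dyn <- matrix(0, 2, 2)
Sigma_W_dyn <- matrix(0, 2, 2)
for (pth in unique(path)) {
  Zp   <- Z_lead[path == pth, , drop = FALSE]
  p_qq <- nrow(Zp) / nrow(Z_lead)          # joint path probability
  u_qq <- colMeans(Zp) - mu_lead           # transition-path displacement
  Sigma_B_dyn <- Sigma_B_dyn + p_qq * tcrossprod(u_qq)
  Sigma_W_dyn <- Sigma_W_dyn + p_qq * pop_cov(Zp)
}

dyn_error <- max(abs(pop_cov(Z_lead) - (Sigma_B_dyn + Sigma_W_dyn)))
eig_lead  <- eigen(Sigma_B_dyn + Sigma_W_dyn, symmetric = TRUE)
```

Under these assumptions `dyn_error` should be zero up to floating-point rounding, and each row of `P_hat` sums to one.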
+
+In the reported run, the dynamic recovery error was approximately:
+
+```math
+1.08\times 10^{-19},
+```
+
+and the dynamic recovered leading eigenvector had alignment:
+
+```math
+1
+```
+
+with the original lead covariance leading eigenvector.
+
+The dynamic between-transition covariance alone had leading eigenvector alignment:
+
+```math
+0.9999993
+```
+
+with the full lead PC1.
+
+
+## 9. Dynamic Eigenvalue Attribution
+
+Let `(lambda_i_lead, v_i_lead)` be a unit eigenpair of `Sigma_lead`.
+
+Then:
+
+```math
+\lambda_i^{\mathrm{lead}}
+=
+\sum_{q,q'}p_{qq'}(v_i^{\mathrm{lead}\top}u_{q\to q'})^2
++
+\sum_{q,q'}p_{qq'}v_i^{\mathrm{lead}\top}\Sigma^{(q\to q')}v_i^{\mathrm{lead}}.
+```
+
+This gives dynamic spectral attribution by transition path.
+
+Instead of saying:
+
+> PC1 explains most of the variance,
+
+we can say:
+
+> PC1 is generated by specific observable transition paths, such as CLPM to CUPM, CUPM to CLPM, CLPM to CLPM, and CUPM to CUPM.
+
+This is the observable-regime analogue of dynamic factor attribution.
+
+---
+
+## 10. Numerical Experiment
+
+A bivariate time series of length `n = 5000` was simulated from two hidden volatility regimes:
+
+- state 1: calm, low volatility, low correlation;
+- state 2: turbulent, higher volatility, higher correlation.
+
+The true hidden states followed a persistent two-state Markov process with:
+
+```math
+p_{\mathrm{stay}}=0.96.
+```
+
+The observed NNS states were then defined by quadrant membership relative to the mean of `X` and `Y`.
+
+### Hidden State Frequencies
+
+| State | Frequency |
+|---:|---:|
+| 1 calm | 0.4082 |
+| 2 turbulent | 0.5918 |
+
+### Observable Quadrant Frequencies
+
+| Quadrant | Frequency |
+|---|---:|
+| CUPM | 0.3412 |
+| CLPM | 0.3380 |
+| DLPM | 0.1564 |
+| DUPM | 0.1644 |
+
+The concordant quadrants dominate:
+
+```math
+p_{\mathrm{CUPM}}+p_{\mathrm{CLPM}}
+=
+0.6792.
+```
+
+The divergent quadrants account for:
+
+```math
+p_{\mathrm{DLPM}}+p_{\mathrm{DUPM}}
+=
+0.3208.
+```
+
+### Hidden-State Composition by Observable Quadrant
+
+| Hidden state | CUPM | CLPM | DLPM | DUPM |
+|---:|---:|---:|---:|---:|
+| 1 calm | 0.3388 | 0.3172 | 0.5921 | 0.5645 |
+| 2 turbulent | 0.6612 | 0.6828 | 0.4079 | 0.4355 |
+
+The concordant quadrants are mostly turbulent:
+
+```math
+P(\mathrm{turbulent}\mid \mathrm{CUPM})=0.6612,
+```
+
+```math
+P(\mathrm{turbulent}\mid \mathrm{CLPM})=0.6828.
+```
+
+The divergent quadrants are more often calm:
+
+```math
+P(\mathrm{calm}\mid \mathrm{DLPM})=0.5921,
+```
+
+```math
+P(\mathrm{calm}\mid \mathrm{DUPM})=0.5645.
+```
+
+Thus the observable quadrant states recover meaningful information about the latent volatility regime without fitting an HMM.
+
+---
+
+## 11. Static Results
+
+### Covariance Recovery
+
+The original covariance matrix was:
+
+```math
+\Sigma
+=
+\begin{pmatrix}
+0.00078857 & 0.00055621\\
+0.00055621 & 0.00079558
+\end{pmatrix}.
+```
+
+The recovered covariance matrix was:
+
+```math
+\Sigma^B+\Sigma^W
+=
+\begin{pmatrix}
+0.00078857 & 0.00055621\\
+0.00055621 & 0.00079558
+\end{pmatrix}.
+```
+
+The max absolute recovery error was:
+
+```math
+1.084202\times 10^{-19}.
+```
+
+This is numerical zero.
+
+### Static Eigenvalue Recovery
+
+The eigenvalues of the original covariance matrix were:
+
+```math
+\lambda_1=0.0013482978,
+\qquad
+\lambda_2=0.0002358557.
+```
+
+The eigenvalues of the recovered matrix were identical.
+
+The leading eigenvector of the original covariance matrix was:
+
+```math
+v_1=(0.7048772,\;0.7093294).
+```
+
+The leading eigenvector of the recovered matrix was:
+
+```math
+v_1=(0.7048772,\;0.7093294).
+```
+
+The alignment was:
+
+```math
+1.
+```
+
+The leading eigenvector of the between-quadrant covariance `Sigma^B` alone was:
+
+```math
+v^B_1=(0.7039744,\;0.7102253).
+```
+
+Its alignment with the full PC1 was:
+
+```math
+0.9999992.
+```
+
+Thus, in this simulation, the quadrant conditional-mean geometry essentially determines PC1.
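
The reported alignment can be reproduced from the two printed eigenvectors. The note does not spell out its alignment formula, so the absolute cosine used below is an assumption; the vectors are copied from the values reported above, which are rounded to seven digits.

```r
# Reported leading eigenvectors (rounded values copied from the text).
v1  <- c(0.7048772, 0.7093294)   # PC1 of the full covariance
vB1 <- c(0.7039744, 0.7102253)   # PC1 of the between-quadrant covariance

# Assumed definition: alignment = |cosine| between the two directions.
alignment <- abs(sum(v1 * vB1)) / sqrt(sum(v1^2) * sum(vB1^2))

# Angle between the two axes, in degrees.
angle_deg <- acos(min(alignment, 1)) * 180 / pi

round(alignment, 7)
```

With this definition the result agrees with the reported `0.9999992` to the printed precision, which corresponds to an angle of well under a tenth of a degree between the two axes.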
+
+### Static Eigenvalue Attribution
+
+For the first eigenvalue:
+
+```math
+\lambda_1=0.0013482978.
+```
+
+The decomposition was:
+
+```math
+\text{between-quadrant}=0.0008537907,
+```
+
+```math
+\text{within-quadrant}=0.0004945070.
+```
+
+Therefore:
+
+```math
+\frac{0.0008537907}{0.0013482978}
+\approx 63.3\%.
+```
+
+About 63.3 percent of the leading PCA eigenvalue came from between-quadrant conditional-mean displacement.
+
+About 36.7 percent came from within-quadrant residual covariance.
+
+### Per-Quadrant Contributions to `lambda_1`
+
+| Quadrant | Total Contribution | Share of `lambda_1` |
+|---|---:|---:|
+| CLPM | 0.0006692665 | 49.6% |
+| CUPM | 0.0006446941 | 47.8% |
+| DUPM | 0.0000182523 | 1.4% |
+| DLPM | 0.0000160849 | 1.2% |
+
+The concordant quadrants contributed approximately:
+
+```math
+49.6\%+47.8\%=97.4\%
+```
+
+of the leading eigenvalue.
+
+Therefore:
+
+> The leading eigenvalue is a concordant co-movement eigenvalue.
+
+---
+
+## 11A. Direct NNS `PM.matrix` Verification
+
+The previous sections construct the static covariance recovery manually from quadrant probabilities, conditional means, and within-quadrant covariances.
+
+The same covariance object can also be recovered directly from the exported NNS function:
+
+```r
+pm <- NNS::PM.matrix(
+  LPM_degree = 1,
+  UPM_degree = 1,
+  target = "mean",
+  variable = Z,
+  pop_adj = TRUE,
+  norm = FALSE
+)
+```
+
+The returned object contains:
+
+```r
+names(pm)
+# "cupm" "dupm" "dlpm" "clpm" "cov.matrix"
+```
+
+Using the returned directional matrices:
+
+```math
+\Sigma_{\mathrm{NNS}}
+=
+\mathrm{clpm}
++
+\mathrm{cupm}
+-
+\mathrm{dlpm}
+-
+\mathrm{dupm}.
+```
+
+In the reported run:
+
+```math
+\max\left|\mathrm{cov}(Z)-\mathrm{PM.matrix}(Z)\$cov.matrix\right|
+=
+4.336809\times 10^{-19}.
+```
+
+Also:
+
+```math
+\max\left|\mathrm{cov}(Z)-\Sigma_{\mathrm{NNS}}\right|
+=
+4.336809\times 10^{-19}.
+```
+
+and:
+
+```math
+\max\left|\mathrm{PM.matrix}(Z)\$cov.matrix-\Sigma_{\mathrm{NNS}}\right|
+=
+0.
+```
+
+So the direct package implementation recovers the covariance matrix to numerical precision.
+
+### Static PCA Recovery from `PM.matrix`
+
+The classical PCA eigenvalues of `cov(Z)` were:
+
+```math
+0.0013485675,
+\qquad
+0.0002359029.
+```
+
+The eigenvalues recovered from the `PM.matrix` directional reconstruction were identical:
+
+```math
+0.0013485675,
+\qquad
+0.0002359029.
+```
+
+The eigenvector alignments were:
+
+```math
+1,\qquad 1.
+```
+
+Thus `PM.matrix` directly recovers the same static PCA eigensystem.
+
+### Static Eigenvalue Attribution from `PM.matrix`
+
+For PC1:
+
+| Component | Contribution | Signed Share |
+|---|---:|---:|
+| CLPM | 0.0007091240 | 52.58351% |
+| CUPM | 0.0006828494 | 50.63517% |
+| DLPM | 0.0000217030 | -1.609337% |
+| DUPM | 0.0000217030 | -1.609337% |
+
+The signed shares sum to 100 percent because the divergent components enter the covariance reconstruction with negative sign.
+
+For PC2:
+
+| Component | Contribution | Signed Share |
+|---|---:|---:|
+| CLPM | 0.0000973457 | 41.26517% |
+| CUPM | 0.0000951512 | 40.33489% |
+| DLPM | -0.0000217030 | 9.199970% |
+| DUPM | -0.0000217030 | 9.199970% |
+
+The same interpretation holds:
+
+> PC1 is primarily a concordant co-movement object. PC2 contains a larger relative divergent contribution.
+
+### Lead-Sample Verification
+
+The dynamic section works with the lead sample `Z_{t+1}`. Applying the same package workflow to `Z_lead = Z[-1, ]` gives:
+
+```math
+\max\left|\mathrm{cov}(Z_{\mathrm{lead}})-\mathrm{PM.matrix}(Z_{\mathrm{lead}})\$cov.matrix\right|
+=
+5.421011\times 10^{-19}.
+```
+
+The lead-sample PCA eigenvalues were:
+
+```math
+0.0013488347,
+\qquad
+0.0002355702.
+```
+
+The `PM.matrix` recovered eigenvalues were identical, and the eigenvector alignments were again:
+
+```math
+1,\qquad 1.
+```
+
+Thus the unconditional lead covariance used in the dynamic section is also directly recoverable through `PM.matrix`.
+
+### Transition-Path Covariance Validation
+
+The transition-path decomposition remains a manual observable-regime decomposition because it groups observations by paths such as:
+
+```math
+\mathrm{CLPM}\to\mathrm{CUPM},
+\qquad
+\mathrm{CUPM}\to\mathrm{CLPM},
+\qquad
+\mathrm{CLPM}\to\mathrm{CLPM}.
+```
+
+However, the covariance inside each transition path can be checked directly with `PM.matrix`.
+
+Across all 16 transition paths, the maximum path-level covariance error was:
+
+```math
+3.794708\times 10^{-19}.
+```
+
+The maximum path-level directional reassembly error was also:
+
+```math
+3.794708\times 10^{-19}.
+```
+
+So `PM.matrix` validates the within-path covariance objects used inside the dynamic transition-path decomposition.
+
+### `norm = TRUE` Caveat
+
+For covariance and PCA recovery, use:
+
+```r
+norm = FALSE
+```
+
+With `norm = TRUE`, the returned `cov.matrix` is a normalized signed dependence matrix. In this run:
+
+```math
+\mathrm{PM.matrix}(Z,\mathrm{norm}=TRUE)\$cov.matrix
+=
+\begin{pmatrix}
+1 & 0.8650148\\
+0.8650148 & 1
+\end{pmatrix}.
+```
+
+This differs from `cov(Z)` by:
+
+```math
+0.9992113.
+```
+
+Therefore:
+
+> `norm = FALSE` preserves covariance magnitude and recovers PCA.
+> `norm = TRUE` returns a normalized signed dependence matrix, not the covariance/PCA object.
+
+
+---
+
+## 12. Observable Transition Matrix
+
+The estimated transition matrix was:
+
+| Current `Q_t` | CUPM | CLPM | DLPM | DUPM |
+|---|---:|---:|---:|---:|
+| CUPM | 0.3464 | 0.3470 | 0.1512 | 0.1553 |
+| CLPM | 0.3503 | 0.3485 | 0.1444 | 0.1568 |
+| DLPM | 0.3171 | 0.3235 | 0.1739 | 0.1854 |
+| DUPM | 0.3350 | 0.3118 | 0.1754 | 0.1778 |
+
+Key transition probabilities:
+
+```math
+P(\mathrm{CLPM}_{t+1}\mid \mathrm{CLPM}_t)=0.3485.
+```
+
+```math
+P(\mathrm{CUPM}_{t+1}\mid \mathrm{CLPM}_t)=0.3503.
+```
+
+```math
+P(\mathrm{CLPM}_{t+1}\mid \mathrm{CUPM}_t)=0.3470.
+```
+
+In this simulation, crash persistence is not dominant. CLPM is about equally likely to persist or flip to CUPM.
+
+This reveals an important distinction:
+
+> HMM state persistence does not imply directional quadrant persistence.
+
+The hidden volatility regime is persistent, but the directional quadrant state is not strongly persistent. The NNS transition matrix measures persistence and reversal in observable directional outcomes, not persistence in latent volatility.
+
+---
+
+## 13. One-Step Forecast from CLPM
+
+For current state:
+
+```math
+Q_t=\mathrm{CLPM},
+```
+
+the one-step predictive mean was:
+
+```math
+E[Z_{t+1}\mid Q_t=\mathrm{CLPM}]
+=
+(-0.0003297,\;-0.0006816).
+```
+
+The one-step predictive covariance was:
+
+```math
+\mathrm{Cov}(Z_{t+1}\mid Q_t=\mathrm{CLPM})
+=
+\begin{pmatrix}
+0.0008044702 & 0.0005762073\\
+0.0005762073 & 0.0008115607
+\end{pmatrix}.
+```
+
+This is an HMM-style predictive mixture, but no hidden-state filtering was used.
+
+---
+
+## 14. Dynamic Transition-Path Results
+
+### Dynamic Covariance Recovery
+
+The original lead covariance matrix was:
+
+```math
+\Sigma_{\mathrm{lead}}
+=
+\begin{pmatrix}
+0.00078851 & 0.00055651\\
+0.00055651 & 0.00079558
+\end{pmatrix}.
+```
+
+The recovered dynamic covariance matrix was identical:
+
+```math
+\Sigma^B_{\mathrm{dyn}}+\Sigma^W_{\mathrm{dyn}}
+=
+\begin{pmatrix}
+0.00078851 & 0.00055651\\
+0.00055651 & 0.00079558
+\end{pmatrix}.
+```
+
+The max absolute dynamic recovery error was:
+
+```math
+1.084202\times 10^{-19}.
+```
+
+Again, this is numerical zero.
+
+### Dynamic Eigensystem Recovery
+
+The eigenvalues of `Sigma_lead` were:
+
+```math
+\lambda_1^{\mathrm{lead}}=0.0013485649,
+\qquad
+\lambda_2^{\mathrm{lead}}=0.0002355231.
+```
+
+The recovered dynamic covariance matrix had the same eigenvalues.
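
As a quick consistency check, the reported eigenvalues can be recomputed from the printed lead covariance matrix with base R's `eigen()`. The matrix entries below are copied from the text and are rounded to eight decimals, so agreement is only expected to roughly that precision; the sign normalization of the eigenvector is a convention chosen here.

```r
# Lead covariance matrix as printed in the text (rounded entries).
Sigma_lead <- matrix(c(0.00078851, 0.00055651,
                       0.00055651, 0.00079558),
                     nrow = 2, byrow = TRUE)

eig <- eigen(Sigma_lead, symmetric = TRUE)

# Eigenvalues in decreasing order.
lambda <- eig$values

# Leading eigenvector, flipped if needed so its first entry is positive.
v1 <- eig$vectors[, 1]
if (v1[1] < 0) v1 <- -v1

signif(lambda, 6)
round(v1, 4)
```

The recomputed eigenvalues match the reported `0.0013485649` and `0.0002355231` to within the rounding of the printed matrix, and they sum to the trace `0.00158409`.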
+
+The leading eigenvector of `Sigma_lead` was:
+
+```math
+v_1^{\mathrm{lead}}=(0.7048570,\;0.7093494).
+```
+
+The leading eigenvector of the recovered dynamic covariance was identical.
+
+The dynamic between-transition covariance `Sigma^B_dyn` alone had leading eigenvector:
+
+```math
+v^{B,dyn}_1=(0.7040324,\;0.7101679).
+```
+
+Its alignment with the full lead PC1 was:
+
+```math
+0.9999993.
+```
+
+Thus the transition-path conditional-mean geometry essentially determines the dynamic leading eigenvector.
+
+### Dynamic Eigenvalue Attribution
+
+For the leading dynamic eigenvalue:
+
+```math
+\lambda_1^{\mathrm{lead}}=0.001348564855.
+```
+
+The decomposition was:
+
+```math
+\text{between-transition}=0.0008586159410,
+```
+
+```math
+\text{within-transition}=0.0004899489139.
+```
+
+Therefore:
+
+```math
+\frac{0.0008586159410}{0.001348564855}
+\approx 63.7\%.
+```
+
+About 63.7 percent of the dynamic leading eigenvalue came from transition-path conditional-mean displacement.
+
+About 36.3 percent came from within-transition residual covariance.
+
+---
+
+## 15. Transition-Path Contributions to Dynamic `lambda_1`
+
+The largest transition-path contributors to the leading dynamic eigenvalue were:
+
+| Path | Total Contribution | Share of `lambda_1_lead` |
+|---|---:|---:|
+| CLPM to CUPM | 0.0002541752 | 18.85% |
+| CUPM to CLPM | 0.0002489761 | 18.46% |
+| CLPM to CLPM | 0.0002451239 | 18.18% |
+| CUPM to CUPM | 0.0002324682 | 17.24% |
+| DLPM to CLPM | 0.0000955944 | 7.09% |
+| DUPM to CUPM | 0.0000838107 | 6.21% |
+| DUPM to CLPM | 0.0000797236 | 5.91% |
+| DLPM to CUPM | 0.0000743513 | 5.51% |
+
+The top four paths are all concordant-to-concordant paths:
+
+```math
+\{\mathrm{CLPM},\mathrm{CUPM}\}
+\to
+\{\mathrm{CLPM},\mathrm{CUPM}\}.
+```
+
+Together they contributed approximately:
+
+```math
+18.85\%+18.46\%+18.18\%+17.24\%
+=
+72.73\%
+```
+
+of the leading dynamic eigenvalue.
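
The concordant share can also be recomputed from the raw contributions rather than from the rounded percentages. The numbers below are copied from the table and the reported leading dynamic eigenvalue; the variable names are illustrative.

```r
# Reported leading dynamic eigenvalue and the four concordant-path contributions.
lambda1_lead <- 0.001348564855
concordant <- c("CLPM to CUPM" = 0.0002541752,
                "CUPM to CLPM" = 0.0002489761,
                "CLPM to CLPM" = 0.0002451239,
                "CUPM to CUPM" = 0.0002324682)

# Per-path shares and the combined concordant share, in percent.
path_share_pct       <- 100 * concordant / lambda1_lead
concordant_share_pct <- 100 * sum(concordant) / lambda1_lead

round(path_share_pct, 2)
round(concordant_share_pct, 1)
```

Computed this way the combined share is approximately 72.7 percent, in line with the sum of the rounded per-path shares; the last digit can differ slightly because the text adds percentages that were already rounded.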
+ +Therefore: + +> Dynamic PC1 is generated primarily by transitions among concordant regimes. + +This is stronger than saying PC1 is a correlation factor. It identifies the observable transition paths that generate the dominant eigenstructure. + +--- + +## 16. Interpretation + +The results support three claims. + +### Claim 1: NNS quadrant decomposition exactly recovers covariance and PCA. + +The static covariance recovery error was approximately `1e-19`. + +The recovered eigenvalues and eigenvectors matched the original covariance eigensystem exactly to numerical precision. + +The same static covariance and PCA recovery was verified directly through `NNS::PM.matrix` with `norm = FALSE`. The package reconstruction matched `cov(Z)` to numerical precision and produced eigenvector alignments of `1, 1`. + +### Claim 2: Centered quadrant means are rank-one eigenvector primitives. + +For every quadrant: + +```math +B_q u_q=\ell_q u_q. +``` + +The numerical errors were zero to machine precision, and the alignment with each rank-one eigenvector was `1`. + +### Claim 3: Time-indexed quadrant transitions produce observable Markov regimes with exact spectral attribution. + +For every transition path: + +```math +B_{q\to q'}u_{q\to q'} += +\ell_{q\to q'}u_{q\to q'}. +``` + +The dynamic covariance decomposition recovered `Sigma_lead` exactly, and dynamic eigenvalue attribution decomposed the leading eigenvalue into interpretable transition-path contributions. + +The main empirical finding is: + +> Static PC1 is driven by CLPM and CUPM. +> Dynamic PC1 is driven by transitions among CLPM and CUPM. + +In other words, PC1 is not mysterious in this simulation. It is a concordant co-movement object. + +--- + +## 17. Portfolio Interpretation + +The directional decomposition changes the interpretation of hedging. + +Classical PCA suggests hedging against PC1 exposure: + +```math +w^\top v_1=0. +``` + +Directional NNS allows regime-targeted hedging. 
For example, lower-tail co-movement exposure can be neutralized by: + +```math +w^\top u_{\mathrm{CLPM}}=0. +``` + +This condition means the portfolio is locally neutral to the CLPM conditional-mean displacement, not merely to an abstract variance-maximizing axis. + +Dynamic regime hedging can target transition paths: + +```math +w^\top u_{\mathrm{CLPM}\to\mathrm{CLPM}}=0, +``` + +or: + +```math +w^\top u_{\mathrm{CLPM}\to\mathrm{CUPM}}=0. +``` + +These are economically interpretable exposures: + +- CLPM to CLPM: crash persistence; +- CLPM to CUPM: crash-to-rally reversal; +- CUPM to CLPM: rally-to-crash reversal; +- CUPM to CUPM: rally persistence. + +Thus, directional spectral decomposition provides not only attribution, but also targeted risk control. + +--- + +## 18. Theoretical Significance + +Classical PCA begins with the covariance matrix: + +```math +\Sigma, +``` + +and extracts orthogonal eigenvectors: + +```math +v_i. +``` + +The directional framework begins with interpretable regime primitives: + +```math +p_q,\quad m_q,\quad u_q,\quad \Sigma^{(q)}. +``` + +Then it reconstructs: + +```math +\Sigma += +\sum_q p_q u_q u_q^\top ++ +\sum_q p_q\Sigma^{(q)}. +``` + +Therefore the PCA eigensystem is downstream of directional structure: + +```math +\{p_q,u_q,\Sigma^{(q)}\}_q +\longrightarrow +\Sigma +\longrightarrow +(\lambda_i,v_i). +``` + +The map does not generally reverse: + +```math +(\lambda_i,v_i) +\not\longrightarrow +\{p_q,u_q,\Sigma^{(q)}\}_q. +``` + +PCA discards the regime-indexed origin of covariance. + +NNS preserves it. + +--- + +## 19. Bottom Line + +The central result is: + +> PCA diagonalizes covariance. Directional decomposition explains its origin. + +The static result says: + +> PCA eigenvalues and eigenvectors are recoverable from directional quadrant decomposition. + +The dynamic result says: + +> PCA eigenvalues and eigenvectors are recoverable from observable directional transition-path decomposition. 
+ +The portfolio result says: + +> Risk can be attributed and hedged by named directional regimes, not only by abstract orthogonal factors. + +The HMM comparison says: + +> HMMs infer latent regimes and then interpret them. NNS defines observable directional regimes first and then measures their dynamics. + +Together, these results show that directional NNS is not merely an alternative dependence statistic. It is a regime-indexed spectral genealogy for covariance, PCA, and dynamic risk. + +The manual regime decomposition explains the genealogy. The direct `NNS::PM.matrix` checks show that the underlying directional covariance and PCA recovery are also available through the exported NNS package workflow. + +--- + +## Appendix: Full R Code + +```r +# ============================================================================= +# Directional Markov Regimes +# Observable NNS Quadrant Analogy to Hidden Markov Models +# ============================================================================= + +set.seed(123) + +# ----------------------------------------------------------------------------- +# Helper functions +# ----------------------------------------------------------------------------- + +pop_cov <- function(M) { + M <- as.matrix(M) + n <- nrow(M) + if (n <= 1) return(matrix(0, ncol(M), ncol(M))) + cov(M) * (n - 1) / n +} + +mvrnorm_base <- function(n, mu, Sigma) { + p <- length(mu) + Z <- matrix(rnorm(n * p), n, p) + sweep(Z %*% chol(Sigma), 2, mu, "+") +} + +assign_quadrants <- function(Z, center) { + X <- Z[, 1] + Y <- Z[, 2] + cx <- center[1] + cy <- center[2] + + Q <- rep(NA_character_, nrow(Z)) + + Q[X > cx & Y > cy] <- "CUPM" + Q[X <= cx & Y <= cy] <- "CLPM" + Q[X > cx & Y <= cy] <- "DLPM" + Q[X <= cx & Y > cy] <- "DUPM" + + factor(Q, levels = c("CUPM", "CLPM", "DLPM", "DUPM")) +} + +cat_matrix <- function(M, digits = 8) { + print(round(M, digits)) +} + +# ----------------------------------------------------------------------------- +# 1. 
Simulate bivariate time series with two hidden volatility regimes +# ----------------------------------------------------------------------------- + +n <- 5000 + +# True hidden states: +# 1 = calm +# 2 = turbulent +state <- integer(n) +state[1] <- 1 + +p_stay <- 0.96 + +for (t in 2:n) { + state[t] <- ifelse(runif(1) < p_stay, state[t - 1], 3 - state[t - 1]) +} + +mu_calm <- c(0.0003, 0.0003) +Sigma_calm <- matrix( + c(0.0002, 0.00003, + 0.00003, 0.0002), + nrow = 2, + byrow = TRUE +) + +mu_turb <- c(-0.0008, -0.0012) +Sigma_turb <- matrix( + c(0.0012, 0.0009, + 0.0009, 0.0012), + nrow = 2, + byrow = TRUE +) + +Z <- matrix(0, n, 2) + +for (t in seq_len(n)) { + if (state[t] == 1) { + Z[t, ] <- mvrnorm_base(1, mu_calm, Sigma_calm) + } else { + Z[t, ] <- mvrnorm_base(1, mu_turb, Sigma_turb) + } +} + +colnames(Z) <- c("X", "Y") + +cat("\n============================================================\n") +cat("SIMULATED DATA\n") +cat("============================================================\n") +cat("Number of observations:", n, "\n") +cat("True hidden state frequencies:\n") +print(prop.table(table(state))) + +# ----------------------------------------------------------------------------- +# 2. Observable NNS quadrant assignment by mean split +# ----------------------------------------------------------------------------- + +# For ex post decomposition, use full-sample mean. +# For live forecasting, replace this with a training, expanding, rolling, +# or externally fixed benchmark. 
+center <- colMeans(Z) + +Q <- assign_quadrants(Z, center) + +cat("\n============================================================\n") +cat("OBSERVABLE NNS QUADRANT STATES\n") +cat("============================================================\n") +cat("Mean benchmark:\n") +print(center) + +cat("\nQuadrant frequencies:\n") +p_quad <- prop.table(table(Q)) +print(round(p_quad, 6)) + +cat("\nCross-tab: true hidden state versus observable quadrant\n") +print(table(state, Q)) + +cat("\nConditional distribution of hidden states within each quadrant:\n") +print(round(prop.table(table(state, Q), margin = 2), 4)) + +# ----------------------------------------------------------------------------- +# 3. Static NNS covariance decomposition +# ----------------------------------------------------------------------------- + +mu <- colMeans(Z) +Sigma <- pop_cov(Z) + +levels_Q <- levels(Q) + +quad_means <- matrix(NA_real_, nrow = 2, ncol = length(levels_Q)) +colnames(quad_means) <- levels_Q +rownames(quad_means) <- colnames(Z) + +quad_covs <- vector("list", length(levels_Q)) +names(quad_covs) <- levels_Q + +Sigma_B <- matrix(0, 2, 2) +Sigma_W <- matrix(0, 2, 2) + +rank_one_table <- data.frame( + quadrant = character(), + p = numeric(), + mean_X = numeric(), + mean_Y = numeric(), + u_X = numeric(), + u_Y = numeric(), + ell_rank1 = numeric(), + max_abs_Bu_minus_ell_u = numeric(), + alignment_with_Bq_eigenvector = numeric(), + stringsAsFactors = FALSE +) + +cat("\n============================================================\n") +cat("STATIC NNS QUADRANT DECOMPOSITION\n") +cat("============================================================\n") + +for (q in levels_Q) { + idx <- Q == q + Zq <- Z[idx, , drop = FALSE] + + p_q <- nrow(Zq) / nrow(Z) + m_q <- colMeans(Zq) + u_q <- as.numeric(m_q - mu) + + Sigma_q <- pop_cov(Zq) + + B_q <- p_q * tcrossprod(u_q) + W_q <- p_q * Sigma_q + + Sigma_B <- Sigma_B + B_q + Sigma_W <- Sigma_W + W_q + + quad_means[, q] <- m_q + quad_covs[[q]] <- Sigma_q + + 
ell_q <- p_q * sum(u_q^2) + + lhs <- as.numeric(B_q %*% u_q) + rhs <- ell_q * u_q + max_err <- max(abs(lhs - rhs)) + + eig_Bq <- eigen(B_q, symmetric = TRUE) + v_Bq <- eig_Bq$vectors[, 1] + + if (sum(u_q^2) > 0) { + v_u <- u_q / sqrt(sum(u_q^2)) + align <- abs(sum(v_u * v_Bq)) + } else { + align <- NA_real_ + } + + rank_one_table <- rbind( + rank_one_table, + data.frame( + quadrant = q, + p = p_q, + mean_X = m_q[1], + mean_Y = m_q[2], + u_X = u_q[1], + u_Y = u_q[2], + ell_rank1 = ell_q, + max_abs_Bu_minus_ell_u = max_err, + alignment_with_Bq_eigenvector = align, + stringsAsFactors = FALSE + ) + ) +} + +print(rank_one_table, digits = 8) + +Sigma_recovered <- Sigma_B + Sigma_W + +cat("\nOriginal population covariance Sigma:\n") +cat_matrix(Sigma) + +cat("\nRecovered covariance Sigma_B + Sigma_W:\n") +cat_matrix(Sigma_recovered) + +cat("\nMax absolute recovery error:\n") +print(max(abs(Sigma - Sigma_recovered))) + +cat("\nBetween-quadrant covariance Sigma_B:\n") +cat_matrix(Sigma_B) + +cat("\nWithin-quadrant residual covariance Sigma_W:\n") +cat_matrix(Sigma_W) + +# ----------------------------------------------------------------------------- +# 4. 
Static eigensystem recovery and spectral attribution +# ----------------------------------------------------------------------------- + +eig_Sigma <- eigen(Sigma, symmetric = TRUE) +eig_recovered <- eigen(Sigma_recovered, symmetric = TRUE) +eig_B <- eigen(Sigma_B, symmetric = TRUE) + +cat("\n============================================================\n") +cat("STATIC EIGENVALUE AND EIGENVECTOR RECOVERY\n") +cat("============================================================\n") + +cat("\nEigenvalues of original Sigma:\n") +print(eig_Sigma$values) + +cat("\nEigenvalues of recovered Sigma_B + Sigma_W:\n") +print(eig_recovered$values) + +cat("\nLeading eigenvector of original Sigma:\n") +print(eig_Sigma$vectors[, 1]) + +cat("\nLeading eigenvector of recovered Sigma_B + Sigma_W:\n") +print(eig_recovered$vectors[, 1]) + +cat("\nLeading eigenvector of Sigma_B only:\n") +print(eig_B$vectors[, 1]) + +cat("\nAbsolute alignment between original PC1 and recovered PC1:\n") +print(abs(sum(eig_Sigma$vectors[, 1] * eig_recovered$vectors[, 1]))) + +cat("\nAbsolute alignment between original PC1 and Sigma_B PC1:\n") +print(abs(sum(eig_Sigma$vectors[, 1] * eig_B$vectors[, 1]))) + +cat("\n============================================================\n") +cat("STATIC SPECTRAL ATTRIBUTION ALONG FULL PCA EIGENVECTORS\n") +cat("============================================================\n") + +static_attr <- data.frame( + eigen_index = integer(), + lambda = numeric(), + between = numeric(), + within = numeric(), + between_plus_within = numeric(), + recovery_error = numeric() +) + +for (i in 1:2) { + v <- eig_Sigma$vectors[, i] + lambda_i <- eig_Sigma$values[i] + + between_i <- as.numeric(t(v) %*% Sigma_B %*% v) + within_i <- as.numeric(t(v) %*% Sigma_W %*% v) + + static_attr <- rbind( + static_attr, + data.frame( + eigen_index = i, + lambda = lambda_i, + between = between_i, + within = within_i, + between_plus_within = between_i + within_i, + recovery_error = abs(lambda_i - between_i - 
within_i) + ) + ) +} + +print(static_attr, digits = 10) + +# Per-quadrant contribution to lambda1 +v1_static <- eig_Sigma$vectors[, 1] + +quad_lambda1_contrib <- data.frame( + quadrant = character(), + p = numeric(), + between_contrib_to_lambda1 = numeric(), + within_contrib_to_lambda1 = numeric(), + total_contrib_to_lambda1 = numeric(), + stringsAsFactors = FALSE +) + +for (q in levels_Q) { + p_q <- as.numeric(p_quad[q]) + u_q <- as.numeric(quad_means[, q] - mu) + Sigma_q <- quad_covs[[q]] + + b_contrib <- p_q * sum(v1_static * u_q)^2 + w_contrib <- p_q * as.numeric(t(v1_static) %*% Sigma_q %*% v1_static) + + quad_lambda1_contrib <- rbind( + quad_lambda1_contrib, + data.frame( + quadrant = q, + p = p_q, + between_contrib_to_lambda1 = b_contrib, + within_contrib_to_lambda1 = w_contrib, + total_contrib_to_lambda1 = b_contrib + w_contrib, + stringsAsFactors = FALSE + ) + ) +} + +cat("\nPer-quadrant contribution to classical lambda1:\n") +print( + quad_lambda1_contrib[order(-quad_lambda1_contrib$total_contrib_to_lambda1), ], + digits = 10 +) + +cat("\nCheck sum of per-quadrant contributions to lambda1:\n") +print(sum(quad_lambda1_contrib$total_contrib_to_lambda1)) +cat("Classical lambda1:\n") +print(eig_Sigma$values[1]) + +# ----------------------------------------------------------------------------- +# 5. Observable transition matrix +# ----------------------------------------------------------------------------- + +Q_current <- Q[-n] +Q_next <- Q[-1] + +trans_counts <- table(Q_current, Q_next) + +# Use unrounded transition probabilities for calculations. +trans_prob <- prop.table(trans_counts, margin = 1) + +# Rounded matrix only for display. 
+trans_prob_print <- round(trans_prob, 4) + +cat("\n============================================================\n") +cat("OBSERVABLE DIRECTIONAL MARKOV TRANSITION MATRIX\n") +cat("============================================================\n") + +cat("\nTransition counts:\n") +print(trans_counts) + +cat("\nTransition probabilities P(Q_{t+1} = q_next | Q_t = q_current):\n") +print(trans_prob_print) + +cat("\nCrash persistence, CLPM to CLPM:\n") +print(trans_prob["CLPM", "CLPM"]) + +cat("\nCrash-to-rally reversal, CLPM to CUPM:\n") +print(trans_prob["CLPM", "CUPM"]) + +cat("\nRally-to-crash reversal, CUPM to CLPM:\n") +print(trans_prob["CUPM", "CLPM"]) + +# ----------------------------------------------------------------------------- +# 6. One-step predictive mixture given current quadrant +# ----------------------------------------------------------------------------- + +forecast_mean <- function(q) { + row <- as.numeric(trans_prob[q, ]) + names(row) <- levels_Q + + out <- rep(0, 2) + names(out) <- colnames(Z) + + for (qq in levels_Q) { + out <- out + row[qq] * quad_means[, qq] + } + + out +} + +forecast_cov <- function(q) { + row <- as.numeric(trans_prob[q, ]) + names(row) <- levels_Q + + mu_fc <- forecast_mean(q) + cov_sum <- matrix(0, 2, 2) + + for (qq in levels_Q) { + m_qq <- quad_means[, qq] + Sigma_qq <- quad_covs[[qq]] + d <- as.numeric(m_qq - mu_fc) + + cov_sum <- cov_sum + row[qq] * (Sigma_qq + tcrossprod(d)) + } + + colnames(cov_sum) <- rownames(cov_sum) <- colnames(Z) + cov_sum +} + +cat("\n============================================================\n") +cat("ONE-STEP FORECAST FROM CURRENT OBSERVABLE QUADRANT\n") +cat("============================================================\n") + +current_q <- "CLPM" + +cat("\nForecast mean E[Z_{t+1} | Q_t = CLPM]:\n") +print(forecast_mean(current_q)) + +cat("\nForecast covariance Cov(Z_{t+1} | Q_t = CLPM):\n") +cat_matrix(forecast_cov(current_q), digits = 10) + +# 
----------------------------------------------------------------------------- +# 7. Dynamic transition-path covariance decomposition +# ----------------------------------------------------------------------------- + +Z_lead <- Z[-1, , drop = FALSE] + +mu_lead <- colMeans(Z_lead) +Sigma_lead <- pop_cov(Z_lead) + +Sigma_B_dyn <- matrix(0, 2, 2) +Sigma_W_dyn <- matrix(0, 2, 2) + +path_table <- data.frame( + path = character(), + q_current = character(), + q_next = character(), + p_path = numeric(), + mean_X_lead = numeric(), + mean_Y_lead = numeric(), + u_X = numeric(), + u_Y = numeric(), + ell_rank1 = numeric(), + max_abs_Bu_minus_ell_u = numeric(), + alignment_with_Bpath_eigenvector = numeric(), + stringsAsFactors = FALSE +) + +path_covs <- list() + +for (q in levels_Q) { + for (qq in levels_Q) { + idx <- which(Q_current == q & Q_next == qq) + if (length(idx) == 0) next + + Z_path <- Z_lead[idx, , drop = FALSE] + + p_path <- nrow(Z_path) / nrow(Z_lead) + m_path <- colMeans(Z_path) + u_path <- as.numeric(m_path - mu_lead) + Sigma_path <- pop_cov(Z_path) + + B_path <- p_path * tcrossprod(u_path) + W_path <- p_path * Sigma_path + + Sigma_B_dyn <- Sigma_B_dyn + B_path + Sigma_W_dyn <- Sigma_W_dyn + W_path + + ell_path <- p_path * sum(u_path^2) + + lhs <- as.numeric(B_path %*% u_path) + rhs <- ell_path * u_path + max_err <- max(abs(lhs - rhs)) + + eig_Bpath <- eigen(B_path, symmetric = TRUE) + v_Bpath <- eig_Bpath$vectors[, 1] + + if (sum(u_path^2) > 0) { + v_u <- u_path / sqrt(sum(u_path^2)) + align <- abs(sum(v_u * v_Bpath)) + } else { + align <- NA_real_ + } + + path_name <- paste(q, qq, sep = "->") + path_covs[[path_name]] <- Sigma_path + + path_table <- rbind( + path_table, + data.frame( + path = path_name, + q_current = q, + q_next = qq, + p_path = p_path, + mean_X_lead = m_path[1], + mean_Y_lead = m_path[2], + u_X = u_path[1], + u_Y = u_path[2], + ell_rank1 = ell_path, + max_abs_Bu_minus_ell_u = max_err, + alignment_with_Bpath_eigenvector = align, + 
stringsAsFactors = FALSE + ) + ) + } +} + +Sigma_lead_recovered <- Sigma_B_dyn + Sigma_W_dyn + +cat("\n============================================================\n") +cat("DYNAMIC TRANSITION-PATH COVARIANCE DECOMPOSITION\n") +cat("============================================================\n") + +cat("\nLead-sample mean mu_lead:\n") +print(mu_lead) + +cat("\nOriginal lead covariance Sigma_lead:\n") +cat_matrix(Sigma_lead) + +cat("\nRecovered dynamic covariance Sigma_B_dyn + Sigma_W_dyn:\n") +cat_matrix(Sigma_lead_recovered) + +cat("\nMax absolute dynamic recovery error:\n") +print(max(abs(Sigma_lead - Sigma_lead_recovered))) + +cat("\nDynamic between-transition covariance Sigma_B_dyn:\n") +cat_matrix(Sigma_B_dyn) + +cat("\nDynamic within-transition residual covariance Sigma_W_dyn:\n") +cat_matrix(Sigma_W_dyn) + +cat("\nTransition-path rank-one primitive checks:\n") +print(path_table[order(-path_table$ell_rank1), ], digits = 8) + +# ----------------------------------------------------------------------------- +# 8. 
Dynamic eigensystem and transition-path spectral attribution +# ----------------------------------------------------------------------------- + +eig_lead <- eigen(Sigma_lead, symmetric = TRUE) +eig_dyn_recovered <- eigen(Sigma_lead_recovered, symmetric = TRUE) +eig_B_dyn <- eigen(Sigma_B_dyn, symmetric = TRUE) + +cat("\n============================================================\n") +cat("DYNAMIC EIGENVALUE AND EIGENVECTOR RECOVERY\n") +cat("============================================================\n") + +cat("\nEigenvalues of original Sigma_lead:\n") +print(eig_lead$values) + +cat("\nEigenvalues of recovered dynamic covariance:\n") +print(eig_dyn_recovered$values) + +cat("\nEigenvalues of dynamic between-transition covariance Sigma_B_dyn:\n") +print(eig_B_dyn$values) + +cat("\nLeading eigenvector of Sigma_lead:\n") +print(eig_lead$vectors[, 1]) + +cat("\nLeading eigenvector of recovered dynamic covariance:\n") +print(eig_dyn_recovered$vectors[, 1]) + +cat("\nLeading eigenvector of Sigma_B_dyn only:\n") +print(eig_B_dyn$vectors[, 1]) + +cat("\nAbsolute alignment between Sigma_lead PC1 and recovered dynamic PC1:\n") +print(abs(sum(eig_lead$vectors[, 1] * eig_dyn_recovered$vectors[, 1]))) + +cat("\nAbsolute alignment between Sigma_lead PC1 and Sigma_B_dyn PC1:\n") +print(abs(sum(eig_lead$vectors[, 1] * eig_B_dyn$vectors[, 1]))) + +cat("\n============================================================\n") +cat("DYNAMIC SPECTRAL ATTRIBUTION ALONG FULL LEAD PCA EIGENVECTORS\n") +cat("============================================================\n") + +dynamic_attr <- data.frame( + eigen_index = integer(), + lambda = numeric(), + between_transition = numeric(), + within_transition = numeric(), + between_plus_within = numeric(), + recovery_error = numeric() +) + +for (i in 1:2) { + v <- eig_lead$vectors[, i] + lambda_i <- eig_lead$values[i] + + between_i <- as.numeric(t(v) %*% Sigma_B_dyn %*% v) + within_i <- as.numeric(t(v) %*% Sigma_W_dyn %*% v) + + dynamic_attr <- 
rbind( + dynamic_attr, + data.frame( + eigen_index = i, + lambda = lambda_i, + between_transition = between_i, + within_transition = within_i, + between_plus_within = between_i + within_i, + recovery_error = abs(lambda_i - between_i - within_i) + ) + ) +} + +print(dynamic_attr, digits = 10) + +# Per-transition-path contribution to lead lambda1 +v1_dyn <- eig_lead$vectors[, 1] + +path_lambda1_contrib <- data.frame( + path = character(), + p_path = numeric(), + between_contrib_to_lambda1 = numeric(), + within_contrib_to_lambda1 = numeric(), + total_contrib_to_lambda1 = numeric(), + stringsAsFactors = FALSE +) + +for (i in seq_len(nrow(path_table))) { + path_name <- path_table$path[i] + p_path <- path_table$p_path[i] + u_path <- c(path_table$u_X[i], path_table$u_Y[i]) + Sigma_path <- path_covs[[path_name]] + + b_contrib <- p_path * sum(v1_dyn * u_path)^2 + w_contrib <- p_path * as.numeric(t(v1_dyn) %*% Sigma_path %*% v1_dyn) + + path_lambda1_contrib <- rbind( + path_lambda1_contrib, + data.frame( + path = path_name, + p_path = p_path, + between_contrib_to_lambda1 = b_contrib, + within_contrib_to_lambda1 = w_contrib, + total_contrib_to_lambda1 = b_contrib + w_contrib, + stringsAsFactors = FALSE + ) + ) +} + +cat("\nTop transition-path contributions to lead lambda1:\n") +print( + path_lambda1_contrib[order(-path_lambda1_contrib$total_contrib_to_lambda1), ], + digits = 10 +) + +cat("\nTop transition-path BETWEEN contributions to lead lambda1:\n") +print( + path_lambda1_contrib[order(-path_lambda1_contrib$between_contrib_to_lambda1), ], + digits = 10 +) + +cat("\nCheck sum of path contributions to lead lambda1:\n") +print(sum(path_lambda1_contrib$total_contrib_to_lambda1)) +cat("Lead lambda1:\n") +print(eig_lead$values[1]) + +# ----------------------------------------------------------------------------- +# 9. 
Conceptual contrast with classical HMM +# ----------------------------------------------------------------------------- + +cat("\n============================================================\n") +cat("CONCEPTUAL CONTRAST WITH CLASSICAL HMM\n") +cat("============================================================\n") + +cat("\nClassical HMM workflow:\n") +cat(" 1. Choose number of hidden states.\n") +cat(" 2. Specify emission distributions.\n") +cat(" 3. Estimate hidden states and parameters, usually by EM.\n") +cat(" 4. Interpret the latent states after estimation.\n") + +cat("\nNNS directional Markov regime workflow:\n") +cat(" 1. Define observable states by directional quadrant membership.\n") +cat(" 2. Estimate state probabilities by frequencies.\n") +cat(" 3. Estimate transition probabilities by counts.\n") +cat(" 4. Estimate state means, state covariances, and spectral contributions directly.\n") + +cat("\nBottom line:\n") +cat(" HMMs infer hidden regimes and then interpret them.\n") +cat(" NNS defines interpretable directional regimes first and then measures their dynamics.\n") + +cat("\n============================================================\n") +cat("DONE\n") +cat("============================================================\n") + +``` + +--- + +## Appendix B: Direct `NNS::PM.matrix` Verification Code + +```r +# ============================================================================= +# Directional Markov Regimes: Direct NNS::PM.matrix Verification +# Static, Lead, and Transition-Path Covariance Checks +# ============================================================================= + +library(NNS) + +pop_cov <- function(M) { + M <- as.matrix(M) + n <- nrow(M) + if (n <= 1) return(matrix(0, ncol(M), ncol(M))) + cov(M) * (n - 1) / n +} + +mvrnorm_base <- function(n, mu, Sigma) { + p <- length(mu) + Z <- matrix(rnorm(n * p), n, p) + sweep(Z %*% chol(Sigma), 2, mu, "+") +} + +assign_quadrants <- function(Z, center) { + X <- Z[, 1] + Y <- Z[, 2] + cx <- 
center[1] + cy <- center[2] + + Q <- rep(NA_character_, nrow(Z)) + + Q[X > cx & Y > cy] <- "CUPM" + Q[X <= cx & Y <= cy] <- "CLPM" + Q[X > cx & Y <= cy] <- "DLPM" + Q[X <= cx & Y > cy] <- "DUPM" + + factor(Q, levels = c("CUPM", "CLPM", "DLPM", "DUPM")) +} + +pm_matrix_check <- function(Z, label, pcs = 2) { + Z <- as.matrix(Z) + + cat("\n------------------------------------------------------------\n") + cat(label, "\n") + cat("------------------------------------------------------------\n") + + Sigma_classic <- cov(Z) + pca_classic <- eigen(Sigma_classic, symmetric = TRUE) + + pm <- NNS::PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = Z, + pop_adj = TRUE, + norm = FALSE + ) + + Sigma_nns <- pm$clpm + pm$cupm - pm$dlpm - pm$dupm + pca_nns <- eigen(Sigma_nns, symmetric = TRUE) + + for (j in seq_len(ncol(Z))) { + if (sum(pca_classic$vectors[, j] * pca_nns$vectors[, j]) < 0) { + pca_nns$vectors[, j] <- -pca_nns$vectors[, j] + } + } + + pcs <- min(pcs, ncol(Z)) + + alignments <- sapply(seq_len(pcs), function(k) { + abs(sum(pca_classic$vectors[, k] * pca_nns$vectors[, k])) + }) + + cat("\nMax abs difference: cov(Z) vs PM.matrix $cov.matrix:\n") + print(max(abs(Sigma_classic - pm$cov.matrix))) + + cat("\nMax abs difference: cov(Z) vs directional reassembly:\n") + print(max(abs(Sigma_classic - Sigma_nns))) + + cat("\nClassical PCA eigenvalues:\n") + print(round(pca_classic$values, 12)) + + cat("\nNNS PM.matrix recovered PCA eigenvalues:\n") + print(round(pca_nns$values, 12)) + + cat("\nEigenvector alignments:\n") + print(round(alignments, 12)) + + attrib <- data.frame( + PC = seq_len(pcs), + eigenvalue = pca_classic$values[seq_len(pcs)], + CLPM = NA_real_, + CUPM = NA_real_, + DLPM = NA_real_, + DUPM = NA_real_, + recovered = NA_real_, + error = NA_real_ + ) + + for (k in seq_len(pcs)) { + v <- pca_classic$vectors[, k, drop = FALSE] + + clpm_k <- drop(t(v) %*% pm$clpm %*% v) + cupm_k <- drop(t(v) %*% pm$cupm %*% v) + dlpm_k <- drop(t(v) %*% 
pm$dlpm %*% v) + dupm_k <- drop(t(v) %*% pm$dupm %*% v) + + recovered_k <- clpm_k + cupm_k - dlpm_k - dupm_k + + attrib$CLPM[k] <- clpm_k + attrib$CUPM[k] <- cupm_k + attrib$DLPM[k] <- dlpm_k + attrib$DUPM[k] <- dupm_k + attrib$recovered[k] <- recovered_k + attrib$error[k] <- abs(attrib$eigenvalue[k] - recovered_k) + } + + cat("\nDirectional eigenvalue attribution from PM.matrix:\n") + print(round(attrib, 12)) + + invisible(list( + pm = pm, + Sigma_classic = Sigma_classic, + Sigma_nns = Sigma_nns, + pca_classic = pca_classic, + pca_nns = pca_nns, + attrib = attrib, + alignments = alignments + )) +} + +# Recreate the Markov-regime simulation from the note. +set.seed(123) + +n <- 5000 + +state <- integer(n) +state[1] <- 1 + +p_stay <- 0.96 + +for (t in 2:n) { + state[t] <- ifelse(runif(1) < p_stay, state[t - 1], 3 - state[t - 1]) +} + +mu_calm <- c(0.0003, 0.0003) +Sigma_calm <- matrix( + c(0.0002, 0.00003, + 0.00003, 0.0002), + nrow = 2, + byrow = TRUE +) + +mu_turb <- c(-0.0008, -0.0012) +Sigma_turb <- matrix( + c(0.0012, 0.0009, + 0.0009, 0.0012), + nrow = 2, + byrow = TRUE +) + +Z <- matrix(0, n, 2) + +for (t in seq_len(n)) { + if (state[t] == 1) { + Z[t, ] <- mvrnorm_base(1, mu_calm, Sigma_calm) + } else { + Z[t, ] <- mvrnorm_base(1, mu_turb, Sigma_turb) + } +} + +colnames(Z) <- c("X", "Y") + +center <- colMeans(Z) +Q <- assign_quadrants(Z, center) + +# Static PM.matrix recovery. +static_pm <- pm_matrix_check( + Z = Z, + label = "Static covariance/PCA recovery using NNS::PM.matrix(Z)", + pcs = 2 +) + +# Lead-sample PM.matrix recovery. +Z_lead <- Z[-1, , drop = FALSE] + +lead_pm <- pm_matrix_check( + Z = Z_lead, + label = "Lead covariance/PCA recovery using NNS::PM.matrix(Z_lead)", + pcs = 2 +) + +# Transition-path covariance validation. 
+Q_current <- Q[-n] +Q_next <- Q[-1] +levels_Q <- levels(Q) + +path_pm_table <- data.frame( + path = character(), + n_path = integer(), + p_path = numeric(), + max_abs_cov_vs_PM_cov = numeric(), + max_abs_cov_vs_PM_reassembly = numeric(), + stringsAsFactors = FALSE +) + +for (q in levels_Q) { + for (qq in levels_Q) { + idx <- which(Q_current == q & Q_next == qq) + if (length(idx) <= 1) next + + Z_path <- Z_lead[idx, , drop = FALSE] + + pm_path <- NNS::PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = Z_path, + pop_adj = TRUE, + norm = FALSE + ) + + Sigma_path_pm <- pm_path$clpm + pm_path$cupm - pm_path$dlpm - pm_path$dupm + + path_pm_table <- rbind( + path_pm_table, + data.frame( + path = paste(q, qq, sep = "->"), + n_path = nrow(Z_path), + p_path = nrow(Z_path) / nrow(Z_lead), + max_abs_cov_vs_PM_cov = max(abs(cov(Z_path) - pm_path$cov.matrix)), + max_abs_cov_vs_PM_reassembly = max(abs(cov(Z_path) - Sigma_path_pm)), + stringsAsFactors = FALSE + ) + ) + } +} + +cat("\nPM.matrix validation for each transition-path covariance:\n") +print(path_pm_table[order(path_pm_table$path), ], digits = 10) + +cat("\nMax transition-path PM.matrix covariance error:\n") +print(max(path_pm_table$max_abs_cov_vs_PM_cov)) + +cat("\nMax transition-path PM.matrix reassembly error:\n") +print(max(path_pm_table$max_abs_cov_vs_PM_reassembly)) + +# norm = TRUE caveat. 
+pm_norm <- NNS::PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = Z, + pop_adj = TRUE, + norm = TRUE +) + +cat("\nnorm = TRUE $cov.matrix:\n") +print(round(pm_norm$cov.matrix, 10)) + +cat("\nMax absolute difference: norm = TRUE $cov.matrix vs cov(Z):\n") +print(max(abs(pm_norm$cov.matrix - cov(Z)))) +``` + diff --git a/tools/NNS/examples/discrete_and_continuous_bayes.md b/tools/NNS/examples/discrete_and_continuous_bayes.md new file mode 100644 index 0000000..3ea440b --- /dev/null +++ b/tools/NNS/examples/discrete_and_continuous_bayes.md @@ -0,0 +1,181 @@ +# Numerical Example: Degree-0 Bayes and Degree-1 Hinge-Surface Recovery + +This example illustrates the distinction between: + +1. **Degree-0 Bayes**, which gives the exact conditional event probability directly, and +2. **Degree-1 hinge-surface recovery**, which reconstructs the same event probability indirectly through recovery of the joint CDF. + +The point is not that degree 1 replaces degree 0 as a probability ratio. Rather, degree 1 supplies the hinge surface from which the joint law can be recovered. + +## Setup + +We simulate a dependent bivariate sample: + +```r +library(NNS) + +set.seed(123) +n <- 2000 +x <- rnorm(n) +y <- rnorm(n) + 0.8 * x + +t_x <- 0 +t_y <- 0 +``` + +Empirical sample quantities: + +- $P(X>0) = 0.508$ +- $P(Y>0) = 0.500$ +- $P(X>0, Y>0) = 0.362$ +- $P(Y>0 \mid X>0) = 0.7125984$ + +--- + +## 1. Degree-0 Bayes + +At degree 0, the NNS operators coincide with quadrant probabilities: + +$$ \mathrm{Co.UPM}(0,x,y;t_x,t_y)=P(X>t_x,\;Y>t_y), \qquad \mathrm{UPM}(0,t_x,x)=P(X>t_x). $$ + +So the conditional event probability is obtained exactly as + +$$ P(Y>t_y \mid X>t_x) = \frac{\mathrm{Co.UPM}(0,x,y;t_x,t_y)}{\mathrm{UPM}(0,t_x,x)}. 
$$ + +Using $t_x=t_y=0$: + +```r +joint_prob_deg0 <- Co.UPM(0, x, y, target_x = t_x, target_y = t_y) +marg_prob_x_deg0 <- UPM(0, t_x, x) + +cond_eventprob_deg0 <- joint_prob_deg0 / marg_prob_x_deg0 +cond_eventprob_deg0 +``` + +Output: + +```r +[1] 0.7125984 +``` + +This matches the empirical conditional event probability exactly in-sample. + +--- + +## 2. Degree-1 hinge-surface recovery + +Define the raw lower hinge surface + +$$ +H(t_x,t_y)=E[(t_x-X)_+(t_y-Y)_+]. +$$ + +The degree-1 recovery theorem states that + +$$ +\frac{\partial^2 H}{\partial t_x \partial t_y}(t_x,t_y)=F_{X,Y}(t_x,t_y). +$$ + +So we can recover the joint CDF at $(0,0)$ numerically by mixed finite differences. + +```r +targets <- seq(-3, 3, length.out = 61) +h <- diff(targets)[1] + +hinge_surface <- outer( + targets, + targets, + Vectorize(function(tx, ty) { + Co.LPM_nD( + data = cbind(x, y), + target = c(tx, ty), + degree = 1, + norm = FALSE + ) + }) +) + +i0 <- which.min(abs(targets - 0)) + +joint_cdf_from_hinge <- ( + hinge_surface[i0 + 1, i0 + 1] - + hinge_surface[i0 + 1, i0] - + hinge_surface[i0, i0 + 1] + + hinge_surface[i0, i0] +) / h^2 +``` + +Recovered and empirical lower-left quadrant probabilities: + +- Recovered $F(0,0)$ from hinge surface: `0.372003` +- Empirical $P(X \le 0, Y \le 0)$: `0.354000` + +--- + +## 3. Reconstructing the same upper-right event probability + +From the recovered joint CDF, + +$$ +P(X>0,Y>0)=1-F_X(0)-F_Y(0)+F_{X,Y}(0,0). 
+$$ + +Using degree-0 marginal CDF values at zero: + +```r +FX0 <- LPM.ratio(0, t_x, x) +FY0 <- LPM.ratio(0, t_y, y) + +joint_eventprob_from_hinge <- 1 - FX0 - FY0 + joint_cdf_from_hinge +marg_eventprob_x <- 1 - FX0 + +cond_eventprob_from_hinge <- joint_eventprob_from_hinge / marg_eventprob_x +cond_eventprob_from_hinge +``` + +Output: + +```r +[1] 0.7480377 +``` + +Comparison: + +- Recovered $P(X>0, Y>0)$: `0.380003` +- Empirical $P(X>0, Y>0)$: `0.362000` +- Recovered $P(Y>0 \mid X>0)$: `0.7480377` +- Empirical $P(Y>0 \mid X>0)$: `0.7125984` + +--- + +## 4. Interpretation + +This example shows two different routes to the same event-level quantity $P(Y>0 \mid X>0)$: + +- **Degree 0** gives it directly and exactly through quadrant probabilities. +- **Degree 1** gives it indirectly by recovering the joint CDF from the raw hinge surface and then applying inclusion-exclusion. + +The degree-1 reconstruction is close, but not exact, because the mixed derivative is approximated numerically on a finite grid. The discrepancy is therefore numerical, not conceptual. + +More importantly, degree 1 should be understood structurally: + +- mixed second derivatives recover the joint CDF, +- further differentiation recovers the joint density when it exists, +- and only then does the full continuous analogue of Bayes arise through density ratios. + +Thus: + +- **degree 0** is the exact event-probability layer, +- **degree 1** is the law-recovery layer. + +--- + +## 5. Summary table + +| Quantity | Value | +|---|---:| +| Exact degree-0 conditional event probability | 0.7125984 | +| Degree-1 hinge-surface reconstructed event probability | 0.7480377 | +| Empirical event probability truth | 0.7125984 | + +These results confirm that degree 0 and degree 1 do not play the same role. Degree 0 yields Bayes directly. Degree 1 recovers the joint law from which event probabilities, and ultimately continuous conditional densities, can be constructed. 
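As a hedged aside on the finite-difference step in Section 2, the sketch below uses base R only. It assumes the raw hinge surface is the sample mean of the product of the two hinge terms, which is how $H$ is defined above; `hinge`, `fwd`, `ctr`, `emp`, and `emp_hh` are illustrative names, not NNS functions. It compares the forward four-point difference on the cell $[0,h]^2$ with a central difference on $[-h/2,h/2]^2$.

```r
set.seed(123)
n <- 2000
x <- rnorm(n)
y <- rnorm(n) + 0.8 * x

# Raw lower hinge surface H(tx, ty) = E[(tx - X)_+ (ty - Y)_+], base R only.
hinge <- function(tx, ty) mean(pmax(tx - x, 0) * pmax(ty - y, 0))

h <- 0.1  # grid spacing of seq(-3, 3, length.out = 61)

# Forward mixed difference over the cell [0, h] x [0, h].
fwd <- (hinge(h, h) - hinge(h, 0) - hinge(0, h) + hinge(0, 0)) / h^2

# Central mixed difference over the cell [-h/2, h/2] x [-h/2, h/2].
ctr <- (hinge(h / 2, h / 2) - hinge(h / 2, -h / 2) -
        hinge(-h / 2, h / 2) + hinge(-h / 2, -h / 2)) / h^2

emp    <- mean(x <= 0 & y <= 0)  # empirical joint CDF at (0, 0)
emp_hh <- mean(x <= h & y <= h)  # empirical joint CDF at (h, h)

print(c(empirical = emp, forward = fwd, central = ctr, empirical_hh = emp_hh))
```

The forward difference averages a weight that equals 1 when both coordinates are at or below 0 and equals 0 when either is at or above $h$, so it always lies between the empirical CDF at $(0,0)$ and at $(h,h)$. This is one plausible source of the upward gap between the recovered and empirical values reported above. A central stencil or a finer grid would be expected to narrow it.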
diff --git a/tools/NNS/examples/index.md b/tools/NNS/examples/index.md new file mode 100644 index 0000000..6fda6fc --- /dev/null +++ b/tools/NNS/examples/index.md @@ -0,0 +1,115 @@ + + + +# NNS +NNS (Nonlinear Nonparametric Statistics) leverages partial moments – the fundamental [elements of variance](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Partial%20Moments%20Equivalences.md) that [asymptotically approximate the area of f(x)](https://ovvo-financial.github.io/NNS/book/numerical-integration-via-partial-moments.html) – to provide a robust foundation for nonlinear analysis while maintaining linear equivalences. Designed for real-world data that violates symmetry, linearity, or distributional assumptions. + +NNS delivers a comprehensive suite of advanced statistical techniques, including: + - Numerical Integration & Numerical Differentiation + - Partitional & Hierarchical Clustering + - Nonlinear Correlation & Dependence + - Causal Analysis + - Nonlinear Regression & Classification + - ANOVA + - Seasonality & Autoregressive Modeling + - Normalization + - Stochastic Superiority / Dominance + +See the following for NNS detailed examples and specific applications: + +# 1. 
Basic Statistics + + 1.1 [Partial Moment Equivalences](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Partial%20Moments%20Equivalences.md) + + 1.2 [Bayes' Theorem](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Bayes'%20Theorem%20From%20Partial%20Moments.pdf) + + 1.3 [CDFs and ANOVA](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Continuous_CDFs_and_ANOVA_with_NNS.pdf) + + 1.4 [Bias and Confidence Intervals](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Bias_and_CI.html) + + 1.5 [Correlation and Dependence](https://cran.r-project.org/package=NNS/vignettes/NNSvignette_Correlation_and_Dependence.html) + + 1.6 [Normalization](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Normalization.pdf) + + 1.7 [Partial Moments Estimation Error](https://github.com/OVVO-Financial/Finance/blob/main/Data/Estimation_Error_Replication.md) + + +# 2. Regression + + 2.1 [Overview](https://ssrn.com/abstract=3389938) + + 2.2 [Curve Fitting](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Curve_Fitting.html) + + 2.3 [Nonparametric Regression Using Clusters](http://rdcu.be/tz0J) + + 2.4 [Clustering and Curve Fitting By Line Segments](https://ssrn.com/abstract=2861339) + + 2.5 [Regression Residuals](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Regression_Residuals.html) + + 2.6 [Multiple Imputation](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS_MI_vs_MICE.md) + + 2.7 [Logistic Regression Binary Classification](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Logistic_Comparison.html) + + 2.8 [Boston Housing](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Boston_Housing.html) + + + +# 3. 
Machine Learning + 3.1 [Partitional Estimation Using Partial Moments](https://ssrn.com/abstract=3592491) + + 3.2 [NNS Regression in Machine Learning](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Machine_Learning.pdf) + + 3.3 [Classification Using NNS Clustering Analysis](https://ssrn.com/abstract=2864711) + + 3.4 [NNS vs. xgboost](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/xgboost_example.html) + + 3.5 [Time-Series Classification](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Time_Series_Classification.html) + + 3.6 [Time-Series Classification II](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Time_Series_Classification_Expanded.html) + + 3.7 [Spiral Matching Example](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Spiral%20Matching%20Example.pdf) + + 3.8 [MNIST](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS%20vs%20KNN%20MNIST%20dataset.pdf) + + +# 4. Time-Series Forecasting + + 4.1 [Overview](https://ssrn.com/abstract=3382300) + + 4.2 [NNS vs. KERAS](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Sunspots_example.html) + + 4.3 [NNS vs. prophet](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/prophet_NNS_comparison.html) + + 4.4 [Tides](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/tides.html) + + 4.5 [NNS vs. N-HiTS](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS.ARMA%20vs%20N-Hits.md) + + 4.6 [NNS Time-Series Prediction Interval Benchmark](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns_arma_conformal_benchmark_report.md) + + +# 5. 
Econometrics + + 5.1 [Econometrics Critiques and Solutions](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/7_Econometric_Reasons.html) + + 5.2 [VAR Alternative](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/VAR_example.html) + + 5.3 [NOWCASTING](https://ssrn.com/abstract=3589816) + + 5.4 [Causal Analysis](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/PWT.html) + + 5.5 [Federal Reserve Causal Analysis](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html) + + 5.6 [Causal Inference](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_with_NNS_stack.pdf) + + + +# References + +The previous examples are just that: examples. They are not meant to serve as proofs or as exhaustive demonstrations, but as hands-on applications of robust nonparametric regression to many common types of machine learning problems. + +See the [book](https://ovvo-financial.github.io/NNS/book/) or the [papers available on SSRN](https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=1421356) if you'd like to learn *why* and *how* NNS does what it does. + + +# Thank you for your interest in NNS! 
+ + diff --git a/tools/NNS/examples/nhits_2.png b/tools/NNS/examples/nhits_2.png new file mode 100644 index 0000000..a1214bf Binary files /dev/null and b/tools/NNS/examples/nhits_2.png differ diff --git a/tools/NNS/examples/nns-directional-spectral-decomposition.md b/tools/NNS/examples/nns-directional-spectral-decomposition.md new file mode 100644 index 0000000..016ab46 --- /dev/null +++ b/tools/NNS/examples/nns-directional-spectral-decomposition.md @@ -0,0 +1,1937 @@ +# NNS Directional Spectral Decomposition: Full Orthants, Pairwise Matrices, and Scalable PCA Attribution + +This note extends the [Chapter 11 directional spectral decomposition](https://ovvo-financial.github.io/NNS/book/directional-spectral-decomposition.html) workflow in three layers: + +1. **Full orthant decomposition** + The complete mean-split partition gives an exact spectral genealogy of covariance and PCA, but it scales as $2^d$. + +2. **Pairwise partial-moment matrices** + Pairwise directional matrices recover covariance and PCA exactly while scaling as $O(d^2)$. This is the practical high-dimensional PCA attribution layer. + +3. **`DPM_nD` aggregation** + The n-dimensional NNS functions collapse the full orthant partition into three global states: all-lower, all-upper, and mixed-sign. This is scalable and useful, but it is a coarse directional summary, not a full spectral genealogy. + +The main conclusion is: + +> Full orthants explain PCA completely. +> Pairwise partial-moment matrices scale PCA recovery and directional attribution to high dimensions. +> `DPM_nD` gives a compact global diagnostic, not a replacement for spectral decomposition. + +--- + +## Executive Summary + +Classical PCA starts with a covariance matrix: + +```math +\Sigma, +``` + +and diagonalizes it: + +```math +\Sigma v_i = \lambda_i v_i. +``` + +PCA reports eigenvalues and eigenvectors, but it does not explain which directional regions of the data generated them. 
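To make this limitation concrete, here is a minimal base-R sketch. It is not taken from the NNS package, and the names `normal_pair`, `flip_pair`, and `quad_prob` are illustrative assumptions. It builds two bivariate samples that share the same population covariance matrix, and therefore the same population PCA, yet place different probability in the upper-right quadrant.

```r
set.seed(1)
n <- 100000
rho <- 0.5

# Sample A: bivariate normal with unit variances and correlation rho.
x_a <- rnorm(n)
y_a <- rho * x_a + sqrt(1 - rho^2) * rnorm(n)
normal_pair <- cbind(x_a, y_a)

# Sample B: y equals x with a random sign flip; P(sign = +1) = (1 + rho) / 2,
# so Cov(x, y) = E[sign] * Var(x) = rho and both variances equal 1.
x_b <- rnorm(n)
sgn <- sample(c(1, -1), n, replace = TRUE,
              prob = c((1 + rho) / 2, (1 - rho) / 2))
y_b <- sgn * x_b
flip_pair <- cbind(x_b, y_b)

# Share of observations with both coordinates above zero (the population mean).
quad_prob <- function(m) mean(m[, 1] > 0 & m[, 2] > 0)

print(round(cov(normal_pair), 2))
print(round(cov(flip_pair), 2))
print(c(normal = quad_prob(normal_pair), flip = quad_prob(flip_pair)))
```

In population terms the upper-right quadrant probability is $1/3$ for the normal pair and $3/8$ for the sign-flip pair, even though both share the covariance matrix with unit variances and covariance $0.5$. The eigensystem alone cannot distinguish the two.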
+ +The NNS directional framework begins with observable directional structure. In the full orthant version, every observation is assigned to a mean-split orthant. For $d$ variables, this gives: + +```math +2^d +``` + +possible states. + +Each occupied orthant $r$ has: + +- empirical probability $p_r$; +- conditional mean $m_r$; +- centered displacement $u_r = m_r-\mu$; +- within-orthant covariance $\Sigma_r$. + +The covariance matrix decomposes exactly as: + +```math +\Sigma += +\Sigma_Q+\Sigma_W, +``` + +where: + +```math +\Sigma_Q += +\sum_r p_r u_r u_r^\top, +``` + +and: + +```math +\Sigma_W += +\sum_r p_r\Sigma_r. +``` + +Thus: + +```math +\Sigma += +\sum_r p_r u_r u_r^\top ++ +\sum_r p_r\Sigma_r. +``` + +Diagonalizing the recovered matrix gives the same classical PCA eigensystem. + +The full orthant decomposition is exact, but it becomes infeasible in high dimensions. For $d=50$, + +```math +2^{50} +\approx +1.1259\times 10^{15}. +``` + +The practical high-dimensional alternative is to use pairwise partial-moment matrices. These recover covariance through: + +```math +\Sigma += +\mathrm{CLPM} ++ +\mathrm{CUPM} +- +\mathrm{DLPM} +- +\mathrm{DUPM}. +``` + +Then PCA is recovered from: + +```math +(\mathrm{CLPM},\mathrm{CUPM},\mathrm{DLPM},\mathrm{DUPM}) +\longrightarrow +\Sigma +\longrightarrow +(\lambda_i,v_i). +``` + +This preserves exact covariance and PCA recovery while scaling with four $d\times d$ matrices rather than $2^d$ orthant states. + +--- + +# Part I — Full Orthant Decomposition + +## 1. Setup + +Generate $n=10{,}000$ observations from a five-dimensional distribution with approximately uniform pairwise correlation $0.5$. + +```r +set.seed(2024) + +n <- 10000 +d <- 5 + +R <- matrix(0.5, d, d) + diag(0.5, d) +L <- chol(R) + +Z <- matrix(rnorm(n * d), n, d) %*% L +colnames(Z) <- paste0("X", seq_len(d)) + +target <- colMeans(Z) +``` + +The component mean vector is used as the target. 
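As a hedged sanity check on this construction, the base-R sketch below verifies the Cholesky step and computes the population eigenvalues implied by the equicorrelation matrix. `pop_eigen` is an illustrative name that appears nowhere else in this note.

```r
# Equicorrelation target used in the setup: ones on the diagonal, 0.5 elsewhere.
d <- 5
R <- matrix(0.5, d, d) + diag(0.5, d)

# chol() returns the upper-triangular factor L with t(L) %*% L equal to R,
# so rows of (standard normal matrix) %*% L have population covariance R.
L <- chol(R)
stopifnot(isTRUE(all.equal(crossprod(L), R)))

# Population eigenvalues of an equicorrelation matrix with rho = 0.5:
# one value 1 + (d - 1) * rho = 3 and (d - 1) values 1 - rho = 0.5.
pop_eigen <- eigen(R, symmetric = TRUE)$values
print(round(pop_eigen, 6))
```

Sample eigenvalues from data generated this way should therefore sit near 3 and 0.5, up to sampling error.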
+ +In the reported run, the target vector was: + +| Variable | Target | +|---|---:| +| X1 | -0.006349 | +| X2 | -0.004824 | +| X3 | -0.002800 | +| X4 | 0.004986 | +| X5 | 0.001181 | + +--- + +## 2. Classical PCA + +Compute the population-denominator covariance matrix: + +```r +mu <- colMeans(Z) +Zc <- sweep(Z, 2, mu) +Sigma <- crossprod(Zc) / n +pca <- eigen(Sigma, symmetric = TRUE) +``` + +The classical eigenvalues were: + +| Component | Eigenvalue | +|---:|---:| +| PC1 | 2.965193 | +| PC2 | 0.510783 | +| PC3 | 0.499329 | +| PC4 | 0.493351 | +| PC5 | 0.488601 | + +The first eigenvalue is much larger than the remaining four, consistent with a strong common direction induced by the uniform correlation structure. + +--- + +## 3. Mean-Split Orthant Partition + +Partition each variable at its component mean. For five variables, the full mean-split partition contains: + +```math +2^5 = 32 +``` + +possible orthants. + +Each observation is encoded according to whether each variable is above or below its component mean: + +```r +above_mean <- sweep(Z, 2, mu, ">") + +orthant_label <- apply(above_mean, 1, function(row) { + sum(row * 2^(0:(d - 1))) +}) +``` + +The run occupied all orthants: + +```text +32 out of 32 +``` + +This is the five-dimensional analogue of the bivariate quadrant partition. + +--- + +## 4. Between-Within Orthant Decomposition + +For each orthant $r$, compute: + +- empirical probability $p_r$; +- conditional mean $m_r$; +- centered displacement $u_r = m_r-\mu$; +- within-orthant covariance $\Sigma_r$. + +Then accumulate: + +```math +\Sigma_Q += +\sum_{r=1}^{2^d}p_r u_r u_r^\top, +``` + +and: + +```math +\Sigma_W += +\sum_{r=1}^{2^d}p_r\Sigma_r. +``` + +The full covariance identity is: + +```math +\Sigma += +\Sigma_Q+\Sigma_W. 
+``` + +In code: + +```r +Sigma_Q <- matrix(0, d, d) +Sigma_W <- matrix(0, d, d) + +for (lab in unique(orthant_label)) { + mask <- orthant_label == lab + n_r <- sum(mask) + p_r <- n_r / n + + Zr <- Z[mask, , drop = FALSE] + m_r <- colMeans(Zr) + u_r <- m_r - mu + Sigma_r <- crossprod(sweep(Zr, 2, m_r)) / n_r + + Sigma_Q <- Sigma_Q + p_r * tcrossprod(u_r) + Sigma_W <- Sigma_W + p_r * Sigma_r +} + +Sigma_rec <- Sigma_Q + Sigma_W +``` + +Each orthant contributes a rank-one between-orthant primitive: + +```math +B_r += +p_r u_r u_r^\top. +``` + +If $u_r\neq 0$, then: + +```math +B_r u_r += +p_r\|u_r\|^2u_r. +``` + +So each centered orthant conditional mean is the nonzero eigenvector of its own rank-one between-orthant contribution. + +--- + +## 5. Covariance and PCA Recovery + +The full orthant decomposition recovered the covariance matrix to floating-point precision: + +```text +Max absolute covariance recovery error: +3.330669e-15 +``` + +The recovered eigensystem also matched the classical eigensystem: + +```text +Full orthant eigenvalue recovery error: +3.663736e-15 + +Full orthant eigenvector alignments: +1 1 1 1 1 +``` + +Thus: + +```math +\{p_r,m_r,\Sigma_r\}_{r=1}^{2^d} +\longrightarrow +\Sigma_Q+\Sigma_W +\longrightarrow +\Sigma +\longrightarrow +(\lambda_i,v_i). +``` + +The full orthant decomposition recovers the covariance matrix and therefore recovers the classical PCA eigensystem. + +--- + +## 6. Eigenvalue Attribution from Full Orthants + +Each classical eigenvalue decomposes into a between-orthant contribution and a within-orthant contribution. + +For unit eigenvector $v_i$: + +```math +\lambda_i += +v_i^\top\Sigma_Qv_i ++ +v_i^\top\Sigma_Wv_i. +``` + +Using the orthant decomposition: + +```math +\lambda_i += +\sum_{r=1}^{2^d}p_r(v_i^\top u_r)^2 ++ +\sum_{r=1}^{2^d}p_rv_i^\top\Sigma_rv_i. 
+``` + +The attribution table was: + +| Component | Eigenvalue | Between | Within | Total | Between Percent | +|---:|---:|---:|---:|---:|---:| +| PC1 | 2.965193 | 2.439089 | 0.526104 | 2.965193 | 82.25735 | +| PC2 | 0.510783 | 0.243905 | 0.266878 | 0.510783 | 47.75114 | +| PC3 | 0.499329 | 0.241130 | 0.258199 | 0.499329 | 48.29081 | +| PC4 | 0.493351 | 0.237770 | 0.255581 | 0.493351 | 48.19483 | +| PC5 | 0.488601 | 0.235968 | 0.252633 | 0.488601 | 48.29467 | + +For PC1: + +```math +\frac{2.439089}{2.965193} += +0.8225735. +``` + +So about 82.3 percent of PC1 came from between-orthant conditional mean separation. + +A useful global summary is: + +```math +D_{\mathrm{spectral}} += +\frac{\mathrm{tr}(\Sigma_Q)}{\mathrm{tr}(\Sigma)}. +``` + +In this run: + +```text +D_spectral = 0.685432 +``` + +So about 68.5 percent of total variance came from between-orthant conditional mean displacement. + +--- + +## 7. Orthant-Level Attribution of PC1 + +The between-orthant part of PC1 decomposes further into individual orthant contributions: + +```math +\lambda_{1,Q} += +\sum_{r=1}^{2^d}p_r(v_1^\top u_r)^2. 
+``` + +The top ten orthant contributions to PC1 were: + +| Rank | Orthant | Pattern | Probability | Contribution | Percent of PC1 | +|---:|---:|---|---:|---:|---:| +| 1 | 31 | X1+ X2+ X3+ X4+ X5+ | 0.1663 | 0.9676085 | 32.63223 | +| 2 | 0 | X1- X2- X3- X4- X5- | 0.1646 | 0.9344073 | 31.51253 | +| 3 | 16 | X1- X2- X3- X4- X5+ | 0.0328 | 0.0539840 | 1.82059 | +| 4 | 4 | X1- X2- X3+ X4- X5- | 0.0342 | 0.0517459 | 1.74511 | +| 5 | 8 | X1- X2- X3- X4+ X5- | 0.0334 | 0.0513473 | 1.73167 | +| 6 | 2 | X1- X2+ X3- X4- X5- | 0.0321 | 0.0504620 | 1.70181 | +| 7 | 23 | X1+ X2+ X3+ X4- X5+ | 0.0332 | 0.0498662 | 1.68172 | +| 8 | 30 | X1- X2+ X3+ X4+ X5+ | 0.0340 | 0.0480060 | 1.61899 | +| 9 | 27 | X1+ X2+ X3- X4+ X5+ | 0.0330 | 0.0475473 | 1.60351 | +| 10 | 29 | X1+ X2- X3+ X4+ X5+ | 0.0311 | 0.0475126 | 1.60234 | + +The top two orthants were the all-upper and all-lower states: + +```text +X1+ X2+ X3+ X4+ X5+ +X1- X2- X3- X4- X5- +``` + +Together they contributed: + +```math +32.63223\% + 31.51253\% += +64.14476\% +``` + +of PC1. + +The orthant-level PC1 contributions summed exactly to the direct Rayleigh quotient: + +```text +Sum of orthant-level between contributions for PC1: +2.439089 + +Direct Sigma_Q between contribution for PC1: +2.439089 +``` + +This is the full orthant-level spectral genealogy. + +--- + +## 8. Converse Failure + +The decomposition runs in one direction. + +From the full orthant decomposition, one recovers the covariance matrix: + +```math +\{p_r,m_r,\Sigma_r\}_{r=1}^{2^d} +\longrightarrow +\Sigma. +``` + +From the covariance matrix, one recovers the PCA eigensystem: + +```math +\Sigma +\longrightarrow +(\lambda_i,v_i). +``` + +But PCA output alone does not recover the orthant probabilities, orthant assignments, or orthant conditional means. + +Therefore: + +```math +\{p_r,m_r,\Sigma_r\}_{r=1}^{2^d} +\Rightarrow +\Sigma +\Rightarrow +(\lambda_i,v_i), +``` + +but generally: + +```math +(\lambda_i,v_i) +\not\Rightarrow +\{p_r,m_r,\Sigma_r\}_{r=1}^{2^d}. 
+``` + +PCA is a downstream summary. The orthant decomposition contains strictly more directional information. + +--- + +# Part II — `DPM_nD` as a Three-State Global Diagnostic + +## 9. Why `DPM_nD` Is Useful but Coarse + +The full orthant decomposition is exact, but it scales as: + +```math +2^d. +``` + +For five variables: + +```math +2^5 = 32. +``` + +For 50 variables: + +```math +2^{50} +\approx +1.1259\times10^{15}. +``` + +Most high-dimensional orthants would be empty or too sparsely populated to support stable conditional mean and covariance estimates. + +The NNS n-dimensional partial moment functions collapse the full orthant partition into three observable aggregate states: + +```math +\mathrm{CLPM}_{nD} += +\mathrm{all\ variables\ below\ target}, +``` + +```math +\mathrm{CUPM}_{nD} += +\mathrm{all\ variables\ above\ target}, +``` + +```math +\mathrm{DPM}_{nD} += +\mathrm{all\ mixed\ sign\ configurations}. +``` + +So the state count is reduced from: + +```math +2^d +``` + +to: + +```math +3. +``` + +For $d=5$, the full partition has 32 orthants. The three-state aggregation is: + +```math +G_L=\{0\}, +``` + +```math +G_U=\{31\}, +``` + +```math +G_D=\{1,2,\ldots,30\}. +``` + +Thus: + +```math +\mathrm{CLPM}_{nD} += +\sum_{r\in G_L}p_r, +``` + +```math +\mathrm{CUPM}_{nD} += +\sum_{r\in G_U}p_r, +``` + +```math +\mathrm{DPM}_{nD} += +\sum_{r\in G_D}p_r. +``` + +At degree zero, `DPM_nD` is not separate from the full orthant decomposition. It is the coarse three-state projection of it. + +--- + +## 10. 
`DPM_nD` R Example Using the NNS Package + +The exported NNS package functions are: + +```r +NNS::Co.LPM_nD(data, target, degree = 0, norm = TRUE) +NNS::Co.UPM_nD(data, target, degree = 0, norm = TRUE) +NNS::DPM_nD(data, target, degree = 0, norm = TRUE) +``` + +The direct full-orthant aggregation gave: + +| State | Full Orthant Aggregation | +|---|---:| +| `CLPM_nD` | 0.1646 | +| `CUPM_nD` | 0.1663 | +| `DPM_nD` | 0.6691 | + +The NNS degree-zero values were identical: + +| State | NNS Degree-Zero Value | +|---|---:| +| `CLPM_nD` | 0.1646 | +| `CUPM_nD` | 0.1663 | +| `DPM_nD` | 0.6691 | + +The difference between the NNS output and the full orthant aggregation was: + +```text +0 0 0 +``` + +This confirms numerically that, at degree zero and for this sample, the NNS nD functions reproduce the three-state aggregation of the full orthant partition exactly. + +--- + +## 11. Degree-One `DPM_nD` Directional Mass + +The raw degree-one directional masses were: + +| State | Raw Degree-One Mass | +|---|---:| +| `CLPM_nD` | 0.605363 | +| `CUPM_nD` | 0.575106 | +| `DPM_nD` | 0.086607 | + +The normalized degree-one directional mass shares were: + +| State | Normalized Share | +|---|---:| +| `CLPM_nD` | 0.477763 | +| `CUPM_nD` | 0.453885 | +| `DPM_nD` | 0.068352 | + +Using `norm = TRUE` returned the same normalized shares: + +| State | `norm = TRUE` | +|---|---:| +| `CLPM_nD` | 0.47776331 | +| `CUPM_nD` | 0.45388455 | +| `DPM_nD` | 0.06835214 | + +The interpretation is: + +> Mixed-sign observations are frequent, but they contribute relatively little first-degree directional mass in this correlated five-variable example. + +This is consistent with the PCA attribution result: the leading covariance direction is dominated by concordant all-lower and all-upper conditional mean separation. + +--- + +## 12. Limitation of `DPM_nD` for Spectral Genealogy + +The full spectral decomposition requires more than probabilities. 
It requires each orthant displacement vector and each within-orthant covariance: + +```math +\{p_r,u_r,\Sigma_r\}_{r=1}^{2^d}. +``` + +The full between-orthant covariance is: + +```math +\Sigma_Q += +\sum_{r=1}^{2^d}p_r u_r u_r^\top. +``` + +When the mixed orthants are collapsed into a single `DPM_nD` value, their individual displacement vectors are no longer separately retained. + +Therefore: + +```math +\mathrm{full\ orthant\ decomposition} +\Rightarrow +(\mathrm{CLPM}_{nD},\mathrm{CUPM}_{nD},\mathrm{DPM}_{nD}), +``` + +but: + +```math +(\mathrm{CLPM}_{nD},\mathrm{CUPM}_{nD},\mathrm{DPM}_{nD}) +\not\Rightarrow +\mathrm{full\ orthant\ decomposition}. +``` + +So the correct role of `DPM_nD` is: + +```math +\mathrm{DPM}_{nD} += +\mathrm{scalable\ three\ state\ global\ directional\ diagnostic}. +``` + +It is not the main high-dimensional PCA recovery method. + +--- + +# Part III — Pairwise Partial-Moment Matrices as the Scalable PCA Method + +## 13. Pairwise Directional Matrix Identity + +The scalable alternative is to use pairwise directional partial-moment matrices. + +For a $d$-variable matrix $Z$, compute the four $d\times d$ matrices: + +```math +\mathrm{CLPM}, +\qquad +\mathrm{CUPM}, +\qquad +\mathrm{DLPM}, +\qquad +\mathrm{DUPM}. +``` + +These reassemble covariance as: + +```math +\Sigma += +\mathrm{CLPM} ++ +\mathrm{CUPM} +- +\mathrm{DLPM} +- +\mathrm{DUPM}. +``` + +In R: + +```r +pm <- NNS::PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = Z, + pop_adj = TRUE, + norm = FALSE +) + +CLPM <- pm$clpm +CUPM <- pm$cupm +DLPM <- pm$dlpm +DUPM <- pm$dupm + +Sigma_nns <- CLPM + CUPM - DLPM - DUPM +``` + +Then PCA is recovered by: + +```r +pca_nns <- eigen(Sigma_nns, symmetric = TRUE) +``` + +The recovery chain is: + +```math +(\mathrm{CLPM},\mathrm{CUPM},\mathrm{DLPM},\mathrm{DUPM}) +\longrightarrow +\Sigma +\longrightarrow +(\lambda_i,v_i). +``` + +This uses four $d\times d$ matrices rather than $2^d$ orthants. + +--- + +## 14. 
Pairwise Eigenvalue Attribution + +For any classical PCA direction $v_i$, + +```math +\lambda_i += +v_i^\top \Sigma v_i. +``` + +Substituting the pairwise directional decomposition gives: + +```math +\lambda_i += +v_i^\top\mathrm{CLPM}v_i ++ +v_i^\top\mathrm{CUPM}v_i +- +v_i^\top\mathrm{DLPM}v_i +- +v_i^\top\mathrm{DUPM}v_i. +``` + +This identifies whether a principal component is generated by: + +- lower concordance; +- upper concordance; +- lower-divergent movement; +- upper-divergent movement. + +This is not a full orthant genealogy. It does not recover complete simultaneous states such as: + +```text +X1+ X2- X3+ ... X50- +``` + +But it does recover exact covariance and PCA while preserving interpretable pairwise directional sources. + +--- + +## 15. Five-Variable Pairwise Recovery + +Using the same five-variable dataset, pairwise `PM.matrix` recovered covariance and PCA to numerical precision. + +```text +Pairwise PM.matrix covariance recovery error: +1.776357e-15 + +PM.matrix $cov.matrix vs manual reassembly error: +1.110223e-16 + +Directional aggregate asymmetry before symmetrization: +1.110223e-16 +``` + +The classical sample PCA eigenvalues were: + +| Component | Eigenvalue | +|---:|---:| +| PC1 | 2.965489 | +| PC2 | 0.510834 | +| PC3 | 0.499379 | +| PC4 | 0.493400 | +| PC5 | 0.488650 | + +The pairwise NNS recovered eigenvalues were identical at displayed precision: + +| Component | Recovered Eigenvalue | +|---:|---:| +| PC1 | 2.965489 | +| PC2 | 0.510834 | +| PC3 | 0.499379 | +| PC4 | 0.493400 | +| PC5 | 0.488650 | + +The eigenvalue recovery error was: + +```text +2.720046e-15 +``` + +The eigenvector alignments were: + +```text +1 1 1 1 1 +``` + +--- + +## 16. 
Five-Variable Pairwise Directional Attribution + +The pairwise attribution table was: + +| PC | Eigenvalue | CLPM | CUPM | DLPM | DUPM | Recovered | +|---:|---:|---:|---:|---:|---:|---:| +| 1 | 2.965489 | 1.693504 | 1.708038 | 0.218026 | 0.218026 | 2.965489 | +| 2 | 0.510834 | 0.200757 | 0.199162 | -0.055458 | -0.055458 | 0.510834 | +| 3 | 0.499379 | 0.198809 | 0.190119 | -0.055225 | -0.055225 | 0.499379 | +| 4 | 0.493400 | 0.196324 | 0.188517 | -0.054280 | -0.054280 | 0.493400 | +| 5 | 0.488650 | 0.190190 | 0.192332 | -0.053064 | -0.053064 | 0.488650 | + +The signed percentage attribution was: + +| PC | CLPM % | CUPM % | DLPM signed % | DUPM signed % | +|---:|---:|---:|---:|---:| +| 1 | 57.11 | 57.60 | -7.35 | -7.35 | +| 2 | 39.30 | 38.99 | 10.86 | 10.86 | +| 3 | 39.81 | 38.07 | 11.06 | 11.06 | +| 4 | 39.79 | 38.21 | 11.00 | 11.00 | +| 5 | 38.92 | 39.36 | 10.86 | 10.86 | + +For PC1: + +```math +2.965489 += +1.693504 ++ +1.708038 +- +0.218026 +- +0.218026. +``` + +So PC1 is dominated by pairwise lower and upper concordance, with divergent pairwise movement subtracting from the common direction. + +At the trace level: + +| Component | Trace Contribution | Trace Percent | +|---|---:|---:| +| CLPM | 2.479584 | 50.01 | +| CUPM | 2.478168 | 49.99 | +| DLPM signed | 0.000000 | 0.00 | +| DUPM signed | 0.000000 | 0.00 | +| Total | 4.957752 | 100.00 | + +Total variance is split almost evenly between lower and upper concordant diagonal structure in this symmetric correlated example. + +--- + +# Part IV — 50-Variable Pairwise Scalability + +## 17. Why Full Orthants Fail at 50 Variables + +For $d=50$, the full orthant state count is: + +```math +2^{50} +\approx +1.1259\times 10^{15}. +``` + +With $n=5000$ observations, the observation-to-orthant ratio is: + +```math +\frac{5000}{2^{50}} +\approx +4.440892\times 10^{-12}. +``` + +This is not estimable as a full orthant partition. 
Almost every possible orthant is unobserved, and any occupied orthant would be too sparse for stable covariance estimation. + +Pairwise `PM.matrix` avoids this problem by estimating four $50\times 50$ matrices. + +--- + +## 18. 50-Variable Common/Sector Simulation + +The 50-variable simulation used: + +- one common market factor; +- two sector factors; +- idiosyncratic noise. + +Variables 1–25 loaded on sector A. Variables 26–50 loaded on sector B. + +```r +set.seed(2026) + +n <- 5000 +d <- 50 + +market <- rnorm(n) +sector_A <- rnorm(n) +sector_B <- rnorm(n) +E <- matrix(rnorm(n * d), nrow = n, ncol = d) + +Z50 <- matrix(0, nrow = n, ncol = d) + +for (j in seq_len(d)) { + if (j <= 25) { + Z50[, j] <- 0.60 * market + 0.90 * sector_A + 0.40 * E[, j] + } else { + Z50[, j] <- 0.60 * market + 0.90 * sector_B + 0.40 * E[, j] + } +} +``` + +The pairwise PM.matrix experiment ran quickly: + +```text +Runtime: +elapsed = 0.12 seconds +``` + +The first five classical PCA eigenvalues were: + +| Component | Eigenvalue | +|---:|---:| +| PC1 | 38.576663 | +| PC2 | 20.499903 | +| PC3 | 0.194593 | +| PC4 | 0.188941 | +| PC5 | 0.186759 | + +The pairwise NNS recovered eigenvalues were identical at displayed precision. + +Recovery diagnostics: + +```text +Pairwise covariance recovery error: +3.774758e-15 + +Pairwise eigenvalue recovery error: +1.776357e-14 + +Pairwise eigenvector alignments, first 5 PCs: +1 1 1 1 1 +``` + +Thus pairwise partial-moment matrices recover covariance and PCA exactly for this 50-variable example. + +--- + +## 19. 
50-Variable Directional Attribution + +The directional eigenvalue attribution for the first five PCs was: + +| PC | Eigenvalue | CLPM | CUPM | DLPM | DUPM | Recovered | +|---:|---:|---:|---:|---:|---:|---:| +| 1 | 38.576663 | 22.634337 | 22.669687 | 3.363681 | 3.363681 | 38.576663 | +| 2 | 20.499903 | 7.000469 | 7.585803 | -2.956815 | -2.956815 | 20.499903 | +| 3 | 0.194593 | 0.084791 | 0.082119 | -0.013842 | -0.013842 | 0.194593 | +| 4 | 0.188941 | 0.082029 | 0.082308 | -0.012302 | -0.012302 | 0.188941 | +| 5 | 0.186759 | 0.082447 | 0.080548 | -0.011882 | -0.011882 | 0.186759 | + +The signed percentage attribution was: + +| PC | CLPM % | CUPM % | DLPM signed % | DUPM signed % | +|---:|---:|---:|---:|---:| +| 1 | 58.67 | 58.77 | -8.72 | -8.72 | +| 2 | 34.15 | 37.00 | 14.42 | 14.42 | +| 3 | 43.57 | 42.20 | 7.11 | 7.11 | +| 4 | 43.42 | 43.56 | 6.51 | 6.51 | +| 5 | 44.15 | 43.13 | 6.36 | 6.36 | + +Interpretation: + +- PC1 is dominated by lower and upper concordance. +- Divergent pairwise structure subtracts from PC1. +- PC2 has substantial positive signed divergent contribution, consistent with a sector/spread direction separating the two blocks. + +The trace-level decomposition was: + +| Component | Trace Contribution | Trace Percent | +|---|---:|---:| +| CLPM | 33.08720 | 49.57 | +| CUPM | 33.66671 | 50.43 | +| DLPM signed | 0.00000 | 0.00 | +| DUPM signed | 0.00000 | 0.00 | +| Total | 66.75391 | 100.00 | + +The trace result says that total variance is evenly split between lower and upper concordant diagonal structure, while divergence affects off-diagonal covariance and eigenvector directions rather than total variance. + +--- + +# Part V — 50-Variable PC1-Not-Market Stress Test + +## 20. Motivation + +A common PCA interpretation mistake is to assume: + +> PC1 is the market factor. + +But PC1 is only the largest variance direction. It is not necessarily the market direction. 
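A short worked calculation shows why. As an illustrative assumption, suppose the covariance is generated by two orthonormal directions, $m$ (market) and $s$ (spread), with factor scales $\sigma_m$ and $\sigma_s$, plus isotropic noise of scale $\sigma_e$:

```math
\Sigma
=
\sigma_m^2\,mm^\top
+
\sigma_s^2\,ss^\top
+
\sigma_e^2 I,
\qquad
m^\top m=s^\top s=1,
\qquad
m^\top s=0.
```

Multiplying through by each direction gives:

```math
\Sigma m
=
(\sigma_m^2+\sigma_e^2)\,m,
\qquad
\Sigma s
=
(\sigma_s^2+\sigma_e^2)\,s,
```

and every direction orthogonal to both $m$ and $s$ has eigenvalue $\sigma_e^2$. Under this assumed model, PC1 is the market direction only when $\sigma_m>\sigma_s$. Whenever $\sigma_s>\sigma_m$, the leading eigenvector is the spread direction instead.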
+ +To test the pairwise directional framework, construct a 50-variable simulation where the spread factor is deliberately stronger than the market factor. + +The market vector is: + +```math +m += +\frac{1}{\sqrt{50}}(1,1,\ldots,1). +``` + +The spread vector is: + +```math +s += +\frac{1}{\sqrt{50}}(\underbrace{1,\ldots,1}_{25}, +\underbrace{-1,\ldots,-1}_{25}). +``` + +The two vectors are orthogonal: + +```math +m^\top s = 0. +``` + +The data are generated as: + +```math +Z += +\sigma_m M m^\top ++ +\sigma_s S s^\top ++ +\sigma_e E, +``` + +with: + +```math +\sigma_m = 0.70, +\qquad +\sigma_s = 1.80, +\qquad +\sigma_e = 0.35. +``` + +Because the spread factor is stronger than the market factor, PC1 should be the spread factor, not the market factor. + +--- + +## 21. PCA Result: PC1 Is Not Market + +The first five PCA eigenvalues were: + +| Component | Eigenvalue | +|---:|---:| +| PC1 | 3.492035 | +| PC2 | 0.618253 | +| PC3 | 0.144454 | +| PC4 | 0.141775 | +| PC5 | 0.140824 | + +Alignment diagnostics were: + +| Quantity | Value | +|---|---:| +| $\left\lvert\langle PC1,\mathrm{market}\rangle\right\rvert$ | 0.00000742 | +| $\left\lvert\langle PC1,\mathrm{spread}\rangle\right\rvert$ | 0.99982464 | +| $\left\lvert\langle PC2,\mathrm{market}\rangle\right\rvert$ | 0.99827428 | +| $\left\lvert\langle PC2,\mathrm{spread}\rangle\right\rvert$ | 0.00016201 | + +Thus PC1 is essentially the spread factor, while PC2 is essentially the market factor. + +This confirms: + +> PC1 is not always the market factor. + +--- + +## 22. Pairwise Recovery in the Stress Test + +The pairwise PM.matrix decomposition again recovered covariance and PCA to numerical precision: + +```text +Pairwise covariance recovery error: +4.440892e-16 + +Pairwise eigenvalue recovery error: +2.664535e-15 + +Pairwise eigenvector alignments, first 5 PCs: +1 1 1 1 1 +``` + +So even when PC1 is not market, the pairwise directional matrices recover the full PCA eigensystem exactly. + +--- + +## 23. 
Directional Attribution in the Stress Test + +The directional eigenvalue attribution was: + +| PC | Eigenvalue | CLPM | CUPM | DLPM | DUPM | Recovered | +|---:|---:|---:|---:|---:|---:|---:| +| 1 | 3.492035 | 0.924240 | 0.908033 | -0.829881 | -0.829881 | 3.492035 | +| 2 | 0.618253 | 1.851522 | 1.846249 | 1.539759 | 1.539759 | 0.618253 | +| 3 | 0.144454 | 0.052954 | 0.050886 | -0.020307 | -0.020307 | 0.144454 | +| 4 | 0.141775 | 0.050863 | 0.050608 | -0.020152 | -0.020152 | 0.141775 | +| 5 | 0.140824 | 0.051385 | 0.052016 | -0.018711 | -0.018711 | 0.140824 | + +The signed percentage attribution was: + +| PC | CLPM % | CUPM % | DLPM signed % | DUPM signed % | +|---:|---:|---:|---:|---:| +| 1 | 26.47 | 26.00 | 23.76 | 23.76 | +| 2 | 299.48 | 298.62 | -249.05 | -249.05 | +| 3 | 36.66 | 35.23 | 14.06 | 14.06 | +| 4 | 35.88 | 35.70 | 14.21 | 14.21 | +| 5 | 36.49 | 36.94 | 13.29 | 13.29 | + +For PC1: + +```math +3.492035 += +0.924240 ++ +0.908033 +- +(-0.829881) +- +(-0.829881). +``` + +So PC1 is not only concordant. It is substantially generated by divergent pairwise structure. + +Interpretation: + +> When PC1 is a market/common factor, concordant matrices dominate and divergence subtracts. +> When PC1 is a spread factor, divergent pairwise structure becomes part of the positive explanatory mechanism. + +This is the high-dimensional pairwise analogue of the full orthant result: PCA finds the dominant axis, while NNS identifies the directional source of that axis. + +--- + +# Part VI — Runtime and Scaling + +## 24. 
Runtime Scaling Grid + +The final experiment compared pairwise PM.matrix recovery across dimensions: + +| d | Full Orthant Count | Runtime Seconds | Covariance Error | Eigenvalue Error | PC1 Alignment | +|---:|---:|---:|---:|---:|---:| +| 5 | 32 | 0.00 | 8.8817842e-16 | 1.7763568e-15 | 1 | +| 10 | 1,024 | 0.00 | 1.1102230e-15 | 3.5527137e-15 | 1 | +| 25 | 33,554,432 | 0.02 | 1.4432899e-15 | 3.5527137e-15 | 1 | +| 50 | 1.1258999e15 | 0.07 | 1.5543122e-15 | 2.1316282e-14 | 1 | + +This table shows the scaling contrast directly: + +```math +\mathrm{full\ orthants} += +O(2^d), +``` + +while: + +```math +\mathrm{pairwise\ PM\ matrices} += +O(d^2). +``` + +The pairwise method recovered covariance and PC1 exactly at displayed precision across all tested dimensions. + +--- + +# Part VII — Final Hierarchy + +The updated hierarchy is: + +```math +\mathrm{full\ orthant\ decomposition} +\Rightarrow +\mathrm{complete\ spectral\ genealogy,\ exact\ but\ exponential}. +``` + +```math +\mathrm{pairwise\ PM\ matrices} +\Rightarrow +\mathrm{exact\ covariance/PCA\ recovery\ and\ directional\ attribution,\ scalable}. +``` + +```math +\mathrm{DPM}_{nD} +\Rightarrow +\mathrm{three\ state\ global\ directional\ diagnostic,\ scalable\ but\ coarse}. +``` + +The three layers answer different questions. + +| Layer | State count / storage | What it gives | What it cannot give | +|---|---:|---|---| +| Full orthants | $2^d$ states | Complete orthant-level PCA genealogy | Practical high-dimensional estimation | +| Pairwise PM matrices | Four $d\times d$ matrices | Exact covariance/PCA recovery; directional pairwise attribution | Full simultaneous $d$-orthant labels | +| `DPM_nD` | Three values | All-lower / all-upper / mixed global directional summary | PCA recovery or full spectral genealogy | + +The strongest practical conclusion is: + +> Use full orthants when $d$ is small and complete regime genealogy is desired. 
+> Use pairwise `PM.matrix` for scalable high-dimensional PCA recovery and directional attribution. +> Use `DPM_nD` as a compact global diagnostic of all-lower, all-upper, and mixed directional mass. + +--- + +# Appendix: Full R Code + +```r +# ============================================================================= +# Experiments for NNS Directional Spectral Decomposition +# Full Orthants, DPM_nD Aggregation, and Pairwise PM.matrix Scalability +# ============================================================================= + +library(NNS) + +# ----------------------------------------------------------------------------- +# Helper functions +# ----------------------------------------------------------------------------- + +cat_section <- function(title) { + cat("\n\n============================================================\n") + cat(title, "\n") + cat("============================================================\n") +} + +pop_cov <- function(Z) { + Z <- as.matrix(Z) + Zc <- sweep(Z, 2, colMeans(Z)) + crossprod(Zc) / nrow(Z) +} + +align_eigenvectors <- function(V_ref, V_test, k = ncol(V_ref)) { + sapply(seq_len(k), function(j) { + abs(sum(V_ref[, j] * V_test[, j])) + }) +} + +decode_orthant <- function(label, d, names_vec) { + bits <- as.integer(intToBits(as.integer(label)))[1:d] + signs <- ifelse(bits == 1, "+", "-") + paste(paste0(names_vec, signs), collapse = " ") +} + +full_orthant_decomposition <- function(Z) { + Z <- as.matrix(Z) + n <- nrow(Z) + d <- ncol(Z) + mu <- colMeans(Z) + + above_mean <- sweep(Z, 2, mu, ">") + orthant_label <- apply(above_mean, 1, function(row) { + sum(row * 2^(0:(d - 1))) + }) + + Sigma <- pop_cov(Z) + Sigma_Q <- matrix(0, d, d) + Sigma_W <- matrix(0, d, d) + + orthant_info <- data.frame( + orthant = integer(), + pattern = character(), + n = integer(), + p = numeric(), + u_norm = numeric(), + stringsAsFactors = FALSE + ) + + for (lab in sort(unique(orthant_label))) { + mask <- orthant_label == lab + n_r <- sum(mask) + p_r <- 
n_r / n + + Zr <- Z[mask, , drop = FALSE] + m_r <- colMeans(Zr) + u_r <- m_r - mu + + Sigma_r <- pop_cov(Zr) + + Sigma_Q <- Sigma_Q + p_r * tcrossprod(u_r) + Sigma_W <- Sigma_W + p_r * Sigma_r + + orthant_info <- rbind( + orthant_info, + data.frame( + orthant = lab, + pattern = decode_orthant(lab, d, colnames(Z)), + n = n_r, + p = p_r, + u_norm = sqrt(sum(u_r^2)), + stringsAsFactors = FALSE + ) + ) + } + + Sigma_rec <- Sigma_Q + Sigma_W + + pca <- eigen(Sigma, symmetric = TRUE) + pca_rec <- eigen(Sigma_rec, symmetric = TRUE) + + for (j in seq_len(d)) { + if (sum(pca$vectors[, j] * pca_rec$vectors[, j]) < 0) { + pca_rec$vectors[, j] <- -pca_rec$vectors[, j] + } + } + + attrib <- data.frame( + PC = seq_len(d), + eigenvalue = pca$values, + between = NA_real_, + within = NA_real_, + total = NA_real_, + between_pct = NA_real_, + error = NA_real_ + ) + + for (j in seq_len(d)) { + v <- pca$vectors[, j, drop = FALSE] + between_j <- drop(t(v) %*% Sigma_Q %*% v) + within_j <- drop(t(v) %*% Sigma_W %*% v) + + attrib$between[j] <- between_j + attrib$within[j] <- within_j + attrib$total[j] <- between_j + within_j + attrib$between_pct[j] <- 100 * between_j / attrib$total[j] + attrib$error[j] <- abs(attrib$eigenvalue[j] - attrib$total[j]) + } + + v1 <- pca$vectors[, 1] + + orthant_pc1 <- data.frame( + orthant = integer(), + pattern = character(), + p = numeric(), + contribution = numeric(), + pct_lambda1 = numeric(), + stringsAsFactors = FALSE + ) + + for (lab in sort(unique(orthant_label))) { + mask <- orthant_label == lab + p_r <- sum(mask) / n + u_r <- colMeans(Z[mask, , drop = FALSE]) - mu + contrib <- p_r * sum(v1 * u_r)^2 + + orthant_pc1 <- rbind( + orthant_pc1, + data.frame( + orthant = lab, + pattern = decode_orthant(lab, d, colnames(Z)), + p = p_r, + contribution = contrib, + pct_lambda1 = 100 * contrib / pca$values[1], + stringsAsFactors = FALSE + ) + ) + } + + orthant_pc1 <- orthant_pc1[order(-orthant_pc1$contribution), ] + + list( + Sigma = Sigma, + Sigma_Q = Sigma_Q, 
+ Sigma_W = Sigma_W, + Sigma_rec = Sigma_rec, + pca = pca, + pca_rec = pca_rec, + orthant_label = orthant_label, + orthant_info = orthant_info, + attrib = attrib, + orthant_pc1 = orthant_pc1, + recovery_error = max(abs(Sigma - Sigma_rec)), + eigenvalue_error = max(abs(pca$values - pca_rec$values)), + eigenvector_alignment = align_eigenvectors(pca$vectors, pca_rec$vectors, d), + D_spectral = sum(diag(Sigma_Q)) / sum(diag(Sigma)) + ) +} + +pairwise_pm_pca <- function(Z, pcs = 5) { + Z <- as.matrix(Z) + d <- ncol(Z) + + Sigma_classic <- cov(Z) + pca_classic <- eigen(Sigma_classic, symmetric = TRUE) + + pm <- NNS::PM.matrix( + LPM_degree = 1, + UPM_degree = 1, + target = "mean", + variable = Z, + pop_adj = TRUE, + norm = FALSE + ) + + CLPM <- pm$clpm + CUPM <- pm$cupm + DLPM <- pm$dlpm + DUPM <- pm$dupm + + Sigma_nns_raw <- CLPM + CUPM - DLPM - DUPM + Sigma_nns <- (Sigma_nns_raw + t(Sigma_nns_raw)) / 2 + + pca_nns <- eigen(Sigma_nns, symmetric = TRUE) + + for (j in seq_len(d)) { + if (sum(pca_classic$vectors[, j] * pca_nns$vectors[, j]) < 0) { + pca_nns$vectors[, j] <- -pca_nns$vectors[, j] + } + } + + pcs <- min(pcs, d) + + attrib <- data.frame( + PC = seq_len(pcs), + eigenvalue = pca_classic$values[1:pcs], + CLPM = NA_real_, + CUPM = NA_real_, + DLPM = NA_real_, + DUPM = NA_real_, + recovered = NA_real_, + error = NA_real_ + ) + + for (j in seq_len(pcs)) { + v <- pca_classic$vectors[, j, drop = FALSE] + + clpm_j <- drop(t(v) %*% CLPM %*% v) + cupm_j <- drop(t(v) %*% CUPM %*% v) + dlpm_j <- drop(t(v) %*% DLPM %*% v) + dupm_j <- drop(t(v) %*% DUPM %*% v) + + recovered_j <- clpm_j + cupm_j - dlpm_j - dupm_j + + attrib$CLPM[j] <- clpm_j + attrib$CUPM[j] <- cupm_j + attrib$DLPM[j] <- dlpm_j + attrib$DUPM[j] <- dupm_j + attrib$recovered[j] <- recovered_j + attrib$error[j] <- abs(attrib$eigenvalue[j] - recovered_j) + } + + attrib_pct <- data.frame( + PC = attrib$PC, + CLPM_pct = 100 * attrib$CLPM / attrib$eigenvalue, + CUPM_pct = 100 * attrib$CUPM / attrib$eigenvalue, + 
DLPM_signed_pct = -100 * attrib$DLPM / attrib$eigenvalue, + DUPM_signed_pct = -100 * attrib$DUPM / attrib$eigenvalue + ) + + trace_fun <- function(M) sum(diag(M)) + + trace_decomp <- c( + CLPM = trace_fun(CLPM), + CUPM = trace_fun(CUPM), + DLPM_signed = -trace_fun(DLPM), + DUPM_signed = -trace_fun(DUPM), + Total = trace_fun(Sigma_nns) + ) + + list( + pm = pm, + CLPM = CLPM, + CUPM = CUPM, + DLPM = DLPM, + DUPM = DUPM, + Sigma_classic = Sigma_classic, + Sigma_nns_raw = Sigma_nns_raw, + Sigma_nns = Sigma_nns, + pca_classic = pca_classic, + pca_nns = pca_nns, + covariance_error = max(abs(Sigma_classic - Sigma_nns_raw)), + pm_cov_error = max(abs(pm$cov.matrix - Sigma_nns_raw)), + asymmetry = max(abs(Sigma_nns_raw - t(Sigma_nns_raw))), + eigenvalue_error = max(abs(pca_classic$values - pca_nns$values)), + eigenvector_alignment = align_eigenvectors( + pca_classic$vectors, + pca_nns$vectors, + pcs + ), + attrib = attrib, + attrib_pct = attrib_pct, + trace_decomp = trace_decomp, + trace_pct = 100 * trace_decomp / trace_decomp["Total"] + ) +} + +dpm_nd_summary <- function(Z) { + Z <- as.matrix(Z) + n <- nrow(Z) + d <- ncol(Z) + target <- colMeans(Z) + + above_target <- sweep(Z, 2, target, ">") + orthant_label <- apply(above_target, 1, function(row) { + sum(row * 2^(0:(d - 1))) + }) + + orthant_tab <- table(orthant_label) + + p_all_lower <- as.numeric(orthant_tab[as.character(0)]) / n + p_all_upper <- as.numeric(orthant_tab[as.character(2^d - 1)]) / n + + if (is.na(p_all_lower)) p_all_lower <- 0 + if (is.na(p_all_upper)) p_all_upper <- 0 + + p_mixed <- 1 - p_all_lower - p_all_upper + + orthant_aggregated <- c( + CLPM_nD = p_all_lower, + CUPM_nD = p_all_upper, + DPM_nD = p_mixed + ) + + clpm0 <- NNS::Co.LPM_nD(Z, target, degree = 0, norm = FALSE) + cupm0 <- NNS::Co.UPM_nD(Z, target, degree = 0, norm = FALSE) + dpm0 <- NNS::DPM_nD( Z, target, degree = 0, norm = FALSE) + + nns_degree0 <- c( + CLPM_nD = clpm0, + CUPM_nD = cupm0, + DPM_nD = dpm0 + ) + + clpm1 <- NNS::Co.LPM_nD(Z, 
target, degree = 1, norm = FALSE) + cupm1 <- NNS::Co.UPM_nD(Z, target, degree = 1, norm = FALSE) + dpm1 <- NNS::DPM_nD( Z, target, degree = 1, norm = FALSE) + + nns_degree1_raw <- c( + CLPM_nD = clpm1, + CUPM_nD = cupm1, + DPM_nD = dpm1 + ) + + nns_degree1_norm_manual <- nns_degree1_raw / sum(nns_degree1_raw) + + nns_degree1_norm_true <- c( + CLPM_nD = NNS::Co.LPM_nD(Z, target, degree = 1, norm = TRUE), + CUPM_nD = NNS::Co.UPM_nD(Z, target, degree = 1, norm = TRUE), + DPM_nD = NNS::DPM_nD( Z, target, degree = 1, norm = TRUE) + ) + + list( + orthant_aggregated = orthant_aggregated, + nns_degree0 = nns_degree0, + degree0_difference = nns_degree0 - orthant_aggregated, + nns_degree1_raw = nns_degree1_raw, + nns_degree1_norm_manual = nns_degree1_norm_manual, + nns_degree1_norm_true = nns_degree1_norm_true + ) +} + +# ============================================================================= +# Experiment 1: 5-variable full orthant decomposition +# ============================================================================= + +cat_section("EXPERIMENT 1: 5-VARIABLE FULL ORTHANT DECOMPOSITION") + +set.seed(2024) + +n <- 10000 +d <- 5 + +R <- matrix(0.5, d, d) + diag(0.5, d) +L <- chol(R) + +Z5 <- matrix(rnorm(n * d), n, d) %*% L +colnames(Z5) <- paste0("X", seq_len(d)) + +res5_orth <- full_orthant_decomposition(Z5) + +cat("\nTarget vector:\n") +print(round(colMeans(Z5), 6)) + +cat("\nOccupied orthants:\n") +print(length(unique(res5_orth$orthant_label))) +cat("Out of:\n") +print(2^d) + +cat("\nClassical population PCA eigenvalues:\n") +print(round(res5_orth$pca$values, 6)) + +cat("\nFull orthant covariance recovery error:\n") +print(res5_orth$recovery_error) + +cat("\nFull orthant eigenvalue recovery error:\n") +print(res5_orth$eigenvalue_error) + +cat("\nFull orthant eigenvector alignments:\n") +print(round(res5_orth$eigenvector_alignment, 12)) + +cat("\nEigenvalue attribution, full orthants:\n") +print(round(res5_orth$attrib, 6)) + +cat("\nD_spectral = trace(Sigma_Q) 
/ trace(Sigma):\n") +print(round(res5_orth$D_spectral, 6)) + +cat("\nTop 10 orthant contributions to PC1:\n") +print(head(res5_orth$orthant_pc1, 10), digits = 6) + +cat("\nCheck PC1 orthant contribution sum vs between attribution:\n") +print(c( + sum_orthant_PC1 = sum(res5_orth$orthant_pc1$contribution), + direct_between_PC1 = res5_orth$attrib$between[1] +)) + +# ============================================================================= +# Experiment 2: 5-variable DPM_nD aggregation +# ============================================================================= + +cat_section("EXPERIMENT 2: 5-VARIABLE DPM_nD AGGREGATION") + +res5_dpm <- dpm_nd_summary(Z5) + +cat("\nThree-state aggregation from full orthant labels:\n") +print(round(res5_dpm$orthant_aggregated, 6)) + +cat("\nNNS degree-zero values:\n") +print(round(res5_dpm$nns_degree0, 6)) + +cat("\nDegree-zero difference, NNS minus full orthant aggregation:\n") +print(round(res5_dpm$degree0_difference, 12)) + +cat("\nNNS degree-one raw directional masses:\n") +print(round(res5_dpm$nns_degree1_raw, 6)) + +cat("\nNNS degree-one normalized shares, manual:\n") +print(round(res5_dpm$nns_degree1_norm_manual, 6)) + +cat("\nNNS degree-one normalized shares, norm = TRUE:\n") +print(round(res5_dpm$nns_degree1_norm_true, 8)) + +# ============================================================================= +# Experiment 3: 5-variable pairwise PM.matrix recovery +# ============================================================================= + +cat_section("EXPERIMENT 3: 5-VARIABLE PAIRWISE PM.matrix PCA RECOVERY") + +res5_pair <- pairwise_pm_pca(Z5, pcs = 5) + +cat("\nPairwise PM.matrix covariance recovery error:\n") +print(res5_pair$covariance_error) + +cat("\nPM.matrix $cov.matrix vs manual reassembly error:\n") +print(res5_pair$pm_cov_error) + +cat("\nDirectional aggregate asymmetry before symmetrization:\n") +print(res5_pair$asymmetry) + +cat("\nClassical sample PCA eigenvalues:\n") 
+print(round(res5_pair$pca_classic$values, 6)) + +cat("\nPairwise NNS recovered PCA eigenvalues:\n") +print(round(res5_pair$pca_nns$values, 6)) + +cat("\nPairwise eigenvalue recovery error:\n") +print(res5_pair$eigenvalue_error) + +cat("\nPairwise eigenvector alignments:\n") +print(round(res5_pair$eigenvector_alignment, 12)) + +cat("\nDirectional eigenvalue attribution from pairwise matrices:\n") +print(round(res5_pair$attrib, 6)) + +cat("\nSigned percentage attribution from pairwise matrices:\n") +print(round(res5_pair$attrib_pct, 2)) + +cat("\nTrace-level pairwise directional decomposition:\n") +print(round(res5_pair$trace_decomp, 6)) + +cat("\nTrace-level pairwise directional percentages:\n") +print(round(res5_pair$trace_pct, 2)) + +# ============================================================================= +# Experiment 4: 50-variable pairwise recovery +# ============================================================================= + +cat_section("EXPERIMENT 4: 50-VARIABLE PAIRWISE PM.matrix SCALABILITY") + +set.seed(2026) + +n <- 5000 +d <- 50 + +asset_names <- paste0("X", seq_len(d)) + +market <- rnorm(n) +sector_A <- rnorm(n) +sector_B <- rnorm(n) +E <- matrix(rnorm(n * d), nrow = n, ncol = d) + +Z50 <- matrix(0, nrow = n, ncol = d) + +for (j in seq_len(d)) { + if (j <= 25) { + Z50[, j] <- 0.60 * market + 0.90 * sector_A + 0.40 * E[, j] + } else { + Z50[, j] <- 0.60 * market + 0.90 * sector_B + 0.40 * E[, j] + } +} + +colnames(Z50) <- asset_names + +cat("\nFull orthant state count for d = 50:\n") +print(2^50) + +cat("\nNumber of observations:\n") +print(n) + +cat("\nObservation-to-orthant ratio n / 2^50:\n") +print(n / 2^50) + +time_50 <- system.time({ + res50_pair <- pairwise_pm_pca(Z50, pcs = 5) +}) + +cat("\nRuntime for 50-variable pairwise PM.matrix experiment:\n") +print(time_50) + +cat("\nClassical PCA eigenvalues, first 5:\n") +print(round(res50_pair$pca_classic$values[1:5], 6)) + +cat("\nPairwise NNS recovered PCA eigenvalues, first 5:\n") 
+print(round(res50_pair$pca_nns$values[1:5], 6)) + +cat("\nPairwise covariance recovery error, d = 50:\n") +print(res50_pair$covariance_error) + +cat("\nPairwise eigenvalue recovery error, d = 50:\n") +print(res50_pair$eigenvalue_error) + +cat("\nPairwise eigenvector alignments, first 5 PCs, d = 50:\n") +print(round(res50_pair$eigenvector_alignment, 12)) + +cat("\nDirectional eigenvalue attribution, first 5 PCs, d = 50:\n") +print(round(res50_pair$attrib, 6)) + +cat("\nSigned percentage attribution, first 5 PCs, d = 50:\n") +print(round(res50_pair$attrib_pct, 2)) + +cat("\nTrace-level pairwise directional decomposition, d = 50:\n") +print(round(res50_pair$trace_decomp, 6)) + +cat("\nTrace-level pairwise directional percentages, d = 50:\n") +print(round(res50_pair$trace_pct, 2)) + +# ============================================================================= +# Experiment 5: 50-variable PC1 not market / spread-factor stress test +# ============================================================================= + +cat_section("EXPERIMENT 5: 50-VARIABLE PC1 NOT MARKET STRESS TEST") + +set.seed(2027) + +n <- 6000 +d <- 50 + +market_vec <- rep(1, d) +market_vec <- market_vec / sqrt(sum(market_vec^2)) + +spread_raw <- c(rep(1, 25), rep(-1, 25)) +spread_vec <- spread_raw / sqrt(sum(spread_raw^2)) + +cat("\nMarket-spread inner product:\n") +print(sum(market_vec * spread_vec)) + +sd_market <- 0.70 +sd_spread <- 1.80 +sd_noise <- 0.35 + +M <- rnorm(n) +S <- rnorm(n) +E <- matrix(rnorm(n * d), nrow = n, ncol = d) + +Z50_spread <- sd_market * M %*% t(market_vec) + + sd_spread * S %*% t(spread_vec) + + sd_noise * E + +colnames(Z50_spread) <- paste0("X", seq_len(d)) + +res50_spread <- pairwise_pm_pca(Z50_spread, pcs = 5) + +pca_spread <- res50_spread$pca_classic +v1 <- pca_spread$vectors[, 1] +v2 <- pca_spread$vectors[, 2] + +if (sum(v1 * spread_vec) < 0) v1 <- -v1 +if (sum(v2 * market_vec) < 0) v2 <- -v2 + +cat("\nClassical PCA eigenvalues, first 5:\n") 
+print(round(pca_spread$values[1:5], 6)) + +cat("\nAlignment diagnostics:\n") +alignment_diag <- c( + abs_PC1_market = abs(sum(v1 * market_vec)), + abs_PC1_spread = abs(sum(v1 * spread_vec)), + abs_PC2_market = abs(sum(v2 * market_vec)), + abs_PC2_spread = abs(sum(v2 * spread_vec)) +) +print(round(alignment_diag, 8)) + +cat("\nPairwise covariance recovery error, 50-variable spread test:\n") +print(res50_spread$covariance_error) + +cat("\nPairwise eigenvalue recovery error, 50-variable spread test:\n") +print(res50_spread$eigenvalue_error) + +cat("\nPairwise eigenvector alignments, first 5 PCs, 50-variable spread test:\n") +print(round(res50_spread$eigenvector_alignment, 12)) + +cat("\nDirectional eigenvalue attribution, first 5 PCs, 50-variable spread test:\n") +print(round(res50_spread$attrib, 6)) + +cat("\nSigned percentage attribution, first 5 PCs, 50-variable spread test:\n") +print(round(res50_spread$attrib_pct, 2)) + +# ============================================================================= +# Experiment 6: Runtime and scaling grid +# ============================================================================= + +cat_section("EXPERIMENT 6: PAIRWISE PM.matrix RUNTIME SCALING GRID") + +set.seed(2028) + +dims <- c(5, 10, 25, 50) +n <- 3000 + +scaling_results <- data.frame( + d = integer(), + impossible_orthants = numeric(), + runtime_elapsed_sec = numeric(), + covariance_error = numeric(), + eigenvalue_error = numeric(), + pc1_alignment = numeric() +) + +for (dd in dims) { + common <- rnorm(n) + E <- matrix(rnorm(n * dd), nrow = n, ncol = dd) + + Z <- matrix(0, nrow = n, ncol = dd) + for (j in seq_len(dd)) { + Z[, j] <- 0.75 * common + 0.50 * E[, j] + } + colnames(Z) <- paste0("X", seq_len(dd)) + + timing <- system.time({ + res <- pairwise_pm_pca(Z, pcs = 1) + }) + + scaling_results <- rbind( + scaling_results, + data.frame( + d = dd, + impossible_orthants = 2^dd, + runtime_elapsed_sec = timing[["elapsed"]], + covariance_error = res$covariance_error, + 
eigenvalue_error = res$eigenvalue_error, + pc1_alignment = res$eigenvector_alignment[1] + ) + ) +} + +cat("\nScaling results:\n") +print(scaling_results, digits = 8) + +cat("\nDONE.\n") +``` diff --git a/tools/NNS/examples/nns_arma_conformal_benchmark_report.md b/tools/NNS/examples/nns_arma_conformal_benchmark_report.md new file mode 100644 index 0000000..c9c7dbb --- /dev/null +++ b/tools/NNS/examples/nns_arma_conformal_benchmark_report.md @@ -0,0 +1,672 @@ +# NNS Time-Series Prediction Interval Benchmark + +This experiment was ported from Python to R to support the `NNS` time-series comparison. The original Python version is [here](https://github.com/microprediction/conformalprediction/blob/main/benchmark/run_timeseries.py). + +## Outline + +This benchmark compares prediction intervals for a simulated nonlinear, heteroskedastic time series. The goal is not only to check whether each method reaches the target marginal coverage of 90%, but also whether the intervals remain useful across changing volatility regimes. + +The R version performs the following steps: + +1. Simulate five nonlinear time series with changing volatility regimes. +2. Fit a lagged ridge-style baseline used by the conformal and probabilistic comparison methods. +3. Evaluate conformal methods: fixed split CP, ACI, AgACI, conformal PID, and NexCP. +4. Evaluate probabilistic baselines: EWMA-vol Gaussian, static Gaussian recalibration, true sigma on estimated mean, and the true conditional oracle. +5. Run `NNS.ARMA.optim` in walk-forward chunks, with seasonal periods estimated from the available training history at each step using `NNS.seas()`. +6. Convert the native NNS lower interval, point forecast, and upper interval into a non-Gaussian degree-2 `LPM.VaR` predictive distribution for CRPS and approximate logscore. +7. 
Score all methods using marginal coverage, rolling-window coverage, volatility-stratified coverage, interval width, Winkler interval score, CRPS, and logscore where applicable. + +## Fidelity to the Python benchmark + +The R script follows the structure of the Python benchmark as closely as practical. It uses the same train, calibration, and test layout, the same lag length, the same volatility-regime design, and the same broad families of comparison methods. + +Exact numerical identity should not be expected. Python and R use different random number generators, so the same seed labels do not produce identical simulated paths. The baseline ridge model is also not bit-for-bit identical: the Python version uses `sklearn`'s `StandardScaler()` plus `Ridge(alpha = 1.0)`, while the R script uses `glmnet` when available and falls back to `lm` otherwise. + +Some optional Python-specific methods, such as MAPIE and `timemachines` skaters, are not reproduced directly in this R version. The R version instead focuses on the common conformal methods, probabilistic baselines, and the native `NNS` time-series comparison. + +A key correction in the final R version is the oracle. The initial oracle used the deterministic level as the mean. Because the data-generating process contains an autoregressive component, the true conditional mean is: + +```text +level_t + 0.55 * (y_{t-1} - level_{t-1}) +``` + +The final table below uses this corrected oracle. + +## NNS probabilistic forecast construction + +`NNS.ARMA.optim` natively returns a lower prediction interval, point forecast, and upper prediction interval. Rather than forcing those outputs into a Gaussian distribution, the R benchmark constructs a non-Gaussian predictive quantile function using degree-2 `LPM.VaR`. 
+ +For each forecast step, the predictive support is: + +```r +support_t <- c(lower_t, point_forecast_t, upper_t) +``` + +The forecast quantile function is then defined as: + +```r +Q_t <- function(p) NNS::LPM.VaR(p, degree = 2, variable = support_t) +``` + +This construction preserves asymmetry. If the point forecast is closer to the upper bound than the lower bound, or vice versa, the implied predictive distribution reflects that directional imbalance directly. No normality assumption is imposed on the NNS predictive distribution. + +CRPS is computed directly from the NNS-implied quantile function. Logscore is also reported, but it should be interpreted more cautiously because it requires estimating a local density from the quantile curve. CRPS is the cleaner distributional comparison for this compact nonparametric predictive distribution. + +## Metrics + +| Metric | Meaning | +|---|---| +| `marg_cov` | Overall empirical coverage. Target is 0.90. | +| `worst_win_cov` | Worst rolling-window coverage using a 100-step window. Higher is better. | +| `cov_lowvol` | Coverage in the lowest-volatility stratum. | +| `cov_hivol` | Coverage in the highest-volatility stratum. | +| `cond_cov_gap` | Largest absolute deviation from 0.90 across volatility strata. Lower is better. | +| `width` | Mean interval width. Lower is sharper, conditional on adequate coverage. | +| `interval_score` | Winkler interval score. Lower is better. Penalizes both width and misses. | +| `CRPS` | Distributional score. Lower is better. For NNS, computed from the degree-2 `LPM.VaR` quantile distribution. | +| `logscore` | Density score. Lower is better. For NNS, approximated from the quantile curve and interpreted cautiously. | + +`NA` values in `CRPS` and `logscore` are expected for methods that output intervals only rather than full predictive distributions. + +## Results + +Mean over 5 seeds, with `alpha = 0.10` and target coverage equal to `0.90`. 
+ +| Rank | Method | Family | Marginal coverage | Worst rolling coverage | Low-vol coverage | High-vol coverage | Conditional coverage gap | Width | Interval score | CRPS | Logscore | +|---:|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:| +| 1 | oracle (true conditional mu,sigma) | oracle | 0.897 | 0.820 | 0.895 | 0.900 | 0.021 | 4.472 | 5.622 | 0.774 | 1.605 | +| 2 | NNS.ARMA.optim (degree-2 LPM.VaR distribution) | nns | 0.914 | 0.784 | 0.926 | 0.897 | 0.041 | 5.582 | 6.756 | 0.934 | 3.874 | +| 3 | EWMA-vol Gaussian | prob | 0.891 | 0.838 | 0.904 | 0.887 | 0.021 | 5.409 | 6.892 | 0.939 | 1.864 | +| 4 | NexCP (weighted) | cp | 0.895 | 0.782 | 0.922 | 0.891 | 0.030 | 5.467 | 6.937 | NA | NA | +| 5 | AgACI | cp | 0.905 | 0.802 | 0.948 | 0.877 | 0.048 | 5.597 | 7.035 | NA | NA | +| 6 | ACI | cp | 0.897 | 0.844 | 0.910 | 0.890 | 0.012 | 5.682 | 7.130 | NA | NA | +| 7 | true sigma on est. mu | oracle | 0.788 | 0.546 | 0.661 | 0.862 | 0.239 | 4.472 | 7.277 | 0.949 | 1.966 | +| 8 | static Gaussian (recal) | prob | 0.908 | 0.690 | 0.999 | 0.780 | 0.120 | 5.957 | 7.942 | 0.969 | 1.988 | +| 9 | fixed split (CP) | cp | 0.912 | 0.696 | 0.999 | 0.787 | 0.113 | 6.066 | 7.949 | NA | NA | +| 10 | conformal PID | cp | 0.892 | 0.572 | 1.000 | 0.746 | 0.154 | 6.091 | 8.508 | NA | NA | + +## Interpretation + +The corrected oracle is the expected best method. It knows both the true conditional mean and the true conditional volatility, so its scores provide the natural lower bound for the experiment. Its marginal coverage is 0.897, high-volatility coverage is 0.900, and interval score is 5.622. + +The strongest empirical method is `NNS.ARMA.optim`. It ranks second overall, behind only the oracle. It achieves marginal coverage of 0.914, high-volatility coverage of 0.897, and the best non-oracle interval score at 6.756. This means that NNS produces efficient intervals without materially sacrificing coverage in the high-volatility regime. 
+ +The distributional scores strengthen the result. 
The NNS degree-2 `LPM.VaR` predictive distribution has the best empirical CRPS among non-oracle methods: + +| Method | CRPS | +|---|---:| +| oracle (true conditional mu,sigma) | 0.774 | +| NNS.ARMA.optim degree-2 LPM.VaR distribution | 0.934 | +| EWMA-vol Gaussian | 0.939 | +| true sigma on estimated mu | 0.949 | +| static Gaussian recalibration | 0.969 | + +This is important because CRPS evaluates the entire predictive distribution, not only the interval endpoints. The NNS forecast therefore performs well both as an interval forecast and as a distributional forecast. + +ACI is the strongest pure calibration method. It has marginal coverage of 0.897, the best worst rolling-window coverage at 0.844, and the smallest conditional coverage gap at 0.012. This shows that adaptive conformal methods can repair much of the regime-misallocation problem found in fixed split conformal. + +However, ACI pays for that calibration with a wider interval and a worse interval score than NNS. Its mean width is 5.682 and interval score is 7.130, compared with NNS width of 5.582 and interval score of 6.756. In this benchmark, NNS is the better efficiency performer, while ACI is the best calibration stabilizer. + +Fixed split conformal illustrates the global-pooling problem clearly. It reaches marginal coverage of 0.912, but the coverage is badly allocated across regimes. Low-volatility coverage is 0.999, while high-volatility coverage falls to 0.787. The method overcovers calm periods and undercovers volatile periods. This is the practical limitation of using a global calibration residual pool in a heteroskedastic time series. + +The `true sigma on est. mu` row shows the opposite failure. It knows the true volatility path, but it is centered on the estimated ridge mean. Its marginal coverage falls to 0.788, and worst rolling-window coverage falls to 0.546. This demonstrates that perfect volatility information cannot rescue a biased or structurally weak mean forecast. 
The center of the interval matters as much as the width. + +## Main takeaway + +The benchmark supports a balanced conclusion: + +- The true conditional oracle remains the theoretical winner. +- `NNS.ARMA.optim` is the strongest empirical method by interval score and CRPS. +- ACI is the strongest pure calibration method by conditional coverage stability. +- Fixed split conformal reaches marginal coverage only by misallocating coverage across volatility regimes. +- NNS does not need a conformal wrapper to produce competitive adaptive intervals in this experiment. + +## Final verdict + +If the computational overhead is acceptable, especially with parallelized cores, `NNS.ARMA.optim` is an elite choice for time-series uncertainty quantification. In this experiment, it delivered the best non-oracle interval score and the best non-oracle CRPS while maintaining near-target marginal coverage and strong high-volatility coverage. + +The main advantage is that NNS does not require a separate two-layer architecture of base model plus conformal wrapper to obtain adaptive prediction intervals. Its native time-series procedure estimates seasonal structure with `NNS.seas()`, updates through walk-forward training, and produces prediction intervals directly from the fitted NNS forecasting process. + +Adaptive conformal methods such as ACI and NexCP remain valuable calibration tools. ACI achieved the best volatility-stratified calibration and the best worst-window coverage in this benchmark. But NNS achieved the strongest overall empirical efficiency, with sharper intervals and better CRPS than the conformal alternatives. + +The practical conclusion is that NNS is not merely an alternative point forecaster requiring post-hoc calibration. It is a native nonlinear, nonparametric forecasting framework that can produce highly efficient, asymmetric, naturally adaptive prediction intervals and predictive distributions directly. 
+ +## Appendix: R code + +```r +# run_nns_arma_timeseries_benchmark.R +# +# Time-series benchmark in R. +# +# Methods: +# Conformal: +# fixed split CP, ACI, AgACI, conformal PID, NexCP weighted +# +# Probabilistic: +# true conditional oracle, true sigma on estimated mu, +# EWMA-vol Gaussian, GARCH(1,1) Gaussian, static Gaussian +# +# NNS: +# NNS.ARMA.optim walk-forward with built-in prediction intervals +# NNS.ARMA.optim probabilistic scores from degree-2 LPM.VaR quantile distribution +# +# Important: +# seasonal.factor is NOT hard-coded. +# Each NNS.ARMA.optim walk-forward chunk estimates seasonal factors from +# the current training data using NNS.seas(training_series, plot = FALSE). +# +# NNS CRPS and logscore are NOT Gaussian-implied. +# They are computed from the NNS-implied quantile distribution: +# support_t = c(lower_t, point_t, upper_t) +# Q_t(p) = LPM.VaR(p, degree = 2, variable = support_t) + +library(NNS) +library(data.table) + +HAS_RUGARCH <- requireNamespace("rugarch", quietly = TRUE) +if (!HAS_RUGARCH) { + message("[INFO] rugarch not available - GARCH method will be skipped.") +} + +HAS_GLMNET <- requireNamespace("glmnet", quietly = TRUE) +if (!HAS_GLMNET) { + message("[INFO] glmnet not available - ridge baseline will fall back to lm.") +} + +`%||%` <- function(a, b) { + if (!is.null(a)) a else b +} + +ALPHA <- 0.10 +TARGET_COV <- 1 - ALPHA +N_LAGS <- 12L +FIT_END <- 700L +CAL_END <- 1000L +WINDOW <- 100L +N_SEEDS <- 5L +TRAINING_FRAC <- 0.90 +MAX_H <- 250L +NNS_NCORES <- 1L +NNS_Q_PROBS <- seq(0.001, 0.999, by = 0.001) +NNS_Q_DEGREE <- 2L + +dir.create("results", showWarnings = FALSE) +dir.create("figures", showWarnings = FALSE) + +make_timeseries <- function(T = 3500L, seed = 0L, heavy_tail = FALSE) { + set.seed(seed + 1L) + tt <- seq_len(T) + level <- 0.002 * tt + + 1.50 * sin(2 * pi * tt / 50) + + 0.75 * sin(2 * pi * tt / 200) + sigma <- rep(1.0, T) + sigma[tt > 900 & tt <= 1400] <- 2.5 + sigma[tt > 1900 & tt <= 2450] <- 0.55 + sigma[tt > 2800] 
<- 1.8 + eps <- if (heavy_tail) rt(T, df = 5) / sqrt(5 / 3) else rnorm(T) + y <- numeric(T) + y[1] <- level[1] + sigma[1] * eps[1] + for (i in 2:T) { + y[i] <- level[i] + 0.55 * (y[i - 1] - level[i - 1]) + sigma[i] * eps[i] + } + data.table(t = tt, y = as.numeric(y), level = as.numeric(level), sigma = as.numeric(sigma)) +} + +true_conditional_mean <- function(d, raw_idx) { + d$level[raw_idx] + 0.55 * (d$y[raw_idx - 1L] - d$level[raw_idx - 1L]) +} + +lag_features <- function(y, n_lags = N_LAGS) { + n <- length(y) + yy <- y[(n_lags + 1):n] + X <- matrix(NA_real_, nrow = length(yy), ncol = n_lags) + for (k in seq_len(n_lags)) { + X[, k] <- y[(n_lags + 1 - k):(n - k)] + } + colnames(X) <- paste0("lag", seq_len(n_lags)) + list(X = X, yy = yy) +} + +ridge_forecast <- function(X, yy, fit_end) { + if (HAS_GLMNET) { + n_tr <- fit_end + lambda <- 1.0 / n_tr + fit <- glmnet::glmnet(X[1:n_tr, , drop = FALSE], yy[1:n_tr], alpha = 0, lambda = lambda, standardize = TRUE) + mu <- as.numeric(glmnet::predict.glmnet(fit, newx = X, s = lambda)) + } else { + df_tr <- as.data.frame(X[1:fit_end, , drop = FALSE]) + df_tr$y <- yy[1:fit_end] + fit <- lm(y ~ ., data = df_tr) + mu <- as.numeric(predict(fit, newdata = as.data.frame(X))) + } + mu +} + +coverage <- function(lo, hi, y) mean(y >= lo & y <= hi, na.rm = TRUE) +mean_width <- function(lo, hi) mean(hi - lo, na.rm = TRUE) +frac_infinite <- function(lo, hi) mean(!is.finite(lo) | !is.finite(hi)) + +rolling_coverage <- function(lo, hi, y, window = WINDOW) { + n <- length(y) + if (n < window) return(numeric(0)) + vapply(seq_len(n - window + 1L), function(i) { + idx <- i:(i + window - 1L) + coverage(lo[idx], hi[idx], y[idx]) + }, numeric(1)) +} + +worst_window_coverage <- function(lo, hi, y, window = WINDOW) { + rc <- rolling_coverage(lo, hi, y, window) + if (length(rc) == 0L) NA_real_ else min(rc, na.rm = TRUE) +} + +interval_score <- function(lo, hi, y, alpha = ALPHA) { + mean((hi - lo) + (2 / alpha) * pmax(lo - y, 0) + (2 / alpha) * 
pmax(y - hi, 0), na.rm = TRUE) +} + +coverage_by_stratum <- function(lo, hi, y, sigma, k = 4L) { + r <- rank(sigma, ties.method = "first") + grp <- cut(r, breaks = k, labels = FALSE, include.lowest = TRUE) + vapply(seq_len(k), function(j) { + idx <- which(grp == j) + if (length(idx) == 0L) NA_real_ else coverage(lo[idx], hi[idx], y[idx]) + }, numeric(1)) +} + +z_alpha <- function(alpha = ALPHA) qnorm(1 - alpha / 2) + +gaussian_interval <- function(mu, sigma, alpha = ALPHA) { + sigma <- pmax(as.numeric(sigma), 1e-8) + z <- z_alpha(alpha) + list(lo = mu - z * sigma, hi = mu + z * sigma) +} + +crps_gaussian <- function(mu, sigma, y) { + sigma <- pmax(as.numeric(sigma), 1e-8) + z <- (y - mu) / sigma + mean(sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi)), na.rm = TRUE) +} + +log_score_gaussian <- function(mu, sigma, y) { + sigma <- pmax(as.numeric(sigma), 1e-8) + mean(-dnorm(y, mean = mu, sd = sigma, log = TRUE), na.rm = TRUE) +} + +safe_mean <- function(x) { + if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE) +} + +nns_lpmvar_quantile_matrix <- function(mu, lo, hi, degree = NNS_Q_DEGREE, probs = NNS_Q_PROBS) { + mu <- as.numeric(mu); lo <- as.numeric(lo); hi <- as.numeric(hi) + if (length(mu) != length(lo) || length(mu) != length(hi)) stop("mu, lo, and hi must have the same length.") + qmat <- t(vapply(seq_along(mu), function(i) { + support_i <- sort(as.numeric(c(lo[i], mu[i], hi[i]))) + as.numeric(NNS::LPM.VaR(percentile = probs, degree = degree, variable = support_i)) + }, numeric(length(probs)))) + list(probs = probs, qmat = qmat, degree = degree) +} + +crps_from_quantiles <- function(q, probs, y) { + u <- y - q + pinball <- u * (probs - as.numeric(u < 0)) + 2 * mean(pinball, na.rm = TRUE) +} + +logscore_from_quantiles <- function(q, probs, y, eps = 1e-12) { + q <- as.numeric(q); probs <- as.numeric(probs) + ord <- order(q) + q <- q[ord]; probs <- probs[ord] + keep <- !duplicated(q) + q <- q[keep]; probs <- probs[keep] + if (length(q) < 2L) 
return(-log(eps)) + if (y < min(q) || y > max(q)) return(-log(eps)) + j <- findInterval(y, q, all.inside = TRUE) + if (j >= length(q)) j <- length(q) - 1L + dq <- max(q[j + 1L] - q[j], eps) + dp <- max(probs[j + 1L] - probs[j], eps) + dens <- max(dp / dq, eps) + -log(dens) +} + +score_method <- function(method, family, lo, hi, y_te, sig_te, mu_ = NULL, s_ = NULL, q_probs = NULL, q_mat = NULL) { + lo_raw <- as.numeric(lo); hi_raw <- as.numeric(hi) + y_te <- as.numeric(y_te); sig_te <- as.numeric(sig_te) + if (length(lo_raw) != length(hi_raw) || length(lo_raw) != length(y_te) || length(lo_raw) != length(sig_te)) { + stop(method, ": length mismatch. lo=", length(lo_raw), ", hi=", length(hi_raw), ", y=", length(y_te), ", sigma=", length(sig_te)) + } + lo2 <- pmin(lo_raw, hi_raw); hi2 <- pmax(lo_raw, hi_raw) + cbs <- coverage_by_stratum(lo2, hi2, y_te, sig_te, k = 4L) + row <- data.table( + method = method, family = family, + marg_cov = coverage(lo2, hi2, y_te), + worst_win_cov = worst_window_coverage(lo2, hi2, y_te, WINDOW), + cov_lowvol = cbs[1], cov_hivol = cbs[length(cbs)], + cond_cov_gap = max(abs(cbs - TARGET_COV), na.rm = TRUE), + width = mean_width(lo2, hi2), frac_inf = frac_infinite(lo2, hi2), + interval_score = interval_score(lo2, hi2, y_te, ALPHA), + CRPS = NA_real_, logscore = NA_real_ + ) + if (!is.null(q_probs) && !is.null(q_mat)) { + crps_vals <- vapply(seq_along(y_te), function(i) crps_from_quantiles(q_mat[i, ], q_probs, y_te[i]), numeric(1)) + logscore_vals <- vapply(seq_along(y_te), function(i) logscore_from_quantiles(q_mat[i, ], q_probs, y_te[i]), numeric(1)) + row$CRPS <- mean(crps_vals, na.rm = TRUE) + row$logscore <- mean(logscore_vals, na.rm = TRUE) + } else if (!is.null(mu_) && !is.null(s_)) { + row$CRPS <- crps_gaussian(mu_, s_, y_te) + row$logscore <- log_score_gaussian(mu_, s_, y_te) + } + row +} + +fixed_split_cp <- function(mu_te, resid_cal, alpha = ALPHA) { + scores <- sort(abs(resid_cal)) + k <- ceiling((length(scores) + 1L) * (1 - alpha)) 
+ q <- if (k > length(scores)) Inf else scores[k] + list(lo = mu_te - q, hi = mu_te + q) +} + +aci <- function(mu_te, y_te, alpha = ALPHA, gamma = 0.03, warm = NULL) { + n <- length(y_te); lo <- numeric(n); hi <- numeric(n) + alpha_t <- alpha + hist_scores <- if (!is.null(warm)) abs(warm) else numeric(0) + for (t in seq_len(n)) { + k <- ceiling((length(hist_scores) + 1L) * (1 - alpha_t)) + q_t <- if (length(hist_scores) == 0L || k > length(hist_scores)) Inf else sort(hist_scores)[k] + lo[t] <- mu_te[t] - q_t; hi[t] <- mu_te[t] + q_t + err_t <- as.integer(y_te[t] < lo[t] || y_te[t] > hi[t]) + alpha_t <- alpha_t + gamma * (alpha - err_t) + alpha_t <- pmax(0.001, pmin(0.999, alpha_t)) + hist_scores <- c(hist_scores, abs(y_te[t] - mu_te[t])) + } + list(lo = lo, hi = hi) +} + +agaci <- function(mu_te, y_te, alpha = ALPHA, warm = NULL, gammas = c(0.001, 0.005, 0.01, 0.02, 0.05, 0.1)) { + experts <- lapply(gammas, function(g) aci(mu_te, y_te, alpha = alpha, gamma = g, warm = warm)) + lo <- Reduce("+", lapply(experts, `[[`, "lo")) / length(experts) + hi <- Reduce("+", lapply(experts, `[[`, "hi")) / length(experts) + list(lo = lo, hi = hi) +} + +conformal_pid <- function(mu_te, y_te, alpha = ALPHA, warm = NULL, Kp = 0.1, Ki = 0.01, Kd = 0.001) { + n <- length(y_te); lo <- numeric(n); hi <- numeric(n) + hist_scores <- if (!is.null(warm)) abs(warm) else numeric(0) + err_prev <- 0; integral <- 0 + for (t in seq_len(n)) { + k <- ceiling((length(hist_scores) + 1L) * (1 - alpha)) + q_t <- if (length(hist_scores) == 0L || k > length(hist_scores)) Inf else sort(hist_scores)[k] + lo[t] <- mu_te[t] - q_t; hi[t] <- mu_te[t] + q_t + err_t <- as.integer(y_te[t] < lo[t] || y_te[t] > hi[t]) - alpha + integral <- integral + err_t + deriv <- err_t - err_prev + delta <- Kp * err_t + Ki * integral + Kd * deriv + err_prev <- err_t + hist_scores <- c(hist_scores, abs(y_te[t] - mu_te[t]) * max(1e-6, 1 + delta)) + } + list(lo = lo, hi = hi) +} + +nexcp <- function(mu_te, y_te, alpha = ALPHA, warm 
= NULL, decay = 0.99) { + n <- length(y_te); lo <- numeric(n); hi <- numeric(n) + hist_scores <- if (!is.null(warm)) abs(warm) else numeric(0) + hist_weights <- if (!is.null(warm)) decay ^ (rev(seq_along(warm)) - 1) else numeric(0) + for (t in seq_len(n)) { + if (length(hist_scores) == 0L) { + q_t <- Inf + } else { + w_norm <- hist_weights / sum(hist_weights) + ord <- order(hist_scores) + cum_w <- cumsum(w_norm[ord]) + idx <- which(cum_w >= (1 - alpha))[1] + q_t <- if (is.na(idx)) Inf else hist_scores[ord[idx]] + } + lo[t] <- mu_te[t] - q_t; hi[t] <- mu_te[t] + q_t + hist_scores <- c(hist_scores, abs(y_te[t] - mu_te[t])) + hist_weights <- c(hist_weights * decay, 1) + } + list(lo = lo, hi = hi) +} + +recal_const <- function(mu_te, resid_cal, alpha = ALPHA) { + s <- sd(resid_cal, na.rm = TRUE) + z <- z_alpha(alpha) + list(lo = mu_te - z * s, hi = mu_te + z * s, mu = mu_te, sigma = rep(s, length(mu_te))) +} + +ewma_vol <- function(mu_te, y_te, alpha = ALPHA, warm = NULL, lam = 0.94) { + all_resid <- c(if (!is.null(warm)) warm else numeric(0), y_te - mu_te) + n_warm <- if (!is.null(warm)) length(warm) else 0L + var_vec <- numeric(length(all_resid)) + var_vec[1] <- all_resid[1]^2 + for (i in 2:length(all_resid)) { + var_vec[i] <- lam * var_vec[i - 1] + (1 - lam) * all_resid[i - 1]^2 + } + sig_te <- sqrt(var_vec[(n_warm + 1):length(all_resid)]) + sig_te <- pmax(sig_te, 1e-6) + z <- z_alpha(alpha) + list(lo = mu_te - z * sig_te, hi = mu_te + z * sig_te, mu = mu_te, sigma = sig_te) +} + +oracle_sigma_method <- function(mu_te, sig_te, alpha = ALPHA) { + z <- z_alpha(alpha) + list(lo = mu_te - z * sig_te, hi = mu_te + z * sig_te, mu = mu_te, sigma = sig_te) +} + +garch_vol <- function(resid_tr, resid_te, mu_te, alpha = ALPHA) { + if (!HAS_RUGARCH) return(NULL) + tryCatch({ + spec <- rugarch::ugarchspec( + variance.model = list(model = "sGARCH", garchOrder = c(1, 1)), + mean.model = list(armaOrder = c(0, 0), include.mean = FALSE), + distribution.model = "norm" + ) + fit <- 
rugarch::ugarchfit(spec, data = resid_tr, solver = "hybrid") + fc <- rugarch::ugarchforecast(fit, n.ahead = length(resid_te)) + sig_te <- pmax(as.numeric(rugarch::sigma(fc)), 1e-6) + z <- z_alpha(alpha) + list(lo = mu_te - z * sig_te, hi = mu_te + z * sig_te, mu = mu_te, sigma = sig_te) + }, error = function(e) { + message(" [GARCH failed: ", conditionMessage(e), "]") + NULL + }) +} + +get_nns_seas_periods <- function(training_series) { + seas <- NNS::NNS.seas(variable = training_series, plot = FALSE) + if (is.character(seas)) stop("NNS.seas returned character: ", paste(seas, collapse = " ")) + periods <- NULL + if (is.list(seas)) { + periods <- seas$Periods %||% seas$periods %||% seas$all.periods %||% seas$best.period + } + if (is.null(periods)) stop("NNS.seas: cannot find Periods or periods. Names: ", paste(names(seas), collapse = ", ")) + if (is.matrix(periods) || is.data.frame(periods) || data.table::is.data.table(periods)) { + periods <- as.numeric(periods[, 1]) + } + periods <- sort(unique(as.integer(na.omit(as.numeric(periods))))) + periods <- periods[is.finite(periods)] + periods <- periods[periods > 1] + periods <- periods[periods < length(training_series)] + if (length(periods) == 0L) stop("NNS.seas returned no usable periods.") + periods +} + +run_nns_arma_walkforward <- function(d, training_frac = TRAINING_FRAC, max_h = MAX_H) { + T_raw <- nrow(d) + current_train <- N_LAGS + CAL_END + all_pred <- numeric(0); all_lo <- numeric(0); all_hi <- numeric(0); all_y <- numeric(0); all_sig <- numeric(0) + chunks <- list(); chunk_id <- 0L + while (current_train < T_raw) { + chunk_id <- chunk_id + 1L + remaining <- T_raw - current_train + implied_h <- floor(current_train * (1 - training_frac) / training_frac) + h_i <- min(max_h, remaining, max(1L, implied_h)) + end_i <- current_train + h_i + seas_i <- get_nns_seas_periods(d$y[1:current_train]) + message(" NNS chunk ", chunk_id, ": train=", current_train, " h=", h_i, " seas=", paste(seas_i, collapse = ",")) + fit <- 
NNS::NNS.ARMA.optim( + variable = d$y[1:end_i], h = NULL, training.set = current_train, + seasonal.factor = seas_i, lin.only = FALSE, negative.values = TRUE, + obj.fn = expression(mean((predicted - actual)^2)), objective = "min", + linear.approximation = TRUE, ncores = NNS_NCORES, pred.int = TARGET_COV, + print.trace = FALSE, plot = FALSE + ) + pred_i <- as.numeric(fit$results) + lo_i <- as.numeric(fit$lower.pred.int) + hi_i <- as.numeric(fit$upper.pred.int) + if (length(pred_i) != h_i || length(lo_i) != h_i || length(hi_i) != h_i) { + stop("NNS.ARMA.optim length mismatch in chunk ", chunk_id) + } + pred_idx <- (current_train + 1L):end_i + all_pred <- c(all_pred, pred_i) + all_lo <- c(all_lo, pmin(lo_i, hi_i)) + all_hi <- c(all_hi, pmax(lo_i, hi_i)) + all_y <- c(all_y, d$y[pred_idx]) + all_sig <- c(all_sig, d$sigma[pred_idx]) + chunks[[chunk_id]] <- data.table( + chunk = chunk_id, train_end = current_train, h = h_i, end = end_i, + n_seas_periods_input = length(seas_i), seas_periods_input = paste(seas_i, collapse = ","), + period = paste(fit$period, collapse = ","), weights = paste(fit$weights, collapse = ","), + method = as.character(fit$method), shrink = as.character(fit$shrink), + nns_regress = as.character(fit$nns.regress), obj_fn = as.numeric(fit$obj.fn), + bias_shift = as.numeric(fit$bias.shift) + ) + current_train <- end_i + } + list(pred = all_pred, lo = all_lo, hi = all_hi, y = all_y, sigma = all_sig, chunks = rbindlist(chunks, fill = TRUE)) +} + +run_once <- function(seed = 0L, heavy_tail = FALSE) { + d <- make_timeseries(T = 3500L, seed = seed, heavy_tail = heavy_tail) + lf <- lag_features(d$y, N_LAGS) + X <- lf$X; yy <- lf$yy + mu <- ridge_forecast(X, yy, FIT_END) + sig_all <- d$sigma[(N_LAGS + 1):nrow(d)] + resid <- yy - mu + te_idx <- (CAL_END + 1L):length(yy) + mu_te <- mu[te_idx]; y_te <- yy[te_idx]; sig_te <- sig_all[te_idx] + raw_te_idx <- te_idx + N_LAGS + true_mu_te <- true_conditional_mean(d, raw_te_idx) + resid_cal <- resid[(FIT_END + 
1L):CAL_END] + resid_tr <- resid[1:FIT_END] + warm <- resid[1:CAL_END] + methods <- list() + + methods[["fixed split (CP)"]] <- c(fixed_split_cp(mu_te, resid_cal), list(mu_ = NULL, s_ = NULL, family = "cp")) + methods[["ACI"]] <- c(aci(mu_te, y_te, ALPHA, gamma = 0.03, warm = warm), list(mu_ = NULL, s_ = NULL, family = "cp")) + methods[["AgACI"]] <- c(agaci(mu_te, y_te, ALPHA, warm = warm), list(mu_ = NULL, s_ = NULL, family = "cp")) + methods[["conformal PID"]] <- c(conformal_pid(mu_te, y_te, ALPHA, warm = warm), list(mu_ = NULL, s_ = NULL, family = "cp")) + methods[["NexCP (weighted)"]] <- c(nexcp(mu_te, y_te, ALPHA, warm = warm), list(mu_ = NULL, s_ = NULL, family = "cp")) + + oi <- gaussian_interval(true_mu_te, sig_te, ALPHA) + methods[["oracle (true conditional mu,sigma)"]] <- list(lo = oi$lo, hi = oi$hi, mu_ = true_mu_te, s_ = sig_te, family = "oracle") + os <- oracle_sigma_method(mu_te, sig_te) + methods[["true sigma on est. mu"]] <- list(lo = os$lo, hi = os$hi, mu_ = mu_te, s_ = sig_te, family = "oracle") + + ew <- ewma_vol(mu_te, y_te, ALPHA, warm = warm) + methods[["EWMA-vol Gaussian"]] <- list(lo = ew$lo, hi = ew$hi, mu_ = ew$mu, s_ = ew$sigma, family = "prob") + rc <- recal_const(mu_te, resid_cal) + methods[["static Gaussian (recal)"]] <- list(lo = rc$lo, hi = rc$hi, mu_ = rc$mu, s_ = rc$sigma, family = "prob") + gv <- garch_vol(resid_tr, resid[te_idx], mu_te) + if (!is.null(gv)) methods[["GARCH(1,1) Gaussian"]] <- list(lo = gv$lo, hi = gv$hi, mu_ = gv$mu, s_ = gv$sigma, family = "prob") + + nns_wf <- run_nns_arma_walkforward(d) + nns_qdist <- nns_lpmvar_quantile_matrix(mu = nns_wf$pred, lo = nns_wf$lo, hi = nns_wf$hi) + methods[["NNS.ARMA.optim (degree-2 LPM.VaR distribution)"]] <- list( + lo = nns_wf$lo, hi = nns_wf$hi, mu_ = NULL, s_ = NULL, + q_probs = nns_qdist$probs, q_mat = nns_qdist$qmat, family = "nns" + ) + + rows <- lapply(names(methods), function(nm) { + m <- methods[[nm]]; fam <- m$family + if (fam == "nns") { + lo_v <- m$lo; hi_v <- m$hi; 
y_v <- nns_wf$y; sig_v <- nns_wf$sigma + } else { + lo_v <- m$lo; hi_v <- m$hi; y_v <- y_te; sig_v <- sig_te + } + score_method(nm, fam, lo_v, hi_v, y_v, sig_v, m$mu_ %||% NULL, m$s_ %||% NULL, m$q_probs %||% NULL, m$q_mat %||% NULL) + }) + + list(scores = rbindlist(rows), methods = methods, y_te = y_te, sig_te = sig_te, nns_wf = nns_wf, mu_te = mu_te, true_mu_te = true_mu_te) +} + +make_figures <- function(keep, agg) { + methods <- keep$methods; y_te <- keep$y_te; sig_te <- keep$sig_te; nns_wf <- keep$nns_wf + t_vec <- seq_along(y_te) + nns_key <- "NNS.ARMA.optim (degree-2 LPM.VaR distribution)" + sel <- c("fixed split (CP)", "ACI", "conformal PID", nns_key, "EWMA-vol Gaussian") + png("figures/ts_coverage.png", width = 1100, height = 550) + plot(NULL, xlim = c(1, length(y_te) - WINDOW), ylim = c(0.4, 1.02), xlab = paste0("test step, rolling coverage window = ", WINDOW), ylab = "coverage", main = "Rolling coverage under drift") + cols <- c("#1f4ed8", "#dc2626", "#16a34a", "#7e22ce", "#15803d") + for (i in seq_along(sel)) { + nm <- sel[i] + if (!nm %in% names(methods)) next + m <- methods[[nm]] + if (m$family == "nns") { lo_v <- m$lo; hi_v <- m$hi; y_v <- nns_wf$y } else { lo_v <- m$lo; hi_v <- m$hi; y_v <- y_te } + rc <- rolling_coverage(lo_v, hi_v, y_v, WINDOW) + lines(seq_along(rc), rc, col = cols[i], lwd = 1.4) + } + abline(h = TARGET_COV, lty = 2, lwd = 1) + legend("bottomleft", legend = sel, col = cols[seq_along(sel)], lwd = 1.4, cex = 0.75, ncol = 2) + dev.off() + + png("figures/ts_plane.png", width = 950, height = 700) + fam_cols <- c(cp = "#1f4ed8", prob = "#15803d", oracle = "#c2410c", nns = "#7e22ce") + with(agg, { + plot(worst_win_cov, interval_score, col = fam_cols[family], pch = 19, cex = 0.9, xlab = "worst rolling-window coverage", ylab = "interval score, lower is better", main = "Time-series efficiency versus worst-case coverage") + abline(v = TARGET_COV, lty = 2, lwd = 1) + text(worst_win_cov, interval_score, labels = method, cex = 0.55, pos = 3, 
col = fam_cols[family]) + legend("bottomleft", legend = c("conformal", "probabilistic", "oracle", "NNS"), col = unname(fam_cols), pch = 19, cex = 0.8) + }) + dev.off() + + png("figures/ts_width.png", width = 1100, height = 500) + z <- z_alpha(ALPHA) + plot(t_vec, pmin(2 * z * sig_te, 30), type = "l", lwd = 1.3, xlab = "test step", ylab = "interval width", main = "Does interval width track volatility?", ylim = c(0, 30)) + if (nns_key %in% names(methods)) { + m <- methods[[nns_key]]; w <- pmin(m$hi - m$lo, 30) + lines(seq_along(w), w, col = "#7e22ce", lwd = 1.1) + } + if ("EWMA-vol Gaussian" %in% names(methods)) { + m <- methods[["EWMA-vol Gaussian"]] + lines(t_vec, pmin(m$hi - m$lo, 30), col = "#15803d", lwd = 1.1) + } + if ("fixed split (CP)" %in% names(methods)) { + m <- methods[["fixed split (CP)"]] + lines(t_vec, pmin(m$hi - m$lo, 30), col = "#1f4ed8", lwd = 1.1) + } + legend("topright", legend = c("oracle 2*z*sigma_t", nns_key, "EWMA-vol Gaussian", "fixed split (CP)"), col = c("black", "#7e22ce", "#15803d", "#1f4ed8"), lwd = c(1.3, 1.1, 1.1, 1.1), cex = 0.75) + dev.off() +} + +run_all <- function() { + all_scores <- list(); keep <- NULL + for (seed in 0:(N_SEEDS - 1L)) { + message("\n=== seed ", seed, " ===") + res <- run_once(seed = seed, heavy_tail = FALSE) + all_scores[[length(all_scores) + 1L]] <- res$scores + if (seed == 0L) keep <- res + } + scores_dt <- rbindlist(all_scores, fill = TRUE) + metric_cols <- c("marg_cov", "worst_win_cov", "cov_lowvol", "cov_hivol", "cond_cov_gap", "width", "frac_inf", "interval_score", "CRPS", "logscore") + agg <- scores_dt[, lapply(.SD, safe_mean), by = .(method, family), .SDcols = metric_cols][order(interval_score)] + col_order <- c("method", "family", "marg_cov", "worst_win_cov", "cov_lowvol", "cov_hivol", "cond_cov_gap", "width", "frac_inf", "interval_score", "CRPS", "logscore") + agg <- agg[, .SD, .SDcols = intersect(col_order, names(agg))] + fwrite(scores_dt, "results/ts_results_all.csv") + fwrite(agg, 
"results/ts_results.csv") + agg_p <- copy(agg) + num_cols <- names(agg_p)[sapply(agg_p, is.numeric)] + agg_p[, (num_cols) := lapply(.SD, round, 3), .SDcols = num_cols] + cat("\n=== TIME-SERIES BENCHMARK mean over ", N_SEEDS, " seeds, alpha = ", ALPHA, ", target coverage = ", TARGET_COV, " ===\n\n", sep = "") + print(agg_p) + cat("\nWrote:\n") + cat(" results/ts_results.csv\n") + cat(" results/ts_results_all.csv\n") + make_figures(keep, agg) + cat(" figures/ts_coverage.png\n") + cat(" figures/ts_plane.png\n") + cat(" figures/ts_width.png\n") + invisible(list(scores = scores_dt, summary = agg)) +} + +results <- run_all() +``` diff --git a/tools/NNS/examples/prophet_NNS_comparison.html b/tools/NNS/examples/prophet_NNS_comparison.html new file mode 100644 index 0000000..d2cbd57 --- /dev/null +++ b/tools/NNS/examples/prophet_NNS_comparison.html @@ -0,0 +1,458 @@ + + + + + + + + + + + + + + +prophet and NNS.ARMA comparison + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + +
    +

    Required Packages

    +

    These examples were performed using NNS 3.7, currently only available on CRAN.

    +
    require(NNS);require(prophet);require(dplyr);require(datasets);require(quantmod)
    +
    +
    +

    Retail Sales Example

    +

    Below is the retail sales data included in the Facebook prophet repository.

    +
    retail<-read.csv("https://raw.githubusercontent.com/facebookincubator/prophet/master/examples/example_retail_sales.csv",header=T,sep = ",")
    +

    We are forecasting the last 50 periods…

    +
    l=length(retail[,1])-50
    +
    +

    prophet model:

    +

    Here is the code for the prophet model:

    +
    m <- prophet(retail[1:l,])
    +
    ## Warning in set_auto_seasonalities(m): Disabling weekly seasonality. Run
    +## prophet with `weekly.seasonality=TRUE` to override this.
    +
    ## Initial log joint probability = -2.42146
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=50,freq = 'm')
    +retail_fcst <- predict(m, future)
    +plot(m,retail_fcst)
    +

    +
    +
    +

    NNS model:

    +

    Here is the code for the NNS model:

    +
    nns.retail.fcst<- NNS.ARMA(retail[,2],h=50,seasonal.factor = 12,method = 'lin',training.set = l)
    +

    +
    +
    +

    prophet MSE:

    +
    mean((tail(retail_fcst$yhat,50)-tail(retail[,2],50))^2)
    +
    ## [1] 2392859440
    +
    +
    +

    NNS MSE:

    +
    mean((nns.retail.fcst-tail(retail[,2],50))^2)
    +
    ## [1] 52793721
    +
    +
    +
    +

    SCORE NNS: 1 prophet: 0

    +
    +

    Peyton Manning Example (Raw Values):

    +

    Next is the raw Peyton Manning example from the Facebook prophet repository.

    +
    omaha<- read.csv("https://raw.githubusercontent.com/facebookincubator/prophet/master/examples/example_wp_peyton_manning.csv")
    +

    We are forecasting the last 365 periods…

    +
    l=length(omaha[,1])-365
    +
    +

    prophet model:

    +
    m <- prophet(omaha[1:l,])
    +
    ## Initial log joint probability = -4.0564
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=365)
    +omaha_fcst <- predict(m, future)
    +plot(m,omaha_fcst)
    +

    +
    +
    +

    NNS model:

    +

    NNS uses seasonal.factor=c(1,sf), which combines a daily lag with the best-performing period among those reported by our seasonality test. The settings are not otherwise optimized.

    +
    a=NNS.seas(omaha[,2],plot=F)
    +test=t(sapply(a$all.periods$Period, function(i) 
    +  c(i,mean((NNS.ARMA(omaha[,2],h=365,training.set=l,
    +  seasonal.factor=i,plot=F,method='lin')-tail(omaha[,2],365))^2))))
    +
    +colnames(test)=c("Period","MSE")
    +
    +sf<- test[which.min(test[,2]),1]
    +sf
    +
    ## Period 
    +##    331
    +
    nns.omaha.fcst<- NNS.ARMA(omaha[,2],h=365,seasonal.factor = c(1,sf),method = 'lin',training.set = l)
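
The period search above scores every candidate period on the holdout and keeps the one with the lowest MSE. Here is a minimal, self-contained sketch of that pattern. It deliberately uses a seasonal-naive stand-in forecaster rather than `NNS.ARMA`, and the helper names `seasonal_naive` and `period_search` as well as the synthetic series are assumptions made for illustration.

```r
# Stand-in forecaster (NOT NNS.ARMA): repeat the last `period` training values.
seasonal_naive <- function(x, h, period) {
  last_cycle <- tail(x, period)
  rep(last_cycle, length.out = h)
}

# Hypothetical helper: holdout MSE for each candidate period.
period_search <- function(x, h, periods) {
  n_train <- length(x) - h
  train <- x[seq_len(n_train)]
  actual <- tail(x, h)
  mse <- vapply(periods, function(p) {
    mean((seasonal_naive(train, h, p) - actual)^2)
  }, numeric(1))
  data.frame(Period = periods, MSE = mse)
}

# Synthetic, noise-free series with a true period of 12.
x <- sin(2 * pi * seq_len(120) / 12)
res <- period_search(x, h = 24, periods = c(6, 7, 12, 15))
best <- res$Period[which.min(res$MSE)]

print(res)
print(best)
```

One design caveat: when the period is selected on the same holdout that is later used to report the error, the reported error tends to be optimistic. A separate validation window would avoid that.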
    +

    +
    +
    +

    prophet MSE:

    +
    mean((tail(omaha_fcst$yhat,365)-tail(omaha[,2],365))^2)
    +
    ## [1] 150018073
    +
    +
    +

    NNS MSE:

    +
    mean((nns.omaha.fcst-tail(omaha[,2],365))^2)
    +
    ## [1] 101266229
    +
    +
    +
    +
    +

    SCORE NNS: 2 prophet: 0

    +
    +

    Peyton Manning Example (log values):

    +

    Using the log values of the preceding Peyton Manning dataset.

    +
    log_omaha<- read.csv("https://raw.githubusercontent.com/facebookincubator/prophet/master/examples/example_wp_peyton_manning.csv") %>% mutate(y = log(y))
    +

    Again, we are forecasting the last 365 periods…

    +
    l=length(log_omaha[,1])-365
    +
    +

    prophet model:

    +
    m <- prophet(log_omaha[1:l,])
    +
    ## Initial log joint probability = -16.5944
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=365)
    +log_omaha_fcst <- predict(m, future)
    +plot(m,log_omaha_fcst)
    +

    +
    +
    +

    NNS model:

    +

    NNS uses seasonal.factor=c(1,sf), which combines a daily lag with the best-performing period among those reported by our seasonality test. The settings are not otherwise optimized.

    +
    a=NNS.seas(log_omaha[,2],plot=F)
    +test=t(sapply(a$all.periods$Period, function(i) 
    +  c(i,mean((NNS.ARMA(log_omaha[,2],h=365,training.set=l,
    +  seasonal.factor=i,plot=F,method='lin')-tail(log_omaha[,2],365))^2))))
    +
    +colnames(test)=c("Period","MSE")
    +
    +sf<- test[which.min(test[,2]),1]
    +sf
    +
    ## Period 
    +##    362
    +
    nns.log.omaha.fcst<- NNS.ARMA(log_omaha[,2],h=365,seasonal.factor = c(1,sf),method = 'nonlin',training.set = l)
    +

    +
    +
    +

    prophet MSE:

    +
    mean((tail(log_omaha_fcst$yhat,365)-tail(log_omaha[,2],365))^2)
    +
    ## [1] 0.3617284
    +
    +
    +

    NNS MSE:

    +
    mean((nns.log.omaha.fcst-tail(log_omaha[,2],365))^2)
    +
    ## [1] 0.6019713
    +
    +
    +
    +
    +

    SCORE NNS: 2 prophet: 1

    +
    +

    Notes

    +

    This dataset presents more questions than it answers. Specifically:

    +
      +
    • Why is one method not consistently better between raw and log datasets?

    • +
    • There is a stark difference in seasonal periods from NNS…does the log transformation really have that big of an effect on the peaks?

    • +
    • Can we recover better raw estimates from translated log estimates for prophet?

    • +
    +

    No. While better, it is still materially worse than NNS…

    +
    mean((exp(1)^tail(log_omaha_fcst$yhat,365)-tail(omaha[,2],365))^2)
    +
    ## [1] 127292436
    +
      +
    • Can we recover better log estimates from translated raw estimates for NNS? No.
    • +
    +
    mean((log(nns.omaha.fcst)-tail(log_omaha[,2],365))^2)
    +
    ## [1] 1.871977
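
The two back-transformation checks above compare forecasts across scales. The sketch below illustrates two points under stated assumptions: `exp(1)^x` is the same transformation as `exp(x)`, and an error measured on the raw scale is not comparable in magnitude to one measured on the log scale. The helper `raw_scale_mse` and all numbers are hypothetical.

```r
# exp(1)^x and exp(x) give the same back-transformation.
log_fc <- c(8.5, 9.0, 9.2)            # hypothetical log-scale forecasts
stopifnot(isTRUE(all.equal(exp(1)^log_fc, exp(log_fc))))

# Hypothetical helper: score a log-scale forecast on the raw scale.
raw_scale_mse <- function(log_forecast, raw_actual) {
  mean((exp(log_forecast) - raw_actual)^2)
}

raw_actual <- exp(c(8.4, 9.1, 9.2))   # hypothetical raw observations
mse_raw <- raw_scale_mse(log_fc, raw_actual)
mse_log <- mean((log_fc - log(raw_actual))^2)

print(c(raw = mse_raw, log = mse_log))
```

The same forecast errors look small on the log scale and large on the raw scale, so each comparison between methods should be made on a single, shared scale.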
    +
    +
    +
    +

    lynx dataset

    +

    Next we test the Canadian lynx dataset suggested in Andrew Gelman’s blog post on prophet: http://andrewgelman.com/2017/03/01/facebooks-prophet-uses-stan/

    +
    data(lynx)
    +l=length(lynx)-34
    +
    +

    prophet model and RMSE:

    +
    m <- prophet(data.frame(ds=1:80,y=lynx[1:80]))
    +
    ## Warning in set_auto_seasonalities(m): Disabling yearly seasonality. Run
    +## prophet with `yearly.seasonality=TRUE` to override this.
    +
    ## Initial log joint probability = -4.87324
    +## Optimization terminated normally: 
    +##   Convergence detected: absolute parameter change was below tolerance
    +
    future <- make_future_dataframe(m,periods=35)
    +fcast <- predict(m,future)
    +sqrt(mean((lynx[-(1:80)]-fcast[-(1:80),"yhat"])^2))
    +
    ## Warning in lynx[-(1:80)] - fcast[-(1:80), "yhat"]: longer object length is
    +## not a multiple of shorter object length
    +
    ## [1] 1823.442
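
The recycling warning above appears because the holdout has 34 observations while the forecast has 35 rows beyond the training window. The sketch below shows how the holdout length can be derived from the data and how a forecast can be trimmed so the two vectors align. The forecast values here are a hypothetical placeholder, not prophet output, so the RMSE reported above is not reproduced.

```r
data(lynx)                       # ships with base R in the `datasets` package
n_train <- 80L
h <- length(lynx) - n_train      # number of held-out years
print(h)

# Subtracting vectors of length 34 and 35 triggers recycling with a warning,
# so the error would be computed on misaligned pairs. Align lengths first.
actual <- as.numeric(lynx[-seq_len(n_train)])
fake_fc <- rep(mean(actual), h + 1L)   # hypothetical 35-step forecast

aligned_rmse <- sqrt(mean((actual - head(fake_fc, h))^2))
print(aligned_rmse)
```

Setting the forecast horizon to `h` rather than a hard-coded value is one way to keep the two lengths in sync; the RMSE reported above would likely change slightly under that alignment.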
    +
    plot(m,fcast)
    +

    +
    +
    +

    NNS model and RMSE:

    +
    sqrt(mean((NNS.ARMA(lynx,h=34,training.set = 80,method='lin',seasonal.factor=F,best.periods=2)-tail(lynx,34))^2))
    +

    +
    ## [1] 1252.73
    +
    +
    +
    +

    SCORE NNS: 3 prophet: 1

    +
    +
    +

    S&P 500

    +

    Finally we are going to forecast 252 S&P 500 adjusted closing prices. We retrieve the data via quantmod commands:

    +
    getSymbols("SPY",src='yahoo')
    +
    ## [1] "SPY"
    +
    SPY=as.numeric(Ad(SPY))
    +
    +

    prophet model:

    +
    l=length(SPY)-252
    +oos = tail(SPY,252)
    +
    +m <- prophet(data.frame(ds=1:l,y=SPY[1:l]))
    +
    ## Initial log joint probability = -33.9079
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=252)
    +SPY_fcst <- predict(m, future)
    +plot(m,SPY_fcst)
    +

    +
    +
    +

    prophet RMSE:

    +
    sqrt(mean((oos-tail(SPY_fcst$yhat,252))^2))
    +
    ## [1] 30.20166
    +
    +
    +

    Linear Regression

    +

    We present the linear regression only as a visual reference; its residuals are clearly too large for it to merit consideration as a forecast.

    +
    plot(SPY)
    +abline(lm(SPY[1:l]~c(1:l)),col='red')
    +

    +
    +

    NNS model and RMSE:

    +

    NNS is using a seasonal.factor=c(1,5,21,63,126,252), which translates to daily, weekly, monthly, quarterly, semiannual, and annual lags for S&P closing prices.

    +
    nns.sp = NNS.ARMA(SPY,h=252,seasonal.plot = F,seasonal.factor =c(1,5,21,63,126,252), training.set = l,method = 'nonlin')
    +

    +
    sqrt(mean((oos-nns.sp)^2))
    +
    ## [1] 22.7117
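
The seasonal factors used for the S&P forecast can be derived from the usual convention of 252 trading days per year. A minimal sketch follows; the variable names are illustrative, and the 252-day year is the assumption already implied by the lags in the text.

```r
# Trading-day lags implied by a 252-day trading year.
trading_days_per_year <- 252
cycles_per_year <- c(daily = 252, weekly = 52, monthly = 12,
                     quarterly = 4, semiannual = 2, annual = 1)

# Days per cycle, rounded to whole trading days.
lags <- round(trading_days_per_year / cycles_per_year)
print(lags)
```

The weekly lag is the only one that needs rounding, since 252 / 52 is not a whole number of days.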
    +
    +
    +
    +
    +

    SCORE NNS: 4 prophet: 1

    +
    +
    +

    S&P500 - Year 2

    +

    So far so good…let’s try a different year:

    +
    +

    prophet model and RMSE:

    +
    l=length(SPY)-252*2
    +oos = tail(SPY[1:l],252)
    +
    +m <- prophet(data.frame(ds=1:l,y=SPY[1:l]))
    +
    ## Initial log joint probability = -37.7735
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=252)
    +SPY_fcst <- predict(m, future)
    +plot(m,SPY_fcst)
    +

    +
    sqrt(mean((oos-tail(SPY_fcst$yhat,252))^2))
    +
    ## [1] 18.51615
    +
    +
    +

    NNS model and RMSE:

    +
    nns.sp = NNS.ARMA(SPY,h=252,seasonal.plot = F,seasonal.factor =c(1,5,21,63,126,252), training.set = l,method = 'nonlin')
    +

    +
    sqrt(mean((oos-nns.sp)^2))
    +
    ## [1] 16.00169
    +
    +
    +
    +

    SCORE NNS: 5 prophet: 1

    +
    +
    +

    S&P500 - Year 3

    +

    So far so good (again)…let’s try a different year:

    +
    +

    prophet model and RMSE:

    +
    l=length(SPY)-252*3
    +oos = tail(SPY[1:l],252)
    +
    +m <- prophet(data.frame(ds=1:l,y=SPY[1:l]))
    +
    ## Initial log joint probability = -41.1845
    +## Optimization terminated normally: 
    +##   Convergence detected: relative gradient magnitude is below tolerance
    +
    future <- make_future_dataframe(m,periods=252)
    +SPY_fcst <- predict(m, future)
    +plot(m,SPY_fcst)
    +

    +
    sqrt(mean((oos-tail(SPY_fcst$yhat,252))^2))
    +
    ## [1] 27.84086
    +
    +
    +

    NNS model and RMSE:

    +
    nns.sp = NNS.ARMA(SPY,h=252,seasonal.plot = F,seasonal.factor =c(1,5,21,63,126,252), training.set = l,method = 'nonlin')
    +

    +
    sqrt(mean((oos-nns.sp)^2))
    +
    ## [1] 24.72559
    +
    +
    +
    +

    SCORE NNS: 6 prophet: 1

    +
    +
    +

    Summary

    +

    To summarize, NNS was more accurate out of sample (often by a substantial margin) on every dataset except the log-transformed Peyton Manning dataset. However, prophet's runtimes were far faster.

    +

    To be fair, prophet was left at its default settings, and interested readers are more than welcome to suggest settings that improve its performance. NNS was not optimized either, so its performance could likely improve as well.

    +

    Comments welcome at: ovvo.financial.systems@gmail.com

    +

    Further NNS.ARMA examples are available at: https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/NNS%20ARMA.pdf

    +

    Thank you for your interest!

    +
    + + + + +
    + + + + + + + + diff --git a/tools/NNS/examples/secants.png b/tools/NNS/examples/secants.png new file mode 100644 index 0000000..427714d Binary files /dev/null and b/tools/NNS/examples/secants.png differ diff --git a/tools/NNS/examples/tides.html b/tools/NNS/examples/tides.html new file mode 100644 index 0000000..e4764ce --- /dev/null +++ b/tools/NNS/examples/tides.html @@ -0,0 +1,3056 @@ + + + + + + + + + + + + + + + +Tide Prediction Using NNS + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    Introduction

    +

    John Mount of Win-Vector LLC recently posted a very interesting +example of tide prediction using known theory generating frequencies to +model tides vs. alternative methods of generated frequencies and +prediction thereof. Please give the post a full read to familiarize +yourself with the problem:

    +

    http://www.win-vector.com/blog/2019/08/lord-kelvin-data-scientist/

    +
    +
    +

    Predict Tides Using NNS

    +

    This is an excellent application of the time-series forecasting +method in the NNS R-package.

    +
    +

    Step 1: Install the Latest Version of +NNS (>= 11.0)

    +
    #library(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +library(NNS)
    +
    +
    +

    Step 2: Read & Load the Variables, Create Train and Test +Sets

    +

    Download tides.RDS from the following URL:

    +

    https://github.com/WinVector/Examples/blob/master/Tides/tides.RDS

    +

    Create the training set dtrain and the test set +dtest using the dates provided in the original post. In the +final step, we determine the length of the training set to optimize +on.

    +
    tides <- readRDS('tides.RDS')
    +
    +base_date_time =  as.POSIXct('2001/01/01 00:00', tz = "UTC")
    +first_date_time =  as.POSIXct('2019/06/01 00:00', tz = "UTC")
    +cut_date_time = as.POSIXct('2019/07/15 00:00', tz = "UTC")
    +
    +dtrain <- tides[tides$dt<cut_date_time, , drop = FALSE]
    +dtest <- tides[tides$dt>=cut_date_time, , drop = FALSE]
    +
    +training_length <- dim(dtrain)[1] - dim(dtest)[1]
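
A minimal sketch of the same cut-date split on synthetic data, for readers who do not have tides.RDS at hand. The toy data frame only assumes the two column names used above (`dt`, `tide_feet`); the sine wave is a placeholder, not tide data.

```r
# Synthetic 6-minute readings standing in for tides.RDS.
dt <- seq(as.POSIXct("2019-07-01 00:00", tz = "UTC"),
          as.POSIXct("2019-07-20 23:54", tz = "UTC"), by = "6 min")
toy <- data.frame(dt = dt, tide_feet = sin(2 * pi * seq_along(dt) / 124))

cut_date_time <- as.POSIXct("2019-07-15 00:00", tz = "UTC")

# Everything strictly before the cut is training data, the rest is test data.
toy_train <- toy[toy$dt <  cut_date_time, , drop = FALSE]
toy_test  <- toy[toy$dt >= cut_date_time, , drop = FALSE]

# The two pieces partition the data and do not overlap in time.
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy))
stopifnot(max(toy_train$dt) < min(toy_test$dt))

# Observations left for fitting once a validation block the size of the
# test set is held back from the end of the training data.
nrow(toy_train) - nrow(toy_test)
```

Six-minute readings give 240 observations per day, which is the same figure passed as `modulo` in the next step.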
    +
    +
    +

    Step 3: Determine Frequencies, Optimize & Forecast

    +
    +

    3.1: Determine Frequencies

    +

    In this step we will ascertain the most relevant periods via the +NNS.seas() function. To isolate just the periods, we extract the +$periods element of its output and store it in nns_periods.

    +
    nns_start.time <- Sys.time()
    +nns_periods <- NNS.seas(dtrain$tide_feet, modulo = 240, mod.only = FALSE)$periods
    +

    +
    head(nns_periods)
    +
    ## [1] 84719 91920 95267 84711 84718 84710
    +
    # Total relevant periods...
    +length(nns_periods)
    +
    ## [1] 65591
    +
    +
    +

    3.2: Optimize the Combination of Frequencies

    +

    Now we will utilize the nns_periods in the +NNS.ARMA.optim function, as well as the number of forecast +periods h.

    +

    We are only going to use the first 200 relevant periods below 44,000. +There are thousands more relevant periods that can be included but would +require additional computational resources.

    +
    nns_periods <- nns_periods[nns_periods<44000]
    +
    +arma_parameters <- NNS.ARMA.optim(variable = dtrain$tide_feet, 
    +                                  h = nrow(dtest),
    +                                  training.set = length(dtrain$tide_feet) - nrow(dtest), 
    +                                  pred.int = .95,
    +                                  seasonal.factor = nns_periods[1:200],
    +                                  print.trace = FALSE)
    +
    ## Time difference of 19.9947 mins
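
The subsetting step above is a filter followed by taking the first 200 entries. A minimal base-R sketch of that pattern on a made-up vector of ranked periods (these values are hypothetical, not NNS.seas output) shows one detail worth knowing: positional indexing pads with NA when fewer than 200 entries survive the filter, whereas `head()` does not.

```r
# Hypothetical ranked periods, most relevant first.
ranked_periods <- c(84719, 91920, 39246, 22979, 95267, 39247, 120)

# Keep only periods below the cutoff; the ranking order is preserved.
kept <- ranked_periods[ranked_periods < 44000]

# head() returns at most n elements and never pads.
top_head <- head(kept, 200)

# Positional indexing pads with NA when fewer than n elements remain.
top_index <- kept[1:200]

length(top_head)        # 4
sum(is.na(top_index))   # 196
```

In the run above more than 200 periods remain after filtering, so both forms would select the same values there.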
    +
    +
    +

    3.3: Forecast

    +

    Finally, we extract the forecast. It is already provided in our arma_parameters object +(as $results), along with the optimized parameters shown below that +NNS.ARMA uses for the forecast.

    +
    arma_parameters[1:6]
    +
    ## $periods
    +## [1] 39246 22979 39247
    +## 
    +## $weights
    +## NULL
    +## 
    +## $obj.fn
    +## [1] 0.01529544
    +## 
    +## $method
    +## [1] "lin"
    +## 
    +## $shrink
    +## [1] FALSE
    +## 
    +## $nns.regress
    +## [1] FALSE
    +
    nns_estimates <- arma_parameters$results
    +
    +
    +
    +

    Step 4: Evaluate the Results

    +

    Let’s see how we did. The R-squared1 between predicted and +actual is the presented metric.

    +

    \[ R^2 = \frac{\left[\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{y})\right]^2}{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (\hat{y}_i - \bar{y})^2} \]

    +
    (sum((nns_estimates - mean(dtest$tide_feet)) * (dtest$tide_feet - mean(dtest$tide_feet))) ^ 2) / (sum((dtest$tide_feet - mean(dtest$tide_feet)) ^ 2) * sum((nns_estimates - mean(dtest$tide_feet)) ^ 2))
    +
    ## [1] 0.9534088
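
A minimal sketch of how this metric behaves, using synthetic data rather than the tide series. Because both sums in the formula above are centred on the mean of the actual values, the measure is penalised when the forecast is biased, unlike the squared Pearson correlation; `r2_doc()` is a hypothetical helper name.

```r
# R-squared as written in the formula above: both series are centred on
# the mean of the actual values.
r2_doc <- function(actual, predicted) {
  ybar <- mean(actual)
  sum((predicted - ybar) * (actual - ybar))^2 /
    (sum((actual - ybar)^2) * sum((predicted - ybar)^2))
}

set.seed(42)
signal    <- sin(seq(0, 20, length.out = 500))
actual    <- signal + rnorm(500, sd = 0.1)
predicted <- signal + 0.3            # forecast with a constant bias

r2_doc(actual, predicted)            # lowered by the bias
cor(actual, predicted)^2             # unaffected by the bias

# Removing the bias makes the two measures agree.
unbiased <- predicted - mean(predicted) + mean(actual)
all.equal(r2_doc(actual, unbiased), cor(actual, unbiased)^2)
```

The numerators of the two measures are always identical, so the version used here can never exceed the squared correlation.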
    +
    library(ggplot2)
    +ggplot(aes(x=dt), data=dtest) +
    +  geom_line(aes(y=tide_feet), color='blue', alpha=1) + 
    +  geom_line(aes(y=nns_estimates), color='black', alpha=0.5) +
    +  geom_line(aes(y=arma_parameters$lower.pred.int), color='red', alpha=0.25) +
    +  geom_line(aes(y=arma_parameters$upper.pred.int), color='red', alpha=0.25) +
    +  ggtitle("prediction (black) superimposed on actuals (blue) on test")
    +

    +
    ggplot_data <- data.frame(cbind(nns_estimates,dtest, arma_parameters$lower.pred.int, arma_parameters$upper.pred.int))
    +
    +ggplot(aes(x=nns_estimates,y=tide_feet), data = ggplot_data) +
    +  geom_point(alpha=0.1) + 
    +  ggtitle("prediction versus actual on test")
    +

    +
    +
    +

    SCUM Forecasts and Timings

    +

    In addition to the NNS forecasts, the SCUM method (here, the median of the available ETS, ARIMA, Theta, and CES forecasts) was +employed to generate forecasts and compare timings.

    +
    library(forecast)
    +library(smooth)
    +
    ## Loading required package: greybox
    +
    ## Package "greybox", v2.0.3 loaded.
    +
    ## This is package "smooth", v4.1.0
    +
    # Start timing for SCUM
    +scum_start.time <- Sys.time()
    +
    +# Generate SCUM forecasts
    +h <- length(dtest$tide_feet)
    +
    +# Helper function for safe forecasting
    +safe_forecast <- function(forecast_function, data, h) {
    +  tryCatch({
    +    forecast_function(data, h = h)$mean
    +  }, error = function(e) {
    +    message(paste("Error in", deparse(substitute(forecast_function)), ":", e$message))
    +    return(rep(NA, h)) # Return NA for each horizon in case of an error
    +  })
    +}
    +
    +# Perform individual forecasts safely
    +ets_forecast <- safe_forecast(function(data, h) forecast(ets(data), h = h), dtrain$tide_feet, h)
    +arima_forecast <- safe_forecast(function(data, h) forecast(auto.arima(data), h = h), dtrain$tide_feet, h)
    +theta_forecast <- safe_forecast(function(data, h) forecast(thetaf(data), h = h), dtrain$tide_feet, h)
    +
    ## Error in forecast_function : Please select a longer horizon when the forecasts are first computed
    +
    ces_forecast <- safe_forecast(function(data, h) forecast(ces(data), h = h), dtrain$tide_feet, h)
    +
    +# Combine forecasts, excluding errors (NA)
    +forecasts_matrix <- rbind(ets_forecast, arima_forecast, theta_forecast, ces_forecast)
    +forecasts_matrix <- forecasts_matrix[complete.cases(forecasts_matrix), ] # Remove rows with NA
    +scum_forecast <- apply(forecasts_matrix, 2, median)
    +
    +
    +# End timing for SCUM
    +scum_end.time <- Sys.time()
    +
    +# SCUM execution time
    +scum_time <- scum_end.time - scum_start.time
    +scum_time
    +
    ## Time difference of 6.835971 mins
    +
    # SCUM evaluation
    +(sum((scum_forecast - mean(dtest$tide_feet)) * (dtest$tide_feet - mean(dtest$tide_feet))) ^ 2) / (sum((dtest$tide_feet - mean(dtest$tide_feet)) ^ 2) * sum((scum_forecast - mean(dtest$tide_feet)) ^ 2))
    +
    ## [1] 2.294788e-07
    +
    # Plot SCUM forecasts vs actual
    +ggplot() +
    +  geom_line(aes(x = dtest$dt, y = dtest$tide_feet), color = 'blue', alpha = 1) +
    +  geom_line(aes(x = dtest$dt, y = scum_forecast), color = 'green', alpha = 0.7) +
    +  ggtitle("SCUM Forecasts vs Actual")
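
The combination step used for SCUM above can be sketched on its own with made-up numbers: stack the individual forecasts as rows, drop any model that failed, and take the median at each horizon. The forecasts below are hypothetical, and the `drop = FALSE` is an addition in this sketch so that the result stays a matrix even if only one model survives.

```r
h <- 5
# Hypothetical forecasts from four models; one failed and returned NA.
f_a <- c(1.0, 1.2, 1.4, 1.6, 1.8)
f_b <- c(0.8, 1.0, 1.1, 1.3, 1.5)
f_c <- rep(NA_real_, h)
f_d <- c(1.4, 1.5, 1.9, 2.0, 2.4)

m <- rbind(f_a, f_b, f_c, f_d)

# Remove failed models; drop = FALSE keeps a matrix in every case.
m <- m[complete.cases(m), , drop = FALSE]

# Combine by taking the median across models at each horizon.
combined <- apply(m, 2, median)
combined   # 1.0 1.2 1.4 1.6 1.8
```

With three surviving models the median is simply the middle forecast at each step, which is why the combined path coincides with `f_a` here.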
    +

    +
    +
    +
    +

    Comments

    +

    The NNS method produces a very good result without any +knowledge of or access to the tide machine, and one significantly better than +the R-squared of 0.81 obtained with FFT frequencies used in the Elastic +Net Regularized Linear Regression in the original blog post. Could +NNS perform better? Yes. NNS.ARMA and +NNS.ARMA.optim are active areas of development.

    +

    Furthermore, a larger cloud-based instance would be able to handle +more relevant periods, and data more granular than the 6-minute intervals +would also help. How much better NNS could perform is an +open question…

    +

    NNS is not a one-trick pony, as it has been demonstrated +to excel in time-series forecasting and nonlinear continuous regressions, +and to provide solutions for econometric applications. See the following +examples:

    + +

    I look forward to further discussions and collaboration with those +equally as passionate about these issues, and open to embracing +alternative solutions. If you found this presentation interesting or +useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + + +
    +
    + +
    + + + + + + + + + + + + + + + + diff --git a/tools/NNS/examples/xgboost_example.html b/tools/NNS/examples/xgboost_example.html new file mode 100644 index 0000000..acbddd2 --- /dev/null +++ b/tools/NNS/examples/xgboost_example.html @@ -0,0 +1,3054 @@ + + + + + + + + + + + + + +NNS vs. XGBOOST + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    + +
    + + + + + + + +
    +

    Intro

    +
    +

    NNS.boost

    +

    NNS.boost is a routine in the NNS R-package for ensemble classification based on the NNS.reg routine. You can read more about the underlying clustering and regression methods here:

    +

    Nonparametric Regression Using Clusters http://rdcu.be/tz0J.

    +
    +
    +

    xgboost

    +

    Abstract:

    +

    xgboost is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

    +

    Full paper available here: https://arxiv.org/abs/1603.02754

    +
    +

    Install the Latest Version of NNS (>= 0.4.4) and xgboost

    +
    library(devtools); install_github('OVVO-Financial/NNS', ref = "NNS-Beta-Version")
    +require(NNS)
    +require(plyr)
    +require(xgboost)
    +require(methods)
    +require(pROC)
    +require(doParallel) # also attaches foreach and parallel, used by the parallel loop in Example #1
    +
    +
    +
    +
    +

    Example #1: UCI Credit Approval Classification

    +

    Basic example and other methods provided here https://rpubs.com/omicsdata/credit.

    +
    +

    Step 1: Load Data UCI Credit Screening

    +
    dataset = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data",sep=",", header=F, na.strings="?")
    +dataset = dataset[complete.cases(dataset),]
    +head(dataset)
    +
    + +
    +
    +
    +

    Step 2: Run NNS.boost and xgboost over 1,000 iterations

    +
    accuracy = list()
    +
    +## convert factor into dummy variable matrix
    +into_factor = function(x){
    +     if(is.factor(x)){
    +         n = length(x)
    +         data.fac = data.frame(x=x, y=1:n)
    +         output = model.matrix(y~x, data.fac)[,-1]
    +         
    +     }
    +     else{
    +        output = x
    +     }
    +     output
    +}
    +
    +### Prepare dataset for xgboost
    +dataset2 = dataset
    +
    +dataset2 = colwise(into_factor)(dataset2)
    +dataset2 = do.call(cbind, dataset2)
    +dataset2 = as.data.frame(dataset2)
    +
    +
    +## For parallel processing
    +cores = detectCores()
    +cl <- makeCluster(cores[1]-1)
    +registerDoParallel(cl)
    +
    +
    +accuracy <- foreach(i = 1:1000,.packages = c("NNS","pROC","xgboost","plyr"))%dopar%{
    +set.seed(123+i)
    +n = dim(dataset)[1]
    +index = sample(n, round(0.7*n))
    +
    +### NNS.dataset
    +train = dataset[index,]
    +test=dataset[-index,]
    +
    +nns.predictions = NNS.boost(IVs.train = train[,1:15], 
    +                          DV.train = train[,16],
    +                          IVs.test= test[,1:15], feature.importance = FALSE,
    +                          representative.sample = FALSE, 
    +                          epochs = 100, learner.trials = 100,
    +                          depth = "max", ncores = 1, subcores = 1)$results
    +
    +accuracy$nns = auc(roc(as.numeric(test[,16]),round(nns.predictions)))
    +
    +
    +### xgboost dataset
    +train.xg = dataset2[index,]
    +test.xg = dataset2[-index,]
    +label = as.matrix(train.xg[,38, drop=F])
    +data = as.matrix(train.xg[,-38, drop=F])
    +data2 = as.matrix(test.xg[,-38, drop=F])
    +label2 = as.matrix(test.xg[,38,drop=F])
    +
    +### Prepare xgboost parameters
    +xgmat = xgb.DMatrix(data, label=label, missing = -10000)
    +param = list("object" = "binary:logistic", "best:eta" = 1, "bst:max_depth"=10, "eval_metric" = "logloss", "silent"=1,"nthread"=16,"min_child_weight"=1.45)
    +nround = 500
    +
    +### Run xgboost model
    +bst = xgb.train(param, xgmat, nround)
    +res1 = predict(bst, data2)
    +
    +pre1 = ifelse(res1>0.5, 1, 0)
    +
    +accuracy$xgb = auc(roc(as.numeric(label2), pre1))
    +
    +return(accuracy)
    +}
    +
    +stopCluster(cl)
    +registerDoSEQ()
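
The dummy-variable step inside `into_factor()` above can be looked at in isolation. This is a minimal base-R sketch on a made-up three-level factor; it only illustrates what `model.matrix()` returns once the intercept column is dropped.

```r
# A factor column with three levels.
x <- factor(c("a", "b", "c", "a", "b"))

# model.matrix() needs a response only to satisfy the formula; 1:n is a placeholder.
dat <- data.frame(x = x, y = seq_along(x))
dummies <- model.matrix(y ~ x, dat)[, -1]   # drop the intercept column

dummies
colnames(dummies)   # "xb" "xc": level "a" is the reference and gets no column
```

Note that since R 4.0.0 `read.table()` returns character columns by default, so the factor branch is only reached when the columns are read or converted as factors (for example with `stringsAsFactors = TRUE`).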
    +
    +
    +

    Step 3: Compare Accuracy

    +

    On average, NNS performs as well as or better than xgboost across various seeds. Instead of using just the standard deviation of results, we compare the lower partial moment LPM(), because (as in finance) really good estimates to the upside should not be penalized as merely an increase in variance!

    +
    nns.accuracy=unlist(lapply(accuracy, `[[`, 1))
    +xgb.accuracy=unlist(lapply(accuracy, `[[`, 2))
    +
    +rbind(
    +  "Mean NNS" = mean(nns.accuracy),
    +  "Mean XGB" = mean(xgb.accuracy),
    +  "LPM NNS" = LPM(2, mean(nns.accuracy), nns.accuracy),
    +  "LPM XGB" = LPM(2, mean(xgb.accuracy), xgb.accuracy))
    +
    ##                  [,1]
    +## Mean NNS 0.8608110673
    +## Mean XGB 0.8558886609
    +## LPM NNS  0.0002272005
    +## LPM XGB  0.0002693473
    +
    t.test(nns.accuracy,xgb.accuracy)
    +
    ## 
    +##  Welch Two Sample t-test
    +## 
    +## data:  nns.accuracy and xgb.accuracy
    +## t = 4.9593, df = 1991.5, p-value = 7.674e-07
    +## alternative hypothesis: true difference in means is not equal to 0
    +## 95 percent confidence interval:
    +##  0.002975846 0.006868967
    +## sample estimates:
    +## mean of x mean of y 
    +## 0.8608111 0.8558887
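
To make the lower partial moment comparison above concrete, here is a minimal base-R sketch. `lpm2()` and `upm2()` are hypothetical helpers written for this illustration; `lpm2()` is assumed to match what `LPM(2, target, x)` computes, and the accuracy values are made up.

```r
# Degree-2 lower partial moment about a target: only shortfalls below the
# target contribute.
lpm2 <- function(target, x) mean(pmax(target - x, 0)^2)
# Upper counterpart: only values above the target contribute.
upm2 <- function(target, x) mean(pmax(x - target, 0)^2)

acc <- c(0.84, 0.85, 0.86, 0.86, 0.87, 0.95)   # hypothetical accuracies
mu  <- mean(acc)

lpm2(mu, acc)           # downside dispersion only
upm2(mu, acc)           # upside dispersion only
mean((acc - mu)^2)      # population variance

# The two partial moments about the mean add up to the population variance.
all.equal(lpm2(mu, acc) + upm2(mu, acc), mean((acc - mu)^2))
```

In this toy example most of the variance comes from the single very good run, which the lower partial moment ignores; that is the reason for reporting it alongside the mean.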
    +
    +
    +
    +

    Example #2: agaricus Dataset

    +

    This time we will utilize the data from within xgboost, the agaricus dataset. It describes mushrooms in terms of physical characteristics, and each is classified as poisonous or edible.

    +
    ### Load [agaricus] dataset
    +data(agaricus.train, package = 'xgboost')
    +data(agaricus.test, package = 'xgboost')
    +
    +### Set up data for xgboost
    +dtrain = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    +dtest = xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    +
    +
    +### Create parameters and run xgboost
    +watchlist = list(train = dtrain, eval = dtest)
    +
    +param = list(max_depth = 2, eta = 1, silent = 1, nthread = 2, 
    +              objective = "binary:logistic", eval_metric = "auc")
    +
    +bst = xgb.train(param, dtrain, nrounds = 2, watchlist)
    +
    ## [1]  train-auc:0.958228  eval-auc:0.960373 
    +## [2]  train-auc:0.981413  eval-auc:0.979930
    +
    ### Set up data for NNS
    +agaricus.iv.train = as.matrix(agaricus.train$data)
    +agaricus.dv.train = agaricus.train$label
    +agaricus.iv.test = as.matrix(agaricus.test$data)
    +agaricus.dv.test = agaricus.test$label
    +
    +
    +### Run NNS.boost
    +a = NNS.boost(IVs.train = agaricus.iv.train, DV.train = agaricus.dv.train, 
    +            IVs.test = agaricus.iv.test, feature.importance = FALSE, 
    +            representative.sample = TRUE, epochs = 100, learner.trials = 100,
    +            depth = "max", ncores = 1, subcores = 1)$results
    +
    +
    +### Evaluate accuracy  
    +mean(round(a) == as.numeric(agaricus.dv.test))
    +
    ## [1] 1
    +
    ### Evaluate AUC
    +auc(roc(as.numeric(agaricus.dv.test), round(a)))
    +
    ## Area under the curve: 1
    +

    As we can see, NNS achieves a perfect classification of the substantial (1,611 observations) out-of-sample test set while the xgboost AUC is approximately 98%.

    +
    +
    +

    Comments

    +

    NNS is a full machine learning methodology, all based on the seamless underlying clustering NNS.part embedded within the regression method NNS.reg.

    +

    +

    xgboost is maintained by an army of computer scientists; NNS is maintained by a single economist! Speed overwhelmingly favors xgboost, but with ample resources and collaboration, NNS will be able to compete on that front as well. Additional computational efficiency would permit even more NNS testing and further improvement!

    +

    NNS is not a one-trick pony, as it has been demonstrated to excel in time-series forecasting and nonlinear continuous regressions, and to provide solutions for econometric applications. See the following examples:

    + +

    I look forward to further discussions and collaboration with those equally as passionate about these issues, and open to embracing alternative solutions. If you found this presentation interesting or useful, please feel free to reach out via e-mail:

    +

    Thanks for your interest!

    +
    + + + +
    +
    + +
    + + + + + + + + diff --git a/tools/NNS/man/NNS.part.Rd b/tools/NNS/man/NNS.part.Rd index 0895724..2861d2e 100644 --- a/tools/NNS/man/NNS.part.Rd +++ b/tools/NNS/man/NNS.part.Rd @@ -37,7 +37,7 @@ Returns: \itemize{ \item{\code{"dt"}} a \code{data.table} of \code{x} and \code{y} observations with their partition assignment \code{"quadrant"} in the 3rd column and their prior partition assignment \code{"prior.quadrant"} in the 4th column. \item{\code{"regression.points"}} the \code{data.table} of regression points for that given \code{(order = ...)}. - \item{\code{"order"}} the \code{order} of the final partition given \code{"min.obs.stop"} stopping condition. + \item{\code{"order"}} the \code{order} of the final partition given \code{"min.obs.stop"} stopping condition. } } \description{ diff --git a/tools/NNS/man/NNS_bin.Rd b/tools/NNS/man/NNS_bin.Rd new file mode 100644 index 0000000..f17a34d --- /dev/null +++ b/tools/NNS/man/NNS_bin.Rd @@ -0,0 +1,30 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/RcppExports.R +\name{NNS_bin} +\alias{NNS_bin} +\title{Fast binning of numeric vector into equidistant bins} +\usage{ +NNS_bin(x, width, origin = 0, missinglast = FALSE) +} +\arguments{ +\item{x}{A matrix of regressor variables. Must have the same number of rows as the length of y.} + +\item{width}{The width of the bins} + +\item{origin}{The starting point for the bins. Any number smaller than origin will be disregarded} + +\item{missinglast}{Boolean. Should the missing observations be added as a separate element at the end of the returned count vector.} +} +\value{ +An list with elements counts (the frequencies), origin (the origin), width (the width), missing (the number of missings), and last_bin_is_missing (boolean) telling whether the missinglast is true or not. 
+} +\description{ +Missing values (NA, Inf, NaN) are added at the end of the vector as the last bin returned if missinglast is set to TRUE +} +\examples{ +\dontrun{ +set.seed(1) +x <- sample(10, 20, replace = TRUE) +NNS_bin(x, 15) +} +} diff --git a/tools/NNS/src/Makevars b/tools/NNS/src/Makevars index d378f6e..5080e7d 100644 --- a/tools/NNS/src/Makevars +++ b/tools/NNS/src/Makevars @@ -1,4 +1,4 @@ PKG_LIBS += $(shell ${R_HOME}/bin/Rscript -e "RcppParallel::RcppParallelLibs()") LDFLAGS += -Wl,-rpath,$(shell ${R_HOME}/bin/Rscript -e "cat(system.file('lib', package='RcppParallel'))") -PKG_CPPFLAGS = -DR_NO_REMAP +PKG_CPPFLAGS = -DR_NO_REMAP \ No newline at end of file diff --git a/tools/NNS/src/Makevars.win b/tools/NNS/src/Makevars.win index 61c041e..9534f5f 100644 --- a/tools/NNS/src/Makevars.win +++ b/tools/NNS/src/Makevars.win @@ -4,4 +4,4 @@ PKG_LIBS += $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" \ -e "RcppParallel::RcppParallelLibs()") -PKG_CPPFLAGS = -DR_NO_REMAP +PKG_CPPFLAGS = -DR_NO_REMAP \ No newline at end of file diff --git a/tools/NNS/src/NNS_part.cpp b/tools/NNS/src/NNS_part.cpp index eeb2242..b8c5b51 100644 --- a/tools/NNS/src/NNS_part.cpp +++ b/tools/NNS/src/NNS_part.cpp @@ -10,15 +10,19 @@ using namespace Rcpp; +static inline bool complete_case_value(double x){ + return !ISNAN(x); +} + static inline double mean_no_na(const NumericVector& v){ long double s = 0.0L; std::size_t m = 0; - for(double xi : v) if(R_finite(xi)){ s += xi; ++m; } + for(double xi : v) if(complete_case_value(xi)){ s += xi; ++m; } return m ? 
static_cast(s / m) : NA_REAL; } static inline double median_no_na(const NumericVector& v){ std::vector a; a.reserve(v.size()); - for(double xi : v) if(R_finite(xi)) a.push_back(xi); + for(double xi : v) if(complete_case_value(xi)) a.push_back(xi); if(a.empty()) return NA_REAL; std::size_t n = a.size(); std::nth_element(a.begin(), a.begin() + n / 2, a.end()); @@ -111,18 +115,18 @@ List NNS_part_cpp(NumericVector x, double yi = y[i]; xv[k] = xi; yv[k] = yi; - if(R_finite(xi)){ if(xi < minx) minx = xi; if(xi > maxx) maxx = xi; } - if(R_finite(yi)){ if(yi < miny) miny = yi; if(yi > maxy) maxy = yi; } + if(complete_case_value(xi)){ if(xi < minx) minx = xi; if(xi > maxx) maxx = xi; } + if(complete_case_value(yi)){ if(yi < miny) miny = yi; if(yi > maxy) maxy = yi; } } Pair c{ agg.for_x(xv), agg.for_y(yv) }; centers[q] = c; if(!xonly){ - if(R_finite(c.y) && R_finite(minx) && R_finite(maxx)){ + if(complete_case_value(c.y) && complete_case_value(minx) && complete_case_value(maxx)){ H_x0.push_back(minx); H_x1.push_back(maxx); H_y.push_back(c.y); } - if(R_finite(c.x) && R_finite(miny) && R_finite(maxy)){ + if(complete_case_value(c.x) && complete_case_value(miny) && complete_case_value(maxy)){ V_x.push_back(c.x); V_y0.push_back(miny); V_y1.push_back(maxy); } } @@ -133,10 +137,10 @@ List NNS_part_cpp(NumericVector x, const auto &idx = kv.second; double minx = R_PosInf, maxx = R_NegInf; for(int i : idx){ - if(R_finite(x[i])){ if(x[i] < minx) minx = x[i]; if(x[i] > maxx) maxx = x[i]; } + if(complete_case_value(x[i])){ if(x[i] < minx) minx = x[i]; if(x[i] > maxx) maxx = x[i]; } } - if(R_finite(minx)) V_lines.push_back(minx); - if(R_finite(maxx)) V_lines.push_back(maxx); + if(complete_case_value(minx)) V_lines.push_back(minx); + if(complete_case_value(maxx)) V_lines.push_back(maxx); } } @@ -146,11 +150,11 @@ List NNS_part_cpp(NumericVector x, prior_quadrant[i] = quadrant[i]; int qn; if(!xonly){ - int lox = (R_finite(x[i]) && R_finite(c.x)) ? 
(x[i] <= c.x) : 0; - int loy = (R_finite(y[i]) && R_finite(c.y)) ? (y[i] <= c.y) : 0; + int lox = (complete_case_value(x[i]) && complete_case_value(c.x)) ? (x[i] <= c.x) : 0; + int loy = (complete_case_value(y[i]) && complete_case_value(c.y)) ? (y[i] <= c.y) : 0; qn = 1 + lox + 2 * loy; }else{ - int lox = (R_finite(x[i]) && R_finite(c.x)) ? (x[i] > c.x) : 0; + int lox = (complete_case_value(x[i]) && complete_case_value(c.x)) ? (x[i] > c.x) : 0; qn = 1 + lox; } // OPTIMIZATION 2: Bypass slow string allocators diff --git a/tools/NNS/src/RcppExports.cpp b/tools/NNS/src/RcppExports.cpp index 03bfd49..2a3ff24 100644 --- a/tools/NNS/src/RcppExports.cpp +++ b/tools/NNS/src/RcppExports.cpp @@ -319,14 +319,15 @@ BEGIN_RCPP END_RCPP } // generate_vectors -List generate_vectors(NumericVector x, IntegerVector l); -RcppExport SEXP _NNS_generate_vectors(SEXP xSEXP, SEXP lSEXP) { +List generate_vectors(NumericVector x, IntegerVector l, int len); +RcppExport SEXP _NNS_generate_vectors(SEXP xSEXP, SEXP lSEXP, SEXP lenSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericVector >::type x(xSEXP); Rcpp::traits::input_parameter< IntegerVector >::type l(lSEXP); - rcpp_result_gen = Rcpp::wrap(generate_vectors(x, l)); + Rcpp::traits::input_parameter< int >::type len(lenSEXP); + rcpp_result_gen = Rcpp::wrap(generate_vectors(x, l, len)); return rcpp_result_gen; END_RCPP } @@ -425,6 +426,22 @@ BEGIN_RCPP return rcpp_result_gen; END_RCPP } +// NNS_reg_points_cpp +DataFrame NNS_reg_points_cpp(NumericVector x_, NumericVector y_, NumericVector rpx_, NumericVector rpy_, double dependence, double stn); +RcppExport SEXP _NNS_NNS_reg_points_cpp(SEXP x_SEXP, SEXP y_SEXP, SEXP rpx_SEXP, SEXP rpy_SEXP, SEXP dependenceSEXP, SEXP stnSEXP) { +BEGIN_RCPP + Rcpp::RObject rcpp_result_gen; + Rcpp::RNGScope rcpp_rngScope_gen; + Rcpp::traits::input_parameter< NumericVector >::type x_(x_SEXP); + Rcpp::traits::input_parameter< NumericVector 
>::type y_(y_SEXP); + Rcpp::traits::input_parameter< NumericVector >::type rpx_(rpx_SEXP); + Rcpp::traits::input_parameter< NumericVector >::type rpy_(rpy_SEXP); + Rcpp::traits::input_parameter< double >::type dependence(dependenceSEXP); + Rcpp::traits::input_parameter< double >::type stn(stnSEXP); + rcpp_result_gen = Rcpp::wrap(NNS_reg_points_cpp(x_, y_, rpx_, rpy_, dependence, stn)); + return rcpp_result_gen; +END_RCPP +} // CoLPM_nD_batch_RCPP NumericVector CoLPM_nD_batch_RCPP(const NumericMatrix& data, const NumericMatrix& targets, double degree, bool norm); RcppExport SEXP _NNS_CoLPM_nD_batch_RCPP(SEXP dataSEXP, SEXP targetsSEXP, SEXP degreeSEXP, SEXP normSEXP) { @@ -696,7 +713,7 @@ static const R_CallMethodDef CallEntries[] = { {"_NNS_is_discrete", (DL_FUNC) &_NNS_is_discrete, 1}, {"_NNS_factor_2_dummy", (DL_FUNC) &_NNS_factor_2_dummy, 1}, {"_NNS_factor_2_dummy_FR", (DL_FUNC) &_NNS_factor_2_dummy_FR, 1}, - {"_NNS_generate_vectors", (DL_FUNC) &_NNS_generate_vectors, 2}, + {"_NNS_generate_vectors", (DL_FUNC) &_NNS_generate_vectors, 3}, {"_NNS_generate_lin_vectors", (DL_FUNC) &_NNS_generate_lin_vectors, 3}, {"_NNS_ARMA_seas_weighting", (DL_FUNC) &_NNS_ARMA_seas_weighting, 2}, {"_NNS_NNS_meboot_part", (DL_FUNC) &_NNS_NNS_meboot_part, 7}, @@ -704,6 +721,7 @@ static const R_CallMethodDef CallEntries[] = { {"_NNS_force_clt", (DL_FUNC) &_NNS_force_clt, 2}, {"_NNS_downSample", (DL_FUNC) &_NNS_downSample, 4}, {"_NNS_upSample", (DL_FUNC) &_NNS_upSample, 4}, + {"_NNS_NNS_reg_points_cpp", (DL_FUNC) &_NNS_NNS_reg_points_cpp, 6}, {"_NNS_CoLPM_nD_batch_RCPP", (DL_FUNC) &_NNS_CoLPM_nD_batch_RCPP, 4}, {"_NNS_LPM_CPv", (DL_FUNC) &_NNS_LPM_CPv, 3}, {"_NNS_UPM_CPv", (DL_FUNC) &_NNS_UPM_CPv, 3}, diff --git a/tools/NNS/src/SD.cpp b/tools/NNS/src/SD.cpp index 5d258aa..1ae7529 100644 --- a/tools/NNS/src/SD.cpp +++ b/tools/NNS/src/SD.cpp @@ -305,4 +305,4 @@ int NNS_TSD_uni_cpp(const NumericVector& x, const NumericVector& y){ stop("You have some missing values, please address."); 
ColPre X = precompute_vec(x), Y = precompute_vec(y); return sd_dom_pair(X, Y, 3, true); -} +} \ No newline at end of file diff --git a/tools/NNS/src/central_tendencies.cpp b/tools/NNS/src/central_tendencies.cpp index 77aa081..58f524d 100644 --- a/tools/NNS/src/central_tendencies.cpp +++ b/tools/NNS/src/central_tendencies.cpp @@ -106,66 +106,74 @@ static void simple_bin_counts(const std::vector& xs, // ---------- NNS.gravity ---------- -// [[Rcpp::export]] -SEXP NNS_gravity_cpp(SEXP xSEXP, bool discrete) { - NumericVector xR(xSEXP); - std::vector x; - x.reserve(xR.size()); - for (double v : xR) if (R_finite(v)) x.push_back(v); - +// Core gravity computation operating on a vector of finite values. +// Shared by NNS_gravity_cpp and the NNS.reg regression-point builder. The +// result depends only on the set of values (x is sorted internally), so input +// order is irrelevant. +double gravity_value(std::vector x, bool discrete) { const int l = (int)x.size(); - if (l == 0) return Rf_ScalarReal(NA_REAL); + if (l == 0) return NA_REAL; if (l <= 3) { // median(x) std::vector t = x; std::sort(t.begin(), t.end()); double med = (l % 2 ? 
t[l/2] : 0.5*(t[l/2 - 1] + t[l/2])); - if (discrete) return Rf_ScalarReal( nearest_int_half_up(med) ); - return Rf_ScalarReal(med); + if (discrete) return nearest_int_half_up(med); + return med; } - + bool all_eq = true; for (int i = 1; i < l; ++i) if (x[i] != x[0]) { all_eq = false; break; } - if (all_eq) return Rf_ScalarReal(x[0]); - + if (all_eq) return x[0]; + std::sort(x.begin(), x.end()); double range = std::fabs(x.back() - x.front()); - if (range == 0.0) return Rf_ScalarReal(x.front()); - + if (range == 0.0) return x.front(); + double q1, q2, q3; quartiles_like_R_code(x, q1, q2, q3); - + double width = (q3 - q1) * std::pow((double)l, -0.5); if (!(width > 0.0) || !R_finite(width)) width = range / 128.0; - + std::vector z_names; std::vector counts; simple_bin_counts(x, width, x.front(), z_names, counts); const int lz = (int)counts.size(); - + // If unique max, use neighborhood; else use all bins int maxc = 0; for (int c : counts) if (c > maxc) maxc = c; int ties = 0; for (int c : counts) if (c == maxc) ++ties; - + int lo = 0, hi = lz - 1; if (ties == 1) { int zc = 0; for (int i = 0; i < lz; ++i) if (counts[i] == maxc) { zc = i; break; } lo = std::max(0, zc - 1); hi = std::min(lz - 1, zc + 1); } - + long double num = 0.0L, den = 0.0L; for (int i = lo; i <= hi; ++i) { num += (long double)z_names[i] * (long double)counts[i]; den += (long double)counts[i]; } double m = (den > 0.0L) ? (double)(num / den) : z_names[ (lo+hi)/2 ]; - + double mu = mean_vec(x); double mid = 0.25 * ( q2 + m + mu + 0.5*(q1 + q3) ); - + double out = R_finite(mid) ? 
mid : q2; if (discrete) out = nearest_int_half_up(out); - return Rf_ScalarReal(out); + return out; +} + +// [[Rcpp::export]] +SEXP NNS_gravity_cpp(SEXP xSEXP, bool discrete) { + NumericVector xR(xSEXP); + std::vector x; + x.reserve(xR.size()); + for (double v : xR) if (R_finite(v)) x.push_back(v); + + return Rf_ScalarReal( gravity_value(x, discrete) ); } // ---------- NNS.rescale ---------- diff --git a/tools/NNS/src/central_tendencies.h b/tools/NNS/src/central_tendencies.h index c7b3798..434b9dd 100644 --- a/tools/NNS/src/central_tendencies.h +++ b/tools/NNS/src/central_tendencies.h @@ -13,6 +13,11 @@ /// @return A scalar SEXP containing the estimated center of gravity. SEXP NNS_gravity_cpp(SEXP xSEXP, bool discrete); +/// Core gravity computation operating on a vector of finite values. +/// Shared by NNS_gravity_cpp and the NNS.reg regression-point builder so both +/// produce bit-identical results. +double gravity_value(std::vector x, bool discrete); + /// Compute the mode (or modal class) depending on the supplied flags. /// /// @param xSEXP Input vector supplied from R. @@ -22,4 +27,4 @@ SEXP NNS_gravity_cpp(SEXP xSEXP, bool discrete); /// wrapper. 
SEXP NNS_mode_cpp(SEXP xSEXP, bool discrete, bool multi); -#endif // CENTRAL_TENDENCIES_H +#endif // CENTRAL_TENDENCIES_H \ No newline at end of file diff --git a/tools/NNS/src/internal_functions.cpp b/tools/NNS/src/internal_functions.cpp index c1f7050..3e09439 100644 --- a/tools/NNS/src/internal_functions.cpp +++ b/tools/NNS/src/internal_functions.cpp @@ -176,8 +176,13 @@ SEXP factor_2_dummy_FR(SEXP x) { // ---------- 4) generate.vectors ---------- // [[Rcpp::export(name = "generate.vectors")]] -List generate_vectors(NumericVector x, IntegerVector l) { - int n = x.size(); +List generate_vectors(NumericVector x, IntegerVector l, int len = -1) { + // 'len' lets the caller treat only the first 'len' elements of 'x' as the + // active series, so a persistent pre-allocated buffer can be reused across + // recursive forecast steps without re-allocating or copying a growing + // vector each step. len < 0 (default) preserves the original behaviour of + // using the full length of 'x'. + int n = (len < 0) ? x.size() : len; List comp_series(l.size()), comp_index(l.size()); for (int t = 0; t < l.size(); ++t) { int lag = l[t]; diff --git a/tools/NNS/src/nns_reg_points.cpp b/tools/NNS/src/nns_reg_points.cpp new file mode 100644 index 0000000..2aa2693 --- /dev/null +++ b/tools/NNS/src/nns_reg_points.cpp @@ -0,0 +1,262 @@ +// [[Rcpp::depends(Rcpp)]] +#include +#include +#include +#include +#include "central_tendencies.h" + +using namespace Rcpp; + +// =========================================================================== +// NNS.reg multivariate regression-point builder (native fast path) +// +// Reproduces, bit-for-bit, the regression.points pipeline in R/Regression.R +// (the block that NNS.reg returns for multivariate.call = TRUE) for the gated +// configuration: type = NULL, noise.reduction = "off", dependence < 1, +// dep.reduced.order != "max", smooth = FALSE. All other configurations fall +// back to the pure-R path. 
+// +// This exists because NNS.ARMA / NNS.stack / NNS.boost call NNS.reg hundreds of +// times on small inputs, where per-call data.table overhead dominates. +// =========================================================================== + +// --- mean via long double accumulation. Matches both base R mean() and +// Rcpp sugar mean() (verified identical), the latter used by fast_lm. +static double r_mean_v(const std::vector& v) { + long double s = 0.0L; for (double d : v) s += d; + return (double) (s / (long double) v.size()); +} + +// --- base R mean(c(a, b)) : two values via long double, matching R exactly +static double r_mean2(double a, double b) { + long double s = (long double) a + (long double) b; + return (double) (s / 2.0L); +} + +// --- gravity() : filter to finite then call shared core (continuous) +static double grav(const std::vector& v) { + std::vector f; f.reserve(v.size()); + for (double d : v) if (R_finite(d)) f.push_back(d); + return gravity_value(f, false); +} + +// --- OLS fit replicating src/fast_lm.cpp exactly (intercept a, slope b) +static void ols_fit(const std::vector& xs, const std::vector& ys, + double& a, double& b) { + const double mx = r_mean_v(xs); + const double my = r_mean_v(ys); + double vx = 0.0, cv = 0.0; + for (size_t i = 0; i < xs.size(); ++i) { + const double dx = xs[i] - mx, dy = ys[i] - my; + vx += dx * dx; cv += dx * dy; + } + if (vx == 0.0) { a = my; b = 0.0; } + else { b = cv / vx; a = my - b * mx; } +} + +// --- number of distinct values (exact equality, matching base R unique) +static int count_unique(const std::vector& v) { + std::vector t = v; + std::sort(t.begin(), t.end()); + t.erase(std::unique(t.begin(), t.end()), t.end()); + return (int) t.size(); +} + +// --- unique() preserving first-occurrence order (matches base R unique) +static std::vector unique_preserve(const std::vector& v) { + std::vector out; + for (double d : v) { + bool seen = false; + for (double e : out) if (e == d) { seen = true; break; } + if 
(!seen) out.push_back(d); + } + return out; +} + +// --- consolidate: group by exact x (ascending), y = gravity() of group's y. +// Equivalent to setkey(rp,x); rp[, y := gravity(y), by="x"]; unique(rp). +static void consolidate(const std::vector& xs, const std::vector& ys, + std::vector& ox, std::vector& oy) { + ox.clear(); oy.clear(); + const size_t n = xs.size(); + if (n == 0) return; + std::vector idx(n); + for (size_t i = 0; i < n; ++i) idx[i] = i; + std::stable_sort(idx.begin(), idx.end(), + [&](size_t a, size_t b) { return xs[a] < xs[b]; }); + size_t i = 0; + while (i < n) { + const double cx = xs[idx[i]]; + std::vector grp; + size_t j = i; + while (j < n && xs[idx[j]] == cx) { + if (R_finite(ys[idx[j]])) grp.push_back(ys[idx[j]]); + ++j; + } + ox.push_back(cx); + oy.push_back(gravity_value(grp, false)); + i = j; + } +} + +// [[Rcpp::export]] +DataFrame NNS_reg_points_cpp(NumericVector x_, NumericVector y_, + NumericVector rpx_, NumericVector rpy_, + double dependence, double stn) { + const std::vector x(x_.begin(), x_.end()); + const std::vector y(y_.begin(), y_.end()); + + // min/max of original x, y + double minx = R_PosInf, maxx = R_NegInf, miny = R_PosInf, maxy = R_NegInf; + for (double v : x) { if (v < minx) minx = v; if (v > maxx) maxx = v; } + for (double v : y) { if (v < miny) miny = v; if (v > maxy) maxy = v; } + + // ---- Step A: clamp rp x into [minx, maxx], consolidate ---- + std::vector cx(rpx_.size()), cy(rpy_.begin(), rpy_.end()); + for (int i = 0; i < rpx_.size(); ++i) + cx[i] = std::min(maxx, std::max(rpx_[i], minx)); // pmin(maxx, pmax(rp, minx)) + std::vector rx, ry; + consolidate(cx, cy, rx, ry); + const int N = (int) rx.size(); + + // ---- Step B: central point (med.rps), type = NULL branch ---- + const double medpos = (N + 1) / 2.0; // median(1:N) + const int m_lo = (int) std::floor(medpos); + const int m_hi = (int) std::ceil(medpos); + const double cxlo = rx[m_lo - 1]; + const double cxhi = rx[m_hi - 1]; + const bool two = (m_lo != 
m_hi); // length(unique(central_rows)) > 1 + + double central_y; + if (two) { + std::vector sub; + for (size_t i = 0; i < x.size(); ++i) + if (x[i] >= cxlo && x[i] <= cxhi) sub.push_back(y[i]); + central_y = grav(sub); + } else { + central_y = ry[m_lo - 1]; + } + std::vector cc = { cxlo, cxhi }; // rp[central_rows,]$x (length 2) + const double central_x = grav(cc); // gravity(central_x) + + // ---- Step C: append med.rps, complete.cases, consolidate ---- + std::vector bx = rx, by = ry; + bx.push_back(central_x); by.push_back(central_y); + std::vector fx, fy; + // complete.cases() keeps Inf/-Inf and drops only NA/NaN (ISNAN covers both). + for (size_t i = 0; i < bx.size(); ++i) + if (!ISNAN(bx[i]) && !ISNAN(by[i])) { fx.push_back(bx[i]); fy.push_back(by[i]); } + std::vector cx2, cy2; + consolidate(fx, fy, cx2, cy2); + + // ---- Step D: endpoints (dependence < 1, type = NULL) ---- + const double minr = *std::min_element(cx2.begin(), cx2.end()); + const double maxr = *std::max_element(cx2.begin(), cx2.end()); + const double mid_min = r_mean2(minx, minr); // mean(c(min(x), min(rp$x))) + const double mid_max = r_mean2(maxx, maxr); + + std::vector y_min, y_midmin, x_midmin; + std::vector y_max, y_midmax, x_midmax; + // na.omit() (like complete.cases) keeps Inf/-Inf and drops only NA/NaN. 
+ for (size_t i = 0; i < x.size(); ++i) { + const double xi = x[i], yi = y[i]; + if (xi <= minr && !ISNAN(yi)) y_min.push_back(yi); + if (xi <= mid_min) { if (!ISNAN(yi)) y_midmin.push_back(yi); if (!ISNAN(xi)) x_midmin.push_back(xi); } + if (xi >= maxr && !ISNAN(yi)) y_max.push_back(yi); + if (xi >= mid_max) { if (!ISNAN(yi)) y_midmax.push_back(yi); if (!ISNAN(xi)) x_midmax.push_back(xi); } + } + const int l_y_min = (int) y_min.size(); + const int l_y_midmin = (int) y_midmin.size(); + const int l_y_max = (int) y_max.size(); + const int l_y_midmax = (int) y_midmax.size(); + const int l_x_midmin_unique = count_unique(x_midmin); + const int l_x_midmax_unique = count_unique(x_midmax); + + // y / x values where original x == min(x) / max(x) + std::vector y_at_minx, y_at_maxx; + std::vector xs_le_minr, ys_le_minr, xs_le_midmin, ys_le_midmin; + std::vector xs_ge_maxr, ys_ge_maxr, xs_ge_midmax, ys_ge_midmax; + for (size_t i = 0; i < x.size(); ++i) { + const double xi = x[i], yi = y[i]; + if (xi == minx) y_at_minx.push_back(yi); + if (xi == maxx) y_at_maxx.push_back(yi); + if (xi <= minr) { xs_le_minr.push_back(xi); ys_le_minr.push_back(yi); } + if (xi <= mid_min){ xs_le_midmin.push_back(xi); ys_le_midmin.push_back(yi); } + if (xi >= maxr) { xs_ge_maxr.push_back(xi); ys_ge_maxr.push_back(yi); } + if (xi >= mid_max){ xs_ge_midmax.push_back(xi); ys_ge_midmax.push_back(yi); } + } + + // --- min endpoint x0 --- + std::vector x0; + if (l_x_midmin_unique > 1 && l_y_min > 5) { + if (dependence < stn) { + if (l_y_min > 1 && l_y_midmin > 1) { + double a1, b1, a2, b2; + ols_fit(xs_le_minr, ys_le_minr, a1, b1); + ols_fit(xs_le_midmin, ys_le_midmin, a2, b2); + const double f1 = a1 + b1 * (*std::min_element(xs_le_minr.begin(), xs_le_minr.end())); + const double f2 = a2 + b2 * (*std::min_element(xs_le_midmin.begin(), xs_le_midmin.end())); + // R: sum(f1*l_y.min, f2*l_y.mid.min) / sum(l_y.min, l_y.mid.min) + // each sum() returns a double, then the division is in double. 
+ const double num = (double)((long double)(f1 * l_y_min) + (long double)(f2 * l_y_midmin)); + x0.push_back(num / (double)(l_y_min + l_y_midmin)); + } else { + x0 = y_min; + } + } else { + x0 = unique_preserve(y_at_minx); + } + } else { + x0.push_back(grav(y_at_minx)); + } + const double min_rps_y = r_mean_v(x0); // mean(x0) + + // --- max endpoint x.max --- + std::vector xmaxv; + if (l_x_midmax_unique > 1 && l_y_max > 5) { + if (dependence < stn) { + if (l_y_max > 1 && l_y_midmax > 1) { + double a1, b1, a2, b2; + ols_fit(xs_ge_maxr, ys_ge_maxr, a1, b1); + ols_fit(xs_ge_midmax, ys_ge_midmax, a2, b2); + const double f1 = a1 + b1 * (*std::max_element(xs_ge_maxr.begin(), xs_ge_maxr.end())); + const double f2 = a2 + b2 * (*std::max_element(xs_ge_midmax.begin(), xs_ge_midmax.end())); + const double num = (double)((long double)(f1 * l_y_max) + (long double)(f2 * l_y_midmax)); + xmaxv.push_back(num / (double)(l_y_max + l_y_midmax)); + } else { + xmaxv = y_max; + } + } else { + xmaxv = unique_preserve(y_at_maxx); + } + } else { + xmaxv.push_back(grav(y_at_maxx)); + } + const double max_rps_y = r_mean_v(xmaxv); // mean(x.max) + + // ---- Step E: append min/max/med rps, complete.cases, consolidate ---- + std::vector ex = cx2, ey = cy2; + ex.push_back(minx); ey.push_back(min_rps_y); // min.rps + ex.push_back(maxx); ey.push_back(max_rps_y); // max.rps + ex.push_back(central_x); ey.push_back(central_y); // med.rps + std::vector gx, gy; + // complete.cases() keeps Inf/-Inf and drops only NA/NaN. 
+ for (size_t i = 0; i < ex.size(); ++i) + if (!ISNAN(ex[i]) && !ISNAN(ey[i])) { gx.push_back(ex[i]); gy.push_back(ey[i]); } + std::vector hx, hy; + consolidate(gx, gy, hx, hy); + + // ---- Step F/G: single-row tripling, then clamp ---- + std::vector ox, oy; + if ((int) hx.size() == 1) { + for (int k = 0; k < 3; ++k) { ox.push_back(hx[0]); oy.push_back(hy[0]); } + } else { + ox = hx; oy = hy; + } + for (size_t i = 0; i < ox.size(); ++i) { + ox[i] = std::min(std::max(ox[i], minx), maxx); + oy[i] = std::min(std::max(oy[i], miny), maxy); + } + + return DataFrame::create(_["x"] = wrap(ox), _["y"] = wrap(oy)); +} diff --git a/tools/NNS/src/partial_moments.cpp b/tools/NNS/src/partial_moments.cpp index 2bc723e..40eb2d2 100644 --- a/tools/NNS/src/partial_moments.cpp +++ b/tools/NNS/src/partial_moments.cpp @@ -1027,4 +1027,4 @@ List PMMatrix_CPv( Named("cov.matrix") = covMat ) ); -} +} \ No newline at end of file diff --git a/tools/NNS/tests/testthat/Rplots.pdf b/tools/NNS/tests/testthat/Rplots.pdf index 4646286..d3be77e 100644 Binary files a/tools/NNS/tests/testthat/Rplots.pdf and b/tools/NNS/tests/testthat/Rplots.pdf differ diff --git a/tools/NNS/tests/testthat/test_NNS_reg_infinite.R b/tools/NNS/tests/testthat/test_NNS_reg_infinite.R new file mode 100644 index 0000000..d75eb22 --- /dev/null +++ b/tools/NNS/tests/testthat/test_NNS_reg_infinite.R @@ -0,0 +1,12 @@ +test_that("native partition complete-case handling keeps infinities", { + part <- NNS.part( + x = c(Inf, Inf, NA_real_, NaN), + y = c(1, 2, 3, 4), + order = 1, + obs.req = 0, + noise.reduction = "median" + ) + + expect_true(any(is.infinite(part$regression.points$x))) + expect_false(any(is.na(part$regression.points$x))) +}) diff --git a/tools/NNS/vignettes/NNSvignette_01_Overview.R b/tools/NNS/vignettes/NNSvignette_01_Overview.R new file mode 100644 index 0000000..9fb1efc --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_01_Overview.R @@ -0,0 +1,205 @@ +## ----setup, 
message=FALSE----------------------------------------------------- +# Prereqs (uncomment if needed): +# install.packages("NNS") +# install.packages(c("data.table","xts","zoo","Rfast")) + +library(NNS) +library(data.table) + +## ----include=FALSE, message=FALSE--------------------------------------------- +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----------------------------------------------------------------------------- +set.seed(42) + +# Normal sample +y <- rnorm(3000) +mu <- mean(y) +L2 <- LPM(2, mu, y); U2 <- UPM(2, mu, y) +cat(sprintf("LPM2 + UPM2 = %.6f vs var(y)=%.6f\n", (L2+U2)*(length(y) / (length(y) - 1)), var(y))) + +# Empirical CDF via LPM.ratio(0, t, x) +for (t in c(-1,0,1)) { + cdf_lpm <- LPM.ratio(0, t, y) + cat(sprintf("CDF at t=%+.1f : LPM.ratio=%.4f | empirical=%.4f\n", t, cdf_lpm, mean(y<=t))) +} + +# Asymmetry on a skewed distribution +z <- rexp(3000)-1; mu_z <- mean(z) +cat(sprintf("Skewed z: LPM2=%.4f, UPM2=%.4f (expect imbalance)\n", LPM(2,mu_z,z), UPM(2,mu_z,z))) + +## ----------------------------------------------------------------------------- +M <- NNS.moments(y) +M + +## ----------------------------------------------------------------------------- +set.seed(23) +multimodal <- c(rnorm(1500,-2,.5), rnorm(1500,2,.5)) +NNS.mode(multimodal,multi = TRUE) + +## ----------------------------------------------------------------------------- +qgrid <- LPM.VaR(seq(0.05,0.95,.1),0,z) # equivalent to quantile(z,probs = seq(0.05,0.95,by=0.1)) +CDF_tbl <- data.table(threshold = as.numeric(qgrid), CDF = LPM.ratio(0,qgrid,z)) +CDF_tbl + +## ----------------------------------------------------------------------------- +set.seed(1) +x <- runif(2000,-1,1) +y <- x^2 + rnorm(2000, sd=.05) +cat(sprintf("Pearson r = %.4f\n", cor(x,y))) +cat(sprintf("NNS.dep = %.4f\n", NNS.dep(x,y)$Dependence)) + +X <- data.frame(a=x, b=y, c=x*y + rnorm(2000, sd=.05)) +pm <- 
PM.matrix(1, 1, target = "means", variable=X, pop_adj=TRUE) +pm + +cop <- NNS.copula(X, continuous=TRUE, plot=FALSE) +cop + +## ----eval=FALSE--------------------------------------------------------------- +# # Data +# set.seed(123); x = rnorm(100); y = rnorm(100); z = expand.grid(x, y) +# +# # Plot +# rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "red") +# +# # Uniform values +# u_x = LPM.ratio(0, x, x); u_y = LPM.ratio(0, y, y); z = expand.grid(u_x, u_y) +# +# # Plot +# rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "blue") + +## ----------------------------------------------------------------------------- +A <- rnorm(100, mean = 0, sd = 1) +B <- rnorm(100, mean = 0, sd = 5) +C <- rnorm(100, mean = 10, sd = 1) +D <- rnorm(100, mean = 10, sd = 10) + +X <- data.frame(A, B, C, D) + +# Linear scaling +lin_norm <- NNS.norm(X, linear = TRUE, chart.type=NULL, location=NULL) + +## ----------------------------------------------------------------------------- +px <- 100 + cumsum(rnorm(260, sd = 1)) +rn <- NNS.rescale(px, a=100, b=0.03, method="riskneutral", T=1, type="Terminal") +c( target = 100*exp(0.03*1), mean_rn = mean(rn) ) + +## ----------------------------------------------------------------------------- +ctrl <- rnorm(200, 0, 1) +trt <- rnorm(180, 0.35, 1.2) +NNS.ANOVA(control=ctrl, treatment=trt, means.only=FALSE, plot=FALSE) + +A <- list(g1=rnorm(150,0.0,1.1), g2=rnorm(150,0.2,1.0), g3=rnorm(150,-0.1,0.9)) +NNS.ANOVA(control=A, means.only=TRUE, plot=FALSE) + +## ----stochsuperiority, echo=TRUE---------------------------------------------- +set.seed(123) +x = rnorm(1000, mean = 0, sd = 1) +y = rnorm(1000, mean = 1, sd = 1) + +NNS.SS(x, y) + +## ----stochsuperiorityci, echo=TRUE, eval=FALSE-------------------------------- +# NNS.SS(x, y, confidence.interval = TRUE, reps = 999, ci = 0.95)[1:5] +# +# $p_gt +# [1] 0.233915 +# +# $p_tie +# [1] 0 +# +# $p_star +# [1] 0.233915 +# +# $lower +# [1] 0.2105631 +# +# $upper +# 
[1] 0.2537789 + +## ----stochsuperioritydiscrete, echo=TRUE-------------------------------------- +set.seed(123) +x = sample(1:5, 100, replace = TRUE) +y = sample(1:5, 100, replace = TRUE) + +NNS.SS(x, y) + +## ----fig.width=7, fig.height=5, fig.align='center'---------------------------- +# Example 1: Nonlinear regression +set.seed(123) +x_train <- runif(1000, -2, 2) +y_train <- sin(pi * x_train) + rnorm(1000, sd = 0.2) + +x_test <- seq(-2, 2, length.out = 100) + +NNS.reg(x = x_train, y = y_train, order = NULL, point.est = x_test) + +## ----eval = FALSE------------------------------------------------------------- +# # Simple train/test for boosting & stacking +# test.set = 141:150 +# +# boost <- NNS.boost(IVs.train = iris[-test.set, 1:4], +# DV.train = iris[-test.set, 5], +# IVs.test = iris[test.set, 1:4], +# epochs = 10, learner.trials = 10, +# status = FALSE, balance = TRUE, +# type = "CLASS", folds = 5) +# +# +# mean(boost$results == as.numeric(iris[test.set,5])) +# # [1] 1 +# +# +# boost$feature.weights; boost$feature.frequency +# +# stacked <- NNS.stack(IVs.train = iris[-test.set, 1:4], +# DV.train = iris[-test.set, 5], +# IVs.test = iris[test.set, 1:4], +# type = "CLASS", balance = TRUE, +# ncores = 1, folds = 1) +# mean(stacked$stack == as.numeric(iris[test.set,5])) +# # [1] 1 + +## ----------------------------------------------------------------------------- +NNS.caus(mtcars$hp, mtcars$mpg) # hp -> mpg +NNS.caus(mtcars$mpg, mtcars$hp) # hp -> mpg + +## ----fig.width=7, fig.align='center'------------------------------------------ +# Univariate nonlinear ARMA +z <- as.numeric(scale(sin(1:480/8) + rnorm(480, sd=.35))) + +# Seasonality detection (prints a summary) +seasonal_period <- NNS.seas(z, plot = FALSE) +head(seasonal_period$all.periods) + +# Validate seasonal periods +NNS.ARMA.optim(z, h = 48, seasonal.factor = seasonal_period$periods, plot = TRUE, ncores = 1) + +## ----------------------------------------------------------------------------- +x_ts <- 
cumsum(rnorm(350, sd=.7)) +mb <- NNS.meboot(x_ts, reps=5, rho = 1) +dim(mb["replicates", ]$replicates) + +## ----------------------------------------------------------------------------- +mc <- NNS.MC(x_ts, reps=5, lower_rho=-1, upper_rho=1, by=.5, exp=1) +length(mc$ensemble); names(mc$replicates) + +head(mc$replicates$`rho = 0`) + +## ----------------------------------------------------------------------------- +RA <- rnorm(240, 0.005, 0.03) +RB <- rnorm(240, 0.003, 0.02) +RC <- rnorm(240, 0.006, 0.04) + +NNS.FSD.uni(RA, RB) +NNS.SSD.uni(RA, RB) +NNS.TSD.uni(RA, RB) + +Rmat <- cbind(A=RA, B=RB, C=RC) +try(NNS.SD.cluster(Rmat, degree = 1)) +try(NNS.SD.efficient.set(Rmat, degree = 1)) + diff --git a/tools/NNS/vignettes/NNSvignette_01_Overview.html b/tools/NNS/vignettes/NNSvignette_01_Overview.html new file mode 100644 index 0000000..56f9d4a --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_01_Overview.html @@ -0,0 +1,1362 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Overview + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Overview

    +

    Fred Viole

    + + + +
    # Prereqs (uncomment if needed):
    +# install.packages("NNS")
    +# install.packages(c("data.table","xts","zoo","Rfast"))
    +
    +library(NNS)
    +library(data.table)
    +
    +

    Orientation

    +

    Goal. A complete, hands‑on curriculum for Nonlinear Nonparametric Statistics (NNS) using partial moments. Each section blends narrative intuition, precise math, and executable code.

    +

    Structure.
    1. Foundations — partial moments & variance decomposition
    2. Descriptive & distributional tools
    3. Dependence & nonlinear association
    4. Normalization & Rescaling
    5. Hypothesis testing, ANOVA & Stochastic Superiority
    6. Regression, boosting, stacking & causality
    7. Time series & forecasting
    8. Simulation (max‑entropy) & Monte Carlo
    9. Portfolio & stochastic dominance

    +

    Notation. For a random variable \(X\) and threshold/target \(t\), the population \(n\)‑th partial moments are defined as:

    +

    \[ \operatorname{LPM}(n,t,X) = \int_{-\infty}^{t} (t-x)^{n} \, dF_X(x), \qquad \operatorname{UPM}(n,t,X) = \int_{t}^{\infty} (x-t)^{n} \, dF_X(x). \]

    +

    The empirical estimators replace \(F_X\) with the empirical CDF \(\hat F_N\) of a sample of size \(N\) (or, equivalently, use indicator functions); \(N\) denotes the sample size so that it is not confused with the degree \(n\):

    +

    \[ \widehat{\operatorname{LPM}}_n(t;X) = \frac{1}{N} \sum_{i=1}^N (t-x_i)^n \, \mathbf{1}_{\{x_i \le t\}}, \qquad \widehat{\operatorname{UPM}}_n(t;X) = \frac{1}{N} \sum_{i=1}^N (x_i-t)^n \, \mathbf{1}_{\{x_i > t\}}. \]

    +

    These correspond to integrals over the measurable subsets \(\{X \le t\}\) and \(\{X > t\}\) in a \(\sigma\)‑algebra; the empirical sums are discrete analogues of Lebesgue integrals.
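The empirical estimators above can be sketched in a few lines of plain Python. This is an illustrative reading of the formulas only: the names `lpm` and `upm` are hypothetical helpers written for this sketch, not the NNS package functions, and the four-point sample is made up so the arithmetic can be checked by hand.

```python
def lpm(degree, target, xs):
    """Empirical lower partial moment (hypothetical helper, not the NNS API).

    Averages (target - x)^degree over observations with x <= target,
    dividing by the full sample size.
    """
    total = 0.0
    for x in xs:
        if x <= target:
            total += (target - x) ** degree
    return total / len(xs)


def upm(degree, target, xs):
    """Empirical upper partial moment (hypothetical helper, not the NNS API).

    Averages (x - target)^degree over observations with x > target,
    dividing by the full sample size.
    """
    total = 0.0
    for x in xs:
        if x > target:
            total += (x - target) ** degree
    return total / len(xs)


# Made-up sample, small enough to verify by hand.
sample = [1.0, 2.0, 3.0, 4.0]

lpm1 = lpm(1, 2.5, sample)  # (1.5 + 0.5) / 4
upm1 = upm(1, 2.5, sample)  # (0.5 + 1.5) / 4
lpm0 = lpm(0, 2.5, sample)  # share of observations <= 2.5

print(lpm1, upm1, lpm0)  # 0.5 0.5 0.5
```

With degree 0 every qualifying observation contributes 1, so the lower partial moment reduces to the share of observations at or below the target.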

    +
    +
    +
    +

    1. Foundations — Partial Moments & Variance Decomposition

    +
    +

    1.1 Why partial moments

    +
      +
    • Classical variance treats upside and downside symmetrically. Partial moments separate them, allowing asymmetric risk/reward analysis around a chosen target \(t\) (often the mean or a benchmark).
    • +
    • At \(t=\mu_X\): \[ \operatorname{Var}(X) = \operatorname{UPM}(2,\mu_X,X) + \operatorname{LPM}(2,\mu_X,X)\quad\text{(exact empirical identity)}. \] This is not the same as splitting conditional variances around a threshold; partial moments use a global reference, preserving the between‑group contribution.
    • +
    +
    +
    +

    1.2 Core functions and headers

    +
      +
    • LPM(degree, target, variable)
    • +
    • UPM(degree, target, variable)
    • +
    +
    +
    +

    1.3 Code: variance decomposition & CDF

    +
    set.seed(42)
    +
    +# Normal sample
    +y <- rnorm(3000)
    +mu <- mean(y)
    +L2 <- LPM(2, mu, y); U2 <- UPM(2, mu, y)
    +cat(sprintf("LPM2 + UPM2 = %.6f vs var(y)=%.6f\n", (L2+U2)*(length(y) / (length(y) - 1)), var(y)))
    +
    ## LPM2 + UPM2 = 1.011889 vs var(y)=1.011889
    +
    # Empirical CDF via LPM.ratio(0, t, x)
    +for (t in c(-1,0,1)) {
    +  cdf_lpm <- LPM.ratio(0, t, y)
    +  cat(sprintf("CDF at t=%+.1f : LPM.ratio=%.4f | empirical=%.4f\n", t, cdf_lpm, mean(y<=t)))
    +}
    +
    ## CDF at t=-1.0 : LPM.ratio=0.1633 | empirical=0.1633
    +## CDF at t=+0.0 : LPM.ratio=0.5043 | empirical=0.5043
    +## CDF at t=+1.0 : LPM.ratio=0.8480 | empirical=0.8480
    +
    # Asymmetry on a skewed distribution
    +z <- rexp(3000)-1; mu_z <- mean(z)
    +cat(sprintf("Skewed z: LPM2=%.4f, UPM2=%.4f (expect imbalance)\n", LPM(2,mu_z,z), UPM(2,mu_z,z)))
    +
    ## Skewed z: LPM2=0.2780, UPM2=0.7682 (expect imbalance)
    +

    Interpretation. The equality LPM2 + UPM2 == var(y) holds once the Bessel adjustment length(y) / (length(y) - 1) is applied, because deviations are measured against the global mean. LPM.ratio(0, t, y) constructs an empirical CDF directly from partial‑moment counts.
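The two claims in this interpretation can be checked numerically. The following is a minimal sketch in plain Python rather than R; `lpm` and `upm` are hypothetical helpers that follow the empirical formulas from the Notation section, and Python's random generator differs from R's, so the printed values will not match the R output above.

```python
import math
import random
import statistics


def lpm(degree, target, xs):
    # Hypothetical helper: empirical lower partial moment over x <= target.
    return sum((target - x) ** degree for x in xs if x <= target) / len(xs)


def upm(degree, target, xs):
    # Hypothetical helper: empirical upper partial moment over x > target.
    return sum((x - target) ** degree for x in xs if x > target) / len(xs)


random.seed(42)
y = [random.gauss(0.0, 1.0) for _ in range(3000)]
n_obs = len(y)
mu = statistics.fmean(y)

# Degree-2 partial moments around the global mean sum to the population
# variance; the Bessel factor converts that to the sample variance.
decomposed = (lpm(2, mu, y) + upm(2, mu, y)) * n_obs / (n_obs - 1)
sample_var = statistics.variance(y)

# Degree-0 lower partial moment equals the empirical CDF at the target.
cdf_from_lpm = lpm(0, 0.0, y)
ecdf = sum(1 for v in y if v <= 0.0) / n_obs

print(math.isclose(decomposed, sample_var, rel_tol=1e-9))
print(math.isclose(cdf_from_lpm, ecdf))
```

Both comparisons use a tolerance because floating-point summation order differs between the two routes to the same quantity.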

    +
    +
    +
    +
    +

    2. Descriptive & Distributional Tools

    +
    +

    2.1 Higher moments from partial moments

    +

    Define asymmetric analogues of skewness/kurtosis using \(\operatorname{UPM}_3\), \(\operatorname{LPM}_3\) (and degree 4), yielding robust tail diagnostics without parametric assumptions.

    +

    Header.

    +
      +
    • NNS.moments(x)
    • +
    +
    M <- NNS.moments(y)
    +M
    +
    ## $mean
    +## [1] -0.0114498
    +## 
    +## $variance
    +## [1] 1.011552
    +## 
    +## $skewness
    +## [1] -0.007412142
    +## 
    +## $kurtosis
    +## [1] 0.06723772
    +
    +
    +

    2.2 Mode estimation (no bin‑or‑bandwidth angst)

    +

    Header.

    +
      +
    • NNS.mode(x)
    • +
    +
    set.seed(23)
    +multimodal <- c(rnorm(1500,-2,.5), rnorm(1500,2,.5))
    +NNS.mode(multimodal,multi = TRUE)
    +
    ## [1] -2.049405  1.987674
    +
    +
    +

    2.3 CDF tables via LPM ratios

    +

    Headers.

    +
      +
    • LPM.ratio(degree = 0, target, variable) (empirical CDF when degree=0)
    • +
    • UPM.ratio(degree = 0, target, variable)
    • +
    • LPM.VaR(p, degree, variable) (quantiles via partial‑moment CDFs)
    • +
    • UPM.VaR(p, degree, variable)
    • +
    +
    qgrid <- LPM.VaR(seq(0.05,0.95,.1),0,z) # equivalent to quantile(z,probs = seq(0.05,0.95,by=0.1))
    +CDF_tbl <- data.table(threshold = as.numeric(qgrid), CDF = LPM.ratio(0,qgrid,z))
    +CDF_tbl
    +
    ##       threshold   CDF
    +##           <num> <num>
    +##  1: -0.94052127  0.05
    +##  2: -0.83748109  0.15
    +##  3: -0.71317882  0.25
    +##  4: -0.57443327  0.35
    +##  5: -0.41017671  0.45
    +##  6: -0.20424962  0.55
    +##  7:  0.06850182  0.65
    +##  8:  0.41462712  0.75
    +##  9:  0.94307172  0.85
    +## 10:  2.09633977  0.95
    +
    +
    +
    +
    +

    3. Dependence & Nonlinear Association

    +
    +

    3.1 Why move beyond Pearson \(r\)

    +

    Pearson captures linear monotone relationships. Many structures (U‑shapes, saturation, asymmetric tails) produce near‑zero \(r\) despite strong dependence. Partial‑moment dependence metrics respond to such structure.
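A short plain-Python sketch of the U‑shape case may make this concrete. It does not reproduce `NNS.dep`; it only shows, under the assumption of a noisy parabola, that the full-sample Pearson coefficient is near zero while each half of the data (split at zero) is strongly correlated. The helper `pearson` is written for this sketch.

```python
import math
import random


def pearson(xs, ys):
    # Plain Pearson correlation coefficient, written out for illustration.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y = [v * v + random.gauss(0.0, 0.05) for v in x]  # noisy U-shape

r_all = pearson(x, y)

# Splitting at x = 0 exposes the structure that the global r averages away.
left = [(a, b) for a, b in zip(x, y) if a <= 0.0]
right = [(a, b) for a, b in zip(x, y) if a > 0.0]
r_left = pearson([a for a, _ in left], [b for _, b in left])
r_right = pearson([a for a, _ in right], [b for _, b in right])

print(f"all: {r_all:.3f}  left: {r_left:.3f}  right: {r_right:.3f}")
```

The opposite-signed halves cancel in the global coefficient, which is the kind of structure a quadrant-based measure is designed to keep.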

    +

    Headers.

    +
      +
    • Co.LPM(degree_lpm, x, y, target_x, target_y, degree_y) / Co.UPM(...) (co‑partial moments)
    • +
    • PM.matrix(LPM_degree, UPM_degree, target=NULL, variable, pop_adj=TRUE)
    • +
    • NNS.dep(x, y) (scalar dependence coefficient)
    • +
    • NNS.copula(X, target=NULL, continuous=TRUE, plot=FALSE, independence.overlay=FALSE)
    • +
    +
    +
    +

    3.2 Code: nonlinear dependence

    +
    set.seed(1)
    +x <- runif(2000,-1,1)
    +y <- x^2 + rnorm(2000, sd=.05)
    +cat(sprintf("Pearson r = %.4f\n", cor(x,y)))
    +
    ## Pearson r = 0.0006
    +
    cat(sprintf("NNS.dep  = %.4f\n", NNS.dep(x,y)$Dependence))
    +
    ## NNS.dep  = 0.7097
    +
    X <- data.frame(a=x, b=y, c=x*y + rnorm(2000, sd=.05))
    +pm <- PM.matrix(1, 1, target = "means", variable=X, pop_adj=TRUE)
    +pm
    +
    ## $cupm
    +##            a          b          c
    +## a 0.17384174 0.05668152 0.10450858
    +## b 0.05668152 0.05566363 0.04414923
    +## c 0.10450858 0.04414923 0.07529373
    +## 
    +## $dupm
    +##              a          b            c
    +## a 0.0000000000 0.05675501 0.0005598221
    +## b 0.0143108307 0.00000000 0.0036839026
    +## c 0.0004239566 0.04430691 0.0000000000
    +## 
    +## $dlpm
    +##              a           b            c
    +## a 0.0000000000 0.014310831 0.0004239566
    +## b 0.0567550147 0.000000000 0.0443069142
    +## c 0.0005598221 0.003683903 0.0000000000
    +## 
    +## $clpm
    +##            a           b           c
    +## a 0.16803827 0.014485430 0.102709867
    +## b 0.01448543 0.037120650 0.003051617
    +## c 0.10270987 0.003051617 0.074865823
    +## 
    +## $cov.matrix
    +##              a             b            c
    +## a 0.3418800141  0.0001011068  0.206234664
    +## b 0.0001011068  0.0927842833 -0.000789973
    +## c 0.2062346637 -0.0007899730  0.150159552
    +
    cop <- NNS.copula(X, continuous=TRUE, plot=FALSE)
    +cop
    +
    ## [1] 0.5692785
    +
    +
    +

    3.3 Code: copula

    +
    # Data
    +set.seed(123); x = rnorm(100); y = rnorm(100); z = expand.grid(x, y)
    +
    +# Plot
    +rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "red")
    +
    +# Uniform values
    +u_x = LPM.ratio(0, x, x); u_y = LPM.ratio(0, y, y); z = expand.grid(u_x, u_y)
    +
    +# Plot
    +rgl::plot3d(z[,1], z[,2], Co.LPM(0, z[,1], z[,2], z[,1], z[,2]), col = "blue")
    +

    Interpretation. NNS.dep remains high for curved relationships; PM.matrix collects co‑partial moments across variables; NNS.copula summarizes higher‑dimensional dependence using partial‑moment ratios. Copulas are returned and evaluated via Co.LPM functions.

    +
    +
    +
    +
    +

    4. Normalization and Rescaling

    +

    NNS provides two main tools for scaling data while preserving rank structure and distributional shape. Both operate via deterministic affine transformations.

    +
    +

    4.1 Normalization

    +

    NNS.norm() rescales variables to a common magnitude while preserving distributional structure. The method can be linear (all variables forced to have the same mean) or nonlinear (using dependence weights to produce a more nuanced scaling). In the nonlinear case, the degree of association between variables influences the final normalized values.

    +

    Header.

    +
      +
    • NNS.norm(x, linear=TRUE, chart.type = NULL)
    • +
    +
    A <- rnorm(100, mean = 0, sd = 1)
    +B <- rnorm(100, mean = 0, sd = 5)
    +C <- rnorm(100, mean = 10, sd = 1)
    +D <- rnorm(100, mean = 10, sd = 10)
    +
    +X <- data.frame(A, B, C, D)
    +
    +# Linear scaling
    +lin_norm <- NNS.norm(X, linear = TRUE, chart.type=NULL, location=NULL)
    +

    Interpretation. NNS.norm() brings variables to a common scale without distorting their distributional shape. Linear mode equalizes means; nonlinear mode additionally weights each variable by its dependence with others, so more correlated variables exert greater influence on the final scaling.

    +
    +
    +

    4.2 Risk‑neutral rescale (pricing context)

    +

    NNS.rescale() performs one‑dimensional affine transformations.

    +

    Header.

    +
      +
    • NNS.rescale(x, a, b, method=c("minmax","riskneutral"), T=NULL, type=c("Terminal","Discounted"))
    • +
    +
    px <- 100 + cumsum(rnorm(260, sd = 1))
    +rn <- NNS.rescale(px, a=100, b=0.03, method="riskneutral", T=1, type="Terminal")
    +c( target = 100*exp(0.03*1), mean_rn = mean(rn) )
    +
    ##   target  mean_rn 
    +## 103.0455 103.0455
    +

    Interpretation. riskneutral shifts the mean to match \(S_0 e^{rT}\) (Terminal) or \(S_0\) (Discounted), preserving distributional shape.
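As a rough illustration of the mean-matching idea, the sketch below rescales a simulated price path so that its mean equals the stated target. It assumes a purely multiplicative rescaling, which is only one affine map that achieves this; the transformation used inside `NNS.rescale` is not shown in this section and may differ. The function name and arguments are hypothetical.

```python
import math
import random


def rescale_to_target_mean(prices, s0, r, t, terminal=True):
    """Hypothetical helper: scale prices so their mean hits the target.

    Assumption: a single multiplicative factor is applied. The target is
    s0 * exp(r * t) for the terminal case and s0 for the discounted case,
    following the text above.
    """
    target = s0 * math.exp(r * t) if terminal else s0
    current_mean = sum(prices) / len(prices)
    factor = target / current_mean
    return [p * factor for p in prices]


random.seed(7)

# Simulated random-walk prices starting near 100 (made-up data).
px = []
level = 100.0
for _ in range(260):
    level += random.gauss(0.0, 1.0)
    px.append(level)

rn = rescale_to_target_mean(px, s0=100.0, r=0.03, t=1.0, terminal=True)
target_mean = 100.0 * math.exp(0.03 * 1.0)
mean_rn = sum(rn) / len(rn)

print(round(target_mean, 4), round(mean_rn, 4))
```

A positive scale factor keeps the ordering of observations intact, which is the sense in which the distributional shape is preserved in this sketch.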

    +
    +
    +
    +
    +

    5. Hypothesis Testing, ANOVA & Stochastic Superiority

    +
    +

    5.1 Concept

    +

    Instead of distributional assumptions, compare groups via LPM‑based CDFs. Output is a degree of certainty (not a p‑value) for equality of populations or means.

    +

    Header.

    +
      +
    • NNS.ANOVA(control, treatment, means.only=FALSE, medians=FALSE, confidence.interval=.95, tails=c("Both","left","right"), pairwise=FALSE, plot=TRUE, robust=FALSE)
    • NNS.SS(x, y, ...)
    +
    +
    +

    5.2 Code: two‑sample & multi‑group

    +
    ctrl <- rnorm(200, 0, 1)
    +trt  <- rnorm(180, 0.35, 1.2)
    +NNS.ANOVA(control=ctrl, treatment=trt, means.only=FALSE, plot=FALSE)
    +
    ## $Control
    +## [1] 0.05568255
    +## 
    +## $Treatment
    +## [1] 0.2771257
    +## 
    +## $Grand_Statistic
    +## [1] 0.1605767
    +## 
    +## $Control_CDF
    +## [1] 0.5670595
    +## 
    +## $Treatment_CDF
    +## [1] 0.4385169
    +## 
    +## $Certainty
    +## [1] 0.6905098
    +## 
    +## $Effect_Size_LB
    +##        2.5% 
    +## -0.07055716 
    +## 
    +## $Effect_Size_UB
    +##     97.5% 
    +## 0.5317766 
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +
    A <- list(g1=rnorm(150,0.0,1.1), g2=rnorm(150,0.2,1.0), g3=rnorm(150,-0.1,0.9))
    +NNS.ANOVA(control=A, means.only=TRUE, plot=FALSE)
    +
    ## Certainty 
    +## 0.6876008
    +

    Math sketch. For each quantile/threshold \(t\), compare CDFs built from LPM.ratio(0, t, •) (possibly with one‑sided tails). Aggregate across \(t\) to a certainty score.
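The building block of that comparison, a degree-0 lower partial moment ratio evaluated at a common threshold, can be mimicked in base R. The sketch assumes the threshold is the pooled mean of both groups and uses a hypothetical stand-in `lpm_ratio0()`; it does not reproduce the certainty aggregation performed by NNS.ANOVA().

```r
# Hypothetical stand-in for a degree-0 LPM ratio: the share of
# observations at or below the target, i.e. the empirical CDF.
lpm_ratio0 <- function(target, x) mean(x <= target)

set.seed(7)
ctrl <- rnorm(200, 0, 1)
trt  <- rnorm(180, 0.35, 1.2)

t_grand <- mean(c(ctrl, trt))            # assumed common threshold: pooled mean

cdfs <- c(control   = lpm_ratio0(t_grand, ctrl),
          treatment = lpm_ratio0(t_grand, trt))
cdfs
```

If the two groups shared a distribution, both CDF values at the common threshold would be close to each other; a persistent gap between them is what drives the certainty score down.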

    +
    +
    +

    5.3 Stochastic Superiority

    +

    Stochastic superiority asks a different question than equality of means or equality of distributions. Rather than testing whether two samples came from the same population, or whether they share the same mean or median, stochastic superiority measures the probability that a random draw from one distribution exceeds a random draw from another.

    +

    For two random variables \(X\) and \(Y\), the stochastic superiority probability is:

    +

    \[ P(X > Y) \]

    +

    and with ties accounted for, the tie-adjusted stochastic superiority measure is:

    +

    \[ P^* = P(X > Y) + \frac{1}{2} P(X = Y) \]

    +

    A value of \(P^* = 0.5\) indicates no directional advantage, values above \(0.5\) favor \(X\), and values below \(0.5\) favor \(Y\).
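The definition can be checked by brute force over all sample pairs. The sketch below is a plain empirical estimator written in base R, not the NNS.SS() implementation; `stoch_superiority()` is a hypothetical helper whose output names simply mirror the quantities defined above.

```r
# Hypothetical helper (not part of NNS): empirical P(X > Y), P(X = Y)
# and the tie-adjusted P* computed over every (x_i, y_j) pair.
stoch_superiority <- function(x, y) {
  d     <- outer(x, y, "-")      # matrix of all pairwise differences x_i - y_j
  p_gt  <- mean(d > 0)
  p_tie <- mean(d == 0)
  c(p_gt = p_gt, p_tie = p_tie, p_star = p_gt + 0.5 * p_tie)
}

# Small discrete example: 9 pairs, 2 with x > y and 2 ties
res <- stoch_superiority(c(1, 2, 3), c(2, 2, 4))
res
```

The pairwise matrix needs memory proportional to `length(x) * length(y)`, so this form is only practical for moderate sample sizes.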

    +

    This differs from stochastic dominance. Stochastic superiority is a pairwise exceedance probability, while stochastic dominance requires one distribution to be preferred to another over the entire shared support.

    +

    Below is an example comparing two distributions with unequal means.

    +
    set.seed(123)
    +x = rnorm(1000, mean = 0, sd = 1)
    +y = rnorm(1000, mean = 1, sd = 1)
    +
    +NNS.SS(x, y)
    +
    ## $p_gt
    +## [1] 0.233915
    +## 
    +## $p_tie
    +## [1] 0
    +## 
    +## $p_star
    +## [1] 0.233915
    +

    Since \(y\) was generated with a higher mean, the stochastic superiority probability for \(x\) relative to \(y\) should be less than \(0.5\), indicating that a draw from \(x\) is less likely to exceed a draw from \(y\).

    +

    We can also obtain confidence intervals for the tie-adjusted superiority probability using maximum entropy bootstrap replicates.

    +
    NNS.SS(x, y, confidence.interval = TRUE, reps = 999, ci = 0.95)[1:5]
    +
    +$p_gt
    +[1] 0.233915
    +
    +$p_tie
    +[1] 0
    +
    +$p_star
    +[1] 0.233915
    +
    +$lower
    +[1] 0.2105631
    +
    +$upper
    +[1] 0.2537789
    +

    This provides an interpretable effect size for directional comparison between two distributions without requiring identical distributions or equal variances.

    +

    For discrete variables, ties may occur with positive probability, and the reported p_tie and p_star values reflect that adjustment explicitly.

    +
    set.seed(123)
    +x = sample(1:5, 100, replace = TRUE)
    +y = sample(1:5, 100, replace = TRUE)
    +
    +NNS.SS(x, y)
    +
    ## $p_gt
    +## [1] 0.3982
    +## 
    +## $p_tie
    +## [1] 0.1992
    +## 
    +## $p_star
    +## [1] 0.4978
    +
    +
    +
    +
    +

    6. Regression, Boosting, Stacking & Causality

    +
    +

    6.1 Philosophy

    +

    NNS.reg learns partitioned relationships using partial‑moment weights — linear where appropriate, nonlinear where needed — avoiding fragile global parametric forms.
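A rough feel for partition-based fitting can be had in base R. The sketch below is an analogy only, under the assumption that the curve is summarized by local (mean x, mean y) points joined by straight lines; it uses equal-count bins rather than the partial-moment partitions of NNS.reg, and `piecewise_fit()` is a hypothetical helper.

```r
# Hypothetical helper (not part of NNS): summarize (x, y) by bin means
# and predict by linear interpolation between those points.
piecewise_fit <- function(x, y, n_bins = 16) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))
  bin    <- cut(x, breaks = breaks, include.lowest = TRUE)
  pts_x  <- as.numeric(tapply(x, bin, mean))   # local x centers
  pts_y  <- as.numeric(tapply(y, bin, mean))   # local y centers
  # rule = 2 holds the end values constant outside the fitted range
  function(new_x) approx(pts_x, pts_y, xout = new_x, rule = 2)$y
}

set.seed(123)
x_train <- runif(1000, -2, 2)
y_train <- sin(pi * x_train) + rnorm(1000, sd = 0.2)

fit    <- piecewise_fit(x_train, y_train)
x_test <- seq(-2, 2, length.out = 100)
y_hat  <- fit(x_test)

# Root mean squared error against the noise-free curve
rmse <- sqrt(mean((y_hat - sin(pi * x_test))^2))
```

No global functional form is imposed: the fit is linear within each segment and the nonlinearity comes from the sequence of segments, which is the property the paragraph above describes.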

    +

    Headers.

    +
      +
    • NNS.reg(x, y, order=NULL, smooth=TRUE, ncores=1, ...) → $Fitted.xy, $Point.est, …
    • NNS.boost(IVs.train, DV.train, IVs.test, epochs, learner.trials, status, balance, type, folds)
    • NNS.stack(IVs.train, DV.train, IVs.test, type, balance, ncores, folds)
    • NNS.caus(x, y) (directional causality score via conditional dependence)
    +
    +
    +

    6.2 Code: classification via regression + ensembles

    +
    # Example 1: Nonlinear regression
    +set.seed(123)
    +x_train <- runif(1000, -2, 2)
    +y_train <- sin(pi * x_train) + rnorm(1000, sd = 0.2)
    +
    +x_test <- seq(-2, 2, length.out = 100)
    +
    +NNS.reg(x = x_train, y = y_train, order = NULL, point.est = x_test)
    +

    +
    ## $R2
    +## [1] 0.9276761
    +## 
    +## $SE
    +## [1] 0.2015258
    +## 
    +## $Prediction.Accuracy
    +## NULL
    +## 
    +## $equation
    +## NULL
    +## 
    +## $x.star
    +## NULL
    +## 
    +## $derivative
    +##     Coefficient X.Lower.Range X.Upper.Range
    +##           <num>         <num>         <num>
    +##  1:   3.0485215  -1.998138604  -1.934370540
    +##  2:   3.5169373  -1.934370540  -1.804387149
    +##  3:   1.8605016  -1.804387149  -1.692769075
    +##  4:   0.6783073  -1.692769075  -1.590915710
    +##  5:   0.4272848  -1.590915710  -1.465816449
    +##  6:  -0.5144026  -1.465816449  -1.376464546
    +##  7:  -1.9381128  -1.376464546  -1.229726997
    +##  8:  -3.0106084  -1.229726997  -1.110428636
    +##  9:  -2.5210796  -1.110428636  -0.976623793
    +## 10:  -3.7347021  -0.976623793  -0.870193992
    +## 11:  -2.0861598  -0.870193992  -0.754706576
    +## 12:  -2.1796417  -0.754706576  -0.636846031
    +## 13:  -0.9300308  -0.636846031  -0.533099369
    +## 14:   1.0359249  -0.533099369  -0.417818767
    +## 15:   0.9115004  -0.417818767  -0.323764665
    +## 16:   2.3250859  -0.323764665  -0.184330858
    +## 17:   3.0769180  -0.184330858  -0.132632209
    +## 18:   3.3162510  -0.132632209  -0.080933560
    +## 19:   3.5323950  -0.080933560  -0.004108338
    +## 20:   2.1862481  -0.004108338   0.121863569
    +## 21:   3.4805229   0.121863569   0.216987038
    +## 22:   1.8001452   0.216987038   0.336388996
    +## 23:   0.2295375   0.336388996   0.516729182
    +## 24:  -0.5625172   0.516729182   0.668479078
    +## 25:  -2.5532272   0.668479078   0.830264570
    +## 26:  -2.4765129   0.830264570   0.988320504
    +## 27:  -3.1248612   0.988320504   1.083380900
    +## 28:  -2.9622550   1.083380900   1.218812429
    +## 29:  -1.5047059   1.218812429   1.279773569
    +## 30:  -1.5723118   1.279773569   1.445979675
    +## 31:   0.1804598   1.445979675   1.571628940
    +## 32:   0.8726461   1.571628940   1.689536565
    +## 33:   3.4198918   1.689536565   1.860999223
    +## 34:   1.5206901   1.860999223   1.997618112
    +##     Coefficient X.Lower.Range X.Upper.Range
    +## 
    +## $Point.est
    +##   [1]  0.01571202  0.10442684  0.23470933  0.37680781  0.51890629  0.65039140
    +##   [7]  0.72556318  0.80073496  0.85698998  0.88439634  0.91180269  0.93033285
    +##  [13]  0.94759688  0.96486091  0.95248720  0.93170326  0.87827479  0.79996720
    +##  [19]  0.72165961  0.64335202  0.52449573  0.40285499  0.28121424  0.17901835
    +##  [25]  0.07715655 -0.02470525 -0.15949123 -0.31038828 -0.45880078 -0.54309006
    +##  [31] -0.62737935 -0.71234468 -0.80041101 -0.88847734 -0.96331853 -1.00089554
    +##  [37] -1.03847254 -1.02090671 -0.97905116 -0.93719561 -0.89956805 -0.86273975
    +##  [43] -0.79660165 -0.70265879 -0.60871593 -0.51288396 -0.38856404 -0.25667590
    +##  [49] -0.11829230  0.02443074  0.13442845  0.22276171  0.31109496  0.42473203
    +##  [55]  0.56535922  0.69718932  0.76992246  0.84265560  0.90432326  0.91359751
    +##  [61]  0.92287175  0.93214599  0.94142023  0.92794242  0.90521445  0.88248648
    +##  [67]  0.85975852  0.76020581  0.65704511  0.55388442  0.45072372  0.35051056
    +##  [73]  0.25044944  0.15038831  0.04930377 -0.07695325 -0.20321027 -0.32495818
    +##  [79] -0.44464525 -0.56433232 -0.66432673 -0.72512293 -0.78817431 -0.85170206
    +##  [85] -0.91522981 -0.97875756 -0.99186193 -0.98457062 -0.97727932 -0.95314667
    +##  [91] -0.91788824 -0.88262981 -0.77697785 -0.63880041 -0.50062296 -0.36244551
    +##  [97] -0.25805231 -0.19661029 -0.13516826 -0.07526941
    +## 
    +## $pred.int
    +## NULL
    +## 
    +## $regression.points
    +##                x           y
    +##            <num>       <num>
    +##  1: -1.998138604 -0.01307124
    +##  2: -1.934370540  0.18132707
    +##  3: -1.804387149  0.63847051
    +##  4: -1.692769075  0.84613612
    +##  5: -1.590915710  0.91522399
    +##  6: -1.465816449  0.96867700
    +##  7: -1.376464546  0.92271416
    +##  8: -1.229726997  0.63832023
    +##  9: -1.110428636  0.27915958
    +## 10: -0.976623793 -0.05817308
    +## 11: -0.870193992 -0.45565668
    +## 12: -0.754706576 -0.69658188
    +## 13: -0.636846031 -0.95347564
    +## 14: -0.533099369 -1.04996323
    +## 15: -0.417818767 -0.93054118
    +## 16: -0.323764665 -0.84481083
    +## 17: -0.184330858 -0.52061525
    +## 18: -0.132632209 -0.36154275
    +## 19: -0.080933560 -0.19009705
    +## 20: -0.004108338  0.08127998
    +## 21:  0.121863569  0.35668582
    +## 22:  0.216987038  0.68776523
    +## 23:  0.336388996  0.90270609
    +## 24:  0.516729182  0.94410093
    +## 25:  0.668479078  0.85873901
    +## 26:  0.830264570  0.44566388
    +## 27:  0.988320504  0.05423632
    +## 28:  1.083380900 -0.24281423
    +## 29:  1.218812429 -0.64399694
    +## 30:  1.279773569 -0.73572553
    +## 31:  1.445979675 -0.99705336
    +## 32:  1.571628940 -0.97437872
    +## 33:  1.689536565 -0.87148709
    +## 34:  1.860999223 -0.28510334
    +## 35:  1.997618112 -0.07734835
    +##                x           y
    +## 
    +## $Fitted.xy
    +##                x          y      y.hat  NNS.ID   gradient    residuals
    +##            <num>      <num>      <num>  <char>      <num>        <num>
    +##    1: -0.8496899 -0.5752368 -0.4984314 q121122 -2.0861598  0.076805376
    +##    2:  1.1532205 -0.6617217 -0.4496971 q221122 -2.9622550  0.212024652
    +##    3: -0.3640923 -0.7048691 -0.8815695 q122122  0.9115004 -0.176700402
    +##    4:  1.5320696 -0.8447168 -0.9815176 q222121  0.1804598 -0.136800802
    +##    5:  1.7618691 -0.9820881 -0.6241175 q222212  3.4198918  0.357970569
    +##   ---                                                                 
    +##  996:  1.3184955 -0.7988901 -0.7966085 q221222 -1.5723118  0.002281548
    +##  997:  0.5684553  1.1554781  0.9150041 q212122 -0.5625172 -0.240473993
    +##  998: -0.4340050 -0.7748325 -0.9473089 q122121  1.0359249 -0.172476359
    +##  999:  0.8383194  0.7041960  0.4257159 q212222 -2.4765129 -0.278480031
    +## 1000: -1.5647037  0.9467853  0.9264240 q111222  0.4272848 -0.020361366
    +##       standard.errors
    +##                 <num>
    +##    1:       0.1769692
    +##    2:       0.1783713
    +##    3:       0.1905081
    +##    4:       0.2044300
    +##    5:       0.2636784
    +##   ---                
    +##  996:       0.1971693
    +##  997:       0.2137362
    +##  998:       0.1831159
    +##  999:       0.2108312
    +## 1000:       0.2078031
    +
    # Simple train/test for boosting & stacking
    +test.set = 141:150
    + 
    +boost <- NNS.boost(IVs.train = iris[-test.set, 1:4], 
    +              DV.train = iris[-test.set, 5],
    +              IVs.test = iris[test.set, 1:4],
    +              epochs = 10, learner.trials = 10, 
    +              status = FALSE, balance = TRUE,
    +              type = "CLASS", folds = 5)
    +
    +
    +mean(boost$results == as.numeric(iris[test.set,5]))
    +# [1] 1
    +
    +
    +boost$feature.weights; boost$feature.frequency
    +
    +stacked <- NNS.stack(IVs.train = iris[-test.set, 1:4], 
    +                     DV.train = iris[-test.set, 5],
    +                     IVs.test = iris[test.set, 1:4],
    +                     type = "CLASS", balance = TRUE,
    +                     ncores = 1, folds = 1)
    +mean(stacked$stack == as.numeric(iris[test.set,5]))
    +# [1] 1
    +
    +
    +

    6.3 Code: directional causality

    +
    NNS.caus(mtcars$hp,  mtcars$mpg)  # hp -> mpg
    +
    ## Causation.x.given.y Causation.y.given.x           C(x--->y) 
    +##           0.2607148           0.3863580           0.3933374
    +
    NNS.caus(mtcars$mpg, mtcars$hp)   # arguments swapped; result still reads hp -> mpg
    +
    ## Causation.x.given.y Causation.y.given.x           C(y--->x) 
    +##           0.3863580           0.2607148           0.3933374
    +

    Interpretation. Examine asymmetry in scores to infer direction. The method conditions partial‑moment dependence on candidate drivers.

    +
    +
    +
    +
    +

    7. Time Series & Forecasting

    +

    Headers.

    +
      +
    • NNS.ARMA
    • NNS.ARMA.optim
    • NNS.seas
    • NNS.VAR
    +
    # Univariate nonlinear ARMA
    +z <- as.numeric(scale(sin(1:480/8) + rnorm(480, sd=.35)))
    +
    +# Seasonality detection (prints a summary)
    +seasonal_period <- NNS.seas(z, plot = FALSE)
    +head(seasonal_period$all.periods)
    +
    ##   Period Coefficient.of.Variation Variable.Coefficient.of.Variation
    +## 1    200                0.4267885                      8.540159e+16
    +## 2     96                0.4425880                      8.540159e+16
    +## 3     49                0.4615546                      8.540159e+16
    +## 4    198                0.4812956                      8.540159e+16
    +## 5    199                0.4885608                      8.540159e+16
    +## 6    146                0.4901054                      8.540159e+16
    +
    # Validate seasonal periods
    +NNS.ARMA.optim(z, h = 48, seasonal.factor = seasonal_period$periods, plot = TRUE, ncores = 1)
    +
    ## [1] "CURRNET METHOD: lin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 51 ) ...)"
    +## [1] "CURRENT lin OBJECTIVE FUNCTION = 0.398327414917885"
    +## [1] "BEST method = 'lin', seasonal.factor = c( 51 )"
    +## [1] "BEST lin OBJECTIVE FUNCTION = 0.398327414917885"
    +## [1] "CURRNET METHOD: nonlin"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'nonlin' , seasonal.factor =  c( 51 ) ...)"
    +## [1] "CURRENT nonlin OBJECTIVE FUNCTION = 2.75408671013046"
    +## [1] "BEST method = 'nonlin' PATH MEMBER = c( 51 )"
    +## [1] "BEST nonlin OBJECTIVE FUNCTION = 2.75408671013046"
    +## [1] "CURRNET METHOD: both"
    +## [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +## [1] "NNS.ARMA(... method =  'both' , seasonal.factor =  c( 51 ) ...)"
    +## [1] "CURRENT both OBJECTIVE FUNCTION = 0.778172239627562"
    +## [1] "BEST method = 'both' PATH MEMBER = c( 51 )"
    +## [1] "BEST both OBJECTIVE FUNCTION = 0.778172239627562"
    +

    +
    ## $periods
    +## [1] 51
    +## 
    +## $weights
    +## NULL
    +## 
    +## $obj.fn
    +## [1] 0.3983274
    +## 
    +## $method
    +## [1] "lin"
    +## 
    +## $shrink
    +## [1] FALSE
    +## 
    +## $nns.regress
    +## [1] FALSE
    +## 
    +## $bias.shift
    +## [1] 0.01738357
    +## 
    +## $errors
    +##  [1] -0.4754897523 -0.4609730867 -0.3018876142  0.0439513384 -0.1128600832
    +##  [6]  0.9193835234  0.0160010547 -0.7516578805 -0.8195384972 -0.1274709629
    +## [11] -0.0093477175  0.1480424491  0.0345888303  0.0009331215 -0.2819915138
    +## [16] -0.3474821395 -1.2543849202 -0.5442948705 -0.0049610072 -0.4702036102
    +## [21]  0.1846614137  1.6541950586 -0.2046795992  0.9691745476  1.1460606178
    +## [26] -0.5141738440 -1.3562787956  0.3853973272 -0.3364881552 -0.5604890777
    +## [31] -0.3175309175 -0.1677932189 -0.1511705981  0.4541183441 -0.1377055180
    +## [36]  0.4279932502  1.3576081283 -0.0645315976  1.1430476887  0.1399600873
    +## [41]  0.0874395694 -0.3703494531  0.3046994756  0.2057574931 -0.7602832912
    +## [46]  0.6902933417  0.2238850985 -0.2775974238 -0.8250763050  0.5817787408
    +## [51] -0.8733350647  0.2906911996  0.1863210948 -0.2484232855  0.1444232735
    +## [56]  1.1655644133  0.0821221969 -0.2813315730 -0.7959981329 -0.3601165470
    +## [61] -0.4617740020 -0.0593491905  0.0143389607  0.1016580238  0.0300275332
    +## [66] -1.7237406556 -0.0930802461 -0.9348574200 -0.7189682901  0.0700766333
    +## [71] -0.3547205444  0.2333233909  0.6840012123 -0.0779445509  0.8409902584
    +## [76]  0.0130711684  0.8074727217 -1.1462424589  0.0926963526 -0.4674150054
    +## [81]  0.1308248298 -0.6493713604  0.0713668583  0.4889233461  0.4293197750
    +## [86] -0.4397639878  0.4261287370  0.7556075116  0.6016698079 -0.1086690282
    +## [91]  0.6426872057 -0.4175612763  0.0250816728  0.9344147185  0.5444153587
    +## [96] -0.8746369897
    +## 
    +## $results
    +##  [1] -0.495145166 -0.629911085 -0.423647703 -1.217211533 -1.313660334
    +##  [6] -1.507558621 -1.512809568 -1.244102492 -0.765300445 -2.402307464
    +## [11] -1.325990243 -0.928756118 -1.819067479 -0.855732188 -1.152690586
    +## [16] -1.039006594 -0.562011496 -1.103503510 -0.685097085 -0.727417601
    +## [21] -0.044018500 -0.030435409  0.002633325 -0.314902491  0.232587264
    +## [26]  1.030889038  0.556722546  0.680351082  1.101193382  0.941245213
    +## [31]  1.648626820  1.225992916  1.806473740  0.964372963  1.627354696
    +## [36]  0.460925955  1.318674310  1.692295367  0.854538440  0.768654797
    +## [41]  0.739228654  1.582319086  0.402156303  0.902802567  0.718513288
    +## [46]  0.086635865  0.193748286  0.283357285
    +## 
    +## $lower.pred.int
    +##  [1] -1.67077922 -1.80554514 -1.59928176 -2.39284559 -2.48929439 -2.68319268
    +##  [7] -2.68844363 -2.41973655 -1.94093450 -3.57794152 -2.50162430 -2.10439018
    +## [13] -2.99470154 -2.03136624 -2.32832464 -2.21464065 -1.73764555 -2.27913757
    +## [19] -1.86073114 -1.90305166 -1.21965256 -1.20606947 -1.17300073 -1.49053655
    +## [25] -0.94304679 -0.14474502 -0.61891151 -0.49528298 -0.07444068 -0.23438884
    +## [31]  0.47299276  0.05035886  0.63083968 -0.21126109  0.45172064 -0.71470810
    +## [37]  0.14304025  0.51666131 -0.32109562 -0.40697926 -0.43640540  0.40668503
    +## [43] -0.77347775 -0.27283149 -0.45712077 -1.08899819 -0.98188577 -0.89227677
    +## 
    +## $upper.pred.int
    +##  [1]  0.68048889  0.54572297  0.75198635 -0.04157748 -0.13802628 -0.33192456
    +##  [7] -0.33717551 -0.06846843  0.41033361 -1.22667341 -0.15035619  0.24687794
    +## [13] -0.64343342  0.31990187  0.02294347  0.13662746  0.61362256  0.07213055
    +## [19]  0.49053697  0.44821646  1.13161556  1.14519865  1.17826738  0.86073157
    +## [25]  1.40822132  2.20652310  1.73235660  1.85598514  2.27682744  2.11687927
    +## [31]  2.82426088  2.40162697  2.98210780  2.14000702  2.80298875  1.63656001
    +## [37]  2.49430837  2.86792942  2.03017250  1.94428885  1.91486271  2.75795314
    +## [43]  1.57779036  2.07843662  1.89414735  1.26226992  1.36938234  1.45899134
    +

    Notes. NNS seasonality uses the coefficient of variation instead of ACF/PACF, and NNS ARMA blends multiple seasonal periods into the linear or nonlinear regression forecasts.
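The coefficient-of-variation idea can be sketched in base R. The assumption here is that a candidate period is scored by the variability of the observations spaced exactly that many steps apart, compared with the variability of the whole series; `period_cv()` is a hypothetical helper and this is not the NNS.seas() code.

```r
# Hypothetical helper (not part of NNS): coefficient of variation of the
# observations sampled every `period` steps, counting back from the end.
period_cv <- function(x, period) {
  idx <- seq(length(x), 1, by = -period)
  sub <- x[idx]
  abs(sd(sub) / mean(sub))
}

# Noise-free series with a true period of 12 and a positive level
tt <- 1:240
x  <- 10 + sin(2 * pi * tt / 12)

cv_all <- abs(sd(x) / mean(x))        # variability of the whole series
cv_12  <- period_cv(x, 12)            # observations one full cycle apart
cv_5   <- period_cv(x, 5)             # observations at a non-seasonal spacing

c(overall = cv_all, period_12 = cv_12, period_5 = cv_5)
```

Observations one true cycle apart are nearly identical, so their coefficient of variation falls well below the overall value, while a non-seasonal spacing shows no such reduction.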

    +
    +
    +
    +

    8. Simulation, Bootstrap & Risk‑Neutral Rescaling

    +
    +

    8.1 Maximum entropy bootstrap (shape‑preserving)

    +

    Header.

    +
      +
    • NNS.meboot(x, reps=999, rho=NULL, type="spearman", drift=TRUE, ...)
    +
    x_ts <- cumsum(rnorm(350, sd=.7))
    +mb <- NNS.meboot(x_ts, reps=5, rho = 1)
    +dim(mb["replicates", ]$replicates)
    +
    ## [1] 350   5
    +
    +
    +

    8.2 Monte Carlo over the full correlation space

    +

    Header.

    +
      +
    • NNS.MC(x, reps=30, lower_rho=-1, upper_rho=1, by=.01, exp=1, type="spearman", ...)
    +
    mc <- NNS.MC(x_ts, reps=5, lower_rho=-1, upper_rho=1, by=.5, exp=1)
    +length(mc$ensemble); names(mc$replicates)
    +
    ## [1] 350
    +
    ## [1] "rho = 1"    "rho = 0.5"  "rho = 0"    "rho = -0.5" "rho = -1"
    +
    head(mc$replicates$`rho = 0`)
    +
    ##      Replicate 1 Replicate 2 Replicate 3 Replicate 4 Replicate 5
    +## [1,]    8.561720   11.097841   12.140974    3.478574    16.25845
    +## [2,]    4.989649    9.142348    6.298598    2.573488    11.23749
    +## [3,]    5.489892   11.635826    9.151404    4.146175    13.61840
    +## [4,]    7.175210   13.194315   11.614209    5.906763    19.23707
    +## [5,]    8.443500   12.157572   13.263425    4.369562    13.40513
    +## [6,]    7.386515   10.979258   11.705842    2.410838    15.31133
    +
    +
    +
    +
    +

    9. Portfolio & Stochastic Dominance

    +

    Stochastic dominance orders uncertain prospects for broad classes of utility functions: first degree applies to all nondecreasing utilities, and second degree to the risk‑averse subset. Partial moments supply practical, nonparametric estimators.
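First-degree dominance has a simple empirical reading: the dominating sample's CDF never lies above the other's. The base R sketch below checks that condition on the pooled observation grid. It is a plain ECDF comparison with a hypothetical helper `fsd_ecdf()`, not the partial-moment routine behind NNS.FSD.uni().

```r
# Hypothetical helper (not part of NNS): TRUE when x first-degree
# dominates y on the pooled grid, i.e. F_x(t) <= F_y(t) everywhere
# with strict inequality somewhere.
fsd_ecdf <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  Fx <- ecdf(x)(grid)
  Fy <- ecdf(y)(grid)
  all(Fx <= Fy) && any(Fx < Fy)
}

y <- c(1, 2, 3, 4)
x <- y + 1                      # every outcome shifted up by one

x_dominates_y <- fsd_ecdf(x, y)
y_dominates_x <- fsd_ecdf(y, x)
c(x_dominates_y = x_dominates_y, y_dominates_x = y_dominates_x)
```

With noisy samples whose CDFs cross, both checks return FALSE, which is why random return series of similar location often show no dominance at any degree.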

    +

    Headers.

    +
      +
    • NNS.FSD.uni(x, y)
    • NNS.SSD.uni(x, y)
    • NNS.TSD.uni(x, y)
    • NNS.SD.cluster(R)
    • NNS.SD.efficient.set(R)
    +
    RA <- rnorm(240, 0.005, 0.03)
    +RB <- rnorm(240, 0.003, 0.02)
    +RC <- rnorm(240, 0.006, 0.04)
    +
    +NNS.FSD.uni(RA, RB)
    +
    ## [1] 0
    +
    NNS.SSD.uni(RA, RB)
    +
    ## [1] 0
    +
    NNS.TSD.uni(RA, RB)
    +
    ## [1] 0
    +
    Rmat <- cbind(A=RA, B=RB, C=RC)
    +try(NNS.SD.cluster(Rmat, degree = 1))
    +
    ## $Clusters
    +## $Clusters$Cluster_1
    +## [1] "C" "A" "B"
    +
    try(NNS.SD.efficient.set(Rmat, degree = 1))
    +
    ## Checking 1 of 2Checking 2 of 2
    +
    ## [1] "C" "A" "B"
    +
    +
    +
    +

    Appendix A — Measure‑theoretic sketch (why partial moments are rigorous)

    +

    Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space, \(X: \Omega\to\mathbb{R}\) measurable. For any fixed \(t\in\mathbb{R}\), the sets \(\{X\le t\}\) and \(\{X>t\}\) are in \(\mathcal{F}\) because they are preimages of Borel sets. The population partial moments are

    +

    \[ \operatorname{LPM}(k,t,X) = \int_{-\infty}^{t} (t-x)^k\, dF_X(x), \qquad \operatorname{UPM}(k,t,X) = \int_{t}^{\infty} (x-t)^k\, dF_X(x). \]

    +

    The empirical versions correspond to replacing \(F_X\) with the empirical measure \(\mathbb{P}_n\) (or CDF \(\hat F_n\)):

    +

    \[ \widehat{\operatorname{LPM}}(k,t,X) = \int_{(-\infty,t]} (t-x)^k\, d\mathbb{P}_n(x), \qquad \widehat{\operatorname{UPM}}(k,t,X) = \int_{(t,\infty)} (x-t)^k\, d\mathbb{P}_n(x). \]

    +

    Centering at \(t=\mu_X\) yields the variance decomposition identity in Section 1.
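Integrating against the empirical measure is just averaging over the sample, so the empirical partial moments can be written in a few lines of base R. The functions `lpm_hat()` and `upm_hat()` below are hypothetical stand-ins written from the formulas above, not the package's LPM() and UPM(); degree 0 is treated as an indicator so that it returns a probability.

```r
# Hypothetical stand-ins for the empirical partial moments defined above.
lpm_hat <- function(degree, target, x) {
  if (degree == 0) return(mean(x <= target))   # mass on (-Inf, t]
  mean(pmax(target - x, 0)^degree)
}
upm_hat <- function(degree, target, x) {
  if (degree == 0) return(mean(x > target))    # mass on (t, Inf)
  mean(pmax(x - target, 0)^degree)
}

set.seed(123)
x <- rnorm(100)
m <- mean(x)

# Variance decomposition at t = mean(x): population (divide-by-n) variance
pop_var   <- mean((x - m)^2)
decomp    <- lpm_hat(2, m, x) + upm_hat(2, m, x)

# Degree-0 moments split the probability mass at the target
mass_sum  <- lpm_hat(0, m, x) + upm_hat(0, m, x)

c(pop_var = pop_var, decomp = decomp, mass_sum = mass_sum)
```

Multiplying `decomp` by `n / (n - 1)` gives the usual sample variance, since the empirical measure assigns weight `1 / n` to each observation.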

    +
    +
    +
    +

    Appendix B — Quick Reference (Grouped by Topic)

    + +
    +

    1. Partial Moments & Ratios

    +
      +
    • LPM(degree, target, variable) — lower partial moment of order degree at target.
    • UPM(degree, target, variable) — upper partial moment of order degree at target.
    • LPM.ratio(degree, target, variable); UPM.ratio(...) — normalized shares; degree=0 gives CDF.
    • LPM.VaR(p, degree, variable) — partial-moment quantile at probability p.
    • Co.LPM(degree_lpm, x, y, target_x, target_y, degree_y) — co-lower partial moment between two variables.
    • Co.UPM(degree_upm, x, y, target_x, target_y, degree_y) — co-upper partial moment between two variables.
    • D.LPM(degree_lpm, degree_upm, x, y, target_x, target_y) — divergent lower partial moment between two variables.
    • D.UPM(degree_lpm, degree_upm, x, y, target_x, target_y) — divergent upper partial moment between two variables.
    • NNS.CDF(x, target = NULL, points = NULL, plot = TRUE/FALSE) — CDF from partial moments.
    • NNS.moments(x) — mean/var/skew/kurtosis via partial moments.
    +
    +
    +

    2. Descriptive Statistics & Distributions

    +
      +
    • NNS.mode(x, multi = FALSE) — nonparametric mode(s).
    • PM.matrix(LPM_degree, UPM_degree, target, variable, pop_adj) — co-/divergent partial-moment matrices.
    • NNS.gravity(x, w = NULL) — partial-moment weighted location (gravity center).
    +

    See NNS Vignette: Getting Started with NNS: Partial Moments

    +
    +
    +

    3. Dependence & Association

    +
      +
    • NNS.dep(x, y) — nonlinear dependence coefficient.
    • NNS.copula(X, target, continuous, plot, independence.overlay) — dependence from co-partial moments.
    +

    See NNS Vignette: Getting Started with NNS: Correlation and Dependence

    +
    +
    +

    4. Normalization & Rescaling

    +
      +
    • NNS.norm(x, linear=FALSE) — normalization retaining target moments.
    • NNS.rescale(x, a, b, method=c("minmax","riskneutral"), T=NULL, type=c("Terminal","Discounted")) — risk-neutral or min–max rescaling.
    +

    See NNS Vignette: Getting Started with NNS: Normalization and Rescaling

    +
    +
    +

    5. Hypothesis Testing

    +
      +
    • NNS.ANOVA(control, treatment, ...) — certainty of equality (distributions or means).
    • NNS.SS(x, y, ...) — stochastic superiority between two variables.
    +

    See NNS Vignette: Getting Started with NNS: Comparing Distributions

    +
    +
    +

    6. Regression, Classification & Causality

    +
      +
    • NNS.part(x, y, ...) — partition analysis for variable segmentation.
    • NNS.reg(x, y, ...) — partition-based regression/classification ($Fitted.xy, $Point.est).
    • NNS.boost(IVs, DV, ...), NNS.stack(IVs, DV, ...) — ensembles using NNS.reg base learners.
    • NNS.caus(x, y) — directional causality score.
    +

    See NNS Vignette: Getting Started with NNS: Clustering and Regression

    +

    See NNS Vignette: Getting Started with NNS: Classification

    +
    +
    +

    7. Differentiation & Slope Measures

    +
      +
    • dy.dx(x, y) — numerical derivative of y with respect to x via NNS.reg.
    • dy.d_(x, Y, var) — partial derivative of multivariate Y w.r.t. var.
    • NNS.diff(x, y) — derivative via secant projections.
    +
    +
    +

    8. Time Series & Forecasting

    +
      +
    • NNS.ARMA(...), NNS.ARMA.optim(...) — nonlinear ARMA modeling.
    • NNS.seas(...) — detect seasonality.
    • NNS.VAR(...) — nonlinear VAR modeling.
    • NNS.nowcast(x, h, ...) — near-term nonlinear forecast.
    +

    See NNS Vignette: Getting Started with NNS: Forecasting

    +
    +
    +

    9. Simulation & Bootstrap

    +
      +
    • NNS.meboot(...) — maximum entropy bootstrap.
    • NNS.MC(...) — Monte Carlo over correlation space.
    +

    See NNS Vignette: Getting Started with NNS: Sampling and Simulation

    +
    +
    +

    10. Portfolio Analysis & Stochastic Dominance

    +
      +
    • NNS.FSD.uni(x, y), NNS.SSD.uni(x, y), NNS.TSD.uni(x, y) — univariate stochastic dominance tests.
    • NNS.SD.cluster(R), NNS.SD.efficient.set(R) — dominance-based portfolio sets.
    +

    For complete references, please see the Vignettes linked above and their specific referenced materials.

    +
    +
diff --git a/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.R b/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.R
new file mode 100644
index 0000000..acf8f2b
--- /dev/null
+++ b/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.R
@@ -0,0 +1,111 @@
+## ----setup, include=FALSE, message = FALSE------------------------------------
+knitr::opts_chunk$set(echo = TRUE)
+library(NNS)
+library(data.table)
+data.table::setDTthreads(1L)
+options(mc.cores = 1)
+RcppParallel::setThreadOptions(numThreads = 1)
+Sys.setenv("OMP_THREAD_LIMIT" = 1)
+
+## ----mean, message=FALSE------------------------------------------------------
+library(NNS)
+set.seed(123) ; x = rnorm(100) ; y = rnorm(100)
+
+mean(x)
+UPM(1, 0, x) - LPM(1, 0, x)
+
+## ----variance-----------------------------------------------------------------
+# Sample Variance (base R):
+var(x)
+
+# Sample Variance:
+(UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))
+
+
+# Population Adjustment of Sample Variance (base R):
+var(x) * ((length(x) - 1) / length(x))
+
+# Population Variance:
+UPM(2, mean(x), x) + LPM(2, mean(x), x)
+
+
+# Variance is also the co-variance of itself:
+(Co.LPM(1, x, x, mean(x), mean(x)) + Co.UPM(1, x, x, mean(x), mean(x)) - D.LPM(1, 1, x, x, mean(x), mean(x)) - D.UPM(1, 1, x, x, mean(x), mean(x)))
+
+## ----stdev--------------------------------------------------------------------
+sd(x)
+((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))) ^ .5
+
+## ----moments------------------------------------------------------------------
+NNS.moments(x)
+
+NNS.moments(x, population = FALSE)
+
+## ----mode---------------------------------------------------------------------
+# Continuous
+NNS.mode(x)
+
+# Discrete and multiple modes
+NNS.mode(c(1, 2, 2, 3, 3, 4, 4, 5), discrete = TRUE, multi = TRUE)
+
+## ----covariance---------------------------------------------------------------
+cov(x, y)
+(Co.LPM(1, x, y, mean(x), mean(y)) + Co.UPM(1, x, y, mean(x), mean(y)) - D.LPM(1, 1, x, y, mean(x), mean(y)) - D.UPM(1, 1, x, y, mean(x), mean(y))) * (length(x) / (length(x) - 1))
+
+## ----cov_dec, warning=FALSE---------------------------------------------------
+cov.mtx = PM.matrix(LPM_degree = 1, UPM_degree = 1, target = 'mean', variable = cbind(x, y), pop_adj = TRUE)
+cov.mtx
+
+# Reassembled Covariance Matrix
+cov.mtx$clpm + cov.mtx$cupm - cov.mtx$dlpm - cov.mtx$dupm
+
+
+# Standard Covariance Matrix
+cov(cbind(x, y))
+
+## ----pearson------------------------------------------------------------------
+cor(x, y)
+cov.xy = (Co.LPM(1, x, y, mean(x), mean(y)) + Co.UPM(1, x, y, mean(x), mean(y)) - D.LPM(1, 1, x, y, mean(x), mean(y)) - D.UPM(1, 1, x, y, mean(x), mean(y))) * (length(x) / (length(x) - 1))
+sd.x = ((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))) ^ .5
+sd.y = ((UPM(2, mean(y), y) + LPM(2, mean(y) , y)) * (length(y) / (length(y) - 1))) ^ .5
+cov.xy / (sd.x * sd.y)
+
+## ----cdfs,fig.align="center",fig.width=5,fig.height=3, results='hide'---------
+P = ecdf(x)
+P(0) ; P(1)
+LPM(0, 0, x) ; LPM(0, 1, x)
+
+# Vectorized targets:
+LPM(0, c(0, 1), x)
+
+plot(ecdf(x))
+points(sort(x), LPM(0, sort(x), x), col = "red")
+legend("left", legend = c("ecdf", "LPM.CDF"), fill = c("black", "red"), border = NA, bty = "n")
+
+# Joint CDF:
+Co.LPM(0, x, y, 0, 0)
+
+# Vectorized targets:
+Co.LPM(0, x, y, c(0, 1), c(0, 1))
+
+# Copula
+# Transform x and y so that they are uniform
+u_x = LPM.ratio(0, x, x)
+u_y = LPM.ratio(0, y, y)
+
+# Value of copula at c(.5, .5)
+Co.LPM(0, u_x, u_y, .5, .5)
+
+# Continuous CDF:
+NNS.CDF(x, 1)
+
+# CDF with target:
+NNS.CDF(x, 1, target = mean(x))
+
+# Survival Function:
+NNS.CDF(x, 1, type = "survival")
+
+## ----numerical integration----------------------------------------------------
+x = seq(0, 1, .001) ; y = x ^ 2
+(UPM(1, 0, y) - LPM(1, 0, y)) * (1 - 0)
+
diff --git a/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.html b/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.html
new file mode 100644
index 0000000..86446ea
--- /dev/null
+++ b/tools/NNS/vignettes/NNSvignette_02_Partial_Moments.html
@@ -0,0 +1,592 @@
+Getting Started with NNS: Partial Moments

    Getting Started with NNS: Partial Moments

    +

    Fred Viole

    + + + +
    +

    Partial Moments

    +

    Why is it necessary to parse the variance with partial moments? The +additional information generated from partial moments permits a level of +analysis simply not possible with traditional summary statistics.

    +

    Below are some basic equivalences demonstrating the role of partial moments as the elements of variance.

    +
    +

    Mean

    +
    library(NNS)
    +set.seed(123) ; x = rnorm(100) ; y = rnorm(100)
    +
    +mean(x)
    +
    ## [1] 0.09040591
    +
    UPM(1, 0, x) - LPM(1, 0, x)
    +
    ## [1] 0.09040591
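As a cross-check that does not need the package, the degree-1 identity above can be reproduced in base R. This is a minimal sketch, assuming the usual definition of degree-1 partial moments as means of one-sided deviations from the target; `upm1` and `lpm1` are hypothetical helper names, not NNS functions.

```r
# Hypothetical base-R helpers (assumed definitions, not the NNS implementation)
upm1 <- function(target, x) mean(pmax(x - target, 0))  # average excess above target
lpm1 <- function(target, x) mean(pmax(target - x, 0))  # average shortfall below target

set.seed(123)
x <- rnorm(100)

# Upside minus downside around 0 recovers the mean
mean_from_pm <- upm1(0, x) - lpm1(0, x)
isTRUE(all.equal(mean_from_pm, mean(x)))
```

The identity holds for any sample because `pmax(x, 0) - pmax(-x, 0)` equals `x` element by element.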
    +
    +
    +

    Variance

    +
    # Sample Variance (base R):
    +var(x)
    +
    ## [1] 0.8332328
    +
    # Sample Variance:
    +(UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))
    +
    ## [1] 0.8332328
    +
    # Population Adjustment of Sample Variance (base R):
    +var(x) * ((length(x) - 1) / length(x))
    +
    ## [1] 0.8249005
    +
    # Population Variance:
    +UPM(2, mean(x), x) + LPM(2, mean(x), x)
    +
    ## [1] 0.8249005
    +
    # Variance is also the co-variance of itself:
    +(Co.LPM(1, x, x, mean(x), mean(x)) + Co.UPM(1, x, x, mean(x), mean(x)) - D.LPM(1, 1, x, x, mean(x), mean(x)) - D.UPM(1, 1, x, x, mean(x), mean(x)))
    +
    ## [1] 0.8249005
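The variance decomposition above can also be checked in base R. The sketch below assumes the partial moment is the mean of one-sided deviations raised to `degree` (valid here for `degree > 0`); `pm` is a hypothetical helper, not an NNS function.

```r
# Hypothetical helper: one-sided deviations raised to a power (assumed definition,
# intended for degree > 0 only)
pm <- function(degree, target, x, upper = TRUE) {
  dev <- if (upper) pmax(x - target, 0) else pmax(target - x, 0)
  mean(dev ^ degree)
}

set.seed(123)
x <- rnorm(100)
n <- length(x)

# Upper plus lower degree-2 partial moments around the mean give the population variance
pop_var  <- pm(2, mean(x), x, upper = TRUE) + pm(2, mean(x), x, upper = FALSE)
samp_var <- pop_var * n / (n - 1)   # same adjustment used in the example above

isTRUE(all.equal(samp_var, var(x)))
```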
    +
    +
    +

    Standard Deviation

    +
    sd(x)
    +
    ## [1] 0.9128159
    +
    ((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))) ^ .5
    +
    ## [1] 0.9128159
    +
    +
    +

    First 4 Moments

    +

    The first 4 moments are returned with the function +NNS.moments. For sample statistics, set +population = FALSE.

    +
    NNS.moments(x)
    +
    ## $mean
    +## [1] 0.09040591
    +## 
    +## $variance
    +## [1] 0.8249005
    +## 
    +## $skewness
    +## [1] 0.06049948
    +## 
    +## $kurtosis
    +## [1] -0.161053
    +
    NNS.moments(x, population = FALSE)
    +
    ## $mean
    +## [1] 0.09040591
    +## 
    +## $variance
    +## [1] 0.8332328
    +## 
    +## $skewness
    +## [1] 0.06235774
    +## 
    +## $kurtosis
    +## [1] -0.1069186
    +
    +
    +

    Statistical Mode of a Continuous Distribution

    +

    NNS.mode also supports discrete-valued distributions and can recognize multiple modes.

    +
    # Continuous
    +NNS.mode(x)
    +
    ## [1] -0.4132834
    +
    # Discrete and multiple modes
    +NNS.mode(c(1, 2, 2, 3, 3, 4, 4, 5), discrete = TRUE, multi = TRUE)
    +
    ## [1] 2 3 4
    +
    +
    +

    Covariance

    +
    cov(x, y)
    +
    ## [1] -0.04372107
    +
    (Co.LPM(1, x, y, mean(x), mean(y)) + Co.UPM(1, x, y, mean(x), mean(y)) - D.LPM(1, 1, x, y, mean(x), mean(y)) - D.UPM(1, 1, x, y, mean(x), mean(y))) * (length(x) / (length(x) - 1))
    +
    ## [1] -0.04372107
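The four-quadrant covariance identity above can be sketched in base R by multiplying one-sided deviations. The variable names are hypothetical, and the two divergent terms are deliberately not matched to `D.LPM` or `D.UPM`, since which quadrant each function covers is not stated here; only their sum matters for the covariance.

```r
set.seed(123)
x <- rnorm(100); y <- rnorm(100)
n <- length(x)

# One-sided deviations from each mean
x_up <- pmax(x - mean(x), 0); x_dn <- pmax(mean(x) - x, 0)
y_up <- pmax(y - mean(y), 0); y_dn <- pmax(mean(y) - y, 0)

co_upper <- mean(x_up * y_up)   # both above their means
co_lower <- mean(x_dn * y_dn)   # both below their means
div_a    <- mean(x_up * y_dn)   # x above, y below
div_b    <- mean(x_dn * y_up)   # x below, y above

# Co-moving quadrants add, divergent quadrants subtract
cov_from_pm <- (co_lower + co_upper - div_a - div_b) * n / (n - 1)
isTRUE(all.equal(cov_from_pm, cov(x, y)))
```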
    +
    +
    +

    Covariance Elements and Covariance Matrix

    +

    The covariance matrix \((\Sigma)\) is equal to the sum of the co-partial moments matrices less the divergent partial moments matrices. \[ \Sigma = CLPM + CUPM - DLPM - DUPM \]

    +
    cov.mtx = PM.matrix(LPM_degree = 1, UPM_degree = 1, target = 'mean', variable = cbind(x, y), pop_adj = TRUE)
    +cov.mtx
    +
    ## $cupm
    +##           x         y
    +## x 0.4299250 0.1033601
    +## y 0.1033601 0.5411626
    +## 
    +## $dupm
    +##           x         y
    +## x 0.0000000 0.1469182
    +## y 0.1560924 0.0000000
    +## 
    +## $dlpm
    +##           x         y
    +## x 0.0000000 0.1560924
    +## y 0.1469182 0.0000000
    +## 
    +## $clpm
    +##           x         y
    +## x 0.4033078 0.1559295
    +## y 0.1559295 0.3939005
    +## 
    +## $cov.matrix
    +##             x           y
    +## x  0.83323283 -0.04372107
    +## y -0.04372107  0.93506310
    +
    # Reassembled Covariance Matrix
    +cov.mtx$clpm + cov.mtx$cupm - cov.mtx$dlpm - cov.mtx$dupm
    +
    ##             x           y
    +## x  0.83323283 -0.04372107
    +## y -0.04372107  0.93506310
    +
    # Standard Covariance Matrix
    +cov(cbind(x, y))
    +
    ##             x           y
    +## x  0.83323283 -0.04372107
    +## y -0.04372107  0.93506310
    +
    +
    +

    Pearson Correlation

    +
    cor(x, y)
    +
    ## [1] -0.04953215
    +
    cov.xy = (Co.LPM(1, x, y, mean(x), mean(y)) + Co.UPM(1, x, y, mean(x), mean(y)) - D.LPM(1, 1, x, y, mean(x), mean(y)) - D.UPM(1, 1, x, y, mean(x), mean(y))) * (length(x) / (length(x) - 1))
    +sd.x = ((UPM(2, mean(x), x) + LPM(2, mean(x), x)) * (length(x) / (length(x) - 1))) ^ .5
    +sd.y = ((UPM(2, mean(y), y) + LPM(2, mean(y) , y)) * (length(y) / (length(y) - 1))) ^ .5
    +cov.xy / (sd.x * sd.y)
    +
    ## [1] -0.04953215
    +
    +
    +

    CDFs (Discrete and Continuous)

    +
    P = ecdf(x)
    +P(0) ; P(1)
    +LPM(0, 0, x) ; LPM(0, 1, x)
    +
    +# Vectorized targets:
    +LPM(0, c(0, 1), x)
    +
    +plot(ecdf(x))
    +points(sort(x), LPM(0, sort(x), x), col = "red")
    +legend("left", legend = c("ecdf", "LPM.CDF"), fill = c("black", "red"), border = NA, bty = "n")
    +

    +
    # Joint CDF:
    +Co.LPM(0, x, y, 0, 0)
    +
    +# Vectorized targets:
    +Co.LPM(0, x, y, c(0, 1), c(0, 1))
    +
    +# Copula
    +# Transform x and y so that they are uniform
    +u_x = LPM.ratio(0, x, x)
    +u_y = LPM.ratio(0, y, y)
    +
    +# Value of copula at c(.5, .5)
    +Co.LPM(0, u_x, u_y, .5, .5)
    +
    +# Continuous CDF:
    +NNS.CDF(x, 1)
    +
    +# CDF with target:
    +NNS.CDF(x, 1, target = mean(x))
    +

    +
    # Survival Function:
    +NNS.CDF(x, 1, type = "survival")
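The link between the degree-0 lower partial moment and the empirical CDF can be illustrated without the package. This sketch assumes the degree-0 LPM is the share of observations at or below the target; `lpm0` is a hypothetical helper.

```r
set.seed(123)
x <- rnorm(100)

# Hypothetical helper: degree-0 lower partial moment as a frequency (assumed definition)
lpm0 <- function(target, x) vapply(target, function(t) mean(x <= t), numeric(1))

P <- ecdf(x)
isTRUE(all.equal(lpm0(c(0, 1), x), P(c(0, 1))))

# The survival function is the complement of the CDF
surv0 <- 1 - lpm0(c(0, 1), x)
```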
    +

    +
    +
    +

    Numerical Integration

    +

    Partial moments are asymptotic area approximations of \(f(x)\) akin to the familiar Trapezoidal and +Simpson’s rules. More observations, more accuracy…

    +

    \[[UPM(1,0,f(x))-LPM(1,0,f(x))]\asymp\frac{[F(b)-F(a)]}{[b-a]}\] \[[UPM(1,0,f(x))-LPM(1,0,f(x))] *[b-a] \asymp[F(b)-F(a)]\]

    +
    x = seq(0, 1, .001) ; y = x ^ 2
    +(UPM(1, 0, y) - LPM(1, 0, y)) * (1 - 0)
    +
    ## [1] 0.3335
    +

    \[0.3333 * [1-0] = \int_{0}^{1} x^2 \, dx\] For the total area, not just the definite integral, simply sum the partial moments and multiply by \([b - a]\): \[[UPM(1,0,f(x))+LPM(1,0,f(x))] *[b-a]\asymp\left\lvert{\int_{a}^{b} f(x)dx}\right\rvert\]
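The "more observations, more accuracy" point can be illustrated in base R. This is a sketch under the assumption that, for a non-negative integrand, the difference of degree-1 partial moments around 0 reduces to the plain mean of \(f(x)\); `area_estimate` is a hypothetical helper.

```r
# Estimate the integral of f over [a, b] as mean(f(x)) * (b - a) on an even grid.
# For f >= 0 the lower partial moment around 0 is 0, so only the mean remains.
area_estimate <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)
  mean(f(x)) * (b - a)
}

f <- function(x) x ^ 2
coarse <- area_estimate(f, 0, 1, 1000)     # 1001 grid points, as in the example above
fine   <- area_estimate(f, 0, 1, 100000)   # finer grid

# The finer grid lands closer to the exact value 1/3
abs(fine - 1 / 3) < abs(coarse - 1 / 3)
```

For this integrand the estimate on `n` intervals is `1/3 + 1/(6 * n)`, so the error shrinks in proportion to the grid spacing.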

    +
    +
    +

    Bayes’ Theorem

    +

    For example, to estimate the probability of an increase in \(A\) given an increase in \(B\), set the Co.UPM(degree_upm, x, y, target_x, target_y) target parameters to target_x = 0 and target_y = 0, and set the UPM(degree, target, variable) target parameter to target = 0. Both degrees are set to 0 so that the partial moments act as frequencies.

    +

    \[P(A|B)=\frac{Co.UPM(0,A,B,0,0)}{UPM(0,0,B)}\]
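The ratio above can be mimicked with plain frequencies in base R. The sketch assumes that degree-0 upper partial moments count observations strictly above the target; the data and variable names are illustrative only.

```r
set.seed(123)
A <- rnorm(1000)
B <- 0.5 * A + rnorm(1000)   # illustrative data in which B is related to A

# Degree-0 upper partial moments as frequencies (assumed: strictly above the target)
joint_up <- mean(A > 0 & B > 0)   # stands in for Co.UPM(0, A, B, 0, 0)
marg_up  <- mean(B > 0)           # stands in for UPM(0, 0, B)

p_A_given_B <- joint_up / marg_up

# Direct conditional frequency for comparison
isTRUE(all.equal(p_A_given_B, mean(A[B > 0] > 0)))
```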

    +
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.R b/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.R new file mode 100644 index 0000000..80ca006 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.R @@ -0,0 +1,80 @@ +## ----setup, include=FALSE, message=FALSE-------------------------------------- +knitr::opts_chunk$set(echo = TRUE) +library(NNS) +library(data.table) +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----setup2,message=FALSE,warning = FALSE------------------------------------- +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----linear,fig.width=5,fig.height=3,fig.align = "center"--------------------- +x = seq(0, 3, .01) ; y = 2 * x + +## ----linear1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE---- +NNS.part(x, y, Voronoi = TRUE, order = 3) + +## ----res1--------------------------------------------------------------------- +cor(x, y) +NNS.dep(x, y) + +## ----nonlinear,fig.width=5,fig.height=3,fig.align = "center", results='hide'---- +x = seq(0, 3, .01) ; y = x ^ 10 + +## ----nonlinear1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE---- +NNS.part(x, y, Voronoi = TRUE, order = 3) + +## ----res2a-------------------------------------------------------------------- +cor(x, y) +NNS.dep(x, y) + +## ----nonlinear_sin,fig.width=5,fig.height=3,fig.align = "center", results='hide'---- +x = seq(0, 12*pi, pi/100) ; y = sin(x) + +## ----nonlinear1_sin,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE---- +NNS.part(x, y, Voronoi = TRUE, order = 3, obs.req = 0) + +## ----res2_sin----------------------------------------------------------------- +cor(x, y) +NNS.dep(x, y) + +## ----asym1-------------------------------------------------------------------- +cor(x, y) +NNS.dep(x, y, 
asym = TRUE) + +## ----asym2-------------------------------------------------------------------- +cor(y, x) +NNS.dep(y, x, asym = TRUE) + +## ----dependence,fig.width=5,fig.height=3,fig.align = "center"----------------- +set.seed(123) +df = data.frame(x = runif(10000, -1, 1), y = runif(10000, -1, 1)) +df = subset(df, (x ^ 2 + y ^ 2 <= 1 & x ^ 2 + y ^ 2 >= 0.95)) + +## ----circle1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE---- +NNS.part(df$x, df$y, Voronoi = TRUE, order = 3, obs.req = 0) + +## ----res3--------------------------------------------------------------------- +NNS.dep(df$x, df$y) + +## ----permutations------------------------------------------------------------- +## p-values for [NNS.dep] +set.seed(123) +x = seq(-5, 5, .1); y = x^2 + rnorm(length(x)) + +## ----perm1,fig.width=5,fig.height=3,fig.align = "center", results='hide', echo=FALSE---- +NNS.part(x, y, Voronoi = TRUE, order = 3) + +## ----permutattions_res,fig.width=5,fig.height=3,fig.align = "center"---------- +NNS.dep(x, y, p.value = TRUE, print.map = TRUE) + +## ----multi, warning=FALSE----------------------------------------------------- +set.seed(123) +x = rnorm(1000); y = rnorm(1000); z = rnorm(1000) +NNS.copula(cbind(x, y, z), plot = TRUE, independence.overlay = TRUE) + diff --git a/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.html b/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.html new file mode 100644 index 0000000..960a79a --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_03_Correlation_and_Dependence.html @@ -0,0 +1,529 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Correlation and Dependence + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Correlation and Dependence

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Correlation and Dependence

    +

    The limitations of linear correlation are well known. Correlation is often used when dependence is the intended measure of the relationship between variables. NNS dependence, NNS.dep, is a signal:noise measure robust to nonlinear signals.

    +

    Below are some examples comparing NNS correlation NNS.cor and NNS.dep with the standard Pearson correlation coefficient cor.

    +
    +

    Linear Equivalence

    +

    Note that all observations occupy the co-partial moment quadrants.

    +
    x = seq(0, 3, .01) ; y = 2 * x
    +

    +
    cor(x, y)
    +
    ## [1] 1
    +
    NNS.dep(x, y)
    +
    ## $Correlation
    +## [1] 1
    +## 
    +## $Dependence
    +## [1] 1
    +
    +
    +

    Nonlinear Relationship

    +

    Note that all observations occupy the co-partial moment quadrants.

    +
    x = seq(0, 3, .01) ; y = x ^ 10
    +

    +
    cor(x, y)
    +
    ## [1] 0.6610183
    +
    NNS.dep(x, y)
    +
    ## $Correlation
    +## [1] 0.9595032
    +## 
    +## $Dependence
    +## [1] 0.9595032
    +
    +
    +

    Cyclic Relationship

    +

    Even the difficult inflection points, which span both the co- and divergent partial moment quadrants, are properly compensated for in NNS.dep.

    +
    x = seq(0, 12*pi, pi/100) ; y = sin(x)
    +

    +
    cor(x, y)
    +
    ## [1] -0.1297766
    +
    NNS.dep(x, y)
    +
    ## $Correlation
    +## [1] 0.202252
    +## 
    +## $Dependence
    +## [1] 0.8197963
    +
    +
    +

    Asymmetrical Analysis

    +

    Asymmetrical analysis is critical for determining a causal path between variables, which should be identifiable because causes and effects are asymmetrical.

    +

    The previous cyclic example visually highlights the asymmetry of dependence between the variables, which can be confirmed using NNS.dep(..., asym = TRUE).

    +
    cor(x, y)
    +
    ## [1] -0.1297766
    +
    NNS.dep(x, y, asym = TRUE)
    +
    ## $Correlation
    +## [1] 0.202252
    +## 
    +## $Dependence
    +## [1] 0.8197963
    +
    cor(y, x)
    +
    ## [1] -0.1297766
    +
    NNS.dep(y, x, asym = TRUE)
    +
    ## $Correlation
    +## [1] 0.07270847
    +## 
    +## $Dependence
    +## [1] 0.4086234
    +
    +
    +

    Dependence

    +

    Note that all observations occupy only co- or divergent partial moment quadrants for a given subquadrant.

    +
    set.seed(123)
    +df = data.frame(x = runif(10000, -1, 1), y = runif(10000, -1, 1))
    +df = subset(df, (x ^ 2 + y ^ 2 <= 1 & x ^ 2 + y ^ 2 >= 0.95))
    +

    +
    NNS.dep(df$x, df$y)
    +
    ## $Correlation
    +## [1] 0.05834412
    +## 
    +## $Dependence
    +## [1] 0.46764
    +
    +
    +
    +

    p-values for NNS.dep()

    +

    p-values and confidence intervals can be obtained by sampling random permutations of \(y \rightarrow y_p\) and running NNS.dep(x, y_p) to compare against a null hypothesis of 0 correlation, or independence, between \((x, y)\).

    +

    Simply set NNS.dep(..., p.value = TRUE, print.map = TRUE) to run 100 permutations and plot the results.

    +
    ## p-values for [NNS.dep]
    +set.seed(123)
    +x = seq(-5, 5, .1); y = x^2 + rnorm(length(x))
    +

    +
    NNS.dep(x, y, p.value = TRUE, print.map = TRUE)
    +

    +
    ## $Correlation
    +## [1] 0.2957015
    +## 
    +## $`Correlation p.value`
    +## [1] 0.18
    +## 
    +## $`Correlation 95% CIs`
    +##       2.5%      97.5% 
    +## -0.1544429  0.4062421 
    +## 
    +## $Dependence
    +## [1] 0.7932674
    +## 
    +## $`Dependence p.value`
    +## [1] 0
    +## 
    +## $`Dependence 95% CIs`
    +##      2.5%     97.5% 
    +## 0.5467152 0.6782456
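The permutation procedure described in this section can be sketched generically in base R. This is not the NNS implementation: `perm_test` is a hypothetical helper, the statistic is a stand-in (absolute Pearson correlation rather than NNS.dep), and reporting the interval as quantiles of the permuted statistics is an assumption.

```r
# Hypothetical helper; `stat` is any function(x, y) returning a single number.
perm_test <- function(x, y, stat, n_perm = 100) {
  observed <- stat(x, y)
  null <- replicate(n_perm, stat(x, sample(y)))   # shuffle y to break any link with x
  list(
    observed = observed,
    p.value  = mean(null >= observed),            # share of shuffles at least as extreme
    ci       = quantile(null, c(0.025, 0.975))    # 95% interval of the permuted statistics
  )
}

set.seed(123)
x <- seq(-5, 5, .1)
y <- 2 * x + rnorm(length(x))

# Stand-in statistic, not NNS.dep
res <- perm_test(x, y, function(a, b) abs(cor(a, b)))
res$p.value
```

Passing the statistic as a function keeps the resampling logic separate from the dependence measure, so any measure with the same two-argument shape could be substituted.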
    +
    +
    +

    Multivariate Dependence NNS.copula()

    +

    These partial moment insights permit us to extend the analysis to multivariate instances and deliver a dependence measure \((D)\) such that \(D \in [0,1]\). This level of analysis is simply impossible with Pearson or rank-based correlation methods, which are restricted to bivariate cases.

    +
    set.seed(123)
    +x = rnorm(1000); y = rnorm(1000); z = rnorm(1000)
    +NNS.copula(cbind(x, y, z), plot = TRUE, independence.overlay = TRUE)
    +
    ## [1] 0.3278775
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.R b/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.R new file mode 100644 index 0000000..d18aec1 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.R @@ -0,0 +1,222 @@ +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5) +suppressPackageStartupMessages(library(NNS)) +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----install,message=FALSE,warning = FALSE------------------------------------ +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----basic-example, eval=FALSE------------------------------------------------ +# set.seed(123) +# +# A <- rnorm(100, mean = 0, sd = 1) +# B <- rnorm(100, mean = 0, sd = 5) +# C <- rnorm(100, mean = 10, sd = 1) +# D <- rnorm(100, mean = 10, sd = 10) +# +# X <- data.frame(A, B, C, D) +# +# # Linear scaling +# lin_norm <- NNS.norm(X, linear = TRUE, chart.type = NULL) +# head(lin_norm) +# A Normalized B Normalized C Normalized D Normalized +# [1,] -29.929719 31.889828 5.819152 1.4264014 +# [2,] -12.291609 -11.531393 5.396317 1.2388239 +# [3,] 83.235911 11.073887 4.643781 0.3078703 +# [4,] 3.765188 15.601030 5.029380 -0.2630481 +# [5,] 6.904039 42.717726 4.572611 2.8193657 +# [6,] 91.585447 2.021274 4.543080 6.6681079 +# +# # Verify means are equal +# apply(lin_norm, 2, function(x) c(mean = mean(x), sd = sd(x))) +# +# A Normalized B Normalized C Normalized D Normalized +# mean 4.827727 4.827727 4.8277270 4.827727 +# sd 48.744888 43.407590 0.4531172 5.203436 + +## ----nonlinear-example, eval=FALSE-------------------------------------------- +# nonlin_norm <- NNS.norm(X, linear = FALSE, chart.type = NULL) +# head(nonlin_norm) +# A Normalized B Normalized C 
Normalized D Normalized +# [1,] -2.7834653 0.32807768 3.178568 0.7439872 +# [2,] -1.1431202 -0.11863321 2.947605 0.6461499 +# [3,] 7.7409438 0.11392645 2.536550 0.1605800 +# [4,] 0.3501627 0.16050101 2.747174 -0.1372015 +# [5,] 0.6420759 0.43947344 2.497676 1.4705341 +# [6,] 8.5174510 0.02079456 2.481545 3.4779738 +# +# apply(nonlin_norm, 2, function(x) c(mean = mean(x), sd = sd(x))) +# +# A Normalized B Normalized C Normalized D Normalized +# mean 0.4489788 0.04966692 2.637026 2.518062 +# sd 4.5332769 0.44657066 0.247504 2.714025 + +## ----unequal, eval = FALSE---------------------------------------------------- +# set.seed(123) +# vec1 <- rnorm(n = 10, mean = 0, sd = 1) +# vec2 <- rnorm(n = 5, mean = 5, sd = 5) +# vec3 <- rnorm(n = 8, mean = 10, sd = 10) +# +# vec_list <- list(vec1, vec2, vec3) +# +# NNS.norm(vec_list) +# +# $`x_1 Normalized` +# [1] 13.074058 -3.004912 -11.745878 25.406891 -4.647966 -5.481229 6.225165 5.920719 6.113733 9.640242 +# +# $`x_2 Normalized` +# [1] 2.875960212 0.008876158 1.230826150 5.855582361 10.779166523 +# +# $`x_3 Normalized` +# [1] 4.0749062 2.2395840 0.4067264 0.7457562 15.6445780 5.1941416 2.3326665 2.5622994 + +## ----rescale-minmax----------------------------------------------------------- +raw_vals <- c(-2.5, 0.2, 1.1, 3.7, 5.0) + +scaled_minmax <- NNS.rescale( + x = raw_vals, + a = 5, + b = 10, + method = "minmax", + T = NULL, + type = "Terminal" +) + +cbind(raw_vals, scaled_minmax) +range(scaled_minmax) + +## ----rescale-riskneutral, eval=FALSE------------------------------------------ +# set.seed(123) +# S0 <- 100 +# r <- 0.05 +# T <- 1 +# +# # Simulate a price path +# prices <- S0 * exp(cumsum(rnorm(250, 0.0005, 0.02))) +# +# rn_terminal <- NNS.rescale( +# x = prices, +# a = S0, +# b = r, +# method = "riskneutral", +# T = T, +# type = "Terminal" +# ) +# +# c( +# mean_original = mean(prices), +# mean_rescaled = mean(rn_terminal), +# target = S0 * exp(r * T) +# ) +# +# mean_original mean_rescaled target +# 109.7019 
105.1271 105.1271 + +## ----rescale-discounted, eval=FALSE------------------------------------------- +# rn_discounted <- NNS.rescale( +# x = prices, +# a = S0, +# b = r, +# method = "riskneutral", +# T = T, +# type = "Discounted" +# ) +# +# c( +# mean_rescaled = mean(rn_discounted), +# target_discounted_mean = S0 +# ) +# +# mean_rescaled target_discounted_mean +# 100 100 + +## ----image-------------------------------------------------------------------- +set.seed(123) + +x <- rnorm(1000, 5, 2) +y <- rgamma(1000, 3, 1) + +# Combine variables +X <- cbind(x, y) + +# NNS normalization +X_norm_lin <- NNS.norm(X, linear = TRUE) +X_norm_nonlin <- NNS.norm(X, linear = FALSE) + +# Standard min-max normalization +minmax <- function(v) (v - min(v)) / (max(v) - min(v)) +X_minmax <- apply(X, 2, minmax) + +## ----plotting, echo=FALSE----------------------------------------------------- +par(mfrow = c(2,2)) + +steelblue_alpha <- rgb(1,0,0,0.4) +red_alpha <- rgb(0,0,1,0.4) + +# Breaks for original data +br_orig <- pretty(range(c(x, y)), n = 15) + +# Original variables +hist(x, + col = steelblue_alpha, + breaks = br_orig, + main = "Original Variables", + xlab = "") + +hist(y, + col = red_alpha, + breaks = br_orig, + add = TRUE) + + +# Breaks for NNS normalized variables +br_norm <- pretty(range(c(X_norm_lin[,1], X_norm_lin[,2])), n = 15) + +# NNS normalized +hist(X_norm_lin[,1], + col = steelblue_alpha, + breaks = br_norm, + main = "NNS.norm(..., Linear=TRUE)", + xlab = "") + +hist(X_norm_lin[,2], + col = red_alpha, + breaks = br_norm, + add = TRUE) + +# Breaks for NNS normalized variables +br_norm <- pretty(range(c(X_norm_nonlin[,1], X_norm_nonlin[,2])), n = 15) + +# NNS normalized +hist(X_norm_nonlin[,1], + col = steelblue_alpha, + breaks = br_norm, + main = "NNS.norm(..., Linear=FALSE)", + xlab = "") + +hist(X_norm_nonlin[,2], + col = red_alpha, + breaks = br_norm, + add = TRUE) + +# Breaks for min-max normalized variables +br_minmax <- pretty(range(c(X_minmax[,1], 
X_minmax[,2])), n = 15) + +# Standard min-max normalization +hist(X_minmax[,1], + col = steelblue_alpha, + breaks = br_minmax, + main = "Standard Min-Max", + xlab = "") + +hist(X_minmax[,2], + col = red_alpha, + breaks = br_minmax, + add = TRUE) + diff --git a/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.html b/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.html new file mode 100644 index 0000000..2208e99 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_04_Normalization_and_Rescaling.html @@ -0,0 +1,765 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Normalization and Rescaling + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Normalization and Rescaling

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Overview

    +

    This vignette covers two related tools:

    +
      +
    • NNS.norm() for cross‑variable normalization when +comparing multiple series.
    • +
    • NNS.rescale() for single‑vector rescaling with either +min‑max or risk‑neutral targets.
    • +
    +

    Both functions perform deterministic affine transformations that +preserve rank structure while modifying scale.

    +
    +
    +
    +

    NNS.norm(): Normalize Multiple Variables

    +

    NNS.norm() rescales variables to a common magnitude +while preserving distributional structure. The method can be +linear (all variables forced to have the same mean) or +nonlinear (using dependence weights to produce a more +nuanced scaling). In the nonlinear case, the degree of association +between variables influences the final normalized values.

    +
    +

    Mathematical Structure

    +

    Let \(X\) be an \(n \times p\) matrix of variables.

    +
    +

    Step 1: Compute Mean Vector

    +

    \[ m_j = \text{mean}(X_{\cdot j}) \]

    +

    If any \(m_j = 0\), it is replaced with \(10^{-10}\) to prevent division by zero.

    +
    +
    +
    +

    Step 2: Construct Mean Ratio Matrix

    +

    \[ RG_{ij} = \frac{m_i}{m_j} \]

    +

    In R this corresponds to:

    +
    RG <- outer(m, 1 / m)
    +
    +
    +
    +

    Step 3: Dependence Weight Matrix

    +

    If linear = FALSE:

    +
      +
    • If the number of variables \(p < 10\): \[ W = |\mathrm{cor}(X)| \]
    • +
    • Otherwise: \[ W = |D| \quad \text{where } D = \text{NNS.dep}(X)\$Dependence \] NNS.dep() returns a symmetric matrix of nonlinear dependence measures.
    • +
    +

    If linear = TRUE, the weighting effectively becomes:

    +

    \[ W_{ij} = 1 \]

    +
    +
    +
    +

    Step 4: Scaling Factors

    +

    \[ s_j = \frac{1}{p} \sum_{i=1}^{p} RG_{ij} W_{ij} \]

    +

    Each column is scaled:

    +

    \[ X_{\cdot j}^{*} = s_j X_{\cdot j} \]

    +
    +
    +
    +
    +

    Linear Case Proof

    +

    If \(W_{ij} = 1\):

    +

    \[ s_j = \frac{1}{p} \sum_{i=1}^{p} \frac{m_i}{m_j} = \frac{\bar{m}}{m_j} \]

    +

    Then:

    +

    \[ \text{mean}(X_{\cdot j}^{*}) = s_j m_j = \bar{m} \]

    +

    All variables share the same mean.
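The linear-case result can be checked numerically by following Steps 1, 2 and 4 in base R with \(W_{ij} = 1\). This is a sketch of the stated formulas only, not a call to NNS.norm(); the data are illustrative and the zero-mean guard from Step 1 is omitted.

```r
set.seed(123)
X <- cbind(A = rnorm(100, 2, 1), B = rnorm(100, 5, 5), C = rnorm(100, 10, 1))

m  <- colMeans(X)            # Step 1: mean vector
RG <- outer(m, 1 / m)        # Step 2: RG[i, j] = m_i / m_j
s  <- colMeans(RG)           # Step 4 with W_ij = 1: s_j = mean(m) / m_j
X_star <- sweep(X, 2, s, `*`)   # scale column j by s_j

# Every scaled column now has mean equal to the average of the original means
isTRUE(all.equal(unname(colMeans(X_star)), rep(mean(m), ncol(X))))
```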

    +
    +
    +
    +

    Nonlinear Case Interpretation

    +

    \[ \text{mean}(X_{\cdot j}^{*}) = \frac{1}{p} \sum_{i=1}^{p} m_i W_{ij} \]

    +

    Thus, the normalized mean becomes a dependence‑weighted average of +original means. Variables more strongly dependent with higher‑mean +variables scale upward more.
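The dependence-weighted mean can be verified the same way. The sketch below uses the \(p < 10\) weights from Step 3, \(W = |\mathrm{cor}(X)|\), and applies the Step 4 formula directly; it illustrates the formulas as written, not NNS.norm() itself, and the data are illustrative.

```r
set.seed(123)
A <- rnorm(100, 2, 1)
X <- cbind(A = A, B = 3 * A + rnorm(100, 5, 2), C = rnorm(100, 10, 1))

m  <- colMeans(X)
RG <- outer(m, 1 / m)        # RG[i, j] = m_i / m_j
W  <- abs(cor(X))            # Step 3 weights for p < 10
s  <- colMeans(RG * W)       # Step 4: s_j = (1/p) * sum_i RG_ij * W_ij
X_star <- sweep(X, 2, s, `*`)

# Expected column means: (1/p) * sum_i m_i * W_ij
expected_means <- colMeans(m * W)
isTRUE(all.equal(unname(colMeans(X_star)), unname(expected_means)))
```

Because A and B are strongly related in this toy data, each picks up more of the other's mean than the unrelated column C does.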

    +
    +
    +
    +

    Examples

    +
    +

    Basic Multivariate Example

    +

    This holds for any distribution type and can be applied to vectors of +different lengths.

    +
    set.seed(123)
    +
    +A <- rnorm(100, mean = 0, sd = 1)
    +B <- rnorm(100, mean = 0, sd = 5)
    +C <- rnorm(100, mean = 10, sd = 1)
    +D <- rnorm(100, mean = 10, sd = 10)
    +
    +X <- data.frame(A, B, C, D)
    +
    +# Linear scaling
    +lin_norm <- NNS.norm(X, linear = TRUE, chart.type = NULL)
    +head(lin_norm)
    +     A Normalized B Normalized C Normalized D Normalized
    +[1,]   -29.929719    31.889828     5.819152    1.4264014
    +[2,]   -12.291609   -11.531393     5.396317    1.2388239
    +[3,]    83.235911    11.073887     4.643781    0.3078703
    +[4,]     3.765188    15.601030     5.029380   -0.2630481
    +[5,]     6.904039    42.717726     4.572611    2.8193657
    +[6,]    91.585447     2.021274     4.543080    6.6681079
    +
    +# Verify means are equal
    +apply(lin_norm, 2, function(x) c(mean = mean(x), sd = sd(x)))
    +
    +     A Normalized B Normalized C Normalized D Normalized
    +mean     4.827727     4.827727    4.8277270     4.827727
    +sd      48.744888    43.407590    0.4531172     5.203436
    +

    Now compare with nonlinear scaling:

    +
    nonlin_norm <- NNS.norm(X, linear = FALSE, chart.type = NULL)
    +head(nonlin_norm)
    +     A Normalized B Normalized C Normalized D Normalized
    +[1,]   -2.7834653   0.32807768     3.178568    0.7439872
    +[2,]   -1.1431202  -0.11863321     2.947605    0.6461499
    +[3,]    7.7409438   0.11392645     2.536550    0.1605800
    +[4,]    0.3501627   0.16050101     2.747174   -0.1372015
    +[5,]    0.6420759   0.43947344     2.497676    1.4705341
    +[6,]    8.5174510   0.02079456     2.481545    3.4779738
    +
    +apply(nonlin_norm, 2, function(x) c(mean = mean(x), sd = sd(x)))
    +
    +     A Normalized B Normalized C Normalized D Normalized
    +mean    0.4489788   0.04966692     2.637026     2.518062
    +sd      4.5332769   0.44657066     0.247504     2.714025
    +

    Note that the means differ and the standard deviations are smaller +than in the linear case, reflecting the dependence structure.

    +
    +

    Normalize list of unequal vector lengths

    +
    set.seed(123)
    +vec1 <- rnorm(n = 10, mean = 0, sd = 1)
    +vec2 <- rnorm(n = 5, mean = 5, sd = 5)
    +vec3 <- rnorm(n = 8, mean = 10, sd = 10)
    +
    +vec_list <- list(vec1, vec2, vec3)
    +
    +NNS.norm(vec_list)
    +
    +$`x_1 Normalized`
    + [1]  13.074058  -3.004912 -11.745878  25.406891  -4.647966  -5.481229   6.225165   5.920719   6.113733   9.640242
    +
    +$`x_2 Normalized`
    +[1]  2.875960212  0.008876158  1.230826150  5.855582361 10.779166523
    +
    +$`x_3 Normalized`
    +[1]  4.0749062  2.2395840  0.4067264  0.7457562 15.6445780  5.1941416  2.3326665  2.5622994
    +
    +
    +
    +
    +

    Quantile Normalization Comparison

    +

    Quantile normalization forces distributions to be identical. This is the opposite of the intended effect of NNS.norm, which preserves individual distribution shapes while aligning ranges. The quantile normalized series become identical in distribution, while the NNS methods retain the original patterns.

    +
    +
    +
    +
    +

    Practical Applications

    +

    Normalization eliminates the need for multiple y‑axis charts and +prevents their misuse. By placing variables on the same axes with shared +ranges, we enable more relevant conditional probability analyses. This +technique, combined with time normalization, is used in +NNS.caus() to identify causal relationships between +variables.

    +
    +
    +
    +
    +

    NNS.rescale(): Distribution Rescaling

    +

    NNS.rescale() performs one‑dimensional affine +transformations.

    +

    Function signature:

    +
    NNS.rescale(x, a, b, method = "minmax", T = NULL, type = "Terminal")
    +
    +
    +

    1) Min-Max Scaling

    +

    If method = "minmax":

    +

    \[ x^{*} = a + (b - a) \frac{x - \min(x)}{\max(x) - \min(x)} \]

    +

    Properties:

    +
      +
    • Preserves order
    • +
    • Maps support to \([a,b]\)
    • +
    • Linear transformation
    • +
    +
    +
    +

    Example

    +
    raw_vals <- c(-2.5, 0.2, 1.1, 3.7, 5.0)
    +
    +scaled_minmax <- NNS.rescale(
    +  x = raw_vals,
    +  a = 5,
    +  b = 10,
    +  method = "minmax",
    +  T = NULL,
    +  type = "Terminal"
    +)
    +
    +cbind(raw_vals, scaled_minmax)
    +#>      raw_vals scaled_minmax
    +#> [1,]     -2.5      5.000000
    +#> [2,]      0.2      6.800000
    +#> [3,]      1.1      7.400000
    +#> [4,]      3.7      9.133333
    +#> [5,]      5.0     10.000000
    +range(scaled_minmax)
    +#> [1]  5 10
    +
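The min-max formula can be applied by hand to confirm the values printed above. `minmax_rescale` is a hypothetical helper written from the formula, not the NNS.rescale() source.

```r
# Hypothetical helper implementing the min-max formula directly
minmax_rescale <- function(x, a, b) a + (b - a) * (x - min(x)) / (max(x) - min(x))

raw_vals <- c(-2.5, 0.2, 1.1, 3.7, 5.0)
scaled <- minmax_rescale(raw_vals, a = 5, b = 10)

range(scaled)                               # endpoints land on a and b
identical(order(scaled), order(raw_vals))   # ordering is preserved when b > a
```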
    +
    +
    +
    +

    2) Risk-Neutral Scaling

    +

    If method = "riskneutral":

    +

    Let:

    +
      +
    • \(S_0 = a\)
    • +
    • \(r = b\)
    • +
    • \(T\) = time horizon
    • +
    +
    +

    Terminal Type

    +

    Target:

    +

    \[ \mathbb{E}[S_T] = S_0 e^{rT} \]

    +

    Transformation form:

    +

    \[ x^{*} = x \cdot \frac{S_0 e^{rT}}{\text{mean}(x)} \]

    +

    This enforces the required expectation.

    +
    +
    +
    +

    Discounted Type

    +

    Target:

    +

    \[ \mathbb{E}[e^{-rT} S_T] = S_0 \]

    +

    Equivalent to:

    +

    \[ \mathbb{E}[S_T] = S_0 e^{rT} \]

    +

    but the returned series is scaled so that its discounted mean equals +\(S_0\). In practice, the function +applies the same multiplicative factor as above, because:

    +

    \[ \text{mean}(e^{-rT} x^{*}) = e^{-rT} \cdot \text{mean}(x^{*}) = e^{-rT} \cdot S_0 e^{rT} = S_0. \]
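The two expectation targets can be checked numerically with the multiplicative form given above. This is a sketch of the algebra only; it does not call NNS.rescale() and makes no claim about what the function returns for either `type`.

```r
set.seed(123)
S0 <- 100; r <- 0.05
T <- 1   # time horizon, named T to match the formulas (this masks the TRUE shorthand)

# Simulated price path, as in the examples in this section
prices <- S0 * exp(cumsum(rnorm(250, 0.0005, 0.02)))

# Terminal form: multiply by the ratio of the target mean to the observed mean
x_star <- prices * (S0 * exp(r * T)) / mean(prices)

isTRUE(all.equal(mean(x_star), S0 * exp(r * T)))    # terminal target
isTRUE(all.equal(mean(exp(-r * T) * x_star), S0))   # discounted target
```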

    +
    +
    +
    +
    +

    Risk-Neutral Example

    +
    set.seed(123)
    +S0 <- 100
    +r <- 0.05
    +T <- 1
    +
    +# Simulate a price path
    +prices <- S0 * exp(cumsum(rnorm(250, 0.0005, 0.02)))
    +
    +rn_terminal <- NNS.rescale(
    +  x = prices,
    +  a = S0,
    +  b = r,
    +  method = "riskneutral",
    +  T = T,
    +  type = "Terminal"
    +)
    +
    +c(
    +  mean_original = mean(prices),
    +  mean_rescaled = mean(rn_terminal),
    +  target = S0 * exp(r * T)
    +)
    +
    +mean_original mean_rescaled        target 
    +     109.7019      105.1271      105.1271 
    +
    +
    +
    +

    Discounted Example

    +
    rn_discounted <- NNS.rescale(
    +  x = prices,
    +  a = S0,
    +  b = r,
    +  method = "riskneutral",
    +  T = T,
    +  type = "Discounted"
    +)
    +
    +c(
    +  mean_rescaled = mean(rn_discounted),
    +  target_discounted_mean = S0
    +)
    +
    +         mean_rescaled target_discounted_mean 
    +                   100                    100 
    +
    +
    +
    +
    +

    Conceptual Summary

    +
    +

    NNS.norm()

    +
      +
    • Multivariate
    • +
    • Dependence‑aware scaling
    • +
    • Equalizes means only in linear mode
    • +
    • Preserves shape and order
    • +
    +
    +
    +

    NNS.rescale()

    +
      +
    • Univariate
    • +
    • Affine transformation
    • +
    • Either range‑targeted or expectation‑targeted
    • +
    • Preserves rank structure
    • +
    +

    Both functions maintain monotonicity and are therefore compatible +with NNS copula and dependence modeling frameworks.

    +
    set.seed(123)
    +
    +x <- rnorm(1000, 5, 2)
    +y <- rgamma(1000, 3, 1)
    +
    +# Combine variables
    +X <- cbind(x, y)
    +
    +# NNS normalization
    +X_norm_lin <- NNS.norm(X, linear = TRUE)
    +X_norm_nonlin <- NNS.norm(X, linear = FALSE)
    +
    +# Standard min-max normalization
    +minmax <- function(v) (v - min(v)) / (max(v) - min(v))
    +X_minmax <- apply(X, 2, minmax)
    +
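
The rank-preservation claim can be illustrated with base R alone. The sketch below uses min-max scaling and a hypothetical helper `affine()` with a positive slope as stand-ins for any monotone rescaling. It does not call NNS.norm() or NNS.rescale().

```r
set.seed(123)
x <- rnorm(1000, 5, 2)
y <- rgamma(1000, 3, 1)

minmax <- function(v) (v - min(v)) / (max(v) - min(v))
affine <- function(v, a, b) a + b * v  # hypothetical helper; b > 0 keeps the order

x_mm <- minmax(x)
y_aff <- affine(y, a = 10, b = 2.5)

# Ranks are unchanged by either transformation
ranks_kept <- identical(rank(x), rank(x_mm)) && identical(rank(y), rank(y_aff))

# Rank-based dependence is therefore unchanged as well
rho_before <- cor(x, y, method = "spearman")
rho_after <- cor(x_mm, y_aff, method = "spearman")

print(c(ranks_kept = ranks_kept, rho_before = rho_before, rho_after = rho_after))
```

Any measure built only on ranks, Spearman correlation included, gives the same value before and after a strictly increasing transformation.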

    +
    +
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_05_Sampling.R b/tools/NNS/vignettes/NNSvignette_05_Sampling.R new file mode 100644 index 0000000..e506730 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_05_Sampling.R @@ -0,0 +1,248 @@ +## ----setup, include=FALSE, message=FALSE-------------------------------------- +knitr::opts_chunk$set(echo = TRUE) +library(NNS) +library(data.table) +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----setup2, message=FALSE, warning = FALSE----------------------------------- +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----------------------------------------------------------------------------- +set.seed(123); x = rnorm(100) +ecdf(x) +P = ecdf(x) +P(0); P(1) + +## ----message=FALSE------------------------------------------------------------ +LPM.ratio(degree = 0, target = 0, variable = x); LPM.ratio(degree = 0, target = 1, variable = x) + +## ----fig.align='center', fig.width=6, fig.height=6, echo = FALSE-------------- +LPM.CDF = LPM.ratio(degree = 0, target = sort(x), variable = x) + +plot(ecdf(x)) +points(sort(x), LPM.CDF, col='red') +legend('left', legend = c('ecdf', 'LPM.ratio'), fill=c('black','red'), border=NA, bty='n') + +## ----fig.align='center', fig.height=8, fig.width=8, echo=FALSE, warning=FALSE, message = FALSE, eval=FALSE---- +# zzz = rnorm(length(x), mean = 0, sd = 1) +# norm_approx = pnorm(sort(zzz), mean=0, sd=1) #pnorm(sort(x),mean=-mean(x),sd=sd(x)) +# +# plot(ecdf(x), main = "eCDF via LPM.ratio()", lwd = 4) +# +# +# # Altering shape of distribution with LPM degree +# for(i in c(0, 0.25, .5, 1, 2)){ +# idx <- which(i == c(0, 0.25, .5, 1, 2)) +# lines(sort(x), LPM.ratio(i, sort(x),x), col = rainbow(5, alpha = 1)[idx], lty = 1, lwd = 3) +# } +# +# lines(sort(zzz), norm_approx ,col='black', lty = 3, lwd = 2) +# +# +# legend("topleft",c("LPM.ratio(degree = 
0)","LPM.ratio(degree = 0.25)","LPM.ratio(degree = 0.5)","LPM.ratio(degree = 1)","LPM.ratio(degree = 2)", "N(0,1) approximation"), +# col = c(rainbow(5)[1:5], "black"), lwd = 3, lty = c(rep(1, 5), 3)) + +## ----fig.align='center', echo=FALSE, fig.width=10, fig.height=8, message=FALSE, warning=FALSE, eval=FALSE---- +# layout(matrix(c(1, 1, 1,1,1, +# 2, 3, 4,5,6, +# 2, 3, 4,5,6), nrow=5, byrow=FALSE),widths = c(2,rep(1,5))) +# +# +# plot(ecdf(x), main = "eCDF via LPM.ratio()", lwd = 4) +# +# +# # Altering shape of distribution with LPM degree +# for(i in c(0, 0.25, .5, 1, 2)){ +# idx <- which(i == c(0, 0.25, .5, 1, 2)) +# lines(sort(x), LPM.ratio(i, sort(x),x), col = rainbow(5, alpha = 1)[idx], lty = 1, lwd = 3) +# } +# +# lines(sort(zzz), norm_approx ,col='black', lty = 3, lwd = 2) +# +# +# legend("topleft",c("LPM.ratio(degree = 0)","LPM.ratio(degree = 0.25)","LPM.ratio(degree = 0.5)","LPM.ratio(degree = 1)","LPM.ratio(degree = 2)", "N(0,1) approximation"), +# col = c(rainbow(5)[1:5], "black"), lwd = 3, lty = c(rep(1, 5), 3)) +# +# +# +# +# y = hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), plot = FALSE, breaks = 15) +# +# plot(y$breaks, +# c(y$counts,0), type = "s", +# col="black",lwd = 3, ylim = c(0,50), main = "Inverse CDF via LPM.VaR(degree 0)", breaks = 15, xlab = "x", ylab = "freq") +# hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), add = TRUE, col = rainbow(5, alpha = .5)[1], breaks = 15) +# +# y = hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), border = NA, plot = FALSE, breaks = 15) +# plot(y$breaks, +# c(y$counts,0) +# ,type="s",col="black",lwd = 3, ylim = c(0,50), main = "Inverse CDF via LPM.VaR(degree 0.25)", breaks = 15, xlab = "x", ylab = "freq") +# hist(LPM.VaR(seq(0,1,length.out = 100), .25, x), border = rainbow(5)[2], add = TRUE, col = rainbow(5, alpha = .5)[2], breaks = 15) +# +# y = hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), plot = FALSE, breaks = 15) +# plot(y$breaks, +# c(y$counts,0) +# ,type="s",col="black",lwd = 3, ylim = c(0,50), main = 
"Inverse CDF via LPM.VaR(degree 0.5)", breaks = 15, xlab = "x", ylab = "freq") +# hist(LPM.VaR(seq(0,1,length.out = 100), .5, x), border = rainbow(5)[3], add = TRUE, col = rainbow(5, alpha = .5)[3], breaks = 15) +# +# y = hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), plot = FALSE, breaks = 15) +# plot(y$breaks, +# c(y$counts,0) +# ,type="s",col="black",lwd = 3, ylim = c(0,50), main = "Inverse CDF via LPM.VaR(degree 1)", breaks = 15, xlab = "x", ylab = "freq") +# hist(LPM.VaR(seq(0,1,length.out = 100), 1, x), border = rainbow(5)[4], add = TRUE, col = rainbow(5, alpha = .5)[4], breaks = 15) +# +# y = hist(LPM.VaR(seq(0,1,length.out = 100), 0, x), plot = FALSE, breaks = 15) +# plot(y$breaks, +# c(y$counts,0) +# ,type="s",col="black",lwd = 3, ylim = c(0,50), main = "Inverse CDF via LPM.VaR(degree 2)", breaks = 15, xlab = "x", ylab = "freq") +# hist(LPM.VaR(seq(0,1,length.out = 100), 2, x), border = rainbow(5)[5], add = TRUE, col = rainbow(5, alpha = .5)[5], breaks = 15) + +## ----eval=FALSE--------------------------------------------------------------- +# degree.0.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0, x = x) +# degree.0.25.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0.25, x = x) +# degree.0.5.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0.5, x = x) +# degree.1.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 1, x = x) +# degree.2.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 2, x = x) +# +# head(data.table::data.table(cbind("original x" = sort(x), degree.0.samples, +# degree.0.25.samples, +# degree.0.5.samples, +# degree.1.samples, +# degree.2.samples)), 10) +# +# original x degree.0.samples degree.0.25.samples degree.0.5.samples +# 1: -2.309169 -2.309169 -2.309097 -2.3090915 +# 2: -1.966617 -1.966617 -1.941190 -1.6935509 +# 3: -1.686693 -1.686693 -1.599486 -1.4541494 +# 4: -1.548753 -1.548753 -1.382553 -1.2462731 +# 5: -1.265396 -1.265396 
-1.250823 -1.1453748 +# 6: -1.265061 -1.265061 -1.176436 -1.0745440 +# 7: -1.220718 -1.220718 -1.119655 -1.0252742 +# 8: -1.138137 -1.138137 -1.067793 -0.9868693 +# 9: -1.123109 -1.123109 -1.026429 -0.9322105 +# 10: -1.071791 -1.071791 -1.014276 -0.8710942 +# degree.1.samples degree.2.samples +# 1: -2.3091021 -2.3091170 +# 2: -1.4744653 -1.1614908 +# 3: -1.2159961 -0.9709972 +# 4: -1.0823023 -0.8610192 +# 5: -0.9968028 -0.7810300 +# 6: -0.9290505 -0.7169770 +# 7: -0.8666886 -0.6631888 +# 8: -0.8090433 -0.6170691 +# 9: -0.7556644 -0.5765608 +# 10: -0.7069835 -0.5403318 + +## ----fig.align='center', fig.width=8, fig.height=8, eval=FALSE---------------- +# boots = NNS.MC(x, reps = 1, lower_rho = -1, upper_rho = 1, by = .5)$replicates +# reps = do.call(cbind, boots) +# +# +# matplot(reps, type = "l", col = rainbow(length(boots))) +# lines(x, type = "l", lwd = 3, ylim = c(min(reps), max(reps))) + +## ----eval = FALSE------------------------------------------------------------- +# sapply(boots, function(r) cor(r, x, method = "spearman")) +# +# rho = 1 rho = 0.5 rho = 0 rho = -0.5 rho = -1 +# 0.99732373 0.51147915 0.01036904 -0.48720072 -0.98294629 + +## ----tgt_drift, fig.align='center', fig.width=8, fig.height=8, eval=FALSE----- +# boots = NNS.MC(x, reps = 1, lower_rho = -1, upper_rho = 1, by = .5, target_drift = 0.05)$replicates +# reps = do.call(cbind, boots) +# +# plot(x, type = "l", lwd = 3, ylim = c(min(c(x, reps)), max(c(x, reps)))) +# matplot(reps, type = "l", col = rainbow(length(boots)), add = TRUE) + +## ----multisim, eval=FALSE----------------------------------------------------- +# set.seed(123) +# x = rnorm(1000); y = rnorm(1000); z = rnorm(1000) +# +# # Add variable x to original data to avoid total independence (example only) +# original.data = cbind(x, y, z, x) +# +# # Determine dependence structure +# dep.structure = apply(original.data, 2, function(x) LPM.ratio(degree = 1, target = x, variable = x)) +# +# # Generate new data with different mean, sd and 
length (or distribution type) +# new.data = sapply(1:ncol(original.data), function(x) rnorm(nrow(original.data)*2, mean = 10, sd = 20)) +# +# # Apply dependence structure to new data +# new.dep.data = sapply(1:ncol(original.data), function(x) LPM.VaR(percentile = dep.structure[,x], degree = 1, x = new.data[,x])) + +## ----comparison, warning=FALSE, eval=FALSE------------------------------------ +# NNS.copula(original.data) +# NNS.copula(new.dep.data) +# +# [1] 0.4743531 +# [1] 0.4753264 + +## ----eval=FALSE--------------------------------------------------------------- +# head(original.data) +# head(new.dep.data) +# +# x y z x +# [1,] -0.56047565 -0.99579872 -0.5116037 -0.56047565 +# [2,] -0.23017749 -1.03995504 0.2369379 -0.23017749 +# [3,] 1.55870831 -0.01798024 -0.5415892 1.55870831 +# [4,] 0.07050839 -0.13217513 1.2192276 0.07050839 +# [5,] 0.12928774 -2.54934277 0.1741359 0.12928774 +# [6,] 1.71506499 1.04057346 -0.6152683 1.71506499 +# [,1] [,2] [,3] [,4] +# [1,] -2.028109 -10.498044 -0.2090467 -1.682949 +# [2,] 4.608303 -11.390485 15.6213689 4.852534 +# [3,] 39.478741 8.836581 -0.8508203 40.585505 +# [4,] 10.683731 6.609255 36.0328589 10.877677 +# [5,] 11.866922 -47.955235 14.3111350 12.064633 +# [6,] 42.665726 29.639640 -2.4141874 43.797025 + +## ----eval=FALSE--------------------------------------------------------------- +# # Apply bootstrap to each variable +# new.boot.dep.data = apply(original.data, 2, function(r) NNS.meboot(r, reps = 100, rho = .95)) +# +# # Reformat into vectors +# boot.ensemble.vectors = lapply(new.boot.dep.data, function(z) unlist(z["ensemble",])) +# +# # Create matrix from vectors +# new.boot.dep.matrix = do.call(cbind, boot.ensemble.vectors) + +## ----eval=FALSE--------------------------------------------------------------- +# for(i in 1:4) print(cor(new.boot.dep.matrix[,i], original.data[,i], method = "spearman")) +# +# [1] 0.9452863 +# [1] 0.9499478 +# [1] 0.945878 +# [1] 0.9442845 + +## 
----eval=FALSE--------------------------------------------------------------- +# NNS.copula(original.data) +# NNS.copula(new.boot.dep.matrix) +# +# [1] 0.4743531 +# [1] 0.4517661 + +## ----eval=FALSE--------------------------------------------------------------- +# head(original.data) +# head(new.boot.dep.matrix) +# +# x y z x +# [1,] -0.56047565 -0.99579872 -0.5116037 -0.56047565 +# [2,] -0.23017749 -1.03995504 0.2369379 -0.23017749 +# [3,] 1.55870831 -0.01798024 -0.5415892 1.55870831 +# [4,] 0.07050839 -0.13217513 1.2192276 0.07050839 +# [5,] 0.12928774 -2.54934277 0.1741359 0.12928774 +# [6,] 1.71506499 1.04057346 -0.6152683 1.71506499 +# x y z x +# ensemble1 -0.4268047 -0.7794553 -0.6364458 -0.4642642 +# ensemble2 -0.2965744 -1.0682197 0.3297265 -0.2531178 +# ensemble3 1.3302149 0.3054734 -0.4014515 1.4914884 +# ensemble4 0.2257378 0.3108846 1.0603892 0.1728540 +# ensemble5 0.4716743 -3.3344967 -0.1917697 0.4309379 +# ensemble6 1.3984978 1.1881374 -0.5295386 1.5326055 + diff --git a/tools/NNS/vignettes/NNSvignette_05_Sampling.html b/tools/NNS/vignettes/NNSvignette_05_Sampling.html new file mode 100644 index 0000000..660af71 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_05_Sampling.html @@ -0,0 +1,657 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Sampling and Simulation + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Sampling and +Simulation

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +

    NNS offers several novel sampling methods from any +distribution, as well as simulating variables while maintaining their +dependence.

    +
    +

    Sampling

    +
    +

    CDFs

    +

    Cumulative distribution functions (CDFs) represent the probability a +variable \(X\) will take a value less +than or equal to \(x\). \[F(x) = P(X \leq x)\]

    +
    +

    Empirical CDF

    +

    The empirical CDF is a simple construct, provided in base R by the stats +package. We can generate an empirical CDF with the ecdf +function and assign the result to a function (P) that returns the CDF at a +given value of \(X\).

    +
    set.seed(123); x = rnorm(100)
    +ecdf(x)
    +
    ## Empirical CDF 
    +## Call: ecdf(x)
    +##  x[1:100] = -2.3092, -1.9666, -1.6867,  ...,  2.169, 2.1873
    +
    P = ecdf(x)
    +P(0); P(1)
    +
    ## [1] 0.48
    +
    ## [1] 0.83
    +
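
For intuition, the empirical CDF at a target is simply the share of observations at or below that target. A minimal base R sketch, where `manual_cdf` is an illustrative name:

```r
set.seed(123); x <- rnorm(100)
P <- ecdf(x)

# Share of observations less than or equal to each target
manual_cdf <- function(targets, sample) {
  sapply(targets, function(tg) mean(sample <= tg))
}

targets <- c(0, 1)
cdf_builtin <- P(targets)
cdf_manual <- manual_cdf(targets, x)

print(rbind(ecdf = cdf_builtin, manual = cdf_manual))
```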
    +
    +

    Lower Partial Moment CDF +(LPM.ratio)

    +

    The empirical CDF and Lower Partial Moment CDF +(LPM.ratio) are identical when the degree +term of the LPM.ratio is set to zero.

    +

    Degree 0 LPM: \[LPM(0,t,X)=\frac{1}{N}\sum_{n=1}^{N}[\max(t-X_n,0)]^0\] +LPM.ratio is equivalent to the following form for any +target \((t)\) and variable \(X\): \[LPM.ratio(0,t,X)=\frac{LPM(0,t,X)}{LPM(0,t,X)+UPM(0,t,X)}\]

    +

    Using the same targets from our ecdf example above (0,1) +we can compare LPM.ratios.

    +
    LPM.ratio(degree = 0, target = 0, variable = x); LPM.ratio(degree = 0, target = 1, variable = x)
    +
    ## [1] 0.48
    +
    ## [1] 0.83
    +
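
The ratio form can be written out in a few lines of base R. The helpers `lpm`, `upm`, and `lpm_ratio` below are hypothetical illustrations, not the NNS implementations. They assume that the degree 0 term counts observations at or below the target, which is the convention that reproduces the ecdf values.

```r
# Hypothetical helpers (not the NNS functions)
lpm <- function(degree, target, x) {
  # R evaluates 0^0 as 1, so degree 0 is handled as an explicit indicator
  if (degree == 0) mean(x <= target) else mean(pmax(target - x, 0)^degree)
}
upm <- function(degree, target, x) {
  if (degree == 0) mean(x > target) else mean(pmax(x - target, 0)^degree)
}
lpm_ratio <- function(degree, targets, x) {
  sapply(targets, function(tg) {
    l <- lpm(degree, tg, x)
    u <- upm(degree, tg, x)
    l / (l + u)
  })
}

set.seed(123); x <- rnorm(100)

ratio_deg0 <- lpm_ratio(0, c(0, 1), x)  # should agree with ecdf(x) at 0 and 1
ratio_deg1 <- lpm_ratio(1, sort(x), x)  # a smoother CDF-like curve

print(ratio_deg0)
print(range(ratio_deg1))
```

For a positive degree the ratio still runs from 0 at the sample minimum to 1 at the sample maximum, but it weights observations by their distance from the target instead of just counting them.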

    Calculating the probability for every target value in +\(X\), we can plot both methods +to visualize their identical results, with the ecdf function in black +and LPM.ratio in red.

    +

    +
    +
    +

    LPM.ratio degree > 0

    +

    By simply increasing the degree parameter to any +positive real number, we can generate different CDFs of our initial +distribution \(x\).

    +

    +
    +
    +

    Generating PDFs with (LPM.VaR)

    +

    We can now generate distributions using the same insights and +degree manipulation in the corresponding +LPM.VaR function, a la value-at-risk, +providing inverse CDF estimates.

    +

    The general form in the following plots is:

    +

    LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0, x = x)

    +

    Any length percentile can be used to sample from the +underlying distribution \(x\).

    +
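
As a base R analogue of inverse-CDF sampling (not how LPM.VaR is implemented), `quantile()` maps a grid of percentiles back to values of the sample. With 100 observations and 100 evenly spaced percentiles, the default type 7 quantile lands on the order statistics themselves, so the result matches the sorted sample.

```r
set.seed(123); x <- rnorm(100)

# 100 evenly spaced percentiles on 100 observations: recovers sort(x)
p <- seq(0, 1, length.out = 100)
inv_cdf_samples <- quantile(x, probs = p, type = 7, names = FALSE)

print(head(cbind(sorted_x = sort(x), inv_cdf_samples), 5))

# A percentile vector of any length can be used to draw from the same sample
more_samples <- quantile(x, probs = seq(0, 1, length.out = 250), names = FALSE)
print(length(more_samples))
```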

    +

    Below are the first 10 samples from each of the degrees, +compared to our sorted original \(X\).

    +
    degree.0.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0, x = x)
    +degree.0.25.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0.25, x = x)
    +degree.0.5.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 0.5, x = x)
    +degree.1.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 1, x = x)
    +degree.2.samples = LPM.VaR(percentile = seq(0, 1, length.out = 100), degree = 2, x = x)
    +
    +head(data.table::data.table(cbind("original x" = sort(x), degree.0.samples, 
    +                                                          degree.0.25.samples, 
    +                                                          degree.0.5.samples, 
    +                                                          degree.1.samples, 
    +                                                          degree.2.samples)), 10)
    +
    +     original x degree.0.samples degree.0.25.samples degree.0.5.samples
    +  1:  -2.309169        -2.309169           -2.309097         -2.3090915
    +  2:  -1.966617        -1.966617           -1.941190         -1.6935509
    +  3:  -1.686693        -1.686693           -1.599486         -1.4541494
    +  4:  -1.548753        -1.548753           -1.382553         -1.2462731
    +  5:  -1.265396        -1.265396           -1.250823         -1.1453748
    +  6:  -1.265061        -1.265061           -1.176436         -1.0745440
    +  7:  -1.220718        -1.220718           -1.119655         -1.0252742
    +  8:  -1.138137        -1.138137           -1.067793         -0.9868693
    +  9:  -1.123109        -1.123109           -1.026429         -0.9322105
    + 10:  -1.071791        -1.071791           -1.014276         -0.8710942
    +     degree.1.samples degree.2.samples
    +  1:       -2.3091021       -2.3091170
    +  2:       -1.4744653       -1.1614908
    +  3:       -1.2159961       -0.9709972
    +  4:       -1.0823023       -0.8610192
    +  5:       -0.9968028       -0.7810300
    +  6:       -0.9290505       -0.7169770
    +  7:       -0.8666886       -0.6631888
    +  8:       -0.8090433       -0.6170691
    +  9:       -0.7556644       -0.5765608
    + 10:       -0.7069835       -0.5403318
    +
    +
    +
    +
    +

    Simulation

    +
    +

    Bootstrapping (NNS.meboot)

    +

    NNS.meboot is based on the maximum +entropy bootstrap, available in the R-package meboot. This +procedure is specifically designed for time-series and avoids the IID +assumption in traditional methods.

    +

    The ability to sample at specified correlations ensures that the full +spectrum of future paths is covered. Typical Monte Carlo samples +are restricted to correlations with the original data within [-0.3, 0.3].

    +

    We will generate 1 replicate of \(X\) for each value of a sequence of \(\rho\) values (the \(ensemble\)), and then plot the results +compared to our original \(X\) (black +line). NNS.MC is a streamlined wrapper +function for this functionality of +NNS.meboot.

    +
    boots = NNS.MC(x, reps = 1, lower_rho = -1, upper_rho = 1, by = .5)$replicates
    +reps = do.call(cbind, boots)
    +
    +
    +# Set ylim on the call that opens the plot; ylim has no effect in lines()
    +matplot(reps, type = "l", col = rainbow(length(boots)), ylim = range(c(x, reps)))
    +lines(x, lwd = 3)
    +

    +

    Checking our replicate correlations:

    +
    sapply(boots, function(r) cor(r, x, method = "spearman"))
    +
    +    rho = 1   rho = 0.5     rho = 0  rho = -0.5    rho = -1 
    + 0.99732373  0.51147915  0.01036904 -0.48720072 -0.98294629 
    +

    More replicates and ensembles thereof can be generated for any number +of \(\rho\) values.

    +
    +

    target_drift Specification

    +

    We can also specify a target drift in our replicates with the +target_drift parameter.

    +
    boots = NNS.MC(x, reps = 1, lower_rho = -1, upper_rho = 1, by = .5, target_drift = 0.05)$replicates
    +reps = do.call(cbind, boots)
    +
    +plot(x, type = "l", lwd = 3, ylim = c(min(c(x, reps)), max(c(x, reps))))
    +matplot(reps, type = "l", col = rainbow(length(boots)), add = TRUE)
    +

    +

    Please see the full NNS.meboot and +NNS.MC argument documentation.

    +
    +
    +
    +

    Simulating a Multivariate Dependence Structure

    +

    Analogous to an empirical copula transformation, we can generate +new data from the dependence structure of our +original data via the following steps:

    +
      +
    • Determine the dependence structure:
    • +
    +

    This is accomplished using +LPM.ratio(1, x, x) for continuous +variables, and LPM.ratio(0, x, x) for +discrete variables, which are the empirical CDFs of the marginal +variables.

    +
      +
    • Generate or supply new data:
    • +
    +

    new data does not have to be of the same distribution or +dimension as the original data, nor does each dimension of +new data have to share a distribution type.

    +
      +
    • Apply dependence structure to +new data:
    • +
    +

    We then utilize LPM.VaR to ascertain +new data values corresponding to original data +position mappings, and return a matrix of these transformed values with +the same dimensions as new.data.

    +
    set.seed(123)
    +x = rnorm(1000); y = rnorm(1000); z = rnorm(1000)
    +
    +# Add variable x to original data to avoid total independence (example only)
    +original.data = cbind(x, y, z, x)
    +
    +# Determine dependence structure
    +dep.structure = apply(original.data, 2, function(x) LPM.ratio(degree = 1, target = x, variable = x))
    +  
    +# Generate new data with different mean, sd and length (or distribution type)
    +new.data = sapply(1:ncol(original.data), function(x) rnorm(nrow(original.data)*2, mean = 10, sd = 20))
    +
    +# Apply dependence structure to new data
    +new.dep.data = sapply(1:ncol(original.data), function(x) LPM.VaR(percentile = dep.structure[,x], degree = 1, x = new.data[,x]))
    +
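
The three steps can also be sketched with base R only, substituting `ecdf()` for the degree 0 LPM.ratio and `quantile(type = 1)` for the inverse mapping. This is an illustrative analogue under those substitutions, with made-up variable names, and not the NNS procedure itself.

```r
set.seed(123)
n <- 1000
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)  # dependent on x by construction
original <- cbind(x, y)

# Step 1: position of each observation within its own marginal
u <- apply(original, 2, function(v) ecdf(v)(v))

# Step 2: new data with different distributions and a different length
new_data <- cbind(rexp(2 * n, rate = 0.1), runif(2 * n, 50, 60))

# Step 3: read the new marginals at the original positions
new_dep <- sapply(seq_len(ncol(original)), function(j) {
  quantile(new_data[, j], probs = u[, j], type = 1, names = FALSE)
})

rho_original <- cor(original, method = "spearman")[1, 2]
rho_new <- cor(new_dep, method = "spearman")[1, 2]

print(c(rho_original = rho_original, rho_new = rho_new))
print(dim(new_dep))
```

Each column is passed through a strictly increasing map, so the rank ordering within every column is preserved and the rank correlation between columns carries over to the new values.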
    +

    Compare Multivariate Dependence Structures

    +

    Similar dependence with radically different values, since we used +\(N(10, 20)\) in place of our original +\(N(0,1)\) observations.

    +
    NNS.copula(original.data)
    +NNS.copula(new.dep.data)
    +
    +[1] 0.4743531
    +[1] 0.4753264
    +
    head(original.data)
    +head(new.dep.data)
    +
    +               x           y          z           x
    +[1,] -0.56047565 -0.99579872 -0.5116037 -0.56047565
    +[2,] -0.23017749 -1.03995504  0.2369379 -0.23017749
    +[3,]  1.55870831 -0.01798024 -0.5415892  1.55870831
    +[4,]  0.07050839 -0.13217513  1.2192276  0.07050839
    +[5,]  0.12928774 -2.54934277  0.1741359  0.12928774
    +[6,]  1.71506499  1.04057346 -0.6152683  1.71506499
    +          [,1]       [,2]       [,3]      [,4]
    +[1,] -2.028109 -10.498044 -0.2090467 -1.682949
    +[2,]  4.608303 -11.390485 15.6213689  4.852534
    +[3,] 39.478741   8.836581 -0.8508203 40.585505
    +[4,] 10.683731   6.609255 36.0328589 10.877677
    +[5,] 11.866922 -47.955235 14.3111350 12.064633
    +[6,] 42.665726  29.639640 -2.4141874 43.797025
    +
    +
    +
    +

    Alternative Using NNS.meboot

    +

    Alternatively, if we wish to keep the simulated values close to the +original data, we can apply the NNS.meboot +procedure to each of the variables.

    +

    We will generate 100 replicates at \(\rho = 0.95\) for each variable in our +original.data, use their ensemble and note the +multivariate dependence among our new.boot.dep.data.

    +
    # Apply bootstrap to each variable
    +new.boot.dep.data = apply(original.data, 2, function(r) NNS.meboot(r, reps = 100, rho = .95))
    +
    +# Reformat into vectors
    +boot.ensemble.vectors = lapply(new.boot.dep.data, function(z) unlist(z["ensemble",]))
    +
    +# Create matrix from vectors
    +new.boot.dep.matrix = do.call(cbind, boot.ensemble.vectors)
    +

    Checking ensemble correlations with +original.data:

    +
    for(i in 1:4) print(cor(new.boot.dep.matrix[,i], original.data[,i], method = "spearman"))
    +
    +[1] 0.9452863
    +[1] 0.9499478
    +[1] 0.945878
    +[1] 0.9442845
    +
    +

    Compare Multivariate Dependence Structures

    +

    Similar dependence with similar values.

    +
    NNS.copula(original.data)
    +NNS.copula(new.boot.dep.matrix)
    +
    +[1] 0.4743531
    +[1] 0.4517661
    +
    head(original.data)
    +head(new.boot.dep.matrix)
    +
    +               x           y          z           x
    +[1,] -0.56047565 -0.99579872 -0.5116037 -0.56047565
    +[2,] -0.23017749 -1.03995504  0.2369379 -0.23017749
    +[3,]  1.55870831 -0.01798024 -0.5415892  1.55870831
    +[4,]  0.07050839 -0.13217513  1.2192276  0.07050839
    +[5,]  0.12928774 -2.54934277  0.1741359  0.12928774
    +[6,]  1.71506499  1.04057346 -0.6152683  1.71506499
    +                   x          y          z          x
    +ensemble1 -0.4268047 -0.7794553 -0.6364458 -0.4642642
    +ensemble2 -0.2965744 -1.0682197  0.3297265 -0.2531178
    +ensemble3  1.3302149  0.3054734 -0.4014515  1.4914884
    +ensemble4  0.2257378  0.3108846  1.0603892  0.1728540
    +ensemble5  0.4716743 -3.3344967 -0.1917697  0.4309379
    +ensemble6  1.3984978  1.1881374 -0.5295386  1.5326055
    +
    +
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.R b/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.R new file mode 100644 index 0000000..2e96e9f --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.R @@ -0,0 +1,103 @@ +## ----setup, include=FALSE, message=FALSE-------------------------------------- +knitr::opts_chunk$set(echo = TRUE) +library(NNS) +library(data.table) +data.table::setDTthreads(2L) +options(mc.cores = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) +RcppParallel::setThreadOptions(numThreads = 1) + +## ----setup2,message=FALSE,warning = FALSE------------------------------------- +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----cars, fig.width=10, fig.align='center'----------------------------------- +mpg_auto_trans = mtcars[mtcars$am==1, "mpg"] +mpg_man_trans = mtcars[mtcars$am==0, "mpg"] + +NNS.ANOVA(control = mpg_man_trans, treatment = mpg_auto_trans, robust = TRUE) + +## ----cars2, warning=FALSE----------------------------------------------------- +wilcox.test(mpg ~ am, data=mtcars) + +## ----equalmeans, echo=TRUE, fig.width=10, fig.align='center'------------------ +set.seed(123) +x = rnorm(1000, mean = 0, sd = 1) +y = rnorm(1000, mean = 0, sd = 2) + +NNS.ANOVA(control = x, treatment = y, + means.only = TRUE, robust = TRUE, plot = TRUE) + +t.test(x,y) + +## ----unequalmeans, echo=TRUE, fig.width=10, fig.align='center'---------------- +set.seed(123) +x = rnorm(1000, mean = 0, sd = 1) +y = rnorm(1000, mean = 1, sd = 1) + +NNS.ANOVA(control = x, treatment = y, + means.only = TRUE, robust = TRUE, plot = TRUE) + +t.test(x,y) + +## ----unequalmedians, echo=TRUE, fig.width=10, fig.align='center'-------------- +NNS.ANOVA(control = x, treatment = y, + means.only = TRUE, medians = TRUE, robust = TRUE, plot = TRUE) + +## ----stochsuperiority, echo=TRUE, eval=TRUE----------------------------------- +set.seed(123) +x = rnorm(1000, mean = 0, sd = 1) +y = 
rnorm(1000, mean = 1, sd = 1) + +NNS.SS(x, y) + +## ----stochsuperiorityci, echo=TRUE, eval = FALSE------------------------------ +# NNS.SS(x, y, confidence.interval = TRUE, reps = 999, ci = 0.95)[1:5] +# +# $p_gt +# [1] 0.233915 +# +# $p_tie +# [1] 0 +# +# $p_star +# [1] 0.233915 +# +# $lower +# [1] 0.2105631 +# +# $upper +# [1] 0.2537789 + +## ----stochsuperioritydiscrete, echo=TRUE, eval=TRUE--------------------------- +set.seed(123) +x = sample(1:5, 100, replace = TRUE) +y = sample(1:5, 100, replace = TRUE) + +NNS.SS(x, y) + +## ----stochdom, fig.width=7, fig.align='center'-------------------------------- +set.seed(123) +x = rnorm(1000, mean = 0, sd = 1) +y = rnorm(1000, mean = 1, sd = 1) + +NNS.FSD(x, y) + +## ----stochdomset, eval=TRUE--------------------------------------------------- +set.seed(123) +x1 = rnorm(1000) +x2 = x1 + 1 +x3 = rnorm(1000) +x4 = x3 + 1 +x5 = rnorm(1000) +x6 = x5 + 1 +x7 = rnorm(1000) +x8 = x7 + 1 + +NNS.SD.efficient.set(cbind(x1, x2, x3, x4, x5, x6, x7, x8), degree = 1, status = FALSE) + +## ----stochdomclust, eval=TRUE, fig.width=7, fig.align='center'---------------- +NNS.SD.cluster(cbind(x1, x2, x3, x4, x5, x6, x7, x8), degree = 1, dendrogram = TRUE) + diff --git a/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.html b/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.html new file mode 100644 index 0000000..b303598 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_06_Comparing_Distributions.html @@ -0,0 +1,775 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Comparing Distributions + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Comparing +Distributions

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Comparing Distributions

    +

    NNS offers a multitude of ways to test +if distributions came from the same population, or if they share the +same mean or median. The underlying function for these tests is +NNS.ANOVA().

    +

    The output from NNS.ANOVA() is a +Certainty statistic, which compares the CDFs of the distributions +at several shared quantiles and normalizes the similarity of these +points to be within the interval \([0,1]\), with 1 representing identical +distributions. For a complete comparison of Certainty with +common p-values and the role of power, please see the References.

    +
    +

    Test if Same Population

    +

    Below we test whether automatic transmissions and +manual transmissions have significantly different mpg +distributions per the mtcars dataset.

    +

    The plot on the left shows the robust Certainty +estimate, reflecting the distribution of Certainty +estimates over 100 random permutations of both variables. The plot on +the right illustrates the control and treatment variables, along with +the grand mean among variables, and the confidence interval associated +with the control mean.

    +
    # In mtcars, am == 0 is automatic and am == 1 is manual
    +mpg_auto_trans = mtcars[mtcars$am==0, "mpg"]
    +mpg_man_trans = mtcars[mtcars$am==1, "mpg"]
    +
    +NNS.ANOVA(control = mpg_auto_trans, treatment = mpg_man_trans, robust = TRUE)
    +

    +
    ## $Control
    +## [1] 17.14737
    +## 
    +## $Treatment
    +## [1] 24.39231
    +## 
    +## $Grand_Statistic
    +## [1] 20.09063
    +## 
    +## $Control_CDF
    +## [1] 0.8708501
    +## 
    +## $Treatment_CDF
    +## [1] 0.1294878
    +## 
    +## $Certainty
    +## [1] 0.02345583
    +## 
    +## $`Effect_Size_LB.2.5%`
    +## [1] 2.4708
    +## 
    +## $`Effect_Size_UB.97.5%`
    +## [1] 11.88554
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +## 
    +## $`Robust Certainty Estimate`
    +## [1] 0.01113453
    +## 
    +## $`Lower 95% CI`
    +## [1] 0
    +## 
    +## $`Upper 95% CI`
    +## [1] 0.1046841
    +

    The Certainty shows that these two distributions clearly +do not come from the same population. This is verified with the +Mann-Whitney-Wilcoxon test, a nonparametric test of identical +distributions that likewise does not assume normality of the +underlying data.

    +
    wilcox.test(mpg ~ am, data=mtcars) 
    +
    ## 
    +##  Wilcoxon rank sum test with continuity correction
    +## 
    +## data:  mpg by am
    +## W = 42, p-value = 0.001871
    +## alternative hypothesis: true location shift is not equal to 0
    +
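
The reported W statistic can be reproduced by hand from the pooled ranks: sum the ranks of the first group (am == 0) and subtract that group's minimum possible rank sum. The sketch below uses base R only, and `W_manual` is an illustrative name.

```r
mpg <- mtcars$mpg
am <- mtcars$am

r <- rank(mpg)      # pooled ranks; tied values receive midranks
n0 <- sum(am == 0)  # size of the first group

# Rank sum of the first group minus its smallest possible value
W_manual <- sum(r[am == 0]) - n0 * (n0 + 1) / 2

# exact = FALSE requests the normal approximation, avoiding the ties warning
W_builtin <- unname(wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)$statistic)

print(c(manual = W_manual, builtin = W_builtin))
```

A value of W far below its null expectation, which is half the product of the two group sizes, indicates that the first group's values sit mostly below the second group's.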
    +
    +

    Test if means are Equal

    +

    Here we provide the output from the +NNS.ANOVA() and t.test() +functions on two Normal distribution samples whose +population means are equal by construction.

    +
    set.seed(123)
    +x = rnorm(1000, mean = 0, sd = 1)
    +y = rnorm(1000, mean = 0, sd = 2)
    +
    +NNS.ANOVA(control = x, treatment = y,
    +          means.only = TRUE, robust = TRUE, plot = TRUE)
    +

    +
    ## $Control
    +## [1] 0.01612787
    +## 
    +## $Treatment
    +## [1] 0.08493051
    +## 
    +## $Grand_Statistic
    +## [1] 0.05052919
    +## 
    +## $Control_CDF
    +## [1] 0.5218858
    +## 
    +## $Treatment_CDF
    +## [1] 0.4893919
    +## 
    +## $Certainty
    +## [1] 0.912545
    +## 
    +## $`Effect_Size_LB.2.5%`
    +## [1] -0.1215839
    +## 
    +## $`Effect_Size_UB.97.5%`
    +## [1] 0.2556542
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +## 
    +## $`Robust Certainty Estimate`
    +## [1] 0.9183685
    +## 
    +## $`Lower 95% CI`
    +## [1] 0.7311398
    +## 
    +## $`Upper 95% CI`
    +## [1] 0.9928339
    +
    t.test(x,y)
    +
    ## 
    +##  Welch Two Sample t-test
    +## 
    +## data:  x and y
    +## t = -0.96711, df = 1454.4, p-value = 0.3336
    +## alternative hypothesis: true difference in means is not equal to 0
    +## 95 percent confidence interval:
    +##  -0.20835512  0.07074984
    +## sample estimates:
    +##  mean of x  mean of y 
    +## 0.01612787 0.08493051
    +
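
For reference, the Welch statistic and its degrees of freedom printed by t.test() can be recomputed directly from the sample means and variances. This is a plain base R sketch with illustrative variable names.

```r
set.seed(123)
x <- rnorm(1000, mean = 0, sd = 1)
y <- rnorm(1000, mean = 0, sd = 2)

nx <- length(x); ny <- length(y)
vx <- var(x) / nx  # squared standard error of mean(x)
vy <- var(y) / ny  # squared standard error of mean(y)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

tt <- t.test(x, y)
print(c(t_manual = t_manual, t_builtin = unname(tt$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(tt$parameter)))
```

The unequal variances are what pull the degrees of freedom below the pooled value of nx + ny - 2.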
    +
    +

    Test if means are Unequal

    +

    By altering the mean of the y variable, we can start to +see the sensitivity of the results from the two methods, where both +firmly reject the null hypothesis of identical means.

    +
    set.seed(123)
    +x = rnorm(1000, mean = 0, sd = 1)
    +y = rnorm(1000, mean = 1, sd = 1)
    +
    +NNS.ANOVA(control = x, treatment = y,
    +          means.only = TRUE, robust = TRUE, plot = TRUE)
    +

    +
    ## $Control
    +## [1] 0.01612787
    +## 
    +## $Treatment
    +## [1] 1.042465
    +## 
    +## $Grand_Statistic
    +## [1] 0.5292966
    +## 
    +## $Control_CDF
    +## [1] 0.7862176
    +## 
    +## $Treatment_CDF
    +## [1] 0.2197938
    +## 
    +## $Certainty
    +## [1] 0.1824463
    +## 
    +## $`Effect_Size_LB.2.5%`
    +## [1] 0.900412
    +## 
    +## $`Effect_Size_UB.97.5%`
    +## [1] 1.148409
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +## 
    +## $`Robust Certainty Estimate`
    +## [1] 0.1788691
    +## 
    +## $`Lower 95% CI`
    +## [1] 0.1484567
    +## 
    +## $`Upper 95% CI`
    +## [1] 0.2114944
    +
    t.test(x,y)
    +
    ## 
    +##  Welch Two Sample t-test
    +## 
    +## data:  x and y
    +## t = -22.933, df = 1997.4, p-value < 2.2e-16
    +## alternative hypothesis: true difference in means is not equal to 0
    +## 95 percent confidence interval:
    +##  -1.1141064 -0.9385684
    +## sample estimates:
    +##  mean of x  mean of y 
    +## 0.01612787 1.04246525
    +

    The effect size from NNS.ANOVA() is calculated from the confidence interval of the control mean, and the specified y shift of 1 falls within the reported lower and upper effect-size bounds (about 0.90 to 1.15).

    +
    +
    +

    Medians

    +

    To test medians instead of means, set both means.only = TRUE and medians = TRUE in NNS.ANOVA().

    +
    NNS.ANOVA(control = x, treatment = y,
    +          means.only = TRUE, medians = TRUE, robust = TRUE, plot = TRUE)
    +

    +
    ## $Control
    +## [1] 0.009209639
    +## 
    +## $Treatment
    +## [1] 1.054852
    +## 
    +## $Grand_Statistic
    +## [1] 0.532031
    +## 
    +## $Control_CDF
    +## [1] 0.704
    +## 
    +## $Treatment_CDF
    +## [1] 0.305
    +## 
    +## $Certainty
    +## [1] 0.3497634
    +## 
    +## $`Effect_Size_LB.2.5%`
    +## [1] 0.8659585
    +## 
    +## $`Effect_Size_UB.97.5%`
    +## [1] 1.222394
    +## 
    +## $Confidence_Level
    +## [1] 0.95
    +## 
    +## $`Robust Certainty Estimate`
    +## [1] 0.3448958
    +## 
    +## $`Lower 95% CI`
    +## [1] 0.2856004
    +## 
    +## $`Upper 95% CI`
    +## [1] 0.4308527
    +
    +
    +
    +

    Stochastic Superiority

    +

    Stochastic superiority asks a different question than equality of means or equality of distributions. Rather than testing whether two samples came from the same population, or whether they share the same mean or median, stochastic superiority measures the probability that a random draw from one distribution exceeds a random draw from another.

    +

    For two random variables \(X\) and \(Y\), the stochastic superiority probability is:

    +

    \[ P(X > Y) \]

    +

    and with ties accounted for, the tie-adjusted stochastic superiority measure is:

    +

    \[ P^* = P(X > Y) + \frac{1}{2} P(X = Y) \]

    +

    A value of \(P^* = 0.5\) indicates no directional advantage, values above \(0.5\) favor \(X\), and values below \(0.5\) favor \(Y\).
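
The definition above can be sketched in base R by comparing every pair of observations. This is only an illustration of the formula, not the NNS implementation; `superiority` is a hypothetical helper name introduced here.

```r
# Tie-adjusted stochastic superiority by brute force over all (x[i], y[j]) pairs.
# `superiority` is a hypothetical helper for illustration, not an NNS function.
superiority <- function(x, y) {
  p_gt  <- mean(outer(x, y, ">"))    # share of pairs with x[i] >  y[j]
  p_tie <- mean(outer(x, y, "=="))   # share of pairs with x[i] == y[j]
  list(p_gt = p_gt, p_tie = p_tie, p_star = p_gt + 0.5 * p_tie)
}

# Small discrete example: 9 pairs, 2 with x > y and 2 ties
res <- superiority(c(1, 2, 3), c(2, 2, 4))
res$p_star   # 2/9 + 0.5 * 2/9 = 1/3
```

Because the comparison is made over all pairs, memory grows with length(x) * length(y), so this sketch is only suitable for moderate sample sizes.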

    +

    This differs from stochastic dominance. Stochastic superiority is a pairwise exceedance probability, while stochastic dominance requires one distribution to be preferred to another over the entire shared support.

    +

    Below is an example using the same data-generating process as the unequal means example.

    +
    set.seed(123)
    +x = rnorm(1000, mean = 0, sd = 1)
    +y = rnorm(1000, mean = 1, sd = 1)
    +
    +NNS.SS(x, y)
    +
    ## $p_gt
    +## [1] 0.233915
    +## 
    +## $p_tie
    +## [1] 0
    +## 
    +## $p_star
    +## [1] 0.233915
    +

    Since \(y\) was generated with a higher mean, the stochastic superiority probability for \(x\) relative to \(y\) should be less than \(0.5\), indicating that a draw from \(x\) is less likely to exceed a draw from \(y\). The reported p_star of about 0.234 agrees with this.

    +

    We can also obtain confidence intervals for the tie-adjusted superiority probability using maximum entropy bootstrap replicates.

    +
    NNS.SS(x, y, confidence.interval = TRUE, reps = 999, ci = 0.95)[1:5]
    +
    +$p_gt
    +[1] 0.233915
    +
    +$p_tie
    +[1] 0
    +
    +$p_star
    +[1] 0.233915
    +
    +$lower
    +[1] 0.2105631
    +
    +$upper
    +[1] 0.2537789
    +

    This provides an interpretable effect size for directional comparison between two distributions without requiring identical distributions or equal variances.

    +

    For discrete variables, ties may occur with positive probability, and the reported p_tie and p_star values reflect that adjustment explicitly.

    +
    set.seed(123)
    +x = sample(1:5, 100, replace = TRUE)
    +y = sample(1:5, 100, replace = TRUE)
    +
    +NNS.SS(x, y)
    +
    ## $p_gt
    +## [1] 0.3982
    +## 
    +## $p_tie
    +## [1] 0.1992
    +## 
    +## $p_star
    +## [1] 0.4978
    +
    +
    +

    Stochastic Dominance

    +

    Another way to compare distributions is to test for stochastic dominance. First, second, and third degree stochastic dominance tests are available in NNS via:

    +
    +
    • NNS.FSD()
    • NNS.SSD()
    • NNS.TSD()
    +
    set.seed(123)
    +x = rnorm(1000, mean = 0, sd = 1)
    +y = rnorm(1000, mean = 1, sd = 1)
    +
    +NNS.FSD(x, y)
    +

    +
    ## [1] "Y FSD X"
    +

    NNS.FSD() correctly identifies the shift in the y variable we specified when testing for unequal means.
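
As a rough cross-check of what first degree dominance means, the sketch below compares empirical CDFs in base R: one sample dominates another when its CDF is never above the other's and is strictly below it somewhere. This is an assumption-laden illustration, not how NNS.FSD() is implemented; `fsd_dominates` is a hypothetical helper, and y is built as an exact shift of x (rather than an independent sample) so the result is deterministic.

```r
# Empirical-CDF check of first degree stochastic dominance.
# `fsd_dominates(a, b)` is a hypothetical helper: TRUE when a dominates b.
fsd_dominates <- function(a, b) {
  grid <- sort(unique(c(a, b)))   # evaluate both CDFs on the pooled support
  Fa <- ecdf(a)(grid)
  Fb <- ecdf(b)(grid)
  all(Fa <= Fb) && any(Fa < Fb)   # never above, strictly below somewhere
}

set.seed(123)
x <- rnorm(1000, mean = 0, sd = 1)
y <- x + 1                         # exact shift of x, for a deterministic example

y_over_x <- fsd_dominates(y, x)
x_over_y <- fsd_dominates(x, y)
```

With independent samples, as in the NNS.FSD() example above, the two empirical CDFs can cross in the tails by chance, so a strict check of this kind may be more conservative than the package's test.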

    +
    +

    Stochastic Dominant Efficient Sets

    +

    NNS can also isolate the set of variables that are not dominated by any other variable, using the NNS.SD.efficient.set() function.

    +

    In the example below, x2, x4, x6, and x8 each dominate the distribution they were shifted from (x1, x3, x5, and x7, respectively) yet do not dominate one another, so they make up the first degree stochastic dominance efficient set.

    +
    set.seed(123)
    +x1 = rnorm(1000)
    +x2 = x1 + 1
    +x3 = rnorm(1000)
    +x4 = x3 + 1
    +x5 = rnorm(1000)
    +x6 = x5 + 1
    +x7 = rnorm(1000)
    +x8 = x7 + 1
    +
    +NNS.SD.efficient.set(cbind(x1, x2, x3, x4, x5, x6, x7, x8), degree = 1, status = FALSE)
    +
    ## [1] "x4" "x2" "x8" "x6"
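
The idea of an efficient set can be sketched in base R as "keep every column that no other column dominates". The code below is a simplified illustration using an empirical-CDF dominance check; it is not the NNS.SD.efficient.set() algorithm, and both helper names are hypothetical.

```r
# Hypothetical helpers for illustration only (not NNS functions).
fsd_dominates <- function(a, b) {
  grid <- sort(unique(c(a, b)))
  Fa <- ecdf(a)(grid)
  Fb <- ecdf(b)(grid)
  all(Fa <= Fb) && any(Fa < Fb)    # a dominates b
}

efficient_set <- function(m) {
  idx <- seq_len(ncol(m))
  keep <- vapply(idx, function(j) {
    # column j stays if no other column k dominates it
    !any(vapply(idx[-j], function(k) fsd_dominates(m[, k], m[, j]), logical(1)))
  }, logical(1))
  colnames(m)[keep]
}

set.seed(123)
x1 <- rnorm(1000); x2 <- x1 + 1
x3 <- rnorm(1000); x4 <- x3 + 1

eff <- efficient_set(cbind(x1, x2, x3, x4))
```

Since x2 and x4 are exact upward shifts of x1 and x3, the unshifted columns are always excluded; whether two independently drawn columns dominate one another depends on the sample.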
    +
    +
    +

    Stochastic Dominant Clusters

    +

    Further, we can assign clusters to non-dominated constituents and represent the clustering in a dendrogram.

    +
    NNS.SD.cluster(cbind(x1, x2, x3, x4, x5, x6, x7, x8), degree = 1, dendrogram = TRUE)
    +

    +
    ## $Clusters
    +## $Clusters$Cluster_1
    +## [1] "x4" "x2" "x8" "x6"
    +## 
    +## $Clusters$Cluster_2
    +## [1] "x3" "x1" "x7" "x5"
    +## 
    +## 
    +## $Dendrogram
    +## 
    +## Call:
    +## hclust(d = dist_matrix, method = "complete")
    +## 
    +## Cluster method   : complete 
    +## Number of objects: 8
    +
    +
    diff --git a/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.R b/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.R
    new file mode 100644
    index 0000000..37e3797
    --- /dev/null
    +++ b/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.R
    @@ -0,0 +1,232 @@
    +## ----setup, include=FALSE, message=FALSE--------------------------------------
    +knitr::opts_chunk$set(echo = TRUE)
    +library(NNS)
    +library(data.table)
    +data.table::setDTthreads(1L)
    +options(mc.cores = 1)
    +RcppParallel::setThreadOptions(numThreads = 1)
    +Sys.setenv("OMP_THREAD_LIMIT" = 1)
    +
    +## ----setup2, message=FALSE, warning=FALSE-------------------------------------
    +library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +## ----linear-------------------------------------------------------------------
    +x = seq(-5, 5, .05); y = x ^ 3
    +
    +for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)}
    +
    +## ----x part,results='hide'----------------------------------------------------
    +for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}
    +
    +## ----res2, echo=FALSE---------------------------------------------------------
    +NNS.part(x,y,order = 4, type = "XONLY")
    +
    +## ----depreg},results='hide'---------------------------------------------------
    +for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE, type = "XONLY") ; NNS.reg(x, y, order = i, ncores = 1)}
    +
    +## ----nonlinear,fig.width=5,fig.height=3,fig.align = "center"------------------
    +NNS.reg(x, y, ncores = 1)
    +
    +## ----nonlinear multi,fig.width=5,fig.height=3,fig.align = "center"------------
    +f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
    +y = x ; z <- expand.grid(x, y)
    +g = f(z[ , 1], z[ , 2])
    +NNS.reg(z, g, order = "max", plot = FALSE, ncores = 1)
    +
    +## ----nonlinear_class,fig.width=5,fig.height=3,fig.align = "center", message = FALSE----
    +NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation
    +
    +## ----nonlinear_class2,fig.width=5,fig.height=3,fig.align = "center", message = FALSE, echo=FALSE----
    +a = NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1, plot = FALSE)$equation
    +
    +## ----nonlinear class threshold,fig.width=5,fig.height=3,fig.align = "center"----
    +NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation
    +
    +## ----nonlinear class threshold 2,fig.width=5,fig.height=3,fig.align = "center", echo=FALSE----
    +a = NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1, plot = FALSE)$equation
    +
    +## ----final,fig.width=5,fig.height=3,fig.align = "center"----------------------
    +NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
    +
    +## ----class,fig.width=5,fig.height=3,fig.align = "center", message=FALSE-------
    +NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
    +
    +## ----stack,fig.width=5,fig.height=3,fig.align = "center", message=FALSE, eval=FALSE----
    +# NNS.stack(IVs.train = iris[ , 1 : 4],
    +#           DV.train = iris[ , 5],
    +#           IVs.test = iris[1 : 10, 1 : 4],
    +#           dim.red.method = "cor",
    +#           obj.fn = expression( mean(round(predicted) == actual) ),
    +#           objective = "max", type = "CLASS",
    +#           folds = 1, ncores = 1)
    +
    +## ----stackevalres, eval = FALSE-----------------------------------------------
    +# Folds Remaining = 0
    +# Current NNS.reg(... , threshold = 0.9350 ) | eval(obj.fn) = 1.000000 | MAX Iterations Remaining = 2
    +# Current NNS.reg(... , threshold = 0.7950 ) | eval(obj.fn) = 0.973684 | MAX Iterations Remaining = 1
    +# Current NNS.reg(... , threshold = 0.4400 ) | eval(obj.fn) = 0.894737 | MAX Iterations Remaining = 0
    +# Current NNS.reg(. , n.best = 1 ) | eval(obj.fn) = 0.868421 | MAX Iterations Remaining = 12
    +# Current NNS.reg(. , n.best = 2 ) | eval(obj.fn) = 0.736842 | MAX Iterations Remaining = 11
    +# Current NNS.reg(. , n.best = 3 ) | eval(obj.fn) = 0.763158 | MAX Iterations Remaining = 10
    +# Current NNS.reg(. , n.best = 4 ) | eval(obj.fn) = 0.736842 | MAX Iterations Remaining = 9
    +# $OBJfn.reg
    +# [1] 0.9733333
    +#
    +# $NNS.reg.n.best
    +# [1] 1
    +#
    +# $probability.threshold
    +# [1] 0.495
    +#
    +# $OBJfn.dim.red
    +# [1] 0.9666667
    +#
    +# $NNS.dim.red.threshold
    +# [1] 0.935
    +#
    +# $reg
    +# [1] 1 1 1 1 1 1 1 1 1 1
    +#
    +# $reg.pred.int
    +# NULL
    +#
    +# $dim.red
    +# [1] 1 1 1 1 1 1 1 1 1 1
    +#
    +# $dim.red.pred.int
    +# NULL
    +#
    +# $stack
    +# [1] 1 1 1 1 1 1 1 1 1 1
    +#
    +# $pred.int
    +# NULL
    +
    +## ----stack2, message = FALSE,fig.width=5,fig.height=3,fig.align = "center",results='hide', eval = FALSE----
    +# set.seed(123)
    +# x = rnorm(100); y = rnorm(100)
    +#
    +# nns.params = NNS.stack(IVs.train = cbind(x, x),
    +#                        DV.train = y,
    +#                        method = 1, ncores = 1)
    +
    +## ----stack2optim, echo = FALSE------------------------------------------------
    +set.seed(123)
    +x = rnorm(100); y = rnorm(100)
    +
    +nns.params = list()
    +nns.params$NNS.reg.n.best = 100
    +
    +## ----stack2res, fig.width=5,fig.height=3,fig.align = "center",results='hide'----
    +NNS.reg(cbind(x, x), y,
    +        n.best = nns.params$NNS.reg.n.best,
    +        point.est = cbind(x, x),
    +        residual.plot = TRUE,
    +        ncores = 1, confidence.interval = .95)
    +
    +## ----smooth, fig.width=5,fig.height=3,fig.align = "center",results='hide'-----
    +NNS.reg(x, y, smooth = TRUE)
    +
    +## ----uniimpute, eval=FALSE----------------------------------------------------
    +# set.seed(123)
    +#
    +# # Univariate predictor with nonlinear signal
    +# n <- 400
    +# x <- sort(runif(n, -3, 3))
    +# y <- sin(x) + 0.2 * x^2 + rnorm(n, 0, 0.25)
    +#
    +# # Induce ~25% MCAR missingness in y
    +# miss <- rbinom(n, 1, 0.25) == 1
    +# y_mis <- y
    +# y_mis[miss] <- NA
    +#
    +# # ---- Increasing dimensions trick ----
    +# # Duplicate x so the distance operates in a 2D space: cbind(x, x).
    +# # This sharpens nearest-neighbor selection even in a nominally univariate setting.
    +# x2_train <- cbind(x[!miss], x[!miss])
    +# x2_miss <- cbind(x[miss], x[miss])
    +#
    +# # 1-NN donor imputation with NNS.reg
    +# y_hat_uni <- NNS::NNS.reg(
    +#   x = x2_train, # predictors (duplicated x)
    +#   y = y[!miss], # observed responses
    +#   point.est = x2_miss, # rows to impute
    +#   order = "max", # dependence-maximizing order
    +#   n.best = 1, # 1-NN donor
    +#   plot = FALSE
    +# )$Point.est
    +#
    +# # Fill back
    +# y_completed_uni <- y_mis
    +# y_completed_uni[miss] <- y_hat_uni
    +#
    +# # Plot observed vs imputed (NNS 1-NN)
    +# plot(x, y, pch = 1, col = "steelblue", cex = 1.5, lwd = 2,
    +#      xlab = "x", ylab = "y", main = "NNS 1-NN Imputation")
    +# points(x[miss], y_hat_uni, col = "red", pch = 15, cex = 1.3)
    +#
    +# legend("topleft",
    +#        legend = c("Observed", "Imputed (NNS 1-NN)"),
    +#        col = c("steelblue", "red"),
    +#        pch = c(1, 15),
    +#        pt.lwd = c(2, NA),
    +#        bty = "n")
    +
    +## ----multiimpute, eval=FALSE--------------------------------------------------
    +# set.seed(123)
    +#
    +# # Multivariate predictors with nonlinear & interaction structure
    +# n <- 800
    +# X <- cbind(
    +#   x1 = rnorm(n),
    +#   x2 = runif(n, -2, 2),
    +#   x3 = rnorm(n, 0, 1)
    +# )
    +#
    +# f <- function(x1, x2, x3) 1.1*x1 - 0.8*x2 + 0.5*x3 + 0.6*x1*x2 - 0.4*x2*x3 + 0.3*sin(1.3*x1)
    +# y <- f(X[,1], X[,2], X[,3]) + rnorm(n, 0, 0.4)
    +#
    +# # Induce ~30% MCAR missingness in y
    +# miss <- rbinom(n, 1, 0.30) == 1
    +# y_mis <- y
    +# y_mis[miss] <- NA
    +#
    +# # Training (observed) vs rows to impute
    +# X_obs <- X[!miss, , drop = FALSE]
    +# y_obs <- y[!miss]
    +# X_mis <- X[ miss, , drop = FALSE]
    +#
    +# # 1-NN donor imputation with NNS.reg
    +# y_hat_mv <- NNS::NNS.reg(
    +#   x = X_obs, # all observed predictors
    +#   y = y_obs, # observed responses
    +#   point.est = X_mis, # rows to impute
    +#   order = "max", # dependence-maximizing order
    +#   n.best = 1, # 1-NN donor
    +#   plot = FALSE
    +# )$Point.est
    +#
    +# # Completed vector
    +# y_completed_mv <- y_mis
    +# y_completed_mv[miss] <- y_hat_mv
    +#
    +# # Plot observed vs imputed (multivariate, NNS 1-NN)
    +# plot(seq_along(y), y,
    +#      pch = 1, col = "steelblue", cex = 1.5, lwd = 2,
    +#      xlab = "Observation index", ylab = "y",
    +#      main = "NNS 1-NN Multivariate Imputation")
    +#
    +# # Overlay imputed values
    +# points(which(miss), y_hat_mv, pch = 15, col = "red", cex = 1.2)
    +#
    +# # Legend
    +# legend("topleft",
    +#        legend = c("Observed", "Imputed (NNS 1-NN)"),
    +#        col = c("steelblue", "red"),
    +#        pch = c(1, 15),
    +#        pt.lwd = c(2, NA),
    +#        bty = "n")
    +
    diff --git a/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.html b/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.html
    new file mode 100644
    index 0000000..4ef022e
    --- /dev/null
    +++ b/tools/NNS/vignettes/NNSvignette_07_Clustering_and_Regression.html
    @@ -0,0 +1,1014 @@
    +Getting Started with NNS: Clustering and Regression

    Getting Started with NNS: Clustering and Regression

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Clustering and Regression

    +

    Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

    +
    +

    NNS Partitioning NNS.part()

    +

    NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification (1:4) at each partition.

    +

    NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
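
A single partitioning step can be sketched in base R by splitting the data at the means of x and y and averaging within each resulting quadrant. This is a simplified illustration under stated assumptions: the 1:4 labels below are an arbitrary choice that may not match the numbering NNS.part uses, and the real routine repeats the split recursively within each quadrant.

```r
# One mean-split partition step (illustrative only; labels are an assumption).
x <- seq(-5, 5, 0.05)
y <- x^3
d <- data.frame(x = x, y = y)

# Label each point by which side of the x mean and y mean it falls on
d$quadrant <- 1L + (d$x > mean(d$x)) + 2L * (d$y > mean(d$y))

# Quadrant means play the role of "regression points" for this single step
centers <- aggregate(cbind(x, y) ~ quadrant, data = d, FUN = mean)
centers
```

For a monotone relationship such as this one, most observations are expected to fall in the two concordant quadrants (both below or both above the means).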

    +
    x = seq(-5, 5, .05); y = x ^ 3
    +
    +for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)}
    +

    +
    +

    X-only Partitioning

    +

    NNS.part also offers a partitioning based on \(x\) values only, via NNS.part(x, y, type = "XONLY", ...), which uses the entire bandwidth in its regression point derivation and shares the same limit condition as partitioning via both \(x\) and \(y\) values.

    +
    for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}
    +

    +

    Note that the partition identifications are limited to 1s and 2s (left and right of the partition, respectively), rather than the four values used when partitioning on both \(x\) and \(y\).

    +
    ## $order
    +## [1] 4
    +## 
    +## $dt
    +##          x         y quadrant prior.quadrant
    +##      <num>     <num>   <char>         <char>
    +##   1: -5.00 -125.0000    q1111           q111
    +##   2: -4.95 -121.2874    q1111           q111
    +##   3: -4.90 -117.6490    q1111           q111
    +##   4: -4.85 -114.0841    q1111           q111
    +##   5: -4.80 -110.5920    q1111           q111
    +##  ---                                        
    +## 197:  4.80  110.5920    q2222           q222
    +## 198:  4.85  114.0841    q2222           q222
    +## 199:  4.90  117.6490    q2222           q222
    +## 200:  4.95  121.2874    q2222           q222
    +## 201:  5.00  125.0000    q2222           q222
    +## 
    +## $regression.points
    +##    quadrant          x            y
    +##      <char>      <num>        <num>
    +## 1:     q111 -4.4523966 -89.31996002
    +## 2:     q112 -3.2250000 -31.51531806
    +## 3:     q121 -2.0023966  -7.46341667
    +## 4:     q122 -0.7590415  -0.51890098
    +## 5:     q211  0.3739355   0.08338409
    +## 6:     q212  1.3499632   2.26930682
    +## 7:     q221  2.6206250  16.42843100
    +## 8:     q222  4.1955267  75.78894504
    +
    +
    +
    +

    Clusters Used in Regression

    +

    The right column of plots shows the corresponding regression (plus endpoints and central point) for each order of NNS partitioning.

    +
    for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE, type = "XONLY") ; NNS.reg(x, y, order = i, ncores = 1)}
    +

    +
    +
    +
    +

    NNS Regression NNS.reg()

    +

    NNS.reg can fit any \(f(x)\), in both univariate and multivariate cases. NNS.reg returns a named, self-explanatory list of values, shown below.

    +
    +

    Univariate:

    +
    NNS.reg(x, y, ncores = 1)
    +

    +
    ## $R2
    +## [1] 0.9999858
    +## 
    +## $SE
    +## [1] 0.1822738
    +## 
    +## $Prediction.Accuracy
    +## NULL
    +## 
    +## $equation
    +## NULL
    +## 
    +## $x.star
    +## NULL
    +## 
    +## $derivative
    +##     Coefficient X.Lower.Range X.Upper.Range
    +##           <num>         <num>         <num>
    +##  1: 74.25250000    -5.0000000    -4.9750000
    +##  2: 72.47650000    -4.9750000    -4.8500000
    +##  3: 68.69350000    -4.8500000    -4.7250000
    +##  4: 64.88656716    -4.7250000    -4.5854167
    +##  5: 61.01480519    -4.5854167    -4.4250000
    +##  6: 57.64628788    -4.4250000    -4.2875000
    +##  7: 52.29438889    -4.2875000    -4.1000000
    +##  8: 55.28971014    -4.1000000    -3.9562500
    +##  9: 39.55816092    -3.9562500    -3.7750000
    +## 10: 41.08694030    -3.7750000    -3.6354167
    +## 11: 38.01863636    -3.6354167    -3.4750000
    +## 12: 34.69626866    -3.4750000    -3.3354167
    +## 13: 31.88168831    -3.3354167    -3.1750000
    +## 14: 29.28265152    -3.1750000    -3.0375000
    +## 15: 25.79438889    -3.0375000    -2.8500000
    +## 16: 23.18416667    -2.8500000    -2.6875000
    +## 17: 20.17544872    -2.6875000    -2.5250000
    +## 18: 18.23350000    -2.5250000    -2.4000000
    +## 19: 16.36150000    -2.4000000    -2.2750000
    +## 20: 15.45634921    -2.2750000    -2.1437500
    +## 21: 12.10506173    -2.1437500    -1.9750000
    +## 22: 10.84291045    -1.9750000    -1.8354167
    +## 23:  9.17837079    -1.8354167    -1.6500000
    +## 24:  7.71713333    -1.6500000    -1.4937500
    +## 25:  5.67487654    -1.4937500    -1.3250000
    +## 26:  4.77015152    -1.3250000    -1.1875000
    +## 27:  3.65525641    -1.1875000    -1.0250000
    +## 28:  2.71828358    -1.0250000    -0.8854167
    +## 29:  1.97577922    -0.8854167    -0.7250000
    +## 30:  1.29696970    -0.7250000    -0.5875000
    +## 31:  0.71536082    -0.5875000    -0.3854167
    +## 32:  0.26031250    -0.3854167    -0.1854167
    +## 33:  0.08077922    -0.1854167    -0.1052083
    +## 34:  0.01168831    -0.1052083    -0.0250000
    +## 35:  0.00625000    -0.0250000     0.0750000
    +## 36:  0.05125000     0.0750000     0.1750000
    +## 37:  0.17050000     0.1750000     0.3000000
    +## 38:  0.40450000     0.3000000     0.4250000
    +## 39:  0.68125000     0.4250000     0.5250000
    +## 40:  0.99625000     0.5250000     0.6250000
    +## 41:  1.30261905     0.6250000     0.7562500
    +## 42:  2.23351852     0.7562500     0.9250000
    +## 43:  2.85625000     0.9250000     1.0250000
    +## 44:  3.47125000     1.0250000     1.1250000
    +## 45:  4.21750000     1.1250000     1.2500000
    +## 46:  5.19250000     1.2500000     1.3750000
    +## 47:  6.18250000     1.3750000     1.5000000
    +## 48:  7.35250000     1.5000000     1.6250000
    +## 49:  7.76690476     1.6250000     1.7562500
    +## 50: 10.84596774     1.7562500     1.9500000
    +## 51: 10.93692308     1.9500000     2.1125000
    +## 52: 14.30505155     2.1125000     2.3145833
    +## 53: 17.95467391     2.3145833     2.5062500
    +## 54: 21.46451613     2.5062500     2.7000000
    +## 55: 20.50807692     2.7000000     2.8625000
    +## 56: 26.01343750     2.8625000     3.0625000
    +## 57: 32.71687737     3.0625000     3.2671585
    +## 58: 34.19048114     3.2671585     3.5000000
    +## 59: 33.57759494     3.5000000     3.6645833
    +## 60: 46.95453488     3.6645833     3.8437500
    +## 61: 42.67514286     3.8437500     4.0625000
    +## 62: 57.09307692     4.0625000     4.2250000
    +## 63: 55.24078947     4.2250000     4.3437500
    +## 64: 59.68593153     4.3437500     4.5671585
    +## 65: 66.33740696     4.5671585     4.8301031
    +## 66: 72.01977335     4.8301031     5.0000000
    +##     Coefficient X.Lower.Range X.Upper.Range
    +## 
    +## $Point.est
    +## NULL
    +## 
    +## $pred.int
    +## NULL
    +## 
    +## $regression.points
    +##              x             y
    +##          <num>         <num>
    +##  1: -5.0000000 -1.250000e+02
    +##  2: -4.9750000 -1.231437e+02
    +##  3: -4.8500000 -1.140841e+02
    +##  4: -4.7250000 -1.054974e+02
    +##  5: -4.5854167 -9.644035e+01
    +##  6: -4.4250000 -8.665256e+01
    +##  7: -4.2875000 -7.872620e+01
    +##  8: -4.1000000 -6.892100e+01
    +##  9: -3.9562500 -6.097310e+01
    +## 10: -3.7750000 -5.380319e+01
    +## 11: -3.6354167 -4.806814e+01
    +## 12: -3.4750000 -4.196931e+01
    +## 13: -3.3354167 -3.712629e+01
    +## 14: -3.1750000 -3.201194e+01
    +## 15: -3.0375000 -2.798557e+01
    +## 16: -2.8500000 -2.314913e+01
    +## 17: -2.6875000 -1.938170e+01
    +## 18: -2.5250000 -1.610319e+01
    +## 19: -2.4000000 -1.382400e+01
    +## 20: -2.2750000 -1.177881e+01
    +## 21: -2.1437500 -9.750167e+00
    +## 22: -1.9750000 -7.707437e+00
    +## 23: -1.8354167 -6.193948e+00
    +## 24: -1.6500000 -4.492125e+00
    +## 25: -1.4937500 -3.286323e+00
    +## 26: -1.3250000 -2.328687e+00
    +## 27: -1.1875000 -1.672792e+00
    +## 28: -1.0250000 -1.078812e+00
    +## 29: -0.8854167 -6.993854e-01
    +## 30: -0.7250000 -3.824375e-01
    +## 31: -0.5875000 -2.041042e-01
    +## 32: -0.3854167 -5.954167e-02
    +## 33: -0.1854167 -7.479167e-03
    +## 34: -0.1052083 -1.000000e-03
    +## 35: -0.0250000 -6.250000e-05
    +## 36:  0.0750000  5.625000e-04
    +## 37:  0.1750000  5.687500e-03
    +## 38:  0.3000000  2.700000e-02
    +## 39:  0.4250000  7.756250e-02
    +## 40:  0.5250000  1.456875e-01
    +## 41:  0.6250000  2.453125e-01
    +## 42:  0.7562500  4.162813e-01
    +## 43:  0.9250000  7.931875e-01
    +## 44:  1.0250000  1.078813e+00
    +## 45:  1.1250000  1.425938e+00
    +## 46:  1.2500000  1.953125e+00
    +## 47:  1.3750000  2.602188e+00
    +## 48:  1.5000000  3.375000e+00
    +## 49:  1.6250000  4.294063e+00
    +## 50:  1.7562500  5.313469e+00
    +## 51:  1.9500000  7.414875e+00
    +## 52:  2.1125000  9.192125e+00
    +## 53:  2.3145833  1.208294e+01
    +## 54:  2.5062500  1.552425e+01
    +## 55:  2.7000000  1.968300e+01
    +## 56:  2.8625000  2.301556e+01
    +## 57:  3.0625000  2.821825e+01
    +## 58:  3.2671585  3.491404e+01
    +## 59:  3.5000000  4.287500e+01
    +## 60:  3.6645833  4.840131e+01
    +## 61:  3.8437500  5.681400e+01
    +## 62:  4.0625000  6.614919e+01
    +## 63:  4.2250000  7.542681e+01
    +## 64:  4.3437500  8.198666e+01
    +## 65:  4.5671585  9.532100e+01
    +## 66:  4.8301031  1.127641e+02
    +## 67:  5.0000000  1.250000e+02
    +##              x             y
    +## 
    +## $Fitted.xy
    +##          x         y     y.hat   NNS.ID gradient  residuals standard.errors
    +##      <num>     <num>     <num>   <char>    <num>      <num>           <num>
    +##   1: -5.00 -125.0000 -125.0000 q1111111 74.25250  0.0000000      0.00000000
    +##   2: -4.95 -121.2874 -121.3318 q1111112 72.47650 -0.0444000      0.07380015
    +##   3: -4.90 -117.6490 -117.7080 q1111121 72.47650 -0.0589500      0.07380015
    +##   4: -4.85 -114.0841 -114.0841 q1111121 68.69350  0.0000000      0.05069967
    +##   5: -4.80 -110.5920 -110.6495 q1111122 68.69350 -0.0574500      0.05069967
    +##  ---                                                                       
    +## 197:  4.80  110.5920  110.7671 q2222221 66.33741  0.1751022      0.27620216
    +## 198:  4.85  114.0841  114.1970 q2222222 72.01977  0.1129090      0.12572307
    +## 199:  4.90  117.6490  117.7980 q2222222 72.01977  0.1490227      0.12572307
    +## 200:  4.95  121.2874  121.3990 q2222222 72.01977  0.1116363      0.12572307
    +## 201:  5.00  125.0000  125.0000 q2222222 72.01977  0.0000000      0.12572307
    +
    +
    +
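
The $derivative and $regression.points tables above can be related with a small base R check. Taking three consecutive regression points copied from the printed output, the slope of each joining segment matches the corresponding printed coefficient to rounding, which suggests (as an observation on this output, not a statement about the internals) that the fit is piecewise linear between regression points.

```r
# Three consecutive rows copied from the printed $regression.points (rows 47-49)
rp <- data.frame(
  x = c(1.375, 1.500, 1.625),
  y = c(2.602188, 3.375000, 4.294063)
)

# Segment slopes; compare with $derivative rows 47-48 (6.18250 and 7.35250)
slopes <- diff(rp$y) / diff(rp$x)

# Linear interpolation between these points at an arbitrary x of interest
y_hat <- approx(rp$x, rp$y, xout = 1.45)$y
```

approx() is used here only as a stand-in for interpolation between regression points; NNS.reg(..., point.est = ...) remains the supported way to obtain estimates.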

    Multivariate:

    +

    Multivariate regressions return a plot of \(y\) and \(\hat{y}\), as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.

    +
    f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
    +y = x ; z <- expand.grid(x, y)
    +g = f(z[ , 1], z[ , 2])
    +NNS.reg(z, g, order = "max", plot = FALSE, ncores = 1)
    +
    ## $R2
    +## [1] 1
    +## 
    +## $rhs.partitions
    +##         Var1  Var2
    +##        <num> <num>
    +##     1: -5.00    -5
    +##     2: -4.95    -5
    +##     3: -4.90    -5
    +##     4: -4.85    -5
    +##     5: -4.80    -5
    +##    ---            
    +## 40397:  4.80     5
    +## 40398:  4.85     5
    +## 40399:  4.90     5
    +## 40400:  4.95     5
    +## 40401:  5.00     5
    +## 
    +## $RPM
    +##         Var1  Var2         y.hat
    +##        <num> <num>         <num>
    +##     1:  -4.8 -4.80 -7.105427e-15
    +##     2:  -4.8 -2.55 -8.726063e+01
    +##     3:  -4.8 -2.50 -8.806700e+01
    +##     4:  -4.8 -2.45 -8.883587e+01
    +##     5:  -4.8 -2.40 -8.956800e+01
    +##    ---                          
    +## 40397:  -2.6 -2.80  3.776000e+00
    +## 40398:  -2.6 -2.75  2.770875e+00
    +## 40399:  -2.6 -2.70  1.807000e+00
    +## 40400:  -2.6 -2.65  8.836250e-01
    +## 40401:  -2.6 -2.60  1.776357e-15
    +## 
    +## $Point.est
    +## NULL
    +## 
    +## $pred.int
    +## NULL
    +## 
    +## $Fitted.xy
    +##         Var1  Var2          y      y.hat      NNS.ID residuals
    +##        <num> <num>      <num>      <num>      <char>     <num>
    +##     1: -5.00    -5   0.000000   0.000000     201.201         0
    +##     2: -4.95    -5   3.562625   3.562625     402.201         0
    +##     3: -4.90    -5   7.051000   7.051000     603.201         0
    +##     4: -4.85    -5  10.465875  10.465875     804.201         0
    +##     5: -4.80    -5  13.808000  13.808000    1005.201         0
    +##    ---                                                        
    +## 40397:  4.80     5 -13.808000 -13.808000 39597.40401         0
    +## 40398:  4.85     5 -10.465875 -10.465875 39798.40401         0
    +## 40399:  4.90     5  -7.051000  -7.051000 39999.40401         0
    +## 40400:  4.95     5  -3.562625  -3.562625 40200.40401         0
    +## 40401:  5.00     5   0.000000   0.000000 40401.40401         0
    +
    +
    +

    Inter/Extrapolation

    +

    NNS.reg can interpolate or extrapolate any point of interest. The point.est parameter in NNS.reg(x, y, point.est = ...) accepts data of any size with the same dimensions as \(x\), and the resulting estimates are retrieved with NNS.reg(...)$Point.est.

    +
    +
    +

    NNS Dimension Reduction Regression

    +

    NNS.reg also provides a dimension reduction regression through the dim.red.method parameter, as in NNS.reg(x, y, dim.red.method = "cor", ...). This reduces all regressors to a single dimension using the equation returned by NNS.reg(..., dim.red.method = "cor", ...)$equation.

    +
    NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation
    +

    +
    ##        Variable Coefficient
    +##          <char>       <num>
    +## 1: Sepal.Length   0.7980781
    +## 2:  Sepal.Width  -0.4402896
    +## 3: Petal.Length   0.9354305
    +## 4:  Petal.Width   0.9381792
    +## 5:  DENOMINATOR   4.0000000
    +

    Thus, our model for this regression would be: \[ Species = \frac{0.798*Sepal.Length - 0.44*Sepal.Width + 0.935*Petal.Length + 0.938*Petal.Width}{4} \]
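
To make the equation concrete, the sketch below rebuilds the single synthetic regressor in base R from the coefficients printed in the $equation table. The coefficient values are copied from that output; the variable names `coefs` and `x_star` are introduced here for illustration, and any further scaling NNS applies internally is not shown.

```r
# Coefficients and denominator copied from the printed $equation table
coefs <- c(Sepal.Length = 0.7980781,
           Sepal.Width  = -0.4402896,
           Petal.Length = 0.9354305,
           Petal.Width  = 0.9381792)
denominator <- 4

# Weighted combination of the four regressors, one value per observation
x_star <- as.vector(as.matrix(iris[, names(coefs)]) %*% coefs) / denominator
head(x_star)
```

Each of the 150 iris observations is thereby reduced from four regressors to one value, which can then be regressed against the response.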

    +
    +

    Threshold

    +

    NNS.reg(x, y, dim.red.method = "cor", threshold = ...) reduces the regressors further by requiring a minimum absolute correlation; in the output below, the regressor that falls short of the threshold receives a coefficient of 0.

    +
    NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation
    +

    +
    ##        Variable Coefficient
    +##          <char>       <num>
    +## 1: Sepal.Length   0.7980781
    +## 2:  Sepal.Width   0.0000000
    +## 3: Petal.Length   0.9354305
    +## 4:  Petal.Width   0.9381792
    +## 5:  DENOMINATOR   3.0000000
    +

    Thus, our model for this further reduced dimension regression would be: \[ Species = \frac{0.798*Sepal.Length + 0*Sepal.Width + 0.935*Petal.Length + 0.938*Petal.Width}{3} \]
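
The thresholded table can be reproduced from the unthresholded coefficients with a few lines of base R. This is a sketch of what the printed output implies, assuming coefficients whose absolute value falls below the threshold are set to 0 and the denominator counts the remaining regressors; whether NNS uses a strict or non-strict comparison is not stated in the source.

```r
# Unthresholded coefficients copied from the earlier $equation output
coefs <- c(Sepal.Length = 0.7980781,
           Sepal.Width  = -0.4402896,
           Petal.Length = 0.9354305,
           Petal.Width  = 0.9381792)
threshold <- 0.75

# Zero out regressors below the threshold (comparison direction is an assumption)
kept <- ifelse(abs(coefs) >= threshold, coefs, 0)
denominator <- sum(kept != 0)

# Reduced single-dimension regressor
x_star <- as.vector(as.matrix(iris[, names(coefs)]) %*% kept) / denominator
```

With threshold = .75 this leaves three regressors, matching the DENOMINATOR of 3 and the zero coefficient on Sepal.Width in the table above.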

    +

    The point.est = (...) argument operates in the same manner as in the full regression above, with results again retrieved via NNS.reg(...)$Point.est.

    +
    NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
    +

    +
    ##  [1] 1 1 1 1 1 1 1 1 1 1
    +
    +
    +
    +
    +

    Classification

    +

    For a classification problem, we simply set type = "CLASS", as in NNS.reg(x, y, type = "CLASS", ...).

    +

    NOTE: The base category of the response variable should be 1, not 0, for classification problems.

    +
    NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
    +

    +
    ##  [1] 1 1 1 1 1 1 1 1 1 1
    +
    +
    +

    Cross-Validation NNS.stack()

    +

    For a given objective function, the NNS.stack routine cross-validates the n.best parameter in the multivariate NNS.reg function as well as the threshold parameter in the dimension reduction version of NNS.reg. NNS.stack can be used for classification:

    +

    NNS.stack(..., type = "CLASS", ...)

    +

    or continuous dependent variables:

    +

    NNS.stack(..., type = NULL, ...).

    +

    Any objective function obj.fn can be called using +expression() with the terms predicted and +actual, even from external packages such as +Metrics.

    +

    NNS.stack(..., obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min").
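How an `expression()` written in terms of `predicted` and `actual` turns into a score can be sketched with base R's `eval()`. This only illustrates the mechanism; how `NNS.stack` evaluates `obj.fn` internally is not shown in the text, and the toy vectors below are made up for the example.

```r
# An objective function stored unevaluated, using the two reserved names
obj.fn <- expression(mean(round(predicted) == actual))

# Toy values (hypothetical): three of the four rounded predictions match
toy <- list(predicted = c(1.2, 1.9, 3.1, 2.6),
            actual    = c(1,   2,   3,   2))

# Evaluate the expression with the names bound to the toy vectors
accuracy <- eval(obj.fn, envir = toy)
accuracy
```

Because the expression is evaluated only once `predicted` and `actual` exist, any function of those two names can be swapped in, which is why `objective = "max"` or `"min"` has to be stated alongside it.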

    +
    NNS.stack(IVs.train = iris[ , 1 : 4], 
    +          DV.train = iris[ , 5], 
    +          IVs.test = iris[1 : 10, 1 : 4],
    +          dim.red.method = "cor",
    +          obj.fn = expression( mean(round(predicted) == actual) ),
    +          objective = "max", type = "CLASS", 
    +          folds = 1, ncores = 1)
    +
    Folds Remaining = 0 
    +Current NNS.reg(... , threshold = 0.9350 ) | eval(obj.fn) = 1.000000 | MAX Iterations Remaining = 2
    +Current NNS.reg(... , threshold = 0.7950 ) | eval(obj.fn) = 0.973684 | MAX Iterations Remaining = 1
    +Current NNS.reg(... , threshold = 0.4400 ) | eval(obj.fn) = 0.894737 | MAX Iterations Remaining = 0
    +Current NNS.reg(. , n.best = 1 ) | eval(obj.fn) = 0.868421 | MAX Iterations Remaining = 12
    +Current NNS.reg(. , n.best = 2 ) | eval(obj.fn) = 0.736842 | MAX Iterations Remaining = 11
    +Current NNS.reg(. , n.best = 3 ) | eval(obj.fn) = 0.763158 | MAX Iterations Remaining = 10
    +Current NNS.reg(. , n.best = 4 ) | eval(obj.fn) = 0.736842 | MAX Iterations Remaining = 9
    +$OBJfn.reg
    +[1] 0.9733333
    +
    +$NNS.reg.n.best
    +[1] 1
    +
    +$probability.threshold
    +[1] 0.495
    +
    +$OBJfn.dim.red
    +[1] 0.9666667
    +
    +$NNS.dim.red.threshold
    +[1] 0.935
    +
    +$reg
    + [1] 1 1 1 1 1 1 1 1 1 1
    +
    +$reg.pred.int
    +NULL
    +
    +$dim.red
    + [1] 1 1 1 1 1 1 1 1 1 1
    +
    +$dim.red.pred.int
    +NULL
    +
    +$stack
    + [1] 1 1 1 1 1 1 1 1 1 1
    +
    +$pred.int
    +NULL
    +
    +
    +

    Increasing Dimensions

    +

    Given that multicollinearity is not an issue for nonparametric regressions as it is for OLS, an ill-fit univariate model may be improved by augmenting the regressor with a copy of itself and cross-validating the number of clusters n.best via:

    +

    NNS.stack(IVs.train = cbind(x, x), DV.train = y, method = 1, ...).

    +
    set.seed(123)
    +x = rnorm(100); y = rnorm(100)
    +
    +nns.params = NNS.stack(IVs.train = cbind(x, x),
    +                        DV.train = y,
    +                        method = 1, ncores = 1)
    +
    NNS.reg(cbind(x, x), y, 
    +        n.best = nns.params$NNS.reg.n.best,
    +        point.est = cbind(x, x), 
    +        residual.plot = TRUE,  
    +        ncores = 1, confidence.interval = .95)
    +

    +
    +
    +

    Smoothing Option

    +

    Smoothness is not required for curve fitting, but the +NNS.reg function offers an optional smoothed fit. This +feature applies a smoothing spline to regression points generated +internally using the partitioning method described earlier.

    +
    NNS.reg(x, y, smooth = TRUE)
    +

    +
    +
    +

    Imputation

    +

    Imputation in NNS is a direct application of nearest +neighbor regression. When values of \(y\) are missing, we use the observed \((X,y)\) pairs as the training set and the +predictors of the missing rows as point.est.

    +

    A key insight is that even in univariate regressions, +NNS.reg benefits from the increasing dimensions trick: by +duplicating the predictor into a multivariate form, +e.g. cbind(x, x), the distance function underlying +NNS.reg operates in a 2-D space. This sharpened distance +metric allows a more robust donor selection, effectively turning +univariate imputation into a special case of multivariate nearest +neighbor regression.

    +

    For multivariate predictors, the same form applies directly — supply +the full set of observed predictors in \(x\), the observed responses in \(y\), and the incomplete rows in +point.est. With order = "max", n.best = 1, the +imputation is always 1-NN donor-based: each missing \(y\) is filled in by the response of its +closest donor under the NNS hybrid distance. This ensures +imputations remain strictly within the support of the observed data.

    +

    Categorical data is handled analogously, only +requiring NNS.reg(..., type = "CLASS") in the +procedure.
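The donor idea can be sketched without NNS. The block below is a minimal base-R 1-NN imputation that uses plain Euclidean distance as a stand-in; it is not the NNS hybrid distance, and the data and helper name `nearest_donor` are hypothetical.

```r
set.seed(1)

# Toy data: two predictors, response missing on every fifth row
n <- 50
X <- cbind(x1 = rnorm(n), x2 = runif(n))
y <- X[, 1] - 2 * X[, 2] + rnorm(n, 0, 0.1)
miss <- seq_len(n) %% 5 == 0

X_obs <- X[!miss, , drop = FALSE]
y_obs <- y[!miss]
X_mis <- X[miss, , drop = FALSE]

# Index of the closest observed row (Euclidean distance, an assumption)
nearest_donor <- function(row) {
  d <- sqrt(rowSums(sweep(X_obs, 2, row)^2))
  which.min(d)
}

# Each missing y takes the response of its single nearest donor
donor_idx <- apply(X_mis, 1, nearest_donor)
y_imputed <- y_obs[donor_idx]
```

Since every imputed value is copied from an observed response, the imputations cannot leave the range of the observed data, which is the support property described above.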

    +
    +

    Univariate Imputation

    +
    set.seed(123)
    +
    +# Univariate predictor with nonlinear signal
    +n <- 400
    +x <- sort(runif(n, -3, 3))
    +y <- sin(x) + 0.2 * x^2 + rnorm(n, 0, 0.25)
    +
    +# Induce ~25% MCAR missingness in y
    +miss <- rbinom(n, 1, 0.25) == 1
    +y_mis <- y
    +y_mis[miss] <- NA
    +
    +# ---- Increasing dimensions trick ----
    +# Duplicate x so the distance operates in a 2D space: cbind(x, x).
    +# This sharpens nearest-neighbor selection even in a nominally univariate setting.
    +x2_train <- cbind(x[!miss], x[!miss])
    +x2_miss  <- cbind(x[miss],  x[miss])
    +
    +# 1-NN donor imputation with NNS.reg
    +y_hat_uni <- NNS::NNS.reg(
    +  x         = x2_train,             # predictors (duplicated x)
    +  y         = y[!miss],             # observed responses
    +  point.est = x2_miss,              # rows to impute
    +  order     = "max",                # dependence-maximizing order
    +  n.best    = 1,                    # 1-NN donor
    +  plot      = FALSE
    +)$Point.est
    +
    +# Fill back
    +y_completed_uni <- y_mis
    +y_completed_uni[miss] <- y_hat_uni
    +
    +# Plot observed vs imputed (NNS 1-NN)
    +plot(x, y, pch = 1, col = "steelblue", cex = 1.5, lwd = 2,
    +     xlab = "x", ylab = "y", main = "NNS 1-NN Imputation")
    +points(x[miss], y_hat_uni, col = "red", pch = 15, cex = 1.3)
    +
    +legend("topleft",
    +       legend = c("Observed", "Imputed (NNS 1-NN)"),
    +       col    = c("steelblue", "red"),
    +       pch    = c(1, 15),
    +       pt.lwd = c(2, NA),
    +       bty    = "n")
    +
    +

    +
    +
    +

    Multivariate Imputation

    +
    set.seed(123)
    +
    +# Multivariate predictors with nonlinear & interaction structure
    +n <- 800
    +X <- cbind(
    +  x1 = rnorm(n),
    +  x2 = runif(n, -2, 2),
    +  x3 = rnorm(n, 0, 1)
    +)
    +
    +f <- function(x1, x2, x3) 1.1*x1 - 0.8*x2 + 0.5*x3 + 0.6*x1*x2 - 0.4*x2*x3 + 0.3*sin(1.3*x1)
    +y <- f(X[,1], X[,2], X[,3]) + rnorm(n, 0, 0.4)
    +
    +# Induce ~30% MCAR missingness in y
    +miss <- rbinom(n, 1, 0.30) == 1
    +y_mis <- y
    +y_mis[miss] <- NA
    +
    +# Training (observed) vs rows to impute
    +X_obs <- X[!miss, , drop = FALSE]
    +y_obs <- y[!miss]
    +X_mis <- X[ miss, , drop = FALSE]
    +
    +# 1-NN donor imputation with NNS.reg
    +y_hat_mv <- NNS::NNS.reg(
    +  x         = X_obs,     # all observed predictors
    +  y         = y_obs,     # observed responses
    +  point.est = X_mis,     # rows to impute
    +  order     = "max",     # dependence-maximizing order
    +  n.best    = 1,         # 1-NN donor
    +  plot      = FALSE
    +)$Point.est
    +
    +# Completed vector
    +y_completed_mv <- y_mis
    +y_completed_mv[miss] <- y_hat_mv
    +
    +# Plot observed vs imputed (multivariate, NNS 1-NN)
    +plot(seq_along(y), y, 
    +     pch = 1, col = "steelblue", cex = 1.5, lwd = 2,
    +     xlab = "Observation index", ylab = "y",
    +     main = "NNS 1-NN Multivariate Imputation")
    +
    +# Overlay imputed values
    +points(which(miss), y_hat_mv, pch = 15, col = "red", cex = 1.2)
    +
    +# Legend
    +legend("topleft",
    +       legend = c("Observed", "Imputed (NNS 1-NN)"),
    +       col    = c("steelblue", "red"),
    +       pch    = c(1, 15),
    +       pt.lwd = c(2, NA),
    +       bty    = "n")
    +
    +

    +
    +
    +

    A Note on Uncertainty Propagation

    +

    A common concern with local imputation methods is whether imputation +uncertainty propagates correctly into downstream inference. +NNS addresses this through bootstrap multiple imputation: +resampling complete cases across m iterations generates +between-imputation variance that flows through standard Rubin’s rules +pooling identically to any classical procedure.
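For reference, the pooling step of Rubin's rules can be sketched in a few lines of base R. The estimates and variances below are made-up toy numbers, and the variable names are hypothetical; the block shows only the standard pooling arithmetic, not the NNS bootstrap itself.

```r
# Toy inputs: one estimate and one squared standard error per imputed data set
estimates <- c(1.0, 1.2, 0.8, 1.1, 0.9)
variances <- rep(0.04, 5)
m <- length(estimates)

pooled_estimate <- mean(estimates)          # average of the m estimates
within_var      <- mean(variances)          # average within-imputation variance
between_var     <- var(estimates)           # variance of estimates across imputations
total_var       <- within_var + (1 + 1 / m) * between_var
pooled_se       <- sqrt(total_var)
```

The between-imputation term is the part that carries imputation uncertainty into the pooled standard error.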

    +

    Empirically, NNS bootstrap MI outperforms MICE with +predictive mean matching on nonlinear data — producing a pooled estimate +closer to the true parameter with a smaller pooled SE. The advantage +comes not from compressing uncertainty but from a more accurate +imputation model, which reduces between-imputation variance driven by +model error rather than genuine data uncertainty.

    +

    See NNS +Multiple Imputation vs MICE for the full reproducible +comparison.

    +
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_08_Classification.R b/tools/NNS/vignettes/NNSvignette_08_Classification.R new file mode 100644 index 0000000..3fb44b6 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_08_Classification.R @@ -0,0 +1,95 @@ +## ----setup, include=FALSE, message=FALSE-------------------------------------- +knitr::opts_chunk$set(echo = TRUE) +library(NNS) +library(data.table) +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----setup2, message=FALSE, warning = FALSE----------------------------------- +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----rhs, rows.print=18------------------------------------------------------- +NNS.reg(iris[,1:4], iris[,5], residual.plot = FALSE, ncores = 1)$rhs.partitions + +## ----NNSBOOST,fig.align = "center", fig.height = 8,fig.width=6.5, eval=FALSE---- +# test.set = 141:150 +# +# a = NNS.boost(IVs.train = iris[-test.set, 1:4], +# DV.train = iris[-test.set, 5], +# IVs.test = iris[test.set, 1:4], +# epochs = 10, learner.trials = 10, +# status = FALSE, balance = TRUE, +# type = "CLASS", folds = 5) +# +# a +# $results +# [1] 3 3 3 3 3 3 3 3 3 3 +# +# $pred.int +# NULL +# +# $feature.weights +# Petal.Width Petal.Length Sepal.Length +# 0.4285714 0.4285714 0.1428571 +# +# $feature.frequency +# Petal.Width Petal.Length Sepal.Length +# 3 3 1 +# +# mean( a$results == as.numeric(iris[test.set, 5]) ) +# [1] 1 + +## ----NNSstack,fig.align = "center", fig.height = 8,fig.width=6.5, message=FALSE, eval= FALSE---- +# b = NNS.stack(IVs.train = iris[-test.set, 1:4], +# DV.train = iris[-test.set, 5], +# IVs.test = iris[test.set, 1:4], +# type = "CLASS", balance = TRUE, +# ncores = 1, folds = 5) +# +# b + +## ----stackeval, eval = FALSE-------------------------------------------------- +# $OBJfn.reg +# [1] 0.955787 +# +# $NNS.reg.n.best +# [1] 1 +# +# $probability.threshold +# [1] 
0.6429167 +# +# $OBJfn.dim.red +# [1] 0.955787 +# +# $NNS.dim.red.threshold +# [1] 0.925 +# +# $reg +# [1] 3 3 3 3 3 3 3 3 3 3 +# +# $reg.pred.int +# NULL +# +# $dim.red +# [1] 3 3 3 3 3 3 3 3 3 3 +# +# $dim.red.pred.int +# NULL +# +# $stack +# [1] 3 3 3 3 3 3 3 3 3 3 +# +# $pred.int +# NULL + +## ----stackevalres, eval = FALSE----------------------------------------------- +# mean( b$stack == as.numeric(iris[test.set, 5]) ) + +## ----stackreseval, eval = FALSE----------------------------------------------- +# [1] 1 + diff --git a/tools/NNS/vignettes/NNSvignette_08_Classification.html b/tools/NNS/vignettes/NNSvignette_08_Classification.html new file mode 100644 index 0000000..cc10112 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_08_Classification.html @@ -0,0 +1,577 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Classification + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: +Classification

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Classification

    +

    NNS.reg is a very robust regression +technique capable of nonlinear regressions of continuous variables and +classification tasks in machine learning problems.

    +

    We have extended NNS.reg to an ensemble method of classification in NNS.boost. In short, NNS.reg is the base learner instead of trees.

    +

    One major advantage NNS.boost has over tree +based methods is the ability to seamlessly extrapolate beyond the +current range of observations.

    +
    +

    Splits vs. Partitions

    +

    Popular boosting algorithms take a series of weak learning decision tree models and aggregate their outputs. NNS is also a decision tree of sorts, partitioning each regressor with respect to the dependent variable. We can directly control the number of “splits” with the NNS.reg(..., order = , ...) parameter.

    +
    +

    NNS Partitions

    +

    We can see how NNS partitions each regressor by calling +the $rhs.partitions output. You will notice that each +partition is not an equal interval, nor of equal length, which +differentiates NNS from other bandwidth or tree-based +techniques.

    +

    Higher dependence between a regressor and the dependent variable will +allow for a larger number of partitions. This is determined internally +with the NNS.dep measure.

    +
    NNS.reg(iris[,1:4], iris[,5], residual.plot = FALSE, ncores = 1)$rhs.partitions
    +
    ##           V1       V2       V3       V4
    +##        <num>    <num>    <num>    <num>
    +##  1: 4.300000 2.000000 1.000000 0.100000
    +##  2: 4.381250 2.645276 1.050000 0.200000
    +##  3: 4.577396 2.980556 1.200000 0.300000
    +##  4: 4.700000 3.181155 1.300000 0.400000
    +##  5: 4.800000 3.552439 1.400000 0.500000
    +##  6: 4.900000 4.400000 1.500000 0.600000
    +##  7: 5.000000       NA 1.600000 1.000000
    +##  8: 5.100000       NA 1.700000 1.100000
    +##  9: 5.205000       NA 1.900000 1.200000
    +## 10: 5.400000       NA 3.416305 1.300000
    +## 11: 5.500000       NA 3.834865 1.400000
    +## 12: 5.600000       NA 4.000000 1.500000
    +## 13: 5.700000       NA 4.184722 1.600000
    +## 14: 5.800000       NA 4.400000 1.700000
    +## 15: 5.900000       NA 4.500000 1.800000
    +## 16: 6.000000       NA 4.670803 1.900000
    +## 17: 6.100000       NA 4.863889 2.000000
    +## 18: 6.200000       NA 5.000000 2.117708
    +## 19: 6.300000       NA 5.100000 2.300000
    +## 20: 6.400000       NA 5.200000 2.435206
    +## 21: 6.500000       NA 5.337500 2.500000
    +## 22: 6.600000       NA 5.500000       NA
    +## 23: 6.700000       NA 5.617708       NA
    +## 24: 6.800000       NA 5.849554       NA
    +## 25: 6.900000       NA 6.336875       NA
    +## 26: 7.050000       NA 6.900000       NA
    +## 27: 7.224375       NA       NA       NA
    +## 28: 7.687079       NA       NA       NA
    +## 29: 7.900000       NA       NA       NA
    +##           V1       V2       V3       V4
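The unequal spacing is easy to confirm from the printed table. The sketch below copies the six `V2` (Sepal.Width) partition points from the output above and takes successive differences; the variable names are hypothetical.

```r
# Partition points for V2, copied from the $rhs.partitions output above
v2 <- c(2.000000, 2.645276, 2.980556, 3.181155, 3.552439, 4.400000)

# Width of each interval between consecutive partition points
widths <- diff(v2)
round(widths, 6)

# Number of distinct widths: equal-interval binning would give exactly one
n_distinct_widths <- length(unique(round(widths, 6)))
```

All five interval widths differ, unlike a fixed-bandwidth or equal-width binning scheme.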
    +
    +
    +
    +
    +

    NNS.boost()

    +

    By resampling the training set and letting each iterated set of data speak for itself (while paying extra attention to the residuals throughout), we can test various regressor combinations in these dynamic decision trees, keeping only those combinations that add predictive value. From there we simply aggregate the predictions.

    +

    NNS.boost will automatically search for +an accuracy threshold from the training set, reporting +iterations remaining and level obtained in the console. A plot of the +frequency of the learning accuracy on the training set is also +provided.

    +

    Once a threshold is obtained, +NNS.boost will test various feature +combinations against different splits of the training set and report +back the frequency of each regressor used in the final estimate.

    +

    Let’s have a look and see how it works. We use the first 140 iris observations as our training set, with the final 10 observations (rows 141 to 150) held out as our test set. We set epochs = 10, learner.trials = 10, folds = 5.

    +

    NOTE: Base category of response variable should be 1, not 0 +for classification problems when using +NNS.boost(..., type = "CLASS").

    +
    test.set = 141:150
    + 
    +a = NNS.boost(IVs.train = iris[-test.set, 1:4], 
    +              DV.train = iris[-test.set, 5],
    +              IVs.test = iris[test.set, 1:4],
    +              epochs = 10, learner.trials = 10, 
    +              status = FALSE, balance = TRUE,
    +              type = "CLASS", folds = 5)
    +
    +a
    +$results
    + [1] 3 3 3 3 3 3 3 3 3 3
    +
    +$pred.int
    +NULL
    +
    +$feature.weights
    + Petal.Width Petal.Length Sepal.Length 
    +   0.4285714    0.4285714    0.1428571 
    +
    +$feature.frequency
    + Petal.Width Petal.Length Sepal.Length 
    +           3            3            1 
    +   
    +mean( a$results == as.numeric(iris[test.set, 5]) )
    +[1] 1
    +

    A perfect classification, using the features weighted per the output +above.
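The printed `$feature.weights` are consistent with each feature's frequency divided by the total frequency. The sketch below reproduces that arithmetic from the printed `$feature.frequency`; that `NNS.boost` computes the weights exactly this way is an inference from the output, not something the text states.

```r
# Frequencies copied from the $feature.frequency output above
feature.frequency <- c(Petal.Width = 3, Petal.Length = 3, Sepal.Length = 1)

# Normalize so the weights sum to one
feature.weights <- feature.frequency / sum(feature.frequency)
round(feature.weights, 7)
```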

    +
    +
    +

    Cross-Validation Classification Using NNS.stack()

    +

    For a given objective function, the NNS.stack() routine cross-validates the n.best parameter in the multivariate NNS.reg function as well as the threshold parameter in the dimension reduction NNS.reg version. NNS.stack can be used for classification via NNS.stack(..., type = "CLASS", ...).

    +

    We set folds = 5.

    +

    NOTE: Base category of response variable should be 1, not 0 +for classification problems when using +NNS.stack(..., type = "CLASS").

    +
    b = NNS.stack(IVs.train = iris[-test.set, 1:4], 
    +              DV.train = iris[-test.set, 5],
    +              IVs.test = iris[test.set, 1:4],
    +              type = "CLASS", balance = TRUE,
    +              ncores = 1, folds = 5)
    +
    +b
    +
    $OBJfn.reg
    +[1] 0.955787
    +
    +$NNS.reg.n.best
    +[1] 1
    +
    +$probability.threshold
    +[1] 0.6429167
    +
    +$OBJfn.dim.red
    +[1] 0.955787
    +
    +$NNS.dim.red.threshold
    +[1] 0.925
    +
    +$reg
    + [1] 3 3 3 3 3 3 3 3 3 3
    +
    +$reg.pred.int
    +NULL
    +
    +$dim.red
    + [1] 3 3 3 3 3 3 3 3 3 3
    +
    +$dim.red.pred.int
    +NULL
    +
    +$stack
    + [1] 3 3 3 3 3 3 3 3 3 3
    +
    +$pred.int
    +NULL
    +
    mean( b$stack == as.numeric(iris[test.set, 5]) )
    +
    [1] 1
    +
    +

    Brief Notes on Other Parameters

    +
      +
    • depth = "max" will force all observations to be their own partition, forcing a perfect fit of the multivariate regression. In essence, this is the basis for a k-nearest-neighbor (kNN) type of classification.

    • +
    • n.best = 1 will use the single nearest neighbor. When coupled with depth = "max", NNS will emulate a kNN = 1, but as the dimensions increase the results diverge, demonstrating that NNS is less sensitive to the curse of dimensionality than kNN.

    • +
    • extreme will use the maximum or minimum +threshold obtained, and may result in errors if that +threshold cannot be eclipsed by subsequent iterations.

    • +
    +
    +
    + + + + + + + + + + + + diff --git a/tools/NNS/vignettes/NNSvignette_09_Forecasting.R b/tools/NNS/vignettes/NNSvignette_09_Forecasting.R new file mode 100644 index 0000000..1e371cc --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_09_Forecasting.R @@ -0,0 +1,143 @@ +## ----setup, include=FALSE, message=FALSE-------------------------------------- +knitr::opts_chunk$set(echo = TRUE) +library(NNS) +library(data.table) +data.table::setDTthreads(1L) +options(mc.cores = 1) +RcppParallel::setThreadOptions(numThreads = 1) +Sys.setenv("OMP_THREAD_LIMIT" = 1) + +## ----setup2, message=FALSE, warning = FALSE----------------------------------- +library(NNS) +library(data.table) +require(knitr) +require(rgl) + +## ----linear,fig.width=5,fig.height=3,fig.align = "center", warning=FALSE------ +nns_lin = NNS.ARMA(AirPassengers, + h = 44, + training.set = 100, + method = "lin", + plot = TRUE, + seasonal.factor = 12, + seasonal.plot = FALSE) + +sqrt(mean((nns_lin - tail(AirPassengers, 44)) ^ 2)) + +## ----nonlinear,fig.width=5,fig.height=3,fig.align = "center", eval = FALSE---- +# nns_nonlin = NNS.ARMA(AirPassengers, +# h = 44, +# training.set = 100, +# method = "nonlin", +# plot = FALSE, +# seasonal.factor = 12, +# seasonal.plot = FALSE) +# +# sqrt(mean((nns_nonlin - tail(AirPassengers, 44)) ^ 2)) + +## ----nonlinearres, eval = FALSE----------------------------------------------- +# [1] 18.1809 + +## ----seasonal test, eval=TRUE------------------------------------------------- +seas = t(sapply(1 : 25, function(i) c(i, sqrt( mean( (NNS.ARMA(AirPassengers, h = 44, training.set = 100, method = "lin", seasonal.factor = i, plot=FALSE) - tail(AirPassengers, 44)) ^ 2) ) ) ) ) + +colnames(seas) = c("Period", "RMSE") +seas + +## ----best fit, eval=TRUE------------------------------------------------------ +a = seas[which.min(seas[ , 2]), 1] + +## ----best nonlinear,fig.width=5,fig.height=3,fig.align = "center", eval=TRUE---- +nns = NNS.ARMA(AirPassengers, + h = 44, + training.set = 
100, + method = "nonlin", + seasonal.factor = a, + plot = TRUE, seasonal.plot = FALSE) + +sqrt(mean((nns - tail(AirPassengers, 44)) ^ 2)) + +## ----modulo, eval=TRUE-------------------------------------------------------- +NNS.seas(AirPassengers, modulo = 12, plot = FALSE) + +## ----best optim, eval=FALSE--------------------------------------------------- +# nns.optimal = NNS.ARMA.optim(AirPassengers, +# training.set = 100, +# seasonal.factor = seq(12, 60, 6), +# obj.fn = expression( sqrt(mean((predicted - actual)^2)) ), +# objective = "min", +# pred.int = .95, plot = TRUE) +# +# nns.optimal + +## ----optimres, eval=FALSE----------------------------------------------------- +# [1] "CURRNET METHOD: lin" +# [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:" +# [1] "NNS.ARMA(... method = 'lin' , seasonal.factor = c( 12 ) ...)" +# [1] "CURRENT lin OBJECTIVE FUNCTION = 35.3996540135277" +# [1] "BEST method = 'lin', seasonal.factor = c( 12 )" +# [1] "BEST lin OBJECTIVE FUNCTION = 35.3996540135277" +# [1] "CURRNET METHOD: nonlin" +# [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:" +# [1] "NNS.ARMA(... method = 'nonlin' , seasonal.factor = c( 12 ) ...)" +# [1] "CURRENT nonlin OBJECTIVE FUNCTION = 18.1809033101955" +# [1] "BEST method = 'nonlin' PATH MEMBER = c( 12 )" +# [1] "BEST nonlin OBJECTIVE FUNCTION = 18.1809033101955" +# [1] "CURRNET METHOD: both" +# [1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:" +# [1] "NNS.ARMA(... 
method = 'both' , seasonal.factor = c( 12 ) ...)" +# [1] "CURRENT both OBJECTIVE FUNCTION = 22.7363330823967" +# [1] "BEST method = 'both' PATH MEMBER = c( 12 )" +# [1] "BEST both OBJECTIVE FUNCTION = 22.7363330823967" +# > +# > nns.optimal +# $periods +# [1] 12 +# +# $weights +# NULL +# +# $obj.fn +# [1] 18.1809 +# +# $method +# [1] "nonlin" +# +# $shrink +# [1] FALSE +# +# $nns.regress +# [1] FALSE +# +# $bias.shift +# [1] 0 +# +# $errors +# [1] -6.0626221 -10.8434613 -10.7646998 -22.7134790 -15.3519569 -12.9673866 -9.1626428 3.9393939 7.4882812 12.3750000 29.1132812 34.3281250 19.7002739 +# [14] 20.0656989 11.8833952 -15.1389735 24.1108241 7.4289721 15.2385271 38.3826941 19.2903993 17.4644272 19.3331767 19.8155057 -4.0856291 26.3260739 +# [27] 2.6153110 -24.3491085 3.9057436 -8.8271346 -7.9236143 5.9867956 -3.9068174 -0.7986170 42.1995863 -10.1324609 -20.0852820 8.6573328 -21.3067790 +# [40] -24.3403514 -0.6332912 -29.8418247 -5.8572216 14.8998761 +# +# $results +# [1] 348.9374 411.1565 454.2353 444.2865 388.6480 334.0326 295.8374 339.9394 347.4883 330.3750 391.1133 382.3281 382.7003 455.0657 502.8834 489.8610 428.1108 +# [18] 366.4290 325.2385 375.3827 379.2904 359.4644 425.3332 415.8155 415.9144 498.3261 550.6153 534.6509 466.9057 398.1729 354.0764 410.9868 413.0932 390.2014 +# [35] 461.1996 450.8675 451.9147 543.6573 600.6932 581.6596 507.3667 431.1582 384.1428 446.8999 +# +# $lower.pred.int +# [1] 310.8588 373.0779 416.1567 406.2079 350.5694 295.9540 257.7588 301.8608 309.4097 292.2964 353.0347 344.2495 344.6217 416.9871 464.8048 451.7824 390.0322 +# [18] 328.3504 287.1599 337.3041 341.2118 321.3858 387.2546 377.7369 377.8358 460.2475 512.5367 496.5723 428.8271 360.0943 315.9978 372.9082 375.0146 352.1228 +# [35] 423.1210 412.7889 413.8361 505.5787 562.6146 543.5810 469.2881 393.0796 346.0642 408.8213 +# +# $upper.pred.int +# [1] 387.0160 449.2351 492.3139 482.3651 426.7266 372.1112 333.9160 378.0180 385.5669 368.4536 429.1919 420.4067 420.7789 493.1443 
540.9620 527.9396 466.1894 +# [18] 404.5076 363.3171 413.4613 417.3690 397.5430 463.4118 453.8941 453.9930 536.4047 588.6939 572.7295 504.9843 436.2515 392.1550 449.0654 451.1718 428.2800 +# [35] 499.2782 488.9461 489.9933 581.7359 638.7718 619.7382 545.4453 469.2368 422.2214 484.9785 +# + +## ----extension,results='hide',fig.width=5,fig.height=3,fig.align = "center", eval=FALSE---- +# NNS.ARMA.optim(AirPassengers, +# seasonal.factor = seq(12, 60, 6), +# obj.fn = expression( sqrt(mean((predicted - actual)^2)) ), +# objective = "min", +# pred.int = .95, h = 50, plot = TRUE) + diff --git a/tools/NNS/vignettes/NNSvignette_09_Forecasting.html b/tools/NNS/vignettes/NNSvignette_09_Forecasting.html new file mode 100644 index 0000000..2ef0910 --- /dev/null +++ b/tools/NNS/vignettes/NNSvignette_09_Forecasting.html @@ -0,0 +1,697 @@ + + + + + + + + + + + + + + + +Getting Started with NNS: Forecasting + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Getting Started with NNS: Forecasting

    +

    Fred Viole

    + + + +
    library(NNS)
    +library(data.table)
    +require(knitr)
    +require(rgl)
    +
    +

    Forecasting

    +

    The underlying assumptions of traditional autoregressive models are +well known. The resulting complexity with these models leads to +observations such as,

    +

    "We have found that choosing the wrong model or parameters can often yield poor results, and it is unlikely that even experienced analysts can choose the correct model and parameters efficiently given this array of choices."

    +

    NNS simplifies the forecasting process. Below are some +examples demonstrating NNS.ARMA and its +assumption free, minimal parameter forecasting +method.

    +
    +

    Linear Regression

    +

    NNS.ARMA has the ability to fit a +linear regression to the relevant component series, yielding very fast +results. For our running example we will use the +AirPassengers dataset loaded in base R.

    +

    We will forecast 44 periods (h = 44) of AirPassengers using the first 100 observations (training.set = 100), returning estimates of the final 44 observations. We will then test this against our validation set of tail(AirPassengers, 44).

    +

    Since this is monthly data, we will try a +seasonal.factor = 12.

    +

    Below is the linear fit and associated root mean squared error (RMSE) +using method = "lin".

    +
    nns_lin = NNS.ARMA(AirPassengers, 
    +               h = 44, 
    +               training.set = 100, 
    +               method = "lin", 
    +               plot = TRUE, 
    +               seasonal.factor = 12, 
    +               seasonal.plot = FALSE)
    +

    +
    sqrt(mean((nns_lin - tail(AirPassengers, 44)) ^ 2))
    +
    ## [1] 35.39965
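The holdout split and the error measure used here can be written out in base R. This is a small sketch with hypothetical names (`rmse`, `train`, `holdout`); it does not call `NNS.ARMA` and only shows what is being compared.

```r
# RMSE helper matching the expression used in the text
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

series  <- as.numeric(AirPassengers)            # 144 monthly observations
h       <- 44
train   <- series[seq_len(length(series) - h)]  # first 100 observations
holdout <- tail(series, h)                      # final 44 observations

# Toy check of the helper on values that are easy to verify by hand
toy_rmse <- rmse(predicted = c(1, 2, 3), actual = c(1, 2, 5))
```

Any forecast vector of length 44 can then be scored with `rmse(forecast, holdout)`, which is the same quantity as the `sqrt(mean(...))` call above.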
    +
    +
    +

    Nonlinear Regression

    +

    Now we can try using a nonlinear regression on the relevant component +series using method = "nonlin".

    +
    nns_nonlin = NNS.ARMA(AirPassengers, 
    +               h = 44, 
    +               training.set = 100, 
    +               method = "nonlin", 
    +               plot = FALSE, 
    +               seasonal.factor = 12, 
    +               seasonal.plot = FALSE)
    +
    +sqrt(mean((nns_nonlin - tail(AirPassengers, 44)) ^ 2))
    +
    [1] 18.1809
    +
    +
    +

    Cross-Validation

    +

    We can test a series of seasonal.factors and select the +best one to fit. The largest period to consider would be +0.5 * length(variable), since we need more than 2 points +for a regression! Remember, we are testing the first 100 observations of +AirPassengers, not the full 144 observations.

    +
    seas = t(sapply(1 : 25, function(i) c(i, sqrt( mean( (NNS.ARMA(AirPassengers, h = 44, training.set = 100, method = "lin", seasonal.factor = i, plot=FALSE) - tail(AirPassengers, 44)) ^ 2) ) ) ) )
    +
    +colnames(seas) = c("Period", "RMSE")
    +seas
    +
    ##       Period      RMSE
    +##  [1,]      1  75.67783
    +##  [2,]      2  75.71250
    +##  [3,]      3  75.87604
    +##  [4,]      4  75.16563
    +##  [5,]      5  76.07418
    +##  [6,]      6  70.43185
    +##  [7,]      7  77.98493
    +##  [8,]      8  75.48997
    +##  [9,]      9  79.16378
    +## [10,]     10  81.47260
    +## [11,]     11 106.56886
    +## [12,]     12  35.39965
    +## [13,]     13  90.98265
    +## [14,]     14  95.64979
    +## [15,]     15  82.05345
    +## [16,]     16  74.63052
    +## [17,]     17  87.54036
    +## [18,]     18  74.90881
    +## [19,]     19  96.96011
    +## [20,]     20  88.75015
    +## [21,]     21 100.21346
    +## [22,]     22 108.68674
    +## [23,]     23  85.06430
    +## [24,]     24  35.49018
    +## [25,]     25  75.16192
    +

    Now that we know seasonal.factor = 12 is our best fit, we can see if there’s any benefit from using a nonlinear regression. Alternatively, we can define our best fit as the Period entry of seas in the row with the minimum RMSE.

    +
    a = seas[which.min(seas[ , 2]), 1]
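The selection step can be reproduced without rerunning the forecasts by rebuilding `seas` from the RMSE column printed above. Note that `seas` is a matrix, so its columns are reached by name or position rather than with `$`; the name `rmse_values` is hypothetical.

```r
# RMSE values copied from the printed table, periods 1 through 25
rmse_values <- c(75.67783, 75.71250, 75.87604, 75.16563, 76.07418,
                 70.43185, 77.98493, 75.48997, 79.16378, 81.47260,
                 106.56886, 35.39965, 90.98265, 95.64979, 82.05345,
                 74.63052, 87.54036, 74.90881, 96.96011, 88.75015,
                 100.21346, 108.68674, 85.06430, 35.49018, 75.16192)

seas <- cbind(Period = 1:25, RMSE = rmse_values)

# Period with the smallest RMSE, selected by column name
a <- seas[which.min(seas[, "RMSE"]), "Period"]

# The runner-up, for comparison
runner_up <- seas[order(seas[, "RMSE"])[2], "Period"]
```

The best period is 12, with 24 a close second, as expected for monthly data with an annual cycle.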
    +

    Below you will notice the use of seasonal.factor = a +generates the same output.

    +
    nns = NNS.ARMA(AirPassengers, 
    +               h = 44, 
    +               training.set = 100, 
    +               method = "nonlin", 
    +               seasonal.factor = a, 
    +               plot = TRUE, seasonal.plot = FALSE)
    +

    +
    sqrt(mean((nns - tail(AirPassengers, 44)) ^ 2))
    +
    ## [1] 18.1809
    +

    Note: You may experience instances with monthly data +that report seasonal.factor close to multiples of 3, 4, 6 +or 12. For instance, if the reported +seasonal.factor = {37, 47, 71, 73} use +(seasonal.factor = c(36, 48, 72)) by setting the +modulo parameter in +NNS.seas(..., modulo = 12). The same +suggestion holds for daily data and multiples of 7, or any other time +series with logically inferred cyclical patterns. The nearest periods to +that modulo will be in the expanded output.

    +
    NNS.seas(AirPassengers, modulo = 12, plot = FALSE)
    +
    ## $all.periods
    +##   Period Coefficient.of.Variation Variable.Coefficient.of.Variation
    +## 1     48                0.4002249                         0.4279947
    +## 2     12                0.4059923                         0.4279947
    +## 3     24                0.4279947                         0.4279947
    +## 4     36                0.4279947                         0.4279947
    +## 5     60                0.4279947                         0.4279947
    +## 
    +## $best.period
    +## [1] 48
    +## 
    +## $periods
    +## [1] 48 12 24 36 60
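The adjustment described in the note amounts to snapping each reported period to the nearest multiple of the modulo. The sketch below shows that arithmetic on the periods quoted in the note; it illustrates the idea only and is not the internal logic of `NNS.seas`.

```r
reported <- c(37, 47, 71, 73)   # periods quoted in the note above
modulo   <- 12

# Nearest multiple of the modulo for each reported period, duplicates removed
snapped <- unique(round(reported / modulo) * modulo)
snapped
```

Both 71 and 73 map to 72, which is why four reported periods reduce to the three suggested values.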
    +
    +
    +

    Cross-Validating All Combinations of +seasonal.factor

    +

    NNS also offers a wrapper function, NNS.ARMA.optim(), that tests a given vector of seasonal.factor values and returns the optimized objective function (in this case RMSE, written as obj.fn = expression( sqrt(mean((predicted - actual)^2)) )) and the corresponding periods, as well as the NNS.ARMA regression method used. Objective functions from external packages work as well, such as obj.fn = expression(Metrics::rmse(actual, predicted)).

    +

    NNS.ARMA.optim() will also test whether +to regress the underlying data first, shrink the estimates +to their subset mean values, include a bias.shift based on +its internal validation errors, and compare different +weights of both linear and nonlinear estimates.

    +

    Given our monthly dataset, we will try periods from one to five years in 6-month steps by setting seasonal.factor = seq(12, 60, 6), based on our NNS.seas() insights above.

    +
    nns.optimal = NNS.ARMA.optim(AirPassengers,
    +                             training.set = 100, 
    +                             seasonal.factor = seq(12, 60, 6),
    +                             obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
    +                             objective = "min",
    +                             pred.int = .95, plot = TRUE)
    +
    +nns.optimal
    +
    [1] "CURRNET METHOD: lin"
    +[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +[1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 12 ) ...)"
    +[1] "CURRENT lin OBJECTIVE FUNCTION = 35.3996540135277"
    +[1] "BEST method = 'lin', seasonal.factor = c( 12 )"
    +[1] "BEST lin OBJECTIVE FUNCTION = 35.3996540135277"
    +[1] "CURRNET METHOD: nonlin"
    +[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +[1] "NNS.ARMA(... method =  'nonlin' , seasonal.factor =  c( 12 ) ...)"
    +[1] "CURRENT nonlin OBJECTIVE FUNCTION = 18.1809033101955"
    +[1] "BEST method = 'nonlin' PATH MEMBER = c( 12 )"
    +[1] "BEST nonlin OBJECTIVE FUNCTION = 18.1809033101955"
    +[1] "CURRNET METHOD: both"
    +[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
    +[1] "NNS.ARMA(... method =  'both' , seasonal.factor =  c( 12 ) ...)"
    +[1] "CURRENT both OBJECTIVE FUNCTION = 22.7363330823967"
    +[1] "BEST method = 'both' PATH MEMBER = c( 12 )"
    +[1] "BEST both OBJECTIVE FUNCTION = 22.7363330823967"
    +> 
    +> nns.optimal
    +$periods
    +[1] 12
    +
    +$weights
    +NULL
    +
    +$obj.fn
    +[1] 18.1809
    +
    +$method
    +[1] "nonlin"
    +
    +$shrink
    +[1] FALSE
    +
    +$nns.regress
    +[1] FALSE
    +
    +$bias.shift
    +[1] 0
    +
    +$errors
    + [1]  -6.0626221 -10.8434613 -10.7646998 -22.7134790 -15.3519569 -12.9673866  -9.1626428   3.9393939   7.4882812  12.3750000  29.1132812  34.3281250  19.7002739
    +[14]  20.0656989  11.8833952 -15.1389735  24.1108241   7.4289721  15.2385271  38.3826941  19.2903993  17.4644272  19.3331767  19.8155057  -4.0856291  26.3260739
    +[27]   2.6153110 -24.3491085   3.9057436  -8.8271346  -7.9236143   5.9867956  -3.9068174  -0.7986170  42.1995863 -10.1324609 -20.0852820   8.6573328 -21.3067790
    +[40] -24.3403514  -0.6332912 -29.8418247  -5.8572216  14.8998761
    +
    +$results
    + [1] 348.9374 411.1565 454.2353 444.2865 388.6480 334.0326 295.8374 339.9394 347.4883 330.3750 391.1133 382.3281 382.7003 455.0657 502.8834 489.8610 428.1108
    +[18] 366.4290 325.2385 375.3827 379.2904 359.4644 425.3332 415.8155 415.9144 498.3261 550.6153 534.6509 466.9057 398.1729 354.0764 410.9868 413.0932 390.2014
    +[35] 461.1996 450.8675 451.9147 543.6573 600.6932 581.6596 507.3667 431.1582 384.1428 446.8999
    +
    +$lower.pred.int
    + [1] 310.8588 373.0779 416.1567 406.2079 350.5694 295.9540 257.7588 301.8608 309.4097 292.2964 353.0347 344.2495 344.6217 416.9871 464.8048 451.7824 390.0322
    +[18] 328.3504 287.1599 337.3041 341.2118 321.3858 387.2546 377.7369 377.8358 460.2475 512.5367 496.5723 428.8271 360.0943 315.9978 372.9082 375.0146 352.1228
    +[35] 423.1210 412.7889 413.8361 505.5787 562.6146 543.5810 469.2881 393.0796 346.0642 408.8213
    +
    +$upper.pred.int
    + [1] 387.0160 449.2351 492.3139 482.3651 426.7266 372.1112 333.9160 378.0180 385.5669 368.4536 429.1919 420.4067 420.7789 493.1443 540.9620 527.9396 466.1894
    +[18] 404.5076 363.3171 413.4613 417.3690 397.5430 463.4118 453.8941 453.9930 536.4047 588.6939 572.7295 504.9843 436.2515 392.1550 449.0654 451.1718 428.2800
    +[35] 499.2782 488.9461 489.9933 581.7359 638.7718 619.7382 545.4453 469.2368 422.2214 484.9785
    +
    +
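The printed components can be cross-checked against the data with base R alone. The sketch below copies the first three values of `$results`, `$lower.pred.int` and `$upper.pred.int` from the output above and compares them with the held-out observations; it assumes the hold-out set is the last 44 points of `AirPassengers`, which follows from `training.set = 100` on a 144-point series.

```r
# Hold-out portion: the 44 observations after the first 100.
actual <- tail(as.numeric(AirPassengers), 44)

# First three values copied from the printed output above.
predicted <- c(348.9374, 411.1565, 454.2353)
lower     <- c(310.8588, 373.0779, 416.1567)
upper     <- c(387.0160, 449.2351, 492.3139)

# $errors should equal predicted minus actual (compare with the first
# three printed entries of $errors).
err <- predicted - actual[1:3]
print(round(err, 4))

# Distance from each forecast to its interval bounds.
print(round(upper - predicted, 4))
print(round(predicted - lower, 4))
```

For these three points the differences reproduce the leading entries of `$errors`, and the lower and upper bounds sit at the same distance from each forecast.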

    +
    +
    +
    +

    Extension of Estimates

    +

    We can forecast another 50 periods out-of-sample +(h = 50) by dropping the training.set +parameter, while still generating the 95% prediction intervals.

    +
    NNS.ARMA.optim(AirPassengers, 
    +                seasonal.factor = seq(12, 60, 6),
    +                obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
    +                objective = "min",
    +                pred.int = .95, h = 50, plot = TRUE)
    +
    +

    +
    +
    +
    +

    Brief Notes on Other Parameters

    +
      +
    • seasonal.factor = c(1, 2, ...)
    • +
    +

    We included the ability to use any number of specified seasonal +periods simultaneously, weighted by their strength of seasonality. +This is computationally expensive when used with nonlinear regressions and large +numbers of relevant periods.

    +
      +
    • weights
    • +
    +

    Instead of weighting by each seasonal.factor's strength of +seasonality, we offer the ability to weight each period by any user-defined +compatible vector summing to 1.
    +Equal weighting would be weights = "equal".
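A minimal sketch of preparing such a weight vector in base R follows. The periods and raw scores are invented for illustration, and reading "compatible" as "same length as `seasonal.factor`" is an assumption rather than something the source states.

```r
# Illustrative periods and raw importance scores (both made up).
periods <- c(12, 24, 36)
raw     <- c(3, 2, 1)

# Normalize so the weights sum to 1.
w <- raw / sum(raw)

# Sanity checks; the equal-length requirement is an assumption here.
stopifnot(length(w) == length(periods))
stopifnot(isTRUE(all.equal(sum(w), 1)))
print(w)

# Possible use, not run here because it requires the NNS package:
# NNS.ARMA(AirPassengers, seasonal.factor = periods, weights = w)
```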

    +
      +
    • pred.int
    • +
    +

    Provides the values for the specified prediction intervals within +[0,1] for each forecasted point and plots the bootstrapped replicates +for the forecasted points.

    +
      +
    • seasonal.factor = FALSE
    • +
    +

    We also included the ability to use all detected seasonal periods +simultaneously, weighted by their strength of seasonality. +This is computationally expensive when used with nonlinear regressions and large +numbers of relevant periods.

    +
      +
    • best.periods
    • +
    +

    This parameter restricts the number of detected seasonal periods to +use, again, weighted by their strength. To be used in conjunction with +seasonal.factor = FALSE.

    +
      +
    • modulo
    • +
    +

    To be used in conjunction with seasonal.factor = FALSE. +This parameter will ensure logical seasonal patterns (e.g., +modulo = 7 for daily data) are included along with the +results.

    +
      +
    • mod.only
    • +
    +

    To be used in conjunction with +seasonal.factor = FALSE & modulo != NULL. This +parameter will ensure empirical patterns are kept along with the logical +seasonal patterns.

    +
      +
    • dynamic = TRUE
    • +
    +

    This setting generates new seasonal period(s) using the estimated +values as continuations of the variable, either with or without a +training.set. It is also computationally expensive due to the +recalculation of seasonal periods for each estimated value.

    +
      +
    • plot, seasonal.plot
    • +
    +

    These are the plotting arguments, easily enabled or disabled with +TRUE or FALSE. +seasonal.plot = TRUE will not plot without +plot = TRUE. If a seasonal analysis is all that is desired, +NNS.seas is the function specifically suited for that +task.

    +
    +
    +
    +

    Multivariate Time Series Forecasting

    +

    The extension to a generalized multivariate instance is provided in +the following documentation of the +NNS.VAR() function:

    + +
    +
    +

    References

    +

    If the user is so motivated, detailed arguments and proofs are +provided within the following:

    + +
    + + + + + + + + + + + diff --git a/tools/NNS_13.0.tar.gz b/tools/NNS_13.0.tar.gz index 3201436..7e6adb3 100644 Binary files a/tools/NNS_13.0.tar.gz and b/tools/NNS_13.0.tar.gz differ diff --git a/upstream/NNS b/upstream/NNS index 7250bb6..8183e96 160000 --- a/upstream/NNS +++ b/upstream/NNS @@ -1 +1 @@ -Subproject commit 7250bb627d12f6cdb1dbe60a8b8e50385e2a7c41 +Subproject commit 8183e964c941d9981e19b23dfbc9fc8336903d89