+
+bookdown::pdf_book:
+ keep_tex: true
+ latex_engine: xelatex
+ pandoc_args:
+ - "--pdf-engine-opt=-8bit"
+
+bookdown::epub_book: default
\ No newline at end of file
diff --git a/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd b/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd
new file mode 100644
index 0000000..0a248e7
--- /dev/null
+++ b/tools/NNS/book/chapter-01-why-classical-statistics-breaks.Rmd
@@ -0,0 +1,203 @@
+# Why Classical Statistics Breaks
+
+Statistics was designed for a world that rarely exists.
+
+The classical statistical framework was built during a time when data were scarce, computation was expensive, and tractable mathematical models were essential. In that environment, simplifying assumptions were not merely convenient—they were necessary.
+
+Symmetry simplified algebra, linearity simplified inference, and parametric distributions simplified estimation. The result was a remarkably elegant mathematical framework that dominated statistics for over a century.
+
+Yet the real world is rarely so cooperative: relationships are often nonlinear, and observed distributions are frequently skewed, heavy-tailed, or otherwise far from normal. Modern data therefore repeatedly violate the assumptions upon which classical statistics was constructed.
+
+A familiar example is daily asset returns: even broad equity indexes exhibit fat tails and occasional abrupt drawdowns that are poorly captured by Gaussian models.
+
+This book begins with a simple observation: **many of the core tools of classical statistics fail because they collapse directional information into symmetric aggregates.** Once this collapse occurs, important structural information about the data is permanently lost. The purpose of this chapter is to explain why this happens—and why a different statistical primitive is needed.
+
+---
+
+## The Hidden Assumption of Symmetry
+
+Most statistical quantities treat deviations from a reference point symmetrically.
+
+Consider the most familiar measure of variability:
+
+\[
+Var(X) = E[(X-\mu)^2]
+\]
+
+The formula squares deviations from the mean and averages them. Positive and negative deviations contribute equally.
+
+But real systems often care deeply about **which direction a deviation occurs**.
+
+A negative financial return is not equivalent to a positive return of the same magnitude.
+A forecast that underestimates demand may be far more costly than one that overestimates it.
+A loss relative to a benchmark is not psychologically equivalent to a gain.
+
+Yet classical statistics treats these deviations identically.
+
+The symmetry is not inherent to the data.
+It is imposed by the mathematical formulation.
+
+And once imposed, directional information disappears.
+
+---
+
+## Aggregation Before Observation
+
+To see what is lost, rewrite variance by separating positive and negative deviations.
+
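+Define the positive-part operator
+
+\[
+x^{+} = \max(x,0),
+\]
+
+and write \(x_+^r = (x^{+})^r\) for its powers. Then variance can be written as
+
+\[
+Var(X) =
+E[(X-\mu)_+^2] + E[(\mu-X)_+^2],
+\]
+
+because for every outcome at most one of \((X-\mu)^{+}\) and \((\mu-X)^{+}\) is nonzero, so their squares sum to \((X-\mu)^2\).
+
+This decomposition shows that variance is actually the **sum of two directional quantities**:
+
+- upside deviation, \(E[(X-\mu)_+^2]\)
+- downside deviation, \(E[(\mu-X)_+^2]\)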
+
+Variance reports only their sum.
+
+Two distributions can therefore have identical variance while possessing completely different directional structures.
+
+One distribution may have large upside volatility and little downside risk.
+Another may have the opposite profile.
+
+Variance cannot distinguish them.
+
+The symmetric statistic is therefore a **projection** of a richer directional structure.
+
+Mathematically, the directional components determine the symmetric moment uniquely, but the symmetric moment cannot recover the directional components without additional assumptions.
+
+Classical moments therefore aggregate directional information before reporting the result.
+Once aggregated, the original directional structure cannot generally be recovered.
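+
+The identical-variance claim is easy to check numerically. The sketch below is base R only; `upper_sq()` and `lower_sq()` are illustrative helper names rather than functions from any package, and the exponential sample is simulated purely for demonstration. Mirroring a right-skewed sample leaves its variance unchanged but swaps its upside and downside components.
+
+```r
+# Directional components of variance (divisor n); illustrative helpers
+upper_sq <- function(x) mean(pmax(x - mean(x), 0)^2)  # E[(X - mu)_+^2]
+lower_sq <- function(x) mean(pmax(mean(x) - x, 0)^2)  # E[(mu - X)_+^2]
+
+set.seed(1)
+a <- rexp(1000)  # right-skewed sample: large upside deviations
+b <- -a          # mirror image: large downside deviations
+
+c(var_a = var(a), var_b = var(b))            # equal variances
+c(up_a = upper_sq(a), down_a = lower_sq(a))  # upside component dominates
+c(up_b = upper_sq(b), down_b = lower_sq(b))  # components swapped
+
+# The two components sum to the variance computed with divisor n
+n <- length(a)
+all.equal(upper_sq(a) + lower_sq(a), var(a) * (n - 1) / n)
+```
+
+Any summary built only on `var()` treats `a` and `b` as interchangeable, while the directional pair separates them immediately.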
+
+---
+
+## The Problem with Linear Dependence
+
+The same issue appears in dependence measurement.
+
+The classical correlation coefficient measures the strength of a linear relationship:
+
+\[
+\rho(X,Y)=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}.
+\]
+
+Correlation works well when relationships are approximately linear.
+
+But many relationships are not.
+
+Two variables may exhibit strong dependence through nonlinear patterns:
+
+- threshold effects in economics
+- volatility clustering in financial markets
+- asymmetric reactions to shocks
+- conditional dependence structures that cancel under linear aggregation
+
+For example, if \(X\) is symmetric around zero with \(E[X^4]<\infty\) and \(Y = X^2\), then \(Corr(X,Y) = 0\) despite perfect deterministic dependence, because \(Cov(X,X^2) = E[X^3] - E[X]E[X^2] = 0\). Correlation does not merely understate the relationship; it misses it entirely.
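+
+A minimal base R check of this example follows. It uses an evenly spaced grid symmetric around zero in place of a random sample (an illustrative choice). Splitting the grid at the benchmark 0 exposes the dependence that the global coefficient averages away.
+
+```r
+x <- seq(-1, 1, length.out = 201)  # evenly spaced grid, symmetric around zero
+y <- x^2                           # Y is a deterministic function of X
+
+cor(x, y)  # zero up to floating-point rounding
+
+# Split at the benchmark 0: the dependence is strong on each side
+cor(x[x < 0], y[x < 0])  # strongly negative on the left branch
+cor(x[x > 0], y[x > 0])  # strongly positive on the right branch
+```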
+
+The problem again arises from aggregation.
+
+Covariance averages co-deviations across the entire distribution, collapsing directional structure into a single linear measure.
+
+---
+
+## Parametric Comfort and Model Risk
+
+Another pillar of classical statistics is the use of parametric distributions.
+
+The normal distribution occupies a central role in statistical inference:
+
+- hypothesis testing
+- regression modeling
+- time series analysis
+- risk measurement
+
+Parametric models dramatically simplify estimation because they restrict the space of possible distributions.
+
+But when the assumed model is incorrect, inference can become dangerously misleading.
+
+Financial markets provide many examples.
+
+Asset returns exhibit heavy tails, skewness, and time-varying volatility, features that violate the assumptions of the normal distribution. Models built on Gaussian assumptions have consequently tended to underestimate the frequency and severity of extreme events.
+
+The problem is not simply that the wrong distribution is chosen.
+
+The deeper issue is that **parametric assumptions impose structure that the data may not possess**.
+
+---
+
+## The Limits of Traditional Nonparametrics
+
+Nonparametric methods were introduced to address these problems by estimating statistical objects directly from data.
+
+Kernel density estimation, kernel regression, and smoothing splines are common examples.
+
+However, most nonparametric methods introduce another challenge: **bandwidth selection** (for smoothing splines, the analogous choice is the smoothing parameter).
+
+The bandwidth determines how much smoothing occurs.
+
+Small bandwidths produce noisy estimates.
+Large bandwidths obscure structure.
+
+In practice, the bandwidth usually affects a kernel estimate far more than the choice of kernel, and its selection is often the dominant source of modeling error in nonparametric estimation.
+
+Thus even nonparametric methods frequently rely on externally chosen tuning parameters.
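+
+The sensitivity to bandwidth can be seen with R's built-in `density()`. In this sketch the bimodal sample is simulated, `count_modes()` is a hypothetical helper that counts interior local maxima of the estimated curve, and the two bandwidths are deliberately extreme.
+
+```r
+set.seed(7)
+x <- c(rnorm(200, mean = -2), rnorm(200, mean = 2))  # clearly bimodal sample
+
+# Count interior local maxima: slope changes from positive to negative
+count_modes <- function(d) sum(diff(sign(diff(d$y))) == -2)
+
+count_modes(density(x, bw = 3))     # oversmoothed: the two modes merge
+count_modes(density(x))             # default rule-of-thumb bandwidth (bw.nrd0)
+count_modes(density(x, bw = 0.02))  # undersmoothed: many spurious modes
+```
+
+The same data yield one mode or many depending only on `bw`, and the default is itself an externally chosen rule of thumb.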
+
+---
+
+## A Different Primitive
+
+The difficulties described above share a common source, and it is worth stating them plainly before moving on:
+
+1. **Symmetric aggregation hides directional information.**
+2. **Linear dependence measures fail for nonlinear relationships.**
+3. **Parametric assumptions introduce model risk.**
+4. **Many nonparametric methods depend on arbitrary bandwidth selection.**
+
+Classical statistics begins with **symmetric aggregates**.
+
+Directional information is collapsed before analysis begins.
+
+An alternative approach is to reverse this order.
+
+Instead of starting with symmetric statistics, we begin with **directional deviations relative to a benchmark**—measuring how observations move relative to a target, separately above and below it.
+
+The key insight of this book is that directional deviation relative to a benchmark is sufficient to reconstruct many of the core constructs of statistics.
+
+From this single primitive we will derive:
+
+- the cumulative distribution function,
+- classical moments,
+- nonlinear dependence measures,
+- nonparametric estimators,
+- and benchmark-relative expected utility.
+
+Remarkably, symmetric statistics emerge from this framework not as axioms but as aggregations—special cases of a more general directional structure.
+
+---
+
+## From Symmetric Statistics to Directional Statistics
+
+Classical statistics treats symmetry as fundamental.
+
+Directional statistics treats symmetry as a special case.
+
+Under the directional framework:
+
+- symmetric moments become aggregates of directional components,
+- nonlinear dependence can be measured directly,
+- distributions can be represented without parametric assumptions,
+- and nonparametric estimation can adapt to data structure without externally chosen bandwidths.
+
+The next chapter introduces the mathematical foundation of this framework: **directional deviation operators**.
+
+These operators are the primitive from which the rest of the book is built.
diff --git a/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd b/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd
new file mode 100644
index 0000000..9f8cc34
--- /dev/null
+++ b/tools/NNS/book/chapter-02-directional-deviation-operators.Rmd
@@ -0,0 +1,286 @@
+# Directional Deviation Operators
+
+Chapter 1 argued that many failures of classical statistics arise from **symmetric aggregation**.
+Classical moments, covariance, and correlation collapse directional information into symmetric summaries before analysis begins.
+
+This chapter introduces the mathematical primitive that avoids that collapse:
+
+**directional deviation operators.**
+
+These operators measure deviations relative to a benchmark separately above and below the reference point. From this simple construction we will derive many familiar objects of statistics.
+
+The framework begins with a simple observation:
+
+**any deviation relative to a benchmark has a direction.**
+
+---
+
+## Deviations Relative to a Benchmark
+
+Let \(X\) be a real-valued random variable and let \(t \in \mathbb{R}\) denote a benchmark.
+
+Classical statistics measures deviations using
+
+\[
+X - t
+\]
+
+which mixes positive and negative deviations together.
+
+Directional statistics separates them.
+
+Define the **positive-part operator**
+
+\[
+x^{+} = \max(x,0).
+\]
+
+Using this operator we define two directional deviations:
+
+\[
+(X-t)^+ = \max(X-t,0)
+\]
+
+\[
+(t-X)^+ = \max(t-X,0).
+\]
+
+These represent
+
+- deviations **above the benchmark**
+- deviations **below the benchmark**
+
+Both quantities are nonnegative.
+
+Together they fully characterize the magnitude of deviation relative to \(t\).
+
+---
+
+## Directional Decomposition of Deviations
+
+Every deviation can be decomposed into directional components.
+
+For any real number \(x\),
+
+\[
+x = x^+ - (-x)^+.
+\]
+
+Applying this identity to \(X-t\) yields
+
+\[
+X-t = (X-t)^+ - (t-X)^+.
+\]
+
+Thus the classical deviation can be expressed as the **difference between two directional magnitudes**.
+
+The directional operators also reconstruct the magnitude of deviation:
+
+\[
+|X-t| = (X-t)^+ + (t-X)^+.
+\]
+
+Thus directional components fully determine both the signed deviation and its magnitude.
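+
+Both identities can be verified numerically in base R. Here `pos()` is an illustrative helper for the positive-part operator, and the benchmark and observations are arbitrary.
+
+```r
+pos <- function(x) pmax(x, 0)    # positive-part operator x^+
+
+target <- 0.5                    # arbitrary benchmark t
+x <- c(-2, -0.5, 0.5, 1, 3.25)   # arbitrary observations
+
+up   <- pos(x - target)   # (X - t)^+
+down <- pos(target - x)   # (t - X)^+
+
+all.equal(up - down, x - target)       # signed deviation
+all.equal(up + down, abs(x - target))  # magnitude
+all(up * down == 0)                    # at most one component is nonzero
+```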
+
+The key implication is structural:
+
+**the symmetric deviation is an aggregation of directional components.**
+
+Classical statistics begins with the aggregate.
+Directional statistics begins with the components.
+
+---
+
+## Directional Operators
+
+The functions
+
+\[
+(X-t)^+ , \quad (t-X)^+
+\]
+
+are called **directional deviation operators**.
+
+They induce a natural partition of the sample space:
+
+- \(X > t\)
+- \(X \le t\)
+
+Within each region the operators measure the magnitude of deviation from the benchmark.
+
+This partition is fundamental. Many real-world systems evaluate outcomes relative to targets:
+
+- profits relative to costs
+- returns relative to required benchmarks
+- losses relative to liabilities
+- forecast errors relative to expected demand
+
+Directional deviation operators formalize this benchmark-relative measurement.
+
+---
+
+## Partial Moments
+
+Once directional deviations are defined, their magnitudes can be summarized through expectations.
+
+For integer \(r \ge 0\), writing \(x_+^r = (x^{+})^r\), define the **lower partial moment**
+
+\[
+L_r(t;X) = E[(t-X)_+^r]
+\]
+
+and the **upper partial moment**
+
+\[
+U_r(t;X) = E[(X-t)_+^r].
+\]
+
+These quantities measure directional deviation magnitudes relative to the benchmark.
+
+Both expectations are finite whenever \(E[|X|^r]<\infty\): for \(r \ge 1\) each directional power is bounded by \(|X-t|^r \le 2^{r-1}(|X|^r+|t|^r)\), and for \(r=0\) both are bounded by 1.
+
+The parameter \(r\) determines the type of deviation measured.
+
+| Degree \(r\) | Interpretation |
+|---|---|
+| 0 | probability mass |
+| 1 | directional mean deviation |
+| 2 | directional variance |
+| \(r>2\) | higher-order tail structure |
+
+Partial moments therefore generalize classical moments while preserving directional structure.
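+
+A base R sketch of the empirical partial moments in the table follows, using divisor \(n\). The names `lpm()` and `upm()` are hypothetical helpers whose argument order mirrors the `LPM(r, t, X)` pattern of the notation bridge below; they are not the `NNS` implementations. Degree zero is handled as an indicator because R evaluates `0^0` as 1, which would otherwise count every observation; Chapter 3 makes this convention precise.
+
+```r
+lpm <- function(r, t, x) {
+  if (r == 0) return(mean(x <= t))  # degree-zero convention: indicator of X <= t
+  mean(pmax(t - x, 0)^r)            # E[(t - X)_+^r], divisor n
+}
+upm <- function(r, t, x) {
+  if (r == 0) return(mean(x > t))   # degree-zero convention: indicator of X > t
+  mean(pmax(x - t, 0)^r)            # E[(X - t)_+^r], divisor n
+}
+
+x <- c(-3, -1, 0, 2, 4, 7)
+# One column per degree r = 0, 1, 2, 3 at benchmark t = 0
+sapply(0:3, function(r) c(r = r, lower = lpm(r, 0, x), upper = upm(r, 0, x)))
+```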
+
+---
+
+## Notation Bridge: Theory to R Implementation
+
+The manuscript uses theoretical notation in proofs and function-style notation in implementation examples. The mapping is direct:
+
+| Theoretical object | Meaning | R implementation pattern (Using `NNS` Package) |
+|---|---|---|
+| \(L_r(t;X)\) | lower partial moment of degree \(r\) at benchmark \(t\) | `LPM(r, t, X)` |
+| \(U_r(t;X)\) | upper partial moment of degree \(r\) at benchmark \(t\) | `UPM(r, t, X)` |
+| \(L_r(t;X)_{\text{ratio}}\) | normalized lower share \(L_r/(L_r+U_r)\) | `LPM.ratio(r, t, X)` |
+| \(U_r(t;X)_{\text{ratio}}\) | normalized upper share \(U_r/(L_r+U_r)\) | `UPM.ratio(r, t, X)` |
+| \(CoLPM\), \(CoUPM\), \(DLPM\), \(DUPM\) | concordant/divergent co-partial moments | `Co.LPM(...)`, `Co.UPM(...)`, `D.LPM(...)`, `D.UPM(...)` |
+
+Unless otherwise stated, later chapters use the mathematical form for derivations and the function-call form for reproducible examples.
+
+---
+
+## Benchmarks
+
+A distinctive feature of partial moments is the benchmark \(t\).
+
+In classical statistics, reference points are usually determined by the distribution itself.
+The mean and median are defined internally, and the variance measures dispersion around the mean.
+
+Partial moments differ in an important way: the benchmark need not be determined by the distribution.
+
+Instead, \(t\) may represent an **externally meaningful reference point** chosen by the analyst or by the decision context.
+
+Examples include:
+
+- target returns in finance
+- policy thresholds in economics
+- forecast baselines in operations
+- safety limits in engineering
+- aspiration levels in behavioral decision theory
+
+The benchmark therefore embeds the context in which deviations matter.
+
+Directional statistics evaluates distributions relative to those contexts rather than relative to purely distributional averages.
+
+---
+
+## Relationship to Classical Moments
+
+Classical moments arise as aggregations of partial moments.
+
+For integer \(r \ge 1\),
+
+\[
+E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X).
+\]
+
+This identity shows that symmetric moments are **signed combinations of directional components**.
+
+Several familiar quantities follow immediately.
+
+### Mean
+
+\[
+E[X] = U_1(0;X) - L_1(0;X)
+\]
+
+### Variance
+
+\[
+Var(X) = U_2(\mu;X) + L_2(\mu;X)
+\]
+
+This identity is the **population** variance decomposition. Empirical partial moments divide by \(n\), whereas `var(x)` in R returns the sample variance with divisor \(n-1\). A numerical check should therefore compare `var(x)` with `(UPM(2, mean(x), x) + LPM(2, mean(x), x)) * n / (n - 1)`, i.e., apply the Bessel correction factor \(n/(n-1)\).
+
+### Third Central Moment
+
+\[
+E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X).
+\]
+
+Thus classical symmetric statistics do not introduce fundamentally new objects.
+They aggregate directional ones.
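+
+These identities can be confirmed on a simulated sample. The base R sketch below uses illustrative helpers `lpm()` and `upm()` (divisor \(n\), degree \(r \ge 1\)); the shifted gamma sample is an arbitrary skewed example that takes both signs, so the mean identity at benchmark 0 is not trivial.
+
+```r
+lpm <- function(r, t, x) mean(pmax(t - x, 0)^r)  # degree r >= 1 only
+upm <- function(r, t, x) mean(pmax(x - t, 0)^r)
+
+set.seed(42)
+x  <- rgamma(500, shape = 2) - 1  # skewed sample taking both signs
+mu <- mean(x)
+n  <- length(x)
+
+# Mean: E[X] = U_1(0) - L_1(0)
+all.equal(upm(1, 0, x) - lpm(1, 0, x), mu)
+
+# Variance: U_2(mu) + L_2(mu) uses divisor n, var() uses n - 1
+all.equal((upm(2, mu, x) + lpm(2, mu, x)) * n / (n - 1), var(x))
+
+# Third central moment: U_3(mu) - L_3(mu)
+all.equal(upm(3, mu, x) - lpm(3, mu, x), mean((x - mu)^3))
+
+# General identity E[(X - t)^r] = U_r(t) + (-1)^r L_r(t) at an arbitrary t
+r <- 4; t0 <- 1.5
+all.equal(upm(r, t0, x) + (-1)^r * lpm(r, t0, x), mean((x - t0)^r))
+```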
+
+Importantly, the mapping from partial moments to symmetric moments is **many-to-one**.
+
+Directional components determine the symmetric moment uniquely, but the symmetric moment does not generally determine the directional components.
+
+Directional moments therefore contain strictly more information.
+
+---
+
+## Information Loss in Symmetric Aggregation
+
+Consider two distributions with identical variance.
+
+Distribution A may exhibit
+
+- large upside deviations
+- small downside deviations.
+
+Distribution B may exhibit
+
+- small upside deviations
+- large downside deviations.
+
+Both can produce the same value of
+
+\[
+Var(X).
+\]
+
+Variance alone cannot distinguish them.
+
+However the directional quantities
+
+\[
+U_2(\mu;X), \quad L_2(\mu;X)
+\]
+
+immediately reveal the asymmetry.
+
+Symmetric statistics therefore represent **projections of directional structure**.
+
+Once this projection occurs, the original directional information cannot generally be recovered.
+
+---
+
+## Directional Operators as a Statistical Primitive
+
+Directional deviation operators provide a foundation from which many statistical constructs can be derived.
+
+Rather than beginning with symmetric statistics and imposing directional interpretation afterward, the directional framework reverses the order: deviations relative to a benchmark are measured first, and symmetric statistics emerge only as aggregations of those directional components.
+
+The implications of this perspective are surprisingly broad.
+
+The next chapter begins with a result that illustrates the power of the framework:
+
+**the cumulative distribution function itself is a partial moment.**
diff --git a/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd b/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd
new file mode 100644
index 0000000..38bc8cc
--- /dev/null
+++ b/tools/NNS/book/chapter-03-distribution-theory-from-partial-moments.Rmd
@@ -0,0 +1,366 @@
+# Distribution Theory from Partial Moments
+
+Chapter 2 introduced directional deviation operators and the partial moments constructed from them.
+Those operators separate deviations relative to a benchmark into directional components.
+
+This chapter establishes a surprising result:
+
+**the cumulative distribution function is itself a partial moment.**
+
+Once this relationship is recognized, several foundational objects of probability theory—survival functions, hazard rates, and quantile functions—emerge naturally from the same framework.
+
+---
+
+## Degree-Zero Partial Moments
+
+Recall the definitions of the lower and upper partial moments.
+
+For integer \( r \ge 0 \),
+
+\[
+L_r(t;X) = E[(t-X)_+^r]
+\]
+
+\[
+U_r(t;X) = E[(X-t)_+^r].
+\]
+
+When \( r = 0 \), we interpret the expressions directly in indicator form:
+
+\[
+(t-X)_+^0 =
+\begin{cases}
+1 & X \le t \\
+0 & X > t
+\end{cases}
+\]
+
+\[
+(X-t)_+^0 =
+\begin{cases}
+1 & X > t \\
+0 & X \le t
+\end{cases}
+\]
+
+Thus the degree-zero lower partial moment becomes
+
+\[
+L_0(t;X) = E[(t-X)_+^0].
+\]
+
+Observe that the expression \((t-X)_+^0\) behaves exactly like an indicator function, which is the intended degree-zero convention used throughout this chapter.
+
+Thus
+
+\[
+(t-X)_+^0 = 1_{\{X \le t\}}.
+\]
+
+Taking expectations yields the following fundamental result.
+
+---
+
+### Theorem 3.1 (CDF Representation)
+
+For any random variable \(X\) and benchmark \(t \in \mathbb{R}\),
+
+\[
+L_0(t;X) = P(X \le t) = F_X(t).
+\]
+
+**Proof**
+
+From the definition of the lower partial moment,
+
+\[
+L_0(t;X) = E[(t-X)_+^0].
+\]
+
+As shown above,
+
+\[
+(t-X)_+^0 = 1_{\{X \le t\}}.
+\]
+
+Therefore
+
+\[
+L_0(t;X) = E[1_{\{X \le t\}}].
+\]
+
+Since the expectation of an indicator equals the probability of the event,
+
+\[
+L_0(t;X) = P(X \le t).
+\]
+
+Thus
+
+\[
+L_0(t;X) = F_X(t).
+\]
+
+\(\square\)
+
+**Remark.**
+The cumulative distribution function is therefore not an independent primitive of probability theory. It is the degree-zero instance of the partial-moment operator.
+
+### Empirical CDF Equivalence in R
+
+The degree-zero lower partial moment can be computed directly and compared to the empirical CDF:
+
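+```r
+library(NNS)
+
+# Example data (assumed; any numeric vector works)
+set.seed(123)
+x <- rnorm(100)
+
+# Empirical CDF versus the degree-zero LPM at two benchmarks
+P <- ecdf(x)
+P(0); P(1)
+LPM(degree = 0, target = 0, variable = x); LPM(degree = 0, target = 1, variable = x)
+
+# Vectorized targets:
+LPM(degree = 0, target = c(0, 1), variable = x)
+
+# Overlay: LPM evaluated at the sorted observations traces the ECDF steps
+plot(ecdf(x))
+points(sort(x), LPM(degree = 0, target = sort(x), variable = x), col = "red")
+legend("left", legend = c("ecdf", "LPM.CDF"), fill = c("black", "red"), border = NA, bty = "n")
+```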
+
+---
+
+## Complementary Directional Probability
+
+Theorem 3.1 showed that the cumulative distribution function is the degree-zero lower partial moment:
+
+\[
+F_X(t) = L_0(t;X).
+\]
+
+The complementary directional probability is given by the **upper degree-zero partial moment**
+
+\[
+U_0(t;X) = P(X > t).
+\]
+
+These two quantities partition the sample space, so
+
+\[
+L_0(t;X) + U_0(t;X) = 1.
+\]
+
+Equivalently,
+
+\[
+F_X(t) + U_0(t;X) = 1.
+\]
+
+Thus the directional operators provide a natural decomposition of probability mass relative to the benchmark \(t\):
+
+- \(L_0(t;X)\): probability mass **at or below the benchmark**
+- \(U_0(t;X)\): probability mass **above the benchmark**
+
+This directional partition forms the foundation for the survival and hazard functions examined in the next sections.
+
+---
+
+## The Survival Function
+
+The **survival function** is defined as
+
+\[
+S_X(t) = P(X > t).
+\]
+
+Using the directional framework,
+
+\[
+S_X(t) = U_0(t;X).
+\]
+
+Thus the survival function is simply the **upper degree-zero partial moment**.
+
+Because
+
+\[
+F_X(t) + S_X(t) = 1,
+\]
+
+the CDF and survival function represent complementary directional probabilities.
+
+This interpretation is particularly useful in reliability analysis, survival analysis, and risk management, where interest often lies in the probability that outcomes exceed a threshold.
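+
+In a sample, \(L_0\) and \(U_0\) reduce to the empirical CDF and the empirical survival function. The base R sketch below checks both facts on simulated exponential lifetimes; `L0()` and `U0()` are illustrative vectorized helpers, not package functions.
+
+```r
+L0 <- function(t, x) vapply(t, function(b) mean(x <= b), numeric(1))  # empirical F(t)
+U0 <- function(t, x) vapply(t, function(b) mean(x > b),  numeric(1))  # empirical S(t)
+
+set.seed(10)
+x    <- rexp(250, rate = 0.5)   # simulated lifetimes
+grid <- seq(0, 10, by = 0.5)    # benchmarks t
+
+all.equal(L0(grid, x), ecdf(x)(grid))                         # L_0 is the ECDF
+all.equal(L0(grid, x) + U0(grid, x), rep(1, length(grid)))    # complementary masses
+```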
+
+---
+
+## Hazard Rates
+
+In survival analysis the **hazard rate** describes the instantaneous rate of failure at time \(t\), conditional on survival up to \(t\).
+
+For continuous distributions the hazard rate is defined as
+
+\[
+h(t) = \frac{f(t)}{S_X(t)}
+\]
+
+where \(f(t)\) is the probability density function.
+
+The density function can be written as the derivative of the cumulative distribution function:
+
+\[
+f(t) = \frac{d}{dt}F_X(t).
+\]
+
+Since
+
+\[
+F_X(t) = L_0(t;X),
+\]
+
+this implies
+
+\[
+f(t) = \frac{d}{dt}L_0(t;X).
+\]
+
+Thus the hazard rate becomes
+
+\[
+h(t) = \frac{f(t)}{U_0(t;X)}.
+\]
+
+This provides a directional interpretation of the hazard rate.
+
+The upper partial moment \(U_0(t;X)\) represents the probability mass that remains **above the benchmark \(t\)**.
+The hazard rate therefore measures the instantaneous **flow of probability mass across the benchmark** as \(t\) increases, from the upper directional region \(X > t\) into the lower region \(X \le t\), expressed relative to the mass still above the benchmark.
+
+For a nonnegative lifetime \(X\), the **cumulative hazard function** is
+
+\[
+H(t) = \int_0^t h(s)\,ds.
+\]
+
+Although hazard rates are typically introduced within survival analysis, they arise naturally within the directional framework once the survival function is recognized as an upper partial moment.
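+
+For an exponential lifetime the hazard is constant, which makes it a convenient test case for the directional formula. The base R sketch below uses an arbitrarily chosen rate, takes `pexp()` as \(L_0\) and its upper tail as \(U_0\), approximates the density by a central finite difference of \(L_0\), and also checks the standard identity \(H(t) = -\log S_X(t)\) for a continuous nonnegative lifetime.
+
+```r
+rate  <- 0.5                     # assumed exponential rate
+tgrid <- seq(0.5, 8, by = 0.5)   # benchmarks t
+eps   <- 1e-5                    # finite-difference step
+
+L0 <- function(s) pexp(s, rate)                      # L_0(s; X) = F_X(s)
+U0 <- function(s) pexp(s, rate, lower.tail = FALSE)  # U_0(s; X) = S_X(s)
+
+f_hat  <- (L0(tgrid + eps) - L0(tgrid - eps)) / (2 * eps)  # f(t) = d/dt L_0(t; X)
+hazard <- f_hat / U0(tgrid)                                # h(t) = f(t) / U_0(t; X)
+
+range(hazard)                              # flat at the rate, up to rounding
+all.equal(-log(U0(tgrid)), rate * tgrid)   # H(t) = -log S_X(t) = rate * t
+```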
+
+---
+
+## Quantile Functions
+
+The **quantile function** provides the inverse mapping of the cumulative distribution function.
+
+For \( p \in (0,1) \), the quantile is defined as
+
+\[
+Q(p) = \inf\{x : F_X(x) \ge p\}.
+\]
+
+Because
+
+\[
+F_X(t) = L_0(t;X),
+\]
+
+the quantile function identifies the benchmark \(t\) at which the degree-zero partial moment reaches probability level \(p\).
+
+Quantiles therefore correspond to **benchmarks that partition probability mass**.
+
+This interpretation aligns naturally with the directional framework, which evaluates distributions relative to benchmark thresholds.
+
+### Lower-Tail Thresholds as Degree-Zero Partial-Moment Quantiles
+
+A lower-tail threshold is often introduced in application-specific language, but within the directional framework it is simply a quantile of the degree-zero lower partial moment.
+
+Let
+\[
+F_X(t)=P(X\le t).
+\]
+By the result established earlier in this chapter,
+\[
+F_X(t)=L_0(t;X).
+\]
+Therefore the lower-tail quantile at probability level \(\alpha\) may be written as
+\[
+Q_X(\alpha)=\inf\{t\in\mathbb{R}:F_X(t)\ge \alpha\}
+=\inf\{t\in\mathbb{R}:L_0(t;X)\ge \alpha\}.
+\]
+
+This identity is general. It does not depend on whether \(X\) represents returns, forecast errors, waiting times, deviations from a quality target, or distances below a safety threshold. In every case, the degree-zero lower partial moment answers the same question: what proportion of observations fall below the benchmark \(t\)?
+
+In some fields, especially finance, the lower-tail quantile
+\[
+\inf\{t:F_X(t)\ge \alpha\}
+\]
+is called Value-at-Risk (usually reported with its sign flipped, as a positive loss). But the mathematical object is broader than that label. It is the smallest benchmark at or below which at least a fraction \(\alpha\) of the probability mass lies.
+
+This observation matters because it shows that threshold analysis is not an external application added onto the theory of distributions. It is already contained in the degree-zero directional representation of probability. The estimation-error literature makes the same point explicitly by identifying \(LPM_0\) with the cumulative distribution function and hence with the probability-of-loss object used in applied risk work.
+
+A second implication will become important later. Degree zero partitions observations by frequency alone. Higher degrees retain the same threshold logic while reweighting observations by the severity of their deviations from the benchmark. Thus quantile thinking extends naturally from event frequency to severity-weighted directional mass.
+
+**Proposition 3.1A.** For any random variable \(X\) and any \(\alpha\in(0,1)\),
+
+\[
+Q_X(\alpha)=\inf\{t:L_0(t;X)\ge \alpha\}.
+\]
+
+**Proof.** Since \(L_0(t;X)=F_X(t)\), the result follows directly from the definition of the lower quantile.
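+
+Proposition 3.1A translates directly into a search over candidate benchmarks. In the base R sketch below, `lpm0()` and `q_lpm()` are illustrative helpers. Because the empirical \(L_0\) only jumps at observed values, the infimum is attained at an order statistic, and the result should coincide with `quantile(type = 1)`, which R documents as the inverse of the empirical distribution function. The sample size 101 is chosen so that no \(n\alpha\) below is an integer.
+
+```r
+lpm0 <- function(t, x) mean(x <= t)   # degree-zero LPM = empirical CDF at t
+
+q_lpm <- function(alpha, x) {
+  cand <- sort(x)                                  # L_0 only jumps at observed values
+  F_at <- vapply(cand, lpm0, numeric(1), x = x)    # L_0 at each candidate benchmark
+  cand[which(F_at >= alpha)[1]]                    # smallest benchmark reaching alpha
+}
+
+set.seed(99)
+x      <- rnorm(101)
+alphas <- c(0.05, 0.5, 0.95)
+
+sapply(alphas, q_lpm, x = x)
+quantile(x, probs = alphas, type = 1, names = FALSE)  # inverse of the ECDF
+```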
+
+
+---
+
+## Probability Integral Transform
+
+If \(X\) has a continuous cumulative distribution function \(F_X\), then the transformed variable
+
+\[
+U = F_X(X)
+\]
+
+is uniformly distributed on \([0,1]\).
+
+Since
+
+\[
+F_X(t) = L_0(t;X),
+\]
+
+the probability integral transform can be written in directional form as
+
+\[
+U = L_0(X;X).
+\]
+
+Here the benchmark equals the realized observation. The operator therefore measures the probability that an **independent draw from the same distribution** does not exceed the observed value.
+
+The transformation maps observations into probability space and forms the foundation for many statistical procedures including simulation, copula modeling, and dependence analysis.
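+
+The sample analogue of \(U = L_0(X;X)\) evaluates the empirical degree-zero lower partial moment at each observation. In this base R sketch the lognormal data are simulated and chosen only because they are skewed and tie-free; the transformed values are the normalized ranks, so they sit exactly on the uniform grid \(\{1/n, \dots, 1\}\).
+
+```r
+set.seed(5)
+x <- rlnorm(200)   # any continuous (tie-free) sample
+
+# Benchmark equals the observation itself: U_i = L_0(x_i; X)
+u <- vapply(x, function(b) mean(x <= b), numeric(1))
+
+all.equal(u, rank(x) / length(x))       # U_i is the normalized rank
+all.equal(sort(u), seq_len(200) / 200)  # uniform on {1/n, ..., 1}
+```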
+
+---
+
+## Distribution Theory as Directional Measurement
+
+Classical probability theory typically introduces the cumulative distribution function as a primitive object.
+
+The directional framework reveals that the CDF arises from a simpler structure.
+
+It is simply the **degree-zero instance of the partial-moment operator**.
+
+Higher-order partial moments measure magnitudes of directional deviation, while the degree-zero case measures directional probability mass.
+
+Thus probability distribution functions and moment statistics emerge from the same underlying primitive.
+
+---
+
+## Structural Implications
+
+The results of this chapter establish three key points.
+
+1. The cumulative distribution function is the degree-zero lower partial moment.
+
+2. The survival function is the degree-zero upper partial moment.
+
+3. Quantile functions identify benchmarks that partition probability mass.
+
+Distribution theory therefore lies inside the same directional framework that generates moment statistics.
+
+The next chapter turns to **numerical integration via partial moments**. Chapter 5 then shows how **classical moments arise as signed combinations of partial moments**, further demonstrating the unifying role of directional statistics.
diff --git a/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd b/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd
new file mode 100644
index 0000000..6c57aab
--- /dev/null
+++ b/tools/NNS/book/chapter-04-numerical-integration-via-partial-moments.Rmd
@@ -0,0 +1,352 @@
+# Numerical Integration via Partial Moments
+
+Chapter 3 showed that the cumulative distribution function arises as the degree-zero partial moment.
+Probability mass itself can therefore be represented through the directional deviation operators introduced earlier.
+
+The same idea extends naturally to **numerical integration**.
+
+Many quantities in probability, statistics, and economics are defined as definite integrals.
+Expected values and risk measures both rely on integrating functions with respect to probability distributions.
+
+This chapter shows that partial moments provide a natural and flexible way to approximate such integrals.
+
+Rather than relying on classical quadrature formulas alone, we can represent integrals through expectations of directional deviations relative to benchmarks.
+
+---
+
+## Definite Integrals as Expectations
+
+Let \(X\) be a random variable with cumulative distribution function \(F_X\).
+
+For any measurable function \(g(x)\) with \(E[|g(X)|]<\infty\), the expectation of \(g(X)\) can be written as
+
+\[
+E[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x).
+\]
+
+When \(X\) has density \(f(x)\), this becomes
+
+\[
+E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx.
+\]
+
+Thus expectations are **definite integrals weighted by probability**.
+
+This representation allows integrals to be estimated directly from an independent sample \(x_1,\dots,x_n\) drawn from \(F_X\):
+
+\[
+E[g(X)] \approx \frac{1}{n}\sum_{i=1}^{n} g(x_i).
+\]
+
+In practice, many statistical quantities—including moments and risk measures—are simply special cases of this expectation integral.
+
+---
+
+## Integrals from Directional Deviations
+
+Consider the upper partial moment
+
+\[
+U_r(t;X) = E[(X-t)_+^r].
+\]
+
+When \(X\) has density \(f\), the definition of expectation gives
+
+\[
+U_r(t;X) = \int_{t}^{\infty} (x-t)^r f(x)\,dx.
+\]
+
+Similarly, the lower partial moment is
+
+\[
+L_r(t;X) = \int_{-\infty}^{t} (t-x)^r f(x)\,dx.
+\]
+
+Thus partial moments correspond directly to **definite integrals over directional regions** of the distribution.
+
+The integrand is the deviation magnitude relative to the benchmark \(t\).
+
+These integrals quantify how much probability mass lies above or below the benchmark and how far those observations lie from it.
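+
+The integral form can be evaluated directly with R's `integrate()`. The helpers `upm_int()` and `lpm_int()` are illustrative, and the standard normal density is an assumed example chosen because its partial moments have closed forms: \(U_1(0;X) = 1/\sqrt{2\pi}\), \(L_2(0;X) = 1/2\), and \(U_1(t;X) = \phi(t) - t\,(1-\Phi(t))\).
+
+```r
+# Partial moments as definite integrals over the directional regions
+upm_int <- function(r, target, dens)
+  integrate(function(x) (x - target)^r * dens(x), target, Inf)$value
+lpm_int <- function(r, target, dens)
+  integrate(function(x) (target - x)^r * dens(x), -Inf, target)$value
+
+upm_int(1, 0, dnorm)   # compare with 1 / sqrt(2 * pi)
+lpm_int(2, 0, dnorm)   # compare with 1/2
+upm_int(1, 1, dnorm)   # compare with dnorm(1) - pnorm(1, lower.tail = FALSE)
+```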
+
+---
+
+## Approximation via Sample Partial Moments
+
+Suppose we observe a sample \(x_1,\dots,x_n\).
+
+The partial moments can be estimated empirically:
+
+\[
+\hat{U}_r(t) = \frac{1}{n}\sum_{i=1}^{n} (x_i-t)_+^r
+\]
+
+\[
+\hat{L}_r(t) = \frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^r.
+\]
+
+These quantities approximate the integrals
+
+\[
+\int_{t}^{\infty} (x-t)^r f(x)\,dx
+\]
+
+and
+
+\[
+\int_{-\infty}^{t} (t-x)^r f(x)\,dx.
+\]
+
+Unlike classical quadrature rules that rely on fixed grid points, the empirical partial moments use the observed data directly.
+
+This approach provides a **data-adaptive integration scheme**.
+
+---
+
+## Example: Estimating Downside Risk
+
+To illustrate, suppose we observe the returns
+
+\[
+x = \{-4,-2,-1,1,3,5\}.
+\]
+
+Let the benchmark return be
+
+\[
+t = 0.
+\]
+
+The lower partial moment of degree 1 is
+
+\[
+L_1(0;X) = E[(0-X)_+].
+\]
+
+Compute the directional deviations:
+
+| \(x_i\) | \((0-x_i)_+\) |
+|---|---|
+| -4 | 4 |
+| -2 | 2 |
+| -1 | 1 |
+| 1 | 0 |
+| 3 | 0 |
+| 5 | 0 |
+
+The empirical estimate becomes
+
+\[
+\hat{L}_1(0) =
+\frac{4+2+1}{6}
+=
+\frac{7}{6}
+\approx 1.17.
+\]
+
+This quantity measures the **unconditional average shortfall below the benchmark** (i.e., averaged over all observations, including zeros above the benchmark).
+
+The calculation approximates the integral
+
+\[
+\int_{-\infty}^{0} (0-x)f(x)\,dx
+\]
+
+using only the observed sample.
+
+This is exactly the empirical estimator \(\hat{L}_r(t)\) from the previous section with \(r = 1\) and \(t = 0\).
+The example therefore illustrates how the general partial-moment estimator performs numerical integration directly from sample data.
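+
+The worked example takes a few lines of base R, with `pmax()` implementing the directional deviations directly. With the `NNS` package loaded, `LPM(1, 0, x)` from the notation bridge in Chapter 2 should return the same value, although this sketch does not depend on it.
+
+```r
+x <- c(-4, -2, -1, 1, 3, 5)
+target <- 0
+
+lpm1 <- mean(pmax(target - x, 0))   # (4 + 2 + 1) / 6 = 7/6
+upm1 <- mean(pmax(x - target, 0))   # (1 + 3 + 5) / 6 = 3/2
+
+lpm1          # prints 1.166667
+upm1 - lpm1   # 1/3, the sample mean, as in E[X] = U_1(0) - L_1(0)
+```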
+
+---
+
+## Relationship to Classical Quadrature
+
+Classical numerical integration methods approximate integrals using weighted sums of function values evaluated at predetermined nodes.
+
+Expectation integrals differ in an important way:
+
+\[
+E[g(X)] = \int g(x)\, dF_X(x).
+\]
+
+If \(X \sim \mathrm{Unif}(a,b)\), then
+
+\[
+E[f(X)] = \frac{1}{b-a}\int_a^b f(x)\,dx.
+\]
+
+So the interval integral is recovered by the explicit scale factor:
+
+\[
+\int_a^b f(x)\,dx = (b-a)E[f(X)].
+\]
+
+Using first-degree partial moments at benchmark \(t=0\) for \(Y=f(X)\),
+
+\[
+E[Y] = U_1(0;Y)-L_1(0;Y),
+\]
+
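hence, with \(\hat U_1\) and \(\hat L_1\) computed from \(y_i=f(x_i)\) for independent draws \(x_i\sim\mathrm{Unif}(a,b)\),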
+
+\[
+\int_a^b f(x)\,dx \approx (b-a)\left(\hat U_1(0;Y)-\hat L_1(0;Y)\right).
+\]
+
+For unsigned (total) area,
+
+\[
+\int_a^b |f(x)|\,dx \approx (b-a)\left(\hat U_1(0;Y)+\hat L_1(0;Y)\right).
+\]
+
+The key accuracy point is that the \((b-a)\) term multiplies the partial-moment estimate whenever the domain is \([a,b]\) rather than a unit-length interval.
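The signed and unsigned formulas can be checked with a small Monte Carlo sketch. It assumes independent uniform draws on \([a,b]\), uses the hypothetical helper name `pm_integral`, and uses the test integrand \(f(x)=x\) on \([-1,2]\). That integrand has exact signed area \(1.5\) and exact unsigned area \(0.5+2=2.5\). The seed and sample size are arbitrary.

```python
import random


def pm_integral(f, a, b, n=100_000, seed=1):
    """Estimate signed and unsigned integrals of f on [a, b] from uniform draws.

    Signed area:   (b - a) * (U1 - L1)
    Unsigned area: (b - a) * (U1 + L1)
    with first-degree partial moments of Y = f(X) taken at benchmark 0.
    """
    rng = random.Random(seed)
    ys = [f(rng.uniform(a, b)) for _ in range(n)]
    u1 = sum(max(y, 0.0) for y in ys) / n    # empirical U_1(0; Y)
    l1 = sum(max(-y, 0.0) for y in ys) / n   # empirical L_1(0; Y)
    width = b - a                            # the (b - a) scale factor
    return width * (u1 - l1), width * (u1 + l1)


signed, unsigned = pm_integral(lambda x: x, -1.0, 2.0)
print(round(signed, 3), round(unsigned, 3))
```

Omitting `width` would return estimates of \(E[Y]\) and \(E|Y|\) instead of areas, which is the scaling error the text warns about.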
+
Outside this uniform special case, the weighting measure is the distribution \(F_X\) itself, not the uniform measure.
+
+Empirical partial moments therefore approximate integrals using the observed data themselves as evaluation points. Regions with higher probability mass contribute more strongly to the approximation.
+
+In this sense, partial-moment integration is **distribution-adaptive**: the integration nodes are determined by the data rather than by a fixed grid.
+
+From a computational perspective, empirical partial moments are also simple to scale: for fixed \(r\) and benchmark \(t\), each estimate requires a single pass through the sample (\(O(n)\) operations). Classical quadrature can be very accurate for smooth low-dimensional integrands, but it depends on externally chosen nodes and weights and can become sensitive when mass is concentrated in tail regions. The partial-moment estimator trades closed-form quadrature weights for direct data weighting under \(F_X\), which is often numerically stable in statistical applications.
+
+---
+
+## Convergence Properties
+
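Suppose \(x_1,\dots,x_n\) are independent draws from the distribution of \(X\) and the partial moments of degree \(r\) are finite (for example, \(E|X|^r<\infty\)). Empirical expectations then converge to their population counterparts.

By the strong law of large numbers, almost surely,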
+
+\[
+\hat{U}_r(t) \rightarrow U_r(t;X)
+\]
+
+and
+
+\[
+\hat{L}_r(t) \rightarrow L_r(t;X)
+\]
+
+as \(n \to \infty\).
+
+Thus the empirical partial moments provide consistent estimators of the corresponding integrals.
+
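Because the integration nodes are the observed data themselves, the approximation improves automatically as the sample grows. When \(E|X|^{2r}<\infty\), the estimation error shrinks at the usual Monte Carlo rate of order \(n^{-1/2}\).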
+
+No externally chosen bandwidth or grid resolution is required.
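A quick simulation illustrates this consistency. The sketch assumes \(X\sim\mathrm{Exp}(1)\) and benchmark \(t=1\). That case is chosen only because the population value is known in closed form: by memorylessness, \(U_1(1;X)=P(X>1)\,E[X-1\mid X>1]=e^{-1}\cdot 1=e^{-1}\). The helper `upm_hat` is hypothetical, and the seed is arbitrary.

```python
import math
import random


def upm_hat(r, t, xs):
    """Empirical upper partial moment: mean of (x_i - t)_+^r."""
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


target = math.exp(-1)  # population U_1(1; X) for X ~ Exp(1)
rng = random.Random(2024)

for n in (100, 10_000, 200_000):
    draws = [rng.expovariate(1.0) for _ in range(n)]
    est = upm_hat(1, 1.0, draws)
    # The error typically shrinks roughly like 1/sqrt(n), though not monotonically.
    print(f"n={n:>7}  estimate={est:.4f}  error={abs(est - target):.4f}")
```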
+
+---
+
+## Applications
+
+The partial-moment representation of integrals has many applications.
+
+### Probability and Distribution Analysis
+
+Many distributional quantities can be written as integrals of deviation functions.
+
+Examples include:
+
+- unconditional partial-moment shortfall measures (distinct from conditional expected shortfall / CVaR)
+- tail risk measures
+- higher-order moments.
+
+### Risk Measurement
+
+In finance, downside risk measures often take the form
+
+\[
+E[(\tau-X)_+^r].
+\]
+
+These are precisely lower partial moments relative to a target return \(\tau\).
+
+Thus many risk measures are simply integrals of directional deviations.
+
+
+### Directional Probability Bounds
+
+Partial moments do more than approximate benchmark-relative integrals numerically. They also support conservative bounds on tail probabilities. This is important because threshold-based decisions often require guarantees that remain valid even when the underlying distribution is unknown or misspecified.
+
+Suppose \(g<\mu\) is a lower benchmark and the event of interest is
+\[
+X\le g.
+\]
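For distributions symmetric about \(\mu\), Chebyshev's inequality bounds this lower-tail probability using symmetric dispersion:
\[
P(X\le g)\le \frac{1}{2}\left(\frac{\sigma}{\mu-g}\right)^2.
\]
The factor \(\tfrac{1}{2}\) relies on symmetry, which splits the two-sided Chebyshev bound equally between the two tails. Without symmetry, the distribution-free one-sided version is Cantelli's inequality:
\[
P(X\le g)\le \frac{\sigma^2}{\sigma^2+(\mu-g)^2}.
\]
Both bounds depend only on the mean and variance, so neither distinguishes between upper and lower deviations.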
+
+A directional refinement replaces symmetric variance with semivariance:
+\[
+P(X\le g)\le \left(\frac{\sigma_-}{\mu-g}\right)^2,
+\]
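where \(\sigma_-^2=L_2(\mu;X)=E[(\mu-X)_+^2]\) is the semivariance, so \(\sigma_-\) measures dispersion only on the adverse side of the mean. The bound follows from Markov's inequality applied to \((\mu-X)_+^2\), so it holds for any distribution with finite semivariance. For a symmetric distribution, \(\sigma_-^2=\sigma^2/2\) and the bound reduces to the symmetric bound above. The estimation-error literature highlights the importance of this refinement through the Berck–Hihn result, which links semivariance directly to a strong boundary form of Chebyshev’s inequality.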
+
+A further generalization uses lower partial moments of degree \(\alpha\). Define
+\[
+\theta(t,\alpha)=\big(E[(t-X)_+^\alpha]\big)^{1/\alpha}.
+\]
Then, for \(\alpha>0\) and any \(g<t\), Markov's inequality applied to \((t-X)_+^\alpha\) gives
+\[
+P(X\le g)\le \left(\frac{\theta(t,\alpha)}{t-g}\right)^\alpha.
+\]
+The probability-bounds literature presents this as an Atwood-style lower-partial-moment inequality and interprets \(\theta(t,\alpha)\) as generalized downside dispersion.
+
+These bounds form a directional hierarchy:
+\[
+\text{symmetric variance} \to \text{directional second moment} \to \text{general directional degree } \alpha.
+\]
+The central theme is that tail-probability control need not be built from a separate theory. It can be generated from the same benchmark-relative operators that already define the directional framework.
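The empirical distribution of a sample is itself a probability distribution. Each bound must therefore hold exactly when every quantity in it is computed from the same sample.

The Python sketch below checks this on the returns from the downside-risk example. It uses a hypothetical threshold \(g=-3\), benchmark \(t=0\), and degree \(\alpha=1\). For comparison, it also includes Cantelli's inequality \(\sigma^2/(\sigma^2+(\mu-g)^2)\), the distribution-free one-sided Chebyshev bound. The helper `lpm` is an illustrative name, not a library function.

```python
def lpm(r, t, xs):
    """Empirical lower partial moment: mean of (t - x_i)_+^r."""
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


xs = [-4, -2, -1, 1, 3, 5]
g = -3.0
n = len(xs)
mu = sum(xs) / n
var = sum((x - mu) ** 2 for x in xs) / n    # population (1/n) variance
p_tail = sum(x <= g for x in xs) / n        # empirical P(X <= g)

cantelli = var / (var + (mu - g) ** 2)      # symmetric dispersion, no symmetry assumed
semivar = lpm(2, mu, xs) / (mu - g) ** 2    # semivariance bound (requires g < mu)

t, alpha = 0.0, 1.0
theta = lpm(alpha, t, xs) ** (1 / alpha)    # generalized downside dispersion
lpm_bound = (theta / (t - g)) ** alpha      # degree-alpha LPM bound (requires g < t)

print(p_tail, cantelli, semivar, lpm_bound)
```

On this sample, both directional bounds come out tighter than Cantelli's bound. All three remain well above the observed tail frequency of \(1/6\).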
+
+### Threshold Analysis and Directional Dispersion
+
+Probability bounds become especially meaningful when interpreted as threshold-analysis tools.
+
+In many applied settings, the analyst cares about whether a process falls below a critical level. Examples include:
+
- a forecast undershooting a service target,
- an inventory position dropping below a replenishment threshold,
- a reliability metric falling below a safety margin,
- or a return falling below an acceptable performance benchmark.
+
+In each case the relevant question is the same:
+\[
+P(X\le g)?
+\]
Classical methods answer this using symmetric dispersion summaries. The directional framework answers it by measuring deviations only on the relevant side of the benchmark. The semivariance bound, for example, is tighter than the symmetric bound whenever \(\sigma_-^2<\sigma^2/2\).
+
+The quantity
+\[
+L_\alpha(t;X)=E[(t-X)_+^\alpha]
+\]
+therefore serves three roles simultaneously:
+\[
+\text{directional integral},
+\quad
+\text{benchmark-relative dispersion summary},
+\quad
+\text{engine of a probability bound}.
+\]
+That multi-use structure is one of the framework’s main advantages. It reduces the gap between descriptive measurement, numerical integration, and decision support.
+
+The estimation-error literature places this in a broader historical context: semivariance and related partial moments are not ad hoc devices, but directional statistics with direct links to probability inequalities, utility-sensitive modeling, and nonparametric analysis.
+
+
+---
+
+## Summary
+
+This chapter established that partial moments naturally represent definite integrals over directional regions of a distribution.
+
+The key results are:
+
+1. Expectations are definite integrals with respect to probability distributions.
+2. Upper and lower partial moments correspond to integrals over directional deviation regions.
+3. Empirical partial moments provide data-adaptive approximations of these integrals.
4. Under independent sampling with finite moments, convergence follows from the law of large numbers.
+5. Many statistical and economic quantities—including risk measures—can be expressed using this framework.
+
+Partial moments therefore act as **numerical integrators that aggregate directional deviations relative to benchmarks**.
+
+The next chapter shows how **classical symmetric moments arise as signed combinations of partial moments**, completing the bridge between directional statistics and traditional moment analysis.
diff --git a/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd b/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd
new file mode 100644
index 0000000..ab4fb1a
--- /dev/null
+++ b/tools/NNS/book/chapter-05-classical-moments-as-directional-aggregates.Rmd
@@ -0,0 +1,314 @@
+# Classical Moments as Directional Aggregates
+
+Chapters 2–4 introduced directional deviation operators and the partial moments constructed from them.
+
+These operators measure deviations relative to a benchmark separately above and below the reference point.
+They therefore provide a directional description of distributional structure.
+
+This chapter shows that **classical symmetric moments arise as aggregations of these directional components**.
+
+Mean, variance, and higher-order moments do not introduce fundamentally new statistical objects.
+Instead, they emerge as signed combinations of partial moments.
+
+Once this relationship is recognized, classical moment theory can be interpreted as a special case of the directional framework.
+
+---
+
+## Moments Relative to a Benchmark
+
+In classical statistics, the \(r\)-th moment of a random variable \(X\) relative to a benchmark \(t\) is defined as
+
+\[
+E[(X-t)^r].
+\]
+
+This expression represents the \(r\)-th moment about the point \(t\).
+
+When the benchmark equals the mean \(t=\mu\), the quantity becomes the **\(r\)-th central moment**.
+Otherwise it represents a moment relative to an arbitrary reference point.
+
+Examples include
+
- \(r=1\): the mean offset \(E[X]-t\), which is zero when \(t=\mu\)
+- \(r=2\): variance when \(t=\mu\)
+- \(r=3\): skewness-related moment
+- \(r=4\): kurtosis-related moment
+
+These quantities summarize distributions by aggregating deviations around a reference point.
+
However, the deviation \(X-t\) merges upward and downward deviations into a single signed quantity.
+
+Directional statistics separates these components.
+
+---
+
+## Directional Moment Decomposition
+
+Recall the directional deviation operators
+
+\[
+(X-t)^+ = \max(X-t,0)
+\]
+
+\[
+(t-X)^+ = \max(t-X,0).
+\]
+
+These represent deviations above and below the benchmark.
+
+Raising these quantities to power \(r\) and taking expectations yields the partial moments
+
+\[
+U_r(t;X) = E[(X-t)_+^r]
+\]
+
+\[
+L_r(t;X) = E[(t-X)_+^r].
+\]
+
+Using the directional decomposition
+
+\[
+X-t = (X-t)^+ - (t-X)^+,
+\]
+
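Because \((X-t)^+\) and \((t-X)^+\) are never simultaneously nonzero, \((X-t)^r=[(X-t)^+]^r+(-1)^r[(t-X)^+]^r\) for every integer \(r\ge1\). Taking expectations gives the identity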
+
+\[
+E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X).
+\]
+
+Thus **every classical moment can be written as a signed combination of directional partial moments**.
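A direct numerical check of the identity for \(r=1,\dots,4\) is sketched below. The sample and benchmark are arbitrary, and `upm`, `lpm`, and `raw_moment` are hypothetical helper names. Each sample is treated as a discrete distribution with weight \(1/n\) per observation.

```python
def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


def raw_moment(r, t, xs):
    """Classical moment about t: mean of (x_i - t)^r."""
    return sum((x - t) ** r for x in xs) / len(xs)


xs = [2.3, -0.7, 1.1, -3.4, 0.0, 4.8]
t = 0.5
for r in (1, 2, 3, 4):
    lhs = raw_moment(r, t, xs)
    rhs = upm(r, t, xs) + (-1) ** r * lpm(r, t, xs)  # signed directional combination
    print(r, lhs, rhs)
```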
+
+---
+
+## Mean
+
+Setting \(r=1\) and \(t=0\) yields
+
+\[
+E[X] = U_1(0;X) - L_1(0;X).
+\]
+
+The mean can therefore be interpreted as the difference between
+
+- average upward deviations from the benchmark
+- average downward deviations from the benchmark.
+
+If the benchmark is chosen as the mean itself, \(t=\mu\), then
+
+\[
+E[X-\mu] = U_1(\mu;X) - L_1(\mu;X).
+\]
+
+But by definition \(E[X-\mu]=0\).
+Therefore
+
+\[
+U_1(\mu;X)=L_1(\mu;X).
+\]
+
More generally, \(U_1(t;X)-L_1(t;X)=E[X]-t\) for any benchmark \(t\), so this equality holds **only when the benchmark equals the mean**.
For every other benchmark, upward and downward deviations do not balance.
+
+Thus the classical property that deviations around the mean sum to zero has a natural directional interpretation.
+
+---
+
+## Variance
+
+Variance is defined as
+
+\[
+Var(X) = E[(X-\mu)^2].
+\]
+
+Applying the directional decomposition with \(t=\mu\) yields
+
+\[
+Var(X) = U_2(\mu;X) + L_2(\mu;X).
+\]
+
+As in Chapter 2, this is a population identity. When verifying numerically in R, use `UPM(2, mean(x), x) + LPM(2, mean(x), x)` for population variance, and multiply by \(n/(n-1)\) to match `var(x)`.
+
+This equality is exact because both terms are computed around the same global mean \(\mu\). It should not be confused with averaging conditional subgroup variances, which omits a nonnegative between-group term unless explicitly added.
+
+Variance therefore equals the sum of two directional components:
+
+- upward deviation relative to the mean
+- downward deviation relative to the mean.
+
+The classical statistic reports only their total magnitude.
+
+Two distributions may share identical variance while exhibiting very different directional structures.
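The population identity and the \(n/(n-1)\) adjustment can also be checked outside R. In this Python sketch, `statistics.pvariance` (divisor \(n\)) and `statistics.variance` (divisor \(n-1\)) play the roles of population variance and R's `var()`. The helpers `upm` and `lpm` are hypothetical, and the data are made up.

```python
import statistics


def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


xs = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0]
n = len(xs)
mu = sum(xs) / n

pop_var = upm(2, mu, xs) + lpm(2, mu, xs)   # population (1/n) variance
sample_var = pop_var * n / (n - 1)          # (n - 1) convention, as in R's var()

print(pop_var, statistics.pvariance(xs))
print(sample_var, statistics.variance(xs))
```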
+
+---
+
+## Higher-Order Moments
+
+Higher-order moments follow the same decomposition.
+
+### Third Moment
+
+\[
+E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X).
+\]
+
+This moment measures directional asymmetry.
+
+### Fourth Moment
+
+\[
+E[(X-\mu)^4] = U_4(\mu;X) + L_4(\mu;X).
+\]
+
+This moment reflects the magnitude of tail deviations regardless of direction.
+
+In each case, classical moments aggregate directional components into a single statistic.
+
+---
+
+## Standardized Skewness and Kurtosis
+
+In practice, third and fourth moments are normalized by variance to produce dimensionless statistics.
+
+Skewness is defined as
+
+\[
+Skew(X) =
+\frac{E[(X-\mu)^3]}{Var(X)^{3/2}}.
+\]
+
+Using the directional representation,
+
+\[
+Skew(X) =
+\frac{U_3(\mu;X) - L_3(\mu;X)}
+{(U_2(\mu;X)+L_2(\mu;X))^{3/2}}.
+\]
+
A useful intuition follows immediately from this expression.
Suppose large positive deviations from the mean outweigh large negative ones, as typically happens when the right tail is longer than the left. Then

\[
U_3(\mu;X) > L_3(\mu;X),
\]

producing positive skewness.
+
+Similarly, kurtosis is
+
+\[
+Kurt(X) =
+\frac{E[(X-\mu)^4]}{Var(X)^2}.
+\]
+
+Substituting the directional components gives
+
+\[
+Kurt(X) =
+\frac{U_4(\mu;X)+L_4(\mu;X)}
+{(U_2(\mu;X)+L_2(\mu;X))^2}.
+\]
+
+In finite samples, estimates based on third and fourth moments can be unstable because extreme observations are raised to high powers. The directional decomposition still applies, but empirical interpretation should account for this sensitivity.
+
+Thus the familiar standardized statistics also arise directly from directional partial moments.
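The sketch below computes both standardized statistics from partial moments and compares them with the direct moment-ratio formulas. These are population versions with no small-sample bias adjustment, and the kurtosis is not excess kurtosis. The sample is made up, with one large value on the right, and `upm` and `lpm` are hypothetical helpers.

```python
def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


xs = [1.0, 1.5, 2.0, 2.0, 3.0, 9.0]
n = len(xs)
mu = sum(xs) / n

# Standardized statistics built from directional components.
var_pm = upm(2, mu, xs) + lpm(2, mu, xs)
skew_pm = (upm(3, mu, xs) - lpm(3, mu, xs)) / var_pm ** 1.5
kurt_pm = (upm(4, mu, xs) + lpm(4, mu, xs)) / var_pm ** 2

# Direct central-moment formulas for comparison.
m2, m3, m4 = (sum((x - mu) ** k for x in xs) / n for k in (2, 3, 4))
skew_direct = m3 / m2 ** 1.5
kurt_direct = m4 / m2 ** 2

print(skew_pm, skew_direct)
print(kurt_pm, kurt_direct)
```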
+
+---
+
+## Information Loss in Symmetric Aggregation
+
+The mapping from partial moments to classical moments is **many-to-one**.
+
+Directional components determine the symmetric moment uniquely.
+
+However the symmetric moment does not determine the directional components.
+
+For example,
+
+\[
+Var(X) = U_2(\mu;X) + L_2(\mu;X)
+\]
+
+does not reveal how the total variance is distributed between the two sides of the distribution.
+
Consider two distributions with the same variance:

| Distribution | \(U_2(\mu;X)\) | \(L_2(\mu;X)\) | \(Var(X)=U_2+L_2\) |
|---|---:|---:|---:|
| A | 8 | 2 | 10 |
| B | 5 | 5 | 10 |

Both produce

\[
Var(X)=10,
\]

yet their directional risk structures are clearly different.

A useful edge case is degenerate support. If all probability mass is concentrated at a single point, then both directional components are zero, \(U_2(\mu;X)=L_2(\mu;X)=0\), so variance is exactly zero. Conversely, any nondegenerate distribution has both components strictly positive. Because \(U_1(\mu;X)=L_1(\mu;X)\), probability mass away from the mean on one side forces mass on the other side. This is why neither entry in the table above is zero, and it confirms that the decomposition remains valid at the boundary.
+
+Symmetric moments therefore represent **projections of directional structure**.
+Once aggregated, the original directional information cannot generally be recovered.
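A concrete pair of discrete distributions makes this many-to-one mapping tangible. The made-up values below give both distributions variance 4 rather than 10. Distribution A, which has a long right tail, splits that variance as \(3.2+0.8\), and the symmetric distribution B splits it as \(2+2\). The helper names are hypothetical.

```python
def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


def variance_split(xs):
    """Return (U_2, L_2) about the mean of xs, treated as a discrete distribution."""
    mu = sum(xs) / len(xs)
    return upm(2, mu, xs), lpm(2, mu, xs)


dist_a = [-1.0, -1.0, -1.0, -1.0, 4.0]   # long right tail, mean 0
dist_b = [-2.0, 2.0]                      # symmetric, mean 0

for name, xs in (("A", dist_a), ("B", dist_b)):
    u2, l2 = variance_split(xs)
    print(name, u2, l2, u2 + l2)          # same total, different split
```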
+
+---
+
+## Measure-Theoretic Interpretation
+
+The directional decomposition follows naturally from the partition of the sample space induced by the benchmark \(t\):
+
+- \(X>t\)
+- \(X\le t\)
+
When \(X\) has a density \(f\), expectation integrals can therefore be written as
+
+\[
+E[(X-t)^r]
+=
+\int_{x>t}(x-t)^r f(x)\,dx
++
+\int_{x\le t}(x-t)^r f(x)\,dx.
+\]
+
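The first integral is \(U_r(t;X)\). The second equals \((-1)^r L_r(t;X)\), because \((x-t)^r=(-1)^r(t-x)^r\). Together they reproduce the signed combination derived earlier in this chapter.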
+
+This representation also clarifies the variance example from the previous section.
+Two distributions may share the same variance while producing different values of
+
+\[
+U_2(\mu;X)
+=
+\int_{x>\mu}(x-\mu)^2 f(x)\,dx.
+\]
+
+Distribution A places most of its squared deviations in the region \(x>\mu\), producing a large value of \(U_2\).
+Distribution B distributes deviations more evenly across the two regions.
+
+Although the directional integrals differ, their sum
+
+\[
+U_2(\mu;X)+L_2(\mu;X)
+\]
+
+can still produce the same total variance.
+
+Thus the measure-theoretic decomposition explains precisely how symmetric aggregation hides directional structure.
+
+---
+
+## Implications
+
+The results of this chapter show that classical moment statistics arise from directional components rather than the other way around.
+
+Partial moments therefore provide a structural foundation from which several familiar constructs emerge:
+
+- probability distributions (degree \(r=0\)),
+- classical moments (degrees \(r\ge1\)),
+- standardized measures such as skewness and kurtosis.
+
+Seen from this perspective, symmetric statistics summarize directional information that is already present in the distribution.
+
+Chapter 6 develops the measure-theoretic foundation for this framework, and Part II extends the analysis to descriptive statistics derived from the directional perspective introduced here.
diff --git a/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd b/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd
new file mode 100644
index 0000000..882c919
--- /dev/null
+++ b/tools/NNS/book/chapter-06-measure-theoretic-interpretation.Rmd
@@ -0,0 +1,345 @@
+# Measure-Theoretic Interpretation
+
+Chapters 2–5 developed the directional framework from an algebraic perspective.
+Directional deviation operators were introduced, partial moments were defined, and classical statistics was shown to arise as an aggregation of directional components.
+
+This chapter places that framework inside **measure-theoretic probability**.
+
+The goal is not to introduce new probability axioms.
+Instead, we show that directional deviation operators align naturally with the core structures of probability theory:
+
+- measurable functions
+- partitions of the sample space
+- positive and negative function decompositions
+- Lebesgue integration
+
+Viewed from this perspective, partial moments are not merely convenient statistics.
+They represent a **canonical measurable refinement** of symmetric statistical quantities.
+
+For reference, the key assumptions used throughout this chapter are:
+
+- \(X\) is measurable on \((\Omega,\mathcal{F},P)\),
+- the benchmark \(t\) is fixed (or measurable when data-dependent benchmarks are considered),
+- the relevant moments exist (e.g., \(X\in L^r\) for degree \(r\)).
+
+---
+
+## Probability Spaces
+
+Let
+
+\[
+(\Omega,\mathcal{F},P)
+\]
+
+denote a probability space where
+
+- \(\Omega\) is the sample space,
+- \(\mathcal{F}\) is a σ-algebra of measurable events,
+- \(P\) is a probability measure.
+
+A real-valued random variable is a measurable function
+
+\[
+X:\Omega \rightarrow \mathbb{R}.
+\]
+
+The cumulative distribution function of \(X\) is
+
+\[
+F_X(t) = P(X \le t).
+\]
+
+Expectations of measurable functions are defined through the Lebesgue integral
+
+\[
+E[g(X)] = \int_{\Omega} g(X(\omega))\, dP(\omega).
+\]
+
+This integral provides the foundation for statistical quantities such as moments, expectations, and risk measures.
+
+Directional statistics operates within exactly the same framework.
+As we will see later in this chapter, the directional deviation operators introduced earlier align naturally with the positive and negative function decompositions used in Lebesgue integration.
+
+---
+
+## Benchmark-Induced Partitions
+
+Let \(t \in \mathbb{R}\) be a benchmark.
+
+The benchmark induces a natural measurable partition of the sample space:
+
+\[
+\Omega =
+\{\omega : X(\omega) \le t\}
+\cup
+\{\omega : X(\omega) > t\}.
+\]
+
+Equivalently, the real line is partitioned into two regions:
+
+- the **lower region** \(X \le t\)
+- the **upper region** \(X > t\)
+
+This partition plays a central role in probability theory.
+The cumulative distribution function itself is defined through it:
+
+\[
+F_X(t) = P(X \le t).
+\]
+
+Directional statistics extends this same partition structure to **magnitudes of deviation**.
+
+---
+
+## Positive and Negative Function Decomposition
+
+Measure theory frequently decomposes functions into positive and negative parts.
+
For any real-valued measurable function \(f\),
+
+\[
+f = f^{+} - f^{-}
+\]
+
+where
+
+\[
+f^{+} = \max(f,0), \qquad f^{-} = \max(-f,0).
+\]
+
+Both components are nonnegative measurable functions.
+
Whenever at least one of \(\int f^{+}\,d\mu\) and \(\int f^{-}\,d\mu\) is finite, the Lebesgue integral is defined by
+
+\[
+\int f\, d\mu =
+\int f^{+} d\mu
+-
+\int f^{-} d\mu.
+\]
+
+This decomposition ensures that integrals of arbitrary measurable functions can be constructed from integrals of nonnegative functions.
+
+Directional deviation operators follow exactly the same structure.
+
+Let
+
+\[
+f(X) = X - t.
+\]
+
+Then
+
+\[
+(X-t)^+ = \max(X-t,0)
+\]
+
+\[
+(t-X)^+ = \max(t-X,0).
+\]
+
+Thus
+
+\[
+X-t = (X-t)^+ - (t-X)^+.
+\]
+
+Directional deviations therefore correspond directly to the **positive and negative parts of the deviation function**.
+
+---
+
+## Partial Moments as Measurable Integrals
+
+Partial moments are expectations of these nonnegative measurable functions.
+
+For integer \(r \ge 0\),
+
+\[
+U_r(t;X) = E[(X-t)_+^r]
+\]
+
+\[
+L_r(t;X) = E[(t-X)_+^r].
+\]
+
+Assuming \(X \in L^r\) so that these expectations exist, partial moments are simply expectations of nonnegative measurable functions.
+
+Using the definition of expectation,
+
+\[
+U_r(t;X)
+=
+\int_{\Omega} (X(\omega)-t)_+^r \, dP(\omega)
+\]
+
+\[
+L_r(t;X)
+=
+\int_{\Omega} (t-X(\omega))_+^r \, dP(\omega).
+\]
+
+These integrals can be written explicitly over the benchmark partition:
+
+\[
+U_r(t;X)
+=
+\int_{X>t} (X-t)^r \, dP
+\]
+
+\[
+L_r(t;X)
+=
+\int_{X\le t} (t-X)^r \, dP.
+\]
+
+Thus partial moments are **Lebesgue integrals evaluated over measurable directional regions**.
+
+The benchmark \(t\) defines the partition, and the integrand measures deviation magnitude within each region.
+
+---
+
+## Recovery of Classical Moments
+
+From the positive–negative decomposition of deviations introduced in Section 6.3,
+
+\[
+X-t = (X-t)^+ - (t-X)^+,
+\]
+
+raising both sides to power \(r\) and integrating yields
+
+\[
+E[(X-t)^r] = U_r(t;X) + (-1)^r L_r(t;X).
+\]
+
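For integer \(r\ge1\), this identity holds whenever \(X\in L^r\). It follows because the two parts are never simultaneously nonzero, so no cross terms appear when their difference is raised to the power \(r\).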
+
+Classical symmetric moments therefore arise as **signed combinations of two directional integrals**.
+
+For example:
+
+### Mean
+
+\[
+E[X] = U_1(0;X) - L_1(0;X).
+\]
+
+### Variance
+
+\[
+Var(X) = U_2(\mu;X) + L_2(\mu;X).
+\]
+
+### Third Moment
+
+\[
+E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X).
+\]
+
+The symmetric moment is therefore an **aggregation operator applied to directional components**.
+
+---
+
+## Canonical Refinement of Symmetric Moments
+
+The mapping
+
+\[
+(U_r,L_r) \rightarrow E[(X-t)^r]
+\]
+
+is **many-to-one**.
+
+Directional components uniquely determine the symmetric moment, but the symmetric moment cannot generally recover the directional components.
+
+This implies a strict information hierarchy:
+
+\[
+(U_r,L_r) \quad \text{contains more information than} \quad E[(X-t)^r].
+\]
+
+In measure-theoretic terms, the directional decomposition represents a **refinement of the measurable structure** induced by symmetric aggregation.
+
+The symmetric moment collapses two measurable integrals into a single value.
+
+Directional moments preserve the contributions of each measurable region.
+
+---
+
+## Alignment with Probability Partitions
+
+Probability itself is defined through partitions.
+
+For any event \(A\),
+
+\[
+P(A) + P(A^c) = 1.
+\]
+
+Similarly, the cumulative distribution function partitions probability mass relative to a threshold \(t\):
+
+\[
+F_X(t) = P(X \le t)
+\]
+
+\[
+1 - F_X(t) = P(X > t).
+\]
+
+Directional deviation operators extend this same partition structure.
+
+Instead of measuring only probability mass in each region, they measure **magnitudes of deviation within those regions**.
+
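The degree-zero case recovers probability. This requires reading the degree-zero integrand as the indicator of its region in the integrals over \(X>t\) and \(X\le t\) given earlier: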
+
+\[
+L_0(t;X) = P(X \le t)
+\]
+
+\[
+U_0(t;X) = P(X > t).
+\]
+
+Higher degrees measure deviation magnitude within the same partition.
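In code, the degree-zero convention has to be made explicit. Because `0.0 ** 0` evaluates to `1.0` in Python, a naive `max(t - x, 0.0) ** 0` would return 1 for every observation. The sketch below therefore codes the degree-zero case directly with region indicators and counts a boundary point \(x=t\) in the lower region. The helper names `lpm0` and `upm0` are hypothetical.

```python
def lpm0(t, xs):
    """Degree-zero LPM: share of observations with x <= t (the empirical CDF)."""
    return sum(1 for x in xs if x <= t) / len(xs)


def upm0(t, xs):
    """Degree-zero UPM: share of observations with x > t."""
    return sum(1 for x in xs if x > t) / len(xs)


xs = [-2, -1, 0, 3, 5]
for t in (-1, 1, 5):
    print(t, lpm0(t, xs), upm0(t, xs))   # the two shares always sum to 1
```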
+
+---
+
+## Structural Interpretation
+
+From the measure-theoretic perspective, the directional framework reflects a deeper structural fact about probability.
+
+Every deviation relative to a benchmark induces a natural measurable partition of the sample space.
+Lebesgue integration aggregates contributions across that partition.
+
+Directional partial moments simply retain the integrals over each region separately.
+
+Symmetric statistics combine those integrals into a single value.
+
+Thus the directional representation does not introduce new probability objects.
+It reveals the **underlying structure that symmetric statistics aggregate away**.
+
+---
+
+## Implications
+
+The measure-theoretic interpretation clarifies the role of partial moments in statistical theory.
+In particular, it shows that the directional framework is not a new probabilistic system but a refinement of the standard measure-theoretic structure already used throughout statistics.
+
+1. Directional deviation operators correspond to positive and negative function decompositions.
+
+2. Partial moments are Lebesgue integrals over measurable directional regions.
+
+3. Classical symmetric moments are aggregations of those integrals.
+
+4. Directional moments therefore preserve strictly more structural information about distributions.
+
+These structural properties explain why directional statistics can support the applied methods developed in the remainder of the book: the framework preserves the same probability foundations while retaining directional information that symmetric statistics discard.
+
+Operationally, this means the directional framework can be used without replacing standard probabilistic machinery. One can work with the same probability space, the same measurable-function toolkit, and the same integration rules, while reporting richer benchmark-relative diagnostics for risk, asymmetry, and tail behavior.
+
+---
+
+The next part of the book turns from theoretical foundations to **descriptive statistics derived from directional partial moments**.
+
+Part II develops descriptive statistics that retain the directional information preserved by this refined measurable structure. Rather than collapsing deviations into symmetric aggregates, these measures describe distributions in terms of their directional behavior relative to meaningful benchmarks.
diff --git a/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd b/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd
new file mode 100644
index 0000000..5dfba86
--- /dev/null
+++ b/tools/NNS/book/chapter-07-directional-descriptive-statistics.Rmd
@@ -0,0 +1,331 @@
+# Directional Descriptive Statistics
+
+Chapters 2–6 established the theoretical foundations of directional statistics.
+
+Directional deviation operators were introduced, partial moments were defined, classical moments were derived as aggregations of directional components, and the framework was shown to align naturally with measure-theoretic probability. These results demonstrated that many familiar statistical quantities arise from the same primitive structure: **directional deviations relative to a benchmark**.
+
+With the theoretical foundation in place, we now turn to **descriptive statistics**.
+
+Classical descriptive statistics summarize distributions using symmetric aggregates such as the mean, variance, skewness, and kurtosis. While these quantities are useful, they obscure directional structure because they combine positive and negative deviations into a single measure.
+
+Directional descriptive statistics retain the information that symmetric statistics discard. Rather than collapsing deviations into aggregates, they describe distributions in terms of **directional behavior relative to benchmarks**.
+
+---
+
+## Directional Mean Interpretation
+
+Recall from Chapter 5 that the mean can be expressed as the difference between directional partial moments:
+
+\[
+E[X] = U_1(0;X) - L_1(0;X).
+\]
+
+This identity shows that the mean is not a primitive quantity. It is the **net directional deviation relative to the benchmark \(t = 0\)**.
+
+More generally, for any benchmark \(t\),
+
+\[
+E[X - t] = U_1(t;X) - L_1(t;X).
+\]
+
+Thus the expectation of deviations relative to a benchmark equals the difference between
+
+- deviations **above the benchmark**, and
+- deviations **below the benchmark**.
+
+If \(t = \mu\), then
+
+\[
+U_1(\mu;X) = L_1(\mu;X).
+\]
+
+In words, the mean is the point at which **expected upward and downward deviations balance**.
+
+Directional statistics therefore interprets the mean not simply as a central location but as a **balance point between directional deviations**.
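The balance-point property can be seen numerically. The Python sketch below sweeps the benchmark over a made-up sample with mean 3. It shows that \(U_1(t;X)-L_1(t;X)\) tracks \(\mu-t\) and vanishes only at \(t=\mu\). The helpers `upm` and `lpm` are hypothetical names.

```python
def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


xs = [1.0, 2.0, 2.0, 3.0, 7.0]
mu = sum(xs) / len(xs)

for t in (0.0, 2.0, mu, 5.0):
    gap = upm(1, t, xs) - lpm(1, t, xs)   # net directional deviation at benchmark t
    print(t, gap, mu - t)
```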
+
+---
+
+## Directional Variance Decomposition
+
+Variance also has a natural directional interpretation.
+
+Chapter 5 showed that
+
+\[
+Var(X) = U_2(\mu;X) + L_2(\mu;X).
+\]
+
This decomposition is exact for population variance relative to the global mean \(\mu\). It is neither an approximation nor a conditional-variance identity. In implementation checks, remember that `var(x)` in R is the sample variance, so matching it requires multiplying `UPM(2, mean(x), x) + LPM(2, mean(x), x)` by \(n/(n-1)\).
+
+To avoid a common confusion, compare with the law of total variance for the split \(X\ge \mu\) versus \(X<\mu\):
+
+\[
+Var(X)=p\,Var(X\mid X\ge \mu)+(1-p)\,Var(X\mid X<\mu)+p(1-p)(\mu_{\ge}-\mu_{<})^2,
+\]
+
+where \(p=P(X\ge\mu)\), \(\mu_{\ge}=E[X\mid X\ge\mu]\), and \(\mu_{<}=E[X\mid X<\mu]\). Hence
+
+\[
+Var(X)\ge p\,Var(X\mid X\ge \mu)+(1-p)\,Var(X\mid X<\mu),
+\]
+
+because the between-group term is nonnegative. By contrast, partial moments already account for total variance around the **same global center** \(\mu\), so no extra between-group correction is missing:
+
+\[
+Var(X)=U_2(\mu;X)+L_2(\mu;X).
+\]
+
+Equivalently, \(L_2(\mu;X)\) is the (global-mean) downside semivariance and \(U_2(\mu;X)\) is the corresponding upside semivariance.
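The contrast between the two decompositions can be verified on a small made-up sample treated as a discrete distribution, using population (\(1/n\)) variances throughout. The within-group term alone falls short of the total variance. Adding the between-group term closes the gap, while \(U_2+L_2\) matches the total directly. All helper names are hypothetical.

```python
def upm(r, t, xs):
    return sum(max(x - t, 0.0) ** r for x in xs) / len(xs)


def lpm(r, t, xs):
    return sum(max(t - x, 0.0) ** r for x in xs) / len(xs)


def pvar(v):
    """Population variance of a list."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


xs = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.5, 6.0]
n = len(xs)
mu = sum(xs) / n

upper = [x for x in xs if x >= mu]
lower = [x for x in xs if x < mu]
p = len(upper) / n
m_up, m_lo = sum(upper) / len(upper), sum(lower) / len(lower)

within = p * pvar(upper) + (1 - p) * pvar(lower)   # average conditional variance
between = p * (1 - p) * (m_up - m_lo) ** 2          # nonnegative between-group term
total = pvar(xs)
pm_total = upm(2, mu, xs) + lpm(2, mu, xs)          # partial moments at the global mean

print(total, within + between, pm_total, within)
```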
+
+Variance therefore consists of two directional components:
+
+- **upside variance**: \(U_2(\mu;X)\)
+- **downside variance**: \(L_2(\mu;X)\)
+
+Classical statistics reports only their sum.
+
+Directional descriptive statistics retain both quantities separately.
+
+This decomposition provides immediate insight into distributional structure.
+
+For example, two assets may share identical variance but differ dramatically in directional risk:
+
| Distribution | \(U_2(\mu;X)\) | \(L_2(\mu;X)\) | Variance |
|---|---|---|---|
| A | 8 | 2 | 10 |
| B | 5 | 5 | 10 |
+
+Variance alone cannot distinguish these cases.
+
+Directional variance reveals whether volatility arises primarily from **upside movements** or **downside movements**.
+
+This distinction is particularly important in finance, economics, and risk management where negative deviations are often evaluated differently than positive ones (a topic developed more fully in Part VIII).
+
+---
+
+## Benchmark-Relative Descriptive Statistics
+
+A key advantage of partial moments is that the benchmark \(t\) can be chosen externally.
+
+Classical descriptive statistics typically use internally determined reference points such as the mean or median. Directional statistics allows the analyst to describe distributions relative to **meaningful benchmarks**.
+
+Examples include
+
+- required returns in finance
+- policy thresholds in economics
+- forecast targets in operations
+- safety limits in engineering
+
+Suppose \(t\) represents a target value.
+
+Then the first-degree partial moments describe benchmark-relative behavior:
+
+\[
+U_1(t;X) = E[(X-t)_+]
+\]
+
+\[
+L_1(t;X) = E[(t-X)_+].
+\]
+
+These quantities measure
+
+- the **unconditional average excess above the benchmark**, and
+- the **unconditional average shortfall below the benchmark**.¹
+
+Unlike symmetric statistics, these measures directly reflect the context in which outcomes are evaluated.
+
+To make this concrete, consider the sample
+
+\[
+x=\{-2,-1,0,3,5\}
+\]
+
+with benchmark \(t=1\). Then
+
+\[
+\hat{L}_1(1)=\frac{1}{5}(3+2+1+0+0)=1.2,
+\quad
+\hat{U}_1(1)=\frac{1}{5}(0+0+0+2+4)=1.2.
+\]
+
Here the unconditional average shortfall below the benchmark equals the unconditional average excess above it, even though frequencies differ: three observations fall below \(t\) and two exceed it. The balance is no accident. The sample mean is \((-2-1+0+3+5)/5=1=t\), and first-degree partial moments always balance at the mean. This illustrates how benchmark-relative directional moments separate **how often** outcomes fall on each side from **how far** they lie from the benchmark.
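The following Python lines reproduce this example. They also print the frequencies on each side and the sample mean, which here equals the benchmark.

```python
xs = [-2, -1, 0, 3, 5]
t = 1
n = len(xs)

l1 = sum(max(t - x, 0) for x in xs) / n   # unconditional average shortfall
u1 = sum(max(x - t, 0) for x in xs) / n   # unconditional average excess
n_below = sum(x < t for x in xs)          # how often outcomes fall short
n_above = sum(x > t for x in xs)          # how often outcomes exceed

print(l1, u1, n_below, n_above, sum(xs) / n)
```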
+
+---
+
+¹ These are unconditional averages over the full sample/population, not conditional expectations (e.g., not CVaR-style conditioning on tail events only). Because these quantities are expectations of deviations, they can be influenced by extreme observations within each region of the distribution. This tail sensitivity becomes particularly relevant when analyzing heavy-tailed distributions, as discussed later in Section 7.5.
+
+---
+
+## Directional Skewness
+
+Skewness measures asymmetry in distributions.
+
+The classical skewness coefficient is
+
+\[
+Skew(X)=
+\frac{E[(X-\mu)^3]}{Var(X)^{3/2}}.
+\]
+
+Using the directional decomposition,
+
+\[
+E[(X-\mu)^3] = U_3(\mu;X) - L_3(\mu;X).
+\]
+
+Thus skewness can be written as
+
+\[
+Skew(X)=
+\frac{U_3(\mu;X)-L_3(\mu;X)}
+{(U_2(\mu;X)+L_2(\mu;X))^{3/2}}.
+\]
+
+This expression provides a clear interpretation.
+
+- If \(U_3(\mu;X) > L_3(\mu;X)\), large positive deviations dominate and skewness is positive.
+- If \(L_3(\mu;X) > U_3(\mu;X)\), large negative deviations dominate and skewness is negative.
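+
+A quick numerical check of the directional skewness formula, assuming all moments are taken about the sample mean with a \(1/n\) denominator so that both expressions share the same normalization. The exponential sample is an arbitrary right-skewed choice, and `upper()` and `lower()` are hypothetical helpers.
+
+```{r ch7-skewness-decomposition}
+# Sketch: classical skewness versus (U_3 - L_3) / (U_2 + L_2)^(3/2),
+# all moments about the sample mean with a 1/n denominator.
+set.seed(42)
+x  <- rexp(1000)   # illustrative right-skewed sample
+mu <- mean(x)
+
+upper <- function(r) mean(pmax(x - mu, 0)^r)  # U_r(mu)
+lower <- function(r) mean(pmax(mu - x, 0)^r)  # L_r(mu)
+
+skew_classical   <- mean((x - mu)^3) / mean((x - mu)^2)^(3 / 2)
+skew_directional <- (upper(3) - lower(3)) / (upper(2) + lower(2))^(3 / 2)
+
+c(classical = skew_classical, directional = skew_directional,
+  U3 = upper(3), L3 = lower(3))
+```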
+
+In applied settings, **whether extreme outcomes occur on the upside or the downside is often more decision-relevant than the overall asymmetry coefficient alone**.
+
+For example, financial return distributions with positive skewness frequently reflect patterns of frequent small losses punctuated by occasional large gains. In directional terms this corresponds to
+
+\[
+U_3(\mu;X) \gg L_3(\mu;X).
+\]
+
+Conversely, strategies that produce steady small gains but occasionally experience large losses exhibit
+
+\[
+L_3(\mu;X) \gg U_3(\mu;X).
+\]
+
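+Directional skewness therefore identifies **which side of the distribution generates extreme asymmetry**. The classical coefficient reports only the net difference \(U_3(\mu;X)-L_3(\mu;X)\), so distributions with the same skewness can differ sharply in the size of each tail component.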
+
+---
+
+## Directional Kurtosis
+
+Kurtosis describes the magnitude of extreme deviations.
+
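+Classically, kurtosis is interpreted as a measure of **tail heaviness**. It is sometimes also described as distributional “peakedness”, although that reading is unreliable because the fourth moment is dominated by observations far from the mean.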
+
+The classical definition is
+
+\[
+Kurt(X)=
+\frac{E[(X-\mu)^4]}{Var(X)^2}.
+\]
+
+Using the directional representation,
+
+\[
+E[(X-\mu)^4] = U_4(\mu;X) + L_4(\mu;X).
+\]
+
+Thus kurtosis becomes
+
+\[
+Kurt(X)=
+\frac{U_4(\mu;X)+L_4(\mu;X)}
+{(U_2(\mu;X)+L_2(\mu;X))^2}.
+\]
+
+Directional statistics refines the classical interpretation.
+
+Instead of reporting only the total magnitude of extreme deviations, we may examine
+
+- **upper tail heaviness**: \(U_4(\mu;X)\)
+- **lower tail heaviness**: \(L_4(\mu;X)\)
+
+Suppose two distributions share identical kurtosis. Classical statistics would describe both as equally heavy-tailed.
+
+Directional kurtosis reveals whether extreme observations arise primarily from the **upper tail** or the **lower tail**.
+
+For example, venture-capital portfolios may exhibit large values of \(U_4(\mu;X)\), reflecting occasional extremely large gains, while certain credit portfolios may display large values of \(L_4(\mu;X)\), reflecting rare but severe losses.
+
+Although both portfolios might share similar classical kurtosis, their **directional tail structures—and therefore their risk characteristics—are fundamentally different.**
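+
+The claim that equal kurtosis can hide opposite tail structures is easiest to see with a sample and its mirror image. In this sketch the exponential sample and `directional_kurtosis()` are illustrative assumptions, not a model of actual venture or credit returns; reflecting the data leaves kurtosis unchanged while \(U_4\) and \(L_4\) trade places.
+
+```{r ch7-directional-kurtosis}
+# Sketch: a sample and its mirror image share the same kurtosis, but the
+# upper and lower fourth-degree partial moments trade places.
+set.seed(7)
+gains  <- rexp(5000)   # illustrative: occasional large upside deviations
+losses <- -gains       # mirror image: occasional large downside deviations
+
+directional_kurtosis <- function(v) {
+  mu <- mean(v)
+  U4 <- mean(pmax(v - mu, 0)^4)
+  L4 <- mean(pmax(mu - v, 0)^4)
+  c(kurtosis = (U4 + L4) / mean((v - mu)^2)^2, U4 = U4, L4 = L4)
+}
+
+k <- rbind(gains  = directional_kurtosis(gains),
+           losses = directional_kurtosis(losses))
+round(k, 3)
+```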
+
+---
+
+## Directional Distribution Profiles
+
+Combining directional partial moments across degrees produces a **directional profile of a distribution**.
+
+When using higher-order profiles, existence conditions matter: interpreting directional structure through order \(r\) requires the corresponding partial moments \(L_r(t;X)\) and \(U_r(t;X)\) to be finite.
+
+For a benchmark \(t\), the sequence
+
+\[
+L_0(t;X), L_1(t;X), L_2(t;X), \dots
+\]
+
+describes
+
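+- probability mass at or below the benchmark (\(L_0\))
+- unconditional average shortfall below the benchmark (\(L_1\))
+- lower semivariance relative to the benchmark (\(L_2\))
+- higher-order lower-tail structure (\(L_r\) for \(r \ge 3\))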
+
+Similarly,
+
+\[
+U_0(t;X), U_1(t;X), U_2(t;X), \dots
+\]
+
+describe the corresponding properties **above the benchmark**.
+
+Together these sequences provide a detailed directional characterization of the distribution.
+
+To illustrate, consider a distribution with many small losses and occasional large gains.
+
+Relative to a benchmark \(t = 0\), such a distribution might exhibit
+
+\[
+L_0(t;X) \approx 0.60, \quad L_1(t;X)\text{ small}, \quad L_2(t;X)\text{ modest}
+\]
+
+but
+
+\[
+U_0(t;X) \approx 0.40, \quad U_1(t;X)\text{ moderate}, \quad U_2(t;X)\text{ large}.
+\]
+
+This directional profile indicates that losses occur more frequently, but gains—when they occur—are substantially larger.
+
+These profiles can also be visualized—for example, using bar charts of \(L_r\) and \(U_r\) across degrees \(r = 0,1,2,\dots\)—providing an intuitive graphical summary of directional distribution structure.
+
+Classical statistics might summarize the same distribution with a moderate mean and high variance. The directional profile reveals the **mechanism generating those aggregates**.
+
+Distributions that appear similar under symmetric statistics can therefore exhibit very different directional structures. Examining how deviations are distributed between the upper and lower regions often provides clearer insight into the sources of asymmetry and tail behavior.
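+
+A directional profile of this kind can be computed in a few lines of base R. The sketch below simulates a distribution with many small losses and occasional large gains; the 40% gain probability and the exponential rates are assumptions chosen only to mimic the qualitative pattern described above, and the bar chart is drawn only in interactive sessions.
+
+```{r ch7-directional-profile}
+# Sketch: directional profile about t = 0. The 40% gain probability and
+# the exponential rates are illustrative assumptions.
+set.seed(1)
+n    <- 10000
+gain <- runif(n) < 0.4
+x    <- ifelse(gain, rexp(n, rate = 0.2), -rexp(n, rate = 2))
+
+target  <- 0
+degrees <- 0:3
+# Degree 0 uses indicators (x <= t below, x > t above), since 0^0 is 1 in R
+lower <- sapply(degrees, function(r)
+  if (r == 0) mean(x <= target) else mean(pmax(target - x, 0)^r))
+upper <- sapply(degrees, function(r)
+  if (r == 0) mean(x > target) else mean(pmax(x - target, 0)^r))
+
+profile <- rbind(L = lower, U = upper)
+colnames(profile) <- paste0("r=", degrees)
+round(profile, 3)
+
+# Grouped bar chart of L_r and U_r (drawn only in interactive sessions)
+if (interactive()) barplot(profile, beside = TRUE, legend.text = TRUE)
+```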
+
+---
+
+## Summary
+
+This chapter developed descriptive statistics derived from directional partial moments.
+
+Classical descriptive statistics summarize distributions through symmetric aggregates such as the mean, variance, skewness, and kurtosis. The directional framework reveals that each of these quantities arises from a pair of directional components measuring deviations above and below a benchmark.
+
+Viewing descriptive statistics in this way clarifies several important ideas. The mean represents the point at which upward and downward deviations balance. Variance combines upside and downside variability that may arise from very different sources. Higher-order moments such as skewness and kurtosis reflect asymmetries in directional tail behavior.
+
+More importantly, partial moments allow descriptive statistics to be defined relative to **externally meaningful benchmarks**, enabling analysts to examine distributions in the context in which outcomes are actually evaluated.
+
+Because the degree-zero lower partial moment recovers the cumulative distribution function, the same directional framework used here for descriptive decomposition also yields direct nonparametric distribution estimation. The next chapter develops this idea, showing how **entire distributions can be estimated directly from partial moments**, providing a nonparametric alternative to traditional density estimation methods and avoiding the bandwidth selection problems discussed in Chapter 1.
diff --git a/tools/NNS/book/chapter-08-distribution-estimation.Rmd b/tools/NNS/book/chapter-08-distribution-estimation.Rmd
new file mode 100644
index 0000000..59a644d
--- /dev/null
+++ b/tools/NNS/book/chapter-08-distribution-estimation.Rmd
@@ -0,0 +1,355 @@
+# Distribution Estimation
+
+Chapter 7 introduced directional descriptive statistics derived from partial moments.
+Those statistics summarize distributions while preserving directional information relative to meaningful benchmarks.
+
+The next step is **distribution estimation**.
+
+Classical statistics typically represents distributions through either
+
+- parametric models (such as the normal distribution), or
+- smoothed nonparametric estimators (such as kernel density estimation).
+
+Parametric models impose strong structural assumptions, while many nonparametric estimators require externally chosen smoothing parameters such as bandwidths.
+
+The directional framework provides a different approach. Because the cumulative distribution function is itself a partial moment, entire distributions can be estimated directly from **empirical partial moments** without parametric assumptions or externally chosen smoothing parameters.
+
+This chapter develops that approach.
+
+---
+
+## The Empirical Distribution Function
+
+Suppose we observe a sample
+
+\[
+x_1, x_2, \dots, x_n.
+\]
+
+The **empirical distribution function (EDF)** is defined as
+
+\[
+\hat{F}_n(t) =
+\frac{1}{n}\sum_{i=1}^{n} 1_{\{x_i \le t\}}.
+\]
+
+This quantity represents the proportion of observations less than or equal to the benchmark \(t\).
+
+The EDF is the classical nonparametric estimator of the cumulative distribution function.
+
+A fundamental property follows from the directional framework developed earlier. The cumulative distribution function can be written as a degree-zero partial moment:
+
+\[
+F_X(t) = L_0(t;X).
+\]
+
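+Consequently, reading \((t-x_i)_+^0\) as the indicator \(1_{\{x_i \le t\}}\) (the degree-zero convention that makes \(L_0\) a cumulative probability), the empirical distribution function can be written as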
+
+\[
+\hat{F}_n(t) =
+\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^0.
+\]
+
+Thus the empirical distribution function is simply the **empirical degree-zero lower partial moment**.
+
+Distribution estimation therefore arises naturally within the directional framework.
+
+---
+
+## Empirical Partial Moment Estimators
+
+More generally, partial moments can be estimated directly from sample data.
+
+For degree \(r \ge 0\),
+
+\[
+\hat{L}_r(t) =
+\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^r
+\]
+
+\[
+\hat{U}_r(t) =
+\frac{1}{n}\sum_{i=1}^{n} (x_i-t)_+^r.
+\]
+
+These estimators converge to the population partial moments
+
+\[
+L_r(t;X) = E[(t-X)_+^r]
+\]
+
+\[
+U_r(t;X) = E[(X-t)_+^r]
+\]
+
+by the strong law of large numbers, for i.i.d. samples and whenever the corresponding population moment is finite.
+
+Thus empirical partial moments provide estimators of directional deviation structure that do not require specifying a parametric family.
+
+Importantly, the case \(r=0\) produces the empirical distribution function itself.
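+
+The estimators above translate directly into code. The NNS package's `LPM()` and `UPM()` functions compute these quantities; `lpm_hat()` and `upm_hat()` below are hypothetical base-R stand-ins that make the arithmetic visible. Degree zero is special-cased because R evaluates `0^0` as 1, and the first comparison confirms that the degree-zero lower partial moment agrees with `stats::ecdf()`.
+
+```{r ch8-empirical-pm}
+# Hypothetical base-R helpers for the empirical estimators above.
+# Degree 0 uses explicit indicators: R evaluates 0^0 as 1, so
+# pmax(t - x, 0)^0 would count every observation.
+lpm_hat <- function(r, t, x) {
+  if (r == 0) mean(x <= t) else mean(pmax(t - x, 0)^r)
+}
+upm_hat <- function(r, t, x) {
+  if (r == 0) mean(x > t) else mean(pmax(x - t, 0)^r)
+}
+
+set.seed(2024)
+x    <- rnorm(500)
+grid <- c(-1, 0, 0.5, 2)
+
+# The degree-0 lower partial moment reproduces stats::ecdf()
+F_hat <- ecdf(x)
+rbind(lpm0 = sapply(grid, lpm_hat, r = 0, x = x),
+      ecdf = F_hat(grid))
+
+# Higher degrees at the benchmark t = 0
+sapply(1:3, function(r) c(L = lpm_hat(r, 0, x), U = upm_hat(r, 0, x)))
+```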
+
+---
+
+## Distribution Estimation from Partial Moments
+
+Because the cumulative distribution function equals the degree-zero partial moment,
+
+\[
+F_X(t) = L_0(t;X),
+\]
+
+estimating \(L_0\) directly estimates the distribution.
+
+The empirical estimator
+
+\[
+\hat{F}_n(t) =
+\frac{1}{n}\sum_{i=1}^{n} (t-x_i)_+^0
+\]
+
+therefore provides a **nonparametric estimate of the entire distribution**.
+
+This estimator has several desirable properties.
+
+### Nonparametric
+
+No parametric model is assumed, so the estimator applies broadly across distributional forms.
+
+### Data-Driven
+
+The estimate depends only on the observed sample.
+
+### Consistency
+
+By the **Glivenko–Cantelli theorem**,
+
+\[
+\sup_t |\hat{F}_n(t)-F_X(t)| \to 0 \quad \text{almost surely as } n \to \infty.
+\]
+
+This theorem states that the empirical distribution converges uniformly to the true distribution. In practical terms, the **largest possible difference between the empirical and true cumulative distributions across all benchmarks becomes arbitrarily small as the sample size grows**. The result requires only i.i.d. sampling and holds for every distribution on the real line, including discrete and mixed distributions, not only continuous ones.
+
+Thus the empirical distribution provides a reliable estimator of the entire probability distribution under the usual i.i.d. sampling framework.
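+
+Uniform convergence can be illustrated by computing the exact sup-distance between the EDF and a known CDF for growing samples. For a continuous \(F\) the supremum is attained at the jumps of the EDF, so `sup_distance()` (a hypothetical helper) evaluates it from the sorted sample; the value is the one-sample Kolmogorov–Smirnov statistic that `ks.test()` reports. Standard normal sampling is an illustrative assumption.
+
+```{r ch8-glivenko-cantelli}
+# Sketch: sup-distance between the EDF and the true N(0,1) CDF.
+# For continuous F the supremum is attained at the jumps of the EDF,
+# so it can be computed exactly from the sorted sample.
+sup_distance <- function(x, cdf = pnorm) {
+  n  <- length(x)
+  Fx <- cdf(sort(x))
+  max(pmax((1:n) / n - Fx, Fx - (0:(n - 1)) / n))
+}
+
+set.seed(99)
+sizes <- c(100, 1000, 10000)
+d <- sapply(sizes, function(n) sup_distance(rnorm(n)))
+setNames(round(d, 4), sizes)
+```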
+
+---
+
+## From Distribution to Density
+
+While the empirical distribution function estimates the cumulative distribution, analysts often wish to estimate the **probability density function**.
+
+For continuous distributions,
+
+\[
+f(t) = \frac{d}{dt}F_X(t).
+\]
+
+Because
+
+\[
+F_X(t) = L_0(t;X),
+\]
+
+this implies
+
+\[
+f(t) = \frac{d}{dt}L_0(t;X).
+\]
+
+In practice the empirical distribution function is a step function: its derivative is zero between observations and undefined at each jump, so differentiating it does not produce a usable density estimate.
+
+This distinction is important: the EDF already gives a complete nonparametric estimate of the CDF, but obtaining a smooth density estimate typically requires additional regularity assumptions and/or smoothing choices.
+
+Classical statistics addresses this issue using **kernel density estimation**, which smooths the empirical distribution using a bandwidth parameter. Another common alternative is the **histogram estimator**, which approximates the density by counting observations within fixed intervals. However, histograms also require selecting a bin width, which plays a role analogous to the bandwidth in kernel density estimation.
+
+The directional framework approaches the problem differently. Rather than smoothing the distribution directly, it identifies structural features of the distribution—such as the **mode and local concentration of probability mass**—through data-adaptive procedures that do not require externally chosen smoothing parameters.
+
+The theoretical foundation for this approach is established in Chapter 13, where degree-one partial moments are shown to recover the full distribution without smoothing parameters. The computational implementation — data-adaptive partitioning that locates modes and local probability mass concentration — is developed in Chapter 18.
+
+Practical implementation of these methods — including sampling from empirical distributions and generating PDFs via degree manipulation of `LPM.VaR` — is demonstrated in the NNS package vignette [Sampling and Simulation](https://cran.r-project.org/web/packages/NNS/vignettes/NNSvignette_05_Sampling.html).
+
+---
+
+## Comparison with Kernel Density Estimation
+
+Kernel density estimation is one of the most widely used nonparametric density estimators.
+
+Given a kernel function \(K(\cdot)\) and bandwidth \(h\), the estimator is
+
+\[
+\hat{f}(t) =
+\frac{1}{nh}\sum_{i=1}^{n}
+K\left(\frac{t-x_i}{h}\right).
+\]
+
+The **kernel function** determines the shape of the local weighting (common choices include Gaussian, Epanechnikov, and uniform kernels), while the **bandwidth** controls the degree of smoothing.
+
+Bandwidth selection is critical.
+
+- If \(h\) is too small, the estimate becomes noisy.
+- If \(h\) is too large, important structure may be obscured.
+
+Selecting an appropriate bandwidth often requires cross-validation or heuristic rules.
+
+Empirical partial moment estimators avoid this issue for CDF estimation because they do not rely on smoothing parameters: the distribution estimate arises directly from the data. By contrast, smooth density estimation generally reintroduces smoothing or shape assumptions.
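+
+The contrast between bandwidth-dependent density estimates and the tuning-free EDF shows up clearly in a small experiment. This sketch uses base R's `density()` (Gaussian kernel) on an assumed two-component normal mixture and counts local maxima at three bandwidths; `count_modes()` is a hypothetical helper, and its threshold simply discards numerical noise in the far tails.
+
+```{r ch8-kde-bandwidth}
+# Sketch: the number of modes in a kernel density estimate changes with
+# the bandwidth, while the EDF has no tuning parameter.
+set.seed(10)
+x <- c(rnorm(150, mean = -2), rnorm(150, mean = 2))  # illustrative mixture
+
+count_modes <- function(bw) {
+  d    <- density(x, bw = bw)                   # Gaussian kernel by default
+  peak <- which(diff(sign(diff(d$y))) == -2) + 1
+  sum(d$y[peak] > 1e-3 * max(d$y))              # ignore noise in the far tails
+}
+
+modes <- sapply(c(small = 0.05, default = bw.nrd0(x), large = 3), count_modes)
+modes
+
+# The empirical CDF is computed without any bandwidth choice
+F_hat <- ecdf(x)
+F_hat(0)
+```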
+
+---
+
+## Example: Empirical Partial Moments
+
+Consider the observations
+
+\[
+x = \{-3,-1,0,2,4\}.
+\]
+
+Let the benchmark be
+
+\[
+t = 1.
+\]
+
+The empirical distribution function becomes
+
+\[
+\hat{F}_n(1)=\frac{3}{5}=0.6
+\]
+
+since three observations are less than or equal to 1.
+
+Now compute the first-degree empirical partial moments.
+
+Lower partial moment:
+
+\[
+\hat{L}_1(1)=
+\frac{1}{5}\sum_{i=1}^{5}(1-x_i)_+
+\]
+
+\[
+=
+\frac{1}{5}(4+2+1+0+0)
+=
+1.4
+\]
+
+Upper partial moment:
+
+\[
+\hat{U}_1(1)=
+\frac{1}{5}\sum_{i=1}^{5}(x_i-1)_+
+\]
+
+\[
+=
+\frac{1}{5}(0+0+0+1+3)
+=
+0.8.
+\]
+
+These quantities describe the distribution relative to the benchmark \(t=1\):
+
+- 60% of observations lie below the benchmark
+- the unconditional average shortfall below the benchmark is 1.4
+- the unconditional average excess above the benchmark is 0.8
+
+Now compare with benchmark \(t=0\):
+
+\[
+\hat{F}_n(0)=\frac{3}{5}=0.6,
+\quad
+\hat{L}_1(0)=\frac{1}{5}(3+1+0+0+0)=0.8,
+\quad
+\hat{U}_1(0)=\frac{1}{5}(0+0+0+2+4)=1.2.
+\]
+
+At \(t=1\), unconditional shortfall dominates unconditional excess (1.4 vs 0.8); at \(t=0\), unconditional excess dominates unconditional shortfall (1.2 vs 0.8). This illustrates how directional conclusions can change with the benchmark, even for the same sample.
+
+Together, these statistics provide a directional description of the distribution that complements the empirical distribution function.
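+
+The worked example can be reproduced as follows. The last lines show why the comparison reverses between the two benchmarks: since \(\hat{L}_1(t)-\hat{U}_1(t)=t-\bar{x}\) and here \(\bar{x}=0.4\), shortfall dominates for benchmarks above 0.4 and excess dominates below it. `pm_summary()` is a hypothetical helper.
+
+```{r ch8-worked-example}
+x <- c(-3, -1, 0, 2, 4)
+
+# Hypothetical helper: EDF and first-degree partial moments at a benchmark
+pm_summary <- function(target) {
+  c(F_hat = mean(x <= target),
+    L1    = mean(pmax(target - x, 0)),
+    U1    = mean(pmax(x - target, 0)))
+}
+
+rbind(`t = 1` = pm_summary(1), `t = 0` = pm_summary(0))
+
+# L1(t) - U1(t) = t - mean(x), so the two balance at the mean, 0.4
+mean(x)
+pm_summary(mean(x))
+```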
+
+---
+
+## Tail Sensitivity
+
+Because empirical partial moments aggregate deviations relative to benchmarks, they naturally reveal **tail structure**.
+
+Consider the first-degree lower partial moment
+
+\[
+\hat{L}_1(t) =
+\frac{1}{n}\sum_{i=1}^{n}(t-x_i)_+.
+\]
+
+This quantity measures the **unconditional average shortfall below the benchmark**.
+
+Similarly,
+
+\[
+\hat{U}_1(t) =
+\frac{1}{n}\sum_{i=1}^{n}(x_i-t)_+
+\]
+
+measures the **unconditional average excess above the benchmark**.
+
+By examining these quantities across benchmarks \(t\), analysts can explore how deviations accumulate in the lower and upper regions of the distribution.
+
+The influence of extreme observations depends on the order of the partial moment. When \(r=0\) (the empirical distribution function), each observation contributes only through an indicator function and therefore influences the estimate equally regardless of magnitude. For \(r \ge 1\), however, deviations enter the calculation through powers of the distance from the benchmark. As the order \(r\) increases, extreme observations exert progressively greater influence on the estimate, reflecting the increasing emphasis on tail behavior.
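+
+The growing influence of extreme observations with the degree \(r\) can be seen by replacing a single observation with a far more extreme one. In this sketch the normal sample and the outlier value of \(-10\) are illustrative assumptions, and `lpm_hat()` is a hypothetical helper; the ratio for \(r=0\) stays at 1 because the replaced observation was already below the benchmark.
+
+```{r ch8-tail-sensitivity}
+# Sketch: push the smallest observation out to -10 and compare lower
+# partial moments at t = 0 before and after.
+set.seed(3)
+x     <- rnorm(200)
+x_out <- x
+x_out[which.min(x)] <- -10   # one hypothetical extreme loss
+
+lpm_hat <- function(r, t, v) {
+  if (r == 0) mean(v <= t) else mean(pmax(t - v, 0)^r)
+}
+
+degrees <- 0:3
+ratio <- sapply(degrees, function(r) lpm_hat(r, 0, x_out) / lpm_hat(r, 0, x))
+setNames(round(ratio, 3), paste0("r=", degrees))
+```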
+
+---
+
+## Robustness Properties
+
+Empirical distribution estimators possess several robustness advantages that follow directly from their nonparametric construction, while still inheriting standard finite-sample variability.
+
+First, they reduce **model risk**. Because no parametric distribution is imposed, misspecification from choosing an incorrect family is avoided.
+
+Second, the estimator is **transparent**. Each observation contributes directly to the estimate through the indicator function \(1_{\{x_i \le t\}}\), ensuring that the distribution estimate reflects the empirical data without additional smoothing or transformation.
+
+Third, the estimator improves **systematically with sample size**. As additional observations are collected, the empirical distribution converges uniformly to the true distribution.
+
+Because partial moments measure deviations relative to benchmarks, extreme observations influence the estimates in proportion to the \(r\)-th power of their deviation from the benchmark when \(r \ge 1\). In applications such as risk management, where extreme outcomes carry important information, this sensitivity can be desirable because it preserves tail behavior that smoothing-based estimators may dilute.
+
+---
+
+## Directional Distribution Analysis
+
+Combining empirical partial moments across benchmarks provides a detailed description of the distribution.
+
+For example, evaluating
+
+\[
+\hat{L}_0(t),\quad \hat{L}_1(t),\quad \hat{L}_2(t)
+\]
+
+across values of \(t\) reveals
+
+- probability mass below each benchmark,
+- average deviation below each benchmark,
+- variance contribution below each benchmark.
+
+Similarly,
+
+\[
+\hat{U}_0(t),\quad \hat{U}_1(t),\quad \hat{U}_2(t)
+\]
+
+describe corresponding behavior above the benchmark.
+
+Together these quantities form a **directional representation of the distribution**.
+
+Rather than summarizing the data with a few symmetric statistics, the directional framework allows analysts to examine how probability mass and deviation magnitudes accumulate across different regions of the distribution.
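+
+Evaluating the empirical partial moments across a grid of benchmarks produces a table of this directional representation. The shifted exponential sample and the grid are illustrative choices; by construction each row satisfies \(\hat{L}_0(t)+\hat{U}_0(t)=1\) and \(\hat{L}_1(t)-\hat{U}_1(t)=t-\bar{x}\).
+
+```{r ch8-directional-grid}
+# Sketch: empirical partial moments of degrees 0 to 2 across benchmarks.
+set.seed(5)
+x    <- rexp(1000) - 1          # illustrative right-skewed sample
+grid <- seq(-1, 3, by = 0.5)
+
+directional_table <- t(sapply(grid, function(b) c(
+  t  = b,
+  L0 = mean(x <= b), L1 = mean(pmax(b - x, 0)), L2 = mean(pmax(b - x, 0)^2),
+  U0 = mean(x > b),  U1 = mean(pmax(x - b, 0)), U2 = mean(pmax(x - b, 0)^2)
+)))
+round(directional_table, 3)
+```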
+
+---
+
+## Summary
+
+This chapter examined distribution estimation from the perspective of directional statistics.
+
+A key observation is that the cumulative distribution function itself is a partial moment. Consequently, empirical partial moments provide a natural nonparametric method for estimating entire probability distributions.
+
+Several conclusions follow.
+
+First, the empirical distribution function is the empirical degree-zero lower partial moment. Second, empirical partial moments provide consistent estimators of directional deviation structure. Third, distribution estimation can be performed without parametric assumptions or externally chosen smoothing parameters.
+
+While the empirical distribution function provides a complete description of the distribution, applied analysis also requires understanding how distributions interact across variables and across states of the sample space.
+
+The next chapter begins Part III on dependence by showing why classical correlation can fail under nonlinear and asymmetric structures, motivating directional dependence measures built from the same partial-moment foundation.
diff --git a/tools/NNS/book/chapter-09-why-correlation-fails.Rmd b/tools/NNS/book/chapter-09-why-correlation-fails.Rmd
new file mode 100644
index 0000000..918f3db
--- /dev/null
+++ b/tools/NNS/book/chapter-09-why-correlation-fails.Rmd
@@ -0,0 +1,586 @@
+# Why Correlation Fails
+
+Chapters 7 and 8 developed descriptive statistics and distribution estimation using directional partial moments.
+Those results showed that many classical statistical quantities arise from aggregations of directional deviations relative to benchmarks.
+
+The next topic is **dependence between variables**.
+
+Classical statistics measures dependence primarily through **covariance and correlation**.
+These statistics summarize relationships between variables with a single number.
+
+However, these measures possess fundamental limitations. They
+
+- measure only **linear association**,
+- aggregate directional information symmetrically,
+- and can obscure nonlinear, asymmetric, or tail-specific relationships.
+
+Directional statistics provides a deeper perspective.
+Just as classical moments arise from aggregations of partial moments, **covariance and correlation arise from aggregations of directional co-partial moments**.
+
+This chapter explains why correlation can fail and establishes the connection between covariance and directional partial-moment matrices.
+
+---
+
+## Classical Dependence Measures
+
+For two random variables \(X\) and \(Y\), the **covariance** is
+
+\[
+\operatorname{Cov}(X,Y)
+=
+E[(X-\mu_X)(Y-\mu_Y)].
+\]
+
+Covariance measures the joint variation of the two variables relative to their means.
+
+The **Pearson correlation coefficient** standardizes covariance:
+
+\[
+\rho(X,Y)
+=
+\frac{\operatorname{Cov}(X,Y)}
+{\sigma_X\sigma_Y}.
+\]
+
+The statistic lies in the interval
+
+\[
+-1 \le \rho(X,Y) \le 1.
+\]
+
+Values near
+
+- \(1\) indicate strong positive linear association,
+- \(-1\) indicate strong negative linear association,
+- \(0\) indicate no linear association.
+
+The key limitation is that correlation measures **only linear relationships**.
+If dependence is nonlinear, asymmetric, or concentrated in tails, correlation may understate it or miss it entirely.
+
+---
+
+## Directional Co-Partial Moments
+
+Directional statistics partitions the joint distribution relative to benchmark values \(t_X\) and \(t_Y\).
+
+Four directional regions arise:
+
+\[
+X \le t_X,\; Y \le t_Y,
+\]
+
+\[
+X \le t_X,\; Y > t_Y,
+\]
+
+\[
+X > t_X,\; Y \le t_Y,
+\]
+
+\[
+X > t_X,\; Y > t_Y.
+\]
+
+These regions correspond to combinations of directional deviations.
+
+Benchmarks may be chosen in different ways depending on the application:
+
+- **External benchmarks**, such as policy targets, required returns, safety thresholds, or liability levels.
+- **Internal benchmarks**, such as sample means, medians, or other distribution-derived reference points.
+
+The covariance decomposition developed in the next section uses the **means**:
+
+\[
+t_X = \mu_X,
+\qquad
+t_Y = \mu_Y.
+\]
+
+Define the positive-part operator
+
+\[
+(x)_+ = \max(x,0).
+\]
+
+The directional co-partial moments of order \(r,s\) are:
+
+### Co-Lower Partial Moment
+
+\[
+\operatorname{CoLPM}_{r,s}(X,Y)
+=
+E[(t_X-X)_+^r (t_Y-Y)_+^s].
+\]
+
+This measures concordant lower-side co-movement: both variables are below their benchmarks.
+
+### Co-Upper Partial Moment
+
+\[
+\operatorname{CoUPM}_{r,s}(X,Y)
+=
+E[(X-t_X)_+^r (Y-t_Y)_+^s].
+\]
+
+This measures concordant upper-side co-movement: both variables are above their benchmarks.
+
+### Divergent Lower Partial Moment
+
+\[
+\operatorname{DLPM}_{r,s}(X,Y)
+=
+E[(X-t_X)_+^r (t_Y-Y)_+^s].
+\]
+
+This measures one divergent direction: \(X\) is above its benchmark while \(Y\) is below its benchmark.
+
+### Divergent Upper Partial Moment
+
+\[
+\operatorname{DUPM}_{r,s}(X,Y)
+=
+E[(t_X-X)_+^r (Y-t_Y)_+^s].
+\]
+
+This measures the opposite divergent direction: \(X\) is below its benchmark while \(Y\) is above its benchmark.
+
+Together, these four quantities provide a **directional decomposition of dependence structure**.
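+
+A minimal base-R sketch of the four empirical co-partial moments follows. `co_pm()` is a hypothetical helper (NNS has its own co-partial moment functions); at degree \((0,0)\), reading \((\cdot)_+^0\) as an indicator with \(\le\) below the benchmark and \(>\) above it, the four values are the quadrant frequencies and sum to one.
+
+```{r ch9-co-partial-moments}
+# Hypothetical helper: the four empirical co-partial moments of degrees
+# (r, s). For degree 0, "below" means <= benchmark and "above" means >.
+co_pm <- function(x, y, r = 1, s = 1, tx = mean(x), ty = mean(y)) {
+  lo <- function(v, b, k) if (k == 0) as.numeric(v <= b) else pmax(b - v, 0)^k
+  up <- function(v, b, k) if (k == 0) as.numeric(v > b)  else pmax(v - b, 0)^k
+  c(CoLPM = mean(lo(x, tx, r) * lo(y, ty, s)),
+    CoUPM = mean(up(x, tx, r) * up(y, ty, s)),
+    DLPM  = mean(up(x, tx, r) * lo(y, ty, s)),   # X above, Y below
+    DUPM  = mean(lo(x, tx, r) * up(y, ty, s)))   # X below, Y above
+}
+
+set.seed(11)
+x <- rnorm(1000)
+y <- 0.6 * x + rnorm(1000, sd = 0.8)
+
+round(co_pm(x, y, r = 0, s = 0), 3)   # quadrant frequencies, summing to 1
+round(co_pm(x, y), 3)                 # degree (1, 1) about the means
+```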
+
+---
+
+## Covariance from Co-Partial Moments
+
+Covariance can be expressed directly in terms of directional co-partial moments.
+
+Let the benchmarks equal the means:
+
+\[
+t_X=\mu_X,
+\qquad
+t_Y=\mu_Y.
+\]
+
+From the directional decomposition introduced in Chapter 2,
+
+\[
+x = x_+ - (-x)_+.
+\]
+
+Applying this to deviations gives
+
+\[
+X-\mu_X = (X-\mu_X)_+ - (\mu_X-X)_+
+\]
+
+and
+
+\[
+Y-\mu_Y = (Y-\mu_Y)_+ - (\mu_Y-Y)_+.
+\]
+
+Define
+
+\[
+A=(X-\mu_X)_+,
+\qquad
+B=(\mu_X-X)_+,
+\]
+
+\[
+C=(Y-\mu_Y)_+,
+\qquad
+D=(\mu_Y-Y)_+.
+\]
+
+Then
+
+\[
+(X-\mu_X)(Y-\mu_Y)
+=
+(A-B)(C-D).
+\]
+
+Expanding gives
+
+\[
+(A-B)(C-D)
+=
+AC + BD - AD - BC.
+\]
+
+Substituting the definitions yields
+
+\[
+\begin{aligned}
+(X-\mu_X)(Y-\mu_Y)
+&=
+(X-\mu_X)_+(Y-\mu_Y)_+ \\
+&\quad+
+(\mu_X-X)_+(\mu_Y-Y)_+ \\
+&\quad-
+(X-\mu_X)_+(\mu_Y-Y)_+ \\
+&\quad-
+(\mu_X-X)_+(Y-\mu_Y)_+.
+\end{aligned}
+\]
+
+Taking expectations gives
+
+\[
+\operatorname{Cov}(X,Y)
+=
+\operatorname{CoUPM}_{1,1}(X,Y)
++
+\operatorname{CoLPM}_{1,1}(X,Y)
+-
+\operatorname{DLPM}_{1,1}(X,Y)
+-
+\operatorname{DUPM}_{1,1}(X,Y).
+\]
+
+Thus covariance is the **signed aggregation of four directional co-partial moments**.
+
+This mirrors the earlier variance decomposition
+
+\[
+\operatorname{Var}(X)
+=
+U_2(\mu;X)+L_2(\mu;X).
+\]
+
+The difference is that covariance requires both concordant and divergent directional components. Concordant components enter positively. Divergent components enter negatively.
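+
+The covariance identity can be confirmed numerically. The only subtlety is normalization: the co-partial moments below average over \(n\) observations, whereas `cov()` divides by \(n-1\), so the classical value is rescaled by \((n-1)/n\). The exponential sample and the quadratic relationship are illustrative assumptions.
+
+```{r ch9-covariance-identity}
+# Sketch: the four degree-(1, 1) co-partial moments about the sample
+# means reassemble the (1/n) covariance exactly.
+set.seed(21)
+n  <- 500
+x  <- rexp(n)
+y  <- x^2 + rnorm(n)          # illustrative nonlinear, skewed relationship
+mx <- mean(x); my <- mean(y)
+
+co_lpm <- mean(pmax(mx - x, 0) * pmax(my - y, 0))
+co_upm <- mean(pmax(x - mx, 0) * pmax(y - my, 0))
+d_lpm  <- mean(pmax(x - mx, 0) * pmax(my - y, 0))  # X above, Y below
+d_upm  <- mean(pmax(mx - x, 0) * pmax(y - my, 0))  # X below, Y above
+
+c(directional = co_upm + co_lpm - d_lpm - d_upm,
+  classical   = cov(x, y) * (n - 1) / n)   # rescale cov() to a 1/n denominator
+```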
+
+---
+
+## Covariance Matrices from Partial-Moment Matrices
+
+For a system of \(N\) variables, directional co-partial moments form matrices.
+
+Define the degree-1 directional matrices by
+
+\[
+\operatorname{CoLPM}_{ij}
+=
+\operatorname{CoLPM}_{1,1}(X_i,X_j),
+\]
+
+\[
+\operatorname{CoUPM}_{ij}
+=
+\operatorname{CoUPM}_{1,1}(X_i,X_j),
+\]
+
+\[
+\operatorname{DLPM}_{ij}
+=
+\operatorname{DLPM}_{1,1}(X_i,X_j),
+\]
+
+\[
+\operatorname{DUPM}_{ij}
+=
+\operatorname{DUPM}_{1,1}(X_i,X_j).
+\]
+
+Each matrix captures directional co-movement across the variables.
+
+The classical covariance matrix can be written as
+
+\[
+\Sigma
+=
+\operatorname{CoLPM}
++
+\operatorname{CoUPM}
+-
+\operatorname{DLPM}
+-
+\operatorname{DUPM}.
+\]
+
+The diagonal elements satisfy
+
+\[
+\Sigma_{ii}
+=
+\operatorname{Var}(X_i).
+\]
+
+This follows because when \(i=j\), the divergent partial moments vanish. A variable cannot be both above and below its own benchmark at the same observation. The expression therefore reduces to the variance decomposition derived earlier:
+
+\[
+\operatorname{Var}(X_i)
+=
+U_2(\mu_i;X_i)+L_2(\mu_i;X_i).
+\]
+
+Like their univariate counterparts, these directional matrices can be **estimated empirically from sample data** using sample co-partial moments.
+
+```{r pm-matrix-example}
+library(NNS)
+
+set.seed(123)
+x <- rnorm(100)
+y <- rnorm(100)
+
+cov.mtx <- PM.matrix(
+ LPM_degree = 1,
+ UPM_degree = 1,
+ target = "mean",
+ variable = cbind(x, y),
+ pop_adj = TRUE
+)
+
+cov.mtx
+
+# Reassembled covariance matrix
+cov.mtx$clpm + cov.mtx$cupm - cov.mtx$dlpm - cov.mtx$dupm
+
+# Standard covariance matrix
+cov(cbind(x, y))
+```
+
+The reassembled matrix matches the standard covariance matrix up to floating-point rounding, confirming the degree-1 directional decomposition in empirical form.
+
+---
+
+## Gram-Matrix Structure of Concordant Co-Partial Moment Matrices
+
+The concordant co-partial moment matrices have a simple linear algebra structure.
+
+For a system of \(N\) variables observed over \(T\) periods, index the periods by \(k = 1,\dots,T\) (to avoid a clash with the benchmarks \(t_i\)) and define the lower directional-deviation matrix \(L^{(r)}\) by
+
+\[
+L^{(r)}_{k i}
+=
+(t_i-X_{i,k})_+^r.
+\]
+
+The \(i\)-th column of \(L^{(r)}\) is the lower directional-deviation vector for variable \(X_i\).
+
+Then the co-lower partial moment matrix is
+
+\[
+\operatorname{CoLPM}^{(r)}
+=
+\frac{1}{T}
+\left(L^{(r)}\right)^\top L^{(r)}.
+\]
+
+Similarly, define the upper directional-deviation matrix \(U^{(r)}\) by
+
+\[
+U^{(r)}_{k i}
+=
+(X_{i,k}-t_i)_+^r.
+\]
+
+Then the co-upper partial moment matrix is
+
+\[
+\operatorname{CoUPM}^{(r)}
+=
+\frac{1}{T}
+\left(U^{(r)}\right)^\top U^{(r)}.
+\]
+
+Thus each concordant co-partial moment matrix is a **Gram matrix**: its entries are pairwise inner products of directional-deviation vectors.
+
+For any weight vector \(w\),
+
+\[
+\begin{aligned}
+w^\top \operatorname{CoLPM}^{(r)} w
+&=
+\frac{1}{T}
+w^\top
+\left(L^{(r)}\right)^\top
+L^{(r)}
+w \\
+&=
+\frac{1}{T}
+\left\|L^{(r)}w\right\|^2 \\
+&\geq 0.
+\end{aligned}
+\]
+
+Therefore \(\operatorname{CoLPM}^{(r)}\) is positive semidefinite. The same argument applies to \(\operatorname{CoUPM}^{(r)}\).
+
+This explains why concordant co-partial moment matrices are symmetric and positive semidefinite: they are matrices of inner products between directional deviation vectors. The result is structural, not distributional. It does not require normality, linearity, or parametric assumptions.
+
+This point should not be confused with the covariance reconstruction above. The concordant matrices \(\operatorname{CoLPM}\) and \(\operatorname{CoUPM}\) are positive semidefinite Gram matrices. The covariance matrix is a **signed aggregation** that subtracts the divergent matrices:
+
+\[
+\Sigma
+=
+\operatorname{CoLPM}
++
+\operatorname{CoUPM}
+-
+\operatorname{DLPM}
+-
+\operatorname{DUPM}.
+\]
+
+The Gram structure explains why the directional building blocks are well-behaved. The signed aggregation explains how classical covariance is recovered from those building blocks.
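+
+The Gram construction and the signed reassembly can be checked together. This sketch builds \(L^{(1)}\) and \(U^{(1)}\) with base R (rows are periods, columns are variables), forms all four matrices with `crossprod()`, reports the smallest eigenvalue of each concordant matrix, and compares the signed aggregation with the \(1/T\)-normalized covariance matrix. The simulated four-variable system is an assumption.
+
+```{r ch9-gram-matrix}
+# Sketch: deviation matrices (rows = periods k, columns = variables i),
+# Gram-form co-partial moment matrices, and covariance reassembly (r = 1).
+set.seed(8)
+T_obs <- 250
+X <- matrix(rnorm(T_obs * 4), T_obs, 4)
+X[, 2] <- X[, 1] + 0.5 * X[, 2]         # induce some dependence
+bench <- colMeans(X)                    # benchmarks t_i (sample means)
+
+dev  <- sweep(X, 2, bench)              # X_{i,k} - t_i
+Lmat <- pmax(-dev, 0)                   # (t_i - X_{i,k})_+
+Umat <- pmax(dev, 0)                    # (X_{i,k} - t_i)_+
+
+CoLPM <- crossprod(Lmat) / T_obs        # (1/T) L^T L
+CoUPM <- crossprod(Umat) / T_obs        # (1/T) U^T U
+DLPM  <- crossprod(Umat, Lmat) / T_obs  # X_i above, X_j below
+DUPM  <- crossprod(Lmat, Umat) / T_obs  # X_i below, X_j above
+
+# Smallest eigenvalues of the concordant matrices: nonnegative up to rounding
+sapply(list(CoLPM = CoLPM, CoUPM = CoUPM),
+       function(M) min(eigen(M, symmetric = TRUE, only.values = TRUE)$values))
+
+# Signed aggregation recovers the (1/T) covariance matrix
+all.equal(CoLPM + CoUPM - DLPM - DUPM, cov(X) * (T_obs - 1) / T_obs)
+```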
+
+---
+
+## Correlation as a Normalized Covariance
+
+The correlation matrix is obtained by standardizing covariance:
+
+\[
+\rho_{ij}
+=
+\frac{\Sigma_{ij}}
+{\sqrt{\Sigma_{ii}\Sigma_{jj}}}.
+\]
+
+Since covariance itself is derived from directional matrices, correlation represents a further aggregation.
+
+The information hierarchy becomes
+
+\[
+(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM})
+\rightarrow
+\Sigma
+\rightarrow
+\rho.
+\]
+
+Directional matrices therefore preserve more structural information about dependence than correlation alone.
+
+Correlation is useful when a single linear summary is appropriate. It is incomplete when the dependence structure differs across lower, upper, or divergent regions.
+
+---
+
+## Nonlinear Dependence
+
+Correlation measures linear association and therefore fails when relationships are nonlinear.
+
+Consider
+
+\[
+Y=X^2
+\]
+
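+with \(X\) symmetrically distributed around zero and \(E[X^4]<\infty\). Symmetry gives \(E[X]=E[X^3]=0\), so
+
+\[
+\operatorname{Cov}(X,Y)=E[X^3]-E[X]\,E[X^2]=0
+\quad\text{and hence}\quad
+\operatorname{Corr}(X,Y)=0.
+\]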
+
+For example, if \(X\sim N(0,1)\), then
+
+\[
+\operatorname{Corr}(X,X^2)=0.
+\]
+
+Despite zero correlation, \(Y\) is a deterministic function of \(X\): the dependence is perfect, just not linear.
+
+Directional co-partial moments reveal this structure.
+
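+Take the benchmarks to be the means, \(t_X=\mu_X=0\) and \(t_Y=\mu_Y=E[X^2]\) (equal to 1 when \(X\sim N(0,1)\)). Then
+
+- large positive values of \(X\) (with \(X^2>\mu_Y\)) place both variables above their benchmarks and contribute to \(\operatorname{CoUPM}\),
+- large negative values of \(X\) place \(X\) below and \(Y\) above their benchmarks and contribute to \(\operatorname{DUPM}\),
+- small positive values of \(X\) contribute to \(\operatorname{DLPM}\), and small negative values contribute to \(\operatorname{CoLPM}\).
+
+By symmetry \(\operatorname{CoUPM}=\operatorname{DUPM}\) and \(\operatorname{CoLPM}=\operatorname{DLPM}\), so the signed aggregation is zero even though every directional component is strictly positive. The directional matrices expose strong dependence that the aggregated covariance cancels.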
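+
+A simulation makes the cancellation visible. Under the illustrative assumption \(X\sim N(0,1)\), with sample means as benchmarks, each of the four components is clearly positive while their signed aggregation and the correlation are close to zero.
+
+```{r ch9-nonlinear-dependence}
+# Sketch: Y = X^2 with X ~ N(0, 1); benchmarks are the sample means
+# (close to 0 for X and to 1 for Y).
+set.seed(4)
+n  <- 1e5
+x  <- rnorm(n)
+y  <- x^2
+mx <- mean(x); my <- mean(y)
+
+parts <- c(
+  CoUPM = mean(pmax(x - mx, 0) * pmax(y - my, 0)),  # large positive X
+  DUPM  = mean(pmax(mx - x, 0) * pmax(y - my, 0)),  # large negative X
+  CoLPM = mean(pmax(mx - x, 0) * pmax(my - y, 0)),  # small negative X
+  DLPM  = mean(pmax(x - mx, 0) * pmax(my - y, 0))   # small positive X
+)
+round(parts, 3)
+
+# The signed aggregation (1/n covariance) and the correlation are near zero
+signed <- unname(parts["CoUPM"] + parts["CoLPM"] - parts["DLPM"] - parts["DUPM"])
+c(signed = signed, correlation = cor(x, y))
+```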
+
+---
+
+## Asymmetric Dependence
+
+**Asymmetric dependence** refers to dependence that differs between the upper and lower regions of the joint distribution.
+
+Examples include
+
+- financial assets that move together primarily during crashes,
+- economic variables responding differently to positive and negative shocks,
+- risk exposures concentrated in losses.
+
+Directional matrices isolate these effects directly.
+
+For example,
+
+\[
+\operatorname{CoLPM}
+\]
+
+captures joint downside deviations, while
+
+\[
+\operatorname{CoUPM}
+\]
+
+captures joint upside co-movement.
+
+If dependence is concentrated in one region, the directional matrices reveal it even when overall covariance appears modest.
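+
+Downside-only dependence of this kind is easy to simulate. In this sketch two series share a common shock only in an assumed crash regime (\(z<-1\)); the regime definition and shock size are illustrative, not estimates from data. The joint-downside co-partial moment comes out far larger than the joint-upside one, a distinction the single correlation coefficient cannot express.
+
+```{r ch9-asymmetric-dependence}
+# Sketch: two series that share a common shock only in a hypothetical
+# "crash" regime (z < -1, about 16% of draws) and are otherwise independent.
+set.seed(17)
+n     <- 5000
+z     <- rnorm(n)
+crash <- z < -1
+x     <- rnorm(n) + ifelse(crash, 2 * z, 0)
+y     <- rnorm(n) + ifelse(crash, 2 * z, 0)
+mx    <- mean(x); my <- mean(y)
+
+co_lpm <- mean(pmax(mx - x, 0) * pmax(my - y, 0))   # joint downside
+co_upm <- mean(pmax(x - mx, 0) * pmax(y - my, 0))   # joint upside
+
+c(CoLPM = co_lpm, CoUPM = co_upm, correlation = cor(x, y))
+```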
+
+---
+
+## Tail Dependence
+
+Extreme events often drive the most consequential relationships.
+
+Correlation averages dependence across the entire distribution and therefore may understate tail relationships.
+
+Directional co-partial moments of higher order emphasize extreme deviations:
+
+\[
+\operatorname{CoLPM}_{r,s},
+\qquad
+\operatorname{CoUPM}_{r,s}.
+\]
+
+Increasing \(r\) and \(s\) increases sensitivity to extreme observations.
+
+This concept is closely related to **tail dependence in copula theory**, which Chapter 10 examines in detail.
+
+---
+
+## Information Loss in Aggregation
+
+The mapping from directional matrices to covariance is **many-to-one**.
+
+Different directional dependence structures can produce identical covariance values.
+
+Similarly, many covariance matrices produce identical correlation matrices after normalization.
+
+Thus correlation discards substantial structural information about joint distributions.
+
+Directional methods preserve this information by retaining contributions from each directional region separately.
+
+The directional representation is therefore strictly richer:
+
+\[
+\text{directional co-partial moments}
+\rightarrow
+\text{covariance}
+\rightarrow
+\text{correlation}.
+\]
+
+Each arrow aggregates information. Once aggregated, the lost directional structure cannot generally be recovered without additional assumptions.
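+
+The many-to-one mapping can be seen directly in a small sketch, again using a hypothetical helper `directional_moments()` rather than the NNS functions, with mean benchmarks and a simulated right-skewed sample. Reflecting the sample through its mean, \((X,Y)\mapsto(-X,-Y)\), leaves covariance and correlation unchanged but swaps \(\operatorname{CoLPM}\) with \(\operatorname{CoUPM}\) and \(\operatorname{DLPM}\) with \(\operatorname{DUPM}\).
+
+```r
+# Hypothetical helper (not the NNS API); benchmarks default to the means.
+directional_moments <- function(x, y, tx = mean(x), ty = mean(y)) {
+  c(CoLPM = mean(pmax(tx - x, 0) * pmax(ty - y, 0)),
+    CoUPM = mean(pmax(x - tx, 0) * pmax(y - ty, 0)),
+    DLPM  = mean(pmax(x - tx, 0) * pmax(ty - y, 0)),
+    DUPM  = mean(pmax(tx - x, 0) * pmax(y - ty, 0)))
+}
+
+set.seed(2)
+x <- rexp(2000)                            # right-skewed
+y <- x + rnorm(2000, sd = 0.5)
+
+original  <- directional_moments(x, y)
+reflected <- directional_moments(-x, -y)   # reflection through the mean
+
+c(cov(x, y), cov(-x, -y))                  # identical covariance
+c(cor(x, y), cor(-x, -y))                  # identical correlation
+rbind(original, reflected)                 # CoLPM <-> CoUPM and DLPM <-> DUPM swap
+```
+
+The two samples are indistinguishable to covariance and correlation, yet their directional components differ.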
+
+---
+
+## Summary
+
+This chapter examined the limitations of classical correlation and covariance.
+
+Key observations include:
+
+1. Correlation measures only linear association.
+2. Covariance aggregates directional co-deviations across the joint distribution.
+3. Covariance itself arises from directional co-partial moments.
+4. The covariance matrix equals a signed aggregation of directional partial-moment matrices.
+5. Concordant co-partial moment matrices are Gram matrices and are therefore symmetric and positive semidefinite.
+6. Correlation is the normalized version of the covariance aggregate.
+
+Directional statistics therefore provides a richer representation of dependence structure.
+
+The following chapter develops directional dependence measures built from directional co-partial moments.
diff --git a/tools/NNS/book/chapter-10-directional-dependence.Rmd b/tools/NNS/book/chapter-10-directional-dependence.Rmd
new file mode 100644
index 0000000..ad85a3f
--- /dev/null
+++ b/tools/NNS/book/chapter-10-directional-dependence.Rmd
@@ -0,0 +1,387 @@
+# Directional Dependence
+
+Chapter 9 showed that classical covariance and correlation arise from **aggregations of directional co-partial moments**. While correlation summarizes joint variation with a single symmetric statistic, many real-world relationships are **nonlinear, asymmetric, or concentrated in extreme events**.
+
+A familiar example occurs in financial markets. During ordinary periods, many assets appear weakly correlated. Yet during crises, losses often occur simultaneously across markets. Correlation averages across all observations and therefore may fail to capture this type of **asymmetric tail dependence**.
+
+Directional statistics addresses this limitation by examining how variables move relative to **benchmarks for each variable simultaneously**. Instead of collapsing joint behavior into a single number, the directional framework partitions the joint distribution and measures deviations within each region separately.
+
+This chapter develops **directional dependence** using co-partial moments. These statistics preserve the directional structure of the joint distribution and reveal nonlinear and asymmetric relationships that classical correlation can obscure.
+
+---
+
+## Directional Benchmarks
+
+Let \(X\) and \(Y\) be random variables with benchmarks \(t_X\) and \(t_Y\).
+
+Benchmarks may be chosen in several ways depending on the application:
+
+- **Internal benchmarks**, such as the mean or median.
+- **External benchmarks**, such as target returns or policy thresholds.
+- **Context-specific benchmarks**, reflecting operational constraints or decision thresholds.
+
+The benchmarks partition the joint distribution into four directional regions:
+
+\[
+X \le t_X, \quad Y \le t_Y
+\]
+
+\[
+X \le t_X, \quad Y > t_Y
+\]
+
+\[
+X > t_X, \quad Y \le t_Y
+\]
+
+\[
+X > t_X, \quad Y > t_Y
+\]
+
+These four regions represent combinations of directional deviations for the two variables.
+
+| | \(Y \le t_Y\) | \(Y > t_Y\) |
+|---|---|---|
+| \(X \le t_X\) | **CoLPM region** | **DUPM region** |
+| \(X > t_X\) | **DLPM region** | **CoUPM region** |
+
+Each quadrant corresponds directly to one of the four directional co-partial moments.
+
+---
+
+## Co-Partial Moments
+
+Let the positive-part operator be
+
+\[
+(x)_+ = \max(x,0).
+\]
+
+Directional co-partial moments measure joint deviations relative to the benchmarks.
+
+### Co-Lower Partial Moment
+
+\[
+CoLPM_{r,s}(X,Y)
+=
+E[(t_X-X)_+^r (t_Y-Y)_+^s]
+\]
+
+Joint deviations **below both benchmarks**.
+
+### Co-Upper Partial Moment
+
+\[
+CoUPM_{r,s}(X,Y)
+=
+E[(X-t_X)_+^r (Y-t_Y)_+^s]
+\]
+
+Joint deviations **above both benchmarks**.
+
+### Divergent Lower Partial Moment
+
+\[
+DLPM_{r,s}(X,Y)
+=
+E[(X-t_X)_+^r (t_Y-Y)_+^s]
+\]
+
+\(X\) above its benchmark while \(Y\) falls below.
+
+### Divergent Upper Partial Moment
+
+\[
+DUPM_{r,s}(X,Y)
+=
+E[(t_X-X)_+^r (Y-t_Y)_+^s]
+\]
+
+\(X\) below its benchmark while \(Y\) exceeds its benchmark.
+
+Together these four quantities provide a **directional decomposition of joint dependence**.
+
+---
+
+## Worked Example
+
+Consider the sample
+
+\[
+(X,Y) =
+(-3,-2), (-1,-1), (0,1), (2,4), (3,5).
+\]
+
+Let the benchmarks be
+
+\[
+t_X = 0, \quad t_Y = 0.
+\]
+
+Compute first-degree co-partial moments.
+
+### CoLPM
+
+\[
+CoLPM_{1,1}
+=
+\frac{1}{5}(3\cdot2 + 1\cdot1 + 0 + 0 + 0)
+=
+\frac{7}{5}
+=
+1.4
+\]
+
+### CoUPM
+
+\[
+CoUPM_{1,1}
+=
+\frac{1}{5}(0 + 0 + 0 + 2\cdot4 + 3\cdot5)
+=
+\frac{23}{5}
+=
+4.6
+\]
+
+### DLPM
+
+\[
+DLPM_{1,1} = 0
+\]
+
+### DUPM
+
+\[
+DUPM_{1,1} = 0
+\]
+
+The interpretation is immediate.
+
+- Downside dependence exists but is modest.
+- Upside deviations occur together strongly.
+- Divergent movements do not occur.
+
+In this dataset, whenever \(X\) is above its benchmark, \(Y\) is also above its benchmark, and whenever \(X\) is below its benchmark, \(Y\) is also below its benchmark. Consequently, no observation falls into a divergent region.
+
+Boundary observations contribute zero to all co-partial moments. For example, the point \((0,1)\) lies exactly on the \(X\) benchmark, so both \((X-t_X)_+\) and \((t_X-X)_+\) equal zero.
+
+Real datasets rarely exhibit such perfect alignment, and in practice the divergent moments capture regions where one variable rises while the other falls.
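+
+The arithmetic above can be checked in a few lines of base R. This sketch evaluates the definitions directly with `pmax()` rather than calling the NNS functions.
+
+```r
+# Worked-example data and benchmarks t_X = t_Y = 0
+x <- c(-3, -1, 0, 2, 3)
+y <- c(-2, -1, 1, 4, 5)
+tx <- 0; ty <- 0
+
+colpm <- mean(pmax(tx - x, 0) * pmax(ty - y, 0))  # 1.4
+coupm <- mean(pmax(x - tx, 0) * pmax(y - ty, 0))  # 4.6
+dlpm  <- mean(pmax(x - tx, 0) * pmax(ty - y, 0))  # 0
+dupm  <- mean(pmax(tx - x, 0) * pmax(y - ty, 0))  # 0
+c(CoLPM = colpm, CoUPM = coupm, DLPM = dlpm, DUPM = dupm)
+```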
+
+---
+
+## Dependence Versus Correlation
+
+When the benchmarks are the means, \(t_X=E[X]\) and \(t_Y=E[Y]\), covariance aggregates the first-degree directional components:
+
+\[
+Cov(X,Y)
+=
+CoUPM_{1,1}
++
+CoLPM_{1,1}
+-
+DLPM_{1,1}
+-
+DUPM_{1,1}.
+\]
+
+Correlation further standardizes covariance.
+
+\[
+(CoLPM,CoUPM,DLPM,DUPM)
+\rightarrow
+Cov(X,Y)
+\rightarrow
+Corr(X,Y)
+\]
+
+Directional statistics therefore preserves structural information lost through aggregation.
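+
+A quick numerical check of this identity is sketched below, assuming mean benchmarks and first-degree moments and reusing the worked-example data. R's `cov()` divides by \(n-1\), so it is rescaled by \((n-1)/n\) to match the population normalization of the co-partial moments.
+
+```r
+x <- c(-3, -1, 0, 2, 3)
+y <- c(-2, -1, 1, 4, 5)
+tx <- mean(x); ty <- mean(y)   # mean benchmarks
+n <- length(x)
+
+colpm <- mean(pmax(tx - x, 0) * pmax(ty - y, 0))
+coupm <- mean(pmax(x - tx, 0) * pmax(y - ty, 0))
+dlpm  <- mean(pmax(x - tx, 0) * pmax(ty - y, 0))
+dupm  <- mean(pmax(tx - x, 0) * pmax(y - ty, 0))
+
+directional_cov <- coupm + colpm - dlpm - dupm
+population_cov  <- cov(x, y) * (n - 1) / n
+c(directional_cov, population_cov)   # both approximately 5.72
+```
+
+With the zero benchmarks of the worked example, the same signed sum is \(4.6+1.4=6\), which is the raw cross-moment \(E[XY]\) rather than the covariance; this is why the identity is stated with mean benchmarks.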
+
+---
+
+## Nonlinear Dependence Detection
+
+Directional dependence can reveal nonlinear relationships that correlation cannot detect.
+
+Consider
+
+\[
+Y = X^2
+\]
+
+with \(X\) symmetrically distributed around zero.
+
+In this case
+
+\[
+Corr(X,Y)=0.
+\]
+
+Despite zero correlation, the variables are perfectly dependent.
+
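+Directional moments reveal the structure. Take benchmarks \(t_X=t_Y=0\).
+
+When \(X>0\), both \(X\) and \(Y=X^2\) exceed their benchmarks, producing contributions to the **CoUPM region**.
+
+When \(X<0\), \(X\) lies below its benchmark while \(Y=X^2\) remains positive and therefore above its benchmark. These observations fall into the **DUPM region**, capturing the mirrored dependence structure. By symmetry, \(CoUPM_{1,1}=DUPM_{1,1}=E[X^3\,\mathbf{1}\{X>0\}]>0\), while \(CoLPM_{1,1}=DLPM_{1,1}=0\) because \(Y\) never falls below zero. Since \(t_X=E[X]=0\), the signed aggregation \(CoUPM_{1,1}-DUPM_{1,1}\) equals \(E[XY]=Cov(X,Y)=0\): two equal, strictly positive directional components cancel.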
+
+Thus the directional decomposition exposes dependence that the symmetric aggregation in correlation cancels.
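+
+A short simulation shows the two nonzero regions and their cancellation. It assumes standard normal draws made exactly symmetric by pairing each draw with its negative, and it uses `pmax()` directly rather than the NNS package.
+
+```r
+set.seed(3)
+z <- rnorm(5000)
+x <- c(z, -z)                             # symmetric about zero by construction
+y <- x^2
+
+up_x <- pmax(x, 0); lo_x <- pmax(-x, 0)   # benchmark t_X = 0
+up_y <- pmax(y, 0); lo_y <- pmax(-y, 0)   # benchmark t_Y = 0
+
+coupm <- mean(up_x * up_y)
+dupm  <- mean(lo_x * up_y)
+colpm <- mean(lo_x * lo_y)                # 0: Y is never below zero
+dlpm  <- mean(up_x * lo_y)                # 0: Y is never below zero
+
+cor(x, y)                                 # approximately 0
+c(CoUPM = coupm, DUPM = dupm)             # equal and clearly positive
+coupm + colpm - dlpm - dupm               # approximately 0
+```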
+
+---
+
+## Asymmetric Dependence
+
+Many systems exhibit **asymmetric dependence**, where relationships differ between positive and negative deviations.
+
+Financial markets provide a common example.
+
+Assets may behave largely independently during rising markets but move strongly together during market crashes.
+
+In such cases
+
+\[
+CoLPM_{1,1} > CoUPM_{1,1}.
+\]
+
+A replicable simulation makes this asymmetry explicit. In the construction below, **negative shocks are shared** between both variables, while positive-side behavior is generated independently:
+
+```r
+library(NNS)
+set.seed(42)
+n <- 500
+shock <- rnorm(n)
+
+x <- ifelse(shock < 0, shock, rnorm(n))
+y <- ifelse(shock < 0, shock + rnorm(n, 0, 0.1), rnorm(n))
+
+Co.LPM(1, x, y, mean(x), mean(y))
+## [1] 0.2770795
+Co.UPM(1, x, y, mean(x), mean(y))
+## [1] 0.2103299
+D.LPM(1, 1, x, y, mean(x), mean(y))
+## [1] 0.06191035
+D.UPM(1, 1, x, y, mean(x), mean(y))
+## [1] 0.08481611
+```
+
+In typical runs, the concordant downside component exceeds the upside component, confirming that joint downside co-movement dominates while upside dependence remains weaker.
+
+Correlation averages across all regions and may therefore appear moderate even when downside dependence dominates.
+
+---
+
+## Tail-Sensitive Dependence
+
+Higher-order co-partial moments emphasize extreme deviations.
+
+Increasing the orders \(r\) and \(s\) increases sensitivity to large observations.
+
+\[
+CoLPM_{r,s}, \quad CoUPM_{r,s}
+\]
+
+therefore act as **tail-sensitive dependence measures**.
+
+For example, a risk manager concerned with extreme joint losses may examine \(CoLPM_{2,2}\) or higher orders rather than \(CoLPM_{1,1}\), since larger powers place greater weight on large deviations.
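+
+The effect of the order can be made concrete with the worked-example data and zero benchmarks. The base-R sketch below tracks what share of \(CoLPM_{r,r}\) comes from the single most extreme joint loss, \((-3,-2)\), as the order grows.
+
+```r
+# Worked-example data with benchmarks t_X = t_Y = 0
+x <- c(-3, -1, 0, 2, 3)
+y <- c(-2, -1, 1, 4, 5)
+
+orders <- 1:4
+# Share of CoLPM_{r,r} contributed by the largest per-observation term,
+# which comes from the observation (-3, -2)
+extreme_share <- sapply(orders, function(r) {
+  terms <- pmax(0 - x, 0)^r * pmax(0 - y, 0)^r   # per-observation CoLPM terms
+  max(terms) / sum(terms)
+})
+round(extreme_share, 3)   # 0.857 0.973 0.995 0.999
+```
+
+The extreme point contributes \(6^r/(6^r+1)\) of the total, so by order 4 it accounts for essentially all of the lower co-partial moment.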
+
+---
+
+## Empirical Estimation
+
+Directional co-partial moments can be estimated from data.
+
+For observations
+
+\[
+(x_i,y_i),\quad i=1,\dots,n,
+\]
+
+the empirical estimators are
+
+\[
+\widehat{CoLPM}_{r,s}
+=
+\frac{1}{n}
+\sum_{i=1}^{n}
+(t_X-x_i)_+^r (t_Y-y_i)_+^s
+\]
+
+\[
+\widehat{CoUPM}_{r,s}
+=
+\frac{1}{n}
+\sum_{i=1}^{n}
+(x_i-t_X)_+^r (y_i-t_Y)_+^s
+\]
+
+\[
+\widehat{DLPM}_{r,s}
+=
+\frac{1}{n}
+\sum_{i=1}^{n}
+(x_i-t_X)_+^r (t_Y-y_i)_+^s
+\]
+
+\[
+\widehat{DUPM}_{r,s}
+=
+\frac{1}{n}
+\sum_{i=1}^{n}
+(t_X-x_i)_+^r (y_i-t_Y)_+^s
+\]
+
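+For i.i.d. observations, fixed benchmarks, and finite population co-partial moments, these estimators converge to their population values by the law of large numbers.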
+
+Statistical inference for these estimators—including bootstrap procedures—is discussed later in the book when directional dependence measures are applied in empirical analysis.
+
+In practice these quantities are implemented in the **Nonlinear Nonparametric Statistics (NNS)** framework, which computes empirical co-partial moment matrices and nonlinear dependence measures.
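+
+The estimators translate directly into code. The function below is a hypothetical base-R helper written from the formulas above, not the NNS implementation. The loop illustrates the law of large numbers in a case with a known answer: for independent standard normals with zero benchmarks, symmetry and independence make each of the four first-degree components equal to \(E[(X)_+]\,E[(Y)_+]=1/(2\pi)\).
+
+```r
+# Hypothetical helper written from the estimator formulas; not the NNS API.
+# r is the order applied to x, s the order applied to y.
+co_partial_moments <- function(x, y, tx, ty, r = 1, s = 1) {
+  up_x <- pmax(x - tx, 0); lo_x <- pmax(tx - x, 0)
+  up_y <- pmax(y - ty, 0); lo_y <- pmax(ty - y, 0)
+  c(CoLPM = mean(lo_x^r * lo_y^s),
+    CoUPM = mean(up_x^r * up_y^s),
+    DLPM  = mean(up_x^r * lo_y^s),
+    DUPM  = mean(lo_x^r * up_y^s))
+}
+
+# Independent standard normals, zero benchmarks: every first-degree
+# population component equals 1 / (2 * pi), about 0.159.
+set.seed(4)
+for (n in c(100L, 10000L, 1000000L)) {
+  est <- co_partial_moments(rnorm(n), rnorm(n), tx = 0, ty = 0)
+  cat(sprintf("n = %7d   max abs error = %.4f\n",
+              n, max(abs(est - 1 / (2 * pi)))))
+}
+```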
+
+---
+
+## Directional Dependence Profiles
+
+Directional dependence can be studied across **multiple moment orders**.
+
+\[
+CoLPM_{1,1},CoLPM_{2,2},CoLPM_{3,3},\dots
+\]
+
+\[
+CoUPM_{1,1},CoUPM_{2,2},CoUPM_{3,3},\dots
+\]
+
+These sequences describe how dependence changes across the distribution.
+
+Example (illustrative values):
+
+| Order | CoLPM | CoUPM |
+|---|---|---|
+| 1 | 1.2 | 1.1 |
+| 2 | 3.9 | 1.4 |
+| 3 | 8.5 | 1.8 |
+
+Moderate deviations appear symmetric, but higher orders reveal increasing **downside dependence**.
+
+Directional profiles therefore show **how dependence evolves from ordinary fluctuations to extreme events**.
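+
+A profile can be computed by evaluating the co-partial moments at several orders. The sketch below assumes mean benchmarks and equal orders \(r=s\), uses a hypothetical helper `dependence_profile()` (not the NNS API), and applies it to the shared-downside-shock simulation from the asymmetric-dependence example.
+
+```r
+# Hypothetical helper (not the NNS API): CoLPM_{r,r} and CoUPM_{r,r}
+# about mean benchmarks for each order r in `orders`.
+dependence_profile <- function(x, y, orders = 1:3) {
+  lo_x <- pmax(mean(x) - x, 0); up_x <- pmax(x - mean(x), 0)
+  lo_y <- pmax(mean(y) - y, 0); up_y <- pmax(y - mean(y), 0)
+  data.frame(
+    order = orders,
+    CoLPM = sapply(orders, function(r) mean(lo_x^r * lo_y^r)),
+    CoUPM = sapply(orders, function(r) mean(up_x^r * up_y^r))
+  )
+}
+
+# Shared-downside-shock construction from the asymmetric-dependence example
+set.seed(42)
+n <- 500
+shock <- rnorm(n)
+x <- ifelse(shock < 0, shock, rnorm(n))
+y <- ifelse(shock < 0, shock + rnorm(n, 0, 0.1), rnorm(n))
+
+dep_profile <- dependence_profile(x, y)
+dep_profile$ratio <- dep_profile$CoLPM / dep_profile$CoUPM  # > 1: downside dominates
+dep_profile
+```
+
+Because \(CoLPM_{r,r}\) and \(CoUPM_{r,r}\) share units at each order, their ratio is a convenient scale-free summary of which side dominates at that order.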
+
+---
+
+## Summary
+
+This chapter introduced directional dependence using co-partial moments.
+
+Key ideas:
+
+1. Joint distributions partition into four directional regions.
+2. Co-partial moments measure deviations within those regions.
+3. Covariance and correlation arise as **aggregations** of directional components.
+4. Directional statistics reveals nonlinear, asymmetric, and tail-specific dependence.
+5. Empirical estimators can be computed directly from data.
+6. Dependence profiles show how relationships evolve across deviation magnitudes.
+
+Correlation therefore represents only a limited summary of joint behavior.
+
+Directional dependence provides a richer representation of relationships between variables.
+
+The next chapter carries this framework one step further downstream of covariance, to its **eigenvalue decomposition**, and shows how directional components attribute the classical principal components.
diff --git a/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd b/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd
new file mode 100644
index 0000000..d4ac9a3
--- /dev/null
+++ b/tools/NNS/book/chapter-11-directional-spectral-decomposition.Rmd
@@ -0,0 +1,1291 @@
+# Directional Spectral Decomposition
+
+Chapters 9 and 10 developed dependence structure using directional co-partial moments.
+Chapter 9 showed that covariance and correlation arise from aggregations of directional co-partial moments.
+Chapter 10 showed that those directional components reveal nonlinear, asymmetric, and tail-specific dependence that correlation cannot detect.
+
+One further classical object lies downstream of covariance: its **eigenvalue decomposition**.
+
+Principal component analysis, factor models, covariance ellipses, and multivariate risk diagnostics all begin from the eigensystem of the covariance matrix.
+Since covariance itself is recovered from directional co-partial moment matrices, the eigensystem is also recoverable from those directional components.
+
+This chapter establishes that result and draws out its consequence:
+
+\[
+\text{PCA diagonalizes covariance.
+Directional decomposition explains where that covariance came from.}
+\]
+
+The eigensystem is not replaced.
+It is attributed.
+
+---
+
+## Classical Spectral Decomposition
+
+Let
+
+\[
+Z =
+\begin{pmatrix}
+X \\
+Y
+\end{pmatrix}
+\]
+
+be a bivariate random vector with mean
+
+\[
+\mu = E[Z]
+=
+\begin{pmatrix}
+\mu_X \\
+\mu_Y
+\end{pmatrix}.
+\]
+
+The covariance matrix is
+
+\[
+\Sigma
+=
+E[(Z-\mu)(Z-\mu)^\top].
+\]
+
+Since \(\Sigma\) is symmetric and positive semidefinite, it admits an orthonormal eigendecomposition
+
+\[
+\Sigma
+=
+V\Lambda V^\top,
+\]
+
+where \(V = (v_1, v_2)\) contains orthonormal eigenvectors and
+
+\[
+\Lambda =
+\begin{pmatrix}
+\lambda_1 & 0 \\
+0 & \lambda_2
+\end{pmatrix}
+\]
+
+contains eigenvalues with \(\lambda_1 \geq \lambda_2 \geq 0\).
+
+Classical PCA identifies \(v_1\) as the direction of maximum variance:
+
+\[
+v_1
+=
+\arg\max_{\|v\|=1} v^\top \Sigma v,
+\qquad
+\lambda_1 = v_1^\top \Sigma v_1.
+\]
+
+This is a powerful summary, but it remains a symmetric aggregate.
+It does not say whether the variance along \(v_1\) originated from concordant lower-side co-movement, concordant upper-side co-movement, divergent behavior, or residual scatter within directional regions.
+
+Directional spectral decomposition answers that question.
+
+---
+
+## Directional Recovery of the Eigensystem
+
+Chapter 9 established that the covariance matrix is recovered from directional co-partial moment matrices:
+
+\[
+\Sigma
+=
+\operatorname{CoLPM}
++
+\operatorname{CoUPM}
+-
+\operatorname{DLPM}
+-
+\operatorname{DUPM}.
+\]
+
+Since the covariance matrix determines its eigensystem, the classical eigensystem is also recovered from the directional aggregate:
+
+\[
+(\lambda_i, v_i)
+=
+\operatorname{eig}_i
+\!\left(
+\operatorname{CoLPM}
++
+\operatorname{CoUPM}
+-
+\operatorname{DLPM}
+-
+\operatorname{DUPM}
+\right).
+\]
+
+The information hierarchy therefore extends to
+
+\[
+(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM})
+\rightarrow
+\Sigma
+\rightarrow
+(\lambda_i, v_i)
+\rightarrow
+\rho.
+\]
+
+The ordering is not symmetric.
+The directional matrices determine the covariance matrix and its eigenstructure.
+The eigenstructure does not determine the directional matrices.
+Many different directional structures can produce the same covariance matrix, and many different covariance matrices can produce similar principal directions.
+Once the directional components are aggregated into \(\Sigma\), the information about where the co-movement occurred is generally lost.
+
+This asymmetry is central:
+
+\[
+\text{Directional structure recovers PCA.
+PCA does not recover directional structure.}
+\]
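+
+The recovery can be verified numerically. The sketch below assumes a simulated skewed sample and mean benchmarks, builds the four directional co-partial moment matrices from upper and lower deviation matrices, aggregates them, and compares the resulting eigensystem with that of the population covariance. It uses base R only; entry \([i,j]\) of the \(\operatorname{DLPM}\) matrix is taken to pair the upper deviation of variable \(i\) with the lower deviation of variable \(j\), an indexing convention assumed here.
+
+```r
+set.seed(5)
+n <- 1000
+x <- rnorm(n)
+y <- 0.5 * x + rexp(n)           # skewed, positively dependent
+Z <- cbind(x, y)
+
+Zc <- sweep(Z, 2, colMeans(Z))   # deviations from the mean benchmarks
+U <- pmax(Zc, 0)                 # upper deviations (Z - t)_+
+L <- pmax(-Zc, 0)                # lower deviations (t - Z)_+
+
+CoLPM <- crossprod(L) / n
+CoUPM <- crossprod(U) / n
+DLPM  <- crossprod(U, L) / n     # entry [i, j]: Z_i above, Z_j below
+DUPM  <- crossprod(L, U) / n     # entry [i, j]: Z_i below, Z_j above
+
+Sigma_dir <- CoLPM + CoUPM - DLPM - DUPM
+Sigma_pop <- crossprod(Zc) / n
+
+e_dir <- eigen(Sigma_dir, symmetric = TRUE)
+e_pop <- eigen(Sigma_pop, symmetric = TRUE)
+e_dir$values - e_pop$values                  # zero up to rounding
+abs(colSums(e_dir$vectors * e_pop$vectors))  # 1, 1: same axes up to sign
+```
+
+Since \(U-L\) is the centered data matrix, the aggregate equals \((U-L)^\top(U-L)/n\) exactly, which is why the two eigensystems agree.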
+
+---
+
+## Quadrant Mean Geometry
+
+There is a second route to the eigensystem, more geometric than the co-partial moment reconstruction.
+It passes through the **conditional means of the four directional quadrants**.
+
+Let the benchmarks be the component means:
+
+\[
+t_X = \mu_X,
+\qquad
+t_Y = \mu_Y.
+\]
+
+The four directional quadrants are
+
+| Region | Condition | Interpretation |
+|---|---|---|
+| CUPM | \(X > \mu_X,\; Y > \mu_Y\) | concordant upper |
+| CLPM | \(X \leq \mu_X,\; Y \leq \mu_Y\) | concordant lower |
+| DLPM | \(X > \mu_X,\; Y \leq \mu_Y\) | divergent lower |
+| DUPM | \(X \leq \mu_X,\; Y > \mu_Y\) | divergent upper |
+
+Let \(Q\) denote the quadrant that contains \(Z\). For each quadrant \(q\), define the quadrant probability
+
+\[
+p_q = P(Q = q),
+\]
+
+the quadrant conditional mean
+
+\[
+m_q = E[Z \mid Q = q],
+\]
+
+and the centered quadrant mean displacement
+
+\[
+u_q = m_q - \mu.
+\]
+
+Because the quadrants partition the sample space,
+
+\[
+\mu = \sum_q p_q m_q,
+\]
+
+which gives
+
+\[
+\sum_q p_q u_q = 0.
+\]
+
+This is the law of total expectation in vector form.
+It implies that the weighted quadrant mean displacements must balance around the global mean.
+
+The displacement vectors \(u_q\) are the geometric objects of interest.
+Each one points from the global mean to a quadrant conditional mean.
+They identify where the conditional mass of the distribution sits after the directional partition.
+
+---
+
+## Between-Within Covariance Decomposition
+
+The covariance matrix decomposes exactly through the quadrant partition.
+
+Inside quadrant \(q\), write
+
+\[
+Z - \mu
+=
+(m_q - \mu) + (Z - m_q)
+=
+u_q + \varepsilon_q,
+\]
+
+where \(E[\varepsilon_q \mid Q = q] = 0\).
+
+Because \(E[\varepsilon_q \mid Q = q] = 0\), the cross terms vanish, so the second moment about the global mean within quadrant \(q\) is
+
+\[
+E[(Z-\mu)(Z-\mu)^\top \mid Q = q]
+=
+u_q u_q^\top
++
+\operatorname{Cov}(Z \mid Q = q).
+\]
+
+Averaging across quadrants gives
+
+\[
+\boxed{
+\Sigma
+=
+\underbrace{\sum_q p_q u_q u_q^\top}_{\Sigma_Q}
++
+\underbrace{\sum_q p_q \operatorname{Cov}(Z \mid Q = q)}_{\Sigma_W}.
+}
+\]
+
+The first term, \(\Sigma_Q\), is the **between-quadrant covariance**: how much covariance arises from the locations of the quadrant conditional means relative to the global mean.
+
+The second term, \(\Sigma_W\), is the **within-quadrant covariance**: the remaining scatter around each quadrant mean, pooled across quadrants.
+
+This identity is the law of total covariance applied to the NNS quadrant partition.
+It is exact and requires no distributional assumptions beyond finite second moments.
+
+The two terms answer different questions.
+
+\[
+\text{PCA of }\Sigma\text{ is PCA of total covariance.}
+\]
+
+\[
+\text{PCA of }\Sigma_Q\text{ is PCA of conditional mean displacement.}
+\]
+
+These need not coincide.
+
+---
+
+## Rank-One Spectral Primitives
+
+Each quadrant contributes a rank-one matrix to the between-quadrant covariance:
+
+\[
+B_q
+=
+p_q u_q u_q^\top.
+\]
+
+When \(u_q \neq 0\),
+
+\[
+B_q u_q
+=
+p_q u_q u_q^\top u_q
+=
+p_q \|u_q\|^2 u_q.
+\]
+
+The vector \(u_q\) is the nonzero eigenvector of \(B_q\), with eigenvalue
+
+\[
+\lambda_q = p_q \|u_q\|^2.
+\]
+
+After normalization, \(v_q = u_q / \|u_q\|\) is the corresponding unit eigenvector.
+
+This is the precise sense in which each quadrant mean displacement is spectral.
+It is not generally an eigenvector of the full covariance matrix.
+It is exactly the eigenvector of its own rank-one contribution to \(\Sigma_Q\).
+
+The between-quadrant covariance is the sum of these rank-one primitives:
+
+\[
+\Sigma_Q
+=
+B_{\operatorname{CUPM}}
++B_{\operatorname{CLPM}}
++B_{\operatorname{DLPM}}
++B_{\operatorname{DUPM}}.
+\]
+
+Defining the matrix of weighted displacement columns,
+
+\[
+C
+=
+\begin{pmatrix}
+\sqrt{p_{\operatorname{CUPM}}}\,u_{\operatorname{CUPM}} &
+\sqrt{p_{\operatorname{CLPM}}}\,u_{\operatorname{CLPM}} &
+\sqrt{p_{\operatorname{DLPM}}}\,u_{\operatorname{DLPM}} &
+\sqrt{p_{\operatorname{DUPM}}}\,u_{\operatorname{DUPM}}
+\end{pmatrix},
+\]
+
+one has \(\Sigma_Q = CC^\top\).
+The eigenvectors of \(\Sigma_Q\) are the left singular vectors of \(C\), built entirely from weighted quadrant mean displacements.
+
+\[
+\boxed{
+\text{Centered NNS quadrant means are rank-one spectral primitives.}
+}
+\]
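+
+A short check of the rank-one eigen relation, using the illustrative values \(p_q=0.3\) and \(u_q=(0.8,0.5)^\top\) chosen only for this example:
+
+```r
+# Illustrative quadrant: probability p_q and centered mean displacement u_q
+p_q <- 0.3
+u_q <- c(0.8, 0.5)
+
+B_q <- p_q * tcrossprod(u_q)            # rank-one primitive p_q u_q u_q^T
+lhs <- drop(B_q %*% u_q)                # B_q u_q
+rhs <- p_q * sum(u_q^2) * u_q           # p_q ||u_q||^2 u_q
+all.equal(lhs, rhs)                     # TRUE
+
+e <- eigen(B_q, symmetric = TRUE)
+e$values                                # 0.267 and (numerically) 0
+abs(sum(e$vectors[, 1] * u_q)) / sqrt(sum(u_q^2))   # 1: leading eigenvector is u_q / ||u_q||
+```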
+
+---
+
+## Recovering Eigenvectors from Quadrant Conditional Means
+
+The preceding section gives the local rank-one statement.
+We now make the recovery step explicit.
+
+At a given NNS split, the only inputs needed for the between-quadrant eigensystem are the quadrant probabilities and the quadrant conditional means:
+
+\[
+\{p_q, m_q\}_{q \in \{\operatorname{CUPM},\operatorname{CLPM},\operatorname{DLPM},\operatorname{DUPM}\}}.
+\]
+
+From these quantities,
+
+\[
+\mu = \sum_q p_q m_q,
+\qquad
+u_q = m_q-\mu.
+\]
+
+Construct the weighted conditional-mean matrix
+
+\[
+C =
+\begin{pmatrix}
+\sqrt{p_{\operatorname{CUPM}}}u_{\operatorname{CUPM}} &
+\sqrt{p_{\operatorname{CLPM}}}u_{\operatorname{CLPM}} &
+\sqrt{p_{\operatorname{DLPM}}}u_{\operatorname{DLPM}} &
+\sqrt{p_{\operatorname{DUPM}}}u_{\operatorname{DUPM}}
+\end{pmatrix}.
+\]
+
+Then
+
+\[
+\Sigma_Q = CC^\top.
+\]
+
+Thus the eigenvectors of \(\Sigma_Q\) are recovered from the quadrant conditional means by
+
+\[
+\Sigma_Q v_j = \lambda_j v_j.
+\]
+
+Equivalently, take the singular value decomposition
+
+\[
+C = U\Lambda_Q^{1/2}R^\top.
+\]
+
+Then
+
+\[
+\Sigma_Q = CC^\top = U\Lambda_Q U^\top,
+\]
+
+so the columns of \(U\) are the eigenvectors of the conditional-mean covariance.
+No raw observations are needed at this step once the quadrant conditional means and probabilities have been computed.
+
+This is the exact role of the conditional means:
+
+\[
+\boxed{
+\{p_q,m_q\}_q
+\quad \Longrightarrow \quad
+\{u_q\}_q
+\quad \Longrightarrow \quad
+C
+\quad \Longrightarrow \quad
+\Sigma_Q
+\quad \Longrightarrow \quad
+(v_{Q,j},\lambda_{Q,j}).
+}
+\]
+
+The eigenvectors are therefore not imposed externally.
+They are recovered from the geometry of the four quadrant conditional means.
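+
+The recovery chain can be sketched in base R. The data are simulated only to produce the summaries \(\{p_q,m_q\}\); every later step uses just the quadrant probabilities and conditional means. Quadrant labels follow the table above, with ties at the mean assigned to the lower side.
+
+```r
+set.seed(6)
+n <- 2000
+x <- rnorm(n)
+y <- 0.7 * x + rnorm(n, sd = 0.6)
+
+# Step 1 (uses the data once): quadrant labels, probabilities, conditional means
+quad <- ifelse(x > mean(x), ifelse(y > mean(y), "CUPM", "DLPM"),
+                            ifelse(y > mean(y), "DUPM", "CLPM"))
+tab <- table(quad)
+p <- as.vector(tab) / n
+names(p) <- names(tab)
+m <- rbind(x = tapply(x, quad, mean)[names(p)],   # column q holds m_q
+           y = tapply(y, quad, mean)[names(p)])
+
+# Step 2 (uses only p and m): global mean, displacements, C, and its SVD
+mu <- drop(m %*% p)                  # mu = sum_q p_q m_q
+u  <- sweep(m, 1, mu)                # column q holds u_q = m_q - mu
+C  <- sweep(u, 2, sqrt(p), `*`)      # column q holds sqrt(p_q) u_q
+Sigma_Q <- C %*% t(C)
+
+s <- svd(C)
+e <- eigen(Sigma_Q, symmetric = TRUE)
+rbind(svd = s$d^2, eigen = e$values)       # identical eigenvalues
+abs(colSums(s$u * e$vectors))              # 1, 1: same eigenvectors up to sign
+```
+
+The squared singular values of \(C\) coincide with the eigenvalues of \(\Sigma_Q=CC^\top\), and the left singular vectors coincide with its eigenvectors up to sign.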
+
+A second, more visual representation comes from opposite quadrant centroid contrasts:
+
+\[
+a_C = m_{\operatorname{CUPM}} - m_{\operatorname{CLPM}},
+\qquad
+a_D = m_{\operatorname{DLPM}} - m_{\operatorname{DUPM}}.
+\]
+
+The first contrast joins the two concordant conditional means.
+The second contrast joins the two divergent conditional means.
+After normalization,
+
+\[
+\tilde v_C = \frac{a_C}{\|a_C\|},
+\qquad
+\tilde v_D = \frac{a_D}{\|a_D\|}.
+\]
+
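+When the two concordant displacements \(u_{\operatorname{CUPM}}\) and \(u_{\operatorname{CLPM}}\) lie on one line through \(\mu\), the two divergent displacements \(u_{\operatorname{DLPM}}\) and \(u_{\operatorname{DUPM}}\) lie on another, and these two lines are orthogonal, the contrast directions are the eigenvectors of \(\Sigma_Q\):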
+
+\[
+v_{Q,1} = \tilde v_C,
+\qquad
+v_{Q,2} = \tilde v_D,
+\]
+
+up to signs and ordering.
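+This alignment holds exactly, for example, for continuous distributions that are centrally symmetric about \(\mu\) and exchangeable in their two coordinates, and it may hold approximately in nearly elliptical dependence structures, but it is not required for recovery.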
+Without this special alignment, the eigenvectors of \(\Sigma_Q\) are the orthogonal principal axes obtained by diagonalizing \(CC^\top\); they remain weighted linear combinations of the same quadrant conditional mean displacements.
+
+The distinction is important:
+
+\[
+\boxed{
+\text{Individual }u_q\text{ are eigenvectors of their own }B_q=p_qu_qu_q^\top.
+}
+\]
+
+\[
+\boxed{
+\text{The full between-quadrant eigenvectors are recovered by summing those }B_q\text{ through }\Sigma_Q=CC^\top.
+}
+\]
+
+For the original PCA eigensystem of the full covariance matrix, add the within-quadrant residual covariance:
+
+\[
+\Sigma = \Sigma_Q+\Sigma_W.
+\]
+
+Then diagonalizing \(\Sigma\) recovers the classical eigenvectors.
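+As recursive NNS partitions refine and terminal cells shrink, \(\Sigma_W\) cannot increase in the positive semidefinite order: each split moves part of a cell's within-cell covariance into the between-cell term.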
+At the finite-sample singleton limit, \(\Sigma_W=0\), so the eigenvectors of the full covariance are recovered entirely from conditional means of the terminal regions.
+
+---
+
+## Quadrant Mean Slope Versus PC1 Within a Quadrant
+
+The between-within decomposition clarifies a distinction that arises when analyzing a single directional quadrant.
+
+Consider the CLPM region.
+The CLPM mean displacement is
+
+\[
+u_{\operatorname{CLPM}} = m_{\operatorname{CLPM}} - \mu.
+\]
+
+The line from \(\mu\) through \(m_{\operatorname{CLPM}}\) has direction
+
+\[
+g_{\operatorname{CLPM}}
+=
+\frac{u_{\operatorname{CLPM}}}{\|u_{\operatorname{CLPM}}\|}.
+\]
+
+This is the eigenvector of the rank-one matrix \(B_{\operatorname{CLPM}}\).
+
+By contrast, the first principal component of the CLPM observations is the leading eigenvector of
+
+\[
+\operatorname{Cov}(Z \mid \operatorname{CLPM}).
+\]
+
+These are different matrices computed from different objects.
+The quadrant mean slope is a between-centroid displacement direction.
+The within-quadrant PC1 is a within-quadrant scatter direction.
+The within-quadrant regression line is a conditional least-squares direction.
+
+\[
+\text{quadrant mean slope}
+\neq
+\text{within-quadrant PC1}
+\neq
+\text{within-quadrant regression line.}
+\]
+
+For directional dependence analysis, the quadrant mean slope is the relevant object because it describes where conditional mass moved relative to the global mean benchmark.
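+
+The three directions can be compared on simulated data. The sketch below assumes a linear Gaussian relationship, takes the CLPM region at the sample means, and reports each direction as a slope \(\Delta y/\Delta x\); the helper `unit()` and the variable names are illustrative.
+
+```r
+set.seed(10)
+n <- 2000
+x <- rnorm(n)
+y <- 0.6 * x + rnorm(n, sd = 0.6)
+mu <- c(mean(x), mean(y))
+
+in_clpm <- x <= mu[1] & y <= mu[2]
+xq <- x[in_clpm]; yq <- y[in_clpm]
+
+unit <- function(v) v / sqrt(sum(v^2))
+
+# 1. Quadrant mean slope: direction from mu to the CLPM conditional mean
+g_clpm <- unit(c(mean(xq), mean(yq)) - mu)
+
+# 2. Within-quadrant PC1: leading eigenvector of Cov(Z | CLPM)
+pc1 <- eigen(cov(cbind(xq, yq)), symmetric = TRUE)$vectors[, 1]
+
+# 3. Within-quadrant regression line of y on x: direction (1, slope)
+reg <- unit(c(1, unname(coef(lm(yq ~ xq))[2])))
+
+slopes <- c(mean_slope = g_clpm[2] / g_clpm[1],
+            pc1        = pc1[2] / pc1[1],
+            regression = reg[2] / reg[1])
+slopes   # three slopes computed from three different objects
+```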
+
+---
+
+## Eigenvalue Attribution
+
+The most useful result is not merely that the eigensystem is recoverable.
+It is that each eigenvalue can be attributed to directional sources.
+
+Let \(v_i\) be a unit eigenvector of \(\Sigma\).
+Then
+
+\[
+\lambda_i = v_i^\top \Sigma v_i.
+\]
+
+Substituting the between-within decomposition,
+
+\[
+\lambda_i
+=
+v_i^\top \Sigma_Q v_i
++
+v_i^\top \Sigma_W v_i.
+\]
+
+Expanding each term,
+
+\[
+\boxed{
+\lambda_i
+=
+\sum_q p_q (v_i^\top u_q)^2
++
+\sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i.
+}
+\]
+
+The first sum is the **between-quadrant contribution**: each term measures how much the \(i\)-th principal direction aligns with the conditional mean displacement of quadrant \(q\), weighted by the quadrant's probability.
+
+The second sum is the **within-quadrant contribution**: residual scatter around each quadrant mean, projected onto the principal direction.
+
+Define
+
+\[
+\lambda_{i,Q} = \sum_q p_q (v_i^\top u_q)^2,
+\qquad
+\lambda_{i,W} = \sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i,
+\]
+
+so that
+
+\[
+\lambda_i = \lambda_{i,Q} + \lambda_{i,W}.
+\]
+
+At the quadrant level, the between contribution from quadrant \(q\) to eigenvalue \(i\) is
+
+\[
+\lambda_{i,q}^{\text{between}} = p_q (v_i^\top u_q)^2.
+\]
+
+This answers a question ordinary PCA does not pose:
+
+\[
+\boxed{
+\text{Which directional regions of the joint distribution generated this eigenvalue?}
+}
+\]
+
+If most of \(\lambda_1\) arises from CLPM and CUPM terms, the leading principal direction is driven by concordant co-movement.
+If most of \(\lambda_1\) arises from DLPM and DUPM terms, the leading direction is driven by divergent behavior.
+If \(\Sigma_W\) dominates, the principal axis reflects residual within-region scatter rather than separation among conditional means.
+
+PCA reports the axis.
+Directional decomposition reports the sources of that axis.
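+
+The attribution can be computed directly. The sketch below assumes a simulated relationship that is steeper for negative \(X\), uses population normalization, and splits \(\lambda_1\) into between and within contributions by quadrant; the helper `pop_cov()` here is illustrative and not part of NNS.
+
+```r
+set.seed(7)
+n <- 2000
+x <- rnorm(n)
+y <- ifelse(x < 0, 0.9 * x, 0.3 * x) + rnorm(n, sd = 0.4)  # steeper for negative x
+Z <- cbind(x, y)
+
+pop_cov <- function(A) crossprod(sweep(A, 2, colMeans(A))) / nrow(A)
+
+mu <- colMeans(Z)
+quad <- ifelse(x > mu[1], ifelse(y > mu[2], "CUPM", "DLPM"),
+                          ifelse(y > mu[2], "DUPM", "CLPM"))
+
+e  <- eigen(pop_cov(Z), symmetric = TRUE)
+v1 <- e$vectors[, 1]                     # leading principal direction
+
+# Between and within contributions of each quadrant to lambda_1
+attribution <- sapply(sort(unique(quad)), function(q) {
+  Zq  <- Z[quad == q, , drop = FALSE]
+  p_q <- nrow(Zq) / n
+  u_q <- colMeans(Zq) - mu
+  c(between = p_q * sum(v1 * u_q)^2,
+    within  = p_q * drop(t(v1) %*% pop_cov(Zq) %*% v1))
+})
+round(attribution / e$values[1], 3)      # shares of lambda_1 by source and quadrant
+sum(attribution) - e$values[1]           # zero up to rounding
+```
+
+Replacing `v1` with `e$vectors[, 2]` attributes \(\lambda_2\) in the same way.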
+
+---
+
+## Two-Dimensional Explicit Recovery
+
+In two dimensions the eigensystem is recovered in closed form from the directional components.
+
+Let the reconstructed covariance matrix be
+
+\[
+\Sigma
+=
+\begin{pmatrix}
+a & b \\
+b & d
+\end{pmatrix}.
+\]
+
+The eigenvalues are
+
+\[
+\lambda_{1,2}
+=
+\frac{a+d}{2}
+\pm
+\sqrt{
+\left(\frac{a-d}{2}\right)^2 + b^2
+}.
+\]
+
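+The principal-axis angle \(\theta\) satisfies
+
+\[
+\tan(2\theta)
+=
+\frac{2b}{a-d},
+\]
+
+which determines \(\theta\) only up to a quarter turn; taking \(\theta=\tfrac12\operatorname{atan2}(2b,\,a-d)\) selects the major axis and also covers the case \(a=d\).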
+
+The eigenvectors are
+
+\[
+v_1
+=
+\begin{pmatrix}
+\cos\theta \\
+\sin\theta
+\end{pmatrix},
+\qquad
+v_2
+=
+\begin{pmatrix}
+-\sin\theta \\
+\cos\theta
+\end{pmatrix}.
+\]
+
+Because the directional pieces reconstruct \(a\), \(b\), and \(d\) exactly, they also reconstruct \(\lambda_{1,2}\) and \(v_{1,2}\) exactly.
+The recovery is complete.
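+
+The closed form can be checked against R's `eigen()`. In the sketch below the angle is computed as \(\tfrac12\operatorname{atan2}(2b,a-d)\), which picks the major axis and handles \(a=d\); the \(2\times2\) matrix is an arbitrary illustrative example.
+
+```r
+# Closed-form eigensystem of a symmetric 2 x 2 matrix [[a, b], [b, d]]
+closed_form_eigen <- function(S) {
+  a <- S[1, 1]; b <- S[1, 2]; d <- S[2, 2]
+  half_tr <- (a + d) / 2
+  radius  <- sqrt(((a - d) / 2)^2 + b^2)
+  theta   <- 0.5 * atan2(2 * b, a - d)   # angle of the major axis
+  list(values  = c(half_tr + radius, half_tr - radius),
+       vectors = cbind(c(cos(theta), sin(theta)), c(-sin(theta), cos(theta))))
+}
+
+S <- matrix(c(2.0, 0.8,
+              0.8, 1.0), 2, 2)
+cf <- closed_form_eigen(S)
+ev <- eigen(S, symmetric = TRUE)
+cf$values - ev$values                    # zero up to rounding
+abs(colSums(cf$vectors * ev$vectors))    # 1, 1: same axes up to sign
+```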
+
+---
+
+## Recursive Spectral Refinement
+
+NNS partitioning can be applied recursively, subdividing each region into further quadrants.
+This produces a nested sequence of partitions \(\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_O\).
+
+For any partition \(\mathcal{P}_O\) with cells \(r\), probabilities \(p_r\), conditional means \(m_r\), and displacements \(u_r = m_r - \mu\), the same decomposition applies:
+
+\[
+\Sigma
+=
+\underbrace{\sum_r p_r u_r u_r^\top}_{B_O}
++
+\underbrace{\sum_r p_r \operatorname{Cov}(Z \mid r)}_{W_O}.
+\]
+
+Each cell contributes a rank-one between-cell primitive \(B_r = p_r u_r u_r^\top\), and \(B_O=\sum_r B_r\).
+
+As the partition refines, the pooled within-cell covariance \(W_O\) cannot increase in the positive semidefinite order.
+At the limit where each terminal cell contains one observation,
+
+\[
+\operatorname{Cov}(Z \mid r) = 0
+\]
+
+for every cell, and the full empirical covariance is represented entirely by terminal cell centroids:
+
+\[
+\Sigma = \sum_r p_r u_r u_r^\top.
+\]
+
+The eigenvalue perturbation bound follows immediately from Weyl's inequality.
+Since \(\Sigma - B_O = W_O\),
+
+\[
+|\lambda_i(\Sigma) - \lambda_i(B_O)| \leq \|W_O\|_2.
+\]
+
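+For eigenvectors, the Davis-Kahan theorem bounds the angle between corresponding eigenvectors of \(\Sigma\) and \(B_O\) by a multiple of \(\|W_O\|_2\) divided by the eigenvalue gap. When the gap is large, a coarser partition, and hence fewer partition steps, suffices to recover the dominant principal direction.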
+
+Recursive NNS partitioning therefore provides a **multiscale positive semidefinite decomposition of conditional mean covariance**: each split explains part of the residual total covariance through the between-child displacement, and these contributions accumulate monotonically as cells are refined.
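+
+The refinement can be illustrated with a small simulation. The sketch assumes one simple recursive rule, splitting every cell at its own coordinate means, which may differ from the partitioning rules used inside the NNS package; the helpers `refine()` and `between_within()` are hypothetical. At each level it reports \(\|W_O\|_2\) and the largest eigenvalue discrepancy between \(\Sigma\) and \(B_O\).
+
+```r
+# Hypothetical rule: split every cell at its own coordinate means.
+refine <- function(Z, cells) {
+  unlist(lapply(cells, function(idx) {
+    if (length(idx) <= 1) return(list(idx))
+    m <- colMeans(Z[idx, , drop = FALSE])
+    key <- paste(Z[idx, 1] > m[1], Z[idx, 2] > m[2])
+    unname(split(idx, key))
+  }), recursive = FALSE)
+}
+
+# Between-cell (B_O) and pooled within-cell (W_O) covariance, 1/n scaling
+between_within <- function(Z, cells) {
+  n <- nrow(Z); mu <- colMeans(Z)
+  B <- W <- matrix(0, ncol(Z), ncol(Z))
+  for (idx in cells) {
+    Zr <- Z[idx, , drop = FALSE]
+    p  <- length(idx) / n
+    u  <- colMeans(Zr) - mu
+    B  <- B + p * tcrossprod(u)
+    W  <- W + p * crossprod(sweep(Zr, 2, colMeans(Zr))) / length(idx)
+  }
+  list(B = B, W = W)
+}
+
+set.seed(8)
+n <- 500
+x <- rnorm(n)
+y <- 0.6 * x + rnorm(n, sd = 0.8)
+Z <- cbind(x, y)
+Sigma <- crossprod(sweep(Z, 2, colMeans(Z))) / n
+lam <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
+
+cells <- list(seq_len(n))
+results <- data.frame()
+for (level in 1:5) {
+  cells <- refine(Z, cells)
+  bw <- between_within(Z, cells)
+  W_norm <- max(eigen(bw$W, symmetric = TRUE, only.values = TRUE)$values)  # ||W_O||_2, W_O is PSD
+  eig_error <- max(abs(lam - eigen(bw$B, symmetric = TRUE, only.values = TRUE)$values))
+  results <- rbind(results, data.frame(level = level, cells = length(cells),
+                                       W_norm = W_norm, eig_error = eig_error))
+}
+results
+```
+
+Two properties can be checked in the output: the within norm never increases as cells are refined, and the eigenvalue discrepancy stays below it at every level, as Weyl's inequality requires.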
+
+---
+
+## Multivariate Extension
+
+The same construction extends to any dimension \(d\).
+
+A mean split across all \(d\) coordinates produces up to \(2^d\) orthants.
+For any partition with cells \(r\), the decomposition
+
+\[
+\Sigma
+=
+\sum_r p_r u_r u_r^\top
++
+\sum_r p_r \operatorname{Cov}(Z \mid r)
+\]
+
+holds in \(\mathbb{R}^{d \times d}\).
+
+Each cell contributes a rank-one positive semidefinite matrix \(B_r = p_r u_r u_r^\top\).
+The between-cell covariance has rank at most \(\min(d, K-1)\), where \(K\) is the number of occupied cells.
+The \(K-1\) bound appears because the constraint \(\sum_r p_r u_r = 0\) removes one degree of freedom.
+
+The multivariate statement therefore matches the bivariate one:
+
+\[
+\boxed{
+\text{NNS partitions generate locatable rank-one spectral primitives in any dimension.}
+}
+\]
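+
+A three-dimensional sketch of the orthant version, assuming a simulated correlated Gaussian sample, is below. It labels each observation by the pattern of coordinates above their means, accumulates the rank-one between-orthant primitives and the within-orthant scatter, and checks the exact decomposition and the rank bound.
+
+```r
+set.seed(9)
+n <- 3000
+d <- 3
+A <- matrix(c(1.0,  0.0, 0.0,
+              0.5,  1.0, 0.0,
+              0.3, -0.4, 1.0), d, d, byrow = TRUE)
+Z <- matrix(rnorm(n * d), n, d) %*% t(A)   # correlated 3-d sample
+
+mu <- colMeans(Z)
+# Orthant label: which coordinates lie above their means (up to 2^d orthants)
+orthant <- apply(sweep(Z, 2, mu) > 0, 1, paste, collapse = "")
+K <- length(unique(orthant))
+
+Sigma <- crossprod(sweep(Z, 2, mu)) / n
+B <- W <- matrix(0, d, d)
+for (o in unique(orthant)) {
+  Zo <- Z[orthant == o, , drop = FALSE]
+  p  <- nrow(Zo) / n
+  u  <- colMeans(Zo) - mu
+  B  <- B + p * tcrossprod(u)                                     # rank-one primitive
+  W  <- W + p * crossprod(sweep(Zo, 2, colMeans(Zo))) / nrow(Zo)  # within-orthant scatter
+}
+K                                  # occupied orthants (at most 2^d = 8)
+max(abs(Sigma - (B + W)))          # zero up to rounding
+qr(B)$rank <= min(d, K - 1)        # TRUE
+```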
+
+A [detailed higher-dimensional example](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns-directional-spectral-decomposition.md) is available in the NNS repository.
+
+---
+
+## Converse Failure
+
+The directional decomposition runs in one direction.
+
+Given the directional pieces \(\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q\), the covariance matrix and its eigensystem are recovered.
+Given only \(\Sigma\) or its eigensystem \((V, \Lambda)\), the directional pieces are not recovered.
+
+Neither \(\Sigma\) nor \((V, \Lambda)\) determines
+
+- which observations belong to CLPM, CUPM, DLPM, or DUPM,
+- the quadrant probabilities \(p_q\),
+- the quadrant conditional means \(m_q\),
+- the within-quadrant covariance matrices \(\operatorname{Cov}(Z \mid Q{=}q)\),
+- or the higher-order directional moment profiles.
+
+In symbols:
+
+\[
+\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q
+\rightarrow
+\Sigma
+\rightarrow
+(V, \Lambda)
+\]
+
+is recoverable in both steps, while
+
+\[
+(V, \Lambda)
+\rightarrow
+\{p_q, m_q, \operatorname{Cov}(Z \mid Q{=}q)\}_q
+\]
+
+is not.
+
+This is another instance of the information-loss principle developed throughout the book.
+Directional components determine symmetric aggregates.
+Symmetric aggregates do not generally determine directional components.
+
+---
+
+## Correct Claims and Caveats
+
+Several precise statements follow from the results above.
+
+Correct:
+
+\[
+\text{NNS directional co-partial moment matrices recover covariance and therefore recover the classical eigensystem.}
+\]
+
+Correct:
+
+\[
+\text{Each centered quadrant mean }u_q\text{ is the eigenvector of }B_q = p_q u_q u_q^\top.
+\]
+
+Correct:
+
+\[
+\text{Classical eigenvalues admit exact directional attribution: }
+\lambda_i = \lambda_{i,Q} + \lambda_{i,W}.
+\]
+
+Correct:
+
+\[
+\text{The full eigenvectors of }\Sigma_Q\text{ are obtained after summing the }B_q\text{ matrices, not before.}
+\]
+
+Not generally correct:
+
+\[
+\text{Every line connecting opposite quadrant centroids is an eigenvector of }\Sigma_Q.
+\]
+
+That holds only when concordant and divergent quadrant means lie on orthogonal axes, a special alignment condition that may hold approximately for nearly elliptical positive dependence but is not guaranteed in general.
+
+Not generally correct:
+
+\[
+\lambda_1(\Sigma_Q + \Sigma_W) = \lambda_1(\Sigma_Q) + \lambda_1(\Sigma_W).
+\]
+
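+Eigenvalues are not additive across summands: the leading eigenvectors of \(\Sigma_Q\) and \(\Sigma_W\) generally point in different directions.
+The exact attribution instead evaluates both components along the same eigenvector \(v_i\) of \(\Sigma\), using Rayleigh quotients: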
+
+\[
+\lambda_i
+=
+v_i^\top \Sigma_Q v_i
++
+v_i^\top \Sigma_W v_i.
+\]
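+
+A two-by-two toy example, using hand-picked matrices rather than data, makes both caveats concrete. When the between and within components stretch different axes, the leading eigenvalue of the sum is not the sum of the leading eigenvalues, while the Rayleigh-quotient split along an eigenvector of the sum is exact.
+
+```r
+# Two positive semidefinite matrices that stretch different axes
+S_Q <- matrix(c(2, 0, 0, 0), 2, 2)  # between-quadrant spread on the x-axis only
+S_W <- matrix(c(0, 0, 0, 1), 2, 2)  # within-quadrant spread on the y-axis only
+S   <- S_Q + S_W
+
+lam_S <- eigen(S,   symmetric = TRUE)$values[1]  # 2
+lam_Q <- eigen(S_Q, symmetric = TRUE)$values[1]  # 2
+lam_W <- eigen(S_W, symmetric = TRUE)$values[1]  # 1
+c(lambda1_of_sum = lam_S, sum_of_lambda1 = lam_Q + lam_W)  # 2 versus 3
+
+# Rayleigh-quotient attribution along the leading eigenvector of S is exact
+v1 <- eigen(S, symmetric = TRUE)$vectors[, 1]
+between <- drop(crossprod(v1, S_Q %*% v1))
+within  <- drop(crossprod(v1, S_W %*% v1))
+c(between = between, within = within, total = between + within)
+```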
+
+These caveats sharpen the results rather than weaken them.
+They state precisely what is claimed: each classical eigenvalue can be attributed to identifiable quadrant regions, and the decomposition does not replace linear algebra.
+
+---
+
+## Quadrant Decomposition in R
+
+The following functions implement the between-within decomposition.
+Population normalization \(1/n\) is used throughout to match the expectation formulas above.
+
+```{r spectral-setup}
+pop_cov <- function(Z) {
+ Z <- as.matrix(Z)
+ Zc <- sweep(Z, 2, colMeans(Z), FUN = "-")
+ crossprod(Zc) / nrow(Zc)
+}
+
+quadrant_decomposition <- function(Z) {
+ Z <- as.matrix(Z)
+ stopifnot(ncol(Z) == 2)
+
+ n <- nrow(Z)
+ mu <- colMeans(Z)
+ mx <- mu[1]; my <- mu[2]
+
+ quadrants <- list(
+ CUPM = (Z[,1] > mx) & (Z[,2] > my),
+ CLPM = (Z[,1] <= mx) & (Z[,2] <= my),
+ DLPM = (Z[,1] > mx) & (Z[,2] <= my),
+ DUPM = (Z[,1] <= mx) & (Z[,2] > my)
+ )
+
+ Sigma_Q <- matrix(0, 2, 2)
+ Sigma_W <- matrix(0, 2, 2)
+ details <- vector("list", length(quadrants))
+ centroids <- vector("list", length(quadrants))
+
+ for (i in seq_along(quadrants)) {
+ qname <- names(quadrants)[i]
+ mask <- quadrants[[i]]
+ n_q <- sum(mask)
+ if (n_q == 0) next
+
+ p_q <- n_q / n
+ Z_q <- Z[mask, , drop = FALSE]
+ m_q <- colMeans(Z_q)
+ u_q <- m_q - mu
+
+ Sigma_Q <- Sigma_Q + p_q * tcrossprod(u_q)
+ Sigma_W <- Sigma_W + p_q * pop_cov(Z_q)
+
+ centroids[[i]] <- data.frame(
+ quadrant = qname,
+ n = n_q,
+ p = p_q,
+ mean_x = m_q[1],
+ mean_y = m_q[2],
+ u_x = u_q[1],
+ u_y = u_q[2]
+ )
+
+ details[[i]] <- data.frame(
+ quadrant = qname,
+ n = n_q,
+ p = round(p_q, 6),
+ mean_x = round(m_q[1], 6),
+ mean_y = round(m_q[2], 6),
+ u_x = round(u_q[1], 6),
+ u_y = round(u_q[2], 6),
+ lambda_rank1 = round(p_q * sum(u_q^2), 6)
+ )
+ }
+
+ list(
+ mu = mu,
+ Sigma = pop_cov(Z),
+ Sigma_Q = Sigma_Q,
+ Sigma_W = Sigma_W,
+ centroids = do.call(rbind, centroids),
+ details = do.call(rbind, details)
+ )
+}
+```
+
+Generate positively dependent bivariate data, compute the quadrant decomposition, and pass \(\Sigma\) directly to `eigen` for attribution.
+
+```{r spectral-data}
+set.seed(123)
+n <- 10000
+rho <- 0.70
+R <- matrix(c(1, rho, rho, 1), 2, 2)
+Z <- matrix(rnorm(2 * n), n, 2) %*% chol(R)
+
+D <- quadrant_decomposition(Z)
+eig_classical <- eigen(D$Sigma)
+
+D$details
+```
+
+The quadrant summary shows the four conditional mean displacements \(u_q\) and their rank-one eigenvalues \(p_q \|u_q\|^2\).
+For positively dependent data the concordant quadrants (CUPM, CLPM) carry large displacements in opposite directions along the main axis, while the divergent quadrants (DLPM, DUPM) show much smaller displacements orthogonal to it.
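+
+The functions above never check the between-within identity itself. The following self-contained sketch assumes the same \(1/n\) normalization and mean-centred quadrants. Here `pcov` is a hypothetical stand-in for `pop_cov`, repeated so that the block runs on its own. The sketch confirms the law of total covariance \(\Sigma = \Sigma_Q + \Sigma_W\) to floating-point precision.
+
+```r
+# Population (1/n) covariance, mirroring pop_cov() above
+pcov <- function(A) {
+  Ac <- sweep(A, 2, colMeans(A))
+  crossprod(Ac) / nrow(A)
+}
+
+set.seed(42)
+n <- 2000
+rho <- 0.7
+Z <- matrix(rnorm(2 * n), n, 2) %*% chol(matrix(c(1, rho, rho, 1), 2, 2))
+mu <- colMeans(Z)
+
+# Quadrant label for each row (same > / <= convention as the chapter)
+q <- interaction(Z[, 1] > mu[1], Z[, 2] > mu[2], drop = TRUE)
+
+Sigma_Q <- matrix(0, 2, 2)
+Sigma_W <- matrix(0, 2, 2)
+for (lev in levels(q)) {
+  Zq <- Z[q == lev, , drop = FALSE]
+  p  <- nrow(Zq) / n
+  u  <- colMeans(Zq) - mu
+  Sigma_Q <- Sigma_Q + p * tcrossprod(u)  # between-quadrant piece
+  Sigma_W <- Sigma_W + p * pcov(Zq)       # within-quadrant piece
+}
+
+# Law of total covariance: the gap should be at floating-point level
+max(abs(pcov(Z) - (Sigma_Q + Sigma_W)))
+```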
+
+---
+
+## Rank-One Primitive Verification in R
+
+The previous table reports the rank-one eigenvalue for each quadrant.
+The following code verifies the stronger statement from Section 11.5 directly: for each quadrant,
+
+\[
+B_q u_q = p_q u_q u_q^\top u_q = p_q \|u_q\|^2 u_q.
+\]
+
+It also verifies that summing the four rank-one primitives reconstructs the between-quadrant conditional-mean covariance \(\Sigma_Q\).
+
+```{r rank-one-primitive-verification}
+rank_one_primitive_check <- function(D) {
+ pieces <- lapply(seq_len(nrow(D$centroids)), function(i) {
+ row <- D$centroids[i, ]
+ u <- c(row$u_x, row$u_y)
+ p <- row$p
+
+ Bq <- p * tcrossprod(u)
+ lambda <- p * sum(u^2)
+
+ lhs <- as.numeric(Bq %*% u)
+ rhs <- lambda * u
+
+ eig_Bq <- eigen(Bq, symmetric = TRUE)
+ v_q <- as.numeric(u / sqrt(sum(u^2)))
+
+ list(
+ Bq = Bq,
+ table = data.frame(
+ quadrant = row$quadrant,
+ lambda_rank1 = lambda,
+ max_abs_Bu_minus_lambda_u = max(abs(lhs - rhs)),
+ alignment_with_eigen_Bq = abs(drop(crossprod(v_q, eig_Bq$vectors[, 1])))
+ )
+ )
+ })
+
+ Sigma_Q_from_Bq <- Reduce(`+`, lapply(pieces, `[[`, "Bq"))
+
+ list(
+ checks = do.call(rbind, lapply(pieces, `[[`, "table")),
+ Sigma_Q_from_Bq = Sigma_Q_from_Bq,
+ max_abs_Sigma_Q_error = max(abs(Sigma_Q_from_Bq - D$Sigma_Q))
+ )
+}
+
+R1 <- rank_one_primitive_check(D)
+
+R1$checks
+R1$max_abs_Sigma_Q_error
+```
+
+The column `max_abs_Bu_minus_lambda_u` should be numerically zero, confirming that each centered quadrant conditional mean is the eigenvector of its own rank-one matrix.
+The column `alignment_with_eigen_Bq` should be one up to floating-point precision, confirming that the normalized vector \(u_q / \|u_q\|\) is the same direction recovered by `eigen(Bq)`.
+The final scalar verifies
+
+\[
+\Sigma_Q = B_{\operatorname{CUPM}} + B_{\operatorname{CLPM}} + B_{\operatorname{DLPM}} + B_{\operatorname{DUPM}}.
+\]
+
+---
+
+## Conditional-Mean Eigenvector Recovery in R
+
+The following code uses only the quadrant probabilities and quadrant conditional means stored in `D$centroids`.
+It reconstructs \(C\), then \(\Sigma_Q = CC^\top\), then the eigenvectors of the between-quadrant conditional-mean covariance.
+
+```{r conditional-mean-eigenvector-recovery}
+recover_eigen_from_quadrant_means <- function(D) {
+ U <- as.matrix(D$centroids[, c("u_x", "u_y")])
+ P <- D$centroids$p
+
+ Cmat <- t(sqrt(P) * U)
+ Sigma_Q_from_means <- Cmat %*% t(Cmat)
+ eig_Q <- eigen(Sigma_Q_from_means)
+
+ list(
+ C = Cmat,
+ Sigma_Q_from_means = Sigma_Q_from_means,
+ eig_Q = eig_Q
+ )
+}
+
+Qrec <- recover_eigen_from_quadrant_means(D)
+
+# This should match D$Sigma_Q, which was accumulated quadrant by quadrant.
+max(abs(Qrec$Sigma_Q_from_means - D$Sigma_Q))
+
+# Eigenvectors recovered from quadrant conditional means.
+Qrec$eig_Q$vectors
+
+# Same eigensystem obtained by diagonalizing the stored Sigma_Q.
+eigen(D$Sigma_Q)$vectors
+
+# Eigenvector signs are arbitrary, so compare absolute inner products:
+# the result should be the 2 x 2 identity up to floating-point error.
+abs(t(Qrec$eig_Q$vectors) %*% eigen(D$Sigma_Q)$vectors)
+```
+
+The recovery above is the explicit conditional-mean step:
+
+\[
+\{p_q,m_q\}_q
+\rightarrow
+C
+\rightarrow
+CC^\top
+\rightarrow
+\operatorname{eig}(CC^\top).
+\]
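+
+The chain \(C \rightarrow CC^\top \rightarrow \operatorname{eig}(CC^\top)\) can also be shortcut through a singular value decomposition of \(C\). The left singular vectors of \(C\) are the eigenvectors of \(CC^\top\), and its squared singular values are the eigenvalues. The small sketch below uses illustrative, hand-picked quadrant probabilities and displacements, chosen so that \(\sum_q p_q u_q = 0\), as centred quadrant means must satisfy.
+
+```r
+# Illustrative quadrant probabilities and centred means (not estimated from data)
+p <- c(CUPM = 0.35, CLPM = 0.35, DLPM = 0.15, DUPM = 0.15)
+U <- rbind(CUPM = c( 0.9,  0.9),
+           CLPM = c(-0.9, -0.9),
+           DLPM = c( 0.5, -0.5),
+           DUPM = c(-0.5,  0.5))
+
+C_mat <- t(sqrt(p) * U)                  # 2 x 4; column q is sqrt(p_q) u_q
+eig <- eigen(C_mat %*% t(C_mat), symmetric = TRUE)
+sv  <- svd(C_mat)
+
+# Eigenvalues of CC^T equal the squared singular values of C
+rbind(eigen = eig$values, svd_squared = sv$d^2)
+
+# Eigenvectors equal the left singular vectors up to sign
+round(abs(crossprod(eig$vectors, sv$u)), 10)
+```
+
+For a \(2 \times 4\) matrix the difference is conceptual. In higher dimensions, the SVD route avoids forming \(CC^\top\) explicitly.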
+
+The same conditional means also provide the visible concordant and divergent centroid contrasts.
+
+```{r centroid-contrast-directions}
+unit <- function(x) as.numeric(x / sqrt(sum(x^2)))
+
+centroid <- function(D, qname) {
+ row <- D$centroids[D$centroids$quadrant == qname, ]
+ c(row$mean_x, row$mean_y)
+}
+
+v_concordant <- unit(centroid(D, "CUPM") - centroid(D, "CLPM"))
+v_divergent <- unit(centroid(D, "DLPM") - centroid(D, "DUPM"))
+
+contrast_comparison <- cbind(
+ concordant_contrast = v_concordant,
+ Sigma_Q_v1 = Qrec$eig_Q$vectors[, 1],
+ divergent_contrast = v_divergent,
+ Sigma_Q_v2 = Qrec$eig_Q$vectors[, 2]
+)
+
+round(contrast_comparison, 6)
+
+# Alignment between contrast directions and the Sigma_Q eigenvectors.
+round(c(
+ concordant_with_v1 = abs(drop(crossprod(v_concordant, Qrec$eig_Q$vectors[, 1]))),
+ divergent_with_v2 = abs(drop(crossprod(v_divergent, Qrec$eig_Q$vectors[, 2])))
+), 6)
+```
+
+For the positively dependent example, the concordant contrast aligns closely with the first between-quadrant eigenvector and the divergent contrast aligns closely with the second.
+In more asymmetric samples, the exact recovery remains \(C \rightarrow CC^\top \rightarrow \operatorname{eig}(CC^\top)\), while the centroid contrasts provide the directly visible directional geometry.
+
+---
+
+## Eigenvalue Attribution in R
+
+The following code attributes each classical eigenvalue into between-quadrant and within-quadrant components using the Rayleigh quotient.
+
+```{r spectral-attribution}
+V <- eig_classical$vectors
+
+attribute_lambda <- function(v, Sigma_Q, Sigma_W) {
+ v <- matrix(v, ncol = 1)
+ between <- drop(t(v) %*% Sigma_Q %*% v)
+ within <- drop(t(v) %*% Sigma_W %*% v)
+ c(between = between, within = within, total = between + within)
+}
+
+attrib_1 <- attribute_lambda(V[, 1], D$Sigma_Q, D$Sigma_W)
+attrib_2 <- attribute_lambda(V[, 2], D$Sigma_Q, D$Sigma_W)
+
+rbind(lambda_1 = attrib_1, lambda_2 = attrib_2)
+eig_classical$values
+```
+
+This is the exact decomposition \(\lambda_i = v_i^\top \Sigma_Q v_i + v_i^\top \Sigma_W v_i\).
+The totals in each row match the classical eigenvalues to floating-point precision.
+
+Quadrant-level attribution of the between component:
+
+```{r spectral-quadrant-attribution}
+quadrant_between_contrib <- function(Z, v) {
+ Z <- as.matrix(Z)
+ v <- as.numeric(v)
+ n <- nrow(Z)
+ mu <- colMeans(Z)
+ mx <- mu[1]; my <- mu[2]
+
+ quadrants <- list(
+ CUPM = (Z[,1] > mx) & (Z[,2] > my),
+ CLPM = (Z[,1] <= mx) & (Z[,2] <= my),
+ DLPM = (Z[,1] > mx) & (Z[,2] <= my),
+ DUPM = (Z[,1] <= mx) & (Z[,2] > my)
+ )
+
+ out <- data.frame(quadrant = names(quadrants), contribution = NA_real_)
+
+ for (i in seq_along(quadrants)) {
+ mask <- quadrants[[i]]
+ if (sum(mask) == 0) { out$contribution[i] <- 0; next }
+ p_q <- sum(mask) / n
+ u_q <- colMeans(Z[mask, , drop = FALSE]) - mu
+ out$contribution[i] <- p_q * sum(v * u_q)^2
+ }
+
+ out
+}
+
+q_attr_v1 <- quadrant_between_contrib(Z, V[, 1])
+q_attr_v2 <- quadrant_between_contrib(Z, V[, 2])
+
+# PC1: driven by concordant quadrants
+q_attr_v1
+sum(q_attr_v1$contribution) # matches attrib_1["between"]
+
+# PC2: driven by divergent quadrants
+q_attr_v2
+sum(q_attr_v2$contribution) # matches attrib_2["between"]
+```
+
+For this positively dependent example the pattern is clear.
+PC1 receives nearly all its between-quadrant contribution from CUPM and CLPM: the leading principal direction is a concordant co-movement axis.
+PC2 receives nearly all its between-quadrant contribution from DLPM and DUPM: the minor axis is a divergent direction.
+This is the interpretive content that classical PCA alone cannot supply.
+
+---
+
+## Visualizing the CLPM Mean Slope
+
+The following figure displays the three distinct directions associated with the CLPM region: the quadrant mean slope, the within-CLPM regression line, and the within-CLPM first principal component.
+The exact picture depends on the random seed, but the three lines generally differ.
+
+```{r clpm-mean-slope-figure, fig.width=6, fig.height=6}
+set.seed(321)
+n <- 5000
+Z0 <- matrix(rnorm(2 * n), n, 2)
+mu0 <- colMeans(Z0)
+clpm_mask <- Z0[,1] <= mu0[1] & Z0[,2] <= mu0[2]
+Zc <- Z0[clpm_mask, , drop = FALSE]
+mc <- colMeans(Zc)
+
+plot(
+ Zc[, 1], Zc[, 2],
+ pch = 1, cex = 0.35,
+ xlab = "X", ylab = "Y",
+ main = "CLPM: Mean Slope, Regression, and PC1"
+)
+abline(v = mu0[1], lty = 2)
+abline(h = mu0[2], lty = 2)
+points(mu0[1], mu0[2], pch = 19, cex = 1.2)
+points(mc[1], mc[2], pch = 19, cex = 1.2)
+
+# Quadrant mean slope: line through global mean toward CLPM conditional mean
+slope_mean <- (mc[2] - mu0[2]) / (mc[1] - mu0[1])
+abline(a = mu0[2] - slope_mean * mu0[1], b = slope_mean, col = "green", lwd = 2)
+
+# Linear regression inside CLPM
+fit <- lm(Zc[, 2] ~ Zc[, 1])
+abline(fit, col = "gold", lwd = 2)
+
+# PC1 inside CLPM: leading eigenvector of within-CLPM covariance
+pc1 <- eigen(pop_cov(Zc))$vectors[, 1]
+slope_pc1 <- pc1[2] / pc1[1]
+abline(a = mc[2] - slope_pc1 * mc[1], b = slope_pc1, col = "blue", lwd = 2)
+
+legend(
+ "bottomright",
+ legend = c("CLPM mean slope", "CLPM regression", "CLPM PC1"),
+ col = c("green", "gold", "blue"),
+ lwd = 2,
+ bty = "n"
+)
+```
+
+The green line is the eigenvector direction of the rank-one matrix
+
+\[
+B_{\operatorname{CLPM}}
+=
+p_{\operatorname{CLPM}} u_{\operatorname{CLPM}} u_{\operatorname{CLPM}}^\top.
+\]
+
+The blue line follows the leading eigenvector of \(\operatorname{Cov}(Z \mid \operatorname{CLPM})\), drawn through the CLPM centroid.
+The gold line is the ordinary least-squares regression of \(Y\) on \(X\) within the CLPM subset.
+They differ because they summarize different statistical objects.
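+
+A quick numerical check of the green line's interpretation reuses the seed and construction of the figure above, but is written to run on its own. The leading eigenvector of \(B_{\operatorname{CLPM}}\) has the same slope as the displacement \(u_{\operatorname{CLPM}}\), and its eigenvalue is \(p_{\operatorname{CLPM}} \|u_{\operatorname{CLPM}}\|^2\).
+
+```r
+set.seed(321)
+n   <- 5000
+Z0  <- matrix(rnorm(2 * n), n, 2)
+mu0 <- colMeans(Z0)
+in_clpm <- Z0[, 1] <= mu0[1] & Z0[, 2] <= mu0[2]
+
+p_clpm <- mean(in_clpm)
+u_clpm <- colMeans(Z0[in_clpm, , drop = FALSE]) - mu0
+B_clpm <- p_clpm * tcrossprod(u_clpm)      # rank-one primitive B_CLPM
+
+eig_B <- eigen(B_clpm, symmetric = TRUE)
+v <- eig_B$vectors[, 1]
+
+c(slope_eigenvector = v[2] / v[1],
+  slope_means       = u_clpm[2] / u_clpm[1],  # slope of the green line
+  eigenvalue        = eig_B$values[1],
+  p_norm_sq         = p_clpm * sum(u_clpm^2))
+```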
+
+---
+
+## Practical Interpretation
+
+Directional spectral decomposition changes what can be said about a classical PCA result.
+
+**Risk and Stress Testing.**
+If the leading eigenvalue of a return covariance matrix is dominated by CLPM contributions, the leading risk factor is primarily a joint downside event.
+That is more actionable than knowing only that assets load on a common factor.
+
+**Portfolio Construction.**
+Mean-variance optimization treats the covariance matrix as a single object.
+Directional spectral decomposition separates it into concordant downside, concordant upside, divergent, between-centroid, and within-region components.
+An analyst can ask whether an optimized portfolio is exposed to broad covariance or specifically to lower-tail covariance.
+
+**Nonlinear Dependence.**
+When correlation is small but \(\Sigma_Q\) is large relative to \(\Sigma_W\), the conditional means are separated across quadrants in a way that total covariance may mask.
+This signals potential nonlinear or regime-dependent structure worth investigating through the full directional dependence measures of Chapter 10.
+
+**Model Diagnostics.**
+The ratio
+
+\[
+D_{\mathrm{spectral}} = \frac{\operatorname{tr}(\Sigma_Q)}{\operatorname{tr}(\Sigma)}
+\]
+
+measures the share of total covariance trace explained by between-quadrant mean displacement.
+A high value indicates that the quadrant partition is capturing the relevant second-moment geometry.
+A low value indicates that within-region residual scatter dominates, and finer partitioning or higher-order moments may be warranted.
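+
+The sketch below computes the diagnostic, assuming mean-centred quadrants and \(1/n\) normalization; `spectral_ratio()` is a hypothetical helper. One calibration point is worth keeping in mind: even for independent Gaussian coordinates the ratio is close to \(2/\pi \approx 0.64\). This happens because each coordinate's sign split alone carries \((E|X|)^2 / \operatorname{Var}(X) = 2/\pi\) of its variance. It may therefore be more informative to compare \(D_{\mathrm{spectral}}\) with such a baseline than with zero.
+
+```r
+# Hypothetical helper: D_spectral = tr(Sigma_Q) / tr(Sigma), 1/n normalization
+spectral_ratio <- function(Z) {
+  mu <- colMeans(Z)
+  q  <- interaction(Z[, 1] > mu[1], Z[, 2] > mu[2], drop = TRUE)
+  tr_Q <- 0
+  for (lev in levels(q)) {
+    Zq <- Z[q == lev, , drop = FALSE]
+    # p_q * ||u_q||^2 is the trace of the rank-one piece B_q
+    tr_Q <- tr_Q + nrow(Zq) / nrow(Z) * sum((colMeans(Zq) - mu)^2)
+  }
+  tr_Sigma <- sum(sweep(Z, 2, mu)^2) / nrow(Z)  # trace of the 1/n covariance
+  tr_Q / tr_Sigma
+}
+
+set.seed(7)
+n <- 1e5
+Z_ind <- matrix(rnorm(2 * n), n, 2)
+Z_dep <- Z_ind %*% chol(matrix(c(1, 0.7, 0.7, 1), 2, 2))
+
+c(independent = spectral_ratio(Z_ind),
+  rho_0.7     = spectral_ratio(Z_dep),
+  two_over_pi = 2 / pi)
+```
+
+The correlated sample gives a larger ratio. Under dependence, splitting each coordinate by the sign of the other variable as well adds between-quadrant variance beyond the sign split of that coordinate alone.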
+
+A full numerical implementation with transition matrix estimation, one-step predictive mixtures, and dynamic eigenvalue attribution by transition path is available at [OVVO-Financial/NNS: directional-markov-regimes-pca.md](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/directional-markov-regimes-pca.md).
+
+---
+
+## Summary
+
+This chapter extended the directional decomposition of covariance into spectral analysis.
+
+Key results:
+
+1. The directional co-partial moment matrices recover covariance and therefore recover the classical PCA eigensystem:
+
+\[
+(\operatorname{CoLPM},\operatorname{CoUPM},\operatorname{DLPM},\operatorname{DUPM})
+\rightarrow
+\Sigma
+\rightarrow
+(\lambda_i, v_i).
+\]
+
+2. The covariance matrix decomposes into between-quadrant and within-quadrant components:
+
+\[
+\Sigma
+=
+\sum_q p_q u_q u_q^\top
++
+\sum_q p_q \operatorname{Cov}(Z \mid Q{=}q).
+\]
+
+3. Each quadrant conditional mean displacement generates a rank-one spectral primitive
+
+\[
+B_q = p_q u_q u_q^\top
+\]
+
+with eigenvector \(u_q\) and eigenvalue \(p_q \|u_q\|^2\).
+
+4. Classical eigenvalues admit exact directional attribution:
+
+\[
+\lambda_i
+=
+\sum_q p_q (v_i^\top u_q)^2
++
+\sum_q p_q \, v_i^\top \operatorname{Cov}(Z \mid Q{=}q)\, v_i.
+\]
+
+5. PCA is downstream of the NNS directional structure.
+The directional components recover PCA and explain which quadrant regions generated the result.
+
+6. The converse fails: the eigensystem does not recover the directional components.
+
+The central message is:
+
+\[
+\begin{gathered}
+\text{PCA diagonalizes covariance.}\\
+\text{Directional decomposition explains the sources of that covariance.}
+\end{gathered}
+\]
+
+The next chapter connects the directional partial moment framework to **copula interpretation**, linking co-partial moments with rank-based dependence structures used in multivariate statistics.
\ No newline at end of file
diff --git a/tools/NNS/book/chapter-12-copula-interpretation.Rmd b/tools/NNS/book/chapter-12-copula-interpretation.Rmd
new file mode 100644
index 0000000..8646ecb
--- /dev/null
+++ b/tools/NNS/book/chapter-12-copula-interpretation.Rmd
@@ -0,0 +1,437 @@
+# Copula Interpretation
+
+Chapters 9 and 10 showed that classical dependence measures such as covariance and correlation arise from **aggregations of directional co-partial moments**. Directional statistics preserves the structure of joint deviations by separating contributions across regions of the joint distribution rather than collapsing them into a single summary statistic.
+
+Another widely used framework for describing dependence is **copula theory**, which represents dependence by transforming variables into probability space and isolating the joint structure from the marginal distributions.
+
+This chapter shows that the directional framework connects naturally to copula theory. In particular, directional co-partial moments can be interpreted as **magnitude-weighted dependence measures within copula space**.
+
+---
+
+## Copula Fundamentals
+
+Let \(X\) and \(Y\) be continuous random variables with cumulative distribution functions
+
+\[
+F_X(x), \qquad F_Y(y).
+\]
+
+Define the probability transforms
+
+\[
+U = F_X(X), \qquad V = F_Y(Y).
+\]
+
+By the probability integral transform,
+
+\[
+U, V \sim \text{Uniform}(0,1).
+\]
+
+This result holds when \(X\) and \(Y\) are continuous random variables. For discrete or mixed distributions the probability integral transform requires minor adjustments, but the continuous case suffices for the conceptual development here.
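+
+The following minimal sketch illustrates the probability integral transform with a known continuous CDF. An exponential distribution is chosen only for illustration. The transformed values \(U = F_X(X)\) should show the mean, variance, and deciles of a Uniform(0,1) variable up to sampling error.
+
+```r
+set.seed(11)
+x <- rexp(1e5, rate = 2)   # skewed continuous variable with known CDF
+u <- pexp(x, rate = 2)     # probability integral transform U = F_X(X)
+
+# Uniform(0, 1) has mean 1/2 and variance 1/12
+c(mean = mean(u), var = var(u), uniform_var = 1 / 12)
+
+# Sample deciles of U should sit close to 0.1, 0.2, ..., 0.9
+round(quantile(u, probs = seq(0.1, 0.9, by = 0.1)), 3)
+```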
+
+The joint distribution of \((U,V)\) is called the **copula** of \((X,Y)\):
+
+\[
+C(u,v) = P(U \le u, V \le v).
+\]
+
+Sklar’s theorem states that any joint distribution function can be written in terms of a copula \(C\), unique when the marginals are continuous, as
+
+\[
+F_{X,Y}(x,y)
+=
+C(F_X(x),F_Y(y)).
+\]
+
+Thus the copula isolates the **dependence structure independently of the marginal distributions**.
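+
+The base-R sketch below shows what "independently of the marginal distributions" means in a sample. As an assumption of this illustration, ranks serve as empirical copula coordinates. Strictly increasing transformations of each margin leave the ranks exactly unchanged, and with them the empirical copula and Spearman's rho. Pearson correlation, which depends on the marginals, generally changes.
+
+```r
+set.seed(5)
+n <- 1000
+x <- rnorm(n)
+y <- 0.7 * x + sqrt(1 - 0.7^2) * rnorm(n)
+
+# Strictly increasing transformations of each margin
+x2 <- exp(x)
+y2 <- y^3
+
+# Pseudo-observations (rank / n) are unchanged, so the empirical copula is too
+identical(rank(x), rank(x2)) && identical(rank(y), rank(y2))
+
+# Spearman's rho depends only on the copula; Pearson correlation does not
+c(spearman_before = cor(x, y, method = "spearman"),
+  spearman_after  = cor(x2, y2, method = "spearman"),
+  pearson_before  = cor(x, y),
+  pearson_after   = cor(x2, y2))
+```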
+
+---
+
+## Directional Statistics and Probability Space
+
+The directional framework provides a natural interpretation of this transformation.
+
+From earlier chapters, the cumulative distribution function equals the degree-zero lower partial moment:
+
+\[
+F_X(t) = L_0(t;X).
+\]
+
+Thus the copula transformation
+
+\[
+U = F_X(X)
+\]
+
+can be written
+
+\[
+U = L_0(X;X).
+\]
+
+In other words, copula coordinates arise directly from **directional probability transforms**.
+
+Each observation is mapped into probability space according to its position within the cumulative distribution.
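+
+In a sample, the degree-zero lower partial moment is the empirical CDF. Evaluating it at the observations themselves gives the copula coordinate \(U = \operatorname{rank}/n\), assuming no ties. The base-R sketch below uses `lpm0()`, a hypothetical helper. The NNS function `LPM.ratio(0, x, x)`, used later in this chapter, serves the same probability-transform purpose.
+
+```r
+# Hypothetical helper: empirical degree-zero lower partial moment L_0(t; x)
+lpm0 <- function(t, x) vapply(t, function(ti) mean(x <= ti), numeric(1))
+
+set.seed(3)
+x <- rt(500, df = 4)            # heavy-tailed continuous sample
+t_grid <- seq(-3, 3, by = 0.5)
+
+# L_0(t; X) reproduces the empirical CDF at every benchmark
+all.equal(lpm0(t_grid, x), ecdf(x)(t_grid))
+
+# Evaluated at the observations: the copula coordinate U = rank / n
+all.equal(lpm0(x, x), rank(x) / length(x))
+```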
+
+---
+
+## Directional Regions in Copula Space
+
+Once transformed, the joint distribution lies in the unit square
+
+\[
+[0,1]^2.
+\]
+
+Benchmarks partition this square into directional regions.
+
+For probability thresholds \(u_t\) and \(v_t\),
+
+| | \(V \le v_t\) | \(V > v_t\) |
+|---|---|---|
+| \(U \le u_t\) | joint lower region | divergent region |
+| \(U > u_t\) | divergent region | joint upper region |
+
+These correspond directly to the four directional regions defined earlier:
+
+- **CoLPM region:** both variables below benchmark
+- **CoUPM region:** both variables above benchmark
+- **DLPM region:** \(X\) above benchmark, \(Y\) below
+- **DUPM region:** \(X\) below benchmark, \(Y\) above
+
+Thus the directional framework partitions the copula domain in the same way that co-partial moments partition the original joint distribution.
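+
+The correspondence between the two partitions can be checked directly. The sketch below assumes empirical CDFs as the probability transform and uses arbitrary benchmarks: the median of \(X\) and the 30% quantile of \(Y\). It assigns every observation to a region in the original space and in copula space. Because \(F_X\) and \(F_Y\) are nondecreasing, the two assignments coincide exactly. The region labels are descriptive only.
+
+```r
+set.seed(8)
+n <- 2000
+x <- rexp(n)
+y <- x + rexp(n)                     # positively dependent, skewed margins
+t_x <- median(x)
+t_y <- unname(quantile(y, 0.3))
+
+# Empirical CDFs serve as the probability transform
+Fx <- ecdf(x)
+Fy <- ecdf(y)
+u <- Fx(x)
+v <- Fy(y)                           # copula coordinates
+u_t <- Fx(t_x)
+v_t <- Fy(t_y)                       # benchmarks mapped into [0, 1]
+
+# Descriptive region label for (a, b) relative to benchmarks (ta, tb)
+region <- function(a, b, ta, tb) {
+  ifelse(a <= ta,
+         ifelse(b <= tb, "both below", "X below, Y above"),
+         ifelse(b <= tb, "X above, Y below", "both above"))
+}
+
+orig   <- region(x, y, t_x, t_y)
+copula <- region(u, v, u_t, v_t)
+
+identical(orig, copula)              # same membership in both spaces
+table(copula) / n                    # the four region probabilities
+```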
+
+---
+
+## Co-Partial Moments as Weighted Copula Regions
+
+Directional co-partial moments measure deviations within these regions.
+
+For benchmarks \(t_X\) and \(t_Y\),
+
+\[
+CoLPM_{r,s}(X,Y)
+=
+E[(t_X-X)_+^r (t_Y-Y)_+^s].
+\]
+
+The corresponding copula probability is
+
+\[
+P(X \le t_X, Y \le t_Y)
+=
+C(F_X(t_X),F_Y(t_Y)).
+\]
+
+Equivalently,
+
+\[
+CoLPM_{0,0}(t_X,t_Y)
+=
+C(F_X(t_X),F_Y(t_Y)).
+\]
+
+Thus
+
+- copulas measure **probability of directional regions**, while
+- co-partial moments measure **magnitude of deviations within those regions**.
+
+Higher orders \(r,s\) increase sensitivity to extreme observations, producing a continuous generalization of tail dependence.
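+
+The distinction between probability and magnitude appears when two samples share a copula but not their marginals. In the sketch below, `co_lpm()` is a hypothetical base-R helper that treats degree zero as the indicator \(\mathbf{1}\{X \le t_X\}\). Exponentiating both variables leaves the ranks unchanged. As a result, \(\operatorname{CoLPM}_{0,0}\) at the medians is identical in both samples, while \(\operatorname{CoLPM}_{1,1}\) differs because deviations are measured in each sample's own units.
+
+```r
+# Hypothetical helper: empirical co-lower partial moment of degrees (r, s);
+# degree 0 is treated as the indicator 1{x <= t}
+co_lpm <- function(r, s, x, y, t_x, t_y) {
+  wx <- if (r == 0) as.numeric(x <= t_x) else pmax(t_x - x, 0)^r
+  wy <- if (s == 0) as.numeric(y <= t_y) else pmax(t_y - y, 0)^s
+  mean(wx * wy)
+}
+
+set.seed(9)
+n <- 5000
+x <- rnorm(n)
+y <- 0.6 * x + 0.8 * rnorm(n)
+
+# Same copula (ranks unchanged), different marginals
+x2 <- exp(x)
+y2 <- exp(y)
+
+# Benchmarks at each variable's median: copula-space threshold (1/2, 1/2)
+deg0 <- c(original    = co_lpm(0, 0, x,  y,  median(x),  median(y)),
+          transformed = co_lpm(0, 0, x2, y2, median(x2), median(y2)))
+deg1 <- c(original    = co_lpm(1, 1, x,  y,  median(x),  median(y)),
+          transformed = co_lpm(1, 1, x2, y2, median(x2), median(y2)))
+rbind(degree_0 = deg0, degree_1 = deg1)
+```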
+
+---
+
+## Copula Representation of Co-Partial Moments
+
+Directional co-partial moments admit a direct representation in copula space.
+
+**Theorem 11.1 (Copula Representation of Co-Partial Moments)**
+
+Let \(X\) and \(Y\) be continuous random variables with copula \(C(u,v)\) and quantile functions \(Q_X(u)\), \(Q_Y(v)\). Here
+
+\[
+Q_X(u)=F_X^{-1}(u), \qquad Q_Y(v)=F_Y^{-1}(v)
+\]
+
+denote the marginal quantile functions.
+
+Then the co-lower partial moment can be written
+
+\[
+CoLPM_{r,s}(X,Y)
+=
+\int_0^1
+\int_0^1
+(t_X - Q_X(u))_+^r
+(t_Y - Q_Y(v))_+^s
+\, dC(u,v).
+\]
+
+**Proof.**
+
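+Let \((U,V) = (F_X(X), F_Y(Y))\). For continuous marginals, \((U,V)\) has distribution \(C\) by Sklar’s theorem, and the quantile transform gives
+
+\[
+(X,Y) =
+(Q_X(U), Q_Y(V))
+\quad \text{almost surely.}
+\]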
+
+Substituting into the definition of the co-lower partial moment gives
+
+\[
+E[(t_X-Q_X(U))_+^r (t_Y-Q_Y(V))_+^s].
+\]
+
+Expressing the expectation with respect to the copula distribution yields the result. ∎
+
+This representation shows that copulas describe **probability mass over directional regions**, while co-partial moments additionally weight observations by their **deviation magnitudes**.
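+
+The theorem can be checked numerically in the simplest case. This check assumes the independence copula \(C(u,v) = uv\), so that \(dC(u,v) = du\,dv\) and the double integral factorizes. It also assumes standard normal marginals, benchmarks \(t_X = t_Y = 0\), and \(r = s = 1\). Both the copula-space integral and a Monte Carlo estimate on the original scale should approach the closed form \(\phi(0)^2 = 1/(2\pi)\).
+
+```r
+# Copula side: (0 - Q_X(u))_+ vanishes for u > 1/2, so integrate over (0, 1/2)
+inner <- integrate(function(u) -qnorm(u), lower = 0, upper = 0.5)$value
+copula_integral <- inner^2             # the independence copula factorizes
+
+# Original scale: E[(0 - X)_+ (0 - Y)_+] with X, Y independent N(0, 1)
+set.seed(2024)
+n <- 2e5
+x <- rnorm(n)
+y <- rnorm(n)
+monte_carlo <- mean(pmax(-x, 0) * pmax(-y, 0))
+
+c(copula_integral = copula_integral, monte_carlo = monte_carlo,
+  exact = 1 / (2 * pi))
+```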
+
+---
+
+## Example: Directional Dependence Surface and Copula Transformation
+
+The following illustration uses functions from the **NNS R package** introduced earlier. In particular, `Co.LPM()` computes co-lower partial moments and `LPM.ratio()` produces the probability transform used for directional ranking.
+
+Generate correlated Gaussian observations:
+
+```r
+library(MASS)  # mvrnorm()
+library(NNS)   # Co.LPM(), LPM.ratio(), NNS.copula()
+
+set.seed(123)
+
+Sigma <- matrix(c(1,0.7,0.7,1),2,2)
+xy <- mvrnorm(100,c(0,0),Sigma)
+
+x <- xy[,1]
+y <- xy[,2]
+
+z <- expand.grid(x,y)
+```
+
+Plot the **Co-Lower Partial Moment surface** relative to benchmark \(t_X=t_Y=0\):
+
+```r
+rgl::plot3d(
+ z[,1],
+ z[,2],
+ Co.LPM(0,z[,1],z[,2],z[,1],z[,2]),
+ col="red"
+)
+```
+
+In the call `Co.LPM(0, z[,1], z[,2], z[,1], z[,2])`, the argument order is `(degree, x, y, target.x, target.y)`. Setting `degree = 0` produces the probability-level co-lower partial moment, and reusing `z[,1]`/`z[,2]` as both variables and targets evaluates the surface over the full grid of benchmark pairs. The visualization uses `rgl::plot3d`, which produces an interactive three-dimensional plot.
+
+Because `degree = 0`, this surface reports **joint downside probabilities** rather than deviation magnitudes, evaluated over benchmark pairs in the original variable space.
+
+Next transform the variables into probability space:
+
+```r
+u_x <- LPM.ratio(0,x,x)
+u_y <- LPM.ratio(0,y,y)
+
+z <- expand.grid(u_x,u_y)
+```
+
+Plotting the same directional statistic in probability space gives
+
+```r
+rgl::plot3d(
+ z[,1],
+ z[,2],
+ Co.LPM(0,z[,1],z[,2],z[,1],z[,2]),
+ col="blue"
+)
+```
+
+
+![Figure 11.2. Co-LPM surface in copula/probability space \([0,1]^2\) (blue), showing the same dependence geometry after marginal probability transformation.](images/ch11_transformed_copula.png)
+
+The resulting surface is defined over the unit square \([0,1]^2\), which represents the **copula domain**.
+
+The transformation
+
+\[
+(X,Y) \rightarrow (F_X(X),F_Y(Y))
+\]
+
+changes the coordinate system but preserves the dependence structure.
+
+For a direct multivariate dependence summary in the package, `NNS.copula()` can be called on a matrix of variables:
+
+```r
+set.seed(123)
+z3 <- rnorm(length(x))
+
+NNS.copula(cbind(x, y, z3), plot = TRUE, independence.overlay = TRUE)
+## [1] 0.302
+```
+
+The return value is a single scalar in \([0,1]\) for the full multivariate system, where values closer to 0 indicate near-independence and values closer to 1 indicate stronger joint dependence. When needed, `continuous = TRUE` can be supplied to align with the continuous-CDF formulation used elsewhere in the package vignettes.
+
+---
+
+## Tail Dependence and Directional Moments
+
+Copula theory frequently focuses on **tail dependence**, which measures the probability that variables experience extreme outcomes simultaneously.
+
+Upper tail dependence is defined as
+
+\[
+\lambda_U
+=
+\lim_{u\to1^-}
+P(V>u \mid U>u)
+\]
+
+and lower tail dependence as
+
+\[
+\lambda_L
+=
+\lim_{u\to0^+}
+P(V\le u \mid U\le u).
+\]
+
+These limits exist for many commonly used copula families. In some cases, such as the Gaussian copula, both coefficients equal zero even when correlation is strong.
+
+Directional statistics provides a natural extension of this concept.
+
+Let
+
+\[
+t_X = Q_X(u), \qquad
+t_Y = Q_Y(u)
+\]
+
+denote quantile thresholds approaching the lower tail.
+
+The degree-zero co-partial moment is
+
+\[
+CoLPM_{0,0}(t_X,t_Y)
+=
+P(X\le t_X, Y\le t_Y).
+\]
+
+Then
+
+\[
+\frac{CoLPM_{0,0}(t_X,t_Y)}{P(X\le t_X)}
+=
+P(Y\le t_Y \mid X\le t_X).
+\]
+
+For continuous marginals this conditional probability equals \(C(u,u)/u\), so as \(u\to0^+\) it converges, whenever the limit exists, to the copula lower tail dependence coefficient
+
+\[
+\lambda_L.
+\]
+
+This follows directly from the definition of tail dependence as the limit of conditional copula probabilities.
+
+Higher-order directional moments generalize this concept by weighting deviations within the tail region.
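+
+The limiting behaviour can be illustrated empirically. The sketch below compares the ratio \(\operatorname{CoLPM}_{0,0}/L_0\) at shrinking quantile levels for two cases. The first is a Gaussian sample with \(\rho = 0.7\), where \(\lambda_L = 0\). The second is a Clayton copula with \(\theta = 2\), where \(\lambda_L = 2^{-1/\theta} \approx 0.71\), simulated by the standard gamma-frailty (Marshall-Olkin) construction. The parameter choices are illustrative assumptions, and finite-sample ratios only approximate the limits.
+
+```r
+set.seed(99)
+n <- 1e5
+
+# Gaussian dependence with rho = 0.7 (lambda_L = 0 in the limit)
+g1 <- rnorm(n)
+g2 <- 0.7 * g1 + sqrt(1 - 0.7^2) * rnorm(n)
+
+# Clayton copula, theta = 2, via gamma frailty: U = (1 + E / W)^(-1 / theta)
+theta <- 2
+w  <- rgamma(n, shape = 1 / theta)
+c1 <- (1 + rexp(n) / w)^(-1 / theta)
+c2 <- (1 + rexp(n) / w)^(-1 / theta)
+
+# Empirical CoLPM_{0,0}(Q_X(u), Q_Y(u)) / L_0(Q_X(u); X)
+cond_lower <- function(x, y, u) {
+  in_x <- x <= quantile(x, u)
+  mean(in_x & y <= quantile(y, u)) / mean(in_x)
+}
+
+u_grid <- c(0.10, 0.05, 0.01)
+tail_table <- rbind(gaussian = sapply(u_grid, cond_lower, x = g1, y = g2),
+                    clayton  = sapply(u_grid, cond_lower, x = c1, y = c2))
+colnames(tail_table) <- paste0("u=", u_grid)
+tail_table
+```
+
+The Gaussian ratios should keep falling as \(u\) shrinks, while the Clayton ratios stay near 0.71.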
+
+---
+
+## Comparison with Classical Copula Models
+
+Copula analysis often relies on parametric families such as
+
+- Gaussian copulas
+- Clayton copulas
+- Gumbel copulas
+- Student-t copulas.
+
+These models impose specific functional forms on the dependence structure.
+
+Directional dependence differs in several important ways.
+
+### Nonparametric Structure
+
+Co-partial moments are estimated directly from the data and do not require specifying a copula family.
+
+### Sensitivity to Extreme Deviations
+
+Many classical copulas capture **tail coincidence probabilities** but ignore the magnitude of extreme events. For example, the Gaussian copula has zero tail dependence unless correlation is exactly one, a property that has surprised many practitioners.
+
+Directional moments avoid this limitation by measuring **deviation magnitude within tail regions**.
+
+### Benchmark Flexibility
+
+Copula analysis typically evaluates dependence at probability thresholds. Directional statistics instead allows benchmarks to be specified directly in the variable space.
+
+---
+
+## Multivariate Extension
+
+Copula theory extends naturally to higher dimensions.
+
+For variables
+
+\[
+X_1,\dots,X_d
+\]
+
+the joint distribution can be written
+
+\[
+F(x_1,\dots,x_d)
+=
+C(F_1(x_1),\dots,F_d(x_d)).
+\]
+
+Directional statistics extends similarly.
+
+Benchmarks \(t_1,\dots,t_d\) partition the sample space into directional regions across all variables. Each variable contributes two directional states (above or below its benchmark), producing \(2^d\) joint regions.
+
+Multivariate co-partial moments measure deviations within these regions. For example,
+
+\[
+E[(t_1-X_1)_+(t_2-X_2)_+(t_3-X_3)_+]
+\]
+
+captures joint downside deviations across three variables.
+
+In practice analysts often focus on specific regions of interest—such as the region where all variables fall below their benchmarks—rather than enumerating all \(2^d\) regions explicitly.
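+
+The base-R sketch below works through \(d = 3\), assuming mean benchmarks and a simple one-factor construction for the dependence; both are illustrative choices. Each observation is coded by three above/below bits, and the \(2^3 = 8\) region probabilities sum to one. The all-below co-partial moment \(E[(t_1-X_1)_+(t_2-X_2)_+(t_3-X_3)_+]\) receives contributions only from the all-below region.
+
+```r
+set.seed(4)
+n <- 10000
+common <- rnorm(n)
+# Three variables sharing one common factor (illustrative dependence model)
+X <- sapply(1:3, function(j) 0.6 * common + 0.8 * rnorm(n))
+bench <- colMeans(X)                      # benchmarks t_1, t_2, t_3
+
+# Code each observation's region: 1 = above benchmark, 0 = at or below
+above  <- sweep(X, 2, bench, FUN = ">") * 1L
+region <- apply(above, 1, paste, collapse = "")
+all_regions <- do.call(paste0, expand.grid(0:1, 0:1, 0:1))
+probs <- table(factor(region, levels = all_regions)) / n
+probs
+sum(probs)                                # 1, up to floating-point rounding
+
+# Joint downside co-partial moment of degree (1, 1, 1)
+down <- pmax(matrix(bench, n, 3, byrow = TRUE) - X, 0)
+co_lpm_111 <- mean(down[, 1] * down[, 2] * down[, 3])
+co_lpm_111
+```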
+
+---
+
+## Structural Interpretation
+
+Copulas separate **marginal distributions** from **dependence structure**.
+
+Directional statistics provides a complementary perspective:
+
+- Marginal distributions arise from **degree-zero partial moments**.
+- Dependence arises from **co-partial moments across directional regions**.
+
+Thus both frameworks describe the same joint structure from different viewpoints.
+
+Copulas emphasize **rank-based probability structure**, while directional moments emphasize **benchmark-relative deviations**.
+
+---
+
+## Summary
+
+This chapter connected directional dependence with copula theory.
+
+Key observations include:
+
+1. Copulas represent dependence in probability space.
+2. Probability transforms map observations into the unit square.
+3. Directional benchmarks partition copula space into four dependence regions.
+4. Degree-zero co-partial moments correspond to copula region probabilities.
+5. Higher-order co-partial moments generalize tail dependence by weighting extreme deviations.
+6. Directional methods provide a nonparametric and magnitude-sensitive interpretation of copula dependence.
+
+Directional partial moments therefore offer a natural bridge between benchmark-based statistics and rank-based dependence analysis.
+
+The next chapter develops conditional probability and Bayes' theorem from the partial-moment framework.
diff --git a/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd b/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd
new file mode 100644
index 0000000..6aa8a7a
--- /dev/null
+++ b/tools/NNS/book/chapter-13-conditional-probability-and-bayes-theorem.Rmd
@@ -0,0 +1,403 @@
+# Conditional Probability and Bayes' Theorem
+
+Chapters 9–11 established the directional framework for analyzing relationships between variables. A central missing ingredient for inference is **conditional probability**: how the probability of one event changes when information about another event becomes available.
+
+Classical statistics defines conditional probability through probability ratios. The directional framework provides a deeper interpretation: conditional probabilities arise directly from **degree-zero co-partial moments**, and Bayes' theorem follows as a simple algebraic identity within this structure. Moreover, degree-one co-partial moments go further — they are not merely risk metrics but **distributional generators** from which the full joint law can be recovered through differentiation.
+
+This chapter develops the directional formulation of conditional probability, derives Bayes' theorem from partial-moment relationships, and connects degree-zero and degree-one co-partial moments to establish their joint role in inference.
+
+---
+
+## Classical Conditional Probability
+
+Let \(A\) and \(B\) be events with \(P(B) > 0\).
+
+The classical definition of conditional probability is
+
+\[
+P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
+\]
+
+The numerator represents the probability that both events occur simultaneously. The denominator represents the probability that the conditioning event occurs.
+
+Conditional probability therefore measures the **relative frequency of \(A\) within the subset of outcomes where \(B\) occurs**.
+
+This definition leads directly to the **multiplication rule**
+
+\[
+P(A \cap B) = P(A \mid B)\,P(B).
+\]
+
+The directional framework reproduces these relationships naturally through partial moments — and in doing so, reveals their geometric origin in the joint distribution.
+
+---
+
+## Events as Degree-Zero Partial Moments
+
+From Chapter 3, the cumulative distribution function can be written as a degree-zero lower partial moment:
+
+\[
+F_X(t) = L_0(t;\,X) = P(X \le t).
+\]
+
+Events defined by inequalities correspond directly to degree-zero partial moments. For two variables \(X\) and \(Y\),
+
+\[
+P(X \le t_X) = L_0(t_X;\,X), \qquad P(Y \le t_Y) = L_0(t_Y;\,Y).
+\]
+
+Symmetrically, the survival function from Chapter 3 gives
+
+\[
+P(X > t_X) = U_0(t_X;\,X), \qquad P(Y > t_Y) = U_0(t_Y;\,Y).
+\]
+
+The joint events across all four quadrants defined by benchmarks \((t_X, t_Y)\) correspond to the four **degree-zero co-partial moments**:
+
+\[
+\mathrm{CoLPM}_{0,0}(t_X,t_Y) = P(X \le t_X,\; Y \le t_Y),
+\]
+
+\[
+\mathrm{CoUPM}_{0,0}(t_X,t_Y) = P(X > t_X,\; Y > t_Y),
+\]
+
+\[
+\mathrm{DLPM}_{0,0}(t_X,t_Y) = P(X \le t_X,\; Y > t_Y),
+\]
+
+\[
+\mathrm{DUPM}_{0,0}(t_X,t_Y) = P(X > t_X,\; Y \le t_Y).
+\]
+
+The concordant moments, CoLPM and CoUPM, capture joint movement in the same directional region. The **divergent moments**, DLPM and DUPM, capture the two cross-quadrant regions where \(X\) and \(Y\) move in opposite directions relative to their benchmarks — directly parallel to the divergent co-partial moment structure developed in Chapter 10. All four are degree-zero specializations of the general co-partial moment framework.
+
+---
+
+## The Four-Quadrant Probability Partition
+
+Benchmarks \((t_X, t_Y)\) partition the joint distribution into four mutually exclusive regions:
+
+| | \(Y \le t_Y\) | \(Y > t_Y\) |
+|------------------|----------------------|----------------------|
+| \(X \le t_X\) | CoLPM\(_{0,0}\) | DLPM\(_{0,0}\) |
+| \(X > t_X\) | DUPM\(_{0,0}\) | CoUPM\(_{0,0}\) |
+
+Because these four regions partition the joint distribution completely,
+
+\[
+\mathrm{CoLPM}_{0,0}
++ \mathrm{CoUPM}_{0,0}
++ \mathrm{DLPM}_{0,0}
++ \mathrm{DUPM}_{0,0}
+= 1.
+\]
+
+This is the **degree-zero partition of unity**: each observation falls in exactly one quadrant, so its probability mass (\(1/n\) in a sample of size \(n\)) is counted exactly once. It is a complete nonparametric probability representation of the joint distribution relative to any pair of benchmarks.
+
+In NNS R-package notation:
+\[
+\begin{aligned}
+1 &= \texttt{Co.UPM}(0,X,Y,t_X,t_Y)
+ + \texttt{D.UPM}(0,0,X,Y,t_X,t_Y) \\
+ &\quad + \texttt{D.LPM}(0,0,X,Y,t_X,t_Y)
+ + \texttt{Co.LPM}(0,X,Y,t_X,t_Y).
+\end{aligned}
+\]
+
+where the four terms correspond respectively to
+\(P(X>t_X,\,Y>t_Y)\),
+\(P(X>t_X,\,Y\le t_Y)\),
+\(P(X\le t_X,\,Y>t_Y)\), and
+\(P(X\le t_X,\,Y\le t_Y)\).
+Conditional probabilities are simply **relative weights of these regions after conditioning on one of the marginals**.
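+
+The partition can be checked on simulated data. The sketch below is written in base R rather than with the NNS functions, so it does not depend on a particular NNS version. The helper name `quadrant_probs()` is hypothetical, and the simulated design (a linear relation plus Gaussian noise) is only an illustration.
+
+```r
+# Empirical degree-zero co-partial moments: the share of observations
+# falling in each quadrant defined by the benchmarks (t_x, t_y).
+quadrant_probs <- function(x, y, t_x, t_y) {
+  c(
+    CoLPM = mean(x <= t_x & y <= t_y),  # X low,  Y low
+    CoUPM = mean(x >  t_x & y >  t_y),  # X high, Y high
+    DLPM  = mean(x <= t_x & y >  t_y),  # X low,  Y high
+    DUPM  = mean(x >  t_x & y <= t_y)   # X high, Y low
+  )
+}
+
+set.seed(123)
+x <- rnorm(1000)
+y <- 0.5 * x + rnorm(1000)
+
+p <- quadrant_probs(x, y, t_x = 0, t_y = 0)
+round(p, 3)
+sum(p)  # 1, up to floating-point rounding
+```
+
+Because each observation satisfies exactly one of the four inequality pairs, the four shares add to one whatever the data or the benchmarks.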
+
+---
+
+## Conditional Probability from Co-Partial Moments
+
+All eight conditional probabilities arising from the four-quadrant partition can be expressed as ratios of a co-partial moment to a marginal partial moment. We organize them by quadrant.
+
+### Concordant lower-tail conditioning
+
+\[
+P(Y \le t_Y \mid X \le t_X)
+= \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad
+P(X \le t_X \mid Y \le t_Y)
+= \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}.
+\]
+
+### Concordant upper-tail conditioning
+
+\[
+P(Y > t_Y \mid X > t_X)
+= \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)}, \qquad
+P(X > t_X \mid Y > t_Y)
+= \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)}.
+\]
+
+### Discordant conditioning
+
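+\[
+P(Y > t_Y \mid X \le t_X)
+= \frac{\mathrm{DLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad
+P(Y \le t_Y \mid X > t_X)
+= \frac{\mathrm{DUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)},
+\]
+
+\[
+P(X \le t_X \mid Y > t_Y)
+= \frac{\mathrm{DLPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)}, \qquad
+P(X > t_X \mid Y \le t_Y)
+= \frac{\mathrm{DUPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}.
+\]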
+
+Each formula follows from the same logic: the joint probability of the relevant quadrant divided by the marginal probability of the conditioning event, which must be positive for the ratio to be defined. Together, these eight expressions **complete the four-quadrant conditional probability picture**: every conditional probability of a threshold event on one variable, given a threshold event on the other, is a ratio of a degree-zero co-partial moment to a marginal degree-zero partial moment.
+
+In NNS notation, letting \(A = \{X > t_X\}\) and \(B = \{Y > t_Y\}\):
+
+\[
+P(A) = \texttt{UPM}(0,t_X,X), \qquad
+P(B) = \texttt{UPM}(0,t_Y,Y),
+\]
+
+\[
+P(B \mid A) = \frac{\texttt{Co.UPM}(0,X,Y,t_X,t_Y)}{\texttt{UPM}(0,t_X,X)}, \qquad
+P(A \mid B) = \frac{\texttt{Co.UPM}(0,X,Y,t_X,t_Y)}{\texttt{UPM}(0,t_Y,Y)}.
+\]
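+
+Each of the eight ratios can be computed from quadrant shares and checked against the conditional frequency obtained by subsetting the data directly. The following is a base R sketch. The helper `cond_probs()`, its output labels, and the quadratic simulated design are hypothetical choices made for this illustration.
+
+```r
+# Conditional probabilities as ratios of degree-zero co-partial moments
+# to marginal degree-zero partial moments (base R sketch).
+cond_probs <- function(x, y, t_x, t_y) {
+  co_lpm <- mean(x <= t_x & y <= t_y)
+  co_upm <- mean(x >  t_x & y >  t_y)
+  d_lpm  <- mean(x <= t_x & y >  t_y)
+  d_upm  <- mean(x >  t_x & y <= t_y)
+  L0x <- mean(x <= t_x); U0x <- mean(x > t_x)   # marginal partial moments of X
+  L0y <- mean(y <= t_y); U0y <- mean(y > t_y)   # marginal partial moments of Y
+  c("Y<=ty | X<=tx" = co_lpm / L0x, "X<=tx | Y<=ty" = co_lpm / L0y,
+    "Y>ty | X>tx"   = co_upm / U0x, "X>tx | Y>ty"   = co_upm / U0y,
+    "Y>ty | X<=tx"  = d_lpm  / L0x, "X<=tx | Y>ty"  = d_lpm  / U0y,
+    "Y<=ty | X>tx"  = d_upm  / U0x, "X>tx | Y<=ty"  = d_upm  / L0y)
+}
+
+set.seed(7)
+x <- rnorm(2000)
+y <- x^2 + rnorm(2000)                   # a nonlinear relationship
+cp <- cond_probs(x, y, t_x = 0, t_y = 1)
+round(cp, 3)
+
+# Cross-check against direct subsetting: both give P(Y > 1 | X > 0)
+c(ratio = unname(cp["Y>ty | X>tx"]), subset = mean(y[x > 0] > 1))
+```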
+
+---
+
+## Bayes' Theorem
+
+Bayes' theorem describes how conditional probabilities relate when the conditioning direction is reversed.
+
+Starting from the multiplication rule,
+
+\[
+P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A).
+\]
+
+Equating the two expressions and solving for \(P(A \mid B)\) yields Bayes' theorem:
+
+\[
+P(A \mid B)
+=
+\frac{P(B \mid A)\,P(A)}{P(B)}.
+\]
+
+This identity allows probabilities to be updated when new information becomes available. The directional framework shows that it is more than an algebraic manipulation of probability ratios. Both conditional probabilities share the same co-partial moment as their numerator, because the joint quadrant probability does not depend on which variable is conditioned on.
+
+---
+
+## Bayes' Theorem from Partial Moments
+
+Using the directional framework, Bayes' theorem follows immediately from co-partial moment identities.
+
+### Lower-tail derivation
+
+Let \(A = \{X \le t_X\}\) and \(B = \{Y \le t_Y\}\). From Section 13.4,
+
+\[
+P(B \mid A) = \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_X;\,X)}, \qquad
+P(A \mid B) = \frac{\mathrm{CoLPM}_{0,0}(t_X,t_Y)}{L_0(t_Y;\,Y)}.
+\]
+
+Rearranging the first equation gives
+
+\[
+\mathrm{CoLPM}_{0,0}(t_X,t_Y) = P(B \mid A)\,L_0(t_X;\,X).
+\]
+
+Substituting into the second equation,
+
+\[
+P(A \mid B)
+= \frac{P(B \mid A)\,L_0(t_X;\,X)}{L_0(t_Y;\,Y)}.
+\]
+
+Since \(L_0(t_X;\,X) = P(A)\) and \(L_0(t_Y;\,Y) = P(B)\),
+
+\[
+\boxed{P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.}
+\]
+
+### Upper-tail derivation
+
+The identical derivation holds in the upper region. Let \(A = \{X > t_X\}\) and \(B = \{Y > t_Y\}\). Replacing CoLPM with CoUPM and \(L_0\) with \(U_0\) throughout,
+
+\[
+P(B \mid A) = \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_X;\,X)}, \qquad
+P(A \mid B) = \frac{\mathrm{CoUPM}_{0,0}(t_X,t_Y)}{U_0(t_Y;\,Y)},
+\]
+
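+which yields the same Bayes identity by the same algebra. The divergent regions work the same way. With \(A = \{X \le t_X\}\) and \(B = \{Y > t_Y\}\), \(\mathrm{DLPM}_{0,0}\) is the shared numerator and \(L_0(t_X;\,X)\) and \(U_0(t_Y;\,Y)\) are the denominators; \(\mathrm{DUPM}_{0,0}\) plays the same role for \(A = \{X > t_X\}\) and \(B = \{Y \le t_Y\}\), with denominators \(U_0(t_X;\,X)\) and \(L_0(t_Y;\,Y)\). **Bayes' theorem holds symmetrically across all four directional regions**, reflecting the structural symmetry of the co-partial moment framework rather than any special property of the lower tail.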
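+
+Because both conditional probabilities share the same co-partial moment numerator, Bayes' identity holds exactly for empirical frequencies, in either tail. The base R sketch below checks this numerically. The helper `bayes_check()` is a hypothetical name, and the lognormal and Student-t design is only an illustration of skewed, heavy-tailed data.
+
+```r
+# Check P(A|B) = P(B|A) P(A) / P(B) for lower-tail or upper-tail events,
+# computing every probability from empirical (co-)partial moments.
+bayes_check <- function(x, y, t_x, t_y, tail = c("lower", "upper")) {
+  tail <- match.arg(tail)
+  in_a <- if (tail == "lower") x <= t_x else x > t_x   # event A on X
+  in_b <- if (tail == "lower") y <= t_y else y > t_y   # event B on Y
+  co  <- mean(in_a & in_b)      # CoLPM_{0,0} or CoUPM_{0,0}
+  p_a <- mean(in_a)             # L_0(t_x; X) or U_0(t_x; X)
+  p_b <- mean(in_b)             # L_0(t_y; Y) or U_0(t_y; Y)
+  p_b_given_a <- co / p_a
+  p_a_given_b <- co / p_b
+  c(direct = p_a_given_b, via_bayes = p_b_given_a * p_a / p_b)
+}
+
+set.seed(99)
+x <- rlnorm(1500)                  # skewed
+y <- log(x) + rt(1500, df = 3)     # heavy-tailed noise
+bayes_check(x, y, t_x = 1, t_y = 0, tail = "lower")
+bayes_check(x, y, t_x = 1, t_y = 0, tail = "upper")
+```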
+
+---
+
+## Posterior Probability Interpretation
+
+Bayesian inference interprets probabilities as quantities that update when new information becomes available.
+
+Let
+
+- \(P(A)\) be the **prior probability**,
+- \(P(B \mid A)\) the **likelihood**,
+- \(P(A \mid B)\) the **posterior probability**.
+
+Within the directional framework:
+
+- **priors** correspond to marginal degree-zero partial moments — the probability mass of a directional region before conditioning,
+- **likelihoods** correspond to conditional probabilities derived from co-partial moments — how that mass concentrates when the other variable is observed,
+- **posteriors** represent **updated directional probabilities after conditioning** — the renormalized weight of one quadrant given information from another.
+
+Bayesian updating therefore corresponds to **redistributing probability weight across the four directional regions of the joint distribution**. The four-quadrant partition of Section 13.3 is the geometric object being operated on; Bayes' theorem is the renormalization rule.
+
+---
+
+## Example
+
+Suppose a dataset contains observations of two variables \(X\) and \(Y\). Let the benchmarks be
+
+\[
+t_X = 0, \qquad t_Y = 0.
+\]
+
+Assume empirical probabilities are
+
+\[
+P(X \le 0) = 0.4, \qquad P(Y \le 0) = 0.5, \qquad P(X \le 0,\; Y \le 0) = 0.3.
+\]
+
+Then
+
+\[
+P(Y \le 0 \mid X \le 0) = \frac{0.3}{0.4} = 0.75, \qquad
+P(X \le 0 \mid Y \le 0) = \frac{0.3}{0.5} = 0.6.
+\]
+
+Applying Bayes' theorem as a check:
+
+\[
+P(X \le 0 \mid Y \le 0)
+= \frac{P(Y \le 0 \mid X \le 0)\,P(X \le 0)}{P(Y \le 0)}
+= \frac{0.75 \times 0.4}{0.5} = 0.6. \checkmark
+\]
+
+The directional framework identifies \(\mathrm{CoLPM}_{0,0}(0,0) = 0.3\) as the probability mass in the joint lower-left region. Each conditional probability expresses that mass relative to the mass of the conditioning event (0.4 for \(X \le 0\), 0.5 for \(Y \le 0\)), computed directly from partial-moment ratios without distributional assumptions.
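+
+The numbers in this example can be reproduced with a hypothetical ten-observation dataset, constructed so that its empirical probabilities equal the assumed values exactly.
+
+```r
+# Ten observations built so that P(X<=0)=0.4, P(Y<=0)=0.5, P(X<=0, Y<=0)=0.3
+x <- c(-1, -1, -1, -1,  1,  1, 1, 1, 1, 1)
+y <- c(-1, -1, -1,  1, -1, -1, 1, 1, 1, 1)
+
+p_x <- mean(x <= 0)              # L_0(0; X)
+p_y <- mean(y <= 0)              # L_0(0; Y)
+co  <- mean(x <= 0 & y <= 0)     # CoLPM_{0,0}(0, 0)
+
+p_y_given_x <- co / p_x          # P(Y <= 0 | X <= 0)
+p_x_given_y <- co / p_y          # P(X <= 0 | Y <= 0)
+p_y_given_x * p_x / p_y          # Bayes' theorem recovers P(X <= 0 | Y <= 0)
+```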
+
+---
+
+## The Degree-One Extension: Co-Partial Moments as Distributional Generators
+
+The analysis so far has operated at degree zero, where co-partial moments are indicator-level probability masses. A natural question arises: what additional structure is carried by **degree-one co-partial moments**?
+
+### The hinge surface
+
+Define the **degree-one lower co-partial moment surface**
+
+\[
+H(t_X, t_Y) = E\!\bigl[(t_X - X)_+\,(t_Y - Y)_+\bigr].
+\]
+
+This replaces indicator contributions with **hinge magnitudes** — continuous functions of how far each variable falls below its benchmark. Unlike degree-zero moments, which record whether an observation lands in a quadrant, degree-one moments record **how far into that quadrant it lies**.
+
+### Continuous partition of unity
+
+Once normalized by their total, degree-one co-partial moments form a continuous partition of unity over the same four-quadrant geometry as degree zero. Defining concordant and divergent degree-one quantities
+
+\[
+C^{--}(t_X,t_Y) = E[(t_X-X)_+(t_Y-Y)_+], \quad
+C^{++}(t_X,t_Y) = E[(X-t_X)_+(Y-t_Y)_+],
+\]
+
+\[
+D^{+-}(t_X,t_Y) = E[(X-t_X)_+(t_Y-Y)_+], \quad
+D^{-+}(t_X,t_Y) = E[(t_X-X)_+(Y-t_Y)_+],
+\]
+
+and total magnitude \(S = C^{--} + C^{++} + D^{+-} + D^{-+}\), the normalized weights
+
+\[
+w^{--} = \frac{C^{--}}{S}, \quad w^{++} = \frac{C^{++}}{S}, \quad w^{+-} = \frac{D^{+-}}{S}, \quad w^{-+} = \frac{D^{-+}}{S}
+\]
+
+satisfy \(w^{--} + w^{++} + w^{+-} + w^{-+} = 1\) with all weights non-negative whenever \(S > 0\). The case \(S = 0\) is degenerate: since all four hinge products are non-negative, \(S = 0\) implies each term is zero, which in turn requires that for every observation, \(X = t_X\) or \(Y = t_Y\) (or both). This is a measure-zero event under any absolutely continuous distribution but can arise in discrete data; in practice one simply avoids placing benchmarks at point masses. In the limit as degree approaches zero, the normalized weights collapse to the hard quadrant probabilities of Section 13.3.
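+
+The four degree-one quantities and their normalized weights are sample averages of hinge products, so they take only a few lines to compute. This is a base R sketch. `hinge_weights()` is a hypothetical helper, and it stops on the degenerate case \(S = 0\) rather than choosing a convention for it.
+
+```r
+# Degree-one co-partial moments and their normalized weights.
+hinge_weights <- function(x, y, t_x, t_y) {
+  lx <- pmax(t_x - x, 0); ux <- pmax(x - t_x, 0)   # hinge deviations of X
+  ly <- pmax(t_y - y, 0); uy <- pmax(y - t_y, 0)   # hinge deviations of Y
+  m <- c(
+    C_mm = mean(lx * ly),   # C^{--}
+    C_pp = mean(ux * uy),   # C^{++}
+    D_pm = mean(ux * ly),   # D^{+-}
+    D_mp = mean(lx * uy)    # D^{-+}
+  )
+  S <- sum(m)
+  if (S == 0) stop("degenerate benchmarks: S = 0")
+  m / S
+}
+
+# One observation in the lower-left quadrant, one in the upper-right
+w <- hinge_weights(x = c(-2, 1), y = c(-1, 3), t_x = 0, t_y = 0)
+w   # C_mm = 0.4, C_pp = 0.6, D_pm = 0, D_mp = 0
+```
+
+At degree zero these two observations would split the mass evenly. At degree one the upper-right point receives more weight because its hinge product (\(1 \times 3 = 3\)) exceeds that of the lower-left point (\(2 \times 1 = 2\)).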
+
+### Distributional recovery
+
+The hinge surface carries more information than its degree-zero counterpart. The following result shows that \(H\) is a complete representation of the joint law.
+
+**Theorem** (Distributional Recovery). Assume \(E|X|\), \(E|Y|\), and \(E|XY|\) are finite, so that \(H\) is finite, and that differentiation under the expectation is valid (e.g., by dominated convergence). Then at all continuity points of the joint CDF \(F_{X,Y}\),
+
+\[
+\frac{\partial^2 H}{\partial t_X\,\partial t_Y}(t_X, t_Y) = F_{X,Y}(t_X, t_Y).
+\]
+
+If \(F_{X,Y}\) is absolutely continuous with sufficiently smooth density \(f_{X,Y}\), then
+
+\[
+\frac{\partial^4 H}{\partial t_X^2\,\partial t_Y^2}(t_X, t_Y) = f_{X,Y}(t_X, t_Y).
+\]
+
+Consequently, \(H(\cdot,\cdot)\) over all threshold pairs uniquely determines the joint law and, when it exists, the joint density. The qualification "at all continuity points of \(F_{X,Y}\)" is essential: for discrete distributions, the CDF has jump discontinuities and the derivative identities hold only at points where \(F_{X,Y}\) is continuous.
+
+**Proof sketch.** To justify differentiation under the expectation, assume there exists an integrable envelope dominating the local difference quotients of the hinge terms in a neighborhood of \((t_X,t_Y)\); this is exactly the dominated-convergence qualification in the theorem statement. Using \(\partial_{t_X}(t_X - X)_+ = \mathbf{1}\{X \le t_X\}\) and \(\partial_{t_Y}(t_Y - Y)_+ = \mathbf{1}\{Y \le t_Y\}\) almost everywhere,
+
+\[
+\frac{\partial H}{\partial t_X}(t_X,t_Y) = E\!\bigl[\mathbf{1}\{X \le t_X\}(t_Y - Y)_+\bigr],
+\]
+
+and therefore
+
+\[
+\frac{\partial^2 H}{\partial t_X\,\partial t_Y}(t_X,t_Y)
+= E\!\bigl[\mathbf{1}\{X \le t_X\}\,\mathbf{1}\{Y \le t_Y\}\bigr]
+= P(X \le t_X,\; Y \le t_Y)
+= F_{X,Y}(t_X,t_Y). \qquad\square
+\]
+
+The surface \(H\) is directly estimable from data by averaging hinge products. Mixed second derivatives recover the joint CDF numerically via finite differences; further differentiation recovers the density, though this is noisier due to higher-order amplification of sampling variation.
+
+See [Discrete and Continuous Bayes](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/discrete_and_continuous_bayes.md) for a worked example.
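+
+Here is a minimal numerical check. It assumes a standard bivariate normal design with correlation 0.5, chosen only for illustration. The sketch estimates \(H\) by averaging hinge products, takes a central mixed second difference, and compares the result with the empirical joint CDF. For this design the population value is \(F_{X,Y}(0,0) = 1/4 + \arcsin(0.5)/(2\pi) = 1/3\). The helper names `H_hat()` and `mixed_diff()` and the step size `h` are assumptions of the sketch.
+
+```r
+set.seed(2024)
+n <- 5000
+x <- rnorm(n)
+y <- 0.5 * x + sqrt(0.75) * rnorm(n)   # corr(X, Y) = 0.5
+
+# Empirical hinge surface H(t_x, t_y) = mean((t_x - X)_+ (t_y - Y)_+)
+H_hat <- function(t_x, t_y) mean(pmax(t_x - x, 0) * pmax(t_y - y, 0))
+
+# Central mixed second difference approximating d^2 H / (dt_x dt_y)
+mixed_diff <- function(t_x, t_y, h = 0.05) {
+  (H_hat(t_x + h, t_y + h) - H_hat(t_x + h, t_y - h) -
+   H_hat(t_x - h, t_y + h) + H_hat(t_x - h, t_y - h)) / (4 * h^2)
+}
+
+fd  <- mixed_diff(0, 0)
+ecd <- mean(x <= 0 & y <= 0)           # empirical joint CDF at (0, 0)
+c(finite_difference = fd, empirical_cdf = ecd, population = 1/3)
+```
+
+For the empirical law, the mixed second difference equals the average of the empirical joint CDF over the square of half-width `h` around the benchmark pair. This is why a small `h` tracks the CDF closely without further smoothing.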
+
+
+
+### Hierarchy across degrees
+
+This result establishes a natural hierarchy:
+
+- **Degree 0**: Indicator-level probability partition — the four-quadrant decomposition of Sections 13.3–13.6.
+- **Degree 1**: Continuous hinge partition of unity and complete distributional recovery — the minimal degree at which partial moments become full distributional generators.
+- **Higher degrees**: Tail-emphasized continuous partitions that place increasing weight on extreme deviations from the benchmark. These are valuable for benchmark-relative tail-risk analysis, but do not increase representational completeness beyond what degree one already provides. Once the full joint law is recovered, higher degrees refine which parts of the distribution are emphasized, not what is represented.
+
+Thus **degree one is the completeness threshold**: the minimal order at which the full joint law is captured.
+
+---
+
+## Partial Moments as a Bridge Between Bayesian and Frequentist Inference
+
+A deeper consequence of the distributional recovery theorem is that partial moments provide a **law-invariant bridge** between Bayesian and frequentist statistical frameworks.
+
+A functional is **law-invariant** if its value depends only on the distribution of the random variable, not on the specific probability space or the process that generated it. Partial moments are law-invariant in precisely this sense: \(L_n(t;\,X)\) and \(U_n(t;\,X)\) depend only on the distribution of \(X\), not on how that distribution was constructed.
+
+The central distinction between Bayesian and frequentist perspectives is how the probability measure \(P\) is constructed: Bayesians form a **posterior predictive distribution** by updating a prior with data; frequentists approximate the **data-generating measure** directly with the empirical distribution. Both pipelines ultimately produce a probability measure for outcomes \(X\), and once that measure is specified, partial-moment operators act on it identically.
+
+Formally:
+
+- **Bayesian path**: prior \(\pi(\theta)\) → likelihood \(L(D \mid \theta)\) → posterior \(\pi(\theta \mid D)\) → posterior predictive \(P_B = \int P_\theta\,\pi(d\theta \mid D)\) → compute \(L_n(t;\,X)\), \(U_n(t;\,X)\) with \(X \sim P_B\).
+
+- **Frequentist path**: empirical law \(\hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}\) → compute \(L_n(t;\,X)\), \(U_n(t;\,X)\) with \(X \sim \hat{P}_n\).
+
+Because both pipelines reduce to the same partial-moment operators applied to different input distributions, **any two models that agree on the distribution of \(X\) will produce identical partial moments** — not just the normalized degree-one weights, but all partial moments of all degrees. The formula stays the same; only the input distribution changes. This is the formal sense in which partial moments are a practical lingua franca between Bayesian and frequentist workflows.
+
+The directional formulation of Bayes' theorem in Sections 13.5–13.6 is therefore paradigm-agnostic: it holds whether the joint distribution is constructed from a posterior predictive, an empirical distribution, or any other probability measure fed into the co-partial moment operators.
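+
+The point that the operator stays the same while only the input law changes can be made concrete with a single partial-moment function that accepts any discrete law as support points and probabilities. The sketch below assumes hypothetical data (7 successes in 10 Bernoulli trials) and a uniform Beta(1, 1) prior. Under that conjugate setup the posterior predictive is Bernoulli with success probability \((1 + 7)/(2 + 10) = 2/3\), while the empirical law puts probability 0.7 on success. `lpm_law()` is a hypothetical helper, not an NNS function.
+
+```r
+# Degree-n lower partial moment L_n(t; X) of a discrete law
+# given by support points `values` and probabilities `probs`.
+lpm_law <- function(degree, target, values, probs) {
+  stopifnot(isTRUE(all.equal(sum(probs), 1)))
+  contrib <- if (degree == 0) {
+    as.numeric(values <= target)          # indicator: P(X <= t)
+  } else {
+    pmax(target - values, 0)^degree       # hinge magnitude
+  }
+  sum(probs * contrib)
+}
+
+obs <- c(rep(1, 7), rep(0, 3))           # hypothetical Bernoulli outcomes
+support <- c(0, 1)
+
+# Frequentist path: empirical law of the data
+p_freq <- c(mean(obs == 0), mean(obs == 1))
+
+# Bayesian path: Beta(1, 1) prior -> Beta(1 + 7, 1 + 3) posterior
+# -> Bernoulli posterior predictive with P(X = 1) = 8 / 12
+a_post <- 1 + sum(obs)
+b_post <- 1 + sum(1 - obs)
+p_bayes <- c(b_post, a_post) / (a_post + b_post)
+
+# The same operator applied to both input laws
+lpm_law(0, 0.5, support, p_freq)    # 0.3
+lpm_law(0, 0.5, support, p_bayes)   # 1/3
+```
+
+The two results differ only because the input distributions differ. Had both pipelines produced the same law, every call to `lpm_law()` would agree.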
+
+---
+
+## Summary
+
+Conditional probability and Bayes' theorem arise naturally and completely from the partial-moment framework. Key results include:
+
+- Degree-zero partial moments represent **probabilities of directional events**, recovering the CDF and survival function as special cases.
+- Degree-zero co-partial moments represent **joint event probabilities** and partition the joint distribution into four mutually exclusive regions — two concordant (CoLPM, CoUPM) and two divergent (DLPM, DUPM) — summing to one.
+- **All eight conditional probabilities** from the four-quadrant partition are ratios of a degree-zero co-partial moment to a marginal partial moment.
+- **Bayes' theorem follows directly from co-partial moment identities**, holds symmetrically in both the lower and upper tails, and requires no distributional assumptions.
+- **Bayesian updating corresponds to renormalizing the four-quadrant probability partition** after conditioning on one marginal.
+- **Degree-one co-partial moments are distributional generators**: the mixed second derivative of the hinge surface recovers the joint CDF at all continuity points, and the mixed fourth derivative recovers the joint density when it exists. Degree one is the completeness threshold.
+- **Partial moments are law-invariant**: they depend only on the induced distribution of \(X\), making them identical across Bayesian and frequentist pipelines whenever those pipelines agree on the distribution of outcomes.
+
+The next chapter extends these conditional-probability tools to directional causation — asking not merely how probability mass is distributed across variables, but which variable is doing the driving.
diff --git a/tools/NNS/book/chapter-14-directional-causation.Rmd b/tools/NNS/book/chapter-14-directional-causation.Rmd
new file mode 100644
index 0000000..ab0080d
--- /dev/null
+++ b/tools/NNS/book/chapter-14-directional-causation.Rmd
@@ -0,0 +1,305 @@
+# Directional Causation
+
+Chapters 9–11 developed the directional framework for measuring dependence between variables. Co-partial moments were shown to capture asymmetric, nonlinear, and tail-specific joint behavior that classical correlation obscures. The copula interpretation then demonstrated how dependence structure can be separated from marginal distributions entirely.
+
+Dependence alone, however, does not imply **causation**. Building on the conditional-probability and Bayes machinery from the previous chapter, we now turn to directional influence.
+
+Two variables may move together because
+
+- one variable influences the other,
+- both are driven by a common underlying factor, or
+- the relationship arises from structural constraints in the system.
+
+Identifying the direction and strength of causal influence requires more than a symmetric dependence measure. It requires a framework that can detect **which variable is doing the driving**.
+
+Classical approaches to this problem rely on **Granger causality**, which uses parametric time-series regressions to test whether lagged values of one variable improve linear predictions of another. The directional framework offers a different approach: causal structure is inferred from **nonlinear probability relationships between variables after removing each variable's internal dynamics**, without imposing a parametric model.
+
+This chapter develops the **directional causation framework** in three stages: removing internal dynamics through nonlinear lag normalization, placing residual signals on a shared scale through joint rangespace normalization, and measuring causal influence through partial-moment-based conditional probability and asymmetric directional dependence.
+
+---
+
+## Limitations of Classical Granger Causality
+
+In classical time-series analysis, a variable \(X\) is said to *Granger-cause* \(Y\) if past values of \(X\) improve prediction of \(Y\) beyond what \(Y\)'s own history provides.
+
+A typical vector autoregressive model takes the form
+
+\[
+Y_t =
+\sum_{i=1}^{p} a_i Y_{t-i}
++
+\sum_{i=1}^{p} b_i X_{t-i}
++
+\varepsilon_t .
+\]
+
+If the coefficients \(b_i\) are jointly significant, \(X\) is said to Granger-cause \(Y\).
+
+Granger causality captures a genuine and important insight: the causal role of \(X\) in \(Y\) should only be assessed after conditioning on \(Y\)'s own past. The directional framework retains this principle. What changes is how that conditioning is implemented — through nonlinear normalization rather than linear regression — and what the causal evidence consists of — conditional probability and asymmetric dependence rather than regression coefficients.
+
+The parametric Granger approach carries four limitations that are familiar from earlier chapters.
+
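+**Linear specification.** The causal relationship between variables is constrained to be linear. Nonlinear effects can be invisible to the regression coefficients even when the causal relationship is strong and unambiguous. If \(X\) is symmetric about zero and \(Y_t = X_{t-1}^2\), the linear slope of \(Y_t\) on \(X_{t-1}\) is zero because \(\operatorname{Cov}(X^2, X) = E[X^3] = 0\), even though \(Y\) is a deterministic function of lagged \(X\).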
+
+**Symmetric aggregation.** Regression models aggregate deviations symmetrically around the mean. Asymmetric causal effects — where large upward movements in \(X\) drive movements in \(Y\) but small movements do not — are absorbed into the residual.
+
+**Distributional assumptions.** Inference relies on parametric assumptions about the error distribution.
+
+**Model dependence.** Results are sensitive to lag length selection, variable inclusion, and specification choices that the investigator must make before seeing the data.
+
+The directional causation framework addresses all four by working entirely within the partial-moment machinery developed in Chapters 2–11.
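+
+The linear-specification limitation is easy to reproduce. In the sketch below, which uses simulated data as an assumption for illustration, \(Y_t = X_{t-1}^2\) plus small noise, and \(X\) is Gaussian and therefore symmetric about zero. A Granger-style regression of \(Y_t\) on \(Y_{t-1}\) and \(X_{t-1}\), fitted with base R's `lm()`, finds a slope on \(X_{t-1}\) near zero, even though \(Y\) is almost entirely determined by lagged \(X\). Adding the squared lag recovers the relationship.
+
+```r
+set.seed(42)
+n <- 2000
+x <- rnorm(n)
+y <- c(0, x[-n]^2) + 0.1 * rnorm(n)     # Y_t = X_{t-1}^2 + noise
+
+# Align current Y with one lag of each series
+d <- data.frame(y = y[-1], y_lag = y[-n], x_lag = x[-n])
+
+linear_fit <- lm(y ~ y_lag + x_lag, data = d)        # Granger-style linear spec
+coef(summary(linear_fit))["x_lag", ]                 # slope near 0
+
+squared_fit <- lm(y ~ y_lag + I(x_lag^2), data = d)  # correct functional form
+coef(squared_fit)[["I(x_lag^2)"]]                    # slope near 1
+```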
+
+---
+
+## Theoretical Foundations: Three Axioms
+
+Before developing the method, it is useful to state what a causation measure should satisfy. Three axioms motivate the construction.
+
+**Axiom 1 — Self-causation exclusion.** No variable should be identified as causing itself. When \(X = Y\), the causal measure should return a symmetric result indicating no net directional influence, and the diagonal of the causation matrix should be zero.
+
+**Axiom 2 — Nonlinear causation detection.** The measure should accurately identify causal relationships that are nonlinear and directional. A functional relationship \(Y = f(X)\) should register positive causal influence from \(X\) to \(Y\) regardless of whether \(f\) is linear, quadratic, or otherwise nonmonotone.
+
+**Axiom 3 — Directionality proportionality.** Causal strength should scale with the degree of functional asymmetry between the two variables. When \(X\) strongly determines \(Y\) but \(Y\) only weakly constrains \(X\), the statistic should reflect this imbalance clearly and in a stable, interpretable way.
+
+These axioms motivate a measure built from two components: a **conditional probability** that captures whether movements in one variable constrain the range of the other, and an **asymmetric dependence** measure that captures whether those movements are directionally aligned. Neither component alone satisfies all three axioms; together they do.
+
+---
+
+## Lagged Co-Partial Moments
+
+The partial-moment framework extends naturally to temporal relationships. This extension provides the conceptual foundation for the lag-normalization step that follows.
+
+Let \(\{X_t\}\) and \(\{Y_t\}\) be two time series with benchmarks \(t_X\) and \(t_Y\). For a lag \(\tau \ge 0\), define the **lagged co-partial moments**
+
+\[
+\text{CoLPM}_{r,s}^{(\tau)}(X,Y)
+= E\!\Bigl[(t_X - X_{t-\tau})_+^r\,(t_Y - Y_t)_+^s\Bigr]
+\]
+
+\[
+\text{CoUPM}_{r,s}^{(\tau)}(X,Y)
+= E\!\Bigl[(X_{t-\tau} - t_X)_+^r\,(Y_t - t_Y)_+^s\Bigr]
+\]
+
+where \((\cdot)_+ = \max(\cdot,0)\) is the positive-part operator from Chapter 2.
+
+When \(\tau = 0\) these reduce exactly to the contemporaneous co-partial moments from Chapter 10. The lagged versions introduce a temporal asymmetry: because the roles of \(X\) and \(Y\) are evaluated at different time points, exchanging \(X\) and \(Y\) does not produce the same quantity:
+
+\[
+\text{CoUPM}_{r,s}^{(\tau)}(X,Y) \neq \text{CoUPM}_{r,s}^{(\tau)}(Y,X)
+\]
+
+in general. This asymmetry is the partial-moment foundation of directional causation. \(\text{CoUPM}_{r,s}^{(\tau)}\) is large when upward deviations of \(X\) at time \(t-\tau\) tend to be followed by upward deviations of \(Y\) at time \(t\). \(\text{CoLPM}_{r,s}^{(\tau)}\) captures the same co-movement in the downward direction.
+
+Like their contemporaneous counterparts from Chapter 10, lagged co-partial moments are estimated directly from sample data by replacing the expectation with an empirical average over the \(n - \tau\) available observation pairs:
+
+\[
+\widehat{\text{CoUPM}}_{r,s}^{(\tau)}
+=
+\frac{1}{n-\tau}
+\sum_{t=\tau+1}^{n}
+(x_{t-\tau} - t_X)_+^r\,(y_t - t_Y)_+^s.
+\]
+
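+If the pair \((X_t, Y_t)\) is stationary and ergodic and the relevant moments are finite, these estimators converge to their population values by the ergodic theorem, the time-series counterpart of the law of large numbers.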
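+
+The estimator translates directly into code. The base R sketch below uses two hypothetical helpers, `pos_pow()` and `lagged_coupm()`. It treats degree zero as an indicator, matching the degree-zero convention used throughout, because R evaluates `0^0` as 1. It then shows the asymmetry on a short toy series in which \(Y\) repeats \(X\) one period later.
+
+```r
+# (z)_+^k, with degree 0 interpreted as the indicator 1{z > 0}
+pos_pow <- function(z, k) if (k == 0) as.numeric(z > 0) else pmax(z, 0)^k
+
+# Sample lagged co-upper partial moment CoUPM^{(tau)}_{r,s}(X, Y)
+lagged_coupm <- function(x, y, tau, r = 1, s = 1,
+                         t_x = mean(x), t_y = mean(y)) {
+  n <- length(x)
+  stopifnot(length(y) == n, tau >= 0, tau < n)
+  idx <- (tau + 1):n                        # time points t with t - tau >= 1
+  mean(pos_pow(x[idx - tau] - t_x, r) * pos_pow(y[idx] - t_y, s))
+}
+
+x <- c(1, 0, 1, 0)
+y <- c(0, 1, 0, 1)                          # y_t = x_{t-1} for t >= 2
+lagged_coupm(x, y, tau = 1, t_x = 0.5, t_y = 0.5)   # 1/6: lagged X with current Y
+lagged_coupm(y, x, tau = 1, t_x = 0.5, t_y = 0.5)   # 1/12: the reverse ordering
+```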
+
+The causation statistic developed below operationalizes this lagged structure: first removing the self-driven component of each variable, then measuring the residual cross-variable co-movement through conditional probability and asymmetric dependence.
+
+---
+
+## Removing Internal Dynamics
+
+The first computational step separates each variable's **internal temporal dynamics** from its interaction with the other variable.
+
+This is the nonparametric analogue of pre-whitening in classical time-series analysis. In the Granger framework, pre-whitening is accomplished by including the variable's own lags in the regression. In the directional framework, it is accomplished through **nonlinear lag normalization**.
+
+Let \(\tau\) denote the lag order. Form the lag matrix for \(X\):
+
+\[
+\mathbf{X}_\tau = \bigl[X_t,\; X_{t-1},\; \dots,\; X_{t-\tau}\bigr]
+\]
+
+and analogously \(\mathbf{Y}_\tau\) for \(Y\). Apply joint normalization within each lag matrix:
+
+\[
+X_t^{*} = \texttt{NNS.norm}(\mathbf{X}_\tau)[\,\cdot\,,1]
+\qquad
+Y_t^{*} = \texttt{NNS.norm}(\mathbf{Y}_\tau)[\,\cdot\,,1]
+\]
+
+where \(\texttt{NNS.norm}(\cdot)\) implements the empirical CDF transformation — mapping each observation to its relative rank position within the column — a direct application of the degree-zero partial moment \(L_0(t;X) = P(X \le t)\) from Chapter 3. The first column of each normalized matrix gives the representation of the current observation relative to its own lag structure.
+
+This normalization step has a direct partial-moment interpretation. Mapping each observation through the empirical CDF of its lagged neighborhood positions it on the uniform \([0,1]\) scale relative to its own history. The resulting \(X_t^{*}\) represents the **relative standing of the current observation within its own temporal context** — precisely the information that a Granger regression extracts through linear projection, but without imposing a linear model.
+
+Any remaining association between \(X_t^{*}\) and \(Y_t^{*}\) therefore reflects cross-variable interaction rather than self-dependence.
+
+When \(\tau = 0\), the lag-normalization step is skipped and the raw variables are passed directly to the joint normalization step. This corresponds to the cross-sectional case where temporal ordering carries no information. The lag order \(\tau\) may be specified directly, or set to \(\tau = \texttt{"ts"}\) for automatic selection via the detected seasonality of each series using \(\texttt{NNS.seas}\).
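+
+The lag matrix \(\mathbf{X}_\tau\) has the layout produced by base R's `embed()`. Column 1 holds \(X_t\), column \(k+1\) holds \(X_{t-k}\), and \(n - \tau\) complete rows remain. The sketch below shows only this construction, on an illustrative series. It does not reproduce the normalization performed by `NNS.norm`.
+
+```r
+x <- c(10, 12, 11, 15, 14, 16)   # a short illustrative series
+tau <- 2
+
+# Rows are [X_t, X_{t-1}, X_{t-2}] for t = tau + 1, ..., n
+X_tau <- embed(x, tau + 1)
+colnames(X_tau) <- paste0("lag", 0:tau)
+X_tau
+```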
+
+---
+
+## Joint Rangespace Normalization
+
+Once internal dynamics have been removed, the lag-adjusted variables \(X_t^{*}\) and \(Y_t^{*}\) must be placed on a **common scale** before their interaction can be meaningfully measured.
+
+Because the two series may differ in scale, units, and distributional shape, direct comparison of the lag-normalized values can be misleading. The solution draws on the same degree-zero partial moment used in the previous step. From Chapter 3, the degree-zero lower partial moment equals the empirical CDF:
+
+\[
+L_0(t; X) = P(X \le t).
+\]
+
+Mapping both variables jointly through their empirical CDFs places them on a shared \([0,1]\) scale while preserving their relative positions within the joint distribution. Chapter 10 showed that this is precisely the probability integral transform that defines copula space.
+
+The directional framework applies this idea jointly to the lag-adjusted variables:
+
+\[
+\bigl[X_t^{**},\; Y_t^{**}\bigr]
+=
+\texttt{NNS.norm}\!\bigl(\bigl[X_t^{*},\; Y_t^{*}\bigr]\bigr).
+\]
+
+The resulting \(X_t^{**}\) and \(Y_t^{**}\) are copula-like transforms of the lag-adjusted series. Each is approximately uniformly distributed on \([0,1]\), so its degree-zero lower partial moment satisfies \(L_0(t;\,X^{**}) \approx t\), connecting to the copula interpretation of Chapter 10. All subsequent probability and dependence calculations are therefore performed on variables that are simultaneously free of internal dynamics and free of scale differences.
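+
+The probability-integral-transform idea behind this step can be illustrated with base R's `ecdf()`. This sketch uses a plain per-variable empirical CDF (rank) transform as a stand-in for the joint normalization. That substitution is an assumption of the sketch; it is not the `NNS.norm` implementation. The example shows the two properties the text relies on: the transformed values lie on \([0,1]\), and they are unchanged by strictly increasing rescalings of the input.
+
+```r
+set.seed(11)
+x_star <- rexp(500, rate = 0.1)        # skewed, large scale
+y_star <- rnorm(500)                   # symmetric, unit scale
+
+# Empirical CDF transform: each value maps to its share of observations <= it
+to_unit <- function(v) ecdf(v)(v)
+
+x_pp <- to_unit(x_star)
+y_pp <- to_unit(y_star)
+range(x_pp); range(y_pp)               # both within (0, 1]
+
+# Invariance to a strictly increasing change of scale
+all.equal(to_unit(log(x_star)), x_pp)
+```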
+
+---
+
+## Conditional Probability via Partial Moments
+
+With both variables on a shared rangespace, the conditional probability that movements in \(X\) constrain the distribution of \(Y\) can now be measured directly using partial moments.
+
+The **partial-moment conditional probability** is defined as the fraction of \(X^{**}\)'s mass that falls within the observed support of \(Y^{**}\):
+
+\[
+P(X^{**} \mid Y^{**})
+=
+1
+-
+\Bigl[
+L_1\!\bigl(\min(Y^{**});\; X^{**}\bigr)_{\text{ratio}}
++
+U_1\!\bigl(\max(Y^{**});\; X^{**}\bigr)_{\text{ratio}}
+\Bigr]
+\]
+
+where the degree-one ratio forms are
+
+\[
+L_r(t; X)_{\text{ratio}}
+=
+\frac{L_r(t; X)}{L_r(t; X) + U_r(t; X)},
+\qquad
+U_r(t; X)_{\text{ratio}}
+=
+\frac{U_r(t; X)}{L_r(t; X) + U_r(t; X)}.
+\]
+
+When \(L_r(t;X) + U_r(t;X) = 0\), indicating no mass on either side of \(t\), both ratios are defined as zero.
+
+The first subtracted term, \(L_1(\min(Y^{**}); X^{**})_{\text{ratio}}\), measures the deviation-weighted share of \(X^{**}\) lying **below the lower bound** of \(Y^{**}\)'s support. The second, \(U_1(\max(Y^{**}); X^{**})_{\text{ratio}}\), measures the share lying **above the upper bound**. Subtracting both tails from one yields a degree-one analogue of the probability that a randomly drawn value of \(X^{**}\) falls within the range occupied by \(Y^{**}\). The result is a measure of **distributional co-occupancy** grounded entirely in the partial-moment calculus developed in Chapters 2–4.
+
+This measure is not symmetric: \(P(X^{**} \mid Y^{**}) \neq P(Y^{**} \mid X^{**})\) in general, because the support ranges of \(X^{**}\) and \(Y^{**}\) after joint normalization need not be identical. It requires no kernel bandwidth, no distributional assumption, and no parametric model.
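+
+The definition translates into a short function. The base R sketch below implements the degree-one ratio forms with the zero-denominator convention stated above. The helper names `lpm()`, `upm()`, `lpm_ratio()`, `upm_ratio()`, and `pm_cond_prob()` are hypothetical, and the inputs are assumed to be already normalized.
+
+```r
+# Degree-r partial moments of a sample about target t (r >= 1 assumed)
+lpm <- function(r, t, v) mean(pmax(t - v, 0)^r)
+upm <- function(r, t, v) mean(pmax(v - t, 0)^r)
+
+# Ratio forms; both ratios are defined as 0 when L_r + U_r = 0
+lpm_ratio <- function(r, t, v) {
+  s <- lpm(r, t, v) + upm(r, t, v)
+  if (s == 0) 0 else lpm(r, t, v) / s
+}
+upm_ratio <- function(r, t, v) {
+  s <- lpm(r, t, v) + upm(r, t, v)
+  if (s == 0) 0 else upm(r, t, v) / s
+}
+
+# P(X** | Y**): one minus the degree-one shares of X** lying
+# below min(Y**) and above max(Y**)
+pm_cond_prob <- function(x, y) {
+  1 - (lpm_ratio(1, min(y), x) + upm_ratio(1, max(y), x))
+}
+
+x <- c(0, 0.5, 1)
+y <- c(0.25, 0.75)
+pm_cond_prob(x, y)   # 0.6: part of x lies outside the range of y
+pm_cond_prob(y, x)   # 1: all of y lies inside the range of x
+```
+
+The two calls differ because the range of `y` sits inside the range of `x` but not the reverse. This is the asymmetry described above.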
+
+---
+
+## Asymmetric Directional Dependence
+
+Conditional probability alone does not establish the **direction** of co-movement. Two variables may overlap substantially in range while moving in opposite directions, or one may respond to the other only in extreme regions.
+
+To capture directional alignment, the framework uses the asymmetric dependence measures from Chapter 10. From the directional co-partial moment structure, the dependence of \(Y\) on \(X\) and the dependence of \(X\) on \(Y\) need not be equal after joint normalization:
+
+\[
+\rho_{X^{**} \to Y^{**}} \neq \rho_{Y^{**} \to X^{**}}.
+\]
+
+These are computed from the asymmetric directional dependence matrix of the jointly normalized variables — exactly the structure developed in Chapter 10 — using \(\texttt{NNS.dep}(\cdot, \texttt{asym} = \texttt{TRUE})\).
+
+Implementation detail: `asym = TRUE` turns on asymmetric dependence, but **direction is determined by argument order**. In practice, `NNS.dep(x, y, asym = TRUE)` and `NNS.dep(y, x, asym = TRUE)` generally differ; the first quantifies directional dependence of `y` on `x`, while the second quantifies the reverse direction.
+
+Define the **excess directional dependence** of \(Y\) on \(X\) as
+
+\[
+\Delta\rho = \rho_{X^{**} \to Y^{**}} - \rho_{Y^{**} \to X^{**}}.
+\]
+
+When \(\Delta\rho > 0\), movements in \(X\) are more closely tracked by \(Y\) than the reverse — a second and independent signature of causal flow from \(X\) to \(Y\) beyond what conditional overlap alone captures. When \(\Delta\rho \le 0\), this component contributes nothing to the \(X \to Y\) direction.
+
+The use of asymmetric directional dependence here — rather than Pearson correlation — is a direct consequence of Chapter 10. Joint normalization places the variables in copula space, but copula-space variables still exhibit asymmetric tail co-movements that the Chapter 10 framework is designed to detect. Classical symmetric correlation would average away exactly the directional asymmetry that makes the causation statistic informative.
+
+---
+
+## The Raw Directional Causation Statistic
+
+The two components — conditional probability and asymmetric directional dependence — are combined into a single statistic.
+
+The **raw directional causation value** from \(X\) to \(Y\) is
+
+\[
+\tilde{C}_{X \to Y}
+=
+\frac{1}{2}
+\Bigl[
+P(X^{**} \mid Y^{**})
++
+\max\!\bigl(\Delta\rho, 0\bigr)
+\Bigr].
+\]
+
+The first term rewards directional overlap in support after lag and copula-style normalization. The second term adds only the positive directional asymmetry in dependence, so reverse-direction dominance is not allowed to inflate the \(X \to Y\) score.
+
+By construction, \(\tilde{C}_{X \to Y}\in[0,1]\) in standard empirical settings: both components are bounded in \([0,1]\), and the outer factor \(1/2\) averages them.
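+
+Given the two components, the raw score is a one-line computation. In the sketch below, the conditional probability and the two directional dependence values are hypothetical inputs chosen for illustration. In practice the dependence values would come from the asymmetric dependence computation described above. `raw_caus()` is a hypothetical helper name.
+
+```r
+# Raw directional causation score from X to Y
+raw_caus <- function(p_cond, rho_xy, rho_yx) {
+  0.5 * (p_cond + max(rho_xy - rho_yx, 0))   # only positive excess counts
+}
+
+raw_caus(p_cond = 0.8, rho_xy = 0.6, rho_yx = 0.2)   # 0.5 * (0.8 + 0.4) = 0.6
+raw_caus(p_cond = 0.7, rho_xy = 0.2, rho_yx = 0.6)   # 0.5 * (0.7 + 0)   = 0.35
+```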
+
+---
+
+## Bidirectional Normalization
+
+Directional causation should be interpreted comparatively, not in isolation. Define the reverse-direction raw score \(\tilde{C}_{Y \to X}\) by swapping \(X\) and \(Y\) in the same construction.
+
+The final **directional causation index** from \(X\) to \(Y\) is then normalized as
+
+\[
+C_{X \to Y}
+=
+\frac{\tilde{C}_{X \to Y}}{\tilde{C}_{X \to Y}+\tilde{C}_{Y \to X}},
+\qquad
+C_{Y \to X}=1-C_{X \to Y}.
+\]
+
+When both raw scores are zero, neither direction shows measurable directional-causation signal and both normalized values are defined as zero.
+
+Interpretation:
+
+- \(C_{X \to Y} > 0.5\): stronger evidence for \(X\) leading \(Y\),
+- \(C_{X \to Y} < 0.5\): stronger evidence for \(Y\) leading \(X\),
+- \(C_{X \to Y} \approx 0.5\): weak directional asymmetry.
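+
+The bidirectional normalization, including the convention for two zero raw scores, can be sketched as follows. `norm_caus()` is a hypothetical helper, and the raw scores passed to it are illustrative values.
+
+```r
+# Normalize raw scores into the directional causation index
+norm_caus <- function(raw_xy, raw_yx) {
+  total <- raw_xy + raw_yx
+  if (total == 0) return(c(x_to_y = 0, y_to_x = 0))   # no signal either way
+  c(x_to_y = raw_xy / total, y_to_x = raw_yx / total)
+}
+
+norm_caus(0.6, 0.35)   # x_to_y about 0.632: X leading Y
+norm_caus(0.3, 0.3)    # 0.5 each: weak directional asymmetry
+norm_caus(0, 0)        # both 0 by convention
+```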
+
+---
+
+## Summary
+
+This chapter formalized directional causation as a three-stage construction: lag normalization to remove self-dynamics, joint rangespace normalization to align scales, and asymmetric directional scoring to detect net directional flow.
+
+The resulting statistic remains nonparametric, benchmark-relative, and distribution-aware, while avoiding the linear-model restrictions of classical Granger-style tests.
+
+For an applied macroeconomic example using `NNS.nowcast`, `NNS.caus`, noise benchmarks, and strength-of-inference summaries, see [Causal Inference Amongst Macroeconomic Variables Using NNS](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Causal_Inference_Amongst_Macroeconomic_Variables_Using_NNS.html).
+
+The next chapter develops distribution comparison methods, providing nonparametric tools for directly comparing distributions without parametric assumptions.
+
+
+```{r invisible_marker_ch13, echo=FALSE, results='asis'}
+name <- "OVVO Labs Registrant"
+date <- Sys.Date()
+
+cat(sprintf(
+"\\begingroup
+\\color{white}
+\\centering
+\\vspace*{0.45\\textheight}
+{\\tiny Generated for: %s \\quad Date: %s}
+\\par
+\\endgroup
+",
+name, date
+))
+```
diff --git a/tools/NNS/book/chapter-15-distribution-comparison.Rmd b/tools/NNS/book/chapter-15-distribution-comparison.Rmd
new file mode 100644
index 0000000..a1bdd2f
--- /dev/null
+++ b/tools/NNS/book/chapter-15-distribution-comparison.Rmd
@@ -0,0 +1,930 @@
+# Distribution Comparison
+
+Previous chapters developed the directional framework for describing distributions, dependence, and conditional probability. Distribution estimation in Chapter 8 showed how empirical partial moments provide nonparametric estimates of entire distributions, while Chapters 9–13 demonstrated how directional moments reveal dependence, conditional probability, and causal structure.
+
+A natural next question is how to **compare distributions**.
+
+Classical statistics typically approaches this problem through **hypothesis testing**. The Kolmogorov–Smirnov and Mann–Whitney tests ask whether two samples arise from the same underlying distribution, or whether one tends to produce larger values. Parametric t-tests ask the narrower question of whether two population means are equal.
+
+While useful, these procedures emphasize **binary decisions**—reject or fail to reject a null hypothesis—rather than describing how distributions actually differ.
+
+The directional framework approaches distribution comparison differently. Because partial moments characterize probability mass relative to benchmarks, two distributions can be compared directly through their **directional probability structure**.
+
+This chapter develops a nonparametric approach to distribution comparison based on directional probability measures and effect-size interpretations rather than hypothesis-testing decisions. It then introduces the NNS ANOVA procedure, which operationalizes these ideas through the LPM-based continuous CDF. Next it presents **stochastic superiority** as the fundamental pairwise effect-size comparison. It concludes with stochastic dominance, which extends the comparison to ordered preference relations.
+
+---
+
+This chapter is organized into four signposted blocks so readers can move from concepts to implementation: **Block I (Theory)** develops comparison logic and the continuous-CDF correction; **Block II (Estimation mechanics)** defines operational estimators including stochastic superiority and NNS ANOVA; **Block III (Diagnostics)** covers dominance curves and ordered-comparison diagnostics; **Block IV (Applied workflow)** provides examples and practical guidance.
+
+## Block I — Theory and foundational comparisons
+
+### Classical Hypothesis Testing
+
+In classical statistics, comparing two distributions usually begins with a **null hypothesis**
+
+$$
+H_0: F_X(t) = F_Y(t) \quad \text{for all } t.
+$$
+
+The alternative hypothesis states that the two distributions differ.
+
+Several classical tests address this problem.
+
+### Kolmogorov–Smirnov Test
+
+The Kolmogorov–Smirnov statistic compares the empirical distribution functions
+
+$$
+D = \sup_t |\hat F_X(t) - \hat F_Y(t)|.
+$$
+
+Large values of $D$ indicate that the distributions differ.
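+
+For reference, $D$ can be computed directly from the two empirical CDFs. Both step functions jump only at observed values, so the supremum is attained at one of the pooled observations. The base-R sketch below evaluates both CDFs there; the helper name `ks_distance()` is hypothetical. Its result agrees with the statistic reported by `stats::ks.test`.
+
+```r
+# Two-sample Kolmogorov-Smirnov distance from empirical CDFs (base R only).
+ks_distance <- function(x, y) {
+  Fx <- ecdf(x)
+  Fy <- ecdf(y)
+  # Both ECDFs are right-continuous step functions that jump only at
+  # observed values, so the supremum is reached on the pooled points.
+  z <- sort(unique(c(x, y)))
+  max(abs(Fx(z) - Fy(z)))
+}
+
+set.seed(1)
+x <- rnorm(200)
+y <- rnorm(150, mean = 0.3)
+D <- ks_distance(x, y)
+D
+unname(ks.test(x, y)$statistic)   # same value
+```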
+
+### Mann–Whitney Test
+
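+The Mann–Whitney test evaluates whether observations from one sample tend to be larger than those from another sample. Its statistic $U$ counts the cross-sample pairs in which the first sample's observation is larger, counting each tie as one half. For samples of sizes $n$ and $m$, $U/(nm)$ therefore estimates the tie-adjusted exceedance probability discussed below.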
+
+### Parametric Tests
+
+When parametric assumptions are imposed, tests such as the t-test compare population means.
+
+Although widely used, these methods possess several limitations.
+
+**Binary interpretation.**
+Hypothesis tests produce reject / fail-to-reject decisions rather than quantitative descriptions of differences.
+
+**Sample-size dependence.**
+With large samples, even practically trivial differences tend to become statistically significant.
+
+**Model dependence.**
+Parametric tests require assumptions about distributional form.
+
+**Limited directional insight.**
+Most classical tests provide little information about how distributions differ across regions of the support.
+
+The directional framework emphasizes **probability comparisons and effect sizes** instead.
+
+---
+
+### Nonparametric Distribution Comparison
+
+Let $X$ and $Y$ be two random variables with distributions $F_X$ and $F_Y$.
+
+A natural way to compare the distributions is to examine the probability that an observation from one distribution exceeds an observation from the other.
+
+Define independent draws
+
+$$
+X' \sim F_X, \qquad Y' \sim F_Y.
+$$
+
+Consider the probability
+
+$$
+P(X' > Y').
+$$
+
+This quantity measures how frequently values from distribution $X$ exceed values from distribution $Y$.
+
+Similarly,
+
+$$
+P(Y' > X')
+$$
+
+measures the opposite directional comparison.
+
+Because
+
+$$
+P(X' > Y') + P(Y' > X') + P(X'=Y') = 1,
+$$
+
+these three probabilities together give a complete account of how draws from the two distributions are ordered relative to one another.
+
+For continuous distributions, ties occur with probability zero, so
+
+$$
+P(X' > Y') + P(Y' > X') = 1.
+$$
+
+For discrete distributions, ties occur with positive probability. In this case, use the tie-adjusted directional probability
+
+$$
+p^* = P(X' > Y') + \tfrac{1}{2}P(X' = Y').
+$$
+
+This preserves symmetry and keeps the directional comparison centered at $0.5$ when the two distributions are indistinguishable.
+
+This simple probability comparison already provides more information than a binary hypothesis test: it directly measures **which distribution tends to produce larger outcomes**.
+
+---
+
+### Directional Probability and Effect Size
+
+The directional probability
+
+$$
+p = P(X' > Y')
+$$
+
+has a natural interpretation as an **effect-size measure**.
+
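+- $p = 0.5$ indicates that neither distribution tends to produce larger values (for continuous distributions). It does not by itself imply identical distributions: two distributions with the same center but different spreads can also give $p = 0.5$.
+- $p > 0.5$ indicates that $X$ tends to produce larger values.
+- $p < 0.5$ indicates that $Y$ tends to produce larger values.
+
+Unlike hypothesis testing, this directional probability does not depend on arbitrary significance thresholds and remains directly interpretable as a frequency statement. One may also report the symmetric certainty transform $C = |2p - 1|$. On that scale, $C = 0$ corresponds to no directional advantage and $C = 1$ indicates complete separation. This orientation is the reverse of the NNS ANOVA certainty score introduced later in the chapter, where a value of 1 denotes agreement.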
+
+---
+
+### Directional Probability Comparisons
+
+Directional probability comparisons can be extended using partial moments.
+
+Let $t$ be a benchmark. The probability that $X$ exceeds the benchmark while $Y$ does not is
+
+$$
+P(X > t,\, Y \le t).
+$$
+
+Similarly,
+
+$$
+P(Y > t,\, X \le t)
+$$
+
+measures the opposite directional region.
+
+Within the partial-moment framework, these probabilities correspond to **degree-zero divergent co-partial moments**:
+
+$$
+DUPM_{0,0}(t,t) = P(X > t,\, Y \le t),
+$$
+
+$$
+DLPM_{0,0}(t,t) = P(X \le t,\, Y > t).
+$$
+
+The difference
+
+$$
+\Delta(t) = DUPM_{0,0}(t,t) - DLPM_{0,0}(t,t)
+$$
+
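+indicates which distribution dominates relative to the benchmark. Since $P(X > t) = DUPM_{0,0}(t,t) + P(X > t,\, Y > t)$ and $P(Y > t) = DLPM_{0,0}(t,t) + P(X > t,\, Y > t)$, the shared region cancels:
+
+$$
+\Delta(t) = P(X > t) - P(Y > t) = F_Y(t) - F_X(t).
+$$
+
+The sign of $\Delta(t)$ therefore depends only on the two marginal distributions.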
+
+- If $\Delta(t) > 0$, distribution $X$ more frequently exceeds the benchmark.
+- If $\Delta(t) < 0$, distribution $Y$ does.
+
+By examining $\Delta(t)$ across a range of benchmarks, analysts obtain a **directional dominance curve** describing where one distribution exceeds the other.
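+
+The degree-zero divergent co-partial moments can be computed directly with indicators. The base-R sketch below does this for paired, dependent observations. The helper names are hypothetical, and this is not the NNS co-partial moment code. It then compares $\Delta(t)$ with the marginal difference $F_Y(t) - F_X(t)$. The two agree exactly, because the region where both variables exceed $t$ enters $P(X > t)$ and $P(Y > t)$ equally.
+
+```r
+# Degree-zero divergent co-partial moments for paired data (x_i, y_i).
+dupm0 <- function(bench, x, y) mean(x > bench & y <= bench)  # P(X > t, Y <= t)
+dlpm0 <- function(bench, x, y) mean(x <= bench & y > bench)  # P(X <= t, Y > t)
+delta <- function(bench, x, y) dupm0(bench, x, y) - dlpm0(bench, x, y)
+
+set.seed(42)
+x <- rnorm(500)
+y <- 0.5 * x + rnorm(500, mean = -0.2)   # dependent on x, shifted left
+
+delta(0, x, y)
+# Same number from the marginal ECDFs alone: F_Y(0) - F_X(0)
+ecdf(y)(0) - ecdf(x)(0)
+```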
+
+---
+
+### The Discrete–Continuous CDF Distinction and Bias Elimination
+
+Before developing operational comparison procedures, it is essential to confront a source of **systematic bias** embedded in the standard empirical CDF that has significant consequences for distribution comparison.
+
+#### The Empirical CDF as a Discrete Measure
+
+The empirical cumulative distribution function is identical to the **degree-zero lower partial moment ratio**:
+
+$$
+\hat{F}_X(t) = L_0(t; X) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{\{x_i \le t\}},
+$$
+
+where $\mathbf{1}_{\{\cdot\}}$ is the indicator function. This is a **discrete** probability measure: it assigns probability mass only at observed data points and is a step function everywhere else. Even with one million observations, it remains a step function — it is never truly continuous.
+
+A direct consequence of this discreteness is systematic bias in probability estimation at the sample mean. For a continuous symmetric distribution, exactly 50% of the population lies below the mean. Yet for any finite sample,
+
+$$
+\hat{F}_X(\bar{x}) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{x_i \le \bar{x}\}} \neq 0.5
+$$
+
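+in general. The sample mean typically falls between two observed values, and the fraction of observations at or below it is a multiple of $1/n$ fixed by the particular draw. For odd $n$ this fraction can never equal 0.5, and for even $n$ it does so only by coincidence. It oscillates around 0.5 and converges to it only as $n \to \infty$.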
+
+This is not a minor technicality. Any comparison procedure that evaluates a group's discrete CDF at a shared benchmark inherits this bias. The two CDFs can appear to differ even when both groups are drawn from the same population, simply because of discretization noise.
+
+#### The Degree-One Partial Moment as a Continuous Probability
+
+The directional framework resolves this bias by replacing the discrete indicator with **area-based probability mass** using the degree-one lower partial moment ratio:
+
+$$
+F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)},
+$$
+
+where
+
+$$
+LPM_1(t, X) = \frac{1}{n} \sum_{i=1}^n \max(0,\, t - x_i),
+\qquad
+UPM_1(t, X) = \frac{1}{n} \sum_{i=1}^n \max(0,\, x_i - t).
+$$
+
+Rather than counting the fraction of observations below $t$, this ratio measures the fraction of the **total area of deviations** that lies below $t$. It corresponds to the continuous PDF probability
+
+$$
+P(X \le t) = \frac{\int_{-\infty}^t f(x)\,dx}{\int_{-\infty}^\infty f(x)\,dx},
+$$
+
+capturing the area between discrete bins that the step-function CDF ignores. This is the essential connection to the derivative relationship $f(x) = dF(x)/dx$: the degree-one ratio encodes the continuous probability density information that the degree-zero CDF discards.
+
+In NNS, this is computed via `LPM.ratio(degree = 1, target, variable)`.
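+
+The ratio is also easy to reproduce without NNS installed. The base-R sketch below follows the definition above; the helper names `lpm1()`, `upm1()`, and `f1()` are hypothetical. On a deliberately skewed sample, it returns 0.5 at the sample mean up to floating-point rounding, anticipating the mean-target property proved next. Because the sample size is odd, the discrete fraction cannot equal 0.5.
+
+```r
+# Degree-one partial-moment ratio F_1(t; X), base-R sketch.
+lpm1 <- function(t, x) mean(pmax(t - x, 0))   # average shortfall below t
+upm1 <- function(t, x) mean(pmax(x - t, 0))   # average excess above t
+f1   <- function(t, x) lpm1(t, x) / (lpm1(t, x) + upm1(t, x))
+
+set.seed(7)
+x <- rexp(37)              # skewed sample with an odd size
+
+mean(x <= mean(x))         # degree-zero (discrete) CDF at the mean
+f1(mean(x), x)             # degree-one ratio at the mean: 0.5
+```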
+
+#### The Mean-Target Property: Exact Bias Elimination
+
+A fundamental property follows from the algebraic structure of the degree-one ratio.
+
+**Theorem.** For any sample $x_1, \dots, x_n$ containing at least two distinct values (so that the denominator of $F_1$ is positive),
+
+$$
+F_1(\bar{x};\, X) = 0.5
+$$
+
+exactly, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ is the sample mean.
+
+**Proof.** Observe that pointwise,
+
+$$
+(t - x_i)^+ - (x_i - t)^+ = t - x_i
+$$
+
+for every $i$ and every $t$. Setting $t = \bar{x}$ and summing over $i$:
+
+$$
+\sum_{i=1}^n (\bar{x} - x_i)^+ - \sum_{i=1}^n (x_i - \bar{x})^+ = \sum_{i=1}^n (\bar{x} - x_i) = n\bar{x} - \sum_{i=1}^n x_i = 0.
+$$
+
+Therefore $LPM_1(\bar{x}, X) = UPM_1(\bar{x}, X)$. This common value is positive because the sample is not constant, so $F_1(\bar{x}; X) = 0.5$ exactly. $\square$
+
+This result holds for **every distribution, every sample size, and without any parametric assumptions**. The discrete CDF only approaches this value asymptotically; the degree-one ratio achieves it exactly at every sample size for which it is defined.
+
+The same argument extends to the population: $F_1(\mu_X; X) = 0.5$ exactly for the population mean $\mu_X$ of any non-degenerate distribution with finite mean.
+
+#### Empirical Demonstration
+
+The contrast between the two representations is immediately visible in data:
+
+```r
+library(NNS)
+
+set.seed(12345)
+x <- rnorm(100, mean = 5, sd = 1)
+
+# Discrete CDF at the mean — biased
+LPM.ratio(0, mean(x), x)
+## [1] 0.44
+
+# Continuous (area-based) probability at the mean — unbiased
+LPM.ratio(1, mean(x), x)
+## [1] 0.5
+```
+
+With 100 observations, the discrete CDF places only 44% of mass below the sample mean, while the degree-one ratio returns exactly 0.5. Increasing to 500 observations narrows the gap for the discrete version but does not close it in this draw:
+
+```r
+set.seed(12345)
+x2 <- rnorm(500, mean = 5, sd = 1)
+
+LPM.ratio(0, mean(x2), x2) # discrete — still biased
+## [1] 0.496
+
+LPM.ratio(1, mean(x2), x2) # continuous — exact
+## [1] 0.5
+```
+
+Tracking both measures sequentially across every observation confirms that the degree-zero ratio oscillates around 0.5 and converges only in the limit, while the degree-one ratio is pinned at exactly 0.5 throughout:
+
+```r
+set.seed(12345)
+x <- rnorm(500)
+
+lpm0 <- numeric(length(x))
+lpm1 <- numeric(length(x))
+
+for (i in seq_along(x)) {
+ lpm0[i] <- LPM.ratio(0, mean(x[1:i]), x[1:i])
+ lpm1[i] <- LPM.ratio(1, mean(x[1:i]), x[1:i])
+}
+
+plot(lpm0, col = "red", type = "l", lwd = 2,
+ ylim = c(0, 1), ylab = "P(X ≤ mean)", xlab = "n")
+lines(lpm1, col = "blue", lwd = 2)
+abline(h = 0.5, lty = 2)
+legend("topright", legend = c("LPM degree 0 (discrete)",
+ "LPM degree 1 (continuous)"),
+ col = c("red", "blue"), lwd = 2, bty = "n")
+```
+
+The red line wanders; the blue line is flat at 0.5 for every $n \ge 2$.
+
+#### Implications for Distribution Comparison
+
+This property has direct consequences for the comparison methods developed in the remainder of this chapter.
+
+Suppose two samples $X$ and $Y$ are evaluated at a shared benchmark, such as the grand mean $\bar{z}$. Under the null hypothesis of identical population means, both $F_1(\bar{z}; X)$ and $F_1(\bar{z}; Y)$ should return values close to 0.5; each equals 0.5 exactly only at its own sample mean. Deviations from 0.5 then provide evidence that the group means diverge from the grand statistic, free of the discretization bias described above.
+
+Using the discrete CDF instead, both evaluations would generically differ from 0.5 even under the null, producing spurious evidence of separation. The degree-one ratio removes this discretization component of the false signal, leaving only ordinary sampling variation in the group means.
+
+The NNS ANOVA procedure developed later in this chapter is built directly on this foundation.
+
+---
+
+## Block II — Estimation mechanics
+
+### Empirical Estimation
+
+These probability comparisons can be estimated directly from sample data.
+
+Let
+
+$$
+x_1,\dots,x_n \sim X, \qquad y_1,\dots,y_m \sim Y.
+$$
+
+An empirical estimator of $P(X > Y)$ is
+
+$$
+\hat p = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \mathbf{1}_{\{x_i > y_j\}}.
+$$
+
+This statistic measures the proportion of cross-sample comparisons in which an observation from $X$ exceeds one from $Y$.
+
+The estimator is unbiased for $P(X > Y)$. As a two-sample U-statistic, it converges to the population probability as $n, m \to \infty$.
+
+This estimator requires no parametric assumptions and uses the full sample information.
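+
+The double sum and its tie-adjusted version can be computed directly, or more cheaply from pooled ranks. The function names in this base-R sketch are hypothetical. `p_pairwise()` forms all $nm$ differences with `outer()`. `p_rank()` uses the standard mid-rank identity behind the Mann–Whitney statistic, which counts pairs with $x_i > y_j$ and adds one half per tie. The two agree on the tie-adjusted value.
+
+```r
+# Pairwise estimator of P(X > Y), strict and tie-adjusted.
+p_pairwise <- function(x, y) {
+  cmp <- outer(x, y, "-")                 # n x m matrix of x_i - y_j
+  c(strict       = mean(cmp > 0),
+    tie_adjusted = mean(cmp > 0) + 0.5 * mean(cmp == 0))
+}
+
+# Rank-sum shortcut: with mid-ranks, the x rank sum minus n(n+1)/2 counts
+# x > y pairs plus one half per cross-sample tie (the Mann-Whitney U).
+p_rank <- function(x, y) {
+  n <- length(x); m <- length(y)
+  r <- rank(c(x, y))
+  (sum(r[seq_len(n)]) - n * (n + 1) / 2) / (n * m)
+}
+
+set.seed(3)
+x <- round(rnorm(300, mean = 0.2), 1)     # rounding creates ties
+y <- round(rnorm(250), 1)
+p_pairwise(x, y)
+p_rank(x, y)                              # equals the tie-adjusted value
+```
+
+The rank form needs only a sort of the pooled sample rather than all $nm$ comparisons, which matters for large samples.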
+
+---
+
+## Block III — Diagnostics and dominance analysis
+
+### Directional Dominance Curves
+
+Benchmark-based comparisons extend this idea further.
+
+Define
+
+$$
+\hat \Delta(t) = \hat P(X>t,\,Y\le t) - \hat P(Y>t,\,X\le t).
+$$
+
+Plotting $\hat\Delta(t)$ across benchmarks produces a **directional dominance curve**.
+
+Interpretation:
+
+- Positive values indicate regions where distribution $X$ dominates.
+- Negative values indicate regions where distribution $Y$ dominates.
+- Values near zero indicate similar behavior.
+
+Unlike scalar summary statistics, this curve reveals **where along the distribution the differences occur**.
+
+For example, one distribution may dominate in the upper tail while the other dominates in the lower tail.
+
+Directional dominance curves therefore provide a detailed nonparametric comparison of distributions.
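+
+A dominance curve can be estimated from two unpaired samples of different sizes. The base-R sketch below relies on the identity $\Delta(t) = F_Y(t) - F_X(t)$: the region where both variables exceed $t$ cancels, so only the marginal ECDFs are needed. The function name is hypothetical. The simulated data are illustrative: two distributions with the same center and different spreads.
+
+```r
+# Empirical directional dominance curve from two marginal samples.
+dominance_curve <- function(x, y, grid) {
+  ecdf(y)(grid) - ecdf(x)(grid)           # Delta_hat(t) = F_Y(t) - F_X(t)
+}
+
+set.seed(11)
+x <- rnorm(400, mean = 0, sd = 2)         # wider spread
+y <- rnorm(300, mean = 0, sd = 1)         # same center, narrower
+grid <- seq(-4, 4, by = 0.1)
+d <- dominance_curve(x, y, grid)
+
+# Negative in the lower tail, positive in the upper tail
+round(dominance_curve(x, y, c(-2, 0, 2)), 3)
+```
+
+The wider distribution $X$ dominates above the center and $Y$ dominates below it, which is the tail-crossing pattern described above. Passing `grid` and `d` to `plot()` draws the full curve.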
+
+---
+
+
+
+### Severity-Weighted Distribution Comparison
+
+Distribution comparison need not stop at asking which sample places more probability mass below a threshold. The directional framework allows a stronger question: how quickly does adverse severity accumulate below that threshold?
+
+For a benchmark \(t\), degree-zero comparison evaluates
+\[
+L_0(t;X)=P(X\le t),
+\]
+which is purely frequency-based. Degree one evaluates
+\[
+L_1(t;X)=E[(t-X)_+],
+\]
+which aggregates the total adverse deviation below the benchmark. Degree two evaluates
+\[
+L_2(t;X)=E[(t-X)_+^2],
+\]
+which penalizes larger deviations disproportionately.
+
+These degrees therefore define a general hierarchy:
+\[
+\text{degree 0} \to \text{event frequency},
+\]
+\[
+\text{degree 1} \to \text{aggregate adverse magnitude},
+\]
+\[
+\text{degree 2} \to \text{extreme-deviation sensitivity}.
+\]
+
+This hierarchy is useful in any setting where two distributions can have similar lower-tail frequency but very different lower-tail severity. A system may violate a benchmark only slightly more often than another system, yet do so with much larger deviations once the benchmark is crossed. Frequency alone would miss that distinction; higher-degree partial moments make it visible.
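+
+A small numerical illustration of the hierarchy follows. It uses a hypothetical base-R helper `lpm()` and made-up samples. Both samples fall below the benchmark $t = 0$ equally often, but one falls much further below it.
+
+```r
+# Lower partial moment of a given degree at benchmark t (base-R sketch).
+# Degree 0 is an explicit indicator, so it counts observations <= t.
+lpm <- function(degree, t, x) {
+  if (degree == 0) mean(x <= t) else mean(pmax(t - x, 0)^degree)
+}
+
+a <- c(-1, 1, 2, 3, 4)    # one mild shortfall below 0
+b <- c(-5, 1, 2, 3, 4)    # one severe shortfall below 0
+
+# Rows: samples; columns: degrees 0, 1, 2
+sapply(0:2, function(d) c(a = lpm(d, 0, a), b = lpm(d, 0, b)))
+```
+
+Degree zero returns 0.2 for both samples. Degree one separates them (0.2 versus 1), and degree two separates them further (0.2 versus 5), because the single shortfall of 5 is squared.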
+
+The probability-bounds literature uses this same logic in discussions of partial-moment-ratio thresholds, though often in finance terminology. Here the broader point is more important than the label: a threshold comparison can be performed either in count space or in severity-weighted space.
+
+
+## Block IV — Applied workflow and practical inference
+
+
+### Practical Threshold Comparison Across Degrees
+
+A practical directional workflow for comparing distributions is therefore to evaluate lower-tail structure degree by degree.
+
+First, compare degree-zero lower-tail probabilities to assess how often observations fall below the benchmark. This recovers the familiar CDF-based comparison.
+
+Second, compare degree-one lower partial moments to assess how much aggregate adverse magnitude accumulates below the benchmark.
+
+Third, compare degree-two lower partial moments when larger deviations deserve disproportionate emphasis.
+
+This layered comparison is useful across domains. In forecasting, it distinguishes models that miss a target equally often but differ in the magnitude of their misses. In operations, it distinguishes supply systems that stock out with similar frequency but very different shortage depth. In reliability, it distinguishes designs whose failure margins are crossed with similar probability but different severity once crossed.
+
+This section also clarifies why the degree-one continuous probability representation matters. Section 15.1.8 established that the degree-one partial-moment ratio removes the discrete-CDF bias at the mean in finite samples. That same area-based logic provides a smoother and more interpretable path from event counting to severity-weighted comparison.
+
+
+### Example
+
+Suppose two samples represent outcomes from two strategies.
+
+Sample $X$:
+
+$$
+-2, -1, 1, 3, 4
+$$
+
+Sample $Y$:
+
+$$
+-3, -2, 0, 2, 2
+$$
+
+Compute cross-sample comparisons.
+
+There are $5 \times 5 = 25$ comparisons.
+
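+Counting the cases where $x_i > y_j$ gives 16 of the 25 pairs, so
+
+$$
+\hat p = \frac{16}{25} = 0.64.
+$$
+
+One pair is tied ($x_i = y_j = -2$), so the tie-adjusted estimate is $\hat p^* = (16 + 0.5)/25 = 0.66$.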
+
+Interpretation:
+
+- Distribution $X$ tends to produce larger values than $Y$.
+- The estimated directional exceedance probability is $0.64$, indicating moderate directional advantage for $X$.
+
+This effect-size interpretation provides a clear and intuitive comparison without invoking hypothesis tests.
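+
+The counts in this example can be reproduced in a few lines of base R. The sketch below also cross-checks the tie-adjusted value against the $W$ statistic returned by `stats::wilcox.test`. That statistic counts pairs with $x_i > y_j$ and adds one half for each tie.
+
+```r
+x <- c(-2, -1, 1, 3, 4)
+y <- c(-3, -2, 0, 2, 2)
+
+wins <- outer(x, y, ">")          # TRUE where x_i > y_j (5 x 5)
+ties <- outer(x, y, "==")         # TRUE where x_i == y_j
+
+sum(wins)                         # 16 of 25 pairs
+mean(wins)                        # p_hat = 0.64
+mean(wins) + 0.5 * mean(ties)     # tie-adjusted p_hat* = 0.66
+
+# wilcox.test warns that exact p-values are unavailable with ties;
+# only the W statistic is used here, so the warning is suppressed.
+W <- suppressWarnings(wilcox.test(x, y)$statistic)
+unname(W) / (length(x) * length(y))   # also 0.66
+```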
+
+---
+
+### NNS ANOVA: CDF-Based Distribution Comparison
+
+The directional framework motivates a fully operational procedure for comparing distributions. The NNS ANOVA method uses the degree-one lower partial moment CDF developed in Section 15.1.8 and evaluates distributional similarity across both the grand mean and selected quantiles.
+
+To avoid notation ambiguity, this section uses a dedicated symbol, $C_{\text{ANOVA}}$, for the NNS ANOVA certainty score.
+
+#### The LPM-Based Continuous CDF
+
+The degree-one partial moment CDF established in Section 15.1.8 provides the measurement foundation for NNS ANOVA. Recall that
+
+$$
+F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)},
+$$
+
+and that $F_1(\bar{x}; X) = 0.5$ exactly for the sample mean, for any distribution and any sample size.
+
+This **mean-target property** provides a distribution-free anchor for comparing means across samples. Under the null that two groups share a common population mean, both groups' degree-one CDFs evaluated at the grand mean should return values close to 0.5, and systematic deviations signal distributional separation. Because the degree-one ratio eliminates finite-sample discretization bias (Section 15.1.8), this signal is not contaminated by the systematic oscillation present in the discrete CDF.
+
+The degree-one CDF also exhibits greater smoothness than the empirical step function, particularly in small samples, because it encodes area-based probability mass rather than point counts. This smoothness reduces noise sensitivity and improves stability across repeated samples.
+
+#### Grand Mean and the NNS Certainty Statistic
+
+To compare two distributions, NNS ANOVA proceeds as follows.
+
+Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ denote the control and treatment samples. The **grand statistic** is the sample-size weighted mean of the two group means:
+
+$$
+\bar{z} = \frac{n\bar{x} + m\bar{y}}{n + m}.
+$$
+
+This pooled form ensures the reference point reflects the actual composition of the combined sample, giving appropriately greater weight to whichever group contributes more observations.
+
+Each sample's degree-one partial moment CDF is evaluated at $\bar{z}$:
+
+$$
+F_1(\bar{z}; X), \qquad F_1(\bar{z}; Y).
+$$
+
+Under the null hypothesis that both samples share a common population mean equal to $\bar{z}$, both CDFs should evaluate to approximately 0.5 by the mean-target property. Deviations from 0.5 reflect evidence that the sample means diverge from the grand statistic.
+
+The NNS ANOVA certainty statistic is computed from five deviation terms. Let $\delta_0$ denote the maximum absolute deviation of either group's CDF from 0.5 at the grand mean, capped at 0.5:
+
+$$
+\delta_0 = \min\!\bigl(0.5,\, \max(|F_1(\bar{z}; X) - 0.5|,\, |F_1(\bar{z}; Y) - 0.5|)\bigr).
+$$
+
+Four additional terms are computed at upper and lower quantile targets. The upper 25% target is the average of the 75th upper partial moment quantile of each group, and the lower 25% target is the average of the 75th lower partial moment quantile; analogous targets are constructed at the 12.5% level. At each target $q$, the deviation $\delta_q$ is defined as the maximum absolute departure of either group's partial moment ratio from the expected null value $q$, capped at $q$.
+
+The full-distribution certainty statistic is then
+
+$$
+\begin{aligned}
+\text{Certainty}_{\text{ANOVA, raw}}
+&= \frac{1}{2.5} \Bigg[
+\frac{(0.5 - \delta_0)^2}{0.25}
++ 0.5 \cdot \frac{(0.25 - \delta_{0.25}^{U})^2}{0.0625} \\
+&\qquad + 0.5 \cdot \frac{(0.25 - \delta_{0.25}^{L})^2}{0.0625}
++ 0.25 \cdot \frac{(0.125 - \delta_{0.125}^{U})^2}{0.015625} \\
+&\qquad + 0.25 \cdot \frac{(0.125 - \delta_{0.125}^{L})^2}{0.015625}
+\Bigg].
+\end{aligned}
+$$
+
+Each term is a squared relative deviation from its null value, weighted by benchmark coverage so central benchmarks dominate the score while outer-tail benchmarks remain contributory but less influential. The mean benchmark receives weight 1 because it is the primary location anchor; each 25% tail benchmark receives 0.5 because it targets one-quarter tail regions on each side; and each 12.5% extreme-tail benchmark receives 0.25 to avoid overweighting sparse extremes. The sum is normalized by total weight 2.5. In means-only mode, only the first term is used and divided by 1 rather than 2.5.
+
+When medians rather than means are the comparison target, the CDF evaluation uses the degree-zero partial moment ratio $LPM_0(\bar{z}; X) / (LPM_0(\bar{z}; X) + UPM_0(\bar{z}; X))$, which counts frequency mass rather than area mass, in place of the degree-one ratio.
+
+A **population size adjustment** is applied before the final certainty is returned:
+
+$$
+\text{Certainty}_{\text{ANOVA}} = \min\!\left(1,\; \text{Certainty}_{\text{ANOVA, raw}} \times \left(\frac{n + m - 2}{n + m}\right)^2\right).
+$$
+
+This correction reduces certainty modestly for small combined samples, reflecting the increased estimation uncertainty when fewer observations inform the CDF comparisons. As $n + m \to \infty$ the adjustment approaches one and becomes negligible.
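+
+The following base-R sketch illustrates the arithmetic of the grand statistic, $\delta_0$, the weighted combination, and the population size adjustment. It is an illustration under stated assumptions, not the `NNS.ANOVA` implementation, and the function names are hypothetical. The quantile targets for the tail benchmarks are not reconstructed here, so the caller must supply the four tail deviations.
+
+```r
+# Degree-one partial-moment ratio F_1(t; X)
+f1 <- function(t, x) {
+  lpm <- mean(pmax(t - x, 0))
+  upm <- mean(pmax(x - t, 0))
+  lpm / (lpm + upm)
+}
+
+# Certainty from deviation terms; tails = NULL gives means-only mode.
+# tails (assumed precomputed and capped): named U25, L25, U125, L125.
+certainty_from_deltas <- function(delta0, tails = NULL, n, m) {
+  mean_term <- (0.5 - delta0)^2 / 0.25
+  if (is.null(tails)) {
+    raw <- mean_term                             # weight 1, divided by 1
+  } else {
+    raw <- (mean_term +
+            0.5  * (0.25  - tails[["U25"]])^2  / 0.0625 +
+            0.5  * (0.25  - tails[["L25"]])^2  / 0.0625 +
+            0.25 * (0.125 - tails[["U125"]])^2 / 0.015625 +
+            0.25 * (0.125 - tails[["L125"]])^2 / 0.015625) / 2.5
+  }
+  min(1, raw * ((n + m - 2) / (n + m))^2)        # population size adjustment
+}
+
+means_only_certainty <- function(x, y) {
+  n <- length(x); m <- length(y)
+  z <- (n * mean(x) + m * mean(y)) / (n + m)     # grand statistic
+  delta0 <- min(0.5, max(abs(f1(z, x) - 0.5), abs(f1(z, y) - 0.5)))
+  certainty_from_deltas(delta0, n = n, m = m)
+}
+
+set.seed(99)
+ctrl  <- rnorm(50)
+treat <- rnorm(50, mean = 1)
+means_only_certainty(ctrl, ctrl)    # identical samples: adjustment only
+means_only_certainty(ctrl, treat)   # separated means: lower certainty
+```
+
+With identical samples of 50 observations each, $\delta_0 = 0$, so the means-only score equals the adjustment factor $(98/100)^2 = 0.9604$. Shifting one group moves both CDFs away from 0.5 at the grand mean and lowers the score.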
+
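+By construction, the raw certainty equals 1 when every deviation term is zero, indicating maximal agreement at the grand mean and tail benchmarks, while values closer to $0$ indicate stronger disagreement. After the population size adjustment, the largest attainable value is $\bigl((n + m - 2)/(n + m)\bigr)^2$, which approaches 1 as the combined sample grows.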
+
+This formulation inverts the conventional hypothesis-testing orientation: rather than a p-value measuring evidence against the null, the certainty statistic directly expresses the degree of distributional agreement.
+
+#### Full Distribution vs. Means-Only Comparison
+
+NNS ANOVA can be applied in two modes.
+
+In **full distribution mode**, the certainty statistic is computed across the grand mean and all quantile benchmarks, measuring overall distributional similarity. This mode is sensitive to differences in both location and spread.
+
+In **means-only mode**, the certainty statistic is computed only at the grand mean, measuring whether the sample means differ. This mode parallels the objective of a classical t-test but without distributional assumptions.
+
+When **medians** are of interest rather than means, the grand statistic is replaced by the combined median, and the evaluation proceeds accordingly.
+
+#### Effect Size and Confidence Intervals
+
+Beyond the certainty statistic, NNS ANOVA provides **effect size estimates** with associated confidence intervals.
+
+Effect sizes are computed by bootstrapping both groups independently. For each of $B$ bootstrap resamples, the mean (or median) of the control and treatment are recorded, yielding empirical distributions of $\bar{x}^*$ and $\bar{y}^*$. The confidence bounds are then read from these bootstrap distributions using partial moment quantiles at the specified $\alpha$ level.
+
+The effect size bounds are defined as the conservative range of plausible treatment effects:
+
+$$
+\text{Effect Size}^{LB} = \bar{y}^*_{\alpha/2} - \bar{x}^*_{1-\alpha/2},
+\qquad
+\text{Effect Size}^{UB} = \bar{y}^*_{1-\alpha/2} - \bar{x}^*_{\alpha/2},
+$$
+
+where $\bar{x}^*_{\alpha/2}$ and $\bar{x}^*_{1-\alpha/2}$ denote the lower and upper bootstrap quantiles of the control mean at the specified confidence level, and analogously for the treatment. The lower bound pairs the pessimistic treatment outcome against the optimistic control; the upper bound does the reverse. This conservative construction ensures that if zero lies outside the interval, the effect is detectable with confidence even under the most unfavorable pairing of bootstrap tails.
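+
+Below is a sketch of the conservative bounds defined above. As an assumption, ordinary sample quantiles from `stats::quantile` stand in for the partial-moment quantiles used by NNS. The function name, $B$, and the simulated data are illustrative. Passing `stat = median` gives the median-based version.
+
+```r
+# Conservative bootstrap effect-size bounds (base-R sketch).
+effect_size_bounds <- function(control, treatment, B = 2000, alpha = 0.05,
+                               stat = mean) {
+  # Resample each group independently and record the statistic
+  ctrl_star  <- replicate(B, stat(sample(control,   replace = TRUE)))
+  treat_star <- replicate(B, stat(sample(treatment, replace = TRUE)))
+  probs   <- c(alpha / 2, 1 - alpha / 2)
+  q_ctrl  <- quantile(ctrl_star,  probs, names = FALSE)
+  q_treat <- quantile(treat_star, probs, names = FALSE)
+  # Lower: pessimistic treatment vs optimistic control; upper: the reverse
+  c(lower = q_treat[1] - q_ctrl[2], upper = q_treat[2] - q_ctrl[1])
+}
+
+set.seed(2024)
+control   <- rnorm(80, mean = 10, sd = 2)
+treatment <- rnorm(80, mean = 11, sd = 2)
+es <- effect_size_bounds(control, treatment)
+es
+```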
+
+#### Robust Estimation via Bootstrap Resampling
+
+The certainty statistic can be made more robust through bootstrap resampling. In this mode, the control and treatment samples are independently resampled with replacement across a specified number of iterations (typically 100), and the certainty statistic is recomputed for each resample.
+
+The resulting distribution of certainty values provides:
+
+- A **robust certainty estimate**: the median or mean certainty across bootstrap resamples.
+- A **confidence interval** for the certainty statistic itself, reflecting sampling uncertainty.
+
+This bootstrap approach is particularly valuable with small samples or in the presence of outliers, where the point estimate of certainty may be unstable.
+
+#### Relationship to Power
+
+A key advantage of the NNS certainty framework over classical p-values is its explicit relationship to statistical power.
+
+Classical p-values conflate the magnitude of a difference with the precision of its estimation. With large samples, even negligible differences produce small p-values. With small samples, meaningful differences may not reach significance at all.
+
+The NNS certainty statistic, by contrast, reflects the actual probability mass separation between distributions and scales naturally with sample size. The mechanism is direct: certainty measures how far apart the CDFs of the two distributions are at the grand mean and selected quantiles, so it tracks the actual signal — the degree of separation between distributions — rather than conflating signal with sample size. Empirically, NNS certainty correlates more strongly with test power $(1 - \beta)$ than do p-values, which show weaker and more volatile associations.
+
+This connection means that certainty values provide information about both the size of the difference and the reliability of its detection—a combination unavailable from p-values alone.
+
+#### Multi-Group and Pairwise Comparisons
+
+The NNS ANOVA framework extends naturally to **multiple groups**. When more than two samples are supplied, the procedure computes a grand statistic across all groups and evaluates each group's CDF at this common benchmark. Pairwise certainty values can also be returned, summarizing all bilateral comparisons in matrix form.
+
+This multi-group capability directly parallels classical one-way ANOVA, but without the assumption of normality or equal variance across groups.
+
+
+---
+
+### Stochastic Superiority
+
+Stochastic superiority asks a different question than equality of means or equality of distributions. Rather than asking whether two samples came from the same population, or whether they share the same mean or median, it measures the probability that a random draw from one distribution exceeds a random draw from the other.
+
+Let
+
+$$
+X' \sim F_X, \qquad Y' \sim F_Y,
+$$
+
+independently. The stochastic superiority probability is
+
+$$
+p_{X,Y} = P(X' > Y').
+$$
+
+For continuous distributions, ties occur with probability zero, so $p_{X,Y} + p_{Y,X} = 1$. For discrete or mixed distributions, ties may occur with positive probability. In that case the tie-adjusted comparison is
+
+$$
+p^*_{X,Y} = P(X' > Y') + \tfrac{1}{2}P(X' = Y').
+$$
+
+This adjustment preserves symmetry,
+
+$$
+p^*_{X,Y} + p^*_{Y,X} = 1,
+$$
+
+and keeps the comparison centered at $0.5$ when neither distribution has a directional advantage.
+
+A value of $p^*_{X,Y} = 0.5$ indicates no directional advantage. Values above $0.5$ favor $X$, and values below $0.5$ favor $Y$. One may also report the certainty-style transform
+
+$$
+C_{SS} = |2p^*_{X,Y} - 1|,
+$$
+
+which maps the comparison to $[0,1]$, where $0$ denotes no directional separation and $1$ denotes complete separation. Unlike a p-value, both $p^*_{X,Y}$ and $C_{SS}$ retain a direct frequency interpretation.
+
+This differs from stochastic dominance. Stochastic superiority is a pairwise exceedance probability, while stochastic dominance requires one distribution to be preferred over the entire shared support. It also differs from NNS ANOVA. NNS ANOVA asks whether the distributions are in agreement at the grand mean and selected benchmark points; stochastic superiority asks which distribution tends to generate larger draws overall. It is therefore stronger than a simple mean comparison, because it uses the full cross-sample ordering, but weaker than dominance, because it does not require the ordering to hold at every threshold. A distribution may have $p^*_{X,Y} > 0.5$ and still fail to dominate if the CDFs cross.
+
+Given samples
+
+$$
+x_1, \dots, x_n \sim X, \qquad y_1, \dots, y_m \sim Y,
+$$
+
+the empirical estimator is
+
+$$
+\hat p = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \mathbf{1}_{\{x_i > y_j\}},
+$$
+
+with tie-adjusted form
+
+$$
+\hat p^* = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m \left[\mathbf{1}_{\{x_i > y_j\}} + \tfrac{1}{2}\mathbf{1}_{\{x_i = y_j\}}\right].
+$$
+
+These estimators use all pairwise cross-sample comparisons and require no parametric assumptions.
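+
+A direct base-R version of these estimators makes the double sum concrete. The helper `ss_estimate()` is hypothetical and is not part of **`NNS`**. The cross-check relies on a standard fact: the Mann-Whitney statistic `W` reported by `wilcox.test()` counts the pairs with $x_i > y_j$ plus half of the tied pairs, so $\hat p^* = W/(nm)$.
+
+```r
+# Hypothetical helper (not part of NNS): all n * m cross-sample comparisons
+ss_estimate <- function(x, y) {
+  gt  <- outer(x, y, ">")    # n x m matrix of 1{x_i > y_j}
+  tie <- outer(x, y, "==")   # n x m matrix of 1{x_i = y_j}
+  p_gt  <- mean(gt)
+  p_tie <- mean(tie)
+  list(p_gt = p_gt, p_tie = p_tie, p_star = p_gt + 0.5 * p_tie)
+}
+
+x <- c(3, 5, 5, 8)
+y <- c(1, 5, 6)
+est <- ss_estimate(x, y)
+str(est)   # expected: p_gt = 6/12, p_tie = 2/12, p_star = 7/12
+
+# Cross-check: W = #(x > y) + 0.5 * #(x = y)
+W <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
+W / (length(x) * length(y))   # equals est$p_star
+```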
+
+In **`NNS`**, stochastic superiority is computed with `NNS.SS()`. The function returns the directional exceedance probability, the tie probability, and the tie-adjusted superiority probability. Confidence intervals can also be obtained by resampling.
+
+```r
+library(NNS)
+
+set.seed(123)
+x <- rnorm(1000, mean = 0, sd = 1)
+y <- rnorm(1000, mean = 1, sd = 1)
+
+NNS.SS(x, y)
+```
+
+Because the second sample is shifted to the right, the superiority probability for $X$ relative to $Y$ should fall below $0.5$, while the superiority probability for $Y$ relative to $X$ should exceed $0.5$.
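+
+For two independent normals, the population value is also available in closed form. Here $X - Y \sim N(-1, 2)$, so $P(X' > Y') = \Phi(-1/\sqrt{2})$. The sketch below does not call `NNS.SS()`. It reuses the same simulated draws and compares a base-R plug-in estimate with that closed-form value.
+
+```r
+set.seed(123)
+x <- rnorm(1000, mean = 0, sd = 1)
+y <- rnorm(1000, mean = 1, sd = 1)
+
+# Population value: X - Y ~ N(-1, 2), so P(X > Y) = pnorm(-1 / sqrt(2))
+p_true <- pnorm(-1 / sqrt(2))   # about 0.24
+
+# Plug-in estimates over all 1000 x 1000 pairs; ties have probability zero here
+p_xy <- mean(outer(x, y, ">"))
+p_yx <- mean(outer(y, x, ">"))
+
+c(p_true = p_true, p_xy = p_xy, p_yx = p_yx)
+```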
+
+For discrete data, ties should be reported rather than ignored:
+
+```r
+set.seed(123)
+x <- sample(1:5, 100, replace = TRUE)
+y <- sample(1:5, 100, replace = TRUE)
+
+NNS.SS(x, y)
+```
+
+This is especially important in ordinal and categorical applications where equal outcomes are common. In practice, stochastic superiority is often the cleanest first effect-size summary to report, because it answers the most direct comparative question: how often does one distribution beat the other?
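+
+For the discrete example above, the population values are easy to work out. Assume, as `sample(1:5, ...)` does, that both samples are drawn independently and uniformly from $\{1, \dots, 5\}$. Then:
+
+$$
+P(X' = Y') = \sum_{k=1}^{5} P(X' = k)\,P(Y' = k) = 5 \cdot \tfrac{1}{25} = 0.2,
+\qquad
+P(X' > Y') = \tfrac{1 - 0.2}{2} = 0.4,
+\qquad
+p^*_{X,Y} = 0.4 + \tfrac{1}{2}(0.2) = 0.5.
+$$
+
+Reporting only $P(X' > Y') = 0.4$ would suggest an advantage for $Y$ that does not exist. The tie-adjusted value correctly sits at $0.5$.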
+
+
+---
+
+### Stochastic Dominance
+
+The directional probability comparison introduced in Section 15.1.6 measures how often observations from one distribution exceed those from another. This idea can be formalized into a preference ordering known as **stochastic dominance**.
+
+Stochastic dominance provides a rigorous nonparametric criterion for determining when one distribution is unambiguously preferred to another, without specifying a utility function beyond minimal regularity conditions.
+
+#### First-Order Stochastic Dominance
+
+Distribution $X$ **first-order stochastically dominates** distribution $Y$, written $X \succ_1 Y$, if
+
+$$
+F_X(t) \le F_Y(t) \quad \text{for all } t,
+$$
+
+with strict inequality for at least one $t$.
+
+Equivalently, $X$ dominates $Y$ in the first order if and only if every non-decreasing utility function assigns at least as high an expected value to $X$ as to $Y$. This means any decision-maker who prefers more to less will prefer $X$.
+
+Equivalently, every quantile of $X$ is at least as large as the corresponding quantile of $Y$: the CDF of $X$ lies on or to the right of the CDF of $Y$ at every point.
+
+#### Second-Order Stochastic Dominance
+
+Distribution $X$ **second-order stochastically dominates** distribution $Y$, written $X \succ_2 Y$, if
+
+$$
+\int_{-\infty}^t F_X(s)\, ds \le \int_{-\infty}^t F_Y(s)\, ds \quad \text{for all } t.
+$$
+
+Second-order dominance captures risk aversion: $X \succ_2 Y$ if and only if every non-decreasing concave utility function prefers $X$. A distribution can dominate in the second order without dominating in the first, provided any CDF crossings are compensated by area accumulation.
+
+#### Third-Order Stochastic Dominance
+
+Distribution $X$ **third-order stochastically dominates** distribution $Y$ if the iterated integral condition holds:
+
+$$
+\int_{-\infty}^t \int_{-\infty}^s F_X(u)\, du\, ds
+\le
+\int_{-\infty}^t \int_{-\infty}^s F_Y(u)\, du\, ds
+\quad \text{for all } t.
+$$
+
+Third-order dominance places additional weight on the lower tail. It holds if and only if every non-decreasing, concave utility function with a non-negative third derivative (a prudent decision-maker) assigns at least as high an expected value to $X$. This class contains every utility function with decreasing absolute risk aversion (DARA). It permits the CDFs, and even their first integrals, to cross, as long as earlier-order deficits are offset by later-order surpluses.
+
+#### Connection to Partial Moments
+
+Stochastic dominance criteria have natural expressions in terms of partial moments.
+
+First-order stochastic dominance is equivalent to the condition that the lower partial moment of degree zero satisfies
+
+$$
+LPM_0(t, X) \le LPM_0(t, Y) \quad \text{for all } t.
+$$
+
+Because $LPM_0(t, X) = F_X(t)$, this is the direct CDF criterion.
+
+Second-order dominance can be expressed through the degree-one lower partial moment:
+
+$$
+LPM_1(t, X) \le LPM_1(t, Y) \quad \text{for all } t.
+$$
+
+Since $LPM_1(t, X) = \int_{-\infty}^t F_X(s)\, ds$, this recovers the integral condition exactly.
+
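+Third-order dominance corresponds to the degree-two lower partial moment condition. Because $LPM_2(t, X) = 2\int_{-\infty}^t \int_{-\infty}^s F_X(u)\, du\, ds$, the constant factor of $2$ leaves the third-order inequality unchanged: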
+
+$$
+LPM_2(t, X) \le LPM_2(t, Y) \quad \text{for all } t.
+$$
+
+These equivalences mean that **stochastic dominance tests are partial moment comparisons** evaluated across the full support of the data. The NNS framework implements all three levels directly through empirical partial moment estimates.
+
+This is also the bridge to Chapter 17. The degree-one quantile objects used for bias-corrected prediction intervals (`LPM.VaR(..., degree = 1, ...)`, `UPM.VaR(..., degree = 1, ...)`) are built from the same lower/upper partial-moment geometry that drives the SSD (degree-one) and TSD (degree-two) diagnostics here. Put differently, interval construction and dominance testing are not separate methods; they are two uses of the same directional probability representation.
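+
+The identities behind these equivalences can be checked numerically. The sketch below assumes a standard normal $X$. For this distribution, $LPM_1(t, X) = t\Phi(t) + \phi(t)$ and $LPM_2(t, X) = (t^2 + 1)\Phi(t) + t\phi(t)$. The code evaluates the definitions and the integrated-CDF forms with base R's `integrate()`. The benchmark $t = 0.5$ and the tolerance are arbitrary choices.
+
+```r
+t0  <- 0.5    # benchmark; X is assumed to be standard normal
+tol <- 1e-8   # requested relative accuracy for integrate()
+
+# Definitions: LPM_d(t, X) = E[(t - X)_+^d]
+lpm1_def <- integrate(function(x) (t0 - x) * dnorm(x), -Inf, t0, rel.tol = tol)$value
+lpm2_def <- integrate(function(x) (t0 - x)^2 * dnorm(x), -Inf, t0, rel.tol = tol)$value
+
+# Integrated-CDF forms; the inner integral of pnorm up to s is s * pnorm(s) + dnorm(s)
+cdf_int1 <- integrate(pnorm, -Inf, t0, rel.tol = tol)$value
+cdf_int2 <- integrate(function(s) s * pnorm(s) + dnorm(s), -Inf, t0, rel.tol = tol)$value
+
+# Each row should agree to numerical precision (LPM_2 is twice the double integral)
+rbind(
+  degree1 = c(lpm1_def, cdf_int1, t0 * pnorm(t0) + dnorm(t0)),
+  degree2 = c(lpm2_def, 2 * cdf_int2, (t0^2 + 1) * pnorm(t0) + t0 * dnorm(t0))
+)
+```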
+
+#### Empirical Stochastic Dominance Tests
+
+Given samples $x_1, \dots, x_n$ and $y_1, \dots, y_m$, the empirical first-order dominance test checks whether
+
+$$
+\hat F_X(t) \le \hat F_Y(t)
+$$
+
+holds for all evaluation points $t$ in the combined support of the data. If the condition holds everywhere, $X$ first-order stochastically dominates $Y$. If it fails at some points, the distributions intersect and neither dominates at the first order.
+
+Second- and third-order tests proceed analogously, replacing the empirical CDF with its iterated integrals, which correspond to the empirical degree-one and degree-two lower partial moments evaluated at each point.
+
+The NNS implementations `NNS.FSD()`, `NNS.SSD()`, and `NNS.TSD()` perform these evaluations directly and return which distribution, if any, dominates at each order.
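+
+The first- and second-order checks described above can be sketched in a few lines of base R. The helper names `emp_lpm()` and `dominates()` are hypothetical; they are not the internals of `NNS.FSD()` or `NNS.SSD()`. The samples are deterministic normal quantiles, so the outcome does not depend on a random seed.
+
+The third-order case is omitted. The degree-two partial moment is quadratic between data points, so checking only the observed points may not be sufficient there.
+
+```r
+# Hypothetical helpers: a base-R sketch, not the NNS.FSD()/NNS.SSD() internals
+emp_lpm <- function(d, t, x) {
+  if (d == 0) mean(x <= t) else mean(pmax(t - x, 0)^d)
+}
+
+# X dominates Y at order 1 (FSD) or 2 (SSD) if LPM_{order-1}(t, X) <= LPM_{order-1}(t, Y)
+# at every point of the combined support, with strict inequality somewhere
+dominates <- function(x, y, order, tol = 1e-12) {
+  stopifnot(order %in% 1:2)
+  d    <- order - 1
+  grid <- sort(unique(c(x, y)))
+  lx   <- vapply(grid, function(t) emp_lpm(d, t, x), numeric(1))
+  ly   <- vapply(grid, function(t) emp_lpm(d, t, y), numeric(1))
+  all(lx <= ly + tol) && any(lx < ly - tol)
+}
+
+base <- qnorm(ppoints(200))             # deterministic stand-in for a N(0, 1) sample
+
+dominates(base + 1, base, order = 1)    # location shift: TRUE
+dominates(base, 2 * base, order = 1)    # mean-preserving spread, CDFs cross: FALSE
+dominates(base, 2 * base, order = 2)    # the less dispersed sample wins at SSD: TRUE
+```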
+
+
+#### Stochastic Dominant Efficient Sets
+
+When comparing more than two distributions simultaneously, the concept of dominance generalizes to the notion of an **efficient set**: the collection of distributions that are not dominated by any other distribution at the specified order.
+
+Formally, the first-order stochastic dominant efficient set is
+
+$$
+\mathcal{E}_1 = \{X_i : \nexists\, X_j \text{ such that } X_j \succ_1 X_i\}.
+$$
+
+Distributions outside this set are dominated and can be excluded by any decision-maker with a non-decreasing utility function.
+
+The `NNS.SD.efficient.set()` function identifies the efficient set across an arbitrary collection of distributions at first, second, or third order. This is particularly useful in portfolio selection, strategy evaluation, and any setting where a large number of alternatives must be ranked.
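+
+The definition of $\mathcal{E}_1$ translates directly into a pairwise filter. The sketch below works at first order only. The helpers `fsd()` and `fsd_efficient_set()` are hypothetical and are not the `NNS.SD.efficient.set()` implementation. The three samples are illustrative deterministic normal quantiles: a shifted copy `A`, the baseline `B`, and a widely spread copy `C`.
+
+```r
+# Hypothetical helpers: a base-R sketch of a first-order efficient set,
+# not the NNS.SD.efficient.set() implementation
+fsd <- function(x, y) {                 # TRUE if X FSD Y
+  grid <- sort(unique(c(x, y)))
+  fx <- ecdf(x)(grid)
+  fy <- ecdf(y)(grid)
+  all(fx <= fy) && any(fx < fy)
+}
+
+fsd_efficient_set <- function(samples) {
+  ids <- names(samples)
+  undominated <- vapply(ids, function(i) {
+    rivals <- setdiff(ids, i)
+    !any(vapply(rivals, function(j) fsd(samples[[j]], samples[[i]]), logical(1)))
+  }, logical(1))
+  ids[undominated]
+}
+
+base <- qnorm(ppoints(200))
+samples <- list(A = base + 1, B = base, C = 3 * base)
+fsd_efficient_set(samples)   # "A" "C": A dominates B, while A and C cross
+```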
+
+#### Stochastic Dominance Clustering
+
+An extension of the efficient set concept groups distributions into **stochastic dominance clusters**: collections of distributions that share similar dominance relationships with one another.
+
+Within a cluster, no member dominates any other at the specified order. Across clusters, members of higher-ranked clusters tend to dominate members of lower-ranked clusters.
+
+The `NNS.SD.cluster()` function implements this procedure and can render results as a **dendrogram** showing hierarchical dominance relationships. This visualization reveals which groups of distributions are interchangeable in the preference order and which are strictly ranked relative to others.
+
+
+---
+
+### Practical Inference Without Parametric Assumptions
+
+The directional framework emphasizes **probability comparisons and effect sizes** rather than binary hypothesis tests.
+
+Key advantages include:
+
+**Nonparametric validity.**
+No assumptions are required about the distributional form of the data.
+
+**Interpretability.**
+Probability comparisons directly answer practical questions such as "How often does one outcome exceed another?" Use directional exceedance probabilities such as $P(X' > Y')$ for pairwise comparison, and use $\text{Certainty}_{\text{ANOVA}}$ for NNS ANOVA distributional agreement.
+
+**Directional insight.**
+Benchmark-based comparisons reveal where differences occur within the distribution, not merely whether they exist.
+
+**Robustness.**
+Results do not depend on arbitrary significance thresholds, and bootstrap-based robustness estimation provides reliable inference under small samples or outliers.
+
+**Power awareness.**
+The NNS certainty statistic correlates directly with test power, addressing a fundamental limitation of classical p-values.
+
+**Preference ordering.**
+Stochastic dominance tests express distributional preference in terms that are directly linked to decision theory, enabling selection among competing alternatives without specifying a complete utility function.
+
+These properties make directional distribution comparison particularly useful in fields such as finance, economics, and risk management where **tail behavior and asymmetric outcomes** often matter more than average differences.
+
+---
+
+### Example Dataset Workflow
+
+A convenient applied example is the mtcars transmission split from the NNS distribution-comparison vignette. Let Group A be miles-per-gallon for automatic transmissions, `mtcars$mpg[mtcars$am == 0]`,
+and let Group B be miles-per-gallon for manual transmissions, `mtcars$mpg[mtcars$am == 1]`.
+This yields two empirical distributions on the same response variable and provides a direct setting for comparing stochastic superiority, NNS ANOVA, and stochastic dominance.
+
+
+```{r chapter14-mtcars-workflow, eval=FALSE}
+library(NNS)
+
+auto_mpg <- mtcars$mpg[mtcars$am == 0]
+manual_mpg <- mtcars$mpg[mtcars$am == 1]
+
+# 1. Pairwise directional advantage
+NNS.SS(manual_mpg, auto_mpg)
+# $p_gt
+# [1] 0.8259109
+#
+# $p_tie
+# [1] 0.008097166
+#
+# $p_star
+# [1] 0.8299595
+
+# 2. Full-distribution comparison
+NNS.ANOVA(control = auto_mpg,
+ treatment = manual_mpg,
+ robust = TRUE)
+# $Control
+# [1] 17.14737
+#
+# $Treatment
+# [1] 24.39231
+#
+# $Grand_Statistic
+# [1] 20.09063
+#
+# $Control_CDF
+# [1] 0.8708501
+#
+# $Treatment_CDF
+# [1] 0.1294878
+#
+# $Certainty
+# [1] 0.02345583
+#
+# $`Effect_Size_LB.2.5%`
+# [1] 2.377328
+#
+# $`Effect_Size_UB.97.5%`
+# [1] 12.2155
+#
+# $Confidence_Level
+# [1] 0.95
+#
+# $`Robust Certainty Estimate`
+# [1] 0.01094359
+#
+# $`Lower 95% CI`
+# [1] 3.864872e-06
+#
+# $`Upper 95% CI`
+# [1] 0.1048396
+
+# 3. Preference ordering over the full support
+NNS.FSD(manual_mpg, auto_mpg)
+# [1] "X FSD Y"
+```
+
+For this comparison, `NNS.SS(manual_mpg, auto_mpg)` yields `p_gt = 0.8259109`, `p_tie = 0.008097166`, and `p_star = 0.8299595`. A randomly selected manual-transmission car therefore has higher miles per gallon than a randomly selected automatic-transmission car about 83 percent of the time, with ties counted as half.
+`NNS.ANOVA(control = auto_mpg, treatment = manual_mpg, robust = TRUE)` returns `Certainty = 0.02345583` and `Robust Certainty Estimate = 0.01094359`, indicating very little distributional agreement between the two groups.
+`NNS.FSD(manual_mpg, auto_mpg)` returns "X FSD Y", implying that the manual-transmission miles-per-gallon distribution first-order stochastically dominates the automatic-transmission distribution. Together, these results show pairwise directional advantage,
+weak distributional agreement, and full-support preference for the manual-transmission group.
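+
+The pairwise and first-order results can be cross-checked without **`NNS`**. The sketch below recomputes the exceedance, tie, and tie-adjusted probabilities from all $13 \times 19 = 247$ cross-sample pairs. It then relates them to the Mann-Whitney statistic from base R's `wilcox.test()` and checks the empirical CDF ordering directly.
+
+The sketch reproduces `p_gt`, `p_tie`, and `p_star` up to floating-point rounding. It does not attempt the NNS ANOVA certainty, which relies on the degree-one CDF machinery.
+
+```r
+auto_mpg   <- mtcars$mpg[mtcars$am == 0]   # 19 automatic cars
+manual_mpg <- mtcars$mpg[mtcars$am == 1]   # 13 manual cars
+
+# All 13 x 19 = 247 cross-sample comparisons
+p_gt   <- mean(outer(manual_mpg, auto_mpg, ">"))
+p_tie  <- mean(outer(manual_mpg, auto_mpg, "=="))
+p_star <- p_gt + 0.5 * p_tie
+
+# wilcox.test() W counts pairs with auto > manual, plus half of the ties
+W_auto <- unname(wilcox.test(auto_mpg, manual_mpg, exact = FALSE)$statistic)
+all.equal(p_star, 1 - W_auto / (length(auto_mpg) * length(manual_mpg)))
+
+# First-order check on the combined support: F_manual(t) <= F_auto(t) everywhere
+grid <- sort(unique(c(manual_mpg, auto_mpg)))
+fsd_holds <- all(ecdf(manual_mpg)(grid) <= ecdf(auto_mpg)(grid))
+fsd_holds
+```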
+
+### Summary
+
+Classical distribution comparison is usually framed as a sequence of tests that end in accept-or-reject decisions. The directional framework developed here shifts the emphasis from binary testing to interpretable probability comparisons and partial-moment geometry.
+
+At the most direct level, two distributions can be compared through exceedance probabilities such as $P(X' > Y')$ and the tie-adjusted stochastic superiority measure $p^*_{X,Y} = P(X' > Y') + \tfrac{1}{2}P(X' = Y')$. This provides a pairwise effect size with an immediate interpretation: how often does one distribution generate larger outcomes than the other?
+
+At the distributional-agreement level, the degree-one continuous CDF removes the finite-sample bias of the empirical CDF at the mean, with $F_1(\bar{x}; X) = 0.5$ exactly. That property makes NNS ANOVA a distribution-free and bias-free comparison procedure. Rather than depending on parametric assumptions, it evaluates agreement through benchmark-relative CDF deviations and reports an interpretable certainty statistic together with effect sizes and robust confidence intervals.
+
+At the strongest level, stochastic dominance extends directional comparison from pairwise exceedance to full preference ordering over the support. First-, second-, and third-order dominance can all be written as partial-moment inequalities, which shows that dominance analysis, efficient sets, and dominance clustering are all natural extensions of the same directional probability representation.
+
+Taken together, these tools provide a coherent hierarchy for nonparametric distribution comparison. Stochastic superiority answers the pairwise question, NNS ANOVA answers the agreement question, and stochastic dominance answers the preference-ordering question.
diff --git a/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd b/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd
new file mode 100644
index 0000000..602aab4
--- /dev/null
+++ b/tools/NNS/book/chapter-16-directional-tail-thresholds-probability-bounds-and-estimation-error.Rmd
@@ -0,0 +1,328 @@
+# Directional Tail Thresholds, Probability Bounds, and Estimation Error
+
+Previous chapters developed the directional framework for probability, dependence, distribution comparison, and prediction. A natural next step is threshold analysis.
+
+In many practical settings, the analyst is not interested only in a central interval for future observations. The analyst also wants to understand how a process behaves near adverse regions of its distribution: how often a benchmark is crossed, how severe the benchmark violations are once they occur, how conservative tail-probability statements remain under weak assumptions, and how stable the resulting estimates are in finite samples.
+
+These questions arise in many domains:
+
+- forecast errors relative to a service target,
+- inventory levels relative to a replenishment threshold,
+- reliability metrics relative to a safety margin,
+- environmental measurements relative to a policy limit,
+- financial returns relative to a minimum acceptable outcome.
+
+The common structure is benchmark-relative tail analysis. The same directional operators that generated distribution functions and quantile intervals in earlier chapters also generate threshold rules, directional probability bounds, and finite-sample diagnostics. This chapter develops that connection.
+
+The main thesis is simple:
+
+1. degree-0 partial moments recover the usual lower-tail probability and quantile threshold,
+2. higher-degree partial moments generate severity-weighted threshold rules,
+3. semivariance and higher lower partial moments yield distribution-free upper bounds for tail probabilities,
+4. and the practical usefulness of these quantities depends on their estimation stability.
+
+Thus quantiles, semivariance, lower partial moments, and estimation error are not separate topics. They are manifestations of the same benchmark-relative geometry.
+
+---
+
+## Why Threshold Analysis Needs More Than Quantiles
+
+A lower-tail quantile identifies where adverse observations begin to accumulate. If \(X\) is a random variable and \(\alpha \in (0,1)\), the lower \(\alpha\)-quantile is
+
+\[
+Q_X(\alpha)=\inf\{t : F_X(t)\ge \alpha\}.
+\]
+
+This is already enough to partition a chosen fraction of lower-tail probability mass.
+
+But quantiles alone do not answer two additional questions that matter in practice.
+
+First, how conservative is the selected threshold if the data are skewed, heavy-tailed, or otherwise poorly described by a parametric model?
+
+Second, how stable is the threshold estimate in finite samples?
+
+The first question is about probability control. A threshold may appear numerically precise, but if it relies on a misspecified model then its practical interpretation can be fragile exactly where decisions are most sensitive.
+
+The second question is about sample sensitivity. Any statistic used in a decision system must stabilize as information accumulates. A threshold or directional probability measure that behaves erratically under modest sample variation may be mathematically elegant but operationally weak.
+
+The purpose of this chapter is therefore broader than quantile selection alone. It is to show that the directional framework supports:
+
+- threshold selection,
+- severity-weighted thresholding,
+- directional probability bounds,
+- and finite-sample estimation diagnostics
+
+within one common structure.
+
+---
+
+## Degree-Zero Thresholds and Their Directional Meaning
+
+A foundational identity of the directional framework is
+
+\[
+L_0(t;X)=P(X\le t)=F_X(t).
+\]
+
+That result means the cumulative distribution function is not external to the partial-moment framework. It is the degree-0 lower partial moment itself.
+
+Therefore the lower-tail threshold at probability level \(\alpha\) is
+
+\[
+t_\alpha^{(0)}=\inf\{t : L_0(t;X)\ge \alpha\}.
+\]
+
+This is simply the ordinary lower quantile written in directional form.
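+
+A short base-R sketch shows both facts on a ten-point sample. First, the degree-0 lower partial moment coincides with the empirical CDF. Second, inverting it reproduces R's type-1 sample quantile, which is defined as the inverse of the empirical CDF. The helper `lpm0()` is hypothetical and written only for this illustration.
+
+```r
+# Hypothetical helper: empirical degree-0 lower partial moment L_0(t; X)
+lpm0 <- function(t, x) mean(x <= t)
+
+x <- c(4, -2, 7, 1, -5, 3, 0, 9, -1, 2)
+alpha <- 0.25
+
+# L_0(t; X) coincides with the empirical CDF at every benchmark
+all.equal(sapply(x, lpm0, x = x), ecdf(x)(x))
+
+# t_alpha^(0) = inf{t : L_0(t; X) >= alpha}; the infimum is attained at a sample point
+t_sorted <- sort(x)
+t0 <- t_sorted[which(sapply(t_sorted, lpm0, x = x) >= alpha)[1]]
+t0                                            # -1
+quantile(x, alpha, type = 1, names = FALSE)   # -1
+```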
+
+In finance this degree-0 threshold is often called Value-at-Risk, but the mathematical object is more general than that label. The same degree-0 threshold can represent:
+
+- a maximum tolerated forecast shortfall,
+- a minimum service threshold,
+- a lower safety boundary,
+- or any analyst-defined adverse benchmark.
+
+The structural meaning is identical in every case: \(t_\alpha^{(0)}\) is the smallest benchmark at which at least an \(\alpha\) fraction of the probability mass lies at or below the threshold.
+
+Thus degree 0 is the **frequency-calibrated** threshold rule.
+
+---
+
+## From Frequency to Severity: Higher-Degree Thresholds
+
+Degree-0 thresholds count adverse events, but they do not distinguish between small and large deviations once the threshold is crossed.
+
+That limitation is exactly what higher-order partial moments correct.
+
+For degree \(d \ge 0\), define the lower and upper directional masses as follows, with the degree-0 convention \(L_0(t;X)=P(X\le t)\) and \(U_0(t;X)=P(X>t)\):
+
+\[
+L_d(t;X)=E[(t-X)_+^d], \qquad U_d(t;X)=E[(X-t)_+^d].
+\]
+
+A normalized lower share is then
+
+\[
+F_d(t;X)=\frac{L_d(t;X)}{L_d(t;X)+U_d(t;X)}.
+\]
+
+When \(d=0\), this reduces to the ordinary probability partition. When \(d=1\), observations are weighted by the magnitude of their deviation from the benchmark. When \(d=2\), large deviations receive quadratic emphasis.
+
+This yields a family of generalized thresholds:
+
+\[
+t_\alpha^{(d)}=\inf\{t : F_d(t;X)\ge \alpha\}.
+\]
+
+The interpretation changes by degree:
+
+\[
+d=0 \rightarrow \text{event frequency},
+\]
+
+\[
+d=1 \rightarrow \text{aggregate adverse magnitude},
+\]
+
+\[
+d=2 \rightarrow \text{extreme-deviation sensitivity}.
+\]
+
+So the correct conceptual reading is not that partial moments add extra domain-specific measures. It is that quantile calibration itself can be performed in different geometries:
+
+- raw counting geometry at degree 0,
+- linear severity geometry at degree 1,
+- quadratic severity geometry at degree 2.
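+
+The three geometries can be compared on one sample. The sketch below uses hypothetical base-R helpers rather than any **`NNS`** function. It takes a deterministic standard-normal sample and an illustrative \(\alpha = 0.05\), and it searches for \(t_\alpha^{(d)}\) only over the observed points. That search is exact for \(d = 0\). For \(d \ge 1\), where \(F_d\) is continuous in \(t\), it returns the first sample point past the exact threshold.
+
+```r
+# Hypothetical helpers (not the NNS API): empirical lower/upper directional masses
+lpm <- function(d, t, x) if (d == 0) mean(x <= t) else mean(pmax(t - x, 0)^d)
+upm <- function(d, t, x) if (d == 0) mean(x > t)  else mean(pmax(x - t, 0)^d)
+
+# Normalized lower share F_d(t; X) = L_d / (L_d + U_d)
+F_d <- function(d, t, x) {
+  l <- lpm(d, t, x)
+  l / (l + upm(d, t, x))
+}
+
+# t_alpha^(d) = inf{t : F_d(t; X) >= alpha}, searched over the sorted sample points
+threshold <- function(x, alpha, d) {
+  grid  <- sort(x)
+  share <- vapply(grid, function(t) F_d(d, t, x), numeric(1))
+  grid[which(share >= alpha)[1]]
+}
+
+x  <- qnorm(ppoints(999))   # deterministic stand-in for a standard normal sample
+th <- sapply(0:2, function(d) threshold(x, alpha = 0.05, d = d))
+th                          # degree 0, 1, 2: thresholds move toward the center
+```
+
+With this sample, the degree-1 and degree-2 thresholds sit closer to the center than the degree-0 threshold. This is consistent with reading higher-degree thresholds as severity-weighted calibrations of the same lower tail.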
+
+---
+
+## Directional Probability Bounds
+
+Threshold selection is only part of the problem. Analysts also want distribution-free upper bounds on the probability of threshold violation.
+
+Suppose \(g < \mu\), where \(\mu = E[X]\), and consider the lower-tail event
+
+\[
+X \le g.
+\]
+
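+For a distribution that is symmetric about \(\mu\), a classical Chebyshev argument bounds this probability using only the mean and variance:
+
+\[
+P(X \le g)\le \frac{1}{2}\left(\frac{\sigma}{\mu-g}\right)^2.
+\]
+
+Without the symmetry assumption, Chebyshev's inequality delivers only the weaker bound \(P(X \le g)\le \left(\sigma/(\mu-g)\right)^2\).
+
+A directional refinement replaces the symmetric standard deviation with the below-mean semideviation:
+
+\[
+P(X \le g)\le \left(\frac{\sigma_-}{\mu-g}\right)^2,
+\]
+
+where \(\sigma_-^2 = E[(\mu-X)_+^2]\) measures only downside dispersion. This bound requires no symmetry assumption. For a symmetric distribution \(\sigma_-^2=\sigma^2/2\), so it recovers the symmetric Chebyshev bound.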
+
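+A more general bound uses lower partial moments of any degree \(\alpha > 0\). Here \(\alpha\) denotes a degree, not a probability level. Define
+
+\[
+\theta(t,\alpha)=\left(E[(t-X)_+^\alpha]\right)^{1/\alpha}.
+\]
+
+Then, for any benchmark \(t > g\), Markov's inequality applied to \((t-X)_+^\alpha\) gives
+
+\[
+P(X\le g)\le \left(\frac{\theta(t,\alpha)}{t-g}\right)^\alpha.
+\]
+
+Setting \(t=\mu\) and \(\alpha=2\) gives \(\theta(\mu,2)=\sigma_-\), the below-mean semideviation, and recovers the semivariance bound.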
+
+So the directional hierarchy of probability control is
+
+\[
+\text{symmetric variance bound} \to \text{semivariance bound} \to \text{general lower-partial-moment bound}.
+\]
+
+Each step aligns the bound more closely with the side of the distribution that matters for the decision.
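+
+The three bounds can be compared on a deliberately skewed sample. The following choices are all illustrative:
+
+- a shifted exponential sample built from deterministic quantiles, so no seed is needed;
+- a benchmark 0.8 units below the mean;
+- lower-partial-moment degrees 1 to 4, evaluated at \(t=\mu\).
+
+Sample moments are computed in population form (dividing by \(n\)), so each bound is a valid Markov-type bound for the empirical distribution itself.
+
+```r
+x  <- qexp(ppoints(2000)) - 1   # deterministic right-skewed sample, mean close to 0
+mu <- mean(x)
+g  <- mu - 0.8                  # adverse benchmark below the mean
+
+freq <- mean(x <= g)            # empirical violation frequency
+
+sigma2     <- mean((x - mu)^2)          # variance (population form)
+semivar    <- mean(pmax(mu - x, 0)^2)   # below-mean semivariance, sigma_-^2
+cheb_bound <- sigma2 / (mu - g)^2       # Chebyshev, no symmetry assumed
+semi_bound <- semivar / (mu - g)^2      # semivariance bound
+
+# Lower-partial-moment bounds at t = mu for degrees 1 to 4; keep the tightest
+lpm_bound <- min(sapply(1:4, function(a) mean(pmax(mu - x, 0)^a) / (mu - g)^a))
+
+round(c(freq = freq, chebyshev = cheb_bound, semivariance = semi_bound, lpm = lpm_bound), 3)
+```
+
+In this right-skewed example the variance-based Chebyshev bound exceeds 1 and says nothing. The semivariance and lower-partial-moment bounds ignore the long right tail, so they stay informative.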
+
+---
+
+## Severity-Weighted Thresholds as Early-Intervention Rules
+
+Once higher-degree thresholds are viewed as severity-weighted quantiles, an important practical feature becomes clear.
+
+A degree-1 or degree-2 threshold can be less extreme than the degree-0 threshold and yet still produce milder realized tail behavior.
+
+This is not a contradiction. It occurs because higher-degree thresholds are calibrated in weighted directional mass, not in raw event counts.
+
+Mathematically, this makes sense. The degree-2 rule assigns much more weight to large adverse deviations than to small ones. If a 10-unit shortfall contributes \(10^2\) units of quadratic severity while a 1-unit shortfall contributes only \(1^2\), then the threshold naturally shifts toward earlier intervention.
+
+The same logic applies across domains:
+
+- in forecasting, the rule intervenes before very large misses accumulate,
+- in operations, it triggers replenishment before deep shortages form,
+- in engineering, it signals action before large safety-margin breaches dominate the lower tail.
+
+Thus higher-degree thresholds are best interpreted as **early-intervention rules under asymmetric cost**.
+
+---
+
+## Model Misspecification and Robustness
+
+Directional thresholds are especially useful when parametric models misrepresent the lower tail.
+
+The problem is general: parametric misspecification is often most consequential in the tail, exactly where decision costs are highest.
+
+The directional framework responds in two ways.
+
+First, it estimates thresholds directly from empirical directional structure via degree-0 or higher-degree partial-moment quantiles.
+
+Second, it supplements those empirical thresholds with distribution-free probability bounds that remain valid under far weaker assumptions than a fully specified parametric family.
+
+This is particularly important under skewness, heavy tails, and asymmetric adverse regions, where symmetric models can understate the severity of rare but important events.
+
+---
+
+## Estimation Error and Sample-Size Sensitivity
+
+A directional statistic is only operationally useful if it stabilizes as sample size grows.
+
+Estimation error is therefore not a peripheral concern. It is central to any threshold-based or benchmark-driven decision process. If a statistic is unstable, then even a mathematically correct threshold rule can become unreliable in practice.
+
+The key empirical question is whether partial moments behave at least as well as classical mean-variance quantities under regular conditions and whether they improve upon them when the data are asymmetric or heavy-tailed.
+
+This matters well beyond portfolio optimization. Any benchmark-driven procedure depends on stable estimation of lower-tail structure. If lower partial moments and semideviation remain well behaved under skewness and heavy tails, then they are not only conceptually aligned with directional asymmetry. They are also strong candidates for practical nonparametric measurement when classical symmetric summaries are fragile.
+
+In particular, because the CDF is itself the degree-0 partial moment, the stability of degree-0 partial-moment estimates is the stability of the empirical CDF. The CDF is not merely a theoretical building block; it is also a stable empirical object within the directional system.
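+
+A simple way to see sample-size sensitivity is to resample at increasing sample sizes and watch the spread of the estimates shrink. The sketch below substitutes ordinary i.i.d. resampling from a fixed skewed population; it is not the Maximum Entropy Bootstrap workflow used later in the book. The lognormal population, the median benchmark, the sample sizes, and the replication count are all arbitrary choices.
+
+```r
+set.seed(42)
+population <- rlnorm(1e5, meanlog = 0, sdlog = 1)   # skewed, heavy right tail
+target <- median(population)                         # benchmark for the partial moments
+
+# Standard deviation of each estimate across `reps` resamples of size n
+estimates_at <- function(n, reps = 500) {
+  est <- replicate(reps, {
+    s <- sample(population, n, replace = TRUE)
+    c(lpm0    = mean(s <= target),                   # degree-0 LPM (empirical CDF)
+      semidev = sqrt(mean(pmax(target - s, 0)^2)))   # downside semideviation at target
+  })
+  apply(est, 1, sd)
+}
+
+sizes  <- c(25, 100, 400, 1600)
+spread <- sapply(sizes, estimates_at)   # rows: statistics, columns: sample sizes
+colnames(spread) <- sizes
+round(spread, 4)
+```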
+
+---
+
+## Utility, Decision Context, and Why Degree Matters
+
+The correct threshold degree depends on the decision problem.
+
+In general benchmark-relative terms, the lesson is:
+
+- if the main concern is **how often** a threshold is crossed, degree 0 is appropriate;
+- if the concern is **how much aggregate damage** accumulates below the threshold, degree 1 is more natural;
+- if the concern is **rare but severe violations**, degree 2 or higher can be more appropriate.
+
+A benchmark may be a target return, a policy threshold, a forecast baseline, a service minimum, or a safety limit. The degree determines how adverse deviation relative to that benchmark is measured.
+
+So the framework is not one-threshold-fits-all. It is a family of threshold rules indexed by the geometry of the adverse region.
+
+---
+
+## Practical Workflow
+
+A general workflow for directional tail analysis is:
+
+1. **Choose a benchmark context.**
+ Identify the lower threshold region that matters substantively.
+
+2. **Estimate the degree-0 threshold.**
+ Compute
+
+ \[
+ t_\alpha^{(0)}=\inf\{t:L_0(t;X)\ge \alpha\}.
+ \]
+
+ This yields the frequency-calibrated threshold.
+
+3. **Estimate higher-degree thresholds.**
+ Compute degree-1 and degree-2 threshold rules through normalized directional mass:
+
+ \[
+ t_\alpha^{(d)}=\inf\{t:F_d(t;X)\ge \alpha\}.
+ \]
+
+4. **Bound lower-tail probability conservatively.**
+ Use one-sided Chebyshev, semivariance, and Atwood-style lower-partial-moment bounds to assess worst-case violation probabilities.
+
+5. **Compare with parametric approximations if relevant.**
+ Large discrepancies indicate model risk in the tail.
+
+6. **Assess finite-sample stability and sample-size sensitivity.**
+ Implement the sample-size sensitivity diagnostics from Section 17.7 using the **Maximum Entropy Bootstrap** workflow developed in Chapter 17, especially when threshold rules feed larger decision or optimization systems.
+
+This workflow makes clear why threshold analysis, probability bounds, and estimation error belong together.
+
+---
+
+## Summary
+
+This chapter extended the directional framework from interval estimation to full tail-threshold analysis.
+
+The key ideas are:
+
+- The lower-tail quantile is the degree-0 partial-moment threshold because
+
+ \[
+ L_0(t;X)=F_X(t).
+ \]
+
+- Higher-degree thresholds are severity-weighted calibrations of the lower tail, not merely alternative labels.
+
+- Semivariance and higher lower partial moments yield distribution-free upper bounds on threshold-violation probabilities.
+
+- Severity-weighted thresholds can act as early-intervention rules because they respond to adverse magnitude, not just event counts.
+
+- Parametric misspecification matters most in the tail, so empirical directional thresholds and probability bounds provide complementary robustness.
+
+- Partial moments form a coherent nonparametric language for threshold selection, probability control, and finite-sample decision support.
+
+In that sense, quantiles, semivariance, lower partial moments, and estimation error belong together. They are generated by the same benchmark-relative primitives and serve the same broader purpose: to make probability, thresholds, and adverse deviation analysis interpretable without relying on restrictive symmetry or parametric assumptions.
+
+Chapter 17 then supplies the synthetic-data and Maximum Entropy Bootstrap machinery used to operationalize these stability checks, after which Chapter 18 returns to recursive mean-split estimation for adaptive nonparametric regression.
+
+---
+
+## References
+
+- Berck, P., & Hihn, J. (1982). *Using the Semivariance to Estimate Safety-First Rules*. *American Journal of Agricultural Economics*, May 1982, 298-300.
+- Atwood, M. (1985). *Demonstration of the Use of Lower Partial Moments to Improve Safety-First Probability Limits*. *American Journal of Agricultural Economics*, 67(4), 880-886. DOI: https://doi.org/10.2307/1241818
+- Rockafellar, R. T., & Uryasev, S. (2000). *Optimization of Conditional Value-at-Risk*. *Journal of Risk*, 2(3), 21-41.
+- Chebyshev, P. L. (1867). *Des valeurs moyennes*. *Journal de Mathématiques Pures et Appliquées*, 12, 177-184.
+- Viole, F., & Nawrocki, D. (2012). *Cumulative Distribution Functions and UPM/LPM Analysis*. SSRN. DOI: https://doi.org/10.2139/ssrn.2148482
+- Viole, F. (2025). *Value-at-Risk (VaR) and Probability Bounds Analysis* (June 18, 2025). SSRN. https://ssrn.com/abstract=5310345. DOI: https://doi.org/10.2139/ssrn.5310345
+- Nawrocki, D., & Viole, F. (2024). *Estimation error and partial moments*. *International Review of Financial Analysis*, 95, Part B, 103443. DOI: https://doi.org/10.1016/j.irfa.2024.103443
diff --git a/tools/NNS/book/chapter-17-prediction-intervals.Rmd b/tools/NNS/book/chapter-17-prediction-intervals.Rmd
new file mode 100644
index 0000000..4265b62
--- /dev/null
+++ b/tools/NNS/book/chapter-17-prediction-intervals.Rmd
@@ -0,0 +1,405 @@
+# Prediction Intervals
+
+Previous chapters established the directional framework for probability, dependence, and statistical inference. Chapter 13 derived conditional probability and Bayes' theorem from co-partial moments, and Chapter 15 developed methods for comparing entire distributions without relying on parametric hypothesis tests — including the formal establishment of the degree-one partial moment CDF as a bias-free, continuous probability representation.
+
+A natural next step is **prediction**.
+
+In statistical analysis it is often necessary to estimate the range within which **future observations** are likely to occur. Classical statistics addresses this problem using *confidence intervals* and *prediction intervals*. These concepts are frequently confused, but they serve fundamentally different purposes.
+
+This chapter clarifies the distinction and develops **distribution-free prediction intervals** based on partial moments. Because the directional framework represents distributions directly through probability mass relative to benchmarks, interval estimation can be constructed without parametric assumptions and without relying on asymptotic approximations.
+
+---
+
+## Confidence Intervals versus Prediction Intervals
+
+A **confidence interval** estimates an unknown population parameter.
+
+For example, a classical confidence interval for the mean takes the form
+
+$$
+\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}},
+$$
+
+where
+
+- $\bar{X}$ is the sample mean,
+- $s$ is the sample standard deviation,
+- $n$ is the sample size, and
+- $z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal distribution.
+
+This interval reflects uncertainty about the **parameter** $\mu = E[X]$.
+
+A **prediction interval**, by contrast, estimates the range within which a **future observation** will fall.
+
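+For a normally distributed population, a classical prediction interval uses the upper $\alpha/2$ critical value $t_{\alpha/2,n-1}$ of Student's $t$ distribution with $n-1$ degrees of freedom: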
+
+$$
+\bar{X} \pm t_{\alpha/2,n-1} s \sqrt{1 + \frac{1}{n}}.
+$$
+
+Prediction intervals are wider because they incorporate two sources of uncertainty:
+
+1. uncertainty about the population parameters, and
+2. natural variability of individual observations.
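+
+The two formulas can be compared on a small simulated sample. The sketch below assumes normal data with arbitrary parameters. It uses `qnorm(1 - alpha / 2)` for $z_{\alpha/2}$ and `qt(1 - alpha / 2, df = n - 1)` for $t_{\alpha/2,n-1}$. Because $t_{\alpha/2,n-1} > z_{\alpha/2}$ and $\sqrt{1 + 1/n} > 1$, the prediction interval is always more than $\sqrt{n}$ times wider than this confidence interval.
+
+```r
+set.seed(1)
+x <- rnorm(30, mean = 10, sd = 2)   # illustrative normal sample
+n <- length(x)
+alpha <- 0.05
+xbar <- mean(x)
+s <- sd(x)
+
+# Confidence interval for the mean, with the normal critical value used in the text
+conf_int <- xbar + c(-1, 1) * qnorm(1 - alpha / 2) * s / sqrt(n)
+
+# Prediction interval for a single future observation (Student t, n - 1 df)
+pred_int <- xbar + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * s * sqrt(1 + 1 / n)
+
+rbind(confidence = conf_int, prediction = pred_int)
+diff(pred_int) / diff(conf_int)   # width ratio, larger than sqrt(n)
+```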
+
+In practice, however, classical prediction intervals rely heavily on **distributional assumptions**, particularly normality.
+
+The directional framework allows prediction intervals to be constructed **directly from the empirical distribution**, avoiding these assumptions entirely.
+
+---
+
+## The Discrete–Continuous Distinction: Recap and Application to Intervals
+
+Chapter 15 (Section 15.1.8) established a fundamental result that carries directly into interval estimation. The empirical CDF is a **degree-zero lower partial moment** — a discrete, step-function measure that is systematically biased at the mean for any finite sample:
+
+$$
+\hat{F}_X(\bar{x}) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{x_i \le \bar{x}\}} \ne 0.5
+$$
+
+in general. The **degree-one partial moment ratio**
+
+$$
+F_1(t; X) = \frac{LPM_1(t, X)}{LPM_1(t, X) + UPM_1(t, X)}
+$$
+
+eliminates this bias exactly: $F_1(\bar{x}; X) = 0.5$ for every distribution, every sample size, without exception (Section 15.1.8).
+
+This distinction matters for interval estimation because any quantile procedure that uses the discrete CDF to locate interval bounds inherits the same finite-sample bias — producing intervals that are systematically shifted or miscalibrated. The degree-one ratio corrects this at the source.
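+
+The exactness of $F_1(\bar{x}; X) = 0.5$ follows from the fact that deviations from the sample mean sum to zero. The positive and negative deviation masses $UPM_1(\bar{x}, X)$ and $LPM_1(\bar{x}, X)$ are therefore equal. A tiny skewed sample makes the contrast with the empirical CDF concrete. The helper `F1()` is a hypothetical base-R sketch, not an **`NNS`** function.
+
+```r
+x <- c(1, 2, 3, 10)   # right-skewed sample, mean 4
+xbar <- mean(x)
+
+# Degree-zero (empirical) CDF at the mean: 3 of 4 observations are <= 4
+ecdf(x)(xbar)         # 0.75
+
+# Degree-one ratio LPM_1 / (LPM_1 + UPM_1) at benchmark t
+F1 <- function(t, x) {
+  lpm1 <- mean(pmax(t - x, 0))   # average shortfall below t
+  upm1 <- mean(pmax(x - t, 0))   # average excess above t
+  lpm1 / (lpm1 + upm1)
+}
+F1(xbar, x)           # 0.5: LPM_1 = UPM_1 = 1.5 at the mean
+
+set.seed(7)
+y <- rexp(57)
+F1(mean(y), y)        # 0.5 again, up to floating-point rounding
+```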
+
+---
+
+## Prediction as a Quantile Problem
+
+Prediction intervals can be understood as **quantile intervals** of the underlying distribution.
+
+Let $X$ be a random variable with cumulative distribution function $F_X(t)$. The $p$-quantile is defined as
+
+$$
+Q_X(p) = \inf \{t : F_X(t) \ge p\}.
+$$
+
+A prediction interval with coverage probability $1-\alpha$ is therefore
+
+$$
+[Q_X(\alpha/2), \, Q_X(1-\alpha/2)].
+$$
+
+For example, a 95% prediction interval corresponds to
+
+$$
+[Q_X(0.025),\, Q_X(0.975)].
+$$
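+
+The definition $Q_X(p) = \inf\{t : F_X(t) \ge p\}$ can be applied literally to the empirical CDF. In the base-R sketch below, `emp_quantile()` is a hypothetical helper; it agrees with R's `quantile(..., type = 1)`, which is documented as the inverse of the empirical distribution function.
+
+```r
+# Literal empirical version of Q(p) = inf{t : F_n(t) >= p} (hypothetical helper).
+emp_quantile <- function(x, p) {
+  xs <- sort(x)
+  Fn <- seq_along(xs) / length(xs)   # F_n evaluated at each order statistic
+  xs[which(Fn >= p)[1]]              # smallest order statistic where F_n reaches p
+}
+
+x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)
+ps <- c(0.05, 0.25, 0.5, 0.95)
+sapply(ps, function(p) emp_quantile(x, p))   # -2 0 2 10
+quantile(x, ps, type = 1, names = FALSE)     # same values
+```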
+
+Classical parametric methods estimate these quantiles using assumed distributions. The directional framework estimates them **directly from partial moments**, with the additional capability to choose between discrete (degree-zero) and continuous (degree-one) probability representations.
+
+Prediction intervals are one application of quantile inversion, but they are not the only one. The same inversion logic can also be used to select benchmark thresholds for directional decision analysis.
+
+A lower-tail threshold chosen by
+\[
+\inf\{t:F_X(t)\ge \alpha\}
+\]
+can be interpreted as the lower endpoint of a tail-quantile construction. In finance this object is often called Value-at-Risk, but the underlying mathematics is much more general. It may represent an acceptable forecast error, a reliability boundary, a minimum service level, or any other adverse threshold defined relative to a benchmark.
+
+This distinction is conceptual rather than mathematical. Prediction intervals ask for a range likely to contain future observations. Threshold analysis asks where the lower tail begins to contain a specified fraction of directional mass. In both cases the central task is quantile inversion.
+
+Once higher degrees are introduced, the interpretation broadens further. Degree-zero thresholds partition observations by frequency. Higher-degree thresholds partition them by severity-weighted directional mass. Thus the quantile framework supports both interval estimation and benchmark-sensitive threshold design.
+
+---
+
+## Partial-Moment Quantile Functions: LPM.VaR and UPM.VaR
+
+The NNS package provides two complementary quantile functions that invert the partial moment CDF:
+
+**`LPM.VaR(percentile, degree, x)`** returns the value $t$ such that $F_\text{degree}(t; X) = p$, where $p$ is `percentile`. This is the left-tail (lower) quantile at probability level $p$.
+
+**`UPM.VaR(percentile, degree, x)`** returns the value $t$ such that $1 - F_\text{degree}(t; X) = p$. This is the right-tail (upper) quantile at probability level $p$.
+
+Both functions accept a degree argument. Setting `degree = 0` uses the discrete empirical CDF and therefore returns classical empirical quantiles. Setting `degree = 1` uses the continuous area-based probability representation established in Chapter 15. Higher degrees extend the same inversion principle to severity-weighted directional probability. Thus `LPM.VaR` is best interpreted not as a finance-specific tool, but as a general lower-tail threshold operator generated by the partial-moment framework.
+
+For a prediction interval with coverage $1 - \alpha$:
+
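+```r
+library(NNS)
+
+# x: any numeric sample (simulated here for illustration)
+set.seed(123)
+x <- rnorm(100)
+
+# 95% prediction interval using continuous (degree = 1) quantiles
+lower <- LPM.VaR(percentile = 0.025, degree = 1, x = x)
+upper <- UPM.VaR(percentile = 0.025, degree = 1, x = x)
+```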
+
+The degree-one quantiles provide **smoother, less jagged interval boundaries** than their degree-zero counterparts, particularly in small samples where the step-function CDF may produce large jumps between adjacent order statistics.
+
+
+
+### Generalized Threshold Operators
+
+The notation `LPM.VaR` can appear more domain-specific than the underlying mathematics actually is. In the present framework, the function returns the benchmark \(t\) that solves a lower-tail threshold problem under a chosen directional degree.
+
+When \(d=0\), the threshold is
+\[
+t_\alpha^{(0)}=\inf\{t:L_0(t;X)\ge \alpha\},
+\]
+which is the ordinary empirical lower quantile.
+
+When \(d=1\), the threshold instead solves
+\[
+t_\alpha^{(1)}=\inf\left\{t:\frac{L_1(t;X)}{L_1(t;X)+U_1(t;X)}\ge \alpha\right\}.
+\]
+This no longer counts all observations equally. Larger deviations below the benchmark contribute more heavily to the lower-tail mass.
+
+Similarly, when \(d=2\),
+\[
+t_\alpha^{(2)}=\inf\left\{t:\frac{L_2(t;X)}{L_2(t;X)+U_2(t;X)}\ge \alpha\right\},
+\]
+so extreme adverse deviations receive quadratic weight.
+
+These thresholds therefore define a family of directional calibration rules:
+
+- degree 0: frequency-calibrated threshold,
+- degree 1: magnitude-calibrated threshold,
+- degree 2: extreme-deviation-calibrated threshold.
+
+This interpretation is fully general. It applies whenever one wishes to choose a threshold not only by how often a process crosses it, but also by how severely the process behaves once crossed.
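+
+The three calibration rules can be computed side by side with a short base-R sketch. The helper `lower_threshold()` is hypothetical and is not the NNS implementation. Degree 0 uses the inverse empirical CDF; for $d \ge 1$ it solves $L_d/(L_d + U_d) = \alpha$ with `uniroot()`, which works because that ratio rises continuously from 0 at the sample minimum to 1 at the sample maximum.
+
+```r
+# Hypothetical sketch of t_alpha^(d) = inf{t : L_d / (L_d + U_d) >= alpha}.
+lower_threshold <- function(x, alpha, d) {
+  if (d == 0) {
+    # frequency calibration: inverse of the empirical CDF
+    return(quantile(x, alpha, type = 1, names = FALSE))
+  }
+  ratio <- function(t) {
+    L <- mean(pmax(t - x, 0)^d)   # degree-d lower partial moment about t
+    U <- mean(pmax(x - t, 0)^d)   # degree-d upper partial moment about t
+    L / (L + U)
+  }
+  # ratio(min(x)) = 0 and ratio(max(x)) = 1, so the root is bracketed
+  uniroot(function(t) ratio(t) - alpha, interval = range(x), tol = 1e-10)$root
+}
+
+x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)
+sapply(0:2, function(d) lower_threshold(x, alpha = 0.05, d = d))
+```
+
+On this sample the thresholds move to successively higher (milder) benchmarks as the degree rises from 0 to 2.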
+
+
+## Link to Stochastic Dominance (Chapter 15)
+
+Chapters 15 and 17 use the same degree-one quantile geometry from different angles.
+
+- In **stochastic dominance** (Chapter 15), the ordering is defined by integrated quantile/CDF behavior.
+- In **prediction intervals** (this chapter), interval endpoints are degree-zero or degree-one quantiles from `LPM.VaR` and `UPM.VaR`.
+
+The methods are mathematically unified: the degree-one objects used to construct bias-corrected intervals are the same continuous probability objects used to diagnose dominance relations.
+
+---
+
+## Comparison with Bootstrap Confidence Intervals
+
+The difference between discrete and continuous partial moment quantiles becomes especially apparent when applied to **bootstrap confidence intervals**.
+
+Consider the correlation statistic from the `law` dataset (Efron and Tibshirani, 1993). Standard bootstrap methods produce a range of confidence intervals depending on the method chosen:
+
+```r
+library(bootstrap); library(boot)
+data("law")
+
+get_r <- function(data, indices, x, y) {
+ d <- data[indices, ]
+ round(as.numeric(cor(d[x], d[y])), 3)
+}
+
+set.seed(12345)
+boot_out <- boot(law, x = "LSAT", y = "GPA", R = 500, statistic = get_r)
+
+boot.ci(boot_out)
+## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
+## Level Normal Basic
+## 95% ( 0.5247, 1.0368 ) ( 0.5900, 1.0911 )
+## Level Percentile BCa
+## 95% ( 0.4609, 0.9620 ) ( 0.3948, 0.9443 )
+```
+
+The distribution of bootstrapped correlations is visibly asymmetric, with a long left tail, and the upper bounds of the Normal and Basic intervals exceed 1.0, which is impossible for a correlation coefficient.
+
+## Degree-Zero Partial Moment Intervals
+
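+The degree-zero LPM quantile corresponds to the percentile bootstrap method. The discrete CDF assigns equal weight to each bootstrap replicate, producing the same step-function quantiles; the small differences from the `boot.ci` percentile interval reported above largely reflect different interpolation between adjacent order statistics: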
+
+```r
+# Discrete lower and upper CI — corresponds to percentile method
+LPM.VaR(percentile = 0.025, degree = 0, x = boot_out$t)
+## [1] 0.4688333
+
+UPM.VaR(percentile = 0.025, degree = 0, x = boot_out$t)
+## [1] 0.9632222
+```
+
+## Degree-One Partial Moment Intervals
+
+The degree-one quantile uses area-based probability, which weights each replicate by its distance from the candidate bound instead of counting replicates equally. In this example both bounds move inward relative to the percentile bounds, adjusting for the asymmetry of the bootstrap distribution without a double bootstrap:
+
+```r
+# Continuous CI — bias-corrected, no double bootstrap needed
+LPM.VaR(percentile = 0.025, degree = 1, x = boot_out$t)
+## [1] 0.5612749
+
+UPM.VaR(percentile = 0.025, degree = 1, x = boot_out$t)
+## [1] 0.8263255
+```
+
+The degree-one interval $(0.561,\, 0.826)$ is substantially tighter than the percentile interval $(0.461,\, 0.962)$, and it does not require the computationally expensive double bootstrap needed for studentized intervals. Unlike the Normal and Basic intervals, its upper bound is comfortably below 1.0.
+
+Classical bootstrap corrections (BCa, studentized) attempt to fix asymmetry through additional resampling passes or acceleration constants. The degree-one partial moment approach achieves comparable correction by simply replacing the discrete counting measure with area-based probability on the **original** bootstrap sample — the same substitution that eliminates bias in the NNS ANOVA procedure (Chapter 15, Section 15.1.8).
+
+---
+
+## Distribution-Free Prediction Intervals
+
+Using the NNS quantile functions, a prediction interval with coverage probability $1-\alpha$ is constructed as
+
+$$
+[\text{LPM.VaR}(\alpha/2,\, d,\, X),\; \text{UPM.VaR}(\alpha/2,\, d,\, X)]
+$$
+
+for a chosen degree $d \in \{0, 1\}$.
+
+**Degree 0** recovers the classical empirical quantile interval — the order statistics at ranks $\lceil n\alpha/2 \rceil$ and $\lceil n(1-\alpha/2) \rceil$.
+
+**Degree 1** provides a continuous, bias-corrected alternative that avoids the finite-sample discretization error documented in Section 15.1.8.
+
+No parametric assumptions, no variance estimates, and no asymptotic approximations are required by either variant.
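+
+The construction can be sketched in base R. `pm_interval()` is a hypothetical helper rather than the NNS implementation. At degree 0 it returns the order statistics at ranks $\lceil n\alpha/2 \rceil$ and $\lceil n(1-\alpha/2) \rceil$ via `quantile(..., type = 1)`; for $d \ge 1$ the lower bound solves $F_d(t; X) = \alpha/2$ and the upper bound solves $1 - F_d(t; X) = \alpha/2$, with $F_d = L_d/(L_d + U_d)$.
+
+```r
+# Hypothetical sketch of a two-sided directional interval at degree d.
+pm_interval <- function(x, alpha, d) {
+  if (d == 0) {
+    # empirical order statistics at ranks ceiling(n*alpha/2), ceiling(n*(1-alpha/2))
+    q <- quantile(x, c(alpha / 2, 1 - alpha / 2), type = 1, names = FALSE)
+    return(setNames(q, c("lower", "upper")))
+  }
+  F_d <- function(t) {
+    L <- mean(pmax(t - x, 0)^d)
+    U <- mean(pmax(x - t, 0)^d)
+    L / (L + U)
+  }
+  solve_for <- function(p) {
+    uniroot(function(t) F_d(t) - p, interval = range(x), tol = 1e-10)$root
+  }
+  # lower: F_d(t) = alpha/2; upper: 1 - F_d(t) = alpha/2, i.e. F_d(t) = 1 - alpha/2
+  c(lower = solve_for(alpha / 2), upper = solve_for(1 - alpha / 2))
+}
+
+x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)
+pm_interval(x, alpha = 0.10, d = 0)
+pm_interval(x, alpha = 0.10, d = 1)
+```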
+
+
+More generally, one may define a lower-tail directional threshold by
+\[
+t_\alpha^{(d)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=d,X),
+\]
+where \(d=0\) corresponds to frequency-based probability and \(d\ge 1\) corresponds to severity-weighted directional probability. The interpretation of the threshold changes with \(d\), but the computational principle remains unchanged: invert the chosen directional probability representation.
+
+---
+
+## Example
+
+Consider the fully specified sample
+
+$$
+X = -2,\,-1,\,0,\,1,\,2,\,3,\,5,\,7,\,9,\,10.
+$$
+
+For a 90% prediction interval ($\alpha = 0.10$):
+
+```r
+library(NNS)
+x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)
+
+# Degree 0: classical empirical quantiles
+LPM.VaR(percentile = 0.05, degree = 0, x = x)
+## [1] -2
+UPM.VaR(percentile = 0.05, degree = 0, x = x)
+## [1] 10
+
+# Degree 1: continuous area-based quantiles
+LPM.VaR(percentile = 0.05, degree = 1, x = x)  # solves F_1(t; X) = 0.05: t = -10/23, about -0.435
+UPM.VaR(percentile = 0.05, degree = 1, x = x)  # solves 1 - F_1(t; X) = 0.05: t = 188/23, about 8.174
+```
+
+The degree-zero interval is $[-2, 10]$, anchored exactly at the observed extremes. The degree-one interval, approximately $[-0.435,\, 8.174]$, is considerably tighter, reflecting the continuous probability mass that lies between order statistics. For example, on $-1 \le t \le 0$ the lower ratio is $F_1(t; X) = (2t+3)/(40-6t)$, which equals $0.05$ at $t = -10/23$. With only $n=10$ observations, this difference is practically meaningful.
+
+---
+
+
+
+### Worked Threshold Example: Frequency versus Severity
+
+To see how degree changes interpretation, consider a sample \(X\) and a lower-tail calibration level \(\alpha\).
+
+A degree-zero threshold is
+\[
+t_\alpha^{(0)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=0,X),
+\]
+which partitions the sample by event frequency. Approximately an \(\alpha\) fraction of observations lies below the selected threshold.
+
+Now compare this with degree-one and degree-two thresholds:
+\[
+t_\alpha^{(1)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=1,X),
+\qquad
+t_\alpha^{(2)}=\mathrm{LPM.VaR}(\alpha,\text{ degree }=2,X).
+\]
+These thresholds are chosen in a different geometry. They do not partition simple counts. Instead they partition severity-weighted directional mass. Large adverse deviations therefore influence the threshold more strongly than small ones.
+
+This leads to an important practical phenomenon. A severity-weighted threshold may occur at a milder benchmark than a frequency-based threshold, yet still produce a less severe set of threshold violations in realized use. The reason is that the rule is designed to trigger earlier, before the most extreme adverse deviations dominate the lower tail. The probability-bounds literature demonstrates this effect empirically in finance examples, but the mechanism is general and should be understood as an early-intervention property of higher-degree threshold calibration.
+
+This distinction matters in many domains:
+
+* in forecasting, a severity-weighted threshold warns before very large misses accumulate,
+* in operations, it triggers replenishment before severe shortages emerge,
+* in engineering, it signals intervention before large safety-margin breaches dominate the tail.
+
+What appears paradoxical under degree-zero counting becomes natural under directional weighting. The threshold is solving a different partition problem.
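+
+The frequency-versus-severity contrast can be made concrete on the ten-point sample from the Example section. This base-R sketch (the helper `threshold_d()` is hypothetical, not NNS code) computes the lower-tail threshold at $\alpha = 0.05$ for degrees 0, 1 and 2, then reports the share of observations at or below each threshold.
+
+```r
+# Hypothetical helper: lower-tail threshold at degree d (degree 0 = inverse ECDF).
+threshold_d <- function(x, alpha, d) {
+  if (d == 0) return(quantile(x, alpha, type = 1, names = FALSE))
+  f <- function(t) {
+    L <- mean(pmax(t - x, 0)^d)   # severity-weighted mass below t
+    U <- mean(pmax(x - t, 0)^d)   # severity-weighted mass above t
+    L / (L + U) - alpha
+  }
+  uniroot(f, interval = range(x), tol = 1e-10)$root
+}
+
+x <- c(-2, -1, 0, 1, 2, 3, 5, 7, 9, 10)
+thresholds  <- sapply(0:2, function(d) threshold_d(x, 0.05, d))
+share_below <- sapply(thresholds, function(t) mean(x <= t))
+data.frame(degree = 0:2, threshold = round(thresholds, 3), share_below = share_below)
+```
+
+Here the share at or below the threshold rises from 0.1 at degree 0 to 0.2 at degree 1 and 0.3 at degree 2: the higher-degree thresholds sit at milder benchmarks and trigger on more observations, which is the early-intervention behavior described above.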
+
+
+## Interpretation and Coverage
+
+Prediction intervals constructed from partial moment quantiles possess a simple probabilistic interpretation.
+
+For general threshold analysis, the interpretation can be stated succinctly. Under degree zero, the lower-tail threshold controls event frequency. Under higher degrees, the threshold controls weighted adverse exposure. The probability statement is therefore exact in the ordinary counting sense only for degree zero. For higher degrees, what is controlled is not raw event count but directional mass in a weighted geometry.
+
+Let $X_{n+1}$ denote a future observation drawn from the same distribution as the sample. Then
+
+$$
+P\bigl(\text{LPM.VaR}(\alpha/2, d, X) \le X_{n+1} \le \text{UPM.VaR}(\alpha/2, d, X)\bigr) \approx 1 - \alpha.
+$$
+
+For degree $d = 0$, the approximation converges to equality as $n \to \infty$.
+
+For degree $d = 1$, the continuous probability representation reduces finite-sample error, providing improved coverage in small samples without parametric assumptions.
+
+Unlike parametric prediction intervals, neither variant depends on distributional assumptions. Coverage is determined entirely by the empirical probability structure.
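+
+The small-sample behavior of the degree-0 interval can be checked by simulation. In this base-R sketch the lognormal population, $n = 20$, $\alpha = 0.10$ and 5000 replications are illustrative assumptions. For continuous i.i.d. data the coverage of $[X_{(i)}, X_{(j)}]$ for a new draw is exactly $(j - i)/(n + 1)$, because the new observation is equally likely to take any of the $n + 1$ ranks in the combined sample.
+
+```r
+# Monte Carlo coverage of the degree-0 interval for one future observation.
+# Assumptions: iid lognormal data, n = 20, alpha = 0.10, 5000 replications.
+set.seed(2024)
+n <- 20; alpha <- 0.10; reps <- 5000
+lo_rank <- ceiling(n * alpha / 2)         # rank 1
+hi_rank <- ceiling(n * (1 - alpha / 2))   # rank 19
+
+covered <- replicate(reps, {
+  xs <- sort(rlnorm(n))                   # observed sample
+  x_new <- rlnorm(1)                      # future observation X_{n+1}
+  xs[lo_rank] <= x_new && x_new <= xs[hi_rank]
+})
+
+c(simulated = mean(covered), exact = (hi_rank - lo_rank) / (n + 1))
+```
+
+With ranks 1 and 19 the exact coverage is $18/21 \approx 0.857$, below the nominal 0.90: this is the finite-sample shortfall that vanishes only as $n \to \infty$.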
+
+---
+
+## Conditional Prediction Intervals via NNS Regression
+
+Conditional intervals derived from `NNS.reg` are developed in detail in Chapter 21, where the regression estimator is introduced formally and interpreted geometrically.
+
+For continuity, the key idea is that `NNS.reg(..., confidence.interval = ...)` builds local intervals from partition-specific empirical distributions. In this chapter, we focus on the unconditional interval mechanics (`LPM.VaR` / `UPM.VaR`) that those later conditional constructions rely on.
+
+```r
+library(NNS)
+
+set.seed(12345)
+x_train <- runif(1000, -2, 2)
+y_train <- sin(pi * x_train) + rnorm(1000, sd = 0.2)
+
+NNS.reg(x = x_train, y = y_train, order = NULL, confidence.interval = 0.95)
+```
+
+
+
+
+
+## Advantages of the Directional Approach
+
+Prediction intervals based on partial moments possess several advantages over classical methods.
+
+**Bias elimination.**
+As established in Chapter 15 (Section 15.1.8), the degree-one partial moment ratio places exactly 50% of probability mass below the mean for every distribution and every sample size. Interval bounds derived from degree-one quantiles do not inherit the systematic bias of the discrete empirical CDF.
+
+**Distribution-free construction.**
+No assumptions about normality or parametric form are required.
+
+**Robustness.**
+Intervals are determined by empirical probability mass rather than moment estimates that may be sensitive to extreme observations.
+
+**No double bootstrap required.**
+Asymmetry correction that classical methods achieve only with computationally expensive double-bootstrap or BCa procedures is obtained automatically from the continuous partial moment representation.
+
+**Conditional adaptivity.**
+Through the NNS regression framework, prediction intervals adapt to local data structure — capturing nonlinearity and heteroskedasticity without parametric specification.
+
+**Interpretability.**
+Intervals correspond directly to probability statements about future observations, rooted in directional partial-moment representations.
+
+A further advantage of the directional approach is that it unifies interval estimation and threshold-based decision analysis. The analyst need not switch theories when moving from prediction intervals to adverse-threshold selection. In both cases the task is to invert a directional probability representation. The only difference is whether mass is counted equally, as in degree zero, or weighted by severity, as in higher degrees.
+
+A second advantage is that threshold analysis can be separated from domain-specific naming conventions. Lower-tail quantiles, conditional lower-tail means, semivariance, and higher-order partial moments can all be interpreted inside a single benchmark-relative framework. The estimation-error literature supports this broader view by treating partial moments as nonparametric statistical objects with useful asymptotic behavior, rather than merely as specialized measures for one field.
+
+---
+
+## Summary
+
+Prediction intervals describe the range within which future observations are expected to occur.
+
+Classical prediction intervals rely on parametric assumptions and variance estimates. Bootstrap methods correct for asymmetry but require multiple resampling passes. The directional framework constructs intervals directly from the **empirical distribution represented by partial moments**, with an additional layer of bias correction available through the continuous (degree-one) probability representation established in Chapter 15.
+
+Key results:
+
+- The discrete CDF ($d=0$) is a degree-zero partial moment — unbiased asymptotically but systematically biased for any finite sample (Chapter 15, Section 15.1.8).
+- The continuous partial moment CDF ($d=1$) eliminates this bias exactly, with $F_1(\bar{x}; X) = 0.5$ without exception.
+- `LPM.VaR` and `UPM.VaR` invert these CDFs to produce distribution-free quantile intervals at either degree.
+- Degree-one intervals provide an asymmetry correction comparable to bias-corrected bootstrap intervals without the need for double resampling.
+- Conditional prediction intervals from `NNS.reg` adapt automatically to local nonlinearity and heteroskedasticity.
+
+These ideas complete the core framework for **nonparametric statistical inference using directional statistics**. The next chapter turns to **directional tail thresholds, probability bounds, and estimation error**, extending interval logic into benchmark-driven tail-risk decisions before Chapter 17 develops synthetic data generation and bootstrap methods for uncertainty quantification.
+
+---
diff --git a/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd b/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd
new file mode 100644
index 0000000..23abdff
--- /dev/null
+++ b/tools/NNS/book/chapter-18-recursive-mean-split-estimation.Rmd
@@ -0,0 +1,548 @@
+# Recursive Mean-Split Estimation
+
+Part V of the book turns from probability representation and inference to **nonparametric estimation**.
+The goal of estimation is to recover unknown functional relationships from data without imposing a predetermined parametric form.
+
+A central object in many statistical problems is the **conditional mean function**
+
+\[
+f(x) = E[Y \mid X = x].
+\]
+
+Classical regression models estimate this relationship by specifying a functional form — linear, polynomial, or otherwise — and fitting parameters to the data.
+
+Nonparametric methods instead estimate \(f(x)\) directly from observations.
+Partition-based estimators are among the most intuitive approaches: the predictor space is divided into regions, and the conditional mean is estimated locally within each region.
+
+This chapter introduces the **recursive mean-split estimator**, the partition-based nonparametric method at the core of the NNS framework.
+The estimator recursively divides the data into regions based on conditional mean structure, producing a flexible estimator that adapts automatically to nonlinear relationships.
+
+Two distinct splitting modes are available. In **joint partitioning**, regions are defined by splitting simultaneously on both the predictor \(X\) and the response \(Y\) at their joint means, producing four partial-moment quadrants at each level. In **\(X\)-only partitioning**, regions are defined by splitting solely on the predictor mean, producing two subregions at each level. Both modes share the same limit condition and recursive logic; they differ in how region boundaries are located and how many subregions each split creates.
+
+The procedure reflects the same directional logic used throughout the book: just as variance decomposes into directional components relative to a benchmark, recursive mean splitting partitions the joint distribution according to deviations around conditional means.
+
+A key theoretical result is established here: the recursive mean-split estimator belongs to the well-studied class of **data-adaptive partition estimators**, and consistency is inherited directly from that class under standard conditions on cell diameter and occupancy. This is not a limitation requiring further development — it is a strength. The estimator sits within a family whose convergence properties are fully characterized in the literature, and the NNS contribution is the specific splitting rule, the partial-moment geometric interpretation, and the multivariate architecture built on top of that foundation.
+
+---
+
+## Motivation for Partition-Based Estimation
+
+Suppose we observe independent pairs
+
+\[
+(X_1, Y_1), \dots, (X_n, Y_n),
+\]
+
+where
+
+\[
+Y_i = f(X_i) + \varepsilon_i,
+\]
+
+and
+
+\[
+E[\varepsilon_i \mid X_i] = 0.
+\]
+
+The objective is to estimate the unknown regression function
+
+\[
+f(x) = E[Y \mid X = x].
+\]
+
+### Parametric approaches
+
+Parametric regression assumes a functional form such as
+
+\[
+f(x) = \beta_0 + \beta_1 x.
+\]
+
+While simple and interpretable, this assumption can be severely restrictive when the relationship between variables is nonlinear.
+
+### Nonparametric alternatives
+
+Nonparametric estimators relax these assumptions. Common examples include
+
+- kernel regression,
+- local polynomial regression,
+- partition estimators.
+
+Kernel methods estimate \(f(x)\) through weighted averages of nearby observations.
+However, they require selecting a **bandwidth parameter** that determines the smoothing scale, and with any positive bandwidth the estimate at an observed point averages that point's response with the responses of its neighbors, so it cannot in general reproduce every observed response exactly.
+
+Partition estimators divide the predictor space into regions and compute averages within each region.
+The recursive mean-split method belongs to this class but differs from classical approaches through its **partial-moment splitting rule**, which anchors region boundaries directly to the geometry of the conditional mean. The estimator is consistent for the true conditional mean under the same regularity conditions that govern all partition estimators in this class, conditions whose sufficiency has been established in the statistical literature. As the order parameter increases, the number of regions grows, and the estimator reaches a perfect in-sample fit after a finite number of splits.
+
+---
+
+## Consistency by Class Membership
+
+The recursive mean-split estimator is an instance of the **data-adaptive partition estimator** class studied by Stone (1977), Lugosi and Nobel (1996), and Györfi, Kohler, Krzyżak, and Walk (2002).
+
+The general result for this class is as follows.
+
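+**Theorem (Consistency of Partition Estimators).** Let \((X_1, Y_1), \dots, (X_n, Y_n)\) be independent copies of \((X, Y)\), with \(E[Y^2] < \infty\). Let \(\hat{f}_n\) be the partition estimator
+
+\[
+\hat{f}_n(x) = \frac{1}{N_{n,x}} \sum_{i : X_i \in A_n(x)} Y_i,
+\]
+
+where \(A_n(x)\) is the cell of a data-adaptive partition \(\mathcal{P}_n\) containing \(x\) and \(N_{n,x}\) is the number of sample points in that cell. If, for \(X\) drawn independently of the sample, the partition satisfies
+
+\[
+\operatorname{diam}(A_n(X)) \xrightarrow{P} 0
+\qquad \text{and} \qquad
+N_{n,X} \xrightarrow{P} \infty
+\]
+
+as \(n \to \infty\), together with the regularity conditions these references impose on data-dependent partitions (in particular, a bound on the combinatorial complexity of the family of partitions the splitting rule can produce), then
+
+\[
+E\bigl[(\hat{f}_n(X) - f(X))^2\bigr] \to 0.
+\]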
+
+The recursive mean-split estimator satisfies both conditions under the standard occupancy and order growth assumptions used in practice. As \(n \to \infty\) with order parameter \(O = O(n)\) growing appropriately and minimum occupancy held fixed:
+
+- each cell's diameter converges to zero because successive splits are anchored to sample means, which concentrate around the true conditional mean as \(n\) grows, producing finer and finer partitions in regions where the function varies;
+- each cell's occupancy grows because the number of observations falling in any fixed region of positive probability grows linearly with \(n\).
+
+Consequently, **consistency of the recursive mean-split estimator is inherited directly from the established theory of partition estimators.** This is not an approximate or informal claim. The estimator belongs to a well-characterized class, and the convergence result applies.
+
+What the NNS framework contributes beyond this class membership is:
+
+1. a specific splitting rule grounded in the partial-moment geometry of the data,
+2. a finite-order perfect-fit property unavailable to kernel estimators,
+3. a multivariate architecture — developed fully in Chapter 21 — that operates on per-regressor regression points rather than on the raw observation cloud, substantially mitigating the curse of dimensionality.
+
+---
+
+## Conditional Mean Partitions
+
+Let \(P_n\) denote a partition of the data space into regions
+
+\[
+A_1, A_2, \dots, A_K.
+\]
+
+Within each region, the conditional mean is estimated by averaging the responses of observations whose predictors fall inside that region.
+
+For a predictor value \(x\), let
+
+\[
+A_n(x)
+\]
+
+be the cell containing \(x\). The partition estimator is
+
+\[
+\hat{f}_n(x)
+=
+\frac{1}{N_{n,x}}
+\sum_{i : X_i \in A_n(x)} Y_i,
+\]
+
+where
+
+\[
+N_{n,x} = \#\{ i : X_i \in A_n(x) \}
+\]
+
+is the number of observations in the cell.
+
+The estimate of \(f(x)\) is therefore the **average response within the region containing \(x\)**.
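+
+A minimal base-R sketch of the estimator for a fixed partition of the real line follows. The break points, the toy data and the helper name `partition_estimate()` are illustrative assumptions; `findInterval()` assigns each point to its cell \(A_n(x)\), and the estimate is the mean response over observations sharing that cell.
+
+```r
+# Partition estimator f_hat(x0) = mean of Y_i over observations in the cell of x0.
+partition_estimate <- function(x0, X, Y, breaks) {
+  cell <- function(v) findInterval(v, breaks, rightmost.closed = TRUE, all.inside = TRUE)
+  sapply(x0, function(x) {
+    in_cell <- cell(X) == cell(x)         # indicator of X_i in A_n(x)
+    if (!any(in_cell)) return(NA_real_)   # N_{n,x} = 0: no estimate
+    mean(Y[in_cell])                      # (1 / N_{n,x}) * sum of Y_i in the cell
+  })
+}
+
+X <- c(0.1, 0.2, 0.6, 0.7, 0.9)
+Y <- c(1, 3, 10, 20, 30)
+partition_estimate(c(0.3, 0.8), X, Y, breaks = c(0, 0.5, 1))  # 2 and 20
+```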
+
+This structure is flexible in three respects:
+
+- Regions adapt to nonlinear features of the conditional mean surface.
+- Local averages approximate the conditional expectation without a specified functional form.
+- Estimation is entirely data-driven.
+
+The **size and shape of regions determine the effective smoothing scale**: coarser regions produce smoother estimates; finer regions track local variation more closely.
+
+---
+
+## Recursive Splitting Algorithms
+
+The key design choice is how the regions \(A_k\) are constructed.
+
+The NNS recursive mean-split estimator builds partitions through an iterative procedure governed by an **order parameter** \(O\).
+At each order, each existing region is further subdivided.
+When left unspecified in the package interface (`order = NULL`), recursion depth is determined internally for each regressor according to its directional dependence with the response; Chapter 21 formalizes this dependence-driven adaptivity in the regression workflow.
+
+Two splitting modes are available, depending on whether regions are anchored to the joint distribution of \((X, Y)\) or to the marginal distribution of \(X\) alone.
+
+---
+
+## Joint Partitioning
+
+Joint partitioning defines regions by splitting simultaneously on both \(X\) and \(Y\) at their local means.
+
+### Initialization
+
+Begin with all \(n\) observations as a single region.
+
+### Splitting rule
+
+For a region \(R\) containing observations \(\{(X_i, Y_i)\}_{i \in I_R}\), compute the local means
+
+\[
+\bar{X}_R = \frac{1}{|I_R|} \sum_{i \in I_R} X_i,
+\qquad
+\bar{Y}_R = \frac{1}{|I_R|} \sum_{i \in I_R} Y_i.
+\]
+
+Partition the observations in \(R\) into four quadrants according to whether each observation lies above or below \(\bar{X}_R\) and \(\bar{Y}_R\):
+
+| Quadrant | Condition | Label |
+|----------|-----------|-------|
+| CoUPM | \(X_i > \bar{X}_R\) and \(Y_i > \bar{Y}_R\) | NE |
+| DUPM | \(X_i \le \bar{X}_R\) and \(Y_i > \bar{Y}_R\) | NW |
+| DLPM | \(X_i > \bar{X}_R\) and \(Y_i \le \bar{Y}_R\) | SE |
+| CoLPM | \(X_i \le \bar{X}_R\) and \(Y_i \le \bar{Y}_R\) | SW |
+
+These four quadrants correspond directly to the four co-partial moment regions introduced in Chapter 10.
+The joint mean \((\bar{X}_R, \bar{Y}_R)\) is the point at which
+
+\[
+U_1(\bar{X}_R; X \mid R) = L_1(\bar{X}_R; X \mid R)
+\quad \text{and} \quad
+U_1(\bar{Y}_R; Y \mid R) = L_1(\bar{Y}_R; Y \mid R),
+\]
+
+so the split occurs precisely where the first-order upper and lower partial moments are balanced on both dimensions. Each observation is assigned to one of the four quadrants, with a **quadrant identification number** recorded at each level.
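+
+One level of this splitting rule can be sketched in base R. The helper `joint_split()` is hypothetical (it is not `NNS.part`), and the cubic data mirror the `NNS.part` example in this section. The last line checks the balance property: the first-order lower and upper partial moments of \(X\) about \(\bar{X}_R\) coincide.
+
+```r
+# One joint split at the local means (hypothetical helper, not NNS.part).
+joint_split <- function(X, Y) {
+  mx <- mean(X); my <- mean(Y)
+  quadrant <- ifelse(X > mx & Y > my, "CoUPM",
+              ifelse(X <= mx & Y > my, "DUPM",
+              ifelse(X > mx & Y <= my, "DLPM", "CoLPM")))
+  # regression points: local means of X and Y within each nonempty quadrant
+  points <- aggregate(cbind(X, Y) ~ quadrant,
+                      data = data.frame(X, Y, quadrant), FUN = mean)
+  list(center = c(x = mx, y = my), quadrant = quadrant, points = points)
+}
+
+X <- seq(-5, 5, by = 0.05)
+Y <- X^3
+res <- joint_split(X, Y)
+table(res$quadrant)
+res$points
+
+# balance at the split: L_1(mean; X) equals U_1(mean; X)
+all.equal(mean(pmax(res$center[["x"]] - X, 0)), mean(pmax(X - res$center[["x"]], 0)))
+```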
+
+### Recursion
+
+Within each nonempty quadrant, repeat the same procedure: compute local means and split into four subquadrants.
+Continuing to order \(O\) produces at most \(4^{O-1}\) nonempty regions.
+
+### Regression points
+
+The **regression points** of the partition are the local means \((\bar{X}_R, \bar{Y}_R)\) within each region.
+These points summarize the conditional mean surface and serve as the basis for prediction and curve fitting.
+
+```r
+library(NNS)
+
+x <- seq(-5, 5, .05)
+y <- x ^ 3
+
+# Joint partitions of the cubic relationship at orders 1 through 4
+for (i in 1:4) {
+  NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)
+}
+```
+
+
+
+---
+
+## \(X\)-Only Partitioning
+
+\(X\)-only partitioning defines regions by splitting solely on the mean of the predictor \(X\), without reference to \(Y\).
+
+### Initialization
+
+Begin with all \(n\) observations as a single region.
+
+### Splitting rule
+
+For a region \(R\) with observations \(\{(X_i, Y_i)\}_{i \in I_R}\), compute the local predictor mean
+
+\[
+\bar{X}_R = \frac{1}{|I_R|} \sum_{i \in I_R} X_i.
+\]
+
+Partition the observations into two subregions:
+
+\[
+R_L = \{ i \in I_R : X_i \le \bar{X}_R \},
+\qquad
+R_U = \{ i \in I_R : X_i > \bar{X}_R \}.
+\]
+
+Quadrant identifications in this mode are limited to the symbols 1 (left of the split) and 2 (right of the split).
+
+### Recursion
+
+Within each nonempty subregion, repeat the same procedure.
+Continuing to order \(O\) produces at most \(2^O\) nonempty regions.
+
+### Regression points
+
+The regression points are the local means \((\bar{X}_R, \bar{Y}_R)\) within each predictor-defined region.
+Because the region boundaries are determined by \(X\) alone, the \(Y\) coordinate of each regression point averages **all** response values in the region, not only those on one side of a response split.
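+
+The recursion can be sketched in base R. `xonly_partition()` is a hypothetical helper, not the `NNS.part` implementation; `order` is the number of splitting levels, and `min_obs` stands in for the minimum-occupancy stopping rule, applied here by leaving any region with fewer than `min_obs` observations unsplit.
+
+```r
+# Recursive X-only mean splitting (hypothetical sketch, not NNS.part).
+xonly_partition <- function(X, Y, order, min_obs = 2) {
+  regions <- list(seq_along(X))               # one region holding every index
+  for (level in seq_len(order)) {
+    next_regions <- list()
+    for (idx in regions) {
+      mx <- mean(X[idx])
+      left  <- idx[X[idx] <= mx]              # R_L: at or below the local mean
+      right <- idx[X[idx] >  mx]              # R_U: above the local mean
+      if (length(idx) < min_obs || length(right) == 0) {
+        next_regions <- c(next_regions, list(idx))          # too small or no spread: keep
+      } else {
+        next_regions <- c(next_regions, list(left, right))  # split in two
+      }
+    }
+    regions <- next_regions
+  }
+  # regression points (mean X, mean Y) per region, ordered left to right
+  t(sapply(regions, function(idx) c(x = mean(X[idx]), y = mean(Y[idx]))))
+}
+
+X <- seq(-5, 5, by = 0.05)
+Y <- X^3
+rp <- xonly_partition(X, Y, order = 3)
+nrow(rp)   # 8 = 2^3 regions
+rp
+```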
+
+```r
+library(NNS)
+
+x <- seq(-5, 5, .05)
+y <- x ^ 3
+
+# X-only partitions of the cubic relationship at orders 1 through 4
+for (i in 1:4) {
+  NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE, obs.req = 0)
+}
+```
+
+
+
+
+
+---
+
+## Comparison of Modes
+
+Both modes share the same conceptual foundation — recursive splitting at local means — and converge to a perfect fit as \(O\) grows. They differ in the following respects.
+
+| Property | Joint partitioning | \(X\)-only partitioning |
+|---|---|---|
+| Split dimensions | \(X\) and \(Y\) jointly | \(X\) only |
+| Subregions per split | 4 | 2 |
+| Regions at order \(O\) | at most \(4^{O-1}\) | at most \(2^O\) |
+| Quadrant IDs | digits 1–4 | digits 1–2 |
+| Region definition | Joint (X,Y) partial-moment quadrants | Predictor-space intervals |
+| Regression points | Joint quadrant means | Predictor-interval means |
+
+Joint partitioning refines faster, because its maximum number of regions grows by a factor of four per level rather than two, and it provides a richer representation of the joint distribution.
+\(X\)-only partitioning is more directly analogous to classical predictor-space partition estimators and is often preferred when a clear functional relationship \(Y = f(X)\) is the primary object of interest.
+
+### Stopping criteria
+
+Splitting terminates when any of the following conditions are met:
+
+- A region contains fewer than a user-specified minimum number of observations (minimum occupancy threshold).
+- The order parameter \(O\) reaches a user-specified maximum.
+
+The minimum occupancy threshold is quantitative and directly governs the coarsest admissible partition. Under any fixed occupancy threshold, as \(n \to \infty\) the occupied regions shrink and the occupancy within each region grows, satisfying the two conditions required for consistency in the partition estimator class.
+
+---
+
+## Partial-Moment Interpretation of Joint Partitioning
+
+The joint splitting rule has a precise interpretation in terms of the partial-moment framework developed in earlier chapters.
+
+Recall from Chapter 5 that variance decomposes into directional components:
+
+\[
+\text{Var}(X) = U_2(\mu_X; X) + L_2(\mu_X; X),
+\]
+
+where
+
+\[
+U_2(\mu_X; X) = E[(X - \mu_X)_+^2]
+\quad \text{and} \quad
+L_2(\mu_X; X) = E[(\mu_X - X)_+^2].
+\]
+
+Recursive joint splitting applies this logic at the level of the **conditional distribution within each region**.
+
+Each split separates observations into four directional groups defined by their co-partial-moment quadrant membership — the same four regions introduced in Chapter 10:
+
+- **CoUPM**: above \(\bar{X}_R\) and above \(\bar{Y}_R\),
+- **CoLPM**: below \(\bar{X}_R\) and below \(\bar{Y}_R\),
+- **DLPM**: above \(\bar{X}_R\) and below \(\bar{Y}_R\),
+- **DUPM**: below \(\bar{X}_R\) and above \(\bar{Y}_R\).
+
+The joint mean \((\bar{X}_R, \bar{Y}_R)\) is the unique point at which the first-order partial moments balance on both axes. Splitting at this point therefore exactly decomposes the local joint variability into its four directional components.
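+
+One concrete reading of this decomposition uses degree-one moments. With both benchmarks at the joint mean, the four quadrant moments recombine exactly into the covariance: \(\text{Cov}(X, Y) = \text{CoUPM} + \text{CoLPM} - \text{DLPM} - \text{DUPM}\). The sketch below checks this identity in base R on simulated data. The degree, the divisor \(n\), and the helper names are choices made for illustration. They are not the NNS package's own functions.
+
+```{r}
+# Simulated nonlinear relationship (illustrative data only)
+set.seed(1)
+x_sim <- rnorm(200)
+y_sim <- x_sim^3 + rnorm(200)
+
+# Deviations from the joint mean, and the positive-part operator (.)_+
+dx <- x_sim - mean(x_sim)
+dy <- y_sim - mean(y_sim)
+pos <- function(v) pmax(v, 0)
+
+# Degree-one co- and divergent partial moments about the joint mean
+co_upm <- mean(pos(dx) * pos(dy))    # above both means
+co_lpm <- mean(pos(-dx) * pos(-dy))  # below both means
+d_lpm  <- mean(pos(dx) * pos(-dy))   # above mean of x, below mean of y
+d_upm  <- mean(pos(-dx) * pos(dy))   # below mean of x, above mean of y
+
+# Covariance with divisor n, computed two ways
+pop_cov <- mean(dx * dy)
+all.equal(co_upm + co_lpm - d_lpm - d_upm, pop_cov)   # TRUE
+all.equal(pop_cov, cov(x_sim, y_sim) * 199 / 200)     # TRUE
+```
+
+The identity holds observation by observation. Each product \((x_i - \bar x)(y_i - \bar y)\) falls into exactly one quadrant, and points lying on a mean contribute zero.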
+
+By recursively applying this decomposition, the algorithm parses the joint distribution into regions of progressively refined directional structure.
+Areas where the conditional mean changes most rapidly — where the CoUPM and CoLPM quadrants are most unequal in their response means — are split more frequently as \(O\) increases, because the local means shift and the quadrant boundaries realign at each level.
+
+The resulting partition structure therefore reflects the geometry of the conditional mean surface, not an externally imposed grid.
+
+---
+
+## Multivariate Predictors
+
+For a predictor vector \(X_i \in \mathbb{R}^d\), the recursive mean-split procedure extends through an architecture designed specifically to mitigate the curse of dimensionality. This architecture is described in full in Chapter 21; its foundations are laid here.
+
+### Per-regressor partitioning against the response
+
+The key structural decision in the multivariate case is that **each predictor is partitioned independently against the response**, rather than partitioning the joint predictor space.
+
+For predictor \(j \in \{1, \dots, d\}\), recursive mean splitting is applied to the pairs \((X_i^{(j)}, Y_i)\) under the same splitting rules described above. The result is a set of \(K_j\) regression points for that predictor: local conditional means summarizing the relationship between \(X^{(j)}\) and \(Y\).
+
+This is not joint partitioning of the full \(d\)-dimensional predictor space. It is a collection of univariate partitions, each anchored to the response. The important consequence is that the number of candidate regression points grows as \(\sum_j K_j\) — linearly in the number of regressors — rather than as \(\prod_j K_j\), which would grow exponentially and reproduce the curse.
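+
+The arithmetic behind this claim is easy to check. The sketch below assumes, purely for illustration, that every regressor's partition yields \(K_j = 8\) regression points. It then compares the linear count \(\sum_j K_j\) with the product \(\prod_j K_j\) that a joint grid would require.
+
+```{r}
+# Hypothetical setting: K_j = 8 regression points for every regressor
+K_per_regressor <- 8
+n_reg <- 1:10                                    # number of regressors d
+
+per_regressor_total <- K_per_regressor * n_reg   # sum of K_j: linear in d
+joint_grid_total    <- K_per_regressor^n_reg     # product of K_j: exponential in d
+
+cbind(n_reg, per_regressor_total, joint_grid_total)
+# At d = 10: 80 regression points versus 8^10 = 1073741824 joint cells
+```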
+
+### Regression point matrix
+
+The regression points from all \(d\) per-regressor partitions are assembled into a **regression point matrix (RPM)**. Each row of the RPM corresponds to one occupied joint region in the multivariate structure; the columns record the local mean of each predictor within that region, and a final column records the corresponding local mean response.
+
+For a new observation \(x^* \in \mathbb{R}^d\), prediction proceeds by identifying the rows of the RPM closest to \(x^*\) across the predictor columns, then returning the distance-weighted average of the corresponding local response means. This is a nearest-neighbor search over regression points — compressed, denoised local conditional means — rather than over the \(n\) raw observations.
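+
+The sketch below shows the mechanics of this search on a small, made-up RPM. Several choices are illustrative assumptions, not the NNS implementation: a fixed neighbor count \(k = 2\), Euclidean distance, and inverse-distance weights. The full architecture uses a dependence-adaptive neighbor count instead of a fixed \(k\).
+
+```{r}
+# Hypothetical RPM: columns x1, x2 hold local predictor means and y holds the
+# matching local response mean (values are made up for illustration)
+rpm_demo <- matrix(c(0.1, 0.2, 1.0,
+                     0.4, 0.9, 2.5,
+                     0.8, 0.3, 1.8,
+                     0.9, 0.8, 3.2),
+                   ncol = 3, byrow = TRUE,
+                   dimnames = list(NULL, c("x1", "x2", "y")))
+
+# Inverse-distance-weighted average over the k nearest RPM rows (Euclidean);
+# k and the weighting rule are assumptions, not NNS defaults
+predict_rpm <- function(x_star, rpm, k = 2) {
+  pred <- rpm[, setdiff(colnames(rpm), "y"), drop = FALSE]
+  dists <- sqrt(colSums((t(pred) - x_star)^2))  # distance to every row
+  nn <- order(dists)[seq_len(k)]                 # indices of k nearest rows
+  if (any(dists[nn] == 0)) {
+    return(mean(rpm[nn[dists[nn] == 0], "y"]))   # exact match: use it directly
+  }
+  w <- 1 / dists[nn]
+  sum(w * rpm[nn, "y"]) / sum(w)
+}
+
+predict_rpm(c(0.85, 0.5), rpm_demo)  # blends rows 3 and 4 (y = 1.8 and 3.2)
+```
+
+The search runs over four regression points rather than over every raw observation. This is the compression that the two points below describe.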
+
+The curse of dimensionality is mitigated along two dimensions simultaneously:
+
+1. **Search space compression.** The RPM has far fewer rows than \(n\). Each row is a local conditional mean derived from a cluster of observations, not a raw data point.
+2. **Noise reduction before search.** Each regression point has already been smoothed through local averaging, so the candidates in the nearest-neighbor search carry substantially less noise than raw observations.
+
+The full multivariate prediction architecture — including dependence-adaptive neighbor count and alternative synthetic-predictor dimension reduction — is developed in Chapter 21.
+
+---
+
+## Estimation Workflow
+
+In practice, recursive mean-split estimation follows a straightforward workflow.
+
+### Step 1 — Data preparation
+
+Collect paired observations
+
+\[
+(X_i, Y_i), \quad i = 1, \dots, n,
+\]
+
+with \(X_i \in \mathbb{R}^d\) for the multivariate case.
+
+### Step 2 — Select partitioning mode
+
+Choose joint partitioning (using both \(X\) and \(Y\) means) or \(X\)-only partitioning (using only the predictor mean), depending on the application.
+
+### Step 3 — Recursive mean splitting
+
+Iteratively apply the mean-split rule:
+
+1. Compute the local conditional means within the region.
+2. Divide observations according to their quadrant membership.
+3. Assign quadrant identifications and record regression points.
+4. Repeat within each subregion.
+
+### Step 4 — Stopping rule
+
+Terminate splitting when any region falls below the minimum occupancy threshold or when the order \(O\) reaches its maximum.
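+
+Steps 3 and 4 can be sketched in base R for the \(X\)-only mode. This is a minimal illustration, not the NNS implementation. Each cell is split at the mean of its predictor values. A cell stops splitting when it holds fewer than `min_obs` observations, when its predictor values are all equal, or when `max_order` is reached. The function and argument names are hypothetical.
+
+```{r}
+# Hypothetical sketch of X-only recursive mean splitting (Steps 3 and 4).
+# Returns one row per terminal cell: its observed x-range and regression point.
+mean_split_x <- function(x, y, max_order, min_obs = 2, order = 0) {
+  stop_here <- order >= max_order ||   # order limit reached
+    length(x) < min_obs ||              # minimum occupancy not met
+    length(unique(x)) == 1              # nothing left to separate
+  if (stop_here) {
+    return(data.frame(x_lo = min(x), x_hi = max(x),
+                      rp_x = mean(x), rp_y = mean(y), n_obs = length(x)))
+  }
+  left <- x <= mean(x)                  # split at the local predictor mean
+  rbind(mean_split_x(x[left], y[left], max_order, min_obs, order + 1),
+        mean_split_x(x[!left], y[!left], max_order, min_obs, order + 1))
+}
+
+x_toy <- c(1, 2, 3, 6, 7, 8, 10, 15)
+y_toy <- c(2, 3, 3, 8, 9, 9, 7, 12)
+
+# In-sample R^2 by order; cells come back sorted by x, so findInterval()
+# maps each observation to its terminal cell
+r2_by_order <- sapply(1:5, function(o) {
+  cells <- mean_split_x(x_toy, y_toy, max_order = o)
+  fitted <- cells$rp_y[findInterval(x_toy, cells$x_lo)]
+  1 - sum((y_toy - fitted)^2) / sum((y_toy - mean(y_toy))^2)
+})
+round(r2_by_order, 3)
+```
+
+Each higher order refines the cells of the previous one, so the in-sample \(R^2\) cannot decrease as the order grows. In this toy sample every cell holds a single observation by order 4, at which point \(R^2 = 1\).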
+
+### Step 5 — Prediction
+
+For a new predictor value \(x\):
+
+1. Identify the region \(A_n(x)\) containing \(x\) (univariate) or the matching rows in the RPM (multivariate).
+2. Return the local response mean (univariate case) or the distance-weighted average of the nearest regression-point means (multivariate case). In the univariate case, the predicted value is:
+
+\[
+\hat{f}_n(x)
+=
+\frac{1}{N_{n,x}}
+\sum_{i : X_i \in A_n(x)} Y_i.
+\]
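+
+In the univariate case, the prediction rule is a cell lookup followed by an average. Below is a minimal base-R sketch using a made-up sample whose only split is at the predictor mean \(4.5\). The helper names are hypothetical.
+
+```{r}
+# Raw sample and the split point of its terminal cells (illustrative)
+X_obs <- c(1, 2, 3, 6, 7, 8)
+Y_obs <- c(2, 3, 3, 8, 9, 9)
+split_points <- 4.5   # cells are (-Inf, 4.5] and (4.5, Inf)
+
+# Cell index for any value; left.open = TRUE puts x == 4.5 in the left cell
+cell_of <- function(v) findInterval(v, split_points, left.open = TRUE) + 1
+
+# f_hat(x): mean of the Y_i whose X_i lie in the same cell as x
+f_hat <- function(x) {
+  sapply(x, function(x0) mean(Y_obs[cell_of(X_obs) == cell_of(x0)]))
+}
+
+f_hat(c(0, 4.5, 5, 20))   # 8/3 8/3 26/3 26/3
+```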
+
+### Step 6 — Curve fitting (optional)
+
+The regression points at each order can be connected by **linear segments** to produce a piecewise-linear curve. This provides a continuous interpolating surface between partition means, supports well-defined interpolation and extrapolation, and reduces variance relative to a pure step-function estimator.
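+
+Below is a minimal base-R sketch of this step, assuming made-up regression points. `approx()` interpolates linearly between the points. Outside their range, the first and last segments are extended linearly; that extrapolation rule is an assumption of the sketch.
+
+```{r}
+# Hypothetical regression points (local means) at some order
+rp_x <- c(1, 3, 5)
+rp_y <- c(2, 4, 10)
+
+pw_linear <- function(x_new, rx, ry) {
+  o <- order(rx); rx <- rx[o]; ry <- ry[o]
+  y_hat <- approx(rx, ry, xout = x_new, rule = 2)$y  # linear inside the range
+  k <- length(rx)
+  below <- x_new < rx[1]
+  above <- x_new > rx[k]
+  # Extend the end segments linearly beyond the range (assumed rule)
+  slope_lo <- (ry[2] - ry[1]) / (rx[2] - rx[1])
+  slope_hi <- (ry[k] - ry[k - 1]) / (rx[k] - rx[k - 1])
+  y_hat[below] <- ry[1] + slope_lo * (x_new[below] - rx[1])
+  y_hat[above] <- ry[k] + slope_hi * (x_new[above] - rx[k])
+  y_hat
+}
+
+pw_linear(c(0, 4, 7), rp_x, rp_y)   # 1 7 16
+```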
+
+---
+
+## Limit Condition and Perfect Fit
+
+A notable property of the recursive mean-split estimator is its **finite-order limit condition**.
+
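+As the order \(O\) increases, the number of regions grows geometrically in both modes: at most \(4^{O-1}\) in the joint case and at most \(2^O\) in the \(X\)-only case. The minimum occupancy threshold must allow singleton regions, and in the \(X\)-only case the predictor values must be distinct. Under those conditions there is a finite order \(O^*\) at which every observation occupies its own region and becomes its own regression point. At this limit: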
+
+\[
+\hat{f}_n(X_i) = Y_i \quad \text{for all } i = 1, \dots, n,
+\]
+
+and the \(R^2\) of the in-sample fit equals 1.
+
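+This property distinguishes NNS from kernel regression that uses a fixed bandwidth and an unbounded-support kernel such as the Gaussian. Such a kernel gives positive weight to every observation, so the fitted value at \(X_i\) is pulled toward its neighbors and does not, in general, equal \(Y_i\). The NNS limit condition is reached in a finite number of steps, not asymptotically.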
+
+In practice, \(O\) is selected to balance fit quality against overfitting. Larger \(O\) reduces bias but increases variance; the appropriate order is determined by the signal-to-noise ratio of the data. The NNS dependence measure provides an objective criterion for this selection.
+
+---
+
+## Properties of the Recursive Mean-Split Estimator
+
+The recursive mean-split estimator possesses several important characteristics.
+
+### Consistency by class inheritance
+
+The estimator belongs to the class of data-adaptive partition estimators studied by Stone (1977) and extended by Lugosi and Nobel (1996) and Györfi et al. (2002). Under standard conditions — shrinking cell diameter and growing cell occupancy — the class is universally consistent. The recursive mean-split estimator satisfies these conditions under the occupancy and order growth rules used in practice, and consistency is therefore established by class membership rather than requiring a separate proof from first principles.
+
+### Nonparametric flexibility
+
+No functional form is imposed on the relationship between \(X\) and \(Y\).
+The estimator represents highly nonlinear relationships without specifying a parametric family.
+
+### Grounding in partial moments
+
+Region boundaries are defined by the same partial-moment structure — upper and lower deviations relative to a benchmark — that underlies the directional statistics developed throughout this book.
+The estimator is therefore not simply an ad hoc tree method but an application of the NNS partial-moment framework to the estimation of conditional expectations.
+
+### Data-adaptive partitions
+
+Regions are determined entirely by the data.
+Areas where the conditional mean changes rapidly receive finer partitions as \(O\) increases.
+
+### Implicit smoothing
+
+The order \(O\) governs the effective smoothing scale.
+Chapter 19 formalizes this by interpreting the partition cell diameter as a **dynamic bandwidth**, connecting the estimator to classical nonparametric smoothing theory.
+
+### Piecewise-linear interpolation
+
+Connecting regression points with line segments produces an interpolating surface that is stable, interpretable, and well-behaved for interpolation and extrapolation — properties that kernel-based nonparametric regressions do not generally share.
+
+### Finite limit condition
+
+At a finite order \(O^*\), every observation occupies its own region and the estimator achieves a perfect in-sample fit: \(\hat{f}_n(X_i) = Y_i\) for all \(i\). Kernel smoothers with unbounded-support kernels at a fixed bandwidth, and least-squares polynomials of fixed degree below \(n - 1\), do not reach this endpoint; they remain approximations. The order therefore indexes a principled spectrum of fits, from a single linear approximation at \(O = 1\) to exact interpolation at \(O = O^*\).
+
+---
+
+## Relationship to Other Methods
+
+The recursive mean-split estimator is related to several classical approaches but differs from each in important respects.
+
+### CART and decision trees
+
+CART algorithms select splits greedily to minimize impurity or squared prediction error, then apply pruning penalties to prevent overfitting.
+
+Recursive mean splitting differs in two key respects. First, the splitting criterion is the **conditional mean** of the data within the region, not an optimized impurity measure. Second, overfitting is controlled through the minimum occupancy threshold and the order parameter rather than through post-hoc pruning. The resulting partition structure follows the conditional mean geometry rather than an impurity landscape (Breiman et al., 1984).
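+
+The difference in split criteria is easy to see on a toy step function. The sketch below uses base R and made-up data. It compares a single CART-style split (the midpoint minimizing within-child squared error) with the \(X\)-only mean split at the same depth. It illustrates only the first split; at higher orders the mean-split partition would eventually isolate the step as well.
+
+```{r}
+# Hypothetical data: the conditional mean jumps between x = 7 and x = 8
+x_step <- 1:10
+y_step <- c(1, 1, 1, 1, 1, 1, 1, 5, 5, 5)
+
+sse <- function(v) sum((v - mean(v))^2)   # within-cell squared error
+
+# CART-style split: choose the midpoint that minimizes total child SSE
+ux <- sort(unique(x_step))
+candidates <- head(ux, -1) + diff(ux) / 2
+child_sse <- sapply(candidates, function(s) {
+  sse(y_step[x_step <= s]) + sse(y_step[x_step > s])
+})
+cart_split <- candidates[which.min(child_sse)]   # 7.5, child SSE = 0
+
+# X-only mean split: the boundary is the predictor mean, whatever y does
+mean_split <- mean(x_step)                        # 5.5
+mean_split_sse <- sse(y_step[x_step <= mean_split]) +
+  sse(y_step[x_step > mean_split])                # 19.2
+```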
+
+### Kernel regression
+
+Kernel estimators smooth data using a bandwidth parameter selected externally, often by cross-validation.
+
+Recursive mean-split estimation avoids explicit bandwidth selection: smoothing arises implicitly through the order parameter and the partition cell size. Chapter 19 makes this connection precise by showing that the cell diameter functions as a data-adaptive bandwidth.
+
+A further distinction concerns exactness of fit. A kernel with unbounded support, such as the Gaussian, assigns positive weight to every observation at every evaluation point. For any fixed bandwidth \(h > 0\), the kernel estimate at an observed point \(X_i\) therefore averages in its neighbors' responses and generally differs from \(Y_i\). Compact-support kernels reproduce the data only when \(h\) falls below the spacing between neighboring observations, where the estimate between observations breaks down. The recursive mean-split estimator does not share this limitation: at finite order \(O^*\), it achieves an exact fit at every observed point simultaneously.
+
+### \(k\)-means clustering
+
+The within-region sum of squares about each region's mean, summarized by the NNS regression points, is the \(k\)-means objective evaluated on the NNS partition when the number of clusters equals the number of partition regions. NNS differs in two ways. First, it produces its regions by recursive mean splits rather than by iterative minimization. Second, it does not require a pre-specified \(k\): the number of regions is determined by the order parameter \(O\) and the data, not by an externally supplied cluster count.
+
+### Fixed-grid estimators
+
+Uniform grids divide the predictor space into equally sized cells.
+Mean-split partitions are data-driven and adapt to the density and conditional mean geometry of the sample.
+
+---
+
+## Summary
+
+Recursive mean-split estimation provides a flexible, data-adaptive, nonparametric method for estimating conditional expectations.
+
+The key ideas developed in this chapter are:
+
+- **Consistency by class membership.** The recursive mean-split estimator belongs to the class of data-adaptive partition estimators, for which universal consistency under shrinking-diameter and growing-occupancy conditions has been established in the literature (Stone 1977; Lugosi and Nobel 1996; Györfi et al. 2002). The estimator satisfies both conditions under standard stopping rules, and its consistency is therefore inherited directly from this class.
+- Partition-based estimation of conditional means, using the local sample average within each region as the estimator.
+- Two splitting modes: joint \((X, Y)\) partitioning, which creates four partial-moment quadrants at each split, and \(X\)-only partitioning, which creates two predictor-interval subregions.
+- The joint splitting rule is grounded in the co-partial-moment structure of the NNS framework: region boundaries coincide with the joint conditional means, and the four quadrants correspond exactly to the CoUPM, CoLPM, DLPM, and DUPM regions.
+- An order parameter \(O\) governs partition depth; at a finite \(O^*\), the estimator achieves a perfect in-sample fit.
+- Multivariate predictors are handled via per-regressor partitioning against the response, which produces a regression point matrix (RPM) of local conditional means. The search space for prediction grows linearly in the number of regressors, not exponentially, substantially mitigating the curse of dimensionality.
+- Prediction connects regression points with local averages or piecewise-linear segments in the univariate case, and with distance-weighted nearest-neighbor averaging over the RPM in the multivariate case.
+
+Chapter 19 interprets the partition cell diameter as a **dynamic bandwidth**, linking the estimator to classical kernel smoothing theory while preserving its data-adaptive character.
+
+Chapter 21 develops the multivariate regression architecture in full, showing how per-regressor partitioning with response-anchored centroids — combined with dependence-adaptive neighbor count — produces a nearest-neighbor prediction method over a compressed, denoised geometry that handles high-dimensional predictors substantially better than raw-observation kNN.
+
+---
+
+## References
+
+- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620.
+
+- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706.
+
+- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer.
+
+- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). *Classification and Regression Trees*. Wadsworth.
+
+- Vinod, H. D., & Viole, F. (2018). Clustering and curve fitting by line segments. *Preprints*, 2018010090. https://doi.org/10.20944/preprints201801.0090.v1
+
+- Viole, F., & Nawrocki, D. (2012). Deriving nonlinear correlation coefficients from partial moments. *SSRN eLibrary*. https://doi.org/10.2139/ssrn.2148522
+
+- Viole, F. (2016). NNS: Nonlinear nonparametric statistics. R package. https://cran.r-project.org/package=NNS
diff --git a/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd b/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd
new file mode 100644
index 0000000..bd06939
--- /dev/null
+++ b/tools/NNS/book/chapter-19-dynamic-bandwidth-interpretation.Rmd
@@ -0,0 +1,588 @@
+# Dynamic Bandwidth Interpretation
+
+Chapter 18 established the recursive mean-split estimator as a **partition-based nonparametric regression method** and showed that it is consistent for the true conditional mean function — not as a novel result requiring special proof, but as a direct consequence of belonging to the well-characterized class of data-adaptive partition estimators studied by Stone (1977), Lugosi and Nobel (1996), and Györfi et al. (2002). The shrinking-diameter and growing-occupancy conditions satisfied by recursive mean splitting are precisely the conditions that class-level consistency theorems require.
+
+A further interpretation now comes into focus:
+
+**the partition cell itself plays the role of a bandwidth.**
+
+In classical nonparametric estimation, smoothing is controlled by a user-chosen parameter.
+In kernel regression this parameter is the bandwidth \(h\).
+In histogram methods it is the bin width.
+In splines it is the smoothing penalty.
+
+These quantities determine the scale over which observations are averaged.
+
+The recursive mean-split estimator also averages observations locally. But it does so without requiring the analyst to choose an external smoothing scale in the same way. Instead, the smoothing scale is induced by the partition:
+
+- large cells produce coarse smoothing,
+- small cells produce fine smoothing.
+
+This chapter interprets the partition diameter as a **dynamic, stochastic, location-dependent bandwidth**, and uses that interpretation to connect recursive mean-split estimation to the broader theory of nonparametric smoothing. The consistency conditions already established in Chapter 18 are then re-read through this bandwidth lens — showing that they are the direct analogues of the shrinking-bandwidth conditions in classical kernel theory.
+
+---
+
+## Bandwidth in Classical Nonparametric Estimation
+
+The notion of bandwidth is central to classical nonparametric methods.
+
+In kernel regression, the Nadaraya–Watson estimator takes the form
+
+\[
+\hat f_h(x)
+=
+\frac{\sum_{i=1}^n K\!\left(\frac{x-X_i}{h}\right)Y_i}
+{\sum_{i=1}^n K\!\left(\frac{x-X_i}{h}\right)}.
+\]
+
+Here:
+
+- \(K(\cdot)\) is a kernel function,
+- \(h>0\) is the bandwidth.
+
+The bandwidth determines how quickly weights decay as observations move away from \(x\).
+
+If \(h\) is small, only very nearby observations influence the estimate.
+If \(h\) is large, distant observations receive substantial weight.
+
+Thus bandwidth governs the bias–variance tradeoff:
+
+- **small bandwidth** \(\rightarrow\) low bias, high variance,
+- **large bandwidth** \(\rightarrow\) high bias, low variance.
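+
+The role of \(h\) can be seen by coding the Nadaraya–Watson formula directly. The sketch below is a base-R illustration using a Gaussian kernel (`dnorm`) on a made-up sample. The kernel choice and data are assumptions for the example.
+
+```{r}
+# Nadaraya-Watson estimate at x0 with a Gaussian kernel and bandwidth h
+nw_estimate <- function(x0, X, Y, h) {
+  w <- dnorm((x0 - X) / h)   # kernel weights; positive in exact arithmetic
+  sum(w * Y) / sum(w)
+}
+
+X_nw <- c(1, 2, 3, 6, 7, 8)
+Y_nw <- c(2, 3, 3, 8, 9, 9)
+
+nw_estimate(3, X_nw, Y_nw, h = 0.3)  # about 3: nearly all weight is local
+nw_estimate(3, X_nw, Y_nw, h = 50)   # close to mean(Y_nw): almost a global average
+nw_estimate(1, X_nw, Y_nw, h = 0.3)  # slightly above Y = 2 at the observed point
+```
+
+The last call shows the Gaussian kernel pulling the fit at an observed point toward a neighbor's response, even at a small bandwidth.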
+
+The same logic appears in histogram regression and fixed-grid estimators.
+There the analogue of \(h\) is the cell width.
+
+Although classical methods differ in form, they share a common structure:
+
+**they smooth by averaging observations over neighborhoods whose size is controlled by a tuning parameter.**
+
+This parameter is usually chosen by cross-validation, plug-in rules, or asymptotic heuristics.
+
+That choice is often difficult, unstable, and highly consequential.
+
+---
+
+## Local Averaging in the Recursive Mean-Split Estimator
+
+Recall from Chapter 18 that the recursive mean-split estimator has the form
+
+\[
+\hat f_n(x)
+=
+\frac{1}{N_{n,x}}
+\sum_{i:X_i\in A_n(x)} Y_i,
+\]
+
+where
+
+- \(A_n(x)\) is the terminal cell containing \(x\),
+- \(N_{n,x}\) is the number of observations in that cell.
+
+This is a **local average**. Its main difference from a kernel or histogram estimator is how the local neighborhood is defined; within the neighborhood, every observation receives equal weight.
+
+In kernel regression, the neighborhood is determined by a distance-weighting rule around \(x\).
+In recursive mean-split estimation, the neighborhood is the partition cell \(A_n(x)\).
+
+So the estimator already has the essential structure of a smoother:
+
+1. identify a local region around \(x\),
+2. average the responses inside that region.
+
+The smoothing scale is therefore determined by the **size of the cell containing \(x\)**.
+
+This suggests the natural bandwidth analogue
+
+\[
+h_n(x) := \operatorname{diam}(A_n(x)),
+\]
+
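+where \(\operatorname{diam}(A_n(x)) = \sup_{u, v \in A_n(x)} \lVert u - v \rVert\) is the diameter of the cell, the largest distance between two of its points. For the outermost cells, the cell is intersected with the support of \(X\).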
+
+This quantity depends on
+
+- the sample,
+- the recursive splitting path,
+- the predictor location \(x\),
+- the stopping rule.
+
+It is therefore **data-adaptive and location-specific**.
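+
+The sketch below makes the location dependence visible. It applies two levels of \(X\)-only mean splitting to a simulated right-skewed sample, then reports, for a given \(x\), the observed diameter of the cell containing it. Measuring diameter as the range of the member observations is an assumption that sidesteps the unbounded outer cells. The data and helper names are illustrative.
+
+```{r}
+# Right-skewed predictor (illustrative simulated data)
+set.seed(42)
+X_skew <- rexp(500)
+
+# Two levels of X-only mean splitting give four cells
+top_split <- mean(X_skew)
+split_pts <- c(mean(X_skew[X_skew <= top_split]), top_split,
+               mean(X_skew[X_skew > top_split]))
+
+# Cell index for any value; a value equal to a split point joins the left cell
+cell_index <- function(v) findInterval(v, split_pts, left.open = TRUE) + 1
+
+# Observed diameter (range of member observations) of each cell
+h_cell <- tapply(X_skew, cell_index(X_skew), function(v) diff(range(v)))
+
+# Induced bandwidth at a location: the diameter of the cell containing it
+h_at <- function(x) unname(h_cell[cell_index(x)])
+
+h_at(c(0.05, 3))   # narrow cell where data are dense, wide cell in the sparse tail
+```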
+
+---
+
+## Partition Diameter as Stochastic Bandwidth
+
+The interpretation
+
+\[
+h_n(x) := \operatorname{diam}(A_n(x))
+\]
+
+makes the connection precise.
+
+The recursive mean-split estimator can be viewed as a **local-constant smoother with random bandwidth \(h_n(x)\)**.
+
+This bandwidth differs from the classical kernel bandwidth in three important ways.
+
+### It is stochastic
+
+The cells are determined by the observed sample, so \(h_n(x)\) is random.
+
+### It is location-dependent
+
+Different regions of the predictor space can have different cell sizes:
+
+\[
+h_n(x_1) \neq h_n(x_2)
+\]
+
+in general.
+
+### It is endogenous
+
+The bandwidth is not imposed externally.
+It emerges from the recursive mean-split geometry itself.
+
+This is the key conceptual shift.
+
+Classical bandwidth methods ask:
+
+**What smoothing scale should we choose?**
+
+The recursive mean-split framework instead asks:
+
+**What smoothing scale does the data imply through repeated mean-based partitioning?**
+
+The answer is encoded in the partition diameter.
+
+---
+
+## Adaptive Smoothing Mechanisms
+
+Why does the induced bandwidth adapt to data structure?
+
+Because the recursive mean-split rule repeatedly partitions the data around local means.
+
+In regions where the conditional mean function changes rapidly, successive splits generate finer cells.
+In regions where the conditional mean is comparatively flat, fewer effective refinements are needed and cells remain larger.
+
+Thus the smoothing scale contracts more aggressively where the signal is more complex.
+
+This adaptivity is the geometric content of the estimator.
+
+To see the intuition, consider three stylized regions of a regression surface.
+
+### Flat region
+
+Suppose \(f(x)\) is nearly constant on some interval.
+A large cell still produces a good approximation because averaging over that region introduces little bias.
+
+### Moderately curved region
+
+If \(f(x)\) changes gradually, recursive splitting creates smaller cells so that local averages track the curvature more closely.
+
+### Sharp structural change
+
+If \(f(x)\) changes abruptly, the partition must refine more aggressively to avoid pooling observations across substantively different conditional means.
+
+So the induced bandwidth is not globally fixed.
+It contracts where localization is most needed.
+
+This is exactly what one wants from a nonparametric smoother.
+
+---
+
+## Consistency Conditions Re-Read Through the Bandwidth Lens
+
+Chapter 18 established that consistency of the partition estimator class requires two conditions:
+
+\[
+\operatorname{diam}(A_n(x)) \to 0
+\qquad\text{and}\qquad
+N_{n,x}\to\infty.
+\]
+
+Under the bandwidth interpretation, these become directly analogous to the standard kernel consistency conditions
+
+\[
+h_n(x)\to 0
+\qquad\text{and}\qquad
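+n_{\text{eff}}(x)\to\infty,
+\]
+
+where \(n_{\text{eff}}(x)\) is the effective number of observations contributing to the local average at \(x\). For a kernel smoother in \(d\) dimensions, it is of order \(n h_n^d\).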
+
+The first condition says the local neighborhood must shrink, so that the estimator becomes localized and bias vanishes.
+
+The second says the number of observations used in the local average must grow, so that variance vanishes.
+
+Thus the recursive mean-split estimator follows the same asymptotic logic as classical nonparametric smoothing, with the neighborhood size determined by partition geometry rather than by analyst choice. The two frameworks are not merely analogous. Partition and kernel estimators are both instances of Stone's (1977) general consistency theorem for local averaging estimators, stated once in partition language and once in kernel language.
+
+In this sense, Chapter 18 can be re-read entirely through the bandwidth lens:
+
+- shrinking cell diameter = shrinking bandwidth,
+- growing occupancy = growing effective local sample size.
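+
+Both conditions can be watched directly in a small simulation. The sketch below illustrates the conditions under stated assumptions; it is not a proof. The assumptions are uniform predictors, \(X\)-only mean splitting with no order cap, and an occupancy threshold of \(\lceil \sqrt{n}\,\rceil\), chosen so that it grows with \(n\) but more slowly than \(n\). Cell diameter is measured as the range of the observations in each cell.
+
+```{r}
+# Terminal cells of X-only mean splitting; a cell with fewer than min_obs
+# observations (or a single distinct value) is not split further
+cells_x <- function(x, min_obs) {
+  if (length(x) < min_obs || length(unique(x)) == 1) {
+    return(data.frame(diam = diff(range(x)), n_obs = length(x)))
+  }
+  left <- x <= mean(x)
+  rbind(cells_x(x[left], min_obs), cells_x(x[!left], min_obs))
+}
+
+set.seed(7)
+res_consistency <- t(sapply(c(100, 1000, 10000), function(n) {
+  cells <- cells_x(runif(n), min_obs = ceiling(sqrt(n)))
+  c(n = n, mean_diam = mean(cells$diam), mean_occupancy = mean(cells$n_obs))
+}))
+res_consistency
+```
+
+Across these three sample sizes, the average cell diameter (the bandwidth analogue) falls while the average occupancy (the effective local sample size) rises. This is the pattern the two conditions describe.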
+
+The bias–variance decomposition becomes
+
+\[
+\hat f_n(x)-f(x)
+=
+\bigl(\hat f_n(x)-\bar f_n(x)\bigr)
++
+\bigl(\bar f_n(x)-f(x)\bigr),
+\]
+
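+where \(\bar f_n(x) = N_{n,x}^{-1} \sum_{i:X_i\in A_n(x)} f(X_i)\) is the cell average of the true regression function. The first term is an average of \(N_{n,x}\) noise terms \(Y_i - f(X_i)\), so its variance is controlled by occupancy. The second is a bias term controlled by the dynamic bandwidth \(h_n(x)\), because a continuous \(f\) varies little across a small cell.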
+
+---
+
+## Local Averaging Interpretation
+
+The bandwidth interpretation also clarifies what the estimator is doing pointwise.
+
+For any \(x\), the recursive mean-split estimator averages over the region \(A_n(x)\):
+
+\[
+\hat f_n(x)
+=
+E_n[Y\mid X\in A_n(x)],
+\]
+
+where \(E_n\) denotes the empirical average.
+
+Thus \(\hat f_n(x)\) is the empirical conditional mean over a neighborhood whose diameter is \(h_n(x)\).
+
+This is exactly analogous to a local averaging estimator with adaptive window width.
+
+The distinction is that the "window" need not be symmetric, fixed-width, or Euclidean in the simplistic kernel sense.
+It is determined by recursive partition membership.
+
+This yields two useful perspectives.
+
+### Piecewise-constant view
+
+The estimator is constant within each terminal cell.
+So the partition defines a locally constant regression surface.
+
+### Piecewise-linear view
+
+As Chapter 18 emphasized, connecting regression points by line segments yields a piecewise-linear representation.
+This can be interpreted as smoothing the cellwise local averages into a continuous interpolation while preserving the same adaptive partition geometry.
+
+In either case, the underlying localization scale is still determined by the cell diameter.
+
+---
+
+## Comparison with Kernel Regression
+
+Kernel regression and recursive mean-split estimation share an essential principle:
+
+**both estimate \(f(x)\) by averaging responses from a local neighborhood around \(x\).**
+
+But they differ in how that neighborhood is defined.
+
+### Kernel regression
+
+Neighborhood influence is determined by weights
+
+\[
+K\!\left(\frac{x-X_i}{h}\right),
+\]
+
+with a user-specified bandwidth \(h\).
+
+### Recursive mean-split estimation
+
+Neighborhood influence is determined by membership in the terminal cell \(A_n(x)\), whose effective width is
+
+\[
+h_n(x)=\operatorname{diam}(A_n(x)).
+\]
+
+The contrast is important.
+
+#### External versus endogenous smoothing
+
+Kernel regression requires an externally chosen bandwidth.
+
+Recursive mean splitting generates its bandwidth from the data.
+
+#### Global versus local scale
+
+Kernel bandwidth is often global, unless one explicitly uses variable-bandwidth methods.
+
+Recursive mean splitting is inherently variable-bandwidth because cell sizes differ across locations.
+
+#### Weight decay versus region membership
+
+Kernel methods use continuously decaying weights.
+Partition methods use discrete inclusion within a cell.
+
+#### Exact fit limit
+
+Kernel estimators that use an unbounded-support kernel with a fixed bandwidth \(h > 0\) give positive weight to every observation, so they do not, in general, interpolate all observed points simultaneously.
+
+Recursive mean-split estimation reaches a finite order \(O^*\) at which every point forms its own region and
+
+\[
+\hat f_n(X_i)=Y_i
+\]
+
+for all observed \(i\).
+
+Thus the recursive mean-split estimator spans a spectrum from coarse smoothing to exact interpolation through the order parameter and occupancy rule.
+
+---
+
+## Comparison with Fixed-Grid Estimators
+
+Fixed-grid partition methods divide the predictor space into prespecified intervals or cells.
+
+These methods have a bandwidth analogue as well: the grid width.
+
+But they suffer from a major limitation:
+
+**the grid is chosen before seeing the geometry of the data.**
+
+As a result:
+
+- dense regions may be oversmoothed,
+- sparse regions may be undersmoothed,
+- boundaries may cut across important nonlinear structure.
+
+Recursive mean-split estimation avoids this problem by generating the partition from the observed sample.
+
+So while both methods can be written as local averages over cells, only the recursive mean-split approach makes the bandwidth
+
+- stochastic,
+- endogenous,
+- responsive to conditional mean geometry.
+
+This is why the phrase **dynamic bandwidth** is appropriate: the smoothing scale changes with the data, the location, and the recursive structure of the estimator.
+
+---
+
+## Comparison with CART and Tree-Based Methods
+
+CART and related tree methods also induce adaptive partitions.
+
+This makes them the closest classical relatives of recursive mean-split estimation.
+
+But the source of adaptivity differs.
+
+### CART
+
+Splits are selected greedily to optimize impurity reduction or squared-error reduction, usually followed by pruning or complexity penalties.
+
+### Recursive mean splitting
+
+Splits are anchored to the **local mean structure** of the data itself.
+
+This distinction matters because it changes the meaning of the induced bandwidth.
+
+In CART, cell size reflects the outcome of a greedy optimization path under a chosen impurity criterion.
+
+In recursive mean splitting, cell size reflects repeated partitioning around benchmark-relative local means. The bandwidth is therefore connected to the same directional benchmark logic developed throughout the book.
+
+So although both methods are adaptive partition estimators — and both inherit their consistency from the same class-level results — the recursive mean-split bandwidth is structurally tied to the directional framework rather than merely to greedy optimization.
+
+---
+
+## A Simple Illustrative Example
+
+Consider the univariate sample
+
+\[
+(X,Y)=
+(1,2), (2,3), (3,3), (6,8), (7,9), (8,9).
+\]
+
+Suppose the first split occurs at the predictor mean
+
+\[
+\bar X = \frac{1+2+3+6+7+8}{6}=4.5.
+\]
+
+This creates two predictor regions:
+
+- left cell: \(X \le 4.5\),
+- right cell: \(X > 4.5\).
+
+The cellwise means are
+
+\[
+\hat f_{\text{left}} = \frac{2+3+3}{3}=\frac{8}{3},
+\qquad
+\hat f_{\text{right}} = \frac{8+9+9}{3}=\frac{26}{3}.
+\]
+
+At this stage, the effective bandwidths are approximately the cell diameters, measured here by the range of the observed predictor values in each cell:
+
+\[
+h_{\text{left}} \approx 3-1 = 2,
+\qquad
+h_{\text{right}} \approx 8-6 = 2.
+\]
+
+So both regions use a relatively coarse smoothing scale.
+
+Now suppose the right cell is split again because its internal structure warrants further refinement.
+Then two smaller subcells are created, each with smaller diameter. Their corresponding bandwidths contract.
+
+The estimate in that region becomes more local.
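+
+The arithmetic of this example can be reproduced in a few lines of base R. For illustration, the second split below assumes the right cell is divided at its own predictor mean, \(7\). This gives subcells \(\{6, 7\}\) and \(\{8\}\), with observed widths 1 and 0.
+
+```{r}
+# The six observations from the example above
+X_ex <- c(1, 2, 3, 6, 7, 8)
+Y_ex <- c(2, 3, 3, 8, 9, 9)
+
+# First split at the predictor mean
+first_split <- mean(X_ex)                                      # 4.5
+in_left <- X_ex <= first_split
+cell_means <- c(mean(Y_ex[in_left]), mean(Y_ex[!in_left]))     # 8/3, 26/3
+widths_1 <- c(diff(range(X_ex[in_left])),
+              diff(range(X_ex[!in_left])))                     # 2, 2
+
+# Assumed second split of the right cell, at its own predictor mean
+X_right <- X_ex[!in_left]
+second_split <- mean(X_right)                                  # 7
+widths_2 <- c(diff(range(X_right[X_right <= second_split])),
+              diff(range(X_right[X_right > second_split])))    # 1, 0
+```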
+
+This simple example illustrates the principle:
+
+**every additional split reduces the effective bandwidth in the affected region.**
+
+Unlike a kernel estimator, which changes smoothness by altering a global numeric parameter \(h\), the recursive mean-split estimator changes smoothness by refining the partition itself.
+
+---
+
+## The Order Parameter as Global Smoothing Control
+
+Although the bandwidth is local and stochastic, the order parameter \(O\) still plays an important global role.
+
+Increasing \(O\):
+
+- increases the number of potential regions,
+- decreases typical cell diameter,
+- reduces bias,
+- increases variance,
+- pushes the estimator toward exact interpolation.
+
+So \(O\) functions as a global control on the *capacity* of the partition, while the realized bandwidths \(h_n(x)\) provide the local smoothing scales.
+
+This is an important distinction.
+
+- \(O\) is not itself the bandwidth.
+- Rather, \(O\) regulates how small the bandwidths are allowed to become.
+
+In that sense, the recursive mean-split estimator combines
+
+- a **global refinement control** through \(O\), and
+- **local adaptive smoothing** through \(h_n(x)=\operatorname{diam}(A_n(x))\).
+
+In the default implementation, this global control is deployed locally as well: when `order = NULL`, each regressor receives its own effective order based on its directional dependence strength with the response. Bandwidth therefore becomes not only stochastic and location-dependent, but also **signal-dependent**, allocating finer resolution where directional dependence is strongest and broader smoothing where evidence is weaker.
+
+This dual structure helps explain why the method can be both flexible and interpretable.
+
+---
+
+## Multivariate Interpretation
+
+The bandwidth interpretation extends naturally to multivariate predictors, and here it connects to the key architectural feature introduced in Chapter 18 and developed fully in Chapter 21.
+
+In the NNS multivariate setting, each predictor is partitioned independently against the response, rather than partitioning the joint predictor space. The per-regressor bandwidths are therefore:
+
+\[
+h_n^{(j)}(x^{(j)}) := \operatorname{diam}(A_n^{(j)}(x^{(j)})), \quad j = 1, \dots, d,
+\]
+
+where \(A_n^{(j)}\) is the terminal cell for predictor \(j\) in its own univariate partition against \(Y\).
+
+These per-regressor bandwidths are each data-adaptive and stochastic. But crucially, they do not compound exponentially as \(d\) grows, because the partition is not being formed in the joint \(d\)-dimensional predictor space. Each regressor's smoothing scale is determined independently by its own relationship with the response.
+
+This is the bandwidth-level expression of the curse-of-dimensionality mitigation described in Chapter 18: by partitioning each regressor against the response separately, the effective smoothing scale for each predictor dimension is governed by the univariate data density in that dimension — not by the joint density in \(\mathbb{R}^d\), which deteriorates rapidly with dimension.
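+
+A back-of-the-envelope calculation shows why the joint partition deteriorates. Purely for illustration, assume that each refinement level halves every coordinate's cells. This is a dyadic simplification, not the actual NNS split rule. With \(O\) levels, a joint partition of \(\mathbb{R}^d\) then has \(2^{Od}\) cells, while \(d\) separate univariate partitions have \(d \cdot 2^{O}\) cells in total:
+
+```r
+# Illustrative arithmetic only: assume every refinement level halves each coordinate
+n <- 1000   # sample size
+O <- 3      # refinement levels
+d <- 1:6    # number of regressors
+
+joint_cells       <- 2^(O * d)       # cells when partitioning R^d jointly
+per_reg_cells     <- d * 2^O         # total cells across d univariate partitions
+joint_occupancy   <- n / joint_cells # average observations per joint cell
+per_reg_occupancy <- n / 2^O         # average observations per cell, any regressor
+
+data.frame(d, joint_cells, joint_occupancy, per_reg_cells, per_reg_occupancy)
+```
+
+With \(n = 1000\) and \(O = 3\), the joint partition already averages about 0.03 observations per cell at \(d = 5\) (32,768 cells). Each per-regressor cell still holds about 125 observations regardless of \(d\).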
+
+The resulting regression point matrix then supports a nearest-neighbor prediction step in a space whose size grows linearly in \(d\), with each candidate neighbor already denoised through local averaging. The per-regressor dynamic bandwidths make this compression principled: each bandwidth has already adapted to local signal structure before the joint prediction step begins.
+
+---
+
+## Advantages of the Dynamic Bandwidth View
+
+Interpreting recursive mean-split estimation through bandwidth offers several conceptual advantages.
+
+### It links NNS to classical smoothing theory
+
+The estimator is not an isolated procedure.
+It belongs to the same family of local averaging methods as kernels and histograms, and its consistency is the class-level result of Stone (1977) re-expressed in partition geometry.
+
+### It clarifies the consistency proof
+
+The shrinking-diameter condition from Chapter 18 is simply the shrinking-bandwidth condition in disguise. No additional theoretical machinery is needed; the connection is structural.
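+
+For reference, the estimator and the two conditions can be written side by side with their kernel counterparts. This is an informal statement of the standard conditions (see Györfi et al., 2002, in the reference list), in this chapter's notation. \(N_n(x)\) is introduced here for the number of observations in the cell containing \(x\), and both limits are understood in probability for data-dependent partitions:
+
+\[
+\hat m_n(x) = \frac{\sum_{i=1}^{n} Y_i \,\mathbf{1}\{X_i \in A_n(x)\}}{N_n(x)},
+\qquad
+N_n(x) = \sum_{i=1}^{n} \mathbf{1}\{X_i \in A_n(x)\},
+\]
+
+\[
+\begin{aligned}
+&\text{partition estimator:} && h_n(x) = \operatorname{diam}(A_n(x)) \to 0, && N_n(x) \to \infty,\\
+&\text{kernel estimator:} && h_n \to 0, && n\,h_n^{d} \to \infty.
+\end{aligned}
+\]
+
+For a design density bounded away from zero, \(n\,h_n^{d}\) is proportional to the expected number of observations in a window of radius \(h_n\). It is therefore the kernel analogue of the local sample size \(N_n(x)\).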
+
+### It explains adaptivity
+
+Because the bandwidth is generated by the data, smoothing automatically varies across regions.
+
+### It avoids arbitrary external tuning in the classical sense
+
+Rather than choosing a bandwidth directly, the analyst controls partition refinement and occupancy, while the local smoothing scale emerges endogenously.
+
+### It preserves interpretability
+
+Each bandwidth corresponds to an actual region of the observed data, not merely to a tuning number in a weighting formula.
+
+### It clarifies the multivariate architecture
+
+Per-regressor bandwidths make explicit why the NNS multivariate approach avoids the worst consequences of the curse of dimensionality: each dimension's smoothing scale is determined by its own data density and its own relationship with the response, rather than by the joint density of the full predictor space.
+
+---
+
+## Structural Interpretation within NNS
+
+This chapter completes an important conceptual arc in the book.
+
+Earlier chapters showed that directional deviation operators generate:
+
+- cumulative distribution functions,
+- classical moments,
+- nonlinear dependence measures,
+- benchmark-relative probability statements.
+
+Chapter 18 then showed that recursive mean splitting uses benchmark-relative geometry to define estimation regions and produces a consistent estimator by class membership.
+
+This chapter adds the final interpretation:
+
+**those same deviation-defined regions induce a stochastic bandwidth.**
+
+So the NNS estimator is not merely a partition rule.
+
+It is an adaptive smoothing procedure whose local scale is generated by recursive benchmark-relative decomposition — and whose consistency is not a novel claim but a direct inheritance from the established theory of partition estimators.
+
+The structural message is therefore unified:
+
+- directional deviations define probability mass,
+- directional deviations decompose moments,
+- directional deviations reveal dependence,
+- directional deviations generate estimation regions,
+- and those regions determine the local smoothing scale.
+
+Bandwidth, in this framework, is not an external input.
+
+It is an emergent property of the directional partition itself.
+
+---
+
+## Summary
+
+The main ideas of this chapter are:
+
+- Classical nonparametric methods smooth by averaging over neighborhoods whose size is controlled by a bandwidth or bin width.
+- The recursive mean-split estimator is also a local averaging estimator, but its neighborhood is the terminal partition cell \(A_n(x)\).
+- The natural bandwidth analogue is the cell diameter
+
+\[
+h_n(x)=\operatorname{diam}(A_n(x)).
+\]
+
+- This bandwidth is **stochastic, location-dependent, and endogenous**.
+- Regions where the conditional mean varies more sharply receive finer partitions and therefore smaller effective bandwidths.
+- The consistency conditions from Chapter 18 — shrinking diameter and growing occupancy — are exactly the shrinking-bandwidth and growing-local-sample-size conditions of classical nonparametric kernel theory. Consistency is inherited from the partition estimator class; the bandwidth interpretation makes this inheritance explicit.
+- In the multivariate case, per-regressor bandwidths are determined independently for each predictor's relationship with the response, avoiding the exponential deterioration of joint bandwidth in high dimensions and providing the bandwidth-level foundation for the curse-of-dimensionality mitigation developed in Chapter 21.
+
+The next chapter turns from the smoothing interpretation of recursive mean splitting to one of its major practical consequences: **clustering**. If recursive partitioning can define local estimation neighborhoods, it can also define groups of structurally similar observations.
+
+---
+
+## References
+
+- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620.
+
+- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706.
+
+- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer.
+
+- Wand, M. P., & Jones, M. C. (1995). *Kernel Smoothing*. Chapman and Hall.
+
+- Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its Applications*. Chapman and Hall.
diff --git a/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd b/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd
new file mode 100644
index 0000000..8f93bc3
--- /dev/null
+++ b/tools/NNS/book/chapter-20-synthetic-data-and-maximum-entropy-bootstrap.Rmd
@@ -0,0 +1,454 @@
+# Synthetic Data and Maximum Entropy Bootstrap
+
+The previous chapter showed that threshold selection, directional probability bounds, and finite-sample stability are linked problems. These concerns naturally motivate resampling procedures that preserve empirical structure while generating synthetic realizations for robustness analysis. This chapter therefore turns to maximum entropy bootstrap and related synthetic-data methods as computational tools for the estimation and threshold-analysis problems developed in Chapter 16.
+
+Beyond statistical summaries of existing data, analysts often need **synthetic data generation**. Synthetic data can be used for simulation, risk analysis, forecasting evaluation, and Monte Carlo experiments.
+
+A common approach for generating synthetic datasets is **bootstrap resampling**, which repeatedly samples from the observed data. While useful, classical bootstrap procedures assume independence or rely on model-based adjustments to accommodate dependence.
+
+Time-series data present a particular challenge because **temporal dependence must be preserved** for synthetic samples to remain realistic.
+
+The **maximum entropy bootstrap (ME bootstrap)** provides a solution. By constructing bootstrap samples that satisfy entropy-maximizing constraints while preserving the empirical dependence structure, the method produces synthetic time series that retain key statistical properties of the original data.
+
+The NNS implementation, `NNS.meboot`, extends the original `meboot` algorithm in several important directions: it allows the user to specify an **arbitrary target rank correlation** between the original and resampled series, supports multiple dependence metrics native to the NNS framework, provides fine-grained control over the trend component of synthetic series, and enables a richer class of Monte Carlo simulations than classical iid resampling permits.
+
+This chapter introduces bootstrap methods, explains the theory of the maximum entropy bootstrap, and demonstrates how synthetic time series can be generated and customized using `NNS.meboot`.
+
+---
+
+## Bootstrap Methods
+
+The bootstrap is a general method for estimating sampling distributions by **resampling from observed data**.
+
+Suppose a dataset consists of observations
+
+$$
+x_1, x_2, \dots, x_n .
+$$
+
+A bootstrap sample is obtained by drawing observations **with replacement** from this set, producing
+
+$$
+x_1^*, x_2^*, \dots, x_n^* .
+$$
+
+Repeating this procedure many times produces an ensemble of synthetic datasets. Any statistic $T(X)$ can then be evaluated across the bootstrap samples to approximate its sampling distribution.
+
+Bootstrap procedures are widely used to estimate standard errors, confidence intervals, and bias corrections.
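+
+The following base-R sketch illustrates the procedure on a skewed sample simulated for illustration (not data from the text). It resamples with replacement, evaluates the sample mean $T(X)$ on each resample, and compares the bootstrap standard error with the textbook formula $s/\sqrt{n}$.
+
+```r
+set.seed(1)
+x <- rexp(50, rate = 1)   # illustrative skewed sample, n = 50
+B <- 2000                 # number of bootstrap resamples
+
+# Evaluate the statistic T (here the sample mean) on each resample drawn with replacement
+boot_means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))
+
+se_boot    <- sd(boot_means)                      # bootstrap standard error
+se_formula <- sd(x) / sqrt(length(x))             # textbook standard error of the mean
+ci_95      <- quantile(boot_means, c(0.025, 0.975))  # percentile confidence interval
+
+c(se_boot = se_boot, se_formula = se_formula)
+ci_95
+```
+
+For the mean, the two standard errors should nearly agree. The appeal of the bootstrap is that the same loop applies unchanged to statistics, such as a quantile or a partial moment, that lack a simple standard-error formula.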
+
+However, classical bootstrap resampling assumes that observations are **independent and identically distributed (i.i.d.)**.
+
+For time-series data this assumption fails because observations exhibit **serial dependence**. Simple resampling destroys the temporal ordering and therefore eliminates the structure that generated the data.
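+
+A short base-R simulation shows the problem, using an assumed AR(1) process with coefficient $0.8$. A single i.i.d. resample keeps the values but loses the lag-1 autocorrelation.
+
+```r
+set.seed(2024)
+n <- 500
+x <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))   # serially dependent series
+
+x_star <- sample(x, n, replace = TRUE)   # one i.i.d. bootstrap resample
+
+# Lag-1 sample autocorrelation
+lag1 <- function(z) acf(z, lag.max = 1, plot = FALSE)$acf[2]
+c(original = lag1(x), iid_resample = lag1(x_star))
+```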
+
+---
+
+## Limitations of Classical Bootstrap for Time Series
+
+Several modifications of the bootstrap have been proposed to address dependence.
+
+### Block Bootstrap
+
+The block bootstrap resamples contiguous blocks of observations rather than individual values. This partially preserves local dependence but introduces new design choices: block length, overlap structure, and edge effects. Choosing block parameters can strongly influence results.
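+
+A minimal moving-block bootstrap takes a few lines of base R. The helper `moving_block_boot()` is hypothetical and illustrative. Its block length `block_len` is exactly the design choice mentioned above, and setting it to 1 reduces the method to i.i.d. resampling.
+
+```r
+# Minimal moving-block bootstrap (a sketch; the block length is the user's choice)
+moving_block_boot <- function(x, block_len) {
+  n <- length(x)
+  n_starts <- n - block_len + 1                  # admissible overlapping block starts
+  n_blocks <- ceiling(n / block_len)             # enough blocks to cover n values
+  starts <- sample.int(n_starts, n_blocks, replace = TRUE)
+  out <- unlist(lapply(starts, function(s) x[s:(s + block_len - 1)]))
+  out[seq_len(n)]                                # trim the last block to length n
+}
+
+lag1 <- function(z) acf(z, lag.max = 1, plot = FALSE)$acf[2]
+
+set.seed(3)
+x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))
+sapply(c(1, 5, 25), function(L) lag1(moving_block_boot(x, L)))
+```
+
+Longer blocks retain more of the lag-1 autocorrelation, but each choice of `block_len` changes the result. This is the sensitivity the text describes.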
+
+### Model-Based Bootstrap
+
+Another approach fits a parametric time-series model such as ARIMA and then simulates synthetic data from the estimated model. This method inherits the limitations of the assumed model: specification risk, distributional assumptions, and sensitivity to parameter estimation.
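+
+The following base-R sketch shows a residual-based model bootstrap under an assumed AR(1) specification, fitted with `stats::arima`. If the true process is not AR(1), every synthetic path inherits that misspecification.
+
+```r
+set.seed(11)
+x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 300))
+n <- length(x)
+
+# Step 1: fit an assumed AR(1) model (the specification is the analyst's choice)
+fit <- arima(x, order = c(1, 0, 0))
+phi_hat   <- unname(coef(fit)["ar1"])
+mu_hat    <- unname(coef(fit)["intercept"])   # for AR models, arima's intercept is the mean
+resid_hat <- as.numeric(residuals(fit))
+
+# Step 2: rebuild a synthetic path by recursing the fitted model with resampled residuals
+e_star <- sample(resid_hat, n, replace = TRUE)
+x_star <- numeric(n)
+x_star[1] <- x[1]
+for (t in 2:n) {
+  x_star[t] <- mu_hat + phi_hat * (x_star[t - 1] - mu_hat) + e_star[t]
+}
+```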
+
+Both approaches therefore require **tuning choices or parametric assumptions**.
+
+### The iid Correlation Constraint
+
+A further limitation of standard iid Monte Carlo simulation (MCS) is less obvious but practically important. When a large number of resampled series is generated by iid sampling with replacement, the Pearson correlation coefficients between those series and the original cluster near zero. An iid resample is essentially independent of the original time ordering, so its expected correlation with the original is approximately zero. Sampling variability alone produces the observed spread, with a standard deviation of roughly $1/\sqrt{n}$. For a series of $n = 100$ observations, as in the simulation later in this chapter, that is about $0.1$, so nearly all correlations fall within approximately $[-0.3, 0.3]$, and the band narrows further as $n$ grows. No mechanism drives resamples toward strongly positive or negative correlation with the original. As a result, standard MCS does not provide adequate variety in simulated paths: tail scenarios and strongly correlated or anti-correlated futures are systematically underrepresented.
+
+The maximum entropy bootstrap avoids these problems by constructing synthetic samples using **information-theoretic principles** that preserve the empirical dependence structure without specifying a parametric model, and by allowing the user to inject controlled variety through a target rank correlation parameter.
+
+---
+
+## Maximum Entropy Principle
+
+The maximum entropy principle originates from information theory.
+
+Given incomplete information about a system, the probability distribution that best represents the current state of knowledge is the one that **maximizes entropy subject to known constraints**.
+
+For a discrete distribution with probabilities $p_i$, entropy is
+
+$$
+H = -\sum_i p_i \log p_i .
+$$
+
+Maximizing entropy ensures that the resulting distribution introduces **no additional assumptions beyond the constraints provided by the data**.
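+
+As a small numerical check of the definition (natural logarithms, with $0 \log 0$ taken as $0$), consider the case where the only constraint is that $k$ probabilities sum to one. The uniform distribution then attains the maximum entropy $\log k$, and any more concentrated distribution has lower entropy.
+
+```r
+# Shannon entropy H = -sum(p * log(p)), dropping zero-probability outcomes
+entropy <- function(p) {
+  p <- p[p > 0]
+  -sum(p * log(p))
+}
+
+k <- 5
+uniform <- rep(1 / k, k)
+skewed  <- c(0.6, 0.2, 0.1, 0.05, 0.05)
+c(uniform = entropy(uniform), skewed = entropy(skewed), log_k = log(k))
+```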
+
+In the context of bootstrap resampling, the constraints arise from the empirical properties of the observed time series. The goal is to generate synthetic sequences that satisfy these constraints while maximizing entropy, thereby producing the **least-biased distribution consistent with the data**.
+
+---
+
+## Maximum Entropy Bootstrap for Time Series
+
+The maximum entropy bootstrap constructs synthetic time-series samples through a sequence of steps that preserve essential features of the observed data.
+
+Let the observed series be
+
+$$
+x_1, x_2, \dots, x_n .
+$$
+
+The ME bootstrap algorithm proceeds conceptually as follows.
+
+### Step 1: Order Statistics
+
+Sort the observations to obtain the ordered sample
+
+$$
+x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} .
+$$
+
+The ordering allows construction of piecewise intervals between adjacent values.
+
+### Step 2: Interval Construction
+
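+Define intermediate points halfway between successive order statistics, $z_t = (x_{(t)} + x_{(t+1)})/2$ for $t = 1, \dots, n-1$. Then extend the range beyond $x_{(1)}$ and $x_{(n)}$ by a trimmed mean of the absolute consecutive deviations $|x_t - x_{t-1}|$ (the `dv` and `dvtrim` elements of the return object). The resulting $n$ intervals represent regions in which synthetic observations may occur.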
+
+### Step 3: Maximum Entropy Density
+
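+Within each interval a density is constructed so that the overall distribution maximizes entropy subject to mass-preserving and mean-preserving constraints. The desired interval means follow the Theil–Laitinen weighting scheme, which assigns weight $0.25$ to each neighbor and $0.50$ to the central value for interior points, with boundary adjustments at the extremes. These constraints do not pin down the variance exactly, so an optional scale adjustment (`scl.adjustment = TRUE`, reported as `kappa` in the return object) can be used to match the sample variance.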
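+
+Written out, the desired interval means $m_k$ take the following form, as described by Vinod and López-de-Lacalle (2009). Each order statistic receives total weight one across the $m_k$, so $\frac{1}{n}\sum_{k} m_k = \bar{x}$, which is the mean-preserving property:
+
+$$
+\begin{aligned}
+m_1 &= 0.75\,x_{(1)} + 0.25\,x_{(2)},\\
+m_k &= 0.25\,x_{(k-1)} + 0.50\,x_{(k)} + 0.25\,x_{(k+1)}, \qquad k = 2, \dots, n-1,\\
+m_n &= 0.25\,x_{(n-1)} + 0.75\,x_{(n)}.
+\end{aligned}
+$$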
+
+### Step 4: Random Sampling
+
+Random values are drawn from this maximum-entropy distribution.
+
+### Step 5: Time Ordering
+
+The sampled values are reordered to match the rank structure of the original series, restoring temporal dependence.
+
+The resulting synthetic dataset preserves marginal distribution characteristics, dependence structure, and sample size. Because the method uses entropy maximization rather than parametric modeling, it remains **distribution-free**.
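+
+The five steps can be sketched in base R. This is a deliberately simplified illustration, not the `meboot` or `NNS.meboot` source. The hypothetical helper `me_boot_sketch()` uses a uniform density on every interval, including the two tail intervals, and omits the Theil–Laitinen mean-preserving adjustment and any scale adjustment.
+
+```r
+# Simplified ME bootstrap sketch (hypothetical helper, base R only).
+# Omits the Theil-Laitinen mean-preserving adjustment and the scale adjustment.
+me_boot_sketch <- function(x, trim = 0.10) {
+  n <- length(x)
+  ord <- order(x)                  # Step 1: time indices sorted by value
+  xx <- x[ord]                     # order statistics x_(1) <= ... <= x_(n)
+
+  # Step 2: midpoints between order statistics; tails extended by a trimmed
+  # mean of absolute consecutive deviations (computed in time order)
+  dvtrim <- mean(abs(diff(x)), trim = trim)
+  z <- c(xx[1] - dvtrim, (xx[-1] + xx[-n]) / 2, xx[n] + dvtrim)   # n + 1 limits
+
+  # Steps 3-4: give each of the n intervals mass 1/n with a uniform density,
+  # then map sorted uniform draws through the piecewise-linear quantile function
+  p <- sort(runif(n))
+  q <- approx(x = seq(0, 1, length.out = n + 1), y = z, xout = p)$y
+
+  # Step 5: the i-th smallest draw goes to the time index of the i-th smallest x
+  out <- numeric(n)
+  out[ord] <- q
+  out
+}
+
+set.seed(5)
+x <- cumsum(rnorm(60))                     # nonstationary illustrative series
+reps <- replicate(3, me_boot_sketch(x))    # 60 x 3 matrix of replicates
+apply(reps, 2, cor, y = x, method = "spearman")
+```
+
+Step 5 places the synthetic values in the original rank order, so each replicate has a Spearman correlation of exactly one with the original series. This is the $\rho = 1$ behavior discussed below.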
+
+---
+
+## Dependence-Preserving Resampling
+
+The key innovation of the maximum entropy bootstrap is the preservation of **rank dependence**.
+
+Let
+
+$$
+R_t = \text{rank}(x_t)
+$$
+
+denote the rank of observation $x_t$ within the sample. After synthetic values are generated, they are assigned to time indices according to the same rank ordering:
+
+$$
+x_t^* = y_{(R_t)}
+$$
+
+where $y_{(i)}$ denotes the $i$-th ordered synthetic value.
+
+This mapping ensures that the **relative ordering of observations over time** matches that of the original data. As a result, autocorrelation and other dependence features remain approximately preserved. Unlike block bootstrap methods, this approach requires **no block-length tuning** and does not impose parametric assumptions.
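+
+The rank mapping itself is a one-liner. In the base-R sketch below, the synthetic values are stand-ins (i.i.d. normal draws with matching mean and standard deviation) rather than ME draws. Reassigning them by the ranks of an AR(1) series restores most of the lag-1 autocorrelation that the unordered draws lack.
+
+```r
+set.seed(8)
+n <- 300
+x <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))   # dependent original series
+
+# Stand-in synthetic values with no time structure (iid normal, matched moments)
+y <- rnorm(n, mean = mean(x), sd = sd(x))
+
+# Rank mapping x*_t = y_(R_t): the R_t-th smallest synthetic value is placed at time t
+R <- rank(x, ties.method = "first")
+x_star <- sort(y)[R]
+
+lag1 <- function(z) acf(z, lag.max = 1, plot = FALSE)$acf[2]
+c(original = lag1(x), unordered = lag1(y), rank_mapped = lag1(x_star))
+```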
+
+### Theoretical Basis for Rank Matching
+
+The theoretical justification for perfect rank matching was formalized by Joag-Dev (1984). He showed that if one requires strong dependence between the original series $x_t$ and any resampled series $x_t^*$ without imposing parametric constraints, the order statistics of both series must conform to each other. This distribution-free measure of strong dependence corresponds to a Spearman rank correlation of unity.
+
+However, as the next section describes, the NNS implementation relaxes this constraint, allowing the user to specify any target rank correlation in $[-1, 1]$.
+
+---
+
+## Arbitrary Spearman Rank Correlation: The `rho` Parameter
+
+A major extension of `NNS.meboot` relative to the original `meboot` package is the ability to specify an **arbitrary Spearman rank correlation** $\rho \in [-1, 1]$ between the original series and each bootstrap replicate. This is controlled by the `rho` argument.
+
+The standard meboot algorithm always produces resamples with $\rho = 1$ relative to the original series (perfect rank alignment). While this preserves dependence maximally, it limits the variety of simulated paths. For some applications, such as stress testing, scenario analysis, or Monte Carlo simulation, the analyst may want resamples that are weakly correlated, orthogonal, or even negatively correlated with the original.
+
+### How Rank Targeting Works
+
+For each replicate, the algorithm constructs two extreme orderings:
+
+- **Aligned**: synthetic values sorted to match the rank order of the original residuals (corresponds to $\rho = +1$).
+- **Anti-aligned**: synthetic values sorted in the reverse rank order of the original residuals (corresponds to $\rho = -1$).
+
+A convex combination of these two extremes is then optimized so that the resulting series achieves the target correlation $\rho$ with the original. The optimization is performed replicate-by-replicate in residual space.
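+
+The convex-combination idea can be sketched in base R. The helper `target_rank_blend()` is hypothetical, works on raw values rather than residuals, and is not the optimizer used inside `NNS.meboot`. It uses `stats::uniroot` to find a blend weight whose Spearman correlation with the original matches the target.
+
+```r
+# Hypothetical helper: blend the aligned (rho = +1) and anti-aligned (rho = -1)
+# orderings of synthetic values y, choosing the weight w so that the blend's
+# Spearman correlation with x is as close as possible to the target rho
+target_rank_blend <- function(x, y, rho) {
+  r <- rank(x, ties.method = "first")
+  aligned <- sort(y)[r]                      # y_(R_t): same ranks as x
+  anti    <- sort(y, decreasing = TRUE)[r]   # y_(n + 1 - R_t): reversed ranks
+  if (rho >= 1) return(aligned)
+  if (rho <= -1) return(anti)
+  blend <- function(w) w * aligned + (1 - w) * anti
+  gap <- function(w) cor(blend(w), x, method = "spearman") - rho
+  # gap(0) = -1 - rho < 0 and gap(1) = 1 - rho > 0, so a sign change exists
+  w <- uniroot(gap, interval = c(0, 1), tol = 1e-8)$root
+  blend(w)
+}
+
+set.seed(42)
+x <- cumsum(rnorm(100))   # original series (illustrative)
+y <- rnorm(100)           # stand-in for synthetic ME draws
+sapply(c(-0.5, 0, 0.5), function(r)
+  cor(target_rank_blend(x, y, r), x, method = "spearman"))
+```
+
+The Spearman correlation changes in small steps as the weight moves from 0 to 1. A root-finder that tracks the sign change therefore lands within one such step of the target.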
+
+### Dependence Metric Options
+
+The `type` argument controls which dependence measure is targeted:
+
+| `type` | Measure used |
+|---------------|---------------------------------------|
+| `"spearman"` | Spearman rank correlation (default) |
+| `"pearson"` | Pearson linear correlation |
+| `"NNScor"` | NNS nonlinear correlation |
+| `"NNSdep"` | NNS nonlinear dependence |
+
+The `"NNScor"` and `"NNSdep"` options integrate the NNS co-partial-moment framework directly into the bootstrap loop, allowing dependence targeting that captures nonlinear relationships. `"NNScor"` corresponds to the NNS nonlinear correlation coefficient introduced in Chapter 10, which detects monotonic and non-monotonic associations through co-partial moments; `"NNSdep"` corresponds to the directional dependence measure from Chapter 10, which quantifies the strength of dependence independently of its direction or functional form.
+
+### Simulation Evidence
+
+Vinod and Viole (2020) demonstrate through simulation that for OLS inference on nonstationary I(1) data, meboot-based confidence intervals with $\rho \ge 0.6$ outperform traditional OLS confidence intervals. When $\rho = 1$, the average absolute deviation from the nominal rejection rate is approximately $0.037$, far smaller than the OLS analog of approximately $0.423$.
+
+For random walk experiments, setting $\rho < 0.5$ yields resampled series that the ADF test classifies as stationary I(0), rejecting the unit-root null. This suggests that `rho` provides a model-free route to stationarizing nonstationary series as an alternative to differencing or de-trending.
+
+---
+
+## Trend Decomposition and Drift Control
+
+The `NNS.meboot` implementation operates on **residuals** rather than raw levels. Specifically:
+
+1. A linear trend is estimated from the original series via ordinary least squares.
+2. The ME bootstrap resampling is applied to the residuals.
+3. Reconstructed synthetic series are formed as $\text{baseline}_t + \text{resampled residual}_t$, where the baseline is the fitted linear trend evaluated at each time point.
+
+This decomposition ensures that the synthetic series inherit the correct distributional properties from the residuals while allowing independent control of the trend component.
+
+### Drift Arguments
+
+Three arguments govern the trend component:
+
+- **`drift = TRUE`** (default): the original series' estimated linear drift is preserved in all replicates.
+- **`drift = FALSE`**: the trend is removed; replicates are centered around a flat baseline.
+- **`target_drift`**: specifies an explicit drift value (e.g., a risk-free rate of return) to impose on all replicates.
+- **`target_drift_scale`**: multiplies the estimated drift by a scalar, allowing proportional adjustments.
+
+These options are particularly useful in financial applications where synthetic return paths should reflect a specific expected return or be drift-neutral for risk attribution purposes.
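+
+The decomposition and the drift options can be sketched in base R. This is an illustration under stated assumptions, not the `NNS.meboot` source. The hypothetical helper `rebuild_with_drift()` pivots the baseline around the fitted value at mid-sample (which equals the sample mean under OLS). It treats `slope` as a per-period drift and takes already-resampled residuals as input.
+
+```r
+# Hypothetical helper: detrend by OLS, then add back a baseline with a chosen slope
+rebuild_with_drift <- function(x, resid_star, slope = NULL, slope_scale = 1) {
+  t <- seq_along(x)
+  fit <- lm(as.numeric(x) ~ t)                      # Step 1: OLS linear trend
+  b <- unname(coef(fit))                            # b[1] intercept, b[2] slope
+  if (is.null(slope)) slope <- b[2] * slope_scale   # keep (or rescale) estimated drift
+  pivot <- b[1] + b[2] * mean(t)                    # fitted value at mid-sample = mean(x)
+  baseline <- pivot + slope * (t - mean(t))
+  baseline + resid_star                             # Step 3: baseline + residuals
+}
+
+x <- as.numeric(AirPassengers)
+t <- seq_along(x)
+res <- as.numeric(residuals(lm(x ~ t)))             # Step 2 would resample these
+
+keep_drift <- rebuild_with_drift(x, res)                     # like drift = TRUE
+flat       <- rebuild_with_drift(x, res, slope = 0)          # like drift = FALSE
+half_drift <- rebuild_with_drift(x, res, slope_scale = 0.5)  # like target_drift_scale = 0.5
+```
+
+Passing the unresampled residuals reproduces the original series exactly. This makes it easy to confirm that only the baseline slope changes between the three calls.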
+
+---
+
+## Synthetic Time-Series Generation with `NNS.meboot`
+
+The NNS package provides `NNS.meboot` for generating synthetic bootstrap samples that preserve the empirical distribution and temporal structure of the original series.
+
+### Basic Usage
+```r
+library(NNS)
+
+# Generate 100 bootstrap replicates of AirPassengers
+boots <- NNS.meboot(AirPassengers, reps = 100, rho = 1, xmin = 0)
+
+# Verify Spearman correlation of ensemble to original
+cor(boots["ensemble", ]$ensemble, AirPassengers, method = "spearman")
+
+# Plot all replicates
+matplot(boots["replicates", ]$replicates, type = "l")
+
+# Overlay ensemble mean
+lines(boots["ensemble", ]$ensemble, lwd = 3)
+
+# Overlay original
+lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
+```
+
+### Return Object
+
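+`NNS.meboot` returns the following elements. They are stored by name in the rows of the result, with one column per value of `rho`, and are extracted by row, as in `boots["replicates", ]` above: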
+
+| Element | Description |
+|---------------|-----------------------------------------------------------------|
+| `x` | Original input data |
+| `replicates` | Matrix of bootstrap replicates (rows = time, cols = replicates) |
+| `ensemble` | Row mean across all replicates |
+| `xx` | Sorted order statistics of residuals |
+| `z` | Class interval limits |
+| `dv` | Absolute consecutive deviations |
+| `dvtrim` | Trimmed mean of `dv` (used for tail extension) |
+| `xmin` | Effective lower bound for ensemble values |
+| `xmax` | Effective upper bound for ensemble values |
+| `desintxb` | Desired interval means (Theil–Laitinen) |
+| `ordxx` | Rank ordering of original residuals |
+| `kappa` | Scale adjustment factor (if `scl.adjustment = TRUE`) |
+
+### Vectorized `rho`
+
+The `rho` argument is vectorized, enabling a single call to produce replicates at multiple target correlations simultaneously:
+```r
+# Three sets of replicates: orthogonal, half-correlated, and fully correlated
+boots <- NNS.meboot(AirPassengers, reps = 10, rho = c(0, 0.5, 1), xmin = 0)
+
+matplot(do.call(cbind, boots["replicates", ]), type = "l")
+lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
+```
+
+For Monte Carlo workflows, the package also provides `NNS.MC()` as a convenience wrapper around this `NNS.meboot`-based simulation pipeline.
+
+Similarly, `target_drift` is vectorized across drift levels while holding `rho` fixed:
+```r
+# Replicates with two different target drift rates, rho fixed at 0
+boots <- NNS.meboot(AirPassengers, reps = 10, rho = 0, xmin = 0,
+ target_drift = c(1, 7))
+matplot(do.call(cbind, boots["replicates", ]), type = "l")
+lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
+```
+
+---
+
+## Improved Monte Carlo Simulation
+
+### The Limitation of Standard iid MCS
+
+Traditional Monte Carlo simulation generates synthetic paths by sampling with replacement from the observed series. While easy to implement, this approach inadvertently constrains the variety of simulated paths.
+
+Empirically, when 10,000 iid resamples are drawn from a series of 100 observations, the Pearson correlation coefficients between each resample and the original series cluster in the range approximately $[-0.3, 0.3]$, with a pronounced concentration near zero. Random sampling with replacement discards the time ordering, so the resamples do not explore strongly positive or strongly negative paths.
+```r
+set.seed(12345)
+xt <- rnorm(100, mean = 9, sd = 12)
+
+# Standard iid MCS: 10,000 replicates
+X <- matrix(NA, nrow = 100, ncol = 10000)
+for (i in 1:10000) {
+ X[, i] <- sample(xt, 100, replace = TRUE)
+}
+
+hist(cor(X, xt),
+ main = "Standard iid Monte Carlo Simulation\n10,000 Iterations",
+ xlab = "Correlation")
+```
+
+The resulting histogram is tightly centered on zero, confirming that standard MCS almost never generates strongly correlated or anti-correlated scenario paths.
+
+### Expanded Simulation via Vectorized `rho`
+
+The `NNS.meboot` approach spans the full correlation range $[-1, 1]$ by generating one replicate per target $\rho$ value across a fine grid, then expanding each to a set of standard meboot replicates:
+```r
+library(NNS)
+set.seed(12345)
+xt <- rnorm(100, mean = 9, sd = 12)
+
+# Step 1: Generate one replicate per rho across [-1, 1]
+boots_grid <- NNS.meboot(xt, reps = 1,
+ rho = seq(-1, 1, 0.01),
+ drift = FALSE)
+
+Z <- do.call(cbind, boots_grid["replicates", ])
+
+# Step 2: Expand each to 50 standard meboot replicates
+new_MCS <- list()
+for (i in seq_len(ncol(Z))) {
+ new_MCS[[i]] <- NNS.meboot(Z[, i], reps = 50, rho = 1,
+ drift = FALSE)["replicates", ]$replicates
+}
+
+hist(cor(do.call(cbind, new_MCS), xt),
+ main = "NNS.meboot Simulation\n201 x 50 = 10,050 Iterations",
+ xlab = "Correlation")
+```
+
+
+![Figure 17.3. `NNS.meboot` Monte Carlo correlation histogram after vectorized dependence targeting, spanning a wider \([-1,1]\) range.](images/ch17_meboot_mc_sim.png)
+
+
+
+The resulting distribution of correlations is approximately uniform across $[-1, 1]$, confirming that the `rho`-vectorized approach provides a fundamentally richer simulation basis than standard iid MCS.
+
+---
+
+## Applications in Forecasting and Risk Analysis
+
+### Financial Risk Management: Value at Risk and Expected Shortfall
+
+The richer simulation produced by `NNS.meboot` is especially valuable in financial risk management, where tail behavior drives key risk metrics.
+
+Vinod and Viole (2020) demonstrate this using ten years of daily S&P 500 returns (1998–2007) to generate simulations, then evaluating out-of-sample performance against the 2008 financial crisis. The simulation exercise generates approximately 2.5 million observations from each method to estimate the 1% Value at Risk (VaR), expected shortfall (ES), and minimum simulated return.
+
+| Metric | Actual 2008 | Standard MCS | NNS.meboot |
+|---------------------|-------------|--------------|------------|
+| 99% VaR | −8.77% | −2.94% | −3.10% |
+| Expected Shortfall | −8.92% | −3.66% | −3.91% |
+| Minimum Value | −9.03% | −6.80% | −15.68% |
+
+The NNS.meboot simulation produces a minimum simulated daily return of about −15.7%, a result that would have warned investors of extreme tail risk before the crisis. The standard MCS minimum of approximately −6.8% can be no worse than the worst day in the training sample, because iid resampling only recycles observed returns. It therefore conveyed a dangerously false sense of security.
+
+The `sym = TRUE` argument imposes symmetry on the maximum entropy density. This is a deliberate modeling choice for return series when positive and negative deviations of equal magnitude should receive equal probability mass. It prevents the ME density from inheriting asymmetry present in the residuals and yields a more balanced exploration of both tails. No `xmin` bound is set here, so the simulated paths are free to explore the left tail without imposing a parametric distributional form:
+```r
+library(quantmod)
+library(NNS)
+
+getSymbols("^GSPC", from = "1998-01-01", to = "2009-01-01")
+SPX_train <- as.numeric(dailyReturn(GSPC["1998-01::2007-12"])) # 1998-2007 training window
+SPX_test <- as.numeric(dailyReturn(GSPC["2008"]))
+
+# Generate paths across the full rho range
+SPX_boots <- NNS.meboot(SPX_train, reps = 1,
+ rho = seq(-1, 1, 0.01),
+ drift = FALSE)
+
+SPX_meboot_grid <- do.call(cbind, SPX_boots["replicates", ])
+
+# Expand each path to 5 replicates with symmetric ME density
+new_SPX <- list()
+for (i in 1:dim(SPX_meboot_grid)[2]) {
+ new_SPX[[i]] <- NNS.meboot(SPX_meboot_grid[, i], reps = 5,
+ rho = 1, drift = FALSE,
+ sym = TRUE)["replicates", ]$replicates
+}
+
+all_returns <- unlist(new_SPX)
+
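+# Simulated risk metrics
+quantile(all_returns, 0.01) # 1% quantile: 99% VaR
+mean(all_returns[all_returns <= quantile(all_returns, 0.01)]) # ES
+min(all_returns) # Minimum simulated return
+
+# Realized 2008 metrics for comparison
+quantile(SPX_test, 0.01)
+mean(SPX_test[SPX_test <= quantile(SPX_test, 0.01)])
+min(SPX_test)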
+```
+
+### Forecast Model Evaluation
+
+Bootstrapped time series allow analysts to evaluate forecast stability, assess model sensitivity, and compute predictive distributions without imposing distributional assumptions on forecast errors.
+
+### Stationarity Transformation
+
+Setting `rho` to small values (below approximately $0.5$) produces resampled series that the ADF test classifies as stationary, even when the original series is I(1). This provides a model-free alternative to differencing or de-trending that may be preferable when the analyst does not wish to commit to a specific transformation.
+
+---
+
+## Relationship to the NNS Framework
+
+The maximum entropy bootstrap fits naturally within the directional statistics framework developed throughout this book.
+
+Directional methods emphasize distributional structure relative to benchmarks, while ME bootstrap preserves the empirical distribution and dependence relationships from which those directional statistics are computed. Synthetic samples generated through `NNS.meboot` therefore maintain the properties required for downstream NNS analyses, including:
+
+- partial-moment estimation,
+- directional dependence measurement via `NNS.dep`,
+- distribution comparison via NNS ANOVA,
+- nonparametric forecasting.
+
+The `type = "NNScor"` and `type = "NNSdep"` options close this loop explicitly: the bootstrap targeting criterion is itself computed using NNS co-partial moments, so the resampling respects the same nonlinear dependence geometry as the rest of the NNS toolkit.
+
+By combining entropy-based resampling with directional statistical measures, analysts obtain a fully **nonparametric workflow for simulation and inference**.
+
+
+
+---
+
+## Summary
+
+This chapter introduced synthetic data generation through the maximum entropy bootstrap as implemented in `NNS.meboot`.
+
+Key points include:
+
+- Classical bootstrap methods assume independence and may destroy temporal structure.
+- Standard iid MCS generates simulated paths with correlations concentrated near zero, failing to represent tail or counter-trend scenarios.
+- The maximum entropy bootstrap constructs synthetic samples by maximizing entropy subject to empirical constraints, preserving marginal distributions and dependence structure without parametric assumptions.
+- Rank-based reordering preserves the temporal dependence structure of the original series.
+- `NNS.meboot` extends the original `meboot` algorithm with a vectorized `rho` argument that targets any Spearman rank correlation in $[-1, 1]$, multiple dependence metric options including NNS-native measures, and drift decomposition for precise trend control.
+- The expanded Monte Carlo simulation enabled by vectorized `rho` spans the full correlation range and provides materially richer risk estimates, as demonstrated in the S&P 500 stress-testing example.
+- Low `rho` settings offer a model-free approach to generating stationary resamples from nonstationary series.
+
+Together with the prediction and inference methods developed in earlier chapters, the ME bootstrap provides a powerful tool for **distribution-free simulation and forecasting within the NNS framework**.
+
+---
+
+## References
+
+- Joag-Dev, K. (1984). Measures of dependence. In P. K. Krishnaiah & P. K. Sen (Eds.), *Handbook of Statistics* (Vol. 4, pp. 79–88). North-Holland.
+- Vinod, H. D. (2004). Ranking mutual funds using unconventional utility theory and stochastic dominance. *Journal of Empirical Finance*, **11**(3), 353–377.
+- Vinod, H. D. (2006). Maximum entropy ensembles for time series inference in economics. *Journal of Asian Economics*, **17**(6), 955–978.
+- Vinod, H. D. (2013). *Maximum entropy bootstrap algorithm enhancements* (SSRN Working Paper 2285041). https://doi.org/10.2139/ssrn.2285041
+- Vinod, H. D., & López-de-Lacalle, J. (2009). Maximum entropy bootstrap for time series: The meboot R package. *Journal of Statistical Software*, **29**(5), 1–19.
+- Vinod, H. D., & Viole, F. (2020). *Arbitrary Spearman's rank correlations in maximum entropy bootstrap and improved Monte Carlo simulations* (SSRN Working Paper 3621614). https://doi.org/10.2139/ssrn.3621614
+- Viole, F. (2016). *NNS: Nonlinear nonparametric statistics* (R package). https://cran.r-project.org/package=NNS
diff --git a/tools/NNS/book/chapter-21-clustering.Rmd b/tools/NNS/book/chapter-21-clustering.Rmd
new file mode 100644
index 0000000..144755d
--- /dev/null
+++ b/tools/NNS/book/chapter-21-clustering.Rmd
@@ -0,0 +1,504 @@
+# Clustering
+
+Chapters 18–19 developed the recursive mean-split estimator as a partition-based method for nonparametric estimation and showed that its induced cell geometry behaves like a dynamic bandwidth. Those results focused on **supervised estimation**, where a response variable $Y$ is observed and the objective is to recover a conditional mean function.
+
+This chapter turns to **unsupervised learning**.
+
+In unsupervised learning there is no designated response variable. The goal is instead to discover **structure within the data itself**. Among the most fundamental unsupervised tasks is **clustering**: partitioning observations into groups whose members are more similar to one another than to members of other groups.
+
+Classical clustering methods such as **k-means** and **hierarchical clustering** are widely used, but they inherit familiar limitations:
+
+- k-means is driven by Euclidean distance and spherical-centroid geometry,
+- hierarchical clustering depends heavily on linkage rules,
+- k-means requires the number of groups to be specified in advance, and hierarchical clustering still leaves the analyst to choose where to cut the dendrogram,
+- and both can obscure nonlinear, asymmetric, and benchmark-relative structure.
+
+The directional framework developed throughout this book suggests a different perspective.
+
+Rather than defining similarity purely by geometric distance, we can define it through **directional behavior relative to local benchmarks**. In this view, clustering is not merely grouping by proximity. It is grouping by **shared directional structure**.
+
+Critically, the number of clusters that emerges from this procedure is **determined by the data** under a chosen stopping rule, not prescribed by the analyst. This is not a minor implementation detail. It is a fundamental departure from methods that require $K$ to be chosen before any structure has been examined, and it means, in particular, that the number of clusters need not equal, and in general will not equal, the number of class labels in any downstream supervised problem.
+
+This chapter develops that idea using the partition machinery introduced in Chapters 18–19 and connects it to the NNS clustering procedure.
+
+---
+
+## What Clustering Seeks to Recover
+
+Suppose we observe multivariate data
+
+$$X_1, X_2, \dots, X_n \in \mathbb{R}^d.$$
+
+A clustering algorithm seeks a partition of the index set
+
+$$\{1,2,\dots,n\} = C_1 \cup C_2 \cup \cdots \cup C_K, \qquad C_j \cap C_\ell = \varnothing \text{ for } j \ne \ell,$$
+
+where each set $C_k$ is interpreted as a **cluster**.
+
+The ideal is that observations within the same cluster share a common structural pattern, while observations in different clusters do not.
+
+But this raises two immediate questions:
+
+**What does "similar" mean?**
+
+And equally: **how many groups should there be?**
+
+Different clustering methods answer both questions differently. Classical methods typically answer the second question first — by requiring the analyst to supply $K$ — and then optimize toward that prespecified count. The directional framework answers neither question in advance. Similarity is defined through benchmark-relative structure, and the number of clusters is whatever the recursive partition produces under the chosen stopping rule. Crucially, that count is a **consequence of the data**, not an assumption about it.
+
+- Euclidean methods define similarity through geometric closeness.
+- Density methods define similarity through connected regions of high probability mass.
+- Model-based methods define similarity through shared latent distributions.
+- Directional methods define similarity through shared benchmark-relative structure.
+
+The directional definition is particularly natural when the variables exhibit asymmetry, nonlinear dependence, or tail-specific behavior.
+
+---
+
+## Why Distance Alone Can Mislead
+
+The simplest clustering intuition is geometric distance.
+
+If two observations $x_i$ and $x_j$ are close in Euclidean norm,
+
+$$\|x_i - x_j\|,$$
+
+they are regarded as similar.
+
+This works well when clusters are compact, approximately spherical, and separated mainly by location.
+
+But many real datasets violate these conditions.
+
+### Nonlinear shape
+
+Two observations can lie on the same curved manifold and be structurally similar even if their straight-line distance is not especially small.
+
+### Asymmetric spread
+
+Clusters may have very different directional variability: wide in one direction, narrow in another.
+
+### Benchmark-relative structure
+
+Two points may be similar because they lie on the same side of a threshold across several variables, even if their raw coordinates differ.
+
+### Tail co-movement
+
+Observations may be grouped naturally by whether they occur jointly in extreme regions rather than by ordinary central distance.
+
+Distance alone therefore treats all directions symmetrically and all dimensions uniformly unless ad hoc reweighting is imposed.
+
+The directional framework instead asks whether observations share common **sign and magnitude of deviation** relative to meaningful benchmarks.
+
+---
+
+## Partition-Based Clustering
+
+The recursive mean-split machinery from Chapter 18 provides a natural unsupervised clustering mechanism.
+
+In supervised estimation, partition cells were used to approximate a regression surface. In clustering, the same recursive partitions can be interpreted directly as **unsupervised groupings of the data cloud**.
+
+Let $R\subset \mathbb{R}^d$ denote a region containing a subset of observations. Define the local centroid
+
+$$\bar X_R = \frac{1}{|I_R|}\sum_{i\in I_R} X_i,$$
+
+where $I_R$ indexes the observations in region $R$.
+
+The directional idea is to partition the region relative to this local benchmark.
+
+In two dimensions this creates four quadrants. In $d$ dimensions it creates up to $2^d$ directional subregions, depending on which coordinates lie above or below their local means.
+
+At each stage:
+
+1. compute the local benchmark vector,
+2. assign each observation according to its directional deviation pattern,
+3. recurse within each nonempty subregion.
+
+This yields a tree of increasingly refined directional cells.
+
+The terminal cells form a clustering of the data. Their number is determined entirely by where observations fall relative to successive local means and by the chosen stopping rule. No value of $K$ is ever specified.
+
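+Unlike k-means, these clusters are not required to be spherical or to form a Voronoi tessellation around a fixed set of global centroids. Each terminal cell is bounded by splits at local means, so cell sizes and shapes adapt to the local distribution of the data, and neighboring cells merged post hoc can form groups that are not convex. The clusters arise from repeated local directional refinement.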
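+
+The three steps above can be sketched in a few lines of base R for the two-dimensional case. This is an illustrative sketch written for this chapter, not the `NNS.part` implementation. The function name `mean_split_clusters`, the convention that ties go to the lower side (matching $x_{im} \le t_m$), and the use of a fixed recursion depth `max_order` as the only stopping rule are all assumptions.
+
+```r
+# Sketch (not NNS.part): recursive mean-split clustering in two dimensions.
+# Each region is split into up to four quadrants around its own local mean;
+# recursion stops after `max_order` levels or when a region has < 2 points.
+mean_split_clusters <- function(x, y, max_order = 2) {
+  path <- rep("", length(x))   # sequence of quadrant codes per observation
+  recurse <- function(idx, depth) {
+    if (depth > max_order || length(idx) < 2) return(invisible(NULL))
+    mx <- mean(x[idx]); my <- mean(y[idx])          # local benchmark vector
+    # 1 = lower-lower, 2 = lower-upper, 3 = upper-lower, 4 = upper-upper
+    q <- 1 + 2 * (x[idx] > mx) + (y[idx] > my)
+    path[idx] <<- paste0(path[idx], q)
+    for (k in unique(q)) recurse(idx[q == k], depth + 1)  # nonempty cells only
+  }
+  recurse(seq_along(x), 1)
+  as.integer(factor(path))     # each terminal cell becomes a cluster id
+}
+
+set.seed(1)
+x <- c(rnorm(50, 0), rnorm(50, 5))
+y <- c(rnorm(50, 0), rnorm(50, 5))
+cl <- mean_split_clusters(x, y, max_order = 2)
+max(cl)      # number of clusters: an output, at most 4^2 = 16 here
+table(cl)
+```
+
+The cluster count reported by `max(cl)` is whatever the recursion produced. It was never passed in as an argument.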
+
+---
+
+## Clusters Are Not Classes
+
+A point that deserves explicit statement: **the number of clusters produced by the directional partition need not equal, and in general will not equal, the number of classes in a downstream classification problem.**
+
+This distinction matters because the two concepts serve different purposes.
+
+A **cluster** is a group of observations that share common benchmark-relative directional structure in the predictor space. It is an unsupervised concept, derived entirely from the geometry of the data cloud without reference to any response variable or label.
+
+A **class** is a label assigned to an observation — by a supervisor, domain expert, or measured outcome. In a classification problem, classes are given; the analyst's task is to predict them.
+
+The relationship between clusters and classes is empirical, not definitional. A dataset with 3 known class labels may contain 4, 7, or 12 natural directional clusters, because the feature space harbors more local structure than the labels alone reflect. Conversely, a dataset with 10 nominal categories may collapse to 3 meaningfully distinct directional clusters if several categories share the same benchmark-relative geometry.
+
+Neither outcome is a failure. They reflect the fact that clusters describe **where the data lives** while classes describe **what the data is labeled**. The two can inform each other — clusters often predict classes well precisely because shared directional structure tends to co-occur with shared labels — but they are answering different questions.
+
+This matters practically. When using directional partitioning as a preprocessing step for classification, the analyst should not constrain the number of clusters to match the number of known classes. Doing so would import a supervised assumption into an unsupervised procedure and foreclose the possibility of discovering that the data contains more, fewer, or differently arranged groups than the labels suggest. Instead, let the recursive partition discover the natural directional groupings, then examine how class labels distribute across those groups. If clusters map cleanly to classes, that alignment is a finding. If they do not, that misalignment is also informative.
+
+---
+
+## Directional Similarity
+
+To formalize the idea, let $x_i, x_j \in \mathbb{R}^d$ be two observations and let $t \in \mathbb{R}^d$ denote a benchmark vector.
+
+For each coordinate $m = 1,\dots,d$, define the directional signs
+
+$$s_m(x_i;t) = \begin{cases} -1 & x_{im} \le t_m,\\ +1 & x_{im} > t_m. \end{cases}$$
+
+The vector
+
+$$s(x_i;t) = \bigl(s_1(x_i;t),\dots,s_d(x_i;t)\bigr)$$
+
+records the directional region occupied by $x_i$ relative to the benchmark.
+
+Two observations are **directionally concordant** at benchmark $t$ if
+
+$$s(x_i;t) = s(x_j;t).$$
+
+They are **directionally discordant** if they occupy different sign regions.
+
+This already gives a coarse notion of similarity: observations falling into the same directional cell share the same pattern of benchmark-relative deviations.
+
+A richer notion includes magnitudes. Define the directional deviation vectors
+
+$$u(x_i;t) = \bigl((x_{i1}-t_1)_+,\dots,(x_{id}-t_d)_+\bigr),$$
+
+$$\ell(x_i;t) = \bigl((t_1-x_{i1})_+,\dots,(t_d-x_{id})_+\bigr).$$
+
+These separate upward and downward deviations coordinatewise. Two observations can then be judged similar not only because they lie in the same directional region, but because their directional deviation magnitudes are similar.
+
+This is the unsupervised analogue of the partial-moment logic used throughout the book.
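+
+The definitions of $s$, $u$, and $\ell$ translate directly into vectorized R. The helper names below (`dir_sign`, `dir_upper`, `dir_lower`) and the example vectors are hypothetical. They are chosen only to show two observations that are directionally concordant at the benchmark but differ in deviation magnitude.
+
+```r
+# Coordinatewise directional operators, following the definitions above
+dir_sign  <- function(x, t) ifelse(x <= t, -1, 1)   # s(x; t)
+dir_upper <- function(x, t) pmax(x - t, 0)          # u(x; t)
+dir_lower <- function(x, t) pmax(t - x, 0)          # l(x; t)
+
+bench <- c(0, 0, 0)              # benchmark vector t (illustrative)
+xi    <- c(1.5, -0.5, 2.0)
+xj    <- c(0.2, -3.0, 0.1)
+
+dir_sign(xi, bench)              # 1 -1 1
+dir_sign(xj, bench)              # 1 -1 1: same cell, directionally concordant
+dir_upper(xi, bench)             # 1.5 0.0 2.0
+dir_upper(xj, bench)             # 0.2 0.0 0.1: much smaller upward deviations
+# u - l recovers the raw deviation x - t
+all.equal(dir_upper(xi, bench) - dir_lower(xi, bench), xi - bench)   # TRUE
+```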
+
+---
+
+## Recursive Mean Splits as Unsupervised Structure Discovery
+
+The recursive clustering logic can be understood most clearly in low dimension.
+
+### Two-dimensional case
+
+Suppose each observation is a pair
+
+$$X_i = (X_{i1}, X_{i2}).$$
+
+Within a region $R$, compute the local mean vector
+
+$$(\bar X_{R,1}, \bar X_{R,2}).$$
+
+This mean induces four directional regions:
+
+| Region | Condition |
+|---|---|
+| lower-lower | $X_{i1}\le \bar X_{R,1}$, $X_{i2}\le \bar X_{R,2}$ |
+| lower-upper | $X_{i1}\le \bar X_{R,1}$, $X_{i2}> \bar X_{R,2}$ |
+| upper-lower | $X_{i1}> \bar X_{R,1}$, $X_{i2}\le \bar X_{R,2}$ |
+| upper-upper | $X_{i1}> \bar X_{R,1}$, $X_{i2}> \bar X_{R,2}$ |
+
+Each nonempty region is then split again using its own local mean.
+
+This process produces a nested partition of the data cloud.
+
+### Higher-dimensional case
+
+In $d$ dimensions the same principle applies, though the number of directional subregions per split can grow to $2^d$. In practice one often works with lower-dimensional projections, selected variable subsets, or stopping rules that prevent excessive fragmentation.
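+
+A quick simulation shows why stopping rules matter as $d$ grows. The sketch below assumes independent Gaussian coordinates and uses a hypothetical helper, `populated_cells`. It counts how many of the $2^d$ sign cells around the mean contain at least one of $n = 200$ observations. At most $n$ cells can be populated, so once $2^d$ is much larger than $n$, almost all cells are necessarily empty.
+
+```r
+# Count the sign cells (relative to the column means) that contain data
+populated_cells <- function(X) {
+  above <- sweep(X, 2, colMeans(X), ">")   # TRUE where x_im > mean_m
+  nrow(unique(above))                      # distinct sign patterns
+}
+
+set.seed(42)
+n  <- 200
+ds <- c(1, 2, 4, 8, 12)
+counts <- sapply(ds, function(d) populated_cells(matrix(rnorm(n * d), n, d)))
+data.frame(d = ds, possible = 2^ds, populated = counts)
+```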
+
+The core idea remains unchanged:
+
+**cluster structure is revealed by repeated directional partitioning around local benchmarks.**
+
+This is unsupervised because no response variable is needed. The geometry of the predictor cloud is itself the object of analysis. And the number of clusters that results is a property of that geometry — not a parameter set in advance.
+
+---
+
+## Relationship to Partial Moments
+
+The connection to partial moments is more than intuitive.
+
+Recall from earlier chapters that benchmark-relative structure is encoded by directional deviation operators. In clustering, those same operators summarize the local spread of each candidate cluster.
+
+For a region $R$ and coordinate $m$, define local partial moments about the regional mean $t_m = \bar X_{R,m}$:
+
+$$U_r^{(R,m)} = \frac{1}{|I_R|}\sum_{i\in I_R}(X_{im}-\bar X_{R,m})_+^r,$$
+
+$$L_r^{(R,m)} = \frac{1}{|I_R|}\sum_{i\in I_R}(\bar X_{R,m}-X_{im})_+^r.$$
+
+These quantities describe the directional spread of the cluster in each coordinate.
+
+A region with small, balanced directional moments is relatively compact around its centroid. A region with large directional moments is diffuse, and one whose upper and lower moments are highly asymmetric is skewed about its centroid. Either condition may warrant further splitting.
+
+Thus recursive clustering can be interpreted as repeatedly refining regions whose directional spread remains structurally heterogeneous.
+
+In this way, clustering is tied directly to the same benchmark-relative decomposition that generated classical moments and dependence measures earlier in the book.
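+
+The local partial moments can be computed coordinate by coordinate for any region. The sketch below uses a hypothetical function name, `local_partial_moments`, and a simulated region. It also checks two identities implied by the definitions: about the regional mean, $U_1^{(R,m)} = L_1^{(R,m)}$, and $U_2^{(R,m)} + L_2^{(R,m)}$ equals the coordinate's variance computed with divisor $|I_R|$.
+
+```r
+# Per-coordinate partial moments of order r about the regional mean.
+# X holds the observations of one region R (rows) by coordinate (columns).
+local_partial_moments <- function(X, r = 2) {
+  X   <- as.matrix(X)
+  dev <- sweep(X, 2, colMeans(X))            # X_im - mean of coordinate m
+  list(upper = colMeans(pmax(dev, 0)^r),     # U_r^(R,m)
+       lower = colMeans(pmax(-dev, 0)^r))    # L_r^(R,m)
+}
+
+set.seed(7)
+region <- cbind(a = rexp(500), b = rnorm(500))   # skewed and symmetric coordinates
+pm1 <- local_partial_moments(region, r = 1)
+pm2 <- local_partial_moments(region, r = 2)
+pm1$upper - pm1$lower     # essentially 0: first-order moments balance at the mean
+pm2$upper + pm2$lower     # variance of each coordinate with divisor n
+pm2$upper / pm2$lower     # well above 1 for the right-skewed coordinate a
+```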
+
+---
+
+## Comparison with k-Means
+
+The most common partition-based clustering method is **k-means**.
+
+Given a desired number of clusters $K$, k-means minimizes the within-cluster sum of squared Euclidean distances:
+
+$$\sum_{k=1}^K \sum_{i\in C_k} \|x_i - \mu_k\|^2,$$
+
+where $\mu_k$ is the centroid of cluster $C_k$.
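+
+For reference, the k-means objective can be evaluated directly and compared with the value reported by base R's `stats::kmeans`. The simulated data are illustrative. Setting `nstart = 10` runs several random initializations and keeps the best one, which is a common mitigation for the initialization sensitivity discussed in this section.
+
+```r
+set.seed(3)
+X  <- rbind(matrix(rnorm(100, 0), ncol = 2),    # 50 points around (0, 0)
+            matrix(rnorm(100, 4), ncol = 2))    # 50 points around (4, 4)
+km <- kmeans(X, centers = 2, nstart = 10)
+
+# Within-cluster sum of squares, computed term by term from the objective
+wss <- sum(sapply(seq_len(nrow(km$centers)), function(k) {
+  Ck <- X[km$cluster == k, , drop = FALSE]
+  mu <- colMeans(Ck)                  # centroid mu_k of cluster C_k
+  sum(sweep(Ck, 2, mu)^2)             # sum of ||x_i - mu_k||^2 over C_k
+}))
+all.equal(wss, km$tot.withinss)       # TRUE
+```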
+
+This objective has several advantages:
+
+- computational simplicity,
+- interpretability,
+- good performance for compact spherical clusters.
+
+But it also has well-known limitations.
+
+### Prespecified $K$
+
+The number of clusters must be chosen in advance. This is the most fundamental limitation: the analyst must decide how many groups exist before examining whether the data supports that count.
+
+### Euclidean symmetry
+
+The method is driven by squared symmetric distance and therefore inherits the same aggregation logic criticized in earlier chapters.
+
+### Shape restriction
+
+Voronoi partitions are best suited to roughly spherical clusters.
+
+### Sensitivity to initialization
+
+Different random starts can produce different solutions.
+
+The directional partition approach differs in all four respects.
+
+1. It generates a hierarchy of clusters **without requiring a fixed $K$ at the outset**. The number of terminal clusters is a derived quantity, not an input.
+2. It partitions by directional benchmark-relative structure rather than global squared distance alone.
+3. It accommodates irregular, asymmetric, and locally nonlinear cluster geometry.
+4. Its recursion is deterministic once the splitting rule and stopping criteria are specified.
+
+The contrast on the first point is worth dwelling on. When an analyst runs k-means with $K = 3$ because a dataset has 3 known classes, they have already imported supervised information into an unsupervised procedure. The directional framework keeps the two tasks separate: discover how many directional groups the data contains, then examine how those groups relate to any available labels. These are different questions, and conflating them by setting $K$ equal to the number of classes forecloses the possibility of learning anything surprising from the unsupervised step.
+
+This does not imply that directional clustering dominates k-means in every setting. If the true cluster structure is spherical and centroid-based, k-means is efficient and appropriate. The point is that many real datasets are not, and that a data-determined $K$ is almost always preferable to an analyst-assumed one.
+
+```r
+# Clustered dataset: g Gaussian groups with randomly placed centers
+library(NNS)   # provides NNS.part()
+
+g <- 6
+set.seed(g)
+n <- 100
+d <- data.frame(x = unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2))),
+                y = unlist(lapply(1:g, function(i) rnorm(n/g, runif(1)*i^2))))
+
+par(mfrow = c(2, 1))
+
+# k-means with a prespecified K = 3
+km <- kmeans(d, 3)
+plot(d, col = km$cluster, main = paste0("k-means (k = ", 3, ")"), cex.main = 2)
+points(km$centers, pch = 15, col = 3:1)
+
+# NNS directional partition: the number of cells is an output, not an input
+NNS.part(d$x, d$y, order = 3, Voronoi = TRUE, obs.req = 0)
+```
+
+---
+
+## Comparison with Hierarchical Clustering
+
+Hierarchical clustering constructs nested groupings either by
+
+- **agglomeration**: successively merging points or clusters, or
+- **division**: successively splitting them.
+
+Its main attraction is that it yields a dendrogram rather than a single partition, and it does not require $K$ to be fixed before the algorithm runs — the analyst cuts the tree at a chosen height after the fact.
+
+However, hierarchical clustering depends heavily on the chosen linkage rule:
+
+- single linkage,
+- complete linkage,
+- average linkage,
+- Ward's method.
+
+Different linkage rules can produce very different cluster structures on the same data.
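+
+The effect of the linkage choice is easy to inspect with base R's `hclust` and `cutree`. This sketch uses simulated data with one tight group and one diffuse group, an assumption made to stress the linkage rules. The same distance matrix is clustered under four linkage rules, each dendrogram is cut into three groups, and a cross-tabulation compares the single-linkage and Ward assignments for the same points.
+
+```r
+set.seed(11)
+X <- rbind(matrix(rnorm(60, 0, 0.5), ncol = 2),   # tight group
+           matrix(rnorm(60, 3, 1.5), ncol = 2))   # diffuse group
+D <- dist(X)                                      # Euclidean distances
+linkages <- c("single", "complete", "average", "ward.D2")
+groups <- sapply(linkages, function(m) cutree(hclust(D, method = m), k = 3))
+
+# Agreement between two linkage rules on the same points
+table(single = groups[, "single"], ward = groups[, "ward.D2"])
+```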
+
+The directional recursive partition procedure is also hierarchical and also avoids fixing $K$ in advance. But its hierarchy is built from **local benchmark splits** rather than pairwise merge criteria.
+
+This yields several conceptual differences.
+
+### Geometry
+
+Hierarchical distance methods are governed by pairwise distances. Directional recursion is governed by local benchmark-relative partitions.
+
+### Interpretation
+
+Dendrogram merges may be difficult to interpret substantively. Directional splits are interpretable as above-below benchmark separations along specific coordinates or regions.
+
+### Local adaptivity
+
+Directional recursion recomputes local means inside each region, allowing the geometry to adapt as the partition deepens.
+
+Thus directional clustering can be viewed as a **benchmark-driven divisive hierarchical method** whose effective $K$ is determined by stopping criteria applied to local directional spread rather than by visual inspection of a dendrogram.
+
+---
+
+## Stopping Rules and Practical Cluster Formation
+
+A recursive partition procedure requires a stopping rule. Otherwise it eventually isolates individual observations into singleton clusters — the maximum possible $K$, and the least useful.
+
+Several stopping criteria are natural. Each implicitly determines how many clusters the procedure returns.
+
+### Minimum cell size
+
+Stop splitting when a region contains fewer than some threshold number of observations. This directly bounds the finest possible partition and places a floor on cluster size.
+
+### Maximum order
+
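+Stop after a fixed recursion depth. In two dimensions each split creates at most 4 cells, so at order $O$ the number of populated terminal clusters is bounded above by $\min(4^O, n)$ for joint partitioning. In $d$ dimensions the bound is $\min(2^{dO}, n)$. At order 1 there are at most 4 clusters; at order 2, at most 16; at order 3, at most 64. At higher orders many of these possible cells are typically empty, particularly when $n$ is small or the variables are strongly dependent.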
+
+### Directional compactness
+
+Stop when local directional partial moments are sufficiently small, indicating that the region is already internally coherent. This is the most principled criterion: it halts refinement precisely when further splitting would not improve the directional description of the cluster.
+
+### Stability criteria
+
+Stop when further splits do not materially change the cluster assignments.
+
+These choices play a role analogous to model complexity control elsewhere in nonparametric estimation.
+
+In practice, one often combines them. For example:
+
+- split until order $O$,
+- but do not split cells with fewer than $m$ observations,
+- and stop early if directional spread falls below a tolerance.
+
+The terminal cells are then interpreted as clusters. Neighboring terminal cells can also be merged post hoc if a coarser partition is desired.
+
+None of these rules requires the analyst to specify how many clusters should result. The count of terminal clusters is a **derived output**, not an input parameter.
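+
+The combined rules above can be sketched as a single recursive function in base R. This is an illustration, not the NNS implementation. The name `directional_partition`, its argument names, and the compactness measure are assumptions. Compactness is measured here by the per-coordinate sum $U_2 + L_2$ about the local mean and compared with a tolerance `tol`. None of the arguments is a target number of clusters.
+
+```r
+# Sketch (not NNS.part): recursive mean splits with combined stopping rules
+#   max_order: maximum recursion depth O
+#   min_size:  regions with fewer observations are not split
+#   tol:       stop when U_2 + L_2 about the local mean is <= tol in every coordinate
+directional_partition <- function(X, max_order = 3, min_size = 10, tol = 0) {
+  X <- as.matrix(X)
+  id <- integer(nrow(X))
+  next_id <- 0L
+  recurse <- function(idx, depth) {
+    dev <- sweep(X[idx, , drop = FALSE], 2, colMeans(X[idx, , drop = FALSE]))
+    spread <- colMeans(pmax(dev, 0)^2) + colMeans(pmax(-dev, 0)^2)
+    if (depth > max_order || length(idx) < min_size || all(spread <= tol)) {
+      next_id <<- next_id + 1L
+      id[idx] <<- next_id                 # terminal cell -> cluster id
+      return(invisible(NULL))
+    }
+    key <- apply(dev > 0, 1, paste, collapse = "")   # sign pattern per row
+    for (k in unique(key)) recurse(idx[key == k], depth + 1)
+  }
+  recurse(seq_len(nrow(X)), 1)
+  id
+}
+
+set.seed(5)
+X <- cbind(c(rnorm(150, 0), rnorm(50, 6)), c(rnorm(150, 0), rnorm(50, 6)))
+cl <- directional_partition(X, max_order = 3, min_size = 10)
+max(cl)                                        # derived cluster count
+max(directional_partition(X, tol = Inf))       # 1: compactness stops at the root
+```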
+
+---
+
+## Directional Similarity Matrices
+
+The clustering logic can also be expressed through a similarity matrix.
+
+For observations $x_i$ and $x_j$, define a directional similarity score
+
+$$S_{ij}(t) = \sum_{m=1}^d 1_{\{s_m(x_i;t)=s_m(x_j;t)\}},$$
+
+which counts the number of coordinates for which the two observations lie on the same side of the benchmark.
+
+A normalized version is
+
+$$\tilde S_{ij}(t) = \frac{1}{d} S_{ij}(t), \qquad 0 \le \tilde S_{ij}(t) \le 1.$$
+
+This gives a benchmark-relative concordance measure.
+
+One may refine it by including magnitudes:
+
+$$S_{ij}^{(r)}(t) = \sum_{m=1}^d \left[ 1_{\{s_m(x_i;t)=s_m(x_j;t)\}} \cdot \exp\!\left(-\bigl||x_{im}-t_m|^r - |x_{jm}-t_m|^r\bigr|\right) \right].$$
+
+The precise form can vary, but the principle is clear: similarity depends jointly on
+
+- side of benchmark,
+- magnitude of directional deviation.
+
+Such matrices can be used for visualization, graph-based clustering, or post-processing of recursive partitions — and they make no assumption about the number of groups.
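+
+Both similarity scores can be computed for all pairs at once with `outer`. The sketch below uses a hypothetical function, `directional_similarity`. It takes the column means as the default benchmark, which is an assumption because the text leaves $t$ general, and it implements the exponential magnitude weighting exactly as written above.
+
+```r
+# Directional similarity matrices at benchmark vector `bench`:
+#   S:       number of coordinates on the same side of the benchmark
+#   S_tilde: S / d, between 0 and 1
+#   S_r:     same-side indicator times exp(-| |x_im - t_m|^r - |x_jm - t_m|^r |)
+directional_similarity <- function(X, bench = colMeans(X), r = 1) {
+  X <- as.matrix(X)
+  n <- nrow(X); d <- ncol(X)
+  above <- sweep(X, 2, bench, ">")        # TRUE where x_im > t_m, i.e. s_m = +1
+  mag   <- abs(sweep(X, 2, bench))^r      # |x_im - t_m|^r
+  S <- S_r <- matrix(0, n, n)
+  for (m in seq_len(d)) {
+    same <- outer(above[, m], above[, m], "==")
+    S    <- S + same
+    S_r  <- S_r + same * exp(-abs(outer(mag[, m], mag[, m], "-")))
+  }
+  list(S = S, S_tilde = S / d, S_r = S_r)
+}
+
+X <- rbind(c(1, 2), c(1.5, 2.5), c(-1, 3))
+sim <- directional_similarity(X, bench = c(0, 0))
+sim$S     # rows 1 and 2 agree in both coordinates; row 3 differs in the first
+sim$S_r   # S_r[1, 2] = 2 * exp(-0.5); S_r[1, 3] = exp(-1)
+```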
+
+---
+
+## Unsupervised Learning Structures
+
+Clustering is often only the first step in unsupervised learning.
+
+Once a directional partition has been constructed, it supports several additional tasks.
+
+### Cluster prototypes
+
+Each cluster can be summarized by its local mean vector and directional spread profile.
+
+### Anomaly detection
+
+Observations falling into tiny or highly isolated terminal cells may be treated as anomalies.
+
+### Regime identification
+
+In time-indexed multivariate data, recurring visits to the same cluster can be interpreted as regimes or states.
+
+### Preprocessing for supervised learning
+
+Clusters can serve as features or as local neighborhoods for later regression and classification. When used this way, the number of directional clusters should be allowed to differ from the number of class labels — the unsupervised partition describes the geometry of the predictor space, which need not align neatly with any particular labeling scheme.
+
+Thus the partition is not just a grouping. It is a structural summary of the data cloud.
+
+This is especially important in the NNS framework, where the same recursive partitions reappear in estimation, dependence analysis, and machine learning.
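+
+Cluster prototypes and a simple anomaly flag follow directly from any labeled partition. The sketch below is hypothetical. For each cluster, `cluster_profiles` reports the size, the mean vector, and the second-order upper and lower partial moments per coordinate, and it flags clusters with fewer than `min_size` members. The toy labels in the usage example stand in for the terminal cells of a recursive partition.
+
+```r
+# Prototype and directional spread profile for each cluster, plus an anomaly flag
+cluster_profiles <- function(X, cl, min_size = 3) {
+  X <- as.matrix(X)
+  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
+  rows <- lapply(sort(unique(cl)), function(k) {
+    Xk  <- X[cl == k, , drop = FALSE]
+    mu  <- colMeans(Xk)                                   # prototype
+    dev <- sweep(Xk, 2, mu)
+    U2  <- setNames(colMeans(pmax(dev, 0)^2),  paste0("U2_", colnames(X)))
+    L2  <- setNames(colMeans(pmax(-dev, 0)^2), paste0("L2_", colnames(X)))
+    data.frame(cluster = k, size = nrow(Xk), t(mu), t(U2), t(L2),
+               anomaly = nrow(Xk) < min_size)             # tiny cell flag
+  })
+  do.call(rbind, rows)
+}
+
+set.seed(9)
+X  <- cbind(a = c(rnorm(40), 8), b = c(rnorm(40), -8))   # point 41 is isolated
+cl <- c(ifelse(X[1:40, "a"] > 0, 1, 2), 3)               # toy labels
+prof <- cluster_profiles(X, cl, min_size = 3)
+prof
+prof$cluster[prof$anomaly]                               # 3
+```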
+
+---
+
+## Applications
+
+Directional clustering is useful whenever structure is asymmetric, benchmark-relative, or nonlinear — and whenever the number of natural groups is genuinely unknown.
+
+### Finance
+
+Assets or time periods can be clustered by joint directional behavior, such as upside co-movement versus downside co-movement, rather than by symmetric return distance alone. The number of meaningful market regimes is not known in advance and should not be fixed by assumption.
+
+### Economics
+
+Cross-sectional units can be grouped by shared threshold behavior relative to policy benchmarks, inflation targets, or growth regimes. Whether there are 2, 4, or 7 meaningful regimes is an empirical question the directional partition can answer.
+
+### Risk management
+
+Scenarios can be clustered by tail-region structure, distinguishing ordinary variation from extreme co-occurring losses. The number of meaningfully distinct tail scenarios is data-determined.
+
+### Operations
+
+Demand profiles can be grouped relative to service thresholds, stockout regions, or capacity limits.
+
+### Biostatistics
+
+Patients may cluster more meaningfully by benchmark-relative biomarker patterns than by raw Euclidean distance when thresholds matter clinically. The number of clinically relevant subgroups need not equal the number of diagnostic categories in the classification system.
+
+In each case the essential advantage is the same: the clustering respects **direction and context**, not merely distance, and the number of groups reflects **what the data contains**, not what the analyst assumed.
+
+---
+
+## Conceptual Interpretation
+
+Classical clustering often begins with a notion of "center" and measures how far each point lies from that center — after first deciding how many centers there should be.
+
+The directional approach begins one level deeper.
+
+It asks:
+
+- on which side of the benchmark does the observation lie?
+- how far does it lie there?
+- does it share that directional structure with neighboring observations?
+- does further partitioning reveal additional heterogeneity?
+
+And it defers the question of how many groups there are until the partition itself has provided an answer.
+
+This mirrors the conceptual arc of the entire book.
+
+Classical statistics begins with symmetric aggregates.
+Directional statistics begins with the components.
+
+Classical clustering begins with a prespecified $K$ and global distance.
+Directional clustering begins with benchmark-relative structure and lets $K$ emerge.
+
+---
+
+## Summary
+
+This chapter introduced clustering as an unsupervised extension of the directional partition framework developed in earlier chapters. Its main contributions are fivefold.
+
+First, it defined **directional similarity**: observations are regarded as similar not merely when they are close in Euclidean distance, but when they share common benchmark-relative structure, including the sign and magnitude of their directional deviations.
+
+Second, it developed **recursive partition clustering** as a natural unsupervised use of mean-split partitioning. By repeatedly dividing the data cloud according to local benchmark-relative structure, the method produces clusters that adapt to asymmetry, irregular geometry, and nonlinear organization.
+
+Third, it established that **the number of clusters is determined by the data, not specified by the analyst**. The terminal cells of the recursive partition at any given order and occupancy threshold are a consequence of where observations fall relative to successive local means — not a parameter set before analysis begins. This is a fundamental departure from k-means and any other method that requires $K$ to be chosen in advance.
+
+Fourth, it clarified that **clusters are not classes**. The number of directional groups discovered by the partition need not equal the number of class labels in any downstream supervised problem. Constraining the partition to produce exactly as many clusters as there are classes imports supervised information into an unsupervised procedure and forecloses the possibility of discovering richer or different structure than the labels suggest.
+
+Fifth, it compared the directional approach with **classical clustering methods** such as k-means and hierarchical clustering. Unlike k-means, the directional method does not rely solely on symmetric Euclidean centroid geometry and does not require a prespecified $K$; unlike standard hierarchical methods, it is driven by local benchmark splits rather than arbitrary linkage criteria, and its effective $K$ is determined by stopping criteria on local directional spread.
+
+Taken together, these results show that clustering in the NNS framework is not simply a distance-based grouping exercise. It is a benchmark-relative structural decomposition of the data, consistent with the broader theme of the book: classical methods begin with symmetric aggregates, while directional methods begin with their components.
+
+The next chapter turns from unsupervised grouping to **nonparametric regression**, where these same partition structures are used to estimate conditional relationships directly from data.
diff --git a/tools/NNS/book/chapter-22-nonparametric-regression.Rmd b/tools/NNS/book/chapter-22-nonparametric-regression.Rmd
new file mode 100644
index 0000000..3a902fb
--- /dev/null
+++ b/tools/NNS/book/chapter-22-nonparametric-regression.Rmd
@@ -0,0 +1,507 @@
+# Nonparametric Regression
+
+Chapters 18 and 19 established the partition-based estimation framework that underlies the NNS approach to nonparametric regression. Chapter 18 introduced recursive mean-split estimation as a member of the well-characterized class of data-adaptive partition estimators and showed that consistency is inherited directly from that class under standard shrinking-diameter and occupancy conditions. Chapter 19 interpreted the induced partition diameter as a dynamic bandwidth, linking the estimator to classical nonparametric smoothing theory and showing that the consistency conditions are the direct analogue of shrinking-bandwidth conditions in kernel regression. Chapter 21 showed that the same recursive partitions can be used for unsupervised clustering.
+
+This chapter brings those strands together under the familiar label of **regression**.
+
+In classical statistics, regression is often identified with a fitted equation: a line, a polynomial, or another parametric surface chosen in advance. In the directional framework, regression is something more fundamental:
+
+**the estimation of conditional expectation from data without imposing a predetermined functional form.**
+
+The NNS approach treats regression as the recovery of a nonlinear surface by recursive local averaging over benchmark-defined regions. The univariate case and the multivariate case, however, operate through meaningfully different prediction mechanisms:
+
+- In the **univariate case**, the estimator is a piecewise-constant conditional expectation surface, with optional piecewise-linear interpolation across regression points.
+- In the **multivariate case**, the estimator operates as a nearest-neighbor search over a compressed set of regression points — local conditional means derived from per-regressor partitions against the response — rather than over the raw observations themselves.
+
+This distinction is not incidental. The multivariate architecture is designed specifically to mitigate the curse of dimensionality. It is the central contribution of the NNS regression framework for high-dimensional settings, and it is developed carefully below.
+
+---
+
+## Regression as Conditional Expectation
+
+Let $X \in \mathbb{R}^d$ denote a predictor vector and let $Y \in \mathbb{R}$ denote a response variable.
+
+The central object of regression is the **conditional mean function**
+
+$$f(x) = E[Y \mid X = x].$$
+
+This function gives the expected value of the response at each predictor location.
+
+Classical regression estimates $f$ by restricting it to a family such as
+
+$$f(x) = \beta_0 + \beta^\top x$$
+
+or
+
+$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots.$$
+
+Such models can be useful when the functional form is approximately correct. But when the underlying relationship is nonlinear, threshold-driven, piecewise, or asymmetric, a parametric family can distort the structure it aims to estimate.
+
+Nonparametric regression removes this restriction. Rather than specifying the shape of $f$ in advance, it estimates the function directly from the data.
+
+The directional framework does this by recursively partitioning the data into regions and estimating the local conditional mean inside each region.
+
+---
+
+## Why Classical Regression Can Fail
+
+The limitations of classical regression mirror the limitations discussed throughout the book.
+
+### Functional rigidity
+
+A linear model assumes that the response changes at a constant rate in each direction of predictor space. Many relationships do not.
+
+### Global averaging
+
+A single fitted equation averages across the full sample. Local nonlinear structure may be flattened into a misleading global trend.
+
+### Symmetric error treatment
+
+Least-squares fitting penalizes positive and negative residuals symmetrically. In many settings the two directions matter differently.
+
+### Parametric dependence
+
+Inference often depends on Gaussian errors, homoscedasticity, and stable functional form.
+
+These assumptions are not always wrong. But they are often stronger than the data justify.
+
+Nonparametric regression avoids imposing them at the outset.
+
+---
+
+## Partition-Based Regression in the NNS Framework
+
+The NNS regression framework begins with the recursive mean-split estimator introduced in Chapter 18.
+
+Suppose we observe
+
+$$(X_1,Y_1), \dots, (X_n,Y_n).$$
+
+A partition of the predictor space produces regions
+
+$$A_1, A_2, \dots, A_K.$$
+
+Within each region, the regression function is estimated by the local sample average:
+
+$$\hat f(x) = \frac{1}{N(x)} \sum_{i : X_i \in A(x)} Y_i,$$
+
+where $A(x)$ is the region containing $x$ and $N(x)$ is the number of observations in that region.
+
+This is the basic NNS regression rule:
+
+**estimate the conditional expectation by averaging responses inside a data-adaptive local region.**
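+
+The averaging rule can be written out directly. The base R sketch below uses a hypothetical helper, `partition_estimate()`, and takes the region boundaries as given (hand-chosen breakpoints here, not a recursive NNS partition). It locates the region $A(x)$ of each query point and returns the average of the $N(x)$ responses observed there.
+
+```r
+# Hypothetical helper: local-average estimate over fixed regions on the real line.
+# breaks: sorted region boundaries; region i is [breaks[i], breaks[i + 1]).
+partition_estimate <- function(x_obs, y_obs, breaks, x_new) {
+  region_obs <- findInterval(x_obs, breaks)   # region index of each observation
+  region_new <- findInterval(x_new, breaks)   # region A(x) of each query point
+  vapply(region_new, function(r) {
+    in_region <- region_obs == r
+    if (!any(in_region)) return(NA_real_)     # N(x) = 0: no local average exists
+    mean(y_obs[in_region])                    # (1 / N(x)) * sum of Y_i in A(x)
+  }, numeric(1))
+}
+
+x_obs <- 1:6
+y_obs <- c(2, 4, 6, 10, 20, 30)
+partition_estimate(x_obs, y_obs, breaks = c(-Inf, 3.5, Inf), x_new = c(0, 5))  # 4 20
+```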
+
+The distinctive feature is not the averaging formula itself. Partition estimators are classical, and their consistency is well established (Stone, 1977; Györfi et al., 2002). The distinctive features are the **geometry of the partition**, which is generated recursively from local means and therefore follows the directional structure of the data, and the **multivariate architecture** built on per-regressor partitioning against the response, which is what allows the method to handle many predictors without exponential deterioration.
+
+---
+
+## From Conditional Means to Regression Points
+
+The recursive partition yields a collection of local mean points, called **regression points**.
+
+In the univariate case, these are pairs
+
+$$(\bar X_R, \bar Y_R)$$
+
+for each region $R$, where
+
+$$\bar X_R = \frac{1}{|I_R|}\sum_{i \in I_R} X_i, \qquad \bar Y_R = \frac{1}{|I_R|}\sum_{i \in I_R} Y_i.$$
+
+In higher dimensions, $\bar X_R$ becomes a local mean vector in predictor space.
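+
+To make the construction concrete, here is a simplified base R sketch. The helper `mean_split_points()` is hypothetical and is not the NNS implementation: it splits every cell at the mean of $X$ only (similar in spirit to `type = "XONLY"`), ignores occupancy thresholds, and reports one regression point $(\bar X_R, \bar Y_R)$ per terminal cell.
+
+```r
+# Hypothetical helper (not the NNS implementation): split every cell at the
+# mean of its X values, `depth` times, then summarise each terminal cell.
+mean_split_points <- function(x, y, depth) {
+  cells <- list(seq_along(x))                      # root cell holds every index
+  for (level in seq_len(depth)) {
+    next_cells <- list()
+    for (idx in cells) {
+      m <- mean(x[idx])                            # local benchmark: cell mean of X
+      halves <- list(idx[x[idx] <= m], idx[x[idx] > m])
+      halves <- Filter(function(h) length(h) > 0, halves)  # singletons and ties stay whole
+      next_cells <- c(next_cells, halves)
+    }
+    cells <- next_cells
+  }
+  # One regression point (mean X, mean Y) per terminal cell
+  rp <- t(vapply(cells, function(idx) c(x = mean(x[idx]), y = mean(y[idx])),
+                 c(x = 0, y = 0)))
+  rp[order(rp[, "x"]), , drop = FALSE]             # sorted along X
+}
+
+x <- seq(-5, 5, 0.05)
+y <- x^3
+rp <- mean_split_points(x, y, depth = 2)
+rp                                                  # four regression points
+```
+
+Each additional level at most doubles the number of cells, so the regression points trace the cubic more closely as `depth` grows.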
+
+These regression points play a different role in the univariate and multivariate cases, and it is important to keep that distinction clear.
+
+---
+
+## The Univariate Case: Piecewise Estimation
+
+In the univariate case, the regression points can be interpreted in two complementary ways.
+
+### Piecewise-constant surface
+
+Within each terminal cell, the estimate is constant:
+
+$$\hat f(x) = \bar Y_R \qquad \text{for } x \in R.$$
+
+This yields a stepwise approximation to the conditional mean surface.
+
+### Piecewise-linear surface
+
+If neighboring regression points are connected by line segments, the result is a continuous piecewise-linear surface. This gives the NNS univariate regression estimator two useful faces:
+
+- a **local averaging estimator** for theory,
+- a **piecewise-linear interpolating surface** for visualization and prediction.
+
+Both arise from the same partition geometry. The piecewise-linear representation provides a transparent interpolation and extrapolation rule: between any two adjacent regression points, the surface varies linearly, while the regression points themselves remain anchored to empirical local conditional means.
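+
+Both faces can be sketched from the same regression points. The values below are made up for illustration and the cell edges are assumed. Extending the first and last segments beyond the data is likewise an assumed extrapolation rule, not a documented description of `NNS.reg`. Base R's `approx()` handles interpolation between points.
+
+```r
+# Made-up regression points (sorted by rx) and the assumed cell edges behind them
+rx <- c(1, 3, 5)
+ry <- c(2, 6, 4)
+edges <- c(-Inf, 2, 4, Inf)
+
+# Piecewise-constant face: the local mean of whichever cell contains x
+pw_constant <- function(xnew) ry[findInterval(xnew, edges)]
+
+# Piecewise-linear face: connect adjacent points; extend the end segments outward
+pw_linear <- function(xnew) {
+  n <- length(rx)
+  out <- approx(rx, ry, xout = xnew, rule = 2)$y
+  lo <- xnew < rx[1]
+  hi <- xnew > rx[n]
+  out[lo] <- ry[1] + (ry[2] - ry[1]) / (rx[2] - rx[1]) * (xnew[lo] - rx[1])
+  out[hi] <- ry[n] + (ry[n] - ry[n - 1]) / (rx[n] - rx[n - 1]) * (xnew[hi] - rx[n])
+  out
+}
+
+pw_constant(c(0, 2.5, 6))   # 2 6 4
+pw_linear(c(0, 2, 6))       # 0 4 3
+```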
+
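+```r
+library(NNS)
+
+# A noiseless cubic, fitted at partition orders 1 to 3
+x <- seq(-5, 5, 0.05)
+y <- x^3
+
+for (i in 1:3) {
+  # Partition on X only; obs.req = 0 removes the minimum-occupancy stop
+  NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE, type = "XONLY")
+  # Regression at the same order, run on a single core
+  NNS.reg(x, y, order = i, ncores = 1)
+}
+```
+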
+---
+
+## Piecewise Estimation from Partition Clusters
+
+Chapter 20 showed that recursive mean-split partitioning can be interpreted as clustering. The same result has a direct regression interpretation.
+
+Each terminal partition cell can be viewed as a **local cluster of observations** sharing similar benchmark-relative structure. Once that structure is identified, regression proceeds by fitting the conditional mean locally inside each cluster.
+
+This interpretation clarifies why the method is effective on nonlinear data.
+
+A single global line may fit badly because observations that belong to fundamentally different local regimes are forced into one equation. Partition-based regression instead allows the data to decompose into structurally coherent regions before averaging.
+
+In this sense, NNS univariate regression is **piecewise conditional expectation estimation from partition clusters**, with linear interpolation between the resulting regression points.
+
+The procedure is:
+
+- recursively partition the sample into locally coherent regions,
+- compute the local regression point within each region,
+- connect adjacent regression points with line segments to form a continuous interpolating surface.
+
+Thus clustering and regression are not separate operations. In the NNS framework, they are two views of the same recursive structure.
+
+---
+
+## Interpretation of the Estimated Model
+
+One advantage of the NNS regression framework is that the fitted model remains interpretable.
+
+Black-box machine-learning methods can predict well while making it difficult to understand what the model has learned. Recursive mean-split regression retains a geometric interpretation at every stage.
+
+### Local benchmark interpretation
+
+Each split occurs at a local mean. The partition tree records how the data separate relative to conditional benchmarks.
+
+### Regional interpretation
+
+Each terminal cell corresponds to a region where the conditional expectation is approximately stable.
+
+### Surface interpretation
+
+The fitted surface can be read as an assembly of local conditional expectations stitched together over the predictor space.
+
+### Complexity interpretation
+
+Model complexity is controlled by partition order and occupancy thresholds rather than by a fixed polynomial degree or a hidden parameterization.
+
+This makes the estimator interpretable in a way that is both statistical and geometric:
+
+- where the surface bends, the partition refines,
+- where the surface is flat, the partition stays coarse,
+- where the data are sparse, the smoothing remains broader,
+- where the data are dense, the surface can localize more aggressively.
+
+---
+
+## Bias, Variance, and Adaptive Smoothing
+
+Chapters 18 and 19 already established the asymptotic logic of the estimator. In regression language, the same result can be restated simply.
+
+At any point $x$, the estimator averages the responses within a cell $A_n(x)$. Two conditions determine its quality:
+
+- the cell must be **small enough** that $f$ is nearly constant inside it,
+- the cell must contain **enough observations** that the local average is stable.
+
+These are the usual bias–variance requirements.
+
+If the cell is too wide, the estimate is biased because it averages across substantively different predictor values.
+If cells shrink too quickly relative to the sample size, the estimate has high variance because the local sample is too thin.
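+
+A back-of-the-envelope calculation makes the trade-off visible. The assumptions are idealized and chosen only for illustration: uniform $X$ on $[0,1]$, $f(x) = x$, noise standard deviation $\sigma = 0.5$, $n = 1024$, and depth-$k$ splitting into $2^k$ equal cells of $n/2^k$ points each. Under them, the worst-case squared bias at a cell edge is $(h/2)^2$ with $h = 2^{-k}$, and the variance of a cell mean is $\sigma^2 2^k / n$.
+
+```r
+n <- 1024
+sigma <- 0.5
+k <- 0:8                             # partition depth
+h <- 1 / 2^k                         # cell diameter h_n(x)
+bias_sq <- (h / 2)^2                 # worst-case squared bias at a cell edge (slope 1)
+variance <- sigma^2 * 2^k / n        # variance of a mean of n / 2^k responses
+total <- bias_sq + variance
+k[which.min(total)]                  # 4: neither the coarsest nor the finest partition
+```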
+
+The NNS partition addresses this through recursive refinement. Its effective bandwidth is the cell diameter
+
+$$h_n(x) = \operatorname{diam}(A_n(x)).$$
+
+This bandwidth is not chosen externally. It is induced by the recursive geometry of the data.
+
+Thus nonparametric regression in the NNS framework may be interpreted as **adaptive local averaging with endogenous bandwidth** — and the convergence of this bandwidth to zero as $n$ grows, combined with growing occupancy, is exactly the condition that delivers consistency by class membership.
+
+---
+
+## Comparison with Classical Regression Models
+
+The differences between NNS regression and classical models can now be stated directly.
+
+### Linear regression
+
+Ordinary least squares assumes a single global hyperplane $Y = \beta_0 + \beta^\top X + \varepsilon$. This is efficient when the relationship is approximately linear, but restrictive when it is not. NNS regression imposes no global linearity and allows the shape to vary across regions.
+
+### Polynomial regression
+
+Polynomial models allow curvature, but only of a prespecified algebraic form. High-order polynomials can oscillate and extrapolate poorly. NNS regression does not require choosing a polynomial degree; curvature emerges from recursive local refinement.
+
+### Generalized additive models
+
+Additive models allow nonlinear marginal effects but often assume additive separability across predictors. NNS regression does not require additive separability; interactions can appear naturally through the partition geometry.
+
+### CART and regression trees
+
+Tree methods also partition the predictor space and are the closest classical analogue. Both NNS and CART belong to the data-adaptive partition estimator class and share the same class-level consistency guarantees. But CART chooses splits greedily to optimize impurity or squared error reduction and then relies on pruning penalties. NNS regression anchors partitions to local mean structure itself; its geometry follows recursive benchmark-relative splitting rather than a greedy impurity search.
+
+### Kernel regression
+
+Kernel estimators average nearby observations using weights determined by a bandwidth $h$. They are classical and flexible but require explicit bandwidth selection, and their local neighborhoods are imposed by a weighting rule rather than generated by recursive structure. NNS regression avoids explicit bandwidth tuning and obtains localization through partition diameter instead. Both approaches are consistent local averagers; they differ in how the neighborhood is defined and whether the smoothing scale is chosen by the analyst or induced by the data.
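+
+For contrast, here is a minimal Nadaraya-Watson kernel regression sketch in base R. It uses a Gaussian kernel, and the helper name is ours. The sketch shows where the analyst-chosen bandwidth $h$ enters: it scales every distance before weighting, so the smoothing scale is an input rather than a by-product of the data.
+
+```r
+# Nadaraya-Watson estimate at each query point x0:
+# sum(K((X_i - x0) / h) * Y_i) / sum(K((X_i - x0) / h)) with a Gaussian K
+nw_estimate <- function(x_obs, y_obs, x_new, h) {
+  vapply(x_new, function(x0) {
+    w <- dnorm((x_obs - x0) / h)     # weights decay with distance, scaled by h
+    sum(w * y_obs) / sum(w)          # weighted local average of the responses
+  }, numeric(1))
+}
+
+set.seed(42)
+x_obs <- runif(200, -2, 2)
+y_obs <- sin(2 * x_obs) + rnorm(200, sd = 0.2)
+nw_estimate(x_obs, y_obs, x_new = c(-1, 0, 1), h = 0.2)  # follows the curve
+nw_estimate(x_obs, y_obs, x_new = c(-1, 0, 1), h = 5)    # much flatter, pulled toward mean(y_obs)
+```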
+
+### k-nearest-neighbor regression
+
+Standard kNN regression predicts by averaging the $k$ observed responses whose predictor values lie closest to the query point. It searches over the raw observation cloud of size $n$. The multivariate NNS regression is superficially similar but mechanistically different: it searches over a compressed set of **regression points** (local conditional means derived from per-regressor partitions) rather than over raw observations. The search space is smaller, and each candidate neighbor has already been denoised through local averaging. This distinction is the foundation of the NNS multivariate architecture and is developed in full in the multivariate regression section below.
+
+---
+
+## Exact Fit, Interpolation, and Extrapolation
+
+An important feature of recursive mean-split regression is its finite limit behavior.
+
+At a sufficiently high partition order $O^*$, and provided the observed predictor values are distinct, each observation occupies its own terminal region. At that point,
+
+$$\hat f(X_i) = Y_i \qquad \text{for all } i.$$
+
+This identifies the finite interpolation limit of the estimator, but it is also a warning sign: in practice, the preferred partition order is chosen before this limit, typically by cross-validation or dependence-driven order selection, in order to balance local fidelity against overfitting.
+
+The estimator spans a full spectrum:
+
+- coarse global approximation at low order,
+- increasingly local nonlinear fit at intermediate order,
+- exact in-sample interpolation at finite maximal order.
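+
+The spectrum can be checked on a toy sample with distinct $X$ values. The sketch below uses base R's `ave()` on made-up data. A single cell returns the global mean everywhere, a single mean split returns two local means, and giving every observation its own cell reproduces $Y_i$ exactly.
+
+```r
+x <- c(1, 2, 4, 7, 11)
+y <- c(3, 1, 4, 1, 5)
+
+# Fitted value of each observation = mean of Y within its cell
+fit_cells <- function(y, cell) ave(y, cell)
+
+coarse <- fit_cells(y, rep(1, length(x)))   # one cell: 2.8 everywhere
+split1 <- fit_cells(y, x > mean(x))         # split at mean(x) = 5: 8/3 and 3
+exact  <- fit_cells(y, seq_along(x))        # one observation per cell
+all.equal(exact, y)                         # TRUE: in-sample interpolation
+```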
+
+In the univariate case, prediction may be based either on the piecewise-constant local mean within a terminal cell, or on the piecewise-linear interpolation across adjacent regression points. In the multivariate case, neither of these descriptions applies — prediction proceeds instead by nearest-neighbor lookup over the regression-point matrix, as described in the following section.
+
+---
+
+## Multivariate Regression: Per-Regressor Partitioning and the Curse of Dimensionality
+
+The multivariate case requires a separate treatment, and its architecture is the primary contribution of NNS regression for high-dimensional settings.
+
+The fundamental challenge in multivariate nonparametric regression is the **curse of dimensionality**: as the number of predictors $d$ grows, the volume of predictor space grows exponentially, and observations spread thinly across it. Local averaging methods that partition the joint predictor space suffer because, with $K$ cells per dimension, the number of joint cells grows as $K^d$, so the average occupancy $n/K^d$ collapses. For standard kNN regression, the relevant neighbors become increasingly distant as $d$ grows.
+
+The NNS multivariate architecture addresses this challenge through a structural decision at the partitioning stage, not through dimensionality reduction after the fact.
+
+### Per-regressor partitioning against the response
+
+Each predictor $X^{(j)}$ is partitioned **independently against the response $Y$** using the univariate recursive mean-split procedure. This produces a set of $K_j$ regression points for predictor $j$ — local conditional means of the pairs $(X^{(j)}, Y)$ within the partition cells of that regressor.
+
+This is the key architectural decision. Rather than partitioning the joint $d$-dimensional predictor space — which would produce up to $K^d$ cells — NNS partitions each predictor dimension separately against the response. The number of regression points generated is $\sum_{j=1}^d K_j$, which grows **linearly** in $d$ rather than exponentially.
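+
+The arithmetic behind this claim is easy to tabulate. The sketch assumes, purely for illustration, $K_j = 8$ cells for every regressor. Under that assumption the per-regressor count $\sum_j K_j$ grows linearly, while the joint count $\prod_j K_j$ grows exponentially.
+
+```r
+K <- 8                                   # cells per regressor (illustrative)
+d <- 1:6                                 # number of predictors
+growth <- data.frame(
+  d = d,
+  per_regressor = K * d,                 # sum_j K_j: linear in d
+  joint = K^d                            # prod_j K_j: exponential in d
+)
+growth                                   # at d = 6: 48 versus 262144
+```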
+
+The benefit is twofold. First, the search space for prediction is compressed: the candidate set has size of order $\sum_j K_j$, not $\prod_j K_j$. Second, each candidate in the search space is not a raw observation but a **local conditional mean** — an average over a cluster of observations that share similar benchmark-relative structure with respect to the response. Noise has already been reduced before the distance calculation is performed.
+
+### Why this mitigates the curse
+
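+The curse of dimensionality in standard kNN is a consequence of searching over $n$ raw observations in $\mathbb{R}^d$. For roughly uniform data, the typical nearest-neighbor distance shrinks only like $n^{-1/d}$. A neighborhood that captures a fixed fraction $p$ of the observations needs an edge length of about $p^{1/d}$ per predictor, which approaches the full range of each predictor as $d$ grows. Nearest neighbors therefore become increasingly distant, and the local average becomes increasingly biased.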
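+
+A short calculation shows the effect, assuming uniformly spread data in the unit hypercube. The edge length $p^{1/d}$ needed to capture 1% of the observations is tiny in one dimension. In ten or more dimensions, it covers most of each predictor's range.
+
+```r
+edge_length <- function(p, d) p^(1 / d)   # sub-cube edge capturing a fraction p
+d <- c(1, 2, 10, 100)
+round(edge_length(0.01, d), 3)            # 0.010 0.100 0.631 0.955
+```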
+
+In the NNS multivariate framework:
+
+1. **The search space is compressed.** The regression point matrix (RPM) has $M \ll n$ rows, one per occupied joint cell. Each row is a local conditional mean, not a raw observation.
+
+2. **Each candidate neighbor is denoised.** Because each regression point averages over a cluster of observations, the effective noise level of each candidate is reduced relative to a raw observation. The nearest-neighbor distance calculation is performed over a geometry that is already smoother than the raw data.
+
+3. **The partition depth per regressor is signal-adaptive.** When `order = NULL`, each regressor receives a partition depth proportional to its directional dependence with the response (measured by `NNS.dep`). Regressors with weak predictive content receive shallow partitions and contribute few regression points. Regressors with strong predictive content receive deeper partitions. The search space is therefore automatically concentrated on the dimensions most relevant to prediction.
+
+Together, these three properties mean that the effective dimensionality of the prediction problem is reduced not by collapsing predictors into a lower-dimensional index, but by compressing and denoising the candidate set before the nearest-neighbor search.
+
+### Regression point matrix
+
+The per-regressor regression points are assembled into a **regression point matrix (RPM)**. Each row of the RPM corresponds to one occupied joint cell in the multivariate partition structure; the columns record the local mean of each predictor within that cell, and a final column records the corresponding local mean response.
+
+For a new observation $x^* \in \mathbb{R}^d$, prediction proceeds by identifying the rows of the RPM whose predictor means lie closest to $x^*$ and returning the weighted average of the corresponding local response means.
+
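+```r
+library(NNS)
+set.seed(123)
+X <- matrix(rnorm(300 * 3), ncol = 3)          # n x d predictor matrix (illustrative)
+y <- X[, 1]^2 + sin(X[, 2]) + rnorm(300, sd = 0.1)
+X_new <- matrix(rnorm(5 * 3), ncol = 3)        # new observations to predict
+fit <- NNS.reg(X, y, point.est = X_new)        # order = NULL by default
+head(fit$RPM)                                  # the regression point matrix
+fit$Point.est                                  # predicted values for X_new
+```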
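+
+The prediction step can also be sketched independently of the package. The helper below is hypothetical. It takes the predictor columns and the response column of a regression point matrix as separate inputs, and it treats $k$ as given, whereas NNS chooses it adaptively. The inverse-distance weights are likewise our assumption about the weighting. Passing raw observations instead of regression points turns the same function into standard kNN regression, which is the contrast this section draws.
+
+```r
+# Hypothetical helper: kNN prediction over a regression point matrix.
+# rpm_x: one row of local predictor means per regression point; rpm_y: local response means.
+rpm_predict <- function(rpm_x, rpm_y, x_star, k) {
+  dists <- sqrt(colSums((t(rpm_x) - x_star)^2))  # Euclidean distance to every row
+  nearest <- order(dists)[seq_len(k)]            # indices of the k closest rows
+  w <- 1 / (dists[nearest] + 1e-12)              # inverse-distance weights (assumed)
+  sum(w * rpm_y[nearest]) / sum(w)               # weighted average of local response means
+}
+
+rpm_x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(5, 5))
+rpm_y <- c(1, 2, 3, 100)
+rpm_predict(rpm_x, rpm_y, x_star = c(0, 0), k = 1)      # 1
+rpm_predict(rpm_x, rpm_y, x_star = c(0.5, 0.5), k = 3)  # 2: three equidistant points
+```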
+
+### Dependence-sensitive neighbor count
+
+The number of neighbors used in the final averaging step is itself dependence-sensitive. When estimated dependence between predictors and the response is high, the local regression surface is more coherent and fewer neighbors suffice for a stable prediction. When dependence is lower, broader averaging over more neighbors improves stability.
+
+Localization is therefore adjusted not only by partition geometry, but also by the estimated strength of the multivariate signal. The multivariate NNS regression is thus a **response-anchored regression-point nearest-neighbor estimator**: partitioning creates a denoised, compressed geometry of local conditional means, and nearest-neighbor search over that geometry — with dependence-adaptive neighbor count — supplies the final prediction.
+
+### Structural comparison with standard kNN
+
+The difference from standard kNN can be stated precisely:
+
+| Property | Standard kNN | Multivariate NNS |
+|---|---|---|
+| Search space | $n$ raw observations | $M \ll n$ regression points |
+| Candidate quality | Individual observations, full noise | Local conditional means, partially denoised |
+| Search space growth in $d$ | Fixed at $n$; neighbor distance grows | $\sum_j K_j$, linear in $d$ |
+| Neighbor count | Fixed $k$ | Dependence-adaptive |
+| Partition basis | None | Per-regressor against response |
+
+The NNS approach is therefore not simply kNN with a different distance metric. It is kNN over a fundamentally different, response-anchored candidate set.
+
+---
+
+## Adaptive Order Selection: Dependence-Driven Partition Depth
+
+When the user leaves `order = NULL` (the default), `NNS.reg` does not apply one global partition depth uniformly across all predictors. Instead, it computes a directional dependence score between each regressor and the response using `NNS.dep`-style dependence, then allocates recursion depth per regressor accordingly.
+
+Regressors with stronger directional dependence receive deeper recursive partitioning, enabling finer local approximation where signal is most evident. Regressors with weak dependence receive shallower partitioning, yielding broader smoothing and reducing the chance of overfitting noise-dominant inputs.
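+
+The allocation idea can be sketched with a hypothetical rule. Under this rule, depth is proportional to a dependence score, with a floor of 1 and a cap at `max_order`. The scaling is our assumption. Pearson $|r|$ is swapped in as a stand-in dependence score because it is available in base R, whereas `NNS.reg` uses directional dependence.
+
+```r
+# Hypothetical rule: depth proportional to dependence, at least 1, at most max_order
+allocate_depth <- function(dep, max_order) {
+  pmin(max_order, pmax(1, ceiling(max_order * abs(dep))))
+}
+
+allocate_depth(c(x1 = 0.9, x2 = 0.1), max_order = 8)   # x1: 8, x2: 1
+
+# Stand-in scores on simulated data: x1 drives y, x2 is pure noise
+set.seed(1)
+X <- cbind(x1 = runif(300), x2 = runif(300))
+y <- X[, "x1"]^2 + rnorm(300, sd = 0.05)
+allocate_depth(cor(X, y)[, 1], max_order = 8)
+```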
+
+This is the concrete implementation of the dynamic-bandwidth interpretation from Chapter 19: partition cell diameter is not only data-adaptive but **dependence-adaptive**. Smoothing granularity is endogenously assigned by signal strength rather than fixed through a single hand-tuned global parameter.
+
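+```r
+library(NNS)
+set.seed(1)
+x <- cbind(x1 = runif(300), x2 = runif(300))   # x1 carries the signal, x2 is noise
+y <- sin(6 * x[, "x1"]) + rnorm(300, sd = 0.1)
+fit <- NNS.reg(x, y)                            # order = NULL by default
+fit$rhs.partitions                              # partition points for each regressor
+```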
+
+The realized depth profile is relative: predictors with higher directional dependence typically receive finer partitioning than predictors with weaker dependence in the same fit. The exact realized depths depend on the sample, occupancy constraints, and other fitting controls.
+
+The main practical consequence is that per-variable manual tuning is often unnecessary in exploratory workflows, while full control remains available when needed (`order = 5`, `order = "max"`, and related settings).
+
+---
+
+## Dimension Reduction via Synthetic Predictors
+
+The package also provides a qualitatively different way to address multivariate regression when dimensionality becomes burdensome. Rather than preserving the full joint predictor geometry, the predictors may be collapsed into a single **synthetic index** and standard univariate NNS regression applied to that index.
+
+Let the predictors be $X^{(1)},\dots,X^{(d)}$. After rescaling each predictor to the unit interval, write the normalized predictors as $\tilde X^{(1)},\dots,\tilde X^{(d)} \in [0,1]$. The synthetic predictor is
+
+$$X^* = \frac{\sum_{j=1}^{d} w_j \tilde X^{(j)}}{\sum_{j=1}^{d} \mathbf{1}[w_j\neq 0]},$$
+
+where $w_j$ is the weight assigned to predictor $j$.
+
+This replaces a $d$-dimensional predictor vector with a single composite variable, allowing the full univariate recursive mean-split machinery — including piecewise-linear interpolation — to be reused directly.
+
+### Weighting options
+
+The weights $w_j$ may be determined in several ways:
+
+- **Equal weighting** (`dim.red.method = "equal"`) assigns all included predictors the same weight.
+- **Correlation weighting** (`dim.red.method = "cor"`) uses signed correlation coefficients.
+- **Directional dependence weighting** (`dim.red.method = "NNS.dep"`) uses `NNS.dep`, connecting the reduction step to the nonlinear dependence framework developed in Chapter 10.
+- **Directional causation weighting** (`dim.red.method = "NNS.caus"`) uses `NNS.caus`, allowing predictors with stronger causal evidence to receive greater weight.
+- **Ensemble weighting** (`dim.red.method = "all"`) combines multiple weighting schemes into a single composite score.
+
+Dimension reduction is therefore not a purely geometric projection. It is a **structure-aware aggregation of predictors**, where the weights themselves may be derived from directional measures developed earlier in the book.
+
+### Variable selection through thresholding
+
+The `threshold` parameter excludes predictors whose weights fall below a chosen value $\tau$:
+
+$$w_j < \tau \implies \text{predictor } j \text{ excluded from } X^*.$$
+
+This turns the reduction step into a form of **variable selection** as well as aggregation.
+
+### Regression after reduction
+
+Once $X^*$ is formed, the regression problem becomes univariate:
+
+$$f^*(x^*) = E[Y \mid X^* = x^*].$$
+
+Standard univariate NNS regression is then applied to $(X^*, Y)$, producing the familiar recursive partition, regression points, piecewise-linear interpolation path, and local conditional means.
+
+The multivariate reduction pipeline is therefore:
+
+1. rescale each predictor,
+2. compute directional or correlation-based weights,
+3. threshold weak predictors if desired,
+4. form the synthetic predictor $X^*$,
+5. run univariate NNS regression on $X^*$ against $Y$.
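+
+Steps 1, 3, and 4 of this pipeline fit in a few lines. The helper `synthetic_predictor()` below is a hypothetical sketch of the $X^*$ formula, not the package's code. The caller supplies the weights (step 2). The sketch compares $|w_j|$ rather than $w_j$ with $\tau$. That comparison is our assumption, made so that strongly negative correlation weights are not discarded.
+
+```r
+# Hypothetical helper following the X* formula: rescale, threshold, weighted average
+synthetic_predictor <- function(X, w, threshold = 0) {
+  # Step 1: rescale every predictor to [0, 1]
+  X_tilde <- apply(X, 2, function(col) (col - min(col)) / (max(col) - min(col)))
+  # Step 3: zero out weak predictors (|w_j| compared with the threshold: an assumption)
+  w[abs(w) < threshold] <- 0
+  # Step 4: weighted sum divided by the number of predictors kept
+  drop(X_tilde %*% w) / sum(w != 0)
+}
+
+X <- cbind(x1 = c(0, 5, 10), x2 = c(10, 5, 0))
+synthetic_predictor(X, w = c(1, 0.5))                    # 0.250 0.375 0.500
+synthetic_predictor(X, w = c(1, 0.5), threshold = 0.6)   # 0.0 0.5 1.0
+```
+
+In the package itself, the whole pipeline is engaged by passing a weighting method, for example `NNS.reg(X, y, dim.red.method = "cor", threshold = 0.3)`, where the threshold value is only illustrative.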
+
+### Conceptual comparison of the two multivariate paths
+
+The dimension-reduction path and the regression-point nearest-neighbor path address dimensionality through different strategies.
+
+- The **regression-point nearest-neighbor path** preserves the joint predictor structure. It mitigates dimensionality by partitioning each regressor against the response independently, then searching over a compressed set of local conditional means — a search space that grows linearly rather than exponentially in $d$ — with dependence-adaptive neighbor count. Prediction is a smooth weighted average over regression points. Piecewise-linear interpolation is not available on this path because there is no natural ordering of regression points in $\mathbb{R}^d$.
+
+- The **dimension-reduction path** collapses the predictor space entirely, trading interaction structure for parsimony. Once the synthetic index is formed, univariate NNS regression applies — and its piecewise-linear interpolation is again available.
+
+For some problems, especially when predictors are numerous and noisy, the synthetic-index approach can be an advantage rather than a liability. Simple weighted composites often generalize well out of sample, and the univariate path offers stability, interpretability, and straightforward visualization that the full multivariate path cannot match.
+
+The two strategies are worth keeping distinct — in particular because the prediction mechanism differs between them.
+
+---
+
+## Practical Perspective on NNS Regression
+
+From a practical standpoint, the NNS regression framework can be summarized with five ideas.
+
+### It is nonparametric
+
+No functional form is assumed for the regression surface. Consistency follows from membership in the class of data-adaptive partition estimators, for which Stone (1977) supplied the general consistency conditions.
+
+### It is nonlinear
+
+Curvature, thresholds, and interactions emerge naturally through local partition geometry.
+
+### It is adaptive
+
+The effective smoothing scale varies by region rather than being imposed globally. In the multivariate case, it also varies by predictor, with finer partitioning allocated to regressors with stronger directional dependence on the response.
+
+### It is interpretable
+
+The fitted model can be understood through local means, partition regions, and regression points.
+
+### Its prediction mechanism depends on dimensionality
+
+In the univariate case, prediction uses piecewise-linear interpolation across ordered regression points. In the multivariate case, prediction uses nearest-neighbor search over the regression-point matrix — a fundamentally different mechanism operating over a compressed, denoised candidate set that grows linearly in the number of predictors, substantially mitigating the curse of dimensionality that affects standard kNN and joint-partition methods alike.
+
+---
+
+## Relationship to the Broader NNS Program
+
+This chapter completes an important conceptual arc.
+
+Earlier chapters showed that directional deviation operators generate distribution functions, moment decompositions, dependence measures, conditional probabilities, and stochastic dominance diagnostics.
+
+Chapters 18 and 19 then showed that the same directional logic induces a consistent adaptive estimator through recursive mean splitting — consistent by class membership, with the dynamic bandwidth interpretation making the connection to classical kernel theory explicit.
+
+This chapter reframes that estimator in its most familiar applied form: **nonparametric regression**.
+
+The deeper point is that regression itself can be understood as another consequence of the same benchmark-relative directional primitive that generated the earlier parts of the book. Distribution theory, dependence, clustering, and regression are not separate constructions here. They are structurally linked through recursive directional decomposition.
+
+The multivariate architecture — per-regressor partitioning against the response, regression-point matrix construction, and dependence-adaptive nearest-neighbor prediction — is the practical expression of that linkage in high-dimensional settings. It is where the theoretical elegance of the partial-moment framework translates into a concrete answer to one of the hardest problems in nonparametric estimation.
+
+---
+
+## Summary
+
+This chapter developed nonparametric regression in the NNS framework as **conditional expectation estimation by recursive partitioning**.
+
+Its main contributions are sevenfold.
+
+**First**, it defined regression at its most fundamental level as estimation of the conditional mean function $f(x) = E[Y \mid X = x]$, and located the NNS approach within the class of data-adaptive partition estimators whose consistency has been established by Stone (1977), Lugosi and Nobel (1996), and Györfi et al. (2002).
+
+**Second**, it showed how recursive mean-split partitions produce **nonlinear regression surfaces** by forming local conditional averages over data-adaptive regions, with partition geometry following the benchmark-relative directional structure of the data.
+
+**Third**, it interpreted those regions as **partition clusters**, so that regression becomes piecewise estimation from locally coherent structural groupings.
+
+**Fourth**, it distinguished clearly between the univariate and multivariate prediction mechanisms. In the univariate case, regression points are connected by line segments to produce a **piecewise-linear interpolating surface**. In the multivariate case, this description does not apply: prediction is performed by **nearest-neighbor search over the regression-point matrix** — a compressed, denoised set of local conditional means — yielding a smooth weighted average rather than a linear interpolation.
+
+**Fifth**, it explained why the multivariate architecture mitigates the curse of dimensionality. Per-regressor partitioning against the response produces a candidate search set that grows **linearly** in the number of predictors rather than exponentially, and each candidate is a denoised local conditional mean rather than a raw observation. This is not a post-hoc dimensionality reduction; it is a structural property of the partitioning design.
+
+**Sixth**, it introduced **dimension reduction via synthetic predictors** as an alternative multivariate path. Predictors are rescaled, weighted by directional relevance, thresholded for variable selection, and collapsed into a single composite index $X^*$, after which standard univariate NNS regression — including piecewise-linear interpolation — applies directly.
+
+**Seventh**, it compared the NNS approach with classical regression models — linear, polynomial, additive, tree-based, kernel-based, and kNN — highlighting the distinctive combination of nonparametric flexibility, endogenous bandwidth, response-anchored regression-point nearest-neighbor prediction, dependence-adaptive localization, and optional synthetic-index reduction.
+
+The next chapter turns from conditional mean estimation to **classification**, where the same directional partition structures are used not to predict a numeric response, but to assign observations to classes.
+
+> **Further Reading / Examples**
+> For hands-on regression applications, including multivariate and noisy data examples, see the [NNS Regression Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#2-regression).
+
+---
+
+## References
+
+- Stone, C. J. (1977). Consistent nonparametric regression. *Annals of Statistics*, 5(4), 595–620.
+
+- Lugosi, G., & Nobel, A. (1996). Consistency of data-driven histogram methods for density estimation and classification. *Annals of Statistics*, 24(2), 687–706.
+
+- Györfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). *A Distribution-Free Theory of Nonparametric Regression*. Springer.
+
+- Vinod, H. D., & Viole, F. (2017). Nonparametric regression using clusters. *Computational Economics*, 52(4), 1181–1209. https://doi.org/10.1007/s10614-017-9713-5
+
+- Vinod, H. D., & Viole, F. (2018). Clustering and curve fitting by line segments. *Preprints*, 2018010090. https://doi.org/10.20944/preprints201801.0090.v1
+
+- Viole, F. (2020). Partitional estimation using partial moments. *SSRN eLibrary*. https://doi.org/10.2139/ssrn.3592491
+
+- Viole, F., & Nawrocki, D. (2013). *Nonlinear Nonparametric Statistics: Using Partial Moments*. CreateSpace.
diff --git a/tools/NNS/book/chapter-23-classification.Rmd b/tools/NNS/book/chapter-23-classification.Rmd
new file mode 100644
index 0000000..e9deb82
--- /dev/null
+++ b/tools/NNS/book/chapter-23-classification.Rmd
@@ -0,0 +1,701 @@
+# Classification
+
+Chapter 21 developed nonparametric regression in the NNS framework as conditional expectation estimation by recursive partitioning. There, the response variable was numeric, and the objective was to recover a conditional mean surface.
+
+This chapter turns to **classification**.
+
+In classification problems, the response is not a number to be predicted directly, but a **category to be assigned**. The task is to determine, from observed predictor variables, which class label is most appropriate for a new observation.
+
+Classical classification methods such as **logistic regression**, **linear discriminant analysis**, **support vector machines**, and **random forests** are widely used and often effective. But they inherit many of the structural limitations discussed throughout this book:
+
+- linearity assumptions,
+- symmetric treatment of deviations,
+- tuning dependence,
+- and difficulty adapting to nonlinear, asymmetric, or benchmark-relative class structure.
+
+The directional framework offers a different perspective.
+
+Rather than beginning with a global separating equation or a fixed geometric margin, NNS classification begins with **recursive benchmark-relative partitioning of the predictor space**. Classification then proceeds by assigning class labels according to the local structure of the resulting partitions.
+
+The central idea is simple:
+
+**classification is the categorical analogue of conditional expectation estimation.**
+
+Regression estimates
+
+\[
+E[Y \mid X=x].
+\]
+
+Classification estimates
+
+\[
+P(Y = c \mid X=x)
+\]
+
+for each class \(c\), and assigns the label corresponding to the largest conditional probability.
+
+This chapter develops that viewpoint.
+
+---
+
+## Classification as Conditional Probability Estimation
+
+Let \(X \in \mathbb{R}^d\) denote a predictor vector and let \(Y\) take values in a finite label set
+
+\[
+\mathcal{C} = \{1,2,\dots,K\}.
+\]
+
+A classifier is a rule
+
+\[
+g:\mathbb{R}^d \to \mathcal{C}
+\]
+
+that assigns a class label to each predictor point.
+
+The optimal classifier under zero-one loss is the **Bayes classifier**:
+
+\[
+g^*(x) = \arg\max_{c \in \mathcal{C}} P(Y=c \mid X=x).
+\]
+
+Thus the classification problem is fundamentally a conditional probability problem.
+
+This aligns naturally with the NNS framework developed earlier. Chapter 13 showed that conditional probabilities can be written in terms of partial moments and co-partial moments. Classification therefore fits directly into the same directional machinery:
+
+- partition the predictor space,
+- estimate local class probabilities,
+- assign the class with highest estimated local probability.
+
+So the statistical primitive does not change. Only the form of the response changes.
+
+---
+
+## Why Classical Classification Methods Can Fail
+
+Classical classification methods often perform well when class boundaries are smooth, approximately linear, and well separated. But many real datasets are not.
+
+### Linear decision boundaries
+
+Methods such as logistic regression and linear discriminant analysis impose global linear or quadratic boundary structure. When the true class geometry is nonlinear, these boundaries can misclassify substantial regions.
+
+### Symmetric distance assumptions
+
+Distance-based classifiers often treat deviations symmetrically around centers. But class structure may depend more on one side of a threshold than another.
+
+### Global parameterization
+
+Many classical methods summarize the full predictor space with a small number of global parameters. This can obscure local class structure.
+
+### Tuning dependence
+
+Support vector machines require kernel and penalty choices. Tree ensembles require tuning of depth, feature subsampling, and aggregation parameters. Performance can depend heavily on these choices.
+
+### Imbalance sensitivity
+
+In imbalanced settings, global classifiers may be dominated by the majority class unless explicitly reweighted.
+
+These limitations echo the broader theme of the book:
+
+**classical methods often begin with an aggregate geometric form, whereas directional methods begin with local benchmark-relative structure.**
+
+---
+
+## Directional Decision Regions
+
+The NNS approach to classification begins from the recursive partition machinery introduced in Chapters 18–20.
+
+Suppose the predictor space is partitioned into cells
+
+\[
+A_1, A_2, \dots, A_M.
+\]
+
+Within each terminal cell, the local class probabilities are estimated empirically:
+
+\[
+\hat p_c(x)
+=
+\frac{\#\{i : X_i \in A(x),\, Y_i = c\}}
+{\#\{i : X_i \in A(x)\}},
+\]
+
+where \(A(x)\) denotes the terminal cell containing \(x\).
+
+The classifier is then
+
+\[
+\hat g(x) = \arg\max_{c \in \mathcal{C}} \hat p_c(x).
+\]
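+
+The estimation step can be sketched in base R. This is a minimal illustration. It assumes the terminal-cell membership of each observation is already known; the vector `cell` below is hypothetical. It does not call the NNS package or reproduce its partitioning.
+
+```r
+# Estimate local class probabilities p_hat_c(A) within terminal cells and
+# assign the arg-max label. `cell` is a hypothetical, precomputed vector of
+# terminal-cell IDs; producing it is the partitioning step, not shown here.
+local_class_probs <- function(cell, y) {
+  # counts[A, c] = #{i : X_i in A, Y_i = c}
+  counts <- table(cell = cell, class = y)
+  # Divide each row by its cell size N_A
+  unclass(prop.table(counts, margin = 1))
+}
+
+assign_labels <- function(probs) {
+  # Arg max over classes within each cell; ties go to the first class
+  labs <- colnames(probs)[max.col(probs, ties.method = "first")]
+  setNames(labs, rownames(probs))
+}
+
+# Toy data: three terminal cells, three classes
+cell <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
+y    <- c(1, 1, 2, 2, 2, 1, 3, 3, 3)
+
+probs  <- local_class_probs(cell, y)
+labels <- assign_labels(probs)
+
+# A new point is classified by looking up its terminal cell A(x)
+labels[["3"]]
+```
+
+Because the rule only counts labels inside each cell, the same code covers the binary and multiclass cases.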
+
+This is the partition-based analogue of the Bayes rule.
+
+The crucial difference from classical methods is that the regions \(A(x)\) are not fixed by global parametric geometry. They are induced recursively by the data through benchmark-relative splits.
+
+In the binary case, the decision boundary is the set of points where the estimated conditional class probabilities are equal:
+
+\[
+\hat p_1(x) = \hat p_2(x).
+\]
+
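+Because \(\hat p_1(x) + \hat p_2(x) = 1\), this is equivalent to \(\hat p_1(x) = 1/2\). It is the empirical counterpart of the Bayes boundary
+
+\[
+P(Y=1 \mid X=x) - P(Y=2 \mid X=x) = 0.
+\]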
+
+Within NNS, this boundary is not imposed in advance. It emerges from the partition structure.
+
+---
+
+## Binary Classification
+
+Consider first the case
+
+\[
+Y \in \{1,2\}.
+\]
+
+For each partition cell \(A\), define
+
+\[
+\hat p(A) = \frac{1}{N_A}\sum_{i:X_i \in A} \mathbf{1}\{Y_i = 1\},
+\]
+
+where \(N_A\) is the number of observations in the cell.
+
+Because each summand is the class-1 indicator, \(\hat p(A)\) is simply the empirical fraction of class-1 observations in the cell. Thus
+
+\[
+\hat p(A) \approx P(Y=1 \mid X \in A).
+\]
+
+The decision rule becomes
+
+\[
+\hat g(x)=
+\begin{cases}
+1 & \hat p(A(x)) > 1/2,\\
+2 & \hat p(A(x)) \le 1/2.
+\end{cases}
+\]
+
+This makes the analogy to regression immediate. In regression, the local average estimates the conditional mean. In binary classification, the local average of the class-1 indicator estimates the conditional probability of class 1.
+
+So binary classification in the NNS framework is simply **partition-based probability estimation followed by thresholding**.
+
+> **Implementation note (important):** for `NNS.boost(..., type = "CLASS")` and related classification interfaces, class labels should start at `1` (not `0`). Recode `0/1` targets to `1/2` before fitting.
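+
+The binary rule, including the `0/1` to `1/2` recoding from the note above, can be sketched in base R. The labels and the cell vector are hypothetical, and the sketch only illustrates the formulas. It is not how `NNS.boost()` computes its output.
+
+```r
+# Binary rule: recode 0/1 targets to 1/2, estimate p_hat(A) as the class-1
+# fraction in each cell, then threshold at 1/2. `cell` is a hypothetical,
+# precomputed vector of terminal-cell IDs.
+y01  <- c(0, 0, 1, 1, 1, 0, 1, 1)
+cell <- c(1, 1, 1, 2, 2, 2, 3, 3)
+
+y12 <- y01 + 1                     # 0 -> class 1, 1 -> class 2
+
+# p_hat(A) = (1 / N_A) * sum over the cell of 1{Y_i = 1}
+p_hat <- tapply(y12 == 1, cell, mean)
+
+# g_hat = 1 if p_hat > 1/2, otherwise 2 (ties go to class 2, as in the rule)
+g_hat <- ifelse(p_hat > 0.5, 1, 2)
+```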
+
+---
+
+## Multiclass Classification
+
+Now suppose
+
+\[
+Y \in \{1,2,\dots,K\}
+\]
+
+with \(K>2\).
+
+For each class \(c\), define the local probability estimator
+
+\[
+\hat p_c(A) =
+\frac{\#\{i:X_i \in A,\ Y_i=c\}}{N_A}.
+\]
+
+Because every observation in \(A\) belongs to exactly one class,
+
+\[
+\sum_{c=1}^K \hat p_c(A)=1.
+\]
+
+The multiclass decision rule is
+
+\[
+\hat g(x)=\arg\max_{c} \hat p_c(A(x)).
+\]
+
+This yields a **piecewise-constant class probability surface** over the predictor space.
+
+The resulting decision regions need not be linear, convex, or globally smooth. They inherit the geometry of the recursive partition.
+
+This is one of the major strengths of the NNS classifier:
+
+- complex local structure can be captured,
+- multiple class regions can interleave nonlinearly,
+- and no prior assumption is imposed on the shape of the boundary.
+
+---
+
+## Recursive Mean-Split Classification Geometry
+
+The partition structure from the regression chapters remains central.
+
+At each stage of recursive partitioning, a region is split around local means. In joint partitioning, this creates benchmark-defined subregions corresponding to directional quadrants. In \(X\)-only partitioning, it creates recursive subdivisions of predictor space.
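+
+To make the geometry concrete, here is a minimal base R sketch of recursive mean splitting in two predictors. It assumes that every region is split at the means of both coordinates into four quadrants, to a fixed depth, and that regions with fewer than two points are left alone. The stopping rules of NNS are not reproduced, and the function name `mean_split_cells` is hypothetical.
+
+```r
+# Recursive mean-split partition of two predictors into quadrant cells.
+# Cell IDs are strings: each extra character records the quadrant chosen
+# at the next level, so deeper cells nest inside shallower ones.
+mean_split_cells <- function(x1, x2, depth) {
+  cell <- rep("", length(x1))
+  for (d in seq_len(depth)) {
+    new_cell <- cell
+    for (id in unique(cell)) {
+      idx <- which(cell == id)
+      if (length(idx) < 2) next
+      # Quadrant 1..4 relative to the region's local mean vector
+      q <- 1 + (x1[idx] > mean(x1[idx])) + 2 * (x2[idx] > mean(x2[idx]))
+      new_cell[idx] <- paste0(id, q)
+    }
+    cell <- new_cell
+  }
+  cell
+}
+
+set.seed(1)
+x1 <- runif(200)
+x2 <- runif(200)
+level1 <- mean_split_cells(x1, x2, depth = 1)
+level2 <- mean_split_cells(x1, x2, depth = 2)
+table(level1)
+```
+
+Each depth-2 cell ID starts with the ID of the depth-1 cell that contains it, which is the nesting that makes the regions usable as local decision neighborhoods.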
+
+For classification, these same regions become **local decision neighborhoods**.
+
+Each terminal cell stores:
+
+- its local predictor benchmark structure,
+- its class composition,
+- its estimated class probabilities,
+- and its dominant class label.
+
+The decision function is therefore interpretable geometrically.
+
+### Regional interpretation
+
+A class assignment is not produced by a hidden global optimization alone. It is produced because the new point falls into a particular benchmark-relative region whose observed class composition favors one label.
+
+### Boundary interpretation
+
+Decision boundaries are unions of partition edges separating regions with different dominant labels or different class-probability rankings.
+
+### Refinement interpretation
+
+Where class mixing remains high, further partitioning can sharpen the local probability estimate. Where classes are already well separated, coarser regions suffice.
+
+Thus the classifier adapts its complexity to the structure of the data.
+
+---
+
+## Directional Decision Boundaries
+
+The phrase **directional decision boundary** has a precise meaning in this framework.
+
+A classical linear classifier produces a boundary such as
+
+\[
+\beta_0 + \beta^\top x = 0,
+\]
+
+which divides the predictor space globally into two half-spaces.
+
+A directional classifier instead produces boundaries induced by recursive benchmark-relative partitions. These boundaries are directional in three senses.
+
+### They are benchmark-relative
+
+Each split is defined relative to a local benchmark, typically a mean vector.
+
+### They are locally adaptive
+
+Boundaries need not extend globally as a single hyperplane. They adapt region by region.
+
+### They preserve asymmetry
+
+If class separation is stronger on one side of a benchmark than another, the partition geometry reflects that asymmetry.
+
+This is important in applications where class identity depends on threshold behavior. For example:
+
+- default versus non-default beyond leverage thresholds,
+- disease state beyond biomarker cutoffs,
+- regime classification beyond volatility breaks,
+- operational alert states beyond service-level violations.
+
+In such settings, the meaningful structure is often directional before it is geometric.
+
+---
+
+## Probability Surfaces and Class Assignment
+
+Because classification in NNS is based on local probability estimation, it is useful to distinguish three related objects.
+
+### Local class probability surface
+
+For each class \(c\),
+
+\[
+x \mapsto \hat p_c(x)
+\]
+
+gives the estimated probability that \(x\) belongs to class \(c\).
+
+### Hard classification map
+
+\[
+x \mapsto \hat g(x)
+\]
+
+assigns the label with largest estimated local probability.
+
+### Classification certainty
+
+A useful summary in the binary case is
+
+\[
+\hat C(x) = |2\hat p_1(x)-1|.
+\]
+
+This lies in \([0,1]\).
+
+- \(\hat C(x)=0\) indicates maximal local ambiguity,
+- \(\hat C(x)=1\) indicates complete local separation.
+
+In the multiclass case, an analogous certainty measure is
+
+\[
+\hat C(x)= \hat p_{(1)}(x)-\hat p_{(2)}(x),
+\]
+
+where \(\hat p_{(1)}\) and \(\hat p_{(2)}\) are the largest and second-largest local class probabilities.
+
+This difference measures the local margin between the best and second-best labels.
+
+Thus the classifier naturally provides not only a label, but also a measure of how decisively that label is supported.
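+
+Both certainty measures fit in a few lines of base R. This is a sketch with a hypothetical probability matrix `P`, one row per point and one column per class.
+
+```r
+# Certainty: margin between the largest and second-largest local class
+# probabilities. In the binary case it reduces to |2 * p1 - 1|.
+certainty <- function(p) {
+  s <- sort(p, decreasing = TRUE)
+  s[1] - s[2]
+}
+
+# Rows are points x, columns are classes; each row sums to 1
+P <- rbind(c(0.5, 0.5, 0.0),
+           c(0.6, 0.1, 0.3),
+           c(1.0, 0.0, 0.0))
+apply(P, 1, certainty)   # ambiguous, moderate, fully separated
+```
+
+For two classes the margin \(\hat p_{(1)} - \hat p_{(2)}\) equals \(|2\hat p_1 - 1|\), so the multiclass definition contains the binary one as a special case.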
+
+---
+
+## Package Implementation Note
+
+In the NNS package, the classification logic described here is exposed most practically through the ensemble interfaces `NNS.boost()` and `NNS.stack()`, using `type = "CLASS"` to invoke class-label prediction rather than numeric conditional-mean prediction.
+
+The underlying partition logic remains the same: predictor space is decomposed into benchmark-relative regions, local class probabilities are estimated within those regions, and final assignment is made by dominant-label selection through
+
+\[
+\hat g(x)=\arg\max_c \hat p_c(x).
+\]
+
+The function `NNS.reg()` is useful for understanding the underlying partition-based estimation structure, and more generally the NNS framework supports classification through the same recursive partition machinery developed for regression. But in applied package use, classification is most naturally presented through the boosted and stacked interfaces, which stabilize the local decision rule and improve empirical performance.
+
+---
+
+## Boosting and Stacking in the NNS Framework
+
+The base partition classifier can be strengthened through ensemble methods.
+
+The NNS framework includes two especially important ensemble ideas:
+
+- **boosting**, and
+- **stacking**.
+
+These extend the partition-based classifier without abandoning the directional logic.
+
+### Boosting
+
+Boosting combines many classification models generated from resampled or reweighted data. Each model captures a slightly different local view of the class structure. Their outputs are then aggregated.
+
+Conceptually, if
+
+\[
+\hat g^{(1)}, \hat g^{(2)}, \dots, \hat g^{(B)}
+\]
+
+denote classifiers trained across bootstrap or resampled iterations, the boosted classifier aggregates them via voting or averaged class probabilities.
+
+This reduces instability in any single partition realization and improves classification robustness, especially when boundaries are irregular or the sample is noisy.
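+
+The stabilizing effect of averaging can be checked with a small simulation. The setup is an assumption made for illustration: each of \(B\) learners reports the target probability plus an error with variance \(\sigma^2\), every pair of errors has correlation \(\rho\), and the errors are Gaussian purely for convenience (real probability estimates are bounded in \([0,1]\)). Under those assumptions the variance of the average is \(\sigma^2\bigl(\rho + (1-\rho)/B\bigr)\).
+
+```r
+# Monte Carlo check: average B equicorrelated noisy probability estimates
+# and compare the empirical variance with sigma^2 * (rho + (1 - rho) / B).
+set.seed(123)
+p_true <- 0.7
+sigma  <- 0.1
+rho    <- 0.3
+B      <- 25
+reps   <- 20000
+
+p_bar <- replicate(reps, {
+  shared <- rnorm(1)                   # component common to all learners
+  own    <- rnorm(B)                   # learner-specific components
+  eps    <- sigma * (sqrt(rho) * shared + sqrt(1 - rho) * own)
+  mean(p_true + eps)                   # averaged estimate p_bar_B(x)
+})
+
+empirical   <- var(p_bar)
+theoretical <- sigma^2 * (rho + (1 - rho) / B)
+single      <- sigma^2                 # variance of one learner's estimate
+c(empirical = empirical, theoretical = theoretical, single = single)
+```
+
+The shared component explains why averaging cannot remove all variability: as \(B\) grows, the variance approaches \(\rho\sigma^2\) rather than zero, so ensembles gain most from learners whose errors are not strongly dependent.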
+
+### Stacking
+
+Stacking combines multiple candidate classifiers at a second stage. Instead of choosing a single best model class, stacking learns how to weight or combine them.
+
+If
+
+\[
+\hat p_c^{(1)}(x), \hat p_c^{(2)}(x), \dots, \hat p_c^{(m)}(x)
+\]
+
+are probability estimates from different base learners, a stacked classifier forms a combined estimate
+
+\[
+\hat p_c^{\text{stack}}(x)
+\]
+
+through optimized combination, then classifies by the largest combined probability.
+
+In the NNS setting, stacking is especially natural because class probabilities are already interpretable directional quantities. The second-stage learner therefore combines probability surfaces, not merely hard labels.
+
+These ensemble procedures preserve the core NNS strengths:
+
+- nonlinear boundaries,
+- local adaptivity,
+- and benchmark-relative interpretation,
+
+while often improving predictive accuracy.
+
+### Illustrative Workflow
+
+A practical classification workflow in NNS therefore proceeds in two layers.
+
+1. The partition-based learner defines local benchmark-relative class neighborhoods.
+2. The ensemble wrapper aggregates those local decisions across resamples or model combinations.
+
+Schematically, this is implemented by calling `NNS.boost(..., type = "CLASS")` for boosted classification or `NNS.stack(..., type = "CLASS")` for stacked classification. The first improves stability through repeated resampling and aggregation; the second combines candidate classifiers through an optimized second-stage rule.
+
+Thus the package implementation follows the same logic as the theory: local directional partitions generate class probabilities, and ensemble aggregation improves robustness of the final decision boundary.
+
+---
+
+## Why Ensemble Classification Helps
+
+The usefulness of boosting and stacking can be understood mathematically.
+
+Suppose each base classifier produces an estimated class probability
+
+\[
+\hat p_b(x) = p(x) + \varepsilon_b(x),
+\]
+
+where \(p(x)\) is the target probability and \(\varepsilon_b(x)\) is model-specific estimation error.
+
+Averaging over \(B\) such classifiers gives
+
+\[
+\bar p_B(x) = \frac{1}{B}\sum_{b=1}^B \hat p_b(x)
+= p(x) + \frac{1}{B}\sum_{b=1}^B \varepsilon_b(x).
+\]
+
+If the errors are centered and not perfectly dependent, then
+
+\[
+Var(\bar p_B(x))
+=
+\frac{1}{B^2}
+\sum_{b=1}^B Var(\varepsilon_b(x))
++
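+\frac{2}{B^2}\sum_{1 \le b < b' \le B} Cov\bigl(\varepsilon_b(x),\varepsilon_{b'}(x)\bigr).
+\]
+
+When the pairwise covariances are small relative to the variances, the second term is small and the variance of the averaged estimate falls roughly like \(1/B\). Perfectly correlated errors with equal variances gain nothing from averaging.
+
+> **Further Reading / Examples**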
+> For machine learning applications, including the MNIST classification example, see the [NNS Machine Learning Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#3-machine-learning).
diff --git a/tools/NNS/book/chapter-24-ensemble-methods.Rmd b/tools/NNS/book/chapter-24-ensemble-methods.Rmd
new file mode 100644
index 0000000..785df79
--- /dev/null
+++ b/tools/NNS/book/chapter-24-ensemble-methods.Rmd
@@ -0,0 +1,478 @@
+# Ensemble Methods
+
+Chapters 20–22 developed the machine-learning side of the NNS framework from three complementary angles.
+
+- Chapter 20 treated recursive partitions as an unsupervised clustering device.
+- Chapter 21 used those same partitions for conditional expectation estimation.
+- Chapter 22 used them for local conditional probability estimation and classification.
+
+A natural next step is to combine multiple directional learners into a single predictive system.
+
+This is the role of **ensemble methods**.
+
+In classical machine learning, ensembles improve predictive performance by combining many imperfect models. Bagging reduces variance through aggregation. Boosting emphasizes informative learners. Stacking combines model outputs through a meta-model. Random forests, gradient boosting, and stacked generalization are all expressions of this general idea.
+
+The directional framework reaches the same destination by a different route.
+
+Rather than aggregating trees, margins, or globally specified basis functions, NNS ensembles aggregate **benchmark-relative nonparametric learners** built from recursive partition logic. The result is not an imported ensemble superstructure grafted onto a classical base learner. It is an extension of the same directional machinery already developed in this book.
+
+Two package routines operationalize this idea:
+
+- `NNS.boost`, which performs resampling, feature-subset screening, and aggregation of NNS learners,
+- `NNS.stack`, which uses the predictions of NNS base models as meta-features for an optimized stacked model, including cross-validated selection of the neighbor count $k$ used in multivariate regression-point prediction.
+
+This chapter develops the conceptual role of ensembles in NNS, explains how boosting and stacking fit within the broader directional framework, and discusses practical issues of cross-validation, stability, and computational cost.
+
+---
+
+## Why Ensemble Learning Helps
+
+A nonparametric learner is flexible precisely because it does not impose a rigid functional form. That flexibility is a strength, but it also creates variability.
+
+When data are finite, noisy, imbalanced, or high-dimensional, different local partitions can emphasize different parts of the structure. One learner may capture an important threshold effect; another may be more stable in the center of the distribution; another may better recognize rare classes or tail behavior.
+
+No single learner is guaranteed to be uniformly best across all regions of the sample space.
+
+This motivates ensemble learning.
+
+The basic idea is simple:
+
+1. generate multiple candidate learners,
+2. allow them to capture different aspects of the data,
+3. combine them in a way that improves stability and accuracy.
+
+In classical language, ensembles often improve performance by reducing variance without increasing bias too severely, or by reducing bias through adaptive combinations of weak learners.
+
+The directional viewpoint sharpens this intuition.
+
+Because benchmark-relative partitions preserve local structure, different learners may disagree not randomly but **structurally**:
+
+- one learner may emphasize upper-tail behavior,
+- another may emphasize local neighborhood structure,
+- another may emphasize feature subsets with stronger nonlinear dependence,
+- another may perform better under class imbalance.
+
+An ensemble can therefore be interpreted as a way of aggregating **multiple local structural views** of the same problem.
+
+---
+
+## Ensemble Logic in the NNS Framework
+
+The conceptual continuity with earlier chapters is important.
+
+Throughout the book, the central move has been:
+
+- start with directional structure,
+- partition relative to benchmarks,
+- summarize within those regions,
+- aggregate only afterward.
+
+Ensemble learning in NNS follows exactly the same logic.
+
+A single NNS learner already performs a local structural decomposition of the data. An ensemble performs a second-level aggregation across many such decompositions.
+
+So the order of construction is
+
+1. **within-learner aggregation**: local averages or local class probabilities inside benchmark-relative partitions,
+2. **between-learner aggregation**: combining many directional learners into a final estimate.
+
+This two-level structure is why ensembles are natural in the NNS setting rather than auxiliary.
+
+The first level handles nonlinear local geometry.
+The second level stabilizes that geometry across feature subsets, folds, and candidate model specifications.
+
+---
+
+## Base Learners: Recursive Partition Estimation
+
+Both `NNS.boost` and `NNS.stack` are built on the same base principle: prediction from the NNS regression engine.
+
+The package documentation for `NNS.stack` states that it is a prediction model using the predictions of the NNS base models as features, and the documentation for `NNS.boost` states that it is an ensemble method using NNS multivariate regression as the base learner rather than trees. In both cases, the base learner is therefore not a decision stump, not a CART tree, and not a linear model. It is the directional nonparametric estimator developed in earlier chapters.
+
+This matters conceptually.
+
+Classical boosting — and its modern descendant gradient boosting — achieves its power by iteratively correcting residuals from deliberately weak base learners. The mathematical theory of that residual-correction process is well developed, and systems like XGBoost have been engineered to exploit it with exceptional efficiency. NNS does not attempt to replicate that theory. Its base learner is already a flexible nonlinear estimator; there are no residuals to correct in the same sense. Ensemble improvement in NNS comes from a different source entirely: stability and structural coverage across feature subsets and validation splits, not from iterative error minimization.
+
+This is an engineering difference as much as a mathematical one. Competing with XGBoost on its own terms — speed of residual descent, regularization of additive tree models, hardware-aware split enumeration — is not the goal of NNS ensembles and should not be the standard of comparison. The goal is a directional nonparametric system that can represent nonlinear, asymmetric, and threshold-driven structure without imposing a tree topology, and that remains interpretable through the same benchmark-relative geometry used throughout the book.
+
+That distinction gives NNS ensembles a different flavor:
+
+- the individual learners are already nonlinear and locally adaptive,
+- ensemble gains come primarily from stabilization, feature engineering, and structural aggregation,
+- interpretation remains tied to local partitions rather than to abstract parameter updates.
+
+---
+
+## Resampling and Aggregation via `NNS.boost`
+
+The `NNS.boost` routine implements ensemble learning through a sequence of resampling, feature screening, threshold selection, and final aggregation.
+
+At the interface level, the routine accepts training predictors `IVs.train`, a response `DV.train`, optional test predictors `IVs.test`, learner controls such as `depth`, `learner.trials`, `epochs`, `CV.size`, and optimization controls such as `obj.fn`, `objective`, and `threshold`. It also supports class balancing through `balance`, time-series handling through `ts.test`, prediction intervals through `pred.int`, and feature-frequency summaries through `features.only` and `feature.importance`.
+
+The package description makes two points immediately clear.
+
+First, `NNS.boost` is not restricted to numeric regression; it can also be used for classification via `type = "CLASS"`.
+
+Second, the routine is not merely averaging many full-model fits on bootstrap resamples. It is also **learning which feature combinations are useful**.
+
+### Threshold-learning stage
+
+At an abstract level, the procedure begins by generating many candidate feature subsets and evaluating their predictive performance on resampled validation splits. Let
+
+$$\mathcal{F}_1,\mathcal{F}_2,\dots,\mathcal{F}_M$$
+
+denote the candidate feature subsets. For each subset $\mathcal{F}_m$, a base learner produces predictions $\hat y^{(m)}_i$ on held-out observations, and an objective function evaluates the result:
+
+$$J_m = \Phi\!\bigl(\hat y^{(m)}, y\bigr),$$
+
+where $\Phi$ is the chosen objective.
+
+The package default for continuous targets is sum of squared errors,
+
+$$\Phi(\hat y,y)=\sum_i (\hat y_i-y_i)^2,$$
+
+while for classification the code automatically switches the default objective to accuracy when `type = "CLASS"`.
+
+The threshold is then learned from the empirical distribution of the candidate objective scores. In the default setting `extreme = FALSE`, the routine uses the upper hinge of the five-number summary for maximization problems and the lower hinge for minimization problems. If `extreme = TRUE`, it instead uses the literal maximum or minimum observed score. Feature subsets whose validation performance passes that threshold are retained for the ensemble.
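+
+The threshold rule just described can be sketched with base R's `fivenum()`, which returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum). The scores are hypothetical, and treating a score equal to the threshold as passing is an assumption here.
+
+```r
+# Threshold from the empirical distribution of candidate objective scores.
+# Default: upper hinge (maximize) or lower hinge (minimize);
+# extreme = TRUE uses the observed maximum or minimum instead.
+learn_threshold <- function(scores, maximize = TRUE, extreme = FALSE) {
+  f <- fivenum(scores)
+  if (extreme) {
+    if (maximize) f[5] else f[1]
+  } else {
+    if (maximize) f[4] else f[2]
+  }
+}
+
+# Indices of subsets that tie or beat the threshold (assumed tie rule)
+passing <- function(scores, threshold, maximize = TRUE) {
+  if (maximize) which(scores >= threshold) else which(scores <= threshold)
+}
+
+# Hypothetical accuracy-type scores J_m (maximize), as with type = "CLASS"
+scores <- c(0.61, 0.72, 0.68, 0.80, 0.55, 0.77, 0.70)
+thr    <- learn_threshold(scores)          # upper hinge
+keep   <- passing(scores, thr)             # retained subset indices
+```
+
+With these seven scores the upper hinge is 0.745, so only the subsets scoring 0.80 and 0.77 survive.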
+
+This is the first important NNS-specific departure from classical textbook boosting:
+
+**the algorithm is not reweighting observations sequentially in the AdaBoost sense; it is screening and aggregating feature-defined directional learners.**
+
+### Weighted feature sampling in the epoch loop
+
+After the threshold-learning stage, `NNS.boost` constructs a **feature pool weighted by survival frequency**: each feature index is repeated in proportion to how often it appeared across the learner-trial sets that passed the threshold. In each epoch, two random decisions are made jointly — a feature count $k \in \{1, \ldots, n\}$ is drawn uniformly, and then $k$ distinct indices are sampled from this weighted pool without replacement. Features that recurred often in successful learner trials are therefore overrepresented in the draw, but any combination of any size can still appear.
+
+This is structurally analogous to the random-subspace mechanism in random forests, but with a crucial difference: the sampling probabilities are not uniform. They are earned. A feature earns higher sampling weight by having appeared in learner-trial subsets that cleared the validation threshold. The epoch loop is thus not random exploration of feature space — it is **weighted exploration biased toward stability**, with the bias itself learned from the threshold stage.
+
+An epoch's feature set is retained only if its validation objective clears the same threshold. The surviving sets feed the frequency table that drives the final synthetic predictor construction described below.
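+
+A single epoch draw can be sketched in base R under stated assumptions: the survival counts are hypothetical, features that never survived are excluded, and `sample.int()` with `prob` and no replacement stands in for drawing distinct indices from a pool of repeated feature indices. The two are close in spirit but not necessarily identical to what `NNS.boost` does internally.
+
+```r
+# One epoch draw: a feature count k, uniform over the eligible features,
+# then k distinct features sampled with probability proportional to how
+# often each survived the threshold stage.
+draw_epoch_features <- function(survival_counts) {
+  eligible <- which(survival_counts > 0)
+  k <- sample.int(length(eligible), 1)
+  picked <- sample.int(length(eligible), k, prob = survival_counts[eligible])
+  sort(eligible[picked])
+}
+
+# Hypothetical survival counts for five features; feature 2 never survived
+counts <- c(5, 0, 2, 8, 1)
+
+set.seed(42)
+draws <- replicate(1000, draw_epoch_features(counts), simplify = FALSE)
+tabulate(unlist(draws), nbins = length(counts))
+```
+
+Over many epochs, feature 4 (weight 8) is drawn far more often than feature 5 (weight 1), while feature 2 never appears.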
+
+### User-specified objectives
+
+The objective need not be squared error or simple accuracy. The package allows any objective written as an expression in `predicted` and `actual`. Thus users may supply application-specific measures such as precision-weighted loss, F-score style criteria, percentage error, or other custom objectives.
+
+This is an important practical strength. It allows the ensemble to be aligned with the actual loss relevant to the application rather than with a default proxy.
+
+### Feature-frequency aggregation
+
+Once useful candidate subsets have been identified, the retained feature sets are aggregated by frequency. Features that appear often among successful learners receive greater weight in the final construction.
+
+This makes `NNS.boost` partly a prediction routine and partly a feature-stability routine.
+
+The ensemble is therefore interpretable in two linked ways:
+
+- through its predictions,
+- through the frequency with which features participate in successful directional learners.
+
+That second output is especially useful in nonlinear settings, where variable importance is often harder to read from a single local model.
+
+### Final estimate
+
+Once the epoch loop is complete, feature frequencies are normalized across all passing epochs to produce a weight vector aligned to the columns of `IVs.train`. These normalized frequencies are then supplied to `NNS.reg` as a custom coefficient vector through `dim.red.method`, which constructs a frequency-weighted synthetic predictor $X^*$ from the original features. In this way, predictors that recur more often among successful learners receive greater influence in the final synthetic index.
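+
+The normalization and projection steps can be sketched in base R. The passing feature sets and the data are hypothetical, and dividing by the total count is an assumption; the exact scaling applied to coefficients passed through `dim.red.method` may differ.
+
+```r
+# Normalize feature frequencies over passing epochs and form X* = X w.
+# Hypothetical passing sets (column indices of IVs.train) and data.
+passing_sets <- list(c(1, 3), c(1, 2, 3), c(3), c(1, 3))
+p <- 3
+
+freq <- tabulate(unlist(passing_sets), nbins = p)  # appearances per feature
+w    <- freq / sum(freq)                            # assumed normalization
+
+set.seed(7)
+X      <- matrix(rnorm(20 * p), ncol = p)           # stand-in for IVs.train
+x_star <- drop(X %*% w)                             # synthetic predictor X*
+```
+
+Here feature 3 appears in all four passing sets, so it receives the largest weight (0.5) in the synthetic index.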
+
+The final estimate is not taken directly from that first dimension-reduction fit. Instead, `NNS.boost` uses the resulting training and test projections onto $X^*$ to form a two-column design $(X^*, X^*)$, and then passes that design to `NNS.stack` with `method = 1`. This allows the routine to obtain the final prediction through the NNS regression-point mechanism while using the stacked framework to optimize the terminal neighbor-selection step `n.best`.
+
+Accordingly, the final model should not be described as a committee vote, a simple average of retained learners, or merely a single `NNS.reg` fit on the weighted composite. It is better understood as a **stability-weighted synthetic predictor followed by an optimized final NNS regression-point estimate**. The boosting stage learns which features survive repeated validation, and the closing stacking step converts that learned structure into the final prediction rule.
+
+---
+
+## Optimized Stacking via `NNS.stack`
+
+If `NNS.boost` performs feature-based ensemble screening, `NNS.stack` performs **meta-learning from NNS predictions** — and simultaneously optimizes the neighbor count $k$ used in the multivariate regression-point search.
+
+The package documentation describes `NNS.stack` as a prediction model using the predictions of NNS base models as features for the stacked model. That sentence captures the essential idea of stacking.
+
+Suppose that multiple base learners produce predictions
+
+$$\hat y^{(1)}(x),\hat y^{(2)}(x),\dots,\hat y^{(K)}(x).$$
+
+A stacked model does not choose one of them. It treats them as a new feature vector
+
+$$z(x)=\bigl(\hat y^{(1)}(x),\dots,\hat y^{(K)}(x)\bigr)$$
+
+and then learns a second-stage predictor
+
+$$\hat y_{\mathrm{stack}}(x)=G\!\bigl(z(x)\bigr).$$
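+
+As a generic illustration of $G$, and explicitly not the meta-model used inside `NNS.stack`, the base R sketch below stacks two hypothetical base-learner predictions with a convex weight chosen to minimize held-out squared error.
+
+```r
+# Generic stacking sketch: held-out predictions from two hypothetical base
+# learners form z(x), and G(z) = a * z1 + (1 - a) * z2 with a chosen on a
+# grid to minimize held-out squared error.
+set.seed(11)
+n  <- 200
+y  <- rnorm(n)
+z1 <- y + rnorm(n, sd = 0.5)   # base learner 1: smaller errors
+z2 <- y + rnorm(n, sd = 1.0)   # base learner 2: larger errors
+Z  <- cbind(z1, z2)            # meta-features z(x), one column per learner
+
+grid <- seq(0, 1, by = 0.05)
+sse  <- sapply(grid, function(a) sum((y - (a * Z[, 1] + (1 - a) * Z[, 2]))^2))
+a_best  <- grid[which.min(sse)]
+y_stack <- a_best * Z[, 1] + (1 - a_best) * Z[, 2]
+```
+
+Because the grid contains the endpoints 0 and 1, the stacked error can never exceed that of the better single learner on the same data; with these noise levels the chosen weight should land near the variance-optimal value of 0.8.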
+
+In the NNS setting, the base learners come from two main sources documented in the function interface:
+
+- **Method 1**: direct `NNS.reg` prediction, which in the multivariate case operates as regression-point nearest-neighbor search over a compressed set of local conditional means (as developed in Chapter 21) — not a piecewise-linear surface.
+- **Method 2**: dimension-reduction regression built from synthetic predictor combinations, which collapses the predictor space to a univariate index before applying standard NNS regression.
+
+The `method` argument controls whether method 1, method 2, or both are used. The default `method = c(1, 2)` includes both, which means that the stacked system can combine the geometry-preserving regression-point prediction of method 1 with the parsimonious synthetic-index regression of method 2.
+
+This is important because the two sources of prediction reflect structurally different views of the same problem. Method 1 operates in the full predictor space through a compressed nearest-neighbor geometry; method 2 compresses the predictor space itself before any regression is performed. Stacking across both allows the meta-learner to weight whichever structural representation generalizes better on the held-out validation data.
+
+The `dim.red.method` argument controls how synthetic predictor weights are determined for method 2:
+
+- `"cor"` for linear correlation,
+- `"NNS.dep"` for nonlinear dependence,
+- `"NNS.caus"` for directional causation,
+- `"equal"` for equal weighting,
+- `"all"` for averaging all methods.
+
+Thus the stacked learner can use not only multiple predictions, but multiple **structural weighting philosophies**. In particular, when `"NNS.caus"` is used, the weighting is directional rather than symmetric: the synthetic regressor is constructed from estimated causal influence rather than from a purely mutual dependence score.
+
+### Optimizing $k$: neighbor count selection in the regression-point search
+
+A key mechanism of `NNS.stack` that distinguishes it from generic stacking is the **cross-validated optimization of $k$**, the number of nearest regression-point neighbors used in the multivariate prediction step of method 1.
+
+Recall from Chapter 21 that multivariate `NNS.reg` does not predict by connecting regression points with line segments. It performs nearest-neighbor search over the regression-point matrix — a compressed set of local conditional means derived from the per-variable partitions. The number of neighbors $k$ used in that search directly controls the smoothness of the final prediction: small $k$ produces more local, potentially noisy predictions; large $k$ produces broader averaging that may underfit sharp local structure.
+
+Choosing the right $k$ is therefore a bias-variance decision specific to the regression-point geometry of each dataset. `NNS.stack` addresses this automatically within the fold loop: across each fold, candidate values of $k$ are evaluated on held-out validation predictions, and the $k$ that best satisfies the chosen objective is selected. The selected $k$ is then used for final prediction on the test set.
+
+This means `NNS.stack` is not merely stacking predictions from fixed base learners. It is simultaneously discovering the right localization level for the regression-point nearest-neighbor search, fold by fold, as part of the same cross-validation loop that evaluates feature combinations and classification thresholds.
+
+The practical consequence is substantial. A value of $k$ that is too small will overfit to idiosyncratic regression-point neighborhoods; a value that is too large will smooth away the local structure that makes the regression-point geometry useful in the first place. Cross-validated $k$ selection finds the point of best generalization without requiring the analyst to tune it manually.
+
+This is one of the clearest ways in which `NNS.stack` goes beyond assembling predictions from pre-configured learners: it actively optimizes a structural hyperparameter of the underlying prediction mechanism, not just the weights placed on each learner's output.
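+
+The fold loop can be illustrated with a short Python sketch. It is not the `NNS.stack` source. It assumes the regression points are simply the training rows, whereas actual NNS regression points are compressed local conditional means. It also uses unweighted Euclidean nearest-neighbor averaging and pools mean squared error across folds before choosing $k$. The helper names `knn_predict` and `select_k` are hypothetical.
+
```python
import math

def knn_predict(points, targets, query, k):
    """Average the targets of the k points nearest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    nearest = order[:k]
    return sum(targets[i] for i in nearest) / len(nearest)

def select_k(points, targets, candidate_ks, folds=5):
    """Return (k, error) for the candidate k with the lowest cross-validated MSE.

    Rows are assigned to folds round-robin (row i goes to fold i % folds).
    """
    n = len(points)
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        fold_errors = []
        for f in range(folds):
            train = [i for i in range(n) if i % folds != f]
            valid = [i for i in range(n) if i % folds == f]
            train_x = [points[i] for i in train]
            train_y = [targets[i] for i in train]
            sq = [(targets[i] - knn_predict(train_x, train_y, points[i], k)) ** 2
                  for i in valid]
            fold_errors.append(sum(sq) / len(sq))
        err = sum(fold_errors) / folds  # pooled validation error for this k
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

# Toy data with a sharp jump at x = 20.
xs = [(float(i),) for i in range(40)]
ys = [0.0 if i < 20 else 10.0 for i in range(40)]
print(select_k(xs, ys, candidate_ks=[1, 3, 30]))
```
+
+On this step-shaped toy data, a small $k$ preserves the jump while $k=30$ averages across it. The validation error therefore steers the choice toward a small neighborhood.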
+
+### Classification threshold optimization
+
+For classification problems, `NNS.stack` includes `optimize.threshold = TRUE` by default. This means the routine does not simply round probabilities at $0.5$ in every case. Instead it searches a grid of candidate thresholds on validation predictions and chooses the threshold that maximizes the selected objective. The final classification threshold is then aggregated across folds into the reported `probability.threshold`.
+
+That is especially useful under class imbalance, where the optimal decision threshold may differ materially from one half.
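+
+A grid search of this kind is easy to sketch. The Python below is illustrative only. The grid from 0.01 to 0.99 and the balanced-accuracy objective are assumptions standing in for whatever objective the user supplies. `optimize_threshold` and `balanced_accuracy` are hypothetical names, not part of the NNS API.
+
```python
def balanced_accuracy(predicted, actual):
    """Mean of per-class recall for binary labels coded 0/1."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, a in enumerate(actual) if a == cls]
        if idx:
            recalls.append(sum(predicted[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def optimize_threshold(probs, actual, objective=balanced_accuracy, grid=None):
    """Return (threshold, score) maximizing `objective` on validation predictions."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    best_t, best_score = 0.5, float("-inf")
    for t in grid:
        predicted = [1 if p >= t else 0 for p in probs]
        score = objective(predicted, actual)
        if score > best_score:  # strict: keeps the first threshold reaching the best score
            best_t, best_score = t, score
    return best_t, best_score

# Imbalanced validation data: the rare class has modest predicted probabilities.
probs = [0.05, 0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.30, 0.35]
actual = [0, 0, 0, 0, 0, 0, 0, 1, 1]
default = [1 if p >= 0.5 else 0 for p in probs]
print(balanced_accuracy(default, actual))  # 0.5: the 0.5 cutoff never predicts the rare class
print(optimize_threshold(probs, actual))   # (0.16, 1.0)
```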
+
+### Distance options
+
+The `dist` argument permits `"L1"`, `"L2"`, `"DTW"`, and `"FACTOR"` distances.
+
+This is another respect in which NNS stacking differs from generic stacking. The stacked system is not confined to Euclidean geometry. It can accommodate
+
+- Manhattan distance,
+- Euclidean distance,
+- dynamic time warping for temporal alignment,
+- factor-frequency style handling for discrete structures.
+
+The choice of distance metric applies to the regression-point nearest-neighbor search in method 1. Different distance metrics will define different neighborhoods in the regression-point space, and `NNS.stack`'s cross-validation loop evaluates which combination of $k$ and distance metric generalizes best on the held-out data.
+
+So the meta-model is not merely combining predictions. It is combining predictions within a distance-aware, $k$-optimized, data-type-aware directional framework.
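+
+The three numeric metrics can be written out directly. This Python sketch uses textbook definitions, with dynamic time warping based on an absolute-difference cost and no window constraint, so it may differ in detail from the NNS implementation. The `"FACTOR"` option is not sketched because its definition is not given here. The toy series `y` has the same shape as `x` but arrives one step later.
+
```python
import math

def l1(a, b):
    """Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignment paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

x = [0, 1, 2, 3, 2, 1, 0]
y = [0, 0, 1, 2, 3, 2, 1]  # same shape, shifted one step later
print(l1(x, y), l2(x, y), dtw(x, y))  # L1 = 6, L2 ≈ 2.449, DTW = 1.0
```
+
+The pointwise metrics penalize the one-step shift at every position. DTW realigns the two paths and pays only at the end, where `y` has not yet returned to zero.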
+
+---
+
+## Cross-Validation in Nonparametric Settings
+
+Cross-validation plays a central role in both `NNS.boost` and `NNS.stack`.
+
+This is not accidental. In nonparametric estimation, flexibility is high and parametric asymptotic approximations are often less informative. Validation by held-out prediction is therefore especially important.
+
+The package interface exposes this through
+
+- `CV.size`, the cross-validation proportion,
+- `folds`, the number of cross-validation folds,
+- and, in the case of `NNS.boost`, repeated learner trials and epochs.
+
+### What cross-validation is optimizing
+
+In the NNS ensemble context, cross-validation is not merely estimating out-of-sample error. It is simultaneously optimizing several interconnected structural choices:
+
+- in `NNS.boost`: which feature subsets produce predictions that generalize, and how to weight them,
+- in `NNS.stack` for regression: what value of $k$ produces the right level of localization in the regression-point nearest-neighbor search,
+- in `NNS.stack` for classification: what probability threshold maximizes the chosen classification objective.
+
+A classical parametric model may have only a few coefficients whose complexity is explicit once the model is fit. A directional nonparametric ensemble is different. Its effective complexity depends on partition geometry, feature subset choice, neighbor count, class balancing, dimension reduction, and aggregation across learners. Cross-validation is therefore the practical device that simultaneously determines how much local structure to trust and at what spatial scale to apply it.
+
+### General formulation
+
+Let a loss function be denoted by $\ell(y,\hat y)$. If the sample is partitioned into folds $\mathcal{I}_1,\dots,\mathcal{I}_K$, then the $K$-fold cross-validation score for a model $M$ is
+
+$$CV_K(M) = \frac{1}{K} \sum_{j=1}^K \frac{1}{|\mathcal{I}_j|} \sum_{i\in \mathcal{I}_j} \ell\bigl(y_i,\hat y_i^{(-j)}\bigr),$$
+
+where $\hat y_i^{(-j)}$ denotes the prediction for observation $i$ from the model trained without fold $j$.
+
+Because the model is benchmark-relative and local, cross-validation is effectively checking whether the local structural decomposition — including the regression-point geometry and the chosen $k$ — generalizes beyond the particular sample partition that produced it.
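+
+The cross-validation score can be computed exactly as written: an average over folds of the average held-out loss. In this Python sketch, the round-robin fold assignment, the `fit`/`loss` calling convention, and the baseline `fit_mean` learner are illustrative assumptions.
+
```python
def cv_score(fit, loss, xs, ys, n_folds):
    """K-fold cross-validation score: mean over folds of the mean held-out loss.

    `fit(train_x, train_y)` must return a function mapping one x to a prediction.
    Observations are assigned to folds round-robin (i % n_folds) for reproducibility.
    """
    fold_losses = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_losses.append(sum(loss(ys[i], model(xs[i])) for i in held_out) / len(held_out))
    return sum(fold_losses) / n_folds

def fit_mean(train_x, train_y):
    """Baseline learner: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

def squared(y, y_hat):
    return (y - y_hat) ** 2

print(cv_score(fit_mean, squared, list(range(6)), [1.0, 1.0, 1.0, 3.0, 3.0, 3.0], n_folds=3))  # 1.0
```
+
+Each fold's model is trained on two 1s and two 3s, so it predicts 2. It then incurs a squared loss of 1 on every held-out point, and the score is exactly 1.0.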
+
+### Random versus temporal validation
+
+Both routines also include a `ts.test` option for time-series settings. This matters because ordinary random cross-validation can break temporal dependence and produce misleadingly optimistic results when data are ordered.
+
+In time-dependent settings, `ts.test` should be used so that validation preserves temporal ordering instead of drawing random folds. This is the right way to adapt the ensemble logic to forecasting and other sequential applications. It applies equally to the $k$ optimization step: the optimal neighbor count for a time-ordered regression-point geometry may differ materially from what random-fold cross-validation would select.
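+
+The difference between the two validation schemes is easy to see in index terms. The Python helpers below are hypothetical. They show only the principle that a time-ordered holdout trains on the past and tests on the final block; they do not reproduce how `ts.test` is implemented.
+
```python
import random

def random_split(n, test_fraction, seed=0):
    """Random holdout: test indices are scattered through the sample, ignoring time order."""
    rng = random.Random(seed)
    test = set(rng.sample(range(n), round(n * test_fraction)))
    train = [i for i in range(n) if i not in test]
    return train, sorted(test)

def temporal_split(n, test_fraction):
    """Time-ordered holdout: train on the earliest block, validate on the final block."""
    n_test = round(n * test_fraction)
    return list(range(n - n_test)), list(range(n - n_test, n))

print(random_split(10, 0.3))    # test indices can fall anywhere in the series
print(temporal_split(10, 0.3))  # ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
```
+
+With the temporal split, every training index precedes every test index, so the validation score measures genuine forecasting. A random split lets later observations inform predictions of earlier ones.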
+
+---
+
+## Ensemble Learning and the Bias–Variance Tradeoff
+
+The classical motivation for ensembles is often expressed through the bias–variance decomposition.
+
+If a predictor $\hat f(x)$ is used for squared-error prediction, then at a fixed point $x$,
+
+$$E\bigl[(Y-\hat f(x))^2 \mid X=x\bigr] = \sigma^2(x) + \bigl(E[\hat f(x)]-f(x)\bigr)^2 + Var(\hat f(x)),$$
+
+where $\sigma^2(x)$ is irreducible noise, the middle term is the squared bias, and the final term is the variance of the estimator.
+
+Ensembles often help because averaging many unstable predictors can reduce the variance term.
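+
+A small simulation makes the variance term concrete. The Python sketch assumes unbiased learners whose errors are independent standard normal draws around a true value $f(x)=0$. Averaging ten of them should cut the variance to roughly one tenth. If the learners' errors have pairwise correlation $\rho$, the variance of an average of $B$ learners is $\rho\sigma^2 + (1-\rho)\sigma^2/B$. The gain therefore shrinks as learners become more alike.
+
```python
import random
import statistics

def simulate(n_learners, reps=20000, seed=1):
    """Variance of one noisy predictor vs. the average of n_learners independent ones."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(reps):
        preds = [rng.gauss(0.0, 1.0) for _ in range(n_learners)]  # truth f(x) = 0
        single.append(preds[0])
        averaged.append(sum(preds) / n_learners)
    return statistics.variance(single), statistics.variance(averaged)

v1, v_avg = simulate(10)
print(round(v1, 2), round(v_avg, 3))  # close to 1.0 and 0.1
```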
+
+In the NNS context, this logic remains true but should be interpreted geometrically.
+
+A single learner depends on a particular partition, feature subset, validation split, and — in the multivariate case — a particular value of $k$ for the regression-point nearest-neighbor search. Different learners therefore generate different local geometric approximations to the regression or classification surface. Aggregating across them stabilizes the resulting estimate.
+
+So for NNS ensembles:
+
+- **variance reduction** comes from averaging across multiple local structural decompositions,
+- **bias reduction** may come from allowing multiple structural views that no single learner captures well,
+- **localization calibration** comes from cross-validated $k$ selection, which finds the spatial scale at which the regression-point geometry generalizes best,
+- **robustness** comes from filtering learners through validation rather than trusting one partition unconditionally.
+
+This is why ensembles are especially attractive in nonlinear, asymmetric, or heterogeneous data settings.
+
+---
+
+## Regression Ensembles and Classification Ensembles
+
+Both `NNS.boost` and `NNS.stack` support numeric or categorical targets, though the documentation for `NNS.boost` emphasizes classification and the implementation adapts its objective behavior to the `type` argument.
+
+### Regression ensembles
+
+For continuous responses, the ensemble aims to estimate $f(x)=E[Y\mid X=x]$ more accurately and more stably than a single learner.
+
+Here the relevant concerns are local curvature, heteroskedasticity, feature interactions, optimal neighbor count $k$ for the regression-point search, and out-of-sample squared or absolute loss.
+
+### Classification ensembles
+
+For categorical responses, the ensemble aims to estimate conditional class probabilities $P(Y=c\mid X=x)$ or their decision-equivalent ranking more accurately and more stably.
+
+Here the relevant concerns are class imbalance, threshold optimization, rare-class recognition, and discrete decision accuracy.
+
+The distinction is not merely operational. It also changes the objective surface.
+
+In regression, averaging predictions often behaves smoothly, and the optimal $k$ depends largely on the curvature of the underlying regression surface relative to the regression-point geometry.
+
+In classification, small changes in the predicted score can move observations across a decision threshold. This is why threshold optimization and balancing options are especially important for classification ensembles, and why the $k$ optimization and threshold optimization steps in `NNS.stack` are both necessary: one controls spatial localization, the other controls the decision boundary.
+
+---
+
+## Relation to Classical Ensemble Methods
+
+The NNS ensemble framework overlaps with familiar classical methods, but it is not identical to any one of them.
+
+### Comparison with bagging
+
+Bagging stabilizes unstable learners by averaging across bootstrap samples.
+
+NNS boosting shares the spirit of stabilization through repeated resampled learning, but it is more selective: it screens learners by performance and tracks feature frequencies rather than averaging every learner equally.
+
+### Comparison with AdaBoost and gradient boosting
+
+Classical boosting methods often reweight observations sequentially or fit residuals stage by stage.
+
+`NNS.boost` is different. It is closer to a performance-thresholded ensemble over feature subsets and validation splits, using NNS regression as the base learner. It does not rely on the same additive stagewise residual-updating mechanism as gradient boosting.
+
+### Comparison with random forests
+
+Random forests combine tree learners built on bootstrap samples and random feature subsets.
+
+NNS ensembles share the idea of random or selective feature subsets, but the base learners are not trees. They are directional nonparametric regressors and classifiers based on recursive partition structure. Moreover, in `NNS.boost`, random feature subsets are not retained merely because they were sampled; they are screened by validation performance and then aggregated by frequency. This makes the feature-selection step partly stochastic and partly performance-driven.
+
+The present empirical dominance of tree-based ensembles may reflect engineering maturity as much as statistical principle. Tree methods benefit from decades of optimization in split search, pruning, regularization, parallelization, software design, and default tuning. To the extent that NNS better preserves nonlinear, asymmetric, and directional information, improvements in algorithmic efficiency, tuning strategy, and implementation may allow those structural advantages to appear more consistently in practical applications. Thus, the theoretical contrast with trees should not be framed only as a contest of current benchmark performance, but also as a question of whether engineering has caught up with the underlying statistical geometry.
+
+### Comparison with classical stacking
+
+Classical stacking uses predictions from multiple models as features for a meta-model.
+
+This is the closest analogue to `NNS.stack`. But even here the base models, the $k$ optimization for the regression-point search, the dimension-reduction options, the distance metrics, and the threshold optimization are specific to the NNS framework. Standard stacking does not optimize a neighbor count for an underlying nearest-neighbor geometry because its base learners are not nearest-neighbor estimators over compressed regression points.
+
+So the correct interpretation is not that NNS simply reimplements classical ensemble methods with new names. It is that NNS develops **directional analogues of those ensemble principles**, with an additional layer of structural optimization — $k$ selection — that arises naturally from the regression-point prediction mechanism.
+
+---
+
+## Practical Performance Considerations
+
+Because ensemble methods combine multiple learners, practical considerations matter.
+
+### Computational cost
+
+Ensembles are more computationally intensive than single fits. This is especially true when the number of predictors is large, the candidate subset space is large, many folds or trials are used, $k$ optimization spans a wide candidate range, or time-series distance calculations such as DTW are involved. The price of flexibility is computation.
+
+### Feature dimensionality
+
+As dimension grows, the number of possible feature subsets grows combinatorially. If there are $p$ predictors, then the total number of nonempty subsets is
+
+$$\sum_{j=1}^{p} \binom{p}{j} = 2^p - 1.$$
+
+This growth explains why exhaustive search becomes infeasible in high dimension and why threshold-based screening is practically valuable.
+
+In small dimensions, `NNS.boost` can evaluate all subsets deterministically. In larger dimensions, it instead samples feature combinations rather than enumerating the full subset space. Thus the procedure remains a guided stochastic search rather than a brute-force exhaustive one.
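+
+The switch from exhaustive enumeration to sampling can be sketched as follows. The budget rule and the sampling scheme are assumptions for illustration, not the scheme inside `NNS.boost`. The scheme draws a subset size uniformly and then a subset of that size, so it does not sample subsets uniformly. `candidate_subsets` is a hypothetical name.
+
```python
import itertools
import random

def candidate_subsets(p, budget, seed=0):
    """All nonempty feature subsets if 2**p - 1 <= budget, else `budget` sampled ones."""
    total = 2 ** p - 1
    if total <= budget:
        # Small p: enumerate every subset deterministically.
        return [c for r in range(1, p + 1) for c in itertools.combinations(range(p), r)]
    rng = random.Random(seed)
    seen = set()
    while len(seen) < budget:
        size = rng.randint(1, p)  # subset size drawn uniformly from 1..p
        seen.add(tuple(sorted(rng.sample(range(p), size))))
    return sorted(seen)

print(len(candidate_subsets(4, budget=100)))   # 15 = 2**4 - 1, enumerated exhaustively
print(len(candidate_subsets(30, budget=100)))  # 100 sampled out of 2**30 - 1
```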
+
+### Class imbalance
+
+When classes are imbalanced, raw accuracy may be misleading. The `balance` option in both routines helps address this by combining down-sampling and up-sampling when classification is requested. A more complete treatment of this imbalance-handling ensemble workflow appears in Chapter 25, where multivariate forecasting workflows use the same up/down-sampling logic under skewed classification-style targets.
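+
+A combined down- and up-sampling step can be sketched as below. The common target size, the rounded mean class count, is an assumption chosen for illustration. The exact rule applied by the NNS `balance` option may differ, and `balance_classes` is a hypothetical name.
+
```python
import random
from collections import Counter

def balance_classes(labels, seed=0):
    """Resample row indices so that every class ends up with the same count.

    Assumed target size: the rounded mean class count. Larger classes are
    down-sampled without replacement; smaller classes are up-sampled with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    target = round(len(labels) / len(by_class))
    rows = []
    for idx in by_class.values():
        if len(idx) >= target:
            rows.extend(rng.sample(idx, target))                       # down-sample
        else:
            rows.extend(idx + rng.choices(idx, k=target - len(idx)))  # up-sample
    return rows

labels = ["a"] * 90 + ["b"] * 10
rows = balance_classes(labels)
print(Counter(labels[i] for i in rows))  # Counter({'a': 50, 'b': 50})
```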
+
+### Missing data
+
+The package notes make clear that missing data should be handled before fitting. This is especially important for nonparametric ensembles, where ad hoc missing-value handling can badly distort local structure.
+
+### Objective-function choice
+
+The user is not restricted to squared error. Any objective expressed in terms of `predicted` and `actual` can be supplied. This allows the ensemble — including the $k$ selection step — to be tuned to the application's actual loss function rather than to a generic default.
+
+---
+
+## Interpretation of Ensemble Output
+
+One criticism often leveled at ensemble methods is that they improve prediction at the cost of interpretability.
+
+That criticism is less severe in the NNS setting than it is in many black-box systems.
+
+### Prediction interpretation
+
+The final output is still a benchmark-relative nonparametric estimate. It remains tied to the directional structure developed throughout the book.
+
+### Neighbor-count interpretation
+
+The cross-validated $k$ returned by `NNS.stack` is itself informative. A small optimal $k$ indicates that the regression-point geometry contains sharp local variation that is best captured with tight neighborhoods. A large optimal $k$ indicates that the response surface is smoother relative to the regression-point distribution, and that broader averaging generalizes better. The selected $k$ is therefore a data-driven summary of the effective locality of the regression surface.
+
+### Feature-frequency interpretation
+
+`NNS.boost` returns feature weights and feature frequencies. These summarize which predictors recur most often among successful learners. This does not provide the same interpretation as a linear coefficient, but it does provide a meaningful **stability-based importance profile**.
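+
+A frequency profile of this kind, together with a weighted pool for drawing later feature subsets, can be sketched in Python. The learner results, the threshold of 0.8, and all function names are hypothetical. The sketch shows only the logic:
+
+- count the features of learners that survive the threshold,
+- normalize those counts into weights,
+- draw features in proportion to the weights.
+
```python
import random
from collections import Counter

def feature_frequencies(learners, threshold):
    """Count how often each feature appears among learners scoring >= threshold."""
    counts = Counter()
    for features, score in learners:
        if score >= threshold:
            counts.update(features)
    return counts

def feature_weights(counts):
    """Normalize survival counts into weights that sum to one."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def draw_epoch_subset(weights, size, rng):
    """Draw a feature subset, biased toward frequently surviving features."""
    features = list(weights)
    chosen = set()
    while len(chosen) < min(size, len(features)):
        chosen.add(rng.choices(features, weights=[weights[f] for f in features])[0])
    return sorted(chosen)

# Hypothetical validation results: (feature subset, objective score).
learners = [(("x1", "x2"), 0.91), (("x1",), 0.88), (("x2", "x3"), 0.62), (("x1", "x3"), 0.85)]
counts = feature_frequencies(learners, threshold=0.8)
print(counts)                   # x1 survives 3 times, x2 and x3 once each
print(feature_weights(counts))  # x1: 0.6, x2: 0.2, x3: 0.2
print(draw_epoch_subset(feature_weights(counts), size=2, rng=random.Random(0)))
```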
+
+### Structural interpretation
+
+Because the base learners are directional and partition-based, the ensemble still reflects local structural decomposition rather than an opaque hidden representation.
+
+So interpretability is not lost entirely. It changes form:
+
+- less coefficient interpretation,
+- more structural, stability, and localization interpretation.
+
+That is often the right trade in nonlinear settings.
+
+---
+
+## Conceptual Summary
+
+This chapter completes the machine-learning progression.
+
+- Chapter 20 used recursive partitions for **unsupervised grouping**.
+- Chapter 21 used them for **numeric prediction**, with regression-point nearest-neighbor search in the multivariate case.
+- Chapter 22 used them for **categorical prediction**.
+- This chapter uses them in **aggregated form** to improve predictive stability and performance, with cross-validated optimization of the neighbor count $k$ as a central mechanism.
+
+The conceptual thread is unbroken.
+
+The same directional primitive that generated partial moments, dependence measures, causation, clustering, regression, and classification also supports ensemble learning — and the $k$ optimization in `NNS.stack` is a direct consequence of the regression-point nearest-neighbor prediction mechanism established in Chapter 21. Once prediction is understood as a nearest-neighbor search over compressed local conditional means rather than a piecewise-linear surface, it becomes natural that the ensemble layer would need to determine the right localization scale for that search.
+
+---
+
+## Summary
+
+This chapter developed ensemble learning in the NNS framework as **aggregation of directional nonparametric learners**.
+
+Its main contributions are sixfold.
+
+**First**, it explained **feature-subset screening based on predictive performance**. Rather than combining all sampled learners indiscriminately, `NNS.boost` retains those feature-defined learners whose validation performance passes a learned threshold.
+
+**Second**, it described the **weighted epoch sampling mechanism**. After the threshold-learning stage, feature indices are pooled with weights proportional to their survival frequency, and each epoch draws from this weighted pool. The epoch loop is therefore not uniform random exploration but structured search biased toward stable features — a performance-earned analog of the random-subspace method.
+
+**Third**, it introduced the **cross-validated optimization of $k$** in `NNS.stack`. Because multivariate NNS regression predicts through nearest-neighbor search over a compressed regression-point matrix — not through a piecewise-linear surface — the number of neighbors $k$ is a structural hyperparameter that directly controls the localization scale of prediction. `NNS.stack` optimizes $k$ fold by fold as part of its cross-validation loop, selecting the neighbor count that best generalizes on held-out data. This is one of the clearest ways `NNS.stack` goes beyond generic stacking: it actively discovers the right spatial scale for the underlying prediction mechanism.
+
+**Fourth**, it developed **dimension-reduction stacking with multiple dependence metrics**. Synthetic predictors may be constructed using linear correlation, nonlinear dependence, directional causation, equal weighting, or an average across all methods, and method 1 (regression-point nearest-neighbor) and method 2 (synthetic-index univariate regression) can be combined within the same stacked model.
+
+**Fifth**, it showed that NNS stacking is a form of **distance-aware meta-learning**. The ensemble can combine predictions using Euclidean, Manhattan, dynamic time warping, or factor-based geometry, and the chosen distance metric applies to the regression-point neighbor search.
+
+**Sixth**, it emphasized **cross-validation as the practical control of complexity** in nonparametric ensemble learning. Cross-validation simultaneously optimizes feature selection, neighbor count, and classification thresholds — not merely estimating out-of-sample error, but actively determining the structural configuration of the learner.
+
+Taken together, these results show that ensemble methods in NNS are not auxiliary add-ons. They are the natural machine-learning extension of the book's central principle:
+
+**start with directional structure, preserve it locally, and aggregate only afterward.**
+
+The next part of the book turns to **time series**, where the same nonparametric and directional principles are extended to temporal dependence, forecasting, and multivariate dynamics.
\ No newline at end of file
diff --git a/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd b/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd
new file mode 100644
index 0000000..6898eb1
--- /dev/null
+++ b/tools/NNS/book/chapter-25-nonparametric-time-series-models.Rmd
@@ -0,0 +1,714 @@
+# Nonparametric Time Series Models
+
+Part VII turns the directional framework toward **time**.
+
+Previous chapters developed nonparametric estimation, clustering, regression, classification, and ensemble learning using recursive partitioning, local averaging, and benchmark-relative structure. Those methods treated observations as unordered or cross-sectional. Time-series analysis adds a new constraint:
+
+**the observations arrive in sequence, and that ordering matters.**
+
+A time series is not merely a set of values. It is a structured sequence in which the past may influence the future, seasonal patterns may recur, and dependence may change across time.
+
+Classical time-series analysis addresses these problems through models such as **ARIMA**, **ETS**, and **state-space methods**. These are often effective, but they inherit familiar limitations:
+
+- linear autoregressive structure,
+- parametric error assumptions,
+- explicit stationarity requirements,
+- and model identification choices that must be imposed before estimation.
+
+The NNS framework approaches time series differently.
+
+At its heart, time-series modeling is treated as a **subset regression problem**: a sequence is decomposed into lagged component series, and future values are forecast by applying the same nonlinear nonparametric regression logic developed earlier in the book to those components. In this view, autoregression is not abandoned. It is generalized.
+
+Rather than beginning with a linear dynamic equation, NNS begins with a simpler principle:
+
+**forecast the future by learning the structure of the past without imposing a parametric law for that structure.**
+
+This chapter develops that idea.
+
+---
+
+## Time Series as Ordered Nonparametric Data
+
+Let
+
+\[
+\{X_t\}_{t=1}^T
+\]
+
+denote a real-valued time series.
+
+In classical analysis, a time series is often modeled through a difference equation such as
+
+\[
+X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t,
+\]
+
+possibly after differencing, detrending, or seasonal adjustment.
+
+That formulation assumes from the outset that the dynamic relation is linear in the lagged values.
+
+The directional framework takes a broader view.
+
+A time series can be written as a regression problem in which the response is the future observation and the predictors are functions of the past:
+
+\[
+X_t = f(X_{t-1}, X_{t-2}, \dots) + \varepsilon_t.
+\]
+
+The task is then to estimate the unknown dynamic map \(f\) nonparametrically.
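+
+In code, this regression view amounts to stacking lagged values into a design matrix. The Python helper below (a hypothetical name) builds the pairs for a fixed number of lags \(p\), and any nonparametric regressor can then be fit to them. This is the generic lagged-regression layout. It is not the component-series decomposition that the NNS framework builds on.
+
```python
def lagged_design(x, p):
    """Build (predictors, response) pairs for X_t = f(X_{t-1}, ..., X_{t-p}) + e_t."""
    rows, response = [], []
    for t in range(p, len(x)):
        rows.append([x[t - lag] for lag in range(1, p + 1)])  # [X_{t-1}, ..., X_{t-p}]
        response.append(x[t])
    return rows, response

rows, y = lagged_design([5, 7, 6, 8, 9], p=2)
print(rows)  # [[7, 5], [6, 7], [8, 6]]
print(y)     # [6, 8, 9]
```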
+
+This places time-series analysis inside the same framework developed in Chapters 19–24:
+
+- identify informative local structure,
+- partition or decompose the data,
+- estimate conditionally,
+- and aggregate only afterward.
+
+The only additional element is temporal order.
+
+---
+
+## Why Classical Time-Series Models Can Fail
+
+The central models of classical forecasting are powerful, but they are built around structural assumptions that many real series violate.
+
+### Linearity
+
+ARIMA and related autoregressive models assume that the next observation depends linearly on lagged values, perhaps after transformation. But many series exhibit threshold effects, asymmetric responses, cyclical distortions, or nonlinear seasonal interactions.
+
+### Stationarity requirements
+
+The Box–Jenkins framework is built around stationarity. In practice, many observed series are not stationary in level, variance, or seasonal structure. Transformations and differencing may help, but they also alter the object being modeled.
+
+### Parametric identification
+
+Classical modeling requires choosing model orders, differencing levels, seasonal terms, and error structures. These decisions can materially change the forecast.
+
+### Symmetric error treatment
+
+Least-squares fitting treats positive and negative forecast errors symmetrically, even in contexts where underprediction and overprediction have different consequences.
+
+The directional nonparametric approach seeks to preserve the useful idea of autoregression while relaxing these restrictions.
+
+---
+
+## Autoregression as a Subset Regression Problem
+
+The NNS time-series framework begins from a simple decomposition.
+
+Suppose a series exhibits a seasonal or cyclical lag \(m\). Then observations separated by that lag belong to a common component series:
+
+\[
+\{X_1, X_{1+m}, X_{1+2m}, \dots\},
+\quad
+\{X_2, X_{2+m}, X_{2+2m}, \dots\},
+\quad \dots
+\quad
+\{X_m, X_{2m}, X_{3m}, \dots\}.
+\]
+
+Each component series is itself a smaller time series indexed by occurrence number within that phase.
+
+Forecasting then becomes a regression problem on each component series separately.
+
+For a given component series with index vector
+
+\[
+z = 1,2,\dots,n_j
+\]
+
+and values
+
+\[
+y^{(j)}_1, y^{(j)}_2, \dots, y^{(j)}_{n_j},
+\]
+
+we estimate the next value through either linear or nonlinear regression:
+
+\[
+y^{(j)}_{n_j+1} \approx \hat f_j(n_j+1).
+\]
+
+The final forecast aggregates these component forecasts using weights determined by predictive strength.
+
+This is why time series in NNS are best viewed as a subset regression problem:
+
+- the original series is partitioned into lag-defined subsets,
+- each subset is modeled with NNS regression,
+- and the subset forecasts are recombined.
+
+Autoregression is therefore retained, but its mechanism is generalized from linear lag equations to **nonparametric lag-structure estimation**.
+
+A small implementation detail is worth noting. If the total series length \(T\) is not an exact multiple of \(m\), then the component series need not all have the same length. Some phases will contain one more observation than others. This creates no conceptual difficulty, but it matters in practice because shorter component series provide less information and therefore should generally receive less effective influence in the forecast aggregation.
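+
+The decomposition, including the unequal-length case, takes only a few lines. This Python sketch fits an ordinary least-squares line on the occurrence index, the simplest of the linear or nonlinear options mentioned above; NNS would substitute its nonparametric regression. The function names are hypothetical. The short alternating series has \(T=9\), so it is not a multiple of \(m=2\).
+
```python
def component_series(x, m):
    """Split x into m phase-specific subseries: x[j], x[j+m], x[j+2m], ... (0-based j)."""
    return [x[j::m] for j in range(m)]

def next_value_linear(component):
    """Forecast the next element of one component by least squares on its index 1..n."""
    n = len(component)
    z = list(range(1, n + 1))
    z_bar, y_bar = sum(z) / n, sum(component) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    if sxx == 0:  # a single observation: no slope can be estimated
        return component[0]
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, component)) / sxx
    return y_bar + slope * (n + 1 - z_bar)

x = [10, 18, 11, 21, 12, 20, 13, 23, 14]  # T = 9 is not a multiple of m = 2
parts = component_series(x, 2)
print([len(p) for p in parts])  # [5, 4]: the first phase has one extra observation
phase = len(x) % 2              # 0-based phase of the next observation X_{T+1}
print(next_value_linear(parts[phase]))  # ≈ 24.0 (slope 1.4 on 18, 21, 20, 23)
```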
+
+---
+
+## Seasonal Decomposition Without Parametric Filters
+
+A major strength of the NNS approach is that seasonality is handled directly through component decomposition rather than through fixed harmonic terms or pre-imposed smoothing filters.
+
+Classical methods often represent seasonality through:
+
+- seasonal ARIMA operators,
+- trigonometric terms,
+- moving-average filters,
+- or exponential smoothing recursions.
+
+The NNS approach instead asks a simpler question:
+
+**At which lag lengths does the series become more predictable when split into component sequences?**
+
+This is operationalized through a seasonality test based on the **coefficient of variation** of each component series relative to the coefficient of variation of the full series.
+
+If a lag \(m\) produces component series with lower coefficients of variation than the original series, then the lag reveals recurring structure. Intuitively:
+
+- a lower component-series coefficient of variation means tighter local behavior,
+- tighter local behavior means greater predictability,
+- and greater predictability indicates seasonality or cyclic structure.
+
+Thus seasonality is not defined through a parametric frequency-domain object. It is defined through **predictive concentration in lag-defined subsets**.
+
+---
+
+## Seasonal Detection by Predictive Power
+
+Let the full series have coefficient of variation
+
+\[
+CV(X) = \frac{\sigma_X}{|\mu_X|},
+\]
+
+assuming the mean is nonzero.
+
+For a candidate lag \(m\), construct the \(m\) component series and summarize their predictive concentration by their coefficients of variation \(CV_1(m),\dots,CV_m(m)\).
+
+If these component coefficients are systematically lower than the overall series coefficient of variation, then the lag \(m\) is informative.
+
+The interpretation is immediate:
+
+- lower \(CV\) means less dispersion relative to level,
+- less dispersion means more stable phase-specific structure,
+- more stable phase-specific structure means improved forecastability.
+
+This gives a nonparametric test for seasonality grounded in prediction rather than in harmonic decomposition.
+
+The chapter’s conceptual point is broader than the specific diagnostic:
+
+**seasonality is treated as recurring conditional structure, not as a parametric periodic law.**
+
+### A worked illustration
+
+Consider the toy quarterly series
+
+\[
+X = (10, 18, 11, 21, 12, 20, 13, 23).
+\]
+
+The overall mean is
+
+\[
+\bar X = \frac{10+18+11+21+12+20+13+23}{8} = 16,
+\]
+
+and the sample standard deviation (with an \(n-1\) denominator) is
+
+\[
+s_X = \sqrt{\tfrac{180}{7}} \approx 5.071,
+\]
+
+since the squared deviations from 16 sum to \(36+4+25+25+16+16+9+49 = 180\).
+
+Hence the overall coefficient of variation is
+
+\[
+CV(X) \approx \frac{5.071}{16} \approx 0.317.
+\]
+
+Now test lag \(m=2\). The component series are
+
+\[
+(10,11,12,13)
+\quad\text{and}\quad
+(18,21,20,23).
+\]
+
+For the first component,
+
+\[
+\bar X_1 = 11.5,\qquad s_1 \approx 1.291,\qquad CV_1 \approx \frac{1.291}{11.5}\approx 0.112.
+\]
+
+For the second,
+
+\[
+\bar X_2 = 20.5,\qquad s_2 \approx 2.082,\qquad CV_2 \approx \frac{2.082}{20.5}\approx 0.102.
+\]
+
+Both component coefficients of variation are far below the overall value \(0.317\). That means the lag-2 decomposition produces tighter, more internally coherent subseries than the original series. In the NNS interpretation, lag \(2\) reveals meaningful recurring structure.
+
+By contrast, consider lag \(m=3\). The three component series are
+
+\[
+(10,21,13),\qquad (18,12,23),\qquad (11,20).
+\]
+
+These exhibit much larger within-component variation. Their coefficients of variation are approximately \(0.388\), \(0.312\), and \(0.411\), so two of the three exceed the full-series value and none comes close to the lag-2 values. In this case lag \(3\) is far less predictive than lag \(2\).
+
+This simple example shows exactly how the test works in practice. One computes the overall coefficient of variation, computes the component-series coefficients of variation for each candidate lag, and prefers those lags for which the component series become materially tighter than the original sequence.
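+
+The same calculation can be reproduced in Python. The sketch uses the sample standard deviation with an \(n-1\) denominator (`statistics.stdev`) and takes component \(j\) as the slice `x[j::m]`. The helper names are hypothetical.
+
```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation (n - 1) over |mean|."""
    return statistics.stdev(values) / abs(statistics.mean(values))

def component_cvs(x, m):
    """CV of each of the m phase-specific component series for lag m."""
    return [cv(x[j::m]) for j in range(m)]

x = [10, 18, 11, 21, 12, 20, 13, 23]
print(round(cv(x), 3))                              # 0.317
print([round(c, 3) for c in component_cvs(x, 2)])  # [0.112, 0.102]
print([round(c, 3) for c in component_cvs(x, 3)])  # [0.388, 0.312, 0.411]
```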
+
+Mathematically, the logic is straightforward: if
+
+\[
+CV_j(m) < CV(X)
+\quad \text{for component series } j=1,\dots,m,
+\]
+
+then conditioning on phase within lag \(m\) reduces relative dispersion. Reduced relative dispersion means more concentrated conditional behavior, and more concentrated conditional behavior implies improved forecastability.
+
+In applications one usually needs an operational aggregation rule. A natural choice is a weighted average of component coefficients of variation,
+
+\[
+\overline{CV}(m)
+=
+\sum_{j=1}^{m} w_j\,CV_j(m),
+\qquad
+w_j \ge 0,
+\qquad
+\sum_{j=1}^{m} w_j = 1,
+\]
+
+where \(w_j\) may be proportional to component length or to some predictive-strength measure. Then lag \(m\) is favored when
+
+\[
+\overline{CV}(m) < CV(X).
+\]
+
+An even stricter rule requires most, or all, component CVs to lie below the full-series CV. The exact rule is a modeling choice, but the guiding principle is unchanged: a good seasonal lag is one that produces tighter conditional distributions than the unpartitioned series.
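+
+The length-weighted version of this rule, scanned over several candidate lags, can be sketched as follows. The weights \(w_j = n_j/T\) and the requirement of at least two observations per component are assumptions for illustration. The helper names are hypothetical.
+
```python
import statistics

def cv(values):
    """Sample coefficient of variation; needs at least two values and a nonzero mean."""
    return statistics.stdev(values) / abs(statistics.mean(values))

def weighted_component_cv(x, m):
    """Length-weighted average of component CVs for lag m (weights w_j = n_j / T)."""
    parts = [x[j::m] for j in range(m)]
    return sum(len(p) / len(x) * cv(p) for p in parts)

def favored_lags(x, max_lag):
    """Map each lag whose weighted component CV is below the full-series CV to that value."""
    overall = cv(x)
    # Every component needs at least two observations for a sample standard deviation.
    candidates = [m for m in range(2, max_lag + 1) if len(x) // m >= 2]
    scores = {m: weighted_component_cv(x, m) for m in candidates}
    return {m: round(s, 3) for m, s in scores.items() if s < overall}

x = [10, 18, 11, 21, 12, 20, 13, 23]
print(favored_lags(x, max_lag=4))  # {2: 0.107, 4: 0.096}
```
+
+Both lag 2 and lag 4 pass on the toy series, because a multiple of a genuine period also produces tight components. In practice the rule therefore flags a family of related lags. How much each one contributes must then be decided, either by the analyst or by a later weighting step.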
+
+A technical caveat is also important. If \(\mu_X \approx 0\), then \(CV(X)\) can become unstable because the denominator is near zero. In such cases the analyst may instead compare component standard deviations directly, center the series at a more stable scale, or regularize the denominator by adding a small constant. The predictive logic remains the same even if the raw coefficient of variation is numerically unreliable.
+
+---
+
+## Multiple Seasonalities
+
+Many real series have more than one recurring period.
+
+Examples include:
+
+- monthly data with annual and multi-year cycles,
+- hourly data with daily and weekly cycles,
+- sales data with weekly, monthly, and promotional rhythms.
+
+Classical models often struggle when multiple seasonalities interact, especially when the interactions are nonlinear or when the seasonal periods are not cleanly nested.
+
+The NNS framework handles this naturally by allowing multiple candidate lags:
+
+\[
+m_1, m_2, \dots, m_K.
+\]
+
+Each lag defines its own component decomposition and its own forecast. These lag-specific forecasts can then be combined using weights reflecting both:
+
+- the predictive tightness of the corresponding component series,
+- and the amount of information available within each lag structure.
+
+This is a central advantage of the nonparametric formulation: multiple seasonal patterns need not be forced into one rigid dynamic equation. They can be estimated as separate predictive structures and then aggregated.
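+
+One way to combine lag-specific forecasts is sketched below. Each lag's weight is the length of the relevant component divided by its coefficient of variation. This is a hypothetical choice that encodes the two criteria above, tightness and amount of information; it is not the NNS weighting scheme. The linear extrapolation stands in for nonparametric regression.
+
```python
import statistics

def cv(values):
    """Sample coefficient of variation (n - 1 standard deviation over |mean|)."""
    return statistics.stdev(values) / abs(statistics.mean(values))

def linear_next(values):
    """Least-squares line on index 1..n, evaluated at n + 1."""
    n = len(values)
    z_bar, y_bar = (n + 1) / 2, statistics.mean(values)
    sxx = sum((z - z_bar) ** 2 for z in range(1, n + 1))
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(range(1, n + 1), values))
    return y_bar + sxy / sxx * (n + 1 - z_bar)

def combined_forecast(x, lags):
    """Blend lag-specific forecasts of the next value with hypothetical weights."""
    forecasts, weights = [], []
    for m in lags:
        component = x[len(x) % m::m]  # the phase that the next observation falls in
        forecasts.append(linear_next(component))
        weights.append(len(component) / cv(component))  # longer and tighter count more
    total = sum(weights)
    return sum(w / total * f for w, f in zip(weights, forecasts))

x = [10, 18, 11, 21, 12, 20, 13, 23]
print(combined_forecast(x, [2, 4]))  # both lags forecast 14 here, so the blend is ≈ 14
```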
+
+---
+
+## Nonlinear Autoregressive Structures
+
+The classical AR model is linear in lagged values. NNS replaces this with a more general dynamic relation.
+
+If \(X_t\) depends on past observations through a nonlinear rule, then there is no reason to insist that the forecast be generated by a straight line fitted to lagged values.
+
+Suppose, conceptually, that
+
+\[
+X_t = f(X_{t-m}) + \varepsilon_t
+\]
+
+for some unknown nonlinear function \(f\).
+
+The NNS framework estimates \(f\) by applying the nonparametric regression machinery from earlier chapters to each component series. This allows the method to capture:
+
+- turning points,
+- diminishing effects,
+- local curvature,
+- asymmetric phase behavior,
+- and regime-like transitions.
+
+The importance of this step cannot be overstated.
+
+A component series may itself be nonlinear even when the original series looks smooth. If the local phase-specific dynamics are nonlinear, a linear subseries regression can point in the wrong direction entirely. Nonparametric regression is therefore not a cosmetic addition. It is the mechanism that allows autoregression to remain autoregressive without remaining linear.
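+
+To see why a linear lag regression can fail, the sketch below simulates a noisy logistic map, \(X_t = 3.8\,X_{t-1}(1 - X_{t-1}) + \varepsilon_t\), and regresses \(X_t\) on \(X_{t-1}\) in two ways. Base R's `loess()` is used purely as a stand-in for a local nonparametric regression; it is not the NNS regression estimator.
+
+```r
+set.seed(2)
+n <- 400
+x <- numeric(n)
+x[1] <- 0.3
+for (i in 2:n) {
+  # logistic-map dynamics with small noise, clipped to stay inside (0, 1)
+  x[i] <- min(max(3.8 * x[i - 1] * (1 - x[i - 1]) + rnorm(1, sd = 0.01), 0.001), 0.999)
+}
+d <- data.frame(y = x[-1], lag1 = x[-n])    # aligned pairs (X_{t-1}, X_t)
+
+fit_lin <- lm(y ~ lag1, data = d)            # straight line in the lagged value
+fit_loc <- loess(y ~ lag1, data = d)         # local fit (stand-in for nonparametric regression)
+
+c(linear = mean(residuals(fit_lin)^2), local = mean(residuals(fit_loc)^2))
+```
+
+The linear fit can only report one average slope across the whole parabola, while the local fit follows the turning point near \(X_{t-1} = 0.5\).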
+
+---
+
+## Directional Temporal Dependence
+
+Time-series dependence is not merely contemporaneous dependence shifted through time. It has direction:
+
+- earlier values may help predict later values,
+- later values cannot influence earlier ones,
+- and positive versus negative deviations may propagate differently across time.
+
+Within the broader NNS framework, this suggests a temporal analogue of the co-partial-moment decomposition developed in Chapters 11, 12, and 14. For a lag \(\tau \ge 1\), one can study lagged co-partial moments formed from aligned pairs such as \((X_{t-\tau}, X_t)\), separating concordant movement from divergent movement across time.
+
+In that interpretation, one class of lagged moments captures persistence in the same directional regime, while another captures reversals between periods. The distinction is useful because many time series are dynamically asymmetric even when their unconditional summaries appear mild.
+
+For example:
+
+- volatility clusters after large shocks,
+- downturns may persist longer than upswings,
+- inventory shortages may propagate differently than surpluses,
+- and demand spikes may reverse more sharply than demand collapses.
+
+Directional temporal dependence therefore generalizes classical autocorrelation by preserving regime-specific information that linear autocovariance averages away.
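+
+The decomposition can be made concrete with degree-1 lagged co-partial moments. The helper below is a hypothetical base-R sketch, not the NNS implementation (NNS provides its own co-partial-moment functions with their own naming conventions), and it uses each lagged vector's mean as the target. Because \((a - \bar a)(b - \bar b)\) splits into four quadrant products, concordant minus divergent moments reproduces the ordinary lag-\(\tau\) autocovariance exactly, which is precisely the averaging that hides the directional detail.
+
+```r
+# Hypothetical helper: degree-1 co-partial moments of the pairs (X_{t-tau}, X_t)
+lagged_copm <- function(x, tau = 1) {
+  n <- length(x)
+  a <- x[1:(n - tau)]             # X_{t-tau}
+  b <- x[(1 + tau):n]             # X_t
+  up_a <- pmax(a - mean(a), 0); lo_a <- pmax(mean(a) - a, 0)
+  up_b <- pmax(b - mean(b), 0); lo_b <- pmax(mean(b) - b, 0)
+  c(concordant_up   = mean(up_a * up_b),   # above, then above: upward persistence
+    concordant_down = mean(lo_a * lo_b),   # below, then below: downward persistence
+    divergent_up    = mean(lo_a * up_b),   # below, then above: upward reversal
+    divergent_down  = mean(up_a * lo_b))   # above, then below: downward reversal
+}
+
+set.seed(3)
+x <- cumsum(rnorm(200))
+m <- lagged_copm(x, tau = 1)
+
+# Collapsing the four terms recovers the (population-scaled) lag-1 autocovariance
+sum(m[1:2]) - sum(m[3:4])
+```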
+
+It is important, however, to place this idea correctly within the NNS framework. In the current univariate forecasting routines, time dependence is operationalized primarily through lag-defined component regression and seasonality detection, not through a standalone directional-autocorrelation statistic. The lagged co-partial-moment construction belongs most naturally to the theory developed for asymmetric dependence and causation, where temporal ordering is analyzed explicitly rather than only through forecast generation. Readers can map this directly to Chapters 11, 12, and 14, which supply, respectively, asymmetric directional dependence, copula-space normalization intuition, and directional-causation asymmetry.
+
+---
+
+## Forecasting from Component Regressions
+
+The NNS forecasting workflow can now be stated clearly.
+
+### Step 1: Select candidate seasonal lags
+
+Identify one or more plausible lag lengths, either from domain knowledge or from the predictive seasonality test.
+
+### Step 2: Form component series
+
+For each lag \(m\), partition the original series into \(m\) phase-specific subseries.
+
+### Step 3: Regress each component forward
+
+For each component series, estimate the next observation using either:
+
+- linear regression,
+- nonlinear NNS regression,
+- both, or
+- mean-based shrinkage variants.
+
+### Step 4: Weight and aggregate
+
+Combine component forecasts using weights that reflect predictive concentration and sample support.
+
+### Step 5: Iterate if forecasting multiple steps ahead
+
+For multi-step forecasting, append the newly predicted value and repeat. Seasonal factors may be kept fixed or updated dynamically as the forecast path evolves.
+
+This procedure preserves the defining property of autoregression: the forecast is still generated from the series’ own past.
+
+But it does so without imposing stationarity, without requiring Box–Jenkins identification, and without restricting the lag relation to a linear map.
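+
+The five steps can be sketched compactly in base R. Everything here is a hypothetical simplification, not the `NNS.ARMA` implementation: Step 3 uses only linear regression of each component on its own index, and Step 4 weights each lag by component length divided by component standard deviation, with a small constant guarding against a zero denominator.
+
+```r
+# Hypothetical one-step forecast combining several seasonal lags
+forecast_one <- function(x, lags) {
+  fc <- numeric(0); w <- numeric(0)
+  n <- length(x)
+  for (m in lags) {
+    # Step 2: the component that shares the phase of time T + 1
+    comp <- x[rev(seq(n + 1 - m, 1, by = -m))]
+    k <- seq_along(comp)
+    # Step 3: regress the component on its own index and project one step forward
+    fc <- c(fc, unname(predict(lm(comp ~ k), data.frame(k = length(comp) + 1))))
+    # Step 4: weight by sample support and inverse dispersion (assumed rule)
+    w <- c(w, length(comp) / (sd(comp) + 1e-8))
+  }
+  sum(w * fc) / sum(w)
+}
+
+# Step 5: append each forecast and repeat, holding the lags fixed
+forecast_path <- function(x, lags, h) {
+  for (i in seq_len(h)) x <- c(x, forecast_one(x, lags))
+  tail(x, h)
+}
+
+x <- rep(c(1, 5, 3, 8), 10)         # a noiseless series with period 4
+forecast_path(x, lags = c(4, 8), h = 4)
+```
+
+For this noiseless input the path simply continues the cycle, returning 1, 5, 3, 8 up to floating-point rounding.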
+
+A brief clarification of the mean-based option is useful. In some component series the fitted regression may be unstable because the component is short, noisy, or nearly flat. In that case a practical alternative is to shrink the regression estimate toward the component mean, or even to use the component mean directly. This sacrifices some responsiveness in exchange for stability. Conceptually, it is a local bias-variance tradeoff: when the estimated slope or nonlinear fit is unreliable, the component average can act as a robust anchor.
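+
+One simple way to implement this shrinkage is a convex combination of the regression forecast and the component mean, placing more weight on the regression as the component grows. The rule \(\lambda = n_c / (n_c + k_0)\) and the helper name are hypothetical illustrations, not the NNS default.
+
+```r
+# Hypothetical helper: shrink a component's regression forecast toward its mean
+shrink_forecast <- function(comp, k0 = 5) {
+  n_c <- length(comp)
+  idx <- seq_len(n_c)
+  reg <- if (n_c >= 3) {
+    unname(predict(lm(comp ~ idx), data.frame(idx = n_c + 1)))  # regression forecast
+  } else {
+    mean(comp)                                                    # too short to fit a slope
+  }
+  lambda <- n_c / (n_c + k0)         # more observations -> more trust in the regression
+  lambda * reg + (1 - lambda) * mean(comp)
+}
+
+comp <- 1:10
+shrink_forecast(comp, k0 = 0)     # pure regression: 11
+shrink_forecast(comp, k0 = 10)    # 0.5 * 11 + 0.5 * 5.5 = 8.25
+```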
+
+```r
+# Univariate nonlinear ARMA on a noisy, standardized sine wave
+library(NNS)
+
+z <- as.numeric(scale(sin(1:480 / 8) + rnorm(480, sd = 0.35)))
+
+# Seasonality detection: candidate periods ranked by coefficient of variation
+seasonal_period <- NNS.seas(z, plot = FALSE)
+head(seasonal_period$all.periods)
+
+## Period Coefficient.of.Variation Variable.Coefficient.of.Variation
+## 1 99 0.5122054 1.168502e+17
+## 2 147 0.5256021 1.168502e+17
+## 3 100 0.5598477 1.168502e+17
+## 4 146 0.5618687 1.168502e+17
+## 5 199 0.5766158 1.168502e+17
+## 6 98 0.5801409 1.168502e+17
+
+# The enormous Variable.Coefficient.of.Variation arises because scale() centers z
+# at (numerically) zero: the near-zero-mean case discussed earlier in this chapter.
+
+# Validate seasonal periods and forecast 48 steps ahead
+NNS.ARMA.optim(z, h = 48, seasonal.factor = seasonal_period$periods, plot = TRUE, ncores = 1)
+```
+
+---
+
+## Dynamic Updating and Recursive Forecast Paths
+
+A one-step forecast is rarely the end goal. In practice, analysts often require
+
+\[
+h = 1,2,\dots,H
+\]
+
+steps ahead.
+
+In the NNS framework, multi-step forecasting proceeds recursively.
+
+If \(\hat X_{T+1}\) is forecast first, then it is appended to the series and treated as part of the evolving path when forecasting \(\hat X_{T+2}\), and so on.
+
+This creates two natural modes.
+
+### Static seasonal structure
+
+The seasonal lags and weights are estimated once from the historical sample and then held fixed for all future steps.
+
+### Dynamic seasonal structure
+
+The seasonal structure is recomputed as the forecast path grows, allowing the decomposition itself to evolve.
+
+The static approach favors stability.
+The dynamic approach favors adaptability.
+
+But that distinction can be made more precise.
+
+Static updating is generally preferable when:
+
+- the dominant seasonal pattern is well known in advance,
+- the series is long enough that seasonal weights are already stable,
+- the forecast horizon is short relative to the seasonal period,
+- or the analyst values interpretability and reproducibility over rapid adaptation.
+
+In such cases, recomputing the lag structure at every step may add noise rather than information. If the underlying periodic structure is persistent, then a fixed decomposition acts as a stabilizer.
+
+Dynamic recomputation is preferable when the data suggest that the seasonal structure itself is moving. Typical signals include:
+
+- abrupt level shifts,
+- changing amplitudes of recurring cycles,
+- newly emerging or fading periodicities,
+- strong structural breaks,
+- or forecast errors that begin to cluster by phase.
+
+For example, a retail series may historically be dominated by an annual pattern, yet after a major platform change or supply shock, shorter promotional cycles may become more predictive than the old annual rhythm. In that case, holding the original seasonal factor fixed can lock the forecast into an outdated regime. Dynamic updating allows the decomposition to respond as the series evolves.
+
+So the practical decision is not merely “stability versus adaptability.” It is a question of whether the analyst believes the lag structure is itself part of the stable signal or part of the changing environment.
+
+A useful empirical guide is out-of-sample validation. One may reserve a holdout period, compare static and dynamic forecasts over that window, and select the updating rule that yields better predictive accuracy. In that sense, the static-versus-dynamic decision is not purely philosophical. It can be treated as a forecasting design choice subject to cross-validation.
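+
+A holdout comparison of the two updating rules might look like the following sketch. All helpers are hypothetical. `best_lag()` picks the candidate lag with the smallest average within-phase standard deviation (standard deviations rather than CVs, following the caveat about near-zero means), `next_value()` projects the matching component forward by linear regression, and the simulated series is an assumption.
+
+```r
+best_lag <- function(x, candidates) {
+  score <- sapply(candidates, function(m) {
+    mean(sapply(split(x, rep_len(seq_len(m), length(x))), sd))  # within-phase dispersion
+  })
+  candidates[which.min(score)]
+}
+
+next_value <- function(x, m) {
+  n <- length(x)
+  comp <- x[rev(seq(n + 1 - m, 1, by = -m))]    # component sharing the phase of T + 1
+  k <- seq_along(comp)
+  unname(predict(lm(comp ~ k), data.frame(k = length(comp) + 1)))
+}
+
+recursive_forecast <- function(x, candidates, h, dynamic = FALSE) {
+  m <- best_lag(x, candidates)                   # static: chosen once from history
+  for (i in seq_len(h)) {
+    if (dynamic) m <- best_lag(x, candidates)    # dynamic: re-chosen on the growing path
+    x <- c(x, next_value(x, m))
+  }
+  tail(x, h)
+}
+
+set.seed(4)
+y <- 20 + 3 * sin(2 * pi * (1:144) / 12) + rnorm(144, sd = 0.5)
+train <- y[1:120]
+holdout <- y[121:144]
+candidates <- c(4, 6, 12)
+
+mae <- function(f) mean(abs(f - holdout))
+c(static  = mae(recursive_forecast(train, candidates, 24)),
+  dynamic = mae(recursive_forecast(train, candidates, 24, dynamic = TRUE)))
+```
+
+The rule with the lower holdout error would be preferred for this series; on real data the comparison is more trustworthy when repeated over several forecast origins.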
+
+---
+
+## Prediction Intervals for Forecasts
+
+Point forecasts are only one part of the forecasting problem. Analysts also need measures of uncertainty.
+
+Because the NNS framework is nonparametric, forecast intervals are constructed without Gaussian error assumptions. Instead, uncertainty can be propagated using the maximum entropy bootstrap machinery developed in Chapter 17.
+
+The logic is straightforward:
+
+1. generate replicates consistent with the forecast path and dependence structure,
+2. compute the implied distribution of future outcomes,
+3. extract lower and upper predictive bounds from directional quantiles.
+
+This produces prediction intervals that are aligned with the empirical distributional shape of the series rather than with a parametric error law.
+
+Thus the directional framework provides not only nonlinear point forecasts, but also **distribution-free forecast uncertainty quantification**.
+
+A bit more concretely, suppose the fitted model yields a forecast path
+
+\[
+\hat X_{T+1}, \dots, \hat X_{T+H}.
+\]
+
+The bootstrap procedure does not assume i.i.d. Gaussian residuals around that path. Instead, it constructs synthetic continuations that preserve the rank structure and dependence features of the observed series as closely as possible. Each bootstrap replicate yields an alternative future trajectory, and the collection of such trajectories forms an empirical predictive distribution at each horizon.
+
+If the series is asymmetric, heavy-tailed, or exhibits occasional bursts, those features can appear in the predictive distribution instead of being averaged away by a normal approximation. Directional lower and upper tail functionals can then be used to extract forecast bands. In this sense, the interval forecast is not an accessory to the point forecast. It is the distributional analogue of the same nonparametric logic: preserve the observed structure first, summarize uncertainty second.
+
+In practice, one might generate a large number of bootstrap replicates, often on the order of hundreds or thousands, then evaluate the empirical future distribution at each horizon. If a central \(100(1-\alpha)\%\) interval is desired, the lower and upper bounds can be extracted from directional quantiles corresponding to \(\alpha/2\) and \(1-\alpha/2\). The exact number of replicates is an accuracy-versus-computation choice, but the principle is always the same: empirical resampled paths replace parametric error formulas.
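+
+The quantile-extraction step can be sketched as follows. The replicate paths here come from naive resampling of hypothetical heavy-tailed residuals around a hypothetical point path. This resampling is a stand-in for illustration only and is not the maximum entropy bootstrap, and base R's `quantile()` stands in for the directional quantiles.
+
+```r
+set.seed(5)
+H <- 12; R <- 1000; alpha <- 0.10
+point <- 10 + 0.1 * seq_len(H)                  # hypothetical point-forecast path
+resid <- rt(300, df = 3)                        # hypothetical heavy-tailed residual history
+
+# R x H matrix: each row is one simulated future trajectory (stand-in resampling)
+paths <- t(replicate(R, point + sample(resid, H, replace = TRUE)))
+
+# Per-horizon bounds of a central 100(1 - alpha)% interval
+bands <- apply(paths, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
+round(bands, 2)
+```
+
+Each column of `bands` holds the lower and upper bound for one horizon; heavier-tailed residuals widen the bands without any Gaussian assumption.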
+
+---
+
+## Relation to Earlier NNS Chapters
+
+Time-series modeling in NNS is not an isolated technique. It is a direct extension of earlier ideas.
+
+### From Chapter 19
+
+The forecast engine is built from recursive conditional estimation.
+
+### From Chapter 22
+
+The local regression on component series inherits the data-adaptive bandwidth logic of partition estimators. Forecasting is then regression in which the predictor is the lagged time index or lagged structure and the response is the next component value.
+
+### From Chapter 24
+
+Aggregation across multiple lag structures is an ensemble of directional learners.
+
+So the time-series framework is not an exception to the book’s theory. It is one of its most natural applications.
+
+---
+
+## Comparison with ARIMA
+
+ARIMA remains one of the benchmark tools of time-series forecasting.
+
+Its strengths are well known:
+
+- interpretable lag operators,
+- strong theory under stationarity,
+- effective performance on linear stochastic dynamics.
+
+But its limitations are equally clear when viewed through the directional lens.
+
+### Structural form
+
+ARIMA assumes a linear dependence structure after differencing. NNS does not.
+
+### Stationarity
+
+ARIMA is built around stationarity and invertibility conditions. NNS forecasting does not require the series to satisfy a stationary parametric model in level.
+
+### Identification burden
+
+ARIMA requires order selection and specification diagnostics. NNS shifts the task from parametric identification to predictive lag decomposition.
+
+### Nonlinearity
+
+ARIMA can approximate some nonlinear behavior through transformations or hybridization, but nonlinearity is not native to the model. In NNS, it is native.
+
+This does not mean ARIMA is obsolete. It means that ARIMA is best understood as a special, linear, tightly structured case of a broader forecasting problem.
+
+For balance, however, NNS does not eliminate modeling choices. It replaces ARIMA’s order-identification problem with choices about lag selection, regression method, aggregation weights, and updating scheme. The difference is methodological rather than absolute: in NNS, these choices are naturally evaluated by predictive performance rather than by adherence to a pre-specified parametric identification protocol.
+
+---
+
+## Comparison with ETS Models
+
+ETS methods model time series through combinations of
+
+- error,
+- trend,
+- and seasonality,
+
+typically using exponential smoothing recursions and state-space interpretations.
+
+These methods are often highly effective, especially on business forecasting problems with stable level, trend, and seasonal components.
+
+Relative to ETS, the NNS approach differs in several ways.
+
+### Component meaning
+
+ETS decomposes the series into latent level, trend, and seasonality states.
+NNS decomposes it into lag-defined predictive subsets.
+
+### Smoothing mechanism
+
+ETS uses recursive smoothing equations.
+NNS uses local regression and weighted aggregation across component series.
+
+### Parametric structure
+
+ETS specifies an updating architecture in advance.
+NNS lets the predictive structure emerge from the data.
+
+### Nonlinear interactions
+
+ETS can adapt smoothly, but it is not inherently designed for rich nonlinear autoregressive geometry.
+NNS is.
+
+The practical difference is conceptual:
+
+ETS smooths a presumed component architecture.
+NNS learns a predictive architecture from subset behavior.
+
+A further distinction is distributional. Many ETS formulations are estimated in likelihood-based frameworks tied to specific error models, often Gaussian or close variants. The NNS approach imposes no such distributional law on the series or on the forecast errors.
+
+---
+
+## When the NNS Approach Is Especially Useful
+
+The nonparametric time-series framework is particularly attractive when one or more of the following hold:
+
+- the series exhibits nonlinear cyclic behavior,
+- multiple seasonalities are present,
+- model stationarity is doubtful,
+- lag effects are structurally asymmetric,
+- parametric identification is fragile,
+- or prediction accuracy matters more than adherence to a classical stochastic specification.
+
+Examples include:
+
+- retail and transaction flows,
+- cyclical economic indicators,
+- energy demand,
+- financial and commodity time series,
+- and operational processes with threshold-driven dynamics.
+
+These are precisely the settings where local structure matters more than global parametric elegance.
+
+---
+
+## Limitations
+
+A nonparametric forecasting method is not free of tradeoffs.
+
+### Primarily univariate in this chapter
+
+The methods developed here focus on a single series. Cross-series interactions are deferred to Chapter 26.
+
+### Data requirements
+
+Because prediction is learned from historical structure, sparse component series may limit reliability for very large lag lengths.
+
+### Computational cost
+
+Searching many seasonal combinations and fitting nonlinear regressions can be more computationally intensive than fitting a simple linear ARIMA.
+
+### Interpretability of dynamics
+
+A fitted ARIMA equation provides direct coefficients. NNS instead provides a predictive mechanism based on component regressions and weights. This is often more flexible, but less compact as a closed-form law.
+
+These limitations are real. But they are the cost of avoiding the stronger assumptions of parametric time-series models.
+
+The multivariate extension is natural in principle, but not automatic in implementation. Once several series enter, one must distinguish self-dependence from cross-dependence, align potentially different frequencies, and account for lead-lag structure across variables.
+
+---
+
+## Leakage-Safe Backtesting Protocol
+
+Forecasting performance must be assessed with strict time-order preservation. A leakage-safe protocol is:
+
+- **Rolling-origin evaluation**: choose an initial training window \([1, T_0]\).
+- **Forecast horizon definition**: for each origin \(t\), produce forecasts for \(t+h\) without using observations after \(t\).
+- **Expanding or sliding refit**:
+  - expanding window: train on \([1, t]\), or
+  - sliding window: train on \([t-w+1, t]\).
+- **No future-informed preprocessing**: any scaling, interpolation, imputation, or feature construction must be computed using data available at origin \(t\) only.
+- **Horizon-specific scoring**: report MAE, RMSE, and coverage separately for each horizon \(h\) rather than pooling all horizons.
+- **Interval calibration check**: compare nominal and empirical coverage of prediction intervals across horizons.
+
+For seasonal component construction, lag selection must also be origin-specific; selecting global lags from the full sample before backtesting constitutes leakage.
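+
+The protocol can be sketched in base R. Both helpers are hypothetical. `rolling_origin()` refits on an expanding window at every origin and reports MAE by horizon, and `snaive_selected()` is a deliberately simple seasonal naive forecaster whose lag is chosen inside the function from the training window only, so lag selection stays origin-specific.
+
+```r
+rolling_origin <- function(y, T0, H, forecaster) {
+  origins <- T0:(length(y) - H)
+  err <- matrix(NA_real_, nrow = length(origins), ncol = H)
+  for (i in seq_along(origins)) {
+    t0 <- origins[i]
+    train <- y[1:t0]                           # expanding window: data up to the origin only
+    fc <- forecaster(train, H)                 # any lag selection happens on train alone
+    err[i, ] <- abs(fc - y[(t0 + 1):(t0 + H)])
+  }
+  colMeans(err)                                # MAE reported separately for each horizon
+}
+
+snaive_selected <- function(train, H, candidates = c(4, 6, 12)) {
+  disp <- sapply(candidates, function(m) {
+    mean(sapply(split(train, rep_len(seq_len(m), length(train))), sd))
+  })
+  m <- candidates[which.min(disp)]             # origin-specific lag choice
+  n <- length(train)
+  train[n - m + ((seq_len(H) - 1) %% m) + 1]   # repeat the last observed cycle
+}
+
+set.seed(6)
+y <- 50 + 5 * sin(2 * pi * (1:144) / 12) + rnorm(144)
+rolling_origin(y, T0 = 96, H = 6, forecaster = snaive_selected)
+```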
+
+---
+
+
+## Summary
+
+This chapter developed nonparametric time-series modeling in the NNS framework.
+
+Its main contributions are fivefold.
+
+First, it reframed **time series as a subset regression problem**. Forecasting is treated as conditional estimation on lag-defined component series rather than as fitting a single global linear recursion.
+
+Second, it developed **seasonal decomposition by predictive concentration**. Seasonal structure is detected through reductions in coefficient of variation across component series, linking seasonality directly to forecastability.
+
+Third, it established **nonlinear autoregressive forecasting**. Component series are projected forward using nonparametric regression, allowing local curvature and asymmetric dynamics to enter the forecast natively.
+
+Fourth, it clarified how the book’s broader dependence framework extends into time. Lagged directional structure can be studied theoretically through co-partial-moment ideas, even though the chapter’s main forecasting routines operationalize time dependence through component regression and seasonality rather than through a standalone directional-autocorrelation statistic.
+
+Fifth, it clarified the chapter’s relationship to classical methods. ARIMA and ETS remain important special-purpose tools, but they impose structural assumptions that the NNS framework avoids.
+
+Taken together, these results show that the directional nonparametric framework extends naturally from cross-sectional estimation to temporal prediction.
+
+But the framework developed here remains fundamentally univariate. Once other series matter, the main difficulty is no longer just whether the past of \(X_t\) predicts its future, but whether lagged values of \(Y_t\), \(Z_t\), and other related processes alter that forecast in nonlinear and asymmetric ways. In that setting, univariate decomposition can miss cross-series lead-lag effects, common shocks, and mixed-frequency structure.
+
+The next chapter therefore generalizes the same ideas to **multivariate forecasting**, where multiple time series interact through directional dependence, lagged cross-variable structure, and mixed-frequency information.
+
+> **Further Reading / Examples**
+
+> For forecasting applications, including the tidal data example, see the [NNS Time-Series Forecasting Examples](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/README.md#4-time-series-forecasting). Seasonal decomposition in practice is illustrated in the [tidal forecasting example](https://htmlpreview.github.io/?https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/tides.html), where the decomposition captures the dominant 12-month cycle.
+
+> For prediction-interval calibration under nonstationarity — `NNS.ARMA.optim` benchmarked against conformal-prediction methods on coverage and the Winkler interval score — see the [time-series prediction-interval benchmark](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/nns_arma_conformal_benchmark_report.md).
diff --git a/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd b/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd
new file mode 100644
index 0000000..908718a
--- /dev/null
+++ b/tools/NNS/book/chapter-26-multivariate-forecasting.Rmd
@@ -0,0 +1,663 @@
+# Multivariate Forecasting
+
+The previous chapter developed the NNS approach to univariate forecasting. A single series can be projected forward by treating future values as a nonlinear autoregression problem, estimating the relation between current and lagged values without imposing a fixed parametric law, and then extrapolating the series using directional nonparametric structure.
+
+But many forecasting problems are not univariate.
+
+Macroeconomic indicators move together.
+Financial variables transmit shocks across markets.
+Operational systems contain multiple indicators observed at different frequencies.
+In all such cases, the future of one variable depends not only on its own past, but also on the lagged history of other variables.
+
+This chapter extends the forecasting framework from one time series to a **system of time series**.
+
+The package implementation is `NNS.VAR`, a **nonparametric vector autoregressive model incorporating `NNS.ARMA` estimates of variables into `NNS.reg` for a multivariate time-series forecast**. Its purpose is explicit: combine univariate nonlinear time-series forecasting with multivariate nonlinear regression so that each series can borrow information from the others while still retaining its own temporal structure.
+
+Whenever this chapter appeals to mean-split regression behavior (local refinement with occupancy control), the theoretical reference point is Chapter 18: the consistency conditions for recursive mean-split estimation under shrinking local diameters and growing local sample support.
+
+A second implementation, `NNS.nowcast`, wraps this framework for mixed-frequency macroeconomic forecasting and nowcasting, using a built-in panel of Federal Reserve-style indicators and passing them to `NNS.VAR` with monthly alignment and a 12-lag structure.
+
+The key ideas of the chapter are therefore:
+
+- multivariate time-series dependence,
+- nonlinear vector autoregressive extensions,
+- directional cross-variable interactions,
+- mixed-frequency inputs,
+- and empirical forecasting systems built from these components.
+
+---
+
+## From Univariate to Multivariate Forecasting
+
+Let
+
+\[
+X_t =
+\begin{pmatrix}
+X_{1t}\\
+X_{2t}\\
+\vdots\\
+X_{pt}
+\end{pmatrix}
+\in \mathbb{R}^p
+\]
+
+denote a \(p\)-dimensional time series.
+
+In univariate forecasting, we model a future value as
+
+\[
+X_{t+h} = f(X_t, X_{t-1}, X_{t-2}, \dots) + \varepsilon_{t+h}.
+\]
+
+In the multivariate setting, each component may depend on the lagged history of every variable in the system:
+
+\[
+X_{j,t+h}
+=
+f_j\!\bigl(
+X_{1,t}, X_{1,t-1}, \dots,
+X_{2,t}, X_{2,t-1}, \dots,
+X_{p,t}, X_{p,t-1}, \dots
+\bigr)
++
+\varepsilon_{j,t+h},
+\qquad j=1,\dots,p.
+\]
+
+So the problem is no longer to forecast a single path in isolation. It is to estimate a **joint dynamic structure** in which variables affect one another through time.
+
+This is the motivation for vector autoregression.
+
+---
+
+## Classical VAR and Its Limits
+
+A classical vector autoregressive model of order \(k\), VAR\((k)\), is written
+
+\[
+X_t = c + A_1 X_{t-1} + A_2 X_{t-2} + \cdots + A_k X_{t-k} + \varepsilon_t,
+\]
+
+where \(c\) is a vector of intercepts, \(A_1,\dots,A_k\) are coefficient matrices, and \(\varepsilon_t\) is an innovation vector.
+
+This formulation is powerful because it allows each variable to depend on lagged values of all the others. But it also imposes several strong restrictions.
+
+### Linearity
+
+All effects enter linearly through the coefficient matrices. Threshold behavior, asymmetry, nonlinear interactions, and regime changes are not modeled directly.
+
+### Uniform lag structure
+
+Classical VAR usually imposes the same lag order across all variables, even when different variables operate naturally at different horizons.
+
+### Parametric residual thinking
+
+Forecast construction and inference are built around residual assumptions that may become fragile in heavy-tailed, asymmetric, or nonlinear environments.
+
+### Mixed-frequency difficulty
+
+Variables observed monthly, quarterly, weekly, or daily do not fit naturally into the same linear lag system without additional modeling layers.
+
+The NNS approach keeps the useful intuition of cross-variable lag forecasting while relaxing these structural restrictions.
+
+---
+
+## Nonparametric Vector Autoregression
+
+`NNS.VAR` reframes vector autoregression as a nonlinear nonparametric learning problem.
+
+The implementation has a clear architecture.
+
+### Stage 1: complete each series individually
+
+Each variable is first interpolated where values are missing and extrapolated forward using `NNS.ARMA`. The returned object explicitly includes
+
+- `interpolated_and_extrapolated`,
+- `univariate`,
+- `multivariate`,
+- `ensemble`,
+- and `relevant_variables`.
+
+This design is important. The multivariate forecast does not replace the univariate forecast. It is built on top of it.
+
+### Stage 2: construct lagged predictors
+
+The completed multivariate panel is transformed into a lag matrix.
+
+### Stage 3: reduce lagged predictors
+
+For each target variable, the lagged predictor set is screened using one of several relevance measures and a thresholding rule.
+
+### Stage 4: estimate nonlinear multivariate forecasts
+
+The retained lagged predictors are passed into the multivariate regression system through `NNS.stack` and `NNS.reg`.
+
+### Stage 5: combine univariate and multivariate information
+
+The final forecast is an ensemble of the univariate and multivariate components.
+
+So the “VAR” terminology remains conceptually appropriate, but the mechanism is no longer a linear matrix recursion. It is a **nonlinear regression surface over lagged multivariate data**.
+
+---
+
+## Lagged Predictor Geometry
+
+Suppose the observed system is stored in a matrix
+
+\[
+V =
+\begin{bmatrix}
+X_{11} & X_{21} & \cdots & X_{p1}\\
+X_{12} & X_{22} & \cdots & X_{p2}\\
+\vdots & \vdots & \ddots & \vdots\\
+X_{1T} & X_{2T} & \cdots & X_{pT}
+\end{bmatrix}.
+\]
+
+The first structural step is to generate lagged copies of each column. For a lag depth \(\tau\), these predictors include terms such as
+
+\[
+X_{j,t},\; X_{j,t-1},\; X_{j,t-2},\; \dots,\; X_{j,t-\tau}.
+\]
+
+A major advantage of `NNS.VAR` is that the lag argument `tau` is flexible.
+
+It may be
+
+- a single positive integer, applying the same lag depth to every variable,
+- a vector, assigning a single lag choice to each variable,
+- or a list, assigning multiple lags to each variable separately.
+
+Thus the lag structure need not be homogeneous.
+
+One variable may use only lag 1, another may use lags 1 through 6, and a third may use a sparse pattern such as \(1, 3, 12\). This flexibility is especially important in economic and financial systems, where different variables evolve over different time scales.
+
+Mathematically, the forecasting feature vector for one horizon may be written as
+
+\[
+Z_t
+=
+\bigl(
+X_{1,t}, X_{1,t-1}, \dots, X_{1,t-\tau_1},
+X_{2,t}, X_{2,t-1}, \dots, X_{2,t-\tau_2},
+\dots,
+X_{p,t}, X_{p,t-1}, \dots, X_{p,t-\tau_p}
+\bigr).
+\]
+
+The multivariate forecast for variable \(j\) is then
+
+\[
+\hat X_{j,t+h} = f_j(Z_t),
+\]
+
+with \(f_j\) estimated nonparametrically.
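+
+The heterogeneous lag structure can be illustrated with a small base-R helper. `build_lag_matrix()` is hypothetical and is not the internal lag constructor of `NNS.VAR`. It takes the general list form of the lag specification (one vector of lags per variable, with lag 0 meaning the current value \(X_{j,t}\)) and returns the feature matrix \(Z_t\), one row per time point at which every requested lag is available.
+
+```r
+build_lag_matrix <- function(V, lags) {
+  V <- as.matrix(V)
+  rows <- (max(unlist(lags)) + 1):nrow(V)     # times t with all requested lags observed
+  cols <- list()
+  for (j in seq_len(ncol(V))) {
+    for (l in lags[[j]]) {
+      cols[[paste0(colnames(V)[j], "_lag", l)]] <- V[rows - l, j]   # X_{j, t - l}
+    }
+  }
+  do.call(cbind, cols)
+}
+
+V <- cbind(a = 1:20, b = 101:120, c = 201:220)
+# a: lags 0-1; b: lags 0-6; c: sparse lags 1, 3, 12
+Z <- build_lag_matrix(V, lags = list(0:1, 0:6, c(1, 3, 12)))
+dim(Z)                       # 8 12: rows t = 13, ..., 20 and 2 + 7 + 3 columns
+Z[1, c("a_lag0", "c_lag12")] # 13 and 201
+```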
+
+---
+
+## Mixed-Frequency Inputs
+
+A central practical challenge in multivariate forecasting is **mixed-frequency data**.
+
+Examples include:
+
+- monthly inflation with quarterly GDP,
+- weekly claims with monthly industrial production,
+- daily market variables with monthly macroeconomic releases.
+
+If these series are aligned onto a common calendar, the lower-frequency variables necessarily contain missing entries at higher-frequency timestamps.
+
+Classical multivariate methods often require specialized machinery for this. The NNS framework instead treats the problem as a nonlinear interpolation and extrapolation task before multivariate estimation begins.
+
+Let
+
+\[
+V_t = (X_{1t},\dots,X_{pt}),
+\]
+
+with some entries unobserved because the reporting frequencies differ. `NNS.VAR` explicitly returns a matrix called `interpolated_and_extrapolated`, whose purpose is to replace missing values in the original panel using interpolation and univariate extrapolation. This is not an incidental preprocessing step. It is foundational to the mixed-frequency design of the method.
+
+Conceptually, the procedure is:
+
+1. place all series on a common index,
+2. fill interior gaps via interpolation,
+3. forecast trailing missing values using univariate nonlinear forecasting,
+4. build the lagged multivariate system on the completed panel.
+
+So mixed-frequency forecasting is handled natively, rather than being outsourced to a separate parametric state-space model.
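+
+The four steps can be sketched for a single column of the panel. The helper is hypothetical: base R's `approx()` stands in for the NNS interpolation, and a straight-line trend on the last few observations stands in for the `NNS.ARMA` extrapolation of trailing gaps. Leading gaps are left missing.
+
+```r
+complete_series <- function(x, n_recent = 4) {
+  obs <- which(!is.na(x))
+  t_all <- seq_along(x)
+  # Step 2: interior gaps by linear interpolation (NA outside the observed range)
+  filled <- approx(obs, x[obs], xout = t_all, rule = 1)$y
+  # Step 3: trailing gaps by extrapolating a linear trend (stand-in for NNS.ARMA)
+  trailing <- t_all[t_all > max(obs)]
+  if (length(trailing) > 0) {
+    recent <- tail(obs, n_recent)
+    fit <- lm(y ~ t, data = data.frame(t = recent, y = x[recent]))
+    filled[trailing] <- predict(fit, data.frame(t = trailing))
+  }
+  filled
+}
+
+# A quarterly variable placed on a monthly calendar: observed in months 3, 6, 9
+q <- c(NA, NA, 3, NA, NA, 6, NA, NA, 9, NA, NA)
+complete_series(q)   # NA NA 3 4 5 6 7 8 9 10 11 (up to rounding)
+```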
+
+---
+
+## Dimension Reduction and Directional Relevance
+
+Once the lagged panel has been created, not all lagged predictors are equally informative for each target variable.
+
+`NNS.VAR` therefore applies a dimension-reduction step before forming the multivariate forecast. The user can choose among four relevance schemes:
+
+- `"cor"`: absolute Spearman correlation,
+- `"NNS.dep"`: nonlinear dependence weights,
+- `"NNS.caus"`: directional causation weights,
+- `"all"`: the average of the three relevance matrices.
+
+This is a substantive departure from classical VAR, where the usual practice is to include all lags up to the chosen order unless restrictions are imposed manually.
+
+Here the guiding question is:
+
+**Which lagged variables are actually relevant for forecasting this target?**
+
+The implementation answers this in two steps.
+
+### Step 1: compute relevance scores
+
+For a target variable \(Y\) and lagged predictors \(Z_1,\dots,Z_m\), a score \(r_\ell\) is computed for each predictor.
+
+Depending on `dim.red.method`, these are:
+
+\[
+r_\ell = |\rho_S(Y,Z_\ell)|,
+\]
+
+or the nonlinear dependence weight from `NNS.dep`, or the directional causation weight from `NNS.caus`, or their average.
+
+### Step 2: apply a threshold
+
+These relevance scores are then compared to the threshold selected by the preceding `NNS.stack` dimension-reduction step. A lagged predictor survives only if its score exceeds that threshold.
+
+So the selection rule is
+
+\[
+Z_\ell \text{ is retained}
+\quad \Longleftrightarrow \quad
+r_\ell > \theta,
+\]
+
+where \(\theta\) is the learned dimension-reduction threshold.
+
+If no lag exceeds the threshold, the implementation falls back to the full lagged predictor set rather than discarding the multivariate structure entirely.
+
+This deserves emphasis. The reduction step is not just “screening by method name.” It is a concrete thresholding rule applied to lag-specific relevance scores.
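+
+The rule for the `"cor"` option can be written directly. The helper below is a hypothetical sketch: it takes the threshold \(\theta\) as a given input rather than learning it through `NNS.stack`, and the simulated predictors are assumptions.
+
+```r
+select_lags <- function(y, Z, theta) {
+  r <- abs(apply(Z, 2, function(z) cor(y, z, method = "spearman")))  # r_l = |rho_S(Y, Z_l)|
+  keep <- r > theta                                                   # retain iff r_l > theta
+  if (!any(keep)) keep[] <- TRUE         # fallback: keep the full lagged set
+  list(scores = r, retained = colnames(Z)[keep])
+}
+
+set.seed(8)
+Z <- cbind(z1 = runif(100, -1, 1), z2 = runif(100, -1, 1))
+y <- Z[, "z1"]^3 + rnorm(100, sd = 0.05)    # y depends monotonically on z1 only
+select_lags(y, Z, theta = 0.5)$retained     # expected to keep only "z1"
+select_lags(y, Z, theta = 1)$retained       # nothing exceeds 1, so both are kept
+```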
+
+---
+
+## Directional Cross-Variable Interactions
+
+Why are these relevance measures important?
+
+Because cross-variable forecasting effects are not generally linear, symmetric, or monotone.
+
+A lagged predictor may matter because:
+
+- it moves with the target linearly,
+- it influences the target nonlinearly,
+- or it exhibits directional causal strength that would be poorly summarized by a simple coefficient.
+
+The NNS framework is designed precisely for such cases.
+
+If variable \(X_k\) influences variable \(X_j\) only during downturns, only above a threshold, or only through asymmetric responses, then a linear VAR coefficient can understate or even obscure the relationship. By contrast, dependence weights from `NNS.dep` and directional weights from `NNS.caus` are built to recognize these nonlinear and asymmetric structures.
+
+So the multivariate forecasting problem becomes one of **discovering directional cross-variable interactions in lag space**.
+
+---
+
+## Multivariate Forecasting as Nonlinear Regression
+
+For each target variable \(X_j\), the forecasting problem can be written as
+
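+\[
+X_{j,t+h} = f_j\big(Z_t^{(j)}\big) + \varepsilon_{j,t+h},
+\]
+
+where \(Z_t^{(j)}\) is the lagged feature vector selected for target \(j\) after dimension reduction and \(\varepsilon_{j,t+h}\) is the forecast error.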
+
+The surface \(f_j\) is estimated through the NNS regression framework rather than through a linear coefficient matrix. This is the conceptual heart of `NNS.VAR`.
+
+Multivariate autoregression is therefore reinterpreted as a supervised learning problem:
+
+- the response is the future value of one variable,
+- the predictors are lagged values of all variables,
+- the regression surface is estimated nonparametrically.
+
+This is still autoregression, but with nonlinear function estimation replacing linear matrix multiplication.
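+
+The supervised-learning view becomes concrete once the (predictor, response) pairs are built explicitly. The sketch below assumes the feature vector stacks the current value and the previous `tau - 1` values of every variable. To stay self-contained, it substitutes a plain \(k\)-nearest-neighbour average for the nonparametric regressor. That average is only a stand-in for illustration, not the NNS regression estimator. `supervised_pairs()` and `knn_predict()` are hypothetical helpers.
+
+```r
+# Hypothetical helper: supervised pairs (Z_t, X_{target, t+h}), where Z_t holds
+# the current and previous tau - 1 values of every variable.
+supervised_pairs <- function(X, target, tau, h) {
+  n <- nrow(X)
+  origins <- tau:(n - h)  # origins with a full lag window and an observed t + h
+  Z <- do.call(cbind, lapply(colnames(X), function(v) {
+    m <- sapply(0:(tau - 1), function(k) X[origins - k, v])
+    colnames(m) <- paste0(v, "_lag", 0:(tau - 1))
+    m
+  }))
+  list(Z = Z, y = X[origins + h, target])
+}
+
+# Stand-in nonparametric regressor (k-nearest-neighbour average), NOT NNS.reg.
+knn_predict <- function(Z, y, z_new, k = 5) {
+  s <- apply(Z, 2, sd)                        # scale each predictor
+  d <- sqrt(colSums(((t(Z) - z_new) / s)^2))  # distance from z_new to every row of Z
+  mean(y[order(d)[seq_len(k)]])
+}
+
+set.seed(2)
+n <- 300
+a <- rnorm(n)
+b <- numeric(n)
+for (i in 2:n) b[i] <- sin(2 * a[i - 1]) + 0.1 * rnorm(1)  # nonlinear cross-variable effect
+X <- cbind(a = a, b = b)
+
+pairs <- supervised_pairs(X, target = "b", tau = 2, h = 1)
+z_now <- c(a[n], a[n - 1], b[n], b[n - 1])  # same column order as pairs$Z
+pred <- knn_predict(pairs$Z, pairs$y, z_now, k = 10)
+pred
+```
+
+The response is the future value of `b`, and the predictors are lags of both series. The nonlinear `sin` link from `a` to `b` is what a single linear coefficient would summarize poorly.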
+
+---
+
+## Ensemble Construction
+
+One of the most important features of `NNS.VAR` is that the final output is not merely a multivariate forecast.
+
+For each target variable, the method returns:
+
+- a univariate forecast from `NNS.ARMA`,
+- a multivariate forecast from the nonlinear regression stack,
+- and an ensemble combining the two.
+
+Let
+
+\[
+\hat u_{j,t+h}
+\]
+
+denote the univariate forecast for variable \(j\), and let
+
+\[
+\hat m_{j,t+h}
+\]
+
+denote the multivariate forecast. The ensemble is
+
+\[
+\hat x_{j,t+h}
+=
+w^{(u)}_j \hat u_{j,t+h}
++
+w^{(m)}_j \hat m_{j,t+h},
+\qquad
+w^{(u)}_j + w^{(m)}_j = 1.
+\]
+
+If `naive.weights = TRUE`, the method uses equal weights:
+
+\[
+w^{(u)}_j = w^{(m)}_j = \frac12.
+\]
+
+If `naive.weights = FALSE`, the weights are not based on a raw count of relevant variables. Instead, the implementation computes the proportion of selected lagged predictors that belong to the target variable itself.
+
+Let
+
+- \(n_{\text{own},j}\) be the number of retained lagged predictors for target \(j\) whose base variable is \(j\),
+- \(n_{\text{cross},j}\) be the number of retained lagged predictors whose base variable is not \(j\).
+
+Then
+
+\[
+w^{(u)}_j
+=
+\frac{n_{\text{own},j}}{n_{\text{own},j} + n_{\text{cross},j}},
+\qquad
+w^{(m)}_j
+=
+1 - w^{(u)}_j
+=
+\frac{n_{\text{cross},j}}{n_{\text{own},j} + n_{\text{cross},j}}.
+\]
+
+The rule has a direct structural reading:
+
+- If the selected structure is mostly own-lags, the forecast leans univariate.
+- If the selected structure is mostly cross-variable, the forecast leans multivariate.
+- If there is no usable distinction, the implementation falls back to equal weighting.
+
+So the ensemble is not arbitrary averaging. It is a structural weighting mechanism derived from the selected lag geometry.
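+
+The weighting rule can be computed directly from the retained predictor names. This sketch makes two assumptions: names have the form `<variable>_lag<k>`, and “no usable distinction” means an empty retained set. `ensemble_weights()` and `ensemble_forecast()` are hypothetical helpers, not package functions.
+
+```r
+# Hypothetical helper: structural ensemble weights from retained predictor names,
+# assuming names of the form "<variable>_lag<k>".
+ensemble_weights <- function(retained, target, naive = FALSE) {
+  base_var <- sub("_lag[0-9]+$", "", retained)  # strip the lag suffix
+  n_own   <- sum(base_var == target)
+  n_cross <- sum(base_var != target)
+  # equal weights when requested, or (assumption) when nothing was retained
+  if (naive || n_own + n_cross == 0) {
+    return(c(univariate = 0.5, multivariate = 0.5))
+  }
+  w_u <- n_own / (n_own + n_cross)
+  c(univariate = w_u, multivariate = 1 - w_u)
+}
+
+# Convex combination of the univariate and multivariate forecasts.
+ensemble_forecast <- function(u_hat, m_hat, w) {
+  w[["univariate"]] * u_hat + w[["multivariate"]] * m_hat
+}
+
+retained <- c("I_lag1", "I_lag2", "P_lag1")
+w <- ensemble_weights(retained, target = "I")   # two own-lags, one cross-lag: 2/3 and 1/3
+ensemble_forecast(u_hat = 3.0, m_hat = 3.6, w)  # 2/3 * 3.0 + 1/3 * 3.6, about 3.2
+```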
+
+---
+
+## Relevant Variables as a Structural Map
+
+The output `relevant_variables` provides more than a convenience list. It gives a structural summary of the system learned by the model.
+
+For each target variable, it records the lagged predictors that survived the relevance threshold. This means the forecast comes with an interpretable dynamic footprint:
+
+- which own-lags matter,
+- which other variables matter,
+- and at which lag horizons they matter.
+
+This has an interpretation close to a learned nonlinear Granger map, but without restricting the analysis to linear coefficients or symmetric residual-based testing.
+
+The result is both predictive and explanatory.
+
+A variable may be forecast mainly by its own lagged history, by a sparse collection of related series, or by a broad network of interacting indicators. The model does not impose one of these patterns a priori; it discovers them from the data.
+
+---
+
+## A Small Synthetic Illustration
+
+Suppose we jointly forecast monthly inflation \(I_t\), unemployment \(U_t\), and industrial production \(P_t\), using lags 1 through 12.
+
+The full lag system would contain predictors such as
+
+\[
+I_{t-1}, I_{t-2}, \dots, I_{t-12},\;
+U_{t-1}, U_{t-2}, \dots, U_{t-12},\;
+P_{t-1}, P_{t-2}, \dots, P_{t-12}.
+\]
+
+But after dimension reduction, the retained structures for different targets need not look alike.
+
+For inflation, the model might retain mainly
+
+\[
+I_{t-1},\ I_{t-2},\ P_{t-1},
+\]
+
+indicating that short-run inflation persistence and recent production conditions are the dominant signals.
+
+For unemployment, the model might retain
+
+\[
+U_{t-1},\ U_{t-12},\ I_{t-3},
+\]
+
+suggesting a mixture of local persistence, annual seasonality, and delayed inflation spillover.
+
+This synthetic example illustrates the main point: the NNS system does not treat all targets as sharing the same lag law. Each response learns its own sparse multivariate structure.
+
+That is exactly what a nonlinear vector autoregression should do.
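+
+The bookkeeping behind this illustration can be reproduced in base R. The retained sets below are the hypothetical ones described above, entered by hand rather than estimated. The sketch only checks that they are subsets of the 36-term lag system and counts own versus cross lags for each target.
+
+```r
+vars <- c("I", "U", "P")
+# Full lag system: every variable at lags 1..12 (36 candidate predictors).
+full_lags <- as.vector(outer(vars, 1:12, function(v, k) paste0(v, "_lag", k)))
+
+# Hypothetical retained sets from the illustration (entered by hand, not estimated).
+retained <- list(
+  I = c("I_lag1", "I_lag2", "P_lag1"),
+  U = c("U_lag1", "U_lag12", "I_lag3")
+)
+stopifnot(all(unlist(retained) %in% full_lags))
+
+# Own-lag versus cross-lag counts for each target.
+counts <- sapply(names(retained), function(j) {
+  base_var <- sub("_lag[0-9]+$", "", retained[[j]])
+  c(own = sum(base_var == j), cross = sum(base_var != j))
+})
+counts
+```
+
+Both targets retain two own-lags and one cross-lag. Under the structural weighting rule, each would therefore put weight \(2/3\) on its univariate forecast and \(1/3\) on its multivariate forecast.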
+
+---
+
+## Nowcasting with `NNS.nowcast`
+
+Nowcasting deserves special treatment because it is one of the most practical uses of the framework.
+
+`NNS.nowcast` is a wrapper built around `NNS.VAR`. It downloads a base panel of macroeconomic indicators, converts each to monthly frequency, merges them into a common panel, and then calls `NNS.VAR(econ_variables, h = h, tau = 12, nowcast = TRUE, naive.weights = naive.weights)`.
+
+This gives a ready-made mixed-frequency multivariate forecasting system.
+
+The built-in variable set includes indicators such as:
+
+- payroll employment,
+- job openings,
+- CPI and core CPI,
+- durable goods orders,
+- retail sales,
+- unemployment,
+- housing starts and permits,
+- industrial production,
+- personal income,
+- exports and imports,
+- construction spending,
+- unit labor cost,
+- real consumption spending,
+- real GDP,
+- weekly unemployment claims,
+- Treasury rates and yield spreads,
+- Federal Reserve balance sheet assets,
+- commodity prices,
+- federal funds rate,
+- producer prices,
+- labor force participation,
+- money supply,
+- and ADP payrolls.
+
+This is a practically meaningful nowcasting panel. It combines labor, inflation, output, rates, liquidity, spending, trade, and commodity signals in one monthly-aligned system.
+
+### What the nowcast output means
+
+The output of `NNS.nowcast` is not a single number. It returns the same five-part structure as `NNS.VAR`:
+
+- `interpolated_and_extrapolated`: the completed monthly panel after filling mixed-frequency gaps,
+- `relevant_variables`: the selected lagged predictors for each target,
+- `univariate`: the standalone `NNS.ARMA` forecasts,
+- `multivariate`: the nonlinear multivariate forecasts,
+- `ensemble`: the combined forecast.
+
+So in practice, nowcasting means more than “guess the current GDP number.” It means:
+
+1. align all indicators to the current monthly calendar,
+2. fill missing lower-frequency observations,
+3. estimate joint nonlinear lag relations,
+4. produce univariate, multivariate, and ensemble projections,
+5. inspect which variables were actually driving the current nowcast.
+
+This is a much richer object than a single real-time estimate.
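+
+Steps 1 and 2 can be sketched for a single quarterly series. Several details are illustrative assumptions: the release timing (each quarter reported in its final month), the values, and the use of base-R `approx()` linear interpolation. `NNS.VAR` applies its own interpolation and extrapolation. Months outside the observed range are left `NA` here rather than extrapolated.
+
+```r
+# Monthly calendar for one year.
+months <- seq(as.Date("2023-01-01"), as.Date("2023-12-01"), by = "month")
+# Illustrative quarterly values, stamped at the last month of each quarter.
+gdp_q <- c("2023-03-01" = 100, "2023-06-01" = 101, "2023-09-01" = 103)
+
+# Step 1: align to the monthly calendar; non-release months are NA.
+gdp_m <- rep(NA_real_, length(months))
+gdp_m[match(as.Date(names(gdp_q)), months)] <- gdp_q
+
+# Step 2 (stand-in): linearly interpolate interior gaps only.
+obs <- which(!is.na(gdp_m))
+inside <- seq(min(obs), max(obs))
+gdp_m[inside] <- approx(obs, gdp_m[obs], xout = inside)$y
+
+data.frame(month = months, gdp = gdp_m)
+```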
+
+---
+
+## Prediction Intervals
+
+As in Chapter 15, point forecasts alone are not enough. Forecast uncertainty must also be quantified.
+
+The examples associated with both `NNS.VAR` and `NNS.nowcast` construct prediction intervals using `NNS.meboot`, `LPM.VaR`, and `UPM.VaR`.
+
+Suppose the forecasted path of a target series is
+
+\[
+\hat x_1,\hat x_2,\dots,\hat x_h.
+\]
+
+Bootstrap replicates are generated from the forecast path:
+
+\[
+\hat x^{*(1)}, \hat x^{*(2)}, \dots, \hat x^{*(B)}.
+\]
+
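+For each horizon \(k = 1,\dots,h\), lower and upper forecast bounds are then obtained by applying the partial-moment quantile operators to the \(B\) replicate values at that horizon, \(\hat x_k^{*(1)},\dots,\hat x_k^{*(B)}\):
+
+\[
+\text{Lower}_{k,\alpha}
+=
+\operatorname{LPM.VaR}\!\left(\alpha/2,\ d,\ \{\hat x_k^{*(b)}\}_{b=1}^{B}\right),
+\qquad
+\text{Upper}_{k,\alpha}
+=
+\operatorname{UPM.VaR}\!\left(\alpha/2,\ d,\ \{\hat x_k^{*(b)}\}_{b=1}^{B}\right),
+\]
+
+where \(d\) is the partial-moment degree, so that \([\text{Lower}_{k,\alpha}, \text{Upper}_{k,\alpha}]\) is a nominal \(1-\alpha\) interval at horizon \(k\).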
+
+In the nowcasting examples, this is illustrated directly for GDP. The GDP ensemble path is bootstrapped with `NNS.meboot`, and lower and upper prediction bounds are then computed from `LPM.VaR` and `UPM.VaR`.
+
+The conceptual advantage is consistent with the rest of NNS:
+
+- no Gaussian residual assumption is required,
+- asymmetry is naturally allowed,
+- and dependence-preserving synthetic replicates can be generated rather than relying on iid residual resampling.
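+
+The per-horizon construction can be sketched with an \(h \times B\) matrix of replicate paths. The replicates below are only a placeholder for `NNS.meboot` output: they add cumulative Gaussian noise to a made-up forecast path. Base-R `quantile()` stands in for the degree-zero case of `LPM.VaR` and `UPM.VaR`, and the package's quantile interpolation rule may differ.
+
+```r
+set.seed(3)
+h <- 6; B <- 500; alpha <- 0.10
+path <- 100 + cumsum(rep(0.5, h))  # a made-up point-forecast path
+
+# Placeholder replicates (NOT NNS.meboot): path plus cumulative noise, h x B.
+reps <- sapply(seq_len(B), function(b) path + cumsum(rnorm(h, sd = 0.4)))
+
+# Degree-zero analogs: alpha/2 mass below the lower bound, alpha/2 above the upper.
+lower <- apply(reps, 1, quantile, probs = alpha / 2, names = FALSE)
+upper <- apply(reps, 1, quantile, probs = 1 - alpha / 2, names = FALSE)
+cbind(horizon = seq_len(h), lower, forecast = path, upper)
+```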
+
+---
+
+## Comparison with Classical VAR
+
+The difference between classical VAR and `NNS.VAR` can be summarized clearly.
+
+| Feature | Classical VAR | `NNS.VAR` |
+|---|---|---|
+| Functional form | Linear | Nonlinear nonparametric |
+| Lag structure | Usually common across variables | Scalar, vector, or list-based |
+| Missing mixed-frequency values | Requires extra modeling machinery | Built into interpolation and extrapolation |
+| Variable selection | Often manual or penalized | Built-in relevance screening with thresholding |
+| Multivariate dependence | Linear coefficients | Dependence and causation-aware feature selection |
+| Forecast output | One system forecast | Univariate, multivariate, relevant-variable map, and ensemble |
+| Ensemble weighting | Usually absent | Equal or structure-based own-lag versus cross-lag weighting |
+| Prediction intervals | Parametric or residual bootstrap | Partial-moment quantile intervals with `NNS.meboot` |
+
+The NNS version therefore preserves the idea that variables should be modeled jointly, but relaxes the linear and parametric restrictions around that idea.
+
+---
+
+## Why the Directional Framework Matters
+
+The contribution of the directional framework is not merely technical. It changes what multivariate forecasting can represent.
+
+### Nonlinearity is primary
+
+Forecast relationships do not need to be approximated by a global linear law.
+
+### Cross-variable effects can be asymmetric
+
+Dependence and causation can be directional, state-dependent, and nonlinear.
+
+### Mixed-frequency panels become tractable
+
+Missing values in a panel aligned to a common monthly calendar are filled by interpolation and extrapolation inside the same forecasting pipeline, rather than requiring a separate mixed-frequency model.
+
+### Univariate structure is preserved
+
+Each variable retains its own autoregressive signature through the univariate component.
+
+### Forecast combination is structural
+
+The ensemble reflects how much of the retained lag structure is own-history versus cross-variable history.
+
+These are not minor refinements. Together they turn vector autoregression into a flexible directional learning architecture.
+
+---
+
+## Empirical Applications
+
+The framework is naturally suited to systems in which variables move together, asymmetrically, and at different reporting frequencies.
+
+### Macroeconomics
+
+GDP, employment, inflation, industrial production, claims, rates, and spending variables are all natural candidates for mixed-frequency nonlinear nowcasting.
+
+### Finance
+
+Asset returns, volatility, spreads, rates, and macro variables often interact through threshold effects and asymmetric transmission.
+
+### Operations and supply chains
+
+Demand, inventories, labor, orders, and shipping activity may be observed on different calendars but still require joint forecasting.
+
+### Energy and commodities
+
+Production, inventories, spot prices, futures curves, and macro demand indicators evolve jointly but not linearly.
+
+In all these settings, the problem is not merely to fit a system. It is to discover how a dynamic system actually transmits information through time.
+
+---
+
+## Leakage-Safe Validation in Multivariate and Mixed-Frequency Settings
+
+Before evaluating forecast accuracy, the data pipeline itself must be audited for information timing and release-order integrity.
+
+In multivariate nowcasting/forecasting, leakage control is more delicate because indicators arrive asynchronously.
+
+Use the following rules:
+
+- **Vintage-consistent features**: at forecast origin `t`, include only indicator values actually released by `t`.
+- **Release-calendar alignment**: construct mixed-frequency regressors using publication timestamps, not finalized revised datasets.
+- **Origin-wise dimension reduction**: relevance screening/thresholding must be recomputed within each training window.
+- **Horizon-by-horizon evaluation**: report forecast and interval metrics by target horizon and by target variable.
+- **Stability diagnostics**: track how the `relevant_variables` set changes over origins to distinguish signal from selection noise.
+
+These rules make `NNS.VAR`/`NNS.nowcast` evaluations comparable to production conditions and prevent optimistic bias from look-ahead information.
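+
+Two of these rules, vintage-consistent features and selection-stability tracking, can be sketched in base R. The release log, its column names, and its values are hypothetical. `vintage_at()` keeps, for each series and reference month, the latest value published on or before the forecast origin. `jaccard()` measures overlap between the retained predictor sets at successive origins.
+
+```r
+# Hypothetical release log: one row per (series, reference month, publication date).
+releases <- data.frame(
+  series    = c("A", "A", "A", "B"),
+  ref_month = as.Date(c("2024-01-01", "2024-02-01", "2024-01-01", "2023-12-01")),
+  released  = as.Date(c("2024-02-05", "2024-03-05", "2024-03-05", "2024-02-20")),
+  value     = c(10, 11, 12, 5)  # row 3 is a later revision of row 1
+)
+
+# Latest value of each (series, ref_month) actually published by the origin.
+vintage_at <- function(releases, origin) {
+  known <- releases[releases$released <= origin, ]
+  known <- known[order(known$released), ]
+  key <- paste(known$series, known$ref_month)
+  known[!duplicated(key, fromLast = TRUE), ]
+}
+
+# Overlap of two retained-predictor sets (1 = identical, 0 = disjoint).
+jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
+
+v1 <- vintage_at(releases, as.Date("2024-02-15"))  # only the first print of A (Jan)
+v2 <- vintage_at(releases, as.Date("2024-03-15"))  # A (Jan) now carries the revised value
+
+# Hypothetical relevant-variable sets at three successive origins.
+selected <- list(c("A_lag1", "B_lag1"), c("A_lag1", "B_lag2"), c("A_lag1", "B_lag2"))
+sapply(seq_along(selected)[-1], function(i) jaccard(selected[[i - 1]], selected[[i]]))
+```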
+
+
+---
+
+
+## Summary
+
+This chapter extended the NNS forecasting framework from univariate series to **multivariate forecasting**.
+
+The main results are:
+
+- A multivariate time series is treated as a nonlinear autoregression problem in lagged multivariate space.
+- `NNS.VAR` combines univariate `NNS.ARMA` forecasts with multivariate nonlinear regression forecasts.
+- The lag structure is flexible: `tau` may be a scalar, vector, or list.
+- Mixed-frequency inputs are handled through interpolation and extrapolation before constructing the lagged system.
+- Dimension reduction may be based on correlation, nonlinear dependence, directional causation, or their average.
+- A lagged predictor survives the reduction step only when its relevance score exceeds the learned threshold from the `NNS.stack` screening routine.
+- The ensemble forecast is either equally weighted or weighted according to the share of retained predictors that are own-lags versus cross-variable lags.
+- In classification-style forecasting tasks with skewed targets, NNS ensemble workflows can combine minority up-sampling and majority down-sampling across ensemble members to reduce imbalance bias.
+- The output `relevant_variables` provides a structural map of the learned dynamic system.
+- `NNS.nowcast` applies this architecture directly to a practical macroeconomic panel with mixed frequencies and monthly alignment.
+- Prediction intervals may be built nonparametrically using `NNS.meboot`, `LPM.VaR`, and `UPM.VaR`.
+
+Classical vector autoregression taught an important lesson: forecasting improves when variables are modeled jointly. The NNS framework retains that insight but frees it from linearity, common-frequency restrictions, and rigid parametric structure.
+
+The result is a forecasting system for nonlinear, asymmetric, mixed-frequency multivariate data:
+
+**nonparametric vector autoregression as directional multivariate learning through time.**
+
+---
diff --git a/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd b/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd
new file mode 100644
index 0000000..3e7e9d8
--- /dev/null
+++ b/tools/NNS/book/chapter-27-conclusion-and-next-steps.Rmd
@@ -0,0 +1,45 @@
+# Conclusion and Next Steps
+
+The previous chapters developed the NNS framework from first principles to operational workflows in dependence analysis, distribution comparison, inference, prediction, and nonparametric estimation.
+
+A useful way to summarize the full arc is:
+
+1. **Directional building blocks** (Chapters 1–3): partial moments preserve sign and magnitude information that symmetric summaries discard.
+2. **Dependence and causation** (Part III): directional co-moments, copulas, and recursive decomposition expose asymmetric relationships and lead/lag structure.
+3. **Inference and comparison** (Part IV): continuous degree-one probability representations remove finite-sample discretization bias and support robust distributional comparisons.
+4. **Estimation and forecasting** (Part V): recursive nonparametric systems turn the same directional primitives into predictive tools without restrictive parametric assumptions.
+
+Taken together, these results support the book's unifying claim: **one directional probability language can connect theory, diagnostics, and implementation across tasks that are often taught separately.**
+
+## What the Framework Has Achieved
+
+Across the text, several practical outcomes recur.
+
+- **Unified notation and implementation**: the same lower/upper partial moment operators appear in derivations and in executable R functions.
+- **Bias-aware probability measurement**: degree-one partial moment ratios provide a continuous finite-sample correction to the step-function empirical CDF.
+- **Distribution-free comparison tools**: NNS ANOVA and stochastic dominance diagnostics compare distributions directly, rather than reducing comparisons to mean/variance-only tests.
+- **Adaptive predictive systems**: NNS regression and interval methods adapt to heteroskedastic, nonlinear structure using local empirical behavior.
+
+These capabilities matter most in real data settings where asymmetry, tail risk, and regime changes are central rather than exceptional.
+
+**Directional threshold analysis.** The framework has shown that the same lower and upper partial-moment operators generate not only distribution functions and quantiles, but also benchmark-sensitive threshold rules for adverse events. Degree zero recovers event-frequency calibration, while higher degrees permit calibration by adverse magnitude and extreme-deviation sensitivity.
+
+**Distribution-free probability control.** Partial moments support conservative tail-probability bounds through semivariance and higher-order directional dispersion measures. This connects descriptive nonparametrics to decision support without requiring strict distributional assumptions.
+
+**Finite-sample relevance under non-normality.** Partial moments are not merely conceptually distribution-free; they demonstrate improved finite-sample stability when data are skewed, heavy-tailed, or otherwise asymmetric.
+
+## Further Resources
+
+To continue beyond this book, the official implementation resources are maintained in three complementary locations:
+
+- **CRAN package**:
+- **Vignettes and method walkthroughs**:
+- **Hands-on examples in the GitHub repository**:
+
+A practical workflow is:
+
+1. Install and review the CRAN package documentation for function references and stable release behavior.
+2. Work through the vignette set for topic-focused application patterns.
+3. Use the GitHub examples index for extended, concrete scripts that can be adapted to your own data.
+
+Together, these resources provide deeper coverage of specialized applications, additional implementation detail, and more end-to-end examples than can be included in a single volume.
diff --git a/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd b/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd
new file mode 100644
index 0000000..c40dcec
--- /dev/null
+++ b/tools/NNS/book/chapter-28-appendix-notation-and-function-reference.Rmd
@@ -0,0 +1,106 @@
+# Appendix: Notation and Function Reference
+
+This appendix consolidates the notation used across the book and maps each object to its primary R implementation pattern. It is intended as a quick lookup for readers moving between theoretical sections and code-first workflows.
+
+Unless otherwise noted, all functions in this appendix come from the **NNS** package. In executable code, load the package once and then call functions directly as shown in the table entries below.
+
+```r
+library(NNS)
+```
+
+## Core directional operators and partial moments
+
+| Symbol | Definition | Interpretation | R function / pattern |
+|---|---|---|---|
+| $x^+$ | $\max(x,0)$ | Positive-part operator | `pmax(x, 0)` |
+| $(X-t)^+$ | $\max(X-t,0)$ | Deviation above benchmark $t$ | internal to `UPM(...)` |
+| $(t-X)^+$ | $\max(t-X,0)$ | Deviation below benchmark $t$ | internal to `LPM(...)` |
+| $L_r(t;X)$ | $E[(t-X)_+^r]$ | Lower partial moment, degree $r$ | `LPM(r, t, X)` |
+| $U_r(t;X)$ | $E[(X-t)_+^r]$ | Upper partial moment, degree $r$ | `UPM(r, t, X)` |
+| $L_r/(L_r+U_r)$ | Degree-$r$ lower ratio | Directional CDF-style probability below $t$ | `LPM.ratio(r, t, X)` |
+| $U_r/(L_r+U_r)$ | Degree-$r$ upper ratio | Directional probability above $t$ | `UPM.ratio(r, t, X)` |
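+
+The definitions in this table translate directly into sample averages. The helpers `lpm()`, `upm()`, and `lpm_ratio()` below are hypothetical base-R sketches, not the package functions. Degree 0 is treated as an indicator (`x <= t` below, `x > t` above) so that the degree-zero lower ratio matches `ecdf(x)(t)`; this tie convention is an assumption.
+
+```r
+# Sample partial moments; degree 0 uses indicators instead of 0^0.
+lpm <- function(r, t, x) if (r == 0) mean(x <= t) else mean(pmax(t - x, 0)^r)
+upm <- function(r, t, x) if (r == 0) mean(x > t) else mean(pmax(x - t, 0)^r)
+lpm_ratio <- function(r, t, x) lpm(r, t, x) / (lpm(r, t, x) + upm(r, t, x))
+
+x <- c(-2, -1, 0, 1, 5)
+lpm(1, 0, x)        # (2 + 1 + 0 + 0 + 0) / 5 = 0.6
+upm(1, 0, x)        # (0 + 0 + 0 + 1 + 5) / 5 = 1.2
+lpm_ratio(1, 0, x)  # 0.6 / 1.8 = 1/3
+all.equal(lpm_ratio(0, 0, x), ecdf(x)(0))  # TRUE: degree 0 recovers the empirical CDF
+```
+
+On this right-skewed sample, the degree-zero ratio places probability \(0.6\) below zero. The degree-one ratio gives only \(1/3\), because the single large upside deviation dominates the upper partial moment.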
+
+## Co-partial moments, dependence, and causation
+
+| Symbol | Definition / role | R function |
+|---|---|---|
+| $\text{CoLPM}, \text{CoUPM}$ | Concordant lower/upper co-partial moments | `Co.LPM(...)`, `Co.UPM(...)` |
+| $\text{DLPM}, \text{DUPM}$ | Divergent lower/upper co-partial moments | `D.LPM(...)`, `D.UPM(...)` |
+| $\text{NNS.dep}(X,Y)$ | Global nonlinear dependence measure | `NNS.dep(x, y)` |
+| $\text{NNS.copula}(X,Y)$ | Nonparametric dependence geometry / copula view | `NNS.copula(cbind(x, y))` |
+| $\text{NNS.caus}(X,Y)$ | Directional causation diagnostic | `NNS.caus(x, y)` |
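+
+The concordant co-partial moments can be sketched in base R in product form: for each observation, multiply the two directional deviations, then average over observations. The sketch uses the same degree for both coordinates and reads degree 0 as a joint indicator. The helpers are hypothetical. The package functions `Co.LPM()` and `Co.UPM()` have their own argument names and order, which should be taken from the package documentation.
+
+```r
+# Directional deviations; degree 0 is read as an indicator.
+below <- function(r, t, v) if (r == 0) as.numeric(v <= t) else pmax(t - v, 0)^r
+above <- function(r, t, v) if (r == 0) as.numeric(v > t) else pmax(v - t, 0)^r
+
+# Hypothetical concordant co-partial moments (both below / both above targets).
+co_lpm <- function(r, tx, ty, x, y) mean(below(r, tx, x) * below(r, ty, y))
+co_upm <- function(r, tx, ty, x, y) mean(above(r, tx, x) * above(r, ty, y))
+
+x <- c(-1, 1, -2, 2)
+y <- c(-1, 1, 2, -2)
+co_lpm(1, 0, 0, x, y)  # only observation 1 is jointly below: 1 * 1 / 4 = 0.25
+co_upm(1, 0, 0, x, y)  # only observation 2 is jointly above: 1 * 1 / 4 = 0.25
+```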
+
+## Distribution comparison, dominance, and interval objects
+
+| Symbol | Definition / role | R function |
+|---|---|---|
+| $F^{(0)}(t)$ | Degree-zero empirical CDF (step measure) | `ecdf(x)(t)` or `LPM.ratio(0, t, x)` |
+| $F^{(1)}(t)$ | Degree-one continuous CDF-style ratio | `LPM.ratio(1, t, x)` |
+| $p = P(X' > Y')$ | Directional exceedance probability for pairwise comparison | estimated by cross-sample indicator averages |
+| $\text{Certainty}_{\text{ANOVA}}$ | NNS ANOVA agreement certainty from CDF benchmark deviations ($1$ = strongest agreement) | `NNS.ANOVA(...)` |
+| FSD / SSD / TSD | First-, second-, third-order stochastic dominance | `NNS.FSD(...)`, `NNS.SSD(...)`, `NNS.TSD(...)` |
+| $Q^-_{d}(\alpha)$ | Lower degree-$d$ quantile | `LPM.VaR(alpha, degree = d, x)` |
+| $Q^+_{d}(\alpha)$ | Upper degree-$d$ quantile | `UPM.VaR(alpha, degree = d, x)` |
+| PI$_{1-\alpha}$ | Prediction interval $[Q^-_d(\alpha/2),Q^+_d(\alpha/2)]$ | `LPM.VaR(...)` + `UPM.VaR(...)` |
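+
+The cross-sample indicator average for \(p = P(X' > Y')\) fits in one line of base R. The helper `exceed_prob()` is hypothetical and counts strict exceedances only. How ties are treated is a convention choice that the table does not fix.
+
+```r
+# Average of 1{x_i > y_k} over all cross-sample pairs.
+exceed_prob <- function(x, y) mean(outer(x, y, ">"))
+
+x <- c(1, 3, 5)
+y <- c(2, 4)
+exceed_prob(x, y)  # pairs (3,2), (5,2), (5,4) out of 6: 0.5
+```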
+
+
+
+**`LPM.VaR(percentile, degree, x)`**: lower-tail threshold operator obtained by inverting the degree-specific lower partial-moment probability representation. Interpretation by degree:
+
+- `degree = 0`: empirical-CDF lower quantile,
+- `degree = 1`: severity-weighted lower threshold based on directional magnitude,
+- `degree = 2`: extreme-deviation-sensitive lower threshold.
+
+In finance, the degree-zero case is commonly called VaR, but the operator is more general than that label.
+
+**`UPM.VaR(percentile, degree, x)`**: upper-tail analog of `LPM.VaR`, used for right-tail threshold selection and interval construction.
+
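+For degree one, the inversion described for `LPM.VaR` can be sketched with base-R `uniroot()`. The sample degree-one lower ratio rises continuously from 0 at \(\min(x)\) to 1 at \(\max(x)\). A root therefore always exists for a non-constant sample and a percentile strictly between 0 and 1. The helpers `lower_var1()` and `upper_var1()` are hypothetical and may differ from the package in tie handling and numerical tolerance.
+
+```r
+# Sample degree-1 lower ratio L_1 / (L_1 + U_1) at threshold t (x must be non-constant).
+lpm1_ratio <- function(t, x) {
+  lo <- mean(pmax(t - x, 0))
+  hi <- mean(pmax(x - t, 0))
+  lo / (lo + hi)
+}
+
+# Hypothetical helpers: thresholds where the degree-1 lower (upper) ratio equals p.
+lower_var1 <- function(p, x) {
+  uniroot(function(t) lpm1_ratio(t, x) - p, range(x), tol = 1e-10)$root
+}
+upper_var1 <- function(p, x) {
+  # upper ratio = p is the same as lower ratio = 1 - p
+  uniroot(function(t) lpm1_ratio(t, x) - (1 - p), range(x), tol = 1e-10)$root
+}
+
+set.seed(4)
+x <- rexp(1000) - 1  # right-skewed sample
+c(degree1_lower = lower_var1(0.05, x), degree0_lower = unname(quantile(x, 0.05)),
+  degree1_upper = upper_var1(0.05, x), degree0_upper = unname(quantile(x, 0.95)))
+```
+
+Comparing the degree-one and degree-zero columns shows how weighting by deviation magnitude shifts the thresholds on a skewed sample.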
+
+
+## Directional Decision Regions Crosswalk (Classical → NNS)
+
+To maintain continuity with Chapter 22's directional decision-region framing, the table below maps common classical statistics and procedures to their directional NNS counterparts.
+
+| Classical statistic / workflow | Typical classical role | Directional NNS counterpart | Reference chapter | Notes |
+|---|---|---|---|---|
+| Pearson correlation | Linear association summary | `NNS.dep(x, y)` | Chapter 10 | Captures nonlinear and asymmetric dependence, not only linear co-movement. |
+| Parametric VaR / empirical quantile VaR | Tail-loss thresholding | `LPM.VaR(alpha, degree, x)` *(degree-dependent)* | Chapter 16 | Degree controls sensitivity to tail severity beyond degree-0 quantiles. |
+| Upper-tail quantile threshold | Right-tail risk/opportunity cutoff | `UPM.VaR(alpha, degree, x)` *(degree-dependent)* | Chapter 16 | Upper-tail analog to `LPM.VaR` for asymmetric interval construction. |
+| Classical ANOVA (mean-comparison test) | Group-level location comparison | `NNS.ANOVA(...)` *(degree-dependent CDF benchmarking)* | Chapter 15 | Agreement certainty is benchmarked through directional CDF-style deviations. |
+| Linear Granger-style directional inference | Lead-lag direction under linear structure | `NNS.caus(x, y)` | Chapter 14 | Directionality can be nonlinear and state dependent. |
+| Copula / Joint Tail Dependency | Joint probability of concurrent outcomes | `Co.LPM(degree, target.x, target.y, x, y)` / `Co.UPM(degree, target.x, target.y, x, y)` | Chapter 4 | Co.LPM captures concurrent downside structure; Co.UPM is the upper-tail counterpart for joint directional events. |
+| Mean-variance interval heuristics | Uncertainty bands under Gaussian assumptions | `LPM.VaR(...)` + `UPM.VaR(...)` *(degree-dependent bounds)* | Chapter 17 | Produces directional prediction intervals without normality assumptions. |
+
+
+## Regression and forecasting workflow objects
+
+| Symbol | Definition / role | R function | Reference chapter |
+|---|---|---|---|
+| $\hat y = \hat E[Y\mid X]$ | NNS conditional mean estimate | `NNS.reg(x, y)` | Chapter 22 |
+| Residual local distribution | Partition-level error distribution | via `NNS.reg(...)$Fitted.xy$residuals` outputs | Chapter 22 |
+| $\widehat{PI}(x_0)$ | Conditional prediction interval at $x_0$ | `NNS.reg(..., point.est = x0, confidence.interval = ...)` | Chapter 17 |
+| Regime-specific directional dependence | Time-local / state-local dependence | `NNS.dep(...)` on rolling/segmented windows | Chapter 10 |
+
+## Technical Note: Adaptive Order and Consistency Conditions
+
+Chapter 18 established two core consistency conditions for recursive mean-split regression:
+
+1. **Shrinking cell diameter** at each target location so local bias vanishes,
+2. **Growing cell occupancy** so local sample averages stabilize.
+
+When `order = NULL`, the implementation determines effective recursion depth per regressor from directional dependence with the response (`NNS.dep`-style logic). This modifies how quickly local cells contract across predictors, but it does not alter the fundamental structure of the consistency argument.
+
+High-dependence predictors are allocated deeper partitioning, so their local diameters contract faster in regions where signal is strong. Low-dependence predictors are partitioned more conservatively, preserving broader local averaging where aggressive refinement would primarily amplify noise. Occupancy control remains enforced through the minimum cell-size rule.
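+
+The interplay between depth and occupancy can be illustrated with a one-dimensional recursive mean split. In this hypothetical sketch, `depth` stands in for the dependence-driven order and is supplied by hand rather than derived from `NNS.dep`. A split is refused whenever either side would hold fewer than `min_size` points.
+
+```r
+# Hypothetical 1-D recursive mean split: divide each cell at its sample mean
+# until the depth budget is spent or the occupancy rule blocks the split.
+mean_split <- function(x, depth, min_size) {
+  if (depth == 0) return(list(x))
+  m <- mean(x)
+  left  <- x[x <= m]
+  right <- x[x > m]
+  # occupancy control: refuse a split that would leave a thin cell
+  if (length(left) < min_size || length(right) < min_size) return(list(x))
+  c(mean_split(left, depth - 1, min_size), mean_split(right, depth - 1, min_size))
+}
+
+set.seed(5)
+x <- runif(500)
+shallow <- mean_split(x, depth = 2, min_size = 20)  # e.g. a weak-dependence coordinate
+deep    <- mean_split(x, depth = 5, min_size = 20)  # e.g. a strong-dependence coordinate
+
+summarise_cells <- function(cells) {
+  c(cells = length(cells), smallest = min(lengths(cells)),
+    widest = max(sapply(cells, function(cell) diff(range(cell)))))
+}
+rbind(shallow = summarise_cells(shallow), deep = summarise_cells(deep))
+```
+
+The deeper budget produces more and narrower cells. The occupancy rule, however, stops refinement before any cell drops below `min_size`, which mirrors the two consistency conditions listed above.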
+
+The resulting estimator is therefore **locally adaptive in rate**:
+
+- in high-signal regions, convergence tracks a faster path closer to an oracle fixed-order choice for that local structure;
+- in low-signal regions, the estimator intentionally retains coarser cells, exchanging some local bias for improved stability.
+
+A concise takeaway is:
+
+> Under the Chapter 18 regularity conditions (shrinking local diameters and diverging local occupancy), dependence-driven order allocation preserves the same bias–variance decomposition used for consistency arguments, while allowing refinement to concentrate where dependence signal is stronger.
+
+The key practical point is that internal adaptive order selection keeps the same consistency checklist for users—control occupancy and ensure progressive local refinement—while often improving finite-sample stability by avoiding unnecessary depth in weak-signal coordinates.
diff --git a/tools/NNS/book/images/ARMA_ex.png b/tools/NNS/book/images/ARMA_ex.png
new file mode 100644
index 0000000..a387107
Binary files /dev/null and b/tools/NNS/book/images/ARMA_ex.png differ
diff --git a/tools/NNS/book/images/ARMA_optim.png b/tools/NNS/book/images/ARMA_optim.png
new file mode 100644
index 0000000..4a1632d
Binary files /dev/null and b/tools/NNS/book/images/ARMA_optim.png differ
diff --git a/tools/NNS/book/images/ARMA_optim_h_50.png b/tools/NNS/book/images/ARMA_optim_h_50.png
new file mode 100644
index 0000000..bd8f66f
Binary files /dev/null and b/tools/NNS/book/images/ARMA_optim_h_50.png differ
diff --git a/tools/NNS/book/images/CDFs_1.png b/tools/NNS/book/images/CDFs_1.png
new file mode 100644
index 0000000..693ee3a
Binary files /dev/null and b/tools/NNS/book/images/CDFs_1.png differ
diff --git a/tools/NNS/book/images/CDFs_2.png b/tools/NNS/book/images/CDFs_2.png
new file mode 100644
index 0000000..6dc5a81
Binary files /dev/null and b/tools/NNS/book/images/CDFs_2.png differ
diff --git a/tools/NNS/book/images/NNS_hex_sticker.png b/tools/NNS/book/images/NNS_hex_sticker.png
new file mode 100644
index 0000000..a0271fb
Binary files /dev/null and b/tools/NNS/book/images/NNS_hex_sticker.png differ
diff --git a/tools/NNS/book/images/NNSmc_1.png b/tools/NNS/book/images/NNSmc_1.png
new file mode 100644
index 0000000..16cb75f
Binary files /dev/null and b/tools/NNS/book/images/NNSmc_1.png differ
diff --git a/tools/NNS/book/images/NNSmc_1_tgt_drift.png b/tools/NNS/book/images/NNSmc_1_tgt_drift.png
new file mode 100644
index 0000000..9be0802
Binary files /dev/null and b/tools/NNS/book/images/NNSmc_1_tgt_drift.png differ
diff --git a/tools/NNS/book/images/boost_freq.png b/tools/NNS/book/images/boost_freq.png
new file mode 100644
index 0000000..268b7fa
Binary files /dev/null and b/tools/NNS/book/images/boost_freq.png differ
diff --git a/tools/NNS/book/images/ch11_raw_copula.png b/tools/NNS/book/images/ch11_raw_copula.png
new file mode 100644
index 0000000..6dfe04f
Binary files /dev/null and b/tools/NNS/book/images/ch11_raw_copula.png differ
diff --git a/tools/NNS/book/images/ch11_transformed_copula.png b/tools/NNS/book/images/ch11_transformed_copula.png
new file mode 100644
index 0000000..9764566
Binary files /dev/null and b/tools/NNS/book/images/ch11_transformed_copula.png differ
diff --git a/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png b/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png
new file mode 100644
index 0000000..ef3aea1
Binary files /dev/null and b/tools/NNS/book/images/ch14_lpm0_lpm1_diff.png differ
diff --git a/tools/NNS/book/images/ch15_reg_conf_int.png b/tools/NNS/book/images/ch15_reg_conf_int.png
new file mode 100644
index 0000000..53589d9
Binary files /dev/null and b/tools/NNS/book/images/ch15_reg_conf_int.png differ
diff --git a/tools/NNS/book/images/ch17_iid_mc_sim.png b/tools/NNS/book/images/ch17_iid_mc_sim.png
new file mode 100644
index 0000000..733c8d1
Binary files /dev/null and b/tools/NNS/book/images/ch17_iid_mc_sim.png differ
diff --git a/tools/NNS/book/images/ch17_meboot_mc_sim.png b/tools/NNS/book/images/ch17_meboot_mc_sim.png
new file mode 100644
index 0000000..b88b9e7
Binary files /dev/null and b/tools/NNS/book/images/ch17_meboot_mc_sim.png differ
diff --git a/tools/NNS/book/images/ch17_meboot_orig.png b/tools/NNS/book/images/ch17_meboot_orig.png
new file mode 100644
index 0000000..3dda527
Binary files /dev/null and b/tools/NNS/book/images/ch17_meboot_orig.png differ
diff --git a/tools/NNS/book/images/ch18_part_1.png b/tools/NNS/book/images/ch18_part_1.png
new file mode 100644
index 0000000..cb49d67
Binary files /dev/null and b/tools/NNS/book/images/ch18_part_1.png differ
diff --git a/tools/NNS/book/images/ch18_part_2.png b/tools/NNS/book/images/ch18_part_2.png
new file mode 100644
index 0000000..24abeb7
Binary files /dev/null and b/tools/NNS/book/images/ch18_part_2.png differ
diff --git a/tools/NNS/book/images/ch20_kmeans_comp.png b/tools/NNS/book/images/ch20_kmeans_comp.png
new file mode 100644
index 0000000..47dfe8c
Binary files /dev/null and b/tools/NNS/book/images/ch20_kmeans_comp.png differ
diff --git a/tools/NNS/book/images/ch21_part_reg.png b/tools/NNS/book/images/ch21_part_reg.png
new file mode 100644
index 0000000..c7d1a4d
Binary files /dev/null and b/tools/NNS/book/images/ch21_part_reg.png differ
diff --git a/tools/NNS/book/images/ch24_uni_ts.png b/tools/NNS/book/images/ch24_uni_ts.png
new file mode 100644
index 0000000..759cea8
Binary files /dev/null and b/tools/NNS/book/images/ch24_uni_ts.png differ
diff --git a/tools/NNS/book/images/ch3_cdf_lpm0.png b/tools/NNS/book/images/ch3_cdf_lpm0.png
new file mode 100644
index 0000000..48c2c9e
Binary files /dev/null and b/tools/NNS/book/images/ch3_cdf_lpm0.png differ
diff --git a/tools/NNS/book/images/multi_impute.png b/tools/NNS/book/images/multi_impute.png
new file mode 100644
index 0000000..776770b
Binary files /dev/null and b/tools/NNS/book/images/multi_impute.png differ
diff --git a/tools/NNS/book/images/overview_arma.png b/tools/NNS/book/images/overview_arma.png
new file mode 100644
index 0000000..23b9fed
Binary files /dev/null and b/tools/NNS/book/images/overview_arma.png differ
diff --git a/tools/NNS/book/images/overview_reg.png b/tools/NNS/book/images/overview_reg.png
new file mode 100644
index 0000000..9526dfb
Binary files /dev/null and b/tools/NNS/book/images/overview_reg.png differ
diff --git a/tools/NNS/book/images/uni_impute.png b/tools/NNS/book/images/uni_impute.png
new file mode 100644
index 0000000..3473b8c
Binary files /dev/null and b/tools/NNS/book/images/uni_impute.png differ
diff --git a/tools/NNS/book/index.Rmd b/tools/NNS/book/index.Rmd
new file mode 100644
index 0000000..d80af78
--- /dev/null
+++ b/tools/NNS/book/index.Rmd
@@ -0,0 +1,60 @@
+---
+title: "Nonlinear Nonparametric Statistics: Using Partial Moments"
+subtitle: "Second Edition"
+author: "Fred Viole"
+date: 2026
+site: bookdown::bookdown_site
+output: bookdown::gitbook
+documentclass: book
+header-includes:
+ - \usepackage{xcolor}
+---
+
+# Preface {-}
+
+This is the **Second Edition** of *Nonlinear Nonparametric Statistics: Using Partial Moments*, updated and expanded by Fred Viole in 2026, building upon the foundational 2013 work with David Nawrocki.
+
+This book presents the **Nonlinear Nonparametric Statistics (NNS)** framework as a coherent toolkit for modeling dependence, uncertainty, prediction, and decision-making without imposing restrictive distributional assumptions.
+
+The chapters are organized to move from foundational concepts to practical modeling workflows:
+
+- first principles of nonlinear dependence and directional relationships,
+- nonparametric methods for regression, classification, and density-based tasks,
+- time-series forecasting frameworks for univariate and multivariate settings,
+- and implementation guidance for applied research and production analytics.
+
+The central theme is consistent throughout: when data are asymmetric, heavy-tailed, nonlinear, or regime-sensitive, useful structure about the data-generating process can still be extracted directly from the data using directional and nonparametric methods.
+
+## Executive Summary {-}
+
+This book is designed for readers who want mathematically grounded methods that remain practical in real-world settings where classical assumptions can fail.
+
+At a high level, the NNS framework emphasizes:
+
+- **distribution-agnostic modeling** rather than strict parametric family selection,
+- **directional dependence and causation diagnostics** instead of purely symmetric association summaries,
+- **nonlinear predictive systems** that can adapt to heterogeneous signal structures,
+- and **modular workflows in R** so methods can be combined for exploratory analysis, forecasting, and risk assessment.
+
+Readers can use the text in two ways:
+
+1. **Sequentially**, as a complete conceptual arc from core definitions to advanced forecasting systems.
+2. **As a reference**, by jumping directly to method-specific chapters and accompanying implementation examples.
+
+Whether your domain is economics, finance, operations, policy, or scientific research, the goal is the same: to provide robust, interpretable, and practical nonparametric tools for difficult data.
+
+## About the Examples Repository {-}
+
+This book is designed to be used alongside the companion examples repository:
+
+-
+
+The repository is organized as a practical application layer. Conceptual and theoretical development lives in the book, while reproducible scripts and end-to-end demonstrations live in the examples.
+
+A useful way to navigate both resources together is:
+
+1. Read the chapter for the theoretical framework and notation.
+2. Open the matching section in `examples/README.md` for runnable code patterns.
+3. Adapt those scripts to your own data and evaluate with your domain constraints.
+
+The examples repository is intended for hands-on implementation, not as a substitute for the proofs and derivations developed in the text. Keep the repository disclaimer in mind when applying any script directly to production or policy settings; examples are instructional templates and should be validated, stress-tested, and context-calibrated before operational use.
diff --git a/tools/NNS/book/nns-book.log b/tools/NNS/book/nns-book.log
new file mode 100644
index 0000000..9eca2e7
--- /dev/null
+++ b/tools/NNS/book/nns-book.log
@@ -0,0 +1,1850 @@
+This is XeTeX, Version 3.141592653-2.6-0.999998 (TeX Live 2026) (preloaded format=xelatex 2026.5.9) 16 MAY 2026 19:46
+entering extended mode
+ restricted \write18 enabled.
+ %&-line parsing enabled.
+**nns-book.tex
+(./nns-book.tex
+LaTeX2e <2025-11-01>
+L3 programming layer <2026-03-20>
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/book.cls
+Document Class: book 2025/01/22 v1.4n Standard LaTeX document class
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/bk10.clo
+File: bk10.clo 2025/01/22 v1.4n Standard LaTeX file (size option)
+)
+\c@part=\count271
+\c@chapter=\count272
+\c@section=\count273
+\c@subsection=\count274
+\c@subsubsection=\count275
+\c@paragraph=\count276
+\c@subparagraph=\count277
+\c@figure=\count278
+\c@table=\count279
+\abovecaptionskip=\skip49
+\belowcaptionskip=\skip50
+\bibindent=\dimen150
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/xcolor/xcolor.sty
+Package: xcolor 2024/09/29 v3.02 LaTeX color extensions (UK)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-cfg/color.cfg
+File: color.cfg 2016/01/02 v1.6 sample color configuration
+)
+Package xcolor Info: Driver file: xetex.def on input line 274.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-def/xetex.def
+File: xetex.def 2025/11/01 v5.0p Graphics/color driver for xetex
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/mathcolor.ltx)
+Package xcolor Info: Model `cmy' substituted by `cmy0' on input line 1349.
+Package xcolor Info: Model `RGB' extended on input line 1365.
+Package xcolor Info: Model `HTML' substituted by `rgb' on input line 1367.
+Package xcolor Info: Model `Hsb' substituted by `hsb' on input line 1368.
+Package xcolor Info: Model `tHsb' substituted by `hsb' on input line 1369.
+Package xcolor Info: Model `HSB' substituted by `hsb' on input line 1370.
+Package xcolor Info: Model `Gray' substituted by `gray' on input line 1371.
+Package xcolor Info: Model `wave' substituted by `hsb' on input line 1372.
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsmath.sty
+Package: amsmath 2025/07/09 v2.17z AMS math features
+\@mathmargin=\skip51
+For additional information on amsmath, use the `?' option.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amstext.sty
+Package: amstext 2024/11/17 v2.01 AMS text
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsgen.sty
+File: amsgen.sty 1999/11/30 v2.0 generic functions
+\@emptytoks=\toks17
+\ex@=\dimen151
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsbsy.sty
+Package: amsbsy 1999/11/29 v1.2d Bold Symbols
+\pmbraise@=\dimen152
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsmath/amsopn.sty
+Package: amsopn 2022/04/08 v2.04 operator names
+)
+\inf@bad=\count280
+LaTeX Info: Redefining \frac on input line 233.
+\uproot@=\count281
+\leftroot@=\count282
+LaTeX Info: Redefining \overline on input line 398.
+LaTeX Info: Redefining \colon on input line 409.
+\classnum@=\count283
+\DOTSCASE@=\count284
+LaTeX Info: Redefining \ldots on input line 495.
+LaTeX Info: Redefining \dots on input line 498.
+LaTeX Info: Redefining \cdots on input line 619.
+\Mathstrutbox@=\box53
+\strutbox@=\box54
+LaTeX Info: Redefining \big on input line 721.
+LaTeX Info: Redefining \Big on input line 722.
+LaTeX Info: Redefining \bigg on input line 723.
+LaTeX Info: Redefining \Bigg on input line 724.
+\big@size=\dimen153
+LaTeX Font Info: Redeclaring font encoding OML on input line 742.
+LaTeX Font Info: Redeclaring font encoding OMS on input line 743.
+\macc@depth=\count285
+LaTeX Info: Redefining \bmod on input line 904.
+LaTeX Info: Redefining \pmod on input line 909.
+LaTeX Info: Redefining \smash on input line 939.
+LaTeX Info: Redefining \relbar on input line 969.
+LaTeX Info: Redefining \Relbar on input line 970.
+\c@MaxMatrixCols=\count286
+\dotsspace@=\muskip17
+\c@parentequation=\count287
+\dspbrk@lvl=\count288
+\tag@help=\toks18
+\row@=\count289
+\column@=\count290
+\maxfields@=\count291
+\andhelp@=\toks19
+\eqnshift@=\dimen154
+\alignsep@=\dimen155
+\tagshift@=\dimen156
+\tagwidth@=\dimen157
+\totwidth@=\dimen158
+\lineht@=\dimen159
+\@envbody=\toks20
+\multlinegap=\skip52
+\multlinetaggap=\skip53
+\mathdisplay@stack=\toks21
+LaTeX Info: Redefining \[ on input line 2950.
+LaTeX Info: Redefining \] on input line 2951.
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/amssymb.sty
+Package: amssymb 2013/01/14 v3.01 AMS font symbols
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/amsfonts.sty
+Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support
+\symAMSa=\mathgroup4
+\symAMSb=\mathgroup5
+LaTeX Font Info: Redeclaring math symbol \hbar on input line 98.
+LaTeX Font Info: Overwriting math alphabet `\mathfrak' in version `bold'
+(Font) U/euf/m/n --> U/euf/b/n on input line 106.
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/iftex/iftex.sty
+Package: iftex 2024/12/12 v1.0g TeX engine tests
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math.sty (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3kernel/expl3.sty
+Package: expl3 2026-03-20 L3 programming layer (loader)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3backend/l3backend-xetex.def
+File: l3backend-xetex.def 2026-02-18 L3 backend support: XeTeX
+\g__graphics_track_int=\count292
+\g__pdfannot_backend_int=\count293
+\g__pdfannot_backend_link_int=\count294
+))
+Package: unicode-math 2023/08/13 v0.8r Unicode maths in XeLaTeX and LuaLaTeX
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math-xetex.sty
+Package: unicode-math-xetex 2023/08/13 v0.8r Unicode maths in XeLaTeX and LuaLaTeX
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3packages/xparse/xparse.sty
+Package: xparse 2025-10-09 L3 Experimental document command parser
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/l3packages/l3keys2e/l3keys2e.sty
+Package: l3keys2e 2025-10-09 LaTeX2e option processing using LaTeX3 keys
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec.sty
+Package: fontspec 2025/09/29 v2.9g Font selection for XeLaTeX and LuaLaTeX
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec-xetex.sty
+Package: fontspec-xetex 2025/09/29 v2.9g Font selection for XeLaTeX and LuaLaTeX
+\l__fontspec_script_int=\count295
+\l__fontspec_language_int=\count296
+\l__fontspec_strnum_int=\count297
+\l__fontspec_tmp_int=\count298
+\l__fontspec_tmpa_int=\count299
+\l__fontspec_tmpb_int=\count300
+\l__fontspec_tmpc_int=\count301
+\l__fontspec_em_int=\count302
+\l__fontspec_emdef_int=\count303
+\l__fontspec_strong_int=\count304
+\l__fontspec_strongdef_int=\count305
+\l__fontspec_tmpa_dim=\dimen160
+\l__fontspec_tmpb_dim=\dimen161
+\l__fontspec_tmpc_dim=\dimen162
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/fontenc.sty
+Package: fontenc 2025/07/18 v2.1d Standard LaTeX package
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fontspec/fontspec.cfg))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/fix-cm.sty
+Package: fix-cm 2020/11/24 v1.1t fixes to LaTeX
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/ts1enc.def
+File: ts1enc.def 2001/06/05 v3.0e (jk/car/fm) Standard LaTeX file
+LaTeX Font Info: Redeclaring font encoding TS1 on input line 47.
+LaTeX Encoding Info: Redeclaring text command \capitalcedilla (encoding TS1) on input line 49.
+LaTeX Encoding Info: Redeclaring text command \capitalogonek (encoding TS1) on input line 52.
+LaTeX Encoding Info: Redeclaring text command \capitalgrave (encoding TS1) on input line 55.
+LaTeX Encoding Info: Redeclaring text command \capitalacute (encoding TS1) on input line 56.
+LaTeX Encoding Info: Redeclaring text command \capitalcircumflex (encoding TS1) on input line 57.
+LaTeX Encoding Info: Redeclaring text command \capitaltilde (encoding TS1) on input line 58.
+LaTeX Encoding Info: Redeclaring text command \capitaldieresis (encoding TS1) on input line 59.
+LaTeX Encoding Info: Redeclaring text command \capitalhungarumlaut (encoding TS1) on input line 60.
+LaTeX Encoding Info: Redeclaring text command \capitalring (encoding TS1) on input line 61.
+LaTeX Encoding Info: Redeclaring text command \capitalcaron (encoding TS1) on input line 62.
+LaTeX Encoding Info: Redeclaring text command \capitalbreve (encoding TS1) on input line 63.
+LaTeX Encoding Info: Redeclaring text command \capitalmacron (encoding TS1) on input line 64.
+LaTeX Encoding Info: Redeclaring text command \capitaldotaccent (encoding TS1) on input line 65.
+LaTeX Encoding Info: Redeclaring text command \t (encoding TS1) on input line 66.
+LaTeX Encoding Info: Redeclaring text command \capitaltie (encoding TS1) on input line 67.
+LaTeX Encoding Info: Redeclaring text command \newtie (encoding TS1) on input line 68.
+LaTeX Encoding Info: Redeclaring text command \capitalnewtie (encoding TS1) on input line 69.
+LaTeX Encoding Info: Redeclaring text symbol \textcapitalcompwordmark (encoding TS1) on input line 70.
+LaTeX Encoding Info: Redeclaring text symbol \textascendercompwordmark (encoding TS1) on input line 71.
+LaTeX Encoding Info: Redeclaring text symbol \textquotestraightbase (encoding TS1) on input line 72.
+LaTeX Encoding Info: Redeclaring text symbol \textquotestraightdblbase (encoding TS1) on input line 73.
+LaTeX Encoding Info: Redeclaring text symbol \texttwelveudash (encoding TS1) on input line 74.
+LaTeX Encoding Info: Redeclaring text symbol \textthreequartersemdash (encoding TS1) on input line 75.
+LaTeX Encoding Info: Redeclaring text symbol \textleftarrow (encoding TS1) on input line 76.
+LaTeX Encoding Info: Redeclaring text symbol \textrightarrow (encoding TS1) on input line 77.
+LaTeX Encoding Info: Redeclaring text symbol \textblank (encoding TS1) on input line 78.
+LaTeX Encoding Info: Redeclaring text symbol \textdollar (encoding TS1) on input line 79.
+LaTeX Encoding Info: Redeclaring text symbol \textquotesingle (encoding TS1) on input line 80.
+LaTeX Encoding Info: Redeclaring text command \textasteriskcentered (encoding TS1) on input line 81.
+LaTeX Encoding Info: Redeclaring text symbol \textdblhyphen (encoding TS1) on input line 92.
+LaTeX Encoding Info: Redeclaring text symbol \textfractionsolidus (encoding TS1) on input line 93.
+LaTeX Encoding Info: Redeclaring text symbol \textzerooldstyle (encoding TS1) on input line 94.
+LaTeX Encoding Info: Redeclaring text symbol \textoneoldstyle (encoding TS1) on input line 95.
+LaTeX Encoding Info: Redeclaring text symbol \texttwooldstyle (encoding TS1) on input line 96.
+LaTeX Encoding Info: Redeclaring text symbol \textthreeoldstyle (encoding TS1) on input line 97.
+LaTeX Encoding Info: Redeclaring text symbol \textfouroldstyle (encoding TS1) on input line 98.
+LaTeX Encoding Info: Redeclaring text symbol \textfiveoldstyle (encoding TS1) on input line 99.
+LaTeX Encoding Info: Redeclaring text symbol \textsixoldstyle (encoding TS1) on input line 100.
+LaTeX Encoding Info: Redeclaring text symbol \textsevenoldstyle (encoding TS1) on input line 101.
+LaTeX Encoding Info: Redeclaring text symbol \texteightoldstyle (encoding TS1) on input line 102.
+LaTeX Encoding Info: Redeclaring text symbol \textnineoldstyle (encoding TS1) on input line 103.
+LaTeX Encoding Info: Redeclaring text symbol \textlangle (encoding TS1) on input line 104.
+LaTeX Encoding Info: Redeclaring text symbol \textminus (encoding TS1) on input line 105.
+LaTeX Encoding Info: Redeclaring text symbol \textrangle (encoding TS1) on input line 106.
+LaTeX Encoding Info: Redeclaring text symbol \textmho (encoding TS1) on input line 107.
+LaTeX Encoding Info: Redeclaring text symbol \textbigcircle (encoding TS1) on input line 108.
+LaTeX Encoding Info: Redeclaring text command \textcircled (encoding TS1) on input line 109.
+LaTeX Encoding Info: Redeclaring text symbol \textohm (encoding TS1) on input line 115.
+LaTeX Encoding Info: Redeclaring text symbol \textlbrackdbl (encoding TS1) on input line 116.
+LaTeX Encoding Info: Redeclaring text symbol \textrbrackdbl (encoding TS1) on input line 117.
+LaTeX Encoding Info: Redeclaring text symbol \textuparrow (encoding TS1) on input line 118.
+LaTeX Encoding Info: Redeclaring text symbol \textdownarrow (encoding TS1) on input line 119.
+LaTeX Encoding Info: Redeclaring text symbol \textasciigrave (encoding TS1) on input line 120.
+LaTeX Encoding Info: Redeclaring text symbol \textborn (encoding TS1) on input line 121.
+LaTeX Encoding Info: Redeclaring text symbol \textdivorced (encoding TS1) on input line 122.
+LaTeX Encoding Info: Redeclaring text symbol \textdied (encoding TS1) on input line 123.
+LaTeX Encoding Info: Redeclaring text symbol \textleaf (encoding TS1) on input line 124.
+LaTeX Encoding Info: Redeclaring text symbol \textmarried (encoding TS1) on input line 125.
+LaTeX Encoding Info: Redeclaring text symbol \textmusicalnote (encoding TS1) on input line 126.
+LaTeX Encoding Info: Redeclaring text symbol \texttildelow (encoding TS1) on input line 127.
+LaTeX Encoding Info: Redeclaring text symbol \textdblhyphenchar (encoding TS1) on input line 128.
+LaTeX Encoding Info: Redeclaring text symbol \textasciibreve (encoding TS1) on input line 129.
+LaTeX Encoding Info: Redeclaring text symbol \textasciicaron (encoding TS1) on input line 130.
+LaTeX Encoding Info: Redeclaring text symbol \textacutedbl (encoding TS1) on input line 131.
+LaTeX Encoding Info: Redeclaring text symbol \textgravedbl (encoding TS1) on input line 132.
+LaTeX Encoding Info: Redeclaring text symbol \textdagger (encoding TS1) on input line 133.
+LaTeX Encoding Info: Redeclaring text symbol \textdaggerdbl (encoding TS1) on input line 134.
+LaTeX Encoding Info: Redeclaring text symbol \textbardbl (encoding TS1) on input line 135.
+LaTeX Encoding Info: Redeclaring text symbol \textperthousand (encoding TS1) on input line 136.
+LaTeX Encoding Info: Redeclaring text symbol \textbullet (encoding TS1) on input line 137.
+LaTeX Encoding Info: Redeclaring text symbol \textcelsius (encoding TS1) on input line 138.
+LaTeX Encoding Info: Redeclaring text symbol \textdollaroldstyle (encoding TS1) on input line 139.
+LaTeX Encoding Info: Redeclaring text symbol \textcentoldstyle (encoding TS1) on input line 140.
+LaTeX Encoding Info: Redeclaring text symbol \textflorin (encoding TS1) on input line 141.
+LaTeX Encoding Info: Redeclaring text symbol \textcolonmonetary (encoding TS1) on input line 142.
+LaTeX Encoding Info: Redeclaring text symbol \textwon (encoding TS1) on input line 143.
+LaTeX Encoding Info: Redeclaring text symbol \textnaira (encoding TS1) on input line 144.
+LaTeX Encoding Info: Redeclaring text symbol \textguarani (encoding TS1) on input line 145.
+LaTeX Encoding Info: Redeclaring text symbol \textpeso (encoding TS1) on input line 146.
+LaTeX Encoding Info: Redeclaring text symbol \textlira (encoding TS1) on input line 147.
+LaTeX Encoding Info: Redeclaring text symbol \textrecipe (encoding TS1) on input line 148.
+LaTeX Encoding Info: Redeclaring text symbol \textinterrobang (encoding TS1) on input line 149.
+LaTeX Encoding Info: Redeclaring text symbol \textinterrobangdown (encoding TS1) on input line 150.
+LaTeX Encoding Info: Redeclaring text symbol \textdong (encoding TS1) on input line 151.
+LaTeX Encoding Info: Redeclaring text symbol \texttrademark (encoding TS1) on input line 152.
+LaTeX Encoding Info: Redeclaring text symbol \textpertenthousand (encoding TS1) on input line 153.
+LaTeX Encoding Info: Redeclaring text symbol \textpilcrow (encoding TS1) on input line 154.
+LaTeX Encoding Info: Redeclaring text symbol \textbaht (encoding TS1) on input line 155.
+LaTeX Encoding Info: Redeclaring text symbol \textnumero (encoding TS1) on input line 156.
+LaTeX Encoding Info: Redeclaring text symbol \textdiscount (encoding TS1) on input line 157.
+LaTeX Encoding Info: Redeclaring text symbol \textestimated (encoding TS1) on input line 158.
+LaTeX Encoding Info: Redeclaring text symbol \textopenbullet (encoding TS1) on input line 159.
+LaTeX Encoding Info: Redeclaring text symbol \textservicemark (encoding TS1) on input line 160.
+LaTeX Encoding Info: Redeclaring text symbol \textlquill (encoding TS1) on input line 161.
+LaTeX Encoding Info: Redeclaring text symbol \textrquill (encoding TS1) on input line 162.
+LaTeX Encoding Info: Redeclaring text symbol \textcent (encoding TS1) on input line 163.
+LaTeX Encoding Info: Redeclaring text symbol \textsterling (encoding TS1) on input line 164.
+LaTeX Encoding Info: Redeclaring text symbol \textcurrency (encoding TS1) on input line 165.
+LaTeX Encoding Info: Redeclaring text symbol \textyen (encoding TS1) on input line 166.
+LaTeX Encoding Info: Redeclaring text symbol \textbrokenbar (encoding TS1) on input line 167.
+LaTeX Encoding Info: Redeclaring text symbol \textsection (encoding TS1) on input line 168.
+LaTeX Encoding Info: Redeclaring text symbol \textasciidieresis (encoding TS1) on input line 169.
+LaTeX Encoding Info: Redeclaring text symbol \textcopyright (encoding TS1) on input line 170.
+LaTeX Encoding Info: Redeclaring text symbol \textordfeminine (encoding TS1) on input line 171.
+LaTeX Encoding Info: Redeclaring text symbol \textcopyleft (encoding TS1) on input line 172.
+LaTeX Encoding Info: Redeclaring text symbol \textlnot (encoding TS1) on input line 173.
+LaTeX Encoding Info: Redeclaring text symbol \textcircledP (encoding TS1) on input line 174.
+LaTeX Encoding Info: Redeclaring text symbol \textregistered (encoding TS1) on input line 175.
+LaTeX Encoding Info: Redeclaring text symbol \textasciimacron (encoding TS1) on input line 176.
+LaTeX Encoding Info: Redeclaring text symbol \textdegree (encoding TS1) on input line 177.
+LaTeX Encoding Info: Redeclaring text symbol \textpm (encoding TS1) on input line 178.
+LaTeX Encoding Info: Redeclaring text symbol \texttwosuperior (encoding TS1) on input line 179.
+LaTeX Encoding Info: Redeclaring text symbol \textthreesuperior (encoding TS1) on input line 180.
+LaTeX Encoding Info: Redeclaring text symbol \textasciiacute (encoding TS1) on input line 181.
+LaTeX Encoding Info: Redeclaring text symbol \textmu (encoding TS1) on input line 182.
+LaTeX Encoding Info: Redeclaring text symbol \textparagraph (encoding TS1) on input line 183.
+LaTeX Encoding Info: Redeclaring text symbol \textperiodcentered (encoding TS1) on input line 184.
+LaTeX Encoding Info: Redeclaring text symbol \textreferencemark (encoding TS1) on input line 185.
+LaTeX Encoding Info: Redeclaring text symbol \textonesuperior (encoding TS1) on input line 186.
+LaTeX Encoding Info: Redeclaring text symbol \textordmasculine (encoding TS1) on input line 187.
+LaTeX Encoding Info: Redeclaring text symbol \textsurd (encoding TS1) on input line 188.
+LaTeX Encoding Info: Redeclaring text symbol \textonequarter (encoding TS1) on input line 189.
+LaTeX Encoding Info: Redeclaring text symbol \textonehalf (encoding TS1) on input line 190.
+LaTeX Encoding Info: Redeclaring text symbol \textthreequarters (encoding TS1) on input line 191.
+LaTeX Encoding Info: Redeclaring text symbol \texteuro (encoding TS1) on input line 192.
+LaTeX Encoding Info: Redeclaring text symbol \texttimes (encoding TS1) on input line 193.
+LaTeX Encoding Info: Redeclaring text symbol \textdiv (encoding TS1) on input line 194.
+))
+\g__um_fam_int=\count306
+\g__um_fonts_used_int=\count307
+\l__um_primecount_int=\count308
+\g__um_primekern_muskip=\muskip18
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/unicode-math/unicode-math-table.tex))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/lm/lmodern.sty
+Package: lmodern 2015/05/01 v1.6.1 Latin Modern Fonts
+LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
+(Font) OT1/cmr/m/n --> OT1/lmr/m/n on input line 22.
+LaTeX Font Info: Overwriting symbol font `letters' in version `normal'
+(Font) OML/cmm/m/it --> OML/lmm/m/it on input line 23.
+LaTeX Font Info: Overwriting symbol font `symbols' in version `normal'
+(Font) OMS/cmsy/m/n --> OMS/lmsy/m/n on input line 24.
+LaTeX Font Info: Overwriting symbol font `largesymbols' in version `normal'
+(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 25.
+LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
+(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 26.
+LaTeX Font Info: Overwriting symbol font `letters' in version `bold'
+(Font) OML/cmm/b/it --> OML/lmm/b/it on input line 27.
+LaTeX Font Info: Overwriting symbol font `symbols' in version `bold'
+(Font) OMS/cmsy/b/n --> OMS/lmsy/b/n on input line 28.
+LaTeX Font Info: Overwriting symbol font `largesymbols' in version `bold'
+(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 29.
+LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal'
+(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 31.
+LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal'
+(Font) OT1/cmss/m/n --> OT1/lmss/m/n on input line 32.
+LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal'
+(Font) OT1/cmr/m/it --> OT1/lmr/m/it on input line 33.
+LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal'
+(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 34.
+LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `bold'
+(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 35.
+LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold'
+(Font) OT1/cmss/bx/n --> OT1/lmss/bx/n on input line 36.
+LaTeX Font Info: Overwriting math alphabet `\mathit' in version `bold'
+(Font) OT1/cmr/bx/it --> OT1/lmr/bx/it on input line 37.
+LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold'
+(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 38.
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/upquote/upquote.sty
+Package: upquote 2012/04/19 v1.3 upright-quote and grave-accent glyphs in verbatim
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/base/textcomp.sty
+Package: textcomp 2024/04/24 v2.1b Standard LaTeX package
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype.sty
+Package: microtype 2026/03/01 v3.2d Micro-typographical refinements (RS)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/keyval.sty
+Package: keyval 2022/05/29 v1.15 key=value parser (DPC)
+\KV@toks@=\toks22
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/etoolbox/etoolbox.sty
+Package: etoolbox 2025/10/02 v2.5m e-TeX tools for LaTeX (JAW)
+\etb@tempcnta=\count309
+)
+\MT@toks=\toks23
+\MT@tempbox=\box55
+\MT@count=\count310
+LaTeX Info: Redefining \noprotrusionifhmode on input line 1084.
+LaTeX Info: Redefining \leftprotrusion on input line 1085.
+\MT@prot@toks=\toks24
+LaTeX Info: Redefining \rightprotrusion on input line 1104.
+LaTeX Info: Redefining \textls on input line 1449.
+\MT@outer@kern=\dimen163
+LaTeX Info: Redefining \microtypecontext on input line 2053.
+LaTeX Info: Redefining \textmicrotypecontext on input line 2070.
+\MT@listname@count=\count311
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype-xetex.def
+File: microtype-xetex.def 2026/03/01 v3.2d Definitions specific to xetex (RS)
+LaTeX Info: Redefining \lsstyle on input line 443.
+LaTeX Info: Redefining \lslig on input line 451.
+\MT@outer@space=\skip54
+)
+Package microtype Info: Loading configuration file microtype.cfg.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/microtype.cfg
+File: microtype.cfg 2026/03/01 v3.2d microtype main configuration file (RS)
+)
+LaTeX Info: Redefining \microtypesetup on input line 3065.
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/parskip/parskip.sty
+Package: parskip 2021-03-14 v2.0h non-zero parskip adjustments
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/kvoptions/kvoptions.sty
+Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty
+Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty
+Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO)
+))) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/fancyvrb/fancyvrb.sty
+Package: fancyvrb 2026/04/16 4.6a verbatim text (tvz,hv)
+\FV@CodeLineNo=\count312
+\FV@InFile=\read2
+\FV@TabBox=\box56
+\c@FancyVerbLine=\count313
+\FV@StepNumber=\count314
+\FV@OutFile=\write3
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/framed/framed.sty
+Package: framed 2011/10/22 v 0.96: framed or shaded text with page breaks
+\OuterFrameSep=\skip55
+\fb@frw=\dimen164
+\fb@frh=\dimen165
+\FrameRule=\dimen166
+\FrameSep=\dimen167
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/longtable.sty
+Package: longtable 2025-10-13 v4.24 Multi-page Table package (DPC)
+\LTleft=\skip56
+\LTright=\skip57
+\LTpre=\skip58
+\LTpost=\skip59
+\LTchunksize=\count315
+\LTcapwidth=\dimen168
+\LT@head=\box57
+\LT@firsthead=\box58
+\LT@foot=\box59
+\LT@lastfoot=\box60
+\LT@gbox=\box61
+\LT@cols=\count316
+\LT@rows=\count317
+\c@LT@tables=\count318
+\c@LT@chunks=\count319
+\LT@p@ftn=\toks25
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/booktabs/booktabs.sty
+Package: booktabs 2020/01/12 v1.61803398 Publication quality tables
+\heavyrulewidth=\dimen169
+\lightrulewidth=\dimen170
+\cmidrulewidth=\dimen171
+\belowrulesep=\dimen172
+\belowbottomsep=\dimen173
+\aboverulesep=\dimen174
+\abovetopsep=\dimen175
+\cmidrulesep=\dimen176
+\cmidrulekern=\dimen177
+\defaultaddspace=\dimen178
+\@cmidla=\count320
+\@cmidlb=\count321
+\@aboverulesep=\dimen179
+\@belowrulesep=\dimen180
+\@thisruleclass=\count322
+\@lastruleclass=\count323
+\@thisrulewidth=\dimen181
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/array.sty
+Package: array 2025/09/25 v2.6n Tabular extension package (FMi)
+\col@sep=\dimen182
+\ar@mcellbox=\box62
+\extrarowheight=\dimen183
+\NC@list=\toks26
+\extratabsurround=\skip60
+\backup@length=\skip61
+\ar@cellbox=\box63
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/tools/calc.sty
+Package: calc 2025/03/01 v4.3b Infix arithmetic (KKT,FJ)
+\calc@Acount=\count324
+\calc@Bcount=\count325
+\calc@Adimen=\dimen184
+\calc@Bdimen=\dimen185
+\calc@Askip=\skip62
+\calc@Bskip=\skip63
+LaTeX Info: Redefining \setlength on input line 86.
+LaTeX Info: Redefining \addtolength on input line 87.
+\calc@Ccount=\count326
+\calc@Cskip=\skip64
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/mdwtools/footnote.sty
+Package: footnote 1997/01/28 1.13 Save footnotes around boxes
+\fn@notes=\box64
+\fn@width=\dimen186
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/graphicx.sty
+Package: graphicx 2024/12/31 v1.2e Enhanced LaTeX Graphics (DPC,SPQR)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/graphics.sty
+Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics/trig.sty
+Package: trig 2023/12/02 v1.11 sin cos tan (DPC)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/graphics-cfg/graphics.cfg
+File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration
+)
+Package graphics Info: Driver file: xetex.def on input line 106.
+)
+\Gin@req@height=\dimen187
+\Gin@req@width=\dimen188
+)
+\pandoc@box=\box65
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/bookmark/bookmark.sty
+Package: bookmark 2023-12-10 v1.31 PDF bookmarks (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/hyperref.sty
+Package: hyperref 2026-04-24 v7.01q Hypertext links for LaTeX
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/kvdefinekeys/kvdefinekeys.sty
+Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/pdfescape/pdfescape.sty
+Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/pdftexcmds/pdftexcmds.sty
+Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/infwarerr/infwarerr.sty
+Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO)
+)
+Package pdftexcmds Info: \pdf@primitive is available.
+Package pdftexcmds Info: \pdf@ifprimitive is available.
+Package pdftexcmds Info: \pdfdraftmode not found.
+)) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hycolor/hycolor.sty
+Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/nameref.sty
+Package: nameref 2026-01-29 v2.58 Cross-referencing by name of section
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/refcount/refcount.sty
+Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty
+Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO)
+)
+\c@section@level=\count327
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/stringenc/stringenc.sty
+Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO)
+)
+\@linkdim=\dimen189
+\Hy@linkcounter=\count328
+\Hy@pagecounter=\count329
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/pd1enc.def
+File: pd1enc.def 2026-04-24 v7.01q Hyperref: PDFDocEncoding definition (HO)
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/intcalc/intcalc.sty
+Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO)
+)
+\Hy@SavedSpaceFactor=\count330
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/puenc.def
+File: puenc.def 2026-04-24 v7.01q Hyperref: PDF Unicode definition (HO)
+)
+Package hyperref Info: Option `unicode' set `true' on input line 4070.
+Package hyperref Info: Hyper figures OFF on input line 4199.
+Package hyperref Info: Link nesting OFF on input line 4204.
+Package hyperref Info: Hyper index ON on input line 4207.
+Package hyperref Info: Plain pages OFF on input line 4214.
+Package hyperref Info: Backreferencing OFF on input line 4219.
+Package hyperref Info: Implicit mode ON; LaTeX internals redefined.
+Package hyperref Info: Bookmarks ON on input line 4466.
+\c@Hy@tempcnt=\count331
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/url/url.sty
+\Urlmuskip=\muskip19
+Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc.
+)
+LaTeX Info: Redefining \url on input line 4805.
+\XeTeXLinkMargin=\dimen190
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/bitset/bitset.sty
+Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/bigintcalc/bigintcalc.sty
+Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO)
+))
+\Fld@menulength=\count332
+\Field@Width=\dimen191
+\Fld@charsize=\dimen192
+Package hyperref Info: Hyper figures OFF on input line 6091.
+Package hyperref Info: Link nesting OFF on input line 6096.
+Package hyperref Info: Hyper index ON on input line 6099.
+Package hyperref Info: backreferencing OFF on input line 6106.
+Package hyperref Info: Link coloring OFF on input line 6111.
+Package hyperref Info: Link coloring with OCG OFF on input line 6116.
+Package hyperref Info: PDF/A mode OFF on input line 6121.
+\Hy@abspage=\count333
+\c@Item=\count334
+\c@Hfootnote=\count335
+)
+Package hyperref Info: Driver (autodetected): hxetex.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/hyperref/hxetex.def
+File: hxetex.def 2026-04-24 v7.01q Hyperref driver for XeTeX
+\pdfm@box=\box66
+\c@Hy@AnnotLevel=\count336
+\HyField@AnnotCount=\count337
+\Fld@listcount=\count338
+\c@bookmark@seq@number=\count339
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/rerunfilecheck/rerunfilecheck.sty
+Package: rerunfilecheck 2025-06-21 v1.11 Rerun checks for auxiliary files (HO)
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/generic/uniquecounter/uniquecounter.sty
+Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO)
+)
+Package uniquecounter Info: New unique counter `rerunfilecheck' on input line 284.
+)
+\Hy@SectionHShift=\skip65
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/bookmark/bkm-dvipdfm.def
+File: bkm-dvipdfm.def 2023-12-10 v1.31 bookmark driver for dvipdfm (HO)
+\BKM@id=\count340
+)) (./nns-book.aux)
+\openout1 = `nns-book.aux'.
+
+LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for TU/lmr/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 128.
+LaTeX Font Info: ... okay on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathrm' in version `normal'
+(Font) OT1/lmr/m/n --> TU/lmr/m/n on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal'
+(Font) OT1/lmr/m/it --> TU/lmr/m/it on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal'
+(Font) OT1/lmr/bx/n --> TU/lmr/bx/n on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal'
+(Font) OT1/lmss/m/n --> TU/lmss/m/n on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold'
+(Font) OT1/lmss/bx/n --> TU/lmss/bx/n on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal'
+(Font) OT1/lmtt/m/n --> TU/lmtt/m/n on input line 128.
+LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold'
+(Font) OT1/lmtt/m/n --> TU/lmtt/bx/n on input line 128.
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) Font family 'latinmodern-math.otf(0)' created for font
+(fontspec) 'latinmodern-math.otf' with options
+(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,BoldFont={latinmodern-math.otf}].
+(fontspec)
+(fontspec) This font family consists of the following NFSS
+(fontspec) series/shapes:
+(fontspec)
+(fontspec) - 'normal' (m/n) with NFSS spec.:
+(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;"
+(fontspec) - 'bold' (b/n) with NFSS spec.:
+(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;"
+
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(0)/m/n' will be
+(Font) scaled to size 10.0pt on input line 128.
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) Font family 'latinmodern-math.otf(1)' created for font
+(fontspec) 'latinmodern-math.otf' with options
+(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf}].
+(fontspec)
+(fontspec) This font family consists of the following NFSS
+(fontspec) series/shapes:
+(fontspec)
+(fontspec) - 'normal' (m/n) with NFSS spec.:
+(fontspec) <8.5->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;"
+(fontspec) - 'bold' (b/n) with NFSS spec.:
+(fontspec) <->s*[0.9999964596882403]"[latinmodern-math.otf]/OT:script=math;language=dflt;"
+
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 10.0pt on input line 128.
+LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
+(Font) `operators' in the math version `normal' on input line 128.
+LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
+(Font) OT1/lmr/m/n --> TU/latinmodern-math.otf(1)/m/n on input line 128.
+LaTeX Font Info: Encoding `OT1' has changed to `TU' for symbol font
+(Font) `operators' in the math version `bold' on input line 128.
+LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
+(Font) OT1/lmr/bx/n --> TU/latinmodern-math.otf(1)/b/n on input line 128.
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 1.000096459334209.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 1.000096459334209.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 1.000096459334209.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 1.000096459334209.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 1.000096459334209.
+
+
+Package fontspec Info:
+(fontspec) Font family 'latinmodern-math.otf(2)' created for font
+(fontspec) 'latinmodern-math.otf' with options
+(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf},ScaleAgain=1.0001,FontAdjustment={\fontdimen
+(fontspec) 8\font =6.77pt\relax \fontdimen 9\font =3.94pt\relax
+(fontspec) \fontdimen 10\font =4.44pt\relax \fontdimen 11\font
+(fontspec) =6.86pt\relax \fontdimen 12\font =3.45pt\relax
+(fontspec) \fontdimen 13\font =3.63pt\relax \fontdimen 14\font
+(fontspec) =3.63pt\relax \fontdimen 15\font =2.89pt\relax
+(fontspec) \fontdimen 16\font =2.47pt\relax \fontdimen 17\font
+(fontspec) =2.47pt\relax \fontdimen 18\font =2.5pt\relax
+(fontspec) \fontdimen 19\font =2.0pt\relax \fontdimen 22\font
+(fontspec) =2.5pt\relax \fontdimen 20\font =0pt\relax \fontdimen
+(fontspec) 21\font =0pt\relax }].
+(fontspec)
+(fontspec) This font family consists of the following NFSS
+(fontspec) series/shapes:
+(fontspec)
+(fontspec) - 'normal' (m/n) with NFSS spec.:
+(fontspec) <8.5->s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;"
+(fontspec) - 'bold' (b/n) with NFSS spec.:
+(fontspec) <->s*[1.000096459334209]"[latinmodern-math.otf]/OT:script=math;language=dflt;"
+
+LaTeX Font Info: Encoding `OMS' has changed to `TU' for symbol font
+(Font) `symbols' in the math version `normal' on input line 128.
+LaTeX Font Info: Overwriting symbol font `symbols' in version `normal'
+(Font) OMS/lmsy/m/n --> TU/latinmodern-math.otf(2)/m/n on input line 128.
+LaTeX Font Info: Encoding `OMS' has changed to `TU' for symbol font
+(Font) `symbols' in the math version `bold' on input line 128.
+LaTeX Font Info: Overwriting symbol font `symbols' in version `bold'
+(Font) OMS/lmsy/b/n --> TU/latinmodern-math.otf(2)/b/n on input line 128.
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9998964600422715.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9998964600422715.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9998964600422715.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9998964600422715.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9999964596882403.
+
+
+Package fontspec Info:
+(fontspec) latinmodern-math scale = 0.9998964600422715.
+
+
+Package fontspec Info:
+(fontspec) Font family 'latinmodern-math.otf(3)' created for font
+(fontspec) 'latinmodern-math.otf' with options
+(fontspec) [Scale=MatchLowercase,BoldItalicFont={},ItalicFont={},SmallCapsFont={},Script=Math,SizeFeatures={{Size=8.5-},{Size=6-8.5,Font=latinmodern-math.otf,Style=MathScript},{Size=-6,Font=latinmodern-math.otf,Style=MathScriptScript}},BoldFont={latinmodern-math.otf},ScaleAgain=0.9999,FontAdjustment={\fontdimen
+(fontspec) 8\font =0.4pt\relax \fontdimen 9\font =2.0pt\relax
+(fontspec) \fontdimen 10\font =1.67pt\relax \fontdimen 11\font
+(fontspec) =1.11pt\relax \fontdimen 12\font =6.0pt\relax
+(fontspec) \fontdimen 13\font =0pt\relax }].
+(fontspec)
+(fontspec) This font family consists of the following NFSS
+(fontspec) series/shapes:
+(fontspec)
+(fontspec) - 'normal' (m/n) with NFSS spec.:
+(fontspec) <8.5->s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;"<6-8.5>s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=0;"<-6>s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;+ssty=1;"
+(fontspec) - 'bold' (b/n) with NFSS spec.:
+(fontspec) <->s*[0.9998964600422715]"[latinmodern-math.otf]/OT:script=math;language=dflt;"
+
+LaTeX Font Info: Encoding `OMX' has changed to `TU' for symbol font
+(Font) `largesymbols' in the math version `normal' on input line 128.
+LaTeX Font Info: Overwriting symbol font `largesymbols' in version `normal'
+(Font) OMX/lmex/m/n --> TU/latinmodern-math.otf(3)/m/n on input line 128.
+LaTeX Font Info: Encoding `OMX' has changed to `TU' for symbol font
+(Font) `largesymbols' in the math version `bold' on input line 128.
+LaTeX Font Info: Overwriting symbol font `largesymbols' in version `bold'
+(Font) OMX/lmex/m/n --> TU/latinmodern-math.otf(3)/b/n on input line 128.
+LaTeX Info: Redefining \microtypecontext on input line 128.
+Package microtype Info: Applying patch `item' on input line 128.
+Package microtype Info: Applying patch `toc' on input line 128.
+Package microtype Info: Applying patch `eqnum' on input line 128.
+Package microtype Info: Applying patch `footnote' on input line 128.
+Package microtype Info: Applying patch `verbatim' on input line 128.
+LaTeX Info: Redefining \microtypesetup on input line 128.
+Package microtype Info: Character protrusion enabled (level 2).
+Package microtype Info: Using protrusion set `basicmath'.
+Package microtype Info: No adjustment of tracking.
+Package microtype Info: No adjustment of spacing.
+Package microtype Info: No adjustment of kerning.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-LatinModernRoman.cfg
+File: mt-LatinModernRoman.cfg 2026/02/26 v1.2 microtype config. file: Latin Modern Roman (RS)
+)
+Package hyperref Info: Link coloring OFF on input line 128.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 12.0pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 8.0pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 6.0pt on input line 130.
+LaTeX Font Info: Trying to load font information for OML+lmm on input line 130.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/lm/omllmm.fd
+File: omllmm.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
+)
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 12.0011pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 8.00073pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 6.00055pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 11.99872pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 7.99915pt on input line 130.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 5.99936pt on input line 130.
+LaTeX Font Info: Trying to load font information for U+msa on input line 130.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/umsa.fd
+File: umsa.fd 2013/01/14 v3.01 AMS symbols A
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-msa.cfg
+File: mt-msa.cfg 2006/02/04 v1.1 microtype config. file: AMS symbols (a) (RS)
+)
+LaTeX Font Info: Trying to load font information for U+msb on input line 130.
+(c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/amsfonts/umsb.fd
+File: umsb.fd 2013/01/14 v3.01 AMS symbols B
+) (c:/Users/fredv/AppData/Roaming/TinyTeX/texmf-dist/tex/latex/microtype/mt-msb.cfg
+File: mt-msb.cfg 2005/06/01 v1.0 microtype config. file: AMS symbols (b) (RS)
+) [1
+
+
+] [2
+
+] (./nns-book.toc
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 7.0pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(1)/m/n' will be
+(Font) scaled to size 5.0pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 10.00092pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 7.00064pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(2)/m/n' will be
+(Font) scaled to size 5.00046pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 9.99893pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 6.99925pt on input line 2.
+LaTeX Font Info: Font shape `TU/latinmodern-math.otf(3)/m/n' will be
+(Font) scaled to size 4.99947pt on input line 2.
+[3] [4] [5] [6] [7] [8] [9] [10])
+\tf@toc=\write4
+\openout4 = `nns-book.toc'.
+
+[11] [12
+
+] [13]
+Underfull \hbox (badness 10000) in paragraph at lines 198--199
+[][]$[][][][][] [] [] [] [][][][][][] [] [][][] [] [][][][] [][] [][][][][][][][][] [] [][][] [] [][][][] [] [][][] [][] [][][][] [][] [][][][][][][] []
+ []
+
+[14]
+Chapter 1.
+
+Underfull \vbox (badness 1014) has occurred while \output is active []
+
+[15
+
+] [16] [17] [18] [19] [20]
+Chapter 2.
+[21
+
+] [22] [23] [24] [25] [26] [27] [28
+
+]
+Chapter 3.
+[29] [30]
+LaTeX Font Info: Font shape `TU/lmtt/bx/n' in size <10> not available
+(Font) Font shape `TU/lmtt/b/n' tried instead on input line 920.
+File: images/ch3_cdf_lpm0.png Graphic file (type bmp)
+
+[31] [32] [33]
+Underfull \hbox (badness 4144) in paragraph at lines 1083--1083
+[]\TU/lmr/bx/n/12 Lower-Tail Thresholds as Degree-Zero Partial-
+ []
+
+[34] [35] [36] [37] [38
+
+]
+Chapter 4.
+[39] [40] [41] [42] [43] [44]
+Overfull \hbox (11.9321pt too wide) detected at line 1492
+[] [] [] [] [][][]
+ []
+
+[45]
+Overfull \hbox (73.20363pt too wide) detected at line 1530
+[][] [][] [][]
+ []
+
+[46] [47] [48
+
+]
+Chapter 5.
+[49]
+Underfull \hbox (badness 10000) in paragraph at lines 1596--1598
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 1598--1600
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 1600--1602
+
+ []
+
+
+Overfull \hbox (7.22562pt too wide) has occurred while \output is active
+\TU/lmr/m/n/10 50 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES
+ []
+
+[50] [51]
+Overfull \hbox (7.22562pt too wide) has occurred while \output is active
+\TU/lmr/m/n/10 52 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES
+ []
+
+[52] [53]
+Overfull \hbox (7.22562pt too wide) has occurred while \output is active
+\TU/lmr/m/n/10 54 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES
+ []
+
+[54] [55]
+Overfull \hbox (7.22562pt too wide) has occurred while \output is active
+\TU/lmr/m/n/10 56 \TU/lmr/m/sl/10 CHAPTER 5. CLASSICAL MOMENTS AS DIRECTIONAL AGGREGATES
+ []
+
+[56
+
+]
+Chapter 6.
+
+Underfull \hbox (badness 10000) in paragraph at lines 1923--1925
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 1925--1927
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 1927--1929
+
+ []
+
+[57]
+Underfull \hbox (badness 10000) in paragraph at lines 1962--1964
+
+ []
+
+Missing character: There is no σ (U+03C3) in font [lmroman10-regular]:mapping=tex-text;!
+
+Underfull \hbox (badness 10000) in paragraph at lines 1964--1966
+
+ []
+
+[58]
+Underfull \hbox (badness 10000) in paragraph at lines 2012--2014
+
+ []
+
+[59] [60] [61] [62] [63] [64]
+Chapter 7.
+[65
+
+]
+Underfull \hbox (badness 10000) in paragraph at lines 2319--2321
+
+ []
+
+[66] [67]
+Underfull \hbox (badness 10000) in paragraph at lines 2444--2446
+
+ []
+
+[68] [69] [70]
+Underfull \hbox (badness 10000) in paragraph at lines 2604--2606
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 2606--2608
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 2608--2610
+
+ []
+
+[71] [72] [73] [74
+
+]
+Chapter 8.
+
+Underfull \hbox (badness 10000) in paragraph at lines 2673--2675
+
+ []
+
+[75] [76] [77] [78] [79]
+Underfull \hbox (badness 10000) in paragraph at lines 2924--2926
+
+ []
+
+
+Underfull \hbox (badness 10000) in paragraph at lines 2926--2928
+
+ []
+
+
+Overfull \hbox (27.84775pt too wide) detected at line 2939
+[][][][] [] [] [] [][][][] [][][][] [] [][][] [] [] [] [] [] [] [] [][] [] [][][][] [][][][] [] [][][] [] [] [] [] [] [] [] [][] [] [][][][]
+ []
+
+[80] [81] [82] [83] [84
+
+]
+Chapter 9.
+[85] [86] [87]
+Overfull \hbox (31.036pt too wide) detected at line 3294
+[][][][] [][] [] [][][][][] [][] [] [][][][][] [][] [] [][][][][] [][] [] [][][][][] [][][]
+ []
+
+[88] [89] [90]
+Overfull \hbox (93.31343pt too wide) has occurred while \output is active
+\TU/lmr/m/sl/10 9.5. GRAM-MATRIX STRUCTURE OF CONCORDANT CO-PARTIAL MOMENT MATRICES \TU/lmr/m/n/10 91
+ []
+
+[91] [92] [93] [94] [95] [96
+
+]
+Chapter 10.
+[97] [98] [99] [100]
+
+LaTeX Font Warning: Font shape `TU/lmtt/bx/it' in size <10> not available
+(Font) Font shape `TU/lmtt/b/sl' tried instead on input line 3990.
+
+[101] [102] [103] [104]
+Chapter 11.
+
+Overfull \hbox (91.39001pt too wide) detected at line 4153
+[]
+ []
+
+[105
+
+] [106]
+Overfull \hbox (6.46002pt too wide) detected at line 4284
+[]
+ []
+
+[107]
+Underfull \hbox (badness 2478) in paragraph at lines 4355--4355
+[]\TU/lmr/bx/n/14.4 Between-Within Covariance Decomposi-
+ []
+
+[108] [109]
+Overfull \hbox (4.67207pt too wide) detected at line 4519
+[] [] [] [] []
+ []
+
+[110] [111] [112] [113]
+Overfull \hbox (16.9pt too wide) detected at line 4728
+[]
+ []
+
+[114]
+Overfull \hbox (140.76006pt too wide) detected at line 4789
+[]
+ []
+
+
+Overfull \hbox (24.50304pt too wide) detected at line 4808
+[][][][][][][]
+ []
+
+[115] [116]
+Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933
+[]\TU/lmtt/m/n/10 ## quadrant n p mean_x mean_y u_x u_y lambda_rank1[]
+ []
+
+
+Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933
+[]\TU/lmtt/m/n/10 ## 1 CUPM 3732 0.3732 0.909351 0.902914 0.911723 0.911077 0.619997[]
+ []
+
+[117]
+Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933
+[]\TU/lmtt/m/n/10 ## 2 CLPM 3779 0.3779 -0.900760 -0.915622 -0.898388 -0.907459 0.616197[]
+ []
+
+
+Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933
+[]\TU/lmtt/m/n/10 ## 3 DLPM 1232 0.1232 0.464620 -0.490819 0.466992 -0.482655 0.055568[]
+ []
+
+
+Overfull \hbox (64.5pt too wide) in paragraph at lines 4933--4933
+[]\TU/lmtt/m/n/10 ## 4 DUPM 1257 0.1257 -0.466076 0.488080 -0.463704 0.496243 0.057983[]
+ []
+
+[118] [119]
+Underfull \vbox (badness 10000) has occurred while \output is active []
+
+[120]
+Underfull \vbox (badness 10000) detected at line 5198
+ []
+
+
+Underfull \vbox (badness 10000) has occurred while \output is active []
+
+[121]
+File: nns-book_files/figure-latex/clpm-mean-slope-figure-1.pdf Graphic file (type pdf)
+