Skip to content

[mathcore] Fix TKDTreeBinning::FindBin returning out-of-range bins #22649

Open
guitargeek wants to merge 2 commits into
root-project:masterfrom
guitargeek:issue-10784
Open

[mathcore] Fix TKDTreeBinning::FindBin returning out-of-range bins #22649
guitargeek wants to merge 2 commits into
root-project:masterfrom
guitargeek:issue-10784

Conversation

@guitargeek

@guitargeek guitargeek commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

TKDTreeBinning computed fNBins independently from the kd-tree it builds. SetNBins picks a bucket size of fDataSize / nBins (integer division), but the underlying TKDTree creates ceil(fNPoints / bucketSize) terminal nodes. When the requested number of bins does not divide the data size evenly, the bucket size is rounded down and the tree ends up with more terminal nodes than fNBins. Since FindBin returns a terminal-node index, it could return a value >= GetNBins(), pointing at a bin with no edges and no content:

binning.GetNBins()                         # 1001
binning.FindBin([1.2, 0.5, 0.5, 0.5, 0.5]) # 1004
binning.GetBinContent(1004)                # RuntimeWarning: No such bin

Set fNBins to the actual number of terminal nodes (GetNNodes() + 1) after building the tree, so FindBin always returns an index within [0, GetNBins()) and every bin has valid edges and content. This also self-corrects existing objects on read-back, as the Streamer rebuilds the tree via SetNBins.

Also fix SetBinsContent to read the per-node point counts directly from the tree (GetNPointsNode) instead of an incorrect last-bin heuristic, so bin contents are exact and sum to the data size.

Add a regression test covering the non-evenly-dividing case.

Closes #10786.

🤖 Done with the help of AI.

The test had no failure return code, so errors would go undetected.
TKDTreeBinning computed fNBins independently from the kd-tree it builds.
SetNBins picks a bucket size of fDataSize / nBins (integer division), but
the underlying TKDTree creates ceil(fNPoints / bucketSize) terminal nodes.
When the requested number of bins does not divide the data size evenly, the
bucket size is rounded down and the tree ends up with more terminal nodes
than fNBins. Since FindBin returns a terminal-node index, it could return a
value >= GetNBins(), pointing at a bin with no edges and no content:

```c++
binning.GetNBins()                         # 1001
binning.FindBin([1.2, 0.5, 0.5, 0.5, 0.5]) # 1004
binning.GetBinContent(1004)                # RuntimeWarning: No such bin
```

Set fNBins to the actual number of terminal nodes (GetNNodes() + 1) after
building the tree, so FindBin always returns an index within
[0, GetNBins()) and every bin has valid edges and content. This also
self-corrects existing objects on read-back, as the Streamer rebuilds the
tree via SetNBins.

Also fix SetBinsContent to read the per-node point counts directly from the
tree (GetNPointsNode) instead of an incorrect last-bin heuristic, so bin
contents are exact and sum to the data size.

Add a regression test covering the non-evenly-dividing case.

Closes root-project#10786.

🤖 Done with the help of AI.
@guitargeek guitargeek linked an issue Jun 18, 2026 that may be closed by this pull request
1 task
@github-actions

Copy link
Copy Markdown

Test Results

    22 files      22 suites   3d 19h 24m 34s ⏱️
 3 869 tests  3 848 ✅   0 💤 21 ❌
76 423 runs  76 257 ✅ 145 💤 21 ❌

For more details on these failures, see this check.

Results for commit 1c6eafc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TKDTreeBinning missing bins

1 participant