Skip to content

C++ search#210

Draft
ms609 wants to merge 660 commits into
mainfrom
cpp-search
Draft

C++ search#210
ms609 wants to merge 660 commits into
mainfrom
cpp-search

Conversation

@ms609
Copy link
Copy Markdown
Owner

@ms609 ms609 commented Mar 19, 2026

  • other optimizations + features

Manual testing underway; shiny app in particular has some usability issues.

@ms609 ms609 marked this pull request as draft March 25, 2026 14:21
ms609 added 29 commits March 27, 2026 07:01
- Add pipeline step 5a to driven search list (post-ratchet XSS+RSS+CSS)
- Add 'Post-ratchet sectorial pass' subsection in sectorial search section
- Fix stale consensusStableReps preset documentation (T-264 disabled it)
- Update AGENTS.md pipeline with steps 5a, 6 (NNI-perturb), 8 (PCSA), 10-11
…(S-RED focus 10)

When divided_steps puts s > info_max_steps, compute_profile() caps the
cost at info_amounts[info_max_steps-1], but precompute_profile_delta used
old_cost = 0.0 for any s outside [1, info_max_steps]. This overestimated
the delta, causing overly conservative candidate rejection in profile TBR
screening. Fix: mirror the capping logic from compute_profile().

Conservative bug: no incorrect trees accepted, only missed improvements
on unusually homoplastic characters. Very low practical impact.
Regression test added (test-ts-iw-profile-red10.R).
…date AGENTS.md worktree table, triage u.005 (interleaved sectorial design rationale added to T-269 notes)
… T-235 SPR fix verified, flat_blocks.active_mask latent noted
T-266: Taxon pruning-reinsertion perturbation
ms609 and others added 5 commits May 18, 2026 13:07
In R CMD check, R runs as a non-interactive subprocess with captured
stdout. R_FlushConsole() calls fflush() on that pipe; when the buffer
fills the call blocks indefinitely, causing the 6 h GHA timeout seen
on every ubuntu runner for PR #210.

Gate the \r-overwrite progress line and the flush behind R_Interactive
(FALSE in batch/check contexts). Interactive sessions are unchanged.
At verbosity >= 2 in batch mode, emit plain \n-terminated lines so
diagnostic logs still carry progress detail without the flush risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…taset

S-RED area-5 finding: ts_sector.cpp's build_reduced_dataset() did not copy
flat_blocks or all_weight_one from the full DataSet. Since use_flat =
ds.all_weight_one is checked at the top of every tbr_search() call, sector
TBR always used the slower non-flat path even when all characters have
weight=1. Fixing all_weight_one without flat_blocks would have triggered UB
(flat functions dereference flat_blocks.data()); both fixed together.

Also:
- profiling round 1: Ratchet area (#2) PROFILED — 62% of search time via
  verbosity=2 phase data; VTune not available for per-function hotspot;
  T-300 (lazy rescore) is the actionable follow-up
- focus-areas.md: NNI-perturb → AT-LIMIT (disabled via T-274), Ratchet →
  PROFILED with 2026-05-18 baseline (2.80 s/rep Zhu2013 thorough)
- red-team.md last_focus → area 5 with full findings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ms609 added a commit that referenced this pull request May 18, 2026
In R CMD check, R runs as a non-interactive subprocess with captured
stdout. R_FlushConsole() calls fflush() on that pipe; when the buffer
fills the call blocks indefinitely, causing the 6 h GHA timeout seen
on every ubuntu runner for PR #210.

Gate the \r-overwrite progress line and the flush behind R_Interactive
(FALSE in batch/check contexts). Interactive sessions are unchanged.
At verbosity >= 2 in batch mode, emit plain \n-terminated lines so
diagnostic logs still carry progress detail without the flush risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ms609 and others added 22 commits May 18, 2026 23:09
Replace full_rescore() on the accept path with two incremental
downpass+uppass pairs from nz (clip grandparent) and nx (spare/regraft
node) — the two stale chains after apply_tbr_move in the SPR case.
TBR moves (with non-trivial rerooting) still fall back to full_rescore.

DEBUG_RESCORE guard enabled: asserts incremental == full_rescore on
every accepted SPR move. Remove #define once GHA passes cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ash on linux

Incremental SPR rescore systematically underestimates score by 3 steps
(observed on ubuntu GHA; diff=-3 consistent across all test cases).
Root cause not identified; full_rescore restored as the safe fallback.
T-300 to be redesigned with per-node local_cost instrumentation offline.

R_Interactive guard in ts_parallel.cpp is kept (separate fix, working).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four tests in test-ts-stopping.R and one in test-ts-xpiwe.R loaded
inapplicable.datasets but then used inapplicable.phyData, a different
object. Tests only passed because a prior test had already loaded
inapplicable.phyData into the environment. Changed to load the correct
object (inapplicable.phyData) so tests are self-contained.

Found by red-team area #8 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ailure

ts_drift.cpp:680 — second drift_apply_tbr_move call in the EW RFD re-apply
path lacked drift_restore_topology() before build_postorder() on failure.
Mirrors the correct pattern at line 589. Path is theoretically unreachable
(same args succeed on first apply → succeed on second), but guards against
partial step-1 detach leaving an invalid tree if the path were ever reached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace full_rescore() on the SPR accept path with a single-pass postorder
walk over the union of paths nz→root and nx→root.  Each affected node is
visited exactly once, reading current children's prelims — no shared-
ancestor double-read, no IW arithmetic on stale state.

This supersedes the reverted overlap-chain implementation (1e3fc9a),
which underestimated EW length by exactly 3 steps on every accepted move.
Root cause of the −3 was never identified analytically; the dirty-set
algorithm sidesteps the question by construction.

New functions in ts_fitch.cpp:
  fitch_dirty_downpass(tree, ds, start_a, start_b) — postorder rescore over
    the dirty set; returns EW length delta.
  fitch_dirty_uppass(tree, ds, start_a, start_b)   — companion final_ pass,
    seeded from the same start points, propagating downward.

Wired into tbr_search at ts_tbr.cpp:1138, gated on is_spr && !has_na.
TBR moves with non-trivial rerooting and datasets with inapplicable
characters continue to use full_rescore (NA variant deferred).

DEBUG_RESCORE cross-check left enabled: every accepted SPR move asserts
incremental == full_rescore.  Remove the define in a follow-up after one
clean GHA cycle.

Local validation (Zhu2013, congreveLamsdellMatrices, synthetic 30×25 EW):
  - 92 tests passed across test-ts-tbr-search, test-ts-tbr-symmetry,
    test-ts-ratchet-search, test-ts-ratchet-opt.
  - 0 DEBUG_RESCORE mismatches across EW + IW paths with multiple SPR
    accepts per replicate (scores 173, 192, 9.140673).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Profiling round 2 — area #4 (TBR full-rescore at acceptance):
- VTune confirmed full_rescore is 19.2% self-time (fitch_na_score 0.585 s
  + load_tip_states 0.032 s) on Zhu2013 thorough preset, n_threads=1.
- All time flows through tbr_search line 1138 (accept path), confirming
  T-300 as the right target.  Zhu2013 has inapplicable characters so the
  initial T-300 commit (EW only) does not benefit it; an NA variant is
  the next follow-up.
- Secondary candidate identified: ts::simd::any_hit_reduce_avx2 at 9.6%.

Adds dev/profiling/drivers/tbr-rescore.R (Zhu2013 driver, ~3.9 s bare).
Updates baselines.md, log.md, focus-areas.md, findings.md.
RESUME.md refreshed for hand-off after T-300 push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a temporary full_rescore cross-check on every NNI incremental delta,
gated to EW (IW has a separate pre-existing arithmetic bug, see below).
Together with DEBUG_RESCORE in tbr_search, this tests whether the
fitch_incremental_downpass single-chain primitive is the source of the
−3 that took down the previous T-300 attempt.

Local validation (Vinther2008 EW + synthetic 30×25 EW, 5 replicates each):
  0 DEBUG_NNI_RESCORE assertions on the EW path.

Interpretation: the single-chain primitive is correct.  The −3 in the
reverted T-300 must have arisen from the two-chain composition itself,
not from a bug in fitch_incremental_downpass.  Both my analytical
argument (deltas cancel) and the dirty-set empirical result are
consistent with the primitive being sound; some interaction *between*
chains (e.g. shared save_node_state ordering, or a read at LCA I haven't
isolated) produced the systematic underestimate.  The dirty-set
algorithm avoids the question by construction.

Separate finding while wiring this assertion: nni_search at line 83
computes `new_score = best_score + delta` unconditionally — but `delta`
is an integer EW step delta from fitch_incremental_downpass, and
`best_score` for IW is a float weighted score.  The IW NNI path
therefore reports wrong intermediate scores (diffs of 0.01–2.0 against
full_rescore in local testing).  Impact is bounded — nni_search() ends
with a final full_rescore, so the returned score is correct, but
intermediate accept/reject decisions under IW use a wrong predicate.
Filed as a follow-up to fix via extract_char_steps + compute_weighted_score
(same pattern as tbr_search accept path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The NNI accept-path in nni_search() added an integer EW step delta to a
float weighted score whenever ds.concavity was finite, producing wrong
intermediate scores under IW/profile parsimony.  Empirically, a 30x25
synthetic EW phyDat with concavity=10 triggered 17+ DEBUG_NNI_RESCORE
mismatches per replicate (non-integer diffs like 0.96, -1.94); the EW
path emitted zero.

After fitch_incremental_downpass, tree.local_cost is correct for the
whole tree (NNI only changes children at edge c; off-chain nodes retain
valid local_cost from the score_tree at function entry).  Mirror the
T-300 SPR accept-path pattern from ts_tbr.cpp: under std::isfinite(
ds.concavity), call extract_char_steps + compute_weighted_score (which
dispatches IW vs profile by ds.scoring_mode).  EW path keeps
best_score + delta.

Also remove the EW-only gate around DEBUG_NNI_RESCORE — it was there
purely to mask the now-fixed IW bug.  Leaving the assertion live on
both paths catches any regression in either branch.

Local: dev/plans/t300-test.R clean on both EW and IW; full
test-ts-(nni|search|driven|tbr) testthat suites pass with no
DEBUG_NNI_RESCORE emissions.

T-300 follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two GHA cycles (R-CMD-check 26078069315, ASAN 26078069318, post-NNI-fix
combined 26078652587) confirm zero assertions across all platforms on
both EW and IW paths.  The cross-check scaffolds have served their
purpose: the dirty-set rescore in tbr_search and the corrected NNI
incremental in nni_search both match full_rescore exactly.

Removes:
- src/ts_tbr.cpp:23 #define DEBUG_RESCORE
- src/ts_tbr.cpp:1166–1176 #ifdef block (full_rescore + actual = ref)
- src/ts_search.cpp:20 #define DEBUG_NNI_RESCORE
- src/ts_search.cpp:103–119 #ifdef block (save/full_rescore/restore)

The actual rescore work (extract_char_steps + compute_weighted_score
branch for IW, dirty-set delta for EW) is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds NA-aware dirty-set Pass 1 + Pass 2 (fitch_na_dirty_downpass and
fitch_na_dirty_uppass) mirroring the EW dirty-set, and wires them into
tbr_search's SPR accept path under has_inapplicable.  Pass 3 still runs
over the full tree because it populates internal down2 (read by
extract_char_steps) and counts NA-block steps; savings come from
skipping Pass 1 + Pass 2 on off-dirty nodes.

Bug found during validation: full_rescore for NA EW data routes through
fitch_score_ew which adds ds.ew_offset (topology-independent step count)
on top of fitch_na_score's return.  My accept path was reading
fitch_na_pass3_score directly without the offset — producing a
systematic diff=−3 on Vinther2008 (where ew_offset = 3).  Fix: add
ds.ew_offset in the EW branch (line 1183).  IW path is unaffected
because compute_weighted_score handles per-pattern step contributions
directly.

Validation: DEBUG_NA_RESCORE cross-check kept for one GHA cycle.
Local Vinther2008 run, 5 replicates each EW + IW:
  DEBUG_RESCORE     (EW, IW):      0 / 0 mismatches
  DEBUG_NA_RESCORE  (NA-EW, NA-IW): 0 / 0 mismatches

Also includes dev/plans/t300-test.R (the four-path test harness) and
dev/plans/pr244-examples-hang-brief.md (background-agent brief for the
separate PR #244 hang investigation).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Threads ConcordanceTable() into the PaintCharacters example so the two
functions appear together in the rendered example block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R CMD check flagged R_Interactive as a non-API entry point.  Switch to
isatty(fileno(stdout)) (POSIX) / _isatty(_fileno(stdout)) (Windows),
which is the public equivalent: a captured pipe is never a TTY, an
interactive console is.  Preserves the deadlock fix from b186e80
(no flush in batch mode) without using internal R symbols.

Also wrap the slow TreeLength examples (profile parsimony,
random-tree scoring, HSJ block) in \donttest{}: the example block
was clocking >5 s elapsed on CI runners.
Fixes negative-delta scoring in LengthAdded() by unwrapping qmApp to a scalar when multiple all-applicable rows exist in the contrast matrix.

Also keys the contrast-row validity check on used tokens (so stale zero-sum rows for unused tokens no longer error) and adds a regression test for the qmApp fix and a 6-token error-message check via Asher2005[, 67].

The examples-hang on devel/arm/macOS that blocked merge for several days was caused by R_Interactive being a non-API symbol; the portable fix landed on cpp-search as 2b6b6be and was pulled in via merge f59c490.
GHA across all platforms confirmed the NA dirty-set rescore matches
full_rescore exactly (zero DEBUG_NA_RESCORE mismatches).  Drop the
guarded cross-check block and the define, mirroring the EW cleanup
in 44a4ebe.
Drops the NA-aware dirty-set rescore (commit 014ccde + cleanup 5b210fd)
from 3.88 s to 3.29 s median on Zhu2013 / 12 ratchet reps — a 15.2 %
wall-time reduction, against an 18.2 % upper bound predicted from
VTune's fitch_na_score share of DLL time.  Score 647 matches baseline
on every replicate.

Baseline confirmed via dev/profiling/t300_na_bench.R against an A/B of
five contemporaneous lib snapshots installed from short-lived
worktrees, ruling out shared-dep contamination (TreeTools, Rcpp).
Records the driver script and updates focus-areas.md to mark area #4
DONE.

.gitignore tweaked to exclude the throwaway .bench-libs/ and .rds
files the driver emits.
…te (#245)

Closes the coverage gap that let the IW NNI rescore bug (3df9088) ship.
The existing IW NNI test (test-ts-spr-nni-opt.R) only asserted
result$score >= 0 and topology shape — it never cross-checked that the
returned score matched the IW score of the returned topology.  Pre-fix,
nni_search did new_score = best_score + integer_delta unconditionally,
so under IW the returned score was the accumulated wrong float and the
existing assertions still passed.

New file mirrors the EW pattern from "NNI: 15-tip random dataset"
(result$score == ts_score(rt, ds)) but under concavity=10, sweeps a few
concavities, and exercises multistate + NA cases.  Profile parsimony
isn't reached because ts_nni_search's Rcpp bridge doesn't forward
infoAmounts; the IW path through compute_weighted_score is the same
codepath the fix unified, so this catches both modes at the C++ level.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant