Improve parallelism by solving most difficult deals first#216
Draft
tameware wants to merge 6 commits into
Draft
Conversation
tameware
commented
Jun 29, 2026
Collaborator
The heuristic extraction refactor changed weight_alloc_trump_void1's first branch from `lead_suit == trump` to `suit == trump`. Since that is exhaustive with the following `else if (suit != trump)`, the three ruffing branches (using the `24 - rank + ...` formula) became dead code, and trump ruffs were scored with side-suit discard weights instead. This mis-ordered ruffs, costing alpha-beta cutoffs. The effect is small for solve but compounds heavily in calc's warm-TT iterative deepening: calc explored ~34% more nodes than v2.9. Restoring the original `lead_suit == trump` pitch branch makes the ruffing branches reachable again and cuts calc time ~25% (gap to v2.9: 1.37x -> 1.02x). Ordering-only change; double-dummy results are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
Per Copilot. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The heuristic/quick-tricks refactor introduced static_cast<unsigned char>
wrappers on values that v2.9 used as signed, changing search behavior:
- make_3 / make_3_ctx: winner[]/second_best[] .hand and .rank were cast
to unsigned char, turning the -1 "no card" sentinel into 255. This broke
winner[trump].hand == -1 style checks in QuickTricks, losing cutoffs.
- weight_alloc_trump_void2 / _void3: rel_rank[aggr[suit]][...] indexed
through static_cast<unsigned char>(aggr[suit]), truncating the 13-bit
aggregate holding to 8 bits and reading the wrong rel_rank row.
- QuickTricksPartnerHand{Trump,NT}: bit_map_rank index cast the signed
rank through unsigned char.
With these reverted to v2.9's signed handling, the per-move-generation
ordering trace now matches v2.9 exactly (0 divergences on list1), closing
the residual calc gap to parity. Ordering/pruning-only change; double-dummy
results are unchanged and all library tests pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
The parallel board loop handed boards out in index order via an atomic counter, so a hard board picked near the end left one worker running long while the others sat idle. Hand out the hardest boards first (longest- processing-time-first) so the tail consists of cheap boards. parallel_all_boards_n gains an optional dispatch-order permutation: workers still pull from the same atomic counter, but the slot is mapped through the order before becoming a board number, so only the dispatch sequence changes and result placement is unaffected. The solve path passes no order and is unchanged. calc estimates per-deal difficulty with a cheap, trump-independent structural proxy (deal_fanout, mirroring Scheduler::Fanout) and sorts board indices by descending difficulty before dispatch. calc list1000 -n18: ~11.0s -> ~9.6s wall (~13%), user CPU unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
CalcDDtableN builds one board per strain for a single deal. deal_fanout is trump-independent, so all boards share one fanout and the difficulty sort is a pure no-op there. Gate the sort behind a difficulty_sort flag (default on for batch CalcAllTablesN) and disable it for the single-deal path. Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR aims to improve overall throughput for batch calculations by reducing “tail latency” in parallel workloads: it estimates deal difficulty cheaply, sorts boards hardest-first, and adds an optional dispatch-order mechanism to the parallel board runner.
Changes:
- Extend
parallel_all_boards_n()to optionally dispatch boards in a caller-provided order. - Add a cheap per-deal “fanout” estimate and use it to stable-sort batch
calcboards hardest-first before parallel execution. - Simplify/remove several legacy
static_cast<unsigned char>(...)conversions in solver/heuristic code paths.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| library/src/system/parallel_boards.hpp | Adds optional order parameter to control dispatch order (hardest-first, etc.). |
| library/src/system/parallel_boards.cpp | Implements ordered dispatch via slot→board mapping. |
| library/src/calc_tables.cpp | Computes per-deal difficulty estimate and dispatches hardest boards first for batch calc. |
| library/src/heuristic_sorting/heuristic_sorting.cpp | Cleans up heuristic code (including rel-rank indexing casts) and adjusts some void/trump logic. |
| library/src/quick_tricks.cpp | Removes redundant casts when indexing with abs_rank[..].rank. |
| library/src/ab_search.cpp | Removes redundant casts when copying abs_rank winner/second-best into Pos. |
Comment on lines
+45
to
+51
| // Map a dispatch slot to the board number to process. With an order, hand out | ||
| // boards in that sequence (e.g. hardest first); otherwise in index order. | ||
| const bool use_order = | ||
| (order != nullptr && static_cast<int>(order->size()) == count); | ||
| auto board_of = [&](const int slot) -> int { | ||
| return use_order ? (*order)[static_cast<unsigned>(slot)] : slot; | ||
| }; |
Only honor the optional dispatch order when it is a valid permutation of [0, count: each element in range and unique. A malformed order (duplicates or out-of-range values) now falls back to index order, preventing invalid board indices from reaching process_board. EOF ) Co-authored-by: Cursor <cursoragent@cursor.com>
Comment on lines
+45
to
+51
| // Map a dispatch slot to the board number to process. With an order, hand out | ||
| // boards in that sequence (e.g. hardest first); otherwise in index order. | ||
| const bool use_order = | ||
| (order != nullptr && static_cast<int>(order->size()) == count); | ||
| auto board_of = [&](const int slot) -> int { | ||
| return use_order ? (*order)[static_cast<unsigned>(slot)] : slot; | ||
| }; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.