Improve parallelism by solving most difficult deals first by tameware · Pull Request #216 · dds-bridge/dds

tameware · 2026-06-29T02:40:21Z

$ time ./benchmark.sh --branch opus-two-percent --branch opus-improve-parallelism --repeats 5 --max-deals 10000
Building dtest from 'opus-two-percent'...
Building dtest from 'opus-improve-parallelism'...
Restoring 'benchmark'...
DDS dtest benchmark
===================
branch:      branch 'opus-two-percent' (/var/folders/12/xtx6dlwd0mdcxkspvmsszsrc0000gn/T//dds-dtest-branch.Cf0Cly)
compare:     branch 'opus-improve-parallelism' (/var/folders/12/xtx6dlwd0mdcxkspvmsszsrc0000gn/T//dds-dtest-compare.gwQNLJ)
details:     off (summary only)
run order:   interleaved branch, compare
epsilon:     0.5%
hands dir:   /Users/adamw/src/dds/hands
max_deals:   10000
files:       list10000.txt list1000.txt list100.txt list10.txt list1.txt
git branch:  benchmark
repeats:     5


Summary (avg user ms)
==============================================================================
solver file          opus-improve opus-two-per cmp/branch note
------ ------------- ------------ ------------ ---------- ---------------
solve  list10000.txt         2.67         2.63      1.01x opus-two-percent faster
solve  list1000.txt          2.12         2.09      1.01x opus-two-percent faster
solve  list100.txt           2.06         2.05      1.00x equal
solve  list10.txt            4.20         4.30      0.98x opus-improve-parallelism faster
solve  list1.txt            12.00        12.00      1.00x equal
calc   list10000.txt         9.68        10.25      0.94x opus-improve-parallelism faster
calc   list1000.txt          9.63        10.64      0.91x opus-improve-parallelism faster
calc   list100.txt           8.22         8.70      0.95x opus-improve-parallelism faster
calc   list10.txt           14.44        14.38      1.00x equal
calc   list1.txt            38.20        38.00      1.01x opus-two-percent faster
------ ------------- ------------ ------------ ---------- ---------------
TOTAL  elapsed (s)         684.60       716.16      0.96x opus-improve-parallelism faster

Completed 100 runs (100 expected).
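
The `note` column in the summary appears to follow from the `cmp/branch` ratio and the 0.5% epsilon. A minimal sketch of that classification, assuming the ratio is compare time divided by branch time and that a result within epsilon of 1.0 is reported as equal; `Verdict` and `classify` are hypothetical names, not taken from benchmark.sh:

```cpp
// Hypothetical reconstruction of the summary's "note" column.
enum class Verdict { CompareFaster, Equal, BranchFaster };

// compare_ms: average user ms for the compare branch (first numeric column).
// branch_ms:  average user ms for the base branch (second numeric column).
// epsilon:    relative tolerance, e.g. 0.005 for the 0.5% shown above.
Verdict classify(double compare_ms, double branch_ms, double epsilon) {
  const double ratio = compare_ms / branch_ms;  // the cmp/branch column
  if (ratio > 1.0 + epsilon) return Verdict::BranchFaster;
  if (ratio < 1.0 - epsilon) return Verdict::CompareFaster;
  return Verdict::Equal;
}
```

Under these assumptions, `classify(2.06, 2.05, 0.005)` lands inside the tolerance band, which agrees with the `equal` note on the solve list100.txt row.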

The heuristic extraction refactor changed `weight_alloc_trump_void1`'s first branch from `lead_suit == trump` to `suit == trump`. Since that is exhaustive with the following `else if (suit != trump)`, the three ruffing branches (using the `24 - rank + ...` formula) became dead code, and trump ruffs were scored with side-suit discard weights instead. This mis-ordered ruffs, costing alpha-beta cutoffs. The effect is small for solve but compounds heavily in calc's warm-TT iterative deepening: calc explored ~34% more nodes than v2.9.

Restoring the original `lead_suit == trump` pitch branch makes the ruffing branches reachable again and cuts calc time ~25% (gap to v2.9: 1.37x -> 1.02x). Ordering-only change; double-dummy results are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
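
The dead-code effect described in this commit can be seen in a stripped-down model of the branch structure. This is only a sketch: `Branch`, `classify_refactored`, and `classify_restored` are hypothetical names, and the real function computes weights rather than returning a label.

```cpp
// Simplified, hypothetical model of the three-way branch; not the real
// weight_alloc_trump_void1 body.
enum class Branch { Pitch, Discard, Ruff };

// Refactored form: `suit == trump` and `suit != trump` together cover every
// case, so the trailing else (the ruffing path) can never run.
Branch classify_refactored(int /*lead_suit*/, int suit, int trump) {
  if (suit == trump) {
    return Branch::Pitch;
  } else if (suit != trump) {
    return Branch::Discard;
  } else {
    return Branch::Ruff;  // unreachable
  }
}

// Restored form: the first test is on the suit led, so a trump played on a
// side-suit lead falls through to the ruffing path.
Branch classify_restored(int lead_suit, int suit, int trump) {
  if (lead_suit == trump) {
    return Branch::Pitch;
  } else if (suit != trump) {
    return Branch::Discard;
  } else {
    return Branch::Ruff;
  }
}
```

With a side suit led (`lead_suit = 0`) and a trump played (`suit = trump = 1`), the refactored form never reports `Ruff`, while the restored form does.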

Per Copilot. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

The heuristic/quick-tricks refactor introduced static_cast<unsigned char> wrappers on values that v2.9 used as signed, changing search behavior:

- make_3 / make_3_ctx: winner[]/second_best[] .hand and .rank were cast to unsigned char, turning the -1 "no card" sentinel into 255. This broke winner[trump].hand == -1 style checks in QuickTricks, losing cutoffs.
- weight_alloc_trump_void2 / _void3: rel_rank[aggr[suit]][...] indexed through static_cast<unsigned char>(aggr[suit]), truncating the 13-bit aggregate holding to 8 bits and reading the wrong rel_rank row.
- QuickTricksPartnerHand{Trump,NT}: bit_map_rank index cast the signed rank through unsigned char.

With these reverted to v2.9's signed handling, the per-move-generation ordering trace now matches v2.9 exactly (0 divergences on list1), closing the residual calc gap to parity. Ordering/pruning-only change; double-dummy results are unchanged and all library tests pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
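
Both cast problems listed in this commit come down to ordinary narrowing arithmetic. A minimal sketch, assuming an 8-bit unsigned char; the helper names are hypothetical and only isolate the conversions:

```cpp
// Hypothetical helpers that isolate the two conversions described above.

// Storing a signed "no card" sentinel through unsigned char: -1 wraps to 255,
// so a later `== -1` comparison on the widened value no longer matches.
int hand_after_uchar_cast(int hand) {
  return static_cast<unsigned char>(hand);
}

// Indexing with a 13-bit holding mask through unsigned char: only the low
// 8 bits survive, so a different table row is selected.
unsigned row_after_uchar_cast(unsigned aggr) {
  return static_cast<unsigned char>(aggr);
}
```

For example, a holding mask of `0x1ABC` selects row `0xBC` after the cast, and any mask whose set bits all lie above bit 7 collapses to row 0.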

The parallel board loop handed boards out in index order via an atomic counter, so a hard board picked near the end left one worker running long while the others sat idle. Hand out the hardest boards first (longest-processing-time-first) so the tail consists of cheap boards.

parallel_all_boards_n gains an optional dispatch-order permutation: workers still pull from the same atomic counter, but the slot is mapped through the order before becoming a board number, so only the dispatch sequence changes and result placement is unaffected. The solve path passes no order and is unchanged.

calc estimates per-deal difficulty with a cheap, trump-independent structural proxy (deal_fanout, mirroring Scheduler::Fanout) and sorts board indices by descending difficulty before dispatch.

calc list1000 -n18: ~11.0s -> ~9.6s wall (~13%), user CPU unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
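
The slot-to-board mapping described in this commit can be sketched in isolation. This is not the PR's `parallel_all_boards_n`; `BoardDispatcher` is a hypothetical class that assumes workers call `next()` in a loop until it returns -1:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical dispatcher: one shared atomic slot counter, with an optional
// permutation applied to the slot before it becomes a board number.
class BoardDispatcher {
 public:
  // `order` may be null; it is ignored unless its size matches `count`.
  BoardDispatcher(int count, const std::vector<int>* order)
      : count_(count),
        order_(order != nullptr && static_cast<int>(order->size()) == count
                   ? order
                   : nullptr) {}

  // Safe to call from several threads. Returns the next board number,
  // or -1 once every slot has been handed out.
  int next() {
    const int slot = next_slot_.fetch_add(1, std::memory_order_relaxed);
    if (slot >= count_) return -1;
    return order_ != nullptr ? (*order_)[static_cast<std::size_t>(slot)] : slot;
  }

 private:
  int count_;
  const std::vector<int>* order_;
  std::atomic<int> next_slot_{0};
};
```

Because each worker writes its result at the board number rather than at the slot, changing the order only changes which board is started when, which is the property the commit message relies on.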

CalcDDtableN builds one board per strain for a single deal. deal_fanout is trump-independent, so all boards share one fanout and the difficulty sort is a pure no-op there. Gate the sort behind a difficulty_sort flag (default on for batch CalcAllTablesN) and disable it for the single-deal path.

Co-authored-by: Cursor <cursoragent@cursor.com>
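
A sketch of the gated hardest-first sort, showing why equal fanouts make it a no-op. `make_dispatch_order` is a hypothetical helper; the stable sort follows the review summary's wording, and an empty result is assumed here to mean "dispatch in index order":

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns board indices sorted by descending fanout,
// or an empty vector when sorting is disabled.
std::vector<int> make_dispatch_order(const std::vector<int>& fanout,
                                     bool difficulty_sort) {
  if (!difficulty_sort) return {};
  std::vector<int> order(fanout.size());
  std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
  // Stable sort keeps index order among boards with equal fanout.
  std::stable_sort(order.begin(), order.end(), [&fanout](int a, int b) {
    return fanout[static_cast<std::size_t>(a)] >
           fanout[static_cast<std::size_t>(b)];
  });
  return order;
}
```

When every board carries the same fanout, as in the single-deal case, the stable sort returns the identity permutation, so skipping it changes nothing but the setup cost.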

Copilot

Pull request overview

This PR aims to improve overall throughput for batch calculations by reducing “tail latency” in parallel workloads: it estimates deal difficulty cheaply, sorts boards hardest-first, and adds an optional dispatch-order mechanism to the parallel board runner.

Changes:

Extend parallel_all_boards_n() to optionally dispatch boards in a caller-provided order.
Add a cheap per-deal “fanout” estimate and use it to stable-sort batch calc boards hardest-first before parallel execution.
Simplify/remove several legacy static_cast<unsigned char>(...) conversions in solver/heuristic code paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
library/src/system/parallel_boards.hpp	Adds optional `order` parameter to control dispatch order (hardest-first, etc.).
library/src/system/parallel_boards.cpp	Implements ordered dispatch via slot→board mapping.
library/src/calc_tables.cpp	Computes per-deal difficulty estimate and dispatches hardest boards first for batch calc.
library/src/heuristic_sorting/heuristic_sorting.cpp	Cleans up heuristic code (including rel-rank indexing casts) and adjusts some void/trump logic.
library/src/quick_tricks.cpp	Removes redundant casts when indexing with `abs_rank[..].rank`.
library/src/ab_search.cpp	Removes redundant casts when copying `abs_rank` winner/second-best into `Pos`.

+  // Map a dispatch slot to the board number to process. With an order, hand out
+  // boards in that sequence (e.g. hardest first); otherwise in index order.
+  const bool use_order =
+    (order != nullptr && static_cast<int>(order->size()) == count);
+  auto board_of = [&](const int slot) -> int {
+    return use_order ? (*order)[static_cast<unsigned>(slot)] : slot;
+  };


Only honor the optional dispatch order when it is a valid permutation of [0, count): each element in range and unique. A malformed order (duplicates or out-of-range values) now falls back to index order, preventing invalid board indices from reaching process_board.

Co-authored-by: Cursor <cursoragent@cursor.com>
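
The permutation check this commit describes could look roughly like the following. It is a sketch under the stated rule (right size, every element in range, no repeats); `is_valid_permutation` is a hypothetical name, not necessarily what the PR uses:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical validator: true only if `order` is a permutation of [0, count).
bool is_valid_permutation(const std::vector<int>& order, int count) {
  if (count < 0 || order.size() != static_cast<std::size_t>(count)) {
    return false;
  }
  std::vector<bool> seen(order.size(), false);
  for (const int board : order) {
    if (board < 0 || board >= count) return false;  // out of range
    const std::size_t i = static_cast<std::size_t>(board);
    if (seen[i]) return false;                      // duplicate
    seen[i] = true;
  }
  return true;
}
```

A caller would fall back to index order whenever this returns false, so a bad order can slow the tail but cannot produce an invalid board index.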

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

tameware and others added 5 commits June 27, 2026 18:51

Fix incorrect comment

8ecebc8

Per Copilot. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tameware requested a review from Copilot June 29, 2026 02:40

Copilot started reviewing on behalf of tameware June 29, 2026 02:40 View session

Copilot AI reviewed Jun 29, 2026

tameware self-assigned this Jun 29, 2026

tameware requested a review from Copilot June 29, 2026 03:30

Copilot started reviewing on behalf of tameware June 29, 2026 03:30 View session

Copilot AI reviewed Jun 29, 2026

tameware requested a review from Copilot June 29, 2026 04:02

Copilot started reviewing on behalf of tameware June 29, 2026 04:02 View session

Copilot AI reviewed Jun 29, 2026

tameware wants to merge 6 commits into dds-bridge:develop from tameware:opus-improve-parallelism

2 participants
