[perf] Enable multi-thread serial for non-tensor values in MooncakeStore backend #111

Merged
0oshowero0 merged 2 commits into Ascend:main from 0oshowero0:mooncake
May 30, 2026

@0oshowero0 0oshowero0 commented May 30, 2026

  1. Enable two-tier multi-threading for batch operations.
    Introduce MAX_SERIAL_WORKER_THREADS (default: 4) to parallelize intra-batch serialization. Combined with MAX_BATCH_WORKER_THREADS (renamed from MAX_WORKER_THREADS), this creates nested parallelism: the outer thread pool dispatches batches concurrently, while the inner thread pool accelerates serialization within each individual batch via serial_utils.batch_encode_into.

  2. Streamline get dispatch by pre-partitioning inputs.
    Hoist index-based list comprehensions out of the batching loops. Keys, shapes, dtypes, and packed sizes are now pre-partitioned into separate tensor / non-tensor lists ahead of time, allowing direct list slicing during thread-pool dispatch and eliminating redundant per-batch reconstruction.
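
The two-tier scheme in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `encode_batch` is a hypothetical stand-in for `serial_utils.batch_encode_into`, `pickle` stands in for the real encoder, and the thread counts are chosen only for the example.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

# Names mirror the PR description; the value of the outer limit is an assumption.
MAX_BATCH_WORKER_THREADS = 2   # outer pool: one task per batch
MAX_SERIAL_WORKER_THREADS = 4  # inner pool: serialization within a batch


def encode_batch(values, num_workers):
    """Hypothetical stand-in for serial_utils.batch_encode_into."""
    with ThreadPoolExecutor(max_workers=num_workers) as inner:
        # map() preserves input order, so blobs line up with values.
        return list(inner.map(pickle.dumps, values))


def encode_all(values, batch_size):
    """Dispatch batches on the outer pool; each batch encodes on its own inner pool."""
    batches = [values[i:i + batch_size] for i in range(0, len(values), batch_size)]
    with ThreadPoolExecutor(max_workers=MAX_BATCH_WORKER_THREADS) as outer:
        futures = [
            outer.submit(encode_batch, batch, MAX_SERIAL_WORKER_THREADS)
            for batch in batches
        ]
        # Read results in submission order so output order matches input order.
        return [blob for future in futures for blob in future.result()]


payloads = encode_all([{"id": i} for i in range(10)], batch_size=4)
```

With this layout the peak number of serialization threads is bounded by the product of the two limits. How much the inner pool helps in practice likely depends on whether the encoder releases the GIL, which this sketch does not model.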

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Copilot AI review requested due to automatic review settings May 30, 2026 09:29
@ascend-robot

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Contributor

Copilot AI left a comment

Pull request overview

Enables nested multi-threading on the MooncakeStore client's non-tensor path by passing num_workers=MAX_SERIAL_WORKER_THREADS to serial_utils.batch_encode_into, and refactors get() to precompute per-category key/shape/dtype/packed_size lists once instead of re-deriving them per batch slice.
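
As a rough illustration of the `get()` refactor described above, the sketch below splits the per-key metadata once and then only slices the resulting lists per batch. All names here (`partition_inputs`, `batch_slices`, the `is_tensor` flags) are hypothetical, and which metadata belongs to which category is an assumption made for the example.

```python
def partition_inputs(keys, shapes, dtypes, packed_sizes, is_tensor):
    """Split per-key metadata into tensor and non-tensor lists in a single pass."""
    tensor = {"keys": [], "shapes": [], "dtypes": []}
    non_tensor = {"keys": [], "packed_sizes": []}
    rows = zip(keys, shapes, dtypes, packed_sizes, is_tensor)
    for key, shape, dtype, size, flag in rows:
        if flag:
            tensor["keys"].append(key)
            tensor["shapes"].append(shape)
            tensor["dtypes"].append(dtype)
        else:
            non_tensor["keys"].append(key)
            non_tensor["packed_sizes"].append(size)
    return tensor, non_tensor


def batch_slices(n, batch_size):
    """Slices covering range(n) in chunks of batch_size."""
    return [slice(i, i + batch_size) for i in range(0, n, batch_size)]


keys = ["a", "b", "c", "d", "e"]
shapes = [(2, 2), None, (3,), None, (1,)]
dtypes = ["float32", None, "int64", None, "float32"]
packed_sizes = [None, 17, None, 42, None]
is_tensor = [True, False, True, False, True]

tensor, non_tensor = partition_inputs(keys, shapes, dtypes, packed_sizes, is_tensor)

# Per-batch work is now plain slicing; no index-based comprehension per batch.
tensor_batches = [
    (tensor["keys"][s], tensor["shapes"][s], tensor["dtypes"][s])
    for s in batch_slices(len(tensor["keys"]), 2)
]
```

Each tuple in `tensor_batches` is what would be handed to one thread-pool task; the non-tensor lists can be sliced the same way.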

Changes:

  • Rename MAX_WORKER_THREADS to MAX_BATCH_WORKER_THREADS and add MAX_SERIAL_WORKER_THREADS for a second layer of serialization parallelism within each non-tensor batch.
  • Hoist tensor/non-tensor split (keys, shapes, dtypes, packed_sizes) out of the batch loops in get() to avoid repeated comprehensions and cast lookups.
  • Reorder put() to collect bytes-future results before waiting on tensor futures, and tweak three docstrings to say "receiver buffer".
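
The `put()` reordering in the last bullet could look something like the sketch below. The worker functions are hypothetical placeholders, and the motivation for the order is not stated in this thread; the example only shows the mechanics of reading one group of futures before the other.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def put_tensor(key):
    """Hypothetical tensor write; the sleep imitates a slower transfer."""
    time.sleep(0.05)
    return ("tensor", key)


def put_bytes(key):
    """Hypothetical write of an already serialized non-tensor value."""
    return ("bytes", key)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit everything first so both groups run concurrently.
    tensor_futures = [pool.submit(put_tensor, k) for k in ("t0", "t1")]
    bytes_futures = [pool.submit(put_bytes, k) for k in ("b0", "b1")]

    # Collect bytes results first, then wait on the tensor futures.
    bytes_results = [f.result() for f in bytes_futures]
    tensor_results = [f.result() for f in tensor_futures]
```

Because all tasks are submitted before any result is read, the order of the `result()` calls does not change what runs; it only changes which results, and which exceptions, the caller sees first.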

Comment thread transfer_queue/storage/clients/mooncake_client.py
Comment thread transfer_queue/storage/clients/mooncake_client.py Outdated
Comment thread transfer_queue/storage/clients/mooncake_client.py Outdated
Comment thread transfer_queue/storage/clients/mooncake_client.py Outdated
Comment thread transfer_queue/storage/clients/mooncake_client.py
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

@0oshowero0 0oshowero0 merged commit 4686763 into Ascend:main May 30, 2026
8 checks passed