compute: cache per-batch heap size in ArrangementSize operator #36977
Merged
The arrangement-size logging operator recomputed every live batch's heap size on every activation, walking each batch's backing regions repeatedly even when nothing changed. Batches are immutable once sealed, so this work is redundant. Cache the computed (size, capacity, allocations) tuple alongside the weak reference when a batch is first observed, and only sum the cached values on subsequent activations. This drops per-activation cost from O(batches * columns * regions) to O(new batches * regions) plus an O(batches) sum, eliminating the repeated region walk that made the operator slow for large arrangements.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Motivation
The arrangement-size logging operator (`log_arrangement_size`) is sometimes slow. On every activation it recomputed the heap size of every live batch by walking each batch's backing regions, even when nothing changed and the operator fired for unrelated reasons.
For large arrangements this repeated walk dominates the operator's running time.
Description
Batches are immutable once sealed, so their heap size never changes after creation.
The operator already keys its batch map on `Rc::as_ptr`, relying on stable pointer identity. This change caches the computed `(size, capacity, allocations)` tuple alongside the weak reference when a batch is first observed (via the input stream or `map_batches`), and only sums the cached values on subsequent activations. Dead batches are still dropped, now via `Weak::strong_count` instead of an `upgrade` that cloned the `Rc`.
Per-activation cost drops from `O(batches * columns * regions)` to `O(new batches * regions)` plus an `O(batches)` sum, eliminating the repeated region walk. Logged deltas are unchanged: because batches are immutable, the cached size equals what a re-walk would produce.
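For readers who want the shape of the caching scheme without opening the diff, here is a minimal, self-contained sketch. It is not the code from this PR: `Batch`, `heap_size`, `SizeCache`, and the `walks` counter are hypothetical stand-ins, and the real operator works on differential dataflow batches rather than plain vectors. It only illustrates the three ideas described above: key on `Rc::as_ptr`, compute the tuple once per batch, and prune with `Weak::strong_count`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::mem::size_of;
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a sealed, immutable batch.
struct Batch {
    regions: Vec<Vec<u64>>,
}

/// (size in bytes, capacity in bytes, number of allocations).
type HeapSize = (usize, usize, usize);

/// The "expensive" walk over a batch's backing regions.
fn heap_size(batch: &Batch) -> HeapSize {
    let (mut size, mut capacity, mut allocations) = (0, 0, 0);
    for region in &batch.regions {
        size += region.len() * size_of::<u64>();
        capacity += region.capacity() * size_of::<u64>();
        allocations += usize::from(region.capacity() > 0);
    }
    (size, capacity, allocations)
}

#[derive(Default)]
struct SizeCache {
    /// Keyed on pointer identity; the `Weak` keeps the allocation's
    /// address from being reused while the entry is still in the map.
    batches: HashMap<*const Batch, (Weak<Batch>, HeapSize)>,
    /// Counts region walks, for illustration only.
    walks: usize,
}

impl SizeCache {
    /// Record a batch; walk its regions only if it has not been seen before.
    fn observe(&mut self, batch: &Rc<Batch>) {
        if let Entry::Vacant(slot) = self.batches.entry(Rc::as_ptr(batch)) {
            self.walks += 1;
            slot.insert((Rc::downgrade(batch), heap_size(batch)));
        }
    }

    /// Drop dead batches without upgrading, then sum the cached tuples.
    fn total(&mut self) -> HeapSize {
        self.batches.retain(|_, (weak, _)| weak.strong_count() > 0);
        self.batches
            .values()
            .fold((0, 0, 0), |acc, (_, s)| (acc.0 + s.0, acc.1 + s.1, acc.2 + s.2))
    }
}
```

In this sketch, calling `observe` repeatedly on the same `Rc<Batch>` performs a single walk, and each later `total` call costs one pass over the map. The cached value can only be trusted because the batch never changes after it is sealed; a mutable batch would need invalidation.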
Verification
Behavior-preserving change covered by existing arrangement-size logging tests and the `mz_arrangement_sizes` introspection views. `cargo clippy -p mz-compute` and `cargo fmt` are clean.
🤖 Generated with Claude Code