compute: cache per-batch heap size in ArrangementSize operator by antiguru · Pull Request #36977 · MaterializeInc/materialize

antiguru · 2026-06-11T08:26:20Z

Motivation

The arrangement-size logging operator (log_arrangement_size) can be slow.
On every activation it recomputed the heap size of every live batch by walking each batch's backing regions, even when nothing had changed and the operator had fired for unrelated reasons.
For large arrangements, this repeated walk dominates the operator's cost.

Description

Batches are immutable once sealed, so their heap size never changes after creation.
The operator already keys its batch map on Rc::as_ptr, relying on stable pointer identity.
This change caches the computed (size, capacity, allocations) tuple alongside the weak reference when a batch is first observed (via the input stream or map_batches), and only sums the cached values on subsequent activations.
Dead batches are still dropped, but liveness is now checked with Weak::strong_count (a count of zero means the batch is gone) instead of upgrading the Weak, which cloned the Rc.
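A minimal sketch of the caching scheme described above. Batch, heap_size, Entry, and SizeCache are hypothetical stand-ins, not the types in the Materialize operator; a batch is modeled as columns of (len, capacity) regions. The region walk happens once, in observe, while total only prunes dead entries and sums the cached tuples.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a sealed, immutable batch: columns of regions,
/// each region described by (len, capacity) in bytes.
struct Batch {
    columns: Vec<Vec<(usize, usize)>>,
}

/// Hypothetical size walk over every region of every column.
/// Returns (size, capacity, allocations).
fn heap_size(batch: &Batch) -> (usize, usize, usize) {
    let mut acc = (0, 0, 0);
    for column in &batch.columns {
        for &(len, cap) in column {
            acc.0 += len;
            acc.1 += cap;
            acc.2 += 1;
        }
    }
    acc
}

/// Cache entry: a weak handle plus the size computed once, at first observation.
struct Entry {
    batch: Weak<Batch>,
    size: (usize, usize, usize),
}

/// Hypothetical cache keyed on pointer identity (`Rc::as_ptr`).
#[derive(Default)]
struct SizeCache {
    entries: HashMap<*const Batch, Entry>,
}

impl SizeCache {
    /// Called when a batch is first seen: walks its regions only if this
    /// pointer is not already cached.
    fn observe(&mut self, batch: &Rc<Batch>) {
        self.entries.entry(Rc::as_ptr(batch)).or_insert_with(|| Entry {
            batch: Rc::downgrade(batch),
            size: heap_size(batch),
        });
    }

    /// Called on every activation: O(batches), no region walk.
    fn total(&mut self) -> (usize, usize, usize) {
        // A zero strong count means the batch was dropped; no Rc clone needed.
        self.entries.retain(|_, e| Weak::strong_count(&e.batch) > 0);
        self.entries.values().fold((0, 0, 0), |acc, e| {
            (acc.0 + e.size.0, acc.1 + e.size.1, acc.2 + e.size.2)
        })
    }
}
```

Keying on Rc::as_ptr is sound in this sketch because a live Weak keeps the Rc's allocation from being freed, so a new batch cannot reuse the address while the old entry is still in the map.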

Per-activation cost drops from O(batches * columns * regions) to O(new batches * regions) plus an O(batches) sum, eliminating the repeated region walk.
Logged deltas are unchanged: because batches are immutable, the cached size equals what a re-walk would produce.
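To illustrate the per-activation cost and the logged deltas, here is a second hypothetical sketch. Operator, activate, and region_visits are illustrative names, and logging the difference between the new total and the previously logged total is an assumed model of how deltas are produced, not the operator's actual code. Regions are walked only for batches seen for the first time.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical immutable batch; each entry is one region's size in bytes.
struct Batch {
    regions: Vec<usize>,
}

/// Hypothetical operator state: cached size per live batch, the total
/// already logged, and a count of region visits for illustration.
struct Operator {
    cache: HashMap<*const Batch, (Weak<Batch>, isize)>,
    logged: isize,
    region_visits: usize,
}

impl Operator {
    fn new() -> Self {
        Operator { cache: HashMap::new(), logged: 0, region_visits: 0 }
    }

    /// One activation. `arrived` holds batches delivered since the last
    /// activation. Returns the size delta to log, or None if unchanged.
    fn activate(&mut self, arrived: &[Rc<Batch>]) -> Option<isize> {
        // O(new batches * regions): walk only batches not yet cached.
        for batch in arrived {
            let visits = &mut self.region_visits;
            self.cache.entry(Rc::as_ptr(batch)).or_insert_with(|| {
                *visits += batch.regions.len();
                let size: usize = batch.regions.iter().sum();
                (Rc::downgrade(batch), size as isize)
            });
        }
        // O(batches): prune dead batches and sum the cached sizes.
        self.cache.retain(|_, (weak, _)| Weak::strong_count(weak) > 0);
        let total: isize = self.cache.values().map(|(_, size)| *size).sum();
        let delta = total - self.logged;
        self.logged = total;
        (delta != 0).then_some(delta)
    }
}
```

In this model an activation with no new batches visits no regions and logs nothing, and dropping a batch yields a negative delta equal to its cached size. Because batches never change after sealing, that is the same value a fresh walk of the remaining batches would produce.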

Verification

Behavior-preserving change covered by existing arrangement-size logging tests and the mz_arrangement_sizes introspection views.
cargo clippy -p mz-compute and cargo fmt are clean.

🤖 Generated with Claude Code

The arrangement-size logging operator recomputed every live batch's heap size on every activation, walking each batch's backing regions repeatedly even when nothing changed. Batches are immutable once sealed, so this work is redundant. Cache the computed (size, capacity, allocations) tuple alongside the weak reference when a batch is first observed, and only sum the cached values on subsequent activations. This drops per-activation cost from O(batches * columns * regions) to O(new batches * regions) plus an O(batches) sum, eliminating the repeated region walk that made the operator slow for large arrangements.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

petrosagg

nice!

antiguru marked this pull request as ready for review June 11, 2026 08:28

antiguru requested a review from a team as a code owner June 11, 2026 08:28

antiguru requested review from DAlperin and petrosagg June 11, 2026 08:28

petrosagg approved these changes Jun 11, 2026

antiguru merged commit 0fe50dd into MaterializeInc:main Jun 11, 2026
117 checks passed

antiguru deleted the arrangement-size-cache branch June 11, 2026 09:34

antiguru merged 1 commit into MaterializeInc:main from antiguru:arrangement-size-cache

2 participants
