perf(omezarr): open the z5 dataset once, not per tile #361
Open
sameeul wants to merge 1 commit into
Both OME-Zarr loaders called z5::openDataset(*zarr_ptr_, ds_name_) inside loadTile, i.e. on every tile read. openDataset re-stats and re-parses the dataset's .zarray metadata (shape, chunks, dtype, compressor) from disk each time. On a whole-slide image that is thousands of redundant metadata parses of an immutable dataset. Open the dataset once in the constructor, cache the returned std::unique_ptr<z5::Dataset>, and read every tile through the cached handle. readSubarray takes a const Dataset&, so sharing one handle across all (including multithreaded) tile reads is safe. No behavior change: verified both readers still return exact pixel values and checksums, including the multi-tile / partial-tile paths that now exercise handle reuse across tiles.
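The difference between the two patterns can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual diff: FakeDataset, openFakeDataset, readTile, PerTileLoader, and CachedLoader are hypothetical stand-ins for z5::Dataset, z5::openDataset, the subarray read, and the two loaders, and the global open counter exists only to make the number of metadata parses visible.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for z5::Dataset (the real class holds parsed .zarray metadata).
struct FakeDataset {
    std::string name;
};

// Counts how often the dataset was opened, i.e. how often metadata would be re-parsed.
int g_open_count = 0;

// Hypothetical stand-in for z5::openDataset.
std::unique_ptr<FakeDataset> openFakeDataset(const std::string& name) {
    ++g_open_count;
    auto ds = std::make_unique<FakeDataset>();
    ds->name = name;
    return ds;
}

// Stand-in for a read that only needs a const reference to the dataset.
std::size_t readTile(const FakeDataset& ds, std::size_t tile_index) {
    return ds.name.size() + tile_index;
}

// Before: the dataset is reopened inside every loadTile call.
class PerTileLoader {
public:
    explicit PerTileLoader(std::string ds_name) : ds_name_(std::move(ds_name)) {}

    std::size_t loadTile(std::size_t tile_index) const {
        auto ds = openFakeDataset(ds_name_);  // one open per tile
        return readTile(*ds, tile_index);
    }

private:
    std::string ds_name_;
};

// After: the dataset is opened once in the constructor and the handle is cached.
class CachedLoader {
public:
    explicit CachedLoader(const std::string& ds_name)
        : dataset_(openFakeDataset(ds_name)) {}

    std::size_t loadTile(std::size_t tile_index) const {
        assert(dataset_ != nullptr);  // handle was set in the constructor
        return readTile(*dataset_, tile_index);
    }

private:
    std::unique_ptr<FakeDataset> dataset_;
};
```

In this sketch, loading N tiles through PerTileLoader increments the counter N times, while CachedLoader increments it once regardless of N; both return the same value for the same tile index.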
darkclad
approved these changes
Jul 2, 2026
Summary
Follow-up to the z5 3.0.1 upgrade (#360). Both OME-Zarr loaders opened the dataset on every tile read, calling z5::openDataset(*zarr_ptr_, ds_name_) inside loadTile.
z5::openDataset re-stats and re-parses the dataset's .zarray metadata (shape, chunks, dtype, compressor) from disk each time. On a whole-slide image that's thousands of redundant parses of an immutable dataset.
This opens the dataset once in the constructor, caches the returned std::unique_ptr<z5::Dataset>, and reads every tile through the cached handle.
Why it's safe
readSubarray takes a const Dataset&, so a single cached handle is safe to share across all tile reads, including the multithreaded tile-loader path (reads don't mutate the handle).
Testing
Verified both readers (NyxusOmeZarrLoader and RawOmezarrLoader) still return exact pixel values and full-image checksums against the committed tests/data/omezarr datasets, including the multi-tile / partial-tile cases. Those now read all tiles through the single cached handle, which exercises the reuse path directly. The existing test_omezarr.h GTest suite covers this; all assertions pass.
Scope
Two files, +23/-9, no behavior change. This is item 1 from the #360 review follow-ups (the highest-value, zero-risk one). The remaining follow-ups (8-bit/float16 dtype-string matching, float→uint32 truncation, added dtype test fixtures) change what inputs the reader accepts and will come as a separate PR with their own test data.
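The multi-tile / partial-tile check described under Testing can be illustrated with a small sketch. It is not taken from test_omezarr.h: FakeImage, makeRamp, tileSum, tiledChecksum, and fullChecksum are hypothetical names, and an in-memory image stands in for the opened dataset. The point is only that summing every tile through one shared const handle, with edge tiles clipped to the image bounds, should reproduce the full-image checksum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for an opened dataset: a row-major 2D image.
struct FakeImage {
    std::size_t height;
    std::size_t width;
    std::vector<std::uint32_t> pixels;
};

// Builds an image whose pixel at (row, col) equals row * width + col.
FakeImage makeRamp(std::size_t height, std::size_t width) {
    FakeImage img{height, width, std::vector<std::uint32_t>(height * width)};
    for (std::size_t i = 0; i < img.pixels.size(); ++i) {
        img.pixels[i] = static_cast<std::uint32_t>(i);
    }
    return img;
}

// Sums one tile; tiles on the right or bottom edge are clipped (partial tiles).
std::uint64_t tileSum(const FakeImage& img, std::size_t tile_row,
                      std::size_t tile_col, std::size_t tile_size) {
    const std::size_t r0 = tile_row * tile_size;
    const std::size_t c0 = tile_col * tile_size;
    const std::size_t r1 = std::min(r0 + tile_size, img.height);
    const std::size_t c1 = std::min(c0 + tile_size, img.width);
    std::uint64_t sum = 0;
    for (std::size_t r = r0; r < r1; ++r) {
        for (std::size_t c = c0; c < c1; ++c) {
            sum += img.pixels[r * img.width + c];
        }
    }
    return sum;
}

// Checksum computed tile by tile, every read going through the same const handle.
std::uint64_t tiledChecksum(const FakeImage& img, std::size_t tile_size) {
    assert(tile_size > 0);
    const std::size_t tiles_down = (img.height + tile_size - 1) / tile_size;
    const std::size_t tiles_across = (img.width + tile_size - 1) / tile_size;
    std::uint64_t sum = 0;
    for (std::size_t tr = 0; tr < tiles_down; ++tr) {
        for (std::size_t tc = 0; tc < tiles_across; ++tc) {
            sum += tileSum(img, tr, tc, tile_size);
        }
    }
    return sum;
}

// Reference checksum over the whole image in one pass.
std::uint64_t fullChecksum(const FakeImage& img) {
    std::uint64_t sum = 0;
    for (std::uint32_t p : img.pixels) {
        sum += p;
    }
    return sum;
}
```

With a 5x7 image and a tile size of 2, the last row and last column of tiles are partial, so the comparison between tiledChecksum and fullChecksum covers the clipped-tile path as well as the full-tile one.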
🤖 Generated with Claude Code