Eliminate data.to_vec() waste copy and release the GIL during compress/hash #45

@27Bslash6

Summary

Two avoidable costs in ByteStorage:

  1. data.to_vec() eagerly copies the borrowed &[u8] into an owned Vec<u8>, even though LZ4 and xxh3 both accept &[u8] and the Vec is never retained. This is a full-payload memcpy (up to 512 MB) before compression even starts.
  2. GIL held throughout: store()/retrieve() never call py.allow_threads, so LZ4-compressing or hashing a large payload blocks all Python threads for the full duration.
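
As a rough illustration of point 1, the sketch below contrasts a copying and a borrowing store path using only the standard library. The names `checksum`, `compress`, `store_copying`, and `store_borrowing` are hypothetical and not taken from cachekit-core; a toy run-length encoder and FNV-1a stand in for LZ4 and xxh3, on the assumption that the real functions also only need a `&[u8]`.

```rust
// Stand-in for xxh3 (assumption): FNV-1a over a borrowed slice.
fn checksum(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

// Stand-in for LZ4 (assumption): naive run-length encoding as (count, byte) pairs.
fn compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1usize;
        while i + run < data.len() && data[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

// Copying variant: mirrors the data.to_vec() pattern described above.
fn store_copying(data: &[u8]) -> (Vec<u8>, u64) {
    let owned: Vec<u8> = data.to_vec(); // full-payload copy, dropped on return
    (compress(&owned), checksum(&owned))
}

// Borrowing variant: both steps read the caller's slice directly.
fn store_borrowing(data: &[u8]) -> (Vec<u8>, u64) {
    (compress(data), checksum(data))
}
```

Both variants return the same result for any input; the borrowing one simply never allocates a second copy of the payload.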

Evidence

  • cachekit-core/src/byte_storage.rs:184: StorageEnvelope::new takes Vec<u8> by value
  • rust/src/python_bindings.rs:40 — no py.allow_threads around the pure-Rust core
  • Also: rmp_serde::to_vec(&envelope) at byte_storage.rs:193 re-copies compressed_data into the envelope (secondary)

Impact

Directly feeds peak RSS on the large-object write path (9.4x write motivation) and serializes large cache writes against all other Python threads.

Fix

Take &[u8] (avoid the to_vec); wrap the compress/hash core in py.allow_threads. Note: the py crate pulls cachekit-core from crates.io (rust/Cargo.toml:23) — confirm published version matches before shipping.
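
The shape of the GIL half of the fix can be sketched without PyO3 by modelling the GIL as a plain `Mutex`. Everything here is a hypothetical stand-in: `heavy_core` represents the pure-Rust compress/hash core, and the mutex-guarded counter represents interpreter state. In the real binding the release would be done with py.allow_threads, and the closure passed to it must not touch Python objects.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the compress/hash core (assumption): pure Rust, no interpreter state.
fn heavy_core(data: &[u8]) -> u64 {
    data.iter()
        .fold(0xcbf29ce484222325u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}

// Current behaviour in the model: the lock is held for the whole call,
// so every other thread waits while the payload is processed.
fn store_holding(gil: &Mutex<u64>, data: &[u8]) -> u64 {
    let mut calls = gil.lock().unwrap();
    *calls += 1;
    heavy_core(data)
}

// Proposed behaviour in the model: touch shared state briefly,
// drop the guard, then run the core without holding the lock.
fn store_releasing(gil: &Mutex<u64>, data: &[u8]) -> u64 {
    {
        let mut calls = gil.lock().unwrap();
        *calls += 1;
    } // guard dropped here, analogous to entering py.allow_threads
    heavy_core(data)
}

// Runs n concurrent stores over the same borrowed payload.
fn run_concurrently(gil: Arc<Mutex<u64>>, payload: Arc<Vec<u8>>, n: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let gil = Arc::clone(&gil);
            let payload = Arc::clone(&payload);
            thread::spawn(move || store_releasing(&gil, &payload))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With `store_releasing`, the threads only contend for the short counter update and the core runs in parallel; swapping in `store_holding` gives the same results but forces the cores to run one at a time.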

Metadata

Labels: enhancement, rust
