perf(profiling): reduce profiler arena memory footprint #2048

Draft
taegyunkim wants to merge 4 commits into taegyunkim/prof-14423-prof-dictinary-bench from taegyunkim/profiles-dictionary-memory-footprint

Conversation

@taegyunkim

@taegyunkim taegyunkim commented May 27, 2026

Contributor

What does this PR do?

Reduces the profiler arena memory floor while preserving larger-workload behavior by making ChainAllocator grow geometrically up to a cap.

This PR is stacked on top of #2088, which adds a ProfilesDictionary Criterion benchmark so this change can be compared by the GitLab benchmark job.

Changes:

  • Adds capped geometric growth to ChainAllocator.
  • Adds ChainAllocator::new_capped_in(initial, max, allocator) for callers that want a smaller initial chunk that can still grow up to a maximum chunk size (here, the historical size).
  • Lowers profiler dictionary arena initial chunks from 1 MiB to 64 KiB, capped at the historical 1 MiB chunk size.
  • Lowers per-profile StringTable initial chunks from 4 MiB to 512 KiB, capped at the historical 4 MiB chunk size.
  • Keeps ParallelStringSet / ParallelSliceSet at 16 shards.

Motivation

Python profiler memory analysis showed that common profiles keep only tens to hundreds of KiB of dictionary/string-table content, but libdatadog reserved much larger arena chunks up front. This created a high per-process memory floor, especially across forked workers.

The smaller initial chunks reduce that floor. Geometric growth avoids keeping large/high-cardinality services on tiny chunks indefinitely, so they ramp back to the previous chunk sizes after a few growth events.

Shard count

This PR keeps the existing 16-shard default.

I originally explored reducing ParallelStringSet / ParallelSliceSet from 16 shards to 4, but dropped that from this PR. The extra memory saved by reducing shards after the arena-size change is relatively small (12 * 64 KiB = 768 KiB for the string set), while 16 shards preserve better concurrent insertion headroom.

Consumer concurrency summary:

  • ddprof appears effectively single-writer for dictionary insertion: the worker thread interns while processing events, while export serializes an inactive profile buffer.
  • dd-trace-py can have dictionary insertion concurrency today: one native stack sampler thread can intern off-GIL while one Python/Cython collector path is active under the GIL.
  • Future free-threaded/no-GIL Python could increase producer concurrency.

So this PR focuses on the main memory win: smaller initial arenas with capped growth, without reducing shard count.

Additional Notes

Expected growth patterns:

  • Dictionary arenas: 64 KiB -> 128 KiB -> 256 KiB -> 512 KiB -> 1 MiB -> ...
  • Per-profile StringTable: 512 KiB -> 1 MiB -> 2 MiB -> 4 MiB -> ...

Oversized individual allocations still allocate chunks large enough for the request, even if larger than the routine growth cap.
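
The growth patterns above can be reproduced with a short sketch. This is a simplified model under stated assumptions, not the ChainAllocator code: `next_routine_size`, `chunk_size_for`, and `growth_sequence` are hypothetical names, and the model ignores any per-chunk header or page rounding the real allocator may apply.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Hypothetical growth policy: double the routine chunk size, never past `max`.
fn next_routine_size(current: usize, max: usize) -> usize {
    current.saturating_mul(2).min(max)
}

/// Size of the chunk allocated for one request: the routine size, unless the
/// request itself is larger (the "oversized allocation" case).
fn chunk_size_for(request: usize, routine: usize) -> usize {
    request.max(routine)
}

/// Routine chunk sizes of the first `n` chunks in the chain.
fn growth_sequence(initial: usize, max: usize, n: usize) -> Vec<usize> {
    let mut sizes = Vec::with_capacity(n);
    let mut size = initial.min(max);
    for _ in 0..n {
        sizes.push(size);
        size = next_routine_size(size, max);
    }
    sizes
}

fn main() {
    // Dictionary arenas: 64 KiB initial, 1 MiB cap.
    let dict: Vec<usize> = growth_sequence(64 * KIB, MIB, 6)
        .iter()
        .map(|s| s / KIB)
        .collect();
    println!("dictionary arena chunks (KiB): {:?}", dict);

    // Per-profile StringTable: 512 KiB initial, 4 MiB cap.
    let table: Vec<usize> = growth_sequence(512 * KIB, 4 * MIB, 5)
        .iter()
        .map(|s| s / KIB)
        .collect();
    println!("string table chunks (KiB): {:?}", table);

    // A single 3 MiB request against a 1 MiB routine size still gets 3 MiB.
    println!("oversized chunk (KiB): {}", chunk_size_for(3 * MIB, MIB) / KIB);
}
```

In this model the cap only bounds routine growth; an individual request larger than the cap is sized by the request itself.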

Approximate initial arena floor after this change:

  • ParallelStringSet: 16 * 64 KiB = 1 MiB instead of 16 * 1 MiB = 16 MiB.
  • FunctionSet: 4 * 64 KiB = 256 KiB instead of 4 * 1 MiB = 4 MiB.
  • MappingSet: 2 * 64 KiB = 128 KiB instead of 2 * 1 MiB = 2 MiB.
  • Per-profile StringTable: 512 KiB instead of 4 MiB.
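
The floor figures are products of arena count and initial chunk size. The sketch below redoes that arithmetic; `initial_floor` is a hypothetical helper, and the counts (16, 4, 2, 1) are taken from the list above rather than read from the code.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Bytes reserved up front when each of `arenas` reserves one initial chunk.
const fn initial_floor(arenas: usize, initial_chunk: usize) -> usize {
    arenas * initial_chunk
}

fn main() {
    // (name, arena count, old initial chunk, new initial chunk)
    let rows = [
        ("ParallelStringSet", 16, MIB, 64 * KIB),
        ("FunctionSet", 4, MIB, 64 * KIB),
        ("MappingSet", 2, MIB, 64 * KIB),
        ("StringTable (per profile)", 1, 4 * MIB, 512 * KIB),
    ];
    for &(name, count, old, new) in rows.iter() {
        println!(
            "{}: {} KiB now, {} KiB before",
            name,
            initial_floor(count, new) / KIB,
            initial_floor(count, old) / KIB
        );
    }

    // Extra saving from dropping 16 shards to 4 after the arena-size change,
    // as discussed in the "Shard count" section: 12 * 64 KiB.
    println!("shard reduction would save {} KiB", initial_floor(12, 64 * KIB) / KIB);
}
```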

How to test the change?

Ran:

cargo +nightly-2026-02-08 fmt --all -- --check
cargo check -p libdd-alloc -p libdd-profiling
cargo check -p libdd-profiling --benches
cargo +stable clippy -p libdd-alloc -p libdd-profiling --all-targets --all-features -- -D warnings
cargo nextest run -p libdd-alloc -p libdd-profiling
cargo test --doc -p libdd-alloc -p libdd-profiling

PROF-14423

@github-actions

github-actions Bot commented May 27, 2026

Contributor

📚 Documentation Check Results

⚠️ 653 documentation warning(s) found

📦 libdd-alloc - 3 warning(s)

📦 libdd-profiling - 650 warning(s)


Updated: 2026-06-10 18:35:04 UTC | Commit: 193ac93 | missing-docs job results

@github-actions

github-actions Bot commented May 27, 2026

Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

Summary by Rule

| Rule | Base Branch | PR Branch | Change |
|---|---|---|---|
| expect_used | 1 | 1 | No change (0%) |
| Total | 1 | 1 | No change (0%) |

Annotation Counts by File

| File | Base Branch | PR Branch | Change |
|---|---|---|---|
| libdd-profiling/src/collections/string_table/mod.rs | 1 | 1 | No change (0%) |

Annotation Stats by Crate

| Crate | Base Branch | PR Branch | Change |
|---|---|---|---|
| clippy-annotation-reporter | 5 | 5 | No change (0%) |
| datadog-ffe-ffi | 1 | 1 | No change (0%) |
| datadog-ipc | 21 | 21 | No change (0%) |
| datadog-live-debugger | 4 | 4 | No change (0%) |
| datadog-live-debugger-ffi | 10 | 10 | No change (0%) |
| datadog-profiling-replayer | 4 | 4 | No change (0%) |
| datadog-sidecar | 46 | 46 | No change (0%) |
| libdd-common | 13 | 13 | No change (0%) |
| libdd-common-ffi | 12 | 12 | No change (0%) |
| libdd-data-pipeline | 5 | 5 | No change (0%) |
| libdd-ddsketch | 2 | 2 | No change (0%) |
| libdd-dogstatsd-client | 1 | 1 | No change (0%) |
| libdd-profiling | 13 | 13 | No change (0%) |
| libdd-remote-config | 4 | 4 | No change (0%) |
| libdd-telemetry | 20 | 20 | No change (0%) |
| libdd-tinybytes | 4 | 4 | No change (0%) |
| libdd-trace-normalization | 2 | 2 | No change (0%) |
| libdd-trace-obfuscation | 3 | 3 | No change (0%) |
| libdd-trace-stats | 1 | 1 | No change (0%) |
| libdd-trace-utils | 11 | 11 | No change (0%) |
| Total | 182 | 182 | No change (0%) |

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions

github-actions Bot commented May 27, 2026

Contributor

🔒 Cargo Deny Results

⚠️ 6 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-alloc - ✅ No issues

📦 libdd-profiling - 6 error(s)

error[vulnerability]: NSEC3 closest-encloser proof validation enters unbounded loop on cross-zone responses
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:79:1
   │
79 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0118
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0118
   ├ The NSEC3 closest-encloser proof validation in `hickory-proto`'s
     `DnssecDnsHandle` walks from the QNAME up to the SOA owner name, building a
     list of candidate encloser names. The iterator used assumes the
     QNAME is a descendant of the SOA owner, terminating only when the current
     candidate equals the SOA name. When the SOA in a response's authority section
     is not an ancestor of the QNAME, the loop stalls at the DNS root and never
     terminates, repeatedly calling `Name::base_name()` and pushing newly allocated
     `Name` and hashed-name entries into the candidate `Vec`.
     
     The bug is reachable by any caller of `DnssecDnsHandle` — including the
     resolver, recursor, and client — when built with the `dnssec-ring` or
     `dnssec-aws-lc-rs` feature and configured to perform DNSSEC validation. It is
     triggered while validating a NoData or NXDomain response whose authority
     section contains an SOA record from a zone other than an ancestor of the
     QNAME, on a code path that requires NSEC3 closest-encloser proof. In practice
     this can be reached through an insecure CNAME chain that crosses zone
     boundaries into a DNSSEC-signed zone returning NoData, but the minimum
     condition is just a mismatched SOA owner on a response requiring NSEC3
     validation.
     
     A `debug_assert_ne!(name, Name::root())` guards the loop body, so debug builds
     abort with a panic on the first iteration past the root. Release builds
     compile the assertion out and run the loop unbounded, allocating until the
     process exhausts available memory (OOM). A reachable upstream attacker who
     can return such a response can therefore crash a debug-built validator or
     exhaust memory on a release-built one.
     
     The affected code was migrated from `hickory-proto` to `hickory-net` as part of
     the 0.26.0 release. The `hickory-proto` 0.26.x release no longer offers
     `DnssecDnsHandle` and so we recommend all affected users update to `hickory-net`
     0.26.1 when the implementation of that type is required.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-3v94-mw7p-v465
   ├ Solution: No safe upgrade is available!
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.2.0
             │   └── libdd-profiling v1.0.0
             │       └── (dev) libdd-profiling v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[vulnerability]: CPU exhaustion during message encoding due to O(n²) name compression
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:79:1
   │
79 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0119
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0119
   ├ During message encoding, `hickory-proto`'s `BinEncoder` stores pointers to
     labels that are candidates for name compression in a `Vec<(usize, Vec<u8>)>`.
     The name compression logic then searches for matches with a linear scan.
     
     A malicious message with many records can both introduce many candidate labels,
     and invoke this linear scan many times. This can amplify CPU exhaustion in DoS
     attacks.
     
     This is similar to
     [CVE-2024-8508](https://www.nlnetlabs.nl/downloads/unbound/CVE-2024-8508.txt).
     
     We recommend all affected users update to `hickory-proto` 0.26.1 for the fix.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-q2qq-hmj6-3wpp
   ├ Solution: Upgrade to >=0.26.1 (try `cargo update -p hickory-proto`)
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.2.0
             │   └── libdd-profiling v1.0.0
             │       └── (dev) libdd-profiling v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:157:1
    │
157 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── libdd-common v4.2.0
      │   └── libdd-profiling v1.0.0
      │       └── (dev) libdd-profiling v1.0.0 (*)
      ├── libdd-profiling v1.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-profiling v1.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-06-10 18:36:46 UTC | Commit: 193ac93 | dependency-check job results

@codecov-commenter

codecov-commenter commented May 27, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.57%. Comparing base (477834f) to head (6726636).

Additional details and impacted files
@@                              Coverage Diff                               @@
##           taegyunkim/prof-14423-prof-dictinary-bench    #2048      +/-   ##
==============================================================================
+ Coverage                                       73.53%   73.57%   +0.04%     
==============================================================================
  Files                                             475      475              
  Lines                                           79007    79200     +193     
==============================================================================
+ Hits                                            58095    58270     +175     
- Misses                                          20912    20930      +18     
| Components | Coverage Δ |
|---|---|
| libdd-crashtracker | 65.34% <ø> (+0.01%) ⬆️ |
| libdd-crashtracker-ffi | 37.68% <ø> (ø) |
| libdd-agent-client | 83.79% <ø> (ø) |
| libdd-alloc | 99.10% <100.00%> (+0.32%) ⬆️ |
| libdd-data-pipeline | 86.25% <ø> (-0.02%) ⬇️ |
| libdd-data-pipeline-ffi | 73.86% <ø> (ø) |
| libdd-common | 79.93% <ø> (ø) |
| libdd-common-ffi | 74.41% <ø> (ø) |
| libdd-telemetry | 73.37% <ø> (+0.02%) ⬆️ |
| libdd-telemetry-ffi | 31.36% <ø> (ø) |
| libdd-dogstatsd-client | 82.64% <ø> (ø) |
| datadog-ipc | 74.90% <ø> (-1.47%) ⬇️ |
| libdd-profiling | 81.87% <100.00%> (+0.18%) ⬆️ |
| libdd-profiling-ffi | 64.79% <ø> (ø) |
| libdd-sampling | 97.48% <ø> (ø) |
| datadog-sidecar | 36.51% <ø> (ø) |
| datdog-sidecar-ffi | 12.23% <ø> (ø) |
| spawn-worker | 48.86% <ø> (ø) |
| libdd-tinybytes | 93.80% <ø> (ø) |
| libdd-trace-normalization | 81.71% <ø> (ø) |
| libdd-trace-obfuscation | 87.30% <ø> (ø) |
| libdd-trace-protobuf | 68.25% <ø> (ø) |
| libdd-trace-utils | 89.32% <ø> (ø) |
| libdd-tracer-flare | 86.57% <ø> (ø) |
| libdd-log | 74.83% <ø> (ø) |

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented May 27, 2026


Pipelines  Tests

⚠️ Warnings

🚦 1 Pipeline job failed

Required checks pass | allchecks   View in Datadog   GitHub Actions

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 73.61% (+0.09%)

🔗 Commit SHA: b8f8c6d

@dd-octo-sts

dd-octo-sts Bot commented May 27, 2026

Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.70 MB | 7.70 MB | 0% (0 B) 👌 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 83.68 MB | 83.68 MB | -0% (-1.79 KB) 👌 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.34 MB | 10.34 MB | -0% (-8 B) 👌 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 94.78 MB | 94.78 MB | -0% (-1.65 KB) 👌 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 24.83 MB | 24.83 MB | +0% (+1.00 KB) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 86.89 KB | 86.89 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 180.83 MB | 180.82 MB | -0% (-16.00 KB) 👌 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 925.68 MB | 925.68 MB | +0% (+5.18 KB) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 8.09 MB | 8.09 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 86.89 KB | 86.89 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 23.93 MB | 23.93 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 47.78 MB | 47.78 MB | +0% (+464 B) 👌 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.52 MB | 21.52 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 88.26 KB | 88.26 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 184.90 MB | 184.89 MB | -0% (-8.00 KB) 👌 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 918.35 MB | 918.36 MB | +0% (+5.24 KB) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 6.24 MB | 6.24 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 88.26 KB | 88.26 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 25.66 MB | 25.66 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 45.41 MB | 45.41 MB | +0% (+500 B) 👌 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 74.60 MB | 74.60 MB | -0% (-48 B) 👌 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.58 MB | 8.58 MB | 0% (0 B) 👌 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 90.02 MB | 90.02 MB | +0% (+144 B) 👌 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.44 MB | 10.44 MB | 0% (0 B) 👌 |

@taegyunkim taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch 2 times, most recently from 3da10e3 to 477c1f4 Compare May 27, 2026 20:47
@ivoanjo

ivoanjo commented May 28, 2026

Member

Note that historically the tension here was between fragmentation and memory use -- that's why we set the higher defaults. (See for instance https://docs.google.com/document/d/1g_H7G9s_H9yoxlpyw_B0aoUyIVmo0ZQBzQkp5EUUyX8/edit?tab=t.0 )

This is not to say that we can't or shouldn't adjust these numbers; it's more to add context on why larger numbers were chosen rather than starting with the smallest possible and just letting it grow.

@taegyunkim

Contributor Author

> Note that historically the tension here was between fragmentation and memory use -- that's why we set the higher defaults. (See for instance https://docs.google.com/document/d/1g_H7G9s_H9yoxlpyw_B0aoUyIVmo0ZQBzQkp5EUUyX8/edit?tab=t.0 )
>
> This is not to say that we can't or shouldn't adjust these numbers; it's more to add context on why larger numbers were chosen rather than starting with the smallest possible and just letting it grow.

@ivoanjo Thanks for the context! That makes sense, and this is why this PR uses capped geometric growth.

A couple of differences make this less risky than the story from your report:

  • These profiler dictionary and per-profile string-table arenas use ChainAllocator<VirtualAllocator>, so on Unix they allocate via mmap, not glibc malloc.
  • The chunks are arena-owned and long-lived. We're not creating malloc/free churn interleaved with runtime allocations.
  • Larger workloads converge back to the historical chunk sizes.

So this keeps the lower memory floor for small/common profiles, while avoiding the "smallest possible and just keep growing tiny chunks" behavior.

I agree we should validate this with real workloads, especially Ruby if we're worried about fragmentation.

@ivoanjo

ivoanjo commented Jun 5, 2026

Member

Ahh that's great, thanks for the extra context. In particular, I missed the detail where these come from mmap directly -- in that case I indeed expect the likelihood of fragmentation is way way lower (e.g. address space fragmentation could be possible but... I've not heard of it happening very commonly so hopefully the kernel/glibc do a good job there?).

Excited to see the improvements from this one :D

@taegyunkim

Contributor Author

@ivoanjo the DoE results look very good for Python with this change

For all three archetypes, we see reductions in heap live size, heap live samples, allocated memory, and allocations, with no change in CPU time.

Enterprise

[image: Screenshot 2026-06-05 at 9 54 57 AM]

Latency

[image: Screenshot 2026-06-05 at 9 55 15 AM]

Throughput

[image: Screenshot 2026-06-05 at 9 54 48 AM]

@taegyunkim taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch from 45a451c to 4686b93 Compare June 5, 2026 14:07
@taegyunkim taegyunkim changed the base branch from main to taegyunkim/prof-14423-prof-dictinary-bench June 5, 2026 14:07
@taegyunkim taegyunkim force-pushed the taegyunkim/prof-14423-prof-dictinary-bench branch from 2e23e15 to 477834f Compare June 5, 2026 20:26
@taegyunkim taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch from 65a7aff to 6726636 Compare June 5, 2026 20:46
Comment thread libdd-profiling/src/collections/string_table/mod.rs Outdated
Comment thread libdd-alloc/src/chain.rs
  /// doesn't have enough space for the requested allocation, and then links the
  /// new [LinearAllocator] to the previous one, creating a chain. This is where
- /// its name comes from.
+ /// its name comes from. Each successful growth doubles the target chunk size

Contributor

have we experimented with other factors? e.g. 1.5x would still grow geometrically, but not as fast.
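
To make the trade-off in this question concrete, the sketch below counts how many growth events each factor needs before a chunk reaches the cap. It is only an illustration of the arithmetic, not a result from the PR; `growth_steps` is a hypothetical helper and the factor is passed as a ratio (`num / den`) to stay in integer math.

```rust
/// Number of growth events needed for `initial` to reach `max` when each
/// event multiplies the size by `num / den` and clamps to `max`.
fn growth_steps(initial: usize, max: usize, num: usize, den: usize) -> usize {
    assert!(initial > 0 && den > 0 && num > den, "factor must be > 1");
    let mut size = initial;
    let mut steps = 0;
    while size < max {
        // Always grow by at least one byte so tiny sizes cannot stall.
        let next = (size.saturating_mul(num) / den).max(size + 1);
        size = next.min(max);
        steps += 1;
    }
    steps
}

fn main() {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * KIB;

    // Dictionary arenas: 64 KiB -> 1 MiB.
    println!("dict, 2x:   {} steps", growth_steps(64 * KIB, MIB, 2, 1));
    println!("dict, 1.5x: {} steps", growth_steps(64 * KIB, MIB, 3, 2));

    // Per-profile StringTable: 512 KiB -> 4 MiB.
    println!("table, 2x:   {} steps", growth_steps(512 * KIB, 4 * MIB, 2, 1));
    println!("table, 1.5x: {} steps", growth_steps(512 * KIB, 4 * MIB, 3, 2));
}
```

A smaller factor means more, smaller chunks before the cap is reached; whether that is preferable depends on measurements the thread does not include.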

Comment thread libdd-alloc/src/chain.rs
/// this in mind when sizing your hint if you are trying to be precise,
/// such as making sure a specific object fits.
pub const fn new_in(chunk_size_hint: usize, allocator: A) -> Self {
let initial_node_size = Self::normalize_node_size(chunk_size_hint);

Contributor

why does one of these get a function and the other is inline?

Comment on lines +202 to +203
assert!(function_arena_reserved_bytes(&dict) <= 4 * SMALL_ARENA_HINT);
assert!(mapping_arena_reserved_bytes(&dict) <= 2 * SMALL_ARENA_HINT);

Contributor

where do these constants come from?

- pub const SIZE_HINT: usize = 1024 * 1024;
+ // Keep the per-shard arena small; larger dictionaries grow
+ // geometrically up to the historical 1 MiB chunk size.
+ pub const SIZE_HINT: usize = 64 * 1024;

Contributor

why this constant?

Contributor

should this be named INITIAL_SIZE_HINT?

// geometrically up to the historical 4 MiB chunk size, while common
// profiles fit comfortably below this initial size. Talk to .NET
// profiling engineers before making this any bigger.
const SIZE_HINT: usize = 512 * 1024;

Contributor

for the other case, we went from 64K to 1M; here we go from 512K to 4M. Why?

Comment thread libdd-alloc/src/chain.rs

let bool_layout = Layout::new::<bool>();

const GROWTH_ITERATIONS: usize = 16;

Contributor

why the reduction?

Comment thread libdd-alloc/src/chain.rs
if Layout::from_size_align(next, align).is_ok() {
next
} else {
current

Contributor

Why is this the right fall-back when from_size_align fails?
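
For context on what the quoted branch does, here is a minimal standalone sketch of the pattern. `grow_target` is a hypothetical function modeled on the snippet, not the PR's code. `Layout::from_size_align` returns an error when the alignment is zero or not a power of two, or when the size rounded up to the alignment would exceed `isize::MAX`.

```rust
use std::alloc::Layout;

/// Hypothetical model of the quoted branch: try to double the target size
/// (clamped to `max`); if the doubled size cannot form a valid `Layout`,
/// keep the current size instead of growing.
fn grow_target(current: usize, max: usize, align: usize) -> usize {
    let next = current.saturating_mul(2).min(max);
    if Layout::from_size_align(next, align).is_ok() {
        next
    } else {
        current
    }
}

fn main() {
    // Normal case: 64 KiB doubles to 128 KiB.
    println!("{}", grow_target(64 * 1024, 1024 * 1024, 8));

    // Pathological case: doubling would exceed isize::MAX, so the layout
    // check fails and the size stays where it was.
    let huge = (isize::MAX as usize) / 2 + 1;
    println!("{}", grow_target(huge, usize::MAX, 8) == huge);
}
```

In this model the fallback stops growth rather than failing the allocation. An alternative would be to clamp to the largest size that still forms a valid layout; which of the two the allocator should do is the open question in this thread.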

Comment thread libdd-alloc/src/chain.rs
} else {
chunk_size_hint
},
node_size: Cell::new(initial_node_size),

Contributor

document why this needs to be a Cell?
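
One plausible reason, sketched below under assumptions: allocator methods in Rust's allocator API take `&self`, so state that changes on each growth has to use interior mutability. `GrowthState` and `take_next_size` are hypothetical names for illustration; the real struct has more fields, and whether this is the author's actual motivation is not confirmed in the thread.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the growth-related fields of a chain allocator.
struct GrowthState {
    /// Target size of the next chunk; updated through `&self`.
    node_size: Cell<usize>,
    /// Cap on routine growth.
    max_node_size: usize,
}

impl GrowthState {
    fn new(initial: usize, max: usize) -> Self {
        Self {
            node_size: Cell::new(initial.min(max)),
            max_node_size: max,
        }
    }

    /// Returns the size to use for the chunk being created now and advances
    /// the target for the following one. Note the `&self` receiver: without
    /// `Cell`, this method could not update `node_size`.
    fn take_next_size(&self) -> usize {
        let size = self.node_size.get();
        self.node_size
            .set(size.saturating_mul(2).min(self.max_node_size));
        size
    }
}

fn main() {
    let state = GrowthState::new(64, 256);
    for _ in 0..4 {
        println!("{}", state.take_next_size());
    }
}
```

`Cell<usize>` has no runtime cost, but it makes the containing type `!Sync`, so this approach only fits a type that is not shared across threads without external synchronization.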

5 participants