perf(profiling): reduce profiler arena memory footprint by taegyunkim · Pull Request #2048 · DataDog/libdatadog

taegyunkim · 2026-05-27T19:25:41Z

What does this PR do?

Reduces the profiler arena memory floor while preserving larger-workload behavior by making ChainAllocator grow geometrically up to a cap.

This PR is stacked on top of #2088, which adds a ProfilesDictionary Criterion benchmark so this change can be compared by the GitLab benchmark job.

Changes:

Adds capped geometric growth to ChainAllocator.
Adds ChainAllocator::new_capped_in(initial, max, allocator) for callers that want a smaller initial chunk but a historical/max chunk size after growth.
Lowers profiler dictionary arena initial chunks from 1 MiB to 64 KiB, capped at the historical 1 MiB chunk size.
Lowers per-profile StringTable initial chunks from 4 MiB to 512 KiB, capped at the historical 4 MiB chunk size.
Keeps ParallelStringSet / ParallelSliceSet at 16 shards.

Motivation

Python profiler memory analysis showed that common profiles keep only tens to hundreds of KiB of dictionary/string-table content, but libdatadog reserved much larger arena chunks up front. This created a high per-process memory floor, especially across forked workers.

The smaller initial chunks reduce that floor. Geometric growth avoids keeping large/high-cardinality services on tiny chunks indefinitely, so they ramp back to the previous chunk sizes after a few growth events.

Shard count

This PR keeps the existing 16-shard default.

I originally explored reducing ParallelStringSet / ParallelSliceSet from 16 shards to 4, but dropped that from this PR. The extra memory saved by reducing shards after the arena-size change is relatively small (12 * 64 KiB = 768 KiB for the string set), while 16 shards preserve better concurrent insertion headroom.

Consumer concurrency summary:

ddprof appears effectively single-writer for dictionary insertion: the worker thread interns while processing events, while export serializes an inactive profile buffer.
dd-trace-py can have dictionary insertion concurrency today: one native stack sampler thread can intern off-GIL while one Python/Cython collector path is active under the GIL.
Future free-threaded/no-GIL Python could increase producer concurrency.

So this PR focuses on the main memory win: smaller initial arenas with capped growth, without reducing shard count.

Additional Notes

Expected growth patterns:

Dictionary arenas: 64 KiB -> 128 KiB -> 256 KiB -> 512 KiB -> 1 MiB -> ...
Per-profile StringTable: 512 KiB -> 1 MiB -> 2 MiB -> 4 MiB -> ...

Oversized individual allocations still allocate chunks large enough for the request, even if larger than the routine growth cap.
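
As a rough illustration of the growth patterns above, here is a minimal sketch of capped doubling. The helper `capped_doubling_sizes` is hypothetical and is not part of `ChainAllocator`. It ignores any page rounding or per-chunk header overhead the real allocator may apply, and it does not model oversized individual allocations.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Hypothetical helper (not the libdd-alloc API): lists the routine chunk
/// sizes requested when starting at `initial` and doubling up to `max`.
fn capped_doubling_sizes(initial: usize, max: usize) -> Vec<usize> {
    assert!(initial > 0, "initial chunk size must be non-zero");
    let mut current = initial.min(max);
    let mut sizes = vec![current];
    while current < max {
        // Double the target size, but never exceed the cap.
        // saturating_mul avoids overflow for very large sizes.
        current = current.saturating_mul(2).min(max);
        sizes.push(current);
    }
    sizes
}
```

Under these assumptions, `capped_doubling_sizes(64 * KIB, MIB)` yields the five dictionary-arena sizes listed above, and `capped_doubling_sizes(512 * KIB, 4 * MIB)` yields the four StringTable sizes. Once the cap is reached, every further routine chunk stays at the cap.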

Approximate initial arena floor after this change:

ParallelStringSet: 16 * 64 KiB = 1 MiB instead of 16 * 1 MiB = 16 MiB.
FunctionSet: 4 * 64 KiB = 256 KiB instead of 4 * 1 MiB = 4 MiB.
MappingSet: 2 * 64 KiB = 128 KiB instead of 2 * 1 MiB = 2 MiB.
Per-profile StringTable: 512 KiB instead of 4 MiB.
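
The floor arithmetic above can be checked with a small sketch. The table `ARENAS` and the function `total_floor` are illustrative names, not code from this PR. The shard counts and chunk sizes are copied from the list above, and the totals cover only these four entries, not every allocation the profiler makes.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// One entry per arena family from the list above:
/// (name, shard count, old chunk size, new initial chunk size).
const ARENAS: [(&str, usize, usize, usize); 4] = [
    ("ParallelStringSet", 16, MIB, 64 * KIB),
    ("FunctionSet", 4, MIB, 64 * KIB),
    ("MappingSet", 2, MIB, 64 * KIB),
    ("Per-profile StringTable", 1, 4 * MIB, 512 * KIB),
];

/// Sums `shards * chunk` over the four entries, using either the old
/// chunk sizes or the new initial chunk sizes.
fn total_floor(use_new_sizes: bool) -> usize {
    ARENAS
        .iter()
        .map(|&(_name, shards, old, new)| {
            let chunk = if use_new_sizes { new } else { old };
            shards * chunk
        })
        .sum()
}
```

For these four entries, `total_floor(false)` is 26 MiB and `total_floor(true)` is 1920 KiB (1.875 MiB).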

How to test the change?

Ran:

cargo +nightly-2026-02-08 fmt --all -- --check
cargo check -p libdd-alloc -p libdd-profiling
cargo check -p libdd-profiling --benches
cargo +stable clippy -p libdd-alloc -p libdd-profiling --all-targets --all-features -- -D warnings
cargo nextest run -p libdd-alloc -p libdd-profiling
cargo test --doc -p libdd-alloc -p libdd-profiling

PROF-14423

github-actions · 2026-05-27T19:27:41Z

📚 Documentation Check Results

⚠️ 653 documentation warning(s) found

📦 `libdd-alloc` - 3 warning(s)

📦 `libdd-profiling` - 650 warning(s)

Updated: 2026-06-10 18:35:04 UTC | Commit: 193ac93 | missing-docs job results

github-actions · 2026-05-27T19:29:02Z

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

Base Branch: origin/taegyunkim/prof-14423-prof-dictinary-bench
PR Branch: origin/taegyunkim/profiles-dictionary-memory-footprint

Summary by Rule

Rule	Base Branch	PR Branch	Change
expect_used	1	1	No change (0%)
Total	1	1	No change (0%)

Annotation Counts by File

File	Base Branch	PR Branch	Change
`libdd-profiling/src/collections/string_table/mod.rs`	1	1	No change (0%)

Annotation Stats by Crate

Crate	Base Branch	PR Branch	Change
`clippy-annotation-reporter`	5	5	No change (0%)
`datadog-ffe-ffi`	1	1	No change (0%)
`datadog-ipc`	21	21	No change (0%)
`datadog-live-debugger`	4	4	No change (0%)
`datadog-live-debugger-ffi`	10	10	No change (0%)
`datadog-profiling-replayer`	4	4	No change (0%)
`datadog-sidecar`	46	46	No change (0%)
`libdd-common`	13	13	No change (0%)
`libdd-common-ffi`	12	12	No change (0%)
`libdd-data-pipeline`	5	5	No change (0%)
`libdd-ddsketch`	2	2	No change (0%)
`libdd-dogstatsd-client`	1	1	No change (0%)
`libdd-profiling`	13	13	No change (0%)
`libdd-remote-config`	4	4	No change (0%)
`libdd-telemetry`	20	20	No change (0%)
`libdd-tinybytes`	4	4	No change (0%)
`libdd-trace-normalization`	2	2	No change (0%)
`libdd-trace-obfuscation`	3	3	No change (0%)
`libdd-trace-stats`	1	1	No change (0%)
`libdd-trace-utils`	11	11	No change (0%)
Total	182	182	No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

github-actions · 2026-05-27T19:29:26Z

🔒 Cargo Deny Results

⚠️ 6 issue(s) found, showing only errors (advisories, bans, sources)

📦 `libdd-alloc` - ✅ No issues

📦 `libdd-profiling` - 6 error(s)

error[vulnerability]: NSEC3 closest-encloser proof validation enters unbounded loop on cross-zone responses
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:79:1
   │
79 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0118
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0118
   ├ The NSEC3 closest-encloser proof validation in `hickory-proto`'s
     `DnssecDnsHandle` walks from the QNAME up to the SOA owner name, building a
     list of candidate encloser names. The iterator used assumes the
     QNAME is a descendant of the SOA owner, terminating only when the current
     candidate equals the SOA name. When the SOA in a response's authority section
     is not an ancestor of the QNAME, the loop stalls at the DNS root and never
     terminates, repeatedly calling `Name::base_name()` and pushing newly allocated
     `Name` and hashed-name entries into the candidate `Vec`.
     
     The bug is reachable by any caller of `DnssecDnsHandle` — including the
     resolver, recursor, and client — when built with the `dnssec-ring` or
     `dnssec-aws-lc-rs` feature and configured to perform DNSSEC validation. It is
     triggered while validating a NoData or NXDomain response whose authority
     section contains an SOA record from a zone other than an ancestor of the
     QNAME, on a code path that requires NSEC3 closest-encloser proof. In practice
     this can be reached through an insecure CNAME chain that crosses zone
     boundaries into a DNSSEC-signed zone returning NoData, but the minimum
     condition is just a mismatched SOA owner on a response requiring NSEC3
     validation.
     
     A `debug_assert_ne!(name, Name::root())` guards the loop body, so debug builds
     abort with a panic on the first iteration past the root. Release builds
     compile the assertion out and run the loop unbounded, allocating until the
     process exhausts available memory (OOM). A reachable upstream attacker who
     can return such a response can therefore crash a debug-built validator or
     exhaust memory on a release-built one.
     
     The affected code was migrated from `hickory-proto` to `hickory-net` as part of
     the 0.26.0 release. The `hickory-proto` 0.26.x release no longer offers
     `DnssecDnsHandle` and so we recommend all affected users update to `hickory-net`
     0.26.1 when the implementation of that type is required.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-3v94-mw7p-v465
   ├ Solution: No safe upgrade is available!
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.2.0
             │   └── libdd-profiling v1.0.0
             │       └── (dev) libdd-profiling v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[vulnerability]: CPU exhaustion during message encoding due to O(n²) name compression
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:79:1
   │
79 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0119
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0119
   ├ During message encoding, `hickory-proto`'s `BinEncoder` stores pointers to
     labels that are candidates for name compression in a `Vec<(usize, Vec<u8>)>`.
     The name compression logic then searches for matches with a linear scan.
     
     A malicious message with many records can both introduce many candidate labels,
     and invoke this linear scan many times. This can amplify CPU exhaustion in DoS
     attacks.
     
     This is similar to
     [CVE-2024-8508](https://www.nlnetlabs.nl/downloads/unbound/CVE-2024-8508.txt).
     
     We recommend all affected users update to `hickory-proto` 0.26.1 for the fix.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-q2qq-hmj6-3wpp
   ├ Solution: Upgrade to >=0.26.1 (try `cargo update -p hickory-proto`)
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.2.0
             │   └── libdd-profiling v1.0.0
             │       └── (dev) libdd-profiling v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:157:1
    │
157 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── libdd-common v4.2.0
      │   └── libdd-profiling v1.0.0
      │       └── (dev) libdd-profiling v1.0.0 (*)
      ├── libdd-profiling v1.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-profiling v1.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:181:1
    │
181 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.2.0
      │   │   │   └── libdd-profiling v1.0.0
      │   │   │       └── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.2.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.2.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.2.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-06-10 18:36:46 UTC | Commit: 193ac93 | dependency-check job results

codecov-commenter · 2026-05-27T19:40:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.57%. Comparing base (477834f) to head (6726636).

Additional details and impacted files

@@                              Coverage Diff                               @@
##           taegyunkim/prof-14423-prof-dictinary-bench    #2048      +/-   ##
==============================================================================
+ Coverage                                       73.53%   73.57%   +0.04%     
==============================================================================
  Files                                             475      475              
  Lines                                           79007    79200     +193     
==============================================================================
+ Hits                                            58095    58270     +175     
- Misses                                          20912    20930      +18

Components	Coverage Δ
libdd-crashtracker	`65.34% <ø> (+0.01%)`	⬆️
libdd-crashtracker-ffi	`37.68% <ø> (ø)`
libdd-agent-client	`83.79% <ø> (ø)`
libdd-alloc	`99.10% <100.00%> (+0.32%)`	⬆️
libdd-data-pipeline	`86.25% <ø> (-0.02%)`	⬇️
libdd-data-pipeline-ffi	`73.86% <ø> (ø)`
libdd-common	`79.93% <ø> (ø)`
libdd-common-ffi	`74.41% <ø> (ø)`
libdd-telemetry	`73.37% <ø> (+0.02%)`	⬆️
libdd-telemetry-ffi	`31.36% <ø> (ø)`
libdd-dogstatsd-client	`82.64% <ø> (ø)`
datadog-ipc	`74.90% <ø> (-1.47%)`	⬇️
libdd-profiling	`81.87% <100.00%> (+0.18%)`	⬆️
libdd-profiling-ffi	`64.79% <ø> (ø)`
libdd-sampling	`97.48% <ø> (ø)`
datadog-sidecar	`36.51% <ø> (ø)`
datdog-sidecar-ffi	`12.23% <ø> (ø)`
spawn-worker	`48.86% <ø> (ø)`
libdd-tinybytes	`93.80% <ø> (ø)`
libdd-trace-normalization	`81.71% <ø> (ø)`
libdd-trace-obfuscation	`87.30% <ø> (ø)`
libdd-trace-protobuf	`68.25% <ø> (ø)`
libdd-trace-utils	`89.32% <ø> (ø)`
libdd-tracer-flare	`86.57% <ø> (ø)`
libdd-log	`74.83% <ø> (ø)`

datadog-datadog-prod-us1-2 · 2026-05-27T19:41:13Z

Tests

⚠️ Warnings

🚦 1 Pipeline job failed

Required checks pass | allchecks

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 73.61% (+0.09%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: b8f8c6d

dd-octo-sts · 2026-05-27T20:01:38Z

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so	7.70 MB	7.70 MB	0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a	83.68 MB	83.68 MB	-0% (-1.79 KB) 👌

aarch64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.34 MB	10.34 MB	-0% (-8 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a	94.78 MB	94.78 MB	-0% (-1.65 KB) 👌

libdatadog-x64-windows

Artifact	Baseline	Commit	Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll	24.83 MB	24.83 MB	+0% (+1.00 KB) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib	86.89 KB	86.89 KB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb	180.83 MB	180.82 MB	-0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib	925.68 MB	925.68 MB	+0% (+5.18 KB) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll	8.09 MB	8.09 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib	86.89 KB	86.89 KB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb	23.93 MB	23.93 MB	0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib	47.78 MB	47.78 MB	+0% (+464 B) 👌

libdatadog-x86-windows

Artifact	Baseline	Commit	Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll	21.52 MB	21.52 MB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib	88.26 KB	88.26 KB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb	184.90 MB	184.89 MB	-0% (-8.00 KB) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib	918.35 MB	918.36 MB	+0% (+5.24 KB) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll	6.24 MB	6.24 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib	88.26 KB	88.26 KB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb	25.66 MB	25.66 MB	0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib	45.41 MB	45.41 MB	+0% (+500 B) 👌

x86_64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a	74.60 MB	74.60 MB	-0% (-48 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so	8.58 MB	8.58 MB	0% (0 B) 👌

x86_64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a	90.02 MB	90.02 MB	+0% (+144 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so	10.44 MB	10.44 MB	0% (0 B) 👌

ivoanjo · 2026-05-28T08:39:27Z

Note that historically the tension here was between fragmentation and memory use -- that's why we set the higher defaults. (See for instance https://docs.google.com/document/d/1g_H7G9s_H9yoxlpyw_B0aoUyIVmo0ZQBzQkp5EUUyX8/edit?tab=t.0 )

This is not to say that we can't or shouldn't adjust these numbers; it's more to add context on why larger numbers were chosen rather than starting with the smallest possible and just letting it grow.

taegyunkim · 2026-06-04T18:09:52Z

@ivoanjo Thanks for the context! That makes sense, and this is why this PR uses capped geometric growth.

A couple of differences make this less risky than the story from your report:

These profiler dictionary and per-profile string-table arenas use ChainAllocator<VirtualAllocator>, so on Unix they allocate via mmap, not glibc malloc.
The chunks are arena-owned and long-lived. We're not creating malloc/free churn interleaved with runtime allocations.
Larger workloads converge back to the historical chunk sizes.

So this keeps the lower memory floor for small/common profiles, while avoiding the "smallest possible and just keep growing tiny chunks" behavior.

I agree we should validate this with real workloads, especially Ruby if we're worried about fragmentation.

ivoanjo · 2026-06-05T07:50:25Z

Ahh that's great, thanks for the extra context. In particular, I missed the detail where these come from mmap directly -- in that case I indeed expect the likelihood of fragmentation is way way lower (e.g. address space fragmentation could be possible but... I've not heard of it happening very commonly so hopefully the kernel/glibc do a good job there?).

Excited to see the improvements from this one :D

taegyunkim · 2026-06-05T13:58:03Z

@ivoanjo the DoE results look very good for Python with this change

For all three archetypes, we see reductions in heap live size, heap live samples, allocated memory, and allocations, with no change in CPU time.

[Charts omitted: DoE results for the Enterprise, Latency, and Throughput archetypes]

danielsn · 2026-06-09T17:54:31Z

 /// doesn't have enough space for the requested allocation, and then links the
 /// new [LinearAllocator] to the previous one, creating a chain. This is where
-/// its name comes from.
+/// its name comes from. Each successful growth doubles the target chunk size


have we experimented with other factors? e.g. 1.5x would still grow geometrically, but not as fast.
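
For a rough sense of the trade-off raised in this comment, here is a minimal sketch that counts how many growth events it takes to reach the cap for a given factor. The function `growth_steps` is hypothetical and uses plain integer arithmetic. It is not how the PR computes sizes, and it says nothing about the measured memory or CPU impact of either factor.

```rust
/// Hypothetical helper: counts growth events needed to go from `initial`
/// to `max` when each event multiplies the size by `num / den`,
/// with the result capped at `max`.
fn growth_steps(initial: usize, max: usize, num: usize, den: usize) -> usize {
    assert!(initial > 0, "initial size must be non-zero");
    assert!(den > 0 && num > den, "growth factor must be greater than 1");
    let mut current = initial;
    let mut steps = 0;
    while current < max {
        let next = current.saturating_mul(num) / den;
        // Always advance by at least one byte so the loop terminates
        // even when integer division rounds the growth away.
        current = next.max(current + 1).min(max);
        steps += 1;
    }
    steps
}
```

With the dictionary-arena sizes from the description (64 KiB initial, 1 MiB cap), doubling reaches the cap in 4 growth events, while a 3/2 factor takes 7. With a 3/2 factor the intermediate sizes are also no longer powers of two.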

danielsn · 2026-06-09T17:56:57Z

    /// this in mind when sizing your hint if you are trying to be precise,
    /// such as making sure a specific object fits.
    pub const fn new_in(chunk_size_hint: usize, allocator: A) -> Self {
+        let initial_node_size = Self::normalize_node_size(chunk_size_hint);


why does one of these get a function and the other is inline?

danielsn · 2026-06-09T17:58:29Z

+        assert!(function_arena_reserved_bytes(&dict) <= 4 * SMALL_ARENA_HINT);
+        assert!(mapping_arena_reserved_bytes(&dict) <= 2 * SMALL_ARENA_HINT);


where do these constants come from?

danielsn · 2026-06-09T17:59:21Z

-    pub const SIZE_HINT: usize = 1024 * 1024;
+    // Keep the per-shard arena small; larger dictionaries grow
+    // geometrically up to the historical 1 MiB chunk size.
+    pub const SIZE_HINT: usize = 64 * 1024;


why this constant?

should this be named INITIAL_SIZE_HINT?

danielsn · 2026-06-09T18:00:41Z

+        // geometrically up to the historical 4 MiB chunk size, while common
+        // profiles fit comfortably below this initial size. Talk to .NET
+        // profiling engineers before making this any bigger.
+        const SIZE_HINT: usize = 512 * 1024;


for the other case, we went from 64K to 1M; here we go from 512K to 4M. Why?

danielsn · 2026-06-10T15:24:43Z


        let bool_layout = Layout::new::<bool>();

+        const GROWTH_ITERATIONS: usize = 16;


why the reduction?

danielsn · 2026-06-10T15:26:14Z

+        if Layout::from_size_align(next, align).is_ok() {
+            next
+        } else {
+            current


Why is this the right fall-back when from_size_align fails?
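
To make the question concrete, here is a minimal sketch of a next-size computation with the same shape as the snippet above. The function `next_node_size` and its parameters are assumptions for illustration; the surrounding code in the PR is not shown in this thread. One possible reading is that `current` has already been used as a valid size, so keeping it means growth stalls but allocation can continue.

```rust
use std::alloc::Layout;

/// Hypothetical sketch: pick the next routine chunk size.
/// Doubles `current` up to `max`, and keeps `current` if the doubled
/// size cannot form a valid `Layout` with the given alignment.
fn next_node_size(current: usize, max: usize, align: usize) -> usize {
    // Routine growth never exceeds the cap and never shrinks.
    let next = current.saturating_mul(2).min(max).max(current);
    // from_size_align rejects a zero or non-power-of-two alignment, and
    // sizes that exceed isize::MAX once rounded up to the alignment.
    if Layout::from_size_align(next, align).is_ok() {
        next
    } else {
        current
    }
}
```

In this sketch the fallback only triggers for sizes near `isize::MAX` or for an invalid alignment, so ordinary workloads would not be expected to hit it.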

danielsn · 2026-06-10T15:26:44Z

-            } else {
-                chunk_size_hint
-            },
+            node_size: Cell::new(initial_node_size),


document why this needs to be a cell?
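
A likely reason, sketched under assumptions: allocator methods such as `allocate` take `&self`, so a target size that changes on growth has to be updated through a shared reference. The type `GrowthState` and its methods below are hypothetical stand-ins, not the `ChainAllocator` fields.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an arena whose allocation path takes `&self`.
struct GrowthState {
    /// Current target chunk size; a Cell so it can change behind `&self`.
    node_size: Cell<usize>,
    /// Cap for routine growth.
    max_node_size: usize,
}

impl GrowthState {
    fn new(initial: usize, max: usize) -> Self {
        Self {
            node_size: Cell::new(initial.min(max)),
            max_node_size: max,
        }
    }

    /// Doubles the target size up to the cap and returns the new value.
    /// Takes `&self`, mirroring how an allocator's `allocate` is called.
    fn grow(&self) -> usize {
        let next = self
            .node_size
            .get()
            .saturating_mul(2)
            .min(self.max_node_size);
        self.node_size.set(next);
        next
    }
}
```

Note that `Cell` is not `Sync`, so this pattern assumes access is already confined to one thread at a time, for example by a lock around the owning structure.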

taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch 2 times, most recently from 3da10e3 to 477c1f4 Compare May 27, 2026 20:47

taegyunkim mentioned this pull request May 29, 2026

perf(profiling): bench branch (do not merge) DataDog/dd-trace-py#18335

Draft

taegyunkim mentioned this pull request Jun 4, 2026

perf(profiling): reduce profiler arena memory footprint DataDog/dd-trace-py#18469

Draft

taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch from 45a451c to 4686b93 Compare June 5, 2026 14:07

taegyunkim changed the base branch from main to taegyunkim/prof-14423-prof-dictinary-bench June 5, 2026 14:07

taegyunkim force-pushed the taegyunkim/prof-14423-prof-dictinary-bench branch from 2e23e15 to 477834f Compare June 5, 2026 20:26

perf(profiling): reduce profiler arena memory footprint

6726636

taegyunkim force-pushed the taegyunkim/profiles-dictionary-memory-footprint branch from 65a7aff to 6726636 Compare June 5, 2026 20:46

morrisonlevi reviewed Jun 8, 2026

Comment thread libdd-profiling/src/collections/string_table/mod.rs Outdated

taegyunkim added 2 commits June 9, 2026 17:16

Merge branch 'taegyunkim/prof-14423-prof-dictinary-bench' into taegyu…

3fe55d6

…nkim/profiles-dictionary-memory-footprint

update comments

7de57c1

danielsn reviewed Jun 10, 2026

perf(profiling): reduce dictionary string shards

b8f8c6d
