⚡ [performance improvement] Defer PathBuf allocations in Tarjan's SCC DFS#149
⚡ [performance improvement] Defer PathBuf allocations in Tarjan's SCC DFS#149bashandbone wants to merge 1 commit intomainfrom
Conversation
… DFS Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's guide (collapsed on small PRs)Reviewer's GuideOptimizes Tarjan's SCC DFS by eliminating redundant PathBuf allocations during hashmap/set lookups and documents the performance pattern in the project’s bolt notes. Class diagram for TarjanState and InvalidationDetector tarjan_dfs changesclassDiagram
class InvalidationDetector {
+graph: DependencyGraph
+tarjan_dfs(v: Path, state: TarjanState, sccs: Vec~Vec~PathBuf~~)
}
class TarjanState {
+indices: HashMap~PathBuf, usize~
+lowlinks: HashMap~PathBuf, usize~
+on_stack: HashSet~PathBuf~
+stack: Vec~PathBuf~
+index_counter: usize
}
class DependencyGraph {
+get_dependencies(v: Path): Vec~Path~
}
InvalidationDetector --> DependencyGraph : uses
InvalidationDetector --> TarjanState : mutates
%% Inner behavior of tarjan_dfs related to allocation
class tarjan_dfs_inner {
+v_buf: PathBuf
+initialize_node(v: Path, state: TarjanState)
+update_lowlink_from_child(v: Path, dep: Path, state: TarjanState)
}
InvalidationDetector ..> tarjan_dfs_inner : calls
TarjanState o-- PathBuf
TarjanState o-- usize
DependencyGraph o-- Path
TarjanState ..> HashMap
TarjanState ..> HashSet
TarjanState ..> Vec
Flow diagram for tarjan_dfs PathBuf allocation and HashMap lookupsflowchart TD
A["Start tarjan_dfs with v: &Path"] --> B["Create v_buf = v.to_path_buf"]
B --> C["Insert (v_buf.clone, index) into state.indices"]
C --> D["Insert (v_buf.clone, index) into state.lowlinks"]
D --> E["Increment state.index_counter"]
E --> F["Push v_buf.clone onto state.stack"]
F --> G["Insert v_buf into state.on_stack"]
G --> H["Get dependencies = graph.get_dependencies(v)"]
H --> I{For each dep in dependencies}
I --> J{dep not in state.indices?}
J -->|Yes| K["Recurse tarjan_dfs(dep, state, sccs)"]
J -->|No| L{dep in state.on_stack?}
K --> M["Read w_lowlink = state.lowlinks[dep]"]
M --> N["Borrowed lookup: v_lowlink = state.lowlinks.get_mut(v)"]
N --> O["Update *v_lowlink = min(*v_lowlink, w_lowlink)"]
O --> I
L -->|Yes| P["Read w_index = state.indices[dep]"]
P --> Q["Borrowed lookup: v_lowlink = state.lowlinks.get_mut(v)"]
Q --> R["Update *v_lowlink = min(*v_lowlink, w_index)"]
R --> I
L -->|No| I
I -->|Done| S["v_index = state.indices[v]"]
S --> T["v_lowlink = state.lowlinks[v]"]
T --> U{v_lowlink == v_index?}
U -->|No| V["Return from tarjan_dfs"]
U -->|Yes| W["Pop stack until v reached, build SCC"]
W --> X["Push SCC into sccs"]
X --> V
%% Key change: lookups now use borrowed v: &Path instead of repeatedly calling v.to_path_buf()
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- You can further reduce allocations by avoiding repeated
v_buf.clone(); for example, insertv_buf.clone()intoindices, movev_bufintolowlinks(or vice versa), and then rely solely on&Pathfor all subsequent lookups instead of cloning again forstack/on_stackif those collections can also take borrowed keys. - Consider explicitly documenting or enforcing (via type alias or a comment near the
TarjanStatedefinition) that theindices/lowlinksmaps are keyed byPathBufbut intentionally use&Pathfor lookups, to make the use ofBorrow-based lookups clear to future readers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- You can further reduce allocations by avoiding repeated `v_buf.clone()`; for example, insert `v_buf.clone()` into `indices`, move `v_buf` into `lowlinks` (or vice versa), and then rely solely on `&Path` for all subsequent lookups instead of cloning again for `stack`/`on_stack` if those collections can also take borrowed keys.
- Consider explicitly documenting or enforcing (via type alias or a comment near the `TarjanState` definition) that the `indices`/`lowlinks` maps are keyed by `PathBuf` but intentionally use `&Path` for lookups, to make the use of `Borrow`-based lookups clear to future readers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
This PR optimizes Tarjan’s strongly-connected-components DFS used by InvalidationDetector by avoiding repeated PathBuf heap allocations during hash lookups, keeping allocations to the per-node initialization path.
Changes:
- Replaced repeated
v.to_path_buf()calls inside SCC traversal lookups with borrowed&Pathlookups intoRapidMap<PathBuf, _>/RapidSet<PathBuf>. - Consolidated per-node
PathBufcreation into a singlev_bufused for initial inserts/pushes. - Added a
.jules/bolt.mdentry documenting the performance lesson/pattern.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/flow/src/incremental/invalidation.rs | Removes per-edge PathBuf allocations in Tarjan DFS by using borrowed &Path for map/set lookups. |
| .jules/bolt.md | Documents the allocation-avoidance pattern for hash lookups in traversal code. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
💡 What
Replaced unnecessary owned
PathBufallocations (.to_path_buf()) within thetarjan_dfsgraph traversal loop with borrowed&Pathlookups insidestate.indices,state.lowlinks, andstate.on_stack.🎯 Why
When dealing with HashMap/HashSet lookups in Rust, one should pass borrowed references if possible, rather than heap-allocating an owned item. In deeply recursive DAG invalidation scenarios (like Tarjan's algorithm finding SCCs), calling
v.to_path_buf()strictly for.get()or.contains()calls added unnecessaryO(E)heap allocations (since an edge lookup triggers it). Utilizingstd::collections::HashMapcapability to take a&Pathdrops these allocations to0, deferring any allocation solely to when nodes are first pushed to a stack or registeredO(V).📊 Measured Improvement
Eliminates
O(E)unnecessary heap allocations ofPathBufduring strongly connected component detection in dependency graph invalidations.🔬 Measurement
Verify via
cargo test -p thread-flow --test invalidation_testsand benchmark improvements during graph SCC searches.PR created automatically by Jules for task 17336176359170230244 started by @bashandbone
Summary by Sourcery
Optimize Tarjan SCC DFS invalidation traversal to eliminate redundant path allocations during hash lookups.
Enhancements: