parallel quest smoke test #1631
Walkthrough

Compose configs switch quest test run modes and add runtime-generated stream IDs; HTTP/RBAC code refactors serialize metadata persistence under a shared update lock and reorder USERS vs SESSIONS lock usage so that session mutations occur after USERS locks are dropped; one resource-utilization middleware was removed from several routes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP_Handler
    participant USERS
    participant UPDATE_LOCK
    participant METADATA_STORE
    participant SESSIONS
    Client->>HTTP_Handler: mutating RBAC request (e.g., add role / create user)
    HTTP_Handler->>USERS: in-memory validations (read or prepare)
    HTTP_Handler->>UPDATE_LOCK: acquire
    UPDATE_LOCK->>METADATA_STORE: get_metadata
    METADATA_STORE-->>UPDATE_LOCK: metadata
    UPDATE_LOCK->>METADATA_STORE: put_metadata (persist changes)
    UPDATE_LOCK-->>HTTP_Handler: release
    HTTP_Handler->>USERS: update in-memory caches (put_user/mut_roles)
    HTTP_Handler->>SESSIONS: refresh/remove sessions as needed
    SESSIONS-->>Client: session updates reflected
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 3 failed (2 warnings, 1 inconclusive)
Warning: Review ran into problems: timed out fetching pipeline failures after 30000ms.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/rbac.rs (1)
289-317: ⚠️ Potential issue | 🟠 Major

Recheck group membership after acquiring UPDATE_LOCK.

Users.get_user_groups(&userid, &tenant_id) now runs before the serialized metadata delete. A concurrent group-update request can add this user to a group in that gap, after which this handler removes the user from metadata.users without removing the new membership. Please validate "user is not in any groups" against the locked metadata state immediately before retain(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/rbac.rs` around lines 289 - 317, The current check using Users.get_user_groups(&userid, &tenant_id) happens before acquiring UPDATE_LOCK, leaving a race where a concurrent group update can add memberships; after acquiring UPDATE_LOCK (UPDATE_LOCK.lock().await) re-validate group membership against the locked metadata by loading metadata via get_metadata(&tenant_id).await? and checking metadata.users/groups to ensure the user has no group memberships before calling metadata.users.retain(|user| user.userid() != userid); if any membership exists return Err(RBACError::InvalidDeletionRequest(...)).
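The pattern this comment asks for — re-checking membership against state observed under the lock, instead of trusting a pre-lock snapshot — can be sketched with a self-contained toy model. All names below (Metadata, delete_user, group_members) are illustrative stand-ins, not the project's real types:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

// Toy stand-in for the persisted tenant metadata.
struct Metadata {
    users: Vec<String>,
    group_members: HashMap<String, HashSet<String>>, // group name -> user ids
}

// Re-validate membership against the state seen *under* the lock, so a
// concurrent group update cannot slip in between the check and the retain.
fn delete_user(store: &Mutex<Metadata>, userid: &str) -> Result<(), String> {
    let mut m = store.lock().unwrap(); // UPDATE_LOCK stand-in
    if m.group_members.values().any(|g| g.contains(userid)) {
        return Err(format!("{userid} still belongs to a group"));
    }
    m.users.retain(|u| u.as_str() != userid); // persist would follow, still locked
    Ok(())
}
```

A check performed before taking the guard can be stale by the time the mutation runs; repeating it on the locked state closes that window.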
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/modal/query/querier_rbac.rs`:
- Around line 129-168: The code currently calls put_metadata(&metadata,
&tenant_id).await? inside the for user_group in user_groups loop which prevents
persisting the metadata change when user_groups is empty and causes one storage
write per group; move the persistence call so it occurs once after processing
all groups (after the loop that builds groups_to_update and updates
metadata.user_groups) so that metadata.users.retain(...) is always persisted
even if Users.get_user_groups(&userid, &tenant_id) returns empty; keep the
UPDATE_LOCK scope and ensure you still update metadata.user_groups from
groups_to_update (using the existing find/clone_from logic) before calling
put_metadata(&metadata, &tenant_id).await?.
- Around line 76-98: The in-memory RBAC update (Users.put_user and similar
Users.* calls) must be performed while holding the same UPDATE_LOCK that
protects metadata persistence so the in-memory state is ordered with the storage
write; move the Users.put_user call into the critical section where
get_metadata, metadata.users.push(...), and put_metadata(...) occur (i.e. keep
the Users update before releasing UPDATE_LOCK after user::User::new_basic and
put_metadata). Apply the same change to the other affected blocks (around the
ranges that modify metadata and then call Users.*: lines ~223-253, ~320-352,
~367-389) so add/remove role and password reset mutations are done under
UPDATE_LOCK to avoid write reordering across concurrent writers.
In `@src/handlers/http/modal/query/querier_role.rs`:
- Around line 78-83: The in-memory role-cache update (mut_roles()) must remain
serialized with the metadata write guarded by UPDATE_LOCK to prevent stale cache
state; move the mut_roles() call (the code that updates the in-memory roles
cache) inside the same UPDATE_LOCK critical section that calls get_metadata(...)
and put_metadata(...), or extend the scope of _guard so the cache mutation
occurs before _guard is dropped; apply the same change to the other similar
block referenced (the one around mut_roles usage at lines 148-163) so both
persistence and cache updates are atomic relative to UPDATE_LOCK.
In `@src/handlers/http/rbac.rs`:
- Around line 151-176: The in-memory Users updates are applied after releasing
UPDATE_LOCK which breaks write ordering; fix by performing the in-memory
mutations while still holding the lock so metadata and cache are updated
atomically: inside the UPDATE_LOCK critical section (the block around
get_metadata/put_metadata and where you create user via user::User::new_basic)
call Users.put_user(...) and Users.add_roles(...) (or Users.remove_roles where
appropriate) before returning the (user, password) tuple or leaving the block;
apply the same change to the other affected handlers (the blocks around lines
noted) so every metadata write via put_metadata(...) is immediately followed by
the corresponding Users.* updates while holding UPDATE_LOCK.
In `@src/handlers/http/role.rs`:
- Around line 59-67: The cache write must remain serialized with the persistent
metadata write: keep the UPDATE_LOCK guard across the in-memory updates so the
sequence get_metadata(..) -> metadata.roles.insert(...) -> put_metadata(..) and
the subsequent mut_roles()/DEFAULT_ROLE updates occur while _guard is held; move
the tenant_str/ mut_roles()/DEFAULT_ROLE modifications inside the same block
that holds UPDATE_LOCK (and apply the same change to the other occurrences
around mut_roles()/DEFAULT_ROLE) so concurrent requests cannot reorder storage
and cache writes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 18054772-fd6e-4c54-a017-5689e7687646
📒 Files selected for processing (7)
src/handlers/http/middleware.rs
src/handlers/http/modal/query/querier_rbac.rs
src/handlers/http/modal/query/querier_role.rs
src/handlers/http/modal/server.rs
src/handlers/http/rbac.rs
src/handlers/http/role.rs
src/rbac/mod.rs
```rust
let (user, password) = {
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;

    if Users.contains(&username, &tenant_id)
        || metadata.users.iter().any(
            |user| matches!(&user.ty, UserType::Native(basic) if basic.username == username),
        )
    {
        return Err(RBACError::UserExists(username));
    }

    let (mut user, password) =
        user::User::new_basic(username.clone(), tenant_id.clone(), false);
    user.roles.clone_from(&user_roles);
    metadata.users.push(user.clone());
    put_metadata(&metadata, &tenant_id).await?;
    (user, password)
};

// Update in-memory state after releasing the lock
Users.put_user(user.clone());
```
Serialize the Users.* mutation with the metadata mutation.
These handlers now persist metadata under UPDATE_LOCK and then update the in-memory RBAC state afterward. That is not ordered across concurrent writers: request A can win the storage write first, request B can win it second, and then the post-lock Users.* calls can run in the reverse order. Password reset is the clearest failure mode here, but add/remove role has the same problem. The query node can end up serving stale passwords or stale role memberships until reload.
Also applies to: 223-253, 320-352, 367-389
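The reordering hazard described above, and why holding one guard across both writes fixes it, can be shown with a minimal self-contained model (toy types only; the comments map each line to the real call it stands in for):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy model: `storage` plays the persisted metadata, `cache` the in-memory
// RBAC state. Writing both under one guard keeps them ordered across writers.
#[derive(Default)]
struct Rbac {
    storage: i32,
    cache: i32,
}

fn set_password(rbac: &Mutex<Rbac>, new: i32) {
    let mut g = rbac.lock().unwrap(); // UPDATE_LOCK stand-in
    g.storage = new; // put_metadata(...)
    g.cache = new; // Users.put_user(...) — still inside the critical section
}

fn demo() -> (i32, i32) {
    let rbac = Arc::new(Mutex::new(Rbac::default()));
    let handles: Vec<_> = (1..=8)
        .map(|i| {
            let r = Arc::clone(&rbac);
            thread::spawn(move || set_password(&r, i))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let g = rbac.lock().unwrap();
    (g.storage, g.cache)
}
```

Whichever writer wins, storage and cache finish in agreement; splitting the cache write out past the guard is exactly what lets them diverge.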
```rust
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.users.retain(|user| user.userid() != userid);

    // also delete from user groups
    let user_groups = Users.get_user_groups(&userid, &tenant_id);
    let mut groups_to_update = Vec::new();

    for user_group in user_groups {
        if let Some(groups) = write_user_groups().get_mut(tenant)
            && let Some(ug) = groups.get_mut(&user_group)
            && let Some(users) = users().get(tenant)
            && let Some(user) = users.get(&userid)
        {
            let userid = match &user.ty {
                UserType::Native(basic) => basic.username.clone(),
                UserType::OAuth(oauth) => oauth.userid.clone(),
                UserType::ApiKey(api_key) => api_key.userid.clone(),
            };
            ug.remove_users_by_user_ids(HashSet::from_iter([userid]))?;
            groups_to_update.push(ug.clone());
        } else {
            // User not found, skip or log as needed
            continue;
        }

        for updated_group in &groups_to_update {
            if let Some(existing) = metadata
                .user_groups
                .iter_mut()
                .find(|ug| ug.name == updated_group.name)
            {
                existing.clone_from(updated_group);
            } else {
                metadata.user_groups.push(updated_group.clone());
            }
        }

        put_metadata(&metadata, &tenant_id).await?;
    }
```
Persist the delete even when the user has no groups.
put_metadata(&metadata, &tenant_id) is inside the for user_group in user_groups loop. If user_groups is empty, metadata.users.retain(...) is never written, so the delete only happens in memory and the user comes back after reload. Even when groups exist, this does one storage write per group.
Suggested fix
```diff
 {
     let _guard = UPDATE_LOCK.lock().await;
     let mut metadata = get_metadata(&tenant_id).await?;
     metadata.users.retain(|user| user.userid() != userid);
     // also delete from user groups
     let user_groups = Users.get_user_groups(&userid, &tenant_id);
-    let mut groups_to_update = Vec::new();
     for user_group in user_groups {
         if let Some(groups) = write_user_groups().get_mut(tenant)
             && let Some(ug) = groups.get_mut(&user_group)
             && let Some(users) = users().get(tenant)
             && let Some(user) = users.get(&userid)
         {
             let userid = match &user.ty {
                 UserType::Native(basic) => basic.username.clone(),
                 UserType::OAuth(oauth) => oauth.userid.clone(),
                 UserType::ApiKey(api_key) => api_key.userid.clone(),
             };
             ug.remove_users_by_user_ids(HashSet::from_iter([userid]))?;
-            groups_to_update.push(ug.clone());
-        } else {
-            // User not found, skip or log as needed
-            continue;
-        }
-
-        for updated_group in &groups_to_update {
             if let Some(existing) = metadata
                 .user_groups
                 .iter_mut()
-                .find(|ug| ug.name == updated_group.name)
+                .find(|existing| existing.name == ug.name)
             {
-                existing.clone_from(updated_group);
+                existing.clone_from(ug);
             } else {
-                metadata.user_groups.push(updated_group.clone());
+                metadata.user_groups.push(ug.clone());
             }
+        } else {
+            continue;
+        }
-
-        put_metadata(&metadata, &tenant_id).await?;
     }
+    put_metadata(&metadata, &tenant_id).await?;
 }
```
```rust
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.roles.insert(name.clone(), role.clone());
    put_metadata(&metadata, &tenant_id).await?;
}
```
Don't release UPDATE_LOCK before the mut_roles() update below.
The metadata write is serialized, but the in-memory role-cache write is no longer part of that ordering. Concurrent put/delete requests can therefore leave mut_roles() with an older value than the one that was last persisted, which will skew later role validation and authorization until the cache is rebuilt.
Also applies to: 148-163
```rust
let (user, password) = {
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;

    if Users.contains(&userid, &tenant_id)
        || metadata.users.iter().any(|user| match &user.ty {
            UserType::Native(basic) => basic.username == userid,
            UserType::OAuth(_) => false,
            UserType::ApiKey(_) => false,
        })
    {
        return Err(RBACError::UserExists(userid));
    }

    let (mut user, password) = user::User::new_basic(userid.clone(), tenant_id.clone(), false);
    user.roles.clone_from(&user_roles);
    metadata.users.push(user.clone());
    put_metadata(&metadata, &tenant_id).await?;
    (user, password)
};

// Update in-memory state after releasing the lock
Users.put_user(user);
if !user_roles.is_empty() {
    Users.add_roles(&userid, user_roles, &tenant_id);
}
```
The post-lock Users.* update can now apply out of order.
These handlers persist metadata under UPDATE_LOCK and then mutate the in-memory RBAC state afterward. That breaks write ordering across concurrent requests: two password resets can leave the cache on the older hash, and add/remove role can similarly leave cached memberships behind the persisted state. The in-memory auth path will then disagree with storage until the cache is rebuilt.
Also applies to: 195-217, 383-399, 459-476
```rust
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.roles.insert(name.clone(), role.clone());
    put_metadata(&metadata, &tenant_id).await?;
}

// Update in-memory state after releasing the lock
let tenant_str = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT).to_owned();
Keep the cache write serialized with the metadata write.
These changes now drop UPDATE_LOCK before the matching mut_roles() / DEFAULT_ROLE update that follows. Two concurrent requests can therefore persist storage in order A→B and then apply the cache writes in order B→A, leaving memory behind the source of truth. A put racing with delete, or two put_default calls, can reproduce this.
Also applies to: 141-157, 175-184
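The fix the comment suggests — extending the guard's scope so the cache mutation happens before it is dropped — can be sketched with a self-contained toy (names like RoleState, persisted, and cached are illustrative stand-ins for metadata.roles and the mut_roles() cache):

```rust
use std::sync::Mutex;

// Toy model: `persisted` stands in for the stored metadata.roles,
// `cached` for the in-memory mut_roles() map.
#[derive(Default)]
struct RoleState {
    persisted: Vec<String>,
    cached: Vec<String>,
}

fn put_role(state: &Mutex<RoleState>, name: &str) {
    // One guard spans both writes; it is dropped only after the cache update,
    // so storage order and cache order cannot diverge across writers.
    let mut s = state.lock().unwrap(); // UPDATE_LOCK guard stand-in
    s.persisted.push(name.to_string()); // put_metadata(...)
    s.cached.push(name.to_string()); // mut_roles() update, still under the guard
} // guard dropped here, after both writes
```

If the guard were scoped to the persisted write alone, a second writer could interleave between the two pushes and leave the cache ordered differently from storage.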
Fixes #XXXX.