
parallel quest smoke test #1631

Open
parmesant wants to merge 2 commits into parseablehq:main from parmesant:update-quest-docker

Conversation

@parmesant
Contributor

@parmesant parmesant commented Apr 28, 2026

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • Tests
    • Test environment updated to run in parallel with dynamically generated stream IDs and explicit service/storage flags; test runner uses a parallel loading mode for faster execution.
  • Refactor
    • RBAC and session handling reworked to serialize persistent metadata updates and to avoid holding user locks while modifying session state, reducing lock contention.
  • Bug Fixes
    • Removed a resource-utilization middleware from several HTTP routes to restore expected request flow.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Compose configs switch quest test run modes and add runtime-generated stream IDs; HTTP/RBAC code refactors serialize metadata persistence under a shared update lock and reorder USERS vs SESSIONS lock usage so session mutations occur after USERS locks are dropped; one resource-utilization middleware was removed from several routes.
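
As a rough sketch of the serialization pattern described above: an async Mutex named UPDATE_LOCK guarding a get/modify/put round trip on tenant metadata, with in-memory caches updated only after the guard is dropped. The Metadata type and the two storage functions here are simplified stand-ins for illustration, not the repository's actual signatures.

    use std::sync::LazyLock;
    use tokio::sync::Mutex;

    // Simplified stand-ins for the persistent metadata and its storage calls.
    #[derive(Clone, Default)]
    struct Metadata {
        roles: Vec<String>,
    }
    async fn get_metadata(_tenant: &str) -> Result<Metadata, String> {
        Ok(Metadata::default())
    }
    async fn put_metadata(_m: &Metadata, _tenant: &str) -> Result<(), String> {
        Ok(())
    }

    // One process-wide lock so concurrent mutating handlers cannot interleave
    // their get_metadata / put_metadata round trips.
    static UPDATE_LOCK: LazyLock<Mutex<()>> = LazyLock::new(|| Mutex::new(()));

    async fn put_role(tenant_id: &str, name: &str) -> Result<(), String> {
        {
            let _guard = UPDATE_LOCK.lock().await;
            let mut metadata = get_metadata(tenant_id).await?;
            metadata.roles.push(name.to_owned());
            put_metadata(&metadata, tenant_id).await?;
        } // guard dropped: persistence is done
        // In-memory caches and session refresh happen after the lock is released.
        Ok(())
    }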

Changes

  • Docker Compose — test configs (docker-compose-test-with-kafka.yaml, docker-compose-test.yaml)
    docker-compose-test-with-kafka.yaml: replaces the static command array with a shell entrypoint that generates a randomized -stream ID at runtime and runs ./quest.test -test.v -test.parallel=10 with explicit flags for query/minio credentials. docker-compose-test.yaml: changes the quest command mode from "load" to "load-parallel".
  • Session/Users lock ordering & session tracking (src/handlers/http/middleware.rs, src/rbac/..., src/rbac/mod.rs)
    Reorders lock acquisition so SESSIONS mutations (track_new/remove_user/refresh_user_permissions) happen only after USERS locks are released. Needed user/role data is cloned while the USERS lock is held, the lock is then dropped, and SESSIONS is updated afterwards; temporary bool/Option values carry results across the boundary (see the sketch after this list).
  • RBAC metadata persistence under shared update lock (src/handlers/http/modal/query/querier_rbac.rs, src/handlers/http/rbac.rs, src/handlers/http/modal/query/querier_role.rs, src/handlers/http/role.rs)
    Introduces/uses a shared UPDATE_LOCK to serialize metadata get/modify/put operations for role and user mutations. Metadata loads, mutations, and put_metadata run under the lock; in-memory caches (Users/Roles/DEFAULT_ROLE) and session refresh/remove operations occur after the lock is released. Some handlers now return computed values from inside the lock for later in-memory updates.
  • Route middleware removal (src/handlers/http/modal/server.rs)
    Removes resource_check::check_resource_utilization_middleware (and the from_fn adapter) from several HTTP route registrations (query, ingest, counts, ingest OTEL, logstream ingest), leaving the authorization guards intact.
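
A minimal, self-contained sketch of that clone-then-drop ordering, assuming std RwLock-guarded maps named USERS and SESSIONS purely for illustration (the repository's actual map types, field contents, and the refresh_user_permissions signature differ):

    use std::collections::{HashMap, HashSet};
    use std::sync::{LazyLock, RwLock};

    // Illustrative stand-ins for the in-memory user and session maps.
    static USERS: LazyLock<RwLock<HashMap<String, HashSet<String>>>> =
        LazyLock::new(|| RwLock::new(HashMap::new()));
    static SESSIONS: LazyLock<RwLock<HashMap<String, HashSet<String>>>> =
        LazyLock::new(|| RwLock::new(HashMap::new()));

    fn refresh_user_permissions(username: &str) {
        // Clone what is needed while the USERS lock is held ...
        let roles: Option<HashSet<String>> = {
            let users = USERS.read().unwrap();
            users.get(username).cloned()
        }; // ... and drop it here, before SESSIONS is touched.

        // Only now mutate SESSIONS, so the two locks are never held together.
        if let Some(roles) = roles {
            SESSIONS.write().unwrap().insert(username.to_owned(), roles);
        }
    }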

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Handler
    participant USERS
    participant UPDATE_LOCK
    participant METADATA_STORE
    participant SESSIONS

    Client->>HTTP_Handler: mutating RBAC request (e.g., add role / create user)
    HTTP_Handler->>USERS: in-memory validations (read or prepare)
    HTTP_Handler->>UPDATE_LOCK: acquire
    UPDATE_LOCK->>METADATA_STORE: get_metadata
    METADATA_STORE-->>UPDATE_LOCK: metadata
    UPDATE_LOCK->>METADATA_STORE: put_metadata (persist changes)
    UPDATE_LOCK-->>HTTP_Handler: release
    HTTP_Handler->>USERS: update in-memory caches (put_user/mut_roles)
    HTTP_Handler->>SESSIONS: refresh/remove sessions as needed
    SESSIONS-->>Client: session updates reflected

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I nibble locks, then hop away,
Persist by day and cache by play.
Streams spin up with random cheer,
Ten testers racing, loud and clear.
Hooray — the quest runs, fresh and near!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Description check (⚠️ Warning): The description is entirely the repository template with unfilled placeholders; it contains no actual information about the goals, solution, rationale, or key changes in the PR. Resolution: replace the template placeholders with a detailed description explaining the RBAC deadlock issue being fixed, the lock-ordering refactoring across multiple files, the rationale for these changes, and details of the quest test parallelization.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 4.35%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title "parallel quest smoke test" relates to the changeset (it references the quest test changes in the docker-compose files), but it is vague and does not capture the substantial RBAC deadlock fixes and lock-ordering changes that appear to be the primary work. Resolution: consider a title such as "Fix RBAC deadlock and parallelize quest smoke test" to reflect both the core RBAC lock refactoring and the test parallelization.

✅ Passed checks (2 passed)

  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.



Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 28, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/handlers/http/rbac.rs (1)

289-317: ⚠️ Potential issue | 🟠 Major

Recheck group membership after acquiring UPDATE_LOCK.

Users.get_user_groups(&userid, &tenant_id) now runs before the serialized metadata delete. A concurrent group-update request can add this user to a group in that gap, after which this handler removes the user from metadata.users without removing the new membership. Please validate "user is not in any groups" against the locked metadata state immediately before retain(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/rbac.rs` around lines 289 - 317, The current check using
Users.get_user_groups(&userid, &tenant_id) happens before acquiring UPDATE_LOCK,
leaving a race where a concurrent group update can add memberships; after
acquiring UPDATE_LOCK (UPDATE_LOCK.lock().await) re-validate group membership
against the locked metadata by loading metadata via
get_metadata(&tenant_id).await? and checking metadata.users/groups to ensure the
user has no group memberships before calling metadata.users.retain(|user|
user.userid() != userid); if any membership exists return
Err(RBACError::InvalidDeletionRequest(...)).
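
A hedged sketch of what that re-check could look like inside the handler, reusing the handler's existing userid, tenant_id, UPDATE_LOCK, get_metadata, put_metadata, and RBACError; the ug.users field and the error payload are assumed here for illustration and may not match the repository's actual types:

    // Sketch only: validate against the metadata loaded under the lock, not the
    // pre-lock Users.get_user_groups() snapshot.
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;

    // Hypothetical shape: groups in the locked metadata that still contain the user.
    let still_member: Vec<String> = metadata
        .user_groups
        .iter()
        .filter(|ug| ug.users.contains(&userid))
        .map(|ug| ug.name.clone())
        .collect();
    if !still_member.is_empty() {
        // Error payload assumed to be a message string; the prompt elides it.
        return Err(RBACError::InvalidDeletionRequest(format!(
            "user still belongs to groups: {still_member:?}"
        )));
    }

    metadata.users.retain(|user| user.userid() != userid);
    put_metadata(&metadata, &tenant_id).await?;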

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 18054772-fd6e-4c54-a017-5689e7687646

📥 Commits

Reviewing files that changed from the base of the PR and between 1da9927 and a65fb0d.

📒 Files selected for processing (7)
  • src/handlers/http/middleware.rs
  • src/handlers/http/modal/query/querier_rbac.rs
  • src/handlers/http/modal/query/querier_role.rs
  • src/handlers/http/modal/server.rs
  • src/handlers/http/rbac.rs
  • src/handlers/http/role.rs
  • src/rbac/mod.rs

Comment on lines +76 to 98
let (user, password) = {
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;

    if Users.contains(&username, &tenant_id)
        || metadata.users.iter().any(
            |user| matches!(&user.ty, UserType::Native(basic) if basic.username == username),
        )
    {
        return Err(RBACError::UserExists(username));
    }

    let (mut user, password) =
        user::User::new_basic(username.clone(), tenant_id.clone(), false);
    user.roles.clone_from(&user_roles);
    metadata.users.push(user.clone());
    put_metadata(&metadata, &tenant_id).await?;
    (user, password)
};

put_metadata(&metadata, &tenant_id).await?;
// let created_role = user_roles.clone();
// Update in-memory state after releasing the lock
Users.put_user(user.clone());


⚠️ Potential issue | 🔴 Critical

Serialize the Users.* mutation with the metadata mutation.

These handlers now persist metadata under UPDATE_LOCK and then update the in-memory RBAC state afterward. That is not ordered across concurrent writers: request A can win the storage write first, request B can win it second, and then the post-lock Users.* calls can run in the reverse order. Password reset is the clearest failure mode here, but add/remove role has the same problem. The query node can end up serving stale passwords or stale role memberships until reload.

Also applies to: 223-253, 320-352, 367-389

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/query/querier_rbac.rs` around lines 76 - 98, The
in-memory RBAC update (Users.put_user and similar Users.* calls) must be
performed while holding the same UPDATE_LOCK that protects metadata persistence
so the in-memory state is ordered with the storage write; move the
Users.put_user call into the critical section where get_metadata,
metadata.users.push(...), and put_metadata(...) occur (i.e. keep the Users
update before releasing UPDATE_LOCK after user::User::new_basic and
put_metadata). Apply the same change to the other affected blocks (around the
ranges that modify metadata and then call Users.*: lines ~223-253, ~320-352,
~367-389) so add/remove role and password reset mutations are done under
UPDATE_LOCK to avoid write reordering across concurrent writers.
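
For concreteness, a hedged sketch of the ordering this comment asks for in the create-user path (a fragment only: it reuses the handler's existing username, user_roles, tenant_id, Users, and UPDATE_LOCK, and is not the repository's exact code):

    let (user, password) = {
        let _guard = UPDATE_LOCK.lock().await;
        let mut metadata = get_metadata(&tenant_id).await?;

        let (mut user, password) =
            user::User::new_basic(username.clone(), tenant_id.clone(), false);
        user.roles.clone_from(&user_roles);
        metadata.users.push(user.clone());
        put_metadata(&metadata, &tenant_id).await?;

        // Update the in-memory cache before the guard drops, so storage and
        // cache writes cannot be reordered across concurrent requests.
        Users.put_user(user.clone());
        (user, password)
    };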

Comment on lines +129 to +168
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.users.retain(|user| user.userid() != userid);

    // also delete from user groups
    let user_groups = Users.get_user_groups(&userid, &tenant_id);
    let mut groups_to_update = Vec::new();

    for user_group in user_groups {
        if let Some(groups) = write_user_groups().get_mut(tenant)
            && let Some(ug) = groups.get_mut(&user_group)
            && let Some(users) = users().get(tenant)
            && let Some(user) = users.get(&userid)
        {
            let userid = match &user.ty {
                UserType::Native(basic) => basic.username.clone(),
                UserType::OAuth(oauth) => oauth.userid.clone(),
                UserType::ApiKey(api_key) => api_key.userid.clone(),
            };
            ug.remove_users_by_user_ids(HashSet::from_iter([userid]))?;
            groups_to_update.push(ug.clone());
        } else {
            // User not found, skip or log as needed
            continue;
        }

        for updated_group in &groups_to_update {
            if let Some(existing) = metadata
                .user_groups
                .iter_mut()
                .find(|ug| ug.name == updated_group.name)
            {
                existing.clone_from(updated_group);
            } else {
                metadata.user_groups.push(updated_group.clone());
            }
        }

        put_metadata(&metadata, &tenant_id).await?;

⚠️ Potential issue | 🔴 Critical

Persist the delete even when the user has no groups.

put_metadata(&metadata, &tenant_id) is inside the for user_group in user_groups loop. If user_groups is empty, metadata.users.retain(...) is never written, so the delete only happens in memory and the user comes back after reload. Even when groups exist, this does one storage write per group.

Suggested fix
     {
         let _guard = UPDATE_LOCK.lock().await;
         let mut metadata = get_metadata(&tenant_id).await?;
         metadata.users.retain(|user| user.userid() != userid);

         // also delete from user groups
         let user_groups = Users.get_user_groups(&userid, &tenant_id);
-        let mut groups_to_update = Vec::new();

         for user_group in user_groups {
             if let Some(groups) = write_user_groups().get_mut(tenant)
                 && let Some(ug) = groups.get_mut(&user_group)
                 && let Some(users) = users().get(tenant)
                 && let Some(user) = users.get(&userid)
             {
                 let userid = match &user.ty {
                     UserType::Native(basic) => basic.username.clone(),
                     UserType::OAuth(oauth) => oauth.userid.clone(),
                     UserType::ApiKey(api_key) => api_key.userid.clone(),
                 };
                 ug.remove_users_by_user_ids(HashSet::from_iter([userid]))?;
-                groups_to_update.push(ug.clone());
-            } else {
-                // User not found, skip or log as needed
-                continue;
-            }
-
-            for updated_group in &groups_to_update {
                 if let Some(existing) = metadata
                     .user_groups
                     .iter_mut()
-                    .find(|ug| ug.name == updated_group.name)
+                    .find(|existing| existing.name == ug.name)
                 {
-                    existing.clone_from(updated_group);
+                    existing.clone_from(ug);
                 } else {
-                    metadata.user_groups.push(updated_group.clone());
+                    metadata.user_groups.push(ug.clone());
                 }
+            } else {
+                continue;
             }
-
-            put_metadata(&metadata, &tenant_id).await?;
         }
+        put_metadata(&metadata, &tenant_id).await?;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.users.retain(|user| user.userid() != userid);
    // also delete from user groups
    let user_groups = Users.get_user_groups(&userid, &tenant_id);
    for user_group in user_groups {
        if let Some(groups) = write_user_groups().get_mut(tenant)
            && let Some(ug) = groups.get_mut(&user_group)
            && let Some(users) = users().get(tenant)
            && let Some(user) = users.get(&userid)
        {
            let userid = match &user.ty {
                UserType::Native(basic) => basic.username.clone(),
                UserType::OAuth(oauth) => oauth.userid.clone(),
                UserType::ApiKey(api_key) => api_key.userid.clone(),
            };
            ug.remove_users_by_user_ids(HashSet::from_iter([userid]))?;
            if let Some(existing) = metadata
                .user_groups
                .iter_mut()
                .find(|existing| existing.name == ug.name)
            {
                existing.clone_from(ug);
            } else {
                metadata.user_groups.push(ug.clone());
            }
        } else {
            continue;
        }
    }
    put_metadata(&metadata, &tenant_id).await?;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/query/querier_rbac.rs` around lines 129 - 168, The
code currently calls put_metadata(&metadata, &tenant_id).await? inside the for
user_group in user_groups loop which prevents persisting the metadata change
when user_groups is empty and causes one storage write per group; move the
persistence call so it occurs once after processing all groups (after the loop
that builds groups_to_update and updates metadata.user_groups) so that
metadata.users.retain(...) is always persisted even if
Users.get_user_groups(&userid, &tenant_id) returns empty; keep the UPDATE_LOCK
scope and ensure you still update metadata.user_groups from groups_to_update
(using the existing find/clone_from logic) before calling
put_metadata(&metadata, &tenant_id).await?.

Comment on lines +78 to +83
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.roles.insert(name.clone(), role.clone());
    put_metadata(&metadata, &tenant_id).await?;
}

⚠️ Potential issue | 🔴 Critical

Don't release UPDATE_LOCK before the mut_roles() update below.

The metadata write is serialized, but the in-memory role-cache write is no longer part of that ordering. Concurrent put/delete requests can therefore leave mut_roles() with an older value than the one that was last persisted, which will skew later role validation and authorization until the cache is rebuilt.

Also applies to: 148-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/modal/query/querier_role.rs` around lines 78 - 83, The
in-memory role-cache update (mut_roles()) must remain serialized with the
metadata write guarded by UPDATE_LOCK to prevent stale cache state; move the
mut_roles() call (the code that updates the in-memory roles cache) inside the
same UPDATE_LOCK critical section that calls get_metadata(...) and
put_metadata(...), or extend the scope of _guard so the cache mutation occurs
before _guard is dropped; apply the same change to the other similar block
referenced (the one around mut_roles usage at lines 148-163) so both persistence
and cache updates are atomic relative to UPDATE_LOCK.

Comment thread src/handlers/http/rbac.rs
Comment on lines +151 to 176
let (user, password) = {
let _guard = UPDATE_LOCK.lock().await;
let mut metadata = get_metadata(&tenant_id).await?;

if Users.contains(&userid, &tenant_id)
|| metadata.users.iter().any(|user| match &user.ty {
UserType::Native(basic) => basic.username == userid,
UserType::OAuth(_) => false,
UserType::ApiKey(_) => false,
})
{
return Err(RBACError::UserExists(userid));
}

metadata.users.push(user.clone());
let (mut user, password) = user::User::new_basic(userid.clone(), tenant_id.clone(), false);
user.roles.clone_from(&user_roles);
metadata.users.push(user.clone());
put_metadata(&metadata, &tenant_id).await?;
(user, password)
};

put_metadata(&metadata, &tenant_id).await?;
let created_role = user_roles.clone();
Users.put_user(user.clone());
if !created_role.is_empty() {
add_roles_to_user(
req,
web::Path::<String>::from(userid.clone()),
web::Json(created_role),
)
.await?;
// Update in-memory state after releasing the lock
Users.put_user(user);
if !user_roles.is_empty() {
Users.add_roles(&userid, user_roles, &tenant_id);
}

⚠️ Potential issue | 🔴 Critical

The post-lock Users.* update can now apply out of order.

These handlers persist metadata under UPDATE_LOCK and then mutate the in-memory RBAC state afterward. That breaks write ordering across concurrent requests: two password resets can leave the cache on the older hash, and add/remove role can similarly leave cached memberships behind the persisted state. The in-memory auth path will then disagree with storage until the cache is rebuilt.

Also applies to: 195-217, 383-399, 459-476

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/rbac.rs` around lines 151 - 176, The in-memory Users
updates are applied after releasing UPDATE_LOCK which breaks write ordering; fix
by performing the in-memory mutations while still holding the lock so metadata
and cache are updated atomically: inside the UPDATE_LOCK critical section (the
block around get_metadata/put_metadata and where you create user via
user::User::new_basic) call Users.put_user(...) and Users.add_roles(...) (or
Users.remove_roles where appropriate) before returning the (user, password)
tuple or leaving the block; apply the same change to the other affected handlers
(the blocks around lines noted) so every metadata write via put_metadata(...) is
immediately followed by the corresponding Users.* updates while holding
UPDATE_LOCK.

Comment thread src/handlers/http/role.rs
Comment on lines +59 to +67
{
    let _guard = UPDATE_LOCK.lock().await;
    let mut metadata = get_metadata(&tenant_id).await?;
    metadata.roles.insert(name.clone(), role.clone());
    put_metadata(&metadata, &tenant_id).await?;
}

let tenant_id = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
// Update in-memory state after releasing the lock
let tenant_str = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT).to_owned();

⚠️ Potential issue | 🔴 Critical

Keep the cache write serialized with the metadata write.

These changes now drop UPDATE_LOCK before the matching mut_roles() / DEFAULT_ROLE update that follows. Two concurrent requests can therefore persist storage in order A→B and then apply the cache writes in order B→A, leaving memory behind the source of truth. A put racing with delete, or two put_default calls, can reproduce this.

Also applies to: 141-157, 175-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/role.rs` around lines 59 - 67, The cache write must remain
serialized with the persistent metadata write: keep the UPDATE_LOCK guard across
the in-memory updates so the sequence get_metadata(..) ->
metadata.roles.insert(...) -> put_metadata(..) and the subsequent
mut_roles()/DEFAULT_ROLE updates occur while _guard is held; move the
tenant_str/ mut_roles()/DEFAULT_ROLE modifications inside the same block that
holds UPDATE_LOCK (and apply the same change to the other occurrences around
mut_roles()/DEFAULT_ROLE) so concurrent requests cannot reorder storage and
cache writes.

