Skip to content

fix(db): unify state history tables#2957

Draft
osu wants to merge 5 commits into
NVIDIA:mainfrom
osu:fix/1136-unify-state-history-tables
Draft

fix(db): unify state history tables#2957
osu wants to merge 5 commits into
NVIDIA:mainfrom
osu:fix/1136-unify-state-history-tables

Conversation

@osu

@osu osu commented Jun 28, 2026

Copy link
Copy Markdown
Member

Description

Normalize all eight state-history tables around a common five-column layout with an object_id TEXT resource key.

This change:

  • makes timestamps mandatory and backfills existing NULL values;
  • removes cascading resource foreign keys so history survives deletion;
  • updates shared accessors, direct joins, retention triggers, indexes, and machine cleanup;
  • retains power-shelf, rack, switch, DPA-interface, and machine-cleanup history;
  • adds schema, 250-record retention, deletion-retention, and DPA history-query coverage.

Related issues

Closes #1136

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Completed validation:

  • cargo test --locked -p carbide-api-db -- --test-threads=1 — 217 passed; doc tests 3 passed, 1 ignored.
  • cargo test --locked -p carbide-api-db dpa_interface::test::deleting_ -- --nocapture --test-threads=1 — 2 passed.
  • Linux: cargo test --locked -p carbide-api-core test_force_delete_ -- --nocapture --test-threads=1 — 9 passed.
  • Linux: machine_history filter — 2 passed.
  • Linux: exact VPC-prefix renamed-column test — 1 passed.
  • Linux: exact network-segment retention test — 1 passed.
  • Linux: cargo clippy --locked -p carbide-api-db -p carbide-api-core --all-targets --all-features — passed.
  • Pinned nightly cargo fmt --all -- --check — passed.
  • Populated-upgrade rehearsal from origin/main: seeded all eight legacy tables, including NULL power-shelf/switch timestamps, then applied this migration. All eight rows were preserved; resource IDs became TEXT; timestamps were non-null; every table had the common five-column layout and zero foreign keys.

Additional Notes

The migration renames internal history columns in place. As with existing direct-rename migrations in this repository, old application binaries are not compatible with the post-migration column names during the rollout window.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
@osu osu requested a review from a team as a code owner June 28, 2026 18:40
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5466c580-c592-4b4a-9c5e-86f89845948e

📥 Commits

Reviewing files that changed from the base of the PR and between 4717b8c and 8dfbe81.

📒 Files selected for processing (1)
  • crates/api-core/tests/integration/rack_find.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/api-core/tests/integration/rack_find.rs

Summary by CodeRabbit

  • Bug Fixes
    • Hard deletes for racks, switches, and power shelves now keep their state history instead of removing it.
    • History lookups and snapshots now consistently return the expected records after deletions.
    • Cleanup flows now preserve relevant state-history entries while removing the associated resource.

Walkthrough

The PR unifies state-history tables around object_id, updates Rust query/build helpers and SQL joins to use that column, preserves history during force-delete flows, and expands tests to verify retention and the revised SQL shapes.

Changes

State History Unification

Layer / File(s) Summary
DB migration: schema unification and retention triggers
crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
Drops cascading foreign keys, normalizes timestamp to NOT NULL, renames entity-specific IDs to object_id, standardizes indexes, recreates per-object retention triggers, and adds cleanup_machine_by_id without deleting state history.
state_history Rust API: unified object_id queries
crates/api-db/src/state_history.rs
Removes per-table object-id helpers and delete_by_object_id, rewrites lookup/persist/update paths to use fixed object_id bindings, and adds integration tests for schema, persistence, retention, and concurrency.
History joins updated to object_id across entities
crates/api-db/src/dpa_interface.rs, crates/api-db/src/network_segment.rs, crates/api-db/src/sql/machine_snapshot_history_join.snippet, crates/api-db/src/sql/managed_host_history_join.snippet
History aggregation queries switch to object_id; DPA interface deletion no longer pre-removes state-history rows, and DPA interface retention tests are added.
Force-delete handlers: history retention
crates/api-core/src/handlers/power_shelf.rs, crates/api-core/src/handlers/rack.rs, crates/api-core/src/handlers/switch.rs
Force-delete handlers stop deleting state-history records and update doc comments to describe retained history.
Test updates: retention assertions and object_id SQL
crates/api-core/src/tests/power_shelf.rs, crates/api-core/src/tests/switch.rs, crates/api-core/tests/integration/rack_find.rs, crates/api-core/src/tests/machine_history.rs, crates/api-core/src/tests/vpc_prefix.rs
Force-delete tests now seed and verify retained history, while machine history and VPC prefix tests use object_id-based SQL.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main database normalization change.
Description check ✅ Passed The description directly matches the changeset and the stated schema and history-retention goals.
Linked Issues check ✅ Passed The PR satisfies #1136 by standardizing object_id, enforcing timestamps, removing cascading FKs, and preserving 250-row retention.
Out of Scope Changes check ✅ Passed No clear unrelated code changes are present; the cleanup and query updates support the state-history unification work.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-db/migrations/20260628120000_unify_state_history_tables.sql`:
- Around line 80-158: The retention triggers in
machine_state_history_keep_limit, network_segment_state_history_keep_limit,
vpc_prefix_state_history_keep_limit, dpa_interface_state_history_keep_limit,
ib_partition_state_history_keep_limit, power_shelf_state_history_keep_limit,
rack_state_history_keep_limit, and switch_state_history_keep_limit can race
under concurrent inserts for the same object_id. Add per-object serialization
inside each trigger body before the DELETE/overflow calculation, using a lock or
equivalent mechanism keyed by NEW.object_id, so only one transaction performs
retention cleanup for a given object at a time. Keep the existing 250-row cap
logic, but ensure it runs after the lock is acquired.

In `@crates/api-db/src/dpa_interface.rs`:
- Around line 309-310: The history JSON built in the DPA interface query is not
ordered deterministically, so align the aggregate in the history_agg subquery
with state_history::for_object by preserving a stable sort before json_agg.
Update the query around dpa_interface_state_history and the
json_agg/json_build_object expression so include_history=true returns history
entries in the same id ASC order every time.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d93c359a-59de-4c3f-a8d7-095f96b2f97d

📥 Commits

Reviewing files that changed from the base of the PR and between d3395d8 and 9fd867f.

📒 Files selected for processing (14)
  • crates/api-core/src/handlers/power_shelf.rs
  • crates/api-core/src/handlers/rack.rs
  • crates/api-core/src/handlers/switch.rs
  • crates/api-core/src/tests/machine_history.rs
  • crates/api-core/src/tests/power_shelf.rs
  • crates/api-core/src/tests/rack_find.rs
  • crates/api-core/src/tests/switch.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/network_segment.rs
  • crates/api-db/src/sql/machine_snapshot_history_join.snippet
  • crates/api-db/src/sql/managed_host_history_join.snippet
  • crates/api-db/src/state_history.rs

Comment thread crates/api-db/src/dpa_interface.rs Outdated
Signed-off-by: Hasan Khan <hasank@nvidia.com>
@osu

osu commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/api-db/src/state_history.rs (1)

236-288: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Join the spawned writer even when the wait assertion fails.

timeout(...)? can return before second_insert.await, which drops the JoinHandle and detaches the DB task on failure. Capture the wait result, release the first transaction, await the spawned writer, then propagate the original error.

Proposed cleanup ordering
-        tokio::time::timeout(std::time::Duration::from_secs(5), async {
+        let wait_result = tokio::time::timeout(std::time::Duration::from_secs(5), async {
             loop {
                 let waiting: bool = sqlx::query_scalar(
@@
-        })
-        .await
-        .map_err(|_| {
+        })
+        .await
+        .map_err(|_| {
             std::io::Error::other(format!(
                 "second writer did not wait for {table_name} retention lock",
             ))
-        })??;
+        });
         drop(observer);
 
         first_txn.commit().await?;
-        second_insert.await?.map_err(std::io::Error::other)?;
+        let second_result = second_insert.await?.map_err(std::io::Error::other);
+        wait_result??;
+        second_result?;

As per coding guidelines, avoid spawning background tasks without joining them; as per path instructions, prioritize concurrency and resource-lifetime findings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/state_history.rs` around lines 236 - 288, The spawned
writer task in the state history test is being detached when the wait assertion
times out, because `timeout(...)?` returns before `second_insert.await` and the
`JoinHandle` is dropped. Update the flow around `second_insert`, `pid_receiver`,
and `first_txn.commit` so you capture the wait result first, always
release/commit the first transaction, then await `second_insert` before
returning or propagating the original error. Keep the original timeout/assertion
error, but ensure the spawned DB task is joined on both success and failure
paths.

Sources: Coding guidelines, Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/api-db/src/state_history.rs`:
- Around line 236-288: The spawned writer task in the state history test is
being detached when the wait assertion times out, because `timeout(...)?`
returns before `second_insert.await` and the `JoinHandle` is dropped. Update the
flow around `second_insert`, `pid_receiver`, and `first_txn.commit` so you
capture the wait result first, always release/commit the first transaction, then
await `second_insert` before returning or propagating the original error. Keep
the original timeout/assertion error, but ensure the spawned DB task is joined
on both success and failure paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14ac5f81-b88e-419a-8f42-4d898de2967e

📥 Commits

Reviewing files that changed from the base of the PR and between 9fd867f and c46d99e.

📒 Files selected for processing (3)
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/state_history.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql

@osu

osu commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 302 6 30 114 8 144
machine-validation-runner 765 30 194 283 37 221
machine_validation 765 30 194 283 37 221
machine_validation-aarch64 765 30 194 283 37 221
nvmetal-carbide 765 30 194 283 37 221
TOTAL 3368 126 806 1252 156 1028

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@osu osu self-assigned this Jun 28, 2026
@ajf

ajf commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

@kensimon can you review this please.

@ajf ajf requested a review from kensimon June 29, 2026 20:09
Signed-off-by: Hasan Khan <hasank@nvidia.com>
@github-actions

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/api-core/tests/integration/rack_find.rs (1)

172-205: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Cover history retention in the already-soft-deleted branch.

This test exercises a distinct admin_force_delete_rack path, but it never verifies the PR’s retention contract there. Seed a rack state-history row before delete_rack() and assert it is still returned after the hard delete, otherwise a branch-specific regression can slip through unnoticed.

💡 Suggested test extension
 async fn test_force_delete_rack_already_soft_deleted(
     pool: PgPool,
 ) -> Result<(), Box<dyn std::error::Error>> {
     let env = TestHarness::builder(pool).build().await;
     let TestRack { id: rack_id } = env.create_rack().await;
 
+    let mut txn = env.pool.begin().await?;
+    db::state_history::persist(
+        &mut txn,
+        db::state_history::StateHistoryTableId::Rack,
+        &rack_id,
+        &"retained-through-soft-delete",
+        config_version::ConfigVersion::initial(),
+    )
+    .await?;
+    txn.commit().await?;
+
     env.api()
         .delete_rack(tonic::Request::new(DeleteRackRequest {
             id: rack_id.to_string(),
         }))
         .await?;
@@
     assert!(racks.is_empty(), "Rack should be hard-deleted");
+
+    let mut conn = env.pool.acquire().await?;
+    let history = db::state_history::for_object(
+        &mut conn,
+        db::state_history::StateHistoryTableId::Rack,
+        &rack_id,
+    )
+    .await?;
+    assert!(
+        history
+            .iter()
+            .any(|record| record.state == r#""retained-through-soft-delete""#),
+        "Rack state history should be retained",
+    );
 
     Ok(())
 }

As per coding guidelines, "Add or update focused tests for bug fixes, shared behavior, API contracts, migrations, and cross-module changes."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/tests/integration/rack_find.rs` around lines 172 - 205, The
`test_force_delete_rack_already_soft_deleted` integration test covers
`admin_force_delete_rack` but does not verify retention of rack history in this
branch. Before calling `delete_rack()`, seed a rack state-history record for the
created rack, then after the force delete assert that the history is still
retrievable while the rack itself is gone. Use the existing `TestHarness`,
`delete_rack`, and `admin_force_delete_rack` flow in this test to add the
branch-specific retention check.

Sources: Coding guidelines, Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/tests/integration/rack_find.rs`:
- Around line 172-205: The `test_force_delete_rack_already_soft_deleted`
integration test covers `admin_force_delete_rack` but does not verify retention
of rack history in this branch. Before calling `delete_rack()`, seed a rack
state-history record for the created rack, then after the force delete assert
that the history is still retrievable while the rack itself is gone. Use the
existing `TestHarness`, `delete_rack`, and `admin_force_delete_rack` flow in
this test to add the branch-specific retention check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bc0639f2-9924-4714-9f1f-b772c3615eee

📥 Commits

Reviewing files that changed from the base of the PR and between 219e815 and 4717b8c.

📒 Files selected for processing (3)
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-core/tests/integration/rack_find.rs
  • crates/api-db/src/dpa_interface.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/src/dpa_interface.rs

@osu osu marked this pull request as draft June 30, 2026 20:17
@copy-pr-bot

copy-pr-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
@osu

osu commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@osu

osu commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/ok to test 8dfbe81

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify state history tables

2 participants