Skip to content

fix(node): prefer canonical repo row over mirror row in get_repo (#124)#141

Merged
kevincodex1 merged 7 commits into
Gitlawb:mainfrom
Gravirei:bur_fix_3
Jul 3, 2026
Merged

fix(node): prefer canonical repo row over mirror row in get_repo (#124)#141
kevincodex1 merged 7 commits into
Gitlawb:mainfrom
Gravirei:bur_fix_3

Conversation

@Gravirei

@Gravirei Gravirei commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Replicated mirror rows are stored with is_public=true and no visibility rules, and get_repo could resolve a request to the mirror row instead of the canonical one. An unauthenticated caller could read a private/withheld repo's refs and objects through the bare short-DID owner, bypassing the canonical row's visibility rules.

Fix: Added ORDER BY CASE WHEN position('/' in id) > 0 THEN 1 ELSE 0 END, created_at ASC, id ASC to get_repo so canonical rows (UUID id, no slash) are always preferred over mirror rows (slash-form id). This matches the ranking logic already used by DEDUP_CTE and dedupe_canonical_repos.

Closes #124

Summary by CodeRabbit

  • Bug Fixes
    • Improved repository lookup so did:key: owner inputs are normalized and matched only under the correct conditions, avoiding unintended DID cross-matches.
    • Updated repository selection to deterministically prefer the canonical repository row over mirror entries when both exist, with consistent tie-breaking.
    • Preserved the selected repository’s visibility (public/private) based on which row is returned.
    • Updated repository listing filtering and extended database integration tests to cover canonical-vs-mirror precedence and DID matching edge cases.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Updates repo owner normalization and get_repo matching to treat did:key: inputs consistently, prefer canonical rows over mirror rows deterministically, and reuse the normalization in deduped repo listing. Adds integration tests for canonical selection, mirror fallback, and non-key DID matching.

Changes

get_repo canonical vs mirror row selection

Layer / File(s) Summary
Owner matching and canonical row ordering
crates/gitlawb-node/src/db/mod.rs
Adds owner-key normalization, updates Db::get_repo matching and ordering, and reuses the normalization in list_all_repos_deduped_with_stars.
Repo lookup behavior tests
crates/gitlawb-node/src/db/mod.rs
Adds integration coverage for canonical-over-mirror selection, mirror fallback, and non-did:key DID matching.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Possibly related issues

Possibly related PRs

  • Gitlawb/node#73: Also updates crates/gitlawb-node/src/db/mod.rs around did:key: matching and canonical-versus-mirror repo row handling.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description explains the bug and fix, but it omits most required template sections like kind, what changed, and verification. Add the template sections: Summary, Kind of change, What changed, and How a reviewer can verify, plus any checklist items that apply.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the main change: preferring the canonical repo row over the mirror row in get_repo.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@beardthelion beardthelion added crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior subsystem:storage Blob/object store, Arweave, IPFS, archives subsystem:visibility Path-scoped visibility and content withholding labels Jul 1, 2026

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_repo fix is correct. position('/' in id) ranks the canonical UUID row ahead of the slash-form mirror row, it matches the existing DEDUP_CTE ranking, and it makes the fetch_optional result deterministic instead of insertion-order-dependent. I built the change in isolation on top of main and ran it: the full node suite is green (317), clippy and MSRV 1.91 are clean, and the fix is load-bearing at the read gate. Without it, an anonymous read of a private repo resolves through the stale-public mirror row. Two things to sort before merge, plus a formatting nit.

Findings

  • [P1] Land #124 on its own; drop the #126 commits from this branch. bur_fix_3 carries all six commits from #133 (the #126 allowed-set flip) on top of the one get_repo commit, so this PR's diff against main re-includes work that #133 is still reviewing, and that PR has an open changes-requested. Merging as-is would land #126 through the back door and skip its review. Rebase so this PR contains only fix(node): prefer canonical repo row over mirror row. It cherry-picks onto main cleanly and has no dependency on #126.

  • [P2] Make the regression test catch the bug it is guarding. crates/gitlawb-node/src/db/mod.rs:3709 The test seeds the canonical row first and the mirror second, so an unordered fetch_optional returns the canonical row by insertion order. It passes with the ORDER BY removed, so it does not actually lock in the fix. Seed the mirror row before the canonical, which fails without the fix and passes with it, or add a caller-level case that drives authorize_repo_read and asserts an anonymous read of the private repo is denied.

  • [P3] Fix the rustfmt violation. crates/gitlawb-node/src/db/mod.rs:3752 The trailing assert!(!got.is_public, ...) needs wrapping; cargo fmt --check fails on it. Fork PRs don't run the fmt job, so CI here won't flag it.

…lawb#124)

get_repo matched both mirror rows (bare short-DID, is_public=true, no
visibility rules) and canonical rows (full did:key: DID, correct rules),
returning one nondeterministically. When the mirror row won, the
visibility gate saw is_public=true + empty rules, allowing unauthenticated
read of private/withheld content.

Add ORDER BY to prefer canonical rows (UUID id, no slash) over mirror
rows (slash-form id), matching the ranking logic already used by
DEDUP_CTE and dedupe_canonical_repos.

Test seeds mirror FIRST so an unordered fetch_optional returns the
mirror row — proving the test locks in the fix.
@github-actions github-actions Bot added the needs-tests Source changed without accompanying tests (advisory) label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution. A couple of things will help us review this faster:

  • This changes Rust source but no tests changed. Tests are required for fixes and strongly encouraged for features.

See CONTRIBUTING.md. Update the PR and these notes will clear automatically.

@Gravirei Gravirei requested a review from beardthelion July 1, 2026 06:17

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_repo fix is correct and I verified it closes #124 through the read gate: the CASE WHEN position('/' in id) > 0 term dominates the sort, so the canonical row (hex UUID id) always outranks the mirror row ({short}/{name}, always slash-form), including the case where the mirror row was written before the canonical one and created_at ASC alone would pick the mirror. Rebased cleanly to one commit off main, the earlier fmt nit is resolved, and the db suite is green on the merged state. The listing path already ranks canonical-first with the identical CASE, so the two resolvers now agree.

Findings

  • [P2] Make the regression test guard the CASE term, not just the presence of an ORDER BY
    crates/gitlawb-node/src/db/mod.rs:3729
    The test locks in less than it looks. The canonical row is dated 2026-01-01 while upsert_mirror_repo stamps the mirror with Utc::now(), so the created_at ASC tiebreaker alone selects the canonical row: remove the CASE term (keeping created_at ASC, id ASC) and the test still passes, even though the CASE is what actually fixes #124. A later "created_at already orders these" simplification would drop it and silently reopen the bypass. Date the canonical row after the mirror instead (e.g. ts("2126-01-01T00:00:00Z") on its created_at/updated_at), reproducing the mirror-written-first ordering; with that change, dropping the CASE fails the test and the full fix passes.

Optional, non-blocking: a two-canonical-row case would pin the created_at ASC, id ASC tiebreak, which is deterministic but currently untested.

@kevincodex1 — one P2 on the test; the fix itself is correct and closes the bypass.

@Gravirei Gravirei requested a review from beardthelion July 1, 2026 18:02
@beardthelion beardthelion dismissed stale reviews from themself July 1, 2026 20:59

Stale: addressed on e7f1ee1; re-reviewing current head.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on e7f1ee1. The regression test now guards the CASE term as it should. I ran it both ways on the merged state: green as-is, and dropping the CASE WHEN position('/' in id) term (leaving created_at ASC, id ASC) fails it with the mirror row winning over the canonical row, which is the exact #124 bypass. Dating the canonical row to 2126 is what makes created_at ASC alone select the mirror, so the CASE is load-bearing now. The fix itself is unchanged and correct, CI is green.

@kevincodex1 LGTM.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an issue that needs to be addressed before this is ready.

Findings

  • [P2] Keep get_repo matching did:key aliases only
    crates/gitlawb-node/src/db/mod.rs:894
    The new ordering makes get_repo("z6...", name) deterministically prefer any matching non-slash row, but the predicate still matches every owner DID that contains :<bare> via owner_did LIKE '%:' || $1 || '%'. That is broader than the did:key-only normalization used by did_matches, DEDUP_CTE, and the existing distinct-DID-method tests: did:key:z6... and did:gitlawb:z6... / did:web:z6... must stay separate owners even when they share the same trailing id and repo name. With this patch, a bare mirror lookup can rank a non-key DID row ahead of the exact slash-form mirror row and return the wrong repo/visibility rules. Please tighten get_repo to the same did:key-aware owner key used by the dedup paths, or otherwise restrict the canonical preference so it only applies to the matching did:key:<bare> row.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/gitlawb-node/src/db/mod.rs (1)

895-898: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Optional: extract the did:key: normalization into a shared helper.

This exact match owner_did.strip_prefix("did:key:") { Some(rest) if !rest.contains(':') => rest, _ => owner_did } block is duplicated in list_all_repos_deduped_with_stars (Line 1034-1037). Since both must stay in lock-step with the SQL CASE for the security invariant to hold, a single fn normalize_owner_key(did: &str) -> &str reduces the drift risk.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/db/mod.rs` around lines 895 - 898, Refactor the
duplicated did:key normalization logic into a shared helper to keep both call
sites in sync with the SQL CASE invariant. Add a small fn
normalize_owner_key(did: &str) -> &str and use it from the owner_key handling in
both list_all_repos_deduped_with_stars and the other matching block so the
strip_prefix("did:key:") / contains(':') behavior lives in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 895-898: Refactor the duplicated did:key normalization logic into
a shared helper to keep both call sites in sync with the SQL CASE invariant. Add
a small fn normalize_owner_key(did: &str) -> &str and use it from the owner_key
handling in both list_all_repos_deduped_with_stars and the other matching block
so the strip_prefix("did:key:") / contains(':') behavior lives in one place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47bffae1-920f-4fda-b20e-76494f6faae5

📥 Commits

Reviewing files that changed from the base of the PR and between e7f1ee1 and ca003cb.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/db/mod.rs

@Gravirei Gravirei requested review from beardthelion and jatmn July 2, 2026 01:39

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I found one remaining issue that needs to be addressed before this is ready.

Findings

  • [P2] Keep emitted clone URLs aligned with the tightened owner matching
    crates/gitlawb-node/src/api/repos.rs:1635
    get_repo now intentionally only treats a bare owner as equivalent to did:key:<owner>; non-key DIDs such as did:gitlawb:z6... and did:web:... remain distinct. However to_response still builds clone_url by taking the last : segment for every owner DID, so a canonical non-key repo response advertises /<last-segment>/<repo>.git, and that URL now resolves through get_repo("<last-segment>", repo) to no canonical row (or to a mirror if one exists) instead of the repo that was just returned. Please update the URL/slug construction to use the same did:key-only shortening rule, or otherwise keep the advertised clone URL resolvable under the new matcher.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_repo tightening verifies correct on 7f92302a and your earlier did:key-only [P2] is resolved. Two to fold in before merge.

Findings

  • [P2] Add a read-gate test that proves the #124 leak is closed, not just row selection crates/gitlawb-node/src/db/mod.rs (new tests ~3720)
    The three new tests assert get_repo returns the canonical row, but nothing drives authorize_repo_read, where the leak lived. Seed a private canonical plus a public mirror twin for the same owner+name (mirror inserted first), call authorize_repo_read with caller=None, and assert Err(RepoNotFound). That locks the property at the gate, so a future regression is caught by the denial rather than by row order alone. While you're there, one get_repo call with the full did:key: form (not just the bare id) covers the normalize branch the current tests skip.

  • [P2] The clone_url slug from jatmn's note hits the fork check too, same fix crates/gitlawb-node/src/api/repos.rs:1635 and :1477
    Confirmed jatmn's point: to_response derives the slug from owner_did.split(':').next_back(), which stops round-tripping through the tightened get_repo for a non-key DID owner. The same last-segment slug feeds the fork-name-conflict check at repos.rs:1477 (forker_short then get_repo), so shorten both consumers with the same did:key-only rule (make normalize_owner_key pub(crate) and reuse it). For calibration: no reachable path plants a non-key DID on a canonical row today (auth is did:key-only via to_verifying_key, and the two insert paths are create/fork with auth.0 and the always-slash mirror upsert), so this is forward-consistency for when did:web/did:gitlawb ownership lands, not a live break. Cheap to align now.

@Gravirei once the gate test and the slug alignment are in, I'll re-review. Nice work tightening the matcher.

…ey matching

- Make normalize_owner_key pub(crate) and reuse it in api/repos, api/events, api/peers, and visibility modules.

- Add read-gate authorize_repo_read and to_response clone_url slug tests.
@Gravirei Gravirei requested review from beardthelion and jatmn July 2, 2026 05:32

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the previously discussed paths and do not see any remaining actionable issues from my side.

@kevincodex1 LGTM

@beardthelion beardthelion dismissed their stale review July 2, 2026 14:12

Superseded: re-reviewed 084ca04 — both prior P2s (gate test, clone-URL slug) are resolved. Posting a fresh review with the remaining items.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on 084ca046, and I verified the core by execution: the #124 gate test is load-bearing (on the head anon gets Err(RepoNotFound); reverting only the canonical-preference ORDER BY flips it to serving the is_public:true mirror for the private repo), normalize_owner_key matches the SQL CASE across the boundary set (did:key short/full, empty residual, did:key:z:extra, non-key, bare, empty, uppercase), and the is_owner change is fail-closed for reachable data. Both my earlier P2s and jatmn's clone-URL P2 are resolved. The fix is correct; a few things to fold in before merge, the first being the one that matters.

Findings

  • [P2] Guard the Rust/SQL owner-key normalization against drift
    crates/gitlawb-node/src/db/mod.rs:829
    normalize_owner_key and the roughly nine verbatim SQL CASE WHEN owner_did LIKE 'did:key:%' ... copies (the get_repo WHERE at :907, the DEDUP_CTE, the count) are byte-identity-coupled with only a comment enforcing it, and the helper has no unit tests. They agree today (I checked the full boundary set), but a future edit to one side diverges silently: the clone-URL round-trip breaks, or the matcher mismatches and reopens a leak. Add unit tests over that boundary set, ideally asserting the Rust output equals a SQL round-trip, and consider hoisting the CASE into a shared const so a mismatch becomes a compile-time impossibility.

  • [P3] Add an is_owner regression test for the non-key tightening
    crates/gitlawb-node/src/visibility.rs:29
    is_owner now returns false when a non-key owner (did:gitlawb:z6Mkfoo) is probed with a bare last segment (z6Mkfoo), where the old .split(':').next_back() returned true. That behavior change has no test; root_rule_allows_owner_short_form only covers the did:key path and passes against both old and new code. Add a non-key-owner case asserting the bare segment no longer matches.

  • [P3] Add LIMIT 1 to the get_repo query
    crates/gitlawb-node/src/db/mod.rs:913
    The query can now match both a canonical and a mirror row, so the ORDER BY builds a two-row set that fetch_optional then trims. LIMIT 1 makes the single-row intent explicit and skips the needless sort-and-discard.

  • [P3] Fix the is_owner doc comment; keep normalize_owner_key
    crates/gitlawb-node/src/visibility.rs:26
    The comment says is_owner mirrors the owner-match idiom in api/protect.rs, but protect.rs uses did_matches while is_owner now uses the stricter normalize_owner_key. Update the comment to reflect the intentional did:key-only rule. Keep normalize_owner_key here: switching to did_matches would reintroduce the cross-method looseness this PR removes.

  • [P3] Document the non-key clone_url disk-path constraint
    crates/gitlawb-node/src/api/repos.rs:1627
    For a non-key DID owner, normalize_owner_key returns the full DID, so clone_url becomes /did:web:host:alice/repo.git; that resolves through get_repo, but the colon-bearing path segment would break the sync.rs disk-path join. Not reachable today (auth is did:key-only), so this is a forward constraint to document/handle before non-key ownership lands, not a live break.

Out of scope for this PR, tracking separately (no action needed here): api/agents.rs caller_matches_did still uses the old last-segment matcher; it's the one owner-match site the centralization didn't reach, and it isn't exploitable (deregister only revokes the caller's own DID).

@Gravirei nice work, the #124 fix is correct and verified. Once the drift guard is in (and the P3s while you're there), I'll re-review.

… review feedback

- Hoist 7 verbatim SQL CASE copies into OWNER_KEY_CASE_SQL const
  (dedup_cte(), get_repo, count_repos_deduped), keeping one literal
  copy in the v7 migration with a cross-reference comment.
- Add 9 unit tests for normalize_owner_key over the full boundary set.
- Add LIMIT 1 to get_repo query for single-row intent.
- Add non_key_owner_bare_short_does_not_match is_owner regression test.
- Update is_owner doc to reflect did:key-only rule.
- Document non-key clone_url disk-path constraint in to_response.
@github-actions github-actions Bot removed the needs-tests Source changed without accompanying tests (advisory) label Jul 2, 2026
…KEY_CASE_SQL

Verifies every boundary value produces identical results from the
Rust function and the SQL CASE expression, preventing silent drift.
@Gravirei Gravirei requested review from beardthelion and jatmn July 2, 2026 16:23

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I do not see any actionable issues from my review.

@kevincodex1 LGTM

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on cc0722e9. Every item from my last pass is resolved, and I verified the changed branches load-bearing by execution, both ways:

  • #124 gate: reverting only the canonical-preference ORDER BY term flips authorize_repo_read to serve the public mirror row for the private canonical to an anonymous caller (Ok(is_public: true) instead of Err(RepoNotFound)); with the term in place it denies.
  • did:key-only is_owner: non_key_owner_bare_short_does_not_match passes as written and fails if I restore the old split(':').next_back() (bare short wrongly matches the non-key owner), so it's load-bearing.
  • Drift guard: normalize_owner_key_matches_sql_case passes and fails if the Rust helper or the SQL const diverge; the hoist renders byte-identical for the deduped and count queries.

The P2 drift guard and the four P3s are all in.

  • [P3] Cover the both-rows state end to end. The reachable way one node ends up with both a canonical and a mirror row for the same owner/name is trigger_sync re-mirroring its own repo, but the tests only seed that state directly. A trigger_sync -> upsert_mirror test that then asserts the gate still denies would lock the path the fix exists for.

Non-blocking latent (safe while auth is did:key-only): the frozen idx_repos_owner_key_name migration keeps a verbatim CASE copy (index-usage only), and the owner-side-only is_owner normalization plus the agents.rs/profile last-segment matchers are worth revisiting when non-key DIDs land.

@kevincodex1 this is ready. LGTM.

@kevincodex1 kevincodex1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevincodex1 kevincodex1 merged commit 6c95592 into Gitlawb:main Jul 3, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior subsystem:storage Blob/object store, Arweave, IPFS, archives subsystem:visibility Path-scoped visibility and content withholding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replicated mirror rows are public with no rules: unauthenticated read of withheld repo content via bare short-DID owner

4 participants