fix(node): prefer canonical repo row over mirror row in get_repo (#124)#141
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdates repo owner normalization and Changesget_repo canonical vs mirror row selection
Estimated code review effort: 3 (Moderate) | ~20 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
beardthelion
left a comment
There was a problem hiding this comment.
The get_repo fix is correct. position('/' in id) ranks the canonical UUID row ahead of the slash-form mirror row, it matches the existing DEDUP_CTE ranking, and it makes the fetch_optional result deterministic instead of insertion-order-dependent. I built the change in isolation on top of main and ran it: the full node suite is green (317), clippy and MSRV 1.91 are clean, and the fix is load-bearing at the read gate. Without it, an anonymous read of a private repo resolves through the stale-public mirror row. Two things to sort before merge, plus a formatting nit.
Findings
-
[P1] Land #124 on its own; drop the #126 commits from this branch.
bur_fix_3carries all six commits from #133 (the #126 allowed-set flip) on top of the oneget_repocommit, so this PR's diff againstmainre-includes work that #133 is still reviewing, and that PR has an open changes-requested. Merging as-is would land #126 through the back door and skip its review. Rebase so this PR contains onlyfix(node): prefer canonical repo row over mirror row. It cherry-picks ontomaincleanly and has no dependency on #126. -
[P2] Make the regression test catch the bug it is guarding.
crates/gitlawb-node/src/db/mod.rs:3709The test seeds the canonical row first and the mirror second, so an unorderedfetch_optionalreturns the canonical row by insertion order. It passes with theORDER BYremoved, so it does not actually lock in the fix. Seed the mirror row before the canonical, which fails without the fix and passes with it, or add a caller-level case that drivesauthorize_repo_readand asserts an anonymous read of the private repo is denied. -
[P3] Fix the rustfmt violation.
crates/gitlawb-node/src/db/mod.rs:3752The trailingassert!(!got.is_public, ...)needs wrapping;cargo fmt --checkfails on it. Fork PRs don't run the fmt job, so CI here won't flag it.
…lawb#124) get_repo matched both mirror rows (bare short-DID, is_public=true, no visibility rules) and canonical rows (full did:key: DID, correct rules), returning one nondeterministically. When the mirror row won, the visibility gate saw is_public=true + empty rules, allowing unauthenticated read of private/withheld content. Add ORDER BY to prefer canonical rows (UUID id, no slash) over mirror rows (slash-form id), matching the ranking logic already used by DEDUP_CTE and dedupe_canonical_repos. Test seeds mirror FIRST so an unordered fetch_optional returns the mirror row — proving the test locks in the fix.
|
Thanks for the contribution. A couple of things will help us review this faster:
See CONTRIBUTING.md. Update the PR and these notes will clear automatically. |
beardthelion
left a comment
There was a problem hiding this comment.
The get_repo fix is correct and I verified it closes #124 through the read gate: the CASE WHEN position('/' in id) > 0 term dominates the sort, so the canonical row (hex UUID id) always outranks the mirror row ({short}/{name}, always slash-form), including the case where the mirror row was written before the canonical one and created_at ASC alone would pick the mirror. Rebased cleanly to one commit off main, the earlier fmt nit is resolved, and the db suite is green on the merged state. The listing path already ranks canonical-first with the identical CASE, so the two resolvers now agree.
Findings
- [P2] Make the regression test guard the
CASEterm, not just the presence of anORDER BY
crates/gitlawb-node/src/db/mod.rs:3729
The test locks in less than it looks. The canonical row is dated2026-01-01whileupsert_mirror_repostamps the mirror withUtc::now(), so thecreated_at ASCtiebreaker alone selects the canonical row: remove theCASEterm (keepingcreated_at ASC, id ASC) and the test still passes, even though theCASEis what actually fixes #124. A later "created_at already orders these" simplification would drop it and silently reopen the bypass. Date the canonical row after the mirror instead (e.g.ts("2126-01-01T00:00:00Z")on itscreated_at/updated_at), reproducing the mirror-written-first ordering; with that change, dropping theCASEfails the test and the full fix passes.
Optional, non-blocking: a two-canonical-row case would pin the created_at ASC, id ASC tiebreak, which is deterministic but currently untested.
@kevincodex1 — one P2 on the test; the fix itself is correct and closes the bypass.
Stale: addressed on e7f1ee1; re-reviewing current head.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on e7f1ee1. The regression test now guards the CASE term as it should. I ran it both ways on the merged state: green as-is, and dropping the CASE WHEN position('/' in id) term (leaving created_at ASC, id ASC) fails it with the mirror row winning over the canonical row, which is the exact #124 bypass. Dating the canonical row to 2126 is what makes created_at ASC alone select the mirror, so the CASE is load-bearing now. The fix itself is unchanged and correct, CI is green.
@kevincodex1 LGTM.
jatmn
left a comment
There was a problem hiding this comment.
I found an issue that needs to be addressed before this is ready.
Findings
- [P2] Keep
get_repomatching did:key aliases only
crates/gitlawb-node/src/db/mod.rs:894
The new ordering makesget_repo("z6...", name)deterministically prefer any matching non-slash row, but the predicate still matches every owner DID that contains:<bare>viaowner_did LIKE '%:' || $1 || '%'. That is broader than the did:key-only normalization used bydid_matches,DEDUP_CTE, and the existing distinct-DID-method tests:did:key:z6...anddid:gitlawb:z6.../did:web:z6...must stay separate owners even when they share the same trailing id and repo name. With this patch, a bare mirror lookup can rank a non-key DID row ahead of the exact slash-form mirror row and return the wrong repo/visibility rules. Please tightenget_repoto the same did:key-aware owner key used by the dedup paths, or otherwise restrict the canonical preference so it only applies to the matchingdid:key:<bare>row.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/gitlawb-node/src/db/mod.rs (1)
895-898: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winOptional: extract the
did:key:normalization into a shared helper.This exact
match owner_did.strip_prefix("did:key:") { Some(rest) if !rest.contains(':') => rest, _ => owner_did }block is duplicated inlist_all_repos_deduped_with_stars(Line 1034-1037). Since both must stay in lock-step with the SQL CASE for the security invariant to hold, a singlefn normalize_owner_key(did: &str) -> &strreduces the drift risk.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/db/mod.rs` around lines 895 - 898, Refactor the duplicated did:key normalization logic into a shared helper to keep both call sites in sync with the SQL CASE invariant. Add a small fn normalize_owner_key(did: &str) -> &str and use it from the owner_key handling in both list_all_repos_deduped_with_stars and the other matching block so the strip_prefix("did:key:") / contains(':') behavior lives in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 895-898: Refactor the duplicated did:key normalization logic into
a shared helper to keep both call sites in sync with the SQL CASE invariant. Add
a small fn normalize_owner_key(did: &str) -> &str and use it from the owner_key
handling in both list_all_repos_deduped_with_stars and the other matching block
so the strip_prefix("did:key:") / contains(':') behavior lives in one place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 47bffae1-920f-4fda-b20e-76494f6faae5
📒 Files selected for processing (1)
crates/gitlawb-node/src/db/mod.rs
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I found one remaining issue that needs to be addressed before this is ready.
Findings
- [P2] Keep emitted clone URLs aligned with the tightened owner matching
crates/gitlawb-node/src/api/repos.rs:1635
get_reponow intentionally only treats a bare owner as equivalent todid:key:<owner>; non-key DIDs such asdid:gitlawb:z6...anddid:web:...remain distinct. Howeverto_responsestill buildsclone_urlby taking the last:segment for every owner DID, so a canonical non-key repo response advertises/<last-segment>/<repo>.git, and that URL now resolves throughget_repo("<last-segment>", repo)to no canonical row (or to a mirror if one exists) instead of the repo that was just returned. Please update the URL/slug construction to use the same did:key-only shortening rule, or otherwise keep the advertised clone URL resolvable under the new matcher.
There was a problem hiding this comment.
The get_repo tightening verifies correct on 7f92302a and your earlier did:key-only [P2] is resolved. Two to fold in before merge.
Findings
-
[P2] Add a read-gate test that proves the #124 leak is closed, not just row selection
crates/gitlawb-node/src/db/mod.rs(new tests ~3720)
The three new tests assertget_reporeturns the canonical row, but nothing drivesauthorize_repo_read, where the leak lived. Seed a private canonical plus a public mirror twin for the same owner+name (mirror inserted first), callauthorize_repo_readwithcaller=None, and assertErr(RepoNotFound). That locks the property at the gate, so a future regression is caught by the denial rather than by row order alone. While you're there, oneget_repocall with the fulldid:key:form (not just the bare id) covers the normalize branch the current tests skip. -
[P2] The clone_url slug from jatmn's note hits the fork check too, same fix
crates/gitlawb-node/src/api/repos.rs:1635and:1477
Confirmed jatmn's point:to_responsederives the slug fromowner_did.split(':').next_back(), which stops round-tripping through the tightenedget_repofor a non-key DID owner. The same last-segment slug feeds the fork-name-conflict check atrepos.rs:1477(forker_shortthenget_repo), so shorten both consumers with the same did:key-only rule (makenormalize_owner_keypub(crate)and reuse it). For calibration: no reachable path plants a non-key DID on a canonical row today (auth is did:key-only viato_verifying_key, and the two insert paths are create/fork withauth.0and the always-slash mirror upsert), so this is forward-consistency for when did:web/did:gitlawb ownership lands, not a live break. Cheap to align now.
@Gravirei once the gate test and the slug alignment are in, I'll re-review. Nice work tightening the matcher.
…ey matching - Make normalize_owner_key pub(crate) and reuse it in api/repos, api/events, api/peers, and visibility modules. - Add read-gate authorize_repo_read and to_response clone_url slug tests.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the previously discussed paths and do not see any remaining actionable issues from my side.
@kevincodex1 LGTM
Superseded: re-reviewed 084ca04 — both prior P2s (gate test, clone-URL slug) are resolved. Posting a fresh review with the remaining items.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on 084ca046, and I verified the core by execution: the #124 gate test is load-bearing (on the head anon gets Err(RepoNotFound); reverting only the canonical-preference ORDER BY flips it to serving the is_public:true mirror for the private repo), normalize_owner_key matches the SQL CASE across the boundary set (did:key short/full, empty residual, did:key:z:extra, non-key, bare, empty, uppercase), and the is_owner change is fail-closed for reachable data. Both my earlier P2s and jatmn's clone-URL P2 are resolved. The fix is correct; a few things to fold in before merge, the first being the one that matters.
Findings
-
[P2] Guard the Rust/SQL owner-key normalization against drift
crates/gitlawb-node/src/db/mod.rs:829
normalize_owner_keyand the roughly nine verbatim SQLCASE WHEN owner_did LIKE 'did:key:%' ...copies (the get_repo WHERE at:907, the DEDUP_CTE, the count) are byte-identity-coupled with only a comment enforcing it, and the helper has no unit tests. They agree today (I checked the full boundary set), but a future edit to one side diverges silently: the clone-URL round-trip breaks, or the matcher mismatches and reopens a leak. Add unit tests over that boundary set, ideally asserting the Rust output equals a SQL round-trip, and consider hoisting the CASE into a sharedconstso a mismatch becomes a compile-time impossibility. -
[P3] Add an is_owner regression test for the non-key tightening
crates/gitlawb-node/src/visibility.rs:29
is_ownernow returns false when a non-key owner (did:gitlawb:z6Mkfoo) is probed with a bare last segment (z6Mkfoo), where the old.split(':').next_back()returned true. That behavior change has no test;root_rule_allows_owner_short_formonly covers the did:key path and passes against both old and new code. Add a non-key-owner case asserting the bare segment no longer matches. -
[P3] Add LIMIT 1 to the get_repo query
crates/gitlawb-node/src/db/mod.rs:913
The query can now match both a canonical and a mirror row, so theORDER BYbuilds a two-row set thatfetch_optionalthen trims.LIMIT 1makes the single-row intent explicit and skips the needless sort-and-discard. -
[P3] Fix the is_owner doc comment; keep normalize_owner_key
crates/gitlawb-node/src/visibility.rs:26
The comment saysis_ownermirrors the owner-match idiom inapi/protect.rs, butprotect.rsusesdid_matcheswhileis_ownernow uses the stricternormalize_owner_key. Update the comment to reflect the intentional did:key-only rule. Keepnormalize_owner_keyhere: switching todid_matcheswould reintroduce the cross-method looseness this PR removes. -
[P3] Document the non-key clone_url disk-path constraint
crates/gitlawb-node/src/api/repos.rs:1627
For a non-key DID owner,normalize_owner_keyreturns the full DID, soclone_urlbecomes/did:web:host:alice/repo.git; that resolves throughget_repo, but the colon-bearing path segment would break thesync.rsdisk-path join. Not reachable today (auth is did:key-only), so this is a forward constraint to document/handle before non-key ownership lands, not a live break.
Out of scope for this PR, tracking separately (no action needed here): api/agents.rs caller_matches_did still uses the old last-segment matcher; it's the one owner-match site the centralization didn't reach, and it isn't exploitable (deregister only revokes the caller's own DID).
@Gravirei nice work, the #124 fix is correct and verified. Once the drift guard is in (and the P3s while you're there), I'll re-review.
… review feedback - Hoist 7 verbatim SQL CASE copies into OWNER_KEY_CASE_SQL const (dedup_cte(), get_repo, count_repos_deduped), keeping one literal copy in the v7 migration with a cross-reference comment. - Add 9 unit tests for normalize_owner_key over the full boundary set. - Add LIMIT 1 to get_repo query for single-row intent. - Add non_key_owner_bare_short_does_not_match is_owner regression test. - Update is_owner doc to reflect did:key-only rule. - Document non-key clone_url disk-path constraint in to_response.
…KEY_CASE_SQL Verifies every boundary value produces identical results from the Rust function and the SQL CASE expression, preventing silent drift.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I do not see any actionable issues from my review.
@kevincodex1 LGTM
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on cc0722e9. Every item from my last pass is resolved, and I verified the changed branches load-bearing by execution, both ways:
- #124 gate: reverting only the canonical-preference
ORDER BYterm flipsauthorize_repo_readto serve the public mirror row for the private canonical to an anonymous caller (Ok(is_public: true)instead ofErr(RepoNotFound)); with the term in place it denies. - did:key-only
is_owner:non_key_owner_bare_short_does_not_matchpasses as written and fails if I restore the oldsplit(':').next_back()(bare short wrongly matches the non-key owner), so it's load-bearing. - Drift guard:
normalize_owner_key_matches_sql_casepasses and fails if the Rust helper or the SQL const diverge; the hoist renders byte-identical for the deduped and count queries.
The P2 drift guard and the four P3s are all in.
- [P3] Cover the both-rows state end to end. The reachable way one node ends up with both a canonical and a mirror row for the same owner/name is
trigger_syncre-mirroring its own repo, but the tests only seed that state directly. Atrigger_sync -> upsert_mirrortest that then asserts the gate still denies would lock the path the fix exists for.
Non-blocking latent (safe while auth is did:key-only): the frozen idx_repos_owner_key_name migration keeps a verbatim CASE copy (index-usage only), and the owner-side-only is_owner normalization plus the agents.rs/profile last-segment matchers are worth revisiting when non-key DIDs land.
@kevincodex1 this is ready. LGTM.
Replicated mirror rows are stored with
is_public=trueand no visibility rules, andget_repocould resolve a request to the mirror row instead of the canonical one. An unauthenticated caller could read a private/withheld repo's refs and objects through the bare short-DID owner, bypassing the canonical row's visibility rules.Fix: Added
ORDER BY CASE WHEN position('/' in id) > 0 THEN 1 ELSE 0 END, created_at ASC, id ASCtoget_reposo canonical rows (UUID id, no slash) are always preferred over mirror rows (slash-form id). This matches the ranking logic already used byDEDUP_CTEanddedupe_canonical_repos.Closes #124
Summary by CodeRabbit
did:key:owner inputs are normalized and matched only under the correct conditions, avoiding unintended DID cross-matches.