Replicated mirror rows are stored is_public=true with no visibility rules, and get_repo can resolve a request to the mirror row instead of the canonical one. The result is that an unauthenticated caller can read a private/withheld repo's refs (and, on the pack path, its objects) through the bare short-DID owner, bypassing the canonical row's visibility rules.
Found as a follow-up while planning the GET /ipfs/{cid} visibility gate (#110). It is independent of #110: the leak is already live on the smart-HTTP pack path today.
Where
- upsert_mirror_repo — crates/gitlawb-node/src/db/mod.rs:813
- get_repo — crates/gitlawb-node/src/db/mod.rs:843
- git_info_refs / git_upload_pack gate — crates/gitlawb-node/src/api/repos.rs:486
- sync.rs replication ingest (no rule replication)
Root cause
Three facts combine:
- Mirror rows are born public with no rules. When a peer's repo is replicated here, upsert_mirror_repo inserts a row with id = "{owner_short}/{name}", owner_did = <bare short DID>, and a hardcoded is_public = true. It inserts no visibility rules, and sync.rs never replicates the owner's rules. Visibility rules attach per physical repo.id (visibility_rules.repo_id; list_visibility_rules(&repo.id)), so the mirror row has an empty rule set regardless of how the canonical repo is gated.
- get_repo fuzzy-matches both rows and picks one nondeterministically. Its query is WHERE (owner_did = $1 OR owner_did LIKE '%:' || $1 || '%') AND name = $2 with fetch_optional and no ORDER BY. For a request to the bare short DID, both the mirror row (owner_did = <short>, exact match) and the canonical row (owner_did = did:key:<short>, LIKE match) qualify; the DB returns one with no defined preference.
- The read gate trusts whichever row it got. git_info_refs / git_upload_pack call list_visibility_rules(&record.id) + visibility_check(.., "/"). When the record is the mirror row, the rule set is empty and is_public = true, so the check returns Allow and the advertisement/pack is served, even though the canonical row carries a /secret/** deny (or is fully private).
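A minimal, self-contained model of how the three facts interact. Everything in it is a stand-in, not gitlawb-node code: `Repo`, `Decision`, `matches`, `visibility_check`, `fixture`, and `decisions` are hypothetical names; the gate is simplified to "private rows deny; public rows deny only under a `prefix/**` rule", which may not match the real visibility_check; and `z6MkEXAMPLE` and `canonical-id` are placeholder identifiers. It only shows that both rows pass get_repo's filter and that the gate's answer depends on which one comes back.

```rust
use std::collections::HashMap;

// Simplified stand-ins; these are NOT the gitlawb-node types or functions.
#[derive(Debug)]
struct Repo {
    id: &'static str,
    owner_did: &'static str,
    name: &'static str,
    is_public: bool,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Same shape as get_repo's filter:
/// (owner_did = $1 OR owner_did LIKE '%:' || $1 || '%') AND name = $2
fn matches(row: &Repo, owner: &str, name: &str) -> bool {
    let exact = row.owner_did == owner;
    let like = row.owner_did.contains(&format!(":{owner}"));
    (exact || like) && row.name == name
}

/// Toy gate (assumed semantics): private rows are denied outright; public rows
/// are denied only for paths under a `prefix/**` deny rule.
fn visibility_check(row: &Repo, deny_rules: &[&str], path: &str) -> Decision {
    if !row.is_public {
        return Decision::Deny;
    }
    let denied = deny_rules.iter().any(|rule| match rule.strip_suffix("**") {
        Some(prefix) => path.starts_with(prefix),
        None => path == *rule,
    });
    if denied { Decision::Deny } else { Decision::Allow }
}

/// The two rows from the reproduction; ids and DIDs are placeholders.
fn fixture() -> (Vec<Repo>, HashMap<&'static str, Vec<&'static str>>) {
    let rows = vec![
        // What upsert_mirror_repo writes: bare short DID, hardcoded is_public = true.
        Repo { id: "z6MkEXAMPLE/mir", owner_did: "z6MkEXAMPLE", name: "mir", is_public: true },
        // Canonical row for the same logical repo.
        Repo { id: "canonical-id", owner_did: "did:key:z6MkEXAMPLE", name: "mir", is_public: false },
    ];
    // Rules are keyed by physical repo id, so the mirror row has none.
    let mut rules = HashMap::new();
    rules.insert("canonical-id", vec!["/secret/**"]);
    (rows, rules)
}

/// Gate decision for every row that get_repo could return for (owner, name).
fn decisions(owner: &str, name: &str, path: &str) -> Vec<(&'static str, Decision)> {
    let (rows, rules) = fixture();
    rows.iter()
        .filter(|r| matches(r, owner, name))
        .map(|r| {
            let row_rules = rules.get(r.id).map(Vec::as_slice).unwrap_or(&[]);
            (r.id, visibility_check(r, row_rules, path))
        })
        .collect()
}

fn main() {
    // Both rows qualify; with no ORDER BY, which one the gate sees is up to the DB.
    for (id, decision) in decisions("z6MkEXAMPLE", "mir", "/") {
        println!("{id}: {decision:?}");
    }
}
```

Both rows survive the filter, and the same request at "/" yields Allow for the mirror row and Deny for the canonical one, which is the split observed in the reproduction.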
Reproduction (observed)
A throwaway #[sqlx::test] (reverted, not committed) seeded a source repo on a topsecret-branch, registered a mirror row via upsert_mirror_repo(<short>, "mir", <disk>, None), then added a canonical row (is_public=false) with a /secret/** deny rule for the same logical repo.
- Anonymous GET /<short>/mir.git/info/refs?service=git-upload-pack → 200 OK, body contained topsecret-branch (the real ref served to an anonymous caller).
- get_repo(<short>, "mir") resolved to the mirror row: Some(("z6Mk…/mir", is_public=true)).
- Same logical repo, opposite decisions at "/": mirror row (is_public=true, no rules) → Allow; canonical row (is_public=false, /secret/** deny) → Deny.
Impact
Any repo replicated to this node is readable by anyone who addresses it via the bare short-DID owner, regardless of the owner's visibility settings on the canonical row. This affects the smart-HTTP advertisement and pack paths, and the same is_public=true/no-rules mirror row would defeat a per-row visibility gate on GET /ipfs/{cid} (the #110 fix) as well. Severity high: it is an unauthenticated read of private/withheld content.
Suggested remediation (directions, not a decision)
- Make get_repo deterministic and prefer the canonical row (e.g. ORDER BY favouring the did:key: form, or resolve through the existing canonical dedup before gating), so the gate never keys off a mirror row.
- And/or stop creating mirror rows as is_public=true with no rules: replicate the source's visibility state during sync, or default mirrors closed until rules are known.
- And/or gate reads against the canonical row's rules for the logical repo rather than the physical mirror row.
The right fix likely touches replication and the read gate together; worth scoping before implementing.
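One way the three directions could combine, sketched against the same kind of simplified model (all names hypothetical, not gitlawb-node code): resolve deterministically with canonical rows first, gate only against the canonical row, and fail closed when the only match is a mirror. It assumes canonical rows are exactly those whose owner_did starts with `did:key:`. If the node runs on Postgres, `ORDER BY (owner_did LIKE 'did:key:%') DESC, id` would express the same preference inside get_repo itself; that is an untested suggestion.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real row type and gate; not gitlawb-node code.
#[derive(Debug)]
struct Repo {
    id: &'static str,
    owner_did: &'static str,
    name: &'static str,
    is_public: bool,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

type Rules = HashMap<&'static str, Vec<&'static str>>;

/// get_repo's existing filter, unchanged.
fn matches(row: &Repo, owner: &str, name: &str) -> bool {
    (row.owner_did == owner || row.owner_did.contains(&format!(":{owner}"))) && row.name == name
}

/// Assumption: canonical rows store the full `did:key:` form, mirrors the bare short DID.
fn is_canonical(row: &Repo) -> bool {
    row.owner_did.starts_with("did:key:")
}

/// Toy gate (assumed semantics): private denies; public denies only under `prefix/**` rules.
fn visibility_check(row: &Repo, deny_rules: &[&str], path: &str) -> Decision {
    if !row.is_public {
        return Decision::Deny;
    }
    let denied = deny_rules.iter().any(|rule| match rule.strip_suffix("**") {
        Some(prefix) => path.starts_with(prefix),
        None => path == *rule,
    });
    if denied { Decision::Deny } else { Decision::Allow }
}

/// Deterministic pick: canonical rows first (false sorts before true), then lowest id,
/// so the result no longer depends on the order the DB returns rows in.
fn resolve<'a>(rows: &'a [Repo], owner: &str, name: &str) -> Option<&'a Repo> {
    rows.iter()
        .filter(|r| matches(r, owner, name))
        .min_by_key(|r| (!is_canonical(r), r.id))
}

/// Gate against the canonical row only; a mirror-only match fails closed until
/// the owner's visibility state has been replicated.
fn gate(rows: &[Repo], rules: &Rules, owner: &str, name: &str, path: &str) -> Decision {
    match resolve(rows, owner, name) {
        Some(row) if is_canonical(row) => {
            let row_rules = rules.get(row.id).map(Vec::as_slice).unwrap_or(&[]);
            visibility_check(row, row_rules, path)
        }
        // No match, or only a mirror row: deny by default.
        _ => Decision::Deny,
    }
}

fn main() {
    let mirror = Repo { id: "z6MkEXAMPLE/mir", owner_did: "z6MkEXAMPLE", name: "mir", is_public: true };
    let canonical = Repo { id: "canonical-id", owner_did: "did:key:z6MkEXAMPLE", name: "mir", is_public: false };
    let mut rules = Rules::new();
    rules.insert("canonical-id", vec!["/secret/**"]);
    let rows = [mirror, canonical];
    println!("{:?}", gate(&rows, &rules, "z6MkEXAMPLE", "mir", "/"));
}
```

The fail-closed arm has a cost: a public repo that exists on this node only as a mirror becomes unreadable until its visibility state is replicated, which is part of why replication and the gate need to be scoped together.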