Replicated mirror rows are public with no rules: unauthenticated read of withheld repo content via bare short-DID owner #124

Description

@beardthelion

Replicated mirror rows are stored with is_public = true and no visibility rules, and get_repo can resolve a request to the mirror row instead of the canonical one. As a result, an unauthenticated caller can read a private/withheld repo's refs (and, on the pack path, its objects) by addressing it through the bare short-DID owner, bypassing the canonical row's visibility rules.

Found as a follow-up while planning the GET /ipfs/{cid} visibility gate (#110). It is independent of #110: the leak is already live on the smart-HTTP pack path today.

Where

  • upsert_mirror_repo: crates/gitlawb-node/src/db/mod.rs:813
  • get_repo: crates/gitlawb-node/src/db/mod.rs:843
  • git_info_refs / git_upload_pack gate: crates/gitlawb-node/src/api/repos.rs:486
  • sync.rs replication ingest (no rule replication)

Root cause

Three facts combine:

  1. Mirror rows are born public with no rules. When a peer's repo is replicated here, upsert_mirror_repo inserts a row with id = "{owner_short}/{name}", owner_did = <bare short DID>, and a hardcoded is_public = true. It inserts no visibility rules, and sync.rs never replicates the owner's rules. Visibility rules attach per physical repo.id (visibility_rules.repo_id; list_visibility_rules(&repo.id)), so the mirror row has an empty rule set regardless of how the canonical repo is gated.

  2. get_repo fuzzy-matches both rows and picks one nondeterministically. Its query is WHERE (owner_did = $1 OR owner_did LIKE '%:' || $1 || '%') AND name = $2 with fetch_optional and no ORDER BY. For a request to the bare short DID, both the mirror row (owner_did = <short>, exact match) and the canonical row (owner_did = did:key:<short>, LIKE match) qualify; the DB returns one with no defined preference.

  3. The read gate trusts whichever row it got. git_info_refs / git_upload_pack call list_visibility_rules(&record.id) + visibility_check(.., "/"). When the record is the mirror row, the rule set is empty and is_public = true, so the check returns Allow and the advertisement/pack is served, even though the canonical row carries a /secret/** deny (or is fully private).
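The three facts above can be modelled in a few lines. This is a toy in-memory sketch, not the gitlawb-node code: RepoRow, candidates, gate_at, and the placeholder short DID z6MkExample are hypothetical names, the canonical row's id format is assumed, and gate_at is a deliberately simplified stand-in for visibility_check. It only shows that both rows satisfy the owner predicate and that the gate's answer depends on which row is returned.

```rust
// Toy model of the root cause. All names here are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

#[derive(Debug, Clone)]
struct RepoRow {
    id: String,
    owner_did: String,
    name: String,
    is_public: bool,
    deny_rules: Vec<String>, // rules attach to this physical row only
}

/// Mirrors the predicate `owner_did = $1 OR owner_did LIKE '%:' || $1 || '%'`.
fn owner_matches(owner_did: &str, requested: &str) -> bool {
    owner_did == requested || owner_did.contains(&format!(":{requested}"))
}

/// Every row the query could return. With fetch_optional and no ORDER BY,
/// the database is free to hand back any one of them.
fn candidates<'a>(rows: &'a [RepoRow], owner: &str, name: &str) -> Vec<&'a RepoRow> {
    rows.iter()
        .filter(|r| r.name == name && owner_matches(&r.owner_did, owner))
        .collect()
}

/// Simplified gate (assumption): allow only a public row with no deny rule
/// covering the path. A rule like "/secret/**" covers "/secret" and below.
fn gate_at(row: &RepoRow, path: &str) -> Decision {
    let denied = row.deny_rules.iter().any(|rule| {
        let prefix = rule.trim_end_matches("/**");
        path == prefix || path.starts_with(&format!("{prefix}/"))
    });
    if row.is_public && !denied {
        Decision::Allow
    } else {
        Decision::Deny
    }
}

/// One mirror row and one canonical row for the same logical repo.
fn demo_rows() -> Vec<RepoRow> {
    vec![
        RepoRow {
            id: "z6MkExample/mir".into(),
            owner_did: "z6MkExample".into(), // bare short DID, exact match
            name: "mir".into(),
            is_public: true, // hardcoded for mirrors
            deny_rules: vec![], // never replicated
        },
        RepoRow {
            id: "did:key:z6MkExample/mir".into(), // id format assumed
            owner_did: "did:key:z6MkExample".into(), // LIKE match
            name: "mir".into(),
            is_public: false,
            deny_rules: vec!["/secret/**".into()],
        },
    ]
}

fn main() {
    let rows = demo_rows();
    for row in candidates(&rows, "z6MkExample", "mir") {
        println!("{} -> {:?}", row.id, gate_at(row, "/"));
    }
}
```

Both rows qualify for the same request, and the decision at "/" flips depending on which one the lookup happens to return.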

Reproduction (observed)

A throwaway #[sqlx::test] (reverted, not committed) seeded a source repo containing a branch named topsecret-branch, registered a mirror row via upsert_mirror_repo(<short>, "mir", <disk>, None), then added a canonical row (is_public=false) with a /secret/** deny rule for the same logical repo.

  • Anonymous GET /<short>/mir.git/info/refs?service=git-upload-pack → 200 OK, body contained topsecret-branch (the real ref served to an anonymous caller).
  • get_repo(<short>, "mir") resolved to the mirror row: Some(("z6Mk…/mir", is_public=true)).
  • Same logical repo, opposite decisions at "/": mirror row (is_public=true, no rules) → Allow; canonical row (is_public=false, /secret/** deny) → Deny.

Impact

Any repo replicated to this node is readable by anyone who addresses it via the bare short-DID owner, regardless of the owner's visibility settings on the canonical row. This affects the smart-HTTP advertisement and pack paths, and the same is_public=true/no-rules mirror row would defeat a per-row visibility gate on GET /ipfs/{cid} (the #110 fix) as well. Severity high: it is an unauthenticated read of private/withheld content.

Suggested remediation (directions, not a decision)

  • Make get_repo deterministic and prefer the canonical row (e.g. ORDER BY favouring the did:key: form, or resolve through the existing canonical dedup before gating), so the gate never keys off a mirror row.
  • And/or stop creating mirror rows as is_public=true with no rules: replicate the source's visibility state during sync, or default mirrors closed until rules are known.
  • And/or gate reads against the canonical row's rules for the logical repo rather than the physical mirror row.

The right fix likely touches replication and the read gate together; worth scoping before implementing.
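As a starting point for that scoping, the directions above could look roughly like the following. This is a minimal sketch under stated assumptions, not a proposed patch: RepoRow, resolve_preferring_canonical, all_rows_allow, mirror_is_public, the placeholder DID z6MkExample, and the canonical id format are all hypothetical, and it operates on rows already fetched rather than on the real query.

```rust
// Hypothetical sketch of the three remediation directions; names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct RepoRow {
    id: String,
    owner_did: String,
    is_public: bool,
}

/// Sort key: canonical `did:key:` rows first, then by id as a stable tiebreak.
/// A SQL analogue, assuming PostgreSQL, might be
/// `ORDER BY (owner_did LIKE 'did:key:%') DESC, id`.
fn rank(row: &RepoRow) -> (u8, &str) {
    let mirror = if row.owner_did.starts_with("did:key:") { 0 } else { 1 };
    (mirror, row.id.as_str())
}

/// Direction 1: deterministic resolution that never prefers a mirror row
/// when a canonical row for the same logical repo exists.
fn resolve_preferring_canonical(candidates: &[RepoRow]) -> Option<&RepoRow> {
    candidates.iter().min_by(|a, b| rank(a).cmp(&rank(b)))
}

/// Direction 2: a mirror stays closed until the source's visibility state
/// has actually been replicated.
fn mirror_is_public(replicated_state: Option<bool>) -> bool {
    replicated_state.unwrap_or(false)
}

/// Direction 3: gate on the logical repo. Serve only if every physical row
/// allows the read; an empty candidate set is denied.
fn all_rows_allow(candidates: &[RepoRow], allows: impl Fn(&RepoRow) -> bool) -> bool {
    !candidates.is_empty() && candidates.iter().all(|r| allows(r))
}

fn demo_rows() -> Vec<RepoRow> {
    vec![
        RepoRow {
            id: "z6MkExample/mir".into(),
            owner_did: "z6MkExample".into(),
            is_public: true,
        },
        RepoRow {
            id: "did:key:z6MkExample/mir".into(),
            owner_did: "did:key:z6MkExample".into(),
            is_public: false,
        },
    ]
}

fn main() {
    let rows = demo_rows();
    let picked = resolve_preferring_canonical(&rows).expect("rows is non-empty");
    println!("gate keys off: {}", picked.id);
    println!("served: {}", all_rows_allow(&rows, |r| r.is_public));
    println!("new mirror public: {}", mirror_is_public(None));
}
```

Preferring the canonical row alone does not help when only a mirror row exists on this node, which is one reason the closed-by-default mirror and the logical-repo gate are worth considering alongside it.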

Metadata

Labels

  • crate:node (gitlawb-node, the serving node and REST API)
  • kind:security (Vulnerability fix or hardening)
  • sev:high (Major break or real security/trust risk, no easy workaround)
  • subsystem:api (Node REST API request/response surface)
  • subsystem:replication (Mirror, replica, and cross-node sync)
  • subsystem:visibility (Path-scoped visibility and content withholding)
