fix(node): gate /ipfs/pins and /arweave/anchors behind authentication (#121)#134
fix(node): gate /ipfs/pins and /arweave/anchors behind authentication (#121)#134Gravirei wants to merge 22 commits into
Conversation
Gitlawb#126) The IPFS visibility gate used withheld_blob_oids (a deny-set enumerating only reachable blobs), so a dangling/unreachable blob was absent from the set and served in cleartext to anonymous callers. Flip to an allowed-set (allowed_blob_set_for_caller) that enumerates reachable blobs the caller may read: a dangling blob has no path, is never in the set, and 404s.
Move store::read_object before the allowed_blob_set_for_caller spawn_blocking call so random-CID spray against repos with path-scoped rules cannot trigger full-history git walks on repos that don't carry the object.
|
Thanks for the contribution. A couple of things will help us review this faster:
See CONTRIBUTING.md. Update the PR and these notes will clear automatically. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds optional-signature middleware for ChangesAuth Gating for Metadata Index Endpoints
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/ipfs.rs (1)
186-203: 🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy liftAvoid exposing the node-wide pin index to every signed DID.
This blocks anonymous callers, but any authenticated DID still reaches
list_pinned_cids()and receives the full node-wide CID index. If authenticated users are not all node-wide admins, private-repo CIDs remain enumerable. Gate this with a real node-wide permission or make pins repo-scoped and authorize repo read before returning entries.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/ipfs.rs` around lines 186 - 203, The current auth check in list_pins only blocks anonymous callers, but still lets any authenticated DID reach list_pinned_cids() and see the full node-wide pin index. Update list_pins to enforce a real node-wide permission using the authenticated caller from AuthenticatedDid before returning pins, or change the data model so pins are repo-scoped and authorize repo read access per caller. Keep the existing unauthorized path for unauthenticated requests, and ensure the final result only includes entries the caller is allowed to see.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/arweave.rs`:
- Around line 45-57: The global anchor listing path in the arweave handler still
exposes private repo metadata because it only checks that a caller exists before
calling list_arweave_anchors(None, limit). Update the handler in the arweave API
to enforce per-repo read visibility for the no-repo case, or require q.repo to
be present, or gate global listing behind a node-admin capability. Use the
existing caller check and the list_arweave_anchors call site to locate the fix.
- Around line 52-55: The Arweave anchor query is only capping the upper bound in
the handler, so negative `q.limit` values can still reach the database and fail
in `list_arweave_anchors`. Update the limit handling in `arweave.rs` to clamp
the request value to a minimum of zero and a maximum of 200 before passing it
into `state.db.list_arweave_anchors`, keeping the existing `q.repo.as_deref()`
flow unchanged.
---
Outside diff comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 186-203: The current auth check in list_pins only blocks anonymous
callers, but still lets any authenticated DID reach list_pinned_cids() and see
the full node-wide pin index. Update list_pins to enforce a real node-wide
permission using the authenticated caller from AuthenticatedDid before returning
pins, or change the data model so pins are repo-scoped and authorize repo read
access per caller. Keep the existing unauthorized path for unauthenticated
requests, and ensure the final result only includes entries the caller is
allowed to see.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f9fd5b6a-451e-4f4f-a04b-f97da029b822
📒 Files selected for processing (3)
crates/gitlawb-node/src/api/arweave.rscrates/gitlawb-node/src/api/ipfs.rscrates/gitlawb-node/src/server.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/gitlawb-node/src/test_support.rs (1)
1968-2017: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a repo-scoped public success case and assert the filter actually narrows the result set.
These tests only exercise
?repo=against a single private repo with a single anchor. That leaves two intended contracts unpinned: anonymous access to a public repo-scoped listing, and returning only anchors for the requested repo. A regression that blanket-denies public?repo=reads or ignores therepopredicate would still pass here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 1968 - 2017, The `anchors_repo_denies_anonymous_on_private` and `anchors_repo_allows_owner` tests only cover a single private repo, so they do not prove that `?repo=` works for public repositories or that the repo filter is actually applied. Add a repo-scoped success case for a public repo in the `anchors_router`/`seed_anchor` test area, and make the assertion check that only anchors for the requested repo are returned (not just the count). Use the existing `anchors_repo_*` test patterns and the `anchors_router` request helpers to locate the right spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 1968-2017: The `anchors_repo_denies_anonymous_on_private` and
`anchors_repo_allows_owner` tests only cover a single private repo, so they do
not prove that `?repo=` works for public repositories or that the repo filter is
actually applied. Add a repo-scoped success case for a public repo in the
`anchors_router`/`seed_anchor` test area, and make the assertion check that only
anchors for the requested repo are returned (not just the count). Use the
existing `anchors_repo_*` test patterns and the `anchors_router` request helpers
to locate the right spot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 13b6010e-7968-4a2e-ae89-754bc8ff29d1
📒 Files selected for processing (1)
crates/gitlawb-node/src/test_support.rs
beardthelion
left a comment
There was a problem hiding this comment.
This closes the anonymous hole in #121 and the ?repo= anchor path is correctly visibility-gated, so the direction is right. But the global listings need to filter on current visibility, not just require authentication, and there is a real leak path that auth alone does not cover. I traced the write and read sides end to end against the merged state.
The index tables are correctly gated on the write side: pinning (repos.rs:1055,:1167, behind withheld.is_some()) and anchoring (repos.rs:1249, behind announce) only run for anonymously-root-readable repos, and object_list excludes withheld blobs. So a repo that is private at push time is never indexed. CodeRabbit's finding is still valid though, by a different mechanism than a missing write gate: the gate is evaluated once at push, visibility is mutable afterward, and nothing reconciles the index.
Findings
-
[P2] Filter the global
/arweave/anchorsand/ipfs/pinslistings on current visibility, not just authentication
crates/gitlawb-node/src/api/arweave.rs:53,crates/gitlawb-node/src/api/ipfs.rs(list_pins)
This confirms and extends CodeRabbit's open finding on the global anchor listing. Requiring authentication does not authorize: identities here are permissionless (optional_signatureverifies a self-made signature,registeris open), so any throwaway DID still reads the global lists. And because index rows are never reconciled when a repo is made private after a public push (noDELETEof either table exists,set_visibilitytouches neither, and both list queries are unfilteredSELECTs), the global lists can serve a now-private repo's slug, owner DID, branch names, commit SHAs, and object CIDs. Content stays gated byGET /ipfs/{cid}(#110/#133), so this is metadata disclosure. The same gap applies to/ipfs/pins, which is not flagged elsewhere. Resolve each row's repo and apply the current-visibility check before returning it (or restrict the global listing to a node-admin capability). Full mechanism and repro in #136. -
[P2] Fix the
?repo=gate-vs-filter representation drift (and the test that hides it)
crates/gitlawb-node/src/api/arweave.rs:51
The gate resolves the repo viaauthorize_repo_read->get_repo, which matches the owner byLIKE(fulldid:key:or bare short form). The result query then filters exactWHERE repo = $1on the raw?repo=string, but anchor rows are written short-form ({owner_short}/{name},repos.rs:1142). So?repo=did:key:zX/nameauthorizes (200) yet returns an empty list; only?repo=zX/namereturns the owner's anchors. It fails safe (returns fewer results, not more), so it is a correctness/usability bug, not a leak. The newanchors_repo_*tests pass only becauseseed_anchorseeds the full-DID form, which production never writes. Normalize the slug from the authorizedRepoRecordbefore querying, and seed the production short form in tests. -
[P3] Run
cargo fmt—test_support.rs:1903failsrustfmt --check
crates/gitlawb-node/src/test_support.rs:1903
A longassert_eq!line needs wrapping; the format gate fails.cargo fmtfixes it.
Notes (non-blocking): the parse-fail branch returns NotFound("repo not found") while the gate path returns RepoNotFound; both 404 and deny is indistinguishable from miss, so no oracle, but returning RepoNotFound on both is more consistent. ?limit=-1 reaches Postgres as LIMIT -1 (pre-existing). The new tests use local routers rather than build_router, so the layering fix that is the core of this PR is not asserted end to end; a test through build_router would lock it. The router refactor itself is correct: the old .merge-after-.layer left /ipfs/pins unsigned, and this fixes it.
The ?repo= gating pattern is the right one. Extending the same current-visibility filtering to the two global listings (per #136) is what completes the #121 fix.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/arweave.rs`:
- Around line 36-37: The anonymous global anchor listing path in the arweave API
still allows `list_arweave_anchors(None, limit)` to run when `repo` is absent,
which should be rejected instead. Update the handler in `arweave.rs` around
`caller`/`list_arweave_anchors` so that when no repository is specified and
`auth` yields `None`, the request returns a 401 before any query is executed.
Keep the authenticated and repo-scoped paths unchanged, and make sure the
`list_arweave_anchors` call only happens after a valid caller has been
established.
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 195-203: The pin listing handler is still continuing when auth is
missing, so anonymous requests get a filtered success response instead of being
rejected. In the IPFS API handler around the caller/auth extraction and
`list_pinned_cids` flow, add an explicit unauthorized check when `caller` is
`None` before loading pins, and return the proper 401 early from this function
instead of proceeding into the database query and filter logic.
- Around line 220-229: The repo selection logic in list_pins is bypassing the
quarantine gate because it calls visibility_check directly instead of the shared
authorization path used by authorize_repo_read. Update the readable-repo
derivation in the loop over repos to preserve the quarantine exclusion before
applying repo-level visibility, preferably by reusing authorize_repo_read or
extracting the quarantine filter into a shared helper referenced by both
list_pins and authorize_repo_read. Ensure the fix is applied around the repo
iteration and visibility_check flow so quarantined repos never contribute CIDs
to the pins index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e012ef44-6106-4e47-a97e-b4a94070946d
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
crates/gitlawb-node/src/api/arweave.rscrates/gitlawb-node/src/api/ipfs.rscrates/gitlawb-node/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/gitlawb-node/src/test_support.rs
…tion ✓ P3 blocker fixed: cargo fmt applied — the format gate will pass. ✓ P3 cleanup resolved: withheld_blob_oids is still used by replication code in repos.rs, so it stays. • P2 follow-up: Tree/commit disclosure tracked in Gitlawb#135 — out of scope here.
jatmn
left a comment
There was a problem hiding this comment.
I found issues that need to be addressed before this is ready.
Findings
-
[P1] Complete CodeRabbit's request to reject anonymous global listings
crates/gitlawb-node/src/api/arweave.rs:36,crates/gitlawb-node/src/api/ipfs.rs:195
The latest patch still only makes the signature optional and then proceeds whencallerisNone. As a result, anonymousGET /api/v1/arweave/anchorsandGET /api/v1/ipfs/pinsreturn200instead of the401behavior this PR claims and tests for. I verified this with the PR's own focused tests:cargo test -p gitlawb-node pins_list -- --nocapturefails withpins_list_denies_anonymousreturning200instead of401, andcargo test -p gitlawb-node anchors_ -- --nocapturefails withanchors_global_denies_anonymousreturning200instead of401. Please complete the current CodeRabbit requests by returningUnauthorizedbefore querying/filtering whenever the global listing has no authenticated caller. -
[P1] Fix the new endpoint tests so they pass and cover the production data shape
crates/gitlawb-node/src/test_support.rs:1889,crates/gitlawb-node/src/test_support.rs:1960,crates/gitlawb-node/src/test_support.rs:2011
The new tests currently seed rows that the new implementation filters out, so the PR's own coverage is red and does not prove the intended production behavior.pins_list_allows_authenticatedinserts a barepinned_cidsrow but no readable repo/object containing that SHA, whilelist_pinsnow only returns pins whose SHA is found by scanning readable repo object databases, so the count is0instead of1. The anchor tests seedsome/repoor the full-DID slug (did:key:.../repo), but production anchor writes use the short owner slug fromrepos.rs, andlist_anchorsnow normalizes authorized?repo=lookups to that short form, so the owner/global success tests also return count0. Please seed the same repo/object and short-slug shapes that the push/anchor paths actually write, then keep the assertions on the filtered results. -
[P2] Preserve the quarantine gate when deriving readable repos for pins
crates/gitlawb-node/src/api/ipfs.rs:220
authorize_repo_readhides quarantined repos before it applies visibility rules, but the newlist_pinsimplementation bypasses that helper and iterateslist_all_repos()with a directvisibility_check. A quarantined public mirror can therefore still contribute object SHAs toallowed_sha256s, and any matching pinned CID is returned even though the rest of the read surface treats that repo as nonexistent until quarantine is cleared. Please reuseauthorize_repo_readsemantics here or add the sameis_repo_quarantinedexclusion before scanning repo objects.
beardthelion
left a comment
There was a problem hiding this comment.
The core gates verify as resolved on the current head: anonymous global listings return 401 before any query, the pin scan skips quarantined repos, and the tests now seed the production slug and object shapes. The earlier P1/P1#2/P2 all check out on this head. What's left is regression coverage on the new exclusion branches, which currently have no test exercising them.
Findings
-
[P3] Cover the quarantine exclusion in
list_pins
crates/gitlawb-node/src/api/ipfs.rs:235
Theis_repo_quarantinedskip is correct, but nopins_test seeds a quarantined repo, so nothing guards it against regression. Seed a quarantined mirror that shares an object SHA with a pinned CID (and no readable non-quarantined repo carrying that SHA), then assert the pin is withheld. -
[P3] Cover the path-scoped withheld-blob exclusion in
list_pins
crates/gitlawb-node/src/api/ipfs.rs:260
list_pinsbuilds its allowed set withwithheld_blob_oids, a separate construction fromget_by_cid'sallowed_blob_set_for_caller, and nopins_test sets a path-scoped rule, so this subtraction is uncovered. Add a public repo with a/secret/**rule where a withheld blob OID is pinned, and assert an authenticated non-reader's listing excludes that CID while the visible-object pins remain. This is the branch that would leak a private blob SHA through the pin index if the subtraction regresses. -
[P3] Add a negative case for the global authenticated anchor filter
crates/gitlawb-node/src/api/arweave.rs:76
anchors_global_allows_authenticatedonly proves a public anchor is visible; the per-rowauthorize_repo_readfilter has no test that an authenticated non-reader is denied another repo's private anchor. Add that exclusion assertion, mirroringanchors_repo_denies_non_readerfor the?repo=path.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
crates/gitlawb-node/src/test_support.rs (3)
2079-2240: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicated owner/short-slug boilerplate across the five anchors tests.
anchors_global_allows_authenticated,anchors_global_denies_non_reader,anchors_repo_denies_anonymous_on_private,anchors_repo_allows_owner, andanchors_repo_denies_non_readerall repeat the identicalowner_short/short_slugderivation and repo/anchor seeding sequence. A small shared helper (e.g.,seed_owned_repo_with_anchor(is_public, repo_name) -> (Keypair, String /*owner_did*/, String /*short_slug*/)) would cut the duplication and centralize the short-slug convention these tests depend on.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 2079 - 2240, The five anchors tests repeat the same owner DID to short-slug derivation and repo/anchor seeding logic, so factor that setup into a shared helper in test_support.rs and reuse it from anchors_global_allows_authenticated, anchors_global_denies_non_reader, anchors_repo_denies_anonymous_on_private, anchors_repo_allows_owner, and anchors_repo_denies_non_reader. Have the helper create the Keypair, derive owner_short/short_slug, seed either a public or private repo, and insert the anchor so the tests only vary by visibility and request/assertion details.
2146-2177: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMissing positive-path coverage: anonymous
?repo=access to a public repo's anchors.The
?repo=tests only cover a private repo (deny-anonymous here, allow-owner at 2180-2212, deny-non-reader at 2214-2240). There's no test asserting an anonymous caller can still read anchors for a public repo via?repo=, even though the production comments emphasize that gating must not break legitimate anonymous access to public content. Adding that case would directly validate the PR's stated goal of not over-restricting public repos.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 2146 - 2177, Add missing positive-path coverage for anonymous `?repo=` access on a public repo in the `anchors_repo_denies_anonymous_on_private` area. Create a new test alongside the existing `anchors_router` / `list_anchors` cases that seeds a public repo and anchor, calls the `GET /api/v1/arweave/anchors?repo=...` path with `anon_get`, and asserts `StatusCode::OK` (or the expected success response) to verify `?repo=` still allows anonymous reads for public content.
1881-2030: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winConsider extracting shared setup for the three pins tests.
pins_list_allows_authenticated,pins_list_excludes_quarantined_repos, andpins_list_withholds_path_scoped_blobseach repeat the same Keypair/fs_slug/short/seed_cid_reposboilerplate. Extracting a small helper (e.g., returning(Keypair, owner_did, CidFixture)) would reduce duplication and the risk of the three tests silently drifting apart on the owner-DID/slug convention.♻️ Sketch of a shared fixture helper
struct PinFixture { owner: gitlawb_core::identity::Keypair, owner_did: String, fx: CidFixture, } fn seed_pin_owner(bare_names: &[&str]) -> PinFixture { use gitlawb_core::identity::Keypair; let owner = Keypair::generate(); let owner_did = owner.did().to_string(); let fs_slug = owner_did.replace([':', '/'], "_"); let short = owner_did.split(':').next_back().unwrap().to_string(); let fx = seed_cid_repos(&fs_slug, &short, bare_names); PinFixture { owner, owner_did, fx } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/test_support.rs` around lines 1881 - 2030, The three pins tests repeat the same owner setup, slug derivation, and seed_cid_repos call, so extract that boilerplate into a small shared helper to keep them aligned. Add a fixture/helper near pins_list_allows_authenticated, pins_list_excludes_quarantined_repos, and pins_list_withholds_path_scoped_blobs that returns the generated Keypair, owner_did, and CidFixture (or equivalent), and update each test to use it for fs_slug/short and repository seeding. Keep the existing test-specific assertions and repo/quarantine/visibility setup unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/gitlawb-node/src/test_support.rs`:
- Around line 2079-2240: The five anchors tests repeat the same owner DID to
short-slug derivation and repo/anchor seeding logic, so factor that setup into a
shared helper in test_support.rs and reuse it from
anchors_global_allows_authenticated, anchors_global_denies_non_reader,
anchors_repo_denies_anonymous_on_private, anchors_repo_allows_owner, and
anchors_repo_denies_non_reader. Have the helper create the Keypair, derive
owner_short/short_slug, seed either a public or private repo, and insert the
anchor so the tests only vary by visibility and request/assertion details.
- Around line 2146-2177: Add missing positive-path coverage for anonymous
`?repo=` access on a public repo in the
`anchors_repo_denies_anonymous_on_private` area. Create a new test alongside the
existing `anchors_router` / `list_anchors` cases that seeds a public repo and
anchor, calls the `GET /api/v1/arweave/anchors?repo=...` path with `anon_get`,
and asserts `StatusCode::OK` (or the expected success response) to verify
`?repo=` still allows anonymous reads for public content.
- Around line 1881-2030: The three pins tests repeat the same owner setup, slug
derivation, and seed_cid_repos call, so extract that boilerplate into a small
shared helper to keep them aligned. Add a fixture/helper near
pins_list_allows_authenticated, pins_list_excludes_quarantined_repos, and
pins_list_withholds_path_scoped_blobs that returns the generated Keypair,
owner_did, and CidFixture (or equivalent), and update each test to use it for
fs_slug/short and repository seeding. Keep the existing test-specific assertions
and repo/quarantine/visibility setup unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 77f2dac8-2f19-4e9e-bb77-0aa66c200e5f
📒 Files selected for processing (1)
crates/gitlawb-node/src/test_support.rs
bf9d690 to
069804a
Compare
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on 069804a. The delta since my last pass is test-only (test_support.rs, +200/-84); the gates in list_pins and list_anchors are byte-identical to beafa8a where I already verified them, so this pass covers the new coverage.
All three gaps I flagged are closed, and each new test is load-bearing: with its production guard disabled in isolation, only that test fails, with the expected count.
- quarantine skip:
pins_list_excludes_quarantined_repos(0 vs 1) - path-scoped withheld-blob subtraction:
pins_list_withholds_path_scoped_blobs(secret OID leaks, 1 vs 2) - global per-row filter:
anchors_global_denies_non_reader(0 vs 1)
CodeRabbit's three nitpicks on 81a5772 (extract the pins/anchors fixtures, add the anonymous-public anchor case) are all addressed here. The full CI suite did not trigger on this head, so I ran it locally: fmt clean, clippy clean, full gitlawb-node suite green (325 passed).
One optional follow-up, not a blocker: list_pins has its own copy of the fail-closed "withheld walk failed, skip repo" arm with no test, while get_by_cid's copy is covered by ipfs_cid_walk_error_fails_closed.
Approving. @kevincodex1 this clears my earlier changes-requested; jatmn's CHANGES_REQUESTED also predates this head.
|
@kevincodex1 LGTM |
jatmn
left a comment
There was a problem hiding this comment.
I found issues that need to be addressed before this is ready.
Findings
-
[P2] Cover the production router wiring for the signed metadata endpoints
crates/gitlawb-node/src/test_support.rs:1848,crates/gitlawb-node/src/test_support.rs:2032
The new tests prove the handlers work when the test manually builds a tiny router and appliesoptional_signature, but they do not exerciseserver::build_router, which is the production wiring this PR had to fix. That matters here because/api/v1/ipfs/pinswas previously outside the optional-signature layer; ifserver.rsregressed or the route were left in the old.merge(...).layer(...)shape, the signedpins_list_allows_authenticatedtest would still pass while the real route would never attachAuthenticatedDidand would return401for every signed pins request. Please add at least one integration test throughbuild_routerfor the signed success path, especially/api/v1/ipfs/pins, so the route-layer fix is actually pinned. -
[P3] Complete CodeRabbit's request to clamp negative anchor limits
crates/gitlawb-node/src/api/arweave.rs:69
CodeRabbit's earlier negative-limit request is still valid on the current head:q.limit.min(200)caps only the upper bound, so/api/v1/arweave/anchors?limit=-1still passes-1intolist_arweave_anchors, where it is bound directly into PostgreSQLLIMIT. PostgreSQL rejects a negative limit, so a malformed query can turn this listing endpoint into an internal error instead of a bounded empty/small result. Please clamp the lower bound as well, for example withq.limit.clamp(0, 200), before callingstate.db.list_arweave_anchors.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found issues that still need to be addressed.
Findings
-
[P2] Complete the mirror-row fix for repo-scoped anchor lookups
crates/gitlawb-node/src/api/arweave.rs:57
jatmn's mirror-row request is only fixed for the new global filters. The?repo=path still gates throughauthorize_repo_read, which callsget_repoand can match either the bare public mirror row or the private canonicaldid:key:row because that lookup has no canonical preference. If a caller requests/api/v1/arweave/anchors?repo=<short-owner>/<name>for a repo that has both rows, the public mirror row can satisfy the read gate, then this handler normalizes back to the same short slug and returns the anchor rows even though the canonical repo rules would deny the caller. Please resolve the repo-scoped path against the same canonical/deduped visibility source used by the global filter, or make the canonical-row preference land before relying on this gate, and add a mirror+canonical regression test for the?repo=anchor endpoint. -
[P2] Keep the global anchor listing bounded in SQL
crates/gitlawb-node/src/api/arweave.rs:85
The global path now callslist_arweave_anchors(None, i64::MAX)and only applies the requestedlimitafter every anchor row has been loaded, filtered, and collected in Rust. This endpoint is reachable by any signed DID, and anchors are written on every announced push/ref update, so the first authenticated global-list request on a large node can force an unboundedORDER BY anchored_at DESC LIMIT 9223372036854775807plus a full result materialization just to return at most 200 rows. Please keep the visibility-before-limit behavior without removing the database bound, for example by querying only readable repo slugs withWHERE repo = ANY(...) ORDER BY anchored_at DESC LIMIT $limitor by using a bounded/cursor batch loop.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found an issue that still needs to be addressed.
Findings
- [P2] Include the owner identity when querying anchor rows
crates/gitlawb-node/src/api/arweave.rs:92
The current filter authorizes a deduped repo record, but then collapses the lookup back to the lossy{last-DID-segment}/{repo}slug andlist_arweave_anchors(_for_repos)only filters onarweave_anchors.repo. Two distinct owners such asdid:key:z6Sameanddid:gitlawb:z6Sameare intentionally kept as separate repos elsewhere, but both map toz6Same/<name>here. If the caller can read one of those repos, both the?repo=path and the global readable-slug query can return anchor rows for the other, unreadable owner. Please constrain the anchor query by the canonical owner identity as well as the repo slug, or store/query anchors using the same DID-aware key that the repo dedupe logic uses.
Superseded: re-reviewed the current head 85f8f94. Prior items resolved; posting a fresh review with the remaining findings.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on 85f8f94a. The core gating is a real improvement and the prior items are resolved (my filter-before-limit and negative-limit, jatmn's SQL bound and mirror-row repo-scoped fix; the pins sibling is correctly immune because it keys off the full owner_did -> disk path, not a slug). But jatmn's owner-identity finding is worse than forward-only: I verified it is reachable today under default config, and it leaks private-repo metadata. Requesting changes on that.
Findings
-
[P1] Cross-owner private-anchor leak (jatmn's owner-identity finding, raised to P1 because it is reachable today): anchor queries key on a lossy last-segment slug with no owner_did constraint, and a public mirror can be planted to collide with a private repo's slug
crates/gitlawb-node/src/api/arweave.rs:92(per-repo) and:114-122(global);crates/gitlawb-node/src/db/mod.rs:2352,2395
Both anchor paths gate visibility correctly on the deduped canonical record, then collapse the owner toowner_did.split(':').next_back()and queryarweave_anchorsby the{short}/{name}slug only. The table has anowner_didcolumn, but neither query filters on it. This is reachable now, not only once non-key auth lands:- An unauthenticated caller
POSTs/api/v1/sync/notify(unsigned is accepted by default,GITLAWB_REQUIRE_SIGNED_PEER_WRITESis off) withrepo = "did:web:evil:z6Victim/secret". The sync worker (sync.rs:228) splits on the first/, admits it (ICAPTCHA_MODEoff by default), andupsert_mirror_repowrites a row withowner_did = "did:web:evil:z6Victim",is_public = true,quarantined = false. - The dedup key only strips
did:key:, so that non-key owner is its own dedup group and surviveslist_all_repos_dedupednext to the victim's privatedid:key:z6Victimcanonical instead of collapsing into it. Its last-segment anchor slug isz6Victim/secret, identical to the victim's. - Any self-registered DID then calls the global
/arweave/anchors; the attacker's public mirror putsz6Victim/secretin the readable set, andWHERE repo = ANY(...)returns the victim's private anchor rows (ref names, SHAs, CIDs).
I reproduced this end-to-end by execution against the production global-anchors handler: an attacker with a throwaway self-signed DID, having planted a public non-key mirror through the sameupsert_mirror_repothe sync worker calls, receives the victim's private anchor (ref, old/new SHA, CID) fromGET /api/v1/arweave/anchors. Constraining the anchor query by the readable repos'owner_didcloses it and leaves the existing allow/deny/non-reader/limit tests green; the unsigned/api/v1/sync/notifythat plants the mirror is also accepted. Constrain both anchor queries by the canonicalowner_did(the column already exists and is written on push) — ideally as(slug, owner_did)pairs rather than two independentANY(...)lists — and add a cross-owner regression test seeding two owners that share a last segment. This is not a regression (the endpoint was fully open before this PR), but the gate this PR adds has to close this bypass to meet its own goal.
- An unauthenticated caller
-
[P2]
gl ipfs listbreaks andgl node statussilently degrades once /ipfs/pins requires auth
crates/gl/src/ipfs_cmd.rs:42,crates/gl/src/node.rs:203
Both constructNodeClient::new(&node, None)and call/api/v1/ipfs/pinsunsigned, so after this gategl ipfs listreturns a hard 401 and thegl node statuspins panel collapses to "IPFS not configured". Update these callers to sign the request (a signed-get path already exists) here or in a companion PR, otherwise shipping the gate breaks the CLI. -
[P2]
/api/v1/ipfs/pinsspawns a git subprocess per readable repo per request
crates/gitlawb-node/src/api/ipfs.rs:356
list_pinswalks every readable repo's objects (list_all_objects, orgit rev-list+git ls-treeper commit when path-scoped rules exist), andlist_pinned_cidshas no LIMIT. On a node with many repos, any authenticated caller forces heavy synchronous git work — a scaling and DoS surface on a security endpoint. Consider starting from the pinned CIDs and bounding the pin query. -
[P3] Use
AppError::RepoNotFound, notAppError::NotFound, for a malformed?repo=
crates/gitlawb-node/src/api/arweave.rs:57
Every other repo-lookup failure here and inauthorize_repo_readusesRepoNotFound(repo_not_foundcode); this branch emitsnot_found, so a client keying on the code misses it. -
[P3] Tests to add with the P1 fix
crates/gitlawb-node/src/test_support.rs
A cross-owner slug-collision regression test (two owners sharing a last segment — the RED test that fails without the owner_did fix); a pins non-reader test on a fully private repo (the current pins tests all use public repos, so theis_public=falsedeny path is unexercised); an anchors quarantine-exclusion test. The doc comment attest_support.rs:2248is also stale (it describes anauthorize_repo_read/get_repopath the global handler no longer uses).
Resolved and verified on this head: my filter-before-limit and negative-limit items, jatmn's SQL bound and mirror-row repo-scoped fix, and all five CodeRabbit threads. The pins path is genuinely immune to the anchor slug-collision.
@Gravirei the direction is right and the gate is close. The P1 is the one to fix before this lands — owner-scope the anchor query. Happy to re-review once that's in.
Superseded: re-reviewed on 77f42f0; posting an updated verdict.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on 77f42f0d. The cross-owner anchor leak is closed, and I verified it load-bearing by execution on both paths: dropping the global owner_did post-filter and dropping AND owner_did = $2 on the ?repo= path each leaks a victim's private anchor under a colliding public mirror (count 1 vs 0); restored, both green. The REST handler is the only egress sink and pins stays immune. jatmn's owner-identity finding is resolved. Core is sound, approving.
One sequencing note, not a code issue on this diff: the gl CLI still calls /api/v1/ipfs/pins unsigned (ipfs_cmd.rs:42, node.rs:203), so this gate breaks gl ipfs list until a CLI-signing companion lands. Worth pairing in the same release.
@kevincodex1 verified and ready.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found issues that still need to be addressed.
Findings
-
[P2] Apply the owner-DID filter before the global anchor LIMIT
crates/gitlawb-node/src/db/mod.rs:2409
The latest owner-DID post-filter closes the private-anchor leak, but the SQL query still limits by the lossyreposlug before that filter runs. If a readable repo and an unreadable repo share the same short slug,list_arweave_anchors_for_reposcan return the newest unreadable rows inside the requested limit, thenlist_anchorsdrops them byowner_did; older readable anchors for the caller are never fetched and there is no cursor to reach them. This is the same filter-before-limit class the previous global-anchor fix was meant to avoid, just after adding the owner identity check. Please constrain the database query by the readable(repo, owner_did)pairs before applyingLIMIT, or fetch bounded batches until the post-filter fills the requested visible page. -
[P2] Sign the
glpins callers before requiring auth
crates/gl/src/ipfs_cmd.rs:42
This PR makes/api/v1/ipfs/pinsreturn401when the request has noAuthenticatedDid, but both CLI consumers still buildNodeClient::new(&node, None)and call the endpoint with unsignedget:gl ipfs listhits the new 401 path, andgl node statusdrops the pins panel becausetry_get_jsontreats the non-2xx response as unavailable. Please load the local identity and use the existing signed GET path for these callers, or otherwise land the CLI signing companion before shipping the server-side gate so the released CLI is not broken by the endpoint change.
beardthelion
left a comment
There was a problem hiding this comment.
Correcting my state on 77f42f0d: I approved earlier, but jatmn's two findings both hold under verification, so I'm moving to request-changes to concur.
Findings
-
[P2] Constrain the global anchor query to readable
(repo, owner_did)pairs before theLIMIT
crates/gitlawb-node/src/api/arweave.rs:145
jatmn's filter-before-limit is real, and I reproduced it: with a readable and an unreadable repo colliding on the short slug, the caller who owns the readable repo gets an empty page (count: 0) atlimit=1, because the queryLIMITs on the slug and returns the unreadable repo's newer row, which theowner_didfilter then drops, so the readable row is never fetched and there is no cursor to reach it. It is a completeness gap, not a leak: I confirmed the post-filter holds at a wider limit, where the unreadable owner's rows never surface and only the caller's own are returned. Constrain the query to the readable(repo, owner_did)pairs beforeLIMIT. -
[P2] Release-couple with the CLI signing in #146
crates/gl/src/ipfs_cmd.rs:42
The new gate returns 401 to unsigned callers, and bothgl ipfs listand thegl node statuspins panel send unsigned GETs, so they break:node statusrenders "IPFS not configured", andgl ipfs listsilently prints "No IPFS pins recorded". #146 adds exactly this signing, so land it first or together. -
[P3] Add the slug-collision regression test for the global anchor filter
crates/gitlawb-node/src/test_support.rs
Theowner_didpost-filter is the only thing between a colliding private repo and a leak, and nothing guards it: a future refactor swapping the exact-equality check fordid_matcheswould collapse the two DIDs and silently reopen the hole. Seed a readable and an unreadable repo sharingowner_short/name, give the unreadable one the newer rows, and assert the caller sees their own anchor and never the other owner's; at a narrow limit this doubles as the load-bearing test for the finding-#1 fix.
The cross-owner leak closure itself verifies correct and load-bearing.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found a couple issues that still need to be addressed.
Findings
-
[P2] Avoid whole-repo git walks on every pins listing
crates/gitlawb-node/src/api/ipfs.rs:266
The new visibility filter starts by loading every pin row, then loops every repo the caller can read and builds a global allow-set before filtering those pins. For repos without path-scoped rules this runsgit cat-file --batch-all-objectsthroughpush_delta::list_all_objects, and for path-scoped repos it callsallowed_blob_set_for_caller, which walks every reachable commit and runsgit ls-treeper commit. Before this PR,/api/v1/ipfs/pinswas a DB read overpinned_cids; after this change, any signed DID can make the endpoint enumerate object graphs across every public/readable repo, with no pin limit or cursor reducing the work. Please keep the current-visibility filtering, but make the read path bounded/indexed, for example by recording enough repo/pin ownership to filter from pin rows first and only checking the relevant repo/object batch instead of rescanning every readable repository on each request. -
[P2] Surface rejected pins reads instead of rendering them as empty/unconfigured
crates/gl/src/ipfs_cmd.rs:44,crates/gl/src/node.rs:181
The CLI callers now sign/api/v1/ipfs/pins, but they still treat rejected signed reads as successful empty/unconfigured states.NodeClient::get_signedreturns the HTTP response even for a 401/403, andgl ipfs listimmediately parses that JSON without checkingresp.status(), so a rejected pins read such as{"error":"unauthorized"}has nopinsfield and printsNo IPFS pins recorded.gl node statushas the same loss of signal in the dashboard path: missing identity or a rejected signed read both becomeNonefromtry_get_json_signed, then render asIPFS not configured. Please complete the CLI pins fix by checking non-2xx responses before parsing/listing pins and by distinguishing no-identity/rejected/unavailable from a real zero-pin response; adding the mock-based CLI tests from the companion path would keep this from regressing.
beardthelion
left a comment
There was a problem hiding this comment.
The gate direction is right and confidentiality holds. I could not construct a leak, and every error path in list_pins fails closed. Blocking on one thing: the pin-listing gate turns a cheap request into unbounded per-repo work.
Findings
-
[P1] Bound the pin listing; stop walking every repo's object graph per request
crates/gitlawb-node/src/api/ipfs.rs:266-314
list_pinsloads the whole pin index (list_pinned_cidshas no LIMIT), then for every readable repo runs a whole-repo git walk:push_delta::list_all_objects(git cat-file --batch-all-objects) or the per-commitallowed_blob_set_for_caller. OneGET /api/v1/ipfs/pinsfrom any self-registered DID is O(readable repos × objects) of git-subprocess work, uncached, and the route carriesoptional_signatureonly, with norate_limit_by_did. I reproduced it: five readable repos gave five whole-repo walks from a single permissionless request. This is the git-walk cost jatmn flagged; reproducing the reachability puts it at P1 for me. The arweave sibling already shows the fix (list_arweave_anchors_for_repos: build the readable set, one bounded SQL query, no git walk). Bring pins to parity: add an indexed repo/owner key topinned_cidsat pin time, pre-filter from it, and page with LIMIT/cursor. The SQL-only fix needs that column, whichpinned_cidsdoes not have yet. -
[P3] CLI pins-signing here is completed by #146
crates/gl/src/ipfs_cmd.rs
a98e544signsgl ipfs listso the happy path works against the gate, butcmd_listparses the body without checking status, so a 401/5xx prints "No IPFS pins recorded" and exits 0. #146 adds theresp.status()guard and surfaces the denial. No change needed here; noting the lineage so it is not lost.
Closes #121
Summary
Two node-wide metadata endpoints were served without authentication or visibility filtering, leaking private-repo metadata.
Changes
/api/v1/ipfs/pins(list_pins):optional_signaturemiddleware to the route401 Unauthorizedfor anonymous callers, stopping anonymous node-wide CID enumeration/api/v1/arweave/anchors(list_anchors):optional_signaturemiddleware to the route?repo=<owner>/<name>is provided: gates onauthorize_repo_read(deny → 404), preventing anchor metadata leaks for private repos?repo=: requires authentication for the global anchor listingTesting
Summary by CodeRabbit
?repo=<owner>/<name>is strictly validated and unauthorized repos are hidden; results are capped (limit ≤ 200) and returned with correctcount.401(anonymous) and200/404visibility behavior for both global and?repo=scopes.