Replace Typesense with Postgres FTS behind BUZZ_SEARCH_BACKEND flag #1259
Open
tlongwell-block wants to merge 10 commits into
Adds a Postgres full-text search backend as an alternative to Typesense for
NIP-50 search, gated behind BUZZ_SEARCH_BACKEND=typesense|postgres|disabled
(default typesense — no behavior change for existing deployments).
The replacement is structural: NIP-50 search is the only Typesense call site,
and the read path already refetches canonical events from Postgres by id, so
Typesense was just a lookup index in front of the DB that owns the data. A
generated stored tsvector column + GIN index gives the same shape with zero
write-path code change.
Changes
- migrations/0004_search_fts.sql: events.content_tsv GENERATED ALWAYS AS
to_tsvector('simple', content) STORED, GIN index, cascades to partitions.
- crates/buzz-search: SearchBackend enum (Typesense | Postgres | Disabled),
SearchService::with_postgres / ::disabled, postgres.rs backend impl,
backend-neutral SearchQuery (structured kinds/authors/channel_ids/since/until;
each backend renders its own filter).
- crates/buzz-relay/src/config.rs: BUZZ_SEARCH_BACKEND env wired with strict
parsing (unknown value → ConfigError::InvalidValue, no silent fallback) +
3 unit tests.
- crates/buzz-relay/src/main.rs: dispatch on backend; Postgres → with_postgres
using db.pool(); Disabled → no-op; Typesense → existing path. ensure_collection
only runs for the Typesense backend.
- crates/buzz-relay/src/{handlers/req.rs, api/bridge.rs}: swap to the new
SearchService surface. Caller code shrinks — filter parts were already
structured.
- crates/buzz-db/src/lib.rs: Db::pool() accessor for the PG backend.
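The strict parsing described for config.rs can be sketched as follows. This is a minimal illustration, not the code in the branch: the `ConfigError` shape and the `parse_search_backend` function name are assumptions, as is the choice to match values exactly (case-sensitive, no trimming). Only the three accepted values, the typesense default at this commit, and the `InvalidValue` variant name come from the commit message.

```rust
use std::fmt;

/// Backends named in the commit message above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchBackend {
    Typesense,
    Postgres,
    Disabled,
}

/// Hypothetical stand-in for the relay's `ConfigError`; only the
/// `InvalidValue` variant name comes from the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigError {
    InvalidValue { key: &'static str, value: String },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidValue { key, value } => {
                write!(f, "invalid value {value:?} for {key}")
            }
        }
    }
}

/// Parse the raw value of BUZZ_SEARCH_BACKEND.
/// `None` means the variable is unset and selects the default.
pub fn parse_search_backend(raw: Option<&str>) -> Result<SearchBackend, ConfigError> {
    match raw {
        // Default at this commit: typesense, so existing deployments are unchanged.
        None | Some("typesense") => Ok(SearchBackend::Typesense),
        Some("postgres") => Ok(SearchBackend::Postgres),
        Some("disabled") => Ok(SearchBackend::Disabled),
        // Strict: anything else is an error, never a silent fallback.
        Some(other) => Err(ConfigError::InvalidValue {
            key: "BUZZ_SEARCH_BACKEND",
            value: other.to_string(),
        }),
    }
}
```

A caller would typically pass `std::env::var("BUZZ_SEARCH_BACKEND").ok().as_deref()` and abort startup on `Err`, so a typo in the flag cannot quietly select a different backend.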
Validation (against parent 2e426b2, PG17 side-deployed):
- buzz-search lib: 29/29 pass.
- buzz-relay config tests: 11/11 (incl. 3 new).
- NIP-50 e2e on Typesense backend: 5/5 pass (regression baseline).
- NIP-50 e2e on Postgres backend: 5/5 pass — including
test_nip50_search_relevance_order, confirming ts_rank_cd ranks correctly
for the NIP-50 query shape and the 'simple' tokenizer config is acceptable.
- Wider e2e_nostr_interop sweep on Postgres: 19/23. The 4 failures reproduce
identically on Typesense backend on this branch — pre-existing test-fixture
coupling to a hard-coded 'events' collection name, not a regression.
This is additive: Typesense remains default; nothing in the existing path is
removed. Operators flip BUZZ_SEARCH_BACKEND per release to A/B/rollback.
Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
SearchQuery::new now requires non-empty channel_ids, returning
SearchError::EmptyChannelScope otherwise. Fields are pub(crate) so
struct-literal construction outside the crate is impossible; optional
facets use #[must_use] builder methods (with_kinds/authors/since/
until/page/per_page).
Closes the type-system gap on Eva's gate-1 "no visibility widening"
invariant: the access boundary is now enforced at construction, not
just at the call sites. Both call sites (req.rs, bridge.rs) wrap
SearchQuery::new in a match — req.rs logs + breaks pagination on the
Err path, bridge.rs continues the filter loop. Upstream guards
(build_search_channel_scope_filter + the per-filter h_tag validity
check) keep the Err path unreachable in normal operation, but if a
future refactor lets an empty scope through, behavior is "no results"
not "widened search".
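The construction-time check can be pictured with the sketch below. Field names, field types, and the `search_or_empty` helper are hypothetical; only `SearchQuery::new`, `SearchError::EmptyChannelScope`, the `#[must_use]` builder style, and the "Err means no results" behavior are taken from the commit message. Private fields stand in for `pub(crate)` in this single-file form.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SearchError {
    EmptyChannelScope,
}

/// Backend-neutral query. Fields are private, so the only way to build
/// one from outside is `SearchQuery::new`, which enforces the scope rule.
#[derive(Debug, Clone)]
pub struct SearchQuery {
    text: String,
    channel_ids: Vec<String>,
    kinds: Option<Vec<u32>>,
}

impl SearchQuery {
    /// Reject an empty channel scope at construction time.
    pub fn new(text: impl Into<String>, channel_ids: Vec<String>) -> Result<Self, SearchError> {
        if channel_ids.is_empty() {
            return Err(SearchError::EmptyChannelScope);
        }
        Ok(Self { text: text.into(), channel_ids, kinds: None })
    }

    /// Optional facet, builder style.
    #[must_use]
    pub fn with_kinds(mut self, kinds: Vec<u32>) -> Self {
        self.kinds = Some(kinds);
        self
    }

    pub fn text(&self) -> &str {
        &self.text
    }

    pub fn channel_ids(&self) -> &[String] {
        &self.channel_ids
    }

    pub fn kinds(&self) -> Option<&[u32]> {
        self.kinds.as_deref()
    }
}

/// Hypothetical call-site shape: an empty scope yields no results
/// instead of an unscoped (widened) search.
pub fn search_or_empty(text: &str, scope: Vec<String>) -> Vec<String> {
    match SearchQuery::new(text, scope) {
        // Stand-in for running the query: echo the scope that would be searched.
        Ok(query) => query.channel_ids().to_vec(),
        Err(SearchError::EmptyChannelScope) => Vec::new(),
    }
}
```

Because the invariant lives in the constructor, a future call site cannot forget it; the worst case is an empty result set.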
Also adds the missing `info!("Search backend: typesense", ...)` log
line for symmetry with the postgres/disabled branches — small
operational polish, no behavior change.
Tests: buzz-search 30/30 (+1 rejection test), buzz-relay lib 337/337,
NIP-50 e2e 5/5 on both Postgres and Typesense backends (4 NIP-50 +
test_ws_search_isolation_other_user_cannot_find_reminder).
Co-authored-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Implements Eva's blocker-2 fix: Postgres backend now orders search
results by relevance, and the e2e test that claims to verify this
actually discriminates rank from recency.
postgres.rs
- SELECT list now includes `ts_rank_cd(content_tsv,
plainto_tsquery('simple', $q)) AS rank` when the query has
searchable text. The same `$q` parameter slot is reused in WHERE.
- ORDER BY rank DESC, created_at DESC when has_text; empty/"*"
queries skip the rank column and fall back to created_at DESC
(no needless tsquery cost).
- SearchHit.score is populated from the rank column (f32 widened
to f64). Empty/"*" queries leave score at 0.0.
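A rough sketch of the two query shapes described above, assuming a hypothetical `render_search_sql` function, a `channel_id` column, and a parameter layout where `$1` is the channel scope and `$2` is the query text. The real postgres.rs may bind and name things differently; only the `ts_rank_cd` expression, the shared text parameter, the two ORDER BY clauses, and the f32-to-f64 score widening come from the commit message.

```rust
/// True when the query carries searchable text (neither empty nor "*").
pub fn has_text(query_text: &str) -> bool {
    let trimmed = query_text.trim();
    !trimmed.is_empty() && trimmed != "*"
}

/// Render the search SQL. `$1` is the channel scope and `$2` the query
/// text; both the numbering and the `channel_id` column are assumptions.
pub fn render_search_sql(query_text: &str) -> String {
    if has_text(query_text) {
        // The same `$2` slot feeds both the rank expression and the WHERE match.
        "SELECT id, ts_rank_cd(content_tsv, plainto_tsquery('simple', $2)) AS rank \
         FROM events \
         WHERE channel_id = ANY($1) \
         AND content_tsv @@ plainto_tsquery('simple', $2) \
         ORDER BY rank DESC, created_at DESC"
            .to_string()
    } else {
        // No searchable text: skip the rank column and order by recency only.
        "SELECT id FROM events \
         WHERE channel_id = ANY($1) \
         ORDER BY created_at DESC"
            .to_string()
    }
}

/// `ts_rank_cd` returns a 4-byte float; widen it for `SearchHit.score`.
/// Rows without a rank column (empty or "*" queries) score 0.0.
pub fn score_from_rank(rank: Option<f32>) -> f64 {
    rank.map(f64::from).unwrap_or(0.0)
}
```

Ordering by `rank DESC, created_at DESC` keeps recency as a tie-breaker, which is why the e2e test below has to separate the two signals explicitly.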
e2e_nostr_interop::test_nip50_search_relevance_order
- Redesigned to discriminate rank from recency. Previous fixture
used "alpha bravo charlie" with msg3="alpha bravo" — plainto_tsquery
ANDs all terms, so msg3 never matched the WHERE clause and the
test passed trivially with one candidate (Eva caught this).
- New fixture: query "{prefix} alpha bravo"; msg1 (oldest) has
terms adjacent (high rank); msg2 (middle) doesn't match at all;
msg3 (newest) has terms separated by filler (lower rank).
- Asserts both msg1 and msg3 are present, then asserts events[0].id
== id1 with no `||content.contains(...)` escape hatch.
- Discriminator is term proximity, not term frequency: Typesense's
default _text_match does NOT reward repeated query terms (verified
empirically — identical tm scores for "alpha bravo" vs
"alpha alpha bravo bravo"), but BOTH backends reward adjacency.
Proximity is the property both backends agree on.
- New `send_rest_message_at` helper pins created_at via
`custom_created_at`. Without explicit timestamps, three back-to-back
sends share one wall-clock second; PG falls to heap-scan order and
masquerades as rank ordering. Spreading by 30s each makes the
recency-only ordering deterministically put msg3 first, so a
passing test really means rank wins.
Validation
- buzz-search lib: 30/30. buzz-relay lib: 337/337.
- NIP-50 e2e on Postgres: 4/4 (incl. relevance_order) + isolation 1/1.
- NIP-50 e2e on Typesense: 4/4 + isolation 1/1.
- Proof of discrimination: with postgres.rs reverted to
`ORDER BY created_at DESC`, the new test FAILS on PG (msg3 first,
as predicted). Restored ts_rank_cd ordering after.
Pre-existing failure not introduced by this commit:
test_nip17_gift_wrap_not_searchable fails on both backends — it queries
Typesense directly at events-spike-{backend}; on the PG backend that
collection is never written to (structurally expected), and the
Typesense-backend failure is the same fixture coupling Eva already
acknowledged in the prior turn. No regression vs e5869dd/4b7c8d12.
Co-authored-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Replace the GENERATED ALWAYS ... STORED tsvector column + column-backed
GIN index with a single expression index on to_tsvector('simple', content).
Why: the expression index is maintained by Postgres on every INSERT/UPDATE
exactly like a column index (no write-path work), but avoids the stored
column's ALTER TABLE row rewrite / ACCESS EXCLUSIVE backfill — so the
migration build is online-safe on a fresh/small relay and the same-named
index (idx_events_content_fts) is pre-buildable out of band on a large
populated relay (CREATE INDEX CONCURRENTLY + ATTACH per the live-relay
runbook). IF NOT EXISTS makes the migration idempotent against that path.
The query path renders the identical to_tsvector('simple', content)
expression so the planner uses the index. Rank SQL (ts_rank_cd) is
unchanged in behavior.
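One way to keep the index and the query in lockstep is to render both from a single constant, since Postgres only considers an expression index when the query uses the same expression. The sketch below assumes hypothetical helper names (`FTS_EXPR`, `index_ddl`, `match_predicate`) and ignores partition handling; the index name, the GIN method, and the expression itself are the ones stated in this PR.

```rust
/// The one expression that both the index DDL and the query must render.
pub const FTS_EXPR: &str = "to_tsvector('simple', content)";

/// DDL for the expression index. IF NOT EXISTS keeps it idempotent when
/// an index with the same name was pre-built out of band.
pub fn index_ddl() -> String {
    format!("CREATE INDEX IF NOT EXISTS idx_events_content_fts ON events USING GIN ({FTS_EXPR})")
}

/// WHERE-clause predicate for the query path, using bind parameter `$param`.
pub fn match_predicate(param: u32) -> String {
    format!("{FTS_EXPR} @@ plainto_tsquery('simple', ${param})")
}
```

If the two strings ever drift apart (a different text search configuration, say), queries would still return correct rows but fall back to a sequential scan, so sharing the constant is a cheap guard.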
- migrations/0004_search_fts.sql: single CREATE INDEX IF NOT EXISTS expr index
- schema/schema.sql: drop generated column; expr index for fresh installs
- crates/buzz-search/src/postgres.rs: query refs -> to_tsvector(...) expr
- crates/buzz-db/src/migration.rs: assertions match new shape
- crates/buzz-search/src/lib.rs, crates/buzz-relay/src/main.rs: doc wording
Tests: cargo test -p buzz-db -p buzz-search (incl. ignored PG tests):
118 passed, 0 failed.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…author search isolation
test_nip17_gift_wrap_not_searchable previously queried Typesense directly to
prove kind:1059 was never indexed. That is meaningless on the Postgres
backend, where every event lives in the events table and there is no
separate index to skip. Rewrite it to issue a NIP-50 search and assert the
relay never surfaces the gift wrap (kind:9 control IS returned, kind:1059 is
NOT). That guarantee comes from the relay's auth gates + filters_match
post-filter, which are identical across all three backends, so the test now
guards typesense/postgres/disabled.
Add test_nip50_search_cross_author_isolation: an outsider who never joined an
author's channel gets zero hits when searching that channel's exact #h +
token, while the author finds their own message. This proves the
channel-scope clamp in handle_search_req (gate #1, no visibility widening)
holds. Also relay-side and backend-independent.
Compiles clean (cargo test -p buzz-test-client --test e2e_nostr_interop
--no-run); the full matrix run lands after Max's backend-flag wiring is
rebased in.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
(cherry picked from commit 8636898)
Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The cross-author isolation test was vacuously failing on BOTH backends: it
created an `open` channel, which is searchable by anyone by design
(get_accessible_channel_ids unions member channels with ALL open channels),
so the outsider legitimately found the author's message.
Switch the test to create_private_test_channel. In a private channel the
creator is bootstrapped as a member (so the author still finds their own
post, the non-vacuous control), while the outsider is not a member and gets
zero hits. This makes the test a true visibility-widening guard,
backend-independent by construction.
Adds create_private_test_channel / create_channel_with_visibility helpers;
create_test_channel now delegates to the open variant (no behavior change for
existing callers).
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Adds test_nip50_search_disabled_fails_closed: posts a message that the
postgres/typesense backends provably return, then asserts the relay delivers
EOSE with zero events. Proves BUZZ_SEARCH_BACKEND=disabled fails closed:
NIP-50 search leaks nothing regardless of how well content would otherwise
match.
Introduces a test_backend() helper reading BUZZ_TEST_BACKEND; the disabled
test early-returns (skips) unless the relay-under-test is the disabled
backend, so the full suite stays green against all three backends.
Matrix verified green: typesense 9/9, postgres 9/9 (identical parity),
disabled 6/6 incl. the fail-closed assertion.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Adds a "Search Backend Test Matrix" section to TESTING.md covering the
BUZZ_SEARCH_BACKEND flag, the two gates (no visibility widening; disabled
fails closed), the per-backend test table with skip rationale, and how to run
the suite with BUZZ_TEST_BACKEND. Adds BUZZ_SEARCH_BACKEND to the config
reference.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
CI's Rust Lint and Windows Rust jobs run
`cargo clippy --all-targets -- -D warnings`; the manual
`min(MAX_PER_PAGE).max(1)` clamp pattern in the Postgres backend tripped
clippy::manual_clamp and failed both. Replace with `.clamp(1, MAX_PER_PAGE)`,
which gives an identical result (1 <= 250).
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
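The equivalence claimed in this commit is easy to check in isolation. In the sketch below, the `u32` type and the function names are assumptions; `MAX_PER_PAGE = 250` is inferred from the "(1 <= 250)" remark. Note that `clamp` panics if the lower bound exceeds the upper bound, which is why that inequality matters.

```rust
/// Upper bound on page size, inferred from the commit message.
pub const MAX_PER_PAGE: u32 = 250;

/// The manual pattern that clippy::manual_clamp flags.
pub fn per_page_manual(requested: u32) -> u32 {
    requested.min(MAX_PER_PAGE).max(1)
}

/// The replacement. Safe because 1 <= MAX_PER_PAGE, so `clamp` cannot panic.
pub fn per_page_clamped(requested: u32) -> u32 {
    requested.clamp(1, MAX_PER_PAGE)
}
```

Both functions map 0 to 1, leave values from 1 to 250 untouched, and cap anything larger at 250.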
What
Replace Typesense with Postgres full-text search for NIP-50, behind a staged
`BUZZ_SEARCH_BACKEND` flag (`typesense|postgres|disabled`, default `postgres`).
Typesense remains selectable for rollback; `disabled` fails closed.
How search is implemented
- `idx_events_content_fts ON events USING GIN (to_tsvector('simple', content))`
(migration `0004_search_fts.sql`). Chosen over a generated
`content_tsv STORED` column for a smaller diff and a clean back-out (drop one
index).
- `buzz-search` gains a `postgres` module that renders the identical
`to_tsvector('simple', content)` query the index serves, with `ts_rank_cd`
(cover-density) relevance ordering. `SearchService::disabled()` is a no-op
that returns empty for every query.
- `BUZZ_SEARCH_BACKEND` is parsed in `config.rs` (defaults to `postgres`),
threaded through the relay handlers and the Helm chart.
Two non-negotiable gates
Gate #1: no visibility widening. Search never returns an event the caller
couldn't otherwise read. The backend only returns candidate IDs;
`handle_search_req` independently re-authorizes every hit before emission:
`filters_match`, accessible-channel check, `reader_authorized_for_event`, and
author-only-kind check (`req.rs:455-471`). This post-filter is downstream of
and independent from the backend, so backend choice cannot widen visibility
by construction.
Gate #2: `disabled` fails closed. With `BUZZ_SEARCH_BACKEND=disabled`, every
NIP-50 query returns empty regardless of how well content would match.
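The two gates can be illustrated with a toy model. Everything here is hypothetical (the `Backend` trait, the `Event` fields, the `ReturnsEverything` test double, and the `search` function), and the post-filter is reduced to the accessible-channel check alone; the real `handle_search_req` applies the further checks listed above. The point is only that the filter runs after the backend, so even a backend that returns too much cannot widen what the caller sees.

```rust
use std::collections::HashSet;

/// Minimal stand-in for a stored event; field names are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub id: String,
    pub channel_id: String,
}

/// A backend only proposes candidate ids; it never emits events itself.
pub trait Backend {
    fn candidate_ids(&self, query: &str) -> Vec<String>;
}

/// Gate #2: the disabled backend returns nothing for every query.
pub struct Disabled;

impl Backend for Disabled {
    fn candidate_ids(&self, _query: &str) -> Vec<String> {
        Vec::new()
    }
}

/// Deliberately over-eager backend, used to show gate #1 holds anyway.
pub struct ReturnsEverything(pub Vec<String>);

impl Backend for ReturnsEverything {
    fn candidate_ids(&self, _query: &str) -> Vec<String> {
        self.0.clone()
    }
}

/// Gate #1: refetch each candidate from the store and re-authorize it
/// against the caller's accessible channels before emitting it.
pub fn search<B: Backend>(
    backend: &B,
    query: &str,
    store: &[Event],
    accessible: &HashSet<String>,
) -> Vec<Event> {
    backend
        .candidate_ids(query)
        .into_iter()
        .filter_map(|id| store.iter().find(|event| event.id == id))
        .filter(|event| accessible.contains(&event.channel_id))
        .cloned()
        .collect()
}
```

With a store holding one event in a private channel and one in an accessible channel, `ReturnsEverything` proposes both ids but `search` emits only the accessible one, and `Disabled` emits none.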
Testing
Full e2e matrix, green across all three backends (see
`TESTING.md` → "Search Backend Test Matrix"):
test_nip50_search_disabled_fails_closed
Highlights:
- `test_nip50_search_cross_author_isolation`: gate #1. An outsider gets 0
hits searching a private channel they aren't a member of (the author, a
member, still finds their own message; non-vacuous control).
- `test_nip17_gift_wrap_not_searchable`: gate #1, backend-agnostic. kind:1059
gift wraps never surface via search; a kind:9 control does.
- `test_nip50_search_disabled_fails_closed`: gate #2. A would-match query
returns empty under `disabled`.
- `test_nip50_search_relevance_order`: proximity-based `ts_rank_cd` ordering
(a more-relevant older message ranks above a less-relevant newer one).
Unit suites: `buzz-search` 30/30, `buzz-db` 70/70 (incl. both migration tests
exercising the expression-index swap on fresh + baselined schemas).
Rollback
Set `BUZZ_SEARCH_BACKEND=typesense` (no schema change needed) or
`BUZZ_SEARCH_BACKEND=disabled`. To remove the Postgres artifacts entirely:
`DROP INDEX idx_events_content_fts`.