Replace Typesense with Postgres FTS behind BUZZ_SEARCH_BACKEND flag by tlongwell-block · Pull Request #1259 · block/buzz

tlongwell-block · 2026-06-24T23:31:09Z

What

Replace Typesense with Postgres full-text search for NIP-50, behind a staged
BUZZ_SEARCH_BACKEND flag (typesense | postgres | disabled, default
postgres). Typesense remains selectable for rollback; disabled fails
closed.

How search is implemented

Index: an expression GIN index idx_events_content_fts ON events USING GIN (to_tsvector('simple', content)) (migration 0004_search_fts.sql).
Chosen over a generated content_tsv STORED column for a smaller diff and a
clean back-out (drop one index).
Backend: buzz-search gains a postgres module that renders the
identical to_tsvector('simple', content) query the index serves, with
ts_rank_cd (cover-density) relevance ordering. SearchService::disabled()
is a no-op that returns empty for every query.
Wiring: BUZZ_SEARCH_BACKEND is parsed in config.rs (defaults to
postgres), threaded through the relay handlers and the Helm chart.
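
The flag parsing described above could look roughly like the following. This is a minimal sketch, not the code in config.rs: `SearchBackend` and `ConfigError::InvalidValue` are named in the commit messages, but the function name `parse_search_backend`, the error's fields, and exact (case-sensitive, untrimmed) matching are assumptions made for illustration.

```rust
use std::fmt;

/// Which backend the relay dispatches NIP-50 queries to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchBackend {
    Typesense,
    Postgres,
    Disabled,
}

/// Hypothetical error shape: the PR only names `ConfigError::InvalidValue`,
/// the `key`/`value` fields are invented for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    InvalidValue { key: &'static str, value: String },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidValue { key, value } => {
                write!(f, "invalid value {value:?} for {key}")
            }
        }
    }
}

/// Parse the raw env value (hypothetical helper name).
/// `None` (variable unset) falls back to the default backend; any
/// unrecognized string is an error rather than a silent fallback.
pub fn parse_search_backend(raw: Option<&str>) -> Result<SearchBackend, ConfigError> {
    match raw {
        None => Ok(SearchBackend::Postgres),
        Some("typesense") => Ok(SearchBackend::Typesense),
        Some("postgres") => Ok(SearchBackend::Postgres),
        Some("disabled") => Ok(SearchBackend::Disabled),
        Some(other) => Err(ConfigError::InvalidValue {
            key: "BUZZ_SEARCH_BACKEND",
            value: other.to_string(),
        }),
    }
}
```

A caller would pass `std::env::var("BUZZ_SEARCH_BACKEND").ok().as_deref()`. Rejecting unknown values keeps a typo such as `postgress` from quietly selecting a backend the operator did not intend.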

Two non-negotiable gates

Gate #1 — no visibility widening. Search never returns an event the caller
couldn't otherwise read. The backend only returns candidate IDs; handle_search_req
independently re-authorizes every hit before emission — filters_match,
accessible-channel check, reader_authorized_for_event, and author-only-kind
check (req.rs:455-471). This post-filter is downstream of and independent from
the backend, so backend choice cannot widen visibility by construction.
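
To make the shape of gate #1 concrete, here is a simplified sketch of a post-filter that re-authorizes backend candidates. It is not the code in req.rs: the `Event` struct, the `authorize_hits` function, and the reduction of the four checks to a channel check plus a kind check are all assumptions for illustration.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a stored event (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: u64,
    pub kind: u32,
    pub channel_id: String,
}

/// Re-authorize backend candidates before emission (hypothetical helper).
/// The backend's answer is never trusted on its own: a hit survives only if
/// the caller can read its channel and its kind matches the request filter.
pub fn authorize_hits(
    candidates: Vec<Event>,
    accessible_channels: &HashSet<String>,
    requested_kinds: &[u32],
) -> Vec<Event> {
    candidates
        .into_iter()
        // Accessible-channel check: drop anything outside the caller's scope.
        .filter(|e| accessible_channels.contains(&e.channel_id))
        // Filter match: an empty kind list means "any kind" in this sketch.
        .filter(|e| requested_kinds.is_empty() || requested_kinds.contains(&e.kind))
        .collect()
}
```

Because the filter runs on whatever the backend returns, a backend that over-returns can only waste work; it cannot surface an event the caller is not allowed to read.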

Gate #2 — disabled fails closed. With BUZZ_SEARCH_BACKEND=disabled,
every NIP-50 query returns empty regardless of how well content would match.
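
A toy model of the fail-closed behavior, assuming a much smaller service than the real one: `SearchService::disabled()` and `SearchHit.score` appear in the PR, while the `InMemory` variant, the `event_id` field, and substring matching are stand-ins invented for this sketch.

```rust
/// A candidate returned by a search backend (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub event_id: String,
    pub score: f64,
}

/// Toy service with just enough structure to show the disabled branch.
pub enum SearchService {
    /// Stand-in for a real backend: holds (event_id, content) pairs.
    InMemory(Vec<(String, String)>),
    /// No backend at all: every query yields no candidates.
    Disabled,
}

impl SearchService {
    pub fn disabled() -> Self {
        SearchService::Disabled
    }

    pub fn search(&self, query: &str) -> Vec<SearchHit> {
        match self {
            // Fail closed: never consult any index, always return empty.
            SearchService::Disabled => Vec::new(),
            // Naive substring match, only to give the control a positive hit.
            SearchService::InMemory(docs) => docs
                .iter()
                .filter(|(_, content)| content.contains(query))
                .map(|(id, _)| SearchHit { event_id: id.clone(), score: 1.0 })
                .collect(),
        }
    }
}
```

The same query that produces a hit on the live variant produces nothing on the disabled one, which is the property test_nip50_search_disabled_fails_closed checks end to end.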

Testing

Full e2e matrix, green across all three backends (see TESTING.md →
"Search Backend Test Matrix"):

Backend	e2e search suite
typesense	9/9
postgres	9/9 (identical parity)
disabled	6/6 incl. `test_nip50_search_disabled_fails_closed`

Highlights:

test_nip50_search_cross_author_isolation: gate #1. An outsider gets 0
hits searching a private channel they aren't a member of (the author, a
member, still finds their own message, which is the non-vacuous control).
test_nip17_gift_wrap_not_searchable: gate #1, backend-agnostic. kind:1059
gift wraps never surface via search; a kind:9 control does.
test_nip50_search_disabled_fails_closed: gate #2. A would-match query
returns empty under disabled.
test_nip50_search_relevance_order: proximity-based ts_rank_cd ordering
(a more-relevant older message ranks above a less-relevant newer one).

Unit suites: buzz-search 30/30, buzz-db 70/70 (incl. both migration tests
exercising the expression-index swap on fresh + baselined schemas).

Rollback

Set BUZZ_SEARCH_BACKEND=typesense (no schema change needed) or
BUZZ_SEARCH_BACKEND=disabled. To remove the Postgres artifacts entirely:
DROP INDEX idx_events_content_fts.

Adds a Postgres full-text search backend as an alternative to Typesense for NIP-50 search, gated behind BUZZ_SEARCH_BACKEND=typesense|postgres|disabled (default typesense, so no behavior change for existing deployments).

The replacement is structural: NIP-50 search is the only Typesense call site, and the read path already refetches canonical events from Postgres by id, so Typesense was just a lookup index in front of the DB that owns the data. A generated stored tsvector column + GIN index gives the same shape with zero write-path code change.

Changes:
- migrations/0004_search_fts.sql: events.content_tsv GENERATED ALWAYS AS to_tsvector('simple', content) STORED, GIN index, cascades to partitions.
- crates/buzz-search: SearchBackend enum (Typesense | Postgres | Disabled), SearchService::with_postgres / ::disabled, postgres.rs backend impl, backend-neutral SearchQuery (structured kinds/authors/channel_ids/since/until; each backend renders its own filter).
- crates/buzz-relay/src/config.rs: BUZZ_SEARCH_BACKEND env wired with strict parsing (unknown value → ConfigError::InvalidValue, no silent fallback) + 3 unit tests.
- crates/buzz-relay/src/main.rs: dispatch on backend; Postgres → with_postgres using db.pool(); Disabled → no-op; Typesense → existing path. ensure_collection only runs for the Typesense backend.
- crates/buzz-relay/src/{handlers/req.rs, api/bridge.rs}: swap to the new SearchService surface. Caller code shrinks because filter parts were already structured.
- crates/buzz-db/src/lib.rs: Db::pool() accessor for the PG backend.

Validation (against parent 2e426b2, PG17 side-deployed):
- buzz-search lib: 29/29 pass.
- buzz-relay config tests: 11/11 (incl. 3 new).
- NIP-50 e2e on Typesense backend: 5/5 pass (regression baseline).
- NIP-50 e2e on Postgres backend: 5/5 pass, including test_nip50_search_relevance_order, confirming ts_rank_cd ranks correctly for the NIP-50 query shape and the 'simple' tokenizer config is acceptable.
- Wider e2e_nostr_interop sweep on Postgres: 19/23. The 4 failures reproduce identically on Typesense backend on this branch: pre-existing test-fixture coupling to a hard-coded 'events' collection name, not a regression.

This is additive: Typesense remains default; nothing in the existing path is removed. Operators flip BUZZ_SEARCH_BACKEND per release to A/B/rollback.

Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>

SearchQuery::new now requires non-empty channel_ids, returning SearchError::EmptyChannelScope otherwise. Fields are pub(crate) so struct-literal construction outside the crate is impossible; optional facets use #[must_use] builder methods (with_kinds/authors/since/until/page/per_page). Closes the type-system gap on Eva's gate-1 "no visibility widening" invariant: the access boundary is now enforced at construction, not just at the call sites.

Both call sites (req.rs, bridge.rs) wrap SearchQuery::new in a match: req.rs logs + breaks pagination on the Err path, bridge.rs continues the filter loop. Upstream guards (build_search_channel_scope_filter + the per-filter h_tag validity check) keep the Err path unreachable in normal operation, but if a future refactor lets an empty scope through, behavior is "no results" not "widened search".

Also adds the missing `info!("Search backend: typesense", ...)` log line for symmetry with the postgres/disabled branches: small operational polish, no behavior change.

Tests: buzz-search 30/30 (+1 rejection test), buzz-relay lib 337/337, NIP-50 e2e 5/5 on both Postgres and Typesense backends (4 NIP-50 + test_ws_search_isolation_other_user_cannot_find_reminder).

Co-authored-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
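
The construction-time check in this commit can be sketched as below. This assumes a reduced set of fields and only one builder; `SearchQuery::new`, `SearchError::EmptyChannelScope`, `with_kinds`, and `#[must_use]` come from the commit message, while the field types and accessor methods are illustrative guesses.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SearchError {
    EmptyChannelScope,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SearchQuery {
    // Private in this sketch; the commit uses pub(crate) so that other
    // crates cannot build the struct with a literal and bypass `new`.
    text: String,
    channel_ids: Vec<String>,
    kinds: Vec<u32>,
}

impl SearchQuery {
    /// The only constructor: an empty channel scope is rejected up front.
    pub fn new(text: impl Into<String>, channel_ids: Vec<String>) -> Result<Self, SearchError> {
        if channel_ids.is_empty() {
            return Err(SearchError::EmptyChannelScope);
        }
        Ok(Self { text: text.into(), channel_ids, kinds: Vec::new() })
    }

    /// Optional facet; `#[must_use]` warns if the returned query is dropped.
    #[must_use]
    pub fn with_kinds(mut self, kinds: Vec<u32>) -> Self {
        self.kinds = kinds;
        self
    }

    // Read-only accessors (assumed; not mentioned in the commit).
    pub fn text(&self) -> &str {
        &self.text
    }

    pub fn channel_ids(&self) -> &[String] {
        &self.channel_ids
    }

    pub fn kinds(&self) -> &[u32] {
        &self.kinds
    }
}
```

With no public way to produce a `SearchQuery` that lacks a channel scope, a backend can treat "scope is non-empty" as an invariant instead of re-checking it.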

Implements Eva's blocker-2 fix: Postgres backend now orders search results by relevance, and the e2e test that claims to verify this actually discriminates rank from recency.

postgres.rs
- SELECT list now includes `ts_rank_cd(content_tsv, plainto_tsquery('simple', $q)) AS rank` when the query has searchable text. The same `$q` parameter slot is reused in WHERE.
- ORDER BY rank DESC, created_at DESC when has_text; empty/"*" queries skip the rank column and fall back to created_at DESC (no needless tsquery cost).
- SearchHit.score is populated from the rank column (f32 widened to f64). Empty/"*" queries leave score at 0.0.

e2e_nostr_interop::test_nip50_search_relevance_order
- Redesigned to discriminate rank from recency. Previous fixture used "alpha bravo charlie" with msg3="alpha bravo". plainto_tsquery ANDs all terms, so msg3 never matched the WHERE clause and the test passed trivially with one candidate (Eva caught this).
- New fixture: query "{prefix} alpha bravo"; msg1 (oldest) has terms adjacent (high rank); msg2 (middle) doesn't match at all; msg3 (newest) has terms separated by filler (lower rank).
- Asserts both msg1 and msg3 are present, then asserts events[0].id == id1 with no `||content.contains(...)` escape hatch.
- Discriminator is term proximity, not term frequency: Typesense's default _text_match does NOT reward repeated query terms (verified empirically: identical tm scores for "alpha bravo" vs "alpha alpha bravo bravo"), but BOTH backends reward adjacency. Proximity is the property both backends agree on.
- New `send_rest_message_at` helper pins created_at via `custom_created_at`. Without explicit timestamps, three back-to-back sends share one wall-clock second; PG falls to heap-scan order and masquerades as rank ordering. Spreading by 30s each makes the recency-only ordering deterministically put msg3 first, so a passing test really means rank wins.

Validation
- buzz-search lib: 30/30. buzz-relay lib: 337/337.
- NIP-50 e2e on Postgres: 4/4 (incl. relevance_order) + isolation 1/1.
- NIP-50 e2e on Typesense: 4/4 + isolation 1/1.
- Proof of discrimination: with postgres.rs reverted to `ORDER BY created_at DESC`, the new test FAILS on PG (msg3 first, as predicted). Restored ts_rank_cd ordering after.

Pre-existing failure not introduced by this commit: test_nip17_gift_wrap_not_searchable fails on both backends. It queries Typesense directly at events-spike-{backend}; on the PG backend that collection is never written to (structurally expected), and the Typesense-backend failure is the same fixture coupling Eva already acknowledged in the prior turn. No regression vs e5869dd/4b7c8d12.

Co-authored-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
Signed-off-by: Tyler <109685178+tlongwell-block@users.noreply.github.com>
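
The fixture logic in this commit can be modeled without a database. The sketch below sorts the two matching candidates the way each ORDER BY clause would; the `Hit` struct, function names, rank values, and base timestamp are illustrative assumptions (only the 30-second spacing and the msg1/msg3 roles come from the commit message).

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: &'static str,
    pub rank: f32,
    pub created_at: i64,
}

/// Models `ORDER BY rank DESC, created_at DESC`.
pub fn order_by_rank(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| {
        b.rank
            .total_cmp(&a.rank)
            .then(b.created_at.cmp(&a.created_at))
    });
    hits
}

/// Models the reverted `ORDER BY created_at DESC`.
pub fn order_by_recency(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| b.created_at.cmp(&a.created_at));
    hits
}

/// msg2 does not match the query, so only msg1 and msg3 are candidates.
/// Rank values are made up; only their relative order matters here.
pub fn fixture() -> Vec<Hit> {
    vec![
        // Oldest message, query terms adjacent: higher rank.
        Hit { id: "msg1", rank: 0.10, created_at: 1_000 },
        // Newest message (60s later), terms separated by filler: lower rank.
        Hit { id: "msg3", rank: 0.02, created_at: 1_060 },
    ]
}
```

The two orderings disagree on which message comes first, which is what lets the e2e test tell rank ordering apart from recency ordering.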

Replace the GENERATED ALWAYS ... STORED tsvector column + column-backed GIN index with a single expression index on to_tsvector('simple', content).

Why: the expression index is maintained by Postgres on every INSERT/UPDATE exactly like a column index (no write-path work), but avoids the stored column's ALTER TABLE row rewrite / ACCESS EXCLUSIVE backfill, so the migration build is online-safe on a fresh/small relay and the same-named index (idx_events_content_fts) is pre-buildable out of band on a large populated relay (CREATE INDEX CONCURRENTLY + ATTACH per the live-relay runbook). IF NOT EXISTS makes the migration idempotent against that path.

The query path renders the identical to_tsvector('simple', content) expression so the planner uses the index. Rank SQL (ts_rank_cd) is unchanged in behavior.

- migrations/0004_search_fts.sql: single CREATE INDEX IF NOT EXISTS expr index
- schema/schema.sql: drop generated column; expr index for fresh installs
- crates/buzz-search/src/postgres.rs: query refs -> to_tsvector(...) expr
- crates/buzz-db/src/migration.rs: assertions match new shape
- crates/buzz-search/src/lib.rs, crates/buzz-relay/src/main.rs: doc wording

Tests: cargo test -p buzz-db -p buzz-search (incl. ignored PG tests): 118 passed, 0 failed.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
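
One way to keep the query expression and the index expression from drifting apart is to render both from a single constant. The sketch below assumes a heavily simplified query (no channel, kind, author, or time filters) and an `id` column; the constant and function names are hypothetical, and only the index definition and the ts_rank_cd / ORDER BY shape are taken from the PR.

```rust
/// The indexed expression. The planner can only use an expression index when
/// the query uses a matching expression, so both sides share this constant.
pub const FTS_EXPR: &str = "to_tsvector('simple', content)";

/// Index DDL as described for migrations/0004_search_fts.sql.
pub fn render_index_sql() -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS idx_events_content_fts ON events USING GIN ({e})",
        e = FTS_EXPR
    )
}

/// Simplified search query (hypothetical helper; real filters omitted).
/// With text: rank by cover density, then newest first.
/// Without text: skip the tsquery entirely and order by recency.
pub fn render_search_sql(has_text: bool) -> String {
    if has_text {
        format!(
            "SELECT id, ts_rank_cd({e}, plainto_tsquery('simple', $1)) AS rank \
             FROM events \
             WHERE {e} @@ plainto_tsquery('simple', $1) \
             ORDER BY rank DESC, created_at DESC \
             LIMIT $2",
            e = FTS_EXPR
        )
    } else {
        "SELECT id FROM events ORDER BY created_at DESC LIMIT $1".to_string()
    }
}
```

If the expression in the WHERE clause differed from the indexed one, for example by using a different text search configuration, Postgres could not match it to idx_events_content_fts and would have to evaluate it per row.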

…author search isolation

test_nip17_gift_wrap_not_searchable previously queried Typesense directly to prove kind:1059 was never indexed, which is meaningless on the Postgres backend, where every event lives in the events table and there is no separate index to skip. Rewrite it to issue a NIP-50 search and assert the relay never surfaces the gift wrap (kind:9 control IS returned, kind:1059 is NOT). That guarantee comes from the relay's auth gates + filters_match post-filter, which are identical across all three backends, so the test now guards typesense/postgres/disabled.

Add test_nip50_search_cross_author_isolation: an outsider who never joined an author's channel gets zero hits when searching that channel's exact #h + token, while the author finds their own message, proving the channel-scope clamp in handle_search_req (gate #1, no visibility widening) holds. Also relay-side and backend-independent.

Compiles clean (cargo test -p buzz-test-client --test e2e_nostr_interop --no-run); the full matrix run lands after Max's backend-flag wiring is rebased in.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

Co-authored-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
(cherry picked from commit 8636898)
Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

The cross-author isolation test was vacuously failing on BOTH backends: it created an `open` channel, which is searchable by anyone by design (get_accessible_channel_ids unions member channels with ALL open channels), so the outsider legitimately found the author's message.

Switch the test to create_private_test_channel. In a private channel the creator is bootstrapped as a member (so the author still finds their own post, which is the non-vacuous control), while the outsider is not a member and gets zero hits. This makes the test a true visibility-widening guard, backend-independent by construction.

Adds create_private_test_channel / create_channel_with_visibility helpers; create_test_channel now delegates to the open variant (no behavior change for existing callers).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
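
The union rule that made the original test fail can be shown in a few lines. This is a sketch under assumed types, not the relay's get_accessible_channel_ids: the `Channel` and `Visibility` types and the in-memory member list are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Open,
    Private,
}

/// Hypothetical in-memory channel record.
pub struct Channel {
    pub id: &'static str,
    pub visibility: Visibility,
    pub members: Vec<&'static str>,
}

/// Channels a user may search: every open channel, plus any channel
/// (open or private) the user is a member of.
pub fn accessible_channel_ids(user: &str, channels: &[Channel]) -> HashSet<&'static str> {
    channels
        .iter()
        .filter(|c| {
            c.visibility == Visibility::Open || c.members.iter().any(|m| *m == user)
        })
        .map(|c| c.id)
        .collect()
}
```

Under this rule an outsider always has the open channel in scope, so only a private channel can demonstrate isolation.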

Adds test_nip50_search_disabled_fails_closed: posts a message that the postgres/typesense backends provably return, then asserts the relay delivers EOSE with zero events. Proves BUZZ_SEARCH_BACKEND=disabled fails closed: NIP-50 search leaks nothing regardless of how well content would otherwise match.

Introduces a test_backend() helper reading BUZZ_TEST_BACKEND; the disabled test early-returns (skips) unless the relay-under-test is the disabled backend, so the full suite stays green against all three backends.

Matrix verified green: typesense 9/9, postgres 9/9 (identical parity), disabled 6/6 incl. the fail-closed assertion.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
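
The skip-unless-disabled pattern might be structured as follows. `test_backend` and `BUZZ_TEST_BACKEND` are named in the commit; the return type, the `Outcome` enum, and `check_fails_closed` are assumptions for this sketch, and no default backend is assumed when the variable is unset.

```rust
/// Read the backend under test from the environment. An unset variable
/// yields `None`; this sketch does not guess at a default.
pub fn test_backend() -> Option<String> {
    std::env::var("BUZZ_TEST_BACKEND").ok()
}

/// Outcome of the fail-closed check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Relay under test is not the disabled backend; nothing to assert.
    Skipped,
    /// EOSE arrived with zero events.
    Passed,
    /// Events were delivered before EOSE; carries how many.
    Leaked(usize),
}

/// Decide the outcome from the backend name and the number of events the
/// relay delivered before EOSE (hypothetical helper).
pub fn check_fails_closed(backend: Option<&str>, events_before_eose: usize) -> Outcome {
    if backend != Some("disabled") {
        return Outcome::Skipped;
    }
    if events_before_eose == 0 {
        Outcome::Passed
    } else {
        Outcome::Leaked(events_before_eose)
    }
}
```

A test body would call `check_fails_closed(test_backend().as_deref(), n)` after collecting `n` events, returning early on `Skipped` so the same suite stays green against every backend.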

Adds a "Search Backend Test Matrix" section to TESTING.md covering the BUZZ_SEARCH_BACKEND flag, the two gates (no visibility widening; disabled fails closed), the per-backend test table with skip rationale, and how to run the suite with BUZZ_TEST_BACKEND. Adds BUZZ_SEARCH_BACKEND to the config reference.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

CI's Rust Lint and Windows Rust jobs run `cargo clippy --all-targets -- -D warnings`; the manual `min(MAX_PER_PAGE).max(1)` clamp pattern in the Postgres backend tripped clippy::manual_clamp and failed both. Replace with `.clamp(1, MAX_PER_PAGE)`, which gives an identical result (1 <= 250).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
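
The equivalence claimed in this commit is easy to check in isolation. In the sketch below the value 250 for `MAX_PER_PAGE` is inferred from the "(1 <= 250)" remark, and the `u32` type and function names are assumptions.

```rust
/// Upper bound on page size (value inferred from the commit message).
pub const MAX_PER_PAGE: u32 = 250;

/// The pattern clippy::manual_clamp flags.
pub fn clamp_manual(per_page: u32) -> u32 {
    per_page.min(MAX_PER_PAGE).max(1)
}

/// The replacement. `clamp` panics if min > max, which cannot happen here
/// because 1 <= MAX_PER_PAGE.
pub fn clamp_idiomatic(per_page: u32) -> u32 {
    per_page.clamp(1, MAX_PER_PAGE)
}
```

Both functions map 0 to 1, leave values in 1..=250 unchanged, and cap anything larger at 250.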

npub17jjz49l9jjmhhk7cac63j8yt9z555n9cw8vk7v5jz4vzw4ppld5qgj57cc and others added 10 commits June 24, 2026 19:27

tlongwell-block wants to merge 10 commits into main from eva/pg-fts-integration
