feat: simpler iterative retrieval (BM25 → rerank → grade → expand) #25
Open
dominikpeter wants to merge 3 commits into
Restructures _aretrieve_documents' tail into a staged escalation:
- Stage 0 (always): standard retrieval + filter + pin (unchanged)
- Stage 1 (on fail): swarm_retrieve variants, merged with stage-0 via
  _merge_with_priority so pinned hits stay in top-k
The grader (relevance_check) short-circuits when the top-1 BM25 score is
≥ 0.7 (a cheap heuristic for confident matches) and fails only on
confidently negative LLM verdicts, so marginal cases skip expansion. The
updated grader prompt recognises synonyms, paraphrases, and multilingual
equivalents, so "Bieröffner" ≈ "Flaschenöffner" counts as a match.
_merge_with_priority is a new helper: reserves the first `keep_top`
slots for the prior stage, RRF-fuses remaining candidates from both
stages. Prevents later-stage noise from demoting earlier hits.
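
A minimal sketch of the merge behaviour described above, not the PR's actual helper. It assumes both stages hand over ranked lists of unique document ids (best first); the function signature and the RRF constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def merge_with_priority(prior, new, keep_top=5, k=60):
    """Keep the first `keep_top` ids of `prior` in place, RRF-fuse the rest.

    `prior` and `new` are ranked lists of unique document ids, best first.
    The signature and k=60 are assumptions made for this sketch.
    """
    reserved = prior[:keep_top]
    reserved_set = set(reserved)

    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc.
    scores = defaultdict(float)
    for ranking in (prior, new):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id not in reserved_set:
                scores[doc_id] += 1.0 / (k + rank)

    # sorted() is stable, so tied docs keep their first-seen order.
    fused = sorted(scores, key=scores.get, reverse=True)
    return reserved + fused


merged = merge_with_priority(["a", "b", "c", "d"], ["d", "e", "a", "c"], keep_top=2)
print(merged)  # ['a', 'b', 'd', 'c', 'e']
```

Because the reserved ids are excluded from fusion, a later stage can reorder only the tail, which is the property the commit message relies on.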
Stage 2 (filter-free broad) is intentionally omitted: it regressed more
cases than it helped on the current benchmark.
Eval: baseline 122/189 ±2 → with loop 123/189 ±2 (within noise but
architecturally cleaner; handles synonym gaps without manual tuning).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the fan-out fast-path/filter-task/swarm/escalate pileup with a
clean loop that matches the "smart iterative" mental model:
1. preprocess ∥ detect_filter_intent (parallel LLM calls)
2. seed variants = LLM synonyms + edit-distance-1 typo candidates
3. for iter in range(max_iter):
   - broad search + (if intent) filter search, batched via
     Meilisearch multi-search, always keeping both arms so the
     unfiltered rescue stays alive even when filter fires
   - stage-aware merge into running pool (_merge_with_priority),
     take top 20, rerank, pin filter_docs to top-5
   - grader: top-1 score ≥ 0.9 short-circuit, else LLM relevance
     check (accept on makes_sense ∧ confidence ≥ 0.7)
   - pass → break; fail → LLM generates fresh variants, loop
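
The control flow of step 3 can be sketched as below. This is an illustration under assumptions, not the code in this PR: `search`, `merge`, `rerank`, `llm_check` and `expand` are hypothetical callables injected by the caller, `Verdict` is a stand-in for whatever the relevance check returns, and pinning of filter_docs is left out for brevity. Only the thresholds (0.9, 0.7), the top-20 cut and the default of 5 iterations come from the description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical shape of the LLM relevance-check result."""
    makes_sense: bool
    confidence: float


def grade(top_score: float, llm_check: Callable[[], Verdict],
          score_floor: float = 0.9, min_conf: float = 0.7) -> bool:
    """Return True when the current pool is good enough to stop."""
    if top_score >= score_floor:
        return True  # short-circuit: confident match, no LLM call
    verdict = llm_check()
    return verdict.makes_sense and verdict.confidence >= min_conf


def retrieve_loop(variants, search, merge, rerank, llm_check, expand,
                  max_iter=5, pool_size=20):
    """Search, merge, rerank, grade; expand the variants and retry on fail."""
    pool, ranked, iteration = [], [], 0
    for iteration in range(1, max_iter + 1):
        pool = merge(pool, search(variants))
        ranked = rerank(pool[:pool_size])  # [(doc_id, score), ...], best first
        top_score = ranked[0][1] if ranked else 0.0
        if grade(top_score, lambda: llm_check(ranked)):
            break
        variants = expand(variants)  # fresh variants for the next round
    return ranked, iteration
```

Passing the LLM check as a zero-argument callable keeps the short-circuit honest: when the top score clears the floor, the model is never called.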
Stricter grader (0.9 score floor, 0.7 LLM confidence) keeps the loop
running on ambiguous pools instead of early-exiting. Always merging
broad+filter catches cases where the intent LLM narrowed to the wrong
supplier field.
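
To illustrate how the broad and filtered arms could share one round-trip, here is a sketch that only builds a Meilisearch multi-search request body. The index name `articles`, the helper name and the example filter value are assumptions; the PR's own request code is not shown in this conversation.

```python
import json


def build_multi_search(index_uid, variants, filter_expr=None, limit=20):
    """Build a /multi-search body with a broad arm per variant and,
    when a filter intent was detected, a filtered arm next to it."""
    queries = []
    for q in variants:
        # Broad arm: always present, so the unfiltered rescue stays alive.
        queries.append({"indexUid": index_uid, "q": q, "limit": limit})
        if filter_expr:
            # Filtered arm: same query, narrowed by the detected intent.
            queries.append({"indexUid": index_uid, "q": q, "limit": limit,
                            "filter": filter_expr})
    return {"queries": queries}


# Hypothetical values, for illustration only.
payload = build_multi_search("articles", ["Flaschenöffner"],
                             filter_expr='supplier_name = "Acme"')
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The body would be POSTed to `/multi-search`. Values interpolated into `filter_expr` should be escaped by the caller, and `supplier_name` has to be configured as a filterable attribute for the filtered arm to work.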
Eval (3-run): baseline 122/189 ±2 → 127/189 ±1 (+5 hits, +2.7pp).
Above noise floor, first real client-side win since #18.
Removes ~200 LOC of escalate/fast-path scaffolding.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per .agents/skills/deslop guidelines:
- Shorter docstrings (remove rambling examples that belong in tests).
- Drop nested _emit helper in _typo_candidates, inline early-return.
- Collapse seen/skip pattern in _merge_with_priority.
No behaviour change.
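
The body of `_typo_candidates` is not visible in this conversation view. For orientation, edit-distance-1 candidates of the kind named in the loop description can be generated roughly as follows. This is a generic sketch, not the PR's implementation: the function name, the lowercase ASCII alphabet and the optional vocabulary filter are assumptions, and adjacent transpositions are counted as a single edit (Damerau-style).

```python
import string


def typo_candidates(word, alphabet=string.ascii_lowercase, vocabulary=None):
    """Return all strings one edit away from `word`, sorted.

    Edits: delete, adjacent transpose, replace, insert.
    If `vocabulary` is given, only known terms are kept.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}

    candidates = (deletes | transposes | replaces | inserts) - {word}
    if vocabulary is not None:
        candidates &= set(vocabulary)
    return sorted(candidates)


vocab = {"flasche", "flaschen", "lasche", "tasche", "flsache"}
print(typo_candidates("flasche", vocabulary=vocab))
# ['flaschen', 'flsache', 'lasche']
```

Restricting candidates to a known vocabulary keeps the variant list short; without it, a seven-letter word already yields several hundred strings.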
Summary
Replaces the fan-out of fast-path / filter-task / swarm-rescue / escalate with a clean iterative loop that matches the design you sketched: BM25 with extended queries → top 20 → rerank → grader → on fail expand (more variants + semantic) → iterate up to 5 times.
Flow
Key design choices
- … `supplier_name` value, broad_docs still has the right hits.
- … (`_merge_with_priority`): later iters can only ADD candidates. The first 5 slots are reserved for earlier-stage hits, so expansion never demotes a correctly-surfaced doc.
- … `/multi-search` (one HTTP round-trip per stage).

Eval
3-run means on `tests.eval_v2` (40 OneTrade DE base cases + Article + Supplier):
+5 hits, +2.7pp, first client-side win above noise floor since #18.
Latency: ~500-800ms per query on fast path (grader short-circuit). Worst case with 3 iters: ~4-6s.
Removed
- `_aescalate_if_needed` helper (~80 LOC), absorbed into the loop
- … `_aretrieve_documents`: `_asearch` handles HyDE internally

Net: -53 LOC (361 → 308), simpler control flow.
🤖 Generated with Claude Code