feat: simpler iterative retrieval (BM25 → rerank → grade → expand) by dominikpeter · Pull Request #25 · bmsuisse/rag7

dominikpeter · 2026-04-21T07:25:34Z

Summary

Replaces the fan-out of fast-path / filter-task / swarm-rescue / escalate with a clean iterative loop that matches the design you sketched: BM25 with extended queries → top 20 → rerank → grader → on fail expand (more variants + semantic) → iterate up to 5 times.

Flow

preprocess ∥ detect_filter_intent          (parallel LLM calls)
seed variants = LLM synonyms + edit-distance-1 typo rewrites

for iter_n in range(max_iter):
    broad_search(variants, semantic=iter_n≥1?0.5:None)   │
                     ∥                                    │  batched via
    filter_search(intent, variants) if intent             │  Meilisearch
                                                          │  multi-search
    merge (always keep both arms) → pool
    stage-aware merge with priority → take top 20 → rerank → pin filter top-5
    grader:
        top-1 score ≥ 0.9             → pass, break
        else LLM relevance_check:
            makes_sense ∧ conf ≥ 0.7  → pass, break
    if fail: LLM generates more variants, loop
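
The control flow above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code in this PR: `Verdict`, `grade`, `iterative_retrieve` and all the injected callables (`search`, `rerank`, `relevance_check`, `expand`, `merge`) are hypothetical names, the filter arm and top-5 pinning are left out, and `merge` stands in for `_merge_with_priority`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Scored = Tuple[str, float]  # (doc_id, reranker score), assumed to lie in [0, 1]


@dataclass
class Verdict:
    """Hypothetical shape of the LLM relevance_check result."""
    makes_sense: bool
    confidence: float


def grade(ranked: List[Scored],
          relevance_check: Callable[[List[Scored]], Verdict],
          score_floor: float = 0.9,
          conf_floor: float = 0.7) -> bool:
    """Pass on a confident top-1 score, otherwise defer to the LLM verdict."""
    if not ranked:
        return False
    if ranked[0][1] >= score_floor:
        return True  # short-circuit: no LLM call needed
    verdict = relevance_check(ranked)
    return verdict.makes_sense and verdict.confidence >= conf_floor


def iterative_retrieve(
    query: str,
    seed_variants: List[str],
    search: Callable[[List[str], Optional[float]], List[str]],
    rerank: Callable[[str, List[str]], List[Scored]],
    relevance_check: Callable[[List[Scored]], Verdict],
    expand: Callable[[str, List[str]], List[str]],
    merge: Callable[[List[str], List[str]], List[str]],
    max_iter: int = 5,
    top_n: int = 20,
) -> Tuple[List[Scored], int]:
    """Return (ranked docs, number of iterations used)."""
    variants = list(seed_variants)
    pool: List[str] = []
    ranked: List[Scored] = []
    for iter_n in range(max_iter):
        semantic = 0.5 if iter_n >= 1 else None  # iter 0 is BM25-only
        hits = search(variants, semantic)
        pool = merge(pool, hits)                 # running, stage-aware pool
        ranked = rerank(query, pool[:top_n])     # rerank only the top 20
        if grade(ranked, relevance_check):
            return ranked, iter_n + 1
        variants = variants + expand(query, variants)  # fail: more variants
    return ranked, max_iter
```

Injecting the callables keeps the sketch testable without Meilisearch or an LLM; the real loop is async and lives in `_aretrieve_documents`.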

Key design choices

Always keep the unfiltered arm: even when filter-intent fires, we still run broad search in parallel and merge both. When the intent LLM picks the wrong supplier_name value, broad_docs still has the right hits.
Stricter grader: 0.9 score floor (was 0.7) + 0.7 LLM confidence floor (was 0.6). Marginal pools now iterate instead of early-exiting.
Progressive semantic weight: iter 0 is BM25-only (if preprocess didn't set a ratio); iter 1+ blends in 0.5 semantic. Matches the ‘bm25 first, then hybrid, then more variants’ design.
Stage-aware merge (_merge_with_priority): later iters can only ADD candidates — the first 5 slots are reserved for earlier-stage hits, so expansion never demotes a correctly-surfaced doc.
Multi-search everywhere: every broad/filter/rescue call batches variants via Meilisearch /multi-search (one HTTP round-trip per stage).
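
To make the last two points concrete, here is a rough sketch of how one stage's broad and filter arms could be packed into a single Meilisearch `/multi-search` request body. `semantic_ratio` and `build_multi_search` are hypothetical helpers, and the index name, embedder name and filter value are placeholders. It assumes iter 0 uses the preprocess ratio when one is set and that every later iteration uses 0.5; the actual helper in this PR may differ.

```python
from typing import Any, Dict, List, Optional


def semantic_ratio(iter_n: int, preprocess_ratio: Optional[float] = None) -> Optional[float]:
    """iter 0: preprocess ratio if set, else BM25-only (None); iter 1+: 0.5."""
    if iter_n == 0:
        return preprocess_ratio
    return 0.5


def build_multi_search(
    index_uid: str,
    variants: List[str],
    ratio: Optional[float],
    filter_expr: Optional[str] = None,
    limit: int = 20,
    embedder: str = "default",  # assumption: name of the configured embedder
) -> Dict[str, Any]:
    """Build one /multi-search body covering the broad and filter arms."""
    queries: List[Dict[str, Any]] = []
    for q in variants:
        query: Dict[str, Any] = {
            "indexUid": index_uid,
            "q": q,
            "limit": limit,
            "showRankingScore": True,
        }
        if ratio is not None:
            query["hybrid"] = {"embedder": embedder, "semanticRatio": ratio}
        queries.append(query)  # broad arm: always present
        if filter_expr:
            queries.append({**query, "filter": filter_expr})  # filter arm
    return {"queries": queries}


# Example: two variants, filter intent detected, first iteration (BM25-only).
payload = build_multi_search(
    "articles",  # placeholder index name
    ["bieröffner", "flaschenöffner"],
    semantic_ratio(0),
    filter_expr='supplier_name = "ACME"',  # placeholder value
)
```

POSTing `payload` to `/multi-search` would then cover both arms for all variants in one HTTP round-trip.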

Eval

3-run means on tests.eval_v2 (40 OneTrade DE base cases + Article + Supplier):

Variant           Hit@5
baseline (main)   122/189 ±2
this PR           127/189 ±1

+5 hits, +2.7pp, first client-side win above noise floor since #18.

Latency: ~500-800ms per query on fast path (grader short-circuit). Worst case with 3 iters: ~4-6s.

Removed

_aescalate_if_needed helper (~80 LOC): absorbed into the loop
Fast-path with relevance-check gate: fast-accept was bypassing the filter too aggressively
HyDE task in _aretrieve_documents: _asearch handles HyDE internally

Net: -53 LOC (361 → 308), simpler control flow.

🤖 Generated with Claude Code

Restructures _aretrieve_documents' tail into a staged escalation:

  Stage 0 (always)   standard retrieval + filter + pin (unchanged)
  Stage 1 (on fail)  swarm_retrieve variants, merged with stage-0 via
                     _merge_with_priority so pinned hits stay in top-k

The grader (relevance_check) short-circuits on top-1 BM25 score ≥ 0.7 (cheap heuristic for confident matches) and only fails on confidently-negative LLM verdicts; marginal cases skip expansion. Updated grader prompt recognises synonyms/paraphrases/multilingual equivalents so "Bieröffner" ≈ "Flaschenöffner" counts as a match.

_merge_with_priority is a new helper: reserves the first `keep_top` slots for the prior stage, RRF-fuses remaining candidates from both stages. Prevents later-stage noise from demoting earlier hits.

Stage 2 (filter-free broad) intentionally omitted: regressed more cases than it helped on the current benchmark.

Eval: baseline 122/189 ±2 → with loop 123/189 ±2 (within noise but architecturally cleaner; handles synonym gaps without manual tuning).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
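
For readers unfamiliar with the idea, the `_merge_with_priority` behaviour described in this commit message (reserve the first `keep_top` slots, RRF-fuse the rest) could look roughly like the sketch below. It is not the helper from this PR: the function name, the use of plain document IDs, and the conventional RRF constant `k = 60` are all assumptions, and `prior` is assumed to contain no duplicates.

```python
from typing import Dict, List


def merge_with_priority(prior: List[str], new: List[str],
                        keep_top: int = 5, k: int = 60) -> List[str]:
    """Keep prior[:keep_top] in place, then RRF-fuse everything else."""
    head = prior[:keep_top]          # reserved slots for the earlier stage
    reserved = set(head)
    scores: Dict[str, float] = {}
    for ranking in (prior, new):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in reserved:
                continue             # already pinned in the head
            # Reciprocal rank fusion: sum 1 / (k + rank) over both rankings.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    tail = sorted(scores, key=lambda d: -scores[d])
    return head + tail


prior = ["a", "b", "c", "d", "e", "f", "g"]
new = ["x", "g", "a", "y"]
merged = merge_with_priority(prior, new)
# merged == ["a", "b", "c", "d", "e", "g", "x", "y", "f"]
```

In the example, `g` appears in both rankings and so leads the fused tail, while `a` through `e` stay untouched no matter what the later stage returns.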

Replaces the fan-out fast-path/filter-task/swarm/escalate pileup with a clean loop that matches the "smart iterative" mental model:

  1. preprocess ∥ detect_filter_intent (parallel LLM calls)
  2. seed variants = LLM synonyms + edit-distance-1 typo candidates
  3. for iter in range(max_iter):
     - broad search + (if intent) filter search, batched via Meilisearch multi-search, always keeping both arms so the unfiltered rescue stays alive even when filter fires
     - stage-aware merge into running pool (_merge_with_priority), take top 20, rerank, pin filter_docs to top-5
     - grader: top-1 score ≥ 0.9 short-circuit, else LLM relevance check (accept on makes_sense ∧ confidence ≥ 0.7)
     - pass → break; fail → LLM generates fresh variants, loop

Stricter grader (0.9 score floor, 0.7 LLM confidence) keeps the loop running on ambiguous pools instead of early-exiting. Always merging broad+filter catches cases where the intent LLM narrowed to the wrong supplier field.

Eval (3-run): baseline 122/189 ±2 → 127/189 ±1 (+5 hits, +2.7pp). Above noise floor, first real client-side win since #18.

Removes ~200 LOC of escalate/fast-path scaffolding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
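
The "edit-distance-1 typo candidates" in step 2 are not spelled out here. One common way to generate them is the classic single-edit expansion (delete, transpose, replace, insert) filtered against a known vocabulary, sketched below. `edits1`, `typo_candidates`, the alphabet and the `vocabulary` argument are assumptions for illustration; the PR's `_typo_candidates` may work differently. Note that counting a transposition as one edit follows the Damerau-Levenshtein convention.

```python
import string
from typing import Iterable, List, Set

# Assumption: lowercase ASCII plus German umlauts and ß for a DE catalogue.
ALPHABET = string.ascii_lowercase + "äöüß"


def edits1(word: str, alphabet: str = ALPHABET) -> Set[str]:
    """All strings exactly one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    inserts = {l + c + r for l, r in splits for c in alphabet}
    return (deletes | transposes | replaces | inserts) - {word}


def typo_candidates(query: str, vocabulary: Iterable[str]) -> List[str]:
    """Rewrite each unknown token to vocabulary words one edit away."""
    vocab = set(vocabulary)
    tokens = query.lower().split()
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if tok in vocab:
            continue  # token is already a known word, nothing to fix
        for fix in sorted(edits1(tok) & vocab):
            out.append(" ".join(tokens[:i] + [fix] + tokens[i + 1:]))
    return out


vocab = {"flaschenöffner", "rot"}
rewrites = typo_candidates("Flaschenöfner rot", vocab)
# rewrites == ["flaschenöffner rot"]
```

Filtering against a vocabulary keeps the variant count small enough to batch with the LLM synonyms in the same multi-search call.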

Per .agents/skills/deslop guidelines:

  - Shorter docstrings (remove rambling examples that belong in tests).
  - Drop nested _emit helper in _typo_candidates, inline early-return.
  - Collapse seen/skip pattern in _merge_with_priority.

No behaviour change.

dominikpeter and others added 3 commits April 21, 2026 07:56

dominikpeter wants to merge 3 commits into main from feat/iterative-retrieval-v2
