feat(app): maximize RAG context and improve chat UX #118

Merged
comppaz merged 1 commit into main from feat/rag-context-budget on Feb 19, 2026
Conversation

comppaz (Collaborator) commented on Feb 19, 2026

Summary

  • Dynamic context budget: Computes available context space from model window size, conversation history, and reserves — scales memory retrieval to fill available space instead of using a fixed 8k limit
  • Smarter filtering: Widens distance thresholds proportionally when budget allows, includes AI-generated summaries as overview/fallback for each memory, and subtracts page content from budget in extension chat
  • Improved system prompt: Rewrites prompt for better grounding (avoids hallucination) and natural source attribution (no raw type tags)
  • Chat UX fixes: Restores bullet point markers in markdown rendering, removes hover blur from assistant messages
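
The budget computation described above can be sketched as follows. compute_context_budget() is the real function name from this PR, but its signature, the token accounting, and the reserve default are assumptions for illustration, not the repo's actual code:

```python
# Hedged sketch of the dynamic context budget: subtract fixed costs
# (history, system prompt, a reply reserve) from the model window, and
# give whatever remains to memory retrieval instead of a fixed 8k limit.
def compute_context_budget(
    model_window_tokens: int,
    history_tokens: int,
    system_prompt_tokens: int,
    response_reserve_tokens: int = 1024,  # assumed reserve for the model's reply
) -> int:
    """Tokens left for retrieved memories after fixed prompt costs."""
    used = history_tokens + system_prompt_tokens + response_reserve_tokens
    return max(model_window_tokens - used, 0)  # never go negative

# A small local model yields a small budget; a large hosted model scales up.
small = compute_context_budget(8_192, history_tokens=2_000, system_prompt_tokens=500)
large = compute_context_budget(128_000, history_tokens=2_000, system_prompt_tokens=500)
```

Clamping at zero matters: with a long conversation on a small model, the fixed costs can exceed the window, and retrieval should then be skipped rather than overflow the prompt.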

Changed files

  • backend/app/services/embeddings/filtering.py: new compute_context_budget(), budget scaling in filter/format functions, structured logging
  • backend/app/services/embeddings/__init__.py: export compute_context_budget
  • backend/app/routes/chat.py: compute and pass budget in _retrieve_context()
  • backend/app/native_messaging.py: same budget pattern; subtract page content from budget
  • backend/app/services/ai/client.py: rewritten system prompt
  • app/src/index.css: add list-style-type to .chat-prose lists
  • app/src/components/ChatMessage.tsx: remove glass.hover from assistant messages

Test plan

  • Chat with Ollama (small model) — verify more context is retrieved and summaries appear in logs
  • Chat with OpenRouter (large model) — verify budget scales up and more memories are included
  • Check the "Context budget:" and "Filter stats:" log lines for correct values
  • Verify bullet points render correctly in chat responses
  • Verify no blur on hover over assistant messages
  • Test extension chat — verify page content is subtracted from budget
  • Verify LLM no longer outputs raw type tags like [web] or [video]
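
The extension-chat item above (page content subtracted from the budget) can be sketched as a minimal helper; the function name and numbers are hypothetical, and only the subtract-and-clamp behavior comes from the PR:

```python
# Hedged sketch: in extension chat the current page's content is injected
# into the prompt, so it must come out of the memory-retrieval budget to
# prevent overflowing the model window.
def extension_chat_budget(total_budget_tokens: int, page_content_tokens: int) -> int:
    """Budget left for retrieved memories after the page content is included."""
    return max(total_budget_tokens - page_content_tokens, 0)

remaining = extension_chat_budget(10_000, page_content_tokens=3_000)
```

A very long page can consume the whole budget, in which case retrieval is effectively disabled for that turn rather than pushing the prompt past the window.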

Commit message

Dynamic context budget system that scales with model context window:
- Compute available context budget from window size minus history/system/reserve
- Widen distance thresholds proportionally when budget allows more content
- Include AI-generated summaries as overview/fallback in formatted context
- Subtract page content from budget in extension chat to prevent overflow

Chat improvements:
- Rewrite system prompt for grounding and natural source attribution
- Fix missing list markers in chat markdown (add list-style-type)
- Remove hover blur from assistant message bubbles
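
The proportional threshold widening described above can be sketched as follows; the baseline threshold, ceiling, and scaling rule are illustrative assumptions, not the repo's actual constants:

```python
# Hedged sketch: when the budget exceeds the old fixed limit, relax the
# vector-distance cutoff proportionally so more (slightly less similar)
# memories qualify, capped so results stay relevant.
BASE_BUDGET_TOKENS = 8_192       # the old fixed limit mentioned in the summary
BASE_DISTANCE_THRESHOLD = 0.35   # assumed baseline distance cutoff
MAX_DISTANCE_THRESHOLD = 0.6     # assumed ceiling on how far to widen

def widen_threshold(budget_tokens: int) -> float:
    """Scale the distance cutoff with the budget, never below the baseline."""
    scale = max(budget_tokens / BASE_BUDGET_TOKENS, 1.0)  # small budgets don't tighten
    return min(BASE_DISTANCE_THRESHOLD * scale, MAX_DISTANCE_THRESHOLD)
```

The cap is the safety valve: a 128k-window model would otherwise widen the cutoff so far that retrieval returns mostly irrelevant memories.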
@comppaz comppaz merged commit 75fcea0 into main Feb 19, 2026
1 check failed
@comppaz comppaz deleted the feat/rag-context-budget branch February 19, 2026 08:06