MomoSolaris8/supportagent

SupportAgent

RAG pipeline over a real Confluence space + Jira project, for the "Insurance Knowledge Search Dashboard" portfolio project. See architecture-proposal-v0.1.de.md for the full design.

Architecture (Prototype / MVP Deployment)

graph TD
    User[👤 Browser]

    subgraph SC["🌐 Frontend"]
        Dashboard[dashboard.py<br/>Streamlit]
    end

    subgraph VC["⚡ Backend API"]
        API[api.py<br/>POST /ask]
        Retrieval[retrieval.py<br/>retrieve]
        Answer[answer.py<br/>generate_answer]
    end

    subgraph SB["🗄️ Database"]
        PG[(Postgres + pgvector)]
    end

    subgraph DS["🤖 AI Services"]
        Embed[text-embedding-v3]
        Chat[qwen3-max]
    end

    User -->|ask question| Dashboard
    Dashboard -->|POST /ask| API

    API --> Retrieval
    Retrieval --> Embed
    Embed --> Retrieval

    Retrieval --> PG
    PG --> Retrieval

    Retrieval --> API

    API --> Answer
    Answer --> Chat
    Chat --> Answer

    Answer --> API
    API --> Dashboard
    Dashboard --> User

    %% Colors
    classDef frontend fill:#4f46e5,color:#fff
    classDef backend fill:#059669,color:#fff
    classDef database fill:#dc2626,color:#fff
    classDef ai fill:#ea580c,color:#fff
    classDef user fill:#6b7280,color:#fff

    class Dashboard frontend
    class API,Retrieval,Answer backend
    class PG database
    class Embed,Chat ai
    class User user
  • api.py is a thin controller: it just calls retrieve() then generate_answer().
  • Two external calls go to DashScope: one to embed the question, one to generate the final answer from the retrieved chunks.
  • The Streamlit dashboard calls the Vercel-hosted API server-to-server (requests.post). The request comes from the Streamlit server process rather than from the browser, so no CORS configuration is needed.
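The thin-controller shape described above can be sketched as follows. This is an illustration only, not the repository's api.py: the `ask` function, the `Chunk` type, the response keys, and the two stand-in functions are assumptions made so the example runs offline without FastAPI or DashScope.

```python
from typing import Callable

# Hypothetical chunk shape: a dict with "text" and "url" keys.
Chunk = dict[str, str]


def ask(
    question: str,
    retrieve: Callable[[str], list[Chunk]],
    generate_answer: Callable[[str, list[Chunk]], str],
) -> dict:
    """Thin controller: retrieve chunks, then generate an answer from them."""
    chunks = retrieve(question)
    answer = generate_answer(question, chunks)
    return {"answer": answer, "sources": chunks}


# Stand-ins for retrieval.py and answer.py (assumed behavior, no network calls).
def fake_retrieve(question: str) -> list[Chunk]:
    return [{"text": "Hausrat covers theft.", "url": "https://example.invalid/page/1"}]


def fake_generate_answer(question: str, chunks: list[Chunk]) -> str:
    return f"{chunks[0]['text']} [1]"


result = ask("Was deckt die Hausratversicherung?", fake_retrieve, fake_generate_answer)
print(result["answer"])
```

Passing the two steps in as callables keeps the controller free of any knowledge about embeddings, the database, or the chat model, which is what makes it easy to test in isolation.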

Screenshots

Dashboard: ask a German question and get an answer with cited, expandable sources.

[Screenshot: dashboard with cited sources]

[Screenshot: dashboard answering a general "versicherung" query]

/ask API schema (FastAPI Swagger UI):

[Screenshot: FastAPI /ask schema]

The real Jira project (KAN) backing the "documentation gap" tickets used as sources:

[Screenshot: Jira board]

Setup

python -m venv .venv
source .venv/bin/activate
python -m pip install -e .
docker compose up -d postgres

Copy .env.example to .env and fill in:

  • ATLASSIAN_BASE_URL, ATLASSIAN_EMAIL, ATLASSIAN_API_TOKEN - site URL, account email, and API token for Confluence/Jira Cloud
  • CONFLUENCE_SPACE_KEY, JIRA_PROJECT_KEY - the space/project to read from and write to
  • EMBEDDING_API_KEY - Alibaba Cloud Model Studio (DashScope) API key, used via its OpenAI-compatible endpoint (EMBEDDING_BASE_URL) for both embeddings and chat (CHAT_MODEL)
  • DATABASE_URL - points at the pgvector container started by docker compose up
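Before running the pipeline it can help to check that every variable listed above is set. The following is a minimal sketch, not part of the repository: the helper name `missing_settings` is hypothetical, and treating all nine variables as required is an assumption (the project may define defaults for some of them).

```python
import os
from typing import Mapping

# Variable names taken from the list above; "all required" is an assumption.
REQUIRED = [
    "ATLASSIAN_BASE_URL",
    "ATLASSIAN_EMAIL",
    "ATLASSIAN_API_TOKEN",
    "CONFLUENCE_SPACE_KEY",
    "JIRA_PROJECT_KEY",
    "EMBEDDING_API_KEY",
    "EMBEDDING_BASE_URL",
    "CHAT_MODEL",
    "DATABASE_URL",
]


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All settings present.")
```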

Pipeline

# 1. Seed the Confluence space + Jira project with sample insurance content
python -m supportagent.seed

# 2. Pull real Confluence pages (tagged "insurance-kb") + Jira issues, normalize to Documents
python -m supportagent.ingest

# 3. Chunk -> embed -> store in pgvector
python -m supportagent.index
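
Step 3 begins by splitting each document into chunks. One common approach is fixed-size character windows with overlap, sketched below. The function name `chunk_text` and the default `size` and `overlap` values are hypothetical, and the repository's chunking.py may split differently (for example by heading or sentence).

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


print(chunk_text("abcdefghij", size=4, overlap=2))
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.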

RAG Answer API

uvicorn supportagent.api:app --reload

POST /ask with {"question": "..."} retrieves relevant chunks from pgvector, generates a German answer with citations ([1], [2], ...), and returns the cited sources. If the retrieved context doesn't support an answer, it returns a fixed controlled-refusal message instead.
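
To show how the [1], [2], ... markers can be mapped back to sources, here is a minimal sketch. It assumes that marker [n] refers to the n-th retrieved chunk; the function name `cited_sources` is hypothetical and the actual API may resolve citations differently.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")


def cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Return the sources whose 1-based index is cited in the answer."""
    numbers = sorted({int(n) for n in CITATION_RE.findall(answer)})
    # Ignore markers that point outside the retrieved list.
    return [sources[n - 1] for n in numbers if 1 <= n <= len(sources)]


sources = ["Confluence: Hausrat", "Jira: KAN-1", "Confluence: Haftpflicht"]
print(cited_sources("Diebstahl ist versichert [2], siehe auch [1] und [2].", sources))
```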

Evaluation

python -m supportagent.eval

Runs a small set of German questions (eval_questions.py) covering single-source retrieval, multi-source synthesis, conflicting sources, terminology robustness, and controlled refusal, and prints a pass/fail report against the live pipeline.
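
A pass/fail report of this kind can be sketched as follows. The `EvalCase` fields, the substring pass criterion, and the stub `ask` function are all assumptions for illustration; the real eval_questions.py and its checks may look different.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    category: str
    must_contain: str  # hypothetical pass criterion: substring of the answer


def run_eval(cases: list[EvalCase], ask: Callable[[str], str]) -> list[tuple[EvalCase, bool]]:
    """Ask each question and check the answer against its criterion."""
    return [(c, c.must_contain.lower() in ask(c.question).lower()) for c in cases]


def format_report(results: list[tuple[EvalCase, bool]]) -> str:
    lines = [f"{'PASS' if ok else 'FAIL'}  [{c.category}] {c.question}" for c, ok in results]
    passed = sum(ok for _, ok in results)
    lines.append(f"{passed}/{len(results)} passed")
    return "\n".join(lines)


# Stub standing in for the live pipeline.
def stub_ask(question: str) -> str:
    return "Diebstahl ist versichert [1]." if "Diebstahl" in question else "Keine Angabe."


cases = [
    EvalCase("Ist Diebstahl versichert?", "single-source", "versichert"),
    EvalCase("Wie hoch ist die Selbstbeteiligung?", "single-source", "Selbstbeteiligung"),
]
report = format_report(run_eval(cases, stub_ask))
print(report)
```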

Dashboard

streamlit run supportagent/dashboard.py

A simple chat UI on top of /ask (run uvicorn first, see above): ask a German question, filter by source (Confluence/Jira/all) in the sidebar, and expand each cited source to preview its content and open the original Confluence page or Jira issue. Set API_BASE_URL if the API isn't on http://localhost:8000.
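
For reference, a request to /ask that honors API_BASE_URL can be built like this. The sketch uses only the standard library (the dashboard itself uses requests.post), and the helper name `build_ask_request` is hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def build_ask_request(question: str, base_url: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /ask request; fall back to API_BASE_URL, then localhost."""
    base = base_url or os.environ.get("API_BASE_URL", "http://localhost:8000")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ask_request("Was deckt die Hausratversicherung?", "http://localhost:8000/")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(req)` while the API is running.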

PDF data prep

pdf_to_confluence.py extracts §-numbered sections from German insurance terms PDFs (Musterbedingungen/AVB) into Confluence page drafts. See the module docstring for the dry-run / save workflow.
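
The core idea of splitting on §-numbered headings can be sketched with a regular expression over already-extracted text. This is not the module's actual code: the pattern, the `split_sections` name, and the returned dict keys are assumptions, and real PDFs usually need extra cleanup (page headers, hyphenation) first.

```python
import re

# Matches headings such as "§ 1 Versicherte Sachen" at the start of a line.
SECTION_RE = re.compile(r"^§\s*(\d+[a-z]?)\s+(.*)$", re.MULTILINE)


def split_sections(text: str) -> list[dict]:
    """Split text into sections, one per §-numbered heading."""
    matches = list(SECTION_RE.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "number": m.group(1),
            "title": m.group(2).strip(),
            "body": text[m.end():end].strip(),
        })
    return sections


sample = (
    "§ 1 Versicherte Sachen\n"
    "Versichert ist der Hausrat.\n"
    "§ 2 Ausschlüsse\n"
    "Nicht versichert sind Schäden durch Krieg, vgl. § 1.\n"
)
for section in split_sections(sample):
    print(section["number"], section["title"])
```

Anchoring the pattern at the start of a line keeps inline references such as "vgl. § 1" from being mistaken for new headings.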

Project layout

  • models.py - shared Document contract
  • html_utils.py, adf_utils.py - Confluence storage-format HTML and Jira ADF conversions
  • atlassian_client.py - real Confluence v2 / Jira v3 REST client
  • seed_content.py, seed.py - sample data + script to create it in Confluence/Jira
  • ingest.py - pulls real data back out and normalizes it to Document
  • chunking.py, embeddings.py, vector_store.py, index.py - chunk/embed/store pipeline

Tests

python -m pytest

About

RAG pipeline + dashboard over Confluence/Jira for German insurance knowledge search (portfolio project).

  • https://supportagent-hazel.vercel.app
  • https://supportagent-9nsokjuqhdotpxkgn7qoqc.streamlit.app/
