SupportAgent

RAG pipeline over a real Confluence space + Jira project, for the "Insurance Knowledge Search Dashboard" portfolio project. See architecture-proposal-v0.1.de.md (German) or architecture-proposal-v0.1.en.md (English) for the full design.

Architecture (Prototype / MVP Deployment)

graph TD
    User[👤 Browser]

    subgraph SC["🌐 Frontend"]
        Dashboard[dashboard.py<br/>Streamlit]
    end

    subgraph VC["⚡ Backend API"]
        API[api.py<br/>POST /ask]
        Retrieval[retrieval.py<br/>retrieve]
        Answer[answer.py<br/>generate_answer]
    end

    subgraph SB["🗄️ Database"]
        PG[(Postgres + pgvector)]
    end

    subgraph DS["🤖 AI Services"]
        Embed[text-embedding-v3]
        Chat[qwen3-max]
    end

    User -->|ask question| Dashboard
    Dashboard -->|POST /ask| API

    API --> Retrieval
    Retrieval --> Embed
    Embed --> Retrieval

    Retrieval --> PG
    PG --> Retrieval

    Retrieval --> API

    API --> Answer
    Answer --> Chat
    Chat --> Answer

    Answer --> API
    API --> Dashboard
    Dashboard --> User

    %% Colors
    classDef frontend fill:#4f46e5,color:#fff
    classDef backend fill:#059669,color:#fff
    classDef database fill:#dc2626,color:#fff
    classDef ai fill:#ea580c,color:#fff
    classDef user fill:#6b7280,color:#fff

    class Dashboard frontend
    class API,Retrieval,Answer backend
    class PG database
    class Embed,Chat ai
    class User user

api.py is a thin controller: it just calls retrieve() then generate_answer().
Two external calls go to DashScope: one to embed the question, one to generate the final answer from the retrieved chunks.
The Streamlit dashboard calls the Vercel API server-to-server (requests.post), so no CORS configuration is needed: CORS is enforced by browsers, and the browser only ever talks to Streamlit.
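A minimal sketch of that thin-controller wiring, using only the standard library so it runs offline. The `Chunk` type, the `ask()` helper, and the exact signatures of `retrieve()` and `generate_answer()` are assumptions for illustration, not the project's code; in the real api.py the same two calls sit behind the FastAPI `POST /ask` route.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    """Hypothetical retrieved chunk: its text and a link to the original page/issue."""
    text: str
    url: str


# Assumed signatures of the two steps api.py delegates to.
RetrieveFn = Callable[[str], list[Chunk]]
AnswerFn = Callable[[str, list[Chunk]], dict]


def ask(question: str, retrieve: RetrieveFn, generate_answer: AnswerFn) -> dict:
    """Thin controller: no logic of its own, just retrieve, then generate."""
    chunks = retrieve(question)  # real version: embed the question (DashScope) + pgvector search
    return generate_answer(question, chunks)  # real version: chat completion (DashScope)


# Fake steps so the wiring can be exercised without DashScope or Postgres.
def fake_retrieve(question: str) -> list[Chunk]:
    return [Chunk(text="Beispieltext", url="https://example.invalid/page/1")]


def fake_generate_answer(question: str, chunks: list[Chunk]) -> dict:
    return {"answer": "Beispielantwort [1]", "sources": [c.url for c in chunks]}


result = ask("Beispielfrage?", fake_retrieve, fake_generate_answer)
print(result["sources"])  # ['https://example.invalid/page/1']
```

Passing the two steps in as arguments is a choice made for this sketch: it keeps the controller free of network and database access, so the wiring can be tested with fakes as shown.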

Screenshots

Dashboard: ask a German question and get an answer with cited, expandable sources.

/ask API schema (FastAPI Swagger UI):

The real Jira project (KAN) backing the "documentation gap" tickets used as sources:

Setup

python -m venv .venv
source .venv/bin/activate
python -m pip install -e .
docker compose up -d postgres

Copy .env.example to .env and fill in:

ATLASSIAN_BASE_URL, ATLASSIAN_EMAIL, ATLASSIAN_API_TOKEN - Confluence/Jira Cloud site URL, account email, and API token
CONFLUENCE_SPACE_KEY, JIRA_PROJECT_KEY - the space/project to read from and write to
EMBEDDING_API_KEY - Alibaba Cloud Model Studio (DashScope) API key, used via its OpenAI-compatible endpoint (EMBEDDING_BASE_URL) for both embeddings and chat (CHAT_MODEL)
DATABASE_URL - points at the pgvector container started by docker compose up
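A sketch of failing fast on an incomplete .env, assuming the values have already been loaded into the process environment (how the project loads .env is not shown here). The helper names are hypothetical, and treating `EMBEDDING_BASE_URL` and `CHAT_MODEL` as required is an assumption; the project may ship defaults for them.

```python
import os
from typing import Mapping

# Variables named in this README; which ones are truly required is an assumption.
REQUIRED_SETTINGS = (
    "ATLASSIAN_BASE_URL",
    "ATLASSIAN_EMAIL",
    "ATLASSIAN_API_TOKEN",
    "CONFLUENCE_SPACE_KEY",
    "JIRA_PROJECT_KEY",
    "EMBEDDING_API_KEY",
    "EMBEDDING_BASE_URL",
    "CHAT_MODEL",
    "DATABASE_URL",
)


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical loader: raise one clear error listing every missing variable."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}
```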

Pipeline

# 1. Seed the Confluence space + Jira project with sample insurance content
python -m supportagent.seed

# 2. Pull real Confluence pages (tagged "insurance-kb") + Jira issues, normalize to Documents
python -m supportagent.ingest

# 3. Chunk -> embed -> store in pgvector
python -m supportagent.index
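The chunk step can be pictured as a sliding character window with overlap, so that text near a chunk boundary appears in both neighbouring chunks. This is a sketch under assumptions: chunking.py may split on headings, paragraphs, or tokens instead, and `max_chars=800` / `overlap=100` are illustrative values, not the project's settings.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of at most max_chars, each sharing `overlap` chars with the previous one."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks: list[str] = []
    step = max_chars - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):  # last window reached the end of the text
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```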

RAG Answer API

uvicorn supportagent.api:app --reload

POST /ask with {"question": "..."} retrieves relevant chunks from pgvector, generates a German answer with citations ([1], [2], ...), and returns the cited sources. If the retrieved context doesn't support an answer, it returns a fixed controlled-refusal message instead.
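One way to implement numbered citations and the controlled refusal: number the retrieved chunks before they go into the prompt, then map only valid `[n]` markers in the generated answer back to sources. Treating an answer with no valid citation as unsupported is an assumption made for this sketch (the project may decide differently, e.g. via a retrieval score threshold or the model's own judgement), and `REFUSAL_MESSAGE` is a placeholder, not the project's actual text.

```python
import re

# Placeholder text; the project's actual refusal message is not reproduced here.
REFUSAL_MESSAGE = "Keine ausreichende Grundlage in den Quellen."

CITATION = re.compile(r"\[(\d+)\]")


def build_context(chunks: list[str]) -> str:
    """Number the retrieved chunks so the model can cite them as [1], [2], ..."""
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))


def cited_indices(answer: str, n_chunks: int) -> list[int]:
    """Distinct citation numbers that point at a real chunk, in order of first use."""
    seen: list[int] = []
    for match in CITATION.finditer(answer):
        idx = int(match.group(1))
        if 1 <= idx <= n_chunks and idx not in seen:
            seen.append(idx)
    return seen


def finalize(answer: str, sources: list[str]) -> dict:
    """Return only the cited sources, or the refusal if nothing valid was cited."""
    cited = cited_indices(answer, len(sources))
    if not cited:  # assumption: no valid citation means the context did not support an answer
        return {"answer": REFUSAL_MESSAGE, "sources": []}
    return {"answer": answer, "sources": [sources[i - 1] for i in cited]}
```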

Evaluation

python -m supportagent.eval

Runs a small set of German questions (eval_questions.py) covering single-source retrieval, multi-source synthesis, conflicting sources, terminology robustness, and controlled refusal, and prints a pass/fail report against the live pipeline.
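A minimal sketch of how such a pass/fail check could be structured. The `EvalCase` fields, the source-matching rule, and the report format are assumptions for illustration; eval_questions.py may encode expectations differently (for example, expected keywords instead of expected source IDs).

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """Hypothetical test case: a question plus what a correct run should show."""
    question: str
    category: str  # e.g. "single-source", "multi-source", "refusal"
    expect_refusal: bool = False
    expected_sources: set[str] = field(default_factory=set)


@dataclass
class EvalResult:
    case: EvalCase
    passed: bool
    reason: str


def check(case: EvalCase, answer: str, sources: list[str], refusal_message: str) -> EvalResult:
    """Compare one live /ask response against the case's expectations."""
    refused = answer == refusal_message
    if case.expect_refusal:
        return EvalResult(case, refused, "refused" if refused else "answered instead of refusing")
    if refused:
        return EvalResult(case, False, "unexpected refusal")
    missing = case.expected_sources - set(sources)
    if missing:
        return EvalResult(case, False, "missing sources: " + ", ".join(sorted(missing)))
    return EvalResult(case, True, "ok")


def report(results: list[EvalResult]) -> str:
    """One PASS/FAIL line per case, then a summary line."""
    lines = [
        f"{'PASS' if r.passed else 'FAIL'} [{r.case.category}] {r.case.question} ({r.reason})"
        for r in results
    ]
    passed = sum(r.passed for r in results)
    lines.append(f"{passed}/{len(results)} passed")
    return "\n".join(lines)
```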

Dashboard

streamlit run supportagent/dashboard.py

A simple chat UI on top of /ask (run uvicorn first, see above): ask a German question, filter by source (Confluence/Jira/all) in the sidebar, and expand each cited source to preview its content and open the original Confluence page or Jira issue. Set API_BASE_URL if the API isn't on http://localhost:8000.
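The dashboard itself uses `requests.post`; the sketch below builds the same `POST /ask` call with the standard library's `urllib.request` so it has no third-party dependency. It shows the `API_BASE_URL` fallback to `http://localhost:8000`; the function names are hypothetical, and the decoded response is whatever JSON `/ask` returns.

```python
import json
import os
import urllib.request
from typing import Optional

DEFAULT_API_BASE_URL = "http://localhost:8000"


def build_ask_request(question: str, base_url: Optional[str] = None) -> urllib.request.Request:
    """Build POST {API_BASE_URL}/ask with a JSON body {"question": ...}."""
    base = base_url or os.environ.get("API_BASE_URL") or DEFAULT_API_BASE_URL
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base.rstrip("/") + "/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_api(question: str, base_url: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs the API to be running)."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `ask_api(...)` only works once uvicorn is serving the API; `build_ask_request(...)` can be inspected without a server.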

PDF data prep

pdf_to_confluence.py extracts §-numbered sections from German insurance terms PDFs (Musterbedingungen/AVB) into Confluence page drafts. See the module docstring for the dry-run / save workflow.
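A sketch of the section-splitting idea, assuming the PDF text has already been extracted to a string and that each section heading starts its own line as `§ <number> <title>` (for example `§ 1 Gegenstand`). The regex and the `split_sections()` helper are hypothetical, not pdf_to_confluence.py's actual code.

```python
import re

# Assumed heading shape: "§", optional space, a number (optionally with a letter), then a title.
SECTION_HEADING = re.compile(r"^§\s*(\d+[a-z]?)\s+(.+)$", re.MULTILINE)


def split_sections(text: str) -> list[dict]:
    """Split extracted AVB text into §-numbered sections; text before the first § is ignored."""
    matches = list(SECTION_HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "number": m.group(1),
            "title": m.group(2).strip(),
            "body": text[m.end():end].strip(),  # everything up to the next heading
        })
    return sections
```

Anchoring the pattern at line starts skips inline cross-references such as "gemäß § 5", but a body line that happens to begin with "§ 5 ..." would still be read as a heading, so real PDFs likely need extra checks.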

Project layout

models.py - shared Document contract
html_utils.py, adf_utils.py - Confluence storage-format HTML and Jira ADF conversions
atlassian_client.py - real Confluence v2 / Jira v3 REST client
seed_content.py, seed.py - sample data + script to create it in Confluence/Jira
ingest.py - pulls real data back out and normalizes it to Document
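chunking.py, embeddings.py, vector_store.py, index.py - chunk/embed/store pipeline
retrieval.py, answer.py, api.py - retrieval, answer generation, and the FastAPI /ask endpoint
dashboard.py - Streamlit chat UI
eval.py, eval_questions.py - evaluation runner and question set
pdf_to_confluence.py - PDF-to-Confluence page drafts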
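Jira Cloud's v3 REST API returns rich text such as issue descriptions in Atlassian Document Format (ADF), a JSON tree of typed nodes. A minimal sketch of flattening ADF to plain text, the kind of step adf_utils.py covers, is shown below. The function names are hypothetical, and the set of node types treated as line-ending blocks is a simplification; real ADF also has tables, panels, mentions, and more.

```python
from typing import Any

# Node types that end a line of text; a simplification of the full ADF node set.
BLOCK_TYPES = {"paragraph", "heading", "listItem", "codeBlock", "blockquote", "rule"}


def adf_to_text(node: dict[str, Any]) -> str:
    """Recursively flatten an ADF node (e.g. a Jira issue description) to plain text."""
    node_type = node.get("type")
    if node_type == "text":
        return node.get("text", "")  # marks such as bold or links are dropped
    if node_type == "hardBreak":
        return "\n"
    inner = "".join(adf_to_text(child) for child in node.get("content", []))
    if node_type in BLOCK_TYPES:
        return inner.strip() + "\n"
    return inner  # containers like "doc" or "bulletList" just concatenate their children


def adf_document_to_text(doc: dict[str, Any]) -> str:
    return adf_to_text(doc).strip()
```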

Tests

python -m pytest
