RAG pipeline over a real Confluence space + Jira project, for the
"Insurance Knowledge Search Dashboard" portfolio project. See
architecture-proposal-v0.1.de.md for the full design.
graph TD
User[👤 Browser]
subgraph SC["🌐 Frontend"]
Dashboard[dashboard.py<br/>Streamlit]
end
subgraph VC["⚡ Backend API"]
API[api.py<br/>POST /ask]
Retrieval[retrieval.py<br/>retrieve]
Answer[answer.py<br/>generate_answer]
end
subgraph SB["🗄️ Database"]
PG[(Postgres + pgvector)]
end
subgraph DS["🤖 AI Services"]
Embed[text-embedding-v3]
Chat[qwen3-max]
end
User -->|ask question| Dashboard
Dashboard -->|POST /ask| API
API --> Retrieval
Retrieval --> Embed
Embed --> Retrieval
Retrieval --> PG
PG --> Retrieval
Retrieval --> API
API --> Answer
Answer --> Chat
Chat --> Answer
Answer --> API
API --> Dashboard
Dashboard --> User
%% Colors
classDef frontend fill:#4f46e5,color:#fff
classDef backend fill:#059669,color:#fff
classDef database fill:#dc2626,color:#fff
classDef ai fill:#ea580c,color:#fff
classDef user fill:#6b7280,color:#fff
class Dashboard frontend
class API,Retrieval,Answer backend
class PG database
class Embed,Chat ai
class User user
- api.py is a thin controller: it only calls retrieve() and then generate_answer().
- Two external calls go to DashScope: one to embed the question, one to generate the final answer from the retrieved chunks.
- The Streamlit dashboard calls the Vercel API server-to-server (requests.post), so no CORS configuration is needed.
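
The request flow in the diagram can be sketched as a plain function. This is a minimal illustration, not the project's code: the Chunk type, the function signatures, and the stub implementations are assumptions. Only the names retrieve and generate_answer come from the modules above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    """Hypothetical shape of a retrieved chunk (not the project's real model)."""
    source: str
    text: str


def ask(
    question: str,
    retrieve: Callable[[str], list[Chunk]],
    generate_answer: Callable[[str, list[Chunk]], str],
) -> dict:
    """Thin controller: retrieve first, then generate; no other logic lives here."""
    chunks = retrieve(question)
    answer = generate_answer(question, chunks)
    return {"answer": answer, "sources": [c.source for c in chunks]}


# Stubs standing in for the embedding + pgvector lookup and the chat call.
def fake_retrieve(question: str) -> list[Chunk]:
    return [Chunk(source="confluence:123", text="Hausrat covers theft.")]


def fake_generate_answer(question: str, chunks: list[Chunk]) -> str:
    return f"{chunks[0].text} [1]"


result = ask("Ist Diebstahl versichert?", fake_retrieve, fake_generate_answer)
print(result)
```

Passing the two steps in as callables keeps the controller trivially testable without a database or a DashScope key.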
Dashboard: ask a German question and get an answer with cited, expandable sources.
/ask API schema (FastAPI Swagger UI):
The real Jira project (KAN) backing the "documentation gap" tickets used as sources:
python -m venv .venv
source .venv/bin/activate
python -m pip install -e .
docker compose up -d postgres

Copy .env.example to .env and fill in:

- ATLASSIAN_BASE_URL, ATLASSIAN_EMAIL, ATLASSIAN_API_TOKEN - Confluence/Jira Cloud API token
- CONFLUENCE_SPACE_KEY, JIRA_PROJECT_KEY - the space/project to read from and write to
- EMBEDDING_API_KEY - Alibaba Cloud Model Studio (DashScope) API key, used via its OpenAI-compatible endpoint (EMBEDDING_BASE_URL) for both embeddings and chat (CHAT_MODEL)
- DATABASE_URL - points at the pgvector container started by docker compose up
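
A quick way to catch an incomplete .env before running the pipeline is to check the environment up front. The helper below is a hypothetical sketch, not part of the package. It assumes the variables have already been loaded into the process environment, and it treats every variable named above as required, which may be stricter than the real code (some may have defaults).

```python
import os
from typing import Mapping

# Variable names taken from the list above; treating all as required is an assumption.
REQUIRED_VARS = (
    "ATLASSIAN_BASE_URL",
    "ATLASSIAN_EMAIL",
    "ATLASSIAN_API_TOKEN",
    "CONFLUENCE_SPACE_KEY",
    "JIRA_PROJECT_KEY",
    "EMBEDDING_API_KEY",
    "EMBEDDING_BASE_URL",
    "CHAT_MODEL",
    "DATABASE_URL",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {"ATLASSIAN_BASE_URL": "https://example.atlassian.net", "CHAT_MODEL": " "}
print(missing_settings(example_env))
```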
# 1. Seed the Confluence space + Jira project with sample insurance content
python -m supportagent.seed
# 2. Pull real Confluence pages (tagged "insurance-kb") + Jira issues, normalize to Documents
python -m supportagent.ingest
# 3. Chunk -> embed -> store in pgvector
python -m supportagent.index

uvicorn supportagent.api:app --reload

POST /ask with {"question": "..."} retrieves relevant chunks from pgvector,
generates a German answer with citations ([1], [2], ...), and returns the
cited sources. If the retrieved context doesn't support an answer, it returns
a fixed controlled-refusal message instead.
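
The citation handling described above can be sketched as follows. This is an illustration under stated assumptions, not the code in answer.py: the refusal text is a placeholder, and triggering the refusal when the answer cites no valid source is only one possible rule.

```python
import re

# Hypothetical refusal text; the project's actual fixed message is not shown here.
REFUSAL = "Dazu liegen mir keine ausreichenden Informationen vor."


def cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Return the sources referenced by [n] markers, in first-citation order."""
    seen: list[int] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        # Ignore markers that point outside the retrieved list, and repeats.
        if 1 <= n <= len(sources) and n not in seen:
            seen.append(n)
    return [sources[n - 1] for n in seen]


def build_response(answer: str, sources: list[str]) -> dict:
    """Refuse when nothing valid is cited; otherwise return the answer and its sources."""
    cited = cited_sources(answer, sources)
    if not cited:
        return {"answer": REFUSAL, "sources": []}
    return {"answer": answer, "sources": cited}


retrieved = ["confluence:101", "jira:KAN-7", "confluence:205"]
print(build_response("Laut [2] und [1] ist Diebstahl versichert.", retrieved))
print(build_response("Das weiß ich nicht.", retrieved))
```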
python -m supportagent.eval

Runs a small set of German questions (eval_questions.py) covering
single-source retrieval, multi-source synthesis, conflicting sources,
terminology robustness, and controlled refusal, and prints a pass/fail
report against the live pipeline.
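
A pass/fail harness of this kind might look roughly like the sketch below. The EvalCase fields, the refusal check (an empty source list), and the report layout are all assumptions. The real eval_questions.py and its checks may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """Hypothetical case shape; the real eval_questions.py may differ."""
    category: str
    question: str
    expect_refusal: bool


def run_eval(cases: list[EvalCase], ask: Callable[[str], dict]) -> list[tuple[EvalCase, bool]]:
    """Call the pipeline once per case and record whether refusal behaviour matched."""
    results = []
    for case in cases:
        response = ask(case.question)
        refused = not response["sources"]  # assumption: a refusal returns no sources
        results.append((case, refused == case.expect_refusal))
    return results


def format_report(results: list[tuple[EvalCase, bool]]) -> str:
    """Render one PASS/FAIL line per case plus a summary line."""
    lines = [f"{'PASS' if ok else 'FAIL'}  {case.category}: {case.question}" for case, ok in results]
    passed = sum(ok for _, ok in results)
    lines.append(f"{passed}/{len(results)} passed")
    return "\n".join(lines)


# Stub pipeline that always answers with one source, so the refusal case fails.
def stub_ask(question: str) -> dict:
    return {"answer": "...", "sources": ["confluence:1"]}


cases = [
    EvalCase("single-source", "Was deckt die Hausratversicherung?", False),
    EvalCase("controlled refusal", "Wie wird das Wetter morgen?", True),
]
report = format_report(run_eval(cases, stub_ask))
print(report)
```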
streamlit run supportagent/dashboard.py

A simple chat UI on top of /ask (run uvicorn first, see above): ask a
German question, filter by source (Confluence/Jira/all) in the sidebar, and
expand each cited source to preview its content and open the original
Confluence page or Jira issue. Set API_BASE_URL if the API isn't on
http://localhost:8000.
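
As a rough sketch of the client side, the two helpers below resolve the endpoint from API_BASE_URL and filter the returned sources. Both function names and the "type" key on each source are hypothetical. Whether dashboard.py filters client-side or passes the filter to the API is not stated here.

```python
import os
from typing import Mapping


def ask_url(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the /ask endpoint, defaulting to the local uvicorn server."""
    base = env.get("API_BASE_URL", "http://localhost:8000")
    return base.rstrip("/") + "/ask"


def filter_sources(sources: list[dict], selected: str) -> list[dict]:
    """Keep sources of the selected type; 'all' keeps everything."""
    if selected == "all":
        return sources
    return [s for s in sources if s["type"] == selected]


# With the requests library installed, the call itself would be along the lines of:
#   requests.post(ask_url(), json={"question": question}, timeout=30)
sources = [
    {"type": "confluence", "title": "Hausrat AVB"},
    {"type": "jira", "title": "KAN-7"},
]
print(ask_url({}))
print(filter_sources(sources, "jira"))
```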
pdf_to_confluence.py extracts §-numbered sections from German insurance
terms PDFs (Musterbedingungen/AVB) into Confluence page drafts. See the
module docstring for the dry-run / save workflow.
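
Splitting on § headers can be illustrated with a small regex. This sketch assumes the PDF text has already been extracted to a string and that each section starts on its own line as "§ <number> <title>". The actual rules in pdf_to_confluence.py may be more involved.

```python
import re

# A section starts at a line beginning with "§", a number (optionally with a
# letter suffix such as 5a), and a title on the same line.
SECTION_HEADER = re.compile(r"^§[ \t]*(\d+[a-z]?)[ \t]+(.+)$", re.MULTILINE)


def split_sections(text: str) -> list[dict]:
    """Split extracted text into {number, title, body} dicts, one per § header."""
    matches = list(SECTION_HEADER.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # The body runs up to the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "number": m.group(1),
            "title": m.group(2).strip(),
            "body": text[m.end():end].strip(),
        })
    return sections


sample = """§ 1 Versicherte Gefahren
Der Versicherer leistet Entschädigung.
§ 2 Ausschlüsse
Nicht versichert sind Schäden durch Krieg.
"""
for section in split_sections(sample):
    print(section["number"], section["title"])
```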
- models.py - shared Document contract
- html_utils.py, adf_utils.py - Confluence storage-format HTML and Jira ADF conversions
- atlassian_client.py - real Confluence v2 / Jira v3 REST client
- seed_content.py, seed.py - sample data + script to create it in Confluence/Jira
- ingest.py - pulls real data back out and normalizes it to Document
- chunking.py, embeddings.py, vector_store.py, index.py - chunk/embed/store pipeline
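
For orientation, the chunking step of such a pipeline is often a sliding window with overlap. The function below is a generic sketch of that technique, not the strategy used in chunking.py. The character-based window and the default sizes are assumptions.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters, each sharing `overlap` with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once the window reaches the end, so no redundant tail chunk is emitted.
        if start + size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("0123456789", size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.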
python -m pytest


