
🪨 Quarry

AI Knowledge Infrastructure Platform

Python · FastAPI · PostgreSQL · pgvector · Redis · Docker


📌 Quickstart · 🏗️ Architecture · 🗺️ Roadmap · 🧰 Tech Stack


📖 Overview

Most AI projects stop at the demo: a single notebook, an in-memory vector store, a script that works once on someone's machine. Quarry is built the other way around. It starts as a real backend, and the model layer is added on top of infrastructure that's already production-shaped.

It's a long-running, versioned platform (v1 → v15) rather than a one-off project, covering everything from RAG and agents to evaluation, observability, and inference serving.

⚖️ How it compares

| | 🧪 Typical AI demo | 🪨 Quarry |
|---|---|---|
| Architecture | Single script / notebook | Tiered, containerized backend |
| Data layer | In-memory, ephemeral | PostgreSQL + pgvector, persisted |
| State / caching | None | Redis |
| Testing | Manual spot-checks | Metric-driven evaluation (planned) |
| Observability | `print()` | Structured logging & tracing (planned) |
| Lifecycle | Abandoned after a weekend | Versioned roadmap, v1 → v15 |

πŸ—οΈ Architecture

```mermaid
flowchart TB
    Client["🖥️ Client / UI"]

    subgraph API["⚡ FastAPI API Gateway"]
        direction LR
        Auth["🔐 Auth Layer<br/><sub>JWT · RBAC</sub>"]
        Docs["📄 Document Layer<br/><sub>Parse · Chunk</sub>"]
        Retrieval["🔎 Retrieval Layer<br/><sub>Embed · Search</sub>"]
    end

    subgraph Data["🗄️ Data & State Layer"]
        direction LR
        PG[("🐘 PostgreSQL<br/>+ pgvector")]
        Redis[("⚡ Redis<br/>Cache · Queues")]
    end

    Inference["🧠 AI & Inference Layer<br/><sub>planned (v3)</sub>"]

    Client -->|HTTP / REST| API
    Auth --> Data
    Docs --> Data
    Retrieval --> Data
    Data -.-> Inference

    style Inference stroke-dasharray: 5 5
```

🔄 Request flow: the client talks to a single FastAPI gateway, which fans out to auth, document processing, and retrieval. All three sit on a shared PostgreSQL + pgvector store for persistence and a Redis layer for caching and queues. The whole stack runs as Docker Compose services today; the inference/LLM layer is the next piece going on top.
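The request flow above puts Redis next to the Postgres + pgvector store as a cache. The README does not say which responses Quarry caches or how its keys are built, so what follows is a hypothetical illustration of the general cache-aside pattern applied to retrieval: check the cache, fall back to the search function on a miss, and store the result with a TTL. `InMemoryCache`, `cached_search`, the `search:` key prefix, and the 300-second TTL are all assumptions; `InMemoryCache` only mimics the `get` / `set(..., ex=...)` calls of a redis-py client so the example runs without a Redis server.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


class InMemoryCache:
    """Stand-in for a redis-py client, exposing only get() and set(..., ex=...).

    Unlike Redis, this fake ignores the TTL so the example has no dependencies.
    """

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._store.get(name)

    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> None:
        self._store[name] = value


def cached_search(
    cache: Any,
    query: str,
    top_k: int,
    search_fn: Callable[[str, int], List[Any]],
    ttl_seconds: int = 300,
) -> List[Any]:
    """Cache-aside lookup: return cached results, or run search_fn and cache them."""
    # Hypothetical key scheme: hash every parameter that changes the result.
    digest = hashlib.sha256(f"{top_k}:{query}".encode("utf-8")).hexdigest()
    key = f"search:{digest}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: skip the database round trip
    results = search_fn(query, top_k)  # miss: run the real (e.g. pgvector) search
    cache.set(key, json.dumps(results).encode("utf-8"), ex=ttl_seconds)
    return results
```

Because the key includes `top_k`, the same query with a different result count becomes a separate cache entry. Invalidating entries when new documents are ingested is a separate concern that this sketch does not handle.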


🗺️ Roadmap

Quarry evolves as a single platform across 15 versions, grouped into four phases.

```mermaid
flowchart LR
    subgraph P1["🏁 Foundation"]
        v1["v1<br/>Core Backend"]
        v2["v2<br/>Production Backend"]
    end

    subgraph P2["🧠 Intelligence"]
        v3["v3<br/>LLM Layer"]
        v4["v4<br/>Production RAG"]
        v5["v5<br/>Agents"]
        v6["v6<br/>Evaluation"]
    end

    subgraph P3["📈 Scale"]
        v7["v7–v11<br/>Learning · Research ·<br/>Repo Intel · Retrieval · Memory"]
        v12["v12–v13<br/>Guardrails ·<br/>Cloud Ops"]
    end

    subgraph P4["🚀 Platform"]
        v14["v14<br/>Observability"]
        v15["v15<br/>Inference Platform"]
    end

    v1 --> v2 --> v3 --> v4 --> v5 --> v6 --> v7 --> v12 --> v14 --> v15

    classDef done fill:#2ea44f,stroke:#22863a,color:#fff
    classDef active fill:#fb8500,stroke:#d97706,color:#fff
    classDef planned fill:#eee,stroke:#bbb,color:#666

    class v1,v2 done
    class v3 active
    class v4,v5,v6,v7,v12,v14,v15 planned
```

🟢 Done · 🟠 In progress · ⚪ Planned

| Phase | Version | Focus | Status |
|---|---|---|---|
| 🏁 Foundation | v1 | Auth, PostgreSQL, PDF parsing, embeddings, retrieval | ✅ Complete |
| 🏁 Foundation | v2 | Redis, pgvector, Docker, multi-container, health checks | ✅ Complete |
| 🧠 Intelligence | v3 | LLM integration, streaming, provider abstraction | 🔶 In progress |
| 🧠 Intelligence | v4 | Hybrid search, re-ranking, advanced RAG | ⬜ Planned |
| 🧠 Intelligence | v5 | Multi-agent orchestration, tool use | ⬜ Planned |
| 🧠 Intelligence | v6 | RAG / agent evaluation framework | ⬜ Planned |
| 📈 Scale | v7–v11 | Fine-tuning, research pipelines, code search, graph RAG, memory | ⬜ Planned |
| 📈 Scale | v12–v13 | Guardrails, PII scrubbing, Kubernetes, CI/CD, Terraform | ⬜ Planned |
| 🚀 Platform | v14 | OpenTelemetry, structured logging, tracing | ⬜ Planned |
| 🚀 Platform | v15 | Custom vLLM serving, inference optimization | ⬜ Planned |

⚙️ Current Capabilities

  • 🔐 Authentication: JWT-based registration and login
  • 📄 Document processing: PDF upload, parsing, and chunking via PyMuPDF
  • 🧬 Embeddings: automated vector generation via Sentence Transformers
  • 🔎 Semantic retrieval: vector search over stored documents
  • 🗄️ Persistence: PostgreSQL via SQLAlchemy, with pgvector for embeddings
  • 🐳 Infrastructure: fully containerized, with the API, Postgres, and Redis running as separate services
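The chunking, embedding, and semantic-retrieval capabilities listed above fit together as one small pipeline. The sketch below is a dependency-free stand-in, not Quarry's implementation: `chunk_text` splits text into overlapping word windows (the 200-word window and 40-word overlap are assumed defaults), `toy_embed` is a hashed bag-of-words vector substituted for a Sentence Transformers model, and `search` ranks chunks by cosine similarity in plain Python, where the README's stack would push that ranking into pgvector.

```python
import math
import re
import zlib
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hashed bag-of-words vector, L2-normalized (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def search(query: str, chunks: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank chunks by cosine similarity to the query, highest first."""
    q = toy_embed(query)
    scored = [(cosine_similarity(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With real embeddings, the vectors come from a model such as Sentence Transformers and the nearest-neighbour query runs inside PostgreSQL (pgvector's `<=>` operator, for example, computes cosine distance). Only the overall shape of chunk, embed, and rank by similarity carries over from this sketch.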

💓 Health check

```http
GET /health
```

```json
{
  "status": "healthy",
  "database": "connected",
  "redis": "connected"
}
```
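The README shows only the healthy response. Here is a minimal sketch of how such a report could be assembled, assuming each dependency is checked by a probe that raises on failure. The `build_health_report` helper and the `disconnected` / `unhealthy` values are hypothetical and not taken from Quarry.

```python
from typing import Callable, Dict


def build_health_report(probes: Dict[str, Callable[[], None]]) -> dict:
    """Run each dependency probe and summarize the results.

    A probe signals failure by raising; any exception marks that dependency
    as "disconnected" (a hypothetical value, not taken from Quarry).
    """
    report: dict = {}
    for name, probe in probes.items():
        try:
            probe()
            report[name] = "connected"
        except Exception:
            report[name] = "disconnected"
    healthy = all(state == "connected" for state in report.values())
    # "status" comes first so the output matches the key order of the README example.
    return {"status": "healthy" if healthy else "unhealthy", **report}
```

In a FastAPI app, probes like these might wrap a SQLAlchemy `session.execute(text("SELECT 1"))` call and a redis-py `ping()` call, with the helper invoked from the `GET /health` route. Returning HTTP 503 when the report is unhealthy is a common convention, but the README does not say whether Quarry does so.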

🧰 Tech Stack

| Layer | Technology |
|---|---|
| 🐍 Language | Python |
| ⚡ API framework | FastAPI |
| 🐘 Database | PostgreSQL |
| 🔎 Vector search | pgvector |
| ⚡ Cache / queues | Redis |
| 🧩 ORM | SQLAlchemy |
| ✅ Validation | Pydantic |
| 🔐 Auth | JWT |
| 📄 Document parsing | PyMuPDF |
| 🧬 Embeddings | Sentence Transformers |
| 🐳 Containerization | Docker / Docker Compose |

📂 Repository Structure

```text
quarry/
├── app/
│   ├── api/          # Route handlers and endpoints
│   ├── core/         # Config, security, settings
│   ├── db/           # Sessions and migrations
│   ├── models/       # SQLAlchemy models
│   ├── schemas/      # Pydantic schemas
│   └── services/     # Business logic (auth, docs, vectors)
├── docs/             # Architecture notes
├── scripts/          # Setup and DB utilities
├── tests/            # Unit and integration tests
├── .env.example
├── requirements.txt
└── README.md
```

🚀 Quickstart

🐳 With Docker

```bash
docker compose up -d        # start the full stack
docker ps                   # view running services
docker compose down         # stop everything
```

📘 API docs are served at http://localhost:8000/docs
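Once the stack is up, the `/health` endpoint doubles as a quick smoke test. The sketch below uses only the standard library. The `check_health` name, the 5-second timeout, and the choice to raise on a non-`healthy` status are assumptions, and `opener` is injectable only so the function can be exercised without a running server.

```python
import json
import urllib.request
from typing import Any, Callable


def check_health(
    base_url: str = "http://localhost:8000",
    timeout: float = 5.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> dict:
    """Call GET {base_url}/health and return the decoded JSON body.

    Raises RuntimeError if the service answers but does not report "healthy".
    """
    url = base_url.rstrip("/") + "/health"
    with opener(url, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    if body.get("status") != "healthy":
        raise RuntimeError(f"service is not healthy: {body}")
    return body
```

Call `check_health()` from a Python shell after `docker compose up -d`. Note that `urllib.request.urlopen` already raises `HTTPError` on a 4xx/5xx response, so the explicit status check only matters when the endpoint returns 200 with a non-healthy body.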

🛠️ Without Docker

```bash
git clone https://github.com/x2ankit/quarry.git
cd quarry

python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate

pip install -r requirements.txt
cp .env.example .env        # then configure your settings

alembic upgrade head        # requires Postgres running locally or via Docker
uvicorn app.main:app --reload
```

📜 License

This project is open source. See LICENSE for details.


👨‍💻 Author

Ankit Arayan Tripathy

GitHub


⭐ If Quarry's approach to AI infrastructure resonates with you, consider starring the repo.

About

Production-first AI systems platform evolving from retrieval and RAG into agents, memory, evaluation, observability, and inference infrastructure.
