Skip to content

Vedant-1404/sql-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Agent

LangChain · SQLAlchemy · Groq · Streamlit

Natural language → SQL → results. A schema-aware ReAct agent that inspects your database, writes precise SELECT queries, executes them safely, and returns plain-English answers — all autonomously.


Architecture

User (natural language question) │ ▼ Streamlit UI (app/main.py) │ ▼ LangChain ReAct Agent (app/agent/executor.py) │ llama-3.3-70b-versatile via Groq │ ├── Tool 1: list_tables → discovers available tables ├── Tool 2: get_table_schema → reads DDL for column names + types └── Tool 3: run_sql_query → executes SELECT, returns formatted rows │ ▼ SQLite via SQLAlchemy (safety layer blocks all non-SELECT)

Key design decisions

Decision Choice Why
Agent pattern ReAct Think → Act → Observe loop matches SQL workflow naturally
Schema-first Always inspects DDL before writing SQL Eliminates hallucinated column names
Safety Dual-layer: SELECT-only check + keyword blocklist Defence in depth — both layers must pass
LLM llama-3.3-70b-versatile via Groq Free, fast, strong SQL generation
DB abstraction SQLAlchemy Swap SQLite → Postgres/MySQL by changing one connection string
Observability return_intermediate_steps=True Every tool call visible in UI expander

Project Structure

p4-sql-agent/
├── app/
│   ├── main.py                 # Streamlit entry point
│   ├── config.py               # Settings loaded from .env
│   ├── agent/
│   │   ├── executor.py         # build_agent(), AgentExecutor config
│   │   └── prompts.py          # SYSTEM_PROMPT + PromptTemplate
│   ├── database/
│   │   ├── engine.py           # SQLAlchemy engine, schema fetch, query exec
│   │   ├── safety.py           # SELECT guard + keyword blocklist
│   │   └── seeder.py           # Sample DB seed (departments, employees, projects)
│   ├── tools/
│   │   ├── schema_tools.py     # list_tables + get_table_schema @tools
│   │   └── query_tools.py      # run_sql_query @tool
│   └── ui/
│       ├── components.py       # Chat history, reasoning trace, example questions
│       └── styles.py           # Page config + custom CSS
├── tests/
│   ├── test_database.py        # Engine, safety layer, schema fetch
│   ├── test_tools.py           # Tool input/output contracts
│   └── test_agent.py           # Agent invocation, intermediate steps
├── data/                       # SQLite DB lives here at runtime (gitignored)
├── seed_db.py                  # Run once to create sample database
├── .env.example
├── pyproject.toml
└── requirements.txt


Setup

# 1. Clone and enter
git clone https://github.com/Vedant-1404/p4-sql-agent.git
cd p4-sql-agent

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Add your GROQ_API_KEY to .env (free at console.groq.com)

# 5. Seed the sample database (run once)
python seed_db.py

# 6. Run
PYTHONPATH=. streamlit run app/main.py

Sample questions

How many employees are in each department? Who are the top 3 highest-paid employees? What is the average salary by department? Which employees are assigned to more than one project? List all employees hired after 2021 who are still active. Which active projects are over budget vs their department budget?


How the React loop works

  1. Thought — agent reads your question and decides which tool to call first
  2. Action: list_tables — discovers what tables exist in the database
  3. Action: get_table_schema — reads the exact DDL for relevant tables (never guesses column names)
  4. Action: run_sql_query — writes and executes a precise SELECT query
  5. Observation — receives formatted rows back from the database
  6. Final Answer — interprets results and responds in plain English

The full trace of every step is visible in the "Agent reasoning" expander in the UI.


Safety

Two independent layers block destructive queries:

  • Layer 1 (safety.py): rejects anything that doesn't start with SELECT
  • Layer 2 (safety.py): scans for DROP, DELETE, INSERT, UPDATE, ALTER, TRUNCATE, CREATE anywhere in the query

Both layers run before SQLAlchemy ever sees the query.


Configuration

Variable Default Effect
GROQ_API_KEY Required. Get free at console.groq.com
MODEL_NAME llama-3.3-70b-versatile Groq model for SQL generation
DB_PATH data/company.db Path to SQLite file — swap for any SQLite DB
MAX_ROWS 100 Result row cap to prevent token overflow

Tech stack

  • LangChain — ReAct agent, tool definitions, prompt templates
  • Groq — free LLM inference (llama-3.3-70b-versatile)
  • SQLAlchemy — database abstraction (SQLite by default, swappable)
  • Streamlit — chat UI with agent reasoning trace
  • python-dotenv — environment configuration

About

Schema-aware text-to-SQL agent with LangChain ReAct, Groq, SQLAlchemy, and Streamlit. Natural language → SQL → results.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages