SQL Agent

LangChain · SQLAlchemy · Groq · Streamlit

Natural language → SQL → results. A schema-aware ReAct agent that inspects your database, writes precise SELECT queries, executes them safely, and returns plain-English answers — all autonomously.

Architecture

User (natural language question) │ ▼ Streamlit UI (app/main.py) │ ▼ LangChain ReAct Agent (app/agent/executor.py) │ llama-3.3-70b-versatile via Groq │ ├── Tool 1: list_tables → discovers available tables ├── Tool 2: get_table_schema → reads DDL for column names + types └── Tool 3: run_sql_query → executes SELECT, returns formatted rows │ ▼ SQLite via SQLAlchemy (safety layer blocks all non-SELECT)

Key design decisions

Decision	Choice	Why
Agent pattern	ReAct	Think → Act → Observe loop matches SQL workflow naturally
Schema-first	Always inspects DDL before writing SQL	Eliminates hallucinated column names
Safety	Dual-layer: SELECT-only check + keyword blocklist	Defence in depth — both layers must pass
LLM	`llama-3.3-70b-versatile` via Groq	Free, fast, strong SQL generation
DB abstraction	SQLAlchemy	Swap SQLite → Postgres/MySQL by changing one connection string
Observability	`return_intermediate_steps=True`	Every tool call visible in UI expander

Project Structure

p4-sql-agent/
├── app/
│   ├── main.py                 # Streamlit entry point
│   ├── config.py               # Settings loaded from .env
│   ├── agent/
│   │   ├── executor.py         # build_agent(), AgentExecutor config
│   │   └── prompts.py          # SYSTEM_PROMPT + PromptTemplate
│   ├── database/
│   │   ├── engine.py           # SQLAlchemy engine, schema fetch, query exec
│   │   ├── safety.py           # SELECT guard + keyword blocklist
│   │   └── seeder.py           # Sample DB seed (departments, employees, projects)
│   ├── tools/
│   │   ├── schema_tools.py     # list_tables + get_table_schema @tools
│   │   └── query_tools.py      # run_sql_query @tool
│   └── ui/
│       ├── components.py       # Chat history, reasoning trace, example questions
│       └── styles.py           # Page config + custom CSS
├── tests/
│   ├── test_database.py        # Engine, safety layer, schema fetch
│   ├── test_tools.py           # Tool input/output contracts
│   └── test_agent.py           # Agent invocation, intermediate steps
├── data/                       # SQLite DB lives here at runtime (gitignored)
├── seed_db.py                  # Run once to create sample database
├── .env.example
├── pyproject.toml
└── requirements.txt

Setup

# 1. Clone and enter
git clone https://github.com/Vedant-1404/p4-sql-agent.git
cd p4-sql-agent

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Add your GROQ_API_KEY to .env (free at console.groq.com)

# 5. Seed the sample database (run once)
python seed_db.py

# 6. Run
PYTHONPATH=. streamlit run app/main.py

Sample questions

How many employees are in each department? Who are the top 3 highest-paid employees? What is the average salary by department? Which employees are assigned to more than one project? List all employees hired after 2021 who are still active. Which active projects are over budget vs their department budget?

How the React loop works

Thought — agent reads your question and decides which tool to call first
Action: list_tables — discovers what tables exist in the database
Action: get_table_schema — reads the exact DDL for relevant tables (never guesses column names)
Action: run_sql_query — writes and executes a precise SELECT query
Observation — receives formatted rows back from the database
Final Answer — interprets results and responds in plain English

The full trace of every step is visible in the "Agent reasoning" expander in the UI.

Safety

Two independent layers block destructive queries:

Layer 1 (safety.py): rejects anything that doesn't start with SELECT
Layer 2 (safety.py): scans for DROP, DELETE, INSERT, UPDATE, ALTER, TRUNCATE, CREATE anywhere in the query

Both layers run before SQLAlchemy ever sees the query.

Configuration

Variable	Default	Effect
`GROQ_API_KEY`	—	Required. Get free at console.groq.com
`MODEL_NAME`	`llama-3.3-70b-versatile`	Groq model for SQL generation
`DB_PATH`	`data/company.db`	Path to SQLite file — swap for any SQLite DB
`MAX_ROWS`	`100`	Result row cap to prevent token overflow

Tech stack

LangChain — ReAct agent, tool definitions, prompt templates
Groq — free LLM inference (llama-3.3-70b-versatile)
SQLAlchemy — database abstraction (SQLite by default, swappable)
Streamlit — chat UI with agent reasoning trace
python-dotenv — environment configuration

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SQL Agent

Architecture

Key design decisions

Project Structure

Setup

Sample questions

How the React loop works

Safety

Configuration

Tech stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.streamlit		.streamlit
app		app
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
seed_db.py		seed_db.py

Folders and files

Latest commit

History

Repository files navigation

SQL Agent

Architecture

Key design decisions

Project Structure

Setup

Sample questions

How the React loop works

Safety

Configuration

Tech stack

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages