LangChain · SQLAlchemy · Groq · Streamlit
Natural language → SQL → results. A schema-aware ReAct agent that inspects your database, writes precise SELECT queries, executes them safely, and returns plain-English answers — all autonomously.
User (natural language question) │ ▼ Streamlit UI (app/main.py) │ ▼ LangChain ReAct Agent (app/agent/executor.py) │ llama-3.3-70b-versatile via Groq │ ├── Tool 1: list_tables → discovers available tables ├── Tool 2: get_table_schema → reads DDL for column names + types └── Tool 3: run_sql_query → executes SELECT, returns formatted rows │ ▼ SQLite via SQLAlchemy (safety layer blocks all non-SELECT)
| Decision | Choice | Why |
|---|---|---|
| Agent pattern | ReAct | Think → Act → Observe loop matches SQL workflow naturally |
| Schema-first | Always inspects DDL before writing SQL | Eliminates hallucinated column names |
| Safety | Dual-layer: SELECT-only check + keyword blocklist | Defence in depth — both layers must pass |
| LLM | llama-3.3-70b-versatile via Groq |
Free, fast, strong SQL generation |
| DB abstraction | SQLAlchemy | Swap SQLite → Postgres/MySQL by changing one connection string |
| Observability | return_intermediate_steps=True |
Every tool call visible in UI expander |
p4-sql-agent/
├── app/
│ ├── main.py # Streamlit entry point
│ ├── config.py # Settings loaded from .env
│ ├── agent/
│ │ ├── executor.py # build_agent(), AgentExecutor config
│ │ └── prompts.py # SYSTEM_PROMPT + PromptTemplate
│ ├── database/
│ │ ├── engine.py # SQLAlchemy engine, schema fetch, query exec
│ │ ├── safety.py # SELECT guard + keyword blocklist
│ │ └── seeder.py # Sample DB seed (departments, employees, projects)
│ ├── tools/
│ │ ├── schema_tools.py # list_tables + get_table_schema @tools
│ │ └── query_tools.py # run_sql_query @tool
│ └── ui/
│ ├── components.py # Chat history, reasoning trace, example questions
│ └── styles.py # Page config + custom CSS
├── tests/
│ ├── test_database.py # Engine, safety layer, schema fetch
│ ├── test_tools.py # Tool input/output contracts
│ └── test_agent.py # Agent invocation, intermediate steps
├── data/ # SQLite DB lives here at runtime (gitignored)
├── seed_db.py # Run once to create sample database
├── .env.example
├── pyproject.toml
└── requirements.txt
# 1. Clone and enter
git clone https://github.com/Vedant-1404/p4-sql-agent.git
cd p4-sql-agent
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Add your GROQ_API_KEY to .env (free at console.groq.com)
# 5. Seed the sample database (run once)
python seed_db.py
# 6. Run
PYTHONPATH=. streamlit run app/main.pyHow many employees are in each department? Who are the top 3 highest-paid employees? What is the average salary by department? Which employees are assigned to more than one project? List all employees hired after 2021 who are still active. Which active projects are over budget vs their department budget?
- Thought — agent reads your question and decides which tool to call first
- Action: list_tables — discovers what tables exist in the database
- Action: get_table_schema — reads the exact DDL for relevant tables (never guesses column names)
- Action: run_sql_query — writes and executes a precise SELECT query
- Observation — receives formatted rows back from the database
- Final Answer — interprets results and responds in plain English
The full trace of every step is visible in the "Agent reasoning" expander in the UI.
Two independent layers block destructive queries:
- Layer 1 (
safety.py): rejects anything that doesn't start withSELECT - Layer 2 (
safety.py): scans forDROP,DELETE,INSERT,UPDATE,ALTER,TRUNCATE,CREATEanywhere in the query
Both layers run before SQLAlchemy ever sees the query.
| Variable | Default | Effect |
|---|---|---|
GROQ_API_KEY |
— | Required. Get free at console.groq.com |
MODEL_NAME |
llama-3.3-70b-versatile |
Groq model for SQL generation |
DB_PATH |
data/company.db |
Path to SQLite file — swap for any SQLite DB |
MAX_ROWS |
100 |
Result row cap to prevent token overflow |
- LangChain — ReAct agent, tool definitions, prompt templates
- Groq — free LLM inference (
llama-3.3-70b-versatile) - SQLAlchemy — database abstraction (SQLite by default, swappable)
- Streamlit — chat UI with agent reasoning trace
- python-dotenv — environment configuration