GitHub - nedpals/emma: The interactive handbook for Ignacian Marians

Emma is an AI-powered interactive handbook designed specifically for Ignacian Marians at the University of the Immaculate Conception (UIC). She provides instant answers about academic policies, campus life, and student services, with no handbook skimming required.

🚀 Features

Academic Policy Guidance - Get clear explanations about attendance, grading, and course requirements
Campus Life Information - Learn about events, facilities, and resources available on campus
Student Services Support - Navigate administrative processes, support services, and more
Natural Language Interface - Ask questions in everyday language, just like chatting with a friend
Smart Search - Emma automatically searches the handbook, looks up specific pages, and performs calculations to find the best answer
Real-Time Status - See what Emma is doing as she works on your question

🛠️ Technology

Emma is built using:

Google Gemma 4 - For natural language understanding and generation
LM Studio - For local model deployment and management
ChromaDB - For vector database and semantic search capabilities
FastAPI - Backend server
React + Vite + Tailwind CSS - Frontend

🚀 Installation & Setup

Prerequisites

Node.js (v18 or higher)
Python (v3.9 or higher)
Git
LM Studio with the following models downloaded and available:
- gemma-4-E4B-it (text, vision/OCR)
- text-embedding-nomic-embed-text-v2-moe (embeddings)
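
Before starting Emma, it can help to confirm that both models are actually being served. The following is a minimal sketch, not part of this repository: it assumes LM Studio exposes an OpenAI-compatible `/v1/models` route at http://localhost:1234 and that the served model ids match the names listed above exactly (ids may differ on your machine). The helper names `missing_models` and `fetch_models` are hypothetical.

```python
import json
import urllib.request

# Model identifiers as listed in the prerequisites above.
REQUIRED_MODELS = {
    "gemma-4-E4B-it",
    "text-embedding-nomic-embed-text-v2-moe",
}


def missing_models(models_response: dict, required=REQUIRED_MODELS) -> set:
    """Return the required model ids absent from a /v1/models response body."""
    available = {entry.get("id") for entry in models_response.get("data", [])}
    return set(required) - available


def fetch_models(base_url: str = "http://localhost:1234") -> dict:
    """Fetch the model list (assumes an OpenAI-compatible /v1/models route)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return json.load(resp)


# Offline example: a hand-written response in the OpenAI list shape.
sample = {"data": [{"id": "gemma-4-E4B-it"}]}
print(missing_models(sample))  # the embedding model is reported as missing
```

With LM Studio running, `missing_models(fetch_models())` should return an empty set when both models are available under those exact ids.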

Local Setup

Clone the repository

git clone https://github.com/nedpals/emma.git
cd emma

Install dependencies

# Install frontend dependencies
cd frontend
npm install

# Install backend dependencies
cd ..
pip install -r requirements.txt

Start the development servers

# Start the frontend development server (in frontend directory)
cd frontend
npm run dev

# In another terminal, start the backend server from the project root
python main.py

Access Emma at http://localhost:8000

Handbook Ingestion

Emma uses ChromaDB as its vector store to enable semantic search capabilities. There are two primary methods for ingesting handbook content:

Method 1: Using LM Studio (Recommended for local processing)

Place your handbook documents (PDF format) in the project's root directory (e.g., handbook.pdf).
Ensure LM Studio is running and serving the required models (gemma-4-E4B-it and text-embedding-nomic-embed-text-v2-moe) at http://localhost:1234.
Run the embedding script. Choose one of the following commands:
- Standard Speed: Processes documents in smaller batches (default: 2). Suitable for systems with limited resources.
```
python embedding.py
```
- Faster Speed: Processes documents in larger batches (e.g., 600). Requires more system resources (RAM/VRAM) but significantly speeds up ingestion. Adjust the MAX_EMBED_COUNT value based on your system's capabilities.
```
MAX_EMBED_COUNT=600 python embedding.py
```
The script will first use gemma-4-E4B-it to extract text segments from each page of the PDF via vision/OCR, caching the results in the extracted_2 directory. Then, it will use text-embedding-nomic-embed-text-v2-moe to create vector embeddings for each segment.
The embeddings and vector store data will be persisted in the embeddings_db directory.
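
To illustrate what the MAX_EMBED_COUNT setting controls, here is a rough sketch of batch-size handling. It is not the code in embedding.py: the function names are hypothetical, and the request body assumes the OpenAI-style embeddings shape (`model` plus `input`) that LM Studio's local server is expected to accept.

```python
import os


def read_batch_size(env=os.environ, default: int = 2) -> int:
    """Read MAX_EMBED_COUNT, falling back to the documented default of 2."""
    raw = env.get("MAX_EMBED_COUNT", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def batches(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embedding_payload(batch: list,
                      model: str = "text-embedding-nomic-embed-text-v2-moe") -> dict:
    """Build a request body in the OpenAI embeddings shape (an assumption)."""
    return {"model": model, "input": batch}


# Five segments with a batch size of 2 produce three requests.
segments = ["a", "b", "c", "d", "e"]
for batch in batches(segments, read_batch_size({"MAX_EMBED_COUNT": "2"})):
    print(embedding_payload(batch)["input"])
```

A larger batch size means fewer requests per run, at the cost of more memory per request, which is the trade-off described above.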

Method 2: Using Google AI Studio (Alternative for text extraction)

This method is useful if you encounter issues with local vision model processing or prefer using Google's cloud-based models for the initial text extraction.

Go to Google AI Studio.
Create a new prompt. Upload your handbook PDF file.
Use the prompt content from the ingest_gemini_prompt.txt file in this repository. Ensure you are using a capable multimodal model like Gemini 2.5 Pro.
Run the prompt. Google AI Studio will process the PDF and generate a JSON output containing the extracted text segments based on the prompt's instructions.
Copy the entire JSON output.
Create a new file named page_0.json inside the extracted_2 directory within your local project folder (create the extracted_2 directory if it doesn't exist).
Paste the copied JSON content into extracted_2/page_0.json and save the file.
Ensure LM Studio is running and serving only the required embedding model (text-embedding-nomic-embed-text-v2-moe) at http://localhost:1234.

Run the embedding script (choose standard or faster speed as described in Method 1):

# Standard speed
python embedding.py
# OR Faster speed
# MAX_EMBED_COUNT=600 python embedding.py

The script will detect the cached data in extracted_2/page_0.json, skip the vision/OCR step, and proceed directly to embedding the text segments using the local embedding model.
The embeddings and vector store data will be persisted in the embeddings_db directory.
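
The copy-and-paste steps above can also be scripted. The sketch below is an optional convenience, not something shipped in this repository: `save_extracted_json` is a hypothetical helper that only checks that the pasted text parses as JSON before writing it to extracted_2/page_0.json. It does not validate the segment structure that ingest_gemini_prompt.txt asks for.

```python
import json
from pathlib import Path


def save_extracted_json(raw_text: str, project_root: str = ".") -> Path:
    """Validate pasted JSON and write it to extracted_2/page_0.json."""
    # json.loads raises json.JSONDecodeError if the paste is malformed.
    data = json.loads(raw_text)
    target_dir = Path(project_root) / "extracted_2"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = target_dir / "page_0.json"
    target.write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return target
```

Failing early on malformed JSON avoids discovering a truncated paste only after the embedding script has started.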

Note: Both methods produce the same extracted_2/page_0.json format. The embedding pipeline (embedding.py) automatically:

Prepends section context to each chunk for better search relevance
Splits oversized chunks at paragraph boundaries (max ~1200 characters per chunk)
Uses text-embedding-nomic-embed-text-v2-moe for embeddings
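
As a rough illustration of the first two steps, the sketch below packs paragraphs greedily into chunks of at most 1200 characters and then prefixes each chunk with its section title. This is an assumption about how such a pipeline could work, not the actual logic in embedding.py: the function names, the prefix format, and the choice to apply the limit before adding the prefix are all hypothetical.

```python
MAX_CHARS = 1200  # approximate per-chunk limit mentioned above


def split_at_paragraphs(text: str, max_chars: int = MAX_CHARS) -> list:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in this sketch.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def with_section_context(section: str, chunks: list) -> list:
    """Prefix each chunk with its section title (prefix format is hypothetical)."""
    return [f"{section}\n\n{chunk}" for chunk in chunks]


text = "A" * 700 + "\n\n" + "B" * 700 + "\n\n" + "C" * 100
print([len(c) for c in split_at_paragraphs(text)])  # [700, 802]
```

Splitting only at paragraph boundaries keeps each chunk readable on its own, and the section prefix gives the embedding model context that a bare paragraph would lack.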

To re-embed existing extracted data (e.g., after changing the embedding model), delete the embeddings_db directory and re-run python embedding.py.
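
The reset step can be sketched in Python as follows. `reset_vector_store` is a hypothetical helper, not part of this repository. It removes only embeddings_db and leaves the cached extraction in extracted_2 untouched, so the next run of embedding.py can skip the vision/OCR step.

```python
import shutil
from pathlib import Path


def reset_vector_store(project_root: str = ".") -> bool:
    """Delete embeddings_db if present; return True when something was removed."""
    db_dir = Path(project_root) / "embeddings_db"
    if not db_dir.is_dir():
        return False
    shutil.rmtree(db_dir)  # removes the persisted vector store only
    return True
```

After calling `reset_vector_store()` from the project root, re-run `python embedding.py` to rebuild the store.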

🤝 Contributing

We welcome contributions to make Emma even better! If you'd like to contribute:

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Disclaimer

This project is not affiliated with, endorsed by, or connected to the University of the Immaculate Conception (UIC). Emma is an independent, personal project created to assist Ignacian Marians using the latest technologies available. All information provided should be verified with official UIC sources and personnel.

Made with ❤️ for Ignacian Marians

