Emma is an AI-powered interactive handbook designed specifically for Ignatian Marians at the University of the Immaculate Conception (UIC). She provides instant answers about academic policies, campus life, and student services - no handbook skimming required.
- Academic Policy Guidance - Get clear explanations about attendance, grading, and course requirements
- Campus Life Information - Learn about events, facilities, and resources available on campus
- Student Services Support - Navigate administrative processes, support services, and more
- Natural Language Interface - Ask questions in everyday language, just like chatting with a friend
- Smart Search - Emma automatically searches the handbook, looks up specific pages, and performs calculations to find the best answer
- Real-Time Status - See what Emma is doing as she works on your question
Emma is built using:
- Google Gemma 4 - For natural language understanding and generation
- LM Studio - For local model deployment and management
- ChromaDB - For vector database and semantic search capabilities
- FastAPI - Backend server
- React + Vite + Tailwind CSS - Frontend
Before installing, make sure you have:
- Node.js (v18 or higher)
- Python (v3.9 or higher)
- Git
- LM Studio with the following models downloaded and available:
  - gemma-4-E4B-it (text, vision/OCR)
  - text-embedding-nomic-embed-text-v2-moe (embeddings)
- Clone the repository
    git clone https://github.com/nedpals/emma.git
    cd emma
- Install dependencies
    # Install frontend dependencies
    cd frontend
    npm install
    # Install backend dependencies
    cd ..
    pip install -r requirements.txt
- Start the development servers
    # Start the frontend development server (in frontend directory)
    cd frontend
    npm run dev
    # In another terminal, start the backend server
    python main.py
- Access Emma at http://localhost:8000
Emma uses ChromaDB as its vector store to enable semantic search capabilities. There are two primary methods for ingesting handbook content:
Method 1: Using LM Studio (Recommended for local processing)
- Place your handbook documents (PDF format) in the project's root directory (e.g., handbook.pdf).
- Ensure LM Studio is running and serving the required models (gemma-4-E4B-it and text-embedding-nomic-embed-text-v2-moe) at http://localhost:1234.
- Run the embedding script. Choose one of the following commands:
  - Standard Speed: Processes documents in smaller batches (default: 2). Suitable for systems with limited resources.
      python embedding.py
  - Faster Speed: Processes documents in larger batches (e.g., 600). Requires more system resources (RAM/VRAM) but significantly speeds up ingestion. Adjust the MAX_EMBED_COUNT value based on your system's capabilities.
      MAX_EMBED_COUNT=600 python embedding.py
- The script first uses gemma-4-E4B-it to extract text segments from each page of the PDF via vision/OCR, caching the results in the extracted_2 directory. It then uses text-embedding-nomic-embed-text-v2-moe to create vector embeddings for each segment.
- The embeddings and vector store data are persisted in the embeddings_db directory.
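Before running the embedding script, it can help to confirm that LM Studio is actually serving both models. The following is a minimal sketch, not part of the repository: it assumes LM Studio's local server exposes its OpenAI-compatible model listing at /v1/models, and the helper names (missing_models, fetch_models) are hypothetical. The ids LM Studio reports may differ from the names above (for example, with a publisher prefix), so the check uses a case-insensitive substring match.

```python
import json
from urllib.request import urlopen

# Assumption: the OpenAI-compatible model listing lives under the same
# base address the steps above use for LM Studio.
MODELS_URL = "http://localhost:1234/v1/models"

REQUIRED_MODELS = [
    "gemma-4-E4B-it",
    "text-embedding-nomic-embed-text-v2-moe",
]


def missing_models(payload: dict, required: list[str]) -> list[str]:
    """Return the required model names not found in a /v1/models payload.

    Matching is a case-insensitive substring test, because the ids a server
    reports may carry a publisher prefix or different casing.
    """
    served = [str(m.get("id", "")).lower() for m in payload.get("data", [])]
    return [
        name for name in required
        if not any(name.lower() in model_id for model_id in served)
    ]


def fetch_models(url: str = MODELS_URL) -> dict:
    """Fetch and decode the model listing from the local server."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

With LM Studio running, calling missing_models(fetch_models(), REQUIRED_MODELS) returns an empty list when both models are being served, and the names of any that are not otherwise.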
Method 2: Using Google AI Studio (Alternative for text extraction)
This method is useful if you encounter issues with local vision model processing or prefer using Google's cloud-based models for the initial text extraction.
- Go to Google AI Studio.
- Create a new prompt. Upload your handbook PDF file.
- Use the prompt content from the ingest_gemini_prompt.txt file in this repository. Ensure you are using a capable multimodal model like Gemini 2.5 Pro.
- Run the prompt. Google AI Studio will process the PDF and generate a JSON output containing the extracted text segments based on the prompt's instructions.
- Copy the entire JSON output.
- Create a new file named page_0.json inside the extracted_2 directory within your local project folder (create the extracted_2 directory if it doesn't exist).
- Paste the copied JSON content into extracted_2/page_0.json and save the file.
- Ensure LM Studio is running and serving only the required embedding model (text-embedding-nomic-embed-text-v2-moe) at http://localhost:1234.
- Run the embedding script (choose standard or faster speed as described in Method 1):
    # Standard speed
    python embedding.py
    # OR Faster speed
    # MAX_EMBED_COUNT=600 python embedding.py
- The script will detect the cached data in extracted_2/page_0.json, skip the vision/OCR step, and proceed directly to embedding the text segments using the local embedding model.
- The embeddings and vector store data will be persisted in the embeddings_db directory.
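A copy-paste slip in the JSON step is easy to miss until the embedding script fails. As a rough sketch (the function name save_extracted_json is hypothetical and not part of the repository), the snippet below checks that the copied text parses as JSON before writing it to extracted_2/page_0.json. It does not check the structure the embedding script expects, since that is defined by the prompt in ingest_gemini_prompt.txt.

```python
import json
from pathlib import Path


def save_extracted_json(raw_text: str, project_dir: str = ".") -> Path:
    """Validate pasted JSON text and write it to extracted_2/page_0.json.

    Raises json.JSONDecodeError if the text is not valid JSON, so a bad
    paste is caught before the embedding script runs.
    """
    cleaned = raw_text.strip()
    json.loads(cleaned)  # validation only; the text is written unchanged

    target_dir = Path(project_dir) / "extracted_2"
    target_dir.mkdir(parents=True, exist_ok=True)  # create if it doesn't exist

    target = target_dir / "page_0.json"
    target.write_text(cleaned, encoding="utf-8")
    return target
```

For example, save_extracted_json(Path("output.json").read_text(encoding="utf-8")) would copy a saved AI Studio response into place, where output.json is whatever file you saved the response to.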
Note: Both methods produce the same extracted_2/page_0.json format. The embedding pipeline (embedding.py) automatically:
- Prepends section context to each chunk for better search relevance
- Splits oversized chunks at paragraph boundaries (max ~1200 characters per chunk)
- Uses text-embedding-nomic-embed-text-v2-moe for embeddings
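To illustrate the first two points, here is one way paragraph-boundary splitting and section-context prepending could work. This is a simplified sketch under assumptions, not the code in embedding.py: the function names, the blank-line paragraph separator, and the greedy packing strategy are all illustrative choices.

```python
def split_chunk(text: str, max_chars: int = 1200) -> list[str]:
    """Greedily pack paragraphs into pieces of at most max_chars characters.

    Paragraphs are assumed to be separated by blank lines. A single
    paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so the limit is approximate.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pieces: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current piece.
            pieces.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def with_section_context(section: str, pieces: list[str]) -> list[str]:
    """Prepend the section title to every piece before it is embedded."""
    return [f"{section}\n\n{piece}" for piece in pieces]
```

Prepending the section title means a query such as "attendance policy" can match a chunk even when the chunk body never repeats the word "attendance".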
To re-embed existing extracted data (e.g., after changing the embedding model), delete the embeddings_db directory and re-run python embedding.py.
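The same reset can be scripted. The sketch below is a convenience wrapper, not part of the repository: the function names are hypothetical, and it assumes embedding.py sits in the project directory and reads MAX_EMBED_COUNT from the environment, as the commands in Method 1 suggest.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional


def reset_vector_store(project_dir: str = ".") -> bool:
    """Delete embeddings_db if present; return True if it was removed."""
    db_dir = Path(project_dir) / "embeddings_db"
    if db_dir.is_dir():
        shutil.rmtree(db_dir)
        return True
    return False


def reembed(project_dir: str = ".", max_embed_count: Optional[int] = None) -> None:
    """Clear the vector store, then re-run embedding.py.

    If max_embed_count is given, it is passed to the script through the
    MAX_EMBED_COUNT environment variable.
    """
    reset_vector_store(project_dir)
    env = os.environ.copy()
    if max_embed_count is not None:
        env["MAX_EMBED_COUNT"] = str(max_embed_count)
    # Use the current interpreter so the same virtual environment is reused.
    subprocess.run(
        [sys.executable, "embedding.py"], cwd=project_dir, env=env, check=True
    )
```

Calling reembed(max_embed_count=600) from the project root would mirror deleting embeddings_db and running the faster-speed command. The cached extracted_2 directory is left untouched, so the vision/OCR step is not repeated.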
We welcome contributions to make Emma even better! If you'd like to contribute:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This project is not affiliated with, endorsed by, or connected to the University of the Immaculate Conception (UIC). Emma is an independent, personal project created with a strong desire to assist Ignatian Marians by utilizing the latest technologies available. All information provided should be verified with official UIC sources and personnel.
Made with ❤️ for Ignatian Marians

