Medical Assistant is a comprehensive desktop application for medical documentation, designed to transcribe and refine spoken medical notes. It leverages multiple AI providers (OpenAI, Anthropic/Claude, Google Gemini, Groq, Cerebras, and Ollama) with a modular, mixin-based architecture (~150K LOC across 420+ modules) for efficient audio-to-text conversion, clinical note generation, and intelligent medical analysis.
- Features
- Quick Start
- Installation
- Pre-built Releases
- Building from Source
- Usage Guide
- Configuration
- Security
- Healthcare Standards
- Architecture
- Testing
- Troubleshooting
- Contributing
- License

## Features
| Feature | Description |
|---|---|
| Workflow-Based Interface | Sidebar navigation with 8 sections (Record, SOAP Note, Referral, Letter, Chat, RAG, Recordings, Analysis) plus 6 text editor tabs and 6 SOAP analysis sub-tabs |
| Unified Preferences | Comprehensive settings dialog (Ctrl+,) with tabbed interface for API keys, audio settings, AI models, prompts, and storage |
| AI-Powered Chat | ChatGPT-style interface with context-aware suggestions for interacting with your medical notes |
| RAG Document Search | Hybrid vector + keyword search with medical query expansion, adaptive thresholds, and MMR diversity |
| Knowledge Graph | Interactive visualization of medical entities and relationships from Neo4j |
| RSVP Reader | Speed-reading interface for SOAP notes with ORP highlighting and section navigation |
| Advanced Recording | Record medical conversations with visual feedback, waveform display, timer, and pause/resume capabilities |
| Real-Time Analysis | Optional periodic analysis during recording generates differential diagnoses every 2 minutes |
| Queue System | Background processing queue with "Quick Continue Mode" for efficient multi-patient recording sessions |
| Batch Processing | Process multiple recordings or audio files at once with progress tracking and statistics |
| Recordings Manager | Dedicated tab with search, filter, and document status indicators (✓, —, 🔄, ❌) |
- Context-Aware SOAP Notes
  - Side panel for adding previous medical information
  - Automatically integrates patient history into SOAP note generation
  - Smart context preservation during recordings
  - Voice emotion analysis integration (when using Modulate STT): patient emotional state woven into the Subjective section
- ICD Code Integration
  - Choose between ICD-9, ICD-10, or both code versions
  - Automatic code suggestions based on diagnoses
- Smart Templates
  - Pre-built templates: Follow-up, New Patient, Telehealth, Emergency, Pediatric, Geriatric
  - Create and save custom context templates
  - Template import/export functionality
- Multi-Format Document Generation
  - SOAP notes with customizable sections
  - Professional referral letters
  - Patient correspondence
  - Employer/insurance documentation
Medical Assistant includes specialized AI agents for different clinical tasks:
| Agent | Capabilities |
|---|---|
| Medication Analysis | Extract medications from text, check drug-drug interactions with severity levels, validate dosing appropriateness, suggest alternatives, generate prescriptions |
| Diagnostic Agent | Analyze symptoms, generate differential diagnoses ranked by likelihood, provide ICD codes, suggest diagnostic workups |
| Compliance Agent | Audit SOAP notes against clinical documentation standards, flag missing elements, score completeness |
| Data Extraction | Extract structured clinical data (vitals, labs, medications, diagnoses, allergies) from unstructured text |
| Clinical Workflow | Step-by-step guidance for patient intake, diagnostic workups, treatment protocols, and follow-up care with interactive checklists |
| Referral Agent | Generate professional referral letters with address book integration and specialty inference |
| Synopsis Agent | Generate concise SOAP note summaries for quick review |
| Chat Agent | Conversational AI with tool use for document editing, context-aware responses |
- Address Book Management: Store and manage referral recipients (specialists, facilities, labs)
- CSV Contact Import: Bulk import contacts with field mapping
- Searchable Recipients: Quick search and selection when creating referrals
- Smart Specialty Inference: Automatically suggests appropriate specialists based on clinical content
- Contact Categories: Organize by specialty, facility type, or custom categories
- Hybrid Search: Combines vector similarity (pgvector), BM25 keyword search, and knowledge graph traversal with configurable weights
- Medical Query Expansion: Automatic expansion of medical abbreviations (HTN, COPD, MI) and synonyms for better recall
- Adaptive Thresholds: Dynamically adjusts similarity cutoffs based on query length and score distribution
- MMR Reranking: Maximal Marginal Relevance ensures diverse, non-redundant results
- Knowledge Graph Visualization: Interactive pan/zoom/drag graph canvas showing entities (medications, conditions, procedures) and relationships from Neo4j
- Clinical Guidelines: Upload and search clinical guideline PDFs with chunking, OCR support, and recommendation extraction
- Streaming Responses: Progressive result display with cancellation support
- Conversation Context: Semantic follow-up detection maintains context across queries
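The MMR step above trades relevance against redundancy. A self-contained sketch over plain Python vectors (the project's real implementation lives in `src/rag/mmr_reranker.py`; its exact interface is not shown here):

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_rerank(query_vec, docs, k=3, lam=0.7):
    """docs: list of (doc_id, vector). Greedily pick k ids, balancing
    relevance to the query against similarity to already-picked docs."""
    selected, remaining = [], list(docs)
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            relevance = cosine(query_vec, vec)
            redundancy = max((cosine(vec, v) for _, v in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _ in selected]
```

With a low `lam`, a near-duplicate of the top result is skipped in favor of a dissimilar document.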
Real-time medical translation for multilingual patient consultations:
- Real-time Translation: Automatic translation as you type with smart debouncing
- Speech-to-Text: Record patient speech with automatic transcription
- Text-to-Speech: Play translated responses for patients with voice selection
- Language Support: 100+ languages with automatic detection
- Canned Responses: Customizable quick responses for common medical phrases organized by category
- Conversation Export: Save conversation transcripts for documentation
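"Smart debouncing" means translation fires only after the user pauses typing. A minimal stdlib sketch of that pattern (class and method names are illustrative, not the application's real API):

```python
import threading

class Debouncer:
    """Restartable countdown: the callback runs only if no new trigger
    arrives within delay_s seconds."""
    def __init__(self, delay_s, callback):
        self.delay_s = delay_s
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # typing continued: reset the clock
            self._timer = threading.Timer(self.delay_s, self.callback, args)
            self._timer.start()
```

Each keystroke would call `trigger(current_text)`; only the text present after the pause reaches the translation API.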
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4 Turbo | Streaming, function calling, dynamic model fetch |
| Anthropic | Claude Opus 4, Claude Sonnet 4, Claude Haiku 4 | Extended context, dynamic model fetch |
| Google Gemini | Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash | Multimodal, long context, dynamic model fetch |
| Groq | Llama 3.3 70B, Mixtral 8x7B, Gemma2 9B | Ultra-fast inference, dynamic model fetch |
| Cerebras | Llama 3.3 70B, Qwen 3 32B | Wafer-scale inference, dynamic model fetch |
| Ollama | Llama 3, Mistral, Qwen, Phi-3, etc. | Local/offline, privacy-focused, auto-detect |
- Intelligent Provider Routing: Automatic fallback and provider selection based on model configuration
- Dynamic Model Lists: Models fetched from provider APIs with TTL caching (1 hour) and fallback lists
- Streaming Support: Real-time response streaming for faster perceived performance
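The dynamic-model-list behavior (1-hour TTL cache with a static fallback) can be sketched as follows. Function names and the fallback list are illustrative, not the application's real API:

```python
import time

FALLBACK_MODELS = {"openai": ["gpt-4o", "gpt-4o-mini"]}  # example static fallback
_cache: dict[str, tuple[float, list[str]]] = {}
TTL_SECONDS = 3600  # 1 hour

def get_models(provider: str, fetch, now=time.monotonic) -> list[str]:
    """Return cached models if fresh; otherwise call fetch(), falling back
    to the static list if the provider API call fails."""
    entry = _cache.get(provider)
    if entry and now() - entry[0] < TTL_SECONDS:
        return entry[1]
    try:
        models = fetch()
        _cache[provider] = (now(), models)
        return models
    except Exception:
        return FALLBACK_MODELS.get(provider, [])
```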
| Provider | Model | Best For |
|---|---|---|
| Deepgram | Nova-2 Medical | Medical terminology accuracy, HIPAA-eligible |
| ElevenLabs | Scribe v2 | High accuracy, speaker diarization, entity detection, keyterm prompting |
| Groq | Whisper Large v3 Turbo | Speed (216x real-time), cost-effective |
| Modulate (Velma) | Velma Transcribe | Voice emotion detection (20+ emotions), speaker diarization, deepfake detection, PII/PHI redaction |
| Local Whisper | Turbo | Offline capability, privacy |
- ElevenLabs Integration: Multiple voice options with natural speech
- Model Selection: Flash v2.5 (ultra-low latency), Turbo v2.5, Multilingual v2
- Offline Fallback: pyttsx3 for offline TTS capability
- Mixin-Based Architecture: Large classes decomposed into focused mixins (AudioHandler: 5 mixins, ProcessingQueue: 7 mixins, RagProcessor: 4 mixins) with Protocol contracts
- Type Safety: TypedDict definitions for processing queue tasks, chat context, and guideline batches; runtime-checkable AppProtocol for mixin boundaries
- Secure API Key Storage: Fernet encryption with PBKDF2 key derivation, per-installation salt, machine-specific keys, legacy salt migration
- Security Decorators: Rate limiting, input sanitization with prompt injection detection, and secure API call wrappers
- PHI Redaction: Automatic redaction of 60+ sensitive field types in application logs and audit trail
- Audit Logging: Append-only HIPAA-compliant audit log tracking API key access, data exports, and recording operations
- Database Storage: SQLite with FTS5 full-text search, versioned migrations (17 versions), connection pooling with health checks
- Resilient API Calls: Circuit breaker pattern, exponential backoff, automatic retry, and STT provider failover chain
- Export Functionality: Export recordings and documents in PDF, DOCX, and text formats
- FHIR Support: Export clinical data in HL7 FHIR R4 format (Patient, Encounter, Condition, Observation, MedicationStatement, DocumentReference)
- Performance Optimizations: HTTP/2 support, connection pooling, thread pool executors, background processing queue with priority scheduling
- Import Guards: Optional dependencies (pygame, soundcard, fhir.resources, docx, reportlab) guarded with availability flags
- Cross-Platform: Windows, macOS, and Linux with platform-specific optimizations
- Comprehensive Test Suite: 4,100+ tests (unit + integration) with 50%+ critical path coverage
- Modern UI/UX: Built with Tkinter and ttkbootstrap featuring animations, visual indicators, dark/light themes
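The "Resilient API Calls" item above combines retry with exponential backoff. A hedged sketch of that behavior (the project's real implementation is in `src/utils/resilience.py` and may differ in signature and policy):

```python
import functools
import time

def retry(max_attempts=3, base_delay=0.5, backoff=2.0, sleep=time.sleep):
    """Retry a failing call, doubling the wait between attempts."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    sleep(delay)
                    delay *= backoff  # 0.5s, 1.0s, 2.0s, ...
        return wrapper
    return decorator
```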
## Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/cortexuvula/Medical-Assistant.git
cd Medical-Assistant

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the application
python main.py

# 4. Configure API keys via Settings → API Keys (keys are encrypted)
```

Minimum Requirements: At least one LLM provider API key (OpenAI, Anthropic, Gemini, Groq, or Cerebras) and one STT provider API key (Deepgram, ElevenLabs, Groq, or Modulate). Local Whisper and Ollama work without API keys.

## Installation
- Python 3.10 or higher (required for SDK compatibility)
- FFmpeg (for audio processing)
1. Clone or Download the Repository

   ```bash
   git clone https://github.com/cortexuvula/Medical-Assistant.git
   cd Medical-Assistant
   ```

2. Create Virtual Environment (Recommended)

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # macOS/Linux
   source venv/bin/activate
   ```

3. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure API Keys

   Option A - Application Dialog (Recommended):
   - Launch the application: `python main.py`
   - Go to Settings → API Keys
   - Enter your API keys (they are encrypted automatically)

   Option B - Environment File: Create a `.env` file in the project root:

   ```ini
   # LLM Providers
   OPENAI_API_KEY=sk-...
   ANTHROPIC_API_KEY=sk-ant-...
   GEMINI_API_KEY=AI...
   GROQ_API_KEY=gsk_...
   CEREBRAS_API_KEY=csk-...

   # Speech-to-Text Providers
   DEEPGRAM_API_KEY=...
   ELEVENLABS_API_KEY=...
   MODULATE_API_KEY=...

   # Optional: Local Models
   OLLAMA_API_URL=http://localhost:11434

   # Optional: RAG Integration
   NEON_DATABASE_URL=postgresql://user:pass@host/database?sslmode=require

   # Optional: Knowledge Graph
   NEO4J_BOLT_URL=bolt://localhost:7687
   NEO4J_USERNAME=neo4j
   NEO4J_PASSWORD=...
   ```

5. Install FFmpeg

   | Platform | Command |
   |---|---|
   | Windows | `winget install FFmpeg` or download from ffmpeg.org |
   | macOS | `brew install ffmpeg` |
   | Ubuntu/Debian | `sudo apt install ffmpeg` |
   | Fedora | `sudo dnf install ffmpeg` |

6. Ollama Setup (Optional)

   For local AI models without internet dependency:

   ```bash
   # Install Ollama from https://ollama.ai
   # Pull models
   ollama pull llama3
   ollama pull mistral
   ollama pull qwen2
   # Models are automatically detected by the application
   ```
## Pre-built Releases

Download pre-built executables from the Releases page:

| Platform | File | Notes |
|---|---|---|
| Windows | `MedicalAssistant.exe` | Requires FFmpeg installation |
| macOS | `MedicalAssistant-macOS.zip` | FFmpeg bundled, may require security approval |
| Linux | `MedicalAssistant` | Requires system FFmpeg |
- Executables include all Python dependencies
- API keys configured via the application's settings dialog
- First run may be slower due to antivirus scanning
- macOS users: Right-click → Open to bypass Gatekeeper on first run
## Building from Source

Requirements:

- Python 3.10+ with pip
- All dependencies: `pip install -r requirements.txt`

Windows:

```bat
scripts\build_windows.bat
```

macOS:

```bash
chmod +x scripts/build_macos.sh
./scripts/build_macos.sh
```

Linux:

```bash
chmod +x scripts/build_linux.sh
./scripts/build_linux.sh
```

Executables are output to the `dist/` directory.
## Usage Guide

Launch the application:

```bash
python main.py
```

The sidebar provides quick access to all application sections:
| Section | Description |
|---|---|
| Record | Start/stop/pause recordings with waveform display, timer, microphone selection, and optional real-time differential diagnosis every 2 minutes |
| SOAP Note | View and edit generated SOAP notes with 6 analysis sub-tabs (Medication Analysis, Differential Diagnosis, Clinical Guidelines, Emotional Assessment, ICD Validation, Medication QA) |
| Referral | Create professional referral letters with address book integration |
| Letter | Generate formal medical correspondence (patient, employer, insurance) |
| Chat | AI chat interface with context-aware suggestions, uses current document as context (Ctrl+/ to focus) |
| RAG | Hybrid document search with medical query expansion, knowledge graph visualization, and clinical guidelines |
| Recordings | Search, filter, and manage recordings with status indicators and batch processing |
| Analysis | Advanced analysis tools: diagnostic analysis, medication analysis, data extraction, clinical workflows |
Six editor tabs display and allow editing of content: Transcript, SOAP Note, Referral, Letter, Chat, and RAG.
| Tool | Function |
|---|---|
| Refine Text | Clean up transcribed text with AI assistance |
| Improve Text | Enhance clarity and medical terminology |
| Medication | Extract medications, check interactions, validate dosing |
| Diagnostic | Analyze symptoms and generate differential diagnoses |
| Workflow | Step-by-step clinical process guidance |
| Translation | Bidirectional medical translation assistant |
| Data Extract | Extract structured data (vitals, labs, medications) |
The context panel provides additional information for SOAP note generation:
- Click the Context button to open the side panel
- Add previous medical history, current medications, allergies
- Select from pre-built templates or create custom ones
- Context is automatically integrated into SOAP note generation
- Context persists during recording sessions
Manage referral recipients efficiently:
- Access via Tools → Manage Address Book
- Add specialists, facilities, and labs with contact details
- Import contacts from CSV: Tools → Import Contacts from CSV
- Quick search when creating referrals
- Automatic specialty inference from clinical content
For multilingual patient consultations:
- Access via Tools → Translation Assistant
- Select patient and doctor languages from dropdowns
- Record Patient Speech: Click microphone to record, transcribes automatically
- Type Responses: Real-time translation as you type
- Play Translation: Click speaker icon for TTS playback
- Canned Responses: Use quick responses for common phrases
- Export: Save conversation transcript
Query your document database with hybrid search:
- Navigate to the RAG tab
- Enter your search query (medical abbreviations are automatically expanded)
- Results rendered in markdown with source attribution
- Copy responses with the copy button
- Follow-up queries maintain conversation context
- Click Knowledge Graph to visualize entity relationships
- Upload clinical guidelines via the Guidelines tab for searchable reference

## Configuration
Access all settings through the comprehensive Preferences dialog:
| Tab | Settings |
|---|---|
| API Keys | All LLM keys (OpenAI, Anthropic, Gemini, Groq, Cerebras) and STT keys (Deepgram, ElevenLabs, Groq, Modulate) |
| Audio & STT | Provider settings (ElevenLabs, Deepgram, Groq, Modulate), TTS voice selection, audio quality |
| AI Models | Temperature settings per task, model selection, translation provider configuration |
| Prompts | Customize Refine, Improve, SOAP, Referral, and Advanced Analysis prompts |
| Storage | Default folder, Custom Vocabulary, Address Book management, Prefix Audio |
| General | Quick Continue Mode, Theme selection, Sidebar preferences, Keyboard shortcuts |
```text
Settings
├── Preferences...              [Ctrl+,]
├── ─────────────
├── Update API Keys             [Quick access]
├── ─────────────
├── Audio & Transcription ▸
│   ├── ElevenLabs Settings
│   ├── Deepgram Settings
│   ├── Groq Settings
│   ├── Modulate Settings
│   └── TTS Settings
├── AI & Models ▸
│   ├── Temperature Settings
│   ├── Agent Settings
│   ├── Translation Settings
│   └── MCP Tools
├── Prompt Settings ▸
│   ├── Refine Prompt Settings
│   ├── Improve Prompt Settings
│   ├── SOAP Note Settings
│   ├── Referral Settings
│   └── Advanced Analysis Settings
├── Data & Storage ▸
│   ├── Custom Vocabulary
│   ├── Manage Address Book...
│   ├── Import Contacts from CSV...
│   ├── Set Storage Folder
│   └── Record Prefix Audio
├── ─────────────
├── Export Prompts
├── Import Prompts
├── ─────────────
├── Quick Continue Mode         [Toggle]
└── Toggle Theme                [Alt+T]
```
| Shortcut | Action |
|---|---|
| `Ctrl+,` / `Cmd+,` | Open Preferences dialog |
| `Ctrl+/` / `Cmd+/` | Focus chat input |
| `Ctrl+N` / `Cmd+N` | New session |
| `Ctrl+S` / `Cmd+S` | Save |
| `Ctrl+Z` / `Cmd+Z` | Undo |
| `Ctrl+Y` / `Cmd+Shift+Z` | Redo |
| `Ctrl+E` / `Cmd+E` | Export as PDF |
| `Ctrl+D` / `Cmd+D` | Run Diagnostic Analysis |
| `Alt+T` | Toggle dark/light theme |
See SHORTCUTS.md for the complete list.
## Security

- Fernet Encryption: API keys encrypted at rest using the `cryptography` library with PBKDF2 (100K iterations)
- Per-Installation Salt: Unique 256-bit salt per installation (stored in `salt.bin`)
- Machine-Specific Keys: Encryption keys derived from machine identifiers (machine-id, filesystem UUID)
- Legacy Migration: Automatic migration from old static salt to per-install salt with version tracking
- No Plaintext Storage: Keys are never stored in plaintext on disk
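The PBKDF2 derivation step can be shown with the standard library alone. The application itself uses the third-party `cryptography` package's Fernet on top of a key like this; the function and parameter names below are illustrative:

```python
import base64
import hashlib

def derive_fernet_key(machine_secret: bytes, salt: bytes,
                      iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key via PBKDF2-HMAC-SHA256 and encode it
    urlsafe-base64, the format Fernet expects."""
    raw = hashlib.pbkdf2_hmac("sha256", machine_secret, salt, iterations)
    return base64.urlsafe_b64encode(raw)
```

The same machine secret and salt always yield the same key, which is what lets the app decrypt stored keys without ever writing the key itself to disk.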
- Rate Limiting: Per-provider rate limiting with configurable limits (e.g., 60 calls/minute for Anthropic)
- Input Sanitization: Prompt injection detection with optional strict mode that rejects dangerous content
- API Key Validation: Format validation with regex patterns for known provider key formats (OpenAI, Anthropic, Groq, Cerebras, Gemini)
- PHI Redaction: 60+ sensitive field types automatically redacted in application logs
- Audit Logging: Append-only audit trail tracking sensitive operations (API key access, data exports)
- Database Protection: POSIX 0600 permissions and Windows ACL enforcement on database files
- Path Traversal Protection: File paths validated after resolution to prevent encoded traversal attacks
- Secure HTTP: Explicit TLS verification on all HTTPS clients via centralized client manager
- No Data Transmission: Patient data only sent to configured AI providers
- Local Processing Options: Use Ollama + local Whisper for completely offline operation
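Log-side PHI redaction can be implemented as a `logging.Filter` that scrubs records before they are emitted. This is only an illustration with two example patterns; the project's real redaction (`src/utils/structured_logging.py`) covers 60+ field types:

```python
import logging
import re

# Example patterns only; real coverage is far broader.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED-EMAIL]"),
]

class PHIRedactionFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, replacement in PHI_PATTERNS:
            msg = pattern.sub(replacement, msg)
        record.msg, record.args = msg, None  # freeze the redacted message
        return True  # never drop the record, only scrub it
```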
- Use the in-app API key dialog (encrypted) rather than `.env` files
- Keep API keys confidential and rotate them periodically
- Use local Whisper or Ollama for sensitive data when possible
- Review provider privacy policies for HIPAA compliance requirements
- Monitor the audit log (`audit.log`) for unexpected access patterns

## Healthcare Standards
Medical Assistant supports HL7 FHIR R4 for healthcare interoperability:
- Export to FHIR: File → Export → Export as FHIR...
- Clipboard Export: File → Export → Export FHIR to Clipboard
- Supported Resources:
- Patient
- Encounter
- Condition (diagnoses)
- Observation (vitals, labs)
- MedicationStatement
- DocumentReference (SOAP notes)
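The resources above map to plain FHIR R4 JSON. A hand-built sketch of the shape (the values are invented examples; the application itself uses the `fhir.resources` package for validated export):

```python
import json

# Minimal FHIR R4 Patient resource wrapped in a collection Bundle.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1980-04-12",
}

bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [{"resource": patient}],
}

fhir_json = json.dumps(bundle, indent=2)
```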
- ICD-9: Legacy code support for historical records
- ICD-10: Current standard with automatic suggestions
- Dual Coding: Generate both ICD-9 and ICD-10 codes simultaneously

## Architecture
```text
Medical-Assistant/
├── main.py                      # Application entry point
├── src/                         # ~150K LOC across 420+ modules
│   ├── ai/                      # AI providers and processors
│   │   ├── agents/              # 8 specialized AI agents
│   │   │   ├── base.py          # Base agent with caching, validation
│   │   │   ├── medication.py    # Drug interactions, dosing
│   │   │   ├── diagnostic.py    # Differential diagnosis
│   │   │   ├── compliance.py    # Documentation audit
│   │   │   ├── workflow.py      # Clinical workflows
│   │   │   ├── chat.py          # Conversational with tool use
│   │   │   ├── synopsis.py      # SOAP note summarization
│   │   │   ├── referral.py      # Referral letter generation
│   │   │   ├── data_extraction.py  # Structured clinical data extraction
│   │   │   └── models.py        # AgentConfig, AgentTask, AgentResponse
│   │   ├── providers/           # Modular AI provider implementations
│   │   │   ├── openai_provider.py
│   │   │   ├── anthropic_provider.py
│   │   │   ├── gemini_provider.py
│   │   │   ├── groq_provider.py
│   │   │   ├── cerebras_provider.py
│   │   │   ├── ollama_provider.py
│   │   │   └── router.py        # Intelligent provider routing
│   │   ├── ai_processor.py      # Core AI processing logic
│   │   ├── chat_processor.py    # Chat with TypedDict context
│   │   ├── rag_processor.py     # RAG facade (4 mixins)
│   │   ├── rag_query.py         # RagQueryMixin
│   │   ├── rag_response.py      # RagResponseMixin
│   │   ├── rag_ui.py            # RagUIMixin
│   │   └── rag_feedback.py      # RagFeedbackMixin
│   ├── audio/                   # Audio recording and processing
│   │   ├── audio.py             # AudioHandler facade (5 mixins)
│   │   ├── mixins/              # Decomposed audio functionality
│   │   │   ├── transcription_mixin.py
│   │   │   ├── recording_mixin.py
│   │   │   ├── processing_mixin.py
│   │   │   ├── device_mixin.py
│   │   │   └── file_mixin.py
│   │   ├── recording_manager.py
│   │   └── periodic_analysis.py
│   ├── core/                    # Application core
│   │   ├── app.py               # Main application class
│   │   ├── interfaces.py        # Runtime-checkable Protocol definitions
│   │   ├── service_registry.py  # Dependency injection registry
│   │   ├── protocols.py         # AppProtocol for mixin contracts
│   │   ├── app_initializer.py
│   │   ├── controllers/         # 5 controllers (processing, recording, persistence, config, window)
│   │   ├── handlers/            # 4 handlers (finalization, pause/resume, periodic analysis, recovery)
│   │   ├── env_schema.py        # 35 env vars documented
│   │   └── config.py
│   ├── database/                # Data persistence
│   │   ├── database.py          # Database with file-level security
│   │   ├── db_migrations.py     # 17 versioned schema migrations
│   │   ├── db_pool.py           # Connection pooling with health checks
│   │   └── mixins/              # Query mixins (recordings, queue, diagnostics)
│   ├── exporters/               # Document export
│   │   ├── fhir_exporter.py     # HL7 FHIR R4
│   │   ├── docx_exporter.py     # Word documents
│   │   └── rag_exporter.py      # RAG document export
│   ├── managers/                # 15 singleton managers
│   │   ├── agent_manager.py
│   │   ├── api_key_manager.py
│   │   ├── notification_manager.py
│   │   ├── vocabulary_manager.py
│   │   ├── tts_manager.py
│   │   ├── autosave_manager.py
│   │   ├── translation_manager.py
│   │   └── ...                  # + file, log, data_folder, recipient, RAG managers
│   ├── processing/              # Document processing
│   │   ├── processing_queue.py  # Queue facade (7 mixins)
│   │   ├── queue_types.py       # TypedDict task definitions
│   │   ├── task_executor_mixin.py
│   │   ├── task_lifecycle_mixin.py
│   │   ├── notification_mixin.py
│   │   ├── batch_processing_mixin.py
│   │   ├── reprocessing_mixin.py
│   │   ├── document_generation_mixin.py
│   │   ├── guidelines_processing_mixin.py
│   │   └── generators/          # 12 generators (SOAP, referral, letter, diagnostic, medication, compliance, etc.)
│   ├── rag/                     # RAG subsystem (46 modules)
│   │   ├── hybrid_retriever.py
│   │   ├── streaming_retriever.py
│   │   ├── query_expander.py    # Medical term expansion
│   │   ├── bm25_search.py       # Full-text keyword search
│   │   ├── adaptive_threshold.py
│   │   ├── mmr_reranker.py      # Diversity reranking
│   │   ├── conversation_manager.py
│   │   ├── graph_data_provider.py  # Neo4j knowledge graph
│   │   ├── guidelines_upload_manager.py
│   │   └── neon_vector_store.py
│   ├── stt_providers/           # Speech-to-text (5 providers + failover)
│   │   ├── base.py
│   │   ├── deepgram.py          # Nova-2 Medical
│   │   ├── elevenlabs.py        # Scribe v2, diarization
│   │   ├── groq.py              # Whisper Large v3 Turbo
│   │   ├── modulate.py          # Velma with emotion detection
│   │   ├── whisper.py           # Local Whisper
│   │   └── failover.py          # Automatic provider failover
│   ├── tts_providers/           # Text-to-speech
│   │   ├── elevenlabs_tts.py    # Flash v2.5, Turbo v2.5, Multilingual v2
│   │   └── pyttsx_provider.py   # Offline fallback
│   ├── translation/             # Translation providers
│   │   └── deep_translator_provider.py  # Google, DeepL, Microsoft
│   ├── ui/                      # User interface
│   │   ├── workflow_ui.py       # Main UI orchestration
│   │   ├── chat_ui.py           # Chat interface
│   │   ├── menu_manager.py      # Application menus
│   │   ├── theme_manager.py     # Dark/light themes
│   │   ├── dialogs/             # 80+ dialog windows
│   │   └── components/          # Reusable UI components
│   └── utils/                   # Utilities
│       ├── security.py          # SecurityManager facade
│       ├── security/            # Encryption, key storage, validators, rate limiting
│       ├── resilience.py        # Circuit breaker, retry, backoff
│       ├── validation.py        # API key patterns, input validation
│       ├── audit_logger.py      # HIPAA-compliant audit trail
│       └── structured_logging.py  # PHI redaction in logs
├── config/                      # Configuration files
├── tests/                       # 4,100+ tests
│   ├── unit/                    # Component tests
│   └── integration/             # End-to-end tests
└── scripts/                     # Build scripts (Windows, macOS, Linux)
```
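The versioned-migration idea behind `db_migrations.py` can be sketched with stdlib `sqlite3` and `PRAGMA user_version` (the project's actual mechanism may differ; the table and columns below are invented examples):

```python
import sqlite3

MIGRATIONS = {  # version -> DDL that brings the schema to that version
    1: "CREATE TABLE recordings (id INTEGER PRIMARY KEY, transcript TEXT)",
    2: "ALTER TABLE recordings ADD COLUMN soap_note TEXT",
}

def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        with conn:  # each migration commits atomically
            conn.execute(MIGRATIONS[version])
            conn.execute(f"PRAGMA user_version = {version}")
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Re-running `migrate` on an up-to-date database is a no-op, which makes startup migration safe.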
| Pattern | Usage |
|---|---|
| Mixin/Facade | Large classes decomposed into focused mixins; facades preserve backward compatibility |
| Protocol Contracts | AppProtocol and runtime-checkable interfaces define the contracts mixins and controllers expect |
| TypedDict Schemas | ProcessingTask, ChatContextData, BatchTaskStatus, etc., for type-safe dict structures |
| Provider Pattern | All AI, STT, and TTS providers inherit from base classes for consistent interfaces |
| Singleton Managers | Agent, translation, and API key managers ensure single instances |
| Circuit Breaker | Resilient API calls with automatic failure detection and recovery |
| Security Decorators | Rate limiting and input sanitization applied via decorators |
| Migration System | Database schema evolution with versioned migrations |
| Observer Pattern | UI updates via event-driven architecture with thread-safe scheduling |
| Service Registry | Dependency injection via ServiceRegistry decouples controllers from the main app class |
| Controller Pattern | 5 controllers (processing, recording, persistence, config, window) encapsulate domain logic |
| Queue System | Background processing with priority, stale task eviction, and batch tracking |
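The Mixin/Facade and Protocol Contracts rows combine naturally; a condensed illustration follows. The class and method names here are invented for the example, not the project's real classes:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AppProtocol(Protocol):
    """Contract each mixin relies on, instead of the concrete app class."""
    def status(self, message: str) -> None: ...

class RecordingMixin:
    def start_recording(self: "AppProtocol") -> None:
        self.status("recording started")

class DeviceMixin:
    def select_device(self: "AppProtocol", name: str) -> None:
        self.status(f"device: {name}")

class AudioHandler(RecordingMixin, DeviceMixin):
    """Facade: composes focused mixins behind one backward-compatible class."""
    def __init__(self) -> None:
        self.messages: list[str] = []

    def status(self, message: str) -> None:
        self.messages.append(message)
```

Because `AppProtocol` is runtime-checkable, the facade can be verified with `isinstance` at mixin boundaries rather than by inheritance.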
```text
Audio Input → STT Provider (failover chain) → Transcript → AI Processing → Document Generation
                     ↓                                          ↓
               Emotion Data*                          Agent System (8 agents)
                     ↓                                          ↓
             SOAP Integration                Database Storage → Export (PDF/DOCX/FHIR)
                                                      ↓
                                     RAG Vector Store → Knowledge Graph (Neo4j)
                                                      ↓
                                     Hybrid Search (vector + BM25 + graph)
```

\* Voice emotion analysis available with the Modulate (Velma) STT provider
## Testing

```bash
# Install test dependencies
pip install -r requirements-dev.txt

# Run all tests with pytest
PYTHONPATH=src pytest tests/unit/ tests/integration/

# Run with coverage report
PYTHONPATH=src pytest --cov=src --cov-report=html

# Run specific test suites
PYTHONPATH=src pytest tests/unit/
PYTHONPATH=src pytest tests/integration/

# Run specific test files
PYTHONPATH=src pytest tests/unit/test_audio_extended.py
PYTHONPATH=src pytest tests/unit/test_processing_queue.py
PYTHONPATH=src pytest tests/unit/test_stt_providers/
```

| Suite | Tests | Coverage |
|---|---|---|
| Validation & Config | 300+ | Input validation, settings roundtrip, configuration |
| Exporters | 137 | PDF, DOCX, FHIR R4, RAG export |
| Error Handling | 127 | Structured errors, recovery, logging |
| AI & Chat | 150+ | Chat processor, base agent, medication prompts |
| Audio & Recording | 100+ | Audio handler, prefix caching, mixin decomposition |
| STT Providers | 150+ | Deepgram, ElevenLabs, Groq, Modulate, Whisper, failover |
| Processing Queue | 90+ | Task lifecycle, batch processing, stale eviction, thread safety |
| RAG & Documents | 140+ | Document CRUD, hybrid search, query expansion, RAG processor |
| Security | 50+ | Encryption, key migration, validation, rate limiting |
| Differential & NER | 170+ | Differential tracker, medical NER, state machine |
| Letter Generation | 50 | All letter types, edge cases, template rendering |
| Periodic Analysis | 57 | Timer management, segment extraction, cleanup |
| TTS & Translation | 115+ | Provider management, safe methods, fallbacks, sessions |
| Structured Logging | 79 | PHI redaction, log formatting |
| Integration | 29 | Settings roundtrip, API key crypto, DB migrations |
The project uses GitHub Actions for continuous integration:
- Tests Workflow: Runs on every push and PR across Windows, macOS, and Linux
- Build Workflow: Builds executables for all platforms
- CodeQL: Security scanning for vulnerabilities

## Troubleshooting
| Issue | Solution |
|---|---|
| API Connection Errors | Verify API keys in Settings → API Keys. Check internet connection. |
| Audio Not Recording | Check microphone permissions. Verify FFmpeg installation. Select correct input device. |
| Transcription Errors | Try a different STT provider. Check audio quality. Ensure API key is valid. |
| Only One Speaker Detected | In ElevenLabs Settings, leave "Number of Speakers" empty for auto-detection. Lower the "Diarization Threshold" (e.g. 0.3) for more sensitive speaker separation. |
| Ollama Timeouts | Use smaller model variants. Check system resources. Increase timeout in settings. |
| Queue Stuck | Check logs via Help → View Logs. Restart application if needed. |
| Theme Not Changing | Restart application after theme change for full effect. |
| Export Failures | Check write permissions for output directory. Ensure sufficient disk space. |
- Application Logs: Help → View Logs → View Log Contents
- Log Location: Help → View Logs → Open Logs Folder
- GitHub Issues: Report bugs or request features
For detailed logging, set the environment variable:
```bash
export LOG_LEVEL=DEBUG   # Linux/macOS
set LOG_LEVEL=DEBUG      # Windows
```

## System Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Operating System | Windows 10, macOS 10.14, Ubuntu 20.04 | Windows 11, macOS 13+, Ubuntu 22.04 |
| Python | 3.10 | 3.11+ |
| Memory | 4GB RAM | 8GB+ RAM (16GB with local Whisper) |
| Storage | 500MB | 2GB+ for recordings and RAG database |
| Internet | Required for cloud AI | Optional with Ollama + local Whisper |
| Audio | Any microphone | USB condenser microphone |
| Optional | - | PostgreSQL (Neon) for RAG, Neo4j for knowledge graph |

## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes with clear commit messages
4. Run tests: `pytest`
5. Push to your fork: `git push origin feature/amazing-feature`
6. Submit a Pull Request
```bash
# Clone your fork
git clone https://github.com/YOUR_USERNAME/Medical-Assistant.git
cd Medical-Assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run tests to verify setup
pytest
```

- Keyboard Shortcuts
- Desktop Shortcuts - Creating application shortcuts
- Security Features
- Agent System
- Medication Agent
- Testing Guide
- CLAUDE.md - Development guide for AI assistants

## License
Distributed under the MIT License. See LICENSE for more information.
- OpenAI - GPT models and Whisper
- Anthropic - Claude models
- Google AI - Gemini models
- Deepgram - Nova-2 Medical STT
- ElevenLabs - Scribe STT and TTS
- Groq - Fast LLM and Whisper inference
- Cerebras - Wafer-scale LLM inference
- Modulate.ai - Velma voice emotion detection
- Ollama - Local model hosting
- Neon - Serverless PostgreSQL with pgvector
- Neo4j - Knowledge graph database
- ttkbootstrap - Modern UI themes
Made with care for healthcare professionals