Medical Assistant

Medical Assistant is a desktop application for medical documentation that transcribes and refines spoken medical notes. It supports multiple AI providers (OpenAI, Anthropic/Claude, Google Gemini, Groq, Cerebras, and Ollama) and is built on a modular, mixin-based architecture (~150K LOC across 420+ modules) covering audio-to-text conversion, clinical note generation, and AI-assisted medical analysis.

Features

Core Features

Feature	Description
Workflow-Based Interface	Sidebar navigation with 8 sections (Record, SOAP Note, Referral, Letter, Chat, RAG, Recordings, Analysis) plus 6 text editor tabs and 6 SOAP analysis sub-tabs
Unified Preferences	Comprehensive settings dialog (`Ctrl+,`) with tabbed interface for API keys, audio settings, AI models, prompts, and storage
AI-Powered Chat	ChatGPT-style interface with context-aware suggestions for interacting with your medical notes
RAG Document Search	Hybrid vector + keyword search with medical query expansion, adaptive thresholds, and MMR diversity
Knowledge Graph	Interactive visualization of medical entities and relationships from Neo4j
RSVP Reader	Speed-reading interface for SOAP notes with ORP highlighting and section navigation
Advanced Recording	Record medical conversations with visual feedback, waveform display, timer, and pause/resume capabilities
Real-Time Analysis	Optional periodic analysis during recording generates differential diagnoses every 2 minutes
Queue System	Background processing queue with "Quick Continue Mode" for efficient multi-patient recording sessions
Batch Processing	Process multiple recordings or audio files at once with progress tracking and statistics
Recordings Manager	Dedicated tab with search, filter, and document status indicators (✓, —, 🔄, ❌)

Medical Documentation

Context-Aware SOAP Notes
- Side panel for adding previous medical information
- Automatically integrates patient history into SOAP note generation
- Smart context preservation during recordings
- Voice emotion analysis integration (when using Modulate STT): the patient's emotional state is woven into the Subjective section

ICD Code Integration
- Choose between ICD-9, ICD-10, or both code versions
- Automatic code suggestions based on diagnoses

Smart Templates
- Pre-built templates: Follow-up, New Patient, Telehealth, Emergency, Pediatric, Geriatric
- Create and save custom context templates
- Template import/export functionality

Multi-Format Document Generation
- SOAP notes with customizable sections
- Professional referral letters
- Patient correspondence
- Employer/insurance documentation

AI Agents

Medical Assistant includes specialized AI agents for different clinical tasks:

Agent	Capabilities
Medication Analysis	Extract medications from text, check drug-drug interactions with severity levels, validate dosing appropriateness, suggest alternatives, generate prescriptions
Diagnostic Agent	Analyze symptoms, generate differential diagnoses ranked by likelihood, provide ICD codes, suggest diagnostic workups
Compliance Agent	Audit SOAP notes against clinical documentation standards, flag missing elements, score completeness
Data Extraction	Extract structured clinical data (vitals, labs, medications, diagnoses, allergies) from unstructured text
Clinical Workflow	Step-by-step guidance for patient intake, diagnostic workups, treatment protocols, and follow-up care with interactive checklists
Referral Agent	Generate professional referral letters with address book integration and specialty inference
Synopsis Agent	Generate concise SOAP note summaries for quick review
Chat Agent	Conversational AI with tool use for document editing, context-aware responses

Referral & Address Book

Address Book Management: Store and manage referral recipients (specialists, facilities, labs)
CSV Contact Import: Bulk import contacts with field mapping
Searchable Recipients: Quick search and selection when creating referrals
Smart Specialty Inference: Automatically suggests appropriate specialists based on clinical content
Contact Categories: Organize by specialty, facility type, or custom categories

RAG & Knowledge Graph

Hybrid Search: Combines vector similarity (pgvector), BM25 keyword search, and knowledge graph traversal with configurable weights
Medical Query Expansion: Automatic expansion of medical abbreviations (HTN, COPD, MI) and synonyms for better recall
Adaptive Thresholds: Dynamically adjusts similarity cutoffs based on query length and score distribution
MMR Reranking: Maximal Marginal Relevance ensures diverse, non-redundant results
Knowledge Graph Visualization: Interactive pan/zoom/drag graph canvas showing entities (medications, conditions, procedures) and relationships from Neo4j
Clinical Guidelines: Upload and search clinical guideline PDFs with chunking, OCR support, and recommendation extraction
Streaming Responses: Progressive result display with cancellation support
Conversation Context: Semantic follow-up detection maintains context across queries
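MMR reranking can be illustrated with a small, dependency-free sketch. It is not the app's `mmr_reranker.py`: the function names, the `lambda_` trade-off parameter, and the use of plain cosine similarity over Python lists are assumptions for illustration. At each step the sketch picks the candidate that maximizes `lambda_ * relevance - (1 - lambda_) * (highest similarity to an already selected result)`.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_: float = 0.7,
) -> List[int]:
    """Greedily pick up to k document indices using Maximal Marginal Relevance.

    lambda_ = 1.0 ranks purely by relevance; lower values penalize documents
    that are similar to ones already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Two identical chunks and one distinct chunk.
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mmr_rerank([1.0, 0.5], docs, k=2, lambda_=1.0))  # [0, 1]: pure relevance keeps the duplicate
print(mmr_rerank([1.0, 0.5], docs, k=2, lambda_=0.5))  # [0, 2]: MMR swaps in the distinct chunk
```

In practice the candidate pool would be the top hits from the hybrid search, so the quadratic similarity comparisons stay cheap.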

Bidirectional Translation Assistant

Real-time medical translation for multilingual patient consultations:

Real-time Translation: Automatic translation as you type with smart debouncing
Speech-to-Text: Record patient speech with automatic transcription
Text-to-Speech: Play translated responses for patients with voice selection
Language Support: 100+ languages with automatic detection
Canned Responses: Customizable quick responses for common medical phrases organized by category
Conversation Export: Save conversation transcripts for documentation
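Debouncing means a translation request fires only after the user pauses typing, instead of once per keystroke. A minimal sketch, assuming a `threading.Timer`-based `Debouncer` class and a 0.5-second delay (both hypothetical; the app's actual delay and mechanism are not documented here):

```python
import threading
import time
from typing import Callable, Optional


class Debouncer:
    """Invoke `callback(text)` only after `delay` seconds pass with no newer submit()."""

    def __init__(self, delay: float, callback: Callable[[str], None]) -> None:
        self.delay = delay
        self.callback = callback
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def submit(self, text: str) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # discard the pending, now-stale request
            self._timer = threading.Timer(self.delay, self.callback, args=(text,))
            self._timer.daemon = True
            self._timer.start()

    def cancel(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


if __name__ == "__main__":
    debouncer = Debouncer(0.5, lambda text: print("translate:", text))
    for partial in ("The", "The pain", "The pain is sharp"):
        debouncer.submit(partial)  # each keystroke restarts the countdown
    time.sleep(1.0)  # only "The pain is sharp" reaches the translator
```

In a Tkinter UI the same idea is usually written with `widget.after(ms, fn)` and `widget.after_cancel(id)`, which keeps the callback on the Tk main thread; a `threading.Timer` callback runs on a worker thread and must not touch widgets directly.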

AI & Transcription

LLM Providers (Modular Architecture)

Provider	Models	Features
OpenAI	GPT-4o, GPT-4o-mini, GPT-4 Turbo	Streaming, function calling, dynamic model fetch
Anthropic	Claude Opus 4, Claude Sonnet 4, Claude Haiku 4	Extended context, dynamic model fetch
Google Gemini	Gemini 2.0 Flash, Gemini 1.5 Pro, Gemini 1.5 Flash	Multimodal, long context, dynamic model fetch
Groq	Llama 3.3 70B, Mixtral 8x7B, Gemma2 9B	Ultra-fast inference, dynamic model fetch
Cerebras	Llama 3.3 70B, Qwen 3 32B	Wafer-scale inference, dynamic model fetch
Ollama	Llama 3, Mistral, Qwen, Phi-3, etc.	Local/offline, privacy-focused, auto-detect

Intelligent Provider Routing: Automatic fallback and provider selection based on model configuration
Dynamic Model Lists: Models fetched from provider APIs with TTL caching (1 hour) and fallback lists
Streaming Support: Real-time response streaming for faster perceived performance
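The 1-hour TTL caching of model lists, with a fallback list when the provider API is unreachable, can be sketched as follows. `ModelListCache`, the `fetch` callable, and the serve-stale-on-error behavior are assumptions, not the app's API; the injectable clock only exists to keep the example testable.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ModelListCache:
    """Cache per-provider model lists for `ttl_seconds`, falling back on fetch errors."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        fallback: Dict[str, List[str]],
        ttl_seconds: float = 3600.0,  # 1 hour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._fallback = fallback
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, provider: str) -> List[str]:
        now = self._clock()
        cached: Optional[Tuple[float, List[str]]] = self._entries.get(provider)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no network call
        try:
            models = self._fetch(provider)
        except Exception:
            # Provider API unreachable: prefer a stale copy, then the static list.
            if cached is not None:
                return cached[1]
            return list(self._fallback.get(provider, []))
        self._entries[provider] = (now, models)
        return models
```

A caller would pass a function that queries the provider's model-listing endpoint as `fetch`, and a hard-coded dictionary of known-good model names as `fallback`.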

Speech-to-Text Providers

Provider	Model	Best For
Deepgram	Nova-2 Medical	Medical terminology accuracy, HIPAA-eligible
ElevenLabs	Scribe v2	High accuracy, speaker diarization, entity detection, keyterm prompting
Groq	Whisper Large v3 Turbo	Speed (216x real-time), cost-effective
Modulate (Velma)	Velma Transcribe	Voice emotion detection (20+ emotions), speaker diarization, deepfake detection, PII/PHI redaction
Local Whisper	Turbo	Offline capability, privacy

Text-to-Speech

ElevenLabs Integration: Multiple voice options with natural speech
Model Selection: Flash v2.5 (ultra-low latency), Turbo v2.5, Multilingual v2
Offline Fallback: pyttsx3 for offline TTS capability

Technical Features

Mixin-Based Architecture: Large classes decomposed into focused mixins (AudioHandler: 5 mixins, ProcessingQueue: 7 mixins, RagProcessor: 4 mixins) with Protocol contracts
Type Safety: TypedDict definitions for processing queue tasks, chat context, and guideline batches; runtime-checkable AppProtocol for mixin boundaries
Secure API Key Storage: Fernet encryption with PBKDF2 key derivation, per-installation salt, machine-specific keys, legacy salt migration
Security Decorators: Rate limiting, input sanitization with prompt injection detection, and secure API call wrappers
PHI Redaction: Automatic redaction of 60+ sensitive field types in application logs and audit trail
Audit Logging: Append-only HIPAA-compliant audit log tracking API key access, data exports, and recording operations
Database Storage: SQLite with FTS5 full-text search, versioned migrations (17 versions), connection pooling with health checks
Resilient API Calls: Circuit breaker pattern, exponential backoff, automatic retry, and STT provider failover chain
Export Functionality: Export recordings and documents in PDF, DOCX, and text formats
FHIR Support: Export clinical data in HL7 FHIR R4 format (Patient, Encounter, Condition, Observation, MedicationStatement, DocumentReference)
Performance Optimizations: HTTP/2 support, connection pooling, thread pool executors, background processing queue with priority scheduling
Import Guards: Optional dependencies (pygame, soundcard, fhir.resources, docx, reportlab) guarded with availability flags
Cross-Platform: Windows, macOS, and Linux with platform-specific optimizations
Comprehensive Test Suite: 4,100+ tests (unit + integration) with 50%+ critical path coverage
Modern UI/UX: Built with Tkinter and ttkbootstrap featuring animations, visual indicators, dark/light themes
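The circuit breaker pattern listed under Resilient API Calls can be illustrated with a simplified, single-threaded sketch. The class name, thresholds, and three states follow the textbook pattern and are not taken from `src/utils/resilience.py`. After `failure_threshold` consecutive failures the breaker opens and rejects calls immediately. After `recovery_timeout` seconds it lets one trial call through (half-open) and closes again if that call succeeds.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the provider while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("provider temporarily disabled after repeated failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open after too many failures, or re-open if the half-open trial failed.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

A production version would add a lock so concurrent callers cannot all slip through during the half-open window, and would usually combine the breaker with exponential backoff between retries.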

Quick Start

```
# 1. Clone the repository
git clone https://github.com/cortexuvula/Medical-Assistant.git
cd Medical-Assistant

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the application
python main.py

# 4. Configure API keys via Settings → API Keys (keys are encrypted)
```

Minimum Requirements: At least one LLM provider (an API key for OpenAI, Anthropic, Gemini, Groq, or Cerebras, or a local Ollama install) and one STT provider (an API key for Deepgram, ElevenLabs, Groq, or Modulate, or Local Whisper). Ollama and Local Whisper need no API keys.
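For the `.env` route, this rule can be expressed as a small pre-flight check. This is a hypothetical helper, not part of the app: it only inspects environment variables (keys entered through the encrypted Settings dialog are not visible to it), treats a set `OLLAMA_API_URL` as "Ollama configured", and takes Local Whisper as an explicit flag.

```python
import os
from typing import List, Mapping

# Variable names match the .env example under Configure API Keys.
LLM_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY", "GROQ_API_KEY", "CEREBRAS_API_KEY")
STT_KEYS = ("DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "GROQ_API_KEY", "MODULATE_API_KEY")


def missing_requirements(env: Mapping[str, str], use_local_whisper: bool = False) -> List[str]:
    """Return human-readable problems; an empty list means the minimum is met."""
    problems: List[str] = []
    # Assumption: a non-empty OLLAMA_API_URL counts as a configured local LLM.
    has_llm = any(env.get(k) for k in LLM_KEYS) or bool(env.get("OLLAMA_API_URL"))
    has_stt = any(env.get(k) for k in STT_KEYS) or use_local_whisper
    if not has_llm:
        problems.append("no LLM provider: set an LLM API key or OLLAMA_API_URL")
    if not has_stt:
        problems.append("no STT provider: set an STT API key or enable local Whisper")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements(os.environ):
        print("warning:", problem)
```

Note that a single Groq key satisfies both requirements, since Groq appears in both provider lists.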

Installation

Prerequisites

Python 3.10 or higher (required for SDK compatibility)
FFmpeg (for audio processing)

Step-by-Step Installation

Clone or Download the Repository

```
git clone https://github.com/cortexuvula/Medical-Assistant.git
cd Medical-Assistant
```

Create Virtual Environment (Recommended)

```
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

Install Dependencies
```
pip install -r requirements.txt
```

Configure API Keys

Option A - Application Dialog (Recommended):

Launch the application: `python main.py`
Go to Settings → API Keys
Enter your API keys (they are encrypted automatically)

Option B - Environment File: Create a .env file in the project root:

```
# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AI...
GROQ_API_KEY=gsk_...
CEREBRAS_API_KEY=csk-...

# Speech-to-Text Providers
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
MODULATE_API_KEY=...

# Optional: Local Models
OLLAMA_API_URL=http://localhost:11434

# Optional: RAG Integration
NEON_DATABASE_URL=postgresql://user:pass@host/database?sslmode=require

# Optional: Knowledge Graph
NEO4J_BOLT_URL=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
```

Install FFmpeg

Platform	Command
Windows	`winget install FFmpeg` or download from ffmpeg.org
macOS	`brew install ffmpeg`
Ubuntu/Debian	`sudo apt install ffmpeg`
Fedora	`sudo dnf install ffmpeg`

Ollama Setup (Optional)

For local AI models without internet dependency:

```
# Install Ollama from https://ollama.ai

# Pull models
ollama pull llama3
ollama pull mistral
ollama pull qwen2

# Models are automatically detected by the application
```

Pre-built Releases

Download pre-built executables from the Releases page:

Platform	File	Notes
Windows	`MedicalAssistant.exe`	Requires FFmpeg installation
macOS	`MedicalAssistant-macOS.zip`	FFmpeg bundled, may require security approval
Linux	`MedicalAssistant`	Requires system FFmpeg

Release Notes

Executables include all Python dependencies
API keys configured via the application's settings dialog
First run may be slower due to antivirus scanning
macOS users: Right-click → Open to bypass Gatekeeper on first run

Building from Source

Prerequisites

Python 3.10+ with pip
All dependencies: pip install -r requirements.txt

Build Commands

Windows:

```
scripts\build_windows.bat
```

macOS:

```
chmod +x scripts/build_macos.sh
./scripts/build_macos.sh
```

Linux:

```
chmod +x scripts/build_linux.sh
./scripts/build_linux.sh
```

Executables are output to the dist/ directory.

Usage Guide

Launching the Application

```
python main.py
```

Sidebar Navigation

The sidebar provides quick access to all application sections:

Section	Description
Record	Start/stop/pause recordings with waveform display, timer, microphone selection, and optional real-time differential diagnosis every 2 minutes
SOAP Note	View and edit generated SOAP notes with 6 analysis sub-tabs (Medication Analysis, Differential Diagnosis, Clinical Guidelines, Emotional Assessment, ICD Validation, Medication QA)
Referral	Create professional referral letters with address book integration
Letter	Generate formal medical correspondence (patient, employer, insurance)
Chat	AI chat interface with context-aware suggestions, uses current document as context (`Ctrl+/` to focus)
RAG	Hybrid document search with medical query expansion, knowledge graph visualization, and clinical guidelines
Recordings	Search, filter, and manage recordings with status indicators and batch processing
Analysis	Advanced analysis tools: diagnostic analysis, medication analysis, data extraction, clinical workflows

Text Editor Tabs

Six editor tabs display editable content: Transcript, SOAP Note, Referral, Letter, Chat, and RAG.

Tools (Sidebar)

Tool	Function
Refine Text	Clean up transcribed text with AI assistance
Improve Text	Enhance clarity and medical terminology
Medication	Extract medications, check interactions, validate dosing
Diagnostic	Analyze symptoms and generate differential diagnoses
Workflow	Step-by-step clinical process guidance
Translation	Bidirectional medical translation assistant
Data Extract	Extract structured data (vitals, labs, medications)

Context Panel

The context panel provides additional information for SOAP note generation:

Click the Context button to open the side panel
Add previous medical history, current medications, allergies
Select from pre-built templates or create custom ones
Context is automatically integrated into SOAP note generation
Context persists during recording sessions

Address Book

Manage referral recipients efficiently:

Access via Tools → Manage Address Book
Add specialists, facilities, and labs with contact details
Import contacts from CSV: Tools → Import Contacts from CSV
Quick search when creating referrals
Automatic specialty inference from clinical content

Translation Assistant

For multilingual patient consultations:

Access via Tools → Translation Assistant
Select patient and doctor languages from dropdowns
Record Patient Speech: Click the microphone to record; the speech is transcribed automatically
Type Responses: Real-time translation as you type
Play Translation: Click speaker icon for TTS playback
Canned Responses: Use quick responses for common phrases
Export: Save conversation transcript

RAG Document Search

Query your document database with hybrid search:

Navigate to the RAG tab
Enter your search query (medical abbreviations are automatically expanded)
Results rendered in markdown with source attribution
Copy responses with the copy button
Follow-up queries maintain conversation context
Click Knowledge Graph to visualize entity relationships
Upload clinical guidelines via the Guidelines tab for searchable reference
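Abbreviation expansion can be pictured as generating query variants before retrieval, so that a search for "HTN" also matches documents that only say "hypertension". The three-entry dictionary and `expand_query` are illustrative assumptions; the app's own expander (`query_expander.py`) and its vocabulary are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS: Dict[str, List[str]] = {
    "htn": ["hypertension", "high blood pressure"],
    "copd": ["chronic obstructive pulmonary disease"],
    "mi": ["myocardial infarction", "heart attack"],
}


def expand_query(query: str) -> List[str]:
    """Return the original query plus one variant per known abbreviation expansion."""
    variants = [query]
    for token in re.findall(r"[A-Za-z]+", query):
        for expansion in ABBREVIATIONS.get(token.lower(), []):
            # Replace whole-word occurrences only, case-insensitively.
            variant = re.sub(rf"\b{re.escape(token)}\b", expansion, query, flags=re.IGNORECASE)
            if variant not in variants:
                variants.append(variant)
    return variants


print(expand_query("HTN management"))
```

Each variant can then be run through the vector and keyword retrievers and the results merged, which trades a few extra queries for better recall.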

Configuration

Unified Preferences (Ctrl+,)

Access all settings through the comprehensive Preferences dialog:

Tab	Settings
API Keys	All LLM keys (OpenAI, Anthropic, Gemini, Groq, Cerebras) and STT keys (Deepgram, ElevenLabs, Groq, Modulate)
Audio & STT	Provider settings (ElevenLabs, Deepgram, Groq, Modulate), TTS voice selection, audio quality
AI Models	Temperature settings per task, model selection, translation provider configuration
Prompts	Customize Refine, Improve, SOAP, Referral, and Advanced Analysis prompts
Storage	Default folder, Custom Vocabulary, Address Book management, Prefix Audio
General	Quick Continue Mode, Theme selection, Sidebar preferences, Keyboard shortcuts

Settings Menu Structure

```
Settings
├── Preferences...              [Ctrl+,]
├── ─────────────
├── Update API Keys             [Quick access]
├── ─────────────
├── Audio & Transcription ▸
│   ├── ElevenLabs Settings
│   ├── Deepgram Settings
│   ├── Groq Settings
│   ├── Modulate Settings
│   └── TTS Settings
├── AI & Models ▸
│   ├── Temperature Settings
│   ├── Agent Settings
│   ├── Translation Settings
│   └── MCP Tools
├── Prompt Settings ▸
│   ├── Refine Prompt Settings
│   ├── Improve Prompt Settings
│   ├── SOAP Note Settings
│   ├── Referral Settings
│   └── Advanced Analysis Settings
├── Data & Storage ▸
│   ├── Custom Vocabulary
│   ├── Manage Address Book...
│   ├── Import Contacts from CSV...
│   ├── Set Storage Folder
│   └── Record Prefix Audio
├── ─────────────
├── Export Prompts
├── Import Prompts
├── ─────────────
├── Quick Continue Mode         [Toggle]
└── Toggle Theme                [Alt+T]
```

Keyboard Shortcuts

Shortcut	Action
`Ctrl+,` / `Cmd+,`	Open Preferences dialog
`Ctrl+/` / `Cmd+/`	Focus chat input
`Ctrl+N` / `Cmd+N`	New session
`Ctrl+S` / `Cmd+S`	Save
`Ctrl+Z` / `Cmd+Z`	Undo
`Ctrl+Y` / `Cmd+Shift+Z`	Redo
`Ctrl+E` / `Cmd+E`	Export as PDF
`Ctrl+D` / `Cmd+D`	Run Diagnostic Analysis
`Alt+T`	Toggle dark/light theme

See SHORTCUTS.md for the complete list.

Security

API Key Protection

Fernet Encryption: API keys encrypted at rest using cryptography library with PBKDF2 (100K iterations)
Per-Installation Salt: Unique 256-bit salt per installation (stored in salt.bin)
Machine-Specific Keys: Encryption keys derived from machine identifiers (machine-id, filesystem UUID)
Legacy Migration: Automatic migration from old static salt to per-install salt with version tracking
No Plaintext Storage: Keys are never stored in plaintext on disk
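Fernet expects a URL-safe base64 encoding of a 32-byte key, so a PBKDF2 derivation of the kind described here can be sketched with the standard library alone. The choice of SHA-256, the `machine_secret` input (standing in for whatever machine identifiers the app combines), and the function name are assumptions; only the 100K iterations, the 256-bit salt, and the `salt.bin` file come from this section.

```python
import base64
import hashlib
import secrets


def derive_fernet_key(machine_secret: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a Fernet-compatible key (URL-safe base64 of 32 bytes) with PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", machine_secret, salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


# A fresh 256-bit salt, generated once per installation and persisted (the app uses salt.bin).
salt = secrets.token_bytes(32)
key = derive_fernet_key(b"hypothetical-machine-id", salt)

# With the third-party `cryptography` package installed, the key plugs straight into Fernet:
#     from cryptography.fernet import Fernet
#     token = Fernet(key).encrypt(b"sk-...")
#     api_key = Fernet(key).decrypt(token)
```

Because the key depends on both the salt and machine-specific input, copying the encrypted key file to another machine does not by itself reveal the API keys.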

Security Features

Rate Limiting: Per-provider rate limiting with configurable limits (e.g., 60 calls/minute for Anthropic)
Input Sanitization: Prompt injection detection with optional strict mode that rejects dangerous content
API Key Validation: Format validation with regex patterns for known provider key formats (OpenAI, Anthropic, Groq, Cerebras, Gemini)
PHI Redaction: 60+ sensitive field types automatically redacted in application logs
Audit Logging: Append-only audit trail tracking sensitive operations (API key access, data exports)
Database Protection: POSIX 0600 permissions and Windows ACL enforcement on database files
Path Traversal Protection: File paths validated after resolution to prevent encoded traversal attacks
Secure HTTP: Explicit TLS verification on all HTTPS clients via centralized client manager
Limited Data Transmission: Patient data is sent only to the AI providers you configure
Local Processing Options: Use Ollama + local Whisper for completely offline operation

Best Practices

Use the in-app API key dialog (encrypted) rather than .env files
Keep API keys confidential and rotate them periodically
Use local Whisper or Ollama for sensitive data when possible
Review provider privacy policies for HIPAA compliance requirements
Monitor the audit log (audit.log) for unexpected access patterns

Healthcare Standards

FHIR R4 Support

Medical Assistant supports HL7 FHIR R4 for healthcare interoperability:

Export to FHIR: File → Export → Export as FHIR...
Clipboard Export: File → Export → Export FHIR to Clipboard
Supported Resources:
- Patient
- Encounter
- Condition (diagnoses)
- Observation (vitals, labs)
- MedicationStatement
- DocumentReference (SOAP notes)
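To show what these resources look like on the wire, this sketch builds a minimal FHIR R4 `collection` Bundle containing a Patient and a Condition as plain JSON. `build_fhir_bundle` is a hypothetical helper, not the app's `fhir_exporter.py`. It uses the WHO ICD-10 system URI `http://hl7.org/fhir/sid/icd-10` (ICD-10-CM codes would use `http://hl7.org/fhir/sid/icd-10-cm`) and links the resources with `urn:uuid:` references.

```python
import json
import uuid
from typing import Any, Dict


def build_fhir_bundle(name: str, birth_date: str, icd10_code: str, icd10_display: str) -> Dict[str, Any]:
    patient_id = str(uuid.uuid4())
    condition_id = str(uuid.uuid4())
    patient = {
        "resourceType": "Patient",
        "id": patient_id,
        "name": [{"text": name}],
        "birthDate": birth_date,  # FHIR date format: YYYY-MM-DD
    }
    condition = {
        "resourceType": "Condition",
        "id": condition_id,
        "clinicalStatus": {
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                "code": "active",
            }]
        },
        "code": {
            "coding": [{
                "system": "http://hl7.org/fhir/sid/icd-10",
                "code": icd10_code,
                "display": icd10_display,
            }]
        },
        # Condition.subject is required in R4; point it at the Patient entry's fullUrl.
        "subject": {"reference": f"urn:uuid:{patient_id}"},
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [
            {"fullUrl": f"urn:uuid:{resource['id']}", "resource": resource}
            for resource in (patient, condition)
        ],
    }


bundle = build_fhir_bundle("Jane Doe", "1980-04-02", "I10", "Essential (primary) hypertension")
print(json.dumps(bundle, indent=2))
```

A real exporter would also validate the output against the R4 definitions; the app lists `fhir.resources` as an optional dependency for that kind of work.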

ICD Code Support

ICD-9: Legacy code support for historical records
ICD-10: Current standard with automatic suggestions
Dual Coding: Generate both ICD-9 and ICD-10 codes simultaneously
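A first line of defense for suggested codes is a format check. The regular expressions here are a simplified sketch (assumed, not taken from the app): they accept ICD-10 codes such as `I10` or `E11.9` and ICD-9 codes such as `250.00`, `V70.0`, or `E880.9`. A well-formed code can still be absent from the official code set, and some strings (for example `V12`) fit both formats, so a format check complements, rather than replaces, a lookup against the real code tables.

```python
import re

ICD10_FORMAT = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")
ICD9_FORMAT = re.compile(r"\d{3}(\.\d{1,2})?|V\d{2}(\.\d{1,2})?|E\d{3}(\.\d)?")


def _normalize(code: str) -> str:
    return code.strip().upper()


def looks_like_icd10(code: str) -> bool:
    """True if `code` has ICD-10 shape: letter, digit, alphanumeric, optional .suffix."""
    return ICD10_FORMAT.fullmatch(_normalize(code)) is not None


def looks_like_icd9(code: str) -> bool:
    """True if `code` has ICD-9 shape: numeric, V-code, or E-code."""
    return ICD9_FORMAT.fullmatch(_normalize(code)) is not None


for sample in ("I10", "e11.9", "250.00", "V12"):
    print(sample, looks_like_icd10(sample), looks_like_icd9(sample))
```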

Architecture

Project Structure

```
Medical-Assistant/
├── main.py                    # Application entry point
├── src/                       # ~150K LOC across 420+ modules
│   ├── ai/                    # AI providers and processors
│   │   ├── agents/            # 8 specialized AI agents
│   │   │   ├── base.py        # Base agent with caching, validation
│   │   │   ├── medication.py  # Drug interactions, dosing
│   │   │   ├── diagnostic.py  # Differential diagnosis
│   │   │   ├── compliance.py  # Documentation audit
│   │   │   ├── workflow.py    # Clinical workflows
│   │   │   ├── chat.py        # Conversational with tool use
│   │   │   ├── synopsis.py    # SOAP note summarization
│   │   │   ├── referral.py    # Referral letter generation
│   │   │   ├── data_extraction.py # Structured clinical data extraction
│   │   │   └── models.py      # AgentConfig, AgentTask, AgentResponse
│   │   ├── providers/         # Modular AI provider implementations
│   │   │   ├── openai_provider.py
│   │   │   ├── anthropic_provider.py
│   │   │   ├── gemini_provider.py
│   │   │   ├── groq_provider.py
│   │   │   ├── cerebras_provider.py
│   │   │   ├── ollama_provider.py
│   │   │   └── router.py      # Intelligent provider routing
│   │   ├── ai_processor.py    # Core AI processing logic
│   │   ├── chat_processor.py  # Chat with TypedDict context
│   │   ├── rag_processor.py   # RAG facade (4 mixins)
│   │   ├── rag_query.py       # RagQueryMixin
│   │   ├── rag_response.py    # RagResponseMixin
│   │   ├── rag_ui.py          # RagUIMixin
│   │   └── rag_feedback.py    # RagFeedbackMixin
│   ├── audio/                 # Audio recording and processing
│   │   ├── audio.py           # AudioHandler facade (5 mixins)
│   │   ├── mixins/            # Decomposed audio functionality
│   │   │   ├── transcription_mixin.py
│   │   │   ├── recording_mixin.py
│   │   │   ├── processing_mixin.py
│   │   │   ├── device_mixin.py
│   │   │   └── file_mixin.py
│   │   ├── recording_manager.py
│   │   └── periodic_analysis.py
│   ├── core/                  # Application core
│   │   ├── app.py             # Main application class
│   │   ├── interfaces.py      # Runtime-checkable Protocol definitions
│   │   ├── service_registry.py # Dependency injection registry
│   │   ├── protocols.py       # AppProtocol for mixin contracts
│   │   ├── app_initializer.py
│   │   ├── controllers/       # 5 controllers (processing, recording, persistence, config, window)
│   │   ├── handlers/          # 4 handlers (finalization, pause/resume, periodic analysis, recovery)
│   │   ├── env_schema.py      # 35 env vars documented
│   │   └── config.py
│   ├── database/              # Data persistence
│   │   ├── database.py        # Database with file-level security
│   │   ├── db_migrations.py   # 17 versioned schema migrations
│   │   ├── db_pool.py         # Connection pooling with health checks
│   │   └── mixins/            # Query mixins (recordings, queue, diagnostics)
│   ├── exporters/             # Document export
│   │   ├── fhir_exporter.py   # HL7 FHIR R4
│   │   ├── docx_exporter.py   # Word documents
│   │   └── rag_exporter.py    # RAG document export
│   ├── managers/              # 15 singleton managers
│   │   ├── agent_manager.py
│   │   ├── api_key_manager.py
│   │   ├── notification_manager.py
│   │   ├── vocabulary_manager.py
│   │   ├── tts_manager.py
│   │   ├── autosave_manager.py
│   │   ├── translation_manager.py
│   │   └── ...                # + file, log, data_folder, recipient, RAG managers
│   ├── processing/            # Document processing
│   │   ├── processing_queue.py # Queue facade (7 mixins)
│   │   ├── queue_types.py     # TypedDict task definitions
│   │   ├── task_executor_mixin.py
│   │   ├── task_lifecycle_mixin.py
│   │   ├── notification_mixin.py
│   │   ├── batch_processing_mixin.py
│   │   ├── reprocessing_mixin.py
│   │   ├── document_generation_mixin.py
│   │   ├── guidelines_processing_mixin.py
│   │   └── generators/        # 12 generators (SOAP, referral, letter, diagnostic, medication, compliance, etc.)
│   ├── rag/                   # RAG subsystem (46 modules)
│   │   ├── hybrid_retriever.py
│   │   ├── streaming_retriever.py
│   │   ├── query_expander.py  # Medical term expansion
│   │   ├── bm25_search.py     # Full-text keyword search
│   │   ├── adaptive_threshold.py
│   │   ├── mmr_reranker.py    # Diversity reranking
│   │   ├── conversation_manager.py
│   │   ├── graph_data_provider.py  # Neo4j knowledge graph
│   │   ├── guidelines_upload_manager.py
│   │   └── neon_vector_store.py
│   ├── stt_providers/         # Speech-to-text (5 providers + failover)
│   │   ├── base.py
│   │   ├── deepgram.py        # Nova-2 Medical
│   │   ├── elevenlabs.py      # Scribe v2, diarization
│   │   ├── groq.py            # Whisper Large v3 Turbo
│   │   ├── modulate.py        # Velma with emotion detection
│   │   ├── whisper.py         # Local Whisper
│   │   └── failover.py        # Automatic provider failover
│   ├── tts_providers/         # Text-to-speech
│   │   ├── elevenlabs_tts.py  # Flash v2.5, Turbo v2.5, Multilingual v2
│   │   └── pyttsx_provider.py # Offline fallback
│   ├── translation/           # Translation providers
│   │   └── deep_translator_provider.py  # Google, DeepL, Microsoft
│   ├── ui/                    # User interface
│   │   ├── workflow_ui.py     # Main UI orchestration
│   │   ├── chat_ui.py         # Chat interface
│   │   ├── menu_manager.py    # Application menus
│   │   ├── theme_manager.py   # Dark/light themes
│   │   ├── dialogs/           # 80+ dialog windows
│   │   └── components/        # Reusable UI components
│   └── utils/                 # Utilities
│       ├── security.py        # SecurityManager facade
│       ├── security/          # Encryption, key storage, validators, rate limiting
│       ├── resilience.py      # Circuit breaker, retry, backoff
│       ├── validation.py      # API key patterns, input validation
│       ├── audit_logger.py    # HIPAA-compliant audit trail
│       └── structured_logging.py  # PHI redaction in logs
├── config/                    # Configuration files
├── tests/                     # 4,100+ tests
│   ├── unit/                  # Component tests
│   └── integration/           # End-to-end tests
└── scripts/                   # Build scripts (Windows, macOS, Linux)
```

Key Design Patterns

Pattern	Usage
Mixin/Facade	Large classes decomposed into focused mixins; facades preserve backward compatibility
Protocol Contracts	`AppProtocol` and runtime-checkable interfaces define the contracts mixins and controllers expect
TypedDict Schemas	`ProcessingTask`, `ChatContextData`, `BatchTaskStatus` etc. for type-safe dict structures
Provider Pattern	All AI, STT, and TTS providers inherit from base classes for consistent interfaces
Singleton Managers	Agent, translation, and API key managers ensure single instances
Circuit Breaker	Resilient API calls with automatic failure detection and recovery
Security Decorators	Rate limiting and input sanitization applied via decorators
Migration System	Database schema evolution with versioned migrations
Observer Pattern	UI updates via event-driven architecture with thread-safe scheduling
Service Registry	Dependency injection via `ServiceRegistry` decouples controllers from the main app class
Controller Pattern	5 controllers (processing, recording, persistence, config, window) encapsulate domain logic
Queue System	Background processing with priority, stale task eviction, and batch tracking
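The Protocol-contract pattern can be sketched with `typing.Protocol`: a mixin declares what it needs from its host class, and `@runtime_checkable` allows an `isinstance` check at the boundary. `StatusReporter`, `ExportMixin`, and `DemoApp` are hypothetical stand-ins for `AppProtocol` and the real mixins, whose members are not listed in this README. A runtime check only verifies that the named members exist, not their signatures; full checking comes from a static type checker such as mypy.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class StatusReporter(Protocol):
    """The slice of the host application this mixin relies on."""

    def update_status(self, message: str) -> None: ...


class ExportMixin:
    """Mixin that only assumes its host satisfies StatusReporter."""

    def export_note(self: StatusReporter, text: str) -> str:
        self.update_status("Exporting note...")
        return f"exported {len(text)} characters"  # stand-in for real export work


class DemoApp(ExportMixin):
    """Host class: inherits the mixin and provides the contract's members."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def update_status(self, message: str) -> None:
        self.messages.append(message)


app = DemoApp()
assert isinstance(app, StatusReporter)  # member-presence check only
print(app.export_note("S: cough x3 days"))
```

Keeping each mixin's dependencies in a small Protocol makes the contract explicit and lets tests supply a lightweight fake host instead of the full application class.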

Data Flow

```
Audio Input → STT Provider (failover chain) → Transcript → AI Processing → Document Generation
                  ↓                                              ↓
           Emotion Data*                                  Agent System (8 agents)
                  ↓                                              ↓
           SOAP Integration                         Database Storage → Export (PDF/DOCX/FHIR)
                                                         ↓
                                                  RAG Vector Store → Knowledge Graph (Neo4j)
                                                         ↓
                                                  Hybrid Search (vector + BM25 + graph)
```

* Voice emotion analysis available with Modulate (Velma) STT provider
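The STT failover chain at the start of this flow can be sketched as trying providers in priority order until one returns a non-empty transcript. The provider order, the `(name, callable)` pairs, and `AllProvidersFailed` are assumptions for illustration, not the interface of `stt_providers/failover.py`.

```python
from typing import Callable, List, Sequence, Tuple

Transcriber = Callable[[bytes], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain fails or returns nothing."""


def transcribe_with_failover(
    audio: bytes, providers: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each (name, transcribe) pair in order; return (provider_name, transcript)."""
    errors: List[str] = []
    for name, transcribe in providers:
        try:
            text = transcribe(audio)
        except Exception as exc:  # network errors, timeouts, auth failures, ...
            errors.append(f"{name}: {exc!r}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty transcript")  # treat empty output as a failure too
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")


if __name__ == "__main__":
    def flaky_cloud_stt(audio: bytes) -> str:
        raise TimeoutError("request timed out")

    def local_whisper_stub(audio: bytes) -> str:
        return "Patient reports intermittent chest pain."

    chain = [("deepgram", flaky_cloud_stt), ("local-whisper", local_whisper_stub)]
    print(transcribe_with_failover(b"<wav bytes>", chain))
```

Returning the provider name alongside the transcript lets the caller record which service produced the text, which matters when only some providers (such as Modulate) supply emotion data.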

Testing

Running Tests

```
# Install test dependencies
pip install -r requirements-dev.txt

# Run all tests with pytest
PYTHONPATH=src pytest tests/unit/ tests/integration/

# Run with coverage report
PYTHONPATH=src pytest --cov=src --cov-report=html

# Run specific test suites
PYTHONPATH=src pytest tests/unit/
PYTHONPATH=src pytest tests/integration/

# Run specific test files
PYTHONPATH=src pytest tests/unit/test_audio_extended.py
PYTHONPATH=src pytest tests/unit/test_processing_queue.py
PYTHONPATH=src pytest tests/unit/test_stt_providers/
```

Test Suite (4,100+ tests)

Suite	Tests	Coverage
Validation & Config	300+	Input validation, settings roundtrip, configuration
Exporters	137	PDF, DOCX, FHIR R4, RAG export
Error Handling	127	Structured errors, recovery, logging
AI & Chat	150+	Chat processor, base agent, medication prompts
Audio & Recording	100+	Audio handler, prefix caching, mixin decomposition
STT Providers	150+	Deepgram, ElevenLabs, Groq, Modulate, Whisper, failover
Processing Queue	90+	Task lifecycle, batch processing, stale eviction, thread safety
RAG & Documents	140+	Document CRUD, hybrid search, query expansion, RAG processor
Security	50+	Encryption, key migration, validation, rate limiting
Differential & NER	170+	Differential tracker, medical NER, state machine
Letter Generation	50	All letter types, edge cases, template rendering
Periodic Analysis	57	Timer management, segment extraction, cleanup
TTS & Translation	115+	Provider management, safe methods, fallbacks, sessions
Structured Logging	79	PHI redaction, log formatting
Integration	29	Settings roundtrip, API key crypto, DB migrations

CI/CD

The project uses GitHub Actions for continuous integration:

Tests Workflow: Runs on every push and PR across Windows, macOS, and Linux
Build Workflow: Builds executables for all platforms
CodeQL: Security scanning for vulnerabilities

Troubleshooting

Common Issues

Issue	Solution
API Connection Errors	Verify API keys in Settings → API Keys. Check internet connection.
Audio Not Recording	Check microphone permissions. Verify FFmpeg installation. Select correct input device.
Transcription Errors	Try a different STT provider. Check audio quality. Ensure API key is valid.
Only One Speaker Detected	In ElevenLabs Settings, leave "Number of Speakers" empty for auto-detection. Lower the "Diarization Threshold" (e.g. 0.3) for more sensitive speaker separation.
Ollama Timeouts	Use smaller model variants. Check system resources. Increase timeout in settings.
Queue Stuck	Check logs via Help → View Logs. Restart application if needed.
Theme Not Changing	Restart application after theme change for full effect.
Export Failures	Check write permissions for output directory. Ensure sufficient disk space.

Getting Help

Application Logs: Help → View Logs → View Log Contents
Log Location: Help → View Logs → Open Logs Folder
GitHub Issues: Report bugs or request features

Debug Mode

For detailed logging, set the environment variable:

```
export LOG_LEVEL=DEBUG  # Linux/macOS
set LOG_LEVEL=DEBUG     # Windows
```
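As an illustration of how an application typically honors such a variable, here is a minimal sketch using the standard `logging` module. `configure_logging` is hypothetical; the app's own logging setup (`structured_logging.py`, with PHI redaction) is more involved, and the fallback to `INFO` for unknown values is an assumption.

```python
import logging
import os


def configure_logging() -> int:
    """Read LOG_LEVEL (case-insensitive) and configure the root logger; default INFO."""
    name = os.environ.get("LOG_LEVEL", "INFO").strip().upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unknown names such as "VERBOSE" fall back to INFO
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return level
```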

System Requirements

Requirement	Minimum	Recommended
Operating System	Windows 10, macOS 10.14, Ubuntu 20.04	Windows 11, macOS 13+, Ubuntu 22.04
Python	3.10	3.11+
Memory	4GB RAM	8GB+ RAM (16GB with local Whisper)
Storage	500MB	2GB+ for recordings and RAG database
Internet	Required for cloud AI	Optional with Ollama + local Whisper
Audio	Any microphone	USB condenser microphone
Optional	-	PostgreSQL (Neon) for RAG, Neo4j for knowledge graph

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes with clear commit messages
Run tests: pytest
Push to your fork: git push origin feature/amazing-feature
Submit a Pull Request

Development Setup

```
# Clone your fork
git clone https://github.com/YOUR_USERNAME/Medical-Assistant.git
cd Medical-Assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run tests to verify setup
pytest
```

Documentation

Keyboard Shortcuts
Desktop Shortcuts - Creating application shortcuts
Security Features
Agent System
Medication Agent
Testing Guide
CLAUDE.md - Development guide for AI assistants

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgments

OpenAI - GPT models and Whisper
Anthropic - Claude models
Google AI - Gemini models
Deepgram - Nova-2 Medical STT
ElevenLabs - Scribe STT and TTS
Groq - Fast LLM and Whisper inference
Cerebras - Wafer-scale LLM inference
Modulate.ai - Velma voice emotion detection
Ollama - Local model hosting
Neon - Serverless PostgreSQL with pgvector
Neo4j - Knowledge graph database
ttkbootstrap - Modern UI themes

Made with care for healthcare professionals
