A privacy-conscious, on-device-first AI voice assistant that lives in a floating bubble and actually gets things done.
Built by Brian Njuguna Macharia · GitHub
Kenji is a wake-word-activated AI assistant for Android that goes beyond simple Q&A. Say "Hey Joe" (or your own custom wake word) and Kenji appears as a floating, glassmorphic bubble that listens, thinks, and acts: it makes calls, sends WhatsApp messages, controls phone settings, reads the news, navigates you somewhere, and performs dozens of other real actions on your device, not just chat responses.
Unlike most voice assistants that route every request through a cloud LLM, Kenji's command understanding runs on-device using a semantic intent classifier (ONNX Runtime + all-MiniLM-L6-v2). Cloud AI (Gemini / Pollinations) is only ever invoked for genuine factual questions and open conversation, never for deciding what action to take. This makes Kenji faster, more predictable, and far less prone to misfiring on simple commands.
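A minimal sketch of how embedding-based intent matching of this kind can work, assuming the sentence embeddings have already been produced by the model. The names `IntentExample` and `classify` and the 0.6 threshold are illustrative assumptions, not Kenji's actual code:

```kotlin
import kotlin.math.sqrt

// Hypothetical type for illustration; not taken from Kenji's source.
class IntentExample(val intent: String, val embedding: FloatArray)

// Cosine similarity between two vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding sizes differ" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denom = sqrt(normA) * sqrt(normB)
    return if (denom == 0f) 0f else dot / denom
}

// Returns the best-matching intent, or null when nothing clears the threshold.
// The default threshold is an assumed value and would need tuning on real data.
fun classify(
    query: FloatArray,
    examples: List<IntentExample>,
    threshold: Float = 0.6f
): String? {
    val best = examples.maxByOrNull { cosine(query, it.embedding) } ?: return null
    return if (cosine(query, best.embedding) >= threshold) best.intent else null
}
```

Returning `null` below the threshold is what lets a caller distinguish a confident match from "matches nothing confidently" without involving a cloud model.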
| | Typical voice assistant | Kenji |
|---|---|---|
| Command understanding | Cloud LLM guesses every time | On-device semantic classifier decides, instantly |
| Adding a new command | Requires app update / retraining | Add one line to a registry + re-embed (no retraining) |
| Unsupported request | Vague "I can't help with that" | Tells you it understood but isn't built yet, and logs it for the developer |
| Conversation UI | Full-screen takeover | Lightweight floating bubble, stays out of your way |
| Wake-word follow-ups | Often loses context | Remembers session context: "call her" after "message mum" just works |
This project is licensed under the PolyForm Noncommercial License 1.0.0.
You're free to use, study, modify, and contribute to this project for any non-commercial purpose: personal projects, learning, research, or collaboration. Commercial use, resale, or redistribution as part of a paid product requires written permission from the copyright holder.
See the LICENSE file for full terms.
- Phone calls, WhatsApp messages, SMS, by name or number
- Smart reply drafting: AI drafts a reply to your last received message; you confirm or edit before sending
- Scheduled messaging: "WhatsApp John in 2 hours saying I'll be late"
- Scheduled email
- Broadcast messaging to multiple contacts at once
- Open any app by voice (WhatsApp, Facebook, Instagram, Maps, Spotify, and more)
- Post directly to Facebook / Instagram / Twitter
- WiFi, Bluetooth, flashlight, airplane mode, hotspot, DND, brightness, volume, all by voice
- Screenshot, lock screen, go back/home, close all apps
- Read what's on screen aloud (accessibility-powered)
- Take photos/selfies, record video, fully hands-free, including a "cheese" trigger while the camera is open
- Voice-controlled audio recording with a live waveform + stop button in the bubble
- Play music, control playback
- OCR text scanning with live camera preview, multi-language translation, and save-to-notes
- Turn-by-turn navigation via Google Maps
- Weather (current + forecast) via OpenWeatherMap
- News headlines with an auto-scrolling, clickable card carousel synced to speech
- Wikipedia-first factual answers (instant, no hallucination), falling back to Pollinations, then Gemini
- Calculations, translations, currency conversion, contact lookup
- Morning digest: weather + calendar + greeting in one briefing
- Calendar reading & meeting prep: AI-generated briefs for upcoming meetings
- Task list management: add, complete, and review to-dos by voice
- Expense logger: log spending by voice, get daily/total summaries
- SOS alert: sends your GPS location to a chosen contact in an emergency
- Driving mode: hands-free notification reading and auto-replies
- Focus mode: timed distraction-free sessions with auto-silence
- Goodnight routine: silences the phone, checks tomorrow's calendar, sets the mood for sleep
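The scheduled messaging command in the list above ("WhatsApp John in 2 hours saying I'll be late") has to be split into a channel, a contact, a delay, and a body before anything can be scheduled. One way to sketch that extraction is a single regular expression. The `ScheduledMessage` type, the `parseScheduled` function, and the accepted phrasing are assumptions for illustration; Kenji's own entity extraction may work differently:

```kotlin
// Hypothetical parsed form of a scheduled-message command.
data class ScheduledMessage(
    val channel: String,
    val contact: String,
    val delayMinutes: Long,
    val body: String
)

// Accepts phrasing like "<channel> <contact> in <n> minutes|hours saying <body>".
private val scheduledPattern = Regex(
    """(whatsapp|sms|text)\s+(.+?)\s+in\s+(\d+)\s+(minute|hour)s?\s+saying\s+(.+)""",
    RegexOption.IGNORE_CASE
)

// Returns null when the command does not fit the assumed phrasing.
fun parseScheduled(command: String): ScheduledMessage? {
    val match = scheduledPattern.matchEntire(command.trim()) ?: return null
    val (channel, contact, amount, unit, body) = match.destructured
    val count = amount.toLongOrNull() ?: return null
    val minutes = count * (if (unit.lowercase() == "hour") 60L else 1L)
    return ScheduledMessage(channel.lowercase(), contact, minutes, body)
}
```

A real implementation would also need to handle absolute times ("at 5pm") and looser word order, which a single pattern like this does not cover.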
Kenji remembers what just happened. After "take a selfie," just say "cheese." After "message mum," say "call her" and it resolves the pronoun correctly. Sessions expire automatically after a few minutes of inactivity.
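A rough sketch of this kind of session memory, with pronoun resolution and an inactivity timeout. The class name, the pronoun set, and the three-minute default are assumptions chosen for illustration, not values taken from Kenji's source:

```kotlin
// Hypothetical session memory; the timeout default is an assumed value.
class SessionContext(private val timeoutMs: Long = 3 * 60 * 1000L) {
    private var lastContact: String? = null
    private var lastUpdatedMs: Long = 0L

    // Record the contact the user just interacted with.
    fun rememberContact(name: String, nowMs: Long) {
        lastContact = name
        lastUpdatedMs = nowMs
    }

    // Resolve "her", "him", or "them" to the remembered contact.
    // Any other target passes through unchanged; an expired or empty
    // session yields null so the caller can ask who was meant.
    fun resolveTarget(spoken: String, nowMs: Long): String? {
        val pronouns = setOf("her", "him", "them")
        if (spoken.lowercase() !in pronouns) return spoken
        val expired = nowMs - lastUpdatedMs > timeoutMs
        return if (expired) null else lastContact
    }
}
```

Passing the clock in as `nowMs` keeps the expiry logic easy to test; on a device the caller could supply `System.currentTimeMillis()`.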
                        ┌─────────────────────┐
        Voice input ──▶ │ Keyword pre-filter  │ ── strong match ──▶ Execute (instant)
                        └─────────────────────┘
                                   │ no match
                                   ▼
                  ┌─────────────────────────────────┐
                  │    ONNX Semantic Classifier     │
                  │  (all-MiniLM-L6-v2, on-device)  │
                  └─────────────────────────────────┘
                                   │
             ┌─────────────────────┼─────────────────────┬───────────────────────┐
             ▼                     ▼                     ▼                       ▼
     Confident match      Matches a known        Matches nothing         Genuine question
     to a real feature    UNSUPPORTED            confidently             ("what is...",
             │            feature pattern                │                "who is...")
             ▼                     ▼                     ▼                       ▼
     Execute directly     "I understood, but     "I don't have the           Wikipedia
     (no AI involved)     I'm not programmed     capability to                   │
                          with that feature      understand that"         fails? ▼
                          yet" + logged to               │                 Pollinations
                          a Google Doc for          logged too                   │
                          the developer to                                fails? ▼
                          review                                              Gemini
This two-tier "missing feature" detection means Kenji can tell the difference between "I genuinely didn't understand that" and "I understood exactly what you want, I just haven't built it yet". It logs both cases to a Google Doc so the developer knows exactly what to build next.
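The four branches and the Wikipedia, Pollinations, Gemini fallback order can be modelled roughly as follows. This is a sketch under assumptions: the `Outcome` type, `respond`, and `answer` are hypothetical names, and the answer sources are plain functions standing in for real network clients:

```kotlin
// Hypothetical outcome model mirroring the four classifier branches.
sealed class Outcome {
    data class Execute(val intent: String) : Outcome()
    data class Unsupported(val feature: String) : Outcome()
    object NotUnderstood : Outcome()
    data class Question(val text: String) : Outcome()
}

// Try each answer source in order and return the first non-null answer.
// A source returns null to signal failure, so later sources are only
// called when earlier ones fail.
fun answer(question: String, sources: List<(String) -> String?>): String? =
    sources.firstNotNullOfOrNull { source -> source(question) }

// Map an outcome to a spoken reply, plus a flag saying whether the
// request should be logged for the developer.
fun respond(
    outcome: Outcome,
    sources: List<(String) -> String?>
): Pair<String, Boolean> = when (outcome) {
    is Outcome.Execute ->
        "Executing ${outcome.intent}" to false
    is Outcome.Unsupported ->
        "I understood, but I'm not programmed with that feature yet" to true
    Outcome.NotUnderstood ->
        "I don't have the capability to understand that" to true
    is Outcome.Question ->
        (answer(outcome.text, sources) ?: "I couldn't find an answer") to false
}
```

Modelling the branches as a sealed class means the compiler checks that every outcome is handled, so adding a fifth branch later cannot be silently forgotten in `respond`.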
- Language: Kotlin
- UI: Jetpack Compose, Material 3
- On-device ML: ONNX Runtime Android + sentence-transformers/all-MiniLM-L6-v2
- Cloud AI (Q&A only): Google Gemini, Pollinations AI, Mistral
- Knowledge: Wikipedia REST API
- Weather: OpenWeatherMap
- Storage: Room (scheduled tasks), SharedPreferences, local file storage
- Speech: Android SpeechRecognizer + TextToSpeech
- System integration: AccessibilityService, NotificationListenerService, CameraX, ML Kit (OCR)
- Background work: AlarmManager, Foreground Services
    com.example.assistantai/
    ├── service/
    │   ├── VoiceAssistantService.kt           ← core voice pipeline, wake word, conversation state
    │   ├── AssistantBubbleService.kt          ← floating bubble overlay window
    │   ├── AssistantAccessibilityService.kt   ← system-level actions (WhatsApp send, screenshots, etc.)
    │   ├── KenjiNotificationService.kt        ← reads incoming notifications
    │   ├── CommandPipeline.kt                 ← keyword pre-filter + entity extraction
    │   ├── AgentScheduler.kt                  ← scheduled message/email engine (Room + AlarmManager)
    │   ├── WikipediaClient.kt / WeatherClient.kt / MistralIntentRouter.kt
    │   └── MissingFeatureLogger.kt            ← logs unsupported requests to a Google Doc
    ├── ml/
    │   └── OnnxIntentClassifier.kt            ← on-device semantic intent classification
    ├── data/
    │   └── IntentRegistry.kt                  ← single source of truth for every supported intent
    ├── ui/bubble/
    │   ├── BubbleScreen.kt                    ← bubble UI (chat, carousel, thinking states)
    │   └── BubbleState.kt                     ← bubble content model
    ├── uis/
    │   └── TextRecognitionActivity.kt         ← OCR camera scanner
    ├── util/
    │   └── ShareUtils.kt                      ← share app link / APK file
    ├── MainActivity.kt                        ← dashboard, settings, status panel
    ├── SplashScreenActivity.kt                ← holographic intro animation
    └── RecordingsActivity.kt                  ← saved voice recordings browser
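To make the "single source of truth" role of `IntentRegistry.kt` concrete, here is one possible shape for such a registry. Everything in it (`IntentSpec`, `DemoIntentRegistry`, the intent ids, the example phrases) is hypothetical; the real file may be organised quite differently:

```kotlin
// Hypothetical registry entry; not Kenji's actual data model.
data class IntentSpec(
    val id: String,
    val examples: List<String>,    // phrases to embed for semantic matching
    val supported: Boolean = true  // false marks a known but unbuilt feature
)

object DemoIntentRegistry {
    val intents = listOf(
        IntentSpec("call_contact", listOf("call mum", "phone John")),
        IntentSpec("toggle_wifi", listOf("turn on wifi", "disable wifi")),
        // Adding a command would be one more line here, followed by re-embedding.
        IntentSpec("order_food", listOf("order me a pizza"), supported = false)
    )

    // Look up a single intent by id.
    fun find(id: String): IntentSpec? = intents.firstOrNull { it.id == id }

    // Every phrase that would need an embedding, paired with its intent id.
    fun phrasesToEmbed(): List<Pair<String, String>> =
        intents.flatMap { spec -> spec.examples.map { phrase -> phrase to spec.id } }
}
```

Keeping unsupported features in the same list, flagged with `supported = false`, is one way a classifier could recognise a request it understands but cannot yet carry out.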
- Android Studio (latest stable)
- Android device or emulator, API 26+
- Free API keys (all have generous free tiers, no credit card required):
| Service | Used for | Get a key |
|---|---|---|
| Google Gemini | Conversational fallback | ai.google.dev |
| OpenWeatherMap | Weather | openweathermap.org/api |
| Mistral AI | Intent routing assistance | console.mistral.ai |
- Clone the repo
- Open in Android Studio, let Gradle sync
- Download the ONNX model assets (see APPLY_GUIDE_ONNX.md) and place them in app/src/main/assets/
- Build and run
- Open the app, go to Settings, and paste in your API keys
- Grant microphone, overlay, accessibility, and notification access when prompted
- Say your wake word and start talking to Kenji
Deploy the included Google Apps Script (GoogleAppsScript_Code.gs) to your own Google Doc to receive a log of every request Kenji couldn't fulfil, which is useful for deciding what to build next. See APPLY_GUIDE_V2.md for the 5-minute deployment steps.
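On the app side, sending a log entry to a deployed script could look roughly like the sketch below. The JSON field names (`request`, `kind`) and the function names are assumptions, since the schema expected by GoogleAppsScript_Code.gs is not described here; only the `java.net.HttpURLConnection` calls are standard API:

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Escape the characters most likely to break a JSON string literal.
// This is deliberately minimal and does not cover every control character.
fun jsonEscape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

// Field names are illustrative, not the script's real schema.
fun buildLogPayload(request: String, kind: String): String =
    "{\"request\":\"" + jsonEscape(request) + "\",\"kind\":\"" + jsonEscape(kind) + "\"}"

// POST the payload to a deployed web-app URL and return the HTTP status code.
// Must be called off the main thread on Android.
fun postLog(webAppUrl: String, payload: String): Int {
    val conn = URL(webAppUrl).openConnection() as HttpURLConnection
    try {
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.setRequestProperty("Content-Type", "application/json")
        conn.outputStream.use { it.write(payload.toByteArray(Charsets.UTF_8)) }
        return conn.responseCode
    } finally {
        conn.disconnect()
    }
}
```

In a production app a JSON library would be a safer choice than hand-built strings; the manual version is used here only to keep the sketch dependency-free.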
| Permission | Why |
|---|---|
| Microphone | Wake word detection and speech recognition |
| Overlay (Draw over other apps) | The floating bubble UI |
| Accessibility Service | Sending WhatsApp messages, reading screen content, system navigation |
| Notification access | Reading incoming WhatsApp/SMS for smart replies |
| Contacts | Resolving names to phone numbers |
| Camera | Photos, selfies, video, OCR scanning |
| Location | Weather, navigation, SOS alerts |
| SMS | Sending text messages |
| Exact alarm | Scheduled messages firing on time |
Kenji requests each permission only when the relevant feature is first used, and every permission gap is shown, with a one-tap fix, in the dashboard's status panel.
From the dashboard, you can share Kenji with others in two ways:
- Share Link: sends a message with the GitHub download link via any messaging app
- Share APK File: sends the actual installable app file directly (useful when the recipient has no internet access)
- Domain-adaptive fine-tuning of the intent classifier using real usage data
- Expand the agentic feature set based on the missing-feature log
- Wear OS companion
- Multi-language wake word support
This project is currently unpublished and shared for personal/portfolio use. Contact the developer for licensing inquiries.
Developed by Brian Njuguna Macharia · GitHub: github.com/001kenji
Built with Kotlin, Jetpack Compose, ONNX Runtime, and a genuine attempt to make a voice assistant that actually does things instead of just talking about them.