Kenji — AI Voice Assistant for Android

A privacy-conscious, on-device-first AI voice assistant that lives in a floating bubble and actually gets things done.

Built by Brian Njuguna Macharia · GitHub

What is Kenji?

Kenji is a wake-word-activated AI assistant for Android that goes beyond simple Q&A. Say "Hey Joe" (or your own custom wake word) and Kenji appears as a floating, glassmorphic bubble that listens, thinks, and acts — making calls, sending WhatsApp messages, controlling phone settings, reading the news, navigating you somewhere, and dozens of other real actions on your device, not just chat responses.

Unlike most voice assistants that route every request through a cloud LLM, Kenji's command understanding runs on-device using a semantic intent classifier (ONNX Runtime + all-MiniLM-L6-v2). Cloud AI (Gemini / Pollinations) is only ever invoked for genuine factual questions and open conversation — never for deciding what action to take. This makes Kenji faster, more predictable, and far less prone to misfiring on simple commands.
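
The matching step behind an embedding-based classifier can be sketched as a nearest-neighbour search by cosine similarity. This is a minimal illustration, not Kenji's actual code: it assumes the utterance has already been embedded by the model, and the names `IntentMatch`, `cosine`, and `classify`, as well as the 0.6 threshold, are hypothetical.

```kotlin
import kotlin.math.sqrt

// Hypothetical result type: the best-scoring intent and its similarity.
data class IntentMatch(val intent: String, val score: Float)

// Cosine similarity between two equal-length vectors.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if (normA == 0f || normB == 0f) return 0f
    return dot / (sqrt(normA) * sqrt(normB))
}

// Returns the closest intent, or null when nothing clears the (assumed) threshold.
fun classify(
    utterance: FloatArray,
    intentEmbeddings: Map<String, List<FloatArray>>,
    threshold: Float = 0.6f
): IntentMatch? {
    val best = intentEmbeddings
        .flatMap { (intent, examples) ->
            examples.map { IntentMatch(intent, cosine(utterance, it)) }
        }
        .maxByOrNull { it.score }
    return best?.takeIf { it.score >= threshold }
}
```

Returning `null` below the threshold is what lets a caller distinguish a confident command from something that should fall through to another tier.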

Why Kenji is different

	Typical voice assistant	Kenji
Command understanding	Cloud LLM guesses every time	On-device semantic classifier decides, instantly
Adding a new command	Requires app update / retraining	Add one line to a registry + re-embed (no retraining)
Unsupported request	Vague "I can't help with that"	Tells you it understood but isn't built yet — and logs it for the developer
Conversation UI	Full-screen takeover	Lightweight floating bubble, stays out of your way
Wake-word follow-ups	Often loses context	Remembers session context — "call her" after "message mum" just works
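
To illustrate the "one line in a registry" idea from the table, here is a hedged sketch of what a registry entry could look like. The real IntentRegistry.kt may be shaped differently; `IntentDef`, `IntentRegistrySketch`, the intent ids, and the example phrases are all assumptions made for illustration.

```kotlin
// Hypothetical shape of a registry entry; the real IntentRegistry.kt may differ.
data class IntentDef(
    val id: String,
    val examples: List<String>,
    val supported: Boolean = true
)

object IntentRegistrySketch {
    val intents: List<IntentDef> = listOf(
        IntentDef("CALL_CONTACT", listOf("call mum", "phone John")),
        IntentDef("TOGGLE_FLASHLIGHT", listOf("turn on the flashlight", "torch off")),
        // Adding a command is one more line; its examples are then re-embedded offline.
        IntentDef("SET_TIMER", listOf("set a timer for ten minutes")),
        // Known-but-unbuilt features can be listed so they are recognised and logged.
        IntentDef("ORDER_FOOD", listOf("order me a pizza"), supported = false)
    )

    // Phrases that need embeddings, each paired with its intent id.
    fun phrasesToEmbed(): List<Pair<String, String>> =
        intents.flatMap { def -> def.examples.map { it to def.id } }
}
```

Because the model itself is unchanged, only the example phrases need new embeddings when an entry is added, which is why no retraining is involved.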

License

This project is licensed under the PolyForm Noncommercial License 1.0.0.

You're free to use, study, modify, and contribute to this project for any non-commercial purpose — personal projects, learning, research, or collaboration. Commercial use, resale, or redistribution as part of a paid product requires written permission from the copyright holder.

See the LICENSE file for full terms.

Core Capabilities

🗣️ Communication

Phone calls, WhatsApp messages, SMS — by name or number
Smart reply drafting — AI drafts a reply to your last received message, you confirm or edit before sending
Scheduled messaging — "WhatsApp John in 2 hours saying I'll be late"
Scheduled email
Broadcast messaging to multiple contacts at once
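
A scheduled-message command such as the "in 2 hours" example above has to be split into a recipient, a delay, and a body. The sketch below shows one way to do that with a regular expression. It is an assumption about the approach, not the project's entity extractor, and `ScheduledMessage`, `parseScheduledMessage`, and the accepted verbs are hypothetical.

```kotlin
import java.time.Duration

// Hypothetical parsed form of a scheduled-message command.
data class ScheduledMessage(val recipient: String, val delay: Duration, val body: String)

// Assumed grammar: <verb> <recipient> in <n> <minute|hour>[s] saying <body>
val scheduledPattern = Regex(
    """^(?:whatsapp|message|text)\s+(.+?)\s+in\s+(\d+)\s+(minute|hour)s?\s+saying\s+(.+)$""",
    RegexOption.IGNORE_CASE
)

// Returns null when the command does not follow the assumed grammar.
fun parseScheduledMessage(command: String): ScheduledMessage? {
    val match = scheduledPattern.matchEntire(command.trim()) ?: return null
    val (recipient, amount, unit, body) = match.destructured
    val n = amount.toLongOrNull() ?: return null
    val delay = when (unit.lowercase()) {
        "minute" -> Duration.ofMinutes(n)
        else -> Duration.ofHours(n)
    }
    return ScheduledMessage(recipient, delay, body)
}
```

The resulting delay could then be handed to whatever schedules the send; that wiring is outside this sketch.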

📱 Apps & System Control

Open any app by voice (WhatsApp, Facebook, Instagram, Maps, Spotify, and more)
Post directly to Facebook / Instagram / Twitter
WiFi, Bluetooth, flashlight, airplane mode, hotspot, DND, brightness, volume — all by voice
Screenshot, lock screen, go back/home, close all apps
Read what's on screen aloud (accessibility-powered)

📷 Camera & Media

Take photos/selfies, record video — fully hands-free, including a "cheese" trigger while the camera is open
Voice-controlled audio recording with a live waveform + stop button in the bubble
Play music, control playback
OCR text scanning with live camera preview, multi-language translation, and save-to-notes

🧭 Navigation & Knowledge

Turn-by-turn navigation via Google Maps
Weather (current + forecast) via OpenWeatherMap
News headlines with an auto-scrolling, clickable card carousel synced to speech
Wikipedia-first factual answers (fast, and grounded in article text rather than generated), falling back to Pollinations and then Gemini
Calculations, translations, currency conversion, contact lookup
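
The Wikipedia, Pollinations, Gemini order described above is a fallback chain: try each source in turn and stop at the first usable answer. A minimal sketch follows, assuming each source can be wrapped behind one small interface. `AnswerSource` and `firstAnswer` are hypothetical names, and no real network client is shown.

```kotlin
// Hypothetical abstraction over an answer source; returns null when it has no answer.
fun interface AnswerSource {
    fun answer(question: String): String?
}

// Tries each source in order and returns (source name, answer) for the first usable reply.
fun firstAnswer(
    question: String,
    sources: List<Pair<String, AnswerSource>>
): Pair<String, String>? {
    for ((name, source) in sources) {
        val result: String? = try {
            source.answer(question)
        } catch (e: Exception) {
            null // treat a failing source (for example, a network error) as "no answer"
        }
        if (result != null && result.isNotBlank()) return name to result
    }
    return null
}
```

Keeping the source name alongside the answer makes it possible to tell the user, or a log, where a reply came from.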

🤖 Agentic Features

Morning digest — weather + calendar + greeting in one briefing
Calendar reading & meeting prep — AI-generated briefs for upcoming meetings
Task list management — add, complete, and review to-dos by voice
Expense logger — log spending by voice, get daily/total summaries
SOS alert — sends your GPS location to a chosen contact in an emergency
Driving mode — hands-free notification reading and auto-replies
Focus mode — timed distraction-free sessions with auto-silence
Goodnight routine — silences the phone, checks tomorrow's calendar, sets the mood for sleep

🧠 Context Awareness

Kenji remembers what just happened. After "take a selfie," just say "cheese." After "message mum," say "call her" and it resolves the pronoun correctly. Sessions expire automatically after a few minutes of inactivity.
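
The pronoun follow-up and the inactivity expiry can be sketched as a small piece of session state. This is an illustration only: `SessionContext`, the pronoun list, and the five-minute default are assumptions (the text above only says "a few minutes").

```kotlin
// Hypothetical session memory; the five-minute timeout is an assumed value.
class SessionContext(private val timeoutMillis: Long = 5 * 60 * 1000L) {
    private var lastContact: String? = null
    private var lastTouched: Long = 0L

    // Records the contact used by the most recent command.
    fun rememberContact(name: String, now: Long) {
        lastContact = name
        lastTouched = now
    }

    // Returns the target unchanged unless it is a pronoun; a pronoun resolves to the
    // remembered contact, or to null if the session has expired or nothing is stored.
    fun resolveTarget(target: String, now: Long): String? {
        val pronouns = setOf("her", "him", "them")
        if (target.lowercase() !in pronouns) return target
        val expired = now - lastTouched > timeoutMillis
        return if (expired) null else lastContact
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the expiry logic easy to test.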

Architecture

                    ┌──────────────────────┐
   Voice input  ──▶ │  Keyword pre-filter  │ ── strong match ──▶ Execute (instant)
                    └──────────────────────┘
                              │ no match
                              ▼
                    ┌───────────────────────────────────┐
                    │  ONNX Semantic Classifier         │
                    │  (all-MiniLM-L6-v2, on-device)    │
                    └───────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┬───────────────────────┐
        ▼                     ▼                     ▼                       ▼
   Confident match       Matches a known       Matches nothing          Genuine question
   to a real feature     UNSUPPORTED           confidently              ("what is...",
        │                 feature pattern            │                  "who is...")
        ▼                     ▼                     ▼                       ▼
   Execute directly      "I understood, but    "I don't have the      Wikipedia
   (no AI involved)       I'm not programmed     capability to              │
                          with that feature      understand that"      fails? ▼
                          yet" + logged to             │              Pollinations
                          a Google Doc for          logged too               │
                          the developer to                                fails? ▼
                          review                                          Gemini

This two-tier "missing feature" detection means Kenji can tell the difference between "I genuinely didn't understand that" and "I understood exactly what you want, I just haven't built it yet" — and logs both cases to a Google Doc so the developer knows exactly what to build next.
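
The four outcomes in the diagram can be sketched as a single routing function. The order of the checks, the 0.6 threshold, and the names `Route`, `Classification`, and `route` are assumptions made for illustration. The diagram does not specify how a genuine question is detected, so a simple prefix check stands in for it here.

```kotlin
// The four outcomes shown in the architecture diagram.
sealed class Route {
    data class Execute(val intent: String) : Route()
    data class Unsupported(val intent: String) : Route()
    data class Knowledge(val question: String) : Route()
    object NotUnderstood : Route()
}

// Hypothetical classifier output: best intent, its score, and whether it is built yet.
data class Classification(val intent: String, val score: Float, val supported: Boolean)

fun route(text: String, best: Classification?, threshold: Float = 0.6f): Route {
    // A confident match either runs directly or is reported as a known missing feature.
    if (best != null && best.score >= threshold) {
        return if (best.supported) Route.Execute(best.intent) else Route.Unsupported(best.intent)
    }
    // Stand-in question detector (assumption): simple prefixes from the diagram.
    val questionStarts = listOf("what is", "who is")
    val normalized = text.trim().lowercase()
    val isQuestion = questionStarts.any { normalized.startsWith(it) }
    return if (isQuestion) Route.Knowledge(text) else Route.NotUnderstood
}
```

In this sketch both `Unsupported` and `NotUnderstood` would be logged, while only `Knowledge` would ever reach a cloud model.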

Tech Stack

Language: Kotlin
UI: Jetpack Compose, Material 3
On-device ML: ONNX Runtime Android + sentence-transformers/all-MiniLM-L6-v2
Cloud AI: Google Gemini and Pollinations AI (Q&A only), Mistral (intent routing assistance)
Knowledge: Wikipedia REST API
Weather: OpenWeatherMap
Storage: Room (scheduled tasks), SharedPreferences, local file storage
Speech: Android SpeechRecognizer + TextToSpeech
System integration: AccessibilityService, NotificationListenerService, CameraX, ML Kit (OCR)
Background work: AlarmManager, Foreground Services

Project Structure

com.example.assistantai/
├── service/
│   ├── VoiceAssistantService.kt       — core voice pipeline, wake word, conversation state
│   ├── AssistantBubbleService.kt      — floating bubble overlay window
│   ├── AssistantAccessibilityService.kt — system-level actions (WhatsApp send, screenshots, etc.)
│   ├── KenjiNotificationService.kt    — reads incoming notifications
│   ├── CommandPipeline.kt             — keyword pre-filter + entity extraction
│   ├── AgentScheduler.kt              — scheduled message/email engine (Room + AlarmManager)
│   ├── WikipediaClient.kt / WeatherClient.kt / MistralIntentRouter.kt
│   └── MissingFeatureLogger.kt        — logs unsupported requests to a Google Doc
├── ml/
│   └── OnnxIntentClassifier.kt        — on-device semantic intent classification
├── data/
│   └── IntentRegistry.kt              — single source of truth for every supported intent
├── ui/bubble/
│   ├── BubbleScreen.kt                — bubble UI (chat, carousel, thinking states)
│   └── BubbleState.kt                 — bubble content model
├── uis/
│   └── TextRecognitionActivity.kt     — OCR camera scanner
├── util/
│   └── ShareUtils.kt                  — share app link / APK file
├── MainActivity.kt                    — dashboard, settings, status panel
├── SplashScreenActivity.kt            — holographic intro animation
└── RecordingsActivity.kt              — saved voice recordings browser

Setup

Requirements

Android Studio (latest stable)
Android device or emulator, API 26+
Free API keys (all have generous free tiers, no credit card required):

Service	Used for	Get a key
Google Gemini	Conversational fallback	ai.google.dev
OpenWeatherMap	Weather	openweathermap.org/api
Mistral AI	Intent routing assistance	console.mistral.ai

Steps

Clone the repo
Open in Android Studio, let Gradle sync
Download the ONNX model assets (see APPLY_GUIDE_ONNX.md) and place them in app/src/main/assets/
Build and run
Open the app → Settings → paste in your API keys
Grant microphone, overlay, accessibility, and notification access when prompted
Say your wake word and start talking to Kenji

Optional: Missing-feature logging

Deploy the included Google Apps Script (GoogleAppsScript_Code.gs) to your own Google Doc to receive a log of every request Kenji couldn't fulfil — useful for deciding what to build next. See APPLY_GUIDE_V2.md for the 5-minute deployment steps.
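
For orientation, a log entry sent to such a script would typically be a small JSON payload. The sketch below only builds that payload. The field names are assumptions rather than the schema GoogleAppsScript_Code.gs expects, and `MissingFeatureEntry`, `jsonEscape`, and `toJson` are hypothetical helpers.

```kotlin
// Hypothetical log entry; field names are assumptions, not the script's real schema.
data class MissingFeatureEntry(val utterance: String, val kind: String, val timestamp: String)

// Escapes the characters that would break a JSON string literal.
fun jsonEscape(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
}

// Serialises one entry as a single-line JSON object.
fun toJson(e: MissingFeatureEntry): String =
    """{"utterance":"${jsonEscape(e.utterance)}","kind":"${jsonEscape(e.kind)}","timestamp":"${jsonEscape(e.timestamp)}"}"""
```

The `kind` field is where an entry could record whether the request was understood but unsupported, or not understood at all.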

Permissions Used

Permission	Why
Microphone	Wake word detection and speech recognition
Overlay (Draw over other apps)	The floating bubble UI
Accessibility Service	Sending WhatsApp messages, reading screen content, system navigation
Notification access	Reading incoming WhatsApp/SMS for smart replies
Contacts	Resolving names to phone numbers
Camera	Photos, selfies, video, OCR scanning
Location	Weather, navigation, SOS alerts
SMS	Sending text messages
Exact alarm	Scheduled messages firing on time

Kenji requests each permission only when the relevant feature is first used, and every permission gap is shown — with a one-tap fix — in the dashboard's status panel.
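
The "request on first use, show every gap" behaviour can be modelled as a lookup from feature to required permissions. This is a plain-Kotlin sketch with no Android APIs. The feature names are a subset taken from the table above, and `Permission`, `requirements`, `missingFor`, and `allGaps` are hypothetical.

```kotlin
// A subset of the permissions from the table above.
enum class Permission { MICROPHONE, CAMERA, LOCATION, SMS, EXACT_ALARM }

// Feature-to-permission pairs taken from the table (subset, for illustration).
val requirements: Map<String, Set<Permission>> = mapOf(
    "wake word" to setOf(Permission.MICROPHONE),
    "ocr scanning" to setOf(Permission.CAMERA),
    "weather" to setOf(Permission.LOCATION),
    "sos alert" to setOf(Permission.LOCATION),
    "scheduled messages" to setOf(Permission.EXACT_ALARM)
)

// Permissions a feature still needs; this is what a first-use request would ask for.
fun missingFor(feature: String, granted: Set<Permission>): Set<Permission> =
    requirements[feature].orEmpty() - granted

// Every gap across all features, as a status panel might list them.
fun allGaps(granted: Set<Permission>): Map<String, Set<Permission>> =
    requirements
        .mapValues { (_, needed) -> needed - granted }
        .filterValues { it.isNotEmpty() }
```

On a device, the `granted` set would come from the platform's permission checks; here it is passed in so the logic stays testable.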

Share Kenji

From the dashboard, you can share Kenji with others in two ways:

Share Link — sends a message with the GitHub download link via any messaging app
Share APK File — sends the actual installable app file directly (useful when the recipient has no internet access)

Roadmap

Domain-adaptive fine-tuning of the intent classifier using real usage data
Expand the agentic feature set based on the missing-feature log
Wear OS companion
Multi-language wake word support

License

This project is licensed under the PolyForm Noncommercial License 1.0.0; see the LICENSE file for full terms. Contact the developer for commercial licensing inquiries.

Credits

Developed by Brian Njuguna Macharia · GitHub: github.com/001kenji

Built with Kotlin, Jetpack Compose, ONNX Runtime, and a genuine attempt to make a voice assistant that actually does things instead of just talking about them.
