001kenji/Assistant-AI-Code

Kenji: AI Voice Assistant for Android

A privacy-conscious, on-device-first AI voice assistant that lives in a floating bubble and actually gets things done.

Built by Brian Njuguna Macharia · GitHub


What is Kenji?

Kenji is a wake-word-activated AI assistant for Android that goes beyond simple Q&A. Say "Hey Joe" (or your own custom wake word) and Kenji appears as a floating, glassmorphic bubble that listens, thinks, and acts: making calls, sending WhatsApp messages, controlling phone settings, reading the news, navigating you somewhere, and dozens of other real actions on your device, not just chat responses.

Unlike most voice assistants that route every request through a cloud LLM, Kenji's command understanding runs on-device using a semantic intent classifier (ONNX Runtime + all-MiniLM-L6-v2). Cloud AI (Gemini / Pollinations) is invoked only for genuine factual questions and open conversation, never for deciding what action to take. This makes Kenji faster, more predictable, and far less prone to misfiring on simple commands.
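
As a rough illustration of how an embedding-based classifier can pick an intent, here is a minimal sketch. It assumes the sentence embedding has already been produced (in Kenji that would come from the ONNX model) and only shows the nearest-example lookup. `IntentExample`, `cosine`, `classify`, the toy vectors, and the 0.6 threshold are all illustrative assumptions, not code from the repository.

```kotlin
import kotlin.math.sqrt

// Hypothetical type for illustration; not taken from the Kenji codebase.
data class IntentExample(val intent: String, val embedding: FloatArray)

// Cosine similarity between two vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if (normA == 0f || normB == 0f) return 0f
    return dot / (sqrt(normA) * sqrt(normB))
}

// Returns the best-matching intent, or null if nothing clears the threshold.
fun classify(
    query: FloatArray,
    examples: List<IntentExample>,
    threshold: Float = 0.6f // assumed cut-off, not a value from the project
): String? {
    val best = examples.maxByOrNull { cosine(query, it.embedding) } ?: return null
    return if (cosine(query, best.embedding) >= threshold) best.intent else null
}

fun main() {
    // Toy 3-dimensional vectors stand in for the 384-dimensional MiniLM embeddings.
    val examples = listOf(
        IntentExample("CALL_CONTACT", floatArrayOf(1f, 0f, 0f)),
        IntentExample("SEND_WHATSAPP", floatArrayOf(0f, 1f, 0f))
    )
    println(classify(floatArrayOf(0.9f, 0.1f, 0f), examples)) // close to CALL_CONTACT
    println(classify(floatArrayOf(0f, 0f, 1f), examples))     // matches nothing
}
```

Returning `null` below the threshold is what lets a caller distinguish "no confident match" from a real command.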


Why Kenji is different

| | Typical voice assistant | Kenji |
|---|---|---|
| Command understanding | Cloud LLM guesses every time | On-device semantic classifier decides, instantly |
| Adding a new command | Requires app update / retraining | Add one line to a registry + re-embed (no retraining) |
| Unsupported request | Vague "I can't help with that" | Tells you it understood but isn't built yet, and logs it for the developer |
| Conversation UI | Full-screen takeover | Lightweight floating bubble, stays out of your way |
| Wake-word follow-ups | Often loses context | Remembers session context: "call her" after "message mum" just works |
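
To make "add one line to a registry" concrete, here is a hedged sketch of what a registry entry could look like. The README does not show the contents of `IntentRegistry.kt`, so `IntentSpec`, `DemoIntentRegistry`, the field names, and the example intents are hypothetical.

```kotlin
// Hypothetical shape for a registry entry; the real IntentRegistry.kt may differ.
data class IntentSpec(
    val id: String,
    val examplePhrases: List<String>,
    val supported: Boolean = true // false = recognised but not built yet
)

object DemoIntentRegistry {
    val intents = listOf(
        IntentSpec("CALL_CONTACT", listOf("call mum", "phone John")),
        IntentSpec("TOGGLE_FLASHLIGHT", listOf("turn on the torch", "flashlight off")),
        // Adding a command is one new entry; its phrases are then re-embedded.
        IntentSpec("BOOK_TAXI", listOf("get me a taxi", "book a ride"), supported = false)
    )

    // Every phrase that would need an embedding, paired with its intent id.
    fun phrasesToEmbed(): List<Pair<String, String>> =
        intents.flatMap { spec -> spec.examplePhrases.map { it to spec.id } }
}

fun main() {
    DemoIntentRegistry.phrasesToEmbed().forEach { (phrase, id) -> println("$id <- $phrase") }
}
```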

License

This project is licensed under the PolyForm Noncommercial License 1.0.0.

You're free to use, study, modify, and contribute to this project for any non-commercial purpose: personal projects, learning, research, or collaboration. Commercial use, resale, or redistribution as part of a paid product requires written permission from the copyright holder.

See the LICENSE file for full terms.


Core Capabilities

🗣️ Communication

  • Phone calls, WhatsApp messages, and SMS by name or number
  • Smart reply drafting: AI drafts a reply to your last received message; you confirm or edit before sending
  • Scheduled messaging: "WhatsApp John in 2 hours saying I'll be late"
  • Scheduled email
  • Broadcast messaging to multiple contacts at once
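
A scheduled-message command such as the example above has three parts to extract: a contact, a delay, and a body. The following is a minimal sketch of one way to pull them out with a regular expression. `ScheduledMessage`, `parseScheduledMessage`, and the pattern itself are assumptions for illustration; the project's `CommandPipeline.kt` may extract entities differently.

```kotlin
import java.time.Duration

// Hypothetical result type for illustration.
data class ScheduledMessage(val contact: String, val delay: Duration, val body: String)

// Matches e.g. "WhatsApp John in 2 hours saying I'll be late".
private val scheduledPattern = Regex(
    """^whatsapp\s+(.+?)\s+in\s+(\d+)\s+(minute|hour)s?\s+saying\s+(.+)$""",
    RegexOption.IGNORE_CASE
)

// Returns the parsed message, or null if the utterance does not fit the pattern.
fun parseScheduledMessage(utterance: String): ScheduledMessage? {
    val match = scheduledPattern.matchEntire(utterance.trim()) ?: return null
    val (contact, amount, unit, body) = match.destructured
    val n = amount.toLongOrNull() ?: return null
    val delay = if (unit.lowercase() == "hour") Duration.ofHours(n) else Duration.ofMinutes(n)
    return ScheduledMessage(contact, delay, body)
}

fun main() {
    println(parseScheduledMessage("WhatsApp John in 2 hours saying I'll be late"))
    println(parseScheduledMessage("What's the weather"))
}
```

The parsed delay would then be handed to whatever schedules the send (the README names Room + AlarmManager for that role).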

📱 Apps & System Control

  • Open any app by voice (WhatsApp, Facebook, Instagram, Maps, Spotify, and more)
  • Post directly to Facebook / Instagram / Twitter
  • WiFi, Bluetooth, flashlight, airplane mode, hotspot, DND, brightness, and volume, all by voice
  • Screenshot, lock screen, go back/home, close all apps
  • Read what's on screen aloud (accessibility-powered)

📷 Camera & Media

  • Take photos/selfies and record video fully hands-free, including a "cheese" trigger while the camera is open
  • Voice-controlled audio recording with a live waveform + stop button in the bubble
  • Play music, control playback
  • OCR text scanning with live camera preview, multi-language translation, and save-to-notes

🧭 Navigation & Knowledge

  • Turn-by-turn navigation via Google Maps
  • Weather (current + forecast) via OpenWeatherMap
  • News headlines with an auto-scrolling, clickable card carousel synced to speech
  • Wikipedia-first factual answers (instant, no hallucination), falling back to Pollinations → Gemini
  • Calculations, translations, currency conversion, contact lookup
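
The Wikipedia → Pollinations → Gemini order is a plain fallback chain: try each source in turn and stop at the first usable answer. A minimal sketch, with stub lambdas standing in for the real network clients; `AnswerSource` and `answerWithFallback` are hypothetical names.

```kotlin
// A source takes a question and returns an answer, or null if it has none.
typealias AnswerSource = (String) -> String?

// Tries each named source in order; a thrown exception counts as a failure.
// Returns the name of the source that answered together with its answer.
fun answerWithFallback(
    question: String,
    sources: List<Pair<String, AnswerSource>>
): Pair<String, String>? {
    for ((name, source) in sources) {
        val answer = try {
            source(question)
        } catch (e: Exception) {
            null
        }
        if (answer != null && answer.isNotBlank()) return name to answer
    }
    return null
}

fun main() {
    // Stubs stand in for the real clients; none of them touches the network.
    val wikipedia: AnswerSource = { _ -> null }
    val pollinations: AnswerSource = { _ -> throw IllegalStateException("timeout") }
    val gemini: AnswerSource = { q -> "Stub answer to: $q" }

    val sources = listOf(
        "Wikipedia" to wikipedia,
        "Pollinations" to pollinations,
        "Gemini" to gemini
    )
    println(answerWithFallback("who is Ada Lovelace", sources))
}
```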

🤖 Agentic Features

  • Morning digest: weather + calendar + greeting in one briefing
  • Calendar reading & meeting prep: AI-generated briefs for upcoming meetings
  • Task list management: add, complete, and review to-dos by voice
  • Expense logger: log spending by voice, get daily/total summaries
  • SOS alert: sends your GPS location to a chosen contact in an emergency
  • Driving mode: hands-free notification reading and auto-replies
  • Focus mode: timed distraction-free sessions with auto-silence
  • Goodnight routine: silences the phone, checks tomorrow's calendar, sets the mood for sleep
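
As one example from the list above, the expense logger's daily and total summaries reduce to a filter and a sum. This is only a sketch under assumed names (`Expense`, `ExpenseLog`) with in-memory storage; the README does not say how the project stores or sums expenses.

```kotlin
import java.time.LocalDate

// Hypothetical record; amounts are kept in minor units to avoid floating-point drift.
data class Expense(val date: LocalDate, val amountCents: Long, val note: String)

class ExpenseLog {
    private val entries = mutableListOf<Expense>()

    fun log(date: LocalDate, amountCents: Long, note: String) {
        require(amountCents > 0) { "Amount must be positive" }
        entries += Expense(date, amountCents, note)
    }

    // Sum of everything logged on the given day.
    fun dailyTotal(day: LocalDate): Long =
        entries.filter { it.date == day }.sumOf { it.amountCents }

    // Sum of everything logged so far.
    fun overallTotal(): Long = entries.sumOf { it.amountCents }
}

fun main() {
    val today = LocalDate.of(2024, 1, 15)
    val log = ExpenseLog()
    log.log(today, 45000L, "groceries")
    log.log(today, 12000L, "lunch")
    log.log(today.minusDays(1), 30000L, "fuel")
    println("Today: ${log.dailyTotal(today)} cents, total: ${log.overallTotal()} cents")
}
```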

🧠 Context Awareness

Kenji remembers what just happened. After "take a selfie," just say "cheese." After "message mum," say "call her" and it resolves the pronoun correctly. Sessions expire automatically after a few minutes of inactivity.
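
A minimal sketch of the idea: keep the last contact mentioned, stamp it with a time, and resolve pronouns against it only while the session is still live. `SessionContext`, the three-minute timeout, and the pronoun list are assumptions for illustration; the README only says sessions expire "after a few minutes".

```kotlin
// Hypothetical session memory; the timeout and names are assumptions.
class SessionContext(
    private val timeoutMillis: Long = 3 * 60 * 1000L,
    private val clock: () -> Long = System::currentTimeMillis
) {
    private var lastContact: String? = null
    private var lastUpdated: Long = 0L

    fun rememberContact(name: String) {
        lastContact = name
        lastUpdated = clock()
    }

    // Replaces "her"/"him"/"them" with the remembered contact while the
    // session is live; any other target is returned unchanged.
    fun resolve(target: String): String? {
        val pronouns = setOf("her", "him", "them")
        if (target.lowercase() !in pronouns) return target
        val expired = clock() - lastUpdated > timeoutMillis
        return if (expired) null else lastContact
    }
}

fun main() {
    var now = 0L
    val session = SessionContext(clock = { now }) // fake clock for the demo
    session.rememberContact("Mum")                // after "message mum"
    println(session.resolve("her"))               // within the session window
    now += 10 * 60 * 1000L                        // ten minutes pass
    println(session.resolve("her"))               // session has expired
}
```

Injecting the clock keeps expiry testable without waiting in real time.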


Architecture

                    ┌──────────────────────┐
   Voice input  ──▶ │  Keyword pre-filter  │ ── strong match ──▶ Execute (instant)
                    └──────────────────────┘
                              │ no match
                              ▼
                    ┌──────────────────────────────────┐
                    │  ONNX Semantic Classifier        │
                    │  (all-MiniLM-L6-v2, on-device)   │
                    └──────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┬───────────────────────┐
        ▼                     ▼                     ▼                       ▼
   Confident match       Matches a known       Matches nothing          Genuine question
   to a real feature     UNSUPPORTED           confidently              ("what is...",
        │                feature pattern            │                    "who is...")
        ▼                     ▼                     ▼                       ▼
   Execute directly      "I understood, but    "I don't have the        Wikipedia
   (no AI involved)      I'm not programmed    capability to                │
                         with that feature     understand that"      fails? ▼
                         yet" + logged to           │                   Pollinations
                         a Google Doc for      logged too                   │
                         the developer to                            fails? ▼
                         review                                         Gemini

This two-tier "missing feature" detection means Kenji can tell the difference between "I genuinely didn't understand that" and "I understood exactly what you want, I just haven't built it yet", and logs both cases to a Google Doc so the developer knows exactly what to build next.
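
The four outcomes in the diagram can be sketched as a small routing function. This is a simplification under stated assumptions: `Route`, `Match`, and `route` are hypothetical names, the 0.6 threshold is invented, and questions are detected here by a simple prefix check, whereas the diagram shows the classifier making that call.

```kotlin
// The four outcomes from the diagram.
sealed class Route {
    data class Execute(val intent: String) : Route()
    data class Unsupported(val intent: String) : Route() // understood, not built yet
    object NotUnderstood : Route()
    data class Question(val text: String) : Route()      // Wikipedia first, then cloud AI
}

// Hypothetical classifier output: best intent, its score, and whether it is implemented.
data class Match(val intent: String, val score: Float, val supported: Boolean)

fun route(
    utterance: String,
    keywordIntent: String?,  // result of the keyword pre-filter, if any
    match: Match?,           // result of the semantic classifier, if any
    threshold: Float = 0.6f  // assumed confidence cut-off
): Route {
    // Strong keyword match: execute without consulting the classifier.
    if (keywordIntent != null) return Route.Execute(keywordIntent)

    // Simplified question detection; an assumption, not the project's logic.
    val text = utterance.trim().lowercase()
    if (text.startsWith("what is") || text.startsWith("who is")) {
        return Route.Question(utterance)
    }

    if (match == null || match.score < threshold) return Route.NotUnderstood
    return if (match.supported) Route.Execute(match.intent) else Route.Unsupported(match.intent)
}

fun main() {
    println(route("turn on wifi", "TOGGLE_WIFI", null))
    println(route("book me a taxi", null, Match("BOOK_TAXI", 0.9f, supported = false)))
    println(route("who is Ada Lovelace", null, null))
}
```

Both `Unsupported` and `NotUnderstood` would be logged, which matches the two tiers described above.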


Tech Stack

  • Language: Kotlin
  • UI: Jetpack Compose, Material 3
  • On-device ML: ONNX Runtime Android + sentence-transformers/all-MiniLM-L6-v2
  • Cloud AI (Q&A only): Google Gemini, Pollinations AI, Mistral
  • Knowledge: Wikipedia REST API
  • Weather: OpenWeatherMap
  • Storage: Room (scheduled tasks), SharedPreferences, local file storage
  • Speech: Android SpeechRecognizer + TextToSpeech
  • System integration: AccessibilityService, NotificationListenerService, CameraX, ML Kit (OCR)
  • Background work: AlarmManager, Foreground Services

Project Structure

com.example.assistantai/
├── service/
│   ├── VoiceAssistantService.kt         - core voice pipeline, wake word, conversation state
│   ├── AssistantBubbleService.kt        - floating bubble overlay window
│   ├── AssistantAccessibilityService.kt - system-level actions (WhatsApp send, screenshots, etc.)
│   ├── KenjiNotificationService.kt      - reads incoming notifications
│   ├── CommandPipeline.kt               - keyword pre-filter + entity extraction
│   ├── AgentScheduler.kt                - scheduled message/email engine (Room + AlarmManager)
│   ├── WikipediaClient.kt / WeatherClient.kt / MistralIntentRouter.kt
│   └── MissingFeatureLogger.kt          - logs unsupported requests to a Google Doc
├── ml/
│   └── OnnxIntentClassifier.kt          - on-device semantic intent classification
├── data/
│   └── IntentRegistry.kt                - single source of truth for every supported intent
├── ui/bubble/
│   ├── BubbleScreen.kt                  - bubble UI (chat, carousel, thinking states)
│   └── BubbleState.kt                   - bubble content model
├── uis/
│   └── TextRecognitionActivity.kt       - OCR camera scanner
├── util/
│   └── ShareUtils.kt                    - share app link / APK file
├── MainActivity.kt                      - dashboard, settings, status panel
├── SplashScreenActivity.kt              - holographic intro animation
└── RecordingsActivity.kt                - saved voice recordings browser

Setup

Requirements

  • Android Studio (latest stable)
  • Android device or emulator, API 26+
  • Free API keys (all have generous free tiers, no credit card required):
| Service | Used for | Get a key |
|---|---|---|
| Google Gemini | Conversational fallback | ai.google.dev |
| OpenWeatherMap | Weather | openweathermap.org/api |
| Mistral AI | Intent routing assistance | console.mistral.ai |

Steps

  1. Clone the repo
  2. Open in Android Studio, let Gradle sync
  3. Download the ONNX model assets (see APPLY_GUIDE_ONNX.md) and place them in app/src/main/assets/
  4. Build and run
  5. Open the app → Settings → paste in your API keys
  6. Grant microphone, overlay, accessibility, and notification access when prompted
  7. Say your wake word and start talking to Kenji

Optional: Missing-feature logging

Deploy the included Google Apps Script (GoogleAppsScript_Code.gs) to your own Google Doc to receive a log of every request Kenji couldn't fulfil, which is useful for deciding what to build next. See APPLY_GUIDE_V2.md for the 5-minute deployment steps.
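
For orientation, here is a hedged sketch of what the client side of such logging could look like: build a small JSON payload and POST it to the deployed web app URL. The field names (`utterance`, `kind`), the helper names, and the JSON format are assumptions; check GoogleAppsScript_Code.gs for what the script actually expects. `main` only prints the payload and does not make a network call.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Minimal JSON string escaping for characters likely to appear in transcripts.
fun jsonEscape(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
}

// Hypothetical payload fields; the real Apps Script may expect different names.
fun buildLogPayload(utterance: String, kind: String): String =
    "{\"utterance\":\"${jsonEscape(utterance)}\",\"kind\":\"${jsonEscape(kind)}\"}"

// Sends the payload to a deployed web app URL and returns the HTTP status code.
// Must be called off the main thread on Android.
fun postLog(webAppUrl: String, payload: String): Int {
    val conn = URL(webAppUrl).openConnection() as HttpURLConnection
    try {
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8")
        conn.outputStream.use { it.write(payload.toByteArray(Charsets.UTF_8)) }
        return conn.responseCode
    } finally {
        conn.disconnect()
    }
}

fun main() {
    println(buildLogPayload("book me a \"taxi\"", "UNSUPPORTED_FEATURE"))
}
```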


Permissions Used

| Permission | Why |
|---|---|
| Microphone | Wake word detection and speech recognition |
| Overlay (Draw over other apps) | The floating bubble UI |
| Accessibility Service | Sending WhatsApp messages, reading screen content, system navigation |
| Notification access | Reading incoming WhatsApp/SMS for smart replies |
| Contacts | Resolving names to phone numbers |
| Camera | Photos, selfies, video, OCR scanning |
| Location | Weather, navigation, SOS alerts |
| SMS | Sending text messages |
| Exact alarm | Scheduled messages firing on time |

Kenji requests each permission only when the relevant feature is first used, and every permission gap is shown, with a one-tap fix, in the dashboard's status panel.


Share Kenji

From the dashboard, you can share Kenji with others two ways:

  • Share Link: sends a message with the GitHub download link via any messaging app
  • Share APK File: sends the actual installable app file directly (useful when the recipient has no internet access)

Roadmap

  • Domain-adaptive fine-tuning of the intent classifier using real usage data
  • Expand the agentic feature set based on the missing-feature log
  • Wear OS companion
  • Multi-language wake word support

License

This project is currently unpublished and shared for personal/portfolio use. Contact the developer for licensing inquiries.


Credits

Developed by Brian Njuguna Macharia · GitHub: github.com/001kenji

Built with Kotlin, Jetpack Compose, ONNX Runtime, and a genuine attempt to make a voice assistant that actually does things instead of just talking about them.
