Real-time AI-powered speech translator for face-to-face conversations.
Talk in your language. Hear the translation in your earpiece. Instantly.
LiveTranslator uses Gemini Live API WebSocket streaming to translate speech in real time with sub-second latency. There are no separate speech-to-text and text-to-speech round trips: the AI model listens and speaks simultaneously.
One-way translation: your conversation partner speaks, you hear the translation in both earbuds. Perfect for listening to lectures, meetings, or one-on-one conversations.
Two-way simultaneous translation with stereo channel separation:
- Left earphone → translation into your language
- Right earphone → translation into partner's language
Share one pair of earbuds — each person gets their own translation channel.
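The channel separation described above comes down to interleaving two mono PCM streams into one stereo stream. The sketch below is illustrative only: the function names `interleaveStereo` and `panMono` are hypothetical, and it assumes 16-bit samples at a shared sample rate rather than reflecting the app's actual native implementation.

```typescript
// Interleave two mono 16-bit PCM buffers into one stereo buffer [L,R,L,R,...].
// Assumption: both inputs share the same sample rate. The shorter input is
// padded with silence so the channels stay aligned.
function interleaveStereo(left: Int16Array, right: Int16Array): Int16Array {
  const frames = Math.max(left.length, right.length);
  const out = new Int16Array(frames * 2); // zero-filled, i.e. silence
  for (let i = 0; i < frames; i++) {
    if (i < left.length) out[2 * i] = left[i];
    if (i < right.length) out[2 * i + 1] = right[i];
  }
  return out;
}

// Route a mono translation to one ear only; the other ear receives silence.
function panMono(mono: Int16Array, channel: "left" | "right"): Int16Array {
  const silence = new Int16Array(0);
  return channel === "left"
    ? interleaveStereo(mono, silence)
    : interleaveStereo(silence, mono);
}
```

With this layout, a translation meant for the left earphone never leaks into the right one, because every odd-indexed sample of its buffer is zero.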
Duo audio has three launch modes:
- Headphones · normal — default mode with local self-translation filtering.
- Headphones · continuous — experimental mode with both translation channels playing continuously.
- Speaker · anti-echo — half-duplex mode for quick tests without headphones.
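As a rough illustration of what a half-duplex anti-echo mode can look like, the sketch below drops microphone chunks while translated audio is playing, plus a short tail afterwards. The class name `HalfDuplexGate`, its methods, and the 300 ms tail are assumptions made for this example, not the app's actual code.

```typescript
// Half-duplex gate: microphone audio is blocked while playback is active,
// and for a short tail afterwards so speaker echo can die down.
// Assumption: the 300 ms default tail is a hypothetical value.
class HalfDuplexGate {
  private playbackEndsAtMs = Number.NEGATIVE_INFINITY;
  private readonly tailMs: number;

  constructor(tailMs: number = 300) {
    this.tailMs = tailMs;
  }

  // Call when a playback chunk of the given duration is queued.
  // Chunks queued back to back extend the blocked window.
  onPlaybackQueued(nowMs: number, durationMs: number): void {
    const start = Math.max(nowMs, this.playbackEndsAtMs);
    this.playbackEndsAtMs = start + durationMs;
  }

  // True if a microphone chunk captured at nowMs may be sent upstream.
  micAllowed(nowMs: number): boolean {
    return nowMs >= this.playbackEndsAtMs + this.tailMs;
  }
}
```

The trade-off is that nobody can be heard while a translation is playing, which is why this mode suits quick tests rather than real conversations.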
Connection resilience:
- GoAway handling: transparent WebSocket migration when Gemini closes the connection (~10 min intervals)
- Session Resumption: conversation context preserved across reconnections
- Context Window Compression: unlimited session duration (no 15-minute cap)
- Heartbeat monitoring: dead connections detected within 60 seconds
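The sketch below shows one way these mechanisms could fit together as plain state handling. The message fields (`goAway`, `sessionResumptionUpdate`, `newHandle`, `resumable`, `sessionResumption`, `contextWindowCompression`) follow the public Live API reference as far as I know and should be checked against the API version in use. The `SessionState` shape and the function names are hypothetical.

```typescript
// Subset of Live API server messages relevant to reconnection.
interface ServerMessage {
  goAway?: { timeLeft?: string };
  sessionResumptionUpdate?: { newHandle?: string; resumable?: boolean };
}

// Hypothetical per-session bookkeeping.
interface SessionState {
  resumeHandle: string | null; // latest handle usable for resumption
  shouldMigrate: boolean;      // true once the server announced a close
  lastSeenMs: number;          // time of the last message, for the heartbeat
}

// Fold one server message into the session state.
function reduceSession(state: SessionState, msg: ServerMessage, nowMs: number): SessionState {
  const next: SessionState = { ...state, lastSeenMs: nowMs };
  const update = msg.sessionResumptionUpdate;
  if (update?.resumable && update.newHandle) {
    next.resumeHandle = update.newHandle; // keep only resumable handles
  }
  if (msg.goAway) {
    next.shouldMigrate = true; // open a replacement socket before the close
  }
  return next;
}

// Heartbeat check: no traffic for timeoutMs means the socket is treated as dead.
function isConnectionDead(state: SessionState, nowMs: number, timeoutMs = 60_000): boolean {
  return nowMs - state.lastSeenMs >= timeoutMs;
}

// Setup payload for the replacement socket. The model name is supplied by the caller.
function buildSetup(model: string, state: SessionState) {
  return {
    setup: {
      model,
      sessionResumption: state.resumeHandle ? { handle: state.resumeHandle } : {},
      contextWindowCompression: { slidingWindow: {} },
    },
  };
}
```

Keeping this logic free of socket code makes it straightforward to unit-test the migration path without a live connection.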
Background operation:
- Foreground Service keeps the microphone alive when the screen is off
- AppState monitoring auto-recovers WebSocket connections when returning from background
- Translation continues while you use other apps
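The auto-recovery decision can be expressed as a small pure function. This is a sketch under assumptions: `AppStatus` mirrors the common React Native `AppState` values, and `shouldReconnect` is a hypothetical helper, not necessarily how `useAppStateReconnect` is written.

```typescript
// The app states this sketch cares about (React Native reports these as strings).
type AppStatus = "active" | "background" | "inactive";

// Reconnect only on a transition into the foreground, and only when the
// socket is not open any more.
function shouldReconnect(prev: AppStatus, next: AppStatus, socketOpen: boolean): boolean {
  const cameToForeground = prev !== "active" && next === "active";
  return cameToForeground && !socketOpen;
}
```

In a React Native app this would typically be called from an `AppState.addEventListener("change", ...)` handler that remembers the previous state.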
In Solo mode, the translator ignores your native language and only translates the partner's speech — no echo loops.
┌─────────────────────────────────────────────────┐
│                     App.tsx                     │
│               (BT guard, API key)               │
├─────────────────────────────────────────────────┤
│               useTranslator hook                │
│            (lifecycle orchestration)            │
├─────────────────────────────────────────────────┤
│                TranslationEngine                │
│    ┌─────────────┐   ┌─────────────┐            │
│    │  Session A  │   │  Session B  │  (duo)     │
│    │ partner→my  │   │ my→partner  │            │
│    └──────┬──────┘   └──────┬──────┘            │
│           │                 │                   │
│     ┌─────▼─────┐     ┌─────▼─────┐             │
│     │ L channel │     │ R channel │             │
│     └─────┬─────┘     └─────┬─────┘             │
├───────────┼─────────────────┼───────────────────┤
│      PcmPlayer (native stereo AudioTrack)       │
│        interleaved [L,R,L,R,...] @ 24kHz        │
└─────────────────────────────────────────────────┘
                         ▲
                         │ base64 PCM chunks (50ms)
                         │
                ┌────────┴──────────┐
                │   AudioCapture    │
                │ (Foreground Svc)  │
                │    16kHz mono     │
                └───────────────────┘
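To put numbers on the capture path in the diagram, the sketch below computes the size of one 50 ms chunk at 16 kHz mono. It assumes 16-bit (2-byte) samples and padded base64. The helper name `pcmChunkSize` is hypothetical.

```typescript
// Size of one PCM chunk, both raw and base64-encoded.
// Assumption: 16-bit samples unless bytesPerSample says otherwise.
function pcmChunkSize(
  sampleRateHz: number,
  channels: number,
  chunkMs: number,
  bytesPerSample: number = 2,
): { frames: number; bytes: number; base64Chars: number } {
  const frames = Math.round((sampleRateHz * chunkMs) / 1000);
  const bytes = frames * channels * bytesPerSample;
  const base64Chars = 4 * Math.ceil(bytes / 3); // base64 maps 3 bytes to 4 chars
  return { frames, bytes, base64Chars };
}

const capture = pcmChunkSize(16_000, 1, 50);
// capture is { frames: 800, bytes: 1600, base64Chars: 2136 }
```

At 20 chunks per second this is roughly 32 KB/s of raw audio, or about 43 KB/s once base64-encoded, before any WebSocket framing overhead.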
- Node.js 18+
- Android SDK (API 33+)
- Gemini API key with Live API access
# Clone
git clone https://github.com/deprav1/LiveTranslator.git
cd LiveTranslator
# Install dependencies
npm install
# Generate native project only for a fresh checkout without android/
npx expo prebuild
# Run on Android
npx expo run:android
For the current Windows workspace with an existing native Android folder, prefer
the checked build wrapper below. Re-running expo prebuild can overwrite manual
native fixes unless the same change is already represented in app.json or a
config plugin.
# Windows / PowerShell wrapper
npm run build:apk
# Result
LiveTranslator-release.apk
build:apk runs tsc --noEmit, forces a fresh Metro release bundle by removing
android/app/build/generated/assets/react, builds assembleRelease, and copies
the APK to the repository root.
GitHub Actions also has a manual/tagged APK workflow. See
docs/GITHUB_WORKFLOWS.md.
- Open the app → tap ⚙️ Settings
- Enter your Gemini API key. Current development uses a test key with quota limits; production ephemeral-token security is intentionally out of scope for this test build.
- Select languages
- Choose Solo or Duo mode
- In Duo, choose the audio mode on the main screen
- Tap START TRANSLATION
| Layer | Technology |
|---|---|
| Framework | Expo SDK 56 + React Native 0.85 |
| AI Model | Gemini Live API (WebSocket streaming) |
| Audio Capture | @siteed/audio-studio (Foreground Service) |
| Audio Playback | Custom native PcmPlayer module (Kotlin, AudioTrack MODE_STREAM) |
| Storage | AsyncStorage (API key persistence) |
| Language | TypeScript 6.0 |
src/
├── components/ # UI: AudioWaveform, LanguagePicker, ModeSwitch, SubtitlesPanel
├── constants/ # Supported languages (sr, en, ru)
├── hooks/ # useTranslator, useApiKey, useSettings, useAppStateReconnect
├── screens/ # HomeScreen, SettingsScreen
├── services/ # Core: GeminiLiveTranslate, TranslationEngine, AudioCapture
└── utils/ # toneTest (stereo channel verification)
modules/
└── pcm-player/ # Native Expo module — stereo PCM streaming (Kotlin + Swift stub)
docs/
└── solutions/ # Compound engineering: documented bugs & fixes
Currently configured with 15 languages in src/constants/languages.ts.
- iOS: Audio playback module is a stub (AVAudioEngine implementation pending)
- Bluetooth mic + Duo: BT SCO profile forces mono audio, breaking stereo separation. The app falls back to the phone microphone while keeping A2DP headphones for stereo playback
- Duo speaker mode: available for experiments, but headphones remain the intended setup for clean two-person translation
- Background on some devices: Aggressive OEMs (Xiaomi, Huawei) may kill the foreground service. Disable battery optimization for the app
- Manual verification pending: the installed APK starts cleanly, but live Gemini translation via
x-goog-api-key, L/R tone-test, input-device list, and long screen-off behavior still need a human phone pass.
This project is licensed under the MIT License — see the LICENSE file for details.
Contributions are welcome! Feel free to open issues and pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Built with ❤️ and Gemini Live API