LiveTranslator

Real-time AI-powered speech translator for face-to-face conversations.

Talk in your language. Hear the translation in your earpiece. Instantly.

LiveTranslator streams audio to the Gemini Live API over a WebSocket and translates speech in real time with sub-second latency. There is no separate speech-to-text or text-to-speech round trip: the AI model listens and speaks simultaneously.


✨ Features

🎧 Solo Mode

One-way translation: your conversation partner speaks, and you hear the translation in both earbuds. Well suited to lectures, meetings, and one-on-one conversations.

🎧🎧 Duo Mode

Two-way simultaneous translation with stereo channel separation:

  • Left earphone → translation into your language
  • Right earphone → translation into partner's language

Share one pair of earbuds — each person gets their own translation channel.
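
The channel split above comes down to interleaving two mono streams into one stereo stream. The following is a minimal sketch, not the project's actual code: `interleaveStereo` is a hypothetical helper, and it assumes both translations arrive as 16-bit PCM at the same sample rate.

```typescript
// Interleave two mono 16-bit PCM buffers into one stereo buffer [L, R, L, R, ...].
// If one channel is shorter, its missing samples are filled with silence (0).
function interleaveStereo(left: Int16Array, right: Int16Array): Int16Array {
  const frames = Math.max(left.length, right.length);
  const out = new Int16Array(frames * 2);
  for (let i = 0; i < frames; i++) {
    out[2 * i] = left[i] ?? 0;      // left earphone: translation into your language
    out[2 * i + 1] = right[i] ?? 0; // right earphone: translation into partner's language
  }
  return out;
}

const stereo = interleaveStereo(Int16Array.of(1, 2, 3), Int16Array.of(-1, -2));
console.log(Array.from(stereo)); // [1, -1, 2, -2, 3, 0]
```

Padding the shorter channel with silence keeps the two channels frame-aligned when only one person is speaking.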

Duo audio has three launch modes:

  • Headphones · normal — default mode with local self-translation filtering.
  • Headphones · continuous — experimental mode with both translation channels playing continuously.
  • Speaker · anti-echo — half-duplex mode for quick tests without headphones.
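
To illustrate what "half-duplex" means in the speaker mode above, here is a hedged sketch of a microphone gate. It is an assumption about how such a mode could work, not a description of the app's implementation; `createHalfDuplexGate` and the 300 ms tail are hypothetical.

```typescript
// Half-duplex gate: drop microphone chunks while translated audio is playing,
// plus a short tail afterwards so speaker echo in the room can die down.
function createHalfDuplexGate(tailMs = 300) {
  let playbackEndsAt = -Infinity; // nothing has been played yet

  return {
    // Call when a playback chunk lasting `durationMs` is queued at time `now` (ms).
    onPlayback(now: number, durationMs: number): void {
      const start = Math.max(now, playbackEndsAt); // chunks queue back to back
      playbackEndsAt = start + durationMs;
    },
    // True if a microphone chunk captured at `now` may be sent upstream.
    allowMic(now: number): boolean {
      return now >= playbackEndsAt + tailMs;
    },
  };
}

const gate = createHalfDuplexGate();
gate.onPlayback(1000, 500);       // playback runs from 1000 ms to 1500 ms
console.log(gate.allowMic(1200)); // false (still playing)
console.log(gate.allowMic(1800)); // true  (playback ended, tail elapsed)
```

The trade-off is that nobody can interrupt a translation while it is playing, which is why headphones remain the cleaner setup.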

🔄 Seamless Sessions

  • GoAway handling — transparent WebSocket migration when Gemini closes the connection (~10 min intervals)
  • Session Resumption — conversation context preserved across reconnections
  • Context Window Compression — unlimited session duration (no 15-minute cap)
  • Heartbeat monitoring — dead connections detected within 60 seconds
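
The session features above can be sketched as a small state machine. This is a sketch under stated assumptions, not the project's code: the JSON field names (`sessionResumption`, `contextWindowCompression`, `goAway`, `sessionResumptionUpdate`) follow the Gemini Live API as publicly documented and should be checked against the current reference, while `buildSetup`, `onServerMessage`, `isDead`, and `SessionState` are hypothetical names. Treating "no server message for 60 seconds" as a dead connection is also an assumption about how the heartbeat works.

```typescript
interface SetupMessage {
  setup: {
    model: string;
    sessionResumption: { handle?: string };
    contextWindowCompression: { slidingWindow: Record<string, never> };
  };
}

interface ServerMessage {
  goAway?: { timeLeft?: string };
  sessionResumptionUpdate?: { newHandle?: string; resumable?: boolean };
}

interface SessionState {
  resumeHandle: string | null; // latest resumption handle issued by the server
  shouldMigrate: boolean;      // true once the server has announced GoAway
  lastMessageAt: number;       // ms timestamp of the last server message
}

// First message of every (re)connection. Passing a stored handle asks the
// server to restore the previous conversation context.
function buildSetup(model: string, resumeHandle: string | null): SetupMessage {
  return {
    setup: {
      model,
      sessionResumption: resumeHandle ? { handle: resumeHandle } : {},
      contextWindowCompression: { slidingWindow: {} },
    },
  };
}

// Fold one server message into the session state.
function onServerMessage(state: SessionState, msg: ServerMessage, now: number): SessionState {
  const update = msg.sessionResumptionUpdate;
  const resumeHandle =
    update?.resumable && update.newHandle ? update.newHandle : state.resumeHandle;
  return {
    resumeHandle,
    shouldMigrate: state.shouldMigrate || msg.goAway !== undefined,
    lastMessageAt: now,
  };
}

// Heartbeat check: the connection counts as dead after `timeoutMs` of silence.
function isDead(state: SessionState, now: number, timeoutMs = 60_000): boolean {
  return now - state.lastMessageAt > timeoutMs;
}
```

When `shouldMigrate` becomes true, a client would open a second WebSocket with `buildSetup(model, state.resumeHandle)` and switch over before the old connection closes.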

📱 Background Operation

  • Foreground Service keeps the microphone alive when the screen is off
  • AppState monitoring auto-recovers WebSocket connections when returning from background
  • Translation continues while you use other apps
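
As a rough illustration of the AppState recovery above, the decision can be isolated in a pure function. `shouldReconnect` is a hypothetical helper and not necessarily what `useAppStateReconnect` does; in a real app it would be called from a listener registered with React Native's `AppState.addEventListener("change", ...)`.

```typescript
// A subset of the states React Native's AppState reports.
type AppStateStatus = "active" | "background" | "inactive";

// Reconnect only when the app returns to the foreground and the socket is not open.
function shouldReconnect(
  prev: AppStateStatus,
  next: AppStateStatus,
  socketOpen: boolean,
): boolean {
  const cameToForeground = prev !== "active" && next === "active";
  return cameToForeground && !socketOpen;
}

console.log(shouldReconnect("background", "active", false)); // true
console.log(shouldReconnect("background", "active", true));  // false
```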

🛡️ Smart Language Filtering

In Solo mode, the translator ignores your native language and only translates the partner's speech — no echo loops.


🏗️ Architecture

┌─────────────────────────────────────────────────┐
│                    App.tsx                       │
│              (BT guard, API key)                 │
├─────────────────────────────────────────────────┤
│              useTranslator hook                  │
│         (lifecycle orchestration)                │
├─────────────────────────────────────────────────┤
│            TranslationEngine                     │
│    ┌─────────────┐    ┌─────────────┐           │
│    │  Session A   │    │  Session B   │  (duo)   │
│    │ partner→my   │    │ my→partner   │          │
│    └──────┬───────┘    └──────┬───────┘          │
│           │                   │                  │
│     ┌─────▼─────┐      ┌─────▼─────┐           │
│     │  L channel │      │  R channel │           │
│     └─────┬─────┘      └─────┬─────┘           │
├───────────┼─────────────────┼───────────────────┤
│     PcmPlayer (native stereo AudioTrack)         │
│     interleaved [L,R,L,R,...] @ 24kHz            │
└─────────────────────────────────────────────────┘
         ▲
         │ base64 PCM chunks (50ms)
         │
┌────────┴──────────┐
│    AudioCapture    │
│  (Foreground Svc)  │
│    16kHz mono      │
└───────────────────┘
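
For a sense of scale, the capture chunk size in the diagram can be worked out directly. The sketch below assumes 16-bit (2-byte) samples, which the diagram does not state; `chunkSizes` is a hypothetical helper.

```typescript
// Size of one audio chunk, assuming 16-bit (2-byte) PCM samples by default.
function chunkSizes(
  sampleRateHz: number,
  channels: number,
  chunkMs: number,
  bytesPerSample = 2,
) {
  const samples = Math.round((sampleRateHz * chunkMs) / 1000) * channels;
  const bytes = samples * bytesPerSample;
  const base64Chars = Math.ceil(bytes / 3) * 4; // padded base64 length
  return { samples, bytes, base64Chars };
}

// 16 kHz mono capture in 50 ms chunks:
console.log(chunkSizes(16000, 1, 50));
// { samples: 800, bytes: 1600, base64Chars: 2136 }
```

At 20 chunks per second, that is about 32 KB/s of raw PCM, or roughly 43 KB/s once base64-encoded.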

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Android SDK (API 33+)
  • Gemini API key with Live API access

Install & Run

# Clone
git clone https://github.com/deprav1/LiveTranslator.git
cd LiveTranslator

# Install dependencies
npm install

# Generate the native project (only needed for a fresh checkout without android/)
npx expo prebuild

# Run on Android
npx expo run:android

If your checkout already contains a native android/ folder, as the current Windows workspace does, prefer the build wrapper below, which type-checks the project before building. Re-running expo prebuild can overwrite manual native fixes unless the same change is already represented in app.json or a config plugin.

Build APK

# Windows / PowerShell wrapper
npm run build:apk

# Result
LiveTranslator-release.apk

build:apk runs tsc --noEmit, forces a fresh Metro release bundle by removing android/app/build/generated/assets/react, runs the Gradle assembleRelease task, and copies the APK to the repository root.

GitHub Actions also has a manual/tagged APK workflow. See docs/GITHUB_WORKFLOWS.md.

Configure

  1. Open the app → tap ⚙️ Settings
  2. Enter your Gemini API key. Development currently uses a quota-limited test key; ephemeral-token authentication for production is intentionally out of scope for this test build.
  3. Select languages
  4. Choose Solo or Duo mode
  5. In Duo, choose the audio mode on the main screen
  6. Tap START TRANSLATION

🛠️ Tech Stack

Layer           Technology
Framework       Expo SDK 56 + React Native 0.85
AI Model        Gemini Live API (WebSocket streaming)
Audio Capture   @siteed/audio-studio (Foreground Service)
Audio Playback  Custom native PcmPlayer module (Kotlin, AudioTrack MODE_STREAM)
Storage         AsyncStorage (API key persistence)
Language        TypeScript 6.0

📂 Project Structure

src/
├── components/       # UI: AudioWaveform, LanguagePicker, ModeSwitch, SubtitlesPanel
├── constants/        # Supported languages (including sr, en, ru)
├── hooks/            # useTranslator, useApiKey, useSettings, useAppStateReconnect
├── screens/          # HomeScreen, SettingsScreen
├── services/         # Core: GeminiLiveTranslate, TranslationEngine, AudioCapture
└── utils/            # toneTest (stereo channel verification)

modules/
└── pcm-player/       # Native Expo module — stereo PCM streaming (Kotlin + Swift stub)

docs/
└── solutions/        # Compound engineering: documented bugs & fixes

🌍 Supported Languages

Currently configured with 15 languages in src/constants/languages.ts.


⚠️ Known Limitations

  • iOS: Audio playback module is a stub (AVAudioEngine implementation pending)
  • Bluetooth mic + Duo: BT SCO profile forces mono audio, breaking stereo separation. The app falls back to the phone microphone while keeping A2DP headphones for stereo playback
  • Duo speaker mode: available for experiments, but headphones remain the intended setup for clean two-person translation
  • Background on some devices: Aggressive OEMs (Xiaomi, Huawei) may kill the foreground service. Disable battery optimization for the app
  • Manual verification pending: the installed APK starts cleanly, but live Gemini translation via x-goog-api-key, the L/R tone test, the input-device list, and long screen-off behavior still need manual testing on a physical phone.
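
The Bluetooth fallback in the list above can be sketched as a device-selection rule. This is an illustration only: `pickMicrophone`, `InputDevice`, and the device kinds are hypothetical and do not reflect the actual input-device API used by the app.

```typescript
type Mode = "solo" | "duo";

interface InputDevice {
  id: string;
  kind: "builtin" | "bluetooth" | "wired";
}

// Pick a microphone. In Duo mode, Bluetooth mics are skipped: opening one
// switches the headset to the mono SCO profile, which breaks stereo separation.
function pickMicrophone(
  mode: Mode,
  devices: InputDevice[],
  preferredId?: string,
): InputDevice | undefined {
  const usable = mode === "duo" ? devices.filter((d) => d.kind !== "bluetooth") : devices;
  return (
    usable.find((d) => d.id === preferredId) ??
    usable.find((d) => d.kind === "builtin") ??
    usable[0]
  );
}

const devices: InputDevice[] = [
  { id: "bt-1", kind: "bluetooth" },
  { id: "phone", kind: "builtin" },
];
console.log(pickMicrophone("solo", devices, "bt-1")?.id); // "bt-1"
console.log(pickMicrophone("duo", devices, "bt-1")?.id);  // "phone"
```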

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.


🤝 Contributing

Contributions are welcome! Feel free to open issues and pull requests.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Built with ❤️ and Gemini Live API
