Babelive (.NET)

Avalonia 12 desktop app that captures Windows audio output (any app's playback), streams it to OpenAI's realtime translation model (gpt-realtime-translate), and renders the translated audio + dual-language transcript live. Windows-only today; the audio layer is being abstracted for a macOS port.

Stack

.NET 9 + Avalonia 12 (Fluent theme, Inter font fallback)
NAudio — WasapiLoopbackCapture for input, WasapiOut/WaveOutEvent for playback, WdlResamplingSampleProvider for high-quality 48 kHz → 24 kHz resampling
MessageBox.Avalonia — dialog replacement (Avalonia has no built-in MessageBox)
System.Net.WebSockets.ClientWebSocket (built-in) for the realtime API
System.Text.Json (built-in) for protocol serialization

How it works

[Any Windows app] ──► WasapiLoopbackCapture ──► downmix ──► WDL resample to 24 kHz ──► PCM16
                                                                                          │
                                                                                          ▼
                                                              ClientWebSocket → OpenAI Realtime
                                                                                          │
                          ┌───────────────────────────────────────────────────────────────┴───┐
                          ▼                                                                   ▼
              translated audio (PCM16)                                       dual transcript deltas
                          │                                                                   │
                          ▼                                                                   ▼
                    WasapiOut device                                              Avalonia TextBox
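
The capture half of the diagram can be sketched in plain C#. This is only an illustration of the data flow, not the code in Audio/LoopbackCapture.cs: the class and method names (PcmPipeline, DownmixToMono, HalveSampleRate, ToPcm16) are hypothetical, and the 48 kHz → 24 kHz step here is a naive pair-averaging decimator standing in for NAudio's WdlResamplingSampleProvider, which the app uses for better quality.

```csharp
using System;

// Hypothetical helper, not part of Babelive: float samples in, 24 kHz mono PCM16 out.
public static class PcmPipeline
{
    // Averages each interleaved frame (e.g. L,R) into one mono sample.
    public static float[] DownmixToMono(float[] interleaved, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++) sum += interleaved[f * channels + c];
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Naive 2:1 decimation (48 kHz -> 24 kHz) by averaging sample pairs.
    // Assumption: a stand-in for a proper resampler; a trailing odd sample is dropped.
    public static float[] HalveSampleRate(float[] mono)
    {
        var output = new float[mono.Length / 2];
        for (int i = 0; i < output.Length; i++)
            output[i] = 0.5f * (mono[2 * i] + mono[2 * i + 1]);
        return output;
    }

    // Converts floats in [-1, 1] to little-endian signed 16-bit PCM, clamping overshoot.
    public static byte[] ToPcm16(float[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Clamp(samples[i], -1f, 1f);
            short s = (short)Math.Round(clamped * short.MaxValue);
            bytes[2 * i] = (byte)(s & 0xFF);
            bytes[2 * i + 1] = (byte)((s >> 8) & 0xFF);
        }
        return bytes;
    }
}
```

Chaining the three calls on a 48 kHz stereo float buffer yields the 24 kHz mono PCM16 bytes that the diagram shows going to the WebSocket.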

Requirements

Windows 10 / 11
.NET 9 SDK
An OpenAI API key with access to gpt-realtime-translate

Setup & run

dotnet restore
dotnet run

On first launch, click the API… button in the settings panel and paste your sk-… key. The key is stored locally in plain, unencrypted JSON at %APPDATA%\Babelive\settings.json and is sent only to the configured API endpoint.
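
For orientation, persisting settings as plain JSON under %APPDATA% might look roughly like the sketch below. It is not the contents of AppSettings.cs: the SettingsSketch type, its ApiKey and TargetLanguage properties, and the SettingsStore helper are hypothetical names chosen for illustration. Only the file location comes from this README.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical settings shape; the real AppSettings.cs may use different fields.
public sealed class SettingsSketch
{
    public string ApiKey { get; set; } = "";
    public string TargetLanguage { get; set; } = "";
}

public static class SettingsStore
{
    // Resolves %APPDATA%\Babelive\settings.json for the current user.
    public static string DefaultPath() => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
        "Babelive", "settings.json");

    // Returns defaults when the file is missing or deserializes to null.
    public static SettingsSketch Load(string path)
    {
        if (!File.Exists(path)) return new SettingsSketch();
        var loaded = JsonSerializer.Deserialize<SettingsSketch>(File.ReadAllText(path));
        return loaded ?? new SettingsSketch();
    }

    // Writes indented JSON, creating the parent folder on first save.
    public static void Save(string path, SettingsSketch settings)
    {
        var dir = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);
        var options = new JsonSerializerOptions { WriteIndented = true };
        File.WriteAllText(path, JsonSerializer.Serialize(settings, options));
    }
}
```

Because the file is unencrypted, anyone with access to the user profile can read the key. Delete settings.json to reset the app to first-launch state.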

For a self-contained release build:

dotnet publish -c Release

This produces a single Babelive.exe at bin\Release\net9.0-windows\win-x64\publish\ (roughly 80–90 MB, single-file compressed, bundling the .NET 9 runtime and the Avalonia/Skia/HarfBuzz native libraries). That one file is all you need to ship.

Using it

Pick a target language.
Pick a Capture source. The recommended option is All system audio (no echo), which uses the Process Loopback API (Windows 10 build 20348+) to exclude Babelive's own playback. Per-app entries (Teams, Chrome, Spotify, …) and legacy device loopbacks are also available.
Pick a Playback device for the translated audio. Read the feedback warning below.
Click Start, then play any video / call / song.

The settings window is hide-on-close — closing it leaves the lyric overlay + tray icon running. Exit from the tray menu fully quits.

Microsoft Teams / Skype audio

Teams and Skype set AUDCLNT_STREAMFLAGS_PREVENT_LOOPBACK_CAPTURE on their call audio for privacy, so Windows' Process Loopback API returns silence for them. Babelive auto-detects this and, if VB-CABLE is installed, redirects the Teams/Skype process tree to CABLE Input via IAudioPolicyConfig per-app routing, then loopback-captures from the cable. No manual Teams/Skype audio config needed.

Without VB-CABLE installed, Teams/Skype audio cannot be captured — this is a Windows DRM-style restriction, not a Babelive bug.

Zoom / Discord / Google Meet / WebEx / Slack use WebRTC and don't set the flag — they work via plain Process Loopback.
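
This README does not say how the auto-detection works. One plausible signal, shown here purely as an assumption, is that a process-loopback capture of an app that is clearly in a call keeps returning all-zero PCM16. The SilenceProbe helper below is hypothetical and only checks whether a buffer is silent. Deciding when silence means "blocked" rather than "nobody is speaking" would need extra logic, such as a time window and a check that the process has an active audio session.

```csharp
using System;

// Hypothetical helper, not taken from Babelive's source.
public static class SilenceProbe
{
    // True when every little-endian 16-bit sample has magnitude <= threshold.
    // A trailing odd byte, if any, is ignored.
    public static bool IsSilent(ReadOnlySpan<byte> pcm16, short threshold = 0)
    {
        for (int i = 0; i + 1 < pcm16.Length; i += 2)
        {
            short sample = (short)(pcm16[i] | (pcm16[i + 1] << 8));
            if (Math.Abs((int)sample) > threshold) return false;
        }
        return true;
    }
}
```

A small non-zero threshold tolerates dither or noise-floor samples; a threshold of 0 matches only digital silence.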

⚠️ Feedback loop warning

If translated audio plays through the same speakers you're capturing, the loopback re-translates it forever. Three fixes:

Use headphones for playback (different physical device than the captured speakers).
Install VB-CABLE — free virtual audio cable. Send the source app's output to CABLE Input; Babelive can then loopback-capture the cable while playing translation through your real speakers / headphones.
Tick "Transcript only" — only spoken text appears, nothing replays.

File layout

Babelive/
├── Babelive.csproj
├── Program.cs                              ← Avalonia entry point
├── App.axaml / App.axaml.cs                ← single-instance gate + window/tray bootstrap
├── MainWindow.axaml / .axaml.cs            ← settings window + audio orchestration
├── LyricWindow.axaml / .axaml.cs           ← transparent topmost desktop-lyrics overlay
├── ApiSettingsWindow.axaml / .axaml.cs     ← API endpoint + key dialog
├── TrayIconHost.cs                         ← system tray (Avalonia TrayIcon + NativeMenu)
├── AppIcon.cs                              ← runtime-generated amber 译 disc icon
├── AppSettings.cs / LanguageCodes.cs       ← persisted prefs + dropdown options
├── Styles/
│   └── Controls.axaml                      ← Fluent overrides (buttons, combos, etc.)
├── Audio/
│   ├── LoopbackCapture.cs                  ← WASAPI / Process Loopback → 24 kHz mono PCM16
│   ├── AudioPlayer.cs                      ← plays translated PCM16 chunks
│   ├── AudioDucker.cs                      ← lowers other apps' session volumes during translation
│   ├── EndpointMuter.cs                    ← "Mute other speakers" — driver-stage endpoint mute
│   ├── DefaultDeviceSetter.cs              ← IPolicyConfigVista default-render-device override
│   ├── AppAudioRouter.cs                   ← IAudioPolicyConfig per-app device routing (Teams)
│   ├── ProcessLoopbackCapture.cs           ← Win10 20348+ process-include/exclude loopback
│   └── ProcessTree.cs                      ← Toolhelp32 process-tree walker (Teams PID family)
└── Translation/
    └── RealtimeTranslatorClient.cs         ← async ClientWebSocket

API quirks / things that may need tuning

The realtime translation API is new. The exact event/field names in RealtimeTranslatorClient.cs are best-effort based on https://developers.openai.com/api/docs/guides/realtime-translation plus the standard /v1/realtime event conventions. If you see errors, these are the first things to check:

Endpoint: defaults to wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate. Tick "Alt endpoint" in the UI to fall back to wss://api.openai.com/v1/realtime?model=gpt-realtime-translate.
Session config: RealtimeTranslatorClient.SendSessionUpdateAsync sends session.update with input_audio_format=pcm16, output_audio_format=pcm16, and translation.target_language=<code>. Adjust if the official schema differs.
Event names: Dispatch matches both the output_*.delta and response.output_*.delta shapes. If transcripts/audio don't arrive, log every incoming event and adjust.
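
When tuning any of the three items above, it can help to see them in isolation. The sketch below is not the code in RealtimeTranslatorClient.cs: the RealtimeProtocolSketch class and its members are hypothetical, and nesting the fields under a session object (with translation as a child object) is an assumption based on the usual /v1/realtime conventions. Check it against the official schema before relying on it.

```csharp
using System;
using System.Text.Json;

// Hypothetical illustration of the endpoint, session.update payload, and event matching.
public static class RealtimeProtocolSketch
{
    public enum EventKind { Audio, Transcript, Other }

    // The two endpoints named in this README; useAlt mirrors the "Alt endpoint" checkbox.
    public static Uri BuildEndpoint(bool useAlt) => new Uri(useAlt
        ? "wss://api.openai.com/v1/realtime?model=gpt-realtime-translate"
        : "wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate");

    // Assumed JSON shape for session.update; adjust if the official schema differs.
    public static string BuildSessionUpdate(string targetLanguage)
    {
        var payload = new
        {
            type = "session.update",
            session = new
            {
                input_audio_format = "pcm16",
                output_audio_format = "pcm16",
                translation = new { target_language = targetLanguage }
            }
        };
        return JsonSerializer.Serialize(payload);
    }

    // Accepts both "output_*.delta" and "response.output_*.delta" event types.
    public static EventKind Classify(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("type", out var typeProp) ||
            typeProp.ValueKind != JsonValueKind.String)
            return EventKind.Other;

        string type = typeProp.GetString() ?? "";
        const string prefix = "response.";
        if (type.StartsWith(prefix, StringComparison.Ordinal))
            type = type.Substring(prefix.Length);

        if (!type.StartsWith("output_", StringComparison.Ordinal) ||
            !type.EndsWith(".delta", StringComparison.Ordinal))
            return EventKind.Other;

        // Check "transcript" first: a name like output_audio_transcript contains both words.
        if (type.Contains("transcript") || type.Contains("text")) return EventKind.Transcript;
        if (type.Contains("audio")) return EventKind.Audio;
        return EventKind.Other;
    }
}
```

Logging every event that Classify reports as Other is a quick way to spot renamed event types when transcripts or audio stop arriving.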

Quick sanity test

Open a YouTube video in any language other than the target language and click Start. The translation should start streaming into the lyric overlay (and the settings window's transcript panes) within a second or two of the source audio playing.
