whisper.node

Another Node binding of whisper.cpp that keeps the same API as whisper.rn as much as possible.

whisper.cpp: Automatic speech recognition with multi-platform support
whisper.rn: React Native binding of whisper.cpp

Platform Support

macOS
- arm64: CPU and Metal GPU acceleration
- x86_64: CPU only
Windows (x86_64 and arm64)
- CPU
- GPU acceleration via Vulkan
- GPU acceleration via CUDA (x86_64)
Linux (x86_64 and arm64)
- CPU
- GPU acceleration via Vulkan
- GPU acceleration via CUDA
Web
- WASM
- Optional WebGPU through ggml-webgpu when the WASM package is built with GGML_WEBGPU=ON

Installation

npm install @fugood/whisper.node

Usage

Basic Transcription

import { initWhisper } from '@fugood/whisper.node'

const context = await initWhisper({
  model: 'path/to/ggml-base.en.bin',
  useGpu: true,
}, libVariant) // libVariant: 'default', 'vulkan', or 'cuda' (see Lib Variants)

// transcribeFile returns { stop, promise }
const { stop: stop1, promise: promise1 } = context.transcribeFile('audio1.wav', {
  language: 'en',
  temperature: 0.0,
  // ...
})

const result1 = await promise1

// transcribeData also returns { stop, promise }
let audioBuffer // PCM 16-bit, mono, 16kHz
const { stop: stop2, promise: promise2 } = context.transcribeData(audioBuffer, {
  language: 'en',
  temperature: 0.0,
  // ...
})

const result2 = await promise2

// You can also cancel transcription if needed
// await stop1() // Cancels the first transcription
// await stop2() // Cancels the second transcription

// Always release the context when done
await context.release()
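
Because both methods return a { stop, promise } pair, a timeout can be layered on top of them. The sketch below is a hypothetical helper (withTimeout is not part of the package). It assumes that promise settles once stop() has been called; the example above does not say what it settles to in that case, so the helper only reports whether the timeout fired.

```javascript
// Hypothetical helper, not part of @fugood/whisper.node.
// Accepts the { stop, promise } pair returned by transcribeFile/transcribeData.
async function withTimeout({ stop, promise }, timeoutMs) {
  let timedOut = false
  const timer = setTimeout(() => {
    timedOut = true
    // Request cancellation; errors raised by stop() itself are ignored here.
    Promise.resolve().then(() => stop()).catch(() => {})
  }, timeoutMs)
  try {
    // Assumption: the promise settles after the transcription is cancelled.
    const result = await promise
    return { result, timedOut }
  } finally {
    // Avoid keeping the process alive once the transcription has finished.
    clearTimeout(timer)
  }
}
```

For example, `await withTimeout(context.transcribeFile('audio1.wav', { language: 'en' }), 30000)` would request cancellation after 30 seconds.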

Voice Activity Detection (VAD)

import { initWhisperVad } from '@fugood/whisper.node'

// Context-based VAD (for multiple detections)
const vadContext = await initWhisperVad({
  model: 'path/to/ggml-vad.bin',
  useGpu: true,
  nThreads: 2
}, libVariant)

const result = await vadContext.detectSpeechFile('audio.wav')

const result2 = await vadContext.detectSpeechData(audioBuffer)
await vadContext.release()

Note: Audio data should be 16-bit PCM, mono, 16kHz format. The library expects ArrayBuffer containing raw audio data.
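
Many audio sources (for example the Web Audio API) produce Float32 samples rather than 16-bit PCM. As a minimal sketch, the hypothetical helper below converts mono Float32 samples into an ArrayBuffer of 16-bit PCM. It assumes the samples are already mono at 16 kHz and does no resampling; Int16Array writes values in the platform's byte order, which is little-endian on common hardware.

```javascript
// Hypothetical helper, not part of @fugood/whisper.node.
// Converts mono Float32 samples in [-1, 1] into a 16-bit PCM ArrayBuffer.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around after scaling.
    const s = Math.max(-1, Math.min(1, samples[i]))
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff)
  }
  return pcm.buffer
}
```

The returned buffer can then be passed where the examples above use audioBuffer.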

Native Logs

import {
  addNativeLogListener,
  isNativeLogEnabled,
  toggleNativeLog,
} from '@fugood/whisper.node'

const logs = addNativeLogListener((level, text) => {
  console.log(`[whisper ${level}] ${text}`)
})

await toggleNativeLog(true)
console.log(isNativeLogEnabled())

// ...

await toggleNativeLog(false)
logs.remove()

Log levels are emitted as lowercase error, warn, info, or debug strings. The same helpers are available in Node.js and browser WASM builds.
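
Since the listener receives the level as a plain string, filtering can be done in user code. The sketch below wraps a listener so it only sees messages at or above a chosen severity. minLevel is a hypothetical helper, and the ordering error > warn > info > debug is an assumption based on the conventional meaning of these names.

```javascript
// Level names as listed above, ordered from most to least severe (assumed order).
const LEVELS = ['error', 'warn', 'info', 'debug']

// Hypothetical helper, not part of @fugood/whisper.node.
// Returns a listener that forwards only messages at or above `threshold`.
function minLevel(threshold, listener) {
  const limit = LEVELS.indexOf(threshold)
  if (limit === -1) throw new Error(`Unknown log level: ${threshold}`)
  return (level, text) => {
    const rank = LEVELS.indexOf(level)
    // Unrecognized levels are passed through rather than silently dropped.
    if (rank === -1 || rank <= limit) listener(level, text)
  }
}
```

It would be used as `addNativeLogListener(minLevel('warn', (level, text) => console.error(level, text)))`.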

Browser WASM

The browser package keeps the same promise-based initWhisper and initWhisperVad entry points. In browsers, filePath is treated as a URL and the model is fetched into the WASM filesystem.

import { initWhisper, initWhisperVad } from '@fugood/whisper.node'

const whisper = await initWhisper({
  filePath: 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin',
  maxModelBytes: 1536 * 1024 * 1024,
  useGpu: false,
})

const { promise } = whisper.transcribeFile('https://raw.githubusercontent.com/ggml-org/whisper.cpp/master/samples/jfk.wav', {
  language: 'en',
  temperature: 0,
})

console.log(await promise)
await whisper.release()

const vad = await initWhisperVad({
  filePath: 'https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v6.2.0.bin',
  useGpu: false,
})
console.log(await vad.detectSpeechFile('https://raw.githubusercontent.com/ggml-org/whisper.cpp/master/samples/jfk.wav'))
await vad.release()

The browser package ships both single-thread and pthread WASM artifacts. On cross-origin isolated pages (Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp), the loader uses the pthread artifact with SharedArrayBuffer; otherwise it falls back to the single-thread artifact automatically.

Oversized model downloads fail before loading into MEMFS. Firefox is capped at 256 MiB by default; other browsers default to 75% of the configured WASM maximum memory. Pass maxModelBytes only when you know the target browser can allocate the model.

Whisper transcription defaults to up to 8 threads based on browser hardware concurrency when pthreads are available; pass maxThreads to override it. Browser WASM clamps maxThreads to the compiled pool limit of 8, or 1 in the single-thread fallback.

Browser pages run model loading, transcription, benchmarks, and VAD in a dedicated module worker by default so the UI thread can keep rendering. Use the main whisper.node package entrypoint in browser code too:

import { configureWasm, initWhisper } from '@fugood/whisper.node'

Use configureWasm({ worker: false }) only when you explicitly need the in-thread runtime, configureWasm({ threads: false }) to force the single-thread artifact, or pass workerPath, jsPath, and wasmPath when serving the package files from custom URLs. The older workerUrl and runtimeScriptUrl option names still work.

Model downloads are cached in browser Cache Storage by default. Pass cacheModel: false to disable persistent caching, modelCacheName to isolate the cache namespace, or modelCacheKey when the fetch URL is a proxy or signed URL but should reuse the same cached model.
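
For the signed-URL case, one way to obtain a stable modelCacheKey is to drop the parts of the URL that change between requests. The sketch below is a hypothetical helper, not part of the package. It assumes the signature lives in the query string or fragment, which is common for signed URLs but not guaranteed. Where modelCacheKey is passed is not stated above; the usage comment assumes it sits next to filePath in the init options.

```javascript
// Hypothetical helper, not part of @fugood/whisper.node.
// Derives a stable cache key by removing the query string and fragment,
// which usually carry the short-lived signature.
function stableModelCacheKey(signedUrl) {
  const url = new URL(signedUrl)
  url.search = ''
  url.hash = ''
  return url.toString()
}

// Assumed usage (option placement is an assumption):
// await initWhisper({ filePath: signedUrl, modelCacheKey: stableModelCacheKey(signedUrl) })
```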

Build the browser package with:

npm run build-wasm

Or build with the Emscripten Docker image:

npm run build-wasm-docker

npm run build-wasm enables GGML_WEBGPU=ON by default and emits wasm/whisper-node.js, wasm/whisper-node.wasm, wasm/whisper-node.threads.js, and wasm/whisper-node.threads.wasm.

Use bash scripts/build-wasm.sh --no-webgpu for a CPU-only WASM build, or --no-threads / --threads to build only one CPU threading variant. Pass --single-file only when you want the WASM binary embedded into each generated JS file.

Modern Emscripten embeds the pthread worker bootstrap in the main JS file, so a separate whisper-node.worker.js is not expected. The browser package also ships its own module worker.js wrapper for non-blocking model load and inference.

npm run build-wasm-docker uses emscripten/emsdk:4.0.14-arm64 on arm64 hosts such as Apple Silicon Macs, and emscripten/emsdk:4.0.13 on amd64 hosts. Override with EMSCRIPTEN_IMAGE or EMSCRIPTEN_PLATFORM when needed.

A local smoke page is available after building:

node examples/wasm/server.mjs

In the WASM package, useGpu: true enables WebGPU for whisper transcription when the browser supports navigator.gpu. VAD currently falls back to CPU in the browser package because the Silero VAD graph hits unsupported WebGPU ops.
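
If an application wants to know in advance whether WebGPU is likely to be usable, it can probe the standard WebGPU API itself. The sketch below is a hypothetical pre-check, not something the package requires: it looks for navigator.gpu and then asks for an adapter, since requestAdapter() resolves to null when no suitable adapter exists.

```javascript
// Hypothetical helper, not part of @fugood/whisper.node.
// Resolves to true only if navigator.gpu exists and yields an adapter.
async function canUseWebGpu(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false
  try {
    const adapter = await nav.gpu.requestAdapter()
    return adapter !== null && adapter !== undefined
  } catch {
    // Treat any failure while probing as "WebGPU not available".
    return false
  }
}
```

The result could then be passed as useGpu, for example `useGpu: await canUseWebGpu()`.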

Lib Variants

default: General usage; no GPU support except on macOS (Metal)
vulkan: GPU support via Vulkan (Windows/Linux); may be unstable in some scenarios
cuda: GPU support via CUDA (Windows/Linux), but only for a limited set of capabilities:
- Linux: x86_64: 8.9, arm64: 8.7
- Windows: x86_64: 12.0
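
The variant name is passed as the second argument to initWhisper and initWhisperVad, as in the Usage examples. As a rough sketch, the hypothetical helper below picks a variant from the platform rules listed in this README. The platform and arch values follow Node's process.platform and process.arch conventions; the fallback to default when a variant is unavailable is an illustrative choice, not documented behavior.

```javascript
// Hypothetical helper, not part of @fugood/whisper.node.
// Chooses a libVariant from the platform rules listed in this README.
function pickLibVariant({ platform, arch, prefer = 'default' }) {
  // macOS gets Metal through the default variant; no other variant applies.
  if (platform === 'darwin') return 'default'
  const supported = platform === 'win32' || platform === 'linux'
  if (!supported) return 'default'
  if (prefer === 'cuda') {
    // Platform Support lists CUDA on Windows for x86_64 only.
    const cudaOk = platform === 'linux' || arch === 'x64'
    return cudaOk ? 'cuda' : 'default'
  }
  if (prefer === 'vulkan') return 'vulkan'
  return 'default'
}

// Example: pickLibVariant({ platform: process.platform, arch: process.arch, prefer: 'vulkan' })
```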

License

MIT

Built and maintained by BRICKS.
