Another Node.js binding of whisper.cpp, designed to keep its API as close as possible to that of whisper.rn.
- whisper.cpp: Automatic speech recognition with multi-platform support
- whisper.rn: React Native binding of whisper.cpp
- macOS
  - arm64: CPU and Metal GPU acceleration
  - x86_64: CPU only
- Windows (x86_64 and arm64)
  - CPU
  - GPU acceleration via Vulkan
  - GPU acceleration via CUDA (x86_64)
- Linux (x86_64 and arm64)
  - CPU
  - GPU acceleration via Vulkan
  - GPU acceleration via CUDA
- Web
  - WASM
  - Optional WebGPU through ggml-webgpu when the WASM package is built with GGML_WEBGPU=ON
npm install @fugood/whisper.node

import { initWhisper } from '@fugood/whisper.node'
// libVariant selects the native build: 'default', 'vulkan', or 'cuda'
const context = await initWhisper({
  model: 'path/to/ggml-base.en.bin',
  useGpu: true,
}, libVariant)

// transcribeFile returns { stop, promise }
const { stop: stop1, promise: promise1 } = context.transcribeFile('audio1.wav', {
  language: 'en',
  temperature: 0.0,
  // ...
})
const result1 = await promise1

// transcribeData also returns { stop, promise }
let audioBuffer // PCM 16-bit, mono, 16kHz
const { stop: stop2, promise: promise2 } = context.transcribeData(audioBuffer, {
  language: 'en',
  temperature: 0.0,
  // ...
})
const result2 = await promise2

// You can also cancel transcription if needed
// await stop1() // Cancels the first transcription
// await stop2() // Cancels the second transcription

// Always release the context when done
await context.release()

import { initWhisperVad } from '@fugood/whisper.node'

// Context-based VAD (for multiple detections)
const vadContext = await initWhisperVad({
  model: 'path/to/ggml-vad.bin',
  useGpu: true,
  nThreads: 2,
}, libVariant)

const result = await vadContext.detectSpeechFile('audio.wav')
const result2 = await vadContext.detectSpeechData(audioBuffer)

await vadContext.release()

Note: Audio data should be in 16-bit PCM, mono, 16 kHz format. The library expects an ArrayBuffer containing the raw audio data.
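
Audio decoded elsewhere often arrives as floating-point samples rather than 16-bit PCM. The following is a minimal sketch of a conversion into the format the note describes. The helper name floatTo16BitPcm is hypothetical and not part of this package, and the little-endian byte order (the order WAV files use) is an assumption.

```javascript
// Hypothetical helper (not part of @fugood/whisper.node): convert Float32
// samples in [-1, 1] into 16-bit signed PCM. Little-endian is assumed.
function floatTo16BitPcm(samples) {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]))
    // Scale the negative and positive halves separately to cover the int16 range.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
  }
  return buffer
}

// One second of a 440 Hz tone at 16 kHz, mono.
const sampleRate = 16000
const tone = Float32Array.from({ length: sampleRate }, (_, i) =>
  Math.sin((2 * Math.PI * 440 * i) / sampleRate) * 0.5,
)
const audioBuffer = floatTo16BitPcm(tone)
console.log(audioBuffer.byteLength) // 32000
```

The resulting ArrayBuffer has the shape that transcribeData and detectSpeechData take in the examples above. Resampling to 16 kHz and downmixing to mono are not covered here.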
import {
  addNativeLogListener,
  isNativeLogEnabled,
  toggleNativeLog,
} from '@fugood/whisper.node'

const logs = addNativeLogListener((level, text) => {
  console.log(`[whisper ${level}] ${text}`)
})

await toggleNativeLog(true)
console.log(isNativeLogEnabled())
// ...
await toggleNativeLog(false)
logs.remove()

Log levels are emitted as lowercase error, warn, info, or debug
strings. The same helpers are available in Node.js and browser WASM builds.
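
Because every message carries one of four lowercase level strings, a listener can filter by severity before forwarding. A minimal sketch follows; createLevelFilter and the severity ordering are assumptions made for illustration, not something the package provides.

```javascript
// Assumed ordering, most severe first, using the four level strings named above.
const LEVELS = ['error', 'warn', 'info', 'debug']

// Hypothetical helper: build a (level, text) callback that forwards only
// messages at or above a minimum severity to `sink`.
function createLevelFilter(minLevel, sink) {
  const threshold = LEVELS.indexOf(minLevel)
  if (threshold === -1) throw new Error(`Unknown log level: ${minLevel}`)
  return (level, text) => {
    const rank = LEVELS.indexOf(level)
    // Drop unknown levels and anything less severe than the threshold.
    if (rank !== -1 && rank <= threshold) sink(level, text)
  }
}

const collected = []
const onLog = createLevelFilter('warn', (level, text) => {
  collected.push(`[${level}] ${text}`)
})
onLog('debug', 'verbose detail')
onLog('warn', 'something odd')
onLog('error', 'something failed')
console.log(collected) // [ '[warn] something odd', '[error] something failed' ]
```

The returned callback has the same (level, text) shape as the listener in the example above, so it could be passed to addNativeLogListener in its place.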
The browser package keeps the same promise-based initWhisper and
initWhisperVad entry points. In browsers, filePath is treated as a URL and
the model is fetched into the WASM filesystem.
import { initWhisper, initWhisperVad } from '@fugood/whisper.node'

const whisper = await initWhisper({
  filePath: 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin',
  maxModelBytes: 1536 * 1024 * 1024,
  useGpu: false,
})

const { promise } = whisper.transcribeFile('https://raw.githubusercontent.com/ggml-org/whisper.cpp/master/samples/jfk.wav', {
  language: 'en',
  temperature: 0,
})
console.log(await promise)
await whisper.release()

const vad = await initWhisperVad({
  filePath: 'https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v6.2.0.bin',
  useGpu: false,
})
console.log(await vad.detectSpeechFile('https://raw.githubusercontent.com/ggml-org/whisper.cpp/master/samples/jfk.wav'))
await vad.release()

The browser package ships both single-thread and pthread WASM artifacts. On
cross-origin isolated pages (Cross-Origin-Opener-Policy: same-origin and
Cross-Origin-Embedder-Policy: require-corp), the loader uses the pthread
artifact with SharedArrayBuffer; otherwise it falls back to the single-thread
artifact automatically.

Oversized model downloads fail before loading into MEMFS. Firefox is capped at
256 MiB by default; other browsers default to 75% of the configured WASM maximum
memory. Pass maxModelBytes only when you know the target browser can allocate
the model.

Whisper transcription defaults to up to 8 threads based on browser hardware
concurrency when pthreads are available; pass maxThreads to override it. Browser
WASM clamps maxThreads to the compiled pool limit of 8, or 1 in the
single-thread fallback.

Browser pages run model loading, transcription, benchmarks, and VAD in a
dedicated module worker by default so the UI thread can keep rendering. Use the
main whisper.node package entrypoint in browser code too:
import { configureWasm, initWhisper } from '@fugood/whisper.node'

Use configureWasm({ worker: false }) only when you explicitly need the
in-thread runtime, configureWasm({ threads: false }) to force the
single-thread artifact, or pass workerPath, jsPath, and wasmPath when
serving the package files from custom URLs. The older workerUrl and
runtimeScriptUrl option names still work.

Model downloads are cached in browser Cache Storage by default. Pass
cacheModel: false to disable persistent caching, modelCacheName to isolate
the cache namespace, or modelCacheKey when the fetch URL is a proxy or signed
URL but should reuse the same cached model.
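
The thread and model-size defaults described earlier in this section can be restated as two small functions. This is only a sketch of the documented rules, not the loader's actual code: resolveThreads and defaultMaxModelBytes are hypothetical names, and the 2 GiB WASM maximum memory in the usage lines is an assumed figure for illustration.

```javascript
const MIB = 1024 * 1024
const THREAD_POOL_LIMIT = 8 // compiled pool limit stated in this section

// Hypothetical: effective thread count for a given environment.
function resolveThreads({ pthreads, hardwareConcurrency, maxThreads }) {
  if (!pthreads) return 1 // single-thread fallback
  // An explicit maxThreads overrides the hardware-based default.
  const requested = maxThreads ?? hardwareConcurrency ?? 1
  return Math.max(1, Math.min(THREAD_POOL_LIMIT, Math.floor(requested)))
}

// Hypothetical: default cap on model download size, in bytes.
function defaultMaxModelBytes({ isFirefox, wasmMaxMemoryBytes }) {
  return isFirefox ? 256 * MIB : Math.floor(wasmMaxMemoryBytes * 0.75)
}

console.log(resolveThreads({ pthreads: true, hardwareConcurrency: 16 })) // 8
console.log(resolveThreads({ pthreads: true, hardwareConcurrency: 16, maxThreads: 4 })) // 4
console.log(resolveThreads({ pthreads: false, hardwareConcurrency: 16 })) // 1

// Assumed 2 GiB maximum memory, for illustration only.
const cap = defaultMaxModelBytes({ isFirefox: false, wasmMaxMemoryBytes: 2048 * MIB })
console.log(cap / MIB) // 1536
```

Under that assumed 2 GiB maximum, the non-Firefox default works out to 1536 MiB, which is the same value the earlier browser example passes as maxModelBytes.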
Build the browser package with:
npm run build-wasm

Or build with the Emscripten Docker image:
npm run build-wasm-docker

npm run build-wasm enables GGML_WEBGPU=ON by default and emits
wasm/whisper-node.js, wasm/whisper-node.wasm,
wasm/whisper-node.threads.js, and wasm/whisper-node.threads.wasm. Use
bash scripts/build-wasm.sh --no-webgpu for a CPU-only WASM build, or
--no-threads / --threads to build only one CPU threading variant. Pass
--single-file only when you want the WASM binary embedded into each generated
JS file.

Modern Emscripten embeds the pthread worker bootstrap in the main JS
file, so a separate whisper-node.worker.js is not expected. The browser
package also ships its own module worker.js wrapper for non-blocking model load
and inference.

npm run build-wasm-docker uses emscripten/emsdk:4.0.14-arm64
on arm64 hosts such as Apple Silicon Macs, and emscripten/emsdk:4.0.13 on
amd64 hosts. Override with EMSCRIPTEN_IMAGE or EMSCRIPTEN_PLATFORM when
needed.

A local smoke page is available after building:
node examples/wasm/server.mjs

In the WASM package, useGpu: true enables WebGPU for whisper transcription
when the browser supports navigator.gpu. VAD currently falls back to CPU in
the browser package because the Silero VAD graph hits unsupported WebGPU ops.
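
A page may want to know up front whether WebGPU can be used at all, for example to label a GPU toggle in its UI. A minimal sketch of the navigator.gpu check follows; supportsWebGpu is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: report whether a navigator-like object exposes WebGPU.
function supportsWebGpu(nav) {
  return Boolean(nav?.gpu)
}

// In a browser this reads the real navigator. In runtimes without one,
// or without a gpu property on it, the result is simply false.
const useGpu = supportsWebGpu(globalThis.navigator)
console.log(typeof useGpu) // boolean
```

The presence of navigator.gpu does not guarantee a usable adapter, since navigator.gpu.requestAdapter() can still resolve to null, so this check is a hint rather than a guarantee.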
- default: General usage; no GPU support except on macOS (Metal)
- vulkan: GPU support via Vulkan (Windows/Linux), though some scenarios may be unstable
- cuda: GPU support via CUDA (Windows/Linux), but only for a limited set of capabilities
  - Linux: x86_64: 8.9, arm64: 8.7
  - Windows: x86_64: 12.0
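
The variant names above are what the earlier examples pass as libVariant. One way to choose a value is sketched below. pickLibVariant is a hypothetical helper, and the selection policy (fall back to default whenever the requested GPU backend is not listed for the platform) is an assumption rather than the library's behavior.

```javascript
// Hypothetical helper: map a platform and a GPU preference to a variant name.
// `platform` uses Node's process.platform values: 'darwin', 'win32', 'linux'.
function pickLibVariant({ platform, gpu }) {
  // macOS uses the default variant, which is the one with Metal support.
  if (platform === 'darwin') return 'default'
  // The vulkan and cuda variants are listed for Windows and Linux only.
  const hasGpuVariants = platform === 'win32' || platform === 'linux'
  if (hasGpuVariants && (gpu === 'vulkan' || gpu === 'cuda')) return gpu
  return 'default'
}

console.log(pickLibVariant({ platform: 'linux', gpu: 'cuda' })) // cuda
console.log(pickLibVariant({ platform: 'darwin', gpu: 'vulkan' })) // default
console.log(pickLibVariant({ platform: 'win32' })) // default
```

In Node.js the platform argument could come from process.platform. Whether the chosen GPU backend actually works still depends on the installed drivers and, for cuda, on the capabilities listed above.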
License: MIT
Built and maintained by BRICKS.