# whisper-coreml

OpenAI Whisper ASR for Node.js with CoreML/ANE acceleration on Apple Silicon.

```bash
npm install whisper-coreml
```

**Best-in-class speech recognition for Node.js on Apple Silicon.**
Transcribe audio in 99 languages. Run 100% offline on your Mac.

OpenAI's Whisper is the gold standard for speech recognition accuracy. This package brings it to Node.js, powered by Apple's Neural Engine for fast, private, local transcription.
- **Accuracy first.** Whisper large-v3-turbo delivers state-of-the-art transcription quality, better than any cloud API, right on your Mac.
- **99 languages.** From Afrikaans to Zulu. Handles accents, dialects, and background noise.
- **100% private.** Your audio never leaves your device. No API keys. No cloud. No subscription.
- **Fast enough.** 14x real-time on M1 Ultra: transcribe 1 hour of audio in under 5 minutes.
Running Whisper without hardware acceleration is painfully slow. Here's how the alternatives compare:

| Approach                | Speed         | Drawbacks                   |
| ----------------------- | ------------- | --------------------------- |
| OpenAI Whisper (Python) | ~2x real-time | Slow, needs Python          |
| whisper.cpp (CPU)       | ~4x real-time | No acceleration             |
| faster-whisper          | ~6x real-time | Needs NVIDIA GPU            |
| Cloud APIs              | ~1x + latency | Costs $$$, privacy concerns |
| whisper-coreml          | 14x real-time | macOS only                  |

The Neural Engine in every Apple Silicon Mac is a dedicated ML accelerator that usually sits idle. This package puts it to work.
Need even more speed? Our sister project parakeet-coreml trades language coverage for 40x real-time performance.

|           | whisper-coreml           | parakeet-coreml |
| --------- | ------------------------ | --------------- |
| Best for  | Accuracy, rare languages | Maximum speed   |
| Speed     | 14x real-time            | 40x real-time   |
| Languages | 99                       | 25 European     |
- **99 Languages**: Full OpenAI Whisper multilingual support
- **14x real-time**: 1 hour of audio in ~4.5 minutes (M1 Ultra)
- **Neural Engine**: Runs on Apple's dedicated ML chip via CoreML
- **Fully Offline**: No internet required after setup
- **Zero Dependencies**: No Python, no subprocess, no hassle
- **Timestamps**: Segment-level timing for subtitles
- **One Command Setup**: `npx whisper-coreml download`
## Install

```bash
npm install whisper-coreml
```

Requirements: macOS 14+ (Sonoma), Apple Silicon (M1/M2/M3/M4), Node.js 20+

## Performance

Measured on M1 Ultra:

```
5 min audio  → 22 seconds   (14x real-time)
1 hour audio → 4.5 minutes
```

Run `npx whisper-coreml benchmark` to test on your machine.

## Quick Start
```typescript
import { WhisperAsrEngine, getModelPath } from "whisper-coreml"

const engine = new WhisperAsrEngine({
  modelPath: getModelPath()
})

await engine.initialize()

// Transcribe audio (16 kHz, mono, Float32Array)
const result = await engine.transcribe(audioSamples, 16000)

console.log(result.text)
// "Hello, this is a test transcription."

console.log(`Language: ${result.language}`)
console.log(`Processed in ${result.durationMs}ms`)

// Segments include timestamps
for (const seg of result.segments) {
  console.log(`[${seg.startMs}ms - ${seg.endMs}ms] ${seg.text}`)
}

engine.cleanup()
```
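
Because each segment carries start and end times, the result maps cleanly onto subtitle formats. Below is a minimal sketch of an SRT writer built on the documented segment fields; the `toSrt` helper is illustrative and not part of the package.

```typescript
// Illustrative helper: format result.segments as SRT subtitles (not part of the package API)
function toSrt(segments: { startMs: number; endMs: number; text: string }[]): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0")
  const stamp = (ms: number) => {
    const h = Math.floor(ms / 3_600_000)
    const m = Math.floor((ms % 3_600_000) / 60_000)
    const s = Math.floor((ms % 60_000) / 1_000)
    return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1_000, 3)}`
  }
  return segments
    .map((seg, i) => `${i + 1}\n${stamp(seg.startMs)} --> ${stamp(seg.endMs)}\n${seg.text.trim()}\n`)
    .join("\n")
}

// Example usage: writeFileSync("transcript.srt", toSrt(result.segments))
```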
## Audio Format

| Property    | Requirement                                   |
| ----------- | --------------------------------------------- |
| Sample Rate | 16,000 Hz (16 kHz)                            |
| Channels    | Mono (single channel)                         |
| Format      | Float32Array with values between -1.0 and 1.0 |
| Duration    | Any length (auto-chunked internally)          |

Example with ffmpeg:

```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 -f f32le output.pcm
```

Then load the raw PCM file:

```typescript
import { readFileSync } from "fs"

const buffer = readFileSync("output.pcm")
const samples = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.length / 4)
```
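
If your source is 16-bit PCM instead (for example ffmpeg's `-f s16le` output), a minimal conversion to the expected Float32Array is to normalize each sample by 32768. The file name below is just an example:

```typescript
import { readFileSync } from "fs"

// Convert 16-bit signed little-endian PCM into the Float32Array whisper-coreml expects
const pcm16 = readFileSync("output-s16.pcm") // example file name
const int16 = new Int16Array(pcm16.buffer, pcm16.byteOffset, pcm16.length / 2)
const samples = Float32Array.from(int16, (s) => s / 32768)

// samples can now be passed to engine.transcribe(samples, 16000)
```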
## CLI Commands

```bash
# Download the model (~1.5GB)
npx whisper-coreml download

# Check status
npx whisper-coreml status

# Run benchmark (requires cloned repo)
npx whisper-coreml benchmark

# Get model directory path
npx whisper-coreml path
```

## API Reference
### WhisperAsrEngine
The main class for speech recognition.
```typescript
new WhisperAsrEngine(options: WhisperAsrOptions)
```

#### Options

| Option      | Type     | Default  | Description                         |
| ----------- | -------- | -------- | ----------------------------------- |
| `modelPath` | `string` | required | Path to ggml model file             |
| `language`  | `string` | `"auto"` | Language code or `"auto"` to detect |
| `threads`   | `number` | `0`      | CPU threads (0 = auto)              |
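
For example, to pin the language and thread count instead of relying on the defaults (the `"de"` language code and the thread count below are just placeholders):

```typescript
import { WhisperAsrEngine, getModelPath } from "whisper-coreml"

// Explicit options instead of the defaults ("auto" language detection, auto thread count)
const engine = new WhisperAsrEngine({
  modelPath: getModelPath(),
  language: "de", // ISO language code, or "auto" to detect
  threads: 4      // 0 lets the engine choose
})
```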
#### Methods

| Method                      | Description                    |
| --------------------------- | ------------------------------ |
| `initialize()`              | Load model (async)             |
| `transcribe(samples, rate)` | Transcribe audio               |
| `isReady()`                 | Check if engine is initialized |
| `cleanup()`                 | Release native resources       |
| `getVersion()`              | Get version information        |

### Types
```typescript
interface TranscriptionResult {
  text: string                     // Full transcription
  language: string                 // Detected language (ISO code)
  durationMs: number               // Processing time in milliseconds
  segments: TranscriptionSegment[]
}

interface TranscriptionSegment {
  startMs: number                  // Segment start in milliseconds
  endMs: number                    // Segment end in milliseconds
  text: string                     // Transcription for this segment
  confidence: number               // Confidence score (0-1)
}
```
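
Since every segment carries a confidence score, you can, for instance, separate out low-confidence segments before further processing. A small sketch, assuming `result` is a `TranscriptionResult` from `transcribe()`; the 0.5 threshold is arbitrary:

```typescript
// Split segments by confidence; the 0.5 cut-off is an arbitrary example
const reliable = result.segments.filter((seg) => seg.confidence >= 0.5)
const uncertain = result.segments.filter((seg) => seg.confidence < 0.5)
console.log(`Kept ${reliable.length} segments, flagged ${uncertain.length} for review`)
```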
### Utility Functions

| Function               | Description                            |
| ---------------------- | -------------------------------------- |
| `isAvailable()`        | Check if running on supported platform |
| `getDefaultModelDir()` | Get default model cache path           |
| `getModelPath()`       | Get path to the model file             |
| `isModelDownloaded()`  | Check if model is downloaded           |
| `downloadModel()`      | Download the model                     |
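
Together these helpers cover a typical first-run flow: check the platform, fetch the model if it is missing, then point the engine at it. A sketch using only the functions listed above; whether they return promises or plain values is an assumption, and `await` is harmless either way:

```typescript
import {
  WhisperAsrEngine,
  isAvailable,
  isModelDownloaded,
  downloadModel,
  getModelPath
} from "whisper-coreml"

// First-run flow: verify platform support, download the model once, then initialize
if (!isAvailable()) {
  throw new Error("whisper-coreml requires an Apple Silicon Mac running macOS 14+")
}

if (!(await isModelDownloaded())) {
  await downloadModel() // one-time download, ~1.5GB
}

const engine = new WhisperAsrEngine({ modelPath: getModelPath() })
await engine.initialize()
```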
## Architecture

```
┌──────────────────────────────────────────────┐
│  Your Node.js App                            │
├──────────────────────────────────────────────┤
│  whisper-coreml API (TypeScript)             │
├──────────────────────────────────────────────┤
│  Native Addon: whisper_engine (N-API + C++)  │
├──────────────────────────────────────────────┤
│  whisper.cpp (C++)                           │
├──────────────────────────────────────────────┤
│  CoreML (Apple Framework)                    │
├──────────────────────────────────────────────┤
│  Apple Neural Engine (Dedicated ML Silicon)  │
└──────────────────────────────────────────────┘
```
## When to Use whisper-coreml

- **Maximum accuracy**: When other solutions aren't good enough
- **Rare languages**: 99 languages, far beyond English/European
- **Accented speech**: Whisper handles accents and dialects well
- **Noisy audio**: Robust to background noise and music
## Contributing

Contributions are welcome! Please read our Contributing Guide for details.

## License

MIT, see LICENSE for details.

## Acknowledgments

- whisper.cpp by Georgi Gerganov
- OpenAI Whisper by OpenAI
---
Copyright © 2026 Sebastian Software GmbH, Mainz, Germany