Voice activity detector (VAD) for the browser
A Voice Activity Detection (VAD) library for the browser.
It is based on the Silero VAD model and Transformers.js.
https://vad-web.vercel.app
```bash
npm install vad-web
```
Call recordAudio to start recording audio and get a dispose function. Under
the hood, it will run the Silero
VAD model in a web worker to avoid
blocking the main thread.
```ts
import { recordAudio } from 'vad-web'

const dispose = await recordAudio({
  onSpeechStart: () => {
    console.log('Speech detected')
  },
  onSpeechEnd: () => {
    console.log('Silence detected')
  },
  onSpeechAvailable: ({ audioData, sampleRate, startTime, endTime }) => {
    console.log(`Audio received with duration ${endTime - startTime}ms`)
    // Further processing can be done here
  },
})
```

```ts
function recordAudio(options: RecordAudioOptions): Promise
```
Records audio from the microphone and calls the provided callbacks (such as onSpeechAvailable) with the detected speech data.
Returns
A function to dispose of the audio recorder.
Options for recordAudio.
onSpeechStart?: () => void
Triggered when speech is detected.
onSpeechEnd?: () => void
Triggered when silence is detected.
onSpeechAvailable?: (data: SpeechData) => void
Triggered when a speech segment has finished and its audio data is available.
onSpeechOngoing?: (data: SpeechData) => void
Triggered periodically (once per second) while speech is ongoing.
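The callbacks above can be combined into a small state tracker, for example to drive a "speaking" indicator in a UI. The following is a minimal sketch, not part of vad-web: `createSpeechTracker` and its `state` fields are hypothetical names, the `SpeechData` interface is a local mirror of the shape documented in this README, and it assumes that the data passed to onSpeechOngoing covers the speech heard so far.

```ts
// Local mirror of the SpeechData shape documented in this README.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical helper (not part of vad-web): tracks speaking state and
// accumulates the total duration of finished speech segments.
function createSpeechTracker() {
  const state = { speaking: false, currentMs: 0, segments: 0, totalSpeechMs: 0 }
  const callbacks = {
    onSpeechStart: () => {
      state.speaking = true
      state.currentMs = 0
    },
    // Assumption: the ongoing data spans the speech detected so far.
    onSpeechOngoing: (data: SpeechData) => {
      state.currentMs = data.endTime - data.startTime
    },
    onSpeechEnd: () => {
      state.speaking = false
    },
    onSpeechAvailable: (data: SpeechData) => {
      state.segments += 1
      state.totalSpeechMs += data.endTime - data.startTime
    },
  }
  return { state, callbacks }
}

// Intended wiring (assumes vad-web is installed):
//   import { recordAudio } from 'vad-web'
//   const tracker = createSpeechTracker()
//   const dispose = await recordAudio(tracker.callbacks)
```

The tracker does not rely on the relative order of onSpeechEnd and onSpeechAvailable, since this README does not specify it.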
```ts
function readAudio(options: ReadAudioOptions): Promise
```
Reads audio data from an ArrayBuffer and calls the provided callbacks (such as onSpeechAvailable) with the detected speech data.
Returns
A function to dispose of the audio reader.
Options for readAudio.
audioData: ArrayBuffer
Audio file data contained in an ArrayBuffer that is loaded from fetch(), XMLHttpRequest, or FileReader.
realTime?: boolean
If true, simulates real-time processing by adding delays to match the audio duration.
Default: false
onSpeechStart?: () => void
Triggered when speech is detected.
onSpeechEnd?: () => void
Triggered when silence is detected.
onSpeechAvailable?: (data: SpeechData) => void
Triggered when a speech segment has finished and its audio data is available.
onSpeechOngoing?: (data: SpeechData) => void
Triggered periodically (once per second) while speech is ongoing.
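As an illustration of the options above, the sketch below collects the segments reported through onSpeechAvailable into a plain list. It is a hedged example rather than the library's own API: `readSegments`, `ReadOptionsSketch`, and `ReadAudioFn` are hypothetical names, the reader function is passed in as a parameter so the helper stays self-contained, and the URL in the comment is a placeholder. This README does not state when processing of the buffer completes, so the list may still be filling after the call returns.

```ts
// Local mirror of the SpeechData shape documented in this README.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Subset of the documented readAudio options used by this sketch.
interface ReadOptionsSketch {
  audioData: ArrayBuffer
  realTime?: boolean
  onSpeechAvailable?: (data: SpeechData) => void
}

// Any function with a readAudio-like shape (hypothetical type name).
type ReadAudioFn = (options: ReadOptionsSketch) => Promise<unknown>

// Hypothetical helper: starts reading and records each finished segment.
function readSegments(readAudio: ReadAudioFn, audioData: ArrayBuffer) {
  const segments: Array<{ startMs: number; endMs: number; durationMs: number }> = []
  const disposePromise = readAudio({
    audioData,
    realTime: false, // process as fast as possible instead of simulating playback
    onSpeechAvailable: ({ startTime, endTime }) => {
      segments.push({ startMs: startTime, endMs: endTime, durationMs: endTime - startTime })
    },
  })
  return { segments, disposePromise }
}

// Intended wiring (assumes vad-web is installed; the URL is a placeholder):
//   import { readAudio } from 'vad-web'
//   const response = await fetch('/speech.wav')
//   const { segments } = readSegments(readAudio, await response.arrayBuffer())
```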
An object representing speech data.
startTime: number
The start of the speech segment, as a timestamp in milliseconds
endTime: number
The end of the speech segment, as a timestamp in milliseconds
audioData: Float32Array
The audio data
sampleRate: number
The sample rate of the audio data
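A common next step is to turn the audioData and sampleRate fields into a playable or uploadable file. The sketch below encodes a Float32Array as a 16-bit PCM WAV file. It is not part of vad-web: `encodeWav` is a hypothetical helper, and it assumes the samples are mono and normalized to the range [-1, 1], which this README does not state explicitly.

```ts
// Hypothetical helper (not part of vad-web): encodes mono float samples
// as a 16-bit PCM WAV file with the standard 44-byte header.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2
  const dataSize = samples.length * bytesPerSample
  const buffer = new ArrayBuffer(44 + dataSize)
  const view = new DataView(buffer)

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // file size minus the first 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // size of the fmt chunk
  view.setUint16(20, 1, true) // audio format: 1 = PCM
  view.setUint16(22, 1, true) // channels: mono (assumption)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeAscii(36, 'data')
  view.setUint32(40, dataSize, true)

  // Clamp each sample to [-1, 1] and scale it to the signed 16-bit range.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample))
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff
    view.setInt16(44 + i * bytesPerSample, scaled, true)
  })

  return buffer
}

// Possible use inside onSpeechAvailable:
//   const blob = new Blob([encodeWav(audioData, sampleRate)], { type: 'audio/wav' })
```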
A function that should be called to stop the recording or recognition session.
Type: () => Promise
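When the dispose function can be reached from several places (for example a stop button and a component cleanup hook), it may be convenient to guard against calling it twice. This README does not say whether repeated calls are safe, so the sketch below takes the defensive route. `disposeOnce` is a hypothetical helper, not part of vad-web.

```ts
// Hypothetical helper (not part of vad-web): wraps a dispose function so the
// underlying function runs at most once; later calls reuse the same promise.
function disposeOnce(dispose: () => Promise<unknown>): () => Promise<unknown> {
  let pending: Promise<unknown> | undefined
  return () => {
    if (!pending) {
      pending = dispose()
    }
    return pending
  }
}

// Intended wiring (assumes vad-web is installed):
//   import { recordAudio } from 'vad-web'
//   const stop = disposeOnce(await recordAudio({ /* callbacks */ }))
//   stopButton.addEventListener('click', () => stop())
```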