# vad-web

[NPM version](https://www.npmjs.com/package/vad-web)

An enterprise-grade Voice Activity Detection (VAD) library for the browser.

It is based on the Silero VAD model and Transformers.js.

## Online demo

https://vad-web.vercel.app


## Installation

```bash
npm install vad-web
```

## Usage

Call `recordAudio` to start recording audio and get a dispose function. Under the hood, it runs the Silero VAD model in a web worker to avoid blocking the main thread.

```ts
import { recordAudio } from 'vad-web'

const dispose = await recordAudio({
  onSpeechStart: () => {
    console.log('Speech detected')
  },
  onSpeechEnd: () => {
    console.log('Silence detected')
  },
  onSpeechAvailable: ({ audioData, sampleRate, startTime, endTime }) => {
    console.log(`Audio received with duration ${endTime - startTime}ms`)
    // Further processing can be done here
  },
})
```

## API Reference

### recordAudio

```ts
function recordAudio(options: RecordAudioOptions): Promise
```

Records audio from the microphone and invokes the callbacks in `options` as speech is detected.

#### Returns

A promise that resolves to a function that disposes of the audio recorder.

### RecordAudioOptions

Options for `recordAudio`.

`onSpeechStart?: () => void`

Triggered when speech is detected.

`onSpeechEnd?: () => void`

Triggered when silence is detected.

`onSpeechAvailable?: (data: SpeechData) => void`

Triggered when a speech segment has finished and its audio data is available.

`onSpeechOngoing?: (data: SpeechData) => void`

Triggered periodically (once per second) while speech is ongoing.
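The callbacks above can be bundled into a single options object that also keeps track of state. The following is a minimal sketch, not part of vad-web: `createSegmentCollector` is a hypothetical helper, and the local `SpeechData` interface only mirrors the fields documented in this README.

```typescript
// Local mirror of the SpeechData shape documented in this README.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical helper (not part of vad-web): builds the callback options
// and stores every finished speech segment in an array.
function createSegmentCollector() {
  const segments: SpeechData[] = []
  let speaking = false
  return {
    segments,
    // True between an onSpeechStart call and the next onSpeechEnd call.
    isSpeaking: () => speaking,
    callbacks: {
      onSpeechStart: () => {
        speaking = true
      },
      onSpeechEnd: () => {
        speaking = false
      },
      onSpeechAvailable: (data: SpeechData) => {
        segments.push(data)
      },
    },
  }
}
```

Assuming the options accept exactly these callbacks, `collector.callbacks` could be passed straight to `recordAudio`, and `collector.segments` read once the session has been disposed.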

### readAudio

```ts
function readAudio(options: ReadAudioOptions): Promise
```

Reads audio data from an `ArrayBuffer` and invokes the callbacks in `options` as speech is detected.

#### Returns

A promise that resolves to a function that disposes of the audio reader.

### ReadAudioOptions

Options for `readAudio`.

`audioData: ArrayBuffer`

Audio file data contained in an `ArrayBuffer`, as loaded from `fetch()`, `XMLHttpRequest`, or `FileReader`.

`realTime?: boolean`

If `true`, simulates real-time processing by adding delays to match the audio duration.

Default: `false`

`onSpeechStart?: () => void`

Triggered when speech is detected.

`onSpeechEnd?: () => void`

Triggered when silence is detected.

`onSpeechAvailable?: (data: SpeechData) => void`

Triggered when a speech segment has finished and its audio data is available.

`onSpeechOngoing?: (data: SpeechData) => void`

Triggered periodically (once per second) while speech is ongoing.
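To make the idea behind `realTime` concrete, the sketch below shows one way delays could be matched to the audio duration. It is an illustration only and does not describe how vad-web implements this option: `chunkDurationMs` and `planChunks` are hypothetical helpers, and the chunk size and sample rate in the example are assumed values.

```typescript
// Time covered by `sampleCount` samples at `sampleRate` Hz, in milliseconds.
function chunkDurationMs(sampleCount: number, sampleRate: number): number {
  if (sampleRate <= 0) throw new RangeError('sampleRate must be positive')
  return (sampleCount * 1000) / sampleRate
}

interface PlannedChunk {
  offset: number // index of the first sample in the chunk
  length: number // number of samples in the chunk
  delayMs: number // how long a real-time simulation would wait after it
}

// Splits `totalSamples` into chunks of at most `chunkSize` samples and
// attaches the playback duration of each chunk as its delay.
function planChunks(totalSamples: number, chunkSize: number, sampleRate: number): PlannedChunk[] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive')
  const chunks: PlannedChunk[] = []
  for (let offset = 0; offset < totalSamples; offset += chunkSize) {
    const length = Math.min(chunkSize, totalSamples - offset)
    chunks.push({ offset, length, delayMs: chunkDurationMs(length, sampleRate) })
  }
  return chunks
}

// Example with assumed values: 1200 samples, 512-sample chunks, 16 kHz audio.
const plan = planChunks(1200, 512, 16000)
```

Summing `delayMs` over the plan gives the duration of the whole clip, which is why a reader paced this way takes about as long as playing the audio.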

### SpeechData

An object representing speech data.

`startTime: number`

Timestamp of the start of the speech, in milliseconds.

`endTime: number`

Timestamp of the end of the speech, in milliseconds.

`audioData: Float32Array`

The audio samples.

`sampleRate: number`

The sample rate of the audio data.
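A common next step with `audioData` and `sampleRate` is to wrap the samples in a WAV container for playback or upload. The sketch below is not part of vad-web: `encodeWav` is a hypothetical helper, and it assumes the samples are mono and lie in the range -1 to 1, which this README does not state.

```typescript
// Hypothetical helper: encodes mono float samples as a 16-bit PCM WAV file.
// Assumes samples are in [-1, 1]; values outside that range are clamped.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2
  const dataSize = samples.length * bytesPerSample
  const buffer = new ArrayBuffer(44 + dataSize)
  const view = new DataView(buffer)
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // size of the rest of the file
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // size of the fmt chunk
  view.setUint16(20, 1, true) // audio format: PCM
  view.setUint16(22, 1, true) // channels: mono (assumption)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeAscii(36, 'data')
  view.setUint32(40, dataSize, true)

  // Convert each float sample to a little-endian signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0))
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
  }
  return buffer
}
```

Inside `onSpeechAvailable`, the result could be wrapped as `new Blob([encodeWav(audioData, sampleRate)], { type: 'audio/wav' })` for playback or upload.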

### Dispose function

A function that should be called to stop the recording or recognition session.

Type: `() => Promise`
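This README does not say whether the dispose function may safely be called more than once, for example from both a stop button and a component teardown. A defensive wrapper avoids having to know. The sketch below is not part of vad-web, and `disposeOnce` is a hypothetical helper.

```typescript
// Hypothetical helper: wraps an async dispose function so the underlying
// function runs at most once; later calls return the same promise.
function disposeOnce<T>(dispose: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (pending === undefined) {
      pending = dispose()
    }
    return pending
  }
}
```

Wrapping the function returned by `recordAudio` or `readAudio` this way lets every caller await the same shutdown without triggering it twice.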
