@visionengine/audio-tts

[English](README.md) | [中文文档 (Chinese)](README-zh.md)

VisionEngine Audio TTS MCP Server - Text-to-speech synthesis using Volcengine TTS API with support for multiple voices and languages.

Features

- Text-to-Speech Synthesis - Convert text to natural-sounding speech audio files
- Multiple Voices - Support for various voice types including male/female voices with different styles
- Voice Query - Filter available voices by language
- Audio Customization - Adjust speech rate, volume, pitch, and emotion
- Multiple Formats - Support for MP3, OGG Opus, and PCM output formats
- TTS 2.0 Support - Context-aware speech synthesis with style hints

Installation

Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "ve-audio-tts": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "transport": "stdio",
      "env": {
        "API_URL": "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "RESOURCE_ID": "seed-tts-2.0",
        "WORKDIR": "./public"
      }
    }
  }
}
```

Global install:

```bash
npm install -g @visionengine/audio-tts
```




Configuration



Environment variables:

- `API_URL` - TTS API endpoint (default: `https://openspeech.bytedance.com/api/v3/tts/unidirectional`)
- `APP_ID` - Your Volcengine App ID (required)
- `ACCESS_TOKEN` - Your Volcengine Access Key (required)
- `RESOURCE_ID` - TTS resource ID (default: `seed-tts-2.0`)
- `WORKDIR` - Directory for saving generated audio files (default: `./`)
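The defaults and required variables listed above could be resolved along these lines. This is a minimal sketch, not the server's actual startup code: `TtsConfig`, `Env`, and `loadConfig` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: resolves the documented environment variables
// into a config object, applying the documented defaults.
interface TtsConfig {
  apiUrl: string;
  appId: string;
  accessToken: string;
  resourceId: string;
  workdir: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): TtsConfig {
  const appId = env["APP_ID"];
  const accessToken = env["ACCESS_TOKEN"];
  // APP_ID and ACCESS_TOKEN are documented as required, so fail fast without them.
  if (!appId || !accessToken) {
    throw new Error("APP_ID and ACCESS_TOKEN must be set");
  }
  return {
    apiUrl: env["API_URL"] ?? "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
    appId,
    accessToken,
    resourceId: env["RESOURCE_ID"] ?? "seed-tts-2.0",
    workdir: env["WORKDIR"] ?? "./",
  };
}

// A literal object keeps the example self-contained; in Node.js you would pass process.env.
const config = loadConfig({ APP_ID: "your_app_id", ACCESS_TOKEN: "your_access_key" });
console.log(config.resourceId); // prints "seed-tts-2.0"
```

Taking the environment as a parameter rather than reading a global keeps the helper easy to test.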



Tools



tts



Synthesize speech from text and save to an audio file.



Parameters:

- `text` (string, required) - Text content to synthesize into speech
- `speaker` (string, required) - Voice speaker ID (e.g., `zh_female_vv_uranus_bigtts`)
- `format` (string, optional) - Audio format: `mp3` (default), `ogg_opus`, or `pcm`
- `sampleRate` (number, optional) - Audio sample rate: 8000, 16000, 22050, 24000, 32000, 44100, 48000 (default: 24000)
- `speechRate` (number, optional) - Speech rate: -50 (0.5x) to 100 (2.0x), default: 0
- `loudnessRate` (number, optional) - Volume: -50 (0.5x) to 100 (2.0x), default: 0
- `emotion` (string, optional) - Emotion setting for supported voices (e.g., `happy`, `sad`)
- `emotionScale` (number, optional) - Emotion intensity: 1-5, default: 4
- `contextTexts` (string[], optional) - Context hints for TTS 2.0 to adjust style
- `explicitLanguage` (string, optional) - Explicit language: `zh-cn`, `en`, `ja`, `es-mx`, `id`, `pt-br`, `de`, `fr`
- `pitch` (number, optional) - Pitch adjustment: -12 to 12, default: 0
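The numeric ranges above can be checked on the client before calling the tool. The sketch below is illustrative only: `TtsParams`, `rateToMultiplier`, `checkRange`, and `validateTtsParams` are hypothetical names, and the linear mapping from rate to multiplier is an assumption inferred from the documented endpoints (-50 is 0.5x, 100 is 2.0x).

```typescript
// Hypothetical subset of the tts parameters, limited to the fields validated here.
interface TtsParams {
  text: string;
  speaker: string;
  sampleRate?: number;
  speechRate?: number;
  loudnessRate?: number;
  emotionScale?: number;
  pitch?: number;
}

const SAMPLE_RATES = [8000, 16000, 22050, 24000, 32000, 44100, 48000];

// Assumes linear scaling between the documented endpoints: -50 -> 0.5x, 0 -> 1.0x, 100 -> 2.0x.
function rateToMultiplier(rate: number): number {
  return 1 + rate / 100;
}

// Returns an error message if value is set and falls outside [min, max].
function checkRange(name: string, value: number | undefined, min: number, max: number): string[] {
  if (value === undefined) return [];
  return value >= min && value <= max ? [] : [`${name} must be between ${min} and ${max}`];
}

// Collects every violation instead of stopping at the first one.
function validateTtsParams(p: TtsParams): string[] {
  const errors: string[] = [];
  if (p.text.length === 0) errors.push("text is required");
  if (p.speaker.length === 0) errors.push("speaker is required");
  if (p.sampleRate !== undefined && SAMPLE_RATES.indexOf(p.sampleRate) === -1) {
    errors.push("sampleRate is not a supported value");
  }
  return errors.concat(
    checkRange("speechRate", p.speechRate, -50, 100),
    checkRange("loudnessRate", p.loudnessRate, -50, 100),
    checkRange("emotionScale", p.emotionScale, 1, 5),
    checkRange("pitch", p.pitch, -12, 12)
  );
}

console.log(rateToMultiplier(-50)); // prints 0.5
console.log(rateToMultiplier(100)); // prints 2
// pitch is out of range here, so one error is reported.
console.log(validateTtsParams({ text: "Hi", speaker: "zh_female_vv_uranus_bigtts", pitch: 20 }));
```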



Example:

```typescript
// Basic usage
await tts({
  text: "Hello, welcome to VisionEngine!",
  speaker: "zh_female_vv_uranus_bigtts"
});

// With customization
await tts({
  text: "This is a test with custom settings.",
  speaker: "zh_male_m191_uranus_bigtts",
  format: "mp3",
  speechRate: 10,
  loudnessRate: 5,
  pitch: 2
});
```





list-voices



Query available TTS voices filtered by language.



Parameters:

- `language` (string, optional) - Filter by language code: `zh`, `zh-cn`, `en`, `ja`, `es`, `id`, `pt`, `de`, `fr`. Leave empty for all voices.



Example:

```typescript
// Get all voices
await listVoices({});

// Get Chinese voices only
await listVoices({
  language: "zh"
});
```





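Response:

```json
{
  "total": 10,
  "language": "zh",
  "voices": [
    {
      "voiceType": "zh_female_vv_uranus_bigtts",
      "name": "Vivi 2.0",
      "gender": "女",
      "age": "青年",
      "description": "语调平稳、咬字柔和、自带治愈安抚力的女声音色",
      "categories": ["通用场景"],
      "languages": ["zh-cn"],
      "trialURL": "https://..."
    }
  ]
}
```

In this sample the `gender`, `age`, `description`, and `categories` values are returned in Chinese: 女 is female, 青年 is young adult, the description reads "a female voice with steady intonation, soft articulation, and a naturally soothing quality", and 通用场景 is general-purpose scenarios.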
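To pick a `speaker` value from a response of this shape, a client might type the payload and filter it. This is a sketch under assumptions: the `Voice` and `ListVoicesResponse` interfaces are inferred from the single sample response, and `voiceTypesFor` is a hypothetical helper.

```typescript
// Types inferred from the sample response; real responses may carry more fields.
interface Voice {
  voiceType: string;
  name: string;
  gender: string;
  age: string;
  description: string;
  categories: string[];
  languages: string[];
  trialURL: string;
}

interface ListVoicesResponse {
  total: number;
  language: string;
  voices: Voice[];
}

// Returns the voiceType IDs of all voices that list the given language code.
function voiceTypesFor(res: ListVoicesResponse, language: string): string[] {
  return res.voices
    .filter((v) => v.languages.indexOf(language) !== -1)
    .map((v) => v.voiceType);
}

// Sample data copied from the response shown in this section.
const sample: ListVoicesResponse = {
  total: 1,
  language: "zh",
  voices: [
    {
      voiceType: "zh_female_vv_uranus_bigtts",
      name: "Vivi 2.0",
      gender: "女",
      age: "青年",
      description: "语调平稳、咬字柔和、自带治愈安抚力的女声音色",
      categories: ["通用场景"],
      languages: ["zh-cn"],
      trialURL: "https://...",
    },
  ],
};

console.log(voiceTypesFor(sample, "zh-cn")[0]); // prints "zh_female_vv_uranus_bigtts"
```

Each returned ID can be passed directly as the `speaker` parameter of the `tts` tool.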





Usage Examples






Once configured as an MCP server, the tools are available through your MCP client:



> Use the `tts` tool to generate speech from "Hello World" with speaker `zh_female_vv_uranus_bigtts`

> Use the `list-voices` tool to get available Chinese voices
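Behind prompts like these, an MCP client sends a JSON-RPC `tools/call` request to the server. The sketch below builds such a message for the `tts` tool. `ToolCallRequest` and `buildToolCall` are hypothetical names for illustration, and a real client also performs the MCP initialization handshake before calling tools, which is omitted here.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request envelope.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "tts", {
  text: "Hello World",
  speaker: "zh_female_vv_uranus_bigtts",
});

// Over the stdio transport, each message is written as one line of JSON.
console.log(JSON.stringify(request));
```

In practice an MCP SDK or the client application builds this message for you; the sketch only shows what travels over the wire.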

Command line:

```bash
# Install globally
npm install -g @visionengine/audio-tts

# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_key"
export WORKDIR="./audio"

# Run the server
ve-audio-tts
```





Add to your Claude Desktop configuration file:

macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "ve-audio-tts": {
      "command": "npx",
      "args": ["-y", "@visionengine/audio-tts@latest"],
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_key",
        "WORKDIR": "/Users/username/Audio"
      }
    }
  }
}
```





Restart Claude Desktop to load the server.



Available Voices



| Voice Type | Name | Gender | Description |
|------------|------|--------|-------------|
| zh_female_vv_uranus_bigtts | Vivi 2.0 | Female | Gentle and soothing female voice |
| zh_female_xiaohe_uranus_bigtts | 小何 2.0 | Female | Sweet and lively young female voice |
| zh_male_taocheng_uranus_bigtts | 小天 2.0 | Male | Clear and warm young male voice |
| zh_male_m191_uranus_bigtts | 云舟 2.0 | Male | Mature and magnetic male voice |
| zh_female_santongyongns_saturn_bigtts | 流畅女声 | Female | Smooth and natural female voice |
| zh_female_meilinvyou_saturn_bigtts | 魅力女友 | Female | Charming and gentle female voice |



Use the `list-voices` tool to get the complete list.



Development



Build:

```bash
npm run build
```

Test:

```bash
npm test
```

Run locally:

```bash
# Build first
npm run build

# Run locally
node dist/index.js
```


Supported Audio Formats

- MP3 - Compressed audio (default)
- OGG Opus - High-quality compressed audio
- PCM - Raw uncompressed audio
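Because PCM output is uncompressed, its size grows linearly with duration and sample rate. The sketch below estimates it, assuming 16-bit mono samples. That sample layout is an assumption, not something this README states, and `estimatePcmBytes` is a hypothetical helper.

```typescript
// Estimates raw PCM size in bytes.
// Assumption: 2 bytes per sample (16-bit) and 1 channel (mono) unless overridden.
function estimatePcmBytes(
  sampleRate: number,
  seconds: number,
  bytesPerSample: number = 2,
  channels: number = 1
): number {
  return Math.round(sampleRate * seconds * bytesPerSample * channels);
}

// Ten seconds at the default 24000 Hz sample rate.
console.log(estimatePcmBytes(24000, 10)); // prints 480000
```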

Support

For issues and questions:

- Email: team@visionengine-tech.com
- Website: https://visionengine-tech.com