VisionEngine Subtitle Generation MCP Server
Generate subtitles from audio/video files with automatic timing alignment.
```bash
npm install @visionengine/subtitle-generate
```
MCP client configuration:
```json
{
  "mcpServers": {
    "ve-subtitle-generate": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@visionengine/subtitle-generate@latest"],
      "transport": "stdio",
      "env": {
        "APP_ID": "your_app_id",
        "ACCESS_TOKEN": "your_access_token",
        "WORKDIR": "./media"
      }
    }
  }
}
```
Global installation
```bash
npm install -g @visionengine/subtitle-generate
```
Configuration
Environment variables:
- API_BASE_URL - API endpoint (default: https://openspeech.bytedance.com)
- APP_ID - Your application ID (required)
- ACCESS_TOKEN - Your Bearer token for authentication (required)
- WORKDIR - Base directory for relative file paths (default: ./)
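As a rough illustration of how these variables and their defaults fit together, the sketch below resolves a configuration object from an environment map. The `ServerConfig` interface and `loadConfig` function are hypothetical names introduced here, not part of the package; only the variable names and defaults come from the list above.

```typescript
// Hypothetical config shape; field names mirror the documented variables.
interface ServerConfig {
  apiBaseUrl: string;
  appId: string;
  accessToken: string;
  workdir: string;
}

type Env = { [key: string]: string | undefined };

// Applies the documented defaults and throws if a required variable
// (APP_ID, ACCESS_TOKEN) is missing or empty.
function loadConfig(env: Env): ServerConfig {
  const missing: string[] = [];
  if (!env["APP_ID"]) missing.push("APP_ID");
  if (!env["ACCESS_TOKEN"]) missing.push("ACCESS_TOKEN");
  if (missing.length > 0) {
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }
  return {
    apiBaseUrl: env["API_BASE_URL"] || "https://openspeech.bytedance.com",
    appId: env["APP_ID"] as string,
    accessToken: env["ACCESS_TOKEN"] as string,
    workdir: env["WORKDIR"] || "./",
  };
}
```

In Node.js this would typically be called as `loadConfig(process.env)`; taking the map as a parameter keeps the sketch easy to test.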
Tools
subtitle_generate
Generate subtitles from audio/video files using speech recognition.
Parameters:
- audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
- language (string, optional) - Language code: zh-CN, en-US, ja-JP, ko-KR, etc.
- wordsPerLine (number, optional) - Maximum words per line (default: 46)
- maxLines (number, optional) - Maximum lines per screen (default: 1)
- useItn (boolean, optional) - Convert Chinese numbers to Arabic numerals
- captionType (enum, optional) - 'auto', 'speech', or 'singing'
- usePunc (boolean, optional) - Add punctuation marks
- useDdc (boolean, optional) - Add silence annotations
- withSpeakerInfo (boolean, optional) - Return speaker information
Supported Languages:
| Language | Code | Recommended wordsPerLine |
|----------|------|---------------------------|
| Chinese (Simplified) | zh-CN | 15 |
| Cantonese | yue | 15 |
| English (US) | en-US | 55 |
| Japanese | ja-JP | 32 |
| Korean | ko-KR | 32 |
| Spanish | es-MX | 55 |
| Russian | ru-RU | 55 |
| French | fr-FR | 55 |
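One way to apply the table is to look up the recommended `wordsPerLine` from the language code before calling the tool. The sketch below is an assumption about how a caller might do this; `recommendedWordsPerLine` and `buildGenerateArgs` are hypothetical helpers, and unlisted codes fall back to the documented default of 46.

```typescript
// Values copied from the table above.
const RECOMMENDED_WORDS_PER_LINE: { [code: string]: number } = {
  "zh-CN": 15,
  "yue": 15,
  "en-US": 55,
  "ja-JP": 32,
  "ko-KR": 32,
  "es-MX": 55,
  "ru-RU": 55,
  "fr-FR": 55,
};

// Documented default for wordsPerLine.
const DEFAULT_WORDS_PER_LINE = 46;

// Returns the table value for a known code, otherwise the default.
function recommendedWordsPerLine(language?: string): number {
  if (language === undefined) return DEFAULT_WORDS_PER_LINE;
  const hasEntry = Object.prototype.hasOwnProperty.call(RECOMMENDED_WORDS_PER_LINE, language);
  const value = hasEntry ? RECOMMENDED_WORDS_PER_LINE[language] : undefined;
  return value !== undefined ? value : DEFAULT_WORDS_PER_LINE;
}

// Hypothetical helper: assembles subtitle_generate arguments with a
// language-appropriate wordsPerLine.
function buildGenerateArgs(audioPath: string, language?: string) {
  const args: { audioPath: string; language?: string; wordsPerLine: number } = {
    audioPath: audioPath,
    wordsPerLine: recommendedWordsPerLine(language),
  };
  if (language !== undefined) args.language = language;
  return args;
}
```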
Example:
```typescript
// Basic usage
await subtitle_generate({
  audioPath: "./video.mp4"
});

// With options
await subtitle_generate({
  audioPath: "./video.mp4",
  language: "zh-CN",
  wordsPerLine: 15,
  maxLines: 2,
  captionType: "speech",
  usePunc: true
});
```
Output:
- SRT file saved in the same directory as the input file
- JSON response with utterances and timing information
subtitle_align
Align existing subtitle text with audio for precise timing.
Parameters:
- audioPath (string, required) - Audio/video file path (relative to WORKDIR or absolute)
- subtitleText (string, required) - The subtitle text to align with the audio
- captionType (enum, required) - 'speech' or 'singing'
- staPuncMode (enum, optional) - Punctuation mode: '1', '2', or '3'
Punctuation Modes:
- 1 (default) - Omit trailing punctuation from alignment results
- 2 - Replace punctuation with spaces
- 3 - Keep original punctuation
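To make the difference between the three modes concrete, here is a local sketch of one plausible reading of them applied to a single string. This is an assumption for illustration only: the service may apply the rules per word or per line, and its punctuation set is not documented here. `applyPuncMode` and the `PUNCTUATION` set are hypothetical.

```typescript
type StaPuncMode = "1" | "2" | "3";

// Assumed punctuation set (ASCII plus common CJK marks); illustrative only.
const PUNCTUATION = ".,!?;:，。！？；：、";

function isPunctuation(ch: string): boolean {
  return PUNCTUATION.indexOf(ch) !== -1;
}

// Applies one reading of the documented modes to a single string.
function applyPuncMode(text: string, mode: StaPuncMode = "1"): string {
  if (mode === "3") {
    // Mode 3: keep the original punctuation.
    return text;
  }
  if (mode === "2") {
    // Mode 2: replace every punctuation character with a space.
    let out = "";
    for (let i = 0; i < text.length; i++) {
      const ch = text.charAt(i);
      out += isPunctuation(ch) ? " " : ch;
    }
    return out;
  }
  // Mode 1 (default): drop punctuation at the end of the string only.
  let end = text.length;
  while (end > 0 && isPunctuation(text.charAt(end - 1))) end--;
  return text.slice(0, end);
}
```

Under these assumptions, `applyPuncMode("Hello, world.")` keeps the inner comma and drops only the final period, while mode `"3"` returns the text unchanged.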
Example:
```typescript
// Align speech subtitle
await subtitle_align({
  audioPath: "./speech.wav",
  subtitleText: "Hello, welcome to our presentation today.",
  captionType: "speech"
});

// Align song lyrics
await subtitle_align({
  audioPath: "./song.mp3",
  subtitleText: "这是一首美丽的歌曲", // "This is a beautiful song"
  captionType: "singing",
  staPuncMode: "3"
});
```
Output:
- SRT file saved with an _aligned suffix in its file name
- JSON response with word-level timing information
Usage Examples
With an MCP client
Once configured as an MCP server, the tools are available through your MCP client:
```
> Generate subtitles for video.mp4
> Align these lyrics with song.mp3: "今天天气真好..."
```
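Behind prompts like these, an MCP client sends a JSON-RPC `tools/call` request to the server. The sketch below builds such a message by hand to show its shape. The tool name `subtitle_generate` is taken from the examples in this README and is assumed to be the name the server registers; `buildToolCall` is a hypothetical helper, and a real client would normally use an MCP SDK rather than constructing messages manually.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// Hypothetical helper that assembles the request object.
function buildToolCall(
  id: number,
  name: string,
  args: { [key: string]: unknown }
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Assumed tool name and arguments, mirroring the README examples.
const request = buildToolCall(1, "subtitle_generate", {
  audioPath: "./video.mp4",
  language: "en-US",
});

// Serialized form, as it would be written to the server over stdio.
const wire = JSON.stringify(request);
```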
From the command line
```bash
# Install globally
npm install -g @visionengine/subtitle-generate

# Set environment variables
export APP_ID="your_app_id"
export ACCESS_TOKEN="your_access_token"
export WORKDIR="./media"

# Run the server
ve-subtitle-generate
```
Output Format
The tools save subtitles in SRT format:
```
1
00:00:00,000 --> 00:00:03,197
如果您没有其他需要举报的话这边就先挂断了

2
00:00:03,442 --> 00:00:04,877
祝您生活愉快再见
```
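For readers who want to post-process the JSON response themselves, the following sketch shows how SRT cues like the ones above can be rendered from millisecond timings. The `Cue` shape and the function names are hypothetical and do not reflect the actual response schema; times are assumed to be non-negative integer milliseconds.

```typescript
// Hypothetical cue shape; times are in milliseconds.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pads a number with zeros to the given width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "," + pad(millis, 3);
}

// Renders numbered cues separated by a blank line, as SRT expects.
function toSrt(cues: Cue[]): string {
  return cues
    .map(function (cue, i) {
      return (
        String(i + 1) + "\n" +
        formatSrtTime(cue.startMs) + " --> " + formatSrtTime(cue.endMs) + "\n" +
        cue.text + "\n"
      );
    })
    .join("\n");
}
```

With this sketch, a cue starting at 0 ms and ending at 3197 ms yields the timing line `00:00:00,000 --> 00:00:03,197`, matching the first cue in the sample output.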
Error Codes
| Code | Meaning | Description |
|------|---------|-------------|
| 0 | Success | - |
| 2000 | Processing | Task is being processed |
| 1001 | Invalid parameters | Missing/invalid request parameters |
| 1002 | No permission | Token invalid/expired |
| 1003 | Rate limited | QPS exceeded |
| 1010 | Audio too long | Duration exceeded threshold |
| 1012 | Invalid audio format | Audio decode failure |
| 1013 | Silent audio | No speech detected |
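A caller might translate these codes into a handling decision along the following lines. The policy shown (keep polling on 2000, back off and retry on 1003, fail on everything else) is an assumption for illustration, not documented behavior of the server; `actionForCode` and `describeCode` are hypothetical helpers.

```typescript
type CodeAction = "done" | "poll" | "retry" | "fail";

// Meanings copied from the table above.
const CODE_MEANINGS: { [code: number]: string } = {
  0: "Success",
  2000: "Processing",
  1001: "Invalid parameters",
  1002: "No permission",
  1003: "Rate limited",
  1010: "Audio too long",
  1012: "Invalid audio format",
  1013: "Silent audio",
};

// Assumed policy: which codes are terminal and which are worth retrying.
function actionForCode(code: number): CodeAction {
  switch (code) {
    case 0:
      return "done"; // result is ready
    case 2000:
      return "poll"; // task still running; check again later
    case 1003:
      return "retry"; // QPS exceeded; retry after a delay
    default:
      return "fail"; // parameter, auth, or audio problems need user action
  }
}

// Returns the table meaning, or a generic label for unlisted codes.
function describeCode(code: number): string {
  const meaning = CODE_MEANINGS[code];
  return meaning !== undefined ? meaning : "Unknown code " + String(code);
}
```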
Development
Build
```bash
pnpm build
```
Test
```bash
pnpm test
```
Run locally
```bash
# Build first
pnpm build

# Run locally
node dist/index.js
```