Fast YouTube transcript extraction with bulk processing, Google Takeout support, MCP server, and multiple output formats
npm install @nadimtuhin/ytranscript



Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.
Read the blog post: "Automating My Second Brain with YouTube Transcripts"
- No API keys required - Uses YouTube's public innertube API directly
- Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
- Bulk processing - Process thousands of videos from Google Takeout exports
- Resume-safe - Automatically skips already-processed videos
- Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text
```bash
# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ
```
Installation
```bash
# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript

# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID

# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript
```

Runtimes supported: Node.js 18+ and Bun 1.0+
MCP Server (AI Assistant Integration)
ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.
| Tool | Description |
|------|-------------|
| `get_transcript` | Fetch transcript with format options (text, segments, srt, vtt) |
| `get_transcript_languages` | List available caption languages for a video |
| `extract_video_id` | Extract video ID from various YouTube URL formats |
| `get_transcripts_bulk` | Fetch transcripts for multiple videos at once |
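
As an illustration of what a tool like `extract_video_id` has to handle, here is a minimal sketch that accepts a bare ID or a few common URL shapes. The function name `extractVideoId`, the regular expressions, and the set of URL forms covered are assumptions made for this example, not the package's actual implementation.

```typescript
// Hypothetical helper for illustration; not the package's implementation.
// YouTube video IDs are 11 characters drawn from A-Z, a-z, 0-9, "_" and "-".
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

const URL_PATTERNS: RegExp[] = [
  // youtube.com/watch?v=ID (v may appear anywhere in the query string)
  /[?&]v=([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
  // youtu.be/ID
  /youtu\.be\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
  // youtube.com/embed/ID, /shorts/ID, /live/ID
  /youtube\.com\/(?:embed|shorts|live)\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
];

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // already a bare ID
  for (const pattern of URL_PATTERNS) {
    const match = pattern.exec(trimmed);
    if (match) return match[1] ?? null;
  }
  return null; // nothing that looks like a video reference
}
```

Returning `null` instead of throwing keeps the helper easy to use when filtering a mixed list of IDs and URLs.
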
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS):

```json
{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}
```

Or if installed globally:
```json
{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}
```

Once configured, you can ask Claude:
- "Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
- "Summarize the key points from this video"
- "What languages are available for this video's captions?"
- "Get transcripts for these 5 videos and compare their content"
CLI Usage
```bash
# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ

# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With specific language
ytranscript get dQw4w9WgXcQ --lang es

# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt

# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json
```
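
For reference, SRT cues use `HH:MM:SS,mmm` timestamps. The sketch below shows one way segments with `start` and `duration` in seconds could be turned into SRT text. `toSrtTimestamp` and `segmentsToSrt` are hypothetical helpers written for this example; the CLI's own `--format srt` output may differ in details.

```typescript
// Hypothetical helpers for illustration; not the package's formatter.
interface Segment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

// Convert a time in seconds to the SRT form HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Emit numbered cues separated by a blank line.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const from = toSrtTimestamp(seg.start);
      const to = toSrtTimestamp(seg.start + seg.duration);
      return `${i + 1}\n${from} --> ${to}\n${seg.text}\n`;
    })
    .join('\n');
}
```
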
```bash
ytranscript info dQw4w9WgXcQ

# Output:
# en    English (auto-generated)
# es    Spanish
# fr    French
```
```bash
# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube/history/watch-history.json" \
  --watch-later "Takeout/YouTube/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv

# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"

# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt

# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume
```
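
To give a feel for what bulk processing starts from, the sketch below turns parsed watch-history entries into a de-duplicated list of video IDs. The entry shape used here (`title`, `titleUrl`, `time`) is an assumption based on how Takeout watch-history exports commonly look, so check it against your own export. `historyToMeta` is a hypothetical helper, not the package's loader.

```typescript
// Assumed shape of one watch-history.json entry; only the fields used here.
interface TakeoutHistoryEntry {
  title?: string;
  titleUrl?: string; // e.g. "https://www.youtube.com/watch?v=..."
  time?: string;     // timestamp string
}

interface VideoMeta {
  videoId: string;
  title?: string;
  watchedAt?: string;
  source: 'history';
}

// Hypothetical helper: keep the first occurrence of each video ID.
function historyToMeta(entries: TakeoutHistoryEntry[]): VideoMeta[] {
  const seen = new Set<string>();
  const result: VideoMeta[] = [];
  for (const entry of entries) {
    const match = /[?&]v=([A-Za-z0-9_-]{11})/.exec(entry.titleUrl ?? '');
    const videoId = match?.[1];
    // Skip entries without a watch URL and videos already collected.
    if (!videoId || seen.has(videoId)) continue;
    seen.add(videoId);
    result.push({ videoId, title: entry.title, watchedAt: entry.time, source: 'history' });
  }
  return result;
}
```
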
YouTube may rate-limit requests. Use these flags to control pacing:
```bash
# --concurrency  Max concurrent requests (default: 4, safe: 1-8)
# --pause-after  Pause after N requests (default: 10)
# --pause-ms     Pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000
```

Recommended for large batches: `--concurrency 2 --pause-after 10 --pause-ms 5000`
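
As a rough way to budget time for a large batch, the sketch below estimates how much wall-clock time the pauses alone add. It assumes one pause after every `--pause-after` requests and no pause after the final request; that reading of the flags is an assumption, and `estimatePauseOverheadMs` is a hypothetical helper. Request time itself is not included.

```typescript
// Hypothetical helper: total time spent pausing, in milliseconds.
// Assumption: one pause after every `pauseAfter` requests, none after the last.
function estimatePauseOverheadMs(
  totalVideos: number,
  pauseAfter: number,
  pauseMs: number,
): number {
  if (totalVideos <= 0 || pauseAfter <= 0) return 0;
  const pauses = Math.floor((totalVideos - 1) / pauseAfter);
  return pauses * pauseMs;
}

// 1000 videos with the default pacing: 99 pauses of 5 s each.
const overheadMs = estimatePauseOverheadMs(1000, 10, 5000);
console.log(`${overheadMs / 1000} s of pausing`); // prints "495 s of pausing"
```
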
Route requests through an HTTP proxy to reduce rate limiting or to reach YouTube from a restricted network:

```bash
# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080

# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@proxy.example.com:8080

# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080
```

Programmatic usage:
```typescript
import { fetchTranscript } from '@nadimtuhin/ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});
```

> Proxy support inspired by ytfetcher

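
When proxy credentials contain characters such as `@` or `:`, they need to be percent-encoded before being embedded in the proxy URL. A minimal sketch, assuming a hypothetical `buildProxyUrl` helper and `ProxyParts` type that are not part of the package:

```typescript
// Hypothetical input type for this example.
interface ProxyParts {
  host: string;
  port: number;
  username?: string;
  password?: string;
}

// Build an HTTP proxy URL, percent-encoding the credentials so that
// characters like "@" and ":" cannot be mistaken for URL delimiters.
function buildProxyUrl({ host, port, username, password }: ProxyParts): string {
  let auth = '';
  if (username !== undefined) {
    auth = encodeURIComponent(username);
    if (password !== undefined) auth += ':' + encodeURIComponent(password);
    auth += '@';
  }
  return `http://${auth}${host}:${port}`;
}
```

The resulting string can be used wherever a proxy URL is expected, for example as the `url` field of the `proxy` option shown above.
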
Programmatic API
```typescript
import { fetchTranscript } from '@nadimtuhin/ytranscript';

try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });
  console.log(transcript.text); // Full transcript text
  console.log(transcript.segments); // Array of { text, start, duration }
  console.log(transcript.language); // 'en'
  console.log(transcript.isAutoGenerated); // true/false
} catch (error) {
  // See "Error Handling" section below
  console.error(error instanceof Error ? error.message : error);
}
```
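
To make "preference order" concrete, the sketch below shows one plausible way a caption track could be chosen from a `languages` list and an `includeAutoGenerated` flag. The `CaptionTrack` shape, the `pickTrack` function, and the rule that a manual track beats an auto-generated one in the same language are all assumptions for illustration; the library's real selection logic may differ.

```typescript
// Hypothetical track shape for this example.
interface CaptionTrack {
  languageCode: string;
  isAutoGenerated: boolean;
}

// Walk the preferred languages in order and return the first usable track.
function pickTrack(
  tracks: CaptionTrack[],
  languages: string[] = ['en'],
  includeAutoGenerated = true,
): CaptionTrack | null {
  const candidates = includeAutoGenerated
    ? tracks
    : tracks.filter((t) => !t.isAutoGenerated);
  for (const lang of languages) {
    // Assumed tie-break: prefer a manual track within the same language.
    const manual = candidates.find((t) => t.languageCode === lang && !t.isAutoGenerated);
    if (manual) return manual;
    const any = candidates.find((t) => t.languageCode === lang);
    if (any) return any;
  }
  return null; // corresponds to "no suitable caption track"
}
```
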
```typescript
import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';

// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');

// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);

// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});

// Filter successful results
const transcripts = results.filter((r) => r.transcript);
```
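
After a bulk run it is often useful to see why some videos failed, not only which ones succeeded. A minimal sketch, assuming results shaped like `{ meta, transcript, error }` as in the example above; the `summarize` helper and its local `Result` type are hypothetical and written only for this illustration.

```typescript
// Local stand-in for the result shape; only the fields used here.
interface Result {
  meta: { videoId: string };
  transcript: { text: string } | null;
  error?: string;
}

// Count successes and group failures by their error message.
function summarize(results: Result[]): {
  ok: number;
  failed: number;
  byError: Record<string, number>;
} {
  const byError: Record<string, number> = {};
  let ok = 0;
  for (const r of results) {
    if (r.transcript) {
      ok++;
      continue;
    }
    const reason = r.error ?? 'unknown error';
    byError[reason] = (byError[reason] ?? 0) + 1;
  }
  return { ok, failed: results.length - ok, byError };
}
```

A large count under an `HTTP 429` message, for example, suggests lowering concurrency before retrying.
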
```typescript
import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';

for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
```
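
Writing one JSON object per line is what makes resuming straightforward: the IDs already present in the output file can be collected and skipped on the next run. The sketch below assumes each line is a serialized result with a `meta.videoId` field; that file layout is an assumption, and `collectProcessedIds` is a hypothetical helper rather than a package export.

```typescript
// Hypothetical helper: collect video IDs from existing JSONL content.
// Assumption: each line is a JSON object with a `meta.videoId` string.
function collectProcessedIds(jsonl: string): Set<string> {
  const ids = new Set<string>();
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const record = JSON.parse(line) as { meta?: { videoId?: unknown } } | null;
      const id = record?.meta?.videoId;
      if (typeof id === 'string') ids.add(id);
    } catch {
      // Ignore lines that are not valid JSON, such as a partially
      // written last line from an interrupted run.
    }
  }
  return ids;
}
```

The resulting set has the same type as the `skipIds` option listed in the API reference.
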
```typescript
import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';

const transcript = await fetchTranscript('dQw4w9WgXcQ');

// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);

// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);

// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
```

Error Handling
`fetchTranscript` throws an error whose message identifies the failure:

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `No captions available for this video` | Video has no captions/subtitles | Check with `ytranscript info` first |
| `No suitable caption track found` | Requested language not available | Use `includeAutoGenerated: true` or a different language |
| `Caption track is empty` | Captions exist but have no content | Rare; try a different language |
| `HTTP 429` | Rate limited by YouTube | Reduce concurrency, add pauses |
| `HTTP 403` | Video is private or region-locked | Cannot access this video |

```typescript
import { fetchTranscript } from '@nadimtuhin/ytranscript';

const videoId = 'dQw4w9WgXcQ';

try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  // Caught values are `unknown` in strict TypeScript, so narrow first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (message.includes('429')) {
    console.log('Rate limited - slow down requests');
  }
}
```

Limitations
| Scenario | Supported |
|----------|-----------|
| Public videos with captions | ✅ Yes |
| Auto-generated captions | ✅ Yes |
| Manual/community captions | ✅ Yes |
| Private videos | ❌ No |
| Age-restricted videos | ❌ No |
| Live streams (while live) | ❌ No |
| Premiere videos (before premiere) | ❌ No |
| Region-locked videos | ❌ No (unless you're in the allowed region) |
Google Takeout
To export your YouTube data:
1. Go to Google Takeout
2. Deselect all, then select only "YouTube and YouTube Music"
3. Click "All YouTube data included" and select:
   - History → Watch history
   - Playlists (includes Watch Later)
4. Export and download
5. Extract the archive
The relevant files are:
- `Takeout/YouTube and YouTube Music/history/watch-history.json`
- `Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv`

API Reference
```typescript
interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

interface TranscriptSegment {
  text: string;
  start: number; // seconds
  duration: number; // seconds
}

interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string; // Present when transcript is null
}

interface FetchOptions {
  languages?: string[]; // Default: ['en']
  timeout?: number; // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig; // Optional proxy configuration
}

interface ProxyConfig {
  url: string; // HTTP proxy URL (e.g., "http://user:pass@host:port")
}

interface BulkOptions extends FetchOptions {
  concurrency?: number; // Default: 4
  pauseAfter?: number; // Default: 10
  pauseDuration?: number; // Default: 5000 (ms)
  skipIds?: Set<string>; // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
```

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Report bugs via GitHub Issues
- Security issues: see SECURITY.md
License: MIT