CLI tool for audio/video transcription with speaker diarization, AI summarization, and infographic generation
`npm install @krasnoperov/transcribe`

AI transcription skill for Claude Code - Transform audio/video recordings into transcripts with speaker diarization, AI-powered summaries, and visual infographics.
This skill provides a complete pipeline for processing recordings:
- Transcription - Convert audio/video to VTT format with speaker identification (OpenAI Whisper)
- Summarization - Generate structured markdown summaries (OpenAI GPT-5.1)
- Infographics - Create visual summaries from text (Google Gemini)
- All-in-one - Process video → transcript → summary → infographic in one command
See skills/transcribe/SKILL.md for complete usage guide.
This is a Claude Code skill. Install it from the marketplace:
```bash
/plugin marketplace add krasnoperov/claude-plugins
/plugin install transcribe@krasnoperov-plugins
```
Once installed, use the /transcribe skill in your conversations:
```
/transcribe transcribe meeting.mp4 to VTT with speaker diarization
/transcribe summarize this transcript into key points
/transcribe create an infographic from this summary
```
You can also use this package directly via npx. Set your API keys as environment variables first:
```bash
export OPENAI_API_KEY="your-openai-key"
export GOOGLE_AI_STUDIO_KEY="your-google-key"
```
Get your API keys:
- OpenAI
- Google AI Studio
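Before running the CLI it can help to confirm that the keys are actually exported to child processes. The following is a minimal sketch, not part of the package: `require_env` is a hypothetical helper, and since this README does not state which key each command needs, pass whichever names apply to your workflow.

```bash
# Sketch only: require_env is an illustrative helper, not part of the package.
# It succeeds only if every named variable is exported and non-empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child process gets.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could call `require_env OPENAI_API_KEY GOOGLE_AI_STUDIO_KEY || exit 1` before invoking the CLI, so a missing key is reported up front rather than partway through a run.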
Core Operations
```
transcribe   Audio/Video → VTT transcript with speakers
summarize    Text/VTT → Markdown summary
infographic  Text → Visual infographic image
process      All-in-one: video → transcript → summary → infographic
```
These operations can be used individually or chained together.
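As an illustration of chaining, the sketch below runs the three individual operations in sequence and also wraps the all-in-one `process` command. The helper names (`run`, `transcribe_cli`, `run_pipeline`, `run_all_in_one`) and the `DRY_RUN` switch are hypothetical conveniences, not part of the package. The positional input argument for `process` is an assumption made by analogy with the other commands; only `--output-dir` is taken from the options listed in this README.

```bash
# Sketch only: run, transcribe_cli, run_pipeline, run_all_in_one and DRY_RUN
# are illustrative names, not part of the package.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Invoke the CLI through npx, as the examples in this README do.
transcribe_cli() {
  run npx -y @krasnoperov/transcribe@latest "$@"
}

# Chain the three steps; stops at the first step that fails.
run_pipeline() {
  local input="$1" out_dir="$2" base
  base="$(basename "$input")"
  base="${base%.*}"   # strip the file extension, e.g. meeting.mp4 -> meeting
  run mkdir -p "$out_dir" &&
    transcribe_cli transcribe "$input" -o "$out_dir/$base.vtt" &&
    transcribe_cli summarize "$out_dir/$base.vtt" -o "$out_dir/$base.md" &&
    transcribe_cli infographic "$out_dir/$base.md" -o "$out_dir/$base.png"
}

# The all-in-one command. Passing the input as a positional argument is an
# assumption; check the CLI help for the exact invocation.
run_all_in_one() {
  local input="$1" out_dir="$2"
  transcribe_cli process "$input" --output-dir "$out_dir"
}
```

After sourcing the snippet, `DRY_RUN=1 run_pipeline meeting.mp4 out` prints the commands instead of running them, which is a cheap way to review the chain before spending API calls.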
Examples
See the skills/transcribe/examples/ directory:
1. 01-basic-workflow.sh - Step-by-step transcription pipeline
2. 02-all-in-one.sh - Single command processing
Transcribe with an OpenAI model
```bash
npx -y @krasnoperov/transcribe@latest transcribe podcast.mp3 \
  --language es \
  --model gpt-4o-transcribe-diarize \
  -o podcast.vtt
```
Transcribe with Gemini
```bash
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 \
  --model gemini-3 \
  -o meeting.vtt
```
Gemini 3 includes built-in speaker diarization and can handle very long audio files (up to ~8 hours).
Output (VTT with speaker tags):
```
WEBVTT

00:00:00.000 --> 00:00:02.450
Welcome to the podcast...

00:00:02.850 --> 00:00:08.200
Thanks for having me...
```
Summarize a transcript
```bash
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt \
  --prompt "Focus on action items and decisions" \
  -o summary.md
```
Generate an infographic
```bash
npx -y @krasnoperov/transcribe@latest infographic summary.md \
  --style "modern minimal corporate" \
  -o infographic.png
```
Options
transcribe
```
--model       Transcription model:
              OpenAI: gpt-4o-transcribe-diarize (default), gpt-4o-transcribe, whisper-1
              Google: gemini-3
--language    Language code (en, es, ru, de, etc.)
-o, --output  Output VTT file
```
summarize
```
--prompt      Custom summarization instructions
-o, --output  Output markdown file
```
infographic
```
--style       Style instructions for visual
--reference   Reference image for style
-o, --output  Output image file
```
process
```
--output-dir  Output directory for all files
--language    Language for transcription
--model       Transcription model
--style       Style for infographic
```
Requirements
- Node.js >= 18.0.0
- ffmpeg (for audio extraction)
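Both requirements can be checked from the shell before the first run. This is a minimal sketch with hypothetical helper names (`node_version_ok`, `check_requirements`); it assumes `node --version` prints a string of the form `v18.17.0`, which is the standard Node.js output.

```bash
# Sketch only: both function names are illustrative, not part of the package.

# Succeeds when a Node.js version string such as "v18.17.0" has major >= 18.
node_version_ok() {
  local version="${1#v}"
  local major="${version%%.*}"
  [ "$major" -ge 18 ] 2>/dev/null
}

# Reports each missing requirement on stderr; returns non-zero if any is missing.
check_requirements() {
  local status=0
  if ! command -v node >/dev/null 2>&1; then
    echo "node: not found (need >= 18.0.0)" >&2
    status=1
  elif ! node_version_ok "$(node --version)"; then
    echo "node: $(node --version) is older than 18.0.0" >&2
    status=1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: not found" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_requirements` prints nothing and returns success when both tools are available.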
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
```
Development
```bash
npm run build      # Build TypeScript
npm run typecheck  # Type checking
npm run test       # Run tests
npm run dev        # Dev mode with type stripping
```
MIT License - Copyright (c) 2025 Aleksei Krasnoperov
See LICENSE file for details.