Transcribe

AI transcription skill for Claude Code: transform audio/video recordings into transcripts with speaker diarization, AI-generated summaries, and visual infographics.

This skill provides a complete pipeline for processing recordings:

- Transcription - Convert audio/video to VTT format with speaker identification (OpenAI gpt-4o-transcribe-diarize by default, or Google Gemini 3)
- Summarization - Generate structured markdown summaries (OpenAI GPT-5.1)
- Infographics - Create visual summaries from text (Google Gemini)
- All-in-one - Process video → transcript → summary → infographic in one command

See `skills/transcribe/SKILL.md` for the complete usage guide.

Use in Claude Code

This is a Claude Code skill. Install it from the marketplace:

```
/plugin marketplace add krasnoperov/claude-plugins
/plugin install transcribe@krasnoperov-plugins
```

Once installed, use the /transcribe skill in your conversations:

```
/transcribe transcribe meeting.mp4 to VTT with speaker diarization
/transcribe summarize this transcript into key points
/transcribe create an infographic from this summary
```

Command Line Usage

You can also use this package directly via npx:

```bash
export OPENAI_API_KEY="your-openai-key"
export GOOGLE_AI_STUDIO_KEY="your-google-key"

# Transcribe audio/video
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 -o transcript.vtt

# Generate summary
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt -o summary.md

# Create infographic
npx -y @krasnoperov/transcribe@latest infographic summary.md -o visual.png

# All-in-one pipeline
npx -y @krasnoperov/transcribe@latest process recording.mp4 --output-dir ./output
```


Get your API keys:
- OpenAI
- Google AI Studio
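
Which key you need depends on the operation: per the overview, transcription with the default OpenAI models and summarization go through OpenAI, while infographics and `--model gemini-3` transcription go through Google Gemini. The TypeScript sketch below checks the environment before invoking the CLI. The `requiredKeys` and `missingKeys` helpers and their operation-to-key mapping are hypothetical, inferred from this README; the CLI itself may check keys differently.

```typescript
// Hypothetical pre-flight check. The mapping below is an assumption based on
// the providers this README lists for each operation.
type Operation = "transcribe" | "summarize" | "infographic" | "process";

function requiredKeys(op: Operation, model = "gpt-4o-transcribe-diarize"): string[] {
  // Transcription provider depends on --model: gemini-3 uses Google, others OpenAI.
  const transcribeKey = model === "gemini-3" ? "GOOGLE_AI_STUDIO_KEY" : "OPENAI_API_KEY";
  switch (op) {
    case "transcribe":
      return [transcribeKey];
    case "summarize":
      return ["OPENAI_API_KEY"];
    case "infographic":
      return ["GOOGLE_AI_STUDIO_KEY"];
    case "process":
      // The all-in-one pipeline runs all three steps; dedupe the key names.
      return Array.from(new Set([transcribeKey, "OPENAI_API_KEY", "GOOGLE_AI_STUDIO_KEY"]));
  }
}

function missingKeys(
  op: Operation,
  env: Record<string, string | undefined>,
  model?: string,
): string[] {
  return requiredKeys(op, model).filter((name) => !env[name]);
}

// Usage: report missing keys before any audio extraction or upload starts.
const missing = missingKeys("process", process.env);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```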

Core Operations

| Operation | Description |
|---|---|
| `transcribe` | Audio/Video → VTT transcript with speakers |
| `summarize` | Text/VTT → Markdown summary |
| `infographic` | Text → Visual infographic image |
| `process` | All-in-one: video → transcript → summary → infographic |

These operations can be used individually or chained together.
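
Chaining the individual commands can also be scripted. A minimal TypeScript sketch, assuming only the `npx` commands and `-o` flags from the CLI section and Node's built-in `child_process.execFileSync`; the helper names `buildSteps` and `runPipeline` and the output file names are illustrative, and the files the `process` command writes may be named differently.

```typescript
import { execFileSync } from "node:child_process";
import { mkdirSync } from "node:fs";
import { join } from "node:path";

const PKG = "@krasnoperov/transcribe@latest";

// Each step is the argv passed to `npx`, using only commands and flags from this README.
function buildSteps(input: string, outDir: string): string[][] {
  const vtt = join(outDir, "transcript.vtt");
  const md = join(outDir, "summary.md");
  const png = join(outDir, "infographic.png");
  return [
    ["-y", PKG, "transcribe", input, "-o", vtt], // audio/video → VTT
    ["-y", PKG, "summarize", vtt, "-o", md],     // VTT → markdown summary
    ["-y", PKG, "infographic", md, "-o", png],   // summary → image
  ];
}

// Runs the steps in order. execFileSync throws on a non-zero exit code,
// so a failed transcription stops the chain before summarization starts.
function runPipeline(input: string, outDir: string): void {
  mkdirSync(outDir, { recursive: true });
  for (const args of buildSteps(input, outDir)) {
    execFileSync("npx", args, { stdio: "inherit" });
  }
}

// Usage (requires API keys and ffmpeg): runPipeline("meeting.mp4", "./output");
```

Running the steps separately lets you inspect or hand-edit the transcript before summarizing; the `process` command is the simpler choice when no intermediate review is needed.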

Examples

See the `skills/transcribe/examples/` directory:

1. `01-basic-workflow.sh` - Step-by-step transcription pipeline
2. `02-all-in-one.sh` - Single command processing

Transcribe with a specific language and model

```bash
npx -y @krasnoperov/transcribe@latest transcribe podcast.mp3 \
  --language es \
  --model gpt-4o-transcribe-diarize \
  -o podcast.vtt
```

Transcribe with Gemini 3

```bash
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 \
  --model gemini-3 \
  -o meeting.vtt
```

Gemini 3 includes built-in speaker diarization and can handle long recordings (up to roughly 8 hours).

Output (VTT with speaker tags):

```
WEBVTT

00:00:00.000 --> 00:00:02.450
Welcome to the podcast...

00:00:02.850 --> 00:00:08.200
Thanks for having me...
```
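
To post-process a transcript in code, the cues can be parsed with a small helper. This TypeScript sketch assumes speakers are encoded as standard WebVTT voice spans (`<v Name>text`); the exact speaker-tag format this tool writes is an assumption, so check it against a real output file. It reads only timing lines and cue text, skips `NOTE`/`STYLE` blocks, and handles one voice span per cue. `parseVtt` is a hypothetical name, not part of the package.

```typescript
interface Cue {
  start: number;    // seconds
  end: number;      // seconds
  speaker?: string; // from a <v Name> voice span, if present (assumed format)
  text: string;
}

// Converts "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds.
function toSeconds(ts: string): number {
  return ts.split(":").map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

const TIMING = /^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})/;
const VOICE = /^<v(?:\.[^\s>]+)?\s+([^>]+)>/; // <v Name> or <v.class Name>

function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  // Blocks are separated by blank lines; the first block is the WEBVTT header.
  const blocks = vtt.replace(/\r\n/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const idx = lines.findIndex((line) => TIMING.test(line));
    if (idx === -1) continue; // header, NOTE, or STYLE block
    const m = TIMING.exec(lines[idx])!;
    let text = lines.slice(idx + 1).join("\n").trim();
    let speaker: string | undefined;
    const v = VOICE.exec(text);
    if (v) {
      speaker = v[1].trim();
      text = text.slice(v[0].length).replace(/<\/v>/g, "").trim();
    }
    cues.push({ start: toSeconds(m[1]), end: toSeconds(m[2]), speaker, text });
  }
  return cues;
}
```

With the cues in hand, per-speaker talk time is a sum of `end - start` grouped by `speaker`.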

Summarize with a custom prompt

```bash
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt \
  --prompt "Focus on action items and decisions" \
  -o summary.md
```

Create a styled infographic

```bash
npx -y @krasnoperov/transcribe@latest infographic summary.md \
  --style "modern minimal corporate" \
  -o infographic.png
```

Options

transcribe

```
--model         Transcription model:
                  OpenAI: gpt-4o-transcribe-diarize (default), gpt-4o-transcribe, whisper-1
                  Google: gemini-3
--language      Language code (en, es, ru, de, etc.)
-o, --output    Output VTT file
```
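
When scripting the CLI, checking `--model` against the values listed above catches typos before any upload. A hypothetical TypeScript helper (`transcribeArgs` is not part of the package) that builds the `npx` argument list from these options; the model list mirrors this README and the CLI may accept other values.

```typescript
// Model names as listed in this README's options (assumption: the list is complete).
const MODELS: readonly string[] = [
  "gpt-4o-transcribe-diarize",
  "gpt-4o-transcribe",
  "whisper-1",
  "gemini-3",
];

interface TranscribeOptions {
  model?: string;    // omitted → CLI default (gpt-4o-transcribe-diarize)
  language?: string; // e.g. "en", "es", "ru", "de"
  output?: string;   // VTT path passed via -o
}

function transcribeArgs(input: string, opts: TranscribeOptions = {}): string[] {
  const args = ["-y", "@krasnoperov/transcribe@latest", "transcribe", input];
  if (opts.model !== undefined) {
    // Fail locally instead of after audio extraction and upload.
    if (!MODELS.includes(opts.model)) {
      throw new Error(`Unknown model: ${opts.model}`);
    }
    args.push("--model", opts.model);
  }
  if (opts.language) args.push("--language", opts.language);
  if (opts.output) args.push("-o", opts.output);
  return args;
}
```

The result can be passed straight to Node's `execFileSync("npx", args, { stdio: "inherit" })`.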

summarize

```
--prompt        Custom summarization instructions
-o, --output    Output markdown file
```

infographic

```
--style         Style instructions for visual
--reference     Reference image for style
-o, --output    Output image file
```

process

```
--output-dir    Output directory for all files
--language      Language for transcription
--model         Transcription model
--style         Style for infographic
```


Requirements
- Node.js >= 18.0.0
- ffmpeg (for audio extraction)

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
```
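
A quick pre-flight check for both requirements, sketched in TypeScript with Node's built-in `child_process.spawnSync`; running `ffmpeg -version` is a common way to confirm ffmpeg is on `PATH`. The function names are illustrative and not part of the package.

```typescript
import { spawnSync } from "node:child_process";

// True if a "major.minor.patch" version string (optionally "v"-prefixed)
// has a major version of at least minMajor.
function meetsNodeMajor(version: string, minMajor = 18): boolean {
  const major = Number(version.replace(/^v/, "").split(".")[0]);
  return Number.isInteger(major) && major >= minMajor;
}

// spawnSync reports a missing binary via result.error (ENOENT) rather than throwing.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

const problems: string[] = [];
if (!meetsNodeMajor(process.versions.node)) {
  problems.push(`Node.js >= 18 required, found ${process.versions.node}`);
}
if (!hasFfmpeg()) {
  problems.push("ffmpeg not found on PATH");
}
console.log(problems.length === 0 ? "Requirements OK" : problems.join("\n"));
```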


Development

```bash
npm run build      # Build TypeScript
npm run typecheck  # Type checking
npm run test       # Run tests
npm run dev        # Dev mode with type stripping
```

License

See LICENSE file for details.
