# StoryCanvas

Transform books and text into multimedia content using Google Gemini AI.
StoryCanvas is an interactive CLI tool that converts text files (TXT, PDF, EPUB) into illustrated videos with AI-generated images, narration, and background music. It leverages the Google Gemini ecosystem including Imagen for image generation, Gemini TTS for narration, and Veo for AI video generation.
## Features

- Multiple Input Formats: Support for TXT, PDF, EPUB, and Markdown files
- Project Gutenberg Integration: Search and download classic books directly
- AI Image Generation: Create character and scene illustrations using Imagen 4 or Nano Banana
- TTS Narration: Generate spoken narration with 30+ voice options
- Video Creation: Choose between image slideshow or Veo AI video generation
- Background Music: Mix in royalty-free background music
- YouTube Metadata: Auto-generate titles, descriptions, and tags
## Installation

```bash
npm install -g storycanvas
```

Or run directly with npx:

```bash
npx storycanvas
```
## Requirements

- Node.js 22 or later
- Google Gemini API key (available from Google AI Studio)
## Quick Start

1. Run the setup wizard:

   ```bash
   storycanvas onboard
   ```

   This will guide you through:
   - API key configuration
   - Model selection
   - Output directory setup

2. Create multimedia from a book:

   ```bash
   storycanvas create --file my-book.epub
   ```

3. Or download a book from Project Gutenberg:

   ```bash
   storycanvas create --gutenberg 74  # The Adventures of Tom Sawyer
   ```
## Commands

### `storycanvas onboard`

Interactive setup wizard for first-time configuration. Sets up your API key, preferred models, and output directories.
### `storycanvas create`

Create multimedia content from text.

```bash
# Interactive mode
storycanvas create
```
Options:

- `-f, --file`: Path to input file
- `-g, --gutenberg`: Project Gutenberg book ID
- `-s, --stages`: Comma-separated list of stages to run (`illustrations`, `narration`, `video`, `music`, `metadata`)
- `-m, --mode`: Video mode (`slideshow` or `veo`)
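Because `--stages` takes a comma-separated list, a mistyped stage name is easy to slip into a script. The following is a hypothetical Bash wrapper (the `validate_stages` and `create_with_stages` names are illustrative, not part of StoryCanvas) that checks each entry against the five stage names listed above before calling `storycanvas create` with the documented `--file`, `--stages`, and `--mode` flags. It assumes exact, case-sensitive matching; how StoryCanvas itself treats unknown stage names is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: validate a --stages list before passing it to storycanvas.

# The five stage names documented for --stages.
VALID_STAGES=(illustrations narration video music metadata)

# Succeeds only if every comma-separated entry is a known stage name.
validate_stages() {
  local list="$1" stage valid found
  local -a stages
  IFS=',' read -r -a stages <<< "$list"
  (( ${#stages[@]} > 0 )) || return 1   # reject an empty list

  for stage in "${stages[@]}"; do
    found=0
    for valid in "${VALID_STAGES[@]}"; do
      if [[ "$stage" == "$valid" ]]; then
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "unknown stage: '$stage'" >&2
      return 1
    fi
  done
}

# Validates the stage list, then forwards it to the real CLI (mode defaults to slideshow).
create_with_stages() {
  local file="$1" stages="$2" mode="${3:-slideshow}"
  validate_stages "$stages" || return 1
  storycanvas create --file "$file" --stages "$stages" --mode "$mode"
}
```

For example, `create_with_stages my-book.epub illustrations,narration,video` would forward `--stages illustrations,narration,video --mode slideshow` to the CLI, while a typo such as `ilustrations` stops the script before any API calls are made.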
### `storycanvas books`

Browse and download books from Project Gutenberg.

```bash
# Interactive mode
storycanvas books

# Search for books
storycanvas books --search "alice wonderland"

# Download by ID
storycanvas books --download 11

# List downloaded books
storycanvas books --list
```
### `storycanvas doctor`

Run diagnostics to check your setup.

```bash
storycanvas doctor
```

Checks:
- Node.js version
- FFmpeg availability
- API key validity
- Configuration status
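If `storycanvas doctor` cannot start at all (for example, because the installed Node.js is too old), two of these checks can be reproduced by hand. The following is a minimal Bash sketch, not the tool's own implementation; `node_version_ok` and `check_setup` are hypothetical names, and the threshold of 22 comes from the requirements above.

```bash
#!/usr/bin/env bash
# Hypothetical manual checks mirroring two items from `storycanvas doctor`.

# Succeeds if a `node --version` string (e.g. "v22.11.0") has major version 22 or later.
node_version_ok() {
  local version="${1#v}"          # strip the leading "v"
  local major="${version%%.*}"    # keep everything before the first dot
  [[ "$major" =~ ^[0-9]+$ ]] && (( 10#$major >= 22 ))
}

# Prints one status line for Node.js and one for FFmpeg.
check_setup() {
  if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
    echo "node: ok ($(node --version))"
  else
    echo "node: missing or older than 22"
  fi

  if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: ok"
  else
    echo "ffmpeg: not found on PATH"
  fi
}

check_setup
```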
### `storycanvas config`

View and manage configuration.

```bash
# Show current config
storycanvas config --show

# Edit interactively
storycanvas config --edit

# Reset to defaults
storycanvas config --reset

# Show config file path
storycanvas config --path
```

## Configuration
Configuration is stored in `~/.storycanvasrc`. You can edit it manually or use `storycanvas config --edit`.

```json
{
"apiKey": "your-gemini-api-key",
"models": {
"text": "gemini-2.5-flash",
"image": "imagen-4.0-fast-generate-001",
"tts": "gemini-2.5-flash-preview-tts",
"video": "veo-3.1-fast"
},
"image": {
"maxCharacterImages": 30,
"maxSceneImages": 50,
"aspectRatio": "9:16",
"personGeneration": "allow_adult"
},
"video": {
"mode": "slideshow",
"fps": 0.5,
"resolution": "1080p"
},
"tts": {
"enabled": true,
"voice": "Kore"
},
"audio": {
"musicVolume": 0.3,
"narrationVolume": 1.0
},
"directories": {
"output": "./storycanvas-output",
"music": "./music",
"books": "./books"
}
}
```

## Available Models

### Text Models

- `gemini-2.5-flash` (default, fast)
- `gemini-2.5-pro` (enhanced reasoning)

### Image Models

- `imagen-4.0-fast-generate-001` (default, fast)
- `imagen-4.0-ultra-generate-001` (highest quality)
- `imagen-4.0-generate-001` (standard)
- `gemini-2.5-flash-image` (Nano Banana, native Gemini)
- `gemini-3-pro-image-preview` (Nano Banana Pro)

### TTS Models

- `gemini-2.5-flash-preview-tts` (default)
- `gemini-2.5-pro-preview-tts`

### Video Models

- `veo-3.1-fast` (default, faster)
- `veo-3.1` (higher quality)

## Pipeline Stages
1. Input Processing: Extract text from TXT/PDF/EPUB or download from Gutenberg
2. Illustration Generation: Create character and scene images with AI
3. Narration: Generate TTS audio from the text
4. Video Creation: Combine images into slideshow or generate with Veo
5. Background Music: Mix in audio tracks
6. Metadata Generation: Create YouTube-ready title, description, and tags
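The slideshow and background-music stages map naturally onto FFmpeg, which StoryCanvas uses through fluent-ffmpeg. The Bash sketch below is an assumption about what an equivalent plain `ffmpeg` command could look like, not the command StoryCanvas actually runs. It treats `video.fps` as the slideshow frame rate (so the default of 0.5 shows each image for 1 / 0.5 = 2 seconds), scales narration and music by `audio.narrationVolume` and `audio.musicVolume`, loops the music under the narration, and mixes the two with the `amix` filter. The `render_slideshow` function, the `scene_001.png`-style file names, and the `FPS`, `NARRATION_VOLUME`, `MUSIC_VOLUME`, and `DRY_RUN` variables are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the slideshow + background-music stages using plain ffmpeg.
# Defaults mirror "video.fps" and "audio.*" from the example config above.

render_slideshow() {
  local images_dir="$1" narration="$2" music="$3" out="$4"
  local fps="${FPS:-0.5}"
  local narr_vol="${NARRATION_VOLUME:-1.0}"
  local music_vol="${MUSIC_VOLUME:-0.3}"

  local -a cmd=(
    ffmpeg -y
    -framerate "$fps" -i "$images_dir/scene_%03d.png"  # input 0: each image lasts 1/fps seconds
    -i "$narration"                                     # input 1: TTS narration
    -stream_loop -1 -i "$music"                         # input 2: music, looped indefinitely
    -filter_complex "[1:a]volume=${narr_vol}[n];[2:a]volume=${music_vol}[m];[n][m]amix=inputs=2:duration=first[a]"
    -map 0:v -map "[a]"
    -c:v libx264 -r 30 -pix_fmt yuv420p                 # widely playable H.264 output
    -shortest "$out"
  )

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print the assembled command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

With real files in place, `render_slideshow ./images narration.wav ./music/track.mp3 story.mp4` would write the video. Because of `-shortest`, the output stops at whichever ends first, the slideshow or the narration, so the number of scene images and the narration length need to roughly agree.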
## Background Music
Place your royalty-free music files in the `./music` directory (or set a different path with `directories.music`). Supported formats: MP3, M4A, WAV, AAC, OGG.

## License

MIT
Built with:
- Google Gemini API
- @clack/prompts for terminal UI
- fluent-ffmpeg for video processing