```bash
# Arch Linux
sudo pacman -S texlive-core texlive-xetex
```
## CLI Usage
### Setup
Before using the CLI, configure your API keys:
```bash
summary setup
```
This interactive command will prompt you for:
- OpenAI API Key (required)
- Rainforest API Key (optional - for Amazon book search)
- ElevenLabs API Key (optional - for audio generation, [get key here](https://try.elevenlabs.io/oh7kgotrpjnv))
- 2Captcha API Key (optional - for CAPTCHA solving, [sign up here](https://2captcha.com/?from=9630996))
- Browserless API Key (optional)
- Browser and proxy settings
Configuration is saved to `~/.config/summary-forge/settings.json` and used automatically by all CLI commands.
### Managing Configuration
```bash
# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete
```
Note: The CLI uses configuration in this priority order:
1. Environment variables (from a `.env` file or your shell)
2. Config file (`~/.config/summary-forge/settings.json`)
### Interactive Mode

Interactive mode launches a menu where you can:
- Process local files (PDF/EPUB)
- Process web page URLs
- Search for books by title
- Look up books by ISBN/ASIN
Web page processing features:
- Automatically fetches web page content using Puppeteer
- Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
- Saves the web page as a PDF for processing
- Generates a clean title from the page title, or uses OpenAI to create one
- Uses prompts specifically optimized for web page content (ignores nav/ads/footers)
- Creates the same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)
### Searching for Books
```bash
# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for the search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force       # Auto-select first result
summary title "Python" --source anna     # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force          # Auto-select and process
summary isbn 9780134685991 --source anna # Use Anna's Archive

# Common Options:
#   --source           Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results  Maximum results to display (default: 10)
#   -f, --force        Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from        Filter by publication year from (e.g., 2020)
#   --year-to          Filter by publication year to (e.g., 2024)
#   -l, --languages    Language filter, comma-separated (default: english)
#   --sources          Data sources, comma-separated (default: all sources)
#                      Options: zlib, lgli, lgrs, and others
```
### ISBN/ASIN Lookup
```bash
summary isbn B075HYVHWK

# Force overwrite if the directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f
```
### Getting Help
```bash
summary --help
summary file --help
```
## Programmatic Usage
### Return Values
All methods now return consistent JSON objects with the following structure:
```javascript
{
  success: true | false,  // Indicates if the operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}
```
This enables:
- ✅ Consistent error handling - Check the `success` field instead of relying on try-catch
- ✅ REST API ready - Direct JSON responses for HTTP endpoints
- ✅ Better debugging - Rich metadata in all responses
- ✅ Type-safe - Predictable structure for TypeScript users
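Because every method returns the same shape, results can be handled uniformly. A minimal sketch of that pattern follows; the file path and identifier are placeholders, and `processFile` with its `archive`/`error` fields is the method shown in the download example later in this README:

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({ openaiApiKey: process.env.OPENAI_API_KEY });

// Every result carries `success` plus either data fields or an error message.
const result = await forge.processFile('./book.pdf', 'my-book'); // placeholder path and identifier
if (result.success) {
  console.log('Summary created:', result.archive);
} else {
  console.error('Operation failed:', result.error);
}
```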
### Basic Usage
```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);
```
Alternatively, pass options directly to the constructor:

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',

  // Optional API keys
  rainforestApiKey: 'your-key',  // For Amazon search
  elevenlabsApiKey: 'sk-...',    // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',  // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key', // For browserless.io

  // Processing options
  maxChars: 500000,       // Max chars to process
  maxTokens: 20000,       // Max tokens in the output summary
  maxInputTokens: 250000, // Max input tokens per API call (default: 250000 for GPT-5)
});
```
Downloading and processing a 1lib.sk search result:

```javascript
// Download the first result (searchResult comes from a prior 1lib.sk search)
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );

  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);

    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}
```
Enhanced Error Handling:
The 1lib.sk download functionality includes robust error handling with automatic debugging:
- Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
- Debug HTML Capture: Saves the page HTML when the download button isn't found
- Link Analysis: Lists all links on the page for troubleshooting
- Detailed Error Messages: Provides actionable information for debugging
If a download fails, check the `debug-book-page.html` file in the book's directory for detailed page structure information.
### API Reference
#### Constructor Options
```javascript
new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io

  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in the output summary (default: 16000)
  maxInputTokens: number,    // Optional: Max input tokens per API call (default: 250000 for GPT-5)
});
```
When using the module programmatically, configuration is loaded in this order (highest priority first):

1. Constructor options - Passed directly to `new SummaryForge(options)`
2. Environment variables - From a `.env` file or your shell
3. Config file - From `~/.config/summary-forge/settings.json` (CLI only)
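One practical consequence is that constructor options always win. A minimal sketch of overriding a single config-file value, using `maxTokens` (one of the documented options):

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

const configResult = await loadConfig();

// Constructor options (highest priority) override values loaded from the config file.
const forge = new SummaryForge({
  ...configResult.config,
  maxTokens: 16000, // explicit option takes precedence
});
```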
### Proxy Configuration
To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:
```bash
summary setup
```
When prompted:
1. Enable proxy: Yes
2. Enter proxy URL: `http://your-proxy.com:8080`
3. Enter proxy username and password
Why use a proxy?
- ✅ Avoids IP bans from Anna's Archive
- ✅ USA-based proxies prevent geo-location issues
- ✅ Works with both browser navigation and file downloads
- ✅ Automatically applied to all download operations
Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:
- USA-based IPs (no geo-location issues)
- Fast and reliable
- Affordable pricing with a free tier
- HTTP/HTTPS/SOCKS5 support
Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not a rotating one) to maintain the same IP:

1. In your Webshare dashboard, go to Proxy → List
2. Copy a Static Proxy endpoint (not the rotating endpoint)
3. Use that static endpoint as your proxy URL when running `summary setup`
The tool automatically generates a unique session ID (1 to `PROXY_POOL_SIZE`) for each download to get a fresh IP, while maintaining that IP throughout the 5-10 minute download process.
Proxy Pool Size Configuration:

Set `PROXY_POOL_SIZE` to match your Webshare plan (default: 36):
- Free tier: 10 proxies → `PROXY_POOL_SIZE=10`
- Starter plan: 25 proxies → `PROXY_POOL_SIZE=25`
- Professional plan: 100 proxies → `PROXY_POOL_SIZE=100`
- Enterprise plan: 250+ proxies → `PROXY_POOL_SIZE=250`

The tool randomly selects a session ID from 1 to your pool size, distributing load across all available proxies (see the sketch below).
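A minimal sketch of that session selection, illustrative only and not the actual implementation:

```javascript
// Pick a random session ID between 1 and the configured pool size.
const poolSize = Number(process.env.PROXY_POOL_SIZE ?? 36); // default pool size of 36
const sessionId = Math.floor(Math.random() * poolSize) + 1; // 1 .. poolSize

console.log(`Using proxy session ${sessionId} of ${poolSize}`);
```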
Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN (sketched below):
- Real ISBNs (10 or 13 numeric digits): searches by ISBN for precise results
- Amazon ASINs (alphanumeric): searches by book title instead for better results
- This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
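A minimal sketch of this heuristic, illustrative only; the real implementation may differ:

```javascript
// Treat 10- or 13-digit identifiers as ISBNs; anything else is handled as an Amazon ASIN.
function looksLikeIsbn(identifier) {
  const cleaned = identifier.replace(/[-\s]/g, '');
  return /^\d{10}$/.test(cleaned) || /^\d{13}$/.test(cleaned);
}

console.log(looksLikeIsbn('9780134685991')); // true  -> search Anna's Archive by ISBN
console.log(looksLikeIsbn('B075HYVHWK'));    // false -> ASIN, search by book title instead
```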
Note: Rotating proxies (`p.webshare.io`) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.
Testing your proxy:
```bash
node test-proxy.js
```
This will verify your proxy configuration by attempting to download a book.
### Audio Generation
Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.
Get an ElevenLabs API Key: [Sign up here](https://try.elevenlabs.io/oh7kgotrpjnv) for high-quality text-to-speech.
Features:
- Uses the ElevenLabs Turbo v2.5 model (optimized for audiobooks)
- Default voice: Brian (best for technical content, customizable)
- Automatically truncates long texts to fit API limits
- Generates high-quality MP3 audio files
- Natural, conversational narration style
## Output
The tool generates:
- `_summary.md` - Markdown summary
- `_summary.txt` - Plain text summary
- `_summary.pdf` - PDF summary with table of contents
- `_summary.epub` - EPUB summary with clickable TOC
- `_summary.mp3` - Audio summary (if an ElevenLabs key is provided)
- `.pdf` - Original or converted PDF
- `.epub` - Original EPUB (if the input was EPUB)
- `_bundle.tgz` - Compressed archive containing all files
## Example Workflow
```bash
# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub
```
## How It Works
1. Input Processing: Accepts PDF or EPUB files (EPUB is converted to PDF)
2. Smart Processing Strategy:
   - Small PDFs (<400k chars): Direct upload to OpenAI's vision API
   - Large PDFs (>400k chars): Intelligent chunking with synthesis
3. AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
4. Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
5. Audio Generation: Optional TTS conversion using ElevenLabs
6. Bundling: Creates a compressed archive with all generated files
### Intelligent Chunking for Large PDFs
For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:
How it works:
1. Analysis: Calculates the optimal chunk size based on PDF statistics and GPT-5's token limits
2. Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
3. Page-Based Chunking: Splits the PDF into logical chunks that fit within token limits (see the sketch after this list)
4. Parallel Processing: Each chunk is summarized independently by GPT-5
5. Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
6. Quality Preservation: Maintains narrative flow and eliminates redundancy
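The page-based chunking step can be pictured with a short sketch, illustrative only; the real chunking logic may differ:

```javascript
// Group pages into chunks without exceeding a character budget per chunk.
function chunkPages(pages, maxCharsPerChunk) {
  const chunks = [];
  let current = { pages: [], chars: 0 };

  for (const page of pages) {
    // Start a new chunk when adding this page would exceed the budget.
    if (current.pages.length > 0 && current.chars + page.text.length > maxCharsPerChunk) {
      chunks.push(current);
      current = { pages: [], chars: 0 };
    }
    current.pages.push(page);
    current.chars += page.text.length;
  }

  if (current.pages.length > 0) chunks.push(current);
  return chunks;
}
```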
Token Limit Handling:
- GPT-5 Input Limit: 272,000 tokens
- System Overhead: 20,000 tokens reserved for prompts and instructions
- Available Tokens: 250,000 tokens for content
- Safety Margin: 70% utilization to account for token estimation variance
- Chunk Size: ~565,000 characters per chunk (based on a 3.5 chars/token estimate)
Benefits:
- ✅ Complete Coverage: Processes entire books without truncation
- ✅ High Quality: Each section gets full AI attention
- ✅ Seamless Output: The final summary reads as a unified document
- ✅ Cost Efficient: Optimizes token usage across multiple API calls
- ✅ Automatic: No configuration needed - works transparently
- ✅ Token-Aware: Respects API limits to prevent errors
Example Output:
```
PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
PDF is large - using intelligent chunking strategy
This will process the ENTIRE 523-page PDF without truncation
Using chunk size: 120,000 chars
Created 11 chunks for processing
  Chunk 1: Pages 1-48 (119,234 chars)
  Chunk 2: Pages 49-95 (118,901 chars)
  ...
All 11 chunks processed successfully
Synthesizing chunk summaries into final comprehensive summary...
Final summary synthesized: 45,678 characters
```
### Direct PDF Upload (Vision API)
The tool prioritizes OpenAI's vision API for direct PDF upload when possible:
- ✅ Better Quality: Preserves document formatting, tables, and diagrams
- ✅ More Accurate: The AI can see the actual PDF layout and structure
- ✅ Better for Technical Books: Code examples and diagrams are preserved
- ✅ Fallback Strategy: Automatically switches to intelligent chunking for large files
## Testing
Summary Forge includes a comprehensive test suite using Vitest.
### Running Tests
```bash
# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage
```
### Test Coverage
The test suite includes:
- ✅ 30+ passing tests
- Constructor validation
- Helper method tests
- PDF upload functionality tests
- API integration tests
- Error handling tests
- Edge case coverage
- File operation tests
See `test/summary-forge.test.js` for the complete test suite.
## Flashcard Generation
Summary Forge includes powerful flashcard generation capabilities for study and review.
### Flashcard PDFs
Generate double-sided flashcard PDFs optimized for printing:
```javascript
import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';
```
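A hypothetical usage sketch follows; the real function signatures are not documented here and may differ, so treat the arguments as placeholders and see the `examples/` directory for working code:

```javascript
import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read a previously generated markdown summary (placeholder path).
const markdown = await fs.readFile('./my-book_summary.md', 'utf8');

// Hypothetical: extract Q&A pairs from the markdown (see supported formats below).
const flashcards = extractFlashcards(markdown);

// Hypothetical: render a printable, double-sided flashcard PDF.
await generateFlashcardsPDF(flashcards, './flashcards.pdf');
```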
Flashcard image generation produces one PNG per card side:
- `q-001.png`, `q-002.png`, etc. - Question cards
- `a-001.png`, `a-002.png`, etc. - Answer cards
Use Cases:
- Web-based flashcard applications
- Mobile learning apps
- Interactive quiz games
- Study progress tracking systems
- Spaced repetition software
Features:
- ✅ Clean, professional design with the book title
- ✅ Automatic text wrapping for long content
- ✅ Customizable dimensions and styling
- ✅ SVG-based rendering for crisp quality
- ✅ Works in Docker (no native dependencies)
### Supported Markdown Formats
The `extractFlashcards` function supports multiple markdown formats:
1. Explicit Q&A Format:
```markdown
Q: What is a closure?
A: A closure is a function that has access to variables in its outer scope.
```
2. Definition Lists:
```markdown
Closure
: A function that has access to variables in its outer scope.
```
3. Question Headers:
```markdown
## What is a closure?

A closure is a function that has access to variables in its outer scope.
```
## Examples
See the `examples/` directory for more usage examples:
- `programmatic-usage.js` - Using as a module
- `flashcard-images-demo.js` - Generating flashcard images
## Troubleshooting
### Rate Limiting (1lib.sk)
If you encounter "Too many requests" errors from 1lib.sk:
Error Message:
```
Too many requests from your IP xxx.xxx.xxx.xxx Please wait 10 seconds. support@z-lib.fm. Err #ipd1
```
Automatic Handling: The tool automatically detects rate limiting and (a sketch follows this list):
- ✅ Waits the requested time (usually 10 seconds)
- ✅ Retries up to 3 times with exponential backoff
- ✅ Adds a 2-second buffer to ensure the rate limit has cleared
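The retry behaviour can be pictured with a short sketch, illustrative only and not the actual implementation:

```javascript
// Retry a task up to 3 times when a rate-limit error is detected,
// waiting the requested 10 seconds (backed off) plus a 2-second buffer.
async function withRateLimitRetry(task, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      const rateLimited = /too many requests/i.test(String(err?.message));
      if (!rateLimited || attempt === maxRetries) throw err;
      const waitSeconds = 10 * 2 ** attempt + 2;
      await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
    }
  }
}
```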
Manual Solutions:
1. Wait a few minutes before trying again
2. Use a different proxy session (the tool rotates through your proxy pool automatically)
3. Switch to Anna's Archive: `summary search "book title" --source anna`
4. Reduce concurrent requests if running multiple downloads
Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.
### Download Button Not Found (1lib.sk)
If you encounter "Download button not found" errors when downloading from 1lib.sk:
1. Check Debug Files: The tool automatically saves `debug-book-page.html` in the book's directory
   - Open this file to inspect the actual page structure
   - Look for download links or buttons that might have different selectors
2. Review Error Output: The error message includes:
   - All selectors that were tried
   - A list of links found on the page
   - The location of the debug HTML file
3. Common Causes:
   - Z-Access/Library Access Page: The book page redirects to an authentication page (most common)
   - Page structure changed (1lib.sk updates their site)
   - Book is deleted or unavailable
   - Session expired or cookies not maintained
   - Proxy issues preventing proper page load
4. Solutions:
   - Recommended: Use Anna's Archive instead: `summary search "book title" --source anna`
   - Try the search1lib command separately to verify the book exists
   - Check if the book page loads correctly in a regular browser with the same proxy
   - Verify that the proxy configuration is working correctly
   - Try a different book from the search results
5. Known Issue - Z-Access Page: If you see links to `library-access.sk` or a Z-Access page in the debug output, this means:
   - The book page requires authentication or special access
   - 1lib.sk's session management is blocking automated access
   - Workaround: Use Anna's Archive, which has better automation support
Example Debug Output (Z-Access Issue):
```
Download button not found on book page
Debug HTML saved to: ./uploads/book_name/debug-book-page.html
Found 6 links on page
First 5 links:
  - https://library-access.sk (Z-Access page)
  - mailto:blackbox@z-library.so (blackbox@z-library.so)
  - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)
```
Recommended Alternative:
```bash
# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna
```
### Blocked by Anna's Archive
If you're getting blocked by Anna's Archive:
1. Enable proxy in your configuration:
```bash
summary setup
```
2. Use a USA-based proxy to avoid geo-location issues
3. Test your proxy before downloading:
```bash
node test-proxy.js B0BCTMXNVN
```
4. Run browser in visible mode to debug:
```bash
summary config --headless false
```
### How the Proxy Is Used
The proxy is used for:
- ✅ Browser navigation (Puppeteer)
- ✅ File downloads (fetch with https-proxy-agent)
- ✅ All HTTP requests to Anna's Archive
Recommended Service: Webshare.io - reliable USA-based proxies with a free tier available.
Webshare Sticky Sessions: Add `-session-` to your proxy URL to maintain the same IP:

```
http://p.webshare.io:80-session-myapp123
```
## CAPTCHA Solving
When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:
1. Sign up for 2Captcha: [Get an API key here](https://2captcha.com/?from=9630996)
2. Add it to your configuration:

```bash
summary setup
```

3. Enter your 2Captcha API key when prompted
The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.
## Limitations
- Maximum PDF file size: No practical limit (intelligent chunking handles any size)
- GPT-5 uses the default temperature of 1 (not configurable)
- Requires external tools: Calibre, Pandoc, XeLaTeX
- CAPTCHA solving requires a 2captcha.com API key (optional)
- Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
- Anna's Archive may block IPs without proxy configuration
- Chunked processing uses text extraction (images/diagrams are described in text only)
## Roadmap
- [x] ISBN/ASIN lookup via Anna's Archive
- [x] Automatic download from Anna's Archive with CAPTCHA solving
- [x] Book title search via Rainforest API
- [x] CLI with interactive mode
- [x] ESM module for programmatic use
- [x] Audio generation with ElevenLabs TTS
- [x] Direct PDF upload to OpenAI vision API
- [x] EPUB format prioritization (open standard)
- [ ] Support for more input formats (MOBI, AZW3)
- [ ] Chunked processing for very large books (>100MB)
- [ ] Custom summary templates
- [ ] Web interface
- [ ] Multiple voice options for audio
- [ ] Audio chapter markers
- [ ] Batch processing of multiple books
## License
ISC
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.