```bash
# Arch Linux
sudo pacman -S texlive-core texlive-xetex
```
## CLI Usage
### Setup
Before using the CLI, configure your API keys:
```bash
summary setup
```
This interactive command will prompt you for:
- OpenAI API Key (required)
- Rainforest API Key (optional - for Amazon book search)
- ElevenLabs API Key (optional - for audio generation, [get key here](https://try.elevenlabs.io/oh7kgotrpjnv))
- 2Captcha API Key (optional - for CAPTCHA solving, [sign up here](https://2captcha.com/?from=9630996))
- Browserless API Key (optional)
- Browser and proxy settings
Configuration is saved to `~/.config/summary-forge/settings.json` and used automatically by all CLI commands.
### Managing Configuration
```bash
# View current configuration
summary config

# Update configuration
summary setup

# Delete configuration
summary config --delete
```
Note: The CLI uses configuration in this priority order:
1. Environment variables (from a `.env` file or your shell)
2. Config file (`~/.config/summary-forge/settings.json`)
### Interactive Mode

Interactive mode launches a menu where you can:
- Process local files (PDF/EPUB)
- Process web page URLs
- Search for books by title
- Look up books by ISBN/ASIN
Web page processing features:
- Automatically fetches web page content using Puppeteer
- Sanitizes HTML to remove navigation, ads, footers, and other non-content elements
- Saves the web page as a PDF for processing
- Generates a clean title from the page title, or uses OpenAI to create one
- Uses prompts specifically optimized for web page content (ignores nav/ads/footers)
- Creates the same output formats as book processing (MD, TXT, PDF, EPUB, MP3, flashcards)
### Searching for Books
```bash
# Search for books (defaults to 1lib.sk - faster, no DDoS protection)
summary search "LLM Fine Tuning"
summary search "JavaScript" --max-results 5 --extensions pdf,epub
summary search "Python" --year-from 2020 --year-to 2024
summary search "Machine Learning" --languages english --order date

# Use Anna's Archive instead (has DDoS protection, slower)
summary search "Clean Code" --source anna
summary search "Rare Book" --source anna --sources zlib,lgli

# Title search (shortcut for the search command)
summary title "A Philosophy of Software Design"
summary title "Clean Code" --force       # Auto-select first result
summary title "Python" --source anna     # Use Anna's Archive

# ISBN lookup (defaults to 1lib.sk)
summary isbn 9780134685991
summary isbn B075HYVHWK --force          # Auto-select and process
summary isbn 9780134685991 --source anna # Use Anna's Archive

# Common Options:
#   --source           Search source: zlib (1lib.sk, default) or anna (Anna's Archive)
#   -n, --max-results  Maximum results to display (default: 10)
#   -f, --force        Auto-select first result and process immediately
#
# 1lib.sk Options (--source zlib, default):
#   --year-from        Filter by publication year from (e.g., 2020)
#   --year-to          Filter by publication year to (e.g., 2024)
#   -l, --languages    Language filter, comma-separated (default: english)
#   --sources          Data sources, comma-separated (default: all sources)
#                      Options: zlib, lgli, lgrs, and others
```
### ISBN/ASIN Lookup
```bash
summary isbn B075HYVHWK

# Force overwrite if the directory already exists
summary isbn B075HYVHWK --force
summary isbn B075HYVHWK -f
```
### Getting Help
```bash
summary --help
summary file --help
```
## Programmatic Usage
### Return Values
All methods now return consistent JSON objects with the following structure:
```javascript
{
  success: true | false,  // Indicates if the operation succeeded
  ...data,                // Method-specific data fields
  error?: string,         // Error message (only when success is false)
  message?: string        // Success message (optional)
}
```
This enables:
- ✅ Consistent error handling - Check the `success` field instead of relying on try-catch
- ✅ REST API ready - Direct JSON responses for HTTP endpoints
- ✅ Better debugging - Rich metadata in all responses
- ✅ Type-safe - Predictable structure for TypeScript users
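Because every method returns the same shape, results can be handled uniformly. A minimal sketch of that pattern follows; the file path and identifier are placeholders, and `processFile` with its `archive`/`error` fields is the method shown in the download example later in this README:

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({ openaiApiKey: process.env.OPENAI_API_KEY });

// Every result carries `success` plus either data fields or an error message.
const result = await forge.processFile('./book.pdf', 'my-book'); // placeholder path and identifier
if (result.success) {
  console.log('Summary created:', result.archive);
} else {
  console.error('Operation failed:', result.error);
}
```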
### Basic Usage
```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

// Load config from ~/.config/summary-forge/settings.json
const configResult = await loadConfig();
if (!configResult.success) {
  console.error('Failed to load config:', configResult.error);
  process.exit(1);
}

const forge = new SummaryForge(configResult.config);
```
Alternatively, pass options directly to the constructor:

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';

const forge = new SummaryForge({
  // Required
  openaiApiKey: 'sk-...',

  // Optional API keys
  rainforestApiKey: 'your-key',  // For Amazon search
  elevenlabsApiKey: 'sk-...',    // For audio generation (get key: https://try.elevenlabs.io/oh7kgotrpjnv)
  twocaptchaApiKey: 'your-key',  // For CAPTCHA solving (sign up: https://2captcha.com/?from=9630996)
  browserlessApiKey: 'your-key', // For browserless.io

  // Processing options
  maxChars: 500000,       // Max chars to process
  maxTokens: 20000,       // Max tokens in the output summary
  maxInputTokens: 250000, // Max input tokens per API call (default: 250000 for GPT-5)
});
```
Downloading and processing a 1lib.sk search result:

```javascript
// Download the first result (searchResult comes from a prior 1lib.sk search)
if (searchResult.results.length > 0) {
  const downloadResult = await forge.downloadFrom1lib(
    searchResult.results[0].url,
    '.',
    searchResult.results[0].title
  );

  if (downloadResult.success) {
    console.log('Downloaded:', downloadResult.filepath);

    // Process the downloaded book
    const processResult = await forge.processFile(downloadResult.filepath, downloadResult.identifier);
    if (processResult.success) {
      console.log('Summary created:', processResult.archive);
      console.log('Costs:', processResult.costs);
    } else {
      console.error('Processing failed:', processResult.error);
    }
  } else {
    console.error('Download failed:', downloadResult.error);
  }
}
```
Enhanced Error Handling:
The 1lib.sk download functionality includes robust error handling with automatic debugging:
- Multiple Selector Fallbacks: Tries 6 different selectors to find download buttons
- Debug HTML Capture: Saves the page HTML when the download button isn't found
- Link Analysis: Lists all links on the page for troubleshooting
- Detailed Error Messages: Provides actionable information for debugging
If a download fails, check the `debug-book-page.html` file in the book's directory for detailed page structure information.
### API Reference
#### Constructor Options
```javascript
new SummaryForge({
  // API Keys
  openaiApiKey: string,      // Required: OpenAI API key
  rainforestApiKey: string,  // Optional: For title search
  elevenlabsApiKey: string,  // Optional: For audio generation
  twocaptchaApiKey: string,  // Optional: For CAPTCHA solving
  browserlessApiKey: string, // Optional: For browserless.io

  // Processing Options
  maxChars: number,          // Optional: Max chars to process (default: 400000)
  maxTokens: number,         // Optional: Max tokens in the output summary (default: 16000)
  maxInputTokens: number,    // Optional: Max input tokens per API call (default: 250000 for GPT-5)
});
```
When using the module programmatically, configuration is loaded in this order (highest priority first):

1. Constructor options - Passed directly to `new SummaryForge(options)`
2. Environment variables - From a `.env` file or your shell
3. Config file - From `~/.config/summary-forge/settings.json` (CLI only)
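One practical consequence is that constructor options always win. A minimal sketch of overriding a single config-file value, using `maxTokens` (one of the documented options):

```javascript
import { SummaryForge } from '@profullstack/summary-forge-module';
import { loadConfig } from '@profullstack/summary-forge-module/config';

const configResult = await loadConfig();

// Constructor options (highest priority) override values loaded from the config file.
const forge = new SummaryForge({
  ...configResult.config,
  maxTokens: 16000, // explicit option takes precedence
});
```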
### Proxy Configuration
To avoid IP bans when downloading from Anna's Archive, configure a proxy during setup:
```bash
summary setup
```
When prompted:
1. Enable proxy: Yes
2. Enter proxy URL: `http://your-proxy.com:8080`
3. Enter proxy username and password
Why use a proxy?
- ✅ Avoids IP bans from Anna's Archive
- ✅ USA-based proxies prevent geo-location issues
- ✅ Works with both browser navigation and file downloads
- ✅ Automatically applied to all download operations
Recommended Proxy Service:

We recommend Webshare.io for reliable, USA-based proxies:
- USA-based IPs (no geo-location issues)
- Fast and reliable
- Affordable pricing with a free tier
- HTTP/HTTPS/SOCKS5 support
Important: Use Static Proxies for Sticky Sessions

For Anna's Archive downloads, you need a static/direct proxy (not a rotating one) to maintain the same IP:

1. In your Webshare dashboard, go to Proxy → List
2. Copy a Static Proxy endpoint (not the rotating endpoint)
3. Use that static endpoint as your proxy URL when running `summary setup`
The tool automatically generates a unique session ID (1 to `PROXY_POOL_SIZE`) for each download to get a fresh IP, while maintaining that IP throughout the 5-10 minute download process.
Proxy Pool Size Configuration:

Set `PROXY_POOL_SIZE` to match your Webshare plan (default: 36):
- Free tier: 10 proxies → `PROXY_POOL_SIZE=10`
- Starter plan: 25 proxies → `PROXY_POOL_SIZE=25`
- Professional plan: 100 proxies → `PROXY_POOL_SIZE=100`
- Enterprise plan: 250+ proxies → `PROXY_POOL_SIZE=250`

The tool randomly selects a session ID from 1 to your pool size, distributing load across all available proxies (see the sketch below).
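A minimal sketch of that session selection, illustrative only and not the actual implementation:

```javascript
// Pick a random session ID between 1 and the configured pool size.
const poolSize = Number(process.env.PROXY_POOL_SIZE ?? 36); // default pool size of 36
const sessionId = Math.floor(Math.random() * poolSize) + 1; // 1 .. poolSize

console.log(`Using proxy session ${sessionId} of ${poolSize}`);
```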
Smart ISBN Detection:

When searching Anna's Archive, the tool automatically detects whether an identifier is a real ISBN or an Amazon ASIN (sketched below):
- Real ISBNs (10 or 13 numeric digits): searches by ISBN for precise results
- Amazon ASINs (alphanumeric): searches by book title instead for better results
- This ensures you get relevant search results even when Amazon returns proprietary ASINs instead of standard ISBNs
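A minimal sketch of this heuristic, illustrative only; the real implementation may differ:

```javascript
// Treat 10- or 13-digit identifiers as ISBNs; anything else is handled as an Amazon ASIN.
function looksLikeIsbn(identifier) {
  const cleaned = identifier.replace(/[-\s]/g, '');
  return /^\d{10}$/.test(cleaned) || /^\d{13}$/.test(cleaned);
}

console.log(looksLikeIsbn('9780134685991')); // true  -> search Anna's Archive by ISBN
console.log(looksLikeIsbn('B075HYVHWK'));    // false -> ASIN, search by book title instead
```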
Note: Rotating proxies (`p.webshare.io`) don't support sticky sessions. Use individual static proxy IPs from your proxy list instead.
Testing your proxy:
```bash
node test-proxy.js
```
This will verify your proxy configuration by attempting to download a book.
### Audio Generation
Audio generation is optional and requires an ElevenLabs API key. If the key is not provided, the tool will skip audio generation and only create text-based outputs.
Get an ElevenLabs API Key: [Sign up here](https://try.elevenlabs.io/oh7kgotrpjnv) for high-quality text-to-speech.
Features:
- Uses the ElevenLabs Turbo v2.5 model (optimized for audiobooks)
- Default voice: Brian (best for technical content, customizable)
- Automatically truncates long texts to fit API limits
- Generates high-quality MP3 audio files
- Natural, conversational narration style
## Output
The tool generates:
- `_summary.md` - Markdown summary
- `_summary.txt` - Plain text summary
- `_summary.pdf` - PDF summary with table of contents
- `_summary.epub` - EPUB summary with clickable TOC
- `_summary.mp3` - Audio summary (if an ElevenLabs key is provided)
- `.pdf` - Original or converted PDF
- `.epub` - Original EPUB (if the input was EPUB)
- `_bundle.tgz` - Compressed archive containing all files
## Example Workflow
```bash
# 1. Search for a book
summary search
# Enter: "A Philosophy of Software Design"
# Select from results, get ASIN

# 2. Download and process automatically
summary isbn B075HYVHWK
# Downloads, asks if you want to process
# Creates summary bundle automatically!

# Alternative: Process a local file
summary file ~/Downloads/book.epub
```
## How It Works
1. Input Processing: Accepts PDF or EPUB files (EPUB is converted to PDF)
2. Smart Processing Strategy:
   - Small PDFs (<400k chars): Direct upload to OpenAI's vision API
   - Large PDFs (>400k chars): Intelligent chunking with synthesis
3. AI Summarization: GPT-5 analyzes content with full formatting, tables, and diagrams
4. Format Conversion: Uses Pandoc to convert the Markdown summary to PDF and EPUB
5. Audio Generation: Optional TTS conversion using ElevenLabs
6. Bundling: Creates a compressed archive with all generated files
### Intelligent Chunking for Large PDFs
For PDFs exceeding 400,000 characters (typically 500+ pages), the tool automatically uses an intelligent chunking strategy:
How it works:
1. Analysis: Calculates the optimal chunk size based on PDF statistics and GPT-5's token limits
2. Smart Token Management: Respects GPT-5's 272k input token limit with safety margins
3. Page-Based Chunking: Splits the PDF into logical chunks that fit within token limits (see the sketch after this list)
4. Parallel Processing: Each chunk is summarized independently by GPT-5
5. Intelligent Synthesis: All chunk summaries are combined into a cohesive final summary
6. Quality Preservation: Maintains narrative flow and eliminates redundancy
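The page-based chunking step can be pictured with a short sketch, illustrative only; the real chunking logic may differ:

```javascript
// Group pages into chunks without exceeding a character budget per chunk.
function chunkPages(pages, maxCharsPerChunk) {
  const chunks = [];
  let current = { pages: [], chars: 0 };

  for (const page of pages) {
    // Start a new chunk when adding this page would exceed the budget.
    if (current.pages.length > 0 && current.chars + page.text.length > maxCharsPerChunk) {
      chunks.push(current);
      current = { pages: [], chars: 0 };
    }
    current.pages.push(page);
    current.chars += page.text.length;
  }

  if (current.pages.length > 0) chunks.push(current);
  return chunks;
}
```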
Token Limit Handling:
- GPT-5 Input Limit: 272,000 tokens
- System Overhead: 20,000 tokens reserved for prompts and instructions
- Available Tokens: 250,000 tokens for content
- Safety Margin: 70% utilization to account for token estimation variance
- Chunk Size: ~565,000 characters per chunk (based on a 3.5 chars/token estimate)
Benefits:
- ✅ Complete Coverage: Processes entire books without truncation
- ✅ High Quality: Each section gets full AI attention
- ✅ Seamless Output: The final summary reads as a unified document
- ✅ Cost Efficient: Optimizes token usage across multiple API calls
- ✅ Automatic: No configuration needed - works transparently
- ✅ Token-Aware: Respects API limits to prevent errors
Example Output:
```
PDF Stats: 523 pages, 1,245,678 chars, ~311,420 tokens
PDF is large - using intelligent chunking strategy
This will process the ENTIRE 523-page PDF without truncation
Using chunk size: 120,000 chars
Created 11 chunks for processing
  Chunk 1: Pages 1-48 (119,234 chars)
  Chunk 2: Pages 49-95 (118,901 chars)
  ...
All 11 chunks processed successfully
Synthesizing chunk summaries into final comprehensive summary...
Final summary synthesized: 45,678 characters
```
### Direct PDF Upload (Vision API)
The tool prioritizes OpenAI's vision API for direct PDF upload when possible:
- ✅ Better Quality: Preserves document formatting, tables, and diagrams
- ✅ More Accurate: The AI can see the actual PDF layout and structure
- ✅ Better for Technical Books: Code examples and diagrams are preserved
- ✅ Fallback Strategy: Automatically switches to intelligent chunking for large files
## Testing
Summary Forge includes a comprehensive test suite using Vitest.
### Running Tests
```bash
# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests with coverage report
pnpm test:coverage
```
### Test Coverage
The test suite includes:
- ✅ 30+ passing tests
- Constructor validation
- Helper method tests
- PDF upload functionality tests
- API integration tests
- Error handling tests
- Edge case coverage
- File operation tests
See `test/summary-forge.test.js` for the complete test suite.
## Flashcard Generation
Summary Forge includes powerful flashcard generation capabilities for study and review.
### Flashcard PDFs
Generate double-sided flashcard PDFs optimized for printing:
```javascript
import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';
```
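A hypothetical usage sketch follows; the real function signatures are not documented here and may differ, so treat the arguments as placeholders and see the `examples/` directory for working code:

```javascript
import { extractFlashcards, generateFlashcardsPDF } from '@profullstack/summary-forge-module/flashcards';
import fs from 'node:fs/promises';

// Read a previously generated markdown summary (placeholder path).
const markdown = await fs.readFile('./my-book_summary.md', 'utf8');

// Hypothetical: extract Q&A pairs from the markdown (see supported formats below).
const flashcards = extractFlashcards(markdown);

// Hypothetical: render a printable, double-sided flashcard PDF.
await generateFlashcardsPDF(flashcards, './flashcards.pdf');
```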
Flashcard image generation produces one PNG per card side:
- `q-001.png`, `q-002.png`, etc. - Question cards
- `a-001.png`, `a-002.png`, etc. - Answer cards
Use Cases:
- Web-based flashcard applications
- Mobile learning apps
- Interactive quiz games
- Study progress tracking systems
- Spaced repetition software
Features:
- ✅ Clean, professional design with the book title
- ✅ Automatic text wrapping for long content
- ✅ Customizable dimensions and styling
- ✅ SVG-based rendering for crisp quality
- ✅ Works in Docker (no native dependencies)
### Supported Markdown Formats
The `extractFlashcards` function supports multiple markdown formats:
1. Explicit Q&A Format:
```markdown
Q: What is a closure?
A: A closure is a function that has access to variables in its outer scope.
```
2. Definition Lists:
```markdown
Closure
: A function that has access to variables in its outer scope.
```
3. Question Headers:
```markdown
## What is a closure?

A closure is a function that has access to variables in its outer scope.
```
## Examples
See the `examples/` directory for more usage examples:
- `programmatic-usage.js` - Using as a module
- `flashcard-images-demo.js` - Generating flashcard images
## Troubleshooting
### Rate Limiting (1lib.sk)
If you encounter "Too many requests" errors from 1lib.sk:
Error Message:
```
Too many requests from your IP xxx.xxx.xxx.xxx Please wait 10 seconds. support@z-lib.fm. Err #ipd1
```
Automatic Handling: The tool automatically detects rate limiting and (a sketch follows this list):
- ✅ Waits the requested time (usually 10 seconds)
- ✅ Retries up to 3 times with exponential backoff
- ✅ Adds a 2-second buffer to ensure the rate limit has cleared
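The retry behaviour can be pictured with a short sketch, illustrative only and not the actual implementation:

```javascript
// Retry a task up to 3 times when a rate-limit error is detected,
// waiting the requested 10 seconds (backed off) plus a 2-second buffer.
async function withRateLimitRetry(task, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      const rateLimited = /too many requests/i.test(String(err?.message));
      if (!rateLimited || attempt === maxRetries) throw err;
      const waitSeconds = 10 * 2 ** attempt + 2;
      await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
    }
  }
}
```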
Manual Solutions:
1. Wait a few minutes before trying again
2. Use a different proxy session (the tool rotates through your proxy pool automatically)
3. Switch to Anna's Archive: `summary search "book title" --source anna`
4. Reduce concurrent requests if running multiple downloads
Note: The proxy pool helps distribute requests across different IPs, reducing rate limiting issues.
### Download Button Not Found (1lib.sk)
If you encounter "Download button not found" errors when downloading from 1lib.sk:
1. Check Debug Files: The tool automatically saves `debug-book-page.html` in the book's directory
   - Open this file to inspect the actual page structure
   - Look for download links or buttons that might have different selectors
2. Review Error Output: The error message includes:
   - All selectors that were tried
   - A list of links found on the page
   - The location of the debug HTML file
3. Common Causes:
   - Z-Access/Library Access Page: The book page redirects to an authentication page (most common)
   - Page structure changed (1lib.sk updates their site)
   - Book is deleted or unavailable
   - Session expired or cookies not maintained
   - Proxy issues preventing proper page load
4. Solutions:
   - Recommended: Use Anna's Archive instead: `summary search "book title" --source anna`
   - Try the search1lib command separately to verify the book exists
   - Check if the book page loads correctly in a regular browser with the same proxy
   - Verify that the proxy configuration is working correctly
   - Try a different book from the search results
5. Known Issue - Z-Access Page: If you see links to `library-access.sk` or a Z-Access page in the debug output, this means:
   - The book page requires authentication or special access
   - 1lib.sk's session management is blocking automated access
   - Workaround: Use Anna's Archive, which has better automation support
Example Debug Output (Z-Access Issue):
```
Download button not found on book page
Debug HTML saved to: ./uploads/book_name/debug-book-page.html
Found 6 links on page
First 5 links:
  - https://library-access.sk (Z-Access page)
  - mailto:blackbox@z-library.so (blackbox@z-library.so)
  - https://www.reddit.com/r/zlibrary (https://www.reddit.com/r/zlibrary)
```
Recommended Alternative:
```bash
# Use Anna's Archive instead (more reliable for automation)
summary search "prompt engineering" --source anna
```
### Blocked by Anna's Archive
If you're getting blocked by Anna's Archive:
1. Enable proxy in your configuration:
```bash
summary setup
```
2. Use a USA-based proxy to avoid geo-location issues
3. Test your proxy before downloading:
```bash
node test-proxy.js B0BCTMXNVN
```
4. Run browser in visible mode to debug:
```bash
summary config --headless false
```
### How the Proxy Is Used
The proxy is used for:
- ✅ Browser navigation (Puppeteer)
- ✅ File downloads (fetch with https-proxy-agent)
- ✅ All HTTP requests to Anna's Archive
Recommended Service: Webshare.io - reliable USA-based proxies with a free tier available.
Webshare Sticky Sessions: Add `-session-` to your proxy URL to maintain the same IP:

```
http://p.webshare.io:80-session-myapp123
```
## CAPTCHA Solving
When downloading from Anna's Archive, you may encounter CAPTCHAs. To automatically solve them:
1. Sign up for 2Captcha: [Get an API key here](https://2captcha.com/?from=9630996)
2. Add it to your configuration:

```bash
summary setup
```

3. Enter your 2Captcha API key when prompted
The tool will automatically detect and solve CAPTCHAs during downloads, making the process fully automated.
## Limitations
- Maximum PDF file size: No practical limit (intelligent chunking handles any size)
- GPT-5 uses the default temperature of 1 (not configurable)
- Requires external tools: Calibre, Pandoc, XeLaTeX
- CAPTCHA solving requires a 2captcha.com API key (optional)
- Very large PDFs (1000+ pages) may incur higher API costs due to multiple chunk processing
- Anna's Archive may block IPs without proxy configuration
- Chunked processing uses text extraction (images/diagrams are described in text only)
## Roadmap
- [x] ISBN/ASIN lookup via Anna's Archive
- [x] Automatic download from Anna's Archive with CAPTCHA solving
- [x] Book title search via Rainforest API
- [x] CLI with interactive mode
- [x] ESM module for programmatic use
- [x] Audio generation with ElevenLabs TTS
- [x] Direct PDF upload to OpenAI vision API
- [x] EPUB format prioritization (open standard)
- [ ] Support for more input formats (MOBI, AZW3)
- [ ] Chunked processing for very large books (>100MB)
- [ ] Custom summary templates
- [ ] Web interface
- [ ] Multiple voice options for audio
- [ ] Audio chapter markers
- [ ] Batch processing of multiple books
## License
ISC
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.