Tauri-based OCR and Image Generation App for macOS
A modern desktop OCR (Optical Character Recognition) application built with Tauri, React, and Tesseract. This is a rewrite of the Tesseract-macOS app using modern web technologies and Rust.
- Multiple Input Methods
  - Select images from your filesystem
  - Capture screenshots directly (macOS `screencapture` integration)
  - Drag & drop image support
- Advanced OCR
  - Powered by Tesseract 5.5.1
  - Multi-language support (English, Chinese, Japanese, Korean, French, German, Spanish)
  - Real-time text extraction
- Advanced Image Processing
  - Contrast adjustment (0.5x to 2.0x)
  - Brightness adjustment (-0.5 to +0.5)
  - Sharpness enhancement (0.5x to 2.0x, unsharp mask)
  - Adaptive threshold
  - CLAHE (Contrast Limited Adaptive Histogram Equalization)
  - Gaussian blur (sigma 0 to 5.0)
  - Bilateral filter (edge-preserving noise reduction)
  - Morphological operations (erosion/dilation)
  - Preset configurations for common scenarios
- AI-Powered Features
  - Prompt Optimization: Transform OCR text into optimized image generation prompts using an LLM
  - Image Generation: Generate images from optimized prompts using FLUX Pro 1.1 Ultra
  - Support for multiple aspect ratios (21:9, 16:9, 4:3, 1:1, 3:4, 9:16, 9:21)
  - Integration with the Black Forest Labs API
  - Customizable image styles (realistic, artistic, anime, abstract, etc.)
- Text Management
  - Copy extracted text to clipboard
  - Save results to text files
  - Editable text display with monospace font
- Modern UI/UX
  - Clean, responsive three-column layout
  - Light/Dark mode support
  - Smooth animations and transitions
  - Collapsible advanced controls
  - Before/after image comparison view
  - Processing progress indicator
  - Keyboard shortcuts (⌘O, ⌘⇧S, ⌘↵, etc.)
  - Settings persistence (localStorage)
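The processing ranges and the localStorage persistence listed above can be sketched together: a typed options object whose numeric fields are clamped to the documented ranges whenever settings are saved or restored. This is a hypothetical illustration, not the app's code. The option names, the defaults, the storage key, and the `KeyValueStore` interface are all assumptions; only the numeric ranges come from the feature list.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical option names; only the numeric ranges come from the feature list.
interface ProcessingOptions {
  contrast: number;   // multiplier, 0.5 to 2.0
  brightness: number; // offset, -0.5 to +0.5
  sharpness: number;  // unsharp-mask amount, 0.5 to 2.0
  blurSigma: number;  // Gaussian blur sigma, 0 to 5.0
  adaptiveThreshold: boolean;
  clahe: boolean;
}

type NumericKey = "contrast" | "brightness" | "sharpness" | "blurSigma";

// Documented [min, max] bounds for each numeric option.
const RANGES: Record<NumericKey, [number, number]> = {
  contrast: [0.5, 2.0],
  brightness: [-0.5, 0.5],
  sharpness: [0.5, 2.0],
  blurSigma: [0, 5.0],
};

// Illustrative defaults (neutral adjustments, filters off); assumed, not taken from the app.
const DEFAULT_OPTIONS: ProcessingOptions = {
  contrast: 1.0,
  brightness: 0,
  sharpness: 1.0,
  blurSigma: 0,
  adaptiveThreshold: false,
  clahe: false,
};

const STORAGE_KEY = "imagio.processingOptions"; // hypothetical key

function clamp(value: number, range: [number, number]): number {
  return Math.min(range[1], Math.max(range[0], value));
}

// Returns a copy with every numeric field forced into its documented range.
function clampOptions(opts: ProcessingOptions): ProcessingOptions {
  return {
    ...opts,
    contrast: clamp(opts.contrast, RANGES.contrast),
    brightness: clamp(opts.brightness, RANGES.brightness),
    sharpness: clamp(opts.sharpness, RANGES.sharpness),
    blurSigma: clamp(opts.blurSigma, RANGES.blurSigma),
  };
}

function saveOptions(store: KeyValueStore, opts: ProcessingOptions): void {
  store.setItem(STORAGE_KEY, JSON.stringify(clampOptions(opts)));
}

// Missing or corrupt data falls back to the defaults; fields absent from older
// saves are filled in from the defaults, and out-of-range values are clamped.
function loadOptions(store: KeyValueStore): ProcessingOptions {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return { ...DEFAULT_OPTIONS };
  try {
    const saved = JSON.parse(raw) as Partial<ProcessingOptions>;
    return clampOptions({ ...DEFAULT_OPTIONS, ...saved });
  } catch {
    return { ...DEFAULT_OPTIONS };
  }
}
```

Clamping on both save and load means a hand-edited or stale localStorage entry can never push an out-of-range value into the image pipeline.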
(Coming soon)
Prerequisites:

- Node.js v20.19+ or v22.12+
- Rust 1.77.2+
- Tesseract OCR 5.5.1+
```
Imagio/
├── src/                              # React frontend source code
│   ├── App.tsx                       # Application shell orchestrating feature modules
│   ├── components/                   # Reusable UI building blocks (toolbar, status, overlays)
│   ├── features/                     # Feature-oriented folders (ocr, promptOptimization, imageGeneration)
│   │   ├── ocr/
│   │   │   ├── components/           # OCR-specific panels and advanced controls
│   │   │   └── useOcrProcessing.ts
│   │   ├── promptOptimization/
│   │   │   ├── components/           # Prompt settings and optimized prompt panels
│   │   │   └── usePromptOptimization.ts
│   │   └── imageGeneration/
│   │       └── useImageGeneration.ts
│   ├── hooks/                        # Cross-cutting hooks (config loading, keyboard shortcuts)
│   ├── utils/                        # API clients for OCR-adjacent services
│   └── main.tsx                      # React entry point
├── src-tauri/                        # Tauri/Rust backend
│   ├── src/
│   │   ├── lib.rs                    # OCR bindings and command handlers
│   │   └── main.rs                   # Tauri entry point
│   ├── Cargo.toml                    # Rust dependencies
│   └── tauri.conf.json               # Tauri configuration
```
The React layer now follows a feature-first structure:

- Shared UI components live in `src/components` and stay presentation-only.
- Feature folders bundle logic, hooks, and screens for OCR, prompt optimization, and image generation.
- Custom hooks (`src/hooks`) encapsulate cross-cutting concerns such as config loading and keyboard shortcuts.
- `App.tsx` acts as a lightweight coordinator, composing features via the hooks and UI primitives.
OCR workflow:

1. Select an Image
   - Click "Select Image" (⌘O) to choose an image file
   - OR click "Take Screenshot" (⌘⇧S) to capture a screenshot
   - OR drag & drop an image file directly
2. Adjust Processing (Optional)
   - Click "Show Advanced" (⌘A) to reveal processing controls
   - Configure LLM settings for prompt optimization
   - OR manually adjust OCR preprocessing parameters
   - Choose recognition language
3. Extract Text
   - OCR runs automatically when an image is selected
   - View the extracted text in the middle panel
   - Edit the text if needed
4. Export Results
   - Click "Copy" (⌘C) to copy text to clipboard
   - Click "Save" (⌘S) to save as a text file
Image generation workflow:

1. Optimize Prompt
   - After extracting text, configure your desired image style
   - Add an additional description (optional)
   - Click "Generate Prompt" to generate an optimized prompt using the LLM
2. Generate Image
   - Review and edit the optimized prompt if needed
   - Select your desired aspect ratio (16:9, 1:1, etc.)
   - Click "Generate Image" to create an image using FLUX Pro 1.1 Ultra
   - Wait for the generation to complete (usually 10-30 seconds)
   - View the generated image in the right panel

Note: Image generation requires a valid BFL API key configured in `public/config.local.json`.
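The two workflow steps map onto two HTTP requests. The sketch below only builds those requests and interprets a polling response, so it can be checked without network access. It rests on assumptions: the LLM endpoint is taken to be OpenAI-compatible (`{apiBaseUrl}/chat/completions`, which also matches Ollama's `/v1` API), and the FLUX endpoint `https://api.bfl.ml/v1/flux-pro-1.1-ultra`, the `x-key` header, and the `Ready`/`Pending` status values reflect a reading of Black Forest Labs' public API that should be verified against their current documentation. The function names and the system-prompt wording are hypothetical.

```typescript
// Everything needed for a later fetch() call.
interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

interface LlmConfig {
  apiBaseUrl: string; // e.g. "http://127.0.0.1:11434/v1"
  apiKey: string;     // may be empty for local models such as Ollama
  modelName: string;
}

// Step 1 (assumed OpenAI-compatible API): ask the LLM to turn OCR text into a prompt.
// The system-prompt wording is illustrative only.
function buildPromptRequest(cfg: LlmConfig, ocrText: string, style: string, extra = ""): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  const user = [`Text: ${ocrText.trim()}`, `Style: ${style}`];
  if (extra) user.push(`Additional description: ${extra}`);
  return {
    url: `${cfg.apiBaseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers,
    body: JSON.stringify({
      model: cfg.modelName,
      messages: [
        { role: "system", content: "Rewrite the text as one detailed image-generation prompt." },
        { role: "user", content: user.join("\n") },
      ],
    }),
  };
}

// Step 2: FLUX Pro 1.1 Ultra submission. URL and "x-key" header are assumptions; verify them.
const BFL_ULTRA_URL = "https://api.bfl.ml/v1/flux-pro-1.1-ultra";
const ASPECT_RATIOS = ["21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "9:21"];

function buildImageRequest(bflApiKey: string, prompt: string, aspectRatio: string): HttpRequest {
  if (!bflApiKey) throw new Error("Missing bflApiKey in public/config.local.json");
  if (ASPECT_RATIOS.indexOf(aspectRatio) === -1) {
    throw new Error(`Unsupported aspect ratio: ${aspectRatio}`);
  }
  return {
    url: BFL_ULTRA_URL,
    headers: { "Content-Type": "application/json", "x-key": bflApiKey },
    body: JSON.stringify({ prompt, aspect_ratio: aspectRatio }),
  };
}

// Assumed polling-response shape: a "Ready" result carries result.sample (an image URL).
interface PollResponse {
  status: string;
  result?: { sample?: string } | null;
}

// Returns the image URL when ready, null while pending, and throws for any other status.
function imageUrlFromPoll(res: PollResponse): string | null {
  if (res.status === "Ready" && res.result && res.result.sample) return res.result.sample;
  if (res.status === "Pending") return null;
  throw new Error(`Image generation failed: ${res.status}`);
}
```

Each request object can be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. The client would then poll the generation task until `imageUrlFromPoll` returns a URL, which is where the 10-30 second wait is spent.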
Keyboard shortcuts:

- ⌘O - Open image file
- ⌘⇧S - Take screenshot
- ⌘↵ - Extract text (when image loaded)
- ⌘C - Copy text to clipboard (when text available)
- ⌘S - Save text to file (when text available)
- ⌘A - Toggle advanced settings (when no text)
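The shortcut table, including its "when ..." conditions, can be expressed as a pure function that a `keydown` listener (for instance inside the keyboard-shortcut hook in `src/hooks`) could call. This is a hypothetical sketch: the action names and the `KeyInput`/`ShortcutState` types are assumptions, and the app's hook may be structured differently.

```typescript
// Hypothetical action names; the bindings and conditions follow the shortcut list.
type ShortcutAction =
  | "openImage"
  | "takeScreenshot"
  | "extractText"
  | "copyText"
  | "saveText"
  | "toggleAdvanced";

// The subset of KeyboardEvent fields the mapping needs.
interface KeyInput {
  key: string;
  metaKey: boolean;
  shiftKey: boolean;
}

// App state that the "when ..." conditions depend on.
interface ShortcutState {
  hasImage: boolean;
  hasText: boolean;
}

// Returns the action for a key press, or null so the event keeps its default
// behaviour (for example native ⌘C copy when no OCR text is available).
function shortcutAction(e: KeyInput, state: ShortcutState): ShortcutAction | null {
  if (!e.metaKey) return null;
  const key = e.key.toLowerCase(); // with Shift held, the key value for S is "S"
  if (e.shiftKey) return key === "s" ? "takeScreenshot" : null;
  switch (key) {
    case "o":
      return "openImage";
    case "enter":
      return state.hasImage ? "extractText" : null;
    case "c":
      return state.hasText ? "copyText" : null;
    case "s":
      return state.hasText ? "saveText" : null;
    case "a":
      return state.hasText ? null : "toggleAdvanced";
    default:
      return null;
  }
}
```

Keeping the mapping free of React and DOM types makes it easy to unit-test. A listener would call `event.preventDefault()` only when the function returns a non-null action.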
Supported image formats:

- PNG
- JPG/JPEG
- GIF
- BMP
- TIFF
- WebP
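A drag-and-drop or file-picker handler could reject unsupported files before starting OCR. The helper below is a hypothetical sketch (the function names are illustrative) that checks only the file extension against the list above. It does not inspect file contents, and `.tif` is deliberately absent because the list names only TIFF.

```typescript
// Extensions matching the supported-format list (lowercase, without the dot).
const SUPPORTED_EXTENSIONS = ["png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp"];

// Extension-only check; it does not look at the file's bytes.
function isSupportedImage(path: string): boolean {
  const name = path.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a bare dotfile such as ".png"
  const ext = name.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(ext) !== -1;
}

// Keeps only the dropped paths that look like supported images.
function filterDroppedPaths(paths: string[]): string[] {
  return paths.filter(isSupportedImage);
}
```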
Supported OCR languages:

- 🇬🇧 English (eng)
- 🇨🇳 Chinese Simplified (chi_sim)
- 🇹🇼 Chinese Traditional (chi_tra)
- 🇯🇵 Japanese (jpn)
- 🇰🇷 Korean (kor)
- 🇫🇷 French (fra)
- 🇩🇪 German (deu)
- 🇪🇸 Spanish (spa)
Note: Additional language packs can be installed via Tesseract
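Tesseract itself accepts several languages at once as a `+`-separated list, for example `-l eng+chi_sim`. The README does not say whether the app's language selector exposes combined languages, so the helper below is a hypothetical sketch that builds such a string from the codes listed above. Any additionally installed packs would need to be added to the map.

```typescript
// Codes from the supported-language list; extra installed packs would need adding.
const SUPPORTED_LANGUAGES: Record<string, string> = {
  eng: "English",
  chi_sim: "Chinese Simplified",
  chi_tra: "Chinese Traditional",
  jpn: "Japanese",
  kor: "Korean",
  fra: "French",
  deu: "German",
  spa: "Spanish",
};

function isSupportedLanguage(code: string): boolean {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(SUPPORTED_LANGUAGES, code);
}

// Joins codes into Tesseract's "+"-separated form (e.g. "eng+chi_sim"),
// preserving order, dropping duplicates, and rejecting unknown codes.
function toTesseractLang(codes: string[]): string {
  if (codes.length === 0) throw new Error("At least one language is required");
  const unique: string[] = [];
  for (const code of codes) {
    if (!isSupportedLanguage(code)) throw new Error(`Unsupported language: ${code}`);
    if (unique.indexOf(code) === -1) unique.push(code);
  }
  return unique.join("+");
}
```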
Run in development mode:

```bash
npm run tauri:dev
```

Build for production:

```bash
npm run tauri:build
```

The built application will be available in `src-tauri/target/release/bundle/`.

Local API Configuration

Create a `public/config.local.json` file (this path is .gitignored) to store your API credentials without committing them:

```json
{
  "llm": {
    "apiBaseUrl": "https://api.openai.com/v1",
    "apiKey": "sk-your-key",
    "modelName": "gpt-4"
  },
  "bflApiKey": "your-bfl-api-key-here"
}
```

Configuration Options:
- `llm.apiBaseUrl`: LLM API endpoint (default: `http://127.0.0.1:11434/v1` for local Ollama)
- `llm.apiKey`: Your LLM API key (optional for local models like Ollama)
- `llm.modelName`: Model name to use (e.g., `llama3.1:8b`, `gpt-4`)
- `bflApiKey`: Your Black Forest Labs API key for FLUX image generation

The app merges these values with its defaults at startup. Keep this file local and never add it to git.
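The startup merge can be sketched as a field-by-field overlay in which values present in the local file win and anything missing falls back to a default. This is an illustration under assumptions: the type names are hypothetical, the default Ollama URL comes from the option list above, and the empty keys and the `llama3.1:8b` default model are placeholders. Fetching the file (Vite serves `public/` at the site root, so it would be `/config.local.json`) is left out so the merge stays testable.

```typescript
interface AppConfig {
  llm: { apiBaseUrl: string; apiKey: string; modelName: string };
  bflApiKey: string;
}

// Shape of config.local.json: every field is optional.
interface LocalConfig {
  llm?: Partial<AppConfig["llm"]>;
  bflApiKey?: string;
}

// The URL is the documented default; the empty keys and model name are assumed placeholders.
const DEFAULT_CONFIG: AppConfig = {
  llm: { apiBaseUrl: "http://127.0.0.1:11434/v1", apiKey: "", modelName: "llama3.1:8b" },
  bflApiKey: "",
};

// Values present in the local file override the defaults; nothing is mutated.
function mergeConfig(defaults: AppConfig, local: LocalConfig): AppConfig {
  return {
    llm: { ...defaults.llm, ...(local.llm ?? {}) },
    bflApiKey: local.bflApiKey ?? defaults.bflApiKey,
  };
}
```

Merging per field rather than replacing the whole `llm` object means a local file that sets only `apiKey` still keeps the default endpoint and model.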
Project Structure

```
Imagio/
├── src/                    # React frontend source code
│   ├── App.tsx             # Main application component
│   ├── App.css             # Application styles
│   └── main.tsx            # React entry point
├── src-tauri/              # Tauri/Rust backend
│   ├── src/
│   │   ├── lib.rs          # Main Rust code with OCR functionality
│   │   └── main.rs         # Tauri entry point
│   ├── Cargo.toml          # Rust dependencies
│   ├── tauri.conf.json     # Tauri configuration
│   └── icons/              # App icons
├── index.html              # HTML entry point
├── package.json            # Node.js dependencies
├── vite.config.ts          # Vite configuration
├── README.md               # This file
└── FEATURES.md             # Feature tracking document
```

Technology Stack
Frontend:

- React 19 - UI framework
- TypeScript - Type-safe JavaScript
- Vite 7 - Fast build tool and dev server

Backend:

- Rust - Systems programming language
- Tauri 2.8 - Desktop app framework
- Tesseract 5.5.1 - OCR engine

Tauri plugins:

- `tauri-plugin-dialog` - File picker and dialogs
- `tauri-plugin-fs` - Filesystem access
- `tauri-plugin-log` - Logging utilities

See FEATURES.md for detailed feature implementation progress.
All core features are implemented and functional. The app now matches and exceeds the original Tesseract-macOS feature set.
- Original Tesseract-macOS project by Scott Liu
- Tauri - For the amazing framework
- Tesseract OCR - For the OCR engine
- React - For the UI framework
MIT License - see LICENSE file for details
All major issues have been resolved!
Minor considerations:
- Bilateral filter may be slow on very large images
- Temp processed images are cleaned up on app exit
See FEATURES.md for complete issue tracking.
Contributions are welcome! Please feel free to submit a Pull Request.
For questions or feedback, please open an issue on GitHub.