# Client-Side LLM Preprocessor

> Privacy-first client-side LLM preprocessing SDK (rule-based + optional WebGPU LLM)

`npm install client-llm-preprocessor`



Client-Side LLM Preprocessor is a privacy-first JavaScript SDK that enables powerful text preprocessing entirely within the user's browser. It combines high-speed rule-based cleaning with optional high-reasoning LLM-based extraction and semantic cleaning.
---

## Features
- 🕵️ Privacy-First: All data stays on the user's machine. No API keys, no server-side processing.
- 💰 Cost-Efficient: Clean and extract data locally to drastically reduce token usage before sending text to paid APIs.
- ⚡ Hybrid Processing: High-speed rules for noise removal, an LLM for semantic intelligence.
- 🗂️ Structured Extraction: Extract structured data (JSON) directly from messy text.
- 🧩 Flexible Chunking: Intelligent text splitting by length, sentence, or word.
- 🛡️ Hardened & Tested: 60+ tests covering extreme inputs, garbage text, and lifecycle chaos.
- 🚀 Easy Integration: Built-in WebGPU detection and standardized error handling.
---
**Project Status:** This is a proof-of-concept / experiment.

While the API is stable enough for testing, performance and reliability are still evolving. Please do not rely on it for critical production workloads yet.
Future Ideas (Roadmap):
- 🔒 PII Scrubbing: Automatically detect and remove personal details (names, phones, emails) client-side before data ever leaves the device.
- ⚡ Optimized WebGPU: Better support for lower-end devices.
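The PII-scrubbing idea is roughly what a regex-based first pass could look like. This is a hypothetical sketch, not anything shipped in the SDK; the patterns are illustrative and deliberately incomplete (real PII detection, especially of names, needs far more than regular expressions):

```javascript
// Hypothetical client-side PII scrubbing via plain regexes.
// Patterns are illustrative only — NOT exhaustive or production-grade.
const PII_PATTERNS = [
  { label: "[EMAIL]", regex: /[\w.+-]+@[\w-]+\.[\w.-]+/g },
  { label: "[PHONE]", regex: /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g },
];

// Replace each matched pattern with its placeholder label.
function scrubPII(text) {
  return PII_PATTERNS.reduce(
    (out, { label, regex }) => out.replace(regex, label),
    text
  );
}
```

Because the substitution runs entirely in the browser, the raw values never need to leave the device, which is the point of the roadmap item.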
---
- Quick Start
- Installation
- Core Concepts
- API Reference
- Project Structure
- Performance
- Browser Requirements
- Contributing
- License
---
## Quick Start

```javascript
import { Preprocessor } from 'client-llm-preprocessor';

const preprocessor = new Preprocessor();

// Check for WebGPU before enabling LLM features
const isSupported = await preprocessor.checkWebGPU();
if (!isSupported) {
  console.warn("WebGPU not supported. Falling back to rule-based cleaning only.");
}
```
```javascript
// Rule-based cleaning (instant, no model required)
const text = "Contact: hello@example.com - Visit https://site.com";
const cleaned = preprocessor.clean(text, {
  removeHtml: true,
  removeUrls: true,
  removeExtraWhitespace: true
});
// Result: "Contact: hello@example.com -"
```
```javascript
// LLM-based structured extraction (requires WebGPU)
await preprocessor.loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');

const resume = "John Doe, Email: john@doe.com, Phone: 123-456-7890...";
const data = await preprocessor.extract(resume, {
  format: 'json',
  fields: ['name', 'email', 'phone']
});
```
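The chunking feature has no example above. A minimal sketch of sentence-based splitting, similar in spirit to the SDK's chunking (this is a standalone illustration, not the library's actual implementation):

```javascript
// Sketch: group sentences into chunks of up to `maxSentences` each.
// Naive sentence boundary: ., ! or ? followed by whitespace.
function chunkBySentence(text, maxSentences = 2) {
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < sentences.length; i += maxSentences) {
    chunks.push(sentences.slice(i, i + maxSentences).join(" "));
  }
  return chunks;
}

chunkBySentence("One. Two. Three. Four. Five.", 2);
// → ["One. Two.", "Three. Four.", "Five."]
```

Sentence-aware splitting keeps each chunk semantically coherent, which matters when chunks are later fed to an LLM one at a time.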
---
## Installation

```bash
npm install client-llm-preprocessor
```
---
## Project Structure

The project follows a modular, well-documented structure:
```text
local_processing_llm/
├── .github/            # GitHub-specific workflows and templates
├── docs/               # In-depth technical guides & architecture
├── examples/           # Ready-to-run demo pages
├── src/                # Source code
│   ├── preprocess/     # Core logic (clean, chunk, extract)
│   ├── utils/          # Helpers (logger, validation, errors)
│   ├── engine.js       # WebLLM wrapper
│   └── index.js        # Package entry point
├── tests/              # 60+ automated tests
│   ├── unit/           # Pure logic tests
│   ├── integration/    # Workflow & lifecycle tests
│   └── helpers/        # Test utilities & mocks
├── dist/               # Compiled production build (ESM + types)
├── package.json        # Metadata & dependencies
└── README.md           # You are here
```
---
## Performance

| Input Size | Rule-Based | LLM-Based |
| :--- | :--- | :--- |
| 10 KB | < 1 ms | 1-3 seconds |
| 1 MB | 12 ms | (requires chunking) |
| 10 MB | 180 ms | (sequential processing) |
> [!TIP]
> For a full breakdown of memory usage and speed benchmarks, see BENCHMARKS.md.
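For inputs past the 1 MB mark, the table implies splitting the text and running the LLM pass chunk by chunk. A minimal, SDK-independent sketch of that sequential pattern (the `processChunk` callback stands in for whatever per-chunk work you do, e.g. an extraction call):

```javascript
// Split a large string into fixed-size slices lazily.
function* chunksOf(text, size) {
  for (let i = 0; i < text.length; i += size) {
    yield text.slice(i, i + size);
  }
}

// Process chunks one at a time so only a single chunk is
// in flight — bounding peak memory for multi-megabyte inputs.
async function processSequentially(text, size, processChunk) {
  const results = [];
  for (const chunk of chunksOf(text, size)) {
    results.push(await processChunk(chunk));
  }
  return results;
}
```

Awaiting each chunk before starting the next trades throughput for predictable memory use, which is usually the right call on a user's device.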
---
## Browser Requirements

- Local Processing: Any modern browser (Chrome, Firefox, Safari, Edge).
- LLM Features: Requires WebGPU support.
  - ✅ Chrome 113+ (Windows, macOS, Linux)
  - ✅ Edge 113+
  - ⚠️ Safari (experimental/partial)
  - ❌ Firefox (in progress by Mozilla)
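Independent of the SDK's own `checkWebGPU()`, support can be probed directly with the standard WebGPU API, since `navigator.gpu` is only defined where the browser exposes it:

```javascript
// Cheap synchronous probe: is the WebGPU API exposed at all?
function hasWebGPU() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}

// A stricter check also requests an adapter, since the API can be
// present while no usable GPU is available:
//   const adapter = await navigator.gpu.requestAdapter();
//   if (!adapter) { /* fall back to rule-based cleaning */ }
```

The adapter request is asynchronous and can return `null`, so treat the synchronous check as necessary but not sufficient.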
---
## Documentation

- Architecture Overview: How the engine works.
- API Documentation: Full method signatures and options.
- Contributing Guide: How to help improve the project.
- Security Policy: Reporting vulnerabilities.
- Troubleshooting: Solutions for common issues.
---
## License

Distributed under the MIT License. See `LICENSE` for more information.