# Smart Data Pruner

Smartly prune massive JSON/strings for LLM context optimization with cost estimation.
> Reduce LLM Noise. Save Money. Optimize Context.


Smart Data Pruner is a production-ready utility designed for AI engineering. It intelligently shrinks massive JSON objects and strings to fit strictly within your LLM's context window, without crashing on circular references or losing critical schema structure.
- 🧠 **Intelligent Pruning**: Multi-stage algorithm (Clean -> Light -> Aggressive -> Nuclear -> Bedrock) adapts to your data.
- 💰 **Cost Estimator**: Calculate costs for GPT-4o, Claude 3.5, Gemini 1.5, and more.
- 🛡️ **Robust & Safe**: Handles circular references, deep nesting, and non-JSON inputs gracefully (see the sketch after this list).
- 🚀 **CLI & Library**: Professional CLI with spinners and pretty-printing.
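For instance, a circular structure that would make plain `JSON.stringify` throw is handled without crashing. A minimal check (the result shape is assumed from the Quick Start below):

```javascript
const { SmartPruner } = require('smart-data-pruner');

// A circular reference: JSON.stringify(node) alone would throw a TypeError.
const node = { name: 'root' };
node.self = node;

// The pruner is documented to handle this gracefully instead of crashing.
const result = new SmartPruner().prune(node, 100); // tiny 100-token budget
console.log(result.strategy, result.output);
```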
## Installation

```bash
npm install smart-data-pruner
```
## Quick Start

```javascript
const { SmartPruner, estimateCost } = require('smart-data-pruner');

const massiveData = { /* ... 50MB of logs ... */ };

// 1. Check Cost
try {
  const cost = estimateCost(massiveData, 'gpt-4o');
  console.log(`Potential Cost: $${cost.costUSD}`);
} catch (err) {
  console.error(err);
}

// 2. Prune it!
const pruner = new SmartPruner();
const result = pruner.prune(massiveData, 4000); // Target: 4000 tokens

console.log(`Strategy Used: ${result.strategy}`);
console.log(result.output);
```
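Because `estimateCost` accepts any payload, you can also quantify what a prune saves by estimating before and after. A small sketch (the `reportSavings` helper is hypothetical; only `estimateCost` and `prune` come from the API shown above):

```javascript
const { SmartPruner, estimateCost } = require('smart-data-pruner');

// Hypothetical helper: compare the estimated cost of a payload before and
// after pruning. Assumes estimateCost returns { costUSD } and prune returns
// { output, strategy }, as in the Quick Start.
function reportSavings(data, model = 'gpt-4o', budget = 4000) {
  const before = estimateCost(data, model).costUSD;
  const { output, strategy } = new SmartPruner().prune(data, budget);
  const after = estimateCost(output, model).costUSD;
  console.log(`[${model}] $${before} -> $${after} (strategy: ${strategy})`);
}
```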
## CLI Usage

```bash
# Prune a file to 4000 tokens (default) and save
npx smart-prune huge-logs.json --out pruned-logs.json
```
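The same job can be done programmatically. A sketch of a rough equivalent (file names reused from above; it assumes `result.output` is the pruned data structure, which the CLI itself may handle differently):

```javascript
const fs = require('fs');
const { SmartPruner } = require('smart-data-pruner');

// Read, prune to the default 4000-token budget, and write back out.
const input = JSON.parse(fs.readFileSync('huge-logs.json', 'utf8'));
const result = new SmartPruner().prune(input, 4000);

// Assumes result.output is the pruned object.
fs.writeFileSync('pruned-logs.json', JSON.stringify(result.output, null, 2));
console.log(`Wrote pruned-logs.json (strategy: ${result.strategy})`);
```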
## 🧠 Pruning Strategies
The pruner applies these strategies sequentially until the token budget is met:
1. **Clean**: Removes `null`, `undefined`, and empty strings/arrays (see the conceptual sketch after this list).
2. **Light**
3. **Aggressive**
4. **Nuclear**
5. **Bedrock**
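For intuition, a Clean pass behaves roughly like the recursive sweep below (a conceptual sketch, not the library's actual implementation, which also has to cope with circular references):

```javascript
// Conceptual sketch of a "Clean" pass: recursively drops null, undefined,
// empty strings, and empty arrays. Not the library's actual code.
function clean(value) {
  if (Array.isArray(value)) {
    const items = value.map(clean).filter((v) => v !== undefined);
    return items.length > 0 ? items : undefined; // empty arrays are dropped
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      const cleaned = clean(child);
      if (cleaned !== undefined) out[key] = cleaned;
    }
    return out;
  }
  // Primitives: drop null and empty strings, keep everything else.
  return value === null || value === '' ? undefined : value;
}

console.log(clean({ a: 1, b: null, c: '', d: [], e: { f: undefined, g: 'ok' } }));
// -> { a: 1, e: { g: 'ok' } }
```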
## Supported Models

`estimateCost` understands the following model identifiers:

- **OpenAI**: `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo`, `gpt-4o-mini`
- **Anthropic**: `claude-3-5-sonnet`, `claude-3-opus`, `claude-3-haiku`
- **Google**: `gemini-1.5-pro`, `gemini-1.5-flash`
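To pick a model, the identifiers above can be looped straight through `estimateCost` (a sketch; `payload` is a placeholder for your own data):

```javascript
const { estimateCost } = require('smart-data-pruner');

const payload = { logs: ['...'] }; // placeholder for your own data

// Compare the estimated cost of the same payload across providers.
for (const model of ['gpt-4o', 'claude-3-5-sonnet', 'gemini-1.5-flash']) {
  try {
    console.log(`${model}: $${estimateCost(payload, model).costUSD}`);
  } catch (err) {
    console.error(`${model}: ${err.message}`);
  }
}
```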
## License

MIT