Intelligent data cleaning CLI with natural language support - Docker-powered Python engine
Cleanifix
A CLI tool that automatically cleans your data files through natural language commands. Like having a data analyst in your terminal.
Quick Start
# Install
npm install -g cleanifix
Missing Value Detection & Handling - Find and fix missing data automatically
Data Standardization - Normalize dates, phone numbers, addresses, and more
Deduplication - Remove duplicate rows with smart matching
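To make the "normalize dates" feature concrete, here is a minimal sketch of date standardization using only the Python standard library. It is not taken from the Cleanifix source: the function name `normalize_date` and the list of candidate formats are assumptions for illustration, and slash-separated dates are assumed to be month-first.

```python
from datetime import datetime

# Hypothetical list of input formats to try, in priority order.
# "%m/%d/%Y" assumes month-first dates; swap it for "%d/%m/%Y" if needed.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text, formats=CANDIDATE_FORMATS):
    """Return `text` as YYYY-MM-DD, or None if no candidate format matches."""
    cleaned = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this format did not match; try the next one
    return None


for raw in ("2024-03-14", "03/14/2024", "14.03.2024", "not a date"):
    print(raw, "->", normalize_date(raw))
```

Returning `None` for unparseable values, rather than guessing, leaves it to the caller to report or skip them.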
Natural Language Interface
# Just describe what you want
cleanifix @customers.csv "find missing phone numbers and fill with 'N/A'"
cleanifix @inventory.csv "standardize product names to title case"
cleanifix @transactions.csv "remove duplicate entries keeping the most recent"
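The three commands above each correspond to a simple row-level operation. The sketch below shows roughly what such operations might look like in plain Python. It is an illustration only: the helper names (`fill_missing`, `to_title_case`, `dedupe_keep_latest`) and the column names are hypothetical, and the real engine is described as using pandas rather than the standard library.

```python
import csv
import io


def fill_missing(rows, column, value):
    """Replace empty or absent values in `column` with `value`."""
    return [{**r, column: (r.get(column) or "").strip() or value} for r in rows]


def to_title_case(rows, column):
    """Trim whitespace and title-case the text in `column`."""
    return [{**r, column: r[column].strip().title()} for r in rows]


def dedupe_keep_latest(rows, key, date_column):
    """Keep one row per `key`, preferring the greatest ISO (YYYY-MM-DD) date."""
    latest = {}
    for r in rows:
        k = r[key]
        if k not in latest or r[date_column] > latest[k][date_column]:
            latest[k] = r
    return list(latest.values())


# Tiny in-memory CSV standing in for customers.csv (made-up data).
SAMPLE = "id,phone\n1,\n2,555-0100\n"
customers = list(csv.DictReader(io.StringIO(SAMPLE)))
print(fill_missing(customers, "phone", "N/A"))
```

Comparing dates as strings only works because ISO dates sort lexicographically; other formats would need to be parsed first.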
Smart Suggestions
$ cleanifix @data.csv "analyze"
Data Quality Report:
- 156 missing values in 'email' column
- 89 inconsistent date formats
- 34 potential duplicates
Suggested fixes:
1. Fill missing emails with domain-based patterns
2. Standardize dates to YYYY-MM-DD
3. Remove exact duplicates keeping first occurrence
Apply all fixes? [Y/n]
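A report like the one above comes down to a few counts over the rows. The following is a minimal sketch of how two of those counts (missing values per column and exact duplicates) could be computed. The function name `quality_report` and the shape of its result are assumptions, not the tool's actual output format.

```python
from collections import Counter


def quality_report(rows):
    """Count missing values per column and exact duplicate rows."""
    missing = Counter()
    for r in rows:
        for col, val in r.items():
            if val is None or str(val).strip() == "":
                missing[col] += 1

    seen = set()
    duplicates = 0
    for r in rows:
        # Column names are unique, so sorting the items gives a stable fingerprint.
        fingerprint = tuple(sorted(r.items()))
        if fingerprint in seen:
            duplicates += 1  # every repeat after the first occurrence counts
        else:
            seen.add(fingerprint)

    return {"missing": dict(missing), "exact_duplicates": duplicates}


sample = [
    {"email": "a@example.com", "city": "Oslo"},
    {"email": "", "city": "Oslo"},
    {"email": "a@example.com", "city": "Oslo"},
]
print(quality_report(sample))
```

Counting only repeats after the first occurrence matches the suggested fix of removing exact duplicates while keeping the first one.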
Installation
Prerequisites
Node.js 18+
Python 3.8+
4GB RAM recommended for large files
Install from npm
npm install -g cleanifix
Install from source
git clone https://github.com/rickyjs1955/cleanifix.git
cd cleanifix
./scripts/setup-dev.sh
Usage Examples
Basic Cleaning
# Find issues
cleanifix @data.csv "show me data quality issues"
Cleanifix Interactive Mode
> analyze my data
> fill missing ages with average by city
> standardize all names to proper case
> save as cleaned_data.csv
> exit
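A request such as "fill missing ages with average by city" is a grouped imputation: compute the mean of the known values within each group, then use it to fill the gaps in that group. Here is a minimal standard-library sketch of that idea. The function name `fill_with_group_mean` and the rounding to one decimal place are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def fill_with_group_mean(rows, target, group_by):
    """Fill empty `target` values with the mean of their `group_by` group."""
    known = defaultdict(list)
    for r in rows:
        if r[target] not in (None, ""):
            known[r[group_by]].append(float(r[target]))

    filled = []
    for r in rows:
        group_values = known.get(r[group_by], [])
        if r[target] in (None, "") and group_values:
            r = {**r, target: round(mean(group_values), 1)}
        filled.append(r)  # rows in groups with no known values stay unchanged
    return filled


people = [
    {"name": "Ana", "city": "Oslo", "age": "30"},
    {"name": "Bo", "city": "Oslo", "age": "40"},
    {"name": "Cy", "city": "Oslo", "age": ""},
    {"name": "Di", "city": "Lima", "age": ""},
]
for row in fill_with_group_mean(people, "age", "city"):
    print(row)
```

Rows whose group has no known values are left untouched, since there is nothing to average.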
Architecture
Cleanifix uses a hybrid architecture:
CLI Interface (Node.js) - Fast, responsive user interaction
Processing Engine (Python) - Powerful data manipulation with pandas
Communication - JSON-based message passing between components
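As an illustration of JSON-based message passing, the sketch below shows how the Python side of such a design might handle one request per line and return one response per line. The message fields (`id`, `op`, `args`), the `count_missing` operation, and the function names are all hypothetical; the actual Cleanifix protocol is not documented here.

```python
import json
import sys


def count_missing(rows, column):
    """Example operation: count rows whose `column` is empty or absent."""
    return sum(1 for r in rows if not str(r.get(column) or "").strip())


# Hypothetical operation registry; a real engine would expose many more.
HANDLERS = {"count_missing": count_missing}


def handle_message(raw):
    """Decode one JSON request and return a JSON response string."""
    msg_id = None
    try:
        msg = json.loads(raw)
        msg_id = msg.get("id")
        handler = HANDLERS[msg["op"]]
        result = handler(**msg.get("args", {}))
        return json.dumps({"id": msg_id, "ok": True, "result": result})
    except (AttributeError, KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, unknown ops and bad arguments all become error replies.
        error = f"{type(exc).__name__}: {exc}"
        return json.dumps({"id": msg_id, "ok": False, "error": error})


def serve(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read newline-delimited JSON requests and write one response per line."""
    for line in stream_in:
        if line.strip():
            stream_out.write(handle_message(line) + "\n")
            stream_out.flush()
```

Under this assumed design, the Node.js side could start the engine as a child process, write one JSON object per line to its stdin, and match responses to requests by `id`.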
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Development Setup
# Clone the repo
git clone https://github.com/rickyjs1955/cleanifix.git
cd cleanifix
Phase 1
Basic CLI interface
Missing value handling
Simple standardization
Exact deduplication
CSV support
JSON support
Phase 2 - Enhanced Rules
Fuzzy deduplication
Custom regex patterns
Outlier detection
Data type inference
Excel support
Phase 3 - ML Integration
Smart imputation
Anomaly detection
Pattern learning
Confidence scoring
Auto-cleaning mode
License
MIT License - see LICENSE file for details
Acknowledgments
Built with:
Commander.js - CLI framework
Pandas - Data manipulation
Chalk - Terminal styling
Support
Documentation: docs.cleanifix.dev
Issues: GitHub Issues
Discussions: GitHub Discussions
Made with ❤️ by data people, for data people