Vesper MCP Server 🚀

AI-powered dataset discovery, quality analysis, and preparation with multimodal support (text, image, audio, video).

Vesper is a Model Context Protocol (MCP) server that helps you find, analyze, and prepare high-quality datasets for machine learning projects. It integrates seamlessly with AI assistants like Claude, providing autonomous dataset workflows.

✨ Features

### Dataset Discovery

- Search across HuggingFace, Kaggle, UCI ML Repository, and more
- Intelligent ranking based on quality, safety, and relevance
- Automatic metadata extraction and enrichment

### Quality Analysis

- Text: Missing data, duplicates, column profiling
- Images: Resolution, corruption, blur detection
- Audio: Sample rate, duration, silence detection
- Video: FPS, frame validation, corruption risk
- Unified Reports: Consolidated quality scores (0-100) with recommendations
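
As a rough illustration of the text checks listed above, the sketch below computes a missing-value rate and a duplicate-row rate for tabular rows. The function name `profile_rows` and the row format (a list of dicts) are assumptions made for this example; they are not part of Vesper's API.

```python
from collections import Counter


def profile_rows(rows):
    """Return missing-value and duplicate-row rates for a list of dict rows."""
    total = len(rows)
    if total == 0:
        return {"missing_rate": 0.0, "duplicate_rate": 0.0}
    # A cell counts as missing when it is None or an empty/whitespace-only string.
    cells = [value for row in rows for value in row.values()]
    missing = sum(
        1 for v in cells if v is None or (isinstance(v, str) and not v.strip())
    )
    # A row is a duplicate when all of its (key, value) pairs match an earlier row.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "missing_rate": missing / len(cells) if cells else 0.0,
        "duplicate_rate": duplicates / total,
    }


rows = [
    {"text": "great movie", "label": "pos"},
    {"text": "great movie", "label": "pos"},
    {"text": "", "label": "neg"},
    {"text": "boring", "label": None},
]
print(profile_rows(rows))
```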

### Data Preparation

- Automated cleaning pipelines
- Format conversion (CSV, JSON, Parquet)
- Train/test/validation splitting
- Automatic installation to project directories

### Multimodal Support

- Analyze mixed datasets (text + images + audio)
- Media-specific quality metrics
- Intelligent modality detection

📦 Installation

### From npm

```bash
npm install -g @vespermcp/mcp-server
```

### From GitHub

```bash
npm install -g git+https://github.com/vespermcp/mcp-server.git
```





The postinstall script will automatically:

- Install Python dependencies (opencv-python, librosa, etc.)
- Create data directories in `~/.vesper`
- Display setup instructions



### Manual Python Dependencies

```bash
pip install opencv-python pillow numpy librosa soundfile
```
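
To confirm that the Python packages above are importable, a small check like the following may help. It is a sketch, not something shipped with Vesper; `missing_modules` is a hypothetical helper name. Note that `opencv-python` is imported as `cv2` and `pillow` as `PIL`.

```python
import importlib.util

# Import names for the pip packages listed above
# (opencv-python installs as cv2, pillow as PIL).
REQUIRED_MODULES = ["cv2", "PIL", "numpy", "librosa", "soundfile"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All Python dependencies found." if not missing else f"Missing: {missing}")
```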





⚙️ MCP Configuration



### Via the Settings UI

1. Go to Settings > Features > MCP
2. Click Add New MCP Server
3. Enter:
   - Name: `vesper`
   - Type: `command`
   - Command: `vesper`





### Claude

Vesper attempts to configure itself automatically. Restart Claude and check whether the server is listed. If it is not, add this entry to your MCP configuration manually:

```json
{
  "mcpServers": {
    "vesper": {
      "command": "vesper",
      "args": [],
      "env": {
        "HF_TOKEN": "your-huggingface-token"
      }
    }
  }
}
```





> Note: If the `vesper` command isn't found, you can stick to the absolute path method.
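
If you prefer to script the manual step, the sketch below merges a `vesper` entry into an existing configuration object without touching other servers. The helper name `add_vesper_server` is hypothetical, and the location of the configuration file depends on your client, so reading and writing the file is left out.

```python
import json


def add_vesper_server(config, hf_token=None):
    """Return a copy of an MCP client config with a 'vesper' server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    entry = {"command": "vesper", "args": []}
    if hf_token:
        # Only include the env block when a token is supplied.
        entry["env"] = {"HF_TOKEN": hf_token}
    servers["vesper"] = entry
    updated["mcpServers"] = servers
    return updated


existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
merged = add_vesper_server(existing, hf_token="your-huggingface-token")
print(json.dumps(merged, indent=2))
```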



### Environment Variables



- `KAGGLE_USERNAME` & `KAGGLE_KEY`: For Kaggle dataset access
- `HF_TOKEN`: For private HuggingFace datasets
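
A quick way to see which of these credentials are visible to a process is sketched below. The grouping and the helper name `configured_sources` are assumptions for illustration; Vesper may read these variables differently.

```python
import os

# Credential groups named in this README; a group is only useful when complete.
CREDENTIALS = {
    "kaggle": ["KAGGLE_USERNAME", "KAGGLE_KEY"],
    "huggingface": ["HF_TOKEN"],
}


def configured_sources(env=None):
    """Map each source to True when all of its variables are set and non-empty."""
    env = os.environ if env is None else env
    return {
        source: all(env.get(name) for name in names)
        for source, names in CREDENTIALS.items()
    }


# Pass a dict to test a hypothetical environment, or nothing to check the real one.
print(configured_sources({"HF_TOKEN": "hf_example", "KAGGLE_USERNAME": "me"}))
```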



🚀 Quick Start



After installation and configuration, restart your AI assistant and try:



```
search_datasets(query="sentiment analysis", limit=5)

prepare_dataset(query="image classification cats vs dogs")

generate_quality_report(
  dataset_id="huggingface:imdb",
  dataset_path="/path/to/data"
)
```
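
Your assistant normally issues these calls for you. For reference, an MCP tool invocation travels as a JSON-RPC 2.0 `tools/call` request; the sketch below builds such a request for the first call above. The helper `tool_call_request` is hypothetical, and transport details (stdio framing, initialization handshake) are omitted.

```python
import json
from itertools import count

_ids = count(1)  # simple source of unique request ids


def tool_call_request(name, **arguments):
    """Build an MCP `tools/call` JSON-RPC request for a named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request("search_datasets", query="sentiment analysis", limit=5)
print(json.dumps(request, indent=2))
```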





📚 Available Tools



### Dataset Discovery



#### `search_datasets`

Search for datasets across multiple sources.

Parameters:

- `query` (string): Search query
- `limit` (number, optional): Max results (default: 10)
- `min_quality_score` (number, optional): Minimum quality threshold

Example:

```
search_datasets(query="medical imaging", limit=5, min_quality_score=70)
```
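
One plausible reading of how `limit` and `min_quality_score` interact is: filter by score first, then rank, then truncate. The sketch below illustrates that order on made-up data. The result shape (`id`, `quality_score`) and the function `select_results` are assumptions, not Vesper's actual implementation.

```python
def select_results(candidates, limit=10, min_quality_score=None):
    """Drop candidates below the threshold, sort best-first, keep at most `limit`."""
    kept = [
        c for c in candidates
        if min_quality_score is None or c["quality_score"] >= min_quality_score
    ]
    kept.sort(key=lambda c: c["quality_score"], reverse=True)
    return kept[:limit]


# Hypothetical candidates; ids and scores are invented for the example.
candidates = [
    {"id": "example:a", "quality_score": 65},
    {"id": "example:b", "quality_score": 91},
    {"id": "example:c", "quality_score": 78},
]
for result in select_results(candidates, limit=5, min_quality_score=70):
    print(result["id"], result["quality_score"])
```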





---



### Data Preparation



#### `prepare_dataset`

Download, analyze, and prepare a dataset for use.

Parameters:

- `query` (string): Dataset search query or ID

Example:

```
prepare_dataset(query="squad")
```





---



#### `export_dataset`

Export a prepared dataset to a custom directory with format conversion.

Parameters:

- `dataset_id` (string): Dataset identifier
- `target_dir` (string): Export directory
- `format` (string, optional): Output format (csv, json, parquet)

Example:

```
export_dataset(
  dataset_id="huggingface:imdb",
  target_dir="./my-data",
  format="csv"
)
```
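
To give a sense of what format conversion involves, here is a minimal standard-library sketch that writes the same rows as CSV or JSON. The function `export_rows` and the file naming are assumptions for illustration. Parquet is left out because it needs a third-party library.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_rows(rows, target_dir, name, fmt="csv"):
    """Write a list of dict rows to target_dir/name.<fmt> and return the path."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format in this sketch: {fmt}")
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.{fmt}"
    if fmt == "csv":
        with path.open("w", newline="", encoding="utf-8") as f:
            # Column order follows the keys of the first row.
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()) if rows else [])
            writer.writeheader()
            writer.writerows(rows)
    else:
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return path


rows = [{"text": "great movie", "label": "pos"}, {"text": "boring", "label": "neg"}]
with tempfile.TemporaryDirectory() as tmp:
    for fmt in ("csv", "json"):
        path = export_rows(rows, tmp, "sample", fmt)
        print(path.name, path.stat().st_size, "bytes")
```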





---



### Quality Analysis



#### `analyze_image_quality`

Analyze image datasets for resolution, corruption, and blur.

Parameters:

- `path` (string): Path to image file or folder

Example:

```
analyze_image_quality(path="/path/to/images")
```
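
A common heuristic for blur detection is the variance of the Laplacian: sharp images have strong edges and therefore a high variance, while blurry ones score low. The pure-Python sketch below shows the idea on a small grayscale grid. It is not necessarily the method Vesper uses, and any cutoff between "sharp" and "blurry" depends on your data.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a non-empty 2D list of grayscale intensities (rows of equal length).
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    if not responses:  # image too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


flat = [[128] * 8 for _ in range(8)]  # no edges at all
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]  # edges everywhere
print("flat:", laplacian_variance(flat))
print("checkerboard:", laplacian_variance(checker))
```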





---



#### `analyze_media_quality`

Analyze audio/video files for quality metrics.

Parameters:

- `path` (string): Path to media file or folder

Example:

```
analyze_media_quality(path="/path/to/audio")
```
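
Silence detection is often done by splitting the signal into frames and flagging frames whose RMS amplitude is below a threshold. The sketch below assumes samples already decoded to floats in [-1, 1]; the function name, frame size, and threshold are illustrative choices, not Vesper's settings.

```python
import math


def silence_ratio(samples, frame_size=1024, threshold=0.01):
    """Fraction of frames whose RMS amplitude falls below `threshold`."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    if not frames:
        return 0.0
    silent = sum(
        1 for frame in frames
        if math.sqrt(sum(s * s for s in frame) / len(frame)) < threshold
    )
    return silent / len(frames)


# One silent frame followed by one frame of constant non-zero signal.
signal = [0.0] * 1024 + [0.5] * 1024
print(silence_ratio(signal))
```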





---



#### `generate_quality_report`

Generate a comprehensive unified quality report for multimodal datasets.

Parameters:

- `dataset_id` (string): Dataset identifier
- `dataset_path` (string): Path to dataset directory

Example:

```
generate_quality_report(
  dataset_id="my-dataset",
  dataset_path="/path/to/data"
)
```
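
One simple way to consolidate per-modality scores into a single 0-100 figure is a weighted mean. The sketch below shows that approach; the weighting scheme and the function `unified_score` are assumptions for illustration, and Vesper's actual formula may differ.

```python
def unified_score(modality_scores, weights=None):
    """Weighted mean of per-modality scores, clamped to 0-100 and rounded to 0.1."""
    if not modality_scores:
        raise ValueError("at least one modality score is required")
    weights = weights or {}
    # Modalities without an explicit weight default to 1.0.
    total_weight = sum(weights.get(m, 1.0) for m in modality_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(score * weights.get(m, 1.0) for m, score in modality_scores.items())
    return round(max(0.0, min(100.0, weighted / total_weight)), 1)


scores = {"text": 90, "image": 70, "audio": 80}  # made-up example scores
print(unified_score(scores))
print(unified_score(scores, weights={"image": 2.0}))
```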





---



### Data Splitting



#### `split_dataset`

Split a dataset into train/test/validation sets.

Parameters:

- `dataset_id` (string): Dataset identifier
- `train_ratio` (number): Training set ratio (0-1)
- `test_ratio` (number): Test set ratio (0-1)
- `val_ratio` (number, optional): Validation set ratio (0-1)

Example:

```
split_dataset(
  dataset_id="my-dataset",
  train_ratio=0.7,
  test_ratio=0.2,
  val_ratio=0.1
)
```
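
The ratios are expected to describe a partition of the dataset. As a sketch of what such a split can look like, the code below shuffles row indices with a fixed seed and slices them by ratio. The requirement that ratios sum to 1, the seed, and the function `split_indices` are assumptions for this example rather than documented Vesper behavior.

```python
import random


def split_indices(n, train_ratio, test_ratio, val_ratio=0.0, seed=42):
    """Shuffle range(n) deterministically and partition it by the given ratios."""
    if abs(train_ratio + test_ratio + val_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_ratio)
    # Without a validation set, the test split takes every remaining row.
    n_test = round(n * test_ratio) if val_ratio > 0 else n - n_train
    return {
        "train": indices[:n_train],
        "test": indices[n_train:n_train + n_test],
        "val": indices[n_train + n_test:],
    }


splits = split_indices(100, train_ratio=0.7, test_ratio=0.2, val_ratio=0.1)
print({name: len(rows) for name, rows in splits.items()})
```

Shuffling before slicing avoids splits that mirror the original row order, and the fixed seed keeps the split reproducible between runs.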

🏗️ Architecture

Vesper is built with:
- TypeScript for the MCP server
- Python for image/audio/video processing
- SQLite for metadata storage
- Transformers.js for semantic search

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.

🐛 Issues & Support

- Issues: https://github.com/vesper/mcp-server/issues
- Discussions: https://github.com/vesper/mcp-server/discussions

🌟 Acknowledgments

Built with:
- Model Context Protocol
- HuggingFace Hub
- Kaggle API
- OpenCV
- librosa

---

Made with ❤️ by the Vesper Team
