CLI tool to export git commits in parquet format

A command-line tool to convert git commit history to Parquet format, including unified diffs for data analysis and AI applications.
```bash
npm install -g git2parquet
```

```bash
# Export git history of current repo to gitlog.parquet
git2parquet
```

Output Schema
The generated Parquet file contains the following columns:
- hash (STRING): Git commit hash
- authorName (STRING): Author's name
- authorEmail (STRING): Author's email address
- date (TIMESTAMP): Commit date in ISO format
- subject (STRING): Commit message subject line
- diff (STRING): Unified diff showing file changes

Requirements
- Node.js
- Must be run from within a git repository
- Git must be available in PATH
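
The last two requirements can be checked before running the export. The following is a minimal sketch, not part of git2parquet: `canExport` is a hypothetical helper name, and it assumes that asking `git rev-parse --is-inside-work-tree` is an acceptable way to detect a repository.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper (not part of git2parquet): returns true when `git` is
// on PATH and the given directory is inside a git work tree.
function canExport(cwd = process.cwd()) {
  try {
    const out = execFileSync('git', ['rev-parse', '--is-inside-work-tree'], {
      cwd,
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'], // silence git's error output
    });
    return out.trim() === 'true';
  } catch {
    // git is missing from PATH, or cwd is not inside a repository
    return false;
  }
}

console.log(canExport() ? 'ok to run git2parquet' : 'git or repository not found');
```
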
Options
- --help, -h: Show help message
- --open: Open the generated Parquet file with hyperparam after export

Use Cases
- Analyzing code change patterns over time
- Training ML models on code evolution
- Creating datasets for software engineering research
- Building commit history dashboards
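
As an illustration of the first use case, here is a minimal sketch that counts commits per month. It assumes the rows have already been loaded from gitlog.parquet by a Parquet reader of your choice, as plain objects with the columns listed under Output Schema. `commitsPerMonth` and the sample rows are hypothetical and made up for illustration.

```javascript
// Hypothetical helper: counts commits per calendar month (UTC).
// `rows` is assumed to be an array of objects shaped like the output schema.
function commitsPerMonth(rows) {
  const counts = new Map();
  for (const { date } of rows) {
    // Accept either a Date or an ISO string, since readers may differ
    const month = new Date(date).toISOString().slice(0, 7); // "YYYY-MM"
    counts.set(month, (counts.get(month) ?? 0) + 1);
  }
  // "YYYY-MM" strings sort chronologically when compared as text
  return [...counts].sort(([a], [b]) => a.localeCompare(b));
}

// Made-up sample rows, for illustration only
const sample = [
  { hash: 'a1', authorName: 'Ada', authorEmail: 'ada@example.com',
    date: '2024-01-05T10:00:00Z', subject: 'Initial commit', diff: '' },
  { hash: 'b2', authorName: 'Lin', authorEmail: 'lin@example.com',
    date: '2024-03-12T08:30:00Z', subject: 'Add parser', diff: '' },
  { hash: 'c3', authorName: 'Ada', authorEmail: 'ada@example.com',
    date: '2024-01-20T16:45:00Z', subject: 'Fix typo', diff: '' },
];

console.log(commitsPerMonth(sample));
```

The same grouping approach extends to other columns, for example keying on `authorEmail` instead of the month.
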
Hyperparam
Hyperparam is a tool for exploring and curating AI datasets. The Hyperparam CLI (`npx hyperparam`) is a local viewer for ML datasets that launches a small HTTP server and opens your browser to interactively explore the generated git2parquet output file.