Git repository analytics and reporting tool for analyzing commit patterns and contributor activity
`npm install git-spark`

Analyze commit patterns and contributor activity with interactive reports






🎨 Live Demo • 📖 Documentation • 🚀 Quick Start
---
Git Spark analyzes Git repository commit history to provide insights into contributor activity, code changes, and development patterns. It generates interactive HTML reports with charts, contributor statistics, and file analysis based on Git commit data.
- Interactive Reports - HTML dashboards with charts and activity visualizations
- Multiple Formats - Export to HTML, JSON, CSV, and Markdown
- CLI & API - Command-line tool and Node.js library
- File Analysis - Directory structure and change patterns
- Git-only Data - Analysis based solely on commit history
See Git Spark in action with a sample report showing:
- Interactive charts and visualizations
- Contributor activity and statistics
- File and directory analysis
- Dark mode support
- Data export options
- Repository Statistics - Commit counts, contributor activity, and file changes
- Daily Trends - Activity patterns over time with visual charts
- Contributor Analysis - Individual contributor statistics and activity
- File Analysis - Directory structure, file types, and change patterns
- Interactive HTML - Charts, tables, and dark mode support
- Command Line - Simple CLI commands with progress indicators
- Node.js API - Programmatic access for custom integrations
- Multiple Formats - HTML, JSON, CSV, and Markdown output
- Configuration - Customizable analysis periods and options
Enterprise-focused, accessible, and secure analytics dashboard:
- Multi‑Series Timeline – Commits, churn (lines changed), and active authors with dataset toggles
- Daily Activity Trends – Comprehensive daily analysis showing all days in the specified range with activity summaries
- GitHub-style Contributions Calendar – Visual activity heatmap with color-coded intensity levels and interactive tooltips
- Risk Factors Bar Chart – Visual breakdown of churn, recency, ownership, coupling potential, and knowledge concentration inputs
- Governance Radar Chart – Conventional commit adherence, traceability, message quality, WIP/revert penalties
- Dark Mode Toggle (Persistent) – Remembers preference via localStorage; charts dynamically re-theme
- One‑Click Data Export – Download embedded JSON or CSV bundles directly from the report (offline capable)
- Progressive Table Pagination – “Show more” incremental reveal for large author/file sets (performance friendly)
- Dataset Toggles & Live Updates – Enable/disable series without reloading page
- Open Graph Preview Image – Auto‑generated SVG summary for social sharing & link unfurling
- Accessibility Enhancements – ARIA live region announcements for sorting; keyboard focus management; reduced‑motion compliance
- Security‑First Delivery – Strict CSP with SHA‑256 hashed inline script & style blocks (no unsafe-inline), native SVG charts (no external chart library), fully self-contained reports
- Email Redaction Option – Controlled via CLI flag (`--redact-emails`) for privacy-sensitive audits
- Transparent Metrics Documentation – Every team metric includes comprehensive limitations and data source explanations
> Analytical Integrity: All analytics data are embedded (no external calls), ensuring the report is a self-contained artifact suitable for air‑gapped review workflows. Every metric includes honest explanations of what Git data can and cannot reveal.
```bash
# Global installation for CLI usage
npm install -g git-spark
```

### Basic Usage

```bash
# Initialize Git Spark configuration (interactive wizard)
git-spark init

# Analyze current repository (last 30 days)
git-spark --days=30

# Generate HTML report
git-spark --format=html --output=./reports

# Analyze specific date range
git-spark --since=2025-01-01 --until=2025-12-31
```

### Programmatic Usage
```typescript
// Top-level await requires an ES module context (e.g. "type": "module")
import { GitSpark, analyze } from 'git-spark';

// Quick analysis
const quickReport = await analyze('/path/to/repo', { days: 30 });

// Advanced usage with options
const gitSpark = new GitSpark({
  repoPath: '/path/to/repo',
  since: '2025-01-01',
  format: 'html',
  output: './reports'
});

const report = await gitSpark.analyze();
await gitSpark.export('html', './reports');
```

## 📖 Documentation

### CLI Commands

#### `git-spark [options]`

Main analysis command with comprehensive options:
```bash
git-spark [options]

Options:
  -d, --days            analyze last N days
  -s, --since           start date (YYYY-MM-DD)
  -u, --until           end date (YYYY-MM-DD)
  -f, --format          output format (html|json|console|markdown|csv)
  -o, --output          output directory (default: "./reports")
  -c, --config          configuration file
  -b, --branch          analyze specific branch
  -a, --author          filter by author
  -p, --path            filter by file path pattern
  --heavy               enable expensive analyses
  --log-level           logging verbosity (error|warn|info|debug|verbose)
  --no-cache            disable caching
  --timezone            IANA timezone for daily trends (e.g., America/Chicago)
  --redact-emails       redact email addresses in reports
  --exclude-extensions  comma-separated list of file extensions to exclude (e.g., .md,.txt)
  --azure-devops        enable Azure DevOps pull request analytics
  --devops-org          Azure DevOps organization name
  --devops-project      Azure DevOps project name
  --devops-repo         Azure DevOps repository (auto-detected if not specified)
  --devops-token        Azure DevOps Personal Access Token
  -h, --help            display help for command
```

#### `git-spark analyze`

Detailed analysis with additional options:
```bash
git-spark analyze [options]

Options:
  -r, --repo            repository path (default: current directory)
  [all main command options]
```

#### `git-spark health`

Quick repository health assessment:
```bash
git-spark health [options]

Options:
  -r, --repo            repository path (default: current directory)
```

#### `git-spark init`

Interactive configuration wizard to set up Git Spark for your project:
```bash
git-spark init [options]

Options:
  -r, --repo            repository path (default: current directory)
  -y, --yes             use defaults without prompts
```

Creates a `.git-spark.json` configuration file with your preferred settings:

```bash
# Interactive setup with prompts
git-spark init

# Quick setup with defaults
git-spark init --yes

# Initialize in a specific directory
git-spark init --repo /path/to/repo
```

The wizard guides you through:
- Days to analyze: Number of days for analysis (default: 30)
- Output format: Report format preference (html, json, markdown, csv, console)
- Excluded extensions: File extensions to skip (e.g., .md, .txt, .json)
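Both the wizard's excluded-extensions prompt and the `--exclude-extensions` flag take a list such as `.md,.txt`. The TypeScript sketch below shows one plausible way such a list could be normalized and matched against changed file paths. The helpers `parseExcludeExtensions` and `isExcluded` are hypothetical, and case-insensitive matching plus tolerance of a missing leading dot are assumptions rather than documented Git Spark behavior.

```typescript
// Hypothetical helpers illustrating --exclude-extensions; not part of Git Spark's API.

// Normalize a comma-separated list such as ".md, TXT,.log" into [".md", ".txt", ".log"].
function parseExcludeExtensions(raw: string): string[] {
  return raw
    .split(",")
    .map((ext) => ext.trim().toLowerCase())
    .filter((ext) => ext.length > 0)
    // Assumption: entries without a leading dot are accepted and normalized.
    .map((ext) => (ext.startsWith(".") ? ext : `.${ext}`));
}

// True when the last extension of a "/"-separated path is in the exclusion list.
function isExcluded(filePath: string, excluded: string[]): boolean {
  const fileName = filePath.split("/").pop() ?? filePath;
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return excluded.includes(fileName.slice(dot).toLowerCase());
}

const excludedExtensions = parseExcludeExtensions(".md, TXT,.log");
const changedFiles = ["README.md", "src/index.ts", "notes/todo.txt", ".gitignore"];
console.log(changedFiles.filter((p) => !isExcluded(p, excludedExtensions)));
```

With the sample input, only `src/index.ts` and `.gitignore` survive the filter; treating a leading-dot name as a dotfile rather than an extension keeps files like `.gitignore` in the analysis.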
#### `git-spark html`

Generate comprehensive HTML report with additional options:
```bash
git-spark html [options]

Options:
  -r, --repo            repository path (default: current directory)
  -d, --days            analyze last N days
  -s, --since           start date (YYYY-MM-DD)
  -u, --until           end date (YYYY-MM-DD)
  -o, --output          output directory (default: "./reports")
  -b, --branch          analyze specific branch
  -a, --author          filter by author
  -p, --path            filter by file path pattern
  --open                open HTML report in browser after generation
  --serve               start HTTP server to serve the report
  --port                port for HTTP server (default: 3000)
  --heavy               enable expensive analyses for detailed insights
```

Examples:

```bash
# Generate HTML report for last 30 days
git-spark html --days=30

# Generate and open in browser
git-spark html --days=30 --open

# Generate and serve on local web server
git-spark html --days=60 --serve --port=8080

# Heavy analysis with detailed insights
git-spark html --days=90 --heavy --output=./detailed-reports

# With Azure DevOps pull request analytics
git-spark html --days=30 --azure-devops --devops-org=myorg --devops-project=myproject
```

### Azure DevOps Integration
Git Spark includes optional Azure DevOps integration for comprehensive pull request analytics alongside Git commit analysis.
#### Setup
1. Create a Personal Access Token (PAT) in Azure DevOps with 'Code (Read)' scope
2. Set environment variable or pass token via CLI:
```bash
export AZURE_DEVOPS_TOKEN=your-pat-token
# or
git-spark --azure-devops --devops-token=your-pat-token
```

#### Usage Examples
```bash
# Auto-detect organization, project, and repo from Git remote
git-spark --azure-devops --days=30

# Specify Azure DevOps details explicitly
git-spark --azure-devops --devops-org=myorg --devops-project=myproject --devops-repo=myrepo

# Generate HTML report with PR analytics
git-spark html --azure-devops --days=60 --output=./reports

# Use token from environment variable
export AZURE_DEVOPS_TOKEN=your-pat-token
git-spark --azure-devops --days=30
```

#### Features
- Pull Request Analytics: Comprehensive PR workflow analysis, including cycle times and review metrics
- Work Item Tracking: Link PRs to work items and requirements
- Review Metrics: Review efficiency and collaboration patterns
- Intelligent Caching: Multi-level caching reduces API calls and respects rate limits
- Graceful Degradation: Continues Git analysis if Azure DevOps is unavailable
- Automatic Configuration: Auto-detects Azure DevOps settings from Git remotes
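Auto-detection relies on the repository's Git remote URL. As an illustration only, the sketch below recognizes the three common Azure DevOps remote shapes (HTTPS on `dev.azure.com`, SSH via `ssh.dev.azure.com:v3`, and the legacy `*.visualstudio.com` host) and returns `null` for anything else, mirroring the graceful fallback to Git-only analysis. `parseAzureDevOpsRemote` is a hypothetical helper; Git Spark's actual detection logic may differ.

```typescript
// Hypothetical helper: extract organization/project/repository from an Azure DevOps remote URL.
interface AzureDevOpsRemote {
  organization: string;
  project: string;
  repository: string;
}

function parseAzureDevOpsRemote(remoteUrl: string): AzureDevOpsRemote | null {
  const url = remoteUrl.trim();
  const patterns: RegExp[] = [
    // https://dev.azure.com/{org}/{project}/_git/{repo}, optionally https://{user}@dev.azure.com/...
    /^https:\/\/(?:[^@/]+@)?dev\.azure\.com\/([^/]+)\/([^/]+)\/_git\/([^/]+?)\/?$/,
    // git@ssh.dev.azure.com:v3/{org}/{project}/{repo}
    /^git@ssh\.dev\.azure\.com:v3\/([^/]+)\/([^/]+)\/([^/]+?)\/?$/,
    // https://{org}.visualstudio.com/{project}/_git/{repo} (legacy host)
    /^https:\/\/(?:[^@/]+@)?([^./]+)\.visualstudio\.com\/([^/]+)\/_git\/([^/]+?)\/?$/,
  ];
  for (const pattern of patterns) {
    const match = pattern.exec(url);
    if (match) {
      // Names with spaces appear percent-encoded in remotes (e.g. "my%20project").
      const [organization, project, repository] = match.slice(1, 4).map(decodeURIComponent);
      return { organization, project, repository };
    }
  }
  return null; // not an Azure DevOps remote: Git-only analysis continues
}

console.log(parseAzureDevOpsRemote("https://myorg@dev.azure.com/myorg/myproject/_git/myrepo"));
console.log(parseAzureDevOpsRemote("git@ssh.dev.azure.com:v3/myorg/my%20project/myrepo"));
console.log(parseAzureDevOpsRemote("https://github.com/MarkHazleton/git-spark.git"));
```

Explicit `--devops-org`, `--devops-project`, and `--devops-repo` values would normally take precedence over anything parsed this way.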
#### Configuration
Add Azure DevOps settings to `.git-spark.json`:

```json
{
  "azureDevOps": {
    "enabled": true,
    "organization": "myorg",
    "project": "myproject",
    "repository": "myrepo",
    "auth": {
      "method": "pat",
      "tokenEnvVar": "AZURE_DEVOPS_TOKEN"
    },
    "cache": {
      "enabled": true,
      "ttlMinutes": 60
    }
  }
}
```

#### `git-spark validate`

Environment and requirements validation:

```bash
git-spark validate
```

#### Daily Trends Analysis Examples
```bash
# Analyze last 7 days with comprehensive daily trends
git-spark --days=7 --format=html

# Extended 60-day analysis with contributions calendar
git-spark --days=60 --format=html --output=./reports

# Generate JSON with complete daily trends data for external processing
git-spark --days=30 --format=json --output=./data

# Heavy analysis with all features including detailed daily patterns
git-spark --days=30 --heavy --format=html
```

#### File Exclusion Examples
```bash
# Exclude markdown files from analysis
git-spark --days=30 --exclude-extensions=.md

# Exclude multiple file types
git-spark --days=30 --exclude-extensions=.md,.txt,.log

# Exclude documentation files for code-focused analysis
git-spark html --days=60 --exclude-extensions=.md,.rst,.adoc

# Combine with other filters
git-spark --days=30 --exclude-extensions=.md --branch=main --author=john@example.com
```

### Configuration

Create a `.git-spark.json` configuration file to customize analysis using the interactive wizard:

```bash
git-spark init
```

Alternatively, create the file manually. If `--config` is not provided, git-spark looks for `.git-spark.json` in the repository root.

> Note: Configuration file support is available for basic analysis options. The configuration system also supports custom thresholds, weights, and exclusion patterns for fine-tuned analysis.
```json
{
  "version": "1.0",
  "analysis": {
    "excludePaths": [
      "node_modules/**",
      "dist/**",
      "build/**",
      ".git/**"
    ],
    "excludeExtensions": [
      ".md",
      ".txt"
    ],
    "timezone": "America/Chicago",
    "excludeAuthors": [
      "dependabot[bot]",
      "github-actions[bot]"
    ],
    "thresholds": {
      "largeCommitLines": 500,
      "smallCommitLines": 50,
      "staleBranchDays": 30,
      "largeFileKB": 300,
      "hotspotAuthorThreshold": 3
    },
    "weights": {
      "risk": {
        "churn": 0.35,
        "recency": 0.25,
        "ownership": 0.20,
        "entropy": 0.10,
        "coupling": 0.10
      }
    }
  },
  "output": {
    "defaultFormat": "html",
    "outputDir": "./reports",
    "redactEmails": false
  },
  "performance": {
    "maxBuffer": 200,
    "enableCaching": true,
    "cacheDir": ".git-spark-cache",
    "chunkSize": 1000
  }
}
```

### API Reference
#### `GitSpark` Class

```typescript
// Public surface of the class (signatures only)
declare class GitSpark {
  constructor(options: GitSparkOptions, progressCallback?: ProgressCallback);
  analyze(): Promise<AnalysisReport>;
  export(format: OutputFormat, outputPath: string, report?: AnalysisReport): Promise<void>;
  static getDefaultConfig(): GitSparkConfig;
}
```

#### `GitSparkOptions` Interface

```typescript
interface GitSparkOptions {
  repoPath?: string;     // Repository path
  since?: string;        // Start date (YYYY-MM-DD)
  until?: string;        // End date (YYYY-MM-DD)
  days?: number;         // Last N days
  branch?: string;       // Specific branch
  author?: string;       // Author filter
  path?: string;         // Path filter
  format?: OutputFormat; // Output format
  output?: string;       // Output directory
  config?: string;       // Config file path
  timezone?: string;     // IANA timezone for daily trends
  heavy?: boolean;       // Enable expensive analyses
  logLevel?: LogLevel;   // Log level
  noCache?: boolean;     // Disable caching
}
```

#### Quick Functions

```typescript
// Quick analysis function
declare function analyze(
  repoPath?: string,
  options?: Partial<GitSparkOptions>
): Promise<AnalysisReport>;

// Quick export function
declare function exportReport(
  report: AnalysisReport,
  format: OutputFormat,
  outputPath: string
): Promise<void>;
```

## Analytical Integrity & Data Limitations
### Commitment to Transparency
Git Spark is built on a foundation of complete transparency about what can and cannot be determined from Git repository data alone. We never guess, estimate, or fabricate metrics from unavailable data sources.
### What Git Data Provides

- ✅ Commit metadata: Author, committer, timestamp, message
- ✅ File changes: Additions, deletions, modifications
- ✅ Branch and merge history: Repository structure and workflow patterns
- ✅ Temporal patterns: When changes occurred based on commit timing
- ✅ Contribution patterns: Who worked on what files and when

### What Git Data Cannot Provide

- ❌ Code review data: No reviewer information, approval status, or review comments
- ❌ Pull/merge request metadata: No PR numbers, descriptions, or review workflows
- ❌ Issue tracking: No bug reports, feature requests, or issue relationships
- ❌ Deployment information: No production deployments, rollbacks, or environment data
- ❌ Team structure: No organizational hierarchy, roles, or responsibilities
- ❌ Work hours/timezone: No actual working hours, vacation schedules, or availability
- ❌ Performance metrics: No build times, test results, or runtime performance
### Our Approach
- Transparent Naming: Metric names clearly indicate data source limitations
- Comprehensive Documentation: Every metric includes limitation warnings
- Platform Detection: We identify hosting platforms but acknowledge Git data is fundamentally the same
- Educational Focus: We help users understand what metrics do and don't measure
- No False Claims: We never imply Git data provides complete team performance insights
### Team Metric Explanations
All team-related metrics include detailed explanations of:
- What the metric actually measures from Git data
- Known limitations and potential misinterpretations
- Recommended approaches for supplementing Git analytics
- Warnings against using metrics for performance reviews without context
## 📊 Report Formats

### HTML Report
Interactive reports with transparent analytics and comprehensive limitations documentation:
- Executive Summary - Health rating with activity index breakdown and clear data source explanations
- Limitations Section - Comprehensive documentation of what Git data can and cannot reveal (positioned before detailed metrics)
- Top Contributors - Author metrics table with detailed activity patterns
- Team Activity Patterns - Aggregate repository metrics showing overall activity distribution
- File Activity Hotspots - Source code files with highest activity (filtered for relevant code files)
- Author Activity Details - Detailed profile cards for each contributor with commit patterns, file focus, and insights
- Daily Activity Trends - Comprehensive day-by-day analysis with GitHub-style contributions calendar (optional)
- Calculation Documentation - Transparent methodology for all metrics including formulas and measurement principles
- Report Metadata - Generation details, Git branch information, and processing statistics
- Interactive visualizations with dark mode support
- Progressive table pagination for performance
- Sortable columns with accessibility features
- Export capabilities for downstream analysis
- CSP/SRI security hardening
> Transparency First: Every metric in the HTML report includes clear explanations of what it measures, its data sources, and its limitations. The limitations section is prominently positioned before detailed calculations to ensure users understand data constraints upfront.
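The CSP hardening listed above works by hashing each inline block instead of allowing `'unsafe-inline'`. Under the CSP specification, a hash source is `'sha256-'` followed by the base64-encoded SHA-256 digest of the exact text inside a `<script>` or `<style>` element. The sketch below computes such hashes with Node's built-in `node:crypto` module; `cspHash`, `buildCsp`, and the sample policy are illustrative assumptions, not Git Spark's internal code.

```typescript
import { createHash } from "node:crypto";

// CSP hash-source for an inline block: 'sha256-' + base64(SHA-256 of the exact block text).
// Any whitespace change inside the block changes the hash.
function cspHash(inlineContent: string): string {
  const digest = createHash("sha256").update(inlineContent, "utf8").digest("base64");
  return `'sha256-${digest}'`;
}

// Illustrative policy that allows only the hashed inline scripts and styles (no 'unsafe-inline').
function buildCsp(scripts: string[], styles: string[]): string {
  return [
    "default-src 'none'",
    `script-src ${scripts.map(cspHash).join(" ")}`,
    `style-src ${styles.map(cspHash).join(" ")}`,
  ].join("; ");
}

const themeScript = "document.documentElement.dataset.theme = 'dark';";
const baseStyle = "body { margin: 0; }";
console.log(buildCsp([themeScript], [baseStyle]));
```

A policy string like this can be delivered in a `<meta http-equiv="Content-Security-Policy">` tag, which keeps a single HTML file self-contained; the hashes must be computed after the final inline content is generated.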
### JSON Output
Machine-readable format for:
- CI/CD integration
- Custom tooling integration
- Data processing and analysis
- API consumption
### Console Output
Terminal-friendly format with:
- Color-coded health indicators
- Tabular data presentation
- Progress indicators
- Quick insights and recommendations
### Markdown Output
Documentation-friendly format for:
- README integration
- Wiki documentation
- Version control tracking
- Collaboration and sharing
### CSV Output
Spreadsheet-compatible format with separate files for:
- `authors.csv` - Author statistics and metrics
- `files.csv` - File-level analysis and risk scores
- `timeline.csv` - Daily activity and trends (includes all days in analysis period)

## 🔍 Analysis Details

### Health Score
Composite metric based on:
- Commit Frequency - Regular development activity
- Author Diversity - Distributed knowledge and contributions
- Commit Size Distribution - Balanced change patterns
- Governance Adherence - Code quality and standards compliance
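The inputs are listed above, but the exact formula is not. Purely to make the idea concrete, the sketch below turns each input into a 0 to 1 ratio and averages them with equal weights. The normalization rules, the equal weighting, the use of `largeCommitLines` (500 in the configuration example) as the size cutoff, and the Conventional Commits regex are all assumptions; Git Spark's actual score may be computed differently.

```typescript
// Hypothetical composite score; weights, normalization, and thresholds are assumptions.
interface CommitInfo {
  author: string;
  message: string;
  linesChanged: number; // additions + deletions
}

// Conventional Commits header, e.g. "feat(cli): add --open flag" or "fix!: drop Node 16".
const CONVENTIONAL_HEADER =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: \S/;

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function healthScore(commits: CommitInfo[], days: number, largeCommitLines = 500): number {
  if (commits.length === 0 || days <= 0) return 0;
  const n = commits.length;

  // Commit frequency: assume an average of one commit per day counts as fully active.
  const frequency = clamp01(n / days);

  // Author diversity: share of commits NOT made by the most active author.
  const perAuthor = new Map<string, number>();
  for (const c of commits) perAuthor.set(c.author, (perAuthor.get(c.author) ?? 0) + 1);
  const diversity = 1 - Math.max(...Array.from(perAuthor.values())) / n;

  // Commit size distribution: share of commits below the large-commit threshold.
  const sizeBalance = commits.filter((c) => c.linesChanged < largeCommitLines).length / n;

  // Governance adherence: share of messages with a Conventional Commits header.
  const governance = commits.filter((c) => CONVENTIONAL_HEADER.test(c.message)).length / n;

  // Equal weighting (assumption), scaled to 0..100.
  return Math.round(((frequency + diversity + sizeBalance + governance) / 4) * 100);
}

const sample: CommitInfo[] = [
  { author: "ana", message: "feat(cli): add --open flag", linesChanged: 120 },
  { author: "ben", message: "fix: handle empty repos", linesChanged: 40 },
  { author: "ana", message: "WIP", linesChanged: 900 },
  { author: "ana", message: "docs: update README", linesChanged: 15 },
];
console.log(healthScore(sample, 4)); // → 69
```

Keeping each component as a separate ratio makes it easy to show the per-input breakdown alongside the single number, which fits the README's emphasis on transparent calculations.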
### Daily Trends Analysis
Comprehensive daily analysis providing:
- Complete Date Range Coverage - Shows all days in the specified period, including days with zero activity
- Activity Metrics - Commits, authors, file changes, and code volume per day
- GitHub-style Contributions Calendar - Visual heatmap with intensity levels (0-4) matching GitHub's color scheme
- Interactive Tooltips - Hover to see exact commit counts and dates
- Week-based Organization - Calendar view organized by weeks for easy pattern recognition
- JSON Export Support - All daily trends data available for external processing and analysis
> Enhanced Coverage: Unlike traditional analytics that only show active days, Git Spark's daily trends include every day in your analysis period, providing complete visibility into work patterns and identifying both active and quiet periods.
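To make the zero-filled range and the 0-4 intensity levels concrete, the sketch below buckets commit timestamps into calendar days in an IANA timezone (the role the `--timezone` option plays for daily trends) and emits one entry per day, including empty ones. The `dailyTrends` helper and its level thresholds (each level covers a quarter of the busiest day's count) are assumptions; neither GitHub's nor Git Spark's exact bucketing is documented here.

```typescript
// Hypothetical sketch of zero-filled daily trends with 0-4 intensity levels.
interface DayActivity {
  date: string; // YYYY-MM-DD in the requested timezone
  commits: number;
  level: 0 | 1 | 2 | 3 | 4;
}

// Calendar date of an instant in an IANA timezone, assembled from formatToParts.
function localDate(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const part = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  return `${part("year")}-${part("month")}-${part("day")}`;
}

// Every calendar date from since to until (inclusive), including days with no commits.
function eachDate(since: string, until: string): string[] {
  const dates: string[] = [];
  const cursor = new Date(`${since}T00:00:00Z`);
  const end = new Date(`${until}T00:00:00Z`).getTime();
  while (cursor.getTime() <= end) {
    dates.push(cursor.toISOString().slice(0, 10));
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return dates;
}

function dailyTrends(commitTimes: Date[], since: string, until: string, timeZone: string): DayActivity[] {
  const counts = new Map<string, number>();
  for (const t of commitTimes) {
    const day = localDate(t, timeZone);
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  const dates = eachDate(since, until);
  const max = Math.max(0, ...dates.map((d) => counts.get(d) ?? 0));
  return dates.map((date) => {
    const commits = counts.get(date) ?? 0;
    // Assumed bucketing: ceil(4 * commits / max), so idle days are 0 and the busiest day is 4.
    const level = (max === 0 ? 0 : Math.ceil((4 * commits) / max)) as DayActivity["level"];
    return { date, commits, level };
  });
}

const trend = dailyTrends(
  [
    new Date("2025-03-01T03:30:00Z"), // still Feb 28 in Chicago (UTC-6)
    new Date("2025-03-01T18:00:00Z"),
    new Date("2025-03-03T15:00:00Z"),
    new Date("2025-03-03T16:00:00Z"),
  ],
  "2025-02-28",
  "2025-03-03",
  "America/Chicago",
);
console.log(trend);
```

Because the date range is enumerated independently of the commits, March 2 appears with zero commits and level 0, which is exactly the "quiet period" visibility described above.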
### File Risk Analysis
File-level risk assessment (activity scoring) considering:
- Code Churn - Frequency and volume of changes
- Author Count - Number of different contributors
- Recency - How recently files were modified
- Ownership Distribution - Knowledge concentration across files
Risk metrics help identify files that may need attention due to high activity levels, but do not indicate code quality or defect likelihood.
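The `weights.risk` block in the configuration example (churn 0.35, recency 0.25, ownership 0.20, entropy 0.10, coupling 0.10) sums to 1.0, which suggests a weighted sum of normalized factors. The sketch below shows that reading: raw per-file values are min-max scaled to 0..1 and combined with those weights. How Git Spark actually derives and scales each factor is not documented here, so `riskScore`, `minMaxNormalize`, and the 0..100 scale are assumptions. Like the metric itself, the result reflects activity, not code quality.

```typescript
// Hypothetical weighted risk score using the weights from the configuration example.
type RiskFactor = "churn" | "recency" | "ownership" | "entropy" | "coupling";
type RiskFactors = Record<RiskFactor, number>; // each already scaled to 0..1
type RiskWeights = Record<RiskFactor, number>;

const DEFAULT_RISK_WEIGHTS: RiskWeights = {
  churn: 0.35,
  recency: 0.25,
  ownership: 0.2,
  entropy: 0.1,
  coupling: 0.1,
};

// Min-max scale raw per-file values (e.g. lines changed) to 0..1; all-equal input maps to 0.
function minMaxNormalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (max === min ? 0 : (v - min) / (max - min)));
}

// Weighted sum of normalized factors, reported on a 0..100 scale.
function riskScore(factors: RiskFactors, weights: RiskWeights = DEFAULT_RISK_WEIGHTS): number {
  const keys = Object.keys(weights) as RiskFactor[];
  const total = keys.reduce((sum, k) => sum + weights[k], 0);
  if (Math.abs(total - 1) > 1e-9) throw new Error(`risk weights must sum to 1, got ${total}`);
  const score = keys.reduce((sum, k) => sum + weights[k] * Math.min(1, Math.max(0, factors[k])), 0);
  return Math.round(score * 100);
}

const churnScaled = minMaxNormalize([1200, 300, 0]); // [1, 0.25, 0]
console.log(riskScore({ churn: churnScaled[0], recency: 0.8, ownership: 0.5, entropy: 0.3, coupling: 0 })); // → 68
```

Validating that the weights sum to 1 keeps custom weight sets in `.git-spark.json` on the same scale as the defaults.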
### Team Organization Metrics
Team organization and specialization insights covering:
- Developer Specialization - Measures how unique each developer's file set is compared to others, promoting clear areas of responsibility
- File Ownership Clarity - Percentage of files with single-author ownership, indicating clear responsibility boundaries
- Organization Efficiency - Low file overlap between developers, suggesting better task distribution and reduced conflicts
- Commit Time Patterns - Work timing analysis based on commit timestamps (not actual working hours)
- Team Active Coverage - Days with multiple contributors (estimated pattern, not actual vacation coverage)
> ⚠️ Important Approach: The Team Organization Score measures specialization and clear ownership rather than traditional collaboration. High scores indicate well-organized teams with distinct areas of responsibility, which typically reduces conflicts and improves efficiency. Very high scores may sometimes indicate knowledge silos, while very low scores suggest unclear ownership or coordination issues. All metrics include comprehensive limitation documentation to prevent misinterpretation.
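The first two definitions above can be computed directly from per-commit file lists. The sketch below is one plausible reading, not Git Spark's implementation: File Ownership Clarity as the percentage of touched files with exactly one author, and Developer Specialization as the share of each author's files that nobody else changed. Identifying files by path (ignoring renames) and authors by a single identity string are assumptions.

```typescript
// Hypothetical sketch of ownership clarity and specialization from (author, path) pairs.
interface FileTouch {
  author: string;
  path: string;
}

// Map each file path to the set of authors who changed it.
function authorsByFile(touches: FileTouch[]): Map<string, Set<string>> {
  const byFile = new Map<string, Set<string>>();
  for (const { author, path } of touches) {
    if (!byFile.has(path)) byFile.set(path, new Set());
    byFile.get(path)!.add(author);
  }
  return byFile;
}

// File Ownership Clarity: percentage of touched files with exactly one author.
function ownershipClarity(touches: FileTouch[]): number {
  const byFile = authorsByFile(touches);
  if (byFile.size === 0) return 0;
  let singleOwner = 0;
  byFile.forEach((authors) => {
    if (authors.size === 1) singleOwner++;
  });
  return (100 * singleOwner) / byFile.size;
}

// Developer Specialization (one reading): fraction of each author's files no one else touched.
function specialization(touches: FileTouch[]): Map<string, number> {
  const totals = new Map<string, { files: number; exclusive: number }>();
  authorsByFile(touches).forEach((authors) => {
    authors.forEach((author) => {
      const t = totals.get(author) ?? { files: 0, exclusive: 0 };
      t.files++;
      if (authors.size === 1) t.exclusive++;
      totals.set(author, t);
    });
  });
  const result = new Map<string, number>();
  totals.forEach((t, author) => result.set(author, t.exclusive / t.files));
  return result;
}

const sampleTouches: FileTouch[] = [
  { author: "ana", path: "src/cli.ts" },
  { author: "ana", path: "src/cli.ts" },
  { author: "ana", path: "src/report.ts" },
  { author: "ben", path: "src/report.ts" },
  { author: "ben", path: "docs/guide.md" },
  { author: "cy", path: "test/cli.test.ts" },
];
console.log(ownershipClarity(sampleTouches)); // → 75
console.log(specialization(sampleTouches));
```

As the warning above notes, a score near 100 here can just as easily signal knowledge silos as healthy ownership; the numbers need team context to interpret.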
## 🏗️ CI/CD Integration

### GitHub Actions

```yaml
name: Repository Analysis

on: [push, pull_request]

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for analysis
      - uses: actions/setup-node@v4
        with:
          node-version: '18'
      - name: Install git-spark
        run: npm install -g git-spark
      - name: Run analysis
        run: git-spark --format=json --output=./reports
      - name: Upload reports
        uses: actions/upload-artifact@v4
        with:
          name: git-spark-reports
          path: ./reports/
```

### GitLab CI
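```yaml
# A custom stage name must be declared in `stages`
stages:
  - analysis

repository_analysis:
  stage: analysis
  image: node:18
  variables:
    GIT_DEPTH: "0" # Full history for analysis (disables shallow clone)
  script:
    - npm install -g git-spark
    - git-spark --format=json --output=./reports
  artifacts:
    paths:
      - reports/
    expire_in: 1 week
  only:
    - main
    - develop
```

## 🔒 Security Considerations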
Git Spark is designed with security in mind:
- Input Validation - All user inputs are validated and sanitized
- Path Traversal Protection - Safe file path handling
- Email Redaction - Optional email address anonymization
- Buffer Limits - Configurable limits to prevent DoS attacks
- No Arbitrary Execution - Git commands are parameterized and safe
- Dependency Security - Minimal dependencies with security auditing
- Strict Content Security Policy - Inline script & style blocks hashed (SHA‑256); no `unsafe-inline` or dynamic eval
- Native SVG Charts - Self-contained visualizations with no external dependencies
- Single External Origin - Minimizes supply chain surface
- Escaped Dynamic Content - All user- and repository-derived strings are safely encoded in HTML output

Security Reporting: Please review our Security Policy for information on reporting vulnerabilities and our security practices.
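Email redaction and escaping of repository-derived strings are both easy to get subtly wrong. The sketch below shows one conventional approach: redact addresses first, then HTML-escape the result. `redactEmails`, `escapeHtml`, the simplified email pattern, and the `j***@***` masking format are assumptions for illustration; the actual output of `--redact-emails` may look different.

```typescript
// Hypothetical helpers; the masking format is an assumption, not Git Spark's actual output.

// Escape text before inserting it into HTML element content or quoted attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Mask email addresses, keeping only the first character of the local part.
const EMAIL_PATTERN = /([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "$1***@***");
}

// Redact first, then escape, so the HTML output contains neither raw addresses nor markup.
const authorLine = "Jane Doe <jane.doe@example.com>";
console.log(escapeHtml(redactEmails(authorLine))); // → Jane Doe &lt;j***@***&gt;
```

Escaping last means every string reaches the HTML through one function, which is simpler to audit than escaping at each call site.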
## 🎯 Performance
Optimized for large repositories:
- Streaming Processing - Handle massive repositories without memory issues
- Intelligent Caching - Avoid redundant Git operations
- Chunked Analysis - Process commits in configurable batches
- Memory Management - Efficient data structures and garbage collection
- Progress Tracking - Real-time progress indicators for long operations
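The streaming, chunked approach described above can be illustrated with async iterables: commits flow in from a source, are grouped into batches (the configuration example uses `chunkSize: 1000`), and a progress callback fires after each batch. `inChunks`, `fakeCommitStream`, and `totalChurn` are hypothetical names, and the fake stream stands in for whatever Git Spark actually uses to read `git log` output.

```typescript
// Hypothetical sketch of streaming, batched commit processing with progress reporting.
interface CommitRecord {
  hash: string;
  linesChanged: number;
}

// Group items from an async source into arrays of at most chunkSize items.
async function* inChunks<T>(source: AsyncIterable<T>, chunkSize: number): AsyncGenerator<T[]> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) throw new RangeError("chunkSize must be a positive integer");
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === chunkSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Stand-in for a parser reading `git log` output incrementally (assumption).
async function* fakeCommitStream(count: number): AsyncGenerator<CommitRecord> {
  for (let i = 0; i < count; i++) yield { hash: `c${i}`, linesChanged: i % 7 };
}

// Aggregate one batch at a time so memory is bounded by chunkSize, not by history size.
async function totalChurn(
  source: AsyncIterable<CommitRecord>,
  chunkSize: number,
  onProgress?: (processed: number) => void,
): Promise<number> {
  let processed = 0;
  let churn = 0;
  for await (const batch of inChunks(source, chunkSize)) {
    for (const commit of batch) churn += commit.linesChanged;
    processed += batch.length;
    onProgress?.(processed);
  }
  return churn;
}

totalChurn(fakeCommitStream(2500), 1000, (n) => console.log(`processed ${n} commits`)).then((churn) =>
  console.log(`total churn: ${churn}`),
);
```

Larger chunks mean fewer progress updates and less per-batch overhead; smaller chunks give smoother progress reporting at slightly higher cost.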
Benchmarks:
- 10k commits: ~10 seconds
- 100k commits: ~2 minutes
- Memory usage: <500MB for 100k commits
📖 For detailed performance optimization strategies, see the Performance Tuning Guide
## 🧪 Testing

```bash
# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test suite
npm test -- --testNamePattern="GitSpark"

# Run integration tests
npm run test:integration
```

## 🤝 Contributing
We welcome contributions! Please see our GitHub Issues to get started or open a new issue to discuss your ideas.
### Development Setup

```bash
# Clone repository
git clone https://github.com/MarkHazleton/git-spark.git
cd git-spark

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test

# Start development mode
npm run dev
```

### Development Standards
- TypeScript - Full type safety and modern JavaScript features
- ESLint + Prettier - Consistent code formatting and quality
- Jest Testing - Comprehensive test coverage (>80%)
- Semantic Versioning - Clear version management
- Conventional Commits - Structured commit messages
## 📋 Roadmap

### Completed
- Core Analytics Engine - Comprehensive Git repository analysis
- Multiple Output Formats - HTML, JSON, Markdown, CSV, and console formats
- Transparent Team Metrics - Honest metric terminology with comprehensive limitations documentation
- Analytical Integrity Framework - Clear separation between what Git data can and cannot provide
- Enhanced User Education - Comprehensive warnings and guidance about metric interpretation
- Daily Activity Trends - Comprehensive daily analysis showing all days in specified range (including zero-activity days)
- GitHub-style Contributions Calendar - Interactive activity heatmap with color-coded intensity levels and tooltips
- Activity Index Calculation - Transparent breakdown of commit frequency, author participation, and consistency components
- Author Profile Cards - Detailed individual contributor analysis with commit patterns and file focus
- Secure HTML Reports - Strict CSP + SHA-256 hashed inline content (no unsafe-inline)
- Dark Mode - Persistent theme preference with adaptive styling
- Progressive Tables - Pagination & performance safeguards for large datasets
- Sortable Data Tables - Column sorting with ARIA live announcements
- Accessibility Features - ARIA live regions, keyboard navigation, reduced motion support
- OG Image Generation - Auto-generated SVG summaries for social/link previews
- Email Redaction - Privacy-focused email anonymization option
- CLI Commands - Main analysis, health check, validation, and dedicated HTML report generation
- HTTP Server - Built-in web server for local report viewing (`--serve` option)
- Auto-Open Browser - Automatic browser launch after report generation (`--open` option)
- Azure DevOps Integration - Optional pull request analytics with comprehensive PR workflow insights
- Multi-Source Analytics - Unified Git + Azure DevOps analytics with intelligent correlation
- Intelligent Caching - Multi-level caching for Azure DevOps API with rate limiting

These capabilities establish a foundation of analytical honesty and transparency that guides all development.

### Planned
- [ ] Branch comparison and diff analysis (`--compare` option)
- [ ] Continuous monitoring mode (`--watch` option)
- [ ] API server mode for remote analysis
- [ ] Machine learning-based anomaly detection
- [ ] Integration with code quality tools (SonarQube, CodeClimate)
- [ ] GitLab merge request integration
- [ ] Real-time monitoring and alerting
- [ ] Multi-repository analysis and benchmarking
- [ ] Advanced visualization with D3.js
- [ ] Web dashboard and UI
- [ ] Database persistence (SQLite/PostgreSQL)
- [ ] User authentication and authorization
- [ ] Team management and permissions
- [ ] Webhook integrations and notifications
This project is licensed under the MIT License - see the LICENSE file for details.
- Git Community - For the powerful version control system
- Open Source Contributors - For the excellent libraries and tools
- Enterprise Users - For feedback and requirements validation
- TypeScript Team - For the robust type system
- Documentation: GitHub Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: mark@markhazleton.com
---
Mark Hazleton is a passionate software architect and developer with decades of experience building enterprise-scale solutions. As the creator of the WebSpark family of tools and frameworks, Mark is committed to empowering developers with practical, honest, and high-quality open-source solutions.
- Website: markhazleton.com - Portfolio, blog, and technical articles
- GitHub: @MarkHazleton - Open source projects and contributions
- LinkedIn: Mark Hazleton - Professional network
- Email: mark@markhazleton.com - Direct contact
Mark specializes in:
- Enterprise application architecture and design
- Open-source tooling and developer productivity
- Code quality, analytics, and automation
- Mentoring and knowledge sharing in the developer community
---
Git Spark is part of the WebSpark ecosystem - a growing family of tools, frameworks, and demonstrations designed to solve real-world development challenges with elegance and precision.
- WebSpark Demos - Interactive demonstrations of modern web technologies and architectural patterns
- WebSpark Tools - Productivity utilities for developers and teams
- WebSpark Frameworks - Reusable components and libraries for enterprise applications
The WebSpark family is built on core principles:
- ✨ Practical Excellence - Tools that solve real problems elegantly
- 🔍 Transparency First - Honest about capabilities and limitations
- 🎓 Education Focused - Empowering developers with knowledge
- 🤝 Community Driven - Open source and collaborative
- 🏢 Enterprise Ready - Production-grade quality and reliability
Visit markhazleton.com to explore the full WebSpark ecosystem and discover tools that can transform your development workflow.
---
Built with ❤️ and unwavering commitment to analytical honesty for the developer community
> Git Spark prioritizes transparency, accuracy, and user education above all else. We believe developers deserve honest, reliable analytics that clearly communicate both capabilities and limitations.