# matomo-postgres

Extract visitor events from the Matomo API and push them to Postgres.

A robust Node.js/TypeScript ETL (Extract, Transform, Load) tool that synchronizes visitor analytics data from Matomo (formerly Piwik) into a PostgreSQL database. Designed for organizations that need to centralize their web analytics data for advanced analysis, reporting, or integration with other systems.

## ✨ Features
- 🔄 Incremental Synchronization - Smart date range detection with automatic resume capability
- 📊 Complete Data Extraction - Captures visitor sessions, events, custom dimensions, and device information
- 🗄️ Automatic Schema Management - Kysely-based migrations with performance optimizations
- ⚡ High Performance - Controlled concurrency, pagination, and weekly table partitioning
- 🛡️ Type Safety - Full TypeScript implementation with comprehensive type definitions
- 🔍 Detailed Logging - Progress tracking and debug information for monitoring
- 📱 Device Analytics - Screen resolution, device model, and operating system data
- 🌍 Geographic Data - Country, region, and city information from visitor sessions
## 📦 Installation

Run directly with npx:

```bash
npx @socialgouv/matomo-postgres
```

Or install it as a dependency:

```bash
npm install @socialgouv/matomo-postgres
# or
pnpm add @socialgouv/matomo-postgres
```
## ⚙️ Configuration

### Required Environment Variables

| Variable      | Description                          | Example                               |
| ------------- | ------------------------------------ | ------------------------------------- |
| `MATOMO_KEY`  | Matomo API authentication token      | `your_api_token_here`                 |
| `MATOMO_SITE` | Numeric site ID in Matomo            | `1`                                   |
| `MATOMO_URL`  | Base URL of your Matomo installation | `https://analytics.example.com/`      |
| `PGDATABASE`  | PostgreSQL connection string         | `postgresql://user:pass@host:5432/db` |
### Optional Environment Variables

| Variable                        | Default              | Description                                                           |
| ------------------------------- | -------------------- | --------------------------------------------------------------------- |
| `DESTINATION_TABLE`             | `matomo`             | Selects which table to write to (standard or partitioned)             |
| `MATOMO_TABLE_NAME`             | `matomo`             | Name for the standard table                                           |
| `PARTITIONED_MATOMO_TABLE_NAME` | `matomo_partitioned` | Name for the partitioned table                                        |
| `STARTDATE`                     | Auto-detected        | Override start date for initial import (`YYYY-MM-DD`)                 |
| `FORCE_STARTDATE`               | `false`              | When `true`, skip database lookup and use `STARTDATE` unconditionally |
| `RESULTPERPAGE`                 | `500`                | API pagination size (max results per request)                         |
| `INITIAL_OFFSET`                | `3`                  | Days to look back on first run                                        |
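To make the interplay of these variables concrete, here is a minimal sketch of how the documented defaults could be applied. The function name and return shape are illustrative, not the package's actual API:

```typescript
// Illustrative sketch: apply the documented defaults to raw environment
// variables. `resolveConfig` and `MatomoConfig` are hypothetical names,
// not part of @socialgouv/matomo-postgres itself.
interface MatomoConfig {
  destinationTable: string;
  startDate?: string;
  forceStartDate: boolean;
  resultPerPage: number;
  initialOffsetDays: number;
}

function resolveConfig(env: Record<string, string | undefined>): MatomoConfig {
  // Required variables: fail fast when any is missing
  for (const key of ["MATOMO_KEY", "MATOMO_SITE", "MATOMO_URL", "PGDATABASE"]) {
    if (!env[key]) throw new Error(`Missing required environment variable: ${key}`);
  }
  return {
    destinationTable: env.DESTINATION_TABLE ?? "matomo",
    startDate: env.STARTDATE,
    forceStartDate: env.FORCE_STARTDATE === "true",
    resultPerPage: Number(env.RESULTPERPAGE ?? 500),
    initialOffsetDays: Number(env.INITIAL_OFFSET ?? 3),
  };
}
```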
## 🗃️ Dual Table System

The tool implements a dual table system to optimize performance for different use cases. The application creates both a standard table and a partitioned table:

- Standard Table (`MATOMO_TABLE_NAME`): Traditional PostgreSQL table, suitable for smaller datasets or simpler deployments
- Partitioned Table (`PARTITIONED_MATOMO_TABLE_NAME`): Weekly partitioned table optimized for large datasets and improved query performance

Use the `DESTINATION_TABLE` environment variable to specify which table receives the imported data:
```bash
# Write to standard table
export DESTINATION_TABLE=matomo

# Write to partitioned table
export DESTINATION_TABLE=matomo_partitioned
```

### When to Use Partitioned Tables
Consider using partitioned tables when:
- Large Data Volumes: Importing months or years of analytics data
- Query Performance: Need faster queries on specific date ranges
- Maintenance Operations: Easier to manage large datasets with partition pruning
- Storage Optimization: Better compression and maintenance of historical data
Both tables share the same schema structure, ensuring compatibility regardless of your choice.
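To illustrate what weekly partitioning means in practice, the sketch below computes the UTC week (Monday to Sunday) that a given event date falls into. The partition naming scheme shown here is hypothetical; the tool manages its partitions internally through migrations:

```typescript
// Illustration only: compute the bounds of the weekly partition a date
// belongs to. The `matomo_partitioned_*` naming convention is an assumption
// for this example, not necessarily what the tool generates.
function weeklyPartitionBounds(date: Date): { from: Date; to: Date; name: string } {
  const day = date.getUTCDay(); // 0 = Sunday … 6 = Saturday
  const daysSinceMonday = (day + 6) % 7; // distance back to Monday
  const from = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate() - daysSinceMonday)
  );
  // Exclusive upper bound: the following Monday
  const to = new Date(from.getTime() + 7 * 24 * 3600 * 1000);
  const name = `matomo_partitioned_${from.toISOString().slice(0, 10).replace(/-/g, "_")}`;
  return { from, to, name };
}
```

Partition pruning then lets PostgreSQL skip every weekly partition whose range does not intersect a query's date filter, which is where the query-performance benefit listed above comes from.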
### Forcing a Start Date
When you need to temporarily override the automatic date detection (e.g., to re-import specific data or recover from errors):
```bash
export FORCE_STARTDATE=true
export STARTDATE=2024-01-01
```

This configuration will:

- Skip checking the database for the last event
- Use the specified `STARTDATE` unconditionally

This is useful for one-time re-imports or data recovery scenarios.

Important: Remember to unset `FORCE_STARTDATE` after your one-time import to restore normal automatic detection behavior.

Cronjob Example:
```bash
# Temporarily add to your cronjob environment variables:
FORCE_STARTDATE=true
STARTDATE=2024-10-15
MATOMO_URL=https://analytics.example.com/
MATOMO_SITE=1
MATOMO_KEY=your_api_token
PGDATABASE=postgresql://user:pass@host:5432/db

# After import completes, remove FORCE_STARTDATE to resume normal operation
```

## 🏗️ Architecture
The tool follows a systematic ETL process:
1. 📅 Date Range Detection - Determines import range based on last sync or configuration
2. 📥 Data Extraction - Fetches visitor data from Matomo's `Live.getLastVisitsDetails` API
3. 🔄 Data Transformation - Converts visits into structured events with proper typing
4. 💾 Data Loading - Inserts events into PostgreSQL with conflict resolution
5. 📈 Progress Tracking - Provides detailed logging and resumable operations
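Step 3 (transformation) can be sketched as flattening each Matomo visit into one row per action. The field names below mirror the `Live.getLastVisitsDetails` response in heavily simplified form; the tool's real transformation captures many more columns:

```typescript
// Simplified sketch of the visit-to-events transformation. The interfaces
// here are illustrative subsets, not the tool's actual types.
interface MatomoVisit {
  idVisit: string;
  visitorId: string;
  actionDetails: { type: string; timestamp: number; url?: string }[];
}

interface EventRow {
  idvisit: string;
  visitorid: string;
  action_type: string;
  action_timestamp: Date; // stored in UTC
  url: string | null;
}

function visitToEvents(visit: MatomoVisit): EventRow[] {
  // One visit yields one event row per action in actionDetails
  return visit.actionDetails.map((action) => ({
    idvisit: visit.idVisit,
    visitorid: visit.visitorId,
    action_type: action.type,
    action_timestamp: new Date(action.timestamp * 1000), // Matomo sends Unix seconds
    url: action.url ?? null,
  }));
}
```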
### Database Schema

The tool creates a comprehensive table structure capturing:
- Visitor Information: IDs, geographic location, device details
- Session Metrics: Duration, visit count, visitor type
- Event Data: Actions, categories, values, timestamps (UTC)
- Custom Dimensions: Flexible JSON fields for custom tracking
- Performance Data: Screen resolution, time spent per action
## 🛠️ Development
### Local Setup
1. Start PostgreSQL:
```bash
docker-compose up
```

2. Set Environment Variables:

```bash
export MATOMO_URL=https://your-matomo-instance/
export MATOMO_SITE=your_site_id
export MATOMO_KEY=your_api_token
export PGDATABASE=postgres://postgres:postgres@127.0.0.1:5455/postgres
```

3. Run the Application:

```bash
pnpm start
```

### Useful Commands
```bash
# Build TypeScript
pnpm build

# Run tests
pnpm test

# Update test snapshots
pnpm test -u

# Lint code
pnpm lint

# Fix linting issues
pnpm lint:fix

# Run database migrations
pnpm migrate
```

## 🗄️ Database Migrations
The database schema is managed through Kysely migrations located in `./src/migrations/`. Migrations run automatically on each `pnpm start` to ensure schema compatibility.

## 📊 Data Flow
1. Initialization - Determine the import date range based on:
   - Explicit date parameter
   - Last event timestamp in the database
   - `STARTDATE` environment variable
   - Default offset from the current date
2. Sequential Processing - For each date:
   - Check existing records for the pagination offset
   - Fetch visitor data in paginated chunks
   - Transform visits into individual events
   - Insert with conflict resolution
3. Concurrency Control:
   - Sequential date processing (one day at a time)
   - Parallel event insertion (configurable)
   - Automatic pagination for large datasets
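The initialization step's priority order can be sketched as a small pure function. Function and parameter names are ours, chosen to match the list above; they are not the tool's actual internals:

```typescript
// Illustrative sketch of the date-range detection described above.
// Priority: FORCE_STARTDATE+STARTDATE > explicit date > last DB event
// > STARTDATE > (today - INITIAL_OFFSET days).
function determineStartDate(opts: {
  explicitDate?: string;      // explicit date parameter
  lastEventDate?: string;     // last event timestamp found in the database
  envStartDate?: string;      // STARTDATE
  forceStartDate?: boolean;   // FORCE_STARTDATE
  initialOffsetDays?: number; // INITIAL_OFFSET (default 3)
  today?: Date;
}): string {
  const {
    explicitDate,
    lastEventDate,
    envStartDate,
    forceStartDate,
    initialOffsetDays = 3,
    today = new Date(),
  } = opts;
  // FORCE_STARTDATE skips the database lookup entirely
  if (forceStartDate && envStartDate) return envStartDate;
  if (explicitDate) return explicitDate;
  if (lastEventDate) return lastEventDate; // resume from last sync
  if (envStartDate) return envStartDate;
  // Default: look back INITIAL_OFFSET days from today
  const fallback = new Date(today.getTime() - initialOffsetDays * 24 * 3600 * 1000);
  return fallback.toISOString().slice(0, 10);
}
```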
## 🐛 Troubleshooting
### Common Issues
**API Authentication Errors**

- Verify `MATOMO_KEY` has sufficient permissions
- Ensure the `MATOMO_SITE` ID is correct
- Check that `MATOMO_URL` includes a trailing slash

**Database Connection Issues**

- Verify PostgreSQL is running and accessible
- Check the `PGDATABASE` connection string format
- Ensure the database exists and the user has write permissions

**Performance Issues**

- Adjust `RESULTPERPAGE` for optimal API performance
- Monitor database indexes and partitioning
- Consider running during off-peak hours for large imports

### Debug Mode
Enable detailed logging:
```bash
DEBUG=matomo-postgres* npx @socialgouv/matomo-postgres
```