MCP server for dating profile analysis:

```bash
npm install matchmaker-analysis-mcp

./setup-mcp.sh    # Unix/Mac
# or
.\setup-mcp.ps1   # Windows
```

See QUICKSTART_MCP.md for a 5-minute setup guide or MCP_SERVER_README.md for full documentation.
## Overview
This system:
1. Reads match pairs from out.json (which users were matched)
2. Looks up detailed profile data from profiles.json
3. Simulates each user's decision using an LLM persona
4. Compares predictions to actual decisions from out.json
5. Reports accuracy metrics
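The five steps above can be sketched as a predict-and-compare loop. This is a minimal sketch with hypothetical shapes and a stubbed predictor, not the actual index.ts implementation (which calls LLM personas):

```typescript
// Sketch of steps 2-5: look up profiles, predict, compare, report.
// All field names here are illustrative, not the real types.ts schema.
interface MatchPair {
  user1Id: string;
  user2Id: string;
  user1Decision: boolean; // ground truth from out.json
}

type ProfileIndex = Record<string, { summary: string }>;

// Stub predictor: a real run scores the pair via the LLM and thresholds at 0.5.
function predict(_a: { summary: string }, _b: { summary: string }): boolean {
  return false;
}

function runSample(matches: MatchPair[], profiles: ProfileIndex): number {
  let correct = 0;
  for (const m of matches) {
    const predicted = predict(profiles[m.user1Id], profiles[m.user2Id]);
    if (predicted === m.user1Decision) correct++;
  }
  return correct / matches.length; // accuracy over the sample
}
```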
## Setup

### Prerequisites
- Bun runtime (or Node.js)
- OpenRouter API key (get one from OpenRouter)
### Installation

```bash
# Install dependencies
bun install

# Or with npm
npm install
```
### Configuration

Create a `.env` file with your OpenRouter API key:

```bash
# Single API key for all models
OPENROUTER_API_KEY=your_api_key_here

# Model configuration (OpenRouter format: provider/model-name)
TEXT_MODEL=openai/gpt-5-mini        # Default: text-only evaluation
VISION_MODEL=google/gemini-pro-1.5  # Default: vision + text evaluation

# Other available models:
#   Text models: openai/gpt-4o-mini, anthropic/claude-3.5-sonnet
#   Vision models: google/gemini-flash-1.5, openai/gpt-4o, anthropic/claude-3.5-sonnet
#   See https://openrouter.ai/models for the full list

# Scoring weights (must sum to 1.0, default values shown)
WEIGHT_PHYSICAL=0.50     # Physical attraction (50%)
WEIGHT_LIFESTYLE=0.30    # Lifestyle compatibility (30%)
WEIGHT_PERSONALITY=0.15  # Personality match (15%)
WEIGHT_INTENTIONS=0.05   # Dating intentions alignment (5%)

# Optional: Set sample size (default: 10)
SAMPLE_SIZE=10

# Optional: Set offset for which matches to test
SAMPLE_OFFSET=0
```
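The "must sum to 1.0" constraint is worth enforcing at startup. A minimal sketch, assuming a hypothetical `loadWeights` helper (the real engine's startup checks may differ):

```typescript
// Sketch: parse WEIGHT_* env vars, fall back to documented defaults,
// and fail fast if the weights do not sum to 1.0.
function loadWeights(env: Record<string, string | undefined>) {
  const weights = {
    physical: parseFloat(env.WEIGHT_PHYSICAL ?? "0.50"),
    lifestyle: parseFloat(env.WEIGHT_LIFESTYLE ?? "0.30"),
    personality: parseFloat(env.WEIGHT_PERSONALITY ?? "0.15"),
    intentions: parseFloat(env.WEIGHT_INTENTIONS ?? "0.05"),
  };
  const sum =
    weights.physical + weights.lifestyle + weights.personality + weights.intentions;
  if (Math.abs(sum - 1.0) > 1e-6) {
    throw new Error(`Scoring weights must sum to 1.0, got ${sum}`);
  }
  return weights;
}
```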
## Usage

### Run the Simulation

```bash
bun index.ts
```
This will:
- Load profiles and matches
- Simulate decisions for the specified sample size
- Display results in the console
- Save detailed results to simulation_results.json
### Example Output

```
Starting LLM Matchmaker Simulation
Text Model: openai/gpt-5-mini
Vision Model: google/gemini-pro-1.5
Sample size: 10
Sample offset: 0

Loading profiles.json...
Loaded 8102 profiles
Mapped 8100 profiles by userId

Loading out.json...
Loaded 3495 match interactions

Running simulation on 10 matches (indices 0 to 9)...

======================================================================
Match 1/10
======================================================================
User 1 ID: 68dff2779005e5f5dacbddd0
Profile: Female, 35 years old
User 2 ID: 680c097e8cf7846e270a19ec
Profile: Male, 31 years old

User 1 (68dff2779005e5f5dacbddd0) evaluating User 2 (680c097e8cf7846e270a19ec)...
Real decision: REJECT
Predicted: REJECT (overall: 0.42)
Result: CORRECT

CATEGORY SCORES:
  Physical:    0.35 (Text: 0.30, Vision: 0.40)
  Lifestyle:   0.45 (Text: 0.50, Vision: 0.40)
  Personality: 0.50 (Text: 0.50, Vision: 0.50)
  Intentions:  0.60 (Text: 0.70, Vision: 0.50)

TEXT REASONING:
  While they seem nice, our hobbies don't align well. I'm into outdoor
  activities but they prefer staying in. Height difference might also
  be an issue given my preferences.

VISUAL ANALYSIS:
  Photos show a casual style with indoor settings. They appear to prefer
  cozy environments. Not quite my physical type based on my preferences.

...

============================================================
FINAL RESULTS
============================================================
Overall Accuracy: 75.00% (15/20 correct)
  User 1 Accuracy: 70.00%
  User 2 Accuracy: 80.00%

Real Acceptance Rate: 25.00%
Predicted Acceptance Rate: 30.00%

Score Analysis:
  Average Score: 0.452
  Average Score for Real Accepts: 0.683
  Average Score for Real Rejects: 0.341
  Score Separation: 0.342 (higher is better)

Confusion Matrix:
  True Positives: 4
  True Negatives: 11
  False Positives: 1
  False Negatives: 4
```
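The headline accuracy derives directly from the confusion-matrix counts. A sketch of the arithmetic, using a hypothetical `summarize` helper (results.ts computes more, e.g. per-user accuracy and score statistics):

```typescript
// Derive summary metrics from confusion-matrix counts.
interface ConfusionMatrix {
  tp: number; // predicted ACCEPT, actually ACCEPT
  tn: number; // predicted REJECT, actually REJECT
  fp: number; // predicted ACCEPT, actually REJECT
  fn: number; // predicted REJECT, actually ACCEPT
}

function summarize(m: ConfusionMatrix) {
  const total = m.tp + m.tn + m.fp + m.fn;
  return {
    accuracy: (m.tp + m.tn) / total,
    realAcceptRate: (m.tp + m.fn) / total,
    predictedAcceptRate: (m.tp + m.fp) / total,
  };
}
```

With the matrix above (TP=4, TN=11, FP=1, FN=4, i.e. 20 decisions), accuracy is (4 + 11) / 20 = 75%.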
## Project Structure

```
.
├── types.ts                 # TypeScript interfaces for profiles and matches
├── prompts.ts               # LLM prompt engineering for persona simulation
├── index.ts                 # Main simulation engine
├── results.ts               # Metrics calculation and reporting utilities
├── validate.ts              # Data validation script (no API calls)
├── profiles.json            # User profile data (dictionary)
├── out.json                 # Match interactions (input + ground truth)
├── simulation_results.json  # Detailed output (generated)
├── README.md                # This file
├── SETUP.md                 # Setup instructions
├── SCORING_SYSTEM.md        # Details on the 0-1 scoring system
└── OUTPUT_GUIDE.md          # Guide to enhanced debugging output
```
## How It Works

### Data Flow
1. Input: out.json provides match pairs and actual decisions
2. Lookup: profiles.json provides detailed profile data indexed by userId
3. Simulate: For each match, create two LLM personas (one per user)
4. Compare: Compare predicted decisions to actual decisions
### Persona Simulation
Each user becomes an LLM agent with:
- Demographics (age, gender, height, ethnicity)
- Preferences (gender, age range, ethnicity, physical attraction)
- Personality (green flags, red flags, political/religious beliefs)
- Writing style (analyzed from text length and tone)
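A persona prompt might be assembled along these lines. This is a hypothetical shape with illustrative field names, not the actual prompts.ts or types.ts code:

```typescript
// Sketch: build a system prompt that makes the LLM role-play one user.
interface PersonaProfile {
  age: number;
  gender: string;
  heightCm: number;
  greenFlags: string[];
  redFlags: string[];
  preferredAgeRange: [number, number];
}

function buildPersonaPrompt(p: PersonaProfile): string {
  return [
    `You are a ${p.age}-year-old ${p.gender}, ${p.heightCm} cm tall, on a dating app.`,
    `Things you value in a partner: ${p.greenFlags.join(", ")}.`,
    `Dealbreakers: ${p.redFlags.join(", ")}.`,
    `You prefer partners aged ${p.preferredAgeRange[0]}-${p.preferredAgeRange[1]}.`,
    `Evaluate the following profile and score each compatibility dimension from 0.0 to 1.0.`,
  ].join("\n");
}
```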
### Scoring Dimensions
The system evaluates 4 compatibility dimensions:
1. Physical Attraction (50% weight): Visual appeal, style, photos
2. Lifestyle Compatibility (30% weight): Hobbies, activities, interests
3. Personality Match (15% weight): Vibe, energy, communication style
4. Intention Alignment (5% weight): Dating goals (casual vs serious)
Each dimension is scored from 0.0 to 1.0, then the four scores are combined via a weighted average.
### Dual-Track Evaluation

- Text Track: A text-only LLM reads the profile descriptions and scores all 4 dimensions based on written content.
- Vision Track: A vision-capable LLM (Gemini) analyzes the actual profile photos and scores the dimensions based on visual cues.
- Final Score: Average of text and vision scores for each dimension, then weighted sum (≥ 0.5 = ACCEPT).
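Using the per-dimension text/vision scores from the example output above and the default weights, the merge works out as follows (a minimal sketch of the merge rule, not the actual index.ts code):

```typescript
// Sketch: average text and vision scores per dimension, then take the
// weighted sum and apply the 0.5 accept threshold.
type Dimension = "physical" | "lifestyle" | "personality" | "intentions";

const WEIGHTS: Record<Dimension, number> = {
  physical: 0.5,
  lifestyle: 0.3,
  personality: 0.15,
  intentions: 0.05,
};

function finalScore(
  text: Record<Dimension, number>,
  vision: Record<Dimension, number>,
): { overall: number; accept: boolean } {
  let overall = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    const merged = (text[dim] + vision[dim]) / 2; // simple average of the two tracks
    overall += WEIGHTS[dim] * merged;
  }
  return { overall, accept: overall >= 0.5 };
}

// The example match: text scores 0.30/0.50/0.50/0.70, vision scores 0.40/0.40/0.50/0.50
const result = finalScore(
  { physical: 0.3, lifestyle: 0.5, personality: 0.5, intentions: 0.7 },
  { physical: 0.4, lifestyle: 0.4, personality: 0.5, intentions: 0.5 },
);
// overall = 0.5*0.35 + 0.3*0.45 + 0.15*0.50 + 0.05*0.60 = 0.415 -> REJECT
```

The console output rounds 0.415 up to the displayed `overall: 0.42`.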
### Hard Filters
Before LLM evaluation, the system checks:
- Gender preferences (expectedGender)
- Age range preferences (ageRange)
- Ethnicity preferences (expectedEthnicity)
If any hard filter fails → instant reject (saves API costs).
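A pre-LLM filter pass might look like this. The field names are illustrative assumptions, not the actual types.ts schema:

```typescript
// Sketch: cheap deterministic checks before spending any API calls.
interface Candidate {
  gender: string;
  age: number;
  ethnicity: string;
}

interface Preferences {
  expectedGender?: string;
  ageRange?: [number, number];
  expectedEthnicity?: string[];
}

function passesHardFilters(candidate: Candidate, prefs: Preferences): boolean {
  if (prefs.expectedGender && candidate.gender !== prefs.expectedGender) return false;
  if (prefs.ageRange) {
    const [min, max] = prefs.ageRange;
    if (candidate.age < min || candidate.age > max) return false;
  }
  if (
    prefs.expectedEthnicity &&
    prefs.expectedEthnicity.length > 0 &&
    !prefs.expectedEthnicity.includes(candidate.ethnicity)
  ) {
    return false;
  }
  return true; // only now does the LLM evaluation run
}
```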
## Key Features
- Multi-dimensional scoring: 4 compatibility categories (Physical, Lifestyle, Personality, Intentions)
- Dual-track evaluation: Text-only LLM + Vision LLM analyze each candidate independently
- Vision analysis: Gemini evaluates actual profile photos, not just text descriptions
- Persona-based prediction: LLM adopts user's personality/preferences
- Weighted aggregation: Configurable weights for each compatibility dimension
- Continuous scoring (0-1): Nuanced confidence scores for each category
- Dual reasoning display: See both text-based reasoning and visual analysis
- Category breakdown: Understand exactly why matches succeed or fail
- Writing style analysis: Matches casual vs serious users
- Score separation metrics: Measures model's ability to distinguish accept/reject
- UserId debugging: Display actual userIds instead of redacted names
- Comprehensive metrics: Accuracy, confusion matrix, acceptance rates, per-category analysis
## Tuning
To improve accuracy:
1. Adjust scoring weights in .env:
- Increase WEIGHT_PHYSICAL if physical attraction matters more
- Increase WEIGHT_LIFESTYLE if hobby alignment is key
- Weights must sum to 1.0
2. Change models in .env:
- Text models: openai/gpt-5-mini, anthropic/claude-3.5-sonnet
- Vision models: google/gemini-pro-1.5, google/gemini-flash-1.5, openai/gpt-4o
- See OpenRouter models for full list
3. Adjust prompts in prompts.ts:
- Make personas more picky/lenient
- Emphasize different evaluation criteria
4. Modify evaluation strategy in index.ts:
- Change how text/vision scores are merged (currently simple average)
- Add minimum thresholds for specific categories
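For instance, item 4's "minimum thresholds" could become a veto rule: reject outright when any single category falls below a floor, regardless of the weighted sum. A hypothetical variant, not current behavior (index.ts uses the plain weighted sum):

```typescript
// Sketch: weighted sum plus per-category minimum thresholds.
// MIN_THRESHOLDS is an illustrative tuning knob, not an existing config.
const MIN_THRESHOLDS: Record<string, number> = { physical: 0.2 };

function decide(
  scores: Record<string, number>,
  weights: Record<string, number>,
): boolean {
  for (const [category, floor] of Object.entries(MIN_THRESHOLDS)) {
    if ((scores[category] ?? 0) < floor) return false; // hard veto
  }
  let overall = 0;
  for (const [category, weight] of Object.entries(weights)) {
    overall += weight * (scores[category] ?? 0);
  }
  return overall >= 0.5;
}
```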
## Cost Estimation
Dual-Track System (Text + Vision):
Using openai/gpt-5-mini + google/gemini-pro-1.5:
- Sample of 10 matches: ~$0.08-0.15 (40 API calls: 20 text + 20 vision)
- Full dataset (~3.5K matches): ~$280-525 (14K API calls)
Cost breakdown per match:
- Text track: ~$0.02 (gpt-5-mini)
- Vision track: ~$0.04-0.08 (gemini-pro-1.5 with 3-5 images)
- Total per match: ~$0.06-0.10
Budget Options:
- Use google/gemini-flash-1.5 for faster, cheaper vision (~50% cost reduction)
- Reduce to text-only by setting `VISION_MODEL=""` (50% cost reduction, but loses visual analysis)