MCP server for dating profile analysis:

```bash
npm install matchmaker-analysis-mcp

./setup-mcp.sh    # Unix/Mac
# or
.\setup-mcp.ps1   # Windows
```

See QUICKSTART_MCP.md for a 5-minute setup guide or MCP_SERVER_README.md for full documentation.
## Overview
This system:
1. Reads match pairs from out.json (which users were matched)
2. Looks up detailed profile data from profiles.json
3. Simulates each user's decision using an LLM persona
4. Compares predictions to actual decisions from out.json
5. Reports accuracy metrics
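The five steps above can be sketched as a predict-and-compare loop. This is a minimal sketch with hypothetical shapes and a stubbed predictor, not the actual index.ts implementation (which calls LLM personas):

```typescript
// Sketch of steps 2-5: look up profiles, predict, compare, report.
// All field names here are illustrative, not the real types.ts schema.
interface MatchPair {
  user1Id: string;
  user2Id: string;
  user1Decision: boolean; // ground truth from out.json
}

type ProfileIndex = Record<string, { summary: string }>;

// Stub predictor: a real run scores the pair via the LLM and thresholds at 0.5.
function predict(_a: { summary: string }, _b: { summary: string }): boolean {
  return false;
}

function runSample(matches: MatchPair[], profiles: ProfileIndex): number {
  let correct = 0;
  for (const m of matches) {
    const predicted = predict(profiles[m.user1Id], profiles[m.user2Id]);
    if (predicted === m.user1Decision) correct++;
  }
  return correct / matches.length; // accuracy over the sample
}
```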
## Setup

### Prerequisites
- Bun runtime (or Node.js)
- OpenRouter API key (get one from OpenRouter)
### Installation

```bash
# Install dependencies
bun install

# Or with npm
npm install
```
### Configuration

Create a `.env` file with your OpenRouter API key:

```bash
# Single API key for all models
OPENROUTER_API_KEY=your_api_key_here

# Model configuration (OpenRouter format: provider/model-name)
TEXT_MODEL=openai/gpt-5-mini        # Default: text-only evaluation
VISION_MODEL=google/gemini-pro-1.5  # Default: vision + text evaluation

# Other available models:
#   Text models: openai/gpt-4o-mini, anthropic/claude-3.5-sonnet
#   Vision models: google/gemini-flash-1.5, openai/gpt-4o, anthropic/claude-3.5-sonnet
#   See https://openrouter.ai/models for the full list

# Scoring weights (must sum to 1.0, default values shown)
WEIGHT_PHYSICAL=0.50     # Physical attraction (50%)
WEIGHT_LIFESTYLE=0.30    # Lifestyle compatibility (30%)
WEIGHT_PERSONALITY=0.15  # Personality match (15%)
WEIGHT_INTENTIONS=0.05   # Dating intentions alignment (5%)

# Optional: Set sample size (default: 10)
SAMPLE_SIZE=10

# Optional: Set offset for which matches to test
SAMPLE_OFFSET=0
```
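The "must sum to 1.0" constraint is worth enforcing at startup. A minimal sketch, assuming a hypothetical `loadWeights` helper (the real engine's startup checks may differ):

```typescript
// Sketch: parse WEIGHT_* env vars, fall back to documented defaults,
// and fail fast if the weights do not sum to 1.0.
function loadWeights(env: Record<string, string | undefined>) {
  const weights = {
    physical: parseFloat(env.WEIGHT_PHYSICAL ?? "0.50"),
    lifestyle: parseFloat(env.WEIGHT_LIFESTYLE ?? "0.30"),
    personality: parseFloat(env.WEIGHT_PERSONALITY ?? "0.15"),
    intentions: parseFloat(env.WEIGHT_INTENTIONS ?? "0.05"),
  };
  const sum =
    weights.physical + weights.lifestyle + weights.personality + weights.intentions;
  if (Math.abs(sum - 1.0) > 1e-6) {
    throw new Error(`Scoring weights must sum to 1.0, got ${sum}`);
  }
  return weights;
}
```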
## Usage

### Run the Simulation

```bash
bun index.ts
```
This will:
- Load profiles and matches
- Simulate decisions for the specified sample size
- Display results in the console
- Save detailed results to simulation_results.json
### Example Output

```
Starting LLM Matchmaker Simulation
Text Model: openai/gpt-5-mini
Vision Model: google/gemini-pro-1.5
Sample size: 10
Sample offset: 0

Loading profiles.json...
Loaded 8102 profiles
Mapped 8100 profiles by userId

Loading out.json...
Loaded 3495 match interactions

Running simulation on 10 matches (indices 0 to 9)...

======================================================================
Match 1/10
======================================================================
User 1 ID: 68dff2779005e5f5dacbddd0
Profile: Female, 35 years old
User 2 ID: 680c097e8cf7846e270a19ec
Profile: Male, 31 years old

User 1 (68dff2779005e5f5dacbddd0) evaluating User 2 (680c097e8cf7846e270a19ec)...
Real decision: REJECT
Predicted: REJECT (overall: 0.42)
Result: CORRECT

CATEGORY SCORES:
  Physical:    0.35 (Text: 0.30, Vision: 0.40)
  Lifestyle:   0.45 (Text: 0.50, Vision: 0.40)
  Personality: 0.50 (Text: 0.50, Vision: 0.50)
  Intentions:  0.60 (Text: 0.70, Vision: 0.50)

TEXT REASONING:
  While they seem nice, our hobbies don't align well. I'm into outdoor
  activities but they prefer staying in. Height difference might also
  be an issue given my preferences.

VISUAL ANALYSIS:
  Photos show a casual style with indoor settings. They appear to prefer
  cozy environments. Not quite my physical type based on my preferences.

...

============================================================
FINAL RESULTS
============================================================
Overall Accuracy: 75.00% (15/20 correct)
  User 1 Accuracy: 70.00%
  User 2 Accuracy: 80.00%

Real Acceptance Rate: 25.00%
Predicted Acceptance Rate: 30.00%

Score Analysis:
  Average Score: 0.452
  Average Score for Real Accepts: 0.683
  Average Score for Real Rejects: 0.341
  Score Separation: 0.342 (higher is better)

Confusion Matrix:
  True Positives: 4
  True Negatives: 11
  False Positives: 1
  False Negatives: 4
```
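The headline accuracy derives directly from the confusion-matrix counts. A sketch of the arithmetic, using a hypothetical `summarize` helper (results.ts computes more, e.g. per-user accuracy and score statistics):

```typescript
// Derive summary metrics from confusion-matrix counts.
interface ConfusionMatrix {
  tp: number; // predicted ACCEPT, actually ACCEPT
  tn: number; // predicted REJECT, actually REJECT
  fp: number; // predicted ACCEPT, actually REJECT
  fn: number; // predicted REJECT, actually ACCEPT
}

function summarize(m: ConfusionMatrix) {
  const total = m.tp + m.tn + m.fp + m.fn;
  return {
    accuracy: (m.tp + m.tn) / total,
    realAcceptRate: (m.tp + m.fn) / total,
    predictedAcceptRate: (m.tp + m.fp) / total,
  };
}
```

With the matrix above (TP=4, TN=11, FP=1, FN=4, i.e. 20 decisions), accuracy is (4 + 11) / 20 = 75%.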
## Project Structure

```
.
├── types.ts                 # TypeScript interfaces for profiles and matches
├── prompts.ts               # LLM prompt engineering for persona simulation
├── index.ts                 # Main simulation engine
├── results.ts               # Metrics calculation and reporting utilities
├── validate.ts              # Data validation script (no API calls)
├── profiles.json            # User profile data (dictionary)
├── out.json                 # Match interactions (input + ground truth)
├── simulation_results.json  # Detailed output (generated)
├── README.md                # This file
├── SETUP.md                 # Setup instructions
├── SCORING_SYSTEM.md        # Details on the 0-1 scoring system
└── OUTPUT_GUIDE.md          # Guide to enhanced debugging output
```
## How It Works

### Data Flow
1. Input: out.json provides match pairs and actual decisions
2. Lookup: profiles.json provides detailed profile data indexed by userId
3. Simulate: For each match, create two LLM personas (one per user)
4. Compare: Compare predicted decisions to actual decisions
### Persona Simulation
Each user becomes an LLM agent with:
- Demographics (age, gender, height, ethnicity)
- Preferences (gender, age range, ethnicity, physical attraction)
- Personality (green flags, red flags, political/religious beliefs)
- Writing style (analyzed from text length and tone)
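A persona prompt might be assembled along these lines. This is a hypothetical shape with illustrative field names, not the actual prompts.ts or types.ts code:

```typescript
// Sketch: build a system prompt that makes the LLM role-play one user.
interface PersonaProfile {
  age: number;
  gender: string;
  heightCm: number;
  greenFlags: string[];
  redFlags: string[];
  preferredAgeRange: [number, number];
}

function buildPersonaPrompt(p: PersonaProfile): string {
  return [
    `You are a ${p.age}-year-old ${p.gender}, ${p.heightCm} cm tall, on a dating app.`,
    `Things you value in a partner: ${p.greenFlags.join(", ")}.`,
    `Dealbreakers: ${p.redFlags.join(", ")}.`,
    `You prefer partners aged ${p.preferredAgeRange[0]}-${p.preferredAgeRange[1]}.`,
    `Evaluate the following profile and score each compatibility dimension from 0.0 to 1.0.`,
  ].join("\n");
}
```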
### Scoring Dimensions
The system evaluates 4 compatibility dimensions:
1. Physical Attraction (50% weight): Visual appeal, style, photos
2. Lifestyle Compatibility (30% weight): Hobbies, activities, interests
3. Personality Match (15% weight): Vibe, energy, communication style
4. Intention Alignment (5% weight): Dating goals (casual vs serious)
Each dimension is scored from 0.0 to 1.0, then the four scores are combined via a weighted average.
### Dual-Track Evaluation

- Text Track: A text-only LLM reads the profile descriptions and scores all 4 dimensions based on written content.
- Vision Track: A vision-capable LLM (Gemini) analyzes the actual profile photos and scores the dimensions based on visual cues.
- Final Score: Average of text and vision scores for each dimension, then weighted sum (≥ 0.5 = ACCEPT).
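Using the per-dimension text/vision scores from the example output above and the default weights, the merge works out as follows (a minimal sketch of the merge rule, not the actual index.ts code):

```typescript
// Sketch: average text and vision scores per dimension, then take the
// weighted sum and apply the 0.5 accept threshold.
type Dimension = "physical" | "lifestyle" | "personality" | "intentions";

const WEIGHTS: Record<Dimension, number> = {
  physical: 0.5,
  lifestyle: 0.3,
  personality: 0.15,
  intentions: 0.05,
};

function finalScore(
  text: Record<Dimension, number>,
  vision: Record<Dimension, number>,
): { overall: number; accept: boolean } {
  let overall = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    const merged = (text[dim] + vision[dim]) / 2; // simple average of the two tracks
    overall += WEIGHTS[dim] * merged;
  }
  return { overall, accept: overall >= 0.5 };
}

// The example match: text scores 0.30/0.50/0.50/0.70, vision scores 0.40/0.40/0.50/0.50
const result = finalScore(
  { physical: 0.3, lifestyle: 0.5, personality: 0.5, intentions: 0.7 },
  { physical: 0.4, lifestyle: 0.4, personality: 0.5, intentions: 0.5 },
);
// overall = 0.5*0.35 + 0.3*0.45 + 0.15*0.50 + 0.05*0.60 = 0.415 -> REJECT
```

The console output rounds 0.415 up to the displayed `overall: 0.42`.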
### Hard Filters
Before LLM evaluation, the system checks:
- Gender preferences (expectedGender)
- Age range preferences (ageRange)
- Ethnicity preferences (expectedEthnicity)
If any hard filter fails → instant reject (saves API costs).
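A pre-LLM filter pass might look like this. The field names are illustrative assumptions, not the actual types.ts schema:

```typescript
// Sketch: cheap deterministic checks before spending any API calls.
interface Candidate {
  gender: string;
  age: number;
  ethnicity: string;
}

interface Preferences {
  expectedGender?: string;
  ageRange?: [number, number];
  expectedEthnicity?: string[];
}

function passesHardFilters(candidate: Candidate, prefs: Preferences): boolean {
  if (prefs.expectedGender && candidate.gender !== prefs.expectedGender) return false;
  if (prefs.ageRange) {
    const [min, max] = prefs.ageRange;
    if (candidate.age < min || candidate.age > max) return false;
  }
  if (
    prefs.expectedEthnicity &&
    prefs.expectedEthnicity.length > 0 &&
    !prefs.expectedEthnicity.includes(candidate.ethnicity)
  ) {
    return false;
  }
  return true; // only now does the LLM evaluation run
}
```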
## Key Features
- Multi-dimensional scoring: 4 compatibility categories (Physical, Lifestyle, Personality, Intentions)
- Dual-track evaluation: Text-only LLM + Vision LLM analyze each candidate independently
- Vision analysis: Gemini evaluates actual profile photos, not just text descriptions
- Persona-based prediction: LLM adopts user's personality/preferences
- Weighted aggregation: Configurable weights for each compatibility dimension
- Continuous scoring (0-1): Nuanced confidence scores for each category
- Dual reasoning display: See both text-based reasoning and visual analysis
- Category breakdown: Understand exactly why matches succeed or fail
- Writing style analysis: Matches casual vs serious users
- Score separation metrics: Measures model's ability to distinguish accept/reject
- UserId debugging: Display actual userIds instead of redacted names
- Comprehensive metrics: Accuracy, confusion matrix, acceptance rates, per-category analysis
## Tuning
To improve accuracy:
1. Adjust scoring weights in .env:
- Increase WEIGHT_PHYSICAL if physical attraction matters more
- Increase WEIGHT_LIFESTYLE if hobby alignment is key
- Weights must sum to 1.0
2. Change models in .env:
- Text models: openai/gpt-5-mini, anthropic/claude-3.5-sonnet
- Vision models: google/gemini-pro-1.5, google/gemini-flash-1.5, openai/gpt-4o
- See OpenRouter models for full list
3. Adjust prompts in prompts.ts:
- Make personas more picky/lenient
- Emphasize different evaluation criteria
4. Modify evaluation strategy in index.ts:
- Change how text/vision scores are merged (currently simple average)
- Add minimum thresholds for specific categories
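For instance, item 4's "minimum thresholds" could become a veto rule: reject outright when any single category falls below a floor, regardless of the weighted sum. A hypothetical variant, not current behavior (index.ts uses the plain weighted sum):

```typescript
// Sketch: weighted sum plus per-category minimum thresholds.
// MIN_THRESHOLDS is an illustrative tuning knob, not an existing config.
const MIN_THRESHOLDS: Record<string, number> = { physical: 0.2 };

function decide(
  scores: Record<string, number>,
  weights: Record<string, number>,
): boolean {
  for (const [category, floor] of Object.entries(MIN_THRESHOLDS)) {
    if ((scores[category] ?? 0) < floor) return false; // hard veto
  }
  let overall = 0;
  for (const [category, weight] of Object.entries(weights)) {
    overall += weight * (scores[category] ?? 0);
  }
  return overall >= 0.5;
}
```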
## Cost Estimation
Dual-Track System (Text + Vision):
Using openai/gpt-5-mini + google/gemini-pro-1.5:
- Sample of 10 matches: ~$0.08-0.15 (40 API calls: 20 text + 20 vision)
- Full dataset (~3.5K matches): ~$280-525 (14K API calls)
Cost breakdown per match:
- Text track: ~$0.02 (gpt-5-mini)
- Vision track: ~$0.04-0.08 (gemini-pro-1.5 with 3-5 images)
- Total per match: ~$0.06-0.10
Budget Options:
- Use google/gemini-flash-1.5 for faster, cheaper vision (~50% cost reduction)
- Reduce to text-only by setting `VISION_MODEL=""` (50% cost reduction, but loses visual analysis)