Uncover the full audit trail of your email threads. Recursively reconstructs the entire conversation history with instant access to the original sender and true source message.
- Tested against the email-forward-parser-recursive library with a 100% success rate (239/239). This includes validating message bodies and ensuring non-message snippets are correctly identified. See Test Coverage Report for details.
- Hybrid strategy: MIME recursion (message/rfc822) and inline text parsing.
- Built on mailparser and email-forward-parser, with custom detectors for Outlook Live, French headers, and more.
```bash
npm install email-origin-chain
```
### extractDeepestHybrid(raw, options)
Analyzes an email to extract the deepest (original) message in the chain and its full history.
* raw: string | Buffer | Readable - The full raw email source (recommended to pass as Buffer or Stream to preserve encoding).
* options: Options (optional) - Configuration for the extraction.
#### Example
```javascript
const { extractDeepestHybrid } = require('email-origin-chain');
const fs = require('fs');

// CommonJS has no top-level await, so the calls are wrapped in an async function
async function main() {
  // Recommendation: pass the raw Buffer or Stream directly to preserve encoding
  const rawEml = fs.readFileSync('email.eml');
  const result = await extractDeepestHybrid(rawEml);

  // Streams are supported as well
  const stream = fs.createReadStream('heavy-thread.eml');
  const streamResult = await extractDeepestHybrid(stream);
}

main().catch(console.error);
```
### CLI Usage
You can test any email file directly using the included extraction tool:
```bash
npx tsx bin/extract.ts tests/fixtures/complex-forward.eml
```
```typescript
import { extractDeepestHybrid } from 'email-origin-chain';

// rawEmailString and rawText are strings supplied by the caller.

// Process a full EML with the hybrid strategy
const result = await extractDeepestHybrid(rawEmailString);

// Process ONLY the text/inline forwards (ignore the MIME layer)
const textOnlyResult = await extractDeepestHybrid(rawText, { skipMimeLayer: true });

console.log(result.text);    // The deepest original message
console.log(result.history); // Full conversation chain
```
## Options
| Option | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| skipMimeLayer | boolean | false | If true, ignores MIME parsing (rfc822) and processes the input as raw text only. Ideal for inputs that are already stripped of headers. |
| maxDepth | number | 5 | Maximum number of recursion levels for MIME parsing. |
| timeoutMs | number | 5000 | Timeout for MIME processing to prevent blocking on huge files. |
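
As a minimal sketch of how the defaults in the table combine with caller-supplied values: the `Options` shape below is a local mirror of the table, and `resolveOptions` is a hypothetical helper written for illustration, not part of the library's API.

```typescript
// Local mirror of the options table above (an assumption for illustration;
// the library ships its own Options type).
interface Options {
  skipMimeLayer?: boolean;
  maxDepth?: number;
  timeoutMs?: number;
}

// Defaults as documented in the table.
const DEFAULTS: Required<Options> = {
  skipMimeLayer: false,
  maxDepth: 5,
  timeoutMs: 5000,
};

// Hypothetical helper: fills in every option the caller left out.
function resolveOptions(options: Options = {}): Required<Options> {
  return { ...DEFAULTS, ...options };
}

// Text-only processing with a tighter recursion limit; timeoutMs keeps its default.
const resolved = resolveOptions({ skipMimeLayer: true, maxDepth: 2 });
console.log(resolved);
```

Only the keys passed by the caller change; everything else keeps the documented default.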
## Response Format
The library returns a ResultObject with the following structure:
| Field | Type | Description |
| :--- | :--- | :--- |
| from | object \| null | { name?: string, address?: string }. |
| to | array | List of primary recipients. |
| cc | array | List of CC recipients. |
| subject | string \| null | The original subject line of the deepest message. |
| date_raw | string \| null | The original date string found in the email headers. |
| date_iso | string \| null | ISO 8601 UTC representation (normalized via any-date-parser). |
| text | string \| null | Cleaned body content of the deepest message. |
| full_body | string | The full decoded text body before chain splitting. |
| attachments | array | Metadata for MIME attachments found at the deepest level. |
| history | array | Conversation Chaining: Full audit trail of the discussion (see below). |
| confidence_score | number | Reliability score (0-100) based on signal analysis. |
| confidence_description | string | Human-readable explanation of the score. |
| confidence_signals | object | Key-value breakdown of triggered bonuses and penalties. |
| confidence_reasons | array | Detailed list of triggered scoring rules. |
| diagnostics | object | Metadata about the parsing process. |
### Diagnostics
- method: Strategy used to find the deepest message.
  - rfc822: Found via recursive MIME attachments (highest reliability).
  - inline: Found via text pattern detection (forwarded blocks).
  - fallback: No forward found, returning current message info or best-effort extraction.
- depth: Number of forward levels traversed (0 for original email).
- parsedOk: true if at least a sender (from) and subject were successfully extracted.
- warnings: Array of non-fatal issues (e.g., date normalization failure).
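
The fields above can be consumed defensively, since `from` and `subject` may be null and `parsedOk` may be false. The sketch below uses local interfaces that mirror the documented fields (trimmed, and an assumption about the exact types); `describeResult` is a hypothetical helper, not a library export.

```typescript
// Local mirror of the documented fields (assumed shape, trimmed for brevity).
interface Contact { name?: string; address?: string }
interface Diagnostics {
  method: 'rfc822' | 'inline' | 'fallback';
  depth: number;
  parsedOk: boolean;
  warnings: string[];
}
interface ResultLike {
  from: Contact | null;
  subject: string | null;
  diagnostics: Diagnostics;
}

// Hypothetical helper: builds a one-line summary that tolerates null fields.
function describeResult(result: ResultLike): string {
  const { method, depth, parsedOk, warnings } = result.diagnostics;
  if (!parsedOk) {
    return `unparsed input (method=${method}, warnings=${warnings.length})`;
  }
  const sender = result.from?.address ?? result.from?.name ?? 'unknown sender';
  return `${sender}: "${result.subject ?? ''}" (method=${method}, depth=${depth})`;
}

console.log(describeResult({
  from: { name: 'Original Sender Name', address: 'original@source.com' },
  subject: 'Initial Topic',
  diagnostics: { method: 'inline', depth: 2, parsedOk: true, warnings: [] },
}));
```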
### Conversation Chain (history)
Rather than just finding the "original" source, the library reconstructs the entire Conversation Chain (sometimes called Email Threading or Message Chaining). This allows you to audit every step of a transfer:
- history[0]: The deepest (oldest) message in the chain. Its fields match the top-level fields of the result object.
- history[1...n-1]: Intermediate forwards/messages.
- history[n]: The root (most recent) message you actually received.
Each history entry contains its own from, to, cc, subject, date_iso, text, and flags (array of strings). The contact fields (from, to, cc) are structured as objects containing:
- name: The display name (e.g., "John Doe").
- address: The email address (e.g., "john@example.com").
#### Possible Flags:
- level:deepest: The original source of the thread.
- level:root: The entry representing the received email itself.
- trust:high_mime: Metadata from a real .eml attachment (100% reliable).
- trust:medium_inline: Metadata extracted from text patterns (best effort).
- method:crisp_engine: Detected via standard international patterns (Crisp).
- method:outlook_fr: Detected via standard rules (French, Outlook).
- method:outlook_reverse_fr: Detected via reversed rules (Envoyé before De).
- method:outlook_empty_header: Detected via permissive rules (No date/email).
- method:new_outlook: Detected via modern localized headers (handles bolding and mailto: tags).
- method:reply: Detected via international reply patterns (On ... wrote:).
- method:crisp: Detected via standard international patterns (Crisp/Fallback).
- content:silent_forward: The user forwarded the message without adding any text.
- date:unparseable: A date string was found but could not be normalized to ISO.
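
A short sketch of working with history entries and their flags. The `HistoryEntry` interface is a local, trimmed mirror of the fields described above, and `hasFlag`, `inlineOnly` and `renderChain` are hypothetical helpers introduced for illustration.

```typescript
// Local mirror of a history entry (assumed shape, trimmed for brevity).
interface Contact { name?: string; address?: string }
interface HistoryEntry {
  depth: number;
  from: Contact | null;
  text: string | null;
  flags: string[];
}

// Hypothetical helper: checks one of the documented flag strings.
const hasFlag = (entry: HistoryEntry, flag: string): boolean =>
  entry.flags.includes(flag);

// Entries whose metadata came from text patterns rather than a real .eml attachment.
function inlineOnly(history: HistoryEntry[]): HistoryEntry[] {
  return history.filter((entry) => hasFlag(entry, 'trust:medium_inline'));
}

// Renders the chain from the received email (depth 0) down to the original source.
function renderChain(history: HistoryEntry[]): string {
  return [...history]
    .sort((a, b) => a.depth - b.depth)
    .map((entry) => entry.from?.address ?? '(unknown)')
    .join(' <- ');
}

const history: HistoryEntry[] = [
  { depth: 2, from: { address: 'original@source.com' }, text: 'The very first message content.',
    flags: ['method:outlook_fr', 'trust:medium_inline', 'level:deepest'] },
  { depth: 1, from: { address: 'inter@company.com' }, text: '',
    flags: ['method:crisp', 'trust:medium_inline', 'content:silent_forward'] },
  { depth: 0, from: { address: 'me@provider.com' }, text: 'Check this thread below!',
    flags: ['trust:high_mime', 'level:root'] },
];

console.log(renderChain(history));       // me@provider.com <- inter@company.com <- original@source.com
console.log(inlineOnly(history).length); // 2
```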
## Confidence Scoring System
To ensure high-quality extraction from text-based forwards, the library uses a Signal-Based Confidence Score. It analyzes metrics like email address density, sender count consistency, and quote levels to detect "Garbage" or incomplete chains.
### Scoring Rules
- Baseline: 100% confidence for standard formatting (~2 emails per level).
- Penalties:
  - Sender Mismatch: More senders found than levels detected (-75%).
  - Quote Mismatch: Quote nesting deeper than detected levels (-75%).
  - Partial Chain: Only 1 email detected per level (-50%).
  - Ghost Forward: No emails found in text (-100%).
- Bonuses:
  - Validated Density: High email density corroborated by context headers (+75%).
Check the Confidence Scoring Documentation for full details.
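
To make the arithmetic concrete, here is a rough sketch that applies the adjustments listed above. How the library actually combines them is not stated here, so the additive model with clamping to 0-100 is an assumption, and the signal names and `sketchScore` are hypothetical.

```typescript
// Adjustment values come from the list above. Combining them additively and
// clamping to 0..100 is an ASSUMPTION, not the library's documented behavior.
const ADJUSTMENTS = {
  senderMismatch: -75,
  quoteMismatch: -75,
  partialChain: -50,
  ghostForward: -100,
  validatedDensity: 75,
} as const;

type Signal = keyof typeof ADJUSTMENTS;

// Hypothetical scorer: start from the 100% baseline and apply each triggered signal.
function sketchScore(signals: Signal[]): number {
  const raw = signals.reduce<number>((score, signal) => score + ADJUSTMENTS[signal], 100);
  return Math.max(0, Math.min(100, raw));
}

console.log(sketchScore([]));                                     // 100
console.log(sketchScore(['partialChain']));                       // 50
console.log(sketchScore(['senderMismatch', 'validatedDensity'])); // 100
console.log(sketchScore(['ghostForward']));                       // 0
```

Under this model a bonus can offset a penalty, which is one plausible reading of "corroborated by context headers". The linked scoring documentation remains the authority on the real rules.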
### Example Output
```json
{
  "from": { "name": "Original Sender Name", "address": "original@source.com" },
  "subject": "Initial Topic",
  "text": "The very first message content.",
  "full_body": "Check this thread below!\n\n---------- Forwarded message ---------\nFrom: Intermediate Person ...",
  "history": [
    {
      "depth": 2,
      "from": { "name": "Original Sender Name", "address": "original@source.com" },
      "text": "The very first message content.",
      "flags": ["method:outlook_fr", "trust:medium_inline", "level:deepest"]
    },
    {
      "depth": 1,
      "from": { "name": "Intermediate Person", "address": "inter@company.com" },
      "text": "",
      "flags": ["method:crisp", "trust:medium_inline", "content:silent_forward"]
    },
    {
      "depth": 0,
      "from": { "name": "Me", "address": "me@provider.com" },
      "text": "Check this thread below!",
      "flags": ["trust:high_mime", "level:root"]
    }
  ],
  "diagnostics": {
    "method": "inline",
    "depth": 2,
    "parsedOk": true,
    "warnings": []
  },
  "confidence_score": 100,
  "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
  "confidence_signals": {},
  "confidence_reasons": [
    "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
  ]
}
```
## Examples
### Simple Email (No Forward)
When no forward is detected, the library returns the metadata of the email itself.
```typescript
// Raw source of a plain email (headers and body after the first line are omitted here).
const email = `From: alice@example.com
...`;

const result = await extractDeepestHybrid(email);
console.log(result.diagnostics.depth); // 0
console.log(result.from?.address);     // "alice@example.com"
```
### Double Forward
The library recursively follows "Forwarded message" blocks to find the original sender.
```typescript
// Raw source of an email that was forwarded twice (contents omitted here).
const doubleForward = `...`;

const result = await extractDeepestHybrid(doubleForward);
console.log(result.diagnostics.depth); // 2
console.log(result.from?.address);     // "original@source.com"
console.log(result.text);              // "This is the very first message content."
```
### Extreme Chain (Mixed Languages)
For complex corporate threads where a message is forwarded multiple times across different regional offices (e.g., mixing English and French headers).
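```typescript
// Raw source of a long, multi-language thread (everything after the first line is omitted here).
const extremeChain = `From: boss@corp.com
...`;

const result = await extractDeepestHybrid(extremeChain);
console.log(result.diagnostics.depth); // 4 (5 messages total)
```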
JSON Output Example (Extreme Case):
```json
{
  "from": { "address": "original@source.com" },
  "subject": "original request",
  "text": "Hello, please forward this back to me.",
  "full_body": "Check the bottom of this long thread.\n\n---------- Forwarded message ---------\nDe : Intermediate Manager...",
  "history": [
    {
      "depth": 4,
      "from": { "address": "original@source.com" },
      "text": "Hello, please forward this back to me.",
      "flags": ["method:crisp", "trust:medium_inline", "level:deepest"]
    },
    {
      "depth": 3,
      "from": { "address": "inter-1@provider.com" },
      "text": "Ok noted, I am forwarding it back to you.",
      "flags": ["method:crisp", "trust:medium_inline"]
    },
    {
      "depth": 2,
      "from": { "name": "Employee", "address": "real.end@gmail.com" },
      "text": "Great Yodjii, thank you",
      "flags": ["method:outlook_empty_header", "trust:medium_inline"]
    },
    {
      "depth": 1,
      "from": { "name": "Intermediate Manager", "address": "inter-2@corp.com" },
      "text": "But it is quite normal!",
      "flags": ["method:crisp", "trust:medium_inline"]
    },
    {
      "depth": 0,
      "from": { "address": "boss@corp.com" },
      "text": "Check the bottom of this long thread.",
      "flags": ["trust:high_mime", "level:root"]
    }
  ],
  "diagnostics": {
    "method": "inline",
    "depth": 4,
    "parsedOk": true,
    "warnings": []
  },
  "confidence_score": 100,
  "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
  "confidence_signals": {},
  "confidence_reasons": [
    "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
  ]
}
```
### French Headers
The library automatically handles international headers like "De:", "Objet:", "Message transféré".
```typescript
// Raw source of an email forwarded with French headers (contents omitted here).
const frenchEmail = `...`;

const result = await extractDeepestHybrid(frenchEmail);
console.log(result.from?.name); // "Expert Auto"
console.log(result.date_iso);   // "2025-02-10T10:39:00.000Z"
```
## Extensions & Plugins (Custom Detectors)
The library allows you to inject custom forward detectors to handle specific corporate headers, regional formats, or proprietary email barriers that are not covered by the default detectors.
This system is built on Dependency Injection, meaning your custom logic lives in your application code, not deeper in node_modules.
### Creating a Custom Detector
Implement the ForwardDetector interface:
```typescript
import { extractDeepestHybrid, ForwardDetector, DetectionResult } from 'email-origin-chain';

class MyCustomDetector implements ForwardDetector {
  // Unique name for your detector (will appear in 'diagnostics.method')
  name = 'my-custom-detector';

  // Priority: lower number = higher priority.
  //   -100       = Override everything (expert plugins)
  //   -40 to -20 = Specific built-in detectors (Outlook, FR, etc.)
  //   100        = Crisp (default international engine)
  //   150        = Reply (fallback)
  priority = -100;

  detect(text: string): DetectionResult {
    // Example: detects '--- START FORWARD ---'
    const marker = '--- START FORWARD ---';
    const idx = text.indexOf(marker);

    if (idx !== -1) {
      // Extracted body (text AFTER the marker)
      const body = text.substring(idx + marker.length).trim();
      // Text BEFORE the marker (the message from the forwarder)
      const message = text.substring(0, idx).trim();

      return {
        found: true,
        detector: this.name,
        confidence: 'high',
        message: message, // Important for history reconstruction
        email: {
          // Placeholder metadata; a real detector would parse these from the text
          from: { name: 'Detected Sender', address: 'sender@example.com' },
          subject: 'Extracted Subject',
          date: new Date().toISOString(),
          body: body
        }
      };
    }

    return { found: false, confidence: 'low' };
  }
}
```
### Registering the Detector
Pass your detector instance in the options.customDetectors array:
```typescript
const result = await extractDeepestHybrid(emailContent, {
  customDetectors: [new MyCustomDetector()]
});
console.log(result.diagnostics.method); // "method:my-custom-detector"
```
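
The priority scale described in the detector comments suggests a dispatch rule along the lines of "try detectors in ascending priority and stop at the first hit". The sketch below illustrates that reading with local stand-in types. `DetectorLike`, `runDetectors` and both toy detectors are hypothetical, and the library's real dispatch logic may differ.

```typescript
// Local stand-ins for the library's detector types (assumed, trimmed shapes).
interface DetectionResultLike { found: boolean; detector?: string }
interface DetectorLike {
  name: string;
  priority: number;
  detect(text: string): DetectionResultLike;
}

// Assumed rule: lower priority number runs first; the first match wins.
function runDetectors(text: string, detectors: DetectorLike[]): DetectionResultLike {
  const ordered = [...detectors].sort((a, b) => a.priority - b.priority);
  for (const detector of ordered) {
    const result = detector.detect(text);
    if (result.found) {
      return { ...result, detector: result.detector ?? detector.name };
    }
  }
  return { found: false };
}

// Two toy detectors: a custom marker at priority -100 and a generic one at 100.
const markerDetector: DetectorLike = {
  name: 'my-custom-detector',
  priority: -100,
  detect: (text) => ({ found: text.includes('--- START FORWARD ---') }),
};
const genericDetector: DetectorLike = {
  name: 'crisp-like',
  priority: 100,
  detect: (text) => ({ found: text.includes('Forwarded message') }),
};

const hit = runDetectors(
  'FYI\n--- START FORWARD ---\n---------- Forwarded message ---------',
  [genericDetector, markerDetector],
);
console.log(hit.detector); // my-custom-detector
```

Because both toy detectors match that input, the result shows the effect of the ordering: the custom detector at -100 wins even though it was registered second.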
---
### Non-Email Input
If you pass a string that isn't an email (e.g., a simple welcome message), the library returns the text but sets parsedOk to false.
```typescript
const result = await extractDeepestHybrid("Welcome to our platform!");
console.log(result.from);                 // null
console.log(result.full_body);            // "Welcome to our platform!"
console.log(result.diagnostics.parsedOk); // false
console.log(result.text);                 // "Welcome to our platform!"
```
### Unparseable Dates
If a date cannot be normalized to ISO format, date_iso will be null and a warning will be added. You can still access the original string via date_raw.
```typescript
const result = await extractDeepestHybrid(emailWithBadDate);
if (!result.date_iso) {
  console.warn(result.diagnostics.warnings[0]); // "Could not normalize date: ..."
  console.log("Raw date was:", result.date_raw);
}
```
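
One way to act on a missing date_iso is a best-effort fallback on the caller's side. The sketch below is an illustration only: `DatedResult` mirrors the two documented date fields, `bestEffortDate` is a hypothetical helper, and the built-in `Date` parser it falls back to is far less tolerant of localized formats than the library's own normalization.

```typescript
// Local mirror of the two documented date fields.
interface DatedResult {
  date_iso: string | null;
  date_raw: string | null;
}

// Hypothetical helper: prefer the normalized date, otherwise try the built-in
// Date parser on the raw header value. Returns null when nothing is usable.
function bestEffortDate(result: DatedResult): Date | null {
  const candidate = result.date_iso ?? result.date_raw;
  if (!candidate) return null;
  const parsed = new Date(candidate);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

const normalized = bestEffortDate({ date_iso: '2025-02-10T10:39:00.000Z', date_raw: null });
console.log(normalized?.toISOString()); // 2025-02-10T10:39:00.000Z

const unusable = bestEffortDate({ date_iso: null, date_raw: 'not a date' });
console.log(unusable); // null
```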
### Invalid Input Type
Passing an unsupported input (for example, null) causes the library to throw an Error.
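```typescript
try {
  await extractDeepestHybrid(null as any);
} catch (e) {
  // 'e' is typed as unknown under strict settings, so narrow it before reading .message
  console.error((e as Error).message); // "Input must be a string"
}
```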
## The Expert Cleaner Utility
All built-in detectors use the Cleaner utility to ensure consistent text normalization across recursion levels.
### Features
- Normalization: Unifies line breaks (\r\n -> \n), removes BOM, handles .
- Memoization: Cache layer to prevent re-processing the same text multiple times.
- Quote Stripping: Expertly removes > prefixes while preserving body structure.
- Boundary Detection: Uses the "Double Newline" rule found in professional parsers.
```typescript
import { Cleaner } from 'email-origin-chain/utils/cleaner';

// rawText, lines and lastHeaderIndex are supplied by the calling detector.
const normalized = Cleaner.normalize(rawText);
const bodyOnly = Cleaner.extractBody(lines, lastHeaderIndex);
const quoteFree = Cleaner.stripQuotes(bodyOnly);
```
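
For readers who want a feel for what these steps involve, here is a standalone sketch of line-break normalization, quote stripping and memoization. It is not the Cleaner implementation: `normalizeText`, `stripQuotePrefixes` and `normalizeCached` are hypothetical functions, and handling a lone `\r` is an addition beyond the `\r\n` case listed above.

```typescript
// Strip a leading BOM and unify CRLF (and, as an extra, lone CR) line breaks to LF.
function normalizeText(raw: string): string {
  return raw.replace(/^\uFEFF/, '').replace(/\r\n?/g, '\n');
}

// Remove leading '>' quote markers of any nesting depth from each line.
function stripQuotePrefixes(text: string): string {
  return text
    .split('\n')
    .map((line) => line.replace(/^(?:>\s?)+/, ''))
    .join('\n');
}

// Minimal memoization layer: the same input string is normalized only once.
const normalizeCache = new Map<string, string>();
function normalizeCached(raw: string): string {
  const cached = normalizeCache.get(raw);
  if (cached !== undefined) return cached;
  const value = normalizeText(raw);
  normalizeCache.set(raw, value);
  return value;
}

const cleaned = stripQuotePrefixes(normalizeCached('\uFEFF> Hello\r\n>> nested reply\r\nplain line'));
console.log(JSON.stringify(cleaned)); // "Hello\nnested reply\nplain line"
```

A string-keyed `Map` is the simplest cache that works here. For very large inputs, a bounded cache would be a safer choice.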
## Strategy
1. MIME Layer: Recursively descends through message/rfc822 attachments using mailparser.
2. Inline Layer: Iteratively scans the body for forwarded blocks using email-forward-parser patterns (supports multi-language).
3. Date Normalization: Uses any-date-parser and luxon for resilient international date parsing.
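
The iterative scan of step 2 can be pictured with a deliberately simplified sketch. It recognizes only one literal separator, whereas the library relies on email-forward-parser patterns and its own detectors. `peelForwards`, `Level` and the `maxLevels` cap are hypothetical names used for illustration.

```typescript
// The only separator this sketch understands (the real detectors cover many formats).
const MARKER = '---------- Forwarded message ---------';

interface Level { depth: number; text: string }

// Iteratively peels one forwarded block per pass instead of recursing,
// stopping when no separator is left or the level cap is reached.
function peelForwards(body: string, maxLevels = 5): Level[] {
  const levels: Level[] = [];
  let rest = body;
  let depth = 0;
  while (depth < maxLevels) {
    const idx = rest.indexOf(MARKER);
    if (idx === -1) break;
    // Text before the separator belongs to the current (more recent) level.
    levels.push({ depth, text: rest.slice(0, idx).trim() });
    rest = rest.slice(idx + MARKER.length);
    depth += 1;
  }
  // Whatever remains is the deepest message reached.
  levels.push({ depth, text: rest.trim() });
  return levels;
}

const levels = peelForwards(
  `Check this thread below!\n${MARKER}\n\n${MARKER}\nThe very first message content.`,
);
console.log(levels.length);  // 3
console.log(levels[2]?.text); // The very first message content.
```

The empty middle level in this example corresponds to what the flags section calls a silent forward: a message forwarded without any added text.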