Fast, lightweight Open Graph, Twitter Card, and structured data extractor for Node.js with caching and validation
npm install @devmehq/open-graph-extractor


Fast, lightweight, and comprehensive Open Graph extractor for Node.js with advanced features
Extract Open Graph tags, Twitter Cards, structured data, and 60+ meta tag types with built-in caching, validation, and bulk processing. Optimized for performance and security.
- Lightning Fast: Built-in caching with tiny-lru and optimized parsing
- Production Ready: Comprehensive error handling, validation, and security features
- Most Complete: Extracts Open Graph, Twitter Cards, JSON-LD, Schema.org, and 60+ meta tags
- Smart Analytics: Built-in validation, social scoring, and performance metrics
- Security First: HTML sanitization, URL validation, and PII protection (Node.js only)
- Developer Friendly: Full TypeScript support, modern async/await API
```bash
# Using yarn (recommended)
yarn add @devmehq/open-graph-extractor
```
## Quick Start

### Basic Usage

```typescript
import axios from 'axios';
import { extractOpenGraph } from '@devmehq/open-graph-extractor';

// Fetch HTML and extract Open Graph data
const { data: html } = await axios.get('https://example.com');
const ogData = extractOpenGraph(html);

console.log(ogData);
// {
//   ogTitle: 'Example Title',
//   ogDescription: 'Example Description',
//   ogImage: 'https://example.com/image.jpg',
//   twitterCard: 'summary_large_image',
//   favicon: 'https://example.com/favicon.ico'
//   // ... 60+ more fields
// }
```
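The field names in the output above (`ogTitle`, `twitterCard`) look like camel-cased versions of the underlying meta properties (`og:title`, `twitter:card`). The sketch below illustrates that naming pattern. `toFieldName` is a hypothetical helper, not a library export, and the library's real mapping may differ for some tags.

```typescript
// Hypothetical helper (not part of the library): convert a meta property
// such as "og:title" into a camelCase field name such as "ogTitle".
function toFieldName(property: string): string {
  return property
    .split(/[:_-]/)
    .filter((part) => part.length > 0)
    .map((part, index) =>
      index === 0 ? part : part.charAt(0).toUpperCase() + part.slice(1)
    )
    .join('');
}

console.log(toFieldName('og:title'));       // ogTitle
console.log(toFieldName('twitter:card'));   // twitterCard
console.log(toFieldName('article:author')); // articleAuthor
```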
### Advanced Usage

```typescript
import { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';

// Extract with validation, caching, and structured data
const result = await extractOpenGraphAsync(html, {
  extractStructuredData: true,
  validateData: true,
  generateScore: true,
  cache: {
    enabled: true,
    ttl: 3600, // 1 hour
    storage: 'memory'
  },
  security: {
    sanitizeHtml: true,
    validateUrls: true
  }
});

console.log(result);
// {
//   data: { /* Complete Open Graph data */ },
//   structuredData: { /* JSON-LD, Schema.org, etc. */ },
//   confidence: 95,
//   errors: [],
//   warnings: [],
//   metrics: { /* Performance data */ }
// }
```

## Advanced Features
### Structured Data Extraction

```typescript
const result = await extractOpenGraphAsync(html, {
  extractStructuredData: true
});

console.log(result.structuredData);
// {
//   jsonLD: [...],      // All JSON-LD scripts
//   schemaOrg: {...},   // Schema.org microdata
//   dublinCore: {...},  // Dublin Core metadata
//   microdata: {...},   // Microdata
//   rdfa: {...}         // RDFa data
// }
```
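To make the `jsonLD` field more concrete, here is a rough sketch of what collecting JSON-LD scripts from a page involves. It is not the library's implementation: `extractJsonLd` is a hypothetical function, and it uses a regular expression for brevity where a real HTML parser would be more robust.

```typescript
// Illustrative sketch only: collect JSON-LD payloads from an HTML string.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const items: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      items.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON instead of failing the whole page.
    }
  }
  return items;
}

const sampleHtml = `
  <script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>
  <script type="application/ld+json">{ not valid json }</script>
`;
console.log(extractJsonLd(sampleHtml).length); // 1
```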
### Bulk Processing

```typescript
import { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';

const urls = ['url1', 'url2', 'url3' /* , ... */];
const results = await extractOpenGraphBulk({
  urls,
  concurrency: 5,
  rateLimit: {
    requests: 100,
    window: 60000 // 1 minute
  },
  onProgress: (completed, total, url) => {
    console.log(`Processing ${completed}/${total}: ${url}`);
  }
});
```
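The `concurrency` option caps how many URLs are processed at once. The sketch below shows one common way to get that behavior, a fixed pool of runners pulling from a shared index. `mapWithConcurrency` is a hypothetical helper written for illustration and makes no claim about how the library schedules work internally.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` tasks in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

mapWithConcurrency(['url1', 'url2', 'url3'], 2, async (url) => url.toUpperCase())
  .then((out) => console.log(out)); // [ 'URL1', 'URL2', 'URL3' ]
```

Results keep the input order regardless of completion order. In this sketch a single rejected task rejects the whole call; an option such as `continueOnError` would need per-task error handling on top.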
### Validation and Scoring

```typescript
import { validateOpenGraph, generateSocialScore } from '@devmehq/open-graph-extractor';

// Validate Open Graph data
const validation = validateOpenGraph(ogData);
console.log(validation);
// {
//   valid: false,
//   errors: [...],
//   warnings: [...],
//   score: 75,
//   recommendations: [...]
// }

// Get social media score
const score = generateSocialScore(ogData);
console.log(score);
// {
//   overall: 82,
//   openGraph: { score: 90, ... },
//   twitter: { score: 75, ... },
//   recommendations: [...]
// }
```
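As an example of the kind of rule a validator can apply, the Open Graph protocol marks four properties as required: `og:title`, `og:type`, `og:image`, and `og:url`. The sketch below checks for them using the field names shown in this README. `missingRequiredFields` is a hypothetical helper; the rules `validateOpenGraph` actually applies are not documented here.

```typescript
// Sketch only: the four properties the Open Graph protocol marks as required,
// expressed with the field names used in this README.
const REQUIRED_FIELDS = ['ogTitle', 'ogType', 'ogImage', 'ogUrl'] as const;

function missingRequiredFields(data: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = data[field];
    return value === undefined || value === null || value === '';
  });
}

console.log(missingRequiredFields({ ogTitle: 'Example Title', ogType: 'website' }));
// [ 'ogImage', 'ogUrl' ]
```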
### Security Features

```typescript
const result = await extractOpenGraphAsync(html, {
  security: {
    sanitizeHtml: true,   // XSS protection using Cheerio
    detectPII: true,      // PII detection
    maskPII: true,        // Mask sensitive data
    validateUrls: true,   // URL validation
    allowedDomains: ['example.com'],
    blockedDomains: ['malicious.com']
  }
});
```
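The `allowedDomains` and `blockedDomains` options imply a per-URL domain check. A minimal sketch of such a check follows. It rests on two assumptions that this README does not confirm: a domain entry also matches its subdomains, and an empty allow list permits anything that is not blocked. `isUrlAllowed` and `domainMatches` are hypothetical names.

```typescript
// Hypothetical helpers, not the library's implementation.
function domainMatches(hostname: string, domain: string): boolean {
  const d = domain.toLowerCase();
  // Assumption: "example.com" also covers "cdn.example.com".
  return hostname === d || hostname.endsWith('.' + d);
}

function isUrlAllowed(
  rawUrl: string,
  allowedDomains: string[] = [],
  blockedDomains: string[] = []
): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // Not a parseable absolute URL.
  }
  if (blockedDomains.some((domain) => domainMatches(hostname, domain))) {
    return false;
  }
  // Assumption: an empty allow list means "allow everything not blocked".
  return (
    allowedDomains.length === 0 ||
    allowedDomains.some((domain) => domainMatches(hostname, domain))
  );
}

console.log(isUrlAllowed('https://example.com/page', ['example.com'], ['malicious.com'])); // true
console.log(isUrlAllowed('https://cdn.malicious.com/x.js', [], ['malicious.com']));        // false
```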
### Caching

```typescript
// With built-in memory cache (tiny-lru)
const result = await extractOpenGraphAsync(html, {
  cache: {
    enabled: true,
    ttl: 3600, // 1 hour
    storage: 'memory',
    maxSize: 1000
  }
});

// With custom cache (Redis example)
import Redis from 'ioredis';

const redis = new Redis();
const redisResult = await extractOpenGraphAsync(html, {
  cache: {
    enabled: true,
    ttl: 3600,
    storage: 'custom',
    customStorage: {
      async get(key) {
        const value = await redis.get(key);
        return value ? JSON.parse(value) : null;
      },
      async set(key, value, ttl) {
        await redis.setex(key, ttl, JSON.stringify(value));
      },
      async delete(key) {
        await redis.del(key);
      },
      async clear() {
        // Note: flushdb clears the whole Redis database, not only cache keys
        await redis.flushdb();
      },
      async has(key) {
        return (await redis.exists(key)) === 1;
      }
    }
  }
});
```
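The Redis example above suggests that a custom storage needs five async methods: `get`, `set`, `delete`, `clear`, and `has`. As a sketch of the same shape without an external service, here is a `Map`-backed storage with lazy expiry. The `CacheStorageLike` interface is inferred from that example rather than copied from the library's typings, and the TTL is treated as seconds to match the `// 1 hour` comment on `ttl: 3600`.

```typescript
// Assumed interface shape, inferred from the Redis example above.
interface CacheStorageLike<V> {
  get(key: string): Promise<V | null>;
  set(key: string, value: V, ttl: number): Promise<void>;
  delete(key: string): Promise<void>;
  clear(): Promise<void>;
  has(key: string): Promise<boolean>;
}

class MapStorage<V> implements CacheStorageLike<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  async get(key: string): Promise<V | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // Lazily evict expired entries on read.
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: V, ttl: number): Promise<void> {
    // ttl is assumed to be in seconds.
    this.entries.set(key, { value, expiresAt: Date.now() + ttl * 1000 });
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }

  async clear(): Promise<void> {
    this.entries.clear();
  }

  async has(key: string): Promise<boolean> {
    return (await this.get(key)) !== null;
  }
}
```

An instance of `MapStorage` could then be passed as `customStorage`, assuming the library accepts any object with these five methods.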
### Image Extraction

```typescript
const result = await extractOpenGraphAsync(html);

// Automatically detects and prioritizes best images
console.log(result.data.ogImage);
// {
//   url: 'https://example.com/image.jpg',
//   type: 'jpg',
//   width: '1200',
//   height: '630',
//   alt: 'Description'
// }

// For multiple images, set allMedia: true
const allMediaResult = extractOpenGraph(html, { allMedia: true });
console.log(allMediaResult.ogImage);
// [
//   { url: '...', width: '1200', height: '630', type: 'jpg' },
//   { url: '...', width: '800', height: '600', type: 'png' }
// ]
```

## Complete API Reference
### Core Functions

#### `extractOpenGraph(html, options?)`

Synchronous extraction: fast and lightweight for basic use cases.

```typescript
import { extractOpenGraph } from '@devmehq/open-graph-extractor';

const data = extractOpenGraph(html, {
  customMetaTags: [
    { multiple: false, property: 'article:author', fieldName: 'author' }
  ],
  allMedia: true,              // Extract all images/videos
  ogImageFallback: true,       // Fallback to page images
  onlyGetOpenGraphInfo: false  // Include fallback content
});
```
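Each `customMetaTags` entry names a meta `property`, the `fieldName` to store it under, and whether `multiple` values are kept. The sketch below illustrates that idea on already-extracted `[property, content]` pairs. `applyCustomMetaTags` is a hypothetical helper, the sample values are made up, and keeping the first value when `multiple` is `false` is an assumption rather than documented behavior.

```typescript
interface CustomMetaTag {
  multiple: boolean;
  property: string;
  fieldName: string;
}

// Hypothetical helper: `pairs` stands in for [property, content] values
// already read from <meta> tags.
function applyCustomMetaTags(
  pairs: Array<[string, string]>,
  tags: CustomMetaTag[]
): Record<string, string | string[]> {
  const result: Record<string, string | string[]> = {};
  for (const tag of tags) {
    const values = pairs
      .filter(([property]) => property === tag.property)
      .map(([, content]) => content);
    if (values.length === 0) continue;
    // multiple: true keeps every value; multiple: false keeps the first (assumed).
    result[tag.fieldName] = tag.multiple ? values : values[0];
  }
  return result;
}

const fields = applyCustomMetaTags(
  [
    ['article:author', 'Jane Doe'],
    ['article:tag', 'news'],
    ['article:tag', 'tech'],
  ],
  [
    { multiple: false, property: 'article:author', fieldName: 'author' },
    { multiple: true, property: 'article:tag', fieldName: 'tags' },
  ]
);
console.log(fields); // { author: 'Jane Doe', tags: [ 'news', 'tech' ] }
```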
#### `extractOpenGraphAsync(html, options?)`

Asynchronous extraction: full feature set with advanced capabilities.

```typescript
import { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';

const result = await extractOpenGraphAsync(html, {
  // Core options
  extractStructuredData: true,  // JSON-LD, Schema.org, Microdata
  validateData: true,           // Data validation
  generateScore: true,          // SEO/social scoring
  extractArticleContent: true,  // Article text extraction
  detectLanguage: true,         // Language detection
  normalizeUrls: true,          // URL normalization

  // Advanced features
  cache: { enabled: true, ttl: 3600 },
  security: { sanitizeHtml: true, validateUrls: true }
});
```
### Options

#### IExtractOpenGraphOptions (Sync)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| customMetaTags | Array | [] | Custom meta tags to extract |
| allMedia | boolean | false | Extract all images/videos instead of just the first |
| onlyGetOpenGraphInfo | boolean | false | Skip fallback content extraction |
| ogImageFallback | boolean | false | Enable image fallback from page content |

#### IExtractOpenGraphOptions (Async) - Extends Sync Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| extractStructuredData | boolean | false | Extract JSON-LD, Schema.org, Microdata |
| validateData | boolean | false | Validate extracted Open Graph data |
| generateScore | boolean | false | Generate SEO/social media score (0-100) |
| extractArticleContent | boolean | false | Extract main article text content |
| detectLanguage | boolean | false | Detect content language and text direction |
| normalizeUrls | boolean | false | Normalize and clean all URLs |
| cache | ICacheOptions | undefined | Caching configuration |
| security | ISecurityOptions | undefined | Security and validation settings |

#### ICacheOptions

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | false | Enable caching |
| ttl | number | 3600 | Time-to-live in seconds |
| storage | string | 'memory' | Storage type: 'memory', 'redis', 'custom' |
| maxSize | number | 1000 | Maximum cache entries (memory only) |
| keyGenerator | Function | - | Custom cache key generator |
| customStorage | ICacheStorage | - | Custom storage implementation |

#### ISecurityOptions

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| sanitizeHtml | boolean | false | Sanitize HTML content (XSS protection) |
| detectPII | boolean | false | Detect personally identifiable information |
| maskPII | boolean | false | Mask detected PII in results |
| validateUrls | boolean | false | Validate and filter URLs |
| maxRedirects | number | 5 | Maximum URL redirects to follow |
| timeout | number | 10000 | Request timeout in milliseconds |
| allowedDomains | string[] | [] | Allowed domains whitelist |
| blockedDomains | string[] | [] | Blocked domains blacklist |

### Return Types

#### IOGResult (Sync)

Basic extraction result with 60+ fields:

```typescript
{
  ogTitle?: string;
  ogDescription?: string;
  ogImage?: string | string[] | IOgImage | IOgImage[];
  ogUrl?: string;
  ogType?: OGType;
  twitterCard?: TwitterCardType;
  favicon?: string;
  // ... 50+ more fields including:
  // Twitter Cards, App Links, Article metadata,
  // Product info, Music data, Dublin Core, etc.
}
```
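Because `ogImage` can be a string, an image object, or an array of either, calling code usually has to normalize it before use. One possible approach is sketched below. `OgImageLike` is a local stand-in modeled on the image objects shown earlier in this README, and `imageUrls` is a hypothetical helper, not a library export.

```typescript
// Local stand-in for the image object shape shown earlier in this README.
interface OgImageLike {
  url: string;
  type?: string;
  width?: string;
  height?: string;
  alt?: string;
}

type OgImageField = string | string[] | OgImageLike | OgImageLike[] | undefined;

// Hypothetical helper: flatten any ogImage variant into a list of URLs.
function imageUrls(field: OgImageField): string[] {
  if (field === undefined) return [];
  const items: Array<string | OgImageLike> = Array.isArray(field) ? field : [field];
  return items.map((item) => (typeof item === 'string' ? item : item.url));
}

console.log(imageUrls('https://example.com/image.jpg'));
// [ 'https://example.com/image.jpg' ]
console.log(imageUrls([{ url: 'https://example.com/a.jpg', width: '1200' }]));
// [ 'https://example.com/a.jpg' ]
```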
#### IExtractionResult (Async)

Enhanced result with validation and metrics:

```typescript
{
  data: IOGResult;                  // Extracted Open Graph data
  structuredData: {                 // Structured data extraction
    jsonLD: any[];
    schemaOrg: any;
    microdata: any;
    rdfa: any;
    dublinCore: any;
  };
  errors: IError[];                 // Validation errors
  warnings: IWarning[];             // Validation warnings
  confidence: number;               // Confidence score (0-100)
  confidenceLevel: 'high' | 'medium' | 'low';
  fallbacksUsed: string[];          // Which fallbacks were used
  metrics: IMetrics;                // Performance metrics
  validation?: IValidationResult;   // Validation details (if enabled)
  socialScore?: ISocialScore;       // Social media scoring (if enabled)
}
```
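The `errors` and `confidence` fields make it possible to gate downstream use on result quality. The sketch below shows one such policy. `ExtractionResultLike` mirrors only the fields it needs, `isTrustworthy` is a hypothetical helper, and the default threshold of 70 is an arbitrary value chosen for illustration.

```typescript
// Minimal local mirror of the fields used below; see IExtractionResult for the full shape.
interface ExtractionResultLike {
  errors: unknown[];
  warnings: unknown[];
  confidence: number; // 0-100
}

// Hypothetical policy helper: accept a result only if it has no errors
// and its confidence meets the threshold (70 is an arbitrary default).
function isTrustworthy(result: ExtractionResultLike, minConfidence = 70): boolean {
  return result.errors.length === 0 && result.confidence >= minConfidence;
}

console.log(isTrustworthy({ errors: [], warnings: [], confidence: 95 })); // true
console.log(isTrustworthy({ errors: [], warnings: [], confidence: 40 })); // false
```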
### Utility Functions

#### `validateOpenGraph(data)`

Validates Open Graph data against specifications.

```typescript
import { validateOpenGraph } from '@devmehq/open-graph-extractor';

const validation = validateOpenGraph(ogData);
console.log(validation);
// {
//   valid: boolean,
//   errors: IError[],
//   warnings: IWarning[],
//   score: number,
//   recommendations: string[]
// }
```
#### `generateSocialScore(data)`

Generates social media optimization score (0-100).

```typescript
import { generateSocialScore } from '@devmehq/open-graph-extractor';

const score = generateSocialScore(ogData);
console.log(score);
// {
//   overall: number,
//   openGraph: { score, present, missing, issues },
//   twitter: { score, present, missing, issues },
//   schema: { score, present, missing, issues },
//   seo: { score, present, missing, issues },
//   recommendations: string[]
// }
```
#### `extractOpenGraphBulk(options)`

Process multiple URLs concurrently with rate limiting.

```typescript
import { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';

const results = await extractOpenGraphBulk({
  urls: ['url1', 'url2', 'url3'],
  concurrency: 5,          // Process 5 URLs simultaneously
  rateLimit: {             // Rate limiting
    requests: 100,         // Max 100 requests
    window: 60000          // Per 60 seconds
  },
  continueOnError: true,   // Don't stop on individual failures
  onProgress: (completed, total, url) => {
    console.log(`Progress: ${completed}/${total} - ${url}`);
  },
  onError: (url, error) => {
    console.error(`Failed to process ${url}:`, error);
  }
});

console.log(results.summary);
// {
//   total: number,
//   successful: number,
//   failed: number,
//   totalDuration: number,
//   averageDuration: number
// }
```

## Custom Meta Tags
```typescript
// Extract custom meta tags
const result = extractOpenGraph(html, {
  customMetaTags: [
    {
      multiple: false,
      property: 'article:author',
      fieldName: 'articleAuthor'
    },
    {
      multiple: true,
      property: 'article:tag',
      fieldName: 'articleTags'
    }
  ]
});

console.log(result.articleAuthor); // Custom field
console.log(result.articleTags);   // Array of tags
```

## Complete Feature Guide

#### Meta Tag Extraction (60+ Types)
- Open Graph: Complete og:* tag support with type validation
- Twitter Cards: All twitter:* tags including player and app cards
- Dublin Core: dc:* metadata extraction
- App Links: al:* tags for mobile app deep linking
- Article Metadata: Publishing dates, authors, sections, tags
- Product Info: Prices, availability, condition, retailer data
- Music Metadata: Albums, artists, songs, duration
- Place/Location: GPS coordinates and location data

```typescript
// Automatically extracts all supported meta types
const data = extractOpenGraph(html);
console.log(data.ogTitle, data.twitterCard, data.articleAuthor);
```

#### Intelligent Fallbacks

When meta tags are missing, the library falls back to:
- `<title>` tags for ogTitle
- Meta descriptions for ogDescription
- Page images for ogImage
- Canonical URLs for ogUrl
- Page content analysis for missing data

```typescript
// Fallbacks work automatically
const data = extractOpenGraph(html, { ogImageFallback: true });
// Will find images even if og:image is missing
```
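The fallback idea can be illustrated with the title case: prefer `og:title`, and use the `<title>` element only when it is absent. The sketch below is a simplified illustration, not the library's code. `firstMatch` and `resolveTitle` are hypothetical names, and the regular expressions assume `property` appears before `content` in the meta tag.

```typescript
// Illustrative only: return the first capture group of `pattern`, if non-empty.
function firstMatch(html: string, pattern: RegExp): string | undefined {
  const match = pattern.exec(html);
  const value = match ? match[1].trim() : '';
  return value.length > 0 ? value : undefined;
}

// Prefer og:title; fall back to the <title> element when it is missing.
function resolveTitle(html: string): string | undefined {
  return (
    firstMatch(html, /<meta[^>]+property=["']og:title["'][^>]*content=["']([^"']*)["']/i) ??
    firstMatch(html, /<title[^>]*>([^<]*)<\/title>/i)
  );
}

console.log(resolveTitle('<title>Plain Page</title>')); // Plain Page
console.log(
  resolveTitle('<meta property="og:title" content="OG Title"><title>Plain Page</title>')
); // OG Title
```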
#### Structured Data Extraction
- JSON-LD: Parses all