Analyze HTML content visibility for AI crawlers and citations by comparing static HTML with fully rendered content. The package shows the gap between what humans see on a website and what AI models (ChatGPT, Perplexity, etc.) can read when crawling pages for citations.
```bash
npm install @adobe/spacecat-shared-html-analyzer
```

```javascript
import {
  analyzeTextComparison,
  calculateStats,
  calculateBothScenarioStats,
} from '@adobe/spacecat-shared-html-analyzer';

// Compare initial HTML (what crawlers see) vs rendered HTML (what users see).
// The markup below is illustrative sample input.
const originalHtml = '<html><body><h1>Title</h1></body></html>';
const currentHtml = '<html><body><h1>Title</h1><p>Dynamic content loaded by JS</p></body></html>';

// Full text analysis (original Chrome extension logic)
const analysis = await analyzeTextComparison(originalHtml, currentHtml);
console.log(analysis.textRetention); // e.g. 0.5 (50% text retention)
console.log(analysis.wordDiff); // Detailed word differences

// Basic comparison statistics
const stats = await calculateStats(originalHtml, currentHtml);
console.log(stats.citationReadability); // e.g. 50 (50% of content visible to AI)
console.log(stats.contentIncreaseRatio); // e.g. 2.3 (2.3x more content in rendered)

// Both scenarios (with/without nav filtering)
const bothStats = await calculateBothScenarioStats(originalHtml, currentHtml);
console.log(bothStats.withNavFooterIgnored.contentGain); // e.g. "2.3x"
console.log(bothStats.withoutNavFooterIgnored.missingWords); // Number of missing words
```
This package works in both Node.js and browser environments (including Chrome extensions):
- Node.js: Uses Cheerio for robust HTML parsing
- Browser/Chrome Extensions: Uses native DOMParser with automatic fallback
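The environment split above can be pictured with a small sketch. It assumes the choice is made by checking for a native `DOMParser`; the helper name `pickParserStrategy` and the returned labels are hypothetical, and the package's actual detection logic may differ.

```javascript
// Hypothetical helper, not part of the package's documented API.
// Chooses a parser label based on what the given global scope provides.
function pickParserStrategy(scope = globalThis) {
  // Browsers and Chrome extensions expose a native DOMParser constructor.
  if (typeof scope.DOMParser === 'function') {
    return 'domparser';
  }
  // Otherwise assume a Node.js-like runtime, where Cheerio would be used.
  return 'cheerio';
}

// The result depends on the runtime this line is executed in.
console.log(pickParserStrategy());
```

Passing the scope as a parameter keeps the check easy to exercise outside a browser.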
#### analyzeTextComparison(initHtml, finHtml, ignoreNavFooter)
Comprehensive text analysis between two HTML versions (original chrome extension logic).
Parameters:
- `initHtml` (string): HTML as seen by crawlers/AI
- `finHtml` (string): HTML as seen by users (fully loaded)
- `ignoreNavFooter` (boolean, default: true): Remove nav/footer elements
Returns: Promise
#### calculateStats(originalHtml, currentHtml, ignoreNavFooter)
Get basic comparison statistics (original chrome extension logic).
Parameters:
- `originalHtml` (string): Initial HTML content
- `currentHtml` (string): Final HTML content
- `ignoreNavFooter` (boolean, default: true): Whether to ignore navigation/footer elements
Returns: Promise
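To make the statistics more concrete, here is a minimal sketch of how figures such as `citationReadability` and `contentIncreaseRatio` could be derived from word counts. The formulas and the names `countWords` and `sketchStats` are assumptions for illustration; the package may define these metrics differently.

```javascript
// Assumed definitions for illustration only; not the package's implementation.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function sketchStats(initialText, renderedText) {
  const initialWords = countWords(initialText);
  const renderedWords = countWords(renderedText);
  return {
    // Initial word count as a percentage of the rendered word count (0-100).
    citationReadability: renderedWords === 0
      ? 100
      : Math.min(100, Math.round((initialWords / renderedWords) * 100)),
    // How many times more words the rendered page has than the initial HTML.
    contentIncreaseRatio: initialWords === 0 ? null : renderedWords / initialWords,
  };
}

const sketchResult = sketchStats(
  'Page title and intro',
  'Page title and intro plus dynamic content loaded',
);
console.log(sketchResult); // { citationReadability: 50, contentIncreaseRatio: 2 }
```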
#### calculateBothScenarioStats(originalHtml, currentHtml)
Get comparison statistics for both nav/footer scenarios (original chrome extension logic).
Parameters:
- `originalHtml` (string): Initial HTML content
- `currentHtml` (string): Final HTML content
Returns: Promise
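The two-scenario result can be sketched as running the same comparison twice, once with nav/footer content removed and once without. The key names `withNavFooterIgnored`, `withoutNavFooterIgnored`, `contentGain`, and `missingWords` come from the usage example; everything else here (the regex-based filtering, the helper names, the exact formulas) is an assumption and not the package's implementation.

```javascript
// Hypothetical helpers for illustration; real parsing would use a DOM parser.
function visibleWords(html, ignoreNavFooter) {
  const source = ignoreNavFooter
    ? html.replace(/<(nav|footer)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    : html;
  return source.replace(/<[^>]+>/g, ' ').split(/\s+/).filter(Boolean);
}

function sketchBothScenarios(initialHtml, renderedHtml) {
  const scenario = (ignoreNavFooter) => {
    const before = visibleWords(initialHtml, ignoreNavFooter);
    const after = visibleWords(renderedHtml, ignoreNavFooter);
    const seen = new Set(before);
    return {
      // Assumed format: rendered/initial word ratio with one decimal place.
      contentGain: before.length === 0
        ? 'n/a'
        : `${(after.length / before.length).toFixed(1)}x`,
      // Assumed meaning: rendered words that never occur in the initial HTML.
      missingWords: after.filter((word) => !seen.has(word)).length,
    };
  };
  return {
    withNavFooterIgnored: scenario(true),
    withoutNavFooterIgnored: scenario(false),
  };
}

const initialPage = '<nav>Home</nav><p>Intro text</p>';
const renderedPage = '<nav>Home</nav><p>Intro text</p><p>Loaded by script later</p>';
const both = sketchBothScenarios(initialPage, renderedPage);
console.log(both.withNavFooterIgnored.contentGain); // 3.0x
console.log(both.withoutNavFooterIgnored.contentGain); // 2.3x
console.log(both.withoutNavFooterIgnored.missingWords); // 4
```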
#### Content Processing
- `stripTagsToText(htmlContent, ignoreNavFooter)`: Extract plain text from HTML
- `filterHtmlContent(htmlContent, ignoreNavFooter, returnText)`: Advanced HTML filtering
- `tokenize(text, mode)`: Smart text tokenization
- `extractWordCount(htmlContent, ignoreNavFooter)`: Get word counts
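As a rough picture of what text extraction and word counting involve, the following naive sketch strips tags with regular expressions. It is deliberately simplistic: the function names are hypothetical, and the package itself parses HTML with Cheerio or `DOMParser` rather than regexes.

```javascript
// Naive, regex-based sketch; hypothetical names, not the package's implementation.
function naiveStripTags(html, ignoreNavFooter = true) {
  // Drop script and style blocks, whose contents are never visible text.
  let out = html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ');
  if (ignoreNavFooter) {
    // Drop navigation and footer blocks together with their contents.
    out = out.replace(/<(nav|footer)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ');
  }
  return out
    .replace(/<[^>]+>/g, ' ') // remove the remaining tags
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();
}

function naiveWordCount(html, ignoreNavFooter = true) {
  const text = naiveStripTags(html, ignoreNavFooter);
  return text === '' ? 0 : text.split(' ').length;
}

const samplePage = '<nav>Home About</nav><main><h1>Title</h1><p>Body text</p></main><footer>Legal</footer>';
console.log(naiveStripTags(samplePage)); // Title Body text
console.log(naiveWordCount(samplePage, false)); // 6
```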
#### Diff Analysis
- `diffTokens(text1, text2, mode)`: Generate LCS-based diff
- `generateDiffReport(text1, text2, mode)`: Comprehensive diff statistics
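An LCS-based diff like the one mentioned above can be sketched as follows. This is a generic longest-common-subsequence word diff under assumed names (`lcsWordDiff`, and the `same`/`removed`/`added` labels); the output format of the package's `diffTokens` may differ.

```javascript
// Hypothetical sketch of an LCS-based word diff; not the package's implementation.
function lcsWordDiff(textA, textB) {
  const a = textA.split(/\s+/).filter(Boolean);
  const b = textB.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i -= 1) {
    for (let j = b.length - 1; j >= 0; j -= 1) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start, emitting same/removed/added operations.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ type: 'same', token: a[i] });
      i += 1;
      j += 1;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ type: 'removed', token: a[i] });
      i += 1;
    } else {
      ops.push({ type: 'added', token: b[j] });
      j += 1;
    }
  }
  // Whatever is left on either side is a pure removal or addition.
  while (i < a.length) { ops.push({ type: 'removed', token: a[i] }); i += 1; }
  while (j < b.length) { ops.push({ type: 'added', token: b[j] }); j += 1; }
  return ops;
}

const diffOps = lcsWordDiff('Title', 'Title Dynamic content');
console.log(diffOps.filter((op) => op.type === 'added').map((op) => op.token));
// [ 'Dynamic', 'content' ]
```

The table costs O(n·m) time and memory for inputs of n and m words, which is fine for page-sized text but worth keeping in mind for very large documents.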
Build the package:

```bash
npm run build
```

Generate a minified bundle for Chrome extensions:

```bash
npm run build:chrome
```

This creates `dist/html-analyzer.min.js`, which can be included directly in Chrome extension manifest files. The bundle exposes `HTMLAnalyzer` globally.

## Version Information
To check the current package version:
```javascript
// Node.js ESM requires the JSON import attribute; bundlers may accept the import without it.
import packageJson from '@adobe/spacecat-shared-html-analyzer/package.json' with { type: 'json' };

console.log('Version:', packageJson.version);
```

When using the Chrome extension bundle:

```javascript
// After loading the bundle
console.log('Version:', HTMLAnalyzer.version); // "1.0.0"
console.log('Build target:', HTMLAnalyzer.buildFor); // "chrome-extension"
```

The version follows Semantic Versioning (SemVer). See `package.json` for the official version.

## Testing
```bash
npm test
```

This project is licensed under the Apache License 2.0. See the LICENSE.txt file for details.