Compress and convert DOM snapshots to Markdown using TextRank algorithm.
npm install @rajnandan1/ellipsisA TypeScript library for DOM to Snapshot conversion - compress HTML documents for efficient processing by LLMs.
Based on the Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents.
``bash`
npm install @rajnandan1/ellipsis
`typescript
import { ellipsis, adaptiveEllipsis } from "@rajnandan1/ellipsis";
// Method 1: Manual control with k, l, m parameters
const snapshot = await ellipsis(document.body, 0.1, 0.3, 0.5);
// Method 2: Automatic compression to fit token budget
const adaptive = await adaptiveEllipsis(document.body, 4096);
`
The three parameters $k$, $l$, $m$ (each ∈ [0, 1]) control how much to downsample three different aspects of the DOM:
Controls: How many levels of nested container elements (like
, , ) to merge together.How it works: If a DOM has height 10, and $k = 0.5$, it merges containers to reduce the tree to roughly 5 levels deep.
| Value | Effect |
| -------------- | ---------------------------- |
| Low (0.1-0.3) | Preserve more structure |
| High (0.7-1.0) | Flatten more aggressively |
|
Infinity | Remove root wrapper entirely |Significance: Hierarchy is the most important feature. Low $k$ values (0.1-0.3) that preserve structure achieve the best performance because nesting conveys semantic relationships—which buttons belong to which sections, what content is grouped together, etc.
$3
Controls: What fraction of sentences to remove from text nodes.
How it works: Uses the TextRank algorithm to rank sentences by relevance, then eliminates the lowest-ranking fraction.
| Value | Effect |
| ----- | ----------------------------------- |
| 0 | Keep all text |
| 0.5 | Remove 50% least relevant sentences |
| 1.0 | Maximum text compression |
Significance: Text content matters for understanding what actions to take, but moderate text reduction is acceptable. The algorithm keeps the most informative sentences while reducing token count.
$3
Controls: Threshold for filtering HTML attributes based on their semantic importance.
How it works: Attributes are rated by a ground truth scoring system (e.g.,
href = 0.9, disabled = 0.5, style = 0.1). Only attributes scoring above $m$ are kept.| Value | Effect |
| -------------- | ------------------------------ |
| Low (0.1-0.3) | Keep most attributes |
| High (0.7-1.0) | Keep only essential attributes |
Significance: Attributes provide important metadata (like
type="button", disabled, required) that help agents understand element functionality, but many attributes are irrelevant for task completion.Recommended Configurations
Based on research findings, these configurations achieve optimal performance:
| Configuration | k | l | m | Use Case |
| ---------------------- | --- | --- | --- | -------------------------------------------- |
| Preserve Structure | 0.1 | 0 | 0 | Best for complex UIs, forms |
| Balanced | 0.3 | 0.3 | 0.5 | Good general-purpose compression |
| Aggressive | 0.6 | 0.9 | 0.3 | Maximum compression with hierarchy preserved |
| Linearized | ∞ | 0 | 1.0 | Flat output, minimal attributes |
Key insight: Hierarchy ($k$) has the strongest effect on agent performance, while text and attributes can be more aggressively downsampled without as much performance loss.
Methods
$3
Manual compression with explicit control over parameters.
`typescript
import { ellipsis } from "@rajnandan1/ellipsis";const snapshot = await ellipsis(
document.body, // DOM element to compress
0.1, // k: hierarchy (low = preserve structure)
0.3, // l: text compression
0.5, // m: attribute threshold
{
debug: true, // Format output for readability
assignUniqueIDs: true, // Add data-uid to elements
}
);
console.log(snapshot.serializedHtml);
console.log(snapshot.meta);
// { originalSize: 15000, snapshotSize: 3000, sizeRatio: 0.2, estimatedTokens: 750 }
`$3
Automatic compression that finds optimal parameters to fit within a token budget.
`typescript
import { adaptiveEllipsis } from "@rajnandan1/ellipsis";const snapshot = await adaptiveEllipsis(
document.body,
4096, // Target max tokens
5, // Max iterations to find parameters
{ debug: true }
);
console.log(snapshot.serializedHtml);
console.log(snapshot.parameters);
// { k: 0.142, l: 0.333, m: 0.333, adaptiveIterations: 2 }
`How it works: Uses a Halton sequence to explore the parameter space, progressively increasing compression until the output fits within the token budget.
API Reference
$3
`typescript
type EllipsisOptions = {
assignUniqueIDs?: boolean; // Add data-uid attributes to elements
debug?: boolean; // Pretty-print output HTML
keepUnknownElements?: boolean; // Keep elements not in ground truth
preserveAttribute?: string; // Attribute to mark preserved elements (default: "data-preserve")
skipMarkdownTranslation?: boolean; // Skip converting content to markdown
textRankOptions?: {
damping?: number; // TextRank damping factor (default: 0.75)
maxIterations?: number; // TextRank iterations (default: 20)
maxSentences?: number; // Max sentences to process
};
};
`$3
`typescript
type Snapshot = {
serializedHtml: string;
meta: {
originalSize: number; // Original HTML character count
snapshotSize: number; // Compressed HTML character count
sizeRatio: number; // snapshotSize / originalSize
estimatedTokens: number; // Approximate LLM tokens (~chars/4)
};
};// adaptiveEllipsis additionally returns:
type AdaptiveSnapshot = Snapshot & {
parameters: {
k: number;
l: number;
m: number;
adaptiveIterations: number;
};
};
`$3
`typescript
import type {
DOM,
EllipsisOptions,
Snapshot,
TextRankOptions,
} from "@rajnandan1/ellipsis";
`Preserving Elements
To prevent specific elements from being compressed, add the preserve attribute:
`html
$99.99
Important text
`Custom preserve attribute:
`typescript
const snapshot = await ellipsis(dom, 0.5, 0.5, 0.5, {
preserveAttribute: "data-keep",
});
``html
This stays intact
`Playground
Test the library interactively:
`bash
npm run playground
`Opens a browser-based tool to experiment with different parameters and see compression results in real-time.
Browser Usage (IIFE)
`html
``- Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents
MIT