TagSoup is the fastest pure JS SAX/DOM XML/HTML parser and serializer.

- Extremely low memory consumption.
- Tolerant of malformed tag nesting, missing end tags, etc.
- Recognizes CDATA sections, processing instructions, and DOCTYPE declarations.
- Supports both strict XML and forgiving HTML parsing modes.
- 20 kB gzipped, including dependencies.
- Check out TagSoup dependencies: Speedy Entities and Flyweight DOM.

```sh
npm install --save-prod tag-soup
```

- API docs
- DOM parsing
- SAX parsing
- Tokenization
- Serialization
- Performance
- Limitations

# DOM parsing

TagSoup exports a preconfigured `HTMLDOMParser` that parses HTML markup into a DOM node. This parser never throws during parsing and forgives malformed markup:

```ts
import { HTMLDOMParser, toHTML } from 'tag-soup';

const fragment = HTMLDOMParser.parseFragment('<p>hello<p>cool</br>');
// ⮕ DocumentFragment

toHTML(fragment);
// ⮕ '<p>hello</p><p>cool<br></p>'
```

`HTMLDOMParser` decodes both HTML entities and numeric character references with `decodeHTML`.

`XMLDOMParser` parses XML markup into a DOM node. It throws a `ParserError` if the markup doesn't satisfy the XML spec:

```ts
import { XMLDOMParser, toXML } from 'tag-soup';

XMLDOMParser.parseFragment('<p>hello</br>');
// ❌ ParserError: Unexpected end tag.

const fragment = XMLDOMParser.parseFragment('<p>hello<br/></p>');
// ⮕ DocumentFragment

toXML(fragment);
// ⮕ '<p>hello<br/></p>'
```

`XMLDOMParser` decodes both XML entities and numeric character references with `decodeXML`.
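Strict parsing comes down to tag balance: every end tag must match the innermost open tag. The sketch below illustrates that idea in isolation. It is not TagSoup's implementation; `checkBalanced` and `describeCheck` are hypothetical helpers, and the regular expression ignores comments, CDATA sections, and attribute values that contain `>`.

```typescript
// Toy well-formedness check (hypothetical helper, not part of tag-soup).
function checkBalanced(xml: string): void {
  const openTags: string[] = [];
  // Matches <name ...>, </name>, and <name .../>.
  const tagPattern = /<(\/?)([A-Za-z][\w:-]*)[^>]*?(\/?)>/g;
  let match: RegExpExecArray | null;

  while ((match = tagPattern.exec(xml)) !== null) {
    const isEndTag = match[1] === '/';
    const tagName = match[2];
    const isSelfClosing = match[3] === '/';

    if (isEndTag) {
      // The end tag must close the innermost open tag.
      if (openTags.pop() !== tagName) {
        throw new Error('Unexpected end tag.');
      }
    } else if (!isSelfClosing) {
      openTags.push(tagName);
    }
  }
  if (openTags.length !== 0) {
    throw new Error('Missing end tag.');
  }
}

// Returns 'ok' or the error message, which keeps the usage below compact.
function describeCheck(xml: string): string {
  try {
    checkBalanced(xml);
    return 'ok';
  } catch (error) {
    return (error as Error).message;
  }
}

console.log(describeCheck('<p>hello<br/></p>')); // ok
console.log(describeCheck('<p>hello</br>')); // Unexpected end tag.
```

A forgiving HTML parser would recover from the second input instead of throwing, which is the trade-off between the two preconfigured parsers.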

TagSoup uses Flyweight DOM nodes, which provide many standard DOM manipulation features:

```ts
import { HTMLDOMParser } from 'tag-soup';

const document = HTMLDOMParser.parseDocument('<!DOCTYPE html><html>hello</html>');

document.doctype.name;
// ⮕ 'html'

document.textContent;
// ⮕ 'hello'
```

For example, you can use TreeWalker to traverse DOM nodes:

```ts
import { XMLDOMParser } from 'tag-soup';
import { TreeWalker, NodeFilter } from 'flyweight-dom';

const fragment = XMLDOMParser.parseFragment('<p>hello</p>');
const treeWalker = new TreeWalker(fragment, NodeFilter.SHOW_TEXT);

treeWalker.nextNode();
// ⮕ Text { 'hello' }
```

Create a custom DOM parser using `createDOMParser`:

```ts
import { createDOMParser } from 'tag-soup';

const myParser = createDOMParser({
  voidTags: ['br'],
});

myParser.parseFragment('<p><br></p>');
// ⮕ DocumentFragment
```


# SAX parsing

TagSoup exports a preconfigured `HTMLSAXParser` that parses HTML markup and calls handler methods when a token is read. This parser never throws during parsing and forgives malformed markup:

```ts
import { HTMLSAXParser } from 'tag-soup';

HTMLSAXParser.parseFragment('<p>hello<p>cool</br>', {
  onStartTagOpening(tagName) {
    // Called with 'p', 'p', and 'br'
  },
  onText(text) {
    // Called with 'hello' and 'cool'
  },
});
```

`XMLSAXParser` parses XML markup and calls handler methods when a token is read. It throws a `ParserError` if the markup doesn't satisfy the XML spec:

```ts
import { XMLSAXParser } from 'tag-soup';

XMLSAXParser.parseFragment('<p>hello</br>', {});
// ❌ ParserError: Unexpected end tag.

XMLSAXParser.parseFragment('<p>hello<br/></p>', {
  onEndTag(tagName) {
    // Called with 'br' and 'p'
  },
});
```

Create a custom SAX parser using `createSAXParser`:

```ts
import { createSAXParser } from 'tag-soup';

const myParser = createSAXParser({
  voidTags: ['br'],
});

myParser.parseFragment('<p><br></p>', {
  onStartTagOpening(tagName) {
    // Called with 'p' and 'br'
  },
});
```


# Tokenization

TagSoup exports a preconfigured `HTMLTokenizer` that parses HTML markup and invokes a callback when a token is read. This tokenizer never throws during tokenization and forgives malformed markup:

```ts
import { HTMLTokenizer } from 'tag-soup';

HTMLTokenizer.tokenizeFragment('<p>hello<p>cool</br>', (token, startIndex, endIndex) => {
  // Handle token
});
```

`XMLTokenizer` parses XML markup and invokes a callback when a token is read. It throws a `ParserError` if the markup doesn't satisfy the XML spec:

```ts
import { XMLTokenizer } from 'tag-soup';

XMLTokenizer.tokenizeFragment('<p>hello</br>', (token, startIndex, endIndex) => {});
// ❌ ParserError: Unexpected end tag.

XMLTokenizer.tokenizeFragment('<p>hello<br/></p>', (token, startIndex, endIndex) => {
  // Handle token
});
```

Create a custom tokenizer using `createTokenizer`:

```ts
import { createTokenizer } from 'tag-soup';

const myTokenizer = createTokenizer({
  voidTags: ['br'],
});

myTokenizer.tokenizeFragment('<p><br></p>', (token, startIndex, endIndex) => {
  // Handle token
});
```
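The tokenizer callbacks above receive `(token, startIndex, endIndex)` rather than substrings. The following self-contained sketch shows how such a callback can be consumed. It is a toy, not TagSoup's tokenizer: `toyTokenize` and the `ToyToken` names are hypothetical, and the real token kinds may differ.

```typescript
// Hypothetical token kinds for this sketch; tag-soup's real token names may differ.
type ToyToken = 'START_TAG' | 'END_TAG' | 'TEXT';

type ToyTokenCallback = (token: ToyToken, startIndex: number, endIndex: number) => void;

// Reports each tag or text run together with its [startIndex, endIndex) range.
function toyTokenize(markup: string, callback: ToyTokenCallback): void {
  const tokenPattern = /<(\/?)[A-Za-z][^>]*>|[^<]+/g;
  let match: RegExpExecArray | null;

  while ((match = tokenPattern.exec(markup)) !== null) {
    const startIndex = match.index;
    const endIndex = startIndex + match[0].length;

    if (match[0].charAt(0) !== '<') {
      callback('TEXT', startIndex, endIndex);
    } else {
      callback(match[1] === '/' ? 'END_TAG' : 'START_TAG', startIndex, endIndex);
    }
  }
}

const toyMarkup = '<p>hello</p>';
const toyTokens: string[] = [];

toyTokenize(toyMarkup, (token, startIndex, endIndex) => {
  // Only the offsets are passed in, so the token text is sliced on demand.
  toyTokens.push(token + ':' + toyMarkup.slice(startIndex, endIndex));
});

console.log(toyTokens); // [ 'START_TAG:<p>', 'TEXT:hello', 'END_TAG:</p>' ]
```

Passing offsets lets a consumer skip tokens it doesn't care about without allocating a string for each one.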


# Serialization

TagSoup exports two preconfigured serializers: `toHTML` and `toXML`.

```ts
import { HTMLDOMParser, toHTML } from 'tag-soup';

const fragment = HTMLDOMParser.parseFragment('<p>hello<p>cool</br>');
// ⮕ DocumentFragment

toHTML(fragment);
// ⮕ '<p>hello</p><p>cool<br></p>'
```

Create a custom serializer using `createSerializer`:

```ts
import { HTMLDOMParser, createSerializer } from 'tag-soup';

const mySerializer = createSerializer({
  voidTags: ['br'],
});

const fragment = HTMLDOMParser.parseFragment('<p>hello</br>');
// ⮕ DocumentFragment

mySerializer(fragment);
// ⮕ '<p>hello<br></p>'
```
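To show what a `voidTags` option changes during serialization, here is a minimal sketch over a hand-built tree. It makes several assumptions: the `ToyNode` shapes and `createToySerializer` are hypothetical and unrelated to Flyweight DOM classes, attributes are omitted, and text is not escaped.

```typescript
// Hypothetical node shapes for this sketch; they are not Flyweight DOM classes.
interface ToyText {
  kind: 'text';
  data: string;
}

interface ToyElement {
  kind: 'element';
  tagName: string;
  children: ToyNode[];
}

type ToyNode = ToyText | ToyElement;

function createToySerializer(options: { voidTags: string[] }): (node: ToyNode) => string {
  const serialize = (node: ToyNode): string => {
    if (node.kind === 'text') {
      return node.data;
    }
    const startTag = '<' + node.tagName + '>';

    if (options.voidTags.indexOf(node.tagName) !== -1) {
      // Void tags have no content and no end tag.
      return startTag;
    }
    return startTag + node.children.map((child) => serialize(child)).join('') + '</' + node.tagName + '>';
  };
  return serialize;
}

const toyParagraph: ToyNode = {
  kind: 'element',
  tagName: 'p',
  children: [
    { kind: 'text', data: 'hello' },
    { kind: 'element', tagName: 'br', children: [] },
  ],
};

const toySerializer = createToySerializer({ voidTags: ['br'] });

console.log(toySerializer(toyParagraph)); // <p>hello<br></p>
```

With an empty `voidTags` list the same tree would serialize as `<p>hello<br></br></p>`, which is why the list of void tags has to match the markup dialect being written.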


# Performance
Execution performance is measured in operations per second (± 5%); higher is better.
Memory consumption (RAM) is measured in bytes; lower is better.

| Library | Library size | DOM parsing, ops/sec | DOM parsing, RAM | SAX parsing, ops/sec | SAX parsing, RAM |
| --- | --- | --- | --- | --- | --- |
| tag-soup@3.0.0 | 20 kB | 26 Hz | 22 MB | 58 Hz | 22 kB |
| htmlparser2@10.0.0 | 58 kB | 19 Hz | 23 MB | 31 Hz | 10 MB |
| parse5@8.0.0 | 45 kB | 7 Hz | 105 MB | 12 Hz | 10 MB |

Performance was measured by parsing a 3.8 MB HTML file.
Tests were conducted using TooFast on an Apple M1 with Node.js v23.11.1.
To reproduce the performance test suite results, clone this repo and run:

```shell
npm ci
npm run build
npm run perf
```
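The ops/sec figures above can be read as throughput. The sketch below converts them to MB/s, assuming that one operation is a single full parse of the 3.8 MB file; that interpretation, and the names `benchmarkRows` and `toThroughputMBps`, are assumptions of this example. The results inherit the ± 5% measurement error.

```typescript
// Figures copied from the table above: ops/sec when parsing a 3.8 MB HTML file.
const fileSizeMB = 3.8;

const benchmarkRows = [
  { library: 'tag-soup@3.0.0', domHz: 26, saxHz: 58 },
  { library: 'htmlparser2@10.0.0', domHz: 19, saxHz: 31 },
  { library: 'parse5@8.0.0', domHz: 7, saxHz: 12 },
];

function toThroughputMBps(hz: number): number {
  // Assumes one operation parses the whole file: throughput = file size × ops/sec,
  // rounded to one decimal place.
  return Math.round(fileSizeMB * hz * 10) / 10;
}

for (const row of benchmarkRows) {
  console.log(
    row.library,
    toThroughputMBps(row.domHz) + ' MB/s DOM',
    toThroughputMBps(row.saxHz) + ' MB/s SAX'
  );
}
// tag-soup@3.0.0 98.8 MB/s DOM 220.4 MB/s SAX
// htmlparser2@10.0.0 72.2 MB/s DOM 117.8 MB/s SAX
// parse5@8.0.0 26.6 MB/s DOM 45.6 MB/s SAX
```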

# Limitations

TagSoup doesn't resolve some quirky element structures that malformed HTML may cause.

Assume the following markup:

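```html
<p><strong>okay
<p>nope
```

With `DOMParser` this markup would be transformed to:

```html
<p><strong>okay</strong></p>
<p><strong>nope</strong></p>
```

TagSoup doesn't insert the second `strong` tag:

```html
<p><strong>okay</strong></p>
<p>nope</p>
```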
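One way to picture where the second `strong` tag goes missing is a stack of open tags in which a new `<p>` implicitly closes the open `<p>` along with everything inside it, and nothing is reopened afterwards. The sketch below implements only that rule. It is a toy under stated assumptions, not TagSoup's algorithm: `toyNormalize` is a hypothetical helper, and attributes are dropped. A browser additionally reopens formatting elements such as `strong` in the new paragraph, which this sketch deliberately omits.

```typescript
// Toy builder (not tag-soup's implementation): a new <p> implicitly closes an open <p>,
// together with anything still open inside it, and nothing is reopened afterwards.
function toyNormalize(html: string): string {
  const openTags: string[] = [];
  const tokenPattern = /<(\/?)([A-Za-z][\w-]*)[^>]*>|([^<]+)/g;
  let output = '';
  let match: RegExpExecArray | null;

  // Closes open tags from the innermost one up to and including tagName.
  const closeThrough = (tagName: string): void => {
    if (openTags.indexOf(tagName) === -1) {
      return; // No such open tag: ignore the request.
    }
    let closedTag: string | undefined;
    do {
      closedTag = openTags.pop();
      output += '</' + closedTag + '>';
    } while (closedTag !== tagName);
  };

  while ((match = tokenPattern.exec(html)) !== null) {
    if (match[3] !== undefined) {
      output += match[3]; // Text is copied as is.
    } else if (match[1] === '/') {
      closeThrough(match[2]);
    } else {
      if (match[2] === 'p') {
        closeThrough('p');
      }
      openTags.push(match[2]);
      output += '<' + match[2] + '>';
    }
  }
  // Close whatever is still open at the end of input.
  while (openTags.length !== 0) {
    output += '</' + openTags.pop() + '>';
  }
  return output;
}

console.log(toyNormalize('<p><strong>okay<p>nope'));
// <p><strong>okay</strong></p><p>nope</p>
```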

