A very fast HTML parser, generating a simplified DOM, with basic element query support.
Fast HTML Parser is a _very fast_ HTML parser that generates a simplified
DOM tree with element query support.

By design, it is intended to parse massive HTML files at the lowest possible cost, so
performance is the top priority. For this reason, some malformed HTML may not
be parsed correctly, but the most common errors are covered (e.g. HTML4-style
elements without a closing tag).

## Install

```shell
npm install --save node-html-parser
```

> Note: when using Fast HTML Parser in a TypeScript project, the minimum supported TypeScript version is `^4.1.2`.

## Performance

Benchmark results (2022-08-10); lower is better:

```shell
html-parser     :24.1595 ms/file ± 18.7667
htmljs-parser   :4.72064 ms/file ± 5.67689
html-dom-parser :2.18055 ms/file ± 2.96136
html5parser     :1.69639 ms/file ± 2.17111
cheerio         :12.2122 ms/file ± 8.10916
parse5          :6.50626 ms/file ± 4.02352
htmlparser2     :2.38179 ms/file ± 3.42389
htmlparser      :17.4820 ms/file ± 128.041
high5           :3.95188 ms/file ± 2.52313
node-html-parser:2.04288 ms/file ± 1.25203
node-html-parser (last release):2.00527 ms/file ± 1.21317
```

Tested with htmlparser-benchmark.

## Usage

```ts
import { parse } from 'node-html-parser';

const root = parse('<ul id="list"><li>Hello World</li></ul>');

console.log(root.firstChild.structure);
// ul#list
//   li
//     #text

console.log(root.querySelector('#list'));
// { tagName: 'ul',
//   rawAttrs: 'id="list"',
//   childNodes:
//    [ { tagName: 'li',
//        rawAttrs: '',
//        childNodes: [Object],
//        classNames: [] } ],
//   id: 'list',
//   classNames: [] }

console.log(root.toString());
// <ul id="list"><li>Hello World</li></ul>

root.set_content('<li>Hello World</li>');
root.toString(); // <li>Hello World</li>
```

```js
var HTMLParser = require('node-html-parser');

var root = HTMLParser.parse('<ul id="list"><li>Hello World</li></ul>');
```

## Global Methods

### parse(data[, options])

Parse the data provided and return the root of the generated DOM.

- data, data to parse
- options, parse options

```js
{
  lowerCaseTagName: false, // convert tag name to lower case (hurts performance heavily)
  comment: false, // retrieve comments (hurts performance slightly)
  voidTag: {
    tags: ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'], // optional and case insensitive, default value is ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']
    addClosingSlash: true // optional, default false. Void tag serialisation: add a final slash (<br/>)
  },
  blockTextElements: {
    script: true, // keep text content when parsing
    noscript: true, // keep text content when parsing
    style: true, // keep text content when parsing
    pre: true // keep text content when parsing
  }
}
```

### valid(data[, options])

Parse the data provided. Return `true` if the given data is valid and `false` if not.

## HTMLElement Methods

### HTMLElement#trimRight(pattern)

Trim the element from the right (within the block) after the pattern is seen in a TextNode.

### HTMLElement#removeWhitespace()

Remove whitespace in this subtree.

### HTMLElement#querySelectorAll(selector)

Query a CSS selector to find all matching nodes.

Note: the full range of CSS3 selectors is supported since v3.0.0.

### HTMLElement#querySelector(selector)

Query a CSS selector to find the first matching node.

### HTMLElement#getElementsByTagName(tagName)

Get all elements with the specified `tagName`.

Note: use `*` to match all elements.

### HTMLElement#closest(selector)

Query the closest element matching a CSS selector.

### HTMLElement#appendChild(node)

Append a child node to `childNodes`.

### HTMLElement#insertAdjacentHTML(where, html)

Parse the specified text as HTML and insert the resulting nodes into the DOM tree at the specified position (`beforebegin`, `afterbegin`, `beforeend`, or `afterend`).

### HTMLElement#setAttribute(key: string, value: string)

Set `value` to the `key` attribute.

### HTMLElement#setAttributes(attrs: Record<string, string>)

Set attributes of the element.

### HTMLElement#removeAttribute(key: string)

Remove the `key` attribute.

### HTMLElement#getAttribute(key: string)

Get the `key` attribute.

### HTMLElement#exchangeChild(oldNode: Node, newNode: Node)

Exchange the given child with a new child.

### HTMLElement#removeChild(node: Node)

Remove a child node.

### HTMLElement#toString()

Same as `outerHTML`.

### HTMLElement#set_content(content: string | Node | Node[])

Set the content. **Notice**: do not set the content of the **root** node.

### HTMLElement#remove()

Remove the current element.

### HTMLElement#replaceWith(...nodes: (string | Node)[])

Replace the current element with other node(s).

### HTMLElement#classList

#### HTMLElement#classList.add

Add a class name.

#### HTMLElement#classList.replace(old: string, new: string)

Replace a class name with another one.

#### HTMLElement#classList.remove()

Remove a class name.

#### HTMLElement#classList.toggle(className: string):void

Toggle a class: remove it if it is already included, otherwise add it.

#### HTMLElement#classList.contains(className: string): boolean

Return `true` if the class name is already in the `classList`.

#### HTMLElement#classList.values()

Get the class names.

### Node#clone()

Clone a node.

### Node#getElementById(id: string): HTMLElement

Get an element by its ID.

## HTMLElement Properties

### HTMLElement#text

Get the unescaped text value of the current node and its children, like `innerText`. (Slow the first time it is read.)

### HTMLElement#rawText

Get the escaped (as-is) text value of the current node and its children. It may contain entities such as `&amp;`. (Fast.)

### HTMLElement#tagName

Get or set the tag name of the HTMLElement. Notice: the returned value is an uppercase string.

### HTMLElement#structuredText

Get structured text.

### HTMLElement#structure

Get the DOM structure.

### HTMLElement#firstChild

Get the first child node.

### HTMLElement#lastChild

Get the last child node.

### HTMLElement#innerHTML

Get or set `innerHTML`.

### HTMLElement#outerHTML

Get `outerHTML`.

### HTMLElement#nextSibling

Return a reference to the next child node of the current element's parent.

### HTMLElement#nextElementSibling

Return a reference to the next child element of the current element's parent.

### HTMLElement#previousSibling

Return a reference to the previous child node of the current element's parent.

### HTMLElement#previousElementSibling

Return a reference to the previous child element of the current element's parent.

### HTMLElement#textContent

Get or set the `textContent` of the current element. This is more efficient than `set_content`.

### HTMLElement#attributes

Get all attributes of the current element. **Notice: do not try to change the returned value.**

### HTMLElement#rawAttributes

Get all attributes of the current element. **Notice: do not try to change the returned value.**

### HTMLElement#range

The corresponding start and end indexes in the source code (e.g. `[ 0, 40 ]`).
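The `parse` and `valid` global methods described above can be tried together in a few lines. What follows is only a minimal sketch. It assumes the package is installed and imported as `node-html-parser`, as in the Usage section, and the HTML snippets and variable names are invented for the illustration. The sketch changes only the `comment` option, because its effect on serialisation is easy to observe. It also expects `valid()` to report an unclosed `<div>` as invalid.

```ts
import { parse, valid } from 'node-html-parser';

// Illustrative input: a comment followed by a paragraph.
const html = '<div><!-- note --><p>text</p></div>';

// Default options (comment: false): the comment is not kept in the tree.
const withoutComments = parse(html).toString();

// comment: true keeps the comment node, so it is serialised back out.
const withComments = parse(html, { comment: true }).toString();

// valid() parses the input and reports whether it is well-formed.
const closedMarkup = valid('<ul><li>a</li></ul>');
const unclosedMarkup = valid('<div>');

console.log(withoutComments);
console.log(withComments);
console.log(closedMarkup, unclosedMarkup);
```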
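The query, attribute, and `classList` methods documented in the HTMLElement Methods section work well together. The snippet below is a minimal sketch that uses the same `node-html-parser` import as the Usage section. The list markup, the `data-index` attribute, and the class names are hypothetical values chosen only for the illustration.

```ts
import { parse } from 'node-html-parser';

// Hypothetical markup used only for this illustration.
const root = parse(
  '<ul id="list"><li class="item first">One</li><li class="item">Two</li></ul>',
);

// querySelectorAll returns an array of every matching element.
const labels = root.querySelectorAll('li.item').map((li) => li.text);

// querySelector returns the first match, or null when nothing matches.
const first = root.querySelector('li.first');
if (!first) throw new Error('expected an li.first element');

// setAttribute / getAttribute write and read a single attribute.
first.setAttribute('data-index', '0');
const index = first.getAttribute('data-index');

// classList edits the element's class names.
first.classList.add('active');
first.classList.toggle('first'); // 'first' is present, so toggle removes it
const isActive = first.classList.contains('active');
const stillFirst = first.classList.contains('first');

// closest looks for the nearest element matching the selector, walking upwards.
const listId = first.closest('ul')?.id;

console.log(labels, index, isActive, stillFirst, listId);
```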
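The tree-editing methods (`insertAdjacentHTML`, `remove`) can be combined with the `innerHTML` and `textContent` properties as shown next. This is a sketch under the same assumptions as the Usage section, and `#box` with its paragraphs is purely illustrative. In this sketch the `textContent` setter is expected to replace all existing children with a single text node.

```ts
import { parse } from 'node-html-parser';

// Illustrative document with one paragraph inside a container.
const root = parse('<div id="box"><p>Hello</p></div>');
const box = root.querySelector('#box');
if (!box) throw new Error('expected #box');

// 'beforeend' parses the string and appends the nodes as the last children.
box.insertAdjacentHTML('beforeend', '<p>World</p>');
const countAfterInsert = box.querySelectorAll('p').length;

// remove() detaches the first <p> from its parent.
box.querySelector('p')?.remove();
const afterRemove = box.innerHTML;

// Setting textContent swaps the remaining children for a text node.
box.textContent = 'plain text';
const serialized = box.toString();

console.log(countAfterInsert, afterRemove, serialized);
```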
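The differences between `text` and `rawText`, and the uppercase value returned by `tagName`, are easiest to see on an element that contains an HTML entity. Below is a minimal sketch, again using the `node-html-parser` import from the Usage section. The `Fish &amp; Chips` snippet is made up for the example.

```ts
import { parse } from 'node-html-parser';

// Illustrative element whose text contains an escaped ampersand.
const root = parse('<p class="note">Fish &amp; Chips</p>');
const p = root.querySelector('p');
if (!p) throw new Error('expected a <p> element');

const tag = p.tagName; // tagName is reported in uppercase
const decoded = p.text; // entities are decoded, like innerText
const raw = p.rawText; // text exactly as it appears in the source

console.log(tag, decoded, raw);
```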