mini-html-parser

Mini HTML parser for web workers and node. Parses and builds a simplified DOM tree in one go. Intended for well-formed HTML.

Installation

With node.js:

npm install mini-html-parser

In the browser (with component):

$ component install matthewmueller/mini-html-parser

* Development: 16kb
* Minified + gzipped: 4kb

Example

```js
var parser = require('mini-html-parser');

// Illustrative markup; any well-formed HTML string works.
var html = '<h1>some title</h1><p>this is a post from mat.io.</p>';
var dom = parser(html).parse();
```


API

`parser(html)`

Create a parser for the given `html` string.

`parser(html).parse()`

Parse the HTML string and return a simplified DOM object made up of the node types listed below. If the parser fails to parse the HTML string, `parse` returns an `Error` object instead of throwing.
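Since a failed parse comes back as a return value, callers have to check the result themselves. The sketch below shows one way to do that. `parseOrThrow` is a hypothetical helper, not part of this library, and `okParser` / `badParser` are stand-in objects that only imitate the `parse()` method so the snippet runs without the package installed. The shape of the successful result is likewise an assumption.

```js
// Hypothetical helper (not part of mini-html-parser): converts a returned
// Error into a thrown one so callers can use ordinary try/catch.
function parseOrThrow(parser) {
  var result = parser.parse();
  if (result instanceof Error) throw result;
  return result;
}

// Stand-ins that mimic only the parse() method of a real parser.
var okParser = {
  parse: function () {
    return { nodeName: '#fragment', nodeType: 11, childNodes: [] };
  }
};
var badParser = {
  parse: function () {
    return new Error('parse failed');
  }
};

var dom = parseOrThrow(okParser);
console.log(dom.nodeName);

try {
  parseOrThrow(badParser);
} catch (err) {
  console.log('could not parse: ' + err.message);
}
```

With the real library, the same check would wrap the object returned by `parser(html)`.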

#### element:

`{ nodeName: 'A', nodeType: 1, childNodes: [...], previousSibling: ..., nextSibling: ..., parentNode: ... }`

#### text:

`{ nodeName: '#text', nodeType: 3, nodeValue: '...', previousSibling: ..., nextSibling: ..., parentNode: ... }`

#### comment:

`{ nodeName: '#comment', nodeType: 8, nodeValue: '...', previousSibling: ..., nextSibling: ..., parentNode: ... }`

#### document fragment:

`{ nodeName: '#fragment', nodeType: 11, nodeValue: null, childNodes: [...], previousSibling: null, nextSibling: null, parentNode: null }`
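The node shapes above are plain objects, so a tree can be walked with `nodeType` and `childNodes` alone. Here is a minimal sketch that collects the text under a node. `textContent` is a hypothetical helper, and the tree is built by hand to match the documented shapes (sibling and parent links are left out for brevity) rather than produced by the parser.

```js
var TEXT = 3;
var COMMENT = 8;

// Hypothetical helper: concatenates the nodeValue of every text node
// under `node`, skipping comments.
function textContent(node) {
  if (node.nodeType === TEXT) return node.nodeValue;
  if (node.nodeType === COMMENT) return '';
  var out = '';
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    out += textContent(children[i]);
  }
  return out;
}

// Hand-built tree following the element, text, comment and fragment shapes.
var lead = { nodeName: '#text', nodeType: 3, nodeValue: 'this is a post from ' };
var linkText = { nodeName: '#text', nodeType: 3, nodeValue: 'mat.io' };
var link = { nodeName: 'A', nodeType: 1, childNodes: [linkText] };
var comment = { nodeName: '#comment', nodeType: 8, nodeValue: ' ignored ' };
var fragment = {
  nodeName: '#fragment',
  nodeType: 11,
  nodeValue: null,
  childNodes: [lead, link, comment]
};

console.log(textContent(fragment)); // 'this is a post from mat.io'
```

Element and fragment nodes are handled by the same branch because both carry `childNodes`.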

TODO

- handle other node types (doctype, etc.)
- benchmark

This library won't parse X...

This is not a full-blown XML parser. Its error handling is minimal, and it is best suited for well-formed HTML. For more information read this: http://stackoverflow.com/a/1732454/376773

Credits

A lot of the regular expressions and inspiration came from John Resig's Pure JavaScript HTML Parser.

License

MIT