<p align="left"> <img src="https://github.com/yeonjuan/es-html-parser/actions/workflows/main.yml/badge.svg?branch=main" alt="CI Badge" /> <a href="https://codecov.io/gh/yeonjuan/es-html-parser" > <img src="https://codecov.io/gh/yeonjuan/es-html-parser/bra
ES HTML Parser is an HTML parser that generates an abstract syntax tree similar to the ESTree specification.
This project began as a fork of hyntax and is developed to follow an ESTree-like AST specification.
See online demo.
- Install
- Usage
- API Reference
- AST Format
- License
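## Install

```
npm install es-html-parser
```

## Usage

```js
import { parse } from "es-html-parser";

// Example input; any HTML string can be passed here.
const input = `<div id="foo">hello</div>`;

const { ast, tokens } = parse(input);
```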

## API Reference

### Functions

#### parse

```ts
parse(html: string, options?: Options): ParseResult;
```

Arguments:

- `html`: HTML string to parse.
- `options` (optional)
  - `tokenAdapter`: The adapter option for changing token information.
  - `rawContentTags` (`string[]`): Specifies tag names whose child contents should be treated as raw text, meaning the parser will not interpret characters like `<` and `>` as HTML syntax inside these tags.

Returns:

- `ParseResult`: Result of parsing.

### Types

#### ParseResult

```ts
interface ParseResult {
  ast: DocumentNode;
  tokens: AnyToken[];
}
```

- `ast`: The root node of the AST.
- `tokens`: An array of resulting tokens.

#### AnyNode

`AnyNode` is a union type of all nodes.

```ts
type AnyNode =
  | DocumentNode
  | TextNode
  | TagNode
  | OpenTagStartNode
  | OpenTagEndNode
  | CloseTagNode
  | AttributeNode
  | AttributeKeyNode
  | AttributeValueNode
  | AttributeValueWrapperStartNode
  | AttributeValueWrapperEndNode
  | ScriptTagNode
  | OpenScriptTagStartNode
  | CloseScriptTagNode
  | OpenScriptTagEndNode
  | ScriptTagContentNode
  | StyleTagNode
  | OpenStyleTagStartNode
  | OpenStyleTagEndNode
  | StyleTagContentNode
  | CloseStyleTagNode
  | CommentNode
  | CommentOpenNode
  | CommentCloseNode
  | CommentContentNode
  | DoctypeNode
  | DoctypeOpenNode
  | DoctypeCloseNode
  | DoctypeAttributeNode
  | DoctypeAttributeValueNode
  | DoctypeAttributeWrapperStartNode
  | DoctypeAttributeWrapperEndNode;
```
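
#### AnyToken

`AnyToken` is a union type of all tokens.

```ts
type AnyToken =
  | Token<TokenTypes.Text>
  | Token<TokenTypes.OpenTagStart>
  | Token<TokenTypes.OpenTagEnd>
  | Token<TokenTypes.CloseTag>
  | Token<TokenTypes.AttributeKey>
  | Token<TokenTypes.AttributeAssignment>
  | Token<TokenTypes.AttributeValueWrapperStart>
  | Token<TokenTypes.AttributeValue>
  | Token<TokenTypes.AttributeValueWrapperEnd>
  | Token<TokenTypes.DoctypeOpen>
  | Token<TokenTypes.DoctypeAttributeValue>
  | Token<TokenTypes.DoctypeAttributeWrapperStart>
  | Token<TokenTypes.DoctypeAttributeWrapperEnd>
  | Token<TokenTypes.DoctypeClose>
  | Token<TokenTypes.CommentOpen>
  | Token<TokenTypes.CommentContent>
  | Token<TokenTypes.CommentClose>
  | Token<TokenTypes.OpenScriptTagStart>
  | Token<TokenTypes.OpenScriptTagEnd>
  | Token<TokenTypes.ScriptTagContent>
  | Token<TokenTypes.CloseScriptTag>
  | Token<TokenTypes.OpenStyleTagStart>
  | Token<TokenTypes.OpenStyleTagEnd>
  | Token<TokenTypes.StyleTagContent>
  | Token<TokenTypes.CloseStyleTag>;
```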
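
Because every token carries a `type` field, a token list can be narrowed by that field. The following is a minimal sketch of that idea, not library code: `Tok`, `TokenType`, and `tokensOfType` are hypothetical names, the token shape is a simplified stand-in for the documented one, and the `tokens` array is written by hand rather than produced by `parse`.

```ts
// Simplified stand-in for the documented token shape (assumption for this sketch).
type TokenType =
  | "OpenTagStart"
  | "AttributeKey"
  | "AttributeValue"
  | "OpenTagEnd"
  | "Text";

interface Tok<T extends TokenType = TokenType> {
  type: T;
  value: string;
  range: [number, number];
}

// Hypothetical helper: keep only the tokens whose `type` matches.
function tokensOfType<T extends TokenType>(tokens: Tok[], type: T): Tok<T>[] {
  return tokens.filter((t): t is Tok<T> => t.type === type);
}

// Hand-written tokens for `<a id="x">hi`, not real parser output.
const tokens: Tok[] = [
  { type: "OpenTagStart", value: "<a", range: [0, 2] },
  { type: "AttributeKey", value: "id", range: [3, 5] },
  { type: "AttributeValue", value: "x", range: [7, 8] },
  { type: "OpenTagEnd", value: ">", range: [9, 10] },
  { type: "Text", value: "hi", range: [10, 12] },
];

const keys = tokensOfType(tokens, "AttributeKey").map((t) => t.value);
console.log(keys); // ["id"]
```

The type guard in `filter` lets TypeScript infer the narrowed element type, so callers do not need casts.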

### Constants

#### TokenTypes

```ts
enum TokenTypes {
  Text = "Text",
  OpenTagStart = "OpenTagStart",
  OpenTagEnd = "OpenTagEnd",
  CloseTag = "CloseTag",
  AttributeKey = "AttributeKey",
  AttributeAssignment = "AttributeAssignment",
  AttributeValueWrapperStart = "AttributeValueWrapperStart",
  AttributeValue = "AttributeValue",
  AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
  DoctypeOpen = "DoctypeOpen",
  DoctypeAttributeValue = "DoctypeAttributeValue",
  DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
  DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
  DoctypeClose = "DoctypeClose",
  CommentOpen = "CommentOpen",
  CommentContent = "CommentContent",
  CommentClose = "CommentClose",
  OpenScriptTagStart = "OpenScriptTagStart",
  OpenScriptTagEnd = "OpenScriptTagEnd",
  ScriptTagContent = "ScriptTagContent",
  CloseScriptTag = "CloseScriptTag",
  OpenStyleTagStart = "OpenStyleTagStart",
  OpenStyleTagEnd = "OpenStyleTagEnd",
  StyleTagContent = "StyleTagContent",
  CloseStyleTag = "CloseStyleTag",
}
```

#### NodeTypes

```ts
enum NodeTypes {
  Document = "Document",
  Tag = "Tag",
  Text = "Text",
  Doctype = "Doctype",
  Comment = "Comment",
  CommentOpen = "CommentOpen",
  CommentClose = "CommentClose",
  CommentContent = "CommentContent",
  Attribute = "Attribute",
  AttributeKey = "AttributeKey",
  AttributeValue = "AttributeValue",
  AttributeValueWrapperStart = "AttributeValueWrapperStart",
  AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
  CloseTag = "CloseTag",
  OpenTagEnd = "OpenTagEnd",
  OpenTagStart = "OpenTagStart",
  DoctypeOpen = "DoctypeOpen",
  DoctypeAttribute = "DoctypeAttribute",
  DoctypeClose = "DoctypeClose",
  ScriptTag = "ScriptTag",
  OpenScriptTagStart = "OpenScriptTagStart",
  OpenScriptTagEnd = "OpenScriptTagEnd",
  ScriptTagContent = "ScriptTagContent",
  StyleTag = "StyleTag",
  OpenStyleTagStart = "OpenStyleTagStart",
  OpenStyleTagEnd = "OpenStyleTagEnd",
  StyleTagContent = "StyleTagContent",
  CloseStyleTag = "CloseStyleTag",
  CloseScriptTag = "CloseScriptTag",
  DoctypeAttributeValue = "DoctypeAttributeValue",
  DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
  DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
}
```

## AST Format

- Common
  - BaseNode
  - SourceLocation
  - Position
  - Token
- DocumentNode
- TextNode
- TagNode
  - OpenTagStartNode
  - OpenTagEndNode
  - CloseTagNode
- AttributeNode
  - AttributeKeyNode
  - AttributeValueWrapperStartNode
  - AttributeValueWrapperEndNode
  - AttributeValueNode
- ScriptTagNode
  - OpenScriptTagStartNode
  - OpenScriptTagEndNode
  - CloseScriptTagNode
  - ScriptTagContentNode
- StyleTagNode
  - OpenStyleTagStartNode
  - OpenStyleTagEndNode
  - CloseStyleTagNode
  - StyleTagContentNode
- CommentNode
  - CommentOpenNode
  - CommentCloseNode
  - CommentContentNode
- DoctypeNode
  - DoctypeOpenNode
  - DoctypeCloseNode
  - DoctypeAttributeNode
  - DoctypeAttributeValueNode
  - DoctypeAttributeWrapperStartNode
  - DoctypeAttributeWrapperEndNode

### Common

#### BaseNode

Every AST node and token implements the `BaseNode` interface.

```ts
interface BaseNode {
  type: string;
  loc: SourceLocation;
  range: [number, number];
}
```

The `type` field represents the AST type. Its value is one of the `NodeTypes` or `TokenTypes`.
The `loc` and `range` fields represent the source location of the node.

#### SourceLocation

```ts
interface SourceLocation {
  start: Position;
  end: Position;
}
```

The `start` field represents the start location of the node.
The `end` field represents the end location of the node.

#### Position

```ts
interface Position {
  line: number; // >= 1
  column: number; // >= 0
}
```

The `line` field is the line number where the node is positioned (1-based index).
The `column` field is the offset within the line (0-based index).

#### Token

All tokens implement the `Token` interface.

```ts
interface Token<T extends TokenTypes> extends BaseNode {
  type: T;
  value: string;
}
```
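
To make the relationship between `range` and `loc` concrete, here is a minimal sketch that derives a `SourceLocation` from a pair of character offsets. It assumes `range` holds `[start, end)` offsets into the source string, which the text above does not state explicitly; `positionAt` and `locFromRange` are hypothetical helpers written for this illustration, not functions exported by the library.

```ts
interface Position {
  line: number; // 1-based
  column: number; // 0-based
}

interface SourceLocation {
  start: Position;
  end: Position;
}

// Hypothetical helper: convert a character offset into a Position.
function positionAt(source: string, offset: number): Position {
  let line = 1;
  let lineStart = 0; // offset of the first character of the current line
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

// Hypothetical helper: build a SourceLocation from a [start, end) range.
function locFromRange(source: string, range: [number, number]): SourceLocation {
  return {
    start: positionAt(source, range[0]),
    end: positionAt(source, range[1]),
  };
}

const source = "<div>\n  <p>hi</p>\n</div>";
const range: [number, number] = [8, 17]; // covers `<p>hi</p>` on the second line
const loc = locFromRange(source, range);

console.log(source.slice(range[0], range[1])); // <p>hi</p>
console.log(loc.start); // { line: 2, column: 2 }
console.log(loc.end); // { line: 2, column: 11 }
```

Note how the first line is reported as `1` while the first column of a line is `0`, matching the comments on the `Position` interface.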
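
### DocumentNode

`DocumentNode` represents a whole parsed document. It is the root node of the AST.

```ts
interface DocumentNode extends BaseNode {
  type: "Document";
  children: Array<
    | TextNode
    | TagNode
    | ScriptTagNode
    | StyleTagNode
    | CommentNode
    | DoctypeNode
  >;
}
```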

### TextNode

`TextNode` represents any plain text in HTML.

```ts
interface TextNode extends BaseNode {
  type: "Text";
  value: string;
}
```

### TagNode

`TagNode` represents all kinds of tag nodes in HTML except for doctype, script, style, and comment (e.g. `<div></div>`, `<span></span>`, ...).

```ts
interface TagNode extends BaseNode {
  type: "Tag";
  selfClosing: boolean;
  name: string;
  openStart: OpenTagStartNode;
  openEnd: OpenTagEndNode;
  close?: CloseTagNode;
  children: Array<
    TextNode | TagNode | ScriptTagNode | StyleTagNode | CommentNode
  >;
  attributes: Array<AttributeNode>;
}
```
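
Since `DocumentNode` and `TagNode` both expose a `children` array, the tree can be walked recursively. The sketch below collects tag names in document order. The `DocumentLike`, `TagLike`, and `TextLike` types are trimmed-down stand-ins assumed for this example (they omit `loc`, `range`, attributes, and the other child kinds), `collectTagNames` is a hypothetical helper, and the `doc` object is written by hand rather than returned by `parse`.

```ts
// Minimal stand-ins for the documented node shapes (assumptions for this sketch).
interface TextLike {
  type: "Text";
  value: string;
}

interface TagLike {
  type: "Tag";
  name: string;
  children: Array<TagLike | TextLike>;
}

interface DocumentLike {
  type: "Document";
  children: Array<TagLike | TextLike>;
}

// Hypothetical helper: depth-first, pre-order collection of tag names.
function collectTagNames(
  node: DocumentLike | TagLike | TextLike,
  out: string[] = []
): string[] {
  if (node.type === "Text") return out; // text nodes have no children
  if (node.type === "Tag") out.push(node.name);
  for (const child of node.children) collectTagNames(child, out);
  return out;
}

// Hand-built tree for `<div><span>hi</span> </div>`.
const doc: DocumentLike = {
  type: "Document",
  children: [
    {
      type: "Tag",
      name: "div",
      children: [
        {
          type: "Tag",
          name: "span",
          children: [{ type: "Text", value: "hi" }],
        },
        { type: "Text", value: " " },
      ],
    },
  ],
};

const names = collectTagNames(doc);
console.log(names); // ["div", "span"]
```

A real traversal would also need to handle script, style, comment, and doctype nodes, which do not have a `children` array.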

#### OpenTagStartNode

`OpenTagStartNode` represents the opening part of a start tag (e.g. `<div`).

```ts
interface OpenTagStartNode extends BaseNode {
  type: "OpenTagStart";
  value: string;
}
```

#### OpenTagEndNode

`OpenTagEndNode` represents the closing part of a start tag (e.g. `>`, `/>`).

```ts
interface OpenTagEndNode extends BaseNode {
  type: "OpenTagEnd";
  value: string;
}
```

#### CloseTagNode

`CloseTagNode` represents an end tag (e.g. `</div>`).

```ts
interface CloseTagNode extends BaseNode {
  type: "CloseTag";
  value: string;
}
```

### AttributeNode

`AttributeNode` represents an attribute (e.g. `id="foo"`).

```ts
interface AttributeNode extends BaseNode {
  type: "Attribute";
  key: AttributeKeyNode;
  value?: AttributeValueNode;
  startWrapper?: AttributeValueWrapperStartNode;
  endWrapper?: AttributeValueWrapperEndNode;
}
```

#### AttributeKeyNode

`AttributeKeyNode` represents the key part of an attribute (e.g. `id`).

```ts
interface AttributeKeyNode extends BaseNode {
  type: "AttributeKey";
  value: string;
}
```

#### AttributeValueWrapperStartNode

`AttributeValueWrapperStartNode` represents the left-side character that wraps the value of the attribute (e.g. `"`, `'`).

```ts
interface AttributeValueWrapperStartNode extends BaseNode {
  type: "AttributeValueWrapperStart";
  value: string;
}
```

#### AttributeValueWrapperEndNode

`AttributeValueWrapperEndNode` represents the right-side character that wraps the value of the attribute (e.g. `"`, `'`).

```ts
interface AttributeValueWrapperEndNode extends BaseNode {
  type: "AttributeValueWrapperEnd";
  value: string;
}
```

#### AttributeValueNode

`AttributeValueNode` represents the value part of the attribute. It does not include wrapper characters (e.g. `foo`).

```ts
interface AttributeValueNode extends BaseNode {
  type: "AttributeValue";
  value: string;
}
```
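
The split between key, wrappers, and value can be illustrated by putting an attribute back together. This is only a sketch under stated assumptions: `AttributeLike` and `ValueHolder` are simplified stand-ins for the interfaces above, `attributeToString` is a hypothetical helper, and it assumes no whitespace around `=`, which real source text may contain.

```ts
// Simplified stand-ins for the documented attribute nodes (assumptions for this sketch).
interface ValueHolder {
  value: string;
}

interface AttributeLike {
  type: "Attribute";
  key: ValueHolder;
  value?: ValueHolder;
  startWrapper?: ValueHolder;
  endWrapper?: ValueHolder;
}

// Hypothetical helper: rebuild attribute text from its parts.
function attributeToString(attr: AttributeLike): string {
  const key = attr.key.value;
  const hasValue = attr.value !== undefined || attr.startWrapper !== undefined;
  if (!hasValue) return key; // key-only attribute such as `disabled`
  const open = attr.startWrapper?.value ?? "";
  const close = attr.endWrapper?.value ?? "";
  return `${key}=${open}${attr.value?.value ?? ""}${close}`;
}

const quoted: AttributeLike = {
  type: "Attribute",
  key: { value: "id" },
  startWrapper: { value: '"' },
  value: { value: "foo" },
  endWrapper: { value: '"' },
};
const unquoted: AttributeLike = {
  type: "Attribute",
  key: { value: "id" },
  value: { value: "foo" },
};
const keyOnly: AttributeLike = { type: "Attribute", key: { value: "disabled" } };

console.log(attributeToString(quoted)); // id="foo"
console.log(attributeToString(unquoted)); // id=foo
console.log(attributeToString(keyOnly)); // disabled
```

The optional fields mirror the `?` markers on `AttributeNode`: an unquoted value has no wrappers, and a key-only attribute has neither wrappers nor a value.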
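
### ScriptTagNode

`ScriptTagNode` represents a script tag in HTML (e.g. `<script>console.log('hello');</script>`).

```ts
interface ScriptTagNode extends BaseNode {
  type: "ScriptTag";
  attributes: Array<AttributeNode>;
  openStart: OpenScriptTagStartNode;
  openEnd: OpenScriptTagEndNode;
  close: CloseScriptTagNode;
  value?: ScriptTagContentNode;
}
```

#### OpenScriptTagStartNode

`OpenScriptTagStartNode` represents the opening part of a start script tag (e.g. `<script`).

```ts
interface OpenScriptTagStartNode extends BaseNode {
  type: "OpenScriptTagStart";
  value: string;
}
```

#### OpenScriptTagEndNode

`OpenScriptTagEndNode` represents the closing part of a start script tag (e.g. `>`).

```ts
interface OpenScriptTagEndNode extends BaseNode {
  type: "OpenScriptTagEnd";
  value: string;
}
```

#### CloseScriptTagNode

`CloseScriptTagNode` represents a closing script tag (e.g. `</script>`).

```ts
interface CloseScriptTagNode extends BaseNode {
  type: "CloseScriptTag";
  value: string;
}
```

#### ScriptTagContentNode

`ScriptTagContentNode` represents the script content inside a script tag (e.g. `console.log('hello');`).

```ts
interface ScriptTagContentNode extends BaseNode {
  type: "ScriptTagContent";
  value: string;
}
```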
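
### StyleTagNode

`StyleTagNode` represents a style tag (e.g. `<style>...</style>`).

```ts
interface StyleTagNode extends BaseNode {
  type: "StyleTag";
  attributes: Array<AttributeNode>;
  openStart: OpenStyleTagStartNode;
  openEnd: OpenStyleTagEndNode;
  close: CloseStyleTagNode;
  value?: StyleTagContentNode;
}
```

#### OpenStyleTagStartNode

`OpenStyleTagStartNode` represents the opening part of a start style tag (e.g. `<style`).

```ts
interface OpenStyleTagStartNode extends BaseNode {
  type: "OpenStyleTagStart";
  value: string;
}
```

#### OpenStyleTagEndNode

`OpenStyleTagEndNode` represents the closing part of a start style tag (e.g. `>`).

```ts
interface OpenStyleTagEndNode extends BaseNode {
  type: "OpenStyleTagEnd";
  value: string;
}
```

#### CloseStyleTagNode

`CloseStyleTagNode` represents a closing style tag (e.g. `</style>`).

```ts
interface CloseStyleTagNode extends BaseNode {
  type: "CloseStyleTag";
  value: string;
}
```

#### StyleTagContentNode

`StyleTagContentNode` represents the style content inside a style tag.

```ts
interface StyleTagContentNode extends BaseNode {
  type: "StyleTagContent";
  value: string;
}
```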

### CommentNode

`CommentNode` represents a comment in HTML (e.g. `<!-- comment -->`).

```ts
interface CommentNode extends BaseNode {
  type: "Comment";
  open: CommentOpenNode;
  close: CommentCloseNode;
  value: CommentContentNode;
}
```

#### CommentOpenNode

`CommentOpenNode` represents the comment start character sequence (e.g. `<!--`).

```ts
interface CommentOpenNode extends BaseNode {
  type: "CommentOpen";
  value: string;
}
```

#### CommentCloseNode

`CommentCloseNode` represents the comment end character sequence (e.g. `-->`).

```ts
interface CommentCloseNode extends BaseNode {
  type: "CommentClose";
  value: string;
}
```

#### CommentContentNode

`CommentContentNode` represents the text in the comment.

```ts
interface CommentContentNode extends BaseNode {
  type: "CommentContent";
  value: string;
}
```

### DoctypeNode

`DoctypeNode` represents the DOCTYPE in HTML.

```ts
interface DoctypeNode extends BaseNode {
  type: "Doctype";
  attributes: Array<DoctypeAttributeNode>;
  open: DoctypeOpenNode;
  close: DoctypeCloseNode;
}
```

#### DoctypeOpenNode

`DoctypeOpenNode` represents the doctype start character sequence (e.g. `<!DOCTYPE`).

```ts
interface DoctypeOpenNode extends BaseNode {
  type: "DoctypeOpen";
  value: string;
}
```

#### DoctypeCloseNode

`DoctypeCloseNode` represents the doctype end character sequence (e.g. `>`).

```ts
interface DoctypeCloseNode extends BaseNode {
  type: "DoctypeClose";
  value: string;
}
```

#### DoctypeAttributeNode

`DoctypeAttributeNode` represents an attribute of the doctype node (e.g. `html`, `"-//W3C//DTD HTML 4.01 Transitional//EN"`).

```ts
interface DoctypeAttributeNode extends BaseNode {
  type: "DoctypeAttribute";
  key: DoctypeAttributeKey;
}
```

#### DoctypeAttributeValueNode

`DoctypeAttributeValueNode` represents the value of a doctype node's attribute (e.g. `html`, `-//W3C//DTD HTML 4.01 Transitional//EN`). It does not include wrapper characters (`'`, `"`).

```ts
interface DoctypeAttributeValueNode extends BaseNode {
  type: "DoctypeAttributeValue";
  value: string;
}
```

#### DoctypeAttributeWrapperStartNode

`DoctypeAttributeWrapperStartNode` represents the left-side character that wraps the value of the attribute (e.g. `"`, `'`).

```ts
interface DoctypeAttributeWrapperStartNode extends BaseNode {
  type: "DoctypeAttributeWrapperStart";
  value: string;
}
```

#### DoctypeAttributeWrapperEndNode

`DoctypeAttributeWrapperEndNode` represents the right-side character that wraps the value of the attribute (e.g. `"`, `'`).

```ts
interface DoctypeAttributeWrapperEndNode extends BaseNode {
  type: "DoctypeAttributeWrapperEnd";
  value: string;
}
```