# hast-util-to-nlcst

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]
[hast][] utility to transform to [nlcst][].

## Contents

* What is this?
* When should I use this?
* Install
* Use
* API
  * `toNlcst(tree, file, Parser)`
  * `ParserConstructor`
  * `ParserInstance`
* Types
* Compatibility
* Security
* Related
* Contribute
* License

## What is this?

This package is a utility that takes a [hast][] (HTML) syntax tree as input and
turns it into [nlcst][] (natural language).

## When should I use this?

This project is useful when you want to deal with ASTs and inspect the natural
language inside HTML.
Unfortunately, there is no way yet to apply changes to the nlcst back into
hast.
The mdast utility [mdast-util-to-nlcst][mdast-util-to-nlcst] does the same but
uses a markdown tree as input.
The rehype plugin [rehype-retext][rehype-retext] wraps this utility to do the
same at a higher-level (easier) abstraction.

## Install

This package is [ESM only][esm].
In Node.js (version 16+), install with [npm][]:

```sh
npm install hast-util-to-nlcst
```

In Deno with [`esm.sh`][esmsh]:

```js
import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4'
```

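In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4?bundle'
</script>
```
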
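## Use

Say our document `example.html` contains:

```html
<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>
```
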
…and our module `example.js` looks as follows:

```js
import {fromHtml} from 'hast-util-from-html'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import {read} from 'to-vfile'
import {inspect} from 'unist-util-inspect'

const file = await read('example.html')
const tree = fromHtml(file)

console.log(inspect(toNlcst(tree, file, ParseEnglish)))
```

…now running `node example.js` yields:

```txt
RootNode[2] (1:1-6:1, 0-134)
├─0 ParagraphNode[3] (1:10-3:3, 9-24)
│   ├─0 WhiteSpaceNode "\n  " (1:10-2:3, 9-12)
│   ├─1 SentenceNode[2] (2:3-2:12, 12-21)
│   │   ├─0 WordNode[1] (2:3-2:11, 12-20)
│   │   │   └─0 TextNode "Implicit" (2:3-2:11, 12-20)
│   │   └─1 PunctuationNode "." (2:11-2:12, 20-21)
│   └─2 WhiteSpaceNode "\n  " (2:12-3:3, 21-24)
└─1 ParagraphNode[1] (3:7-3:43, 28-64)
    └─0 SentenceNode[4] (3:7-3:43, 28-64)
        ├─0 WordNode[1] (3:7-3:15, 28-36)
        │   └─0 TextNode "Explicit" (3:7-3:15, 28-36)
        ├─1 PunctuationNode ":" (3:15-3:16, 36-37)
        ├─2 WhiteSpaceNode " " (3:16-3:17, 37-38)
        └─3 WordNode[4] (3:25-3:43, 46-64)
            ├─0 TextNode "foo" (3:25-3:28, 46-49)
            ├─1 TextNode "s" (3:37-3:38, 58-59)
            ├─2 PunctuationNode "-" (3:38-3:39, 59-60)
            └─3 TextNode "ball" (3:39-3:43, 60-64)
```

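
To make the shape of that tree more concrete, here is a minimal sketch that walks a hand-written copy of the second paragraph (positional info omitted) and joins the children of each `WordNode`. The `collectWords` helper is hypothetical and not part of this package; in a real project a tree-walking utility would likely be a better fit.

```js
// Hand-written nlcst fragment mirroring the second paragraph in the output
// shown earlier (positional info omitted for brevity).
const paragraph = {
  type: 'ParagraphNode',
  children: [
    {
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Explicit'}]},
        {type: 'PunctuationNode', value: ':'},
        {type: 'WhiteSpaceNode', value: ' '},
        {
          type: 'WordNode',
          children: [
            {type: 'TextNode', value: 'foo'},
            {type: 'TextNode', value: 's'},
            {type: 'PunctuationNode', value: '-'},
            {type: 'TextNode', value: 'ball'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: collect the text of every `WordNode` below `node`.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    // A word can consist of several children (text and inner punctuation).
    words.push(node.children.map((child) => child.value).join(''))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }

  return words
}

console.log(collectWords(paragraph))
```

The last `WordNode` has four children because the word spans several HTML nodes and contains a hyphen, so the helper reports it as the single word `foos-ball`.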

## API

This package exports the identifier [`toNlcst`][api-to-nlcst].
There is no default export.

### `toNlcst(tree, file, Parser)`

Turn a hast tree into an nlcst tree.

> 👉 **Note**: `tree` must have positional info and `file` must be a `VFile`
> corresponding to `tree`.

##### Parameters

* `tree` ([`HastNode`][hast-node]):
  hast tree to transform
* `file` ([`VFile`][vfile]):
  virtual file
* `Parser` ([`ParserConstructor`][api-parser-constructor] or
  [`ParserInstance`][api-parser-instance]):
  parser to use

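
Because `Parser` may be either a parser class or a parser instance, the sketch below shows one way such an argument could be normalized to an instance. `FakeParser` and `resolveParser` are hypothetical stand-ins invented for this illustration, and this is not necessarily how the package handles the argument internally.

```js
// Hypothetical stand-in for a real parser class; it only implements the one
// method this sketch needs and does no real natural language parsing.
class FakeParser {
  parse(value) {
    return {type: 'RootNode', children: [{type: 'TextNode', value}]}
  }
}

// Hypothetical helper: accept a parser class or a parser instance and always
// return an instance.
function resolveParser(Parser) {
  return typeof Parser === 'function' ? new Parser() : Parser
}

const fromClass = resolveParser(FakeParser)
const instance = new FakeParser()
const fromInstance = resolveParser(instance)

console.log(fromClass instanceof FakeParser) // true
console.log(fromInstance === instance) // true
```

Passing an instance can be handy when the parser needs configuration before use, while passing the class is the shorter form.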
##### Returns
[NlcstNode][nlcst-node].
##### Notes
###### Implied paragraphs
The algorithm supports implicit and explicit paragraphs, such as:

```html
<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>
```

Overlapping paragraphs are also supported (see the tests or the HTML spec for
more info).
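
As a rough illustration of the idea, the following heavily simplified sketch groups the children of a container into implicit and explicit paragraphs. It is not the algorithm this package uses: `groupParagraphs`, `textOf`, and the `explicit` tag list are hypothetical, overlapping paragraphs are not handled, and the input is a hand-written hast fragment in which the explicit paragraph is assumed to be an `h1` element.

```js
// Assumption: these elements count as explicit paragraphs in this sketch.
const explicit = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

// Hypothetical helper: concatenate all text below a hast-like node.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map((child) => textOf(child)).join('')
}

// Hypothetical helper: explicit elements form their own paragraph; runs of
// other nodes between them form an implicit paragraph. Whitespace-only runs
// are dropped.
function groupParagraphs(children) {
  const groups = []
  let run = []

  const flush = () => {
    const text = run.map((node) => textOf(node)).join('').trim()
    if (text) groups.push({kind: 'implicit', text})
    run = []
  }

  for (const child of children) {
    if (child.type === 'element' && explicit.has(child.tagName)) {
      flush()
      groups.push({kind: 'explicit', text: textOf(child)})
    } else {
      run.push(child)
    }
  }

  flush()
  return groups
}

// Hand-written children of a container element (positional info omitted).
const articleChildren = [
  {type: 'text', value: '\n  An implicit paragraph.\n  '},
  {
    type: 'element',
    tagName: 'h1',
    properties: {},
    children: [{type: 'text', value: 'An explicit paragraph.'}]
  },
  {type: 'text', value: '\n'}
]

console.log(groupParagraphs(articleChildren))
```

The loose text before the heading ends up as one implicit paragraph, the heading as one explicit paragraph, and the trailing line break is discarded.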
###### Ignored nodes
Some elements are ignored and their content will not be present in
[nlcst][]: