hast-util-to-nlcst

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[hast][] utility to transform to [nlcst][].

Contents

* What is this?
* When should I use this?
* Install
* Use
* API
  * toNlcst(tree, file, Parser)
  * ParserConstructor
  * ParserInstance
* Types
* Compatibility
* Security
* Related
* Contribute
* License

What is this?

This package is a utility that takes a [hast][] (HTML) syntax tree as input and
turns it into [nlcst][] (natural language).

When should I use this?

This project is useful when you want to deal with ASTs and inspect the natural
language inside HTML.
Unfortunately, there is no way yet to apply changes to the nlcst back into
hast.

The mdast utility [mdast-util-to-nlcst][mdast-util-to-nlcst] does the same but
uses a markdown tree as input.

The rehype plugin [rehype-retext][rehype-retext] wraps this utility to do the
same at a higher-level (easier) abstraction.

Install

This package is [ESM only][esm].
In Node.js (version 16+), install with [npm][]:

```sh
npm install hast-util-to-nlcst
```

In Deno with [esm.sh][esmsh]:

```js
import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4'
```

In browsers with [esm.sh][esmsh]:

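```html
<script type="module">
  import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4?bundle'
</script>
```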

Use

Say our document example.html contains:

```html
<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>
```

…and our module example.js looks as follows:

```js
import {fromHtml} from 'hast-util-from-html'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import {read} from 'to-vfile'
import {inspect} from 'unist-util-inspect'

const file = await read('example.html')
const tree = fromHtml(file)

console.log(inspect(toNlcst(tree, file, ParseEnglish)))
```

…now running node example.js yields:

```txt
RootNode[2] (1:1-6:1, 0-134)
├─0 ParagraphNode[3] (1:10-3:3, 9-24)
│   ├─0 WhiteSpaceNode "\n  " (1:10-2:3, 9-12)
│   ├─1 SentenceNode[2] (2:3-2:12, 12-21)
│   │   ├─0 WordNode[1] (2:3-2:11, 12-20)
│   │   │   └─0 TextNode "Implicit" (2:3-2:11, 12-20)
│   │   └─1 PunctuationNode "." (2:11-2:12, 20-21)
│   └─2 WhiteSpaceNode "\n  " (2:12-3:3, 21-24)
└─1 ParagraphNode[1] (3:7-3:43, 28-64)
    └─0 SentenceNode[4] (3:7-3:43, 28-64)
        ├─0 WordNode[1] (3:7-3:15, 28-36)
        │   └─0 TextNode "Explicit" (3:7-3:15, 28-36)
        ├─1 PunctuationNode ":" (3:15-3:16, 36-37)
        ├─2 WhiteSpaceNode " " (3:16-3:17, 37-38)
        └─3 WordNode[4] (3:25-3:43, 46-64)
            ├─0 TextNode "foo" (3:25-3:28, 46-49)
            ├─1 TextNode "s" (3:37-3:38, 58-59)
            ├─2 PunctuationNode "-" (3:38-3:39, 59-60)
            └─3 TextNode "ball" (3:39-3:43, 60-64)
```
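To give a feel for inspecting such a result, the sketch below walks an nlcst tree and collects its words. It runs without dependencies: the `tree` literal is a hand-written, position-free stand-in for the second paragraph of the output above, not the real return value of toNlcst, and `textOf` and `collectWords` are hypothetical helper names chosen for illustration.

```js
// Hand-written stand-in for part of an nlcst tree (positions omitted).
// It is not produced by the library here.
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Explicit'}]},
            {type: 'PunctuationNode', value: ':'},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'foo'},
                {type: 'TextNode', value: 's'},
                {type: 'PunctuationNode', value: '-'},
                {type: 'TextNode', value: 'ball'}
              ]
            }
          ]
        }
      ]
    }
  ]
}

// Serialize a node by concatenating the values of its leaves.
function textOf(node) {
  if ('value' in node) return node.value
  return (node.children || []).map(textOf).join('')
}

// Hypothetical helper: gather the text of every `WordNode` in document order.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(textOf(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

console.log(collectWords(tree)) // [ 'Explicit', 'foos-ball' ]
```

Note that a single `WordNode` can contain several text and punctuation children, which is why the helper serializes the whole word rather than reading one value.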

API

This package exports the identifier [toNlcst][api-to-nlcst]. There is no default export.

toNlcst(tree, file, Parser)

Turn a hast tree into an nlcst tree.

> 👉 Note: tree must have positional info and file must be a VFile
> corresponding to tree.
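One way to guard against trees that lack positional info (for example trees built by hand) is to check before calling the utility. The following is a minimal sketch, assuming a strict check of every node; `hasPositions` and `point` are hypothetical helpers, both trees are hand-written, and how strictly the package itself validates positions is not stated here.

```js
// Build a unist point; `offset` is the 0-based character index.
const point = (line, column, offset) => ({line, column, offset})

// Hand-written hast tree for `<p>Hello</p>` with positional info.
const positioned = {
  type: 'root',
  position: {start: point(1, 1, 0), end: point(1, 13, 12)},
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      position: {start: point(1, 1, 0), end: point(1, 13, 12)},
      children: [
        {
          type: 'text',
          value: 'Hello',
          position: {start: point(1, 4, 3), end: point(1, 9, 8)}
        }
      ]
    }
  ]
}

// The same content built by hand without positions.
const generated = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{type: 'text', value: 'Hello'}]
    }
  ]
}

// Hypothetical pre-flight check: true only if this node and all of its
// descendants carry both `position.start` and `position.end`.
function hasPositions(node) {
  const position = node.position
  if (!position || !position.start || !position.end) return false
  return (node.children || []).every(hasPositions)
}

console.log(hasPositions(positioned)) // true
console.log(hasPositions(generated)) // false
```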

##### Parameters

* tree ([HastNode][hast-node]): hast tree to transform
* file ([VFile][vfile]): virtual file
* Parser ([ParserConstructor][api-parser-constructor] or
  [ParserInstance][api-parser-instance]): parser to use
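Because the third parameter accepts either a constructor or an instance, a caller can pass a class such as ParseEnglish directly or configure an instance first. The sketch below illustrates one way such an argument could be normalized. It is an assumption for illustration, not necessarily how the package does it: `resolveParser` is a hypothetical helper and `FakeParser` is a stand-in class whose output is not that of a real parser.

```js
// Stand-in parser for illustration only; a real one would be a class
// such as `ParseEnglish` with a `parse` method returning an nlcst tree.
class FakeParser {
  parse(value) {
    return {type: 'RootNode', children: [{type: 'TextNode', value}]}
  }
}

// Hypothetical helper: accept a constructor or an instance and always
// return an instance.
function resolveParser(Parser) {
  return typeof Parser === 'function' ? new Parser() : Parser
}

const fromConstructor = resolveParser(FakeParser)
const instance = new FakeParser()
const fromInstance = resolveParser(instance)

console.log(fromConstructor instanceof FakeParser) // true
console.log(fromInstance === instance) // true
```

With the real package, this means both `toNlcst(tree, file, ParseEnglish)` and `toNlcst(tree, file, new ParseEnglish())` fit the parameter description.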

##### Returns

[NlcstNode][nlcst-node].

##### Notes

###### Implied paragraphs

The algorithm supports implicit and explicit paragraphs, such as:

```html
<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>
```

Overlapping paragraphs are also supported (see the tests or the HTML spec for
more info).
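The idea of implicit versus explicit paragraphs can be sketched as follows. This is a deliberately simplified illustration and not the package's algorithm: the set of elements treated as explicit is an assumed, incomplete list, overlapping paragraphs are not handled, and `splitParagraphs` and `plainText` are hypothetical names. Runs of loose content between explicit elements become implicit paragraphs, and whitespace-only runs are dropped.

```js
// Assumption: a small, incomplete set of elements that form an explicit paragraph.
const explicit = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

// Concatenate the text inside a hast node.
function plainText(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(plainText).join('')
}

// Hypothetical helper: split the children of a container into paragraphs.
function splitParagraphs(children) {
  const paragraphs = []
  let run = []

  // Close the current run of loose content, keeping it only if it has text.
  const flush = () => {
    const text = run.map(plainText).join('').trim()
    if (text !== '') paragraphs.push({kind: 'implicit', text})
    run = []
  }

  for (const child of children) {
    if (child.type === 'element' && explicit.has(child.tagName)) {
      flush()
      paragraphs.push({kind: 'explicit', text: plainText(child).trim()})
    } else {
      run.push(child)
    }
  }

  flush()
  return paragraphs
}

// Hand-written children of an `<article>` with one loose text run and one heading.
const articleChildren = [
  {type: 'text', value: '\n  An implicit paragraph.\n  '},
  {
    type: 'element',
    tagName: 'h1',
    properties: {},
    children: [{type: 'text', value: 'An explicit paragraph.'}]
  },
  {type: 'text', value: '\n'}
]

console.log(splitParagraphs(articleChildren).map((d) => d.kind))
// [ 'implicit', 'explicit' ]
```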

###### Ignored nodes

Some elements are ignored and their content will not be present in [nlcst][].

