extract-text-html

Extract text from HTML. By default, content inside metadata tags such as `script`
and `style` is excluded, runs of whitespace are reduced to a single space, and
whitespace is trimmed from the start and end. Set `preserveWhitespace` to `true`
to disable the whitespace handling. Optionally, replace tags with text.
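
To make the default whitespace handling concrete, here is a minimal sketch of the kind of normalization described above. The helper name `normalizeWhitespace` is hypothetical and not part of this package, and the sketch assumes that any run of whitespace (including newlines) counts as "multiple spaces".

```typescript
// Hypothetical helper for illustration only; this is not the library's code.
// Assumption: any run of whitespace (spaces, tabs, newlines) is collapsed.
function normalizeWhitespace(text: string, preserveWhitespace: boolean = false): string {
    if (preserveWhitespace) {
        // Leave the text exactly as it was extracted
        return text
    }
    // Collapse whitespace runs to one space, then trim both ends
    return text.replace(/\s+/g, ' ').trim()
}

const collapsedText = normalizeWhitespace('  Some Title \n\n   Some text  ')
// collapsedText === 'Some Title Some text'

const preservedText = normalizeWhitespace('  Some Title \n', true)
// preservedText === '  Some Title \n'
```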

Offers a much nicer out-of-the-box experience compared to striptags.
See comparison here.

Single dependency on htmlparser2

```typescript
export interface Replacement {
    /** Tag name to match (without brackets) */
    matchTag: string
    /** Text to replace the tag with */
    text: string
    /** Is the tag self-closing? */
    isSelfClosing?: boolean
}

export interface Options {
    /** Exclude content from the set of tags. Defaults to all HTML metadata tags. */
    excludeContentFromTags?: string[]
    /** Whitespace is trimmed by default. Set this to true to preserve whitespace. */
    preserveWhitespace?: boolean
    /** Replace a tag with some text. Flag self-closing tags with isSelfClosing: true. */
    replacements?: Replacement[]
}

// Content from the following tags is excluded by default
export const defaultExcludeContentFromTags = [
    'head',
    'base',
    'link',
    'meta',
    'noscript',
    'script',
    'style',
    'title',
]
```
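
If you want to exclude more tags without losing the defaults, one option is to build a combined list. The sketch below assumes that a custom `excludeContentFromTags` replaces the default list rather than extending it, which the option comment suggests but does not state. The helper `withExtraExcludedTags` is hypothetical, and the default list is copied inline so the snippet runs on its own.

```typescript
// Copied from the default list so this sketch is self-contained.
// In real code you would import defaultExcludeContentFromTags from the package.
const defaultExcludedTags: string[] = [
    'head', 'base', 'link', 'meta', 'noscript', 'script', 'style', 'title',
]

// Hypothetical helper: append extra tags to the defaults and drop duplicates.
function withExtraExcludedTags(extraTags: string[]): string[] {
    const combined = defaultExcludedTags.concat(extraTags)
    // Keep only the first occurrence of each tag name
    return combined.filter((tag, index) => combined.indexOf(tag) === index)
}

const combinedExcludedTags = withExtraExcludedTags(['nav', 'footer', 'script'])
// combinedExcludedTags holds the eight defaults followed by 'nav' and 'footer'
```

The resulting array could then be passed as the `excludeContentFromTags` option.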

Example

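```typescript
import { extractText } from 'extract-text-html'

// Illustrative markup: the <title> sits inside <head>, so its text is excluded by default
const html = `
<html>
    <head>
        <title>extracttext - npm search</title>
    </head>
    <body>
        <h1>Some Title</h1>
        <p>Some text</p>
    </body>
</html>
`

const extracted = extractText(html)
// Some Title Some text
```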


Replacements example usage

```typescript
const html = '<b>bold text</b><br />some text<br /><br />more text'

const extracted = extractText(html, {
    preserveWhitespace: true,
    replacements: [
        { matchTag: 'br', text: '\n', isSelfClosing: true },
        { matchTag: 'b', text: '__' },
    ],
})
/*
__bold text__
some text

more text
*/
```
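
The output above suggests that a regular replacement emits its text at both the opening and the closing tag, while a self-closing replacement emits it once. The toy function below sketches that behavior with a regular expression. It is an assumption-laden illustration, not how the library works internally (the library parses with htmlparser2), and `applyReplacements` is a hypothetical name.

```typescript
interface TagReplacement {
    matchTag: string
    text: string
    isSelfClosing?: boolean
}

// Toy tag scanner for illustration only; a regex is not a real HTML parser.
function applyReplacements(input: string, replacements: TagReplacement[]): string {
    const tagPattern = /<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g
    return input.replace(tagPattern, (_match: string, slash: string, name: string) => {
        const replacement = replacements.filter((r) => r.matchTag === name.toLowerCase())[0]
        // Tags without a replacement are simply dropped
        if (!replacement) return ''
        // A self-closing tag has no closing counterpart, so ignore a stray one
        if (replacement.isSelfClosing && slash === '/') return ''
        // Otherwise emit the text at the opening tag and again at the closing tag
        return replacement.text
    })
}

const replacedOutput = applyReplacements(
    '<b>bold text</b><br />some text<br /><br />more text',
    [
        { matchTag: 'br', text: '\n', isSelfClosing: true },
        { matchTag: 'b', text: '__' },
    ],
)
// replacedOutput === '__bold text__\nsome text\n\nmore text'
```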