Extract text from HTML.

Install with `npm install extract-text-html`.

By default, content inside metadata tags such as `script` and `style` is excluded, multiple spaces are reduced to a single space, and whitespace is trimmed from the start and end. Set `preserveWhitespace` to `true` to disable the whitespace handling. Optionally, replace tags with text.
Offers a much nicer out-of-the-box experience than striptags. See comparison here.

Single dependency on htmlparser2.
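
The default whitespace handling described above can be pictured with a small, dependency-free sketch. `normalizeWhitespace` is a hypothetical helper written for illustration, not part of the package's API, and it assumes that newlines and tabs are collapsed along with spaces.

```typescript
// Hypothetical helper for illustration only; not exported by extract-text-html.
function normalizeWhitespace(text: string, preserveWhitespace = false): string {
  // With preserveWhitespace set, the text is returned untouched.
  if (preserveWhitespace) return text
  // Assumption: every run of whitespace (spaces, tabs, newlines) becomes one space,
  // then leading and trailing whitespace is trimmed.
  return text.replace(/\s+/g, ' ').trim()
}

const raw = '  Some   Title \n  Some text  '
console.log(normalizeWhitespace(raw)) // Some Title Some text
console.log(normalizeWhitespace(raw, true) === raw) // true
```

Keeping the opt-out as a single boolean mirrors the `preserveWhitespace` option: callers either get cleaned-up text or the raw text, with nothing in between.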
```typescript
export interface Replacement {
  /** Tag name to match (without brackets) */
  matchTag: string
  /** Text to replace the tag with */
  text: string
  /** Is the tag self-closing? */
  isSelfClosing?: boolean
}

export interface Options {
  /** Exclude content from the set of tags. Defaults to all HTML metadata tags. */
  excludeContentFromTags?: string[]
  /** Whitespace is trimmed by default. Set this to true to preserve whitespace. */
  preserveWhitespace?: boolean
  /** Replace a tag with some text. Flag self-closing tags with isSelfClosing: true. */
  replacements?: Replacement[]
}

// Content from the following tags is excluded by default
export const defaultExcludeContentFromTags = [
  'head',
  'base',
  'link',
  'meta',
  'noscript',
  'script',
  'style',
  'title',
]
```
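
As a sketch of how these options might be combined, the snippet below builds an `Options` value that extends the default exclusion list. The types and the default list are copied locally so the snippet compiles on its own. The extra tags `nav` and `footer` are arbitrary examples, and the idea that a custom `excludeContentFromTags` replaces the defaults rather than merging with them is an assumption, which is why the defaults are spread in explicitly.

```typescript
// Local copies of the types shown above, so this snippet stands alone.
interface Replacement {
  matchTag: string
  text: string
  isSelfClosing?: boolean
}
interface Options {
  excludeContentFromTags?: string[]
  preserveWhitespace?: boolean
  replacements?: Replacement[]
}

// Same values as the package's documented default list.
const defaultExcludeContentFromTags = [
  'head', 'base', 'link', 'meta', 'noscript', 'script', 'style', 'title',
]

// Assumption: a custom list overrides the default, so the defaults are kept
// explicitly and two illustrative tags are appended.
const options: Options = {
  excludeContentFromTags: [...defaultExcludeContentFromTags, 'nav', 'footer'],
  preserveWhitespace: true,
  replacements: [{ matchTag: 'br', text: '\n', isSelfClosing: true }],
}

console.log(options.excludeContentFromTags)
```

With the package installed, a value like this would be passed as the second argument: `extractText(html, options)`.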
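```typescript
import { extractText } from 'extract-text-html'

// Example input, reconstructed to match the output below.
// The <title> sits inside <head>, so its content is excluded.
const html =
  '<html><head><title>Page title</title></head><body><h1>Some Title</h1> <p>Some text</p></body></html>'
const extracted = extractText(html)
// Some Title Some text
```

Replacements example usage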
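```typescript
import { extractText } from 'extract-text-html'

// Example input, reconstructed to match the output below.
const html = '<b>bold text</b><br/>some text<span>more text</span>'
const extracted = extractText(html, {
  preserveWhitespace: true,
  replacements: [
    { matchTag: 'br', text: '\n', isSelfClosing: true },
    { matchTag: 'b', text: '__' },
  ],
})
/*
__bold text__
some textmore text
*/
```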
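
The output above suggests that a regular replacement emits its text at both the opening and the closing tag, while one flagged `isSelfClosing` emits it once. The sketch below imitates that behavior with a simple regular expression. It is a deliberately simplified illustration, not how the package works internally (the package parses with htmlparser2), and `applyReplacements` is a hypothetical name.

```typescript
interface Replacement {
  matchTag: string
  text: string
  isSelfClosing?: boolean
}

// Hypothetical, regex-based illustration; real HTML needs a proper parser.
function applyReplacements(html: string, replacements: Replacement[]): string {
  // Index the replacements by tag name for quick lookup.
  const byTag: Record<string, Replacement> = {}
  for (const r of replacements) byTag[r.matchTag] = r

  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g, (tag: string, name: string) => {
    const r = byTag[name.toLowerCase()]
    // Tags without a replacement are dropped.
    if (!r) return ''
    const isClosing = tag.charAt(1) === '/'
    // Self-closing tags emit their text once, never for a stray closing tag.
    if (r.isSelfClosing && isClosing) return ''
    // Other tags emit the text at both the opening and the closing tag.
    return r.text
  })
}

const result = applyReplacements('<b>bold text</b><br/>some text<span>more text</span>', [
  { matchTag: 'br', text: '\n', isSelfClosing: true },
  { matchTag: 'b', text: '__' },
])
console.log(result)
/*
__bold text__
some textmore text
*/
```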