Normalizing texts before any natural language processing
Installation
Using Yarn:
```
yarn add text-preprocessor
```
Or using NPM:
```
npm i --save text-preprocessor
```
Usage
```javascript
const preprocessor = require('text-preprocessor');
// Wrap the raw text in a TextPreprocessor instance
const text = preprocessor(' thats great! \n \t & but don’t take too long okay? \n bjŏȒk—Ɏó ');
```
preprocessor(text) ⇒ TextPreprocessor
Constructs a TextPreprocessor instance
| Param | Type |
| --- | --- |
| text | String |
Methods
* TextPreprocessor
    * new TextPreprocessor(text)
    * .clean()
    * .unescape()
    * .toLowerCase()
    * .toString()
    * .expandContractions()
    * .killUnicode()
    * .replace(regexp, value)
    * .remove(regexp)
    * .removeTagsAndMentions()
    * .removeSpecialCharachters()
    * .removeURLs()
    * .removeParenthesesContents()
    * .removePunctuation()
    * .normalizeSingleCurlyQuotes()
    * .normalizeDoubleCurlyQuotes()
    * .defaults()
    * .chain()
new TextPreprocessor(text)
Normalizes text before any natural language processing.
| Param | Type |
| --- | --- |
| text | string |
.clean()
Strips extra whitespace from the text, leaving at most one whitespace character between any two other characters.
Kind: instance method of TextPreprocessor
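The whitespace rule described above can be sketched with a single regular expression. This is an illustrative stand-alone helper, not the library's own code: the name `collapseWhitespace` is hypothetical, and trimming the ends of the string is an assumption that the description does not confirm.

```javascript
// Hypothetical helper illustrating the idea behind .clean();
// the library's actual implementation may differ.
function collapseWhitespace(text) {
  // \s matches spaces, tabs and newlines; each run becomes one space.
  // The final trim() is an assumption about edge handling.
  return text.replace(/\s+/g, ' ').trim();
}

console.log(JSON.stringify(collapseWhitespace(' thats great! \n \t but ok ')));
```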
.unescape()
Converts the HTML entities `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&#39;` in the text to their corresponding characters.
Kind: instance method of TextPreprocessor
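As a rough sketch of what unescaping these five entities involves, a lookup table plus one replace pass is enough. The names `HTML_ENTITIES` and `unescapeHtml` are hypothetical and this is not taken from the library's source.

```javascript
// Hypothetical lookup table covering only the five entities listed above.
const HTML_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
};

function unescapeHtml(text) {
  // A single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
  return text.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => HTML_ENTITIES[entity]);
}
```

Any other entity (for example `&nbsp;`) is left untouched by this sketch.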
.toLowerCase()
Converts all the alphabetic characters in a string to lowercase.
Kind: instance method of TextPreprocessor
.toString()
Returns the result of the chain so far as a string.
Kind: instance method of TextPreprocessor
.expandContractions()
Replaces all English contractions with their expanded equivalents, e.g. "don't" is changed to "do not".
Kind: instance method of TextPreprocessor
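Contraction expansion is usually a dictionary lookup. The sketch below assumes a tiny hand-written map (`CONTRACTIONS` is hypothetical and far from complete) and only handles straight apostrophes, so curly quotes would need normalizing first.

```javascript
// Hypothetical, deliberately tiny dictionary; a real one has many more entries.
const CONTRACTIONS = {
  "don't": 'do not',
  "can't": 'cannot',
  "it's": 'it is', // ambiguous in English ("it has"); this sketch picks one reading
  "i'm": 'i am',
};

function expandContractions(text) {
  // Build one alternation from the dictionary keys and match whole words only.
  const pattern = new RegExp(`\\b(${Object.keys(CONTRACTIONS).join('|')})\\b`, 'gi');
  // Lookup is case-insensitive; the replacement is always lowercase.
  return text.replace(pattern, (match) => CONTRACTIONS[match.toLowerCase()]);
}
```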
.killUnicode()
Replaces Latin, Cyrillic, and Greek Unicode characters with English ASCII equivalents, using a hugely ignorant and widely subjective transliteration.
Kind: instance method of TextPreprocessor
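For a feel of what this kind of folding does, here is a much narrower technique: Unicode NFD normalization followed by stripping combining marks. It is not the transliteration table the description refers to, and it only covers accented Latin letters; Cyrillic, Greek, and stroked letters such as Ɏ pass through unchanged. The name `stripDiacritics` is hypothetical.

```javascript
// Swapped-in technique: diacritic stripping via normalization,
// not a full transliteration table.
function stripDiacritics(text) {
  // NFD splits "é" into "e" plus a combining accent (U+0301);
  // the regex then drops every combining mark in U+0300..U+036F.
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
```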
.replace(regexp, value)
Replaces any occurrence of the given expression with the given string.
Kind: instance method of TextPreprocessor
| Param | Type |
| --- | --- |
| regexp | RegExp |
| value | String |
.remove(regexp)
Removes any occurrence of the given expression.
Kind: instance method of TextPreprocessor
| Param | Type |
| --- | --- |
| regexp | RegExp |
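To show how `replace`, `remove`, and `toString` can fit together in a chain, here is a minimal stand-in class. `MiniPreprocessor` is hypothetical and only mimics the documented method names; whether the real library mutates its instance and returns `this` is an assumption.

```javascript
// Hypothetical stand-in that mirrors the documented method names.
class MiniPreprocessor {
  constructor(text) {
    this.text = String(text);
  }

  replace(regexp, value) {
    this.text = this.text.replace(regexp, value);
    return this; // returning the instance is what makes chaining possible
  }

  remove(regexp) {
    // Removing is just replacing with the empty string.
    return this.replace(regexp, '');
  }

  toString() {
    return this.text;
  }
}

const result = new MiniPreprocessor('Order #123 shipped!!!')
  .remove(/\d+/g)
  .replace(/!+/g, '!')
  .toString();
```

In this sketch the regular expression needs the `g` flag for every occurrence to be affected, because `String.prototype.replace` otherwise stops after the first match.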
.removeTagsAndMentions()
Removes #tags and @mentions from the start of the text.
Kind: instance method of TextPreprocessor
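A start-anchored regular expression is one way to get this behaviour. The helper name `removeLeadingTagsAndMentions` and the assumption that tags consist of word characters are illustrative only.

```javascript
// Hypothetical helper; tags and mentions are assumed to be [#@] plus word characters.
function removeLeadingTagsAndMentions(text) {
  // ^ anchors at the start; the group repeats over consecutive #tags/@mentions,
  // so anything appearing later in the text is kept.
  return text.replace(/^(?:\s*[#@]\w+)+\s*/, '');
}
```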
.removeSpecialCharachters()
Removes all special characters.
Kind: instance method of TextPreprocessor
.removeURLs()
Removes URLs and email addresses.
Kind: instance method of TextPreprocessor
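URL and email stripping is typically regex based. The two patterns below are simplified assumptions (`URL_PATTERN` and `EMAIL_PATTERN` are hypothetical names) and will miss many valid forms; they are meant only to illustrate the approach.

```javascript
// Simplified, hypothetical patterns; real-world URL/email matching is messier.
const URL_PATTERN = /\b(?:https?:\/\/|www\.)\S+/gi;
const EMAIL_PATTERN = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g;

function removeUrlsAndEmails(text) {
  // Emails go first, then URLs; the surrounding whitespace is left as-is.
  return text.replace(EMAIL_PATTERN, '').replace(URL_PATTERN, '');
}
```

Because `\S+` is greedy, punctuation that directly follows a URL is removed along with it, and the gaps left behind may need a whitespace cleanup pass afterwards.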
.removeParenthesesContents()
Removes brackets and parentheses along with their contents.
Kind: instance method of TextPreprocessor
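Example
```js
// 'Hello, this is Mike (example)' becomes 'Hello, this is Mike'
preprocessor('Hello, this is Mike (example)').removeParenthesesContents().toString();
```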
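A plain-JavaScript approximation of the same transformation is sketched here. `removeBracketedContents` is a hypothetical helper; it does not handle nested brackets, and swallowing the whitespace in front of each group is an assumption.

```javascript
// Hypothetical helper; nested brackets such as "(a (b))" are not handled.
function removeBracketedContents(text) {
  return text
    // Drop "(...)" or "[...]" together with any whitespace right before it.
    .replace(/\s*(?:\([^)]*\)|\[[^\]]*\])/g, '')
    .trim();
}
```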
.removePunctuation()
Removes punctuation from the end of the text.
Kind: instance method of TextPreprocessor
.normalizeSingleCurlyQuotes()
Coerces single curly quotes to straight ones, e.g. don’t becomes don't.
Kind: instance method of TextPreprocessor
.normalizeDoubleCurlyQuotes()
Coerces double curly quotes to straight ones, e.g. it is «Khorzu” becomes it is "Khorzu".
Kind: instance method of TextPreprocessor
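Both quote normalizations boil down to a character-class replace. The sketch below uses hypothetical helper names, and its character sets are an assumption based on the examples above (which include a guillemet); the library may cover more or fewer characters.

```javascript
// Hypothetical helpers; the exact character sets are assumptions.
function normalizeSingleQuotes(text) {
  // U+2018 and U+2019 (curly single quotes) become the ASCII apostrophe.
  return text.replace(/[\u2018\u2019]/g, "'");
}

function normalizeDoubleQuotes(text) {
  // U+201C, U+201D (curly double quotes) and U+00AB, U+00BB (guillemets) become ".
  return text.replace(/[\u201C\u201D\u00AB\u00BB]/g, '"');
}
```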
.defaults()
Runs the default chain: clean, toLowerCase, unescape, killUnicode and normalizeSingleCurlyQuotes.
Kind: instance method of TextPreprocessor