> An attempt at guessing things and summarizing content, based on raw text. A
> naïve Natural Language Processing toolkit.

`npm install @archivator/content-divinator`
This is, by no means, an actual attempt at Machine Learning. It is simply a few
helpers to automate the maintenance of metadata for imported content.
[repo-url]:
https://github.com/renoirb/archivator/blob/v3.x-dev/packages/content-divinator
'Content Divinator'
[npmjs-package-badge]:
https://img.shields.io/npm/v/%40archivator%2Fcontent-divinator?style=flat-square&logo=appveyor&label=npm&logo=npm
[npmjs-package]: https://www.npmjs.com/package/%40archivator%2Fcontent-divinator
[bundlesize-badge]:
https://img.shields.io/bundlephobia/min/%40archivator%2Fcontent-divinator?style=flat-square
'Bundle Size'
[dependabot-badge]:
https://img.shields.io/librariesio/release/npm/%40archivator%2Fcontent-divinator?style=flat-square&logo=appveyor&logo=dependabot
| Version | Size | Dependencies |
| -------------------------------------------- | ------------------------------------ | ---------------------------------------------------------------------- |
| [![npm][npmjs-package-badge]][npmjs-package] | ![npm bundle size][bundlesize-badge] | ![Libraries.io dependency status for latest release][dependabot-badge] |
_See also_:
- API Extractor code-review signature
- API Documenter generated docs
```js
import { ContentDivinator, utils } from '@archivator/content-divinator'
/**
* Input text to parse.
* Let’s say we want to know the most used words
* so we can guess what it is about
*/
const textContent = `
  How much wood would a woodchuck chuck
  if a woodchuck could chuck wood?
  He would chuck, he would, as much as he could,
  and chuck as much wood as a woodchuck would
  if a woodchuck could chuck wood.
`
/**
* What we want is to count word usage.
*
* Let’s say, we’d want to get the following knowledge
* from the above text.
*
 * We do not need words such as a, if, he; they don't bring any value.
* They’re called "Stop words".
* They're different for each language.
*/
const desiredTextHashMap = {
chuck: 5,
woodchuck: 4,
wood: 4,
}
/**
* Stop Words are words that are mostly noise when
* we try to analyze what it is about.
*
 * For this purpose, a "word" is a sequence of letters and numbers.
*
 * Notice stopWords does not include 'and', which appears once in the text above.
*/
const stopWords = ['a', 'as', 'could', 'he', 'how', 'if', 'much', 'would']
const divinator = new ContentDivinator(stopWords)
/** @type {Map<string, number>} */
const textMap = divinator.words(textContent)
console.log(textMap)
// > Map { 'chuck' => 5, 'woodchuck' => 4, 'wood' => 4 }
// Convert ECMAScript 2015+ Map to an Object.create(null) type of Hash-Map.
/** @type {Object.<string, number>} */
const hashMap = utils.convertMapToRecordHashMap(textMap)
console.log(hashMap)
// > { chuck: 5, woodchuck: 4, wood: 4 }
```
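
Once you have the plain hash-map, ordinary ECMAScript is enough to turn those
counts into a ranked keyword list. The sketch below builds on the `hashMap`
value from the example above; it is not part of the package API.

```js
// Sort the entries by descending count, then keep only the words themselves.
const keywords = Object.entries(hashMap)
  .sort(([, countA], [, countB]) => countB - countA)
  .map(([word]) => word)

console.log(keywords)
// > [ 'chuck', 'woodchuck', 'wood' ]
```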
- Brazilian Portuguese (see the stop-word sketch after this list)
- Turkish
- _English_
- Textlint to find filler words and buzzwords
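
Since stop words differ per language, analyzing text in another language should
only require passing a different list to the constructor. A minimal sketch,
assuming the same `ContentDivinator(stopWords)` signature shown above; the
Brazilian Portuguese stop words below are illustrative and not bundled with the
package.

```js
import { ContentDivinator } from '@archivator/content-divinator'

// A handful of common Brazilian Portuguese stop words (illustrative, not exhaustive).
const portugueseStopWords = ['a', 'o', 'e', 'de', 'do', 'da', 'que', 'um', 'uma', 'se', 'como']

const divinator = new ContentDivinator(portugueseStopWords)

// Map of the remaining, meaningful words and their counts
// (exact tokenization is up to the library).
const textMap = divinator.words('O rato roeu a roupa do rei de Roma')
console.log(textMap)
```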
During a refactor work session on March 5th, [I attempted to make use of
coroutines][coroutine-gist]. I dropped the idea when my [favourite JavaScript
author said _he hadn't seen strongly typed usage of
coroutines_][renoir-axel-tweets].
[renoir-axel-tweets]: https://twitter.com/renoirb/status/1236386606266953731
[coroutine-gist]:
https://gist.github.com/renoirb/e7d344cb88524800c247c6842e4eb550
- https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/co/index.d.ts
- https://gist.github.com/OrionNebula/bd2d4339497a2c05e599d7d24038d290
- https://github.com/danoctavian/node-coroutine-utils
- https://github.com/wowts/coroutine
- http://calculist.org/blog/2011/12/14/why-coroutines-wont-work-on-the-web/
- https://www.bennadel.com/blog/3264-thoughts-on-defining-coroutines-as-class-methods-in-node-js-and-typescript.htm