A utility for analyzing text to find bigrams, trigrams, and other n-grams.
`npm install methodius`

A utility for analyzing frequency of text chunks on the web.
Supply a bit o' text to the Methodius class, and let it determine your bigrams, trigrams, ngrams, letter-frequencies, word frequencies, bigram relationships, and create ngram trees.

```JavaScript
const { Methodius } = require('methodius');
// or import { Methodius } from 'methodius';

// a template literal keeps the sample text readable
const udhr1 = `
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.`;
const nGrams = new Methodius(udhr1);
const topLetters = nGrams.getTopLetters(10);
const topWords = nGrams.getTopWords(10);
```
#### new Methodius(text)
Parameters
| name | type | Description |
| --- |--- | --- |
| text | string | raw text to be analyzed |
characters to ignore when analyzing text
period, comma, semicolon, colon, bang, question mark, interrobang, Spanish bang+, parens, bracket, brace, single quote, some spaces

`\\.,;:!?‽¡¿⸘()\\[\\]{}<>’'…\"\n\t\r`

#### wordSeparators
characters to ignore AND CONSUME when trying to find words
em-dash, period, comma, semicolon, colon, bang, question mark, interrobang, Spanish bang+, parens, bracket, brace, single quote, space

`—\\.,;:!?‽¡¿⸘()\\[\\]{}<>…"\\s`
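To picture how a separator list like this might be used, here is a minimal sketch that drops the characters into a regular-expression character class and splits on it. The `splitWords` helper is hypothetical and only illustrates the idea; it is an assumption that the list is meant for a character class, and this is not necessarily how Methodius finds words internally.

```javascript
// Assumption: the separator list is meant to sit inside a RegExp character class.
// The string below is the wordSeparators value listed above.
const wordSeparators = '—\\.,;:!?‽¡¿⸘()\\[\\]{}<>…"\\s';

// One or more separator characters in a row count as a single break.
const separatorPattern = new RegExp(`[${wordSeparators}]+`);

// Hypothetical helper: split on separators and drop empty leftovers.
function splitWords(text) {
  return text.split(separatorPattern).filter(Boolean);
}

console.log(splitWords('Free and equal—in dignity, and rights.'));
// [ 'Free', 'and', 'equal', 'in', 'dignity', 'and', 'rights' ]
```

Because the separators are consumed, none of them appear in the resulting words.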
### Static methods
#### hasPunctuation(string)
determines if string contains punctuation
Parameters
| name | type | Description |
| --- |--- | --- |
| string | string | |

Returns
boolean

#### hasSymbols(string)
determines if string contains symbols
Parameters
| name | type | Description |
| --- |--- | --- |
| string | string | |

Returns
boolean

#### hasSpace(string)
determines if a string has a space
Parameters
| name | type | Description |
| --- |--- | --- |
| string | string | |

Returns
boolean

#### sanitizeText(string)
lowercases text and removes diacritics and other characters that would throw off n-gram analysis
Parameters
| name | type | Description |
| --- |--- | --- |
| string | string | |

Returns
string

#### getWords(text)
extracts an array of words from a string
Parameters
| name | type | Description |
| --- |--- | --- |
| text | string | |

Returns
Array

#### getNGrams(text, gramSize)
gets ngrams from text
Parameters
| name | type | Description |
| --- |--- | --- |
| text | string | |
| gramSize | number | default = 2 |

Returns
Array

#### getMeanWordSize(wordArray)
Gets the average size of a word
Parameters
| name | type | Description |
| --- |--- | --- |
| wordArray | string[] | |

Returns
number

#### getMedianWordSize(wordArray)
Gets the median (middle) size of a word
Parameters
| name | type | Description |
| --- |--- | --- |
| wordArray | string[] | |

Returns
number
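As a rough illustration of what the mean and median word size describe, the sketch below computes both from an array of words. The helper names are hypothetical and the handling of empty input and even-length arrays is an assumption, so treat it as a picture of the idea rather than the library's source.

```javascript
// Hypothetical helpers, not the library's implementation.
function meanWordSize(wordArray) {
  if (wordArray.length === 0) return 0; // assumption: empty input yields 0
  const total = wordArray.reduce((sum, word) => sum + word.length, 0);
  return total / wordArray.length;
}

function medianWordSize(wordArray) {
  if (wordArray.length === 0) return 0; // assumption: empty input yields 0
  const sizes = wordArray.map((word) => word.length).sort((a, b) => a - b);
  const middle = Math.floor(sizes.length / 2);
  // assumption: an even count averages the two middle sizes
  return sizes.length % 2 === 1
    ? sizes[middle]
    : (sizes[middle - 1] + sizes[middle]) / 2;
}

const sampleWords = ['act', 'in', 'a', 'spirit'];
console.log(meanWordSize(sampleWords));   // 3
console.log(medianWordSize(sampleWords)); // 2.5
```

The two values differ here because one long word pulls the mean up while the median stays near the shorter words.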

#### getWordNGrams(text, gramSize)
Gets word n-grams (2-word pairs by default) from text.
Note: This doesn't use sentence punctuation as a boundary. Should it?
Parameters
| name | type | Description |
| --- |--- | --- |
| text | string | |
| gramSize | number | default = 2 |

Returns
Array

#### getFrequencyMap(ngramArray)
converts an array of strings into a map of those strings and their number of occurrences
Parameters
| name | type | Description |
| --- |--- | --- |
| ngramArray | Array | |

Returns
Map

#### getPercentMap(frequencyMap)
converts a frequency map into a map of percentages
Parameters
| name | type | Description |
| --- |--- | --- |
| frequencyMap | Map | |

Returns
Map

#### getTopGrams(frequencyMap, limit)
filters a frequency map into only a small subset of the most frequent ones
Parameters
| name | type | Description |
| --- |--- | --- |
| frequencyMap | Map | |
| limit | number | default = 20 |

Returns
Map

#### getIntersection(iterable1, iterable2)
returns an array of items that occur in both iterables
Parameters
| name | type | Description |
| --- |--- | --- |
| iterable1 | Map\|Array | |
| iterable2 | Map\|Array | |

Returns
Array
An array of items that occur in both iterables. If sent a map, it compares the keys.

#### getUnion(iterable1, iterable2)
returns an array that is the union of two iterables
Parameters
| name | type | Description |
| --- |--- | --- |
| iterable1 | Map\|Array | |
| iterable2 | Map\|Array | |

Returns
Array
The union of the items in the two iterables.

#### getDisjunctiveUnion(iterable1, iterable2)
returns an array of arrays of the unique items in either iterable
Parameters
| name | type | Description |
| --- |--- | --- |
| iterable1 | Map\|Array | |
| iterable2 | Map\|Array | |

Returns
Array
An array of arrays of the unique items. The first array holds the items unique to the first parameter, the second array those unique to the second parameter.

#### getComparison(iterable1, iterable2)
returns a map containing various comparisons between two iterables
Parameters
| name | type | Description |
| --- |--- | --- |
| iterable1 | Map\|Array | |
| iterable2 | Map\|Array | |

Returns
Map
A map containing various comparisons between two iterables. Each comparison is some kind of array (see intersection or disjunctiveUnion).

#### getWordPlacementForNGram(ngram, wordsArray)
determines the placement of a single ngram in an array of words
Parameters
| name | type | Description |
| --- |--- | --- |
| ngram | string | |
| wordsArray | Array | |

Returns
Map
a map with the keys 'start', 'middle', and 'end' whose values correspond to how often the provided ngram occurs in each position

#### getWordPlacementForNGrams(ngrams, wordsArray)
determines the placement of ngrams in an array of words
Parameters
| name | type | Description |
| --- |--- | --- |
| ngrams | Array | |
| wordsArray | Array | |

Returns
Map
a map keyed by ngram, where each value is a map containing start, middle, and end

#### getNgramCollections(wordArray, ngramSize)
gets ngrams from an array of words
Parameters
| name | type | Description |
| --- |--- | --- |
| wordArray | Array | an array of words |
| ngramSize | number | default = 2. The size of the ngrams to return |

Returns
Array
An array containing arrays of ngrams; each array corresponds to a word.

#### getNgramSiblings(searchText, ngramCollections, siblingSize)
using a collection returned from getNgramCollections, searches for a string and returns what comes before and after it
Parameters
| name | type | Description |
| --- |--- | --- |
| searchText | string | the string to search for |
| ngramCollections | Array | an array of ngrams, or an nGramCollection |
| siblingSize | number | default = 1. How many siblings to find in front or behind |

Returns
`Map<'before'|'after',Map`
a Map with the keys 'before' and 'after' which contain maps of what comes before and after

Example
```JavaScript
const words = ['revolution', 'nation'];
const ngramCollections = Methodius.getNgramCollections(words, 2);
const onSiblings = Methodius.getNgramSiblings('io', ngramCollections);
/*
new Map([
  ['before', new Map([
    ['ti', 2]
  ])],
  ['after', new Map([
    ['on', 2]
  ])]
])
*/
```

#### getRelatedNgrams(words, ngrams, ngramSize)
Gets the ngrams that will occur before or after other ngrams. Useful for finding patterns of ngrams.
Parameters
| name | type | Description |
| --- |--- | --- |
| words | Array | an array of words to evaluate |
| ngrams | Map | a frequency map of ngrams |
| ngramSize | number | default = 2. the size of the ngram |

Returns
Map
A frequency map of how often ngrams occurred before or after other ngrams

Example
This requires several steps. You'll need an array of words and a frequency map of ngrams.
```JavaScript
// these helpers are static methods, so they are called on the Methodius class
const ngrams = Methodius.getNGrams('the revolution of the nation was on television. It was about pollution and the terrible situation ', 2);
const frequencyMap = Methodius.getFrequencyMap(ngrams);
const topNgrams = Methodius.getTopGrams(frequencyMap, 5);
const words = ['the', 'revolution', 'of', 'the', 'nation', 'was', 'on', 'television', 'it', 'was', 'about', 'pollution', 'and', 'the', 'terrible', 'situation' ];
const relatedNgrams = Methodius.getRelatedNgrams(words, topNgrams, 2, 5);
```

#### getNgramTreeCollection(words)
Gets a nested map of maps that breaks down unique words into their smallest ngrams
Parameters
| name | type | Description |
| --- |--- | --- |
| words | Array | an array of words to evaluate |

Returns
Map
A nested map of maps that breaks down unique words into their smallest ngrams.

### Instance properties
#### sanitizedText
lowercased text with diacritics removed
string
#### letters
an array of letters in the text
Array
#### words
an array of words in the text
Array
#### bigrams
an array of letter bigrams in the text
Array
#### trigrams
an array of letter trigrams in the text
Array
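The bigram and trigram arrays above can be pictured as a sliding window over the letters of a word. The sketch below is a hypothetical stand-alone helper, and it assumes n-grams are taken inside a single word rather than across spaces; the library itself may treat boundaries differently.

```javascript
// Hypothetical helper: every run of `gramSize` consecutive characters in a word.
function letterNGrams(word, gramSize = 2) {
  const grams = [];
  // slide a window of width gramSize from the first letter to the last full window
  for (let i = 0; i + gramSize <= word.length; i += 1) {
    grams.push(word.slice(i, i + gramSize));
  }
  return grams;
}

console.log(letterNGrams('nation', 2)); // [ 'na', 'at', 'ti', 'io', 'on' ]
console.log(letterNGrams('nation', 3)); // [ 'nat', 'ati', 'tio', 'ion' ]
```

A word shorter than the window simply yields an empty array.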
#### uniqueLetters
an array of unique letters in the text
Array
#### uniqueBigrams
an array of unique bigrams in the text
Array
#### uniqueTrigrams
an array of unique trigrams in the text
Array
#### letterPositions
a map of placements of letters within words
Map
#### bigramPositions
a map of placements of bigrams within words
Map
#### trigramPositions
a map of placements of trigrams within words
Map
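The position maps above count where an n-gram falls inside words: at the start, in the middle, or at the end. A minimal sketch of that counting follows. The `wordPlacement` helper is hypothetical, and treating a whole-word match as 'start' is an assumption made only for this illustration.

```javascript
// Hypothetical helper: count where `ngram` lands inside each word.
function wordPlacement(ngram, wordsArray) {
  const placements = new Map([['start', 0], ['middle', 0], ['end', 0]]);
  if (ngram.length === 0) return placements; // nothing to look for

  for (const word of wordsArray) {
    let index = word.indexOf(ngram);
    while (index !== -1) {
      let position = 'middle';
      if (index === 0) {
        position = 'start'; // assumption: a whole-word match counts as 'start'
      } else if (index + ngram.length === word.length) {
        position = 'end';
      }
      placements.set(position, placements.get(position) + 1);
      index = word.indexOf(ngram, index + 1); // keep scanning the same word
    }
  }
  return placements;
}

const placement = wordPlacement('on', ['on', 'nation', 'conscience', 'one']);
console.log(placement.get('start'));  // 2
console.log(placement.get('middle')); // 1
console.log(placement.get('end'));    // 1
```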
#### uniqueWords
an array of unique words in the text
Array
#### letterFrequencies
a map of letter frequencies in the sanitized text
Map
#### bigramFrequencies
a map of bigram frequencies in the sanitized text
Map
#### trigramFrequencies
a map of trigram frequencies in the sanitized text
Map
#### wordFrequencies
a map of word frequencies in the sanitized text
Map
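A frequency map like the ones above pairs each item with the number of times it occurs. The sketch below shows one plain way to build such a map from an array; the `countItems` name is hypothetical and the code is an illustration, not the library's implementation.

```javascript
// Hypothetical helper: count how often each string occurs in an array.
function countItems(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  return counts;
}

// spread the word into single letters, then count them
const letterCounts = countItems([...'brotherhood']);
console.log(letterCounts.get('o')); // 3
console.log(letterCounts.get('r')); // 2
```

The same helper works for bigrams, trigrams, or whole words, since it only needs an array of strings.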
#### letterPercentages
a map of letter percentages in the sanitized text
Map
#### bigramPercentages
a map of bigram percentages in the sanitized text
Map
#### trigramPercentages
a map of trigram percentages in the sanitized text
Map
#### wordPercentages
a map of word percentages in the sanitized text
Map

#### meanWordSize
The average size of a word
number

#### medianWordSize
The middle size of a word
number

#### ngramTreeCollection
A nested map of maps that breaks down unique words into their smallest ngrams.

### Instance methods
#### getLetterNGrams(size)
gets an array of letter ngrams of a customizable size in the text
Parameters
| name | type | Description |
| --- |--- | --- |
| size | number | default = 2. size of the n-gram to return |

Returns
Array

#### getTopLetters(limit)
a map of the most used letters in the text
Parameters
| name | type | Description |
| --- |--- | --- |
| limit | number | default = 20. number of top letters to return |

Returns
Map

#### getTopBigrams(limit)
a map of the most used bigrams in the text
Parameters
| name | type | Description |
| --- |--- | --- |
| limit | number | default = 20. number of top bigrams to return |

Returns
Map

#### getTopTrigrams(limit)
a map of the most used trigrams in the text
Parameters
| name | type | Description |
| --- |--- | --- |
| limit | number | default = 20. number of top trigrams to return |

Returns
Map

#### getTopWords(limit)
a map of the most used words in the text
Parameters
| name | type | Description |
| --- |--- | --- |
| limit | number | default = 20. number of top words to return |

Returns
Map
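The "top" methods above all narrow a frequency map down to its most frequent entries. A minimal sketch of that filtering is shown here. The `topEntries` helper is hypothetical, and keeping ties in their original insertion order is an assumption of this sketch, not documented library behavior.

```javascript
// Hypothetical helper: keep only the `limit` most frequent entries of a Map.
function topEntries(frequencyMap, limit = 20) {
  // sort a copy of the entries by count, highest first (the sort is stable, so ties keep their order)
  const sorted = [...frequencyMap.entries()].sort((a, b) => b[1] - a[1]);
  return new Map(sorted.slice(0, limit));
}

// illustrative counts, not measured from any real text
const wordCounts = new Map([['free', 1], ['and', 4], ['in', 2], ['are', 2]]);
console.log([...topEntries(wordCounts, 2).keys()]); // [ 'and', 'in' ]
```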

#### compareTo(methodius)
Compare this Methodius instance to another
Parameters
| name | type | Description |
| --- |--- | --- |
| methodius | Methodius | another Methodius instance |

Returns
Map
A map of property names and their comparisons (intersection, disjunctiveUnions, etc) for a set of properties
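To make the comparison vocabulary concrete, the sketch below computes an intersection and a disjunctive union for two plain arrays. Both helpers are hypothetical stand-ins written for illustration, and they only handle arrays, so they should not be read as the library's own implementation.

```javascript
// Hypothetical helper: items that occur in both arrays.
function intersection(first, second) {
  const inSecond = new Set(second);
  return [...new Set(first)].filter((item) => inSecond.has(item));
}

// Hypothetical helper: [items only in first, items only in second].
function disjunctiveUnion(first, second) {
  const inFirst = new Set(first);
  const inSecond = new Set(second);
  return [
    [...inFirst].filter((item) => !inSecond.has(item)),
    [...inSecond].filter((item) => !inFirst.has(item)),
  ];
}

// illustrative bigram lists, not taken from real texts
const firstBigrams = ['th', 'he', 'in', 'er'];
const secondBigrams = ['er', 'en', 'in', 'de'];

console.log(intersection(firstBigrams, secondBigrams));     // [ 'in', 'er' ]
console.log(disjunctiveUnion(firstBigrams, secondBigrams)); // [ [ 'th', 'he' ], [ 'en', 'de' ] ]
```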

#### getRelatedTopNgrams(ngramSize, limit)
Gets the ngrams that will occur before or after other ngrams based on what the most frequent ngrams are. Useful for finding patterns of ngrams.
Parameters
| name | type | Description |
| --- |--- | --- |
| ngramSize | number | default = 2. the size of the ngram |
| limit | number | default = 20. the number of top ngrams to use |

Returns
Map