A few language trigram utilities
[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
* What is this?
* When should I use this?
* Install
* Use
* API
  * `clean(value)`
  * `trigrams(value)`
  * `asDictionary(value)`
  * `asTuples(value)`
  * `tuplesAsDictionary(tuples)`
* Types
* Compatibility
* Security
* Related
* Contribute
* License
This package contains a few utilities that can help when working with trigram
(an n-gram where each slice is 3 characters) based natural language detection.
You probably won't need this often, except when you want to create something
like [franc][] but build it on a corpus other than UDHR (the Universal
Declaration of Human Rights).
This package is [ESM only][esm].
In Node.js (version 12.20+, 14.14+, or 16.0+), install with [npm][]:
```sh
npm install trigram-utils
```
In Deno with [Skypack][]:
```js
import * as trigramUtils from 'https://cdn.skypack.dev/trigram-utils@2?dts'
```
In browsers with [Skypack][]:
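```html
<script type="module">
  import * as trigramUtils from 'https://cdn.skypack.dev/trigram-utils@2?min'
</script>
```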
```js
import {clean, trigrams, asDictionary, asTuples, tuplesAsDictionary} from 'trigram-utils'

clean(' t@rololol ') // => 't rololol'

trigrams(' t@rololol ')
// => [' t ', 't r', ' ro', 'rol', 'olo', 'lol', 'olo', 'lol', 'ol ']

asDictionary(' t@rololol ')
// => {'ol ': 1, lol: 2, olo: 2, rol: 1, ' ro': 1, 't r': 1, ' t ': 1}

const tuples = asTuples(' t@rololol ')
// => [
//   ['ol ', 1],
//   ['rol', 1],
//   [' ro', 1],
//   ['t r', 1],
//   [' t ', 1],
//   ['lol', 2],
//   ['olo', 2]
// ]

tuplesAsDictionary(tuples)
// => {olo: 2, lol: 2, ' t ': 1, 't r': 1, ' ro': 1, rol: 1, 'ol ': 1}
```
This package exports the following identifiers: `clean`, `trigrams`,
`asDictionary`, `asTuples`, `tuplesAsDictionary`.
There is no default export.
`clean(value)`: clean a value (`string`).
Strips some punctuation, symbols, and numbers that are useless for language
detection.
Collapses white space, trims, and lowercases.
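The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the package's implementation: `cleanSketch` is a hypothetical name, and the stripped character range (ASCII `!` through `@`, which covers digits and common punctuation) is an assumption.

```js
// Hypothetical sketch of the cleaning steps; not the package's own code.
function cleanSketch(value) {
  // Treat missing input as an empty string.
  if (value === null || value === undefined) return ''

  return String(value)
    // Assumption: "useless" characters are ASCII `!` (0x21) through `@` (0x40),
    // which includes the digits 0-9. Each run becomes a single space.
    .replace(/[\u0021-\u0040]+/g, ' ')
    // Collapse any run of white space into one space.
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase()
}

console.log(cleanSketch(' t@rololol ')) // => 't rololol'
```

Replacing stripped characters with a space, rather than removing them, keeps the words on either side separate.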
`trigrams(value)`: from a value (`string`), make clean, padded trigrams (see
[`n-gram`][n-gram]) (`Array<string>`).
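To show what "padded" means here, the sketch below pads an already cleaned string with one space on each side and slides a 3-character window over it. `paddedTrigrams` is a hypothetical helper written for illustration; the package itself may build on `n-gram` instead.

```js
// Hypothetical sketch: expects a string that has already been cleaned.
function paddedTrigrams(cleaned) {
  // Pad with one space on each side so word boundaries show up in trigrams.
  const padded = ' ' + cleaned + ' '
  const result = []

  // Slide a window of 3 characters over the padded string.
  for (let index = 0; index + 3 <= padded.length; index++) {
    result.push(padded.slice(index, index + 3))
  }

  return result
}

console.log(paddedTrigrams('t rololol'))
// => [' t ', 't r', ' ro', 'rol', 'olo', 'lol', 'olo', 'lol', 'ol ']
```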
`asDictionary(value)`: from a value (`string`), get clean trigrams as a
dictionary (`Record<string, number>`): keys are trigrams, values are occurrence
counts.
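Counting occurrences into such a dictionary might look like the sketch below. `countTrigrams` is a hypothetical name, and it takes a list of trigrams rather than a raw string to keep the example small.

```js
// Hypothetical sketch: count how often each trigram occurs in a list.
function countTrigrams(trigrams) {
  const counts = {}

  for (const trigram of trigrams) {
    // Start at 0 the first time a trigram is seen.
    counts[trigram] = (counts[trigram] || 0) + 1
  }

  return counts
}

console.log(countTrigrams(['olo', 'lol', 'olo', 'lol', 'ol ']))
// => {olo: 2, lol: 2, 'ol ': 1}
```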
`asTuples(value)`: from a value (`string`), get clean trigrams with occurrence
counts as tuples (`Array<[string, number]>`): first index (`0`) is the trigram,
second (`1`) is the occurrence count.
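The tuples in the usage example are ordered from least to most frequent. A sketch that produces the same shape from a dictionary is shown below; `toSortedTuples` is a hypothetical name, and the order among trigrams with equal counts is an assumption that may differ from the package.

```js
// Hypothetical sketch: turn a dictionary into tuples sorted by count, ascending.
function toSortedTuples(dictionary) {
  return Object.entries(dictionary).sort(function (left, right) {
    // Index 1 of each tuple holds the occurrence count.
    return left[1] - right[1]
  })
}

console.log(toSortedTuples({olo: 2, rol: 1, lol: 2}))
// => [['rol', 1], ['olo', 2], ['lol', 2]]
```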
`tuplesAsDictionary(tuples)`: turn trigram tuples (`Array<[string, number]>`)
into a dictionary (`Record<string, number>`).
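Because each tuple is a key and value pair, the conversion can be sketched with the standard `Object.fromEntries`. `tuplesToDictionary` is a hypothetical name, and the key order of the result is an assumption that may differ from what the package returns.

```js
// Hypothetical sketch: build a dictionary from [trigram, count] tuples.
function tuplesToDictionary(tuples) {
  // Each tuple becomes one property: trigram as key, count as value.
  return Object.fromEntries(tuples)
}

console.log(tuplesToDictionary([['rol', 1], ['olo', 2], ['lol', 2]]))
// => {rol: 1, olo: 2, lol: 2}
```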
This package is fully typed with [TypeScript][].
This package is at least compatible with all maintained versions of Node.js.
As of now, that is Node.js 12.20+, 14.14+, and 16.0+.
It also works in Deno and modern browsers.
This package is safe.
* `words/trigrams`: trigrams for 400+ languages based on UDHR
* [`words/n-gram`][n-gram]: get n-grams from text
* [`wooorm/franc`][franc]: natural language detection
Yes please!
See [How to Contribute to Open Source][contribute].
[MIT][license] © [Titus Wormer][author]
[build-badge]: https://github.com/wooorm/trigram-utils/workflows/main/badge.svg
[build]: https://github.com/wooorm/trigram-utils/actions
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/trigram-utils.svg
[coverage]: https://codecov.io/github/wooorm/trigram-utils
[downloads-badge]: https://img.shields.io/npm/dm/trigram-utils.svg
[downloads]: https://www.npmjs.com/package/trigram-utils
[size-badge]: https://img.shields.io/bundlephobia/minzip/trigram-utils.svg
[size]: https://bundlephobia.com/result?p=trigram-utils
[npm]: https://docs.npmjs.com/cli/install
[skypack]: https://www.skypack.dev
[license]: license
[author]: https://wooorm.com
[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
[typescript]: https://www.typescriptlang.org
[contribute]: https://opensource.guide/how-to-contribute/
[n-gram]: https://github.com/words/n-gram
[franc]: https://github.com/wooorm/franc