# trigrams

[![Build][badge-build-image]][badge-build-url]
[![Coverage][badge-coverage-image]][badge-coverage-url]
[![Downloads][badge-downloads-image]][badge-downloads-url]

Trigrams for 500+ languages.

## Contents

* What is this?
* When should I use this?
* Install
* Use
* API
  * `min()`
  * `top()`
* Data
* Compatibility
* Contribute
* Security
* License
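
## What is this?

This package exposes the most common trigrams (sequences of three characters)
for natural languages.
They are based on the most translated copyright-free document on this planet:
the Universal Declaration of Human Rights (UDHR).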

## When should I use this?

Use this package when you are dealing with natural language detection.

## Install

This package is [ESM only][github-gist-esm].
In Node.js (version 18+),
install with [npm][npmjs-install]:

```sh
npm install trigrams
```

In Deno with [esm.sh][esmsh]:

```js
import {min, top} from 'https://esm.sh/trigrams@6'
```

In browsers with [esm.sh][esmsh]:

```html
<script type="module">
  import {min, top} from 'https://esm.sh/trigrams@6?bundle'
</script>
```

## Use

```js
import {min, top} from 'trigrams'

console.log((await min()).nld)
console.log((await top()).pam)
```

Yields:

```js
[ // 300 top trigrams.
  ' ar',
  'eer',
  'tij',
  // …
  'de ',
  'an ',
  'en ' // Most common trigram.
]
```

```js
{ // 300 top trigrams.
  'isa': 6,
  'upa': 6,
  'i k': 6,
  // …
  'ang': 273,
  'ing': 282,
  'ng ': 572 // Most common trigram with how often it was found.
}
```

## API

This package exports the identifiers
[`min`][api-min] and
[`top`][api-top].
It exports no [TypeScript][] types.
There is no default export.

### `min()`

Get top trigrams.

###### Returns

Promise resolving to an object mapping
[UDHR in Unicode][efele-udhr]
codes to arrays containing the top 300 trigrams,
sorted from least occurring to most occurring
(`Promise<Record<string, Array<string>>>`).
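
Because the arrays are sorted from least to most occurring, a trigram's position can double as a rough weight.
The following is a minimal sketch of that idea, not how this package or any particular detector works.
The functions `score` and `guess` are hypothetical names, and the profiles are truncated samples taken from the example output above rather than real 300-trigram arrays.

```javascript
// Hypothetical, truncated profiles shaped like the result of `min()`:
// arrays sorted from least occurring to most occurring.
// Real profiles hold 300 trigrams each.
const profiles = {
  nld: [' ar', 'eer', 'tij', 'de ', 'an ', 'en '],
  pam: ['isa', 'upa', 'i k', 'ang', 'ing', 'ng ']
}

/**
 * Score trigrams against one profile: each match adds its 1-based position,
 * so more common trigrams weigh more.
 */
function score(trigrams, profile) {
  let total = 0
  for (const trigram of trigrams) {
    // `indexOf` returns -1 for unknown trigrams, which adds 0.
    total += profile.indexOf(trigram) + 1
  }
  return total
}

/**
 * Return the code whose profile scores highest,
 * or `undefined` if nothing matches.
 */
function guess(trigrams, profiles) {
  let best
  let bestScore = 0
  for (const [code, profile] of Object.entries(profiles)) {
    const value = score(trigrams, profile)
    if (value > bestScore) {
      best = code
      bestScore = value
    }
  }
  return best
}

// Hypothetical trigrams taken from some input text.
console.log(guess([' de', 'de ', 'en ', 'an '], profiles)) // nld
```

With real data, `profiles` would be the resolved value of `min()`.
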

### `top()`

Get top trigrams mapped to their occurrence counts.

###### Returns

Promise resolving to an object mapping
[UDHR in Unicode][efele-udhr]
codes to objects mapping the top 300 trigrams to occurrence counts
(`Promise<Record<string, Record<string, number>>>`).
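
Raw counts depend on how long each translation is, so it can help to turn them into relative frequencies before comparing languages.
A minimal sketch follows. The helper `toFrequencies` is a hypothetical name and not part of this package, and the sample is the truncated `pam` output shown above.

```javascript
// Hypothetical, truncated sample shaped like `(await top()).pam`.
const counts = {isa: 6, upa: 6, 'i k': 6, ang: 273, ing: 282, 'ng ': 572}

/**
 * Turn occurrence counts into relative frequencies that sum to 1.
 */
function toFrequencies(counts) {
  const total = Object.values(counts).reduce((sum, count) => sum + count, 0)
  const frequencies = {}
  for (const [trigram, count] of Object.entries(counts)) {
    frequencies[trigram] = count / total
  }
  return frequencies
}

const frequencies = toFrequencies(counts)
console.log(frequencies['ng '].toFixed(4)) // 0.4996
```

The printed value only holds for this six-entry sample; with all 300 trigrams the share of `'ng '` would be smaller.
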

## Data

The trigrams are based on the [Unicode][efele-udhr] versions of the
[Universal Declaration of Human Rights][ohchr-udhr].
The files are created from all paragraphs made available by
[`wooorm/udhr`][github-wooorm-udhr] and do not include headings and such.
Before creating trigrams,

* the Unicode characters from `\u0021` to `\u0040` (both including) are
  removed
* one or more white space characters (`\s+`) are replaced with a single space
* alphabetic characters are lower cased (`[A-Z]`)

Additionally,
the input is padded with two spaces on both sides.
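
The steps above can be sketched as follows.
This is a literal reading of the description, not the actual build script, which may differ in details.
The function names `clean` and `trigrams` are hypothetical, and using `toLowerCase()` is an assumption.

```javascript
/**
 * Clean a value following the steps listed above.
 */
function clean(value) {
  return value
    // Remove the characters from U+0021 to U+0040 (punctuation, digits).
    .replace(/[\u0021-\u0040]/g, '')
    // Collapse runs of white space into a single space.
    .replace(/\s+/g, ' ')
    // Assumption: `toLowerCase` also lower cases letters outside `[A-Z]`.
    .toLowerCase()
}

/**
 * List all trigrams in a value, padded with two spaces on both sides.
 */
function trigrams(value) {
  const padded = '  ' + clean(value) + '  '
  const result = []
  for (let index = 0; index + 3 <= padded.length; index++) {
    result.push(padded.slice(index, index + 3))
  }
  return result
}

console.log(trigrams('Hello, World!'))
```

For `'Hello, World!'` the cleaned value is `'hello world'`, which gives 13 trigrams running from `'  h'` to `'d  '`.
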
| Code | Name |
| - | - |
| `007` | Sãotomense |
| `008` | Crioulo, Upper Guinea (008) |
| `009` | Mbundu (009) |
| `010` | Tetun Dili |
| `011` | Umbundu (011) |
| `013` | (Mijisa) |
| `014` | (Maiunan) |
| `016` | (Minjiang, spoken) |
| `017` | (Minjiang, written) |
| `020` | Drung |
| `021` | (Muzzi) |
| `022` | (Klau) |
| `025` | (Bizisa) |
| `026` | (Yeonbyeon) |
| `027` | Gumuz |
| `028` | Kafa |
| `029` | Sidamo |
| `030` | Kituba (2) |
| `032` | South Azerbaijani |
| `041` | Latvian (2) |
| `042` | Spanish (resolution) |
| `043` | Zarma |
| `044` | Mirandese |
| `045` | Maasai |
| `046` | Malay, Papuan |
| `047` | Malay, Ambonese |
| `048` | Minangkabau (2) |
| `049` | Banjar |
| `050` | (Bataknese) |
| `052` | Morisyen |
| `053` | Hausa (2) |
| `054` | Catalan (2) |
| `055` | Jamaican Creole English |
| `056` | Saint Lucian Creole French |
| `057` | Maay |
| `058` | Somali (Af Marka) |
| `059` | North Saami (2) |
| `060` | Inari Saami |
| `061` | Skolt Saami |
| `062` | Swahili (Chimwiini) |
| `063` | Swahili (Kibajuni) |
| `064` | Dabarre |
| `065` | Garre |
| `066` | Jiiddu |
| `067` | Finnish (2) |
| `068` | French (Welche) |
| `069` | Maori (2) |
| `071` | Kabyle |
| `aar` | Afar |
| `abk` | Abkhaz |
| `ace` | Aceh |
| `acu` | Achuar-Shiwiar |
| `acu_1` | Achuar-Shiwiar (1) |
| `ada` | Dangme |
| `ady` | Adyghe |
| `afr` | Afrikaans |
| `agr` | Aguaruna |
| `aii` | Assyrian Neo-Aramaic |
| `ajg` | Aja |
| `aka_akuapem` | Twi (Akuapem) |
| `aka_asante` | Twi (Asante) |
| `aka_fante` | Fante |
| `als` | Albanian, Tosk |
| `alt` | Altai, Southern |
| `amc` | Amahuaca |
| `ame` | Yaneshaʼ |
| `amh` | Amharic |
| `ami` | Amis |
| `amr` | Amarakaeri |
| `arb` | Arabic, Standard |
| `arl` | Arabela |
| `arn` | Mapudungun |
| `ast` | Asturian |
| `auc` | Waorani |
| `auv` | Occitan (Auvergnat) |
| `ayo` | Ayoreo |
| `ayr` | Aymara, Central |
| `azj_cyrl` | Azerbaijani, North (Cyrillic) |
| `azj_latn` | Azerbaijani, North (Latin) |
| `bam` | Bamanankan |
| `ban` | Bali |
| `bax` | Bamun |
| `bba` | Baatonum |
| `bci` | Baoulé |
| `bcl` | Bicolano, Central |
| `bel` | Belarusan |
| `bem` | Bemba |
| `ben` | Bengali |
| `bfa` | Bari |
| `bho` | Bhojpuri |
| `bin` | Edo |
| `bis` | Bislama |
| `blt` | Tai Dam |
| `blu` | Hmong Njua |
| `boa` | Bora |
| `bod` | Tibetan, Central |
| `bos_cyrl` | Bosnian (Cyrillic) |
| `bos_latn` | Bosnian (Latin) |
| `bre` | Breton |
| `btb` | Bulu |
| `buc` | Bushi |
| `bug` | Bugis |
| `bul` | Bulgarian |
| `bvi` | Belanda Viri |
| `cab` | Garifuna |
| `cak` | Kaqchikel, Central |
| `cas` | Tsimané |
| `cat` | Catalan |
| `cbi` | Chachi |
| `cbr` | Cashibo-Cacataibo |
| `cbs` | Cashinahua |
| `cbt` | Chayahuita |
| `cbu` | Candoshi-Shapra |
| `ccx` | Zhuang, Yongbei |
| `ceb` | Cebuano |
| `ces` | Czech |
| `cha` | Chamorro |
| `chj` | Chinantec, Ojitlán |
| `chk` | Chuukese |
| `chr_cased` | Cherokee (cased) |
| `chr_uppercase` | Cherokee (uppercase) |
| `chv` | Chuvash |
| `cic` | Chickasaw |
| `cjk` | Chokwe |
| `cjk_AO` | Chokwe (Angola) |
| `cjs` | Shor |
| `ckb` | Kurdish, Central |
| `cnh` | Chin, Haka |
| `cni` | Asháninka |
| `cnr` | Montenegrin |
| `cof` | Colorado |
| `cos` | Corsican |
| `cot` | Caquinte |
| `cpu` | Ashéninka, Pichis |
| `crh` | Crimean Tatar |
| `crs` | Seselwa Creole French |
| `csa` | Chinantec, Chiltepec |
| `csw` | Cree, Swampy |
| `ctd` | Chin, Tedim |
| `cym` | Welsh |
| `dag` | Dagbani |
| `dan` | Danish |
| `ddn` | Dendi |
| `deu_1901` | German, Standard (1901) |
| `deu_1996` | German, Standard (1996) |
| `dga` | Dagaare, Southern |
| `dip` | Dinka, Northeastern |
| `div` | Maldivian |
| `dyo` | Jola-Fonyi |
| `dyu` | Jula |
| `dzo` | Dzongkha |
| `ell_monotonic` | Greek (monotonic) |
| `ell_polytonic` | Greek (polytonic) |
| `emk` | Maninkakan, Eastern |
| `eml` | Romagnolo |
| `eng` | English |
| `epo` | Esperanto |
| `ese` | Ese Ejja |
| `est` | Estonian |
| `eus` | Basque |
| `eve` | Even |
| `evn` | Evenki |
| `ewe` | Éwé |
| `fao` | Faroese |
| `fij` | Fijian |
| `fin` | Finnish |
| `fkv` | Finnish, Kven |
| `flm` | Chin, Falam |
| `fon` | Fon |
| `fra` | French |
| `fri` | Frisian, Western |
| `fuf` | Pular |
| `fur` | Friulian |
| `fuv` | Fulfulde, Nigerian |
| `fuv2` | Fulfulde, Nigerian (2) |
| `fvr` | Fur |
| `gaa` | Ga |
| `gag` | Gagauz |
| `gax` | Oromo, Borana-Arsi-Guji |
| `gjn` | Gonja |
| `gkp` | Kpelle, Guinea |
| `gla` | Gaelic, Scottish |
| `gld` | Nanai |
| `gle` | Gaelic, Irish |
| `glg` | Galician |
| `glv` | Manx |
| `gnw` | Guarani, Western Bolivian |
| `gsw1` | Alemannisch (Elsassisch) |
| `guc` | Wayuu |
| `gug` | Guaraní, Paraguayan |
| `guj` | Gujarati |
| `guu` | Yanomamö |
| `gyr` | Guarayu |
| `hat_kreyol` | Haitian Creole French (Kreyol) |
| `hat_popular` | Haitian Creole French (Popular) |
| `hau_NE` | Hausa (Niger) |
| `hau_NG` | Hausa (Nigeria) |
| `hau_3` | Hausa |
| `haw` | Hawaiian |
| `hea` | Hmong, Northern Qiandong |
| `heb` | Hebrew |
| `hil` | Hiligaynon |
| `hin` | Hindi |
| `hlt` | Chin, Matu |
| `hms` | Hmong, Southern Qiandong |
| `hna` | Gen |
| `hni` | Hani |
| `hns` | Hindustani, Sarnami |
| `hrv` | Croatian |
| `hsb` | Sorbian, Upper |
| `hsf` | Huastec (Sierra de Otontepec) |
| `hun` | Hungarian |
| `hus` | Huastec (Veracruz) |
| `huu` | Huitoto, Murui |
| `hva` | Huastec (San Luís Potosí) |
| `hye` | Armenian |
| `ibb` | Ibibio |
| `ibo` | Igbo |
| `ido` | Ido |
| `idu` | Idoma |
| `ijs` | Ijo, Southeast |
| `ike` | Inuktitut, Eastern Canadian |
| `ilo` | Ilocano |
| `ina` | Interlingua |
| `ind` | Indonesian |
| `isl` | Icelandic |
| `ita` | Italian |
| `jav` | Javanese (Latin) |
| `jav_java` | Javanese (Javanese) |
| `jiv` | Shuar |
| `jpn` | Japanese |
| `jpn_osaka` | Japanese (Osaka) |
| `jpn_tokyo` | Japanese (Tokyo) |
| `kaa` | Karakalpak |
| `kal` | Inuktitut, Greenlandic |
| `kan` | Kannada |
| `kat` | Georgian |
| `kaz` | Kazakh |
| `kbd` | Kabardian |
| `kbp` | Kabiyé |
| `kde` | Makonde |
| `kdh` | Tem |
| `kea` | Kabuverdianu |
| `kek` | Q'eqchi' |
| `kha` | Khasi |
| `khk` | Mongolian, Halh (Cyrillic) |
| `khm` | Khmer, Central |
| `kin` | Rwanda |
| `kir` | Kirghiz |
| `kjh` | Khakas |
| `kkh_lana` | Khün |
| `kmb` | Mbundu |
| `kmr` | Kurdish, Northern |
| `knc` | Kanuri, Central |
| `kng` | Koongo |
| `kng_AO` | Koongo (Angola) |
| `koi` | Komi-Permyak |
| `koo` | Konjo |
| `kor` | Korean |
| `kqn` | Kaonde |
| `kqs` | Kissi, Northern |
| `kri` | Krio |
| `krl` | Karelian |
| `ktu` | Kituba |
| `kwi` | Awa-Cuaiquer |
| `lad` | Ladino |
| `lao` | Lao |
| `lat` | Latin |
| `lat_1` | Latin (1) |
| `lav` | Latvian |
| `lia` | Limba, West-Central |
| `lij` | Ligurian |
| `lin` | Lingala |
| `lin_tones` | Lingala (tones) |
| `lit` | Lithuanian |
| `lld` | Ladin |
| `lnc` | Occitan (Languedocien) |
| `lns` | Lamnso' |
| `lob` | Lobi |
| `lot` | Otuho |
| `loz` | Lozi |
| `ltz` | Luxembourgeois |
| `lua` | Luba-Kasai |
| `lue` | Luvale |
| `lug` | Ganda |
| `lun` | Lunda |
| `lus` | Mizo |
| `mad` | Madura |
| `mag` | Magahi |
| `mah` | Marshallese |
| `mai` | Maithili |
| `mal` | Malayalam |
| `mal_chillus` | Malayalam |
| `mam` | Mam, Northern |
| `mar` | Marathi |
| `maz` | Mazahua Central |
| `mcd` | Sharanahua |
| `mcf` | Matsés |
| `men` | Mende |
| `mfq` | Moba |
| `mic` | Micmac |
| `min` | Minangkabau |
| `miq` | Mískito |
| `mkd` | Macedonian |
| `mlt` | Maltese |
| `mly_arab` | Malay (Arabic) |
| `mly_latn` | Malay (Latin) |
| `mnw` | Mon |
| `mor` | Moro |
| `mos` | Mòoré |
| `mri` | Maori |
| `mto` | Mixe, Totontepec |
| `mtp` | Wichí Lhamtés Nocten |
| `mxi` | Mozarabic |
| `mxv` | Mixtec, Metlatónoc |
| `mya` | Burmese |
| `mzi` | Mazatec, Ixcatlán |
| `nav` | Navajo |
| `nba` | Nyemba |
| `nbl` | Ndebele |
| `ndo` | Ndonga |
| `nds` | Saxon, Low |
| `nep` | Nepali |
| `nhn` | Nahuatl, Central |
| `nio` | Nganasan |
| `niu` | Niue |
| `niv` | Gilyak |
| `njo` | Naga, Ao |
| `nku` | Kulango, Bouna |
| `nld` | Dutch |
| `nno` | Norwegian, Nynorsk |
| `nob` | Norwegian, Bokmål |
| `not` | Nomatsiguenga |
| `nso` | Sotho, Northern |
| `nya_chechewa` | Nyanja (Chechewa) |
| `nya_chinyanja` | Nyanja (Chinyanja) |
| `nym` | Nyamwezi |
| `nyn` | Nyankore |
| `nzi` | Nzema |
| `oaa` | Orok |
| `oci_1` | Francoprovençal (Fribourg) |
| `oci_2` | Francoprovençal (Savoie) |
| `oci_3` | Francoprovençal (Vaud) |
| `oci_4` | Francoprovençal (Valais) |
| `ojb` | Ojibwa, Northwestern |
| `oki` | Okiek |
| `orh` | Oroqen |
| `oss` | Osetin |
| `ote` | Otomi, Mezquital |
| `pam` | Pampangan |
| `pan` | Panjabi, Eastern |
| `pap` | Papiamentu |
| `pau` | Palauan |
| `pbb` | Páez |
| `pbu` | Pashto, Northern |
| `pcd` | Picard |
| `pcm` | Pidgin, Nigerian |
| `pes_1` | Farsi, Western |
| `pes_2` | Dari |
| `pis` | Pijin |
| `piu` | Pintupi-Luritja |
| `plt` | Malagasy, Plateau |
| `pnb` | Panjabi, Western |
| `pol` | Polish |
| `pon` | Pohnpeian |
| `por_BR` | Portuguese (Brazil) |
| `por_PT` | Portuguese (Portugal) |
| `pov` | Crioulo, Upper Guinea |
| `ppl` | Pipil |
| `prv` | Occitan |
| `quc` | K'iche', Central |
| `qud` | Quechua (Unified Quichua, old Hispanic orthography) |
| `qug` | Quichua, Chimborazo Highland |
| `qul` | Quechua, North Bolivian |
| `quy` | Quechua, Ayacucho |
| `quz` | Quechua, Cusco |
| `qva` | Quechua, Ambo-Pasco |
| `qvc` | Quechua, Cajamarca |
| `qvh` | Quechua, Huamalíes-Dos de Mayo Huánuco |
| `qvm` | Quechua, Margos-Yarowilca-Lauricocha |
| `qvn` | Quechua, North Junín |
| `qwh` | Quechua, Huaylas Ancash |
| `qxa` | Quechua, South Bolivian |
| `qxn` | Quechua, Northern Conchucos Ancash |
| `qxu` | Quechua, Arequipa-La Unión |
| `rar` | Rarotongan |
| `rmn` | Romani, Balkan |
| `rmn_1` | Romani, Balkan (1) |
| `rmy` | Aromanian |
| `roh` | Romansch |
| `roh_puter` | Romansch (Puter) |
| `roh_rumgr` | Romansch (Grischun) |
| `roh_surmiran` | Romansch (Surmiran) |
| `roh_sursilv` | Romansch (Sursilvan) |
| `roh_sutsilv` | Romansch (Sutsilvan) |
| `roh_vallader` | Romansch (Vallader) |
| `ron_1953` | Romanian (1953) |
| `ron_1993` | Romanian (1993) |
| `ron_2006` | Romanian (2006) |
| `run` | Rundi |
| `rus` | Russian |
| `sag` | Sango |
| `sah` | Yakut |
| `san` | Sanskrit |
| `sco` | Scots |
| `sey` | Secoya |
| `shk` | Shilluk |
| `shn` | Shan |
| `shp` | Shipibo-Conibo |
| `sin` | Sinhala |
| `skr` | Seraiki |
| `slk` | Slovak |
| `slr` | Salar |
| `slv` | Slovenian |
| `sme` | North Saami |
| `smo` | Samoan |
| `sna` | Shona |
| `snk` | Soninke |
| `snn` | Siona |
| `som` | Somali |
| `sot` | Sotho, Southern |
| `spa` | Spanish |
| `src` | Sardinian, Logudorese |
| `srp_cyrl` | Serbian (Cyrillic) |
| `srp_latn` | Serbian (Latin) |
| `srq` | Sirionó |
| `srr` | Serer-Sine |
| `ssw` | Swati |
| `suk` | Sukuma |
| `sun` | Sunda |
| `sus` | Susu |
| `swb` | Comorian, Maore |
| `swe` | Swedish |
| `swh` | Swahili |
| `tah` | Tahitian |
| `tam` | Tamil |
| `tam_LK` | Tamil (Sri Lanka) |
| `tat` | Tatar |
| `tbz` | Ditammari |
| `tca` | Ticuna |
| `tel` | Telugu |
| `tem` | Themne |
| `tet` | Tetun |
| `tgk` | Tajiki |
| `tgl` | Tagalog |
| `tha` | Thai |
| `tha2` | Thai (2) |
| `tir` | Tigrigna |
| `tiv` | Tiv |
| `tji` | Tujia, Nothern |
| `tly` | Talysh |
| `tna` | Tacana |
| `tob` | Toba |
| `toi` | Tonga |
| `toj` | Tojolabal |
| `ton` | Tongan |
| `top` | Totonac, Papantla |
| `tpi` | Tok Pisin |
| `trn` | Trinitario |
| `tsn` | Tswana |
| `tso_MZ` | Tsonga (Mozambique) |
| `tso_ZW` | Tsonga (Zimbabwe) |
| `tsz` | Purepecha |
| `tuk_cyrl` | Turkmen (Cyrillic) |
| `tuk_latn` | Turkmen (Latin) |
| `tur` | Turkish |
| `tyv` | Tuva |
| `tzc` | Tzotzil (Chamula) |
| `tzh` | Tzeltal, Oxchuc |
| `tzm` | Tamazight, Central Atlas |
| `udu` | Uduk |
| `uig_arab` | Uyghur (Arabic) |
| `uig_latn` | Uyghur (Latin) |
| `ukr` | Ukrainian |
| `umb` | Umbundu |
| `ura` | Urarina |
| `urd` | Urdu |
| `urd_2` | Urdu (2) |
| `uzn_cyrl` | Uzbek, Northern (Cyrillic) |
| `uzn_latn` | Uzbek, Northern (Latin) |
| `vai` | Vai |
| `vec` | Venetian |
| `ven` | Venda |
| `ven2` | Venda |
| `vep` | Veps |
| `vie` | Vietnamese |
| `vmw` | Makhuwa |
| `war` | Waray-Waray |
| `wln` | Walloon |
| `wol` | Wolof |
| `wwa` | Waama |
| `xho` | Xhosa |
| `xsm` | Kasem |
| `yad` | Yagua |
| `yao` | Yao |
| `yap` | Yapese |
| `ydd` | Yiddish, Eastern |
| `ykg` | Yukaghir, Northern |
| `yor` | Yoruba |
| `yrk` | Nenets |
| `yua` | Maya, Yucatán |
| `yuz` | Yuracare |
| `zam` | Zapotec, Miahuatlán |
| `zdj` | Comorian, Ngazidja |
| `zgh` | Tamazight, Standard Morocan |
| `zro` | Záparo |
| `ztu` | Zapotec, Güilá |
| `zul` | Zulu |

## Compatibility

This package is at least compatible with all maintained versions of Node.js.
As of now,
that is Node.js 18+.
It also works in Deno and modern browsers.

## Contribute

Yes please!
See [How to Contribute to Open Source][opensource-guide-contribute].

## Security

This package is safe.

## License

[MIT][file-license] © [Titus Wormer][wooorm]

[api-min]: #min
[api-top]: #top
[badge-build-image]: https://github.com/wooorm/trigrams/workflows/main/badge.svg
[badge-build-url]: https://github.com/wooorm/trigrams/actions
[badge-coverage-image]: https://img.shields.io/codecov/c/github/wooorm/trigrams.svg
[badge-coverage-url]: https://codecov.io/github/wooorm/trigrams
[badge-downloads-image]: https://img.shields.io/npm/dm/trigrams.svg
[badge-downloads-url]: https://www.npmjs.com/package/trigrams
[efele-udhr]: http://efele.net/udhr/
[esmsh]: https://esm.sh
[file-license]: license
[github-gist-esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
[github-wooorm-udhr]: https://github.com/wooorm/udhr
[npmjs-install]: https://docs.npmjs.com/cli/install
[ohchr-udhr]: https://www.ohchr.org/EN/UDHR/Pages/UDHRIndex.aspx
[opensource-guide-contribute]: https://opensource.guide/how-to-contribute/
[typescript]: https://www.typescriptlang.org
[wooorm]: https://wooorm.com