A parser for files in the Unicode database
`@chr33s/pdf-codepoints` lives in the chr33s/pdf monorepo and is distributed as native ES modules with TypeScript declarations and NodeNext resolution (use Node.js 18+ or a modern bundler).
Produces a giant array of codepoint objects for
every character represented by Unicode, with many properties derived from files in the Unicode
database.
BUILD SCRIPTS ONLY: use in production is not recommended, because the parsers are not optimized for speed, the text files are huge, and the resulting array uses a large amount of memory. To access this data in real-world applications, use a module that has precompiled the data into a compressed form:

* `@chr33s/pdf-unicode-properties`
Install using npm:

```sh
npm install @chr33s/pdf-codepoints
```
Basic usage:
```js
import codepoints from "@chr33s/pdf-codepoints";
```
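Because unassigned code points are `undefined`, code that walks the whole array has to skip those holes. The sketch below assumes that the array is indexed by code point (which the `undefined` entries for unassigned code points imply) and that each object carries a `category` field. `countByCategory` is a hypothetical helper, not part of the package, and a small hand-built array stands in for the real data.

```js
// Hypothetical helper (not exported by the package): count assigned
// code points per Unicode general category.
function countByCategory(codepoints) {
  const counts = new Map();
  for (const cp of codepoints) {
    if (cp === undefined) continue; // skip unassigned code points (array holes)
    counts.set(cp.category, (counts.get(cp.category) ?? 0) + 1);
  }
  return counts;
}

// Hand-made stand-in for the real array; the index is the code point.
const sample = [];
sample[0x41] = { code: 0x41, name: "LATIN CAPITAL LETTER A", category: "Lu" };
sample[0x42] = { code: 0x42, name: "LATIN CAPITAL LETTER B", category: "Lu" };
sample[0x61] = { code: 0x61, name: "LATIN SMALL LETTER A", category: "Ll" };

console.log(countByCategory(sample).get("Lu")); // 2
```

With the real data, the same loop runs over more than a million slots, which is one reason the package is recommended for build scripts only.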
The parser generates data by reading the text files contained in the Unicode Character Database (UCD). By default, it uses the copy of the database bundled with this package. To use a different version of the UCD, use `@chr33s/pdf-codepoints/parser` instead, which accepts an optional path to a directory containing the uncompressed UCD files:
```js
import { parser } from "@chr33s/pdf-codepoints";

// Parse a local copy of the UCD instead of the bundled one
const codepoints = parser("/path/to/UCD");
```
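To make "reading the text files" concrete, here is a minimal sketch of how one line of `UnicodeData.txt` (the core UCD file, with 15 semicolon-separated fields) could be mapped onto a few of the properties listed below. It is an illustration under stated assumptions, not the package's actual parser: `parseUnicodeDataLine` is a hypothetical name, only the simple case mappings from this single file are read (the package may also merge in other UCD files), and range entries such as `<CJK Ideograph, First>` are not expanded.

```js
// Hypothetical sketch, NOT the package's parser: map one UnicodeData.txt
// line to a subset of the properties documented in this README.
function parseUnicodeDataLine(line) {
  const f = line.split(";");
  // Space-separated hex code points -> array of numbers, or null if empty.
  const hexList = (s) =>
    s.trim() === "" ? null : s.trim().split(/\s+/).map((h) => parseInt(h, 16));
  // A leading <tag> in the decomposition field marks a compatibility mapping.
  const tag = /^<[^>]+>\s*/.exec(f[5]);
  const decompBody = tag ? f[5].slice(tag[0].length) : f[5];
  return {
    code: parseInt(f[0], 16),
    name: f[1],
    category: f[2],
    combiningClass: Number(f[3]),
    bidiClass: f[4],
    decomposition: hexList(decompBody) ?? [],
    isCompat: tag !== null,
    numeric: f[8] === "" ? null : f[8], // raw field, may be a fraction like "1/2"
    bidiMirrored: f[9] === "Y",
    unicode1Name: f[10],
    uppercase: hexList(f[12]), // simple (single code point) mappings only
    lowercase: hexList(f[13]),
    titlecase: hexList(f[14]),
  };
}

const a = parseUnicodeDataLine("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;");
console.log(a.name, a.category, a.lowercase); // LATIN CAPITAL LETTER A Lu [ 97 ]
```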
Each element in the generated array is either `undefined` (for unassigned code points) or an object containing the following properties:

* `code` - the code point index
* `name` - character name
* `unicode1Name` - legacy name used by Unicode 1
* `category` - Unicode category
* `block` - the block name this character is a part of
* `script` - the script this character belongs to
* `eastAsianWidth` - the East Asian width for this character
* `combiningClass` - numeric combining class value
* `combiningClassName` - a string name for the combining class
* `bidiClass` - class for the Unicode bidirectional algorithm
* `bidiMirrored` - whether the character is mirrored in the bidi algorithm
* `numeric` - the numeric value for this character
* `uppercase` - an array of code points mapping this character to upper case, if any
* `lowercase` - an array of code points mapping this character to lower case, if any
* `titlecase` - an array of code points mapping this character to title case, if any
* `folded` - an array of code points mapping this character to a folded equivalent, if any
* `caseConditions` - conditions used during case mapping for this character
* `decomposition` - an array of code points that this character decomposes into. Used by the Unicode normalization algorithm.
* `compositions` - a dictionary mapping of compositions for this character
* `isCompat` - whether the decomposition is a compatibility one
* `isExcluded` - whether the character is excluded from composition
* `NFC_QC` - quickcheck value for NFC (0 = YES, 1 = NO, 2 = MAYBE)
* `NFKC_QC` - quickcheck value for NFKC (0 = YES, 1 = NO, 2 = MAYBE)
* `NFD_QC` - quickcheck value for NFD (0 = YES, 1 = NO)
* `NFKD_QC` - quickcheck value for NFKD (0 = YES, 1 = NO)
* `joiningType` - Arabic joining type
* `joiningGroup` - Arabic joining group
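The quick-check values let a normalizer skip work for text that is already normalized. The following sketch follows the quick-check loop described in Unicode Standard Annex #15, reading `combiningClass` and `NFC_QC` (0 = YES, 1 = NO, 2 = MAYBE) from objects shaped like the ones above. `nfcQuickCheck` and the `lookup` callback are hypothetical names, missing entries are assumed to default to combining class 0 and YES, and the four-entry table is hand-made sample data rather than output from the package.

```js
// Quick-check encoding as documented for NFC_QC above.
const YES = 0, NO = 1, MAYBE = 2;

// Sketch of the UAX #15 quick-check loop. lookup(cp) is assumed to return
// the codepoint object for cp, or undefined for unassigned code points.
function nfcQuickCheck(str, lookup) {
  let lastClass = 0;
  let result = YES;
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    const info = lookup(ch.codePointAt(0));
    const cls = info?.combiningClass ?? 0;
    // Combining marks out of canonical order mean the text is not in NFC.
    if (cls !== 0 && lastClass > cls) return NO;
    const qc = info?.NFC_QC ?? YES;
    if (qc === NO) return NO;
    if (qc === MAYBE) result = MAYBE;
    lastClass = cls;
  }
  return result;
}

// Hand-made sample table, indexed by code point like the real array.
const table = [];
table[0x41] = { code: 0x41, combiningClass: 0, NFC_QC: YES };    // A
table[0x301] = { code: 0x301, combiningClass: 230, NFC_QC: MAYBE }; // acute accent
table[0x323] = { code: 0x323, combiningClass: 220, NFC_QC: MAYBE }; // dot below
table[0x340] = { code: 0x340, combiningClass: 230, NFC_QC: NO };    // grave tone mark
const lookup = (cp) => table[cp];

console.log(nfcQuickCheck("A\u0301", lookup)); // 2 (MAYBE)
```

A MAYBE result does not settle the question: the caller still has to run full normalization and compare to know whether the string is in NFC.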
License: MIT