A vCard v4 parser with type safety first
npm install vcard4-ts

vcard4-ts was designed with the following goals:
- Compliant with RFC 6350 and
its extensions
- TypeScript (and type safety) from the ground up
- Avoid mistakes, DRY (Don't Repeat Yourself)
- The data structure definition, created from RFC 6350, contains instructions
for the parser
- The returned data structure is easy to use
- The decisions to be made by the calling code should be as few and as simple
as possible. Everything that can be delegated to the IDE (while writing your
code) and TypeScript compile time should be handled there. E.g., no need to
check whether there is a single or multiple values: if something can occur
multiple times, the item is always in an array.
In addition to RFC6350, the following RFCs are implemented:
- RFC 6474: BIRTHPLACE,
DEATHPLACE, and DEATHDATE properties
- RFC 6715: EXPERTISE,
INTEREST, HOBBY, ORG-DIRECTORY properties and LEVEL, INDEX
parameters
- RFC 8605: CONTACT-URI
property and CC (two-letter country code) parameter
vcard4-ts is compatible with the following RFCs, as it does not impose any
limitation on string-valued parameters and values:
- RFC 6473: The
KIND:application property
- RFC 7852: The
TYPE=main-number parameter
yarn add vcard4-ts or npm i vcard4-ts. No dependencies (except
devDependencies). And only about 10 kB (compressed) will end up in your code,
the rest is tests, alternatives, debugging information, …
Basic usage is straightforward:
```ts
import { parseVCards } from 'vcard4-ts';
import { readFileSync } from 'fs';

const vcf = readFileSync('example.vcf').toString();
const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    console.log('Card found for ' + card.FN[0].value[0]);
  }
} else {
  console.error('No valid vCards in file');
}
```
We can see two basic principles in action:
1. The types are always clear, no expensive run-time testing whether there is
just a single value or there are multiple values. (This is the prime
directive.)
2. There are no null or undefined (aka nullish) values; and any arrays will
always have at least one element. This is the secondary directive.
As a result of these principles, the following rules apply:
1. Mandatory properties (`BEGIN`, `END`, `VERSION`, and `FN`) always exist
and are never `null` or `undefined` ("nullish").
2. Optional properties (all the others defined in RFC 6350) exist only if they
appear in the file. I.e., if they exist, they also have a value and are
never nullish. (However, the strings may still be empty.)
3. To match the prime directive, any property, whether mandatory or optional,
that _may_ appear more than once, is _always_ an array of values.
These rules make software development more predictable and thus faster, less
error-prone:
- TypeScript can verify type correctness.
- Autocompletion and type inference in IDEs such as VSCode/VSCodium work and are
very helpful.
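As a rough illustration of how these rules can surface in TypeScript, consider the following sketch. The names `NonEmptyArray`, `SketchProperty`, `SketchCard`, and `summarize` are hypothetical and heavily simplified for this example; they are not the types exported by vcard4-ts.

```typescript
// Hypothetical, simplified types; not the actual vcard4-ts definitions.
type NonEmptyArray<T> = [T, ...T[]];

interface SketchProperty<V> {
  value: V;
}

interface SketchCard {
  // Mandatory, exactly once: always present, never nullish.
  VERSION: SketchProperty<string>;
  // Mandatory, at least once: always a non-empty array.
  FN: NonEmptyArray<SketchProperty<string>>;
  // Optional, may repeat: either absent or a non-empty array.
  EMAIL?: NonEmptyArray<SketchProperty<string>>;
}

function summarize(card: SketchCard): string {
  // No check needed for FN; index 0 is guaranteed by the type.
  const fullName = card.FN[0].value;
  // One existence check is enough for EMAIL; afterwards index 0 is safe.
  const email = card.EMAIL ? card.EMAIL[0].value : 'no email';
  return `${fullName} <${email}>`;
}

const sampleCard: SketchCard = {
  VERSION: { value: '4.0' },
  FN: [{ value: 'Jane Doe' }],
};
console.log(summarize(sampleCard)); // Jane Doe <no email>
```

With types shaped like this, the compiler rejects an unchecked `card.EMAIL[0]` but accepts `card.FN[0]`, which is the kind of decision the rules above move from run time to compile time.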
This example demonstrates the access to parsing errors and warnings, to
structured information, and non-RFC properties. Explanations are in the
design and reference sections below.
```ts
// `cards` is the result of parseVCards(vcf), as in the basic example above.
if (cards.nags) {
  // There were global problems, e.g. because the file did seem to contain invalid vCards.
  // Those cards can be obtained by passing `keepDefective: true` to parseVCards().
  for (const nag of cards.nags) {
    // Global nags carry no attributes.
    if (nag.isError) {
      console.error(`Global ${nag.key} (${nag.description})`);
    } else {
      console.warn(`Global ${nag.key} (${nag.description})`);
    }
  }
}

// vCards may be absent if no valid card was found, hence the fallback.
for (const card of cards.vCards ?? []) {
  // If you would like element 0 to correspond to the most PREFerred item:
  sortByPREF(card);

  // You're guaranteed to have all these (required) properties,
  // no need to check their existence first. Also, the editor will
  // auto-complete and know the type.
  console.log('Found vCard with version ' + card.VERSION.value);
  console.log('Full name: ' + card.FN[0].value[0]);

  // Maybe some optional (any-cardinality) RFC6350 property is present?
  if (card.EMAIL) {
    // There might be multiple EMAIL property lines, but as the EMAIL field
    // is present, we're guaranteed to have at least one value. See
    // https://netfuture.ch/2021/11/array-thickening-more-can-be-less/
    console.log('Emailable at: ' + card.EMAIL[0].value);
    // Is it known whether it is a work or home address?
    if (card.EMAIL[0].parameters?.TYPE) {
      console.log('It is of type: ' + card.EMAIL[0].parameters.TYPE[0]);
    }
  }

  // The same with a structured any-cardinality property
  if (card.ADR) {
    // All elements of the address, including the locality, can have multiple
    // values. And we still could have multiple addresses (e.g., work and
    // home). We'll just print the first.
    console.log('Living in: ' + card.ADR[0].value.locality[0]);
  }

  // Any property not in the standard (and its extension RFCs)?
  // (Their name should be prefixed with X-)
  if (card.x) {
    for (const [k, v] of Object.entries(card.x)) {
      console.log('Non-RFC6350 property ' + k + ', with ' + JSON.stringify(v));
    }
  }

  // Any problems found while parsing the vCard?
  if (card.nags) {
    console.log(
      'While parsing this card, the following was noticed ' +
        '(and either the problematic part dropped or ignored)',
    );
    for (const nag of card.nags) {
      // attributes is an object, so serialize it for display.
      const details = `${nag.key} (${nag.description}): ${JSON.stringify(nag.attributes)}`;
      if (nag.isError) {
        console.error(details);
      } else {
        console.warn(details);
      }
    }
    // Some of these problems might be unparseable lines. They are archived
    // here.
    if (card.unparseable) {
      console.log('The following unparseable lines were encountered:');
      for (const line of card.unparseable) {
        console.log(line);
      }
    }
  }
}
```
The prime design goal is to avoid mistakes in the code and enable calling code
to avoid mistakes as well. Designing for (type) safety is achieved by
Don't Repeat Yourself, Parse, don't validate,
and Array thickening.
Don't Repeat Yourself
was a basic design principle while developing the module. The description of the
data structure is centralized. The goal was to have only a single authoritative
source of type information, from which both compile-time type information and
runtime parsing instructions would be derived. As TypeScript transpilation
output no longer contains the type information, it was necessary to jump through
hoops. (Luckily, Colin McDonnell's Zod was
a great resource for educating about hoop-jumping.)
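To make the "single authoritative source" idea more concrete, here is a minimal sketch of the general technique: one runtime object from which the compile-time type is derived and which the parser also consults. The schema shape and the names `sketchSchema`, `DerivedCard`, and `addValue` are assumptions made for this illustration; the actual vcard4-ts schema is richer and structured differently.

```typescript
// Hypothetical mini-schema: a single runtime object that also drives the types.
const sketchSchema = {
  VERSION: { kind: 'single' },
  FN: { kind: 'list' },
  NOTE: { kind: 'list' },
} as const;

type SketchSchema = typeof sketchSchema;

// Compile-time view, derived from the schema instead of being written twice.
type DerivedCard = {
  -readonly [K in keyof SketchSchema]?: SketchSchema[K]['kind'] extends 'list'
    ? string[]
    : string;
};

// Runtime view: the parser consults the very same object for each line.
function addValue(card: DerivedCard, name: keyof SketchSchema, value: string): void {
  // The cast keeps this sketch short; it gives up some type precision internally.
  const target = card as Record<string, string | string[] | undefined>;
  if (sketchSchema[name].kind === 'list') {
    const existing = target[name];
    target[name] = Array.isArray(existing) ? [...existing, value] : [value];
  } else {
    target[name] = value;
  }
}

const derived: DerivedCard = {};
addValue(derived, 'FN', 'Jane Doe');
addValue(derived, 'VERSION', '4.0');
console.log(derived.FN, derived.VERSION);
```

Adding a property to `sketchSchema` updates both the derived type and the runtime behavior, so the two cannot drift apart.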
The idea of
parsing instead of validation
was introduced by Alexis King, for the Haskell ecosystem. The gist of it:
Directly parse the source data into the required (type-safe) format, instead of
first parsing it into an (essentially) untyped format and then validating it to
be of the right type. This assures that type safety starts earlier and is
guaranteed to be consistent throughout the entire codebase.
In vcard4-ts, data structures are created and filled type-safe from the start.
Because properties will be added on a line-by-line basis, required properties
cannot be ensured to exist from the start. Therefore, as an exception to this
rule, the existence of required fields is only ensured at the end.
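The final step described here could look roughly like the sketch below: a draft that is filled line by line, followed by a finishing pass that guarantees the required fields. `DraftCard`, `FinishedCard`, and `finishCard` are hypothetical names, and the nags are reduced to plain strings; only the defaults (`4.0` for `VERSION`, the empty string for `FN`) and the `VCARD_MISSING_PROP` key are taken from this document.

```typescript
// Hypothetical, simplified types for this sketch only.
type NonEmptyArray<T> = [T, ...T[]];

interface DraftCard {
  VERSION?: string;
  FN?: NonEmptyArray<string>;
}

interface FinishedCard {
  VERSION: string;
  FN: NonEmptyArray<string>;
  nags?: NonEmptyArray<string>;
}

// Turn a line-by-line draft into a card whose required fields are guaranteed.
function finishCard(draft: DraftCard): FinishedCard {
  const missing: string[] = [];
  if (draft.VERSION === undefined) missing.push('VERSION');
  if (draft.FN === undefined) missing.push('FN');

  const card: FinishedCard = {
    VERSION: draft.VERSION ?? '4.0',
    FN: draft.FN ?? [''],
  };
  // Only attach nags if there is at least one, so the array is never empty.
  const [first, ...rest] = missing;
  if (first !== undefined) {
    card.nags = [
      `VCARD_MISSING_PROP: ${first}`,
      ...rest.map((prop) => `VCARD_MISSING_PROP: ${prop}`),
    ];
  }
  return card;
}

console.log(finishCard({ FN: ['Jane Doe'] }));
```

After `finishCard()`, calling code only ever sees `FinishedCard`, so the temporary looseness of the draft never leaks out of the parser.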
The advantage of always having an array IMHO greatly outweighs the
disadvantages. Calling code can always assume that the contents _are_ an array.
I.e., arrays with just a single value are _never_ flattened (therefore the
name). If you are only interested in one value, just use the one at index 0,
which will always exist. If you want to deal with multiple values, use array
methods such as map() and join(), which you can always use, because it is
always an array. Yes, this results in more time and space spent during the
creation of the data structure.
More importantly, this relieves calling code from performing case distinctions
on every single access. Instead, the existence of the property can be asserted
once and every reference to it later already knows how to deal with it. It is
even possible to combine assertion and access with
optional chaining.
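A short sketch of what this means for calling code, using a hypothetical, simplified `MiniCard` type rather than the real vcard4-ts types:

```typescript
// Hypothetical, simplified card type for this sketch only.
type NonEmptyArray<T> = [T, ...T[]];

interface MiniCard {
  EMAIL?: NonEmptyArray<{ value: string }>;
}

// Assertion and access combined: undefined if EMAIL is absent,
// otherwise the first address, with no single-vs-array distinction.
function firstEmail(card: MiniCard): string | undefined {
  return card.EMAIL?.[0].value;
}

// Array methods are always available, because it is always an array.
function allEmails(card: MiniCard): string {
  return card.EMAIL?.map((entry) => entry.value).join(', ') ?? '';
}

const miniCard: MiniCard = {
  EMAIL: [{ value: 'jane@example.org' }, { value: 'jd@example.com' }],
};
console.log(firstEmail(miniCard)); // jane@example.org
console.log(allEmails(miniCard)); // jane@example.org, jd@example.com
```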
Array thickening
results in less code for the caller. Without it, every access needs a case
distinction, which often also results in less code coverage, i.e., the uncommon
case is not tested. In other words, array thickening turns the general case
(whether common or uncommon) into the only case.
- `parseVCards(vcf: string, keepDefective?: boolean = false): ParsedVCards`:
Parse a string into possibly multiple vCards. Details below.
- `sortByPREF`: Sort properties which exist
multiple times by their preference parameter (1…100; the ones without `PREF`
are sorted last).
- `groupVCard`: Group
properties with group labels into their named group (all non-lowercase names).
Anything without an explicit group label will end up in the `top` group.
(`GroupedVCard` is Record).
Sorting and grouping are separate functions, not methods of an object, to ensure
that their code will only be included if you call them.
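The ordering rule for `PREF` (lowest number first, items without `PREF` last) can be sketched for a single list of property values as follows. This is not the library's implementation: `PrefItem` and `sortItemsByPref` are hypothetical, the sketch returns a sorted copy instead of working on a whole card, and treating `NaN` like a missing `PREF` is an assumption of this example.

```typescript
// Hypothetical item shape for this sketch only.
interface PrefItem {
  value: string;
  parameters?: { PREF?: number };
}

// Returns a sorted copy: lowest PREF first, items without a usable PREF last.
function sortItemsByPref<T extends PrefItem>(items: T[]): T[] {
  const rank = (item: T): number => {
    const pref = item.parameters?.PREF;
    return pref === undefined || Number.isNaN(pref) ? Infinity : pref;
  };
  return [...items].sort((a, b) => {
    const ra = rank(a);
    const rb = rank(b);
    // Compare instead of subtracting, since Infinity - Infinity is NaN.
    return ra === rb ? 0 : ra < rb ? -1 : 1;
  });
}

const phones: PrefItem[] = [
  { value: 'no preference' },
  { value: 'second', parameters: { PREF: 2 } },
  { value: 'first', parameters: { PREF: 1 } },
];
console.log(sortItemsByPref(phones).map((item) => item.value));
```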
If you need sorting _and_ grouping, use the following sequence:
```ts
const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    sortByPREF(card);
    const grouped = groupVCard(card);
    // Process the PREF-sorted groups here
  }
}
```
All vCard properties and parameters in the data structures are uppercase and
dashes have been converted to underscores. This makes them clearly visible and
easily accessible as JavaScript/TypeScript properties, avoiding the
harder-to-type hash/array notation (i.e., `card.SORT_AS` instead of `card['SORT-AS']`).
Lowercase JavaScript/TypeScript properties are maintained by the parser.
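The name mapping described above amounts to a one-line transformation. The helper name `toPropertyKey` is hypothetical and only illustrates the convention; it is not part of the vcard4-ts API.

```typescript
// Hypothetical helper: map a VCF property or parameter name to the key
// convention described above (uppercase, dashes converted to underscores).
function toPropertyKey(vcfName: string): string {
  return vcfName.toUpperCase().replace(/-/g, '_');
}

console.log(toPropertyKey('sort-as')); // SORT_AS
console.log(toPropertyKey('ORG-DIRECTORY')); // ORG_DIRECTORY
```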
- `BEGIN`, `END`, and `VERSION` exist exactly once (cardinality `1` in RFC6350;
required value in TypeScript)
- `FN` (full name) exists at least once (`1*` in RFC6350; required array in
TypeScript)
- `PRODID`, `UID`, `REV`, `KIND`, `N` (name), `BDAY`, `BIRTHPLACE`, `DEATHDATE`,
`DEATHPLACE`, `ANNIVERSARY`, and `GENDER` are optional (`*1` in RFC6350;
optional value in TypeScript)
- All others can occur any number of times (`*` in
RFC6350; optional array in TypeScript)
- `N` is an object with the following properties: `familyNames`, `givenNames`,
`additionalNames`, `honorificPrefixes`, `honorificSuffixes`; each a required
`string[]`. Remember that arrays are guaranteed to always have at least one
element, i.e., an empty `honorificPrefixes` property will be encoded as an
array consisting of an empty string `['']`.
- `ADR` is similar to `N`, but with the following string array fields:
`postOfficeBox`, `extendedAddress`, `streetAddress`, `locality` (city),
`region`, `postalCode`, and `countryName`.
- `GENDER` consists of two strings, a required `sex` and an optional
explanatory `text`. `sex` is required by RFC6350 to be one of `M`, `F`, `O`,
`N`, `U`, or the empty string. However, this is not checked by vcard4-ts.
- `CLIENTPIDMAP` consists of `pidRef`, a number, and a `uri`, a string.
- All other properties' values are mapped to a single `string`, even if they are
defined as more structured types, such as dates or URIs.
Properties can have
(mostly optional) parameters:
- `PREF` is a number. It is not asserted whether it is in the range [1…100]
required by the RFC; non-numeric values are returned as `NaN`.
- `INDEX` is a number. It is not asserted whether it is a strictly positive
integer as mandated by RFC6715; non-numeric values are returned as `NaN`.
- `PID`, `TYPE`, and `SORT_AS` (`SORT-AS` in the VCF) are `string[]`s, again
with a guaranteed minimum array length of 1. (Please note that
the example in the RFC
quotes the enumeration of `TYPE`s, which seems inconsistent with the `TYPE`
definition,
so you may want to apply `split(',')` to all `TYPE` values first.)
- All others are single `string`s.
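Since neither the `PREF` range nor the quoted `TYPE` enumeration is handled by the parser, calling code that cares can add its own small helpers. The following is a sketch under that assumption; `flattenTypes` and `isValidPref` are hypothetical names, not functions provided by vcard4-ts.

```typescript
// Hypothetical helper: split every TYPE value at commas, so that a quoted
// enumeration such as "voice,home" yields one entry per type.
function flattenTypes(types: string[]): string[] {
  const result: string[] = [];
  for (const entry of types) {
    result.push(...entry.split(','));
  }
  return result;
}

// Hypothetical helper: check the range that the RFC requires for PREF.
// NaN (a non-numeric PREF) fails this check as well.
function isValidPref(pref: number): boolean {
  return Number.isInteger(pref) && pref >= 1 && pref <= 100;
}

console.log(flattenTypes(['voice,home', 'work']));
console.log(isValidPref(1), isValidPref(101), isValidPref(NaN));
```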
Any property or parameter whose type is not explicitly given in RFC6350 and the
RFCs that extend it, including those prefixed by X-, is not included at the
same level as the rest of the properties. One reason is that
TypeScript does not really allow default types on object properties
and therefore,
nested index signatures
are recommended for this.
Instead, non-RFC properties and parameters are put into an `x` object property.
The actual value will be a plain, unprocessed `string`. If it has more
structure, you need to extract it yourself, e.g. using
- `scan1DValue()`, which unescapes and splits at the specified `splitChar` (`,`,
as used for `PID` or `TYPE` parameters; or `;`, as used for the `GENDER`
value); or
- `scan2DValue()`, which splits into a `string[][]` at `;` _and_ `,` (used for
`ADR` and `N` values).
For example, the string value of an `X-ABUID` property in card `card` would be
available as `card.x.X_ABUID.value`.
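To illustrate what "unescape and split" means for such raw values, here is a standalone sketch. It is not the library's `scan1DValue()`, whose exact signature is not shown in this document; `splitEscaped` is a hypothetical function that assumes backslash escaping with `\n` or `\N` standing for a newline.

```typescript
// Hypothetical stand-in for an unescape-and-split step; not the library code.
// Splits at unescaped occurrences of splitChar and resolves backslash escapes.
function splitEscaped(raw: string, splitChar: string): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (ch === '\\' && i + 1 < raw.length) {
      // An escaped character never acts as a separator.
      i++;
      const next = raw.charAt(i);
      current += next === 'n' || next === 'N' ? '\n' : next;
    } else if (ch === splitChar) {
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

console.log(splitEscaped('red\\,green,blue', ','));
```

Splitting before unescaping (or using a plain `split(',')`) would break values that legitimately contain an escaped separator, which is why the two steps are combined in one pass.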
Your application can just ignore the errors, if it does not want to bother.
One of the design goals, so obvious that it was not specifically mentioned above,
is that vcard4-ts should be as easy to use as possible. Anyone who ever had to
deal with user-specified input can tell horror stories about what can go wrong.
Last but not least, ensuring
user-specified input fulfills certain requirements is also a matter of security.
Therefore, parseVCards() returns the information in a format as consistent as
possible, minimizing doubt and variability. In general, any line that cannot be
parsed is ignored, and any vCard which does not fulfill minimum criteria is
discarded.
This process is documented in the `nags` property of the returned object(s). The
`nags` property is an array of warnings and errors that occurred during the
processing.
#### Warnings and errors
A _warning_ indicates that even though the input does not fulfill an RFC6350
criterion, the parser believes that it could safely correct the problem and that
the data returned is probably exactly what its originator meant it to be.
An _error_, on the other hand, indicates that some information was dropped, or,
alternatively, that some required information was added. The resulting parsed
data is not the same as originally provided, but it is the best the parser could
do to achieve RFC6350 conformance.
If at least one actual error (not just warnings) is included in the nags,
hasErrors is set to true. Depending on the policy of the calling code,
- data can be accepted as returned by the parser (most lenient),
- data can be refused if `hasErrors` is `true` (it always exists, but hopefully
is `false`), or
- data can be refused if `nags` exists (i.e., any errors or warnings occurred;
the most strict policy).
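The three policies can be expressed as a small decision function. This is a sketch with a hypothetical, minimal result shape (`ParsedLike`) that only models the two fields mentioned above; the policy names are made up for this example.

```typescript
// Hypothetical minimal shape: only the two fields discussed above.
interface ParsedLike {
  hasErrors: boolean;
  nags?: unknown[];
}

type AcceptancePolicy = 'lenient' | 'noErrors' | 'strict';

function isAcceptable(result: ParsedLike, policy: AcceptancePolicy): boolean {
  switch (policy) {
    case 'lenient':
      return true; // accept whatever the parser returns
    case 'noErrors':
      return !result.hasErrors; // warnings are tolerated
    case 'strict':
      return result.nags === undefined; // neither errors nor warnings
  }
}

const warningsOnly: ParsedLike = { hasErrors: false, nags: ['some warning'] };
console.log(isAcceptable(warningsOnly, 'noErrors')); // true
console.log(isAcceptable(warningsOnly, 'strict')); // false
```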
#### Global, local, and mixed nags
_Local_ nags are specific to a vCard and are stored there, alongside the
properties.
Local nags have the following type:
```ts
{
  key: string; // A short string to match against in the code
  description: string; // A longer English-language description to display to the user
  isError: boolean; // Error or warning?
  attributes: {
    property: string; // The property it occurred at (or '', if there was a property name parsing problem)
    parameter?: string; // If the problem occurred while parsing a parameter, this is its name
    line?: string; // The first few characters of the line on which this error occurred
  }
}
```
_Global_ nags are set at the top level of the returned structure, alongside the
vCards field, if it exists. They indicate problems not related to a vCard, or
related to a vCard which was not included because it was considered too bad to
be returned.
Global nags use the same type as local nags above, but without the attributes.
_Mixed_ nags are used to indicate errors affecting an entire vCard (there are no
mixed warnings). If `parseVCards()` detects a major problem with a vCard
(`VCARD_BAD_TYPE` or `VCARD_NOT_BEGIN`), then, by default, this vCard is dropped
and the error, which cannot be stored _in_ the vCard itself, is bubbled up to the
_global_ level. However, if `keepDefective=true` is passed as an optional
argument, these vCards are not dropped and the error is stored in the vCard
itself.
#### The nags
- `FILE_EMPTY`: A global error.
- `FILE_CRLF`: A global warning, that lines did not end in carriage return+line
feed as specified in RFC6350, but just with line feeds. (This only checks the
first line end and is therefore subject to false negatives, if line ends are
not consistent.)
- `VCARD_BAD_TYPE`: A mixed error resulting in a defective card. The `BEGIN` or
`END` property does not have the required `VCARD` value.
- `VCARD_NOT_BEGIN`: A mixed error resulting in a defective card. The first
property of the vCard is not a `BEGIN` property.
- `VCARD_MISSING_PROP`: A local error. A required property is missing and has
been added with a default value. The default for `VERSION` is 4.0; for `FN`,
the empty string.
- `PROP_NAME_EMPTY`: A local error. The property has an empty name.
- `PROP_NAME_EOL`: A local error. The property name is terminated by the end of
line, i.e., colon and value are missing.
- `PROP_DUPLICATE`: A local error. A property which may not appear more than once
has been seen a second time.
- `PARAM_UNCLOSED_QUOTE`: A local error. A parameter had a quoted value, but the
quote was unbalanced.
- `PARAM_MISSING_EQUALS`: A local error. A parameter name was not terminated by
an equals sign.
- `PARAM_INVALID_NUMBER`: A local error. The parameter value should have been a
number but wasn't.
- `PARAM_DUPLICATE`: A local error. A parameter that can only have a single
value was specified more than once.
- `PARAM_UNESCAPED_COMMA`: A local warning. A parameter accepting only a single
value contained an unescaped comma. This may indicate incomplete character
escaping or trying to provide multiple values where they are not allowed.
- `PARAM_BAD_BACKSLASH`: A local warning. In a double-quoted parameter value, a
backslash was found. Escaping in quoted parameter values should be according
to RFC6868, using circumflexes (`^`). This indicates a possible problem in the
input file; the backslash was not treated as a special character.
- `PARAM_BAD_CIRCUMFLEX`: A local warning. In a double-quoted parameter value, a
circumflex (`^`) was found, which was not part of an escape sequence. This
indicates a possible problem in the input file; that circumflex was not
treated as a special character.
- `VALUE_INVALID`: A local error. A property with a required value had a
different value.
- `VALUE_UNESCAPED_COMMA`: A local warning. A property accepting only a single
value contained an unescaped comma. This may indicate old-style (vCard3)
value, e.g. for `PHOTO`, which is considered incomplete character escaping in
vCard4.
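For readers unfamiliar with the circumflex escaping behind `PARAM_BAD_BACKSLASH` and `PARAM_BAD_CIRCUMFLEX`: RFC 6868 defines `^n` (newline), `^^` (a literal circumflex), and `^'` (a double quote). A minimal decoding sketch, independent of the library's internals (the function name `decodeCircumflex` is hypothetical):

```typescript
// Sketch of RFC 6868 decoding for parameter values; not the library code.
// Any other character after a circumflex is left untouched, which is the
// situation the PARAM_BAD_CIRCUMFLEX warning describes.
function decodeCircumflex(paramValue: string): string {
  return paramValue.replace(/\^([n^'])/g, (_match: string, code: string) =>
    code === 'n' ? '\n' : code === '^' ? '^' : '"',
  );
}

console.log(decodeCircumflex("George Herman ^'Babe^' Ruth"));
```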
#### Unparseable lines
If any lines in the current vCard left the parser speechless, they are stored
essentially unmodified in the unparseable array. The only modification is that
wrapped lines have been unwrapped, as this happens before parsing. You most
likely want to ignore those lines, unless you want to re-export the vCard as
faithfully as possible, even if that violates the standard (and might cause
errors for other parsers).
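The unwrapping mentioned here refers to RFC 6350 line folding, where a line break followed by a single space or tab continues the previous line. A simplified sketch of that step is shown below; `unfoldLines` is a hypothetical name, and accepting bare line feeds as well as CRLF is an assumption of this example.

```typescript
// Sketch of vCard line unfolding; not the library's implementation.
function unfoldLines(vcf: string): string[] {
  return vcf
    .replace(/\r?\n[ \t]/g, '') // remove the line break plus one whitespace character
    .split(/\r?\n/)
    .filter((line) => line.length > 0);
}

console.log(unfoldLines('NOTE:This is a long\r\n  note\r\nEND:VCARD\r\n'));
```

Only the first whitespace character after the line break is removed, so a second space (as in the example) remains part of the value.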
- Searching for vcard on NPM results in mostly vCard
generators or converters to/from other formats. Notable exceptions:
- vcard4 is a vCard 4.0 generator which
also includes parsing capabilities.
Trying to create type annotations for `vcard4` turned out to be hard. The
resulting types for the parser would be so lax as not to help when writing a
program processing it further, requiring runtime type verification in the
application. Also, their design decision to flatten arrays with a single
member requires every access to verify the field's structure.
Furthermore, it has some minor issues with its RFC 6350 compliance (lack of
proper property group support or incomplete unescaping rules) and the IETF's
general
Robustness principle
(i.e., not accepting bare newlines).
- vdata-parser is a generic
vCard/vCalendar parser, handling multiple cards in a single file.
Similar to `vcard4` above, it does not seem amenable to reasonably tight
types and mixes elements and arrays. Furthermore, it is unaware of the
expected parameter/property structure and does not handle escaped data.
- The runtime type introspection required for DRY is modeled after
Zod.
Zod was even used for an early prototype. However, an ultra-lightweight,
tailored alternative to Zod was created (clocking in at under 200 bytes
minified/gzipped). Zod would have created overhead (additional dependencies,
bundle size, but especially the amount of code needed to define and query the
schema, while having to touch Zod internals which might change in the future),
while providing little benefit. For example, Zod's `transform` seemed to be
impossible to apply to parsing directly. So, Zod would just have been used
to duplicate work that had already been performed