dcat
====
Archive and make discoverable data and links with
schema.org metadata.

Install with:

    npm install dcat

Usage (CLI)
===========
    dcat --help

`dcat adduser`
--------------
Run

    dcat adduser

and follow the prompting wizard.


`dcat publish`
--------------
`dcat` publishes JSON-LD
documents using the dcat.io context. This context
extends schema.org with terms relevant to I/O
and to preserving data integrity (like `filePath` and `Checksum`).

A minimal document has to contain:

- a context (`@context`), set to `https://dcat.io`,
- an id (`@id`), used to uniquely identify things published on
  dcat.io with URLs. All relative URLs are resolved against a base
  (defined in the context as `@base`) of `https://dcat.io`.

e.g.:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc"
    }

To publish this document, create a file named `JSONLD` and, in the
directory containing it, run:

    dcat publish

After publication the document will be available at `https://dcat.io/mydoc`.
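
The way a relative `@id` maps to its final URL can be illustrated with the standard `URL` class available in Node.js. This is only a sketch: `resolveId` is a hypothetical helper, not part of `dcat`, and it assumes the base is `https://dcat.io` as stated above.

```javascript
// Hypothetical helper (not part of dcat): resolves an @id against the
// base stated in this README.
const BASE = 'https://dcat.io';

function resolveId(id) {
  // new URL(relative, base) applies the standard URL resolution rules
  return new URL(id, BASE).href;
}

const minimalDoc = { '@context': 'https://dcat.io', '@id': 'mydoc' };
console.log(resolveId(minimalDoc['@id'])); // https://dcat.io/mydoc
```

An `@id` that is already an absolute URL is returned unchanged by this resolution step.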
Documents can contain any properties from
schema.org or from any other ontology, as long
as the associated `@context` is provided.

If a `version` property is specified
in the document, the document will be versioned: each update
requires a new version value to be published (which prevents
overwriting existing versions).
When appropriate, version numbers SHOULD follow
Semantic Versioning,
e.g.:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc",
      "version": "0.0.1"
    }

After publication this document will be available at
`https://dcat.io/mydoc?version=0.0.1`, whereas the latest version
will always be available at `https://dcat.io/mydoc`.
If the document is versioned following
Semantic Versioning, a range (e.g. `<0.0.1`)
can be specified as `version` (e.g. `https://dcat.io/mydoc?version=<0.0.1`).
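
The rule that a given version value can be published only once can be sketched with an in-memory store. `VersionedStore` is purely illustrative and hypothetical; it makes no claim about how the dcat.io registry is actually implemented.

```javascript
// Illustrative only: an in-memory stand-in for a registry that refuses
// to overwrite an already published version.
class VersionedStore {
  constructor() {
    this.docs = new Map(); // '@id' -> Map(version -> document)
  }

  publish(doc) {
    const id = doc['@id'];
    const version = doc.version;
    if (!this.docs.has(id)) this.docs.set(id, new Map());
    const versions = this.docs.get(id);
    // Only versioned documents are protected against overwriting
    if (version !== undefined && versions.has(version)) {
      throw new Error(id + '@' + version + ' is already published');
    }
    versions.set(version, doc);
    return doc;
  }
}

const store = new VersionedStore();
store.publish({ '@id': 'mydoc', version: '0.0.1' });
store.publish({ '@id': 'mydoc', version: '0.0.2' }); // new version: accepted
try {
  store.publish({ '@id': 'mydoc', version: '0.0.1' }); // same version: rejected
} catch (err) {
  console.log(err.message); // mydoc@0.0.1 is already published
}
```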
Documents can be arbitrarily complex (having multiple nodes) and
sometimes it makes sense to assign a URL to a node so that
it can be referred to. This is achieved by setting `@id` properties
on the desired nodes, e.g.:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc",
      "version": "0.0.1",
      "hasPart": {
        "@id": "mydoc/data",
        "@type": "Dataset",
        "description": "a dataset part of the document"
      }
    }

The whole document can be retrieved at `https://dcat.io/mydoc`
whereas the part can be retrieved at `https://dcat.io/mydoc/data`.
Note: node ids can be any valid URLs _but_ they have to be namespaced
within the top level `@id` (for a document with `"@id": "mydoc"`,
`"@id": "mydoc/arbitrarily/long/pathname"` is
valid whereas `"@id": "part"` is not).
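
The namespacing rule can be expressed as a simple prefix check. The sketch below is an assumption about how such a check could look; `isNamespaced` is a hypothetical name and not a function exposed by `dcat`.

```javascript
// Hypothetical check (not part of dcat): a node @id is accepted only if
// it is the top-level @id itself or lives under it as a path segment.
function isNamespaced(topId, nodeId) {
  return nodeId === topId || nodeId.startsWith(topId + '/');
}

console.log(isNamespaced('mydoc', 'mydoc/arbitrarily/long/pathname')); // true
console.log(isNamespaced('mydoc', 'part')); // false
console.log(isNamespaced('mydoc', 'mydocs/data')); // false
```

Comparing against `topId + '/'` rather than `topId` alone avoids accepting ids such as `mydocs/data` that merely share a textual prefix.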
`dcat` can be used to add _machine readable_ metadata to any
resource already published on the web.
For instance, running:

    dcat init https://github.com/standard-analytics/dcat.git

produces a basic machine readable document:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc",
      "@type": "Code",
      "codeRepository": "https://github.com/standard-analytics/dcat",
      "encoding": {
        "@type": "MediaObject",
        "contentUrl": "https://api.github.com/repos/standard-analytics/dcat/tarball/master",
        "encodingFormat": "application/x-gzip",
        "contentSize": 690980
      }
    }

This document should be extended with more properties, either from
schema.org (such as author, contributor, about, programmingLanguage,
runtime...) or from any other web ontology
(taking care to add the corresponding contexts in the latter case), to improve the
discoverability and reusability of the resource.
Note: in addition to absolute URLs, `dcat` supports
CURIEs for the prefixes defined in the
dcat.io `@context`. Using a CURIE, the previous command simplifies to:

    dcat init github:standard-analytics/dcat.git

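
CURIE expansion amounts to replacing a known prefix with its URL. The sketch below uses an assumed prefix table inferred from the examples in this README (`github` from the two equivalent `dcat init` commands, `ldr` from its description as the prefix for `https://dcat.io`); the authoritative definitions are those of the dcat.io `@context`, and `expandCurie` is a hypothetical helper.

```javascript
// Assumed prefix table, inferred from the examples in this README;
// the authoritative definitions live in the dcat.io @context.
const PREFIXES = {
  ldr: 'https://dcat.io/',
  github: 'https://github.com/'
};

// Hypothetical helper: expands "prefix:reference" into an absolute URL.
function expandCurie(curie) {
  const i = curie.indexOf(':');
  if (i === -1) return curie; // no prefix at all
  const prefix = curie.slice(0, i);
  const reference = curie.slice(i + 1);
  const known = Object.prototype.hasOwnProperty.call(PREFIXES, prefix);
  // Leave absolute URLs ("https://...") and unknown prefixes untouched
  if (reference.startsWith('//') || !known) return curie;
  return PREFIXES[prefix] + reference;
}

console.log(expandCurie('github:standard-analytics/dcat.git'));
// https://github.com/standard-analytics/dcat.git
console.log(expandCurie('ldr:mydoc?version=0.1.1'));
// https://dcat.io/mydoc?version=0.1.1
```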
For all the subclasses of
schema.org/CreativeWork (e.g.
Dataset, Code, SoftwareApplication, Article, Book,
ImageObject, VideoObject, AudioObject, ...), `dcat` allows
publishing raw data from files (datasets, binaries, images, media...)
along with documents.
For instance, if you have a
PDF of a MedicalScholarlyArticle
and an associated Dataset in
CSV, you can run:

    dcat init --main article.pdf::MedicalScholarlyArticle --part data.csv

Note: `::MedicalScholarlyArticle` associates a type
(`@type`) with the resource (`article.pdf`).
This will generate a machine readable document (`JSONLD`) that you can
edit to provide additional metadata:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc",
      "@type": "MedicalScholarlyArticle",
      "encoding": {
        "@type": "MediaObject",
        "filePath": "article.pdf"
      },
      "hasPart": {
        "@type": "Dataset",
        "distribution": {
          "@type": "DataDownload",
          "filePath": "data.csv"
        }
      }
    }

After publication (`dcat publish`) the document will acquire
additional URL properties that can be dereferenced to retrieve the
original raw data:

    {
      "@context": "https://dcat.io",
      "@id": "mydoc",
      "@type": "MedicalScholarlyArticle",
      "encoding": {
        "@type": "MediaObject",
        "filePath": "article.pdf",
        "contentUrl": "http://example.com/article.pdf" //generated URL
      },
      "hasPart": {
        "@type": "Dataset",
        "distribution": {
          "@type": "DataDownload",
          "filePath": "data.csv",
          "contentUrl": "http://example.com/data.csv" //generated URL
        }
      }
    }

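
To make the relation between `filePath` and the generated `contentUrl` concrete, the sketch below walks a published document and lists every node carrying both properties. `listResources` is a hypothetical helper written for this illustration, not a `dcat` function, and the `example.com` URLs are the placeholder values used above.

```javascript
// Hypothetical helper: walks a document and pairs each local filePath
// with the contentUrl generated at publication time.
function listResources(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((item) => listResources(item, found));
  } else if (node !== null && typeof node === 'object') {
    if (node.filePath && node.contentUrl) {
      found.push({ filePath: node.filePath, contentUrl: node.contentUrl });
    }
    // Recurse into nested nodes (encoding, hasPart, distribution, ...)
    Object.keys(node).forEach((key) => listResources(node[key], found));
  }
  return found;
}

const published = {
  '@context': 'https://dcat.io',
  '@id': 'mydoc',
  '@type': 'MedicalScholarlyArticle',
  encoding: {
    '@type': 'MediaObject',
    filePath: 'article.pdf',
    contentUrl: 'http://example.com/article.pdf'
  },
  hasPart: {
    '@type': 'Dataset',
    distribution: {
      '@type': 'DataDownload',
      filePath: 'data.csv',
      contentUrl: 'http://example.com/data.csv'
    }
  }
};

console.log(listResources(published));
```

The result is a flat list of `{ filePath, contentUrl }` pairs, one for `article.pdf` and one for `data.csv`.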
Note: `dcat init` supports globbing, so you can run commands like:

    dcat init --main article.pdf --part *.csv

or repeat `--part` (or the shorter `-p`) if you need more complex matching, e.g.:

    dcat init --main article.pdf -p *.csv -p *.jpg

TODO describe directories

`dcat unpublish`
----------------
To delete a specific version of a document with `"@id": "mydoc"`, run:

    dcat unpublish ldr:mydoc?version=0.1.1

`ldr` is the prefix used for `https://dcat.io` (defined in the
dcat.io `@context`).
To delete all versions of a document with `"@id": "mydoc"`, run:

    dcat unpublish ldr:mydoc


`dcat search`, `dcat show`, `dcat clone`
----------------------------------------
Documents containing keywords,
name or
description properties can be
searched by keyword with `dcat search` followed by a list of
keywords.
For more powerful searches, all data published on
dcat.io are valid
linked data fragments and can be
queried using SPARQL.
`dcat show` followed by a CURIE
displays on
stdout the latest
JSON-LD document corresponding to the CURIE.
Different options (`-e, --expand`, `-f, --flatten`,
`-c, --compact`, `-n, --normalize`) give different
representations of the document. For instance,
`dcat show ldr:mydoc?version=<2.1.0 --normalize`
serializes the latest version lower than 2.1.0 of the document
with `"@id": "mydoc"` to N-Quads
(RDF).
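
Selecting "the latest version lower than 2.1.0" can be sketched as follows. This is a deliberately simplified illustration with hypothetical helper names: it handles only plain `major.minor.patch` versions and a single `<x.y.z` range, and does not claim to reproduce the range handling used by dcat.io.

```javascript
// Simplified sketch: pre-release tags and the other range operators
// defined by Semantic Versioning are not handled.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Returns the highest version strictly below the bound, or null.
function latestBelow(versions, range) {
  const bound = range.replace(/^</, '');
  const candidates = versions
    .filter((v) => compareVersions(v, bound) < 0)
    .sort(compareVersions);
  return candidates.length ? candidates[candidates.length - 1] : null;
}

console.log(latestBelow(['0.9.0', '2.0.3', '2.1.0', '10.0.0'], '<2.1.0')); // 2.0.3
```

Comparing the numeric components rather than the raw strings matters: as strings, `10.0.0` would sort before `2.1.0`.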
`dcat clone` followed by a CURIE downloads the raw data
associated with a document and stores them along with the document on
disk at the paths specified by the `filePath` properties.

`dcat maintainer`
-----------------
Only maintainers of a document can publish or remove versions of a
document. Maintainers of a document can be listed with:

    dcat maintainer ls

Maintainers can give users maintainer rights by running:

    dcat maintainer add

Note: every user of dcat.io has a CURIE of the form `ldr:users/{username}`.

Maintainers can remove maintainer rights by running:

    dcat maintainer rm

API
===
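`dcat` can also be used programmatically.

    var Dcat = require('dcat');
    var dcat = new Dcat();

    // Minimal document: a context, an id and one schema.org property
    var doc = {
      '@context': 'https://dcat.io',
      '@id': 'test',
      name: 'hello world'
    };

    // The callback receives an error (if any) and the published document
    dcat.publish(doc, function(err, cdoc){
      console.log(err, cdoc); //cdoc is compacted
    });
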
See `test/test.js` for more examples.
History
=======
`package.json` -> `datapackage.json` -> `package.jsonld` -> `JSON-LD` + schema.org + hydra + linked data fragment.
Registry
========
By default, `dcat`` uses dcat.io
linked data registry
hosted on cloudant.
Tests
=====
You need a local instance of the linked data registry running on your machine on port 3000. Then, run:

    npm test

License
=======
Apache-2.0.