Barnard59 Linked Data pipelines
barnard59 is a toolkit to automate extract, transform and load (ETL) tasks. Its main focus is on creating Linked Data: it allows you to generate RDF out of non-RDF data sources. In doing so, it follows the standards adopted in the Semantic Web.
More specifically, barnard59 is an engine to execute data pipelines.
A pipeline is a way to extract, transform, and load data (ETL). It allows you to consolidate data from various sources, and publish it as RDF.
Using a pipeline you can:
* download the source data
* validate it
* transform it to linked data
* publish it
barnard59 expects a pipeline to be defined as Linked Data. The supported formats are:
* application/ld+json
* application/n-triples
* text/turtle
* application/rdf+xml
To learn more about building your own pipeline, go here.
To use barnard59, you need a Node.js environment. barnard59 supports all Node.js and npm versions that are actively maintained.
To check if you have Node.js installed, run this command in your terminal: `node -v`
To confirm that you have npm installed, run: `npm -v`
If you need to install (or upgrade) Node.js or npm, go here.
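The two checks can also be combined into one small script. The following is a minimal sketch, assuming a POSIX shell; the `check_tool` helper is a hypothetical name introduced here for illustration and is not part of barnard59.

```shell
# Print whether a command is available on PATH; return non-zero if it is not.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

# Check both prerequisites and keep going even if one is missing.
missing=0
for tool in node npm; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

If the script reports a missing tool, install it before continuing.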
Once Node.js and npm are installed, you can install the dependencies for this project. To do so, run: `npm install`
To execute a pipeline, run: `barnard59 run <pipeline-file>`
The following arguments are available:
* --pipeline
* --output
* --verbose or -v
* --enable-buffer-monitor
* --variable
If the file contains more than one pipeline, you can choose which pipeline to execute. By default, the first pipeline is executed.
To select the pipeline to execute, run: `barnard59 run <pipeline-file> --pipeline <pipeline-iri>`
Some pipelines produce an output stream. By default, this stream is printed in the terminal. To write it to a file, run: `barnard59 run <pipeline-file> --output <output-file>`
Pipelines can represent complex data manipulation. To better understand what is happening, add these options:
* --verbose or -v - to enable diagnostic console output
* --enable-buffer-monitor - to enable histogram of buffer usage
To pass a variable to the pipeline, use the `--variable` argument: `barnard59 run <pipeline-file> --variable <name>=<value>`
For example, to pass example.txt as a filename to pipeline.json, run: `barnard59 run pipeline.json --variable filename=example.txt`
In a CI/CD environment, you might want to define sensitive arguments like usernames and passwords as environment variables. To pass them to the pipeline, run: `barnard59 run pipeline.json --variable password=$PASSWORD`
where $PASSWORD is an environment variable.
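The command above expands $PASSWORD unquoted, so in a POSIX shell a value containing spaces would be split into several arguments. The sketch below illustrates the difference. The password value is hypothetical, and `count_args` is an illustration-only helper (not part of barnard59) that reports how many arguments it received.

```shell
# Hypothetical secret, set here only for the demonstration.
PASSWORD='s3cret with spaces'

# Illustration-only helper (not part of barnard59): prints its argument count.
count_args() {
  echo "$#"
}

# Unquoted expansion is split on whitespace into extra arguments.
unquoted=$(count_args --variable password=$PASSWORD)
# Quoting keeps name=value together as a single argument.
quoted=$(count_args --variable "password=$PASSWORD")

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 4, quoted: 2"
```

With quoting, the call becomes `barnard59 run pipeline.json --variable "password=$PASSWORD"`, which is the safer form when the value is not known in advance.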
To pass multiple arguments to the pipeline, use the `--variable` argument multiple times: `barnard59 run <pipeline-file> --variable <name1>=<value1> --variable <name2>=<value2>`
For example, to pass $USER as a user, and $PASSWORD as a password to pipeline.json, run: `barnard59 run pipeline.json --variable user=$USER --variable password=$PASSWORD`
This pipeline parses a CSV file. It uses the CSV on the Web format. The file examples/parse-csvw.ttl contains two pipelines:
* `http://example.org/pipeline/parseCsvw` - the main pipeline
* the metadata pipeline
To execute the main pipeline, run:
```
node bin/barnard59.js run examples/parse-csvw.ttl --pipeline=http://example.org/pipeline/parseCsvw
```
Or, if you'd like to use npx, you can run:
```
npx barnard59 run examples/parse-csvw.ttl --pipeline=http://example.org/pipeline/parseCsvw
```
This pipeline downloads and transforms a JSON file. It:
* fetches a JSON document from the predefined URL
* transforms the JSON into a JSON-LD structure, using a map step and a context variable in the JSON file
* generates RDF-JS quads from the JSON structure
* serializes the quads into N-Triples
For comparison, the same pipeline is defined in JSON-LD and Turtle format.
To execute the JSON-LD pipeline, run:
```
node bin/barnard59.js run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet
```
With npx:
```
npx barnard59 run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet
```
The --pipeline parameter is required, as the file contains two pipelines.
To execute the Turtle pipeline, run:
```
node bin/barnard59.js run examples/fetch-json-to-ntriples.ttl --pipeline http://example.org/pipeline/utc
```
Or:
```
npx barnard59 run examples/fetch-json-to-ntriples.ttl --pipeline http://example.org/pipeline/utc
```
By default, the pipeline stream is written to stdout. Use the `--output` parameter to write the output to a file:
```
node bin/barnard59.js run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet --output test.nt
```
Or:
```
npx barnard59 run examples/fetch-json-to-ntriples.json --pipeline http://example.org/pipeline/cet --output test.nt
```
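Once the output has been written to a file such as test.nt, a quick sanity check is to count the triples it contains. N-Triples stores one triple per line, so counting the lines that are neither blank nor comments is enough. The sketch below assumes a POSIX shell with `grep` and `mktemp`. The `count_triples` helper and the two-triple sample file are hypothetical stand-ins for real pipeline output.

```shell
# Count the triples in an N-Triples file: every line that is neither
# blank nor a comment holds exactly one triple.
count_triples() {
  grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Hypothetical sample standing in for real pipeline output such as test.nt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<http://example.org/s> <http://example.org/p> "one" .
# a comment line
<http://example.org/s> <http://example.org/p> "two" .
EOF

triples=$(count_triples "$sample")
echo "triples: $triples"   # prints "triples: 2"
rm -f "$sample"
```

To check real output, call `count_triples test.nt` after the pipeline run has finished.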