A fast and lightweight streaming Microdata to RDF parser


A fast and lightweight _streaming_ and 100% _spec-compliant_ Microdata to RDF parser,
with RDFJS representations of RDF terms, quads and triples.
The streaming nature allows triples to be emitted _as soon as possible_, and documents _larger than memory_ to be parsed.
```bash
$ npm install microdata-rdf-streaming-parser
```

or

```bash
$ yarn add microdata-rdf-streaming-parser
```

This package also works out-of-the-box in browsers via tools such as webpack and browserify.
```javascript
import {MicrodataRdfParser} from "microdata-rdf-streaming-parser";
```

_or_

```javascript
const MicrodataRdfParser = require("microdata-rdf-streaming-parser").MicrodataRdfParser;
```

`MicrodataRdfParser` is a Node.js Transform stream
that takes in chunks of HTML annotated with Microdata,
and outputs RDFJS-compliant quads.
You can pipe streams into it,
or write strings into the parser directly.
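
As a rough illustration of the "write strings directly" style, the sketch below feeds string chunks into a parser stream and collects whatever it emits. `collectQuads` is a hypothetical helper name, not part of this package; it only assumes that the parser behaves like a standard Node.js Transform stream that emits `data`, `error` and `end` events.

```javascript
// Hypothetical helper (not part of the package): writes string chunks into a
// Transform-stream parser and resolves with everything the parser emitted.
function collectQuads(parser, chunks) {
  return new Promise((resolve, reject) => {
    const quads = [];
    parser
      .on('data', (quad) => quads.push(quad)) // quads arrive while input is still being written
      .on('error', reject)
      .on('end', () => resolve(quads));

    // Backpressure is ignored here for brevity; that is fine for a few small chunks.
    for (const chunk of chunks) {
      parser.write(chunk);
    }
    parser.end(); // signal that no more input will follow
  });
}
```

With the package installed, this could be called as `collectQuads(new MicrodataRdfParser({ baseIRI: 'http://example.org/' }), ['<div itemscope>', '</div>'])`. For input that already is a stream, such as `fs.createReadStream('index.html')` (a hypothetical file), calling `.pipe(parser)` replaces the manual write loop.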
Optionally, the following parameters can be set in the MicrodataRdfParser constructor:
* `dataFactory`: A custom RDFJS DataFactory to construct terms and triples. _(Default: `require('@rdfjs/data-model')`)_
* `baseIRI`: An initial default base IRI. _(Default: `''`)_
* `defaultGraph`: The default graph for constructing quads. _(Default: `defaultGraph()`)_
* `htmlParseListener`: An optional listener for the internal HTML parse events, should implement `IHtmlParseListener`. _(Default: `null`)_
* `xmlMode`: If the parser should assume strict X(HT)ML documents. _(Default: `false`)_
* `vocabRegistry`: A vocabulary registry to define specific behaviour for given URI prefixes. _(Default: contents of http://www.w3.org/ns/md)_
```javascript
const dataFactory = require('@rdfjs/data-model');

new MicrodataRdfParser({
  dataFactory,
  baseIRI: 'http://example.org/',
  defaultGraph: dataFactory.namedNode('http://example.org/graph'),
  htmlParseListener: new MyHtmlListener(), // your own IHtmlParseListener implementation
  xmlMode: true,
  vocabRegistry: {
    "http://schema.org/": {
      "properties": {
        "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
      }
    },
    "http://microformats.org/profile/hcard": {}
  },
});
```

This tool makes use of the highly performant htmlparser2 library for parsing HTML in a streaming way.
It listens to tag events, and maintains the required tag metadata in a stack-based data structure,
which can then be emitted as triples as soon as possible.
Our algorithm closely resembles the suggested algorithm for transforming Microdata to RDF,
with a few changes to make it work in a streaming way.
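
To make the stack-based idea more concrete, here is a heavily simplified sketch. It is not this library's implementation and covers only a small part of Microdata: it ignores vocabulary resolution, `itemref`, and URL-valued properties, and it represents triples as plain arrays of strings. The class name `TinyMicrodataWalker` and the triple format are assumptions made purely for illustration.

```javascript
// Simplified illustration only: NOT the library's implementation.
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

class TinyMicrodataWalker {
  constructor(emit) {
    this.emit = emit;           // callback receiving [subject, predicate, object]
    this.stack = [];            // one frame per currently open tag
    this.blankNodeCounter = 0;
  }

  onTagOpen(name, attributes) {
    const parent = this.stack[this.stack.length - 1];
    const inherited = parent ? parent.subject : null;
    const frame = { subject: inherited, owner: inherited, property: null, text: '' };

    if ('itemscope' in attributes) {
      // A new item starts here, so its subject is known immediately.
      frame.subject = attributes.itemid || `_:b${this.blankNodeCounter++}`;
      if (attributes.itemtype) {
        this.emit([frame.subject, RDF_TYPE, attributes.itemtype]);
      }
      if (attributes.itemprop && inherited) {
        // Nested item: link it to the enclosing item right away.
        this.emit([inherited, attributes.itemprop, frame.subject]);
      }
    } else if (attributes.itemprop && inherited) {
      // Text-valued property: the value is only complete once the tag closes.
      frame.property = attributes.itemprop;
    }
    this.stack.push(frame);
  }

  onText(data) {
    // Text contributes to every open frame that is waiting for a property value.
    for (const frame of this.stack) {
      if (frame.property) {
        frame.text += data;
      }
    }
  }

  onTagClose() {
    const frame = this.stack.pop();
    if (frame && frame.property) {
      // The literal is written in a JSON-quoted form to keep the sketch short.
      this.emit([frame.owner, frame.property, JSON.stringify(frame.text)]);
    }
  }
}
```

In this sketch, type and nesting triples are emitted when a tag opens, while text-valued properties have to wait until the matching tag closes. Nothing beyond the stack of open tags is kept in memory.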
If you want to make use of a different HTML/XML parser,
you can create a regular instance of MicrodataRdfParser,
and just call the following methods yourself directly:
* `onTagOpen(name: string, attributes: {[s: string]: string})`
* `onText(data: string)`
* `onTagClose()`
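
One way to connect another HTML/XML parser is a small adapter that replays that parser's events onto the three methods listed above. The sketch below is only an assumption about how such glue code might look: `replayEvents` and the `{type, name, attributes, data}` event shape are hypothetical, and only the three method names come from this README.

```javascript
// Hypothetical adapter: forwards events produced by some other HTML/XML parser
// to any object exposing onTagOpen, onText and onTagClose.
function replayEvents(target, events) {
  for (const event of events) {
    switch (event.type) {
      case 'open':
        target.onTagOpen(event.name, event.attributes || {});
        break;
      case 'text':
        target.onText(event.data);
        break;
      case 'close':
        target.onTagClose();
        break;
      default:
        throw new Error(`Unknown event type: ${event.type}`);
    }
  }
}
```

Here `target` would be a regular `MicrodataRdfParser` instance whose `data` events you listen to as usual. When and how the stream should be ended after the last event is left to the caller in this sketch.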
This parser passes all tests from the Microdata to RDF test suite.
This software is written by Ruben Taelman.
This code is released under the MIT license.