A streaming xml editor.
`npm install xml-stream-editor`

Library to edit XML files in a streaming manner. Inspired by
xml-stream, but it 1. allows using
current Node versions, and 2. provides a higher-level, easier-to-use API.
The main benefit of xml-stream-editor over most other existing
(and otherwise excellent) libraries for editing XML is that xml-stream-editor
allows you to modify XML without needing to buffer the XML files in memory.
For small to mid-sized XML files, buffering is fine. But when editing very large
files (e.g., multi-GB files), buffering can be a problem or an absolute blocker.
xml-stream-editor is designed to work with Node's stream system:
the editor is a subclass of stream.Transform,
so it can be used with the streams promises API
and stdlib interfaces like stream.pipeline.
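To illustrate why subclassing stream.Transform matters, here is a minimal sketch that uses only Node's standard library. `UpperCaseTransform` is a hypothetical stand-in for an editor, not part of xml-stream-editor; it only shows how any Transform subclass slots into `pipeline` and handles one chunk at a time.

```javascript
import { Readable, Transform, Writable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical stand-in for an editor: a Transform subclass that
// upper-cases each chunk as it passes through.
class UpperCaseTransform extends Transform {
  _transform (chunk, encoding, callback) {
    // Emit the edited chunk; nothing beyond this chunk is held in memory.
    callback(null, chunk.toString().toUpperCase())
  }
}

// Collect the output in memory so the result can be inspected.
const received = []
const sink = new Writable({
  write (chunk, encoding, callback) {
    received.push(chunk.toString())
    callback()
  }
})

await pipeline(
  Readable.from(['<a>one</a>', '<b>two</b>']),
  new UpperCaseTransform(),
  sink
)

console.log(received.join('')) // <A>ONE</A><B>TWO</B>
```

In real use, the source and sink would typically be file streams, so the whole document never needs to sit in memory at once.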
The main way to use xml-stream-editor is to:
1. select which XML elements you want to edit using simple declarative selectors
(like _very_ simple XPath rules or CSS selectors), and
2. write functions to be called with each matching XML element in the document.
Those functions then either edit and return the provided element, or remove
the element from the document by returning nothing.
The main way to call xml-stream-editor is to import createXMLEditor
and pass it an object whose keys are selectors (strings that describe
which elements to edit) and whose values are functions that get passed
the matching elements (to edit or delete those elements).
You choose which XML elements to edit by writing (simple, limited) CSS-selector-like
statements. For example, the selector `parent child` will match
all `<child>` elements that are _immediate_ children of `<parent>` nodes.
Note, this is a little different than CSS selectors, where the selector `div a`
would match `<a>` elements that were contained in `<div>` elements,
regardless of whether the `<a>` was an immediate child or more deeply nested.
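The immediate-child rule can be sketched as a check against the tail of an element's path. `matchesSelector` below is a hypothetical helper written for illustration, not the library's implementation, and it assumes a single-name selector matches an element with that name at any depth.

```javascript
// Hypothetical helper (not part of xml-stream-editor's API): checks whether
// the element at the end of `path` is matched by `selector`.
// `path` lists element names from the document root down to the element.
function matchesSelector (selector, path) {
  const parts = selector.trim().split(/\s+/)
  if (parts.length > path.length) {
    return false
  }
  // Every selector part must line up with the tail of the path, so each
  // part is the immediate parent of the next one.
  const tail = path.slice(path.length - parts.length)
  return parts.every((part, index) => part === tail[index])
}

console.log(matchesSelector('parent child', ['root', 'parent', 'child'])) // true
// A CSS descendant selector would match here; this stricter rule does not.
console.log(matchesSelector('parent child', ['parent', 'wrapper', 'child'])) // false
console.log(matchesSelector('child', ['root', 'parent', 'child'])) // true
```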
Each element that matches a given selector is passed to the matching
function, with the signature `(elm: Element) => Element | undefined`,
and elements are structured as follows (as TypeScript):

```typescript
interface Element {
  name: string
  text?: string
  attributes: Record<string, string>
  children: Element[]
}
```
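As a rough illustration of that contract, the sketch below runs an editing function against plain objects with the Element shape, without involving the library. `makeElement` and `editCharacter` are hypothetical names used only for this example.

```javascript
// Hypothetical helper for building a plain object with the Element shape;
// the library's own examples use its `newElement` export instead.
function makeElement (name, text) {
  return { name, text, attributes: {}, children: [] }
}

// An editing function: (elm: Element) => Element | undefined
function editCharacter (elm) {
  if (elm.text === 'Bart Simpson') {
    // Returning nothing removes the element from the document.
    return undefined
  }
  // Otherwise edit the element in place and return it to keep it.
  elm.attributes.edited = 'true'
  return elm
}

const kept = editCharacter(makeElement('character', 'Marge Simpson'))
const removed = editCharacter(makeElement('character', 'Bart Simpson'))
console.log(kept.attributes) // { edited: 'true' }
console.log(removed) // undefined
```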
Options

In addition to a rules argument, createXMLEditor can also take an options
object as a second argument. This object has the following parameters.

```typescript
interface Options {
  // Whether to check and enforce the validity of created and modified
  // XML element names and attributes. If true, will throw an error
  // if you create an XML element with a disallowed name or attribute.
  //
  // This only checks the syntax of the XML element names and attributes.
  // It does not perform any further validation, like if used namespaces
  // are valid.
  //
  // default: true
  validate: boolean

  // Options defined by the "saxes" library, and passed to the "saxes" parser.
  //
  // https://github.com/lddubeau/saxes/blob/4968bd09b5fd0270a989c69913614b0e640dae1b/src/saxes.ts#L557
  // https://www.npmjs.com/package/saxes
  saxes?: SaxesOptions
}

// The createXMLEditor function takes the options object as an optional
// second argument.
const transformer = createXMLEditor(rules, options)
```

Examples

Start with an input document saved as simpsons.xml.
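
You can edit it in a streaming manner like this:

```javascript
import { createReadStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { createXMLEditor, newElement } from 'xml-stream-editor'

// The keys of this object are selector strings, and the
// values are functions that get called with matching elements.
const rules = {
  "main character": (elm) => {
    switch (elm.text) {
      case "Marge Simpson":
        elm.attributes["hair"] = "blue"
        break
      case "Homer Simpson":
        elm.text += " (Sr.)"
        break
      case "Lisa Simpson": {
        elm.text = ""
        // Create an "instrument" child element.
        const instrumentElm = newElement("instrument")
        instrumentElm.text = "saxophone"
        elm.children.push(instrumentElm)
        // Also create a new "name" child element.
        const nameElm = newElement("name")
        nameElm.text = "Lisa Simpson"
        elm.children.push(nameElm)
        break
      }
      case "Bart Simpson":
        // Remove the node by not returning an element.
        return
    }
    return elm
  }
}

await pipeline(
  createReadStream("simpsons.xml"), // the input document
  createXMLEditor(rules),
  process.stdout
)
```

And you'll find the edited document printed to STDOUT: the "Marge Simpson"
element gains a hair="blue" attribute, the "Homer Simpson" text becomes
"Homer Simpson (Sr.)", the "Lisa Simpson" element's text is replaced by
"instrument" and "name" child elements, and the "Bart Simpson" element is
removed.
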
Notes

Nested editing functions are not supported. You can define as many editing
rules as you'd like, but only one rule can be matching the XML document
at a time as it's being streamed. So anytime a selector matches part of a
document that is already matched by a parent rule, that child rule will
not be applied.

For example (using the same example XML document as above):

```javascript
import { createReadStream } from 'node:fs'
import { pipeline } from 'node:stream/promises'
import { createXMLEditor, newElement } from 'xml-stream-editor'

const rules = {
  // This rule will match first, since the "main" element will be
  // identified first during parsing.
  "main character": (elm) => {
    // editing goes here
    return elm
  },
  // And as a result, this rule will never match the "Disco Stu"
  // or "Julius Hibbert" elements, since any time the "character" selector
  // would match one of them, it will have already been matched by the
  // above "main character" selector.
  //
  // However, this selector would still match (and so this function would
  // be called with) any "character" elements that the "main character"
  // selector does not match.
  "character": (elm) => {
    // this function would never be called in this document.
    return elm
  },
}

await pipeline(
  createReadStream("simpsons.xml"), // the input document
  createXMLEditor(rules),
  process.stdout
)
```
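
The "one rule at a time" behavior can be modeled with a small in-memory walk. This is only a sketch under stated assumptions: the library works on a stream rather than a tree, `collectMatches` and `matchesSelector` are hypothetical helpers, the sample document is made up, and picking the first listed selector when several match is an assumption.

```javascript
// Hypothetical matcher: selector parts must equal the tail of the path.
function matchesSelector (selector, path) {
  const parts = selector.trim().split(/\s+/)
  const tail = path.slice(-parts.length)
  return parts.length <= path.length &&
    parts.every((part, index) => part === tail[index])
}

// Walks a tree of Element-shaped objects and records which rule would
// receive which element.
function collectMatches (elm, selectors, path = [], matches = []) {
  const currentPath = [...path, elm.name]
  const selector = selectors.find((s) => matchesSelector(s, currentPath))
  if (selector !== undefined) {
    // This element (and its whole subtree) now belongs to one rule, so
    // selectors are not evaluated again for anything nested inside it.
    matches.push({ selector, text: elm.text })
    return matches
  }
  for (const child of elm.children) {
    collectMatches(child, selectors, currentPath, matches)
  }
  return matches
}

// A made-up document, not the simpsons.xml example.
const doc = {
  name: 'library',
  children: [
    {
      name: 'shelf',
      text: 'fiction',
      children: [{ name: 'book', text: 'nested book', children: [] }]
    },
    { name: 'book', text: 'loose book', children: [] }
  ]
}

const result = collectMatches(doc, ['shelf', 'book'])
console.log(result.map((match) => match.text)) // [ 'fiction', 'loose book' ]
```

The nested book is never offered to the "book" rule, because it sits inside an element that the "shelf" rule has already claimed.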

Motivation

xml-stream-editor was built to handle the extremely large XML files
generated by Brave Software's PageGraph system,
which records both a broad range of actions that occur when loading a web page
(e.g., an image sub-resource being loaded, a WebAPI being called, an HTML
element being added to the DOM) and the actor in the page that is
responsible for each action (e.g., the element that included the image,
the