DOM parsing and extraction utilities
```bash
npm install @xcrap/dom
# or, using the shorthand
npm i @xcrap/dom
```
---
## 🛠️ How to Use
There are several ways to use this parsing engine, from using the ready-made models to extending it with parsers for other file types that still work with those same models.
### Loading HTML
```ts
import { DomParser } from "@xcrap/dom"

const html = "<html><head><title>Page Title</title></head><body></body></html>" // or document.documentElement.outerHTML
const parser = new DomParser(html)
```
### Extracting data with parser methods
```ts
import { DomParser, extract } from "@xcrap/dom"

const html = `<html><head><title>Page Title</title></head><body><a href="https://example.com">Example</a></body></html>`
const parser = new DomParser(html)

// parseFirst() finds the first element matching the query and extracts a value from it
// extract(key: string, isAttribute?: boolean) is a generic extraction function;
// ready-made extractors can also be imported from "@xcrap/dom"
const title = parser.parseFirst({ query: "title", extractor: extract("innerText") })

// parseMany() finds all elements matching the query (the number of results can be limited)
// and applies the extractor to each one
const links = parser.parseMany({ query: "a", extractor: extract("href", true) })

console.log(title) // "Page Title"
console.log(links) // ["https://example.com"]
```
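The `isAttribute` flag in `extract(key, isAttribute)` appears to select between reading an element property (such as `innerText`) and reading an HTML attribute (such as `href`). The sketch below illustrates that distinction under this assumption; it is not the library's implementation, and `ElementLike`, `Extractor` and `makeExtractor` are hypothetical names. A structural type stands in for a real DOM `Element` so the example runs without a browser.

```typescript
// Hypothetical minimal element shape; real DOM Elements satisfy it structurally.
interface ElementLike {
  getAttribute(name: string): string | null
}

// An extractor turns one matched element into a value (undefined if absent).
type Extractor = (element: ElementLike) => string | undefined

// Hypothetical factory mirroring the documented extract(key, isAttribute) signature.
function makeExtractor(key: string, isAttribute = false): Extractor {
  return (element) => {
    if (isAttribute) {
      // Attribute mode: read the raw HTML attribute, e.g. href on <a>
      return element.getAttribute(key) ?? undefined
    }
    // Property mode: read a property of the element object, e.g. innerText
    const value = (element as unknown as Record<string, unknown>)[key]
    return value == null ? undefined : String(value)
  }
}

// A plain object standing in for an <a href="https://example.com">Example</a> element
const fakeLink = {
  innerText: "Example",
  getAttribute: (name: string) => (name === "href" ? "https://example.com" : null),
}

console.log(makeExtractor("innerText")(fakeLink)) // "Example"
console.log(makeExtractor("href", true)(fakeLink)) // "https://example.com"
```

Returning a function from the factory lets one extractor be reused across every element that `parseMany()` matches.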
### Extracting data with a ParsingModel
ParsingModels are decoupled enough that you don't need a Parser instance to use them, but we will use one here anyway:
```ts
import { DomParser, DomParsingModel, extract } from "@xcrap/dom"

const html = `
<html>
  <body>
    <h1>Header</h1>
    <span id="id">1</span>
    <span id="name">Name</span>
    <span class="age">23</span>
  </body>
</html>
`

const parser = new DomParser(html)

// Each key of the shape declares where to find a value and how to extract it
const rootParsingModel = new DomParsingModel({
  heading: {
    query: "h1",
    extractor: extract("innerText")
  },
  id: {
    query: "#id",
    extractor: extract("innerText")
  },
  name: {
    query: "#name",
    extractor: extract("innerText")
  },
  age: {
    query: ".age",
    extractor: extract("innerText")
  }
})

const data = parser.extractFirst({ model: rootParsingModel })

console.log(data) // { heading: "Header", id: "1", name: "Name", age: "23" }
```
## 🧠 Create your own Parser: Concepts
### Parser
In this library, a Parser is a class that handles one file type: it loads that file and may or may not provide methods that make extracting data easier.
Every parser has a default method called `parseModel`, a wrapper that receives a ParsingModel and calls its `parse()` method, passing in the parser's internal `source` property.
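The relationship between a parser and a parsing model can be sketched as follows. This is an illustration under assumptions, not the library's actual base classes: `ParsingModelLike`, `BaseParser`, `CsvParser` and `rowCountModel` are hypothetical names. The parser stores the loaded source, offers a convenience method of its own, and its `parseModel` wrapper simply hands that source to the model's `parse()` method.

```typescript
// Hypothetical contract: anything with parse(source) can act as a parsing model.
interface ParsingModelLike<T> {
  parse(source: string): T
}

// Hypothetical base class; the library's real base class may differ.
abstract class BaseParser {
  protected readonly source: string

  constructor(source: string) {
    this.source = source
  }

  // Default wrapper: pass the internal source to the model's parse() method.
  parseModel<T>(model: ParsingModelLike<T>): T {
    return model.parse(this.source)
  }
}

// Hypothetical parser for CSV text, with one convenience method of its own.
class CsvParser extends BaseParser {
  rows(): string[][] {
    return this.source
      .trim()
      .split("\n")
      .map((line) => line.split(","))
  }
}

// A tiny model that counts data rows (every line after the header).
const rowCountModel: ParsingModelLike<number> = {
  parse: (source) => source.trim().split("\n").length - 1,
}

const csv = new CsvParser("name,age\nAna,23\nBruno,31")
console.log(csv.rows()) // [["name", "age"], ["Ana", "23"], ["Bruno", "31"]]
console.log(csv.parseModel(rowCountModel)) // 2
```

Keeping `parseModel` in the base class means every new parser gets model support without extra code; only the file-specific loading and helpers need to be written.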
### Parsing Model
A Parsing Model is a class that receives a shape in its constructor and stores it as a property. It must have a method called `parse()` that receives a source: the code or text containing the information to be extracted.
The shape declares how each piece of information will be extracted from that source.
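As a rough sketch of a parsing model for a non-HTML file type, the hypothetical `JsonParsingModel` below stores its shape in the constructor and applies it in `parse(source)`. Here the shape maps each output key to a dot-separated path; that format is an assumption chosen for illustration, not something the library defines.

```typescript
// Hypothetical shape: each output key declares a dot-separated path into the JSON source.
type JsonShape = Record<string, { path: string }>

// Minimal parsing model sketch: stores the shape, and parse(source) applies it.
class JsonParsingModel {
  readonly shape: JsonShape

  constructor(shape: JsonShape) {
    this.shape = shape
  }

  parse(source: string): Record<string, unknown> {
    const data: unknown = JSON.parse(source)
    const result: Record<string, unknown> = {}

    for (const [key, { path }] of Object.entries(this.shape)) {
      // Walk the path one segment at a time; a missing segment yields undefined.
      let current: unknown = data
      for (const segment of path.split(".")) {
        current =
          current !== null && typeof current === "object"
            ? (current as Record<string, unknown>)[segment]
            : undefined
      }
      result[key] = current
    }
    return result
  }
}

const model = new JsonParsingModel({
  name: { path: "user.name" },
  age: { path: "user.age" },
  city: { path: "user.address.city" },
})

console.log(model.parse('{"user":{"name":"Name","age":23}}'))
// { name: "Name", age: 23, city: undefined }
```

Since `parse()` only needs the source string, this model can be used without any parser instance; a parser's `parseModel` wrapper would just pass its own source in.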
๐งช Testing
Automated tests are located in `__tests__`. To run them:
```bash
npm run test
```
๐ค Contributing
Want to contribute? Follow these steps:
* Fork the repository.
* Create a new branch (`git checkout -b feature-new`).
* Commit your changes (`git commit -m 'Add new feature'`).
* Push to the branch (`git push origin feature-new`).