Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in
```bash
npm install --save beautiful-dom
```
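```js
const BeautifulDom = require('beautiful-dom');
// Sample HTML to parse. This markup is illustrative, chosen to match the selectors used in the examples that follow.
const document = `
<div id="container">
  <p class="paragraph">My name is <b>Ajah</b>, <span class="work">C.S.</span> and I am a software developer</p>
  <a id="myWebsite" name="name" href="https://www.ajah.xyz"> My website </a>
</div>`;
// Parse the HTML string into a queryable document object
const dom = new BeautifulDom(document);
```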
API
Methods on the document object.
- document.getElementsByTagName()
- document.getElementsByClassName()
- document.getElementsByName()
- document.getElementById()
- document.querySelectorAll()
- document.querySelector()
Methods on the HTML node object
- node.getElementsByClassName()
- node.getElementsByTagName()
- node.querySelector()
- node.querySelectorAll()
- node.getAttribute()
Properties of the HTML node object
- node.outerHTML
- node.innerHTML
- node.textContent
- node.innerText
They are used the same way as their counterparts in a browser's HTML DOM, and take the same parameters.
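Because the method and property names match the browser DOM, extraction code can be written once against that shared interface. Below is a minimal sketch of a hypothetical helper, `extractLinks` (not part of beautiful-dom), that uses only `querySelectorAll()`, `getAttribute()` and `textContent`. It assumes the returned node lists are array-like, as the `paragraphNodes[0]` example below suggests.

```js
// Hypothetical helper (not part of beautiful-dom): collect every link under
// `root`, which may be a BeautifulDom instance, one of its nodes, or a
// browser `document`. Assumes querySelectorAll() returns an array-like list.
function extractLinks(root) {
  return Array.from(root.querySelectorAll('a'))
    // skip anchors without an href (getAttribute gives null or undefined)
    .filter(node => node.getAttribute('href') != null)
    .map(node => ({
      href: node.getAttribute('href'),
      text: node.textContent.trim()
    }));
}
```

Called as `extractLinks(dom)`, it returns an array of `{ href, text }` objects.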
Examples for the document object
```js
const paragraphNodes = dom.getElementsByTagName('p');
// returns a list of node objects with node name 'p'
const nodesWithSpecificClass = dom.getElementsByClassName('work');
// returns a list of node objects with class name 'work'
const nodeWithSpecificId = dom.getElementById('container');
// returns the node with id 'container'
const complexQueryNodes = dom.querySelectorAll('p.paragraph b');
// returns a list of nodes matching the complex CSS selector
const nodesWithSpecificName = dom.getElementsByName('name');
// returns a list of nodes whose name attribute is 'name'
const linkNode = dom.querySelector('a#myWebsite');
// returns the first node object matching the CSS selector
const linkHref = linkNode.getAttribute('href');
// returns the value of the attribute, e.g. 'https://www.ajah.xyz'
const linkInnerHTML = linkNode.innerHTML;
// returns the innerHTML of a node object, e.g. ' My website '
const linkTextContent = linkNode.textContent;
// returns the textContent of a node object, e.g. ' My website '
const linkInnerText = linkNode.innerText;
// returns the innerText of a node object, e.g. ' My website '
const linkOuterHTML = linkNode.outerHTML;
// returns the outerHTML of a node object, i.e. the whole <a ...> My website </a> element
```
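On crawled pages an expected element is not always present. In the browser DOM, `querySelector()` and `getElementById()` return `null` when nothing matches; the sketch below assumes beautiful-dom behaves similarly but also guards against `undefined`. `textOf` is a hypothetical helper, not part of the library.

```js
// Hypothetical helper (not part of beautiful-dom): trimmed textContent of the
// first node matching `selector`, or `fallback` when there is no match.
// `== null` catches both null (browser DOM behaviour) and undefined.
function textOf(root, selector, fallback) {
  const node = root.querySelector(selector);
  if (node == null) {
    return fallback === undefined ? null : fallback;
  }
  return node.textContent.trim();
}
```

For example, `textOf(dom, 'a#myWebsite', '')` returns the trimmed link text, or `''` if the page has no such link.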
Examples for a node object
```js
const paragraphNodes = dom.getElementsByTagName('p');
// returns a list of node objects with node name 'p'
const nodesWithSpecificClass = paragraphNodes[0].getElementsByClassName('work');
// returns a list of node objects inside the first paragraph node with class name 'work'
const complexQueryNodes = paragraphNodes[0].querySelectorAll('span.work');
// returns a list of nodes inside the paragraph node matching the complex CSS selector
const linkNode = dom.querySelector('a#myWebsite');
// returns the first node object matching the CSS selector
const linkHref = linkNode.getAttribute('href');
// returns the value of the attribute, e.g. 'https://www.ajah.xyz'
const linkInnerHTML = linkNode.innerHTML;
// returns the innerHTML of a node object, e.g. ' My website '
const linkTextContent = linkNode.textContent;
// returns the textContent of a node object, e.g. ' My website '
const linkInnerText = linkNode.innerText;
// returns the innerText of a node object, e.g. ' My website '
const linkOuterHTML = linkNode.outerHTML;
// returns the outerHTML of a node object, i.e. the whole <a ...> My website </a> element
```
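Because node methods only search inside that node, they are handy for keeping related data together. The sketch below is a hypothetical helper, `groupText` (not part of beautiful-dom), which runs an outer query on the document and an inner query on each matched node. It assumes the returned lists are array-like, as the `paragraphNodes[0]` example above suggests.

```js
// Hypothetical helper (not part of beautiful-dom): for every node matching
// `containerSelector`, collect the trimmed text of its descendants matching
// `itemSelector`. The inner query runs on the container node, so the
// results stay grouped by container.
function groupText(root, containerSelector, itemSelector) {
  return Array.from(root.querySelectorAll(containerSelector)).map(container =>
    Array.from(container.querySelectorAll(itemSelector))
      .map(node => node.textContent.trim())
  );
}
```

`groupText(dom, 'p', 'span.work')` returns one array per paragraph, each holding the text of that paragraph's `span.work` elements.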
Contributing
If you have ideas, features you would like to see included, or bug fixes, feel free to send a PR.
(Requires Node v6 or above)
- Clone the repo
```bash
git clone https://github.com/ChukwuEmekaAjah/beautiful-dom.git
```