XPath HTML

XPath stands for XML Path Language.

It provides a flexible non-XML syntax to address (point to) different parts of an XML document.

> XPath HTML brings this powerful tool to HTML,
> letting us navigate the HTML DOM with XPath expressions.

If you want to learn more about XPath and
how to use different XPath expressions to find complex or dynamic elements,
a concise XPath tutorial is a good place to start.

- Installation
- Usages
  - Hello XPath from HTML World
  - `fromPageSource(html).findElement(expression)`
  - `fromPageSource(html).findElements(expression)`
  - `fromNode(xhtml).findElement(expression)`
  - `fromNode(xhtml).findElements(expression)`
  - `node.getTagName()`
  - `node.getText()`
  - `node.getAttribute(name)`
- Dependencies
- License

Installation

`xpath-html` is available as a package on NPM.
Open a terminal and enter the following command:

```shell script
npm install --save xpath-html
```

Usages

Hello XPath from HTML World

```js
const fs = require("fs");
const xpath = require("xpath-html");

// Assuming you have an html file locally,
// Here is the content that I scraped from www.shopback.sg
const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");

// Don't worry about the input much,
// you are able to use the HTML response of an HTTP request,
// as long as the argument is a string type, everything should be fine.
const node = xpath.fromPageSource(html).findElement("//*[contains(text(), 'with love')]");

console.log(`The matched tag name is "${node.getTagName()}"`);
console.log(`Your full text is "${node.getText()}"`);
```

```shell script
# A fast way to download the .html file used above
$ curl https://www.shopback.sg -o shopback.html

# Or from my GitHub examples
$ curl -O https://raw.githubusercontent.com/hieuvp/xpath-html/master/examples/shopback.html
```

Bang 💥 The output should look something like this:

```txt
The matched tag name is "div"
Your full text is "Made with love by"
```

Easy enough, right? Now scroll down to the APIs below and dive into the details.
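
Since any string works as input, the lookup above can be wrapped in a small helper that tolerates a missing match. This is only a minimal sketch, not part of `xpath-html`: `findTextOrNull` is a hypothetical name and the library is passed in as the `xpath` argument. This README does not say whether `findElement` throws or returns an empty value when nothing matches, so the helper guards against both.

```js
// Hypothetical helper, not part of xpath-html.
// `xpath` is expected to be the module returned by require("xpath-html").
function findTextOrNull(xpath, html, expression) {
  if (typeof html !== "string") {
    throw new TypeError("html must be a string");
  }
  try {
    const node = xpath.fromPageSource(html).findElement(expression);
    // Assumption: a missing match may come back as null or undefined.
    return node ? node.getText() : null;
  } catch (error) {
    // Assumption: a missing match may also surface as an exception.
    // Note that this also hides errors caused by an invalid expression.
    return null;
  }
}
```

Called as `findTextOrNull(require("xpath-html"), html, "//*[contains(text(), 'with love')]")`, it should return the matched text for the page above, or `null` when the page has no such element.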

`fromPageSource(html).findElement(expression)`

> Locate an element on a page,
> the returned node is a representation of the underlying DOM.

Arguments:

| Name         | Type     | Description                |
| ------------ | -------- | -------------------------- |
| `html`       | `string` | Input HTML page's source   |
| `expression` | `string` | The given XPath expression |

Returns: Node

Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log(node.toString());
```

Result:

```txt
Made with love by
```
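
The expression in the example above places the text to match between single quotes, which breaks as soon as the text itself contains a quote. XPath 1.0 string literals have no escape character, so a common workaround is to switch quote styles or fall back to `concat()`. The sketch below shows one way to do that; `toXPathLiteral` is a hypothetical helper, not something `xpath-html` provides.

```js
// Hypothetical helper: turn arbitrary text into an XPath 1.0 string literal.
function toXPathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Both quote types are present: split on single quotes and glue the
  // pieces back together with concat(), emitting each single quote as "'".
  const parts = text.split("'").map((part) => `'${part}'`);
  return "concat(" + parts.join(", \"'\", ") + ")";
}

const expression = `//*[text()=${toXPathLiteral("Made with love by")}]`;
console.log(expression); // //*[text()='Made with love by']
```

The resulting string can be passed to `findElement` or `findElements` like any other expression.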



`fromPageSource(html).findElements(expression)`

> Search for multiple elements on a page.

Arguments:

| Name         | Type     | Description                |
| ------------ | -------- | -------------------------- |
| `html`       | `string` | Input HTML page's source   |
| `expression` | `string` | The given XPath expression |

Returns: Array

Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const nodes = xpath
  .fromPageSource(html)
  .findElements("//img[starts-with(@src, 'https://cloud.shopback.com')]");

console.log("Number of nodes found:", nodes.length);
console.log("nodes[0]:", nodes[0].toString());
console.log("nodes[1]:", nodes[1].toString());
```

Result:

```txt
Number of nodes found: 158
nodes[0]:
nodes[1]:
```
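
The example above reads `nodes[0]` and `nodes[1]` directly, which fails with a `TypeError` when fewer than two elements match. A slightly more defensive way to print the first few results might look like the sketch below; `previewNodes` is a hypothetical helper and only relies on each node having a `toString()` method, as in the example.

```js
// Hypothetical helper: serialize at most `limit` nodes from a result list.
function previewNodes(nodes, limit = 2) {
  return Array.from(nodes)
    .slice(0, limit)
    .map((node, index) => `nodes[${index}]: ${node.toString()}`);
}
```

With the `nodes` array from the example, `previewNodes(nodes).forEach((line) => console.log(line))` prints up to two lines and prints nothing for an empty result.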

`fromNode(xhtml).findElement(expression)`

> Select an element against an XHTML format.
> Similar to `fromPageSource(html).findElement(expression)`,
> but it is for a subset of an `html` page this time.

Arguments:

| Name         | Type               | Description                                                        |
| ------------ | ------------------ | ------------------------------------------------------------------ |
| `xhtml`      | `Node` or `string` | Either a node returned from a query or a well-formed XHTML string  |
| `expression` | `string`           | The given XPath expression                                         |

Returns: Node

Notes:

- The input `xhtml` must declare the namespace `xmlns="http://www.w3.org/1999/xhtml"`, e.g. on the element wrapping `Made with love by`.


Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const group = xpath.fromPageSource(html).findElement("//div[@class='ui-store-group']");

const node = xpath.fromNode(group).findElement("//a[@href='/aliexpress']");

console.log(node.toString());
```

Result:

```txt
```
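
When the input to `fromNode` is a hand-written XHTML string rather than a node from an earlier query, the namespace requirement from the notes is easy to forget. The sketch below adds the declaration to the root element when it is missing. `ensureXhtmlNamespace` is a hypothetical helper built on simple pattern matching, so it assumes the string starts with its root element and does not handle a leading XML declaration or doctype.

```js
const XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: add the XHTML namespace to the root element of an
// XHTML string when no default namespace is declared there yet.
function ensureXhtmlNamespace(xhtml) {
  const rootTag = /^\s*<([A-Za-z][\w:.-]*)([^>]*)>/.exec(xhtml);
  if (!rootTag) {
    throw new Error("Expected the input to start with an element");
  }
  const attributes = rootTag[2];
  if (/\sxmlns\s*=/.test(attributes)) {
    return xhtml; // A default namespace is already declared; leave it alone.
  }
  // Insert the declaration right after the root element's tag name.
  return xhtml.replace(/^(\s*<[A-Za-z][\w:.-]*)/, `$1 xmlns="${XHTML_NAMESPACE}"`);
}

console.log(ensureXhtmlNamespace("<div>Made with love by</div>"));
// <div xmlns="http://www.w3.org/1999/xhtml">Made with love by</div>
```

The returned string can then be handed to `xpath.fromNode(...)` in place of a node.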

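`fromNode(xhtml).findElements(expression)`

> Select multiple elements against an XHTML format.
> Same as `fromPageSource(html).findElements(expression)`,
> but used for querying within a part of an `html` page.

Arguments:

| Name         | Type               | Description                                                        |
| ------------ | ------------------ | ------------------------------------------------------------------ |
| `xhtml`      | `Node` or `string` | Either a node returned from a query or a well-formed XHTML string  |
| `expression` | `string`           | The given XPath expression                                         |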

Returns: Array

Notes:

- The input `xhtml` must declare the namespace `xmlns="http://www.w3.org/1999/xhtml"`, e.g. on the element wrapping `Made with love by`.


Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const group = xpath.fromPageSource(html).findElement("//div[@class='ui-store-group']");

const nodes = xpath.fromNode(group).findElements("//img[contains(@src,'shopily')]");

console.log("Number of nodes found:", nodes.length);
console.log("nodes[0]:", nodes[0].toString());
console.log("nodes[1]:", nodes[1].toString());
```

Result:

```txt
Number of nodes found: 2
nodes[0]:
nodes[1]:
```
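
Often the matched nodes are only a means to reach one attribute, such as the `src` of each image in the group. A minimal sketch of that step follows; `collectAttribute` is a hypothetical helper that relies only on the `getAttribute(name)` method described further down, and it assumes a missing attribute comes back as an empty or falsy value.

```js
// Hypothetical helper: collect the distinct values of one attribute
// across a list of nodes, skipping nodes where it is missing or empty.
function collectAttribute(nodes, name) {
  const values = Array.from(nodes)
    .map((node) => node.getAttribute(name))
    .filter(Boolean);
  return [...new Set(values)]; // A Set keeps first-seen order.
}
```

With the `nodes` array from the example, `collectAttribute(nodes, "src")` would give the list of image URLs without duplicates.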

`node.getTagName()`

> Retrieve the node's tag name.

Arguments: None

Returns: string

Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log("Single node's tag name:", node.getTagName());

const nodes = xpath
  .fromPageSource(html)
  .findElements("//img[starts-with(@src, 'https://cloud.shopback.com')]");

console.log("First nodes[0] tag name:", nodes[0].getTagName());
console.log("Second nodes[1] tag name:", nodes[1].getTagName());
```

Result:

```txt
Single node's tag name: div
First nodes[0] tag name: img
Second nodes[1] tag name: img
```
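
Wildcard expressions such as `//*[...]` can match elements of several types at once, and `getTagName()` is a convenient way to see what came back. The sketch below tallies a result list by tag name; `countByTagName` is a hypothetical helper, not part of the library.

```js
// Hypothetical helper: count how many nodes there are per tag name.
function countByTagName(nodes) {
  const counts = {};
  for (const node of nodes) {
    const tagName = node.getTagName();
    counts[tagName] = (counts[tagName] || 0) + 1;
  }
  return counts;
}
```

For the two image nodes printed above, the result would be `{ img: 2 }`.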

`node.getText()`

> Get the visible innerText of the node.

Arguments: None

Returns: string

Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log("Text of the node:", node.getText());

const nodes = xpath
  .fromPageSource(html)
  .findElements("//div[@id='home-page-container']//*[@class='title-text']");

console.log("Text of nodes[0]:", nodes[0].getText());
console.log("Text of nodes[1]:", nodes[1].getText());
```

Result:

```txt
Text of the node: Made with love by
Text of nodes[0]: Up to 10.0% Cash Rewards
Text of nodes[1]: Up to 7.0% Cashback
```
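
The strings returned by `getText()` usually need a little post-processing before they are useful as data. As an illustration, the sketch below pulls the percentage out of texts like the ones printed above; `parseRate` is a hypothetical helper and the pattern is an assumption about how such titles are worded.

```js
// Hypothetical helper: pull the first percentage out of a text such as
// "Up to 10.0% Cash Rewards". Returns null when no percentage is present.
function parseRate(text) {
  const match = /(\d+(?:\.\d+)?)\s*%/.exec(text);
  return match ? Number(match[1]) : null;
}

console.log(parseRate("Up to 10.0% Cash Rewards")); // 10
console.log(parseRate("Up to 7.0% Cashback")); // 7
console.log(parseRate("Made with love by")); // null
```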

`node.getAttribute(name)`

> Retrieve the current value of the given attribute of this node.

Arguments:

| Name   | Type     | Description                        |
| ------ | -------- | ---------------------------------- |
| `name` | `string` | The name of the attribute to query |

Returns: string

Example:

```js
const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//a[text()='View All Popular Stores']");

console.log("The href value:", node.getAttribute("href"));
```

Result:

```txt
The href value: /all-stores
```
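
The `href` above is relative to the site it was scraped from, so it typically has to be resolved before it can be requested. One way to do that is the standard `URL` class available in Node.js; the sketch below is an illustration, with `toAbsoluteUrl` as a hypothetical helper and `https://www.shopback.sg` assumed as the base because that is where the sample page came from.

```js
// Hypothetical helper: turn a relative href into an absolute URL.
// Assumption: the page was served from https://www.shopback.sg.
function toAbsoluteUrl(href, baseUrl = "https://www.shopback.sg") {
  return new URL(href, baseUrl).href;
}

console.log(toAbsoluteUrl("/all-stores")); // https://www.shopback.sg/all-stores
```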

Dependencies

Special thanks to all contributors of these libraries, which are the foundation `xpath-html` was built upon.

1. xpath
1. xmldom
1. xmlserializer
1. parse5

License

MIT

Made with ❤ from ShopBack.

