Parser for XML Sitemaps to be used with Robots.txt and web crawlers
Parse through a sitemap's XML to get all the URLs for your crawler.
```bash
npm install sitemapper --save
```
A simple example:

```javascript
const Sitemapper = require('sitemapper');

const sitemap = new Sitemapper();

sitemap.fetch('https://wp.seantburke.com/sitemap.xml').then(function (sites) {
  console.log(sites);
});
```
An example in ES6 using async/await:

```javascript
import Sitemapper from 'sitemapper';

(async () => {
  const Google = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000, // 15 seconds
  });

  try {
    const { sites } = await Google.fetch();
    console.log(sites);
  } catch (error) {
    console.log(error);
  }
})();

// or

const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;

sitemapper
  .fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
  .catch((error) => console.log(error));
```
You can add options to the Sitemapper object when instantiating it:

- `requestHeaders`: (Object) - Additional request headers (e.g. `User-Agent`)
- `timeout`: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
- `url`: (String) - Sitemap URL to crawl
- `debug`: (Boolean) - Enables/disables debug console logging. Default: False
- `concurrency`: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
- `retries`: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0
- `rejectUnauthorized`: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: True
- `lastmod`: (Number) - Timestamp of the minimum lastmod value allowed for returned URLs
- `proxyAgent`: (HttpProxyAgent|HttpsProxyAgent) - Instance of npm "hpagent" HttpProxyAgent or HttpsProxyAgent to be passed to npm "got"
- `exclusions`: (Array<RegExp>) - Array of regular expressions; URLs matching any of them are excluded from the results
- `fields`: (Object) - An object of fields to be returned from the sitemap. Available fields:
  - `loc`: (Boolean) - The URL location of the page
  - `sitemap`: (Boolean) - The URL of the sitemap containing the URL, useful when a sitemap index points to nested sitemaps
  - `lastmod`: (Boolean) - The date of last modification of the page
  - `changefreq`: (Boolean) - How frequently the page is likely to change
  - `priority`: (Boolean) - The priority of this URL relative to other URLs on your site
  - `image:loc`: (Boolean) - The URL location of the image (for image sitemaps)
  - `image:title`: (Boolean) - The title of the image (for image sitemaps)
  - `image:caption`: (Boolean) - The caption of the image (for image sitemaps)
  - `video:title`: (Boolean) - The title of the video (for video sitemaps)
  - `video:description`: (Boolean) - The description of the video (for video sitemaps)
  - `video:thumbnail_loc`: (Boolean) - The thumbnail URL of the video (for video sitemaps)
For example:

```javascript
fields: {
  loc: true,
  lastmod: true,
  changefreq: true,
  priority: true,
}
```
Leaving a field out has the same effect as setting it to `false`. If not specified, sitemapper defaults to returning the 'classic' array of URLs.
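
When `fields` is set, each entry in `sites` is an object carrying the requested fields rather than a plain URL string. Here is a minimal sketch of reading them back, reusing the sitemap URL from the examples above; the object shape is inferred from the field descriptions, and actual values depend on the sitemap:

```javascript
import Sitemapper from 'sitemapper';

const sitemapper = new Sitemapper({
  url: 'https://wp.seantburke.com/sitemap.xml', // sitemap URL reused from the examples above
  fields: { loc: true, lastmod: true },
});

(async () => {
  const { sites } = await sitemapper.fetch();
  // Each entry should look roughly like { loc: 'https://...', lastmod: '...' }
  // (shape inferred from the field descriptions above).
  for (const site of sites) {
    console.log(site.loc, site.lastmod);
  }
})();
```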
An example using all available options:
```javascript
import Sitemapper from 'sitemapper';
import { HttpsProxyAgent } from 'hpagent';

const sitemapper = new Sitemapper({
  requestHeaders: {
    'User-Agent':
      'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0',
  },
  timeout: 15000,
  url: 'https://art-works.community/sitemap.xml',
  debug: true,
  concurrency: 2,
  retries: 1,
  lastmod: 1600000000000,
  proxyAgent: new HttpsProxyAgent({
    proxy: 'http://localhost:8080',
  }),
  exclusions: [/\/v1\//, /scary/],
  rejectUnauthorized: false,
  fields: {
    loc: true,
    lastmod: true,
    priority: true,
    changefreq: true,
    sitemap: true,
  },
});
```
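
As a smaller, focused sketch, `exclusions` and `lastmod` can be combined to narrow the returned URLs; the patterns and cutoff date below are illustrative, with the sitemap URL reused from the examples above:

```javascript
import Sitemapper from 'sitemapper';

const recentPagesOnly = new Sitemapper({
  url: 'https://wp.seantburke.com/sitemap.xml',
  // Drop any URL matching one of these patterns (illustrative patterns).
  exclusions: [/\.pdf$/, /\/archive\//],
  // Only keep URLs whose lastmod is at or after this timestamp in ms (illustrative cutoff).
  lastmod: new Date('2023-01-01').getTime(),
});

recentPagesOnly
  .fetch()
  .then(({ sites }) => console.log(`${sites.length} urls after filtering`))
  .catch((error) => console.error(error));
```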