crawler

> Web crawler based on Puppeteer

[npm](https://www.npmjs.com/package/@opd/crawler)
[Build Status](https://dev.azure.com/kagawagao/OPD/_build/latest?definitionId=3&branchName=master)
[Coverage Status](https://coveralls.io/github/open-data-plan/crawler?branch=master)

Install

```bash
npm install @opd/crawler
```

Use

```js
import Crawler from '@opd/crawler'
// or commonjs
const Crawler = require('@opd/crawler').default

const crawler = new Crawler(options)
```

API

`new Crawler(options)`

Creates a crawler instance.

options: crawler instance config

- `parallel`: maximum number of crawlers, default is `5`
- `pageEvaluate`: function evaluated on the current page, see Puppeteer's `page.evaluate`; extra arguments are not supported yet
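As an illustration of the two options above, here is a minimal sketch of an options object. Only `parallel` and `pageEvaluate` come from this README; the returned fields (`title`, `links`) are assumptions chosen for the example, and the function is assumed to run in the page context where `document` is available.

```js
// Hypothetical options object; only the `parallel` and `pageEvaluate` keys are documented.
const options = {
  // Allow at most 3 crawlers at once (the documented default is 5).
  parallel: 3,
  // Assumed to run inside the page, so browser globals such as `document` exist.
  // It takes no parameters because extra arguments are not supported.
  pageEvaluate: () => ({
    title: document.title,
    links: Array.from(document.querySelectorAll('a'), (a) => a.href),
  }),
}
```

An object like this would be passed as `new Crawler(options)`.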

`crawler.launch(options)`

Launches the browser using `puppeteer.launch`.

`crawler.queue(urls)`

Adds urls to the crawler queue.

> Note: urls are checked strictly, which means each url must start with `http://` or `https://`
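The strict check can be approximated with a small filter before queueing. This is a sketch only: `isCrawlableUrl` is a hypothetical helper, not part of the package, and the library's real validation may differ in detail.

```js
// Hypothetical helper mirroring the documented rule: the url must start with http:// or https://.
const isCrawlableUrl = (url) => typeof url === 'string' && /^https?:\/\//.test(url)

// Sample inputs, made up for illustration.
const candidates = [
  'https://example.com',
  'http://example.com/a',
  'example.com',
  'ftp://example.com',
]

// Keep only the urls that pass the strict check.
const accepted = candidates.filter(isCrawlableUrl)
console.log(accepted)
// [ 'https://example.com', 'http://example.com/a' ]
```

Filtering up front makes it explicit which urls would be rejected, such as those missing a scheme.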

`crawler.start(urls)`

Starts crawling pages. If `urls` is provided, `crawler.queue` is called first.

```js
const result = await crawler.start()
console.log(result)

// [
//   {
//     url, // page url
//     result // crawled result
//   }
// ]
```
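Since the result is an array of `{ url, result }` entries, it can be convenient to index it by url. The sketch below uses made-up sample data in the documented shape; the `title` field is an assumption about what a `pageEvaluate` function might return.

```js
// Sample data in the documented shape; the urls and values are made up.
const result = [
  { url: 'https://example.com', result: { title: 'Example Domain' } },
  { url: 'https://example.org', result: { title: 'Example Org' } },
]

// Index the crawled results by page url for quick lookup.
const byUrl = new Map(result.map((entry) => [entry.url, entry.result]))

console.log(byUrl.get('https://example.com').title)
// Example Domain
```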

> Note: if you call `start` before `launch`, the browser will still be launched, but with no extra launch options
