a simple node crawler
npm install simple-node-crawler
host - the host that crawling is constrained to.
patterns - to crawl a specific path, specify the path name, or leave it as ''. The pattern is the CSS pattern for the main body of the webpage; id, class, and tag names are supported. To capture the whole HTML body, specify 'body'.
usedb - set to false to use the local file system. Set to true if you have MongoDB installed and want to use it.
saveImage - whether to save images to local file system.
dbConnectionString - MongoDB connection string. Defaults to 'mongodb://localhost/test'.
utf8 - whether to convert content to UTF-8. Defaults to true.
crawlerNumber - how many crawler threads to run. Defaults to 5.
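
As a rough illustration of how these options fit together, the sketch below builds an options object and fills in the three defaults documented above. The `withDefaults` helper and the `example.com` host are hypothetical and not part of simple-node-crawler. The `patterns` option is left out because its exact shape is not specified in this list, and how the object is handed to the crawler is not shown here either.

```javascript
// Defaults taken from the option descriptions above.
const DOCUMENTED_DEFAULTS = {
  dbConnectionString: 'mongodb://localhost/test',
  utf8: true,
  crawlerNumber: 5,
};

// Hypothetical helper (not provided by the package): user-supplied
// values override the documented defaults.
function withDefaults(options) {
  return { ...DOCUMENTED_DEFAULTS, ...options };
}

const options = withDefaults({
  host: 'example.com', // placeholder host, replace with the site to crawl
  usedb: false,        // store results on the local file system
  saveImage: false,    // do not save images
  crawlerNumber: 2,    // override the default of 5
});

console.log(options);
```

With `usedb` set to false, the `dbConnectionString` default is presumably unused; it is kept in the object only to show the documented default value.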