Make sure g++, make, libboost-all-dev, gperf, libevent-dev and uuid-dev have been installed.
```sh
$ wget -O - https://launchpad.net/gearmand/1.2/1.1.12/+download/gearmand-1.1.12.tar.gz | tar xzvf -
$ cd gearmand-1.1.12
$ ./configure
$ make
$ make install
```
Generate a new app from the templates with a single command.

```sh
$ mkdir demo
$ cd demo
$ floodesh-cli init # all necessary files will be generated in your directory.
```
Please make sure that /data/tests and /var/log/bda/tests exist and are writable before use. You can customize the path by modifying logBaseDir in config/[env]/index.js.
Context
A context instance is a kind of finite-state machine implemented with generators, an ECMAScript 6 feature. Through the context we can access almost all fields of the request and response, for example:

```javascript
worker.use((ctx, next) => {
  ctx.content = ctx.body.toString(); // totally do not care about the body
  return next();
})
```
Request
$3
*
Get querystring.
$3
*
Check if the request is idempotent.
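
As a rough illustration of what an idempotency check involves, the sketch below tests a method name against the methods that RFC 7231 defines as idempotent. The helper `isIdempotent` is hypothetical and is not part of floodesh; the library's own check may differ.

```javascript
// Hypothetical helper, not floodesh's API.
// RFC 7231 defines PUT, DELETE and the safe methods (GET, HEAD, OPTIONS, TRACE) as idempotent.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'PUT', 'DELETE', 'OPTIONS', 'TRACE']);

function isIdempotent(method) {
  return IDEMPOTENT_METHODS.has(String(method).toUpperCase());
}

console.log(isIdempotent('get'));  // true
console.log(isIdempotent('POST')); // false
```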
$3
*
Get the search string. Unlike querystring, it includes the leading "?".
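
The difference between the two values can be seen with Node's built-in `URL` class. This is only a sketch of the relationship between them; the example URL is made up and the code does not use floodesh itself.

```javascript
// Standalone illustration using the WHATWG URL class that ships with Node.
const url = new URL('http://example.com/list?page=2&sort=asc');

const search = url.search;           // search string, keeps the leading "?"
const querystring = search.slice(1); // same content without the leading "?"

console.log(search);      // ?page=2&sort=asc
console.log(querystring); // page=2&sort=asc
```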
$3
*
Get request method.
$3
* key
* Return:
Get the value of a response header by key.
$3
* types |Array\>
* Return: |false|null
Check if the incoming response contains the "Content-Type" header field, and whether it contains any of the given mime types. If there is no response body, null is returned. If there is no content type, false is returned. Otherwise, it returns the first type that matches.
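
The three possible outcomes can be sketched with a small standalone function. `matchType` and the shape of the `response` object are hypothetical, only exact mime matches are handled, and returning false when nothing matches is an assumption the text above does not confirm.

```javascript
// Hypothetical stand-in for the documented check, not floodesh's implementation.
function matchType(response, types) {
  // No response body: null
  if (response.body == null || response.body.length === 0) return null;

  // No content type: false
  const header = response.headers['content-type'];
  if (!header) return false;

  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = header.split(';')[0].trim().toLowerCase();
  const hit = types.find((t) => t.toLowerCase() === mime);

  // Assumption: no match is reported as false.
  return hit === undefined ? false : hit;
}

const html = { headers: { 'content-type': 'text/html; charset=utf-8' }, body: '<html></html>' };
console.log(matchType(html, ['application/json', 'text/html'])); // text/html
console.log(matchType({ headers: {}, body: 'x' }, ['text/html'])); // false
console.log(matchType({ headers: {}, body: '' }, ['text/html']));  // null
```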
Other
$3
*
Array of generated tasks. A task is an object consisting of Options and next, where next is the name of a function in your spider that should be called in the next task. Supported format: `[{ opt:, next: }]`
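
A minimal sketch of building such an array from a list of URLs follows. The helper `buildTasks`, the `uri` option field, and the function name `'parseDetail'` are all assumptions made for illustration; use whatever options and spider function names your project defines.

```javascript
// Hypothetical helper: turns URLs into task objects of the form { opt, next }.
// `uri` as an option field and 'parseDetail' as a spider function name are assumptions.
function buildTasks(urls, next) {
  return urls.map((uri) => ({ opt: { uri }, next }));
}

const tasks = buildTasks(['http://example.com/a', 'http://example.com/b'], 'parseDetail');
console.log(tasks.length);  // 2
console.log(tasks[0].next); // parseDetail
```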
$3
* retry: Retry times at worker side, default 3
* logBaseDir: Directory where project's log directory exists, default '/var/log/bda/'
* parsers: Array of parsers, which are file names in parser directory without '.js'
bottleneck
* defaultCfg
  * rate: Number of milliseconds to delay between each requests
  * concurrent: Size of the worker pool
  * priorityRange: Range of acceptable priorities starting from 0, default 3
  * defaultPriority: priority of the request
  * homogenous:true
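
Put together, the options described so far might look roughly like this in config/[env]/index.js. This is a sketch under assumptions: the nesting of `defaultCfg` under a `bottleneck` key is inferred from the headings, the parser name `'home'` is hypothetical, and the numeric values for rate, concurrent and defaultPriority are arbitrary examples rather than documented defaults.

```javascript
// Sketch of a config object; in a real project this would be exported from
// config/[env]/index.js. The nesting is an assumption based on the headings above.
const config = {
  retry: 3,                     // documented default
  logBaseDir: '/var/log/bda/',  // documented default
  parsers: ['home'],            // hypothetical parser file parser/home.js
  bottleneck: {
    defaultCfg: {
      rate: 1000,               // example: 1000 ms between requests
      concurrent: 2,            // example pool size
      priorityRange: 3,         // documented default
      defaultPriority: 1,       // example, must fall inside the priority range
      homogenous: true
    }
  }
};

console.log(config.bottleneck.defaultCfg.rate); // 1000
```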
* jobs: Max number of jobs per worker, default 1
* srvQueueSize: Max number of jobs queued to gearman server, default 1000
* mongodb: Mongodb Connection String URI
* worker:
  * servers: Array of server list, server should be an object like {'host':'gearman-server'}
* client:
  * servers: Same as above
  * loadBalancing: 'RoundRobin'
* retry: Retry times at client side
* repo: [redis|mongodb], memory is used as the repo by default.
* removeKeys: Array of keys in the query string to skip when testing whether a URL has been seen
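
To illustrate the effect of removeKeys, the sketch below strips the listed keys from a URL before comparing, so two URLs that differ only in those keys count as the same. The helper `stripKeys` and the key name `ts` are hypothetical; the actual comparison inside floodesh may normalize URLs differently.

```javascript
// Hypothetical illustration of removeKeys, using Node's built-in URL class.
function stripKeys(rawUrl, removeKeys) {
  const url = new URL(rawUrl);
  for (const key of removeKeys) {
    url.searchParams.delete(key); // ignore this key when comparing
  }
  return url.toString();
}

const a = stripKeys('http://example.com/item?id=7&ts=1001', ['ts']);
const b = stripKeys('http://example.com/item?id=7&ts=2002', ['ts']);
console.log(a === b); // true
```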
service
* server: Remote service origin
Error handling
In a synchronous middleware, just throw an Error; in an asynchronous one, return a rejected Promise. err.stack will be logged and err.code will be sent to the client to persist.

```javascript
// sync
module.exports = (ctx, next) => {
  // ... middleware logic
  throw new Error('crash here');
}
```
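
For the asynchronous case, a minimal sketch could look like the following. The error code `'EFETCH'` is made up for illustration, and the `.catch` at the end is only a stand-in for the caller that would log err.stack and report err.code.

```javascript
// Hypothetical asynchronous middleware: reject instead of throwing.
const asyncMiddleware = (ctx, next) => {
  const err = new Error('crash here');
  err.code = 'EFETCH'; // made-up code for illustration
  return Promise.reject(err);
};

// Stand-in for the caller that handles the rejection.
asyncMiddleware({}, () => Promise.resolve())
  .catch((err) => console.log(err.code)); // EFETCH
```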
* mof-cheerio: A simple wrapper of Cheerio.
* mof-charsetparser: Parse Charset in response headers.
* mof-iconv: Encoding converter middleware using iconv or iconv-lite.
* mof-request: A wrapper of Request.js, with some default options.
* mof-bottleneck: A wrapper of bottleneckp, which is an asynchronous rate limiter with priority.
* mof-proxy: With power to acquire proxy from a proxy service.
* mof-whacko: A wrapper of whacko, which is a fork of cheerio that uses parse5 as an underlying platform.
* mof-statsd: A wrapper of statsd-client, which enables you to send metrics to a statsd daemon.
* mof-uarotate: Rotate User-Agent header automatically from a local file.
* mof-seenreq: Only makes sense in flowesh, a simple wrapper of seenreq.
* mof-validbody: Check if a response body meets a pattern, for instance, a html body should start with `<` and a json body with `{`.
* mof-statuscode: Status code detector.
* mof-genestamp: Prints gene and url of a task, along with # of new tasks and # of records.