Grid Utilization Vigilante
guv
===
guv, aka Grid Utilization Vigilante, is a governor for your (Heroku) workers.
It automatically scales the number of workers based on the number of pending jobs in a (RabbitMQ) queue.
> Variable loads? Don't know how many servers you need? Woken up just to start more servers?
> Let robots do the tedious work for you!
The number of workers is calculated so that all jobs can be completed within a specified deadline (in seconds),
which you choose as the desired quality of service for your users.
The scaling is based on estimates of the job processing time (mean, variance), which you can calculate from metrics.
guv is written in Node.js, but can be used with workers in any programming language.
guv is free and open source software under the MIT license.
In production
* Supports the RabbitMQ messaging system and Heroku workers
* Uses a simple proportional algorithm for scaling to maintain a quality-of-service deadline
* Optional metric reporting to statuspage.io and New Relic
* Used in production at The Grid since June 2015 (with MsgFlo), performing over 1 million jobs per week
Install as NPM dependency
npm install --save guv
Add it to your Procfile
echo "guv: node node_modules/.bin/guv" >> Procfile
Configure a Heroku API key for guv to use. You can get one with heroku auth:token:
heroku config:set HEROKU_API_KEY=$(heroku auth:token)
Configure RabbitMQ instance to use. It must have the management plugin installed and configured.
heroku config:set GUV_BROKER=amqp://[user:pass]@example.net/instance
Note: If you use CloudAMQP, guv will automatically respect the CLOUDAMQP_URL envvar. No config needed.
For guv's own configuration we also recommend using an envvar.
This allows you to change the configuration without redeploying.
See below for details on the configuration format.
heroku config:set GUV_CONFIG="$(cat autoscale.guv.yaml)"
To verify that guv is running and working, check its log.
heroku logs --app myapp --ps guv
The configuration format for guv is specified in YAML.
Since YAML is a superset of JSON, you can also use JSON.
guv configuration files by convention use the extension .guv.yaml,
for instance autoscale.guv.yaml or myproject.guv.yaml.
One guv instance can handle multiple worker roles.
Each role has an associated queue, worker and scaling configuration - specified as variables.
# comment
myrole:
  variable1: value1
otherrole:
  variable2: value2
The special role name * is used for global, application-wide settings.
Each of the individual roles will inherit this configuration if they do not override it.
# Heroku app is my-heroku-app, defaults to using a minimum of 5 workers, maximum of 50
'*': {min: 5, max: 50, app: my-heroku-app}
# uses only defaults
imageprocessing: {}
# except for text processing
textprocessing:
  max: 10
Different app keys per role are supported, for services spanning multiple Heroku apps.
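The inheritance rule for the * role can be illustrated with a small sketch. This is not guv's actual code: the function name resolveRoles is hypothetical, and it assumes a plain shallow merge where role-specific values override the global ones.

```javascript
// Hypothetical sketch of the '*' inheritance rule; not guv's actual implementation.
// Assumes a shallow merge: values set on a role override the global defaults.
function resolveRoles(config) {
  const defaults = config["*"] || {};
  const resolved = {};
  for (const [name, settings] of Object.entries(config)) {
    if (name === "*") continue; // the global entry is not a role itself
    resolved[name] = { ...defaults, ...(settings || {}) };
  }
  return resolved;
}

// Same configuration as the YAML example above, written as a JavaScript object.
const config = {
  "*": { min: 5, max: 50, app: "my-heroku-app" },
  imageprocessing: {},
  textprocessing: { max: 10 },
};

const roles = resolveRoles(config);
console.log(roles.imageprocessing); // inherits min, max and app
console.log(roles.textprocessing);  // inherits min and app, overrides max
```

Under this assumption, textprocessing keeps the global minimum of 5 workers but is capped at 10 instead of 50.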
guv attempts to scale workers to be within a deadline, based on estimates of processing time.
To let it do a good job you should always specify the deadline, and mean processing time.
# times are in seconds
textprocessing:
  deadline: 100
  processing: 30
You can also specify the variance, as 1 standard deviation.
# about 68% of jobs complete within +/- 3 seconds of the mean
textprocessing:
  deadline: 100
  processing: 30
  stddev: 3
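To give a feel for how deadline, processing and stddev interact, here is a rough sketch of a worker estimate. It is not guv's actual algorithm: the function estimateWorkers is hypothetical, it assumes each worker handles one job at a time, and it pads the mean with two standard deviations as an arbitrary safety margin.

```javascript
// Hypothetical estimate, NOT guv's actual scaling algorithm.
// Assumptions: one job at a time per worker, and a safety margin of
// 2 standard deviations on top of the mean processing time.
function estimateWorkers(jobs, role) {
  const { deadline, processing, stddev = 0, min = 0, max = Infinity } = role;
  const perJob = processing + 2 * stddev; // conservative seconds per job
  // Jobs a single worker can finish back-to-back before the deadline.
  const jobsPerWorker = Math.floor(deadline / perJob);
  if (jobsPerWorker < 1) {
    // One job already exceeds the deadline; at best give every job its own worker.
    return Math.max(min, Math.min(max, jobs));
  }
  const needed = Math.ceil(jobs / jobsPerWorker);
  return Math.min(max, Math.max(min, needed));
}

// Values from the example above, plus assumed min/max limits.
const textprocessing = { deadline: 100, processing: 30, stddev: 3, min: 1, max: 10 };
console.log(estimateWorkers(7, textprocessing));
```

With these numbers a job is budgeted at 36 seconds, so one worker fits 2 jobs into the 100 second deadline, and 7 pending jobs call for 4 workers.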
The name of the worker and queue defaults to the role name, but can be overridden.
# will use worker=colorextract and queue=colorextract
colorextract: {}
# explicitly specifying
histogram:
  queue: 'hist.INPUT'
  worker: processhistograms
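The naming default can be sketched in a few lines. The helper withNames is hypothetical and only illustrates the rule described above; it is not taken from guv.

```javascript
// Hypothetical sketch of the naming default; not guv's actual code.
// worker and queue fall back to the role name unless set explicitly.
function withNames(roleName, settings = {}) {
  return {
    ...settings,
    worker: settings.worker ?? roleName,
    queue: settings.queue ?? roleName,
  };
}

console.log(withNames("colorextract", {}));
console.log(withNames("histogram", { queue: "hist.INPUT", worker: "processhistograms" }));
```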
For a list of all supported configuration variables, see the config format schemas.
guv can report errors, and metrics about how workers are being scaled, to New Relic Insights.
To enable this, set up a newrelic.js configuration in the application that runs guv.
guv will send one event of type GuvScaled per configured role/queue, with the payload:
role: 'workerA' # guv role this event is for
app: 'imgflo' # Heroku app name
jobs: 142 # current jobs in queue
workers: 7 # new value for number of workers
fillrate: 2.1 # new jobs per second
drainrate: 1.7 # jobs completed per second
consumers: 3 # number of workers actually consuming from queue. Changes will trail workers
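One way to read such an event is to compare fillrate and drainrate. The helper below is a hypothetical sketch for interpreting the payload fields listed above; it is not part of guv or of New Relic.

```javascript
// Hypothetical helper for interpreting a GuvScaled payload; not part of guv.
// Uses only the jobs, fillrate and drainrate fields from the payload above.
function queueTrend(event) {
  const net = event.fillrate - event.drainrate; // jobs per second; positive means growing
  if (net >= 0) {
    // Queue is growing or stable: at the current rates it never drains.
    return { net, growing: net > 0, secondsToDrain: Infinity };
  }
  return { net, growing: false, secondsToDrain: event.jobs / -net };
}

const example = { role: "workerA", jobs: 142, workers: 7, fillrate: 2.1, drainrate: 1.7 };
console.log(queueTrend(example));
```

In the example payload jobs arrive faster than they complete, so the queue is still growing even after scaling to 7 workers.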
guv can report metrics about in-flight jobs to your statuspage.io.
See status.thegrid.io for an example.
Set the API key as an environment variable
export STATUSPAGE_API_TOKEN=mytoken
And configure in your guv.yaml file:
'*':
  statuspage: 'my-statuspage-id'
workerA:
  metric: 'my-statuspage-metric'
How to make the most out of guv.
You can use the guv-update-jobstats tool to update your configuration, given a set of measurements of processing time.
To measure boot times, you can use the guv-heroku-workerstats tool.
If a single input queue is problematic, split it up into multiple worker roles.
The same applies if the CPU, disk, or network usage of processing one queue affects the processing of another queue too much.
There are currently no plans to consider multiple queues per role/worker when scaling.
As a rough guideline, job processing should ideally be on the order of 1-10 seconds.
RabbitMQ consumers have a prefetch setting, which is how many messages (jobs) a consumer accepts at the same time.
For loads which spend significant time on I/O, this can increase efficiency a lot.
Examples are communicating with external network services, or reading and writing files on disk.
For a mixed CPU/IO-bound load a prefetch of around 2*cpucores is a good baseline.
For primarily networked IO, try 10*cpucores.
Make sure to specify concurrency in your guv config, and that the workers have enough memory.
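The prefetch rules of thumb above can be turned into a starting value like this. The helper suggestPrefetch and its load labels are hypothetical; treat the result as a baseline to tune, not a recommendation from guv itself.

```javascript
const os = require("os");

// Hypothetical helper applying the rules of thumb above:
// 2 * cores for mixed CPU/IO loads, 10 * cores for primarily networked IO.
function suggestPrefetch(load, cores = os.cpus().length) {
  const factor = { mixed: 2, network: 10 }[load];
  if (factor === undefined) {
    throw new Error(`unknown load type: ${load}`);
  }
  return factor * cores;
}

// Baseline for the machine this runs on.
console.log(suggestPrefetch("mixed"));
// Baseline for a worker with 4 cores doing mostly network calls.
console.log(suggestPrefetch("network", 4));
```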