Express middleware that returns the resulting HTML after executing JavaScript, allowing crawlers to read the page.
npm install googlebot
This module implements an Express middleware that renders a full HTML/JS/CSS version of a
page when JavaScript is not available in the client and the site relies heavily on it for rendering, as
when using Ember, Angular, jQuery, or Backbone. I needed this at work to deliver an
SEO-friendly version of the website to the Google crawler, and found no solution available.
This must be done server side. Google replaces the hashbang (or the URL) with ?_escaped_fragment_= and appends the rest of the URL there,
and expects a different, completely rendered version of the site. The middleware detects when a request
contains this string and, instead of returning the normal response, returns the fully rendered version that PhantomJS
creates.
The URL fragment that triggers rendering in PhantomJS can be customized. A string can also be appended to the
rendered request to create conditionals that restrict crawling or hide certain parts of the page from Google;
this string can be customized too.
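As a rough illustration of the detection described above, the sketch below checks a request URL for the trigger string and builds the URL that the renderer would load. The function names (isCrawlerRequest, buildRenderUrl) and the way the final URL is assembled are assumptions made for illustration, not the module's actual internals; only the two default strings are taken from this README.

```javascript
// Hypothetical helpers; these are not part of the googlebot module's API.
var TRIGGER = '?_escaped_fragment_='; // default trigger string from this README
var APPEND = '&phantom=true';         // default appended string from this README

// True when the request URL contains the trigger string.
function isCrawlerRequest(url, trigger) {
  return url.indexOf(trigger || TRIGGER) !== -1;
}

// Assumed assembly: protocol + host + original URL + appended marker.
// The marker lets the application recognise requests made by the renderer.
function buildRenderUrl(protocol, host, url, append) {
  var suffix = append === undefined ? APPEND : append;
  return protocol + '://' + host + url + suffix;
}

console.log(isCrawlerRequest('/?_escaped_fragment_=/about')); // true
console.log(isCrawlerRequest('/about'));                      // false
console.log(buildRenderUrl('http', 'example.com', '/?_escaped_fragment_=/about'));
// http://example.com/?_escaped_fragment_=/about&phantom=true
```

The application can then look for the appended marker in its own code to decide what to show or hide when the request comes from the renderer.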
I tried to make it as customizable as possible, so that it supports different uses without having to modify the core files;
you can even serve static files from a different server if
needed. Since this is technically a proxy, you can use it for many things. Pull requests are welcome and
encouraged.
cd ~/
mkdir phantom
cd phantom
wget https://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2
sudo mv phantomjs-1.9.2-linux-x86_64.tar.bz2 /usr/local/share/.
cd /usr/local/share
sudo tar -xf phantomjs-1.9.2-linux-x86_64.tar.bz2
sudo ln -s /usr/local/share/phantomjs-1.9.2-linux-x86_64 /usr/local/share/phantomjs
sudo ln -s /usr/local/share/phantomjs/bin/phantomjs /usr/local/bin/phantomjs
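After the steps above, it can be useful to confirm from Node that the phantomjs binary is reachable on the PATH. The following is a minimal sketch using only Node's built-in child_process module; the helper name phantomVersion is made up for this example and is not part of the googlebot module.

```javascript
var spawnSync = require('child_process').spawnSync;

// Hypothetical helper: returns the version string printed by
// `phantomjs --version`, or null when the binary cannot be run.
function phantomVersion(binary) {
  var result = spawnSync(binary || 'phantomjs', ['--version'], {
    encoding: 'utf8',
    timeout: 10000 // give up after 10 seconds
  });
  if (result.error || result.status !== 0) {
    return null;
  }
  return result.stdout.trim();
}

console.log(phantomVersion() || 'phantomjs not found on PATH');
```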
There is probably no point in installing globally, although it will install that way if you wish. To install locally, run
npm install --save googlebot
or add googlebot to the dependencies in your package.json.
In CoffeeScript:
app.use googlebot {option: 'value'}
In JavaScript:
app.use(googlebot({option: 'value', option2: 'othervalue'}));
A more complete example, in CoffeeScript:
googlebot = require 'googlebot'
express = require 'express'
app = module.exports = express()
app.configure ->
  app.set 'views', __dirname + '/views'
  app.use googlebot {delay: 5000, canonical: 'http://dvidsilva.com'}
  app.use (req, res) ->
    res.render 'app/index'
app.startServer = (port) ->
  app.listen port, ->
    console.log 'Express server started on port %d in %s mode!',
      port, app.settings.env
default: true
Whether to respond to Google requests or, someday, to requests that meet a particular requirement.
default: '?_escaped_fragment_='
The string in the URL that triggers PhantomJS rendering instead of the normal response.
default: '&phantom=true'
A string added to the new request. I use it to prevent Google from seeing certain content.
default: 1000
Number of milliseconds to wait for the page to render before sending the response.
default: 'http'
In case you want to redirect the request to a different one
default: undefined
In case you want to redirect PhantomJS requests to a different host, for example one where you store the static
files.
default: undefined
Specifies the preferred host for Google to associate with the resulting page. A header is sent to tell Google
which URL you would rather show to people searching for you.
default: function(){};
(Currently not supported.) The idea is to let you add more client-side JavaScript that PhantomJS will
execute before returning the results to Google, without having to modify the module. For example, if
you don't want empty alt attributes on your images because they are bad for SEO, you can do
$('img').each(function(){ $(this).attr('alt',$(this).attr('src')); });
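Going back to the preferred-host option above: this README does not say which header is sent. One mechanism that Google documents for this purpose is an HTTP Link header with rel="canonical", and the sketch below builds such a header value. The helper name canonicalLinkHeader, and the assumption that the module uses this particular header, are illustrative only.

```javascript
// Hypothetical helper: builds a Link header value pointing at the canonical URL.
function canonicalLinkHeader(canonicalHost, path) {
  // Strip trailing slashes from the host so the join does not produce '//'.
  var host = canonicalHost.replace(/\/+$/, '');
  // Make sure the path starts with exactly one leading slash.
  var cleanPath = path.charAt(0) === '/' ? path : '/' + path;
  return '<' + host + cleanPath + '>; rel="canonical"';
}

// In an Express handler this could be used as:
//   res.set('Link', canonicalLinkHeader('http://dvidsilva.com', req.path));
console.log(canonicalLinkHeader('http://dvidsilva.com', '/about'));
// prints: <http://dvidsilva.com/about>; rel="canonical"
```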