This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, in a single JSON file.
* NPM package: crawler-user-agents
* Go package: github.com/monperrus/crawler-user-agents
* PyPi package: crawler-user-agents
Each pattern is a regular expression. It should work out-of-the-box with your favorite regex library.
If you use this project in a commercial product, please sponsor it.
Download the crawler-user-agents.json file from this repository directly.
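If you work from the raw JSON file, one way to use it is to compile all the patterns into a single regular expression. Here is a minimal Node.js sketch, assuming the file sits next to your script and that your regex engine accepts every pattern:

```js
const fs = require('fs');

// Path is an assumption; point it at your local copy of the JSON file.
const crawlers = JSON.parse(fs.readFileSync('./crawler-user-agents.json', 'utf8'));

// Combine every pattern into one alternation; each pattern is wrapped in a
// non-capturing group so that alternations inside a pattern stay contained.
const crawlerRegex = new RegExp(crawlers.map(({ pattern }) => `(?:${pattern})`).join('|'));

console.log(crawlerRegex.test('Googlebot/2.1 (+http://www.google.com/bot.html)')); // true
```

Compiling the regex once up front avoids rebuilding it for every request.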
crawler-user-agents is deployed on npmjs.com:
To install it with npm or yarn:

```sh
npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents
```
In Node.js, you can require the package to get an array of crawler user agents.
```js
const crawlers = require('crawler-user-agents');
console.log(crawlers);
```
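The package only exports the data; matching is left to you. Here is a minimal sketch for looking up a matching crawler's URL (matchingCrawler is a made-up helper for illustration, not part of the package):

```js
const crawlers = require('crawler-user-agents');

// Hypothetical helper (not exported by the package): return the first entry
// whose pattern matches the given User-Agent, or undefined.
function matchingCrawler(userAgent) {
  return crawlers.find(({ pattern }) => new RegExp(pattern).test(userAgent));
}

const ua = 'Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)';
const match = matchingCrawler(ua);
console.log(match ? match.url : 'not a known crawler'); // https://discordapp.com
```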
Install with `pip install crawler-user-agents`.
Then:
```python
import crawleruseragents

if crawleruseragents.is_crawler("Googlebot/"):
    ...  # do something
```
or:
```python
import crawleruseragents

indices = crawleruseragents.matching_crawlers("bingbot/2.0")
print("crawlers' indices:", indices)
print(
    "crawler's URL:",
    crawleruseragents.CRAWLER_USER_AGENTS_DATA[indices[0]]["url"]
)
```
Note that `matching_crawlers` is much slower than `is_crawler` if the given User-Agent does indeed match any of the crawlers.
Go: use this package. It provides the global variable `Crawlers` (synchronized with `crawler-user-agents.json`) and the functions `IsCrawler` and `MatchingCrawlers`.
Example of a Go program:
```go
package main

import (
	"fmt"

	// the import path resolves to the package name "agents"
	"github.com/monperrus/crawler-user-agents"
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler's URL:", agents.Crawlers[indices[0]].URL)
}
```
Output:
```
isCrawler: true
crawlers' indices: [237]
crawler's URL: https://discordapp.com
```
I do welcome additions contributed as pull requests.
The pull requests should:
* contain a single addition
* specify a discriminant, relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
* contain the pattern (generic regular expression), the discovery date (year/month/day) and the official url of the robot
* result in a valid JSON file (don't forget the comma between items; see the validation sketch after the example below)
Example:

```json
{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances": ["rogerbot/2.3 example UA"]
}
```
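A quick way to check the last rule before opening a pull request is to parse the file and compile each pattern. Here is a minimal validation sketch, assuming the fields shown in the example above:

```js
const fs = require('fs');

// JSON.parse throws if the file is not valid JSON (e.g. a missing comma).
const entries = JSON.parse(fs.readFileSync('./crawler-user-agents.json', 'utf8'));

for (const entry of entries) {
  // Field names taken from the example entry above.
  if (typeof entry.pattern !== 'string') {
    throw new Error(`entry without a pattern: ${JSON.stringify(entry)}`);
  }
  new RegExp(entry.pattern); // throws if the pattern is not a valid regular expression
}
console.log(`checked ${entries.length} entries`);
```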
The list is under an MIT License. The versions prior to Nov 7, 2016 were under a CC-SA license.
There are a few wrapper libraries that use this data to detect bots:
* Voight-Kampff (Ruby)
* isbot (Ruby)
* crawlers (Clojure)
* isBot (Node.js)
Other systems for spotting robots, crawlers, and spiders that you may want to consider are:
* Crawler-Detect (PHP)
* BrowserDetector (PHP)
* browscap (JSON files)