Command line tool that compares two text files using simhash

A simple command line tool for comparing text files using the simhash algorithm and contrasting it with the Jaccard index.

Install from npm with `npm install node-simhash`.

Near duplicate detection (moz.com)

To install from a local checkout of the repository:

```
npm install
npm link
```

Or, if you would like to install globally:

```
npm install https://github.com/sjhorn/node-simhash -g
```

### Command line tool usage

```
simhash file1.txt file2.txt
simhash https://file.com/page1.html https://file.com/page2.html
```

### Using node

```js
var simhash = require('node-simhash');

// string1 and string2 are the two text strings to compare
simhash.compare(string1, string2);
```

#### .compare(file1, file2)
Compare two text strings using both simhash and the Jaccard index.

A further helper counts the binary ones in a number (its Hamming weight).
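
Counting ones matters because simhash fingerprints are commonly compared by Hamming distance: XOR the two values and count the bits that differ. The sketch below illustrates that idea only. The names `countOnes` and `hammingDistance` are hypothetical, and the library's own implementation and similarity formula may differ.

```js
// Hypothetical helper names; not taken from node-simhash itself.

// Count the 1 bits in a 32-bit integer. Each loop iteration clears
// the lowest set bit, so the loop runs once per 1 bit.
function countOnes(n) {
  n = n >>> 0; // treat the input as an unsigned 32-bit value
  var count = 0;
  while (n !== 0) {
    n = (n & (n - 1)) >>> 0;
    count++;
  }
  return count;
}

// Hamming distance between two 32-bit fingerprints: XOR leaves a 1
// wherever the bits differ, then those ones are counted.
function hammingDistance(a, b) {
  return countOnes((a ^ b) >>> 0);
}

console.log(countOnes(11));           // 11 is 1011 in binary → 3
console.log(hammingDistance(10, 12)); // 1010 vs 1100 → 2
```

A smaller distance means the two fingerprints, and so the two texts, are more alike.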
#### .shingles(string, words_per_single=2)
Convert a string to a set of shingles, using a default of 2 words per shingle. The string is tokenized with the natural library's default tokenizer.
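
To make the idea of a shingle concrete, here is a minimal sketch. It is an approximation under stated assumptions: `wordShingles` is a hypothetical name, and it splits on non-word characters rather than calling the natural library's tokenizer, so its output may not match `.shingles()` exactly.

```js
// Sketch only: a plain regex split stands in for the natural
// library's tokenizer, so results may differ from .shingles().
function wordShingles(text, wordsPerShingle) {
  var size = wordsPerShingle || 2;
  var tokens = text.split(/\W+/).filter(Boolean); // drop empty tokens
  var result = new Set();
  // Slide a window of `size` words across the token list.
  for (var i = 0; i + size <= tokens.length; i++) {
    result.add(tokens.slice(i, i + size).join(' '));
  }
  return result;
}

console.log(Array.from(wordShingles('the quick brown fox')).join(' | '));
// → the quick | quick brown | brown fox
```

Because the result is a set, a shingle that occurs several times in the text is stored only once.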
#### .jaccardIndex(string1, string2)
Compare two strings by tokenizing each into shingles, then dividing the size of the intersection of the two shingle sets by the size of their union.
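
The ratio itself can be sketched on two ready-made sets. The function name `jaccard` is hypothetical, and treating two empty sets as identical is an assumption of this sketch, not documented behaviour of `.jaccardIndex()`.

```js
// Jaccard index of two Sets: |A ∩ B| / |A ∪ B|.
function jaccard(setA, setB) {
  var intersection = 0;
  setA.forEach(function (item) {
    if (setB.has(item)) intersection++;
  });
  var union = setA.size + setB.size - intersection;
  // Assumption: two empty sets count as identical.
  return union === 0 ? 1 : intersection / union;
}

var a = new Set(['the quick', 'quick brown', 'brown fox']);
var b = new Set(['the quick', 'quick red', 'red fox']);
console.log(jaccard(a, b)); // 1 shared shingle out of 5 distinct → 0.2
```

The result ranges from 0 (no shingles in common) to 1 (identical shingle sets).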
#### .createBinaryString(number)
Print a 32-bit number as a binary string of 32 characters.

A further helper converts a set of shingles to a set of CRC-32 hashes.
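
Both steps can be sketched in plain Node without dependencies. This is an illustration, not the library's code: `toBinary32`, `crc32` and `hashShingles` are hypothetical names, and the sketch assumes the common CRC-32 variant (reflected polynomial `0xEDB88320`) applied to UTF-8 bytes, which may not be the variant node-simhash uses.

```js
// Format a number as a 32-character binary string, zero-padded on the left.
function toBinary32(n) {
  return (n >>> 0).toString(2).padStart(32, '0');
}

// Lookup table for the common CRC-32 variant (assumed, see lead-in).
var CRC_TABLE = (function () {
  var table = [];
  for (var i = 0; i < 256; i++) {
    var c = i;
    for (var k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table.push(c >>> 0);
  }
  return table;
})();

// CRC-32 of a string's UTF-8 bytes, as an unsigned 32-bit integer.
function crc32(text) {
  var bytes = Buffer.from(text, 'utf8');
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hash every shingle in a set, returning a set of 32-bit hashes.
function hashShingles(shingleSet) {
  var hashes = new Set();
  shingleSet.forEach(function (shingle) {
    hashes.add(crc32(shingle));
  });
  return hashes;
}

var hashes = hashShingles(new Set(['the quick', 'quick brown']));
hashes.forEach(function (h) {
  console.log(toBinary32(h));
});
```

Printing hashes at a fixed width of 32 characters makes it easy to line two fingerprints up and see which bit positions differ.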