Bop, an ultra fast Boyer-Moore parser/matcher, optimized for string and buffer patterns (<= 255 bytes).
npm install bop



!NODE VERSION




!NPM YEARLY


* __Bop__ is a __very fast Boyer-Moore parser__ for string or buffer patterns.
* It is optimized for using with pattern strings/buffers <= 255 bytes.
* It is __ideal__, for example, to parse streams like __multipart/form-data__ ones, in which pattern/boundary length < ~70 bytes).
> Given a m-length pattern and n-length data, and σ-length alphabet ( σ = 256 ):
- it performs the comparisons from right to left.
- preprocessing phase in __O(m+σ)__ time and space complexity.
- searching phase in __O(m*n)__ time complexity.
- __3*n__ text character comparisons in the worst case when searching for a non periodic pattern.
- __O(n/m)__ best performance.
> See __Lecroq__ for reference and also __Qap__, a QuickSearch parser.
bash
$ npm install bop [-g]
`> __require__:
`javascript
var Bop = require( 'bop' );
`$3
`javascript
$cd bop/
$npm test
`> __to execute a single test file simply do__:
`bash
$ node test/file-name.js
`$3
`bash
$ cd bop/
$ npm run bench
`$3
> Create an instance, using a pattern.
`javascript
Bop( Buffer pattern | String pattern )
// or
new Bop( Buffer pattern | String pattern )
`$3
> Arguments within [] are optional.
`javascript
// Change the pattern to search.
Bop#set( Buffer pattern | String pattern ) : Buffer/*
* Count matches, optionally starting from a particular index (default
* is 0). It returns an Array containing the number of matches, and the
* remaining bytes
*/
Bop#count( Buffer data [, Number start_from ] ) : Array
// Same as #count, but without counting overlapping sequences
Bop#scount( Buffer data [, Number start_from ] ) : Array
/*
* Count matches, and the max distance found between 2 matches,
* optionally starting from a particular index (default is 0).
* It returns an Array containing:
* - the number of matches
* - the max distance found between 2 matches, -1 otherwise
* - the distance from data index 0 to the first match, -1 otherwise
* - the distance from the end of the last match to the end of data,
* -1 otherwise
*
* NOTE: if 0 or only 1 occurrence was found, the max distance will
* be -1 and then the resulting array will be respectvely:
* - [0, -1, -1, -1 ]
* - [1, -1, .., .. ]
*/
Bop#dist( Buffer data [, Number start_from ] ) : Array
// same as #dist, but without counting overlapping sequences
Bop#sdist( Buffer data [, Number start_from ] ) : Array
/*
* Collect all indexes of pattern occurrences.
*
* As options you can:
*
* - start parsing from a particular index
* - limit the number of results to parse
* - fill your array with resulting indexes.
*
* NOTE: use Buffers when possible (faster).
*
*/
Bop#parse( Buffer data | String data [, Number start_from [, Number limit_results [, Array my_array ] ] ] ) : Array
/*
* Strict parse, it's the same as parse, without collecting
* any overlapping sequences.
*
* Example with CRLF sequence:
*
* - bop pattern is set to: "\r\n\r\n" (CR LF CR LF)
* - data to parse is: "\r\n\r\n\r\n\r\n\r\n\r\n" (CR LF CR LF CR LF CR LF )
*
* - with Bop.parse( data ) we get 3 indexes as results: [0, 2, 4]
*
* 0 1 2 3 4 5 6 7
* -----------------------
* p: CR LF CR LF
* d: CR LF CR LF CR LF CR LF
*
* p: ----> CR LF CR LF
* d: CR LF CR LF CR LF CR LF
*
* p: ----------> CR LF CR LF
* d: CR LF CR LF CR LF CR LF
*
* - with Bop.sparse( data ) we get only 2 results: [0, 4]
*
* 0 1 2 3 4 5 6 7
* -----------------------
* p: CR LF CR LF
* d: CR LF CR LF CR LF CR LF
*
* p: ----------> CR LF CR LF
* d: CR LF CR LF CR LF CR LF
*
*/
Bop#sparse( Buffer data | String data [, Number start_from [, Number limit_results [, Array my_array ] ] ] ) : Array
`$3
`javascript
var Bop = require( 'bop' )
, pattern = 'hellofolks\r\n'
, somedata = 'hehehe' + pattern +'eheheh' + pattern
, bop = Bop( pattern )
// parse data from beginning
, results = bop.parse( somedata )
;`
> See __examples__.
#### Benchmark for a short pattern ( length <= 255 bytes )
> Parser uses 3 Buffers 256-bytes long to build shifting tables, then:
> - Pattern parsing / table creation space and time complexity is O(σ).
> - Very low memory footprint.
> - Ultra fast to preprocess pattern ( = tables creation ).
`bash
$ node bench/small-pattern-data-rate
`for default it:
> - uses a pattern string of 57 bytes/chars.
> - builds a data buffer of 700 MB in memory.
> - uses a redundancy/distance factor for pattern strings equal to 2. The bigger the value,
the lesser are occurrences of pattern string into the text buffer.
Custom Usage:
`bash
# with [testBufferSizeInMB] [distanceFactor] [aStringPattern]
$ node bench/small-pattern-data-rate.js 700 4 "that'sallfolks"
`
#### Benchmark for a big pattern ( length > 255 bytes )
> Parser uses 3 arrays to build shifting tables for big patterns, then:
> - there will be an high memory consumption, due to the use of arrays.
> - it will take a long time to preprocess pattern ( = tables creation ).
`bash
$ node bench/big-pattern-data-rate
``> - it uses a very big pattern ( 20 MBytes ).
> - it builds a data buffer of 300 MBytes, copying the same pattern 12 times.
See __bench__ dir.
> Copyright (c) 2013-present < Guglielmo Ferri : 44gatti@gmail.com >
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> 'Software'), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
> __The above copyright notice and this permission notice shall be
> included in all copies or substantial portions of the Software.__
> THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.