A lightweight search engine based on the Okapi BM25 algorithm.
To use this search engine, you need Node.js installed on your machine. Install the package from npm:
```shell
npm i bm25-se
```
Alternatively, clone this repository and run `npm install` to install its dependencies.
```js
const { SearchEnginee, TokenizerEn } = require('bm25-se')

// create a new search engine instance
const searchEngine = new SearchEnginee({ tokenizer: new TokenizerEn() })

// add documents to the index
searchEngine.addDocument("this is an example document")
searchEngine.addDocument("this is another document")
searchEngine.addDocument("yet document")

// search for documents that match the query "example document"
const searchResults = searchEngine.search("example document")
console.log(searchResults) // should output a list of documents that match the query, ordered by relevance
```
To run the demo locally:
```shell
git clone https://github.com/Selenium39/bm25-se.git
cd bm25-se
npm install
npm run web
```
#### SearchEnginee(options)
Creates a new instance of the search engine.
- options (optional): An object with the following properties:
  - tokenizer (optional): A tokenizer object used for tokenizing the documents. Defaults to `new TokenizerEn()`.
  - k1 (optional): The BM25 term-frequency saturation parameter. Higher values let repeated occurrences of a term keep adding to the score. Defaults to 1.2.
  - b (optional): The BM25 document-length normalization parameter, from 0 (no normalization) to 1 (full normalization). Defaults to 0.75.
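
To make the roles of `k1` and `b` concrete, here is a minimal sketch of a BM25 per-term score. It is an illustration of the standard formula, not code taken from bm25-se; the function name `bm25TermScore`, its argument names, and the Lucene-style IDF variant are all assumptions.

```js
// Illustrative BM25 term score; not taken from bm25-se's source.
// tf: occurrences of the term in the document
// docLen / avgDocLen: this document's length and the average length, in tokens
// n: number of documents containing the term; N: total number of documents
function bm25TermScore({ tf, docLen, avgDocLen, n, N, k1 = 1.2, b = 0.75 }) {
  // Lucene-style IDF; the "1 +" inside the log keeps it non-negative.
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  // b scales how strongly long documents are penalized.
  const lengthNorm = 1 - b + b * (docLen / avgDocLen);
  // k1 controls how quickly repeated occurrences stop adding score.
  return (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

const base = { tf: 3, docLen: 100, avgDocLen: 100, n: 10, N: 1000 };
const shortDoc = bm25TermScore({ ...base, docLen: 50 });
const longDoc = bm25TermScore({ ...base, docLen: 200 });

console.log(shortDoc > longDoc); // true: same tf counts for more in a shorter document
console.log(
  bm25TermScore({ ...base, docLen: 200, b: 0 }) === bm25TermScore({ ...base, b: 0 })
); // true: with b = 0, document length is ignored
```

With the default `b = 0.75`, the same term frequency scores higher in a shorter document; setting `b` to 0 turns length normalization off.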
#### reset()
Resets the search engine, removing all indexed documents and terms.
#### addDocument(document)
Indexes a new document in the search engine.
- document: A string representing the content of the document to index.
#### search(query)
Searches the indexed documents for the given query string and returns a list of matching documents, ordered by relevance.
- query: A string representing the search query.
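
As a rough picture of what indexing and ranked search involve, the sketch below builds a tiny in-memory BM25 index. `TinyBm25Index` is a hypothetical stand-in written for this example; its internals and the `{ text, score }` result shape are assumptions and may differ from what bm25-se actually does or returns.

```js
// Hypothetical stand-in for illustration; bm25-se's internals may differ.
class TinyBm25Index {
  constructor({ k1 = 1.2, b = 0.75 } = {}) {
    this.k1 = k1;
    this.b = b;
    this.docs = []; // one { text, length, counts } entry per document
    this.docFreq = new Map(); // term -> number of documents containing it
  }

  // Lowercase, then split on anything that is not a letter or digit.
  tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  addDocument(text) {
    const tokens = this.tokenize(text);
    const counts = new Map();
    for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
    for (const t of counts.keys()) {
      this.docFreq.set(t, (this.docFreq.get(t) || 0) + 1);
    }
    this.docs.push({ text, length: tokens.length, counts });
  }

  search(query) {
    const N = this.docs.length;
    if (N === 0) return [];
    const avgLen = this.docs.reduce((sum, d) => sum + d.length, 0) / N;
    const terms = [...new Set(this.tokenize(query))];
    return this.docs
      .map((doc) => {
        let score = 0;
        for (const term of terms) {
          const tf = doc.counts.get(term) || 0;
          if (tf === 0) continue;
          const n = this.docFreq.get(term);
          const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
          const norm = 1 - this.b + this.b * (doc.length / avgLen);
          score += (idf * tf * (this.k1 + 1)) / (tf + this.k1 * norm);
        }
        return { text: doc.text, score };
      })
      .filter((r) => r.score > 0) // keep only documents that match a query term
      .sort((x, y) => y.score - x.score); // highest score first
  }
}

const index = new TinyBm25Index();
index.addDocument("this is an example document");
index.addDocument("this is another document");
index.addDocument("yet document");

const results = index.search("example document");
console.log(results[0].text); // "this is an example document"
```

The document containing the rare term "example" ranks first, because a term that appears in every document ("document") carries very little weight.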
#### TokenizerEn
A tokenizer object for tokenizing English text.
#### TokenizerZh
A tokenizer object for tokenizing Chinese text.
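
Chinese needs its own tokenizer because words are not separated by spaces. The sketch below shows the difference using the built-in `Intl.Segmenter` (available in Node.js 16 and later). This is a swapped-in technique for illustration only; it is an assumption that it resembles what `TokenizerZh` does, and the sample sentence is made up.

```js
// Illustration only: Intl.Segmenter is a built-in API and is not
// necessarily how TokenizerZh works internally.
const text = "我喜欢搜索引擎";

// Whitespace splitting finds no word boundaries in Chinese text.
const bySpace = text.split(/\s+/).filter(Boolean);

// Intl.Segmenter performs locale-aware word segmentation.
const segmenter = new Intl.Segmenter("zh", { granularity: "word" });
const words = [...segmenter.segment(text)]
  .filter((s) => s.isWordLike) // drop punctuation and whitespace segments
  .map((s) => s.segment);

console.log(bySpace.length); // 1: the whole sentence is treated as one token
console.log(words); // the sentence split into word-like segments
```

The exact segments depend on the ICU data shipped with the runtime, so the output of the second `console.log` may vary between Node.js versions.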