chore(retrievers): replace package rank_bm25 with bm25s
#456
+25
−7
Description
Since rank_bm25 is no longer actively maintained and relevant fixes have not been merged, this PR replaces it with bm25s.
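As part of the replacement, the retriever forwards only those keyword arguments that the new retrieve method accepts (described below). A minimal sketch of that filtering pattern, using Python's inspect module, might look like the following. FakeScorer and forward_supported_kwargs are hypothetical stand-ins for illustration; they are not the PR's code and not the bm25s API.

```python
import inspect


class FakeScorer:
    """Hypothetical stand-in for a BM25 scorer; not the bm25s API."""

    def retrieve(self, query_tokens, k=10, n_threads=0):
        # Echo the arguments so the forwarding can be inspected.
        return {"query_tokens": query_tokens, "k": k, "n_threads": n_threads}


def forward_supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` declares by name."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, everything can be passed through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


scorer = FakeScorer()
# `run_manager` is not a parameter of retrieve(), so it is dropped.
extra = forward_supported_kwargs(scorer.retrieve, n_threads=2, run_manager=None)
result = scorer.retrieve(["hello", "world"], k=3, **extra)
```

One caveat of this pattern: a caller who also passes k through the extra keyword arguments would collide with the explicit k=3, so a real implementation would need to decide which value wins.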
This pull request updates the BM25Retriever implementation in libs/community/langchain_community/retrievers/bm25.py to use the bm25s package instead of rank_bm25, and refactors the code to work with the new API.
The updated code switches from rank_bm25.BM25Okapi() to bm25s.BM25(method="atire", idf_method="lucene"). method="atire" selects the standard Okapi BM25 term-frequency and length normalization (behavior aligned with rank_bm25.BM25Okapi), and idf_method="lucene" uses a Lucene-style IDF that avoids negative values. For other method / idf_method combinations, see variants.
It also improves _get_relevant_documents by dynamically forwarding only the supported keyword arguments to the retrieve method of the new BM25 implementation, enabling advanced options of bm25s.BM25.retrieve() such as backend_selection and n_threads.
Relevant Issue
None yet.
Dependencies Change
No change to core dependencies; bm25s is not part of the core dependency set.
Backwards Compatible
Not fully backwards compatible, but the impact should be limited.
The constructor parameters of rank_bm25.BM25Okapi and bm25s.BM25 do not fully overlap. If users were passing parameters intended for rank_bm25.BM25Okapi via bm25_params in BM25Retriever.from_documents() or BM25Retriever.from_texts(), any parameters that bm25s.BM25 does not accept will now cause errors.
Reference Version