While dense retrievers, powered by sophisticated embedding models, are adept at understanding semantic relationships and user intent, they can sometimes miss the mark on queries demanding exact lexical matches. For instance, a model might grasp the general topic of a query but fail to prioritize documents containing a very specific, yet less common, product code or technical term mentioned by the user. Conversely, traditional sparse retrievers like BM25 excel at finding these exact terms but lack the broader contextual understanding to retrieve semantically similar but lexically different content. They operate on keyword matching, which, while precise, can be brittle when faced with synonyms, paraphrases, or complex natural language queries.
Hybrid search offers a pragmatic and often highly effective solution by combining the strengths of both dense and sparse retrieval techniques. The objective is to create a retrieval pipeline that is more comprehensive and reliable, capturing both the semantic depth offered by embeddings and the keyword precision of sparse methods. This synergistic approach helps ensure that the context provided to the generator model is as relevant and complete as possible.
Combining dense and sparse retrievers brings several advantages to your RAG system: better recall on queries that mix semantic intent with exact terms, more robust handling of rare identifiers and domain-specific jargon, and a more relevant, complete context for the generator model.
Consider a scenario in a financial RAG application. A user might ask, "What are the implications of regulation XYZ on Q3 earnings for tech companies?" A dense retriever could find documents discussing earnings and tech companies generally. A sparse retriever would ensure documents explicitly mentioning "regulation XYZ" are surfaced. A hybrid approach would ideally prioritize documents that satisfy both aspects of the query.
The most prevalent method for implementing hybrid search is score fusion (also known as late fusion). In this setup, both the dense and sparse retrievers process the input query independently. Each produces a list of candidate documents along with their respective relevance scores. The core challenge then lies in intelligently merging these two sets of results into a single, re-ranked list.
A common architecture for hybrid search involves parallel retrieval followed by a fusion step to combine and re-rank results.
Several techniques exist for fusing these scores:
Weighted Sum: This is a straightforward approach where the final score for a document $d$ is a weighted combination of its normalized scores from the dense retriever, $S_{dense}(d)$, and the sparse retriever, $S_{sparse}(d)$:

$$S_{hybrid}(d) = w_{dense} \cdot \text{norm}(S_{dense}(d)) + w_{sparse} \cdot \text{norm}(S_{sparse}(d))$$

Here, $w_{dense}$ and $w_{sparse}$ are weights that sum to 1 (e.g., $w_{dense}=0.6$, $w_{sparse}=0.4$), reflecting the perceived importance of each retriever. The function $\text{norm}(\cdot)$ represents score normalization, a significant step discussed later. The choice of weights is often empirical and may require tuning based on your specific dataset and query patterns.
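A minimal sketch of weighted-sum fusion, assuming each retriever returns a dictionary mapping document IDs to raw scores; the weights and the min-max normalization used here are illustrative choices, not the only options:

```python
def min_max_norm(scores):
    """Scale a dict of {doc_id: score} into [0, 1]; constant scores map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

def weighted_sum_fusion(dense_scores, sparse_scores, w_dense=0.6, w_sparse=0.4):
    """Fuse normalized dense and sparse scores with a weighted sum."""
    dense_n = min_max_norm(dense_scores)
    sparse_n = min_max_norm(sparse_scores)
    all_docs = set(dense_n) | set(sparse_n)
    fused = {
        d: w_dense * dense_n.get(d, 0.0) + w_sparse * sparse_n.get(d, 0.0)
        for d in all_docs
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Example with hypothetical scores from each retriever.
dense = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}
sparse = {"doc_b": 14.2, "doc_d": 11.7}
print(weighted_sum_fusion(dense, sparse))
```

Documents seen by only one retriever simply receive a zero contribution from the other, so they can still surface if their single score is strong enough.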
Reciprocal Rank Fusion (RRF): RRF offers an elegant way to combine rankings without needing to worry too much about the absolute score values or their distributions, which can vary wildly between different retrieval systems. For each document d, its RRF score is calculated by summing the reciprocal of its rank in each retriever's result list.
$$\text{RRFScore}(d) = \sum_{i \in \text{Retrievers}} \frac{1}{k + \text{rank}_i(d)}$$

If a document is not found by a retriever, its rank for that retriever can be considered infinite (or practically, a very large number), making its contribution to the sum negligible. The constant $k$ (commonly 60) helps to down-weight the influence of documents that are ranked very highly by only one retriever but poorly by others. Documents consistently ranked well across multiple systems receive higher RRF scores.
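Because RRF needs only rank positions, not raw scores, it is simple to implement. A minimal sketch, assuming each retriever returns an ordered list of document IDs (most relevant first); the example rankings are illustrative:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    ranked_lists: iterable of lists, each ordered from most to least relevant.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Documents missing from a list simply contribute nothing for that retriever.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```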
Two-Stage Retrieval (Cascade): Another approach involves a cascaded, or two-stage, process: a fast first-stage retriever (often the sparse one) pulls a broad candidate set from the full corpus, and a second stage (a dense model or re-ranker) re-scores only those candidates, as sketched below.
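A compact sketch of the cascade idea under that assumption; `sparse_retriever` and `dense_scorer` are hypothetical callables standing in for whatever retrieval components you actually use, and the candidate counts are illustrative:

```python
def cascade_retrieve(query, sparse_retriever, dense_scorer,
                     first_stage_k=100, final_k=10):
    """Two-stage retrieval: broad sparse candidate generation, then dense re-scoring."""
    # Stage 1: fast lexical recall over the full corpus.
    candidates = sparse_retriever(query, top_k=first_stage_k)  # list of doc IDs

    # Stage 2: score only the candidates with the (slower) dense model.
    rescored = [(doc_id, dense_scorer(query, doc_id)) for doc_id in candidates]
    rescored.sort(key=lambda kv: kv[1], reverse=True)
    return rescored[:final_k]
```

The trade-off is that anything missed by the first stage can never be recovered by the second, so the first-stage candidate pool needs to be generous.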
The choice of fusion strategy often depends on the characteristics of your data, the types of queries you expect, and the computational resources available.
Successfully implementing hybrid search requires attention to a few important details:
Sparse retrievers (like BM25) and dense retrievers (cosine similarity of embeddings) produce scores on different scales and with different distributions. BM25 scores can range widely, while cosine similarity is typically bounded between -1 and 1 (or 0 and 1 for positive embeddings). Directly adding these scores in a weighted sum without normalization can lead to one retriever's scores dominating the other's, irrespective of the chosen weights.
Common normalization techniques include min-max scaling, which maps scores into $[0, 1]$ using observed bounds $S_{min}$ and $S_{max}$, and z-score standardization, which centers scores with the mean $\mu$ and scales them by the standard deviation $\sigma$.
The $S_{min}$, $S_{max}$, $\mu$, and $\sigma$ parameters for normalization should ideally be estimated from a representative set of query results for each retriever.
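A short sketch of both normalizations; the optional parameters let you pass statistics precomputed from a representative query sample instead of deriving them from the current result list alone:

```python
def min_max_normalize(scores, s_min=None, s_max=None):
    """Map a list of scores into [0, 1] with min-max scaling."""
    s_min = min(scores) if s_min is None else s_min
    s_max = max(scores) if s_max is None else s_max
    span = s_max - s_min
    return [(s - s_min) / span if span else 0.0 for s in scores]

def z_score_normalize(scores, mu=None, sigma=None):
    """Standardize a list of scores to zero mean and unit variance."""
    mu = sum(scores) / len(scores) if mu is None else mu
    if sigma is None:
        sigma = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5
    return [(s - mu) / sigma if sigma else 0.0 for s in scores]
```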
If using a weighted sum, determining the optimal values of $w_{dense}$ and $w_{sparse}$ is an important tuning decision.
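One simple, if brute-force, way to choose them is a grid search over $w_{dense}$ on a small labeled validation set, scoring each setting with a metric such as recall@k. The sketch below assumes a hypothetical `fuse_and_rank(query, w_dense)` helper (for example, built from the weighted-sum code above) and a list of (query, relevant document IDs) pairs:

```python
def tune_dense_weight(validation_set, fuse_and_rank, k=10, steps=11):
    """Grid-search w_dense in [0, 1]; w_sparse is implicitly 1 - w_dense."""
    best_w, best_recall = 0.5, -1.0
    for i in range(steps):
        w_dense = i / (steps - 1)
        hits, total = 0, 0
        for query, relevant_ids in validation_set:
            top_k = {doc_id for doc_id, _ in fuse_and_rank(query, w_dense)[:k]}
            hits += len(top_k & set(relevant_ids))
            total += len(relevant_ids)
        recall = hits / total if total else 0.0
        if recall > best_recall:
            best_w, best_recall = w_dense, recall
    return best_w, best_recall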
For most production RAG systems, starting with RRF or a well-tuned static weighted sum provides a strong baseline.
On the sparse side, libraries such as rank_bm25 provide a straightforward BM25 implementation in Python. On the dense side, options range from open-source sentence-transformer models (e.g., all-MiniLM-L6-v2, multi-qa-mpnet-base-dot-v1) to more powerful proprietary models or models you've fine-tuned on your specific domain (as discussed in the section on "Domain-Specific Fine-tuning of Embedding Models").
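To make this concrete, here is a compact sketch that wires rank_bm25 and a sentence-transformers model into the `reciprocal_rank_fusion` helper shown earlier; the corpus and query are illustrative, and both libraries must be installed for this to run:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Regulation XYZ changes reporting rules for technology firms.",
    "Q3 earnings for major tech companies beat analyst expectations.",
    "A guide to filing quarterly earnings reports.",
]
query = "implications of regulation XYZ on Q3 earnings for tech companies"

# Sparse side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
sparse_scores = bm25.get_scores(query.lower().split())
sparse_ranking = sorted(range(len(corpus)),
                        key=lambda i: sparse_scores[i], reverse=True)

# Dense side: cosine similarity between sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
dense_scores = util.cos_sim(query_emb, doc_emb)[0]
dense_ranking = sorted(range(len(corpus)),
                       key=lambda i: float(dense_scores[i]), reverse=True)

# Fuse the two rankings (reciprocal_rank_fusion defined earlier).
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```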
When evaluating a hybrid search setup, compare its retrieval quality against dense-only and sparse-only baselines on queries that are representative of your actual workload.
While powerful, hybrid search introduces some considerations: you must build and maintain two indexes, running two retrievers and a fusion step adds latency and infrastructure complexity, and the fusion itself brings extra hyperparameters (the fusion weights, the RRF constant $k$, normalization statistics) that need tuning and monitoring.
Despite these challenges, the benefits in retrieval quality often make hybrid search a worthwhile investment for production RAG systems that demand high accuracy and adaptability. By combining the lexical precision of sparse search with the semantic understanding of dense search, you create a more resilient and effective foundation for your generator model.