Hybrid Search
Keyword search and semantic search fused together. The default 2026 retrieval pattern in any serious RAG system. If your vendor only runs vector search, that's a red flag.
The Technical Definition
Hybrid search combines two retrieval methods on the same query and merges the results. The first method is lexical — typically BM25, the keyword-ranking algorithm that’s powered enterprise search since the 1990s. The second is semantic — vector similarity over embeddings, the method behind every “AI search” pitch since 2023.
Each method scores documents independently. The two ranked lists are then fused using an algorithm like Reciprocal Rank Fusion (RRF) or a weighted sum, producing a single ranked list that catches what either method alone would miss.
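To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It assumes each retriever has already returned a ranked list of document IDs. The document IDs and the function name are made up for illustration, and k=60 is the constant conventionally used with RRF, not something tuned for any particular system.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both methods rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for one query from each retriever.
bm25_hits = ["doc_part_4471", "doc_manual", "doc_faq"]
vector_hits = ["doc_troubleshooting", "doc_manual", "doc_part_4471"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only looks at rank positions, not raw scores, which is why it works without any score normalization: BM25 scores and cosine similarities live on different scales, and ranks sidestep that problem.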
What This Actually Means for Your Business
The pitch you’ve been getting since 2023 is that semantic search replaces keyword search. It doesn’t. It complements it. By 2026, the engineering teams running serious RAG systems in production — at companies like Bloomberg, Klarna, and most of the financial services firms with public AI deployments — almost all run hybrid retrieval. Vector-only is a thing you build in a hackathon. It’s not what you ship to operators.
The reason is simple. Semantic search wins when users phrase queries in natural language and the relevant document uses different vocabulary. Keyword search wins when users search for specific identifiers — part numbers, contract IDs, error codes, proper nouns — that the embedding model doesn’t have strong representations for. Production search systems see both query types every day. Optimizing for one and ignoring the other ships a product that fails on half the queries that matter.
There’s a vendor pattern worth flagging. If a vendor’s pitch leans hard on “we use AI-native vector search” without mentioning BM25 or hybrid retrieval, they’ve usually built a thin wrapper over a vector database (Pinecone, Weaviate, Qdrant) and called it a search engine. Ask them what their system does when an operator types in a part number that wasn’t in the training data for the embedding model. If they don’t have a coherent answer, you’re looking at a demo, not a deployment.
The cost of hybrid is small. Both retrieval methods index the same corpus, and the keyword side is an inverted index that is cheap to build and maintain next to the vector index. The fusion step adds milliseconds, not seconds. The quality lift on real queries is meaningful: published benchmarks commonly report a 10–20% improvement in retrieval recall over vector-only on enterprise document sets, though the exact figure depends on the corpus and the query mix, so measure it on your own data.
Reality Check
What the vendor says: “We use vector search to find what users actually mean, not just what they typed.”
What that means in practice: Their system handles paraphrased queries well and breaks on specific identifiers. The first time a sales rep searches for a contract number and gets the wrong customer, you’ll wish the vendor had run BM25 alongside the vectors.
What Operators Actually Do
Companies running RAG in production almost always converge on the same architecture. BM25 retrieves the top 50–100 candidates by keyword match. Vector search retrieves the top 50–100 candidates by semantic similarity. The two lists get fused (usually RRF), and a re-ranker (Cohere Rerank or BGE) reorders the combined top candidates by deeper relevance scoring. The final top 5–10 go to the LLM as context.
That stack, hybrid retrieval plus re-ranking, is the de facto pattern in 2026. Teams that ship it get better retrieval, and therefore better answers, than teams running vector-only with the same model and the same corpus. The difference is the retrieval pipeline, not the LLM.
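The architecture described above can be sketched as one function. This is an illustration under stated assumptions, not a reference implementation: the two search functions and the re-ranker are passed in as plain callables, and the stand-ins at the bottom are hypothetical stubs that return canned results. In a real system they would wrap your BM25 index, your vector index, and a re-ranking model.

```python
from collections import defaultdict
from typing import Callable, List, Sequence

# Assumed interfaces (hypothetical, for illustration only):
Retriever = Callable[[str, int], List[str]]            # (query, top_k) -> ranked doc IDs
Reranker = Callable[[str, Sequence[str]], List[str]]   # (query, candidates) -> reordered doc IDs

def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion over lists of doc IDs."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_retrieve(query: str, keyword_search: Retriever, vector_search: Retriever,
                    rerank: Reranker, candidates_per_method: int = 100,
                    rerank_pool: int = 50, final_k: int = 5) -> List[str]:
    keyword_hits = keyword_search(query, candidates_per_method)  # lexical candidates
    vector_hits = vector_search(query, candidates_per_method)    # semantic candidates
    fused = rrf([keyword_hits, vector_hits])                     # one merged ranking
    reranked = rerank(query, fused[:rerank_pool])                # deeper relevance pass
    return reranked[:final_k]                                    # context for the LLM

# Hypothetical stand-ins so the sketch runs end to end.
def fake_keyword_search(query, top_k):
    return ["contract_8812", "contract_8813", "pricing_sheet"][:top_k]

def fake_vector_search(query, top_k):
    return ["renewal_terms", "contract_8812", "pricing_sheet"][:top_k]

def fake_rerank(query, candidates):
    # Identity stand-in: a real re-ranker scores each (query, document) pair.
    return list(candidates)

context_ids = hybrid_retrieve("contract 8812 renewal terms", fake_keyword_search,
                              fake_vector_search, fake_rerank, final_k=3)
print(context_ids)
```

Passing the components in as callables keeps the pipeline independent of any one vendor: you can swap the search backend or the re-ranker and rerun the same evaluation.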
The other operator move: they evaluate hybrid weighting on their own queries. The right ratio between BM25 and vector scores depends on the query mix. A legal team with lots of citation lookups weights BM25 higher. A customer support team handling paraphrased natural-language tickets weights vector higher. The default 50/50 fusion is a starting point, not an answer.
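Tuning the weighting can be as simple as sweeping one number over a small labeled query set. The sketch below assumes a weighted-sum fusion where alpha is the weight on BM25 and (1 - alpha) is the weight on vector similarity. Raw BM25 scores and cosine similarities are on different scales, so each is min-max normalized per query before mixing. The two evaluation queries, their scores, and their relevance labels are invented to resemble citation lookups; real tuning needs your own queries.

```python
def min_max(scores):
    """Rescale one retriever's scores for one query into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fusion(bm25_scores, vector_scores, alpha):
    """Rank docs by alpha * BM25 + (1 - alpha) * vector, after normalization."""
    b, v = min_max(bm25_scores), min_max(vector_scores)
    combined = {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
                for d in set(b) | set(v)}
    return sorted(combined, key=lambda d: (-combined[d], d))

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

# Hypothetical labeled queries: (BM25 scores, vector scores, relevant doc IDs).
eval_set = [
    ({"case_123": 12.0, "case_124": 7.0, "memo": 2.0},
     {"memo": 0.82, "case_124": 0.80, "case_123": 0.60},
     {"case_123"}),
    ({"statute_9": 9.0, "brief": 3.0},
     {"brief": 0.9, "statute_9": 0.7},
     {"statute_9"}),
]

def mean_recall(alpha, k=1):
    return sum(recall_at_k(weighted_fusion(b, v, alpha), rel, k)
               for b, v, rel in eval_set) / len(eval_set)

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
best_alpha = max(alphas, key=mean_recall)  # first alpha with the highest mean recall@1
print(best_alpha, {a: mean_recall(a) for a in alphas})
```

On this toy set the identifier-heavy queries push the best weight toward BM25, which mirrors the legal-team case above. A support-ticket query set would likely push it the other way.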
The Questions to Ask
- Is your retrieval hybrid or vector-only? If the answer is vector-only, ask why. The answer is almost never good. Vector-only systems fail on identifier queries that operators type every day.
- How is the fusion tuned for our query mix? The right balance between keyword and semantic depends on what your users actually search for. Has anyone measured your query patterns, or is the system shipped with default weights?
- What’s the re-ranking step on top of retrieval? Hybrid retrieval gets you the right candidates in the top 50. A re-ranker reorders those into the right top 5. Without re-ranking, the LLM gets noisier context and produces worse answers.