Semantic Search
Search by meaning instead of keywords. It's better than keyword search until it isn't — and the cases where it's worse are the ones your business cares about most.
The Technical Definition
Semantic search retrieves documents based on meaning rather than literal keyword matches. Each document and each query gets converted into a vector — a list of numbers (an “embedding”) that represents its meaning in a high-dimensional space. The search engine then returns documents whose vectors sit closest to the query’s vector by cosine similarity or a related distance measure.
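To make the definition concrete, here is a minimal sketch of ranking by cosine similarity. The three-number vectors and document names are invented for illustration; a real system would get its vectors from an embedding model, and those vectors have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

# Made-up vectors standing in for real embeddings (assumption for illustration)
doc_vecs = {
    "how-customers-end-their-subscription": [0.9, 0.1, 0.2],
    "payment-authorization-failures": [0.1, 0.9, 0.3],
    "office-holiday-schedule": [0.2, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedding of "cancellation policy"

ranking = rank_by_similarity(query_vec, doc_vecs)
print(ranking[0])  # the subscription document ranks first
```

Nothing in this ranking looks at the words themselves. Only the positions of the vectors matter, which is both the strength and the weakness discussed in the rest of this entry.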
The premise: a query for “cancellation policy” should return a document titled “How customers end their subscription” even though the two share zero keywords. Keyword search misses that. Semantic search catches it.
What This Actually Means for Your Business
Every vendor pitching AI-powered search, document Q&A, or “ask your data” is selling you semantic search underneath. It’s the retrieval engine behind almost every RAG system in production.
It works well when users phrase questions differently from how documents are written — which is most of the time. A customer asking “Why did my card get declined?” finds a document titled “Payment authorization failures.” An employee asking “How do I fire someone?” finds the involuntary termination policy. That’s real value, and it’s why semantic search took over enterprise search projects between 2023 and 2026.
Here’s what vendors don’t tell you. Semantic search is worse than keyword search on a specific class of query: exact-match lookups. Product SKUs. Contract numbers. Error codes. Part numbers. Account IDs. Names of obscure people or products that didn’t appear in the embedding model’s training data. When a field engineer types “ERR-4471-X” into your knowledge base, BM25 (the old keyword algorithm) returns the right document instantly. Semantic search returns a fuzzy approximation of documents that talk about errors generally — and the right one might be on page three.
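The keyword side of that comparison can be sketched in a few lines. This is a simplified version of the Okapi BM25 scoring formula, not a production implementation; the knowledge-base ids and texts are hypothetical, and k1=1.5 and b=0.75 are commonly used defaults rather than tuned values.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace split keeps identifiers like "err-4471-x" as a single token
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        term_counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_counts[term]
            if tf == 0:
                continue  # term absent: contributes nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores

# Hypothetical knowledge-base articles
docs = {
    "kb-101": "Resolving ERR-4471-X on the pump controller after a firmware update",
    "kb-102": "General guide to reading error codes and warning lights",
    "kb-103": "Pump controller firmware update procedure",
}
scores = bm25_scores("ERR-4471-X", docs)
best = max(scores, key=scores.get)
print(best)
```

Only the article that literally contains the code gets a non-zero score. The rare token either matches or it does not, which is exactly the behavior an exact-match lookup needs.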
This is why “vector-only” retrieval is a red flag. The serious 2026 RAG stacks all run hybrid: semantic plus keyword, fused together. If a vendor’s only retrieval mode is vector similarity, they’ve optimized the demo and ignored the queries that actually matter to your operators.
The other failure mode: semantic search inherits the biases of the embedding model. If the model was trained mostly on consumer English, it does worse on legal language, clinical terminology, or industrial parts catalogs. The model has no precise sense of what “Section 7.3(b) indemnification carveout” means, so it can’t place the phrase reliably in vector space.
Reality Check
What the vendor says: “Our semantic search understands what users actually mean.”
What that means in practice: It does, for paraphrased natural-language queries. It doesn’t, for the SKU numbers, ticket IDs, and proper nouns that make up half of what your operators search for. You need both retrieval modes running together, or you’ll watch your support team go back to Ctrl+F.
What Operators Actually Do
Companies getting real value from semantic search treat it as one retrieval mode, not the retrieval mode. They run it alongside BM25 keyword search and fuse the result lists — that’s hybrid search, the default pattern in serious RAG systems.
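One common way to fuse the two result lists is reciprocal rank fusion (RRF). The source doesn't say which fusion method any particular system uses, so treat this as one reasonable option rather than the standard. The document ids are hypothetical, and k=60 is a commonly used constant, not a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. Documents found by both retrievers add up
    two contributions and rise toward the top.
    """
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical top results from each retrieval mode for the same query
keyword_hits = ["kb-101", "kb-207", "kb-330"]   # e.g. from BM25
semantic_hits = ["kb-102", "kb-101", "kb-415"]  # e.g. from vector similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused[0])  # kb-101, the only document both modes returned
```

RRF works on ranks rather than raw scores, so there is no need to put BM25 scores and cosine similarities on the same scale before combining them.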
They also evaluate retrieval quality on their own queries before deployment. A common test: pull 100 real questions from the support queue or sales rep Slack channel, hand-label which document should win for each, and measure whether the system actually returns it in the top three. Most teams skip this step and ship a system that fails silently — the AI generates a confident-sounding answer from the wrong source document, and nobody notices until a customer does.
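The test described above fits in a short script. This sketch assumes the labeled set is a list of (question, expected document id) pairs; `fake_search` and its canned results are hypothetical stand-ins for whatever retrieval system is under evaluation.

```python
def hit_rate_at_k(labeled_queries, search, k=3):
    """Fraction of queries whose hand-labeled document appears in the top k."""
    hits = 0
    misses = []
    for query, expected_doc in labeled_queries:
        top_k = search(query)[:k]
        if expected_doc in top_k:
            hits += 1
        else:
            misses.append((query, expected_doc, top_k))
    return hits / len(labeled_queries), misses

# Hypothetical stand-in for the real retrieval system: canned results per query
canned_results = {
    "Why did my card get declined?": ["kb-12", "kb-40", "kb-07"],
    "How do I fire someone?": ["kb-88", "kb-19", "kb-03", "kb-55"],
    "ERR-4471-X": ["kb-102", "kb-330", "kb-415", "kb-101"],
}

def fake_search(query):
    return canned_results.get(query, [])

# Hand-labeled pairs: (real user question, document that should win)
labeled = [
    ("Why did my card get declined?", "kb-12"),
    ("How do I fire someone?", "kb-19"),
    ("ERR-4471-X", "kb-101"),
]

rate, misses = hit_rate_at_k(labeled, fake_search, k=3)
print(f"top-3 hit rate: {rate:.0%}")
for query, expected_doc, got in misses:
    print(f"MISS {query!r}: wanted {expected_doc}, got {got}")
```

The list of misses is usually more useful than the headline number: it shows which kinds of query fail, such as the exact-match error code in this toy data, where the right document sits in fourth place.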
The teams that do best also pick their embedding model deliberately. Generic embeddings (OpenAI’s text-embedding-3 models, Cohere’s Embed v3) work fine for general English. For specialized domains such as legal, clinical, or manufacturing parts catalogs, fine-tuned or domain-specific embeddings often move retrieval quality more than any other lever.
The Questions to Ask
- Is this vector-only or hybrid retrieval? If a vendor only runs semantic search without BM25 alongside it, ask what happens when an operator searches for a specific contract number or part SKU. The answer tells you how seriously they took the failure cases.
- Which embedding model are you using, and was it trained on data like ours? Generic embeddings are fine for general business documents. They’re worse for specialized vocabulary. If your domain is technical, ask about fine-tuning.
- How will we measure whether retrieval is actually finding the right documents? Without an evaluation set tied to your real queries, you’ll never know whether the AI is hallucinating from the wrong source. Who builds that test set, and how often is it rerun?