Embeddings

The numerical fingerprint of meaning. Every 'AI search' and 'AI that understands your data' pitch runs on this — and most vendors use the same off-the-shelf model you could buy yourself.

Data & Infrastructure

The Technical Definition

An embedding is a list of numbers (typically 768, 1536, or 3072 of them) that represents the meaning of a piece of text, an image, or any other input. The model converts “customer churn analysis” into a long vector of decimals, and converts “why are clients leaving” into a different vector that sits close to the first one. The smaller the distance between two vectors, the more similar their meaning.

This is how machines compare meaning instead of comparing keywords. It’s the math underneath semantic search, RAG, recommendations, clustering, and “AI that understands your documents.”
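To make “distance between vectors” concrete, here is a minimal sketch of cosine similarity, one common way to score how close two embeddings are. The four-number vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, not from hand-typed values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors standing in for real embeddings.
churn_analysis = [0.9, 0.1, 0.3, 0.0]   # "customer churn analysis"
clients_leaving = [0.8, 0.2, 0.4, 0.1]  # "why are clients leaving"
lunch_menu = [0.0, 0.9, 0.0, 0.8]       # an unrelated phrase

print(cosine_similarity(churn_analysis, clients_leaving))  # close to 1
print(cosine_similarity(churn_analysis, lunch_menu))       # much lower
```

The two churn-related phrases score near 1 and the unrelated one scores near 0. That comparison is the whole trick behind “comparing meaning.”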

What This Actually Means for Your Business

Every vendor pitching AI search, AI knowledge management, or AI customer support is using embeddings under the hood. When they say their system “understands the intent behind your customer’s question,” what’s actually happening is: the customer’s question gets converted to an embedding, your knowledge base was already converted to embeddings, and the system returns the closest matches by mathematical distance.

That’s it. That’s the magic.
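The retrieval step described above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: the knowledge base entries, the three-number vectors, and the `top_matches` helper are all hypothetical, and the vectors are typed in by hand where a real system would call an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def top_matches(query_vector, index, k=2):
    """Return the k (text, score) pairs whose vectors are closest to the query."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical knowledge base, embedded ahead of time (toy 3-d vectors).
knowledge_base = [
    ("How to reset your password", [0.9, 0.1, 0.0]),
    ("Refund policy for annual plans", [0.1, 0.9, 0.2]),
    ("Updating billing details", [0.2, 0.7, 0.6]),
]

# Stand-in for the embedding of "I can't log in to my account".
question_vector = [0.8, 0.2, 0.1]

for text, score in top_matches(question_vector, knowledge_base):
    print(f"{score:.2f}  {text}")
```

The password article ranks first because its vector points in nearly the same direction as the question's. Production systems usually swap this brute-force loop for a vector database with approximate nearest-neighbor search, but the idea is the same.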

Here’s what vendors won’t volunteer: most of them are using the same handful of embedding models everyone else uses. That means OpenAI’s text-embedding-3 family, Cohere’s Embed v3 models, Voyage’s models, or an open-source option like BGE or E5. The model that turns your text into numbers is almost certainly not proprietary. It’s an API call costing fractions of a cent.

The differentiation isn’t the embedding model. It’s what you embed, how you chunk your documents before embedding them, how you store the vectors, and how you retrieve them. Those are content decisions and engineering decisions — not AI decisions.

The cost trap nobody warns you about: embeddings have to be regenerated. Change models or change your chunking strategy and every vector has to be rebuilt, because vectors produced by different models can’t be compared with each other. Update a document and at least that document’s chunks need re-embedding. If you’ve embedded ten million chunks and the vendor announces a better model, re-embedding everything costs real money and real time. Some companies are sitting on embedding pipelines they’re afraid to upgrade because the migration would take weeks.
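A back-of-envelope estimate makes the re-embedding bill easier to reason about. The sketch below is just arithmetic: the `reembedding_estimate` helper is hypothetical, and every input (tokens per chunk, price per million tokens, throughput) is a placeholder to replace with your vendor's actual numbers, not a quoted price or rate limit.

```python
def reembedding_estimate(num_chunks, tokens_per_chunk,
                         price_per_million_tokens, tokens_per_minute):
    """Rough API cost (in the price's currency) and hours to re-embed a corpus."""
    total_tokens = num_chunks * tokens_per_chunk
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    hours = total_tokens / tokens_per_minute / 60
    return cost, hours

# All four inputs are placeholders, not real prices or rate limits.
cost, hours = reembedding_estimate(
    num_chunks=10_000_000,
    tokens_per_chunk=500,
    price_per_million_tokens=0.10,
    tokens_per_minute=1_000_000,
)
print(f"~{cost:,.0f} in API fees, ~{hours:,.0f} hours of throughput")
```

Note what this leaves out: re-chunking, rebuilding the vector index, re-running quality checks, and the engineering time to coordinate the cutover. Those are typically where a migration stretches from hours into weeks.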

Reality Check

What the vendor says: “Our proprietary embedding model captures the unique semantics of your industry.”

What that means in practice: They’re most likely calling the same off-the-shelf embedding API as everyone else, possibly with a thin fine-tuning layer on top. The “industry-specific” part is usually prompt engineering and document curation. Ask them to name the base model. If they hedge, that’s your answer.

What Operators Actually Do

The teams getting real value from embeddings start with the unsexy questions. What documents are worth embedding in the first place? How do we chunk a 200-page policy manual so retrieval returns useful passages instead of fragments? Do we embed the document and the title separately? Do we embed metadata alongside content?

These decisions matter ten times more than which embedding model you pick. A great embedding model on badly chunked documents returns garbage. A mediocre embedding model on well-structured chunks returns useful results.
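As one illustration of what a chunking decision looks like, here is a minimal sketch of fixed-size chunking with overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. The `chunk_words` helper, the 200-word size, and the 40-word overlap are assumptions chosen for the example, not recommendations; many pipelines split on sentences, headings, or tokens instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Hypothetical stand-in for a long policy manual: 500 numbered words.
manual = " ".join(f"word{i}" for i in range(500))
chunks = chunk_words(manual, chunk_size=200, overlap=40)
print(len(chunks), [len(c.split()) for c in chunks])
```

Changing either parameter changes every chunk boundary, which is why a new chunking strategy means re-embedding the whole corpus.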

Smart teams also benchmark before they commit. They take 50 real queries from their actual users, run them through three different embedding models, and measure which one returns the right document in the top 5 results. The winner is rarely the most expensive one. Sometimes the cheapest open-source model wins for a specific domain because it was trained on similar text.
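The benchmark described above boils down to one metric: for what fraction of queries does the right document show up in the top k results? A minimal sketch follows. The `hit_rate_at_k` helper, the document IDs, and the two-number vectors for “model A” and “model B” are all invented for illustration; in a real test the vectors would come from each candidate model and the expected answers from your own users' queries.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def hit_rate_at_k(query_vectors, expected_doc_ids, doc_vectors, k=5):
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = 0
    for query_vector, expected_id in zip(query_vectors, expected_doc_ids):
        ranked = sorted(
            doc_vectors,  # iterating a dict yields its keys (the doc IDs)
            key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
            reverse=True,
        )
        if expected_id in ranked[:k]:
            hits += 1
    return hits / len(query_vectors)

# Toy data: the same documents and expected answers, embedded by two made-up models.
docs = {"refunds": [1.0, 0.0], "passwords": [0.0, 1.0], "billing": [0.7, 0.7]}
expected = ["refunds", "passwords"]
model_a_queries = [[0.9, 0.1], [0.1, 0.9]]
model_b_queries = [[0.9, 0.1], [0.6, 0.5]]

print("model A:", hit_rate_at_k(model_a_queries, expected, docs, k=1))
print("model B:", hit_rate_at_k(model_b_queries, expected, docs, k=1))
```

In this toy setup model A finds the right document for both queries and model B misses one. With 50 real queries and k=5, the same function gives you a single number per model to compare against its price.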

The Questions to Ask

  1. What embedding model is actually generating these vectors? If they can’t name it (text-embedding-3-large, embed-multilingual-v3.0, BGE-M3, etc.), they’re either hiding it or don’t know. Either is a problem.

  2. What’s the cost and time to re-embed everything if we switch models? The lock-in isn’t the model. It’s the millions of vectors already sitting in your database, which only work with the model that produced them.

  3. How are documents chunked before embedding? Fixed-size chunks, semantic chunks, or sentence-level? This decision determines whether retrieval returns useful context or random fragments. It’s the most important call in the whole system, and it has nothing to do with AI.
