RAG (Retrieval-Augmented Generation)

The pattern behind every 'AI that knows your company's data' pitch. Here's when it works and what it actually costs to maintain.

Models & Architecture

The Technical Definition

RAG (Retrieval-Augmented Generation) is an architecture that pairs a search step with a language model. Instead of relying only on an LLM’s training data, a RAG system first searches an index of your company’s documents, then places the most relevant passages in the model’s context window so the generated answer is grounded in what was retrieved.
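To make the retrieve-then-generate flow concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three sample documents, the word-overlap scoring (a stand-in for the embedding or search-engine lookup a production system would typically use), and the function names `retrieve` and `build_prompt`. The model call itself is left out; the point is what ends up in the context window.

```python
import re

# Hypothetical sample documents standing in for a company knowledge base.
DOCS = {
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "Hardware carries a 12 month limited warranty.",
}

def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=2):
    """Return up to k doc IDs ranked by word overlap with the question.

    Word overlap is a toy scoring rule; real systems usually use
    embeddings or a search engine here.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in docs.items()]
    # Keep only documents that share at least one word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]

def build_prompt(question, docs, k=2):
    """Assemble the text that would be sent to the model."""
    passages = [f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(question, docs, k)]
    context = "\n".join(passages) if passages else "(no relevant passages found)"
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    print(build_prompt("How long do refunds take?", DOCS))
```

Note the empty-retrieval branch: when nothing matches, the prompt says so instead of silently sending the question with no context.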

What This Actually Means for Your Business

RAG solves a real problem: LLMs have knowledge cutoffs and don’t know your proprietary data. Every vendor pitching AI-powered customer service, internal search, or document Q&A is essentially selling you RAG.

But here’s what they don’t tell you: RAG doesn’t make your data “AI-ready” by magic. Your documents need to be structured, cleaned, and tagged. If your knowledge base is outdated or contradictory (which it usually is after five years of departmental silos), the AI will confidently cite both versions and let you figure out which is correct.

The real operational cost hits during maintenance. Your data changes. Policies get updated. Products launch and get discontinued. Someone’s old memo stays indexed and shows up in search results next to current guidance. You now own a data problem disguised as an AI problem. You’ll need someone, probably in data engineering or knowledge management, to continuously curate what gets indexed. That’s not a button. That’s a job.
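Part of that job can be automated. The sketch below flags indexed documents that nobody has reviewed recently. The `IndexedDoc` fields, the 180-day threshold, and the sample entries are all assumptions chosen for illustration, not a recommended policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class IndexedDoc:
    doc_id: str
    owner: str            # team accountable for keeping the document current
    last_reviewed: date   # last time a human confirmed it is still accurate

def find_stale(docs, today, max_age_days=180):
    """Return docs whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in docs if d.last_reviewed < cutoff]

if __name__ == "__main__":
    # Hypothetical index contents.
    docs = [
        IndexedDoc("pricing-2021-memo", "sales", date(2021, 3, 1)),
        IndexedDoc("pricing-current", "sales", date(2024, 5, 20)),
    ]
    for d in find_stale(docs, today=date(2024, 6, 1)):
        print(f"{d.doc_id}: ask {d.owner} to re-review or remove from the index")
```

A report like this only tells you what to look at. Someone still has to decide whether the old memo gets updated, archived, or pulled from the index.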

Latency also matters. RAG systems have to search before they respond. An extra 500 ms of retrieval overhead might be fine for an internal search tool. For a customer-facing chatbot, it can feel slow. And if your retrieval is bad, pulling irrelevant documents instead of the right ones, the AI has nothing useful to work with, and you look incompetent to your customers.
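Before arguing about whether retrieval is too slow, measure it. Here is a rough sketch of timing the two stages separately, so you can see how much of the total wait retrieval accounts for. The `fake_retrieve` and `fake_generate` functions and their sleep durations are placeholders, not measurements from any real system.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def latency_report(retrieve_ms, generate_ms):
    """Summarize how much of the total response time retrieval consumes."""
    total = retrieve_ms + generate_ms
    share = retrieve_ms / total if total > 0 else 0.0
    return {"total_ms": total, "retrieval_share": share}

def fake_retrieve(question):
    """Stand-in for a real index lookup (hypothetical)."""
    time.sleep(0.05)  # simulated search delay, not a measured figure
    return ["doc-1"]

def fake_generate(question, doc_ids):
    """Stand-in for a real model call (hypothetical)."""
    time.sleep(0.10)  # simulated generation delay, not a measured figure
    return "draft answer"

if __name__ == "__main__":
    question = "What is our travel policy?"
    doc_ids, retrieve_ms = timed(fake_retrieve, question)
    _, generate_ms = timed(fake_generate, question, doc_ids)
    print(latency_report(retrieve_ms, generate_ms))
```

For example, 500 ms of retrieval in front of 1,500 ms of generation is a quarter of a 2-second response. Whether that is acceptable depends on who is waiting.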

Reality Check

What the vendor says: “Use RAG to instantly give your AI access to your company’s knowledge base.”

What that means in practice: You now need to decide what goes in the retrieval database, how often to update it, what format it needs to be in, how to handle conflicting information, and how to monitor whether the retrieval is actually pulling the right documents.
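The monitoring piece is the one teams most often skip, and it is the easiest to start small. One common approach is a hand-labeled test set: a list of real questions paired with the document that should come back, scored by how often that document appears in the top k results. The sketch below assumes a `retrieve_fn` that takes a question and returns a ranked list of document IDs; the questions, IDs, and canned results are hypothetical.

```python
def hit_rate_at_k(test_cases, retrieve_fn, k=3):
    """Fraction of questions whose expected doc appears in the top k results.

    test_cases is a list of (question, expected_doc_id) pairs.
    """
    if not test_cases:
        return 0.0
    hits = sum(
        1 for question, expected_id in test_cases
        if expected_id in retrieve_fn(question)[:k]
    )
    return hits / len(test_cases)

if __name__ == "__main__":
    # Canned results standing in for a real retriever (hypothetical data).
    canned = {
        "How do I reset my password?": ["password-reset", "sso-setup"],
        "What is the refund window?": ["shipping", "returns-2019"],
    }
    cases = [
        ("How do I reset my password?", "password-reset"),
        ("What is the refund window?", "refund-policy"),
    ]
    rate = hit_rate_at_k(cases, lambda q: canned.get(q, []), k=2)
    print(f"hit rate@2: {rate:.0%}")
```

Run the same test set after every index update. A drop in the score is your early warning that retrieval has drifted, before customers notice.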

What Operators Actually Do

The companies getting real value from RAG treat it as a content strategy problem, not a technology problem. They start by auditing what data is actually worth retrieving (spoiler: not all of it). They establish ownership: who maintains the knowledge base? When does it get updated? How do you detect stale information?

Smart teams also add a layer of human review in high-stakes scenarios. A financial services company using RAG for customer inquiries doesn’t trust the system to answer alone. It uses RAG to speed up the research phase that a human still owns. The AI retrieves candidates; the operator verifies and responds.
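In code, that division of labor can be as simple as a routing rule. This is a sketch under assumed names (`Draft`, `route`, and the two queue labels are hypothetical), not a description of any particular company’s workflow.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    question: str
    answer: str
    source_ids: list = field(default_factory=list)  # docs the answer relied on

def route(draft, high_stakes):
    """Decide whether a draft goes straight out or to a human reviewer."""
    # An answer with no sources is not grounded in retrieval: always review.
    if high_stakes or not draft.source_ids:
        return "human_review"
    return "auto_send"

if __name__ == "__main__":
    draft = Draft("Can I withdraw early?", "Yes, with a fee.", ["withdrawal-policy"])
    print(route(draft, high_stakes=True))
```

The rule is deliberately conservative: anything high-stakes or unsupported by a retrieved document goes to a person.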

The other pattern: RAG works best when your data is already well-organized and regularly maintained. If your knowledge base is chaotic, RAG will make that chaos scalable. Clean it first.

The Questions to Ask

  1. What’s actually going in the retrieval database? Not everything in your company’s files should be indexed. Who decides, and how often is that decision reviewed?

  2. How will you know if retrieval is pulling the wrong documents? What’s your quality check mechanism? Who monitors whether customers (or employees) are getting grounded answers?

  3. What’s the plan when your data contradicts itself? You have five versions of the customer onboarding flow in your knowledge base. How does the system choose, and who catches the problem?
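On the third question, a first pass at catching contradictions does not need AI at all. If each document carries a topic and a status, you can list every topic with more than one active version. The sketch below assumes that metadata exists; the keys `id`, `topic`, and `status` and the sample records are hypothetical.

```python
from collections import defaultdict

def find_conflicts(docs):
    """Return topics that have more than one active document.

    Each doc is a dict with hypothetical keys: 'id', 'topic', 'status'.
    """
    by_topic = defaultdict(list)
    for doc in docs:
        if doc["status"] == "active":
            by_topic[doc["topic"]].append(doc["id"])
    return {topic: ids for topic, ids in by_topic.items() if len(ids) > 1}

if __name__ == "__main__":
    docs = [
        {"id": "onboarding-v3", "topic": "customer-onboarding", "status": "archived"},
        {"id": "onboarding-v4", "topic": "customer-onboarding", "status": "active"},
        {"id": "onboarding-v5", "topic": "customer-onboarding", "status": "active"},
        {"id": "refund-policy", "topic": "refunds", "status": "active"},
    ]
    for topic, ids in find_conflicts(docs).items():
        print(f"{topic}: {len(ids)} active versions -> {ids}")
```

This only finds duplicates that are labeled as the same topic. Two documents that disagree under different labels still need a human to spot them.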
