Chunking
Breaking your documents into smaller pieces for retrieval. The unsexy 80% of RAG quality. The biggest lever in your AI stack that most teams ignore.
The Technical Definition
Chunking is the process of splitting a document into smaller pieces before indexing it for retrieval. A 200-page contract doesn’t get embedded as a single vector. It gets split into chunks — passages of a few hundred to a few thousand characters — and each chunk gets its own embedding, its own row in the vector database, and its own chance to surface in a search result.
Chunk strategy decides three things: chunk size (how big each piece is), overlap (how much one chunk repeats the end of the previous chunk), and split boundary (where you cut — at fixed character counts, at sentences, at paragraphs, at semantic breaks, or at document structure like headings and sections).
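To make the first two knobs concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_fixed` and the default values are assumptions for illustration, not any particular library's API. The split boundary here is the crudest option (raw character counts).

```python
def chunk_fixed(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[tuple[int, int, str]]:
    """Split text into fixed-size character windows that overlap.

    Returns (start, end, chunk_text) triples so each chunk can be traced
    back to its position in the source document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each new window starts this far after the previous one
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):  # the last window reached the end of the document
            break
        start += step
    return chunks
```

With the defaults, a 2,500-character document yields three chunks covering characters 0 to 1000, 800 to 1800, and 1600 to 2500. Keeping the offsets alongside the text is a design choice that makes later debugging easier.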
What This Actually Means for Your Business
Chunking is the single biggest determinant of retrieval quality in a RAG system, and almost no vendor talks about it. Your model choice gets the marketing slide. Your vector database gets the architecture diagram. Your chunking strategy is buried in someone’s Python config file, and it’s controlling whether the AI returns useful answers or confidently wrong ones.
Here’s why it matters. When an operator asks “What’s our cancellation penalty for enterprise contracts signed before 2024?” the system has to find the chunk that contains that specific clause. If your chunks are too big — say, the entire contract section as one chunk — the embedding represents an averaged meaning across the whole section, and the specific clause gets diluted. The right document might not even rank in the top 10. If your chunks are too small — say, one sentence — you lose the context that makes the clause make sense, and the LLM generates an answer that’s technically grounded but practically useless.
The other failure mode is worse. Most teams ship with the default chunking strategy from a tutorial: split every 1000 characters, with 200-character overlap, ignoring document structure. This is the LangChain quickstart. It’s also why so many enterprise RAG systems quietly underperform. Overlap only protects passages shorter than the overlap itself. A 400-character clause that starts at character 700 runs past the end of the first chunk (character 1000) and begins before the start of the second (character 800), so it gets split across two chunks, and neither chunk ranks well for the query.
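Whether a passage survives fixed-size splitting depends on its length relative to the overlap, and the arithmetic is easy to check. The sketch below assumes windows of 1000 characters with 200 characters of overlap; `chunk_spans` and `containing_chunks` are hypothetical helper names.

```python
def chunk_spans(doc_len: int, chunk_size: int = 1000, overlap: int = 200) -> list[tuple[int, int]]:
    """Return the (start, end) character span of every fixed-size window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    spans = []
    start = 0
    while start < doc_len:
        end = min(start + chunk_size, doc_len)
        spans.append((start, end))
        if end == doc_len:
            break
        start += step
    return spans


def containing_chunks(passage: tuple[int, int], spans: list[tuple[int, int]]) -> list[int]:
    """Return the indexes of chunks that hold the whole passage."""
    p_start, p_end = passage
    return [i for i, (s, e) in enumerate(spans) if s <= p_start and p_end <= e]


spans = chunk_spans(3000)
# A 400-character clause starting at character 700 fits in no single chunk.
split_clause = containing_chunks((700, 1100), spans)
# A 150-character clause starting at character 900 is rescued by the overlap.
saved_clause = containing_chunks((900, 1050), spans)
```

Here `split_clause` is an empty list and `saved_clause` is `[1]`: the overlap rescues the short clause but not the long one.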
The companies getting real value from RAG made chunking a deliberate engineering decision. They split on semantic boundaries (paragraphs, sections, list items, table rows) instead of fixed character counts. They use document structure as the signal — headings define chunks, tables stay intact, contract clauses don’t get cut mid-sentence. They tune chunk size by document type. A regulatory filing chunks differently than a Slack archive.
Reality Check
What the vendor says: “We handle ingestion automatically — just point us at your documents.”
What that means in practice: They run a default chunker that doesn’t know your contracts have 17 different section formats, that your support tickets need different splits than your product docs, or that your tables shouldn’t be cut in half. The system works on the demo. It struggles on the real corpus.
What Operators Actually Do
Teams that take chunking seriously do three things. First, they look at retrieval failures. When the system returns the wrong document, they check whether the right chunk existed in the index at all. Half the time it didn’t, because chunking sliced through the relevant passage and neither half retained enough meaning to surface.
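That check can be automated for a small set of known question and passage pairs. The sketch below is one way to do it, assuming you can list the chunk texts in the index and the IDs the retriever returned; the function name `diagnose` and the three labels are made up for illustration.

```python
def _normalize(text: str) -> str:
    """Collapse whitespace so line breaks do not hide a match."""
    return " ".join(text.split())


def diagnose(gold_passage: str, chunks: list[str], retrieved_ids: list[int]) -> str:
    """Classify a retrieval result for a passage known to answer the query.

    "chunking":  no single chunk contains the whole passage
    "ranking":   a chunk contains it, but the retriever did not return it
    "retrieved": a chunk containing it was returned
    """
    gold = _normalize(gold_passage)
    holders = [i for i, chunk in enumerate(chunks) if gold in _normalize(chunk)]
    if not holders:
        return "chunking"
    if any(i in retrieved_ids for i in holders):
        return "retrieved"
    return "ranking"
```

Running this over a few dozen real failures separates problems that need a better splitter from problems that need a better retriever or reranker.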
Second, they use structure-aware splitting. Markdown headings, HTML tags, contract section numbers, table rows: every document format has signals about where one idea ends and another begins. Splitting on those signals beats splitting on character count almost every time. Modern tooling (LlamaIndex, Unstructured.io, or a custom parser) makes this practical.
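As a rough illustration of structure-aware splitting, the sketch below cuts a Markdown document at its headings using only the standard library. It is a simplified stand-in for what the frameworks above do, not their API. The function name `split_by_headings` is an assumption, and it handles only `#`-style headings.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")
FENCE = "`" * 3  # Markdown code-fence marker, built this way to avoid a literal fence here


def split_by_headings(markdown: str) -> list[dict]:
    """Return one {"heading", "text"} dict per heading-delimited section."""
    sections = []
    heading, lines = None, []
    in_code = False

    def flush():
        # Store the section collected so far, skipping an empty preamble.
        text = "\n".join(lines).strip()
        if heading is not None or text:
            sections.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # "#" comments inside code are not headings
        match = None if in_code else HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(1), []
        else:
            lines.append(line)
    flush()
    return sections
```

Because cuts happen only at headings, a table or code block inside a section stays in one piece. A section that grows too long would still need a secondary split, for example at paragraph breaks.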
Third, they tune by document class. Long-form regulatory text wants larger chunks (1500–2000 characters) with meaningful overlap. Q&A pairs and chat archives want one exchange per chunk. Tables stay whole. Code stays whole. The “one chunk size for everything” approach loses to a small amount of per-format tuning every time.
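Per-class tuning can be as simple as a lookup table of policies. The sketch below is one possible shape, not a recommended configuration. The class names, the 300-character overlap, and the assumption that chat exchanges are separated by blank lines are all placeholders to replace with values measured on your own corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPolicy:
    strategy: str       # "window", "per_exchange", or "whole"
    max_chars: int = 0  # only used by the "window" strategy
    overlap: int = 0    # only used by the "window" strategy


# Hypothetical per-class settings; the overlap values are placeholders.
POLICIES = {
    "regulatory": ChunkPolicy("window", max_chars=2000, overlap=300),
    "chat": ChunkPolicy("per_exchange"),
    "table": ChunkPolicy("whole"),
    "code": ChunkPolicy("whole"),
}
DEFAULT_POLICY = ChunkPolicy("window", max_chars=1000, overlap=200)


def chunk_document(text: str, doc_class: str) -> list[str]:
    """Chunk text according to the policy registered for its document class."""
    if not text:
        return []
    policy = POLICIES.get(doc_class, DEFAULT_POLICY)
    if policy.strategy == "whole":
        return [text]
    if policy.strategy == "per_exchange":
        # Assumes one exchange per blank-line-separated block.
        return [block.strip() for block in text.split("\n\n") if block.strip()]
    step = policy.max_chars - policy.overlap
    last_start = max(len(text) - policy.overlap, 1)
    return [text[i:i + policy.max_chars] for i in range(0, last_start, step)]
```

Keeping the policies in one table makes the chunking decisions visible and reviewable rather than scattered across ingestion scripts.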
The Questions to Ask
- What’s your chunking strategy, and was it tuned for our document types? If the answer is “default settings,” you have a quality problem you haven’t measured yet. Different document classes need different splits.
- How are tables, code blocks, and structured content handled? The default chunkers cut tables in half. If your knowledge base has structured data (pricing tables, parts catalogs, contract clauses), ask how those stay intact through ingestion.
- When retrieval fails, can you trace it back to chunking? Half of bad RAG answers come from the right document being in the corpus but the wrong chunk surfacing. Is there a way to inspect which chunk was retrieved and whether a better one existed?