RAG Chunking Strategies

Chunking means splitting a large document into smaller pieces before storing it for search. This step happens early in the RAG pipeline, and getting it wrong quietly damages every answer that comes later, often without anyone realizing why the answers feel slightly off.

Why Whole Documents Do Not Work Well

Suppose a hundred-page manual contains one small paragraph about battery replacement. Searching the whole manual as a single block buries that paragraph in unrelated content. Breaking the manual into smaller pieces lets the search step retrieve that one paragraph directly, without dragging along the rest of the manual as noise.

A Filing Cabinet Analogy

A messy filing cabinet holds one giant folder with every paper crammed inside. Finding a single receipt takes forever. An organized cabinet holds many small labeled folders instead. Chunking builds those small labeled folders out of one giant document, so the right piece of information becomes easy to locate later.

[Figure: One big document becomes many small chunks]

Common Chunking Methods

Method	How It Splits Text	Best Fit
Fixed size	Cuts text every set number of words	Quick projects, simple documents
Sentence based	Splits at sentence boundaries	Text where each sentence must stay intact and readable
Paragraph based	Splits at natural paragraph breaks	Documents with clear paragraph structure
Semantic	Splits where the topic actually shifts	Long, dense documents covering many topics
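The first three methods in the table can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production splitter: the function names and the sample text are made up for this lesson, and the sentence rule is deliberately naive (it will mis-split abbreviations such as "Dr."). Semantic chunking is left out because it normally needs an embedding model to detect topic shifts.

```python
import re


def fixed_size_chunks(text: str, words_per_chunk: int) -> list[str]:
    """Cut the text every `words_per_chunk` words, ignoring sentence boundaries."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def sentence_chunks(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines, the usual paragraph marker in plain text."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


# Made-up sample text: two paragraphs, three sentences, eighteen words.
sample = (
    "Replace the battery every two years. Use only the supplied charger.\n\n"
    "Clean the screen with a dry cloth."
)

print(fixed_size_chunks(sample, 5))   # 4 chunks, some cut mid-sentence
print(sentence_chunks(sample))        # 3 chunks, one per sentence
print(paragraph_chunks(sample))       # 2 chunks, one per paragraph
```

Notice that the fixed-size splitter is the only one that cuts through the middle of a sentence, which is the weakness the overlap trick tries to soften.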

The Overlap Trick

Cutting a document in the middle of an idea creates broken chunks, each holding only part of the thought. Adding a small overlap between neighboring chunks, so that the end of one chunk is repeated at the start of the next, keeps important context intact across the cut. Picture cutting a rope but leaving a short overlapping strand at each cut point, so nothing important is lost at the cut.

[Figure: Overlap between neighboring chunks]
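One possible way to implement overlap is a sliding window over the words of the document. The sketch below assumes word-based sizes; the function name `chunk_with_overlap` and its parameters are illustrative, and real projects often count tokens or characters instead.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size word chunks; each chunk repeats the last `overlap` words
    of the chunk before it."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
for chunk in chunk_with_overlap(text, chunk_size=4, overlap=1):
    print(chunk)
# one two three four
# four five six seven
# seven eight nine ten
```

The words "four" and "seven" appear in two chunks each. That repetition costs a little storage, but it means a sentence that straddles a cut is still readable from at least one side.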

Chunk Size Trade-Off

Chunk Size	Effect
Very small chunks	Precise matches, but missing surrounding context
Very large chunks	Rich context, but harder to match precisely and slower to process
Balanced medium chunks	Good mix of precision and context for most use cases
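A toy calculation can make the "harder to match precisely" row more concrete. The sketch below uses a made-up relevance signal, the share of a chunk's words that also appear in the query. Real RAG systems usually compare embeddings rather than counting words, so treat this only as an illustration of how unrelated text dilutes a match.

```python
def query_share(chunk: str, query: str) -> float:
    """Fraction of the chunk's words that also appear in the query.
    A toy signal for illustration, not a real retrieval score."""
    chunk_words = chunk.lower().split()
    query_words = set(query.lower().split())
    if not chunk_words:
        return 0.0
    hits = sum(1 for word in chunk_words if word in query_words)
    return hits / len(chunk_words)


query = "replace battery"

# A small chunk holding only the relevant sentence (6 words).
small_chunk = "replace the battery every two years"

# The same sentence buried in a large chunk of unrelated text (100 words).
large_chunk = small_chunk + " " + " ".join(["unrelated"] * 94)

print(round(query_share(small_chunk, query), 2))  # 0.33
print(round(query_share(large_chunk, query), 2))  # 0.02
```

The relevant words are identical in both chunks, yet the large chunk's signal is far weaker because it is averaged over so much other content.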

A Practical Example

A company splits its employee handbook into sections by heading, such as "Vacation Policy" and "Sick Leave Policy." Each section becomes its own chunk. A question about sick days now matches the sick leave chunk directly, instead of pulling in unrelated vacation rules that would only confuse the final answer.
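Heading-based splitting like this could be sketched as follows. The example assumes every heading sits on its own line and starts with `# `, which is a convention chosen for this illustration; the handbook text is invented, and a real handbook would need its own heading rule.

```python
def chunks_by_heading(text: str, marker: str = "# ") -> dict[str, str]:
    """Map each heading to the text beneath it.

    Text before the first heading is dropped, and a repeated heading
    overwrites the earlier section with the same title.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(marker):
            current = line[len(marker):].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


# Invented sample content for illustration only.
handbook = """# Vacation Policy
Employees accrue vacation days each month.

# Sick Leave Policy
Notify a manager before the shift starts.
"""

for title, body in chunks_by_heading(handbook).items():
    print(f"{title}: {body}")
```

Keeping the heading alongside each chunk is a useful habit: it works as the label on the folder and can be stored as metadata next to the chunk text.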

A Second Example: A News Archive

A news organization chunks each article by paragraph instead of storing whole articles as single blocks. A reader asks about one specific detail buried deep in a long article. Paragraph-level chunking finds that exact paragraph directly, rather than forcing the model to sift through an entire article to find one small fact.
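The idea can be sketched with a simple word-overlap score standing in for the search step. Everything here is an assumption made for illustration: the article text is invented, `best_paragraph` is a hypothetical helper, and a real system would typically rank paragraphs with embeddings rather than shared words.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_paragraph(article: str, question: str) -> str:
    """Return the paragraph sharing the most words with the question."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]
    if not paragraphs:
        return ""
    question_words = tokenize(question)
    return max(paragraphs, key=lambda p: len(tokenize(p) & question_words))


# Invented article text for illustration only.
article = """The city council met on Tuesday to discuss the budget.

Engineers said the bridge will reopen after repairs finish.

Local schools reported higher attendance this year."""

print(best_paragraph(article, "When does the bridge reopen?"))
# Engineers said the bridge will reopen after repairs finish.
```

Only the one matching paragraph is returned, so the model that writes the final answer receives a single focused passage instead of the whole article.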

Key Takeaway for Beginners

Good chunking makes search results sharper and answers more accurate. Poor chunking is a common and easily overlooked source of real-world RAG problems, so it deserves careful attention early in a project, before any time is spent tuning fancier parts of the system.
