AI Research · #rag · #chunking · #metadata

RAG Chunking & Metadata in 2026: The Details That Decide Quality

Most RAG failures come from bad chunking and missing metadata. A practical guide to chunk strategy, filters, and retrieval evaluation.

18 min · January 24, 2026 · Updated January 27, 2026

TL;DR

  • Chunk by meaning (headings/sections), not by tokens alone.
  • Metadata enables correct filtering and reduces irrelevant context.
  • Evaluate retrieval with expected sources, not just “good answers.”

Why Chunking and Metadata Decide RAG Quality

When RAG fails, it often looks like “the model hallucinated.”

But most production failures happen earlier:

  • the system retrieved the wrong chunk
  • the chunk was missing context (cut mid-procedure)
  • the chunk was correct but outside the user’s scope (tenant/permissions)
  • the right doc existed, but metadata filters couldn’t find it

Chunking and metadata define the retrieval contract: what the system is allowed to retrieve and how it explains why.


Chunking Strategies (What to Use and When)

| Strategy | How it works | Best for | Common failure |
|---|---|---|---|
| Fixed-size | split every N tokens/chars | quick baseline | breaks meaning |
| Section-based | split by headings/sections | docs with structure | needs clean markup |
| Recursive | split by separators hierarchically | mixed docs | needs tuning |
| Semantic | split by embedding boundaries | dense text | expensive, unpredictable |
| Hierarchical | preserve doc tree (H1/H2/H3) | manuals, policies | implementation complexity |

Default recommendation

Start with section-based chunking when you have headings, and fall back to recursive chunking for messy sources.
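The recommendation above can be sketched in a few lines. This is a minimal illustration, not a production splitter: it assumes markdown-style `#` headings and uses illustrative helper names (`chunk_by_headings`, `recursive_split`) and a character budget instead of tokens.

```python
import re

def chunk_by_headings(markdown_text, max_chars=2000):
    """Split on headings first; fall back to recursive splitting
    for sections that are too large (or have no headings)."""
    # Split before each markdown heading line (#, ##, ###), keeping the heading.
    sections = re.split(r"\n(?=#{1,3} )", markdown_text)
    chunks = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section.strip())
        else:
            chunks.extend(recursive_split(section, max_chars))
    return [c for c in chunks if c]

def recursive_split(text, max_chars, separators=("\n\n", ". ", " ")):
    """Try coarse separators first (paragraphs), then finer ones,
    then a hard character cut as a last resort."""
    if len(text) <= max_chars or not separators:
        return [text[i:i + max_chars].strip() for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    parts, buf = [], ""
    for piece in text.split(sep):
        candidate = (buf + sep + piece) if buf else piece
        if len(candidate) <= max_chars:
            buf = candidate          # greedily pack pieces up to the budget
        else:
            if buf:
                parts.append(buf.strip())
            buf = piece
    if buf:
        parts.append(buf.strip())
    # Recurse with finer separators on anything still oversized.
    out = []
    for p in parts:
        out.extend(recursive_split(p, max_chars, rest) if len(p) > max_chars else [p])
    return out
```

The greedy packing keeps sibling sentences together, which is usually what you want before reaching for semantic methods.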


Practical Chunking Rules (That Prevent Most Failures)

1) Keep chunks self-contained

If a chunk omits a critical constraint, the model will guess.

For procedures, a chunk should contain:

  • prerequisites
  • the steps
  • the edge cases (common failure modes)

2) Include heading and section context

Store the heading path as metadata and optionally prepend it to the chunk text:

“Billing → Refunds → Exceptions”

This improves retrieval precision and helps the model interpret meaning correctly.
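Prepending the heading path is a one-liner. A minimal sketch (the function name and bracket format are illustrative):

```python
def contextualize_chunk(chunk_text, section_path):
    """Prepend the heading path so the chunk is interpretable in isolation.

    section_path is a list like ["Billing", "Refunds", "Exceptions"].
    """
    breadcrumb = " → ".join(section_path)
    return f"[{breadcrumb}]\n{chunk_text}"
```

Store the same `section_path` string in metadata as well, so it is available for filtering and citations, not only embedded in the text.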

3) Store source URL + timestamps

Every chunk should know:

  • where it came from (canonical URL)
  • when it was last updated

That enables citations and freshness-aware retrieval.

4) Add metadata for filtering

Minimum useful metadata:

  • product area
  • tenant ID (if multi-tenant)
  • visibility (public/internal)
  • doc type (policy/guide/faq)
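Putting the rules above together, ingestion can attach one metadata payload per chunk. This is a sketch under assumptions: the input `doc` dict and its field names (`canonical_url`, `product_area`, etc.) are illustrative, not a fixed standard.

```python
from datetime import datetime, timezone

def build_chunk_metadata(doc, section_path):
    """Attach the minimum metadata needed for filtering and citations."""
    return {
        "doc_id": doc["doc_id"],                          # stable ID for updates + de-dupe
        "source_url": doc["canonical_url"],               # enables citations
        "section_path": " → ".join(section_path),         # heading context
        "doc_type": doc.get("doc_type", "guide"),         # policy / guide / faq
        "product_area": doc.get("product_area"),          # scope filter
        "tenant_id": doc.get("tenant_id"),                # isolation (multi-tenant)
        "visibility": doc.get("visibility", "internal"),  # default to the safer scope
        "updated_at": doc.get("updated_at")
                      or datetime.now(timezone.utc).isoformat(),  # freshness signal
    }
```

Defaulting `visibility` to `internal` rather than `public` is the safer failure mode when a source forgets to set it.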

Chunk Size: There Is No “Best” Number

Chunk size is a tradeoff:

| Smaller chunks | Larger chunks |
|---|---|
| higher precision | better context |
| easier ranking | harder ranking (diluted similarity) |
| more retrieval calls | fewer retrieval calls |
| risk of missing steps | risk of including irrelevant text |

The right way to choose is evaluation: measure recall@k and precision@k with a realistic question set.


Metadata: The Control Plane for Retrieval

Metadata is what lets you filter out irrelevant chunks before the model ever sees them.

A minimum metadata schema

| Field | Example | Why |
|---|---|---|
| doc_id | stable UUID | updates + de-dupe |
| source_url | canonical URL | citations |
| title | “Refund policy” | relevance |
| section_path | “Billing → Refunds” | context |
| doc_type | “policy / guide / faq” | ranking/filtering |
| product_area | “billing” | scope |
| tenant_id | “acme” | isolation |
| visibility | “public/internal” | permissions |
| updated_at | ISO date | freshness |

Filters that matter in production:

  • tenant isolation (multi-tenant SaaS)
  • visibility (internal vs public)
  • product area (reduce wrong-but-related context)
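These three filters can run as one scope check *before* similarity ranking, so out-of-scope chunks never compete for top-k slots. A minimal sketch; the `request` field names (`tenant_id`, `is_employee`, `product_area`) are illustrative assumptions about your request context.

```python
def allowed(chunk_meta, request):
    """Decide whether a chunk is in scope before similarity ranking."""
    # Hard tenant isolation: shared chunks (tenant_id=None) pass, others must match.
    if chunk_meta.get("tenant_id") not in (None, request["tenant_id"]):
        return False
    # Visibility / permissions: internal docs only for internal callers.
    if chunk_meta.get("visibility") == "internal" and not request.get("is_employee"):
        return False
    # Scope: drop wrong-but-related product areas when the request is scoped.
    if request.get("product_area") and chunk_meta.get("product_area") != request["product_area"]:
        return False
    return True
```

In practice you would push these conditions down into your vector store's metadata filter rather than post-filtering in Python, but the contract is the same.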

Related reading: Knowledge Bases for AI Products in 2026.


Retrieval Evaluation (Separate Retrieval From Generation)

If you only evaluate “final answer quality,” you won’t know whether failures are retrieval or generation.

Retrieval metrics

| Metric | Meaning |
|---|---|
| Recall@k | did the right source appear in the top-k? |
| Precision@k | how much of the top-k is relevant? |
| MRR | how highly ranked is the first relevant chunk? |
| Source accuracy | are we citing the correct docs? |
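The first three metrics are a few lines each over a ranked list of retrieved source IDs. A minimal sketch:

```python
def recall_at_k(retrieved, expected, k):
    """Fraction of expected sources that appear in the top-k results."""
    top = set(retrieved[:k])
    return sum(1 for e in expected if e in top) / len(expected)

def precision_at_k(retrieved, expected, k):
    """Fraction of the top-k results that are relevant."""
    top = retrieved[:k]
    return sum(1 for r in top if r in set(expected)) / len(top)

def mrr(retrieved, expected):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, r in enumerate(retrieved, start=1):
        if r in expected:
            return 1.0 / rank
    return 0.0
```

Average each metric over the whole question set; a single question tells you almost nothing.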

Build a golden question set

For each question, store:

  • expected sources (URLs)
  • allowed alternatives
  • disallowed sources (outdated)

Then run the suite whenever you change:

  • chunking strategy
  • embedding model
  • metadata schema
  • filtering logic
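A golden suite like this can be a small harness around your retriever. A sketch under assumptions: `retrieve(question)` is a stand-in for your retrieval function returning a ranked list of source URLs, and the golden-entry keys mirror the fields listed above.

```python
def run_golden_suite(golden_set, retrieve, k=5):
    """Check each golden question: an expected (or allowed-alternative)
    source must appear in the top-k, and no disallowed source may leak in."""
    failures = []
    for case in golden_set:
        results = retrieve(case["question"])[:k]
        ok_sources = set(case["expected_sources"]) | set(case.get("allowed_alternatives", []))
        hit = any(r in ok_sources for r in results)
        leaked = [r for r in results if r in set(case.get("disallowed_sources", []))]
        if not hit or leaked:
            failures.append({"question": case["question"], "hit": hit, "leaked": leaked})
    return failures  # empty list means the suite passed
```

Wire this into CI so a chunking or embedding change cannot merge with a failing suite.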

Related reading: Prompt Regression Testing in 2026.


Freshness (Knowledge Decay Is Real)

Even perfect chunking fails when docs drift.

Treat freshness like architecture:

  • change detection (what changed?)
  • incremental re-index (update only what changed)
  • staleness signals (prefer newer sources)
  • ownership (someone responsible for docs)
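Change detection, in its simplest form, is hashing: diff content hashes against the last indexed state and re-index only the deltas. A minimal sketch (the function name and dict shapes are illustrative):

```python
import hashlib

def detect_changes(previous_hashes, current_docs):
    """Compare content hashes to find which docs need re-indexing.

    previous_hashes: doc_id -> sha256 hex of last indexed content
    current_docs:    doc_id -> current text
    """
    changed, removed = [], []
    for doc_id, text in current_docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if previous_hashes.get(doc_id) != digest:
            changed.append(doc_id)   # new or modified: re-chunk and re-embed
    for doc_id in previous_hashes:
        if doc_id not in current_docs:
            removed.append(doc_id)   # deleted: purge its chunks from the index
    return changed, removed
```

Hashing per section (rather than per document) shrinks the re-embed cost further when only one part of a long doc changes.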

Implementation Checklist

  • Chunk by meaning first (headings/sections), not tokens
  • Store section_path and include it in retrieval
  • Store canonical source_url and updated_at
  • Add metadata for filtering (tenant, visibility, product area)
  • Evaluate retrieval separately (recall@k, precision@k, MRR)
  • Maintain a golden question set with expected sources
  • Design for freshness (change detection + incremental re-index)

FAQ

What chunk size is “best”?

It depends. Start with section-based chunks and iterate based on retrieval evals.

Should I use semantic chunking?

Sometimes. Semantic chunking can improve coherence but increases complexity and cost. Start with section/recursive chunking and graduate to semantic methods when evaluation shows a plateau.

What metadata is non-negotiable?

At minimum: source URL, doc ID, updated timestamp, tenant/visibility scope, and a section path (heading context).


