RAG Chunking & Metadata in 2026: The Details That Decide Quality
Most RAG failures come from bad chunking and missing metadata. A practical guide to chunk strategy, filters, and retrieval evaluation.
TL;DR
- Chunk by meaning (headings/sections), not by tokens alone.
- Metadata enables correct filtering and reduces irrelevant context.
- Evaluate retrieval with expected sources, not just “good answers.”
Why Chunking and Metadata Decide RAG Quality
When RAG fails, it often looks like “the model hallucinated.”
But most production failures happen earlier:
- the system retrieved the wrong chunk
- the chunk was missing context (cut mid-procedure)
- the chunk was correct but outside the user’s scope (tenant/permissions)
- the right doc existed, but metadata filters couldn’t find it
Chunking and metadata define the retrieval contract: what the system is allowed to retrieve and how it explains why.
Chunking Strategies (What to Use and When)
| Strategy | How it works | Best for | Common failure |
|---|---|---|---|
| Fixed-size | split every N tokens/chars | quick baseline | breaks meaning |
| Section-based | split by headings/sections | docs with structure | needs clean markup |
| Recursive | split by separators hierarchically | mixed docs | needs tuning |
| Semantic | split by embedding boundaries | dense text | expensive, unpredictable |
| Hierarchical | preserve doc tree (H1/H2/H3) | manuals, policies | implementation complexity |
Default recommendation
Start with section-based chunking when you have headings, and fall back to recursive chunking for messy sources.
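This default can be sketched in a few lines: split on headings first, then recursively fall back to coarser-to-finer separators for any section that is still too long. The function names, separator list, and size limit below are illustrative, not a specific library's API.

```python
# Sketch: section-based chunking with a recursive fallback for
# oversized or heading-free sections. Limits are illustrative.
import re

SEPARATORS = ["\n\n", "\n", ". ", " "]  # coarsest to finest

def recursive_split(text, max_chars=800, separators=SEPARATORS):
    """Split text hierarchically until every piece fits max_chars."""
    if len(text) <= max_chars or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                pieces.append(current)
                current = ""
            if len(part) <= max_chars:
                current = part
            else:
                # part alone is still too long: recurse with finer separators
                pieces.extend(recursive_split(part, max_chars, rest))
    if current:
        pieces.append(current)
    return pieces

def chunk_by_sections(markdown, max_chars=800):
    """Split on markdown headings first; recurse inside long sections."""
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    chunks = []
    for section in sections:
        if section.strip():
            chunks.extend(recursive_split(section.strip(), max_chars))
    return chunks
```

The key property: heading boundaries are always respected, so a chunk never spans two sections; recursion only kicks in inside a single oversized section.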
Practical Chunking Rules (That Prevent Most Failures)
1) Keep chunks self-contained
If a chunk omits a critical constraint (a threshold, a precondition, an exception), the model will guess it.
For procedures, a chunk should contain:
- prerequisites
- the steps
- the edge cases (common failure modes)
2) Include heading and section context
Store the heading path as metadata and optionally prepend it to the chunk text:
“Billing → Refunds → Exceptions”
This improves retrieval precision and helps the model interpret meaning correctly.
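A minimal sketch of that pattern, assuming a heading path is available at chunking time (the dict shape here is illustrative, not any particular vector store's schema):

```python
# Store the heading path in metadata and prepend it to the chunk text,
# so the embedding carries section context and filters can use it too.
def contextualize(chunk_text, heading_path):
    """heading_path is a list like ["Billing", "Refunds", "Exceptions"]."""
    path = " → ".join(heading_path)
    return {
        "text": f"{path}\n\n{chunk_text}",   # what gets embedded
        "metadata": {"section_path": path},  # what gets filtered/cited
    }
```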
3) Store source URL + timestamps
Every chunk should know:
- where it came from (canonical URL)
- when it was last updated
That enables citations and freshness-aware retrieval.
4) Add metadata for filtering
Minimum useful metadata:
- product area
- tenant ID (if multi-tenant)
- visibility (public/internal)
- doc type (policy/guide/faq)
Chunk Size: There Is No “Best” Number
Chunk size is a tradeoff:
| Smaller chunks | Larger chunks |
|---|---|
| higher precision | better context |
| easier ranking | harder ranking (diluted similarity) |
| more retrieval calls | fewer retrieval calls |
| risk of missing steps | risk of including irrelevant text |
The right way to choose is evaluation: measure recall@k and precision@k with a realistic question set.
Metadata: The Control Plane for Retrieval
Metadata is what lets you filter out irrelevant chunks before the model ever sees them.
A minimum metadata schema
| Field | Example | Why |
|---|---|---|
| doc_id | stable UUID | updates + de-dupe |
| source_url | canonical URL | citations |
| title | “Refund policy” | relevance |
| section_path | “Billing → Refunds” | context |
| doc_type | “policy / guide / faq” | ranking/filtering |
| product_area | “billing” | scope |
| tenant_id | “acme” | isolation |
| visibility | “public/internal” | permissions |
| updated_at | ISO date | freshness |
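One possible shape for this schema in code, using a plain dataclass; field names follow the table above, and the example values are made up. Adapt the names to your vector store's payload conventions.

```python
# Per-chunk metadata matching the minimum schema above.
from dataclasses import dataclass, asdict

@dataclass
class ChunkMetadata:
    doc_id: str        # stable ID: enables updates and de-duplication
    source_url: str    # canonical URL, used for citations
    title: str         # document title, helps relevance
    section_path: str  # e.g. "Billing → Refunds"
    doc_type: str      # "policy" | "guide" | "faq"
    product_area: str  # e.g. "billing"
    tenant_id: str     # isolation in multi-tenant setups
    visibility: str    # "public" | "internal"
    updated_at: str    # ISO 8601 date, enables freshness-aware ranking

meta = ChunkMetadata(
    doc_id="doc-123",
    source_url="https://example.com/billing/refunds",
    title="Refund policy",
    section_path="Billing → Refunds",
    doc_type="policy",
    product_area="billing",
    tenant_id="acme",
    visibility="public",
    updated_at="2026-01-15",
)
payload = asdict(meta)  # dict form, ready to attach to a chunk
```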
Filters that matter in production:
- tenant isolation (multi-tenant SaaS)
- visibility (internal vs public)
- product area (reduce wrong-but-related context)
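These three filters can be sketched as a pre-retrieval predicate. The filter-dict shape below is illustrative; real vector stores each have their own filter syntax, but the logic is the same: scope first, similarity second.

```python
# Sketch: build a scope filter, then apply it to chunk metadata
# before similarity search ever sees the chunk.
def build_filter(tenant_id, user_is_internal, product_area=None):
    f = {
        "tenant_id": tenant_id,  # hard isolation: never cross tenants
        "visibility": ["public", "internal"] if user_is_internal
                      else ["public"],  # permissions
    }
    if product_area:
        f["product_area"] = product_area  # cut wrong-but-related context
    return f

def matches(metadata, flt):
    """True if one chunk's metadata dict passes every filter clause."""
    for key, allowed in flt.items():
        allowed = allowed if isinstance(allowed, list) else [allowed]
        if metadata.get(key) not in allowed:
            return False
    return True
```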
Internal link: Knowledge Bases for AI Products in 2026.
Retrieval Evaluation (Separate Retrieval From Generation)
If you only evaluate “final answer quality,” you won’t know whether failures are retrieval or generation.
Retrieval metrics
| Metric | Meaning |
|---|---|
| Recall@k | did the right source appear in top‑k? |
| Precision@k | how much of top‑k is relevant? |
| MRR | how highly ranked is the first relevant chunk? |
| Source accuracy | are we citing the correct docs? |
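These metrics are small enough to implement directly; the sketch below operates on ranked lists of source identifiers (URLs or doc IDs) and makes no assumptions about the retriever itself.

```python
# Minimal implementations of the retrieval metrics above.
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant sources that appear in the top-k results."""
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = [r for r in retrieved[:k] if r in set(relevant)]
    return len(hits) / k

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant result; MRR averages this over queries."""
    for rank, r in enumerate(retrieved, start=1):
        if r in set(relevant):
            return 1.0 / rank
    return 0.0
```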
Build a golden question set
For each question, store:
- expected sources (URLs)
- allowed alternatives
- disallowed sources (outdated)
Then run the suite whenever you change:
- chunking strategy
- embedding model
- metadata schema
- filtering logic
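A possible shape for such a suite: each case records expected, allowed, and disallowed sources, and a tiny loop flags failures. `retrieve` is a stand-in for your real retriever (query → ranked source URLs); the URLs here are fabricated examples.

```python
# Golden question set: expected / allowed / disallowed sources per question.
GOLDEN_SET = [
    {
        "question": "How long do refunds take?",
        "expected": ["https://example.com/billing/refunds"],
        "allowed": ["https://example.com/billing/faq"],
        "disallowed": ["https://example.com/billing/refunds-2023"],  # outdated
    },
]

def run_suite(retrieve, golden_set, k=5):
    """Run every golden question; return a list of (question, reason) failures."""
    failures = []
    for case in golden_set:
        top_k = set(retrieve(case["question"])[:k])
        if not set(case["expected"]) & top_k:
            failures.append((case["question"], "expected source missing"))
        if set(case["disallowed"]) & top_k:
            failures.append((case["question"], "outdated source retrieved"))
    return failures
```

Wiring this into CI means any chunking, embedding, or filter change that breaks retrieval fails loudly before it ships.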
Internal link: Prompt Regression Testing in 2026.
Freshness (Knowledge Decay Is Real)
Even perfect chunking fails when docs drift.
Treat freshness like architecture:
- change detection (what changed?)
- incremental re-index (update only what changed)
- staleness signals (prefer newer sources)
- ownership (someone responsible for docs)
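Change detection plus incremental re-index can be as simple as hashing content and diffing against what the index last saw. A minimal sketch, with in-memory dicts standing in for your document store and index state:

```python
# Hash-based change detection for incremental re-indexing:
# only re-embed documents whose content hash changed.
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(docs, known_hashes):
    """docs: {doc_id: text}; known_hashes: {doc_id: hash from last index run}.
    Returns (doc_ids to re-embed, doc_ids to delete from the index)."""
    to_index = [d for d, text in docs.items()
                if known_hashes.get(d) != content_hash(text)]
    to_delete = [d for d in known_hashes if d not in docs]
    return to_index, to_delete
```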
Implementation Checklist
- Chunk by meaning first (headings/sections), not tokens
- Store `section_path` and include it in retrieval
- Store canonical `source_url` and `updated_at`
- Add metadata for filtering (tenant, visibility, product area)
- Evaluate retrieval separately (recall@k, precision@k, MRR)
- Maintain a golden question set with expected sources
- Design for freshness (change detection + incremental re-index)
FAQ
What chunk size is “best”?
It depends. Start with section-based chunks and iterate based on retrieval evals.
Should I use semantic chunking?
Sometimes. Semantic chunking can improve coherence but increases complexity and cost. Start with section/recursive chunking and graduate to semantic methods when evaluation shows a plateau.
What metadata is non-negotiable?
At minimum: source URL, doc ID, updated timestamp, tenant/visibility scope, and a section path (heading context).