Build AI Products · Curriculum · Day 3: RAG Deep Dive

RAG Deep Dive

RAG is the #1 AI pattern in production. You'll go far beyond basics — learn multiple chunking strategies, debug retrieval failures, handle the #1 cause of hallucination (bad retrieval), and build a Q&A system that cites its sources. This is the day that separates AI engineers from tutorial followers.

90 min (+35 min boss) · ★★★☆☆

Bridge: Database queries → Retrieval + AI generation

Use this at work tomorrow

Build a Q&A bot over your team's internal docs — Confluence, Notion, or README files.

Learning Objectives

  1. Master the RAG pipeline: chunk → embed → store → retrieve → generate
  2. Compare chunking strategies: fixed-size, recursive, semantic, by-heading
  3. Debug RAG failures: bad retrieval, context overflow, hallucination grounding
  4. Add source citations with [1], [2] notation for trustworthy answers
  5. Ship a document Q&A system that answers from YOUR data

Ship It: Document Q&A system

By the end of this day, you'll build and deploy a document Q&A system. This isn't a toy — it's a real project for your portfolio.

Before You Start — Rate Your Confidence

I can build a RAG pipeline that chunks documents, embeds them, retrieves relevant context, and generates grounded answers with citations.

1 = no idea · 5 = ship it blindfolded
Predict First — Then Learn

How does RAG reduce LLM hallucinations?

RAG = Query Your Own Data with AI

RAG stands for Retrieval-Augmented Generation. Think of it as: query your database, but instead of rendering the data directly in a template, you pass it to an LLM as context to generate a natural language answer. It's the #1 AI pattern in production because it grounds LLM responses in YOUR data, dramatically reducing hallucination.

💡RAG = query your data + pass results as context to an LLM. The #1 pattern in production AI. Grounds answers in YOUR data.
Quick Pulse Check

What does the 'R' in RAG do?

The RAG Pipeline: Chunk → Embed → Store → Retrieve → Generate

The RAG pipeline is a 5-step data flow. (1) Chunk: split documents into manageable pieces. (2) Embed: convert chunks to vectors. (3) Store: save vectors in a vector database. (4) Retrieve: find chunks similar to the user's query. (5) Generate: pass retrieved chunks as context to an LLM. Each step has trade-offs — chunk size affects retrieval quality, embedding model affects accuracy, and the generation prompt affects answer quality.

💡5 steps: Chunk → Embed → Store → Retrieve → Generate. Each step has trade-offs that affect final answer quality.
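The query-time half of the pipeline (step 4, Retrieve) boils down to vector similarity. Here's a minimal in-memory sketch: real systems delegate this to a vector database, but the math is the same, cosine similarity between the query vector and each stored chunk vector.

```typescript
// Step 4 (Retrieve) in miniature: rank stored chunks by cosine similarity
// to the query embedding. A vector DB does exactly this, at scale.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(
  queryVector: number[],
  store: { content: string; vector: number[] }[],
  k = 5
) {
  return store
    .map((item) => ({ ...item, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

One silent failure mode: the query vector must come from the same embedding model used at indexing time. Mixing models produces valid-looking vectors that retrieve garbage.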
Quick Pulse Check

In the RAG pipeline, which step happens at query time (not during indexing)?

Predict First — Then Learn

You split a 10,000-word document every 500 characters. What's the biggest problem?

Chunking Strategies: Fixed-Size Is Just the Beginning

Fixed-size (500 chars) is the simplest but often worst strategy — it splits mid-sentence. Recursive splitting follows document structure (paragraphs → sentences → words). Semantic chunking groups by meaning. Heading-based chunking follows document hierarchy. For code: chunk by function/class. The right strategy depends on your data. Bad chunking is the #1 cause of bad RAG.

💡Bad chunking = bad RAG. Recursive splitting respects structure. Fixed-size splits mid-sentence. Always check your chunk boundaries.
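Heading-based chunking is a natural fit for structured docs like API references. A minimal sketch, assuming markdown with ATX-style `#` headings:

```typescript
// Heading-based chunking: split a markdown doc at its headings so each
// chunk covers exactly one section. The heading doubles as metadata.
function chunkByHeading(markdown: string): { heading: string; content: string }[] {
  const chunks: { heading: string; content: string }[] = [];
  let heading = "(intro)";
  let lines: string[] = [];

  for (const line of markdown.split("\n")) {
    const match = line.match(/^#{1,6}\s+(.*)/);
    if (match) {
      // Close out the previous section before starting a new one
      if (lines.length) chunks.push({ heading, content: lines.join("\n").trim() });
      heading = match[1];
      lines = [];
    } else {
      lines.push(line);
    }
  }
  if (lines.length) chunks.push({ heading, content: lines.join("\n").trim() });
  return chunks;
}
```

Storing each chunk's heading alongside its text pays off later: it makes citations readable ("from section 'useEffect'") and enables metadata filtering.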
Quick Pulse Check

You're chunking API documentation. What's the best strategy?

Predict First — Then Learn

Your RAG app gives bad answers. What should you debug FIRST?

RAG Failure Modes: What Goes Wrong in Production

Garbage retrieval → hallucinated answers. If the retriever pulls irrelevant chunks, the LLM will still generate a confident answer from nonsense context. Other failure modes: context overflow (too many chunks), lost-in-the-middle (LLMs ignore middle chunks), stale data (embeddings from old docs), and adversarial queries that retrieve unrelated content. Debugging RAG means debugging retrieval first.

💡If retrieval is bad, no prompt can fix generation. Debug retrieval first. LLMs will confidently hallucinate from garbage context.
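One concrete guard: inspect similarity scores before generating, and drop anything below a floor. The 0.3 floor below is an assumption to tune on your own data, and the result shape mirrors a typical vector DB response:

```typescript
type Retrieved = { content: string; score: number; source: string };

// Debug retrieval before blaming the prompt: log what actually came back,
// then filter out chunks below a similarity floor so the LLM never sees
// garbage context.
function checkRetrieval(results: Retrieved[], minScore = 0.3): Retrieved[] {
  for (const r of results) {
    console.log(`[retrieval] score=${r.score.toFixed(3)} source=${r.source}`);
  }
  return results.filter((r) => r.score >= minScore);
}
```

If this returns an empty array, short-circuit to "I don't know" instead of calling the LLM at all. That one check prevents the cooking-recipes-for-React-hooks class of hallucination.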
Quick Pulse Check

Your RAG system retrieves chunks about 'cooking recipes' for a question about 'React hooks'. What happens?


Production Gotchas

Chunk overlap prevents losing context at boundaries (50-100 char overlap is standard). Always include document metadata (source, date, section) in your chunks — you'll need it for citations and filtering. Monitor retrieval quality separately from generation quality — if retrieval is bad, no prompt can fix generation. Re-rank after retrieval for better quality (reorders by actual relevance, not just vector similarity).
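The overlap gotcha in code: a sliding window where each chunk repeats the tail of the previous one, so text cut at one boundary survives intact in the neighboring chunk. A sketch with the 50-100 char guideline baked in as a default:

```typescript
// Fixed-size chunking with overlap: consecutive chunks share `overlap`
// characters, so a sentence split at one boundary appears whole in the
// adjacent chunk.
function chunkWithOverlap(text: string, size = 500, overlap = 75): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // advance less than a full chunk each time
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Overlap trades storage and embedding cost for recall; it doesn't fix bad boundaries, it just makes them survivable.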

Code Comparison

Data Query: SQL + Template vs RAG

Traditional data display vs RAG-powered answers

Query + Template (Traditional)
// Traditional: query DB, render template
const docs = await db.query(
  "SELECT * FROM docs WHERE topic = $1",
  [userQuestion]
);

return docs.map(doc => ({
  title: doc.title,
  snippet: doc.content.slice(0, 200),
  link: doc.url,
}));
// Returns: list of links
// User must read & synthesize themselves
RAG Pipeline (AI Engineering)
// RAG: retrieve context, generate answer
// 1. Embed user's question
const { embedding } = await embed({
  model: openai.embedding(
    "text-embedding-3-small"
  ),
  value: userQuestion,
});

// 2. Retrieve relevant chunks
const chunks = await vectorDB.query({
  vector: embedding, topK: 5,
});

// 3. Generate grounded answer
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  system: `Answer based ONLY on the context.
If the answer isn't there, say "I don't know."`,
  prompt: `Context:
${chunks.map(c => c.content).join("\n\n")}

Question: ${userQuestion}`,
});
// Returns: synthesized answer with context

KEY DIFFERENCES

  • Traditional: user searches → reads → synthesizes answer manually
  • RAG: user asks → system retrieves → LLM synthesizes → user gets answer
  • RAG pipeline: Chunk → Embed → Store → Retrieve → Generate
  • The 'only answer from context' prompt prevents hallucination
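The citation requirement from the objectives can ride on the same prompt: number each chunk in the context so the model can cite `[1]`, `[2]` and the UI can map them back to documents. A sketch; the `metadata.source` field is an assumption about what you stored alongside each vector.

```typescript
// Number each retrieved chunk for inline [1], [2] citations, and build a
// parallel sources list for rendering under the answer.
function buildCitedContext(
  chunks: { content: string; metadata: { source: string } }[]
) {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.metadata.source})\n${c.content}`)
    .join("\n\n");
  const sources = chunks.map((c, i) => `[${i + 1}] ${c.metadata.source}`);
  return { context, sources };
}
```

Feed `context` into the prompt in place of the raw joined chunks, add "Cite sources inline as [1], [2] matching the context numbering" to the system message, and render `sources` under the answer.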

Chunking: Fixed vs Recursive

Why chunking strategy matters for RAG quality

Fixed-Size Chunking (Traditional)
// Fixed-size: simple but crude
function chunkFixed(text: string, size = 500) {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Problem: "The React useEffect hook
// runs after every-"
// [CHUNK BOUNDARY]
// "-render by default."
// Splits mid-sentence! Context lost.
Recursive Chunking (AI Engineering)
// Recursive: follows document structure
function chunkRecursive(
  text: string,
  maxSize = 500
) {
  // Split by paragraphs first
  const paragraphs = text.split("\n\n");

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if ((current + para).length > maxSize) {
      if (current) chunks.push(current.trim());
      current = para;
    } else {
      current += "\n\n" + para;
    }
  }
  if (current) chunks.push(current.trim());
  return chunks;
}
// Respects paragraph boundaries!
// Each chunk is a complete thought.
// (Simplified: a single paragraph longer
// than maxSize is kept whole here; a full
// recursive splitter would fall back to
// sentences, then words.)

KEY DIFFERENCES

  • Fixed-size is easy to implement but splits mid-sentence
  • Recursive follows document structure (paragraphs → sentences)
  • Bad chunking = bad retrieval = hallucinated answers
  • Always test your chunking on real docs — look at the boundaries

Bridge Map: Database queries → Retrieval + AI generation


Hands-On Challenges

Build, experiment, and get AI-powered feedback on your code.

Real-World Challenge

Document Q&A System

Build and deploy a RAG-powered document Q&A system that lets users upload documents, ask questions in natural language, and get accurate answers with source citations. This is the #1 AI pattern in production — build it for real.

~4h estimated
Next.js 14+ · Vercel AI SDK · OpenAI GPT-4o-mini + text-embedding-3-small · Tailwind CSS · Vercel (deploy)

Acceptance Criteria

  • Accept document uploads (text, markdown, or PDF) and chunk them intelligently
  • Generate and store embeddings for all document chunks
  • Retrieve the most relevant chunks for a user's question using vector similarity
  • Generate answers grounded in the retrieved context with [1], [2] source citations
  • Handle 'I don't know' gracefully when the answer isn't in the documents
  • Support multiple documents with source attribution
  • Deploy to a public URL (Vercel, Netlify, etc.)

Build Roadmap


Create a new Next.js app with TypeScript and Tailwind CSS. Set up the project with a document upload page and API routes for processing and querying.

npx create-next-app@latest doc-qa --typescript --tailwind --app
Plan three API routes: /api/upload, /api/embed, /api/ask

Deploy Tip

Push to GitHub and import into Vercel. For the demo, pre-load a few sample documents so reviewers can try it immediately without uploading. Set your OPENAI_API_KEY in Vercel environment variables.


After Learning — Rate Your Confidence Again

I can build a RAG pipeline that chunks documents, embeds them, retrieves relevant context, and generates grounded answers with citations.

1 = no idea · 5 = ship it blindfolded