LLM APIs, Structured Output & Streaming
You already call REST APIs and parse JSON. Today you'll call LLM APIs the same way — but you'll also learn the two killer features every AI app needs: extracting typed JSON from LLMs (structured output) and streaming responses in real-time. By tonight, you'll ship a code review tool you'll actually use at work.
Use this at work tomorrow
Use generateText() with a system prompt to auto-generate PR descriptions from git diffs.
Learning Objectives
1. Call LLM APIs using the Vercel AI SDK (generateText, streamText)
2. Extract typed JSON from LLMs with structured output / JSON mode
3. Stream AI responses in real-time for production UX
4. Master prompt engineering: system prompts, few-shot, chain-of-thought
5. Build and ship an AI-powered code review tool
Ship It: AI code review tool
By the end of this day, you'll build and deploy an AI code review tool. This isn't a toy — it's a real project for your portfolio.
I can call an LLM API, get structured JSON output with a Zod schema, and stream responses to a UI.
What happens when you send the exact same prompt to GPT-4o twice?
From REST APIs to LLM APIs
You've been calling REST APIs for years — HTTP POST with a JSON body, get back structured data. An LLM API call is structurally identical. The difference? The input is a natural language prompt, and the output is non-deterministic. Same fetch(), same auth headers, same error handling. New superpower.
What's the fundamental difference between a REST API and an LLM API call?
Roughly how many tokens is the sentence 'Hello, how are you today?' (7 words)?
Tokens, Context Windows & Why They Matter
Tokens are like variable-size characters (~4 chars per token in English). The context window is the max tokens you can send + receive — think of it as the function's stack size. GPT-4o has 128K tokens (~300 pages). You'll manage this like you manage memory: be efficient, know your limits, and watch your costs — small models like GPT-4o-mini cost a fraction of a cent per 1K tokens, but larger models cost orders of magnitude more.
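The ~4-chars-per-token rule of thumb is enough for a quick budget check before you send a request. Here's a minimal sketch — the function names are illustrative, and a real tokenizer (e.g. tiktoken) will be more accurate:

```typescript
// Rough token estimate using the ~4 chars/token heuristic for English.
// Good enough for budgeting; use a real tokenizer for billing-accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check that prompt + expected output fit the model's context window.
function fitsContext(
  promptTokens: number,
  maxOutputTokens: number,
  contextWindow = 128_000 // GPT-4o's window
): boolean {
  return promptTokens + maxOutputTokens <= contextWindow;
}

const prompt = "Hello, how are you today?";
console.log(estimateTokens(prompt)); // 7 (25 chars / 4, rounded up)
console.log(fitsContext(100_000, 30_000)); // false — 130K > 128K
```

Run a check like this before every request: token overflow fails silently by truncating your input, so catching it client-side is far cheaper than debugging a mysteriously bad response.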
GPT-4o has a 128K token context window. Roughly how many pages of text is that?
You need an LLM to return { sentiment: 'positive', score: 0.9 }. What's the most reliable approach?
Structured Output: The #1 Skill for AI Engineers
Raw LLM text is useless for apps. You need typed JSON. Structured output (JSON mode) forces the LLM to return data matching a schema you define. This is the single most practical AI skill for software engineers — extracting { title: string, sentiment: 'positive' | 'negative', tags: string[] } from free text. It's Zod schemas all the way down.
What library does structured output use to define the response schema?
Streaming: The UX That Makes AI Feel Magical
Waiting 5 seconds for a wall of text feels broken. Streaming token-by-token feels responsive and alive. streamText() from the Vercel AI SDK gives you a ReadableStream — the same Web Streams API you already know. This is why ChatGPT, Cursor, and every good AI app streams responses.
Why does every major AI product (ChatGPT, Cursor, Claude) stream responses?
Prompt Engineering Patterns
System prompts are like config files — they set behavior rules. Few-shot examples are like test fixtures — they show the expected I/O pattern. Chain-of-thought is like debug logging — making the model show its reasoning step by step. Master these three and you can make LLMs do almost anything.
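The three patterns above compose into a single messages array. This is a hypothetical helper (not part of the AI SDK) showing how a system prompt, few-shot pairs, and a chain-of-thought instruction fit together — you'd pass the result to `generateText({ messages })`:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Builds a few-shot message array: system prompt = config,
// example pairs = test fixtures, user input last.
function buildFewShotPrompt(
  system: string,
  examples: Array<{ input: string; output: string }>,
  userInput: string
): Message[] {
  const messages: Message[] = [{ role: "system", content: system }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.input });
    messages.push({ role: "assistant", content: ex.output });
  }
  messages.push({ role: "user", content: userInput });
  return messages;
}

const messages = buildFewShotPrompt(
  // System prompt sets the rules; "think step by step" adds chain-of-thought.
  "You classify commit messages as feat, fix, or chore. Think step by step, then answer with one word.",
  [
    { input: "add dark mode toggle", output: "feat" },
    { input: "bump lodash to 4.17.21", output: "chore" },
  ],
  "handle null user in login flow"
);
// messages[0] = config-like system rule
// messages[1..4] = fixture-like few-shot pairs
// messages[5] = the actual query
```

The fixture analogy is exact: two or three well-chosen examples constrain the output format far more reliably than paragraphs of instructions.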
Which prompt pattern is most like adding console.log() to trace a bug?
Production Gotchas
Rate limits will hit you at ~500 RPM on GPT-4o-mini (use exponential backoff). Token overflow silently truncates your input — always count tokens before sending. Temperature 0 doesn't mean deterministic — it means less random. Cost surprise: a chat app with 10K users can cost $500/day if you're not careful with context management.
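Exponential backoff is the standard fix for those rate limits. A minimal sketch — it assumes the failing call throws an error carrying a `status` field, which is how most HTTP client errors surface, but check your SDK's error shape:

```typescript
// Retry a failing async call with exponential backoff + jitter.
// Only retries 429 (rate limit); everything else rethrows immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      // 500ms, 1s, 2s, 4s... plus jitter so retries don't synchronize
      const delay = baseMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap any LLM call
// const { text } = await withBackoff(() =>
//   generateText({ model: openai("gpt-4o-mini"), prompt })
// );
```

The jitter matters: if many clients back off on the same schedule, they all retry at once and hit the limit again.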
Code Comparison
API Call: REST vs LLM
Compare a traditional API call with an LLM API call
// Traditional REST API call
const response = await fetch(
  "https://api.weather.com/v1/forecast",
  {
    method: "GET",
    headers: {
      "Authorization": "Bearer " + API_KEY,
      "Content-Type": "application/json",
    },
  }
);
const data = await response.json();
// data.temperature -> always the same
// for the same input

// LLM API call (Vercel AI SDK)
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  system: "You are a weather assistant.",
  prompt: "What's the weather like today?",
});
// text -> non-deterministic!
// Same input can give different outputs

KEY DIFFERENCES
- Both are HTTP calls with auth headers — same pattern
- LLM input is natural language, not structured params
- LLM output is non-deterministic — same prompt can give different results
- You control behavior with system prompts instead of query parameters
Raw Text vs Structured Output
Why you need typed JSON from LLMs, not raw strings
// Traditional: API returns typed data
interface WeatherResp {
  temp: number;
  conditions: string;
  humidity: number;
}
const res = await fetch("/api/weather");
const data: WeatherResp = await res.json();
// data.temp -> number ✓
// data.conditions -> string ✓
// TypeScript knows the shape

// AI: Force LLM to return typed JSON
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    topics: z.array(z.string()),
    summary: z.string(),
  }),
  prompt: "Analyze: Great product, fast shipping!",
});
// object.sentiment -> "positive" ✓
// object.topics -> ["product", "shipping"] ✓
// TypeScript knows the shape!

KEY DIFFERENCES
- Both produce typed data your app can consume
- Structured output uses Zod schemas — the same library you know
- generateObject() guarantees valid JSON matching your schema
- No more parsing raw text with regex — let the LLM do it
Loading State vs Streaming
Why streaming transforms AI UX
// Traditional: wait for full response
const [loading, setLoading] = useState(false);
const [data, setData] = useState(null);
async function handleSubmit() {
  setLoading(true);
  const res = await fetch("/api/data");
  const json = await res.json();
  setData(json); // All at once
  setLoading(false);
}
// User sees: spinner... spinner... BOOM
// Full content appears all at once

// AI: Stream response token by token
import { useChat } from "ai/react";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const { messages, input, handleSubmit } =
  useChat({ api: "/api/chat" });

// Or manually with streamText:
const { textStream } = streamText({
  model: openai("gpt-4o-mini"),
  prompt: userQuestion,
});
for await (const chunk of textStream) {
  process.stdout.write(chunk);
  // User sees text appear word by word
}
// User sees: words... flowing... naturally
// Feels fast even if total time is the same

KEY DIFFERENCES
- Traditional: spinner → full content dump (feels slow)
- Streaming: text flows in real-time (feels responsive)
- Uses ReadableStream — the same Web Streams API you know
- ChatGPT, Cursor, and every great AI app uses streaming
Hands-On Challenges
Build, experiment, and get AI-powered feedback on your code.
AI-Powered Code Review Tool
Build and deploy a real code review tool that accepts code input, sends it to an LLM via the Vercel AI SDK, and streams back structured feedback with severity levels and fix suggestions. This is the tool you practiced building in the sandbox — now ship it for real.
Acceptance Criteria
- Accept code input via a text area or code editor
- Call an LLM API using the Vercel AI SDK (generateText or streamText)
- Stream the AI response in real-time so users see feedback appearing
- Return structured output with severity (critical/warning/info) and category (bug/style/performance)
- Display review results in a clean, readable UI
- Handle errors gracefully (API failures, rate limits, empty input)
- Deploy to a public URL (Vercel, Netlify, etc.)
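Before wiring up the LLM, it helps to pin down the review-item shape the acceptance criteria describe. This is a hand-rolled runtime check as a sketch — in the real app you'd express the same shape as a Zod schema and pass it to `generateObject()`; the type and field names here are one reasonable design, not a prescribed API:

```typescript
// The structured shape each review finding should match.
type Severity = "critical" | "warning" | "info";
type Category = "bug" | "style" | "performance";

interface ReviewItem {
  severity: Severity;
  category: Category;
  message: string;     // what's wrong
  suggestion: string;  // how to fix it
}

// Runtime guard: validates untrusted JSON (e.g. a parsed LLM response)
// before your UI renders it.
function isReviewItem(x: unknown): x is ReviewItem {
  if (typeof x !== "object" || x === null) return false;
  const obj = x as Record<string, unknown>;
  return (
    ["critical", "warning", "info"].includes(obj.severity as string) &&
    ["bug", "style", "performance"].includes(obj.category as string) &&
    typeof obj.message === "string" &&
    typeof obj.suggestion === "string"
  );
}
```

Even with `generateObject()` enforcing the schema at generation time, a guard like this (or `schema.safeParse` with Zod) is useful at the UI boundary — it's how you "handle errors gracefully" when a response comes back malformed.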
Build Roadmap
Create a new Next.js app with TypeScript and Tailwind CSS. This gives you file-based routing, server-side API routes, and a modern styling system out of the box.
npx create-next-app@latest ai-code-reviewer --typescript --tailwind --app

Choose the App Router when prompted.

Deploy Tip
Push to GitHub and import into Vercel — it auto-detects Next.js and deploys in under a minute. Remember to add your OPENAI_API_KEY in the Vercel environment variables.