Curriculum / Day 1 · Build AI Products

LLM APIs, Structured Output & Streaming

You already call REST APIs and parse JSON. Today you'll call LLM APIs the same way — but you'll also learn the two killer features every AI app needs: extracting typed JSON from LLMs (structured output) and streaming responses in real-time. By tonight, you'll ship a code review tool you'll actually use at work.

80 min (+30 min boss) · ★★☆☆☆

Bridge: REST APIs + JSON → Model APIs + Structured Output

Use this at work tomorrow

Use generateText() with a system prompt to auto-generate PR descriptions from git diffs.

Learning Objectives

  1. Call LLM APIs using the Vercel AI SDK (generateText, streamText)
  2. Extract typed JSON from LLMs with structured output / JSON mode
  3. Stream AI responses in real-time for production UX
  4. Master prompt engineering: system prompts, few-shot, chain-of-thought
  5. Build and ship an AI-powered code review tool

Ship It: AI code review tool

By the end of this day, you'll build and deploy an AI code review tool. This isn't a toy — it's a real project for your portfolio.

Before You Start — Rate Your Confidence

I can call an LLM API, get structured JSON output with a Zod schema, and stream responses to a UI.

1 = no idea · 5 = ship it blindfolded
Predict First — Then Learn

What happens when you send the exact same prompt to GPT-4o twice?

From REST APIs to LLM APIs

You've been calling REST APIs for years — HTTP POST with a JSON body, get back structured data. An LLM API call is structurally identical. The difference? The input is a natural language prompt, and the output is non-deterministic. Same fetch(), same auth headers, same error handling. New superpower.

💡An LLM API call is just fetch() with a prompt instead of params — same pattern, non-deterministic output.
Quick Pulse Check

What's the fundamental difference between a REST API and an LLM API call?

Predict First — Then Learn

Roughly how many tokens is the sentence 'Hello, how are you today?' (7 words)?

Tokens, Context Windows & Why They Matter

Tokens are variable-size chunks of text (~4 characters per token in English). The context window is the max tokens you can send + receive — think of it as the function's stack size. GPT-4o has 128K tokens (~300 pages). You'll manage this like you manage memory: be efficient, know your limits, and watch your costs (GPT-4o-mini charges fractions of a cent per 1K tokens, but volume adds up fast).

💡Tokens ≈ 4 chars. Context window = your function's stack size. Watch it like you watch memory.
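The ~4 chars/token rule of thumb is enough to sanity-check a prompt before you send it. A minimal sketch (function names are illustrative — for exact counts you'd use a real tokenizer such as the tiktoken package):

```typescript
// Rough token estimator using the ~4 chars/token heuristic for English.
// For exact counts, use a real tokenizer (e.g. the tiktoken package).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Guard a prompt against context-window overflow before sending it.
function fitsContext(prompt: string, maxTokens: number): boolean {
  return estimateTokens(prompt) <= maxTokens;
}

console.log(estimateTokens("Hello, how are you today?")); // 7 with this heuristic
```

Note this is only a heuristic — code and non-English text tokenize less efficiently, so leave yourself headroom.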
Quick Pulse Check

GPT-4o has a 128K token context window. Roughly how many pages of text is that?

Predict First — Then Learn

You need an LLM to return { sentiment: 'positive', score: 0.9 }. What's the most reliable approach?

Structured Output: The #1 Skill for AI Engineers

Raw LLM text is useless for apps. You need typed JSON. Structured output (JSON mode) forces the LLM to return data matching a schema you define. This is the single most practical AI skill for software engineers — extracting { title: string, sentiment: 'positive' | 'negative', tags: string[] } from free text. It's Zod schemas all the way down.

💡generateObject() + Zod = TypeScript for AI outputs. Same library you already use, new superpower.
Quick Pulse Check

What library does structured output use to define the response schema?

Streaming: The UX That Makes AI Feel Magical

Waiting 5 seconds for a wall of text feels broken. Streaming token-by-token feels responsive and alive. streamText() from the Vercel AI SDK gives you a ReadableStream — the same Web Streams API you already know. This is why ChatGPT, Cursor, and every good AI app streams responses.

💡streamText() returns a ReadableStream — the same Web Streams API you already know. Streaming is why AI feels alive.
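Because it's the standard Web Streams API, you can practice the consumer loop without calling an LLM at all. A sketch (assumes a runtime with a global `ReadableStream`, i.e. browsers or Node 18+; `fakeTokenStream` and `consume` are illustrative names):

```typescript
// Simulate a token stream with a native ReadableStream, then consume it
// chunk by chunk — the same reader loop works on streamText()'s textStream.
function fakeTokenStream(tokens: string[]): ReadableStream<string> {
  return new ReadableStream({
    start(controller) {
      for (const t of tokens) controller.enqueue(t);
      controller.close();
    },
  });
}

async function consume(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    full += value; // in a UI, append each chunk to state as it arrives
  }
  return full;
}

(async () => {
  const text = await consume(fakeTokenStream(["Hel", "lo, ", "world"]));
  console.log(text); // "Hello, world"
})();
```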
Quick Pulse Check

Why does every major AI product (ChatGPT, Cursor, Claude) stream responses?

Prompt Engineering Patterns

System prompts are like config files — they set behavior rules. Few-shot examples are like test fixtures — they show the expected I/O pattern. Chain-of-thought is like debug logging — making the model show its reasoning step by step. Master these three and you can make LLMs do almost anything.

💡System prompt = config file. Few-shot = test fixtures. Chain-of-thought = debug logging. Three patterns, infinite power.
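All three patterns can live in one messages array. A sketch of a sentiment classifier prompt (the task and wording are illustrative, not from the project spec):

```typescript
// System prompt = config; few-shot pairs = test fixtures; the
// "think step by step" instruction is the chain-of-thought trigger.
type Msg = { role: "system" | "user" | "assistant"; content: string };

function buildSentimentPrompt(input: string): Msg[] {
  return [
    // System prompt: set behavior rules up front
    {
      role: "system",
      content:
        "Classify sentiment. Think step by step, then end with one word: positive or negative.",
    },
    // Few-shot examples: show the expected input/output pattern
    { role: "user", content: "The docs were clear and setup took 5 minutes." },
    { role: "assistant", content: "Clear docs and fast setup are praise. positive" },
    { role: "user", content: "Crashed twice before I gave up." },
    { role: "assistant", content: "Crashes and giving up signal frustration. negative" },
    // The real input goes last
    { role: "user", content: input },
  ];
}
```

You'd pass this array as `messages` to generateText() instead of a single `prompt` string.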
Quick Pulse Check

Which prompt pattern is most like adding console.log() to trace a bug?

The Full Evolution

Watch one function evolve through every concept you just learned.

Production Gotchas

Rate limits will hit you — often around 500 RPM on GPT-4o-mini at lower usage tiers (use exponential backoff). Token overflow silently truncates your input — always count tokens before sending. Temperature 0 doesn't mean deterministic — it means less random. Cost surprise: a chat app with 10K users can cost $500/day if you're not careful with context management.
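Exponential backoff is a few lines of code. A sketch of a retry wrapper (names and defaults are illustrative; it assumes the error carries a `status` field, as the OpenAI SDK's errors do):

```typescript
// Retry a function on HTTP 429 (rate limit) with exponential backoff + jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 5, baseDelayMs = 500 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const rateLimited = err?.status === 429;
      if (!rateLimited || attempt >= retries) throw err;
      // 500ms, 1s, 2s, 4s, ... plus jitter to avoid thundering herds
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

Wrap your LLM calls in it: `await withRetry(() => generateText({ ... }))`.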

Code Comparison

API Call: REST vs LLM

Compare a traditional API call with an LLM API call

REST API Call (Traditional)
// Traditional REST API call
const response = await fetch(
  "https://api.weather.com/v1/forecast",
  {
    method: "GET",
    headers: {
      "Authorization": "Bearer " + API_KEY,
      "Content-Type": "application/json",
    },
  }
);
const data = await response.json();
// data.temperature -> always the same
// for the same input
LLM API Call (AI Engineering)
// LLM API call (Vercel AI SDK)
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  system: "You are a weather assistant.",
  prompt: "What's the weather like today?",
});
// text -> non-deterministic!
// Same input can give different outputs

KEY DIFFERENCES

  • Both are HTTP calls with auth headers — same pattern
  • LLM input is natural language, not structured params
  • LLM output is non-deterministic — same prompt can give different results
  • You control behavior with system prompts instead of query parameters

Raw Text vs Structured Output

Why you need typed JSON from LLMs, not raw strings

API → Typed Response (Traditional)
// Traditional: API returns typed data
interface WeatherResp {
  temp: number;
  conditions: string;
  humidity: number;
}

const res = await fetch("/api/weather");
const data: WeatherResp = await res.json();

// data.temp -> number ✓
// data.conditions -> string ✓
// TypeScript knows the shape
LLM → Structured Output (AI Engineering)
// AI: Force LLM to return typed JSON
import { generateObject } from "ai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    topics: z.array(z.string()),
    summary: z.string(),
  }),
  prompt: "Analyze: Great product, fast shipping!",
});

// object.sentiment -> "positive" ✓
// object.topics -> ["product", "shipping"] ✓
// TypeScript knows the shape!

KEY DIFFERENCES

  • Both produce typed data your app can consume
  • Structured output uses Zod schemas — the same library you know
  • generateObject() guarantees valid JSON matching your schema
  • No more parsing raw text with regex — let the LLM do it

Loading State vs Streaming

Why streaming transforms AI UX

Traditional Loading (Traditional)
// Traditional: wait for full response
const [loading, setLoading] = useState(false);
const [data, setData] = useState(null);

async function handleSubmit() {
  setLoading(true);
  const res = await fetch("/api/data");
  const json = await res.json();
  setData(json);  // All at once
  setLoading(false);
}

// User sees: spinner... spinner... BOOM
// Full content appears all at once
Streaming Response (AI Engineering)
// AI: Stream response token by token
import { useChat } from "ai/react";

const { messages, input, handleSubmit } =
  useChat({ api: "/api/chat" });

// Or manually with streamText (import it from "ai"):
const { textStream } = streamText({
  model: openai("gpt-4o-mini"),
  prompt: userQuestion,
});

for await (const chunk of textStream) {
  process.stdout.write(chunk);
  // User sees text appear word by word
}

// User sees: words... flowing... naturally
// Feels fast even if total time is the same

KEY DIFFERENCES

  • Traditional: spinner → full content dump (feels slow)
  • Streaming: text flows in real-time (feels responsive)
  • Uses ReadableStream — the same Web Streams API you know
  • ChatGPT, Cursor, and every great AI app uses streaming

Bridge Map: REST APIs + JSON → Model APIs + Structured Output


Hands-On Challenges

Build, experiment, and get AI-powered feedback on your code.

Real-World Challenge

AI-Powered Code Review Tool

Build and deploy a real code review tool that accepts code input, sends it to an LLM via the Vercel AI SDK, and streams back structured feedback with severity levels and fix suggestions. This is the tool you practiced building in the sandbox — now ship it for real.

~3h estimated
Next.js 14+Vercel AI SDKOpenAI GPT-4o-miniTailwind CSSVercel (deploy)

Acceptance Criteria

  • Accept code input via a text area or code editor
  • Call an LLM API using the Vercel AI SDK (generateText or streamText)
  • Stream the AI response in real-time so users see feedback appearing
  • Return structured output with severity (critical/warning/info) and category (bug/style/performance)
  • Display review results in a clean, readable UI
  • Handle errors gracefully (API failures, rate limits, empty input)
  • Deploy to a public URL (Vercel, Netlify, etc.)
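The severity/category criterion maps directly onto a typed shape. A sketch of what the review output could look like, with a runtime guard (field names are assumptions — in the real app, the Zod schema you pass to generateObject() does this validation for you):

```typescript
// One review finding from the LLM, matching the acceptance criteria:
// severity (critical/warning/info) and category (bug/style/performance).
type Severity = "critical" | "warning" | "info";
type Category = "bug" | "style" | "performance";

interface ReviewIssue {
  severity: Severity;
  category: Category;
  message: string;    // what's wrong
  suggestion: string; // how to fix it
}

// Runtime guard for untrusted JSON (e.g. a non-structured model response).
function isReviewIssue(value: unknown): value is ReviewIssue {
  const v = value as any;
  return (
    ["critical", "warning", "info"].includes(v?.severity) &&
    ["bug", "style", "performance"].includes(v?.category) &&
    typeof v?.message === "string" &&
    typeof v?.suggestion === "string"
  );
}
```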

Build Roadmap


Create a new Next.js app with TypeScript and Tailwind CSS. This gives you file-based routing, server-side API routes, and a modern styling system out of the box.

npx create-next-app@latest ai-code-reviewer --typescript --tailwind --app
Choose the App Router when prompted

Deploy Tip

Push to GitHub and import into Vercel — it auto-detects Next.js and deploys in under a minute. Remember to add your OPENAI_API_KEY in the Vercel environment variables.


After Learning — Rate Your Confidence Again

I can call an LLM API, get structured JSON output with a Zod schema, and stream responses to a UI.

1 = no idea · 5 = ship it blindfolded