LLM APIs, Structured Output & Streaming
You already call REST APIs and parse JSON. Today you'll call LLM APIs the same way — but you'll also learn the two killer features every AI app needs: extracting typed JSON from LLMs (structured output) and streaming responses in real-time. By tonight, you'll ship a code review tool you'll actually use at work.
Use this at work tomorrow
Use generateText() with a system prompt to auto-generate PR descriptions from git diffs.
Learning Objectives
1. Call LLM APIs using the Vercel AI SDK (generateText, streamText)
2. Extract typed JSON from LLMs with structured output / JSON mode
3. Stream AI responses in real-time for production UX
4. Master prompt engineering: system prompts, few-shot, chain-of-thought
5. Build and ship an AI-powered code review tool
Ship It: AI code review tool
By the end of this day, you'll build and deploy an AI code review tool. This isn't a toy — it's a real project for your portfolio.
I can call an LLM API, get structured JSON output with a Zod schema, and stream responses to a UI.
What happens when you send the exact same prompt to GPT-4o twice?
From REST APIs to LLM APIs
You've been calling REST APIs for years — HTTP POST with a JSON body, get back structured data. An LLM API call is structurally identical. The difference? The input is a natural language prompt, and the output is non-deterministic. Same fetch(), same auth headers, same error handling. New superpower.
What's the fundamental difference between a REST API and an LLM API call?
Roughly how many tokens is the sentence 'Hello, how are you today?' (7 words)?
Tokens, Context Windows & Why They Matter
Tokens are like variable-size characters (~4 chars per token in English). The context window is the max tokens you can send + receive — think of it as the function's stack size. GPT-4o has 128K tokens (~300 pages). You'll manage this like you manage memory: be efficient, know your limits, and watch your costs — small models like GPT-4o-mini cost a fraction of a cent per 1K tokens, but larger models cost orders of magnitude more.
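The ~4-chars-per-token rule of thumb is enough for a quick budget check before you send a request. Here's a minimal sketch — the function names are illustrative, and a real tokenizer (e.g. tiktoken) will be more accurate:

```typescript
// Rough token estimate using the ~4 chars/token heuristic for English.
// Good enough for budgeting; use a real tokenizer for billing-accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check that prompt + expected output fit the model's context window.
function fitsContext(
  promptTokens: number,
  maxOutputTokens: number,
  contextWindow = 128_000 // GPT-4o's window
): boolean {
  return promptTokens + maxOutputTokens <= contextWindow;
}

const prompt = "Hello, how are you today?";
console.log(estimateTokens(prompt)); // 7 (25 chars / 4, rounded up)
console.log(fitsContext(100_000, 30_000)); // false — 130K > 128K
```

Run a check like this before every request: token overflow fails silently by truncating your input, so catching it client-side is far cheaper than debugging a mysteriously bad response.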
GPT-4o has a 128K token context window. Roughly how many pages of text is that?
You need an LLM to return { sentiment: 'positive', score: 0.9 }. What's the most reliable approach?
Structured Output: The #1 Skill for AI Engineers
Raw LLM text is useless for apps. You need typed JSON. Structured output (JSON mode) forces the LLM to return data matching a schema you define. This is the single most practical AI skill for software engineers — extracting { title: string, sentiment: 'positive' | 'negative', tags: string[] } from free text. It's Zod schemas all the way down.
What library does structured output use to define the response schema?
Streaming: The UX That Makes AI Feel Magical
Waiting 5 seconds for a wall of text feels broken. Streaming token-by-token feels responsive and alive. streamText() from the Vercel AI SDK gives you a ReadableStream — the same Web Streams API you already know. This is why ChatGPT, Cursor, and every good AI app streams responses.
Why does every major AI product (ChatGPT, Cursor, Claude) stream responses?
Prompt Engineering Patterns
System prompts are like config files — they set behavior rules. Few-shot examples are like test fixtures — they show the expected I/O pattern. Chain-of-thought is like debug logging — making the model show its reasoning step by step. Master these three and you can make LLMs do almost anything.
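The three patterns above compose into a single messages array. This is a hypothetical helper (not part of the AI SDK) showing how a system prompt, few-shot pairs, and a chain-of-thought instruction fit together — you'd pass the result to `generateText({ messages })`:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Builds a few-shot message array: system prompt = config,
// example pairs = test fixtures, user input last.
function buildFewShotPrompt(
  system: string,
  examples: Array<{ input: string; output: string }>,
  userInput: string
): Message[] {
  const messages: Message[] = [{ role: "system", content: system }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.input });
    messages.push({ role: "assistant", content: ex.output });
  }
  messages.push({ role: "user", content: userInput });
  return messages;
}

const messages = buildFewShotPrompt(
  // System prompt sets the rules; "think step by step" adds chain-of-thought.
  "You classify commit messages as feat, fix, or chore. Think step by step, then answer with one word.",
  [
    { input: "add dark mode toggle", output: "feat" },
    { input: "bump lodash to 4.17.21", output: "chore" },
  ],
  "handle null user in login flow"
);
// messages[0] = config-like system rule
// messages[1..4] = fixture-like few-shot pairs
// messages[5] = the actual query
```

The fixture analogy is exact: two or three well-chosen examples constrain the output format far more reliably than paragraphs of instructions.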
Which prompt pattern is most like adding console.log() to trace a bug?
Production Gotchas
Rate limits will hit you at ~500 RPM on GPT-4o-mini (use exponential backoff). Token overflow silently truncates your input — always count tokens before sending. Temperature 0 doesn't mean deterministic — it means less random. Cost surprise: a chat app with 10K users can cost $500/day if you're not careful with context management.
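Exponential backoff is the standard fix for those rate limits. A minimal sketch — it assumes the failing call throws an error carrying a `status` field, which is how most HTTP client errors surface, but check your SDK's error shape:

```typescript
// Retry a failing async call with exponential backoff + jitter.
// Only retries 429 (rate limit); everything else rethrows immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      // 500ms, 1s, 2s, 4s... plus jitter so retries don't synchronize
      const delay = baseMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap any LLM call
// const { text } = await withBackoff(() =>
//   generateText({ model: openai("gpt-4o-mini"), prompt })
// );
```

The jitter matters: if many clients back off on the same schedule, they all retry at once and hit the limit again.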
Code Comparison
API Call: REST vs LLM
Compare a traditional API call with an LLM API call
// Traditional REST API call
const response = await fetch(
  "https://api.weather.com/v1/forecast",
  {
    method: "GET",
    headers: {
      "Authorization": "Bearer " + API_KEY,
      "Content-Type": "application/json",
    },
  }
);
const data = await response.json();
// data.temperature -> always the same
// for the same input

// LLM API call (Vercel AI SDK)
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  system: "You are a weather assistant.",
  prompt: "What's the weather like today?",
});
// text -> non-deterministic!
// Same input can give different outputs

KEY DIFFERENCES
- Both are HTTP calls with auth headers — same pattern
- LLM input is natural language, not structured params
- LLM output is non-deterministic — same prompt can give different results
- You control behavior with system prompts instead of query parameters
Raw Text vs Structured Output
Why you need typed JSON from LLMs, not raw strings
// Traditional: API returns typed data
interface WeatherResp {
  temp: number;
  conditions: string;
  humidity: number;
}
const res = await fetch("/api/weather");
const data: WeatherResp = await res.json();
// data.temp -> number ✓
// data.conditions -> string ✓
// TypeScript knows the shape

// AI: Force LLM to return typed JSON
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    topics: z.array(z.string()),
    summary: z.string(),
  }),
  prompt: "Analyze: Great product, fast shipping!",
});
// object.sentiment -> "positive" ✓
// object.topics -> ["product", "shipping"] ✓
// TypeScript knows the shape!

KEY DIFFERENCES
- Both produce typed data your app can consume
- Structured output uses Zod schemas — the same library you know
- generateObject() guarantees valid JSON matching your schema
- No more parsing raw text with regex — let the LLM do it
Loading State vs Streaming
Why streaming transforms AI UX
// Traditional: wait for full response
const [loading, setLoading] = useState(false);
const [data, setData] = useState(null);
async function handleSubmit() {
  setLoading(true);
  const res = await fetch("/api/data");
  const json = await res.json();
  setData(json); // All at once
  setLoading(false);
}
// User sees: spinner... spinner... BOOM
// Full content appears all at once

// AI: Stream response token by token
import { useChat } from "ai/react";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const { messages, input, handleSubmit } =
  useChat({ api: "/api/chat" });

// Or manually with streamText:
const { textStream } = streamText({
  model: openai("gpt-4o-mini"),
  prompt: userQuestion,
});
for await (const chunk of textStream) {
  process.stdout.write(chunk);
  // User sees text appear word by word
}
// User sees: words... flowing... naturally
// Feels fast even if total time is the same

KEY DIFFERENCES
- Traditional: spinner → full content dump (feels slow)
- Streaming: text flows in real-time (feels responsive)
- Uses ReadableStream — the same Web Streams API you know
- ChatGPT, Cursor, and every great AI app uses streaming
Hands-On Challenges
Build, experiment, and get AI-powered feedback on your code.
AI-Powered Code Review Tool
Build and deploy a real code review tool that accepts code input, sends it to an LLM via the Vercel AI SDK, and streams back structured feedback with severity levels and fix suggestions. This is the tool you practiced building in the sandbox — now ship it for real.
Acceptance Criteria
- Accept code input via a text area or code editor
- Call an LLM API using the Vercel AI SDK (generateText or streamText)
- Stream the AI response in real-time so users see feedback appearing
- Return structured output with severity (critical/warning/info) and category (bug/style/performance)
- Display review results in a clean, readable UI
- Handle errors gracefully (API failures, rate limits, empty input)
- Deploy to a public URL (Vercel, Netlify, etc.)
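Before wiring up the LLM, it helps to pin down the review-item shape the acceptance criteria describe. This is a hand-rolled runtime check as a sketch — in the real app you'd express the same shape as a Zod schema and pass it to `generateObject()`; the type and field names here are one reasonable design, not a prescribed API:

```typescript
// The structured shape each review finding should match.
type Severity = "critical" | "warning" | "info";
type Category = "bug" | "style" | "performance";

interface ReviewItem {
  severity: Severity;
  category: Category;
  message: string;     // what's wrong
  suggestion: string;  // how to fix it
}

// Runtime guard: validates untrusted JSON (e.g. a parsed LLM response)
// before your UI renders it.
function isReviewItem(x: unknown): x is ReviewItem {
  if (typeof x !== "object" || x === null) return false;
  const obj = x as Record<string, unknown>;
  return (
    ["critical", "warning", "info"].includes(obj.severity as string) &&
    ["bug", "style", "performance"].includes(obj.category as string) &&
    typeof obj.message === "string" &&
    typeof obj.suggestion === "string"
  );
}
```

Even with `generateObject()` enforcing the schema at generation time, a guard like this (or `schema.safeParse` with Zod) is useful at the UI boundary — it's how you "handle errors gracefully" when a response comes back malformed.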
Build Roadmap
Create a new Next.js app with TypeScript and Tailwind CSS. This gives you file-based routing, server-side API routes, and a modern styling system out of the box.
npx create-next-app@latest ai-code-reviewer --typescript --tailwind --app

Choose the App Router when prompted.

Deploy Tip
Push to GitHub and import into Vercel — it auto-detects Next.js and deploys in under a minute. Remember to add your OPENAI_API_KEY in the Vercel environment variables.