A customer support agent could explain the refund policy in beautiful detail. But when a user said "refund my last order," the agent responded with instructions to visit the refund page and fill out a form. The user was talking to an AI agent specifically to avoid filling out forms. The conversation felt like calling a company and being told to check the website.
The agent could talk about actions. It could not take them. This is the gap that tool use fills. When the model can call functions -- look up an order, process a refund, check inventory -- the conversation becomes genuinely useful. Without tools, your agent is a search bar with personality.
The AI SDK defines tools as functions the model can decide to call. You describe the tool's purpose and its parameters using a Zod schema. The model decides when to invoke it, and your code executes the function.
```typescript
import { streamText } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

const result = streamText({
  model: google('gemini-2.5-flash'),
  system: systemPrompt,
  messages: modelMessages,
  tools: {
    searchKnowledge: {
      description: 'Search the knowledge base for information about projects, work, or expertise.',
      parameters: z.object({
        query: z.string().describe('The search query'),
      }),
      execute: async ({ query }) => {
        const docs = await searchDatabase(query);
        return JSON.stringify(docs);
      },
    },
    getCurrentWeather: {
      description: 'Get current weather for a location',
      parameters: z.object({
        location: z.string().describe('City name or coordinates'),
        unit: z.enum(['celsius', 'fahrenheit']).optional(),
      }),
      execute: async ({ location, unit }) => {
        const weather = await fetchWeather(location, unit);
        return JSON.stringify(weather);
      },
    },
  },
  maxSteps: 5, // Allow up to 5 tool calls per response
});
```
Three things matter here: the description, which the model reads to decide when the tool is relevant; the parameters schema, which defines and validates the arguments the model supplies; and the execute function, which runs your code and returns a result the model reads before continuing.
The maxSteps parameter controls the agentic loop. The model can call a tool, read the result, call another tool, and keep going until it has enough information to respond -- or until it hits the step limit.
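The loop that maxSteps bounds can be sketched in plain TypeScript. This is a standalone illustration with a stubbed-out model decision, not the SDK's internals; the names ModelTurn and runAgentLoop are invented for the sketch:

```typescript
// Sketch of the agentic loop: call a tool, feed the result back,
// repeat until the model is ready to answer or the step limit hits.
type ToolCall = { toolName: string; args: unknown } | null;

interface ModelTurn {
  decide(history: string[]): ToolCall; // null means: ready to answer
}

function runAgentLoop(
  model: ModelTurn,
  tools: Record<string, (args: unknown) => string>,
  maxSteps: number,
): string[] {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = model.decide(history);
    if (call === null) break; // model has enough information
    const result = tools[call.toolName](call.args);
    history.push(`${call.toolName} -> ${result}`); // result fed back in
  }
  return history;
}

// A stub model that searches once, then answers.
const stubModel: ModelTurn = {
  decide: (h) => (h.length === 0 ? { toolName: 'search', args: 'refunds' } : null),
};
const trace = runAgentLoop(stubModel, { search: (q) => `docs for ${String(q)}` }, 5);
// trace: ['search -> docs for refunds']
```

The step limit is the safety valve: without it, a model that keeps asking for more tool calls would loop indefinitely.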
Tools work the same way conceptually in voice agents, but the UX is fundamentally different. When a chat agent calls a tool, you can show a loading indicator. When a voice agent calls a tool, there is silence.
Here is the same knowledge base search tool in a LiveKit voice agent:
```typescript
import { llm } from '@livekit/agents';
import { z } from 'zod';

const tools = {
  search: llm.tool({
    description: 'Search the knowledge base for information about projects, work, or expertise.',
    parameters: z.object({
      query: z.string().describe('The search query'),
    }),
    execute: async ({ query }) => {
      const docs = await retrieveContext(query);
      if (docs.length === 0) {
        return 'No specific information found for this query.';
      }
      return docs.map((d) => d.content).join('\n\n');
    },
  }),
};
```
The API surface is nearly identical. The difference is latency sensitivity:
| Context | Acceptable Tool Latency | User Experience During Wait |
|---------|------------------------|----------------------------|
| Chat | Up to 3 seconds | Loading spinner, "Searching..." indicator |
| Voice | Under 500ms | Silence -- feels like the agent froze |
Strategies for voice tool latency:

- Speak a short filler phrase ("Let me check that for you") before or while the tool runs, so the silence never stretches past a beat.
- Keep tool work fast: cache common lookups, precompute where possible, and run independent calls in parallel.
- If a tool is unavoidably slow, acknowledge the wait out loud rather than leaving dead air.
For complex workflows, you need fine-grained control over which tools are available at each step and when the loop should stop.
```typescript
import { streamText, stepCountIs } from 'ai';

const result = streamText({
  model: google('gemini-2.5-flash'),
  messages,
  tools: myTools,
  stopWhen: stepCountIs(3), // Stop after 3 steps
});
```

Note that in AI SDK 5, stopWhen with stepCountIs replaces maxSteps, so the two are not combined.
For more dynamic control, stopWhen accepts a function and prepareStep lets you change available tools per step:
```typescript
const result = streamText({
  model: google('gemini-2.5-flash'),
  messages,
  tools: myTools,
  stopWhen: ({ steps }) => {
    // Stop once a specific tool has been called
    const last = steps[steps.length - 1];
    return last?.toolCalls.some((c) => c.toolName === 'submitOrder') ?? false;
  },
  prepareStep: ({ stepNumber }) => {
    // After 3 steps, only allow the final submission tool
    if (stepNumber > 3) {
      return { activeTools: ['submitOrder'] };
    }
    return {};
  },
});
```
stopWhen halts the loop based on conditions -- useful for workflows where a specific tool call means "we are done." prepareStep changes the available tools at each step -- useful for guided flows where the agent should not skip ahead.
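The interaction of the two hooks can be shown in plain TypeScript. The names mirror the SDK's, but the loop below is a standalone illustration with an invented runGuidedLoop helper, not the SDK's implementation:

```typescript
// A "model" that wants to call tools in a fixed order; prepareStep gates
// which tools it may pick each step, and stopWhen ends the loop early.
type Step = { toolName: string };

function runGuidedLoop(
  plan: string[],                                // tools the model wants, in order
  stopWhen: (steps: Step[]) => boolean,
  prepareStep: (stepNumber: number) => string[], // tools allowed at this step
  maxSteps = 10,
): Step[] {
  const remaining = [...plan];
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps && remaining.length > 0; i++) {
    const allowed = prepareStep(i);
    const next = remaining.find((t) => allowed.includes(t)); // pick from what's allowed
    if (!next) break;
    remaining.splice(remaining.indexOf(next), 1);
    steps.push({ toolName: next });
    if (stopWhen(steps)) break; // e.g. submitOrder means "we are done"
  }
  return steps;
}

const guided = runGuidedLoop(
  ['lookupUser', 'checkStock', 'submitOrder', 'sendMarketingEmail'],
  (s) => s[s.length - 1]?.toolName === 'submitOrder',      // stop on submission
  (n) => (n > 2 ? ['submitOrder'] : ['lookupUser', 'checkStock', 'submitOrder']),
);
// guided: lookupUser, checkStock, submitOrder -- sendMarketingEmail never runs
```

The stop condition fires the moment submitOrder lands, so later tools in the plan are never reached even though the step budget allows them.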
Structured outputs force the model to return data in a specific shape, validated against a schema. This is different from tool use -- here you are constraining the model's final response, not giving it functions to call.
```typescript
import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

const schema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  topics: z.array(z.string()).max(5),
  summary: z.string().max(200),
});

const { object } = await generateObject({
  model: google('gemini-2.5-flash'),
  schema,
  prompt: `Analyze this customer message: "${userMessage}"`,
});

// object is fully typed:
// { sentiment: 'positive', confidence: 0.87, topics: ['pricing'], summary: '...' }
```
The model's output is validated against the schema before it reaches your code -- if the response cannot be coerced into the declared shape, generateObject throws rather than handing you malformed data. No parsing, no regex, no "please format your response as JSON." The AI SDK handles the formatting instructions and validation for you.
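To make concrete what that validation buys you, here is the same contract written as a hand-rolled type guard. The isAnalysis function is invented for illustration; generateObject does this work for you via Zod:

```typescript
// The shape the sentiment schema above declares, enforced by hand.
type Analysis = {
  sentiment: 'positive' | 'negative' | 'neutral';
  confidence: number; // 0..1
  topics: string[];   // at most 5
  summary: string;    // at most 200 chars
};

function isAnalysis(value: unknown): value is Analysis {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sentiment === 'string' &&
    ['positive', 'negative', 'neutral'].includes(v.sentiment) &&
    typeof v.confidence === 'number' && v.confidence >= 0 && v.confidence <= 1 &&
    Array.isArray(v.topics) && v.topics.length <= 5 &&
    v.topics.every((t) => typeof t === 'string') &&
    typeof v.summary === 'string' && v.summary.length <= 200
  );
}
```

Every branch of this guard corresponds to one constraint in the Zod schema; the point of generateObject is that you never have to write or maintain this code yourself.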
The real power comes from combining both: the agent calls tools to gather information, then returns a structured response.
```typescript
const result = streamText({
  model: google('gemini-2.5-flash'),
  messages,
  tools: {
    lookupUser: {
      description: 'Look up user information by email',
      parameters: z.object({ email: z.string().email() }),
      execute: async ({ email }) => {
        return JSON.stringify(await db.users.findByEmail(email));
      },
    },
    checkSubscription: {
      description: 'Check subscription status',
      parameters: z.object({ userId: z.string() }),
      execute: async ({ userId }) => {
        return JSON.stringify(await db.subscriptions.get(userId));
      },
    },
  },
  maxSteps: 3,
});
```
The model might first call lookupUser, then checkSubscription with the returned user ID, then synthesize both results into a human-readable response. This is the agentic pattern -- the model reasons about which tools to call and in what order.
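That chaining, written out by hand, looks like the sketch below. The model would decide this ordering itself at runtime; here it is fixed, and the data store and helper names are made up for the example:

```typescript
// In-memory stand-in for the real database behind the two tools.
const fakeDb = {
  users: new Map([['ada@example.com', { id: 'u1', name: 'Ada' }]]),
  subscriptions: new Map([['u1', { plan: 'pro', active: true }]]),
};

async function lookupUser(email: string) {
  return fakeDb.users.get(email) ?? null;
}
async function checkSubscription(userId: string) {
  return fakeDb.subscriptions.get(userId) ?? null;
}

async function answerSubscriptionQuestion(email: string): Promise<string> {
  const user = await lookupUser(email);            // step 1
  if (!user) return 'No account found for that email.';
  const sub = await checkSubscription(user.id);    // step 2: uses step 1's output
  if (!sub?.active) return `${user.name} has no active subscription.`;
  return `${user.name} is on the ${sub.plan} plan.`; // step 3: synthesize
}
```

The interesting part is step 2: the userId argument does not exist in the conversation at all -- it only becomes available once the first tool returns, which is why a single-step call could never answer this question.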
When writing schemas for structured output:

- Use .describe() on every field. The description helps the model understand what each field means.
- Use .max(), .min(), and .length() to prevent runaway outputs.
- Mark fields that may legitimately be absent as .optional().

```typescript
// Good: descriptive, constrained
z.object({
  priority: z.enum(['low', 'medium', 'high', 'critical'])
    .describe('How urgent this issue is'),
  estimatedMinutes: z.number().min(1).max(480)
    .describe('Estimated time to resolve in minutes'),
  category: z.string().max(50)
    .describe('The support category this falls under'),
});
```
Add tool use to the streaming chat you built in Lesson 3:
1. Add one or two tools to your streamText call with maxSteps: 3.
2. Render tool activity in the UI by checking part.type === 'tool-invocation' and part.state === 'call'.
3. Try stopWhen: stepCountIs(3) and observe how it affects multi-step reasoning.

Key takeaways: maxSteps controls the agentic loop, and stopWhen and prepareStep give you fine-grained control over it. Constrain structured-output schemas with .describe(), enums, and size limits.

You have a streaming chat agent that can call functions and return structured data. Now we cross the modality boundary. Next, we cover WebRTC and the OpenAI Realtime API -- how to build voice agents that process audio end-to-end with sub-second latency, delivered over peer-to-peer connections.