Two things. First, an API route that extracts structured data from unstructured text --- name, email, and sentiment from a customer message, returned as typed JSON. Second, a chat route where the model can call your functions to get real-time information.
Here is the structured extraction endpoint:
// app/api/extract/route.ts
import { generateObject } from 'ai'
import { z } from 'zod'

const ContactSchema = z.object({
  name: z.string().describe('Full name of the person'),
  email: z.string().email().describe('Email address'),
  sentiment: z.enum(['positive', 'negative', 'neutral'])
    .describe('Overall sentiment of the message')
})

export async function POST(request: Request) {
  const { message } = await request.json()

  const { object } = await generateObject({
    model: 'openai/gpt-4o-mini',
    schema: ContactSchema,
    prompt: `Extract the contact information and sentiment from this customer message:\n\n${message}`
  })

  // object is fully typed: { name: string, email: string, sentiment: 'positive' | 'negative' | 'neutral' }
  return Response.json(object)
}
Send it "Hi, I'm Sarah Chen (sarah@example.com) and I'm really frustrated that my order hasn't shipped yet" and you get back:
{
  "name": "Sarah Chen",
  "email": "sarah@example.com",
  "sentiment": "negative"
}
Typed. Validated. No regex parsing. No "sometimes the model forgets the closing brace" problems. Copy the route, test it, then we break down why it works.
The AI SDK's generateObject function takes a Zod schema and forces the model to return data that matches it. Not "please return JSON" --- the model is constrained at the generation level to produce valid output.
The .describe() calls on each field matter. They tell the model what each field means. Think of them as documentation for the AI --- the more specific your descriptions, the more accurate the extraction.
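Those descriptions travel with the schema: before the request goes out, the Zod schema is converted to JSON Schema, and each `.describe()` string becomes a `description` field the model can read. This is a hand-written approximation of what the model sees for `ContactSchema`, not the SDK's exact wire format:

```typescript
// Approximate JSON Schema derived from ContactSchema.
// The .describe() strings surface as "description" fields.
const contactJsonSchema = {
  type: 'object',
  properties: {
    name: { type: 'string', description: 'Full name of the person' },
    email: { type: 'string', format: 'email', description: 'Email address' },
    sentiment: {
      type: 'string',
      enum: ['positive', 'negative', 'neutral'],
      description: 'Overall sentiment of the message'
    }
  },
  required: ['name', 'email', 'sentiment']
}

console.log(Object.keys(contactJsonSchema.properties).join(', '))
// prints: name, email, sentiment
```

This is why vague descriptions hurt: the model has nothing else to go on but the field names and these strings.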
Install Zod if you have not already:
npm install zod
The decision is simple: use streamText when the output is prose for humans, and generateObject when the output is structured data for your code.

You can also use streamObject if you want to show the structured data as it generates (for example, filling in a form in real time):
import { streamObject } from 'ai'
import { z } from 'zod'

const result = streamObject({
  model: 'openai/gpt-4o-mini',
  schema: z.object({
    title: z.string(),
    summary: z.string(),
    tags: z.array(z.string())
  }),
  prompt: 'Analyze this article...'
})

return result.toTextStreamResponse()
Structured outputs handle extraction --- turning unstructured text into data. Tool use goes further: it gives the model the ability to take actions.
Here is the concept: you define functions (tools) with names, descriptions, and input schemas. When the model determines it needs to call a tool to answer the user's question, it generates a tool call with the appropriate arguments. Your code executes the function and returns the result. The model then uses that result to formulate its response.
The model does not execute code. It decides which function to call and with what arguments. Your code handles the actual execution.
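To make that division of labor concrete, here is a minimal, SDK-free sketch in plain TypeScript. The `ToolCall` shape and the registry are illustrative, not the SDK's internals: the model only ever produces a tool name plus JSON arguments, and your code does the lookup and execution.

```typescript
// What the model actually produces: a name plus JSON arguments.
type ToolCall = { toolName: string; args: Record<string, unknown> }

// Your side: a registry of real functions keyed by tool name.
const toolRegistry: Record<string, (args: any) => Promise<any>> = {
  getWeather: async ({ city }) => ({ city, temp: 82, condition: 'Sunny' })
}

// The dispatch step the SDK performs for you: look up the tool,
// run it, and hand the result back to the model as a tool message.
async function executeToolCall(call: ToolCall) {
  const fn = toolRegistry[call.toolName]
  if (!fn) throw new Error(`Unknown tool: ${call.toolName}`)
  return fn(call.args)
}

// Simulated model output for "What's the weather in Miami?"
executeToolCall({ toolName: 'getWeather', args: { city: 'Miami' } })
  .then(result => console.log(result))
```

The key point survives the simplification: the model's output is data describing a call, and nothing runs until your code decides to run it.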
A concrete example. The user asks "What's the weather in Miami?" The model cannot answer this from its training data --- it needs real-time information. So you give it a tool:
// app/api/chat/route.ts
import { streamText, tool, stepCountIs, convertToModelMessages } from 'ai'
import { z } from 'zod'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: 'openai/gpt-4o-mini',
    messages: convertToModelMessages(messages),
    tools: {
      getWeather: tool({
        description: 'Get the current weather for a city',
        inputSchema: z.object({
          city: z.string().describe('The city name'),
          units: z.enum(['celsius', 'fahrenheit'])
            .default('fahrenheit')
            .describe('Temperature units')
        }),
        execute: async ({ city, units }) => {
          // In production, this would call a weather API
          // For now, return mock data
          const weatherData: Record<string, { temp: number; condition: string }> = {
            'Miami': { temp: 82, condition: 'Sunny' },
            'New York': { temp: 45, condition: 'Cloudy' },
            'San Francisco': { temp: 58, condition: 'Foggy' }
          }
          const data = weatherData[city]
          if (!data) return { error: `No weather data for ${city}` }
          return {
            city,
            temperature: data.temp,
            units,
            condition: data.condition
          }
        }
      })
    },
    stopWhen: stepCountIs(3) // Allow the model to use tools and then respond
  })

  return result.toUIMessageStreamResponse()
}
Here is the flow when the user asks "What's the weather in Miami?":
1. User sends: "What's the weather in Miami?"
2. Model analyzes the question and available tools
3. Model decides to call getWeather({ city: "Miami", units: "fahrenheit" })
4. Your execute function runs, returns { city: "Miami", temperature: 82, ... }
5. Model receives the tool result
6. Model generates: "It's currently 82 degrees and sunny in Miami."
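That six-step flow is a loop with a step limit. This is a simplified sketch of the cycle, not the SDK's actual implementation, using a mocked model that requests one tool call and then answers in text:

```typescript
type Step = { toolCall?: { toolName: string; args: any }; text?: string }

// Mocked model: first step requests a tool, second step answers in text.
function mockModel(history: unknown[]): Step {
  return history.length === 0
    ? { toolCall: { toolName: 'getWeather', args: { city: 'Miami' } } }
    : { text: "It's currently 82 degrees and sunny in Miami." }
}

async function runSteps(maxSteps: number): Promise<string> {
  const history: unknown[] = []
  for (let step = 0; step < maxSteps; step++) {
    const out = mockModel(history)
    if (out.text) return out.text // final text response: stop early
    if (out.toolCall) {
      // Execute the tool and append the result so the model can use it
      history.push({ role: 'tool', result: { temp: 82, condition: 'Sunny' } })
    }
  }
  return '(step limit reached before a final response)'
}

runSteps(3).then(text => console.log(text))
```

Run it with a limit of 1 and you hit the failure mode described below: the tool call consumes the only step, so no final response is ever generated.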
inputSchema not parameters: The tool() function uses inputSchema for the Zod schema that defines the tool's arguments. This is validated at runtime --- if the model generates invalid arguments, the SDK catches it.
stopWhen: stepCountIs(3): This is critical. It allows the model to make tool calls and then continue generating. Without it, generation stops after the first tool call, before the model produces a final text response. The number is the maximum rounds of tool-call-then-continue the model can make; it finishes earlier if it produces a text answer.
toUIMessageStreamResponse(): The same streaming response from lesson 2. The client's useChat hook handles tool calls transparently --- it renders text parts and can display tool invocations if you want to show them.
Real applications need multiple tools. The model decides which to call (or none, if it can answer directly):
tools: {
  getWeather: tool({
    description: 'Get current weather for a city',
    inputSchema: z.object({
      city: z.string()
    }),
    execute: async ({ city }) => {
      return await fetchWeather(city)
    }
  }),
  searchProducts: tool({
    description: 'Search the product catalog by name or category',
    inputSchema: z.object({
      query: z.string().describe('Search terms'),
      category: z.string().optional().describe('Product category filter')
    }),
    execute: async ({ query, category }) => {
      return await searchProductDatabase(query, category)
    }
  }),
  createSupportTicket: tool({
    description: 'Create a support ticket for the customer',
    inputSchema: z.object({
      subject: z.string(),
      priority: z.enum(['low', 'medium', 'high']),
      description: z.string()
    }),
    execute: async ({ subject, priority, description }) => {
      const ticket = await createTicket({ subject, priority, description })
      return { ticketId: ticket.id, status: 'created' }
    }
  })
}
The descriptions matter enormously. The model reads them to decide which tool to call. Vague descriptions lead to wrong tool selections. Be specific about what each tool does and when it should be used.
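For intuition about why descriptions carry so much weight: the model never sees your execute functions. It chooses among tools using only metadata like the following (a hand-written illustration of the kind of JSON derived from your tool definitions, not the SDK's exact wire format):

```typescript
// All the model has to choose from: names, descriptions,
// and argument schemas --- never your implementation code.
const toolMetadata = [
  {
    name: 'getWeather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city']
    }
  },
  {
    name: 'searchProducts',
    description: 'Search the product catalog by name or category',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search terms' },
        category: { type: 'string', description: 'Product category filter' }
      },
      required: ['query']
    }
  }
]

console.log(toolMetadata.map(t => t.name).join(', '))
// prints: getWeather, searchProducts
```

If two descriptions overlap or stay vague ("gets data"), the model has no reliable way to pick between them.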
Structured outputs and tool use transform what you can build.
This is the boundary between "AI feature" and "AI product." A chat widget that generates text is a feature. A chat widget that can look up a customer's order, check the shipping status, and initiate a refund is a product.
Add a lookupUser tool to the weather chat route. Give it an inputSchema with an email field, and have the execute function return mock user data (name, plan, signup date). Then ask the chat: "What plan is sarah@example.com on and what's the weather in her city?"
The model will need to chain two tool calls --- lookupUser to get the city, then getWeather to get the weather. This is your first taste of multi-step tool use, which becomes the foundation for agents in lesson 5.
You can now get structured data from an LLM and let it call your functions. But the model still only knows what is in its training data. When a user asks about your company's knowledge base, your product docs, or last week's support tickets, the model guesses --- or worse, hallucinates. In the next lesson, you build a RAG pipeline that grounds the model's answers in your own data.