A streaming chat interface where the user types a message and sees the AI's response appear word by word in real time. Two files. About 60 lines total. Here they are.
Server --- the API route:
// app/api/chat/route.ts
import { streamText } from 'ai'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: 'openai/gpt-4o-mini',
    system: 'You are a helpful assistant. Be concise and direct.',
    messages,
    maxTokens: 500
  })

  return result.toUIMessageStreamResponse()
}
Client --- the React component:
// app/page.tsx
'use client'

import { useChat } from '@ai-sdk/react'

export default function ChatPage() {
  const { messages, input, setInput, handleSubmit, status, error } = useChat()
  const isLoading = status === 'streaming' || status === 'submitted'

  return (
    <main className="mx-auto max-w-2xl p-4">
      <h1 className="mb-4 text-2xl font-bold">AI Chat</h1>

      <div className="mb-4 space-y-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={
              message.role === 'user'
                ? 'rounded-lg bg-blue-100 p-3'
                : 'rounded-lg bg-gray-100 p-3'
            }
          >
            <p className="text-xs font-semibold uppercase text-gray-500">
              {message.role}
            </p>
            <p className="mt-1">
              {message.parts
                .filter((part) => part.type === 'text')
                .map((part) => part.text)
                .join('')}
            </p>
          </div>
        ))}
      </div>

      {isLoading && (
        <p className="mb-2 text-sm text-gray-400">Generating...</p>
      )}

      {error && (
        <p className="mb-2 text-sm text-red-500">
          Error: {error.message}
        </p>
      )}

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Type your message..."
          className="flex-1 rounded-lg border p-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading || !input.trim()}
          className="rounded-lg bg-blue-600 px-4 py-2 text-white disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </main>
  )
}
Copy both files into your project. Run npm run dev. Open your browser. You have a working streaming chat. Now let us break down every piece.
First, the dependencies:

npm install ai @ai-sdk/openai @ai-sdk/react
The @ai-sdk/openai package is the provider adapter. The SDK also supports @ai-sdk/anthropic, @ai-sdk/google, and others --- same interface, different model.
In the last lesson, you made an API call and waited for the full response before showing anything to the user. For a one-sentence summary, that is acceptable. For a chat interface generating three paragraphs, the user stares at a blank screen for 3-5 seconds. That feels broken.
Streaming fixes this. Instead of waiting for the complete response, you show each word as the model generates it. The total time is the same, but the perceived speed is dramatically better. This is not a nice-to-have --- it is the baseline expectation for any AI chat interface.
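The difference is easiest to see in miniature. This is a toy sketch of the idea only, not the SDK's implementation: the full text exists only once the last token is generated, but a streaming consumer can render a growing prefix after every token. The `tokenize` and `renderStreaming` names are mine.

```typescript
// Toy model: split a response into word "tokens" the way a model emits them.
function* tokenize(text: string): Generator<string> {
  for (const word of text.split(' ')) yield word + ' '
}

// Collect what the user would see after each token arrives.
// A blocking call shows only the final frame; streaming shows every frame.
function renderStreaming(text: string): string[] {
  const frames: string[] = []
  let visible = ''
  for (const token of tokenize(text)) {
    visible += token
    frames.push(visible)
  }
  return frames
}

const frames = renderStreaming('RAG stands for retrieval augmented generation')
// The user watches the prefix grow: 'RAG ', 'RAG stands ', ...
```

Six words in, six frames out: the blocking version would show only the last one.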
import { streamText } from 'ai'
streamText is the core server function. It calls the LLM and returns a streaming result object.
const result = streamText({
  model: 'openai/gpt-4o-mini',
  system: 'You are a helpful assistant. Be concise and direct.',
  messages,
  maxTokens: 500
})
The model parameter uses the provider/model string format --- 'openai/gpt-4o-mini'. This is the universal model identifier. To swap providers, change this one string:
model: 'anthropic/claude-sonnet-4-20250514'
// or
model: 'google/gemini-2.0-flash'
The rest of your code stays identical. This is the main reason to use the SDK --- you are not locked to a single provider.
return result.toUIMessageStreamResponse()
This converts the result into a streaming response that the useChat hook on the client can consume token by token. It uses a protocol optimized for UI message rendering, handling text chunks, tool calls, and metadata.
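You can picture the client side of this as folding a sequence of small events into one growing message. The event shapes below are simplified stand-ins I made up for illustration; the SDK's actual wire protocol is an internal detail that useChat parses for you.

```typescript
// Illustrative only: simplified stand-ins for streaming protocol events.
type StreamEvent =
  | { type: 'text-delta'; delta: string }
  | { type: 'finish' }

// Fold a stream of events into the assistant message's visible text.
function applyEvents(events: StreamEvent[]): { text: string; done: boolean } {
  let text = ''
  let done = false
  for (const event of events) {
    if (event.type === 'text-delta') text += event.delta
    else if (event.type === 'finish') done = true
  }
  return { text, done }
}
```

Each text delta appends to the message; a finish event flips the loading state back off.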
const { messages, input, setInput, handleSubmit, status, error } = useChat()
useChat does all the work:
- messages --- the full conversation history, updated in real time as tokens arrive.
- input and setInput --- controlled state for the text input.
- handleSubmit --- sends the current input as a new user message to your API route.
- status --- the current lifecycle state (see below).
- error --- any error from the API call.

{message.parts
  .filter((part) => part.type === 'text')
  .map((part) => part.text)
  .join('')}
Messages in the AI SDK have a parts array, not a simple content string. Each part has a type --- text, tool call, tool result, and others. For basic chat, you filter for text parts and join them. This structure becomes important in the next lesson when you add tool use.
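The same filter-and-join the component performs can be pulled out as a standalone helper. The `Part` type here is trimmed down to just what this lesson uses; real SDK parts carry more fields.

```typescript
// Simplified part shapes: text parts plus one non-text example.
type Part =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolName: string }

// Keep only text parts and concatenate them, skipping everything else.
function extractText(parts: Part[]): string {
  return parts
    .filter((part): part is Extract<Part, { type: 'text' }> => part.type === 'text')
    .map((part) => part.text)
    .join('')
}

const sampleParts: Part[] = [
  { type: 'text', text: 'Hello, ' },
  { type: 'tool-call', toolName: 'search' },
  { type: 'text', text: 'world' },
]
```

The tool-call part is silently dropped here; in the next lesson you will render it instead.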
Here is what happens on every message, step by step:
User types "What is RAG?" and clicks Send
|
useChat sends POST /api/chat with messages array
|
API route receives messages, calls streamText()
|
streamText calls OpenAI with streaming enabled
|
OpenAI generates tokens one at a time
|
toUIMessageStreamResponse() converts each token to the streaming protocol
|
useChat receives each event, updates the messages array
|
React re-renders the message list with each new token
|
User sees "RAG stands for..." appear word by word
The key insight: useChat manages the entire message array for you. It handles appending the user message, creating the assistant message placeholder, streaming tokens into it, and tracking the loading state. You do not manage any of this manually.
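A stripped-down model of that bookkeeping makes it concrete. The shapes and function names below are mine, not the SDK's: one function appends the user message plus an empty assistant placeholder, and another streams each token into that placeholder.

```typescript
// Minimal message shape for the sketch.
type Msg = { id: string; role: 'user' | 'assistant'; text: string }

// On submit: append the user message and an empty assistant placeholder.
function sendUserMessage(messages: Msg[], text: string): Msg[] {
  return [
    ...messages,
    { id: `u${messages.length}`, role: 'user', text },
    { id: `a${messages.length}`, role: 'assistant', text: '' },
  ]
}

// On each token: append it to the last (assistant) message immutably,
// which is what triggers a React re-render.
function appendToken(messages: Msg[], token: string): Msg[] {
  const last = messages[messages.length - 1]
  return [...messages.slice(0, -1), { ...last, text: last.text + token }]
}

let chat = sendUserMessage([], 'What is RAG?')
for (const token of ['RAG ', 'stands ', 'for...']) chat = appendToken(chat, token)
```

Each token produces a new array, so React sees a state change and repaints the message list.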
useChat exposes a status field with four possible values:
const { status } = useChat()
// 'ready' - Idle. Waiting for user input.
// 'submitted' - Request sent. Waiting for first token from the server.
// 'streaming' - Tokens arriving. Response is being generated.
// 'error' - Something failed.
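One convenient pattern is to derive all your UI flags from status in one place. This `uiFlags` helper is hypothetical, not part of the SDK:

```typescript
// The four lifecycle values useChat reports.
type ChatStatus = 'ready' | 'submitted' | 'streaming' | 'error'

// Derive every piece of UI state from status in one place.
function uiFlags(status: ChatStatus) {
  return {
    disableInput: status === 'submitted' || status === 'streaming',
    showSpinner: status === 'streaming',
    showError: status === 'error',
  }
}
```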
Use status to control your UI:
- Disable the input and button during submitted and streaming to prevent double-sends.
- Show a loading indicator during streaming.
- Display a meaningful message on error.

Mistake 1: Not streaming at all. If you use generateText instead of streamText, the server waits for the full response before sending anything. The user sees nothing for seconds. Always use streamText for chat interfaces.
// WRONG: blocks until complete
import { generateText } from 'ai'
const { text } = await generateText({ model: 'openai/gpt-4o-mini', messages })
return Response.json({ text })
// RIGHT: streams token by token
import { streamText } from 'ai'
const result = streamText({ model: 'openai/gpt-4o-mini', messages })
return result.toUIMessageStreamResponse()
Mistake 2: Forgetting error handling. API calls fail. Models time out. Rate limits hit. Always check the error field from useChat and show a meaningful message.
Mistake 3: Not setting maxTokens. Without a limit, the model can generate thousands of tokens on a single response. That costs money and creates a bad experience. Set maxTokens to a reasonable ceiling for your use case --- 500 for chat, 1000 for analysis, 2000 for long-form generation.
Mistake 4: Ignoring mobile. Test your chat interface on a phone. The input field should stay visible when the keyboard opens. Messages should scroll automatically. These are small details that break the experience if missed.
Add a model selector dropdown to your chat page. Let the user pick between openai/gpt-4o-mini, anthropic/claude-sonnet-4-20250514, and google/gemini-2.0-flash. Pass the selected model in the request body and use it in the API route:
// In your API route
const { messages, model = 'openai/gpt-4o-mini' } = await request.json()

const result = streamText({
  model,
  system: 'You are a helpful assistant. Be concise and direct.',
  messages,
  maxTokens: 500
})
This gives you a tangible feel for how different models respond to the same prompt --- some are faster, some are more verbose, some follow instructions more tightly. You will make model selection decisions for every feature you build.
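One caveat worth building in: the model id now arrives from the client, so do not pass it to streamText verbatim. A small allowlist check (the `resolveModel` name and list are mine) keeps a tampered request from billing you for an arbitrary model:

```typescript
// The three models the dropdown offers.
const ALLOWED_MODELS = [
  'openai/gpt-4o-mini',
  'anthropic/claude-sonnet-4-20250514',
  'google/gemini-2.0-flash',
] as const

// Accept only known model ids; fall back to a safe default otherwise.
function resolveModel(requested: unknown): string {
  return typeof requested === 'string' &&
    (ALLOWED_MODELS as readonly string[]).includes(requested)
    ? requested
    : 'openai/gpt-4o-mini'
}
```

In the route you would then call resolveModel on the value pulled from the request body before handing it to streamText.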
Your chat returns plain text. That covers conversations, but most product features need structured data --- extract a name and email from a support ticket, classify sentiment, parse an invoice into line items. In the next lesson, you will get structured JSON outputs and teach the model to call your functions with tool use.