A developer shipped an AI assistant for a SaaS product. The model was capable -- it could answer questions about every feature. But users kept asking the same question three different ways because the agent's first response was a wall of text that did not address what they actually meant. When users said "never mind" and closed the chat, the agent had no way to recover. There was no clarification step, no disambiguation, no graceful exit.
The model was not the problem. The conversation was. The agent treated every interaction as a single request-response pair. It had no sense of flow, no strategy for ambiguity, and no plan for when things went sideways. Conversation design is what prevents this.
Traditional software has buttons, forms, and navigation. Conversational AI has none of that. The conversation is the entire interface. Every word the agent says is simultaneously content, navigation, and UX feedback.
This means conversation design is not "prompt engineering with manners." It is interface design. It requires the same rigor you would apply to a checkout flow or an onboarding wizard. And like any interface, it must handle the unhappy path as well as the happy one.
Your agent needs a consistent identity. Not a gimmick -- a reliable personality that users learn to predict.
What to define:
- Name and role: who the agent is and what it is for.
- Tone and register: formal or casual, terse or warm.
- Vocabulary: words it uses and words it avoids.
- Boundaries: topics it will and will not engage with.
The persona must adapt its delivery between modalities while keeping its character constant:
```typescript
const systemPrompt = buildSystemPrompt(ragContext);

// Voice mode adds delivery constraints:
const voiceAddendum = `
Voice response rules:
- Respond in plain text only; no markdown, lists, or code.
- Keep replies brief: one to three sentences.
- Ask one question at a time.
- Spell out numbers and email addresses.
`;
```
The voice addendum is critical. The same persona behaves differently in voice -- shorter sentences, no formatting, one question at a time. The character stays the same; the delivery adapts to the medium.
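One way to wire this together is a single prompt builder that keys on modality. This is a minimal sketch, not the lesson's actual implementation: the persona string, the `"chat" | "voice"` mode parameter, and the `buildSystemPrompt` signature shown here are illustrative assumptions.

```typescript
// Hypothetical sketch: the persona text and function signature are
// illustrative assumptions. The persona is shared across modalities;
// only the delivery rules change between chat and voice.
const personaPrompt =
  "You are Atlas, the product assistant. Warm, concise, direct."; // hypothetical persona

const voiceRules = `
Voice response rules:
- Respond in plain text only; no markdown, lists, or code.
- Keep replies brief: one to three sentences.
- Ask one question at a time.
- Spell out numbers and email addresses.
`;

function buildSystemPrompt(mode: "chat" | "voice", ragContext: string): string {
  const base = `${personaPrompt}\n\nContext:\n${ragContext}`;
  // Same character in both modes; voice only appends delivery constraints.
  return mode === "voice" ? base + "\n" + voiceRules : base;
}
```

Because the persona is a single shared constant, a wording change propagates to both modalities automatically, which is what keeps the character consistent.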
Conversations have rhythm. Someone speaks, someone listens, they switch. In human conversation this is automatic. In AI conversation, you have to engineer it.
Chat turn-taking is straightforward -- the user sends a message, the agent responds. The complexity comes from:
- Streaming: the user can read, react, and interrupt while tokens are still arriving.
- Follow-ups: a second message may land before the first response finishes.
- Context: every turn has to carry forward what was established in earlier turns.
Voice turn-taking is harder:
- Endpointing: detecting when the user has actually finished speaking, not just paused.
- Interruptions: distinguishing a genuine barge-in from a cough or brief crosstalk.
- Latency: the gap before the agent replies must feel conversational, not like dead air.
In production, voice endpointing requires generous margins:
```typescript
const session = new voice.AgentSession({
  stt, llm, tts,
  voiceOptions: {
    minEndpointingDelay: 1000,    // Wait 1s of silence
    maxEndpointingDelay: 5000,    // But no more than 5s
    minInterruptionDuration: 800, // Ignore brief crosstalk
    minInterruptionWords: 2,      // Need 2+ words to interrupt
    preemptiveGeneration: true,   // Start generating during silence
  },
});
```
These values prevent the agent from jumping in too early while still feeling responsive. They were tuned through real user testing, not guesswork.
Users do not arrive with full context. They drop into a conversation mid-thought. Your agent needs to ground itself -- establish what it knows, what it does not know, and what it needs.
Good grounding patterns:
- Open by stating what the agent can help with, in one sentence.
- Ask a single clarifying question when a request is ambiguous.
- Confirm the interpretation before acting on a consequential request.

Bad grounding patterns:
- Answering an ambiguous question with a wall of text and hoping something lands.
- Asking several clarifying questions at once.
- Pretending to have context the agent does not have.
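The grounding step can be sketched as a small pre-answer check: answer directly only when the request is unambiguous, and otherwise ask exactly one clarifying question. This is a minimal sketch; the `groundingReply` helper, the keyword-match heuristic, and the `knownTopics` list are illustrative assumptions, not the lesson's implementation.

```typescript
// Hypothetical sketch: a naive keyword heuristic stands in for real
// intent detection. Returns null when the agent can answer directly,
// or a single clarifying question otherwise.
function groundingReply(userMessage: string, knownTopics: string[]): string | null {
  const matches = knownTopics.filter((t) =>
    userMessage.toLowerCase().includes(t.toLowerCase())
  );
  if (matches.length === 1) return null; // unambiguous: answer directly
  if (matches.length > 1) {
    return `Are you asking about ${matches.join(" or ")}?`; // disambiguate
  }
  // No match: state capabilities and ask one question.
  return `I can help with ${knownTopics.join(", ")}. Which are you interested in?`;
}
```

The shape matters more than the heuristic: one question per turn, and an explicit statement of capabilities when the agent has nothing to anchor on.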
Every conversation will go wrong. The question is whether the user recovers or abandons.
Three error types, three designed responses:
Misunderstanding: "I interpreted that as [X]. Did you mean something different?"
Inability: "I cannot do [X], but I can help with [Y]. Would that work?"
System failure: "I am having trouble connecting right now. Try again in a moment."
Each response acknowledges the problem, takes responsibility, and offers a next step. Generic "sorry, I did not understand" messages fail on all three counts.
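The three designed responses above can be centralized so every failure path produces a message with the same shape. A minimal sketch, assuming a hypothetical `AgentError` union and `errorResponse` helper:

```typescript
// Hypothetical sketch: one place that maps the three error categories
// to designed responses. Each message names the problem and offers a
// next step instead of a generic apology.
type AgentError =
  | { kind: "misunderstanding"; interpretation: string }
  | { kind: "inability"; requested: string; alternative: string }
  | { kind: "system_failure" };

function errorResponse(err: AgentError): string {
  switch (err.kind) {
    case "misunderstanding":
      return `I interpreted that as ${err.interpretation}. Did you mean something different?`;
    case "inability":
      return `I cannot ${err.requested}, but I can help with ${err.alternative}. Would that work?`;
    case "system_failure":
      return "I am having trouble connecting right now. Try again in a moment.";
  }
}
```

Routing every failure through one function also makes the error copy auditable: the three templates live in one file instead of being scattered across handlers.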
Some agents should guide. Some should follow. Most should do both.
Guided flow works when the user has a clear goal: booking an appointment, filling a form, completing a wizard. The agent leads with structured questions.
Open conversation works when the user is exploring: asking about a product, learning about a topic, chatting out of curiosity. The agent follows the user's lead.
The hybrid approach uses suggestion chips -- pre-written prompts that guide without constraining:
const suggestionChips = [
{ text: "What does Celestino work on?", icon: "robot" },
{ text: "What is his tech stack?", icon: "lightning" },
{ text: "Tell me about his projects", icon: "rocket" },
{ text: "How can I hire him?", icon: "briefcase" },
];
These lower the barrier to entry. Users who do not know what to ask get a starting point. Users who do can ignore them entirely.
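Handling a chip tap needs no special machinery. A minimal sketch, where the `SuggestionChip` interface and `selectChip` helper are illustrative assumptions: selecting a chip simply submits its text through the same path as a typed message.

```typescript
// Hypothetical sketch: chips are plain data, and selecting one sends its
// text exactly as if the user had typed it, so the agent needs no special
// handling for chip-originated turns.
interface SuggestionChip {
  text: string;
  icon: string;
}

function selectChip(
  chip: SuggestionChip,
  sendMessage: (text: string) => void
): void {
  sendMessage(chip.text); // identical to the user typing the prompt
}
```

Keeping chips as ordinary messages means the persona, grounding, and error handling designed above all apply to chip-originated turns for free.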
Trust is the meta-pattern. Every design decision either builds or erodes it.
Trust builders:
- Admitting what the agent cannot do and offering an alternative.
- A persona and tone that stay consistent across every turn.
- Error messages that name the problem and propose a next step.

Trust destroyers:
- Answering confidently when the agent did not understand the question.
- Generic apologies with no path forward.
- A personality that shifts from turn to turn.
Design a conversation flow document for your agent with these deliverables:
1. A persona definition: name, role, tone, and boundaries.
2. A grounding strategy: how the agent opens and how it handles ambiguity.
3. Designed responses for all three error types: misunderstanding, inability, and system failure.
4. A flow decision: guided, open, or hybrid -- including your suggestion chips if hybrid.
5. Your voiceOptions values, with a note on why each value was chosen.

This document is your conversation design spec. Reference it every time you write a system prompt or handle an error.
You have a persona, a grounding strategy, and error responses designed. Now it is time to build. Next, we cover Streaming Chat with the AI SDK -- turning these conversation design principles into a working chat interface with real-time token delivery, session management, and custom data channels.