A developer shipped an AI assistant for a SaaS product. The model was capable -- it could answer questions about every feature. But users kept asking the same question three different ways because the agent's first response was a wall of text that did not address what they actually meant. When users said "never mind" and closed the chat, the agent had no way to recover. There was no clarification step, no disambiguation, no graceful exit.
The model was not the problem. The conversation was. The agent treated every interaction as a single request-response pair. It had no sense of flow, no strategy for ambiguity, and no plan for when things went sideways. Conversation design is what prevents this.
Traditional software has buttons, forms, and navigation. Conversational AI has none of that. The conversation is the entire interface. Every word the agent says is simultaneously content, navigation, and UX feedback.
This means conversation design is not "prompt engineering with manners." It is interface design. It requires the same rigor you would apply to a checkout flow or an onboarding wizard. And like any interface, it must handle the unhappy path as well as the happy one.
Your agent needs a consistent identity. Not a gimmick -- a reliable personality that users learn to predict.
What to define:
- Name and role: who the agent is and what it is for.
- Tone and register: formal or casual, terse or warm.
- Vocabulary: words it uses and words it avoids.
- Boundaries: topics it will and will not engage with.
The persona must adapt its delivery between modalities while keeping its character constant:
```typescript
const systemPrompt = buildSystemPrompt(ragContext);

// Voice mode adds delivery constraints:
const voiceAddendum = `
Voice response rules:
- Respond in plain text only; no markdown, lists, or code.
- Keep replies brief: one to three sentences.
- Ask one question at a time.
- Spell out numbers and email addresses.
`;
```
The voice addendum is critical. The same persona behaves differently in voice -- shorter sentences, no formatting, one question at a time. The character stays the same; the delivery adapts to the medium.
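One way to wire this together is a single prompt builder that keys on modality. This is a minimal sketch, not the lesson's actual implementation: the persona string, the `"chat" | "voice"` mode parameter, and the `buildSystemPrompt` signature shown here are illustrative assumptions.

```typescript
// Hypothetical sketch: the persona text and function signature are
// illustrative assumptions. The persona is shared across modalities;
// only the delivery rules change between chat and voice.
const personaPrompt =
  "You are Atlas, the product assistant. Warm, concise, direct."; // hypothetical persona

const voiceRules = `
Voice response rules:
- Respond in plain text only; no markdown, lists, or code.
- Keep replies brief: one to three sentences.
- Ask one question at a time.
- Spell out numbers and email addresses.
`;

function buildSystemPrompt(mode: "chat" | "voice", ragContext: string): string {
  const base = `${personaPrompt}\n\nContext:\n${ragContext}`;
  // Same character in both modes; voice only appends delivery constraints.
  return mode === "voice" ? base + "\n" + voiceRules : base;
}
```

Because the persona is a single shared constant, a wording change propagates to both modalities automatically, which is what keeps the character consistent.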
Conversations have rhythm. Someone speaks, someone listens, they switch. In human conversation this is automatic. In AI conversation, you have to engineer it.
Chat turn-taking is straightforward -- the user sends a message, the agent responds. The complexity comes from:
- Streaming: the user can read, react, and interrupt while tokens are still arriving.
- Follow-ups: a second message may land before the first response finishes.
- Context: every turn has to carry forward what was established in earlier turns.
Voice turn-taking is harder:
- Endpointing: detecting when the user has actually finished speaking, not just paused.
- Interruptions: distinguishing a genuine barge-in from a cough or brief crosstalk.
- Latency: the gap before the agent replies must feel conversational, not like dead air.
In production, voice endpointing requires generous margins:
```typescript
const session = new voice.AgentSession({
  stt, llm, tts,
  voiceOptions: {
    minEndpointingDelay: 1000,    // Wait 1s of silence
    maxEndpointingDelay: 5000,    // But no more than 5s
    minInterruptionDuration: 800, // Ignore brief crosstalk
    minInterruptionWords: 2,      // Need 2+ words to interrupt
    preemptiveGeneration: true,   // Start generating during silence
  },
});
```
These values prevent the agent from jumping in too early while still feeling responsive. They were tuned through real user testing, not guesswork.
Users do not arrive with full context. They drop into a conversation mid-thought. Your agent needs to ground itself -- establish what it knows, what it does not know, and what it needs.
Good grounding patterns:
- Open by stating what the agent can help with, in one sentence.
- Ask a single clarifying question when a request is ambiguous.
- Confirm the interpretation before acting on a consequential request.

Bad grounding patterns:
- Answering an ambiguous question with a wall of text and hoping something lands.
- Asking several clarifying questions at once.
- Pretending to have context the agent does not have.
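The grounding step can be sketched as a small pre-answer check: answer directly only when the request is unambiguous, and otherwise ask exactly one clarifying question. This is a minimal sketch; the `groundingReply` helper, the keyword-match heuristic, and the `knownTopics` list are illustrative assumptions, not the lesson's implementation.

```typescript
// Hypothetical sketch: a naive keyword heuristic stands in for real
// intent detection. Returns null when the agent can answer directly,
// or a single clarifying question otherwise.
function groundingReply(userMessage: string, knownTopics: string[]): string | null {
  const matches = knownTopics.filter((t) =>
    userMessage.toLowerCase().includes(t.toLowerCase())
  );
  if (matches.length === 1) return null; // unambiguous: answer directly
  if (matches.length > 1) {
    return `Are you asking about ${matches.join(" or ")}?`; // disambiguate
  }
  // No match: state capabilities and ask one question.
  return `I can help with ${knownTopics.join(", ")}. Which are you interested in?`;
}
```

The shape matters more than the heuristic: one question per turn, and an explicit statement of capabilities when the agent has nothing to anchor on.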
Every conversation will go wrong. The question is whether the user recovers or abandons.
Three error types, three designed responses:
Misunderstanding: "I interpreted that as [X]. Did you mean something different?"
Inability: "I cannot do [X], but I can help with [Y]. Would that work?"
System failure: "I am having trouble connecting right now. Try again in a moment."
Each response acknowledges the problem, takes responsibility, and offers a next step. Generic "sorry, I did not understand" messages fail on all three counts.
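The three designed responses above can be centralized so every failure path produces a message with the same shape. A minimal sketch, assuming a hypothetical `AgentError` union and `errorResponse` helper:

```typescript
// Hypothetical sketch: one place that maps the three error categories
// to designed responses. Each message names the problem and offers a
// next step instead of a generic apology.
type AgentError =
  | { kind: "misunderstanding"; interpretation: string }
  | { kind: "inability"; requested: string; alternative: string }
  | { kind: "system_failure" };

function errorResponse(err: AgentError): string {
  switch (err.kind) {
    case "misunderstanding":
      return `I interpreted that as ${err.interpretation}. Did you mean something different?`;
    case "inability":
      return `I cannot ${err.requested}, but I can help with ${err.alternative}. Would that work?`;
    case "system_failure":
      return "I am having trouble connecting right now. Try again in a moment.";
  }
}
```

Routing every failure through one function also makes the error copy auditable: the three templates live in one file instead of being scattered across handlers.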
Some agents should guide. Some should follow. Most should do both.
Guided flow works when the user has a clear goal: booking an appointment, filling a form, completing a wizard. The agent leads with structured questions.
Open conversation works when the user is exploring: asking about a product, learning about a topic, chatting out of curiosity. The agent follows the user's lead.
The hybrid approach uses suggestion chips -- pre-written prompts that guide without constraining:
const suggestionChips = [
{ text: "What does Celestino work on?", icon: "robot" },
{ text: "What is his tech stack?", icon: "lightning" },
{ text: "Tell me about his projects", icon: "rocket" },
{ text: "How can I hire him?", icon: "briefcase" },
];
These lower the barrier to entry. Users who do not know what to ask get a starting point. Users who do can ignore them entirely.
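Handling a chip tap needs no special machinery. A minimal sketch, where the `SuggestionChip` interface and `selectChip` helper are illustrative assumptions: selecting a chip simply submits its text through the same path as a typed message.

```typescript
// Hypothetical sketch: chips are plain data, and selecting one sends its
// text exactly as if the user had typed it, so the agent needs no special
// handling for chip-originated turns.
interface SuggestionChip {
  text: string;
  icon: string;
}

function selectChip(
  chip: SuggestionChip,
  sendMessage: (text: string) => void
): void {
  sendMessage(chip.text); // identical to the user typing the prompt
}
```

Keeping chips as ordinary messages means the persona, grounding, and error handling designed above all apply to chip-originated turns for free.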
Trust is the meta-pattern. Every design decision either builds or erodes it.
Trust builders:
- Admitting what the agent cannot do and offering an alternative.
- A persona and tone that stay consistent across every turn.
- Error messages that name the problem and propose a next step.

Trust destroyers:
- Answering confidently when the agent did not understand the question.
- Generic apologies with no path forward.
- A personality that shifts from turn to turn.
Design a conversation flow document for your agent with these deliverables:
1. A persona definition: name, role, tone, and boundaries.
2. A grounding strategy: how the agent opens and how it handles ambiguity.
3. Designed responses for all three error types: misunderstanding, inability, and system failure.
4. A flow decision: guided, open, or hybrid -- including your suggestion chips if hybrid.
5. Your voiceOptions values, with a note on why each value was chosen.

This document is your conversation design spec. Reference it every time you write a system prompt or handle an error.
You have a persona, a grounding strategy, and error responses designed. Now it is time to build. Next, we cover Streaming Chat with the AI SDK -- turning these conversation design principles into a working chat interface with real-time token delivery, session management, and custom data channels.