A voice agent was handling a customer inquiry about their account balance. Mid-sentence, the LLM provider returned a 503. The agent's error handler logged the error and... did nothing. The user heard silence. Five seconds of silence. Then the WebRTC connection timed out. The user hung up and called back, got a different agent instance with no memory of the previous conversation, and had to start over. One provider blip turned into two failed conversations and a frustrated user.
The error was unavoidable. The experience was not. If the agent had said "I am having trouble connecting right now -- give me one moment" while retrying with a fallback provider, the user would have waited. Silence is the worst possible error message in a conversation. This lesson teaches you how to never deliver it.
Conversational agents have failure modes that traditional software does not. Understanding the categories is the first step to handling them.
Your LLM, STT, or TTS service goes down or times out.
Symptoms: Empty responses, timeouts, HTTP 5xx errors, streaming interruptions.
Handling pattern: Retry with backoff, then fall back to a secondary provider.
```typescript
import { streamText, type ModelMessage } from 'ai';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';

// AgentError is this app's own error class: it carries a user-facing
// message plus retry metadata, so the UI can speak the failure aloud.
async function generateWithFallback(
  messages: ModelMessage[],
  system: string
) {
  try {
    // Primary: Gemini 2.5 Flash
    return await streamText({
      model: google('gemini-2.5-flash'),
      system,
      messages,
    });
  } catch (primaryError) {
    console.error('Primary LLM failed:', primaryError);
    try {
      // Fallback: GPT-4o Mini (different provider entirely)
      return await streamText({
        model: openai('gpt-4o-mini'),
        system,
        messages,
      });
    } catch (fallbackError) {
      console.error('Fallback LLM also failed:', fallbackError);
      // Last resort: static response
      throw new AgentError(
        'I am having trouble connecting to my brain right now. '
          + 'Please try again in a moment.',
        { retryable: true }
      );
    }
  }
}
```
The critical principle: each fallback level degrades capability, not availability. The secondary model may be less capable, but the user still gets a response. The static message is the last resort -- the agent admits failure clearly rather than going silent.
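Note that the example above jumps straight from one failed call to the fallback provider. The "retry with backoff" half of the pattern absorbs transient errors, such as a single 503, before giving up on the primary. A minimal helper sketch (the attempt count and delays are illustrative defaults, not prescribed values):

```typescript
// Retry an async operation with exponential backoff.
// maxAttempts and baseDelayMs are illustrative, not prescribed values.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential delay: 250 ms, 500 ms, 1000 ms, ...
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** attempt)
        );
      }
    }
  }
  throw lastError;
}
```

Wrapping each provider call, e.g. withBackoff(() => streamText({ ... })), means the secondary provider is only consulted once the primary has genuinely failed rather than merely hiccuped.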
In voice mode, STT can misinterpret speech -- producing garbled text, non-English characters when English is expected, or empty transcripts.
Handling pattern: Validate transcripts before processing.
```typescript
function shouldIgnoreTranscript(text: string): boolean {
  const trimmed = text.trim();
  if (!trimmed) return true;

  // Count meaningful characters
  const alphaNumCount = (trimmed.match(/[A-Za-z0-9]/g) || []).length;
  const nonAsciiCount = (trimmed.match(/[^\x00-\x7F]/g) || []).length;

  // Too short -- likely noise
  if (alphaNumCount < 2) return true;

  // Non-English when we expect English
  if (nonAsciiCount > 0 && !/[A-Za-z]/.test(trimmed)) return true;

  return false;
}
```
This filter runs on every user turn. When a transcript is rejected, the agent responds with a clarification rather than attempting to answer gibberish:
```typescript
async onUserTurnCompleted(ctx, msg) {
  const text = msg.textContent ?? '';
  if (shouldIgnoreTranscript(text)) {
    await this.session.generateReply({
      instructions: 'Let the user know you could not understand them '
        + 'and ask them to repeat their question.',
      allowInterruptions: true,
    });
    throw new voice.StopResponse();
  }
  await super.onUserTurnCompleted(ctx, msg);
}
```
The StopResponse exception is conversation-specific error handling. It does not crash the agent -- it tells the pipeline "do not process this turn further." The conversation continues. This is what "conversations are state machines" means in practice: invalid input transitions to a recovery state, not an error state.
Users hit rate limits. This is intentional -- you want to control costs. But the experience of hitting a limit should not feel punitive.
Handling pattern: Transparent, progressive disclosure.
```tsx
// Show remaining questions proactively
{rateLimitInfo && (
  <div aria-label={
    `${rateLimitInfo.remaining} of ${rateLimitInfo.limit} questions remaining`
  }>
    <span>{rateLimitInfo.remaining}</span>
    <span>/</span>
    <span>{rateLimitInfo.limit}</span>
    <span>questions today</span>
  </div>
)}

// When the limit is hit, explain clearly
{rateLimitError && (
  <div role="alert">
    <h3>Daily Limit Reached</h3>
    <p>
      {rateLimitError.isAuthenticated
        ? 'You have used all your questions for today. Come back tomorrow.'
        : 'Sign in for more daily questions.'}
    </p>
    {!rateLimitError.isAuthenticated && (
      <a href={signInUrl}>Sign In for More</a>
    )}
  </div>
)}
```
The pattern: show the limit before they hit it (remaining counter), explain why when they do (clear message), and offer an action (sign in, upgrade, wait).
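The UI above assumes the server reports remaining and limit alongside each response. A minimal sketch of the server-side counter behind that contract, assuming a fixed daily window and in-memory storage (checkRateLimit and the limit values are hypothetical; production systems typically back this with Redis or a database so counts survive restarts):

```typescript
interface RateLimitInfo {
  remaining: number;
  limit: number;
}

// Illustrative daily limits -- real values depend on your cost model.
const DAILY_LIMIT_ANONYMOUS = 5;
const DAILY_LIMIT_AUTHENTICATED = 50;

// In-memory usage store keyed by user (or IP for anonymous users).
const usage = new Map<string, { count: number; day: string }>();

// Returns the updated counters, or null when the limit is hit
// (the caller then responds with a 429 and a clear message).
function checkRateLimit(
  userKey: string,
  isAuthenticated: boolean
): RateLimitInfo | null {
  const limit = isAuthenticated
    ? DAILY_LIMIT_AUTHENTICATED
    : DAILY_LIMIT_ANONYMOUS;
  const today = new Date().toISOString().slice(0, 10); // fixed daily window
  const entry = usage.get(userKey);
  const count = entry && entry.day === today ? entry.count : 0;
  if (count >= limit) return null;
  usage.set(userKey, { count: count + 1, day: today });
  return { remaining: limit - (count + 1), limit };
}
```

Returning the counters on every successful request is what lets the client render the proactive "questions today" counter instead of surprising the user at the limit.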
WebRTC connections drop. SSE streams disconnect. The user's network changes from WiFi to cellular.
Handling pattern: Detect, inform, reconnect.
```tsx
// Voice: monitor connection state
const connectionState = useConnectionState();

// Show status to user
{connectionState !== ConnectionState.Connected && (
  <span role="status" aria-live="polite">
    Reconnecting...
  </span>
)}

// Chat: handle streaming errors
const { status } = useChat({
  transport,
  onError: (error) => {
    const message = error instanceof Error ? error.message : String(error);
    if (message.includes('rate_limit') || message.includes('429')) {
      setRateLimitError({
        message: 'Daily limit reached.',
        isAuthenticated: false,
      });
    } else {
      setGenericError('Something went wrong. Please try again.');
    }
  },
});
```
The agent confidently answers a question it should not. This is the hardest failure to handle because the agent does not know it is wrong.
Handling pattern: Guardrails at the system prompt level, plus RAG grounding.
```typescript
const systemPrompt = buildSystemPrompt(ragContext);
// The prompt includes:
// "If the retrieved context does not contain relevant information,
//  say 'I do not have that information' rather than guessing."
// "Do not answer questions about topics outside your expertise."
// "If unsure, ask the user to clarify."
```
RAG-grounded agents hallucinate less because they answer from retrieved documents, not parametric memory. But "less" is not "never." The system prompt is the last line of defense.
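As a sketch of what such a buildSystemPrompt might assemble (the structure and exact wording here are illustrative, not the course's actual prompt):

```typescript
// Illustrative sketch: embed retrieved context plus the anti-hallucination
// guardrail clauses described above into a single system prompt.
function buildSystemPrompt(ragContext: string): string {
  return [
    'You are a support agent. Answer ONLY from the context below.',
    '',
    'Context:',
    ragContext,
    '',
    'Rules:',
    '- If the retrieved context does not contain relevant information, say',
    "  'I do not have that information' rather than guessing.",
    '- Do not answer questions about topics outside your expertise.',
    '- If unsure, ask the user to clarify.',
  ].join('\n');
}
```

Keeping the guardrails in one builder function means every entry point (voice and chat) ships the same last line of defense.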
Think of error handling as a stack, where each layer catches what the previous layer missed:
```text
+-------------------------------+
| Layer 5: User-facing message  |  "I am having trouble. Try again."
+-------------------------------+
| Layer 4: Fallback provider    |  Switch from Gemini to GPT-4o Mini
+-------------------------------+
| Layer 3: Retry with backoff   |  3 attempts, exponential delay
+-------------------------------+
| Layer 2: Input validation     |  Reject bad transcripts, sanitize
+-------------------------------+
| Layer 1: Circuit breaker      |  Stop calling a failing service
+-------------------------------+
```
Each layer reduces the blast radius. If the circuit breaker is open, you skip retries and go straight to the fallback. If the fallback also fails, you give the user a clear, honest message. The user never hears silence.
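Layer 1 is the only layer the earlier examples did not show in code. A minimal circuit breaker sketch (the failure threshold and cool-down values are illustrative): after enough consecutive failures the circuit opens and calls fail fast, skipping retries entirely, until a cool-down elapses and a trial call is allowed through.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures the
// circuit opens and calls are rejected immediately until `cooldownMs`
// has passed, at which point one trial call is let through.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 3,       // illustrative: failures before opening
    private cooldownMs = 30_000  // illustrative: how long to stay open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Open: fail fast so the caller goes straight to the fallback.
        throw new Error('circuit open');
      }
      // Cool-down elapsed: allow a trial call through.
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrapping the primary provider call in breaker.call(...) is what lets Layer 1 short-circuit Layers 3 and 4: while the circuit is open, the agent spends no time retrying a service that is known to be down.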
Error messages in conversational AI are part of the conversation. They must speak in the agent's voice, use plain language instead of internal jargon, and give the user something to do next:
Bad: "Error 500: Internal Server Error"
Good: "I ran into a problem trying to find that information. Could you ask that in a different way, or try again in a moment?"

Bad: "STT confidence below threshold"
Good: "I did not catch that clearly. Could you repeat what you said?"
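One way to enforce this consistently is to translate internal error categories into conversational phrasing in a single place in the pipeline. A sketch, with hypothetical category names and messages drawn from the examples above:

```typescript
// Map internal error categories to conversational, actionable messages.
// The category names are illustrative, not from any specific library.
type ErrorCategory =
  | 'provider_down'
  | 'low_transcript_confidence'
  | 'retrieval_failed'
  | 'unknown';

const USER_FACING_MESSAGES: Record<ErrorCategory, string> = {
  provider_down:
    'I am having trouble connecting right now. Give me one moment and try again.',
  low_transcript_confidence:
    'I did not catch that clearly. Could you repeat what you said?',
  retrieval_failed:
    'I ran into a problem trying to find that information. '
    + 'Could you ask that in a different way, or try again in a moment?',
  unknown:
    'Something went wrong on my end. Please try again in a moment.',
};

function toConversationalMessage(category: ErrorCategory): string {
  return USER_FACING_MESSAGES[category] ?? USER_FACING_MESSAGES.unknown;
}
```

Centralizing the mapping keeps stack-trace language out of the conversation even when a new error path is added later.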
You cannot fix what you cannot see. Instrument your agent to track errors, fallback activations, and transcript rejections:
```typescript
// Forward agent errors to your analytics pipeline
// (trackEvent is your app's own analytics helper).
session.on(voice.AgentSessionEventTypes.Error, (ev) => {
  trackEvent('AgentError', {
    errorType: ev.error.name,
    errorMessage: ev.error.message,
    sessionId: room.name,
  });
});
```
If your fallback activation rate is above 5%, your primary provider has a reliability problem. If your transcript rejection rate is above 20%, your users are in noisy environments and you need better noise cancellation.
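Those two thresholds can be checked mechanically from the tracked events. A sketch, with illustrative field names (adapt them to your analytics schema):

```typescript
// Aggregate tracked events into the two health ratios from the text.
// Field names are illustrative; adapt to your analytics schema.
interface AgentMetrics {
  totalRequests: number;
  fallbackActivations: number;
  totalTranscripts: number;
  rejectedTranscripts: number;
}

function healthReport(m: AgentMetrics): string[] {
  const warnings: string[] = [];
  if (m.fallbackActivations / m.totalRequests > 0.05) {
    warnings.push(
      'Fallback rate above 5%: primary provider reliability problem.'
    );
  }
  if (m.rejectedTranscripts / m.totalTranscripts > 0.2) {
    warnings.push(
      'Transcript rejection above 20%: users may need better noise cancellation.'
    );
  }
  return warnings;
}
```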
Add error handling to the voice agent from Lesson 6:
- Add a generateWithFallback that tries your primary LLM, falls back to a secondary, and returns a static message as the last resort.
- Add the shouldIgnoreTranscript filter to your agent's onUserTurnCompleted method. Test it by speaking gibberish into the microphone.
- Wire the AgentSessionEventTypes.Error event to your analytics or logging system.

You can build agents that work and recover when they fail. The final question is: how do you know if they are actually good? Next, we close the course with Measuring Conversational Quality -- defining the metrics that tell you whether your agent is earning its keep.