A voice agent was handling a customer inquiry about their account balance. Mid-sentence, the LLM provider returned a 503. The agent's error handler logged the error and... did nothing. The user heard silence. Five seconds of silence. Then the WebRTC connection timed out. The user hung up and called back, got a different agent instance with no memory of the previous conversation, and had to start over. One provider blip turned into two failed conversations and a frustrated user.
The error was unavoidable. The experience was not. If the agent had said "I am having trouble connecting right now -- give me one moment" while retrying with a fallback provider, the user would have waited. Silence is the worst possible error message in a conversation. This lesson teaches you how to never deliver it.
Conversational agents have failure modes that traditional software does not. Understanding the categories is the first step to handling them.
Your LLM, STT, or TTS service goes down or times out.
Symptoms: Empty responses, timeouts, HTTP 5xx errors, streaming interruptions.
Handling pattern: Retry with backoff, then fall back to a secondary provider.
```typescript
import { streamText, type ModelMessage } from 'ai';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';

// AgentError is this app's own error class: it carries a user-facing
// message plus retry metadata, so the UI can speak the failure aloud.
async function generateWithFallback(
  messages: ModelMessage[],
  system: string
) {
  try {
    // Primary: Gemini 2.5 Flash
    return await streamText({
      model: google('gemini-2.5-flash'),
      system,
      messages,
    });
  } catch (primaryError) {
    console.error('Primary LLM failed:', primaryError);
    try {
      // Fallback: GPT-4o Mini (different provider entirely)
      return await streamText({
        model: openai('gpt-4o-mini'),
        system,
        messages,
      });
    } catch (fallbackError) {
      console.error('Fallback LLM also failed:', fallbackError);
      // Last resort: static response
      throw new AgentError(
        'I am having trouble connecting to my brain right now. '
          + 'Please try again in a moment.',
        { retryable: true }
      );
    }
  }
}
```
The critical principle: each fallback level degrades capability, not availability. The secondary model may be less capable, but the user still gets a response. The static message is the last resort -- the agent admits failure clearly rather than going silent.
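Note that the example above jumps straight from one failed call to the fallback provider. The "retry with backoff" half of the pattern absorbs transient errors, such as a single 503, before giving up on the primary. A minimal helper sketch (the attempt count and delays are illustrative defaults, not prescribed values):

```typescript
// Retry an async operation with exponential backoff.
// maxAttempts and baseDelayMs are illustrative, not prescribed values.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential delay: 250 ms, 500 ms, 1000 ms, ...
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** attempt)
        );
      }
    }
  }
  throw lastError;
}
```

Wrapping each provider call, e.g. withBackoff(() => streamText({ ... })), means the secondary provider is only consulted once the primary has genuinely failed rather than merely hiccuped.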
In voice mode, STT can misinterpret speech -- producing garbled text, non-English characters when English is expected, or empty transcripts.
Handling pattern: Validate transcripts before processing.
```typescript
function shouldIgnoreTranscript(text: string): boolean {
  const trimmed = text.trim();
  if (!trimmed) return true;

  // Count meaningful characters
  const alphaNumCount = (trimmed.match(/[A-Za-z0-9]/g) || []).length;
  const nonAsciiCount = (trimmed.match(/[^\x00-\x7F]/g) || []).length;

  // Too short -- likely noise
  if (alphaNumCount < 2) return true;

  // Non-English when we expect English
  if (nonAsciiCount > 0 && !/[A-Za-z]/.test(trimmed)) return true;

  return false;
}
```
This filter runs on every user turn. When a transcript is rejected, the agent responds with a clarification rather than attempting to answer gibberish:
```typescript
async onUserTurnCompleted(ctx, msg) {
  const text = msg.textContent ?? '';
  if (shouldIgnoreTranscript(text)) {
    await this.session.generateReply({
      instructions: 'Let the user know you could not understand them '
        + 'and ask them to repeat their question.',
      allowInterruptions: true,
    });
    throw new voice.StopResponse();
  }
  await super.onUserTurnCompleted(ctx, msg);
}
```
The StopResponse exception is conversation-specific error handling. It does not crash the agent -- it tells the pipeline "do not process this turn further." The conversation continues. This is what "conversations are state machines" means in practice: invalid input transitions to a recovery state, not an error state.
Users hit rate limits. This is intentional -- you want to control costs. But the experience of hitting a limit should not feel punitive.
Handling pattern: Transparent, progressive disclosure.
```tsx
// Show remaining questions proactively
{rateLimitInfo && (
  <div aria-label={
    `${rateLimitInfo.remaining} of ${rateLimitInfo.limit} questions remaining`
  }>
    <span>{rateLimitInfo.remaining}</span>
    <span>/</span>
    <span>{rateLimitInfo.limit}</span>
    <span>questions today</span>
  </div>
)}

// When the limit is hit, explain clearly
{rateLimitError && (
  <div role="alert">
    <h3>Daily Limit Reached</h3>
    <p>
      {rateLimitError.isAuthenticated
        ? 'You have used all your questions for today. Come back tomorrow.'
        : 'Sign in for more daily questions.'}
    </p>
    {!rateLimitError.isAuthenticated && (
      <a href={signInUrl}>Sign In for More</a>
    )}
  </div>
)}
```
The pattern: show the limit before they hit it (remaining counter), explain why when they do (clear message), and offer an action (sign in, upgrade, wait).
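The UI above assumes the server reports remaining and limit alongside each response. A minimal sketch of the server-side counter behind that contract, assuming a fixed daily window and in-memory storage (checkRateLimit and the limit values are hypothetical; production systems typically back this with Redis or a database so counts survive restarts):

```typescript
interface RateLimitInfo {
  remaining: number;
  limit: number;
}

// Illustrative daily limits -- real values depend on your cost model.
const DAILY_LIMIT_ANONYMOUS = 5;
const DAILY_LIMIT_AUTHENTICATED = 50;

// In-memory usage store keyed by user (or IP for anonymous users).
const usage = new Map<string, { count: number; day: string }>();

// Returns the updated counters, or null when the limit is hit
// (the caller then responds with a 429 and a clear message).
function checkRateLimit(
  userKey: string,
  isAuthenticated: boolean
): RateLimitInfo | null {
  const limit = isAuthenticated
    ? DAILY_LIMIT_AUTHENTICATED
    : DAILY_LIMIT_ANONYMOUS;
  const today = new Date().toISOString().slice(0, 10); // fixed daily window
  const entry = usage.get(userKey);
  const count = entry && entry.day === today ? entry.count : 0;
  if (count >= limit) return null;
  usage.set(userKey, { count: count + 1, day: today });
  return { remaining: limit - (count + 1), limit };
}
```

Returning the counters on every successful request is what lets the client render the proactive "questions today" counter instead of surprising the user at the limit.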
WebRTC connections drop. SSE streams disconnect. The user's network changes from WiFi to cellular.
Handling pattern: Detect, inform, reconnect.
```tsx
// Voice: monitor connection state
const connectionState = useConnectionState();

// Show status to user
{connectionState !== ConnectionState.Connected && (
  <span role="status" aria-live="polite">
    Reconnecting...
  </span>
)}

// Chat: handle streaming errors
const { status } = useChat({
  transport,
  onError: (error) => {
    const message = error instanceof Error ? error.message : String(error);
    if (message.includes('rate_limit') || message.includes('429')) {
      setRateLimitError({
        message: 'Daily limit reached.',
        isAuthenticated: false,
      });
    } else {
      setGenericError('Something went wrong. Please try again.');
    }
  },
});
```
The agent confidently answers a question it should not. This is the hardest failure to handle because the agent does not know it is wrong.
Handling pattern: Guardrails at the system prompt level, plus RAG grounding.
```typescript
const systemPrompt = buildSystemPrompt(ragContext);
// The prompt includes:
// "If the retrieved context does not contain relevant information,
//  say 'I do not have that information' rather than guessing."
// "Do not answer questions about topics outside your expertise."
// "If unsure, ask the user to clarify."
```
RAG-grounded agents hallucinate less because they answer from retrieved documents, not parametric memory. But "less" is not "never." The system prompt is the last line of defense.
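As a sketch of what such a buildSystemPrompt might assemble (the structure and exact wording here are illustrative, not the course's actual prompt):

```typescript
// Illustrative sketch: embed retrieved context plus the anti-hallucination
// guardrail clauses described above into a single system prompt.
function buildSystemPrompt(ragContext: string): string {
  return [
    'You are a support agent. Answer ONLY from the context below.',
    '',
    'Context:',
    ragContext,
    '',
    'Rules:',
    '- If the retrieved context does not contain relevant information, say',
    "  'I do not have that information' rather than guessing.",
    '- Do not answer questions about topics outside your expertise.',
    '- If unsure, ask the user to clarify.',
  ].join('\n');
}
```

Keeping the guardrails in one builder function means every entry point (voice and chat) ships the same last line of defense.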
Think of error handling as a stack, where each layer catches what the previous layer missed:
```text
+-------------------------------+
| Layer 5: User-facing message  |  "I am having trouble. Try again."
+-------------------------------+
| Layer 4: Fallback provider    |  Switch from Gemini to GPT-4o Mini
+-------------------------------+
| Layer 3: Retry with backoff   |  3 attempts, exponential delay
+-------------------------------+
| Layer 2: Input validation     |  Reject bad transcripts, sanitize
+-------------------------------+
| Layer 1: Circuit breaker      |  Stop calling a failing service
+-------------------------------+
```
Each layer reduces the blast radius. If the circuit breaker is open, you skip retries and go straight to the fallback. If the fallback also fails, you give the user a clear, honest message. The user never hears silence.
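Layer 1 is the only layer the earlier examples did not show in code. A minimal circuit breaker sketch (the failure threshold and cool-down values are illustrative): after enough consecutive failures the circuit opens and calls fail fast, skipping retries entirely, until a cool-down elapses and a trial call is allowed through.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures the
// circuit opens and calls are rejected immediately until `cooldownMs`
// has passed, at which point one trial call is let through.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 3,       // illustrative: failures before opening
    private cooldownMs = 30_000  // illustrative: how long to stay open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Open: fail fast so the caller goes straight to the fallback.
        throw new Error('circuit open');
      }
      // Cool-down elapsed: allow a trial call through.
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrapping the primary provider call in breaker.call(...) is what lets Layer 1 short-circuit Layers 3 and 4: while the circuit is open, the agent spends no time retrying a service that is known to be down.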
Error messages in conversational AI are part of the conversation. They must speak in the agent's voice, use plain language instead of internal jargon, and give the user something to do next:
Bad: "Error 500: Internal Server Error"
Good: "I ran into a problem trying to find that information. Could you ask that in a different way, or try again in a moment?"

Bad: "STT confidence below threshold"
Good: "I did not catch that clearly. Could you repeat what you said?"
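One way to enforce this consistently is to translate internal error categories into conversational phrasing in a single place in the pipeline. A sketch, with hypothetical category names and messages drawn from the examples above:

```typescript
// Map internal error categories to conversational, actionable messages.
// The category names are illustrative, not from any specific library.
type ErrorCategory =
  | 'provider_down'
  | 'low_transcript_confidence'
  | 'retrieval_failed'
  | 'unknown';

const USER_FACING_MESSAGES: Record<ErrorCategory, string> = {
  provider_down:
    'I am having trouble connecting right now. Give me one moment and try again.',
  low_transcript_confidence:
    'I did not catch that clearly. Could you repeat what you said?',
  retrieval_failed:
    'I ran into a problem trying to find that information. '
    + 'Could you ask that in a different way, or try again in a moment?',
  unknown:
    'Something went wrong on my end. Please try again in a moment.',
};

function toConversationalMessage(category: ErrorCategory): string {
  return USER_FACING_MESSAGES[category] ?? USER_FACING_MESSAGES.unknown;
}
```

Centralizing the mapping keeps stack-trace language out of the conversation even when a new error path is added later.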
You cannot fix what you cannot see. Instrument your agent to track errors, fallback activations, and transcript rejections:
```typescript
// Forward agent errors to your analytics pipeline
// (trackEvent is your app's own analytics helper).
session.on(voice.AgentSessionEventTypes.Error, (ev) => {
  trackEvent('AgentError', {
    errorType: ev.error.name,
    errorMessage: ev.error.message,
    sessionId: room.name,
  });
});
```
If your fallback activation rate is above 5%, your primary provider has a reliability problem. If your transcript rejection rate is above 20%, your users are in noisy environments and you need better noise cancellation.
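Those two thresholds can be checked mechanically from the tracked events. A sketch, with illustrative field names (adapt them to your analytics schema):

```typescript
// Aggregate tracked events into the two health ratios from the text.
// Field names are illustrative; adapt to your analytics schema.
interface AgentMetrics {
  totalRequests: number;
  fallbackActivations: number;
  totalTranscripts: number;
  rejectedTranscripts: number;
}

function healthReport(m: AgentMetrics): string[] {
  const warnings: string[] = [];
  if (m.fallbackActivations / m.totalRequests > 0.05) {
    warnings.push(
      'Fallback rate above 5%: primary provider reliability problem.'
    );
  }
  if (m.rejectedTranscripts / m.totalTranscripts > 0.2) {
    warnings.push(
      'Transcript rejection above 20%: users may need better noise cancellation.'
    );
  }
  return warnings;
}
```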
Add error handling to the voice agent from Lesson 6:
- Add a generateWithFallback that tries your primary LLM, falls back to a secondary, and returns a static message as the last resort.
- Add the shouldIgnoreTranscript filter to your agent's onUserTurnCompleted method. Test it by speaking gibberish into the microphone.
- Wire the AgentSessionEventTypes.Error event to your analytics or logging system.

You can build agents that work and recover when they fail. The final question is: how do you know if they are actually good? Next, we close the course with Measuring Conversational Quality -- defining the metrics that tell you whether your agent is earning its keep.