Glossary / Industry Applications

Voice AI

What vendors mean: AI you can talk to. What it actually means: a stack of three technologies that finally got good enough to put in front of customers in 2025-2026 — and is still failing in predictable ways.

The Technical Definition

Voice AI is a system that conducts a spoken conversation with a human. The standard architecture stitches together three components: speech-to-text (the AI hears the caller), a large language model (the AI decides what to say), and text-to-speech (the AI speaks back). Modern voice AI also includes interruption handling, turn-taking logic, latency management, and tool-calling for things like looking up account information or booking an appointment. Vendors include Sierra, Retell, Vapi, Bland, Deepgram Aura, Air, and a fast-growing number of vertical specialists.
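The three-component loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: VoiceAgent, handle_turn, and the three fake_* stubs are hypothetical names, and the stubs stand in for real speech-to-text, LLM, and text-to-speech services. Interruption handling, turn-taking, and latency management are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VoiceAgent:
    # All three callables are hypothetical stand-ins, not a real vendor SDK.
    transcribe: Callable[[bytes], str]            # speech-to-text: audio -> text
    decide: Callable[[List[Dict[str, str]]], str] # LLM: conversation history -> reply
    synthesize: Callable[[str], bytes]            # text-to-speech: text -> audio
    history: List[Dict[str, str]] = field(default_factory=list)

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Run one caller turn through hear -> decide -> speak."""
        text = self.transcribe(caller_audio)
        self.history.append({"role": "caller", "text": text})
        reply = self.decide(self.history)
        self.history.append({"role": "agent", "text": reply})
        return self.synthesize(reply)


# Toy stubs so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Dict[str, str]]) -> str:
    last = history[-1]["text"].lower()
    if "appointment" in last:
        return "Sure, what day works for you?"
    return "Let me transfer you to a colleague."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_stt, fake_llm, fake_tts)
reply_audio = agent.handle_turn(b"I need to book an appointment")
print(reply_audio.decode("utf-8"))
```

Keeping the three stages behind plain callables is one possible design choice: it lets each component be swapped or tested on its own, which matters when the underlying model changes.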

What This Actually Means for Your Business

The 2025-2026 wave of voice AI is a different category from the IVR systems and chatbot-on-a-phone-call attempts of the 2010s. Those systems failed because the underlying technologies failed: speech-to-text (STT) got names wrong, the dialog logic was a brittle decision tree, and text-to-speech (TTS) sounded like a robot reading aloud. All three components got dramatically better between 2022 and 2026, and the result is voice agents that handle real conversations with real customers without the apology that used to be required.

The applications that are working at scale: appointment scheduling, customer service triage, lead qualification, after-hours support, billing inquiries, password resets, and outbound notification calls. The pattern across the wins: high-volume, narrow-scope, repeatable conversations where the failure case is “transfer to a human” and the cost of the failure is acceptable.

The applications that are failing: anything emotionally complex (cancellations, complaints, condolences), anything where the customer’s situation is unusual enough that the agent has to improvise, and anything where the cost of a wrong answer is high (medical advice, legal counsel, large financial decisions). Voice AI handles structured conversations well; it handles human messiness badly. CEOs who try to deploy voice AI on cancellation calls usually end up with churn rates that look great because the customers can’t get through to cancel.

The economics are increasingly compelling. A reasonable voice AI deployment costs $0.05-$0.30 per minute fully loaded, versus $1.00-$2.50 per minute for a human agent in a US/EU call center. The math works at volumes above roughly 100K calls per month and on use cases where average handle time is 2-4 minutes. Below those thresholds, the integration work eats the savings. Above them, the savings are real and the customer experience is often better than what mid-tier offshore call centers were producing.
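To make the arithmetic concrete, here is a rough sketch that plugs in the per-minute ranges quoted above. The function name monthly_savings and the fixed_monthly_cost parameter are assumptions made for illustration, and the $10,000 overhead in the last example is a made-up figure. The sketch also assumes every minute moves from a human to the bot, so it is an upper bound that ignores escalations.

```python
def monthly_savings(
    calls_per_month: int,
    avg_handle_minutes: float,
    human_cost_per_min: float,
    ai_cost_per_min: float,
    fixed_monthly_cost: float = 0.0,  # hypothetical integration/maintenance overhead
) -> float:
    """Upper-bound savings if every call minute moves from a human to the bot."""
    minutes = calls_per_month * avg_handle_minutes
    return minutes * (human_cost_per_min - ai_cost_per_min) - fixed_monthly_cost


# Least favorable ends of the quoted ranges: cheap humans, expensive bot, short calls.
conservative = monthly_savings(100_000, 2, human_cost_per_min=1.00, ai_cost_per_min=0.30)

# Most favorable ends: expensive humans, cheap bot, longer calls.
optimistic = monthly_savings(100_000, 4, human_cost_per_min=2.50, ai_cost_per_min=0.05)

# A low-volume shop with a hypothetical $10,000/month of overhead.
small_shop = monthly_savings(5_000, 2, 1.00, 0.30, fixed_monthly_cost=10_000)

print(f"{conservative:,.0f}  {optimistic:,.0f}  {small_shop:,.0f}")
```

At 100K calls a month the quoted ranges work out to roughly $140,000 to $980,000 in monthly savings before overhead. At 5,000 calls a month, the same per-minute gap does not cover the assumed overhead, which is the "integration work eats the savings" case.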

The hidden cost is conversation design. The companies getting voice AI right have invested significantly in writing dialog flows, handling interruptions, managing escalations, and building the test suites that catch regressions when the underlying LLM gets updated. Treating voice AI as a “drop in and watch it work” deployment is the most reliable way to end up in the AI Failure Museum.

Reality Check

What the vendor says: “Our voice AI handles 80% of your calls without a human.”

What that means in practice: It handles 80% of a specific subset of calls — usually account lookups, appointment scheduling, and FAQ-style inquiries — that the vendor scoped during the pilot. The other 20% of those calls escalate. The calls outside the scoped subset (the irate customer, the complex billing dispute, the multi-issue caller) still go to a human. Containment rate on the actual call mix is usually 40-65%, not 80%.
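The gap between the two numbers is simple multiplication, sketched below. The in-scope shares of 50% and 80% are illustrative assumptions, not figures from any vendor, and the sketch assumes out-of-scope calls are never contained.

```python
def blended_containment(in_scope_share: float, in_scope_containment: float) -> float:
    """Containment on the full call mix, assuming out-of-scope calls all reach a human."""
    return in_scope_share * in_scope_containment


# The vendor's 80% applies only to the scoped subset of calls.
vendor_claim = 0.80

# Hypothetical: the scoped subset is half of your real call volume.
low = blended_containment(0.50, vendor_claim)

# Hypothetical: the scoped subset is four fifths of your real call volume.
high = blended_containment(0.80, vendor_claim)

print(f"{low:.0%} to {high:.0%}")
```

Under these assumptions the blended rate lands between about 40% and 64%, which is how an honest 80% on the pilot scope turns into the 40-65% range on the actual call mix.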

What Operators Actually Do

Companies deploying voice AI well start with one well-defined call type, instrument it heavily, and only expand once the metrics are stable. They define containment (how often the bot resolves without a human), CSAT (whether the customer was satisfied), and AHT (average handle time) up front, and they measure the bot against the human baseline on the same metrics, not against itself.
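A minimal sketch of that comparison follows, assuming call records are available as simple rows. The Call fields and the summarize function are hypothetical names, and the sample data is invented. For the human channel, "escalated" is assumed to mean the call was transferred to another tier.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Call:
    handled_by: str        # "bot" or "human"
    escalated: bool        # bot handed off (or human transferred to another tier)
    csat: Optional[int]    # 1-5 survey score, None if the caller skipped the survey
    handle_seconds: int


def summarize(calls: List[Call], channel: str) -> Dict[str, float]:
    """Containment, CSAT, and AHT for one channel, all computed the same way."""
    subset = [c for c in calls if c.handled_by == channel]
    scores = [c.csat for c in subset if c.csat is not None]
    return {
        "containment": sum(not c.escalated for c in subset) / len(subset),
        "csat": mean(scores),
        "aht_seconds": mean(c.handle_seconds for c in subset),
    }


# Invented sample data for illustration only.
calls = [
    Call("bot", False, 5, 120),
    Call("bot", False, 4, 180),
    Call("bot", True, 2, 300),
    Call("bot", False, None, 120),
    Call("human", False, 4, 240),
    Call("human", False, 5, 360),
]

bot = summarize(calls, "bot")
human = summarize(calls, "human")
print(bot)
print(human)
```

Running both channels through the same function is the point: the bot is judged on the human baseline's definitions, not on a metric redefined to flatter it.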

They also pay attention to escalation paths. The bot needs to recognize when it’s failing and hand off to a human cleanly, with full context, before the customer gets frustrated. The companies that get this wrong have customers screaming “AGENT” at the bot for thirty seconds before transfer. The companies that get this right have a graceful escalation in under five seconds with a warm handoff that includes the conversation transcript.
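One way to picture an escalation trigger with a warm handoff is the sketch below. Everything in it is an assumption made for illustration: the trigger words, the two-failed-turns threshold, and the names should_escalate and build_handoff. A production system would likely use richer signals such as sentiment or intent confidence.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ESCALATION_WORDS = {"agent", "human", "representative"}  # hypothetical trigger list
MAX_FAILED_TURNS = 2                                      # hypothetical threshold


@dataclass
class Conversation:
    transcript: List[Tuple[str, str]] = field(default_factory=list)
    failed_turns: int = 0


def should_escalate(convo: Conversation, caller_text: str, understood: bool) -> bool:
    """Record the turn, then escalate on an explicit request or repeated failures."""
    convo.transcript.append(("caller", caller_text))
    if not understood:
        convo.failed_turns += 1
    words = set(re.findall(r"[a-z']+", caller_text.lower()))
    asked_for_human = bool(words & ESCALATION_WORDS)
    return asked_for_human or convo.failed_turns >= MAX_FAILED_TURNS


def build_handoff(convo: Conversation, reason: str) -> Dict[str, object]:
    """Package the context a human agent needs so the caller does not repeat themselves."""
    return {
        "reason": reason,
        "failed_turns": convo.failed_turns,
        "transcript": list(convo.transcript),
    }


convo = Conversation()
first = should_escalate(convo, "I have a question about my bill", understood=True)
second = should_escalate(convo, "AGENT!", understood=True)
handoff = build_handoff(convo, "caller asked for a human")
print(first, second, len(handoff["transcript"]))
```

The check runs on every turn, so a caller who says "agent" once is transferred immediately, and the handoff payload carries the transcript along with the reason.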

Conversation design has emerged as a real discipline. The best voice AI deployments have a dedicated conversation designer working alongside the engineering team. They write dialog patterns, define escalation triggers, build glossaries, and own the test suite. This role didn’t exist in most companies two years ago. Companies that try to assign it to a chatbot product manager as a side project usually underestimate the complexity by an order of magnitude.

The Questions to Ask

  1. What’s the containment rate on call types like ours, measured by an independent reviewer? Vendor metrics on demo calls are not the same as production metrics on your actual call mix. Ask for a third-party audited containment number, or run a pilot where you measure it yourself.

  2. What does the escalation path look like, and how do humans receive context? A clean escalation with full conversation context preserves the customer relationship. A cold transfer that makes the customer repeat themselves destroys it. Test the handoff before you sign.

  3. Who owns conversation design, and who maintains it as the model changes? LLM updates change voice agent behavior in subtle ways. Without an owner who tests regressions before each model change, voice AI quality degrades over time without anyone noticing until customers complain.
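The regression testing mentioned in the third question can be as simple as a table of scripted caller lines with expected behavior, replayed against the current agent and the candidate before each model change. The sketch below is illustrative only: the case format, the agent return shape, and both stub agents are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical case format: what the caller says and what must hold in the response.
REGRESSION_CASES = [
    {"caller": "I'd like to book an appointment", "must_include": "appointment", "must_escalate": False},
    {"caller": "I want to cancel my account", "must_include": None, "must_escalate": True},
]


def run_regression(agent: Callable[[str], Dict], cases: List[Dict]) -> List[str]:
    """Return a description of every case the agent fails; an empty list means a clean run."""
    failures = []
    for case in cases:
        result = agent(case["caller"])  # assumed shape: {"reply": str, "escalate": bool}
        if result["escalate"] != case["must_escalate"]:
            failures.append(f"escalation mismatch: {case['caller']!r}")
        elif case["must_include"] and case["must_include"] not in result["reply"].lower():
            failures.append(f"missing phrase: {case['caller']!r}")
    return failures


# Stub agents standing in for the same voice agent on two model versions.
def current_agent(text: str) -> Dict:
    t = text.lower()
    if "cancel" in t:
        return {"reply": "Let me connect you with a colleague.", "escalate": True}
    if "appointment" in t:
        return {"reply": "Sure, which day suits you for the appointment?", "escalate": False}
    return {"reply": "Could you say that again?", "escalate": False}


def candidate_agent(text: str) -> Dict:
    t = text.lower()
    if "appointment" in t:
        return {"reply": "Sure, which day suits you for the appointment?", "escalate": False}
    # Regression: the candidate no longer escalates cancellation requests.
    return {"reply": "I can help with that.", "escalate": False}


current_failures = run_regression(current_agent, REGRESSION_CASES)
candidate_failures = run_regression(candidate_agent, REGRESSION_CASES)
print(current_failures)
print(candidate_failures)
```

In this toy run the current agent passes both cases and the candidate fails the cancellation case, which is the kind of quiet behavior change an owner needs to catch before customers do.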
