AI Voice Agent Glossary
Plain-English definitions of every AI voice agent term you need — 28 entries across the voice pipeline, telephony, compliance, architecture, and pricing. Updated for 2026.
Voice pipeline
STT (Speech-to-Text)
Converts caller audio into text in real time.
Also called automatic speech recognition (ASR). Modern streaming STT services like Deepgram, Google Speech-to-Text, and AssemblyAI return partial transcripts every 100–200 ms while the caller is still speaking. Quality differentiators include accent handling, noise robustness, and end-of-utterance detection.
TTS (Text-to-Speech)
Converts the AI agent’s response text into spoken audio.
Modern providers (ElevenLabs, OpenAI TTS, Cartesia, Play.ht) produce natural prosody, pauses, and inflection. Streaming TTS starts speaking the first sentence while later text is still generating, shaving 200–400 ms off perceived latency.
LLM (Large Language Model)
The brain — interprets intent and generates the agent’s reply.
In voice agents the LLM also calls tools (calendar, CRM, knowledge base) when the caller’s request requires real-world action. Common choices: Claude Sonnet/Haiku, GPT-4o family, Gemini Flash, and specialty voice-tuned LLMs.
VAD (Voice Activity Detection)
Detects when the caller is speaking vs silent.
Silero VAD is a common open-source choice. Good VAD is what lets the agent know when to listen, when to interrupt, and when the caller has finished an utterance.
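To make the idea concrete, here is a toy energy-based detector. Production models like Silero VAD are neural networks and far more robust; the frame size and threshold below are illustrative values, not tuned constants.

```python
def is_speech(frame, threshold=500.0):
    """Classify one audio frame (a list of 16-bit PCM samples) by RMS energy.
    Threshold is illustrative; real VADs learn this decision from data."""
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
    return rms > threshold

silence = [0] * 160            # 10 ms of silence at 16 kHz
speech = [4000, -4000] * 80    # loud alternating samples

print(is_speech(silence))  # False
print(is_speech(speech))   # True
```

A pure energy gate fails on background noise and quiet speakers, which is exactly why learned VADs dominate in production.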
Barge-in
The caller talking over the agent and getting heard.
Barge-in support means callers can interrupt the AI mid-sentence and the agent will stop speaking, listen, and respond. Without it, calls feel rigid and impatient callers hang up.
End-of-utterance detection
Knowing when the caller actually stopped talking.
Cut someone off after a 200 ms pause and you sound rude; wait 1.5 seconds and you sound dead. Tuning end-of-utterance is one of the harder voice-AI engineering problems and a major differentiator between platforms.
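The simplest version of this tuning problem is a silence timer over VAD output. A minimal sketch, with an assumed 20 ms frame size and an illustrative 600 ms cutoff chosen between the "rude" and "dead" extremes above:

```python
FRAME_MS = 20
EOU_SILENCE_MS = 600  # illustrative tuning knob, between 200 ms and 1.5 s

def find_end_of_utterance(vad_frames):
    """Return the index of the frame where the utterance ended, or None.
    vad_frames is a sequence of booleans (True = speech detected)."""
    needed = EOU_SILENCE_MS // FRAME_MS
    silent_run = 0
    heard_speech = False
    for i, speech in enumerate(vad_frames):
        if speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                # The utterance ended where the silence run began.
                return i - needed + 1
    return None

# 200 ms of speech followed by 800 ms of silence: utterance ends at frame 10.
print(find_end_of_utterance([True] * 10 + [False] * 40))  # 10
```

Real systems layer semantics on top of this timer (is the sentence grammatically complete? did the caller trail off mid-thought?), which is where the hard engineering lives.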
Round-trip latency
Time from caller stops speaking to AI starts replying.
A well-tuned 2026 stack runs 500–900 ms total: STT ~100–250 ms + LLM 200–500 ms + TTS first audio 100–250 ms. Sub-800 ms feels human. 1.5+ seconds feels broken.
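A quick sanity check on those thresholds, as a sketch (the classification bands come from the sentence above; the sample stage timings are illustrative):

```python
def feels_human(stt_ms, llm_ms, tts_ms):
    """Sum stage latencies and classify against the sub-800 ms / 1.5 s bands."""
    total = stt_ms + llm_ms + tts_ms
    if total < 800:
        return total, "human"
    if total < 1500:
        return total, "acceptable"
    return total, "broken"

print(feels_human(150, 300, 150))  # (600, 'human')
print(feels_human(250, 900, 400))  # (1550, 'broken')
```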
Streaming pipeline
Each stage starts before the previous one finishes.
STT emits partial transcripts; the LLM starts generating before STT finalizes; TTS starts speaking the first sentence while the LLM continues. Streaming is the difference between a usable voice agent and a chatbot bolted onto a phone line.
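The overlap can be sketched with async generators, where each stage consumes its upstream as items arrive instead of waiting for the full result. The stage bodies here are toy stand-ins, not real STT/LLM/TTS calls:

```python
import asyncio

async def stt(audio_chunks):
    for chunk in audio_chunks:            # emit partial transcripts
        yield f"word{chunk}"

async def llm(transcript_stream):
    async for word in transcript_stream:  # start generating before STT finishes
        yield word.upper()

async def tts(token_stream):
    out = []
    async for token in token_stream:      # "speak" each token as it lands
        out.append(f"audio({token})")
    return out

async def main():
    return await tts(llm(stt([1, 2, 3])))

print(asyncio.run(main()))  # ['audio(WORD1)', 'audio(WORD2)', 'audio(WORD3)']
```

The key property: the first audio frame can play while the last transcript chunk is still in flight, which is where the latency savings come from.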
Telephony
IVR (Interactive Voice Response)
"Press 1 for sales" — fixed-tree phone menus.
Pre-AI phone-routing systems. Callers navigate by DTMF (touch-tone) or fixed-keyword speech. Hang-up rates are notoriously high. See our deeper take in /blog/ivr-vs-ai-voice-agent.
DTMF
The touch-tone key presses callers make.
Stands for Dual-Tone Multi-Frequency. AI voice agents typically don’t require DTMF (callers just speak), but DTMF capture is still useful for sensitive inputs like SSN digits or credit card numbers where typing is preferred.
SIP (Session Initiation Protocol)
The protocol that connects modern phone calls over IP.
Most AI voice agent platforms use SIP trunks via providers like Twilio, Telnyx, or Vonage. SIP carries the audio between the caller, the platform, and any human transfer endpoints.
Conditional forwarding
Send calls to AI only when you don’t answer.
Set on your existing business line: ring 4 times, then forward to the AI agent’s number. Lower-risk way to deploy AI without changing your published number.
Warm transfer
AI hands a live call to a human with full context.
Better than a cold transfer because the human picks up already knowing who’s calling, what they need, and what the AI has already collected. The caller doesn’t have to repeat themselves.
Compliance & legal
BAA (Business Associate Agreement)
The contract a HIPAA-covered business signs with a vendor.
Required by HIPAA whenever a third party handles protected health information (PHI) on your behalf. Reputable healthcare-focused voice AI platforms will sign one. See HHS BAA standard provisions for what must be included.
TCPA (Telephone Consumer Protection Act)
US law governing automated outbound calls and texts.
Outbound calls to consumers — including AI-driven ones — generally require prior express consent. The FCC has issued specific rulings on AI-generated voice calls. Reputable platforms help with consent tracking, identification, opt-outs, and DNC compliance.
PHI (Protected Health Information)
Any health data tied to an individual.
In US healthcare, PHI must be protected per HIPAA. For voice AI, this means encrypted call recording storage, configurable retention, role-based access, audit logs, and a signed BAA.
ABA Model Rule 1.18
Confidentiality owed to prospective clients.
Lawyers owe a baseline confidentiality duty even to people who only inquire about representation. AI legal intake systems must treat every caller’s inputs as protected and design conflict checks accordingly.
AI disclosure laws
Some states require telling callers they’re talking to AI.
California SB 1001 and similar state laws require AI agents to identify themselves on commercial calls. Best practice: identify as AI in your opening greeting on every call regardless of jurisdiction.
TRS (Telecommunications Relay Services)
Federal accessibility requirements for phone systems.
FCC TRS rules require accessibility for callers with hearing or speech disabilities. AI voice systems should support TTY relay, real-time text, and human escalation paths.
Architecture
Knowledge base
The body of business info the AI is allowed to draw on.
Hours, services, pricing, policies, FAQs, escalation rules, accepted insurance — everything the AI uses to answer caller questions. Quality of the knowledge base is usually the biggest determinant of how good the agent feels.
Tool call (function call)
When the LLM triggers a real-world action.
Booking a calendar slot, looking up a customer in a CRM, sending an SMS confirmation, transferring to a human. Tool calls are what separate AI voice agents from chatbots that just chat.
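A hedged sketch of the loop: the LLM emits a structured call, the agent runtime executes it, and the result goes back to the model. The tool name, arguments, and JSON shape below are illustrative; every provider formats tool calls slightly differently.

```python
import json

def book_appointment(name, slot):
    """Illustrative tool: in production this would hit a calendar API."""
    return {"status": "booked", "name": name, "slot": slot}

TOOLS = {"book_appointment": book_appointment}

# What a model's tool-call output might look like (shape varies by provider):
llm_output = json.dumps({
    "tool": "book_appointment",
    "arguments": {"name": "Dana", "slot": "2026-03-04T10:00"},
})

call = json.loads(llm_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result["status"])  # booked
```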
Guardrails
Hard constraints the AI is not allowed to violate.
Configured rules like "never give medical advice," "always escalate emergencies to a human," "never quote prices outside the published range." Good agents are honest about what they can’t do and route accordingly.
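A toy guardrail pass over a drafted reply, checking rules like the ones above before any audio is spoken. The emergency keywords, price band, and fallback phrasings are all illustrative:

```python
import re

PRICE_RANGE = (79, 199)  # illustrative published price band, in dollars
EMERGENCY_WORDS = {"chest pain", "fire", "bleeding"}

def apply_guardrails(caller_text, draft_reply):
    """Return the draft reply, or a safe replacement if a rule fires."""
    if any(w in caller_text.lower() for w in EMERGENCY_WORDS):
        return "Escalating you to a human right away."
    for price in re.findall(r"\$(\d+)", draft_reply):
        if not PRICE_RANGE[0] <= int(price) <= PRICE_RANGE[1]:
            return "Pricing depends on the job; let me connect you with the office."
    return draft_reply

print(apply_guardrails("I have chest pain", "Our hours are 9-5."))
print(apply_guardrails("How much?", "It costs $500."))
print(apply_guardrails("How much?", "It costs $99."))
```

Note the ordering: deterministic checks run after the LLM drafts a reply, so a rule firing always wins over whatever the model wanted to say.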
No-code platform
Set up AI agents without writing code.
Configuration via plain English, drag-and-drop flow builders, and integrations connected through OAuth. JagCall, Goodcall, and Synthflow are no-code-leaning. Bland and Vapi are API-first (developer required).
API-first platform
Build the agent with code (engineers required).
You wire STT/LLM/TTS yourself, define logic in code, deploy on the platform’s runtime. Maximum flexibility, minimum onboarding speed. Best when you have engineers and a custom workflow.
After-hours coverage
AI handles calls when humans are unavailable.
Most service businesses get 25–40% of calls outside 9–5. AI handling those calls — even just for booking and overflow — typically pays for itself in the first month. See /use-cases/after-hours-support.
Pricing
Per-minute pricing
Charged for each minute of call time.
Common for API-first platforms: $0.07–$0.20 per minute plus telephony costs. Predictable for high-volume operations; can surprise you on a busy month.
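A back-of-envelope monthly estimate at those rates. The call volume and per-minute prices below are illustrative, picked from inside the ranges quoted above:

```python
def monthly_cost(calls, avg_minutes, ai_rate, telephony_rate=0.01):
    """Total monthly spend: call minutes times (AI rate + telephony rate)."""
    minutes = calls * avg_minutes
    return round(minutes * (ai_rate + telephony_rate), 2)

# 600 calls/month, 3 minutes each, at $0.10/min AI plus $0.01/min telephony:
print(monthly_cost(600, 3, 0.10))  # 198.0
```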
Plan-based pricing
Flat monthly fee with included minutes.
Common for SMB-focused no-code platforms: $49–$199/month all-in. Predictable monthly bill; overage rates kick in if you exceed included minutes.
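The overage math, as a sketch with illustrative numbers (base fee, included minutes, and overage rate vary by platform):

```python
def plan_bill(base_fee, included_minutes, used_minutes, overage_rate):
    """Flat monthly fee plus per-minute overage beyond the included bucket."""
    overage = max(0, used_minutes - included_minutes)
    return round(base_fee + overage * overage_rate, 2)

# $99/month plan with 500 included minutes, 650 used, $0.15/min overage:
print(plan_bill(99, 500, 650, 0.15))  # 121.5
```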
Telephony fees
The cost of carrying the actual phone call.
Separate from the AI cost. Inbound DID minutes are typically $0.005–$0.02/min via Twilio or similar. Outbound calls cost more. Some plan-based platforms bundle telephony; some bill it separately.
Want more depth?
Start with the pillar guide, or dig into a specific topic.
Ready to deploy an AI voice agent?
Try JagCall free for 14 days. Most owners are answering live calls within an hour of signup.