
Framework or service? We did both. Here’s when to use which.

Saurabh Sharma

Agent-Stream now has a bridges/ directory. Inside it: a production-grade Python bridge for ElevenLabs, contributed by Jitendra after running it in production with real callers.

This isn’t just an alternative implementation. The diff between the Node.js framework and the Python bridge is the most honest description of what running voice AI at scale actually demands.

What’s new

The feature/elevenlabs-production-bridge branch contains bridges/elevenlabs-production/ — a complete Python service for bridging Exotel telephony to ElevenLabs Conversational AI. It ships with:

  • FastAPI-based WebSocket server
  • Background sound mixing at the PCM level
  • India regional ElevenLabs endpoint support
  • Post-call transfer via Exotel Programmable Connect
  • Per-call transcript logging
  • Deployment configs for AWS ECS Fargate and GCP Cloud Run

The Node.js framework stays in bots/ — seven example bots, multi-provider support, the ExotelWSSServer base class. Both are maintained.

Node.js framework: built for learning and comparison

Seven bots covering different patterns: echo bot (round-trips audio), LLM bot (GPT-based conversation), STT bot, multilingual bot, and others. If you’re learning the Exotel AgentStream WebSocket protocol, comparing how ElevenLabs vs Deepgram fit the same stack, or building something custom — start here.

ExotelWSSServer handles connection lifecycle. Subclass it and override the audio handler:

class MyBot extends ExotelWSSServer {
  async handleAudio(audioChunk, callSid) {
    const response = await yourAIModel.process(audioChunk);
    this.sendAudio(response, callSid);
  }
}

The base class resamples audio explicitly from 8 kHz to 16 kHz, so your handler always receives 16 kHz regardless of what Exotel sends.

WebSocket endpoint: wss://your-server:5001/media

Python bridge: what production required

India regional endpoint

Set ELEVENLABS_REGION=india and the bridge routes to wss://api.in.residency.elevenlabs.io instead of the global endpoint. In testing from Mumbai PSTN infrastructure, this reduced round-trip latency by ~400 ms. For voice AI, 400 ms is the difference between a conversation that sounds natural and one that doesn’t. If your callers are in India, this is the first thing you set.
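To make the routing concrete, here is a minimal sketch of region-based host selection. It is not the bridge's actual code: the function name and the lookup table are hypothetical, the India host is the one quoted above, and the global host is our assumption.

```python
import os

# The India host is the one quoted in this post.
# The global host is an assumption; check your ElevenLabs configuration.
ELEVENLABS_HOSTS = {
    "global": "wss://api.elevenlabs.io",
    "india": "wss://api.in.residency.elevenlabs.io",
}


def elevenlabs_base_url(env=None):
    """Return the WebSocket base URL for the configured region (hypothetical helper)."""
    env = os.environ if env is None else env
    region = env.get("ELEVENLABS_REGION", "global").strip().lower()
    if region not in ELEVENLABS_HOSTS:
        # Failing loudly is a choice made for this sketch, so a typo in the
        # region name does not silently fall back to the slower global route.
        raise ValueError(f"unknown ELEVENLABS_REGION: {region!r}")
    return ELEVENLABS_HOSTS[region]
```

Passing a plain dict instead of os.environ keeps the helper easy to test.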

Background sound mixing

The bridge mixes ambient audio into bot output at the PCM level before it reaches the caller. Prepare your audio:

# Must be 8 kHz, 16-bit, mono
ffmpeg -i your-file.mp3 -ar 8000 -ac 1 -f s16le background.pcm

Set BACKGROUND_SOUND_FILE=background.pcm and the mix is applied to every call automatically.
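If you're wondering what "mixing at the PCM level" means in practice, the sketch below shows one way to do it: add each background sample, scaled down, to the matching voice sample and clip to the 16-bit range. This is an illustration under our own assumptions, not the bridge's implementation. The function name, the gain value, and the looping behaviour are all hypothetical.

```python
import struct


def mix_pcm(voice, background, offset=0, gain=0.2):
    """Mix looping background audio into a voice chunk.

    Both inputs are raw 16-bit signed little-endian mono PCM at the same
    sample rate. `offset` is the sample index into the background where this
    chunk starts. Returns (mixed_bytes, next_offset).
    """
    v = struct.unpack(f"<{len(voice) // 2}h", voice)
    b = struct.unpack(f"<{len(background) // 2}h", background)
    if not b:
        return voice, offset  # nothing to mix in

    mixed = []
    for i, sample in enumerate(v):
        # Wrap around so a short ambient clip loops for the whole call.
        total = sample + int(b[(offset + i) % len(b)] * gain)
        # Clip instead of wrapping, which would produce loud clicks.
        mixed.append(max(-32768, min(32767, total)))

    return struct.pack(f"<{len(mixed)}h", *mixed), (offset + len(v)) % len(b)
```

Carrying next_offset from one chunk to the next keeps the background continuous across chunks rather than restarting it on every frame.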

Post-call transfer

After the bot finishes, the bridge transfers to a human agent via Exotel Programmable Connect:

  1. Bot ends conversation
  2. ElevenLabs fires post-call webhook with outcome
  3. Bridge waits up to 10 seconds to receive the webhook
  4. Bridge calls Exotel Programmable Connect to initiate transfer

In practice, ElevenLabs fires the webhook within 2–3 seconds, well inside the 10-second window. If transfers are failing, check webhook delivery timing in your ElevenLabs dashboard first.

TRANSFER_ENABLED=true
TRANSFER_NUMBER=+91XXXXXXXXXX
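The wait-then-transfer sequence above can be sketched with asyncio. Treat this as a rough model of the flow, not the bridge's code: PostCallWebhooks and finish_call are hypothetical names, the transfer argument stands in for the real Exotel Programmable Connect call, and skipping the transfer on timeout is our assumption.

```python
import asyncio

WEBHOOK_TIMEOUT_S = 10  # the wait window described in this post


class PostCallWebhooks:
    """Pairs incoming post-call webhooks with calls waiting on them (hypothetical)."""

    def __init__(self):
        self._pending = {}

    def _future(self, call_sid):
        if call_sid not in self._pending:
            self._pending[call_sid] = asyncio.get_running_loop().create_future()
        return self._pending[call_sid]

    def deliver(self, call_sid, payload):
        """Call from the webhook handler; works even if it fires before wait()."""
        fut = self._future(call_sid)
        if not fut.done():
            fut.set_result(payload)

    async def wait(self, call_sid, timeout=WEBHOOK_TIMEOUT_S):
        """Return the webhook payload, or None if it did not arrive in time."""
        try:
            return await asyncio.wait_for(self._future(call_sid), timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            self._pending.pop(call_sid, None)


async def finish_call(call_sid, webhooks, transfer, timeout=WEBHOOK_TIMEOUT_S):
    """Wait for the outcome, then hand off. Returns True if a transfer was started."""
    outcome = await webhooks.wait(call_sid, timeout)
    if outcome is None:
        return False  # assumption: no webhook means no transfer
    await transfer(call_sid, outcome)
    return True
```

Keying the pending futures by call SID lets the webhook handler and the call handler run independently, in either order.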

Per-call transcript logging

Every call logs to logs/{call_sid}.json with timestamps and speaker turns. Essential for QA, dispute resolution, and building model evaluation datasets.
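As a rough picture of what a per-call log writer could look like, here is a minimal sketch. The class name and the JSON field names (call_sid, turns, ts, speaker, text) are assumptions for illustration; check the files the bridge actually writes for the real schema.

```python
import json
import time
from pathlib import Path


class TranscriptLog:
    """Collects speaker turns for one call and writes logs/{call_sid}.json."""

    def __init__(self, call_sid, log_dir="logs"):
        self.path = Path(log_dir) / f"{call_sid}.json"
        self.record = {"call_sid": call_sid, "turns": []}

    def add_turn(self, speaker, text, ts=None):
        """Append one turn; `ts` defaults to the current Unix time."""
        self.record["turns"].append(
            {"ts": time.time() if ts is None else ts, "speaker": speaker, "text": text}
        )

    def save(self):
        """Write the log via a temp file so readers never see a partial file."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.parent / (self.path.name + ".tmp")
        tmp.write_text(
            json.dumps(self.record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        tmp.replace(self.path)
        return self.path
```

ensure_ascii=False keeps transcripts in Hindi, Tamil, or other non-Latin scripts readable in the file instead of escaping them.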

Getting started

Node.js:

git clone https://github.com/exotel/Agent-Stream
cd Agent-Stream && npm install
ELEVENLABS_API_KEY=xxx node bots/elevenlabs-bot.js

Python bridge:

git clone -b feature/elevenlabs-production-bridge https://github.com/exotel/Agent-Stream
cd Agent-Stream/bridges/elevenlabs-production
pip install -r requirements.txt
cp .env.example .env
# Set: ELEVENLABS_API_KEY, EXOTEL_API_KEY, EXOTEL_API_TOKEN, EXOTEL_ACCOUNT_SID
ELEVENLABS_REGION=india uvicorn main:app --host 0.0.0.0 --port 8000

WebSocket endpoint: wss://your-server/v1/convai/conversation/exotel

The audio resampling question

Node.js resamples 8 kHz → 16 kHz explicitly. The Python bridge passes 8 kHz through without resampling. Both work in production.

We’ll be straight: we don’t have definitive evidence for which is better at scale. Node.js is defensive — 16 kHz guaranteed regardless of provider config. Python is pragmatic — skip the conversion, let ElevenLabs handle it when agent_output_audio_format is set correctly.

If you hit audio quality issues, check agent_output_audio_format in your ElevenLabs agent config first. A mismatch between that setting and what you’re actually sending is the most common root cause.
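For a sense of what explicit 8 kHz to 16 kHz resampling involves, here is a minimal sketch that doubles the sample rate by linear interpolation. It is not the Node.js framework's resampler: the function name is hypothetical, it assumes 16-bit little-endian mono PCM, and a production resampler would typically add low-pass filtering.

```python
import struct


def upsample_8k_to_16k(pcm):
    """Double the sample rate of 16-bit little-endian mono PCM.

    Each input sample is followed by the midpoint between it and the next
    sample. The final sample is repeated because it has no successor.
    """
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < n else cur
        out.append(cur)
        out.append((cur + nxt) // 2)
    return struct.pack(f"<{len(out)}h", *out)
```

The output is always exactly twice as long as the input, which is the extra per-chunk work the Python bridge avoids by passing 8 kHz straight through.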

When to use which

| Situation | Use |
| --- | --- |
| Learning the Exotel AgentStream WebSocket protocol | Node.js framework |
| Comparing ElevenLabs, Deepgram, OpenAI in one codebase | Node.js framework |
| Shipping ElevenLabs to production | Python bridge |
| Need India-region latency (~400 ms improvement) | Python bridge |
| Need background sound mixing | Python bridge |
| Need post-call transfer to human | Python bridge |
| Deploying to AWS or GCP with provided configs | Python bridge |

AWS deployment note

Don’t use App Runner. App Runner’s Envoy proxy returns HTTP 403 on WebSocket upgrade requests, and nothing in the response says WebSockets are the problem, so it looks like an auth error. Use ECS Fargate. The repo includes a working task-definition.json.

Credit and contributing

Python bridge built by Jitendra, now vendored as bridges/elevenlabs-production/. Send improvements as PRs to the upstream repo — we sync via git subtree. bridges/README.md is the canonical comparison doc.

Repo: Agent-Stream — feature/elevenlabs-production-bridge
