By Shivakumar Ganesan, May 2026
Today, an AI called Joy.
Joy is an investor. He picked up, heard a voice say “Hey Joy, this is Shivku’s AI assistant calling (yes, an actual AI making a phone call on his behalf),” and instead of hanging up, he said: “Yeah, I do. Alright, good to connect.”
Three minutes later, he confirmed he’d attend EngageX. He asked what time the event starts. He asked about other highlights. He asked for directions from Worli to Taj Lands End. He even asked the AI if it could help him make a dinner reservation at the Taj (it couldn’t; it stayed in its lane).
This was not a demo. There was no script. I wasn’t in the room. The AI placed the call, had the conversation, and came back to me with: “Joy said yes.”
Here’s how it works. And how you can build the same thing.
Why I built this
I’ve been using AI as an operating system for the last few months. Email triage, note synthesis, calendar management, chase tracking. It’s transformed how I run the company.
But there was always one gap: the AI could think, but it couldn’t pick up a phone.
Every time I needed to call someone (to check if a deployment was done, confirm a speaker was attending, or follow up on a deadline), I had to break out of the flow, dial, wait, talk, come back, and brief. The assistant was smart but its hands were tied.
So I built /call.
Type /call Joy invite him to EngageX and the AI handles it. Updates the voice agent, places the call, monitors it, fetches the transcript, and reports back in one line. Joy said yes. Or: no answer, want me to retry?
Zero friction. Command to completed call, without touching a phone.
How it actually works
It’s not a chatbot. It’s not a webhook. It’s AI coordinating a real phone call from end to end.
The approach:
Claude Code (orchestrator)
→ ExotelMCP (configure agent, place call, fetch transcript)
→ Exotel Voice Agent (the voice on the call)
→ Exotel Telephony (PSTN: real number, real call)
→ The person picks up
ExotelMCP is the wire. Claude Code uses it to configure an Exotel Voice Agent with the right instruction for each call, then places the call via the Calls API using bidirectional streaming. Leg1 is the person on the PSTN; Leg2 connects directly to the Voice Agent’s WebSocket. No IVR flow. No second phone number. The Voice Agent is the second leg.
Once the call is live, Claude Code steps back. The Voice Agent handles the conversation in real time: speech recognition, inference, voice synthesis, activity detection. When the call ends, Claude Code fetches the transcript and distils it into one answer.
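To make the sequence concrete, here is a minimal Python sketch of the same flow: configure, place, fetch, report. Everything in it is an assumption for illustration. The `VoiceTools` interface and `FakeTools` stand-in are hypothetical and do not call ExotelMCP, and the "distil" step is reduced to picking the person's last line, whereas in the real setup Claude does the summarising.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VoiceTools(Protocol):
    """The four operations the orchestrator needs (hypothetical interface)."""

    def update_greeting(self, bot_id: str, greeting: str) -> None: ...
    def update_instruction(self, bot_id: str, prompt: str) -> None: ...
    def place_call(self, bot_id: str, number: str) -> str: ...
    def get_transcript(self, call_id: str) -> list[str]: ...


@dataclass
class FakeTools:
    """In-memory stand-in so the flow runs without a real account."""

    log: list[str] = field(default_factory=list)

    def update_greeting(self, bot_id: str, greeting: str) -> None:
        self.log.append("greeting")

    def update_instruction(self, bot_id: str, prompt: str) -> None:
        self.log.append("instruction")

    def place_call(self, bot_id: str, number: str) -> str:
        self.log.append("call")
        return "call-1"  # made-up call identifier

    def get_transcript(self, call_id: str) -> list[str]:
        self.log.append("transcript")
        return ["agent: Will you attend EngageX?", "person: Yes, I'll be there."]


def run_call(tools: VoiceTools, bot_id: str, number: str,
             greeting: str, prompt: str) -> str:
    # 1. Configure the agent for this specific call.
    tools.update_greeting(bot_id, greeting)
    tools.update_instruction(bot_id, prompt)
    # 2. Place the call; the voice agent handles the live conversation.
    call_id = tools.place_call(bot_id, number)
    # 3. After the call ends, fetch the transcript and reduce it to one line.
    transcript = tools.get_transcript(call_id)
    person_lines = [t for t in transcript if t.startswith("person:")]
    if not person_lines:
        return "no answer"
    return person_lines[-1].removeprefix("person:").strip()
```

The orchestrator never touches audio. It only sets things up before the call and reads the result afterwards, which is why the same flow can be driven from a chat command.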
This is the piece most people miss: you’re not building a voice AI from scratch. You’re orchestrating one that already exists. ExotelMCP is what makes that orchestration programmable.
What we learned building this
We’ve placed dozens of calls. Here’s what failed, what fixed it, and what we’d never skip again.
1. Identity-first, always
The very first words the person hears must identify the caller. No exceptions.
We learned this the hard way. The first call to Gautam, another investor, opened with something too formal. He heard a robot voice, froze, and said nothing for 20 seconds. The agent ended the call thinking no one was there.
The fix: use the static greeting feature in Exotel Voice Agent, the text that plays before the LLM even loads. It must be identity-first and warm.
- ❌ “Hello, how are you doing today?”
- ✅ “Hey Gautam, this is Shivku’s AI calling. Not a spam call. Shivku literally sent me. Do you have two minutes?”
Gautam picked up on the retry. Spoke for 7 minutes. Called it “fascinating.”
2. There are two layers: update both
The Voice Agent has two separate layers:
- Static greeting (greeting_message): plays first, before the LLM is involved. Hardcoded text.
- Agent instruction: the LLM prompt. Takes over after the greeting.
Early on, we’d update the agent instruction and forget the greeting. The person would hear a stale opening, get confused, and tune out before the LLM even started.
Both must be updated on every call. Treat them as a pair, never one without the other.
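One way to enforce the pairing, if you were scripting this rather than relying on the skill, is to make the two layers a single value that cannot be built half-empty. The sketch below is an assumption, not part of ExotelMCP: `CallSetup` and the `"prompt"` key are hypothetical, while `greeting_message` and `enable_greeting_interruption` are the setting names used in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSetup:
    """Greeting and instruction travel together, so neither can go stale."""

    greeting_message: str
    agent_instruction: str

    def __post_init__(self) -> None:
        # Refuse to build a setup that is missing either layer.
        if not self.greeting_message.strip():
            raise ValueError("greeting_message is required for every call")
        if not self.agent_instruction.strip():
            raise ValueError("agent_instruction is required for every call")

    def greeting_payload(self) -> dict:
        # Setting names as they appear in this post.
        return {
            "greeting_message": self.greeting_message,
            "enable_greeting_interruption": False,
        }

    def instruction_payload(self) -> dict:
        # "prompt" is a hypothetical key chosen for this sketch.
        return {"prompt": self.agent_instruction}
```

Because both payloads come from the same object, updating one layer without the other becomes a construction error instead of a stale greeting on a live call.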
3. Disable greeting interruption
Set enable_greeting_interruption: false. Always.
Without this, when the person starts talking over the agent during the intro, the agent yields immediately, before finishing the identity. The person hears half a sentence and has no idea who’s calling.
With it disabled, the agent completes the greeting, then hands control to the LLM which listens properly.
4. Explicitly handle IVR screeners
Outbound calls often hit a screener, especially on corporate numbers. “Please record your name and reason for calling.” If the agent isn’t prepared, it’ll say “Hi,” and go silent. The screener waits. The agent waits. The call ends.
The agent instruction must contain explicit IVR handling:
If you hear anything like "record your name and reason for calling",
"who is calling", or "please hold", say: "This is an AI assistant
calling on behalf of [Name]. I am calling to [one-line purpose]."
Then wait silently for the call to connect.
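If you generate prompts per call, the bracketed placeholders above can be filled in programmatically. This is a minimal sketch under that assumption. `ivr_clause` and `SCREENER_PHRASES` are names invented here, and the wording simply mirrors the instruction above.

```python
# Screener phrases taken from the instruction text above.
SCREENER_PHRASES = (
    "record your name and reason for calling",
    "who is calling",
    "please hold",
)


def ivr_clause(caller_name: str, purpose: str) -> str:
    """Build the IVR-handling paragraph for one call's agent instruction."""
    triggers = ", ".join(f'"{phrase}"' for phrase in SCREENER_PHRASES)
    return (
        f"If you hear anything like {triggers}, say: "
        f'"This is an AI assistant calling on behalf of {caller_name}. '
        f'I am calling to {purpose}." '
        "Then wait silently for the call to connect."
    )


print(ivr_clause("Shivku", "invite you to EngageX"))
```

Keeping the clause in one helper means every generated prompt carries it, rather than depending on someone remembering to paste it in.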
How to build this
You don’t write a line of code. Here’s what you actually do.
What you need
- An Exotel account
- Claude Code with ExotelMCP connected
For ExotelMCP, copy your account SID, API key, and API token from the Exotel dashboard and paste them into the ExotelMCP config. One-time setup.
Create the /call skill
Paste this into Claude Code. It will create the skill file in the right place:
Create the file ~/.claude/skills/call/SKILL.md with this content:
# /call Skill
Call someone and get back an answer.
## Input
/call [name or +91XXXXXXXXXX] [what to ask or do on the call]
## Steps
### 1. Get the phone number
If a name is given, look it up in your contacts or directory.
Format as +91XXXXXXXXXX.
### 2. Get or create the Voice Agent
Check if a bot named "call-skill" exists via exotel_voicebot_list_all.
If not, create one via exotel_voicebot_create with name "call-skill".
Use this bot for all calls.
### 3. Update the Voice Agent (run both in parallel)
Update the greeting via exotel_voicebot_update_config:
- enable_greeting_interruption: false
- greeting_message: identity-first, one sentence about purpose,
"Do you have two minutes?"
Update the agent instruction via exotel_voicebot_assistant_update_prompt.
Write a prompt for this specific call covering:
- Who is calling and on whose behalf
- IVR handling: if you hear "record your name" or "please hold",
identify yourself and wait silently
- Task: what to find out (2-4 bullet points)
- Speaking style: short sentences, pause after each point, listen
- Stay in lane: don't help beyond the task, redirect to the caller
- Honesty: confirm you are an AI if asked
- Closing: summarise what you learned and confirm with the person
### 4. Place the call
Use exotel_voicebot_place_call with the phone number and bot ID.
Retry once if no-answer or busy. Wait 30 seconds between attempts.
### 5. Get the transcript
Once the call completes, use exotel_voicebot_transcript_get.
### 6. Report back
One line. What did the person say about what you asked?
If the call failed: say so and ask whether to retry.
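The skill leaves the retry rule in step 4 to Claude. If you wanted the same rule in code, it might look like the Python sketch below. The status strings (`"no-answer"`, `"busy"`, `"completed"`) and the `place_call` callable are assumptions for illustration, not values documented by Exotel.

```python
import time
from typing import Callable

# Statuses that justify one more attempt (assumed names).
RETRYABLE = {"no-answer", "busy"}


def place_with_retry(
    place_call: Callable[[], str],
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, int]:
    """Place a call, retrying once after a pause if nobody picked up.

    Returns the final status and the number of attempts made.
    """
    status = place_call()
    attempts = 1
    if status in RETRYABLE:
        sleep(wait_seconds)  # wait between attempts, as the skill specifies
        status = place_call()
        attempts = 2
    return status, attempts
```

Passing `sleep` as a parameter keeps the function testable without actually waiting 30 seconds.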
Use it
/call joy invite him to EngageX on May 21st at Taj Lands End
/call arun check if the deployment is done
/call +91-9999999999 confirm the AV brief has been received
Claude reads the skill, creates the Voice Agent if needed, configures it for this call, places it, fetches the transcript, and returns a one-line answer. You don’t touch a phone.
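Under the hood, the first thing that has to happen is splitting the command into a target and a task, and getting the number into the +91XXXXXXXXXX form the skill asks for. Claude does this from the skill text. The Python sketch below only illustrates the logic, and both helper names are hypothetical.

```python
import re


def parse_call(command: str) -> tuple[str, str]:
    """Split '/call <target> <task>' into (target, task)."""
    parts = command.strip().split(maxsplit=2)
    if len(parts) < 3 or parts[0] != "/call":
        raise ValueError("expected: /call <name or number> <what to ask>")
    return parts[1], parts[2]


def normalise_indian_number(raw: str) -> str:
    """Return the number as +91XXXXXXXXXX, dropping spaces and hyphens."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # strip the country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # strip the trunk prefix
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit Indian number: {raw!r}")
    return "+91" + digits


target, task = parse_call("/call +91-9999999999 confirm the AV brief has been received")
print(normalise_indian_number(target), "|", task)
```

A name such as `joy` would instead go through the contact lookup described in step 1 of the skill.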
What you can build
Internal operations: Ask any team member a yes/no question without interrupting their day. Deployment done? Speaker confirmed? Deadline met? One-line answer in two minutes.
Customer follow-ups: Closed a support ticket? The agent calls to confirm satisfaction. No human required for the loop-close.
Investor outreach: As Joy discovered, an AI that calls you to invite you to watch an AI call live on stage is its own pitch. The medium is the message.
Escalation chains: If someone doesn’t pick up, retry. If they still don’t, route to the next person. Build the logic in Claude, not in IVR flows.
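The escalation chain is the easiest of these to picture as code. The following is a minimal sketch, assuming a hypothetical `call` function that returns `"answered"` when someone picks up. In practice you would describe this logic to Claude in plain language rather than write it yourself.

```python
from typing import Callable, Optional, Sequence


def escalate(
    contacts: Sequence[str],
    call: Callable[[str], str],
    attempts_per_contact: int = 2,
) -> Optional[str]:
    """Try each contact in order; return the first one who answers."""
    for contact in contacts:
        for _ in range(attempts_per_contact):
            if call(contact) == "answered":  # assumed status string
                return contact
    return None  # nobody picked up; hand back to a human
```

The order of `contacts` is the escalation policy, so changing who gets called next is a one-line edit instead of a new IVR flow.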
The deeper point: we think of AI as something that processes text. But most of the world still runs on voice. Decisions get made on phone calls. Relationships are maintained on phone calls. The question isn’t whether AI can write an email. It’s whether AI can hold a conversation. This is that.
One more thing
The event Joy was invited to attend: EngageX, May 21st, Taj Lands End Mumbai. It has a 7 PM slot where I make India’s first live AI phone call on stage. A real call to a real Mumbai restaurant. No script. No simulation.
Joy is now going to watch, live, the same thing that called him.
The meta is the point.
Built on Exotel Voice Agent + ExotelMCP.