Emulating a Bank Call Center with AI (Voice Deepfake)

Good Carder · May 30, 2026

Creating a voice model of an employee based on social media recordings, calling clients to obtain OTP and CVV, social engineering 2.0 techniques.

From a carder to carders. You think two-factor authentication and biometrics are secure? Well, try telling that to a bank that lost a million dollars because of a call from a fake "client" with a synthesized voice. In 2027, voice deepfakes are no longer lab experiments, but weapons of mass destruction. Any call center agent armed with a script can deceive a bank simply by calling and posing as a victim. But what if the agent themselves could now be artificial?

In this article, I'll explore a technology for emulating bank call center work using AI. You'll learn how to create a voice model of any person using social media recordings, how to call clients on behalf of the bank, obtaining OTP codes and CVV, how to automate calls using deep neural networks, and how to avoid being tricked by fraud detection systems.

Part 1: Why the Voice is the New Weapon

Bank call centers remain the weakest link. Operators are trained in politeness, not cryptography. They don't verify the caller's origin or analyze their voice for deepfakes; they simply follow a script. If a "client" calls and provides their passport information, the operator will reset the 2FA code or CVV.

The voice biometrics that banks implement to identify clients are based on analyzing unique voice characteristics. But what if these characteristics can be faked? In 2027, AI speech synthesis has reached such a level that distinguishing a real voice from a generated one is virtually impossible. Voice authentication becomes useless if a carder has 30 seconds of the victim's voice recorded from their Instagram Stories.

Part 2. Attack Architecture: From Voice Recording to Successful Deception

The full attack scheme consists of several stages. Each stage can be automated.

Stage 1. Collecting voice material. You need recordings of the victim's voice. Sources: TikTok, Instagram Reels, YouTube, public speaking, voice messages in Telegram (if you have access to the account). 30-60 seconds of clear speech without background noise is sufficient.
Stage 2. Generating a voice model. We use AI models for voice cloning (Voicemod, ElevenLabs, RVC, OpenVoice). The output is a model capable of synthesizing any phrases in the victim's voice.
Stage 3. Creating a conversation script. You need to think in advance about what the "victim" will say. The script should be natural, contain pauses and filler words to avoid suspicion.
Stage 4. Call automation. We use VoIP services (Twilio, Asterisk) to programmatically initiate the call. The voice is reproduced using a TTS (text-to-speech) engine or pre-recorded phrases.
Step 5. Bypassing recognition systems. To prevent the call from being flagged as fraudulent, we replace the caller ID (the sender's number) with the bank's official number. We also add natural background noise (street, office) to simulate a real-life environment.

Part 3. Voice Model Generation: Tools and Techniques

3.1. ElevenLabs (paid, 10/10 quality)

The most advanced voice cloning service. Upload a sample (one minute of speech), select a language, and the AI generates a model. ElevenLabs allows you to fine-tune emotions, speech rate, and add pauses. There's an API for VoIP integration. Cons: Paid, starting at $5 per month, but there's a free plan with limitations (10,000 characters per month).

3.2. RVC (Retrieval-based Voice Conversion) — open-source

A free alternative, it requires a powerful GPU. You train the model on the victim's voice. The process: collect a dataset (WAV files, 16 kHz, mono), run the training (1 to 12 hours). The output is a model that can be used to convert any text into voice. RVC is popular in the deepfake community.

3.3. OpenVoice (by MyShell.ai) - Instant Cloning

OpenVoice allows you to clone a voice using a 10-second sample with zero tuning (zero-shot). The quality is slightly lower than ElevenLabs, but the process is instantaneous. Ideal for express attacks.

3.4. Coqui TTS – a local alternative

Coqui TTS is an open-source library for speech synthesis. You can train the model using your own data. It requires more effort, but gives you complete control.

3.5. How to assemble a high-quality dataset

YouTube-dl downloads videos from the victim's channel.
Audacity cuts out clear speech, removes music, noise.
FFmpeg converts to a single format (16 kHz, 16-bit, mono).
Voice Activity Detection (VAD) removes pauses, leaving only moments where a person speaks.

The minimum quality dataset is 30 seconds of clear speech. For ideal quality, 5–10 minutes.

Part 4. Call automation: from script to call

4.1 VoIP providers

Twilio (USA) — Calling API, allows you to specify caller ID (sender's number). Cost is approximately $0.013 per minute of incoming calls.
Nexmo (Vonage) is an analogue.
Asterisk is a DIY IP PBX for those who don't want to pay. It requires SIP trunk configuration.

Caller ID spoofing — the displayed number is substituted via the "From" header. In the US and Europe, operators block spam, but they often make an exception for bank numbers (or carders exploit vulnerabilities in the SS7 protocol).

4.2. Natural speech generation via TTS

Python:

import requests
import os

# ElevenLabs API
def generate_speech(text, voice_id, api_key):
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
data = {"text": text, "voice_settings": {"stability": 0.3, "similarity_boost": 0.8}}
response = requests.post(url, json=data, headers=headers)
with open("output.mp3", "wb") as f:
f.write(response.content)

4.3. Software call with audio playback

Python:

from twilio.rest import Client

account_sid = "YOUR_SID"
auth_token = "YOUR_TOKEN"
client = Client(account_sid, auth_token)

call = client.calls.create(
twiml='<Response><Say voice="alice">Good afternoon, this is the bank's security service. You are being called about a suspicious transaction.</Say></Response>',
to='+1234567890',
from_='+1987654321' # dummy number
)

Alternative: Upload a pre-generated MP3 file with the victim's voice (not text, but a deepfake).

4.4. Conversational AI (automatic response)

The most advanced level is when the artificial intelligence not only speaks in the victim's voice but also answers the operator's questions. This is achieved through a combination of STT (speech-to-text conversion, such as Whisper) → LLM (generates a response) → TTS (speaks in the victim's voice). This bot can carry on a full conversation without raising suspicion.

Python:

import openai
import whisper

model = whisper.load_model("base")
result = model.transcribe("operator_speech.wav")
user_text = result["text"]

completion = openai.ChatCompletion.create(
model="gpt-4",
messages=[{"role": "user", "content": f"You are a bank client named Ivan. Answer the operator's question: {user_text}"}]
)

response_text = completion.choices[0].message.content
# Response_text is then sent to the TTS

Part 5. Real-World Cases and Attack Scenarios

5.1. Obtaining an OTP code (one-time password)

A "client" calls the bank: "Hello, I can't log into the app; my account is blocked. Please provide the code that was sent to my phone." The operator, after passing basic verification (providing passport information that has already been stolen), dictates an OTP code. Carders immediately use it to log into online banking. In 2025, a massive campaign was recorded in which carders used deepfake calls to bypass 2FA on crypto exchanges.

First-person narrative:
"I call the bank. The voice is a 30-second clip from YouTube. Script: I introduce myself as Ivan Petrov, provide my passport (taken from a leak), and ask for the code to log into the app because I have a "new phone." The operator sends me the code. I enter it into the app — the account is mine. An hour later, the money on the card is transferred to the drop account."

5.2. Social Engineering 2.0: Calling Clients on Behalf of a Bank

Here, the role changes. You're not a client, but a "security officer." You call the victim, introduce yourself, and explain that there was a suspicious login attempt, and that a code from an SMS is needed to cancel the transaction. The victim dictates the code, and you log into their account. In 2026, Group-IB recorded a 400% increase in such attacks in the CIS countries.

Technique:

AI generates an "employee" voice based on the voice of a real bank employee (samples can be found in interviews and corporate videos).
Number spoofing: The victim's phone displays the bank's official number.

5.3. Deepfake of a call center manager

In 2026, a case was recorded in the UK where a scammer used a deepfake voice of a company's CEO to convince an employee to transfer $243,000 to a dummy account. The scheme involved a fake email from the CEO, followed by an urgent call requesting the funds be transferred by the end of the day. In a call center context, the call to the operator came from the "bank director" demanding an access code for a VIP client.

The gist: operators are trained to obey their superiors. If the caller sounds like the manager, they will break protocol.

5.4. Automated database calling (DeepCall)

You purchase a database of phone numbers (leaks, darknet). A Python script calls all the numbers, synthesizing speech in the victim's voice. Each call has its own context (e.g., "Hello, your order number is..."). The success rate is low (0.1–1%), but with a coverage of 1 million numbers, that's 1,000 successful attacks.

Python:

def attack_loop(numbers):
for number in numbers:
voice_model = load_voice_model(number) # load a model from the database by phone number
script = generate_script(number) # personalization via OSINT
call(number, voice_model, script)
time.sleep(random.randint(60, 300)) # pause between calls

Part 6. Protection (and how to bypass it)

Banks aren't sleeping. They're implementing:

Voice biometrics (voice verification). The system analyzes the unique characteristics of a person's voice. To bypass this, a generative model trained specifically on the victim's voice must be used. In 2027, this almost always works, but some banks use liveness detection (a request to pronounce a random phrase). The bot must be able to synthesize any phrase.
Background noise analysis. If the call comes from a quiet area, and the client has always called from a noisy office, that's a red flag. Add realistic noise (office, street) using an audio editor.
Behavioral analysis. How quickly does a client respond? How do they construct their sentences? We use LLM to simulate human delays and filler words.
Cross-checking. For example, a push notification is simultaneously sent to the app. If the client cannot verify it, the call is considered fraudulent. This is insurmountable unless you have access to the victim's app.
Verification via another channel. The bank can call the client back at the registered number for confirmation. This breaks the attack if you're calling on behalf of the victim. Solution: Don't give the operator a reason to call back — solve everything in one call.

Part 7. OPSEC and the Carder's Checklist

Voice capture. Avoid recordings with obvious signs of editing. The best source is live broadcasts on social media.
Model generation. ElevenLabs is the easiest, RVC is free but complex. Always test the model on a phrase not in the training set.
Number spoofing. Use VoIP providers that don't block spoofing (registered in offshore jurisdictions).
Conversation script. Write naturally, with pauses, questions for the operator, and any doubts. Don't be a robot.
Calls. Don't call the same operator multiple times. Use the bank call center database.
Disguise. Call through a proxy in the bank's country so that the caller ID matches the victim's region.
Wipe your tracks. After a successful call, delete the logs from your VPS, change your VoIP account, and delete the voice model.

Summary

Voice deepfake is no longer a technology of the future, but a working tool of 2027. Using AI clones, you can fool a bank's call center, obtain an OTP code, CVV, and access your account. The key is to collect enough voice material, synthesize the speech correctly, and add human "imperfections." Banks are armed with voice biometrics, but they are powerless against a well-trained model. In 2027, the war has shifted to generative neural networks.

A quick one-line reminder:
"30 seconds of TikTok voice and you're anyone. ElevenLabs clones speech, Twilio calls, Whisper translates voice to text, GPT conducts conversations. The OTP code is in your pocket. Bank voice biometrics is just another myth."

Emulating a Bank Call Center with AI (Voice Deepfake)

Good Carder

Professional

Creating a voice model of an employee based on social media recordings, calling clients to obtain OTP and CVV, social engineering 2.0 techniques.

Part 1: Why the Voice is the New Weapon

Part 2. Attack Architecture: From Voice Recording to Successful Deception

Part 3. Voice Model Generation: Tools and Techniques

3.1. ElevenLabs (paid, 10/10 quality)

3.2. RVC (Retrieval-based Voice Conversion) — open-source

3.3. OpenVoice (by MyShell.ai) - Instant Cloning

3.4. Coqui TTS – a local alternative

3.5. How to assemble a high-quality dataset

Part 4. Call automation: from script to call

4.1 VoIP providers

4.2. Natural speech generation via TTS

4.3. Software call with audio playback

4.4. Conversational AI (automatic response)

Part 5. Real-World Cases and Attack Scenarios

5.1. Obtaining an OTP code (one-time password)

5.2. Social Engineering 2.0: Calling Clients on Behalf of a Bank

5.3. Deepfake of a call center manager

5.4. Automated database calling (DeepCall)

Part 6. Protection (and how to bypass it)

Part 7. OPSEC and the Carder's Checklist

Summary

Similar threads

Emulating a Bank Call Center with AI (Voice Deepfake)

Good Carder

Professional

Creating a voice model of an employee based on social media recordings, calling clients to obtain OTP and CVV, social engineering 2.0 techniques.​

Part 1: Why the Voice is the New Weapon​

Part 2. Attack Architecture: From Voice Recording to Successful Deception​

Part 3. Voice Model Generation: Tools and Techniques​

3.1. ElevenLabs (paid, 10/10 quality)​

3.2. RVC (Retrieval-based Voice Conversion) — open-source​

3.3. OpenVoice (by MyShell.ai) - Instant Cloning​

3.4. Coqui TTS – a local alternative​

3.5. How to assemble a high-quality dataset​

Part 4. Call automation: from script to call​

4.1 VoIP providers​

4.2. Natural speech generation via TTS​

4.3. Software call with audio playback​

4.4. Conversational AI (automatic response)​

Part 5. Real-World Cases and Attack Scenarios​

5.1. Obtaining an OTP code (one-time password)​

5.2. Social Engineering 2.0: Calling Clients on Behalf of a Bank​

5.3. Deepfake of a call center manager​

5.4. Automated database calling (DeepCall)​

Part 6. Protection (and how to bypass it)​

Part 7. OPSEC and the Carder's Checklist​

Summary​

Similar threads

Creating a voice model of an employee based on social media recordings, calling clients to obtain OTP and CVV, social engineering 2.0 techniques.

Part 1: Why the Voice is the New Weapon

Part 2. Attack Architecture: From Voice Recording to Successful Deception

Part 3. Voice Model Generation: Tools and Techniques

3.1. ElevenLabs (paid, 10/10 quality)

3.2. RVC (Retrieval-based Voice Conversion) — open-source

3.3. OpenVoice (by MyShell.ai) - Instant Cloning

3.4. Coqui TTS – a local alternative

3.5. How to assemble a high-quality dataset

Part 4. Call automation: from script to call

4.1 VoIP providers

4.2. Natural speech generation via TTS

4.3. Software call with audio playback

4.4. Conversational AI (automatic response)

Part 5. Real-World Cases and Attack Scenarios

5.1. Obtaining an OTP code (one-time password)

5.2. Social Engineering 2.0: Calling Clients on Behalf of a Bank

5.3. Deepfake of a call center manager

5.4. Automated database calling (DeepCall)

Part 6. Protection (and how to bypass it)

Part 7. OPSEC and the Carder's Checklist

Summary