Behavioral biometrics under the microscope: how to train your AI human emulator

Good Carder

Professional
Messages
753
Reaction score
493
Points
63

Introduction: Why Static Methods Don't Work Anymore​

You may have perfect anti-detection, up-to-date residential proxies, and a card with the appropriate BIN. But the payment still fails with a "fraudulent" error. Why? Because the website analyzes not only who you are, but also how you move.

Behavioral biometrics technologies are the new frontier in detecting automation. BioCatch analyzes over 2,000 parameters, including typing speed, touch patterns, and mouse movement smoothness. BehavioSec tracks keystroke rhythms and mouse movements in real time. NuData Security detects unauthorized users with 99% accuracy, even with the correct credentials.

In this article, I will discuss:
  • What specific parameters do BioCatch, BehavioSec, and NuData collect?
  • How to collect a dataset of real user sessions using rrweb.
  • Libraries for emulating human movements (ghost-cursor, humanize).
  • Using LSTM to generate trajectories indistinguishable from human ones.
  • Emulation quality assessment metrics.

Part 1: What behavioral biometrics systems collect​

BioCatch tracks over 2,000 behavioral parameters, analyzing everything from eye-hand coordination to the slightest mouse movements.

BehavioSec analyzes the characteristic intervals between keystrokes, typing speed, and cursor movement in certain situations.

1.1. Top 10 Mouse Parameters That Give You Away​

ParameterWhat does it measure?How the bot usually makes mistakes
Speed of movementPixels per secondConstant speed without acceleration/deceleration
Acceleration/decelerationThe second derivative of the positionLinear function without smooth transitions
Cursor shakingMicro-oscillations of the trajectoryAbsolutely straight line
Accuracy of hitDistance to the target centerPerfect hit to the center
Reaction timePause before starting to moveNo pauses ("instant reaction")
Rotation anglesDistribution of directions of movementPredictable angles (45°, 90°)
Missed targets (overshoot)Slipping past the targetA rare occurrence among bots
MicrocorrectionsSmall movements after stoppingNone
Smoothing the pathHow the trajectory is smoothedIdeal Bezier curves
Hover pausesHover time before clickInstant click without hovering

1.2. Keyboard Settings​

ParameterWhat does it measure?Example of bot rejection
Dwell timeKey hold timeConstant time for all keys
Flight timeTime between release and next pressEqual spacing between characters
Input rhythmDelay patternLack of natural variation
Using TabNavigating between fieldsSequentially traverse all fields without gaps
Bugs and fixesBackspace frequencyNone - perfect input on first try

Modern processing systems go beyond analyzing keyboard and mouse input on desktops. On mobile devices, they collect additional information, including gyroscope and accelerometer data, and track swipe patterns.

Part 2. Collecting a real session dataset via rrweb​

To emulate a person, you must first understand how they move. The best way is to record real sessions and analyze them.

2.1 What is rrweb?​

RRWeb (Record and Replay the Web) is an open-source library for recording and replaying web sessions. It records DOM events, not screenshots: DOM changes, mouse movements, clicks, keyboard inputs — all with timestamps.

2.2 Installation and Basic Configuration​

HTML:
<!-- Connecting rrweb -->
<script src="https://cdn.jsdelivr.net/npm/rrweb@latest/dist/rrweb.min.js"></script>

<script>
let events = [];
let recordingInterval = null;

// Recording configuration with a full set of events
const recordingConfig = {
emit(event) {
events.push(event);
},
// Record all mouse movements (every pixel)
recordMouseMove: true,
// Record scrolling
recordScroll: true,
// Record keyboard input
recordInput: true,
// Record clicks
recordClick: true,
// Mouse movement sampling rate (in milliseconds)
sampling: {
mousemove: 10 // Collect data every 10 ms -> 100 points/second
}
};

// Start recording
const stopRecording = rrweb.record(recordingConfig);

// Stop after 5 minutes (or on event)
setTimeout(() => {
stopRecording();
// Save events to localStorage or send to the server
console.log(`${events.length} events collected`);
localStorage.setItem('session', JSON.stringify(events));
}, 300000);

// Export to CSV for analysis in Python
function exportToCSV() {
const events = JSON.parse(localStorage.getItem('session'));
const mouseEvents = events.filter(e => e.type === 3); // type 3 = MouseMove

let csv = 'timestamp,x,y,target\n';
for (const ev of mouseEvents) {
csv += `${ev.timestamp},${ev.data.x},${ev.data.y},${ev.data.target}\n`;
}

// Download CSV
const blob = new Blob([csv], {type: 'text/csv'});
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'mouse_movements.csv';
a.click();
}
</script>

2.3. Advanced Settings: Recording Custom Events​

RRWeb supports recording custom events via an API, allowing you to synchronize data with other sources or add metadata.
JavaScript:
// Record a custom event
rrweb.record({
emit(event) { events.push(event); },
recordMouseMove: true,
sampling: { mousemove: 10 },
// Custom handler for additional data
hooks: {
beforeEmit(event) {
if (event.type === 3) { // MouseMove
event.data.screenSize = {
width: window.innerWidth,
height: window.innerHeight
};
}
return event;
}
}
});

2.4. Playing back a recorded session​

JavaScript:
// Replaying a session using rrweb-player
const events = JSON.parse(localStorage.getItem('session'));
new rrwebPlayer({
target: document.getElementById('replay-container'),
props: {
events,
width: 1024,
height: 768,
autoPlay: true,
showController: true
}
});

2.5. Dataset collection: methodology​

To create a training dataset, you need:
  1. Collect sessions of at least 50 different users (each session 3-5 minutes).
  2. Various tasks: filling out forms, website navigation, scrolling.
  3. Various devices: desktops with mouse and trackpad, mobile devices.
  4. Manual marking: marking areas with “natural” and “unnatural” behavior (distracted, slowed down, etc.).

Part 3. Motion Simulation Libraries​

3.1. Ghost Cursor is the gold standard for Puppeteer​

Ghost Cursor is a utility for Puppeteer that generates realistic, human-like mouse movements between coordinates. Instead of an instant jump (page.mouse.click(x, y)) Ghost Cursor moves the cursor along Bézier curves.
JavaScript:
const puppeteer = require('puppeteer');
const ghostCursor = require('ghost-cursor');

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com');

const cursor = ghostCursor.createCursor(page);
await cursor.moveTo('#submit-button'); // Human-like movement
await cursor.click(); // Click with a random offset

Key features of Ghost Cursor 2025-2026:
  • Automatic selection of random coordinates within the element (instead of the ideal center).
  • Generate a set of points for movement between two coordinates.
  • Allows you to create your own movement sequences.
  • It is clearly not a "silver bullet" for complex security systems, but it significantly reduces automation signals.

3.2. @@extra /humanize — a plugin for Playwright and Puppeteer​

@@extra /humanize is a plugin that emulates human input, with a special emphasis on mouse movements.
JavaScript:
const playwright = require('playwright');
const humanize = require('@extra/humanize');

const browser = await playwright.chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// Human-like click with sub-pixel precision and realistic movement
await humanize.click(page, '#button', {
waitTime: 500, // pause before movement (the human "aims")
moveSpeed: 0.8, // relative movement speed (0-1)
variation: 5 // random pixels offset from the target point
});

3.3 Python Libraries (Standalone Use)​

LibraryAlgorithmPeculiarity
human_mouseBezier curves + spline interpolationUltra-realistic trajectories based on complex mathematical algorithms
WindMouseWindMouse AlgorithmCreates nonlinear trajectories with variable speed and natural noise
BumblebeeRNN + LSTMAn AI engine that predicts natural trajectories using deep learning

Example of using human_mouse:
Python:
from human_mouse import HumanMouse
import pyautogui

human = HumanMouse()
# Перемещение мыши из текущей позиции в (500, 500) за 0.8 секунды
human.move_to(500, 500, duration=0.8, bezier=True)
# Клик с человеческим патерном (hover + вариация)
human.click(500, 500, hover_duration=0.2, variation=5)

3.4 Ghost Cursor in Python (Standalone Version)​

There are implementations of the Ghost Cursor algorithm for use without Puppeteer:
Python:
class HumanMouse:
def __init__(self):
self.current_pos = pyautogui.position()
self.bezier_points = []
   
def generate_bezier_curve(self, end_x, end_y, steps=50):
"""Генерация кривой Безье между точками"""
cp1_x = self.current_pos[0] + (end_x - self.current_pos[0]) * 0.25 + random.randint(-30, 30)
cp1_y = self.current_pos[1] + (end_y - self.current_pos[1]) * 0.25 + random.randint(-30, 30)
cp2_x = self.current_pos[0] + (end_x - self.current_pos[0]) * 0.75 + random.randint(-30, 30)
cp2_y = self.current_pos[1] + (end_y - self.current_pos[1]) * 0.75 + random.randint(-30, 30)
       
points = []
for i in range(steps + 1):
t = i / steps
x = (1-t)**3 * self.current_pos[0] + 3*(1-t)**2*t * cp1_x + 3*(1-t)*t**2 * cp2_x + t**3 * end_x
y = (1-t)**3 * self.current_pos[1] + 3*(1-t)**2*t * cp1_y + 3*(1-t)*t**2 * cp2_y + t**3 * end_y
points.append((x, y))
return points
   
def move_to(self, x, y, duration=0.5):
points = self.generate_bezier_curve(x, y, steps=int(duration * 60))
for point in points:
pyautogui.moveTo(point[0], point[1], duration=0.01)
time.sleep(0.01 + random.uniform(0, 0.01))
self.current_pos = (x, y)

Part 4. Using LSTM to Generate Natural Trajectories​

4.1. Neural network architecture for generating mouse movements​

A modern approach to human emulation is the use of recurrent neural networks to predict natural movements.

Bumblebee is an example of an AI package that uses an RNN with an LSTM layer to generate natural mouse trajectories:
Code:
RNN + LSTM Layer
↓
Baseline trajectory generation
↓
Adding natural noise and variable velocity
↓
Final motion close to real human motion

4.2. Collecting training data​

The training dataset should contain labeled mouse trajectories from real sessions. Available datasets include the SapiMouse dataset and datasets from platforms like CleverSys.

Data preparation process:
  1. Recording sessions via rrweb (as described in Part 2).
  2. Export coordinates to CSV with timestamps.
  3. Normalization of coordinates relative to the screen size.
  4. Generating sequences for LSTM (input: previous N points, output: next point).

Example of data structure:
Python:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def prepare_lstm_data(mouse_trajectories, seq_length=10):
""
mouse_trajectories: array list (n_points, 2)
seq_length: how many previous points to use to predict the next
"""
X, y = [], []
for trajectory in mouse_trajectories:
# Normalize coordinates
scaler = MinMaxScaler()
trajectory_norm = scaler.fit_transform(trajectory)

for i in range(len(trajectory_norm) - seq_length):
X.append(trajectory_norm[i:i+seq_length])
y.append(trajectory_norm[i+seq_length])

return np.array(X), np.array(y)

4.3. Training LSTM in PyTorch​

Python:
import torch
import torch.nn as nn

class MouseMovementLSTM(nn.Module):
def __init__(self, input_size=2, hidden_size=64, num_layers=2, output_size=2):
super(MouseMovementLSTM, self).__init__()
self.hidden_size = hidden_size
self.num_layers = num_layers

self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
self.fc = nn.Linear(hidden_size, output_size)

def forward(self, x):
h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

out, _ = self.lstm(x, (h0, c0))
out = self.fc(out[:, -1, :])
return out

# Training example
model = MouseMovementLSTM(input_size=2, hidden_size=64, num_layers=2, output_size=2)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
outputs = model(X_tensor)
loss = criterion(outputs, y_tensor)
optimizer.zero_grad()
loss.backward()
optimizer.step()

4.4. Trajectory generation using a trained model​

Python:
def generate_trajectory(model, start_pos, target_pos, steps=100, temperature=0.1):
"""
Generate a mouse trajectory using a trained LSTM
temperature: controls randomness (0=deterministic, 1=random)
"""
trajectory = [start_pos]
current = torch.tensor(start_pos).float().unsqueeze(0).unsqueeze(0)

for _ in range(steps):
with torch.no_grad():
next_pos = model(current).squeeze().numpy()

# Add controlled randomness
noise = np.random.normal(0, temperature, size=2)
next_pos = next_pos + noise

# Adjust direction to target (serial planning)
direction_to_target = (np.array(target_pos) - next_pos) / (steps - len(trajectory))
next_pos = next_pos + direction_to_target * 0.1

trajectory.append(next_pos)
current = torch.tensor(next_pos).float().unsqueeze(0).unsqueeze(0)

return trajectory

Part 5: Creating a Script Indistinguishable from a Human​

5.1. Combined approach​

The most effective way to create an indistinguishable script is to combine several methods:
ComponentToolFunction
Mouse movementLSTM model + ghost-cursorNatural trajectories with variable speed
Keyboard inputCustom type with variable delaysRealistic typing rhythm
ScrollRandom pauses and returns upA person often rereads
Hover pauses200-600 ms before clickNatural delay before action
MicrocorrectionsRotate the cursor after movementOvershoot and correction

5.2. A complete example on Puppeteer using all techniques​

JavaScript:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const ghostCursor = require('ghost-cursor');

puppeteer.use(StealthPlugin());

// Input emulation function with human-like delays
async function humanType(page, selector, text) {
await page.click(selector);
await page.waitForTimeout(200 + Math.random() * 300);

for (let i = 0; i < text.length; i++) {
const char = text[i];
// Variable delay between characters (50-150 ms)
const delay = 50 + Math.random() * 100;
await page.keyboard.type(char, { delay });

// Sometimes we pause (the person thinks)
if (Math.random() < 0.05) {
await page.waitForTimeout(300 + Math.random() * 700);
}

// Sometimes we make mistakes and delete them (typos)
if (Math.random() < 0.02 && i < text.length - 1) {
await page.keyboard.press('Backspace');
await page.keyboard.type(char, { delay: delay / 2 });
}
}
}

// Main input function with behavioral emulation
async function humanLikeCheckout(page, cardData) {
const cursor = ghostCursor.createCursor(page);

// 1. Natural movement to the first field
const nameField = await page.$('#name');
const nameBox = await nameField.boundingBox();
await cursor.moveTo({ x: nameBox.x + 50, y: nameBox.y + 15 });
await page.waitForTimeout(200 + Math.random() * 300);

// 2. Filling out fields with human-like delays
await humanType(page, '#name', cardData.name);

// 3. Natural movement to the next field (scrolling)
const cardField = await page.$('#card-number');
const cardBox = await cardField.boundingBox();

// Simulate variable-speed scrolling
await page.evaluate(() => window.scrollBy(0, 200));
await page.waitForTimeout(100 + Math.random() * 200);

await cursor.moveTo({ x: cardBox.x + 100, y: cardBox.y + 15 });
await humanType(page, '#card-number', cardData.number);

// 4. Pause before submitting (human checking the form)
await page.waitForTimeout(800 + Math.random() * 500);

// 5. Click on the button with an offset
const submitBtn = await page.$('#submit');
const btnBox = await submitBtn.boundingBox();
await cursor.moveTo({
x: btnBox.x + btnBox.width / 2 + (Math.random() - 0.5) * 10,
y: btnBox.y + btnBox.height / 2 + (Math.random() - 0.5) * 10
});
await page.waitForTimeout(200 + Math.random() * 300);
await page.click('#submit');
}

Part 6. Indistinguishability Evaluation Metrics​

To test how human-like your script is, use the following metrics.

6.1. Statistical Metrics​

MetricsHow is it calculated?Target value
Average speedTotal distance / travel time300-800 pixels/second with variation
Coefficient of variation of velocityStd(speed) / Mean(speed)0.3-0.7 (high variability)
Maximum accelerationΔ speed / Δ time1000-3000 pixels/s²
Percentage of straight segmentsProportion of sections with a turning angle < 10°<5% (a person rarely moves straight)

6.2. Trust Scale (Bot Score)​

Modern systems assign each transaction a "bot score" — a number between 0 (definitely human) and 1 (definitely a bot), based on a probability generated by a random forest classifier.

Bot score ranges:
  • 0.0-0.3 → Green zone (person)
  • 0.3-0.7 → Yellow zone (further verification required)
  • 0.7-1.0 → Red Zone (bot - block)

6.3. Turing Test for Scripts​

The only reliable way to check indistinguishability is to compare your script's trajectories with the real human trajectories the system was trained on. To do this, you can calculate similarity metrics:
Python:
from scipy.spatial.distance import directed_hausdorff
import numpy as np

def similarity_score(trajectory_bot, trajectory_human):
""
Calculates the similarity of a bot's trajectory to a human's
Returns a value from 0 (different) to 1 (identical)
"""
# 1. Shape comparison (Hausdorff distance)
hausdorff_dist = max(
directed_hausdorff(trajectory_bot, trajectory_human)[0],
directed_hausdorff(trajectory_human, trajectory_bot)[0]
)

# 2. Speed distribution comparison
speeds_bot = np.linalg.norm(np.diff(trajectory_bot, axis=0), axis=1)
speeds_human = np.linalg.norm(np.diff(trajectory_human, axis=0), axis=1)

speed_similarity = 1 - np.abs(
np.mean(speeds_bot) - np.mean(speeds_human)
) / (np.std(speeds_human) + 1e-6)

#3. Comparison of angle distribution
angles_bot = np.arctan2(
np.diff(trajectory_bot[:,1]),
np.diff(trajectory_bot[:,0])
)
angles_human = np.arctan2(
np.diff(trajectory_human[:,1]),
np.diff(trajectory_human[:,0])
)

hist_bot, _ = np.histogram(angles_bot, bins=36, range=(-np.pi, np.pi), density=True)
hist_human, _ = np.histogram(angles_human, bins=36, range=(-np.pi, np.pi), density=True)

angle_similarity = 1 - np.sum((hist_bot - hist_human) ** 2) / 2

# Final scoring
return (1 / (1 + hausdorff_dist) * 0.3 +
speed_similarity * 0.35 +
angle_similarity * 0.35)

Part 7: A Comprehensive Checklist for Creating an Indistinguishable Script​

Before running the script:
  • At least 50 real sessions were collected via rrweb to create a dataset.
  • The LSTM model was trained on labeled human trajectories.
  • Emulation libraries selected (ghost-cursor for Puppeteer, human_mouse/Bumblebee for Python).
  • Variability of all parameters is configured: speed, delays, acceleration.

Behavioral parameters for simulation:
  • Mouse movements are Bezier curves + natural noise, no straight lines.
  • Hover pauses are 200-600ms before clicking, never instant clicks.
  • Overshoot - Sometimes slip past, then turn around.
  • Input errors - 1-2 typos per form (subsequently corrected).

After launch (validation):
  • Compare bot score with trust scale - should be <0.3 for safe zone.
  • Check it with the Turing test - visually comparing your movements with real sessions.
  • Monitor your rejections - if you're still getting rejected due to fraudulent behavior, it's not a behavioral issue.

Conclusion: The arms race continues​

Behavioral biometrics is the latest frontier in automation detection. These systems are becoming smarter every year: BioCatch analyzes over 2,000 parameters, and NuData achieves 99% detection accuracy.

But emulation technologies are keeping pace. Ghost Cursor, LSTM trajectory generation, and AI engines like Bumblebee are making scripts increasingly indistinguishable from humans. The key to success is constant updating: it's recommended to review the dataset and retrain models on fresh data every 3-6 months.

Three key takeaways:
  1. Static methods (Canvas, WebGL) are no longer the primary line of defense. Behavioral biometrics are the new frontier.
  2. Use a combination of emulation libraries - no single one will give perfect results.
  3. LSTM and RNN are the future of emulation. Train your models on real human sessions.

A quick one-line reminder:
"Speed ≠ monotonous, lines ≠ straight, clicks ≠ instant, input ≠ perfect, pauses ≠ even — and even then you're only getting closer to the person."
 
Top