gpt-4o-mini-realtime

OpenAI · Conversational · Multilingual · Generally Available · Proprietary · vm-oai-002

About

Lightweight full-duplex voice model that offers the same real-time conversational capabilities as the full gpt-4o-realtime model at significantly lower cost and latency. Suited to high-volume voice agent deployments.

Capabilities (5)

Full-duplex conversation

Lower cost inference

Function calling

Interruption handling

Streaming audio I/O
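
As an illustration of the function calling capability listed above, the following is a minimal sketch of how an application might route a model-issued function call to local code. The `FunctionCall` shape, the `tools` registry, the `lookup_order` tool, and `dispatchFunctionCall` are all hypothetical names chosen for this example; they are not taken from any SDK, and a real integration would follow the request format of the API in use.

```typescript
// Hypothetical tool handler type: takes parsed arguments, returns a JSON-serializable result.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical registry of tools the voice agent is allowed to call.
const tools = new Map<string, ToolHandler>([
  ["lookup_order", (args) => ({ orderId: args.orderId, status: "shipped" })],
]);

// Assumed shape of a function-call request: a tool name plus JSON-encoded arguments.
interface FunctionCall {
  name: string;
  arguments: string;
}

// Route a call to its handler and return a JSON string to send back to the model.
// Unknown tools and malformed arguments are reported as an error payload
// instead of throwing, so one bad call does not end the voice session.
function dispatchFunctionCall(call: FunctionCall): string {
  const handler = tools.get(call.name);
  if (!handler) {
    return JSON.stringify({ error: `unknown tool: ${call.name}` });
  }
  try {
    const args = JSON.parse(call.arguments) as Record<string, unknown>;
    return JSON.stringify(handler(args));
  } catch (err) {
    return JSON.stringify({ error: String(err) });
  }
}

console.log(dispatchFunctionCall({ name: "lookup_order", arguments: '{"orderId":"A1"}' }));
```

Handlers are synchronous here to keep the sketch short; handlers that call external services would normally return promises.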

Key Highlights

Up to 80% cost reduction versus the full Realtime model

Faster time-to-first-token for latency-sensitive voice agents

Same API surface enables seamless model swapping
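
If the API surface really is shared, swapping models can come down to changing one identifier. The sketch below assumes the options shape used in this page's Code Example; `SessionOptions`, `buildSessionOptions`, and the `FULL_REALTIME_MODEL_ID` placeholder are hypothetical and only illustrate the idea. The catalog ID of the full model is not given on this page, so it is left as a placeholder.

```typescript
// Options shape mirroring the object passed to VoiceSession.create in the Code Example.
interface SessionOptions {
  model: string;
  vendor: string;
  config: { fullDuplex: boolean; language: string; turnDetection: string };
}

// Hypothetical helper: everything except the model ID is shared between models.
function buildSessionOptions(modelId: string): SessionOptions {
  return {
    model: modelId,
    vendor: "openai",
    config: { fullDuplex: true, language: "en-US", turnDetection: "server_vad" },
  };
}

const MINI_MODEL_ID = "vm-oai-002"; // catalog ID from this page
const FULL_MODEL_ID = "FULL_REALTIME_MODEL_ID"; // placeholder, not a real ID

// Flip this flag (or read it from configuration) to switch models.
const useMini = true;
const options = buildSessionOptions(useMini ? MINI_MODEL_ID : FULL_MODEL_ID);
console.log(options.model);
```

Keeping the model ID as the only varying input makes it straightforward to compare the two models on the same traffic.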

Use Cases

Customer Support Agents

Deploy AI voice agents that handle customer inquiries with natural conversation flow and real-time responses.

Virtual Assistants

Build always-on voice assistants for enterprise applications with full-duplex capabilities.

Telehealth & Consultation

Enable voice-first healthcare consultations with HIPAA-compliant conversational AI.

Interactive Voice Response

Replace traditional IVR menus with natural language voice agents that understand intent.

Code Example

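// gpt-4o-mini-realtime: Conversational Voice Session
// Top-level await below requires an ES module context.
import { VoiceSession } from "@arkitekton/voice";

// "vm-oai-002" is this catalog's ID for gpt-4o-mini-realtime.
const session = await VoiceSession.create({
  model: "vm-oai-002",
  vendor: "openai",
  config: {
    fullDuplex: true, // listen and speak at the same time
    language: "en-US",
    turnDetection: "server_vad", // server-side voice activity detection
  },
});

// Fired when the agent starts speaking.
session.on("speech_started", () => {
  console.log("Agent is speaking...");
});

// Fired with the transcribed text of the user's speech.
session.on("transcript", (text) => {
  console.log("User said:", text);
});

// Request microphone access in the browser and stream the audio to the session.
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
session.connect(mic);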

Related Models

PersonaPlex 7B (NVIDIA)

Riva (NVIDIA)

ACE (Avatar Cloud Engine) (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Whisper (OpenAI)

Quick Stats

Latency: ~250ms TTFT

Languages: 50 supported

License: Proprietary

Pricing: $0.60 / 1M input tokens, $2.40 / 1M output tokens

Status: Generally Available
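
To make the per-token pricing concrete, here is a small worked sketch that turns the listed rates into a cost estimate. The helper name `estimateCostUsd` and the example token counts are assumptions for illustration. The sketch applies only the two rates shown in Quick Stats and does not model any other billing detail.

```typescript
// Prices in USD per 1M tokens, as listed in Quick Stats.
const INPUT_PRICE_PER_MILLION = 0.6;
const OUTPUT_PRICE_PER_MILLION = 2.4;

// Hypothetical helper: estimate cost in USD from input and output token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION
  );
}

// Example: 2M input tokens ($1.20) plus 500k output tokens ($1.20).
console.log(estimateCostUsd(2_000_000, 500_000).toFixed(2)); // prints "2.40"
```

Because output tokens cost four times as much as input tokens at these rates, agents that speak at length will see costs dominated by output.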

Vendor

OpenAI

Foundation models for real-time voice and transcription
