gpt-4o-realtime

OpenAI · Conversational · Multilingual · Generally Available · Proprietary · vm-oai-001

About

Full-duplex voice model built on GPT-4o enabling real-time conversational interactions. Supports function calling during speech, natural interruption handling, server-side voice activity detection, and simultaneous audio input/output streaming.
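As a rough illustration of what enabling server-side voice activity detection can look like at the wire level, the sketch below builds a session configuration message. The event name `session.update` and the `turn_detection` fields follow the beta-era OpenAI Realtime API as I understand it; newer API versions may nest or rename them. The helper `buildSessionUpdate`, its option names, and the numeric values are hypothetical and chosen only for illustration.

```typescript
// Sketch only: field names are assumptions based on the beta-era OpenAI
// Realtime API and should be checked against the current reference.
interface ServerVadOptions {
  threshold: number;         // speech-probability cutoff, 0..1
  prefixPaddingMs: number;   // audio kept before detected speech
  silenceDurationMs: number; // silence that ends the user's turn
}

interface SessionUpdateEvent {
  type: "session.update";
  session: {
    turn_detection: {
      type: "server_vad";
      threshold: number;
      prefix_padding_ms: number;
      silence_duration_ms: number;
    };
  };
}

// Hypothetical helper: maps camelCase options onto the assumed wire format.
function buildSessionUpdate(opts: ServerVadOptions): SessionUpdateEvent {
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold: opts.threshold,
        prefix_padding_ms: opts.prefixPaddingMs,
        silence_duration_ms: opts.silenceDurationMs,
      },
    },
  };
}

// Illustrative values, not recommendations.
const update = buildSessionUpdate({
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
});

// The serialized string is what a client would send over its WebSocket.
const wireMessage = JSON.stringify(update);
```

A longer silence duration makes the agent wait longer before replying, which tends to reduce false turn endings at the cost of responsiveness.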

Capabilities (5)

Full-duplex conversation

Function calling mid-speech

Natural interruptions

Server VAD

WebSocket streaming
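Natural interruptions (barge-in) depend on the client stopping its own audio playback as soon as the server reports that the user has started talking. The sketch below models that decision as a small pure function. The two server event names are assumptions taken from the beta-era OpenAI Realtime API and may differ in other versions; `playback.drained` is a hypothetical local event that an audio player would emit when its queue empties.

```typescript
// Assumed server event names (beta-era OpenAI Realtime API); verify before use.
const USER_SPEECH_STARTED = "input_audio_buffer.speech_started";
const AGENT_AUDIO_DELTA = "response.audio.delta";
// Hypothetical local event from the audio player, not part of any API.
const PLAYBACK_DRAINED = "playback.drained";

interface PlaybackState {
  agentAudioQueued: boolean; // true while agent audio is queued or playing
}

interface Step {
  state: PlaybackState;
  stopPlayback: boolean; // true when the client should flush its player
}

// Pure reducer: given the current state and an event type, decide the next
// state and whether local playback must be cut off.
function onEvent(state: PlaybackState, eventType: string): Step {
  switch (eventType) {
    case AGENT_AUDIO_DELTA:
      return { state: { agentAudioQueued: true }, stopPlayback: false };
    case PLAYBACK_DRAINED:
      return { state: { agentAudioQueued: false }, stopPlayback: false };
    case USER_SPEECH_STARTED:
      // Interrupt only if the agent is actually audible.
      return {
        state: { agentAudioQueued: false },
        stopPlayback: state.agentAudioQueued,
      };
    default:
      return { state, stopPlayback: false };
  }
}

// The agent starts speaking, then the user talks over it.
let step = onEvent({ agentAudioQueued: false }, AGENT_AUDIO_DELTA);
step = onEvent(step.state, USER_SPEECH_STARTED);
const interrupted = step.stopPlayback; // true
```

Whether the client must also ask the server to cancel the in-flight response depends on the session configuration, so that step is left out of this sketch.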


Key Highlights

Native multimodal reasoning over audio without transcription intermediary

Function calling enables tool use during live conversation

Sub-second first-token latency with global edge deployment
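To make the tool-use highlight concrete, here is a minimal sketch of answering a function call during a live session. It assumes the client has already extracted a call ID, a function name, and a JSON argument string from a server event. The outgoing `conversation.item.create` (with a `function_call_output` item) and `response.create` events follow the beta-era OpenAI Realtime API as I understand it. The `FunctionCall` shape, `runToolCall`, and the `get_order_status` tool are hypothetical.

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical shape for a call already extracted from a server event.
interface FunctionCall {
  callId: string;
  name: string;
  argumentsJson: string;
}

// Assumed client events for returning a tool result and resuming speech.
type ClientEvent =
  | {
      type: "conversation.item.create";
      item: { type: "function_call_output"; call_id: string; output: string };
    }
  | { type: "response.create" };

// Runs the matching tool and returns the events to send back, in order.
function runToolCall(
  call: FunctionCall,
  tools: Map<string, ToolHandler>,
): ClientEvent[] {
  const handler = tools.get(call.name);
  let output: unknown;
  if (handler === undefined) {
    output = { error: `unknown tool: ${call.name}` };
  } else {
    try {
      output = handler(JSON.parse(call.argumentsJson) as Record<string, unknown>);
    } catch (err) {
      // Report failures to the model instead of dropping the call.
      output = { error: String(err) };
    }
  }
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: call.callId,
        output: JSON.stringify(output),
      },
    },
    // Ask the model to continue the conversation using the tool result.
    { type: "response.create" },
  ];
}

// Illustrative tool registry with a stubbed lookup.
const tools = new Map<string, ToolHandler>([
  ["get_order_status", (args) => ({ orderId: args["orderId"], status: "shipped" })],
]);

const events = runToolCall(
  { callId: "call_1", name: "get_order_status", argumentsJson: '{"orderId":"A17"}' },
  tools,
);
```

Returning errors as tool output, rather than throwing, lets the agent explain the failure to the caller in speech instead of going silent.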

Use Cases

Customer Support Agents

Deploy AI voice agents that handle customer inquiries with natural conversation flow and real-time responses.

Virtual Assistants

Build always-on voice assistants for enterprise applications with full-duplex capabilities.

Telehealth & Consultation

Enable voice-first healthcare consultations with HIPAA-compliant conversational AI.

Interactive Voice Response

Replace traditional IVR menus with natural language voice agents that understand intent.

Code Example

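// gpt-4o-realtime: Conversational Voice Session
// Uses top-level await, so run it as an ES module in a browser context.
import { VoiceSession } from "@arkitekton/voice";

// Create a full-duplex session with server-side voice activity detection.
const session = await VoiceSession.create({
  model: "vm-oai-001",
  vendor: "openai",
  config: {
    fullDuplex: true,
    language: "en-US",
    turnDetection: "server_vad",
  },
});

// Log when the agent starts speaking.
session.on("speech_started", () => {
  console.log("Agent is speaking...");
});

// Log each transcript of the user's speech.
session.on("transcript", (text) => {
  console.log("User said:", text);
});

// Request microphone access and attach the audio stream to the session.
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
session.connect(mic);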

Related Models

PersonaPlex 7B (NVIDIA)

Riva (NVIDIA)

ACE (Avatar Cloud Engine) (NVIDIA)

gpt-4o-mini-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Whisper (OpenAI)

Quick Stats

Latency: ~320 ms TTFT (time to first token)

Languages: 50 supported

License: Proprietary

Pricing: $5.00 / 1M input tokens, $20.00 / 1M output tokens

Status: Generally Available
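The listed pricing can be turned into a quick per-session estimate. The sketch below assumes the two rates above apply uniformly to every input and output token in a session, which this page does not confirm. The function name and the token counts are illustrative only.

```typescript
// Rates taken from the pricing line above (USD per one million tokens).
const INPUT_USD_PER_MILLION = 5.0;
const OUTPUT_USD_PER_MILLION = 20.0;

// Estimates session cost in USD from token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) /
    1000000
  );
}

// Illustrative session: 200,000 input tokens and 50,000 output tokens.
const exampleCost = estimateCostUsd(200000, 50000); // 2 (USD)
```

Output tokens cost four times as much as input tokens at these rates, so the length of the agent's replies tends to dominate the bill.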

Vendor

OpenAI

Foundation models for real-time voice and transcription

