Full-duplex voice model built on GPT-4o enabling real-time conversational interactions. Supports function calling during speech, natural interruption handling, server-side voice activity detection, and simultaneous audio input/output streaming.
Native multimodal reasoning over audio without transcription intermediary
Function calling enables tool use during live conversation
Sub-second first-token latency with global edge deployment
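As a rough illustration of tool use during a live conversation, the sketch below dispatches a completed function call to a local handler and builds the two events a client would send back. The event shapes are modeled on OpenAI's Realtime API and are an assumption here, not taken from the @arkitekton/voice SDK; `handleFunctionCall`, the `tools` registry, and the `lookupOrder` tool are hypothetical names.

```typescript
// Hypothetical sketch: event shapes follow OpenAI's Realtime API naming.
// They are assumptions, not the documented @arkitekton/voice interface.

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown;

// Server event emitted once the model has finished streaming call arguments.
interface FunctionCallDone {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded argument object
}

// Events the client sends back after running the tool.
type ClientEvent =
  | {
      type: "conversation.item.create";
      item: { type: "function_call_output"; call_id: string; output: string };
    }
  | { type: "response.create" };

// Illustrative registry; lookupOrder is a made-up example tool.
const tools: Record<string, ToolHandler> = {
  lookupOrder: (args) => ({ orderId: args.orderId, status: "shipped" }),
};

async function handleFunctionCall(
  event: FunctionCallDone,
  registry: Record<string, ToolHandler> = tools,
): Promise<ClientEvent[]> {
  let output: unknown;
  try {
    const handler = registry[event.name];
    if (!handler) throw new Error(`Unknown tool: ${event.name}`);
    output = await handler(JSON.parse(event.arguments));
  } catch (err) {
    // Report failures to the model so it can recover in speech.
    output = { error: err instanceof Error ? err.message : String(err) };
  }
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(output),
      },
    },
    // Ask the model to continue speaking using the tool result.
    { type: "response.create" },
  ];
}
```

Returning the events rather than sending them keeps the dispatcher independent of any particular transport, so the same logic could sit behind a WebSocket, WebRTC data channel, or an SDK wrapper.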
Deploy AI voice agents that handle customer inquiries with natural conversation flow and real-time responses.
Build always-on voice assistants for enterprise applications with full-duplex capabilities.
Enable voice-first healthcare consultations with HIPAA-compliant conversational AI.
Replace traditional IVR menus with natural language voice agents that understand intent.
// gpt-4o-realtime - Conversational Voice Session
import { VoiceSession } from "@arkitekton/voice";

const session = await VoiceSession.create({
  model: "vm-oai-001",
  vendor: "openai",
  config: {
    fullDuplex: true,
    language: "en-US",
    turnDetection: "server_vad",
  },
});

session.on("speech_started", () => {
  console.log("Agent is speaking...");
});

session.on("transcript", (text) => {
  console.log("User said:", text);
});

// Connect to audio stream
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
session.connect(mic);

Foundation models for real-time voice and transcription
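Interruption handling with server-side voice activity detection can be pictured as follows: when the server reports that the user has started speaking while agent audio is still playing, the client cancels the in-flight response and trims the unplayed audio from the conversation. This is a minimal sketch under assumptions. The event names (`response.cancel`, `conversation.item.truncate`) follow OpenAI's Realtime API, while `PlaybackState` and `onUserSpeechStarted` are hypothetical helpers that do not come from the @arkitekton/voice SDK.

```typescript
// Hypothetical sketch of barge-in handling; not the @arkitekton/voice API.

interface PlaybackState {
  itemId: string | null; // assistant audio item currently playing, if any
  playedMs: number; // milliseconds of that item already played to the user
}

type InterruptEvent =
  | { type: "response.cancel" }
  | {
      type: "conversation.item.truncate";
      item_id: string;
      content_index: number;
      audio_end_ms: number;
    };

// Call this when the server signals that user speech has started.
// Returns the client events to send; an empty list means nothing to interrupt.
function onUserSpeechStarted(state: PlaybackState): InterruptEvent[] {
  if (state.itemId === null) return [];
  return [
    // Stop the model from generating more audio for the current response.
    { type: "response.cancel" },
    // Keep only the audio the user actually heard in the conversation history.
    {
      type: "conversation.item.truncate",
      item_id: state.itemId,
      content_index: 0,
      audio_end_ms: Math.floor(state.playedMs),
    },
  ];
}
```

The caller is still responsible for stopping local playback immediately; the events above only bring the server's view of the conversation in line with what was heard.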