gpt-4o-mini-transcribe

OpenAI · Speech-to-Text · Multilingual · Generally Available · Proprietary · vm-oai-003

About

Next-generation speech recognition model that produces about 90% fewer hallucinations than Whisper v2. It supports structured output, configurable timestamp granularity, logprob-based confidence scores, and streaming transcription.

Capabilities (5)

90% fewer hallucinations

Structured JSON output

Word-level timestamps

Confidence scores

Streaming transcription


Key Highlights

90% reduction in hallucinated text compared to Whisper v2

Logprob-based confidence scoring for every transcribed segment

Native structured output reduces the need for separate post-processing pipelines
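Logprob-based confidence can be computed in several ways. The following is a minimal sketch. It assumes each segment exposes an array of natural-log token probabilities. The `TokenLogprob` shape is hypothetical and is not the SDK's documented schema. The sketch scores a segment as the geometric mean of its token probabilities, `exp(mean(logprob))`, and flags individual low-probability tokens:

```typescript
// Hypothetical per-token shape; the SDK's real output schema may differ.
interface TokenLogprob {
  token: string;
  logprob: number; // natural log of the token probability (<= 0)
}

// Segment confidence as the geometric mean of token probabilities:
// exp(mean(logprob)). Returns 0 for an empty segment ("no evidence").
function segmentConfidence(tokens: TokenLogprob[]): number {
  if (tokens.length === 0) return 0;
  const meanLogprob =
    tokens.reduce((sum, t) => sum + t.logprob, 0) / tokens.length;
  return Math.exp(meanLogprob);
}

// Tokens whose own probability falls below a threshold, e.g. to highlight
// words a human reviewer should double-check. 0.5 is an illustrative default.
function lowConfidenceTokens(
  tokens: TokenLogprob[],
  minProbability = 0.5,
): string[] {
  return tokens
    .filter((t) => Math.exp(t.logprob) < minProbability)
    .map((t) => t.token);
}
```

The geometric mean is one common choice. A single very unlikely token pulls the score down more sharply than it would with an arithmetic mean of probabilities.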

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.
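Speaker identification is usually delivered as a speaker label on each transcript segment. The sketch below merges consecutive segments from the same speaker into readable turns. It assumes a hypothetical `TranscriptSegment` shape with `speaker`, `start`, `end`, and `text` fields:

```typescript
// Hypothetical diarized segment shape (times in seconds).
interface TranscriptSegment {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Merge consecutive segments from the same speaker into a single turn,
// the form in which meeting transcripts are usually presented.
function toSpeakerTurns(segments: TranscriptSegment[]): TranscriptSegment[] {
  const sorted = [...segments].sort((a, b) => a.start - b.start);
  const turns: TranscriptSegment[] = [];
  for (const seg of sorted) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      // Same speaker keeps talking: extend the current turn.
      last.end = Math.max(last.end, seg.end);
      last.text = `${last.text} ${seg.text}`.trim();
    } else {
      // Speaker changed (or first segment): start a new turn.
      turns.push({ ...seg, text: seg.text.trim() });
    }
  }
  return turns;
}
```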

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.
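In its simplest form, keyword spotting matches normalized transcript words against a watch list. The sketch below makes three simplifying choices. It assumes a hypothetical word-level output with `word` and `start` (seconds) fields. It handles single-word keywords only. It strips punctuation using ASCII character classes only:

```typescript
// Hypothetical word-level shape (start time in seconds).
interface TimedWord {
  word: string;
  start: number;
}

interface KeywordHit {
  keyword: string;
  start: number;
}

// Lowercase and strip leading/trailing non-alphanumerics (ASCII only).
function normalize(word: string): string {
  return word.toLowerCase().replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
}

// Return every occurrence of a watched single-word keyword with its time.
function spotKeywords(words: TimedWord[], keywords: string[]): KeywordHit[] {
  const watch = new Set(keywords.map(normalize));
  const hits: KeywordHit[] = [];
  for (const w of words) {
    const n = normalize(w.word);
    if (watch.has(n)) hits.push({ keyword: n, start: w.start });
  }
  return hits;
}
```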

Content Indexing

Convert audio and video libraries into searchable text archives.
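After transcription, making an archive searchable is a standard text-retrieval problem. The sketch below is a minimal in-memory inverted index that answers AND queries over transcripts. The `TranscriptIndex` class is hypothetical, and its tokenizer is a simple ASCII split:

```typescript
// Maps each normalized term to the set of document IDs that contain it.
class TranscriptIndex {
  private postings = new Map<string, Set<string>>();

  // Lowercase and split on anything that is not a letter, digit, or apostrophe.
  private static tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9']+/).filter((t) => t.length > 0);
  }

  add(docId: string, transcript: string): void {
    for (const term of TranscriptIndex.tokenize(transcript)) {
      let docs = this.postings.get(term);
      if (!docs) {
        docs = new Set<string>();
        this.postings.set(term, docs);
      }
      docs.add(docId);
    }
  }

  // AND query: IDs of documents containing every query term, sorted.
  search(query: string): string[] {
    const terms = TranscriptIndex.tokenize(query);
    if (terms.length === 0) return [];
    let result: string[] | null = null;
    for (const term of terms) {
      const docs = this.postings.get(term);
      if (!docs) return [];
      result = result === null ? Array.from(docs) : result.filter((id) => docs.has(id));
    }
    return (result ?? []).sort();
  }
}
```

A production archive would normally rely on a dedicated search engine. The sketch only illustrates the term-to-document mapping that makes transcripts searchable.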

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.
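Live captions are typically built by grouping timestamped words into short cues. The following is a greedy sketch. It assumes a hypothetical word shape with `start` and `end` times in seconds. The 42-character and 6-second limits are illustrative defaults, not values taken from this model's documentation:

```typescript
// Hypothetical word shape with start/end times in seconds.
interface CaptionWord { word: string; start: number; end: number; }
interface CaptionCue { start: number; end: number; text: string; }

// Greedily pack words into cues. A new cue starts when appending the next
// word would exceed maxChars or stretch the cue beyond maxDuration seconds.
function buildCues(
  words: CaptionWord[],
  maxChars = 42,
  maxDuration = 6,
): CaptionCue[] {
  const cues: CaptionCue[] = [];
  let current: CaptionCue | null = null;
  for (const w of words) {
    if (current !== null) {
      const candidate = `${current.text} ${w.word}`;
      if (candidate.length <= maxChars && w.end - current.start <= maxDuration) {
        current.text = candidate; // word fits: extend the open cue
        current.end = w.end;
        continue;
      }
      cues.push(current); // limit reached: close the open cue
    }
    current = { start: w.start, end: w.end, text: w.word };
  }
  if (current !== null) cues.push(current);
  return cues;
}
```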

Code Example

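// gpt-4o-mini-transcribe: Speech-to-Text
// Run as an ES module in Node.js (uses top-level await).
import { readFile } from "node:fs/promises";
import { transcribe } from "@arkitekton/voice";

// Load the audio to transcribe ("meeting.wav" is a placeholder path).
// Passing a Buffer is an assumption; check which audio inputs the SDK accepts.
const audioFile = await readFile("meeting.wav");

const result = await transcribe({
  model: "vm-oai-003", // catalog ID for gpt-4o-mini-transcribe
  vendor: "openai",
  audio: audioFile,
  language: "en", // language hint for the input audio
  options: {
    punctuate: true,
    diarize: true, // request speaker labels
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);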

Related Models

NeMo ASR (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-realtime (OpenAI)

Whisper (OpenAI)

Quick Stats

Latency: <500 ms

Languages: 97 supported

License: Proprietary

Pricing: $0.003 / minute

Status: Generally Available
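Because pricing is listed per minute, estimating the cost of a batch is straightforward. The sketch below assumes billing is strictly proportional to audio duration, with no minimums or rounding. This page does not specify those billing details:

```typescript
// Listed rate: $0.003 per minute of audio.
const PRICE_PER_MINUTE_USD = 0.003;

// Estimate the transcription cost of a batch of recordings (durations in seconds).
// Assumes linear per-minute billing with no minimum charge or rounding.
function estimateCostUsd(durationsSeconds: number[]): number {
  const totalMinutes = durationsSeconds.reduce((sum, s) => sum + s, 0) / 60;
  return totalMinutes * PRICE_PER_MINUTE_USD;
}
```

For example, 1,000 hours of audio is 60,000 minutes, which comes to about $180 at $0.003 per minute.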

Vendor

OpenAI

Foundation models for real-time voice and transcription


