Whisper large-v3-turbo

OpenAI | Speech-to-Text | Multilingual | Edge / On-Device | Generally Available | MIT | vm-oai-006

About

Distilled Whisper variant that retains near large-v3 accuracy while running significantly faster. It cuts the decoder from 32 layers to 4, which makes inference efficient on consumer GPUs and edge devices while preserving multilingual robustness.

Capabilities (5)

Distilled architecture

Near large-v3 accuracy

Faster inference

Consumer GPU friendly

Multilingual support

Key Highlights

Near large-v3 accuracy at a fraction of the compute cost

Runs in real time on consumer GPUs and Apple Silicon

MIT license with full Whisper ecosystem compatibility
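One way to check the "real time" claim on your own hardware is to measure the real-time factor (RTF): processing time divided by audio duration. The sketch below is illustrative only. The function names, the 0.8 headroom value, and the sample timings are assumptions, not figures from this page.

```typescript
// Real-time factor (RTF) = processing time / audio duration.
// RTF below 1 means transcription finishes faster than the audio plays.
function realTimeFactor(processingSeconds: number, audioSeconds: number): number {
  if (audioSeconds <= 0) {
    throw new RangeError("audioSeconds must be positive");
  }
  if (processingSeconds < 0) {
    throw new RangeError("processingSeconds must not be negative");
  }
  return processingSeconds / audioSeconds;
}

// Assumed rule of thumb: leave some headroom below 1.0 so that
// occasional slow chunks do not make a live stream fall behind.
function keepsUpWithLiveAudio(rtf: number, headroom: number = 0.8): boolean {
  return rtf <= headroom;
}

// Hypothetical measurement: 30 s of audio transcribed in 6 s.
const sampleRtf = realTimeFactor(6, 30);
console.log("RTF:", sampleRtf, "real-time capable:", keepsUpWithLiveAudio(sampleRtf));
```

Measure RTF on representative audio and on the target device, since results vary with hardware, batch size, and audio content.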

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.
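For the captioning and indexing use cases above, timestamped transcript segments usually need to be converted into a caption format. The following is a minimal sketch that renders segments as SubRip (SRT) text. The `Segment` shape is a hypothetical structure chosen for illustration; this page does not document what the transcription result actually contains beyond `text` and `confidence`.

```typescript
// Hypothetical segment shape: start and end times in seconds plus the spoken text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Render segments as numbered SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      toSrtTimestamp(seg.start) + " --> " + toSrtTimestamp(seg.end) + "\n" +
      seg.text.trim() + "\n"
    )
    .join("\n");
}
```

Calling `toSrt([{ start: 0, end: 1.25, text: "Hello" }])` yields a single cue running from `00:00:00,000` to `00:00:01,250`.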

Code Example

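// Whisper large-v3-turbo: Speech-to-Text
import { readFile } from "node:fs/promises";
import { transcribe } from "@arkitekton/voice";

// Assumption: `audio` accepts raw file bytes; the path is a placeholder.
const audioFile = await readFile("./meeting.wav");

const result = await transcribe({
  model: "vm-oai-006", // catalog ID for Whisper large-v3-turbo
  vendor: "openai",
  audio: audioFile,
  language: "en", // language of the audio
  options: {
    punctuate: true, // add punctuation to the transcript
    diarize: true, // request speaker identification
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);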
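The example above logs `result.confidence` but does not act on it. A common pattern is to send low-confidence transcripts to human review. The sketch below assumes that confidence is a number between 0 and 1, which this page does not state. The `TranscriptResult` type, the `routeByConfidence` helper, and the 0.85 threshold are all hypothetical.

```typescript
// Assumed result shape, based on the two fields used in the example above.
interface TranscriptResult {
  text: string;
  confidence: number; // assumed to be in the range 0..1
}

type Routing = "accept" | "review";

// Route a transcript by comparing its confidence against a threshold.
function routeByConfidence(result: TranscriptResult, threshold: number = 0.85): Routing {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  return result.confidence >= threshold ? "accept" : "review";
}

// Hypothetical result values, for illustration only.
const sample: TranscriptResult = { text: "Thanks for calling.", confidence: 0.62 };
console.log("Routing:", routeByConfidence(sample));
```

Tune the threshold against a labeled sample of your own audio rather than relying on the default used here.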

Related Models

NeMo ASR (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Quick Stats

Latency: <150 ms

Languages: 97 supported

License: MIT

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

OpenAI

Foundation models for real-time voice and transcription

