Voxtral Realtime

Mistral · Speech-to-Text · Multilingual · Preview · Proprietary · vm-mst-003

About

Ultra-low-latency live transcription model from Mistral, optimized for real-time applications. It streams transcription results with under 150 ms of latency and supports live captioning, call center analytics, and real-time translation pipelines.

Capabilities (5)

Ultra-low latency

Live transcription

Streaming output

Real-time captioning

Pipeline integration


Key Highlights

Sub-150 ms latency, purpose-built for live captioning and analytics

Streaming-first architecture eliminates buffering delays

Integrates into real-time translation and analytics pipelines
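To make the streaming-first point concrete, here is a minimal sketch of how a client might fold streamed results into the text a caption view displays. The `TranscriptEvent` shape and its `isFinal` flag are assumptions made for illustration; they are not taken from Mistral's API or from `@arkitekton/voice`, whose streaming interface is not described on this page.

```typescript
// Hypothetical event shape (an assumption, not a documented API):
// interim events may be revised later, final events will not change.
interface TranscriptEvent {
  text: string;
  isFinal: boolean;
}

// Keeps finalized segments and the latest interim segment separately,
// so the display can update immediately without duplicating text.
class LiveTranscript {
  private committed: string[] = [];
  private interim = "";

  // Applies one event and returns the text to display.
  apply(event: TranscriptEvent): string {
    if (event.isFinal) {
      this.committed.push(event.text.trim());
      this.interim = "";
    } else {
      // A newer interim result replaces the previous one.
      this.interim = event.text.trim();
    }
    return this.render();
  }

  render(): string {
    return [...this.committed, this.interim]
      .filter((part) => part.length > 0)
      .join(" ");
  }
}

// Simulated stream of events
const live = new LiveTranscript();
const simulated: TranscriptEvent[] = [
  { text: "hello", isFinal: false },
  { text: "hello world", isFinal: false },
  { text: "Hello, world.", isFinal: true },
  { text: "how are", isFinal: false },
];
for (const event of simulated) {
  console.log(live.apply(event));
}
```

Keeping interim text out of the committed list means a revised hypothesis overwrites the previous one instead of being appended to it.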

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.
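For the live captioning use case, transcribed segments usually have to be turned into a caption format before a player can show them. The sketch below converts timed segments into WebVTT text. The `CaptionSegment` type is hypothetical: this page does not say whether the model returns per-segment timestamps, so the sketch assumes start and end times in seconds are available from somewhere.

```typescript
// Hypothetical segment shape (assumption): times are in seconds.
interface CaptionSegment {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as a WebVTT timestamp: HH:MM:SS.mmm
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Builds a WebVTT document: a "WEBVTT" header followed by one cue per segment,
// with blank lines between blocks.
function toWebVtt(segments: CaptionSegment[]): string {
  const cues = segments.map(
    (seg) => `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

console.log(
  toWebVtt([
    { start: 0, end: 1.25, text: "Welcome, everyone." },
    { start: 1.25, end: 3.5, text: "Let's get started." },
  ])
);
```

The sketch does not escape cue text, so segments containing `-->` or blank lines would need sanitizing first.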

Code Example

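// Voxtral Realtime: Speech-to-Text
import { transcribe } from "@arkitekton/voice";

// audioFile is the audio input to transcribe. It must be defined by the
// caller before this call; the accepted input type is not specified here.
const result = await transcribe({
  model: "vm-mst-003", // model ID from the page header
  vendor: "mistral",
  audio: audioFile,
  language: "en",
  options: {
    punctuate: true, // add punctuation to the transcript
    diarize: true, // identify speakers
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);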
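Once a transcript string such as `result.text` is available, the keyword spotting mentioned under Call Center Analytics can be approximated with plain string processing. This is a minimal sketch, not a feature of the model or of `@arkitekton/voice`: the `spotKeywords` helper is hypothetical, matches single words only, and its tokenizer is ASCII-only for simplicity, so it would need a Unicode-aware pattern for most of the supported languages.

```typescript
// Counts case-insensitive occurrences of single-word keywords in a transcript.
// Hypothetical helper for illustration; multi-word phrases are not handled.
function spotKeywords(transcript: string, keywords: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const keyword of keywords) {
    counts.set(keyword.toLowerCase(), 0);
  }
  // ASCII-only tokenization: runs of letters, digits, and apostrophes.
  const tokens = transcript.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  for (const token of tokens) {
    const current = counts.get(token);
    if (current !== undefined) {
      counts.set(token, current + 1);
    }
  }
  return counts;
}

const hits = spotKeywords(
  "I want a refund. Refund now, or I cancel!",
  ["refund", "cancel", "upgrade"]
);
console.log(hits.get("refund"), hits.get("cancel"), hits.get("upgrade")); // 2 1 0
```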

Related Models

NeMo ASR (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Quick Stats

Latency: <150 ms

Languages: 30 supported

License: Proprietary

Pricing: API consumption-based

Status: Preview

Vendor

Mistral

Open-weight multilingual speech models from Europe


