A state-of-the-art multilingual ASR model with 24B parameters that outperforms Whisper large-v3 on common benchmarks. Its 32K context window processes long-form audio in a single forward pass, and the model supports structured output.
Outperforms Whisper large-v3 across standard ASR benchmarks
32K-token context window handles up to about 30 minutes of audio for transcription in a single pass
Apache 2.0 license with open weights for unrestricted use
Transcribe meetings in real time with speaker identification and punctuation.
Analyze customer calls at scale with sentiment detection and keyword spotting.
Convert audio and video libraries into searchable text archives.
Provide real-time captions for broadcasts, presentations, and live events.
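As a rough illustration of the captioning use case, the sketch below turns timed transcript segments into a WebVTT caption file. The `Segment` shape (with `start`, `end`, `speaker`, and `text` fields) and the helper names `toTimestamp` and `toWebVtt` are assumptions made for this example; this page does not say whether the API returns timed segments or what fields they would carry.

```typescript
// Hypothetical segment shape, assumed for illustration only.
interface Segment {
  start: number;    // seconds from the beginning of the audio
  end: number;      // seconds from the beginning of the audio
  speaker?: string; // speaker label, if diarization provided one
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build a WebVTT document: a "WEBVTT" header followed by one cue per segment.
// When a speaker label is present it is emitted as a WebVTT voice tag.
function toWebVtt(segments: Segment[]): string {
  const cues = segments.map((seg) => {
    const line = seg.speaker ? `<v ${seg.speaker}>${seg.text}` : seg.text;
    return `${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${line}`;
  });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

Calling `toWebVtt` with an array of segments yields a string that can be saved as a `.vtt` file and attached to a video player as a caption track. For live events the same formatting would apply per segment as each one arrives.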
// Voxtral Small 24B: Speech-to-Text
import { transcribe } from "@arkitekton/voice";

const result = await transcribe({
  model: "vm-mst-001",
  vendor: "mistral",
  audio: audioFile,
  language: "en",
  options: {
    punctuate: true,
    diarize: true,
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);

Open-weight multilingual speech models from Europe