Open-source multilingual speech recognition model trained on 680,000 hours of web-scale data. Delivers robust transcription across accents, background noise, and technical jargon with automatic language detection.
Transcript will appear here in real-time as you speak…
Trained on 680K hours of multilingual web audio data
Robust to background noise, accents, and domain jargon
MIT license with sizes from 39M to 1.5B parameters
Transcribe meetings in real-time with speaker identification and punctuation.
Analyze customer calls at scale with sentiment detection and keyword spotting.
Convert audio and video libraries into searchable text archives.
Provide real-time captions for broadcasts, presentations, and live events.
// Whisper — Speech-to-Text
import { transcribe } from "@arkitekton/voice";
const result = await transcribe({
model: "vm-oai-004",
vendor: "openai",
audio: audioFile,
language: "en",
options: {
punctuate: true,
diarize: true,
smart_format: true,
},
});
console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);Foundation models for real-time voice and transcription