Google's 2B-parameter Universal Speech Model (USM), trained on 12 million hours of audio spanning 300+ languages. It achieves state-of-the-art results on low-resource language recognition and serves as the backbone for YouTube auto-captions.
Trained on 12 million hours of audio across 300+ languages
Powers YouTube automatic captioning at global scale
State-of-the-art on low-resource and endangered language benchmarks
Transcribe meetings in real-time with speaker identification and punctuation.
Analyze customer calls at scale with sentiment detection and keyword spotting.
Convert audio and video libraries into searchable text archives.
Provide real-time captions for broadcasts, presentations, and live events.
// USM (Universal Speech Model): Speech-to-Text
import { transcribe } from "@arkitekton/voice";

const result = await transcribe({
  model: "vm-ggl-006",
  vendor: "google",
  audio: audioFile,
  language: "en",
  options: {
    punctuate: true,
    diarize: true,
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);

Cloud-scale speech services with multilingual reach
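For the call-analytics use case, keyword spotting can be approximated on the client once a transcript string is available. The sketch below is illustrative only: `spotKeywords` is a hypothetical helper, not part of `@arkitekton/voice`, and it assumes the transcript is a plain English string like the `result.text` value logged above.

```typescript
// Hypothetical helper (not a library API): counts whole-word,
// case-insensitive keyword hits in a transcript string.
function spotKeywords(
  text: string,
  keywords: string[],
): Record<string, number> {
  // Lowercase, then split on anything that is not a letter, digit, or apostrophe.
  // Assumption: English-only text; other scripts would need a different tokenizer.
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((t) => t.length > 0);

  const counts: Record<string, number> = {};
  for (const kw of keywords) {
    const needle = kw.toLowerCase();
    counts[kw] = tokens.filter((t) => t === needle).length;
  }
  return counts;
}

// Example with a made-up transcript.
const sample =
  "Thanks for calling. I'd like a refund, and the refund should go to my card.";
const hits = spotKeywords(sample, ["refund", "cancel"]);
console.log(hits["refund"], hits["cancel"]); // prints: 2 0
```

Matching is exact per token, so "refunds" would not count as a hit for "refund"; stemming or phrase matching would need additional logic.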