Cloud Speech-to-Text v2

Google | Speech-to-Text | Multilingual | Custom Training | Generally Available | Proprietary | vm-ggl-004

About

Enterprise-grade automatic speech recognition with real-time streaming and batch processing. Supports phrase hints, model adaptation, multi-channel audio, and automatic punctuation.

Capabilities (5)

Real-time streaming

Model adaptation

Multi-channel audio

Phrase hints

Automatic punctuation

Key Highlights

Enterprise SLA with 99.9% uptime guarantee

Model adaptation enables domain-specific accuracy improvements

Multi-channel recognition handles separate speaker tracks
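
To put the 99.9% uptime figure in concrete terms, the sketch below converts an availability target into a downtime budget. The 30-day window and the function name are assumptions made for illustration; the listing states only the percentage, not the measurement period or how downtime is counted.

```typescript
// Allowed downtime implied by an availability target over a given window.
// Assumption: the window is 30 days; the listing does not state the SLA period.
function downtimeBudgetMinutes(availabilityPercent: number, windowDays: number): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new RangeError("availabilityPercent must be between 0 and 100");
  }
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - availabilityPercent / 100);
}

const budget = downtimeBudgetMinutes(99.9, 30);
console.log(`${budget.toFixed(1)} minutes of downtime per 30 days`); // 43.2 minutes
```

Under that assumption, 99.9% leaves roughly 43 minutes of downtime per 30 days, which is worth weighing against live-captioning workloads.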

Use Cases

Meeting Transcription

Transcribe meetings in real time, with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.

Code Example

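// Cloud Speech-to-Text v2: speech-to-text transcription
import { transcribe } from "@arkitekton/voice";

// `audioFile` is not defined in this snippet; supply your own audio input.
const result = await transcribe({
  model: "vm-ggl-004", // catalog ID for Cloud Speech-to-Text v2
  vendor: "google",
  audio: audioFile,
  language: "en",
  options: {
    punctuate: true, // automatic punctuation
    diarize: true, // label speakers
    smart_format: true,
  },
});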

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);
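
The example above logs `result.confidence` but does not act on it. One possible follow-up, sketched here, routes low-confidence transcripts to manual review. The `TranscriptResult` shape only mirrors the two fields used above; the 0 to 1 confidence scale, the 0.85 threshold, and the helper name `needsReview` are assumptions, not documented behavior of the library.

```typescript
// Minimal shape mirroring the two fields read in the example above.
interface TranscriptResult {
  text: string;
  confidence: number; // assumed to be on a 0..1 scale
}

// Hypothetical helper: flag transcripts that are empty or below a threshold.
// The default threshold of 0.85 is an arbitrary illustrative value.
function needsReview(result: TranscriptResult, threshold = 0.85): boolean {
  return result.text.trim().length === 0 || result.confidence < threshold;
}

const sample: TranscriptResult = { text: "hello world", confidence: 0.62 };
console.log("Needs review:", needsReview(sample)); // true
```

A suitable threshold depends on the audio quality and domain, so it is best tuned against a labeled sample rather than taken from this sketch.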

Related Models

NeMo ASR

NVIDIA

NeMo TTS

NVIDIA

Riva

NVIDIA

Parakeet

NVIDIA

gpt-4o-realtime

OpenAI

gpt-4o-mini-realtime

OpenAI

Quick Stats

Latency: <300 ms streaming

Languages: 125 supported

License: Proprietary

Pricing: $0.016 / minute

Status: Generally Available
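
As a rough budgeting aid, the listed price can be turned into a cost estimate. The sketch below assumes billing is strictly linear in audio duration at the listed $0.016 per minute; rounding increments, free tiers, and volume discounts are not stated in this listing and are ignored. The function and constant names are hypothetical.

```typescript
// Listed rate from Quick Stats, in USD per minute of audio.
const PRICE_PER_MINUTE_USD = 0.016;

// Estimate cost assuming purely linear per-minute billing (an assumption;
// actual billing increments and discounts are not given in the listing).
function estimateCostUsd(audioSeconds: number, pricePerMinuteUsd: number = PRICE_PER_MINUTE_USD): number {
  if (audioSeconds < 0) {
    throw new RangeError("audioSeconds must be non-negative");
  }
  return (audioSeconds / 60) * pricePerMinuteUsd;
}

// 100 hours of call recordings = 6,000 minutes of audio.
const cost = estimateCostUsd(100 * 3600);
console.log(`Estimated cost: $${cost.toFixed(2)}`); // $96.00
```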

Vendor

Google

Cloud-scale speech services with multilingual reach
