Universal-2

AssemblyAI · Speech-to-Text · Multilingual · Generally Available · Proprietary · vm-aai-001

About

Multilingual ASR model supporting 100+ languages with high accuracy and low latency. Features automatic language detection, code-switching support, speaker diarization, and real-time streaming with 300-600ms processing latency.

Capabilities (5)

100+ languages

Code-switching support

Speaker diarization

Real-time streaming

Auto language detection


Key Highlights

100+ language support with automatic detection and code-switching

300-600ms processing latency suitable for real-time applications

One of the highest-accuracy multilingual ASR models available

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.

Code Example

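// Universal-2: Speech-to-Text
import { transcribe } from "@arkitekton/voice";

// audioFile is assumed to be defined elsewhere (the audio input to transcribe).
const result = await transcribe({
  model: "vm-aai-001",   // Universal-2 model ID from this page
  vendor: "assemblyai",
  audio: audioFile,
  language: "en",        // language code of the audio
  options: {
    punctuate: true,     // add punctuation to the transcript
    diarize: true,       // enable speaker diarization
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);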
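When diarization is enabled, a common next step is to render the transcript as speaker turns. The sketch below is illustrative only: the `Utterance` shape and the `formatBySpeaker` helper are hypothetical and are not part of the SDK shown above, whose actual response fields for diarized output are not documented on this page.

```typescript
// Hypothetical shape for one diarized segment; the real response may differ.
interface Utterance {
  speaker: string; // speaker label, e.g. "A" or "B"
  text: string;    // words spoken in this segment
}

// Merge consecutive segments from the same speaker and render one line per turn.
function formatBySpeaker(utterances: Utterance[]): string {
  const turns: Utterance[] = [];
  for (const u of utterances) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === u.speaker) {
      // Same speaker keeps talking: extend the current turn.
      last.text += " " + u.text;
    } else {
      // New speaker: start a new turn (copy so the input is not mutated).
      turns.push({ ...u });
    }
  }
  return turns.map((t) => `Speaker ${t.speaker}: ${t.text}`).join("\n");
}

// Made-up sample data for illustration.
const sample: Utterance[] = [
  { speaker: "A", text: "Hi" },
  { speaker: "A", text: "there" },
  { speaker: "B", text: "Hello" },
];
console.log(formatBySpeaker(sample));
```

Merging adjacent segments keeps meeting notes readable when the recognizer splits one speaker's sentence into several pieces.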

Related Models

NeMo ASR (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Quick Stats

Latency: 300-600ms

Languages: 100+ supported

License: Proprietary

Pricing: $0.01 / minute

Status: Generally Available
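As a rough budgeting aid, the listed price can be turned into a cost estimate for a given amount of audio. This is a minimal sketch that assumes billing is prorated linearly by duration at the $0.01 per minute rate listed above; actual billing granularity and current pricing are not stated on this page, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Price taken from the Quick Stats on this page; verify before relying on it.
const PRICE_PER_MINUTE_USD = 0.01;

// Hypothetical helper: estimated transcription cost for an audio duration.
// Assumes linear, prorated per-minute billing (an assumption, not a documented rule).
function estimateCostUsd(
  durationSeconds: number,
  pricePerMinute: number = PRICE_PER_MINUTE_USD,
): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError("durationSeconds must be a non-negative finite number");
  }
  const minutes = durationSeconds / 60;
  // Round to 4 decimal places to suppress floating-point noise.
  return Math.round(minutes * pricePerMinute * 10_000) / 10_000;
}

const oneHour = estimateCostUsd(60 * 60);
console.log(`1 hour of audio: $${oneHour.toFixed(2)}`); // prints "1 hour of audio: $0.60"
```

Under the same assumption, 1,000 hours of call recordings would come to about $600.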

Vendor

AssemblyAI

Speech intelligence APIs with LLM-powered understanding

