Parakeet

NVIDIA | Speech-to-Text | Edge / On-Device | Generally Available | Apache 2.0 | vm-nv-005

About

Parakeet is an end-to-end, CTC-based automatic speech recognition (ASR) model family optimized for high accuracy and a low memory footprint. It is available in multiple sizes, from 0.6B to 1.8B parameters, and reports state-of-the-art word error rates (WER) on common benchmarks such as LibriSpeech.
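
To make "CTC-based" concrete, here is a minimal sketch of greedy CTC decoding: take the highest-scoring token for each audio frame, merge consecutive repeats, and drop the blank token. The three-token vocabulary, the blank index, and the function names are illustrative assumptions; they are not Parakeet's actual tokenizer or the NeMo decoding API.

```typescript
// Greedy CTC decoding sketch. All names and the toy vocabulary are hypothetical.

// Return the index of the largest value in a non-empty array.
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

function ctcGreedyDecode(
  logits: number[][], // one row of per-token scores for each audio frame
  vocab: string[],    // token strings; indices match the logits columns
  blankId: number,    // index of the CTC blank token
): string {
  const out: string[] = [];
  let prev = -1;
  for (const frame of logits) {
    const id = argmax(frame);
    // Emit a token only when it differs from the previous frame and is not blank.
    if (id !== prev && id !== blankId) out.push(vocab[id]);
    prev = id;
  }
  return out.join("");
}

// Toy input: blank at index 0, then "h" and "i".
const toyVocab = ["_", "h", "i"];
const toyFrames = [
  [0.1, 0.8, 0.1],   // h
  [0.1, 0.7, 0.2],   // h (repeat, merged)
  [0.9, 0.05, 0.05], // blank
  [0.1, 0.1, 0.8],   // i
  [0.2, 0.1, 0.7],   // i (repeat, merged)
];
console.log(ctcGreedyDecode(toyFrames, toyVocab, 0)); // prints "hi"
```

Because a blank between two identical tokens resets the "previous token" state, the same scheme can still emit doubled letters when a blank frame separates them. Production decoders often add beam search or a language model on top of this step.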

Capabilities (5)

CTC-based decoding

Multiple model sizes

Low memory footprint

Benchmark-leading WER

ONNX export support


Key Highlights

State-of-the-art English WER across LibriSpeech and other benchmarks

Compact CTC architecture enables efficient edge deployment

Open-source Apache 2.0 with NeMo toolkit integration
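
For readers unfamiliar with the WER metric cited above, the following sketch computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the sample sentences are illustrative assumptions. Benchmark scoring typically also normalizes case and punctuation first, which this sketch does not do.

```typescript
// Word error rate sketch: word-level Levenshtein distance / reference length.
// `wordErrorRate` is a hypothetical helper, not part of any Parakeet tooling.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the distance between the reference words processed so far
  // (excluding the current one) and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1,        // deletion: reference word missing from hypothesis
          curr[j - 1] + 1,    // insertion: extra word in hypothesis
          prev[j - 1] + cost, // match or substitution
        ),
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One deleted word out of six reference words.
const wer = wordErrorRate("the cat sat on the mat", "the cat sat on mat");
console.log(wer.toFixed(3));
```

A WER of 0 means the hypothesis matches the reference word for word. Values above 1 are possible when the hypothesis contains many inserted words.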

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.

Code Example

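// Parakeet: Speech-to-Text
import { readFile } from "node:fs/promises";
import { transcribe } from "@arkitekton/voice";

// Assumption: `audio` accepts a Buffer; "meeting.wav" is a placeholder path.
const audioFile = await readFile("meeting.wav");

const result = await transcribe({
  model: "vm-nv-005", // catalog ID for Parakeet
  vendor: "nvidia",
  audio: audioFile,
  language: "en", // English is the only supported language
  options: {
    punctuate: true,    // add punctuation to the transcript
    diarize: true,      // label speakers
    smart_format: true, // format numbers, dates, and similar entities
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);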

Related Models

PersonaPlex 7B (NVIDIA)

NeMo ASR (NVIDIA)

NeMo TTS (NVIDIA)

Riva (NVIDIA)

ACE (Avatar Cloud Engine) (NVIDIA)

gpt-4o-mini-transcribe (OpenAI)

Quick Stats

Latency: <100 ms

Languages: 1 supported

License: Apache 2.0

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

NVIDIA

GPU-accelerated speech AI and conversational frameworks

