Riva

NVIDIA | Speech-to-Text | Text-to-Speech | Multilingual | Generally Available | NVIDIA EULA | vm-nv-004

About

Enterprise streaming speech AI platform deployable on-prem or in the cloud. Combines ASR, TTS, and NLU in a single gRPC-based service with GPU-optimized inference via TensorRT and Triton Inference Server.

Capabilities (5)

Streaming ASR + TTS

On-prem deployment

TensorRT optimization

Custom language models

gRPC & WebSocket APIs


Key Highlights

Full on-prem deployment for data-sovereign environments

TensorRT and Triton integration delivers end-to-end latency under 150 ms

Unified ASR + TTS + NLU service behind a single API
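
One way to check the sub-150 ms latency highlight in your own deployment is to collect end-to-end timings and summarize them against that budget. The sketch below is a minimal, assumption-laden helper: the function names `percentile` and `summarizeLatency` are hypothetical, the sample values are made up for illustration and are not Riva measurements, and the choice of p95 as the pass/fail criterion is an assumption rather than anything the vendor specifies.

```javascript
// Nearest-rank percentile on an array already sorted in ascending order.
function percentile(sortedMs, p) {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

// Summarize latency samples (milliseconds) against a budget.
// The 150 ms default mirrors the figure quoted on this page.
function summarizeLatency(samplesMs, budgetMs = 150) {
  if (samplesMs.length === 0) {
    throw new Error("need at least one latency sample");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const p50 = percentile(sorted, 50);
  const p95 = percentile(sorted, 95);
  // Assumption: the budget is judged on p95, not on the mean or the maximum.
  return { p50, p95, withinBudget: p95 < budgetMs };
}

// Hypothetical measurements in milliseconds, not real Riva results.
const summary = summarizeLatency([92, 110, 104, 131, 98, 147, 120, 101, 95, 118]);
console.log(summary);
```

In practice the samples would come from timing real requests (for example with `performance.now()` around each call), measured from the client so that network time is included.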

Use Cases

Audiobook Narration

Generate natural-sounding narration for long-form content with consistent voice quality.

Notification Systems

Deliver voice alerts and notifications with expressive, human-like speech synthesis.

Multilingual Content

Produce audio content in multiple languages from a single text source.

Real-Time Voice Chat

Power low-latency voice responses in interactive applications and games.
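
For long-form narration and low-latency voice responses alike, a common pattern is to split text on sentence boundaries and synthesize it chunk by chunk, so playback can start before the whole text is processed. The sketch below shows only the splitting step. The helper names and the 200-character default are assumptions for illustration; this page does not state any input-length limit for Riva.

```javascript
// Split text after ., !, or ? when followed by whitespace.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Group sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept whole rather than cut mid-word.
function chunkForSynthesis(text, maxChars = 200) {
  const chunks = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkForSynthesis("One. Two! Three? Four.", 10));
```

Each chunk could then be passed as the `input` of a separate synthesis request, with the resulting audio played or concatenated in order.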

Code Example

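// Riva: Text-to-Speech
import { synthesize } from "@arkitekton/voice";

// Request synthesis; "vm-nv-004" is this page's model ID for Riva.
const audio = await synthesize({
  model: "vm-nv-004",
  vendor: "nvidia",
  input: "Hello, welcome to Arkitekton.",
  voice: "alloy",
  response_format: "mp3",
  speed: 1.0,
});

// Play the audio in the browser ("audio/mpeg" is the registered MIME type for MP3)
const blob = new Blob([audio], { type: "audio/mpeg" });
const url = URL.createObjectURL(blob);
const player = new Audio(url);
// Release the object URL once playback has finished
player.addEventListener("ended", () => URL.revokeObjectURL(url));
// play() returns a promise; awaiting it surfaces autoplay or decoding errors
await player.play();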
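
When the same request shape is reused across many calls, it can help to centralize the defaults and reject obviously bad input before anything is sent. The sketch below is one possible helper, not part of `@arkitekton/voice`: `buildTtsRequest` and `DEFAULT_REQUEST` are hypothetical names, the default values are copied from the example above, and the validation rules (non-empty input, positive speed) are assumptions rather than documented constraints.

```javascript
// Defaults copied from the example above.
const DEFAULT_REQUEST = Object.freeze({
  model: "vm-nv-004",
  vendor: "nvidia",
  voice: "alloy",
  response_format: "mp3",
  speed: 1.0,
});

// Build a request object, letting callers override any default field.
function buildTtsRequest(input, overrides = {}) {
  if (typeof input !== "string" || input.trim() === "") {
    throw new TypeError("input must be a non-empty string");
  }
  const request = { ...DEFAULT_REQUEST, ...overrides, input };
  // Assumed rule: speed must be a positive number.
  if (!(request.speed > 0)) {
    throw new RangeError("speed must be a positive number");
  }
  return request;
}

const request = buildTtsRequest("Hello, welcome to Arkitekton.", { speed: 1.25 });
console.log(request);
```

The returned object has the same fields as the argument to `synthesize` in the example, so it could be passed to that call directly.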

Related Models

PersonaPlex 7B

NVIDIA

NeMo ASR

NVIDIA

NeMo TTS

NVIDIA

Parakeet

NVIDIA

ACE (Avatar Cloud Engine)

NVIDIA

gpt-4o-realtime

OpenAI

Quick Stats

Latency: <150 ms end-to-end

Languages: 20 supported

License: NVIDIA EULA

Pricing: Enterprise license

Status: Generally Available

Vendor

NVIDIA

GPU-accelerated speech AI and conversational frameworks

