Parler-TTS

Hugging Face / Open Source | Text-to-Speech | Custom Training | Generally Available | Apache 2.0 | vm-hf-003

About

Controllable text-to-speech model trained and released by Hugging Face. Generates speech conditioned on natural language descriptions of the desired voice characteristics, enabling prompt-based voice control without fine-tuning.

Capabilities (5)

Natural language voice control

Description-conditioned generation

Custom training recipes

Reproducible research

HF Transformers integration

Key Highlights

Control voice with natural language descriptions like 'a calm female narrator'

Open training recipes enable reproduction and customization

Integrates natively with the Hugging Face Transformers ecosystem
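
Because the voice is steered by a free-text description rather than a fixed voice ID, it can help to assemble that description from a few structured traits. The following is a minimal sketch under stated assumptions: `VoiceTraits` and `describeVoice` are hypothetical names invented for illustration and are not part of Parler-TTS or any SDK, and the sentence template is only one plausible phrasing.

```typescript
// Hypothetical helper (an assumption for illustration, not a library API):
// turns a few structured traits into a natural-language voice description.
interface VoiceTraits {
  tone?: string;      // e.g. "calm"
  gender?: string;    // e.g. "female"
  role?: string;      // e.g. "narrator"; falls back to "speaker"
  pace?: string;      // e.g. "moderate"
  recording?: string; // e.g. "very clear"
}

function describeVoice(traits: VoiceTraits): string {
  // Join the traits that were provided into a noun phrase.
  const subject = [traits.tone, traits.gender, traits.role ?? "speaker"]
    .filter((part): part is string => Boolean(part))
    .join(" ");

  // Rough article choice based on the first letter only.
  const article = /^[aeiou]/i.test(subject) ? "An" : "A";

  const pace = traits.pace ? ` at a ${traits.pace} pace` : "";
  const sentences = [`${article} ${subject} reads the text${pace}.`];
  if (traits.recording) {
    sentences.push(`The recording is ${traits.recording}.`);
  }
  return sentences.join(" ");
}

const calmNarrator = describeVoice({
  tone: "calm",
  gender: "female",
  role: "narrator",
  pace: "moderate",
  recording: "very clear",
});
console.log(calmNarrator);
// A calm female narrator reads the text at a moderate pace. The recording is very clear.
```

Keeping the traits structured makes it easy to store a voice once and regenerate the same description for every request.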

Use Cases

Audiobook Narration

Generate natural-sounding narration for long-form content with consistent voice quality.

Notification Systems

Deliver voice alerts and notifications with expressive, human-like speech synthesis.

Multilingual Content

Produce audio content in multiple languages from a single text source.

Real-Time Voice Chat

Power low-latency voice responses in interactive applications and games.
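
For the audiobook use case above, one common approach is to split long text into sentence-bounded chunks and send each chunk with the same voice description, so the voice stays consistent across requests. The sketch below assumes this approach; `chunkForNarration` is a hypothetical helper, the 400-character budget is an arbitrary choice, and the `{ input, description }` job shape is illustrative rather than a documented request format.

```typescript
// Hypothetical helper (assumption): groups whole sentences into chunks of
// at most maxChars characters. A single sentence longer than the budget is
// kept whole rather than cut mid-sentence.
function chunkForNarration(text: string, maxChars = 400): string[] {
  // Match runs of text ending in optional sentence punctuation.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || current === "") {
      current = candidate; // fits, or is the first sentence of a new chunk
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Reuse one description for every chunk to keep the voice consistent.
const description = "A calm female narrator reads the text at a moderate pace.";
const chapter = "It was late. The house was quiet. Nobody had come home yet.";
const jobs = chunkForNarration(chapter, 35).map((input) => ({ input, description }));
console.log(jobs.map((job) => job.input));
```

Each job can then be synthesized in order and the resulting audio segments concatenated.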

Code Example

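// Parler-TTS: Text-to-Speech
import { synthesize } from "@arkitekton/voice";

// Request MP3 audio from the model identified by its catalog ID.
const audio = await synthesize({
  model: "vm-hf-003",
  vendor: "huggingface",
  input: "Hello, welcome to Arkitekton.",
  voice: "alloy",
  response_format: "mp3",
  speed: 1.0,
});

// Play the audio in the browser ("audio/mpeg" is the registered MIME type for MP3).
const blob = new Blob([audio], { type: "audio/mpeg" });
const url = URL.createObjectURL(blob);
const player = new Audio(url);
// play() returns a promise that rejects if playback is blocked.
await player.play();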

Related Models

PersonaPlex 7B

NVIDIA

NeMo ASR

NVIDIA

NeMo TTS

NVIDIA

Riva

NVIDIA

ACE (Avatar Cloud Engine)

NVIDIA

OpenAI TTS

OpenAI

Quick Stats

Languages: 1 supported

License: Apache 2.0

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

Hugging Face / Open Source

Community-driven open-source speech models and toolkits

