MeloTTS

Hugging Face / Open Source | Text-to-Speech | Multilingual | Edge / On-Device | Generally Available | MIT | vm-hf-006

About

MeloTTS is a high-quality multilingual text-to-speech model from MyShell, optimized for fast inference on CPU. It supports English, Chinese, Japanese, Korean, and more, with a real-time factor well below 1.0 on standard hardware (that is, audio is generated faster than it plays back).
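The real-time factor mentioned above is conventionally the time spent synthesizing divided by the duration of the audio produced. The sketch below shows that arithmetic only; the function name `realTimeFactor` and the sample timings are illustrative assumptions, not measurements of MeloTTS or part of any Arkitekton API.

```typescript
// Real-time factor (RTF) = synthesis time / duration of the generated audio.
// An RTF below 1.0 means the audio is produced faster than it plays back.
// `realTimeFactor` is a hypothetical helper written for this illustration.
function realTimeFactor(synthesisSeconds: number, audioSeconds: number): number {
  if (audioSeconds <= 0) {
    throw new RangeError("audioSeconds must be positive");
  }
  return synthesisSeconds / audioSeconds;
}

// Made-up timings: 0.5 s of compute for a 2 s clip.
const rtf = realTimeFactor(0.5, 2.0);
console.log(rtf < 1.0 ? "faster than real time" : "slower than real time");
```

In practice the two inputs would come from timing the synthesis call and reading the duration of the returned clip.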

Capabilities (5)

CPU-optimized inference

Real-time factor below 1.0

Multiple languages

Speaker style control

Lightweight deployment


Key Highlights

Runs faster than real-time on standard CPU hardware

Lightweight enough for edge and embedded deployment

MIT license with active community maintenance

Use Cases

Audiobook Narration

Generate natural-sounding narration for long-form content with consistent voice quality.

Notification Systems

Deliver voice alerts and notifications with expressive, human-like speech synthesis.

Multilingual Content

Produce audio content in multiple languages from a single text source.

Real-Time Voice Chat

Power low-latency voice responses in interactive applications and games.
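For long-form use cases such as audiobook narration, a common preparation step is to split the text into sentence-bounded chunks and synthesize them one at a time. The sketch below is a naive splitter under that assumption; `chunkText` and `maxChars` are hypothetical names, and whether MeloTTS or the hosting API needs chunking at all is not stated in this listing.

```typescript
// Split text into chunks, breaking only at sentence boundaries.
// A chunk grows until adding the next sentence would exceed maxChars.
// A single sentence longer than maxChars is kept whole as its own chunk.
// Naive by design: it also splits on periods inside numbers or abbreviations.
function chunkText(text: string, maxChars: number): string[] {
  // A sentence is a run of non-terminator characters plus any trailing ., ! or ?
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const chunks = chunkText("One. Two! Three?", 10);
console.log(chunks);
```

Each chunk can then be passed to the synthesis call in order and the resulting clips played or concatenated in sequence.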

Code Example

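// MeloTTS: text-to-speech
import { synthesize } from "@arkitekton/voice";

// Request synthesized speech (top-level await requires an ES module).
const audio = await synthesize({
  model: "vm-hf-006", // catalog ID for MeloTTS
  vendor: "huggingface",
  input: "Hello, welcome to Arkitekton.",
  voice: "alloy",
  response_format: "mp3",
  speed: 1.0,
});

// Wrap the returned audio data in a Blob and play it in the browser.
// "audio/mpeg" is the registered MIME type for MP3.
const blob = new Blob([audio], { type: "audio/mpeg" });
const url = URL.createObjectURL(blob);
const player = new Audio(url);
// Release the object URL once playback finishes.
player.addEventListener("ended", () => URL.revokeObjectURL(url));
// play() returns a promise that rejects if the browser blocks autoplay.
await player.play();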

Related Models

PersonaPlex 7B

NVIDIA

NeMo ASR

NVIDIA

NeMo TTS

NVIDIA

Riva

NVIDIA

Parakeet

NVIDIA

ACE (Avatar Cloud Engine)

NVIDIA

Quick Stats

Latency: <100 ms on CPU

Languages: 8 supported

License: MIT

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

Hugging Face / Open Source

Community-driven open-source speech models and toolkits

