Azure Speech Services

Microsoft | Speech-to-Text | Text-to-Speech | Multilingual | Custom Training | Generally Available | Proprietary | vm-ms-002

About

Comprehensive cloud speech platform offering 600+ neural voices across 150+ locales with Neural HD and DragonFly quality tiers. Includes real-time transcription, Custom Neural Voice training, pronunciation assessment, and avatar synthesis.

Capabilities (5)

600+ neural voices

Custom Neural Voice

Pronunciation assessment

Real-time transcription

Avatar synthesis


Key Highlights

600+ voices across 150+ locales, among the broadest voice catalogs available

Custom Neural Voice trains brand voices from professional recordings

Pronunciation assessment API for language learning applications

Use Cases

Audiobook Narration

Generate natural-sounding narration for long-form content with consistent voice quality.
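Long-form narration is usually synthesized in pieces rather than in one request. The sketch below shows one way to split a manuscript into sentence-aligned chunks before sending each to the synthesizer. The function name `chunkText` and the 3000-character default are illustrative assumptions, not documented limits of the service.

```typescript
// Split long-form text into sentence-aligned chunks of at most maxChars characters.
// The default of 3000 is an assumed, illustrative budget; check the real request limits.
function chunkText(text: string, maxChars: number = 3000): string[] {
  // A "sentence" here is a run of text followed by any ., ! or ? characters.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      // A single sentence longer than maxChars is kept whole rather than cut mid-word.
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Using the same voice and settings for every chunk and concatenating the resulting audio in order is one way to keep voice quality consistent across a book.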

Notification Systems

Deliver voice alerts and notifications with expressive, human-like speech synthesis.

Multilingual Content

Produce audio content in multiple languages from a single text source.
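As a rough illustration of the multilingual workflow, the sketch below builds one SSML document per locale from already-translated text. It assumes translation happens upstream. The locale-to-voice table is a hypothetical mapping, so its voice names should be checked against the current Azure voice list. The helper names `escapeXml` and `buildSsml` are invented for this example.

```typescript
// Hypothetical locale-to-voice table; verify each name against the Azure voice list.
const VOICE_BY_LOCALE: Record<string, string> = {
  "en-US": "en-US-JennyNeural",
  "fr-FR": "fr-FR-DeniseNeural",
  "de-DE": "de-DE-KatjaNeural",
};

// Escape characters that are not allowed verbatim in XML text or attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap translated text in a minimal SSML document for the given locale.
function buildSsml(locale: string, text: string): string {
  const voice = VOICE_BY_LOCALE[locale];
  if (voice === undefined) {
    throw new Error(`No voice configured for locale ${locale}`);
  }
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${locale}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice></speak>`
  );
}
```

Calling `buildSsml` once per translated string yields one request body per language, each of which can be synthesized independently.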

Real-Time Voice Chat

Power low-latency voice responses in interactive applications and games.

Code Example

// Azure Speech Services — Text-to-Speech
import { synthesize } from "@arkitekton/voice";

const audio = await synthesize({
  model: "vm-ms-002",
  vendor: "microsoft",
  input: "Hello, welcome to Arkitekton.",
  voice: "en-US-JennyNeural", // Azure voice name; assumes the wrapper passes it through
  response_format: "mp3",
  speed: 1.0,
});

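// Play the audio in the browser ("audio/mpeg" is the registered MIME type for MP3)
const blob = new Blob([audio], { type: "audio/mpeg" });
const url = URL.createObjectURL(blob);
const player = new Audio(url);
// Release the object URL once playback has finished
player.addEventListener("ended", () => URL.revokeObjectURL(url));
await player.play();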

Related Models

PersonaPlex 7B

NVIDIA

NeMo ASR

NVIDIA

NeMo TTS

NVIDIA

Riva

NVIDIA

Parakeet

NVIDIA

ACE (Avatar Cloud Engine)

NVIDIA

Quick Stats

Latency: <200ms streaming

Languages: 150 supported

License: Proprietary

Pricing: From $1.00 / 1M characters (Neural)

Status: Generally Available
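
To make the per-character pricing concrete, here is a minimal cost estimate. It takes the "from" rate listed in Quick Stats as its default. Actual billing depends on voice tier, region, and contract, so the rate is a parameter, and the function name `estimateCostUsd` is an assumption for this sketch.

```typescript
// Default rate copied from the Quick Stats entry; treat it as a lower bound, not a quote.
const NEURAL_USD_PER_MILLION_CHARS = 1.0;

// Estimate synthesis cost in USD for a given number of input characters.
function estimateCostUsd(
  characterCount: number,
  usdPerMillion: number = NEURAL_USD_PER_MILLION_CHARS
): number {
  if (!Number.isFinite(characterCount) || characterCount < 0) {
    throw new RangeError("characterCount must be a non-negative finite number");
  }
  return (characterCount / 1e6) * usdPerMillion;
}
```

At the listed rate, a 2,500,000-character manuscript would come to an estimated $2.50.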

Vendor

Microsoft

Enterprise speech services across Azure and research labs

