Cloud text-to-speech service offering standard, neural, and generative engine tiers. Provides SSML control, lexicon management, speech marks for lip sync, and newscaster and conversational speaking styles.
Three engine tiers let you balance cost, quality, and expressiveness
Newscaster and conversational speaking styles for media and IVR
Speech marks output enables real-time lip sync in avatar applications
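To make the lip-sync feature concrete, here is a minimal sketch of how a client might consume speech marks. It assumes the marks arrive as newline-delimited JSON objects with `time` (milliseconds), `type`, and `value` fields, which is the shape Amazon Polly documents. The helper names `parseSpeechMarks` and `visemeAt` and the sample data are hypothetical and not part of any SDK.

```typescript
// One speech mark, as emitted in Amazon Polly's newline-delimited JSON output.
interface SpeechMark {
  time: number; // offset from the start of the audio, in milliseconds
  type: "sentence" | "word" | "viseme" | "ssml";
  value: string;
  start?: number; // byte offsets into the input text (absent on viseme marks)
  end?: number;
}

// Parse the raw speech-mark stream: one JSON object per non-empty line.
function parseSpeechMarks(raw: string): SpeechMark[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SpeechMark);
}

// Return the viseme to display at `timeMs`: the last viseme whose timestamp
// is not after the playback position. Assumes marks are sorted by time.
function visemeAt(marks: SpeechMark[], timeMs: number): string | undefined {
  let current: string | undefined;
  for (const mark of marks) {
    if (mark.type !== "viseme") continue;
    if (mark.time > timeMs) break;
    current = mark.value;
  }
  return current;
}

// Hypothetical sample data, for illustration only.
const sample = [
  '{"time":0,"type":"word","start":0,"end":5,"value":"Hello"}',
  '{"time":0,"type":"viseme","value":"k"}',
  '{"time":120,"type":"viseme","value":"E"}',
  '{"time":240,"type":"viseme","value":"t"}',
  '{"time":360,"type":"viseme","value":"sil"}',
].join("\n");

const marks = parseSpeechMarks(sample);
console.log(visemeAt(marks, 150));
```

In a browser, an avatar loop could call `visemeAt(marks, player.currentTime * 1000)` on each animation frame and map the returned viseme to a mouth shape.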
Generate natural-sounding narration for long-form content with consistent voice quality.
Deliver voice alerts and notifications with expressive, human-like speech synthesis.
Produce audio content in multiple languages from a single text source.
Power low-latency voice responses in interactive applications and games.
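The newscaster and conversational styles mentioned above are selected in Amazon Polly through SSML, using the `<amazon:domain>` tag. The sketch below builds such a document from plain text. The helper names `escapeXml` and `buildStyledSsml` are hypothetical, style availability depends on the chosen voice and engine, and whether the `synthesize` call in the sample that follows accepts SSML input is not stated here.

```typescript
// Polly's SSML domain names for the two speaking styles.
type SpeakingStyle = "news" | "conversational";

// Escape the five characters reserved in XML so arbitrary text can be
// embedded safely inside an SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap plain text in a <speak> document that requests the given style.
function buildStyledSsml(text: string, style: SpeakingStyle): string {
  return `<speak><amazon:domain name="${style}">${escapeXml(text)}</amazon:domain></speak>`;
}

console.log(buildStyledSsml("Markets rose today, led by R&D spending.", "news"));
```

Escaping the text first keeps stray `&` or `<` characters in user-supplied copy from producing invalid SSML.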
// Amazon Polly — Text-to-Speech
import { synthesize } from "@arkitekton/voice";
const audio = await synthesize({
  model: "vm-amz-002",
  vendor: "amazon",
  input: "Hello, welcome to Arkitekton.",
  voice: "alloy",
  response_format: "mp3",
  speed: 1.0,
});
// Play the audio
const blob = new Blob([audio], { type: "audio/mpeg" });
const url = URL.createObjectURL(blob);
const player = new Audio(url);
player.play();
Cloud-native speech services within the AWS ecosystem