Faster Whisper

Hugging Face / Open Source | Speech-to-Text | Multilingual | Edge / On-Device | Generally Available | MIT | vm-hf-007

About

CTranslate2-based reimplementation of Whisper that runs up to 4x faster with comparable accuracy. Uses INT8 quantization and optimized compute kernels for efficient inference on CPU and GPU with reduced memory footprint.
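To make the memory claim concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization. It is an illustration only, not CTranslate2's actual implementation: the names `QuantizedTensor`, `quantizeInt8`, and `dequantizeInt8` are hypothetical, and real engines typically use finer-grained scales and fused kernels. The point it shows is that each 32-bit float weight is stored as a single byte plus a shared scale factor, at the cost of a small rounding error.

```typescript
// Illustrative only: symmetric per-tensor INT8 quantization.
// All names here are hypothetical and not part of any Faster Whisper API.
interface QuantizedTensor {
  values: Int8Array; // one byte per weight
  scale: number;     // multiply by this to recover approximate floats
}

function quantizeInt8(weights: Float32Array): QuantizedTensor {
  // Find the largest magnitude so the full range maps onto [-127, 127].
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;

  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

function dequantizeInt8(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}

const weights = new Float32Array([0.5, -1.0, 0.25, 0.0]);
const quantized = quantizeInt8(weights);
const restored = dequantizeInt8(quantized);

// Storage drops from 4 bytes per weight to 1 byte per weight.
console.log("float32 bytes:", weights.byteLength);
console.log("int8 bytes:", quantized.values.byteLength);
console.log("restored:", Array.from(restored));
```

The restored values differ from the originals by at most about half a quantization step, which is the trade-off behind the "comparable accuracy" wording above.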

Capabilities (5)

Up to 4x faster than Whisper

INT8 quantization

Reduced memory usage

CPU + GPU support

Whisper-compatible output

Key Highlights

Up to 4x faster than the original Whisper, with INT8 quantization

Significantly reduced memory footprint enables larger batch sizes

Drop-in replacement producing identical output format

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.
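For the captioning and indexing use cases, timestamped transcript segments usually need to be turned into a caption format. The sketch below converts segments to WebVTT. The `Segment` shape (start and end in seconds, plus text) is an assumption modeled on typical Whisper-style output, and `formatTimestamp` and `toWebVtt` are hypothetical helper names, not part of any SDK shown on this page.

```typescript
// Illustrative only: convert timestamped segments to a WebVTT caption file.
// The Segment shape is an assumption about what a transcription returns.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as hh:mm:ss.mmm, the timestamp form WebVTT cues use.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the file: a WEBVTT header, then one cue per segment,
// with blank lines separating the blocks.
function toWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) =>
      `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.text.trim()}`
  );
  return ["WEBVTT"].concat(cues).join("\n\n") + "\n";
}

const demoSegments: Segment[] = [
  { start: 0, end: 1.25, text: " Hello and welcome." },
  { start: 1.25, end: 3.5, text: " Let's get started." },
];
console.log(toWebVtt(demoSegments));
```

For live captioning, the same formatter could be applied to each segment as it arrives instead of to the full list at the end.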

Code Example

// Faster Whisper — Speech-to-Text
import { transcribe } from "@arkitekton/voice";

const result = await transcribe({
  model: "vm-hf-007",
  vendor: "huggingface",
  audio: audioFile, // audioFile is assumed to be loaded earlier; it is not defined in this snippet
  language: "en",
  options: {
    punctuate: true,
    diarize: true,
    smart_format: true,
  },
});

console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);

Related Models

NeMo ASR (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

gpt-4o-realtime (OpenAI)

gpt-4o-mini-realtime (OpenAI)

gpt-4o-mini-transcribe (OpenAI)

Quick Stats

Latency: <100ms (large-v3 on GPU)

Languages: 97 supported

License: MIT

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

Hugging Face / Open Source

Community-driven open-source speech models and toolkits
