FastConformer

NVIDIA | Speech-to-Text | Edge / On-Device | Custom Training | Generally Available | Apache 2.0 | vm-nv-007

About

A highly optimized Conformer variant that achieves competitive word error rate (WER) with an 8x inference speedup over standard Conformer models. It combines convolutional subsampling with multi-head attention for efficient streaming and offline automatic speech recognition (ASR).
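To give a feel for why convolutional subsampling matters for attention cost, here is a rough arithmetic sketch. It assumes a 10 ms feature hop and compares 4x and 8x subsampling factors; these numbers are illustrative assumptions, not values stated on this page, and the sketch does not by itself account for the quoted speedup.

```typescript
// Illustrative arithmetic only; the hop size and subsampling factors are assumptions.
const FRAME_HOP_MS = 10; // assumed feature hop (10 ms is a common ASR default)

/** Number of encoder time steps for a clip after convolutional subsampling. */
function encoderFrames(durationSec: number, subsampling: number): number {
  const featureFrames = Math.ceil((durationSec * 1000) / FRAME_HOP_MS);
  return Math.ceil(featureFrames / subsampling);
}

/** Query-key pairs scored by full self-attention over n time steps. */
function attentionPairs(n: number): number {
  return n * n;
}

const clipSec = 30;
const frames4x = encoderFrames(clipSec, 4);
const frames8x = encoderFrames(clipSec, 8);
const pairRatio = attentionPairs(frames4x) / attentionPairs(frames8x);

console.log(`4x subsampling: ${frames4x} steps, 8x subsampling: ${frames8x} steps`);
console.log(`Attention pairs shrink by a factor of ${pairRatio}`);
```

Because full self-attention scales with the square of the sequence length, halving the number of encoder steps cuts the attention pair count to a quarter under these assumptions.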

Capabilities (5)

8x inference speedup

Streaming + offline modes

CTC and RNN-T heads

ONNX and TensorRT export

Custom fine-tuning

Key Highlights

8x faster than standard Conformer with comparable accuracy

Exports to ONNX and TensorRT for optimized production serving

Streaming and offline modes in a single model checkpoint

Use Cases

Meeting Transcription

Transcribe meetings in real time with speaker identification and punctuation.

Call Center Analytics

Analyze customer calls at scale with sentiment detection and keyword spotting.

Content Indexing

Convert audio and video libraries into searchable text archives.

Live Captioning

Provide real-time captions for broadcasts, presentations, and live events.

Code Example

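// FastConformer: Speech-to-Text
// Top-level await requires an ES module context.
import { transcribe } from "@arkitekton/voice";

// `audioFile` is the audio input to transcribe; it must be defined by the caller.
const result = await transcribe({
  model: "vm-nv-007",   // catalog ID for FastConformer
  vendor: "nvidia",
  audio: audioFile,
  language: "en",
  options: {
    punctuate: true,    // add punctuation to the transcript
    diarize: true,      // label speakers
    smart_format: true,
  },
});

// Print the transcript text and its confidence score.
console.log("Transcript:", result.text);
console.log("Confidence:", result.confidence);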
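Once a result comes back, a caller will often want to gate on the confidence score before using the transcript. The sketch below is one hedged way to do that. The `TranscriptResult` interface and the `triage` helper are hypothetical and not part of `@arkitekton/voice`; only the `text` and `confidence` fields come from the example above, and the 0 to 1 confidence range and the 0.85 threshold are assumptions.

```typescript
// Hypothetical result shape: only `text` and `confidence` appear in the example above.
interface TranscriptResult {
  text: string;
  confidence: number; // assumed to be a score between 0 and 1
}

/** Decide what to do with a transcript based on its confidence score. */
function triage(result: TranscriptResult, minConfidence = 0.85): "accept" | "review" {
  const hasText = result.text.trim().length > 0;
  return hasText && result.confidence >= minConfidence ? "accept" : "review";
}

// Stand-in values; in practice `result` would come from transcribe().
console.log(triage({ text: "Welcome to the call.", confidence: 0.93 })); // "accept"
console.log(triage({ text: "uh the the", confidence: 0.41 }));           // "review"
```

The threshold is best tuned per use case: live captioning may tolerate lower confidence than call center analytics, where flagged segments can be routed to human review.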

Related Models

PersonaPlex 7B (NVIDIA)

NeMo ASR (NVIDIA)

NeMo TTS (NVIDIA)

Riva (NVIDIA)

Parakeet (NVIDIA)

ACE (Avatar Cloud Engine) (NVIDIA)

Quick Stats

Latency: <80 ms streaming

Languages: 15 supported

License: Apache 2.0

Pricing: Open-source / self-hosted

Status: Generally Available

Vendor

NVIDIA

GPU-accelerated speech AI and conversational frameworks
