OpenAI · Specialized
Whisper
OpenAI's open-source automatic speech recognition model trained on 680,000 hours of multilingual audio for robust transcription and translation.
Overview
Whisper is OpenAI's open-source automatic speech recognition (ASR) model, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It approaches human-level accuracy on standard English speech recognition benchmarks while remaining robust to accents, background noise, and technical language. Whisper transcribes speech in 99 languages and translates it to English, making it one of the most versatile open-source ASR models available and a foundation for a wide range of audio processing applications.
Parameters
39M (tiny) to 1.5B (large-v3)
Languages
99 languages supported
Training Data
680,000 hours of audio
Architecture
Encoder-decoder transformer
License
MIT
Capabilities
Multilingual speech-to-text transcription (99 languages)
Speech translation to English from any supported language
Robust handling of accents, noise, and technical terminology
Timestamp generation at word and segment level
Language detection and identification
Use Cases
Transcribing meetings, interviews, and podcasts automatically
Adding subtitles and captions to video content
Building multilingual voice interfaces and voice search
Creating accessible content through automated transcription
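The subtitle use case, for example, largely reduces to formatting Whisper's segment timestamps as an SRT file. A sketch (the segment tuples mirror the `start`/`end`/`text` fields Whisper returns; the helper name is ours):

```python
def to_srt(segments) -> str:
    """Render (start_sec, end_sec, text) tuples as SRT subtitle blocks."""

    def stamp(sec: float) -> str:
        # SRT timestamps use the form HH:MM:SS,mmm
        ms = int(round(sec * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{stamp(start)} --> {stamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)


# Hand-written segments in the shape Whisper's transcribe() returns:
print(to_srt([(0.0, 2.5, "Hello and welcome."), (2.5, 5.0, "Let's begin.")]))
```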
Pros
- Near-human accuracy on English speech recognition
- Open-source with an MIT license for unrestricted use
- Supports 99 languages, among the broadest coverage of any open ASR model
- Robust to real-world audio conditions and diverse accents
Cons
- Large models require significant GPU memory for real-time use
- Can hallucinate text on silent or low-quality audio segments
- Real-time streaming requires additional engineering effort
- Translation is limited to English as the target language
Pricing
Free and open-source for self-hosting. OpenAI API: $0.006/minute of audio. Runs on consumer GPUs; Tiny model runs on CPU.