
Whisper Large v3

Approved Data Classifications

Description

Whisper Large v3 is an automatic speech recognition (ASR) model developed by OpenAI. It keeps the same architecture as its predecessors, with two changes: the input uses 128 Mel frequency bins (up from 80), and a new language token was added for Cantonese. It was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio, and it shows a 10% to 20% reduction in errors compared to Whisper Large v2 across a wide range of languages. The model handles both speech recognition and speech translation, supports multiple languages, and generalizes well across datasets and domains without fine-tuning.

Capabilities

Model | Training Data | Input | Output | Context Length | Cost (per minute of audio)
whisper-large-v3 | Oct 2023 | Audio | Text | n/a | $0.006/minute
Note: All prices listed are per minute of audio.
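
As a rough illustration of the per-minute pricing, the sketch below estimates the cost of transcribing a file from its duration. The rate comes from the Capabilities table; the function name `estimate_cost_usd` is hypothetical, and the linear proration of partial minutes is an assumption, since this page does not state how partial minutes are billed.

```python
from decimal import Decimal

# Rate taken from the Capabilities table: $0.006 per minute of audio.
PRICE_PER_MINUTE_USD = Decimal("0.006")


def estimate_cost_usd(duration_seconds: float) -> Decimal:
    """Estimate transcription cost for an audio clip.

    Assumption: partial minutes are prorated linearly. The actual
    billing granularity is not documented on this page.
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to four decimal places for display.
    return (minutes * PRICE_PER_MINUTE_USD).quantize(Decimal("0.0001"))


print(estimate_cost_usd(90))    # 1.5 minutes of audio
print(estimate_cost_usd(3600))  # one hour of audio
```

Under that assumption, a one-hour recording comes to about $0.36.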

Availability

Cloud Provider

Usage

curl https://api.ai.it.ufl.edu/v1/audio/transcriptions \
-H "Authorization: Bearer <API_TOKEN>" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="whisper-large-v3"
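
For callers who prefer Python, the following is a minimal sketch of the same request using only the standard library. It mirrors the curl command above: the same endpoint, the same bearer token header, and the same `file` and `model` form fields. The helper names `build_request` and `transcribe` are hypothetical, and the response is returned as raw text because this page does not describe the response format.

```python
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl example above.
API_URL = "https://api.ai.it.ufl.edu/v1/audio/transcriptions"


def build_request(audio_path: str, api_token: str,
                  model: str = "whisper-large-v3") -> urllib.request.Request:
    """Build the multipart/form-data POST that the curl command sends."""
    boundary = uuid.uuid4().hex  # random separator between form fields
    audio = Path(audio_path)
    parts = [
        # "model" form field
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # "file" form field header, followed by the raw audio bytes
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{audio.name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio.read_bytes(),
        # closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_path: str, api_token: str) -> str:
    """Send the request and return the response body as text.

    The body is returned undecoded beyond UTF-8 because the response
    schema is not documented here.
    """
    with urllib.request.urlopen(build_request(audio_path, api_token)) as resp:
        return resp.read().decode("utf-8")
```

Calling `transcribe("/path/to/file/audio.mp3", "<API_TOKEN>")` would then perform the upload. Keeping `build_request` separate makes it possible to inspect the headers and body without contacting the server.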