Whisper Large v3
Approved Data Classifications
Description
Whisper Large v3 is an advanced automatic speech recognition (ASR) model developed by OpenAI. It maintains the same architecture as its predecessors but introduces improvements such as using 128 Mel frequency bins for input (up from 80) and adding a new language token for Cantonese. Trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio, Whisper Large v3 demonstrates enhanced performance across various languages, with a 10% to 20% reduction in errors compared to Whisper Large v2. This model excels in both speech recognition and translation tasks, supporting multiple languages and showing robust generalization capabilities across different datasets and domains without requiring fine-tuning.
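Because the model also handles speech translation, the same OpenAI-compatible client shown in the Usage section can in principle translate non-English audio into English text. The sketch below assumes the gateway also exposes the standard `/v1/audio/translations` endpoint, which is not confirmed on this page.

```python
# Hypothetical sketch: translate non-English speech to English text.
# Assumes the UF gateway exposes the OpenAI-compatible
# /v1/audio/translations endpoint; only /v1/audio/transcriptions
# is documented on this page.
from openai import OpenAI

client = OpenAI(
    api_key="your_api_key",
    base_url="https://api.ai.it.ufl.edu/v1",
)

with open("/path/to/file/audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(translation.text)  # English text, regardless of the source language
```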
Capabilities
| Model | Training Data | Input | Output | Context Length | Cost (per minute of audio) |
|---|---|---|---|---|---|
| whisper-large-v3 | Oct 2023 | Audio | Text | n/a | $0.006 |
info

- All prices listed are based on one minute of audio.
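As a rough illustration of how the per-minute rate translates into cost, the snippet below is a hypothetical helper using the $0.006/minute figure listed above; actual billing details (such as rounding of partial minutes) are not specified on this page.

```python
# Illustrative sketch only: estimate cost from audio duration using the
# $0.006-per-minute rate above. Partial minutes are prorated here, which
# may differ from actual billing.
def estimate_cost(duration_seconds: float, rate_per_minute: float = 0.006) -> float:
    return (duration_seconds / 60.0) * rate_per_minute

print(f"${estimate_cost(90):.4f}")  # 1.5 minutes of audio ≈ $0.0090
```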
Availability
Cloud Provider
Usage
curl

```bash
curl https://api.ai.it.ufl.edu/v1/audio/transcriptions \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"
```
python

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_api_key",
    base_url="https://api.ai.it.ufl.edu/v1",
)

# Open the audio file in binary mode and send it for transcription.
audio_file = open("/path/to/file/audio.mp3", "rb")

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3",
    file=audio_file,
)

print(transcription.text)
```
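The OpenAI Python SDK also accepts optional parameters on `audio.transcriptions.create`, such as `language`, `response_format`, and `temperature`; whether the UF gateway passes all of these through is an assumption. A typical call might look like this:

```python
# Sketch with optional parameters supported by the OpenAI Python SDK;
# support for each of them on the UF gateway is assumed, not confirmed here.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="en",                   # ISO-639-1 hint for the spoken language
        response_format="verbose_json",  # includes segment-level timestamps
        temperature=0.0,                 # deterministic decoding
    )

print(transcription.text)
```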
javascript

```javascript
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your_api_key',
  baseURL: 'https://api.ai.it.ufl.edu/v1'
});

async function main() {
  // Stream the audio file to the transcription endpoint.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("/path/to/file/audio.mp3"),
    model: "whisper-large-v3",
  });
  console.log(transcription.text);
}

main();
```