Speech to Text

The Speech to Text (STT) Generation API toolkit provides endpoints for models that transcribe speech into text. These models have been trained on extensive datasets, which allows them to interpret a variety of audio inputs and generate accurate transcriptions.

The model currently available for this is whisper-large-v3.

Here are some simple examples of how to call this API using curl and Python.

curl

curl -s "https://api.ai.it.ufl.edu/v1/audio/transcriptions" \
     -H 'Content-Type: multipart/form-data' \
     -H "Authorization: Bearer $NAVIGATOR_TOOLKIT_API_KEY" \
     -F file="@/path/to/file" \
     -F model="whisper-large-v3" \
     -F "language=en"

python

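import os
import sys

from openai import OpenAI

# Read the API key from the NAVIGATOR_TOOLKIT_API_KEY environment variable.
client = OpenAI(
    base_url="https://api.ai.it.ufl.edu/v1",
    api_key=os.environ["NAVIGATOR_TOOLKIT_API_KEY"],
)

model = "whisper-large-v3"

try:
    # Open the audio in binary mode; the with block closes it after the request.
    with open("/path/to/file", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
        )
except Exception as e:
    print(f"Error: Some exception occurred: {e}")
    sys.exit(1)

# The transcription is returned in the response's text attribute.
try:
    print(response.text)
except AttributeError:
    print(f"Could not find text response in response object:\n {response}")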
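The curl example passes language=en, while the Python example does not. The sketch below shows one way to wrap the call in a reusable helper that sends the same language hint. The names transcribe and make_client are hypothetical helpers introduced here for illustration, not part of the toolkit, and the sketch assumes the endpoint accepts the OpenAI SDK's language parameter the same way it accepts the form field in the curl call.

```python
import os
from pathlib import Path


def transcribe(client, audio_path, model="whisper-large-v3", language="en"):
    """Send one audio file for transcription and return the text.

    Hypothetical helper: `client` is any object exposing the OpenAI SDK's
    `audio.transcriptions.create(...)` method.
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # The context manager closes the file even if the request fails.
    with path.open("rb") as audio_file:
        response = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            language=language,  # assumed to mirror -F "language=en" from the curl call
        )
    return response.text


def make_client():
    """Build a client for the endpoint used in the examples (hypothetical helper)."""
    # Imported lazily so transcribe() can be used with any compatible client.
    from openai import OpenAI

    return OpenAI(
        base_url="https://api.ai.it.ufl.edu/v1",
        api_key=os.environ["NAVIGATOR_TOOLKIT_API_KEY"],
    )
```

With the API key exported in the environment, a call could look like print(transcribe(make_client(), "/path/to/file")). Passing the client in as an argument keeps the helper easy to exercise with a stub object when no network access is available.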