
Image to Text

NaviGator Toolkit lets an LLM accept an image as input to a chat conversation, so the model can analyze the image and answer questions about it.

Quickstart

The following example shows a Python script that passes the URL of an image to the LLM and asks what the image contains. The image is the seal of the University of Florida.

from openai import OpenAI

# Set your NaviGator Toolkit API key and base URL here
api_key = "sk-XXXXXXXX"  # Replace with your NaviGator Toolkit API key
base_url = "https://api.ai.it.ufl.edu/v1/"  # OpenAI-compatible NaviGator Toolkit endpoint

# Initialize the OpenAI API client
client = OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    model="mistral-small-3.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/en/thumb/6/6d/University_of_Florida_seal.svg/1280px-University_of_Florida_seal.svg.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0])
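
The Quickstart request can be factored into a few small helpers. The sketch below is illustrative only: `to_data_url`, `build_image_message`, and `reply_text` are hypothetical names, not part of NaviGator Toolkit or the `openai` package. It also assumes the endpoint accepts base64 `data:` URLs in the `image_url` field, as the OpenAI Chat Completions API does; this has not been confirmed for NaviGator Toolkit or for every model it serves.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(question: str, image_url: str) -> dict:
    """Build a user message with the same shape as the Quickstart request."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def reply_text(response) -> str:
    """Return only the assistant's text from a chat completion response."""
    return response.choices[0].message.content


# Possible usage with the `client` from the Quickstart
# ("seal.png" is a placeholder path; data URL support is an assumption):
#
#   message = build_image_message("What's in this image?", to_data_url("seal.png"))
#   response = client.chat.completions.create(
#       model="mistral-small-3.1", messages=[message]
#   )
#   print(reply_text(response))
```

Printing `reply_text(response)` shows just the model's answer, whereas `print(response.choices[0])` in the Quickstart prints the whole choice object, including metadata such as the finish reason.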