Image to Text
NaviGator Toolkit allows an LLM to take an image as input, so the model can analyze the image and answer questions about it.
The image can be provided to the model either as a base64-encoded string embedded in the call or as a URL, in which case the model fetches the image first.
Image to text can be achieved using either the OpenAI Chat Completions API or the OpenAI Responses API.
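The two forms differ only in the value of the image field. A minimal sketch of the two Chat Completions content parts (the URL and the image bytes below are placeholders, not real data):

```python
import base64

# Placeholder bytes standing in for a real image file
fake_png_bytes = b"\x89PNG\r\n\x1a\n"
b64 = base64.b64encode(fake_png_bytes).decode("utf-8")

# Remote image: the model fetches this URL itself
url_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/picture.png"},
}

# Inline image: the base64 payload travels inside the request as a data URL
inline_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{b64}"},
}
```

Either dictionary can be placed in the `content` list of a user message alongside a text part.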
Chat Completions API - URL
The following example shows how to write a Python script that takes the URL of an image and asks the LLM what is in it. The image is the seal of the University of Florida.
- Chat URL
from openai import OpenAI

# Set your OpenAI API key and base URL here
api_key = "sk-XXXXXXXX"  # Replace with your OpenAI API key
base_url = "https://api.ai.it.ufl.edu/v1/"  # Base URL for OpenAI API

# Initialize the OpenAI API client
client = OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    model="mistral-small-3.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/en/thumb/6/6d/University_of_Florida_seal.svg/1280px-University_of_Florida_seal.svg.png"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
Chat Completions API - Inline
The following example shows how to write a Python script that base64-encodes a file and sends it along with the message.
This call requires the following information to be filled in:
- PATH_TO_IMAGE - the path to the image file you wish to upload
- IMAGE_TYPE - the type of the image; valid options are: jpeg, png, gif (non-animated), webp
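Rather than hard-coding IMAGE_TYPE, you can derive the MIME type from the file extension with Python's standard mimetypes module. A sketch (`to_data_url` is a hypothetical helper, not part of any SDK; the allowed set mirrors the four formats listed above):

```python
import base64
import mimetypes

SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def to_data_url(path: str) -> str:
    """Base64-encode an image file into a data: URL, guessing the MIME type
    from the file extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in SUPPORTED:
        raise ValueError(f"Unsupported image type: {mime}")
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

The returned string can be used directly as the "url" value of an image_url content part.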
- Chat Inline
import base64

from openai import OpenAI

# Set your OpenAI API key and base URL here
api_key = "sk-XXXXXXXX"  # Replace with your OpenAI API key
base_url = "https://api.ai.it.ufl.edu/v1/"  # Base URL for OpenAI API

# Initialize the OpenAI API client
client = OpenAI(api_key=api_key, base_url=base_url)

# Read the image file and base64-encode it
with open(PATH_TO_IMAGE, "rb") as image_file:
    b64_image = base64.b64encode(image_file.read()).decode("utf-8")

try:
    response = client.chat.completions.create(
        model="mistral-small-3.1",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/IMAGE_TYPE;base64,{b64_image}"
                        },
                    },
                ],
            }
        ],
    )
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Exception caught: {e}")
    exit(1)
Responses API - URL
In the following example you provide a URL, and the LLM fetches and analyzes the image based on your prompt.
This call requires the following information to be filled in:
- URL - the URL of the image that you would like the LLM to retrieve and analyze
- Responses URL
from openai import OpenAI

# Set your OpenAI API key and base URL here
api_key = "sk-XXXXXXXX"  # Replace with your OpenAI API key
base_url = "https://api.ai.it.ufl.edu/v1/"  # Base URL for OpenAI API

prompt = "What is in this image?"
image_url = "URL"

# Initialize the OpenAI API client
client = OpenAI(api_key=api_key, base_url=base_url)

response = client.responses.create(
    model="gpt-5-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {"type": "input_image", "image_url": image_url},
            ],
        }
    ],
)

response_id = response.id
retrieved_response = client.responses.retrieve(response_id)
print(f"Response text is: {retrieved_response.output_text}")
delete_response = client.responses.delete(response_id)
Local models do not support retrieving a response by its id; with those models, read response.output_text directly and skip deleting the response.
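If a script needs to work against both hosted and local models, one option is to attempt retrieval by id and fall back to the in-memory response. A sketch (`get_output_text` is a hypothetical helper, not part of the OpenAI SDK):

```python
def get_output_text(client, response):
    """Return the response text, retrieving by id when the backend supports it
    and falling back to the local response object otherwise."""
    try:
        return client.responses.retrieve(response.id).output_text
    except Exception:
        # Local models: retrieval by id is unsupported
        return response.output_text
```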
Responses API - Inline
In the following example you provide an image, and the LLM analyzes it based on your prompt.
This call requires the following information to be filled in:
- PATH_TO_IMAGE - the path to the image file you wish to upload
- IMAGE_TYPE - the type of the image; valid options are: jpeg, png, gif (non-animated), webp
- Responses Inline
import base64

from openai import OpenAI

# Set your OpenAI API key and base URL here
api_key = "sk-XXXXXXXX"  # Replace with your OpenAI API key
base_url = "https://api.ai.it.ufl.edu/v1/"  # Base URL for OpenAI API

prompt = "What is in this image?"
image = "PATH_TO_IMAGE"

# Read the image file and base64-encode it
with open(image, "rb") as image_file:
    image_contents = base64.b64encode(image_file.read()).decode("utf-8")

# Initialize the OpenAI API client
client = OpenAI(api_key=api_key, base_url=base_url)

response = client.responses.create(
    model="gpt-5-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {
                    "type": "input_image",
                    "image_url": f"data:image/IMAGE_TYPE;base64,{image_contents}",
                },
            ],
        }
    ],
)

response_id = response.id
retrieved_response = client.responses.retrieve(response_id)
print(f"Response text is: {retrieved_response.output_text}")
delete_response = client.responses.delete(response_id)
Local models do not support retrieving a response by its id; with those models, read response.output_text directly and skip deleting the response.