
Image to Text

NaviGator Toolkit allows an LLM to accept an image as input. The model can then analyze the image and answer questions about it.

The image can be provided to the model either as a base64-encoded string embedded in the request, or as a URL, in which case the model fetches the image first.
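To make the two forms concrete, here is a minimal sketch of the two Chat Completions content parts; the URL, file bytes, and image type below are placeholder assumptions, not values from the service:

```python
import base64

# Form 1: a plain URL -- the model fetches the image itself
url_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/seal.png"},  # placeholder URL
}

# Form 2: the image bytes, base64-encoded into a data URL
image_bytes = b"\x89PNG..."  # in practice: open("photo.png", "rb").read()
b64 = base64.b64encode(image_bytes).decode("utf-8")
inline_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{b64}"},
}

print(inline_part["image_url"]["url"][:22])  # -> data:image/png;base64,
```

Either part can be placed alongside a text part in the same user message, as the examples below show.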

Image to text can be achieved by either using the OpenAI Chat Completions API or the OpenAI Responses API.

Chat Completions API - URL

The following example shows how to write a Python script that passes the URL of an image and asks the LLM what the image contains. The example image is the seal of the University of Florida.

  from openai import OpenAI

  # Set your OpenAI API key and base URL here
  api_key = "sk-XXXXXXXX" # Replace with your OpenAI API key
  base_url = "https://api.ai.it.ufl.edu/v1/" # Base URL for OpenAI API

  # Initialize the OpenAI API client
  client = OpenAI(api_key=api_key, base_url=base_url)

  response = client.chat.completions.create(
      model="mistral-small-3.1",
      messages=[
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": "What's in this image?"},
                  {
                      "type": "image_url",
                      "image_url": {
                          "url": "https://upload.wikimedia.org/wikipedia/en/thumb/6/6d/University_of_Florida_seal.svg/1280px-University_of_Florida_seal.svg.png"
                      }
                  },
              ],
          }
      ],
  )

  # Print just the model's reply text
  print(response.choices[0].message.content)

Chat Completions API - Inline

The following example shows how to write a Python script that base64-encodes a local image file and sends it inline with the message.

This call requires the following information to be filled in:

  • PATH_TO_IMAGE - the path to the image file you wish to upload
  • IMAGE_TYPE - the type of the image; valid options are: jpeg, png, gif (non-animated), webp
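If you would rather not hard-code IMAGE_TYPE, one option (a sketch, not part of the example below) is to derive it from the file name with Python's standard mimetypes module:

```python
import mimetypes

def image_subtype(path: str) -> str:
    """Guess the data-URL subtype (e.g. 'png') from a file name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine image type for {path}")
    # "image/png" -> "png"
    return mime.split("/", 1)[1]

print(image_subtype("photo.png"))  # -> png
print(image_subtype("scan.jpg"))   # -> jpeg
```

The returned subtype can then be interpolated into the data URL in place of IMAGE_TYPE.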
  import base64

  from openai import OpenAI

  # Set your OpenAI API key and base URL here
  api_key = "sk-XXXXXXXX" # Replace with your OpenAI API key
  base_url = "https://api.ai.it.ufl.edu/v1/" # Base URL for OpenAI API

  # Initialize the OpenAI API client
  client = OpenAI(api_key=api_key, base_url=base_url)

  # Read the image file and base64 encode its contents
  with open(PATH_TO_IMAGE, "rb") as image_file:
      b64Image = base64.b64encode(image_file.read()).decode("utf-8")

  try:
      response = client.chat.completions.create(
          model="mistral-small-3.1",
          messages=[
              {
                  "role": "user",
                  "content": [
                      {
                          "type": "text",
                          "text": "What's in this image?"
                      },
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": f"data:image/IMAGE_TYPE;base64,{b64Image}"
                          }
                      },
                  ],
              }
          ],
      )
      print(response.choices[0].message.content)
  except Exception as e:
      print(f"Exception caught: {e}")
      exit(1)

Responses API - URL

In the following example you provide a URL, and the LLM fetches and analyzes the image based on your prompt.

This call requires the following information to be filled in:

  • URL - the URL of the image that you would like the LLM to retrieve and analyze
  from openai import OpenAI

  # Set your OpenAI API key and base URL here
  api_key = "sk-XXXXXXXX" # Replace with your OpenAI API key
  base_url = "https://api.ai.it.ufl.edu/v1/" # Base URL for OpenAI API

  prompt = "What is in this image?"

  image_url = "URL"

  # Initialize the OpenAI API client
  client = OpenAI(api_key=api_key, base_url=base_url)

  response = client.responses.create(
      model="gpt-5-mini",
      input=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "input_text",
                      "text": prompt
                  },
                  {
                      "type": "input_image",
                      "image_url": image_url
                  }
              ]
          }
      ]
  )

  response_id = response.id

  # Retrieve the stored response by ID, print it, then clean it up
  retrieved_response = client.responses.retrieve(response_id)
  print(f"Response text is: {retrieved_response.output_text}")
  delete_response = client.responses.delete(response_id)

Local models do not support retrieving the response via the response ID; for those, use response.output_text directly and skip deleting the response.
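One way to handle both kinds of backend in a single script (a sketch under the assumptions above, not an official helper) is to attempt the retrieval and fall back to the response object you already have:

```python
def get_output_text(client, response):
    """Return the response text, falling back for backends that do not
    support retrieval by response ID (e.g. local models)."""
    try:
        retrieved = client.responses.retrieve(response.id)
        text = retrieved.output_text
        client.responses.delete(response.id)  # clean up when supported
        return text
    except Exception:
        # Fallback: the text is already on the response object itself
        return response.output_text
```

You would call this as get_output_text(client, response) in place of the retrieve/print/delete lines above.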

Responses API - Inline

In the following example you provide an image inline, and the LLM analyzes it based on your prompt.

This call requires the following information to be filled in:

  • PATH_TO_IMAGE - the path to the image file you wish to upload
  • IMAGE_TYPE - the type of the image; valid options are: jpeg, png, gif (non-animated), webp
  import base64

  from openai import OpenAI

  # Set your OpenAI API key and base URL here
  api_key = "sk-XXXXXXXX" # Replace with your OpenAI API key
  base_url = "https://api.ai.it.ufl.edu/v1/" # Base URL for OpenAI API

  prompt = "What is in this image?"

  # Read the image file and base64 encode its contents
  image = "PATH_TO_IMAGE"
  with open(image, "rb") as image_file:
      image_contents = base64.b64encode(image_file.read()).decode("utf-8")

  # Initialize the OpenAI API client
  client = OpenAI(api_key=api_key, base_url=base_url)

  response = client.responses.create(
      model="gpt-5-mini",
      input=[
          {
              "role": "user",
              "content": [
                  {
                      "type": "input_text",
                      "text": prompt
                  },
                  {
                      "type": "input_image",
                      "image_url": f"data:image/IMAGE_TYPE;base64,{image_contents}"
                  }
              ]
          }
      ]
  )

  response_id = response.id

  # Retrieve the stored response by ID, print it, then clean it up
  retrieved_response = client.responses.retrieve(response_id)
  print(f"Response text is: {retrieved_response.output_text}")
  delete_response = client.responses.delete(response_id)

Local models do not support retrieving the response via the response ID; for those, use response.output_text directly and skip deleting the response.