Embedding

Embedding models convert text into numerical representations, which makes it possible to measure how related two text strings are. Common applications of embeddings include:

  • Search: Ranking results based on relevance to a query (see the similarity sketch after this list).
  • Clustering: Grouping text strings by their similarities.
  • Recommendations: Suggesting items with related text descriptions.
  • Anomaly Detection: Identifying outliers that show little similarity to other data points.
  • Diversity Measurement: Analyzing similarity distributions.
  • Classification: Assigning text strings to the most appropriate labels based on similarity.

An embedding is represented as a vector—a list of floating point numbers. The distance between two vectors indicates their level of relatedness; shorter distances signify higher relatedness, while longer distances indicate lower relatedness.
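
To make the distance idea concrete, here is a minimal sketch that ranks a few documents against a query by cosine similarity, a common relatedness measure for embeddings. The vectors are tiny made-up placeholders; real embeddings returned by the API have far more dimensions.

import math

# Minimal sketch: rank documents by cosine similarity to a query.
# These 3-dimensional vectors are made-up placeholders; real embeddings
# returned by the embeddings endpoint have far more dimensions.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.10, 0.30, -0.20]
documents = {
    "doc A": [0.09, 0.28, -0.19],   # points in nearly the same direction as the query
    "doc B": [-0.40, 0.05, 0.30],   # points in a different direction
}

# Higher cosine similarity means a smaller angle between the vectors,
# i.e. the texts are more closely related.
for name, vec in sorted(documents.items(),
                        key=lambda kv: cosine_similarity(query, kv[1]),
                        reverse=True):
    print(name, round(cosine_similarity(query, vec), 3))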

Quickstart

Choosing a model

To create an embedding, send your text string to the embeddings API endpoint, specifying the embedding model name (e.g., nomic-embed-text-v1.5). The response will include an embedding (a list of floating point numbers) that you can extract, store in a vector database, and utilize for various use cases:

curl https://api.ai.it.ufl.edu/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NAVIGATOR_TOOLKIT_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "nomic-embed-text-v1.5"
  }'
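
The same call works from Python. The following is a sketch using the requests library against the endpoint above, assuming the request and response shapes shown in this section.

import os
import requests

# Send the same embeddings request as the curl example above.
# Assumes NAVIGATOR_TOOLKIT_API_KEY is set in the environment.
response = requests.post(
    "https://api.ai.it.ufl.edu/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['NAVIGATOR_TOOLKIT_API_KEY']}"},
    json={
        "input": "Your text string goes here",
        "model": "nomic-embed-text-v1.5",
    },
)
response.raise_for_status()

# The vector lives at data[0].embedding in the response body (see below).
embedding = response.json()["data"][0]["embedding"]
print(f"Received a {len(embedding)}-dimensional embedding")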

The response will contain the embedding vector along with some additional metadata.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495996425,
        -0.005336422007531337,
        ... (omitted for spacing)
        -8.567632266452536e-05,
        -0.024047505110500143
      ]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

Embedding Models

The NaviGator AI Toolkit offers a variety of AI embedding models.

Usage is priced per input token. The table below expresses pricing as pages of text per US dollar (assuming ~800 tokens per page):

Model                     Pages Per Dollar    Performance on MTEB Eval    Max Input Tokens
nomic-embed-text-v1.5     250,000             62.28%                      8,192
sfr-embedding-mistral     125,000             67.6%                       32,000
gte-large-en-v1.5         125,000             65.4%                       8,192
text-embedding-3-small    62,500              62.3%                       8,191
text-embedding-3-large    9,615               64.6%                       8,191
text-embedding-ada-002    12,500              61%                         8,191
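
As a back-of-envelope illustration (not an official price sheet), the pages-per-dollar figures above can be converted into tokens per dollar using the same ~800 tokens-per-page assumption:

# Back-of-envelope: convert the table's pages-per-dollar figures into
# approximate input tokens per dollar, using ~800 tokens per page.
TOKENS_PER_PAGE = 800

pages_per_dollar = {
    "nomic-embed-text-v1.5": 250_000,
    "text-embedding-3-small": 62_500,
}

for model, pages in pages_per_dollar.items():
    tokens = pages * TOKENS_PER_PAGE
    print(f"{model}: ~{tokens:,} input tokens per dollar")
# nomic-embed-text-v1.5: ~200,000,000 input tokens per dollar
# text-embedding-3-small: ~50,000,000 input tokens per dollar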