Embedding
Embedding models convert text into numerical representations, enabling applications such as search. Text embeddings measure how related two text strings are. Common applications of embeddings include:
- Search: Ranking results by relevance to a query.
- Clustering: Grouping text strings by their similarities.
- Recommendations: Suggesting items with related text descriptions.
- Anomaly Detection: Identifying outliers that show little similarity to other data points.
- Diversity Measurement: Analyzing similarity distributions.
- Classification: Assigning text strings to the most appropriate labels based on similarity.
An embedding is represented as a vector—a list of floating point numbers. The distance between two vectors indicates their level of relatedness; shorter distances signify higher relatedness, while longer distances indicate lower relatedness.
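To make the distance idea concrete, here is a minimal sketch of cosine similarity (a common relatedness measure for embeddings) in plain Python. The toy 3-dimensional vectors are illustrative only; real embedding models return hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: values near 1.0 mean closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
cat = [0.8, 0.1, 0.2]
kitten = [0.7, 0.2, 0.2]
car = [0.1, 0.9, 0.3]

# "cat" is closer to "kitten" than to "car".
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

The same function works unchanged on the vectors returned by the embeddings endpoint.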
Quickstart
Choosing a model
To create an embedding, send your text string to the embeddings API endpoint along with the embedding model name (e.g., nomic-embed-text-v1.5). The response includes an embedding (a list of floating point numbers) that you can extract, store in a vector database, and use for any of the use cases above:
- curl

```bash
curl https://api.ai.it.ufl.edu/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NAVIGATOR_TOOLKIT_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "nomic-embed-text-v1.5"
  }'
```

- python

```python
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["NAVIGATOR_TOOLKIT_API_KEY"],
    base_url="https://api.ai.it.ufl.edu/v1"
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="nomic-embed-text-v1.5"
)

print(response.data[0].embedding)
```

- javascript

```javascript
import OpenAI from 'openai';

// Read the key from the environment rather than hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.NAVIGATOR_TOOLKIT_API_KEY,
  baseURL: 'https://api.ai.it.ufl.edu/v1'
});

async function main() {
  const embedding = await openai.embeddings.create({
    model: "nomic-embed-text-v1.5",
    input: "Your text string goes here",
    encoding_format: "float",
  });

  console.log(embedding);
}

main();
```
The response will contain the embedding vector along with some additional metadata.
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495996425,
        -0.005336422007531337,
        ... (omitted for spacing)
        -8.567632266452536e-05,
        -0.024047505110500143
      ]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```
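The vector itself lives under data[0].embedding. A minimal sketch of pulling the useful fields out of a parsed response dict (sample values below are illustrative, e.g. from curl output passed through json.loads):

```python
# A parsed embeddings API response as a plain dict; values are illustrative.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [-0.0069, -0.0053, -0.024]}
    ],
    "model": "nomic-embed-text-v1.5",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

vector = response["data"][0]["embedding"]    # the embedding to store/compare
dimensions = len(vector)                     # vector length varies by model
tokens_billed = response["usage"]["total_tokens"]

print(dimensions, tokens_billed)  # 3 5
```

When using the Python client instead of raw JSON, the same fields are available as attributes (response.data[0].embedding, response.usage.total_tokens).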
Embedding Models
The NaviGator AI Toolkit offers a variety of AI embedding models.
Usage is priced per input token. The table below shows approximate pages of text per US dollar (assuming ~800 tokens per page):
| Model | Pages Per Dollar | Performance on MTEB eval | Max Input (tokens) |
|---|---|---|---|
| nomic-embed-text-v1.5 | 250,000 | 62.28% | 8,192 |
| sfr-embedding-mistral | 125,000 | 67.6% | 32,000 |
| gte-large-en-v1.5 | 125,000 | 65.4% | 8,192 |
| text-embedding-3-small | 62,500 | 62.3% | 8,191 |
| text-embedding-3-large | 9,615 | 64.6% | 8,191 |
| text-embedding-ada-002 | 12,500 | 61% | 8,191 |
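The pages-per-dollar figures can be converted into an approximate per-token price using the same ~800 tokens-per-page assumption. A quick sketch for a few models from the table above:

```python
# Back out approximate token pricing from pages-per-dollar figures,
# assuming ~800 tokens per page (figures taken from the table above).
PAGES_PER_DOLLAR = {
    "nomic-embed-text-v1.5": 250_000,
    "sfr-embedding-mistral": 125_000,
    "text-embedding-3-small": 62_500,
}
TOKENS_PER_PAGE = 800

for model, pages in PAGES_PER_DOLLAR.items():
    tokens_per_dollar = pages * TOKENS_PER_PAGE
    usd_per_million_tokens = 1_000_000 / tokens_per_dollar
    print(f"{model}: ~${usd_per_million_tokens:.3f} per 1M tokens")
# nomic-embed-text-v1.5: ~$0.005 per 1M tokens
# sfr-embedding-mistral: ~$0.010 per 1M tokens
# text-embedding-3-small: ~$0.020 per 1M tokens
```

These are rough estimates only; actual token counts per page vary with the text and the model's tokenizer.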