Embedding
Embedding models convert text into numerical representations, enabling applications such as search. Text embeddings measure how related two text strings are. Common applications of embeddings include:
- Search: Ranking results by relevance to a query.
- Clustering: Grouping text strings by their similarities.
- Recommendations: Suggesting items with related text descriptions.
- Anomaly Detection: Identifying outliers that show little similarity to other data points.
- Diversity Measurement: Analyzing similarity distributions.
- Classification: Assigning text strings to the most appropriate labels based on similarity.
An embedding is represented as a vector—a list of floating point numbers. The distance between two vectors indicates their level of relatedness; shorter distances signify higher relatedness, while longer distances indicate lower relatedness.
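To make the distance idea concrete, here is a minimal sketch of cosine similarity (a common relatedness measure for embeddings) in plain Python. The toy 3-dimensional vectors are illustrative only; real embedding models return hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: values near 1.0 mean closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
cat = [0.8, 0.1, 0.2]
kitten = [0.7, 0.2, 0.2]
car = [0.1, 0.9, 0.3]

# "cat" is closer to "kitten" than to "car".
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

The same function works unchanged on the vectors returned by the embeddings endpoint.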
Quickstart
Choosing a model
To create an embedding, send your text string to the embeddings API endpoint along with the embedding model name (e.g., nomic-embed-text-v1.5). The response includes an embedding (a list of floating point numbers) that you can extract, store in a vector database, and use for any of the use cases above:
- curl

```bash
curl https://api.ai.it.ufl.edu/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NAVIGATOR_TOOLKIT_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "nomic-embed-text-v1.5"
  }'
```

- python

```python
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["NAVIGATOR_TOOLKIT_API_KEY"],
    base_url="https://api.ai.it.ufl.edu/v1"
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="nomic-embed-text-v1.5"
)

print(response.data[0].embedding)
```

- javascript

```javascript
import OpenAI from 'openai';

// Read the key from the environment rather than hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.NAVIGATOR_TOOLKIT_API_KEY,
  baseURL: 'https://api.ai.it.ufl.edu/v1'
});

async function main() {
  const embedding = await openai.embeddings.create({
    model: "nomic-embed-text-v1.5",
    input: "Your text string goes here",
    encoding_format: "float",
  });

  console.log(embedding);
}

main();
```
The response will contain the embedding vector along with some additional metadata.
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495996425,
        -0.005336422007531337,
        ... (omitted for spacing)
        -8.567632266452536e-05,
        -0.024047505110500143
      ]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```
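The vector itself lives under data[0].embedding. A minimal sketch of pulling the useful fields out of a parsed response dict (sample values below are illustrative, e.g. from curl output passed through json.loads):

```python
# A parsed embeddings API response as a plain dict; values are illustrative.
response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [-0.0069, -0.0053, -0.024]}
    ],
    "model": "nomic-embed-text-v1.5",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

vector = response["data"][0]["embedding"]    # the embedding to store/compare
dimensions = len(vector)                     # vector length varies by model
tokens_billed = response["usage"]["total_tokens"]

print(dimensions, tokens_billed)  # 3 5
```

When using the Python client instead of raw JSON, the same fields are available as attributes (response.data[0].embedding, response.usage.total_tokens).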
Embedding Models
The NaviGator AI Toolkit offers a variety of AI embedding models.
Usage is priced per input token. The table below shows approximate pages of text per US dollar (assuming ~800 tokens per page):
| Model | Pages Per Dollar | Performance on MTEB eval | Max Input (tokens) |
|---|---|---|---|
| nomic-embed-text-v1.5 | 250,000 | 62.28% | 8,192 |
| sfr-embedding-mistral | 125,000 | 67.6% | 32,000 |
| gte-large-en-v1.5 | 125,000 | 65.4% | 8,192 |
| text-embedding-3-small | 62,500 | 62.3% | 8,191 |
| text-embedding-3-large | 9,615 | 64.6% | 8,191 |
| text-embedding-ada-002 | 12,500 | 61% | 8,191 |
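The pages-per-dollar figures can be converted into an approximate per-token price using the same ~800 tokens-per-page assumption. A quick sketch for a few models from the table above:

```python
# Back out approximate token pricing from pages-per-dollar figures,
# assuming ~800 tokens per page (figures taken from the table above).
PAGES_PER_DOLLAR = {
    "nomic-embed-text-v1.5": 250_000,
    "sfr-embedding-mistral": 125_000,
    "text-embedding-3-small": 62_500,
}
TOKENS_PER_PAGE = 800

for model, pages in PAGES_PER_DOLLAR.items():
    tokens_per_dollar = pages * TOKENS_PER_PAGE
    usd_per_million_tokens = 1_000_000 / tokens_per_dollar
    print(f"{model}: ~${usd_per_million_tokens:.3f} per 1M tokens")
# nomic-embed-text-v1.5: ~$0.005 per 1M tokens
# sfr-embedding-mistral: ~$0.010 per 1M tokens
# text-embedding-3-small: ~$0.020 per 1M tokens
```

These are rough estimates only; actual token counts per page vary with the text and the model's tokenizer.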