Create Text Embedding

This task creates embeddings of a text. The embeddings can be used for search, clustering, classification, recommendation, anomaly detection, or measuring diversity.

Description

This task creates an embedding of a text. Please note that this is an internal task: it is used by more advanced tasks that build on it, and it is also available through the REST API if you want an embedding that you can process further.

What is an Embedding?

An embedding is an array of numbers representing latent factors of a text.

Put more simply, it sorts a text into automatically generated genres, similar to how we assign genres to movies, and for each genre a number states how strongly the text relates to that genre.

In a simplified example, the text 'I love horror movies' may result in the embedding [0.8, 0.5, 0.5, 0.0], where the first dimension stands for movies, the second for horror, the third for entertainment and the fourth for school (these dimensions are generated automatically). The text 'I like sci-fi movies' may result in the embedding [0.8, 0.1, 0.5, 0.0].

The embeddings of different texts can be compared easily by calculating how much they overlap, for example with cosine similarity. In the example above, the similarity between the two texts would be relatively high, because the embeddings have the same values in three of the four dimensions.
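
To make the comparison concrete, here is a minimal sketch in Python that computes the cosine similarity of the two example embeddings. The helper name cosine_similarity is an illustrative assumption and not part of AnySolve, and real embeddings have far more than four dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


horror = [0.8, 0.5, 0.5, 0.0]  # 'I love horror movies'
scifi = [0.8, 0.1, 0.5, 0.0]   # 'I like sci-fi movies'

similarity = cosine_similarity(horror, scifi)
print(round(similarity, 2))  # roughly 0.93
```

A value close to 1 means the two texts point in almost the same direction in the embedding space; a value close to 0 means they share little.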

Available embedding models

OpenAI Text Embedding - ADA 002 (text-embedding-ada-002) [NORMALIZED, TEXT-LENGTH-PRICING]

NORMALIZED: This model returns embeddings normalized to length 1, which means that cosine similarity can be computed with the dot product alone and that ranking by Euclidean distance gives the same order as ranking by cosine similarity.
TEXT-LENGTH-PRICING: The pricing is based on text length.
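
The practical effect of normalization can be checked with a short sketch. It assumes nothing about the API itself: the vectors are the toy values from the example above, scaled to length 1 by a hypothetical normalize helper.

```python
import math


def normalize(v):
    """Scale a vector to length 1 (hypothetical helper)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


a = normalize([0.8, 0.5, 0.5, 0.0])
b = normalize([0.8, 0.1, 0.5, 0.0])

# For unit vectors the dot product already is the cosine similarity.
cosine = sum(x * y for x, y in zip(a, b))

# Squared Euclidean distance between unit vectors equals 2 - 2 * cosine,
# so a smaller distance always means a larger cosine similarity.
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))

print(math.isclose(squared_distance, 2 - 2 * cosine))  # True
```

Because of this relationship, either measure can be used for nearest-neighbour search on normalized embeddings, and both produce the same ranking.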

How to use it?

Enter the text you want to create an embedding from as Input. The resulting embedding can be used for search, clustering, classification, recommendation, anomaly detection, or measuring diversity.
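
As one illustration of the search use case, the sketch below ranks a few documents by their similarity to a query. All vectors are made-up, unit-length stand-ins for real embeddings, and the names documents, query and dot are assumptions chosen for this example.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for normalized embeddings."""
    return sum(x * y for x, y in zip(a, b))


# Made-up unit-length embeddings standing in for real results
documents = {
    "school report": [0.0, 0.6, 0.8],
    "horror review": [0.6, 0.8, 0.0],
    "sci-fi review": [0.8, 0.0, 0.6],
}
query = [0.8, 0.6, 0.0]  # hypothetical embedding of the search text

# Most similar document first
ranked = sorted(documents, key=lambda name: dot(query, documents[name]), reverse=True)
print(ranked)  # ['horror review', 'sci-fi review', 'school report']
```

In a real application, the document embeddings would be created once and stored, and only the query would be embedded at search time.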

[1.0.0]:

First release

Version	AI Model	Created	Link
1.0.0	openai	28.03.2023

API

The REST API allows you to call the tool at the same cost as running the tool directly. Please generate a Personal access token before using the REST API.

Parameters

input (Input): The input to create the embedding from
model (Embedding model): Defines the embedding model

Call the REST API with cURL
curl -v -H "Authorization: Bearer PERSONAL_ACCESS_TOKEN" "https://api.anysolve.ai/rest/v1/intern-embeddings/1.0.0?input=Name%20some%20states%20of%20the%20USA&model=text-embedding-ada-002"
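
The query parameters in the URL have to be URL-encoded (note the %20 for each space). A small sketch of how such a URL could be built using only the Python standard library; the variable names are illustrative.

```python
from urllib.parse import quote, urlencode

base_url = "https://api.anysolve.ai/rest/v1/intern-embeddings/1.0.0"
params = {
    "input": "Name some states of the USA",
    "model": "text-embedding-ada-002",
}

# quote_via=quote encodes spaces as %20 instead of the default '+'
url = base_url + "?" + urlencode(params, quote_via=quote)
print(url)
```

When passing the resulting URL to cURL, keep it in quotes so that the shell does not interpret the & character.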

Install the package with pip
python3 -m pip install anysolve

Run in python3

import os
from anysolve import AnySolve

anysolve_token = os.environ.get('ANYSOLVE_PERSONAL_ACCESS_TOKEN')  # Read your personal access token from the environment
client = AnySolve(anysolve_token)
res = client.run('intern-embeddings', '1.0.0', {'input': 'Name some states of the USA', 'model': 'text-embedding-ada-002'})

print(res)

Coming soon: Within AnySolve ChatComplete prompts you can use the following command to execute the task:
/run('intern-embeddings','1.0.0', input='Name some states of the USA', model='text-embedding-ada-002')