Does anyone know if it’s possible to use the OpenAI blocks to create an embedding from a string? Or would one use the standard API calls to do this?
Can you say a bit more about what you mean by “an embedding from a string”?
OpenAI has an API endpoint to create ‘embeddings’ from strings / chunks of text. Embeddings are 1536-dimensional vector representations of the text (for the text-embedding-ada-002 model). You obtain them by sending the string / entire dataset to an OpenAI endpoint, which runs a model on the data and returns a vector for each string: Embeddings - OpenAI API. At least, that’s my noob understanding.
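For reference, the endpoint responds with JSON shaped roughly like this (a sketch based on the OpenAI embeddings API docs; truncated here, but the real `embedding` list holds 1536 floats):

```python
# Sketch of the response from POST https://api.openai.com/v1/embeddings
# (values truncated; a real response carries 1536 floats in "embedding").
response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [-0.006929, 0.005336, ...],  # 1536 numbers
        }
    ],
    "model": "text-embedding-ada-002",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}

vector = response["data"][0]["embedding"]  # the 1536-dimensional vector
```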
Fascinating. I had no idea.
The built-in OpenAI block is fairly limited so you would need to configure the API manually for that.
OK - that was surprisingly easy! I managed to get a vector embedding back from the OpenAI API. That means that IF I can vectorise large datasets then I could, in theory:
- Ask a question (record this)
- Convert the sound file to text using WhisperAPI
- Convert the text string to an embedding / vector
- Compare that vector to other vectors in the database
- Return the most similar vectors: the basis of semantic search (see the sketch after this list)
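To make the comparison step concrete, here's a minimal sketch in Python, assuming the stored embeddings are just a list of (text, vector) pairs in memory (a real app would likely use a vector database):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vector, database, top_k=3):
    """Rank stored (text, vector) pairs by similarity to the query vector."""
    scored = [(text, cosine_similarity(query_vector, vector))
              for text, vector in database]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```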
For reference: add the Auth (+API key) and Content-Type headers to the API configuration menu, along with the URL. Then set the model and input as shown below to get this to work:
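In code form, the equivalent request looks roughly like this (a sketch; `YOUR_OPENAI_API_KEY` is a placeholder, and the input string is the test text from this thread):

```python
import requests

url = "https://api.openai.com/v1/embeddings"
headers = {
    "Authorization": "Bearer YOUR_OPENAI_API_KEY",  # the Auth (+API key) field
    "Content-Type": "application/json",             # the Content-Type field
}
body = {
    "model": "text-embedding-ada-002",              # the model field
    "input": "Retrieve a test embedding for this text",  # the input field
}

response = requests.post(url, headers=headers, json=body)
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # 1536 for text-embedding-ada-002
```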
Quick follow-up on this, as I tried to integrate it into my previous code: be aware that the above only works because the input “Retrieve a test embedding for this text” is in double quotes. The request body is JSON, so any string you use here needs to be a double-quoted JSON string. My return value from WhisperAPI.com was not quoted, so I needed to add the quotes using a Join.
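If your tool lets you drop into code, a safer alternative to joining quote characters by hand is to let a JSON encoder add them, since it also escapes any quotes or newlines inside the text. A minimal sketch, where `transcript` stands in for the WhisperAPI return value:

```python
import json

transcript = "What's the capital of France?"  # stand-in for the WhisperAPI result

# json.dumps adds the surrounding double quotes *and* escapes anything
# inside the string that would otherwise break the JSON body.
body = '{"model": "text-embedding-ada-002", "input": ' + json.dumps(transcript) + "}"
print(body)
```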