I am using an Azure AI Search instance with the text-embedding-ada-002 embedding model. I call the embedding model through the AzureOpenAIEmbeddings class from the langchain_openai library:
self.model = AzureOpenAIEmbeddings(
    model=self.embedding_deployment,
    azure_endpoint=self.endpoint,
    openai_api_key=self.api_key,
    openai_api_version="2024-02-01",
)
Creating the embedding object works without any problem. However, the problem appears when I use this embedding function to create the Azure AI Search index with AzureSearch from the langchain_community.vectorstores library:
self.search_model = AzureSearch(
    azure_search_endpoint=self.azure_search_endpoint,
    azure_search_key=self.search_api_key,
    index_name=self.index_name,
    embedding_function=self.model.embed_query,
)  # CONNECTION ERROR
I am receiving this error:
Exception has occurred: ConnectTimeout
HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000020FFF8A1010>, 'Connection to openaipublic.blob.core.windows.net timed out. (connect timeout=None)'))
KeyError: 'Could not automatically map <embedding_deployment_name> to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
During handling of the above exception, another exception occurred:
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x0000020FFF8A1010>, 'Connection to openaipublic.blob.core.windows.net timed out. (connect timeout=None)')
It seems that there is a network problem while downloading the tiktoken encoding file cl100k_base.tiktoken.
I found a manual solution here:
how to use tiktoken in offline mode computer
In summary, it downloads the given .tiktoken file and sets an environment variable called TIKTOKEN_CACHE_DIR to the directory that contains it. This does not seem like a reliable solution for a project that other users will clone from a repository.
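For reference, the workaround described above can be sketched roughly as follows. This is a minimal sketch, not a confirmed implementation: the helper names cache_file_name and use_local_tiktoken_cache are hypothetical, the URL is the one from the traceback, and the naming rule (the cached file is named after the SHA-1 hex digest of the download URL) is my assumption about how tiktoken looks up cached files.

```python
import hashlib
import os
from pathlib import Path

# URL taken from the traceback above.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_file_name(url: str) -> str:
    """Return the assumed cache file name: the SHA-1 hex digest of the URL.

    Assumption: tiktoken names cached files this way; verify against the
    installed tiktoken version before relying on it.
    """
    return hashlib.sha1(url.encode()).hexdigest()


def use_local_tiktoken_cache(cache_dir: str) -> Path:
    """Point TIKTOKEN_CACHE_DIR at cache_dir and return the expected file path.

    The manually downloaded .tiktoken file would need to be saved at the
    returned path for the offline lookup to succeed.
    """
    os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
    return Path(cache_dir) / cache_file_name(CL100K_URL)
```

The environment variable would have to be set before the first embedding or tokenizer call, since that is when the encoding file is looked up. This is exactly the part I would like to avoid, because every user who clones the project would need the file in place.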
Is there a way to solve this issue without downloading the file to a local machine and storing it in the repository?