I’m trying to upsert some data from a list into a Pinecone index with the following code:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def generate_embeddings(text):
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().detach().numpy()
    return embeddings

embeddings = [generate_embeddings(article) for article in article_content]

pc = Pinecone(api_key="API_KEY")
pc.create_index(
    name="index-name",
    dimension=4096,  # Replace with your model dimensions
    metric="cosine",  # Replace with your model metric
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
index = pc.Index("index-name")
index.upsert(embeddings)
but I’m getting the following error:
Traceback (most recent call last):
  File "c:\Users\tvish\llama api.py", line 158, in <module>
    index.upsert(embeddings)
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\utils\error_handling.py", line 11, in inner_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 175, in upsert
    return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 206, in _upsert_batch
    vectors=list(map(vec_builder, vectors)),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\index.py", line 202, in <lambda>
    vec_builder = lambda v: VectorFactory.build(v, check_type=_check_type)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvish\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\data\vector_factory.py", line 30, in build
    raise ValueError(f"Invalid vector value passed: cannot interpret type {type(item)}")
ValueError: Invalid vector value passed: cannot interpret type <class 'numpy.ndarray'>
This is my first time working with vectors and most of this code is stuff I just found online. Does anyone know what this means and how to fix it?
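Each of your embeddings is a numpy.ndarray, but Pinecone’s upsert method expects the vector values as a plain list of floats. Call .tolist() on each array to convert it. Note that upsert also expects every vector to come with an ID, for example as an (id, values) tuple, so a bare list of floats on its own will still be rejected.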
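Here is a minimal sketch of that conversion, in case it helps. The helper name to_pinecone_records and the "article-N" ID scheme are made up for illustration, and the zero/one arrays only stand in for real BERT output. One more thing to watch: bert-base-uncased produces 768-dimensional vectors, so an index created with dimension=4096 will not accept them; the index dimension needs to be 768 for this model.

```python
import numpy as np


def to_pinecone_records(embeddings, prefix="article"):
    """Pair each embedding with a string ID and convert it to a plain list of floats.

    Hypothetical helper: the name and the "<prefix>-<n>" ID scheme are
    illustrative only, not part of the Pinecone client.
    """
    records = []
    for i, emb in enumerate(embeddings):
        # ndarray -> flat list of Python floats
        values = np.asarray(emb, dtype=float).ravel().tolist()
        records.append((f"{prefix}-{i}", values))
    return records


# Stand-in data shaped like bert-base-uncased output (768 values per article).
fake_embeddings = [np.zeros(768, dtype=np.float32), np.ones(768, dtype=np.float32)]
records = to_pinecone_records(fake_embeddings)

# With a real index (created with dimension=768), the call would then be:
# index.upsert(vectors=records)
```

In your script you would pass your own embeddings list to the helper instead of fake_embeddings. If you want to look articles up later, use IDs that mean something to you (a URL or database key) rather than a running counter.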