I am running some code that needs a variable to be a list in order to work (I am trying to follow the Pinecone OpenAI tutorial). Here is the code I am running (I apologize for the length; I’m not sure which snippets are necessary to convey my problem):
from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once
new_data = []
window = 10
stride = 4

for i in tqdm(range(0, len(data), stride)):
    i_end = min(len(data)-1, i+window)
    summary = ' '.join(data[i:i_end]['summary'])
    new_data.append({'year': data.loc[i]['Year'],
                     'summary': summary,
                     'quarter': data.loc[i]['Quarter']})

new_data = list(new_data)

for i in tqdm(range(0, len(new_data), batch_size)):
    # find end of batch
    i_end = min(len(new_data), i+batch_size)
    meta_batch = new_data[i:i_end]
    # get texts to encode
    summarys = [x['summary'] for x in meta_batch]
    # create embeddings (try-except added to avoid RateLimitError)
    try:
        res3 = client.embeddings.create(input=summarys, model=model)
    except:
        done = False
        while not done:
            sleep(5)
            try:
                res3 = client.embeddings.create(input=summarys, model=model)
                done = True
            except:
                pass
    record = []
    sos = res3.data[0].embedding
    for k in range(0, len(res3.data[0].embedding)):
        record.append(sos)
    embeds = record
    # cleanup metadata
    meta_batch = [{
        'year': x['year'],
        'summary': x['summary'],
        'quarter': x['quarter']
    } for x in meta_batch]
    to_upsert = list(zip(embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)
I get the error:
Expected a list or list-like data structure, but got: {'year': 2016, 'summary': 'Penn State TMR reports that in 2016 in Northeast SPW had an average price of 6.45 dollars in Q2. Penn State TMR reports that in 2016 in Southwest HPW had an average price of 4.16 dollars in Q2. Penn State TMR reports that in 2016 in Northwest HPW had an average price of 6.27 dollars in Q2. Penn State TMR reports that in 2016 in Northeast HPW had an average price of 6.13 dollars in Q2. Penn State TMR reports that in 2016 in Southwest Pine ST had an average price of 111.0 dollars in Q2. Penn State TMR reports that in 2016 in Southwest Other HST had an average price of 113.0 dollars in Q2. Penn State TMR reports that in 2016 in Southwest SST had an average price of 109.0 dollars in Q2. Penn State TMR reports that in 2016 in Southwest Yellow Poplar had an average price of 233.0 dollars in Q2. Penn State TMR reports that in 2016 in Southwest Soft Maple had an average price of 202.0 dollars in Q2. Penn State TMR reports that in 2016 in Southwest Hard Maple had an average price of 296.0 dollars in Q2.', 'quarter': 'Q2'}
But as far as I can tell, what I am passing is a list? I specifically tried to convert new_data to a list with
new_data = list(new_data)
but that didn’t seem to solve the problem. I am at quite a loss as to what is going on here, and any insight would be greatly appreciated. Again, I apologize for the length! The whole code is even longer, so I am trying to include just the relevant block, but please let me know if I have missed any helpful context.
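In case it helps narrow things down, here is a minimal sketch of what I suspect might be going on. My understanding (an assumption on my part, not something I have confirmed against the tutorial) is that index.upsert reads each tuple as (id, values) or (id, values, metadata). If so, my (embedding, metadata) pairs would put the metadata dict in the values slot, which would match the dict printed in the error. The helper build_upsert_records and its ID scheme are hypothetical names I made up for illustration, and the sketch uses stand-in lists rather than calling OpenAI or Pinecone.

```python
from typing import Any


def build_upsert_records(
    embeddings: list[list[float]],
    metadatas: list[dict[str, Any]],
    start_id: int = 0,
) -> list[tuple[str, list[float], dict[str, Any]]]:
    """Pair each embedding with its metadata as (id, values, metadata).

    Hypothetical helper: the string IDs are simply the running row number.
    """
    if len(embeddings) != len(metadatas):
        raise ValueError("need exactly one embedding per metadata record")
    return [
        (str(start_id + offset), list(values), meta)
        for offset, (values, meta) in enumerate(zip(embeddings, metadatas))
    ]


# Stand-in data; in the real loop the embeddings would presumably come from
# [item.embedding for item in res3.data], one per summary in the batch.
fake_embeds = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
fake_meta = [
    {"year": 2016, "summary": "first summary", "quarter": "Q2"},
    {"year": 2016, "summary": "second summary", "quarter": "Q3"},
]
records = build_upsert_records(fake_embeds, fake_meta, start_id=100)
print(records[0][0], len(records[0][1]), records[0][2]["quarter"])
```

If that reading is right, the batch loop would pass these records to index.upsert(vectors=records), with start_id set to i so the IDs stay unique across batches. I also notice that my current loop appends res3.data[0].embedding over and over instead of taking one embedding per summary, which may be a second problem.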
Thank you, I appreciate it a lot!