Problem Summary
I am trying to use the OpenAI API to make a batch request to GPT-3.5 Turbo from Google Colab. It seems to work fine, but it only returns the prompt. It never answers the question, despite the 1000-token allowance.
I am following the instructions on https://platform.openai.com/docs/guides/batch/getting-started
The goal
To get ChatGPT to evaluate a response generated by another AI.
Libraries and Data
```python
chatgtp_context = """Please act as an impartial..."""

# One element of the data (a list of dictionaries):
{'prompt': '\n# Query:\nwhy are elementary particles structureless?\n# Answer:\n',
 'response': 'Hello! Elementary particles are...If you have any other questions, please let me know!'}
```
```python
import os
import openai
from google.colab import userdata

# Pull the key from Colab's secrets and expose it to the OpenAI client
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

from google.colab import files
uploaded = files.upload()

# Extract the response text from the uploaded data
Llama_2_7b_output_response = Llama_2_7b_700_output_response.Llama_2_7b_output_response
```
API Format
```python
openai_queries = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [
            {"role": "system", "content": chatgtp_context},
            {"role": "user", "content": Llama_2_7b_output_response}
        ],
        "max_tokens": 1000  # Chat Completions takes max_tokens, not max_new_tokens
    }
}
```
**Note:** `messages` is a list of dictionaries, structured like the `prompt` example above.
The API requires the data to be in a JSONL file
```python
import json

with open('batch.jsonl', 'w') as jsonl_file:
    jsonl_file.write(json.dumps(openai_queries))
```
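For more than one request, each dictionary goes on its own line of the `.jsonl` file, with no enclosing list. A minimal sketch of that, with placeholder content and made-up custom ids standing in for the real evaluation data:

```python
import json

# Hypothetical: two responses to grade, each becoming one batch request.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo-0125",
            "messages": [
                {"role": "system", "content": "You are an impartial judge."},
                {"role": "user", "content": text},
            ],
            "max_tokens": 1000,
        },
    }
    for i, text in enumerate(
        ["first response to grade", "second response to grade"], start=1
    )
]

# JSONL: one complete JSON object per line.
with open("batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```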
```python
from openai import OpenAI

client = OpenAI()

batch_input_file = client.files.create(
    file=open("batch.jsonl", "rb"),
    purpose="batch"
)
batch_input_file_id = batch_input_file.id
```
Create request
```python
# Keep the returned batch object: its id is needed to check status later
batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "nightly eval job"
    }
)
```
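The batch runs asynchronously within the 24-hour window, so the results are not available right away. One way to wait for it is a small polling loop; this is a sketch of my own (`wait_for_batch` is not part of the SDK), which takes the client as a parameter:

```python
import time

def wait_for_batch(client, batch_id, poll_seconds=30):
    """Poll until the batch reaches a terminal status, then return it.

    `client` is expected to be an openai.OpenAI instance; this helper
    name and the polling interval are my own choices, not SDK features.
    """
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_seconds)
```

Once the returned batch has `status == "completed"`, its `output_file_id` is the id to pass to `client.files.content`.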
Fetch response
```python
file_response = client.files.content(batch_input_file_id)
print(file_response.text)
```
I have tried consulting OpenAI's manual for batching and asking ChatGPT. Searching the internet has not yielded answers.
As far as I can tell, I am doing it by the book: formatting the request as a dictionary and writing it to a .jsonl file (.jsonl files have one JSON object on each line).
Oh wow. So the documentation says that when you retrieve the output, you write:

```python
from openai import OpenAI

client = OpenAI()
file_response = client.files.content("file-xyz")
print(file_response.text)
```

However, `file-xyz` is NOT the original input file id. The output file id does not exist until after the batch has completed. Once it has, you can find it as `output_file_id` on the batch object (for example via `client.batches.retrieve(...)`). Passing the original input file id to `client.files.content` just returns the input file you uploaded, which is why I was only getting the prompt back.
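For anyone parsing the result: each line of the output file is a JSON object whose `response.body` is a normal Chat Completions response. A sketch of extracting the model's text, using a made-up sample line in place of the real `file_response.text`:

```python
import json

# Made-up sample line shaped like the batch output format
# (one JSON object per line, completion nested under response.body).
sample_output = (
    '{"id": "batch_req_1", "custom_id": "request-1", '
    '"response": {"status_code": 200, "body": {"choices": '
    '[{"message": {"role": "assistant", "content": "The evaluation text."}}]}}, '
    '"error": null}'
)

for line in sample_output.splitlines():
    record = json.loads(line)
    answer = record["response"]["body"]["choices"][0]["message"]["content"]
    print(record["custom_id"], answer)
```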