I am trying to use Llama 3 on Vertex AI to process an image, extract data from it, and return that data as JSON. I have this working with Gemini in a Jupyter notebook hosted on Vertex, but the output varies so wildly and inconsistently that it isn't useful at this point, so now I'm trying to achieve the same thing with Llama 3, also from a notebook.
First I create an instance of the `Endpoint` class:

```python
model = GenerativeModel(model_name="llama3")  # Not sure if this is even the correct model name, but the eventual call to predict() stopped throwing a 404 with this value
endpoint = aiplatform.Endpoint('123456')
```
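For completeness, the surrounding setup looks roughly like this (the project and region values here are placeholders, not my actual ones):

```python
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel

# Placeholder project/region; the endpoint ID is the numeric ID from the console
aiplatform.init(project="my-project", location="us-central1")
```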
Then I load an image from disk:
```python
imageBytes = image_to_byte_array(image_path)
```
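For reference, `image_to_byte_array` is a thin helper that does nothing beyond reading the file, roughly:

```python
def image_to_byte_array(image_path: str) -> bytes:
    # Read the image file from disk as raw, unencoded bytes
    with open(image_path, "rb") as f:
        return f.read()
```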
I create my prompt:

```python
question = "You are a helpful assistant. Your task ..."
```
Finally, I call `predict()`:

```python
response = endpoint.predict(instances=[question, imageBytes])
```
This call returns a 400 error with the message:

```
FailedPrecondition: 400 The request size (1575204 bytes) exceeds 1.500MB limit.
```
I've confirmed that the image is about 120 KB and the prompt itself is 604 bytes, so my inputs are nowhere near 1.5 MB. I can't figure out why the call would fail with a size-exceeded error. Does anyone have any clues about what to check next?
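My only theory is that the request gets inflated when the SDK serializes it, so I tried approximating the on-the-wire size. This sketch assumes the instances end up JSON-encoded with the image base64-encoded, which is my guess at how the request is packaged, not something I've confirmed against the SDK internals:

```python
import base64
import json

# Approximate the serialized request size, assuming the SDK JSON-encodes
# the instances and binary data is sent base64-encoded (my assumption,
# not confirmed against the SDK internals)
payload = json.dumps(
    {"instances": [question, base64.b64encode(imageBytes).decode("utf-8")]}
)
print(f"Approximate request size: {len(payload.encode('utf-8'))} bytes")
```

Even with base64's roughly 33% overhead, 120 KB shouldn't come anywhere close to 1.5 MB, so I suspect something else happens during serialization, but I don't know what.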