Comparing two of the Llama 3 models: 8B and 405B. The 8B I can run locally with Docker; the 405B is cloud-hosted. That difference shouldn't matter here beyond each model's capacity to answer questions.
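For context, here's roughly how I set up the two models. This is a minimal sketch: the Ollama model tag, endpoint URL, and 405B model name below are placeholders for my actual setup, not the real values.

from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

# Local 8B served from an Ollama container (placeholder model tag)
llm_8b = ChatOllama(model="llama3:8b-instruct-q4_0", temperature=0)

# Cloud-hosted 405B behind an OpenAI-compatible API (placeholder endpoint/model)
llm_405b = ChatOpenAI(
    base_url="https://example-provider.com/v1",  # hypothetical endpoint
    api_key="...",
    model="llama3-405b-instruct",
    temperature=0,
)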
I think I've hit a wall getting Llama3-8b-instruct to respond with structured output on a simple query decomposition task. This seems like it should be dead easy for Llama 3 8B, but it won't respond with anything other than None.
from typing import List
from pydantic import BaseModel, Field

class SubQuery(BaseModel):
    """Given a user question, break it down into distinct sub questions that you need to answer in order to answer the original question."""
    sub_questions: List[str] = Field(description="The list of sub questions")

sub_question_generator = llm.with_structured_output(SubQuery)
Invoking it with a question against Llama 3 405B:
print(sub_question_generator.invoke(question))
gives the nice, structured response I expect:
sub_questions=['What is the goal of GIRFT Urology programme', 'What is TURBT', 'What is URS', 'What are the similarities between TURBT and URS', 'How are TURBT and URS procedures optimized by GIRFT Urology programme']
but when using Llama 3 8B locally, the response is pretty disappointing:
None
Has anyone experienced this diminished capability with Llama3-8b-instruct, or have any advice to kickstart Llama 3 8B into handling with_structured_output correctly?
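For what it's worth: since with_structured_output defaults to tool calling under the hood, my guess is the 8B model never emits the tool call, so the parser hands back None. One fallback I'm considering, but haven't tested, is asking for JSON explicitly in the prompt and parsing the reply with PydanticOutputParser. A sketch:

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = PydanticOutputParser(pydantic_object=SubQuery)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Break the user's question into distinct sub questions.\n"
     "{format_instructions}"),
    ("human", "{question}"),
]).partial(format_instructions=parser.get_format_instructions())

# Pipe the formatted messages through the model, then parse the
# JSON reply back into a SubQuery instance
chain = prompt | llm_8b | parser
print(chain.invoke({"question": question}).sub_questions)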