Objective
I’m working on an idea for a game in which the user answers questions. The mechanic seems quite simple, but the implementation is turning out to be a lot more complex than I expected.
Users are prompted, chatbot style, to answer some questions, and their responses are used to score their performance on criteria such as accuracy of the answer, humour, etc.
Ultimately I’m trying to find a model or approach, using LLMs or NLP, that can accurately judge whether a sentence correctly answers a question, either with a supplied answer as context or from implicit knowledge.
What I’ve tried so far
Worth mentioning that I am very much a hobbyist trying to figure something out, not a data scientist or ML engineer. I thought this was an interesting problem statement, though, and I’m keen to see what else I could try.
Approaches I’ve tried:
Use ChatGPT. ChatGPT can really do what I’m asking for with its existing knowledge base. It can also give some sort of quantified score. E.g. “You are a school child. Ask me a question about your geography homework, then score my response based on accuracy, detail, etc.”
With some tuning this could work, if inconsistently, but there’s a major problem: I don’t want to be paying per call (not that I’m expecting hundreds of people to play this game), so if I could do this locally that would be great. It would also make the chat part of the game feel responsive and real.
Run a local LLM. Running a local LLM for the chat part, as above, makes sense and works quite well. However, the smaller models that are actually runnable locally really struggle with complex comparisons and with specific return formats such as an accuracy or detail score. I considered splitting things up: using the local LLM for the chatbot and then running my own NLP solution over the answers.
Run a hand-rolled-ish answer checker. I’ve had limited success using answer extraction to pull the answer to a question out of a text dump (Wikipedia) and then using sentence similarity to compare it with the user’s answer. This works but has limitations, because the answer extraction is fairly rudimentary: the parts of an answer must be next to each other in the text, and it can’t handle multi-part answers.
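To make the multi-part limitation concrete, here is a minimal sketch of one possible workaround, not a tested solution: keep the expected answer as a list of separate parts, split the user’s response into clauses, and score each part against its best-matching clause, so the parts can appear anywhere in the response. Everything here is an assumption for illustration: the names `score_answer`, `token_overlap` and `split_clauses` are hypothetical, and the token-overlap measure is a crude stdlib stand-in for a real sentence-similarity model, which could be swapped in through the `similarity` parameter.

```python
import re

def tokens(text):
    """Lowercase word tokens. A crude stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_overlap(expected, clause):
    """Fraction of the expected part's tokens that appear in the clause.

    Hypothetical placeholder: a sentence-embedding similarity could be
    substituted here, as long as it returns a float in [0, 1].
    """
    want = set(tokens(expected))
    have = set(tokens(clause))
    return len(want & have) / len(want) if want else 0.0

def split_clauses(text):
    """Split a response on punctuation and the word 'and'."""
    parts = re.split(r"[.;,!?]|\band\b", text, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]

def score_answer(response, expected_parts, similarity=token_overlap, threshold=0.5):
    """Return (coverage, per_part) for a response.

    per_part maps each expected part to its best score over all clauses.
    coverage is the share of expected parts whose best score reaches
    the threshold (an arbitrary value chosen for this sketch).
    """
    clauses = split_clauses(response)
    per_part = {
        part: max((similarity(part, c) for c in clauses), default=0.0)
        for part in expected_parts
    }
    matched = sum(1 for s in per_part.values() if s >= threshold)
    coverage = matched / len(expected_parts) if expected_parts else 0.0
    return coverage, per_part

# Usage: a three-part answer where the user gets two parts right.
coverage, per_part = score_answer(
    "Red and blue, I think. Maybe yellow?",
    ["red", "green", "blue"],
)
print(coverage, per_part)
```

The design choice is that partial credit falls out of the coverage value, which could feed the accuracy score, while the per-part scores show which parts were missed. The expected parts would still have to come from somewhere, such as a supplied answer key.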
What next?
This is a big post that basically asks whether there’s something else I should try, or whether I should give up given the limitations of today’s chatbots running locally. Searching for models on Hugging Face is quite hard, so I want to make sure I haven’t missed some silver bullet for answer checking, but I’m happy if the consensus is that this isn’t really possible or that my current attempts are the best I’ll be able to achieve.