Risk of bias with automated example generation using QAGenerationChain
I was learning the basics of LLM evaluation, and the framework is to generate sample question-and-answer pairs. Later, the predictions returned by the model under evaluation are compared with the “correct answer”.
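To make the setup concrete, here is a minimal sketch of the comparison step as I understand it. It does not call LangChain at all: the `examples` list is hand-written to stand in for generated question/answer pairs (the `question` and `answer` keys are my assumption about their shape), and `normalize`, `evaluate`, and `toy_model` are hypothetical helpers I made up for illustration. Exact string matching is used only as a simple stand-in for whatever grading the real framework does.

```python
from typing import Callable

# Hand-written stand-ins for generated question/answer pairs.
# Assumption: each example is a dict with "question" and "answer" keys.
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many days are in a week?", "answer": "7"},
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(examples: list[dict], predict: Callable[[str], str]) -> float:
    """Return the fraction of predictions that match the reference answer."""
    correct = 0
    for example in examples:
        prediction = predict(example["question"])
        # The generated answer is treated as ground truth here, so any
        # mistake in it is counted against the model being evaluated.
        if normalize(prediction) == normalize(example["answer"]):
            correct += 1
    return correct / len(examples)


def toy_model(question: str) -> str:
    """Hypothetical model under evaluation: knows one answer, misses the rest."""
    canned = {"What is the capital of France?": "paris"}
    return canned.get(question, "I don't know")


accuracy = evaluate(examples, toy_model)
print(f"accuracy: {accuracy:.2f}")
```

The point of the sketch is that the score depends entirely on the reference answers: if those are produced automatically, their errors or quirks flow straight into the accuracy number.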