What is the best way to benchmark Gen AI tools for code generation, code review, etc.?
Many organizations and individuals are evaluating Gen AI tools for code generation, test case generation, code review agents, and more. I am not aware of any standard currently available that produces a quantifiable score for these tools, one that could serve as a common benchmark.