A lot of organizations and individuals are evaluating Gen AI tools for code generation, test case generation, code review agents, and more. As far as I know, there is currently no standard that lets these tools produce a quantifiable score we could use as a common benchmark.
I can think of some generic metrics like acceptance rate, false positive rate, and potential edge case coverage rate for generated code, test cases, or code review comments.
I'm creating this thread to capture good data-driven, automatable metrics like these.
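
To get the discussion started, here is a minimal sketch of how a few of these could be computed automatically from reviewer dispositions. The `Suggestion` record, its field names, and the numbers are placeholders I made up for illustration, not from any particular tool or dataset:

```python
from dataclasses import dataclass

# Hypothetical record of one AI-generated suggestion (a code change, a test
# case, or a review comment) together with the human reviewer's verdict.
# Field names are placeholders, not taken from any existing tool.
@dataclass
class Suggestion:
    accepted: bool        # reviewer kept the suggestion (possibly with edits)
    flagged_issue: bool   # the suggestion claims a defect or missing behavior
    issue_was_real: bool  # human verdict: the flagged issue actually exists

def acceptance_rate(suggestions: list[Suggestion]) -> float:
    """Share of generated suggestions that a human accepted."""
    return sum(s.accepted for s in suggestions) / len(suggestions)

def false_positive_rate(suggestions: list[Suggestion]) -> float:
    """Among suggestions that flagged an issue, the share that were wrong."""
    flagged = [s for s in suggestions if s.flagged_issue]
    if not flagged:
        return 0.0
    return sum(not s.issue_was_real for s in flagged) / len(flagged)

def edge_case_coverage(covered_ids: set[str], checklist_ids: set[str]) -> float:
    """Fraction of a predefined edge-case checklist hit by generated tests."""
    return len(covered_ids & checklist_ids) / len(checklist_ids)

if __name__ == "__main__":
    batch = [
        Suggestion(accepted=True, flagged_issue=True, issue_was_real=True),
        Suggestion(accepted=False, flagged_issue=True, issue_was_real=False),
        Suggestion(accepted=True, flagged_issue=False, issue_was_real=False),
    ]
    print(f"acceptance rate:     {acceptance_rate(batch):.2f}")
    print(f"false positive rate: {false_positive_rate(batch):.2f}")
    checklist = {"empty input", "null id", "max length", "duplicate key"}
    covered = {"empty input", "max length"}
    print(f"edge case coverage:  {edge_case_coverage(covered, checklist):.2f}")
```

The hard part, of course, is getting the reviewer dispositions and the edge-case checklist in a structured form in the first place; interested in how others are capturing that data.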