I’m looking for references about hypothesis testing in software management. For example, we might wonder whether “crunch time” leads to an increase in defect rate – testing that claim rigorously turns out to be surprisingly difficult.
There are many questions on how to measure quality – that isn’t what I’m asking. And there are books like Kan which discuss various quality metrics and their uses – I’m not asking about that either. I want to know how one applies these metrics to make decisions.
E.g. suppose we decide to go with critical errors / KLOC. One of the problems we’ll have to deal with is that this is not a normally distributed data set (almost all patches have zero critical errors). Further, it’s not clear that we really want to examine the difference in means. So what should our alternative hypothesis be?
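To make the question concrete, here is a minimal sketch of two non-parametric options on a hypothetical per-patch critical-errors/KLOC sample: a Mann-Whitney U test, whose alternative hypothesis is that crunch patches tend to have higher defect density (a stochastic increase, not a shift in means), and a permutation test that keeps the difference in means as the statistic without assuming normality. The data arrays are placeholders, and the choice of tests is illustrative rather than a recommendation.

```python
# Sketch only: comparing a zero-inflated defect metric (critical errors per KLOC)
# between "crunch" and "normal" patches without assuming normality.
# The arrays below are hypothetical placeholders.
import numpy as np
from scipy import stats

# Hypothetical per-patch critical-errors/KLOC values; most entries are zero.
normal_patches = np.array([0, 0, 0, 0, 0.4, 0, 0, 1.2, 0, 0, 0, 0.7])
crunch_patches = np.array([0, 0, 1.5, 0, 0, 2.1, 0, 0.9, 0, 3.0, 0, 0])

# Mann-Whitney U: alternative hypothesis is that crunch patches are
# stochastically larger, i.e. tend to have higher defect density.
u_stat, p_value = stats.mannwhitneyu(crunch_patches, normal_patches,
                                     alternative="greater")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

# Permutation test: no distributional assumption, but keeps the
# difference in means as the test statistic.
def mean_diff(x, y, axis):
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

perm = stats.permutation_test((crunch_patches, normal_patches), mean_diff,
                              alternative="greater", n_resamples=10_000)
print(f"Permutation test p = {perm.pvalue:.3f}")
```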
(Note: Based on previous questions, my guess is that I’ll get a lot of answers telling me that this is a bad idea. That’s fine, but I’d ask that such answers be based on published data rather than on personal experience.)
I hope this answer is not too basic, but a simple yet, I believe, effective method for evaluating metrics is the control chart. This chart shows an expected performance range, and it is easy to spot when the metric goes outside it. Informally, one could plot defects per hundred committed changes against the day within a 28-day sprint; that would show whether changes early in the sprint were relatively defect-free while things turned ugly toward the end. It might also be interesting to compare before-and-after-Agile plots of days after the deadline vs. days before the deadline.
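As a rough illustration of that idea, here is a sketch of a u-chart (defects per committed change) across a 28-day sprint, assuming hypothetical daily counts in place of real issue-tracker data; the 3-sigma control limits vary with the number of changes accepted each day.

```python
# Sketch of a u-chart: defects per committed change over a 28-day sprint.
# Daily counts below are hypothetical; real data would come from the
# team's issue tracker and commit history.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
days = np.arange(1, 29)
changes_per_day = rng.integers(20, 60, size=28)        # commits accepted each day
defects_per_day = rng.poisson(0.05 * changes_per_day)  # defects later traced to that day

u = defects_per_day / changes_per_day                  # observed defect rate per change
u_bar = defects_per_day.sum() / changes_per_day.sum()  # centre line
ucl = u_bar + 3 * np.sqrt(u_bar / changes_per_day)     # upper limit varies with daily volume
lcl = np.clip(u_bar - 3 * np.sqrt(u_bar / changes_per_day), 0, None)

plt.step(days, ucl, where="mid", linestyle="--", label="UCL")
plt.step(days, lcl, where="mid", linestyle="--", label="LCL")
plt.axhline(u_bar, color="grey", label="centre line")
plt.plot(days, u, marker="o", label="defects per change")
plt.xlabel("day of sprint")
plt.ylabel("defect rate")
plt.legend()
plt.show()
```

Points above the upper control limit late in the sprint would be the "things turn ugly at the end" signal; the same plot drawn before and after an Agile transition would support the before/after comparison suggested above.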