When using planning poker for estimates, we’ve ended up with what is effectively a non-linear points scale. By that I mean that the value of a point is inconsistent between one-pointers and, say, eight-pointers. A couple of examples of each:
- 1 point: add tooltip text to a couple of icons; format date&time strings
- 3 points: create new user role and specify their permissions; automatically create a new case in CRM whenever certain actions are performed
- 5 points: watchlist functionality; filtering functionality on several facets
- 8 points: integrate with a third party provider *and* introduce a persistence layer to cache their responses
My concern is two-fold: we lose estimating fidelity by restricting ourselves to the low-end of the scale and any reporting (or calculating average velocity) will be erroneous.
How can I entice the scrum master, product owner and the team to use a more consistent, linear scale?
Edit: just to clarify, we are using one of the traditional decks that goes all the way up to 100… except we’ve somehow anchored ourselves to not go over 5, ending up with effectively a logarithmic scale.
> How can I entice the scrum master, product owner and the team to use a more consistent, linear scale?
You shouldn’t need to convince them. Reviewing inaccurate point estimates should be part of your retrospective. If any stories ended up being more complicated than anticipated, make note of it.
> We only estimated 8 points on STR-1234, but it was definitely more than 8 times as complex as a typical 1-point story. I’d say it was more like 13 times as complex. Possibly even in the 21-point range.
At that point, you’ll ideally have a conversation about whether the estimate could have been better with more time to discuss the story, whether it was high enough that it should have been given a huge estimate until being broken down into smaller stories, or whether the bad estimate was unavoidable — and maybe whether it should have been replaced with (or converted to) a spike during the sprint.
If you can, transition into that conversation without “convincing” anyone that points are informative and useful when done right; you just need to keep up the practice of discussing bad estimates at each review.
If your Scrum implementation is broken, you may get pushback.
> But it took the same amount of time as your other 8-pointers!
Or whatever.
At that point, if it occurs, you need to start a discussion about why you’re attempting to use a Scrum process and why successful Scrum implementations work. That discussion can take months, but it starts by asking your coworkers and boss what they expect out of the process.
At the very least, to address the question at hand, ask the product owner and team what they expect out of story point estimates. Ask them things like:
- Should they ensure that each sprint can be treated like a commitment?
- Should they ensure we don’t spend too much time on minimally valuable improvements?
- Should they help the product owner determine if X can be done N sprints from now?
The correct answer to all of the above is yes, of course. And to achieve any of those benefits, you need meaningful velocity data. If 8 != 1 * 8 in your reports, the data is meaningless.
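To make the third question concrete, here is a minimal sketch of a velocity-based forecast. The sprint velocities and the backlog size are made-up numbers, and `sprints_needed` is a hypothetical helper, not part of any tool. The forecast is only as trustworthy as the point values that feed it.

```python
import math
from statistics import mean

def sprints_needed(past_velocities, backlog_points):
    """Estimate how many sprints a backlog takes at the average past velocity."""
    average_velocity = mean(past_velocities)
    # Round up: a partly used sprint still has to happen.
    return math.ceil(backlog_points / average_velocity)

# Hypothetical data: points completed in the last four sprints.
past_velocities = [21, 18, 24, 21]
print(sprints_needed(past_velocities, backlog_points=100))
```

If an 8-pointer really hides 13 or 21 points of work, the average velocity in this calculation stops meaning anything, and so does the answer.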
For me there is nothing wrong with the Fibonacci scale. From what I read and see in your examples, the problem lies more in the estimates, or in the way you estimate.
I do not know the exact situation that you are in, but there seems to be an imbalance between the estimates at the lower end and the estimates at the higher end. Eight times a task like adding tooltip text to a couple of icons does not seem to match integrating with third-party software and caching the responses.
I know it is not good to compare story points to man-days, but I usually look for a story that will take the developers about one day of work, everything included. I don’t tell them that I think it is about one day of work, because I want to avoid them comparing points to days. This becomes our reference story, and it gets 2 story points. If another story is only half the work, it is a 1; if a story is smaller still, it might be a 1/2. We pick a 2-point reference story that allows us to go small enough. Bigger stories are always compared with the reference story and with stories that have already been given a higher number. In my team it helps to have some good reference stories lying on the table with 1/2, 1, 2, 3, 5 and 8 story point estimates. Humans are much better at comparing than at measuring.
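The arithmetic behind the reference story can be sketched as follows. This is only an illustration: in practice the team compares stories by feel rather than feeding in a number, and `relative_estimate` and the constants are names I made up for the example.

```python
REFERENCE_POINTS = 2              # the reference story described above
SCALE = [0.5, 1, 2, 3, 5, 8]      # the reference cards lying on the table

def relative_estimate(size_vs_reference):
    """Map 'how big compared with the reference story' onto the nearest card."""
    raw = REFERENCE_POINTS * size_vs_reference
    return min(SCALE, key=lambda card: abs(card - raw))

print(relative_estimate(0.5))   # half the work of the reference story
print(relative_estimate(2.5))   # two and a half times the reference story
```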
We also avoid going too big, as you do. The limit is usually set at 8 SP. The team is allowed to go above 8 SP in the original estimate, but this almost always results in splitting the story into smaller parts and estimating those parts again.
No one stops you from using a 4, a 6 or even a 10 if the team is not comfortable using the Fibonacci numbers for a certain story. We also do this from time to time, but we avoid doing it too often.
And yes, sometimes your estimates are wrong. It happens to every team. But the law of large numbers tells me that if I estimate 10 stories to be an 8, the over- and underestimates tend to cancel out, so these stories will be close to an 8 on average, provided the errors are not all in the same direction.
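The averaging argument only holds when the errors are not systematically skewed. A small simulation shows the difference, under assumptions I invented for the purpose: each story's real size is the estimate plus an optional fixed bias plus uniform noise. `average_actual_size` is a hypothetical helper.

```python
import random

def average_actual_size(estimate, story_count, bias=0.0, noise=3.0, seed=1):
    """Average 'real' size of stories that were all given the same estimate.

    Assumption: real size = estimate + bias + uniform noise in [-noise, +noise].
    """
    rng = random.Random(seed)
    sizes = [estimate + bias + rng.uniform(-noise, noise) for _ in range(story_count)]
    return sum(sizes) / len(sizes)

# Unbiased errors: the average stays close to the estimate of 8.
print(round(average_actual_size(8, 10_000), 1))
# Consistent underestimation: the average drifts to about 8 + bias.
print(round(average_actual_size(8, 10_000, bias=5), 1))
```

Random noise averages out; a team that always caps its estimates at 8 has a bias, and no number of stories will average that away.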
For your 8 SP story, I would recommend first taking away the uncertainty. I see a number of possibilities for this, which can even be combined.
- Split the story into an integration part and a caching part.
- Clearly define the cases to integrate instead of trying to integrate every specific case at once.
- Go for a timeboxed effort to get to know the third-party software better before estimating.
The biggest problem with using a non-linear story point scale is that it completely screws up Velocity calculations.
In other words, it makes it hard to know how many story points (how much complexity) you can ship in a fixed-length sprint.
Consider two sprints with different story point distributions, but the same total number of story points:
Sprint 1 breakdown (16 points):
16 x 1 story point
Sprint 2 breakdown (16 points):
8 x 2 story points
With a linear estimation scale and a steady velocity (e.g., 1 story point maps to 1 hour of work), both sprints contain comparable amounts of work. Both sprints will take roughly 16 hours, and the story point estimates clearly indicate this.
However, if the scale is non-linear (the example below is quadratic), you get totally misleading results.
Let’s assume your conversion scale is something like
n story points = n^2 hours
So sprint 1 turns out to be
16 x 1 story point =
16 x (1^2) hours =
16 * 1 hours
16 hours
Sprint 1, consisting of 16 story points, is 16 hours long.
Now let’s consider sprint 2.
8 x 2 story points =
8 x (2^2) hours =
8 x 4 hours =
32 hours
So, sprint 2, consisting of 16 story points, is actually 32 hours long.
This is twice as long as your first sprint because you’re attempting to deliver twice the complexity.
Thus, the impact of using a non-linear scale is that the number of story points in a fixed-length sprint doesn’t actually represent the amount of complexity being delivered.
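The same arithmetic as a short script, for anyone who wants to try other distributions. The two mappings are the ones assumed above (1 point = 1 hour, and n points = n^2 hours); the function names are my own.

```python
def sprint_hours(story_points, hours_for):
    """Total hours for a sprint, given each story's points and a points-to-hours mapping."""
    return sum(hours_for(points) for points in story_points)

def linear(points):
    return points          # 1 story point = 1 hour

def quadratic(points):
    return points ** 2     # n story points = n^2 hours

sprint_1 = [1] * 16   # 16 x 1 story point
sprint_2 = [2] * 8    # 8 x 2 story points

for name, mapping in [("linear", linear), ("quadratic", quadratic)]:
    print(name, sprint_hours(sprint_1, mapping), sprint_hours(sprint_2, mapping))
# prints:
# linear 16 16
# quadratic 16 32
```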
This will make calculating Velocity and using burndown charts very difficult.
That should be reason enough for your Scrum Master.
Clarification
The only reason I’m relating story points to time is because sprints have a fixed length in time.
Assuming your team has a fairly stable real velocity, estimating with a non-linear scale will make it so that two sprints of the same length in time with the same story point content being worked on at the same velocity are not actually comparable.
In other words, two sprints with the same total number of story points aren’t actually shipping the same amount of complexity.
I’ve used the Fibonacci sequence at several places and it has worked well. The points are a rough estimate and represent scale as much as a precise level. I’ve found that having to choose between 5 and 8 works better for getting the desired discussion, and ultimately agreement on points: the votes come out as 5-5-8-5-5-5-8-5 rather than 5-5-9-6-5-6-9-5.
To address your concern about losing estimating fidelity, I would recommend not worrying about it. Estimating for the future works best once there has been a build-up of several months of activity. A couple of sprints is not enough.
I recommend that your primary use of points be to:
Generate discussion of key points of the ticket when people disagree in their point scores.
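As a rough sketch of that use, a round of votes can be checked for disagreement and for who should explain their card. `needs_discussion` and `outliers` are hypothetical helpers, and the votes are the ones from my example above.

```python
from collections import Counter

def needs_discussion(votes):
    """Return True when the team did not all show the same card."""
    return len(set(votes)) > 1

def outliers(votes):
    """Votes that differ from the most common card; their owners explain first."""
    most_common_card, _ = Counter(votes).most_common(1)[0]
    return [vote for vote in votes if vote != most_common_card]

votes = [5, 5, 8, 5, 5, 5, 8, 5]
print(needs_discussion(votes), outliers(votes))
# prints: True [8, 8]
```

With a Fibonacci deck there are only two camps to reconcile here; with free-form numbers the same round would have produced three.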