My cheapest estimate
This article is a translation from the original French version available at Arolla’s blog.
Predictions are hard, especially about the future. Still, everybody wants some. In a coherent world, we would only need to predict the release of a very few features. However, since we are often forced into estimating everything, the quicker the better. Life is too short to waste your time on divination.
At a previous job, we could make predictions in an efficient and effective way. It took 2 or 3 hours to predict several months of releases for each team. And the results were pretty good, compared to the organizations I had seen before. In other organizations, predictions were only derived from cost estimates. In this organization, we relied as much as we could on what we had done in the past.
Let’s start with a disclaimer: I don’t have the numbers anymore, so approximations will have to do. Now for the context.
We were 3 or 4 teams, releasing 2 or 3 products on the shelf, more or less related to each other.
Dependencies, between teams or internal to teams, were extremely limited. We’ll come back to that later.
Backlog items were small. By that I mean they were done in 2 to 3 days, tops. I usually need to work on that issue first. In that organisation, it was already anchored in people’s minds.
Every delivered item was attached to a bigger item. Even bugs (I still don’t see the difference between bugs and the rest, I only see things to do, gaps towards an ideal). Using Jira, we called big items epics. It took time for the teams to realize epics needed to have an end in order to be useful. But in this article, let’s consider that epics were done within the release.
An epic was made up of 10 to 50 smaller items. We’ll call these smaller items tickets. Jira or not, I refuse to call them user stories: user stories are stories, about users, period.
At the end of a release, we could quickly know what the team had foreseen at the beginning of a release.
Recap. At the end of a release, we knew:
- What epics the team had prophesied.
- What epics were done.
- What tickets were done in each epic.
From there, we could announce the content of the next release:
- Estimate each epic to do.
- Compute each team’s capacity, in tickets.
- Compute the proportion of that capacity we can use for planned and unplanned items.
- Thus, predict the release content.
Since we knew how many tickets each past epic was made up of, we sorted past epics into big buckets. The following buckets were enough: 1, 3, 10, 30, 100. In my career, I quickly realized the Fibonacci sequence was too precise. For example, 1, 2 and 3 could be merged into a common 2. And when you predict the future, you don’t want to let people think you know what you’re doing. In the word estimate, there is estimate, never forget that.
Then, we took future epics, from highest to lowest priority, and put each one in the corresponding bucket, by comparing them to the done ones.
We avoided at all costs estimating the number of tickets it would take to finish the epic. That is the mistake you don’t want to make: predicting the future.
We agreed by consensus. A simple rule to settle conflicts: if we thought an epic didn’t fit a given bucket, it went to a bigger one.
This exercise took approximately 1h, with the whole team.
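The reference side of this exercise, sorting done epics into buckets, can be sketched in a few lines. This is only an illustration: the article does not say how a ticket count was rounded to a bucket, so the "closest bucket on a logarithmic scale" rule below is an assumption, and the names `bucket_for`, `reference_board` and `settle_disagreement` are hypothetical. Future epics were still placed by comparison with done ones, not by computing anything.

```python
import math

BUCKETS = [1, 3, 10, 30, 100]  # the scale used in the article


def bucket_for(ticket_count: int) -> int:
    """Bucket of a DONE epic: the closest bucket on a log scale (assumed rule)."""
    if ticket_count < 1:
        raise ValueError("a done epic has at least one ticket")
    return min(BUCKETS, key=lambda b: abs(math.log(ticket_count / b)))


def reference_board(done_epics: dict[str, int]) -> dict[int, list[str]]:
    """Group done epics by bucket, to serve as comparison points."""
    board: dict[int, list[str]] = {b: [] for b in BUCKETS}
    for name, tickets in done_epics.items():
        board[bucket_for(tickets)].append(name)
    return board


def settle_disagreement(*proposed_buckets: int) -> int:
    """One reading of the conflict rule: when in doubt, take the bigger bucket."""
    return max(proposed_buckets)


# Hypothetical past release: epic name -> number of tickets done.
done = {"login": 12, "invoices": 45, "dark mode": 2, "migration": 60}
board = reference_board(done)
```

With this board in front of them, the team only has to answer "does this future epic look like `login` or like `migration`?", which keeps the discussion on comparison rather than on counting tickets that do not exist yet.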
Some people told me I should talk about t-shirt sizing, when they saw the 1/3/10/30/100 scale. A few comments about that:
- T-shirt sizing has no arithmetic. You can’t add t-shirt sizes.
- That being said, it’s also its main quality. And choosing buckets for epics while ignoring how many tickets will compose them is very close to t-shirt sizing. We briefly tried that, but participants were a lot faster with numeric cues. It is a matter of culture, and it depends on the team.
- When I compare kids playing in the kindergarten and the seriousness of estimation meetings, I realize we could also use animals: fly, dog, elephant, diplodocus. The main advantage of this scale is that diplodocuses don’t exist, and we would always like big backlog items not to exist either.
Computing team capacity is pretty simple. If you did 200 tickets in the last release, you can predict your team will do 200 in the next one.
You could scale that number proportionally (rule of three) when release durations vary. But be wary of such computations: releases always have more or less exploratory periods, and these are not spread evenly over time.
Applying linear computations for number of people, availability of people, or holidays, is even more dangerous. I prefer considering a team as a whole. Waiting times explain durations far more than a team’s ability to parallelize its tasks does.
Actually, I think that the function (working time -> team capacity) is not computable or predictable. It is not increasing, not even continuous, and certainly not linear. Don’t evaluate team capacity from working time, it’s simpler.
This exercise took about 15 minutes for the scrum master and the PO.
By comparing what had been planned and what had been done, we got an unplanned rate. Between the start and the delivery of a release, we change our minds, reprioritize, discover functional holes, technical debt, and so on. Well, there is no reason for that to change. Therefore, we considered that unplanned rates would remain identical between releases.
We only considered planned and unplanned epics. We didn’t need to categorize tickets of an epic.
Our unplanned rates were approximately 65 to 75%. That is to say, 65 to 75% of what we did in a release had not been foreseen at the beginning of the release. That’s the way it is. Just take reality as it is, don’t try to distort it. Neither should you do hope-based planning, by firmly asserting that you won’t change your mind next time.
Taking new information into account is good news. If you have little unplanned work, don’t take that as good news without digging into it. There is a great chance that someone is burying their head in the sand, or, like the three monkeys, is shutting every door, yelling lalalala, prostrate in the corner of a meeting room.
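As a rough sketch of that measurement, assuming each done epic carries its ticket count and a flag saying whether it had been announced at the start of the release (the `DoneEpic` type, its fields and the sample data are hypothetical, not taken from any tool):

```python
from dataclasses import dataclass


@dataclass
class DoneEpic:
    name: str
    tickets_done: int
    planned: bool  # True if the epic was announced at the start of the release


def unplanned_rate(epics: list[DoneEpic]) -> float:
    """Share of done tickets that belong to epics nobody had announced."""
    total = sum(e.tickets_done for e in epics)
    if total == 0:
        raise ValueError("no tickets were done in this release")
    unplanned = sum(e.tickets_done for e in epics if not e.planned)
    return unplanned / total


# Hypothetical release: 100 planned tickets out of 400 done.
release = [
    DoneEpic("checkout", 60, planned=True),
    DoneEpic("search", 40, planned=True),
    DoneEpic("urgent customer request", 120, planned=False),
    DoneEpic("tech debt cleanup", 180, planned=False),
]
rate = unplanned_rate(release)
```

Only the epic is flagged as planned or not; individual tickets are never categorized, which is what keeps the excavation cheap.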
This exercise took approximately 1 hour for the PO, depending on the difficulty of archeological excavations.
We had team capacity, and an unplanned rate. From these, we knew how many tickets we could plan. For example, if we had done 400 tickets in the previous release, including 100 planned tickets, then we could plan 100 tickets for the next release. Having estimated epics in number of tickets, we knew which epics we could announce for the next release.
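The arithmetic of this last step fits in a short sketch. The function names and the backlog are hypothetical, and stopping at the first epic that no longer fits (rather than skipping it to squeeze in a smaller one) is an assumption made here to respect the priority order.

```python
def plannable_tickets(capacity: int, rate: float) -> int:
    """Tickets left for planned work once the unplanned share is set aside."""
    return round(capacity * (1 - rate))


def predict_release(epics_by_priority: list[tuple[str, int]], budget: int) -> list[str]:
    """Announce epics in priority order until the ticket budget runs out."""
    announced = []
    remaining = budget
    for name, bucket in epics_by_priority:
        if bucket > remaining:
            break  # assumption: never skip ahead of a higher-priority epic
        announced.append(name)
        remaining -= bucket
    return announced


# Figures from the example above: 400 tickets done, 75% of them unplanned.
budget = plannable_tickets(capacity=400, rate=0.75)

# Hypothetical backlog: (epic, bucket), highest priority first.
backlog = [("checkout", 30), ("search", 30), ("export", 30), ("sso", 10), ("reporting", 30)]
announced = predict_release(backlog, budget)
```

Here the budget is 100 tickets, so the first four epics are announced and `reporting` waits for a later release.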
Turtles all the way
What we did for a release could be adapted to other levels of granularity. We used a variation of this method for 2-week iterations, counting tickets instead of epics.
The only thing you need to know is, unplanned rate must be evaluated independently for each level of granularity. Knowing you have 75% unplanned work at the release level won’t help you evaluate uncertainty at the iteration level, and vice versa. It can be more, it can be less.
Why did this method work?
First, we found the right level of granularity. As in all nested systems, there are one or more stable (i.e. predictable) levels. In our case, stable levels were epics for the release, and tickets for the iteration. We were lucky to have a system with stable levels, and to identify them the first time.
Then, as already mentioned, dependencies were very few. Workload thus explained most of delivery times. That is rarely the case. In general, items spend most of their time in queues. When that happens, estimating workload is useless. Measure lead times instead, and start from there.
Finally, the law of large numbers and having many small tickets per epic helped smooth out disparities. At the epic scale, and considering the error inherent in divination, we could consider all tickets as identical.
It’s up to you to find the stable elements of your system. This method was the result of several iterations. Stable elements will help you predict the future. By measuring these stable elements, and projecting them into the future, you have a higher chance of coming up with realistic forecasts. Avoid estimating what’s going to happen, at all costs. Too many biases will pollute your calculations.
I didn’t invent anything: this is more or less the #NoEstimates approach. I thought of proposing a few links about this approach here, but the internet has already taken care of that. It’s up to you now. Happy exploring.