Episode #69 – Cognitive Bias in Estimation with Jim Benson

Featured speakers: Clayton Lengel‑Zigich and Drew LeSueur

Clayton Lengel‑Zigich and Drew LeSueur chat with Jim Benson about:

  • Cognitive Bias
  • Estimation versus estimates
  • Estimation accuracy
  • Planning Poker™
  • Groupthink
  • Meaningful estimates
  • Story points interpreted as money or time
  • Story points vs cycle time
  • Estimating new projects

You can find Jim on Twitter @ourfounder and learn more about his company, Modus Cooperandi, Inc.



Clayton Lengel‑Zigich:  Welcome to another episode of the Agile Weekly Podcast. I’m Clayton.

Drew LeSueur:  I’m Drew.

Clayton:  With us today, we’ve got Jim Benson. You might know him better on Twitter as ourfounder. We wanted to talk today about cognitive bias and estimation. Jim, I see that you’ve written an eBook, or Kindle book, about plans and cognitive biases, biases in general and how they play into plans. Could you give us an overview of what prompted you to write that book?

Jim Benson:  Sure. My career actually started in psychology, and I worked my way through being an urban planner, where I built really, really large things, like subway systems and freeways. Later, when I came to software development, it was incredibly obvious to me that people just couldn’t estimate their way out of a paper bag.

Most of the breakdowns in projects regardless of what they were or who they were for, generally centered around problems with the estimates.

I started to look into the reasons why that was, and I found clues in psychology: how we approach problems, how we gather information, how we make decisions. All of those combine to really muck up our estimates.

Clayton:  OK. One thing, and I don’t know if you are part of this camp, but a popular mindset nowadays seems to be that estimates are wasteful: no one ever gets them right, so why bother doing them? Where do you stand on that?

Jim:  Eisenhower said that planning was indispensable and that plans were useless. I believe that the same thing is true here: estimation is indispensable, and estimates are useless.

Going through the exercise of estimating is actually rather important. When you change it to an active word instead of a physical object, going from an estimate to estimating, then estimating becomes something that you do constantly throughout the project and that’s much more helpful.

Clayton:  That is an interesting way to think about it. Obviously there are a lot of teams that have probably experienced this: they do some estimates, and those get held against them, or something like that. But I agree that there is something important about that mental exercise.

Now, in terms of some of the biases, or some of the psychological factors you hinted at, what are some examples you could give us of things that agile teams might face?

Jim:  I’ll just start to cherry‑pick and I’ll come to the big one, maybe third.

The first one is something called the “availability heuristic.” We look back on things that have happened, pick out exemplars of either our fears or our hopes, and then start to make decisions based on those exemplars. The worst part of this is that an exemplar can be the status quo.

What we found is that the error in estimates follows almost a perfect Pareto curve, or almost a perfect power curve. We start to feel that we’re very good at estimates because we actually do get them right, about right, or excusably right about 80 percent of the time. The other 20 percent of the time, we perceive that we are dead wrong, so we say, “If I could just get that last 20 percent right, everything would be fine.”

But it’s actually a natural power curve, a natural law, especially in software development, that we’re always going to fall prey to, because there’s too much variation in our work for us to estimate accurately 100 percent of the time.
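Jim’s 80/20 claim can be illustrated with a quick simulation. This is purely a hypothetical sketch, not data from the episode: it assumes actual task durations follow a Pareto (power-law) distribution, with the shape parameter tuned so roughly 80 percent of outcomes land within 50 percent of the estimate while the tail produces the occasional blowout.

```python
import random

random.seed(42)

ESTIMATE_HOURS = 3.0  # the "typical" duration the team commits to

def actual_duration():
    # Pareto-distributed actuals: the minimum equals the estimate, and
    # shape alpha=4 puts roughly 80% of outcomes within 1.5x the estimate.
    return ESTIMATE_HOURS * random.paretovariate(4.0)

durations = [actual_duration() for _ in range(1000)]

# "Right, about right, or excusably right": within 50% of the estimate.
about_right = sum(d <= ESTIMATE_HOURS * 1.5 for d in durations)
print(f"about right: {about_right / len(durations):.0%}")
print(f"worst case:  {max(durations):.1f} hours")
```

However the tail is tuned, some fraction of “three-hour” tasks will blow far past the estimate, which is the point: the misses aren’t a skill problem, they’re the shape of the distribution.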

Clayton:  Take Scrum, a methodology that puts a lot of emphasis on predictability and timeboxes. Usually, in most of the training literature, there’s a lot of emphasis on the estimates, specifically Planning Poker.

On a scrum team, do you think they just need to find some way to overcome some of those biases and problems, and just deal with it? Or should they find creative ways to avoid those issues, or creative ways to do estimates differently?

Jim:  I’ll start this by saying it is dangerous to ask me about Planning Poker, and then I will try my best to state this as distinctly as I possibly can. Planning Poker itself was devised to get around groupthink and the other cognitive biases that plague teams.

The theory behind it was that if you got people together and they, in silence and at least cognitively separated from each other, came out with an opening bid for the number of story points for a given story, task, or feature, whatever you are estimating against, that would overcome the bias. What happens, almost uniformly, is that as teams do Planning Poker over time, their estimates become more uniform.

People see that as a good thing, because they see the team becoming more accurate. But what’s actually happening is that the team is learning how everybody else learns. There’s a heuristic being developed within the team that says: when we as a team see this, we do these things. The individuals stop acting like individuals, they start acting like a team, and over time Planning Poker becomes less and less useful, because the individual is sublimated to the will of the group.

A lot of people will argue with me about that, because it’s a hard thing to swallow as an individual; you don’t feel like you’re doing it. But the coalescing of the team’s estimates is a very good indication that that is indeed what is happening.

Clayton:  Let me come full circle with the groupthink thing. Take a typical scrum team that is coalescing on its estimates: if they are using their velocity, maybe their estimates are all close to each other because they are all, perhaps subconsciously, learning from one another.

But does that really matter in the overall scheme of things, or is it something they should avoid?

Jim:  It doesn’t matter, because they are still going to be wrong 20 percent of the time, and not because they’re wrong. I want to make this clear: the people doing Planning Poker are not wrong. Say you have a three‑point story, you have six of them, and four of them are right. Let’s say a three‑point story takes three hours to complete.

They do 3, 3, 3, 3, and that’s fine, and the next one takes 7 hours and the next one takes 11. That is a valid distribution on our Pareto scale. All three of those durations are valid for that three‑point story.

The problem is that our commitment, the sprint commitment, does not take that into account. One of those stories is going to end up taking 11 hours. It’s going to make people blow their sprint commitment, people are going to feel bad about that, and then they’re going to wonder why and try to fix it the next time. But it’s not really something to fix, because statistically that was a valid distribution.
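The arithmetic behind Jim’s example is worth making explicit. A minimal sketch, using his hypothetical numbers: six three‑point stories, a commitment that assumes three hours apiece, and actual durations of four at 3 hours, one at 7, and one at 11.

```python
# Six "identical" three-point stories and the hours each is assumed to take.
ASSUMED_HOURS_PER_STORY = 3
story_count = 6
actual_hours = [3, 3, 3, 3, 7, 11]  # the distribution Jim describes

committed = ASSUMED_HOURS_PER_STORY * story_count
delivered = sum(actual_hours)

print(f"sprint commitment: {committed} hours")  # 18
print(f"actual effort:     {delivered} hours")  # 30
# Every individual estimate was statistically valid, yet the sprint
# commitment is blown by 12 hours, two-thirds over.
```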

Clayton:  I would question that a little bit, Jim. You’re absolutely right that, if you look at probability theory and roll three six‑sided dice, or six six‑sided dice, enough times, you’re going to get a distribution curve just like what you’re talking about. But if the team were to take a commitment‑driven approach to planning, I would argue that they would know that one three‑point story was 11 hours and one three‑point story was 7 hours before they actually made the commitment.

Jim:  And that’s awesome. That’s completely awesome, and I’m happy with that. The thing is that right now, this conversation we’re having is about a hypothetical team operating at a hypothetical level of self‑awareness, and most teams, A, aren’t that self‑aware and, B, don’t have the luxury to backpedal when they find out that there is variation in their system.

Clayton:  Yeah, I think the real problem is that we’ve propagated, for far too long in this community, the idea that yesterday’s weather is a good way, when using story points, to predict tomorrow, and then we’ve actually asked teams to make commitments against yesterday’s weather.

As any weatherman would tell you, they’re probably not willing to bet their jobs on yesterday’s weather, and I don’t think that sprint teams should be doing that either. So how do we start educating people that, if you’re going to take the time to estimate, and if you’re going to take the time to use velocity, you should use the proper techniques to actually make it meaningful?

I would almost argue that, for most teams, it is not even worth taking the time to estimate, because they’re not doing release planning. They’re estimating, I guess, just to make their boss feel they’re doing as much work as they say they can do, but it’s not being used for anything meaningful, so it almost defeats the purpose.

Jim:  Yes, I would agree with that. I do want to make it clear that my dislike of story points as a measure doesn’t really have too much to do with agile or the act of estimating. It has to do with creating a system that doesn’t translate well from one part of the organization to another.

Story points end up being integers. Those integers are communicated to people who try and interpret them, and they’re going to interpret them incorrectly, because, for the team, it’s a relative measure of what ideally would be bizarreness. That’s a 13‑point story, because it’s really quite bizarre.

But other people are going to uniformly interpret those as either money or time. That’s very dangerous, and it leads to a lot of unnecessary conversations and unnecessary meaning. I actually think that the act of estimation is awesome, but creating an artifact that can be so easily misinterpreted is dangerous.

Clayton:  Is it really dangerous, or is it just being used improperly? One of the things I would argue is that the reason teams should probably use points, if they’re going to estimate, is precisely because they are integers, and they can be used for things. The problem is that people use them incorrectly.

If I try, as you stated earlier, to say how many hours three points equals, that is very, very dangerous, because three points, by itself, is going to be highly variable. However, if I’ve got 100 three‑point stories in my backlog or in my release plan, I can probably get some normalized numbers out of there.

If I say, “Hey, for the last X number of sprints, we’ve been doing roughly X number of story points,” while still an estimate, I can predict the future to a degree ‑‑ several weeks or even a month or so out ‑‑ to be able to say, “I should be able to get roughly somewhere in the neighborhood of this.”

I think people get in trouble when they make them absolutes, when they don’t have the discussions around it, and then they take those numbers to be, “Well, you said you could do 30 story points, so I took 30 story points times 10 iterations, and, by God, you’d better give me 300 story points.” That’s dangerous.
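The difference Clayton is drawing can be sketched numerically. This is a hypothetical illustration with made-up velocity figures, contrasting a ranged, “neighborhood” forecast against the dangerous “velocity times iterations, by God” absolute.

```python
from statistics import mean, stdev

# Hypothetical velocities (story points) from the last six sprints.
velocities = [28, 33, 25, 31, 30, 27]
sprints_remaining = 10

avg = mean(velocities)       # 29.0
spread = stdev(velocities)   # roughly 2.9

# The dangerous version: treat the average as an absolute commitment.
absolute = round(avg) * sprints_remaining

# The defensible version: forecast a neighborhood, not a number.
low = (avg - spread) * sprints_remaining
high = (avg + spread) * sprints_remaining

print(f"absolute demand: {absolute} points")
print(f"neighborhood:    {low:.0f} to {high:.0f} points")
```

The ranged forecast forces the conversation about uncertainty that the single number lets everyone skip.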

Jim:  Exactly.

Drew:  You talk about the state of estimating. I’m wondering, what does that look like? If it’s not story points, what does that look like? Also, how did you apply that with the bigger projects like subway systems, or whatever, that you did?

Jim:  I will answer that by skipping forward to [inaudible 13:28], because, if we’re having a cognitive bias conversation, I should probably say something about cognitive bias. The one I’ll talk about really quickly here is the planning fallacy. The planning fallacy is exemplified by something called Hofstadter’s Law.

Hofstadter’s Law states that, when given a task, people will uniformly underestimate that task, even when they are aware of Hofstadter’s Law.


Jim:  The planning fallacy basically says that we, as individuals, are really lousy at estimating: we’re extremely lousy when estimating for other people, we’re unbelievably lousy when estimating for ourselves, and we’re just super incredibly terrible at it when estimating for ourselves with witnesses.

When you get into a situation where you’re estimating, we have a lot of natural tendencies to underestimate the thing that we’re estimating. This has been tested by psychologists, social scientists, and behavioral economists around the world. It’s been shown to be a cross‑cultural, universal human condition.

Part of the reason for that, I believe, is that we don’t understand the role of variation in our work, so we don’t understand that Pareto distribution all along a three‑point story. Therefore, when we promise things to people, we promise them like, “Every time I do this task, it will take me three hours.”

In software development, we do not have that luxury. Our work is way too variable. What I replace that with is either cycle time or lead time, with some sort of visual control. It might be a kanban board, and it might be something else, but if you have a system that can measure from when you start working on a task, a feature, or a user story to the point when you finish it, that’s it ‑‑ cycle time.

It doesn’t care what excuse you have about why it took longer than you thought it would. What it will say is, “That task was a three‑hour task that I got interrupted four times, so it took me 12 hours.” I’ve got bad news for you. From a delivery standpoint, it was a 12‑hour task.
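A minimal sketch of the cycle-time measurement Jim describes, with hypothetical task names and timestamps: record when work starts, record when it finishes, and subtract. The measure is deliberately indifferent to interruptions and excuses.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical start/finish timestamps pulled off a visual control board.
tasks = {
    "login form": ("2012-06-01 09:00", "2012-06-01 12:00"),
    "search fix": ("2012-06-01 13:00", "2012-06-02 09:00"),  # interrupted
    "csv export": ("2012-06-04 10:00", "2012-06-04 13:00"),
}

def cycle_time_hours(start, finish):
    # Cycle time is simply finish minus start, whatever happened in between.
    delta = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

for name, (start, finish) in tasks.items():
    print(f"{name}: {cycle_time_hours(start, finish):.0f} h")
```

The “search fix” row is the three-hour task that got interrupted; by this measure it is a 20-hour task, full stop.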

Replacing story points with an actual statistical measure of what is completed, and how long it takes to complete it, is extremely powerful. When we do that, we get an added benefit. We used to have to say, “Well, this is a two‑point story,” “This is a three‑point story,” “This is an eight‑point story,” and so forth.

Now, we can say, “That story is too big,” or, “This story, I can do. I don’t care if it’s going to take me an hour or if it’s difficult, and I don’t care if it takes me 13 hours,” because, over the course of the project, the distribution of those stories from small to large is going to be relatively stable, relatively like it was in the last project.

We’ve been finding that that distribution is uniform, or fairly uniform, between projects. And when we start to distinguish between things like a three‑point and a five‑point story, we run into something called distinction bias, where human beings love to figure out the differences between nearly identical things.

When I’m doing this in front of a crowd, I can hold up two green pens, and instantly everybody in the room who’s looking at me is trying to figure out what the differences are between the two things I’m holding in my hand, and they’re both the same green marker pen.

So using a statistical measure that is impartial to our excuses and immune from a couple of these biases ‑‑ not all of them, but a couple ‑‑ helps us build more predictable models.

Clayton:  The one thing I could say might be a potential downside to that is that it requires you to actually do the work.

If you’re trying to say, “Here’s a project that we might want to tackle, and we’re not sure how feasible it is. Developers, could you give me some ballpark, so I know whether I’m looking at something that might be 6 weeks or something that’s 60 weeks? I don’t have to be precise on it, but I need a rough ballpark to know if it’s something I want to go grab funding for.”

How do you do that in the cycle time model?

Jim:  There are two things. If you’re starting and you don’t have a cycle time, then you do a traditional estimate. If you’re starting and you do have cycle time, then you use your cycle time.

You can only start from where you are, and the fact that you don’t have data yet doesn’t mean you can’t collect it. I’m particularly well known for hating metrics; I don’t like to use many of them. Basically, the only numeric metrics I use are the two that we’re talking about, and the reason is that most metrics are lagging indicators.

Right now, the question you’re asking me is, “If I get this metric, that’s all well and good, but at best it’s going to be a real‑time indicator, and more likely it’s going to be a lagging indicator.”

If you’re starting a big project, everybody is going to need to get together and figure out what the assumed level of effort is for that project. After you have this information, and perhaps before, if you can run some spikes, you can start to figure out what that cycle time is, and then you can say things like, “I believe that the project you’re giving me, or we agree that the project you’re giving me, is made up of these 50 initial user stories.”

You’re now buying an option from this team on 50 user stories. We agree to deliver those 50 user stories between now and the completion of the project, which, given our current cycle time, we anticipate is going to be this date. We will do 50 user stories for you, and frankly, we don’t care what they are.

As we move through the project, we’re really not worried about what the specific user stories that are coming up are because we as software development professionals know that over the course of the project 80 percent of the features change anyway.

They’re basically buying a block of work as opposed to a product and assuming that their product will be able to be done within that block of work.

Clayton:  I think we’ve definitely stepped into a very interesting conversation, but unfortunately, we’ve run out of time here. If people listening wanted to find out more about you or if there’s any books you think they should read or anything like that they should check out, where would they do that and what kind of suggestions would you have for them?

Jim:  OK. The self‑serving parts are my name is Jim Benson, and I’m at ourfounder on Twitter, and I’m ourfounder on just about everything else that has ever been put on the Web.

We currently have three books out at Modus Press. One is “Scrumban” by Corey Ladas, another is “Personal Kanban” by me and Tonianne DeMaria Barry, and the third, which is specifically about these cognitive biases, is called “Why Plans Fail.” That’s just an eBook, a little $2.99 eBook.

The Personal Kanban website, which is personalkanban.com has tons of blog posts and free information. My personal blog is ourfounder.com, and my company is Modus Cooperandi, which I’m not going to spell for you, but that’s what it’s called.

Clayton:  We’ll use Google Suggest, how about that?

Jim:  Yes.

Clayton:  We also like to invite the listeners to check out the Agile Weekly Facebook page, where you can continue the conversations from these different podcasts that we have.

We wanted to say thanks for joining us today, Jim. We really appreciated the conversation.

Jim:  Thank you guys. This is fun.


Announcer:  If there’s something you’d like to hear in our future episode, head over to integrumtech.com/podcast, where you can suggest a topic or a guest.

If you’re looking for an easy way to stay up‑to‑date with the latest news, techniques, and events in the Agile community, sign up today at agileweekly.com. It’s the best Agile content, delivered weekly, for free.

The Agile Weekly Podcast is brought to you by Integrum Technologies and recorded at Gangplank Studios in Chandler, Arizona. For all the episodes, check out integrumtech.com, or subscribe on iTunes.