Estimation with no historical data: a Monte Carlo approach

15 January 2010

By Sanzio Castor, MSc, CSM

Better estimates can significantly change the course of a software development project and its budget. This article proposes a basic Monte Carlo simulation to compensate for the lack of accuracy in estimates that are not backed by historical data.

Often, the goal is to predict the schedule needed to deliver a specific amount of functionality, and estimators are frequently forced to provide a single-point estimate. The first suggestion is to re-estimate each feature's best and worst cases (Table 1). A Fibonacci sequence is used because it reflects the greater uncertainty associated with estimates for larger units of work.

   

Why not insert a column with the ‘most likely case’ and then calculate the ‘expected case’ using the Program Evaluation and Review Technique (PERT) formula? It would be a reasonable short-term solution to the problem. The long-term solution would be to work with estimators to make their ‘most likely case’ estimates more accurate. However, computing an ‘expected case’ requires an analogy between the new project and a similar past one, and that is wishful thinking in this case study. In this example, the estimate for each story comes from an individual expert's judgment.
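
For reference, the PERT expected case is conventionally computed as a weighted mean of the three estimates, with the most likely case weighted four times. The snippet below is a minimal sketch of that formula; the function name and the sample values are illustrative and do not come from the case study.

```python
def pert_expected(best, most_likely, worst):
    """Classic PERT weighted mean: (best + 4 * most likely + worst) / 6."""
    return (best + 4 * most_likely + worst) / 6


# Hypothetical story: best case 3, most likely 5, worst case 13.
expected = pert_expected(3, 5, 13)
print(expected)
```

The formula only helps when the ‘most likely case’ is trustworthy, which is exactly the input this case study lacks.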

A list of user stories based on Mike Cohn’s case study, “Bomb Shelter Studios,” with estimates and effort values would look like this [1]:

As Jon Wittwer states, “the Monte Carlo method is just one of many methods for analyzing uncertainty propagation, where the goal is to determine how random variation, lack of knowledge, or error affects the sensitivity, performance, or reliability of the system that is being modeled.” [2] In pseudocode, the formula to calculate the MC value for each story is:

if random < 0.5
  apply first estimate;
else
  apply second estimate;
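
A rough Python rendering of the pseudocode above might look like the following. It assumes that `random` denotes a uniform draw in [0, 1) and that the two estimates are passed in as plain numbers; the function name and the optional `rng` parameter are hypothetical.

```python
import random


def pick_estimate(first, second, rng=random):
    """Return `first` when the uniform draw is below 0.5, otherwise `second`."""
    # rng.random() returns a float in [0.0, 1.0), so each branch
    # is taken with probability one half.
    if rng.random() < 0.5:
        return first
    return second


# Hypothetical pair of estimates for one story.
print(pick_estimate(3, 5))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when comparing results.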

How do we map the MC value to the best and worst cases? A second random integer within the best-to-worst range is generated. For example, for the second user story, the MC value was 3. This value points to the third row of Table 1. Then the number 6 was generated, a random integer between 5 and 8. In this case, we used MS Excel's RAND function. Every time the worksheet is recalculated, a new random number is generated. Remember that “the key to Monte Carlo simulation is generating the set of random inputs.”
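
The same draw can be sketched outside a spreadsheet. The sketch below assumes a uniform value in [0, 1), like the one Excel's RAND returns, scaled to an inclusive integer range; the function name is hypothetical, and the 5 to 8 range is the one quoted for the second user story.

```python
import random


def draw_effort(best, worst, rng=random):
    """Return a uniformly distributed integer in [best, worst], inclusive."""
    # rng.random() lies in [0, 1), so the scaled value lies in
    # [0, worst - best + 1) and truncates to 0 .. worst - best.
    return best + int(rng.random() * (worst - best + 1))


print(draw_effort(5, 8))  # one of 5, 6, 7 or 8
```

Each call plays the role of one worksheet recalculation: a fresh random input produces a fresh effort value.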

The next step is to total the effort column. Subsequently, 5000 sets of random inputs are generated and the effort sum is evaluated for each set (MS Excel can handle all the iterations with a simple macro). A sample of the results is shown here:
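
The repeated recalculation can also be sketched in Python rather than an Excel macro. This is a simplified illustration: the story ranges are made-up placeholders rather than the article's data, and each story's effort is drawn directly as a uniform integer between its best and worst cases.

```python
import random

# Hypothetical (best, worst) ranges, one per user story; not the article's data.
STORIES = [(1, 2), (5, 8), (3, 5), (8, 13), (2, 3)]


def simulate_totals(stories, runs=5000, seed=None):
    """Return one total effort per run, each from a fresh set of random inputs."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(best, worst) for best, worst in stories)
        for _ in range(runs)
    ]


totals = simulate_totals(STORIES, seed=42)
print(len(totals), min(totals), max(totals))
```

With these placeholder ranges every total falls between 19 (all best cases) and 31 (all worst cases); the interesting part is how the 5000 totals are distributed inside that interval.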

Using the data in Table 3, the final table (Table 4) presents the cumulative probability for each possible total effort to complete the software development project.
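
One way to derive such a cumulative table from simulated totals is sketched below. The function name and the four sample totals are hypothetical; the idea is simply to count how often each total occurs and accumulate the share of runs at or below it.

```python
from collections import Counter


def cumulative_probability(totals):
    """Return (effort, P(total <= effort)) pairs in ascending order of effort."""
    counts = Counter(totals)
    runs = len(totals)
    running = 0
    table = []
    for effort in sorted(counts):
        running += counts[effort]
        table.append((effort, running / runs))
    return table


# Tiny hypothetical sample standing in for the 5000 simulated totals.
for effort, probability in cumulative_probability([10, 11, 11, 12]):
    print(effort, probability)
```

Each row can be read as the probability that the project needs at most that much effort, which is the kind of statement a single-point estimate cannot make.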

   

Instead of relying on a simplistic single-point estimate, the model incorporates probability and does not depend on historical data. According to Steve McConnell, “the key point is that all estimates include a probability, whether the probability is stated or implied. An explicitly stated probability is one sign of a good estimate.” [3]

REFERENCES

[1] Mike Cohn, Agile Estimating and Planning (Prentice Hall PTR, 2005)

[2] Jon Wittwer, Monte Carlo Simulation Basics (Vertex42.com, 2004), http://vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html

[3] Steve McConnell, Software Estimation: Demystifying the Black Art (Microsoft Press, 2006)

Juanjuan Zang, Agile Estimation with Monte Carlo Simulation (Agile Processes in Software Engineering and Extreme Programming: 9th International Conference, XP 2008)





Comments

John Clifford, CSP,CSM,CSPO, 1/20/2010 4:27:04 PM
This is a very interesting article, particularly from a project management point of view ("Can you give me an idea of how much effort this is going to take?"). I believe the application of this technique would have value in that area.

However, the danger here IMO is that it presupposes there is a direct proportional correlation between story points and hours of effort. Again, IMO this leads to the type of thinking that wants to optimize individual resource utilization, resulting in multi-tasking, moving people from team to team and project to project, etc... in short, the local optimization attempts that lead to global suboptimization... the very thing we want to avoid by using story points instead of duration in Scrum.

The idea behind presenting a range in an estimate is to differentiate estimates from commitments, to ensure that executive management knows that we don't know what we don't know (and that is exactly how long something will take). Aren't story points, in and of themselves, the same thing?

Sanzio Albuquerque, CSP,CSM, 1/22/2010 11:08:38 AM
John, I'm glad you enjoyed the article. About effort and commitment, let me use a text from Mike Cohn in his book Agile Estimating and Planning that translates very well your point of concern: "Embedded within each and every estimate is a probability that the work will be completed in the estimated time. (...) A problem with traditional planning can arise if the project team or its stakeholders equate estimating with committing. As Phillip Armour (2002) points out, an estimate is a probability and a commitment cannot be made to a probability. Commitments are made to dates. Normally the date that a team is asked (or told) to commit to is one to which they would assign a less than 100% probability. Prior to making such a commitment the team needs to assess a variety of business factors and risks. It is important that they be given this opportunity and that every estimate does not become an implicit commitment."

I've tried not to talk about commitment in the article, focusing on the estimation technique itself. Many organizations confuse estimates with commitments, and on this point I think we agree.

Sânzio
