15 January 2010

**By Sanzio Castor, MSc, CSM**

Better estimates can significantly change the course of a software development project and its budget outcome. This article proposes using a basic Monte Carlo simulation to compensate for the lack of accuracy in estimates that are not driven by historical data.

Often, the goal is to predict the schedule needed to deliver a specific amount of functionality, and estimators are frequently forced to provide a single-point estimate. The first suggestion is to re-estimate each feature with a best case and a worst case (Table 1). A Fibonacci sequence is used because its widening gaps reflect the greater uncertainty associated with estimates for larger units of work.
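To illustrate why a Fibonacci scale suits larger, less certain stories, the minimal Python sketch below assumes the scale 1, 2, 3, 5, 8, 13, 21. Table 1 is not reproduced here, so these values are an assumption. The sketch shows that the gap between consecutive values widens as the values grow. The helper `is_valid_range` is hypothetical. It checks that a story's best and worst cases both sit on the scale and that the best case is no larger than the worst.

```python
# Hypothetical estimation scale: Fibonacci-style values (an assumption for illustration).
FIB_SCALE = [1, 2, 3, 5, 8, 13, 21]


def scale_gaps(scale):
    """Return the distance between each pair of consecutive scale values."""
    return [b - a for a, b in zip(scale, scale[1:])]


def is_valid_range(best, worst, scale=FIB_SCALE):
    """True if both estimates are on the scale and best does not exceed worst."""
    return best in scale and worst in scale and best <= worst


if __name__ == "__main__":
    # The gaps widen as the values grow, mirroring the uncertainty of larger stories.
    print(scale_gaps(FIB_SCALE))  # [1, 1, 2, 3, 5, 8]
    print(is_valid_range(3, 8))   # True
    print(is_valid_range(4, 8))   # False: 4 is not on the scale
```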

Why not insert a ‘most likely case’ column and then calculate the ‘expected case’ using the *Program Evaluation and Review Technique* (PERT) formula? That would be a reasonable short-term solution. The long-term solution would be to work with estimators to make their ‘most likely case’ estimates more accurate. However, computing an ‘expected case’ that way requires an analogy between this new project and a similar past one, and in this case study that is wishful thinking. Here, the estimate for each story comes from an individual expert's judgment.
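For comparison, the standard PERT three-point formula mentioned above weights the most likely case four times as heavily as the best and worst cases: expected = (best + 4 × most likely + worst) / 6. The article itself does not take this route. The minimal Python sketch below uses hypothetical function names and an invented example story. It also includes the common PERT approximation of the standard deviation, (worst − best) / 6.

```python
def pert_expected(best, most_likely, worst):
    """PERT expected case: (best + 4 * most_likely + worst) / 6."""
    return (best + 4 * most_likely + worst) / 6


def pert_std_dev(best, worst):
    """Common PERT approximation of the standard deviation: (worst - best) / 6."""
    return (worst - best) / 6


if __name__ == "__main__":
    # Hypothetical story: best case 3, most likely 5, worst case 13.
    print(pert_expected(3, 5, 13))  # 6.0
    print(pert_std_dev(3, 13))
```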

A list of user stories based on Mike Cohn’s “Bomb Shelter Studios” case study, with estimates and effort values, would look like this^{1}:

As Jon Wittwer states, “the Monte Carlo method is just one of many methods for analyzing uncertainty propagation, where the goal is to determine how random variation, lack of knowledge, or error affects the sensitivity, performance, or reliability of the system that is being modeled.”^{2} In pseudocode, the MC (Monte Carlo) value for each story is calculated as follows. Here `random` is a uniformly distributed number between 0 and 1, such as the value returned by Excel's `RAND` function.

```
if random < 0.5
    apply first estimate
else
    apply second estimate
```
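One way to make the pseudocode runnable is the Python sketch below. It assumes the first estimate is the story's best case and the second is its worst case. `random.random()` stands in for Excel's `RAND()`, since both return a uniformly distributed number in [0, 1). The function name `mc_value` is hypothetical.

```python
import random


def mc_value(best, worst, rng=random):
    """Pick the best or the worst case with equal probability."""
    # rng.random() returns a float in [0.0, 1.0), like Excel's RAND().
    if rng.random() < 0.5:
        return best   # first estimate
    return worst      # second estimate


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    print([mc_value(2, 3, rng) for _ in range(10)])
```

Passing an explicit `random.Random` instance keeps runs reproducible. Excel's `RAND` behaves differently: it produces a new value on every recalculation.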

How do we map the MC value to the best and worst cases? A second random number is generated: an integer within the best-to-worst-case range of the Table 1 row that the MC value points to. For example, the MC value for the second user story was 3, which points to the third row of Table 1. The number 6 was then generated as a random integer between 5 and 8. Here we used MS Excel's `RAND` function. Every time the worksheet is recalculated, a new random number is generated. Remember that “the key to Monte Carlo simulation is generating the set of random inputs.”
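The two-step draw can be sketched in Python as follows. `EFFORT_RANGES` is a hypothetical stand-in for Table 1, which is not reproduced here. Only the row for the value 3 (effort between 5 and 8) comes from the example in the text, and the other rows are placeholders. `random.randint(low, high)` includes both endpoints. With `RAND` alone, the Excel equivalent would be `INT(RAND()*(high-low+1))+low`.

```python
import random

# Hypothetical stand-in for Table 1: story-point value -> (lowest, highest) effort.
# Only the 3 -> (5, 8) row comes from the article's example; the rest are placeholders.
EFFORT_RANGES = {
    1: (1, 2),
    2: (2, 5),
    3: (5, 8),
    5: (8, 13),
    8: (13, 21),
}


def draw_effort(mc_value, rng=random):
    """Look up the row for the MC value and draw an integer effort inside its range."""
    low, high = EFFORT_RANGES[mc_value]
    return rng.randint(low, high)  # inclusive on both ends


if __name__ == "__main__":
    rng = random.Random(7)
    print(draw_effort(3, rng))  # an integer between 5 and 8
```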

The next step is to total the effort column. Then 5,000 sets of random inputs are generated, and the effort total is evaluated for each of the 5,000 sets (MS Excel can run all the iterations with a simple macro). A sample of the results follows:

Using the data in Table 3, the final table (Table 4) presents the cumulative probability for each possible total effort needed to complete the software development project.
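The steps from totaling the effort column to the cumulative probability table can be sketched in Python as follows. Each of the 5,000 iterations flips a coin between each story's best and worst case, draws an integer effort from the matching row, and sums the efforts. The cumulative probability of a total is the fraction of iterations that finished at or below it. `STORIES` and `EFFORT_RANGES` are hypothetical placeholders, not the Bomb Shelter Studios data or the article's Table 1, and the function names are illustrative. The loop plays the role of the Excel macro.

```python
import random
from collections import Counter

# Hypothetical stand-in for Table 1: story-point value -> (lowest, highest) effort.
EFFORT_RANGES = {1: (1, 2), 2: (2, 5), 3: (5, 8), 5: (8, 13), 8: (13, 21)}

# Hypothetical backlog: (best case, worst case) story points for each story.
STORIES = [(1, 2), (2, 3), (3, 5), (5, 8), (2, 5)]


def simulate_total_effort(stories, ranges, rng):
    """One iteration: the total effort for one set of random inputs."""
    total = 0
    for best, worst in stories:
        points = best if rng.random() < 0.5 else worst  # MC value: coin flip
        low, high = ranges[points]
        total += rng.randint(low, high)  # integer effort inside the row's range
    return total


def run_simulation(stories, ranges, iterations=5000, seed=None):
    """Repeat the iteration and return every simulated total."""
    rng = random.Random(seed)
    return [simulate_total_effort(stories, ranges, rng) for _ in range(iterations)]


def cumulative_probability(totals):
    """Return (effort, P(total <= effort)) pairs sorted by effort."""
    counts = Counter(totals)
    running = 0
    table = []
    for effort in sorted(counts):
        running += counts[effort]
        table.append((effort, running / len(totals)))
    return table


if __name__ == "__main__":
    totals = run_simulation(STORIES, EFFORT_RANGES, seed=2010)
    for effort, prob in cumulative_probability(totals):
        print(f"{effort:3d}  {prob:6.1%}")
```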

Instead of relying on a simplistic single-point estimate, the model builds probability into the estimate without using historical data. According to Steve McConnell, “the key point is that all estimates include a probability, whether the probability is stated or implied. An explicitly stated probability is one sign of a good estimate.”^{3}
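In the spirit of McConnell's advice, simulated totals can be turned into an estimate with an explicitly stated probability, such as "at most X units of effort, with 80% confidence." The minimal Python sketch below assumes `totals` is a plain list of simulated efforts. The function name is hypothetical, and the sample values are invented for illustration.

```python
import math


def effort_at_confidence(totals, confidence):
    """Smallest simulated total E such that at least `confidence` of the runs are <= E."""
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(totals)
    # k = ceil(confidence * n); the tiny tolerance guards against float error
    # in the product (for example, 0.07 * 100 is slightly above 7).
    k = max(1, math.ceil(confidence * len(ordered) - 1e-9))
    return ordered[k - 1]


if __name__ == "__main__":
    sample = [40, 42, 42, 45, 50]  # hypothetical simulated totals
    print(effort_at_confidence(sample, 0.8))  # 45
```

With this sample, the estimate could be stated as "at most 45 units of effort, with 80% confidence." That is the same figure a cumulative probability table would show at its 80% row.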

REFERENCES

[1] Mike Cohn, *Agile Estimating and Planning* (Prentice Hall PTR, 2005)

[2] Jon Wittwer, *Monte Carlo Simulation Basics* (Vertex42.com, 2004), http://vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html

[3] Steve McConnell, *Software Estimation: Demystifying the Black Art* (Microsoft Press, 2006)

Juanjuan Zang, *Agile Estimation with Monte Carlo Simulation* (Agile Processes in Software Engineering and Extreme Programming: 9th International Conference, XP 2008)


John Clifford, CSP, CSM, CSPO, 1/20/2010 4:27:04 PM

However, the danger here, IMO, is that it presupposes a direct proportional correlation between story points and hours of effort. Again, IMO, this leads to the type of thinking that wants to optimize individual resource utilization, resulting in multitasking, moving people from team to team and project to project, etc. In short, it leads to the local optimization attempts that cause global suboptimization, the very thing we want to avoid by using story points instead of duration in Scrum.

The idea behind presenting a range in an estimate is to differentiate estimates from commitments, to ensure that executive management knows that we don't know what we don't know (and that is exactly how long something will take). Aren't story points, in and of themselves, the same thing?

Sanzio Albuquerque, CSP, CSM, 1/22/2010 11:08:38 AM

I've tried not to talk about commitment in the article, focusing on the estimation technique itself. Many organizations confuse estimates with commitments, and on this point I think we agree.

Sânzio
