Measure The Business Value of Your Spikes and Take High Payoff Risks (Lean Software Development Part 4)


Lately at work, other teams have unexpectedly asked us whether they could use our product for something we had not foreseen. As we are not sure whether we’ll be able to tune our product to their needs, we are thinking about doing a short study to find out. This looks like a great opportunity to try out Cost of Delay analysis on uncertain tasks.

Unfortunately, I cannot share the details of what we are building at work on this blog, so let’s assume that we are building todo list software.

We have been targeting the enterprise market. Lately, we’ve seen some interest from individuals planning to use our todo list system for themselves at home.

For individuals, the system would need to be highly available and live 24/7 over the internet. Latency would also be critical to retain customers, but the product could gain market share with a basic feature set.

Enterprise customers, on the other hand, need advanced features and absolute data safety, but they can cope with nightly restarts of the server.

In order to know whether we can make our todo list system available and fast enough for the individuals market, we are planning to conduct a pre-study, so as not to waste time on an unreachable goal. In XP terms, this is a spike, and it’s a set of experiments rather than a theoretical study.

Photo of someone studying behind piles of books

When should we prioritize this spike?

If we use the Weighted Shortest Job First (WSJF) metric to prioritize our work, we need to estimate the cost of delay of a task to determine its priority. Hereafter, I will explain how we could determine the value of this spike.
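As a quick reminder, WSJF simply divides the cost of delay by the job duration. Here is a minimal sketch (the figures are placeholders, not our real estimates):

```python
def wsjf(cost_of_delay: float, duration: float) -> float:
    """Weighted Shortest Job First: the higher the score, the sooner the job should be done."""
    return cost_of_delay / duration

# Placeholder figures: a feature worth 40 $/month that takes 2 months
# outranks a feature worth 30 $/month that takes 3 months.
print(wsjf(40, 2))  # 20.0
print(wsjf(30, 3))  # 10.0
```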

Computing the cost of delay

The strategy for computing the cost of delay of such a risk-mitigation task is to compute the difference between the costs of delay with and without doing it.

1. The products, the features, the MVP and the estimates

As I explained in a previous post, for usual features, the cost of delay is equivalent to the feature’s value. Along with our rough estimates, here are the relative values that our product owner gave us for the different products we are envisioning.

| Feature | $ Enterprise | $ Individuals | Estimated work |
|---------|--------------|---------------|----------------|
| Robustness | 20* | 20* | 2 |
| Availability | 0 | 40* | 2 |
| Latency | 0 | 40* | 1 |
| Durability | 40* | 13 | 2 |
| Multi user lists | 20* | 8 | 2 |
| Labels | 20 | 13 | 2 |
| Custom report tool | 13 | 0 | 3 |
| TOTAL Cost of Delay of v1 | 80 | 100 | |

Starred (*) features are required for the first version of the product. Features with a value of 0 are not required for that product at all. Finally, unstarred features with a non-zero business value would be great for a second release.

It seems that the individuals market is a greater opportunity, so it’s worth thinking about. Unfortunately, for the moment, we really don’t know whether we’ll manage to reach the high availability that such a product requires.

The availability spike we are envisioning would take 1 unit of time.

2. Computing the cost of delay of this spike

The cost of delay of a task involving some uncertainty is the expected value of its cost of delay over the possible outcomes. We estimate that we have a 50% chance of reaching the availability required by individuals. This means that the CoD of the spike = 50% * CoD if we reach the availability + 50% * CoD if we don’t.

2.a. The Cost of Delay if we get the availability

Let’s consider the future in which we manage to reach the required availability. The cost of delay of a spike task is the difference in cost with and without the spike, spread over the relevant months.

2.a.i. The cost if we don’t do the spike

Unfortunately, at this point in this future, we don’t yet know that we’ll manage to reach the required availability.

| Feature | $ Enterprise | $ Individuals | $ Expected | Estimated work | WSJF |
|---------|--------------|---------------|------------|----------------|------|
| Latency | 0 | 40* | 20 | 1 | 20 |
| Durability | 40* | 13 | 26 | 2 | 13 |
| Robustness | 20* | 20* | 20 | 2 | 10 |
| Availability | 0 | 40* | 20 | 2 | 10 |
| Labels | 20 | 13 | 17 | 2 | 9 |
| Multi user lists | 20* | 8 | 14 | 2 | 7 |
| Custom report tool | 13 | 0 | 8 | 3 | 3 |
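As a minimal sketch, here is how the “$ Expected” and “WSJF” columns above can be recomputed, assuming a simple 50/50 weighting of the two markets (the figures in the table are rounded, so expect small differences):

```python
# Recomputing the table above: the expected value of a feature is the
# 50/50 average of its value for each market, and WSJF divides that
# expected value by the estimated work.
features = {
    # name: (enterprise value, individuals value, estimated work)
    "Latency": (0, 40, 1),
    "Durability": (40, 13, 2),
    "Robustness": (20, 20, 2),
    "Availability": (0, 40, 2),
    "Labels": (20, 13, 2),
    "Multi user lists": (20, 8, 2),
    "Custom report tool": (13, 0, 3),
}

for name, (enterprise, individuals, work) in features.items():
    expected = 0.5 * enterprise + 0.5 * individuals
    print(f"{name}: expected = {expected:g}, WSJF = {expected / work:.1f}")
```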

We’ll use WSJF to prioritize our work. Here is what we’ll be able to ship:

| Product | Delay | CoD | Cost |
|---------|-------|-----|------|
| Individuals | 7 | 100 | 700 |
| Individuals Durability | 7 | 13 | 91 |
| Individuals Labels | 9 | 13 | 117 |
| Enterprise | 11 | 80 | 880 |
| Enterprise Labels | 11 | 20 | 220 |
| Individuals Multi user lists | 13 | 8 | 104 |
| Enterprise Custom reports | 16 | 13 | 208 |
| Total | | | 2320 |
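Each line of this table is simply delay × CoD. As a sketch, the total cost of this schedule could be computed like this (figures taken from the table above):

```python
# Cost of a shipping schedule: each deliverable pays its cost of delay
# for every month until it ships. Figures come from the table above.
schedule = [
    # (deliverable, delay in months, cost of delay in $/month)
    ("Individuals", 7, 100),
    ("Individuals Durability", 7, 13),
    ("Individuals Labels", 9, 13),
    ("Enterprise", 11, 80),
    ("Enterprise Labels", 11, 20),
    ("Individuals Multi user lists", 13, 8),
    ("Enterprise Custom reports", 16, 13),
]

total_cost = sum(delay * cod for _, delay, cod in schedule)
print(total_cost)  # 2320
```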
2.a.ii. The cost if we do the spike

In this case, we would start with the spike, which would tell us that we can reach the availability individuals need, and therefore that we should go for this feature first. Here is the resulting plan:

| Feature | $ Enterprise | $ Individuals | Estimated work | Enterprise WSJF | Individuals WSJF |
|---------|--------------|---------------|----------------|-----------------|------------------|
| Feasibility spike | | | 1 | | |
| Latency | 0 | 40* | 1 | | 40 |
| Availability | 0 | 40* | 2 | | 20 |
| Robustness | 20* | 20* | 2 | 10 | 10 |
| Durability | 40* | 13 | 2 | 20 | 7 |
| Multi user lists | 20* | 8 | 2 | 10 | 4 |
| Labels | 20 | 13 | 2 | 10 | 7 |
| Custom report tool | 13 | 0 | 3 | 4 | |

Here is how we will be able to ship:

| Product | Delay | CoD | Cost |
|---------|-------|-----|------|
| Individuals | 6 | 100 | 600 |
| Individuals Durability | 8 | 13 | 104 |
| Individuals Multi user lists | 10 | 8 | 80 |
| Enterprise | 10 | 80 | 800 |
| Individuals Labels | 12 | 13 | 156 |
| Enterprise Labels | 12 | 20 | 240 |
| Enterprise Custom reports | 15 | 13 | 195 |
| Total | | | 2175 |
2.a.iii. Cost of delay of the spike if we reach the availability

By doing the spike, we would save 2320 - 2175 = $145.

Without doing the spike, we would only discover whether we can reach the availability when we actually try, around time 7 (see 2.a.i).

So the cost of delay for the spike would be 145/7 ≈ 21 $/m.
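In other words, as a small sketch (figures from the two scenarios above):

```python
# Cost of delay of the spike in the future where availability is reachable:
# the money the spike saves, spread over the months until we would have
# discovered the answer anyway (around time 7 in scenario 2.a.i).
cost_without_spike = 2320
cost_with_spike = 2175
discovery_time = 7  # months

savings = cost_without_spike - cost_with_spike   # 145 $
cod_of_spike = savings / discovery_time          # ~21 $/month
print(round(cod_of_spike))  # 21
```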

2.b. The Cost of Delay if we don’t get the availability

Let’s now consider the future in which we don’t manage to reach the required availability.

Using the same logic as before, let’s see what happens.

2.b.i. The cost if we don’t do the spike

Unfortunately, at this point in this future, we don’t yet know that we won’t manage to reach the required availability.

| Feature | $ Enterprise | $ Individuals | $ Expected | Estimated work | WSJF |
|---------|--------------|---------------|------------|----------------|------|
| Latency | 0 | 40* | 20 | 1 | 20 |
| Durability | 40* | 13 | 26 | 2 | 13 |
| Robustness | 20* | 20* | 20 | 2 | 10 |
| Availability | 0 | 40* | 20 | 2 | 10 |
| Multi user lists | 20* | 8 | 14 | 2 | 7 |
| Labels | 20 | 13 | 17 | 2 | 9 |
| Custom report tool | 13 | 0 | 8 | 3 | 3 |

When we fail at the availability, we’ll swap multi user lists and labels in order to ship to enterprises as quickly as possible. Here is what we’ll ship:

| Product | Delay | CoD | Cost |
|---------|-------|-----|------|
| Enterprise | 9 | 80 | 720 |
| Enterprise Labels | 11 | 20 | 220 |
| Enterprise Custom reports | 14 | 13 | 182 |
| Total | | | 1122 |
2.b.ii. The cost if we do the spike

In this case, we would start with the spike, which would tell us that we won’t reach the availability required for individuals, and therefore that there’s no need to chase it for now.

| Feature | $ Enterprise | Estimated work | WSJF |
|---------|--------------|----------------|------|
| Feasibility spike | | 1 | |
| Durability | 40* | 2 | 13 |
| Robustness | 20* | 2 | 10 |
| Multi user lists | 20* | 2 | 7 |
| Labels | 20 | 2 | 9 |
| Custom report tool | 13 | 3 | 3 |

Here is how we will be able to ship:

| Product | Delay | CoD | Cost |
|---------|-------|-----|------|
| Enterprise | 7 | 80 | 560 |
| Enterprise Labels | 9 | 20 | 180 |
| Enterprise Custom reports | 12 | 13 | 156 |
| Total | | | 896 |
2.b.iii. Cost of delay of the spike if we don’t reach the availability

By doing the spike, we would save 1122 - 896 = $226.

As before, without doing the spike, we would only discover whether we can reach the availability when we actually try, around time 7.

So the cost of delay for the spike would be 226/7 ≈ 32 $/m.

2.c. Computing the overall Cost of Delay of the spike

Bank notes going through an hourglass

Given that we estimate there is a 50% chance of reaching the availability, the overall expected cost of delay is:

50% * 21 + 50% * 32 = 26.5 $/m
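Summed up as a small sketch (the 50% probability is our own rough guess; the per-future costs of delay come from sections 2.a and 2.b):

```python
# Expected cost of delay of the spike, weighting the two possible futures
# by our (guessed) probability of reaching the required availability.
p_available = 0.5
cod_if_available = 145 / 7       # ~21 $/month, from scenario 2.a
cod_if_not_available = 226 / 7   # ~32 $/month, from scenario 2.b

expected_cod = p_available * cod_if_available + (1 - p_available) * cod_if_not_available
print(round(expected_cod, 1))  # 26.5
```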

Injecting the spike into the backlog

With the Cost of Delay of the spike, we can compute its WSJF and prioritize it against the other features.

| Feature | $ Enterprise | $ Individuals | $ Expected | Estimated work | WSJF |
|---------|--------------|---------------|------------|----------------|------|
| Feasibility Spike | | | 26.5 | 1 | 26.5 |
| Latency | 0 | 40* | 20 | 1 | 20 |
| Durability | 40* | 13 | 26 | 2 | 13 |
| Robustness | 20* | 20* | 20 | 2 | 10 |
| Availability | 0 | 40* | 20 | 2 | 10 |
| Multi user lists | 20* | 8 | 14 | 2 | 7 |
| Labels | 20 | 13 | 17 | 2 | 9 |
| Custom report tool | 13 | 0 | 8 | 3 | 3 |

The spike comes out at the top of our backlog, which confirms our gut feeling.

Conclusion

Doing this long study confirmed some classic rules of thumb:

  • Don’t develop many products at the same time
  • Do proofs of concept early, before starting to work on uncertain features
  • Tackle the riskiest features first

By improving the inputs, we could get higher quality results:

  • If we had access to real sales or finance figures for the costs
  • If we did some kind of risk estimation poker instead of just guessing a 50% chance

Obviously, the analysis itself is not perfect, but it points towards the right choices. And as Don Reinertsen puts it, with an economic framework, the spread between people’s estimations goes down from 50:1 to 2:1! This seems a good alternative to the experience and gut-feeling approach, which:

  • can trigger heated, unfounded discussions
  • often means a high dependence on the intuition of a single individual

As everything is quantitative, though, one could imagine that with other figures we would have reached a different conclusion, such as:

  • The spike is not worth doing (it costs more than it might save)
  • The spike can wait a bit

A Dilbert strip about gut feeling at work

This was part 4 of my Lean Software Development series. Part 3 was How to measure your speed with your business value; continue with Part 5: What optimization should we work on?

