The One Metric for Innovation Accounting (and How to Use It)

‘Innovation Accounting’ is a loose term for the idea that companies need new metrics to manage innovation, the dominant competitive force driving value today. I agree. After starting and selling (and also not selling) a few startups, investing in others, and advising even more, the only thing I’ve seen reliably produce valuable innovation is intentional, consistent experimentation. Right now, I spend most of my time teaching at UVA’s Darden School of Business. Many of my MBA students have a financial accounting background, and when we talk about ‘innovation accounting’ they ask questions like: ‘What are the standards for this? How do you do it, exactly?’

I think those are good questions. While a good answer won’t look quite like the standards we’ve evolved for financial accounting, right now the starting points for doing actual ‘innovation accounting’ are still pretty murky. A good answer is timely and important because so many teams practice some form of agile without being sure they’re doing it ‘right’, and, of course, lots of software still ends up as waste: not valuable enough to users and not commercially successful.

Week to week, quarter to quarter, here’s the economic question every individual on an agile team should be asking: ‘Is what I’m doing reducing our cost to release a successful feature?’ The equation here describes that in terms of one metric: ‘F’. Basically, ‘F’ measures the cost to build features relative to how many of those features are successful. Your team’s cost and the amount of ‘release content’ you produce are generally easy to estimate. Setting thresholds for ‘success’ on individual features requires more judgment, but it is integral to the practice of Lean Startup and to any purposeful approach to agile.
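
One plausible way to write ‘F’ from the terms defined in the following paragraphs is shown below. This is a hedged reconstruction, not necessarily the exact form of the original equation: it assumes ‘rf’ is a fraction between 0 and 1 that discounts release content, and that ‘sd’ is the share of release content that met its success threshold. ‘c$’ is loaded compensation, ‘g’ is gear, and ‘fe’ is release content (for example, story points), all summed or counted over the same period.

```latex
F = \frac{\sum_{\text{period}} \left( c_{\$} + g \right)}{f_e \,(1 - r_f)\, s_d}
```

Read this way, ‘F’ is the team’s cost per unit of successful release content (for example, dollars per successful story point). If you don’t track rework, set r_f = 0.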

The first component is ‘team cost’, which you sum over whatever period you’re measuring. It includes ‘c$’, total compensation including loading (benefits, equipment, etc.), and ‘g’, the cost of the gear you use. That might be application infrastructure like AWS or GCP, along with any other infrastructure you buy or share with other teams. For example, using a backend-as-a-service like Heroku or Firebase might push up your value for ‘g’ while deferring the cost of building your own app infrastructure.
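
As a minimal sketch of that sum, the snippet below adds loaded compensation (‘c$’) across team members to gear costs (‘g’) for one period. The `MemberCost` type, the `team_cost` function, and every dollar figure are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemberCost:
    """Loaded compensation (c$) for one team member over the period (hypothetical type)."""
    name: str
    loaded_comp: float  # salary plus benefits, equipment, etc.


def team_cost(members: list[MemberCost], gear: list[float]) -> float:
    """Team cost for the period: sum of c$ across members plus g (gear/infrastructure)."""
    c_dollars = sum(m.loaded_comp for m in members)
    g = sum(gear)
    return c_dollars + g


# Hypothetical monthly figures, for illustration only.
members = [
    MemberCost("dev_a", 15_000.0),
    MemberCost("dev_b", 14_000.0),
    MemberCost("designer", 12_500.0),
]
gear = [2_000.0, 500.0]  # e.g., cloud hosting plus a shared backend-as-a-service plan
print(team_cost(members, gear))  # 44000.0
```

Keeping gear as its own list makes it easy to see when a build-versus-buy choice shifts cost between ‘c$’ and ‘g’.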

The next component is ‘release content’, ‘fe’. If you’re already estimating story points, you can use those. If you’re a NoEstimates crew (and, hey, I get it), you’ll need to do some kind of rough proportional sizing of your release content for the period in question. The next term, ‘rf’, is optional: it’s an estimate of the time you’re investing in rework, bug fixes, manual testing, manual deployment, and anything else that doesn’t go as planned.
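
Here is a small sketch of turning those estimates into a usable denominator term. It assumes ‘rf’ is expressed as a fraction (0 up to, but not including, 1) of effort lost to rework and that it discounts release content; the function name `effective_release_content` and the sample sizes are hypothetical. Sizes can be story points or, for a NoEstimates team, rough proportional sizes.

```python
def effective_release_content(sizes: list[float], rework_fraction: float = 0.0) -> float:
    """Return fe * (1 - rf).

    sizes: story points, or rough proportional sizes for NoEstimates teams.
    rework_fraction: assumed share (0 to <1) of effort lost to rework, bug fixes,
    manual testing/deployment, and other unplanned work. Defaults to 0 because
    rf is optional.
    """
    if not 0.0 <= rework_fraction < 1.0:
        raise ValueError("rework_fraction must be in [0, 1)")
    fe = sum(sizes)
    return fe * (1.0 - rework_fraction)


# Hypothetical month: five stories, roughly 20% of effort lost to rework.
print(round(effective_release_content([5, 3, 8, 3, 2], rework_fraction=0.2), 2))  # 16.8
```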

The last term, ‘sd’, is one of the most critical. It’s an estimate of the proportion of your release content that succeeded against the success metrics you set for it. For example, if you developed a new, additional way for users to search for products and set the success threshold at its being used in more than 10% of searches, did that feature succeed or fail by that measure? Naturally, if you’re not tracking user behavior this way, regularly instrumenting the observations you need will take some work and some changes to your habits. But it’s hard to deliver value in agile if you don’t know whether you’re succeeding or failing at creating the user behaviors your product or features need in order to succeed.
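
A minimal sketch of estimating ‘sd’ follows. It assumes each feature carries a size and a success threshold set before release, and that ‘sd’ is weighted by size (successful points divided by total points) rather than by feature count; the `Feature` type, its fields, and the observed values are hypothetical. The first feature mirrors the search example: used in 7% of searches against a greater-than-10% threshold, so it counts as a failure.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One piece of release content and its pre-set success threshold (hypothetical type)."""
    name: str
    points: float     # size of this release content
    observed: float   # measured value of the success metric
    threshold: float  # success threshold set before release

    def succeeded(self) -> bool:
        # Strictly greater than, matching a threshold like ">10% of searches".
        return self.observed > self.threshold


def success_proportion(features: list[Feature]) -> float:
    """sd: share of release content, by points, that met its success threshold."""
    total = sum(f.points for f in features)
    if total == 0:
        raise ValueError("no release content to evaluate")
    won = sum(f.points for f in features if f.succeeded())
    return won / total


features = [
    Feature("alt_product_search", points=8, observed=0.07, threshold=0.10),  # fails
    Feature("saved_carts", points=5, observed=0.22, threshold=0.15),         # succeeds
    Feature("onboarding_tips", points=3, observed=0.31, threshold=0.25),     # succeeds
]
print(success_proportion(features))  # 0.5
```

Weighting by points keeps ‘sd’ in the same units as ‘fe’; counting features instead would let a large failed feature look as small as a tweak.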

The example here shows how a team might tabulate this for a given month:

You can find an example calculation on Google Sheets, and notes on how to calculate ‘F’ on my page about hypothesis-driven development.

Is the punchline that you should be shooting for a cost of $1,742 per story point? No. First, this is a single month, so it’s useful only as a baseline the team sets for itself. As with any agile practice, the interesting part is seeing how your value for ‘F’ changes from period to period and using your team retrospectives to talk about how to improve it. Second, this is just a single team, and the economic value (e.g., revenue) of a given story point will vary enormously from product to product.
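
To make the period-to-period comparison concrete, here is a minimal sketch that computes ‘F’ for a baseline month and the following month, then reports the relative change. It assumes F = team cost / (fe × (1 − rf) × sd); the function names and all figures are hypothetical and are not the numbers behind the $1,742 example.

```python
def f_metric(team_cost: float, release_content: float,
             rework_fraction: float, success_proportion: float) -> float:
    """Assumed form of F: cost per unit of successful release content."""
    denominator = release_content * (1.0 - rework_fraction) * success_proportion
    if denominator <= 0:
        raise ValueError("no successful release content in this period")
    return team_cost / denominator


def period_change(previous_f: float, current_f: float) -> float:
    """Fractional change in F; negative means each successful point got cheaper."""
    return (current_f - previous_f) / previous_f


# Hypothetical baseline month, then a month with more content and a higher hit rate.
baseline = f_metric(team_cost=40_000, release_content=40,
                    rework_fraction=0.2, success_proportion=0.5)
next_month = f_metric(team_cost=40_000, release_content=50,
                      rework_fraction=0.2, success_proportion=0.625)
print(f"{baseline:.2f} -> {next_month:.2f} ({period_change(baseline, next_month):+.0%})")
# 2500.00 -> 1600.00 (-36%)
```

The absolute value matters less than the direction and size of the change, which is what a retrospective can act on.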

Like any metric, ‘F’ only matters if you find it workable to get in the habit of measuring it and paying attention to it. As a team evaluates its progress on OKRs (objectives and key results), for example, ‘F’ offers a view of the health of the team’s collaboration in the context of its product and organization. If the team is accruing technical debt, that will show up as a steady increase in ‘F’. If a team invests in test or deploy automation, or starts testing its release content with users more specifically, that should show up as a steady decrease in ‘F’.
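
‘Steady’ can be made operational in many ways. As one hypothetical rule, the sketch below labels a run of per-period ‘F’ values as rising only if every period is higher than the one before it, and falling only if every period is lower. The `trend` function and the sample values are illustrative assumptions; a rising run might prompt a retrospective conversation about technical debt, a falling one about what’s working.

```python
def trend(f_values: list[float]) -> str:
    """Label per-period F values as 'rising', 'falling', or 'mixed'.

    Assumes 'steady' means every consecutive period moves in the same direction.
    """
    if len(f_values) < 2:
        return "insufficient data"
    deltas = [later - earlier for earlier, later in zip(f_values, f_values[1:])]
    if all(d > 0 for d in deltas):
        return "rising"
    if all(d < 0 for d in deltas):
        return "falling"
    return "mixed"


print(trend([2100, 2250, 2400, 2600]))  # rising
print(trend([2600, 2300, 2150]))        # falling
```

A stricter team might require a minimum change per period before calling it a trend, so small measurement noise doesn’t register as signal.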

For a team that’s using hypothesis-driven development to run their product pipeline (see below), ‘F’ offers a complement to sprint retrospectives, helping the team think about their practice of HDD.


Running an end-to-end, collaborative practice of HDD across your product pipeline will balance your focus and investment across all the levers you have to reduce ‘F’. For example, a great practice of continuous design won’t deliver nearly as much value as it could if you can’t get changes to users for testing because your pipeline doesn’t allow you to release frequently. Design thinking alone, Lean Startup alone, even running lots of experiments alone isn’t enough to reliably get to wins and lower ‘F’. I’ve only seen one approach that always works: clear, decisive, iterative experimentation across your product pipeline.