Hypothesis-Driven Development (Practitioner’s Guide)

After reading this guide and trying out the related practice you will be able to:

  1. Diagnose when and where hypothesis-driven development (HDD) makes sense for your team
  2. Apply techniques from HDD to your work in small, success-based batches
  3. Frame and enhance your existing practices (where applicable) with HDD

What is hypothesis-driven development (HDD)?

Does your product program feel like a Netflix show you’d binge watch? Is your team excited to see what happens when you release stuff? If so, congratulations- you’re already doing it and please hit me up on Twitter so we can talk about it! If not, don’t worry- that’s pretty normal, but HDD offers some awesome opportunities to work better.

Scientific-MethodUnderlying the growing excitement about HDD is the realization that in the product world we’re underusing the scientific method. Now, that doesn’t mean you need to put on a lab coat and be super formal about what you do. What it does mean is that as we deal with uncertainty (aka innovate), we should have explicit, testable ideas and focus on definitely disproving them so we can focus on a better idea or proving them and building on them with confidence. In the diagram here, I’ve borrowed from the language of Lean Startup and referred to these conditions as pivot (disproven) or persevere (proven).

Building on the scientific method, HDD is a take on how to integrate test-driven approaches across your product development activities- everything from creating a user persona to figuring out which integration tests to automate. Yeah- wow, right?! It is a great way to energize and focus your practice of agile and your work in general.

I’ve organized this guide around applying HDD across four big product dev. components:

  1. Finding the Right Problem
  2. Finding the Right Solution/Demand
  3. Finding the Right Solution/Usability
  4. Getting to Continuous Delivery


Here’s a little more on each of those before we dive into the details.

1. Finding the Right Problem

Right-ProblemIs your team referencing personas and jobs-to-be-done/problem scenarios (these are tools for encapsulating your evidence on this)? In particular, how about team members like developers, testers, and analysts who might not be as often involved in interviewing customers? Better yet, are you getting lots of hard/inconvenient questions from that same audience like ‘What problem [or job] is this feature delivering on?’

Are you making a habit of talking to customers (or users if it’s an internal project)? However you interface with your audience, is there a fairly regular and obvious process to do this that’s different than sales or formal meetings for a different purpose?

2. Finding the Right Solution/Demand

This area is pivotal and easy to assess: have you validated that you have something for your audience that they prefer over the alternatives? This area of HDD is kind of a fulcrum between Right Problem and Right Solution:


I’ve found it’s super important to look at motivation/demand for a given proposition separate from usability. They’re related and both important, but the reality on the ground is that teams tend to over invest in usability at the expense of testing demand/motivation.

If you’re familiar with Lean Startup, all this probably sounds familiar. If not, you’ll learn about it in the section below ‘How do you test that you’ve found the ‘right solution’?’.

3. Finding the Right Solution/Usability

Right-Solution-UIIs everything you’re building anchored in strong narrative- specifically, user stories, storyboards, and maybe story maps? Are the stories small, with testable rewards?

Are you in the habit of pairing/marrying qualitative evidence from usability testing with quantitative evidence from your analytics and looking at that evidence at the end of each agile iteration/sprint?

Finally, and this is most telling, are your developers and testers asking you a lot of hard, inconvenient questions during the sprint about those user stories?

4. Getting to Continuous Delivery

Amazon famously releases code every 11.6 seconds. Naturally, getting new product in front of users that often helps them learn quickly, complementing their teams’ work in the areas above. What’s also interesting (I think) is how your work in the other areas can help a) create space and b) drive focus for the investments you need to make into your product pipeline to get to a more continuous capability.

Is Hypothesis-Driven Development agile?

What is agile? What isn’t? The original manifesto is the one thing everyone will generally agree is agile. It’s only 68 words, most of which are that:

& Tools
Responding to
Following a

Really, this is more a set of outcomes you’re trying to get to than a single way to get there. Personally, unpack the practice of agile in the context of a product pipeline- something like this:

In the context of a product pipeline, I’ve found it’s most useful to think about a team’s practice of agile in the three major areas you see above. Continuous Design is about coming up with strong, testable, investable ideas and you can measure your success on that amount of features you release that see high user engagement.

Next, Agile Development is where most team’s principally focus. Most of the core work on agile (scrum, etc.) is primarily focused on velocity- how much stuff are you able to release in a given iteration or sprint.

Finally, there’s the exciting new-ish area of Continuous Delivery- the capability where highly structured, automated test and deploy infrastructure allows developers to release new code with the click of a button (and not freak out).

In that context, here’s how I see the various HDD topics coming into play:


How does it play with Design Thinking, Lean, etc.?

Topline, I would say it’s a way to unify and focus your work across those disciplines. I’ve found that’s a pretty big deal. While none of those practices are hard to understand, practice on the ground is patchy. Usually, the problem is having the confidence that doing things well is going to be worthwhile, and knowing who should be participating when.

My hope is that with this guide and the supporting material (and of course the wider body of practice), that teams will get in the habit of always having a set of hypotheses and that will improve their work and their confidence as a team.

Naturally, these various disciplines have a lot to do with each other, and I’ve summarized some of that here:


Mostly, I find practitioners learn about this through their work, but I’ll point out a few big points of intersection that I think are particularly notable:

  1. Learn by Observing Humans
    We all tend to jump on solutions and over invest in them when we should be observing our user, seeing how they behave, and then iterating. HDD helps reinforce problem-first diagnosis through its connections to relevant practice.
  2. Focus on What Users Actually Do
    A lot of thing might happen- more than we can deal with properly. The goods news is that by just observing what actually happens you can make things a lot easier on yourself.
  3. Move Fast, but Minimize Blast Radius
    Working across so many types of org’s at present (startups, corporations, a university), I can’t overstate how important this is and yet how big a shift it is for more traditional organizations. The idea of ‘moving fast and breaking things’ is terrifying to these places, and the reality is with practice you can move fast and rarely break things/only break them a tiny bit. Without this, you end up stuck waiting for someone else to create the perfect plan or for that next super important hire to fix everything (spoiler: it won’t and they don’t).
  4. Minimize Waste
    Succeeding at innovation is improbable, and yet it happens all the time. Practices like Lean Startup do not warrant that by following them you’ll always succeed; however, they do promise that by minimizing waste you can test five ideas in the time/money/energy it would otherwise take you to test one, making the improbable probable.

What I love about Hypothesis-Driven Development is that it solves a really hard problem with practice: that all these behaviors are important and yet you can’t learn to practice them all immediately. What HDD does is it gives you a foundation where you can see what’s similar across these and how your practice in one is reenforcing the other. It’s also a good tool to decide where you need to focus on any given project or team.

How do you know if it’s working?

I love this question because of course it would be pretty silly to invest a test-driven approach like this and then not subject that decision to the same scrutiny. In your practice of agile or work in general, you probably have some topline metrics (feature output, sales, user engagement, etc.) and adjacent improvements on those likely suggest general improvement.

However, the practice of HDD entails a lot of different things and those are more specifically observable and testable. Each of the sections below includes notes on focal observations to see how things are working. That said, here are a few topline ideas for each of the major areas.

How do you test that you’ve found the ‘Right Problem’?

Right-ProblemBasically, you talk to customers about your area of interest and see what’s actually important to them. This process is very different from selling, and that’s where most people run into trouble.

Let’s say it’s an internal project- a ‘digital transformation’ for an HVAC (heating, ventilation, and air conditioning) service company. The digital team thinks it would be cool to organize the documentation for all the different HVAC equipment the company’s technicians service. But, would it be?

The only way to find out is to go out and talk to these technicians and find out! First, you need to test whether you’re talking to someone who is one of these technicians. For example, you might have a screening question like: ‘How many HVAC’s did you repair last week?’. If it’s <10,  you might instead be talking to a handyman or a manager (or someone who’s not an HVAC tech at all).

Second, you need to ask non-leading questions. The evidentiary value of a specific answer to a general question is much higher than a specific answer to a specific questions. Also, some questions are just leading. For example, if you ask such a subject ‘Would you use a documentation system if we built it?’, they’re going to say yes, just to avoid the awkwardness and sales pitch they expect if they say no.

How do you draft personas?

Much more renowned designers than myself (Donald Norman among them) disagree with me about this, but personally I like to draft my personas while I’m creating my interview guide and before I do my first set of interviews. Whether you draft or interview first is also of secondary important if you’re doing HDD- if you’re not iteratively interviewing and revising your material based on what you’ve found, it’s not going to be very functional anyway.

Really, the persona (and the job-to-be-done/problem scenario) is a means to an end- it should be answering some facet of the question ‘Who is our customer, and what’s important to them?’. It’s iterative, with a process that looks something like this:

personas-process-v3 All that said, for getting started here is-
A guide on creating personas
A template for drafting personas

How do you draft jobs-to-be-done/problem scenarios?

Personally- I like to work these in a similar fashion- draft, interview, revise, and then repeat, repeat, repeat.

You’ll use the same interview guide and subjects for these. The template is the same as the personas, but I maintain a separate (though related) tutorial for these–

A guide on creating JTBD/problem scenarios
A template for drafting JTBD/problem scenarios

How do you interview subjects?

And, action! The #1 place I see teams struggle is at the beginning and it’s with the paradox that to get to a big market you need to nail a series of small markets. Sure, they might have heard something about segmentation in a marketing class, but here you need to apply that from the very beginning.

The fix is to create a screener for each persona. This is a factual question whose job is specifically and only to determine whether a given subject does or does not map to your target persona. In the HVAC in a Hurry technician persona (see above), you might have a screening question like: ‘How many HVAC’s did you repair last week?’. If it’s <10,  you might instead be talking to a handyman or a manager (or someone who’s not an HVAC tech at all).

And this is the point where (if I’ve made them comfortable enough to be candid with me) teams will ask me ‘But we want to go big- be the next Facebook.’ And then we talk about how just about all those success stories where there’s a product that has for all intents and purpose a universal user base started out by killing it in small, specific segments and learning and growing from there.

Sorry for all that, reader, but I find all this so frequently at this point and it’s so crucial to what I think is a healthy practice of HDD it seemed necessary.

The key with the interview guide is to start with general questions where you’re testing for a specific answer and then progressively get into more specific questions. Here are some resources–

An example interview guide related to the previous tutorials
A general take on these interviews in the context of a larger customer discovery/design research program
A template for drafting an interview guide

To recap, what’s a ‘Right Problem’ hypothesis?

The Right Problem (persona and PS/JTBD) hypothesis is the most fundamental, but the hardest to pin down. You should know what kind of shoes your customer wears and when and why they use your product. You should be able to apply factual screeners to identify subjects that map to your persona or personas.

You should know what people who look like/behave like your customer who don’t use your product are doing instead, particularly if you’re in an industry undergoing change. You should be analyzing your quantitative data with strong, specific, emphatic hypotheses.

If you make software for HVAC (heating, ventilation and air conditioning) technicians, you should have a decent idea of what you’re likely to hear if you ask such a person a question like ‘What are the top 5 hardest things about finishing an HVAC repair?’

In summary, HDD here looks something like this:


01 IDEA: The working idea is that you know your customer and you’re solving a problem/doing a job (whatever term feels like it fits for you) that is important to them. If this isn’t the case, everything else you’re going to do isn’t going to matter.

Also, you know the top alternatives, which may or may not be what you see as your direct competitors. This is important as an input into focused testing demand to see if you have the Right Solution.

02 HYPOTHESIS: If you ask non-leading questions (like ‘What are the top 5 hardest things about finishing an HVAC repair?’), then you should generally hear relatively similar responses.

03 EXPERIMENTAL DESIGN: You’ll want an Interview Guide and, critically, a screener. This is a factual question you can use to make sure any given subject maps to your persona. With the HVAC repair example, this would be something like ‘How many HVAC repairs have you done in the last week?’ where you’re expecting an answer >5. This is important because if your screener isn’t tight enough, your interview responses may not converge.

04 EXPERIMENTATION: Get out and interview some subjects- but with a screener and an interview guide. The resources above has more on this, but one key thing to remember is that the interview guide is a guide, not a questionnaire. Your job is to make the interaction as normal as possible and it’s perfectly OK to skip questions or change them. It’s also 1000% OK to revise your interview guide during the process.

05: PIVOT OR PERSEVERE: What did you learn? Was it consistent? Good results are:
a) We didn’t know what was on their A-list and what alternatives they are using, but we do know.
b) We knew what was on their A-list and what alternatives they are using- we were pretty much right (doesn’t happen as much as you’d think).
c) Our interviews just didn’t work/converge. Let’s try this again with some changes (happens all the time to smart teams and is very healthy).

How do you test that you’ve found demand and have the ‘Right Solution’?

By this, I mean: How do you test whether you have demand for your proposition? How do you know whether it’s better enough at solving a problem (doing a job, etc.) than the current alternatives your target persona has available to them now?

If an existing team was going to pick one of these areas to start with, I’d pick this one. While they’ll waste time if they haven’t found the right problem to solve and, yes, usability does matter, in practice this area of HDD is a good forcing function for really finding out what the team knows vs. doesn’t. This is why I show it as a kind of fulcrum between Right Problem and Right Solution:


This is not about usability and it does not involve showing someone a prototype, asking them if they like it, and checking the box.

Lean Startup offers a body of practice that’s an excellent fit for this. However, it’s widely misused because it’s so much more fun to build stuff than to test whether or not anyone cares about your idea. Yeah, seriously- that is the central challenge of Lean Startup.

Here’s the exciting part: You can massively improve your odds of success. While Lean Startup does not claim to be able to take any idea and make it successful, it does claim to minimize waste- and that matters a lot. Let’s just say that a new product or feature has a 1 in 5 chance of being successful. Using Lean Startup, you can iterate through 5 ideas in the space it would take you to build 1 out (and hope for the best)- this makes the improbably probable which is pretty much the most you can ask for in the innovation game.

Build, measure, learn, right?

Kind of. I’ll harp on this since it’s important and a common failure mode relate to Lean Startup: an MVP is not a 1.0. As the Lean Startup folks (and Eric Ries’ book) will tell you, the right order is learn, build, measure. Specifically–

Learn: Who your customer is and what matters to them (see Solving the Right Problem, above). If you don’t do this, you’ll throwing darts with your eyes closed. Those darts are a lot cheaper than the darts you’d throw if you were building out the solution all the way (to strain the metaphor some), but far from free.

In particular, I see lots of teams run an MVP experiment and get confusing, inconsistent results. Most of the time, this is because they don’t have a screener and they’re putting the MVP in front of an audience that’s too wide ranging. A grandmother is going to respond differently than a millennial to the same thing.

Build: An experiment, not a real product, if at all possible (and it almost always is). Then consider MVP archetypes (see below) that will deliver the best results and try them out. You’ll likely have to iterate on the experiment itself some, particularly if it’s your first go.

Measure: Have metrics and link them to a kill decision. The Lean Startup term is ‘pivot or persevere’, which is great and makes perfect sense, but in practice the pivot/kill decisions are hard and as you decision your experiment you should really think about what metrics and thresholds are really going to convince you.

How do you make an MVP?

You don’t. This MVP is a means to running an experiment to test motivation- so formulate your experiment first and then figure out an MVP that will get you the best results with the least amount of time and money. Just since this is a practitioner’s guide, with regard to ‘time’, that’s both time you’ll have to invest as well as how long the experiment will take to conclude. I’ve seen them both matter.

The most important first step is just to start with a simple hypothesis about your idea, and I like the form of ‘If we [do something] for [a specific customer/persona], then they will [respond in a specific, observable way that we can measure]. For example, if you’re building an app for parents to manage allowances for their children, it would be something like ‘If we offer parents and app to manage their kids’ allowances, they will download it, try it, make a habit of using it, and pay for a subscription.’

All that said, for getting started here is-
A guide on testing with Lean Startup
A template for creating motivation/demand experiments

To recap, what’s a Right Solution hypothesis for testing demand?

The core hypothesis is that you have a value proposition that’s better enough than the target persona’s current alternatives that you’re going to acquire customers.

As you may notice, this creates a tight linkage with your testing from Solving the Right Problem. This is important because while testing value propositions with Lean Startup is way cheaper than building product, it still takes work and you can only run a finite set of tests. So, before you do this kind of testing I highly recommend you’ve iterated to validated learning on the what you see below: a persona, one or more PS/JTBD, the alternatives they’re using, and a testable view of why your VP is going to displace those alternatives. With that, your odds of doing quality work in this area dramatically increase!


What’s the testing, then? Well, it looks something like this:

01 IDEA: Most practicing scientists will tell you that the best way to get a good experimental result is to start with a strong hypothesis. Validating that you have the Right Problem and know what alternatives you’re competing against is critical to making investments in this kind of testing yield valuable results.

With that, you have a nice clear view of what alternative you’re trying to see if you’re better than.

02 HYPOTHESIS: I like a cause an effect stated here, like: ‘If we [offer something to said persona], they will [react in some observable way].’ This really helps focus your work on the MVP.

03 EXPERIMENTAL DESIGN: The MVP is a means to enable an experiment. It’s important to have a clear, explicit declaration of that hypothesis and for the MVP to delivery a metric for which you will (in advance) decide on a fail threshold. Most teams find it easier to kill an idea decisively with a kill metric vs. a success metric, even though they’re literally different sides of the same threshold.

04 EXPERIMENTATION: It is OK to tweak the parameters some as you run the experiment. For example, if you’re running a Google AdWords test, feel free to try new and different keyword phrases.

05: PIVOT OR PERSEVERE: Did you end up above or below your fail threshold? If below, pivot and focus on something else. If above, great- what is the next step to scaling up this proposition?

Reference: Usability vs. Motivation

You might reasonably wonder: If my MVP has something that’s hard to understand, won’t that affect the results? Yes, sure. Testing for usability and the related tasks of building stuff are much more fun and (short-term) gratifying. I can’t emphasize enough how much harder it is for most founders, etc. is to push themselves to focus on motivation.

There’s certainly a relationship and, as we transition to the next section on usability, it seems like a good time to introduce the relationship between motivation and usability. My favorite tool for this is BJ Fogg’s Fogg Curve, which appears below. On the y-axis is motivation and on the x-axis is ‘ability’, the inverse of usability. If you imagine a point in the upper left, that would be, say, a cure for cancer where no matter if it’s hard to deal with you really want. On the bottom right would be something like checking Facebook- you may not be super motivated but it’s so easy.

The punchline is that there’s certainly a relationship but beware that for most of us our natural bias is to neglect testing our hypotheses about motivation in favor of testing usability.


How do you test that you’ve designed the ‘Right Solution’?

Right-Solution-UIFor my money, fully articulated user stories are the most important foundation element for getting to great usability in HDD. These have a specific format-
As a [person],
I want to [do something],
so that I can [achieve some testable reward].

First and foremost, delivering great usability is a team sport. Without a strong, co-created narrative, your performance is going to be sub-par. This means your developers, testers, analysts should be asking lots of hard, inconvenient (but relevant) questions about the user stories. For more on how these fit into an overall design program, let’s zoom out and we’ll again stand on the shoulders of Donald Norman.

Usability and User Cognition

To unpack usability in a coherent, testable fashion, I like to use Donald Norman’s 7-step model of user cognition:


The process starts with a Goal and that goals interacts with an object in an environment, the ‘World’. With the concepts we’ve been using here, the Goal is equivalent to a job-to-be-done/problem scenario. The World is your application in whatever circumstances your customer will use it (in a cubicle, on a plane, etc.).

The Reflective layer is where the customer is making a decision about alternatives for their JTBD/PS. In his seminal book, The Design of Everyday Things, Donald Normal’s is to continue reading a book as the sun goes down. In the framings we’ve been using, we looked at understanding your customers Goals/JTBD in ‘How do you test that you’ve found the ‘right problem’?’, and we looked evaluating their alternatives relative to your own (proposition) in ‘How do you test that you’ve found the ‘right solution’?’.

The Behavioral layer is where the user interacts with your application to get what they want- hopefully engaging with interface patterns they know so well they barely have to think about it. This is what we’ll focus on in this section. Critical here is leading with strong narrative (user stories), pairing those with well-understood (by your persona) interface patterns, and then iterating through qualitative and quantitative testing.

The Visceral layer is the lower level visual cues that a user gets- in the design world this is a lot about good visual design and even more about visual consistency. We’re not going to look at that in depth here, but if you haven’t already I’d make sure you have a working style guide to ensure consistency (see  Creating a Style Guide).

Unpacking the UX Stack for Testability

Back to our example company, HVAC in a Hurry, which services commercial heating, ventilation, and A/C systems, let’s say we’ve arrived at the following tested learnings for Trent the Technician:


As we look at how we’ll iterate to the right solution in terms of usability, let’s say we arrive at the following user story we want to unpack (this would be one of many, even just for the PS/JTBD above):

As Trent the Technician,
I know the part number and I want to find it on the system,
so that I can find out its price and availability.

Let’s step through the 7 steps above in the context of HDD, with a particular focus on achieving strong usability.

1. Goal
This is the PS/JTBD: Getting replacement parts to a job site. An HDD-enabled team would have found this out by doing customer discovery interviews with subjects they’ve screened and validated to be relevant to the target persona. They would have asked non-leading questions like ‘What are the top five hardest things about finishing an HVAC repair?’ and consistently heard that one such thing is sorting our replacement parts. This validates the PS/JTBD hypothesis that said PS/JTBD matters.

2. Plan
For the PS/JTBD/Goal, which alternative are they likely to select? Is our proposition better enough than the alternatives? This is where Lean Startup and demand/motivation testing is critical. This is where we focused in ‘How do you test that you’ve found the ‘right solution’?’ and the HVAC in a Hurry team might have run a series of MVP to both understand how their subject might interact with a solution (concierge MVP) as well as whether they’re likely to engage (Smoke Test MVP).

3. Specify
Our first step here is just to think through what the user expects to do and how we can make that as natural as possible. This is where drafting testable user stories, looking at comp’s, and then pairing clickable prototypes with iterative usability testing is critical. Following that, make sure your analytics are answering the same questions but at scale and with the observations available.

4. Perform
If you did a good job in Specify and there are not overt visual problems (like ‘Can I click this part of the interface?’), you’ll be fine here.

5. Perceive
We’re at the bottom of the stack and looping back up from World: Is the feedback from your application readily apparent to the user? For example, if you turn a switch for a lightbulb, you know if it worked or not. Is your user testing delivering similar clarity on user reactions?

6. Interpret
Do they understand what they’re seeing? Does is make sense relative to what they expected to happen. For example, if the user just clicked ‘Save’, do they’re know that whatever they wanted to save is saved and OK? Or not?

7. Compare
Have you delivered your target VP? Did they get what they wanted relative to the Goal/PS/JTBD?

How do you draft relevant, focused, testable user stories?

Without these, everything else is on a shaky foundation. Sometimes, things will work out. Other times, they won’t. And it won’t be that clear why/not. Also, getting in the habit of pushing yourself on the relevance and testability of each little detail will make you a much better designer and a much better steward of where and why your team invests in building software.

For getting started here is-
A guide on creating user stories
A template for drafting user stories

How do you create find the relevant patterns and apply them?

Once you’ve got great narrative, it’s time to put the best-understood, most expected, most relevant interface patterns in front of your user. Getting there is a process.

For getting started here is-
A guide on interface patterns and prototyping

How do you run qualitative user testing early and often?

Once you’ve got great something to test, it’s time to get that design in front of a user, give them a prompt, and see what happens- then rinse and repeat with your design.

For getting started here is-
A guide on qualitative usability testing
A template for testing your user stories

How do you focus your outcomes and instrument actionable observation?

Once you release product (features, etc.) into the wild, it’s important to make sure you’re always closing the loop with analytics that are a regular part of your agile cadences. For example, in a high-functioning practice of HDD the team should be interested in and  reviewing focused analytics to see how their pair with the results of their qualitative usability testing.

For getting started here is-
A guide on quantitative usability testing with Google Analytics.

To recap, what’s a Right Solution hypothesis for usability?

Essentially, the usability hypothesis is that you’ve arrived at a high-performing UI pattern that minimizes the cognitive load, maximizes the user’s ability to act on their motivation to connect with your proposition.


01 IDEA: If you’re writing good user stories, you already have your ideas implemented in the form of testable hypotheses. Stay focused and use these to anchor your testing. You’re not trying to test what color drop-down works best- you’re testing which affordances best deliver on a given user story.

02 HYPOTHESIS: Basically, the hypothesis is that ‘For [x] user story, this interface pattern will perform will, assuming we supply the relevant motivation and have the right assessments in place.

03 EXPERIMENTAL DESIGN: Really, this means have a tests set up that, beyond working, links user stories to prompts and narrative which supply motivation and have discernible assessments that help you make sure the subject didn’t click in the wrong place by mistake.

04 EXPERIMENTATION: It is OK to iterate on your prototypes and even your test plan in between sessions, particularly at the exploratory stages.

05: PIVOT OR PERSEVERE: Did the patterns perform well, or is it worth reviewing patterns and comparables and giving it another go?

How do you get to continuous delivery?

Getting-to-Cont-Delivery-FeatureFor the last couple of decades, test and deploy/ops was often treated like a kind of stepchild to the development- something that had to happen at the end of development and was the sole responsibility of an outside group of specialists. It didn’t make sense then, and now an integral test capability is table stakes for getting to a continuous product pipeline, which at the core of HDD itself.

A continuous pipeline means that you release a lot. Getting good at releasing relieves a lot of energy-draining stress on the product team as well as creating the opportunity for rapid learning that HDD requires. Interestingly, research by outfits like DORA (now part of Google) and CircleCI shows teams that are able to do this both release faster and encounter fewer bugs in production.

Amazon famously releases code every 11.6 seconds. What this means is that a developer can push a button to commit code and everything from there to that code showing up in front of a customer is automated. How does that happen? For starters, there are two big (related) areas: Test & Deploy.


While there is some important plumbing that I’ll cover in the next couple of sections, in practice most teams struggle with test coverage. What does that mean? In principal, what it means is that even though you can’t test everything, you iterate to test automation coverage that is catching most bugs before they end up in front of a user. For most teams, that means a ‘pyramid’ of tests like you see here, where the x-axis the number of tests and the y-axis is the level of abstraction of the tests.


The reason for the pyramid shape is that the tests are progressively more work to create and maintain, and also each one provides less and less isolation about where a bug actually resides. In terms of iteration and retrospectives, what this means is that you’re always asking ‘What’s the lowest level test that could have caught this bug?’.

Unit tests isolate the operation of a single function and make sure it works as expected. Integration tests span two functions and system tests, as you’d guess, more or less emulate the way a user or endpoint would interact with a system.

Feature Flags: These are a separate but somewhat complimentary facility. The basic idea is that as you add new features, they each have a flag that can enable or disable them. They are start out disabled and you make sure they don’t break anything. Then, on small sets of users, you can enable them and test whether a) the metrics look normal and nothing’s broken and, closer to the core of HDD, whether users are actually interacting with the new feature.


In the olden days (which is when I last did this kind of thing for work), if you wanted to update a web application, you had to log in to a server, upload the software, and then configure it, maybe with the help of some scripts. Very often, things didn’t go accordingly to plan for the predictable reason that there was a lot of opportunity for variation between how the update was tested and the machine you were updating, not to mention how you were updating.

Now computers do all that- but you still have to program them. As such, the job of deployment has increasingly become a job where you’re coding solutions on top of platforms like Kubernetes, Chef, and Terraform. These folks are (hopefully) working closely with developers on this. For example, rather than spending time and money on writing documentation for an upgrade, the team would collaborate on code/config. that runs on the kind of application I mentioned earlier.

Pipeline Automation

Most teams with a continuous pipeline orchestrate something like what you see below with an application made for this like Jenkins or CircleCI. The Manual Validation step you see is, of course, optional and not a prevalent part of a truly continuous delivery. In fact, if you automate up to the point of a staging server or similar before you release, that’s what’s generally called continuous integration.

Finally, the two yellow items you see are where the team centralizes their code (version control) and the build that they’re taking from commit to deploy (artifact repository).


To recap, what’s the hypothesis?

Well, you can’t test everything but you can make sure that you’re testing what tends to affect your users and likewise in the deployment process. I’d summarize this area of HDD as follows:


01 IDEA: You can’t test everything and you can’t foresee everything that might go wrong. This is important for the team to internalize. But you can iteratively, purposefully focus your test investments.

02 HYPOTHESIS: Relative to the test pyramid, you’re looking to get to a place where you’re finding issues with the least expensive, least complex test possible- not an integration test when a unit test could have caught the issue, and so forth.

03 EXPERIMENTAL DESIGN: As you run integrations and deployments, you see what happens! Most teams move from continuous integration (deploy-ready system that’s not actually in front of customers) to continuous deployment.

04 EXPERIMENTATION: In  retrospectives, it’s important to look at the tests suite and ask what would have made the most sense and how the current processes were or weren’t facilitating that.

05: PIVOT OR PERSEVERE: It takes work, but teams get there all the time- and research shows they end up both releasing more often and encounter fewer production bugs, believe it or not!