Lies, damn lies and metrics
Of course you need metrics. Just don't mix up the measurement of reality with the reality itself.
Once upon a time, a company’s NPS (Net Promoter Score) dropped by 30 points overnight.
This company loved to make a big deal of how their NPS was higher than anyone else’s in their industry. This 30-point drop wiped out that claim. 😱
What had happened? Can you guess? Was something dreadfully wrong with their customer service? Their technical systems? Their user experience?
I’ll tell you the answer in a moment. First, quick housekeeping:
Fancy a free preview of my talk about Pitch Provocations? I pooh-poohed this method when I first heard about it, but now it’s one of my favourite discovery tools. It’s especially good in ambiguity, when teams wish someone would give them a clear strategy. Limited spaces, no recording, book here: https://lu.ma/egvid1x5
I’ll be premiering this talk and running Multiverse Mapping workshops at UX London 18th-20th. Will you be there? Drop me a reply and let’s grab a coffee!
If you want to grab a ticket, use code JOINTOM at checkout for 20% off.
OK, back to the story: what caused that 30-point drop in NPS?
The team scrambled to investigate. It didn’t take long to figure out what the problem was.
They’d switched the tool they used to collect NPS from customers, and there was something different about the way the new tool collected the data.
This is what the emails from the different systems looked like. Spot the difference!
In the new system’s email, every number from 0-10 is equally easy to tap. This is in line with the official guidelines for how NPS is supposed to be collected.
Whereas in the previous system’s email, the scores were stacked vertically from high to low, creating a tiny bit of friction for anyone who wanted to give a lower score. And with the janky maths behind NPS, that tiny bit of friction added up to 30 points. Not kosher.
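To make that “janky maths” concrete, here’s a minimal sketch of the NPS calculation with made-up response mixes (not the company’s real numbers). Because the score is promoters minus detractors, with passives ignored, a small shift in who bothers to respond moves the headline number a long way:

```python
# A minimal sketch of the NPS arithmetic, with made-up response mixes.
# NPS = % promoters (scores 9-10) minus % detractors (scores 0-6);
# passives (7-8) are simply ignored.
def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical old email: vertical layout adds friction for low scores,
# so suppose a chunk of would-be detractors never bother to respond.
old_layout = [10] * 50 + [8] * 35 + [3] * 15   # 50% promoters, 15% detractors
# Hypothetical new email: every number is equally easy to tap.
new_layout = [10] * 50 + [8] * 15 + [3] * 35   # 50% promoters, 35% detractors

print(nps(old_layout))  # 35.0
print(nps(new_layout))  # 15.0 -- 20 points gone, with nothing real changing
```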
So obviously, the company committed to the new system, right? I mean, if you believe NPS is a useful metric (which it’s not[1]), and you use it to make decisions (which you shouldn’t), or as a KR in your OKRs (which is a terrible idea), then of course you need to measure it correctly.
Haha nope.
The company quietly switched back to the old system for collecting NPS. Phew! They could still crow about having the best NPS in the market.
Metrics are measurements and measurement is hard
The story above happened pretty much just like that, although I’ve anonymised the details. I think of it every time I see people talk about being data-driven and using metrics to make decisions.
Because business metrics aren’t real. Don’t get me wrong. They’re important, but they’re proxies for the real world, not the real world. Metrics give us clues about some parts of the real world. They’re imperfect clues, with noise, error and lag added in. Some are less imperfect, closer to reality. Others are much further away from reality.
There’s a lot of real world out there. Like, loads. Reality has a surprising amount of detail. The more you look, the more detail you can find. And it’s messy and confusing.
Most of the real world doesn’t follow neat cause-and-effect relationships, nor does it look like a funnel, nor does it owe us the courtesy of being easy to measure.
At best, a metric gestures at parts of the real world that might influence other parts of the real world. Another metric might gesture at some of those other parts. Another metric might lead you off into fairyland.
Put it this way: I spent years in orgs and teams getting confused, side-tracked and slowed down by inappropriate metrics. I want you to be able to side-step all that.
Let’s look at some metrics, then I’ll rant about metrics trees.
Some metrics are closer to reality
We recently measured one of our windows for some new curtains. It’s a British house, so the walls are wonky and the window gaps are only approximately rectangular at best. This means each window gap has a distribution of widths and heights. The window-width metric we noted down is within about a centimetre of the real world window’s distribution of widths. If we needed more precision, we could take multiple measurements to better model the distribution of widths. No number of measurements can ever be the same as the real world window, but we can get our model very close with a lot of effort.
In business, a set of metrics that’s close to reality is payroll data. When that’s inaccurate, it costs someone actual money. People tend to notice that and correct it.
Most metrics are further away from reality
Most kinds of payments-related metrics are a bit like payroll in that people care if they’re wrong. But they also get muddied by delays in processing, or by slow corrections like returns and refunds. This introduces lag, noise and error.
Note that the muddiness and lag are created by real-world behaviours – you can’t “fix” them in the measurement process. But it’s easy to accidentally make any metric worse at measuring the real world.
For example, imagine you have a subscription service where people pay weekly and can cancel any time. If something happens that means everyone starts to cancel, that will show up in your payment metric within a week or so. Now imagine you sell an annual plan and only let people cancel once a year. It will take at least months before a sudden increase in churn shows up in the payment metric. You might enjoy the cashflow benefits of the annual plan, but you have to trade that off against how far away from reality you’re going to end up.
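Here’s a rough sketch of that lag, with invented numbers: every subscriber privately decides to leave at once, and we watch how long the “active paid subscriptions” count takes to notice under each billing cadence.

```python
import random

# A rough sketch with invented numbers: 1,000 subscribers all privately decide
# to cancel in week 0. How long until the "active paid subscriptions" metric
# reflects that, under weekly vs annual billing?
random.seed(0)
N = 1000

weekly_renewals = [1] * N                                      # everyone renews next week
annual_renewals = [random.randrange(1, 53) for _ in range(N)]  # renewals spread over the year

def still_counted(renewals, week):
    # A cancelled customer only drops out of the metric once their renewal date passes.
    return sum(1 for r in renewals if r > week)

for week in (1, 4, 13, 26, 52):
    print(week, still_counted(weekly_renewals, week), still_counted(annual_renewals, week))
# Weekly billing: the metric collapses to zero within a week.
# Annual billing: at week 26 roughly half the base still looks "active",
# even though every single customer decided to leave months ago.
```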
And some popular metrics have little to do with reality
NPS is a prime example. The company I mentioned at the start decided that they wanted one metric for “customer experience” and that NPS was that metric. I explained that it didn’t — and couldn’t — measure customer experience. Customer experience emerges in the interactions between millions of people and hundreds of moments on and off screens. It’s subjective, invisible and non-linear, modulated by lots of conditions. It’s not reflected in any number, but in the stories that each person tells themselves and others about their experience.
Measuring customer experience with NPS is about as useful as measuring our window by asking everyone who’s visited our house in the past 6 months how they felt about the quality of light in the house on a scale of 0-10.
(I know that if you need a metric to measure customer experience, what I’m saying here is unsatisfying. You might disagree or even find it offensive. Let me just say there are ways to measure customer experience, but not with simplistic numbers. A topic for another day!)
But let’s imagine for a moment that NPS could measure customer experience. It’s measuring something, after all. Even then, it’s still besmirched by sample bias, influenced by what kind of day each respondent has been having, wonkified by the ~5% of people who misunderstand the question and by the people who just tap anything to make the email go away, and skewed by service representatives begging for a 10. That’s just a handful of the confounding factors. What’s more, you can only ask the NPS question occasionally – maybe 2 or 3 times a year at most. And customers will answer based on their most recent memories of experiences with your brand, which mostly won’t be what you think you’re measuring.
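Even if you set all those biases aside, plain sampling noise is enough to move the number around on its own. A quick simulation sketch, with a made-up mix of promoters, passives and detractors that never changes:

```python
import random

# A quick simulation sketch: the underlying mix of promoters, passives and
# detractors never changes, yet the measured NPS bounces around from sampling alone.
random.seed(1)

def sample_nps(n):
    # Made-up "true" mix: 45% promoters, 35% passives, 20% detractors (true NPS = 25).
    score = 0
    for _ in range(n):
        r = random.random()
        if r < 0.45:
            score += 1      # promoter
        elif r >= 0.80:
            score -= 1      # detractor
    return 100 * score / n

# 300 responses per survey wave is generous for plenty of products.
print([round(sample_nps(300)) for _ in range(8)])
# Typical run: scores scattered from the high teens to the low thirties --
# double-digit wave-to-wave "swings" with nothing changing underneath.
```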
So NPS is more noise than signal, and any change in NPS is going to lag behind changes in reality that do affect it by a variable and unknowable number of months or even years.
Beware the fruit of the metrics tree
Now we get to what inspired this post. I’ve seen lots of people sharing metrics trees recently. These are branching chains of metrics that are intended to show the causal relationships between them. The North Star Framework is an example, but there are many variants.
I get the idea. These are supposed to be a basic starter model for what’s happening in reality. They show relationships between metrics, like “when metric A goes up, that should mean metric B goes up.” These relationships are supposed to be based on evidence – at least on “hypotheses” that you’ve tried to “validate” (don’t get me started!).
But I have a problem with these metrics trees. Because metrics don’t cause anything.
Again – I know that’s not what the people who make these diagrams think or are trying to say.
But we shape our diagrams, and then our diagrams shape us. When you make a causal diagram out of metrics, you imply that the metrics you used are all equally close to reality, and that changing one metric causes another to change.
And it’s a devilishly short hop from “that metric is just an imperfect placeholder” to “my continued employment depends on making that metric go up”.
So it’s better to draw out a model that shows what you believe is happening in the real world. Don’t start by listing metrics, start by laying out stories about context and situations, stimuli and behaviours, responses and choices. People doing things.
If in doubt, I do this by making a Multiverse Map.
Then you can layer metrics on top of your map. Every time I’ve done this with teams, we’ve ended up figuring out different metrics than we expected we’d use, and we’ve zeroed in on the ones that are closer to reality. (I talked a bit about how this can work in Signals > Stories > Options.)
Making this kind of a model is a Very Good Idea™ for most teams. It’s foundational for effective discovery, experimentation, strategy and prioritisation.
But be warned: it probably won’t look as neat and tidy as a metrics tree.
Plus your first models will be mostly wrong. Cedric Chin notes that it takes most teams months to start to tease out their business’ causal model. (He also recommends using XmR charts to help you separate noise from signal in your chosen metrics, so you know when something’s really changed, vs when it’s natural variation.)
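If you fancy trying the XmR approach, the core arithmetic is tiny. Here’s a minimal sketch with a made-up weekly series, using the standard XmR constants (2.66 and 3.268):

```python
# A minimal sketch of an XmR (individuals and moving range) chart, with a
# made-up weekly metric series and the standard constants 2.66 and 3.268.
values = [118, 123, 121, 127, 119, 125, 122, 131, 120, 126, 124, 158]

moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
mean_x = sum(values) / len(values)
mean_mr = sum(moving_ranges) / len(moving_ranges)

lower_limit = mean_x - 2.66 * mean_mr   # natural process limits for the values
upper_limit = mean_x + 2.66 * mean_mr
mr_upper = 3.268 * mean_mr              # upper limit for the moving ranges

print(f"routine variation: {lower_limit:.1f} to {upper_limit:.1f}")
for v in values:
    flag = "  <-- signal, worth a story" if not (lower_limit <= v <= upper_limit) else ""
    print(v, flag)
# Points inside the limits are routine variation; only points outside them
# (the 158 here) justify asking "what changed?"
```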
Along the journey of discovery, you’ll learn that plenty of what you think matters doesn’t, and some surprising things turn out to be important.
Setting out causal relationships is hard
Daniel Schmidt of DoubleLoop told me that their customers frequently find they are wrong when they set out the causal relationships for their businesses.
For example, they believe that More A leads to More B. But when they run the experiments and gather the data, it turns out that More A leads to Less B.
Sometimes, it turns out that they got the arrow of causality back-to-front, and More B leads to More A. This is known as a “wet pavements cause rain” error, and is really common in complex situations where we’re confused by how long it takes for some effects to show up, and where we’re dazzled by the Halo Effect[2].
Most common is that A is not correlated with B. Or that A might influence or modulate B, but not in a predictable or linear way, and in combination with lots of other factors. That’s what’s happening in a lot of the world.
Understanding causal relationships is valuable
So it’s up to you to figure out a simplified-but-useful model of what’s happening in the real world.
It’s going to take trial, error and iteration to figure out what’s causal and what’s dispositional in the world. You’ll need to figure out what you can control, what you can influence and what just happens. And you’ll need to figure out which metrics give you useful clues and which are too noisy or lagging.
But still, you don’t need to make the metrics the main thing.
Or maybe I’m wrong. Maybe metrics aren’t supposed to measure reality
I said metrics themselves don’t affect the real world.
But there are important exceptions. There’s something metrics really do affect.
Narratives.
If a business depends more on influencing investor narratives than on delivering value to customers, then it really doesn’t matter whether their causal model holds up against reality, or that their metrics are close to reality. Only that they support a great story.
The company from the NPS story at the beginning: they talked a great game about investing in great customer experiences. To be fair, they did invest a bit. But what they wanted more than the best customer experience was the best NPS score.
So I guess in conclusion: know if you’re using metrics as clues to figure out what’s going on, so you can deliver value to customers; or as tools to support a narrative you want to peddle.
If it’s the former, and you’d like some help with that, get in touch. I love helping teams to use Multiverse Maps, signals and probes to figure out a banging model and make a real difference in the real world.
Until next time,
Tom x
Thanks to
for another killer edit.
[1] My hobby used to be scoring 0 and then using the follow-up box to explain why NPS is a terrible, bad, no good metric. I can’t be bothered any more.
[2] The Halo Effect is a cognitive bias in which the perception of one quality is contaminated by another, more readily available quality. For example, good-looking people are rated as being more intelligent. In business, observers think they are making judgements of a company’s customer focus, quality of leadership, or other virtues, but their judgement is contaminated by indicators of company performance such as share price or profitability. Correlations of, say, customer focus with business success then become meaningless, because success was the basis for the measure of customer focus. The Halo Effect by Phil Rosenzweig explains this and several other common delusions. Well worth reading :D