The measurement oversimplification trap
Everything should be made as simple as possible, but not simpler
Humans have a bias for simplicity, and for good reasons. Our brains weren’t designed to process large volumes of information in a short time span, and they will do anything to avoid engaging higher-order functions when they have to reach a judgment and formulate a response.
Simplicity is a worthy goal to have. My previous post, Lowering uncertainty with limited data, includes examples of how simple rules of thumb can support high-quality decision-making, in some cases even exceeding the results of much more complex models.
However, as the aphorism often attributed to Einstein goes,
Everything should be made as simple as possible, but not simpler.
So, how can we tell when simplification has gone too far?

Below are two signs that oversimplification is happening and needs to stop.
1) Optimizing for a single measure
Be it revenue per visitor, monthly active users, or Net Promoter Score (The One Number You Need to Grow), many companies and teams fixate on a single measure of success they hope will help them achieve a desired outcome (close a new round of funding, convince a prospect to sign up, etc.).
It would be great to find one performance measure capable of indicating when everything is not in fine shape and of telling us what to do about it. Unfortunately, it can’t be done.
A classic example is measuring (and rewarding) software developers by the number of bugs fixed. If a consequential bug exists, fixing it is of great value, but what happens when that’s the only dimension of performance being measured? We quickly devolve into The folly of rewarding A, while hoping for B. Companies using this approach soon found out that developers were deliberately shipping buggy code so they could subsequently fix it and collect their reward.
2) Relying on averages for important decisions
Averages are inherently reductive and often misleading because they ignore the impact of the inevitable variations.
For a long time, scientists believed that the truth of something could be determined by collecting and averaging massive amounts of data. With more research, however, it became clear that in domains with high variability, systems designed around averages do not work well. As the joke that first appeared in the 1950s goes, a statistician who put his head in an oven and his feet in a freezer might conclude, “On average, I feel fine.”
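The joke translates directly into numbers. Here is a minimal Python sketch (the temperature readings are made up for illustration) showing how the average erases exactly the variation that matters:

import statistics

# Illustrative (made-up) readings for the statistician joke, in °F.
head_in_oven = [120.0, 118.0, 122.0]
feet_in_freezer = [-22.0, -20.0, -24.0]

readings = head_in_oven + feet_in_freezer
average = statistics.mean(readings)
spread = max(readings) - min(readings)

print(f"Average: {average:.1f} °F")  # ~49 °F: "on average, I feel fine"
print(f"Spread:  {spread:.1f} °F")   # ~146 °F: the extremes the average hides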
Around the same time, the US Air Force learned the same lesson: if you design a cockpit to fit the average pilot, you’ve actually designed it to fit no one.
Decisions made based on average conditions usually go wrong:
Segmenting customers based on their average spend and frequency of purchase may be useful in some scenarios, but it creates a costly oversimplification in others. For example, among your regular big spenders, one group may be willing to buy a new product without any discount, while another will only buy if offered one. Treating both cohorts as a single homogeneous group may cause the business to lose significant money.
Using a scoring system to assess the technical skills of job seekers may be useful for quickly filtering out unqualified candidates. However, if the choice of which five candidates to interview is based only on a composite score, a candidate who scored exceptionally high in one skill but displayed a significant weakness in another may be kept in the pool, while an attractive candidate with solid skills across all important domains is eliminated, as the sketch below illustrates.
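To make this concrete, here is a small Python sketch with hypothetical candidates and invented skill scores. Ranking by the composite alone puts a candidate with a near-disqualifying weakness ahead of a solidly balanced one:

# Hypothetical skill scores (0-10) for three candidates across three domains.
candidates = {
    "Candidate A": {"algorithms": 10, "databases": 10, "communication": 2},
    "Candidate B": {"algorithms": 7, "databases": 7, "communication": 7},
    "Candidate C": {"algorithms": 8, "databases": 3, "communication": 9},
}

def composite(scores):
    """Plain average across skills: the reductive view."""
    return sum(scores.values()) / len(scores)

# Rank by composite alone, then expose the component the ranking hides.
for name, scores in sorted(candidates.items(), key=lambda kv: -composite(kv[1])):
    weakest = min(scores, key=scores.get)
    print(f"{name}: composite={composite(scores):.2f}, "
          f"weakest={weakest} ({scores[weakest]})")

# Candidate A tops the ranking (7.33) despite a communication score of 2,
# edging out the well-rounded Candidate B (7.00).

Printing the weakest component next to the composite is what exposes the imbalance, which is precisely the point of the advice further below about always reporting a composite’s component parts.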
What can we do to avoid these negative effects of oversimplification?
Don’t fixate on a single measure of performance to avoid developing a distorted perspective of reality and creating incentives for gaming the system.
Don’t focus on measuring just what’s easily measurable to the detriment of what’s important to know. A lot of what’s considered intangible and unmeasurable (like the performance of business analysts) can be measured with a useful degree of accuracy and repeatability to improve decision-making.
Don’t rely on a composite score (e.g., overall code quality) without also reporting on its component parts. If everyone isn’t constantly reminded of what makes up the composite measure (e.g., defect rate, time-to-market, reusability), it soon becomes a meaningless abstraction.
Don’t trust that averages will be a good reference standard. When the US Air Force discovered that measuring average pilot body dimensions wasn’t helping ensure better-fitting cockpits, it began requiring airplane manufacturers to design cockpits that fit pilots whose measurements fell within the 5th to 95th percentile range on each dimension. As Sam Savage put it in his 2002 HBR article The Flaw of Averages, rather than “Give me a number for my report,” what every executive should be saying is “Give me a distribution for my simulation.”
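Savage’s point is easy to demonstrate. In the minimal Monte Carlo sketch below, all figures are invented for illustration: a capacity of 100 units, a margin of 10 per unit sold, and demand uniformly distributed between 50 and 150 units. Planning on the average demand overstates the profit you can actually expect, because profit is a nonlinear function of demand:

import random

random.seed(42)  # reproducible illustration

CAPACITY = 100      # units we can produce (assumed)
UNIT_MARGIN = 10.0  # profit per unit sold (assumed)

def profit(demand):
    # Sales are capped by capacity, making profit nonlinear in demand.
    return UNIT_MARGIN * min(demand, CAPACITY)

# "Give me a distribution": demand is uncertain, uniform between 50 and 150.
demands = [random.uniform(50, 150) for _ in range(100_000)]
avg_demand = sum(demands) / len(demands)

number_for_the_report = profit(avg_demand)                      # profit at the average
distribution_result = sum(map(profit, demands)) / len(demands)  # average of the profits

print(f"Profit at average demand: {number_for_the_report:,.0f}")  # ~1,000
print(f"Average simulated profit: {distribution_result:,.0f}")    # ~875

The gap exists because averaging the input before applying a nonlinear function (here, the capacity cap) is not the same as averaging the outputs: the plan built on the average demand is roughly 14% more optimistic than what the distribution of demands actually delivers.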


