Data-based decisions are good, right?
Yes, but… in practice, it’s not quite that simple.
Consider COVID. These days, everyone, particularly those in politics and government, uses “the data” to explain why they made this or that decision. Often, they add that the decision was based on “science.”
But what exactly is “science”?
Oversimplified, it’s a model of what will happen given various inputs to that model.
When inputs yield the expected output, there is confidence in the model. And the inputs and outputs are, well, data. So, all this talk is really about models and data.
We use — or should use — models and data in business all the time. Unfortunately, there is a tendency to accept the words “model” and “data” as verification in and of themselves, overlooking the pitfalls that occur when these elements contain underlying flaws or inaccuracies.
Three Topics in One
When evaluating a model for business use, there are really three topics to address:
Is this the right model?
Is the data meaningful?
What decision(s) are we trying to make using this model?
#1. Is this the right model?
When faced with a new problem to solve or a new decision to make, the first question is what model to use. For example, when the pandemic first hit, many epidemiologists started with existing coronavirus models, because COVID is a coronavirus. Their projections were way off — they were using the wrong model.
Those who made better initial projections realized that the infection pattern looked less like earlier coronaviruses and more like a flu virus, so they used flu-type models instead.
So, first off,
ask yourself if you are starting with a model that projects something even close to what you are seeing.
#2. Is the data meaningful?
How was the data collected? Who collected it? How was it measured? Does the data look different if it is collected or measured in a different way? There are lots of ways to get tripped up here.
For example, I have a client that is moving to a new method for calculating how many tons of a raw material are converted into cubic feet of processed material in the production of its product. Currently, once an hour,
a 10-second sample is crudely taken of the raw material going into the process; another sample is taken as the material leaves the process.
Using these and some other inputs, a calculation is made to determine how many tons of raw material result in how many cubic feet of finished product. Soon, however, my client will be moving to a calculation based on raw material received, and beginning and ending raw material inventories that will be accurately measured.
Both approaches involve “data” and a “model.” But the current method relies on very inaccurate measurements taken by people; the new method relies on much more precise measurements taken by machines.
In all cases, the output of a model will be no more accurate than the input — garbage in, garbage out.
Knowing how accurate the inputs are gives a sense of how much variance to expect in the output (and whether the output is usable at all). In the example above, a bad measurement multiplied by another bad measurement gives bad output data. How did my client know? Because the math should accurately predict the ending inventory of raw materials, and that prediction is rarely close.
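That inventory check can be sketched as a simple reconciliation. This is a hypothetical version with invented figures (my client's actual numbers and method are not shown here): predict the ending raw-material inventory from what was on hand, what was received, and what the model says was consumed, then compare against the physical count.

```python
# Hypothetical sanity check (all figures invented): does the model's
# estimated consumption reconcile with physically counted inventory?

def predicted_ending_inventory(beginning_tons, received_tons, estimated_consumed_tons):
    """Ending inventory implied by the model's consumption estimate."""
    return beginning_tons + received_tons - estimated_consumed_tons

beginning = 500.0        # tons counted at start of period
received = 1200.0        # tons delivered during the period
model_estimate = 1150.0  # tons consumed, per the sampling-based model
actual_ending = 620.0    # tons physically counted at end of period

predicted = predicted_ending_inventory(beginning, received, model_estimate)
variance = actual_ending - predicted  # 70 tons the model cannot explain

print(predicted, variance)
```

A persistent, sizable variance like this is the signal that the model, or the measurements feeding it, is off.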
Also, be aware of any essentially made-up inputs.
For example, in the early COVID models, many of the inputs weren’t known, so they were assumed to be the same or similar to either flu or other coronaviruses.
And inaccurate inputs multiplied together can yield outputs that are more inaccurate than the individual inputs.
Such was the case with my manufacturing client.
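A toy calculation (numbers invented, not my client's) shows why multiplied errors compound: two inputs that are each 10 percent high produce an output that is about 21 percent high.

```python
# Toy illustration (invented numbers): relative errors compound
# when inputs are multiplied together.

true_rate = 100.0    # e.g., true tons of raw material per hour
true_factor = 2.0    # e.g., true conversion factor

measured_rate = true_rate * 1.10      # each measurement reads 10% high
measured_factor = true_factor * 1.10

true_output = true_rate * true_factor            # what reality produces
measured_output = measured_rate * measured_factor  # what the model reports

relative_error = measured_output / true_output - 1  # about 0.21, i.e. 21% high
print(round(relative_error, 2))
```

The individual errors don't just add; 1.10 × 1.10 = 1.21, so the output error exceeds either input error on its own.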
#3. What decision(s) are we trying to make using this model?
This question helps set the stage for determining whether or not the data and model are “good enough.”
In some cases, such as when making initial COVID projections, precision is less important in making a sensible decision than is knowing the direction. In other cases, such as when using a product-costing model to make pricing decisions in a low-margin industry, precision can be existential.
A few final tips…
Beware of confirmation bias
That is, focusing on data, models, and projections that support an existing point of view while subconsciously ignoring data that doesn’t.
Experience and judgment do matter.
In the case of COVID-19, it was experience across a range of infection types that prompted the realization that the data looked more like a flu than a traditional coronavirus.
Consider sample size and selection methodology
Is the sample size large enough to be meaningful? In my client example above, 10 seconds was not long enough. Further, in our rush to uncover something useful, have we unwittingly made so many selection decisions that the data is not representative or broadly meaningful?
For new situations, as I have mentioned in the past, starting with any model, even an older one, is better than nothing.
Use the initial model as a starting point to learn more. Make a prediction. If it doesn’t hold up, there is a variance to work with and evaluate.
Overall, be careful with new situations.
Sound decisions are based on a theory of how the relevant world works and on good data.
When novel situations or problems arise, it’s especially important to understand and appreciate the limitations of the data and models being used.