Businesses look at data all the time to draw conclusions and then make decisions. But before you make the next decision, ask yourself, “Is the data I am relying on good enough to support whatever conclusion I have arrived at?”
This is a critical question, since when evaluating decisions that have gone wrong, I often find that it is the data itself — not the conclusions that were derived from it — that was flawed.
Consider this recent example…
An academic study claims that 3.6 million American households live on $2 a day or less. No way, say Bruce Meyer and his colleagues at the University of Chicago Harris School of Public Policy, along with a researcher at the U.S. Census Bureau.
That erroneous conclusion, the Meyer team argues, came from relying on a Census Bureau survey that collects data by interviewing households. When the survey answers are compared with other anonymized government data, many of them turn out to be flat-out wrong: 90% of the people the study said lived on less than $2 per day were far better off.
It’s hard to get accurate survey data in general; subsequent research has shown that this survey in particular was highly flawed and excluded many forms of income that should have been counted.
The people who wrote the study are highly educated and versed in statistics. If they could get it so wrong, how is a non-data specialist to know if the data they are using is good enough to draw conclusions from?
Some questions to consider…
Does the data pass the common sense test?
When I was in the rent-a-car business in the early ’90s, we had a facility in a dicey part of Washington, DC, that was part of our mid-Atlantic operation. Internal reports said the location was slightly profitable. Yet the mid-Atlantic operation was losing lots of money, and that facility had been firebombed twice in less than 12 months. It was also the source of many accidents and damaged cars. None of this made sense.
Until I took a closer look. I quickly discovered that except for rent, revenue, and payroll, all other line items for that facility were allocations — its accident repair costs had been spread evenly across the entire mid-Atlantic operation. Worse, insurance costs per car had been arbitrarily set by the Company’s Chief Operating Officer, disguising the fact that annual insurance payouts for this location exceeded annual revenue! Common sense prevailed and the facility was quickly shut down. (Today, the location is a high-rent office building. Times do change.)
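For readers who work with the numbers directly, here is a minimal Python sketch of how an even allocation can hide a money-losing location. Every figure and location name below is hypothetical, invented for illustration; these are not the actual numbers from my rental-car example.

```python
# Hypothetical figures, invented for illustration only.
from dataclasses import dataclass


@dataclass
class Location:
    name: str
    revenue: int
    direct_costs: int    # costs booked to the location itself (rent, payroll)
    accident_costs: int  # repair and insurance costs the location actually caused


def profit_with_allocation(locations):
    """Spread the region's accident costs evenly across every location."""
    share = sum(loc.accident_costs for loc in locations) / len(locations)
    return {loc.name: loc.revenue - loc.direct_costs - share for loc in locations}


def profit_with_attribution(locations):
    """Charge each location only the accident costs it actually incurred."""
    return {
        loc.name: loc.revenue - loc.direct_costs - loc.accident_costs
        for loc in locations
    }


region = [
    Location("City", revenue=700_000, direct_costs=300_000, accident_costs=900_000),
    Location("Airport", revenue=900_000, direct_costs=600_000, accident_costs=60_000),
    Location("Suburb", revenue=400_000, direct_costs=250_000, accident_costs=30_000),
]

allocated = profit_with_allocation(region)
attributed = profit_with_attribution(region)
print(allocated)   # City appears to earn 70,000
print(attributed)  # City actually loses 500,000
```

Both views report the same regional loss of 140,000; the allocation only changes which location appears to be causing it.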
There is such a thing as statistical common sense, too. The data points in the poverty study were significant deviations from the mean: extreme outliers. Statisticians (those in this study notwithstanding) understand that in the real world, extreme outliers are usually noise or errors.
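One common way to put this kind of statistical common sense into practice is to flag extreme values for a closer look before drawing conclusions from them. The sketch below uses the modified z-score (based on the median and the median absolute deviation, with the conventional 3.5 cutoff), a standard screening technique I have swapped in for illustration; it is not the method Meyer’s team used, and the income figures are hypothetical.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which the
    outliers themselves distort far less than the mean and standard deviation.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical reported daily income per person, in dollars
reported = [48, 52, 55, 61, 47, 50, 58, 1.5, 53, 49]
print(flag_outliers(reported))  # [1.5]
```

A flagged value is not automatically wrong; it is a prompt to verify it before any decision rests on it.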
How was the data obtained?
There are many limitations to data obtained from in-person surveys. For example, most people in the United States underreport their income, something the researchers above should have known. In my rent-a-car example, understanding how location profitability was determined should have been red flag enough to keep anyone from drawing significant conclusions from it.
Or consider a current client whose measure of customer-specific product profitability showed that nearly all customer pricing for nearly all products was underwater (unprofitable). Yet, until COVID, the company was profitable overall. The data problem? One commodity raw material, representing 50% of revenue in aggregate, was measured in a horribly inaccurate way. Other key costs were allocated based on “run time,” which was also measured in a grossly inaccurate way. Today, I am working with this client to establish processes and methods that accurately determine customer-specific product profitability.
Does it crosscheck with other data?
In the poverty example, Meyer’s research team crosschecked the survey results against IRS records and other data. In the rent-a-car example, location profitability data was crosschecked against insurance payout data.
What if you don’t have crosscheck data? If the conclusions from the existing data lead to significant decisions, go outside and buy it.
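Mechanically, a crosscheck means matching records from two independent sources and flagging where they disagree. Here is a minimal Python sketch, assuming both sources share an anonymized ID and report the same figure (annual income); the IDs, amounts, and the 25% tolerance are all hypothetical choices for illustration.

```python
def crosscheck(survey, records, tolerance=0.25):
    """Compare survey answers with an independent source of the same figure.

    survey, records: dicts mapping a (hypothetical) anonymized ID to annual income.
    Returns (mismatched, unmatched): IDs whose survey figure differs from the
    record by more than `tolerance` (a fraction of the record value), and IDs
    with no record to check against.
    """
    mismatched, unmatched = [], []
    for key, reported in survey.items():
        if key not in records:
            unmatched.append(key)
            continue
        actual = records[key]
        if abs(reported - actual) > tolerance * abs(actual):
            mismatched.append(key)
    return mismatched, unmatched


# Hypothetical data: survey answers vs. administrative records
survey = {"h1": 700, "h2": 31_000, "h3": 52_000, "h4": 18_000}
records = {"h1": 24_500, "h2": 30_000, "h3": 51_000}

print(crosscheck(survey, records))  # (['h1'], ['h4'])
```

Reporting unmatched IDs separately matters: a crosscheck that silently skips what it cannot match can overstate how well the data holds up.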
Can we disaggregate the data?
As I have written about before, a problem may also occur when one assumes that combining individual numbers leads to a meaningful
Consider another government policy example. For years, economists declared that the huge decline in manufacturing employment in the 2000s was due to productivity growth. But when economist Susan Houseman looked into that data and disaggregated it, she found that productivity growth was not the answer.
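To see why disaggregating matters, consider how a single strong component can carry an aggregate. The Python sketch below uses three hypothetical sectors and a simple output-weighted average of productivity growth; the sectors, figures, and weighting scheme are all assumptions for illustration, not Houseman’s data or method.

```python
# Hypothetical sectors: (name, output in $ billions, productivity growth in %)
sectors = [
    ("Sector A", 15, 20.0),
    ("Sector B", 45, 1.0),
    ("Sector C", 40, 0.5),
]


def weighted_growth(rows):
    """Output-weighted average productivity growth across the given sectors."""
    total_output = sum(output for _, output, _ in rows)
    return sum(output * growth for _, output, growth in rows) / total_output


print(f"All sectors:        {weighted_growth(sectors):.2f}%")      # 3.65%
print(f"Excluding Sector A: {weighted_growth(sectors[1:]):.2f}%")  # 0.76%
```

The aggregate looks healthy, yet 85% of output grew at 1% or less; only breaking the total apart reveals that.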
If a data source is important for recurring decisions, make sure it is a good one and that it is constructed to serve its purpose. Financial statements are a good example: processes and controls must be in place to ensure the data is accurate.
A client of mine has struggled with financial statement accuracy, so much so that it is difficult to assess the company’s financial performance. They thought that this year they had turned the corner on COVID-induced losses. Then, after the most recent month-end close, they found out that while they are ahead of last year, it is not by nearly as much as they thought.
Why? Because they do a poor job of accruing expenses when there is no invoice or record — it’s almost cash basis accounting. When those invoices showed up and were paid, a bucket of expenses for prior periods came to life in the “profit” data. I am helping this company put in proper month end closing procedures to make the monthly financial data meaningful and useful.
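The timing problem is easy to see with a few numbers. This minimal Python sketch compares monthly profit when expenses are booked when paid (cash basis) versus when incurred (accrual basis); the months, amounts, and late invoices are hypothetical.

```python
from collections import defaultdict

# Hypothetical expenses: (month incurred, month the invoice was paid, amount)
expenses = [
    (1, 1, 10_000),
    (1, 3, 8_000),   # January expense, invoice paid in March
    (2, 3, 12_000),  # February expense, invoice paid in March
    (3, 3, 10_000),
]
revenue = {1: 40_000, 2: 40_000, 3: 40_000}


def monthly_profit(expenses, revenue, basis):
    """Profit by month; 'cash' books expenses when paid, 'accrual' when incurred."""
    if basis not in ("cash", "accrual"):
        raise ValueError("basis must be 'cash' or 'accrual'")
    cost = defaultdict(int)
    for incurred, paid, amount in expenses:
        cost[incurred if basis == "accrual" else paid] += amount
    return {month: revenue[month] - cost[month] for month in sorted(revenue)}


print(monthly_profit(expenses, revenue, "cash"))     # {1: 30000, 2: 40000, 3: 10000}
print(monthly_profit(expenses, revenue, "accrual"))  # {1: 22000, 2: 28000, 3: 30000}
```

Both bases total 80,000 for the quarter, but on a cash basis January and February look better than they were, and March absorbs the prior months’ expenses.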
Decisions based on data are almost always better than decisions based on just winging it.
That said, it’s important to make sure your data is good enough to support your business decisions. Otherwise, you’re still winging it… just unknowingly!