In God We Trust, All Others Bring Data. Not So Fast!
In science, both hard and social (sociology, economics, etc.), there is a growing and expensive problem for research. Simply put, early trials, statistical studies and so forth often fail when broadly applied - whether in public policy for the social scientist, or in large-scale drug testing for the hard scientist.
Drug companies, in particular, are concerned. They begin with a theory that their scientists have about what might cure a particular disease or condition. Then the company spends a bit of money on tests and so forth and "it" looks encouraging.
Given those early results, they spend heavily (sometimes hundreds of millions of dollars) on R&D and more testing. Here as well, the results look good. So now they do a bigger, more comprehensive test and... they come up with a dud. A very expensive dud. Why? Two things: path forking and small data sets.
What is that? (Be patient, this applies to business too.)
You may vaguely remember this from a long ago statistics class: "It" is statistically significant if there is a greater than 95% chance that a finding is not random or a result of just noise.
Well, not quite. What the test really says is that, given the data sample examined, a difference that large would show up by chance less than 5% of the time if there were no real effect. "Given the data sample."
That is the key. And there are often two problems with the data sample. First, the sample is small.
This means that for any result to cross that 95% threshold and be statistically significant, the measured effect itself must be large. A 60+% difference may be required, and numbers like that grab our attention. But it may not be real - we mistakenly think the new finding is important and powerful when, in fact, its size is just a mathematical necessity given the small sample used. The second problem is that the finding only holds true for this data set.
And to get to this data set, lots of (sometimes arbitrary) decisions were made. That's called "path forking."
Often, the data was picked because it was easy to get. Or, since we are trying to test a particular theory, we look for data that we think will apply. The point is, the data set may not be sufficiently representative of the world at large.
Said differently, we look at and take action based on a given study or analysis that was statistically significant, but we ignore similar studies that found nothing.
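The small-sample problem above can be made concrete with a little arithmetic. Here is a minimal sketch in Python, assuming a two-sample comparison with equal group sizes and using a rough critical value of 2.0 for the 5% two-sided significance bar (all numbers are illustrative, not from any real study):

```python
import math

def min_detectable_diff(n_per_group, std_dev, t_crit=2.0):
    """Smallest group difference that clears the ~95% significance bar.

    Rule of thumb for comparing two equal-sized groups: a difference is
    significant only if it exceeds t_crit standard errors, where the
    standard error of the difference is std_dev * sqrt(2 / n).
    t_crit ~= 2.0 approximates the 5% two-sided critical value for
    moderate sample sizes.
    """
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    return t_crit * standard_error

# With 10 subjects per group, only a huge effect registers as significant;
# with 1000 per group, even small effects cross the bar.
for n in (10, 100, 1000):
    print(n, round(min_detectable_diff(n, std_dev=30), 1))
# → 10 26.8
# → 100 8.5
# → 1000 2.7
```

In other words, the tiny study could not have reported a small, unexciting effect even if that was the truth - only a headline-grabbing difference was mathematically capable of reaching significance.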
Ok, so much for drug companies and scientists, even the "social" ones. What does this mean for your business world?
Lots of business decisions are now made on so-called "business analytics." You can get an MBA in that stuff these days. But it is bigger than that - think market research, manufacturing defect data and so forth. It can be simpler and smaller too. Here, as with drug companies, the presence of statistical significance does not necessarily tell the entire story or reliably predict future outcomes.
So here is what to look for when conducting or reviewing data-based analysis and recommendations.
- Is the data being analyzed representative of the entire pool of data you need to make your decision? What biases came into play regarding data selection? For example, looking at your own data about existing customers is not the same as looking at data for all potential customers in your market.
Here's what to do:
- What are the limits inherent in looking at readily usable data? I run into this problem frequently. For example, I am currently working with a commercial building contractor for whom I forecast cash flow weekly. While we use the average DSO for a particular customer/project to forecast payments, we recognize that the data set is small. Projects take 12 to 16 weeks from start to finish and invoices are monthly, so we don't have a lot of payment history available. And, it seems, even for the different projects with the same general contractor, payment timing varies. Because of the small sample size, when it comes to extrapolating this finding across all future cash flows, we do so with some caution.
- When reviewing the analysis, always probe the source of the underlying data with an eye toward the limitations it may impose on generalization. Do that in the context of the decision you are making. Is the data representative? For example, we may use what works for existing customers to sell to the ones we don't have. The analysis itself may be done well, but how good is the assumption that the customers you don't have will behave the same way as the ones you do?
- Test whenever possible. For example, years ago at Kraft Foods, the packaging group developed a squeezable container for Miracle Whip. They ran lots of lab tests. When we made product samples, rather than rolling straight out to a test market, we handed the product to employees to use for 60 days. On the first day, a bottle fell from the refrigerator onto the VP of Marketing and Miracle Whip splattered everywhere. It seems real people drop bottles differently than the machine in the lab does. The lab test data was irrelevant.
- Consider the nature of the decision. When I was in the car rental business, we had a huge problem with customers wrecking our cars. We were stuck with big bills to fix them, not to mention covering hospital and legal bills for our customers and others (our customers were usually at fault). In New York City, Hertz led the industry by announcing surcharges by zip code. We analyzed our data, and found similar trends. But we knew that because of our tiny market share, we had a data set too small to mean anything. So we relied on Hertz's much bigger data set and followed along, with a better marketing spin, because our losses were so big.
We then took the lead at Baltimore Washington International (BWI) airport, where we had horrendous accident losses (worse than in New York). While our data sets were small and the room for error high, we raised rates for all locals from anywhere in Maryland. We didn't have the data to identify zip codes with statistically significant bad accident rates - the sample sizes were too small to mean anything at the zip code level. However, all zip codes looked bad, just not in a statistically significant way. Since the cost of a wreck was sky high, we punted on trying to find the good local drivers: no return flight out of BWI, no rental car. It worked. Even when you don't have enough data, don't be afraid to act when the cost of not acting is too high.
- Buy outside data when available. If you can purchase data covering a wider universe than just your own, you'll get a broader, more reliable picture. In my rent a car example above, we had the option of buying accident data reported by many insurers.
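The DSO caveat in the cash-flow example above can be sketched in a few lines. This is a toy illustration with made-up payment figures (not the contractor's actual numbers), showing why an average drawn from a handful of invoices deserves caution:

```python
import statistics

# Hypothetical payment history for one general contractor: days from
# invoice to cash received. With 12-16 week projects billed monthly,
# only a handful of data points exist per customer.
payment_days = [42, 55, 38, 61, 47]  # illustrative numbers only

avg_dso = statistics.mean(payment_days)
spread = statistics.stdev(payment_days)

print(f"average DSO: {avg_dso:.0f} days")
print(f"sample std dev: {spread:.0f} days, on only {len(payment_days)} invoices")
```

With five invoices and payment timing swinging by a couple of weeks, that average is a rough guide for a weekly cash forecast, not a precise prediction - which is exactly why we extrapolate it with caution.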
Research and the collection of data are important in making sound decisions; there's no doubt about that. That said, remember that when making these decisions, you must be careful to understand and identify the limitations baked into your analysis.
Absent that insight, you may spend lots of money on the "new cure" and come up with bupkis, just like big pharma. One last thought.
We make lots of decisions based on mental models - of markets, organization, competitors, etc. Often, these models are based on prior experience.
Talk about limited data sets!!!
P.S. For a short and easy read on this topic from the perspective of a career statistician, click here. Or, listen to the podcast interview of the author above, here.