The Cult of Statistical Significance

“But is it significant?”

That’s always one of the first questions researchers in economics and finance are asked. It makes an interesting contrast with a different question: “Does it matter?”

The Cult of Statistical Significance by Stephen T. Ziliak and Deirdre N. McCloskey is a book that every economist, research analyst, and investor probably needs to read but very few have. The authors describe how the fields of economics and finance have become enthralled by p-values. If a result is statistically significant at the 5% level, it is considered a valid phenomenon. A result that fails that test is supposed to be non-existent.

Obviously, the 5% rule misses two points. First, by chance alone, about one in every 20 tests of an effect that does not exist will still meet that threshold. Since thousands, perhaps millions, of tests are conducted on finance and economics data every year, we can imagine how many spuriously positive results are found and then published. After all, a positive result is far easier to publish than a negative one.
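To make the one-in-20 point concrete, here is a minimal simulation sketch. It assumes every tested effect is truly zero and that the noise is standard normal with known variance, so a simple z-test applies. The function name, sample sizes, and seed are my own illustrative choices, not anything taken from the book.

```python
import random
from statistics import NormalDist, mean

def share_of_false_positives(n_tests=2000, n_obs=50, alpha=0.05, seed=0):
    """Run n_tests z-tests on pure noise and return the share flagged significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    hits = 0
    for _ in range(n_tests):
        # The true effect is zero by construction: every "finding" is spurious.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        z_stat = mean(sample) * n_obs ** 0.5  # known standard deviation of 1
        if abs(z_stat) > z_crit:
            hits += 1
    return hits / n_tests

print(share_of_false_positives())
```

The printed share should land close to 0.05, even though there is nothing to find in any of the samples.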

I remember sitting through a seminar in my university days. A researcher presented statistically significant evidence that company directors leave the board before the firm gets into trouble with its auditors or regulators. That’s all well and good. But then he showed us that this observation can make money: a full 0.2% outperformance per year, and that was before transaction costs.

Because the researcher had so many data points to estimate his regression, he could generate statistical significance even though the effect had no economic significance. In the end, it was a purely academic exercise.
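The arithmetic behind this is simple enough to sketch. The t-statistic of an average effect is the effect divided by its standard error, and the standard error shrinks with the square root of the number of observations. The 20% volatility and the sample sizes below are hypothetical numbers I picked for illustration. They are not the researcher's actual data.

```python
import math

def t_statistic(effect, volatility, n_obs):
    """t-statistic of an average effect estimated from n_obs observations."""
    standard_error = volatility / math.sqrt(n_obs)
    return effect / standard_error

# Hypothetical inputs: 0.2% outperformance and 20% volatility (an assumption).
for n_obs in (100, 10_000, 40_000, 1_000_000):
    print(n_obs, round(t_statistic(0.002, 0.20, n_obs), 2))
```

Under these assumptions the same 0.2% effect is nowhere near significant with 100 observations, but it clears the usual threshold of roughly 2 once the sample reaches about 40,000. The effect itself never changes, only our ability to detect it.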

And second, in the 21st century, the amount of available data has multiplied many times over. Hedge funds and traditional asset managers apply big data to find patterns in markets that they can exploit. They analyze the data with artificial intelligence (AI) to find “meaningful” correlations that traditional analyses would miss. This approach to investing has a lot of challenges to overcome.

A major and rarely mentioned one: The more data we look at, the more likely we’ll find statistically significant effects, and the more underlying data we have, the more powerful our statistical tests become. So with more data, we can detect smaller and smaller effects that may or may not be economically meaningful.
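A rough way to see the growth in power is to compute it directly. The sketch below uses the textbook power formula for a two-sided z-test with known volatility. The 0.2% effect and 20% volatility are again hypothetical inputs, and the function name is my own.

```python
import math
from statistics import NormalDist

def power_two_sided(effect, volatility, n_obs, alpha=0.05):
    """Probability that a two-sided z-test flags `effect` as significant."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The true effect measured in standard errors; it grows with sqrt(n_obs).
    shift = effect * math.sqrt(n_obs) / volatility
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical inputs: a 0.2% effect measured against 20% volatility.
for n_obs in (1_000, 40_000, 250_000, 1_000_000):
    print(n_obs, round(power_two_sided(0.002, 0.20, n_obs), 3))
```

With enough observations the test will flag almost any non-zero effect, no matter how small. Whether such an effect is worth trading on is a separate question that the test cannot answer.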


In “Statistical Nonsignificance in Empirical Economics,” Alberto Abadie analyzes how much knowledge we gain with a statistically significant test result. The dashed curve in the chart below shows the assumed distribution of a variable’s possible values before any tests are done. Then, we measure the data (for example, returns of stocks with specific characteristics) and end up with a statistically significant result. The solid curve shows where the true effect could lie after that result, depending on the number of data points. With very few data points, a statistically significant result carves out quite a big chunk of the distribution. So we learn much more if we get a significant result with few data points.

But with 10,000 data points, the carve-out is extremely small. What that means is the more data we have, the less informative a statistically significant result becomes. On the other hand, if a test on 10,000 data points fails to reach statistical significance, we learn an awful lot. In fact, we would know that the true value would have to be almost exactly zero. And that, in itself, could give rise to an extremely powerful investment strategy.
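A small simulation can illustrate the intuition. This is a sketch in the spirit of the argument, not Abadie's actual model. It assumes the true effect is drawn from a standard normal prior and is measured with standard normal noise, and all names and parameters are my own choices.

```python
import random
from statistics import NormalDist, mean

def significance_split(n_obs, n_draws=20_000, alpha=0.05, seed=0):
    """Return (share of significant tests, mean |true effect| when not significant)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = 1.0 / n_obs ** 0.5  # assumes noise with unit standard deviation
    n_significant = 0
    nonsignificant_effects = []
    for _ in range(n_draws):
        true_effect = rng.gauss(0.0, 1.0)  # assumed prior over the true effect
        estimate = rng.gauss(true_effect, standard_error)
        if abs(estimate) / standard_error > z_crit:
            n_significant += 1
        else:
            nonsignificant_effects.append(abs(true_effect))
    return n_significant / n_draws, mean(nonsignificant_effects)

for n_obs in (10, 10_000):
    print(n_obs, significance_split(n_obs))
```

Under these assumptions, significance is close to a coin flip with 10 data points, so getting it tells us something. With 10,000 data points, nearly every test comes out significant, which makes a significant result unsurprising, while the rare nonsignificant one pins the true effect very close to zero.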

The Impact of a Statistically Significant Result on Our Knowledge

This is a major reason why so many big data and AI applications fail in real life and why so many equity factors stop working once they’re described in the academic literature.

In fact, a stricter definition of significance that accounts for possible data-mining bias demonstrates that out of the hundreds of equity factors only three are largely immune from p-hacking and data mining: the value factor, the momentum factor, and a really esoteric factor that I still haven’t understood properly.

So what’s the big takeaway? Just because it’s statistically “significant” doesn’t mean it matters. And if it isn’t significant, it may well matter a lot. The next time you come across a significant new result, ask yourself if it matters.

For more from Joachim Klement, CFA, don’t miss 7 Mistakes Every Investor Makes (And How to Avoid Them) and Risk Profiling and Tolerance, and sign up for his Klement on Investing commentary.


