The world's leading statisticians have spoken, and their message to researchers, students, and science communicators is clear - it's time to stop using p-values and statistical significance on their own to test hypotheses and determine whether results are important.
If none of that makes sense to you, the ELI5 is this: scientists find correlations in their research all the time, and to figure out whether they're legitimate or just a fluke, they compute a statistic called the p-value. The lower the p-value, the better the chance that the results are real, with a p-value of less than 0.05 being the magic number that determines whether something is worthy of publishing ('statistically significant'). Or, at least, that's how we're using it now, but according to a statement just released by the American Statistical Association (ASA), we're doing it all wrong.
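That conventional threshold test is easy to see in code. Here's a minimal, stdlib-only Python sketch of how a two-sided p-value is typically computed from a z-score and compared against 0.05 - the z-score here is an invented number for illustration, and the normal CDF is built from the standard library's error function:

```python
import math

def normal_cdf(x):
    # Standard normal cumulative distribution, via math.erf (stdlib)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(z):
    # Probability of a test statistic at least this extreme, in either
    # direction, *if* the null hypothesis (no real effect) were true
    return 2.0 * (1.0 - normal_cdf(abs(z)))

z = 2.1  # hypothetical z-score from some comparison (made up)
p = two_sided_p_value(z)
print(round(p, 4))   # about 0.0357
print(p < 0.05)      # True - 'statistically significant' by convention
```

Note what the function actually returns: the probability of data this extreme *assuming* there's no real effect - not the probability that the effect is real, which is exactly the distinction the ASA statement turns on.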
"The p-value was never intended to be a substitute for scientific reasoning," said ASA executive director, Ron Wasserstein. "Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a 'post P<0.05 era'."
Coming from a statistician, those are fighting words, and for the first time in its 177-year history, the ASA has released a statement explicitly detailing how the test should be used.
The decision came after the association became increasingly concerned that the scientific community's reliance on p-values is contributing to the publication of findings that can't be reproduced - which, if recent studies are anything to go by, is a pretty big problem.
"Over time it appears the p-value has become a gatekeeper for whether work is publishable, at least in some fields," said Jessica Utts, president of the ASA. "This apparent editorial bias leads to the 'file-drawer effect', in which research with statistically significant outcomes are much more likely to get published, while other work that might well be just as important scientifically is never seen in print."
It also causes researchers to 'hack' their data to get that much-needed p<0.05 value and report only those tests in journal articles - rather than being transparent about all the statistical tests and decisions that went into the analysis.
So if we're using p-values all wrong, then what's right? The ASA has issued these six guidelines:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
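The fifth guideline - that a p-value doesn't measure the size of an effect - can be made concrete with a short sketch. Using the same stdlib-only normal approximation, and assuming a simple one-sample z-test with a known standard deviation (all numbers here are made up), the exact same tiny effect flips from 'not significant' to overwhelmingly 'significant' purely by increasing the sample size:

```python
import math

def two_sided_p(z):
    # Two-sided p-value for a z statistic under a standard normal null
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

effect, sd = 0.1, 1.0  # one small, fixed effect (hypothetical numbers)
for n in (100, 1000, 100000):
    z = effect / (sd / math.sqrt(n))  # z statistic grows with sqrt(n)
    print(n, round(two_sided_p(z), 6))
```

With n = 100 the p-value is around 0.32 (nowhere near significant); with n = 100,000 it is effectively zero - yet the effect itself never changed. A tiny p-value can mean a large effect or just a large dataset, which is why it can't stand in for scientific importance.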
This isn't the first time that p-values have been criticised - one journal last year went so far as to ban them altogether - and many scientists are applauding the bold statement.
"Surely if this happened 20 years ago, biomedical research could be in a better place now," Giovanni Parmigiani, a biostatistician at the Dana Farber Cancer Institute in Boston, who isn't part of the ASA, told Nature.
But others warn that it doesn't address the real issue, which goes way beyond p-values and has more to do with society's unrealistic expectations of science.
"People want something that they can't really get," said statistician Andrew Gelman from Columbia University. "They want certainty."
And that's going to take a whole lot more open communication between scientists and the public about what it really means to draw conclusions from results, and the nuanced interpretation that involves.
It's not going to be easy, but when the goal is to better the scientific method, it's always worth it.