Scientists Are Trying to Get Rid of a Fundamental Measure Used in Science : ScienceAlert

For a while now, scientists have been debating what to do with one of the most famous tools used to describe scientific certainty, the concept of 'statistical significance'. Some think it's fine as is. Others want to make its threshold stricter, while yet others argue in favour of ditching it altogether.

Assuming we did bite the bullet and ditch the concept of 'probability values', we'd need to replace it with a better idea. A recent issue of The American Statistician contains a few opinions - a whopping 43 of them, in fact.

To paraphrase a famous Winston Churchill quote, the p-value has long been the worst way to distinguish useful ideas in science… except for all of the other methods that were tried from time to time.

It's not really p's fault. On its own, the value simply tells you how often results like yours would turn up if you'd backed the wrong horse in your experiment.

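Usually, if the value falls below 0.05, it means that results at least as extreme as yours would turn up less than five percent of the time if the null hypothesis (an explanation for your observations that isn't part of your brilliant idea) were true. Strictly speaking, that is not the same as a five percent chance that the null hypothesis is what's really behind your results, although it is often read that way.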

Why five percent? Because history, really. It's a better gamble than 10 percent, while not being as strict as one percent. There really isn't a magical quality to it otherwise.
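To make that concrete, here is a minimal sketch of how a p-value can be worked out for a deliberately simple, made-up experiment: a coin is tossed 20 times and lands heads 15 times, and the null hypothesis is that the coin is fair. The coin example, the function name and the numbers are illustrative assumptions, not taken from any of the papers discussed here.

```python
from math import comb

def one_sided_p_value(heads: int, flips: int) -> float:
    """Probability of seeing at least `heads` heads in `flips` tosses,
    assuming the null hypothesis that the coin is fair."""
    # Count every outcome that is at least as extreme as the one observed.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # Each of the 2**flips sequences is equally likely under a fair coin.
    return favourable / 2 ** flips

ALPHA = 0.05  # the conventional, historically chosen threshold

# Hypothetical experiment: 15 heads out of 20 tosses.
p = one_sided_p_value(heads=15, flips=20)
print(f"p = {p:.4f}")
print("below threshold" if p < ALPHA else "not below threshold")
```

Here p comes out at roughly 0.02, so the result would count as 'significant' at the five percent level. Note what the number says: a fair coin would do this about two percent of the time. It does not say there is a two percent chance the coin is fair.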

There's a bunch of statistical tools researchers can use to calculate this figure of significance. Problems arise when we attempt to translate this mathematical ideal into something the meat computers inside our skulls can actually act upon.

Our brains don't deal with probability very well. Maybe it has to do with the fact we never evolved to be concerned with the probability of being eaten by a bear when it's already chewing on our face.

We deal far better with the crisp division of a true or false statement. So a hazy maybe of a p < 0.05 is often hard to swallow, making it prone to abuse.

"The world is much more uncertain than that," University of Georgia statistician Nicole Lazar told NPR writer Richard Harris.

Together with American Statistical Association executive director Ronald L. Wasserstein and retired vice president of Mathematica Policy Research Allen Schirm, Lazar penned an editorial fronting an anthology of musings on how we might do better than 'p'.

Clearly there are ways a probability figure can benefit us, but only if we don't do stupid things with it, like assuming it does more than tell you that your clever explanation is still a contender.

"Knowing what not to do with p-values is indeed necessary, but it does not suffice," the trio write.

"It is as though statisticians were asking users of statistics to tear out the beams and struts holding up the edifice of modern scientific research without offering solid construction materials to replace them."

The issue's articles aren't in complete consensus on what those construction materials should look like. But many do share a few basic elements.

Retiring significance, some argue, should ideally translate into data tabulation and method descriptions that provide added nuance, humbly teasing apart the possibilities while still arguing in favour of a single explanation.

"We must learn to embrace uncertainty," several of the authors write in their Nature opinion piece.

"One practical way to do so is to rename confidence intervals as 'compatibility intervals' and interpret them in a way that avoids overconfidence."

This isn't just a cheap p-value makeover. It would require researchers to actively describe the practical implications of values within those intervals.
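As a rough illustration of what reporting an interval rather than a verdict might look like, the sketch below computes a 95 percent interval for the difference between two group averages. The measurements are invented, the function name is hypothetical, and the calculation uses a simple normal approximation that is crude for samples this small; a real analysis would pick a method suited to its data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def compatibility_interval(a, b, level=0.95):
    """Return (difference in means, lower bound, upper bound) for a - b,
    using a normal approximation (an assumption made for simplicity)."""
    diff = mean(a) - mean(b)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    # Two-sided critical value, about 1.96 for a 95 percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

# Made-up measurements for a treated group and a control group.
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 5.2, 6.1, 5.4]
control = [4.9, 4.7, 5.3, 5.0, 5.6, 4.6, 5.2, 5.1]

diff, low, high = compatibility_interval(treated, control)
print(f"estimated difference: {diff:.2f}")
print(f"values from {low:.2f} to {high:.2f} are reasonably compatible with the data")
```

The point of the exercise is the last line: instead of stopping at 'significant' or 'not significant', the write-up would go on to say what a difference at the low end, the middle and the high end of that range would each mean in practice.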

The ultimate goal would be to establish practices that avoid the cut-offs leading to true-or-false thinking and instead reinforce the uncertainty that underpins the scientific method.

Science, at its heart, is a conversation after all. Policy makers, technicians, and engineers are the eavesdroppers who distill that buzz of voices into a concrete decision, but for scientists chasing the next step in research, a p-value on its own isn't overly useful.

Unfortunately, it has become a finishing line in the race for knowledge, where funding for investigations and public accolades await the winners.

Overturning such a solidly entrenched cultural practice will take a lot more than a few editorials and a handful of well-argued science papers. The p-value has been a respected part of science for about a century now, so it'll be around for a while to come.

But maybe this kind of thinking offers us some convenient stepping stones to get us beyond statistical significance, into a place where the blurred lines of uncertainty can be celebrated.