An attempt to replicate the findings of 21 social science experiments published in two high-profile science journals has thrown up a red flag for reliability in research.
The question of just how much trust we should put into one-off studies isn't new. But there's a silver lining to the so-called Reproducibility Crisis that we need to keep in mind: this is an opportunity we simply cannot afford to miss.
Several years ago, the focus was on psychology and the uncomfortable discovery that many influential psychological studies didn't always come to the same conclusion when replicated.
Other scientific fields have since come under the same scrutiny. The Experimental Economics Replication Project redid 18 experiments published in two key economics journals between 2011 and 2014, finding a significant effect in the same direction as the original for just 11 of them.
A project run by the US Center for Open Science and Science Exchange has been slowly redoing key experiments in cancer research, and has run into similar difficulties replicating past results, with a success rate of around 40 percent.
This time a team of researchers led by the California Institute of Technology has turned its attention to the social sciences, systematically selecting papers published between 2010 and 2015 in the journals Nature and Science, and repeating their methods.
They made one notable change to the protocols: the samples they tested were, on average, roughly five times larger than the originals, giving the replications far more statistical power.
Science and Nature are the big guns of the peer-review world. Researchers who get their papers into these two journals know their work will not only be read by more people but also be trusted.
But is this trust always warranted?
For 13 of the 21 experiments, the researchers reached a conclusion broadly in line with that of the original study. Depending on how you define 'agree', that puts the replication rate at between 57 and 67 percent.
Taking the size of the effects into account, the replicated effects were on average about 50 percent weaker. So while most of the repeated studies pointed in the same direction as the originals, the originals often overstated how strong their effects were.
Only four of the papers came from Nature, and the team was able to replicate three of them. Of the seventeen studies from Science, seven could not be replicated.
Before we get fingers wagging, we need to put all of this into context.
Failing to replicate a finding does not make the original finding wrong. It does leave us in a precarious situation, open to questioning the results and reducing our confidence in their usefulness.
An inability to come to the same conclusions also doesn't completely invalidate the data itself. Context is everything, and a different conclusion might not make the data or even some of the findings bad.
Interestingly, none of this is news to the social science community. A survey conducted by the research team found that researchers in the field were able to reliably predict which papers would replicate and which wouldn't.
This suggests the scientific community has the critical thinking skills to take some studies with a grain of salt where necessary.
Still, it's clear there's plenty of room for improvement. And this goes for the big journals as much as the smaller, more niche publications.
The findings can help us improve the methodologies we use to design and carry out experiments, or find better ways to report their significance.
Where two studies completely disagree, there's every possibility that a latent independent variable is responsible for the conflict.
This shadowy influence lurking just out of sight could be the key to understanding a phenomenon. Far from poor science, an inability to replicate just might point the way to new science.
To take advantage of this, we need to take a hard look at the way we fund and carry out scientific research. Replicating everything might not be feasible, but how do we prioritise the studies we would benefit most from repeating?
Some researchers have suggested changes to how we report statistical significance, setting stricter thresholds that could rule out less convincing conclusions before they're ever published.
Others argue we need to register studies before they're finished, removing the temptation to file away experiments that don't seem to be going as planned. That would give a more accurate view of the research landscape.
The answers are out there, and science is stronger for asking these hard questions.
Forget calling it a crisis. This is the Reproducibility Opportunity, and we're hoping some good things come of it.
This research was published in Nature Human Behaviour.