There's an awful lot of institutional pressure on scientific researchers to keep making new, important discoveries and report their 'breakthrough' findings in academic journals. This phenomenon, sometimes called 'publish or perish', unfortunately leads some authors to take fraudulent shortcuts in their research – falsifying data so they can pump out their latest paper.
But short of peer-reviewing new studies, how can the scientific world weed out the fakers from the legit papers? Researchers from Stanford University have come up with one way to spot the frauds: an "obfuscation index" that rates the degree to which dishonest scientists attempt to mask false results in their work.
"We believe the underlying idea behind obfuscation is to muddle the truth," said one of the team, David Markowitz. "Scientists faking data know that they are committing a misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science."
In other words, scientists use big words and inscrutable jargon to try to hoodwink their readers, hoping that the puzzlement over their language and methods will be enough to distract from any shortcomings in their data.
To test this theory, the researchers examined 253 scientific papers that were retracted from journals for reasons of documented fraud between 1973 and 2013 and compared them with unretracted articles from the same journals during that period.
Using their customised obfuscation index, they scored each paper on its use of causal terms, abstract language, jargon and positive emotion terms, and also assessed how easy it was to read.
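The paper defines these measures with established text-analysis lexicons and readability formulas; purely to illustrate the idea, here is a minimal, hypothetical Python sketch of how such an index might be scored. The word lists, weights and readability proxy below are made-up placeholders, not the Stanford team's actual measures.

```python
import re

# Placeholder lexicons -- illustrative only, not the study's actual word lists.
CAUSAL_TERMS = {"because", "therefore", "thus", "hence", "consequently"}
ABSTRACT_TERMS = {"framework", "concept", "phenomenon", "mechanism", "construct"}
JARGON_TERMS = {"paradigm", "modality", "upregulation", "heterogeneity", "stochasticity"}
POSITIVE_EMOTION = {"remarkable", "exciting", "novel", "striking", "important"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rate(words, lexicon):
    """Fraction of words that fall in a given lexicon."""
    return sum(w in lexicon for w in words) / max(len(words), 1)

def readability_proxy(text):
    """Crude stand-in for a readability score: longer words and sentences read harder."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    avg_sent_len = len(words) / max(len(sentences), 1)
    return avg_word_len + 0.1 * avg_sent_len

def obfuscation_index(text):
    """Toy composite score; the weights here are arbitrary placeholders."""
    words = tokenize(text)
    return (
        1.0 * rate(words, JARGON_TERMS)
        + 0.5 * rate(words, ABSTRACT_TERMS)
        + 0.5 * rate(words, CAUSAL_TERMS)
        + 0.25 * rate(words, POSITIVE_EMOTION)
        + 0.05 * readability_proxy(text)
    )

if __name__ == "__main__":
    plain = "We measured the effect because the data suggested a simple relationship."
    murky = ("The paradigm of stochasticity and heterogeneity demonstrates a remarkable "
             "novel modality of upregulation across the construct framework.")
    print(f"plain text score: {obfuscation_index(plain):.3f}")
    print(f"murky text score: {obfuscation_index(murky):.3f}")
```

In the real study, each feature's direction and weight come from comparing the retracted and unretracted corpora, rather than being fixed by hand as they are in this toy version.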
The findings, published in the Journal of Language and Social Psychology, show that the fraudulent papers were written with significantly higher levels of linguistic obfuscation, were less readable, and contained higher rates of jargon than the unretracted, non-fraudulent studies.
"Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers," Markowitz said, noting that this amounts on average to approximately 1.5 percent more jargon. "This is a non-trivial amount."
While it's conceivable that a computerised system based on the obfuscation index could one day automatically flag suspect studies, the researchers behind this one-off study say such a development could ultimately do more harm than good to the world of scientific discovery.
"Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful," said one of the researchers, Jeff Hancock. "But much more research is needed before considering this kind of approach. Obviously, there is a very high error rate that would need to be improved, but also science is based on trust, and introducing a 'fraud detection' tool into the publication process might undermine that trust."