
A Tougher Turing Test Shows Chatbots Are Still Pretty Stupid

A simple way to defeat our robotic overlords.

15 JUL 2016

To find out just how advanced our current AI systems are, researchers have developed a tougher Turing Test - called the Winograd Schema Challenge - which measures how well robotic intelligence matches human intelligence.

In the end, the team found that - even though AI is definitely improving every day - our robotic pals are still seriously lacking some common sense, suggesting that it will be some time before AI is fully ready to meld with society.


Before we go any further into the new competition's results, it's important to define what a 'Turing Test' actually is. Proposed by Alan Turing in 1950, the Turing Test is a way for researchers to challenge computer-based intelligence to see if it can become indistinguishable from human intelligence, which is one of the long-standing goals of AI research.

These tests are mostly language-based because human language is - when you truly think about it - super weird. In short, Turing believed that AI should aim to have robots that a human can talk to without knowing they are robots. As anyone who has ever screamed at Siri knows, this is a task much easier said than done.

So, to test current AI systems, the Winograd Schema Challenge was created by Hector Levesque from the University of Toronto. Basically, the challenge pits artificial intelligence against sentences that are ambiguous but still simple for humans to understand.

The best way to understand the test is to see a few samples in action. Take this question: "The trophy would not fit in the brown suitcase because it was too big. What was too big?"

The trophy, obviously, because if the suitcase was too big the sentence wouldn't make sense. But bots still struggle with this kind of language.

Here's another: "The city councilmen refused the demonstrators a permit because they feared violence." Who feared violence?


Most of us would recognise that the city councilmen feared violence, rather than the demonstrators, but again that's tricky for a computer to understand.

The contest's two winners, each of which came up with the correct interpretation only 48 percent of the time, were programmed by Quan Liu from the University of Science and Technology of China and Nicos Issak from the Open University of Cyprus. Unfortunately, a success rate of 90 percent is required to claim the $25,000 prize, so they couldn't cash in.
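For readers who like to see the mechanics, here is a minimal Python sketch of how questions like these could be stored and scored. It is an illustration only, not the contest's actual software: the names `WinogradQuestion`, `accuracy`, `wins_prize` and `nearest_mention` are hypothetical, and the wording "Who feared violence?" for the second question is assumed. Only the two sample sentences and the 90 percent threshold come from the contest details above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradQuestion:
    """One ambiguous sentence, a question about it, and the possible referents."""
    sentence: str
    question: str
    candidates: Tuple[str, ...]  # listed in the order they appear in the sentence
    answer: str                  # the reading most humans choose


# The two sample sentences quoted in the article.
QUESTIONS = [
    WinogradQuestion(
        sentence="The trophy would not fit in the brown suitcase "
                 "because it was too big.",
        question="What was too big?",
        candidates=("the trophy", "the suitcase"),
        answer="the trophy",
    ),
    WinogradQuestion(
        sentence="The city councilmen refused the demonstrators a permit "
                 "because they feared violence.",
        question="Who feared violence?",  # assumed wording
        candidates=("the city councilmen", "the demonstrators"),
        answer="the city councilmen",
    ),
]

# Success rate required to claim the prize, as reported in the article.
PRIZE_THRESHOLD = 0.90


def accuracy(resolver: Callable[[WinogradQuestion], str],
             questions: Sequence[WinogradQuestion]) -> float:
    """Fraction of questions for which the resolver picks the expected answer."""
    correct = sum(1 for q in questions if resolver(q) == q.answer)
    return correct / len(questions)


def wins_prize(score: float) -> bool:
    """True if a success rate is high enough to claim the prize."""
    return score >= PRIZE_THRESHOLD


def nearest_mention(q: WinogradQuestion) -> str:
    """Naive baseline: pick the candidate mentioned closest to the pronoun."""
    return q.candidates[-1]


if __name__ == "__main__":
    score = accuracy(nearest_mention, QUESTIONS)
    print(f"nearest-mention baseline: {score:.0%}, prize: {wins_prize(score)}")
    print(f"a 48 percent entrant wins the prize: {wins_prize(0.48)}")
```

The nearest-mention shortcut gets both sample sentences wrong, which hints at why surface patterns alone don't get a bot very far on this test. Two questions prove nothing statistically, of course. A real evaluation would need a much larger question set.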

As Will Knight reports at the MIT Technology Review, this kind of understanding is difficult to derive from statistical analysis (which is what computers are good at), and it would take an impossibly long time to code by hand.

"It's unsurprising that machines were barely better than chance," said one of the contest's advisors, research psychologist Gary Marcus from New York University.

The Turing Test only asks bots to be clever enough for judges to be unsure if they're talking to a human or not – the Winograd Schema Challenge is on a whole new level that closely examines how well robots actually understand what people are saying to them.

The entrant submitted by Quan Liu, built with assistance from researchers at York University in Toronto and the National Research Council of Canada, used a technique known as deep learning, where software that loosely mimics the neuron activity going on in our own brains is trained on huge amounts of data to spot patterns.

More powerful computers and more complex mathematical equations mean deep learning processes are improving quickly, but the elusive element of human common sense remains difficult to copy.

Liu and his team used thousands of sample texts to teach their bot to distinguish between different types of events, such as "playing basketball" or "getting injured".

While Apple, Google, and Facebook have been promoting digital assistants and bots of their own lately, none of them decided to enter the contest, and that's probably because their technology isn't ready.

"It could've been that those guys waltzed into this room and got a hundred percent and said 'hah!'" added Marcus. "But that would’ve astounded me."

If we really want our AI assistants to help us with everyday tasks, then this kind of understanding is eventually going to be vital – it just might take us a long time to get there. The hope is that some breakthrough will rapidly raise the level of machine intelligence, but for now that's wishful thinking.
