Artificial intelligence (AI) is proving very adept at certain tasks – like inventing human faces that don't actually exist, or winning games of poker – but these systems still struggle with something humans do naturally: imagining.
Once human beings know what a cat is, we can easily imagine a cat of a different color, or a cat in a different pose, or a cat in different surroundings. For AI networks, that's much harder, even though they can recognize a cat when they see it (with enough training).
To try to unlock AI's capacity for imagination, researchers have come up with a new method that lets artificial intelligence systems work out what an object should look like, even if they've never seen one exactly like it before.
"We were inspired by human visual generalization capabilities to try to simulate human imagination in machines," says computer scientist Yunhao Ge from the University of Southern California (USC).
"Humans can separate their learned knowledge by attributes – for instance, shape, pose, position, color – and then recombine them to imagine a new object. Our paper attempts to simulate this process using neural networks."
The key is extrapolation – being able to use a big bank of training data (like pictures of a car) to then go beyond what's seen into what's unseen. This is difficult for AI because of the way it's typically trained to spot specific patterns rather than broader attributes.
What the team has come up with here is called controllable disentangled representation learning, and it uses an approach similar to those used to create deepfakes: disentangling different parts of a sample (for example, separating face movement from face identity in a deepfake video).
It means that if an AI sees a red car and a blue bike, it will then be able to 'imagine' a red bike for itself – even if it has never seen one before. The researchers have put this together in a framework they're calling Group Supervised Learning.
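The red bike example can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' code: the latent vectors are hand-written rather than learned by a neural network, and the `SLICES` layout and the `swap_attribute` helper are hypothetical names introduced here. It assumes each attribute occupies its own fixed slice of the representation, which is the property disentanglement aims for.

```python
# Toy illustration only: a real system would learn these codes with an
# encoder/decoder network; here they are hand-written (hypothetical) values.

# Assumed layout: each attribute owns a fixed slice of the latent vector.
SLICES = {"shape": slice(0, 2), "color": slice(2, 4)}


def swap_attribute(target, donor, attribute):
    """Return a copy of `target` whose `attribute` slice comes from `donor`."""
    result = list(target)
    result[SLICES[attribute]] = donor[SLICES[attribute]]
    return result


red_car = [1.0, 0.0, 0.9, 0.1]    # shape = car,  color = red
blue_bike = [0.0, 1.0, 0.1, 0.9]  # shape = bike, color = blue

# Keep the bike's shape, borrow the car's color: an "imagined" red bike.
red_bike = swap_attribute(blue_bike, red_car, "color")
print(red_bike)
```

In a full system, a decoder would turn the recombined vector back into an image; that step is omitted here.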
One of the main innovations in this technique is processing samples in groups rather than individually, and building up semantic links between them along the way. The AI is then able to recognize similarities and differences in the samples it sees, using this knowledge to produce something completely new.
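To give a rough sense of what linking samples within a group might look like, here is a minimal sketch that finds which samples share an attribute value. It is not the Group Supervised Learning implementation: the sample records, the attribute names, and the `shared_attribute_pairs` function are all made up for illustration.

```python
from itertools import combinations

# Hypothetical labelled samples; attribute names and values are made up.
samples = [
    {"id": "a", "shape": "car", "color": "red"},
    {"id": "b", "shape": "bike", "color": "blue"},
    {"id": "c", "shape": "car", "color": "blue"},
]


def shared_attribute_pairs(items, attributes=("shape", "color")):
    """List (id1, id2, attribute) for each pair of samples agreeing on an attribute."""
    links = []
    for first, second in combinations(items, 2):
        for attribute in attributes:
            if first[attribute] == second[attribute]:
                links.append((first["id"], second["id"], attribute))
    return links


links = shared_attribute_pairs(samples)
print(links)
```

Here the red car and the blue car are linked by shape, and the blue bike and the blue car by color. Links of this kind are the sort of similarity information a group-based method could draw on when deciding which attributes two samples have in common.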
"This new disentanglement approach, for the first time, truly unleashes a new sense of imagination in AI systems, bringing them closer to humans' understanding of the world," says USC computer scientist Laurent Itti.
These ideas aren't completely new, but here the researchers have taken the concepts further, making the approach more flexible and compatible with additional types of data. They've also made the framework open source, so other scientists can make use of it more easily.
In the future, the system developed here could guard against AI bias by removing more sensitive attributes from the equation – helping to make neural networks that aren't racist or sexist, for example.
The same approach could also be applied in the fields of medicine and self-driving cars, the researchers say, with AI able to 'imagine' new drugs, or visualize new road scenarios that it hasn't been specifically trained for.
"Deep learning has already demonstrated unsurpassed performance and promise in many domains, but all too often this has happened through shallow mimicry, and without a deeper understanding of the separate attributes that make each object unique," says Itti.
The research was presented at the 2021 International Conference on Learning Representations.