The limits of deep learning and the way ahead

Artificial intelligence has reached peak hype. News outlets report that companies have replaced workers with IBM Watson and that algorithms are beating doctors at diagnosis. New AI startups pop up every day – especially in China – and claim to solve all your personal and business problems with machine learning.

Ordinary objects like juicers and wifi routers suddenly advertise themselves as “powered by AI”. Not only can smart standing desks remember your height settings, they can also order you lunch.

Much of the AI hubbub is generated by reporters who have little or only superficial knowledge of the subject matter, and by startups hoping to be acquihired for engineering talent despite not solving any real business problems. No wonder there are so many misconceptions about what AI can and cannot do.

Deep learning will shape the future

Neural networks were invented in the 1960s, but recent advances in big data and computational power have made them genuinely useful. The results are undeniably incredible. Computers can now recognize objects in images and video and transcribe speech to text better than humans can. Google replaced Google Translate’s architecture with neural networks, and machine translation is now closing in on human performance.

The practical applications are mind-blowing. Computers can predict crop yields better than the USDA and diagnose cancer more accurately than expert physicians.

DARPA, the creator of the Internet and many other modern technologies, sees three waves of AI:

  1. Handcrafted knowledge, or expert systems like IBM’s Deep Blue or IBM Watson;
  2. Statistical learning, which includes machine learning and deep learning;
  3. Contextual adaptation, which involves constructing reliable, explanatory models for real-world phenomena using sparse data, like humans do.

As part of the current second wave of AI, deep learning algorithms work well because of what DARPA calls the “manifold hypothesis”: different classes of high-dimensional natural data tend to lie on distinct low-dimensional manifolds, so they form separately shaped clumps when visualised in lower dimensions.

[Image: DARPA illustration of the manifold hypothesis – natural data forming differently shaped clumps]

By mathematically manipulating and separating these data clumps, deep neural networks can distinguish between different types of data. Yet while neural networks can achieve nuanced classification and prediction capabilities, they are essentially what has been called “spreadsheets on steroids.”

[Image: DARPA illustration of a neural network separating data manifolds]
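
To make that intuition concrete, here is a minimal sketch of the separation idea, using scikit-learn and a toy “two moons” dataset standing in for clumped natural data; a small neural network learns to split two intertwined classes that no straight line could separate.

```python
# Toy illustration of the "manifold" idea: two intertwined clumps of data
# that are not linearly separable get untangled by a small neural network.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two crescent-shaped clumps of 2-D points, one per class.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small multi-layer perceptron bends the space until the clumps come apart.
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))  # typically well above 0.95
```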

Deep learning algorithms have deep learning problems

At the recent AI By The Bay conference, Francois Chollet, creator of the widely used deep learning library Keras, argued that deep learning is simply more powerful pattern recognition than previous statistical and machine learning methods, and that the most important problems for AI today are abstraction and reasoning. Current supervised perception and reinforcement learning algorithms require lots of training data, are terrible at planning, and only perform straightforward pattern recognition.

By contrast, humans “learn from very few examples, can do very long-term planning, and are capable of forming abstract models of a situation and manipulate these models to achieve extreme generalisation.”

Even simple human behaviours are laborious to teach to a deep learning algorithm. Let’s examine the task of not being hit by a car as you walk down the road.

Humans need to be told only once to avoid cars. We’re equipped with the ability to generalise from just a few examples and are capable of imagining (i.e. modelling) the dire consequences of being run over. Without losing life or limb, most of us quickly learn to avoid being hit by motor vehicles.

Let’s now see how this works out if we train a computer. If you go the supervised learning route, you need big data sets of car situations with clearly labeled actions to take, such as “stop” or “move”. Then you’d need to train a neural network to learn the mapping between the situation and the appropriate action. If you go the reinforcement learning route, where you give an algorithm a goal and let it independently determine the ideal actions to take, the computer will “die” many times before learning to avoid cars in different situations.
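
Here is a minimal sketch of the supervised route, using PyTorch; the “situation” features and the labelling rule below are invented purely for illustration.

```python
# Hedged sketch of the supervised route: synthetic, clearly labelled "car
# situations" are mapped to one of two actions, stop (0) or move (1).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Each situation: [distance to nearest car (m), car speed (m/s), walk signal on?]
n = 5000
distance = torch.rand(n, 1) * 50
speed = torch.rand(n, 1) * 20
signal = torch.randint(0, 2, (n, 1)).float()
X = torch.cat([distance, speed, signal], dim=1)

# Invented labelling rule: move (1) if the car is far away or slow, else stop (0).
y = ((distance.squeeze() > 10) | (speed.squeeze() < 5)).long()

model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Learn the mapping from situation to action from the labelled examples.
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```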

While neural networks achieve statistically impressive results across large sample sizes, they are “individually unreliable” and often make mistakes humans would never make, such as classifying a toothbrush as a baseball bat.

[Image: DARPA example of a grossly misclassified image]

Your results are only as good as your data

Neural networks fed inaccurate or incomplete data will simply produce the wrong results. The outcomes can be both embarrassing and damaging. In two major PR debacles, Google Photos incorrectly classified African Americans as gorillas, while Microsoft’s Tay learned to spew racist, misogynistic hate speech after only hours of training on Twitter.

Undesirable biases may even be implicit in our input data. Google’s massive Word2Vec embeddings, trained on Google News articles, contain vectors for 3 million words and phrases. The embeddings encode associations such as “father is to doctor as mother is to nurse”, which reflect the gender bias in our language.

To undo such associations, researchers have turned to human ratings on Mechanical Turk and performed “hard de-biasing” of the embeddings. Such tactics are essential, since word embeddings not only reflect stereotypes but can also amplify them. If the term “doctor” is more associated with men than with women, an algorithm might prioritise male job applicants over female job applicants for open physician positions.
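
The de-biasing itself is mostly linear algebra. Below is a rough numpy sketch of the “neutralise” step used in hard de-biasing; the three-dimensional toy vectors are invented, whereas real systems operate on the 300-dimensional Word2Vec embeddings.

```python
# Rough sketch of the "neutralise" step of hard de-biasing: remove a word
# vector's component along an estimated gender direction. Toy numbers only.
import numpy as np

he = np.array([1.0, 0.2, 0.1])       # invented toy embeddings
she = np.array([-1.0, 0.2, 0.1])
doctor = np.array([0.4, 0.8, 0.3])   # leans towards "he" along the gender axis

# Estimate the gender direction from a definitional pair such as he/she.
gender_direction = he - she
gender_direction /= np.linalg.norm(gender_direction)

# Neutralise: subtract the projection of "doctor" onto the gender direction.
doctor_debiased = doctor - np.dot(doctor, gender_direction) * gender_direction

print(np.dot(doctor, gender_direction))           # non-zero: gendered
print(np.dot(doctor_debiased, gender_direction))  # ~0: gender-neutral
```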

Neural networks can be tricked or exploited

Ian Goodfellow, inventor of GANs, showed that neural networks can be deliberately tricked with adversarial examples. By mathematically manipulating an image in a way that is undetectable to the human eye, sophisticated attackers can trick neural networks into grossly misclassifying objects.

[Image: Ian Goodfellow’s adversarial attack examples]

The dangers such adversarial attacks pose to AI systems are alarming, especially since adversarial images and original images seem identical to us. Self-driving cars could be hijacked with seemingly innocuous signage and secure systems could be compromised by data that initially appears normal.
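
One well-known recipe for building such adversarial images is Goodfellow’s fast gradient sign method: nudge every pixel a tiny amount in the direction that increases the classifier’s loss. Here is a minimal PyTorch sketch; the classifier and the input image are untrained stand-ins used only to show the mechanics (with a real trained model, the perturbed image is typically misclassified while looking unchanged to a human).

```python
# Minimal sketch of the fast gradient sign method (FGSM).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # stand-in "photo"
true_label = torch.tensor([3])

# Gradient of the loss with respect to the input pixels.
loss = loss_fn(model(image), true_label)
loss.backward()

# Nudge every pixel slightly in the direction that increases the loss.
epsilon = 0.01  # small enough to be imperceptible to a human
adversarial_image = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)

print("original prediction:   ", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adversarial_image).argmax(dim=1).item())
```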

Potential solutions

How can we overcome the limitations of deep learning and proceed towards general artificial intelligence? Chollet’s initial plan is to use “super-human pattern recognition like deep learning to augment explicit search and formal systems”, starting with the field of mathematical proofs. Automated Theorem Provers (ATPs) typically use brute-force search and quickly hit combinatorial explosion in practical use. In the DeepMath project, Chollet and his colleagues used deep learning to assist the proof-search process, simulating a mathematician’s intuitions about which lemmas might be relevant.
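
A heavily simplified sketch of what “deep learning assisting proof search” can look like: a learned scorer ranks candidate lemmas by predicted relevance to the current goal, and the prover tries the most promising ones first instead of exploring blindly. The scoring network and features below are untrained stand-ins; DeepMath trains such models on real proof corpora.

```python
# Neural premise selection, heavily simplified: rank candidate lemmas with a
# learned relevance score and expand the best-scoring ones first.
import heapq
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in scorer: maps concatenated (goal, lemma) features to a relevance score.
scorer = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

def relevance(goal_feats: torch.Tensor, lemma_feats: torch.Tensor) -> float:
    """Predicted usefulness of a lemma for proving the current goal."""
    return scorer(torch.cat([goal_feats, lemma_feats])).item()

goal = torch.rand(4)                                        # features of the conjecture
lemmas = {f"lemma_{i}": torch.rand(4) for i in range(100)}  # candidate premises

# Best-first order: the highest predicted relevance comes out of the queue first.
queue = [(-relevance(goal, feats), name) for name, feats in lemmas.items()]
heapq.heapify(queue)

for _ in range(5):  # the prover would attempt these premises first
    neg_score, name = heapq.heappop(queue)
    print(f"try {name} (predicted relevance {-neg_score:.3f})")
```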

Another approach is to develop more explainable models. In handwriting recognition, neural nets currently need to be trained on many thousands of examples to perform decent classification. Instead of looking only at pixels, generative models can be taught the strokes behind any given character and use this physical-construction information to disambiguate between similar digits, such as a 9 and a 4.

Yann LeCun, head of AI research at Facebook, proposes “energy-based models” as a way of overcoming the limits of deep learning. Typically, a neural network is trained to produce a single output, such as an image label or a sentence translation. LeCun’s energy-based models instead consider an entire set of possible outputs, such as the many ways a sentence could be translated, along with a score for each configuration.
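
As a rough sketch of that idea, an energy-based model can be thought of as a network that assigns a compatibility score (an “energy”, lower is better) to every candidate output for a given input, rather than committing to one answer. The energy network and candidates below are untrained stand-ins, purely for illustration.

```python
# Toy sketch of the energy-based view: score every (input, candidate output)
# pair instead of emitting a single prediction.
import torch
import torch.nn as nn

torch.manual_seed(0)
energy_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

def energy(x: torch.Tensor, y: torch.Tensor) -> float:
    """Lower energy = the candidate output is more compatible with the input."""
    return energy_net(torch.cat([x, y])).item()

x = torch.rand(4)                                                   # e.g. an encoded source sentence
candidates = {f"translation_{i}": torch.rand(4) for i in range(5)}  # candidate outputs

# Report a score for each configuration rather than a single answer.
scores = {name: energy(x, y) for name, y in candidates.items()}
for name, e in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: energy {e:.3f}")
```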

Geoffrey Hinton, often called the “father of deep learning”, wants to replace the neurons in neural networks with “capsules”, which he believes more accurately reflect the cortical structure of the human brain. He argues that evolution must have found an efficient way to adapt features early in a sensory pathway so that they become more helpful to features several stages later in the pathway, and he thinks capsule-based architectures will be more resistant to adversarial attacks.

Perhaps all of these approaches to overcoming the limits of deep learning have value. Perhaps none of them do. Only time and continued investment in AI will tell. But one thing seems clear: simply scaling up today’s deep learning techniques is unlikely to produce general intelligence.