Top 13 challenges AI is facing in 2017

AI and ML feed on data, and companies that center their business around the technology are growing a penchant for collecting user data, with or without the latter’s consent, in order to make their services more targeted and efficient. Implementations of AI/ML already make it possible to impersonate people by imitating their handwriting, voice and conversation style, an unprecedented power that can come in handy in a number of dark scenarios. However, despite large amounts of previously collected data, early AI pilots have struggled to produce the dramatic results that technology enthusiasts predicted. For example, early efforts of companies developing chatbots for Facebook’s Messenger platform saw 70% failure rates in handling user requests.

One of the main challenges of AI goes beyond data: algorithmic bias. For example, a name-ranking algorithm ended up favoring white-sounding names, and advertising algorithms preferred to show high-paying job ads to male visitors.

Another challenge that caused much controversy in the past year is the “filter bubble” phenomenon seen on Facebook and other social media: content tailored to the biases and preferences of each user, effectively shutting them out from other viewpoints and realities.

Additionally, as we give more control and decision-making power to AI algorithms, moral and philosophical considerations become as important as technological ones, for example when a self-driving car has to choose between the life of a passenger and that of a pedestrian.

To sum up, the following are the challenges that AI still faces, despite creating and processing increasing amounts of data and consuming unprecedented amounts of other resources (people working on algorithms, CPUs, storage, better algorithms, etc.):

  1. Unsupervised Learning: Deep neural networks have afforded huge leaps in performance across a variety of image, sound and text problems. Most notably in 2015, the application of RNNs to text problems (NLP, language translation, etc.) exploded. A major bottleneck for these supervised approaches is the acquisition of labeled data; humans, by contrast, are known to learn about objects and navigation with relatively little labeled “training” data. How is this performed? How can it be implemented efficiently in machines?
  2. Select an Induction vs. Deduction vs. Abduction Based Approach: Induction is almost always the default choice when it comes to building an AI model for data analysis. However, it (like deduction, abduction and transduction) has limitations that need serious consideration.
  3. Model Building: TensorFlow has opened the door for conversations about building scalable ML platforms. There are plenty of companies working on data science in the cloud (H2O, Dato, MetaMind, …), but the question remains: what is the best way to build ML pipelines? This includes ETL, data storage and optimisation algorithms.
  4. Smart Search: How can deep learning create better vector spaces and algorithms than TF-IDF? What are some better alternative candidates? (A minimal TF-IDF baseline is sketched just after this list.)
  5. Optimise Reinforcement Learning: Since this approach avoids the problem of acquiring labelled data, the system has to gather data, learn from it and improve on its own. While AlphaGo used RL to win against the Go champion, RL isn’t without its own issues: it has been critiqued both on a lightweight, conceptual level and on more technical grounds.

  6. Build Domain Expertise: How can we build and sustain domain knowledge in industries and for problems that involve reasoning over a complex body of knowledge (legal, financial, etc.), and then formulate a process by which machines can simulate an expert in the field?
  7. Grow Domain Knowledge: How can AI tackle problems that involve extending a complex body of knowledge by suggesting new insights to the domain itself, for example new drugs to cure diseases?
  8. Complex Task Analyser and Planner: How can AI tackle complex tasks requiring data analysis, planning and execution? Many logistics and scheduling tasks can already be handled by current (non-AI) algorithms. A good example is the use of AI techniques in IoT for sparse datasets: AI helps here because the datasets are large and complex, and machines can easily detect patterns where human beings cannot.
  9. Better Communication: While the proliferation of smart chatbots and AI-powered communication tools has been a trend for several years, these tools are still far from smart and may at times fail to recognise even simple human language.
  10. Better Perception and Understanding: While Alibaba and Face++ build facial recognition software, visual perception and labelling remain generally problematic. There are a few telling examples, like a Russian face recognition app good enough to be considered a potential tool for oppressive regimes seeking to identify and crack down on dissidents, and another algorithm that proved effective at seeing through masked and blurred images.
  11. Anticipate Second-Order (and Higher) Consequences: AI and deep learning have improved computer vision to the point that autonomous vehicles (cars and trucks) are viable (Otto, Waymo). But what will their impact be on the economy and society? What is worrying is that as AI and related technologies advance, we may understand less and less about how AI analyses data and reaches decisions. Starting in 2012, Google used LSTMs to power the speech recognition system in Android, and in December 2016, Microsoft reported that their system reached a word error rate of 5.9%, a figure roughly equal to human performance for the first time in history. The goalposts continue to move rapidly; for example, loom.ai is building an avatar that can capture your personality. Preempting what is to come, the EU is considering requiring, starting in the summer of 2018, that companies be able to give users an explanation for decisions their automated systems reach.
  12. Evolution of Expert Systems: Expert systems have been around for a long time. Much of the vision of expert systems could be implemented in AI/deep learning algorithms in the near future. The architecture of IBM Watson is an indicative example.
  13. Better Sentiment Analysis: Still catching up with lexicon-based models, sentiment analysis remains a nascent and largely uncharted space for most AI applications. There are some small steps in this regard, though, including OpenAI’s use of an mLSTM to conduct sentiment analysis of text. The main issue is that many conceptual and contextual rules (rooted and steeped in the particulars of an individual’s culture, society, upbringing, etc.) govern sentiment, and there are even more cues (possibly unlimited) that can convey these concepts.
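
On the smart-search question in item 4, here is a minimal TF-IDF retrieval baseline for reference (a sketch using scikit-learn and a made-up toy corpus); the open question is what learned vector spaces can do that this cannot:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and query, purely illustrative.
docs = [
    "deep learning for image recognition",
    "tf-idf ranks documents by term rarity",
    "reinforcement learning for game playing",
]
query = ["learning to rank documents"]

vectoriser = TfidfVectorizer()
doc_vectors = vectoriser.fit_transform(docs)     # sparse TF-IDF matrix
query_vector = vectoriser.transform(query)

# Rank documents by cosine similarity to the query.
scores = cosine_similarity(query_vector, doc_vectors)[0]
for doc, score in sorted(zip(docs, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```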

Thoughts/comments?

Reinforcement Learning vs. Evolutionary Strategy: combine, aggregate, multiply

A bird’s-eye view of the main ML algorithms

In statistics, we have descriptive and inferential statistics. ML deals with the same problems, and claims any problem where the solution isn’t programmed directly but is instead learned by the program. ML generally works by numerically minimising something: a cost function, or error.
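
To make “numerically minimising a cost function” concrete, here is a minimal sketch (plain NumPy, synthetic data) that fits a straight line by gradient descent on the mean squared error:

```python
import numpy as np

# Synthetic data: y is roughly 3*x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3 * x + 2 + rng.normal(0, 0.1, 100)

w, b = 0.0, 0.0                           # model parameters
lr = 0.1                                  # learning rate

for _ in range(2000):
    y_hat = w * x + b                     # model prediction
    error = y_hat - y
    cost = (error ** 2).mean()            # mean squared error
    grad_w = 2 * (error * x).mean()       # d(cost)/dw
    grad_b = 2 * error.mean()             # d(cost)/db
    w -= lr * grad_w                      # step against the gradient
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, cost={cost:.4f}")
```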

Supervised learning – You have labeled data: a sample of ground truth with features and labels. You estimate a model that predicts the labels using the features. Alternative terminology: predictor variables and target variables. You predict the values of the target using the predictors.

  • Regression. The target variable is numeric. Example: you want to predict crop yield based on remote sensing data. Recurrent neural networks count as “regression” in this sense, since they usually output numbers (a sequence or a vector) rather than a class (e.g. sentence generation, curve plotting). Algorithms: linear regression, polynomial regression, generalised linear models.
  • Classification. The target variable is categorical. Example: you want to detect the crop type that was planted using remote sensing data. Or Silicon Valley’s “Not Hot Dog” application. Algorithms: Naïve Bayes, logistic regression, discriminant analysis, decision trees, random forests, support vector machines, neural networks (NN) of many variations: feed-forward NNs, convolutional NNs, recurrent NNs.
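
Putting the supervised recipe into code, here is a minimal sketch (scikit-learn, with synthetic data standing in for the remote-sensing examples above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "features -> crop type": 500 samples, 4 features, 2 classes.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)               # estimate the model from labeled data
print("test accuracy:", model.score(X_test, y_test))
```

Swapping LogisticRegression for any other classifier in the list above leaves the rest of the recipe unchanged.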

Unsupervised learning – You have a sample with unlabeled information. No single variable is the specific target of prediction. You want to learn interesting features of the data:

  • Clustering. Which of these things are similar? Example: group consumers into relevant psychographics. Algorithms: k-means, hierarchical clustering. (A small sketch follows this list.)
  • Anomaly detection. Which of these things are different? Example: credit card fraud detection. Algorithms: k-nearest-neighbor.
  • Dimensionality reduction. How can you summarise the data in a high-dimensional data set using a lower-dimensional dataset which captures as much of the useful information as possible (possibly for further modelling with supervised or unsupervised algorithms)? Example: image compression. Algorithms: principal component analysis (PCA), neural network auto-encoders.
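
A minimal sketch tying two of these bullets together (synthetic data; PCA for dimensionality reduction, then k-means for clustering):

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic unlabeled data: 300 points in 10 dimensions, 3 latent groups.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Dimensionality reduction: keep the 2 directions of largest variance.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the points without using any labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:20])
```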

Reinforcement Learning (Policy Gradients, DQN, A3C, …) – You are presented with a game/environment that responds sequentially or continuously to your inputs, and you learn to maximise an objective through trial and error.
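
As a minimal illustration of learning through trial and error, here is a sketch of tabular Q-learning on a hypothetical five-state corridor where only the right end pays a reward:

```python
import numpy as np

# Hypothetical environment: states 0..4 on a line; actions 0 = left, 1 = right.
# Reaching state 4 pays reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))         # table of action values
alpha, gamma, eps = 0.1, 0.9, 0.3           # step size, discount, exploration

rng = np.random.default_rng(0)
for _ in range(500):                        # episodes
    s = 0
    while s != 4:
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        # Move Q(s, a) toward the reward plus the discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # for every non-terminal state, going right scores higher
```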

Evolutionary Strategy – This approach consists of maintaining a distribution over network weight values, and having a large number of agents act in parallel using parameters sampled from this distribution. Each agent receives a score reflecting how well its sampled parameters performed; with these scores, the parameter distribution can be moved toward that of the more successful agents, and away from that of the unsuccessful ones. By repeating this approach millions of times, with hundreds of agents, the weight distribution moves to a space that provides the agents with a good policy for solving the task at hand.
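
The loop just described fits in a few lines. Here is a minimal sketch (plain NumPy, with a toy quadratic fitness standing in for an agent’s score on a real task):

```python
import numpy as np

# Toy stand-in for "how well an agent with these weights performs":
# a hypothetical fitness that peaks at some target weight vector.
target = np.array([0.5, -0.3, 0.8])
def fitness(weights):
    return -np.sum((weights - target) ** 2)

theta = np.zeros(3)                         # mean of the weight distribution
sigma, alpha, n_agents = 0.1, 0.05, 100     # noise scale, step size, population

rng = np.random.default_rng(0)
for _ in range(300):
    noise = rng.normal(size=(n_agents, 3))  # one perturbation per agent
    scores = np.array([fitness(theta + sigma * eps) for eps in noise])
    scores -= scores.mean()                 # centre as a simple baseline
    # Move the distribution toward the perturbations of the successful agents.
    theta += alpha / (n_agents * sigma) * (noise.T @ scores)

print(theta.round(2))                       # close to the target weights
```

Note that only scalar fitness values flow back into the update, a point that matters for the scalability comparison below.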

All the complex tasks in ML, from self-driving cars to machine translation, are solved by combining these building blocks into complex stacks.

Pros and cons of RL and ES

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behaviour.

RL is known to be unstable or even to diverge when a nonlinear function approximator such as a NN is used to represent the action-value (also known as Q) function. This instability has several causes: the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy and therefore change the data distribution, and the correlations between the action-values and the target values.
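
The standard mitigations for these instabilities, randomised experience replay and a periodically frozen target network, can be sketched as follows. This is a mechanics-only sketch: the linear “network” and the randomly generated transitions are stand-ins, so nothing meaningful is learned, but the two decorrelation tricks are visible.

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-in for a Q-network: Q(s, a) = W[a] @ s (4 features, 2 actions).
W = rng.normal(0, 0.1, (2, 4))              # online network, updated every step
W_target = W.copy()                         # target network, updated rarely

buffer = deque(maxlen=1000)                 # experience replay memory
for _ in range(500):                        # fill with random stand-in transitions
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    buffer.append((s, int(rng.integers(2)), float(rng.random()), s_next))

alpha, gamma = 0.01, 0.9
for step in range(1000):
    # Sampling uniformly from the buffer breaks correlations in the sequence.
    s, a, r, s_next = random.choice(buffer)
    # Targets come from the frozen copy, decorrelating values and targets.
    td_target = r + gamma * (W_target @ s_next).max()
    td_error = td_target - W[a] @ s
    W[a] += alpha * td_error * s            # semi-gradient TD update
    if step % 100 == 0:
        W_target = W.copy()                 # periodically sync the target network

print(np.round(W, 2))
```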

RL’s other challenge is generalisation. In typical deep RL methods, this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable.

Whereas RL methods such as A3C need to communicate gradients back and forth between workers and a parameter server, ES only requires fitness scores and high-level parameter distribution information to be communicated. It is this simplicity that allows the technique to scale up in ways current RL methods cannot. In situations with richer feedback signals, however, things don’t go so well for ES.

Contextualising and combining RL and ES

Appealing to nature for inspiration in AI can sometimes be seen as a problematic approach. Nature, after all, is working under constraints that computer scientists simply don’t have. If we look at intelligent behaviour in mammals, we find that it comes from a complex interplay of two ultimately intertwined processes: inter-life learning and intra-life learning. Roughly speaking, these two processes in nature can be compared to the two approaches in neural network optimisation. ES, for which no gradient information is used to update the organism, is related to inter-life learning. Likewise, the gradient-based methods (RL), for which specific experiences change the agent in specific ways, can be compared to intra-life learning.

The techniques employed in RL are in many ways inspired directly by the psychological literature on operant conditioning that came out of animal psychology. (In fact, Richard Sutton, one of the two founders of RL, actually received his Bachelor’s degree in psychology.) In operant conditioning, animals learn to associate rewarding or punishing outcomes with specific behaviour patterns. Animal trainers and researchers can manipulate this reward association in order to get animals to demonstrate their intelligence or behave in certain ways.

The central role of prediction in intra-life learning changes the dynamics quite a bit. What was before a somewhat sparse signal (occasional reward) becomes an extremely dense signal. At each moment, mammalian brains are predicting the results of the complex flux of sensory stimuli and actions in which the animal is immersed. The outcome of the animal’s behaviour then provides a dense signal to guide the change in predictions and behaviour going forward. All of these signals are put to use in the brain to improve predictions (and consequently the quality of actions). If we apply this way of thinking to learning in artificial agents, we find that RL isn’t somehow fundamentally flawed; rather, the signal being used isn’t nearly as rich as it could (or should) be. In cases where the signal can’t be made richer (perhaps because it is inherently sparse, or because the task is a matter of low-level reactivity), it is likely that learning through a highly parallelisable method such as ES is the better choice.

Combining many

It is clear that for many reactive policies, or situations with extremely sparse rewards, ES is a strong candidate, especially if you have access to the computational resources that allow for massively parallel training.  On the other hand, gradient-based methods using RL or supervision are going to be useful when a rich feedback signal is available, and we need to learn quickly with less data.

An extreme example combines more than just ES and RL: Microsoft’s Maluuba is an illustrative case, having used many algorithms together to beat the game Ms. Pac-Man. When the agent (Ms. Pac-Man) starts to learn, it moves randomly; it knows nothing about the game board. As it discovers new rewards (the little pellets and fruit Ms. Pac-Man eats), it begins placing little algorithms in those spots, which continuously learn how best to avoid ghosts and get more points based on Ms. Pac-Man’s interactions, according to the Maluuba research paper.

As the 163 potential algorithms are mapped, each continually sends the movement it thinks would generate the highest reward to the agent, which averages the inputs and moves Ms. Pac-Man. Each time the agent dies, all the algorithms process what generated rewards. These helper algorithms were carefully crafted by humans to understand how to learn, however. The averaging step is pictured in the sketch below.
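
Here is a minimal sketch of that aggregation idea (hypothetical numbers, and a deliberate simplification of the actual Maluuba architecture): each helper scores every possible move, and the agent averages the scores before acting.

```python
import numpy as np

actions = ["up", "down", "left", "right"]

# Hypothetical per-helper value estimates: one row per helper algorithm
# (e.g. a pellet or a ghost), one column per possible move.
helper_values = np.array([
    [0.1, 0.0, 0.8, 0.2],    # a pellet to the left pulls the agent left
    [0.3, 0.9, 0.1, 0.2],    # another pellet below pulls it down
    [-0.9, 0.2, 0.1, 0.3],   # a ghost above pushes it away from "up"
])

# The agent averages the helpers' preferences and takes the best move.
aggregate = helper_values.mean(axis=0)
print(actions[int(aggregate.argmax())])   # -> "down"
```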

Instead of having one algorithm learn one complex problem, the AI distributes learning over many smaller algorithms, each tackling simpler problems, Maluuba says in a video. This research could be applied to other highly complex problems, like financial trading, according to the company.

But it’s worth noting that since more than 100 algorithms are being used to tell Ms. Pac-Man where to move and win the game, this technique is likely to be extremely computationally intensive.