How AI systems learn: approaches and concepts

As you know, goal of AI learning is generalisation, but one major issue is that data alone will never be enough, no matter how much of it is available. AI systems need both data and they need to learn based on data in order to generalise.

So let’s look at how AI systems learn. But before we do that, what are the few different and prevalent AI approaches?

Neural networks model a brain learning by example―given a set of right answers, a neural network learns the general patterns. Reinforcement Learning models a brain learning by experience―given some set of actions and an eventual reward or punishment, it learns which actions are ‘good’ or ‘bad,’ as relevant in context. Genetic Algorithms model evolution by natural selection―given some set of agents, let the better ones live and the worse ones die.

Usually, genetic algorithms do not allow agents to learn during their lifetimes, while neural networks allow agents to learn only during their lifetimes. Reinforcement learning allows agents to learn during their lifetimes and share knowledge with other agents.

Consider learning a Boolean function of (say) 100 variables from a million examples. There are 2100 ^ 100 examples whose classes you don’t know. How do you figure out what those classes are? In the absence of further information, there is no way to do this that beats flipping a coin. This observation was first made (in somewhat different form) by David Hume over 200 years ago, but even today many mistakes in ML stem from failing to appreciate it. Every learner must embody some knowledge/assumptions beyond the data it’s given in order to generalise beyond it.

This seems like rather depressing news. How then can we ever hope to learn anything? Luckily, the functions we want to learn in the real world are not drawn uniformly from the set of all mathematically possible functions. In fact, very general assumptions—like similar examples having similar classes, limited dependences, or limited complexity—are often enough to do quite well, and this is a large part of why ML has been so successful to date.

AI systems use induction, deduction, abduction and other methodologies to collect, analyse and learn from data, allowing generalisation to happen.

Like deduction, induction (what learners do) is a knowledge lever: it turns a small amount of input knowledge into a large amount of output knowledge. Induction (despite its limitations) is a more powerful lever than deduction, requiring much less input knowledge to produce useful results, but it still needs more than zero input knowledge to work.

Abduction is sometimes used to identify faults and revise knowledge based on empirical data. For each individual positive example that is not derivable from the current theory, abduction is applied to determine a set of assumptions that would allow it to be proven. These assumptions can then be used to make suggestions for modifying the theory. One potential repair is to learn a new rule for the assumed proposition so that it could be inferred from other known facts about the example. Another potential repair is to remove the assumed proposition from the list of antecedents of the rule in which it appears in the abductive explanation of the example – parsimonious covering theory (PCT). Abductive reasoning is useful in inductively revising existing knowledge bases to improve their accuracy. Inductive learning can be used to acquire accurate abductive theories.

One key concept in AI is classifier. Generally, AI systems can be divided into two types: classifiers (“if shiny and yellow then gold”) and controllers (“if shiny and yellow then pick up”). Controllers also include classify-ing conditions before inferring actions. Classifiers are functions that use pattern matching to determine a closest match. They can be tuned according to examples known as observations or patterns. In supervised learning, each pattern belongs to a certain predefined class. A class can be seen as a decision that has to be made. All the observations combined with their class labels are known as data set. When a new observation is made, it is classified based on previous experience.

Classifier performance depends greatly on the characteristics of the data to be classified. The most widely used classifiers use kernel methods to be trained (i.e. to learn). There is no single classifier that works best on all given problems – “no free lunch“. Determining an optimal classifier for a given problem is still more an art than science.

The following formula sums up the process of AI learning.

LEARNING = REPRESENTATION + EVALUATION + OPTIMISATION

Representation. A classifier must be represented in some formal language that the computer can handle. Conversely, choosing a representation for a learner is tantamount to choosing the set of classifiers that it can possibly learn. This set is called the hypothesis space of the learner. If a classifier is not in the hypothesis space, it cannot be learned. A related question is how to represent the input, i.e., what features to use.

Evaluation. An evaluation function is needed to distinguish good classifiers from bad ones. The evaluation function used internally by the algorithm may differ from the external one that we want the classifier to optimise, for ease of optimisation (see below) and due to the issues discussed in the next section.

Optimisation. We need a method to search among the classifiers in the language for the highest-scoring one. The choice of optimisation technique is key to the efficiency of the learner, and also helps determine the classifier produced if the evaluation function has more than one optimum. It is common for new learners to start out using off-the-shelf optimisers.

Key criteria for choosing a representation is which kinds of knowledge are easily expressed in it. For example, if we have knowledge about probabilistic dependencies, graphical models are a good fit. And if we have knowledge about what kinds of preconditions are required by each class, “IF . . . THEN . . .” rules may be the the best option. The most useful learners in this regard are those that don’t just have assumptions hard-wired into them, but allow us to state them explicitly, vary them widely, and incorporate them dynamically into the learning.

What if the knowledge and data we have are not sufficient to completely determine the correct classifier? Then we run the risk of just inventing a classifier (or parts of it) that is not grounded in reality, and is simply encoding random quirks in the data. This problem is called overfitting, and is the bugbear of ML. When a learner outputs a classifier that is 100% accurate on the training data but only 50% accurate on real data, when in fact it could have output one that is 75% accurate on both, it has overfit.

One way to understand overfitting is by decomposing generalisation error into bias and variance. Bias is a learner’s tendency to consistently learn the same wrong thing. Variance is the tendency to learn random things irrespective of the real signal. Cross-validation can help to combat overfitting, but it’s no panacea, since if we use it to make too many parameter choices it can itself start to overfit. Besides cross-validation, there are many methods to combat overfitting, the most popular one is adding a regularisation term to the evaluation function. Another option is to perform a statistical significance test like chi-square before adding new structure, to decide whether the distribution of the class really is different with and without this structure.

Sources and relevant articles:

Top 13 challenges AI is facing in 2017

AI and ML feed on data, and companies that center their business around the technology are growing a penchant for collecting user data, with or without the latter’s consent, in order to make their services more targeted and efficient. Already implementations of AI/ML are making it possible to impersonate people by imitating their handwriting, voice and conversation style, an unprecedented power that can come in handy in a number of dark scenarios. However, despite large amounts of previously collected data, early AI pilots have challenges producing dramatic results that technology enthusiasts predicted. For example, early efforts of companies developing chatbots for Facebook’s Messenger platform saw 70% failure rates in handling user requests.

One of main challenges of AI goes beyond data: false positives. For example, a name-ranking algorithm ended up favoring white-sounding names, and advertising algorithms preferred to show high-paying job ads to male visitors.

Another challenge that caused much controversy in the past year was the “filter bubble” phenomenon that was seen in Facebook and other social media that tailored content to the biases and preferences of users, effectively shutting them out from other viewpoints and realities that were out there.

Additionally, as we give more control and decision-making power to AI algorithms, not only technological, but moral/philosophical considerations become important – when a self-driving car has to choose between the life of a passenger and a pedestrian.

To sum up, following are the challenges that AI still faces, despite creating and processing increasing amounts of of data and unprecedented amounts of other resources (number of people working on algorithms, CPUs, storage, better algorithms, etc.):

Unsupervised Learning: Deep neural networks have afforded huge leaps in performance across a variety of image, sound and text problems. Most noticeably in 2015, the application of RNNs to text problems (NLP, language translation, etc.) have exploded. A major bottleneck in unsupervised learning is labeled data acquisition. It is known humans learn about objects and navigation with relatively little labeled “training” data. How is this performed? How can this be efficiently implemented in machines?
Select Induction Vs. Deduction Vs. Abduction Based Approach: Induction is almost always a default choice when it comes to building an AI model for data analysis. However, it – as well as deduction, abduction, transduction – has its limitations which need serious consideration.
Model Building: TensorFlow has opened the door for conversations about building scalable ML platforms. There are plenty of companies working on data-science-in-the-cloud (H2O, Dato, MetaMind, …) but the question remains, what is the best way to build ML pipelines? This includes ETL, data storage and optimisation algorithms.
Smart Search: How can deep learning create better vector spaces and algorithms than Tf-Idf? What are some better alternative candidates?
Optimise Reinforced Learning: As this approach avoids the problems of getting labelled data, the system needs to get data, learn from it and improve. While AlphaGo used RL to win against the Go champion, RL isn’t without its own issues: discussion on a more lightweight and conceptual level one on a more technical aspect.
Build Domain Expertise: How to build and sustain domain knowledge in industries and for problems, which involve reasoning based on a complex body of knowledge like Legal, Financial, etc. and then formulate a process where machines can simulate an expert in the field.
Grow Domain Knowledge: How can AI tackle problems, which involve extending a complex body of knowledge by suggesting new insights to the domain itself – for example new drugs to cure diseases?
Complex Task Analyser and Planner: How can AI tackle complex tasks requiring data analysis, planning and execution? Many logistics and scheduling tasks can be done by current (non-AI) algorithms. A good example is the use of AI techniques in IoT for Sparse datasets . AI techniques help this case because there are large and complex datasets where human beings cannot detect patterns but machines can do so easily.
Better Communication: While proliferation of smart chatbots and AI-powered communication tools is a trend since several years, these communication tools are still far from being smart, and may at times fail at recognising even a simple human language.
Better Perception and Understanding: While Alibaba, Face+ create facial recognition software, visual perception and labelling are still generally problematic. There are few good examples, like this Russian face recognition app that is good enough to be considered a potential tool for oppressive regimes seeking to identify and crack down on dissidents. Another algorithm proved to be effective at peeking behind masked images and blurred pictures.
Anticipate Second-Order (and higher) Consequences: AI and deep learning have improved computer vision, for example, to the point that autonomous vehicles (cars and trucks) are viable (Otto, Waymo) . But what will their impact be on economy and society? What’s scary is that with advance of AI and related technologies, we might know less on AI’s data analysis and decision making process. Starting in 2012, Google used LSTMs to power the speech recognition system in Android, and in December 2016, Microsoft reported their system reached a word error rate of 5.9% — a figure roughly equal to that of human abilities for the first time in history. The goal-post continues to be moved rapidly .. for example loom.ai is building an avatar that can capture your personality. Preempting what’s to come, starting in the summer of 2018, EU is considering to require that companies be able to give users an explanation for decisions that their automated systems reach.
Evolution of Expert Systems: Expert systems have been around for a long time. Much of the vision of expert systems could be implemented in AI/deep learning algorithms in the near future. The architecture of IBM Watson is an indicative example.
Better Sentiment Analysis: Catching up but still far from lexicon-based model for sentiment analysis, it is still pretty much a nascent and unchartered space for most AI applications. There are some small steps in this regard though, including OpenAI’s usage of mLSTM methodology to conduct sentiment analysis of text. The main issue is that there are many conceptual and contextual rules (rooted and steeped in particulars of culture, society, upbringing, etc of individuals) that govern sentiment and there are even more clues (possibly unlimited) that can convey these concepts.

Thoughts/comments?

101 and failures of Machine Learning

Nowadays, ‘artificial intelligence’ (AI) and ‘machine learning’ (ML) are cliches that people use to signal awareness about technological trends. Companies tout AI/ML as panaceas to their ills and competitive advantage over their peers. From flower recognition to an algorithm that won against Go champion to big financial institutions, including ETFs of the biggest hedge fund in the world are already or moving to the AI/ML era.

However, as with any new technological breakthroughs, discoveries and inventions, the path is laden with misconceptions, failures, political agendas, etc. Let’s start by an overview of basic methodologies of ML, the foundation of AI.

101 and limitations of AI/ML

The fundamental goal of ML is to generalise beyond specific examples/occurrences of data. ML research focuses on experimental evaluation on actual data for realistic problems. ML’s performance is then evaluated by training a system (algorithm, program) on a set of test examples and measuring its accuracy at predicting the novel test (or real-life) examples.

Most frequently used methods in ML are induction and deduction. Deduction goes from the general to the particular, and induction goes from the particular to the general. Deduction is to induction what probability is to statistics.

Let’s start with induction. Domino effect is perhaps the most famous instance of induction. Inductive reasoning consists in constructing the axioms (hypotheses, theories) from the observation of supposed consequences of these axioms.Induction alone is not that useful: the induction of a model (a general knowledge) is interesting only if you can use it, i.e. if you can apply it to new situations, by going somehow from the general to the particular. This is what scientists do: observing natural phenomena, they postulate the laws of Nature. However, there is a problem with induction. It’s impossible to prove that an inductive statement is correct. At most can one empirically observe that the deductions that can be made from this statement are not in contradiction with experiments. But one can never be sure that no future observation will contradict the statement. Black Swam theory is the most famous illustration of this problem.

Deductive reasoning consists in combining logical statements (axioms, hypothesis, theorem) according to certain agreed upon rules in order to obtain new statements. This is how mathematicians prove theorems from axioms. Proving a theorem is nothing but combining a small set of axioms with certain rules. Of course, this does not mean proving a theorem is a simple task, but it could theoretically be automated.

A problem with deduction is exemplified by Gödel’s theorem, which states that for a rich enough set of axioms, one can produce statements that can be neither proved nor disproved.

Two other kinds of reasoning exist, abduction and analogy, and neither is frequently used in AI/ML, which may explain many of current AI/ML failures/problems.

Like deduction, abduction relies on knowledge expressed through general rules. Like deduction, it goes from the general to the particular, but it does in an unusual manner since it infers causes from consequences. So, from “A implies B” and “B”, A can be inferred. For example, most of a doctor’s work is inferring diseases from symptoms, which is what abduction is about. “I know the general rule which states that flu implies fever. I’m observing fever, so there must be flu.” However, abduction is not able to build new general rules: induction must have been involved at some point to state that “flu implies fever”.

Lastly, analogy goes from the particular to the particular. The most basic form of analogy is based on the assumption that similar situations have similar properties. More complex analogy-based learning schemes, involving several situations and recombinations can also be considered. Many lawyers use analogical reasoning to analyse new problems based on previous cases. Analogy completely bypasses the model construction: instead of going from the particular to the general, and then from to the general to the particular, it goes directly from the particular to the particular.

Let’s next check some of conspicuous failures in AI/ML (in 2016) and corresponding AI/ML methodology that, in my view, was responsible for failure:

Microsoft’s chatbot Tay utters racist, sexist, homophobic slurs (mimicking/analogising failure)

In an attempt to form relationships with younger customers, Microsoft launched an AI-powered chatbot called “Tay.ai” on Twitter in 2016. “Tay,” modelled around a teenage girl, morphed into a “Hitler-loving, feminist-bashing troll“—within just a day of her debut online. Microsoft yanked Tay off the social media platform and announced it planned to make “adjustments” to its algorithm.

AI-judged beauty contest was racist (deduction failure)

In “The First International Beauty Contest Judged by Artificial Intelligence,” a robot panel judged faces, based on “algorithms that can accurately evaluate the criteria linked to perception of human beauty and health.” But by failing to supply the AI/ML with a diverse training set, the contest winners were all white.

Chinese facial recognition study predicted convicts but shows bias (induction/abduction failure)

Researchers in China’s published a study entitled “Automated Inference on Criminality using Face Images.” They “fed the faces of 1,856 people (half of which were convicted violent criminals) into a computer and set about analysing them.” The researchers concluded that there were some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle. Many in the field questioned the results and the report’s ethics underpinnings.

Concluding remarks

The above examples must not discourage companies to incorporate AI/ML into their processes and products. Most AI/ML failures seem to stem from band-aid, superfluous way of embracing AI/ML. A better and more sustainable approach to incorporating AI/ML would be to initiate a mix of projects generating both quick-wins and long-term transformational products/services/process. For quick-wins, a company might focus on changing internal employee touchpoints, using recent advances in speech, vision, and language understanding, etc.

For long-term projects, a company might go beyond local/point optimisation, to rethinking business lines, products/services, end-to-end processes, which is the area in which companies are likely to see the greatest impact. Take Google. Google’s initial focus was on incorporating ML into a few of their products (spam detection in Gmail, Google Translate, etc), but now the company is using machine learning to replace entire sets of systems. Further, to increase organisational learning, the company is dispersing ML experts across product groups and training thousands of software engineers, across all Google products, in basic machine learning.