Brief overview: neural networks, architectures, frameworks

Deep learning is a new name for an approach to AI called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCullough and Walter Pitts, two researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural networks were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who became co-directors of the new MIT Artificial Intelligence Laboratory in 1970.

Neural networks are a means of doing machine learning, in which a computer learns to perform specific tasks by analysing training examples. Usually, these examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modelled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organised into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

Architecture and main types of neural networks

A typical neural network contains a large number of artificial neurons called units arranged in a series of layers.

  • Input layer  contains units (artificial neurons) which receive input from the outside world on which network will learn, recognise about or otherwise process.
  • Output layer  contains units that respond to the information about how it learned a task.
  • Hidden layers  are situated between input and output layers. Their task is to transform the input into something that output unit can use in some way.
  • Perceptron  has two input units and one output unit with no hidden layers, and is also called single layer perceptron.
  • Radial Basis Function Network  are similar to the feed-forward neural network except radial basis function is used as activation function of these neurons.
  • Multilayer Perceptron  networks use more than one hidden layer of neurons. These are also known as deep feed-forward neural networks.
  • Recurrent Neural Network’s (RNN) hidden layer neurons have self-connections and thus possess memory. LSTM is a type of RNN.
  • Hopfield Network is a fully interconnected network of neurons in which each neuron is connected to every other neuron. The network is trained with input pattern by setting a value of neurons to the desired pattern. Then its weights are computed. The weights are not changed. Once trained for one or more patterns, the network will converge to the learned patterns.
  • Boltzmann Machine Network  are similar to Hopfield network except some neurons are for input, while others are hidden. The weights are initialized randomly and learn through back-propagation algorithm.
  • Convolutional Neural Network(CNN) derives its name from the “convolution” operator. The primary purpose of Convolution in case is to extract features from an input image/video. Convolution preserves the spatial relationship between pixels by learning about image/video features using small squares of input data.

Of these, let’s have a very brief review of CNNs and RNNs, as these are the most commonly used.


  1. CNNs are ideal for image and video processing.
  2. CNN takes a fixed size input and generate fixed-size outputs.
  3. Use CNNs to break a component (image/video) into subcomponents (lines, curves, etc.).
  4. CNN is a type of feed-forward artificial neural network – variation of multilayer perceptrons, which are designed to use minimal amounts of preprocessing.
  5. CNNs use connectivity pattern between its neurons as inspired by the organization of the animal visual cortex, whose neurons are arranged in such a way that they respond to overlapping regions tiling the visual field.
  6. CNN looks for the same patterns on all the different subfields of the image/video.


  1. RNNs are ideal for text and speech analysis.
  2. RNN can handle arbitrary input/output lengths.
  3. Use RNNs to create combinations of subcomponents (image captioning, text generation, language translation, etc.)
  4. RNN, unlike feedforward neural networks, can use its internal memory to process arbitrary sequences of inputs.
  5. RNNs use time-series information, i.e. what is last done will impact what done next.
  6. RNN, in the simplest case, feed hidden layers from the previous step as an additional input into the next step and while it builds up memory in this process, it is not looking for the same patterns.

A type of RNN are LSTM and GRU. The key difference between GRU and LSTM is that a GRU has two gates (reset and update) whereas an LSTM has three gates (inputoutput and forget). GRU is similar to LSTM in that both utilise gating information to solve vanishing gradient problem. GRU’s performance is on par with LSTM, but computationally more efficient.

  • GRUs train faster and perform better than LSTMs on less training data if used for language modelling.
  • GRUs are simpler and easier to modify, for example adding new gates in case of additional input to the network.
  • In theory, LSTMs remember longer sequences than GRUs and outperform them in tasks requiring modelling long-distance relations.
  • GRUs expose complete memory, unlike LSTM
  • It’s recommended to train both GRU and LSTM and see which is better.

Deep learning frameworks

There are several frameworks that provide advanced AI/ML capabilities. How do you determine which framework is best for you?

The below figure summarises most of the popular open source deep network repositories. The ranking is based on the number of stars awarded by developers in GitHub (as of May 2017).

deep learning frameworks ranked via GitHub

Google’s TensorFlow is a library developed at Google Brain. TensorFlow supports a broad set of capabilities such as image, handwriting and speech recognition, forecasting and natural language processing (NLP). Its programming interfaces includes Python and C++ and alpha releases of Java, GO, R, and Haskell API will soon be supported.

Caffe is the brainchild of Yangqing Jia who leads engineering for Facebook AI. Caffe is the first mainstream industry-grade deep learning toolkit, started in late 2013. Due to its excellent convolutional model, it is one of the most popular toolkits within the computer vision community. Speed makes Caffe perfect for research experiments and commercial deployment. However, it does not support fine granularity network layers like those found in TensorFlow and Theano. Caffe can process over 60M images per day with a single Nvidia K40 GPU. It’s cross-platform and supports C++, Matlab and Python programming interfaces and has a large user community that contributes to their own repository known as “Model Zoo.” AlexNet and GoogleNet are two popular user-made networks available to the community.

Caffe 2 was unveiled in April 2017 and is focused on being modular and excelling at mobile and at large scale deployments. Like TensorFlow, Caffe 2 will support ARM architecture using the C++ Eigen library and continue offering strong support for vision-related problems, also adding in RNN and LSTM networks for NLP, handwriting recognition, and time series forecasting.

MXNet is a fully featured, programmable and scalable deep learning framework, which offers the ability to both mix programming models (imperative and declarative) and code in Python, C++, R, Scala, Julia, Matlab and JavaScript. MXNet supports CNN and RNN, including LTSM networks and provides excellent capabilities for imaging, handwriting and speech recognition, forecasting and NLP. It’s considered the world’s best image classifier, and supports GAN simulations. This model is used in Nash equilibrium to perform experimental economics methods. Amazon supports MXNet, planning to use it in existing and upcoming services whereas Apple is rumorred to be also using it.

Theano architecture lacks the elegance of TensorFlow, but provides capabilities like symbolic API supports looping control, so-called scan, which makes implementing RNNs easy and efficient. Theano supports many types of convolutions for hand writing and image classification including medical images. Theano uses 3D convolution/pooling for video classification. It can process natural language processing tasks, including language understanding, translation, and generation. Theano supports GAN.


Top 13 challenges AI is facing in 2017

AI and ML feed on data, and companies that center their business around the technology are growing a penchant for collecting user data, with or without the latter’s consent, in order to make their services more targeted and efficient. Already implementations of AI/ML are making it possible to impersonate people by imitating their handwritingvoice and conversation style, an unprecedented power that can come in handy in a number of dark scenarios. However, despite large amounts of previously collected data, early AI pilots have challenges producing  dramatic results that technology enthusiasts predicted. For example, early efforts of companies developing chatbots for Facebook’s Messenger platform saw 70% failure rates in handling user requests.

One of main challenges of AI goes beyond data: false positives. For example, a name-ranking algorithm ended up favoring white-sounding names, and advertising algorithms preferred to show high-paying job ads to male visitors.

Another challenge that caused much controversy in the past year was the “filter bubble” phenomenon that was seen in Facebook and other social media that tailored content to the biases and preferences of users, effectively shutting them out from other viewpoints and realities that were out there.

Additionally, as we give more control and decision-making power to AI algorithms, not only technological, but moral/philosophical considerations become important – when a self-driving car has to choose between the life of a passenger and a pedestrian.

To sum up, following are the challenges that AI still faces, despite creating and processing increasing amounts of of data and unprecedented amounts of other resources (number of people working on algorithms, CPUs, storage, better algorithms, etc.):

  1. Unsupervised Learning: Deep neural networks have afforded huge leaps in performance across a variety of image, sound and text problems. Most noticeably in 2015, the application of RNNs to text problems (NLP, language translation, etc.) have exploded. A major bottleneck in unsupervised learning is labeled data acquisition. It is known humans learn about objects and navigation with relatively little labeled “training” data. How is this performed? How can this be efficiently implemented in machines?
  2. Select Induction Vs. Deduction Vs. Abduction Based Approach: Induction is almost always a default choice when it comes to building an AI model for data analysis. However, it – as well as deduction, abduction, transduction – has its limitations which need serious consideration.
  3. Model Building: TensorFlow has opened the door for conversations about  building scalable ML platforms. There are plenty of companies working on data-science-in-the-cloud (H2O, Dato, MetaMind, …) but the question remains, what is the best way to build ML pipelines? This includes ETL, data storage and  optimisation algorithms.
  4. Smart Search: How can deep learning create better vector spaces and algorithms than Tf-Idf? What are some better alternative candidates?
  5. Optimise Reinforced Learning: As this approach avoids the problems of getting labelled data, the system needs to get data, learn from it and improve. While AlphaGo used RL to win against the Go champion, RL isn’t without its own issues: discussion on a more lightweight and conceptual level one on a more technical aspect.

  6. Build Domain Expertise: How to build and sustain domain knowledge in industries and for problems, which involve reasoning based on a complex body of knowledge like Legal, Financial, etc. and then formulate a process where machines can simulate an expert in the field.
  7. Grow Domain Knowledge: How can AI tackle problems, which involve extending a complex body of knowledge by suggesting new insights to the domain itself – for example new drugs to cure diseases?
  8. Complex Task Analyser and Planner: How can AI tackle complex tasks requiring data analysis, planning and execution? Many logistics and scheduling tasks can be done by current (non-AI) algorithms. A good example is the use of AI techniques in IoT for Sparse datasets . AI techniques help this case because there are large and complex datasets where human beings cannot detect patterns but machines can do so easily.
  9. Better Communication: While proliferation of smart chatbots and AI-powered communication tools is a trend since several years, these communication tools are still far from being smart, and may at times fail at recognising even a simple human language.
  10. Better Perception and Understanding: While Alibaba, Face+ create facial recognition software, visual perception and labelling are still generally problematic. There are few good examples, like this Russian face recognition app  that is good enough to be considered a potential tool for oppressive regimes seeking to identify and crack down on dissidents. Another algorithm proved to be effective at peeking behind masked images and blurred pictures.
  11. Anticipate Second-Order (and higher) Consequences: AI and deep learning have improved computer vision, for example, to the point that autonomous vehicles (cars and trucks) are viable (Otto, Waymo) . But what will their impact be on economy and society? What’s scary is that with advance of AI and related technologies, we might know less on AI’s data analysis and decision making process. Starting in 2012, Google used LSTMs to power the speech recognition system in Android, and in December 2016, Microsoft reported their system reached a word error rate of 5.9%  —  a figure roughly equal to that of human abilities for the first time in history. The goal-post continues to be moved rapidly .. for example is building an avatar that can capture your personality. Preempting what’s to come, starting in the summer of 2018, EU is considering to require that companies be able to give users an explanation for decisions that their automated systems reach.
  12. Evolution of Expert SystemsExpert systems have been around for a long time.  Much of the vision of expert systems could be implemented in AI/deep learning algorithms in the near future. The architecture of IBM Watson is an indicative example.
  13. Better Sentiment Analysis: Catching up but still far from lexicon-based model for sentiment analysis, it is still pretty much a nascent and unchartered space for most AI applications. There are some small steps in this regard though, including OpenAI’s usage of mLSTM methodology to conduct sentiment analysis of text. The main issue is that there are many conceptual and contextual rules (rooted and steeped in particulars of culture, society, upbringing, etc of individuals) that govern sentiment and there are even more clues (possibly unlimited) that can convey these concepts.