Bayes craze, neural networks and uncertainty

Story, context and hype

Named after its inventor, the 18th-century Presbyterian minister Thomas Bayes, Bayes’ theorem is a method for calculating the validity of beliefs (hypotheses, claims, propositions) based on the best available evidence (observations, data, information). Here’s the most dumbed-down description: Initial/prior belief + new evidence/information = new/improved belief.

P(B|E) = P(B) × P(E|B) / P(E), with P standing for probability, B for belief and E for evidence. P(B) is the probability that B is true, and P(E) is the probability that E is true. P(B|E) is the probability of B given that E is true, and P(E|B) is the probability of E given that B is true.
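In code, the theorem is a one-liner once P(E) is expanded with the law of total probability. Below is a minimal Python sketch; the “rare condition and imperfect test” numbers are made up purely for illustration.

```python
def posterior(p_belief, p_evidence_given_belief, p_evidence_given_not_belief):
    """Return P(B|E) via Bayes' theorem, expanding P(E) over B and not-B."""
    p_evidence = (p_evidence_given_belief * p_belief
                  + p_evidence_given_not_belief * (1 - p_belief))
    return p_evidence_given_belief * p_belief / p_evidence

# Illustrative (made-up) numbers: a condition with a 1% prior, a test that
# detects it 99% of the time and false-alarms 5% of the time.
print(posterior(0.01, 0.99, 0.05))  # ~0.17: one positive test is weaker evidence than it feels
```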

In recent years, Bayes’ theorem has become ubiquitous in modern life and is applied to everything from physics to cancer research, from psychology to ML spam algorithms. Physicists have proposed Bayesian interpretations of quantum mechanics and Bayesian defences of string and multiverse theories. Philosophers assert that science as a whole can be viewed as a Bayesian process, and that the Bayesian approach can distinguish science from pseudoscience more precisely than falsification, the method popularised by Karl Popper. Some even claim Bayesian machines might be so intelligent that they make humans “obsolete.”

Bayes going into AI/ML

Neural networks are all the rage in AI/ML. They learn tasks by analysing vast amounts of data and power everything from face recognition at Facebook to translation at Microsoft to search at Google. They’re beginning to help chatbots learn the art of conversation. And they’re part of the movement toward driverless cars and other autonomous machines. But because they can’t make sense of the world without help from such large amounts of carefully labelled data, they aren’t suited to everything. Induction is the prevalent approach in these learning methods, and they have difficulty dealing with uncertainty, with estimating the probabilities of future occurrences of different types of data/events, and with the “confident error” problem.
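To make the “confident error” problem concrete, here is a small illustrative sketch (scikit-learn, toy data): a perfectly standard classifier reports near-certainty for an input that resembles nothing it was trained on.

```python
# Sketch of the "confident error" problem: a standard classifier happily reports
# near-certain probabilities even for out-of-distribution inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)

# A point far away from anything in the training data:
print(clf.predict_proba([[60.0, -80.0]]))  # probabilities near 0/1 -- confident, but meaningless
```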

Additionally, AI researchers have limited insight into why neural networks make particular decisions. They are, in many ways, black boxes. This opacity could cause serious problems: What if a self-driving car runs someone down?

Regular/standard neural networks are bad at calculating uncertainty. However, there is a recent trend of bringing Bayes (and other alternative methodologies) into this game too. AI researchers, including those working on Google’s self-driving cars, have started employing Bayesian software to help machines recognise patterns and make decisions.

Gamalon, an AI startup that went live earlier in 2017, touts a new type of AI that requires only small amounts of training data – its secret sauce is Bayesian Program Synthesis.

Rebellion Research, founded by the grandson of baseball great Hank Greenberg, relies upon a form of ML called Bayesian networks, using a handful of machines to predict market trends and pinpoint particular trades.

There are many more examples.

The dark side of Bayesian inference

The most notable pitfall of the Bayesian approach is the calculation of the prior probability. In many cases, estimating the prior is just guesswork, allowing subjective factors to creep into the calculations. Some prior probabilities are unknown or don’t even meaningfully exist – as with multiverses, inflation or God. In this way, Bayes’ theorem can promote pseudoscience and superstition as well as reason.
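A tiny numerical sketch of that prior-sensitivity problem: the same piece of evidence (here, a likelihood ratio of 10 in favour of some hypothesis) yields wildly different posteriors depending on the subjective prior that is fed in.

```python
# Same evidence, different priors, very different conclusions.
def posterior(prior, likelihood_ratio):
    odds = (prior / (1 - prior)) * likelihood_ratio   # prior odds * Bayes factor
    return odds / (1 + odds)

for prior in (0.5, 0.1, 0.001):
    print(prior, round(posterior(prior, likelihood_ratio=10), 4))
# 0.5   -> 0.9091
# 0.1   -> 0.5263
# 0.001 -> 0.0099
```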

In 1997, Microsoft launched its animated MS Office assistant Clippit, which was conceived to work on a Bayesian inference system but failed miserably.

In law courts, Bayesian principles may lead to serious miscarriages of justice (see the prosecutor’s fallacy). In a famous example from the UK, Sally Clark was wrongly convicted in 1999 of murdering her two children. Prosecutors had argued that the probability of two babies dying of natural causes (the prior probability that she was innocent of both charges) was so low – one in 73 million – that she must have murdered them. But they failed to take into account that the probability of a mother killing both of her children (the prior probability that she was guilty of both charges) was also incredibly low.

So the relative prior probabilities that she was totally innocent or a double murderer were more similar than initially argued. Clark was later cleared on appeal, with the appeal court judges criticising the use of the statistic in the original trial. Here is another such case.
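The structure of the fallacy can be sketched in a few lines. The numbers below are purely illustrative (the double-murder prior in particular is a made-up placeholder, not a figure from the case); the point is only that, given two deaths definitely occurred, what matters is the ratio of the priors of the competing explanations, not how tiny one of them is on its own.

```python
# Prosecutor's fallacy, schematically: compare the two rare explanations directly.
p_double_natural = 1 / 73e6    # prosecution's (flawed) figure for two natural deaths
p_double_murder  = 1 / 200e6   # hypothetical prior for a mother murdering both children

p_innocent_given_deaths = p_double_natural / (p_double_natural + p_double_murder)
print(round(p_innocent_given_deaths, 2))  # ~0.73 -- nowhere near "beyond reasonable doubt"
```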

Alternative, complementary approaches

In actual practice, the method of evaluation most scientists/experts use most of the time is a variant of a technique proposed by Ronald Fisher in the early 1900s. In this approach, a hypothesis is considered validated by data only if the data pass a test that they would fail 95% or 99% of the time had they been generated randomly – in other words, a significance test at the 5% or 1% level. The advantage of Fisher’s approach (which is by no means perfect) is that to some degree it sidesteps the problem of estimating priors where no sufficient advance information exists. In the vast majority of scientific papers, Fisher’s statistics (and more sophisticated statistics in that tradition) are used.
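As a minimal sketch of that Fisherian recipe (simulated data, with SciPy’s two-sample t-test standing in for “a test the data would fail if they were generated randomly”):

```python
# Fisher-style significance testing: keep the effect only if data this extreme
# would arise less than 5% (or 1%) of the time under a chance-only null model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control   = rng.normal(loc=0.0, scale=1.0, size=50)
treatment = rng.normal(loc=0.6, scale=1.0, size=50)   # simulated, with a built-in effect

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p = {p_value:.4f}",
      "-> significant at the 5% level" if p_value < 0.05 else "-> not significant")
```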

As many AI/ML algorithms automate their optimisation and learning processes, a careful Gaussian process treatment, including the type of kernel and the handling of its hyper-parameters, can play a crucial role in obtaining a good optimiser that achieves expert-level performance.
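As a sketch of what such a Gaussian-process treatment looks like in practice (scikit-learn, with an illustrative RBF-plus-noise kernel whose hyper-parameters are fitted by maximising the marginal likelihood):

```python
# Gaussian-process regression: the kernel choice and its hyper-parameters drive
# both the fit and the uncertainty the model reports.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.linspace(0, 10, 25).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=25)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

X_new = np.array([[2.5], [12.0]])     # one point inside, one outside the training range
mean, std = gp.predict(X_new, return_std=True)
print(mean, std)                      # the extrapolated point comes back with a much larger std
```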

Dropout, a technique that addresses the overfitting problem and has been in use in deep learning for several years, also enables uncertainty estimates by approximating those of a Gaussian process. The Gaussian process is a powerful tool in statistics that allows modelling distributions over functions and has been applied in both the supervised and unsupervised domains, for both regression and classification tasks. It offers nice properties such as uncertainty estimates over the function values, robustness to over-fitting, and principled ways of hyper-parameter tuning.
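Here is a sketch of that dropout-as-uncertainty idea (Monte Carlo dropout, in the spirit of Gal and Ghahramani), in PyTorch, with an untrained toy network standing in for a real model:

```python
# Monte Carlo dropout: keep dropout active at prediction time, run several
# stochastic forward passes, and read the spread of the outputs as an
# approximate uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    model.train()                              # keeps the Dropout layers stochastic
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.tensor([[0.5]])
mean, std = mc_dropout_predict(model, x)
print(mean.item(), std.item())                 # the std serves as the uncertainty estimate
```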

Google’s Project Loon uses Gaussian processes (together with reinforcement learning) for its navigation.

Can technology fail humanity?

Technology, a combination of two Greek words signifying ‘systematic treatment of art/craft/technique,’ is:

the collection of techniques, skills, methods and processes used in the production of goods or services or in the accomplishment of objectives.

Whether it was the discovery of fire, the building of shelter or the invention of weapons – and, in modern times, the invention of the Internet, microchips, etc. – it has always been about inventing, discovering and using information, techniques and tools to induce or cause economic, scientific and social progress or improvement.

However, the progress that technology has brought has been neither linear, nor inevitable, nor ubiquitous, nor even obvious. All Four Great Inventions of China happened before the 12th century AD. On the other side, despite Hippocrates’ treatise (dating from 400 BC) arguing that, contrary to the common ancient Greek belief that epilepsy was caused by offending the moon goddess Selene, it had a cure in the form of medicine and diet, 12th-14th-century Christendom perceived epilepsy as the work of demons and evil spirits, and its cure was to pray to St. Valentine and other saints. And in many cases, the progress of technology itself, or its consequences, has been a matter of pure chance or serendipity, whether it is penicillin, X-rays or 3M’s Post-its.

So, ironic as it is, until recently technology wasn’t very systematic in its own progress, let alone in its impact on the society, economy and culture of nations. But it has become a lot more systematic since the dawn of the Information Age, the last 60 or so years. Since microchips, computer networks and digital communication were invented (all in the US), technology has become more systematic in its own progress, and it is becoming more miniature, cheaper, faster and more ubiquitous than ever before in human history. Ubiquitous technology makes the world hyper-connected and digital. Whether it is our phones, thermostats, cars or washing machines, everything is becoming connected to everything. It is thus no coincidence that California (Silicon Valley + Hollywood) has recently become the 6th largest economy in the world, thanks to the beacon of technological and creative progress it has embodied over the last 60 or so years.

The Trump era began in January 2017, and he has already done more to damage any potential technological and scientific progress coming from the US than any of his predecessors. From trying to unreasonably curb immigration from Muslim countries, to terminating the TPP, to undoing progress in transitioning to clean energy and refocusing on coal, to disempowering the OSTP, Trump wraps his decisions in firebrand rhetoric and well-thought-out psychological biases (anchoring bias is his favourite) around one message: MAGA. Hopes are turning to China as the next flag-bearer of technological progress.

Nowadays, even coffee shops are hyper-connected, aiming to personalise our coffee-drinking experience. And thanks to the omnipresence and pervasiveness of the Internet, wireless connections, telecommunications, etc., technology (smartphones, games, virtual worlds, 3D headsets, etc.) is becoming an end in itself. In countries and cities like Singapore, Hong Kong and New York, digital and smartphone addiction is already a societal problem causing unintended deaths, lack of maturity, loss of educational productivity and marriage breakups, to cite but a few. In Singapore, where according to recent research Millennials spend an average of 3.4 hours a day on their smartphones, the government is now putting in place policies and organisations to tackle this psychological addiction.

However, even Bernie Sanders knows that technology cannot and should not be an end in itself or an addiction. Could the Internet and technologies fail? Could the Internet, and the thinking linked to it, spell the end of capitalism? Could it cause societies, cultures and nations to fail?

Technology has proven to fail itself and us whenever it becomes an end in itself.

Only when it stays true to its nature and acts as an enabler, a platform for human endeavours, will technology succeed. It can then even help end poverty and the other problems and issues the human race is facing.

 

Some historic fails of science and technology

Historically, science and technology have gone along many routes which turned out to be dead-ends.

Science has many discredited theories and obsolete paradigms such as alchemy, phlogiston, the universal ether, and more. They failed the reality test and were cast aside, occasionally turning up in fantasy stories and crackpot websites. Some scientists use science to explain/interpret social and cultural phenomena. Sam Harris, for example, says (completely ignoring the spiritual, cultural and social aspects) that religion “is indeed failed science” and expresses hope that information, education and science will rectify this situation. In the modern world even renowned scientists are prone to making claims and predictions which cannot be substantiated. An interesting case in point is Paul Ehrlich, a world-renowned entomologist whose failed predictions about the environment (for example “There is no evidence that global warming is real“) are still resoundingly discrediting relevant scientific research and available empirical data. There are even cases (the controversy surrounding the discovery of element 118) in which scientists deliberately fabricated data to support their theories and claims.

Technology has a bit more wiggle room but is still full of false/failed predictions and intentions. Some technologies were perfectly viable from an engineering standpoint but either couldn’t compete economically or never really had a market. The canonical example is airships. With abandoned technologies there’s always the suspicion that, had things turned out differently, we might be driving atomic cars, be regular tourists on a spacecraft Cycler commuting between Mars and the Earth, or enjoy some other high-end, futuristic, sci-fi-inspired gig. Douglas Self‘s Museum of Retro Technology contains information on dozens of devices that existed but never became part of everyday life.

Failed technologies are different from completely bogus technology. We’ll never get power from a Keely Motor, because the physics was never there; steam-powered airplanes, by contrast, will most probably never become mainstream because they had high manufacturing costs or unprepared markets, or could simply not compete with more conventional designs.

But all these false roads and blind alleys were not taken in vain. If you take a look at major inventions and discoveries in science and technology, you will see that most, if not all, happened by merry happenstance (the discovery of radioactivity by Henri Becquerel and the Curies), unanticipated development (ARPANET and the Internet) or in the course of pursuing a plainly different objective (the serendipitous discovery of penicillin by Alexander Fleming).

Some eponymies in science

In history, it is rare that scientists achieve renown and fame during their lifetimes. If they nonetheless do, they get credit and lasting recognition by having a scientific discovery named after them.

However, naming attributions sometimes go to the wrong person. Indeed, naming disputes are so common that there is even a rule of thumb called the Zeroth Theorem, which states that eponymous discoveries are, more often than not, wrongly attributed. Appropriately enough, the theorem is also known as Stigler’s law of eponymy, even though it was originally formulated by Robert Merton.

Below are a few examples.

Antonio Meucci – who, despite developing the first telephone, spent his whole life in poverty (“if Meucci had been able to pay the $10 fee to maintain the caveat after 1874, no patent could have been issued to Bell”), while Alexander Graham Bell got all the glory.

Alan Turing – whose huge strides in the conception of the first generation of computers (his codebreaking work at Bletchley Park, home of the Colossus, the world’s first programmable digital electronic computer) were destined never to be fully attributed to him, due to his untimely death.

Nikola Tesla – who died almost totally penniless, while the ideas he had put forward for radio (he demonstrated wireless communication – radio – in 1894) made Guglielmo Marconi (who received the Nobel Prize in Physics for radio in 1909) a fortune.

Jean-Baptiste Lamarck – who correctly surmised that living things evolve, half a century before Charles Darwin publicised the fact, but died in ignominy, his ideas unappreciated (though tacitly drawn on by Darwin in On the Origin of Species).

Geoffrey Dummer – whose musings on the development of the integrated circuit preceded those of Bob Noyce and Jack Kilby by almost a decade, but, due to a lack of vision on the part of the British Government, his plans never made it off the drawing board.

Joseph Swan – who despite having the technical expertise that allowed him to design the first workable electric light bulb, was no match for the commercial machinations of adversary Thomas Edison.

Johann Loschmidt – an Austrian scientist who in 1865 calculated the number of molecules in a given volume of gas, yet it was the Italian chemist Amedeo Avogadro whose name became associated with the number.

Albert Neisser – who announced the discovery of the cause of leprosy (the disease is officially known as Hansen’s disease, in honour of the Norwegian physician Gerhard Armauer Hansen, who first observed the bacterium responsible but did not manage to cultivate it or show that it was truly linked to leprosy). Neisser obtained from Hansen a large set of samples from people with leprosy, succeeded in staining the bacterium and, in 1880, announced that he had discovered the cause of leprosy. Hansen wrote a lengthy article about his own research for a conference on leprosy, which credited him, not Neisser, with the discovery.

Robert Hooke – who postulated, amongst other things, the true nature of planetary motion, only to witness his rival Isaac Newton take all the praise for it.

Sources: New Scientist, ECNmag

Failures of the theory of Darwin (part 1)

The theory of evolution devised by Darwin is generally considered one of the most important intellectual achievements of the modern age. The theory allegedly put an end to hitherto existing speculations purporting to explain the evolution of humanity and life on earth. In 1859, when On the Origin of Species was first published, it did not directly reference humans, nor did it make any claims about our common ancestry with other mammals. Ever since, and with increasing knowledge in the spheres of anthropology, genetics and biology, modern scientists have come to hold it not as a possible conjecture (a sound theory with many explanations of empirical data) but as a universal truth about human life on earth. Currently, two main versions of evolutionary theory exist: phyletic gradualism (uniformity and gradual transformation) and punctuated equilibrium (slight changes with a final leap).

However, the theory has so far failed to exhaustively explain or address a number of open questions and issues:

1. Darwin, in The Descent of Man, considered it logical to extend the theory to cognition, treating human characteristics such as morality or emotions as having evolved, thereby introducing evolutionary psychology. It holds that human nature was designed by natural selection in the Pleistocene epoch and aims to apply evolutionary theory to the human mind. It proposes that the mind consists of cognitive modules that evolved in response to selection pressures faced by our Stone Age ancestors. In recent research by authorities on the topic, Buller (in his book Adapting Minds) and Richardson (in his book Evolutionary Psychology as Maladapted Psychology) argue that neither the methodology nor the results of evolutionary psychology can be justified scientifically.

2. An apparent lack of an “evolutionary” effect on bacteria (a new generation every 12 minutes to 24 hours) and fruit flies (a new generation every 9 days), despite an effectively unlimited number of genetic mutations and variations. The theory should have had an even bigger effect on these organisms according to a recently introduced model, which suggests that body size and temperature combine to control the overall rate of evolution through their effects on metabolism (smaller organisms evolve faster and are more diverse than larger organisms).

3. On rare and random occasions a mutation in DNA improves a creature’s ability to survive, so it is more likely to reproduce (natural selection). But it is widely known that very few human traits have been traced to a single gene (conditions like the “Dracula gene” and the “cheeseburger gene”). Modern science currently holds that even the simplest human traits, features and behavioural patterns have sophisticated underlying molecular and genetic mechanisms. Therefore it is doubtful natural selection could favour parts that did not have all their components existing in place, connected and regulated, because the parts would not work.

4. The Cambrian/Precambrian time period does not support Darwinian evolution. No intermediate (transitional) forms are found during this period, and there appear to be no fossil ancestors for complex invertebrates or fish.

5. The theory of evolution seems to be in violation of two fundamental laws: the second law of thermodynamics (things fall apart over time; they do not get more organised) and the law of biogenesis (living cells divide to make new cells, and fertilised eggs and seeds develop into animals and plants, but chemicals do not simply fall together and produce life).

To be continued some time soon…