Think about the smartest person you know. What about this person leads you to describe him or her as incredibly intelligent?
Is she a quick thinker, able to internalize and reuse new concepts immediately? Is he highly creative, able to endlessly generate new ideas you’d never think of? Perhaps she’s highly perceptive and can pick out the tiniest details of the world around her. Or maybe he’s deeply empathetic and understands what you’re feeling even before you do.
Computers trounce humans at large-scale computational tasks, but human intelligence spans a much wider spectrum than machines are currently capable of. From math geniuses to musical prodigies to sales wizards, humans exhibit logical, spatial, emotional, verbal, somatic, and many other modalities of intelligence. We leverage cognitive abilities like working memory, sustained attention, category formation, and pattern recognition to understand and succeed in the world.
AI versus AGI
Due to a resurgence in popularity and hype, the term “artificial intelligence” has been misused to describe almost any kind of computerized analysis or automation, regardless of whether the technology can be described as “intelligent”. If you define “intelligence” as “human-level intelligence”, then by that definition we don’t have artificial intelligence today.
AI is like teenage sex: everyone talks about it, nobody knows how to do it, everyone thinks everyone else is doing it & so claims to do it
— Mariya (@thinkmariya) April 5, 2017
To avoid confusion with the more general term AI, experts prefer to use the term Artificial General Intelligence (AGI) to refer to human-level intelligence capable of abstracting concepts from limited experience and transferring knowledge between domains. AGI is also referred to as “Strong AI” to differentiate it from “Weak AI” or “Narrow AI”, which are systems designed for a specific task whose capabilities are not easily transferable to others.
All of the AI systems we have today are “Weak AI”, including impressive achievements like Deep Blue, which beat the world champion in chess in 1997, and AlphaGo, which did the same for the game of Go in 2016. These narrowly intelligent programs defeat humans in a specific task, but unlike human world champions are not capable of also driving cars or creating art. Solving those other tasks requires other narrow programs to be built.
While many novel techniques have emerged recently to build “Narrow AI”, most experts in the industry agree that we are far from achieving AGI or human-level intelligence in our machines. The path towards AGI is also unclear. Many of the approaches which work well for solving narrow problems do not generalize well to abstract reasoning, concept formulation, and strategic planning – capabilities that even human toddlers exhibit but our computers do not.
Modern AI Techniques
How are we building today’s “Narrow AI”? Most enterprise-scale technologies use a wide range of methodologies, not all of which count as “AI”. Differentiating between them can be tricky and often there is material overlap.
While engineers and researchers need to master the subtle differences between various technical approaches, business and product leaders should focus on the ultimate end goal and real world results of machine learning models. You will find that often simpler approaches outperform complex ones in the wild, even if they’re intellectually less “advanced”.
Statistics & Data Mining
Statistics is the discipline of collecting, analyzing, describing, visualizing, and drawing inferences from data. The focus is on discovering mathematical relationships and properties within data sets and quantifying uncertainty.
Descriptive statistics entails describing or visualizing a population, or group that data has been gathered from. A simple application may be analyzing the items sold by a retail store in a specific time period.
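As a rough illustration of the retail example, here is a minimal Python sketch that summarizes a week of sales. The sales log, item names, and field layout are made up for this example; a real store would pull these from its own database.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical one-week sales log: (item, units sold) per transaction.
sales = [
    ("socks", 3), ("shirt", 1), ("socks", 2),
    ("hat", 1), ("shirt", 2), ("socks", 4),
]

# Total units sold per item.
units_per_item = Counter()
for item, units in sales:
    units_per_item[item] += units

# Describe the data we actually have; no inference about unseen data.
units = [u for _, u in sales]
summary = {
    "total_units": sum(units),
    "mean_units_per_sale": mean(units),
    "median_units_per_sale": median(units),
    "best_seller": units_per_item.most_common(1)[0][0],
}
print(summary)
```

Nothing here is predictive. The summary only describes the transactions that were recorded.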
Inferential statistics is applied when the true population is too large or difficult to capture and analyze, and a smaller representative sample must be drawn from it for study. The answers in inferential statistics are never 100% accurate and are instead probabilistic bets, since the analysis is done on a subset of the data, not the entirety. Election polling, for example, relies on surveying a small percentage of citizens to gauge the sentiments of the entire population. As we saw during the 2016 US election cycle, conclusions drawn from samples might not be representative of the truth!
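To make the "probabilistic bet" concrete, the sketch below estimates support for a candidate from a simulated poll and attaches a margin of error. The 52% support figure, the sample size of 1,000, and the use of the normal approximation (z = 1.96 for roughly 95% confidence) are assumptions chosen for illustration.

```python
import math
import random

def poll_estimate(sample, z=1.96):
    """Return (proportion, margin_of_error) for a list of 0/1 answers."""
    n = len(sample)
    p = sum(sample) / n
    # Normal-approximation margin of error for a sample proportion.
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

random.seed(0)
# Hypothetical electorate in which 52% support candidate A.
true_support = 0.52
sample = [1 if random.random() < true_support else 0 for _ in range(1000)]

p, margin = poll_estimate(sample)
print(f"estimated support: {p:.3f} +/- {margin:.3f}")
```

The margin only covers random sampling error. A sample that is not representative of the population, as pollsters found in 2016, can be wrong by far more than the reported margin.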
Data mining is the automation of exploratory statistical analysis on large-scale databases, though the term is often used to describe any kind of algorithmic data analysis and information processing, which may also include machine learning and deep learning techniques. The goal of data mining is to extract patterns and knowledge from large-scale data sets so that they can be reshaped into a more understandable structure for later analysis.
Symbolic & Expert Systems
Symbolic systems are programs that use human-understandable symbols to represent problems and reasoning. They were the dominant approach to AI from the 1950s to the 1980s. The most successful form of symbolic system is the expert system, which is designed to mimic the decision-making process of human experts. Expert systems consist of a series of production rules, similar to if-then statements, that govern how the computer makes inferences and accesses a knowledge base.
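Here is a minimal sketch of how production rules can drive inference, using a simple forward-chaining loop. The help-desk rules and fact names are hypothetical, and real expert systems use far richer rule languages and knowledge bases.

```python
# Each rule: (set of facts that must all hold, fact to conclude).
# These help-desk rules are invented for illustration only.
RULES = [
    ({"laptop_wont_boot", "power_light_off"}, "power_problem"),
    ({"power_problem", "charger_tested_ok"}, "suspect_battery"),
    ({"suspect_battery"}, "recommend_battery_swap"),
]

def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

observed = {"laptop_wont_boot", "power_light_off", "charger_tested_ok"}
derived = forward_chain(observed, RULES)
print(sorted(derived - observed))
```

Every conclusion can be traced back to the rules that produced it, which is why these systems are easy to audit but laborious to build.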
Rule-based expert systems are best applied to automate calculations and logical processes where rules and outcomes are relatively clear. As decision making becomes more complex or nuanced, formalizing the full range of requisite knowledge and inference schemes required to make human-level decisions becomes intractable.
The obvious drawback of expert systems is that domain experts are required to hand-engineer the rules engine and knowledge base. While symbolic systems have historically not been scalable or adaptable, recent research has investigated combining them with newer methods like machine learning and deep learning to improve performance.
Machine Learning
What happens if you want to teach a computer to do a task, but you’re not entirely sure how to do it yourself? Or the problem is so complex that it’s impossible for you to encode all the rules and knowledge upfront?
Machine learning is the field of computer science that enables computers to learn without being explicitly programmed and builds on top of computational statistics and data mining.
Supervised learning is when the computer is presented with input and output pairs, such as an image with a label (e.g. “cat”), and learns general rules to map the input to the output. Supervised learning is commonly used for classification, where you divide inputs into distinct categories, and regression, where the outputs are continuous numbers. If you are trying to predict whether an image is of a cat or a dog, this is a classification problem with discrete classes. If you are trying to predict the numeric price of a stock or other asset, this is a continuous output and can be framed as a regression problem.
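The two flavors can be sketched in a few lines of plain Python. The classifier below is a one-nearest-neighbor lookup and the regressor is an ordinary least-squares line. Both are deliberately simple stand-ins, and the pet measurements and prices are invented numbers rather than real data.

```python
def nearest_neighbor(train, point):
    """Classify `point` with the label of the closest training example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], point))
    return label

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Classification: hypothetical (weight_kg, ear_length_cm) inputs with labels.
pets = [((4.0, 6.5), "cat"), ((5.0, 7.0), "cat"),
        ((20.0, 10.0), "dog"), ((30.0, 12.0), "dog")]
print(nearest_neighbor(pets, (6.0, 7.2)))

# Regression: hypothetical closing prices on days 1 to 4, predict day 5.
days, prices = [1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(days, prices)
print(slope * 5 + intercept)
```

In both cases the program is given the answers (labels or prices) during training. The only difference is whether the output is a category or a number.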
Unsupervised learning occurs when computers are given unlabeled data, i.e. no input-output pairs, and asked to discover inherent structure and patterns that lie within the data. One common application of unsupervised learning is clustering, where input data is divided into different groups based on a measure of “similarity”. For example, you may want to cluster your LinkedIn or Facebook friends into social groups based on how interconnected they are with each other. Unlike with supervised learning, the groups are not known in advance, and different measures of similarity will produce different results.
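A small sketch of clustering, using k-means on made-up 2-D points. Here "similarity" is assumed to be straight-line (Euclidean) distance and the starting centroids are chosen by hand. A social graph would need a different similarity measure, such as shared connections.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: no group names are supplied anywhere.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

The algorithm never sees a label. It only discovers that the points fall into two tight groups under the chosen distance measure.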
Semi-supervised learning lies between supervised and unsupervised learning, where the input-output pairs are incomplete. Many real-world data sets are missing labels or have noisy, incorrect labels. Active learning, a special case of semi-supervised learning, occurs when an algorithm actively queries a user to discover the right output or label for a new input. Active learning is used to optimize recommender systems like the ones used to recommend new movies on Netflix or new products on Amazon.
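One common way to decide what to ask the user is uncertainty sampling: query the example the current model is least sure about. The sketch below assumes a hypothetical model, `predict_proba`, that returns the probability of the positive class for a single number. Both the model and the pool of unlabeled values are invented for illustration.

```python
import math

def predict_proba(x):
    """Hypothetical current model: probability that x is in the positive class."""
    return 1.0 / (1.0 + math.exp(-(x - 5.0)))

def next_query(unlabeled):
    """Pick the example whose prediction is closest to a coin flip."""
    return min(unlabeled, key=lambda x: abs(predict_proba(x) - 0.5))

unlabeled_pool = [1.0, 4.6, 9.0]
print(next_query(unlabeled_pool))  # 4.6, the point nearest the decision boundary
```

After the user labels the chosen example, the model would be retrained and the loop repeated.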
Reinforcement learning is applied when computer programs are instructed to achieve a goal in a dynamic environment. The program learns by repeatedly taking actions, measuring the feedback from those actions, and improving its behavioral policy iteratively. Reinforcement learning is applied successfully in game-playing, robotic control, and other well-defined and contained problems, but is less effective with complex, ambiguous problems where rewards and environments are not well understood and quantified.
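The act, measure, improve loop can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. The environment is a made-up five-cell corridor where the agent earns a reward only on reaching the right-hand end. The learning rate, discount, and exploration rate are arbitrary choices for this toy.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1.0 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned action for positions 0..3
```

This works because the corridor is tiny and the reward is unambiguous. With vague goals or enormous state spaces, a table of values like `q` stops being practical.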
Ensemble methods combine different machine learning models to produce superior results to any single model. Most successful applications of machine learning to enterprise problems utilize ensemble approaches. There are four broad categories of ensembling: bagging, boosting, stacking, and bucketing. Bagging entails training the same algorithm on different subsets of the data and includes popular algorithms like random forest. Boosting involves training a sequence of models, where each model prioritizes learning from the examples that the previous model failed on. In stacking, you feed the outputs of several models into a final model that learns how to combine them. In bucketing, you train multiple models for a given problem and dynamically choose the best one for each specific input.
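Bagging is the easiest of the four to sketch. The example below trains many one-threshold classifiers ("decision stumps", a simplified stand-in for the full decision trees a random forest uses) on bootstrap samples and lets them vote. The data set, with its one deliberately noisy point, is invented.

```python
import random
from collections import Counter

def train_stump(data):
    """Pick the threshold on x that misclassifies the fewest (x, label) pairs."""
    best = None
    for threshold, _ in data:
        errors = sum((x >= threshold) != label for x, label in data)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]

def predict(thresholds, x):
    """Majority vote across all stumps."""
    votes = Counter(x >= t for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data: large x is labeled True, except the noisy point at x=5.
data = [(1, False), (2, False), (3, False), (4, True),
        (5, False), (6, True), (7, True), (8, True)]
model = bagging(data)
print(predict(model, 1.5), predict(model, 7.5))
```

Each stump sees a slightly different sample and so settles on a slightly different threshold. Voting averages out the individual quirks.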
Deep Learning
Deep learning is a subfield of machine learning that builds algorithms using multi-layered artificial neural networks, which are mathematical structures loosely inspired by how biological neurons fire. Neural networks were invented in the 1950s, but recent advances in big data and computational power have resulted in human-level performance by deep learning algorithms in tasks such as speech recognition and image classification. Deep learning in combination with reinforcement learning enabled Google DeepMind’s AlphaGo to defeat human world champions of Go in 2016, a challenge that was considered computationally impossible by many experts.
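To show what "multi-layered" means mechanically, here is a tiny two-layer network written in plain Python. It is only a sketch: the weights are picked by hand so that the network computes XOR, whereas a real deep learning system would learn millions of weights from data.

```python
def relu(x):
    """Rectified linear unit: pass positive values, zero out the rest."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hand-picked (not learned) weights that make the network compute XOR.
HIDDEN_W, HIDDEN_B = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
OUTPUT_W, OUTPUT_B = [[1.0, -2.0]], [0.0]

def xor_net(a, b):
    hidden = dense([a, b], HIDDEN_W, HIDDEN_B, relu)          # layer 1
    (out,) = dense(hidden, OUTPUT_W, OUTPUT_B, lambda v: v)   # layer 2 (linear)
    return out

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
```

A single layer cannot represent XOR. The hidden layer is what makes it possible, and stacking more layers extends the same idea.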
Due to the recent hype, much media attention has been focused on deep learning, but only a handful of sophisticated technology companies have successfully implemented deep learning for enterprise-scale products. Google replaced previous statistical methods for machine translation with neural networks to achieve superior performance. Microsoft announced in 2017 that they achieved human parity in conversational speech recognition. Promising startups like Clarifai employ deep learning to achieve state-of-the-art results in recognizing objects in images and video for Fortune 500 brands.
While deep learning models outperform older machine learning approaches on many problems, they are more difficult to develop and require specialized expertise. Operationalizing and productizing models for enterprise-scale usage also requires different, but equally hard-to-acquire, technical expertise. In practice, ensemble approaches often beat deep learning approaches on both performance and transparency. Many enterprises also look to machine-learning-as-a-service (MLaaS) solutions from Google, Amazon, IBM, Microsoft and a number of leading AI startups rather than build custom deep learning solutions.
Aside from the complexity of production deployment and a competitive labor market, deep learning also suffers from a few notable drawbacks. Successful models typically require a large volume of reliable, clean labeled data, which enterprises often lack. They also require significant and specialized computing power in the form of graphics processing units (GPUs) or tensor processing units (TPUs).
Critics of deep learning point out that human toddlers only need to see a few examples of an object to form a mental concept, while deep learning algorithms need thousands of examples to achieve reasonable accuracy and even then still make laughable errors. Deep learning algorithms do not form abstractions or perform reasoning and planning the way we humans do.
Probabilistic Programming
Probabilistic programming creates systems capable of making decisions in the face of uncertainty by making inferences from prior knowledge. According to Avi Pfeffer in his book Practical Probabilistic Programming, you first create a model that captures knowledge of your domain in quantitative, probabilistic terms, then apply this model to specific evidence to generate an answer. This process is called inference.
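The model-then-evidence idea can be sketched with Bayes' rule written out by hand. The machine-fault scenario and every probability below are invented. A real probabilistic programming language would automate this inference step for much richer models.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h][evidence] for h in prior}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

# Model: prior knowledge about a hypothetical machine and its alarm sensor.
prior = {"faulty": 0.01, "ok": 0.99}
likelihood = {
    "faulty": {"alarm": 0.95, "quiet": 0.05},
    "ok": {"alarm": 0.05, "quiet": 0.95},
}

# Inference: update the belief after observing the evidence "alarm".
belief = posterior(prior, likelihood, "alarm")
print(round(belief["faulty"], 3))  # 0.161
```

Even with a 95% reliable sensor, a single alarm leaves the machine more likely fine than faulty, because faults were rare to begin with. Quantifying that kind of uncertainty is the point of the approach.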
While the research and applications are early, many experts see probabilistic programming as an alternative approach for areas where deep learning performs poorly, such as with concept formulation on sparse or medium-sized data. Probabilistic programs have been used successfully in applications such as medical imaging, machine perception, financial predictions, and econometric and atmospheric forecasting.
You can check out the MIT Probabilistic Computing Project for recommended reading and tutorials.
Other AI Approaches
Many other approaches to AI exist which are outside of the scope of this article, but which can be used alone or in combination with machine learning and deep learning to improve performance. Evolutionary and genetic algorithms are used in practice for generative design and in combination with neural networks to improve learning. Other approaches like Whole Brain Emulation (WBE), also known as “mind uploading”, seek to replicate human-level intelligence in machines by fully digitizing human brains. Yet other approaches seek to innovate at the hardware level by leveraging optical computing, quantum computing, or human-machine interfaces to accelerate or augment current methods.
This is part one of our WTF IS AI?! series. Part two is the Machine Intelligence Continuum.