Is Machine Learning Analytics or AI?
One of the definitional debates that bedevils the artificial intelligence (AI) field is whether machine learning is an AI-based method or technology. Or is it just an analytics-based activity? After all, it is statistical in nature, and attempts, as virtually all analytical methods do, to fit a line or curve to a set of data points. So is it really AI? And what difference does it make?
Basic machine learning is practically indistinguishable from predictive analytics. Eric Siegel, the author of the book Predictive Analytics, once told me that, and I discounted it somewhat. But the more I think and learn about machine learning, the more I agree with him, with one key qualification.
Basic machine learning is essentially predictive analytics. It uses “supervised learning,” the creation of a statistical model based on data for which the values of the outcome variable are known. For example, a machine learning model attempting to predict fraud in a bank would need to be trained on data in which fraud has been clearly established in some cases. It can involve as simple a modeling approach as linear regression. The model is tested with a validation dataset, for which the predicted outcome is compared to the known outcome. Then once a model is found that explains the variance in the training data and predicts well, it is deployed to predict or classify new data for which the outcome variable isn’t known, a step sometimes called a scoring process. This sort of process has been going on for a long time: Jim Goodnight, the CEO of SAS, says that SAS has been doing machine learning since the 1980s. The first FICO score, which uses a form of basic machine learning to create its credit scores, was introduced in 1989. In other words, this is clearly a well-established idea, and it is clearly very analytical in nature.
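The train, validate, and score steps described above can be sketched in a few lines of Python. This is only an illustration, assuming a single predictor and ordinary least-squares linear regression; the data values and the names fit_line, predict, and mean_abs_error are invented for the example and do not come from any particular product.

```python
# Illustrative sketch of the supervised workflow: train, validate, then score.
# All data values and function names here are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to one input value."""
    slope, intercept = model
    return slope * x + intercept

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and known outcomes."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)

# Training data: the outcome variable is known.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Validation data: the outcome is known but was held out of training.
valid_x = [6.0, 7.0]
valid_y = [12.0, 13.9]

model = fit_line(train_x, train_y)
validation_error = mean_abs_error(model, valid_x, valid_y)

# Scoring: new cases for which the outcome is not known.
new_x = [8.0, 9.0]
scores = [predict(model, x) for x in new_x]
```

Note that the model is fit once and then simply reused for scoring; nothing in it changes until someone retrains it on new data.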
It’s important to point out that supervised machine learning models don’t learn continuously, as many people think. They learn from a set of training data, and then they continue to use the same model unless a new set of data is employed to train and validate new models. If the learning were more continuous rather than one-time or episodic, it might make traditional supervised machine learning feel more exotic and more likely to be classified as artificial intelligence.
But there are machine learning approaches other than “basic.” Beyond regression models, there are more than a hundred possible algorithms in machine learning, many of them somewhat esoteric. They range from “gradient boosting” (an approach that builds models that address the errors of previous models, thus boosting the predictive or classification ability) to “random forests” (models that are collections of decision tree models). Machine learning also encompasses even more complex model types like neural networks and deep learning. These are still statistical in nature, but they are not very interpretable by human analysts.
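To make the gradient boosting idea concrete, here is a minimal sketch, assuming a regression problem with squared error and a single predictor. Each round fits a one-split decision "stump" to the errors (residuals) left by the rounds before it. The data, the learning rate, and the names fit_stump, boost, and boosted_predict are all invented for illustration; production libraries are far more elaborate.

```python
# Illustrative sketch of gradient boosting for regression (squared error).
# Each round fits a one-split "stump" to the residuals of the model so far.
# Data, hyperparameters, and names are invented for illustration.

def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    # Exclude the largest x so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly correct the remaining errors."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((boosted_predict(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]

one_round = boost(xs, ys, rounds=1)
many_rounds = boost(xs, ys, rounds=20)
```

On this toy data, the training error after twenty rounds is lower than after one, which is the "boosting" the name refers to. The final model is a sum of twenty small pieces, which hints at why such models are harder to interpret than a single regression line.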
One could argue that these more complex forms of machine learning are of a different class than the basic ones, and closer to what we think of as AI. Neural networks and deep learning models are heavily oriented to classification (fraudulent transaction or not, cat photo or not) and less focused on prediction. The average quantitative analyst would not be able to perform or understand such analyses, whereas basic machine learning would be easily within their reach.
In addition, there is a growing number of software tools (including DataRobot and Google’s AutoML) that allow the automated construction of supervised machine learning models. They try out many different algorithms to see which is most successful, and in some cases perform other important functions such as categorizing or excluding data, providing explanations for which variables are important, and generating code to deploy the model (to create a score). The automated nature of these tools feels to me more like AI because we often associate AI with automation.
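The core "try many algorithms and keep the most successful" loop can be sketched as follows. This is not how DataRobot or Google’s AutoML is implemented; it is a toy illustration, assuming three invented candidate algorithms (a mean baseline, a linear fit, and a nearest-neighbor lookup) compared by squared error on a held-out validation set. All names and data are hypothetical.

```python
# Illustrative sketch of automated model selection: fit several candidate
# algorithms on training data and keep the one with the lowest validation error.
# Candidates, data, and names are invented; real tools do much more than this.

def fit_mean(xs, ys):
    """Baseline: always predict the average training outcome."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least squares line through the training points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(xs, ys):
    """Predict the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def validation_error(predict, xs, ys):
    """Mean squared error against known validation outcomes."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_best(candidates, train, valid):
    """Fit every candidate and return the winner plus all of the scores."""
    results = {}
    for name, fit in candidates.items():
        predict = fit(*train)
        results[name] = validation_error(predict, *valid)
    best_name = min(results, key=results.get)
    return best_name, results

candidates = {
    "mean": fit_mean,
    "linear": fit_linear,
    "nearest_neighbor": fit_nearest_neighbor,
}
train = ([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
valid = ([6.0, 7.0], [12.0, 13.9])

best_name, results = select_best(candidates, train, valid)
```

On this roughly linear toy data the linear candidate wins. The point is only that the comparison itself is mechanical, which is why it lends itself to automation.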
There are also types of machine learning beyond supervised learning. Unsupervised models detect patterns in data that isn’t labeled and for which the result isn’t known. A third variation, reinforcement learning, is when machine learning systems have a defined goal and each move toward it yields a form of reward. It has been very useful in playing games, but also requires a lot of data.
No supreme being saw fit to establish clear definitions of anything in this world, and AI is among the worst-defined terms in our lexicon. I personally include machine learning in the AI category even though the most basic forms of it are really just analytics. It seems too confusing to have some versions of machine learning in AI, and some not. And I would certainly put automated machine learning, neural networks, and deep learning in the AI domain.
But what does all this mean for practitioners? I will leave it to you to make your own definitional call. But if you are an analytics person, I would certainly encourage you to pursue the basic forms of machine learning. In fact, you’ve probably already done that. But if your analytical expertise tops out at logistic regression—as mine does—you may want to be careful about jumping into unsupervised learning, deep learning, and other exotic forms of machine learning. Whatever you call them, they’re not an easy set of concepts to grasp—even though they have statistics and analytics at their core.