A Revolution in Analytical Technology
It’s been 10 years since Jeanne Harris and I published our book, Competing on Analytics, and we’ve just finished updating it for early-fall (2017) re-publication. We realized during this process that there have been a lot of changes in the world of analytics, although some things have remained the same. The timeless issues of analytical leadership, change management, and culture haven’t evolved much in 10 years, and in many cases those remain the toughest problems to address.
But analytical data, technology, and the people who use them have changed a lot. I must confess that I didn’t anticipate how difficult updating the book would be (so please buy it when it comes out to make the effort worthwhile!). I thought that perhaps a few global Replace commands in Microsoft Word would do the trick: changing terabytes to petabytes, for example, and quantitative analysts to data scientists. Although we did make a few such easy changes, many others required more than a simple word substitution.
I won’t go into all of them here, but below is an annotated list of some of the most impactful changes over the past decade to the world of analytical technology.
Interest in the term “Big Data” started around 2011, according to Google Trends. Of course, it began to take off earlier in Silicon Valley, with the rise of Internet behavior (clickstream) data. These new data sources led to a variety of new hardware offerings involving distributed computing. And the need to store and process this data in new ways led to a whole new raft of software, such as:
HADOOP AND THE OPEN-SOURCE REVOLUTION
Hadoop was necessary to store and do basic processing on Big Data, along with such scripting languages as Pig, Hive, and Python. Since then we’ve seen other open-source tools rise in popularity, such as Spark for streaming data and R for statistics. Acquiring and using open-source software is a pretty big change in and of itself, but each of these specific software offerings brought a set of new capabilities.
Data lakes are Hadoop-based repositories of data. These don’t perform analytics themselves, but they are a great way to store different types of data—big and small, structured and unstructured—until it needs to be analyzed.
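To make the "store now, structure later" idea concrete, here is a minimal sketch in Python. It is only an illustration: a temporary local directory stands in for a Hadoop-based lake, and the file names, the `land` and `read_orders` helpers, and the sample records are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land(lake: Path, name: str, payload: bytes) -> Path:
    """Write a payload into the lake exactly as received, with no schema applied."""
    target = lake / name
    target.write_bytes(payload)
    return target


def read_orders(lake: Path) -> list:
    """Apply structure only at analysis time: parse JSON-lines files, skip the rest."""
    rows = []
    for path in sorted(lake.glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                rows.append(json.loads(line))
    return rows


# A temporary directory stands in for the lake (an assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Structured and unstructured data sit side by side until someone needs them.
    land(lake, "orders.jsonl", b'{"id": 1, "total": 20.0}\n{"id": 2, "total": 5.5}\n')
    land(lake, "call_notes.txt", b"Customer asked about delivery times.")
    orders = read_orders(lake)
    total_revenue = sum(row["total"] for row in orders)
```

The point of the sketch is that nothing is parsed or validated when the data lands; the structure is imposed only by the code that eventually analyzes it.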
Many organizations want and need to integrate analytics with their production systems—for evaluating customers, suppliers, and partners in real time, and for making real-time offers to customers. This requires a good deal of work to integrate analytics into databases and legacy systems.
COMPONENTIZATION AND MICRO-SERVICES
Integration with production systems is much easier when the analytics are performed by small, component-based applications or APIs. Even proprietary vendors like SAS are moving in this direction.
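As a rough illustration of what a small, component-based analytic might look like, here is a sketch in Python of a scoring function with a JSON-in, JSON-out interface. The coefficients, feature names, and the `handle_request` wrapper are hypothetical; a real component would load a fitted model and sit behind whatever web framework or API gateway the organization already uses.

```python
import json
import math

# Hypothetical coefficients for illustration only; a real model would be
# fitted to data and loaded from wherever the organization stores its models.
COEFFICIENTS = {"recency_days": -0.02, "orders_last_year": 0.3}
INTERCEPT = -1.0


def score(features: dict) -> float:
    """Return a probability from a logistic model over the given features."""
    z = INTERCEPT + sum(
        weight * float(features.get(name, 0.0))
        for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body: str) -> str:
    """Take a JSON request body and return a JSON response body."""
    features = json.loads(body)
    return json.dumps({"score": round(score(features), 4)})
```

Because the component only exchanges JSON, a production system can call it without knowing anything about the model inside, which is what makes this style of integration easier.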
Internet of Things and other streaming data sources have made it increasingly desirable to analyze data as it streams into an organization. This often requires integration with some sort of event-processing technology.
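One simple way to picture analysis on data "as it streams" is a rolling statistic that is updated with every new reading rather than computed in a later batch job. The sketch below, in Python, is an assumption-laden illustration: the `rolling_mean` function and the idea of feeding it sensor readings are mine, not a description of any particular streaming product.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean(
    readings: Iterable[float], window: int
) -> Iterator[Tuple[float, float]]:
    """Yield (reading, mean of the last `window` readings) as each value arrives."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is about to be evicted
        recent.append(value)
        total += value
        yield value, total / len(recent)
```

An event-processing layer could consume these pairs and trigger an action whenever a reading departs sharply from its rolling mean, without ever storing the full stream first.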
Big changes in analytics have also brought a change in the hardware environment for computing them. The outcome is speed, often order-of-magnitude increases in the speed of doing analytical calculations on data. In many organizations, the idea of submitting your analytics job and getting it back hours later is a distant historical memory.
I’ve saved some of the most important technologies for last. A key assumption behind analytics in the past was that they were prepared for human decision-makers. But cognitive technologies take the next step and actually make the decision or take the recommended action. Cognitive technologies are actually a family of technologies, including machine and deep learning, natural language processing, robotic process automation, and more. Most of these technologies have some form of analytics at their core and, to me, they have more potential for changing how we do analytics than any other technology.
Just keeping up with these technologies and updating our analytics infrastructures to accommodate them is a full-time job. In addition, users need to be educated on which technologies make the most sense for their business problems.
And the old technologies haven’t gone away. Companies still use basic statistics packages, spreadsheets, data warehouses and marts, and all sorts of small data. In almost every organization, one can make a case that it’s a combination of the new and old analytical technologies that makes the most sense. In data storage, for example, the structured data that needs lots of security and access control can reside in warehouses, while the unstructured/prestructured data swims in a data lake. And if you want to understand your customers, you certainly need a combination of Big Data and small data—and the combination of methods and tools to analyze them.
This much change in 10 years surely constitutes a revolution. And the combination of new and old analytical technologies in most firms requires that we add more resources to manage them. That’s the downside of this revolution, but the upside is that we have a better understanding of our business environment than at any other time in history. Let’s hope we use it to make some great decisions, take informed actions, and introduce great new products and services based on data and analytics.