What's Up With In-Memory Analytics?

There’s been a lot of noise lately about in-memory analytics.  As a recent example, if you followed the press around some major conferences in late April, you’ll have noticed that much of the messaging centered on in-memory analytics products.  Many people have been asking how in-memory can fit into their mix and how it relates to other options like in-database analytics.  So, what’s up with in-memory analytics?

There are several different types of “in-memory” offerings in the market.  Let me clarify up front that what I will focus on here is a new analytic modeling approach in which analytics such as logistic regression models or neural networks are run in a massively parallel in-memory environment.  Such approaches are very relevant to organizations that need to build models at a new level of scale, and it is important to understand how such offerings might fit into your company’s mix.

First, does an in-memory analytics platform replace or augment traditional in-database approaches?  The answer is that the two are quite complementary.  In-database approaches focus heavily on the data preparation and scoring portions of the analytic process.  The value of in-database processing is the ability to handle terabytes or petabytes of data effectively.  Much of the processing may not be highly sophisticated, but it is critical.  Think of generating hundreds of customer metrics from detailed transaction history prior to building a model.  The same metrics are computed again later in order to apply the scoring algorithm generated from the model.
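To make the data preparation and scoring steps concrete, here is a minimal sketch in plain Python of the kind of work an in-database platform pushes down to the warehouse: rolling detailed transactions up into per-customer metrics, then reapplying those same metrics to score with a model.  The transaction records, metric names, and coefficient values are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical transaction detail; in practice this lives in the database
# and the aggregation runs there, not in application code.
transactions = [
    {"customer": "A", "amount": 120.0},
    {"customer": "A", "amount": 80.0},
    {"customer": "B", "amount": 40.0},
]

def build_metrics(txns):
    """Roll detailed transactions up into per-customer metrics."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in txns:
        totals[t["customer"]] += t["amount"]
        counts[t["customer"]] += 1
    return {
        c: {"total_spend": totals[c],
            "txn_count": counts[c],
            "avg_spend": totals[c] / counts[c]}
        for c in totals
    }

def score(metrics, weights):
    """Apply a (hypothetical) model's coefficients to the same metrics."""
    return {c: sum(weights[k] * v for k, v in m.items())
            for c, m in metrics.items()}

metrics = build_metrics(transactions)
weights = {"total_spend": 0.01, "txn_count": 0.5, "avg_spend": 0.02}
scores = score(metrics, weights)
```

The key point is that the identical metric definitions feed both model building and later scoring, which is why keeping that computation close to the data pays off twice.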

What in-database analytics has not addressed as broadly is the model building process.  In-memory analytics addresses this step in a compelling way.  Some modeling algorithms are very resource-intensive; logistic regression is one example.  Traditionally, analysts sampled the data and limited the pool of metrics in order to avoid resource constraints.  This is no longer necessary.

The new in-memory architectures use a massively parallel platform so that multiple terabytes of system memory can be utilized (conceptually) as one big pool of memory.  This means that samples can be much larger, or even eliminated.  The number of variables tested can be expanded immensely, and the number of iterations that can be run goes up dramatically.  Of course, this power comes at a cost, so it will only make sense to use it in cases where the benefits justify the costs.
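The modeling step itself can be sketched as follows: a logistic regression fit by batch gradient descent, where every iteration sweeps the entire dataset held in memory rather than a sample.  This is only an illustrative toy, with synthetic data and a hand-rolled fit; an in-memory platform would run the same kind of full-data iteration across terabytes of RAM on many nodes in parallel.

```python
import math
import random

# Synthetic data for illustration: label is 1 when x1 + x2 > 1.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x1 + x2 > 1 else 0 for x1, x2 in X]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lr=0.5, iters=2000):
    """Batch gradient descent: every iteration uses the ENTIRE dataset,
    which is exactly what a large in-memory pool makes feasible at scale."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(iters):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

w, b = fit(X, y)
accuracy = sum(
    (sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5) == bool(yi)
    for x, yi in zip(X, y)
) / len(X)
```

With sampling, the inner loop above would see only a fraction of the rows; with an in-memory pool large enough to hold everything, the full data and many more candidate variables can participate in every iteration.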

In-memory approaches fit best in situations where there is a need for:

  • High Volume & Speed:  It is necessary to run many, many models quickly

  • High Width & Depth: It is desirable to test hundreds or thousands of metrics across tens of millions of customers (or other entities)

  • High Complexity: It is critical to run processing-intensive algorithms on all this data and to allow for many iterations to occur

One area that has been an initial focal point for in-memory analytics is the risk management function within major financial institutions. These institutions have hundreds or thousands of models that address various aspects of risk.  As new information comes in, the models need to be updated.  With in-memory analytics, these organizations are now able to update their models much more frequently than in the past, even daily or hourly, which leads to less risk and fewer losses.

The current generation of in-memory technologies won’t be a fit for everyone.  However, the use of such tools leads to several major benefits when they do fit:

  • More accurate models, since many more tuning iterations can be completed in a timely fashion

  • More frequent model updates, since models can be refreshed regularly to keep scoring routines current rather than letting them go stale

  • A much higher volume of models in the same timeframe, which allows modeling to be applied more broadly to new problems

  • Models built in seconds or minutes even on large and complex data, which opens the door to near-real-time modeling

The combination of in-database analytics with in-memory analytics can be game changing in the right setting.  Be sure to consider how it may be a fit for your organization!