Five Challenges of Analyzing Internet of Things (IoT) Data
The analysis of Internet of Things (IoT) data is quickly becoming a mainstream activity. I’ve written about the Analytics of Things (AoT) before. In this post, I’m going to focus on a few unique challenges that you’ll most likely encounter as you move to take IoT data into the AoT realm.
CHALLENGE 1: THE DECEPTIVE SIMPLICITY OF IOT DATA
With many historical data sources, such as transactional data, it was often quite an effort to gather the source data required for analysis. It was necessary to identify what information was available, how it was formatted, and also to reconcile data from different sources that often contained similar information, but had inconsistencies in how it was provided. Ironically, this is one area where IoT sensor data can seem deceptively simple compared to many other sources.
Most sensors spit out data in a simple format. There is a timestamp, a measure identifier (temperature, pressure, etc.), and then a value. For example, at 4:59pm the temperature is 95 degrees. The good news is that this makes ingesting raw sensor data fairly straightforward in terms of the coding logic required. So, you can fairly quickly go from a raw feed to a dataset or table that’s ready for exploration. The catch is that after the raw data is ingested, there are some challenges you’ll face before you can analyze it, as we’ll explore next. Don’t let the simplicity of ingestion fool you.
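To give a sense of how little logic ingestion can take, here is a minimal Python sketch. The comma-separated layout (timestamp, measure, value), the sample feed, and the names Reading and parse_feed are all assumptions made for illustration; a real sensor feed will have its own format.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class Reading(NamedTuple):
    """One sensor reading: when, what was measured, and the value."""
    timestamp: datetime
    measure: str
    value: float


def parse_feed(text: str) -> list[Reading]:
    """Turn raw 'timestamp,measure,value' lines into typed records."""
    rows = csv.reader(io.StringIO(text))
    return [
        Reading(datetime.fromisoformat(ts), measure, float(value))
        for ts, measure, value in (row for row in rows if row)  # skip blank lines
    ]


# Hypothetical feed; the first line is "at 4:59pm the temperature is 95 degrees".
RAW_FEED = """2024-06-01T16:59:00,temperature,95.0
2024-06-01T16:59:00,pressure,30.1
2024-06-01T17:00:00,temperature,95.4
"""

readings = parse_feed(RAW_FEED)
```

At this point the data is in a table-like structure, but nothing has been done yet about cadence, patterns, interactions, or errors.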
CHALLENGE 2: DETERMINING THE PROPER FREQUENCY OF SENSOR READINGS
With data such as retail transactions, we typically care about every single record. With IoT data, it is necessary to determine the cadence that actually makes sense for your specific problem. For example, a temperature sensor may spit out a reading every millisecond. However, in most cases, receiving data at that cadence is overkill. That overkill has a price: the cost of storing the extra data, plus the cost and complexity of analyzing masses of data that aren’t valuable.
As a result, it is necessary to determine what cadence actually has value for the problem you’re tackling. If you’re monitoring a car engine, readings once per second might be more than enough. It could be that readings every 5 or 10 or 60 seconds would be plenty. The point is that you have to assess each metric and determine what you need through some experimentation. Then, filter the data down to the proper level. Otherwise, you’ll be overwhelmed with data and meaningful patterns will be that much harder to identify.
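One simple way to filter the data down is to group readings into fixed time buckets and keep one summary value per bucket. The sketch below averages within each bucket; the function name downsample, the millisecond timestamps, and the choice of the mean are assumptions for illustration. Depending on the problem, the maximum or the last value in each bucket may be a better summary.

```python
from collections import defaultdict
from statistics import mean


def downsample(readings, interval_ms):
    """Average (timestamp_ms, value) readings into fixed-width time buckets.

    Returns one (bucket_start_ms, mean_value) pair per non-empty bucket,
    ordered by time.
    """
    buckets = defaultdict(list)
    for t_ms, value in readings:
        bucket_start = (t_ms // interval_ms) * interval_ms
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Synthetic data: one reading every 100 ms for 3 seconds (30 readings).
raw = [(i * 100, 90.0 + (i % 10)) for i in range(30)]

# Reduce to one reading per second (3 readings).
per_second = downsample(raw, 1000)
```

Trying a few different values of interval_ms against the same raw feed is one way to run the experimentation described above.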
CHALLENGE 3: IDENTIFYING COMPLEX PATTERNS OVER TIME
At the heart of much IoT analytics is the need to identify complex patterns or trends that occur over time. Classic time series and forecasting models are oriented toward identifying a trend and then extending it forward. When analyzing IoT data, in contrast, we are often interested in deviations from normal rather than projecting the expected.
After identifying what is normal, we must find the abnormal patterns that matter. However, there are multiple ways that abnormal patterns might evolve. Sudden increases in temperature would naturally draw interest. But what about the impact of a very small rise in temperature that either persists for an extended period or that comes and goes with increasing frequency? There is much complexity in the identification of these time-based patterns.
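As a small illustration of one such pattern, the Python sketch below looks for a modest rise above normal that persists, while ignoring a brief blip. The function name sustained_rises and the baseline, tolerance, and min_run parameters are hypothetical, and in practice the baseline would itself have to be estimated from the data. Detecting a rise that comes and goes with increasing frequency would need additional logic.

```python
def sustained_rises(values, baseline, tolerance, min_run):
    """Find runs of at least `min_run` consecutive readings above
    baseline + tolerance. Returns (start_index, end_index) pairs, inclusive.
    """
    runs = []
    start = None  # index where the current above-threshold run began
    for i, v in enumerate(values):
        if v > baseline + tolerance:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Handle a run that is still in progress at the end of the data.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - 1))
    return runs


# Index 2 is a one-off blip; indexes 4 through 8 are a small, persistent rise.
temps = [95, 95, 97, 95, 96.5, 96.5, 96.8, 96.6, 96.7, 95, 95]
runs = sustained_rises(temps, baseline=95, tolerance=1.0, min_run=4)
# runs == [(4, 8)]
```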
CHALLENGE 4: FIGURING OUT HOW TO HANDLE INTERACTIONS
This challenge builds on the last two. Accounting for interactions between terms is fairly easy in a classic regression model. You simply add special terms that capture how factors enhance or suppress the impact of each other. With sensor data, this process is much more difficult. Let’s assume you’ve figured out the proper cadence for each metric you care about and which patterns are important for each metric individually. How do you account for any interactions?
The problem is that there can be lags between impacts. For example, temperature may start rising in advance of pressure rising. Identifying the interactions between various sensor readings requires complicated analysis to determine not just which metrics might interact, but also over what timeframe and with what lag. This makes the analysis difficult.
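One common starting point for the lag question is to correlate one series against shifted copies of another and see which shift lines up best. The sketch below does this with a plain Pearson correlation. The names pearson and best_lag, the synthetic data, and the assumption that both series share the same cadence are all illustrative. It only tests whether one metric leads another by a fixed number of readings, which is a small piece of the full interaction problem.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lag(leader, follower, max_lag):
    """Try lags 0..max_lag (in readings) and return (lag, correlation)
    for the lag at which `follower` tracks `leader` most closely."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]  # leader at time t
        ys = follower[lag:]               # follower at time t + lag
        scores[lag] = pearson(xs, ys)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic data: pressure repeats the temperature pattern 3 readings later.
temperature = [0, 0, 0, 1, 3, 6, 4, 2, 1, 0, 0, 0, 2, 5, 3, 1, 0, 0, 0, 0]
pressure = [10.0] * 3 + [10.0 + 2.0 * t for t in temperature[:-3]]

lag, score = best_lag(temperature, pressure, max_lag=5)
# lag comes out as 3 for this synthetic data
```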
CHALLENGE 5: ACCOUNTING FOR ERRORS AND MISSING READINGS
Sensors aren’t always reliable. Any analytics process must build in checks and balances to account for missing data or data that is in error. For example, if an engine sensor says temperature spiked from 300 to 1,000 degrees in one second, there is a good chance that the reading was an error. If the next reading is back to normal, it is easy to flag and correct the error. But what if a sensor gets moved into an improper position or breaks, and bad readings continue? What if a sensor fails to transmit at all for a period of time?
Your analytics processes must include logic to identify suspected errors or transmission gaps and to handle those scenarios. You don’t want a multitude of warning lights or messages alerting you to a problem if it is really just a data issue.
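Here is a minimal Python sketch of that kind of logic, using the engine temperature example. The function name check_readings and the max_jump and max_gap_s thresholds are hypothetical. It only covers the easy cases: an isolated implausible spike and a silent period. A sensor that stays broken or mispositioned would cause every later reading to be flagged, which is a signal in itself but needs separate handling.

```python
def check_readings(readings, max_jump, max_gap_s):
    """Split (time_s, value) readings into accepted readings, suspected
    errors, and transmission gaps.

    A reading is suspect if it differs from the last accepted value by more
    than `max_jump`. A gap is recorded when more than `max_gap_s` seconds
    pass between consecutive readings.
    """
    accepted, suspects, gaps = [], [], []
    prev_t = None
    for t, value in readings:
        if prev_t is not None and t - prev_t > max_gap_s:
            gaps.append((prev_t, t))
        prev_t = t
        if accepted and abs(value - accepted[-1][1]) > max_jump:
            suspects.append((t, value))
        else:
            accepted.append((t, value))
    return accepted, suspects, gaps


# A one-second spike to 1,000 degrees, then a 4-second silence.
engine = [(0, 300.0), (1, 302.0), (2, 1000.0), (3, 303.0), (7, 304.0)]
accepted, suspects, gaps = check_readings(engine, max_jump=50.0, max_gap_s=2)
# suspects == [(2, 1000.0)] and gaps == [(3, 7)]
```

Suspected errors and gaps can then be routed to a data-quality report instead of triggering operational alerts.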
The analysis of IoT data is on the rise, but not without its challenges. Think through the topics we’ve covered here to help ensure your success. You can find good examples of how to handle all of these issues, so you aren’t starting from scratch. But it is necessary to do some homework up front to be successful.