The challenge: Moving from batch retrospective to real-time predictive analytics
First, let’s get the lay of the land. Batch-processed analytics has a solid track record, with well-defined use cases and best practices, and most analytics today is still limited to retrospective answers. Many organizations are turning to forward-looking predictive analytics to broaden the actions they can take beyond the confines of historical decisions and to add the ability to ask what-if questions.
The Internet of Things (IoT), on the other hand, is still quite new, and methodologies for solving its unique problems, such as those around real-time processing and connectivity limitations, are still being hashed out. Now add the desire to move from retrospective to predictive and you have a world of new challenges. In the past few years, great strides have been made in both the technologies and the methods for processing data in real time. But the move to predictive often takes an even bigger leap, because each incremental gain in insight can require an exponential increase in data ingestion and processing capacity. As a result, predictive analytics still accounts for only a small fraction of a typical organization’s analytical capabilities.
IoT adds another twist: Network and processing bottlenecks
What makes IoT different when it comes to predictive analytics? In some applications, the majority of collected data loses its value within milliseconds. Historically, data collection has been the hard part of a predictive analytics system. However, that’s shifting, especially with advances in industrial IoT technologies. IoT makes it possible to scale the volume of data collection while reducing latency. As a result, collection is now becoming the easy part. The bottlenecks start at sanitization, modeling and integration, which in turn makes the downstream steps of analyzing the data and taking action more challenging.
When looking at optimizing an IoT implementation, it’s important to balance the roles and capabilities of “edge” vs. “cloud”. Edge refers to specialized infrastructure that can improve performance through physical proximity. It enables analytics and knowledge generation to occur closer to the source of the data. Edge gives you responsiveness, but not scale. Cloud, on the other hand, gives you scale but not responsiveness.
Making it work: Configuring the 3 components
There are three core components in any IoT implementation. First, collection, which includes sensing, network, storage and query capability. Second, learning, which analyzes the data and generates predictions. And third, taking action, typically through automated methods, on the analytics from the prior stages. In a traditional cloud-centric IoT architecture, the actual sensors sit outside the cloud, and the collect, learn and act components of the system often run into responsiveness challenges. In situations where data collection volumes are particularly large, the overhead of network communications has a significant impact on cost, sometimes accounting for up to 50% of the cost of the entire system.
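To make the three components concrete, here is a minimal Python sketch of a collect, learn and act loop. Everything in it is an assumption made for illustration: the class name CollectLearnAct, the naive trend extrapolation standing in for a real predictive model, and the "throttle" action and its limit are hypothetical and do not come from the talk or from any particular platform.

```python
from collections import deque
from statistics import mean


class CollectLearnAct:
    """Hypothetical three-stage loop: collect readings, learn a trend, act on the forecast."""

    def __init__(self, window=5, limit=80.0):
        # Collect: a bounded buffer stands in for sensing, storage and query.
        self.readings = deque(maxlen=window)
        self.limit = limit  # assumed threshold that triggers an action

    def collect(self, value):
        self.readings.append(value)

    def learn(self):
        # Learn: naive extrapolation (last value plus the average step),
        # a placeholder for a real predictive model.
        if len(self.readings) < 2:
            return None
        values = list(self.readings)
        steps = [b - a for a, b in zip(values, values[1:])]
        return values[-1] + mean(steps)

    def act(self, forecast):
        # Act: an automated decision based on the forecast, not on the last reading.
        if forecast is None:
            return "wait"
        return "throttle" if forecast > self.limit else "ok"


def run(stream, **kwargs):
    """Feed a sequence of readings through the loop and return the action taken at each step."""
    loop = CollectLearnAct(**kwargs)
    actions = []
    for value in stream:
        loop.collect(value)
        actions.append(loop.act(loop.learn()))
    return actions
```

In this sketch, run([70, 72, 74, 78, 79]) starts throttling at the fourth reading, when the forecast first crosses the limit even though no individual reading has. That is the practical difference between acting on a prediction and acting on a retrospective value.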
Moving any portion of these three components to the edge results in performance gains, because less data needs to be moved between them. More buffering and storage can occur at the edge as the cost of memory and disk continues to drop. (It should also be noted, however, that the storage and query functions of the system may become less prevalent as data is processed and acted on in real time.) Various aspects of data filtering, computation and predictive analytics can then be executed at the edge, so that only the data required for centralized processing needs to be moved. These gains need to be balanced against the added system complexity that comes from edge resources not always being connected to the network. So clearly there are many ways to implement and fine-tune predictive analytics for IoT. Doing this will only get easier as the field matures.
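As a rough illustration of filtering and buffering at the edge, the Python sketch below forwards a reading only when it differs from the last forwarded value by at least a set threshold (a simple deadband filter, chosen here as one possible technique, not one named in the talk). Forwarded readings wait in a bounded buffer until a connection is available. The EdgeNode class, its parameters and the in-memory "sent" list standing in for the cloud uplink are all hypothetical.

```python
from collections import deque


class EdgeNode:
    """Hypothetical edge node: filter readings locally, buffer them while offline."""

    def __init__(self, threshold=5.0, buffer_size=100):
        self.threshold = threshold  # minimum change worth reporting (assumed value)
        self.baseline = None        # last value queued for the cloud
        # Bounded buffer: when full, the oldest queued reading is dropped.
        self.outbox = deque(maxlen=buffer_size)
        self.sent = []              # stands in for the cloud uplink

    def ingest(self, value):
        # The first reading is always queued; later ones only if they moved enough.
        if self.baseline is None or abs(value - self.baseline) >= self.threshold:
            self.outbox.append(value)
            self.baseline = value

    def flush(self, connected):
        """Send queued readings if the network is up; return how many were sent."""
        if not connected:
            return 0
        count = len(self.outbox)
        while self.outbox:
            self.sent.append(self.outbox.popleft())
        return count
```

Feeding the readings 20, 21, 22, 30, 31, 24 through a node with a threshold of 5.0 queues only 20, 30 and 24, so half of the data never crosses the network. The trade-off mentioned above shows up in the buffer: if the node stays disconnected longer than the buffer can absorb, the oldest readings are lost, and handling that well is part of the added complexity.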
For context… The subject matter for this post came from a talk by Venu Vasudevan, Professor of Electrical & Computer Engineering at Rice University, where we discussed what makes IoT more challenging when it comes to predictive analytics specifically. This topic was presented at an IoT meetup for the Predix platform, a cloud-based IoT platform-as-a-service (PaaS) created by GE. It’s open source and built on Cloud Foundry’s stack. Predix is available on AWS and Azure cloud services.