By Kilvin Mitchell, technical writer at Wallaroo Labs
A growing trend among manufacturers is using streaming data to mitigate bottlenecks in supply chain management. Simply put, streaming data is data that is generated continuously, often by a large number of sources delivering it concurrently. Examples include log files created by web and mobile applications, e-commerce activity, data pulled from social media, and data transmitted from connected devices.
IoT streaming-data insights are especially useful in a manufacturing setting as they can support equipment/labor monitoring, operations/labor improvement, materials management, and supply chain optimization.
This kind of data needs to be processed sequentially and incrementally, record by record or over sliding time windows, for a range of analytics such as sampling, correlation, and filtering. Data pulled from these types of analyses gives manufacturers transparency into various facets of their supply chain. It also provides insight into customer activity, which enables them to respond more rapidly to emerging trends. For instance, by continuously analyzing social networks, manufacturers can track changes in how customers react to their brands and products and adjust production schedules as opportunities present themselves. It also allows budgets to be prioritized more quickly for high-runner products, while eliminating production and warehousing costs for products no longer in demand.
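As a rough illustration of analyzing data over a moving time interval, the sketch below keeps a running average over the most recent 60 seconds of readings. The class name, the window length, and the sample readings are all hypothetical, and the sketch assumes timestamps arrive in non-decreasing order, which a real stream may not guarantee.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the readings seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record one reading and return the current windowed average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict readings that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical sensor readings: (seconds since start, temperature)
readings = [(0, 70.0), (20, 72.0), (40, 74.0), (70, 80.0)]
window = SlidingWindowAverage(window_seconds=60)
averages = [window.add(t, v) for t, v in readings]
```

When the reading at 70 seconds arrives, the reading from 0 seconds is evicted, so the last average covers only the three most recent values. Filtering or sampling could be applied at the same point where each reading is added.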
Challenges using streaming data with machine learning models
Real-time data has unique traits that separate it from other types of data used in ML models. Typically, the data used with ML models is what’s called “batch data.” With batch data, all of the information is available from the beginning and is already saved. By contrast, streaming data is generated sequentially and may be transient by nature. As a result, data streams usually carry these traits:
● Your system does not control the order in which you receive the data
● Data streams are unbounded in size
● After an element of the data stream is analyzed, it may be discarded
The unique traits of streaming data are also responsible for the obstacles it presents when ML models analyze it. Because batch data is stable, it can be searched in full, and specific parts of it can be retrieved at any time. Streaming data, being transient, must instead be processed continuously by the ML algorithm, which puts a greater strain on the system being used. For example, a data-stream-mining algorithm handling 2 billion e-commerce transactions per year would be processing about 3,800 transactions per minute on average. Therefore, batch processing (while slower) tends to be used in supply chain management, as it may not be feasible to deal with large volumes of data at these speeds without expanding computational infrastructure.
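The per-minute figure follows directly from the annual total. A quick check, assuming a 365-day year and an even spread of transactions (real traffic would have peaks well above this average):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

transactions_per_year = 2_000_000_000  # figure quoted in the text
per_minute = transactions_per_year / MINUTES_PER_YEAR
per_second = per_minute / 60

print(f"{per_minute:,.0f} per minute, {per_second:.1f} per second")
# prints "3,805 per minute, 63.4 per second"
```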
Also, because streaming data is unbounded in size, the volume streamed and the speed at which it arrives can push a system to its limits. Memory is affected in particular: it may not be practical to save elements of your streaming data, because the amount of data that will come through the stream is usually not predictable.
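One common way to work within a fixed memory budget when the stream length is unknown is reservoir sampling, which keeps a uniform random sample of at most k items no matter how many arrive. The sketch below is a minimal version of the classic "Algorithm R"; it is offered as an illustration of the idea, not as the method any particular platform uses, and the function and parameter names are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Memory use is O(k) regardless of how long the stream turns out to be.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of 100,000 transaction IDs, sampled down to 10.
sample = reservoir_sample(iter(range(100_000)), k=10, rng=random.Random(0))
```

Each item is inspected once and then either kept in the reservoir or dropped, so nothing beyond the k sampled items needs to be stored.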
Another constraint is the processor’s power relative to the speed at which data elements arrive. Data-stream-mining algorithms therefore often need to be fast and computationally lightweight.
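To give a sense of what a computationally minimal structure can look like, the sketch below maintains a running mean and variance using Welford's online algorithm. It stores three numbers and does constant work per element. The class name and the sample values are hypothetical, and this is only one example of a lightweight streaming computation.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new value into the statistics in O(1) time."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of measurements, consumed one at a time.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Because each value is folded into the totals as it arrives, the raw values never need to be retained.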
Potential solutions for integrating streaming data into ML models
For manufacturers that don't employ teams of developers dedicated to mitigating the above issues, designing a supply chain process that can properly leverage an ML model is a near-impossible task. Coordinating the number of services that must work in concert to harness the power of streaming data resembles the job of an air-traffic controller, where it is all too easy for a delay to take place. Add to that the fact that the technology available today for using streaming data is user-friendly only for companies with a team of developers dedicated to this problem. For those without these resources, it is a constant issue that needs attention.
Manufacturers should consider ML-model-compute engines that are designed specifically for real-time, high-volume computational tasks such as data-stream mining. An event-by-event runtime engine for streaming analysis is well suited to the constant flow of supply chain data. An efficient, low-strain platform of this kind also allows for reduced computing costs regardless of the size of your data stream, solving an issue that has plagued the last mile of the ML process. Such a platform is designed to integrate into an existing data ecosystem. With a tool like this, data scientists can develop, monitor, and optimize models for streaming data with relative ease.
With one of the newer platforms on the market, which reduce time-to-deployment and speed up inference, ML models can deliver faster analysis. These deployments also use far less infrastructure than deployments on older platforms. The types of streaming-model deployments that would have required GPUs to provide the desired performance can now run successfully on standard CPUs, which has resulted in significant savings for manufacturers.
Together, these enhancements make it much more practical for manufacturers to optimize their supply chain with streaming data.