Hear that? It's IBM’s Acoustic Insights Program

As head of IBM’s Acoustic Insights Program, Nancy Greco has the ability to diagnose mechanical issues based purely on acoustic data. (And her unique skill extends beyond the factory—she recently lent her talents to the U.S. Open, where she analyzed ball-strike sounds.)


Sounds cool, right? We wanted to learn more. Take a look…

Smart Industry: What is IBM's Acoustic Insights Program?

Nancy: IBM Research is developing the ability to use machine learning to identify sounds, and AI to advise corrective actions. Have you ever been in a car that makes a weird sound, but no error code appears? This problem, in which a malfunction or defect is not being detected by any other sensor, happens in factories, buildings and machines. Sound has so much hidden data. It surrounds us, but we have not been able to harvest it until the advent of machine learning. However, machine learning models must be user-friendly so that anyone, from a pedestrian to a mechanical operator, can build and infuse their expertise into the model.

Here’s another example: building chillers, like air conditioning in hospitals, tend to malfunction at the worst possible time and create massive issues. Oftentimes, we were able to find a technician who could diagnose the chiller by the sound it was making. We can now leverage the technician's expertise by having them record and label sound clips. IBM then creates a model that learns to recognize this exact sound. This can even be accomplished on an edge device, such as a phone, that has the computational power and memory to build and deploy a model. Once a sound is identified, the AI activates and identifies the corrective action. IBM Research is working with IBM's Global Business Services to offer acoustics as differentiating analytics in many industry use cases spanning manufacturing, buildings and even animal wellness.

Smart Industry: How is this field changing in this era of digital transformation?

Nancy: Sound was previously not being harvested much at all. Now it has the ability to fill in critical pieces of data. We combine it with other information, like parametric data and visual data, in order to quickly reach a diagnosis and ensure that the corrective action can immediately be identified and launched. Sound can also be used for preventive measures in order to proactively detect the onset of a problem. For example, a light rattle can signal a bracket that is going to fall off in two weeks, or a ball bearing that’s going to fail in a week. Digital transformation entails getting the data into digestible forms to drive a degree of automation that saves valuable time, ensures quality, and shares expertise, training people to a level that previously took years to reach.

Imagine your phone being able to listen to your engine and diagnose its issues immediately, based on dozens of experts who previously infused their knowledge into the model. The same device can now verbally (or through augmented reality) give you step-by-step instructions to complete the repair. And if you learn a better approach, you can add your knowledge to the model. Sensory-based analytics like sight and sound are changing the way we work.

Smart Industry: Provide an example of fixing a mechanical issue purely with acoustic data.

Nancy: Imagine that you just bought a very expensive car, but the car is emitting a very weird sound and no sensor is detecting it. An acoustic model would immediately identify it as the wheel trim coming loose or a ball joint going bad.

The good news is that robotic welding machines have increased the production of car frames by ten to a thousand times. The bad news is that a defective welder can go undetected because parametric data is not picking it up. So a lot of product can be exposed before a visual inspection detects it. IBM has proven that we can detect a welder malfunction in seconds and stop the welder for corrective actions.

Smart Industry: What are machines communicating when they are "talking" to us?

Nancy: Machines are telling you one of three things.

All is okay, and everything is running well.
They’re not running in an optimal condition. They’re not bad enough to shut down, but they have the onset of a problem and are going to fail in 3-4 days.
They need attention now and are malfunctioning in a certain manner.

Very shortly, your machine in some way will say, "I am going into failure mode. Pick up these parts and tools. I will finish my current task in three hours, then let’s do the repair." When the technician arrives, the machine will instruct through AR and audible prompts. When the repair is done and the machine is fixed and turned on, it will validate the repair and indicate, "I'm normal now."

Smart Industry: What are the most common problems detected audibly?

Nancy: Unusual rattles, air leaks, falling products or parts, a part coming loose, or a broken fan or gear on a motor.

Smart Industry: How are you using pattern-recognition, machine-learning and artificial-intelligence technology to detect anomalies related to equipment failure?

Nancy: We create a model to define what “normal” is, and attributes of that model can then detect an outlier: a piece of data that is not in the normal distribution of the other data. It can stand out due to several data characteristics, and it can be very nuanced and undetectable by a human. This is where outlier detection techniques come into play. Manufacturing environments are often very noisy. If something falls from an assembly line or from a machine, the event can last less than a second. And while a human may not have detected it, our outlier detection will.
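To make the idea of modeling "normal" and flagging outliers concrete, here is a minimal sketch. It is not IBM's method, which the interview does not describe. It assumes a single feature (the RMS energy of short frames), a simple mean-and-standard-deviation model of normal operation, and a z-score threshold. The sample rate, frame length, threshold and synthetic signals are all illustrative assumptions.

```python
import math
import statistics

SAMPLE_RATE = 8000  # Hz (assumed for this sketch)
FRAME_LEN = 400     # samples per frame, i.e. 50 ms (assumed)


def make_hum(n_samples):
    """Synthetic 'normal' machine sound: a 120 Hz hum whose loudness drifts by 2%."""
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        amp = 0.5 * (1.0 + 0.02 * math.sin(2 * math.pi * 0.5 * t))
        out.append(amp * math.sin(2 * math.pi * 120 * t))
    return out


def frame_rms(samples):
    """RMS energy of each non-overlapping frame."""
    starts = range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)
    return [
        math.sqrt(sum(x * x for x in samples[s:s + FRAME_LEN]) / FRAME_LEN)
        for s in starts
    ]


def fit_normal_model(normal_samples):
    """Describe 'normal' as the mean and standard deviation of frame energy."""
    rms = frame_rms(normal_samples)
    return statistics.mean(rms), statistics.pstdev(rms)


def find_outlier_frames(samples, model, z_threshold=4.0):
    """Return indices of frames whose energy is far outside the normal distribution."""
    mean, std = model
    std = max(std, 1e-12)  # guard against a perfectly constant training signal
    return [
        i for i, r in enumerate(frame_rms(samples))
        if abs(r - mean) / std > z_threshold
    ]


# Learn "normal" from four seconds of healthy sound.
model = fit_normal_model(make_hum(4 * SAMPLE_RATE))

# Two seconds of the same hum, plus a 25 ms burst (a part dropping, say).
clip = make_hum(2 * SAMPLE_RATE)
for n in range(4100, 4300):
    clip[n] += 0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)

flagged = find_outlier_frames(clip, model)
print(flagged)
```

In this synthetic clip the burst sits entirely inside frame 10, so that is the only frame whose energy leaves the normal range. A real factory recording would need richer features and a more robust model; the point here is only the shape of the approach: learn the distribution of normal data, then flag what falls outside it.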

Smart Industry: What is the process of fingerprinting these sounds?

Nancy: Step one: we need to precisely record the sound of interest and label it. The labeling process involves identifying portions of the sound wave. Let's say that you have a one-minute sound clip. Every five seconds, a tap of 500 milliseconds occurs in which a robotic arm hits an object rather than picks it up due to misalignment. This is the onset of a bigger issue and could result in a damaged product. The subject-matter expert will label only the actual tap, not the entire one-minute clip, as robotic-arm misalignment. Otherwise, the model could get confused. We built the capability to do this labeling fairly easily. The next step, which we have automated, involves feature extraction, which determines the characteristics of this sound wave on which we should train. This could be viewed as the fingerprint technique that’s defined by what we are looking at. The final step is to build a specific model trained to recognize these sounds.
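The labeling and feature-extraction steps can be sketched in a few lines. This is an illustration under stated assumptions, not IBM's tooling: the `LabeledSpan` type, the helper names, the low sample rate and the two features (RMS energy and zero-crossing rate) are all hypothetical stand-ins, and the clip is synthetic. It mirrors the example above: a one-minute clip with a 500-millisecond tap every five seconds, where only the taps are labeled.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 1000  # Hz; kept low so the synthetic clip stays small (assumed)


@dataclass
class LabeledSpan:
    """A portion of the clip that the expert has labeled (hypothetical type)."""
    start_ms: int
    end_ms: int
    label: str


def cut_span(samples, span):
    """Return only the labeled portion of the clip, not the whole recording."""
    start = span.start_ms * SAMPLE_RATE // 1000
    end = span.end_ms * SAMPLE_RATE // 1000
    return samples[start:end]


def extract_features(segment):
    """Two simple stand-in features: loudness and how often the wave changes sign."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(
        1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0)
    )
    return {"rms": rms, "zero_crossing_rate": crossings / len(segment)}


# Synthetic one-minute clip: a quiet 50 Hz background hum.
clip = [
    0.1 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
    for n in range(60 * SAMPLE_RATE)
]

# Every five seconds, add a 500 ms tap and record the expert's label for it.
spans = []
for start_s in range(0, 60, 5):
    spans.append(
        LabeledSpan(start_s * 1000, start_s * 1000 + 500, "robotic-arm misalignment")
    )
    first = start_s * SAMPLE_RATE
    for n in range(first, first + SAMPLE_RATE // 2):
        clip[n] += 0.8 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)

# Training rows are built from the labeled taps only.
training_rows = [(extract_features(cut_span(clip, s)), s.label) for s in spans]

# For comparison: features of an unlabeled stretch between two taps.
background = extract_features(clip[1000:4000])
```

Each row in `training_rows` pairs a small feature vector with the expert's label, which is the input a classifier would be trained on in the final step. Cutting out only the tap keeps the background hum from being attached to the "robotic-arm misalignment" label, which is the confusion the answer above warns about.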