Manufacturing · Machine Learning

Reducing unplanned downtime by 67% across 3 production lines through ML-based predictive maintenance

A mid-size US precision parts manufacturer operating 3 high-throughput CNC production lines was experiencing 340+ hours of unplanned downtime annually — each hour costing $18,000 in lost output, scrapped material, and emergency maintenance labour. Maintenance was entirely reactive: machines failed, production stopped, technicians responded. We built an ML-based predictive maintenance system on top of existing sensor infrastructure that predicts failures 6–18 hours in advance, reduced unplanned downtime by 67%, and delivered $4.1M in measurable first-year savings.

Business Context

The machines were telling them something was wrong. Nobody was listening.

The manufacturer ran three CNC production lines — 47 machines in total — producing precision aerospace and automotive components under tight tolerance specifications. Each machine had between 8 and 24 sensors already installed: vibration, temperature, spindle load, coolant pressure, and acoustic emission. The data was being collected by the SCADA system and stored — but never analysed. Maintenance was scheduled on fixed calendar intervals regardless of actual machine condition, and failures between scheduled maintenance windows were handled reactively. The maintenance team was experienced and capable. They simply had no early warning system.

The cost of reactive maintenance

340 hrs
unplanned downtime per year across 3 lines

Average across 2022–2023; individual incidents ranged from 2 hours to 3 days

$18K
cost per hour of unplanned downtime

Lost output, scrapped in-process parts, emergency labour, and expedited parts procurement

23%
of maintenance budget spent on emergency repairs

vs. industry benchmark of 8–10% for facilities with condition-based maintenance

The failure modes were well understood by the maintenance team — spindle bearing degradation, coolant pump cavitation, and tool holder runout were responsible for 74% of unplanned stops. The team could often tell a machine was "running rough" hours before failure, but had no systematic way to act on that intuition across 47 machines simultaneously. One experienced technician could monitor a handful of machines closely. Nobody could monitor all 47.

The sensor data was the asset. Two years of vibration, temperature, and load data sat in the SCADA historian — including the signatures of every failure event that had occurred. The problem was not a lack of data. It was the absence of a system that could read that data in real time and translate it into actionable maintenance alerts before the failure occurred.

Scope of Work

What we were asked to build

01

Sensor data pipeline and feature extraction

Real-time ingestion pipeline from the SCADA historian — pulling vibration, temperature, spindle load, coolant pressure, and acoustic emission data at 1-second intervals per machine. Feature extraction computing 60+ time-domain and frequency-domain features per sensor per machine: RMS, kurtosis, spectral entropy, bearing fault frequencies, and trend derivatives.
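A minimal sketch of what this feature extraction step could look like, assuming a 1 Hz sample rate and a 15-minute rolling window; the function name and the specific feature set shown (RMS, kurtosis, spectral entropy, linear trend) are illustrative, not the actual production pipeline:

```python
import numpy as np

def extract_features(window: np.ndarray, fs: float = 1.0) -> dict:
    """Compute a handful of time- and frequency-domain features
    for one sensor's rolling window.

    window: 1-D array of samples (e.g. 15 min at 1 Hz = 900 points)
    fs: sampling frequency in Hz
    """
    x = window - window.mean()                     # remove DC offset
    rms = np.sqrt(np.mean(x ** 2))                 # overall vibration energy
    kurt = np.mean(x ** 4) / np.mean(x ** 2) ** 2  # peakedness; impact spikes raise it
    # Power spectral density via FFT, normalised to a probability distribution
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    spectral_entropy = -np.sum(p * np.log2(p)) / np.log2(len(psd))
    # Linear slope over the window: a rising temperature or load trend
    trend = np.polyfit(np.arange(len(window)) / fs, window, deg=1)[0]
    return {"rms": rms, "kurtosis": kurt,
            "spectral_entropy": spectral_entropy, "trend": trend}
```

In production the same computation would run per sensor per machine on each window, with bearing-fault-frequency features added from the known bearing geometry.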

02

Failure prediction models per failure mode

Separate ML models trained per failure mode per machine class — spindle bearing degradation, coolant system faults, and tool holder anomalies. Models trained on 2 years of historical sensor data with failure event labels provided by the maintenance team. Outputs a health score per machine updated every 15 minutes with a predicted time-to-failure range.
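One way to sketch the per-failure-mode prediction step: a binary classifier trained on "failure within the next N hours" labels, with its failure probability mapped to a 0–100 health score. The classifier choice, feature names, and synthetic data below are assumptions for illustration, not the client's actual models:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in training data: rows are feature vectors per window
# (e.g. [rms, kurtosis, spectral_entropy, trend]); labels mark windows
# that preceded a recorded failure within the prediction horizon.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 3] > 1.5).astype(int)   # toy "failure-within-horizon" rule

model = GradientBoostingClassifier(random_state=0).fit(X, y)

def health_score(features: np.ndarray) -> float:
    """Map the model's failure probability to a health score
    (100 = healthy, 0 = failure imminent)."""
    p_fail = model.predict_proba(features.reshape(1, -1))[0, 1]
    return round(100 * (1 - p_fail), 1)
```

In the deployed system one such model exists per failure mode per machine class, re-scored every 15 minutes; the time-to-failure range would come from the degradation rate of the score, not from the classifier alone.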

03

Maintenance alert and work order integration

Alert engine generating prioritised maintenance work orders when health scores cross configurable thresholds. Alerts routed to the CMMS (computerised maintenance management system) automatically — creating a work order with the predicted failure mode, recommended action, and estimated urgency window. No new tooling for the maintenance team to learn.
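The threshold-to-work-order logic can be sketched as below. The field names, threshold values, and urgency rule are illustrative placeholders, not the client's CMMS schema:

```python
from dataclasses import dataclass, asdict

# Health-score cut-offs (assumed values; in production these are
# configurable per failure mode, as described above).
THRESHOLDS = {"warning": 60.0, "urgent": 40.0}

@dataclass
class WorkOrder:
    machine_id: str
    failure_mode: str
    recommended_action: str
    urgency_window_hours: int

def maybe_create_work_order(machine_id, failure_mode, health, ttf_hours):
    """Return a work-order dict for the CMMS connector when the health
    score crosses a threshold; otherwise return None (no alert)."""
    if health > THRESHOLDS["warning"]:
        return None
    # Urgent alerts are capped at the 6-hour minimum scheduling window
    urgency = min(ttf_hours, 6) if health <= THRESHOLDS["urgent"] else ttf_hours
    return asdict(WorkOrder(machine_id, failure_mode,
                            f"Inspect/replace: {failure_mode}", urgency))
```

The returned dict is what a custom connector would translate into the CMMS's required work-order format, so the maintenance team keeps working in their existing tool.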

04

Production floor dashboard

Real-time health status dashboard for all 47 machines — colour-coded by health score, showing trend direction, active alerts, and predicted maintenance windows. Accessible on floor terminals and mobile devices. Maintenance manager can drill into any machine to see the sensor signals driving the health score.

Constraints we worked within

  • SCADA system and sensor infrastructure could not be modified — data pipeline had to read from the historian without impacting production systems
  • CMMS integration required work orders in a specific format — custom connector built to match existing workflow
  • Some machines had incomplete failure history — cold-start handling required for 11 machines with fewer than 3 recorded failure events
  • Model alerts had to be actionable within the maintenance team's shift structure — 6-hour minimum advance warning required to schedule planned intervention

Explicitly not in scope

  • New sensor installation or hardware procurement
  • Quality control or defect detection on finished parts
  • Supply chain or spare parts inventory optimisation
  • ERP integration or production scheduling changes

System Architecture

Existing sensors. New intelligence layer. Failures predicted hours before they happen.

[Architecture diagram: primary sensor-to-prediction pipeline; monitoring and output layer]

How We Worked

7 months. Maintenance team as domain experts throughout. Zero production disruption.

Month 1–2

Data Audit & Failure Mode Mapping

Extracted and audited 2 years of SCADA historian data across all 47 machines. Worked with the maintenance team to label every failure event in the historical record — 127 distinct failure events across the 3 primary failure modes. Identified 11 machines with insufficient failure history for supervised training — flagged for anomaly detection approach rather than supervised classification.
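For the 11 cold-start machines, the anomaly-detection path can be sketched as an unsupervised model fitted only on that machine's healthy-period features. The use of Isolation Forest and the synthetic data here are illustrative assumptions; the source specifies only that an anomaly-detection approach replaced supervised classification:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Stand-in for a cold-start machine's healthy-period feature history
# (rows = rolling-window feature vectors, no failure labels available).
rng = np.random.default_rng(1)
healthy_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))

detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy_features)

def is_anomalous(features: np.ndarray) -> bool:
    """True if the current feature vector looks unlike healthy operation."""
    return bool(detector.predict(features.reshape(1, -1))[0] == -1)
```

An anomaly flag cannot name the failure mode the way the supervised models can, but it still gives the maintenance team an early "look at this machine" signal while labelled failure data accumulates.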

Month 3–4

Feature Engineering & Model Development

Built the feature extraction pipeline — 60+ features per sensor per machine computed on a rolling 15-minute window. Trained failure prediction models per failure mode. Spindle bearing model achieved 89% precision and 84% recall on held-out test data. Coolant fault model achieved 91% precision. Tool holder model was harder — 78% precision due to thinner failure history — flagged to client with recommendation to collect more labelled data over the next 6 months.

Month 5

Alert Engine & CMMS Integration

Alert thresholds configured with maintenance manager — calibrated to generate 3–5 actionable alerts per day across all 47 machines, avoiding alert fatigue. CMMS connector built and tested. Dashboard deployed to floor terminals. Maintenance team ran a 3-week shadow period — alerts generated but not acted on, team compared predictions against their own assessments.
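The alert-volume calibration described above can be sketched as a search over candidate thresholds against historical health scores, picking the cut-off whose back-tested alert rate is closest to the target. The data shape and candidate range are assumptions for illustration:

```python
import numpy as np

# Synthetic stand-in: 30 days of 15-minute health scores for 47 machines
# (47 machines x 96 windows/day = 4512 scores per day).
rng = np.random.default_rng(7)
scores = rng.normal(loc=80, scale=10, size=(30, 47 * 96))

def calibrate_threshold(scores: np.ndarray, target_alerts_per_day: float) -> float:
    """Return the health-score cut-off whose historical alert rate
    is closest to the target daily alert volume."""
    candidates = np.arange(30.0, 70.0, 0.5)
    daily_alerts = np.array([(scores < t).sum(axis=1).mean() for t in candidates])
    return float(candidates[np.argmin(np.abs(daily_alerts - target_alerts_per_day))])
```

In practice the back-test would also de-duplicate consecutive sub-threshold windows on the same machine into a single alert; this sketch counts each window to keep the idea visible.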

Month 6–7

Live Operation & Model Refinement

System went live. First predicted failure caught: spindle bearing on Line 2, Machine 14 — alert fired 11 hours before the bearing would have failed based on degradation rate. Planned replacement completed in a scheduled 2-hour window. Equivalent reactive failure would have caused an estimated 14-hour unplanned stop. Model performance monitored weekly; 2 refinement cycles completed in month 7 based on new failure event data.

Working rhythm

  • Cadence: Two-week sprints, weekly maintenance team reviews
  • Decision owner: VP of Operations and Maintenance Manager
  • Primary metric: Unplanned downtime hours vs. prior year baseline
  • Escalation SLA: 24 hours with written recommendation

Results

Measured at 6 months post go-live.

67%

reduction in unplanned downtime hours

Was: 340 hours of unplanned downtime per year across 3 lines

Annualised from 6-month post-go-live data. The system predicted 34 of the 41 failure events that occurred in the measurement period — 83% catch rate. The 7 missed predictions were all on the tool holder model, which had the thinnest training data. Additional failure event labelling is ongoing to improve this model.

$4.1M

in measurable first-year savings

Was: $18,000/hour × 340 hours = $6.1M annual downtime cost

Savings calculated as avoided downtime cost ($3.7M) plus reduction in emergency maintenance spend ($0.4M). Does not include quality improvements from catching degraded machines before they produce out-of-tolerance parts — estimated at an additional $0.3M in scrap reduction.

6–31 hrs

advance warning range before predicted failure

Was: zero advance warning — failures discovered when production stopped

Advance warning range across all caught predictions: 6 hours (minimum, tool holder faults) to 31 hours (spindle bearing degradation). The 6-hour minimum was sufficient for the maintenance team to schedule planned interventions within shift structure in all but 2 cases.

83%

failure prediction catch rate across all 3 failure modes

Was: 0% — no predictive capability, all failures discovered reactively

Spindle bearing: 94% catch rate. Coolant faults: 88% catch rate. Tool holder: 67% catch rate (improving as more labelled failure data accumulates). False positive rate: 1.2 false alerts per week across all 47 machines — maintenance team reports this as acceptable given the cost of a missed failure.


What This Means for You

The sensor data already exists in most manufacturing facilities. The gap is not hardware — it is the absence of a system that reads that data continuously and translates it into maintenance decisions before failures occur.

This engagement was built entirely on top of existing sensor infrastructure — no new hardware, no SCADA modifications, no production disruption during implementation. The maintenance team's domain knowledge was the most valuable input to the model: their failure event labels and their assessment of alert thresholds shaped the system from day one.

Tell us what you're building.

"They don't force us to go their way; instead, they follow our way of thinking."

★★★★★ Marek Strzelczyk, Head of New Products & IT, GS1 Polska

What happens next

  • We respond to every inquiry within 1 business day.
  • A 30-minute discovery call — no templates, no sales scripts.
  • An honest assessment of fit. We'll tell you early if we're not the right partner.