Manufacturing · Machine Learning
Reducing unplanned downtime by 67% across 3 production lines through ML-based predictive maintenance
A mid-size US precision parts manufacturer operating 3 high-throughput CNC production lines was experiencing 340+ hours of unplanned downtime annually — each hour costing $18,000 in lost output, scrapped material, and emergency maintenance labour. Maintenance was entirely reactive: machines failed, production stopped, technicians responded. We built an ML-based predictive maintenance system on top of existing sensor infrastructure that predicts failures 6–18 hours in advance, reducing unplanned downtime by 67% and delivering $4.1M in measurable first-year savings.
Business Context
The machines were telling them something was wrong.
Nobody was listening.
The manufacturer ran three CNC production lines — 47 machines in total — producing precision aerospace and automotive components under tight tolerance specifications. Each machine had between 8 and 24 sensors already installed: vibration, temperature, spindle load, coolant pressure, and acoustic emission. The data was being collected by the SCADA system and stored — but never analysed. Maintenance was scheduled on fixed calendar intervals regardless of actual machine condition, and failures between scheduled maintenance windows were handled reactively. The maintenance team was experienced and capable. They simply had no early warning system.
The cost of reactive maintenance
- 340 hrs of unplanned downtime per year across 3 lines (average across 2022–2023; individual incidents ranged from 2 hours to 3 days)
- $18K cost per hour of unplanned downtime (lost output, scrapped in-process parts, emergency labour, and expedited parts procurement)
- 23% of maintenance budget spent on emergency repairs (vs. industry benchmark of 8–10% for facilities with condition-based maintenance)
The failure modes were well understood by the maintenance team — spindle bearing degradation, coolant pump cavitation, and tool holder runout were responsible for 74% of unplanned stops. The team could often tell a machine was "running rough" hours before failure, but had no systematic way to act on that intuition across 47 machines simultaneously. One experienced technician could monitor a handful of machines closely. Nobody could monitor all 47.
The sensor data was the asset. Two years of vibration, temperature, and load data sat in the SCADA historian — including the signatures of every failure event that had occurred. The problem was not a lack of data. It was the absence of a system that could read that data in real time and translate it into actionable maintenance alerts before the failure occurred.
Scope of Work
What we were asked to build
Sensor data pipeline and feature extraction
Real-time ingestion pipeline from the SCADA historian — pulling vibration, temperature, spindle load, coolant pressure, and acoustic emission data at 1-second intervals per machine. Feature extraction computing 60+ time-domain and frequency-domain features per sensor per machine: RMS, kurtosis, spectral entropy, bearing fault frequencies, and trend derivatives.
Failure prediction models per failure mode
Separate ML models trained per failure mode per machine class — spindle bearing degradation, coolant system faults, and tool holder anomalies. Models trained on 2 years of historical sensor data with failure event labels provided by the maintenance team. Outputs a health score per machine updated every 15 minutes with a predicted time-to-failure range.
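The health-score idea can be sketched as a binary classifier whose failure probability is mapped onto a 0–100 scale. The synthetic data, the choice of gradient boosting, the 8-feature layout, and the score mapping below are all our assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in training set: each row is one 15-minute feature window, labelled 1
# when the window fell inside the lead-up to a recorded failure event.
X_healthy = rng.normal(0.0, 1.0, size=(400, 8))
X_prefail = rng.normal(2.0, 1.0, size=(100, 8))   # features drift before failure
X = np.vstack([X_healthy, X_prefail])
y = np.array([0] * 400 + [1] * 100)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

def health_score(window_features: np.ndarray) -> float:
    """Map the model's failure probability to a 0-100 score (100 = healthy)."""
    p_fail = model.predict_proba(window_features.reshape(1, -1))[0, 1]
    return round(100.0 * (1.0 - p_fail), 1)
```

One such model per failure mode per machine class keeps each classifier's job narrow; the per-machine score shown on the floor is then the minimum across its failure-mode models.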
Maintenance alert and work order integration
Alert engine generating prioritised maintenance work orders when health scores cross configurable thresholds. Alerts routed to the CMMS (computerised maintenance management system) automatically — creating a work order with the predicted failure mode, recommended action, and estimated urgency window. No new tooling for the maintenance team to learn.
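The alert-to-work-order handoff looks roughly like the following. The threshold value, the recommended-action table, and the work-order fields are placeholder assumptions standing in for the client's CMMS schema:

```python
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD = 60.0   # illustrative: health scores below this trigger a work order

@dataclass
class WorkOrder:
    machine_id: str
    failure_mode: str
    recommended_action: str
    urgency_hours: int   # window in which intervention should be scheduled

def maybe_create_work_order(machine_id: str, failure_mode: str,
                            health_score: float, ttf_hours: int) -> Optional[WorkOrder]:
    """Create a CMMS work order when a machine's health score crosses threshold."""
    if health_score >= ALERT_THRESHOLD:
        return None   # machine healthy: no alert, no work order
    actions = {
        "spindle_bearing": "Schedule spindle bearing replacement",
        "coolant_fault": "Inspect coolant pump and lines",
        "tool_holder": "Check tool holder runout and clamping",
    }
    return WorkOrder(machine_id, failure_mode,
                     actions.get(failure_mode, "Inspect machine"), ttf_hours)
```

Because the work order arrives pre-populated in the existing CMMS, the maintenance team schedules it exactly as they would any other job.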
Production floor dashboard
Real-time health status dashboard for all 47 machines — colour-coded by health score, showing trend direction, active alerts, and predicted maintenance windows. Accessible on floor terminals and mobile devices. Maintenance manager can drill into any machine to see the sensor signals driving the health score.
Constraints we worked within
- SCADA system and sensor infrastructure could not be modified — data pipeline had to read from the historian without impacting production systems
- CMMS integration required work orders in a specific format — custom connector built to match existing workflow
- Some machines had incomplete failure history — cold-start handling required for 11 machines with fewer than 3 recorded failure events
- Model alerts had to be actionable within the maintenance team's shift structure — 6-hour minimum advance warning required to schedule planned intervention
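The cold-start constraint above is the interesting one: with fewer than 3 recorded failures, a supervised classifier cannot be trained, so those machines fall back to unsupervised anomaly detection fitted on healthy operation only. A minimal sketch of that fallback, with an isolation forest, synthetic data, and a contamination value chosen purely for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Fit on healthy-operation feature windows only; no failure labels required.
healthy_windows = rng.normal(0.0, 1.0, size=(500, 8))
detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy_windows)

def is_anomalous(window: np.ndarray) -> bool:
    """True when a feature window looks unlike anything seen in normal operation."""
    return bool(detector.predict(window.reshape(1, -1))[0] == -1)
```

An anomaly flag is weaker than a failure-mode prediction (it says "this machine looks unusual", not "this bearing will fail"), which is why those 11 machines accumulate labelled failure data for eventual promotion to supervised models.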
Explicitly not in scope
- New sensor installation or hardware procurement
- Quality control or defect detection on finished parts
- Supply chain or spare parts inventory optimisation
- ERP integration or production scheduling changes
System Architecture
Existing sensors. New intelligence layer. Failures predicted hours before they happen.
How We Worked
7 months. Maintenance team as domain experts throughout. Zero production disruption.
Data Audit & Failure Mode Mapping
Extracted and audited 2 years of SCADA historian data across all 47 machines. Worked with the maintenance team to label every failure event in the historical record — 127 distinct failure events across the 3 primary failure modes. Identified 11 machines with insufficient failure history for supervised training — flagged for anomaly detection approach rather than supervised classification.
Feature Engineering & Model Development
Built the feature extraction pipeline — 60+ features per sensor per machine computed on a rolling 15-minute window — and trained failure prediction models per failure mode. The spindle bearing model achieved 89% precision and 84% recall on held-out test data; the coolant fault model achieved 91% precision. The tool holder model was harder, reaching 78% precision due to its thinner failure history; we flagged this to the client and recommended collecting more labelled failure data over the next 6 months.
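For readers less familiar with these metrics, precision and recall are defined on held-out windows labelled by whether they preceded a recorded failure. The arrays below are toy stand-ins, not the actual evaluation data:

```python
from sklearn.metrics import precision_score, recall_score

# y_true = 1 where a held-out window preceded a recorded failure,
# y_pred = 1 where the model alerted. (Illustrative values only.)
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # of the alerts raised, how many were real
recall = recall_score(y_true, y_pred)        # of the real failures, how many were caught
```

High precision keeps false alerts (and alert fatigue) down; high recall keeps missed failures down. The two trade off against each other through the alert threshold.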
Alert Engine & CMMS Integration
Alert thresholds configured with maintenance manager — calibrated to generate 3–5 actionable alerts per day across all 47 machines, avoiding alert fatigue. CMMS connector built and tested. Dashboard deployed to floor terminals. Maintenance team ran a 3-week shadow period — alerts generated but not acted on, team compared predictions against their own assessments.
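Calibrating to an alert budget rather than a fixed score can be done by replaying historical health scores and picking the quantile that yields the target rate. This is a simplified sketch of that idea; in the engagement the thresholds were set per failure mode together with the maintenance manager:

```python
import numpy as np

def calibrate_threshold(historical_scores: np.ndarray,
                        windows_per_day: int,
                        target_alerts_per_day: float) -> float:
    """Choose a health-score threshold so that, replayed over historical data,
    roughly `target_alerts_per_day` windows per day would have alerted."""
    alert_fraction = target_alerts_per_day / windows_per_day
    # The score below which that fraction of historical windows fall
    return float(np.quantile(historical_scores, alert_fraction))
```

For scale: 47 machines scored every 15 minutes is 4,512 windows per day, so a 3–5 alert budget means flagging roughly the worst 0.1% of windows.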
Live Operation & Model Refinement
System went live. First predicted failure caught: spindle bearing on Line 2, Machine 14 — alert fired 11 hours before the bearing would have failed based on degradation rate. Planned replacement completed in a scheduled 2-hour window. Equivalent reactive failure would have caused an estimated 14-hour unplanned stop. Model performance monitored weekly; 2 refinement cycles completed in month 7 based on new failure event data.
Working rhythm
- Cadence: Two-week sprints, weekly maintenance team reviews
- Decision owner: VP of Operations and Maintenance Manager
- Primary metric: Unplanned downtime hours vs. prior-year baseline
- Escalation SLA: 24 hours with written recommendation
Results
Measured at 6 months post go-live.
67% reduction in unplanned downtime hours
Was: 340 hours of unplanned downtime per year across 3 lines
Annualised from 6-month post-go-live data. The system predicted 34 of the 41 failure events that occurred in the measurement period — 83% catch rate. The 7 missed predictions were all on the tool holder model, which had the thinnest training data. Additional failure event labelling is ongoing to improve this model.
$4.1M in measurable first-year savings
Was: $18,000/hour × 340 hours = $6.1M annual downtime cost
Savings calculated as avoided downtime cost ($3.7M) plus reduction in emergency maintenance spend ($0.4M). Does not include quality improvements from catching degraded machines before they produce out-of-tolerance parts — estimated at an additional $0.3M in scrap reduction.
average advance warning before predicted failure
Was: zero advance warning — failures discovered when production stopped
Advance warning range across all caught predictions: 6 hours (minimum, tool holder faults) to 31 hours (spindle bearing degradation). The 6-hour minimum was sufficient for the maintenance team to schedule planned interventions within shift structure in all but 2 cases.
83% failure prediction catch rate across all 3 failure modes
Was: 0% — no predictive capability, all failures discovered reactively
Spindle bearing: 94% catch rate. Coolant faults: 88% catch rate. Tool holder: 67% catch rate (improving as more labelled failure data accumulates). False positive rate: 1.2 false alerts per week across all 47 machines — maintenance team reports this as acceptable given the cost of a missed failure.
What This Means for You
The sensor data already exists in most manufacturing facilities. The gap is not hardware — it is the absence of a system that reads that data continuously and translates it into maintenance decisions before failures occur.
1. Your maintenance team responds to failures rather than preventing them — unplanned stops are a regular operational reality
2. You have sensor data being collected by your SCADA or historian system that is not being used for predictive purposes
3. Emergency maintenance and expedited parts procurement represent a disproportionate share of your maintenance budget
This engagement was built entirely on top of existing sensor infrastructure — no new hardware, no SCADA modifications, no production disruption during implementation. The maintenance team's domain knowledge was the most valuable input to the model: their failure event labels and their assessment of alert thresholds shaped the system from day one.
See how we approach Machine Learning for manufacturing