10. Anomaly Detection

Anomaly Detection is the task of identifying samples that are abnormal (anomalous), meaning that they deviate considerably from normal (typical) data. The task is closely related to Outlier Detection and Novelty Detection, and many of the tools and techniques mentioned here can be transferred directly to those tasks.

Anomaly Detection is normally approached as an unsupervised learning problem, with the training data consisting only of normal data (potentially with a low degree of contamination). Any deviation from this normal training data is to be considered an anomaly. This is sometimes called one-class classification, as it focuses on modeling a single class (the normal class).

The one-class approach helps ensure that novel anomalies (different from those seen during development) are correctly classified as anomalies. With supervised binary classification, novel anomalies may instead be incorrectly marked as normal, because they fall on the "normal" side of the decision boundary learned from the training set.

While labeled data is not used in the training set when applying unsupervised learning, in practice a small labeled dataset is critical for performance evaluation.
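As a minimal sketch of evaluating an unsupervised model with a small labeled set, the Python example below trains scikit-learn's EllipticEnvelope on normal data only and computes ROC AUC on a small labeled evaluation set. The synthetic data, the use of the squared Mahalanobis distance (EllipticEnvelope.mahalanobis) as the anomaly score, and the choice of ROC AUC as the metric are illustrative assumptions, not part of emlearn.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Training data: normal samples only (one-class / unsupervised setting)
X_train = rng.normal(0.0, 1.0, size=(500, 2))

# Small labeled evaluation set: 0 = normal, 1 = anomaly
X_eval_normal = rng.normal(0.0, 1.0, size=(50, 2))
X_eval_anomaly = rng.normal(4.0, 1.0, size=(10, 2))
X_eval = np.vstack([X_eval_normal, X_eval_anomaly])
y_eval = np.concatenate([np.zeros(50), np.ones(10)])

model = EllipticEnvelope().fit(X_train)

# Squared Mahalanobis distance: higher means more anomalous
scores = model.mahalanobis(X_eval)

# ROC AUC summarizes performance over all possible thresholds,
# so no threshold has to be chosen at this stage
auc = roc_auc_score(y_eval, scores)
print(f"ROC AUC: {auc:.3f}")
```

Because the labels are only used for evaluation, the model itself stays purely one-class.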

10.1. Applications

Anomaly Detection has a wide range of applications using sensor data and embedded systems. Here are a few examples.

Application examples

Area          Task                                               Sensor
------------  -------------------------------------------------  -----------------------
Industrial    Condition monitoring of rotating machinery         Accelerometer
Industrial    Detecting faults in machines                       Microphone
Electronics   Detecting issues in lithium-ion batteries          Electrical/thermal
Automotive    Intrusion and fault detection in CANBus networks   CANBus
Robotics      Monitoring executed tasks for faults               Mixed
Health        Detection of anomalous heartbeats                  Electrocardiogram (ECG)

10.2. Trade-off between False Alarms and Missed Detections

Because Anomaly Detection involves a binary decision (is it an anomaly or not), it has an inherent trade-off between False Alarms (false positives) and Missed Detections (false negatives).

Anomaly Detection models in emlearn provide a continuous anomaly score as output; the corresponding C function has a name ending with _score(). A threshold needs to be applied in order to convert the continuous anomaly score into a binary decision.
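To make the trade-off concrete, the following sketch applies a few thresholds to a handful of made-up anomaly scores with known labels and counts False Alarms and Missed Detections. The scores, the labels, and the helper false_alarms_and_misses are hypothetical, and the sketch assumes that a higher score means more anomalous.

```python
import numpy as np

# Hypothetical anomaly scores (higher = more anomalous) with known labels
scores = np.array([0.1, 0.4, 0.7, 0.8, 0.2, 0.9, 0.45, 0.3])
labels = np.array([0,   0,   0,   1,   0,   1,   1,    0])  # 1 = anomaly

def false_alarms_and_misses(scores, labels, threshold):
    # Convert the continuous score into a binary decision
    predicted = scores >= threshold
    false_alarms = int(np.sum(predicted & (labels == 0)))
    missed = int(np.sum(~predicted & (labels == 1)))
    return false_alarms, missed

for threshold in [0.25, 0.5, 0.85]:
    fa, md = false_alarms_and_misses(scores, labels, threshold)
    print(f"threshold={threshold}: false alarms={fa}, missed detections={md}")
```

With this data, raising the threshold from 0.25 to 0.85 moves from 3 false alarms and 0 missed detections to 0 false alarms and 2 missed detections.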

10.3. Selecting anomaly threshold

Some anomaly detection models in scikit-learn (for example EllipticEnvelope) select a threshold automatically, based on the hyperparameter contamination (the expected proportion of outliers in the training data). We recommend not relying on this mechanism, but instead analyzing the continuous anomaly scores to determine an appropriate threshold. It is useful to plot a histogram of the anomaly scores, with the anomaly/normal class indicated (when known).

[Figure: anomaly score distribution]

The anomaly score histogram is a very useful plot to visualize. When labels are available, it is also possible to compute a trade-off curve. Image source: “Robust Anomaly Detection in Time Series through Variational AutoEncoders and a Local Similarity Score”, Matias et al., 2021.
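One possible way to compute such a histogram is sketched below, using scikit-learn's EllipticEnvelope and working on the squared Mahalanobis distance directly instead of the contamination-based predict(). The synthetic data and the number of bins are assumptions; for an actual plot, the two score arrays could be passed to a plotting library such as matplotlib instead of printing the counts.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))
X_normal = rng.normal(0.0, 1.0, size=(200, 2))
X_anomaly = rng.normal(3.0, 1.0, size=(20, 2))

model = EllipticEnvelope().fit(X_train)

# Continuous scores, bypassing the threshold that model.predict() derives
# from the contamination hyperparameter
normal_scores = model.mahalanobis(X_normal)
anomaly_scores = model.mahalanobis(X_anomaly)

# Shared bin edges so the two histograms are directly comparable
all_scores = np.concatenate([normal_scores, anomaly_scores])
bins = np.linspace(all_scores.min(), all_scores.max(), 11)
normal_counts, _ = np.histogram(normal_scores, bins=bins)
anomaly_counts, _ = np.histogram(anomaly_scores, bins=bins)

for lo, hi, n, a in zip(bins[:-1], bins[1:], normal_counts, anomaly_counts):
    print(f"{lo:7.1f} to {hi:7.1f}: normal={n:4d} anomaly={a:3d}")
```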

The best way to set the threshold is to use a labeled validation dataset. It is then possible to pick a desired trade-off between false positives and false negatives (for example using the F-score), and to determine the corresponding optimal threshold.
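Assuming a labeled validation set and scores where higher means more anomalous, one way to pick the threshold that maximizes F1 is via scikit-learn's precision_recall_curve, as sketched below. The helper select_threshold_f1 and the example data are hypothetical; an F-beta score or a fixed false alarm rate could be used instead if the application weighs the two error types differently.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def select_threshold_f1(y_true, scores):
    """Return (threshold, F1) maximizing F1, where scores >= threshold => anomaly."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The last precision/recall pair has no associated threshold
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return thresholds[best], f1[best]

# Hypothetical validation data: 1 = anomaly, 0 = normal
scores = np.array([0.1, 0.4, 0.7, 0.8, 0.2, 0.9, 0.45, 0.3])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 0])

best_threshold, best_f1 = select_threshold_f1(labels, scores)
print(best_threshold, best_f1)
```

With this data, the selected threshold is 0.45, which catches all three anomalies at the cost of one false alarm (F1 = 6/7, about 0.86).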

When labeled data is insufficient or absent, one has to resort to other heuristics for selecting the threshold. This is often done visually, by picking a point where “most of” the “normal” data appears to be included.
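One common alternative heuristic, sketched below under the assumption that the training data is (almost) entirely normal, is to set the threshold at a high percentile of the anomaly scores on the training data, which roughly fixes the false alarm rate on normal data. The 99th percentile and the synthetic data are illustrative choices, not recommendations from emlearn.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(2)
# Assumed to be (mostly) normal data
X_train = rng.normal(0.0, 1.0, size=(1000, 2))

model = EllipticEnvelope().fit(X_train)
train_scores = model.mahalanobis(X_train)  # higher = more anomalous

# Heuristic: accept roughly 1% of normal training samples as false alarms
threshold = np.percentile(train_scores, 99)

X_new = np.array([[0.2, -0.5], [5.0, 5.0]])
is_anomaly = model.mahalanobis(X_new) > threshold
print(threshold, is_anomaly)
```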

10.4. Anomaly Detection models

emlearn supports a selection of Anomaly Detection models.

Supported models for Anomaly Detection

Algorithm                     Implementation
----------------------------  ----------------------------------------
Gaussian Mixture Model (GMM)  GaussianMixture, BayesianGaussianMixture
Mahalanobis distance          EllipticEnvelope

A basic example of some of the models can be found in Anomaly Detection comparison.
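As an illustration of how a GMM can produce an anomaly score, the sketch below fits scikit-learn's GaussianMixture on normal data with two operating modes and uses the negative log-likelihood from score_samples() as the score. This is a plain scikit-learn sketch; the data, the number of components, and the anomaly_score helper are assumptions, and the score convention of the emlearn C implementation may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Normal data with two operating modes (two clusters)
X_train = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(300, 2)),
    rng.normal([4.0, 4.0], 0.5, size=(300, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)

def anomaly_score(model, X):
    # score_samples() returns the log-likelihood of each sample;
    # negate it so that higher values mean "more anomalous"
    return -model.score_samples(X)

# Two points near the clusters, and one point between them
X_test = np.array([[0.1, -0.2], [4.2, 3.9], [2.0, 2.0]])
print(anomaly_score(gmm, X_test))
```

A single Gaussian (as in EllipticEnvelope) would treat the point between the two clusters as fairly typical, while the mixture assigns it a low likelihood.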

10.5. Outlier detection for handling unknown data

Anomaly/outlier detection models are also used in Classification or Regression systems, in order to detect input data that lie outside the data distribution of a trained model. Such input data can produce spurious outputs from the classifier/regressor. To prevent this, the input data is also passed through an outlier detection model, and outliers are marked as “unknown”.
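A minimal sketch of this pattern in scikit-learn follows: a classifier and an EllipticEnvelope outlier detector are trained on the same data, and inputs flagged as outliers get a special label instead of the classifier's prediction. The UNKNOWN value of -1, the synthetic data, the choice of RandomForestClassifier, and contamination=0.01 are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import RandomForestClassifier

UNKNOWN = -1  # hypothetical label for inputs outside the training distribution

rng = np.random.default_rng(4)
X_train = rng.normal(0.0, 1.0, size=(400, 2))
y_train = (X_train[:, 0] > 0).astype(int)  # two classes split at x = 0

classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
outlier_detector = EllipticEnvelope(contamination=0.01, random_state=0)
outlier_detector.fit(X_train)

def predict_with_unknown(X):
    predictions = classifier.predict(X)
    # EllipticEnvelope.predict() returns -1 for outliers and +1 for inliers
    outliers = outlier_detector.predict(X) == -1
    return np.where(outliers, UNKNOWN, predictions)

X_new = np.array([[-1.0, 0.0], [1.0, 0.0], [8.0, 8.0]])
print(predict_with_unknown(X_new))
```

The last input is far from all training data, so it is reported as unknown rather than forced into one of the two classes.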