Browsing by Subject "Reconstruction Error"

  • Iltanen, Henri (2020)
    Anomaly detection is an important task in many domains, such as the maritime domain, where it is used to detect, for example, unsafe, unexpected or criminal behaviour. This thesis studies the use of deep autoencoders for unsupervised anomaly detection on high-dimensional data. The study is performed on a benchmark data set and a real-life AIS (Automatic Identification System) data set containing actual ship trajectories. The ships’ trajectories in the AIS data set are a form of time-series data, and therefore recurrent layers are used in the autoencoder to allow the model to capture temporal dependencies in the data. An autoencoder is a neural network architecture in which an encoder network produces an encoding and a decoder network takes the encoding and attempts to reproduce the original input. An encoding is a compressed, fixed-size vector representation of the original input. Since the decoder uses the encoding to reconstruct the original input, the model learns during training to store the essential information of the input sequence in the encoding. Autoencoders can be used to detect anomalies through reconstruction error, on the assumption that a trained autoencoder reconstructs non-anomalous data points more accurately than anomalous ones; data points with high reconstruction error can therefore be considered anomalies. In addition to reconstruction error, autoencoders produce encodings. This thesis studies the possibility of calculating an outlier score for the encodings and combining it with the reconstruction error to form a combined outlier score. OPTICS-OF (Ordering Points To Identify the Clustering Structure with Outlier Factors) is a density-based anomaly detection technique that can be used to calculate outlier scores for the encodings. The OPTICS-OF outlier score of a data point is based on how isolated the point is within its neighbourhood.
    The proposed method is evaluated on the benchmark Musk data set, for which the anomalies are known. A data set with labelled anomalies provides a setting for analysing the performance of the method and its properties. The method is then applied to the AIS data set, where it is used to find new anomalies using two distinct derived feature sets. The AIS data set contains one known anomaly, which is presented as an example of a maritime anomaly and for which a more detailed analysis of the produced outlier scores is given. The results of the study show potential for the proposed combined score method, and the analysis identifies multiple areas for further research. Deep autoencoders are successfully used to find new anomalies in the AIS data set that show actual behaviour deviating from normal ship movement.
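
The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that reconstruction error is measured as mean squared error and that the two scores are min-max normalised and averaged with a hypothetical `weight` parameter; the abstract does not state the actual combination rule. The reconstructions and the encoding outlier scores below are made-up placeholders standing in for the output of a trained autoencoder and of OPTICS-OF.

```python
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction (assumed metric)."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combined_score(
    recon_errors: Sequence[float],
    encoding_scores: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Weighted average of the two normalised scores (hypothetical rule)."""
    if len(recon_errors) != len(encoding_scores):
        raise ValueError("both score lists must cover the same data points")
    r = min_max(recon_errors)
    e = min_max(encoding_scores)
    return [weight * a + (1 - weight) * b for a, b in zip(r, e)]


# Toy inputs and made-up reconstructions (placeholders for autoencoder output).
inputs = [[1.0, 2.0], [0.0, 1.0], [3.0, 3.0]]
reconstructions = [[1.0, 2.0], [0.0, 1.5], [1.0, 1.0]]
errors = [reconstruction_error(x, r) for x, r in zip(inputs, reconstructions)]

# Placeholder outlier scores for the encodings (stand-in for OPTICS-OF output).
encoding_scores = [1.0, 1.2, 3.0]

combined = combined_score(errors, encoding_scores)
# Index of the data point with the highest combined outlier score.
most_anomalous = max(range(len(combined)), key=combined.__getitem__)
print(errors, combined, most_anomalous)
```

In this toy data the third point is reconstructed worst and also has the highest encoding score, so it receives the highest combined score. Normalising before averaging keeps either score from dominating only because of its scale.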