A Keras-Based Autoencoder for Anomaly Detection in Sequences

栏目: IT技术 · 发布时间: 5年前

内容简介：Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequencesSuppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product seri

Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequences

Alon Agmon

Jan 16 ·6min read

A Keras-Based Autoencoder for Anomaly Detection in Sequences — Photo by Markus Spiske on Unsplash

Suppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product serial numbers (‘AB121E’, ‘AB323’, ‘DN176’…), or users UIDs, and you are required to create a validation process of some kind that will detect anomalies in this sequence. An anomaly might be a string that follows a slightly different or unusual format than the others (whether it was created by mistake or on purpose) or just one that is extremely rare. To make things even more interesting, suppose that you don't know what is the correct format or structure that sequences suppose to follow.

This is a relatively common problem (though with an uncommon twist) that many data scientists usually approach using one of the popular unsupervised ML algorithms, such as DBScan, Isolation Forest, etc. Many of these algorithms typically do a good job in finding anomalies or outliers by singling out data points that are relatively far from the others or from areas in which most data points lie. Although autoencoders are also well-known for their anomaly detection capabilities, they work quite differently and are less common when it comes to problems of this sort.

I will leave the explanations of what is exactly an autoencoder to the many insightful and well-written posts, and articles that are freely available online. Very very briefly (and please just read on if this doesn't make sense to you), just like other kinds of ML algorithms, autoencoders learn by creating different representations of data and by measuring how well these representations do in generating an expected outcome; and just like other kinds of neural network, autoencoders learn by creating different layers of such representations that allow them to learn more complex and sophisticated representations of data (which on my view is exactly what makes them superior for a task like ours). Autoencoders are a special form of a neural network, however, because the output that they attempt to generate is a reconstruction of the input they receive . An autoencoder starts with input data (i.e., a set of numbers) and then transforms it in different ways using a set of mathematical operations until it learns the parameters that it ought to use in order to reconstruct the same data (or get very close to it). In this learning process, an autoencoder essentially learns the format rules of the input data. And, that's exactly what makes it perform well as an anomaly detection mechanism in settings like ours.

Using autoencoders to detect anomalies usually involves two main steps:

First, we feed our data to an autoencoder and tune it until it is well trained to reconstruct the expected output with minimum error. An autoencoder that receives an input like 10,5,100 and returns 11,5,99, for example, is well-trained if we consider the reconstructed output as sufficiently close to the input and if the autoencoder is able to successfully reconstruct most of the data in this way.

Second, we feed all our data again to our trained autoencoder and measure the error term of each reconstructed data point. In other words, we measure how “far” is the reconstructed data point from the actual datapoint. A well-trained autoencoder essentially learns how to reconstruct an input that follows a certain format, so if we give a badly formatted data point to a well-trained autoencoder then we are likely to get something that is quite different from our input, and a large error term.

以上所述就是小编给大家介绍的《A Keras-Based Autoencoder for Anomaly Detection in Sequences》，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对码农网的支持！

查看所有标签

猜你喜欢:

A Keras-Based Autoencoder for Anomaly Detection in Sequences

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

DOM Scripting

Jeremy Keith / friendsofED / 2010-12 / GBP 27.50

There are three main technologies married together to create usable, standards-compliant web designs: XHTML for data structure, Cascading Style Sheets for styling your data, and JavaScript for adding ......一起来看看《DOM Scripting》这本书的介绍吧!

码农工具

A Keras-Based Autoencoder for Anomaly Detection in Sequences

DOM Scripting

RGB转16进制工具

RGB HSV 转换

HEX HSV 转换工具