Recurrent Neural Networks


Understand the intuition behind RNNs!


Introduction

The goal of this article is to explore Recurrent Neural Networks (RNNs) in depth. They are a kind of neural network with a different architecture from the ones seen in previous articles (Link).

Concretely, the article is divided into the following parts:

  • What RNNs are
  • Long Short-Term Memory (LSTM) networks
  • Implementation of RNNs on time series

What are RNNs?

As we have seen here, CNNs do not have any kind of memory. RNNs can go beyond this limitation of ‘starting to think from scratch’ each time, because they have some kind of memory.

Let’s see how they work with a very visual example:

Example

Let’s say that we live in an apartment and we have the perfect roommate: she cooks a different meal depending on the weather, sunny or rainy.

[Figure by Author]

So, if we encode these meals as vectors:

[Figure by Author]

And our Neural Network does the following:

[Figure by Author]

If we recall, neural networks learn weights that can be expressed as matrices, and those weights are used to make predictions. Ours will be as follows:

If it is a sunny day:

[Figure by Author]

If it is a rainy day:

[Figure by Author]

And if we take a look at our weight matrix, this time seen as a graph:

[Figure by Author]
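
To make this concrete, here is a minimal NumPy sketch of the idea (the encodings and the weight matrix are hypothetical, since the real ones are in the figures above): the prediction is just a matrix-vector product.

import numpy as np

# Hypothetical one-hot encodings for the weather
sunny = np.array([1, 0])
rainy = np.array([0, 1])

# Hypothetical weight matrix mapping weather -> meal;
# rows are meals (say, grilled fish and soup), columns are weather states
W = np.array([[1, 0],   # grilled fish on sunny days
              [0, 1]])  # soup on rainy days

print(W @ sunny)  # [1 0] -> grilled fish
print(W @ rainy)  # [0 1] -> soup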

Let’s now see what RNNs add, following this example:


Let’s say that now our dear roommate no longer bases the decision of what to cook on the weather, but simply looks at what she cooked yesterday.

The network in charge of predicting what the roommate will cook tomorrow, based on what she cooked today, is a Recurrent Neural Network (RNN).

This RNN can be expressed as the following matrix:

[Figure by Author]

So what we have is:

[Figure by Author]
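
As a rough sketch of this recurrence (assuming a hypothetical three-meal cycle, since the actual meals are in the figures), tomorrow’s prediction is obtained by applying the same weight matrix to today’s meal:

import numpy as np

# Hypothetical meals as one-hot vectors
pizza, sushi, waffles = np.eye(3)

# Hypothetical transition matrix encoding the cycle pizza -> sushi -> waffles
W = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

meal = pizza
for day in range(4):
    print(day, meal)
    meal = W @ meal  # tomorrow depends only on today: this is the recurrence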

Let’s Make It a Little Bit More Complex

Imagine now that your roommate decides what to cook based on what she cooked yesterday and the weather.

  • If the day is sunny, she spends it on the terrace with a good beer in her hand, so she does not cook and we eat the same thing as yesterday.
  • But if it rains, she stays home and cooks.

It would be something like this:

[Figure by Author]

So we end up having one model that tells us what we are going to eat depending on what we ate yesterday and another model that tells us whether our roommate will cook or not.

[Figure by Author]

And the add and merge operations are the following:

[Figure by Author]

[Figure by Author]

And here you can see the graph:

[Figure by Author]
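
Putting the two models together, a hedged sketch could look like this (the matrices are hypothetical; the point is that the weather acts as a switch that either keeps yesterday’s meal or advances the cooking cycle):

import numpy as np

pizza, sushi, waffles = np.eye(3)
sunny, rainy = np.array([1, 0]), np.array([0, 1])

# Hypothetical food model: what she cooks next in the cycle
W_food = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])

def tomorrow(meal_today, weather):
    keep = weather[0] * meal_today             # sunny: no cooking, same meal
    cook = weather[1] * (W_food @ meal_today)  # rainy: she cooks the next meal
    return keep + cook                         # the add & merge step

print(tomorrow(pizza, sunny))  # [1. 0. 0.] -> pizza again
print(tomorrow(pizza, rainy))  # [0. 1. 0.] -> sushi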

And that is how it works!

This example is from a great video, which I recommend you watch as many times as you need to internalize and understand the previous explanation. You can find it here: https://www.youtube.com/watch?v=UNmqTiOnRfg

And what are RNNs used for?

There are several types:

[Figure by Author]

They are very good at making predictions, especially when our data is sequential:

Stock market forecasts

The value of a share depends largely on the values it had previously.

Sequence generation

This works as long as the data form a sequence in which the value at an instant t depends on the value at the instant t-1.

Text generation

For example, when your cell phone suggests words: it looks at the last word you have written, and at the letters you are typing at that moment, to suggest the next letters or even whole words.

Voice recognition

In this case, we have the previously recognized word and the audio that reaches us at that moment.
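
In all of these cases the input has the same form: a sequence of time steps, each carrying some features. As a minimal sketch (the shapes here are hypothetical), a recurrent model in Keras just declares that shape:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# Hypothetical task: predict the next value of a univariate series
# from windows of 10 time steps with 1 feature each
model = Sequential()
model.add(SimpleRNN(16, input_shape=(10, 1)))  # (time_steps, features)
model.add(Dense(1))                            # the predicted next value
model.compile(optimizer='adam', loss='mse')
model.summary()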

Long Short-Term Memory Networks

Let’s now study how the most popular RNNs work: the LSTM networks. Their structure is as follows:

[Figure by Author]

But first: Why are they the most popular ones?

It turns out that conventional RNNs have memory problems: although they are designed to have memory, they are incapable of remembering things over the long term. And why is this a problem?

Well, going back to the problem of our roommate: in this example we just need to know what we ate yesterday, so there would be no problem.

[Figure by Author]

But imagine that, instead of a menu of three dishes, she had 60.

[Figure by Author]

Conventional RNNs wouldn’t be able to remember things that happened a long time ago (in practice, because of the vanishing gradient problem). However, an LSTM would!

And why?

Let’s take a look at the architecture of the RNN and the LSTM:

RNN

[Figure by Author]

LSTM

[Figure by Author]

It turns out that where RNNs have a single layer, LSTMs have a combination of four layers that interact with each other in a very special way.

Let’s try to understand this, but first, let me explain the nomenclature:

[Figure by Author]

In the diagrams above:

  • A vector travels along each line, from the output of one node to the inputs of others.
  • The pink circles indicate element-wise operations, such as vector addition, while the yellow boxes are neural network layers whose weights are learned during training.
  • Lines that join indicate concatenation, and lines that fork indicate that the same content travels to two different destinations.

The key idea of LSTMs

The key is the state of the cell, which is indicated in the diagram as the line that travels across the top:

[Figure by Author]

The state of the cell is like a kind of conveyor belt that runs along the whole architecture of the network with very few interactions (and they are linear): this means that information can simply flow along it without being modified.

The ingenious part is that the layers of the LSTM can (or cannot) contribute information to this conveyor belt, and that decision is made by the “gates”:

[Figure by Author]

The gates are nothing more than a way of carefully regulating the information that reaches the conveyor belt. They are composed of a neural network layer with a sigmoid activation followed by an element-wise multiplication.

Thus, the sigmoid layer outputs a number between 0 and 1, which expresses how much of that information should be let through to the conveyor belt: 0 means “let nothing through”, and 1 means “let everything through”.

As you can see in the diagram, an LSTM has three such gates to protect and control the conveyor belt.
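
As a rough NumPy sketch of a single gate (all the weights and vectors here are hypothetical, just to show the mechanics):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical previous hidden state and current input, concatenated
h_prev = np.array([0.1, -0.3])
x_t = np.array([0.5, 0.2])
hx = np.concatenate([h_prev, x_t])

# Hypothetical learned gate parameters
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((2, 4))
b_gate = np.zeros(2)

gate = sigmoid(W_gate @ hx + b_gate)              # each entry is between 0 and 1
candidate = np.tanh(rng.standard_normal((2, 4)) @ hx)  # new candidate information
print(gate * candidate)  # element-wise product: the gate decides what passes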

The specific details of this operation are explained in great detail here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

And this blog is also very interesting: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

With this in mind, let’s see what Recurrent Networks can do!

LSTM Implementation

Image Classification with LSTM

We’ll follow an example that treats each 28×28 MNIST image as a sequence of 28 rows of 28 pixels each. It can be found here:

https://medium.com/the-artificial-impostor/notes-understanding-tensorflow-part-2-f7e5ece849f5

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.datasets import mnist
from keras.utils import np_utils

# Hyper parameters
batch_size = 128
nb_epoch = 10
# Parameters for MNIST dataset
img_rows, img_cols = 28, 28
nb_classes = 10
# Parameters for LSTM network
nb_lstm_outputs = 30
nb_time_steps = img_rows
dim_input_vector = img_cols
# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('X_train original shape:', X_train.shape)
input_shape = (nb_time_steps, dim_input_vector)
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# LSTM Building
model = Sequential()
model.add(LSTM(nb_lstm_outputs, input_shape=input_shape))
model.add(Dense(nb_classes, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

[Output: model summary]

# Training the model
history = model.fit(X_train, Y_train,
                    epochs=nb_epoch,  # `nb_epoch` was renamed to `epochs` in Keras 2
                    batch_size=batch_size,
                    shuffle=True,
                    validation_data=(X_test, Y_test),
                    verbose=1)

[Output: training log]

# Evaluation
evaluation = model.evaluate(X_test, Y_test, batch_size=batch_size, verbose=1)
print('Summary: Loss over the test dataset: %.2f, Accuracy: %.2f' % (evaluation[0], evaluation[1]))

Time Series Prediction with LSTM

# LSTM for international airline passengers problem with regression framing
# https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
!wget https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/airline/international-airline-passengers.csv
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
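# For example (hypothetical values): with look_back=1, a column of values
# [10, 20, 30, 40] becomes X = [[10], [20]] and Y = [20, 30]
# (the last window is dropped by the -1 in the range above)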
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

[Output: plot of the original series with the train and test predictions]

Final Words

As always, I hope you enjoyed the post, and that you gained an intuition about RNNs and how to implement them!

If you liked this post, then you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning, Data Science, and Artificial Intelligence, follow me on Medium, and stay tuned for my next posts!

