Build an Artificial Neural Network (ANN) from scratch: Part-2


This article focuses on understanding the maths behind an ANN with one hidden layer and on building it from scratch using the Python NumPy library.

In my previous article, Build an Artificial Neural Network(ANN) from scratch: Part-1, we started our discussion about what artificial neural networks are, and we saw how to create a simple neural network with one input and one output layer, from scratch in Python. Such a neural network is called a perceptron. However, real-world neural networks, capable of performing complex tasks such as image classification and stock market analysis, contain multiple hidden layers in addition to the input and output layers.

In the previous article, we concluded that a Perceptron is capable of finding a linear decision boundary. We used the perceptron to predict whether a person is diabetic or not using a dummy dataset. However, a perceptron is not capable of finding non-linear decision boundaries.

In this article, we will develop a neural network with one input layer, one hidden layer, and one output layer, and we will see that it is capable of finding non-linear decision boundaries.

Generating a dataset

Let’s start by generating a dataset we can play with. Fortunately, scikit-learn has some useful dataset generators, so we don’t need to write the code ourselves. We will go with the make_moons function.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

np.random.seed(0)
# Two interleaving half-moons: 300 points in two classes, with some noise
X, y = datasets.make_moons(300, noise=0.20)
plt.figure(figsize=(10, 7))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
plt.show()

The dataset we generated has two classes, plotted as red and blue points. You can think of the blue dots as male patients and the red dots as female patients, with the x-axis and y-axis being medical measurements.

Our goal is to train a Machine Learning classifier that predicts the correct class (male or female) given the x and y coordinates. Note that the data is not linearly separable: we can't draw a straight line that separates the two classes. This means that linear classifiers, such as an ANN without any hidden layers or even Logistic Regression, won't be able to fit the data unless we hand-engineer non-linear features (such as polynomials) that work well for the given dataset.
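
To see this concretely, here is a minimal sketch, assuming scikit-learn's LogisticRegression, PolynomialFeatures, and make_pipeline are available (none of them are used in the rest of this article), that compares a plain linear classifier with the same classifier given hand-engineered polynomial features on the moons data generated above:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A purely linear decision boundary leaves many points misclassified...
linear_clf = LogisticRegression().fit(X, y)
print("Linear logistic regression accuracy: %.2f" % linear_clf.score(X, y))

# ...while hand-engineered degree-3 polynomial features let the same linear
# model bend its boundary around the two moons
poly_clf = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression()).fit(X, y)
print("Polynomial-feature logistic regression accuracy: %.2f" % poly_clf.score(X, y))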

Neural Networks with One Hidden Layer

Here’s our simple network:

We have two inputs: x1 and x2. There is a single hidden layer with 3 units (nodes): h1, h2, and h3. Finally, there are two outputs: y1 and y2. The arrows that connect them are the weights. There are two weight matrices: w and u. The w weights connect the input layer to the hidden layer, and the u weights connect the hidden layer to the output layer. We use the two letters w and u so that the computation is easier to follow. You can also see that we compare the outputs y1 and y2 with the targets t1 and t2.

There is one last letter we need to introduce before we can get to the computations. Let a be the linear combination prior to activation. Thus, for the hidden layer we have:

a(1)_j = Σ_i x_i w_ij

and for the output layer:

a(2)_j = Σ_i h_i u_ij

where, in each sum, i runs over the units of the layer feeding the transformation and j over the units of the layer it produces.

Since we cannot exhaust all activation functions and all loss functions, we will focus on two of the most common: a sigmoid activation and an L2-norm loss. With this new information and the new notation, the output y is equal to the activated linear combination.

Therefore, for the output layer, we have:

y_j = σ(a(2)_j)

while for the hidden layer:

h_j = σ(a(1)_j)

We will examine backpropagation for the output layer and the hidden layer separately, as the methodologies differ.

I would like to remind you that the sigmoid function is:

σ(x) = 1 / (1 + e^(−x))

and its derivative is:

σ′(x) = σ(x) (1 − σ(x))
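
As a quick numerical sanity check, here is a minimal NumPy sketch of the sigmoid, its derivative, and a forward pass through the 2-3-2 network described above. The names sigmoid, sigmoid_prime, x, w and u are illustrative only; they are not part of the implementation later in this article.

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# One training example with two inputs, and small random weights for the 2-3-2 network
x = np.array([[0.5, -1.2]])   # inputs x1, x2; shape (1, 2)
w = np.random.randn(2, 3)     # weights connecting the input layer to the hidden layer
u = np.random.randn(3, 2)     # weights connecting the hidden layer to the output layer

a1 = x.dot(w)                 # hidden-layer linear combination a(1); shape (1, 3)
h = sigmoid(a1)               # hidden activations h1, h2, h3
a2 = h.dot(u)                 # output-layer linear combination a(2); shape (1, 2)
y_hat = sigmoid(a2)           # network outputs y1, y2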

Backpropagation for the output layer

In order to obtain the update rule:

u_ij ← u_ij − η · ∂L/∂u_ij

(where η is the learning rate) we must calculate the partial derivative of the loss with respect to each weight, ∂L/∂u_ij.

Let's take a single weight u_ij. The partial derivative of the loss w.r.t. u_ij equals:

∂L/∂u_ij = (∂L/∂y_j) · (∂y_j/∂a(2)_j) · (∂a(2)_j/∂u_ij)

where i corresponds to the previous layer (the layer that feeds this transformation, i.e. the hidden layer) and j corresponds to the next layer (the output layer). The partial derivatives were computed simply following the chain rule.

∂L/∂y_j = y_j − t_j

following the L2-norm loss derivative (taking L = ½ Σ_j (y_j − t_j)², the ½ cancels the factor of 2 from differentiating the square).

∂y_j/∂a(2)_j = y_j (1 − y_j)

following the sigmoid derivative, since y_j = σ(a(2)_j).

Finally, the third partial derivative is simply the derivative of:

a(2)_j = Σ_i h_i u_ij

So,

∂a(2)_j/∂u_ij = h_i

Replacing the partial derivatives in the expression above, we get:

∂L/∂u_ij = (y_j − t_j) · y_j (1 − y_j) · h_i

Therefore, the update rule for a single weight of the output layer is given by:

u_ij ← u_ij − η (y_j − t_j) y_j (1 − y_j) h_i
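
Continuing the toy 2-3-2 sketch from above, this is what the output-layer gradient and update look like in NumPy. The targets t and the learning rate eta are made-up illustrative values:

eta = 0.1                      # learning rate (illustrative value)
t = np.array([[1.0, 0.0]])     # made-up targets t1, t2 for the single example

# delta_j = (y_j - t_j) * y_j * (1 - y_j) for each output unit
delta_out = (y_hat - t) * y_hat * (1.0 - y_hat)   # shape (1, 2)

# dL/du_ij = delta_j * h_i, arranged as a (3, 2) matrix to match u
dL_du = h.T.dot(delta_out)

# gradient-descent update of the hidden-to-output weights
u -= eta * dL_du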

Backpropagation for the hidden layer

Similarly to the backpropagation of the output layer, the update rule for a single weight w_ij depends on:

∂L/∂w_ij = (∂L/∂h_j) · (∂h_j/∂a(1)_j) · (∂a(1)_j/∂w_ij)

following the chain rule. Taking advantage of the results we already have for a transformation using the sigmoid activation and the linear model, we get:

∂h_j/∂a(1)_j = h_j (1 − h_j)

and

∂a(1)_j/∂w_ij = x_i

The actual problem for backpropagation comes from the remaining term, ∂L/∂h_j.

That's due to the fact that there is no "hidden" target: the loss is defined in terms of the outputs, not the hidden activations. You can follow the solution for weight w11 below; it is advisable to keep an eye on the NN diagram shown above while going through the computations. Since h1 influences the loss through both outputs, we sum its contributions over y1 and y2:

∂L/∂h_1 = (y_1 − t_1) y_1 (1 − y_1) u_11 + (y_2 − t_2) y_2 (1 − y_2) u_12

From here, we can calculate ∂L/∂w_11, which is what we wanted. The final expression is:

∂L/∂w_11 = [(y_1 − t_1) y_1 (1 − y_1) u_11 + (y_2 − t_2) y_2 (1 − y_2) u_12] · h_1 (1 − h_1) · x_1

The generalized form of this equation is:

∂L/∂w_ij = [Σ_k (y_k − t_k) y_k (1 − y_k) u_jk] · h_j (1 − h_j) · x_i

Backpropagation generalization

Using the results for backpropagation for the output layer and the hidden layer, we can put them together in one formula summarizing backpropagation in the presence of the L2-norm loss and sigmoid activations:

∂L/∂w_ij = δ_j x_i, with the update w_ij ← w_ij − η δ_j x_i

where, for the output layer,

δ_j = (y_j − t_j) y_j (1 − y_j)

while for a hidden layer

δ_j = [Σ_k δ_k u_jk] · h_j (1 − h_j)

with x_i denoting the input to the layer that owns w_ij, and k running over the units of the following layer.
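
Putting the two deltas together, here is a minimal NumPy sketch of one complete backpropagation step for the toy 2-3-2 network from the earlier snippets (sigmoid activations, L2-norm loss, and the same illustrative x, t, w, u and eta; this is not the training loop implemented later in the article):

# Forward pass, as in the earlier sketch
a1 = x.dot(w)
h = sigmoid(a1)
a2 = h.dot(u)
y_hat = sigmoid(a2)

# Output-layer delta: (y - t) * y * (1 - y)
delta_out = (y_hat - t) * y_hat * (1.0 - y_hat)      # shape (1, 2)

# Hidden-layer delta: (sum_k delta_k * u_jk) * h_j * (1 - h_j)
delta_hidden = delta_out.dot(u.T) * h * (1.0 - h)    # shape (1, 3)

# Gradients: dL/du_ij = h_i * delta_j and dL/dw_ij = x_i * delta_j
dL_du = h.T.dot(delta_out)                           # shape (3, 2), matches u
dL_dw = x.T.dot(delta_hidden)                        # shape (2, 3), matches w

# Gradient-descent updates
u -= eta * dL_du
w -= eta * dL_dw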

Code for Neural Networks with One Hidden Layer

Now let’s implement the neural network that we just discussed in Python from scratch. We will again try to classify the non-linear data that we created above.

We start by defining some useful variables and parameters for gradient descent, such as the training dataset size and the dimensions of the input and output layers.

num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

We also define the gradient descent parameters:

epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

First, let's implement the loss function we use to evaluate how well our model is doing. Note that the implementation below uses tanh hidden units and a softmax output trained with the cross-entropy loss, rather than the sigmoid/L2-norm combination used in the derivation above; the backpropagation recipe is the same, only the individual derivatives change:

# Helper function to evaluate the total loss on the dataset
def calculate_loss(model, X, y):
    num_examples = len(X)  # training set size
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss

We also implement a helper function to calculate the output of the network. It does forward propagation and returns the class with the highest probability.

def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

Finally, here comes the function to train our neural network. It implements batch gradient descent using the backpropagation derivatives discussed above.

This function learns the parameters for the neural network and returns the model. Its arguments are:

nn_hdim: Number of nodes in the hidden layer

num_passes: Number of passes through the training data for gradient descent

print_loss: If True, print the loss every 1000 iterations

def build_model(X, y, nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    num_examples = len(X)
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model, X, y)))

    return model

Finally, the main method:

def main():
    X, y = generate_data()
    model = build_model(X, y, 3, print_loss=True)
    visualize(X, y, model)

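The main method calls two helpers, generate_data() and visualize(), whose bodies the article does not show. Below is a minimal sketch of what they might look like, assuming visualize() wraps a plot_decision_boundary() helper that evaluates the classifier on a grid; these function bodies are my assumption rather than the author's original code:

def generate_data():
    # Same moons dataset as at the top of the article
    np.random.seed(0)
    X, y = datasets.make_moons(300, noise=0.20)
    return X, y

def plot_decision_boundary(pred_func, X, y):
    # Evaluate the classifier on a grid covering the data, then draw the regions
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)

def visualize(X, y, model):
    plot_decision_boundary(lambda x: predict(model, x), X, y)
    plt.title("Decision boundary for hidden layer size 3")
    plt.show()
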
Running main() prints the loss after every 1,000 iterations:

Classification when the number of nodes in the hidden layer is 3

Now let's get a sense of how varying the hidden layer size affects the result.

hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i + 1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(X, y, nn_hdim, 20000, print_loss=False)
    plot_decision_boundary(lambda x: predict(model, x), X, y)
plt.show()
