An Introduction to Artificial Neural Networks

Category: IT Technology · Published: 3 years ago


Artificial Neural Network (ANN)

Artificial Neural Network (ANN) is a deep learning algorithm inspired by the biological neural networks of the human brain. It grew out of attempts to simulate how the brain works. An ANN behaves similarly to a biological neural network but does not replicate its workings exactly.

A plain ANN accepts only numeric, structured data as input. For unstructured, non-numeric data such as images, text, and speech, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used instead; CNNs are the usual choice for images and RNNs for sequences such as text and speech. In this post, we concentrate only on Artificial Neural Networks.

Biological neurons vs Artificial neurons

Structure of Biological neurons and their functions

  • Dendrites receive incoming signals.
  • Soma (cell body) processes the incoming signals and carries biochemical information.
  • Axon is a tubular structure responsible for transmitting signals.
  • Synapse is located at the end of the axon and connects the neuron to other neurons.
Biological Neuron. Image Source

Structure of Artificial neurons and their functions

  • A neural network with a single layer is called a perceptron. A multi-layer perceptron is called an Artificial Neural Network.
  • A neural network can have any number of layers. Each layer can have one or more neurons (units). In a fully connected network, each neuron is connected to every neuron in the adjacent layers. Each layer can use a different activation function.
  • Training an ANN consists of two phases: forward propagation and backpropagation. Forward propagation multiplies the inputs by weights, adds a bias, applies an activation function, and passes the result forward to the next layer.
  • Backpropagation is the most important step. It searches for the optimal parameters of the model by propagating the error backward through the layers of the network. Backpropagation requires an optimization function (optimizer) to find the optimal weights for the model.
  • ANN can be applied to both regression and classification tasks by changing the activation function of the output layer accordingly (Sigmoid for binary classification, Softmax for multi-class classification, and a linear activation for regression).
Artificial Neural Networks. Image Source
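
The output-layer choices listed above can be sketched in plain Python. This is a minimal illustration, not the post's implementation; the function names and the example scores are assumptions made for the sketch.

```python
import math

def sigmoid(z):
    """Binary classification: squash one score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multi-class classification: turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability without changing the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(z):
    """Regression: pass the score through unchanged."""
    return z

# Hypothetical raw scores from an output layer with three classes.
probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the probabilities always add up to 1.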

Why Neural Networks?

  • The performance of traditional Machine Learning algorithms tends to plateau as the data size increases, whereas ANN outperforms them when the data size is huge, as shown in the graph below.
Performance graph of various algorithms. Image Source
  • Feature learning. The ANN learns features hierarchically and incrementally, layer by layer. For this reason, explicit feature engineering is often unnecessary.
  • Neural networks can handle unstructured data like images, text, and speech. For such data, specialized architectures such as CNN (Convolutional Neural Networks) and RNN (Recurrent Neural Networks) are used.

How ANN works

The working of ANN can be broken down into two phases:

  • Forward Propagation
  • Back Propagation

Forward Propagation

  • Forward propagation involves multiplying feature values by weights, adding a bias, and then applying an activation function, at each neuron in the neural network.
  • Multiplying feature values by weights and adding a bias at each neuron is essentially Linear Regression. If we then apply the Sigmoid function to the result, each neuron is essentially performing Logistic Regression.
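
The forward pass described above can be sketched in plain Python. This is a minimal illustration under assumed values: the helper `dense_forward`, the weights, the biases, and the input are all hypothetical and chosen only to make the example concrete.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(inputs, weights, biases, activation):
    """Forward pass through one fully connected layer.

    weights[j] holds the weights of neuron j, one per input value.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Linear part: weighted sum of the inputs plus the bias.
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # Non-linear part: the activation function.
        outputs.append(activation(z))
    return outputs

# Hypothetical input with two features, a hidden layer of two neurons,
# and an output layer with one neuron.
x = [0.5, -1.0]
hidden = dense_forward(x, weights=[[0.1, 0.2], [-0.3, 0.4]], biases=[0.0, 0.1], activation=sigmoid)
output = dense_forward(hidden, weights=[[0.7, -0.5]], biases=[0.2], activation=sigmoid)
```

Each neuron here is a small logistic regression over the outputs of the previous layer, which is the point made in the second bullet.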

Activation functions

  • The purpose of an activation function is to introduce non-linearity into the model. Non-linearity lets the network capture complex underlying patterns in the data. Activation functions can also scale values to a particular interval. For example, the sigmoid activation function scales values to between 0 and 1.

Logistic or Sigmoid function

  • Logistic/ Sigmoid function scales the values between 0 and 1.
  • It is used in the output layer for Binary classification.
  • It may cause the vanishing gradient problem during backpropagation, which slows down training.
Sigmoid function: σ(x) = 1 / (1 + e^(−x))
Graph of the sigmoid function. Image Source

Tanh function

  • Tanh is short for Hyperbolic Tangent. The tanh function scales values to between -1 and 1.
Hyperbolic Tangent function: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Graph of Hyperbolic Tangent function. Image Source

ReLU function

  • ReLU (Rectified Linear Unit) outputs x if x > 0 and outputs 0 otherwise.
  • It mitigates the vanishing gradient problem, but because its output is unbounded for positive inputs it can contribute to the exploding gradient problem during backpropagation. Exploding gradients can be prevented by capping (clipping) the gradients.
ReLU function: f(x) = max(0, x)
Graph of ReLU function. Image Source
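
Capping gradients can take several forms. The sketch below shows one common variant, clipping by the overall norm; whether the original post means this variant or per-value clipping is not stated, so treat the function `clip_by_norm` and the threshold as assumptions.

```python
import math

def clip_by_norm(gradients, max_norm):
    """Rescale the gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        # Small gradients are left untouched.
        return list(gradients)
    scale = max_norm / norm
    # The direction is preserved; only the magnitude shrinks.
    return [g * scale for g in gradients]

# A gradient with norm 5.0 is scaled down to norm 1.0.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```

Clipping by norm keeps the direction of the update and only limits its size, which is why it is often preferred over clipping each value independently.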

Leaky ReLU function

  • Leaky ReLU is very similar to ReLU, but when x < 0 it returns 0.01 * x instead of 0.
  • If the data is normalized using Z-Score, it may contain negative values. ReLU maps all of them to 0 and so discards that information, whereas Leaky ReLU preserves a small signal for negative inputs.
Leaky ReLU function: f(x) = x if x > 0, otherwise 0.01 · x
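
The four activation functions covered in this section can be written in a few lines of plain Python. This is a minimal sketch for scalar inputs; the `slope` parameter name is an assumption, with its default taken from the 0.01 factor mentioned above.

```python
import math

def sigmoid(x):
    """Scales any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Scales any real number into the interval (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Returns x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small fraction of negative inputs."""
    return x if x > 0 else slope * x

# Compare how each function treats a negative and a positive input.
for value in (-2.0, 3.0):
    print(value, sigmoid(value), tanh(value), relu(value), leaky_relu(value))
```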

Backpropagation

  • Backpropagation finds the optimal values of the model's parameters by iteratively updating them using the partial derivatives (gradients) of the loss function with respect to those parameters.
  • An optimization function (optimizer) is used to perform these updates. Its objective is to find the parameter values that minimize the loss.

The optimization functions available include:

  • Gradient Descent
  • Adam optimizer
  • Gradient Descent with momentum
  • RMS Prop (Root Mean Square Prop)
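
To make the update rules concrete, here is a minimal sketch of plain gradient descent and gradient descent with momentum. The toy loss f(w) = (w − 3)², the learning rate, the momentum coefficient `beta`, and the step count are all assumptions chosen for illustration; Adam and RMS Prop are not shown.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=300):
    for _ in range(steps):
        # Move against the gradient by a fixed fraction.
        w -= learning_rate * grad(w)
    return w

def gradient_descent_momentum(w, learning_rate=0.1, beta=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = beta * velocity + grad(w)
        w -= learning_rate * velocity
    return w

w_plain = gradient_descent(0.0)
w_momentum = gradient_descent_momentum(0.0)
```

Both variants approach the minimum at w = 3; they differ in the path taken, with momentum carrying part of each previous step into the next one.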

The chain rule of calculus plays an important role in backpropagation. The formula below denotes the partial derivative of the loss (L) with respect to the weights/parameters (w).

A small change in the weights ‘w’ influences the change in the value ‘z’ (∂z/∂w). A small change in the value ‘z’ influences the change in the activation ‘a’ (∂a/∂z). A small change in the activation ‘a’ influences the change in the loss function ‘L’ (∂L/∂a).

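Chain rule: ∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w)
Description of the values in the chain rule: w is a weight, z is the weighted sum of the inputs plus the bias, a is the activation computed from z, and L is the loss.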
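
The chain rule can be checked numerically on a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss L = (a − y)², neither of which is prescribed by the post; the values of w, b, x, and y are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron (an assumed loss for this sketch)."""
    a = sigmoid(w * x + b)
    return (a - y) ** 2

def dloss_dw(w, b, x, y):
    """Gradient of the loss with respect to w, built factor by factor with the chain rule."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = 2.0 * (a - y)      # how the loss changes with the activation
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw = x                  # how z changes with the weight
    return dL_da * da_dz * dz_dw

w, b, x, y = 0.5, 0.1, 2.0, 1.0
analytic = dloss_dw(w, b, x, y)

# Central finite difference as an independent check of the analytic gradient.
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
```

The analytic and numeric gradients agree to several decimal places, which is a common sanity check when implementing backpropagation by hand.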

Terminologies:

Metrics

  • A metric is used to gauge the performance of the model.
  • Metric functions are similar to cost functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any cost function as a metric.
  • In this post, Mean Squared Logarithmic Error is used as both the metric and the cost function.
Mean Squared Logarithmic Error (MSLE) and Root Mean Squared Logarithmic Error (RMSLE)
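
MSLE is commonly defined as the mean of (log(1 + y) − log(1 + ŷ))² over all samples, and RMSLE as its square root. The sketch below follows that common definition in plain Python; the function names and sample values are assumptions for illustration.

```python
import math

def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error using log(1 + value) for each term."""
    squared_log_errors = [
        (math.log1p(t) - math.log1p(p)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error: the square root of MSLE."""
    return math.sqrt(msle(y_true, y_pred))

# Hypothetical targets and predictions.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
error = rmsle(y_true, y_pred)
```

Because the error is computed on logarithms, it penalizes relative rather than absolute differences, which suits targets that span several orders of magnitude.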

Epoch

  • A single pass through the entire training data is called an epoch. The training data is fed to the model in mini-batches; once all the mini-batches have been fed to the model, one epoch is complete.
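
The relationship between mini-batches and epochs can be sketched as two nested loops. The helper `iterate_minibatches`, the dataset of ten items, and the batch size of four are assumptions for illustration; a real training loop would update the model inside the inner loop.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of the training data, each at most batch_size long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

training_data = list(range(10))  # stand-in for ten training samples
num_epochs = 2

batches_seen = []
for epoch in range(num_epochs):
    # One full pass over all mini-batches is one epoch.
    for batch in iterate_minibatches(training_data, batch_size=4):
        batches_seen.append((epoch, batch))
```

With ten samples and a batch size of four, each epoch consists of three mini-batches, the last of which holds only the two remaining samples.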

Hyperparameters

Hyperparameters are tunable parameters that are not learned by the model during training, which means the user must provide their values. Because the chosen values affect the training process, hyperparameter optimization is used to find good ones.

The hyperparameters used in this ANN model are:

  • Number of layers
  • Number of units/ neurons in a layer
  • Activation function
  • Initialization of weights
  • Loss function
  • Metric
  • Optimizer
  • Number of epochs

Coding ANN in Tensorflow

Load the preprocessed data

The data you feed to the ANN must be preprocessed thoroughly to yield reliable results. The training data has been preprocessed already. The preprocessing steps involved are:

  • MICE Imputation
  • Log transformation
  • Square root transformation
  • Ordinal Encoding
  • Target Encoding
  • Z-Score Normalization
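
Three of the steps above are simple enough to sketch in plain Python. This is not the notebook's implementation: the function names are hypothetical, the log transform is assumed to be log(1 + x), and Z-Score normalization is assumed to use the population standard deviation. MICE imputation and the encoding steps are not shown.

```python
import math
import statistics

def log_transform(values):
    """Compress large values; log1p is used so that zeros are handled (an assumption)."""
    return [math.log1p(v) for v in values]

def sqrt_transform(values):
    """Milder compression than the log transform, for non-negative values."""
    return [math.sqrt(v) for v in values]

def z_score(values):
    """Shift and scale the values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature column.
column = [1.0, 2.0, 3.0, 4.0]
normalized = z_score(column)
```

After Z-Score normalization roughly half of the values are negative, which is why the choice between ReLU and Leaky ReLU discussed earlier matters.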

For the detailed implementation of the above-mentioned steps, refer to my notebook on data preprocessing.

Notebook Link

Neural Architecture

  • The ANN model that we are going to use consists of seven layers: one input layer, five hidden layers, and one output layer.
  • The first layer (input layer) consists of 128 units/neurons with the ReLU activation function.
  • The second, third, and fourth layers each consist of 256 hidden units/neurons with the ReLU activation function.
  • The fifth and sixth layers each consist of 384 hidden units with the ReLU activation function.
  • The last layer (output layer) consists of a single neuron, which outputs an array with the shape (1, N), where N is the number of samples in the input batch.
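
To get a feel for the size of this architecture, the sketch below tallies the trainable parameters of the seven fully connected layers in plain Python, without depending on TensorFlow. The number of input features (20) is a hypothetical value, since the post does not state it; in Keras, each entry would typically correspond to one `Dense` layer.

```python
# Units per layer as described above: 128, then 3 x 256, then 2 x 384, then 1.
layer_units = [128, 256, 256, 256, 384, 384, 1]

def count_parameters(num_features, units_per_layer):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    inputs = num_features
    for units in units_per_layer:
        # Each neuron has one weight per input plus one bias.
        total += (inputs + 1) * units
        inputs = units
    return total

# Assumed number of input features; replace with the real feature count.
num_features = 20
total_parameters = count_parameters(num_features, layer_units)
```

Under this assumption the network has 414,209 trainable parameters, most of them in the two 384-unit layers.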
