Hyperparameter Tuning of Keras Deep Learning Model in Python


This article will give you an overview of how to tune the deep learning model hyperparameters.


Source: Gopal Mishra via Constructor

Article Outline

  • Introduction
  • About Dataset
  • Loading Dataset
  • Data Preprocessing
  • Setting Model Configuration
  • Model Tuning Strategy
  • Identifying the best model parameters
  • Retraining with best parameters
  • Retrieving mean and standard deviation of CV score
  • Tutorial Code

Introduction

Currently, deep learning is being used in solving a variety of problems, such as image recognition, object detection, text classification, speech recognition (natural language processing), sequence prediction, neural style transfer, text generation, image reconstruction and many more.

It is the technology behind self-driving cars, the speech recognition used in Siri, Alexa and Google Assistant, photo tagging on Facebook, song recommendation on Spotify, and product recommendation engines. Researchers are now even using deep learning to understand complex patterns in data, for example in detecting glaucoma in diabetic patients, disaster management (earthquake and flood prediction), new material development, fake news detection, robotics and biomechanics. For a better understanding of the practical applications of deep learning, I recommend watching the YouTube series “The Age of A.I.”.

There are many tools available for training a deep neural network. For research work, researchers use programming languages and libraries/packages to implement such complex models, as this provides more flexibility and the model can be modified as the work requires. Nowadays, training a deep neural network is very easy, thanks to François Chollet’s Keras deep learning library. Using Keras, one can implement a deep neural network model in a few lines of code; see the toy sketch below.
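As a quick illustration of that claim, here is a minimal sketch of a tiny Keras network. It is a toy example unrelated to this article’s dataset; the layer sizes and input shape are arbitrary:

from keras.models import Sequential
from keras.layers import Dense

# Three lines define a two-layer network; one line compiles it
model = Sequential()
model.add(Dense(8, activation = 'relu', input_shape = (4, )))
model.add(Dense(1))
model.compile(optimizer = 'adam', loss = 'mse')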

The problem starts when, as a researcher, you need to find the set of hyperparameters that gives you the most accurate model/solution. Manually trying each set of parameters can be very time-consuming and exhausting. Here, the KerasRegressor class, a wrapper that lets a Keras model be used with scikit-learn’s tools, comes in handy for automating the tuning process.

In this article, we will learn, step by step, how to tune a Keras deep learning regression model and identify the best set of hyperparameters. The same approach can be applied to classification models.

About Dataset

I have a Transportation Engineering (Civil Engineering domain) background. During my civil engineering diploma, B.Tech and M.Tech, I performed the concrete characteristic compressive strength test in a laboratory setting. Thus, I thought it would be interesting to model concrete’s compressive strength using a deep learning model.

Hence, in this article, we are going to use the concrete dataset [1] obtained from the UCI Machine Learning library.

The dataset includes the following variables, which are the ingredients used to make a high-strength, durable concrete mix.

  • I1: Cement (C1): kg in a m³ mixture
  • I2: Blast Furnace Slag (C2): kg in a m³ mixture
  • I3: Fly Ash (C3): kg in a m³ mixture
  • I4: Water (C4): kg in a m³ mixture
  • I5: Superplasticizer (C5): kg in a m³ mixture
  • I6: Coarse Aggregate (C6): kg in a m³ mixture
  • I7: Fine Aggregate (C7): kg in a m³ mixture
  • I8: Age: days (1–365)
  • O1: Concrete compressive strength: MPa

where I: input; O: output; C: component; m³: cubic metre; MPa: megapascal.

Before proceeding to the data analysis part, let’s get familiar with the different inputs of the concrete dataset.

Concrete

Concrete is composed of three basic components: water, aggregate (rock, sand or gravel) and cement. Cement acts as a binding agent when mixed with water and aggregates.

Compressive Strength

Compressive strength is one of the vital parameters that determine concrete’s performance as a construction material. A concrete mix is designed to achieve the required performance and durability for a given construction work/project. The compressive strength of concrete is determined in laboratories in order to maintain the desired quality of concrete during casting. It is calculated by dividing the failure load by the area over which the load is applied, usually after a 28-day (I8: Age) curing period, though researchers also report strength after 7, 14 and 21 days of curing. The strength of concrete is achieved by controlling the proportions of cement (C1), fine (C7) and coarse (C6) aggregates, water, and various admixtures. The characteristic compressive strength of concrete, fc/fck, is usually reported in MPa (O1). For normal construction, the characteristic compressive strength can vary from 10 to 60 MPa, while for certain structures the requirement can go beyond 600 MPa.
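As a small worked example of that calculation (the specimen size and load below are hypothetical, chosen only to illustrate the arithmetic): a standard 150 mm × 150 mm cube that fails under a load of 675 kN has a compressive strength of 675,000 N / (150 mm × 150 mm) = 675,000 / 22,500 = 30 MPa.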

Admixture

Nowadays, researchers are using different admixtures to obtain desired properties; fly ash (C3) is one of them. Fly ash acts as an admixture in concrete mixes: it is a pozzolanic substance containing aluminous and siliceous material which, when mixed with lime and water, forms a compound similar to cement. Fly ash is mixed into concrete to improve workability and to reduce permeability and bleeding.

Similarly, ground granulated blast furnace slag (C2), a mineral admixture, is added to concrete to improve properties such as workability, strength and durability.

Superplasticizers

Superplasticizers (high-range water reducers) are used in concrete mixes to make high-strength, durable concrete. Superplasticizers (C5) are water-soluble organic substances that reduce the amount of water required to achieve a given consistency of concrete, reduce the water-cement ratio, reduce cement content and increase slump. Using superplasticizers can reduce the water requirement by up to 30% without losing workability.

Aim

The aim of the modelling is to predict the characteristic compressive strength of concrete (a regression problem) based on the given input components (cement, blast furnace slag, fly ash, water, superplasticizers, coarse and fine aggregates, and age).

Here, we will try to find the set of hyperparameters that minimizes the loss function to the greatest extent. In other words, we will look for the parameter set that provides the most accurate solution.

Loading relevant libraries

The very first step is to load the relevant Python libraries.

import numpy as np                                       # array manipulation
import pandas as pd                                      # data manipulation
from sklearn import preprocessing                        # scaling
import keras
from keras.models import Sequential                      # sequential model layout
from keras.layers import Dense                           # dense layers
from keras.layers import BatchNormalization              # batch normalization
from keras.layers import Dropout                         # random dropout
from keras.optimizers import Adam                        # Adam optimizer
from keras import regularizers                           # l2 regularization
from keras.wrappers.scikit_learn import KerasRegressor   # scikit-learn wrapper
from sklearn.model_selection import RandomizedSearchCV   # randomized search
from sklearn.model_selection import KFold                # k-fold cross-validation
from sklearn.model_selection import cross_val_score      # cross-validation scoring

Loading dataset

The next step is to load the data from an Excel sheet in your local storage and perform basic exploratory data analysis.

concrete = pd.read_excel('Concrete_Data.xlsx')
concrete.head()


Defining input and target data

The next step is to assign the input columns (components) to the train_inputs variable and the output/target column to the train_targets variable. We need to convert the data to NumPy arrays using the .values attribute before feeding them into the neural network model. The dataset includes 1030 observations: eight input columns and one target column.

train_inputs = concrete.drop("Comp_str", axis = 1).values
train_targets = concrete["Comp_str"].values

print(train_inputs.shape)
print(train_targets.shape)

Data Preprocessing

Standardization of datasets is a common requirement for many machine learning estimators; they might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance. So, the next step is to scale the data so that it has zero mean and unit variance.

train_inputs =  preprocessing.scale(train_inputs)
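To sanity-check the scaling, you can verify that each column now has approximately zero mean and unit variance; a quick check along these lines:

# Each feature should now have mean ~0 and standard deviation ~1
print(train_inputs.mean(axis = 0).round(3))
print(train_inputs.std(axis = 0).round(3))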

Setting Model Configuration

To perform hyperparameter tuning, the first step is to define a function comprising the model layout of your deep neural network. Here is the step-by-step guide for defining the function, named create_model.

Step 1: The very first step is to define a function create_model with the default arguments learning_rate = 0.01 and activation function “relu”. Don’t worry, these are only defaults; later we will tweak them for tuning purposes.

def create_model(learning_rate = 0.01, activation = 'relu'):

Step 2: The next step is to set our optimizer. Here we have selected the Adam optimizer, initialized with our default learning rate argument.

# Use Adam optimizer with the given learning rate
opt = Adam(lr = learning_rate)

Step 3: The first layer always needs an input shape. Here, the input shape is the number of columns in the training dataset. We extract the number of input columns using the .shape attribute and indexing its second value.

n_cols = train_inputs.shape[1]
input_shape = (n_cols, )

Step 4: The next step is to define the sequential layout of your model. Here, we use two dense layers of 128 hidden neurons each. The activation is set to the default argument, i.e. “relu”, and we also apply l2 regularization to penalize large weights and improve representation learning. To make the representation learning more robust, we add Dropout layers that drop 50% of the connections randomly.

# Create the regression model
    model = Sequential()
    model.add(Dense(128,
                    activation = activation,
                    input_shape = input_shape,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(128,
                    activation = activation, 
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(1, activation = activation))

Step 5: The next step is to compile the model. For compilation, we need an optimizer and a loss function. Here we have opted for the Adam optimizer and, as this is a regression task, the “mean_absolute_error” loss function. We choose mae as it is more robust to outliers than mse. To keep track of other errors, we set two additional metrics: mean squared error (mse) and mean absolute percentage error (mape).

# Compile the model
    model.compile(optimizer = opt,
                  loss = "mean_absolute_error",
                  metrics=['mse', "mape"])
    return model
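To see why mae is considered more robust to outliers than mse, here is a tiny hand-computed illustration using the numpy import from above (the error values are made up purely for this example):

# Five residuals, one of which is an outlier
errors = np.array([1.0, -2.0, 1.5, -1.0, 20.0])

mae = np.abs(errors).mean()     # (1 + 2 + 1.5 + 1 + 20) / 5 = 5.1
mse = (errors ** 2).mean()      # (1 + 4 + 2.25 + 1 + 400) / 5 = 81.65

print(mae, mse)  # the single outlier dominates mse far more than mae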

Here is the overall blueprint of the model configuration:

n_cols = train_inputs.shape[1]
input_shape = (n_cols, )

# Creates a model given an activation and learning rate
def create_model(learning_rate = 0.01, activation = 'relu'):

    # Create an Adam optimizer with the given learning rate
    opt = Adam(lr = learning_rate)

    # Create the regression model
    model = Sequential()
    model.add(Dense(128,
                    activation = activation,
                    input_shape = input_shape,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(128,
                    activation = activation,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(1, activation = activation))

    # Compile the model
    model.compile(optimizer = opt,
                  loss = "mean_absolute_error",
                  metrics = ['mse', "mape"])
    return model

Defining Model Tuning Strategy

The next step is to set the layout for hyperparameter tuning.

Step 1: The first step is to create a model object using KerasRegressor from keras.wrappers.scikit_learn, passing it the create_model function. We set verbose = 0 to stop the model training logs from being shown. Similarly, one can use KerasClassifier for tuning a classification model.

# Create a KerasRegressor
model = KerasRegressor(build_fn = create_model,
                       verbose = 0)

Step 2: The next step is to define the hyperparameter search space. Here, we will try the following common hyperparameters:

  • activation function: relu and tanh
  • batch size: 16, 32 and 64
  • epochs: 50 and 100
  • learning rate: 0.01, 0.001 and 0.0001

# Define the parameters to try out
params = {'activation': ["relu", "tanh"],
          'batch_size': [16, 32, 64], 
          'epochs': [50, 100],
          'learning_rate': [0.01, 0.001, 0.0001]}

Step 3: Next, we perform a randomized cross-validation search across the parameter space using the RandomizedSearchCV function. We selected randomized search as it works faster than grid search. Here, we perform a 10-fold cross-validation search. For smaller datasets, creating a separate validation dataset costs training data; in such scenarios, cross-validation can be a better model training approach.

random_search = RandomizedSearchCV(model,
                                   param_distributions = params,
                                   cv = KFold(10))
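Note that the grid above contains 2 × 3 × 2 × 3 = 36 parameter combinations, while RandomizedSearchCV samples only n_iter of them (10 by default). A sketch of widening the search, with n_iter = 20 and a fixed random_state chosen here purely for illustration:

random_search = RandomizedSearchCV(model,
                                   param_distributions = params,
                                   n_iter = 20,        # sample 20 of the 36 combinations
                                   cv = KFold(10),
                                   random_state = 42)  # for reproducible sampling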

Step 4: Next, we fit the model to our train_inputs and train_targets.

random_search_results = random_search.fit(train_inputs, train_targets)

Here is the blueprint of the overall model tuning layout:

# Create a KerasRegressor object
model = KerasRegressor(build_fn = create_model,
                       verbose = 0)

# Define the hyperparameter space
params = {'activation': ["relu", "tanh"],
          'batch_size': [16, 32, 64],
          'epochs': [50, 100],
          'learning_rate': [0.01, 0.001, 0.0001]}

# Create a randomized search CV object and fit it
random_search = RandomizedSearchCV(model,
                                   param_distributions = params,
                                   cv = KFold(10))
random_search_results = random_search.fit(train_inputs, train_targets)

Identifying the best model parameters

The model with the best parameters achieved a Mean Absolute Error (MAE) of approximately 6.197. The best model performance was achieved with a learning rate of 0.001, 100 epochs, a batch size of 16 and the relu activation function.

print("Best Score: ",
random_search_results.best_score_,
"and Best Params: ",
random_search_results.best_params_)
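Retraining with best parameters

Once the best hyperparameters have been identified, the model can be recreated with them. Below is a minimal sketch, assuming the best values reported above (learning rate 0.001, relu activation, 100 epochs, batch size 16); KerasRegressor forwards keyword arguments that match create_model’s signature to the model builder and the remaining ones to fit:

# Recreate the model with the best hyperparameters
best_model = KerasRegressor(build_fn = create_model,
                            learning_rate = 0.001,  # forwarded to create_model
                            activation = 'relu',    # forwarded to create_model
                            epochs = 100,           # forwarded to fit
                            batch_size = 16,        # forwarded to fit
                            verbose = 0)

Retrieving mean and standard deviation of CV score

Finally, the retrained configuration can be scored with the cross_val_score function imported earlier; a sketch:

# Evaluate with 10-fold cross-validation and summarize the scores
kfold_scores = cross_val_score(best_model,
                               train_inputs, train_targets,
                               cv = KFold(10))
print("Mean CV score: ", kfold_scores.mean())
print("Std of CV score: ", kfold_scores.std())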
