Build your own deep learning classification model in Keras



An intuitive guide to building your own deep learning classification model from scratch

Jun 30 · 8 min read


Figure 1: A classification example of the Pascal-VOC dataset

Introduction

Image classification is a field of artificial intelligence that has been gaining popularity in recent years. It has various applications: self-driving cars, face recognition, and augmented reality.

In this article, we will use a step by step approach to build a deep learning image classification model.

I have made the full code available in a shared Google Colab notebook so you can easily execute the code yourself!

After reading this guide, you will know the following things:

  • How to make use of the free GPU power of Google Colab
  • How to load a popular image classification dataset (Pascal VOC)
  • How to create a deep learning convolutional neural network using a combination of Keras & TensorFlow
  • How to implement a data generator
  • How to train a deep learning model & evaluate the results

Step #1: Set up the environment

Please check the Google Colab notebook for the required packages.

We will be building a deep learning convolutional network from scratch. This requires huge amounts of computing power.

Luckily Google comes to our rescue! They have developed Google Colab, an online Python notebook environment that gives users free computing power.

We enable the free computing power by selecting the GPU option in the notebook settings.
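
As a quick sanity check (not part of the original notebook, but standard TensorFlow), you can confirm that the GPU is visible:

import tensorflow

# Prints something like '/device:GPU:0' once the GPU runtime is enabled
print(tensorflow.test.gpu_device_name())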


Step #2: Import the data

We will use the Pascal VOC image dataset for our deep learning model.

The Pascal VOC dataset is a standardised image dataset for object class recognition and is widely used by computer vision professionals to benchmark and showcase their work.

We use the wget command to download the dataset; it fetches the archive and saves it to the current working directory.

As a last step, we open the tarfile and extract it.

Good job! You have now successfully loaded in and extracted the dataset.

import tarfile

!wget -nc http://host.robots.ox.ac.uk/pascal/VOC/voc2009/VOCtrainval_11-May-2009.tar

tf = tarfile.open("/content/VOCtrainval_11-May-2009.tar")
tf.extractall()
Figure 2: VOC dataset structure

Step #3: Load the data

The current data structure is not optimal for building deep learning convolutional models.

We will have to transform the data into a more suitable format.

The extracted Pascal VOC dataset should have the two following folders:

  • Annotations: This folder contains all the information about the image labels, stored as XML files.
  • JPEGImages: This folder contains all the raw images.

We will first create a dataset with all the filenames and their respective labels. E.g. filename “2208–001068” has the following labels “bicycle” & “sofa”.

import os
import xml.etree.ElementTree as ET

directory_annotations = '/content/VOCdevkit/VOC2009/Annotations'

filenames = []
classification = []
for xml_file in os.listdir(directory_annotations):
    # Save each image filename together with its class labels
    xml_path = os.path.join(directory_annotations, xml_file)
    if os.path.isfile(xml_path):
        xml_tree = ET.parse(xml_path)
        root = xml_tree.getroot()
        imgname = root.find('filename').text.replace('.jpg', '')
        labels = []
        for obj in root.findall('object'):
            label = obj.find('name').text
            labels.append(label)
        filenames.append(imgname)
        classification.append(labels)

Step #4: Preprocess

In this step, we will perform the following tasks:

  • We split up the filenames and their respective classifications into a training, validation, and test set.
label_filenames_temp = os.listdir(directory_annotations)
filenames = []
for lbl in label_filenames_temp:
    filenames.append(lbl.split('.')[0])

filecount = len(filenames)
indexes = []
for index in range(filecount):
    indexes.append(index)

training_indexes = indexes[:int(filecount*0.7)]
validation_indexes = indexes[int(filecount*0.7):int(filecount*0.9)]
testing_indexes = indexes[int(filecount*0.9):]
  • We convert these labels to numeric values since deep learning networks require the input and output variables to be numbers.
from sklearn import preprocessing

directory_images = '/content/VOCdevkit/VOC2009/JPEGImages'
directory_annotations = '/content/VOCdevkit/VOC2009/Annotations'

labelnames = preprocessing.LabelEncoder()
labelnames.fit(["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
                "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
                "pottedplant", "sheep", "sofa", "train", "tvmonitor"])
  • We resize the images on loading to the (224, 224, 3) format. The literature advises this input size for the VGG16 model (Simonyan & Zisserman, 2014).
import cv2
import numpy as np

def generate_from_xml(filename):
    # Build a 20-dimensional multi-hot label vector and load the resized image
    label = np.zeros((20), dtype='float32')
    tree = ET.parse(os.path.join(directory_annotations, filename + ".xml"))
    raw_image = cv2.imread(os.path.join(directory_images, filename + ".jpg"))
    res_img = cv2.resize(raw_image, (224, 224))
    for elems in tree.iter():
        if elems.tag == "object":
            name = elems.find("name").text
            labelnr = labelnames.transform([name])[0]
            label[labelnr] = 1
    return label, res_img

Step #5: Datagenerator

If we ran our model on the dataset without using a data generator, we would run out of RAM. It is best practice to use a data generator when working with large datasets (as opposed to buying more RAM). We create our data generator class and instantiate it twice: once for the training set and once for the validation set, as sketched below.
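
The original generator code only appears as images in the source article. Below is a minimal sketch of such a data generator based on keras.utils.Sequence, assuming the generate_from_xml helper and the index lists from the previous steps (the batch_size value is illustrative):

import keras
import numpy as np

class DataGenerator(keras.utils.Sequence):
    """Loads batches of resized images and multi-hot label vectors on the fly."""

    def __init__(self, indexes, filenames, batch_size=32, shuffle=True):
        self.indexes = list(indexes)
        self.filenames = filenames
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch
        return len(self.indexes) // self.batch_size

    def __getitem__(self, idx):
        # Load one batch of (image, label) pairs from disk
        batch = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        images, labels = [], []
        for i in batch:
            label, img = generate_from_xml(self.filenames[i])
            images.append(img)
            labels.append(label)
        return np.array(images), np.array(labels)

    def on_epoch_end(self):
        # Reshuffle the sample order after every epoch
        if self.shuffle:
            np.random.shuffle(self.indexes)

training_generator = DataGenerator(training_indexes, filenames)
val_generator = DataGenerator(validation_indexes, filenames)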


Step #6: Create our model

In this step, we will build a classification convolutional neural network from scratch and train it to recognize the 20 target classes in the Pascal VOC dataset.

Our model architecture will be based on the popular VGG-16 architecture. This is a CNN with a total of 13 convolutional layers (see Figure 3 below).

Figure 3: The VGG-16 architecture

We opt for the sequential approach of building the model.
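
The original article does not show its import statements; the remaining snippets assume roughly the following (standalone Keras is used here, but the tf.keras equivalents work the same way):

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint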

model = Sequential()

We add 2 convolutional layers.

In the convolutional layers, multiple filters are applied to the image to extract different features.

Arguments given:

- Input-shape: The image given should be of the shape (224,224,3).

- Filters: The number of filters that the convolutional layer will learn.

- Kernel_size: Specifies the width and height of the 2D convolution window.

- Padding: Specifying “same” ensures that the spatial dimensions are the same after the convolution.

- Activation: This is more of a convenience argument. Here, we specify which activation function will be applied after the convolutional layers. We will apply the ReLU activation function. More on this later.

model.add(Conv2D(input_shape=(224,224,3), filters=64, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation="relu"))

Next, we add 1 maxpool layer.

Pooling is used to reduce the dimensionality of images by reducing the number of pixels in the output of the previous convolutional layer.

- Pool_size = (2,2): the window that slides over the output of the previous layer and from which the maximum value is taken.

- Strides = (2,2): the step size with which the pooling window moves along the x- and y-axis.

model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))

We continue to add layers to our deep learning network. The same logic as described above is applied.

model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))

Now the convolutional base is complete. To be able to generate a prediction, we have to flatten its output.

model.add(Flatten())

Add the dense layers. The dense layers feed the output of the convolutional base to their neurons.

Arguments:

- Units: Number of neurons

- Activation function: Relu

The ReLU activation function speeds up training since the gradient computation is very simple (0 or 1). It also means that negative values are not passed, or "activated", on to the next layer. As a result, only a subset of the neurons is activated, which is computationally efficient.

model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=4096, activation="relu"))

We add a sigmoid output layer to turn the output of the previous layer into per-class probabilities. The sigmoid is ideal for multi-label classification, which is why we use it instead of, for example, a softmax activation.

The probabilities produced by a sigmoid are independent and are not constrained to sum to one. This is crucial in a classification problem with multiple output labels.

We set the units argument to 20 since we have 20 possible classes.

model.add(Dense(units=20, activation="sigmoid"))

Step #7: Loss function & optimizer

As a final step, we have to compile the model. We use the RMSprop optimizer to speed up convergence towards a minimum of the loss. We set the learning rate to 0.001.

RMSprop, root mean square propagation, is an unpublished optimization algorithm that is nevertheless very popular among machine learning practitioners. It reduces the oscillations in the vertical direction while speeding up learning in the horizontal direction, which makes the model converge faster. The main difference with the regular gradient descent algorithm is how the parameter updates are calculated from the gradients; the update rule is shown below.
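
For reference (the original figure with the formula is not reproduced here), the standard RMSprop update, with decay rate ρ, learning rate η, and a small constant ε for numerical stability, is:

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t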

We opt for the binary cross-entropy loss. This loss function is recommended for multi-label classification, since the prediction for each class should not be influenced by the decision for another class.
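
Concretely, with 20 sigmoid outputs and a multi-hot target vector, the binary cross-entropy (as Keras computes it over the output vector) averages the per-class terms:

\mathcal{L} = -\frac{1}{20} \sum_{c=1}^{20} \left[ y_c \log(\hat{y}_c) + (1 - y_c) \log(1 - \hat{y}_c) \right]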

model.compile(optimizer=keras.optimizers.RMSprop(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Step #8: Model training

We use the EarlyStopping callback to stop training once the model performance stops improving on a hold-out dataset. In this way, we automatically get a suitable number of epochs while guarding against overfitting.

We instruct EarlyStopping to look for a minimum of the validation loss.

EarlyStopping only stops training when no further improvement is detected.

However, the last epoch is not necessarily the one with the best performance.

Therefore, we also use the ModelCheckpoint callback. This saves the best model observed during training, based on the validation loss.

filepath = "/content/drive/My Drive/MYCNN/CNN1505_v1.h5"
earlyStopping = EarlyStopping(monitor='val_loss', verbose=0, mode='min', patience=4)
mcp_save = ModelCheckpoint(filepath, save_best_only=True, monitor='val_loss', mode='min')

Now we will start training our deep learning neural network. We use Keras' fit_generator to load the data in batches. This is necessary since our entire training set doesn't fit in RAM.

We set the following arguments:

  • Use multiprocessing: Whether to use process-based threading
  • Workers: Number of threads generating batches in parallel.
history = model.fit_generator(generator=training_generator,
                              validation_data=val_generator,
                              use_multiprocessing=True,
                              workers=6,
                              epochs=20,
                              callbacks=[earlyStopping, mcp_save])

When our training has finished, we visualize our training and validation results. Two metrics are plotted:

  • Model accuracy
  • Model loss

Step #9: Validate our model

We see that the model quickly converged from a very large training loss in the first epoch to much lower values. This fast convergence is partly due to the chosen optimizer (RMSprop). The checkpoint then keeps the model with the lowest validation loss, and training stops once this metric has not improved for four epochs.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(history.history)
print(history.history.keys())

# Summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Figure 4: Training and validation accuracy per epoch

Figure 5: Training and validation loss per epoch

Step #10: Test our model performance

We now test our model on the test set to see how it performs on unseen data:
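
The evaluation code and its output are only shown as an image in the source article. A minimal sketch, assuming a test generator built the same way as the training and validation generators and the checkpoint file saved earlier, could look like this:

# Hypothetical test generator, built like the training/validation ones
test_generator = DataGenerator(testing_indexes, filenames, shuffle=False)

# Load the best checkpoint saved by ModelCheckpoint and evaluate on unseen data
best_model = keras.models.load_model(filepath)
test_loss, test_accuracy = best_model.evaluate_generator(test_generator)
print("Test loss: %.4f, test accuracy: %.4f" % (test_loss, test_accuracy))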


