Implementing Custom Data Generators in Keras

How to Implement Custom Data Generators for Enabling Dynamic Data Flow in a Keras Model

Jun 8 · 4 min read

Photo by Alexander Sinn on Unsplash

Data generators are one of the most useful features of the Keras API. Consider a scenario where you have so much data that it cannot all fit in RAM at once. What do you do? Purchasing more RAM obviously isn't an option.

Well, the solution is to load the mini-batches fed to the model dynamically. This is exactly what data generators do: they generate the model input on the fly, forming a pipeline from storage to RAM that loads the data only as and when it is required. Another advantage of this pipeline is that preprocessing routines can easily be applied to the mini-batches as they are prepared for the model.

In this article, we will see how to subclass the tf.keras.utils.Sequence class to implement custom data generators.

ImageDataGenerator

First things first: we will look at how the ImageDataGenerator API is used for dynamic image pipelining, and then see why custom generators are still needed.

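from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
        rescale=1./255,        # scale pixel values to [0, 1]
        shear_range=0.2,       # random shear augmentation
        zoom_range=0.2,        # random zoom augmentation
        horizontal_flip=True   # random left-right flips
)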

The ImageDataGenerator API can stream image data from directories as well as from file paths listed in a dataframe. Preprocessing steps such as scaling and augmentation can be included, and they are applied to the images in real time.

So, Why Custom Ones?

Model training is not limited to a single type of input and target. There are times when a model is fed multiple types of input at once. For example, say you are working on a multi-modal classification problem where you need to process text and image data simultaneously. Here, ImageDataGenerator alone will not do, since it is built for image data. And loading all the data at once isn't affordable. Hence, we tackle this issue by implementing a custom data generator.

Implementing a Custom Data Generator

We finally start with the implementation.

This will be a very generic implementation and hence can be directly copied. You just have to fill in the blanks and replace certain variables with your own logic.

As mentioned earlier, we will subclass the tf.keras.utils.Sequence class.

def __init__(
    self,
    df,
    x_col,
    y_col=None,
    batch_size=32,
    num_classes=None,
    shuffle=True
):
    self.batch_size = batch_size
    self.df = df
    # Row labels of the dataframe; batches are built by picking from these
    self.indices = self.df.index.tolist()
    self.num_classes = num_classes
    self.shuffle = shuffle
    self.x_col = x_col
    self.y_col = y_col
    # Build the initial (optionally shuffled) sample order
    self.on_epoch_end()

First, we define the constructor to initialize the configuration of the generator. Note that here we assume the paths to the data are stored in a dataframe column; hence the x_col and y_col parameters. The source could equally be a directory name from which you load the data.

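The on_epoch_end method is called at the end of every epoch. We can add routines like shuffling here. The constructor above also calls it once, so that the sample order exists before the first batch is requested.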

import numpy as np

def on_epoch_end(self):
    # Positions 0..N-1 into self.indices, rebuilt every epoch
    self.index = np.arange(len(self.indices))
    if self.shuffle:
        np.random.shuffle(self.index)

Basically, this snippet shuffles the order in which the dataframe rows are visited; the dataframe itself is left untouched.
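To see the shuffling step in isolation, here is a small sketch that uses NumPy only. The row labels and the seed are made up for illustration, and it uses a seeded generator instead of np.random.shuffle purely to keep the demo repeatable. The point is that only the array of positions is permuted, while the underlying data stays in place.

```python
import numpy as np

row_labels = ["a", "b", "c", "d", "e"]  # hypothetical stand-in for self.indices
rng = np.random.default_rng(seed=0)     # seeded only to make the demo repeatable

positions = np.arange(len(row_labels))  # [0, 1, 2, 3, 4]
rng.shuffle(positions)                  # in-place permutation of the positions

# Each epoch still visits every row exactly once, just in a different order.
epoch_order = [row_labels[p] for p in positions]
print(sorted(epoch_order) == row_labels)  # True
```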

Another utility method is __len__. It returns the number of steps (batches) in an epoch, computed from the number of samples and the batch size.

def __len__(self):
    # Denotes the number of batches per epoch
    # (floor division: a trailing partial batch is not counted)
    return len(self.indices) // self.batch_size
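Note that floor division leaves out any samples that do not fill a whole batch. The arithmetic can be checked with a small sketch; the two helper names below are hypothetical and only contrast the two common choices: dropping the remainder, as in the snippet above, or rounding up so that a smaller final batch is served as well.

```python
import math


def steps_dropping_remainder(num_samples: int, batch_size: int) -> int:
    # Same arithmetic as __len__ above: an incomplete final batch is skipped.
    return num_samples // batch_size


def steps_keeping_remainder(num_samples: int, batch_size: int) -> int:
    # Alternative: round up so the last, smaller batch is counted too.
    return math.ceil(num_samples / batch_size)


print(steps_dropping_remainder(100, 32))  # 3 (4 samples are skipped each epoch)
print(steps_keeping_remainder(100, 32))   # 4 (the last batch has 4 samples)
```

With shuffling enabled, the skipped samples differ from epoch to epoch, so dropping the remainder is usually harmless during training.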

Next is the __getitem__ method which is called with the batch number as an argument to obtain a given batch of data.

def __getitem__(self, index):
    # Generate one batch of data
    # Positions (in the shuffled order) that belong to batch number `index`
    positions = self.index[index * self.batch_size:(index + 1) * self.batch_size]
    # Find list of IDs (dataframe row labels) for those positions
    batch = [self.indices[k] for k in positions]
    # Generate data
    X, y = self.__get_data(batch)
    return X, y

Basically, we obtain the shuffled indices for the requested batch, build the batch in a separate method, and return it to the caller. The batch-building logic could be written right here, but it is good practice to keep it in its own method.
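The slicing can be tried out on its own. The sketch below is only an illustration: the row labels, the batch size and the helper name rows_for_batch are assumptions, and the order is left unshuffled so the result is easy to follow.

```python
import numpy as np

batch_size = 4
indices = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]  # stand-in for self.indices
order = np.arange(len(indices))                      # stand-in for self.index


def rows_for_batch(batch_number: int) -> list:
    # Same slicing as in __getitem__: positions first, then row labels.
    positions = order[batch_number * batch_size:(batch_number + 1) * batch_size]
    return [indices[k] for k in positions]


print(rows_for_batch(0))  # [10, 11, 12, 13]
print(rows_for_batch(1))  # [14, 15, 16, 17]
print(rows_for_batch(2))  # [18, 19] (a partial batch, never requested when __len__ rounds down)
```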

Finally, we write the logic for our data generation in the __get_data method. Since this method is only ever called by us, we can name it anything. Moreover, there is no reason for this method to be public, hence the leading double underscore, which Python treats as private through name mangling.

def __get_data(self, batch):
    # X.shape: (batch_size, *dim)
    # We can have multiple Xs and can return them as a list
    X = []  # inputs of this batch
    y = []  # targets of this batch
    # Generate data
    for row_id in batch:
        row = self.df.loc[row_id]
        # Store sample: replace ... with your logic to load it, e.g. from row[self.x_col]
        X.append(...)
        # Store class: replace ... with your logic for the label, e.g. from row[self.y_col]
        y.append(...)
    return np.array(X), np.array(y)

Additionally, we can add preprocessing and augmentation routines here so that they run in real time. In the above piece of code, X and y are loaded from the data sources according to the batch of row indices passed to the method. This can be anything from loading images to loading text, both simultaneously, or any other kind of data.
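As one possible way to fill in those blanks, the sketch below builds a batch from samples that are already in memory. Everything in it is an assumption made for illustration: the helper name build_batch, the fake 2x2 images, the rescaling by 255 (mirroring the rescale option shown earlier) and the one-hot encoding of integer labels.

```python
import numpy as np


def build_batch(samples, labels, num_classes):
    # Stack per-sample arrays into one array of shape (batch_size, *dim).
    X = np.stack(samples).astype("float32")
    # Example preprocessing step: rescale 8-bit pixel values to [0, 1].
    X /= 255.0
    # One-hot encode integer class labels: shape (batch_size, num_classes).
    y = np.eye(num_classes, dtype="float32")[np.asarray(labels)]
    return X, y


# Two fake 2x2 grayscale "images" standing in for data read from storage.
samples = [np.full((2, 2), 255, dtype="uint8"), np.zeros((2, 2), dtype="uint8")]
X, y = build_batch(samples, labels=[1, 0], num_classes=3)
print(X.shape, y.shape)  # (2, 2, 2) (2, 3)
```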

After incorporating all the methods, the complete generator looks like this:

Complete Custom Data Generator Class
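Assembled from the snippets above, a minimal runnable sketch of the class might look as follows. Several parts are assumptions rather than the original listing: the class name DataGenerator, the call to super().__init__(), the _load_sample hook, the idea that each dataframe cell already holds the numeric features (instead of a path to load from), integer labels that are one-hot encoded when num_classes is given, and the toy dataframe with its "features" and "label" columns.

```python
import numpy as np
import pandas as pd
import tensorflow as tf


class DataGenerator(tf.keras.utils.Sequence):
    """Serves (X, y) mini-batches built from the rows of a dataframe."""

    def __init__(self, df, x_col, y_col=None, batch_size=32,
                 num_classes=None, shuffle=True):
        super().__init__()
        self.df = df
        self.x_col = x_col
        self.y_col = y_col
        self.batch_size = batch_size
        self.num_classes = num_classes
        self.shuffle = shuffle
        self.indices = self.df.index.tolist()
        self.on_epoch_end()

    def on_epoch_end(self):
        # Rebuild the visiting order and reshuffle it if requested.
        self.index = np.arange(len(self.indices))
        if self.shuffle:
            np.random.shuffle(self.index)

    def __len__(self):
        # Number of full batches per epoch (a trailing partial batch is dropped).
        return len(self.indices) // self.batch_size

    def __getitem__(self, index):
        positions = self.index[index * self.batch_size:(index + 1) * self.batch_size]
        batch = [self.indices[k] for k in positions]
        return self.__get_data(batch)

    def _load_sample(self, value):
        # ASSUMPTION: the cell already contains the features. Replace this
        # hypothetical hook with your own logic, e.g. reading a file from a path.
        return np.asarray(value, dtype="float32")

    def __get_data(self, batch):
        X = np.stack([self._load_sample(self.df.loc[i, self.x_col]) for i in batch])
        if self.y_col is None:
            return X  # no target column given: inputs only
        y = np.asarray([self.df.loc[i, self.y_col] for i in batch])
        if self.num_classes is not None:
            # ASSUMPTION: labels are integers in [0, num_classes).
            y = np.eye(self.num_classes, dtype="float32")[y]
        return X, y


# Toy dataframe, made up for illustration only.
df = pd.DataFrame({
    "features": [[float(i), float(i) + 0.5] for i in range(10)],
    "label": [i % 2 for i in range(10)],
})
gen = DataGenerator(df, x_col="features", y_col="label", batch_size=4, num_classes=2)
X, y = gen[0]
print(len(gen), X.shape, y.shape)  # 2 (4, 2) (4, 2)
```

An instance like gen can then be passed to model.fit in place of in-memory arrays, provided the model's input and output shapes match the batches it produces.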

Conclusion

In this article, we saw how useful data generators are when training models on a huge amount of data. We peeked at the ImageDataGenerator API to see what it offers and why custom generators are still needed. Then, we learned how to implement a custom data generator by subclassing the tf.keras.utils.Sequence class.

Feel free to copy this code and add your own generator logic to it.
