A reading guide about Deep Learning with CNNs



Part II: Image segmentation

Welcome back to Part II of this series. If you missed the first part, have a look here: Part I: Image recognition and convolutional backbones.

In this part, you will find a guide through the literature on image segmentation with convolutional neural networks (CNNs) up to 2019. It adds non-scientific sources to this open access review paper to further build an intuitive understanding of the evolution of CNNs.

As in Part I, you can find the tables of the sources in this GitHub repository:

Now, let’s dive into the next chapter of our adventure of deep learning with CNNs.

A rough overview of image segmentation with CNNs

In image segmentation, a single class is predicted for each pixel, like this:


Example of image segmentation. Modified from Hoeser and Kuenzer 2020, p. 8 [1]

When CNNs, which we discussed in Part I, became more popular, they were first used for so-called patch-based image segmentation. Here, a CNN moves over the input image in a moving-window fashion and predicts the class of either the center pixel of each patch (a small part of the whole image) or the complete patch.
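The moving-window idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, `bright_or_dark` is a trivial stand-in for a trained CNN classifier, and borders are handled by clamping coordinates, which is one of several possible choices.

```python
def patch_based_segmentation(image, patch_size, classify_patch):
    """Label every pixel by classifying the patch centered on it.

    image: 2D list of pixel values; patch_size: odd window width;
    classify_patch: callable standing in for a CNN (assumption).
    """
    half = patch_size // 2
    height, width = len(image), len(image[0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Cut out the window around (y, x); clamp coordinates at the border.
            patch = [
                [image[min(max(yy, 0), height - 1)][min(max(xx, 0), width - 1)]
                 for xx in range(x - half, x + half + 1)]
                for yy in range(y - half, y + half + 1)
            ]
            # The classifier output becomes the label of the center pixel.
            row.append(classify_patch(patch))
        labels.append(row)
    return labels


def bright_or_dark(patch):
    """Hypothetical stand-in for a CNN: class 1 if the patch is mostly bright."""
    values = [v for patch_row in patch for v in patch_row]
    return 1 if sum(values) / len(values) > 0.5 else 0


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
label_map = patch_based_segmentation(image, 3, bright_or_dark)
```

Note that the classifier runs once per pixel and neighboring patches overlap heavily, which is one reason this approach is computationally wasteful compared to the fully convolutional networks discussed next.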

With the work of Long et al. 2014 [2], so-called fully convolutional networks (FCNs) were introduced, and image segmentation with CNNs became much more sophisticated. Overall, the processing in FCNs looks like this: first, features are extracted from the input image by a convolutional backbone (the encoder, see Part I). In the process, the resolution gets smaller while the feature depth grows. The extracted feature maps carry rich semantic meaning but no precise localization. Since we need pixel-wise predictions for image segmentation, these feature maps are then upsampled back to the input resolution (the decoder). The difference from the input image is that each pixel now holds a discrete class label, so the image is segmented into semantically meaningful classes.
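To make the shape bookkeeping concrete, here is a small sketch in plain Python. The numbers are assumptions chosen for illustration (each encoder stage halves the resolution and doubles the feature depth; real backbones differ), and both function names are hypothetical. The second function shows the final step: turning per-pixel class scores at input resolution into discrete labels.

```python
def encoder_shapes(height, width, channels, stages):
    """Trace (height, width, channels) through a hypothetical encoder.

    Assumption: every stage halves the resolution and doubles the depth.
    """
    shapes = [(height, width, channels)]
    for _ in range(stages):
        height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((height, width, channels))
    return shapes


def to_label_map(scores):
    """Pick the highest-scoring class index for every pixel.

    scores[y][x] is a list with one score per class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


shapes = encoder_shapes(256, 256, 64, stages=3)
labels = to_label_map([[[0.1, 0.9], [0.8, 0.2]]])
```

In this sketch the deepest feature map has only 32 x 32 positions, which illustrates why a decoder is needed before a label can be assigned to each of the 256 x 256 input pixels.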

Two major concepts exist for how the upsampling in the decoder can be done:

  • Naive decoder (a term used e.g. in Chen et al. 2018 [3]): Upsampling is done by applying e.g. bilinear interpolation.
  • Encoder-decoder: Upsampling is done by trainable deconvolution operations and/or by merging features from the encoder part, which carry more precise localization information, during upsampling. See these examples:


Source: Hoeser and Kuenzer 2020 p. 17 [1]
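The naive-decoder variant from the list above has no trainable parameters, which a short sketch can show. The following plain-Python function is a minimal illustration of bilinear interpolation on a single-channel feature map, not the implementation of any specific model. It assumes that the corner pixels of input and output are aligned; deep learning frameworks offer other conventions as well, and the function name is hypothetical.

```python
def bilinear_upsample(fmap, out_h, out_w):
    """Resize a 2D feature map with bilinear interpolation.

    Assumption: input and output corner pixels are aligned.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Position of the output row in input coordinates.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbors: first along x, then along y.
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


coarse = [[0, 2],
          [4, 6]]
fine = bilinear_upsample(coarse, 3, 3)
```

The new values are just weighted averages of existing ones, so no spatial detail is recovered that the coarse map did not already contain. This is the gap that encoder-decoder designs try to close.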

To dive into image segmentation with deep learning, the sources in the table below are good starting points. Be aware that, besides CNNs, there are other deep learning model types that perform image segmentation, such as generative adversarial networks (GANs) or long short-term memory (LSTM) approaches, but this guide focuses on CNNs. Also, models of the R-CNN family are sometimes discussed from an image segmentation perspective. This guide will discuss them when we reach object detection in the next part. So do not be confused when you read about them somewhere else (for example in review papers) and they are not mentioned here yet.

The evolution of FCNs for image segmentation


Performance of different FCN-inspired architectures on the PASCAL-VOC 2012 benchmark dataset. Models marked with * were tested on other datasets. Source: Hoeser and Kuenzer 2020 p. 17 [1]

The evolution of the DeepLab family is characteristic of the evolution of FCN-inspired models for image segmentation. DeepLab variants can be found among both naive-decoder and encoder-decoder models. Hence, this guide follows that family, first looking at naive-decoder models and then turning to encoder-decoder models.

Naive-decoder models

The most important insights from naive-decoder models are the establishment of so-called atrous convolutions and the exploitation of long-range image context for pixel-level prediction. Atrous convolutions are a variant of normal convolutions that allow the receptive field to grow without a loss of image resolution. The famous Atrous Spatial Pyramid Pooling module (ASPP module), found in DeepLab-V2 [4] and later versions, combines both: atrous convolutions and long-range image context exploitation. When reading the following literature, focus on the development of these features: atrous convolutions, the ASPP module, and long-range image context exploitation/parsing.
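A one-dimensional sketch in plain Python may help to see how an atrous convolution widens the receptive field while the output keeps the input length. It is an illustration only: the function names are hypothetical, zero padding and an odd kernel length are assumed, and, as is common in deep learning, the kernel is applied without flipping (cross-correlation).

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are `rate` positions apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate=1):
    """1D atrous (dilated) convolution with zero padding, stride 1.

    Assumptions: odd kernel length, kernel not flipped.
    The output has the same length as the input for every rate.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0
        for tap, weight in enumerate(kernel):
            # Neighboring taps are `rate` positions apart in the input.
            idx = i + (tap - center) * rate
            if 0 <= idx < len(signal):  # positions outside count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 5]
dense = atrous_conv1d(signal, [1, 1, 1], rate=1)   # sees 3 neighboring values
atrous = atrous_conv1d(signal, [1, 1, 1], rate=2)  # spans 5 values, same 3 weights
```

With rate 2, the three weights cover a span of five input positions, yet the number of parameters and the output length stay the same. Roughly speaking, an ASPP module applies several such rates in parallel to the same feature map and combines the results.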

Encoder-decoder models

Today, the most famous encoder-decoder is probably the U-Net [5], a CNN that was developed for analyzing medical images. Its clear structure invited many researchers to experiment with and adapt it, and it is famous for its skip connections, which allow features to be shared between the encoder and decoder paths. Encoder-decoder models focus on enhancing the semantically rich feature maps during upsampling in the decoder with more locally precise feature maps from the encoder.
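The core of such a skip connection can be sketched in plain Python as upsampling followed by concatenation along the channel axis. This is a simplified illustration under assumptions, not the U-Net implementation: nearest-neighbor upsampling stands in for the trainable up-convolution, the convolutions that follow the merge are left out, and all names are hypothetical.

```python
def upsample_nearest_2x(channel):
    """Double the height and width of one 2D channel by repeating values."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_merge(encoder_features, decoder_features):
    """Merge encoder and decoder features along the channel axis.

    Both arguments are lists of 2D channels. The decoder features are
    assumed to have half the resolution of the encoder features.
    """
    upsampled = [upsample_nearest_2x(channel) for channel in decoder_features]
    enc_size = (len(encoder_features[0]), len(encoder_features[0][0]))
    up_size = (len(upsampled[0]), len(upsampled[0][0]))
    if enc_size != up_size:
        raise ValueError("encoder and upsampled decoder sizes differ")
    # Concatenation: locally precise encoder channels sit next to
    # semantically rich decoder channels.
    return encoder_features + upsampled


encoder = [[[1, 2], [3, 4]],
           [[5, 6], [7, 8]]]  # 2 channels at 2 x 2
decoder = [[[9]]]             # 1 channel at 1 x 1
merged = skip_merge(encoder, decoder)
```

After the merge, a subsequent convolution can read both kinds of channels at every pixel, which is how the precise localization from the encoder reaches the decoder.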

With the literature at hand, you will be able to reflect on modern image segmentation papers and implementations with CNNs. Let’s meet again in Part III, where we will discuss object detection.

References

[1] Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sensing 2020, 12(10), 1667. DOI: 10.3390/rs12101667.

[2] Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 39, 640–651.

[3] Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision–ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851.

[4] Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 40, 834–848.

[5] Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.

