Neural Networks Intuitions: 8. Translation Invariance in Object Detectors



May 3 · 4 min read

Hello everyone!

This article is a short one and focuses on a subtle but often overlooked concept in object detectors, especially in single-shot detectors: translation invariance.

Let’s understand what translation invariance is and what makes an image classifier/object detector translation invariant.

*Note: This article assumes you have background knowledge of how single-stage and two-stage detectors work :-)

Translation Invariance:

Translation in computer vision means displacement in space and Invariance means the property of being unchanged.

Therefore when we say an image classifier or an object detector is translation invariant, it means:

An image classifier can predict a class accurately regardless of where the class (more specifically, the pattern) is located along the image’s spatial dimensions. Similarly, a detector can detect an object irrespective of where it appears in the image.
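Translation itself is easy to picture with a small numpy sketch (illustrative only; the array stands in for an image):

```python
import numpy as np

# a tiny "digit" pattern in the top-left corner of a 6x6 image
digit = np.zeros((6, 6))
digit[1:3, 1:3] = 1.0

# the same pattern translated three pixels to the right
shifted = np.roll(digit, shift=(0, 3), axis=(0, 1))

# translation changes where the pattern sits, not what the pattern is
assert digit.sum() == shifted.sum()
```

A translation-invariant classifier should assign both arrays the same label.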

Let us look at an example of each problem to make things clear.

Image Classification: Untranslated and Translated versions of an image from MNIST dataset


Object Detection: Translated versions of a dog, the object to be detected in the image.

In this article, we will consider only convolutional neural networks, whether classifiers or detectors, and see whether they are translation invariant or not!

Translation Invariance in Convolutional Classifiers:

Are CNNs translation invariant? If so, what makes them invariant to translation?

Firstly, CNNs are not completely translation invariant but only to some extent. Next, it is ‘pooling’ that makes them translation invariant, not the convolution operation (applying filters).
This statement applies only to classifiers, not to object detectors.

If we read Hinton’s paper on translation invariance in CNNs, he clearly states that the pooling layer was introduced to reduce computational complexity and that translation invariance was only a by-product of it.

One can make CNNs completely translation invariant by feeding the right kind of data — although this may not be 100% feasible.

Note: I won’t be addressing the question of how pooling makes CNNs translation invariant. You can check out the link below :-)

http://cs231n.stanford.edu/reports/2016/pdfs/107_Report.pdf
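To see the effect in miniature, here is a small numpy sketch (illustrative only, not taken from the report above) of how non-overlapping max pooling absorbs a small shift:

```python
import numpy as np

def max_pool_2x2(x):
    # non-overlapping 2x2 max pooling over a 2D feature map
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# a 4x4 feature map with one strong activation...
a = np.zeros((4, 4))
a[0, 0] = 1.0

# ...and the same activation shifted by one pixel,
# still inside the same 2x2 pooling window
b = np.zeros((4, 4))
b[1, 1] = 1.0

# the pooled outputs are identical: the one-pixel shift is absorbed
assert np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```

Shifts larger than the pooling window do change the pooled output, which is why the invariance is only partial.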

Translation Invariance in Two-stage Detectors:

Two-stage object detectors have the following components:

  1. Region proposal stage
  2. Classification stage

The first stage predicts the locations of objects of interest (i.e., region proposals) and the second stage classifies those region proposals.

We can see that the first stage predicts foreground object locations, which means the problem is now reduced to image classification, performed by the second stage. This reduction makes a two-stage detector translation invariant without introducing any explicit changes to the neural network architecture.

This decoupling of the object’s class prediction from the object’s bounding box prediction makes a two-stage detector translation invariant!
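The decoupling can be sketched as a toy pipeline; the functions below are hypothetical stand-ins, not a real detector:

```python
import numpy as np

def propose_regions(image):
    # stage 1 (toy): propose one box around any nonzero activation
    ys, xs = np.nonzero(image)
    if len(ys) == 0:
        return []
    return [(ys.min(), xs.min(), ys.max() + 1, xs.max() + 1)]

def classify_crop(crop):
    # stage 2 (toy): classification sees only the cropped region,
    # so the object's absolute position no longer matters
    return "dog" if crop.sum() > 2 else "background"

def detect(image):
    return [(box, classify_crop(image[box[0]:box[2], box[1]:box[3]]))
            for box in propose_regions(image)]

left = np.zeros((8, 8))
left[2:4, 0:2] = 1.0    # "dog" on the left side of the image
right = np.zeros((8, 8))
right[2:4, 6:8] = 1.0   # the same "dog" on the right side
```

Because stage two only ever sees the cropped proposal, the crop looks the same no matter where the object sat in the original image, so `detect` labels both the left and right dog correctly.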

Translation Invariance in Single-stage Detectors:

Now that we have looked into two-stage detectors, we know that a single-stage detector needs to couple box and class predictions. One way of doing that is to make dense predictions (anchors) on a feature map, i.e., at every grid cell or group of cells on the feature map.

Read the following article where I explain anchors in depth: Neural Networks Intuitions: 5. Anchors and Object Detection.

Since these dense predictions are made by convolving filters over feature maps, the network can detect the same pattern when it occurs at a different location on the feature map.

For example, let us consider a neural network trained to detect dogs in an image. The filters in the final conv layer are responsible for recognizing these dog patterns.

We feed data to the network such that the dog always appears on the left side of the image, and then test it with an image where the dog appears on the right side.


One of the filters in the last layer learns the above dog pattern, and since that same filter is convolved throughout the feature map and a prediction is made at every location, it recognizes the same dog pattern in a different location!
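A minimal numpy sketch of this behavior (the 2x2 “dog pattern” and the naive convolution loop are purely illustrative):

```python
import numpy as np

def conv2d_valid(x, k):
    # naive 'valid' cross-correlation: slide the filter over every location
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i+kh, j:j+kw] * k).sum()
    return out

pattern = np.array([[1., 0.], [0., 1.]])  # the "dog pattern" the filter learned
kernel = pattern.copy()                   # a filter tuned to that pattern

fmap_left = np.zeros((6, 6))
fmap_left[1:3, 0:2] = pattern    # dog on the left
fmap_right = np.zeros((6, 6))
fmap_right[1:3, 4:6] = pattern   # the same dog on the right

# the same filter fires equally strongly at both locations
assert conv2d_valid(fmap_left, kernel).max() == conv2d_valid(fmap_right, kernel).max()
```

The peak response is equally strong in both cases; only its location moves, which is exactly what a detector needs.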

Finally, to answer the question: why do filters make detectors translation invariant but not classifiers?

Filters in conv nets learn local features in an image rather than taking in the global context. Since the problem of object detection is to detect local features (objects) in the image, and not to make predictions from the entire feature map (which is what happens in an image classifier), filters make detectors invariant to translation.
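The contrast can be sketched numerically, under the illustrative assumption of a classifier head that scores the whole flattened map with fixed weights:

```python
import numpy as np

rng = np.random.default_rng(0)
w_global = rng.normal(size=36)  # weights over the whole flattened 6x6 map

fmap_left = np.zeros((6, 6))
fmap_left[1, 1] = 1.0    # activation on the left
fmap_right = np.zeros((6, 6))
fmap_right[1, 4] = 1.0   # the same activation on the right

# a global (fully connected) score depends on where the activation is...
score_left = float(w_global @ fmap_left.ravel())
score_right = float(w_global @ fmap_right.ravel())
assert score_left != score_right

# ...while the strongest per-location activation is the same in both maps
assert fmap_left.max() == fmap_right.max()
```

The global head ties each spatial position to its own weight, so moving the pattern changes the score; a per-location response does not care where the pattern landed.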

That’s all for this eighth instalment of my series. I hope you folks were able to get a good grasp of translation invariance in general and of what makes a detector invariant to object translation in images. Please feel free to correct me if I am wrong :-)

Cheers!

