Neural Networks Intuitions: 8. Translation Invariance in Object Detectors



May 3 · 4 min read

Hello everyone!

This article is a short one and focuses on a subtle but often overlooked concept in object detectors, especially in single-shot detectors: translation invariance.

Let’s understand what translation invariance is and what makes an image classifier/object detector translation invariant.

*Note: This article assumes you have background knowledge of how single-stage and two-stage detectors work :-)

Translation Invariance:

Translation in computer vision means displacement in space and Invariance means the property of being unchanged.

Therefore when we say an image classifier or an object detector is translation invariant, it means:

An image classifier can predict a class accurately regardless of where the class (more specifically, the pattern) is located along the image’s spatial dimensions. Similarly, a detector can detect an object irrespective of where it appears in the image.
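Translation itself is easy to picture with a small numpy sketch (illustrative only; the array stands in for an image):

```python
import numpy as np

# a tiny "digit" pattern in the top-left corner of a 6x6 image
digit = np.zeros((6, 6))
digit[1:3, 1:3] = 1.0

# the same pattern translated three pixels to the right
shifted = np.roll(digit, shift=(0, 3), axis=(0, 1))

# translation changes where the pattern sits, not what the pattern is
assert digit.sum() == shifted.sum()
```

A translation-invariant classifier should assign both arrays the same label.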

Let us look at an example of each problem to make things clear.

Image Classification: Untranslated and Translated versions of an image from MNIST dataset


Object Detection: Translated versions of a dog, the object to be detected in the image.

In this article, we will consider only convolutional neural networks, whether classifiers or detectors, and see whether they are translation invariant or not!

Translation Invariance in Convolutional Classifiers:

Are CNNs translation invariant? If so, what makes them invariant to translation?

Firstly, CNNs are not completely translation invariant but only to some extent. Next, it is ‘pooling’ that makes them translation invariant, not the convolution operation (applying filters).
This statement applies only to classifiers, not to object detectors.

If we read Hinton’s paper on translation invariance in CNNs, he clearly states that the pooling layer was introduced to reduce computational complexity and that translation invariance was only a by-product of it.

One can make CNNs completely translation invariant by feeding the right kind of data — although this may not be 100% feasible.

Note: I won’t be addressing the question of how pooling makes CNNs translation invariant. You can check out the link below :-)

http://cs231n.stanford.edu/reports/2016/pdfs/107_Report.pdf
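To see the effect in miniature, here is a small numpy sketch (illustrative only, not taken from the report above) of how non-overlapping max pooling absorbs a small shift:

```python
import numpy as np

def max_pool_2x2(x):
    # non-overlapping 2x2 max pooling over a 2D feature map
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# a 4x4 feature map with one strong activation...
a = np.zeros((4, 4))
a[0, 0] = 1.0

# ...and the same activation shifted by one pixel,
# still inside the same 2x2 pooling window
b = np.zeros((4, 4))
b[1, 1] = 1.0

# the pooled outputs are identical: the one-pixel shift is absorbed
assert np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```

Shifts larger than the pooling window do change the pooled output, which is why the invariance is only partial.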

Translation Invariance in Two-stage Detectors:

Two-stage object detectors have the following components:

  1. Region proposal stage
  2. Classification stage

The first stage predicts the locations of objects of interest (i.e., region proposals) and the second stage classifies those region proposals.

We can see that the first stage predicts foreground object locations, which means the problem is now reduced to image classification, performed by the second stage. This reduction makes a two-stage detector translation invariant without introducing any explicit changes to the neural network architecture.

This decoupling of the object’s class prediction from the object’s bounding box prediction makes a two-stage detector translation invariant!
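The decoupling can be sketched as a toy pipeline; the functions below are hypothetical stand-ins, not a real detector:

```python
import numpy as np

def propose_regions(image):
    # stage 1 (toy): propose one box around any nonzero activation
    ys, xs = np.nonzero(image)
    if len(ys) == 0:
        return []
    return [(ys.min(), xs.min(), ys.max() + 1, xs.max() + 1)]

def classify_crop(crop):
    # stage 2 (toy): classification sees only the cropped region,
    # so the object's absolute position no longer matters
    return "dog" if crop.sum() > 2 else "background"

def detect(image):
    return [(box, classify_crop(image[box[0]:box[2], box[1]:box[3]]))
            for box in propose_regions(image)]

left = np.zeros((8, 8))
left[2:4, 0:2] = 1.0    # "dog" on the left side of the image
right = np.zeros((8, 8))
right[2:4, 6:8] = 1.0   # the same "dog" on the right side
```

Because stage two only ever sees the cropped proposal, the crop looks the same no matter where the object sat in the original image, so `detect` labels both the left and right dog correctly.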

Translation Invariance in Single-stage Detectors:

Now that we have looked into two-stage detectors, we know that a single-stage detector needs to couple box and class predictions. One way of doing that is to make dense predictions (anchors) on a feature map, i.e., at every grid cell or group of cells on the feature map.

Read the following article where I explain anchors in depth: Neural Networks Intuitions: 5. Anchors and Object Detection.

Since these dense predictions are made by convolving filters over feature maps, the network can detect the same pattern when it occurs at a different location on the feature map.

For example, let us consider a neural network trained to detect dogs in an image. The filters in the final conv layer are responsible for recognizing these dog patterns.

We feed data to the network such that the dog always appears on the left side of the image, and then test it with an image where the dog appears on the right side.


One of the filters in the last layer learns the above dog pattern, and since that same filter is convolved throughout the feature map and a prediction is made at every location, it recognizes the same dog pattern in a different location!
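A minimal numpy sketch of this behavior (the 2x2 “dog pattern” and the naive convolution loop are purely illustrative):

```python
import numpy as np

def conv2d_valid(x, k):
    # naive 'valid' cross-correlation: slide the filter over every location
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i+kh, j:j+kw] * k).sum()
    return out

pattern = np.array([[1., 0.], [0., 1.]])  # the "dog pattern" the filter learned
kernel = pattern.copy()                   # a filter tuned to that pattern

fmap_left = np.zeros((6, 6))
fmap_left[1:3, 0:2] = pattern    # dog on the left
fmap_right = np.zeros((6, 6))
fmap_right[1:3, 4:6] = pattern   # the same dog on the right

# the same filter fires equally strongly at both locations
assert conv2d_valid(fmap_left, kernel).max() == conv2d_valid(fmap_right, kernel).max()
```

The peak response is equally strong in both cases; only its location moves, which is exactly what a detector needs.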

Finally, to answer the question: why do filters make detectors translation invariant but not classifiers?

Filters in conv nets learn local features in an image rather than taking in the global context. Since the problem of object detection is to detect local features (objects) in the image, and not to make predictions from the entire feature map (which is what happens in an image classifier), filters make detectors invariant to translation.
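The contrast can be sketched numerically, under the illustrative assumption of a classifier head that scores the whole flattened map with fixed weights:

```python
import numpy as np

rng = np.random.default_rng(0)
w_global = rng.normal(size=36)  # weights over the whole flattened 6x6 map

fmap_left = np.zeros((6, 6))
fmap_left[1, 1] = 1.0    # activation on the left
fmap_right = np.zeros((6, 6))
fmap_right[1, 4] = 1.0   # the same activation on the right

# a global (fully connected) score depends on where the activation is...
score_left = float(w_global @ fmap_left.ravel())
score_right = float(w_global @ fmap_right.ravel())
assert score_left != score_right

# ...while the strongest per-location activation is the same in both maps
assert fmap_left.max() == fmap_right.max()
```

The global head ties each spatial position to its own weight, so moving the pattern changes the score; a per-location response does not care where the pattern landed.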

That’s all for this eighth instalment of my series. I hope you folks were able to get a good grasp of translation invariance in general and of what makes a detector invariant to object translation in images. Please feel free to correct me if I am wrong :-)

Cheers!

