Architecture comparison of AlexNet, VGGNet, ResNet, Inception, DenseNet

Layer descriptions, hyperparameters, and ILSVRC challenge results

Khush Patel

Mar 8 · 9 min read

Hello readers! If you are looking for a guide to AlexNet, VGGNet, ResNet, Inception and DenseNet, you are in the right place. Read on for a detailed look at each architecture. Enjoy!

AlexNet

AlexNet was the first large-scale convolutional neural network architecture to do well on ImageNet classification. It was entered into the ILSVRC competition and outperformed all previous non-deep-learning-based models by a significant margin.

The AlexNet architecture is a conv layer followed by a pooling layer and a normalization layer, then another conv-pool-norm group, then a few more conv layers, a pooling layer, and finally several fully connected layers. It looks very similar to the LeNet network, just with more layers in total: five conv layers, and two fully connected layers before the final fully connected layer that maps to the output classes.

AlexNet was trained on ImageNet, with input images of size 227 x 227 x 3. The first layer is a conv layer with 96 filters of size 11 x 11 applied at stride 4. Its output volume is 55 x 55 x 96, and it has about 35K parameters. The second layer is a pooling layer, which applies a 3 x 3 window at stride 2. The output volume of the pooling layer is 27 x 27 x 96, with 0 parameters to learn. The pooling layer learns nothing because parameters are the weights a layer is trying to learn. Convolutional layers have such weights, but pooling only applies a fixed rule: look at the pooling region and take the max. So there are no learned parameters.
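The numbers above can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are made up for this example, and it assumes no padding and one bias per filter, which the text does not state explicitly.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(kernel_size, in_channels, out_channels, bias=True):
    """Number of learnable parameters in a square-kernel conv layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# AlexNet CONV1: 227x227x3 input, 96 filters of 11x11, stride 4, no padding (assumed).
conv1_size = conv_output_size(227, kernel_size=11, stride=4)
conv1_params = conv_param_count(11, in_channels=3, out_channels=96)

# POOL1: 3x3 max-pooling window at stride 2; pooling has no weights to learn.
pool1_size = conv_output_size(conv1_size, kernel_size=3, stride=2)

print(conv1_size, conv1_params, pool1_size)  # 55 34944 27
```

So the first conv layer has 34,944 parameters (the "35K" quoted above), and the pooling layer shrinks 55 x 55 to 27 x 27 without adding any.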

The first conv layer uses 11 x 11 filters, the second uses 5 x 5 filters, and the remaining ones use 3 x 3 filters. At the end there are two fully connected layers of size 4096, and the last layer, FC8, feeds the softmax over the 1000 ImageNet classes.

Hyperparameters:

AlexNet was one of the first large CNNs to use the ReLU non-linearity, and it also uses normalization layers. For data augmentation, AlexNet used flipping, jittering, cropping, and colour normalization. Other settings: dropout of 0.5, SGD with momentum 0.9, and an initial learning rate of 1e-2 that is divided by 10 whenever validation accuracy plateaus. Regularization is L2 with a weight decay of 5e-4. The network was trained on GTX 580 GPUs, each with 3 GB of memory.
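To make the learning rate rule concrete, here is a minimal sketch of a "divide by 10 when validation accuracy goes flat" schedule. The function name, the `patience` and `min_delta` thresholds, and the accuracy values are all assumptions made for this illustration; the article does not say how "flat" was decided.

```python
def plateau_schedule(val_accuracies, initial_lr=1e-2, factor=10.0,
                     patience=2, min_delta=1e-3):
    """Return the learning rate used at each epoch.

    The rate is divided by `factor` whenever validation accuracy has not
    improved by more than `min_delta` for `patience` consecutive epochs.
    `patience` and `min_delta` are illustrative assumptions.
    """
    lr = initial_lr
    best = float("-inf")
    stale_epochs = 0
    rates = []
    for acc in val_accuracies:
        rates.append(lr)  # rate that was in effect during this epoch
        if acc > best + min_delta:
            best = acc
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                lr /= factor  # accuracy is flat: drop the rate
                stale_epochs = 0
    return rates


# Made-up validation accuracies: improvement stalls after the third epoch.
accuracies = [0.30, 0.40, 0.45, 0.45, 0.45, 0.46, 0.46, 0.46]
rates = plateau_schedule(accuracies)
print(rates)
```

With these made-up accuracies, the first five epochs run at the initial rate and the remaining three at one tenth of it.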

It achieved a top-5 error rate of 16.4% in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

AlexNet won the ILSVRC classification benchmark in 2012.

VGGNet

In 2014, a couple of architectures appeared that were significantly different and made another jump in performance. The main difference was that these networks were much deeper.

VGG-16 is a 16-layer architecture built from groups of convolution layers, each followed by a pooling layer, with fully connected layers at the end. The idea behind VGG is much deeper networks with much smaller filters. VGGNet increased the number of layers from the eight in AlexNet to 16 and 19 in its two main variants. A key point is that these models use very small 3 x 3 conv filters all the way through; 3 x 3 is essentially the smallest filter size that still looks at a pixel's immediate neighbours. The network keeps this very simple structure of 3 x 3 convs with periodic pooling throughout.

VGG uses small filters because they have fewer parameters, and it stacks more of them instead of using larger filters. A stack of three 3 x 3 conv layers has the same effective receptive field as a single 7 x 7 conv layer, with fewer parameters and more non-linearities.
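A quick calculation shows why stacking small filters pays off. This is a rough sketch under stated assumptions: stride-1 convolutions, the same number of channels going in and out of every layer, and biases ignored. The channel count of 256 is chosen only for illustration.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Effective receptive field of `num_layers` stacked stride-1 convs."""
    return 1 + num_layers * (kernel_size - 1)


def stacked_conv_weights(kernel_size, num_layers, channels):
    """Weight count (biases ignored) with `channels` in and out of every layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # hypothetical channel count, for illustration only

rf_three_3x3 = stacked_receptive_field(3, num_layers=3)
rf_one_7x7 = stacked_receptive_field(7, num_layers=1)
weights_three_3x3 = stacked_conv_weights(3, 3, channels)  # 27 * C^2
weights_one_7x7 = stacked_conv_weights(7, 1, channels)    # 49 * C^2

print(rf_three_3x3, rf_one_7x7)            # 7 7
print(weights_three_3x3, weights_one_7x7)  # 1769472 3211264
```

Both options see a 7 x 7 region of the input, but the three stacked 3 x 3 layers need 27/49 (about 55%) of the weights of the single 7 x 7 layer.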

VGGNet consists of a couple of conv layers and a pooling layer, a couple more conv layers and a pooling layer, several more conv layers, and so on. The number in the name counts the convolutional and fully connected layers: 16 for VGG-16 and 19 for VGG-19, which is a very similar architecture with a few more conv layers.

This makes the network quite costly: it has 138M parameters in total, and a forward pass uses roughly 96 MB of memory per image, far more than the image itself occupies. It achieved a top-5 error rate of just 7.3% in the ILSVRC challenge.
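The 138M figure can be reproduced by adding up the layers. The sketch below assumes the standard VGG-16 layout (3 x 3 convs with biases, 2 x 2 pooling, a 224 x 224 x 3 input, and fully connected layers of 4096, 4096 and 1000 units); these layer widths are not listed in this article, so treat them as an assumption of the example.

```python
# Standard VGG-16 layout (assumed): numbers are conv output channels,
# "M" marks a 2x2 max-pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]


def vgg16_param_count(input_size=224, in_channels=3):
    """Total learnable parameters (weights + biases) of VGG-16."""
    total = 0
    size, channels = input_size, in_channels
    for layer in VGG16_CONV:
        if layer == "M":
            size //= 2  # pooling halves the spatial size, adds no parameters
        else:
            total += 3 * 3 * channels * layer + layer  # 3x3 weights + biases
            channels = layer
    features = size * size * channels  # flattened input to the first FC layer
    for units in VGG16_FC:
        total += features * units + units
        features = units
    return total


total_params = vgg16_param_count()
print(f"{total_params:,}")  # 138,357,544
```

Most of the total comes from the first fully connected layer, which alone accounts for more than 100M parameters.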

VGGNet was the runner-up in the ILSVRC classification benchmark in 2014.

ResNet

The basic building block of ResNet is the residual block. As a network gets deeper, with a large number of layers stacked on top of each other, it becomes harder to train. Normally every group of layers tries to learn some underlying mapping of the desired function directly; in a residual block, the layers instead try to fit a residual mapping.

In a residual block, the input x travels along two paths. On one path, the weight layers try to fit the residual F(x) = H(x) - x instead of the desired function H(x) directly. On the other path, a skip connection takes the input and passes it through unchanged, as an identity. At the end of the block the two are added, so the output is F(x) + x. If there were no weight layers in between, the block would simply be the identity and its output would equal its input; the additional weight layers learn some delta, the residual, relative to x.

In a nutshell, as the network gets deeper it becomes very hard to learn H(x) directly through a large number of layers. So ResNet uses a skip connection: the layers learn F(x), and the input x is added directly to it to form the final output F(x) + x. F(x) is called the residual.
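The idea can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not ResNet's actual block: it uses two small fully connected layers in place of convolutions, leaves out batch normalization and the ReLU applied after the addition, and all function names are made up for the example.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(values, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def residual_block(x, w1, b1, w2, b2):
    """Compute F(x) + x, where F is two weight layers with a ReLU between."""
    residual = linear(relu(linear(x, w1, b1)), w2, b2)  # F(x)
    return [f + xi for f, xi in zip(residual, x)]       # skip connection adds x


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# With all-zero weights F(x) = 0, so the block reduces to the identity.
print(residual_block(x, zeros, no_bias, zeros, no_bias))        # [1.0, -2.0, 3.0]
# With identity weights F(x) = relu(x), so the output is relu(x) + x.
print(residual_block(x, identity, no_bias, identity, no_bias))  # [2.0, -2.0, 6.0]
```

The first call shows why the skip connection helps: a block that has learned nothing still passes its input through unchanged, so extra layers start out as a harmless identity rather than something the network must learn.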

ResNet stacks these blocks together very deeply; the architecture enables networks up to 152 layers deep. Periodically, the number of filters is doubled and the feature map is downsampled spatially using stride 2. At the end there is only a single fully connected layer, which maps to the 1000 output classes.
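The "double the filters, halve the resolution" pattern is easy to tabulate. In the sketch below, the starting point of 64 filters on a 56 x 56 feature map and the choice of four stages are assumptions for illustration; the text above does not give these numbers.

```python
def resnet_stage_shapes(num_stages=4, base_filters=64, base_size=56):
    """List (filters, spatial_size) for each stage.

    Each new stage doubles the number of filters and halves the spatial
    size, as a stride-2 convolution would. Defaults are illustrative.
    """
    shapes = []
    filters, size = base_filters, base_size
    for _ in range(num_stages):
        shapes.append((filters, size))
        filters *= 2  # double the number of filters
        size //= 2    # stride 2 halves height and width
    return shapes


stage_shapes = resnet_stage_shapes()
print(stage_shapes)  # [(64, 56), (128, 28), (256, 14), (512, 7)]
```

Doubling the filters while halving height and width keeps the amount of computation per layer roughly balanced across stages.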

Hyperparameters:

ResNet uses batch normalization after every conv layer. It uses He initialization (a variant of Xavier initialization scaled for ReLU) with SGD + momentum. The learning rate starts at 0.1 and is divided by 10 when the validation error plateaus. The batch size is 256 and the weight decay is 1e-4. Notably, no dropout is used in ResNet.

ResNet took 1st place in the ILSVRC and COCO 2015 competitions, with a top-5 error rate of just 3.6% on ILSVRC (better than the reported human performance!).

Inception

Inception v3 is a widely-used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset. The model is the combination of many ideas developed by multiple researchers over the years.

The model itself is made up of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, dropouts, and fully connected layers. Batchnorm is used extensively throughout the model and applied to activation inputs. Loss is computed via Softmax.

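Inception relies on factorized convolutions: a large convolution is replaced by a sequence of smaller ones, for example a 5 x 5 convolution by two stacked 3 x 3 convolutions, or an n x n convolution by a 1 x n convolution followed by an n x 1 convolution. This reduces the number of connections and parameters to learn, which speeds up the network while maintaining good performance.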
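The saving from factorizing convolutions can be estimated by counting weights. The sketch below compares a 5 x 5 convolution with two stacked 3 x 3 convolutions, and a 7 x 7 convolution with a 1 x 7 followed by a 7 x 1. It assumes the same number of channels in and out of every layer and ignores biases; the channel count of 192 and the helper name are chosen only for illustration.

```python
def conv_weights(kernel_h, kernel_w, channels):
    """Weight count (biases ignored) for a conv with `channels` in and out."""
    return kernel_h * kernel_w * channels * channels


channels = 192  # hypothetical channel count, for illustration only

# One 5x5 convolution versus two stacked 3x3 convolutions.
full_5x5 = conv_weights(5, 5, channels)
two_3x3 = 2 * conv_weights(3, 3, channels)

# One 7x7 convolution versus a 1x7 followed by a 7x1 (asymmetric factorization).
full_7x7 = conv_weights(7, 7, channels)
asymmetric_7 = conv_weights(1, 7, channels) + conv_weights(7, 1, channels)

print(two_3x3 / full_5x5)       # 0.72
print(asymmetric_7 / full_7x7)  # about 0.29
```

Under these assumptions, the two 3 x 3 layers need 72% of the weights of the 5 x 5 layer, and the asymmetric pair needs under 30% of the weights of the full 7 x 7 layer.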
