Squeeze-Net — Model size of 0.5MB, and accuracy same as AlexNet

Updated: Nov 30, 2020

Why do we need Squeeze-Net?

Self-driving cars and IoT devices are likely to become household topics over the next few years. Many of these systems are controlled and updated remotely. A self-driving car, for example, needs to communicate with servers constantly. A model with a small size is much easier to deploy and to push out from the cloud in that setting. That is why we need an architecture that is much smaller but still reaches the same level of accuracy as other architectures.

Here are some advantages of a small model like Squeeze-Net:

  • More efficient distributed training.

  • Less overhead when exporting new models to clients.

SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques, we are able to compress SqueezeNet to less than 0.5MB (510× smaller than AlexNet).


The paper gets there with three design strategies:

  • Replace 3x3 filters with 1x1 filters: The idea is to use 1x1 filters wherever possible, because a 1x1 filter has 9x fewer parameters than a 3x3 filter. One might expect this to hurt accuracy, since a 1x1 filter has less information to work with, but that is not the case. A 3x3 filter captures spatial information from pixels close to each other, while a 1x1 filter zeros in on a single pixel and captures the relationships among its channels.

  • Decrease the number of input channels to 3x3 filters: To keep the total number of parameters in a CNN small, it is important to decrease not only the number of 3x3 filters but also the number of input channels feeding them, because a layer of 3x3 filters has (input channels) x (number of filters) x (3 x 3) parameters. The authors do this with squeeze layers. They introduce a building block called the “fire module”, which consists of a squeeze layer followed by an expand layer. The squeeze layer uses only 1x1 filters, while the expand layer uses a mix of 1x1 and 3x3 filters. Because the squeeze layer has few filters, the 3x3 filters in the expand layer see few input channels, which keeps the parameter count of the layer down.

The Fire Module in Squeeze Net
  • Downsample late in the network so that convolution layers have large activation maps: The first two strategies cut the number of parameters; this one is about getting the most accuracy out of the parameters that remain. The authors delay downsampling to the later layers, so most convolution layers work on large activation maps, and this actually increases accuracy. This is in contrast to networks like VGG, where the feature map shrinks steadily as the network gets deeper. The authors cite a paper by K. He and J. Sun, which similarly found that delayed downsampling leads to higher classification accuracy.
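
To make the first two strategies concrete, here is a minimal sketch that counts the weights in a convolution layer. The helper name `conv_weights` and the channel counts (64 and 16) are my own illustrative choices, not values from the paper, and biases are ignored to keep the arithmetic simple.

```python
def conv_weights(in_channels, num_filters, kernel_size):
    """Weights in a conv layer: each filter spans kernel x kernel x in_channels."""
    return in_channels * num_filters * kernel_size * kernel_size


# Strategy 1: same channel counts, 3x3 filters swapped for 1x1 filters.
w3 = conv_weights(in_channels=64, num_filters=64, kernel_size=3)
w1 = conv_weights(in_channels=64, num_filters=64, kernel_size=1)
print(w3, w1, w3 // w1)  # 36864 4096 9

# Strategy 2: keep the 3x3 filters but feed them fewer input channels.
w3_squeezed = conv_weights(in_channels=16, num_filters=64, kernel_size=3)
print(w3_squeezed, w3 // w3_squeezed)  # 9216 4
```

Cutting the input channels from 64 to 16 shrinks the 3x3 layer by 4x without removing a single 3x3 filter, which is the job the squeeze layer does inside a fire module.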

The Squeeze-Net architecture is built from fire modules, which is what enables it to bring down the number of parameters.
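
As a rough illustration of how much a fire module saves, the sketch below counts its parameters and compares them with a plain 3x3 convolution that has the same input and output channels. The names `s1x1`, `e1x1` and `e3x3` follow the paper's notation for the filter counts, and the sizes used are the ones I understand the paper to list for its first fire module; the helper functions themselves are hypothetical.

```python
def conv_params(in_channels, num_filters, kernel_size):
    """Weights plus one bias per filter."""
    return in_channels * num_filters * kernel_size ** 2 + num_filters


def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Squeeze layer (1x1) followed by an expand layer (1x1 and 3x3 side by side)."""
    squeeze = conv_params(in_channels, s1x1, 1)
    # Both expand branches read the squeezed s1x1 channels, not in_channels.
    expand_1x1 = conv_params(s1x1, e1x1, 1)
    expand_3x3 = conv_params(s1x1, e3x3, 3)
    return squeeze + expand_1x1 + expand_3x3


# 96 input channels, squeezed to 16, expanded to 64 + 64 = 128 output channels.
fire = fire_params(in_channels=96, s1x1=16, e1x1=64, e3x3=64)
# A plain 3x3 convolution going straight from 96 to 128 channels.
plain = conv_params(in_channels=96, num_filters=128, kernel_size=3)

print(fire)                    # 11920
print(plain)                   # 110720
print(round(plain / fire, 1))  # 9.3
```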

The Squeeze-Net Architecture

Another thing that surprised me is the lack of fully connected (dense) layers at the end, which you would see in a typical CNN architecture. The dense layers at the end normally learn the relationships between the high-level features and the classes the network is trying to identify. They are the layers that learn that noses and ears make up a face, and that wheels and lights indicate a car. In this architecture, however, that extra learning step seems to be embedded in the transformations between the various “fire modules”.
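
In the paper, the place of the dense layers is taken by a final 1x1 convolution with one filter per class, followed by global average pooling over each resulting map. The sketch below shows only the pooling step in plain Python; the function name and the toy 2x2 maps are made up for illustration.

```python
def global_average_pool(feature_maps):
    """Average each 2-D map down to one number, giving one score per class."""
    scores = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        scores.append(sum(values) / len(values))
    return scores


# Toy output of a final 1x1 convolution: 2 classes, each with a 2x2 map.
class_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
scores = global_average_pool(class_maps)
print(scores)                     # [2.5, 2.0]
print(scores.index(max(scores)))  # 0, the predicted class
```

The pooling step has no weights at all, so the only parameters in the classifier are those of the final 1x1 convolution.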

Squeeze-Net benchmarking with all the other CNN architectures

Squeeze-Net achieves accuracy nearly equal to AlexNet with 50x fewer parameters. The most impressive part is that applying Deep Compression to this already small model makes it 510x smaller than AlexNet.
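
The 50x and 510x figures are simply ratios of model sizes. The sizes below are the ones I recall from the paper's comparison table (240MB for AlexNet, 4.8MB for Squeeze-Net, 0.47MB after Deep Compression), so treat them as assumptions rather than numbers taken from this post.

```python
# Model sizes in MB, assumed from the paper's comparison table.
ALEXNET_MB = 240.0
SQUEEZENET_MB = 4.8
SQUEEZENET_DEEP_COMPRESSED_MB = 0.47


def times_smaller(baseline_mb, model_mb):
    """How many times smaller the model is than the baseline."""
    return baseline_mb / model_mb


print(f"{times_smaller(ALEXNET_MB, SQUEEZENET_MB):.1f}x")                  # 50.0x
print(f"{times_smaller(ALEXNET_MB, SQUEEZENET_DEEP_COMPRESSED_MB):.1f}x")  # 510.6x
```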

I will now share the Squeeze-Net model, which consists of 8 fire modules and 2 convolutional layers, one at the start and another at the end.
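
Here is a minimal sketch of that layout as a plain-Python layer table, with a parameter count per layer. It is not a trainable implementation: the filter counts are the ones I understand the paper to use, the function names are hypothetical, and pooling layers are left out because they have no parameters.

```python
def conv_params(in_channels, num_filters, kernel_size):
    """Weights plus one bias per filter."""
    return in_channels * num_filters * kernel_size ** 2 + num_filters


def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Squeeze (1x1) layer, then 1x1 and 3x3 expand branches on its output."""
    return (conv_params(in_channels, s1x1, 1)
            + conv_params(s1x1, e1x1, 1)
            + conv_params(s1x1, e3x3, 3))


# (name, squeeze 1x1 filters, expand 1x1 filters, expand 3x3 filters)
FIRE_MODULES = [
    ("fire2", 16, 64, 64), ("fire3", 16, 64, 64),
    ("fire4", 32, 128, 128), ("fire5", 32, 128, 128),
    ("fire6", 48, 192, 192), ("fire7", 48, 192, 192),
    ("fire8", 64, 256, 256), ("fire9", 64, 256, 256),
]


def squeezenet_params(num_classes=1000):
    """Parameter count per layer: conv1, eight fire modules, conv10."""
    layers = {}
    channels = 96
    layers["conv1"] = conv_params(3, channels, 7)  # 7x7 filters on RGB input
    for name, s1x1, e1x1, e3x3 in FIRE_MODULES:
        layers[name] = fire_params(channels, s1x1, e1x1, e3x3)
        channels = e1x1 + e3x3  # the two expand outputs are concatenated
    layers["conv10"] = conv_params(channels, num_classes, 1)  # one filter per class
    return layers


layers = squeezenet_params()
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name:>6}: {count:>9,}")
print(f" total: {total:,}")  # total: 1,248,424

# Stored as 32-bit floats, counting 1 MB as 2**20 bytes.
size_mb = total * 4 / 2 ** 20
print(f"{size_mb:.1f} MB")  # 4.8 MB
```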

That is the Squeeze-Net model. Feel free to leave a comment below if you need any help. Thanks for reading my post.

You can go through the official Squeeze-Net paper: Click here


Hope you liked my article. If you have any questions and doubts related to this topic or any topic in AI and machine learning, do let me know in the comment section, and I will be more than happy to help you out. Do hit like on this article and share it among your friends who are in AI. Follow us on Instagram and Twitter. Let's democratize AI.
