CNN Architecture Series - Hands-on with AlexNet (Part II)

Updated: Nov 29, 2020

Welcome to Part II of the CNN architecture series. I hope you liked Part I, which covered VGG-16. Today we will discuss AlexNet, so let's dive in without wasting any time. If you are new here and haven't read Part I, here is a short introduction: Convolutional Neural Networks (CNNs) are used across a wide range of computer vision tasks, so it is worth understanding what the AlexNet architecture is and how it works.

CNN architecture

In case you missed Part I of this series, you can read it first and then come back.

CNN Architecture Series — Hands-on with VGG-16


What is AlexNet?

AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge in 2012. The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up.

Note: If you want a head start on Convolutional Neural Networks, do go through my article given below.

Is Convolutional Neural Networks(CNNs) the next big thing in AI?

AlexNet is an 8-layer convolutional neural network architecture: it has 5 convolutional layers followed by 3 dense layers, the last of which is the softmax output layer. The count of 8 covers only the layers with learned weights; the network also uses activation layers and 3 max-pooling layers.

Conv_1 uses 11x11 filters, Conv_2 uses 5x5 filters, and Conv_3, Conv_4, and Conv_5 each use 3x3 filters.

AlexNet architecture

As you can see above, the first layer is a convolution layer with a large 11x11 filter, which is an interesting choice. (Note: VGG-16, which came after AlexNet, used many small 3x3 filters instead, which improved the model's accuracy.)
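To see what a filter this large does to the image, the spatial size after each layer can be worked out with simple arithmetic. The sketch below assumes the strides and padding from the original AlexNet paper (stride 4 for Conv_1, 3x3 max pooling with stride 2, and padding that keeps the size unchanged in Conv_2 to Conv_5); this article does not state them, and the helper names are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_feature_sizes(input_size=227):
    """Return (layer name, spatial size) pairs for the assumed AlexNet layout."""
    # (name, kernel, stride, padding); strides and padding are assumptions
    # taken from the original paper, not from this article.
    stages = [
        ("conv_1", 11, 4, 0),
        ("pool_1", 3, 2, 0),
        ("conv_2", 5, 1, 2),
        ("pool_2", 3, 2, 0),
        ("conv_3", 3, 1, 1),
        ("conv_4", 3, 1, 1),
        ("conv_5", 3, 1, 1),
        ("pool_3", 3, 2, 0),
    ]
    size = input_size
    sizes = []
    for name, kernel, stride, padding in stages:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


for name, size in alexnet_feature_sizes():
    print(f"{name}: {size}x{size}")
```

Under these assumptions, a 227x227 input shrinks to 55x55 after Conv_1 and to 6x6 after the last pooling layer.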

AlexNet architecture

Now we are going to train an AlexNet network using the Keras library. For training, we will use the cats and dogs dataset from Kaggle, so let's get started.


In this tutorial, we will use Google Colab to run our deep learning model because it provides a free GPU, which lets us train at a much faster pace. Those of us who don't have a GPU in our own machine can use Google Colab instead.
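A quick way to confirm that the Colab runtime really has a GPU attached is to ask TensorFlow which devices it can see. This is a small sketch, assuming the TensorFlow backend that Colab provides; `gpu_names` is a hypothetical helper, not something from the original notebook.

```python
import tensorflow as tf


def gpu_names():
    """Return the names of the GPUs that TensorFlow can see (empty list if none)."""
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print("GPUs visible to TensorFlow:", gpu_names() or "none")
```

If this prints "none" in Colab, changing the runtime type to GPU from the Runtime menu and re-running the cell should help.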

Before running any deep learning model, we need to make sure that all the necessary packages are installed. We will use the Keras library to build our model because it is very easy to code with, and the Matplotlib library for visualization. One can download the dataset which we will be using today from Kaggle using this link. Make sure you put it in the right folder, and we are all set to go.

We have used the ImageDataGenerator class from the Keras library, which handles the preprocessing of the dataset. Note that we are setting the image size to 227x227, as that is the input size required for AlexNet (for VGG-16 we used a standard size of 224x224).

Note: Please make sure that the paths to the train and test image directories are set correctly.
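The preprocessing step could look roughly like the sketch below. Only ImageDataGenerator and the 227x227 image size come from the article. The directory layout (one sub-folder per class, such as `cats` and `dogs`), the batch size, the pixel rescaling, and the `make_generators` helper are assumptions made for illustration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (227, 227)  # input size used for AlexNet in this article
BATCH_SIZE = 32        # assumption: the article does not state a batch size


def make_generators(train_dir, test_dir):
    """Build train/test iterators from folders with one sub-folder per class."""
    # Assumption: scale pixel values from [0, 255] to [0, 1].
    train_datagen = ImageDataGenerator(rescale=1.0 / 255)
    test_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_gen = train_datagen.flow_from_directory(
        train_dir,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="categorical",  # one-hot labels, to pair with a softmax output
    )
    test_gen = test_datagen.flow_from_directory(
        test_dir,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="categorical",
    )
    return train_gen, test_gen


# Hypothetical usage; replace the paths with wherever the dataset was unpacked:
# train_gen, test_gen = make_generators("data/train", "data/test")
# print(train_gen.class_indices)
```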

This is the step where we define our model using the Keras Sequential API. The model is built from Convolutional layers, Max Pooling layers, and Dense layers.
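A minimal sketch of such a model is given below. The filter sizes, the 227x227 input, and the softmax output follow the article. The filter counts, strides, dense layer widths, and the Dropout layers are assumptions taken from the original AlexNet paper, and the local response normalization used there is left out. `build_alexnet` is a hypothetical name.

```python
from tensorflow.keras import Input, layers, models


def build_alexnet(input_shape=(227, 227, 3), num_classes=2):
    """AlexNet-style Sequential model: 5 conv layers, 3 max-pool layers, 3 dense layers."""
    return models.Sequential([
        Input(shape=input_shape),
        # Conv_1: large 11x11 filters with stride 4 (stride is an assumption)
        layers.Conv2D(96, (11, 11), strides=4, activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3), strides=2),
        # Conv_2: 5x5 filters
        layers.Conv2D(256, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3), strides=2),
        # Conv_3 to Conv_5: 3x3 filters
        layers.Conv2D(384, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(384, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(256, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3), strides=2),
        layers.Flatten(name="flatten"),
        # Dense layers; Dropout is an assumption from the original paper
        layers.Dense(4096, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(4096, activation="relu"),
        layers.Dropout(0.5),
        # Output layer: one softmax unit per class (cats and dogs here)
        layers.Dense(num_classes, activation="softmax"),
    ])


# Hypothetical usage:
# model = build_alexnet()
# model.summary()
```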

Now it's time to define the optimizer and the loss function.
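The article does not say which optimizer or loss it used, so the sketch below is only one reasonable setup: the Adam optimizer with a small learning rate and categorical cross-entropy, which pairs with a softmax output and one-hot labels. `compile_model` and its default learning rate are assumptions.

```python
from tensorflow.keras.optimizers import Adam


def compile_model(model, learning_rate=1e-4):
    """Attach an optimizer, a loss, and an accuracy metric to a Keras model."""
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),  # assumption: Adam, lr = 1e-4
        loss="categorical_crossentropy",              # assumption: one-hot labels
        metrics=["accuracy"],
    )
    return model


# Hypothetical usage, with `model` being the network defined earlier:
# model = compile_model(model)
```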

Here, after defining the model and setting up all the hyperparameters it requires, we will train it. We will use two callbacks (ModelCheckpoint and EarlyStopping). We will not get into the details of how callbacks in Keras work, because we can discuss that in a different article. But if you have any problems regarding this, please feel free to comment below.
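The two callbacks could be set up roughly as follows. This is a sketch under assumptions: the patience value and the `make_callbacks` helper are made up, and the monitored metric is called `val_accuracy` in recent Keras versions, while the training log further down, from an older version, calls it `val_acc`. That log saves to `alexnet_1.h5`; some newer Keras releases may expect a `.keras` suffix instead, so the path is left as a parameter.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint


def make_callbacks(checkpoint_path, monitor="val_accuracy", patience=5):
    """Return [ModelCheckpoint, EarlyStopping], both watching the same metric."""
    # Save the model only when the monitored metric improves.
    checkpoint = ModelCheckpoint(
        checkpoint_path,
        monitor=monitor,
        mode="max",
        save_best_only=True,
        verbose=1,
    )
    # Stop training if the metric has not improved for `patience` epochs
    # (assumption: the article does not state a patience value).
    early_stop = EarlyStopping(
        monitor=monitor,
        mode="max",
        patience=patience,
        verbose=1,
    )
    return [checkpoint, early_stop]


# Hypothetical usage, with `model`, `train_gen` and `test_gen` already defined:
# history = model.fit(
#     train_gen,
#     validation_data=test_gen,
#     epochs=10,
#     callbacks=make_callbacks("alexnet_1.keras"),
# )
```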

Here the training starts...

Epoch 1/10
10/10 [==============================] - 83s 8s/step - loss: 8.0139 - acc: 0.4875 - val_loss: 8.4620 - val_acc: 0.4750
Epoch 00001: val_acc improved from -inf to 0.47500, saving model to alexnet_1.h5
Epoch 2/10
10/10 [==============================] - 80s 8s/step - loss: 7.7065 - acc: 0.5219 - val_loss: 7.8072 - val_acc: 0.5156
Epoch 00002: val_acc improved from 0.47500 to 0.51562, saving model to alexnet_1.h5
Epoch 3/10
10/10 [==============================] - 80s 8s/step - loss: 8.6131 - acc: 0.4656 - val_loss: 7.8576 - val_acc: 0.5125
Epoch 00003: val_acc did not improve from 0.51562
Epoch 4/10
10/10 [==============================] - 80s 8s/step - loss: 7.8072 - acc: 0.5156 - val_loss: 8.4620 - val_acc: 0.4750
Epoch 00004: val_acc did not improve from 0.51562
Epoch 5/10
10/10 [==============================] - 81s 8s/step - loss: 8.3613 - acc: 0.4813 - val_loss: 8.3109 - val_acc: 0.4844
Epoch 00005: val_acc did not improve from 0.51562
Epoch 6/10
10/10 [==============================] - 79s 8s/step - loss: 7.5554 - acc: 0.5312 - val_loss: 7.6561 - val_acc: 0.5250
Epoch 00006: val_acc improved from 0.51562 to 0.52500, saving model to alexnet_1.h5
Epoch 7/10
10/10 [==============================] - 78s 8s/step - loss: 8.3109 - acc: 0.4844 - val_loss: 8.3049 - val_acc: 0.4847
Epoch 00007: val_acc did not improve from 0.52500
Epoch 8/10
10/10 [==============================] - 79s 8s/step - loss: 8.1094 - acc: 0.4969 - val_loss: 8.3109 - val_acc: 0.4844
Epoch 00008: val_acc did not improve from 0.52500
Epoch 9/10
10/10 [==============================] - 80s 8s/step - loss: 7.2531 - acc: 0.5500 - val_loss: 7.1524 - val_acc: 0.5563
Epoch 00009: val_acc improved from 0.52500 to 0.55625, saving model to alexnet_1.h5
Epoch 10/10
10/10 [==============================] - 80s 8s/step - loss: 8.3109 - acc: 0.4844 - val_loss: 7.7568 - val_acc: 0.5188
Epoch 00010: val_acc did not improve from 0.55625

We trained the model for 10 epochs, and the best validation accuracy in the log above is 0.55625, reached at epoch 9. One can train for more epochs to try to achieve higher accuracy.
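Plotting the accuracy per epoch with Matplotlib makes it easier to judge whether more training is paying off. The sketch below uses the `acc` and `val_acc` values copied from the log above; the `plot_accuracy` helper and the 0.5 reference line (chance level for two classes, assuming they are roughly balanced) are additions for illustration.

```python
import matplotlib.pyplot as plt

# Accuracy values copied from the training log above (epochs 1 to 10).
TRAIN_ACC = [0.4875, 0.5219, 0.4656, 0.5156, 0.4813, 0.5312, 0.4844, 0.4969, 0.5500, 0.4844]
VAL_ACC = [0.4750, 0.5156, 0.5125, 0.4750, 0.4844, 0.5250, 0.4847, 0.4844, 0.5563, 0.5188]


def plot_accuracy(train_acc, val_acc):
    """Plot training and validation accuracy against the epoch number."""
    epochs = range(1, len(train_acc) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_acc, marker="o", label="training accuracy")
    ax.plot(epochs, val_acc, marker="o", label="validation accuracy")
    # Assumption: two roughly balanced classes, so guessing scores about 0.5.
    ax.axhline(0.5, linestyle="--", color="gray", label="chance level")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    return fig


# Epoch (1-based) with the highest validation accuracy.
best_epoch = max(range(len(VAL_ACC)), key=VAL_ACC.__getitem__) + 1
print("Best validation accuracy at epoch", best_epoch)

fig = plot_accuracy(TRAIN_ACC, VAL_ACC)
# plt.show()  # uncomment when running in a notebook
```

When training with Keras directly, the same lists are available from the `history.history` dictionary returned by `model.fit`.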



Now it's time to test our deep learning model and see how it performs given an image.
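Testing on a single image could look like the sketch below. It assumes the image is preprocessed the same way as the training data (resized to 227x227 and scaled to the range 0 to 1) and that the class-to-index mapping comes from the training generator. The helper names and the file path are hypothetical.

```python
import numpy as np
from tensorflow.keras.preprocessing import image

IMG_SIZE = (227, 227)  # must match the size used during training


def load_and_prepare(image_path):
    """Load an image file and turn it into a batch of one, scaled to [0, 1]."""
    img = image.load_img(image_path, target_size=IMG_SIZE)
    arr = image.img_to_array(img) / 255.0  # assumption: same rescaling as in training
    return np.expand_dims(arr, axis=0)     # shape (1, 227, 227, 3)


def decode_prediction(probs, class_indices):
    """Map the highest-probability output of the first sample to its class name."""
    index_to_class = {index: name for name, index in class_indices.items()}
    return index_to_class[int(np.argmax(probs[0]))]


# Hypothetical usage, with a trained `model` and the training generator `train_gen`:
# probs = model.predict(load_and_prepare("dog.jpg"))
# print(decode_prediction(probs, train_gen.class_indices))
```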

AlexNet model prediction
Prediction of the model

So this computer vision model gives the right prediction: this is a dog. The deep learning model is now built, and the next steps could be to deploy it in the cloud or build an Android app out of it, which we will get into in future artificial intelligence and computer vision blogs.

That is all for this article. If you liked it, do give a clap, and watch for the next part, which will cover other CNN architectures. If you have any problems regarding this, please comment in the comment section or contact me on Twitter. The Google Colab Notebook for this article is given in this link. Do check it out and run it on Google Colab with dark mode on; it's really awesome.

Do check my other articles. For any help, do mail me, and don't forget to subscribe to this blog for more such awesome content on deep learning, data science, computer vision, and AI. Thank you.
