CNN Architecture Series: Hands-On with VGG-16 (Part I)


VGG16 Architecture


Welcome to this journey through Convolutional Neural Networks, where we will take an in-depth look at the standard CNN architectures, from one as simple as VGG-16 to one as complex as Xception. Convolutional Neural Networks have come a long way in terms of capability: they can now solve image classification tasks that are difficult even for humans, such as telling apart rare dog breeds. In Part I of this series, we will take a look at the VGG-16 architecture.


What is VGG-16?

VGG-16 is a convolutional neural network architecture; the name comes from the fact that it has 16 weight layers. It is built from convolutional layers, max pooling layers, activation layers, and fully connected layers.


There are 13 convolutional layers, 5 max pooling layers, and 3 dense layers, which sums to 21 layers but only 16 weight layers (the pooling layers carry no weights). The convolutions in block 1 have 64 filters, block 2 has 128, block 3 has 256, and blocks 4 and 5 have 512 each. The VGG-16 network was trained on the ImageNet dataset, which has over 14 million images and 1000 classes, and it achieves 92.7% top-5 accuracy. It improves on AlexNet by replacing the large 11x11 and 5x5 filters of the first two convolutional layers with stacks of small 3x3 filters. Now we will train an image dataset with VGG-16 in Keras, so let's get hands-on.
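To see why stacking small filters helps, here is a quick back-of-the-envelope comparison (my arithmetic, for illustration): two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, but with fewer parameters.

```python
# Compare parameter counts for C input and C output channels (biases ignored).
# A stack of two 3x3 convs covers a 5x5 receptive field.
C = 256  # example channel count, chosen for illustration

params_5x5 = 5 * 5 * C * C              # one 5x5 conv
params_3x3_stack = 2 * (3 * 3 * C * C)  # two stacked 3x3 convs

print(params_5x5, params_3x3_stack)  # 1638400 1179648
assert params_3x3_stack < params_5x5
```

The 3x3 stack uses 18/25 = 72% of the parameters (28% fewer), and the extra ReLU between the two convolutions adds non-linearity for free.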

Implementation


Since we all need to actually run our models, I am going to demo how to run this one on Google Colab, so you can just download the notebook and run it there. I am using a cats and dogs dataset from Kaggle, so let's get started.


!pip3 install keras
!pip3 install tensorflow-gpu==1.14  # this article targets the TF 1.x / standalone Keras API
import keras
import matplotlib.pyplot as plt
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPool2D
from keras.optimizers import Adam, SGD, Adagrad, Adadelta, RMSprop
from keras.utils import to_categorical

So we have all the packages required to run our model: keras and tensorflow to build and train it, and matplotlib for visualization. Download the cats and dogs dataset from Kaggle through this link, put it into the right folder, and then we are good to go.

from keras.preprocessing.image import ImageDataGenerator
import numpy as np

# Generators that stream images from class subfolders, resized to 224x224
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="training_set/training_set", target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="test_set/test_set", target_size=(224,224))
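The generators above pass pixel values through unchanged. As an optional refinement (my addition, not part of the original notebook), ImageDataGenerator can also rescale and augment the training images; the exact values here are illustrative, not tuned for this dataset:

```python
from keras.preprocessing.image import ImageDataGenerator

# A sketch of common preprocessing options (values are illustrative)
aug = ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixel values to [0, 1]
    horizontal_flip=True,  # randomly mirror images left-right
    rotation_range=15,     # random rotations of up to 15 degrees
    zoom_range=0.1,        # random zoom in/out by up to 10%
)
# traindata = aug.flow_from_directory("training_set/training_set", target_size=(224, 224))
```

Augmentation like this usually matters more on small datasets, where it acts as a regularizer.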

ImageDataGenerator is Keras's preprocessing utility for loading batches of images from disk. Here we resize every image to 224x224, since that is the standard input size for a VGG-16 network. Note: please make sure the train and test directories point to the right folders. Now we will define our model using the Keras Sequential API, building the VGG-16 network out of convolutional layers, max pooling layers, and dense layers. The only change from the original architecture is the final Dense layer, which has 2 units (cat vs. dog) instead of ImageNet's 1000.

model = Sequential()
model.add(Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=2, activation="softmax"))
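A quick sanity check on the layer shapes (my arithmetic, independent of Keras): with "same" padding each convolution preserves the spatial size, and each 2x2, stride-2 pool halves it, so the feature maps shrink 224 → 112 → 56 → 28 → 14 → 7, and Flatten produces 7 x 7 x 512 = 25088 features.

```python
# Trace the spatial size through VGG-16: five blocks, each ending in a
# 2x2 / stride-2 max pool; "same"-padded 3x3 convs keep the size fixed.
size = 224
for _ in range(5):  # five pooling layers
    size //= 2      # each pool halves height and width
print(size)         # 7

flatten_features = size * size * 512  # the last block has 512 channels
print(flatten_features)  # 25088
```

This is why the first Dense(4096) layer dominates the parameter count: it connects all 25088 flattened features to 4096 units.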

Now we need to define the optimizer and the loss function.

opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

Now we need to train our model. We use two callbacks, ModelCheckpoint and EarlyStopping; I won't go into their details here, but if you have any doubts, do comment in the comment section and I will help you out.

from keras.callbacks import ModelCheckpoint, EarlyStopping

# Save the best model by validation accuracy; stop early if it stops improving
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=20, verbose=1, mode='auto')
hist = model.fit_generator(generator=traindata, steps_per_epoch=10, validation_data=testdata, validation_steps=10, epochs=10, callbacks=[checkpoint, early])
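Note that steps_per_epoch=10 means each "epoch" only sees 10 batches, which keeps this demo fast but covers just a fraction of the data. To traverse the whole training set once per epoch, you would typically derive the step count from the dataset size; a sketch (the sample count below is a hypothetical, and in Keras you can read traindata.samples and traindata.batch_size instead of hard-coding):

```python
import math

# Hypothetical numbers for illustration only
num_train_samples = 8000  # assumed size of the training set
batch_size = 32           # flow_from_directory's default batch size

steps_per_epoch = math.ceil(num_train_samples / batch_size)
print(steps_per_epoch)  # 250
```

With that value, every training image is seen once per epoch, at the cost of much longer epochs.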

Now the model starts training.

Epoch 1/10
10/10 [==============================] - 14s 1s/step - loss: 6.8730 - acc: 0.5125 - val_loss: 7.8576 - val_acc: 0.5125
Epoch 00001: val_acc improved from -inf to 0.51250, saving model to vgg16_1.h5
Epoch 2/10
10/10 [==============================] - 3s 342ms/step - loss: 7.1524 - acc: 0.5563 - val_loss: 8.4620 - val_acc: 0.4750
Epoch 00002: val_acc did not improve from 0.51250
Epoch 3/10
10/10 [==============================] - 3s 343ms/step - loss: 8.5124 - acc: 0.4719 - val_loss: 7.5050 - val_acc: 0.5344
Epoch 00003: val_acc improved from 0.51250 to 0.53438, saving model to vgg16_1.h5
Epoch 4/10
10/10 [==============================] - 3s 343ms/step - loss: 8.1598 - acc: 0.4938 - val_loss: 8.1598 - val_acc: 0.4938
Epoch 00004: val_acc did not improve from 0.53438
Epoch 5/10
10/10 [==============================] - 3s 340ms/step - loss: 8.5627 - acc: 0.4688 - val_loss: 8.3109 - val_acc: 0.4844
Epoch 00005: val_acc did not improve from 0.53438
Epoch 6/10
10/10 [==============================] - 3s 341ms/step - loss: 7.4043 - acc: 0.5406 - val_loss: 7.9583 - val_acc: 0.5062
Epoch 00006: val_acc did not improve from 0.53438
Epoch 7/10
10/10 [==============================] - 4s 370ms/step - loss: 8.6635 - acc: 0.4625 - val_loss: 7.9225 - val_acc: 0.5085
Epoch 00007: val_acc did not improve from 0.53438
Epoch 8/10
10/10 [==============================] - 3s 347ms/step - loss: 8.3613 - acc: 0.4813 - val_loss: 8.1094 - val_acc: 0.4969
Epoch 00008: val_acc did not improve from 0.53438
Epoch 9/10
10/10 [==============================] - 3s 342ms/step - loss: 8.5124 - acc: 0.4719 - val_loss: 7.6561 - val_acc: 0.5250
Epoch 00009: val_acc did not improve from 0.53438
Epoch 10/10
10/10 [==============================] - 3s 342ms/step - loss: 8.6131 - acc: 0.4656 - val_loss: 7.9079 - val_acc: 0.5094
Epoch 00010: val_acc did not improve from 0.53438

So we have trained our model for 10 epochs. Note that the validation accuracy is still hovering around 50%, which is chance level for two classes: training VGG-16 from scratch converges slowly, so you will want many more epochs (and more steps per epoch) for real accuracy.

Now we will perform inference on this model.

from keras.preprocessing import image
from keras.models import load_model

# Load a test image and shape it the way the network expects
img = image.load_img("website-donate-mobile.jpg", target_size=(224,224))
img = np.asarray(img)
plt.imshow(img)
img = np.expand_dims(img, axis=0)  # add a batch dimension: (1, 224, 224, 3)

# Restore the best checkpoint saved during training and predict
saved_model = load_model("vgg16_1.h5")
output = saved_model.predict(img)
if output[0][0] > output[0][1]:
    print("cat")
else:
    print("dog")
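The cat/dog ordering above assumes cats are class 0 and dogs are class 1, which matches flow_from_directory's alphabetical folder ordering. A more general way to decode a prediction row is to take the argmax and look it up in a label list (a sketch; the label list is an assumption about your folder layout, and with a generator you can read it from traindata.class_indices instead of hard-coding):

```python
# Map a predicted index back to a label.
class_names = ["cat", "dog"]  # assumed alphabetical folder order

probs = [0.2, 0.8]  # example output row, e.g. output[0].tolist()
pred = class_names[max(range(len(probs)), key=probs.__getitem__)]
print(pred)  # dog
```

This scales to any number of classes without a chain of if/else comparisons.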

VGG16 Prediction Result

So the model gives the right prediction: this is a dog. With the model built, the next steps might be to deploy it on GCP or with Flask, which we will cover in future articles. That's all for this article; if you liked it, do give a clap, and watch for the next part, which will cover other CNN architectures. If you run into any problems, do comment in the comment section or contact me on Twitter. The Google Colab notebook for this article is given in this link; do check it out and run it on Colab with dark mode on, it's really awesome. Do check out my other articles too.

Drop Me a Line, Let Me Know What You Think

© Subham Tewari