
Introduction
Self-driving cars mark one of the biggest changes in the automotive industry in the last decade. All the major car companies are developing their own self-driving cars. Autonomous technology is projected to become a $7 trillion industry and to save many lives in the years to come.
This article will give you an insight into how to develop a self-driving car using a convolutional neural network (CNN).
The Approach
When we talk about self-driving cars, we usually talk about LIDAR, RADAR, 360-degree cameras, and costly GPUs, and we decompose the problem into parts such as lane detection, path planning, and control. In an end-to-end self-driving model, by contrast, we build a single model that takes only the car's front-camera images and predicts the steering angle. With minimal training data and minimal computation, the car learns to drive on roads with or without lane markings.
The result is a light, computationally inexpensive model that provides an end-to-end solution to the self-driving problem.
Why do we need self-driving cars?
* Reduces driver costs.
* Reduces driver stress.
* Enables more efficient parking.
* Saves time and reduces traffic.
* Reduces accidents.
* Supports carpooling.
Methodology
The car is equipped with three cameras mounted behind the windshield. Video is captured simultaneously with the steering angle data. In order to make the system independent of the car geometry, we represent the steering command as 1/r, where r is the turning radius in meters. We use 1/r instead of r to avoid a singularity when driving straight (where r becomes infinite, while 1/r smoothly goes to zero). The training data consists of single image frames sampled from the video and paired with the steering data.
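As a rough illustration of how such a label might be computed, here is a minimal sketch using a simple bicycle-model (Ackermann) approximation; the steering ratio and wheelbase values are hypothetical placeholders, not values from the original setup:

```python
import numpy as np

def inverse_turning_radius(steering_wheel_deg,
                           steering_ratio=15.0,  # assumed steering-wheel-to-road-wheel ratio
                           wheelbase_m=2.7):     # assumed wheelbase in meters
    """Convert a steering-wheel angle into the 1/r training label.

    Bicycle-model approximation: tan(road_wheel_angle) = wheelbase / r,
    so 1/r = tan(road_wheel_angle) / wheelbase. The result is 0 for
    straight driving, avoiding the singularity of r itself.
    """
    road_wheel_rad = np.deg2rad(steering_wheel_deg / steering_ratio)
    return np.tan(road_wheel_rad) / wheelbase_m
```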

The above image depicts the CNN model: video is fed into the CNN as individual image frames, and the model outputs the desired steering angle. Using back-propagation, we then minimize the loss between the desired steering angle and the steering angle computed by the network.
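In code, this training objective is an ordinary regression loss. A minimal PyTorch-style sketch of one training step might look as follows; mean squared error is a typical choice here (an assumption, since the article does not name the loss), and `model`, `images`, and `targets` are placeholders:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, targets):
    """One gradient update on a mini-batch of (frame, steering command) pairs."""
    optimizer.zero_grad()
    predictions = model(images).squeeze(1)   # predicted steering command (1/r)
    loss = F.mse_loss(predictions, targets)  # error vs. recorded human steering
    loss.backward()                          # back-propagate the error
    optimizer.step()                         # update the weights
    return loss.item()
```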

After training, the model can predict steering angles from the video images of a single front camera.
Dataset: Around 45,000 images of the driving car, 2.2 GB in total. The dataset was created by Sully Chen in 2017 and was recorded around Rancho Palos Verdes and San Pedro, California. The dataset can be found through this link.
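Loading the labels is then straightforward. The sketch below assumes a layout where a data.txt file pairs each image filename with a steering angle on one line (this file name and format are assumptions; check the dataset's own documentation):

```python
import os

def load_samples(dataset_dir):
    """Parse (image_path, steering_angle) pairs from an assumed data.txt layout."""
    samples = []
    with open(os.path.join(dataset_dir, "data.txt")) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            filename = parts[0]
            angle = float(parts[1].split(",")[0])  # tolerate a trailing ",timestamp"
            samples.append((os.path.join(dataset_dir, filename), angle))
    return samples
```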
Network Architecture: The network consists of 9 layers: 5 convolutional layers, one normalization layer, and 3 fully connected layers.

The first layer of the network performs image normalization. This is hard-coded rather than learned during training. Performing normalization inside the network allows it to be accelerated on the GPU.
The convolutional layers are used for feature extraction; their configuration was chosen empirically through experiments. We use strided convolutions with a 2x2 stride and a 5x5 kernel in the first three convolutional layers, and non-strided convolutions with a 3x3 kernel in the last two.
The five convolutional layers are followed by three fully connected layers, which output the inverse of the turning radius, as sketched below.
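A PyTorch sketch of this architecture, following the layer sizes reported in the original NVIDIA paper (a 3x66x200 input, 24/36/48/64/64 feature maps, then 100/50/10 hidden units); treat the exact input size and normalization as assumptions if your setup differs:

```python
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    """End-to-end steering model: 1 normalization, 5 conv, 3 fully connected layers."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),  # strided 5x5 convs
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),           # non-strided 3x3 convs
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),                     # 64 x 1 x 18 feature maps for a 66x200 input
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                 # predicted inverse turning radius
        )

    def forward(self, x):
        x = x / 127.5 - 1.0                   # hard-coded normalization (not learned)
        return self.regressor(self.features(x))
```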
Training Details
To train the convolutional neural network we have to select the image frames that will be fed into it. We sampled the video at 10 FPS; a higher sampling rate would include many near-identical frames that add little useful information.
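As an illustration, downsampling a video to roughly 10 FPS with OpenCV might look like this (the source frame rate is read from the file, with an assumed fallback of 30 FPS if it is unavailable):

```python
import cv2

def sample_frames(video_path, target_fps=10):
    """Yield frames from a video at roughly `target_fps` to avoid near-duplicates."""
    capture = cv2.VideoCapture(video_path)
    source_fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback frame rate
    step = max(1, round(source_fps / target_fps))       # keep every `step`-th frame
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    capture.release()
```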
We have also augmented the image frames with random shifts and rotations so that the car can learn to recover from unexpected situations. The augmentation perturbations are drawn randomly from a normal distribution with zero mean and a standard deviation twice that measured with human drivers.
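A minimal sketch of such a perturbation, where the shift and rotation magnitudes are drawn from a zero-mean normal distribution; the standard deviations used here are illustrative placeholders, not the values measured from human drivers:

```python
import cv2
import numpy as np

def augment(image, shift_std_px=20.0, rot_std_deg=2.0):
    """Apply a random horizontal shift and rotation sampled from N(0, std)."""
    h, w = image.shape[:2]
    shift = np.random.normal(0.0, shift_std_px)  # lateral shift in pixels
    angle = np.random.normal(0.0, rot_std_deg)   # rotation in degrees
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    matrix[0, 2] += shift                        # add the horizontal translation
    return cv2.warpAffine(image, matrix, (w, h))
```

Note that in the full approach, the steering label of a perturbed image is also adjusted so that the car learns to steer back toward the lane center from the shifted viewpoint.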
Visualization
We have taken two examples: an unpaved road and a forest road. For the unpaved road, the feature-map activations show the outline of the road. For the forest road, the model does not find anything useful and the activations mostly contain noise.
We can see that the CNN is able to detect the outline of the road, even though we never explicitly trained it to do so.


Conclusion
We have now seen how, with less than 100 hours of driving data, a CNN was able to learn to follow the road in diverse conditions, including unpaved roads and both sunny and rainy weather. The model learned to detect the outline of the road without the data ever being explicitly labeled for it.
Future work should improve the robustness of the model, find ways to verify that robustness, and improve the visualization of the network's internal processing steps.
For the implementation code, send me an email or comment below. The research paper can be found here.