Improving the Performance of a Convolutional Neural Network
The Convolutional Neural Network (CNN), a pillar algorithm of deep learning, has been one of the most influential innovations in the field of computer vision. CNNs have performed far better than traditional computer vision algorithms and have proven successful in many different real-life case studies and applications, like:
· Image classification, object detection, segmentation, face recognition;
· Classification of crystal structures using a convolutional neural network;
· Self-driving cars that leverage CNN-based vision systems;
· And many more, of course!
Many articles are already available on how to build a Convolutional Neural Network, so I am not going into detail regarding the implementation of a CNN. If you are interested in Document Classification using CNN, please click here.
The central theme of this tutorial is: how can we improve the performance of a CNN?
Let’s start ...
The common question is:
How can I get better performance from a deep learning model?
It might also be asked as:
How can I improve accuracy?
Oh God! My CNN is performing poorly..
Don't be stressed..
Here is the tutorial.. It will give you some ideas to lift the performance of a CNN.
The list is divided into 4 topics:
1. Tune Parameters
2. Image Data Augmentation
3. Deeper Network Topology
4. Handle Overfitting and Underfitting Problems
Oh! Cool.. Let's start with the explanations.
1. Tune Parameters
To improve CNN model performance, we can tune parameters like the number of epochs, the learning rate, and so on. The number of epochs definitely affects performance: up to a point, more epochs bring improvement. But deciding on the number of epochs and the learning rate requires some experimentation. We can observe that after a certain number of epochs there is no further reduction in training loss and no improvement in training accuracy, and decide the number of epochs accordingly. We can also use dropout layers in the CNN model. Depending on the application, we need to decide on a proper optimizer when compiling the model. We can use various optimizers, e.g. SGD, RMSprop, etc., and there is a need to tune the model with each of them. All of these things affect the performance of a CNN.
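As a rough sketch of this experimentation loop, here is a minimal Keras example; the dataset (MNIST), layer sizes, dropout rate, learning rates, and epoch count below are all illustrative assumptions, not values prescribed by this tutorial:

```python
# A minimal sketch: trying different optimizers and watching the epochs.
# All hyperparameter values here are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model():
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),  # dropout layer, as mentioned above
        layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train[..., None] / 255.0, x_val[..., None] / 255.0

# Tune the model with various optimizers and compare validation curves;
# the epoch where validation accuracy stops improving suggests the budget.
for opt in (tf.keras.optimizers.SGD(learning_rate=0.01),
            tf.keras.optimizers.RMSprop(learning_rate=0.001)):
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train[:10000], y_train[:10000], epochs=5,
                        validation_data=(x_val, y_val), verbose=0)
    print(type(opt).__name__, "val accuracy per epoch:",
          [round(a, 3) for a in history.history["val_accuracy"]])
```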
2. Image Data Augmentation
"Deep
learning is only relevant when you have a huge amount of data". It’s not
wrong. CNN requires the ability to learn features automatically from the data,
which is generally only possible when lots of training data is available.
If we have less training data available.. what to do?
Solution is here.. use Image Augmentation
Image augmentation parameters that are generally used to increase the data sample count are zoom, shear, rotation, a preprocessing function, and so on. Using these parameters generates images having these attributes during training of the deep learning model. With samples generated through image augmentation, the existing data set is typically enlarged by a factor of roughly 3x to 4x.
Fig.1 Data Augmentation (Source: Wikipedia)
One more advantage of data augmentation: as we know, a CNN is not rotation invariant, so through augmentation we can add rotated versions of the images to the dataset. This will definitely increase the accuracy of the system.
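A minimal sketch of how this is typically set up in Keras; the parameter values below are assumptions for illustration:

```python
# A minimal sketch: image augmentation with Keras' ImageDataGenerator.
# The parameter values are illustrative assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,     # random rotations, since a CNN is not rotation invariant
    zoom_range=0.15,       # random zoom
    shear_range=0.15,      # random shear
    horizontal_flip=True,  # random horizontal flips
)

x_train = np.random.rand(8, 28, 28, 1)  # stand-in for real training images
y_train = np.arange(8)                  # stand-in labels

# Augmented images are generated on the fly, batch by batch, during training:
batches = datagen.flow(x_train, y_train, batch_size=4)
images, labels = next(batches)
print(images.shape)  # (4, 28, 28, 1)
```

During model.fit, each epoch then sees freshly perturbed copies of the training images, which is how the effective sample count grows by the 3x to 4x mentioned above.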
3. Deeper Network Topology
Now let's talk about wide networks vs. deep networks!
A wide neural network could, in principle, be trained with every possible input value. Hence, these networks are very good at memorization, but not so good at generalization. There are, however, a few difficulties with using an extremely wide, shallow network: though such a network can accept every possible input value, in practical applications we won't have every possible value for training.
Deeper networks capture the natural "hierarchy" that is present everywhere in nature. Take a convnet, for example: it captures low-level features in the first layer, slightly better but still low-level features in the next layer, and object parts and simple structures in the higher layers. The advantage of multiple layers is that they can learn features at various levels of abstraction.
So that explains why you might use a deep network rather than a very wide but shallow network.
But why not a very deep, very wide network?
The answer is that we want our network to be as small as possible while still producing good results. A wider network will take longer to train, and deep networks are very computationally expensive to train. Hence, make the network wide and deep enough that it works well, but no wider and deeper than that.
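To make the contrast concrete, here is a small sketch (all layer widths are arbitrary choices for illustration) that builds a wide, shallow CNN and a deeper one, so their depth and parameter counts can be compared:

```python
# A minimal sketch: a wide, shallow topology vs. a deeper one.
# Layer widths are arbitrary illustrative choices.
from tensorflow.keras import layers, models

wide_shallow = models.Sequential([
    layers.Conv2D(256, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

deep = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),   # still low-level features
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),  # object parts, structures
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

wide_shallow.summary()  # far more parameters in one shallow block
deep.summary()          # fewer parameters spread over a feature hierarchy
```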
4. Handle Overfitting and Underfitting Problems
In order to talk about overfitting and underfitting, let's start with a simple concept: the model. What is a model? It is a system which maps input to output. E.g., we can generate an image classification model which takes a test input image and predicts a class label for it. It's interesting!
To generate a model, we divide the dataset into a training set and a testing set. We train our model with a classifier, e.g. a CNN, on the training set. Then we can use the trained model to predict the output for the test data.
Now, what are overfitting and underfitting?
Overfitting refers to a model that models the training data too well. What does that mean? Let's simplify... An overfitting model gives very good accuracy on the training data but much lower accuracy on the test data. This means an overfitting model has a good memorization ability but a poor generalization ability: it doesn't generalize well from our training data to unseen data.
Underfitting refers to a model that cannot even capture the training data well. It's very dangerous.. isn't it? Such a model has low accuracy on the training data itself, so we cannot expect good accuracy on the test data either.
In technical terms, a model that overfits has low bias and high variance, while a model that underfits has high bias and low variance. In any modeling there will always be a tradeoff between bias and variance, and when we build models we try to achieve the best balance.
Now what are bias and variance?
Bias is the error with respect to the training set. Variance is how much the model changes in response to the training data; a high-variance model doesn't give good accuracy on the test data.
Fig.2 Underfitting Vs. Overfitting (Source: Wikipedia)
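As a tiny numerical illustration of this tradeoff (using scikit-learn rather than a CNN, with arbitrary polynomial degrees chosen for demonstration):

```python
# A minimal sketch: underfitting vs. overfitting with polynomial regression.
# Degrees and data are arbitrary illustrative choices.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.sort(rng.rand(30, 1), axis=0)
y = np.cos(3 * X).ravel() + rng.normal(scale=0.1, size=30)  # noisy target

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(f"degree {degree:2d}: train R^2 = {model.score(X, y):.3f}")
# degree 1 underfits (low training score, high bias);
# degree 15 fits the training data almost perfectly (high variance, overfit).
```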
How to Prevent Underfitting and Overfitting?
Let's start with Underfitting:
An example of underfitting: your model gives only 50% accuracy even on the training data.
It's the worst problem..
Why does it occur?
The answer: underfitting occurs when a model is too simple (informed by too few features or regularized too much), which makes it inflexible in learning from the dataset.
Solution...
If there is underfitting, I would suggest focusing on the depth of the model. You may need to add layers, as this will give you more detailed features. And, as we discussed above, you need to tune the parameters to avoid underfitting.
Overfitting:
An example of overfitting: your model gives 99% accuracy on the training data but only 60% accuracy on the test data.
Overfitting is a common problem in machine learning..
There are certain solutions to avoid overfitting:
1. Train with more data
2. Early stopping
3. Cross validation
Let's discuss them one by one.
1. Train with more data:
Training with more data helps to increase the accuracy of the model, and a large training set may avoid the overfitting problem. For a CNN, we can use data augmentation to increase the size of the training set.
2. Early stopping:
The system is trained over a number of iterations, and the model improves with each new iteration.. But wait.. after a certain number of iterations the model starts to overfit the training data, and its generalization ability can be weakened. So do early stopping: early stopping refers to stopping the training process before the learner passes that point.
Fig.3 Early Stopping (Source: Wikipedia)
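In Keras, this is available as a built-in callback; here is a minimal sketch (the monitored metric and patience value are assumptions for illustration):

```python
# A minimal sketch: early stopping with a Keras callback.
# The monitored metric and patience value are illustrative assumptions.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",         # watch validation loss for the turning point
    patience=3,                 # tolerate 3 epochs without improvement
    restore_best_weights=True,  # roll back to the best weights seen so far
)

# Passed to training like so (model and data as defined elsewhere):
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```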
3. Cross validation:
Cross validation is a nice technique to avoid the overfitting problem.
What is cross validation?
Let's start with k-fold cross validation (where k is a positive integer).
Partition the original training data set into k equal subsets. Each subset is called a fold. Let the folds be named f1, f2, …, fk.
· For i = 1 to k:
· Keep fold fi as the validation set and keep all the remaining k-1 folds in the cross validation training set.
· Train your machine learning model using the cross validation training set, and calculate the accuracy of the model by validating the predicted results against the validation set.
· Estimate the overall accuracy of your machine learning model by averaging the accuracies obtained in all k cases of cross validation.
Fig.4 5-fold Cross Validation (Source: Wikipedia)
Fig. 4 depicts 5-fold cross validation, where the training dataset is divided into 5 equal sub-datasets. There are 5 iterations; in each iteration, 4 sub-datasets are used for training whilst one sub-dataset is used for testing.
Cross-validation is definitely helpful in reducing the overfitting problem.
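Here is a minimal sketch of the k-fold procedure described above, using scikit-learn; the data and the classifier are placeholders assumed for illustration:

```python
# A minimal sketch: k-fold cross validation with scikit-learn.
# The data and the classifier are placeholders for illustration.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 8)         # stand-in feature matrix
y = np.random.randint(0, 2, 100)   # stand-in binary labels

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []

for train_idx, val_idx in kfold.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])                   # train on k-1 folds
    accuracies.append(model.score(X[val_idx], y[val_idx]))  # validate on fold i

# Average the k accuracies to estimate overall model accuracy:
print("estimated accuracy:", np.mean(accuracies))
```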
Go Further!
I hope you enjoyed this post. The tutorial should help you understand how we can improve the performance of a CNN model. While these concepts may feel overwhelming at first, they will 'click into place' once you start seeing them in the context of real-world code and problems. If you are able to follow the things in this post easily, or even with a little more effort, well done! Try doing some experiments... Good luck!