Thursday, July 5, 2018

Improving the Performance of Convolutional Neural Networks!

The Convolutional Neural Network (CNN) – a pillar algorithm of deep learning – has been one of the most influential innovations in the field of computer vision. CNNs have performed far better than traditional computer vision algorithms and have proven successful in many real-life case studies and applications, like:
·     Image classification, object detection, segmentation, and face recognition;
·     Classification of crystal structures using a convolutional neural network;
·     Self-driving cars that leverage CNN-based vision systems;
·     And many more, of course!
Plenty of articles are available on how to build a Convolutional Neural Network, so I am not going into detail about the implementation of a CNN. If you are interested in document classification using a CNN, please click here.
The central theme of this tutorial is how to improve the performance of a CNN.
Let’s start ...
The common question is:
How can I get better performance from my deep learning model?
It might also be asked as:
How can I improve accuracy?
Or: Oh God! My CNN is performing poorly..
Don’t be stressed..
Here is the tutorial. It will give you concrete ideas to lift the performance of your CNN.
The list is divided into 4 topics:
1. Tune Parameters
2. Image Data Augmentation
3. Deeper Network Topology
4. Handle Overfitting and Underfitting Problems

Oh! Cool.. Let’s start with the explanation.
1. Tune Parameters
To improve CNN model performance, we can tune parameters like the number of epochs, the learning rate, and so on. The number of epochs definitely affects performance: up to a point, more epochs bring improvement, but some experimentation is needed to decide on the number of epochs and the learning rate. Once we see that after a certain number of epochs there is no further reduction in training loss and no improvement in training accuracy, we can fix the number of epochs accordingly. We can also use dropout layers in the CNN model. Depending on the application, we need to choose a proper optimizer when compiling the model; various optimizers are available, e.g. SGD, RMSprop, and so on, and the model needs to be tuned with different optimizers. All of these things affect the performance of a CNN. A minimal code sketch is given below.
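To make this concrete, here is a minimal sketch of a small Keras CNN with a dropout layer and an explicitly chosen optimizer. It assumes TensorFlow 2's Keras API and 64x64 RGB inputs with 10 classes; the architecture and hyperparameter values are illustrative assumptions, not prescriptions.

# Minimal sketch: a small CNN with dropout, compiled with a tunable optimizer.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),  # dropout layer to reduce overfitting
    layers.Dense(10, activation='softmax'),
])

# Swap the optimizer (SGD, RMSprop, Adam) and the learning rate, then compare runs.
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# history = model.fit(x_train, y_train, epochs=30, validation_split=0.2)
# When history.history['loss'] stops decreasing, you have trained enough epochs.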

2. Image Data Augmentation
"Deep learning is only relevant when you have a huge amount of data". It’s not wrong. CNN requires the ability to learn features automatically from the data, which is generally only possible when lots of training data is available.
If we have less training data available.. what to do?
Solution is here.. use Image Augmentation
Image augmentation parameters that are generally used to increase the data sample count are zoom, shear, rotation, preprocessing function and so on. Usage of these parameters results in generation of images having these attributes during training of Deep Learning model. Image samples generated using image augmentation, in general existing data samples increased by the rate of nearly 3x to 4x times.


Fig.1 Data Augmentation (source: wikipedia)

One more advantage of data augmentation: since a CNN is not rotation invariant, augmentation lets us add rotated versions of the images to the dataset, which typically increases the accuracy of the system. A minimal code sketch follows.
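Here is a minimal sketch using Keras’ ImageDataGenerator with the parameters mentioned above; the specific values are illustrative assumptions.

# Minimal sketch: generating augmented images on the fly during training.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,     # random rotations, since CNNs are not rotation invariant
    zoom_range=0.15,       # random zoom in/out
    shear_range=0.15,      # shear transformations
    horizontal_flip=True,  # mirror images left/right
    rescale=1.0 / 255,     # simple preprocessing: scale pixels to [0, 1]
)

# Yields augmented batches from in-memory arrays x_train / y_train:
# train_generator = datagen.flow(x_train, y_train, batch_size=32)
# model.fit(train_generator, epochs=30)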

3. Deeper Network Topology
Now let’s talk about wide networks vs. deep networks!

A wide neural network could, in principle, be trained with every possible input value. Hence, these networks are very good at memorization, but not so good at generalization. There is, however, a practical difficulty with an extremely wide, shallow network: though it is able to accept every possible input value, in a practical application we won’t have every possible value for training.
Deeper networks capture the natural “hierarchy” that is present everywhere in nature. Take a convnet, for example: it captures low-level features in the first layer, slightly more complex but still low-level features in the next layer, and object parts and simple structures at the higher layers. The advantage of multiple layers is that they can learn features at various levels of abstraction.
So that explains why you might use a deep network rather than a very wide but shallow network.
But why not a very deep, very wide network?
The answer is that we want our network to be as small as possible while still producing good results. A wider network takes longer to train, and deep networks are very computationally expensive to train. Hence, make the network wide and deep enough that it works well, but no wider and deeper than that.
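As a rough illustration, here is a sketch of a deeper (but still modest) topology in Keras; the layer counts and filter sizes are illustrative assumptions.

# Minimal sketch: stacked conv blocks so that early layers learn edges and
# later layers learn object parts and higher-level structures.
from tensorflow.keras import layers, models

deep_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),   # mid-level features
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),  # higher-level structures
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])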

4. Handle Overfitting and Underfitting Problems
In order to talk about overfitting and underfitting, let’s start with a simple concept: the model. What is a model? It is a system that maps input to output. For example, we can build an image classification model that takes a test input image and predicts a class label for it. Interesting!
To build a model, we divide the dataset into a training set and a testing set. We train our model, e.g. a CNN classifier, on the training set. Then we can use the trained model to predict outputs for the test data.
Now what is Overfitting and Underfitting?
Overfitting refers to a model that models the training data too well. What does that mean? Let’s simplify... an overfitting model gives very good accuracy on the training data but much lower accuracy on the test data. In other words, an overfitting model has good memorization ability but poor generalization ability; it doesn’t generalize well from our training data to unseen data.

Underfitting refers to a model that fails to learn even the training data well: it gives poor accuracy on the training data, and therefore on the test data too. It’s very dangerous.. isn’t it?
In technical terms, a model that overfits has low bias and high variance, while a model that underfits has high bias and low variance. In any modeling there will always be a tradeoff between bias and variance, and when we build models we try to achieve the best balance.
Now what is bias and variance?
Bias is the error with respect to the training set. Variance is how much the model changes in response to changes in the training data; a high-variance model doesn’t give good accuracy on the test data.


Fig.2 Underfitting vs. Overfitting (Source: Wikipedia)

How to Prevent Underfitting and Overfitting?
Let’s start with Underfitting:
An example of underfitting: your model gives only 50% accuracy on the training data itself.
It’s the worst problem..
Why it occurs?
The answer is that underfitting occurs when a model is too simple – informed by too few features or regularized too much – which makes it inflexible in learning from the dataset.
Solution...
If there is underfitting, I would suggest focusing on the depth of the model. You may need to add layers, as this will give you more detailed features. As discussed above, you also need to tune parameters to avoid underfitting.

Overfitting:
An example of overfitting: your model gives 99% accuracy on the training data but only 60% accuracy on the test data.
Overfitting is a common problem in machine learning..
There are certain solutions to avoid overfitting:
1. Train with more data
2. Early stopping
3. Cross validation
Let’s discuss each one.
1. Train with more data:
Training with more data helps to increase the accuracy of the model. A large training set may prevent the overfitting problem. In a CNN, we can use data augmentation to increase the size of the training set.
2. Early stopping:
The system is trained over a number of iterations, and the model improves with each new iteration.. But wait.. after a certain number of iterations the model starts to overfit the training data, and its generalization ability can weaken. So do early stopping: early stopping refers to stopping the training process before the learner passes that point. A minimal code sketch follows the figure below.


Fig.3 Early Stopping (Source: Wikipedia)
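In Keras this can be done with a callback; here is a minimal sketch, assuming TensorFlow’s Keras API (the monitored metric and patience value are illustrative assumptions).

# Minimal sketch: stop training once validation loss stops improving.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',          # watch validation loss, not training loss
    patience=5,                  # stop after 5 epochs with no improvement
    restore_best_weights=True,   # roll back to the best epoch seen
)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])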

3. Cross validation:
Cross validation is a nice technique to avoid the overfitting problem.
What is cross validation?
Let’s start with k-fold cross validation (where k is a positive integer).
Partition the original training data set into k equal subsets. Each subset is called a fold. Let the folds be named f1, f2, …, fk. Then, for i = 1 to k:
·     Keep the fold fi as the validation set and keep all the remaining k−1 folds in the cross-validation training set.
·     Train your machine learning model on the cross-validation training set and calculate the accuracy of the model by validating the predicted results against the validation set.
Finally, estimate the accuracy of your machine learning model by averaging the accuracies obtained in all k iterations of cross validation.


Fig.4 5-fold Cross Validation(Source: Wikipedia)

Fig. 4 shows 5-fold cross validation, where the training dataset is divided into 5 equal sub-datasets. There are 5 iterations; in each iteration, 4 sub-datasets are used for training while one sub-dataset is used for validation.
Cross-validation definitely helps to reduce the overfitting problem. A minimal code sketch is given below.
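Here is a minimal k-fold sketch using scikit-learn’s KFold to drive a Keras model. The helper build_model() and the arrays x_train / y_train are hypothetical names assumed to exist for illustration.

# Minimal sketch: 5-fold cross validation over the training data.
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, val_idx in kf.split(x_train):
    model = build_model()  # hypothetical: returns a freshly compiled Keras model
    model.fit(x_train[train_idx], y_train[train_idx], epochs=10, verbose=0)
    _, acc = model.evaluate(x_train[val_idx], y_train[val_idx], verbose=0)
    scores.append(acc)

print('Mean cross-validation accuracy:', np.mean(scores))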


Go Further!

I hope you enjoyed this post. This tutorial should help you understand how to improve the performance of a CNN model. While these concepts may feel overwhelming at first, they will ‘click into place’ once you start seeing them in the context of real-world code and problems. If you were able to follow the things in this post easily, or even with a little more effort, well done! Try doing some experiments ... Good luck!

