Document Classification Using Deep Learning
Classifying textual documents is a challenging problem. In this tutorial you will learn how to classify document images using deep learning, specifically a Convolutional Neural Network (CNN).
Dataset: Tobacco3482
You can download the dataset using the following link.
Link: Tobacco3482_dataset
Dataset Description:
The Tobacco3482 dataset consists of 3482 images in total, from 10 different document classes: Memo, News, Note, Report, Resume, Scientific, Advertisement, Email, Form and Letter. The dataset has two directories, Tobacco3482_1 and Tobacco3482_2.
The Tobacco3482_1 directory contains images of 6 document classes: Memo, News, Note, Report, Resume and Scientific.
The Tobacco3482_2 directory contains images of 4 document classes: Advertisement, Email, Form and Letter.
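The post does not show how the images are collected from the two directories, so here is a minimal sketch of one way to do it. It assumes each class lives in its own subdirectory named after the class (for example Tobacco3482_1/Memo/), which is an assumption about the download layout, and list_labeled_images is a hypothetical helper name. The class order follows the class indices used later in the classification report.

```python
import os

# Class order taken from the classification report later in this post.
CLASS_NAMES = ['Note', 'Scientific', 'Report', 'Resume', 'News',
               'Memo', 'Advertisement', 'Email', 'Form', 'Letter']

def list_labeled_images(root_dirs, class_names=CLASS_NAMES):
    """Return (image_path, class_index) pairs for every file under root_dirs.

    Assumption: each class is a subdirectory named after the class.
    """
    samples = []
    for root in root_dirs:
        for class_dir in sorted(os.listdir(root)):
            class_path = os.path.join(root, class_dir)
            # Skip stray files and folders that are not known classes.
            if not os.path.isdir(class_path) or class_dir not in class_names:
                continue
            label = class_names.index(class_dir)
            for file_name in sorted(os.listdir(class_path)):
                samples.append((os.path.join(class_path, file_name), label))
    return samples
```

Calling list_labeled_images(['Tobacco3482_1', 'Tobacco3482_2']) would then give the file paths together with the integer labels needed in the later steps.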
Here are some examples:
In recent years, Convolutional Neural Networks have enjoyed great success in image classification. There are large domain differences between natural images and document images. For example, in a natural image, the object of interest can appear in any region of the image. In contrast, many document images are 2D entities that occupy the whole image. So the question arises whether the same kind of CNN architecture also works well for document images. The answer is a big 'YES'. Thanks to the flexibility of CNNs, we can use them for natural image classification as well as document image classification.
The experiments use the Tobacco3482 dataset and were carried out with Python 2.7 on the Ubuntu operating system. Follow the steps below for a successful implementation.
1. Import the necessary libraries:
# Import libraries
import os, cv2
import numpy as np
from keras import backend as K
K.set_image_dim_ordering('tf')  # channels-last (TensorFlow) ordering, older Keras API
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import RMSprop
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
2. Image Preprocessing:
A CNN takes input images of a fixed size, so resize every image used in the experiments with the cv2.resize() function. Here each image is resized to 299 x 299 pixels:
input_img_resize = cv2.resize(input_img, (299, 299))
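The model defined later expects input of shape (299, 299, 1), so the resized images still need to be stacked into a single array with a trailing channel axis. The sketch below shows one way to do that with NumPy only. It assumes the images were read as single-channel grayscale and that pixel values are scaled to [0, 1], neither of which the post states; to_cnn_input is a hypothetical helper name, and synthetic arrays stand in for the output of cv2.resize.

```python
import numpy as np

def to_cnn_input(gray_images):
    """Stack equally sized 2D grayscale images into a (N, H, W, 1) float32 batch."""
    batch = np.stack(gray_images).astype('float32')
    batch /= 255.0  # assumption: 8-bit pixels scaled to the range [0, 1]
    return np.expand_dims(batch, axis=-1)  # add the channel axis (channels-last)

# Synthetic stand-ins for images already resized to 299 x 299
fake_images = [np.full((299, 299), 255, dtype=np.uint8) for _ in range(3)]
x = to_cnn_input(fake_images)
print(x.shape)  # (3, 299, 299, 1)
```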
3. One-hot encoding:
In one-hot encoding, we convert categorical data into a vector of numbers. The reason is that most machine learning algorithms cannot work with categorical data directly. You generate one boolean column for each category or class, and only one of these columns takes the value 1 for each sample. Hence the term one-hot encoding.
For our problem, the one-hot encoding of each document image is a row vector of dimension 1 x 10, as there are 10 classes. The important thing to note is that the vector is all zeros except at the position of the class it represents, where it is 1. For example, with class labels numbered from 0, an image with label 2 has the one-hot vector [0 0 1 0 0 0 0 0 0 0].
So let's convert the class labels into one-hot encoding vectors:
# convert class labels to one-hot encoding
num_classes = 10  # the ten Tobacco3482 document classes
y = np_utils.to_categorical(labels, num_classes)
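To make the conversion concrete, here is a small NumPy-only sketch of the same idea. It is not the Keras implementation; one_hot is a hypothetical helper, and it assumes integer class labels counted from 0.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Set the column matching each sample's class index to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

encoded = one_hot([2, 0, 9], 10)
print(encoded[0])  # the row for label 2 has its single 1 at index 2
```

Taking np.argmax along each row recovers the original integer labels, which is how the one-hot test labels are decoded in the classification report step.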
4. Train-Test-Split:
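We can divide the dataset into training and testing sets using the train_test_split() function. With test_size=0.2, 20% of the images are held out for testing:
# x is the array of preprocessed images, y the one-hot labels
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)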
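To see roughly what such a split does, the sketch below shuffles sample indices with a fixed seed and cuts off a test portion. It illustrates the idea only: shuffled_split is a hypothetical helper, not scikit-learn's implementation, and it will not pick the same samples as train_test_split even with the same seed.

```python
import numpy as np

def shuffled_split(num_samples, test_size=0.2, seed=2):
    """Return (train_idx, test_idx): a random, non-overlapping index split."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(num_samples)
    num_test = int(np.ceil(test_size * num_samples))
    return order[num_test:], order[:num_test]

# 3482 is the number of images in Tobacco3482
train_idx, test_idx = shuffled_split(3482)
print("%d train, %d test" % (len(train_idx), len(test_idx)))  # 2785 train, 697 test
```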
5. Building CNN Model:
Oh! Good... Now the actual story starts. I used a Keras CNN with the TensorFlow backend for training. First build the model, then compile it and fit it on the training data.
# CNN Model
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(299, 299, 1)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
#model.add(Convolution2D(64, 3, 3))
#model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
#sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=["accuracy"])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=["accuracy"])
# Viewing model_configuration
model.summary()
# Fit the model (num_epoch is the number of training epochs, set beforehand)
model.fit(X_train, y_train, batch_size=16, epochs=num_epoch, verbose=1, validation_data=(X_test, y_test))
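It can help to work out by hand what model.summary() should report. The sketch below traces the spatial size and parameter count through the layers above using plain arithmetic, without Keras. It assumes stride-1 convolutions, 2x2 pooling that rounds odd sizes down, and 10 output classes; conv_out and conv_params are hypothetical helper names.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1

def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch

size, channels, total = 299, 1, 0

# (filters, padding, followed_by_2x2_pool) for the three Conv2D layers
layers = [(32, 'same', False), (32, 'valid', True), (64, 'valid', True)]
for filters, padding, pool in layers:
    total += conv_params(3, channels, filters)
    size, channels = conv_out(size, 3, padding), filters
    if pool:
        size //= 2  # 2x2 max pooling rounds odd sizes down

flat = size * size * channels  # length of the Flatten output
total += flat * 64 + 64        # Dense(64)
total += 64 * 10 + 10          # Dense(num_classes) with 10 classes

print("%d %d %d" % (size, flat, total))  # 73 341056 21856362
```

Almost all of the roughly 21.9 million parameters sit in the first Dense layer, because it is connected to every one of the 341056 flattened features.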
6. Evaluate CNN Model:
Once the model is trained, we can evaluate it on the test data.
# Evaluating the model
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss: %s' % score[0])
print('Test accuracy: %s' % score[1])
Congratulations! You should get quite good results.
7. Classification Report and Confusion Matrix:
from sklearn.metrics import classification_report, confusion_matrix
# Convert predicted probabilities and one-hot test labels back to class indices
Y_pred = model.predict(X_test)
y_pred = np.argmax(Y_pred, axis=1)
target_names = ['class 0(Note)', 'class 1(Scientific)', 'class 2(Report)', 'class 3(Resume)', 'class 4(News)', 'class 5(Memo)', 'class 6(Advertisement)', 'class 7(Email)', 'class 8(Form)', 'class 9(Letter)']
print(classification_report(np.argmax(y_test, axis=1), y_pred, target_names=target_names))
print(confusion_matrix(np.argmax(y_test, axis=1), y_pred))
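To read the confusion matrix, it helps to see how one is built. The sketch below counts (true class, predicted class) pairs with NumPy, using the same convention as scikit-learn: rows are true classes and columns are predicted classes. The labels are made up for illustration and are not results from the model; confusion_counts is a hypothetical helper.

```python
import numpy as np

def confusion_counts(true_idx, pred_idx, num_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_idx, pred_idx):
        matrix[t, p] += 1
    return matrix

# Toy example with 3 classes (made-up labels)
true_idx = [0, 0, 1, 2, 2, 2]
pred_idx = [0, 1, 1, 2, 2, 0]
cm = confusion_counts(true_idx, pred_idx, 3)

# Correct predictions lie on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / float(cm.sum())
print(cm)
print("accuracy: %.3f" % accuracy)
```

Off-diagonal entries show which classes get confused with each other, for example how often one document class is predicted as another.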
8. Save Model:
We can save the architecture and the weights of the trained model.
# Saving and loading model and weights
from keras.models import model_from_json
from keras.models import load_model
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
9. Load Model:
To test later with the trained weights we already saved (e.g. in model.h5), rebuild the model from the JSON file and load the weights into it.
# load json and create model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
# compile the loaded model before evaluating it on test data
loaded_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Read the test image using the cv2.imread() function and preprocess it
# the same way as the training images before predicting
print(loaded_model.predict(test_image))
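The predict call returns one row of 10 softmax probabilities per image, which still has to be turned into a class name. Here is a minimal sketch of that last step. The class order follows target_names from step 7, decode_prediction is a hypothetical helper, and the probability row is made up to stand in for real model output.

```python
import numpy as np

# Class order taken from target_names in the classification report step.
CLASS_NAMES = ['Note', 'Scientific', 'Report', 'Resume', 'News',
               'Memo', 'Advertisement', 'Email', 'Form', 'Letter']

def decode_prediction(probabilities, class_names=CLASS_NAMES):
    """Map one row of softmax outputs to (class_name, probability)."""
    probabilities = np.asarray(probabilities).ravel()
    best = int(np.argmax(probabilities))
    return class_names[best], float(probabilities[best])

# Made-up output standing in for loaded_model.predict(test_image)
fake_output = np.array([[0.02, 0.03, 0.05, 0.6, 0.05,
                         0.05, 0.05, 0.05, 0.05, 0.05]])
name, prob = decode_prediction(fake_output)
print("%s (%.2f)" % (name, prob))  # Resume (0.60)
```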
Go Further!
I hope you enjoyed this post. The tutorial is a good start for building convolutional neural networks in Python with Keras, and the code helps you develop a document classification system. If you were able to follow along easily, or even with a little more effort, well done! Try some experiments, maybe with the same model architecture but on other publicly available datasets. Good luck!
Reference:
Jayant Kumar, Peng Ye and David Doermann. "Structural Similarity for Document Image Classification and Retrieval." Pattern Recognition Letters, November 2013.