
Data Extraction from Handwritten Documents

Deep Learning Model: ResNet18

In this project, I extracted text data from handwritten documents using an image classification model.

Developed using: Python, PyTorch, NumPy, Matplotlib, and scikit-image

Objective

In the aerospace and defense manufacturing industry, every process is documented, and most of that documentation is handwritten. At the time of writing this post, I work at an aerospace manufacturing company called TATA Boeing Aerospace Limited (TBAL). Although we document everything in handwritten forms, the data cannot easily be processed or analyzed unless it is in digital format. So, to make the data usable, TBAL has assigned a few people (around 24+ man-hours per day) to digitize a few important documents.

The main objective of this project is to extract data from the handwritten documents without human intervention, so that those people can be freed up for other work.

Solution planning

For this problem, I opted for an image classification model. You may wonder: why not Optical Character Recognition (OCR)? The words used in the handwritten forms are predefined, so only that fixed vocabulary ever appears on a form; hence the problem can be solved with a simple image classification model.

Input image of the document

INFO: This is a sample image I prepared for reference.

[Image: sample input document]

First, we need to find the bounding box of each observation field. I found these bounding boxes using classical image processing: the scanned document is warped onto a reference image of the blank form, after which each field sits at a known position and can be cropped.
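A minimal sketch of this alignment-and-crop step with scikit-image, assuming the form's corner points have already been detected and the field positions on the reference layout are known (all coordinates below are placeholders):

py
import numpy as np
from skimage import io, transform

REF_SHAPE = (1000, 700)  # reference layout size as (rows, cols) -- assumed

# Matching corner points as (x, y) = (col, row):
# ref_pts are the form corners on the blank reference layout,
# scan_pts are the same corners detected in the scan (placeholder values).
ref_pts = np.array([[0, 0], [700, 0], [700, 1000], [0, 1000]], dtype=float)
scan_pts = np.array([[12, 8], [688, 15], [695, 990], [5, 1002]], dtype=float)

# Estimate the projective transform mapping reference coords -> scan coords.
# warp() uses it as the inverse map, so the output is the scan re-drawn
# in the reference frame.
tform = transform.ProjectiveTransform()
tform.estimate(ref_pts, scan_pts)

scan = io.imread('scanned_form.jpg', as_gray=True)
aligned = transform.warp(scan, tform, output_shape=REF_SHAPE)

# After alignment, every observation field sits at a fixed
# (row, col, height, width) box on the layout (hypothetical values).
r, c, h, w = 120, 80, 40, 160
field_crop = aligned[r:r + h, c:c + w]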

The split input looks like this:

[Image: cropped observation field containing the handwritten word "NM"]

And the expected output is:

NM

Defining the Model

After a little googling, I found that ResNet18 would do the job. So, I defined a ResNet18 model in PyTorch in the Google Colab environment.
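A minimal sketch of the model definition using torchvision; the number of predefined words (NUM_CLASSES) and the use of pretrained weights are assumptions, since the post does not state them:

py
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # number of predefined words -- an assumption

# Start from torchvision's ResNet18 and swap the final fully connected
# layer so it classifies field crops into the predefined words.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)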

Data Collection

For training data, I used crops from the old documents; for testing the model, I collected a few new samples from a few people. I also applied image augmentation to the training dataset, as sketched below.
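The exact transforms are not stated in the post, so the augmentations below are illustrative choices for handwriting crops; the folder layout (one subfolder per word class) and the crop size are also assumptions:

py
from torchvision import datasets, transforms

# Illustrative augmentations for the training split.
train_tf = transforms.Compose([
    transforms.Resize((64, 160)),                        # assumed crop size
    transforms.RandomRotation(5),                        # slight skew in handwriting
    transforms.RandomAffine(0, translate=(0.05, 0.05)),  # small shifts
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
val_tf = transforms.Compose([
    transforms.Resize((64, 160)),
    transforms.ToTensor(),
])

# Field crops organized as one folder per word class (assumed layout).
train_ds = datasets.ImageFolder('data/train', transform=train_tf)
val_ds = datasets.ImageFolder('data/val', transform=val_tf)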

Training ResNet18

Training ran in the Colab environment on a CUDA-enabled GPU. I trained the model for 100 epochs:

Epoch 99/99
----------
train Loss: 0.8627 Acc: 0.6667
val Loss: 0.2408 Acc: 0.9333

Training complete in 38m 21s
Best val Acc: 1.000000
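For reference, a minimal sketch of the standard PyTorch training loop, reusing model, device, and train_ds from the sketches above; the batch size, learning rate, and optimizer are assumptions:

py
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(100):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()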

Saving the predictions in Google Sheets

Overall process


Take an image from the input folder -> run the prediction -> write the result to the Google Sheet -> move the image to a processed folder

In Google Colab, we can easily mount Google Drive and edit Google Sheets.

py
from google.colab import drive

# Mount Google Drive so the document folders are reachable from the notebook.
drive.mount('/drive')

I edited the Google Sheet using a Python library called gspread, like below:

py
from google.colab import auth
auth.authenticate_user()  # grant the notebook access to the Google account

import gspread
from google.auth import default

# Reuse the Colab credentials to authorize gspread for Sheets access.
creds, _ = default()
gc = gspread.authorize(creds)
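Putting the steps together, here is a minimal sketch of the prediction loop described above. It reuses gc, model, device, val_tf, and train_ds from the earlier sketches; the sheet name and folder paths are hypothetical:

py
import shutil
import torch
from pathlib import Path
from PIL import Image

sheet = gc.open('Observations').sheet1      # hypothetical sheet name

incoming = Path('/drive/MyDrive/incoming')  # hypothetical folder paths
processed = Path('/drive/MyDrive/processed')

model.eval()
with torch.no_grad():
    for img_path in sorted(incoming.glob('*.jpg')):
        # Classify the field crop into one of the predefined words.
        x = val_tf(Image.open(img_path).convert('RGB')).unsqueeze(0).to(device)
        word = train_ds.classes[model(x).argmax(dim=1).item()]
        # Record the prediction, then move the image out of the queue.
        sheet.append_row([img_path.name, word])
        shutil.move(str(img_path), str(processed / img_path.name))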

The project worked well and helped TBAL reduce the manpower involved in data-entry work.