Text classification is a crucial step in optical character recognition. The output of a scanner is a non-editable image: the scanned text cannot be changed even when changes are required. This limitation motivates optical character recognition (OCR), the process of converting scanned images of machine-printed or handwritten text into a computer-readable format. OCR involves several steps: image acquisition, pre-processing, segmentation, feature extraction, and classification. Incorrect classification produces a garbage-in, garbage-out situation. Existing methods focus only on the classification of unmixed characters in Arabic, English, Latin, Farsi, Bangla, and Devanagari scripts. The proposed hybrid technique addresses the mixed (machine-printed and handwritten) character classification problem. Classification is carried out on different kinds of everyday forms, such as self-declaration forms, admission forms, verification forms, university forms, certificates, banking forms, dairy forms, and Punjab government forms. The proposed technique can classify handwritten and machine-printed text written in Gurumukhi script within mixed text. It has been tested on 150 different kinds of forms in Gurumukhi and Roman scripts, achieving 93% accuracy on mixed-character forms and 96% accuracy on unmixed-character forms. The overall accuracy of the proposed technique is 94.5%.
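The reported overall accuracy is numerically consistent with an unweighted mean of the two per-category accuracies. The short check below assumes that is how the figure was obtained; the text does not state the weighting, nor how the 150 forms split between mixed and unmixed categories, and the variable and function names are illustrative only.

```python
# Accuracies as reported in the text, in percent.
mixed_accuracy = 93.0    # mixed (machine-printed + handwritten) forms
unmixed_accuracy = 96.0  # unmixed forms


def unweighted_mean(values):
    """Return the plain arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Assumption: both categories contribute equally to the overall figure.
overall = unweighted_mean([mixed_accuracy, unmixed_accuracy])
print(f"{overall:.1f}%")  # prints "94.5%"
```

If the two categories contained different numbers of forms, a count-weighted mean would generally give a different value, so the match with 94.5% only suggests equal weighting.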