Abstract
Devanagari ancient document recognition process is drawing a lot of consideration from researchers nowadays. These ancient documents contain a wealth of knowledge. However, these documents are not available to all because of their fragile condition. A Devanagari ancient manuscript recognition system is designed for digital archiving. This system includes image binarization, character segmentation and recognition phases. It incorporates automatic recognition of scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic characters). In this paper, handwritten Devanagari ancient manuscripts recognition system has been presented using statistical features extraction techniques. In feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered in this work. Various feature extraction and classification techniques are considered and compared to the recognition of basic characters segmented from Devanagari ancient manuscripts. A data set, of 6152 pre-segmented samples of Devanagari ancient documents, is considered for experimental work. Authors have achieved 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work by a simple majority voting scheme.





Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Shah K R and Badgujar D D 2013 Devnagari handwritten character recognition (DHCR) for ancient documents: a review. In: Proceedings of the 2013 IEEE Conference on Information and Communication Technology. 656–660
Sarkar R, Malakar S, Das N, Basu S and Nasipuri M 2010 A script independent technique for extraction of characters from handwritten word images. Int. J. Comput. Appl. 1(23): 83–88
Kleber F, Sablatnig R, Gau M and Miklas H 2008 Ancient document analysis based on text line extraction. In: Proceedings of the 19th International Conference on Pattern Recognition. 1–4
Bansal V and Sinha R M K 2001 A complete OCR for printed Hindi text in Devanagari script. In: Proceedings of the 6th International Conference on Document Analysis and Recognition. 800–804
Kim M S, Jang M D, Choi H L, Rhee T H, Kim J H and Kwag H K 2004 Digitalizing scheme of handwritten Hanja historical documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries. 321–327
Sousa J M C, Pinto J R C, Ribeiro C S and Gil J M 2005 Ancient document recognition using fuzzy methods. In: Proceedings of the IEEE International Conference on Fuzzy Systems. 833–836
Cecotti H and Belaid A 2005 Hybrid OCR combination approach complemented by a specialized ICR applied on ancient documents. In: Proceedings of the 8th International Conference on Document Analysis and Recognition. 1045–1049
Diem M and Sablatnig R 2009 Recognition of degraded handwritten characters using local features. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. 221–225
Raghuraj S, Yadav C S, Verma P and Yadav V 2010 Optical Character Recognition (OCR) for printed Devanagari script using artificial neural network. Int. J.Computer Science & Communication 1(1): 91–95
Holambe A N, Thool R C and Jagade S M 2011 A brief review and survey of feature extraction methods for Devnagari OCR. In: Proceedings of the 9th International Conference on ICT and Knowledge Engineering. 99–104
Yadav D, Sánchez-Cuadrado S and Morato J 2013 OCR for Hindi language using a neural network approach. J. Inf. Process. Syst. 9(1): 117–140
Yunxue S Y, Wang C and Xiao B 2015 A character image restoration method for unconstrained handwritten Chinese character recognition. Int. J. Doc. Anal. Recognit. 18(1): 73–86
Katiyar G and Mehfuz S 2016 A hybrid recognition system for off-line handwritten characters. SpringerPlus 5: 1–18
Belhe S, Paulzagade C, Deshmukh A, Jetley S and Mehrotra K 2012 Hindi handwritten word recognition using HMM and symbol tree. In: Proceedings of the Workshop on Document Analysis and Recognition (DAR). 9–14
Lehal G S and Singh C 1999 Feature extraction and classification for OCR of Gurmukhi script. Vivek 12(2): 2–12
Kumar M, Sharma R K and Jindal M K 2013 A novel feature extraction technique for offline handwritten Gurmukhi character recognition. IETE J. Res. 59(6): 687–692
Kumar M, Sharma R K and Jindal M K 2014 Efficient feature extraction techniques for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(4): 381–391
Kumar M, Sharma R K and Jindal M K 2018 Character and numeral recognition for non-Indic and Indic scripts: a survey. Artif. Intell. Rev. https://doi.org/10.1007/s10462-017-9607-x
Kumar M, Jindal M K and Sharma R K 2012 Offline handwritten Gurmukhi character recognition: study of different features and classifiers combinations. In: Proceedings of the International Workshop on Document Analysis and Recognition, IIT Bombay. 94–99
Elleuch M, Maalej R and Kherallah M 2016 A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Procedia Computer Science 80: 1712–1723
Lecun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition. Proc. IEEE 86(11): 2278–2324
Niu X Y, Xia L Y, Wang T X and Zhang X Y 2010 Application of BP-ANN and LS-SVM to discrimination of rice origin based on trace metals. Proc. Int. Conf. Mach. Learn. Cybern. 3: 1426–1430
Zhang Y, Liu B and Yang F 2016 Differential evolution based selective ensemble of extreme learning machine. In: IEEE Trustcom/Bgdatase/ispa. 1327–1333
Kumar M, Sharma R K and Jindal M K 2014 A novel hierarchical technique for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(6): 567–572
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Narang, S., Jindal, M.K. & Kumar, M. Devanagari ancient documents recognition using statistical feature extraction techniques. Sādhanā 44, 141 (2019). https://doi.org/10.1007/s12046-019-1126-9
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12046-019-1126-9