Devanagari ancient documents recognition using statistical feature extraction techniques

Narang, Sonika; Jindal, M K; Kumar, Munish

doi:10.1007/s12046-019-1126-9

Published: 13 May 2019

Sādhanā, Volume 44, article number 141 (2019)

Sonika Narang¹,
M K Jindal² &
Munish Kumar³

Abstract

Recognition of ancient Devanagari documents is attracting considerable attention from researchers. These documents contain a wealth of knowledge, but their fragile condition keeps them out of reach for most readers. This paper presents a recognition system for handwritten ancient Devanagari manuscripts, designed for digital archiving, that uses statistical feature extraction techniques. The system comprises image binarization, character segmentation and recognition phases, and it automatically recognizes scanned, segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and compound characters (characters formed by joining more than one basic character). In the feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered. These feature extraction and classification techniques are compared on the recognition of basic characters segmented from ancient Devanagari manuscripts. The experiments use a data set of 6152 pre-segmented samples from ancient Devanagari documents. The authors achieve 88.95% recognition accuracy using a combination of all features and a combination of all classifiers through a simple majority voting scheme.
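
The abstract names its features and the voting scheme without defining them, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes a binarized, thinned character image stored as rows of 0/1 values. It takes the peak extent of a row or column to be its longest run of consecutive ink pixels, summed over all rows or columns, and it detects open endpoints and intersection points from the crossing number of each pixel's 8-neighbourhood. All function names are hypothetical, and any zoning of the image is omitted.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink (foreground), 0 = background

# 8-neighbour offsets in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def centroid(img: Image) -> Tuple[float, float]:
    """Mean (row, col) of the ink pixels; (0.0, 0.0) for an empty image."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not pts:
        return (0.0, 0.0)
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)


def longest_run(line: Iterable[int]) -> int:
    """Length of the longest run of consecutive ink pixels in one line."""
    best = cur = 0
    for v in line:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best


def horizontal_peak_extent(img: Image) -> int:
    """Sum over rows of the longest ink run (assumed definition)."""
    return sum(longest_run(row) for row in img)


def vertical_peak_extent(img: Image) -> int:
    """Sum over columns of the longest ink run (assumed definition)."""
    return sum(longest_run(col) for col in zip(*img))


def pixel(img: Image, r: int, c: int) -> int:
    """Pixel value, treating everything outside the image as background."""
    inside = 0 <= r < len(img) and 0 <= c < len(img[0])
    return img[r][c] if inside else 0


def crossing_number(img: Image, r: int, c: int) -> int:
    """Number of 0 -> 1 transitions while walking once around the pixel."""
    ring = [pixel(img, r + dr, c + dc) for dr, dc in RING]
    pairs = zip(ring, ring[1:] + ring[:1])
    return sum(1 for a, b in pairs if a == 0 and b == 1)


def endpoints_and_intersections(img: Image) -> Tuple[int, int]:
    """Count open endpoints (crossing number 1) and intersections (>= 3)."""
    ends = joins = 0
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            if not v:
                continue
            cn = crossing_number(img, r, c)
            if cn == 1:
                ends += 1
            elif cn >= 3:
                joins += 1
    return ends, joins


def majority_vote(predictions: Sequence[str]) -> str:
    """Label predicted by the most classifiers; ties go to the first seen."""
    return Counter(predictions).most_common(1)[0][0]


# A thinned plus sign as a toy character image.
PLUS: Image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

features = [
    *endpoints_and_intersections(PLUS),
    *centroid(PLUS),
    horizontal_peak_extent(PLUS),
    vertical_peak_extent(PLUS),
]

# Hypothetical labels from five classifiers for one character sample.
label = majority_vote(["ka", "kha", "ka", "ga", "ka"])
```

For the plus sign, this sketch reports four open endpoints, one intersection point, a centroid at the centre pixel, and horizontal and vertical peak extents of 9 each. The first-seen tie-break in `majority_vote` is one arbitrary choice among several; the abstract does not say how ties are resolved.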

Author information

Authors and Affiliations

Department of Computer Science, DAV College, Abohar, India
Sonika Narang
Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, India
M K Jindal
Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, India
Munish Kumar

Corresponding author

Correspondence to Munish Kumar.

About this article

Cite this article

Narang, S., Jindal, M.K. & Kumar, M. Devanagari ancient documents recognition using statistical feature extraction techniques. Sādhanā 44, 141 (2019). https://doi.org/10.1007/s12046-019-1126-9

Received: 19 July 2018
Revised: 14 November 2018
Accepted: 04 March 2019
Published: 13 May 2019
DOI: https://doi.org/10.1007/s12046-019-1126-9
