PESTD: a large-scale Persian-English scene text dataset

Rashtehroudi, Atefeh Ranjkesh; Akoushideh, Alireza; Shahbahrami, Asadollah

doi:10.1007/s11042-023-15062-0

Published: 25 March 2023

Volume 82, pages 34793–34808, (2023)

Multimedia Tools and Applications

Atefeh Ranjkesh Rashtehroudi¹,
Alireza Akoushideh ORCID: orcid.org/0000-0001-9958-4613² &
Asadollah Shahbahrami¹

Abstract

Extracting text from natural scene images has become a vital task. Uncertainty in the size, color, background, and alignment of characters makes text recognition in natural scene images challenging. A further, more recent challenge is the development and expansion of intelligent transportation systems, especially traffic sign recognition, which helps make driving safer and easier. A scene-text dataset that serves as a benchmark for evaluating how well researchers’ algorithms generalize is therefore critical. This study, one of the first in the field of text-based traffic signs, presents a Persian-English multilingual dataset (PESTD) containing 5832 instances of letters, digits, and symbols in three categories: Persian, English, and Persian-English. Because numbers and letters are written similarly in Persian (Farsi), Arabic, and Urdu, PESTD can be used in all countries where these languages are used. To prepare the PESTD instances, text detection was performed on traffic signs in Iran. The CRAFT feature extraction algorithm was combined with YOLO and the Tesseract engine to recognize cursive and multilingual text despite its specific challenges. Experimental results show that YOLOv5 achieves better values on the evaluation criteria than older YOLO versions. The accuracy and F1-score attained on PESTD are 95.3% and 92.3%, respectively.
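For readers unfamiliar with the two evaluation criteria named above, the following is a minimal sketch of how accuracy and F1-score are conventionally computed from confusion counts. It is not the authors' evaluation code: the `Counts` container, the function names, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for confusion counts of a detection task."""
    tp: int  # text instances correctly detected
    fp: int  # detections that do not correspond to real text
    fn: int  # real text instances that were missed
    tn: int  # non-text regions correctly rejected


def precision(c: Counts) -> float:
    """Fraction of detections that are correct."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    """Fraction of real instances that were detected."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: Counts) -> float:
    """Fraction of all decisions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Illustrative counts only; these are NOT taken from the paper.
example = Counts(tp=8, fp=2, fn=2, tn=8)
print(f"accuracy={accuracy(example):.3f}, F1={f1_score(example):.3f}")
```

Because F1 ignores true negatives while accuracy includes them, the two values can differ on the same results, which is consistent with the abstract reporting them separately.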

Data availability

The datasets generated during the current study are available in the Persian-English-Scene-Text-Dataset (PESTD) repository, [Link].

Funding

The author(s) received no financial support for this article’s research, authorship, and publication.

Author information

Authors and Affiliations

Computer Engineering Department, Guilan University, Rasht, Iran
Atefeh Ranjkesh Rashtehroudi & Asadollah Shahbahrami
Electrical and Computer Department, Technical and Vocational University (TVU), Guilan Branch, Rasht, Iran
Alireza Akoushideh

Corresponding author

Correspondence to Alireza Akoushideh.

Ethics declarations

Conflict of interest

The author(s) declared no potential conflicts of interest concerning this article’s research, authorship, and publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rashtehroudi, A.R., Akoushideh, A. & Shahbahrami, A. PESTD: a large-scale Persian-English scene text dataset. Multimed Tools Appl 82, 34793–34808 (2023). https://doi.org/10.1007/s11042-023-15062-0

Received: 23 April 2022
Revised: 05 November 2022
Accepted: 02 March 2023
Published: 25 March 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11042-023-15062-0
