Using Ontology-Based Approach to Improved Information Retrieval Semantically for Historical Domain

Fatihah Ramli (1), Shahrul Azman Mohd Noah (2), Tri Basuki Kurniawan (3)
(1) Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 93400 Kota Samarahan, Sarawak, Malaysia
(2) Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
(3) Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
How to cite (IJASEIT):
Ramli, Fatihah, et al. “Using Ontology-Based Approach to Improved Information Retrieval Semantically for Historical Domain”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 3, June 2020, pp. 1130-6, doi:10.18517/ijaseit.10.3.10180.
Searching and retrieving documents from large historical archives is challenging for the information retrieval (IR) field, because historians typically rely on their own knowledge, experience, and intuition. Several works have applied IR to historical documents. However, conventional IR models mostly use a simple bag-of-words (BOW) approach and are usually unable to support precise document retrieval in the history domain. We propose an ontology-based approach to semantically index and rank rich historical documents. Historical documents relating to the Vietnam War were chosen for this study. Several existing ontologies were reviewed to identify the most suitable concepts and properties, namely those carrying rich information about relevant entities such as events, time, and people. The domain ontology was developed by reusing the existing Simple News and Press (SNaP) ontology and extending it with concepts related to the Vietnam War. The ontology was then semantically mapped to concepts found in a collection of 133 documents relating to the Vietnam War. We also propose a simple ontology-based weighting mechanism derived from the classic tf-idf scoring scheme. Finally, 20 SPARQL queries were implemented for the evaluation. The evaluation shows that the proposed ontology-based approach achieved better precision and recall than the baseline BM25 probabilistic retrieval model. These results suggest that an ontology-based approach to document retrieval can compete with the keyword-based approach.
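
As a rough illustration of the ideas summarized above, the sketch below applies a classic tf-idf weight to ontology concepts annotated in each document (rather than to raw words), ranks documents for a set of query concepts, and computes precision and recall. The abstract does not state the exact weighting formula, so this is a generic tf-idf under that assumption and not the authors' mechanism; the function names (`concept_tf_idf`, `rank`, `precision_recall`) and any concept labels used with them are hypothetical.

```python
import math
from collections import Counter


def concept_tf_idf(docs):
    """Weight each annotated concept per document with classic tf-idf.

    docs maps a document id to the list of ontology concept labels
    annotated in that document (hypothetical input format).
    """
    n_docs = len(docs)
    # Document frequency: number of documents each concept appears in.
    df = Counter()
    for concepts in docs.values():
        df.update(set(concepts))
    weights = {}
    for doc_id, concepts in docs.items():
        tf = Counter(concepts)
        # tf * log(N / df); assumed form, not taken from the paper.
        weights[doc_id] = {c: tf[c] * math.log(n_docs / df[c]) for c in tf}
    return weights


def rank(query_concepts, weights):
    """Return ids of documents with a positive score, best first."""
    scores = {
        doc_id: sum(w.get(c, 0.0) for c in query_concepts)
        for doc_id, w in weights.items()
    }
    matched = [d for d in scores if scores[d] > 0]
    return sorted(matched, key=lambda d: -scores[d])


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In a setup like the one described, the query concepts would come from the SPARQL queries and the relevance judgments from the evaluation set; both are left as plain Python inputs here.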

