Ontology Driven Information Extraction from Tables Using Connectivity Analysis

Bahulkar, Ashwin; Reddy, Sreedhar

doi:10.1007/978-3-642-41030-7_47

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8185)

Abstract

Tables are one of the most common mechanisms used for presenting structured information on the web. A table presents information on a set of related concepts in a domain. A column typically represents a concept, or an attribute of a concept, identified by the column header. A row contains the corresponding instances and attribute values. However, column headers are usually quite noisy and sometimes missing altogether. While a human reader can work out the required domain mappings relatively easily by using domain knowledge and the surrounding context, discovering them algorithmically is challenging. In this paper we present an algorithm that exploits the idea that a table presents information only on connected entities of a domain ontology. The algorithm works in two phases. In the first phase, it uses local optimization criteria, such as lexical matching and instance matching, to find an initial set of mappings. In the second phase, it takes these mappings and constructs all possible connected subgraphs of the ontology that can be formed from them. The largest of these subgraphs with the highest local mapping score is then selected as the underlying domain mapping of the table. We present experimental results demonstrating the effectiveness of the algorithm.
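To make the two-phase idea concrete, here is a minimal sketch of one possible reading of it, not the authors' implementation. Everything below is an assumption for illustration: the ontology is modeled as an undirected adjacency dict over concept/attribute nodes, lexical matching uses Levenshtein edit distance, the two local scores are weighted equally with a hypothetical threshold of 0.4, "connected" means the mapped nodes induce a connected subgraph, each concept is used by at most one column, and "largest, then highest score" is compared lexicographically. All names (`ONTOLOGY`, `INSTANCES`, `local_candidates`, `best_connected_mapping`, ...) and the toy data are hypothetical.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology: undirected adjacency over concept/attribute nodes.
ONTOLOGY = {
    "Country": {"Capital"},
    "Capital": {"Country"},
    "Film": {"Director"},
    "Director": {"Film"},
}
# Hypothetical known instances per concept (lowercased for matching).
INSTANCES = {
    "Country": {"france", "italy"},
    "Capital": {"paris", "rome"},
    "Film": {"paris", "rome"},
}


def levenshtein(a, b):
    """Edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lexical_score(header, concept):
    """1.0 for identical strings (case-insensitive), 0.0 for a missing header."""
    h, c = header.lower(), concept.lower()
    return 1.0 - levenshtein(h, c) / max(len(h), len(c), 1)


def instance_score(values, instances):
    """Fraction of column values that are known instances of the concept."""
    if not values:
        return 0.0
    return sum(v.lower() in instances for v in values) / len(values)


def local_candidates(columns, ontology, instances, threshold=0.4):
    """Phase 1 (assumed form): equal-weight lexical + instance score per column."""
    candidates = []
    for header, values in columns:
        scored = []
        for concept in ontology:
            s = 0.5 * lexical_score(header, concept) + 0.5 * instance_score(
                values, instances.get(concept, set()))
            if s >= threshold:
                scored.append((concept, s))
        candidates.append(scored)
    return candidates


def is_connected(nodes, ontology):
    """True if the nodes induce a connected subgraph (BFS restricted to nodes)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for m in ontology[queue.popleft()] & nodes:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen == nodes


def best_connected_mapping(candidates, ontology):
    """Phase 2 (assumed form): try every per-column choice (or no mapping),
    keep connected ones, prefer more mapped columns, then higher total score."""
    best, best_key = None, (0, 0.0)
    for choice in product(*[[None] + c for c in candidates]):
        mapped = [m for m in choice if m is not None]
        concepts = [c for c, _ in mapped]
        if len(set(concepts)) != len(concepts) or not is_connected(concepts, ontology):
            continue
        key = (len(mapped), sum(s for _, s in mapped))
        if key > best_key:
            best, best_key = [m[0] if m else None for m in choice], key
    return best, best_key


if __name__ == "__main__":
    table = [("Country", ["France", "Italy"]), ("", ["Paris", "Rome"])]
    cands = local_candidates(table, ONTOLOGY, INSTANCES)
    print(cands)
    print(best_connected_mapping(cands, ONTOLOGY))
```

With the toy data, phase 1 cannot separate `Capital` from `Film` for the header-less second column (both score 0.5), while phase 2 keeps only the connected pair and returns `['Country', 'Capital']`. The exhaustive enumeration grows exponentially with the number of columns and candidates, so it is only meant to make the selection rule concrete.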

Author information

Authors and Affiliations

Tata Consultancy Services Ltd, 54 B Hadapsar Industrial Estate, Pune, India
Ashwin Bahulkar & Sreedhar Reddy

Editor information

Editors and Affiliations

STAR Lab, Vrije Universiteit Brussel, Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
CRAN, University of Lorraine, Campus Sciences, BP 70239, 54506, Vandoevre-les-Nancy, France
Hervé Panetto
Computer Science and Computer Engineering, La Trobe University, 3086, Melbourne, Victoria, Australia
Tharam Dillon
Dept. of Informatics Systems, University of Klagenfurt, Universitaetsstrasse 65, 9020, Klagenfurt, Austria
Johann Eder
LIRMM, University of Montpellier II, 161 Rue Ada, 34392, Montpellier Cedex 5, France
Zohra Bellahsene
Department of Informatics, University of Hamburg, 22527, Hamburg, Germany
Norbert Ritter
Department of Computer Sciences, VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Pieter De Leenheer
Computer and Information Science Department, University of Oregon, 97403, Eugene, OR, USA
Dejing Dou

About this paper

Cite this paper

Bahulkar, A., Reddy, S. (2013). Ontology Driven Information Extraction from Tables Using Connectivity Analysis. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2013 Conferences. OTM 2013. Lecture Notes in Computer Science, vol 8185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41030-7_47

DOI: https://doi.org/10.1007/978-3-642-41030-7_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41029-1
Online ISBN: 978-3-642-41030-7
eBook Packages: Computer Science, Computer Science (R0)