Capturing Semantics in XML Documents

Ling, Tok Wang

doi:10.1007/11730262_2

Tok Wang Ling

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3915)

Abstract

Traditional semantic data models, such as the Entity Relationship (ER) data model, are used to represent real-world semantics that are crucial for the effective management of structured data. The semantics that can be expressed in the ER data model include entity types together with their identifiers and attributes, n-ary relationship types together with their participating entity types and attributes, and functional dependencies among the participating entity types of relationship types and their attributes.
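As a minimal sketch of the kinds of semantics listed above, the following Python fragment records entity types with identifiers and attributes, an n-ary relationship type with its own attribute, and a functional dependency. The class names, the Supplier/Part/Project example, and all field names are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityType:
    name: str
    identifier: tuple[str, ...]        # attributes that identify an entity
    attributes: tuple[str, ...] = ()   # remaining descriptive attributes

@dataclass(frozen=True)
class RelationshipType:
    name: str
    participants: tuple[EntityType, ...]
    attributes: tuple[str, ...] = ()
    # Each functional dependency is (determinant names, dependent names).
    fds: tuple[tuple[tuple[str, ...], tuple[str, ...]], ...] = ()

    @property
    def degree(self) -> int:
        """Number of participating entity types (2 = binary, 3 = ternary, ...)."""
        return len(self.participants)

# Hypothetical schema: a ternary relationship with its own attribute.
supplier = EntityType("Supplier", identifier=("sno",), attributes=("sname",))
part = EntityType("Part", identifier=("pno",), attributes=("pname",))
project = EntityType("Project", identifier=("jno",), attributes=("jname",))

supply = RelationshipType(
    name="Supply",
    participants=(supplier, part, project),
    attributes=("quantity",),
    fds=((("Supplier", "Part", "Project"), ("quantity",)),),
)
```

Here `quantity` is declared as an attribute of the ternary relationship itself, not of any single entity type, which is the distinction the ER model can state explicitly.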

Today, semistructured data has become more prevalent on the Web, and XML has become the de facto standard for semistructured data. The DTD or XML Schema of an XML document reflects only the hierarchical structure of the semistructured data stored in that document. This hierarchical structure is captured by the relationships between an element and its attributes, and between an element and its subelements. Element-attribute relationships do not have clear semantics, and the relationships between elements and their subelements are binary. The semantics of n-ary relationships with n > 2 cannot be represented or captured correctly and precisely in DTD and XML Schema. Many of the crucial semantics captured by the ER model for structured data are not captured by either DTD or XML Schema. We present the problems that arise when trying to store, query, and transform (view) XML documents correctly and efficiently without knowing these important semantics. We solve these problems by using a semantically rich data model called the Object, Relationship, Attribute data model for SemiStructured Data (ORA-SS). We briefly describe how to mine such important semantics from given XML documents.
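To make the n-ary problem concrete, here is a small Python sketch. The document, its element and attribute names, and the helper functions `flatten` and `determines` are all hypothetical, and the check is a simple data-driven heuristic, not the ORA-SS mining method described in the paper. In the nested document, `price` and `quantity` both appear as subelements of `part`, so the hierarchy alone does not say whether each belongs to the part, to the supplier-part pair, or to the project-supplier-part triple.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: names and values are invented for illustration.
DOC = """
<db>
  <project jname="j1">
    <supplier sname="s1">
      <part pname="p1"><price>10</price><quantity>100</quantity></part>
      <part pname="p2"><price>20</price><quantity>50</quantity></part>
    </supplier>
  </project>
  <project jname="j2">
    <supplier sname="s1">
      <part pname="p1"><price>10</price><quantity>70</quantity></part>
    </supplier>
  </project>
</db>
"""

def flatten(xml_text):
    """Turn each project/supplier/part path into one flat record."""
    root = ET.fromstring(xml_text)
    rows = []
    for project in root.findall("project"):
        for supplier in project.findall("supplier"):
            for part in supplier.findall("part"):
                rows.append({
                    "project": project.get("jname"),
                    "supplier": supplier.get("sname"),
                    "part": part.get("pname"),
                    "price": part.findtext("price"),
                    "quantity": part.findtext("quantity"),
                })
    return rows

def determines(rows, key_fields, target):
    """True if, in this sample, equal key values always give equal target values."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True

rows = flatten(DOC)
# price never varies for a given (supplier, part), whatever the project.
price_is_binary = determines(rows, ("supplier", "part"), "price")
# quantity varies across projects for the same (supplier, part).
quantity_is_binary = determines(rows, ("supplier", "part"), "quantity")
quantity_is_ternary = determines(rows, ("project", "supplier", "part"), "quantity")
```

In this sample the data is consistent with `price` belonging to the binary supplier-part relationship and `quantity` belonging to the ternary project-supplier-part relationship. Agreement in a sample is only a hint and does not prove that a dependency holds in general, which is one reason explicit semantics in the schema matter.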

Author information

Authors and Affiliations

Department of Computer Science, School of Computing, National University of Singapore, Singapore
Tok Wang Ling

Authors

Tok Wang Ling

Editor information

Editors and Affiliations

Information Technology, Queensland University of Technology, Brisbane, Australia
Richi Nayak
Computer Science Department, Rensselaer Polytechnic Institute, USA
Mohammed J. Zaki

About this paper

Cite this paper

Ling, T.W. (2006). Capturing Semantics in XML Documents. In: Nayak, R., Zaki, M.J. (eds) Knowledge Discovery from XML Documents. KDXD 2006. Lecture Notes in Computer Science, vol 3915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11730262_2

DOI: https://doi.org/10.1007/11730262_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33180-3
Online ISBN: 978-3-540-33181-0
eBook Packages: Computer Science, Computer Science (R0)
