DOI: 10.1145/1141277.1141467
Article

Computing edit distances between an XML document and a schema and its application in document classification

Published: 23 April 2006

ABSTRACT

In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations transforming an XML document so that it conforms to a schema. We show that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree). We also show that the edit distance with restricted top-down edit operations can be computed in the same way, and how the edit distances can be used in document classification. Experimental studies show that our methods are effective in structure-oriented classification for both real and synthesized data sets.
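
To make the idea of a distance between a document and a schema concrete, the following is a minimal sketch of a much simpler, one-level analogue. It is not the algorithm of the paper and does not reproduce its complexity bound. It assumes that an element's children are a flat list of labels, that the content model is supplied as a small NFA without epsilon moves, and that insert, delete, and relabel each cost 1. The names `distance_to_content_model`, `classify`, `BOOK`, `ARTICLE`, and `MODELS` are hypothetical, and the nearest-schema rule in `classify` is only one plausible way to use such distances for classification.

```python
import heapq

def distance_to_content_model(children, nfa, start, accepting):
    """Minimum number of unit-cost insert/delete/relabel operations that turn
    the label sequence `children` into one accepted by the NFA.

    `nfa` maps an integer state to a list of (label, next_state) transitions.
    The search is Dijkstra's algorithm over (position, state) pairs.
    """
    n = len(children)
    best = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        cost, i, q = heapq.heappop(heap)
        if cost > best[(i, q)]:
            continue  # stale queue entry
        if i == n and q in accepting:
            return cost  # first accepting goal popped is optimal
        moves = []
        if i < n:
            moves.append((cost + 1, i + 1, q))  # delete children[i]
        for label, nxt in nfa.get(q, ()):
            moves.append((cost + 1, i, nxt))  # insert `label`
            if i < n:
                step = 0 if label == children[i] else 1  # match or relabel
                moves.append((cost + step, i + 1, nxt))
        for c, j, s in moves:
            if c < best.get((j, s), float("inf")):
                best[(j, s)] = c
                heapq.heappush(heap, (c, j, s))
    return float("inf")  # no accepted sequence is reachable

def classify(children, models):
    """Assumed rule: pick the schema whose content model is nearest."""
    return min(
        models,
        key=lambda name: distance_to_content_model(children, *models[name]),
    )

# Hypothetical content models: book = (title, author+, year?),
# article = (title, abstract, section*). Each entry is (nfa, start, accepting).
BOOK = ({0: [("title", 1)], 1: [("author", 2)],
         2: [("author", 2), ("year", 3)]}, 0, {2, 3})
ARTICLE = ({0: [("title", 1)], 1: [("abstract", 2)],
            2: [("section", 2)]}, 0, {2})
MODELS = {"book": BOOK, "article": ARTICLE}
```

Under these assumptions, `classify(["title", "abstract", "section"], MODELS)` selects `"article"`, because the sequence already conforms to that content model while the book model needs two relabels. A tree-level method, as described in the abstract, would additionally have to account for the cost of editing each child's own subtree.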

          Published in

          SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
          April 2006
          1967 pages
          ISBN:1595931082
          DOI:10.1145/1141277

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
