Correctors for XML Data

  • Conference paper
Database and XML Technologies (XSym 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3186)

Abstract

A corrector takes an invalid XML file F as input and produces a valid file F′ that is not far from F when F is ε-close to its DTD. Closeness is measured by the classical tree edit distance between a tree T and a language L defined by a DTD or a tree automaton. We show how testers and correctors for regular trees can be used to estimate distances between a document and a set of DTDs, a useful operation for ranking XML documents.
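As a rough illustration of the distance between a document and a language, and of ranking by that distance, the sketch below makes strong simplifying assumptions that are not from the paper: a document is flattened to the sequence of its child element names, each DTD is stood in for by a small finite set of valid sequences, and the unit-cost string edit distance replaces the tree edit distance. The function names and the sample schemas are hypothetical, and this brute-force minimum is not the linear-time corrector described in the abstract.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost insert/delete/relabel distance between two label sequences."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete x
                curr[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),    # keep x, or relabel x to y
            ))
        prev = curr
    return prev[-1]


def distance_to_language(doc: Sequence[str],
                         language: List[Sequence[str]]) -> int:
    """Distance from doc to a language: the minimum over its valid members."""
    return min(edit_distance(doc, valid) for valid in language)


def rank(doc: Sequence[str],
         schemas: Dict[str, List[Sequence[str]]]) -> List[str]:
    """Schema names ordered from closest to farthest from doc."""
    return sorted(schemas, key=lambda name: distance_to_language(doc, schemas[name]))


# Hypothetical stand-ins for two DTD content models (finite for simplicity).
schemas = {
    "article": [["title", "author", "year"],
                ["title", "author", "author", "year"]],
    "book": [["title", "author", "publisher", "year"]],
}

# An invalid document: it matches neither content model exactly.
doc = ["title", "author", "author", "publisher"]

for name in rank(doc, schemas):
    print(name, distance_to_language(doc, schemas[name]))
```

Here the document is one relabel away from a valid `article` and two edits away from a valid `book`, so `article` ranks first. A real DTD defines a regular and usually infinite language, so enumerating its members as done here would not scale.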

We describe the implementation of a linear time corrector using the Xerces parser and present test data for various DTDs comparing the parsing and correction time. We propose a generalization to homomorphic DTDs.

Work supported by ACI Sécurité Informatique: VERA of the French Ministry of Research.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boobna, U., de Rougemont, M. (2004). Correctors for XML Data. In: Bellahsène, Z., Milo, T., Rys, M., Suciu, D., Unland, R. (eds) Database and XML Technologies. XSym 2004. Lecture Notes in Computer Science, vol 3186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30081-6_8

  • DOI: https://doi.org/10.1007/978-3-540-30081-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22969-8

  • Online ISBN: 978-3-540-30081-6

  • eBook Packages: Springer Book Archive
