CX-DIFF: A Change Detection Algorithm for XML Content and Change Presentation Issues for WebVigiL

  • Conference paper
Conceptual Modeling for Novel Application Domains (ER 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2814))

Abstract

The exponential increase of information on the web has affected the manner in which information is accessed, disseminated, and delivered. The emphasis has shifted from mere viewing of information to efficient retrieval and monitoring of selective changes to information content. Hence, an effective monitoring system for change detection and notification based on user profiles is needed. WebVigiL is a general-purpose, active capability-based information monitoring and notification system, which handles specification, management, and propagation of customized changes as requested by a user. Change detection in WebVigiL focuses on detecting customized changes to the content of a document, based on user intent. Because XML is an ordered semi-structured language, it is difficult to detect customized changes to part of the value of a text node, or to a portion of content that spans multiple text nodes of an ordered XML tree. In this paper, we propose an algorithm for customized change detection on the content of XML documents based on user intent. We present an optimization to the algorithm that performs better for XML pages with certain characteristics. We also discuss various change presentation schemes for displaying the computed changes. We highlight change detection in the context of WebVigiL and briefly describe the rest of the system.
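The difficulty the abstract describes, a monitored phrase that spans several text nodes of an ordered XML tree, can be illustrated with a small sketch. This is not the CX-DIFF algorithm, whose details are not given on this page; it swaps in Python's standard `difflib.SequenceMatcher` to diff the ordered text nodes and a plain substring count over the joined text to track a phrase. The function names (`text_nodes`, `node_changes`, `phrase_count`, `phrase_changed`) and the sample documents are hypothetical and chosen only for illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def text_nodes(xml_source):
    """Return the non-empty text nodes of a document, in document order."""
    root = ET.fromstring(xml_source)
    # itertext() yields element text and tail strings in document order.
    return [t.strip() for t in root.itertext() if t.strip()]


def node_changes(old_nodes, new_nodes):
    """Diff two ordered lists of text nodes; return (deleted, inserted)."""
    # Stand-in for the paper's method: a generic sequence diff from the stdlib.
    matcher = difflib.SequenceMatcher(a=old_nodes, b=new_nodes, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_nodes[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(new_nodes[j1:j2])
    return deleted, inserted


def phrase_count(nodes, phrase):
    """Count occurrences of a phrase, including ones spanning adjacent nodes."""
    # Joining the nodes lets a phrase match across node boundaries.
    joined = " ".join(" ".join(nodes).split())
    return joined.count(" ".join(phrase.split()))


def phrase_changed(old_xml, new_xml, phrase):
    """Report whether the number of occurrences of the phrase differs."""
    old_nodes, new_nodes = text_nodes(old_xml), text_nodes(new_xml)
    return phrase_count(old_nodes, phrase) != phrase_count(new_nodes, phrase)


# Hypothetical sample pages: the phrase "price rises sharply" spans three text nodes.
OLD = "<news><item>Stock price <b>rises</b> sharply</item><item>Weather: sunny</item></news>"
NEW = "<news><item>Stock price <b>falls</b> sharply</item><item>Weather: sunny</item></news>"

deleted, inserted = node_changes(text_nodes(OLD), text_nodes(NEW))
```

In this sketch a user interested only in "Weather" would see no change, while a user monitoring "price rises sharply" would, even though only the middle text node was edited. A node-by-node comparison alone would miss that the affected phrase crosses node boundaries, which is the case the abstract singles out as difficult.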

This work was supported, in part, by the Office of Naval Research, the SPAWAR System Center-San Diego, and the Rome Laboratory (grant F30602-01-2-0543), and by NSF (grants IIS-0123730 and IIS-0097517).

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jacob, J., Sachde, A., Chakravarthy, S. (2003). CX-DIFF: A Change Detection Algorithm for XML Content and Change Presentation Issues for WebVigiL. In: Jeusfeld, M.A., Pastor, Ó. (eds) Conceptual Modeling for Novel Application Domains. ER 2003. Lecture Notes in Computer Science, vol 2814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39597-3_28

  • DOI: https://doi.org/10.1007/978-3-540-39597-3_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20257-8

  • Online ISBN: 978-3-540-39597-3

  • eBook Packages: Springer Book Archive
