
XDuce: A statically typed XML processing language

Published: 01 May 2003
Abstract

XDuce is a statically typed programming language for XML processing. Its basic data values are XML documents, and its types (so-called regular expression types) directly correspond to document schemas. XDuce also provides a flexible form of regular expression pattern matching, integrating conditional branching, tag checking, and subtree extraction, as well as dynamic typechecking. We survey the principles of XDuce's design, develop examples illustrating its key features, describe its foundations in the theory of regular tree automata, and present a complete formal definition of its core, along with a proof of type safety.
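
The two ideas named in the abstract, types that read like document schemas and pattern matching that combines tag checking with subtree extraction, can be approximated in a few lines of Python. This is only a rough sketch and not XDuce syntax: the `Elem` class, the address-book `SCHEMA`, and the functions `conforms` and `tel_entries` are all hypothetical names chosen for illustration. It also checks conformance at run time, whereas XDuce establishes it statically.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Elem:
    """An XML element: a tag plus a list of child elements or text strings."""
    tag: str
    children: list = field(default_factory=list)


# Hypothetical schema (an assumption, not taken from the paper): each tag maps
# to a regular expression over the space-terminated labels of its children.
# "#text" stands for a string leaf.
SCHEMA = {
    "addrbook": r"(person )*",
    "person": r"name (email )*(tel )?",
    "name": r"#text ",
    "email": r"#text ",
    "tel": r"#text ",
}


def label(node):
    """Label used in the content-model regex for a child node."""
    return node.tag if isinstance(node, Elem) else "#text"


def conforms(node):
    """Return True if node and all of its descendants match SCHEMA."""
    if isinstance(node, str):
        return True
    content_model = SCHEMA.get(node.tag)
    if content_model is None:
        return False
    word = "".join(label(child) + " " for child in node.children)
    if re.fullmatch(content_model, word) is None:
        return False
    return all(conforms(child) for child in node.children)


def tel_entries(book):
    """Extract (name, tel) pairs from persons that have a tel child."""
    pairs = []
    for person in book.children:
        by_tag = {child.tag: child for child in person.children}
        if "tel" in by_tag:  # branch on the presence of a tag
            pairs.append((by_tag["name"].children[0],
                          by_tag["tel"].children[0]))
    return pairs
```

For example, a book holding one person with `name`, `email`, and `tel` children and another with only a `name` passes `conforms`, and `tel_entries` returns just the first person's name and number. A `person` whose `tel` precedes its `name` is rejected because the child sequence no longer matches the content model.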

    Reviews

    Kirill Rezchikov

    Hosoya and Pierce present XDuce, an experimental programming language designed specifically for processing Extensible Markup Language (XML) documents. The language uses XML documents as its basic data and provides powerful functionality for manipulating them. Supported operations include composing XML documents according to a schema, extracting parts of a document, and transforming documents from one schema to another. The type system is one of the most interesting aspects of XDuce: it is static, and it uses information from document type definitions (DTDs) to type-check a program. As a result, well-typed programs generate XML documents that always conform to the specified types. Another notable feature of XDuce is its flexible subtyping relation, which imposes no restrictions on dependencies between document schemas. The language is also helpful in schema design and implementation. XDuce offers an efficient matching facility based on regular expression pattern matching. The paper presents the core features of the language, including formal definitions of values, types, terms, and type checking, along with a proof of type safety. The major contribution of this paper is its complete description of XDuce as a programming language. The ideas used in XDuce have already inspired the development of a family of XML processing and schema languages, such as CDuce, Tree Regular Expressions for XML (TREX), and Regular Language for XML Next Generation (RELAX NG). Researchers and developers working on XML processing languages will benefit from reading this paper.

    Online Computing Reviews Service

    • Published in

      ACM Transactions on Internet Technology  Volume 3, Issue 2
      May 2003
      91 pages
      ISSN:1533-5399
      EISSN:1557-6051
      DOI:10.1145/767193
      Copyright © 2003 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
