research-article

LL(*): the foundation of the ANTLR parser generator

Published: 4 June 2011

Abstract

Despite the power of Parsing Expression Grammars (PEGs) and GLR, parsing is not a solved problem. Adding nondeterminism (parser speculation) to traditional LL and LR parsers can lead to unexpected parse-time behavior and introduces practical issues with error handling, single-step debugging, and side-effecting embedded grammar actions. This paper introduces the LL(*) parsing strategy and an associated grammar analysis algorithm that constructs LL(*) parsing decisions from ANTLR grammars. At parse-time, decisions gracefully throttle up from conventional fixed k ≥ 1 lookahead to arbitrary lookahead and, finally, fail over to backtracking depending on the complexity of the parsing decision and the input symbols. LL(*) parsing strength reaches into the context-sensitive languages, in some cases beyond what GLR and PEGs can express. By statically removing as much speculation as possible, LL(*) provides the expressivity of PEGs while retaining LL's good error handling and unrestricted grammar actions. Widespread use of ANTLR (over 70,000 downloads/year) shows that it is effective for a wide variety of applications.
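
As a rough illustration of the "arbitrary lookahead" stage mentioned in the abstract, the Python sketch below chooses between two alternatives whose common prefix is unbounded, so no fixed k would suffice. The rule, the token sets, and the function name `predict_alt` are hypothetical and are not taken from the paper or from ANTLR. ANTLR derives its decisions from the grammar by static analysis, whereas this hand-written loop only imitates the effect for one toy decision.

```python
from typing import List

MODIFIERS = {"unsigned", "const"}   # hypothetical modifier keywords
BUILTIN_TYPES = {"int", "char"}     # hypothetical built-in type names


def predict_alt(tokens: List[str], start: int = 0) -> int:
    """Choose an alternative for the hypothetical rule

        decl : modifier* builtin_type ID    (alternative 1)
             | modifier* ID ID              (alternative 2)

    by scanning ahead as far as needed without consuming input.
    """
    i = start
    # Skip an unbounded run of modifiers; no fixed k covers every input.
    while i < len(tokens) and tokens[i] in MODIFIERS:
        i += 1
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    # The first non-modifier token distinguishes the two alternatives.
    if tokens[i] in BUILTIN_TYPES:
        return 1
    if tokens[i].isidentifier():
        return 2
    raise SyntaxError(f"unexpected token {tokens[i]!r}")
```

Calling `predict_alt(["unsigned", "unsigned", "int", "x"])` selects alternative 1, while `predict_alt(["const", "T", "x"])` selects alternative 2. A decision that cannot be resolved by such a regular scan is the kind of case where, per the abstract, the parser would fail over to backtracking.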

Published in

ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11)
June 2011
652 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1993316

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN: 9781450306638
DOI: 10.1145/1993498
General Chair: Mary Hall
Program Chair: David Padua

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
