research-article

LL(*): the foundation of the ANTLR parser generator

Authors:
Terence Parr, University of San Francisco, San Francisco, CA, USA
Kathleen Fisher, Tufts University, Boston, MA, USA

ACM SIGPLAN Notices, Volume 46, Issue 6, June 2011, pp 425–436
https://doi.org/10.1145/1993316.1993548

Published: 04 June 2011


Abstract

Despite the power of Parsing Expression Grammars (PEGs) and GLR, parsing is not a solved problem. Adding nondeterminism (parser speculation) to traditional LL and LR parsers can lead to unexpected parse-time behavior and introduces practical issues with error handling, single-step debugging, and side-effecting embedded grammar actions. This paper introduces the LL(*) parsing strategy and an associated grammar analysis algorithm that constructs LL(*) parsing decisions from ANTLR grammars. At parse-time, decisions gracefully throttle up from conventional fixed k ≥ 1 lookahead to arbitrary lookahead and, finally, fail over to backtracking depending on the complexity of the parsing decision and the input symbols. LL(*) parsing strength reaches into the context-sensitive languages, in some cases beyond what GLR and PEGs can express. By statically removing as much speculation as possible, LL(*) provides the expressivity of PEGs while retaining LL's good error handling and unrestricted grammar actions. Widespread use of ANTLR (over 70,000 downloads/year) shows that it is effective for a wide variety of applications.
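
To make the idea of a decision that "throttles up" its lookahead concrete, the following is a minimal hand-written sketch, not ANTLR output and not the paper's analysis algorithm. It assumes a hypothetical rule `s : ID | ID '=' expr | 'unsigned'* 'int' ID | 'unsigned'* ID ID ;` and a token stream given as a list of token-type names. The function name `predict_alt` and the `EOF` marker are illustrative. The backtracking fail-over mentioned in the abstract is deliberately left out.

```python
EOF = "<EOF>"  # illustrative end-of-input marker (an assumption, not an ANTLR constant)


def predict_alt(tokens, start=0):
    """Predict which alternative of the hypothetical rule

        s : ID | ID '=' expr | 'unsigned'* 'int' ID | 'unsigned'* ID ID ;

    applies at tokens[start], without consuming any input.
    Returns (alternative, tokens_examined); alternative is None on a syntax error.
    """
    def la(k):
        # Peek k tokens ahead of the decision point.
        idx = start + k
        return tokens[idx] if idx < len(tokens) else EOF

    first = la(0)
    if first == "int":
        # One token of lookahead is enough here.
        return 3, 1
    if first == "ID":
        # Two tokens of lookahead separate alternatives 1, 2, and 4.
        alt = {EOF: 1, "=": 2, "ID": 4}.get(la(1))
        return alt, 2

    # Cyclic lookahead state: skip any number of 'unsigned' tokens,
    # then decide on the first token that follows them.
    k = 0
    while la(k) == "unsigned":
        k += 1
    alt = {"int": 3, "ID": 4}.get(la(k))
    return alt, k + 1
```

For example, `predict_alt(["int", "ID"])` decides after one token, while `predict_alt(["unsigned", "unsigned", "int", "ID"])` has to scan three tokens before choosing alternative 3. No fixed k would cover every input for this rule, because the run of `unsigned` tokens can be arbitrarily long.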


Index Terms

LL(*): the foundation of the ANTLR parser generator
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Parsers
    2. Formal language definitions
      1. Syntax
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Parsing


Published in
ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
General Chair: Mary Hall, University of Utah
Program Chair: David Padua, University of Illinois at Urbana-Champaign
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
augmented transition networks
backtracking
context-sensitive parsing
deterministic finite automata
glr
memoization
nondeterministic parsing
peg
semantic predicates
subset construction
syntactic predicates