Abstract
Despite the power of Parsing Expression Grammars (PEGs) and GLR, parsing is not a solved problem. Adding nondeterminism (parser speculation) to traditional LL and LR parsers can lead to unexpected parse-time behavior and introduces practical issues with error handling, single-step debugging, and side-effecting embedded grammar actions. This paper introduces the LL(*) parsing strategy and an associated grammar analysis algorithm that constructs LL(*) parsing decisions from ANTLR grammars. At parse-time, decisions gracefully throttle up from conventional fixed k ≥ 1 lookahead to arbitrary lookahead and, finally, fail over to backtracking depending on the complexity of the parsing decision and the input symbols. LL(*) parsing strength reaches into the context-sensitive languages, in some cases beyond what GLR and PEGs can express. By statically removing as much speculation as possible, LL(*) provides the expressivity of PEGs while retaining LL's good error handling and unrestricted grammar actions. Widespread use of ANTLR (over 70,000 downloads/year) shows that it is effective for a wide variety of applications.
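As an illustration of what "arbitrary lookahead" means for a single parsing decision, here is a minimal sketch in Python. It is not ANTLR's generated code or its analysis algorithm. The two-alternative rule, the token names, and the function `predict_alt` are assumptions made up for this example. The decision cannot be made with any fixed k, because an unbounded run of `unsigned` tokens may precede the token that distinguishes the alternatives, so the sketch scans ahead with a small cyclic automaton.

```python
# Hypothetical rule with two alternatives:
#   alt 1:  'unsigned'* 'int' ID
#   alt 2:  'unsigned'* ID ID
# Tokens are represented by their type names as plain strings.

def predict_alt(tokens, start=0):
    """Pick an alternative by scanning ahead; return 1, 2, or None."""
    i = start
    # Cyclic lookahead state: skip any number of 'unsigned' tokens.
    while i < len(tokens) and tokens[i] == "unsigned":
        i += 1
    if i >= len(tokens):
        return None          # input ended before a distinguishing token
    if tokens[i] == "int":
        return 1             # only alt 1 can continue with 'int'
    if tokens[i] == "ID":
        return 2             # only alt 2 can continue with ID here
    return None              # no alternative matches: a syntax error


print(predict_alt(["unsigned", "unsigned", "int", "ID"]))
print(predict_alt(["unsigned", "ID", "ID"]))
```

The scan only inspects tokens and consumes nothing, so a parser using it could commit to one alternative and run embedded actions without speculation. A decision that no such automaton can resolve would be the case where the abstract describes failing over to backtracking.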
Index Terms
- LL(*): the foundation of the ANTLR parser generator