Abstract
It is difficult to develop style-preserving source-to-source transformation engines for C and C++. The main reason is not the complexity of those languages, but the use of the C pre-processor (cpp), especially ifdefs and macros. This has, for example, hindered the development of refactoring tools for C and C++.
In this paper, we propose combining multiple techniques and heuristics to parse C/C++ source files as-is, while requiring only a few modifications to the original grammars of C and C++. We rely on the fact that in most C and C++ software, programmers follow a limited number of conventions on the use of cpp, which makes it possible to disambiguate different situations just by looking at the context, names, or indentation of cpp constructs.
We have implemented a parser, Yacfe, based on these techniques and evaluated it on 16 large open source projects. On average, Yacfe can parse 96% of those projects correctly. As a side effect, we also found mistakes in code that was not compiled because it was protected by particular ifdefs, but that was still analyzed by Yacfe. Using Yacfe on new projects may require adapting some of our techniques. We found that, because conventions and idioms are shared by many projects, the adaptation time is on average less than 2 hours for a new project.
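To make the idea of disambiguating cpp constructs by name and context more concrete, here is a minimal sketch in Python. It is not Yacfe's implementation: the function `guess_cpp_role`, the two regular expressions, and the specific conventions (all-uppercase names, names containing `foreach` or `for_each`) are assumptions chosen for illustration only.

```python
import re

# Hypothetical conventions, chosen for illustration; not taken from Yacfe.
UPPERCASE_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")
# A line made of one call-like construct, optionally followed by "{",
# and NOT terminated by a semicolon.
CALL_LIKE_LINE = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(.*\)\s*(\{)?\s*$")


def guess_cpp_role(line: str) -> str:
    """Guess the role of a source line from its text alone, without running cpp."""
    if line.strip().startswith("#"):
        return "directive"
    match = CALL_LIKE_LINE.match(line)
    if match is None:
        return "ordinary"
    name, opens_block = match.group(1), match.group(2)
    lowered = name.lower()
    # Context + name: a call-like form that opens a block and whose name
    # suggests iteration is probably a loop macro.
    if opens_block and ("foreach" in lowered or "for_each" in lowered):
        return "loop macro"
    # Name + context: an all-uppercase call with no trailing semicolon is
    # probably a macro that expands to a full statement.
    if UPPERCASE_NAME.match(name) and not opens_block:
        return "statement macro"
    return "ordinary"
```

For example, under these assumed rules `guess_cpp_role("list_for_each(pos, head) {")` yields `"loop macro"`, while an ordinary `for (...) {` loop or a call ending in a semicolon is left as `"ordinary"`. A real parser would apply such guesses to a token stream rather than to raw lines, and would need per-project tuning of the conventions.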
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Padioleau, Y. (2009). Parsing C/C++ Code without Pre-processing. In: de Moor, O., Schwartzbach, M.I. (eds) Compiler Construction. CC 2009. Lecture Notes in Computer Science, vol 5501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00722-4_9
DOI: https://doi.org/10.1007/978-3-642-00722-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00721-7
Online ISBN: 978-3-642-00722-4