Abstract
Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive: the fastest known algorithm has a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult.
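To make the problem statement concrete, the following is a minimal sketch of a textbook worklist solver for CFL-reachability. It is not the subcubic algorithm referred to above and is not taken from the paper. It assumes the grammar is given as unary productions (A -> B) and binary productions (A -> B C); the names `cfl_reach`, `unary`, `binary`, and the small balanced-parentheses grammar are hypothetical choices made for illustration.

```python
from collections import deque


def cfl_reach(edges, unary, binary):
    """Naive worklist CFL-reachability (illustrative sketch only).

    edges:  iterable of (src, label, dst) triples
    unary:  dict mapping B to the set of A with a production A -> B
    binary: dict mapping (B, C) to the set of A with a production A -> B C
    Returns the set of all original and derived (src, symbol, dst) edges.
    """
    facts = set()
    work = deque()
    out_edges = {}  # node -> set of (symbol, successor)
    in_edges = {}   # node -> set of (symbol, predecessor)

    def add(u, x, v):
        # Record a new edge once and schedule it for processing.
        if (u, x, v) not in facts:
            facts.add((u, x, v))
            out_edges.setdefault(u, set()).add((x, v))
            in_edges.setdefault(v, set()).add((x, u))
            work.append((u, x, v))

    for u, x, v in edges:
        add(u, x, v)

    while work:
        u, b, v = work.popleft()
        for a in unary.get(b, ()):
            add(u, a, v)
        # The popped edge is the left operand of A -> b c.
        for c, w in list(out_edges.get(v, ())):
            for a in binary.get((b, c), ()):
                add(u, a, w)
        # The popped edge is the right operand of A -> c b.
        for c, w in list(in_edges.get(u, ())):
            for a in binary.get((c, b), ()):
                add(w, a, v)
    return facts


# Hypothetical grammar for non-empty balanced parentheses:
#   S -> open close | open T | S S,   T -> S close
binary = {
    ("open", "close"): {"S"},
    ("open", "T"): {"S"},
    ("S", "close"): {"T"},
    ("S", "S"): {"S"},
}
# Path 0 -open-> 1 -open-> 2 -close-> 3 -close-> 4
edges = [(0, "open", 1), (1, "open", 2), (2, "close", 3), (3, "close", 4)]
s_pairs = {(u, v) for (u, x, v) in cfl_reach(edges, {}, binary) if x == "S"}
```

Here `s_pairs` collects the node pairs connected by a path whose labels spell a balanced string. Because each derived edge is re-examined against its neighbours, the cost grows quickly with the number of nodes and edges, which is why shrinking the input graph pays off.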
In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective: reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded, that is, merged into a single node with all the edges joining them removed, without affecting the CFL-reachability result. By studying the characteristics of recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without verifying the underlying reachable paths (which would be equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, we exploit the correspondence between paths in G and state transitions in the RSM to derive a graph folding principle, which determines whether two adjacent nodes are foldable by examining only their incoming and outgoing edges.
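The folding operation itself (merging two nodes and removing the edges that join them) can be sketched as follows. This shows only the mechanics of a fold. It does not implement the graph folding principle, so it performs no check that the pair is actually foldable. The function name `fold` and the example graph are hypothetical.

```python
def fold(edges, u, v):
    """Merge node v into node u and drop every edge joining u and v.

    edges: iterable of (src, label, dst) triples.
    Assumption: the caller has already established that (u, v) is a
    foldable pair; this sketch does not verify that.
    """
    folded = set()
    for src, label, dst in edges:
        if {src, dst} == {u, v}:
            continue  # an edge joining the folded pair is removed
        # Redirect every remaining endpoint at v to u.
        src = u if src == v else src
        dst = u if dst == v else dst
        folded.add((src, label, dst))
    return folded


# Hypothetical graph: 0 -open-> 1 -a-> 2 -close-> 3
graph = {(0, "open", 1), (1, "a", 2), (2, "close", 3)}
smaller = fold(graph, 1, 2)
```

After the call, `smaller` has one node and one edge fewer than `graph`, and the edge that previously left node 2 now leaves node 1.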
On top of the graph folding principle, we propose an efficient graph folding algorithm, GF, whose time complexity is linear in the number of nodes of the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF removes 60.96% of the nodes and 42.67% of the edges of the input graphs, yielding a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF removes 38.93% of the nodes and 35.61% of the edges of the input graphs, yielding a speedup of 3.21× and a memory usage reduction of 65.19%.
Supplemental Material
Supplementary material of the paper "Recursive State Machine Guided Graph Folding for Context-Free Language Reachability", including the proofs of Property 4.1 and Property 4.2 of the paper.