Recursive State Machine Guided Graph Folding for Context-Free Language Reachability

Published: 06 June 2023

Abstract

Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string in the given CFL. Computing CFL-reachability is expensive: the fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult.
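To make the problem statement concrete, the following is a minimal sketch of the classic worklist formulation of CFL-reachability. It is the textbook cubic procedure, not the subcubic algorithm mentioned above and not the method of this paper. It assumes the grammar has already been normalized so that every production has the form A -> ε, A -> B, or A -> B C. The function name `cfl_reach`, its parameters, and the tiny Dyck-style example graph are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def cfl_reach(nodes, edges, unary=(), binary=(), nullable=()):
    """Saturate an edge-labeled graph under a normalized grammar.

    edges:    iterable of (src, label, dst) triples
    unary:    iterable of (A, B) for productions A -> B
    binary:   iterable of (A, B, C) for productions A -> B C
    nullable: iterable of A for productions A -> epsilon
    Returns the set of all original and derived (src, label, dst) triples.
    """
    facts = set()
    work = deque()
    out_by = defaultdict(set)  # (node, label) -> successors via that label
    in_by = defaultdict(set)   # (node, label) -> predecessors via that label

    def add(u, lab, v):
        if (u, lab, v) not in facts:
            facts.add((u, lab, v))
            out_by[(u, lab)].add(v)
            in_by[(v, lab)].add(u)
            work.append((u, lab, v))

    for u, lab, v in edges:
        add(u, lab, v)
    for a in nullable:  # A -> epsilon yields a self-loop on every node
        for n in nodes:
            add(n, a, n)

    while work:
        u, lab, v = work.popleft()
        for a, b in unary:
            if b == lab:
                add(u, a, v)
        for a, b, c in binary:
            if b == lab:  # popped edge is the left part: u -B-> v -C-> w
                for w in list(out_by[(v, c)]):
                    add(u, a, w)
            if c == lab:  # popped edge is the right part: t -B-> u -C-> v
                for t in list(in_by[(u, b)]):
                    add(t, a, v)
    return facts


# Hypothetical example: balanced parentheses, S -> S S | op S cl | epsilon,
# normalized with a helper nonterminal T -> op S and S -> T cl.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "op", "b"), ("b", "op", "c"), ("c", "cl", "d"), ("d", "cl", "e")]
facts = cfl_reach(
    nodes,
    edges,
    binary=[("S", "S", "S"), ("T", "op", "S"), ("S", "T", "cl")],
    nullable=["S"],
)
print(("a", "S", "e") in facts)  # path labels "op op cl cl" are balanced
print(("a", "S", "d") in facts)  # path labels "op op cl" are not
```

In this sketch, a source-sink pair (s, t) is reachable exactly when the triple (s, "S", t) appears in the returned set, where "S" is the start symbol. Every node and edge removed from the input before saturation shrinks the sets this loop has to iterate over, which is the motivation for reducing the graph first.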

In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective: reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded (merged into a single node, with all the edges joining them removed) without affecting the CFL-reachability result. By studying the characteristics of recursive state machines (RSMs), an alternative form of CFLs, we propose an approach that identifies foldable node pairs without verifying the underlying reachable paths, a step that would be equivalent to solving the CFL-reachability problem itself. In particular, given a CFL-reachability problem instance with an input graph G and an RSM, we use the correspondence between paths in G and state transitions in the RSM to propose a graph folding principle, which determines whether two adjacent nodes are foldable by examining only their incoming and outgoing edges.
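The folding operation itself (as opposed to the test for foldability) can be illustrated with a short sketch. The code below only shows what it means to merge two nodes and drop the edges joining them. It does not implement the graph folding principle, whose conditions are not spelled out in this abstract, so the caller is assumed to already know that the pair is foldable. The function name `fold`, the edge representation, and the labels in the example are hypothetical.

```python
def fold(edges, u, v):
    """Merge node v into node u and drop every edge that joined u and v.

    edges: iterable of (src, label, dst) triples
    Assumes the caller has already established that (u, v) is foldable.
    Returns a new set of triples over the folded graph.
    """
    folded = set()
    for src, label, dst in edges:
        if {src, dst} == {u, v}:
            continue  # an edge joining the folded pair is removed
        new_src = u if src == v else src  # redirect v's outgoing edges to u
        new_dst = u if dst == v else dst  # redirect v's incoming edges to u
        folded.add((new_src, label, new_dst))
    return folded


# Hypothetical example: x -a-> u -e-> v -b-> y, where the e-edge is assumed trivial.
graph = {("x", "a", "u"), ("u", "e", "v"), ("v", "b", "y")}
folded_graph = fold(graph, "u", "v")
print(sorted(folded_graph))
```

After folding, the example graph has three nodes and two edges instead of four nodes and three edges, and the path from x to y now spells "a b" rather than "a e b". Whether such a change preserves the reachability result depends on the grammar, which is exactly what the folding principle is meant to decide.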

Building on the graph folding principle, we propose an efficient graph folding algorithm, GF, whose time complexity is linear with respect to the number of nodes in the input graph. Our evaluation on two clients (alias analysis and value-flow analysis) shows that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF removes 60.96% of the nodes and 42.67% of the edges of the input graphs, yielding a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF removes 38.93% of the nodes and 35.61% of the edges of the input graphs, yielding a speedup of 3.21× and a memory usage reduction of 65.19%.
