Recursive State Machine Guided Graph Folding for Context-Free Language Reachability

Authors:
Yuxiang Lei, University of New South Wales, Australia (ORCID 0000-0002-4484-8172)
Yulei Sui, University of New South Wales, Australia (ORCID 0000-0002-9510-6574)
Shin Hwei Tan, Concordia University, Canada (ORCID 0000-0001-8633-3372)
Qirun Zhang, Georgia Institute of Technology, USA (ORCID 0000-0001-5367-9377)

Proceedings of the ACM on Programming Languages, Volume 7, Issue PLDI, Article No.: 119, pp 318–342. https://doi.org/10.1145/3591233

Published: 06 June 2023

Abstract

Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive: the fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult.
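To make the definition concrete, here is a minimal sketch of CFL-reachability computed with a textbook worklist algorithm over a grammar in normal form. It is not the subcubic algorithm mentioned above, and it is not taken from the paper. The function name `cfl_reachability`, the balanced-parentheses grammar, and the five-node graph are all assumptions chosen for illustration.

```python
from collections import deque

def cfl_reachability(nodes, edges, eps, unary, binary):
    """Return every derived fact (src, label, dst).

    eps:    nonterminals A with a production A -> epsilon
    unary:  pairs (A, B) for productions A -> B
    binary: triples (A, B, C) for productions A -> B C
    """
    facts = set(edges)
    # A nullable nonterminal labels an empty path from each node to itself.
    for n in nodes:
        for a in eps:
            facts.add((n, a, n))

    out_edges, in_edges = {}, {}
    for (u, x, v) in facts:
        out_edges.setdefault(u, set()).add((x, v))
        in_edges.setdefault(v, set()).add((x, u))
    worklist = deque(facts)

    def add(u, a, v):
        if (u, a, v) not in facts:
            facts.add((u, a, v))
            out_edges.setdefault(u, set()).add((a, v))
            in_edges.setdefault(v, set()).add((a, u))
            worklist.append((u, a, v))

    while worklist:
        u, x, v = worklist.popleft()
        for (a, b) in unary:
            if b == x:
                add(u, a, v)
        for (a, b, c) in binary:
            if b == x:  # popped edge is the left operand of A -> B C
                for (y, w) in list(out_edges.get(v, ())):
                    if y == c:
                        add(u, a, w)
            if c == x:  # popped edge is the right operand of A -> B C
                for (y, t) in list(in_edges.get(u, ())):
                    if y == b:
                        add(t, a, v)
    return facts

# Hypothetical instance: balanced parentheses, S -> eps | S S | open T, T -> S close.
nodes = [0, 1, 2, 3, 4]
edges = {(0, "open", 1), (1, "open", 2), (2, "close", 3), (3, "close", 4)}
facts = cfl_reachability(
    nodes, edges,
    eps={"S"},
    unary=[],
    binary=[("S", "S", "S"), ("S", "open", "T"), ("T", "S", "close")],
)
s_reachable = {(u, v) for (u, label, v) in facts if label == "S"}
```

In this instance, node 0 reaches node 4 through a balanced string, so `(0, 4)` is in `s_reachable`, while `(0, 3)` is not because the path from 0 to 3 leaves one parenthesis unmatched.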

In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective: reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded (by merging the two nodes and removing all the edges joining them) without affecting the CFL-reachability result. By studying the characteristics of recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in the RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges.
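The folding operation itself (merge two nodes, drop the edges joining them) can be sketched as follows. This shows only the merge step; it assumes the caller already knows the pair is foldable. The criterion for deciding that, the graph folding principle, is not spelled out in the abstract and is not reproduced here. The function name `fold`, the node names, and the edge labels are hypothetical.

```python
def fold(edges, u, v):
    """Merge node v into node u and drop every edge that joined the two.

    edges: set of (src, label, dst) triples. Returns a new edge set.
    Assumes (u, v) has already been judged foldable by some external check.
    """
    folded = set()
    for (src, label, dst) in edges:
        if {src, dst} == {u, v}:
            continue  # an edge joining the folded pair is removed
        new_src = u if src == v else src
        new_dst = u if dst == v else dst
        folded.add((new_src, label, new_dst))
    return folded

# Hypothetical graph: a -x-> u -e-> v -y-> b, where the e-edge is assumed trivial.
edges = {("a", "x", "u"), ("u", "e", "v"), ("v", "y", "b")}
folded = fold(edges, "u", "v")
```

Here the graph shrinks from four nodes and three edges to three nodes and two edges: `folded` contains `("a", "x", "u")` and `("u", "y", "b")`.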

On top of the graph folding principle, we propose an efficient graph folding algorithm, GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF removes 60.96% of the nodes and 42.67% of the edges of the input graphs, yielding a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF removes 38.93% of the nodes and 35.61% of the edges of the input graphs, yielding a speedup of 3.21× and a memory usage reduction of 65.19%.

Supplemental Material

pldi23main-p98-p-archive.zip (333.3 KB)

Supplementary material of the paper "Recursive State Machine Guided Graph Folding for Context-Free Language Reachability", including the proofs of Property 4.1 and Property 4.2 of the paper.

Index Terms

Recursive State Machine Guided Graph Folding for Context-Free Language Reachability
1. Theory of computation
  1. Formal languages and automata theory
    1. Grammars and context-free languages

Published in
Proceedings of the ACM on Programming Languages Volume 7, Issue PLDI
June 2023
2020 pages
EISSN:2475-1421
DOI:10.1145/3554310
Editor:
Michael Hicks
Amazon, USA
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Available / v1.1
- Artifacts Evaluated & Reusable / v1.1
Author Tags
CFL-reachability
graph simplification
recursive state machines
Qualifiers
- research-article