Abstract
In this paper, we propose a static analysis technique for assembly code, based on abstract interpretation, to discover properties of arrays. Analysing assembly code rather than source code has important advantages: we do not need to make assumptions about the compiler's behaviour, and we can handle closed-source programs. The main disadvantage, however, is that information about source code variables and their types, in particular about arrays, is unavailable. Instead, binary code operates on data locations (registers and memory addresses) and their sizes in bytes. Without any knowledge of the source code, our analysis infers which sets of memory addresses correspond to arrays, and establishes properties of these addresses and their contents. The underlying abstract domain is relational, meaning that we can infer relations between variables of the domain. As a consequence, we can infer properties of arrays whose start address and size are defined with respect to variables of the domain, and thus may be statically unknown. Currently, no other tool operating at the assembly or binary level can infer such properties.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
This simply consists of variable substitutions in the polyhedron of the abstract state.
A heuristic for this function is detailed in [4]. To summarize, to test \(matchVar(v_1,v_2,p_1,p_2)\), we try to express \(v_1\) and \(v_2\) as linear expressions of a set of variables \(npiv\) chosen among the common variables of \(p_1\) and \(p_2\). The set \(npiv\) is determined using Gauss-Jordan elimination on the constraints of \(p_1 \sqcup_{\diamond} p_2\).
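The role of Gauss-Jordan elimination in the note above can be illustrated with a minimal sketch. It is not the heuristic of [4]: it assumes the constraints are a consistent system of linear equalities over the rationals, each given as a row of coefficients followed by a constant, and the names `gauss_jordan`, `express` and `match_var` are hypothetical. Columns are eliminated from left to right, so listing the candidate variables first and the common variables last tends to leave the common variables as the non-pivot set.

```python
from fractions import Fraction


def gauss_jordan(rows, n_vars):
    """Reduce rows [a_0, ..., a_{n-1}, c], each meaning sum(a_i * x_i) = c,
    to reduced row echelon form. Returns (pivot_rows, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next row that will receive a pivot
    for col in range(n_vars):
        if r == len(m):
            break
        # Find a row at or below r with a non-zero entry in this column.
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue  # no pivot here: x_col stays a non-pivot variable
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][col]
        m[r] = [x / pivot_value for x in m[r]]
        # Clear the column in every other row (Jordan step).
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m[:r], pivots


def express(var, rows, pivots, n_vars):
    """Write x_var as a linear expression of the non-pivot variables.
    The result maps a variable index (or "const") to its coefficient."""
    if var not in pivots:
        return {var: Fraction(1)}  # a non-pivot variable is its own expression
    row = rows[pivots.index(var)]
    expr = {j: -row[j] for j in range(n_vars) if j != var and row[j] != 0}
    if row[n_vars] != 0:
        expr["const"] = row[n_vars]
    return expr


def match_var(v1, v2, constraints, n_vars):
    """Assumed matching test: same expression over the non-pivot variables."""
    rows, pivots = gauss_jordan(constraints, n_vars)
    return express(v1, rows, pivots, n_vars) == express(v2, rows, pivots, n_vars)


# Variable order (an assumption of this example): v1, v2, a, b.
same = [[1, 0, -1, 0, 4],    # v1 - a = 4
        [0, 1, -1, 0, 4]]    # v2 - a = 4
different = [[1, 0, -1, 0, 4],   # v1 - a = 4
             [0, 1, 0, -1, 4]]   # v2 - b = 4

print(match_var(0, 1, same, 4))       # True: both equal a + 4
print(match_var(0, 1, different, 4))  # False: a + 4 versus b + 4
```

Exact rational arithmetic (`Fraction`) is used so that the comparison of expressions is not affected by floating-point rounding.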
References
Balakrishnan G, Gruian R, Reps T, Teitelbaum T (2005) Codesurfer/x86-a platform for analyzing x86 executables. In: International conference on compiler construction
Balakrishnan G, Reps T (2004) Analyzing memory accesses in x86 executables. In: Compiler construction. Springer, Berlin, Heidelberg, pp 2732–2733
Ballabriga C, Cassé H, Rochange C, Sainrat P (2010) OTAWA: an open toolbox for adaptive WCET analysis. In: Software technologies for embedded and ubiquitous systems, vol. 6399 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp 35–46
Ballabriga C, Forget J, Gonnord L, Lipari G, Ruiz J (2019) Static analysis of binary code with memory indirections using polyhedra. In: International conference on verification, model checking, and abstract interpretation. Springer, Cham, pp 114–135
Bardin S, Herrmann P, Védrine F (2011) Refinement-based CFG reconstruction from unstructured programs. In: International workshop on verification, model checking, and abstract interpretation (VMCAI’11)
Blanchet B, Cousot P, Cousot R, Feret J, Mauborgne L, Miné A, Monniaux D, Rival X (2003) A static analyzer for large safety-critical software. In: Proceedings of the ACM SIGPLAN 2003 conference on programming language design and implementation
Bradley AR, Manna Z, Sipma HB (2006) What’s decidable about arrays? In: International workshop on verification, model checking, and abstract interpretation. Springer, Berlin, Heidelberg, pp 427–442
Brumley D, Jager I, Avgerinos T, Schwartz EJ (2011) BAP: a binary analysis platform. In: International conference on computer aided verification. Springer, Berlin, Heidelberg, pp 463–469
Bygde S, Lisper B, Holsti N (2012) Fully bounded polyhedral analysis of integers with wrapping. Electron Notes Theor Comput Sci 288:3–13
Caballero J, Lin Z (2016) Type inference on executables. ACM Computing Surveys (CSUR) 48(4):1–35
Cousot P, Cousot R (1977) Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: 4th ACM SIGACT-SIGPLAN symposium on principles of programming languages (POPL’77). ACM, pp 238–252
Cousot P, Cousot R, Logozzo F (2011) A parametric segmentation functor for fully automatic and scalable array content analysis. ACM SIGPLAN Notices 46:105–118
Cousot P, Halbwachs N (1978) Automatic discovery of linear restraints among variables of a program. In: Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on principles of programming languages (POPL). ACM, pp 84–96
Cozzie A, Stratton F, Xue H, King ST (2008) Digging for data structures. OSDI 8:255–266
Eagle C (2011) The IDA pro book: the unofficial guide to the world’s most popular disassembler. No Starch Press, San Francisco
Gopan D, DiMaio F, Dor N, Reps T, Sagiv M (2004) Numeric domains with summarized dimensions. In: International conference on tools and algorithms for the construction and analysis of systems. Springer, Berlin, Heidelberg, pp 512–529
Gopan D, Reps T (2006) Lookahead widening. In: International conference on computer aided verification. Springer, Berlin, Heidelberg, pp 452–466
Gopan D, Reps T, Sagiv M (2005) A framework for numeric analysis of array operations. ACM SIGPLAN Notices 40(1):338–350
Gustafsson J, Betts A, Ermedahl A, Lisper B (2010) The Mälardalen WCET benchmarks: past, present and future. In: OASIcs-OpenAccess Series in Informatics, vol. 15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
Habermehl P, Iosif R, Vojnar T (2008) What else is decidable about integer arrays? In: International conference on foundations of software science and computational structures. Springer, Berlin, Heidelberg, pp 474–489
Halbwachs N, Péron M (2008) Discovering properties about arrays in simple programs. ACM SIGPLAN Notices 43:339–348
Hoder K, Kovács L, Voronkov A (2011) Case studies on invariant generation using a saturation theorem prover. In: Mexican international conference on artificial intelligence. Springer, Berlin, Heidelberg, pp 1–15
Kinder J, Veith H (2010) Precise static analysis of untrusted driver binaries. In: Formal methods in computer aided design
Kovács L, Voronkov A (2009) Finding loop invariants for programs over arrays using a theorem prover. In: International conference on fundamental approaches to software engineering. Springer, Berlin, Heidelberg, pp 470–485
Liu J, Rival X (2015) Abstraction of optional numerical values. In: Asian symposium on programming languages and systems. Springer, Cham, pp 146–166
Liu J, Rival X (2017) An array content static analysis based on non-contiguous partitions. Comput Lang Syst Struct 47:104–129
Miné A (2006) The octagon abstract domain. Higher-order Symbolic Comput 19(1):31–100
Monk JD (1976) Cylindric algebras. In: Mathematical logic, vol. 37 of Graduate Texts in Mathematics. Springer, Cham
Nikolić Đ, Spoto F (2013) Inferring complete initialization of arrays. Theor Comput Sci 484:16–40
Pouchet L-N (2012) Polybench: The polyhedral benchmark suite. http://www.cs.ucla.edu/pouchet/software/polybench
Ramalingam G, Field J, Tip F (1999) Aggregate structure identification and its application to program analysis. In: Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 119–132
Reps T, Balakrishnan G (2008) Improved memory-access analysis for x86 executables. In: Compiler Construction. Springer, Berlin, Heidelberg, pp 16–35
Sen R, Srikant YN (2007) Executable analysis using abstract interpretation with circular linear progressions. In: 2007 5th IEEE/ACM International conference on formal methods and models for codesign (MEMOCODE 2007). IEEE, pp 39–48
Sepp A, Mihaila B, Simon A (2011) Precise static analysis of binaries by extracting relational information. In: 18th Working conference on reverse engineering (WCRE’11). IEEE
Sharir M, Pnueli A (1978) Two approaches to interprocedural data flow analysis. New York University, Courant Institute of Mathematical Sciences, Computer Science Department
Shoshitaishvili Y, Wang R, Salls C, Stephens N, Polino M, Dutcher A, Grosen J, Feng S, Hauser C, Kruegel C, Vigna G et al. (2016) SoK: (state of) the art of war: offensive techniques in binary analysis. In: 2016 IEEE Symposium on security and privacy (SP), pp 138–157
Slowinska A, Stancescu T, Bos H (2011) Howard: a dynamic excavator for reverse engineering data structures. In: NDSS
Troshina K, Derevenets Y, Chernov A (2010) Reconstruction of composite types for decompilation. In: 2010 10th IEEE Working conference on source code analysis and manipulation
Acknowledgements
We would like to thank Giuseppe Lipari for his precious feedback on the paper. We would also like to thank Andrei Florea for chasing some naughty bugs away from our definitions.
Funding
Partially funded by the French National Research Agency (ANR), Corteva Project (ANR-17-CE25-0003).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ballabriga, C., Forget, J. & Ruiz, J. Relational abstract interpretation of arrays in assembly code. Form Methods Syst Des 59, 103–135 (2021). https://doi.org/10.1007/s10703-022-00399-3
DOI: https://doi.org/10.1007/s10703-022-00399-3