Relational abstract interpretation of arrays in assembly code

Formal Methods in System Design

Abstract

In this paper, we propose a static analysis technique for assembly code, based on abstract interpretation, to discover properties of arrays. Considering assembly code rather than source code has important advantages: we do not need to make assumptions about the compiler behaviour, and we can handle closed-source programs. The main disadvantage, however, is that information about source code variables and their types, in particular about arrays, is unavailable. Instead, binary code operates on data locations (registers and memory addresses) and their sizes in bytes. Without any knowledge of the source code, our analysis infers which sets of memory addresses correspond to arrays, and establishes properties on these addresses and their contents. The underlying abstract domain is relational, meaning that we can infer relations between variables of the domain. As a consequence, we can infer properties on arrays whose start address and size are defined with respect to variables of the domain, and may therefore be statically unknown. Currently, no other tool operating at the assembly or binary level can infer such properties.


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. This simply consists in variable substitutions in the polyhedron of the abstract state.

  2. A heuristic for this function is detailed in [4]. To summarize, to test \( matchVar (v_1,v_2,p_1,p_2)\), we try to express \(v_1\) and \(v_2\) as linear expressions of a set of variables \( npiv \) chosen among the common variables of \(p_1\) and \(p_2\). The set \( npiv \) is determined using Gauss-Jordan elimination on the constraints of \(p_1\sqcup _{\diamond }p_2\).
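The Gauss-Jordan step mentioned in Note 2 can be illustrated with a minimal sketch. It is not the heuristic of [4]: it assumes that only the equality constraints of the joined polyhedron are considered, that each one is given as a row of coefficients followed by a constant, and that \(v_1\) and \(v_2\) are placed in the leftmost columns so that they are preferred as pivots. The function names `gauss_jordan` and `express` are hypothetical.

```python
from fractions import Fraction


def gauss_jordan(rows):
    """Reduce rows [a_0, ..., a_{n-1}, c], each meaning sum(a_i * x_i) = c,
    to reduced row echelon form. Returns (reduced_rows, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_vars = len(m[0]) - 1
    pivots = []
    r = 0  # index of the next pivot row
    for col in range(n_vars):
        if r == len(m):
            break
        # Look for a row at or below r with a non-zero entry in this column.
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue  # no pivot here: the column is a non-pivot variable
        m[r], m[pr] = m[pr], m[r]
        pivot_value = m[r][col]
        m[r] = [x / pivot_value for x in m[r]]
        # Eliminate the column from every other row.
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m[:r], pivots


def express(rows, names):
    """Express each pivot variable as (coefficients over non-pivot
    variables, constant). Returns (expressions, non_pivot_names)."""
    reduced, pivots = gauss_jordan(rows)
    free = [j for j in range(len(names)) if j not in pivots]
    expressions = {}
    for row, p in zip(reduced, pivots):
        # Row reads x_p + sum(row[j] * x_j) = c, hence x_p = c - sum(...).
        coeffs = {names[j]: -row[j] for j in free if row[j] != 0}
        expressions[names[p]] = (coeffs, row[-1])
    return expressions, [names[j] for j in free]
```

For instance, with the columns ordered as `v1, v2, x, y` and the two equalities \(v_1 + v_2 - 2x - y = 4\) and \(v_2 - x - y = 0\), the sketch returns \(v_1 = x + 4\) and \(v_2 = x + y\), with non-pivot variables \(\{x, y\}\) playing the role of \( npiv \). How the resulting expressions are then compared to decide the match is defined in [4] and is not reproduced here.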

References

  1. Balakrishnan G, Gruian R, Reps T, Teitelbaum T (2005) CodeSurfer/x86: a platform for analyzing x86 executables. In: International conference on compiler construction

  2. Balakrishnan G, Reps T (2004) Analyzing memory accesses in x86 executables. In: Compiler construction. Springer, Berlin, Heidelberg, pp 2732–2733

  3. Ballabriga C, Cassé H, Rochange C, Sainrat P (2010) OTAWA: an open toolbox for adaptive WCET analysis. In: Software technologies for embedded and ubiquitous systems, vol. 6399 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp 35–46

  4. Ballabriga C, Forget J, Gonnord L, Lipari G, Ruiz J (2019) Static analysis of binary code with memory indirections using polyhedra. In: International conference on verification, model checking, and abstract interpretation. Springer, Cham, pp 114–135

  5. Bardin S, Herrmann P, Védrine F (2011) Refinement-based CFG reconstruction from unstructured programs. In: International workshop on verification, model checking, and abstract interpretation (VMCAI’11)

  6. Blanchet B, Cousot P, Cousot R, Feret J, Mauborgne L, Miné A, Monniaux D, Rival X (2003) A static analyzer for large safety-critical software. In: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation

  7. Bradley AR, Manna Z, Sipma HB (2006) What’s decidable about arrays? In: International workshop on verification, model checking, and abstract interpretation. Springer, Berlin, Heidelberg, pp 427–442

  8. Brumley D, Jager I, Avgerinos T, Schwartz EJ (2011) BAP: a binary analysis platform. In: International conference on computer aided verification. Springer, Berlin, Heidelberg, pp 463–469

  9. Bygde S, Lisper B, Holsti N (2012) Fully bounded polyhedral analysis of integers with wrapping. Electron Notes Theor Comput Sci 288:3–13

  10. Caballero J, Lin Z (2016) Type inference on executables. ACM Computing Surveys (CSUR) 48(4):1–35

  11. Cousot P, Cousot R (1977) Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (POPL’77). ACM, pp 238–252

  12. Cousot P, Cousot R, Logozzo F (2011) A parametric segmentation functor for fully automatic and scalable array content analysis. ACM SIGPLAN Notices 46:105–118

  13. Cousot P, Halbwachs N (1978) Automatic discovery of linear restraints among variables of a program. In: Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on principles of programming languages (POPL). ACM, pp 84–96

  14. Cozzie A, Stratton F, Xue H, King ST (2008) Digging for data structures. OSDI 8:255–266

  15. Eagle C (2011) The IDA pro book: the unofficial guide to the world’s most popular disassembler. No Starch Press, San Francisco

  16. Gopan D, DiMaio F, Dor N, Reps T, Sagiv M (2004) Numeric domains with summarized dimensions. In: International conference on tools and algorithms for the construction and analysis of systems. Springer, Berlin, Heidelberg, pp 512–529

  17. Gopan D, Reps T (2006) Lookahead widening. In: International conference on computer aided verification. Springer, Berlin, Heidelberg, pp 452–466

  18. Gopan D, Reps T, Sagiv M (2005) A framework for numeric analysis of array operations. ACM SIGPLAN Notices 40(1):338–350

  19. Gustafsson J, Betts A, Ermedahl A, Lisper B (2010) The Mälardalen WCET benchmarks: past, present and future. In: OASIcs-OpenAccess Series in Informatics, volume 15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik

  20. Habermehl P, Iosif R, Vojnar T (2008) What else is decidable about integer arrays? In: International conference on foundations of software science and computational structures. Springer, Berlin, Heidelberg, pp 474–489

  21. Halbwachs N, Péron M (2008) Discovering properties about arrays in simple programs. ACM SIGPLAN Notices 43:339–348

  22. Hoder K, Kovács L, Voronkov A (2011) Case studies on invariant generation using a saturation theorem prover. In: Mexican international conference on artificial intelligence. Springer, Berlin, Heidelberg, pp 1–15

  23. Kinder J, Veith H (2010) Precise static analysis of untrusted driver binaries. In: Formal methods in computer aided design

  24. Kovács L, Voronkov A (2009) Finding loop invariants for programs over arrays using a theorem prover. In: International conference on fundamental approaches to software engineering. Springer, Berlin, Heidelberg, pp 470–485

  25. Liu J, Rival X (2015) Abstraction of optional numerical values. In: Asian symposium on programming languages and systems. Springer, Cham, pp 146–166

  26. Liu J, Rival X (2017) An array content static analysis based on non-contiguous partitions. Comput Lang Syst Struct 47:104–129

  27. Miné A (2006) The octagon abstract domain. Higher-order Symbolic Comput 19(1):31–100

  28. Monk JD (1976) Cylindric algebras, vol. 37. In: Mathematical Logic Graduate Texts in Mathematics. Springer, Cham

  29. Nikolić Đ, Spoto F (2013) Inferring complete initialization of arrays. Theor Comput Sci 484:16–40

  30. Pouchet L-N (2012) Polybench: The polyhedral benchmark suite. http://www.cs.ucla.edu/pouchet/software/polybench

  31. Ramalingam G, Field J, Tip F (1999) Aggregate structure identification and its application to program analysis. In: Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp 119–132

  32. Reps T, Balakrishnan G (2008) Improved memory-access analysis for x86 executables. In: Compiler Construction. Springer, Berlin, Heidelberg, pp 16–35

  33. Sen R, Srikant YN (2007) Executable analysis using abstract interpretation with circular linear progressions. In: 2007 5th IEEE/ACM International conference on formal methods and models for codesign (MEMOCODE 2007). IEEE, pp 39–48

  34. Sepp A, Mihaila B, Simon A (2011) Precise static analysis of binaries by extracting relational information. In: 18th Working conference on reverse engineering (WCRE’11). IEEE

  35. Sharir M, Pnueli A (1978) Two approaches to interprocedural data flow analysis. New York University, Courant Institute of Mathematical Sciences, Computer Science Department

  36. Shoshitaishvili Y, Wang R, Salls C, Stephens N, Polino M, Dutcher A, Grosen J, Feng S, Hauser C, Kruegel C, Vigna G et al. (2016) SoK: (state of) the art of war: offensive techniques in binary analysis. In: 2016 IEEE Symposium on security and privacy (SP), pp 138–157

  37. Slowinska A, Stancescu T, Bos H (2011) Howard: a dynamic excavator for reverse engineering data structures. In: NDSS

  38. Troshina K, Derevenets Y, Chernov A (2010) Reconstruction of composite types for decompilation. In: 2010 10th IEEE Working conference on source code analysis and manipulation

Acknowledgements

We would like to thank Giuseppe Lipari for his valuable feedback on the paper. We would also like to thank Andrei Florea for chasing some naughty bugs away from our definitions.

Funding

Partially funded by the French National Research Agency (ANR), Corteva Project (ANR-17-CE25-0003).

Author information

Corresponding author

Correspondence to Julien Forget.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ballabriga, C., Forget, J. & Ruiz, J. Relational abstract interpretation of arrays in assembly code. Form Methods Syst Des 59, 103–135 (2021). https://doi.org/10.1007/s10703-022-00399-3

  • DOI: https://doi.org/10.1007/s10703-022-00399-3
