TypeMiner: Recovering Types in Binary Programs Using Machine Learning

Maier, Alwin; Gascon, Hugo; Wressnegger, Christian; Rieck, Konrad

doi:10.1007/978-3-030-22038-9_14

Conference paper
First Online: 06 June 2019

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11543)

Abstract

Closed-source software is a major hurdle for assessing the security of computer systems. In the absence of source code, it is particularly difficult to locate vulnerabilities and malicious functionality, as crucial information is removed by the compilation process. Most notably, binary programs usually lack type information, which dramatically complicates spotting vulnerabilities such as integer flaws or type confusions. Moreover, data types are often essential for gaining a deeper understanding of the program logic. In this paper, we present TypeMiner, a static method for recovering types in binary programs. We build on the assumption that types leave characteristic traits in compiled code that can be automatically identified using machine learning, starting at usage locations determined by an analyst. We evaluate the performance of our method on 14 real-world software projects written in C and show that it correctly recovers the data types in 76%–93% of the cases.
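The abstract does not state which features or learning algorithm TypeMiner uses, so the following is only a toy sketch of the general idea that types leave characteristic traits in compiled code. It assumes that each analyst-chosen usage location yields a short list of x86-64 mnemonics from the surrounding instructions; the training traces, the type labels, and the helpers `ngrams`, `normalize`, `train`, and `predict` are all hypothetical. The traits themselves are real x86 conventions: signed values are typically widened with `movsx`, shifted with `sar`, divided with `idiv`, and compared with `jl`/`jg`, whereas unsigned values use `movzx`, `shr`, `div`, and `jb`/`ja`. Mnemonic n-grams are turned into unit-length vectors and classified with a nearest-centroid rule under cosine similarity, a deliberately simple stand-in for whatever classifier the paper actually employs.

```python
# Toy sketch (hypothetical, not TypeMiner's implementation): guess a variable's
# C type from the x86-64 mnemonics observed where the variable is used.
from collections import Counter, defaultdict
import math

# Hypothetical training traces: (mnemonics around a usage location, C type).
TRAINING = [
    (["movsx", "cmp", "jl"], "char"),           # sign-extend, signed compare
    (["movsx", "sar"], "char"),                 # arithmetic (signed) shift
    (["movzx", "cmp", "jb"], "unsigned char"),  # zero-extend, unsigned compare
    (["movzx", "shr"], "unsigned char"),        # logical (unsigned) shift
    (["mov", "cmp", "jg"], "int"),              # signed compare
    (["mov", "cdq", "idiv"], "int"),            # cdq sign-extends eax, signed divide
    (["mov", "cmp", "ja"], "unsigned int"),     # unsigned compare
    (["mov", "xor", "div"], "unsigned int"),    # zero edx, unsigned divide
]


def ngrams(trace, n_max=2):
    """Count all mnemonic n-grams of length 1..n_max in one usage trace."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(trace) - n + 1):
            feats[" ".join(trace[i:i + n])] += 1
    return feats


def normalize(vec):
    """Scale a sparse vector (feature -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def train(samples):
    """Build one unit-length centroid vector per type label."""
    sums = defaultdict(Counter)
    for trace, label in samples:
        sums[label].update(normalize(ngrams(trace)))
    return {label: normalize(total) for label, total in sums.items()}


def predict(centroids, trace):
    """Return the label whose centroid is most cosine-similar to the trace."""
    x = normalize(ngrams(trace))

    def score(label):
        c = centroids[label]
        return sum(w * c.get(k, 0.0) for k, w in x.items())

    return max(centroids, key=score)


model = train(TRAINING)

if __name__ == "__main__":
    # An unseen trace: zero-extension plus an unsigned "above or equal" jump.
    print(predict(model, ["movzx", "cmp", "jae"]))  # → unsigned char
```

A practical system would need far richer context, for example operand sizes, data flow across functions, and pointer dereferences, as well as a much larger labeled corpus; the nearest-centroid rule was chosen here only because it fits in a few lines of standard-library Python.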

Acknowledgments

The authors gratefully acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the project VAMOS (FKZ 16KIS0534) and FIDI (FKZ 16KIS0786K).

Author information

Authors and Affiliations

Institute of System Security, TU Braunschweig, Braunschweig, Germany
Alwin Maier, Hugo Gascon, Christian Wressnegger & Konrad Rieck


Corresponding author

Correspondence to Alwin Maier.

Editor information

Editors and Affiliations

University of Georgia, Athens, GA, USA
Roberto Perdisci
University of Rennes, CNRS, IRISA, Rennes, France
Clémentine Maurice
University of Cagliari, Cagliari, Italy
Giorgio Giacinto
Chalmers University of Technology, Gothenburg, Sweden
Magnus Almgren

About this paper

Cite this paper

Maier, A., Gascon, H., Wressnegger, C., Rieck, K. (2019). TypeMiner: Recovering Types in Binary Programs Using Machine Learning. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019. Lecture Notes in Computer Science, vol 11543. Springer, Cham. https://doi.org/10.1007/978-3-030-22038-9_14

DOI: https://doi.org/10.1007/978-3-030-22038-9_14
Published: 06 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22037-2
Online ISBN: 978-3-030-22038-9
eBook Packages: Computer Science, Computer Science (R0)