Abstract
Closed-source software is a major hurdle for assessing the security of computer systems. In the absence of source code, it is particularly difficult to locate vulnerabilities and malicious functionality, as crucial information is removed by the compilation process. Most notably, binary programs usually lack type information, which dramatically complicates spotting vulnerabilities such as integer flaws or type confusions. Moreover, data types are often essential for gaining a deeper understanding of the program logic. In this paper, we present TypeMiner, a static method for recovering types in binary programs. We build on the assumption that types leave characteristic traits in compiled code that can be automatically identified using machine learning, starting at usage locations determined by an analyst. We evaluate the performance of our method on 14 real-world software projects written in C and show that it correctly recovers the data types in 76%–93% of the cases.
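To make the core assumption concrete, the following is a minimal sketch of how characteristic instruction patterns could be turned into features and matched against known types. It is not the TypeMiner implementation: the instruction traces, the type labels, the bigram features, and the nearest-centroid classifier are all hypothetical choices made for illustration only.

```python
from collections import Counter
from math import sqrt


def ngrams(trace, n=2):
    """Count the n-grams of instruction mnemonics in one trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """Sum the n-gram counts per type label, giving one centroid per type."""
    centroids = {}
    for label, trace in samples:
        centroids.setdefault(label, Counter()).update(ngrams(trace))
    return centroids


def predict(centroids, trace):
    """Return the type label whose centroid is most similar to the trace."""
    feats = ngrams(trace)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Hypothetical training data: mnemonics observed around the use of a variable.
# Signed comparisons tend to end in jl/jge, unsigned ones in jb/jae.
TRAINING = [
    ("int", ["mov", "add", "cmp", "jl"]),
    ("int", ["mov", "imul", "cmp", "jge"]),
    ("unsigned int", ["mov", "add", "cmp", "jb"]),
    ("unsigned int", ["mov", "mul", "cmp", "jae"]),
    ("char *", ["mov", "movzx", "test", "jz"]),
    ("char *", ["lea", "movzx", "cmp", "jz"]),
]

model = train(TRAINING)
print(predict(model, ["mov", "sub", "cmp", "jl"]))
```

With this toy data the script prints `int`, because the only bigram the unseen trace shares with any centroid is the signed comparison `cmp`, `jl`. A realistic system would need far richer features and training data than this sketch assumes.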
Acknowledgments
The authors gratefully acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the projects VAMOS (FKZ 16KIS0534) and FIDI (FKZ 16KIS0786K).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Maier, A., Gascon, H., Wressnegger, C., Rieck, K. (2019). TypeMiner: Recovering Types in Binary Programs Using Machine Learning. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019. Lecture Notes in Computer Science(), vol 11543. Springer, Cham. https://doi.org/10.1007/978-3-030-22038-9_14
DOI: https://doi.org/10.1007/978-3-030-22038-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22037-2
Online ISBN: 978-3-030-22038-9
eBook Packages: Computer Science, Computer Science (R0)