
TypeMiner: Recovering Types in Binary Programs Using Machine Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11543)

Abstract

Closed-source software is a major hurdle for assessing the security of computer systems. In the absence of source code, it is particularly difficult to locate vulnerabilities and malicious functionality, as crucial information is removed by the compilation process. Most notably, binary programs usually lack type information, which dramatically complicates spotting vulnerabilities such as integer flaws or type confusions. Moreover, data types are often essential for gaining a deeper understanding of the program logic. In this paper we present TypeMiner, a static method for recovering types in binary programs. We build on the assumption that types leave characteristic traits in compiled code that can be identified automatically using machine learning, starting at usage locations determined by an analyst. We evaluate the performance of our method on 14 real-world software projects written in C and show that it correctly recovers the data types in 76%–93% of the cases.
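To make the learning step described above more concrete, the following Python sketch illustrates the general idea under assumptions that are not taken from the paper: each variable is represented by the disassembled instructions at its usage locations, instruction mnemonics and their bigrams serve as features, and a simple nearest-centroid classifier using cosine similarity stands in for whatever learning algorithm TypeMiner actually uses. The helper names (`features`, `cosine`, `CentroidTypeClassifier`) and the training snippets are hypothetical; they only show how traits such as sign versus zero extension (`movsx` vs. `movzx`) or single versus double precision SSE arithmetic (`addss` vs. `addsd`) can separate types.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: instructions observed at the usage locations
# of one variable, labeled with the variable's source-level C type.
TRAIN = [
    (["movzx eax, byte [rbp-1]", "cmp al, 0x41"], "unsigned char"),
    (["movsx eax, byte [rbp-1]", "cmp eax, 0x41"], "signed char"),
    (["movss xmm0, [rbp-8]", "addss xmm0, xmm1"], "float"),
    (["movsd xmm0, [rbp-8]", "addsd xmm0, xmm1"], "double"),
    (["mov rax, [rbp-8]", "mov eax, [rax]"], "int *"),
]


def features(trace, n=2):
    """Bag of mnemonic n-grams (lengths 1..n) for one usage trace.

    Assumption: only the mnemonic of each instruction is kept; operands
    are discarded to keep the sketch short.
    """
    mnemonics = [ins.split()[0].lower() for ins in trace]
    bag = Counter()
    for k in range(1, n + 1):
        for i in range(len(mnemonics) - k + 1):
            bag[" ".join(mnemonics[i:i + k])] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidTypeClassifier:
    """Nearest-centroid classifier over n-gram count vectors (a stand-in)."""

    def __init__(self):
        # Summed feature counts per type; cosine similarity ignores scale,
        # so the sum points in the same direction as the class mean.
        self.centroids = defaultdict(Counter)

    def fit(self, traces, labels):
        for trace, label in zip(traces, labels):
            self.centroids[label].update(features(trace))
        return self

    def predict(self, trace):
        query = features(trace)
        return max(self.centroids,
                   key=lambda label: cosine(query, self.centroids[label]))


if __name__ == "__main__":
    clf = CentroidTypeClassifier().fit(
        [trace for trace, _ in TRAIN], [label for _, label in TRAIN]
    )
    query = ["movsx ecx, byte [rbp-2]", "cmp ecx, 0x30"]
    print(clf.predict(query))  # signed char
```

Running the script prints `signed char`, because the query's `movsx`, `cmp`, and `movsx cmp` features coincide exactly with the signed-character example. A realistic setup would need much richer features (operand sizes, data-flow context across functions) and a stronger learner; the centroid model is used here only because it fits in a few lines of the standard library.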

Acknowledgments

The authors gratefully acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the projects VAMOS (FKZ 16KIS0534) and FIDI (FKZ 16KIS0786K).

Author information

Corresponding author: Alwin Maier

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Maier, A., Gascon, H., Wressnegger, C., Rieck, K. (2019). TypeMiner: Recovering Types in Binary Programs Using Machine Learning. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019. Lecture Notes in Computer Science, vol. 11543. Springer, Cham. https://doi.org/10.1007/978-3-030-22038-9_14

  • DOI: https://doi.org/10.1007/978-3-030-22038-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22037-2

  • Online ISBN: 978-3-030-22038-9

  • eBook Packages: Computer Science, Computer Science (R0)