Abstract
When analyzing an untrusted binary, reverse engineers usually rely on ad-hoc collections of interesting dynamic patterns—known as behaviors in the malware-analysis community—and static patterns—known as signatures in the antivirus community. Such patterns are often part of the skill set of the analyst, sometimes implemented in manually-created post-processing scripts. It would be desirable to be able to automatically find such behaviors, present them to analysts, and create a systematic catalog of matching rules and relevant implementations. We propose Jackdaw, a system that finds interesting dynamic patterns, and ranks them to unveil potentially interesting behaviors. Then, it annotates them with static information, capturing the distinct implementations of each across different malware families. Finally, Jackdaw associates semantic information to the behaviors, so as to create a descriptive summary that helps the analysts in querying the catalog of behaviors by type. To do this, it leverages the dynamic information and an indexed Web-based knowledge databases.
We implement and demonstrate Jackdaw on the Win32 API (even if the technique can be generalized to any OS). On a dataset of 2,136 distinct binaries, including both malicious and benign libraries and executables, we compared the behaviors extracted automatically against a ground truth of 44 behaviors created manually by expert analysts. Jackdaw found 77.3 % of them and was able to exclude spurious behaviors in 99.6 % cases. We also discovered 466 novel behaviors, among which manual exploration and review by expert reverse engineers revealed interesting findings and confirmed the correctness of the semantic tagging.
Keywords
- Reverse Engineering
- Control Flow Graph
- Reverse Engineer
- Candidate Behavior
- Virtual Machine Introspection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: NDSS (2009)
Bayer, U., Habibi, I., Balzarotti, D., Kirda, E., Kruegel, C.: Insights into current malware behavior. In: LEET (2009)
Caselden, D., Bazhanyuk, A., Payer, M., McCamant, S., Song, D.: HI-CFG: construction by binary analysis and application to attack polymorphism. In: Crampton, J., Jajodia, S., Mayes, K. (eds.) ESORICS 2013. LNCS, vol. 8134, pp. 164–181. Springer, Heidelberg (2013)
Cesare, S., Xiang, Y.: Software Similarity and Classification. Springer Briefs in Computer Science. Springer, London (2012)
Cesare, S., Xiang, Y., Zhou, W.: Control flow-based malware variant detection. IEEE Trans. Dependable Secure Comput. 11(4), 307–317 (2014). doi:10.1109/TDSC.2013.40
Comparetti, P.M., Salvaneschi, G., Kirda, E., Kolbitsch, C., Kruegel, C., Zanero, S.: Identifying dormant functionality in malware programs. In: SP, pp. 61–76. IEEE Computer Society, Washington, DC (2010)
Crandall, J.R., Wu, S.F., Chong, F.T.: Minos: architectural support for protecting control data. TACO 3(4), 359–389 (2006)
Deng, Z., Zhang, X., Xu, D.: Spider: stealthy binary program instrumentation and debugging via hardware virtualization. In: ACSAC, New York, NY, USA (2013)
Dolan-Gavitt, B., Leek, T., Zhivich, M., Giffin, J., Lee, W.: Virtuoso: narrowing the semantic gap in virtual machine introspection. In: SP, pp. 297–312 (2011)
Eskandari, M., Khorshidpour, Z., Hashemi, S.: Hdm-analyser: a hybrid analysis approach based on data mining techniques for malware detection. JCV 9(2), 77–93 (2013)
Fredrikson, M., Jha, S., Christodorescu, M., Sailer, R., Yan, X.: Synthesizing near-optimal malware specifications from suspicious behaviors. In: SP, pp. 45–60. IEEE Computer Society, Washington, DC (2010)
Fu, Y., Lin, Z.: Space traveling across vm: automatically bridging the semantic gap in virtual machine introspection via online kernel data redirection. In: SP, pp. 586–600 (2012)
Garfinkel, T., Adams, K., Warfield, A., Franklin, J.: Compatibility is not transparency: Vmm detection myths and realities. In: HOTOS, pp. 6:1–6:6. USENIX Association, Berkeley (2007)
Holz, T., Raynal, F.: Detecting honeypots and other suspicious environments. In: 6th IEEE SMC Information Assurance Workshop (2005)
Jacob, G., Comparetti, P.M., Neugschwandtner, M., Kruegel, C., Vigna, G.: A static, packer-agnostic filter to detect similar malware samples. In: Flegel, U., Markatos, E., Robertson, W. (eds.) DIMVA 2012. LNCS, vol. 7591, pp. 102–122. Springer, Heidelberg (2013)
Jacob, G., Debar, H., Filiol, E.: Behavioral detection of malware: from a survey towards an established taxonomy. JCV 4(3), 251–266 (2008)
Jang, J., Woo, M., Brumley, D.: Towards automatic software lineage inference. In: USENIX Security, pp. 81–96. USENIX Association, Berkeley (2013)
Kirat, D., Vigna, G., Kruegel, C.: Barebox: efficient malware analysis on bare-metal. In: ACSAC, pp. 403–412. ACM, New York (2011)
Kruegel, C., Kirda, E., Mutz, D., Robertson, W., Vigna, G.: Polymorphic worm detection using structural information of executables. In: Valdes, A., Zamboni, D. (eds.) RAID 2005. LNCS, vol. 3858, pp. 207–226. Springer, Heidelberg (2006)
Lee, J., Avgerinos, T., Brumley, D.: Tie: principled reverse engineering of types in binary programs. In: NDSS (2011)
Lindorfer, M., Federico, A.D., Maggi, F., Comparetti, P.M., Zanero, S.: Lines of malicious code: insights into the malicious software industry. In: ACSAC, pp. 349–358. ACM, New York (2012)
Lindorfer, M., Kolbitsch, C., Milani Comparetti, P.: Detecting environment-sensitive malware. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 338–357. Springer, Heidelberg (2011)
Linn, C., Debray, S.: Obfuscation of executable code to improve resistance to static disassembly. In: CCS, pp. 290–299. ACM, New York (2003)
Maggi, F., Matteucci, M., Zanero, S.: Detecting intrusions through system call sequence and argument analysis. TODS 7(4), 381–395 (2008)
Martignoni, L., Christodorescu, M., Jha, S.: Omniunpack: fast, generic, and safe unpacking of malware. In: ACSAC, pp. 431–441. IEEE (2007)
Martignoni, L., Stinson, E., Fredrikson, M., Jha, S., Mitchell, J.C.: A layered architecture for detecting malicious behaviors. In: Lippmann, R., Kirda, E., Trachtenberg, A. (eds.) RAID 2008. LNCS, vol. 5230, pp. 78–97. Springer, Heidelberg (2008)
Moser, A., Kruegel, C., Kirda, E.: Exploring multiple execution paths for malware analysis. In: SP (2007)
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: ACSAC, pp. 421–430 (2007)
Mutz, D., Valeur, F., Vigna, G., Kruegel, C.: Anomalous system call detection. TISSEC 9(1), 61–93 (2006)
Nance, K., Bishop, M., Hay, B.: Virtual machine introspection: observation or interference? IEEE Secur. Priv. 6(5), 32–37 (2008)
Newsome, J.: Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In: NDSS. Internet Society (2005)
Palahan, S., Babic, D., Chaudhuri, S., Kifer, D.: Extraction of statistically signicant malware behaviors. In: ACSAC, New York, NY, USA, December 2013
Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. JCS 19(4), 639–668 (2011)
Royal, P., Halpin, M., Dagon, D., Edmonds, R., Lee, W.: Polyunpack: automating the hidden-code extraction of unpack-executing malware. In: ACSAC, pp. 289–300. IEEE Computer Society, Washington, DC (2006)
Schwartz, E.J., Lee, J., Woo, M., Brumley, D.: Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In: USENIX Security (2013)
Slowinska, A., Stancescu, T., Bos, H.: Howard: a dynamic excavator for reverse engineering data structures. In: NDSS. Citeseer (2011)
Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008)
Song, Q., Kasabov, N.: Ecm - a novel on-line, evolving clustering method and its applications. In: Posner, M.I. (ed.) Foundations of Cognitive Science, pp. 631–682. The MIT Press, Cambridge (2001)
Willems, C., Hund, R., Fobian, A., Felsch, D., Holz, T., Vasudevan, A.: Down to the bare metal: using processor features for binary analysis. In: ACSAC, pp. 189–198. ACM, New York (2012)
Yan, G., Brown, N., Kong, D.: Exploring discriminatory features for automated malware classification. In: Rieck, K., Stewin, P., Seifert, J.-P. (eds.) DIMVA 2013. LNCS, vol. 7967, pp. 41–61. Springer, Heidelberg (2013)
Yetiser, T.: Polymorphic Viruses, Implementation, Detection, and Protection (1993)
Yin, H., Song, D.X., Egele, M., Kruegel, C., Kirda, E.: Panorama: capturing system-wide information flow for malware detection and analysis. In: Ning, P., di Vimercati, S.D.C., Syverson, P.F. (eds.) CCS, pp. 116–127. ACM, New York (2007)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Polino, M., Scorti, A., Maggi, F., Zanero, S. (2015). Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science(), vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-20550-2_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20549-6
Online ISBN: 978-3-319-20550-2
eBook Packages: Computer ScienceComputer Science (R0)