Abstract
Reverse engineering is labor-intensive work aimed at understanding the internal implementation of a program, and it is necessary for tasks such as malware analysis and vulnerability hunting. Cross-version function identification and subroutine matching can greatly reduce manual effort by pointing out the already-known parts shared between different binary programs. Existing approaches mainly focus on function recognition and ignore the recovery of the relationships between functions, which makes it hard for researchers to locate the calling routine they are interested in.
In this paper, we propose a method that uses graphlet edge embedding to abstract high-level topological features of function call graphs and recover the relationships between functions. With the function relationships recovered, we reconstruct the calling routine of the program and then infer the specific functions in it. We implement a prototype called RouAlign, which automatically aligns the trunk routine of assembly code. We evaluated RouAlign on 65 groups of real-world programs containing over two million functions. RouAlign outperforms state-of-the-art binary comparison solutions by over 35%, with an average precision of 92% in pairwise function recognition.
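To make the idea of an edge-level graphlet feature concrete, the following is a minimal sketch and not RouAlign's implementation, whose details the abstract does not give. It assumes an undirected view of the call graph and counts only the two 3-node graphlets (triangles and open paths) that contain each call edge. The function names and the toy call graphs are hypothetical.

```python
from collections import defaultdict


def build_undirected(call_edges):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    adj = defaultdict(set)
    for caller, callee in call_edges:
        if caller != callee:  # this sketch ignores direct recursion
            adj[caller].add(callee)
            adj[callee].add(caller)
    return adj


def edge_signature(adj, u, v):
    """Count the 3-node graphlets that contain the edge (u, v).

    Returns (triangles, open_paths): a triangle is a common neighbor of
    u and v; an open path is a neighbor of exactly one endpoint.
    """
    triangles = len(adj[u] & adj[v])
    open_paths = (len(adj[u]) - 1 - triangles) + (len(adj[v]) - 1 - triangles)
    return (triangles, open_paths)


def edge_signatures(call_edges):
    """Map every call edge to its (triangles, open_paths) signature."""
    adj = build_undirected(call_edges)
    return {(u, v): edge_signature(adj, u, v) for u, v in call_edges if u != v}


# Hypothetical call graph of one program version.
old_version = [("main", "parse"), ("main", "run"), ("parse", "run"), ("run", "log")]
# The same shape in a stripped newer version, where symbol names are lost.
new_version = [("sub_1", "sub_2"), ("sub_1", "sub_3"), ("sub_2", "sub_3"), ("sub_3", "sub_4")]

old_sigs = edge_signatures(old_version)
new_sigs = edge_signatures(new_version)
```

Because the signatures depend only on topology, the edge `("run", "log")` and its renamed counterpart `("sub_3", "sub_4")` receive the same signature, which is the kind of name-independent cue an edge-based alignment can exploit. A real system would use larger graphlets and a learned embedding rather than exact equality of two counts.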
Acknowledgments
We thank the anonymous reviewers for their invaluable comments and suggestions. Can Yang and Jian Liu share the co-first authorship.
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Yang, C., Liu, J., Luo, M., Gong, X., Liu, B. (2020). RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding. In: Hölbl, M., Rannenberg, K., Welzer, T. (eds) ICT Systems Security and Privacy Protection. SEC 2020. IFIP Advances in Information and Communication Technology, vol 580. Springer, Cham. https://doi.org/10.1007/978-3-030-58201-2_11
DOI: https://doi.org/10.1007/978-3-030-58201-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58200-5
Online ISBN: 978-3-030-58201-2
eBook Packages: Computer Science, Computer Science (R0)