ABSTRACT
Binary code similarity analysis (BCSA) is a fundamental building block for various software security, reverse engineering, and re-engineering applications. Following the major breakthroughs of deep neural networks (DNNs) in processing media data such as images, existing research has applied DNNs to measure the similarity between binary code. Despite encouraging results, DNN-based BCSA is not yet widely deployed in industry, owing to the instability and black-box nature of DNNs.
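Many DNN-based BCSA tools map each binary function to a fixed-length embedding vector and measure similarity as the cosine similarity between vectors, then sort repository functions by that score. The following is a minimal NumPy sketch of this retrieval step. It assumes the embeddings have already been produced by some DNN encoder (the encoder itself is out of scope), and the helper names `cosine_similarity_matrix` and `top_k` are illustrative, not taken from any specific tool.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, repo: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of `queries` (m x d) and `repo` (n x d)."""
    # L2-normalize each row; the small epsilon guards against division by zero.
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-12)
    r = repo / (np.linalg.norm(repo, axis=1, keepdims=True) + 1e-12)
    return q @ r.T  # shape (m, n)


def top_k(query: np.ndarray, repo: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (repo_index, score) pairs for the k most similar functions, best first."""
    scores = cosine_similarity_matrix(query[None, :], repo)[0]
    k = min(k, len(scores))
    # Sorting negated scores gives descending order; a stable sort keeps ties deterministic.
    order = np.argsort(-scores, kind="stable")[:k]
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy 2-D "embeddings" for three repository functions and one query.
    repo = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    query = np.array([1.0, 0.1])
    print(top_k(query, repo, k=2))
```

In this toy run, repository functions 0 and 2 are returned, in that order. Normalizing once and using a single matrix product keeps the step cheap, even when a query is compared against a large function repository.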
In this work, we first conduct an extensive study of state-of-the-art (SoTA) BCSA tools and investigate their erroneous predictions from both quantitative and qualitative perspectives. Guided by these findings, we design BinAug, a low-cost and generic framework that improves the accuracy of BCSA tools by repairing their input binary code. Aligned with the typical workflow of DNN-based BCSA, BinAug obtains the sorted top-K similarity results and then re-ranks them using a set of carefully designed transformations. BinAug supports both black-box and white-box settings, depending on whether the internals of the DNN model are accessible. Our experimental results show that BinAug consistently improves the performance of SoTA BCSA tools, by an average of 2.38pt in the black-box setting and 6.46pt in the white-box setting. Moreover, BinAug improves the F1 score of binary software component analysis, an important downstream application of BCSA, by an average of 5.43pt and 7.45pt in the black-box and white-box settings, respectively.
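The abstract describes the re-ranking step only at a high level. The following is a minimal sketch of one way black-box re-ranking of a top-K list *could* work. It rests on an assumption of ours, not a description of BinAug's actual design: the query's embedding is averaged over its original form and several semantics-preserving transformations, and only the top-K candidates are re-scored. The encoder `toy_embed` (a bag-of-mnemonics count vector standing in for a DNN), the transformation `strip_nops`, and `rerank_top_k` are all hypothetical. The sketch only calls the encoder and never inspects its internals, which matches the black-box setting.

```python
from typing import Callable, Sequence

import numpy as np

Embed = Callable[[object], np.ndarray]  # black-box encoder: input -> vector
Transform = Callable[[object], object]  # hypothetical semantics-preserving rewrite


def _cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def rerank_top_k(query, candidates: Sequence, embed: Embed,
                 transforms: Sequence[Transform], k: int) -> list[int]:
    """Re-rank the top-k candidates using embeddings of transformed query variants."""
    q_vec = embed(query)
    cand_vecs = [embed(c) for c in candidates]
    # Step 1: the tool's original ranking, truncated to the top k.
    initial = sorted(range(len(candidates)),
                     key=lambda i: -_cos(q_vec, cand_vecs[i]))[:k]
    # Step 2 (assumed aggregation): average the embeddings of the original query
    # and its transformed variants; other choices, e.g. max-similarity, are possible.
    variants = [q_vec] + [embed(t(query)) for t in transforms]
    q_avg = np.mean(variants, axis=0)
    # Step 3: re-score only the top-k candidates; nothing outside them is promoted.
    return sorted(initial, key=lambda i: -_cos(q_avg, cand_vecs[i]))


VOCAB = ["mov", "add", "nop", "call", "ret"]


def toy_embed(insns) -> np.ndarray:
    # Hypothetical stand-in for a DNN encoder: counts of each mnemonic.
    return np.array([insns.count(m) for m in VOCAB], dtype=float)


def strip_nops(insns):
    # Hypothetical semantics-preserving transformation: drop padding NOPs.
    return tuple(m for m in insns if m != "nop")


query = ("mov", "nop", "nop", "add", "ret")
candidates = [("nop", "nop", "ret"), ("mov", "add", "ret"), ("call", "call", "call")]

if __name__ == "__main__":
    print(rerank_top_k(query, candidates, toy_embed, [strip_nops], k=2))  # [1, 0]
```

In the toy example, the padding NOPs make the decoy candidate 0 look most similar at first. Averaging in the NOP-free variant lifts the true match, candidate 1, to the top. Restricting re-scoring to the top K keeps the extra encoder calls bounded by the number of transformations, not by the repository size.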
Index Terms
- BinAug: Enhancing Binary Similarity Analysis with Low-Cost Input Repairing