ABSTRACT
Despite the successes of machine learning (ML) and deep learning (DL)-based vulnerability detectors (VD), they are limited to reporting whether a given piece of code is vulnerable, without indicating which part of the code is relevant to the detected vulnerability. We present IVDetect, an interpretable vulnerability detector built on the philosophy of using Artificial Intelligence (AI) to detect vulnerabilities while using Intelligence Assistant (IA) to provide VD interpretations in terms of vulnerable statements.
For vulnerability detection, we separately consider the vulnerable statements and their surrounding contexts via data and control dependencies. This allows our model to discriminate vulnerable statements better than existing approaches, which mix vulnerable code with contextual code. In addition to the coarse-grained vulnerability detection result, we leverage interpretable AI to provide users with fine-grained interpretations that include the sub-graph in the Program Dependency Graph (PDG) with the crucial statements that are relevant to the detected vulnerability. Our empirical evaluation on vulnerability databases shows that IVDetect outperforms the existing DL-based approaches by 43%–84% and 105%–255% in top-10 nDCG and MAP ranking scores, respectively. IVDetect correctly points out the vulnerable statements relevant to the vulnerability via its interpretation in 67% of the cases with a top-5 ranked list. IVDetect improves over the baseline interpretation models by 12.3%–400% and 9%–400% in accuracy.
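The ranking scores named above can be illustrated with a minimal sketch. It assumes the textbook definitions of top-k nDCG and MAP over a ranked list of relevance labels (1 for a truly vulnerable item, 0 otherwise); the abstract does not state the exact evaluation protocol, so the function names, the binary labels, and the choice to normalize average precision by the number of hits within the top k are all assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Mean of precision@rank taken at each relevant rank within the top k.

    Assumption: normalized by the number of hits inside the top k;
    other variants divide by the total number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]], k: int) -> float:
    """Average of average_precision_at_k over several ranked lists."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical ranked list: relevant items at ranks 1 and 3.
    ranked = [1, 0, 1]
    print(ndcg_at_k(ranked, 3))
    print(average_precision_at_k(ranked, 3))
```

Both scores reward placing truly vulnerable items near the top of the list: nDCG discounts each item logarithmically by its rank, while MAP averages the precision observed at each relevant position.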
- 2021. The GitHub Repository for This Study. https://github.com/vulnerabilitydetection/VulnerabilityDetectionResearch
Index Terms
- Vulnerability detection with fine-grained interpretations