DOI: 10.1145/3468264.3468597
research-article
Public Access

Vulnerability detection with fine-grained interpretations

Published: 18 August 2021

ABSTRACT

Despite the successes of machine learning (ML) and deep learning (DL)-based vulnerability detectors (VD), they are limited to providing only the decision on whether a given code is vulnerable or not, without details on what part of the code is relevant to the detected vulnerability. We present IVDetect, an interpretable vulnerability detector with the philosophy of using Artificial Intelligence (AI) to detect vulnerabilities, while using Intelligence Assistant (IA) to provide VD interpretations in terms of vulnerable statements.

For vulnerability detection, we separately consider the vulnerable statements and their surrounding contexts via data and control dependencies. This allows our model to better discriminate vulnerable statements than using the mixture of vulnerable code and contextual code as in existing approaches. In addition to the coarse-grained vulnerability detection result, we leverage interpretable AI to provide users with fine-grained interpretations that include the sub-graph in the Program Dependency Graph (PDG) with the crucial statements that are relevant to the detected vulnerability. Our empirical evaluation on vulnerability databases shows that IVDetect outperforms the existing DL-based approaches by 43%–84% and 105%–255% in top-10 nDCG and MAP ranking scores, respectively. IVDetect correctly points out the vulnerable statements relevant to the vulnerability via its interpretation in 67% of the cases with a top-5 ranked list. IVDetect improves over the baseline interpretation models by 12.3%–400% and 9%–400% in accuracy.
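The two mechanisms the abstract describes, separating a candidate statement from its data/control-dependency context in the PDG and scoring a ranked list of statements with top-k nDCG and MAP, as an added Python example that is not the paper's own implementation. The C function, its PDG edges and the ranked list are constructed by hand for illustration, and the depth-bounded backward traversal is an assumption about how such a context can be cut, not the paper's exact procedure.

```python
import math
from collections import deque

# Constructed sample (not from the paper): statement ids -> lines of a small
# C function whose copy length is never bounded by sizeof(buf).
STATEMENTS = {
    1: "void copy(char *src, int n) {",
    2: "  char buf[16];",
    3: "  if (n > 0)",
    4: "    memcpy(buf, src, n);",  # the statement a detector should flag
    5: "  log(buf);",
}
# Hand-built PDG edges (src, dst, kind): dst depends on src. "data" = src
# defines a value used at dst; "control" = src's predicate guards dst.
PDG_EDGES = [(1, 3, "data"), (1, 4, "data"), (2, 4, "data"),
             (2, 5, "data"), (3, 4, "control"), (4, 5, "data")]

def dependency_context(edges, stmt, depth=2, kinds=("data", "control")):
    """Statements within `depth` backward hops of `stmt` over the selected
    dependency kinds; this is the context kept separate from `stmt` itself."""
    preds = {}
    for s, d, k in edges:
        if k in kinds:
            preds.setdefault(d, set()).add(s)
    seen, frontier = set(), deque([(stmt, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for p in preds.get(node, ()):
            if p != stmt and p not in seen:
                seen.add(p)
                frontier.append((p, dist + 1))
    return seen

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k of a ranked statement list (1.0 = every
    labelled vulnerable statement listed before any other)."""
    dcg = sum(1 / math.log2(i + 2) for i, s in enumerate(ranked[:k]) if s in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def average_precision(ranked, relevant):
    """AP of one ranked list; MAP is its mean over a dataset of functions."""
    hits, total = 0, 0.0
    for i, s in enumerate(ranked):
        if s in relevant:
            hits += 1
            total += hits / (i + 1)
    return total / len(relevant) if relevant else 0.0
```

Selecting only `kinds=("control",)` or `("data",)` shows which part of the context each dependency type contributes, which is the separation the abstract argues helps the model discriminate.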


Published in

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021
1690 pages
ISBN: 9781450385626
DOI: 10.1145/3468264

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 112 of 543 submissions, 21%
