A comparison of the efficiency and effectiveness of vulnerability discovery techniques

https://doi.org/10.1016/j.infsof.2012.11.007

Abstract

Context

Security vulnerabilities discovered later in the development cycle are more expensive to fix than those discovered early. Therefore, software developers should strive to discover vulnerabilities as early as possible. Unfortunately, the large size of code bases and a lack of developer expertise can make discovering software vulnerabilities difficult. A number of vulnerability discovery techniques are available, each with its own strengths.

Objective

The objective of this research is to aid in the selection of vulnerability discovery techniques by comparing the vulnerabilities detected by each and comparing their efficiencies.

Method

We conducted three case studies using three electronic health record systems to compare four vulnerability discovery techniques: exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis.

Results

In our case study, we found empirical evidence that no single technique discovered every type of vulnerability. We discovered that the specific set of vulnerabilities identified by each technique was largely orthogonal to those of the other techniques. Systematic manual penetration testing found the most design flaws, while automated static analysis found the most implementation bugs. The most efficient discovery technique in terms of vulnerabilities discovered per hour was automated penetration testing.

Conclusion

The results show that employing a single technique for vulnerability discovery is insufficient for finding all types of vulnerabilities. Each technique identified only a subset of the vulnerabilities, which, for the most part, were independent of each other. Our results suggest that in order to discover the greatest variety of vulnerability types, at least systematic manual penetration testing and automated static analysis should be performed.

Introduction

Results of decades of empirical research on the effectiveness and efficiency of fault and failure discovery techniques, such as unit testing and inspections, can be used to provide evidence-based guidance on the use of these techniques. However, similar empirical results on the effectiveness and efficiency of vulnerability discovery techniques, such as security-focused automated static analysis and penetration testing, are sparse. As a result, practitioners lack evidence-based guidance on the use of vulnerability discovery techniques.

In his book Software Security: Building Security In, Gary McGraw draws on his experience as a security researcher and claims: “Security problems evolve, grow, and mutate, just like species on a continent. No one technique or set of rules will ever perfectly detect all security vulnerabilities” [1]. Instead, he advocates using a variety of vulnerability discovery and prevention techniques throughout the software development lifecycle. McGraw’s claim, however, is based upon his experience and is not substantiated with empirical evidence. The objective of this research is to aid in the selection of vulnerability discovery techniques by comparing the vulnerabilities detected using each and comparing their efficiencies.

In previous work [2], the first author analyzed four vulnerability discovery techniques on two electronic health record (EHR) systems: exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis. The first author applied these four techniques to Tolven Electronic Clinician Health Record (eCHR) and OpenEMR, two web-based systems currently used within the United States to store patient records. This paper adds the same analysis, performed by the second author, on an additional EHR, PatientOS, a custom client/server application written in Java. The new results corroborate the findings of the previous paper. Additionally, we examine the validity of the study in greater detail and offer new insights based on the PatientOS data. The second author followed the exact procedure used by the first author and collaborated with the first author throughout to confirm agreement on classifications.

We classified the vulnerabilities found by these techniques as either implementation bugs or design flaws. Design flaws are high-level problems associated with the architecture of the software; an example is a failure to require authentication where it is needed. Implementation bugs are code-level problems, such as an instance of a buffer overflow. Design flaws and implementation bugs occur with roughly equal frequency [1]. We then manually analyzed each discovered vulnerability to determine whether the same vulnerability could be found by multiple vulnerability discovery techniques.
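
To make the distinction concrete, consider the minimal Java sketch below (our illustration; the class, method, and table names are hypothetical and do not come from the studied EHR systems). The first method contains an implementation bug, an injectable SQL query that a static analyzer can flag on a single line; the last method exhibits a design flaw, returning patient data without any authorization check, an omission that no individual line reveals.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RecordLookup {
        // Implementation bug: untrusted input is concatenated into a SQL
        // string. A security-focused static analyzer can flag this exact line.
        ResultSet findPatientUnsafe(Connection db, String patientId) throws SQLException {
            Statement stmt = db.createStatement();
            return stmt.executeQuery(
                "SELECT * FROM patients WHERE id = '" + patientId + "'");
        }

        // The corresponding code-level fix: a parameterized query.
        ResultSet findPatientSafe(Connection db, String patientId) throws SQLException {
            PreparedStatement stmt =
                db.prepareStatement("SELECT * FROM patients WHERE id = ?");
            stmt.setString(1, patientId);
            return stmt.executeQuery();
        }

        // Design flaw: no statement here is individually wrong, so a
        // line-level analyzer has nothing to flag. The handler simply never
        // checks who is asking; a tester following a systematic plan asks
        // "is authorization enforced before patient data is returned?" and
        // finds the omission.
        ResultSet handleLookupRequest(Connection db, String patientId) throws SQLException {
            // Missing: authentication/authorization check on the caller.
            return findPatientSafe(db, patientId);
        }
    }

A line-level tool excels at the first kind of problem; finding the second requires reasoning about what the handler should do, which is consistent with systematic manual penetration testing surfacing the most design flaws in our study.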

The contributions of this paper are as follows:

  • A comparison of the type and number of vulnerabilities found with exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis.

  • Empirical evidence indicating which discovery techniques should be used to find implementation bugs and which to find design flaws.

  • An evaluation of the efficiency of each vulnerability discovery technique, based on the metric of vulnerabilities discovered per hour.

  • Additional data, collected on a new EHR system, that further supports previous work [2].

The rest of the paper is organized as follows: Section 2 provides background needed to understand the paper, Section 3 describes related work, Section 4 describes the case study and its methodology, Section 5 presents our results, Section 6 discusses and analyzes those results, Section 7 discusses the limitations of our study, and Section 8 summarizes our conclusions. Finally, Section 9 outlines future work.


Background

This section describes the terminology used throughout the paper and gives background information on the types of security issues one may encounter when doing security analysis. The section also discusses work related to the vulnerability discovery techniques.

Related work

Researchers have already examined some differences between vulnerability discovery techniques. Antunes and Vieira compared the effectiveness of static analysis and automated penetration testing in detecting SQL injection vulnerabilities in web services [11]. They found more SQL injection vulnerabilities with static analysis than with automated penetration testing. They also found that both static analysis and automated penetration testing had a large false positive rate. In our work we focus on…
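
For contrast with the code-level view sketched earlier, the following probe illustrates the black-box side of that comparison in the spirit of an automated penetration testing tool (the endpoint, payloads, and error signatures are our assumptions, not artifacts of the cited study): it submits SQL injection payloads to a web form and scans the responses for database error messages.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class SqliProbe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // Classic injection payloads an automated scanner might try.
            List<String> payloads =
                List.of("' OR '1'='1", "1'; --", "1' UNION SELECT NULL --");
            // Response fragments suggesting input reached the SQL layer.
            List<String> errorSignatures =
                List.of("SQL syntax", "SQLException", "ORA-");

            for (String payload : payloads) {
                String url = "http://localhost:8080/search?q="
                        + URLEncoder.encode(payload, StandardCharsets.UTF_8);
                HttpRequest request =
                        HttpRequest.newBuilder(URI.create(url)).GET().build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());

                // A leaked database error is evidence, not proof: a human
                // reviewer still has to confirm the finding, which is where
                // black-box tools accrue false positives.
                boolean suspicious = errorSignatures.stream()
                        .anyMatch(response.body()::contains);
                System.out.printf("payload=%-24s status=%d suspicious=%b%n",
                        payload, response.statusCode(), suspicious);
            }
        }
    }

Because such a probe observes only externally visible symptoms, it can show that an input is mishandled but cannot see flaws, such as missing authorization checks, that produce no error at all.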

Case study

This section describes the subjects chosen for our case studies as well as our methodology.

Results

In this section we present the results of our case study. We include analyses of the results of applying the four vulnerability discovery techniques to the three EHR systems, along with false positive rates for the automated tools. Finally, for each technique we discuss a metric intended to reflect overall efficiency: vulnerabilities discovered per hour. This metric takes into account only the time the authors spent reviewing potential vulnerabilities; it does not include time spent setting…
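
Stated as a formula (our notation; the paper itself reports only the resulting rates), the efficiency of a technique t is

    E_t = V_t / H_t

where V_t is the number of confirmed true-positive vulnerabilities found with technique t and H_t is the number of hours spent reviewing that technique's candidate findings. For example, 12 confirmed vulnerabilities after 3 h of review would give E_t = 4 vulnerabilities per hour; setup and unattended tool run time are excluded by construction.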

Analysis and discussion

The first subsection discusses and analyzes the vulnerabilities discovered. The second subsection discusses the efficiency of the various discovery techniques, while the third examines several vulnerabilities that the discussed discovery techniques failed to find.

Limitations

Runeson and Höst outline four primary threats to validity that help determine the extent to which the results of a case study might be influenced by researcher bias [24]. Construct validity pertains to whether the measured results truly describe the constructs they are intended to describe; we believe our vulnerability data accurately describes what we intended to measure. Internal validity is threatened when chains of causality are hypothesized (e.g. A causes B)…

Conclusion

In our case study we found that systematic manual penetration testing was more effective in finding vulnerabilities than exploratory manual penetration testing, and that it was the most effective technique at finding design flaw vulnerabilities. Compared with the automated and manual penetration testing techniques, static analysis found more vulnerabilities, and vulnerabilities of different types. This result suggests that one cannot rely on static analysis or automated penetration…

Future work

There are several areas branching from our work that could benefit from further study. This paper took a top-down approach to vulnerability discovery, ultimately allowing us to draw comparisons between the techniques. A bottom-up approach could reveal more about the ability of techniques to detect the exact same vulnerabilities. Additionally, it would be beneficial to know why different types of vulnerabilities were discovered by one technique and not another. Even…

Acknowledgment

This work is supported by the Agency for Healthcare Research and Quality. We would also like to thank the members of the Realsearch group for their invaluable feedback on our research and this paper.

References (24)

  • G. McGraw, Software Security: Building Security In (2006)
  • Andrew Austin, Laurie Williams, One technique is not enough: an empirical comparison of vulnerability discovery...
  • G. Stoneburner, A. Goguen, A. Feringa, Risk management guide for information technology systems, NIST Special Publication...
  • D. Allan, Web application security: automated scanning versus manual penetration testing, IBM Rational Software,...
  • B. Chess, G. McGraw, Static analysis for security, IEEE Security and Privacy (2004)
  • D. Hovemeyer, W. Pugh, Finding bugs is easy, ACM SIGPLAN Notices, vol. 39, no. 12, December...
  • N. Ayewah, D. Hovemeyer, J.D. Morgenthaler, J. Penix, W. Pugh, Using static analysis to find bugs, in: IEEE Software,...
  • T. Henzinger et al., Software verification with BLAST
  • The Open Web Application Security Project, HttpOnly, August 2010....
  • The MITRE Corporation, Common Weakness Enumeration, March 2011....
  • N. Antunes, M. Vieira, Comparing the effectiveness of penetration testing and static code analysis on the detection of...
  • A. Doupé, M. Cova, G. Vigna, Why Johnny Can’t Pentest: an analysis of black-box web vulnerability scanners, in:...