A comparison of the efficiency and effectiveness of vulnerability discovery techniques

https://doi.org/10.1016/j.infsof.2012.11.007

Abstract

Context

Security vulnerabilities discovered later in the development cycle are more expensive to fix than those discovered early. Therefore, software developers should strive to discover vulnerabilities as early as possible. Unfortunately, the large size of code bases and a lack of developer expertise can make discovering software vulnerabilities difficult. A number of vulnerability discovery techniques are available, each with its own strengths.

Objective

The objective of this research is to aid in the selection of vulnerability discovery techniques by comparing the vulnerabilities detected by each and comparing their efficiencies.

Method

We conducted three case studies using three electronic health record systems to compare four vulnerability discovery techniques: exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis.

Results

In our case study, we found empirical evidence that no single technique discovered every type of vulnerability. We discovered that the specific set of vulnerabilities identified by each technique was largely orthogonal to those of the other techniques. Systematic manual penetration testing found the most design flaws, while automated static analysis found the most implementation bugs. The most efficient discovery technique in terms of vulnerabilities discovered per hour was automated penetration testing.

Conclusion

The results show that employing a single technique for vulnerability discovery is insufficient for finding all types of vulnerabilities. Each technique identified only a subset of the vulnerabilities, which, for the most part, were independent of each other. Our results suggest that in order to discover the greatest variety of vulnerability types, at least systematic manual penetration testing and automated static analysis should be performed.

Introduction

Results of decades of empirical research on the effectiveness and efficiency of fault and failure discovery techniques, such as unit testing and inspections, can be used to provide evidence-based guidance on the use of these techniques. However, similar empirical results on the effectiveness and efficiency of vulnerability discovery techniques, such as security-focused automated static analysis and penetration testing, are sparse. As a result, practitioners lack evidence-based guidance on the use of vulnerability discovery techniques.

In his book Software Security: Building Security In, Gary McGraw draws on his experience as a security researcher and claims: “Security problems evolve, grow, and mutate, just like species on a continent. No one technique or set of rules will ever perfectly detect all security vulnerabilities” [1]. Instead, he advocates using a variety of vulnerability discovery and prevention techniques throughout the software development lifecycle. McGraw’s claim, however, is based upon his experience and is not substantiated with empirical evidence. The objective of this research is to aid in the selection of vulnerability discovery techniques by comparing the vulnerabilities detected using each and comparing their efficiencies.

In previous work [2], the first author analyzed four vulnerability discovery techniques on two electronic health record (EHR) systems: exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis. The first author applied these four techniques to Tolven Electronic Clinician Health Record (eCHR) and OpenEMR, two web-based systems currently used within the United States to store patient records. This paper adds the same analysis, performed by the second author, on an additional EHR, PatientOS, a custom client/server application written in Java. The new results corroborate the findings of the previous paper. Additionally, we examine the validity of the study in greater detail and offer new insights based on the PatientOS data. The second author followed the exact procedure used by the first author and collaborated with the first author throughout to confirm agreement on classifications.

We classified the vulnerabilities found by these techniques as either implementation bugs or design flaws. Design flaws are high-level problems associated with the architecture of the software; an example is a failure to require authentication where it is needed. Implementation bugs are code-level problems, such as an instance of a buffer overflow. Design flaws and implementation bugs occur with roughly equal frequency [1]. We then manually analyzed each discovered vulnerability to determine whether the same vulnerability could be found by multiple vulnerability discovery techniques.
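
To make the distinction concrete, consider the minimal Java sketch below (our illustration; the class, method, and table names are hypothetical and do not come from the studied EHR systems). The first method contains an implementation bug, an injectable SQL query that a static analyzer can flag on a single line; the last method exhibits a design flaw, returning patient data without any authorization check, an omission that no individual line reveals.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RecordLookup {
        // Implementation bug: untrusted input is concatenated into a SQL
        // string. A security-focused static analyzer can flag this exact line.
        ResultSet findPatientUnsafe(Connection db, String patientId) throws SQLException {
            Statement stmt = db.createStatement();
            return stmt.executeQuery(
                "SELECT * FROM patients WHERE id = '" + patientId + "'");
        }

        // The corresponding code-level fix: a parameterized query.
        ResultSet findPatientSafe(Connection db, String patientId) throws SQLException {
            PreparedStatement stmt =
                db.prepareStatement("SELECT * FROM patients WHERE id = ?");
            stmt.setString(1, patientId);
            return stmt.executeQuery();
        }

        // Design flaw: no statement here is individually wrong, so a
        // line-level analyzer has nothing to flag. The handler simply never
        // checks who is asking; a tester following a systematic plan asks
        // "is authorization enforced before patient data is returned?" and
        // finds the omission.
        ResultSet handleLookupRequest(Connection db, String patientId) throws SQLException {
            // Missing: authentication/authorization check on the caller.
            return findPatientSafe(db, patientId);
        }
    }

A line-level tool excels at the first kind of problem; finding the second requires reasoning about what the handler should do, which is consistent with systematic manual penetration testing surfacing the most design flaws in our study.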

The contributions of this paper are as follows:

  • A comparison of the type and number of vulnerabilities found with exploratory manual penetration testing, systematic manual penetration testing, automated penetration testing, and automated static analysis.

  • Empirical evidence indicating which discovery techniques should be used to find implementation bugs and which to find design flaws.

  • An evaluation of the efficiency of each vulnerability discovery technique, based on the metric of vulnerabilities discovered per hour.

  • Additional data, collected on a new EHR system, that further supports previous work [2].

The rest of the paper is organized as follows: Section 2 provides background needed to understand the paper, Section 3 describes related work, Section 4 describes the case study and its methodology, Section 5 presents our results, Section 6 discusses and analyzes those results, Section 7 discusses the limitations of our study, and Section 8 summarizes our conclusions. Finally, Section 9 outlines future work.


Background

This section describes the terminology used throughout the paper and gives background information on the types of security issues one may encounter when doing security analysis. The section also discusses work related to the vulnerability discovery techniques.

Related work

Researchers have already examined some differences between vulnerability discovery techniques. Antunes and Vieira compared the effectiveness of static analysis and automated penetration testing in detecting SQL injection vulnerabilities in web services [11]. They found more SQL injection vulnerabilities with static analysis than with automated penetration testing. They also found that both static analysis and automated penetration testing had a large false positive rate. In our work we focus on…
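
For contrast with the code-level view sketched earlier, the following probe illustrates the black-box side of that comparison in the spirit of an automated penetration testing tool (the endpoint, payloads, and error signatures are our assumptions, not artifacts of the cited study): it submits SQL injection payloads to a web form and scans the responses for database error messages.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class SqliProbe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // Classic injection payloads an automated scanner might try.
            List<String> payloads =
                List.of("' OR '1'='1", "1'; --", "1' UNION SELECT NULL --");
            // Response fragments suggesting input reached the SQL layer.
            List<String> errorSignatures =
                List.of("SQL syntax", "SQLException", "ORA-");

            for (String payload : payloads) {
                String url = "http://localhost:8080/search?q="
                        + URLEncoder.encode(payload, StandardCharsets.UTF_8);
                HttpRequest request =
                        HttpRequest.newBuilder(URI.create(url)).GET().build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());

                // A leaked database error is evidence, not proof: a human
                // reviewer still has to confirm the finding, which is where
                // black-box tools accrue false positives.
                boolean suspicious = errorSignatures.stream()
                        .anyMatch(response.body()::contains);
                System.out.printf("payload=%-24s status=%d suspicious=%b%n",
                        payload, response.statusCode(), suspicious);
            }
        }
    }

Because such a probe observes only externally visible symptoms, it can show that an input is mishandled but cannot see flaws, such as missing authorization checks, that produce no error at all.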

Case study

This section describes the subjects chosen for our case studies as well as our methodology.

Results

In this section we present the results of our case study. We include analyses of the results of applying the four vulnerability discovery techniques to the three EHR systems, along with false positive rates for the automated tools. Finally, for each technique we discuss a metric intended to reflect overall efficiency: vulnerabilities discovered per hour. This metric takes into account only the time the authors spent reviewing potential vulnerabilities; it does not include time spent setting…
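
Stated as a formula (our notation; the paper itself reports only the resulting rates), the efficiency of a technique t is

    E_t = V_t / H_t

where V_t is the number of confirmed true-positive vulnerabilities found with technique t and H_t is the number of hours spent reviewing that technique's candidate findings. For example, 12 confirmed vulnerabilities after 3 h of review would give E_t = 4 vulnerabilities per hour; setup and unattended tool run time are excluded by construction.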

Analysis and discussion

The first subsection discusses and analyzes the vulnerabilities discovered. The second subsection discusses the efficiency of the various discovery techniques, while the third examines several vulnerabilities that the discussed discovery techniques failed to find.

Limitations

Runeson and Höst outline four primary threats to validity that help determine the extent to which the results of a case study might be influenced by researcher bias [24]. Construct validity pertains to whether the measured results truly describe the constructs they are intended to describe; we believe our vulnerability data accurately describes what we intended to measure. Internal validity is threatened when chains of causality are hypothesized (e.g. A causes B)…

Conclusion

In our case study we found that systematic manual penetration testing was more effective in finding vulnerabilities than exploratory manual penetration testing, and that it was the most effective technique at finding design flaw vulnerabilities. Compared with the automated and manual penetration testing techniques, static analysis found more vulnerabilities, and vulnerabilities of different types. This result suggests that one cannot rely on static analysis or automated penetration…

Future work

There are several areas branching from our work that could benefit from further study. This paper took a top-down approach to vulnerability discovery, ultimately allowing us to draw comparisons between the techniques. A bottom-up approach could reveal more about the ability of techniques to detect the exact same vulnerabilities. Additionally, it would be beneficial to know why different types of vulnerabilities were discovered by one technique and not another. Even…

Acknowledgment

This work is supported by the Agency for Healthcare Research and Quality. We would also like to thank the members of the Realsearch group for their invaluable feedback on our research and this paper.

References (24)

  • G. McGraw, Software Security: Building Security In (2006)
  • Andrew Austin, Laurie Williams, One technique is not enough: an empirical comparison of vulnerability discovery...
  • G. Stoneburner, A. Goguen, A. Feringa, Risk management guide for information technology systems, NIST Special Publication...
  • D. Allan, Web application security: automated scanning versus manual penetration testing, IBM Rational Software,...
  • B. Chess, G. McGraw, Static analysis for security, IEEE Security and Privacy (2004)
  • D. Hovemeyer, W. Pugh, Finding bugs is easy, ACM SIGPLAN Notices, vol. 39, no. 12, December...
  • N. Ayewah, D. Hovemeyer, J.D. Morgenthaler, J. Penix, W. Pugh, Using static analysis to find bugs, in: IEEE Software,...
  • T. Henzinger et al., Software verification with BLAST
  • The Open Web Application Security Project, HttpOnly, August 2010....
  • The MITRE Corporation, Common Weakness Enumeration, March 2011....
  • N. Antunes, M. Vieira, Comparing the effectiveness of penetration testing and static code analysis on the detection of...
  • A. Doupé, M. Cova, G. Vigna, Why Johnny Can’t Pentest: an analysis of black-box web vulnerability scanners, in:...