ABSTRACT
Statistical debugging uses dynamic instrumentation and machine learning to identify predicates on program state that are strongly predictive of program failure. Prior approaches have only considered simple, atomic predicates such as the directions of branches or the return values of function calls. We enrich the predicate vocabulary by adding complex Boolean formulae derived from these simple predicates. We draw upon three-valued logic, static program structure, and statistical estimation techniques to efficiently sift through large numbers of candidate Boolean predicate formulae. We present qualitative and quantitative evidence that complex predicates are practical, precise, and informative. Furthermore, we demonstrate that our approach is robust in the face of incomplete data provided by the sparse random sampling that typifies post-deployment statistical debugging.
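To make the role of three-valued logic concrete, the following is a minimal sketch of how a compound predicate might be evaluated when sparse sampling leaves some atomic predicates unobserved in a run. The abstract does not say which three-valued logic is used, so this sketch assumes Kleene's strong three-valued logic. The names `Tri`, `tri_and`, `tri_or`, and `tri_not` are hypothetical and are not taken from the paper; `None` stands for "not observed in this run".

```python
from typing import Optional

# Hypothetical encoding (an assumption, not the paper's): True and False are
# observed outcomes, None means the predicate was not observed in this run.
Tri = Optional[bool]


def tri_and(a: Tri, b: Tri) -> Tri:
    """Kleene conjunction: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """Kleene disjunction: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def tri_not(a: Tri) -> Tri:
    """Negation leaves an unobserved value unobserved."""
    return None if a is None else (not a)


# One sampled run in which predicate "p" was observed and "q" was not.
run = {"p": True, "q": None}
conjunction = tri_and(run["p"], run["q"])  # undecided: depends on q
disjunction = tri_or(run["p"], run["q"])   # decided by p alone
```

Under these assumptions, a compound predicate can sometimes be decided from a single observed operand, which is one way such a scheme can remain usable when sampling hides part of the program state.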