ABSTRACT
Modern software relies on existing application programming interfaces (APIs) from libraries. Formal specifications for these APIs enable many software engineering tasks and help developers use the APIs correctly. In this work, we mine large-scale repositories of existing open-source software to derive potential preconditions for API methods. Our key idea is that an API's preconditions appear frequently in an ultra-large code corpus with a large number of API usages, while project-specific conditions occur less frequently. First, we find all client methods that invoke APIs. We then compute the control dependence relation from each call site and mine the potential conditions used to reach those call sites. We use these guard conditions as a starting point to automatically infer the preconditions for each API. We analyzed almost 120 million lines of code from SourceForge and Apache projects to infer preconditions for the standard Java Development Kit (JDK) library. The results show that our technique achieves high accuracy, with recall of 75–80% and precision of 82–84%. We also found 5 preconditions missing from human-written specifications, all of which were confirmed by a specification expert. In a user study, participants found 82% of the mined preconditions to be a good starting point for writing specifications. Using our mining results, we also built a benchmark of more than 4,000 precondition-related bugs.
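The frequency-based idea in the abstract can be illustrated with a small Java sketch. It is not the authors' implementation: the class `PreconditionMiner`, the `CallSite` type, the string form of the guard conditions, and the `minSupport` threshold are all hypothetical, and the sketch assumes that the guards controlling each call site have already been extracted and normalized (for example, by the control dependence analysis the abstract mentions).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: keep guard conditions that recur across an API's call sites. */
class PreconditionMiner {

    /** One call site: the API invoked and the (assumed pre-normalized) guards that control it. */
    static final class CallSite {
        final String api;
        final List<String> guards;

        CallSite(String api, String... guards) {
            this.api = api;
            this.guards = Arrays.asList(guards);
        }
    }

    /**
     * For each API, returns the guards that appear at a fraction of its call sites
     * greater than or equal to minSupport (an assumed, tunable threshold).
     */
    static Map<String, List<String>> mine(List<CallSite> sites, double minSupport) {
        Map<String, Integer> callCounts = new HashMap<>();
        Map<String, Map<String, Integer>> guardCounts = new HashMap<>();

        for (CallSite site : sites) {
            callCounts.merge(site.api, 1, Integer::sum);
            Map<String, Integer> counts =
                    guardCounts.computeIfAbsent(site.api, k -> new TreeMap<>());
            // Count each distinct guard at most once per call site.
            for (String guard : new TreeSet<>(site.guards)) {
                counts.merge(guard, 1, Integer::sum);
            }
        }

        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : guardCounts.entrySet()) {
            int totalCalls = callCounts.get(entry.getKey());
            List<String> kept = new ArrayList<>();
            for (Map.Entry<String, Integer> guard : entry.getValue().entrySet()) {
                // Frequent guards are candidate preconditions; rare ones are
                // treated as project-specific and dropped.
                if ((double) guard.getValue() / totalCalls >= minSupport) {
                    kept.add(guard.getKey());
                }
            }
            result.put(entry.getKey(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up call sites for illustration only.
        List<CallSite> sites = Arrays.asList(
                new CallSite("String.substring(int)", "arg0 >= 0", "arg0 <= rcv.length()"),
                new CallSite("String.substring(int)", "arg0 >= 0", "config.verbose"),
                new CallSite("String.substring(int)", "arg0 >= 0", "arg0 <= rcv.length()"),
                new CallSite("String.substring(int)"));
        System.out.println(mine(sites, 0.5));
    }
}
```

In this made-up input, the two bounds checks on the argument recur across call sites and are kept as candidate preconditions, while `config.verbose` appears at only one of four call sites and is filtered out as project-specific. A real pipeline would also need to normalize syntactically different but equivalent conditions before counting.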