DOI: 10.1145/2635868.2635924
research-article

Mining preconditions of APIs in large-scale code corpus

Published: 11 November 2014

ABSTRACT

Modern software relies on existing application programming interfaces (APIs) from libraries. Formal specifications for the APIs enable many software engineering tasks and help developers use the APIs correctly. In this work, we mine large-scale repositories of existing open-source software to derive potential preconditions for API methods. Our key idea is that APIs’ preconditions would appear frequently in an ultra-large code corpus with a large number of API usages, while project-specific conditions will occur less frequently. First, we find all client methods invoking APIs. We then compute a control dependence relation from each call site and mine the potential conditions used to reach those call sites. We use these guard conditions as a starting point to automatically infer the preconditions for each API. We analyzed almost 120 million lines of code from SourceForge and Apache projects to infer preconditions for the standard Java Development Kit (JDK) library. The results show that our technique achieves high accuracy, with recall of 75–80% and precision of 82–84%. We also found 5 preconditions missing from human-written specifications. They were all confirmed by a specification expert. In a user study, participants found 82% of the mined preconditions to be a good starting point for writing specifications. Using our mining result, we also built a benchmark of more than 4,000 precondition-related bugs.
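The frequency-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the details of the control dependence analysis, condition normalization, or filtering, so the sketch assumes guard conditions have already been extracted and normalized into strings. The call-site data, the `mine_preconditions` function, and the `min_support` threshold are all hypothetical names and values chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written stand-in for mined data (not from the paper):
# each entry is one call site, given as the project it came from, the API it
# invokes, and the already-normalized guard conditions that control it.
CALL_SITES = [
    ("projA", "String.substring(int)", {"arg0 >= 0", "arg0 <= rcv.length()"}),
    ("projB", "String.substring(int)", {"arg0 >= 0", "debugEnabled"}),
    ("projC", "String.substring(int)", {"arg0 >= 0", "arg0 <= rcv.length()"}),
    ("projD", "String.substring(int)", set()),
    ("projA", "Iterator.next()", {"rcv.hasNext()"}),
    ("projB", "Iterator.next()", {"rcv.hasNext()"}),
    ("projC", "Iterator.next()", {"rcv.hasNext()", "count < limit"}),
]


def mine_preconditions(call_sites, min_support=0.5):
    """Return, per API, the guards seen at >= min_support of its call sites.

    The threshold is an assumed, illustrative cutoff: guards that recur
    across many call sites are kept as candidate preconditions, while rare
    (likely project-specific) guards are dropped.
    """
    total = Counter()                     # api -> number of call sites
    guard_counts = defaultdict(Counter)   # api -> guard -> number of call sites
    for _project, api, guards in call_sites:
        total[api] += 1
        guard_counts[api].update(guards)

    result = {}
    for api, counts in guard_counts.items():
        result[api] = sorted(
            guard for guard, n in counts.items() if n / total[api] >= min_support
        )
    return result


MINED = mine_preconditions(CALL_SITES)
for api, guards in MINED.items():
    print(api, "->", guards)
```

In this toy data, `debugEnabled` and `count < limit` each guard only one call site and are filtered out, while the guards that recur across projects survive as candidate preconditions.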

      Published in

        FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
        November 2014
        856 pages
        ISBN: 9781450330565
        DOI: 10.1145/2635868

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 17 of 128 submissions, 13%
