
Generating Python Type Annotations from Type Inference: How Far Are We?

Online AM: 11 March 2024

Abstract

In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. However, the lack of static typing makes programs harder to maintain: type errors are tedious to fix, bugs are detected late, and code is more difficult to understand. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, yet a large number of programs remain unannotated by developers. Annotation generation tools can leverage type inference techniques to fill this gap. However, existing work overlooks several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation, so it remains unclear how far we have come and how far we can go.
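The optional annotations PEP 484 added look like the following. This is a minimal illustration of the annotation syntax itself, not code from the paper's dataset:

```python
# An unannotated function: parameter and return types are implicit,
# so a static checker cannot flag a call like greet(42) before runtime.
def greet(name):
    return "Hello, " + name

# The same function with PEP 484 annotations: static checkers such as
# mypy can now reject ill-typed calls without running the code.
def greet_typed(name: str) -> str:
    return "Hello, " + name

# Variable annotations (PEP 526) and generics in standard collections
# (PEP 585) extend the same idea beyond function signatures.
ages: dict[str, int] = {"alice": 30}
```

Annotations like these are exactly what the inference tools studied in the paper try to generate automatically for unannotated code.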

In this paper, we comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, using a comprehensive set of metrics and categories, we find that existing tools differ in effectiveness and cannot achieve both high accuracy and high coverage. Second, we summarize six patterns that capture the limitations of type annotation generation. Third, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools reduce the time developers spend annotating types and help them determine more precise types, but do not reduce the perceived difficulty. Our findings point out the limitations of, and improvement directions for, type annotation generation, which can inspire future work.
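The accuracy/coverage trade-off mentioned above can be made concrete: exact-match accuracy is measured over the annotation slots a tool actually predicts, while coverage is the fraction of slots it predicts at all, so a tool that abstains on hard cases can raise one metric at the expense of the other. A minimal sketch, where the function and toy data are hypothetical rather than the paper's actual metric definitions or dataset:

```python
def evaluate(predictions: dict, ground_truth: dict):
    """Exact-match accuracy and coverage for predicted type annotations.

    predictions:  slot name -> predicted type string, or None if the
                  tool made no prediction for that slot
    ground_truth: slot name -> developer-written type annotation
    """
    predicted = {s: t for s, t in predictions.items() if t is not None}
    # Coverage: how many slots the tool annotates at all.
    coverage = len(predicted) / len(ground_truth)
    # Accuracy: exact matches among the predictions that were made.
    correct = sum(1 for s, t in predicted.items() if ground_truth.get(s) == t)
    accuracy = correct / len(predicted) if predicted else 0.0
    return accuracy, coverage

# Hypothetical example: the tool predicts 3 of 4 slots, 2 correctly.
truth = {"x": "int", "y": "str", "z": "list[int]", "w": "bool"}
preds = {"x": "int", "y": "str", "z": "dict", "w": None}
acc, cov = evaluate(preds, truth)  # acc = 2/3, cov = 3/4
```

Under definitions like these, a conservative static tool tends toward high accuracy with low coverage, while a learning-based tool that always predicts tends toward the opposite, matching the trade-off the study reports.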

References

  1. Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, and Zheng Gao. 2020. Typilus: Neural type hints. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 91–105.
  2. Jong-hoon An, Avik Chaudhuri, Jeffrey S. Foster, and Michael Hicks. 2011. Dynamic inference of static types for Ruby. ACM SIGPLAN Notices 46, 1 (2011), 459–472.
  3. Christopher Anderson, Paola Giannini, and Sophia Drossopoulou. 2005. Towards type inference for JavaScript. In ECOOP, Vol. 5. Springer, 428–452.
  4. Justus Bogner and Manuel Merkel. 2022. To type or not to type? A systematic comparison of the software quality of JavaScript and TypeScript applications on GitHub. In Proceedings of the 19th International Conference on Mining Software Repositories. 658–669.
  5. Brett Cannon. 2005. Localized type inference of atomic types in Python. California Polytechnic State University.
  6. Satish Chandra, Colin S. Gordon, Jean-Baptiste Jeannin, Cole Schlesinger, Manu Sridharan, Frank Tip, and Youngil Choi. 2016. Type inference for static compilation of JavaScript. ACM SIGPLAN Notices 51, 10 (2016), 410–429.
  7. Collin Winter and Tony Lownds. 2006. PEP 3107 – Function Annotations. https://peps.python.org/pep-3107/.
  8. Bas Cornelissen, Andy Zaidman, and Arie van Deursen. 2010. A controlled experiment for program comprehension through trace visualization. IEEE Transactions on Software Engineering 37, 3 (2010), 341–355.
  9. Siwei Cui, Gang Zhao, Zeyu Dai, Luochao Wang, Ruihong Huang, and Jeff Huang. 2021. PyInfer: Deep learning semantic type inference for Python variables. arXiv preprint arXiv:2106.14316 (2021).
  10. Santanu Kumar Dash, Miltiadis Allamanis, and Earl T. Barr. 2018. RefiNym: Using names to refine types. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 107–117.
  11. Luca Di Grazia and Michael Pradel. 2022. The evolution of type annotations in Python: An empirical study. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 209–220.
  12. Michael Furr, Jong-hoon An, Jeffrey S. Foster, and Michael Hicks. 2009. Static type inference for Ruby. In Proceedings of the 2009 ACM Symposium on Applied Computing. 1859–1866.
  13. Zheng Gao, Christian Bird, and Earl T. Barr. 2017. To type or not to type: Quantifying detectable bugs in JavaScript. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 758–769.
  14. Google. 2022. Pytype. https://github.com/google/pytype.
  15. Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, and Andreas Stefik. 2014. An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering 19, 5 (2014), 1335–1382.
  16. Mostafa Hassan, Caterina Urban, Marco Eilers, and Peter Müller. 2018. MaxSMT-based type inference for Python 3. In Computer Aided Verification: 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14–17, 2018, Proceedings, Part II. Springer, 12–19.
  17. Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis. 2018. Deep learning type inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 152–162.
  18. Simon Holm Jensen, Anders Møller, and Peter Thiemann. 2009. Type analysis for JavaScript. In SAS, Vol. 9. Springer, 238–255.
  19. Kevin Jesse, Premkumar T. Devanbu, and Toufique Ahmed. 2021. Learning type annotation: Is big data enough? In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1483–1486.
  20. JetBrains. 2020. Python developer survey conducted by JetBrains and the Python Software Foundation. https://www.jetbrains.com/lp/python-developers-survey-2020/.
  21. Wuxia Jin, Dinghong Zhong, Zifan Ding, Ming Fan, and Ting Liu. 2021. Where to start: Studying type annotation practices in Python. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 529–541.
  22. Faizan Khan, Boqi Chen, Daniel Varro, and Shane McIntosh. 2021. An empirical study of type-related defects in Python projects. IEEE Transactions on Software Engineering 48, 8 (2021), 3145–3158.
  23. Robert V. Krejcie and Daryle W. Morgan. 1970. Determining sample size for research activities. Educational and Psychological Measurement 30, 3 (1970), 607–610.
  24. Triet H. M. Le, Hao Chen, and Muhammad Ali Babar. 2020. Deep learning for source code modeling and generation: Models, applications, and challenges. ACM Computing Surveys (CSUR) 53, 3 (2020), 1–38.
  25. Jukka Lehtosalo. 2019. PEP 589 – TypedDict: Type hints for dictionaries with a fixed set of keys. https://www.python.org/dev/peps/pep-0589/.
  26. Jukka Lehtosalo, Guido van Rossum, Ivan Levkivskyi, Michael J. Sullivan, David Fisher, Greg Price, Michael Lee, N. Seyfer, R. Barton, S. Ilinskiy, et al. 2017. Mypy: Optional static typing for Python. https://mypy-lang.org/.
  27. Ivan Levkivskyi, Jukka Lehtosalo, and Łukasz Langa. 2017. PEP 544 – Protocols: Structural subtyping (static duck typing). https://www.python.org/dev/peps/pep-0544/.
  28. Magnus Madsen. 2015. Static analysis of dynamic languages. Ph.D. Dissertation. https://pure.au.dk/ws/files/85299449/Thesis.pdf.
  29. Eva Maia, Nelma Moreira, and Rogério Reis. 2012. A static type inference for Python. Proc. of DYLA 5, 1 (2012), 1.
  30. Rabee Sohail Malik, Jibesh Patra, and Michael Pradel. 2019. NL2Type: Inferring JavaScript function types from natural language information. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 304–315.
  31. Microsoft. 2022. Pyright. https://github.com/microsoft/pyright.
  32. Amir M. Mir, Evaldas Latoškinas, and Georgios Gousios. 2021. ManyTypes4Py: A benchmark Python dataset for machine learning-based type inference. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 585–589.
  33. Amir M. Mir, Evaldas Latoškinas, Sebastian Proksch, and Georgios Gousios. 2022. Type4Py: Practical deep similarity learning-based type inference for Python. In Proceedings of the 44th International Conference on Software Engineering. 2241–2252.
  34. Yusuke Miyazaki, Taro Sekiyama, and Atsushi Igarashi. 2019. Dynamic type inference for gradual Hindley–Milner typing. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1–29.
  35. GitHub Octoverse. 2022. The 2022 state of open source software. https://octoverse.github.com/.
  36. John-Paul Ore, Carrick Detweiler, and Sebastian Elbaum. 2021. An empirical study on type annotations: Accuracy, speed, and suggestion effectiveness. ACM Transactions on Software Engineering and Methodology (TOSEM) 30, 2 (2021), 1–29.
  37. John-Paul Ore, Sebastian Elbaum, Carrick Detweiler, and Lambros Karkazis. 2018. Assessing the type annotation burden. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 190–201.
  38. Francisco Ortin, Jose Baltasar Garcia Perez-Schofield, and Jose Manuel Redondo. 2015. Towards a static type checker for Python. In European Conference on Object-Oriented Programming (ECOOP), Scripts to Programs Workshop, STOP, Vol. 15. 1–2.
  39. Irene Vlassi Pandi, Earl T. Barr, Andrew D. Gordon, and Charles Sutton. 2020. OptTyper: Probabilistic type inference by optimising logical and natural constraints. arXiv preprint arXiv:2004.00348 (2020).
  40. Jibesh Patra and Michael Pradel. 2022. Nalin: Learning from runtime behavior to find name-value inconsistencies in Jupyter notebooks. In Proceedings of the 44th International Conference on Software Engineering. 1469–1481.
  41. Zvonimir Pavlinovic. 2019. Leveraging Program Analysis for Type Inference. Ph.D. Dissertation. New York University.
  42. Yun Peng, Cuiyun Gao, Zongjie Li, Bowei Gao, David Lo, Qirun Zhang, and Michael Lyu. 2022. Static inference meets deep learning: A hybrid type inference approach for Python. In Proceedings of the 44th International Conference on Software Engineering. 2019–2030.
  43. Michael Pradel, Georgios Gousios, Jason Liu, and Satish Chandra. 2020. TypeWriter: Neural type prediction with search-based validation. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 209–220.
  44. Python. 2022. Pyre: A performant type-checker for Python 3. https://pyre-check.org/.
  45. Jochen Quante. 2008. Do dynamic object process graphs support program understanding? A controlled experiment. In 2008 16th IEEE International Conference on Program Comprehension. IEEE, 73–82.
  46. Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 155–165.
  47. Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from "big code". ACM SIGPLAN Notices 50, 1 (2015), 111–124.
  48. Brianna M. Ren, John Toman, T. Stephen Strickland, and Jeffrey S. Foster. 2013. The Ruby type checker. In Proceedings of the 28th Annual ACM Symposium on Applied Computing. 1565–1572.
  49. Michael Salib. 2004. Starkiller: A static type inferencer and compiler for Python. Ph.D. Dissertation. Massachusetts Institute of Technology.
  50. Guido Salvaneschi and Mira Mezini. 2016. Debugging for reactive programming. In Proceedings of the 38th International Conference on Software Engineering. 796–807.
  51. Sandro Schulze, Jörg Liebig, Janet Siegmund, and Sven Apel. 2013. Does the discipline of preprocessor annotations matter? A controlled experiment. In Proceedings of the 12th International Conference on Generative Programming: Concepts & Experiences. 65–74.
  52. Hinrich Schütze, Christopher D. Manning, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Vol. 39. Cambridge University Press, Cambridge.
  53. Jeremy Siek and Walid Taha. 2007. Gradual typing for objects. In ECOOP 2007 – Object-Oriented Programming: 21st European Conference, Berlin, Germany, July 30 – August 3, 2007, Proceedings. Springer, 2–27.
  54. IEEE Spectrum. 2022. Top Programming Languages 2022. https://spectrum.ieee.org/top-programming-languages.
  55. Andreas Stuchlik and Stefan Hanenberg. 2011. Static vs. dynamic type systems: An empirical study about the relationship between type casts and development time. In Proceedings of the 7th Symposium on Dynamic Languages. 97–106.
  56. Ke Sun, Yifan Zhao, Dan Hao, and Lu Zhang. 2022. Static type recommendation for Python. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–13.
  57. Yida Tao, Jindae Kim, Sunghun Kim, and Chang Xu. 2014. Automatically generated patches as debugging aids: A human study. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 64–74.
  58. Guido van Rossum, Jukka Lehtosalo, and Łukasz Langa. 2014. PEP 484 – Type Hints. https://www.python.org/dev/peps/pep-0484/.
  59. Guido van Rossum and Ivan Levkivskyi. 2014. PEP 483 – The Theory of Type Hints. https://www.python.org/dev/peps/pep-0483/.
  60. Guido van Rossum. 2004. Adding Optional Static Typing to Python. https://www.artima.com/weblogs/viewpost.jsp?thread=85551.
  61. Michael M. Vitousek, Andrew M. Kent, Jeremy G. Siek, and Jim Baker. 2014. Design and evaluation of gradual typing for Python. In Proceedings of the 10th ACM Symposium on Dynamic Languages. 45–56.
  62. Yin Wang. 2022. Pysonar2. https://github.com/yinwang0/pysonar2.
  63. Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2020. LambdaNet: Probabilistic type inference using graph neural networks. arXiv preprint arXiv:2005.02161 (2020).
  64. Zhaogui Xu, Peng Liu, Xiangyu Zhang, and Baowen Xu. 2016. Python predictive analysis for bug detection. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 121–132.
  65. Zhaogui Xu, Xiangyu Zhang, Lin Chen, Kexin Pei, and Baowen Xu. 2016. Python probabilistic type inference with natural language support. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 607–618.
  66. Yanyan Yan, Yang Feng, Hongcheng Fan, and Baowen Xu. 2023. DLInfer: Deep learning with static slicing for Python type inference. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2009–2021.
  67. Łukasz Langa. 2019. PEP 585 – Type hinting generics in standard collections. https://www.python.org/dev/peps/pep-0585/.


• Published in

  ACM Transactions on Software Engineering and Methodology (Just Accepted)
  ISSN: 1049-331X
  EISSN: 1557-7392
          Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Online AM: 11 March 2024
          • Accepted: 22 February 2024
          • Revised: 5 December 2023
          • Received: 13 February 2023
Published in ACM Transactions on Software Engineering and Methodology (Just Accepted)

          Qualifiers

          • research-article