Generating Python Type Annotations from Type Inference: How Far Are We?

Authors:
Yimeng Guo, Nanjing University, Nanjing, China
Zhifei Chen, Nanjing University of Science and Technology, Nanjing, China
Lin Chen, Nanjing University, Nanjing, China
Wenjie Xu, Nanjing University, Nanjing, China
Yanhui Li, Nanjing University, Nanjing, China
Yuming Zhou, Nanjing University, Nanjing, China
Baowen Xu, Nanjing University, Nanjing, China

ACM Transactions on Software Engineering and Methodology
Accepted: February 2024
https://doi.org/10.1145/3652153

Online AM: 11 March 2024

Abstract

In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. However, the lack of static typing makes it harder to fix type errors, detect bugs early, and understand code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but unfortunately, a large number of programs are still not annotated by developers. Annotation generation tools can utilize type inference techniques to produce the missing annotations. However, existing work overlooks several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation. It therefore remains unclear how far we have come and how far we can go.
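For readers unfamiliar with PEP 484, the following minimal sketch shows what an optional annotation adds to a function. The function names are hypothetical and are not taken from the paper; the point is only that the annotated version states its parameter and return types explicitly while behaving identically at runtime.

```python
from typing import List, Optional


def first_even(values):
    """Unannotated: a reader must inspect the body to learn the types."""
    for value in values:
        if value % 2 == 0:
            return value
    return None


def first_even_annotated(values: List[int]) -> Optional[int]:
    """Same logic, with PEP 484 annotations on the parameter and the result."""
    for value in values:
        if value % 2 == 0:
            return value
    return None


# Annotations are stored on the function object and are not enforced at
# runtime; a separate type checker is what actually consumes them.
print(first_even_annotated.__annotations__)
```

An annotation generation tool, in this picture, would take `first_even` as input and try to propose the signature written by hand in `first_even_annotated`.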

In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, we use a comprehensive set of metrics and categories, finding that existing tools differ in effectiveness and that none achieves both high accuracy and high coverage. Then, we summarize six patterns that capture the limitations of type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations and improvement directions in type annotation generation, which can inspire future work.
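The tension between accuracy and coverage can be illustrated with a small sketch. The definitions below are assumptions made for illustration only (coverage as the share of annotation slots for which a tool outputs any type, accuracy as the share of those outputs that exactly match the developer-written type); the abstract does not state the paper's own metric definitions, and the slot names and tool outputs are invented toy data.

```python
from typing import Dict, Optional, Tuple


def coverage_and_accuracy(
    ground_truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (coverage, accuracy) under the assumed definitions above."""
    # Keep only slots that exist in the ground truth and received a prediction.
    answered = {
        slot: type_name
        for slot, type_name in predicted.items()
        if slot in ground_truth and type_name is not None
    }
    coverage = len(answered) / len(ground_truth) if ground_truth else 0.0
    # Exact string match is an assumed, deliberately strict notion of "correct".
    correct = sum(1 for slot, t in answered.items() if t == ground_truth[slot])
    accuracy = correct / len(answered) if answered else 0.0
    return coverage, accuracy


# Hypothetical annotation slots and two hypothetical tools.
truth = {"f.x": "int", "f.return": "str", "g.y": "List[int]", "g.return": "bool"}
cautious_tool = {"f.x": "int", "f.return": "str", "g.y": None, "g.return": None}
eager_tool = {"f.x": "int", "f.return": "str", "g.y": "list", "g.return": "int"}

print(coverage_and_accuracy(truth, cautious_tool))
print(coverage_and_accuracy(truth, eager_tool))
```

In this toy setup the cautious tool answers fewer slots but gets them right, while the eager tool answers every slot and is wrong more often, which is the kind of trade-off the abstract describes.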

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
  2. Software notations and tools
    1. General programming languages
      1. Language features

Published in

ACM Transactions on Software Engineering and Methodology Just Accepted
ISSN:1049-331X
EISSN:1557-7392

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 11 March 2024
- Accepted: 22 February 2024
- Revised: 5 December 2023
- Received: 13 February 2023
Author Tags
Type annotations
type inference
Python
empirical study
Qualifiers
- research-article